Upon waiting a request (when asked), we gave that request a small
priority boost, not enough for it to cause preemption, but enough for it
to be scheduled next before all equals. We also used that bit to give
new clients a small priority boost, similar to FQ_CODEL, such that we
favoured short interactive tasks ahead of long running streams.
However, this is causing lots of complications with timeslicing where we
both want to honour the boost and yet ignore it. Those complications
cause unexpected user behaviour (tasks not being timesliced and run
concurrently as epxected), and the easiest way to resolve that is to
remove the boost. Hopefully, we can find a compromise again if we need
to, but in theory timeslicing itself and future more advanced schedulers
should give us the interactivity boost we seek.
Testcase: igt/gem_exec_schedule/lateslice
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200507152338.7452-3-chris@chris-wilson.co.uk
The bspec is confusing on the nature of the upper 32bits of the LRC
descriptor. Once upon a time, it said that it uses the upper 32b to
decide if it should perform a lite-restore, and so we must ensure that
each unique context submitted to HW is given a unique CCID [for the
duration of it being on the HW]. Currently, this is achieved by using
a small circular tag, and assigning every context submitted to HW a
new id. However, this tag is being cleared on repinning an inflight
context such that we end up re-using the 0 tag for multiple contexts.
To avoid accidentally clearing the CCID in the upper 32bits of the LRC
descriptor, split the descriptor into two dwords so we can update the
GGTT address separately from the CCID.
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/1796
Fixes: 2935ed5339 ("drm/i915: Remove logical HW ID")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: <stable@vger.kernel.org> # v5.5+
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200428184751.11257-1-chris@chris-wilson.co.uk
drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c:205: warning: Excess function parameter 'supported' description in 'intel_uc_fw_init_early'
drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c:205: warning: Excess function parameter 'platform' description in 'intel_uc_fw_init_early'
drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c:205: warning: Excess function parameter 'rev' description in 'intel_uc_fw_init_early'
drivers/gpu/drm/i915/gt/uc/intel_guc_log.c:696: warning: Function parameter or member 'log' not described in 'intel_guc_log_info'
drivers/gpu/drm/i915/gt/uc/intel_guc_log.c:696: warning: Excess function parameter 'guc' description in 'intel_guc_log_info'
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200330212254.18236-1-chris@chris-wilson.co.uk
We currently initialize HuC support based on GuC being enabled in
modparam; this means that huc_is_supported() can return false on HW that
does have a HuC when enable_guc=0. The rationale for this behavior is
that HuC requires GuC for authentication and therefore is not supported
by itself. However, we do not allow defining HuC fw wthout GuC fw and
selecting HuC in modparam implicitly selects GuC as well, so we can't
actually hit a scenario where HuC is selected alone. Therefore, we can
flip the support check to reflect the HW capabilities and fw
availability, which is more intuitive and will make it cleaner to log
HuC the difference between not supported in HW and not selected.
Removing the difference between GuC and HuC also allows us to simplify
the init_early, since we don't need to differentiate the support based
on the type of uC.
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200326181121.16869-4-daniele.ceraolospurio@intel.com
We are quite trigger happy in cleaning up the firmware blobs, as we do
so from several error/fini paths in GuC/HuC/uC code. We do have the
__uc_cleanup_firmwares cleanup function, which unwinds
__uc_fetch_firmwares and is already called both from the error path of
gem_init and from gem_driver_release, so let's stop cleaning up from
all the other paths.
The fact that we're not cleaning the firmware immediately means that
we can't consider firmware availability as an indication of
initialization success. A "LOADABLE" status has been added to
indicate that the initialization was successful, to be used to
selectively load HuC only if HuC init has completed (HuC init failure
is not considered a fatal error).
v2: s/ready_to_load/loadable (Michal), only run guc/huc_fini if the
fw is in loadable state
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> #v1
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200218223327.11058-9-daniele.ceraolospurio@intel.com
Now that we can differentiate wants vs uses GuC/HuC, intel_uc_init is
restricted to running only if we have successfully fetched the required
blob(s) and are committed to using the microcontroller(s).
The only remaining thing that can go wrong in uc_init is the allocation
of GuC/HuC related objects; if we get such a failure better to bail out
immediately instead of wedging later, like we do for e.g.
intel_engines_init, since without objects we can't use the HW, including
not being able to attempt the firmware load.
While at it, remove the unneeded fw_cleanup call (this is handled
outside of gt_init) and add a probe failure injection point for testing.
Also, update the logs for <g/h>uc_init failures to probe_failure() since
they will cause the driver load to fail.
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Fernando Pacheco <fernando.pacheco@intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200218223327.11058-8-daniele.ceraolospurio@intel.com
To be able to setup GuC submission functions during engine init we need
to commit to using GuC as soon as possible.
Currently, the only thing that can stop us from using the
microcontrollers once we've fetched the blobs is a fundamental
error (e.g. OOM); given that if we hit such an error we can't really
fall-back to anything, we can "officialize" the FW fetching completion
as the moment at which we're committing to using GuC.
To better differentiate this case, the uses_guc check, which indicates
that GuC is supported and was selected in modparam, is renamed to
wants_guc and a new uses_guc is introduced to represent the case were
we're committed to using the GuC. Note that uses_guc does still not imply
that the blob is actually loaded on the HW (is_running is the check for
that). Also, since we need to have attempted the fetch for the result
of uses_guc to be meaningful, we need to make sure we've moved away
from INTEL_UC_FIRMWARE_SELECTED.
All the GuC changes have been mirrored on the HuC for coherency.
v2: split fetch return changes and new macros to their own patches,
support HuC only if GuC is wanted, improve "used" state
description (Michal)
v3: s/wants_huc/uses_huc in uc_init_wopcm
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Fernando Pacheco <fernando.pacheco@intel.com> #v1
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200218223327.11058-6-daniele.ceraolospurio@intel.com
On Braswell and Broxton (also known as Valleyview and Apollolake), we
need to serialise updates of the GGTT using the big stop_machine()
hammer. This has the side effect of appearing to lockdep as a possible
reclaim (since it uses the cpuhp mutex and that is tainted by per-cpu
allocations). However, we want to use vm->mutex (including ggtt->mutex)
from within the shrinker and so must avoid such possible taints. For this
purpose, we introduced the asynchronous vma binding and we can apply it
to the PIN_GLOBAL so long as take care to add the necessary waits for
the worker afterwards.
Closes: https://gitlab.freedesktop.org/drm/intel/issues/211
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20200130181710.2030251-3-chris@chris-wilson.co.uk
We should never BUG_ON on any corruption in CTB descriptor as
data there can be also modified by the GuC. Instead we can
use flag "is_in_error" to indicate that we will not process
any further messages over this CTB (until reset). While here
move descriptor error reporting to the function that actually
touches that descriptor.
Note that unexpected content of the specific CT messages, that
still complies with generic CT message format, shall not trigger
disabling whole CTB, as that might just indicate new unsupported
message types.
v2: drop redundant message (Daniele)
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20200117082039.65644-2-michal.wajdeczko@intel.com
The GuC supports having multiple CT buffer pairs and we designed our
implementation with that in mind. However, the different channels are not
processed in parallel within the GuC, so there is very little advantage
in having multiple channels (independent locks?), compared to the
drawbacks (one channel can starve the other if messages keep being
submitted to it). Given this, it is unlikely we'll ever add a second
channel and therefore we can simplify our code by removing the
flexibility.
v2: split substructure grouping to separate patch, improve docs (Michal)
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191217012316.13271-3-daniele.ceraolospurio@intel.com
The only difference from the GuC POV between guc_communication_stop and
guc_communication_disable is that the former can be called after GuC
has been reset. Instead of having two separate paths, we can just skip
the call into GuC in the disabling path and re-use that.
Note that by using the disable() path instead of the stop() one there
are two additional changes in SW side for the stop path:
- interrupts are now disabled before disabling the CT, which is ok
because we do not want interrupts with CT disabled;
- guc_get_mmio_msg() is called in the stop case as well, which is ok
because if there are errors before the reset we do want to record
them.
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191217012316.13271-1-daniele.ceraolospurio@intel.com