A xe_hw_engine_group is a group of hw engines. Two hw engines belong to
the same xe_hw_engine_group if one hw engine cannot make progress while
the other is stuck on a page fault.
Typically, hw engines of the same group share some resources such as EUs,
but this really depends on the hardware configuration of the platforms.
The simple engines partitioning proposed here might be too conservative
but is intended to work for existing platforms. It can be optimized later
if more sets of independent engines are identified.
The hw engine groups are intended to be used in the context of faulting
long-running jobs submissions.
v2: Move to own files, improve error handling (Matt Brost)
v3: Fix build issue reported by CI, improve commit message (Matt Roper)
v4: Fix kernel doc
v5: Add switch case for XE_ENGINE_CLASS_OTHER
Signed-off-by: Francois Dugast <francois.dugast@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240809155156.1955925-2-francois.dugast@intel.com
Enable feature to allow memory reads to take a priority memory path.
This will reduce latency on the read path, but may introduce read after
write (RAW) hazards as read and writes will no longer be ordered.
To avoid RAW hazards, SW can use the MI_MEM_FENCE command or any other
MI command that generates non posted memory writes. This will ensure
data is coherent in memory prior to execution of commands which read
data from memory. RCS,BCS and CCS support this feature.
No pattern identified in KMD that could lead to a hazard.
v2: Modify commit message, enable priority mem read feature for media,
modify version range, modify bspec detail (Matt Roper)
v3: Rebase, fix cramped line-wrapping (jcavitt)
v4: Rebase
v5: Media does not support Priority Mem Read. Modify commit
to reflect the same.
v6: Rebase
Bspec: 60298, 60237, 60187, 60188
Signed-off-by: Pallavi Mishra <pallavi.mishra@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Acked-by: José Roberto de Souza <jose.souza@intel.com>
Acked-by: Carl Zhang <carl.zhang@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240731195622.1868401-1-pallavi.mishra@intel.com
Per client engine utilization uses RING_TIMESTAMP to return
drm-total-cycles to the user. Current code uses XE_FW_GT to read this
register on the first available engine in a GT. When testing on DG2, it
is observed that this value is 0 when running test on some engines. To
resolve that, get the hwe domain specific FW for reading the engine
timestamp.
v2:
- update commit message
- use domain specific FW (Matt)
v3:
- Drop check for hwe in the helper (Matt, Michal)
v4:
- checkpatch fixes
v5: Rebase
Fixes: 188ced1e0f ("drm/xe/client: Print runtime to fdinfo")
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240627235105.2631135-1-umesh.nerlige.ramappa@intel.com
The flags stored in the BO grew over time without following
much a naming pattern. First of all, get rid of the _BIT suffix that was
banned from everywhere else due to the guideline in
drivers/gpu/drm/i915/i915_reg.h that xe kind of follows:
Define bits using ``REG_BIT(N)``. Do **not** add ``_BIT`` suffix to the name.
Here the flags aren't for a register, but it's good practice to keep it
consistent.
Second divergence on names is the use or not of "CREATE". This is
because most of the flags are passed to xe_bo_create*() family of
functions, changing its behavior. However, since the flags are also
stored in the bo itself and checked elsewhere in the code, it seems
better to just omit the CREATE part.
With those 2 guidelines, all the flags are given the form
XE_BO_FLAG_<FLAG_NAME> with the following commands:
git grep -le "XE_BO_" -- drivers/gpu/drm/xe | xargs sed -i \
-e "s/XE_BO_\([_A-Z0-9]*\)_BIT/XE_BO_\1/g" \
-e 's/XE_BO_CREATE_/XE_BO_FLAG_/g'
git grep -le "XE_BO_" -- drivers/gpu/drm/xe | xargs sed -i -r \
-e 's/XE_BO_(DEFER_BACKING|SCANOUT|FIXED_PLACEMENT|PAGETABLE|NEEDS_CPU_ACCESS|NEEDS_UC|INTERNAL_TEST|INTERNAL_64K|GGTT_INVALIDATE)/XE_BO_FLAG_\1/g'
And then the defines in drivers/gpu/drm/xe/xe_bo.h are adjusted to
follow the coding style.
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240322142702.186529-3-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Starting on Xe2, the GSCCS engine reset is a 2-step process. When the
driver or the GuC hits the GDRST register, the CS is immediately reset
and a success is reported, but the GSC shim continues its reset in the
background. While the shim reset is ongoing, the CS is able to accept
new context submission, but any commands that require the shim will
be stalled until the reset is completed. This means that we can keep
submitting to the GSCCS as long as we make sure that the preemption
timeout is big enough to cover any delay introduced by the reset; since
the GSC preempt timeout is not tunable at runtime, we only need to check
that the value set in kconfig is big enough (and increase it if it
isn't).
When the shim reset completes, a specific CS interrupt is triggered,
in response to which we need to check the GSCI_TIMER_STATUS register
to see if the reset was successful or not.
Note that the GSCI_TIMER_STATUS register is not power save/restored,
so it gets reset on MC6 entry. However, a reset failure stops MC6,
so in that scenario we're always guaranteed to find the correct value.
Since we can't check the register within interrupt context, the
existing GSC worker has been updated to handle it.
The expected action to take on ER failure is to trigger a driver FLR,
but we still don't support that, so for now we just print an error. A
comment has been added to the code to keep track of the FLR requirement.
v2: Add a check for the initial timeout value (Alan)
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Alan Previn <alan.previn.teres.alexis@intel.com>
Reviewed-by: Alan Previn <alan.previn.teres.alexis@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240304145634.820684-1-daniele.ceraolospurio@intel.com
Recommendation is to read FUSE4 register to check if WMTP has been
enabled/disabled by HW. If enabled we don't need to do anything special,
however if disabled recommendation is to also disable the WMTP mode in
the FF_SLICE_CS_CHICKEN2 register, falling back to thread-group and
mid-batch preemption only. However on Linux, the per-context CS_CHICKEN1
is how userspace controls pre-emption, so instead use the default lrc to
disable WMTP using CS_CHICKEN1, if disabled by HW. Userspace is still
free to set CS_CHICKEN1 to whatever they want later.
v2: remove redundant version check and also add descriptive name(Matt)
v3: remove usage of REG_FIELD_GET(Matt)
Cc: Matt Roper <matthew.d.roper@intel.com>
Co-developed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://lore.kernel.org/r/20240104182615.21327-1-nirmoy.das@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
When interrupts are delivered using memory based mechanism, engines
will write status to the report page at the offset (in bytes) that
corresponds to their interrupt bit from the GT_INTR_DW register.
Add engine interrupt offset definitions to engine info as we will
need this to process memory based interrupts.
Bspec: 46149, 50829, 50844
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://lore.kernel.org/r/20231214185955.1791-6-michal.wajdeczko@intel.com
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Disable dynamic HW load balancing of compute resource assignment
to engines and instead enabled fixed mode of mapping compute
resources to engines on all platforms with more than one compute
engine.
By default enable only one CCS engine with all compute slices
assigned to it. This is the desired configuration for common
workloads.
PVC platform supports only the fixed CCS mode (workaround 16016805146).
v2: Rebase, make it platform agnostic
v3: Minor code refactoring
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
A helper for managed BO allocations makes it possible to remove specific
"fini" actions and will simplify the following patches adding ability to
execute a release action for specific BO directly.
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Add the GSCCS to the media_xelpmp engine list. Note that since the
GSCCS is only used with the GSC FW, we can consider it disabled if we
don't have the FW available.
v2: mark GSCCS as allowed on the media IP in kunit tests
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Alan Previn <alan.previn.teres.alexis@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
It was reading (base) + 0x8c but that is not a valid register
and instead it should read (base) + 0x68.
So here reading the correct register and removing the wrong and
duplicated.
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
The guc_submission_enabled() function is being used as a boolean toggle
for all firmwares and all related features, not just GuC submission. We
could add additional flags/functions to distinguish and allow different
use-cases (e.g. loading HuC but not using GuC submission), but given
that not using GuC is a debug-only scenario having a global switch for
all FWs is enough. However, we want to make it clear that this switch
turns off everything, so rename it to uc_enabled().
v2: rebase on s/XE_WARN_ON/xe_assert
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
The XE_WARN_ON macro maps to WARN_ON which is not justified
in many cases where only a simple debug check is needed.
Replace the use of the XE_WARN_ON macro with the new xe_assert
macros which relies on drm_*. This takes a struct drm_device
argument, which is one of the main changes in this commit. The
other main change is that the condition is reversed, as with
XE_WARN_ON a message is displayed if the condition is true,
whereas with xe_assert it is if the condition is false.
v2:
- Rebase
- Keep WARN splats in xe_wopcm.c (Matt Roper)
v3:
- Rebase
Signed-off-by: Francois Dugast <francois.dugast@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>