Add helpers to safely add and delete the exec queues attached to a hw
engine group, and make use them at the time of creation and destruction of
the exec queues. Keeping track of them is required to control the
execution mode of the hw engine group.
v2: Improve error handling and robustness, suspend exec queues created in
fault mode if group in dma-fence mode, init queue link (Matt Brost)
v3: Delete queue from hw engine group when it is destroyed by the user,
also clean up at the time of closing the file in case the user did
not destroy the queue
v4: Use correct list when checking if empty, do not add the queue if VM
is in xe_vm_in_preempt_fence_mode (Matt Brost)
v5: Remove unrelated newline, add checks and asserts for group, unwind on
suspend failure (Matt Brost)
Signed-off-by: Francois Dugast <francois.dugast@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240809155156.1955925-4-francois.dugast@intel.com
A xe_hw_engine_group is a group of hw engines. Two hw engines belong to
the same xe_hw_engine_group if one hw engine cannot make progress while
the other is stuck on a page fault.
Typically, hw engines of the same group share some resources such as EUs,
but this really depends on the hardware configuration of the platforms.
The simple engines partitioning proposed here might be too conservative
but is intended to work for existing platforms. It can be optimized later
if more sets of independent engines are identified.
The hw engine groups are intended to be used in the context of faulting
long-running jobs submissions.
v2: Move to own files, improve error handling (Matt Brost)
v3: Fix build issue reported by CI, improve commit message (Matt Roper)
v4: Fix kernel doc
v5: Add switch case for XE_ENGINE_CLASS_OTHER
Signed-off-by: Francois Dugast <francois.dugast@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240809155156.1955925-2-francois.dugast@intel.com
When steering MCR register ranges of type "DSS," the group_id and
instance_id values are calculated by dividing the DSS pool according to
the size of a gslice or cslice, depending on the platform. These values
haven't changed much on past platforms, so we've been able to hardcode
the proper divisor so far. However the layout may not be so fixed on
future platforms so the proper, future-proof way to determine this is by
using some of the attributes from the GuC's hwconfig table. The
hwconfig has two attributes reflecting the architectural maximum slice
and subslice counts (i.e., before any fusing is considered) that can be
used for the purposes of calculating MCR steering targets.
If the hwconfig is lacking the necessary values (which should only be
possible on older platforms before these attributes were added), we can
still fall back to the old hardcoded values. Going forward the hwconfig
is expected to always provide the information we need on newer
platforms, and any failure to do so will be considered a bug in the
firmware that will prevent us from switching to the buggy firmware
release.
It's worth noting that over time GuC's hwconfig has provided a couple
different keys with similar-sounding descriptions. For our purposes
here, we only trust the newer key "70" which has supplanted the
similarly-named key "2" that existed on older platforms.
Bspec: 73210
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240815172602.2729146-4-matthew.d.roper@intel.com
Get drm-xe-next on v6.11-rc2 and synchronized with drm-intel-next for
the display side. This resolves the current conflict for the
enable_display module parameter and allows further pending refactors.
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Having two methods to wait on GT TLB invalidations is not ideal. Remove
xe_gt_tlb_invalidation_wait and only use GT TLB invalidation fences.
In addition to two methods being less than ideal, once GT TLB
invalidations are coalesced the seqno cannot be assigned during
xe_gt_tlb_invalidation_ggtt/range. Thus xe_gt_tlb_invalidation_wait
would not have a seqno to wait one. A fence however can be armed and
later signaled.
v3:
- Add explaination about coalescing to commit message
v4:
- Don't put dma fence if defined on stack (CI)
v5:
- Initialize ret to zero (CI)
v6:
- Use invalidation_fence_signal helper in tlb timeout (Matthew Auld)
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240719172905.1527927-3-matthew.brost@intel.com
(cherry picked from commit 61ac035361)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
When validating VF config on the media GT, we may wrongly report
that VF is already partially configured on it, as we consider GGTT
and LMEM provisioning done on the primary GT (since both GGTT and
LMEM are tile-level resources, not a GT-level).
This will cause skipping a VF auto-provisioning on the media-GT and
in result will block a VF from successfully initialize that GT.
Fix that by considering GGTT and LMEM configurations only when
checking if a VF provisioning is complete, and omit GGTT and LMEM
when reporting empty/partial provisioning.
Fixes: 234670cea9 ("drm/xe/pf: Skip fair VFs provisioning if already provisioned")
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Piotr Piórkowski <piotr.piorkowski@intel.com>
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240806180516.618-1-michal.wajdeczko@intel.com
(cherry picked from commit 5bdacb0907)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
xe_gt_enable_host_l2_vram() is reading the XE2_GAMREQSTRM_CTRL register
that is currently missing the MCR annotation. However, just adding the
annotation doesn't work as this function is called before MCR handling
is initialized in xe_gt_mcr_init().
xe_gt_enable_host_l2_vram() is used to implement WA 16023588340 that
needs to be done as early as possible during initialization in order to
be effective since the MMIO writes impact it. In the failure scenario,
driver would simply not be able to bind successfully.
Moving xe_gt_enable_host_l2_vram() later, after MCR initialization is
done, only incurs a few additional HW accesses, particularly when
loading GuC for hwconfig. Binding/unbinding the driver 100 times in BMG
still works so it should be ok to start handling the WA a little bit
later. This is sufficient to allow adding the MCR annotation to
XE2_GAMREQSTRM_CTRL.
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240814095614.909774-2-tejas.upadhyay@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
The different approach used by xe regarding the initialization of
display HW has been proved a great addition for early driver bring up:
core xe can be tested without having all the bits sorted out on the
display side.
On the other hand, the approach exposed by i915-display is to *actively*
disable the display by programming it if needed, i.e. if it was left
enabled by firmware. It also has its use to make sure the HW is actually
disabled and not wasting power.
However having both the way it is in xe doesn't expose a good interface
wrt module params. From modinfo:
disable_display:Disable display (default: false) (bool)
enable_display:Enable display (bool)
Rename enable_display to probe_display to try to convey the message that
the HW is being touched and improve the module param description. To
avoid confusion, the enable_display is renamed everywhere, not only in
the module param. New description for the parameters:
disable_display:Disable display (default: false) (bool)
probe_display:Probe display HW, otherwise it's left untouched (default: true) (bool)
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Acked-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240813141931.3141395-1-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Early in the development of Xe we identified an issue with SVG state
handling on DG2 and MTL (and later on Xe2 as well). In
commit 72ac304769 ("drm/xe: Emit SVG state on RCS during driver load
on DG2 and MTL") and commit fb24b858a2 ("drm/xe/xe2: Update SVG state
handling") we implemented our own workaround to prevent SVG state from
leaking from context A to context B in cases where context B never
issues a specific state setting.
The hardware teams have now created official workaround Wa_14019789679
to cover this issue. The workaround description only requires emitting
3DSTATE_MESH_CONTROL, since they believe that's the only SVG instruction
that would potentially remain unset by a context B, but still cause
notable issues if unwanted values were inherited from context A.
However since we already have a more extensive implementation that emits
the entire SVG state and prevents _any_ SVG state from unintentionally
leaking, we'll stick with our existing implementation just to be safe.
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240812181042.2013508-2-matthew.d.roper@intel.com
When validating VF config on the media GT, we may wrongly report
that VF is already partially configured on it, as we consider GGTT
and LMEM provisioning done on the primary GT (since both GGTT and
LMEM are tile-level resources, not a GT-level).
This will cause skipping a VF auto-provisioning on the media-GT and
in result will block a VF from successfully initialize that GT.
Fix that by considering GGTT and LMEM configurations only when
checking if a VF provisioning is complete, and omit GGTT and LMEM
when reporting empty/partial provisioning.
Fixes: 234670cea9 ("drm/xe/pf: Skip fair VFs provisioning if already provisioned")
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Piotr Piórkowski <piotr.piorkowski@intel.com>
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240806180516.618-1-michal.wajdeczko@intel.com