The LRC W/A currently checks for LRC being iomem in some places, while
in others it checks if the scratch buffer is non-NULL. This
inconsistency causes issues with the VF post-migration recovery code,
which blindly passes in a scratch buffer.
This patch standardizes the check by consistently verifying whether the
LRC is iomem to determine if the scratch buffer should be used.
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Tomasz Lis <tomasz.lis@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20251008214532.3442967-8-matthew.brost@intel.com
Both vm->xef and XE_LRC_CREATE_USER_CTX indicate in xe_lrc_init that
the context originates from userspace. However, XE_LRC_CREATE_USER_CTX
has a broader scope as it may be set even when no vm->xef is present.
The XE_BO_FLAG_PINNED_LATE_RESTORE flag can be extended to both cases,
so there is no point in handling the two cases separately.
Let's combine vm->xef and XE_LRC_CREATE_USER_CTX checks to detect
userspace context.
Signed-off-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Suggested-by: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://lore.kernel.org/r/20251003162619.1984236-6-piotr.piorkowski@intel.com
Currently it's only allowed for render and compute. Going forward we
want to enable it for more engine classes. Let the XE_LRC_FLAG_INDIRECT_CTX
flag (and thus gt_engine_needs_indirect_ctx()) be the deciding factor
for its availability.
While at it, add the missing const to rcs_funcs array. Since
CTX_INDIRECT_CTX_OFFSET_DEFAULT already matches the HW default and
gt_engine_needs_indirect_ctx() only ever enables it for rcs/ccs, there
is no change in behavior, it's only preparation for future use case.
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/20250916-wa-bb-cmds-v5-5-306bddbc15da@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Introduce an xe_bo_create_pin_map_novm() function that does not
take the drm_exec paramenter to simplify the conversion of many
callsites.
For the rest, ensure that the same drm_exec context that was used
for locking the vm is passed down to validation.
Use xe_validation_guard() where appropriate.
v2:
- Avoid gotos from within xe_validation_guard(). (Matt Brost)
- Break out the change to pf_provision_vf_lmem8 to a separate
patch.
- Adapt to signature change of xe_validation_guard().
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250908101246.65025-12-thomas.hellstrom@linux.intel.com
The WA buffer we use to capture context utilization contains GGTT
references. This means its instructions have to be either fixed or
re-emitted during VF post-migration recovery.
This patch adds re-emitting content of the utilization WA BB during
the recovery.
The way we write to vram requires scratch buffer to be used before
the whole block is memcopied. We are re-using a scratch buffer
introduced in earlier part of the recovery. This is not a performance
optimization, but a necessity to avoid creating dependencies between
locks.
v2: Notable rebase after "Prepare WA BB setup for more users" patch
v3: Added error propagation
Signed-off-by: Tomasz Lis <tomasz.lis@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Michal Winiarski <michal.winiarski@intel.com>
Reviewed-by: Michal Winiarski <michal.winiarski@intel.com>
Link: https://lore.kernel.org/r/20250802031045.1127138-8-tomasz.lis@intel.com
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
All contexts require an update of state data, as the data includes
GGTT references to memirq-related buffers.
Default contexts need these references updated as well, because they
are not refreshed when a new context is created from them.
The way we write to vram requires scratch buffer to be used
before the whole block is memcopied. Since using kalloc() within
specific recovery functions would lead to unintended relations
between locks, we are allocating the buffer earlier, before
any locks are taken. The same buffer will be used for other steps
of the recovery.
v2: Update addresses by xe_lrc_write_ctx_reg() rather than
set_memory_based_intr()
v3: Renamed parameter, reordered parameters in some functs
v4: Check if have MEMIRQ, move `xe_gt*` funct to proper file
v5: Revert back to requiring scratch buffer, but allocate it
earlier this time
Signed-off-by: Tomasz Lis <tomasz.lis@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Michal Winiarski <michal.winiarski@intel.com>
Acked-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com>
Reviewed-by: Michal Winiarski <michal.winiarski@intel.com>
Link: https://lore.kernel.org/r/20250802031045.1127138-6-tomasz.lis@intel.com
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Add XeLP workaround 16010904313.
The description calls for it to be emitted as the indirect context buffer
workaround for render and compute, and from the workaround batch buffer
for the other engines. Therefore we plug into the previously added
respective top level emission functions.
The actual command streamer programming sequence differs from what is
described in the PRM, in that it assumes the listed LRCA offset was
supposed to actually refer to the location of the CTX_TIMESTAMP register
instead of LRCA + 0x180c (which is in GPR space). Latter appears to make
more sense under the assumption that multiple writes are helping with
restoring the CTX_TIMESTAMP register content from the saved context state.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Link: https://lore.kernel.org/r/20250711160153.49833-8-tvrtko.ursulin@igalia.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
There isn't anything to set for CTX_TIMESTAMP handling in the empty
LRC: that is set on every LRC init since it should always start from 0
rather than the value saved in the image after first submission.
The FIXME about perma-pinning also doesn't make much sense as we will
always going to pin the lrc and the GGTT mapping has nothing to do with
VM bind.
Nuke these leftover comments.
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250710-lrc-refactors-v2-5-a5e2ca03f6bd@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
bo->size is redundant because the base GEM object already has a size
field with the same value. Drop bo->size and use the base GEM object’s
size instead. While at it, introduce xe_bo_size() to abstract the BO
size.
v2:
- Fix typo in kernel doc (Ashutosh)
- Fix kunit (CI)
- Fix line wrap (Checkpatch)
v3:
- Fix sriov build (CI)
v4:
- Fix display build (CI)
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Link: https://lore.kernel.org/r/20250625144128.2827577-1-matthew.brost@intel.com
Context Timestamp (CTX_TIMESTAMP) in the LRC accumulates the run ticks
of the context, but only gets updated when the context switches out. In
order to check how long a context has been active before it switches
out, two things are required:
(1) Determine if the context is running:
To do so, we program the WA BB to set an initial value for CTX_TIMESTAMP
in the LRC. The value chosen is 1 since 0 is the initial value when the
LRC is initialized. During a query, we just check for this value to
determine if the context is active. If the context switched out, it
would overwrite this location with the actual CTX_TIMESTAMP MMIO value.
Note that WA BB runs as the last part of the context restore, so reusing
this LRC location will not clobber anything.
(2) Calculate the time that the context has been active for:
The CTX_TIMESTAMP ticks only when the context is active. If a context is
active, we just use the CTX_TIMESTAMP MMIO as the new value of
utilization. While doing so, we need to read the CTX_TIMESTAMP MMIO
for the specific engine instance. Since we do not know which instance
the context is running on until it is scheduled, we also read the
ENGINE_ID MMIO in the WA BB and store it in the PPHSWP.
Using the above 2 instructions in a WA BB, capture active context
utilization.
v2: (Matt Brost)
- This breaks TDR, fix it by saving the CTX_TIMESTAMP register
"drm/xe: Save CTX_TIMESTAMP mmio value instead of LRC value"
- Drop tile from LRC if using gt
"drm/xe: Save the gt pointer in LRC and drop the tile"
v3:
- Remove helpers for bb_per_ctx_ptr (Matt)
- Add define for context active value (Matt)
- Use 64 bit CTX TIMESTAMP for platforms that support it. For platforms
that don't, live with the rare race. (Matt, Lucas)
- Convert engine id to hwe and get the MMIO value (Lucas)
- Correct commit message on when WA BB runs (Lucas)
v4:
- s/GRAPHICS_VER(...)/xe->info.has_64bit_timestamp/ (Matt)
- Drop support for active utilization on a VF (CI failure)
- In xe_lrc_init ensure the lrc value is 0 to begin with (CI regression)
v5:
- Minor checkpatch fix
- Squash into previous commit and make TDR use 32-bit time
- Update code comment to match commit msg
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/4532
Suggested-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250509161159.2173069-8-umesh.nerlige.ramappa@intel.com
With the idea of having more pinned objects using the blitter engine
where possible, during suspend/resume, mark the pinned objects which
can be done during the late phase once submission/migration has been
setup. Start out simple with lrc and page-tables from userspace.
v2:
- s/early_restore/late_restore; early restore was way too bold with too
many places being impacted at once.
v3:
- Split late vs early into separate lists, to align with newly added
apply-to-pinned infra.
v4:
- Rebase.
v5:
- Make sure we restore the late phase kernel_bo_present in igpu.
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Satyanarayana K V P <satyanarayana.k.v.p@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com>
Link: https://lore.kernel.org/r/20250403102440.266113-13-matthew.auld@intel.com
The context of each engine starts with a 4k memory space for the
"Per-process HW status page" (PPHWSP). In xe_gt_lrc_size(), we have been
implicitly accounting for that page in the switch statement on the
engine class.
Since the PPHWSP is common to all engines, let's extract that into it's
own assignment. That makes the context structure more explicit in the
code and aligns better with the descriptions in Bspec.
Another advantage of keeping it separate is that now the sizes used in
the switch statement match the sizes we calculate engine-specific
context images, which have their own Bspec pages.
Bspec: 67296, 60159, 45554
Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://lore.kernel.org/r/20250328-explicit-pphwsp-size-in-xe_gt_lrc_size-v1-1-ceb9ce7c8bc1@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Userspace is required to mark a queue as using PXP to guarantee that the
PXP instructions will work. In addition to managing the PXP sessions,
when a PXP queue is created the driver will set the relevant bits in
its context control register.
On submission of a valid PXP queue, the driver will validate all
encrypted objects mapped to the VM to ensured they were encrypted with
the current key.
v2: Remove pxp_types include outside of PXP code (Jani), better comments
and code cleanup (John)
v3: split the internal PXP management to a separate patch for ease of
review. re-order ioctl checks to always return -EINVAL if parameters are
invalid, rebase on msix changes.
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250129174140.948829-9-daniele.ceraolospurio@intel.com
When we decide to kill a job, (from guc_exec_queue_timedout_job), we could
end up with 4 possible scenarios at this starting point of this decision:
1. the guc-captured register-dump is already there.
2. the driver is wedged.mode > 1, so GuC-engine-reset / GuC-err-capture
will not happen.
3. the user has started the driver in execlist-submission mode.
4. the guc-captured register-dump is not ready yet so we force GuC to kill
that context now, but:
A. we don't know yet if GuC will be successful on the engine-reset
and get the guc-err-capture, else kmd will do a manual reset later
OR B. guc will be successful and we will get a guc-err-capture
shortly.
So to accomdate the scenarios of 2 and 4A, we will need to do a manual KMD
capture first(which is not be reliable in guc-submission mode) and decide
later if we need to use that for the cases of 2 or 4A. So this flow is
part of the implementation for this patch.
Provide xe_guc_capture_get_reg_desc_list to get the register dscriptor
list.
Add manual capture by read from hw engine if GuC capture is not ready.
If it becomes ready at later time, GuC sourced data will be used.
Although there may only be a small delay between (1) the check for whether
guc-err-capture is available at the start of guc_exec_queue_timedout_job
and (2) the decision on using a valid guc-err-capture or manual-capture,
lets not take any chances and lock the matching node down so it doesn't
get re-claimed if GuC-Err-Capture subsystem is running out of pre-cached
nodes.
Signed-off-by: Zhanjun Dong <zhanjun.dong@intel.com>
Reviewed-by: Alan Previn <alan.previn.teres.alexis@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241004193428.3311145-6-zhanjun.dong@intel.com
When using MSI-X, hw engines report interrupt status and source to engine
instance 0. For this scenario, in order to differentiate between the
engines, we need to pass different status/source pointers in the LRC.
The requirements on those pointers are:
- Interrupt status should be 4KiB aligned
- Interrupt source should be 64 bytes aligned
To accommodate this, we duplicate the current memirq page layout -
allocating a page for each engine instance and pass this page in the LRC.
Note that the same page can be reused for different engine types.
For example, an LRC executing on CCS #x will have pointers to page #x,
and an LRC executing on BCS #x will have the same pointers. Thus, to
locate the proper page, the pointer accessors were modified to receive
the hw engine.
Signed-off-by: Ilia Levi <ilia.levi@intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240918053942.1331811-5-illevi@habana.ai
Early in the development of Xe we identified an issue with SVG state
handling on DG2 and MTL (and later on Xe2 as well). In
commit 72ac304769 ("drm/xe: Emit SVG state on RCS during driver load
on DG2 and MTL") and commit fb24b858a2 ("drm/xe/xe2: Update SVG state
handling") we implemented our own workaround to prevent SVG state from
leaking from context A to context B in cases where context B never
issues a specific state setting.
The hardware teams have now created official workaround Wa_14019789679
to cover this issue. The workaround description only requires emitting
3DSTATE_MESH_CONTROL, since they believe that's the only SVG instruction
that would potentially remain unset by a context B, but still cause
notable issues if unwanted values were inherited from context A.
However since we already have a more extensive implementation that emits
the entire SVG state and prevents _any_ SVG state from unintentionally
leaking, we'll stick with our existing implementation just to be safe.
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240812181042.2013508-2-matthew.d.roper@intel.com