Add a test for sending messages from every GuC to every other GuC to
test G2G communications.
Note that, being a debug only feature, the test interface only exists
in pre-production builds of the GuC firmware.
v2: Fix 'default' case to actually use the driver's registration code
as well as allocation. Add comments explaining the different test
types. Fix (C) date and an assert. Review feedback from Daniele.
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Link: https://lore.kernel.org/r/20250910210237.603576-5-John.C.Harrison@Intel.com
All recent platforms (including all the ones officially supported by the
Xe driver) do not allow concurrent execution of RCS and CCS workloads
from different address spaces, with the HW blocking the context switch
when it detects such a scenario.
The DUAL_QUEUE flag helps with this, by causing the GuC to not submit a
context it knows will not be able to execute. This, however, causes a new
problem: if RCS and CCS queues have pending workloads from different
address spaces, the GuC needs to choose from which of the 2 queues to
pick the next workload to execute. By default, the GuC prioritizes RCS
submissions over CCS ones, which can lead to CCS workloads being
significantly (or completely) starved of execution time.
The driver can tune this by setting a dedicated scheduling policy KLV;
this KLV allows the driver to specify a quantum (in ms) and a ratio
(percentage value between 0 and 100), and the GuC will prioritize the CCS
for that percentage of each quantum.
Given that we want to guarantee enough RCS throughput to avoid missing
frames, we set the yield policy to 20% of each 80ms interval.
v2: updated quantum and ratio, improved comment, use xe_guc_submit_disable
in gt_sanitize
Fixes: d9a1ae0d17 ("drm/xe/guc: Enable WA_DUAL_QUEUE for newer platforms")
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Cc: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Tested-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Link: https://lore.kernel.org/r/20250905235632.3333247-2-daniele.ceraolospurio@intel.com
GuC has an interface to set a power profile for the SLPC algorithm.
Base mode is default and ensures a balanced performance, power_saving
mode has conservative up/down thresholds and is suitable for use with
apps that typically need to be power efficient. This will result in
lower GT frequencies, thus consuming lower power.
Selected power profile will be displayed in this format:
$ cat power_profile
[base] power_saving
$ echo power_saving > power_profile
$ cat power_profile
base [power_saving]
v2: Address review comments (Rodrigo)
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/20250903232120.390190-1-vinay.belgaumkar@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
The GuC load process will abort if certain status codes (which are
indicative of a fatal error) are reported. Otherwise, it keeps waiting
until the 'success' code is returned. New error codes have been added
in recent GuC releases, so add support for aborting on those as well.
v2: Shuffle HWCONFIG_START to the front of the switch to keep the
ordering as per the enum define for clarity (review feedback by
Jonathan). Also add a description for the basic 'invalid init data'
code which was missing.
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Stuart Summers <stuart.summers@intel.com>
Link: https://lore.kernel.org/r/20250726024337.4056272-1-John.C.Harrison@Intel.com
The Dynamic Inhibit Context Switch is an optimization aimed at reducing
the amount of time the HW is stuck waiting on an unsatisfied semaphore.
When this optimization is enabled, the GuC will dynamically modify the
CTX_CTRL_INHIBIT_SYN_CTX_SWITCH in the CTX_CONTEXT_CONTROL register of
LRCs to enable immediate switching out on an unsatisfied semaphore wait
when multiple contexts are competing for time on the same engine.
This feature is available on recent HW from GuC 70.40.1 onwards and it
is enabled via a per-VF feature opt-in.
v2: rebase
v3: switch to using guc_buf_cache instead of dedicated alloc
v4: add helper to check for feature availability (Michal), don't enable
if multi-lrc is possible.
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Cc: Julia Filipchuk <julia.filipchuk@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://lore.kernel.org/r/20250625205405.1653212-4-daniele.ceraolospurio@intel.com
On newer HW (Xe2 onwards + PVC) it is possible to get extra information
when a CAT error occurs, specifically a dword reporting the error type.
To enable this extra reporting, we need to opt-in with the GuC, which is
done via a specific per-VF feature opt-in H2G.
On platforms where the HW does not support the extra reporting, the GuC
will set the type to 0xdeadbeef, so we can keep the code simple and
opt-in to the feature on every platform and then just discard the data
if it is invalid.
Note that on native/PF we're guaranteed that the opt in is available
because we don't support any GuC old enough to not have it, but if we're
a VF we might be running on a non-XE PF with an older GuC, so we need to
handle that case. We can re-use the invalid type above to handle this
scenario the same way as if the feature was not supported in HW.
Given that this patch is the first user of the guc_buf_cache on native
and VF, it also extends that feature to non-PF use-cases.
v2: simpler print for the error type (John), rebase
v3: use guc_buf_cache instead of new alloc, simpler doc (Michal)
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Nirmoy Das <nirmoy.das@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com> #v1
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://lore.kernel.org/r/20250625205405.1653212-3-daniele.ceraolospurio@intel.com
This reverts commit 3972872e45.
There are several things wrong with the way this WA was implemented:
- The KLV is only supported on GuC 70.47.0 or newer, so we shouldn't
apply it unconditionally.
- The KLV requires 2 DWs of data, which are not currently provided.
The GuC currently ignores any unknown KLVs, so on versions older that
70.47.0 nothing happens. However, starting on 70.47.0 the GuC attempts
to parse the KLV and fails due to the missing data, causing a GuC load
abort.
Given that 70.47.0 is the first GuC version approved for public release
for PTL, let's revert this patch so it doesn't cause the GuC load to
fail with that blob. We can then re-apply it properly fixed after the
GuC definition is merged, which will also have the added benefit of
running the KLV addition through CI with the right GuC version.
Fixes: 3972872e45 ("drm/xe/ptl: Apply Wa_16026007364")
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: sanirban <sk.anirban@intel.com>
Cc: Badal Nilawar <badal.nilawar@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/20250625001202.1616606-2-daniele.ceraolospurio@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
These error codes are not actually used in the driver but it is
extremely useful to have them available to understand error messages.
v2: Add a bunch more error codes and drop 'status' from names (review
feedback by Michal W).
v3: Drop 'SUCCESS' response as meaningless in current API (review
feedback by Michal W).
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://lore.kernel.org/r/20250512215324.1457009-3-John.C.Harrison@Intel.com
Some GuC messages are constructed with incrementing dword counter
rather than referencing specific DWORDs, as described in GuC interface
specification.
This change introduces the definitions of DWORD numbers for parameters
which will need to be referenced in a CTB parser to be added in a
following patch. To ensure correctness of these DWORDs, verification
in form of asserts was added to the message construction code.
v2: Renamed enum members, added ones for single context registration,
modified asserts to check values rather than indexes.
v3: Reordered assert args to take less lines
v4: Added lengths
v5: Renamed MULTI_LRC_MSG_LEN to MULTI_LRC_MSG_MIN_LEN
Suggested-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: Tomasz Lis <tomasz.lis@intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://lore.kernel.org/r/20250512114018.361843-4-tomasz.lis@intel.com
Allow user to provide a low latency hint. When set, KMD sends a hint
to GuC which results in special handling for that process. SLPC will
ramp the GT frequency aggressively every time it switches to this
process.
We need to enable the use of SLPC Compute strategy during init, but
it will apply only to processes that set this bit during process
creation.
Improvement with this approach as below:
Before,
:~$ NEOReadDebugKeys=1 EnableDirectSubmission=0 clpeak --kernel-latency
Platform: Intel(R) OpenCL Graphics
Device: Intel(R) Graphics [0xe20b]
Driver version : 24.52.0 (Linux x64)
Compute units : 160
Clock frequency : 2850 MHz
Kernel launch latency : 283.16 us
After,
:~$ NEOReadDebugKeys=1 EnableDirectSubmission=0 clpeak --kernel-latency
Platform: Intel(R) OpenCL Graphics
Device: Intel(R) Graphics [0xe20b]
Driver version : 24.52.0 (Linux x64)
Compute units : 160
Clock frequency : 2850 MHz
Kernel launch latency : 63.38 us
Compute PR: https://github.com/intel/compute-runtime/pull/794
Mesa PR: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33214
IGT PR: https://patchwork.freedesktop.org/patch/639989/
V10(Lucas):
- Remove doc from drm-uapi.rst
v9(Vinay):
- remove extra line, align commit message
v8(Vinay):
- Add separate example for using low latency hint
v7(Jose):
- Update UMD PR
- applicable to all gpus
V6:
- init flags, remove redundant flags check (MAuld)
V5:
- Move uapi doc to documentation and GuC ABI specific change (Rodrigo)
- Modify logic to restrict exec queue flags (MAuld)
V4:
- To make it clear, dont use exec queue word (Vinay)
- Correct typo in description of flag (Jose/Vinay)
- rename set_strategy api and replace ctx with exec queue(Vinay)
- Start with 0th bit to indentify user flags (Jose)
V3:
- Conver user flag to kernel internal flag and use (Oak)
- Support query config for use to check kernel support (Jose)
- Dont need to take runtime pm (Vinay)
V2:
- DRM_XE_EXEC_QUEUE_LOW_LATENCY_HINT 1 planned for other hint(Szymon)
- Add motivation to description (Lucas)
Acked-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250228070224.739295-2-tejas.upadhyay@intel.com
Signed-off-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
GuC provides support to read engine counters to calculate the
engine activity. KMD exposes two counters via the PMU interface to
calculate engine activity
Engine Active Ticks(engine-active-ticks) - active ticks of engine
Engine Total Ticks (engine-total-ticks) - total ticks of engine
Engine activity percentage can be calculated as below
Engine activity % = (engine active ticks/engine total ticks) * 100.
v2: fix cosmetic review comments
add forcewake for gpm_ts (Umesh)
v3: fix CI hooks error
change function parameters and unpin bo on error
of allocate_activity_buffers
fix kernel-doc (Umesh)
use engine activity (Umesh, Lucas)
rename xe_engine_activity to xe_guc_engine_*
fix commit message to use engine activity (Lucas, Umesh)
v4: add forcewake in PMU layer
v5: fix makefile
use drmm_kcalloc instead of kmalloc_array
remove managed bo
skip init for VF
fix cosmetic review comments (Michal)
Signed-off-by: Riana Tauro <riana.tauro@intel.com>
Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250224053903.2253539-2-riana.tauro@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
PXP requires submissions to the HW for the following operations
1) Key invalidation, done via the VCS engine
2) Communication with the GSC FW for session management, done via the
GSCCS.
Key invalidation submissions are serialized (only 1 termination can be
serviced at a given time) and done via GGTT, so we can allocate a simple
BO and a kernel queue for it.
Submissions for session management are tied to a PXP client (identified
by a unique host_session_id); from the GSC POV this is a user-accessible
construct, so all related submission must be done via PPGTT. The driver
does not currently support PPGTT submission from within the kernel, so
to add this support, the following changes have been included:
- a new type of kernel-owned VM (marked as GSC), required to ensure we
don't use fault mode on the engine and to mark the different lock
usage with lockdep.
- a new function to map a BO into a VM from within the kernel.
v2: improve comments and function name, remove unneeded include (John)
v3: fix variable/function names in documentation
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250129174140.948829-3-daniele.ceraolospurio@intel.com
After restore, GuC will not answer to any messages from VF KMD until
fixups are applied. When that is done, VF KMD sends RESFIX_DONE
message to GuC, at which point GuC resumes normal operation.
This patch implements sending the RESFIX_DONE message at end of
post-migration recovery.
v2: keep pm ref during whole recovery, style fixes (Michal)
v3: assert removal to separate patch, debug message per GuC instead
of one, comments changes (Michal)
v4: improve one debug message (Michal)
Signed-off-by: Tomasz Lis <tomasz.lis@intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241104213449.1455694-4-tomasz.lis@intel.com
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Upon the G2H Notify-Err-Capture event, parse through the
GuC Log Buffer (error-capture-subregion) and generate one or
more capture-nodes. A single node represents a single "engine-
instance-capture-dump" and contains at least 3 register lists:
global, engine-class and engine-instance. An internal link
list is maintained to store one or more nodes.
Because the link-list node generation happen before the call
to devcoredump, duplicate global and engine-class register
lists for each engine-instance register dump if we find
dependent-engine resets in a engine-capture-group.
To avoid dynamically allocate the output nodes during gt reset,
pre-allocate a fixed number of empty nodes up front (at the
time of ADS registration) that we can consume from or return to
an internal cached list of nodes.
Signed-off-by: Zhanjun Dong <zhanjun.dong@intel.com>
Reviewed-by: Alan Previn <alan.previn.teres.alexis@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241004193428.3311145-5-zhanjun.dong@intel.com
Add referenced registers defines and list of registers.
Update GuC ADS size allocation to include space for
the lists of error state capture register descriptors.
Then, populate GuC ADS with the lists of registers we want
GuC to report back to host on engine reset events. This list
should include global, engine-class and engine-instance
registers for every engine-class type on the current hardware.
Ensure we allocate a persistent storage for the register lists
that are populated into ADS so that we don't need to allocate
memory during GT resets when GuC is reloaded and ADS population
happens again.
Signed-off-by: Zhanjun Dong <zhanjun.dong@intel.com>
Reviewed-by: Alan Previn <alan.previn.teres.alexis@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241004193428.3311145-2-zhanjun.dong@intel.com
Add a worker function helper for asynchronously dumping state when an
internal/fatal error is detected in CT processing. Being asynchronous
is required to avoid deadlocks and scheduling-while-atomic or
process-stalled-for-too-long issues. Also check for a bunch more error
conditions and improve the handling of some existing checks.
v2: Use compile time CONFIG check for new (but not directly CT_DEAD
related) checks and use unsigned int for a bitmask, rename
CT_DEAD_RESET to CT_DEAD_REARM and add some explaining comments,
rename 'hxg' macro parameter to 'ctb' - review feedback from Michal W.
Drop CT_DEAD_ALIVE as no need for a bitfield define to just set the
entire mask to zero.
v3: Fix kerneldoc
v4: Nullify some floating pointers after free.
v5: Add section headings and device info to make the state dump look
more like a devcoredump to allow parsing by the same tools (eventual
aim is to just call the devcoredump code itself, but that currently
requires an xe_sched_job, which is not available in the CT code).
v6: Fix potential for leaking snapshots with concurrent error
conditions (review feedback from Julia F).
v7: Don't complain about unexpected G2H messages yet because there is
a known issue causing them. Fix bit shift bug with v6 change. Add GT
id to fake coredump headers and use puts instead of printf.
v8: Disable the head mis-match check in g2h_read because it is failing
on various discrete platforms due to unknown reasons.
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Julia Filipchuk <julia.filipchuk@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241003004611.2323493-9-John.C.Harrison@Intel.com
GuC loading can take longer than it is supposed to for various
reasons. So add in the code to cope with that and to report it when it
happens. There are also many different reasons why GuC loading can
fail, so add in the code for checking for those and for reporting
issues in a meaningful manner rather than just hitting a timeout and
saying 'fail: status = %x'.
Also, remove the 'FIXME' comment about an i915 bug that has never been
applicable to Xe!
v2: Actually report the requested and granted frequencies rather than
showing granted twice (review feedback from Badal).
v3: Locally code all the timeout and end condition handling because a
helper function is not allowed (review feedback from Lucas/Rodrigo).
v4: Add more documentation comments and rename a define to add units
(review feedback from Lucas).
v5: Fix copy/paste error in xe_mmio_wait32_not (review feedback from
Lucas) and rebase (no more return value from guc_wait_ucode).
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240518043700.3264362-3-John.C.Harrison@Intel.com