linux

mirror of https://github.com/torvalds/linux.git synced 2026-04-26 10:32:25 -04:00

Author	SHA1	Message	Date
Matthew Brost	08c98f3f2b	Revert "drm/xe/vf: Rebase exec queue parallel commands during migration recovery" This reverts commit `ba180a3621`. Due to change in the VF migration recovery design this code is not needed any more. v3: - Add commit message (Michal / Lucas) Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20251002233824.203417-2-michal.wajdeczko@intel.com	2025-10-03 20:36:23 -07:00
John Harrison	456b32c9c1	drm/xe/guc: Add test for G2G communications Add a test for sending messages from every GuC to every other GuC to test G2G communications. Note that, being a debug only feature, the test interface only exists in pre-production builds of the GuC firmware. v2: Fix 'default' case to actually use the driver's registration code as well as allocation. Add comments explaining the different test types. Fix (C) date and an assert. Review feedback from Daniele. Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://lore.kernel.org/r/20250910210237.603576-5-John.C.Harrison@Intel.com	2025-09-15 09:53:26 -07:00
Daniele Ceraolo Spurio	8843444843	drm/xe/guc: Set RCS/CCS yield policy All recent platforms (including all the ones officially supported by the Xe driver) do not allow concurrent execution of RCS and CCS workloads from different address spaces, with the HW blocking the context switch when it detects such a scenario. The DUAL_QUEUE flag helps with this, by causing the GuC to not submit a context it knows will not be able to execute. This, however, causes a new problem: if RCS and CCS queues have pending workloads from different address spaces, the GuC needs to choose from which of the 2 queues to pick the next workload to execute. By default, the GuC prioritizes RCS submissions over CCS ones, which can lead to CCS workloads being significantly (or completely) starved of execution time. The driver can tune this by setting a dedicated scheduling policy KLV; this KLV allows the driver to specify a quantum (in ms) and a ratio (percentage value between 0 and 100), and the GuC will prioritize the CCS for that percentage of each quantum. Given that we want to guarantee enough RCS throughput to avoid missing frames, we set the yield policy to 20% of each 80ms interval. v2: updated quantum and ratio, improved comment, use xe_guc_submit_disable in gt_sanitize Fixes: `d9a1ae0d17` ("drm/xe/guc: Enable WA_DUAL_QUEUE for newer platforms") Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Cc: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Tested-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Link: https://lore.kernel.org/r/20250905235632.3333247-2-daniele.ceraolospurio@intel.com	2025-09-11 09:45:35 -07:00
Vinay Belgaumkar	60d2b78991	drm/xe/guc: Add SLPC power profile interface GuC has an interface to set a power profile for the SLPC algorithm. Base mode is default and ensures a balanced performance, power_saving mode has conservative up/down thresholds and is suitable for use with apps that typically need to be power efficient. This will result in lower GT frequencies, thus consuming lower power. Selected power profile will be displayed in this format: $ cat power_profile [base] power_saving $ echo power_saving > power_profile $ cat power_profile base [power_saving] v2: Address review comments (Rodrigo) Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://lore.kernel.org/r/20250903232120.390190-1-vinay.belgaumkar@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2025-09-11 08:45:05 -04:00
Badal Nilawar	29042df3ac	drm/xe/psmi: Add Wa_14020001231 Enable Wa 14020001231 to block psmi interrupts during C6 entry exit flow. It's only enabled if PSMI is enabled in runtime. Signed-off-by: Badal Nilawar <badal.nilawar@intel.com> Reviewed-by: Matt Atwood <matthew.s.atwood@intel.com> Link: https://lore.kernel.org/r/20250821-psmi-v5-4-34ab7550d3d8@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-08-22 11:46:44 -07:00
Tomasz Lis	ba180a3621	drm/xe/vf: Rebase exec queue parallel commands during migration recovery Parallel exec queues have an additional command streamer buffer which holds a GGTT reference to data within context status. The GGTT references have to be fixed after VF migration. v2: Properly handle nop entry, verify if parsing goes ok v3: Improve error/warn logging, add propagation of errors, give names to magic offsets Signed-off-by: Tomasz Lis <tomasz.lis@intel.com> Cc: Michal Winiarski <michal.winiarski@intel.com> Reviewed-by: Michał Winiarski <michal.winiarski@intel.com> Link: https://lore.kernel.org/r/20250802031045.1127138-9-tomasz.lis@intel.com Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>	2025-08-04 16:47:12 +02:00
John Harrison	45fbb51050	drm/xe/guc: Add more GuC load error status codes The GuC load process will abort if certain status codes (which are indicative of a fatal error) are reported. Otherwise, it keeps waiting until the 'success' code is returned. New error codes have been added in recent GuC releases, so add support for aborting on those as well. v2: Shuffle HWCONFIG_START to the front of the switch to keep the ordering as per the enum define for clarity (review feedback by Jonathan). Also add a description for the basic 'invalid init data' code which was missing. Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Stuart Summers <stuart.summers@intel.com> Link: https://lore.kernel.org/r/20250726024337.4056272-1-John.C.Harrison@Intel.com	2025-07-28 14:38:44 -07:00
Sk Anirban	d72779c29d	drm/xe/ptl: Apply Wa_16026007364 As part of this WA GuC will save and restore value of two XE3_Media control registers that were not included in the HW power context. Signed-off-by: Sk Anirban <sk.anirban@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://lore.kernel.org/r/20250716101622.3421480-2-sk.anirban@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-07-24 14:01:27 -07:00
Daniele Ceraolo Spurio	9c7d93a8f1	drm/xe/guc: Enable the Dynamic Inhibit Context Switch optimization The Dynamic Inhibit Context Switch is an optimization aimed at reducing the amount of time the HW is stuck waiting on an unsatisfied semaphore. When this optimization is enabled, the GuC will dynamically modify the CTX_CTRL_INHIBIT_SYN_CTX_SWITCH in the CTX_CONTEXT_CONTROL register of LRCs to enable immediate switching out on an unsatisfied semaphore wait when multiple contexts are competing for time on the same engine. This feature is available on recent HW from GuC 70.40.1 onwards and it is enabled via a per-VF feature opt-in. v2: rebase v3: switch to using guc_buf_cache instead of dedicated alloc v4: add helper to check for feature availability (Michal), don't enable if multi-lrc is possible. Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Cc: Julia Filipchuk <julia.filipchuk@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Link: https://lore.kernel.org/r/20250625205405.1653212-4-daniele.ceraolospurio@intel.com	2025-06-27 11:08:50 -07:00
Daniele Ceraolo Spurio	a7ffcea863	drm/xe/guc: Enable extended CAT error reporting On newer HW (Xe2 onwards + PVC) it is possible to get extra information when a CAT error occurs, specifically a dword reporting the error type. To enable this extra reporting, we need to opt-in with the GuC, which is done via a specific per-VF feature opt-in H2G. On platforms where the HW does not support the extra reporting, the GuC will set the type to 0xdeadbeef, so we can keep the code simple and opt-in to the feature on every platform and then just discard the data if it is invalid. Note that on native/PF we're guaranteed that the opt in is available because we don't support any GuC old enough to not have it, but if we're a VF we might be running on a non-XE PF with an older GuC, so we need to handle that case. We can re-use the invalid type above to handle this scenario the same way as if the feature was not supported in HW. Given that this patch is the first user of the guc_buf_cache on native and VF, it also extends that feature to non-PF use-cases. v2: simpler print for the error type (John), rebase v3: use guc_buf_cache instead of new alloc, simpler doc (Michal) Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Nirmoy Das <nirmoy.das@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Nirmoy Das <nirmoy.das@intel.com> #v1 Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Link: https://lore.kernel.org/r/20250625205405.1653212-3-daniele.ceraolospurio@intel.com	2025-06-27 11:08:40 -07:00
Daniele Ceraolo Spurio	dfe6c28132	Revert "drm/xe/ptl: Apply Wa_16026007364" This reverts commit `3972872e45`. There are several things wrong with the way this WA was implemented: - The KLV is only supported on GuC 70.47.0 or newer, so we shouldn't apply it unconditionally. - The KLV requires 2 DWs of data, which are not currently provided. The GuC currently ignores any unknown KLVs, so on versions older that 70.47.0 nothing happens. However, starting on 70.47.0 the GuC attempts to parse the KLV and fails due to the missing data, causing a GuC load abort. Given that 70.47.0 is the first GuC version approved for public release for PTL, let's revert this patch so it doesn't cause the GuC load to fail with that blob. We can then re-apply it properly fixed after the GuC definition is merged, which will also have the added benefit of running the KLV addition through CI with the right GuC version. Fixes: `3972872e45` ("drm/xe/ptl: Apply Wa_16026007364") Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: sanirban <sk.anirban@intel.com> Cc: Badal Nilawar <badal.nilawar@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://lore.kernel.org/r/20250625001202.1616606-2-daniele.ceraolospurio@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2025-06-25 10:18:04 -04:00
sanirban	3972872e45	drm/xe/ptl: Apply Wa_16026007364 As part of this WA GuC will save and restore value of two XE3_Media control registers that were not included in the HW power context. v2: - Update klv name (Badal) Signed-off-by: sanirban <sk.anirban@intel.com> Reviewed-by: Badal Nilawar <badal.nilawar@intel.com> Link: https://lore.kernel.org/r/20250619133413.107423-2-sk.anirban@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2025-06-20 15:36:43 -04:00
John Harrison	12373b30e2	drm/xe/guc: Add missing H2G error code definitions These error codes are not actually used in the driver but it is extremely useful to have them available to understand error messages. v2: Add a bunch more error codes and drop 'status' from names (review feedback by Michal W). v3: Drop 'SUCCESS' response as meaningless in current API (review feedback by Michal W). Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://lore.kernel.org/r/20250512215324.1457009-3-John.C.Harrison@Intel.com	2025-05-15 12:27:34 -07:00
Tomasz Lis	e327592cc9	drm/xe/guc: Introduce enum with offsets for context register H2Gs Some GuC messages are constructed with incrementing dword counter rather than referencing specific DWORDs, as described in GuC interface specification. This change introduces the definitions of DWORD numbers for parameters which will need to be referenced in a CTB parser to be added in a following patch. To ensure correctness of these DWORDs, verification in form of asserts was added to the message construction code. v2: Renamed enum members, added ones for single context registration, modified asserts to check values rather than indexes. v3: Reordered assert args to take less lines v4: Added lengths v5: Renamed MULTI_LRC_MSG_LEN to MULTI_LRC_MSG_MIN_LEN Suggested-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Signed-off-by: Tomasz Lis <tomasz.lis@intel.com> Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://lore.kernel.org/r/20250512114018.361843-4-tomasz.lis@intel.com	2025-05-12 15:53:37 +02:00
John Harrison	d3e8349edf	drm/xe/guc: Enable w/a 16026508708 The workaround is only relevant to SRIOV but does affect all platforms. Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://lore.kernel.org/r/20250403185619.1555853-2-John.C.Harrison@Intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-04-10 14:09:35 -07:00
Riana Tauro	2de3f38fbf	drm/xe: Add support for per-function engine activity Add support for function level engine activity stats. Engine activity stats are enabled when VF's are enabled v2: remove unnecessary initialization move offset to improve code readability (Umesh) remove global for function engine activity (Lucas) v3: fix commit message (Michal) v4: remove enable function parameter fix kernel-doc (Umesh) Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Signed-off-by: Riana Tauro <riana.tauro@intel.com> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250311071759.2117211-2-riana.tauro@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-03-25 07:14:32 -07:00
Tejas Upadhyay	5488bec96b	drm/xe/uapi: Use hint for guc to set GT frequency Allow user to provide a low latency hint. When set, KMD sends a hint to GuC which results in special handling for that process. SLPC will ramp the GT frequency aggressively every time it switches to this process. We need to enable the use of SLPC Compute strategy during init, but it will apply only to processes that set this bit during process creation. Improvement with this approach as below: Before, :~$ NEOReadDebugKeys=1 EnableDirectSubmission=0 clpeak --kernel-latency Platform: Intel(R) OpenCL Graphics Device: Intel(R) Graphics [0xe20b] Driver version : 24.52.0 (Linux x64) Compute units : 160 Clock frequency : 2850 MHz Kernel launch latency : 283.16 us After, :~$ NEOReadDebugKeys=1 EnableDirectSubmission=0 clpeak --kernel-latency Platform: Intel(R) OpenCL Graphics Device: Intel(R) Graphics [0xe20b] Driver version : 24.52.0 (Linux x64) Compute units : 160 Clock frequency : 2850 MHz Kernel launch latency : 63.38 us Compute PR: https://github.com/intel/compute-runtime/pull/794 Mesa PR: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33214 IGT PR: https://patchwork.freedesktop.org/patch/639989/ V10(Lucas): - Remove doc from drm-uapi.rst v9(Vinay): - remove extra line, align commit message v8(Vinay): - Add separate example for using low latency hint v7(Jose): - Update UMD PR - applicable to all gpus V6: - init flags, remove redundant flags check (MAuld) V5: - Move uapi doc to documentation and GuC ABI specific change (Rodrigo) - Modify logic to restrict exec queue flags (MAuld) V4: - To make it clear, dont use exec queue word (Vinay) - Correct typo in description of flag (Jose/Vinay) - rename set_strategy api and replace ctx with exec queue(Vinay) - Start with 0th bit to indentify user flags (Jose) V3: - Conver user flag to kernel internal flag and use (Oak) - Support query config for use to check kernel support (Jose) - Dont need to take runtime pm (Vinay) V2: - DRM_XE_EXEC_QUEUE_LOW_LATENCY_HINT 1 planned for other hint(Szymon) - Add motivation to description (Lucas) Acked-by: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250228070224.739295-2-tejas.upadhyay@intel.com Signed-off-by: Tejas Upadhyay <tejas.upadhyay@intel.com>	2025-03-05 09:54:24 +05:30
Riana Tauro	b729ea271e	drm/xe: Add engine activity support GuC provides support to read engine counters to calculate the engine activity. KMD exposes two counters via the PMU interface to calculate engine activity Engine Active Ticks(engine-active-ticks) - active ticks of engine Engine Total Ticks (engine-total-ticks) - total ticks of engine Engine activity percentage can be calculated as below Engine activity % = (engine active ticks/engine total ticks) * 100. v2: fix cosmetic review comments add forcewake for gpm_ts (Umesh) v3: fix CI hooks error change function parameters and unpin bo on error of allocate_activity_buffers fix kernel-doc (Umesh) use engine activity (Umesh, Lucas) rename xe_engine_activity to xe_guc_engine_* fix commit message to use engine activity (Lucas, Umesh) v4: add forcewake in PMU layer v5: fix makefile use drmm_kcalloc instead of kmalloc_array remove managed bo skip init for VF fix cosmetic review comments (Michal) Signed-off-by: Riana Tauro <riana.tauro@intel.com> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250224053903.2253539-2-riana.tauro@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-02-24 12:32:09 -08:00
Daniele Ceraolo Spurio	0387d46ea7	drm/xe/pxp: Add GSC session initialization support A session is initialized (i.e. started) by sending a message to the GSC. The initialization will be triggered when a user opts-in to using PXP; the interface for that is coming in a follow-up patch in the series. v2: clean up error messages, use new ARB define (John) Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250129174140.948829-7-daniele.ceraolospurio@intel.com	2025-02-03 11:51:15 -08:00
Daniele Ceraolo Spurio	96e84a2f5a	drm/xe/pxp: Add GSC session invalidation support After a session is terminated, we need to inform the GSC so that it can clean up its side of the allocation. This is done by sending an invalidation command with the session ID. The invalidation will be triggered in response to a termination, interrupt, whose handling is coming in the next patch in the series. v2: Better comment and error messages (John) Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250129174140.948829-5-daniele.ceraolospurio@intel.com	2025-02-03 11:51:11 -08:00
Daniele Ceraolo Spurio	dcdd6b84d9	drm/xe/pxp: Allocate PXP execution resources PXP requires submissions to the HW for the following operations 1) Key invalidation, done via the VCS engine 2) Communication with the GSC FW for session management, done via the GSCCS. Key invalidation submissions are serialized (only 1 termination can be serviced at a given time) and done via GGTT, so we can allocate a simple BO and a kernel queue for it. Submissions for session management are tied to a PXP client (identified by a unique host_session_id); from the GSC POV this is a user-accessible construct, so all related submission must be done via PPGTT. The driver does not currently support PPGTT submission from within the kernel, so to add this support, the following changes have been included: - a new type of kernel-owned VM (marked as GSC), required to ensure we don't use fault mode on the engine and to mark the different lock usage with lockdep. - a new function to map a BO into a VM from within the kernel. v2: improve comments and function name, remove unneeded include (John) v3: fix variable/function names in documentation Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250129174140.948829-3-daniele.ceraolospurio@intel.com	2025-02-03 11:51:05 -08:00
Nitin Gote	75fd04f276	drm/xe: Fix all typos in xe Fix all typos in files of xe, reported by codespell tool. Signed-off-by: Nitin Gote <nitin.r.gote@intel.com> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Reviewed-by: Stuart Summers <stuart.summers@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250106102646.1400146-2-nitin.r.gote@intel.com Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>	2025-01-09 17:58:09 +01:00
John Harrison	a9f7b97dda	drm/xe/guc: Add support for G2G communications Some features require inter-GuC communication channels on multi-tile devices. So allocate and enable such. v2: Correct use of xe_bo_get/put (review feedback from Matthew Brost) Add extra assert, re-order a calculation for better clarity and add comments to slot calculation (review feedback from Daniele). Also slightly re-work the slot calc to avoid code duplication. Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241120000222.204095-3-John.C.Harrison@Intel.com	2024-11-22 19:10:56 -08:00
Michal Wajdeczko	5bd3521d25	drm/xe/guc: Add VF_CFG_SCHED_PRIORITY_KEY KLV definition This KLV allows to set the scheduling priority for each VF, also for the PF. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Lukasz Laguna <lukasz.laguna@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241106151301.2079-2-michal.wajdeczko@intel.com	2024-11-08 13:31:13 +01:00
Tomasz Lis	1255954d9f	drm/xe/vf: Send RESFIX_DONE message at end of VF restore After restore, GuC will not answer to any messages from VF KMD until fixups are applied. When that is done, VF KMD sends RESFIX_DONE message to GuC, at which point GuC resumes normal operation. This patch implements sending the RESFIX_DONE message at end of post-migration recovery. v2: keep pm ref during whole recovery, style fixes (Michal) v3: assert removal to separate patch, debug message per GuC instead of one, comments changes (Michal) v4: improve one debug message (Michal) Signed-off-by: Tomasz Lis <tomasz.lis@intel.com> Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241104213449.1455694-4-tomasz.lis@intel.com Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>	2024-11-06 15:25:30 +01:00
Vinay Belgaumkar	61ef737db9	drm/xe/ptl: Apply Wa_14022866841 As part of this WA, GuC will hold a forcewake for certain MMIO accesses outside the GT/media domains. Cc: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241015234428.2004825-1-vinay.belgaumkar@intel.com	2024-10-17 11:20:21 -07:00
Zhanjun Dong	8bfc496327	drm/xe/guc: Extract GuC error capture lists Upon the G2H Notify-Err-Capture event, parse through the GuC Log Buffer (error-capture-subregion) and generate one or more capture-nodes. A single node represents a single "engine- instance-capture-dump" and contains at least 3 register lists: global, engine-class and engine-instance. An internal link list is maintained to store one or more nodes. Because the link-list node generation happen before the call to devcoredump, duplicate global and engine-class register lists for each engine-instance register dump if we find dependent-engine resets in a engine-capture-group. To avoid dynamically allocate the output nodes during gt reset, pre-allocate a fixed number of empty nodes up front (at the time of ADS registration) that we can consume from or return to an internal cached list of nodes. Signed-off-by: Zhanjun Dong <zhanjun.dong@intel.com> Reviewed-by: Alan Previn <alan.previn.teres.alexis@intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241004193428.3311145-5-zhanjun.dong@intel.com	2024-10-08 09:34:45 -07:00
Zhanjun Dong	84d15f4261	drm/xe/guc: Add capture size check in GuC log buffer Capture-nodes generated by GuC are placed in the GuC capture ring buffer which is a sub-region of the larger Guc-Log-buffer. Add capture output size check before allocating the shared buffer. Signed-off-by: Zhanjun Dong <zhanjun.dong@intel.com> Reviewed-by: Alan Previn <alan.previn.teres.alexis@intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241004193428.3311145-4-zhanjun.dong@intel.com	2024-10-08 09:34:28 -07:00
Zhanjun Dong	9c8c7a7e6f	drm/xe/guc: Prepare GuC register list and update ADS size for error capture Add referenced registers defines and list of registers. Update GuC ADS size allocation to include space for the lists of error state capture register descriptors. Then, populate GuC ADS with the lists of registers we want GuC to report back to host on engine reset events. This list should include global, engine-class and engine-instance registers for every engine-class type on the current hardware. Ensure we allocate a persistent storage for the register lists that are populated into ADS so that we don't need to allocate memory during GT resets when GuC is reloaded and ADS population happens again. Signed-off-by: Zhanjun Dong <zhanjun.dong@intel.com> Reviewed-by: Alan Previn <alan.previn.teres.alexis@intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241004193428.3311145-2-zhanjun.dong@intel.com	2024-10-08 09:34:04 -07:00
John Harrison	d2c5a5a926	drm/xe/guc: Dead CT helper Add a worker function helper for asynchronously dumping state when an internal/fatal error is detected in CT processing. Being asynchronous is required to avoid deadlocks and scheduling-while-atomic or process-stalled-for-too-long issues. Also check for a bunch more error conditions and improve the handling of some existing checks. v2: Use compile time CONFIG check for new (but not directly CT_DEAD related) checks and use unsigned int for a bitmask, rename CT_DEAD_RESET to CT_DEAD_REARM and add some explaining comments, rename 'hxg' macro parameter to 'ctb' - review feedback from Michal W. Drop CT_DEAD_ALIVE as no need for a bitfield define to just set the entire mask to zero. v3: Fix kerneldoc v4: Nullify some floating pointers after free. v5: Add section headings and device info to make the state dump look more like a devcoredump to allow parsing by the same tools (eventual aim is to just call the devcoredump code itself, but that currently requires an xe_sched_job, which is not available in the CT code). v6: Fix potential for leaking snapshots with concurrent error conditions (review feedback from Julia F). v7: Don't complain about unexpected G2H messages yet because there is a known issue causing them. Fix bit shift bug with v6 change. Add GT id to fake coredump headers and use puts instead of printf. v8: Disable the head mis-match check in g2h_read because it is failing on various discrete platforms due to unknown reasons. Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Julia Filipchuk <julia.filipchuk@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241003004611.2323493-9-John.C.Harrison@Intel.com	2024-10-07 18:34:59 -07:00
Michal Wajdeczko	804ce41f66	drm/xe/guc: Add PF2GUC_SAVE_RESTORE_VF to ABI In upcoming patches we will add support to the PF driver to save and restore a VF state maintained by the GuC to allow VF migration. Add necessary H2G definitions to our GuC firmware ABI header. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Michał Winiarski <michal.winiarski@intel.com> Cc: Tomasz Lis <tomasz.lis@intel.com> Reviewed-by: Michał Winiarski <michal.winiarski@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240912203817.1880-3-michal.wajdeczko@intel.com	2024-09-16 13:00:08 +02:00
Julia Filipchuk	636cdf6fbd	drm/xe/guc: Enable w/a 14022293748 and 22019794406 Enable workarounds for HW bug where render engine reset fails. Given that we're bumping the minimum required GuC version to 70.29, we're guaranteed to always have support for this KLV in the GuC. v2: Enable KLV correctly for either workaround (Lucas) v4: Add check for minimum supported GuC firmware version. Enable w/a for hw version 20.01 too. (Daniele) v5 (Daniele): remove now unneeded fw type and version checks (JohnH) Signed-off-by: Julia Filipchuk <julia.filipchuk@intel.com> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240805205435.921921-1-daniele.ceraolospurio@intel.com	2024-08-08 13:47:27 -07:00
Michal Wajdeczko	b084dfaef2	drm/xe/guc: Add more GuC error codes to ABI There are many more error codes used that the GuC firmware can return in the RESPONSE_FAILURE message. Add to the ABI header those which are more likely to be seen by the PF or VF drivers. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240625141258.1257-3-michal.wajdeczko@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2024-06-26 18:25:07 -04:00
Michal Wajdeczko	4ca1a12a1b	drm/xe/guc: Add kernel-doc for HXG Fast Request We have kernel-doc for all HXG message types but Fast Request. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Acked-by: Nirmoy Das <nirmoy.das@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240610120411.1768-3-michal.wajdeczko@intel.com	2024-06-11 12:25:21 +02:00
Michal Wajdeczko	91524b3a09	drm/xe/guc: Drop unused legacy GuC message ABI definitions Those were copy-pasted from i915 code and never used in Xe driver. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Nirmoy Das <nirmoy.das@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240610120411.1768-2-michal.wajdeczko@intel.com	2024-06-11 12:25:19 +02:00
Michal Wajdeczko	3a3fc10cce	drm/xe/guc: Move H2G SETUP_PC_GUCRC definition to SLPC ABI We already have a dedicated file for GuC SLPC ABI definitions. Move definition of the SETUP_PC_GUCRC action and related enum to that file, rename them to match format of other new ABI definitions and add simple kernel-doc. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Reviewed-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240609181931.1724-2-michal.wajdeczko@intel.com	2024-06-10 12:41:38 +02:00
Michal Wajdeczko	e70aa1016e	drm/xe/guc: Add GLOBAL_CFG_GMD_ID KLV definition VF drivers can't access GMD_ID register over MMIO. The value of the GMD_ID register must be queried from GuC. It is available as GLOBAL_CFG_GMD_ID KLV. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240523192240.844-3-michal.wajdeczko@intel.com	2024-05-24 10:02:27 +02:00
John Harrison	b0ac1b42db	drm/xe/guc: Port over the slow GuC loading support from i915 GuC loading can take longer than it is supposed to for various reasons. So add in the code to cope with that and to report it when it happens. There are also many different reasons why GuC loading can fail, so add in the code for checking for those and for reporting issues in a meaningful manner rather than just hitting a timeout and saying 'fail: status = %x'. Also, remove the 'FIXME' comment about an i915 bug that has never been applicable to Xe! v2: Actually report the requested and granted frequencies rather than showing granted twice (review feedback from Badal). v3: Locally code all the timeout and end condition handling because a helper function is not allowed (review feedback from Lucas/Rodrigo). v4: Add more documentation comments and rename a define to add units (review feedback from Lucas). v5: Fix copy/paste error in xe_mmio_wait32_not (review feedback from Lucas) and rebase (no more return value from guc_wait_ucode). Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240518043700.3264362-3-John.C.Harrison@Intel.com	2024-05-23 10:55:31 -07:00
Michal Wajdeczko	c454f1a6b9	drm/xe/guc: Add VF2GUC_QUERY_SINGLE_KLV to ABI In upcoming patches we will add support to the VF driver to read its configuration from the GuC using special H2G actions. Add necessary definitions to our GuC firmware ABI header. Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240516110546.2216-4-michal.wajdeczko@intel.com	2024-05-16 20:18:33 +02:00
Michal Wajdeczko	769551c45c	drm/xe/guc: Add VF2GUC_VF_RESET to ABI The version negotiation between the VF driver and the GuC firmware must start with explicit soft reset of the GuC state initiated by the VF driver. Add VF2GUC action definitions to the ABI header. Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240516110546.2216-3-michal.wajdeczko@intel.com	2024-05-16 20:18:32 +02:00
Michal Wajdeczko	e158cf9361	drm/xe/guc: Add VF2GUC_MATCH_VERSION to ABI In upcoming patches we will add a version negotiation between the VF driver and the GuC firmware. Add necessary definitions to our GuC firmware ABI header. Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240516110546.2216-2-michal.wajdeczko@intel.com	2024-05-16 20:18:31 +02:00
Michal Wajdeczko	d5e12fffcc	drm/xe/guc: Add GUC2PF_ADVERSE_EVENT to ABI When thresholds used to monitor VFs activities are configured, then GuC may send GUC2PF_ADVERSE_EVENT messages informing the PF driver about exceeded thresholds. Add necessary definitions to our GuC firmware ABI header. Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240514190015.2172-7-michal.wajdeczko@intel.com	2024-05-16 18:04:45 +02:00
Michal Wajdeczko	7547a23cae	drm/xe/guc: Fix typos in VF CFG KLVs descriptions Apart from the obvious spelling typo, use the correct values for infinity quantum/timeout settings (it's 0x0 instead of 0xFFFFFFFF). Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Piotr Piórkowski <piotr.piorkowski@intel.com> Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240424140506.2133-1-michal.wajdeczko@intel.com	2024-04-25 17:27:48 +02:00
Michal Wajdeczko	5a8c292f74	drm/xe/guc: Update VF configuration KLVs definitions GuC firmware specification says that maximum value for the execution quantum KLV is 100s and anything exceeding that will be clamped. The same limitation applies to the preemption timeout KLV. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240419123543.270-2-michal.wajdeczko@intel.com	2024-04-24 15:32:24 +02:00
Michal Wajdeczko	8f21f82d8b	drm/xe/guc: Add GuC Relay ABI version 1.0 definitions This initial GuC Relay ABI specification includes messages for ABI version negotiation and to query values of runtime/fuse registers. We will start handling those messages on the PF driver soon. Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240423180436.2089-2-michal.wajdeczko@intel.com	2024-04-24 15:10:39 +02:00
Vinay Belgaumkar	2817a1f1bf	drm/xe/lnl: Apply GuC Wa_13011645652 Enable WA for a bug that could cause the C6 state machine to hang during RC6 exit. v2: Add comment clarifying the WA (John H) v3: Add more details to the comment (John H) Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240417054802.1766359-1-vinay.belgaumkar@intel.com	2024-04-17 15:21:12 -07:00
John Harrison	b7f888ee9c	drm/xe/lnl: Enable more GuC based workarounds There are a couple of new workarounds for LNL that are implemented in the GuC firmware. The KMD needs to enable them explicitly. Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240410002646.3002394-2-John.C.Harrison@Intel.com	2024-04-16 10:50:36 -07:00
Michal Wajdeczko	3f11bcc656	drm/xe/guc: Add PF2GUC_UPDATE_VF_CFG to ABI In upcoming patches the PF driver will add support to change VFs configuration and will need to use PF2GUC_UPDATE_VF_CFG messages. Add necessary definitions to our GuC firmware ABI header. Definitions of the GuC VF Configuration KLVs used by this action are already present in abi/guc_klvs_abi.h Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240415173937.1287-5-michal.wajdeczko@intel.com	2024-04-16 12:37:32 +02:00
Michal Wajdeczko	bbc8a6fb83	drm/xe/guc: Add PF2GUC_UPDATE_VGT_POLICY to ABI In upcoming patches the PF driver will add support to change GuC policies and will need to use PF2GUC_UPDATE_VGT_POLICY messages. Add necessary definitions to our GuC firmware ABI header. Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240410170338.1199-4-michal.wajdeczko@intel.com	2024-04-12 16:23:30 +02:00
Badal Nilawar	c151ff5c90	drm/xe/lnl: Enable GuC Wa_14019882105 Enable GuC Wa_14019882105 to block interrupts during C6 flow when the memory path has been blocked v2: Make helper function generic and name it as guc_waklv_enable_simple (John Harrison) v3: Make warning descriptive (John Harrison) v4: s/drm_WARN/xe_gt_WARN/ (Michal) Cc: John Harrison <john.harrison@intel.com> Signed-off-by: Badal Nilawar <badal.nilawar@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240405084231.3620848-3-badal.nilawar@intel.com	2024-04-09 12:54:04 +02:00

1 2

62 Commits