Commit Graph

69 Commits

Author SHA1 Message Date
Matt Roper
8367585154 drm/xe: Cleanup unused header includes
clangd reports many "unused header" warnings throughout the Xe driver.
Start working to clean this up by removing unnecessary includes in our
.c files and/or replacing them with explicit includes of other headers
that were previously being included indirectly.

By far the most common offender here was unnecessary inclusion of
xe_gt.h.  That likely originates from the early days of xe.ko when
xe_mmio did not exist and all register accesses, including those
unrelated to GTs, were done with GT functions.

There's still a lot of additional #include cleanup that can be done in
the headers themselves; that will come as a followup series.

v2:
 - Squash the 79-patch series down to a single patch.  (MattB)

Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20260115032803.4067824-2-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2026-01-15 07:05:04 -08:00
Lukasz Laguna
0f13dead4e drm/xe: Update wedged.mode only after successful reset policy change
Previously, the driver's internal wedged.mode state was updated without
verifying whether the corresponding engine reset policy update in GuC
succeeded. This could leave the driver reporting a wedged.mode state
that doesn't match the actual reset behavior programmed in GuC.

With this change, the reset policy is updated first, and the driver's
wedged.mode state is modified only if the policy update succeeds on all
available GTs.

This patch also introduces two functional improvements:

 - The policy is sent to GuC only when a change is required. An update
   is needed only when entering or leaving XE_WEDGED_MODE_UPON_ANY_HANG,
   because only in that case the reset policy changes. For example,
   switching between XE_WEDGED_MODE_UPON_CRITICAL_ERROR and
   XE_WEDGED_MODE_NEVER doesn't affect the reset policy, so there is no
   need to send the same value to GuC.

 - An inconsistent_reset flag is added to track cases where reset policy
   update succeeds only on a subset of GTs. If such inconsistency is
   detected, future wedged mode configuration will force a retry of the
   reset policy update to restore a consistent state across all GTs.

Fixes: 6b8ef44cc0 ("drm/xe: Introduce the wedged_mode debugfs")
Signed-off-by: Lukasz Laguna <lukasz.laguna@intel.com>
Link: https://patch.msgid.link/20260107174741.29163-3-lukasz.laguna@intel.com
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2026-01-08 16:07:42 -05:00
Lukasz Laguna
17d3c3365b drm/xe: Validate wedged_mode parameter and define enum for modes
Check correctness of the wedged_mode parameter input to ensure only
supported values are accepted. Additionally, replace magic numbers with
a clearly defined enum.

Signed-off-by: Lukasz Laguna <lukasz.laguna@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patch.msgid.link/20260107174741.29163-2-lukasz.laguna@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2026-01-08 16:07:07 -05:00
Michal Wajdeczko
09af64eba6 drm/xe/guc: Introduce GUC_FIRMWARE_VER_AT_LEAST helper
There are already few places in the code where we need to check GuC
firmware version. Wrap existing raw conditions into a named helper
macro to make it clear and avoid explicit call of the MAKE_GUC_VER.

Suggested-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Acked-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20251216214902.1429-3-michal.wajdeczko@intel.com
2025-12-17 23:42:43 +01:00
Matt Roper
83f4151787 drm/xe/lnl: Drop pre-production workaround support
LNL has been out long enough that all of our internal usage of
pre-production hardware has been phased out and we no longer need to
maintain workarounds that were exclusive to pre-production parts.

Production LNL hardware always has B0 or later steppings for both
graphics and media IP.  Eliminate all workarounds that were exclusive to
A-step hardware and set the 'has_prod_wa_only' device flag for LNL to
make sure we warn and taint if someone tries to load the driver on an
old pre-production part.

Bspec: 70821
Reviewed-by: Dnyaneshwar Bhadane <dnyaneshwar.bhadane@intel.com>
Link: https://patch.msgid.link/20251212181411.294854-4-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2025-12-12 21:17:10 -08:00
Brian Welty
94edd65186 drm/xe/xe3p_lpm: Configure MAIN_GAMCTRL_QUEUE_SELECT
Starting from Xe3p, there are two different copies of some of the GAM
registers:  the traditional MCR variant at their old locations, and a
new unicast copy known as "main_gamctrl."  The Xe driver doesn't use
these registers directly, but we need to instruct the GuC on which set
it should use.  Since the new, unicast registers are preferred (since
they avoid the need for unnecessary MCR synchronization), set a new GuC
feature flag, GUC_CTL_MAIN_GAMCTRL_QUEUES to convey this decision.  A
new helper function, xe_guc_using_main_gamctrl_queues(), is added for
use in the 3 independent places that need to handle configuration of the
new reporting queues.

The mmio write to enable the main gamctl is only done during the general
GuC upload.  The gamctrl registers are not accessed by the GuC during
hwconfig load.

Last, the ADS blob for communicating the queue addresses contains both a
DPA and GGTT offset. The GuC documentation states that DPA is now MBZ
when using the MAIN_GAMCTRL queues.

Bspec: 76445, 73540
Signed-off-by: Brian Welty <brian.welty@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://lore.kernel.org/r/20251019-xe3p-gamctrl-v1-1-ad66d3c1908f@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-10-20 17:21:11 -07:00
Arun Abhishek Chowhan
64d00d41f5 drm/xe: Sort include files alphabetically.
Sort the include lines alphabetically, no impact on code behavior.

Signed-off-by: Arun Abhishek Chowhan <arun.abhishek.chowhan@intel.com>
Reviewed-by: Nitin Gote <nitin.r.gote@intel.com>
Link: https://lore.kernel.org/r/20251013095914.3742505-1-arun.abhishek.chowhan@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-10-13 07:18:20 -07:00
Lucas De Marchi
19baa830fb drm/xe: Use ARRAY_SIZE in guc_waklv_init()
Prefer using ARRAY_SIZE where needed and just passing 1 instead of
calculating the size of one element.

Cc: Jonathan Cavitt <jonathan.cavitt@intel.com>
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202508130158.eogeBZQT-lkp@intel.com/
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Link: https://lore.kernel.org/r/20250912-guc-ads-array-size-v1-1-a6555392a1f8@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-09-15 07:50:53 -07:00
Badal Nilawar
29042df3ac drm/xe/psmi: Add Wa_14020001231
Enable Wa 14020001231 to block psmi interrupts during C6 entry exit
flow. It's only enabled if PSMI is enabled in runtime.

Signed-off-by: Badal Nilawar <badal.nilawar@intel.com>
Reviewed-by: Matt Atwood <matthew.s.atwood@intel.com>
Link: https://lore.kernel.org/r/20250821-psmi-v5-4-34ab7550d3d8@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-08-22 11:46:44 -07:00
Matt Atwood
4d5c98eb77 drm/xe: rename XE_WA to XE_GT_WA
Now that there are two types of wa tables and infrastructure, be more
concise in the naming of GT wa macros.

v2: update the documentation

Signed-off-by: Matt Atwood <matthew.s.atwood@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/20250807214224.32728-1-matthew.s.atwood@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-08-08 10:50:45 -04:00
Jonathan Cavitt
7c9de25efa drm/xe/xe_guc_ads: Consolidate guc_waklv_enable functions
Presently, multiple versions of the guc_waklv_enable_.* function exist,
all with different numbers of dwords added to the klv_entry array.  This
is not extensible, and more duplicates of the function will need to be
created if it ever becomes necessary to support 3 or more dwords per wa
in the future.

Consolidate the disparate guc_waklv_enable functions into a single
guc_waklv_enable function that can take an arbitrary number of dword
values.

v2:
- Update length value properly (Shuicheng)

v3: (Harrison)
- Use data as a term instead of dwords or arr
- Reformat warning message to use hex values
- Eliminate need for kzalloc and klv_entry array
- Reorder function parameters to fix line wrapping

v4:
- Miscellaneous formatting fixes (Cavitt)

v5: (Harrison)
- s/data_range/data_len_dw
- Use data_len_dw to calculate size for xe_map_memcpy_to

Suggested-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Cc: Lucas De Marchi <lucas.demarch@intel.com>
Cc: Shuicheng Lin <shuicheng.lin@intel.com>
Cc: John Harrison <john.c.harrison@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://lore.kernel.org/r/20250728194806.68176-2-jonathan.cavitt@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-08-05 07:59:38 -07:00
Sk Anirban
d72779c29d drm/xe/ptl: Apply Wa_16026007364
As part of this WA GuC will save and restore value of two XE3_Media
control registers that were not included in the HW power context.

Signed-off-by: Sk Anirban <sk.anirban@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Link: https://lore.kernel.org/r/20250716101622.3421480-2-sk.anirban@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-24 14:01:27 -07:00
Michal Wajdeczko
621a422079 drm/xe/guc: Don't allocate temporary policies object
Since we are already using reusable buffer objects from the GuC
buffer cache, we can directly write into their CPU pointers and
spare unnecessary temporary allocation.

While around, also make sure to clear obtained buffer, to avoid
sending some stale data.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250702142504.1656-1-michal.wajdeczko@intel.com
2025-07-10 16:00:13 +02:00
Matthew Brost
ec9223b49a drm/xe: Drop bo->size
bo->size is redundant because the base GEM object already has a size
field with the same value. Drop bo->size and use the base GEM object’s
size instead. While at it, introduce xe_bo_size() to abstract the BO
size.

v2:
 - Fix typo in kernel doc (Ashutosh)
 - Fix kunit (CI)
 - Fix line wrap (Checkpatch)
v3:
 - Fix sriov build (CI)
v4:
 - Fix display build (CI)

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Link: https://lore.kernel.org/r/20250625144128.2827577-1-matthew.brost@intel.com
2025-06-27 14:52:31 -07:00
Daniele Ceraolo Spurio
dfe6c28132 Revert "drm/xe/ptl: Apply Wa_16026007364"
This reverts commit 3972872e45.

There are several things wrong with the way this WA was implemented:

- The KLV is only supported on GuC 70.47.0 or newer, so we shouldn't
  apply it unconditionally.

- The KLV requires 2 DWs of data, which are not currently provided.

The GuC currently ignores any unknown KLVs, so on versions older that
70.47.0 nothing happens. However, starting on 70.47.0 the GuC attempts
to parse the KLV and fails due to the missing data, causing a GuC load
abort.

Given that 70.47.0 is the first GuC version approved for public release
for PTL, let's revert this patch so it doesn't cause the GuC load to
fail with that blob. We can then re-apply it properly fixed after the
GuC definition is merged, which will also have the added benefit of
running the KLV addition through CI with the right GuC version.

Fixes: 3972872e45 ("drm/xe/ptl: Apply Wa_16026007364")
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: sanirban <sk.anirban@intel.com>
Cc: Badal Nilawar <badal.nilawar@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/20250625001202.1616606-2-daniele.ceraolospurio@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-06-25 10:18:04 -04:00
sanirban
3972872e45 drm/xe/ptl: Apply Wa_16026007364
As part of this WA GuC will save and restore value of two XE3_Media
control registers that were not included in the HW power context.

v2:
  - Update klv name (Badal)

Signed-off-by: sanirban <sk.anirban@intel.com>
Reviewed-by: Badal Nilawar <badal.nilawar@intel.com>
Link: https://lore.kernel.org/r/20250619133413.107423-2-sk.anirban@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-06-20 15:36:43 -04:00
Michal Wajdeczko
3dbab383e3 drm/xe/guc: Don't allocate managed BO for each policy change
We shouldn't use xe_managed_bo_create_from_data() to allocate
temporary BO, as it will be released only on unload and every
change in wedge_mode policy will consume resources (including
precious GGTT). Instead just switchover to GuC buffer cache.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250512220018.172-3-michal.wajdeczko@intel.com
2025-05-15 12:29:55 +02:00
Lucas De Marchi
c31a0b6402 drm/xe: Set LRC addresses before guc load
The metadata saved in the ADS is read by GuC when it's initialized.
Saving the addresses to the LRCs when they are populated is too late as
GuC will keep using the old ones.

This was causing GuC to use the RCS LRC for any engine class. It's not a
big problem on a Linux-only scenario since the they are used by GuC only
on media engines when the watchdog is triggered. However, in a
virtualization scenario with Windows as the VF, it causes the wrong LRCs
to be loaded as the watchdog is used for all engines.

Fix it by letting guc_golden_lrc_init() initialize the metadata, like
other *_init() functions, and later guc_golden_lrc_populate() to copy
the LRCs to the right places. The former is called before the second GuC
load, while the latter is called after LRCs have been recorded.

Cc: Chee Yin Wong <chee.yin.wong@intel.com>
Cc: John Harrison <john.c.harrison@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Fixes: dd08ebf6c3 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Cc: <stable@vger.kernel.org> # v6.11+
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Tested-by: Chee Yin Wong <chee.yin.wong@intel.com>
Link: https://lore.kernel.org/r/20250409-fix-guc-ads-v1-1-494135f7a5d0@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-04-14 05:49:53 -07:00
John Harrison
d3e8349edf drm/xe/guc: Enable w/a 16026508708
The workaround is only relevant to SRIOV but does affect all platforms.

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Link: https://lore.kernel.org/r/20250403185619.1555853-2-John.C.Harrison@Intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-04-10 14:09:35 -07:00
Matthew Brost
045448da87 drm/xe: Add XE_BO_FLAG_PINNED_NORESTORE
Not all BOs need to be restored on resume / d3cold exit, add
XE_BO_FLAG_PINNED_NO_RESTORE which skips restoring of BOs rather just
allocates VRAM for the BO. This should slightly speedup resume / d3cold
exit flows.

Marking GuC ADS, GuC CT, GuC log, GuC PC, and SA as NORESTORE.

v2:
 - s/WONTNEED/NORESTORE (Vivi)
 - Rebase on newly added g2g and backup object flow

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Satyanarayana K V P <satyanarayana.k.v.p@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/20250403102440.266113-11-matthew.auld@intel.com
2025-04-04 11:41:01 +01:00
John Harrison
ce22fccd07 drm/xe/guc: Re-word message about ADS size changes
The error capture list in the ADS is initially allocated using a
placeholder size. When the actual size is determinied later on, there
is a debug print about the new size. However, the wording is such that
some people see it as an unexpected thing and therefore a potential
problem. So re-word it to be a little less concerning.

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250325203211.3907890-1-John.C.Harrison@Intel.com
2025-03-28 12:05:05 -07:00
Aradhya Bhatia
8c5fe7d88b drm/xe: Add Wa_16021333562 and Wa_14016712196
Wa_16021333562 and Wa_14016712196 are permanent workarounds that apply
to multiple platforms. Wa_16021333562 applies to platforms ranging from
TGL (12.00) to Xe_LPM (13.00), while Wa_14016712196 from DG2 (12.55) to
Xe_LPG (12.74).

Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
Signed-off-by: Aradhya Bhatia <aradhya.bhatia@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250220094645.358647-2-aradhya.bhatia@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-02-26 07:50:11 -08:00
Jesus Narvaez
ee5a1321df drm/xe/guc: Adding steering info support for GuC register lists
The guc_mmio_reg interface supports steering, but it is currently not
implemented. This will allow the GuC to control steering of MMIO
registers after save-restore and avoid reading from fused off MCR
register instances.

Fixes: 9c57bc0865 ("drm/xe/lnl: Drop force_probe requirement")
Signed-off-by: Jesus Narvaez <jesus.narvaez@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241212190100.3768068-1-jesus.narvaez@intel.com
2025-01-09 10:46:28 -08:00
Lucas De Marchi
3fcf68d739 drm/xe: Apply whitelist to engine save-restore
Instead of handling the whitelist directly in the GuC ADS
initialization, make it follow the same logic as other engine registers
that are save-restored. Main benefit is that then the SW tracking then
shows it in debugfs and there's no risk of an engine workaround to write
to the same nopriv register that is being passed directly to GuC.

This means that xe_reg_whitelist_process_engine() only has to process
the RTP and convert them to entries for the hwe.  With that all the
registers should be covered by xe_reg_sr_apply_mmio() to write to the HW
and there's no special handling in GuC ADS to also add these registers
to the list of registers that is passed to GuC.

Example for DG2:

	# cat  /sys/kernel/debug/dri/0000\:03\:00.0/gt0/register-save-restore
	...
	Engine
	rcs0
		...
		REG[0x24d0] clr=0xffffffff set=0x1000dafc masked=no mcr=no
		REG[0x24d4] clr=0xffffffff set=0x1000db01 masked=no mcr=no
		REG[0x24d8] clr=0xffffffff set=0x0000db1c masked=no mcr=no
	...
	Whitelist
	rcs0
		REG[0xdafc-0xdaff]: allow read access
		REG[0xdb00-0xdb1f]: allow read access
		REG[0xdb1c-0xdb1f]: allow rw access

v2:
  - Use ~0u for clr bits so it's just a write (Matt Roper)
  - Simplify helpers now that unused slots are not written

Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241209232739.147417-6-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2024-12-11 07:28:58 -08:00
Jonathan Cavitt
20124c3e22 drm/xe/xe_guc_ads: Add nonpriv registers to write list
When performing a guc_mmio_regset_write, we add all the registers in the
reg_sr list to the save/restore list, but do not do the same for the
nonpriv registers.  Add them in.

v2:
- Add all NONPRIV registers to avoid undefined behavior (Harrison)
- s/whitelist/nonpriv

v3:
- Rebase

Closes: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2249
Signed-off-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
CC: Lucas de Marchi <lucas.demarchi@intel.com>
CC: Matt Roper <matthew.d.roper@intel.com>
CC: John Harrison <john.c.harrison@intel.com>
CC: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
CC: Ashutosh Dixit <ashutosh.dixit@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241122180826.7075-1-jonathan.cavitt@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2024-12-03 14:02:59 -08:00
Lucas De Marchi
8262db9eff drm/xe: Move Wa 1607983814 to oob
needs_wa_1607983814() predates wa_oob, so it was not being printed
in /sys/kernel/debug/dri/0/*/workarounds. Port it to OOB rules.
This makes the WA show up in debugfs. For TGL:

	OOB Workarounds
		1607983814
		22012773006
		1409600907

Eventually the RTP infra may add support for writing registers in a
loop, which would allow to keep track of the registers as well. But for
now, just listing it as OOB workaround is already an improvement.

Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241029193258.749882-1-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2024-10-31 19:24:26 -07:00
Ashutosh Dixit
0191fddf53 Revert "drm/xe/xe_guc_ads: save/restore OA registers and allowlist regs"
This reverts commit 55858fa7eb.

'55858fa7eb2f ("drm/xe/xe_guc_ads: save/restore OA registers and allowlist
regs")' was not properly reviewed and also causes dmesg asserts in
CI. Revert it.

Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/3295
Fixes: 55858fa7eb ("drm/xe/xe_guc_ads: save/restore OA registers and allowlist regs")
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241029200147.1476513-1-ashutosh.dixit@intel.com
2024-10-30 12:49:18 -07:00
Jonathan Cavitt
55858fa7eb drm/xe/xe_guc_ads: save/restore OA registers and allowlist regs
Several OA registers and allowlist registers were missing from the
save/restore list for GuC and could be lost during an engine reset.  Add
them to the list.

v2:
- Fix commit message (Umesh)
- Add missing closes (Ashutosh)

v3:
- Add missing fixes (Ashutosh)

Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/2249
Fixes: dd08ebf6c3 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Suggested-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Suggested-by: John Harrison <john.c.harrison@intel.com>
Signed-off-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
CC: stable@vger.kernel.org # v6.11+
Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241023200716.82624-1-jonathan.cavitt@intel.com
2024-10-28 15:41:40 -07:00
Vinay Belgaumkar
61ef737db9 drm/xe/ptl: Apply Wa_14022866841
As part of this WA, GuC will hold a forcewake for certain
MMIO accesses outside the GT/media domains.

Cc: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241015234428.2004825-1-vinay.belgaumkar@intel.com
2024-10-17 11:20:21 -07:00
Zhanjun Dong
b170d696c1 drm/xe/guc: Add XE_LP steered register lists
Add the ability for runtime allocation and freeing of
steered register list extentions that depend on the
detected HW config fuses.

Signed-off-by: Zhanjun Dong <zhanjun.dong@intel.com>
Reviewed-by: Alan Previn <alan.previn.teres.alexis@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241004193428.3311145-3-zhanjun.dong@intel.com
2024-10-08 09:34:16 -07:00
Zhanjun Dong
9c8c7a7e6f drm/xe/guc: Prepare GuC register list and update ADS size for error capture
Add referenced registers defines and list of registers.
Update GuC ADS size allocation to include space for
the lists of error state capture register descriptors.

Then, populate GuC ADS with the lists of registers we want
GuC to report back to host on engine reset events. This list
should include global, engine-class and engine-instance
registers for every engine-class type on the current hardware.

Ensure we allocate a persistent storage for the register lists
that are populated into ADS so that we don't need to allocate
memory during GT resets when GuC is reloaded and ADS population
happens again.

Signed-off-by: Zhanjun Dong <zhanjun.dong@intel.com>
Reviewed-by: Alan Previn <alan.previn.teres.alexis@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241004193428.3311145-2-zhanjun.dong@intel.com
2024-10-08 09:34:04 -07:00
Francois Dugast
91b2c42c21 drm/xe: Use fault injection infrastructure to find issues at probe time
The kernel fault injection infrastructure is used to test proper error
handling during probe. The return code of the functions using
ALLOW_ERROR_INJECTION() can be conditionnally modified at runtime by
tuning some debugfs entries. This requires CONFIG_FUNCTION_ERROR_INJECTION
(among others).

One way to use fault injection at probe time by making each of those
functions fail one at a time is:

    FAILTYPE=fail_function
    DEVICE="0000:00:08.0" # depends on the system
    ERRNO=-12 # -ENOMEM, can depend on the function

    echo N > /sys/kernel/debug/$FAILTYPE/task-filter
    echo 100 > /sys/kernel/debug/$FAILTYPE/probability
    echo 0 > /sys/kernel/debug/$FAILTYPE/interval
    echo -1 > /sys/kernel/debug/$FAILTYPE/times
    echo 0 > /sys/kernel/debug/$FAILTYPE/space
    echo 1 > /sys/kernel/debug/$FAILTYPE/verbose

    modprobe xe
    echo $DEVICE > /sys/bus/pci/drivers/xe/unbind

    grep -oP "^.* \[xe\]" /sys/kernel/debug/$FAILTYPE/injectable | \
    cut -d ' ' -f 1 | while read -r FUNCTION ; do
        echo "Injecting fault in $FUNCTION"
        echo "" > /sys/kernel/debug/$FAILTYPE/inject
        echo $FUNCTION > /sys/kernel/debug/$FAILTYPE/inject
        printf %#x $ERRNO > /sys/kernel/debug/$FAILTYPE/$FUNCTION/retval
        echo $DEVICE > /sys/bus/pci/drivers/xe/bind
    done

    rmmod xe

It will also be integrated into IGT for systematic execution by CI.

v2: Wrappers are not needed in the cases covered by this patch, so
    remove them and use ALLOW_ERROR_INJECTION() directly.

v3: Document the use of fault injection at probe time in xe_pci_probe
    and refer to it where ALLOW_ERROR_INJECTION() is used.

Signed-off-by: Francois Dugast <francois.dugast@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240927151207.399354-1-francois.dugast@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2024-10-03 08:58:26 -04:00
Matt Roper
c18d4193b5 drm/xe/guc: Convert register access to use xe_mmio
Stop using GT pointers for register access.

v2:
 - Don't drop the _Generic wrapper macro for xe_mmio_wait32_not() yet.
   Defer that to the final patch of the series instead.  (Rodrigo)

Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240910234719.3335472-66-matthew.d.roper@intel.com
2024-09-11 15:32:50 -07:00
Julia Filipchuk
636cdf6fbd drm/xe/guc: Enable w/a 14022293748 and 22019794406
Enable workarounds for HW bug where render engine reset fails. Given
that we're bumping the minimum required GuC version to 70.29, we're
guaranteed to always have support for this KLV in the GuC.

v2: Enable KLV correctly for either workaround (Lucas)
v4: Add check for minimum supported GuC firmware version. Enable w/a for
hw version 20.01 too. (Daniele)
v5 (Daniele): remove now unneeded fw type and version checks (JohnH)

Signed-off-by: Julia Filipchuk <julia.filipchuk@intel.com>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240805205435.921921-1-daniele.ceraolospurio@intel.com
2024-08-08 13:47:27 -07:00
Michal Wajdeczko
26a22952c8 drm/xe: Don't rely on indirect includes from xe_mmio.h
These compilation units use udelay() or some GT oriented printk
functions without explicitly including proper header files, and
relying on #includes from the xe_mmio.h instead. Fix that.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Francois Dugast <francois.dugast@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240520181814.2392-3-michal.wajdeczko@intel.com
2024-05-22 12:11:26 +02:00
Niranjana Vishwanathapura
d6219e1cd5 drm/xe: Add Indirect Ring State support
When Indirect Ring State is enabled, the Ring Buffer state and
Batch Buffer state are context save/restored to/from Indirect
Ring State instead of the LRC. The Indirect Ring State is a 4K
page mapped in global GTT at a 4K aligned address. This address
is programmed in the INDIRECT_RING_STATE register of the
corresponding context's LRC.

v2: Fix kernel-doc, add bspec reference
v3: Fix typo in commit text

Bspec: 67296, 67139
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240507224255.5059-3-niranjana.vishwanathapura@intel.com
2024-05-08 14:48:30 -07:00
Lucas De Marchi
ee72842306 drm/xe/ads: Use flexible-array
Zero-length arrays are deprecated and flexible arrays should be
used instead: https://www.kernel.org/doc/html/v6.9-rc7/process/deprecated.html#zero-length-and-one-element-arrays

Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Julia Lawall <julia.lawall@inria.fr>
Closes: https://lore.kernel.org/r/202405051824.AmjAI5Pg-lkp@intel.com/
Fixes: dd08ebf6c3 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240506141917.205714-1-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2024-05-07 16:32:25 -07:00
Rodrigo Vivi
6b8ef44cc0 drm/xe: Introduce the wedged_mode debugfs
So, the wedged mode can be selected per device at runtime,
before the tests or before reproducing the issue.

v2: - s/busted/wedged
    - some locking consistency

v3: - remove mutex
    - toggle guc reset policy on any mode change

Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Alan Previn <alan.previn.teres.alexis@intel.com>
Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240423221817.1285081-4-rodrigo.vivi@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2024-04-24 12:12:58 -04:00
Rodrigo Vivi
8ed9aaae39 drm/xe: Force wedged state and block GT reset upon any GPU hang
In many validation situations when debugging GPU Hangs,
it is useful to preserve the GT situation from the moment
that the timeout occurred.

This patch introduces a module parameter that could be used
on situations like this.

If xe.wedged module parameter is set to 2, Xe will be declared
wedged on every single execution timeout (a.k.a. GPU hang) right
after devcoredump snapshot capture and without attempting any
kind of GT reset and blocking entirely any kind of execution.

v2: Really block gt_reset from guc side. (Lucas)
    s/wedged/busted (Lucas)

v3: - s/busted/wedged
    - Really use global_flags (Dafna)
    - More robust timeout handling when wedging it.

v4: A really robust clean exit done by Matt Brost.
    No more kernel warns on unbind.

v5: Simplify error message (Lucas)

Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Dafna Hirschfeld <dhirschfeld@habana.ai>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Alan Previn <alan.previn.teres.alexis@intel.com>
Cc: Himanshu Somaiya <himanshu.somaiya@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240423221817.1285081-3-rodrigo.vivi@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2024-04-24 12:12:58 -04:00
Vinay Belgaumkar
2817a1f1bf drm/xe/lnl: Apply GuC Wa_13011645652
Enable WA for a bug that could cause the C6 state machine to hang
during RC6 exit.

v2: Add comment clarifying the WA (John H)
v3: Add more details to the comment (John H)

Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240417054802.1766359-1-vinay.belgaumkar@intel.com
2024-04-17 15:21:12 -07:00
John Harrison
b7f888ee9c drm/xe/lnl: Enable more GuC based workarounds
There are a couple of new workarounds for LNL that are implemented in
the GuC firmware. The KMD needs to enable them explicitly.

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240410002646.3002394-2-John.C.Harrison@Intel.com
2024-04-16 10:50:36 -07:00
Badal Nilawar
c151ff5c90 drm/xe/lnl: Enable GuC Wa_14019882105
Enable GuC Wa_14019882105 to block interrupts during C6 flow
when the memory path has been blocked

v2: Make helper function generic and name it as
    guc_waklv_enable_simple (John Harrison)
v3: Make warning descriptive (John Harrison)
v4: s/drm_WARN/xe_gt_WARN/ (Michal)

Cc: John Harrison <john.harrison@intel.com>
Signed-off-by: Badal Nilawar <badal.nilawar@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240405084231.3620848-3-badal.nilawar@intel.com
2024-04-09 12:54:04 +02:00
Badal Nilawar
d6da81a478 drm/xe/guc: Add support for workaround KLVs
To prevent running out of bits, new workaround (w/a) enable flags are
being added via a KLV system instead of a 32 bit flags word.

v2: GuC version check > 70.10 is not needed as base line xe doesnot
    support anything below < 70.19
v3: Use 64 bit ggtt address for future
    compatibility (John Harrison/Daniele)
v4: %s/PAGE_SIZE/SZ_4K/ (Michal)

Cc: John Harrison <John.C.Harrison@intel.com>
Signed-off-by: Badal Nilawar <badal.nilawar@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240405084231.3620848-2-badal.nilawar@intel.com
2024-04-09 12:54:02 +02:00
Lucas De Marchi
62742d1266 drm/xe: Normalize bo flags macros
The flags stored in the BO grew over time without following
much a naming pattern. First of all, get rid of the _BIT suffix that was
banned from everywhere else due to the guideline in
drivers/gpu/drm/i915/i915_reg.h that xe kind of follows:

	Define bits using ``REG_BIT(N)``. Do **not** add ``_BIT`` suffix to the name.

Here the flags aren't for a register, but it's good practice to keep it
consistent.

Second divergence on names is the use or not of "CREATE". This is
because most of the flags are passed to xe_bo_create*() family of
functions, changing its behavior. However, since the flags are also
stored in the bo itself and checked elsewhere in the code, it seems
better to just omit the CREATE part.

With those 2 guidelines, all the flags are given the form
XE_BO_FLAG_<FLAG_NAME> with the following commands:

	git grep -le "XE_BO_" -- drivers/gpu/drm/xe | xargs sed -i \
		-e "s/XE_BO_\([_A-Z0-9]*\)_BIT/XE_BO_\1/g" \
		-e 's/XE_BO_CREATE_/XE_BO_FLAG_/g'
	git grep -le "XE_BO_" -- drivers/gpu/drm/xe | xargs sed -i -r \
		-e 's/XE_BO_(DEFER_BACKING|SCANOUT|FIXED_PLACEMENT|PAGETABLE|NEEDS_CPU_ACCESS|NEEDS_UC|INTERNAL_TEST|INTERNAL_64K|GGTT_INVALIDATE)/XE_BO_FLAG_\1/g'

And then the defines in drivers/gpu/drm/xe/xe_bo.h are adjusted to
follow the coding style.

Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240322142702.186529-3-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2024-04-02 10:33:57 -07:00
Matthew Brost
231c411087 drm/xe: Add XE_BO_GGTT_INVALIDATE flag
Add XE_BO_GGTT_INVALIDATE flag which indicates the GGTT should be
invalidated when a BO is added / removed from the GGTT. This is
typically set when a BO is used by the GuC as the GuC has GGTT TLBs.

Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
[mlankhorst: Small fix to only inherit GGTT_INVALIDATE from src bo]
[mlankhorst: Remove _BIT from name]
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240306052002.311196-4-matthew.brost@intel.com
2024-03-20 10:49:14 +01:00
Michał Winiarski
a44bbace73 drm/xe/guc: Allocate GuC data structures in system memory for initial load
GuC load will need to happen at an earlier point in probe, where local
memory is not yet available. Use system memory for GuC data structures
used for initial "hwconfig" load, and realloc at a later,
"post-hwconfig" load if needed, when local memory is available.

Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240219130530.1406044-1-michal.winiarski@intel.com
2024-02-20 14:13:42 -05:00
Lucas De Marchi
5a92da34dd drm/xe: Rename info.supports_* to info.has_*
Rename supports_mmio_ext and supports_usm to use a has_ prefix so the
flags are grouped together. This settles on just one variant for
positive info matching ("has_") and one for negative ("skip_").

Also make sure the has_* flags are grouped together in xe_pci.c.

Reviewed-by: Koby Elbaz <kelbaz@habana.ai>
Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com>
Link: https://lore.kernel.org/r/20231205145235.2114761-1-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-21 11:45:27 -05:00
Niranjana Vishwanathapura
0d97ecce16 drm/xe: Enable Fixed CCS mode setting
Disable dynamic HW load balancing of compute resource assignment
to engines and instead enabled fixed mode of mapping compute
resources to engines on all platforms with more than one compute
engine.

By default enable only one CCS engine with all compute slices
assigned to it. This is the desired configuration for common
workloads.

PVC platform supports only the fixed CCS mode (workaround 16016805146).

v2: Rebase, make it platform agnostic
v3: Minor code refactoring

Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-21 11:45:26 -05:00
Michał Winiarski
0e1a47fcab drm/xe: Add a helper for DRM device-lifetime BO create
A helper for managed BO allocations makes it possible to remove specific
"fini" actions and will simplify the following patches adding ability to
execute a release action for specific BO directly.

Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-21 11:45:11 -05:00
Matt Roper
83af834e71 drm/xe/mocs: MOCS registers are multicast on Xe_HP and beyond
The MOCS registers should be written in an MCR-specific manner on Xe_HP
and beyond to prevent any other driver threads or external firmware from
putting the hardware into unicast mode while we initialize the MOCS
table.

Bspec: 66534, 67609, 71185
Cc: Ruthuvikas Ravikumar <ruthuvikas.ravikumar@intel.com>
Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com>
Link: https://lore.kernel.org/r/20231023204112.2856331-2-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-21 11:43:22 -05:00