linux

mirror of https://github.com/torvalds/linux.git synced 2026-04-21 08:13:56 -04:00

Author	SHA1	Message	Date
Lucas De Marchi	c31a0b6402	drm/xe: Set LRC addresses before guc load The metadata saved in the ADS is read by GuC when it's initialized. Saving the addresses to the LRCs when they are populated is too late as GuC will keep using the old ones. This was causing GuC to use the RCS LRC for any engine class. It's not a big problem on a Linux-only scenario since the they are used by GuC only on media engines when the watchdog is triggered. However, in a virtualization scenario with Windows as the VF, it causes the wrong LRCs to be loaded as the watchdog is used for all engines. Fix it by letting guc_golden_lrc_init() initialize the metadata, like other *_init() functions, and later guc_golden_lrc_populate() to copy the LRCs to the right places. The former is called before the second GuC load, while the latter is called after LRCs have been recorded. Cc: Chee Yin Wong <chee.yin.wong@intel.com> Cc: John Harrison <john.c.harrison@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Fixes: `dd08ebf6c3` ("drm/xe: Introduce a new DRM driver for Intel GPUs") Cc: <stable@vger.kernel.org> # v6.11+ Reviewed-by: Matthew Brost <matthew.brost@intel.com> Tested-by: Chee Yin Wong <chee.yin.wong@intel.com> Link: https://lore.kernel.org/r/20250409-fix-guc-ads-v1-1-494135f7a5d0@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-04-14 05:49:53 -07:00
John Harrison	d3e8349edf	drm/xe/guc: Enable w/a 16026508708 The workaround is only relevant to SRIOV but does affect all platforms. Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://lore.kernel.org/r/20250403185619.1555853-2-John.C.Harrison@Intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-04-10 14:09:35 -07:00
Matthew Brost	045448da87	drm/xe: Add XE_BO_FLAG_PINNED_NORESTORE Not all BOs need to be restored on resume / d3cold exit, add XE_BO_FLAG_PINNED_NO_RESTORE which skips restoring of BOs rather just allocates VRAM for the BO. This should slightly speedup resume / d3cold exit flows. Marking GuC ADS, GuC CT, GuC log, GuC PC, and SA as NORESTORE. v2: - s/WONTNEED/NORESTORE (Vivi) - Rebase on newly added g2g and backup object flow Signed-off-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Satyanarayana K V P <satyanarayana.k.v.p@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://lore.kernel.org/r/20250403102440.266113-11-matthew.auld@intel.com	2025-04-04 11:41:01 +01:00
John Harrison	ce22fccd07	drm/xe/guc: Re-word message about ADS size changes The error capture list in the ADS is initially allocated using a placeholder size. When the actual size is determinied later on, there is a debug print about the new size. However, the wording is such that some people see it as an unexpected thing and therefore a potential problem. So re-word it to be a little less concerning. Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250325203211.3907890-1-John.C.Harrison@Intel.com	2025-03-28 12:05:05 -07:00
Aradhya Bhatia	8c5fe7d88b	drm/xe: Add Wa_16021333562 and Wa_14016712196 Wa_16021333562 and Wa_14016712196 are permanent workarounds that apply to multiple platforms. Wa_16021333562 applies to platforms ranging from TGL (12.00) to Xe_LPM (13.00), while Wa_14016712196 from DG2 (12.55) to Xe_LPG (12.74). Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com> Signed-off-by: Aradhya Bhatia <aradhya.bhatia@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250220094645.358647-2-aradhya.bhatia@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-02-26 07:50:11 -08:00
Jesus Narvaez	ee5a1321df	drm/xe/guc: Adding steering info support for GuC register lists The guc_mmio_reg interface supports steering, but it is currently not implemented. This will allow the GuC to control steering of MMIO registers after save-restore and avoid reading from fused off MCR register instances. Fixes: `9c57bc0865` ("drm/xe/lnl: Drop force_probe requirement") Signed-off-by: Jesus Narvaez <jesus.narvaez@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241212190100.3768068-1-jesus.narvaez@intel.com	2025-01-09 10:46:28 -08:00
Lucas De Marchi	3fcf68d739	drm/xe: Apply whitelist to engine save-restore Instead of handling the whitelist directly in the GuC ADS initialization, make it follow the same logic as other engine registers that are save-restored. Main benefit is that then the SW tracking then shows it in debugfs and there's no risk of an engine workaround to write to the same nopriv register that is being passed directly to GuC. This means that xe_reg_whitelist_process_engine() only has to process the RTP and convert them to entries for the hwe. With that all the registers should be covered by xe_reg_sr_apply_mmio() to write to the HW and there's no special handling in GuC ADS to also add these registers to the list of registers that is passed to GuC. Example for DG2: # cat /sys/kernel/debug/dri/0000\:03\:00.0/gt0/register-save-restore ... Engine rcs0 ... REG[0x24d0] clr=0xffffffff set=0x1000dafc masked=no mcr=no REG[0x24d4] clr=0xffffffff set=0x1000db01 masked=no mcr=no REG[0x24d8] clr=0xffffffff set=0x0000db1c masked=no mcr=no ... Whitelist rcs0 REG[0xdafc-0xdaff]: allow read access REG[0xdb00-0xdb1f]: allow read access REG[0xdb1c-0xdb1f]: allow rw access v2: - Use ~0u for clr bits so it's just a write (Matt Roper) - Simplify helpers now that unused slots are not written Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241209232739.147417-6-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2024-12-11 07:28:58 -08:00
Jonathan Cavitt	20124c3e22	drm/xe/xe_guc_ads: Add nonpriv registers to write list When performing a guc_mmio_regset_write, we add all the registers in the reg_sr list to the save/restore list, but do not do the same for the nonpriv registers. Add them in. v2: - Add all NONPRIV registers to avoid undefined behavior (Harrison) - s/whitelist/nonpriv v3: - Rebase Closes: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2249 Signed-off-by: Jonathan Cavitt <jonathan.cavitt@intel.com> CC: Lucas de Marchi <lucas.demarchi@intel.com> CC: Matt Roper <matthew.d.roper@intel.com> CC: John Harrison <john.c.harrison@intel.com> CC: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> CC: Ashutosh Dixit <ashutosh.dixit@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241122180826.7075-1-jonathan.cavitt@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2024-12-03 14:02:59 -08:00
Lucas De Marchi	8262db9eff	drm/xe: Move Wa 1607983814 to oob needs_wa_1607983814() predates wa_oob, so it was not being printed in /sys/kernel/debug/dri/0/*/workarounds. Port it to OOB rules. This makes the WA show up in debugfs. For TGL: OOB Workarounds 1607983814 22012773006 1409600907 Eventually the RTP infra may add support for writing registers in a loop, which would allow to keep track of the registers as well. But for now, just listing it as OOB workaround is already an improvement. Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241029193258.749882-1-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2024-10-31 19:24:26 -07:00
Ashutosh Dixit	0191fddf53	Revert "drm/xe/xe_guc_ads: save/restore OA registers and allowlist regs" This reverts commit `55858fa7eb`. '55858fa7eb2f ("drm/xe/xe_guc_ads: save/restore OA registers and allowlist regs")' was not properly reviewed and also causes dmesg asserts in CI. Revert it. Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/3295 Fixes: `55858fa7eb` ("drm/xe/xe_guc_ads: save/restore OA registers and allowlist regs") Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241029200147.1476513-1-ashutosh.dixit@intel.com	2024-10-30 12:49:18 -07:00
Jonathan Cavitt	55858fa7eb	drm/xe/xe_guc_ads: save/restore OA registers and allowlist regs Several OA registers and allowlist registers were missing from the save/restore list for GuC and could be lost during an engine reset. Add them to the list. v2: - Fix commit message (Umesh) - Add missing closes (Ashutosh) v3: - Add missing fixes (Ashutosh) Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/2249 Fixes: `dd08ebf6c3` ("drm/xe: Introduce a new DRM driver for Intel GPUs") Suggested-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Suggested-by: John Harrison <john.c.harrison@intel.com> Signed-off-by: Jonathan Cavitt <jonathan.cavitt@intel.com> CC: stable@vger.kernel.org # v6.11+ Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241023200716.82624-1-jonathan.cavitt@intel.com	2024-10-28 15:41:40 -07:00
Vinay Belgaumkar	61ef737db9	drm/xe/ptl: Apply Wa_14022866841 As part of this WA, GuC will hold a forcewake for certain MMIO accesses outside the GT/media domains. Cc: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241015234428.2004825-1-vinay.belgaumkar@intel.com	2024-10-17 11:20:21 -07:00
Zhanjun Dong	b170d696c1	drm/xe/guc: Add XE_LP steered register lists Add the ability for runtime allocation and freeing of steered register list extentions that depend on the detected HW config fuses. Signed-off-by: Zhanjun Dong <zhanjun.dong@intel.com> Reviewed-by: Alan Previn <alan.previn.teres.alexis@intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241004193428.3311145-3-zhanjun.dong@intel.com	2024-10-08 09:34:16 -07:00
Zhanjun Dong	9c8c7a7e6f	drm/xe/guc: Prepare GuC register list and update ADS size for error capture Add referenced registers defines and list of registers. Update GuC ADS size allocation to include space for the lists of error state capture register descriptors. Then, populate GuC ADS with the lists of registers we want GuC to report back to host on engine reset events. This list should include global, engine-class and engine-instance registers for every engine-class type on the current hardware. Ensure we allocate a persistent storage for the register lists that are populated into ADS so that we don't need to allocate memory during GT resets when GuC is reloaded and ADS population happens again. Signed-off-by: Zhanjun Dong <zhanjun.dong@intel.com> Reviewed-by: Alan Previn <alan.previn.teres.alexis@intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241004193428.3311145-2-zhanjun.dong@intel.com	2024-10-08 09:34:04 -07:00
Francois Dugast	91b2c42c21	drm/xe: Use fault injection infrastructure to find issues at probe time The kernel fault injection infrastructure is used to test proper error handling during probe. The return code of the functions using ALLOW_ERROR_INJECTION() can be conditionnally modified at runtime by tuning some debugfs entries. This requires CONFIG_FUNCTION_ERROR_INJECTION (among others). One way to use fault injection at probe time by making each of those functions fail one at a time is: FAILTYPE=fail_function DEVICE="0000:00:08.0" # depends on the system ERRNO=-12 # -ENOMEM, can depend on the function echo N > /sys/kernel/debug/$FAILTYPE/task-filter echo 100 > /sys/kernel/debug/$FAILTYPE/probability echo 0 > /sys/kernel/debug/$FAILTYPE/interval echo -1 > /sys/kernel/debug/$FAILTYPE/times echo 0 > /sys/kernel/debug/$FAILTYPE/space echo 1 > /sys/kernel/debug/$FAILTYPE/verbose modprobe xe echo $DEVICE > /sys/bus/pci/drivers/xe/unbind grep -oP "^.* \[xe\]" /sys/kernel/debug/$FAILTYPE/injectable \| \ cut -d ' ' -f 1 \| while read -r FUNCTION ; do echo "Injecting fault in $FUNCTION" echo "" > /sys/kernel/debug/$FAILTYPE/inject echo $FUNCTION > /sys/kernel/debug/$FAILTYPE/inject printf %#x $ERRNO > /sys/kernel/debug/$FAILTYPE/$FUNCTION/retval echo $DEVICE > /sys/bus/pci/drivers/xe/bind done rmmod xe It will also be integrated into IGT for systematic execution by CI. v2: Wrappers are not needed in the cases covered by this patch, so remove them and use ALLOW_ERROR_INJECTION() directly. v3: Document the use of fault injection at probe time in xe_pci_probe and refer to it where ALLOW_ERROR_INJECTION() is used. Signed-off-by: Francois Dugast <francois.dugast@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Jani Nikula <jani.nikula@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240927151207.399354-1-francois.dugast@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2024-10-03 08:58:26 -04:00
Matt Roper	c18d4193b5	drm/xe/guc: Convert register access to use xe_mmio Stop using GT pointers for register access. v2: - Don't drop the _Generic wrapper macro for xe_mmio_wait32_not() yet. Defer that to the final patch of the series instead. (Rodrigo) Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240910234719.3335472-66-matthew.d.roper@intel.com	2024-09-11 15:32:50 -07:00
Julia Filipchuk	636cdf6fbd	drm/xe/guc: Enable w/a 14022293748 and 22019794406 Enable workarounds for HW bug where render engine reset fails. Given that we're bumping the minimum required GuC version to 70.29, we're guaranteed to always have support for this KLV in the GuC. v2: Enable KLV correctly for either workaround (Lucas) v4: Add check for minimum supported GuC firmware version. Enable w/a for hw version 20.01 too. (Daniele) v5 (Daniele): remove now unneeded fw type and version checks (JohnH) Signed-off-by: Julia Filipchuk <julia.filipchuk@intel.com> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240805205435.921921-1-daniele.ceraolospurio@intel.com	2024-08-08 13:47:27 -07:00
Michal Wajdeczko	26a22952c8	drm/xe: Don't rely on indirect includes from xe_mmio.h These compilation units use udelay() or some GT oriented printk functions without explicitly including proper header files, and relying on #includes from the xe_mmio.h instead. Fix that. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Francois Dugast <francois.dugast@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240520181814.2392-3-michal.wajdeczko@intel.com	2024-05-22 12:11:26 +02:00
Niranjana Vishwanathapura	d6219e1cd5	drm/xe: Add Indirect Ring State support When Indirect Ring State is enabled, the Ring Buffer state and Batch Buffer state are context save/restored to/from Indirect Ring State instead of the LRC. The Indirect Ring State is a 4K page mapped in global GTT at a 4K aligned address. This address is programmed in the INDIRECT_RING_STATE register of the corresponding context's LRC. v2: Fix kernel-doc, add bspec reference v3: Fix typo in commit text Bspec: 67296, 67139 Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240507224255.5059-3-niranjana.vishwanathapura@intel.com	2024-05-08 14:48:30 -07:00
Lucas De Marchi	ee72842306	drm/xe/ads: Use flexible-array Zero-length arrays are deprecated and flexible arrays should be used instead: https://www.kernel.org/doc/html/v6.9-rc7/process/deprecated.html#zero-length-and-one-element-arrays Reported-by: kernel test robot <lkp@intel.com> Reported-by: Julia Lawall <julia.lawall@inria.fr> Closes: https://lore.kernel.org/r/202405051824.AmjAI5Pg-lkp@intel.com/ Fixes: `dd08ebf6c3` ("drm/xe: Introduce a new DRM driver for Intel GPUs") Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240506141917.205714-1-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2024-05-07 16:32:25 -07:00
Rodrigo Vivi	6b8ef44cc0	drm/xe: Introduce the wedged_mode debugfs So, the wedged mode can be selected per device at runtime, before the tests or before reproducing the issue. v2: - s/busted/wedged - some locking consistency v3: - remove mutex - toggle guc reset policy on any mode change Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Alan Previn <alan.previn.teres.alexis@intel.com> Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240423221817.1285081-4-rodrigo.vivi@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2024-04-24 12:12:58 -04:00
Rodrigo Vivi	8ed9aaae39	drm/xe: Force wedged state and block GT reset upon any GPU hang In many validation situations when debugging GPU Hangs, it is useful to preserve the GT situation from the moment that the timeout occurred. This patch introduces a module parameter that could be used on situations like this. If xe.wedged module parameter is set to 2, Xe will be declared wedged on every single execution timeout (a.k.a. GPU hang) right after devcoredump snapshot capture and without attempting any kind of GT reset and blocking entirely any kind of execution. v2: Really block gt_reset from guc side. (Lucas) s/wedged/busted (Lucas) v3: - s/busted/wedged - Really use global_flags (Dafna) - More robust timeout handling when wedging it. v4: A really robust clean exit done by Matt Brost. No more kernel warns on unbind. v5: Simplify error message (Lucas) Cc: Matthew Brost <matthew.brost@intel.com> Cc: Dafna Hirschfeld <dhirschfeld@habana.ai> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Alan Previn <alan.previn.teres.alexis@intel.com> Cc: Himanshu Somaiya <himanshu.somaiya@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240423221817.1285081-3-rodrigo.vivi@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2024-04-24 12:12:58 -04:00
Vinay Belgaumkar	2817a1f1bf	drm/xe/lnl: Apply GuC Wa_13011645652 Enable WA for a bug that could cause the C6 state machine to hang during RC6 exit. v2: Add comment clarifying the WA (John H) v3: Add more details to the comment (John H) Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240417054802.1766359-1-vinay.belgaumkar@intel.com	2024-04-17 15:21:12 -07:00
John Harrison	b7f888ee9c	drm/xe/lnl: Enable more GuC based workarounds There are a couple of new workarounds for LNL that are implemented in the GuC firmware. The KMD needs to enable them explicitly. Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240410002646.3002394-2-John.C.Harrison@Intel.com	2024-04-16 10:50:36 -07:00
Badal Nilawar	c151ff5c90	drm/xe/lnl: Enable GuC Wa_14019882105 Enable GuC Wa_14019882105 to block interrupts during C6 flow when the memory path has been blocked v2: Make helper function generic and name it as guc_waklv_enable_simple (John Harrison) v3: Make warning descriptive (John Harrison) v4: s/drm_WARN/xe_gt_WARN/ (Michal) Cc: John Harrison <john.harrison@intel.com> Signed-off-by: Badal Nilawar <badal.nilawar@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240405084231.3620848-3-badal.nilawar@intel.com	2024-04-09 12:54:04 +02:00
Badal Nilawar	d6da81a478	drm/xe/guc: Add support for workaround KLVs To prevent running out of bits, new workaround (w/a) enable flags are being added via a KLV system instead of a 32 bit flags word. v2: GuC version check > 70.10 is not needed as base line xe doesnot support anything below < 70.19 v3: Use 64 bit ggtt address for future compatibility (John Harrison/Daniele) v4: %s/PAGE_SIZE/SZ_4K/ (Michal) Cc: John Harrison <John.C.Harrison@intel.com> Signed-off-by: Badal Nilawar <badal.nilawar@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240405084231.3620848-2-badal.nilawar@intel.com	2024-04-09 12:54:02 +02:00
Lucas De Marchi	62742d1266	drm/xe: Normalize bo flags macros The flags stored in the BO grew over time without following much a naming pattern. First of all, get rid of the _BIT suffix that was banned from everywhere else due to the guideline in drivers/gpu/drm/i915/i915_reg.h that xe kind of follows: Define bits using ``REG_BIT(N)``. Do not add ``_BIT`` suffix to the name. Here the flags aren't for a register, but it's good practice to keep it consistent. Second divergence on names is the use or not of "CREATE". This is because most of the flags are passed to xe_bo_create() family of functions, changing its behavior. However, since the flags are also stored in the bo itself and checked elsewhere in the code, it seems better to just omit the CREATE part. With those 2 guidelines, all the flags are given the form XE_BO_FLAG_<FLAG_NAME> with the following commands: git grep -le "XE_BO_" -- drivers/gpu/drm/xe \| xargs sed -i \ -e "s/XE_BO_$[_A-Z0-9]$_BIT/XE_BO_\1/g" \ -e 's/XE_BO_CREATE_/XE_BO_FLAG_/g' git grep -le "XE_BO_" -- drivers/gpu/drm/xe \| xargs sed -i -r \ -e 's/XE_BO_(DEFER_BACKING\|SCANOUT\|FIXED_PLACEMENT\|PAGETABLE\|NEEDS_CPU_ACCESS\|NEEDS_UC\|INTERNAL_TEST\|INTERNAL_64K\|GGTT_INVALIDATE)/XE_BO_FLAG_\1/g' And then the defines in drivers/gpu/drm/xe/xe_bo.h are adjusted to follow the coding style. Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240322142702.186529-3-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2024-04-02 10:33:57 -07:00
Matthew Brost	231c411087	drm/xe: Add XE_BO_GGTT_INVALIDATE flag Add XE_BO_GGTT_INVALIDATE flag which indicates the GGTT should be invalidated when a BO is added / removed from the GGTT. This is typically set when a BO is used by the GuC as the GuC has GGTT TLBs. Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> [mlankhorst: Small fix to only inherit GGTT_INVALIDATE from src bo] [mlankhorst: Remove _BIT from name] Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240306052002.311196-4-matthew.brost@intel.com	2024-03-20 10:49:14 +01:00
Michał Winiarski	a44bbace73	drm/xe/guc: Allocate GuC data structures in system memory for initial load GuC load will need to happen at an earlier point in probe, where local memory is not yet available. Use system memory for GuC data structures used for initial "hwconfig" load, and realloc at a later, "post-hwconfig" load if needed, when local memory is available. Signed-off-by: Michał Winiarski <michal.winiarski@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240219130530.1406044-1-michal.winiarski@intel.com	2024-02-20 14:13:42 -05:00
Lucas De Marchi	5a92da34dd	drm/xe: Rename info.supports_* to info.has_* Rename supports_mmio_ext and supports_usm to use a has_ prefix so the flags are grouped together. This settles on just one variant for positive info matching ("has_") and one for negative ("skip_"). Also make sure the has_* flags are grouped together in xe_pci.c. Reviewed-by: Koby Elbaz <kelbaz@habana.ai> Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com> Link: https://lore.kernel.org/r/20231205145235.2114761-1-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2023-12-21 11:45:27 -05:00
Niranjana Vishwanathapura	0d97ecce16	drm/xe: Enable Fixed CCS mode setting Disable dynamic HW load balancing of compute resource assignment to engines and instead enabled fixed mode of mapping compute resources to engines on all platforms with more than one compute engine. By default enable only one CCS engine with all compute slices assigned to it. This is the desired configuration for common workloads. PVC platform supports only the fixed CCS mode (workaround 16016805146). v2: Rebase, make it platform agnostic v3: Minor code refactoring Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2023-12-21 11:45:26 -05:00
Michał Winiarski	0e1a47fcab	drm/xe: Add a helper for DRM device-lifetime BO create A helper for managed BO allocations makes it possible to remove specific "fini" actions and will simplify the following patches adding ability to execute a release action for specific BO directly. Signed-off-by: Michał Winiarski <michal.winiarski@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2023-12-21 11:45:11 -05:00
Matt Roper	83af834e71	drm/xe/mocs: MOCS registers are multicast on Xe_HP and beyond The MOCS registers should be written in an MCR-specific manner on Xe_HP and beyond to prevent any other driver threads or external firmware from putting the hardware into unicast mode while we initialize the MOCS table. Bspec: 66534, 67609, 71185 Cc: Ruthuvikas Ravikumar <ruthuvikas.ravikumar@intel.com> Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com> Link: https://lore.kernel.org/r/20231023204112.2856331-2-matthew.d.roper@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2023-12-21 11:43:22 -05:00
Francois Dugast	c73acc1eeb	drm/xe: Use Xe assert macros instead of XE_WARN_ON macro The XE_WARN_ON macro maps to WARN_ON which is not justified in many cases where only a simple debug check is needed. Replace the use of the XE_WARN_ON macro with the new xe_assert macros which relies on drm_*. This takes a struct drm_device argument, which is one of the main changes in this commit. The other main change is that the condition is reversed, as with XE_WARN_ON a message is displayed if the condition is true, whereas with xe_assert it is if the condition is false. v2: - Rebase - Keep WARN splats in xe_wopcm.c (Matt Roper) v3: - Rebase Signed-off-by: Francois Dugast <francois.dugast@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2023-12-21 11:41:08 -05:00
Daniele Ceraolo Spurio	296549107e	drm/xe: base definitions for the GSCCS The first step in introducing the GSCCS is to add all the basic defs for it (name, mmio base, class/instance, lrc size etc). Bspec: 60149, 60421, 63752 Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://lore.kernel.org/r/20230817201831.1583172-3-daniele.ceraolospurio@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2023-12-21 11:40:20 -05:00
Francois Dugast	9b9529ce37	drm/xe: Rename engine to exec_queue Engine was inappropriately used to refer to execution queues and it also created some confusion with hardware engines. Where it applies the exec_queue variable name is changed to q and comments are also updated. Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/162 Signed-off-by: Francois Dugast <francois.dugast@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2023-12-21 11:39:20 -05:00
Francois Dugast	99fea68288	drm/xe: Prefer WARN() over BUG() to avoid crashing the kernel Replace calls to XE_BUG_ON() with calls XE_WARN_ON() which in turn calls WARN() instead of BUG(). BUG() crashes the kernel and should only be used when it is absolutely unavoidable in case of catastrophic and unrecoverable failures, which is not the case here. Signed-off-by: Francois Dugast <francois.dugast@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2023-12-21 11:39:17 -05:00
Francois Dugast	3e8e7ee6a3	drm/xe: Cleanup style warnings Reduce the number of warnings reported by checkpatch.pl from 118 to 48 by addressing those warnings types: LEADING_SPACE LINE_SPACING BRACES TRAILING_SEMICOLON CONSTANT_COMPARISON BLOCK_COMMENT_STYLE RETURN_VOID ONE_SEMICOLON SUSPECT_CODE_INDENT LINE_CONTINUATIONS UNNECESSARY_ELSE UNSPECIFIED_INT UNNECESSARY_INT MISORDERED_TYPE Signed-off-by: Francois Dugast <francois.dugast@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2023-12-21 11:37:31 -05:00
Matt Roper	876611c2b7	drm/xe: Memory allocations are tile-based, not GT-based Since memory and address spaces are a tile concept rather than a GT concept, we need to plumb tile-based handling through lots of memory-related code. Note that one remaining shortcoming here that will need to be addressed before media GT support can be re-enabled is that although the address space is shared between a tile's GTs, each GT caches the PTEs independently in their own TLB and thus TLB invalidation should be handled at the GT level. v2: - Fix kunit test build. Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20230601215244.678611-13-matthew.d.roper@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2023-12-19 18:34:14 -05:00
Lucas De Marchi	ee21379acc	drm/xe: Rename reg field to addr Rename the address field to "addr" rather than "reg" so it's easier to understand what it is. Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://lore.kernel.org/r/20230508225322.2692066-4-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2023-12-19 18:33:50 -05:00
Lucas De Marchi	ce8bf5bd05	drm/xe/mmio: Use struct xe_reg Convert all the callers to deal with xe_mmio_*() using struct xe_reg instead of plain u32. In a few places there was also a rename s/reg/reg_val/ when dealing with the value returned so it doesn't get mixed up with the register address. Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://lore.kernel.org/r/20230508225322.2692066-2-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2023-12-19 18:33:49 -05:00
Lucas De Marchi	98ce59e9ba	drm/xe/guc: Handle RCU_MODE as masked from definition guc_mmio_regset_write() had a flags for the registers to be added to the GuC's regset list. The only register actually using that was RCU_MODE, but it was setting the flags to a bogus value. From struct xe_guc_fwif.h, #define GUC_REGSET_MASKED BIT(0) #define GUC_REGSET_MASKED_WITH_VALUE BIT(2) #define GUC_REGSET_RESTORE_ONLY BIT(3) Cross checking with i915, the only flag to set in RCU_MODE is GUC_REGSET_MASKED. That can be done automatically from the register, as long as the definition is correct. Add the XE_REG_OPTION_MASKED annotation to RCU_MODE and kill the "flags" field in guc_mmio_regset_write(): guc_mmio_regset_write_one() can decide that based on the register being passed. Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://lore.kernel.org/r/20230429062332.354139-3-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2023-12-19 18:33:14 -05:00
Lucas De Marchi	07fbd1f85d	drm/xe: Plumb xe_reg into WAs, rtp, etc Now that struct xe_reg and struct xe_reg_mcr are types that can be used by xe, convert more of the driver to use them. Some notes about the conversions: - The RTP tables don't need the MASKED flags anymore in the actions as that information now comes from the register definition - There is no need for the _XE_RTP_REG/_XE_RTP_REG_MCR macros and the register types on RTP infra: that comes from the register definitions. - When declaring the RTP entries, there is no need anymore to undef XE_REG and friends: the RTP macros deal with removing the cast where needed due to not being able to use a compound statement for initialization in the tables - The index in the reg-sr xarray is the register offset only. Otherwise we wouldn't catch mistakes about adding both a MCR-style and normal-style registers. For that, the register is now also part of the entry, so the options can be compared to check for compatible entries. In order to be able to accomplish this, some improvements are needed on the RTP macros. Change its implementation to concentrate on "pasting a prefix to each argument" rather than the more general "call any macro for each argument". Hopefully this will avoid trying to extend this infra and making it more complex. With the use of tuples for building the arguments, it's not possible to pass additional register fields and using xe_reg in the RTP tables. xe_mmio_* still need to be converted, from u32 to xe_reg, but that is left for another change. Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://lore.kernel.org/r/20230427223256.1432787-10-lucas.demarchi@intel.com Link: https://lore.kernel.org/r/20230427223256.1432787-6-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2023-12-19 18:32:21 -05:00
Lucas De Marchi	d9b79ad275	drm/xe: Drop gen afixes from registers The defines for the registers were brought over from i915 while bootstrapping the driver. As xe supports TGL and later only, it doesn't make sense to keep the GEN* prefixes and suffixes in the registers: TGL is graphics version 12, previously called "GEN12". So drop the prefix everywhere. v2: - Also drop _TGL suffix and reword commit message as suggested by Matt Roper. While at it, rename VSUNIT_CLKGATE_DIS_TGL to VSUNIT_CLKGATE2_DIS with the additional "2", so it doesn't clash with the define for the other register Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://lore.kernel.org/r/20230427223256.1432787-3-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2023-12-19 18:32:15 -05:00
Lucas De Marchi	7b829f6dd6	drm/xe/guc: Convert GuC registers to REG_FIELD/REG_BIT Cleanup GuC register declarations by converting them to use REG_FIELD, REG_BIT and REG_GENMASK. While converting, also reorder the bitfields so they follow the convention of declaring the higher bits first. v2: - Drop unused HUC_LOADING_AGENT_VCR and DMA_ADDRESS_SPACE_GTT (Matt Roper) - Simplify HUC_LOADING_AGENT_GUC define (Matt Roper) Reviewed-by: Matt Atwood <matthew.s.atwood@intel.com> Link: https://lore.kernel.org/r/20230427223256.1432787-2-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2023-12-19 18:32:15 -05:00
Lucas De Marchi	a9b1a13614	drm/xe/guc: Move GuC registers to regs/ There's no good reason to keep the GuC registers outside the regs/ directory: move the header with GuC registers under that. Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2023-12-19 18:31:47 -05:00
Matt Roper	39fd0b4507	drm/xe/guc: Handle regset overflow check for entire GT Checking whether a single engine's register save/restore entries overflow the expected/pre-allocated GuC ADS regset area isn't terribly useful; we actually want to check whether the combined entries from all engines on the GT overflow the regset space. Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20230308005509.2975663-1-matthew.d.roper@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2023-12-19 18:29:44 -05:00
Matt Roper	f659ac1564	drm/xe/mocs: LNCF MOCS settings only need to be restored on pre-Xe_HP Reprogramming the LNCF MOCS registers on render domain reset is not intended to be regular driver programming, but rather the implementation of a specific workaround (Wa_1607983814). This workaround no longer applies on Xe_HP any beyond, so we can expect that these registers, like the rest of the LNCF/LBCF registers, will maintain their values through all engine resets. We should only add these registers to the GuC's save/restore list on platforms that need the workaround. Furthermore, xe_mocs_init_engine() appears to be another attempt to satisfy this same workaround. This is unnecessary on the Xe driver since even on platforms where the workaround is necessary, all single-engine resets are initiated by the GuC and thus the GuC will take care of saving/restoring these registers. The only host-initiated resets we have in Xe are full GT resets which will already (re)initialize these registers as part of the regular xe_mocs_init() flow. v2: - Add needs_wa_1607983814() so that calculate_regset_size() doesn't overallocate regset space when the workaround isn't needed. (Lucas) - On platforms affected by Wa_1607983814, only add the LNCF MOCS registers to the render engine's GuC save/restore list; resets of other engines don't need to save/restore these. Cc: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2023-12-19 18:29:43 -05:00
Lucas De Marchi	226bfec858	drm/xe: Remove dependency on intel_gt_regs.h Create regs/xe_gt_regs.h file with all the registers and bit definitions used by the xe driver. Eventually the registers may be defined in a different way and since xe doesn't supported below gen12, the number of registers touched is much smaller, so create a new header. The definitions themselves are direct copy from the gt/intel_gt_regs.h file, just sorting the registers by address. Cleaning those up and adhering to a common coding style is left for later. v2: Make the change to MCR_REG location in a separate patch to go through the i915 branch (Matt Roper / Rodrigo) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2023-12-19 18:29:21 -05:00
Lucas De Marchi	b79e8fd954	drm/xe: Remove dependency on intel_engine_regs.h Create regs/xe_engine_regs.h file with all the registers and bit definitions used by the xe driver. Eventually the registers may be defined in a different way and since xe doesn't supported below gen12, the number of registers touched is much smaller, so create a new header. The definitions themselves are direct copy from the gt/intel_engine_regs.h file, just sorting the registers by address. Cleaning those up and adhering to a common coding style is left for later. Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2023-12-19 18:29:20 -05:00

1 2

52 Commits