linux

mirror of https://github.com/torvalds/linux.git synced 2026-04-25 18:12:26 -04:00

Author	SHA1	Message	Date
Matt Roper	8367585154	drm/xe: Cleanup unused header includes clangd reports many "unused header" warnings throughout the Xe driver. Start working to clean this up by removing unnecessary includes in our .c files and/or replacing them with explicit includes of other headers that were previously being included indirectly. By far the most common offender here was unnecessary inclusion of xe_gt.h. That likely originates from the early days of xe.ko when xe_mmio did not exist and all register accesses, including those unrelated to GTs, were done with GT functions. There's still a lot of additional #include cleanup that can be done in the headers themselves; that will come as a followup series. v2: - Squash the 79-patch series down to a single patch. (MattB) Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20260115032803.4067824-2-matthew.d.roper@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com>	2026-01-15 07:05:04 -08:00
Lukasz Laguna	0f13dead4e	drm/xe: Update wedged.mode only after successful reset policy change Previously, the driver's internal wedged.mode state was updated without verifying whether the corresponding engine reset policy update in GuC succeeded. This could leave the driver reporting a wedged.mode state that doesn't match the actual reset behavior programmed in GuC. With this change, the reset policy is updated first, and the driver's wedged.mode state is modified only if the policy update succeeds on all available GTs. This patch also introduces two functional improvements: - The policy is sent to GuC only when a change is required. An update is needed only when entering or leaving XE_WEDGED_MODE_UPON_ANY_HANG, because only in that case the reset policy changes. For example, switching between XE_WEDGED_MODE_UPON_CRITICAL_ERROR and XE_WEDGED_MODE_NEVER doesn't affect the reset policy, so there is no need to send the same value to GuC. - An inconsistent_reset flag is added to track cases where reset policy update succeeds only on a subset of GTs. If such inconsistency is detected, future wedged mode configuration will force a retry of the reset policy update to restore a consistent state across all GTs. Fixes: `6b8ef44cc0` ("drm/xe: Introduce the wedged_mode debugfs") Signed-off-by: Lukasz Laguna <lukasz.laguna@intel.com> Link: https://patch.msgid.link/20260107174741.29163-3-lukasz.laguna@intel.com Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2026-01-08 16:07:42 -05:00
Lukasz Laguna	17d3c3365b	drm/xe: Validate wedged_mode parameter and define enum for modes Check correctness of the wedged_mode parameter input to ensure only supported values are accepted. Additionally, replace magic numbers with a clearly defined enum. Signed-off-by: Lukasz Laguna <lukasz.laguna@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patch.msgid.link/20260107174741.29163-2-lukasz.laguna@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2026-01-08 16:07:07 -05:00
Michal Wajdeczko	09af64eba6	drm/xe/guc: Introduce GUC_FIRMWARE_VER_AT_LEAST helper There are already few places in the code where we need to check GuC firmware version. Wrap existing raw conditions into a named helper macro to make it clear and avoid explicit call of the MAKE_GUC_VER. Suggested-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Acked-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20251216214902.1429-3-michal.wajdeczko@intel.com	2025-12-17 23:42:43 +01:00
Matt Roper	83f4151787	drm/xe/lnl: Drop pre-production workaround support LNL has been out long enough that all of our internal usage of pre-production hardware has been phased out and we no longer need to maintain workarounds that were exclusive to pre-production parts. Production LNL hardware always has B0 or later steppings for both graphics and media IP. Eliminate all workarounds that were exclusive to A-step hardware and set the 'has_prod_wa_only' device flag for LNL to make sure we warn and taint if someone tries to load the driver on an old pre-production part. Bspec: 70821 Reviewed-by: Dnyaneshwar Bhadane <dnyaneshwar.bhadane@intel.com> Link: https://patch.msgid.link/20251212181411.294854-4-matthew.d.roper@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com>	2025-12-12 21:17:10 -08:00
Brian Welty	94edd65186	drm/xe/xe3p_lpm: Configure MAIN_GAMCTRL_QUEUE_SELECT Starting from Xe3p, there are two different copies of some of the GAM registers: the traditional MCR variant at their old locations, and a new unicast copy known as "main_gamctrl." The Xe driver doesn't use these registers directly, but we need to instruct the GuC on which set it should use. Since the new, unicast registers are preferred (since they avoid the need for unnecessary MCR synchronization), set a new GuC feature flag, GUC_CTL_MAIN_GAMCTRL_QUEUES to convey this decision. A new helper function, xe_guc_using_main_gamctrl_queues(), is added for use in the 3 independent places that need to handle configuration of the new reporting queues. The mmio write to enable the main gamctl is only done during the general GuC upload. The gamctrl registers are not accessed by the GuC during hwconfig load. Last, the ADS blob for communicating the queue addresses contains both a DPA and GGTT offset. The GuC documentation states that DPA is now MBZ when using the MAIN_GAMCTRL queues. Bspec: 76445, 73540 Signed-off-by: Brian Welty <brian.welty@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://lore.kernel.org/r/20251019-xe3p-gamctrl-v1-1-ad66d3c1908f@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-10-20 17:21:11 -07:00
Arun Abhishek Chowhan	64d00d41f5	drm/xe: Sort include files alphabetically. Sort the include lines alphabetically, no impact on code behavior. Signed-off-by: Arun Abhishek Chowhan <arun.abhishek.chowhan@intel.com> Reviewed-by: Nitin Gote <nitin.r.gote@intel.com> Link: https://lore.kernel.org/r/20251013095914.3742505-1-arun.abhishek.chowhan@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-10-13 07:18:20 -07:00
Lucas De Marchi	19baa830fb	drm/xe: Use ARRAY_SIZE in guc_waklv_init() Prefer using ARRAY_SIZE where needed and just passing 1 instead of calculating the size of one element. Cc: Jonathan Cavitt <jonathan.cavitt@intel.com> Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202508130158.eogeBZQT-lkp@intel.com/ Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Link: https://lore.kernel.org/r/20250912-guc-ads-array-size-v1-1-a6555392a1f8@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-09-15 07:50:53 -07:00
Badal Nilawar	29042df3ac	drm/xe/psmi: Add Wa_14020001231 Enable Wa 14020001231 to block psmi interrupts during C6 entry exit flow. It's only enabled if PSMI is enabled in runtime. Signed-off-by: Badal Nilawar <badal.nilawar@intel.com> Reviewed-by: Matt Atwood <matthew.s.atwood@intel.com> Link: https://lore.kernel.org/r/20250821-psmi-v5-4-34ab7550d3d8@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-08-22 11:46:44 -07:00
Matt Atwood	4d5c98eb77	drm/xe: rename XE_WA to XE_GT_WA Now that there are two types of wa tables and infrastructure, be more concise in the naming of GT wa macros. v2: update the documentation Signed-off-by: Matt Atwood <matthew.s.atwood@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://lore.kernel.org/r/20250807214224.32728-1-matthew.s.atwood@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2025-08-08 10:50:45 -04:00
Jonathan Cavitt	7c9de25efa	drm/xe/xe_guc_ads: Consolidate guc_waklv_enable functions Presently, multiple versions of the guc_waklv_enable_.* function exist, all with different numbers of dwords added to the klv_entry array. This is not extensible, and more duplicates of the function will need to be created if it ever becomes necessary to support 3 or more dwords per wa in the future. Consolidate the disparate guc_waklv_enable functions into a single guc_waklv_enable function that can take an arbitrary number of dword values. v2: - Update length value properly (Shuicheng) v3: (Harrison) - Use data as a term instead of dwords or arr - Reformat warning message to use hex values - Eliminate need for kzalloc and klv_entry array - Reorder function parameters to fix line wrapping v4: - Miscellaneous formatting fixes (Cavitt) v5: (Harrison) - s/data_range/data_len_dw - Use data_len_dw to calculate size for xe_map_memcpy_to Suggested-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Cc: Lucas De Marchi <lucas.demarch@intel.com> Cc: Shuicheng Lin <shuicheng.lin@intel.com> Cc: John Harrison <john.c.harrison@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Link: https://lore.kernel.org/r/20250728194806.68176-2-jonathan.cavitt@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-08-05 07:59:38 -07:00
Sk Anirban	d72779c29d	drm/xe/ptl: Apply Wa_16026007364 As part of this WA GuC will save and restore value of two XE3_Media control registers that were not included in the HW power context. Signed-off-by: Sk Anirban <sk.anirban@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://lore.kernel.org/r/20250716101622.3421480-2-sk.anirban@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-07-24 14:01:27 -07:00
Michal Wajdeczko	621a422079	drm/xe/guc: Don't allocate temporary policies object Since we are already using reusable buffer objects from the GuC buffer cache, we can directly write into their CPU pointers and spare unnecessary temporary allocation. While around, also make sure to clear obtained buffer, to avoid sending some stale data. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250702142504.1656-1-michal.wajdeczko@intel.com	2025-07-10 16:00:13 +02:00
Matthew Brost	ec9223b49a	drm/xe: Drop bo->size bo->size is redundant because the base GEM object already has a size field with the same value. Drop bo->size and use the base GEM object’s size instead. While at it, introduce xe_bo_size() to abstract the BO size. v2: - Fix typo in kernel doc (Ashutosh) - Fix kunit (CI) - Fix line wrap (Checkpatch) v3: - Fix sriov build (CI) v4: - Fix display build (CI) Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Link: https://lore.kernel.org/r/20250625144128.2827577-1-matthew.brost@intel.com	2025-06-27 14:52:31 -07:00
Daniele Ceraolo Spurio	dfe6c28132	Revert "drm/xe/ptl: Apply Wa_16026007364" This reverts commit `3972872e45`. There are several things wrong with the way this WA was implemented: - The KLV is only supported on GuC 70.47.0 or newer, so we shouldn't apply it unconditionally. - The KLV requires 2 DWs of data, which are not currently provided. The GuC currently ignores any unknown KLVs, so on versions older that 70.47.0 nothing happens. However, starting on 70.47.0 the GuC attempts to parse the KLV and fails due to the missing data, causing a GuC load abort. Given that 70.47.0 is the first GuC version approved for public release for PTL, let's revert this patch so it doesn't cause the GuC load to fail with that blob. We can then re-apply it properly fixed after the GuC definition is merged, which will also have the added benefit of running the KLV addition through CI with the right GuC version. Fixes: `3972872e45` ("drm/xe/ptl: Apply Wa_16026007364") Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: sanirban <sk.anirban@intel.com> Cc: Badal Nilawar <badal.nilawar@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://lore.kernel.org/r/20250625001202.1616606-2-daniele.ceraolospurio@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2025-06-25 10:18:04 -04:00
sanirban	3972872e45	drm/xe/ptl: Apply Wa_16026007364 As part of this WA GuC will save and restore value of two XE3_Media control registers that were not included in the HW power context. v2: - Update klv name (Badal) Signed-off-by: sanirban <sk.anirban@intel.com> Reviewed-by: Badal Nilawar <badal.nilawar@intel.com> Link: https://lore.kernel.org/r/20250619133413.107423-2-sk.anirban@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2025-06-20 15:36:43 -04:00
Michal Wajdeczko	3dbab383e3	drm/xe/guc: Don't allocate managed BO for each policy change We shouldn't use xe_managed_bo_create_from_data() to allocate temporary BO, as it will be released only on unload and every change in wedge_mode policy will consume resources (including precious GGTT). Instead just switchover to GuC buffer cache. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250512220018.172-3-michal.wajdeczko@intel.com	2025-05-15 12:29:55 +02:00
Lucas De Marchi	c31a0b6402	drm/xe: Set LRC addresses before guc load The metadata saved in the ADS is read by GuC when it's initialized. Saving the addresses to the LRCs when they are populated is too late as GuC will keep using the old ones. This was causing GuC to use the RCS LRC for any engine class. It's not a big problem on a Linux-only scenario since the they are used by GuC only on media engines when the watchdog is triggered. However, in a virtualization scenario with Windows as the VF, it causes the wrong LRCs to be loaded as the watchdog is used for all engines. Fix it by letting guc_golden_lrc_init() initialize the metadata, like other *_init() functions, and later guc_golden_lrc_populate() to copy the LRCs to the right places. The former is called before the second GuC load, while the latter is called after LRCs have been recorded. Cc: Chee Yin Wong <chee.yin.wong@intel.com> Cc: John Harrison <john.c.harrison@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Fixes: `dd08ebf6c3` ("drm/xe: Introduce a new DRM driver for Intel GPUs") Cc: <stable@vger.kernel.org> # v6.11+ Reviewed-by: Matthew Brost <matthew.brost@intel.com> Tested-by: Chee Yin Wong <chee.yin.wong@intel.com> Link: https://lore.kernel.org/r/20250409-fix-guc-ads-v1-1-494135f7a5d0@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-04-14 05:49:53 -07:00
John Harrison	d3e8349edf	drm/xe/guc: Enable w/a 16026508708 The workaround is only relevant to SRIOV but does affect all platforms. Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://lore.kernel.org/r/20250403185619.1555853-2-John.C.Harrison@Intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-04-10 14:09:35 -07:00
Matthew Brost	045448da87	drm/xe: Add XE_BO_FLAG_PINNED_NORESTORE Not all BOs need to be restored on resume / d3cold exit, add XE_BO_FLAG_PINNED_NO_RESTORE which skips restoring of BOs rather just allocates VRAM for the BO. This should slightly speedup resume / d3cold exit flows. Marking GuC ADS, GuC CT, GuC log, GuC PC, and SA as NORESTORE. v2: - s/WONTNEED/NORESTORE (Vivi) - Rebase on newly added g2g and backup object flow Signed-off-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Satyanarayana K V P <satyanarayana.k.v.p@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://lore.kernel.org/r/20250403102440.266113-11-matthew.auld@intel.com	2025-04-04 11:41:01 +01:00
John Harrison	ce22fccd07	drm/xe/guc: Re-word message about ADS size changes The error capture list in the ADS is initially allocated using a placeholder size. When the actual size is determinied later on, there is a debug print about the new size. However, the wording is such that some people see it as an unexpected thing and therefore a potential problem. So re-word it to be a little less concerning. Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250325203211.3907890-1-John.C.Harrison@Intel.com	2025-03-28 12:05:05 -07:00
Aradhya Bhatia	8c5fe7d88b	drm/xe: Add Wa_16021333562 and Wa_14016712196 Wa_16021333562 and Wa_14016712196 are permanent workarounds that apply to multiple platforms. Wa_16021333562 applies to platforms ranging from TGL (12.00) to Xe_LPM (13.00), while Wa_14016712196 from DG2 (12.55) to Xe_LPG (12.74). Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com> Signed-off-by: Aradhya Bhatia <aradhya.bhatia@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250220094645.358647-2-aradhya.bhatia@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-02-26 07:50:11 -08:00
Jesus Narvaez	ee5a1321df	drm/xe/guc: Adding steering info support for GuC register lists The guc_mmio_reg interface supports steering, but it is currently not implemented. This will allow the GuC to control steering of MMIO registers after save-restore and avoid reading from fused off MCR register instances. Fixes: `9c57bc0865` ("drm/xe/lnl: Drop force_probe requirement") Signed-off-by: Jesus Narvaez <jesus.narvaez@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241212190100.3768068-1-jesus.narvaez@intel.com	2025-01-09 10:46:28 -08:00
Lucas De Marchi	3fcf68d739	drm/xe: Apply whitelist to engine save-restore Instead of handling the whitelist directly in the GuC ADS initialization, make it follow the same logic as other engine registers that are save-restored. Main benefit is that then the SW tracking then shows it in debugfs and there's no risk of an engine workaround to write to the same nopriv register that is being passed directly to GuC. This means that xe_reg_whitelist_process_engine() only has to process the RTP and convert them to entries for the hwe. With that all the registers should be covered by xe_reg_sr_apply_mmio() to write to the HW and there's no special handling in GuC ADS to also add these registers to the list of registers that is passed to GuC. Example for DG2: # cat /sys/kernel/debug/dri/0000\:03\:00.0/gt0/register-save-restore ... Engine rcs0 ... REG[0x24d0] clr=0xffffffff set=0x1000dafc masked=no mcr=no REG[0x24d4] clr=0xffffffff set=0x1000db01 masked=no mcr=no REG[0x24d8] clr=0xffffffff set=0x0000db1c masked=no mcr=no ... Whitelist rcs0 REG[0xdafc-0xdaff]: allow read access REG[0xdb00-0xdb1f]: allow read access REG[0xdb1c-0xdb1f]: allow rw access v2: - Use ~0u for clr bits so it's just a write (Matt Roper) - Simplify helpers now that unused slots are not written Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241209232739.147417-6-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2024-12-11 07:28:58 -08:00
Jonathan Cavitt	20124c3e22	drm/xe/xe_guc_ads: Add nonpriv registers to write list When performing a guc_mmio_regset_write, we add all the registers in the reg_sr list to the save/restore list, but do not do the same for the nonpriv registers. Add them in. v2: - Add all NONPRIV registers to avoid undefined behavior (Harrison) - s/whitelist/nonpriv v3: - Rebase Closes: https://gitlab.freedesktop.org/drm/xe/kernel/issues/2249 Signed-off-by: Jonathan Cavitt <jonathan.cavitt@intel.com> CC: Lucas de Marchi <lucas.demarchi@intel.com> CC: Matt Roper <matthew.d.roper@intel.com> CC: John Harrison <john.c.harrison@intel.com> CC: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> CC: Ashutosh Dixit <ashutosh.dixit@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241122180826.7075-1-jonathan.cavitt@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2024-12-03 14:02:59 -08:00
Lucas De Marchi	8262db9eff	drm/xe: Move Wa 1607983814 to oob needs_wa_1607983814() predates wa_oob, so it was not being printed in /sys/kernel/debug/dri/0/*/workarounds. Port it to OOB rules. This makes the WA show up in debugfs. For TGL: OOB Workarounds 1607983814 22012773006 1409600907 Eventually the RTP infra may add support for writing registers in a loop, which would allow to keep track of the registers as well. But for now, just listing it as OOB workaround is already an improvement. Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241029193258.749882-1-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2024-10-31 19:24:26 -07:00
Ashutosh Dixit	0191fddf53	Revert "drm/xe/xe_guc_ads: save/restore OA registers and allowlist regs" This reverts commit `55858fa7eb`. '55858fa7eb2f ("drm/xe/xe_guc_ads: save/restore OA registers and allowlist regs")' was not properly reviewed and also causes dmesg asserts in CI. Revert it. Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/3295 Fixes: `55858fa7eb` ("drm/xe/xe_guc_ads: save/restore OA registers and allowlist regs") Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241029200147.1476513-1-ashutosh.dixit@intel.com	2024-10-30 12:49:18 -07:00
Jonathan Cavitt	55858fa7eb	drm/xe/xe_guc_ads: save/restore OA registers and allowlist regs Several OA registers and allowlist registers were missing from the save/restore list for GuC and could be lost during an engine reset. Add them to the list. v2: - Fix commit message (Umesh) - Add missing closes (Ashutosh) v3: - Add missing fixes (Ashutosh) Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/2249 Fixes: `dd08ebf6c3` ("drm/xe: Introduce a new DRM driver for Intel GPUs") Suggested-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Suggested-by: John Harrison <john.c.harrison@intel.com> Signed-off-by: Jonathan Cavitt <jonathan.cavitt@intel.com> CC: stable@vger.kernel.org # v6.11+ Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241023200716.82624-1-jonathan.cavitt@intel.com	2024-10-28 15:41:40 -07:00
Vinay Belgaumkar	61ef737db9	drm/xe/ptl: Apply Wa_14022866841 As part of this WA, GuC will hold a forcewake for certain MMIO accesses outside the GT/media domains. Cc: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241015234428.2004825-1-vinay.belgaumkar@intel.com	2024-10-17 11:20:21 -07:00
Zhanjun Dong	b170d696c1	drm/xe/guc: Add XE_LP steered register lists Add the ability for runtime allocation and freeing of steered register list extentions that depend on the detected HW config fuses. Signed-off-by: Zhanjun Dong <zhanjun.dong@intel.com> Reviewed-by: Alan Previn <alan.previn.teres.alexis@intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241004193428.3311145-3-zhanjun.dong@intel.com	2024-10-08 09:34:16 -07:00
Zhanjun Dong	9c8c7a7e6f	drm/xe/guc: Prepare GuC register list and update ADS size for error capture Add referenced registers defines and list of registers. Update GuC ADS size allocation to include space for the lists of error state capture register descriptors. Then, populate GuC ADS with the lists of registers we want GuC to report back to host on engine reset events. This list should include global, engine-class and engine-instance registers for every engine-class type on the current hardware. Ensure we allocate a persistent storage for the register lists that are populated into ADS so that we don't need to allocate memory during GT resets when GuC is reloaded and ADS population happens again. Signed-off-by: Zhanjun Dong <zhanjun.dong@intel.com> Reviewed-by: Alan Previn <alan.previn.teres.alexis@intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241004193428.3311145-2-zhanjun.dong@intel.com	2024-10-08 09:34:04 -07:00
Francois Dugast	91b2c42c21	drm/xe: Use fault injection infrastructure to find issues at probe time The kernel fault injection infrastructure is used to test proper error handling during probe. The return code of the functions using ALLOW_ERROR_INJECTION() can be conditionnally modified at runtime by tuning some debugfs entries. This requires CONFIG_FUNCTION_ERROR_INJECTION (among others). One way to use fault injection at probe time by making each of those functions fail one at a time is: FAILTYPE=fail_function DEVICE="0000:00:08.0" # depends on the system ERRNO=-12 # -ENOMEM, can depend on the function echo N > /sys/kernel/debug/$FAILTYPE/task-filter echo 100 > /sys/kernel/debug/$FAILTYPE/probability echo 0 > /sys/kernel/debug/$FAILTYPE/interval echo -1 > /sys/kernel/debug/$FAILTYPE/times echo 0 > /sys/kernel/debug/$FAILTYPE/space echo 1 > /sys/kernel/debug/$FAILTYPE/verbose modprobe xe echo $DEVICE > /sys/bus/pci/drivers/xe/unbind grep -oP "^.* \[xe\]" /sys/kernel/debug/$FAILTYPE/injectable \| \ cut -d ' ' -f 1 \| while read -r FUNCTION ; do echo "Injecting fault in $FUNCTION" echo "" > /sys/kernel/debug/$FAILTYPE/inject echo $FUNCTION > /sys/kernel/debug/$FAILTYPE/inject printf %#x $ERRNO > /sys/kernel/debug/$FAILTYPE/$FUNCTION/retval echo $DEVICE > /sys/bus/pci/drivers/xe/bind done rmmod xe It will also be integrated into IGT for systematic execution by CI. v2: Wrappers are not needed in the cases covered by this patch, so remove them and use ALLOW_ERROR_INJECTION() directly. v3: Document the use of fault injection at probe time in xe_pci_probe and refer to it where ALLOW_ERROR_INJECTION() is used. Signed-off-by: Francois Dugast <francois.dugast@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Jani Nikula <jani.nikula@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240927151207.399354-1-francois.dugast@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2024-10-03 08:58:26 -04:00
Matt Roper	c18d4193b5	drm/xe/guc: Convert register access to use xe_mmio Stop using GT pointers for register access. v2: - Don't drop the _Generic wrapper macro for xe_mmio_wait32_not() yet. Defer that to the final patch of the series instead. (Rodrigo) Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240910234719.3335472-66-matthew.d.roper@intel.com	2024-09-11 15:32:50 -07:00
Julia Filipchuk	636cdf6fbd	drm/xe/guc: Enable w/a 14022293748 and 22019794406 Enable workarounds for HW bug where render engine reset fails. Given that we're bumping the minimum required GuC version to 70.29, we're guaranteed to always have support for this KLV in the GuC. v2: Enable KLV correctly for either workaround (Lucas) v4: Add check for minimum supported GuC firmware version. Enable w/a for hw version 20.01 too. (Daniele) v5 (Daniele): remove now unneeded fw type and version checks (JohnH) Signed-off-by: Julia Filipchuk <julia.filipchuk@intel.com> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240805205435.921921-1-daniele.ceraolospurio@intel.com	2024-08-08 13:47:27 -07:00
Michal Wajdeczko	26a22952c8	drm/xe: Don't rely on indirect includes from xe_mmio.h These compilation units use udelay() or some GT oriented printk functions without explicitly including proper header files, and relying on #includes from the xe_mmio.h instead. Fix that. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Francois Dugast <francois.dugast@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240520181814.2392-3-michal.wajdeczko@intel.com	2024-05-22 12:11:26 +02:00
Niranjana Vishwanathapura	d6219e1cd5	drm/xe: Add Indirect Ring State support When Indirect Ring State is enabled, the Ring Buffer state and Batch Buffer state are context save/restored to/from Indirect Ring State instead of the LRC. The Indirect Ring State is a 4K page mapped in global GTT at a 4K aligned address. This address is programmed in the INDIRECT_RING_STATE register of the corresponding context's LRC. v2: Fix kernel-doc, add bspec reference v3: Fix typo in commit text Bspec: 67296, 67139 Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240507224255.5059-3-niranjana.vishwanathapura@intel.com	2024-05-08 14:48:30 -07:00
Lucas De Marchi	ee72842306	drm/xe/ads: Use flexible-array Zero-length arrays are deprecated and flexible arrays should be used instead: https://www.kernel.org/doc/html/v6.9-rc7/process/deprecated.html#zero-length-and-one-element-arrays Reported-by: kernel test robot <lkp@intel.com> Reported-by: Julia Lawall <julia.lawall@inria.fr> Closes: https://lore.kernel.org/r/202405051824.AmjAI5Pg-lkp@intel.com/ Fixes: `dd08ebf6c3` ("drm/xe: Introduce a new DRM driver for Intel GPUs") Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240506141917.205714-1-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2024-05-07 16:32:25 -07:00
Rodrigo Vivi	6b8ef44cc0	drm/xe: Introduce the wedged_mode debugfs So, the wedged mode can be selected per device at runtime, before the tests or before reproducing the issue. v2: - s/busted/wedged - some locking consistency v3: - remove mutex - toggle guc reset policy on any mode change Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Alan Previn <alan.previn.teres.alexis@intel.com> Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240423221817.1285081-4-rodrigo.vivi@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2024-04-24 12:12:58 -04:00
Rodrigo Vivi	8ed9aaae39	drm/xe: Force wedged state and block GT reset upon any GPU hang In many validation situations when debugging GPU Hangs, it is useful to preserve the GT situation from the moment that the timeout occurred. This patch introduces a module parameter that could be used on situations like this. If xe.wedged module parameter is set to 2, Xe will be declared wedged on every single execution timeout (a.k.a. GPU hang) right after devcoredump snapshot capture and without attempting any kind of GT reset and blocking entirely any kind of execution. v2: Really block gt_reset from guc side. (Lucas) s/wedged/busted (Lucas) v3: - s/busted/wedged - Really use global_flags (Dafna) - More robust timeout handling when wedging it. v4: A really robust clean exit done by Matt Brost. No more kernel warns on unbind. v5: Simplify error message (Lucas) Cc: Matthew Brost <matthew.brost@intel.com> Cc: Dafna Hirschfeld <dhirschfeld@habana.ai> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Alan Previn <alan.previn.teres.alexis@intel.com> Cc: Himanshu Somaiya <himanshu.somaiya@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240423221817.1285081-3-rodrigo.vivi@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2024-04-24 12:12:58 -04:00
Vinay Belgaumkar	2817a1f1bf	drm/xe/lnl: Apply GuC Wa_13011645652 Enable WA for a bug that could cause the C6 state machine to hang during RC6 exit. v2: Add comment clarifying the WA (John H) v3: Add more details to the comment (John H) Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240417054802.1766359-1-vinay.belgaumkar@intel.com	2024-04-17 15:21:12 -07:00
John Harrison	b7f888ee9c	drm/xe/lnl: Enable more GuC based workarounds There are a couple of new workarounds for LNL that are implemented in the GuC firmware. The KMD needs to enable them explicitly. Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240410002646.3002394-2-John.C.Harrison@Intel.com	2024-04-16 10:50:36 -07:00
Badal Nilawar	c151ff5c90	drm/xe/lnl: Enable GuC Wa_14019882105 Enable GuC Wa_14019882105 to block interrupts during C6 flow when the memory path has been blocked v2: Make helper function generic and name it as guc_waklv_enable_simple (John Harrison) v3: Make warning descriptive (John Harrison) v4: s/drm_WARN/xe_gt_WARN/ (Michal) Cc: John Harrison <john.harrison@intel.com> Signed-off-by: Badal Nilawar <badal.nilawar@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240405084231.3620848-3-badal.nilawar@intel.com	2024-04-09 12:54:04 +02:00
Badal Nilawar	d6da81a478	drm/xe/guc: Add support for workaround KLVs To prevent running out of bits, new workaround (w/a) enable flags are being added via a KLV system instead of a 32 bit flags word. v2: GuC version check > 70.10 is not needed as base line xe doesnot support anything below < 70.19 v3: Use 64 bit ggtt address for future compatibility (John Harrison/Daniele) v4: %s/PAGE_SIZE/SZ_4K/ (Michal) Cc: John Harrison <John.C.Harrison@intel.com> Signed-off-by: Badal Nilawar <badal.nilawar@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240405084231.3620848-2-badal.nilawar@intel.com	2024-04-09 12:54:02 +02:00
Lucas De Marchi	62742d1266	drm/xe: Normalize bo flags macros The flags stored in the BO grew over time without following much a naming pattern. First of all, get rid of the _BIT suffix that was banned from everywhere else due to the guideline in drivers/gpu/drm/i915/i915_reg.h that xe kind of follows: Define bits using ``REG_BIT(N)``. Do not add ``_BIT`` suffix to the name. Here the flags aren't for a register, but it's good practice to keep it consistent. Second divergence on names is the use or not of "CREATE". This is because most of the flags are passed to xe_bo_create() family of functions, changing its behavior. However, since the flags are also stored in the bo itself and checked elsewhere in the code, it seems better to just omit the CREATE part. With those 2 guidelines, all the flags are given the form XE_BO_FLAG_<FLAG_NAME> with the following commands: git grep -le "XE_BO_" -- drivers/gpu/drm/xe \| xargs sed -i \ -e "s/XE_BO_$[_A-Z0-9]$_BIT/XE_BO_\1/g" \ -e 's/XE_BO_CREATE_/XE_BO_FLAG_/g' git grep -le "XE_BO_" -- drivers/gpu/drm/xe \| xargs sed -i -r \ -e 's/XE_BO_(DEFER_BACKING\|SCANOUT\|FIXED_PLACEMENT\|PAGETABLE\|NEEDS_CPU_ACCESS\|NEEDS_UC\|INTERNAL_TEST\|INTERNAL_64K\|GGTT_INVALIDATE)/XE_BO_FLAG_\1/g' And then the defines in drivers/gpu/drm/xe/xe_bo.h are adjusted to follow the coding style. Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240322142702.186529-3-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2024-04-02 10:33:57 -07:00
Matthew Brost	231c411087	drm/xe: Add XE_BO_GGTT_INVALIDATE flag Add XE_BO_GGTT_INVALIDATE flag which indicates the GGTT should be invalidated when a BO is added / removed from the GGTT. This is typically set when a BO is used by the GuC as the GuC has GGTT TLBs. Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> [mlankhorst: Small fix to only inherit GGTT_INVALIDATE from src bo] [mlankhorst: Remove _BIT from name] Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240306052002.311196-4-matthew.brost@intel.com	2024-03-20 10:49:14 +01:00
Michał Winiarski	a44bbace73	drm/xe/guc: Allocate GuC data structures in system memory for initial load GuC load will need to happen at an earlier point in probe, where local memory is not yet available. Use system memory for GuC data structures used for initial "hwconfig" load, and realloc at a later, "post-hwconfig" load if needed, when local memory is available. Signed-off-by: Michał Winiarski <michal.winiarski@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240219130530.1406044-1-michal.winiarski@intel.com	2024-02-20 14:13:42 -05:00
Lucas De Marchi	5a92da34dd	drm/xe: Rename info.supports_* to info.has_* Rename supports_mmio_ext and supports_usm to use a has_ prefix so the flags are grouped together. This settles on just one variant for positive info matching ("has_") and one for negative ("skip_"). Also make sure the has_* flags are grouped together in xe_pci.c. Reviewed-by: Koby Elbaz <kelbaz@habana.ai> Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com> Link: https://lore.kernel.org/r/20231205145235.2114761-1-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2023-12-21 11:45:27 -05:00
Niranjana Vishwanathapura	0d97ecce16	drm/xe: Enable Fixed CCS mode setting Disable dynamic HW load balancing of compute resource assignment to engines and instead enabled fixed mode of mapping compute resources to engines on all platforms with more than one compute engine. By default enable only one CCS engine with all compute slices assigned to it. This is the desired configuration for common workloads. PVC platform supports only the fixed CCS mode (workaround 16016805146). v2: Rebase, make it platform agnostic v3: Minor code refactoring Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2023-12-21 11:45:26 -05:00
Michał Winiarski	0e1a47fcab	drm/xe: Add a helper for DRM device-lifetime BO create A helper for managed BO allocations makes it possible to remove specific "fini" actions and will simplify the following patches adding ability to execute a release action for specific BO directly. Signed-off-by: Michał Winiarski <michal.winiarski@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2023-12-21 11:45:11 -05:00
Matt Roper	83af834e71	drm/xe/mocs: MOCS registers are multicast on Xe_HP and beyond The MOCS registers should be written in an MCR-specific manner on Xe_HP and beyond to prevent any other driver threads or external firmware from putting the hardware into unicast mode while we initialize the MOCS table. Bspec: 66534, 67609, 71185 Cc: Ruthuvikas Ravikumar <ruthuvikas.ravikumar@intel.com> Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com> Link: https://lore.kernel.org/r/20231023204112.2856331-2-matthew.d.roper@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2023-12-21 11:43:22 -05:00

1 2

69 Commits