Commit Graph

258 Commits

Author SHA1 Message Date
Jouni Högander
b11a020d91 drm/xe: Do clean shutdown also when using flr
Currently Xe driver is triggering flr without any clean-up on
shutdown. This is causing random warnings from pending related works as the
underlying hardware is reset in the middle of their execution.

Fix this by performing clean shutdown also when using flr.

Fixes: 501d799a47 ("drm/xe: Wire up device shutdown handler")
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Signed-off-by: Jouni Högander <jouni.hogander@intel.com>
Reviewed-by: Maarten Lankhorst <dev@lankhorst.se>
Link: https://patch.msgid.link/20251031122312.1836534-1-jouni.hogander@intel.com
Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>
(cherry picked from commit a4ff26b7c8)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-11-07 03:05:32 -08:00
Tejas Upadhyay
9cd27eec87 drm/xe: Move declarations under conditional branch
The xe_device_shutdown() function was needing a few declarations
that were only required under a specific condition. This change
moves those declarations to be within that conditional branch
to avoid unnecessary declarations.

Reviewed-by: Nitin Gote <nitin.r.gote@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20251007100208.1407021-1-tejas.upadhyay@intel.com
Signed-off-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
(cherry picked from commit 15b3036045)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-11-07 03:05:20 -08:00
Kenneth Graunke
e5ae8d1eb0 drm/xe: Increase global invalidation timeout to 1000us
The previous timeout of 500us seems to be too small; panning the map in
the Roll20 VTT in Firefox on a KDE/Wayland desktop reliably triggered
timeouts within a few seconds of usage, causing the monitor to freeze
and the following to be printed to dmesg:

[Jul30 13:44] xe 0000:03:00.0: [drm] *ERROR* GT0: Global invalidation timeout
[Jul30 13:48] xe 0000:03:00.0: [drm] *ERROR* [CRTC:82:pipe A] flip_done timed out

I haven't hit a single timeout since increasing it to 1000us even after
several multi-hour testing sessions.

Fixes: 0dd2dd0182 ("drm/xe: Move DSB l2 flush to a more sensible place")
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/5710
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: stable@vger.kernel.org
Cc: Maarten Lankhorst <dev@lankhorst.se>
Reviewed-by: Shuicheng Lin <shuicheng.lin@intel.com>
Link: https://lore.kernel.org/r/20250912223254.147940-1-kenneth@whitecape.org
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
(cherry picked from commit 146046907b)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-10-13 13:02:50 -07:00
Michal Wajdeczko
3734c8184d drm/xe/vf: Don't claim support for firmware late-bind if VF
In general, the VFs can't load firmwares so attempt to initialize
the firmware late-bind component leads to errors like:

 [] xe 0000:03:00.1: [drm] *ERROR* Late bind component not bound

Fixes: 918bd789d6 ("drm/xe/xe_late_bind_fw: Introduce xe_late_bind_fw")
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/6190
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Badal Nilawar <badal.nilawar@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Reviewed-by: Badal Nilawar <badal.nilawar@intel.com>
Link: https://lore.kernel.org/r/20250928174811.198933-3-michal.wajdeczko@intel.com
(cherry picked from commit e35e288090)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-10-02 21:57:52 -07:00
Michal Wajdeczko
de61d5944c drm/xe/vf: Rename sriov_update_device_info
This is a VF only function and its name should reflect that to
avoid any confusion. Move the VF check to the caller side.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Link: https://lore.kernel.org/r/20250928174811.198933-2-michal.wajdeczko@intel.com
(cherry picked from commit b88bb1eefa)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-10-02 21:57:51 -07:00
Badal Nilawar
918bd789d6 drm/xe/xe_late_bind_fw: Introduce xe_late_bind_fw
Introduce xe_late_bind_fw to enable firmware loading for the devices,
such as the fan controller, during the driver probe. Typically,
firmware for such devices are part of IFWI flash image but can be
replaced at probe after OEM tuning.
This patch binds mei late binding component to enable firmware loading.

v2:
 - Add devm_add_action_or_reset to remove the component (Daniele)
 - Add INTEL_MEI_GSC check in xe_late_bind_init() (Daniele)
v3:
 - Fail driver probe if late bind initialization fails,
   add has_late_bind flag (Daniele)
v4:
 - %s/I915_COMPONENT_LATE_BIND/INTEL_COMPONENT_LATE_BIND/
v6:
 - rebased
v7:
 - rebased
 - In xe_late_bind_init, use drm_err when returning an error to
   stop the probe (Lucas)
 - Use imperative mode in commit message (Lucas)

Signed-off-by: Badal Nilawar <badal.nilawar@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/20250905154953.3974335-4-badal.nilawar@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-09-18 09:32:00 -07:00
Varun Gupta
010629e00d drm/xe: Fix driver reference in FLR comment
Rectify the reference of i915 to Xe in a comment.

v2: Cosmetic changes. (Karthik)
v3: Rephrased the commit message. (Karthik)

Signed-off-by: Varun Gupta <varun.gupta@intel.com>
Reviewed-by: Karthik Poosa <karthik.poosa@intel.com>
Link: https://lore.kernel.org/r/20250911111712.811524-1-varun.gupta@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-09-11 08:00:33 -07:00
Thomas Hellström
a2f2453c2c drm/xe: Convert xe_bo_create_user() for exhaustive eviction
Use the xe_validation_guard() to convert xe_bo_create_user()
for exhaustive eviction.

v2:
- Adapt to argument changes of xe_validation_guard()

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com> #v1
Link: https://lore.kernel.org/r/20250908101246.65025-4-thomas.hellstrom@linux.intel.com
2025-09-10 09:15:57 +02:00
Satyanarayana K V P
be5590c384 drm/xe/vf: Enable CCS save/restore only on supported GUC versions
CCS save/restore is supported starting with GuC 70.48.0 (compatibility
version 1.23.0). Gate the feature on the GuC firmware version and keep it
disabled on older or unsupported versions.

Fixes: f3009272ff ("drm/xe/vf: Create contexts for CCS read write")
Signed-off-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Andi Shyti <andi.shyti@kernel.org>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://lore.kernel.org/r/20250902103256.21658-2-satyanarayana.k.v.p@intel.com
2025-09-02 18:59:17 +02:00
Riana Tauro
f646c9f937 drm/xe/doc: Document device wedged and runtime survivability
Add documentation for vendor specific device wedged recovery method
and runtime survivability.

v2: fix documentation (Raag)
v3: add userspace tool for firmware update (Raag)
v4: use consistent documentation (Raag)
v5: add more documentation

Signed-off-by: Riana Tauro <riana.tauro@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Raag Jadav <raag.jadav@intel.com>
Link: https://lore.kernel.org/r/20250826063419.3022216-8-riana.tauro@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-08-26 10:11:34 -04:00
Riana Tauro
41ff795aff drm/xe/xe_survivability: Refactor survivability mode
Refactor survivability mode code to support both boot
and runtime survivability.

Signed-off-by: Riana Tauro <riana.tauro@intel.com>
Reviewed-by: Raag Jadav <raag.jadav@intel.com>
Link: https://lore.kernel.org/r/20250826063419.3022216-6-riana.tauro@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-08-26 10:11:34 -04:00
Riana Tauro
60439ac3f2 drm/xe: Add a helper function to set recovery method
Add a helper function to set recovery method. The recovery
method can be set before declaring the device wedged and sending the
drm wedged uevent. If no method is set, default unbind/re-bind method
will be set.

v2: fix documentation (Raag)

Signed-off-by: Riana Tauro <riana.tauro@intel.com>
Reviewed-by: Raag Jadav <raag.jadav@intel.com>
Link: https://lore.kernel.org/r/20250826063419.3022216-5-riana.tauro@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-08-26 10:11:34 -04:00
Riana Tauro
90fdcf5f89 drm/xe: Set GT as wedged before sending wedged uevent
Userspace should be notified after setting the device as wedged.
Re-order function calls to set gt wedged before sending uevent.

Cc: Matthew Brost <matthew.brost@intel.com>
Suggested-by: Raag Jadav <raag.jadav@intel.com>
Signed-off-by: Riana Tauro <riana.tauro@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250826063419.3022216-4-riana.tauro@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-08-26 10:11:34 -04:00
Riana Tauro
38fc73b8c7 drm/xe: Add documentation for Xe Device Wedging
Add documentation for Xe Device Wedging so that
file can be referenced in following patches.

Signed-off-by: Riana Tauro <riana.tauro@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/20250826063419.3022216-2-riana.tauro@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-08-26 10:11:34 -04:00
Himal Prasad Ghimiray
418807860e drm/xe/uapi: Add UAPI for querying VMA count and memory attributes
Introduce the DRM_IOCTL_XE_VM_QUERY_MEMORY_RANGE_ATTRS ioctl to allow
userspace to query memory attributes of VMAs within a user specified
virtual address range.

Userspace first calls the ioctl with num_mem_ranges = 0,
sizeof_mem_ranges_attr = 0 and vector_of_vma_mem_attr = NULL to retrieve
the number of memory ranges (vmas) and size of each memory range attribute.
Then, it allocates a buffer of that size and calls the ioctl again to fill
the buffer with memory range attributes.

This two-step interface allows userspace to first query the required
buffer size, then retrieve detailed attributes efficiently.

v2 (Matthew Brost)
- Use same ioctl to overload functionality

v3
- Add kernel-doc

v4
- Make uapi future proof by passing struct size (Matthew Brost)
- make lock interruptible (Matthew Brost)
- set reserved bits to zero (Matthew Brost)
- s/__copy_to_user/copy_to_user (Matthew Brost)
- Avod using VMA term in uapi (Thomas)
- xe_vm_put(vm) is missing (Shuicheng)

v5
- Nits
- Fix kernel-doc

Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Shuicheng Lin <shuicheng.lin@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250821173104.3030148-21-himal.prasad.ghimiray@intel.com
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
2025-08-26 11:25:36 +05:30
Himal Prasad Ghimiray
e80b05b09f drm/xe: Enable madvise ioctl for xe
Ioctl enables setting up of memory attributes in user provided range.

Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250821173104.3030148-20-himal.prasad.ghimiray@intel.com
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
2025-08-26 11:25:36 +05:30
Lucas De Marchi
aaa0c1f50a drm/xe/psmi: Add debugfs interface for PSMI
Requirement for PSMI capture is to have a physically contiguous buffer.
All the needed configuration is done by the userspace tool directly to
the GPU via mmio access.

This interface only support allocating from VRAM regions. For integrated
devices, the PSMI buffer is in SYSTEM memory and should be allocated by
userspace using hugetlbfs.

Here we add the ability to allocate a region of physically contiguous
memory by writing to debugfs file (listed below). For multi-tile devices,
the capture tool requires ability to allocate a capture buffer per tile
(VRAM region) and so user can specify a region_mask. The tool then
can mmap the buffers via direct mmap of the PCIBAR via sysfs.

To support the capture tool, 3 new debugfs entries are added:

   psmi_capture_addr - physical address per VRAM region's capture buffer
   psmi_capture_region_mask - select which region(s) to allocate a buffer
   psmi_capture_size - size of current capture buffer

Writing psmi_capture_size will allocate new buffer of requested size per
region after freeing any current buffers.

Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Original-author: Brian Welty <brian.welty@intel.com>
Reviewed-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com> # v2
Link: https://lore.kernel.org/r/20250821-psmi-v5-2-34ab7550d3d8@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-08-22 11:46:43 -07:00
Matt Atwood
4d5c98eb77 drm/xe: rename XE_WA to XE_GT_WA
Now that there are two types of wa tables and infrastructure, be more
concise in the naming of GT wa macros.

v2: update the documentation

Signed-off-by: Matt Atwood <matthew.s.atwood@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/20250807214224.32728-1-matthew.s.atwood@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-08-08 10:50:45 -04:00
Balasubramani Vivekanandan
1fdc4c381f drm/xe/devcoredump: Defer devcoredump initialization during probe
Doing devcoredump initializing before GT though look harmless, it leads
to problem during driver unbind. Because of this order, GT/Engine
release functions will be called before xe devcoredump release function
(xe_driver_devcoredump_fini) leading to the following kernel crash[1]
because the devcoredump functions might still use GT/Engine
datastructures after those are freed.

The following crash is observed while running the IGT
xe_wedged@wedged-at-any-timeout. The test forces a wedged state by
submitting a workload which hangs. Then does a unbind/rebind of the
driver to recover from the wedged state.
The hanged workload leads to a devcoredump. The following crash is
noticed when the devcoredump capture races with the driver unbind.
During driver unbind, the release function hw_engine_fini() will be
called which assigns NULL to hwe->gt. But the same data structure is
accessed during the coredump capture in the function
xe_engine_snapshot_print by reading snapshot->hwe->gt.

With this patch, we make sure the devcoredump is stopped before
deinitializing the core driver functions.

[1]:
BUG: kernel NULL pointer dereference, address: 0000000000000000
Workqueue: events_unbound xe_devcoredump_deferred_snap_work [xe]
RIP: 0010:xe_engine_snapshot_print+0x47/0x420 [xe]
Call Trace:
 <TASK>
 ? drm_printf+0x64/0x90
 __xe_devcoredump_read+0x23f/0x2d0 [xe]
 ? __pfx___drm_printfn_coredump+0x10/0x10
 ? __pfx___drm_puts_coredump+0x10/0x10
 xe_devcoredump_deferred_snap_work+0x17a/0x190 [xe]
 process_one_work+0x22e/0x6f0
 worker_thread+0x1e8/0x3d0
 ? __pfx_worker_thread+0x10/0x10
 kthread+0x11f/0x250
 ? __pfx_kthread+0x10/0x10
 ret_from_fork+0x47/0x70
 ? __pfx_kthread+0x10/0x10
 ret_from_fork_asm+0x1a/0x30

v2: Detailed commit description (Rodrigo)
v3: FIXME added (Rodrigo, Stuart)

Fixes: 4209d635a8 ("drm/xe: Remove devcoredump during driver release")
Reviewed-by: Stuart Summers <stuart.summers@intel.com>
Link: https://lore.kernel.org/r/20250731061300.14320-1-balasubramani.vivekanandan@intel.com
Signed-off-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com>
Link: https://lore.kernel.org/r/20250801052356.21885-1-balasubramani.vivekanandan@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-08-01 11:28:35 -04:00
Lukasz Laguna
552dbba1ca drm/xe/vf: Disable CSC support on VF
CSC is not accessible by VF drivers, so disable its support flag on VF
to prevent further initialization attempts.

Fixes: e02cea83d3 ("drm/xe/gsc: add Battlemage support")
Signed-off-by: Lukasz Laguna <lukasz.laguna@intel.com>
Cc: Alexander Usyskin <alexander.usyskin@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://lore.kernel.org/r/20250729123437.5933-1-lukasz.laguna@intel.com
2025-07-30 13:09:46 +02:00
Satyanarayana K V P
f3009272ff drm/xe/vf: Create contexts for CCS read write
Create two LRCs to handle CCS meta data read / write from CCS pool in the
VM. Read context is used to hold GPU instructions to be executed at save
time and write context is used to hold GPU instructions to be executed at
the restore time.

Allocate batch buffer pool using suballocator for both read and write
contexts.

Migration framework is reused to create LRCAs for read and write.

Signed-off-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250722120506.6483-2-satyanarayana.k.v.p@intel.com
2025-07-23 07:22:28 -07:00
Piotr Piórkowski
7a20b4f558 drm/xe: Move struct xe_vram_region to a dedicated header
Let's move the xe_vram_region structure to a new header dedicated to VRAM
to improve modularity and avoid unnecessary dependencies when only
VRAM-related structures are needed.

v2: Fix build if CONFIG_DRM_XE_DEVMEM_MIRROR is enabled
v3: Fix build if CONFIG_DRM_XE_DISPLAY is enabled
v4: Move helper to get tile dpagemap to xe_svm.c

Signed-off-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Suggested-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com> # rev3
Acked-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250714184818.89201-4-piotr.piorkowski@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-16 12:12:36 -07:00
Piotr Piórkowski
f92cfd72d9 drm/xe: Use dynamic allocation for tile and device VRAM region structures
In future platforms, we will need to represent the device and tile
VRAM regions in a more dynamic way, so let's abandon the static
allocation of these structures and start use a dynamic allocation.

v2:
 - Add a helpers for accessing fields of the xe_vram_region structure
v3:
- Add missing EXPORT_SYMBOL_IF_KUNIT for
  xe_vram_region_actual_physical_size

Signed-off-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Cc: Stuart Summers <stuart.summers@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Satyanarayana K V P <satyanarayana.k.v.p@intel.com>
Reviewed-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com>
Acked-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250714184818.89201-3-piotr.piorkowski@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-16 12:08:31 -07:00
Matt Atwood
77fa16c8f8 drm/xe: extend Wa_15015404425 to apply to PTL
Wa_15015404425 only needs to be applied on PTL platforms with an A step
compute die. There is no way to map PCI revid to the compute die
stepping. The easiest way to figure out compute die stepping our end is
to map the media IP's stepping to the compute die. For PTL, compute die
has an A stepping if and only if the media IP's stepping is also A-step
(This relationship is determined on a per platform basis and just
happens to be this way on PTL).

In addition this workaround is a chicken-and-egg problem. Wa_15015404425
requires that all register reads be preceded by four dummy MMIO writes
(including during early driver  init and even pre-OS firmware). The
driver needs to perform some MMIO reads during init which include the
GMD_ID register that contains the Media IPs stepping. To handle this in
the safest manner assume the workaround applies to all of PTL during
driver probe and deactivate the workaround after.

The overall solution becomes a set of two workarounds:

* 15015404425 - a Device OOB workaround that's always active for PTL
* 15015404425_disable - a GT OOB workaround that applies to PTL
  platfroms with a B0 or later stepping

The first of these workarounds issues dummy MMIO writes we do when
reading registers. The second guards logic that disables the first once
we have the necessary information later in the probe process.

v2: rename SoC to device, avoid null pointer dereference, update commit
message.
v3: rebase
v5: move disable check into xe_device_probe to avoid linking in xe_wa
into xe_pci, reword commit message
v6: squash extension and b0 support into 1 patch

Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Matt Atwood <matthew.s.atwood@intel.com>
Link: https://lore.kernel.org/r/20250709221605.172516-7-matthew.s.atwood@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-10 15:36:31 -07:00
Matt Atwood
661a6950e0 drm/xe: Add infrastructure for Device OOB workarounds
Some workarounds need to be able to be applied ahead of any GT
initialization for example 15015404425. This patch creates XE_DEVICE_WA
macro, in the same vein as XE_WA. This macro can be used ahead of GT
initialization, and can be tracked in sysfs. This should alleviate some
of the complexities that exist in i915.

v2: name change SoC to Device, address style issues
v5: split into separate patch from RTP changes, put oob within a struct,
move the initiation of oob workarounds into xe_device_probe_early(),
clean up the comments around XE_WA.

Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Matt Atwood <matthew.s.atwood@intel.com>
Link: https://lore.kernel.org/r/20250709221605.172516-5-matthew.s.atwood@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-10 15:36:30 -07:00
Heikki Krogerus
f0e53aadd7 drm/xe: Support for I2C attached MCUs
Adding adaption/glue layer where the I2C host adapter
(Synopsys DesignWare I2C adapter) and the I2C clients (the
microcontroller units) are enumerated.

The microcontroller units (MCU) that are attached to the GPU
depend on the OEM. The initially supported MCU will be the
Add-In Management Controller (AMC).

Co-developed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Heikki Krogerus <heikki.krogerus@linux.intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://lore.kernel.org/r/20250701122252.2590230-4-heikki.krogerus@linux.intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
[Rodrigo fixed the co-developed tags and SPDX format in the .c file]
2025-07-10 10:19:41 -04:00
Maarten Lankhorst
3effd109c6 drm/xe: Move xe_ttm_sys_mgr_init() downwards.
Now that all previous allocations are gone, ensure no new allocations
will ever be done before xe_display_init_early(), by moving the call
that allows allocations downwards.

Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250619104858.418440-21-dev@lankhorst.se
Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>
2025-06-26 22:11:35 +02:00
Maarten Lankhorst
11bf0f0b3a drm/xe: Split init of xe_gt_init_hwconfig to xe_gt_init and *_early
Now that we added the separate step of initialising GUC in
xe_gt_init_early, it should be ok to initialise the minimum during early
init, and the rest after allocations are allowed.

Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250619104858.418440-20-dev@lankhorst.se
Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>
2025-06-26 22:11:35 +02:00
Maarten Lankhorst
b3412d7233 drm/xe/sriov: Move VF bootstrap and query_config to vf_guc_init
We want to split up GUC init to an alloc and noalloc part to keep the
init path the same for VF and !VF as much as possible.

Everything in vf_guc_init should be done as early as possible, otherwise
VRAM probing becomes impossible.

Also move xe_gt_mmio_init to the end of xe_gt_init_early(), cleaning up
the init in xe_device slightly.

Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250619104858.418440-15-dev@lankhorst.se
Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>
2025-06-26 22:07:44 +02:00
Maarten Lankhorst
e6018b194b drm/xe: Defer memirq init until needed
memirqs require allocations into GGTT, which we cannot use until
after display is enabled.

Now that the initialisation of interrupts is postponed, move memirq
init too.

Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Ilia Levi <ilia.levi@intel.com>
Link: https://lore.kernel.org/r/20250619104858.418440-14-dev@lankhorst.se
Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>
2025-06-26 22:07:44 +02:00
Vinay Belgaumkar
deea6a7d6d drm/xe/bmg: Update Wa_22019338487
Limit GT max frequency to 2600MHz and wait for frequency to reduce
before proceeding with a transient flush. This is really only needed for
the transient flush: if L2 flush is needed due to 16023588340 then
there's no need to do this additional wait since we are already using
the bigger hammer.

v2: Use generic names, ensure user set max frequency requests wait
for flush to complete (Rodrigo)
v3:
 - User requests wait via wait_var_event_timeout (Lucas)
 - Close races on flush + user requests (Lucas)
 - Fix xe_guc_pc_remove_flush_freq_limit() being called on last gt
   rather than root gt (Lucas)
v4:
 - Only apply the freq reducing part if a TDF is needed: L2 flush trumps
   the need for waiting a lower frequency

Fixes: aaa08078e7 ("drm/xe/bmg: Apply Wa_22019338487")
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Link: https://lore.kernel.org/r/20250618-wa-22019338487-v5-4-b888388477f2@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-06-24 09:58:29 -07:00
Lucas De Marchi
5e300ed8a5 drm/xe: Split xe_device_td_flush()
xe_device_td_flush() has 2 possible implementations: an entire L2 flush
or a transient flush, depending on WA 16023588340. Make this clear by
splitting the function so it calls each of them.

Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://lore.kernel.org/r/20250618-wa-22019338487-v5-3-b888388477f2@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-06-24 09:58:03 -07:00
Alexander Usyskin
c28bfb107d drm/xe/nvm: add on-die non-volatile memory device
Enable access to internal non-volatile memory on DGFX
with GSC/CSC devices via a child device.
The nvm child device is exposed via auxiliary bus.

Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Alexander Usyskin <alexander.usyskin@intel.com>
Link: https://lore.kernel.org/r/20250617145159.3803852-7-alexander.usyskin@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-06-23 13:14:50 -04:00
Dave Airlie
36c52fb703 Merge tag 'drm-intel-next-2025-06-18' of https://gitlab.freedesktop.org/drm/i915/kernel into drm-next
drm/i915 feature pull for v6.17:

Features and functionality:
- Add support for DSC fractional link bpp on DP MST (Imre)
- Add support for simultaneous Panel Replay and Adaptive Sync (Jouni)
- Add support for PTL+ double buffered LUT registers (Chaitanya, Ville)
- Add PIPEDMC event handling in preparation for flip queue (Ville)

Refactoring and cleanups:
- Rename lots of DPLL interfaces to unify them (Suraj)
- Allocate struct intel_display dynamically (Jani)
- Abstract VLV IOSF sideband better (Jani)
- Use str_true_false() helper (Yumeng Fang)
- Refactor DSB code in preparation for flip queue (Ville)
- Use drm_modeset_lock_assert_held() instead of open coding (Luca)
- Remove unused arg from skl_scaler_get_filter_select() (Luca)
- Split out a separate display register header (Jani)
- Abstract DRAM detection better (Jani)
- Convert LPT/WPT SBI sideband to struct intel_display (Jani)

Fixes:
- Fix DSI HS command dispatch with forced pipeline flush (Gareth Yu)
- Fix BMG and LNL+ DP adaptive sync SDP programming (Ankit)
- Fix error path for xe display workqueue allocation (Haoxiang Li)
- Disable DP AUX access probe where not required (Imre)
- Fix DKL PHY access if the port is invalid (Luca)
- Fix PSR2_SU_STATUS access on ADL+ (Jouni)
- Add sanity checks for porch and sync on BXT/GLK DSI (Ville)

DRM core changes:
- Change AUX DPCD access probe address (Imre)
- Refactor EDID quirks, amd make them available to drivers (Imre)
- Add quirk for DPCD access probe (Imre)
- Add DPCD definitions for Panel Replay capabilities (Jouni)

Merges:
- Backmerges to sync with v6.15-rcs and v6.16-rc1 (Jani)

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Jani Nikula <jani.nikula@intel.com>
Link: https://lore.kernel.org/r/fff9f231850ed410bd81b53de43eff0b98240d31@intel.com
2025-06-23 10:49:27 +10:00
Dave Airlie
9356b50af5 Merge tag 'drm-misc-next-2025-06-19' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-next
drm-misc-next for 6.17:

UAPI Changes:
- Add Task Information for the wedge API

Cross-subsystem Changes:

Core Changes:
- Fix warnings related to export.h
- fbdev: Make CONFIG_FIRMWARE_EDID available on all architectures
- fence: Fix UAF issues
- format-helper: Improve tests

Driver Changes:
- ivpu: Add turbo flag, Add Wildcat Lake Support
- rz-du: Improve MIPI-DSI Support
- vmwgfx: fence improvement

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Maxime Ripard <mripard@redhat.com>
Link: https://lore.kernel.org/r/20250619-perfect-industrious-whippet-8ed3db@houat
2025-06-20 11:34:09 +10:00
André Almeida
183bccafa1 drm: Create a task info option for wedge events
When a device get wedged, it might be caused by a guilty application.
For userspace, knowing which task was involved can be useful for some
situations, like for implementing a policy, logs or for giving a chance
for the compositor to let the user know what task was involved in the
problem.  This is an optional argument, when the task info is not
available, the PID and TASK string won't appear in the event string.

Sometimes just the PID isn't enough giving that the task might be already
dead by the time userspace will try to check what was this PID's name,
so to make the life easier also notify what's the task's name in the user
event.

Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Krzysztof Karas <krzysztof.karas@intel.com>
Reviewed-by: Raag Jadav <raag.jadav@intel.com>
Acked-by: Christian König <christian.koenig@amd.com>
Link: https://lore.kernel.org/r/20250617124949.2151549-4-andrealmeid@igalia.com
Signed-off-by: André Almeida <andrealmeid@igalia.com>
2025-06-17 11:32:47 -03:00
Daniele Ceraolo Spurio
90f4d3f756 drm/xe/vf: Boostrap all GTs immediately after MMIO init
Currently we perform the bootstrap for the primary GT early on during
device init, while the media GT bootstrap happens when we try and fetch
the hwconfig table. For consistency, move the bootstrap of the media GT
happen at the same time as the primary GT, so that all the subsequent
code can rely on both GTs being in the same state.

v2: Also drop config query from min_guc_load since we now do it
    early (Michal)

Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://lore.kernel.org/r/20250603235432.720833-8-daniele.ceraolospurio@intel.com
2025-06-06 08:33:18 -07:00
Jani Nikula
1e2803e565 drm/xe/display: move xe->display initialization to xe_display_probe()
The future goal is to have intel_display_device_probe() create struct
intel_display. As the first step, postpone xe->display initialization
right before that call. This is the same location as in i915.

There's a subtle functional change here: xe->display will now be
initialized only if xe->info.probe_display.

The xe_display_create() function becomes empty, and can be removed. Move
its documentation to xe_display_probe()

Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://lore.kernel.org/r/6c3075739d84cecea258d686c3ef38455a61191c.1747397638.git.jani.nikula@intel.com
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2025-05-20 20:55:22 +03:00
Matthew Brost
1b894c2246 drm/xe: Add atomic_svm_timeslice_ms debugfs entry
Add some informal control for atomic SVM fault GPU timeslice to be able
to play around with values and tweak performance.

v2:
 - Reduce timeslice default value to 5ms

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Link: https://lore.kernel.org/r/20250512135500.1405019-6-matthew.brost@intel.com
2025-05-12 13:49:18 -07:00
Thomas Hellström
5dd933e33b drm/xe: Make the gem shrinker drm managed
Make the xe drm shrinker drm managed like many other resources
created at device creation time.

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://lore.kernel.org/r/20250508113015.3374-1-thomas.hellstrom@linux.intel.com
2025-05-12 10:01:31 +02:00
Raag Jadav
f3e875b3c0 drm/xe: Move xe_device_sysfs_init() to xe_device_probe()
Since xe_device_sysfs_init() exposes device specific attributes, a better
place for it is xe_device_probe().

Signed-off-by: Raag Jadav <raag.jadav@intel.com>
Reviewed-by: Riana Tauro <riana.tauro@intel.com>
Link: https://lore.kernel.org/r/20250506054835.3395220-2-raag.jadav@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-05-07 15:31:10 -04:00
Riana Tauro
bc417e54e2 drm/xe: Enable configfs support for survivability mode
Enable survivability mode if supported and configfs attribute is set.
Enabling survivability mode manually is useful in cases where pcode does
not detect failure, validation and for IFR (in-field-repair).

To set configfs survivability mode attribute for a device

echo 1 > /sys/kernel/config/xe/0000:03:00.0/survivability_mode

The card enters survivability mode if supported

v2: add a log if survivability mode is enabled for unsupported
    platforms (Rodrigo)

Signed-off-by: Riana Tauro <riana.tauro@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250407051414.1651616-4-riana.tauro@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-04-08 22:24:00 -07:00
Thomas Hellström
3cbb651117 drm/xe/bo: Add a bo remove callback
On device unbind, migrate exported bos, including pagemap bos to
system. This allows importers to take proper action without
disruption. In particular, SVM clients on remote devices may
continue as if nothing happened, and can chose a different
placement.

The evict_flags() placement is chosen in such a way that bos that
aren't exported are purged.

For pinned bos, we unmap DMA, but their pages are not freed yet
since we can't be 100% sure they are not accessed.

All pinned external bos (not just the VRAM ones) are put on the
pinned.external list with this patch. But this only affects the
xe_bo_pci_dev_remove_pinned() function since !VRAM bos are
ignored by the suspend / resume functionality. As a follow-up we
could look at removing the suspend / resume iteration over
pinned external bos since we currently don't allow pinning
external bos in VRAM, and other external bos don't need any
special treatment at suspend / resume.

v2:
- Address review comments. (Matthew Auld).
v3:
- Don't introduce an external_evicted list (Matthew Auld)
- Add a discussion around suspend / resume behaviour to the
  commit message.
- Formatting fixes.
v4:
- Move dma-unmaps of pinned kernel bos to a dev managed
  callback to give subsystems using these bos a chance to
  clean them up. (Matthew Auld)

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://lore.kernel.org/r/20250326080551.40201-4-thomas.hellstrom@linux.intel.com
2025-03-27 12:07:57 +01:00
Lucas De Marchi
676da6ba5b drm/xe: Allow to inject error in early probe
Allow to test if driver behaves correctly when xe_pcode_probe_early()
fails. Note that this is not sufficient for testing survivability mode
as it's still required to read the hw to check for errors, which doesn't
happen on an injected failure.

To complete the early probe coverage, allow injection in the other
functions as well: xe_mmio_probe_early() and xe_device_probe_early().

Reviewed-by: Francois Dugast <francois.dugast@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250314-fix-survivability-v5-3-fdb3559ea965@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-03-21 11:48:22 -07:00
Lucas De Marchi
86b5e0dbba drm/xe: Move survivability back to xe
Commit d40f275d96 ("drm/xe: Move survivability entirely to xe_pci")
moved the survivability handling to be done entirely in the xe_pci
layer. However there are some issues with that approach:

1) Survivability mode needs at least the mmio initialized, otherwise it
   can't really read a register to decide if it should enter that state
2) SR-IOV mode should be initialized, otherwise it's not possible to
   check if it's VF

Besides, as pointed by Riana the check for
xe_survivability_mode_enable() was wrong in xe_pci_probe() since it's
not a bool return.

Fix that by moving the initialization to be entirely in the xe_device
layer, with the correct dependencies handled: only after mmio and sriov
initialization, and not triggering it on error from
wait_for_lmem_ready(). This restores the trigger behavior before that
commit. The xe_pci layer now only checks for "is it enabled?",
like it's doing in xe_pci_suspend()/xe_pci_remove(), etc.

Cc: Riana Tauro <riana.tauro@intel.com>
Fixes: d40f275d96 ("drm/xe: Move survivability entirely to xe_pci")
Reviewed-by: Riana Tauro <riana.tauro@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250314-fix-survivability-v5-1-fdb3559ea965@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-03-21 11:48:09 -07:00
Thomas Hellström
52eb8cd788 Merge drm/drm-next into drm-xe-next
Backmerging to bring in the xe shrinker from drm-next.

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
2025-03-19 11:08:52 +01:00
Michal Wajdeczko
d3414acf4a drm/xe/vf: Don't try Driver-FLR if VF
Driver-FLR can't be triggered from the VF driver, so treat it
as disabled if VF. While around, fix also the message, as it
shouldn't be printed just 'once' as we may have many devices.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250311135726.1998-2-michal.wajdeczko@intel.com
2025-03-12 12:02:55 +01:00
Michal Wajdeczko
de35cc27fd drm/xe: Prefer USEC_PER_SEC over MICRO
It will be easier to understand the meaning of the flr_timeout
value if the USEC_PER_SEC macro is used in the expression.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250311140115.2042-1-michal.wajdeczko@intel.com
2025-03-12 11:44:51 +01:00
Dave Airlie
11a5c6445a Merge tag 'drm-xe-next-2025-03-07' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-next
UAPI Changes:
 - Expose per-engine activity via perf pmu (Riana, Lucas, Umesh)
 - Add support for EU stall sampling (Harish, Ashutosh)
 - Allow userspace to provide low latency hint for submission (Tejas)
 - GPU SVM and Xe SVM implementation (Matthew Brost)

Cross-subsystem Changes:
 - devres handling for component drivers (Lucas)
 - Backmege drm-next to allow cross dependent change with i915
 - GPU SVM and Xe SVM implementation (Matthew Brost)

Core Changes:

Driver Changes:
 - Fixes to userptr and missing validations (Matthew Auld, Thomas
   Hellström, Matthew Brost)
 - devcoredump typos and error handling improvement (Shuicheng)
 - Allow oa_exponent value of 0 (Umesh)
 - Finish moving device probe to devm (Lucas)
 - Fix race between submission restart and scheduled being freed (Tejas)
 - Fix counter overflows in gt_stats (Francois)
 - Refactor and add missing workarounds and tunings for pre-Xe2 platforms
   (Aradhya, Tvrtko)
 - Fix PXP locks interaction with exec queues being killed (Daniele)
 - Eliminate TIMESTAMP_OVERRIDE from xe (Matt Roper)
 - Change xe_gen_wa_oob to allow building on MacOS (Daniel Gomez)
 - New workarounds for Panther Lake (Tejas)
 - Fix VF resume errors (Satyanarayana)
 - Fix workaround infra skipping some workarounds dependent on engine
   initialization (Tvrtko)
 - Improve per-IP descriptors (Gustavo)
 - Add more error injections to probe sequence (Francois)

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/ilc5jvtyaoyi6woyhght5a6sw5jcluiojjueorcyxbynrcpcjp@mw2mi6rd6a7l
2025-03-11 10:26:17 +10:00
Dave Airlie
d65a27f95f Merge tag 'drm-misc-next-2025-03-06' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-next
drm-misc-next for v6.15:

Cross-subsystem Changes:

base:
- component: Provide helper to query bound status

fbdev:
- fbtft: Remove access to page->index

Core Changes:

- Fix usage of logging macros in several places

gem:
- Add test function for imported dma-bufs and use it in core and helpers
- Avoid struct drm_gem_object.import_attach

tests:
- Fix lockdep warnings

ttm:
- Add helpers for TTM shrinker

Driver Changes:

adp:
- Add support for Apple Touch Bar displays on M1/M2

amdxdna:
- Fix interrupt handling

appletbdrm:
- Add support for Apple Touch Bar displays on x86

bridge:
- synopsys: Add HDMI audio support
- ti-sn65dsi83: Support negative DE polarity

ipu-v3:
- Remove unused code

nouveau:
- Avoid multiple -Wflex-array-member-not-at-end warnings

panthor:
- Fix CS_STATUS_ defines
- Improve locking

rockchip:
- analogix_dp: Add eDP support
- lvds: Improve logging
- vop2: Improve HDMI mode handling; Add support for RK3576
- Fix shutdown
- Support rk3562-mali

xe:
- Use TTM shrinker

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20250306130700.GA485504@linux.fritz.box
2025-03-07 09:55:50 +10:00