Commit Graph

288 Commits

Author SHA1 Message Date
Linus Torvalds
bf4afc53b7 Convert 'alloc_obj' family to use the new default GFP_KERNEL argument
This was done entirely with mindless brute force, using

    git grep -l '\<k[vmz]*alloc_objs*(.*, GFP_KERNEL)' |
        xargs sed -i 's/\(alloc_objs*(.*\), GFP_KERNEL)/\1)/'

to convert the new alloc_obj() users that had a simple GFP_KERNEL
argument to just drop that argument.

Note that due to the extreme simplicity of the scripting, any slightly
more complex cases spread over multiple lines would not be triggered:
they definitely exist, but this covers the vast bulk of the cases, and
the resulting diff is also then easier to check automatically.

For the same reason the 'flex' versions will be done as a separate
conversion.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-21 17:09:51 -08:00
Kees Cook
69050f8d6d treewide: Replace kmalloc with kmalloc_obj for non-scalar types
This is the result of running the Coccinelle script from
scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to
avoid scalar types (which need careful case-by-case checking), and
instead replace kmalloc-family calls that allocate struct or union
object instances:

Single allocations:	kmalloc(sizeof(TYPE), ...)
are replaced with:	kmalloc_obj(TYPE, ...)

Array allocations:	kmalloc_array(COUNT, sizeof(TYPE), ...)
are replaced with:	kmalloc_objs(TYPE, COUNT, ...)

Flex array allocations:	kmalloc(struct_size(PTR, FAM, COUNT), ...)
are replaced with:	kmalloc_flex(*PTR, FAM, COUNT, ...)

(where TYPE may also be *VAR)

The resulting allocations no longer return "void *", instead returning
"TYPE *".

Signed-off-by: Kees Cook <kees@kernel.org>
2026-02-21 01:02:28 -08:00
Linus Torvalds
939faf71cf Merge tag 'drm-next-2026-02-11' of https://gitlab.freedesktop.org/drm/kernel
Pull drm updates from Dave Airlie:
 "Highlights:
   - amdgpu support for lots of new IP blocks which means newer GPUs
   - xe has a lot of SR-IOV and SVM improvements
   - lots of intel display refactoring across i915/xe
   - msm has more support for gen8 platforms
   - Given up on kgdb/kms integration, it's too hard on modern hw

  core:
   - drop kgdb support
   - replace system workqueue with percpu
   - account for property blobs in memcg
   - MAINTAINERS updates for xe + buddy

  rust:
   - Fix documentation for Registration constructors
   - Use pin_init::zeroed() for fops initialization
   - Annotate DRM helpers with __rust_helper
   - Improve safety documentation for gem::Object::new()
   - Update AlwaysRefCounted imports
   - mm: Prevent integer overflow in page_align()

  atomic:
   - add drm_device pointer to drm_private_obj
   - introduce gamma/degamma LUT size check

  buddy:
   - fix free_trees memory leak
   - prevent BUG_ON

  bridge:
   - introduce drm_bridge_unplug/enter/exit
   - add connector argument to .hpd_notify
   - lots of recounting conversions
   - convert rockchip inno hdmi to bridge
   - lontium-lt9611uxc: switch to HDMI audio helpers
   - dw-hdmi-qp: add support for HPD-less setups
   - Algoltek AG6311 support

  panels:
   - edp: CSW MNE007QB3-1, AUO B140HAN06.4, AUO B140QAX01.H
   - st75751: add SPI support
   - Sitronix ST7920, Samsung LTL106HL02
   - LG LH546WF1-ED01, HannStar HSD156J
   - BOE NV130WUM-T08
   - Innolux G150XGE-L05
   - Anbernic RG-DS

  dma-buf:
   - improve sg_table debugging
   - add tracepoints
   - call clear_page instead of memset
   - start to introduce cgroup memory accounting in heaps
   - remove sysfs stats

  dma-fence:
   - add new helpers

  dp:
   - mst: avoid oob access with vcpi=0

  hdmi:
   - limit infoframes exposure to userspace

  gem:
   - reduce page table overhead with THP
   - fix leak in drm_gem_get_unmapped_area

  gpuvm:
   - API sanitation for rust bindings

  sched:
   - introduce new helpers

  panic:
   - report invalid panic modes
   - add kunit tests

  i915/xe display:
   - Expose sharpness only if num_scalers is >= 2
   - Add initial Xe3P_LPD for NVL
   - BMG FBC support
   - Add MTL+ platforms to support dpll framework
   _ fix DIMM_S DRM decoding on ICL
   - Return to using AUX interrupts
   - PSR/Panel replay refactoring
   - use consolidation HDMI tables
   - Xe3_LPD CD2X dividier changes

  xe:
   - vfio: add vfio_pci for intel GPU
   - multi queue support
   - dynamic pagemaps and multi-device SVM
   - expose temp attribs in hwmon
   - NO_COMPRESSION bo flag
   - expose MERT OA unit
   - sysfs survivability refactor
   - SRIOV PF: add MERT support
   - enable SR-IOV VF migration
   - Enable I2C/NVM on Crescent Island
   - Xe3p page reclaimation support
   - introduce SRIOV scheduler groups
   - add SoC remappt support in system controller
   - insert compiler barriers in GuC code
   - define NVL GuC firmware
   - handle GT resume failure
   - fix drm scheduler layering violations
   - enable GSC loading and PXP for PTL
   - disable GuC Power DCC strategy on PTL
   - unregister drm device on probe error

  i915:
   - move to kernel standard fault injection
   - bump recommended GuC version for DG2 and MTL

  amdgpu:
   - SMUIO 15.x, PSP 15.x support
   - IH 6.1.1/7.1 support
   - MMHUB 3.4/4.2 support
   - GC 11.5.4/12.1 support
   - SDMA 6.1.4/7.1/7.11.4 support
   - JPEG 5.3 support
   - UserQ updates
   - GC 9 gfx queue reset support
   - TTM memory ops parallelization
   - convert legacy logging to new helpers
   - DC analog fixes

  amdkfd:
   - GC 11.5.4/12.1 suppport
   - SDMA 6.1.4/7.1 support
   - per context support
   - increase kfd process hash table
   - Reserved SDMA rework

  radeon:
   - convert legacy logging to new helpers
   - use devm for i2c adapters

  msm:
   - GPU
      - Document a612/RGMU dt bindings
      - UBWC 6.0 support (for A840 / Kaanapali)
      - a225 support
   - DPU:
      - Switch to use virtual planes by default
      - Fix DSI CMD panels on DPU 3.x
      - Rewrite format handling to remove intermediate representation
      - Fix watchdog on DPU 8.x+
      - Fix TE / Vsync source setting on DPU 8.x+
      - Add 3D_Mux on SC7280
      - Kaanapali platform support
      - Fix UBWC register programming
      - Make RM reserve DSPP-enabled mixers for CRTCs with LMs
      - Gamma correction support
   - DP:
      - Enable support for eDP 1.4+ link rate tables
      - Fix MDSS1 DP indices on SA8775P, making them to work
      - Fix msm_dp_ctrl_config_msa() to work with LLVM 20
   - DSI:
      - Document QCS8300 as compatible with SA8775P
      - Kaanapali platform support
   - DSI PHY:
      - switch to divider_determine_rate()
   - MDP5:
      - Drop support for MSM8998, SDM660 and SDM630 (switch over to DPU)
   -  MDSS:
      - Kaanapali platform support
      - Fixed UBWC register programming

  nova-core:
   - Prepare for Turing support. This includes parsing and handling
     Turing-specific firmware headers and sections as well as a Turing
     Falcon HAL implementation
   - Get rid of the Result<impl PinInit<T, E>> anti-pattern
   - Relocate initializer-specific code into the appropriate initializer
   - Use CStr::from_bytes_until_nul() to remove custom helpers
   - Improve handling of unexpected firmware values
   - Clean up redundant debug prints
   - Replace c_str!() with native Rust C-string literals
   - Update nova-core task list

  nova:
   - Align GEM object size to system page size

  tyr:
   - Use generated uAPI bindings for GpuInfo
   - Replace manual sleeps with read_poll_timeout()
   - Replace c_str!() with native Rust C-string literals
   - Suppress warnings for unread fields
   - Fix incorrect register name in print statement

  nouveau:
   - fix big page table support races in PTE management
   - improve reclocking on tegra 186+

  amdxdna:
   - fix suspend race conditions
   - improve handling of zero tail pointers
   - fix cu_idx overwritten during command setup
   - enable hardware context priority
   - remove NPU2 support
   - update message buffer allocation requirements
   - update firmware version check

  ast:
   - support imported cursor buffers
   - big endian fixes

  etnaviv:
   - add PPU flop reset support

  imagination:
   - add AM62P support
   - introduce hw version checks

  ivpu:
   - implement warm boot flow

  panfrost:
   - add bo sync ioctl
   - add GPU_PM_RT support for RZ/G3E SoC

  panthor:
   - add bo sync ioctl
   - enable timestamp propagation
   - scheduler robustness improvements
   - VM termination fixes
   - huge page support

  rockchip:
   - RK3368 HDMI Support
   - get rid of atomic_check fixups
   - RK3506 support
   - RK3576/RK3588 improved HPD handling

  rz-du:
   - RZ/V2H(P) MIPI-DSI Support

  v3d:
   - fix DMA segment size
   - convert to new logging helpers

  mediatek:
   - move DP training to hotplug thread
   - convert logging to new helpers
   - add support for HS speed DSI
   - Genio 510/700/1200-EVK, Radxa NIO-12L HDMI support

  atmel-hlcdc:
   - switch to drmm resource
   - support nomodeset
   - use newer helpers

  hisilicon:
   - fix various DP bugs

  renesas:
   - fix kernel panic on reboot

  exynos:
   - fix vidi_connection_ioctl using wrong device
   - fix vidi_connection deref user ptr
   - fix concurrency regression with vidi_context

  vkms:
   - add configfs support for display configuration

* tag 'drm-next-2026-02-11' of https://gitlab.freedesktop.org/drm/kernel: (1610 commits)
  drm/xe/pm: Disable D3Cold for BMG only on specific platforms
  drm/xe: Fix kerneldoc for xe_tlb_inval_job_alloc_dep
  drm/xe: Fix kerneldoc for xe_gt_tlb_inval_init_early
  drm/xe: Fix kerneldoc for xe_migrate_exec_queue
  drm/xe/query: Fix topology query pointer advance
  drm/xe/guc: Fix kernel-doc warning in GuC scheduler ABI header
  drm/xe/guc: Fix CFI violation in debugfs access.
  accel/amdxdna: Move RPM resume into job run function
  accel/amdxdna: Fix incorrect DPM level after suspend/resume
  nouveau/vmm: start tracking if the LPT PTE is valid. (v6)
  nouveau/vmm: increase size of vmm pte tracker struct to u32 (v2)
  nouveau/vmm: rewrite pte tracker using a struct and bitfields.
  accel/amdxdna: Fix incorrect error code returned for failed chain command
  accel/amdxdna: Remove hardware context status
  drm/bridge: imx8qxp-pixel-combiner: Fix bailout for imx8qxp_pc_bridge_probe()
  drm/panel: ilitek-ili9882t: Remove duplicate initializers in tianma_il79900a_dsc
  drm/i915/display: fix the pixel normalization handling for xe3p_lpd
  drm/exynos: vidi: use ctx->lock to protect struct vidi_context member variables related to memory alloc/free
  drm/exynos: vidi: fix to avoid directly dereferencing user pointer
  drm/exynos: vidi: use priv->vidi_dev for ctx lookup in vidi_connection_ioctl()
  ...
2026-02-11 12:55:44 -08:00
Shuicheng Lin
2da8fbb8f1 drm/xe/nvm: Manage nvm aux cleanup with devres
Move nvm teardown to a devm-managed action registered from xe_nvm_init().
This ensures the auxiliary NVM device is deleted on probe failure and
device detach without requiring explicit calls from remove paths.

As part of this, drop xe_nvm_fini() from xe_device_remove() and from the
survivability sysfs teardown, and remove the public xe_nvm_fini() API from
the header.

This is to fix below warn message when there is probe failure after
xe_nvm_init(), then xe_device_probe() is called again:
"
[  207.318152] sysfs: cannot create duplicate filename '/devices/pci0000:00/0000:00:01.0/0000:01:00.0/0000:02:01.0/0000:03:00.0/xe.nvm.768'
[  207.318157] CPU: 5 UID: 0 PID: 10261 Comm: modprobe Tainted: G    B   W           6.19.0-rc2-lgci-xe-kernel+ #223 PREEMPT(voluntary)
[  207.318160] Tainted: [B]=BAD_PAGE, [W]=WARN
[  207.318161] Hardware name: ASUS System Product Name/PRIME Z790-P WIFI, BIOS 0812 02/24/2023
[  207.318163] Call Trace:
[  207.318163]  <TASK>
[  207.318165]  dump_stack_lvl+0xa0/0xc0
[  207.318170]  dump_stack+0x10/0x20
[  207.318171]  sysfs_warn_dup+0xd5/0x110
[  207.318175]  sysfs_create_dir_ns+0x1f6/0x280
[  207.318177]  ? __pfx_sysfs_create_dir_ns+0x10/0x10
[  207.318179]  ? lock_acquire+0x1a4/0x2e0
[  207.318182]  ? __kasan_check_read+0x11/0x20
[  207.318185]  ? do_raw_spin_unlock+0x5c/0x240
[  207.318187]  kobject_add_internal+0x28d/0x8e0
[  207.318189]  kobject_add+0x11f/0x1f0
[  207.318191]  ? __pfx_kobject_add+0x10/0x10
[  207.318193]  ? lockdep_init_map_type+0x4b/0x230
[  207.318195]  ? get_device_parent.isra.0+0x43/0x4c0
[  207.318197]  ? kobject_get+0x55/0xf0
[  207.318199]  device_add+0x2d7/0x1500
[  207.318201]  ? __pfx_device_add+0x10/0x10
[  207.318203]  ? lockdep_init_map_type+0x4b/0x230
[  207.318205]  __auxiliary_device_add+0x99/0x140
[  207.318208]  xe_nvm_init+0x7a2/0xef0 [xe]
[  207.318333]  ? xe_devcoredump_init+0x80/0x110 [xe]
[  207.318452]  ? __devm_add_action+0x82/0xc0
[  207.318454]  ? fs_reclaim_release+0xc0/0x110
[  207.318457]  xe_device_probe+0x17dd/0x2c40 [xe]
[  207.318574]  ? __pfx___drm_dev_dbg+0x10/0x10
[  207.318576]  ? add_dr+0x180/0x220
[  207.318579]  ? __pfx___drmm_mutex_release+0x10/0x10
[  207.318582]  ? __pfx_xe_device_probe+0x10/0x10 [xe]
[  207.318697]  ? xe_pm_init_early+0x33a/0x410 [xe]
[  207.318850]  xe_pci_probe+0x936/0x1250 [xe]
[  207.318999]  ? lock_acquire+0x1a4/0x2e0
[  207.319003]  ? __pfx_xe_pci_probe+0x10/0x10 [xe]
[  207.319151]  local_pci_probe+0xe6/0x1a0
[  207.319154]  pci_device_probe+0x523/0x840
[  207.319157]  ? __pfx_pci_device_probe+0x10/0x10
[  207.319159]  ? sysfs_do_create_link_sd.isra.0+0x8c/0x110
[  207.319162]  ? sysfs_create_link+0x48/0xc0
...
"

Fixes: c28bfb107d ("drm/xe/nvm: add on-die non-volatile memory device")
Reviewed-by: Alexander Usyskin <alexander.usyskin@intel.com>
Reviewed-by: Brian Nguyen <brian3.nguyen@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Riana Tauro <riana.tauro@intel.com>
Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Link: https://patch.msgid.link/20260120183239.2966782-6-shuicheng.lin@intel.com
(cherry picked from commit 11035eab1b)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
2026-01-29 20:25:32 +01:00
Shuicheng Lin
96c2c72b81 drm/xe: Unregister drm device on probe error
Call drm_dev_unregister() when xe_device_probe() fails after successful
drm_dev_register(). This ensures the DRM device is promptly unregistered
before returning an error, avoiding leaving it registered on the failure
path.
Otherwise, there is warn message if xe_device_probe() is called again:
"
[  207.322365] [drm:drm_minor_register]
[  207.322381] debugfs: '128' already exists in 'dri'
[  207.322432] sysfs: cannot create duplicate filename '/devices/pci0000:00/0000:00:01.0/0000:01:00.0/0000:02:01.0/0000:03:00.0/drm/renderD128'
[  207.322435] CPU: 5 UID: 0 PID: 10261 Comm: modprobe Tainted: G    B   W           6.19.0-rc2-lgci-xe-kernel+ #223 PREEMPT(voluntary)
[  207.322439] Tainted: [B]=BAD_PAGE, [W]=WARN
[  207.322440] Hardware name: ASUS System Product Name/PRIME Z790-P WIFI, BIOS 0812 02/24/2023
[  207.322441] Call Trace:
[  207.322442]  <TASK>
[  207.322443]  dump_stack_lvl+0xa0/0xc0
[  207.322446]  dump_stack+0x10/0x20
[  207.322448]  sysfs_warn_dup+0xd5/0x110
[  207.322451]  sysfs_create_dir_ns+0x1f6/0x280
[  207.322453]  ? __pfx_sysfs_create_dir_ns+0x10/0x10
[  207.322455]  ? lock_acquire+0x1a4/0x2e0
[  207.322458]  ? __kasan_check_read+0x11/0x20
[  207.322461]  kobject_add_internal+0x28d/0x8e0
[  207.322464]  kobject_add+0x11f/0x1f0
[  207.322465]  ? lock_acquire+0x1a4/0x2e0
[  207.322467]  ? __pfx_kobject_add+0x10/0x10
[  207.322469]  ? __kasan_check_write+0x14/0x20
[  207.322471]  ? kobject_put+0x62/0x4a0
[  207.322473]  ? get_device_parent.isra.0+0x1bb/0x4c0
[  207.322475]  ? kobject_put+0x62/0x4a0
[  207.322477]  device_add+0x2d7/0x1500
[  207.322479]  ? __pfx_device_add+0x10/0x10
[  207.322481]  ? drm_debugfs_add_file+0xfa/0x170
[  207.322483]  ? drm_debugfs_add_files+0x82/0xd0
[  207.322485]  ? drm_debugfs_add_files+0x82/0xd0
[  207.322487]  drm_minor_register+0x10a/0x2d0
[  207.322489]  drm_dev_register+0x143/0x860
[  207.322491]  ? xe_configfs_get_psmi_enabled+0x12/0x90 [xe]
[  207.322667]  xe_device_probe+0x185b/0x2c40 [xe]
[  207.322812]  ? __pfx___drm_dev_dbg+0x10/0x10
[  207.322815]  ? add_dr+0x180/0x220
[  207.322818]  ? __pfx___drmm_mutex_release+0x10/0x10
[  207.322821]  ? __pfx_xe_device_probe+0x10/0x10 [xe]
[  207.322966]  ? xe_pm_init_early+0x33a/0x410 [xe]
[  207.323136]  xe_pci_probe+0x936/0x1250 [xe]
[  207.323298]  ? lock_acquire+0x1a4/0x2e0
[  207.323302]  ? __pfx_xe_pci_probe+0x10/0x10 [xe]
[  207.323464]  local_pci_probe+0xe6/0x1a0
[  207.323468]  pci_device_probe+0x523/0x840
[  207.323470]  ? __pfx_pci_device_probe+0x10/0x10
[  207.323473]  ? sysfs_do_create_link_sd.isra.0+0x8c/0x110
[  207.323476]  ? sysfs_create_link+0x48/0xc0
[  207.323479]  really_probe+0x1fd/0x8a0
...
"

Fixes: dd08ebf6c3 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com>
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Link: https://patch.msgid.link/20260109211041.2446012-2-shuicheng.lin@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
(cherry picked from commit 60bfb8baf8)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2026-01-27 11:40:20 -05:00
Matt Roper
8367585154 drm/xe: Cleanup unused header includes
clangd reports many "unused header" warnings throughout the Xe driver.
Start working to clean this up by removing unnecessary includes in our
.c files and/or replacing them with explicit includes of other headers
that were previously being included indirectly.

By far the most common offender here was unnecessary inclusion of
xe_gt.h.  That likely originates from the early days of xe.ko when
xe_mmio did not exist and all register accesses, including those
unrelated to GTs, were done with GT functions.

There's still a lot of additional #include cleanup that can be done in
the headers themselves; that will come as a followup series.

v2:
 - Squash the 79-patch series down to a single patch.  (MattB)

Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20260115032803.4067824-2-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2026-01-15 07:05:04 -08:00
Balasubramani Vivekanandan
d758c8d6e2 drm/xe/device: Convert wait for lmem init into an assert
Prior to lmem init check, driver is waiting for the pcode uncore_init
status. uncore_init status will be flagged after the complete boot and
initialization of the SoC by the pcode. uncore_init confirms that lmem
init and mmio unblock has been already completed.
It makes no sense to check for lmem init after the pcode uncore_init
check. So change the wait for lmem init check into an assert which
confirms lmem init is set.

Signed-off-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patch.msgid.link/20251219145024.2955946-2-balasubramani.vivekanandan@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2026-01-12 08:56:20 -08:00
Lukasz Laguna
96d45e34f8 drm/xe/pf: Allow upon-any-hang wedged mode only in debug config
The GuC reset policy is global, so disabling it on PF can affect all
running VFs. To avoid unintended side effects, restrict setting
upon-any-hang (2) wedged mode on the PF to debug builds only.

Signed-off-by: Lukasz Laguna <lukasz.laguna@intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20260107174741.29163-5-lukasz.laguna@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2026-01-08 16:07:59 -05:00
Lukasz Laguna
43d78aca8e drm/xe/vf: Disallow setting wedged mode to upon-any-hang
In upon-any-hang (2) wedged mode, engine resets need to be disabled,
which requires changing the GuC reset policy. VFs are not permitted to
do that.

Signed-off-by: Lukasz Laguna <lukasz.laguna@intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patch.msgid.link/20260107174741.29163-4-lukasz.laguna@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2026-01-08 16:07:53 -05:00
Lukasz Laguna
17d3c3365b drm/xe: Validate wedged_mode parameter and define enum for modes
Check correctness of the wedged_mode parameter input to ensure only
supported values are accepted. Additionally, replace magic numbers with
a clearly defined enum.

Signed-off-by: Lukasz Laguna <lukasz.laguna@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patch.msgid.link/20260107174741.29163-2-lukasz.laguna@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2026-01-08 16:07:07 -05:00
Niranjana Vishwanathapura
caaed1dda7 Revert "drm/xe/multi_queue: Support active group after primary is destroyed"
This reverts commit 3131a43ecb.

There is no must have requirement for this feature from Compute UMD.

Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20260106191051.2866538-5-niranjana.vishwanathapura@intel.com
2026-01-06 11:13:54 -08:00
Umesh Nerlige Ramappa
a9f88c68f8 drm/xe/soc_remapper: Initialize SoC remapper during Xe probe
SoC remapper is used to map different HW functions in the SoC to their
respective drivers. Initialize SoC remapper during driver load.

Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Badal Nilawar <badal.nilawar@intel.com>
Link: https://patch.msgid.link/20251223183943.3175941-6-umesh.nerlige.ramappa@intel.com
2025-12-23 11:43:46 -08:00
Thomas Hellström
dff547e137 drm/xe/uapi: Extend the madvise functionality to support foreign pagemap placement for svm
Use device file descriptors and regions to represent pagemaps on
foreign or local devices.

The underlying files are type-checked at madvise time, and
references are kept on the drm_pagemap as long as there is are
madvises pointing to it.

Extend the madvise preferred_location UAPI to support the region
instance to identify the foreign placement.

v2:
- Improve UAPI documentation. (Matt Brost)
- Sanitize preferred_mem_loc.region_instance madvise. (Matt Brost)
- Clarify madvise drm_pagemap vs xe_pagemap refcounting. (Matt Brost)
- Don't allow a foreign drm_pagemap madvise without a fast
  interconnect.
v3:
- Add a comment about reference-counting in xe_devmem_open() and
  remove the reference-count get-and-put. (Matt Brost)

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20251219113320.183860-16-thomas.hellstrom@linux.intel.com
2025-12-23 10:00:48 +01:00
Thomas Hellström
8a52f4d9b1 drm/xe: Use the drm_pagemap cache and shrinker
Define a struct xe_pagemap that embeds all pagemap-related
data used by xekmd, and use the drm_pagemap cache- and
shrinker to manage lifetime.

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20251219113320.183860-9-thomas.hellstrom@linux.intel.com
2025-12-23 09:58:33 +01:00
Matthew Brost
6e608bff25 drm/xe: Add debugfs knobs to control long running workload timeslicing
Add debugfs knobs to control timeslicing for long-running workloads,
allowing quick tuning of values when running benchmarks.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://patch.msgid.link/20251212182847.1683222-4-matthew.brost@intel.com
2025-12-15 13:53:32 -08:00
Jagmeet Randhawa
eafb6f6209 drm/xe: Increase TDF timeout
There are some corner cases where flushing transient
data may take slightly longer than the 150us timeout
we currently allow.  Update the driver to use a 300us
timeout instead based on the latest guidance from
the hardware team. An update to the bspec to formally
document this is expected to arrive soon.

Fixes: c01c6066e6 ("drm/xe/device: implement transient flush")
Signed-off-by: Jagmeet Randhawa <jagmeet.randhawa@intel.com>
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patch.msgid.link/0201b1d6ec64d3651fcbff1ea21026efa915126a.1765487866.git.jagmeet.randhawa@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
(cherry picked from commit d69d3636f5)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
2025-12-15 14:16:57 +01:00
Matt Roper
7ef2d25e47 drm/xe: Track pre-production workaround support
When we're initially enabling driver support for a new platform/IP, we
usually implement all workarounds documented in the WA database in the
driver.  Many of those workarounds are restricted to early steppings
that only showed up in pre-production hardware (i.e., internal test
chips that are not available to the general public).  Since the
workarounds for early, pre-production steppings tend to be some of the
ugliest and most complicated workarounds, we generally want to eliminate
them and simplify the code once the platform has launched and our
internal usage of those pre-production parts have been phased out.

Let's add a flag to the device info that tracks which platforms still
have support for pre-production workarounds for so that we can print a
warning and taint if someone tries to load the driver on a
pre-production part for a platform without pre-production workarounds.
This will help our internal users understand the likely problems they'll
encounter if they try to load the driver on an old pre-production
device.

The Xe behavior here is similar to what we've done for many years on
i915 (see intel_detect_preproduction_hw()), except that instead of
manually coding up ranges of device steppings that we believe to be
pre-production hardware, Xe will use the hardware's own production vs
pre-production fusing status, which we can read from the FUSE2 register.
This fuse didn't exist on older Intel hardware, but should be present on
all platforms supported by the Xe driver.

Going forward, let's set the expectation that we'll start looking into
removing pre-production workarounds for a platform around the time that
platforms of the next major IP stepping are having their force_probe
requirement lifted.  This timing is just a rough guideline; there may be
cases where some instances of pre-production parts are still being
actively used in CI farms, internal device pools, etc. and we'll need to
wait a bit longer for those to be swapped out.

v2:
 - Fix inverted forcewake check

v3:
 - Invert flag and add it to the platforms on which we still have
   pre-prod workarounds.  (Jani, Lucas)

v4:
 - Avoid checking pre-production on VF since they don't have access to
   the FUSE2 register.

Bspec: 78271, 52544
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patch.msgid.link/20251212181411.294854-3-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2025-12-12 21:17:10 -08:00
Jagmeet Randhawa
d69d3636f5 drm/xe: Increase TDF timeout
There are some corner cases where flushing transient
data may take slightly longer than the 150us timeout
we currently allow.  Update the driver to use a 300us
timeout instead based on the latest guidance from
the hardware team. An update to the bspec to formally
document this is expected to arrive soon.

Fixes: c01c6066e6 ("drm/xe/device: implement transient flush")
Signed-off-by: Jagmeet Randhawa <jagmeet.randhawa@intel.com>
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patch.msgid.link/0201b1d6ec64d3651fcbff1ea21026efa915126a.1765487866.git.jagmeet.randhawa@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2025-12-12 09:08:21 -08:00
Niranjana Vishwanathapura
3131a43ecb drm/xe/multi_queue: Support active group after primary is destroyed
Add support to keep the group active after the primary queue is
destroyed. Instead of killing the primary queue during exec_queue
destroy ioctl, kill it when all the secondary queues of the group
are killed.

Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20251211010249.1647839-34-niranjana.vishwanathapura@intel.com
2025-12-11 19:22:05 -08:00
Niranjana Vishwanathapura
2a31ea17d5 drm/xe/multi_queue: Add exec_queue set_property ioctl support
This patch adds support for exec_queue set_property ioctl.
It is derived from the original work which is part of
https://patchwork.freedesktop.org/series/112188/

Currently only DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_QUEUE_PRIORITY
property can be dynamically set.

v2: Check for and update kernel-doc which property this ioctl
    supports (Matt Brost)

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Pallavi Mishra <pallavi.mishra@intel.com>
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20251211010249.1647839-25-niranjana.vishwanathapura@intel.com
2025-12-11 19:21:04 -08:00
Thomas Hellström
0f94e51b53 Merge drm/drm-next into drm-xe-next
Backmerging to bring in a needed dependency for the Xe VFIO
driver variant. This should ideally have been done before we
commited that, so we now have a small window in drm-xe-next
where that driver doesn't compile.

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202512030331.I8CveRre-lkp@intel.com/
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
2025-12-03 11:34:12 +01:00
Matt Roper
89bba8fe92 drm/xe/device: Use scope-based cleanup
Convert device code to use scope-based forcewake and runtime PM.

Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com>
Link: https://patch.msgid.link/20251118164338.3572146-40-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2025-11-19 11:58:57 -08:00
Dave Airlie
61926c915f Merge tag 'drm-xe-next-2025-11-05' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-next
UAPI Changes:

Limit number of jobs per exec queue (Shuicheng)
Add sriov_admin sysfs tree (Michal)

Driver Changes:

Fix an uninitialized value (Thomas)
Expose a residency counter through debugfs (Mohammed Thasleem)
Workaround enabling and improvement (Tapani, Tangudu)
More Crescent Island-specific support (Sk Anirban, Lucas)
PAT entry dump imprement (Xin)
Inline gt_reset in the worker (Lucas)
Synchronize GT reset with device unbind (Balasubramani)
Do clean shutdown also when using flr (Jouni)
Fix serialization on burst of unbinds (Matt Brost)
Pagefault Refactor (Matt Brost)
Remove some unused code (Gwan-gyeong)

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Thomas Hellstrom <thomas.hellstrom@linux.intel.com>
Link: https://patch.msgid.link/aQuBECxNOhudc0Bz@fedora
2025-11-17 09:08:26 +10:00
Dave Airlie
e237dfe708 Merge tag 'drm-misc-next-2025-11-05-1' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-next
drm-misc-next for v6.19-rc1:

UAPI Changes:
- Add userptr support to ivpu.
- Add IOCTL's for resource and telemetry data in amdxdna.

Core Changes:
- Improve some atomic state checking handling.
- drm/client updates.
- Use forward declarations instead of including drm_print.h
- RUse allocation flags in ttm_pool/device_init and allow specifying max
  useful pool size and propagate ENOSPC.
- Updates and fixes to scheduler and bridge code.
- Add support for quirking DisplayID checksum errors.

Driver Changes:
- Assorted cleanups and fixes in rcar-du, accel/ivpu, panel/nv3052cf,
  sti, imxm, accel/qaic, accel/amdxdna, imagination, tidss, sti,
  panthor, vkms.
- Add Samsung S6E3FC2X01 DDIC/AMS641RW, Synaptics TDDI series DSI,
  TL121BVMS07-00 (IL79900A) panels.
- Add mali MediaTek MT8196 SoC gpu support.
- Add etnaviv GC8000 Nano Ultra VIP r6205 support.
- Document powervr ge7800 support in the devicetree.

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patch.msgid.link/5afae707-c9aa-4a47-b726-5e1f1aa7a106@linux.intel.com
2025-11-07 12:41:26 +10:00
Matthew Brost
1919d1687e drm/xe: Implement xe_pagefault_init
Create pagefault queues and initialize them.

v2:
 - Fix kernel doc + add comment for number PF queue (Francois)
v4:
 - Move init after GT init (CI, Francois)

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Francois Dugast <francois.dugast@intel.com>
Tested-by: Francois Dugast <francois.dugast@intel.com>
Link: https://patch.msgid.link/20251031165416.2871503-3-matthew.brost@intel.com
2025-11-04 09:04:26 -08:00
Jouni Högander
a4ff26b7c8 drm/xe: Do clean shutdown also when using flr
Currently Xe driver is triggering flr without any clean-up on
shutdown. This is causing random warnings from pending related works as the
underlying hardware is reset in the middle of their execution.

Fix this by performing clean shutdown also when using flr.

Fixes: 501d799a47 ("drm/xe: Wire up device shutdown handler")
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Signed-off-by: Jouni Högander <jouni.hogander@intel.com>
Reviewed-by: Maarten Lankhorst <dev@lankhorst.se>
Link: https://patch.msgid.link/20251031122312.1836534-1-jouni.hogander@intel.com
Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>
2025-11-04 15:12:16 +01:00
Tvrtko Ursulin
77e19f8d32 drm/ttm: Replace multiple booleans with flags in device init
Multiple consecutive boolean function arguments are usually not very
readable.

Replace the ones in ttm_device_init() with flags with the additional
benefit of soon being able to pass in more data with just a one off
code base churning cost.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Lyude Paul <lyude@redhat.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Maxime Ripard <mripard@kernel.org>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Sui Jingfeng <suijingfeng@loongson.cn>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Cc: Zack Rusin <zack.rusin@broadcom.com>
Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Zack Rusin <zack.rusin@broadcom.com>
Acked-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> # For xe
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Tvrtko Ursulin <tursulin@ursulin.net>
Link: https://lore.kernel.org/r/20251020115411.36818-4-tvrtko.ursulin@igalia.com
[tursulin: fixup checkpatch while applying]
2025-10-31 09:14:35 +00:00
Sanjay Yadav
dd5d11b657 drm/xe: Fix spelling and typos across Xe driver files
Corrected various spelling mistakes and typos in multiple
files under the Xe directory. These fixes improve clarity
and maintain consistency in documentation.

v2
- Replaced all instances of "XE" with "Xe" where it referred
  to the driver name
- of -> for
- Typical -> Typically

v3
- Revert "Xe" to "XE" for macro prefix reference

Signed-off-by: Sanjay Yadav <sanjay.kumar.yadav@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Stuart Summers <stuart.summers@intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patch.msgid.link/20251023121453.1182035-2-sanjay.kumar.yadav@intel.com
2025-10-27 13:00:11 +00:00
Matt Roper
ad0084f3f8 drm/xe: Don't check BIOS-disabled FlatCCS if primary GT is disabled
If the primary is GT is disabled via configfs, we can't read the GT
registers that would tell us whether the BIOS has disabled FlatCCS on a
platform that would otherwise have it; we'll just proceed as if the
FlatCCS is still enabled.  This is similar to the situation seen by
SRIOV VFs and doesn't cause any functional problems since the hardware
will simply drop writes to the CCS region and reads will always come
back as 0 (indicating uncompressed data).  We'll simply miss out on the
chance to avoid some unnecessary overhead during BO creation and
migration.

Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com>
Link: https://lore.kernel.org/r/20251013200944.2499947-45-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2025-10-14 07:45:17 -07:00
Matt Roper
082547d8b4 drm/xe: Skip L2 / TDF cache flushes if primary GT is disabled
If the primary GT is disabled via configfs, GT-side L2 and TD cache
flushes are unnecessary since nothing is using/filling these caches.

Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
Link: https://lore.kernel.org/r/20251013200944.2499947-34-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2025-10-14 07:44:57 -07:00
Kenneth Graunke
146046907b drm/xe: Increase global invalidation timeout to 1000us
The previous timeout of 500us seems to be too small; panning the map in
the Roll20 VTT in Firefox on a KDE/Wayland desktop reliably triggered
timeouts within a few seconds of usage, causing the monitor to freeze
and the following to be printed to dmesg:

[Jul30 13:44] xe 0000:03:00.0: [drm] *ERROR* GT0: Global invalidation timeout
[Jul30 13:48] xe 0000:03:00.0: [drm] *ERROR* [CRTC:82:pipe A] flip_done timed out

I haven't hit a single timeout since increasing it to 1000us even after
several multi-hour testing sessions.

Fixes: 0dd2dd0182 ("drm/xe: Move DSB l2 flush to a more sensible place")
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/5710
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: stable@vger.kernel.org
Cc: Maarten Lankhorst <dev@lankhorst.se>
Reviewed-by: Shuicheng Lin <shuicheng.lin@intel.com>
Link: https://lore.kernel.org/r/20250912223254.147940-1-kenneth@whitecape.org
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-10-09 07:53:40 -07:00
Tejas Upadhyay
15b3036045 drm/xe: Move declarations under conditional branch
The xe_device_shutdown() function was needing a few declarations
that were only required under a specific condition. This change
moves those declarations to be within that conditional branch
to avoid unnecessary declarations.

Reviewed-by: Nitin Gote <nitin.r.gote@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20251007100208.1407021-1-tejas.upadhyay@intel.com
Signed-off-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
2025-10-08 15:07:41 +05:30
Michal Wajdeczko
e35e288090 drm/xe/vf: Don't claim support for firmware late-bind if VF
In general, the VFs can't load firmwares so attempt to initialize
the firmware late-bind component leads to errors like:

 [] xe 0000:03:00.1: [drm] *ERROR* Late bind component not bound

Fixes: 918bd789d6 ("drm/xe/xe_late_bind_fw: Introduce xe_late_bind_fw")
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/6190
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Badal Nilawar <badal.nilawar@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Reviewed-by: Badal Nilawar <badal.nilawar@intel.com>
Link: https://lore.kernel.org/r/20250928174811.198933-3-michal.wajdeczko@intel.com
2025-09-29 21:42:29 +02:00
Michal Wajdeczko
b88bb1eefa drm/xe/vf: Rename sriov_update_device_info
This is a VF only function and its name should reflect that to
avoid any confusion. Move the VF check to the caller side.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Link: https://lore.kernel.org/r/20250928174811.198933-2-michal.wajdeczko@intel.com
2025-09-29 21:42:28 +02:00
Lucas De Marchi
09ab20c41a drm/xe/device: Use poll_timeout_us() to wait for lmem
Now that there's a generic poll_timeout_us(), use it to wait for
LMEM_INIT in GU_CNTL.

Reviewed-by: Maarten Lankhorst <dev@lankhorst.se>
Link: https://lore.kernel.org/r/20250922-xe-iopoll-v4-1-06438311a63f@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-09-24 21:23:18 -07:00
Badal Nilawar
918bd789d6 drm/xe/xe_late_bind_fw: Introduce xe_late_bind_fw
Introduce xe_late_bind_fw to enable firmware loading for the devices,
such as the fan controller, during the driver probe. Typically,
firmware for such devices are part of IFWI flash image but can be
replaced at probe after OEM tuning.
This patch binds mei late binding component to enable firmware loading.

v2:
 - Add devm_add_action_or_reset to remove the component (Daniele)
 - Add INTEL_MEI_GSC check in xe_late_bind_init() (Daniele)
v3:
 - Fail driver probe if late bind initialization fails,
   add has_late_bind flag (Daniele)
v4:
 - %s/I915_COMPONENT_LATE_BIND/INTEL_COMPONENT_LATE_BIND/
v6:
 - rebased
v7:
 - rebased
 - In xe_late_bind_init, use drm_err when returning an error to
   stop the probe (Lucas)
 - Use imperative mode in commit message (Lucas)

Signed-off-by: Badal Nilawar <badal.nilawar@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/20250905154953.3974335-4-badal.nilawar@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-09-18 09:32:00 -07:00
Varun Gupta
010629e00d drm/xe: Fix driver reference in FLR comment
Rectify the reference of i915 to Xe in a comment.

v2: Cosmetic changes. (Karthik)
v3: Rephrased the commit message. (Karthik)

Signed-off-by: Varun Gupta <varun.gupta@intel.com>
Reviewed-by: Karthik Poosa <karthik.poosa@intel.com>
Link: https://lore.kernel.org/r/20250911111712.811524-1-varun.gupta@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-09-11 08:00:33 -07:00
Thomas Hellström
a2f2453c2c drm/xe: Convert xe_bo_create_user() for exhaustive eviction
Use the xe_validation_guard() to convert xe_bo_create_user()
for exhaustive eviction.

v2:
- Adapt to argument changes of xe_validation_guard()

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com> #v1
Link: https://lore.kernel.org/r/20250908101246.65025-4-thomas.hellstrom@linux.intel.com
2025-09-10 09:15:57 +02:00
Satyanarayana K V P
be5590c384 drm/xe/vf: Enable CCS save/restore only on supported GUC versions
CCS save/restore is supported starting with GuC 70.48.0 (compatibility
version 1.23.0). Gate the feature on the GuC firmware version and keep it
disabled on older or unsupported versions.

Fixes: f3009272ff ("drm/xe/vf: Create contexts for CCS read write")
Signed-off-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Andi Shyti <andi.shyti@kernel.org>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://lore.kernel.org/r/20250902103256.21658-2-satyanarayana.k.v.p@intel.com
2025-09-02 18:59:17 +02:00
Riana Tauro
f646c9f937 drm/xe/doc: Document device wedged and runtime survivability
Add documentation for vendor specific device wedged recovery method
and runtime survivability.

v2: fix documentation (Raag)
v3: add userspace tool for firmware update (Raag)
v4: use consistent documentation (Raag)
v5: add more documentation

Signed-off-by: Riana Tauro <riana.tauro@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Raag Jadav <raag.jadav@intel.com>
Link: https://lore.kernel.org/r/20250826063419.3022216-8-riana.tauro@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-08-26 10:11:34 -04:00
Riana Tauro
41ff795aff drm/xe/xe_survivability: Refactor survivability mode
Refactor survivability mode code to support both boot
and runtime survivability.

Signed-off-by: Riana Tauro <riana.tauro@intel.com>
Reviewed-by: Raag Jadav <raag.jadav@intel.com>
Link: https://lore.kernel.org/r/20250826063419.3022216-6-riana.tauro@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-08-26 10:11:34 -04:00
Riana Tauro
60439ac3f2 drm/xe: Add a helper function to set recovery method
Add a helper function to set recovery method. The recovery
method can be set before declaring the device wedged and sending the
drm wedged uevent. If no method is set, default unbind/re-bind method
will be set.

v2: fix documentation (Raag)

Signed-off-by: Riana Tauro <riana.tauro@intel.com>
Reviewed-by: Raag Jadav <raag.jadav@intel.com>
Link: https://lore.kernel.org/r/20250826063419.3022216-5-riana.tauro@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-08-26 10:11:34 -04:00
Riana Tauro
90fdcf5f89 drm/xe: Set GT as wedged before sending wedged uevent
Userspace should be notified after setting the device as wedged.
Re-order function calls to set gt wedged before sending uevent.

Cc: Matthew Brost <matthew.brost@intel.com>
Suggested-by: Raag Jadav <raag.jadav@intel.com>
Signed-off-by: Riana Tauro <riana.tauro@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250826063419.3022216-4-riana.tauro@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-08-26 10:11:34 -04:00
Riana Tauro
38fc73b8c7 drm/xe: Add documentation for Xe Device Wedging
Add documentation for Xe Device Wedging so that
file can be referenced in following patches.

Signed-off-by: Riana Tauro <riana.tauro@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/20250826063419.3022216-2-riana.tauro@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-08-26 10:11:34 -04:00
Himal Prasad Ghimiray
418807860e drm/xe/uapi: Add UAPI for querying VMA count and memory attributes
Introduce the DRM_IOCTL_XE_VM_QUERY_MEMORY_RANGE_ATTRS ioctl to allow
userspace to query memory attributes of VMAs within a user specified
virtual address range.

Userspace first calls the ioctl with num_mem_ranges = 0,
sizeof_mem_ranges_attr = 0 and vector_of_vma_mem_attr = NULL to retrieve
the number of memory ranges (vmas) and size of each memory range attribute.
Then, it allocates a buffer of that size and calls the ioctl again to fill
the buffer with memory range attributes.

This two-step interface allows userspace to first query the required
buffer size, then retrieve detailed attributes efficiently.

v2 (Matthew Brost)
- Use same ioctl to overload functionality

v3
- Add kernel-doc

v4
- Make uapi future proof by passing struct size (Matthew Brost)
- make lock interruptible (Matthew Brost)
- set reserved bits to zero (Matthew Brost)
- s/__copy_to_user/copy_to_user (Matthew Brost)
- Avod using VMA term in uapi (Thomas)
- xe_vm_put(vm) is missing (Shuicheng)

v5
- Nits
- Fix kernel-doc

Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Shuicheng Lin <shuicheng.lin@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250821173104.3030148-21-himal.prasad.ghimiray@intel.com
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
2025-08-26 11:25:36 +05:30
Himal Prasad Ghimiray
e80b05b09f drm/xe: Enable madvise ioctl for xe
Ioctl enables setting up of memory attributes in user provided range.

Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250821173104.3030148-20-himal.prasad.ghimiray@intel.com
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
2025-08-26 11:25:36 +05:30
Lucas De Marchi
aaa0c1f50a drm/xe/psmi: Add debugfs interface for PSMI
Requirement for PSMI capture is to have a physically contiguous buffer.
All the needed configuration is done by the userspace tool directly to
the GPU via mmio access.

This interface only support allocating from VRAM regions. For integrated
devices, the PSMI buffer is in SYSTEM memory and should be allocated by
userspace using hugetlbfs.

Here we add the ability to allocate a region of physically contiguous
memory by writing to debugfs file (listed below). For multi-tile devices,
the capture tool requires ability to allocate a capture buffer per tile
(VRAM region) and so user can specify a region_mask. The tool then
can mmap the buffers via direct mmap of the PCIBAR via sysfs.

To support the capture tool, 3 new debugfs entries are added:

   psmi_capture_addr - physical address per VRAM region's capture buffer
   psmi_capture_region_mask - select which region(s) to allocate a buffer
   psmi_capture_size - size of current capture buffer

Writing psmi_capture_size will allocate new buffer of requested size per
region after freeing any current buffers.

Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Original-author: Brian Welty <brian.welty@intel.com>
Reviewed-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com> # v2
Link: https://lore.kernel.org/r/20250821-psmi-v5-2-34ab7550d3d8@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-08-22 11:46:43 -07:00
Matt Atwood
4d5c98eb77 drm/xe: rename XE_WA to XE_GT_WA
Now that there are two types of wa tables and infrastructure, be more
concise in the naming of GT wa macros.

v2: update the documentation

Signed-off-by: Matt Atwood <matthew.s.atwood@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/20250807214224.32728-1-matthew.s.atwood@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-08-08 10:50:45 -04:00
Balasubramani Vivekanandan
1fdc4c381f drm/xe/devcoredump: Defer devcoredump initialization during probe
Doing devcoredump initializing before GT though look harmless, it leads
to problem during driver unbind. Because of this order, GT/Engine
release functions will be called before xe devcoredump release function
(xe_driver_devcoredump_fini) leading to the following kernel crash[1]
because the devcoredump functions might still use GT/Engine
datastructures after those are freed.

The following crash is observed while running the IGT
xe_wedged@wedged-at-any-timeout. The test forces a wedged state by
submitting a workload which hangs. Then does a unbind/rebind of the
driver to recover from the wedged state.
The hanged workload leads to a devcoredump. The following crash is
noticed when the devcoredump capture races with the driver unbind.
During driver unbind, the release function hw_engine_fini() will be
called which assigns NULL to hwe->gt. But the same data structure is
accessed during the coredump capture in the function
xe_engine_snapshot_print by reading snapshot->hwe->gt.

With this patch, we make sure the devcoredump is stopped before
deinitializing the core driver functions.

[1]:
BUG: kernel NULL pointer dereference, address: 0000000000000000
Workqueue: events_unbound xe_devcoredump_deferred_snap_work [xe]
RIP: 0010:xe_engine_snapshot_print+0x47/0x420 [xe]
Call Trace:
 <TASK>
 ? drm_printf+0x64/0x90
 __xe_devcoredump_read+0x23f/0x2d0 [xe]
 ? __pfx___drm_printfn_coredump+0x10/0x10
 ? __pfx___drm_puts_coredump+0x10/0x10
 xe_devcoredump_deferred_snap_work+0x17a/0x190 [xe]
 process_one_work+0x22e/0x6f0
 worker_thread+0x1e8/0x3d0
 ? __pfx_worker_thread+0x10/0x10
 kthread+0x11f/0x250
 ? __pfx_kthread+0x10/0x10
 ret_from_fork+0x47/0x70
 ? __pfx_kthread+0x10/0x10
 ret_from_fork_asm+0x1a/0x30

v2: Detailed commit description (Rodrigo)
v3: FIXME added (Rodrigo, Stuart)

Fixes: 4209d635a8 ("drm/xe: Remove devcoredump during driver release")
Reviewed-by: Stuart Summers <stuart.summers@intel.com>
Link: https://lore.kernel.org/r/20250731061300.14320-1-balasubramani.vivekanandan@intel.com
Signed-off-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com>
Link: https://lore.kernel.org/r/20250801052356.21885-1-balasubramani.vivekanandan@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-08-01 11:28:35 -04:00
Lukasz Laguna
552dbba1ca drm/xe/vf: Disable CSC support on VF
CSC is not accessible by VF drivers, so disable its support flag on VF
to prevent further initialization attempts.

Fixes: e02cea83d3 ("drm/xe/gsc: add Battlemage support")
Signed-off-by: Lukasz Laguna <lukasz.laguna@intel.com>
Cc: Alexander Usyskin <alexander.usyskin@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://lore.kernel.org/r/20250729123437.5933-1-lukasz.laguna@intel.com
2025-07-30 13:09:46 +02:00