Commit Graph

3735 Commits

Author SHA1 Message Date
Linus Torvalds
260f6f4fda Merge tag 'drm-next-2025-07-30' of https://gitlab.freedesktop.org/drm/kernel
Pull drm updates from Dave Airlie:
 "Highlights:

   - Intel xe enable Panthor Lake, started adding WildCat Lake

   - amdgpu has a bunch of reset improvments along with the usual IP
     updates

   - msm got VM_BIND support which is important for vulkan sparse memory

   - more drm_panic users

   - gpusvm common code to handle a bunch of core SVM work outside
     drivers.

  Detail summary:

  Changes outside drm subdirectory:
   - 'shrink_shmem_memory()' for better shmem/hibernate interaction
   - Rust support infrastructure:
      - make ETIMEDOUT available
      - add size constants up to SZ_2G
      - add DMA coherent allocation bindings
   - mtd driver for Intel GPU non-volatile storage
   - i2c designware quirk for Intel xe

  core:
   - atomic helpers: tune enable/disable sequences
   - add task info to wedge API
   - refactor EDID quirks
   - connector: move HDR sink to drm_display_info
   - fourcc: half-float and 32-bit float formats
   - mode_config: pass format info to simplify

  dma-buf:
   - heaps: Give CMA heap a stable name

  ci:
   - add device tree validation and kunit

  displayport:
   - change AUX DPCD access probe address
   - add quirk for DPCD probe
   - add panel replay definitions
   - backlight control helpers

  fbdev:
   - make CONFIG_FIRMWARE_EDID available on all arches

  fence:
   - fix UAF issues

  format-helper:
   - improve tests

  gpusvm:
   - introduce devmem only flag for allocation
   - add timeslicing support to GPU SVM

  ttm:
   - improve eviction

  sched:
   - tracing improvements
   - kunit improvements
   - memory leak fixes
   - reset handling improvements

  color mgmt:
   - add hardware gamma LUT handling helpers

  bridge:
   - add destroy hook
   - switch to reference counted drm_bridge allocations
   - tc358767: convert to devm_drm_bridge_alloc
   - improve CEC handling

  panel:
   - switch to reference counter drm_panel allocations
   - fwnode panel lookup
   - Huiling hl055fhv028c support
   - Raspberry Pi 7" 720x1280 support
   - edp: KDC KD116N3730A05, N160JCE-ELL CMN, N116BCJ-EAK
   - simple: AUO P238HAN01
   - st7701: Winstar wf40eswaa6mnn0
   - visionox: rm69299-shift
   - Renesas R61307, Renesas R69328 support
   - DJN HX83112B

  hdmi:
   - add CEC handling
   - YUV420 output support

  xe:
   - WildCat Lake support
   - Enable PanthorLake by default
   - mark BMG as SRIOV capable
   - update firmware recommendations
   - Expose media OA units
   - aux-bux support for non-volatile memory
   - MTD intel-dg driver for non-volatile memory
   - Expose fan control and voltage regulator in sysfs
   - restructure migration for multi-device
   - Restore GuC submit UAF fix
   - make GEM shrinker drm managed
   - SRIOV VF Post-migration recovery of GGTT nodes
   - W/A additions/reworks
   - Prefetch support for svm ranges
   - Don't allocate managed BO for each policy change
   - HWMON fixes for BMG
   - Create LRC BO without VM
   - PCI ID updates
   - make SLPC debugfs files optional
   - rework eviction rejection of bound external BOs
   - consolidate PAT programming logic for pre/post Xe2
   - init changes for flicker-free boot
   - Enable GuC Dynamic Inhibit Context switch

  i915:
   - drm_panic support for i915/xe
   - initial flip queue off by default for LNL/PNL
   - Wildcat Lake Display support
   - Support for DSC fractional link bpp
   - Support for simultaneous Panel Replay and Adaptive sync
   - Support for PTL+ double buffer LUT
   - initial PIPEDMC event handling
   - drm_panel_follower support
   - DPLL interface renames
   - allocate struct intel_display dynamically
   - flip queue preperation
   - abstract DRAM detection better
   - avoid GuC scheduling stalls
   - remove DG1 force probe requirement
   - fix MEI interrupt handler on RT kernels
   - use backlight control helpers for eDP
   - more shared display code refactoring

  amdgpu:
   - add userq slot to INFO ioctl
   - SR-IOV hibernation support
   - Suspend improvements
   - Backlight improvements
   - Use scaling for non-native eDP modes
   - cleaner shader updates for GC 9.x
   - Remove fence slab
   - SDMA fw checks for userq support
   - RAS updates
   - DMCUB updates
   - DP tunneling fixes
   - Display idle D3 support
   - Per queue reset improvements
   - initial smartmux support

  amdkfd:
   - enable KFD on loongarch
   - mtype fix for ext coherent system memory

  radeon:
   - CS validation additional GL extensions
   - drop console lock during suspend/resume
   - bump driver version

  msm:
   - VM BIND support
   - CI: infrastructure updates
   - UBWC single source of truth
   - decouple GPU and KMS support
   - DP: rework I/O accessors
   - DPU: SM8750 support
   - DSI: SM8750 support
   - GPU: X1-45 support and speedbin support for X1-85
   - MDSS: SM8750 support

  nova:
   - register! macro improvements
   - DMA object abstraction
   - VBIOS parser + fwsec lookup
   - sysmem flush page support
   - falcon: generic falcon boot code and HAL
   - FWSEC-FRTS: fb setup and load/execute

  ivpu:
   - Add Wildcat Lake support
   - Add turbo flag

  ast:
   - improve hardware generations implementation

  imx:
   - IMX8qxq Display Controller support

  lima:
   - Rockchip RK3528 GPU support

  nouveau:
   - fence handling cleanup

  panfrost:
   - MT8370 support
   - bo labeling
   - 64-bit register access

  qaic:
   - add RAS support

  rockchip:
   - convert inno_hdmi to a bridge

  rz-du:
   - add RZ/V2H(P) support
   - MIPI-DSI DCS support

  sitronix:
   - ST7567 support

  sun4i:
   - add H616 support

  tidss:
   - add TI AM62L support
   - AM65x OLDI bridge support

  bochs:
   - drm panic support

  vkms:
   - YUV and R* format support
   - use faux device

  vmwgfx:
   - fence improvements

  hyperv:
   - move out of simple
   - add drm_panic support"

* tag 'drm-next-2025-07-30' of https://gitlab.freedesktop.org/drm/kernel: (1479 commits)
  drm/tidss: oldi: convert to devm_drm_bridge_alloc() API
  drm/tidss: encoder: convert to devm_drm_bridge_alloc()
  drm/amdgpu: move reset support type checks into the caller
  drm/amdgpu/sdma7: re-emit unprocessed state on ring reset
  drm/amdgpu/sdma6: re-emit unprocessed state on ring reset
  drm/amdgpu/sdma5.2: re-emit unprocessed state on ring reset
  drm/amdgpu/sdma5: re-emit unprocessed state on ring reset
  drm/amdgpu/gfx12: re-emit unprocessed state on ring reset
  drm/amdgpu/gfx11: re-emit unprocessed state on ring reset
  drm/amdgpu/gfx10: re-emit unprocessed state on ring reset
  drm/amdgpu/gfx9.4.3: re-emit unprocessed state on kcq reset
  drm/amdgpu/gfx9: re-emit unprocessed state on kcq reset
  drm/amdgpu: Add WARN_ON to the resource clear function
  drm/amd/pm: Use cached metrics data on SMUv13.0.6
  drm/amd/pm: Use cached data for min/max clocks
  gpu: nova-core: fix bounds check in PmuLookupTableEntry::new
  drm/amdgpu: Replace HQD terminology with slots naming
  drm/amdgpu: Add user queue instance count in HW IP info
  drm/amd/amdgpu: Add helper functions for isp buffers
  drm/amd/amdgpu: Initialize swnode for ISP MFD device
  ...
2025-07-30 19:26:49 -07:00
Linus Torvalds
9669b2499e Merge tag 'platform-drivers-x86-v6.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86
Pull x86 platform drivers from Ilpo Järvinen:

 - alienware: Add more precise labels to fans

 - amd/hsmp: Improve misleading probe errors (make the legacy driver
   aware when HSMP is supported through the ACPI driver)

 - amd/pmc: Add Lenovo Yoga 6 13ALCL6 to pmc quirk list

 - drm/xe: Correct (D)VSEC information to support PMT crashlog feature

 - fujitsu: Clamp charge threshold instead of returning an error

 - ideapad: Expore change types

 - intel/pmt:
     - Add PMT Discovery driver
     - Add API to retrieve telemetry regions by feature
     - Fix crashlog NULL access
     - Support Battlemage GPU (BMG) crashlog

 - intel/vsec:
     - Add Discovery feature
     - Add feature dependency support using device links

 - lenovo:
     - Move lenovo drivers under drivers/platform/x86/lenovo/
     - Add WMI drivers for Lenovo Gaming series
     - Improve DMI handling

 - oxpec:
     - Add support for OneXPlayer X1 Mini Pro (Strix Point variant)
     - Fix EC registers for G1 AMD

 - samsung-laptop: Expose change types

 - wmi: Fix WMI device naming issue (same GUID corner cases)

 - x86-android-tables: Add ovc-capacity-table to generic battery nodes

 - Miscellaneous cleanups / refactoring / improvements

* tag 'platform-drivers-x86-v6.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86: (63 commits)
  platform/x86: oxpec: Add support for OneXPlayer X1 Mini Pro (Strix Point)
  platform/x86: oxpec: Fix turbo register for G1 AMD
  platform/x86/intel/pmt: support BMG crashlog
  platform/x86/intel/pmt: use a version struct
  platform/x86/intel/pmt: refactor base parameter
  platform/x86/intel/pmt: add register access helpers
  platform/x86/intel/pmt: decouple sysfs and namespace
  platform/x86/intel/pmt: correct types
  platform/x86/intel/pmt: re-order trigger logic
  platform/x86/intel/pmt: use guard(mutex)
  platform/x86/intel/pmt: mutex clean up
  platform/x86/intel/pmt: white space cleanup
  drm/xe: Correct BMG VSEC header sizing
  drm/xe: Correct the rev value for the DVSEC entries
  platform/x86/intel/pmt: fix a crashlog NULL pointer access
  platform/x86: samsung-laptop: Expose charge_types
  platform/x86/amd: pmc: Add Lenovo Yoga 6 13ALC6 to pmc quirk list
  platform/x86: dell-uart-backlight: Use blacklight power constant
  platform/x86/intel/pmt: fix build dependency for kunit test
  platform/x86: lenovo: gamezone needs "other mode"
  ...
2025-07-28 23:21:28 -07:00
Lucas De Marchi
99e91521ce drm/xe: Fix build without debugfs
When CONFIG_DEBUG_FS is off, drivers/gpu/drm/xe/xe_gt_debugfs.o
is not built and build fails on some setups with:

	ld: drivers/gpu/drm/xe/xe_gt.o: in function `xe_fault_inject_gt_reset':
	drivers/gpu/drm/xe/xe_gt.h:27:(.text+0x1659): undefined reference to `gt_reset_failure'
	ld: drivers/gpu/drm/xe/xe_gt.h:27:(.text+0x1c16): undefined reference to `gt_reset_failure'
	collect2: error: ld returned 1 exit status

Do not use the gt_reset_failure attribute if debugfs is not enabled.

Fixes: 8f3013e0b2 ("drm/xe: Introduce fault injection for gt reset")
Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Tested-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Link: https://lore.kernel.org/r/20250722-xe-fix-build-fault-v1-1-157384d50987@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
(cherry picked from commit 4d3bbe9dd2)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
2025-07-24 21:06:52 +02:00
Michael J. Ruhl
5b27388171 drm/xe: Correct BMG VSEC header sizing
The intel_vsec_header information for the crashlog feature is
incorrect.

Update the VSEC header with correct sizing and count.

Since the crashlog entries are "merged" (num_entries = 2), the
separate capabilities entries must be merged as well.

Fixes: 0c45e76fcc ("drm/xe/vsec: Support BMG devices")
Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Reviewed-by: David E. Box <david.e.box@linux.intel.com>
Link: https://lore.kernel.org/r/20250713172943.7335-4-michael.j.ruhl@intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2025-07-22 17:38:49 +03:00
Michael J. Ruhl
0ba9e9cf76 drm/xe: Correct the rev value for the DVSEC entries
By definition, the Designated Vendor Specific Extended Capability
(DVSEC) revision should be 1.

Add the rev value to be correct.

Fixes: 0c45e76fcc ("drm/xe/vsec: Support BMG devices")
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Reviewed-by: David E. Box <david.e.box@linux.intel.com>
Link: https://lore.kernel.org/r/20250713172943.7335-3-michael.j.ruhl@intel.com
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2025-07-22 17:38:47 +03:00
Dave Airlie
be3cd668ff Merge tag 'drm-misc-next-2025-07-17' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-next
drm-misc-next for 6.17:

UAPI Changes:

Cross-subsystem Changes:

Core Changes:

- mode_config: Change fb_create prototype to pass the drm_format_info
  and avoid redundant lookups in drivers
- sched: kunit improvements, memory leak fixes, reset handling
  improvements
- tests: kunit EDID update

Driver Changes:

- amdgpu: Hibernation fixes, structure lifetime fixes
- nouveau: sched improvements
- sitronix: Add Sitronix ST7567 Support

- bridge:
  - Make connector available to bridge detect hook

- panel:
  - More refcounting changes
  - New panels: BOE NE14QDM

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Maxime Ripard <mripard@redhat.com>
Link: https://lore.kernel.org/r/20250717-efficient-kudu-of-fantasy-ff95e0@houat
2025-07-21 09:16:51 +10:00
Dave Airlie
af42cf30ea Merge tag 'drm-xe-next-2025-07-15' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-next
Driver Changes:
 - Create and use XE_DEVICE_WA infrastructure (Atwood)
 - SRIOV: Mark BMG as SR-IOV capable (Michal)
 - Dont skip TLB invalidations on VF (Tejas)
 - Fix migration copy direction in access_memory (Auld)
 - General code clean-up (Lucas, Brost, Dr. David, Xin)
 - More missing XeLP workarounds (Tvrtko)
 - SRIOV: Relax VF/PF version negotiation (Michal)
 - SRIOV: LMTT invalidation (Michal)

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/aHacDvF9IaVHI61C@intel.com
2025-07-18 19:48:20 +10:00
Michal Wajdeczko
5c244eeca5 drm/xe/pf: Resend PF provisioning after GT reset
If we reload the GuC due to suspend/resume or GT reset then we
have to resend not only any VFs provisioning data, but also PF
configuration, like scheduling parameters (EQ, PT), as otherwise
GuC will continue to use default values.

Fixes: 411220808c ("drm/xe/pf: Restart VFs provisioning after GT reset")
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Link: https://lore.kernel.org/r/20250711193316.1920-3-michal.wajdeczko@intel.com
(cherry picked from commit 1c38dd6afa)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-17 09:51:51 -03:00
Michal Wajdeczko
81dccec448 drm/xe/pf: Prepare to stop SR-IOV support prior GT reset
As part of the resume or GT reset, the PF driver schedules work
which is then used to complete restarting of the SR-IOV support,
including resending to the GuC configurations of provisioned VFs.

However, in case of short delay between those two actions, which
could be seen by triggering a GT reset on the suspened device:

 $ echo 1 > /sys/kernel/debug/dri/0000:00:02.0/gt0/force_reset

this PF worker might be still busy, which lead to errors due to
just stopped or disabled GuC CTB communication:

 [ ] xe 0000:00:02.0: [drm:xe_gt_resume [xe]] GT0: resumed
 [ ] xe 0000:00:02.0: [drm] GT0: trying reset from force_reset_show [xe]
 [ ] xe 0000:00:02.0: [drm] GT0: reset queued
 [ ] xe 0000:00:02.0: [drm] GT0: reset started
 [ ] xe 0000:00:02.0: [drm:guc_ct_change_state [xe]] GT0: GuC CT communication channel stopped
 [ ] xe 0000:00:02.0: [drm:guc_ct_send_recv [xe]] GT0: H2G request 0x5503 canceled!
 [ ] xe 0000:00:02.0: [drm] GT0: PF: Failed to push VF1 12 config KLVs (-ECANCELED)
 [ ] xe 0000:00:02.0: [drm] GT0: PF: Failed to push VF1 configuration (-ECANCELED)
 [ ] xe 0000:00:02.0: [drm:guc_ct_change_state [xe]] GT0: GuC CT communication channel disabled
 [ ] xe 0000:00:02.0: [drm] GT0: PF: Failed to push VF2 12 config KLVs (-ENODEV)
 [ ] xe 0000:00:02.0: [drm] GT0: PF: Failed to push VF2 configuration (-ENODEV)
 [ ] xe 0000:00:02.0: [drm] GT0: PF: Failed to push 2 of 2 VFs configurations
 [ ] xe 0000:00:02.0: [drm:pf_worker_restart_func [xe]] GT0: PF: restart completed

While this VFs reprovisioning will be successful during next spin
of the worker, to avoid those errors, make sure to cancel restart
worker if we are about to trigger next reset.

Fixes: 411220808c ("drm/xe/pf: Restart VFs provisioning after GT reset")
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Link: https://lore.kernel.org/r/20250711193316.1920-2-michal.wajdeczko@intel.com
(cherry picked from commit 9f50b729dd)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-17 09:51:51 -03:00
Lucas De Marchi
057a7d66f9 drm/xe/migrate: Fix alignment check
The check would fail if the address is unaligned, but not when
accounting the offset. Instead of `buf | offset` it should have
been `buf + offset`. To make it more readable and also drop the
uintptr_t, just use the IS_ALIGNED() macro.

Fixes: 270172f64b ("drm/xe: Update xe_ttm_access_memory to use GPU for non-visible access")
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://lore.kernel.org/r/20250710-migrate-aligned-v1-1-44003ef3c078@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
(cherry picked from commit 81e139db69)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-17 09:51:51 -03:00
Matthew Brost
3155ac8925 drm/xe: Move page fault init after topology init
We need the topology to determine GT page fault queue size, move page
fault init after topology init.

Cc: stable@vger.kernel.org
Fixes: 3338e4f90c ("drm/xe: Use topology to determine page fault queue size")
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Reviewed-by: Stuart Summers <stuart.summers@intel.com>
Link: https://lore.kernel.org/r/20250710191208.1040215-1-matthew.brost@intel.com
(cherry picked from commit beb72acb5b)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-17 09:51:27 -03:00
Balasubramani Vivekanandan
2a58b21ade drm/xe/mocs: Initialize MOCS index early
MOCS uc_index is used even before it is initialized in the following
callstack
    guc_prepare_xfer()
    __xe_guc_upload()
    xe_guc_min_load_for_hwconfig()
    xe_uc_init_hwconfig()
    xe_gt_init_hwconfig()

Do MOCS index initialization earlier in the device probe.

Signed-off-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com>
Reviewed-by: Ravi Kumar Vodapalli <ravi.kumar.vodapalli@intel.com>
Link: https://lore.kernel.org/r/20250520142445.2792824-1-balasubramani.vivekanandan@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
(cherry picked from commit 241cc827c0)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-17 09:51:12 -03:00
Matthew Auld
1381c231c1 drm/xe/migrate: fix copy direction in access_memory
After we do the modification on the host side, ensure we write the
result back to VRAM and not the other way around, otherwise the
modification will be lost if treated like a read.

Fixes: 270172f64b ("drm/xe: Update xe_ttm_access_memory to use GPU for non-visible access")
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250710134128.800756-2-matthew.auld@intel.com
(cherry picked from commit c12fe703ca)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-17 09:45:19 -03:00
Tejas Upadhyay
fd25fa90ed drm/xe: Dont skip TLB invalidations on VF
Skipping TLB invalidations on VF causing unrecoverable
faults. Probable reason for skipping TLB invalidations
on SRIOV could be lack of support for instruction
MI_FLUSH_DW_STORE_INDEX. Add back TLB flush with some
additional handling.

Helps in resolving,
[  704.913454] xe 0000:00:02.1: [drm:pf_queue_work_func [xe]]
                ASID: 0
                VFID: 0
                PDATA: 0x0d92
                Faulted Address: 0x0000000002fa0000
                FaultType: 0
                AccessType: 1
                FaultLevel: 0
                EngineClass: 3 bcs
                EngineInstance: 8
[  704.913551] xe 0000:00:02.1: [drm:pf_queue_work_func [xe]] Fault response: Unsuccessful -22

V2:
 - Use Xmas tree (MichalW)

Suggested-by: Matthew Brost <matthew.brost@intel.com>
Fixes: 97515d0b3e ("drm/xe/vf: Don't emit access to Global HWSP if VF")
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250710045945.1023840-1-tejas.upadhyay@intel.com
Signed-off-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
(cherry picked from commit b528e896fa)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-17 09:45:19 -03:00
Ville Syrjälä
800df9e50c drm/i915: Pass along the format info from .fb_create() to drm_helper_mode_fill_fb_struct()
Plumb the format info from .fb_create() all the way to
drm_helper_mode_fill_fb_struct() to avoid the redundant
lookup.

For the fbdev case a manual drm_get_format_info() lookup
is needed.

Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de>
Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250701090722.13645-14-ville.syrjala@linux.intel.com
2025-07-16 20:09:08 +03:00
Maíra Canal
53dcd0eaa2 drm/xe: Use DRM_GPU_SCHED_STAT_NO_HANG to skip the reset
Xe can skip the reset if TDR has fired before the free job worker and can
also re-arm the timeout timer in some scenarios. Instead of manipulating
scheduler's internals, inform the scheduler that the job did not actually
timeout and no reset was performed through the new status code
DRM_GPU_SCHED_STAT_NO_HANG.

Note that, in the first case, there is no need to restart submission if it
hasn't been stopped.

Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250714-sched-skip-reset-v6-7-5c5ba4f55039@igalia.com
Signed-off-by: Maíra Canal <mcanal@igalia.com>
2025-07-15 08:27:08 -03:00
Maíra Canal
0a5dc1b67e drm/sched: Rename DRM_GPU_SCHED_STAT_NOMINAL to DRM_GPU_SCHED_STAT_RESET
Among the scheduler's statuses, the only one that indicates an error is
DRM_GPU_SCHED_STAT_ENODEV. Any status other than DRM_GPU_SCHED_STAT_ENODEV
signifies that the operation succeeded and the GPU is in a nominal state.

However, to provide more information about the GPU's status, it is needed
to convey more information than just "OK".

Therefore, rename DRM_GPU_SCHED_STAT_NOMINAL to
DRM_GPU_SCHED_STAT_RESET, which better communicates the meaning of this
status. The status DRM_GPU_SCHED_STAT_RESET indicates that the GPU has
hung, but it has been successfully reset and is now in a nominal state
again.

Reviewed-by: Philipp Stanner <phasta@kernel.org>
Link: https://lore.kernel.org/r/20250714-sched-skip-reset-v6-1-5c5ba4f55039@igalia.com
Signed-off-by: Maíra Canal <mcanal@igalia.com>
2025-07-15 08:27:00 -03:00
Michal Wajdeczko
a816487681 drm/xe/pf: Invalidate LMTT after completing changes
Once we finish populating all leaf pages in the VF's LMTT we should
make sure that hardware will not access any stale data. Explicitly
force LMTT invalidation (as it was already planned in the past).

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Piotr Piórkowski <piotr.piorkowski@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Link: https://lore.kernel.org/r/20250711193316.1920-7-michal.wajdeczko@intel.com
2025-07-15 13:05:22 +02:00
Michal Wajdeczko
e497957fee drm/xe/pf: Invalidate LMTT during LMEM unprovisioning
Invalidate LMTT immediately after removing VF's LMTT page tables
and clearing root PTE in the LMTT PD to avoid any invalid access
by the hardware (and VF) due to stale data.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Piotr Piórkowski <piotr.piorkowski@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250711193316.1920-6-michal.wajdeczko@intel.com
2025-07-15 13:05:20 +02:00
Michal Wajdeczko
68ae022278 drm/xe/pf: Force GuC virtualization mode
By default the GuC starts in the 'native' mode and enables the VGT
mode (aka 'virtualization' mode) only after it receives at least one
set of VF configuration data. While this happens naturally while PF
begins VFs provisioning, we might need this sooner as some actions,
like TLB_INVALIDATION_ALL(0x7002), is supported by the GuC only in
the VGT mode.

And this becomes a real problem if we would want to use above action
to invalidate the LMTT early during VFs auto-provisioning, before VFs
are enabled, as such H2G would be rejected:

 [ ] xe 0000:4d:00.0: [drm] *ERROR* GT0: FAST_REQ H2G fence 0x804e failed! e=0x30, h=0
 [ ] xe 0000:4d:00.0: [drm] *ERROR* GT0: Fence 0x804e was used by action 0x7002 sent at:
      h2g_write+0x33e/0x870 [xe]
      __guc_ct_send_locked+0x1e1/0x1110 [xe]
      guc_ct_send_locked+0x9f/0x740 [xe]
      xe_guc_ct_send_locked+0x19/0x60 [xe]
      send_tlb_invalidation+0xc2/0x470 [xe]
      xe_gt_tlb_invalidation_all_async+0x45/0xa0 [xe]
      xe_gt_tlb_invalidation_all+0x4b/0xa0 [xe]
      lmtt_invalidate_hw+0x64/0x1a0 [xe]
      xe_lmtt_invalidate_hw+0x5c/0x340 [xe]
      pf_update_vf_lmtt+0x398/0xae0 [xe]
      pf_provision_vf_lmem+0x350/0xa60 [xe]
      xe_gt_sriov_pf_config_bulk_set_lmem+0xe2/0x410 [xe]
      xe_gt_sriov_pf_config_set_fair_lmem+0x1c6/0x620 [xe]
      xe_gt_sriov_pf_config_set_fair+0xd5/0x3f0 [xe]
      xe_pci_sriov_configure+0x360/0x1200 [xe]
      sriov_numvfs_store+0xbc/0x1d0
      dev_attr_store+0x17/0x40
      sysfs_kf_write+0x4a/0x80
      kernfs_fop_write_iter+0x166/0x220
      vfs_write+0x2ba/0x580
      ksys_write+0x77/0x100
      __x64_sys_write+0x19/0x30
      x64_sys_call+0x2bf/0x2660
      do_syscall_64+0x93/0x7a0
      entry_SYSCALL_64_after_hwframe+0x76/0x7e
 [ ] xe 0000:4d:00.0: [drm] *ERROR* GT0: CT dequeue failed: -71
 [ ] xe 0000:4d:00.0: [drm] GT0: trying reset from receive_g2h [xe]

This could be mitigated by pushing earlier a PF self-configuration
with some hard-coded values that cover unlimited access to the GGTT,
use of all GuC contexts and doorbells.  This step is sufficient for
the GuC to switch into the VGT mode.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Link: https://lore.kernel.org/r/20250711193316.1920-5-michal.wajdeczko@intel.com
2025-07-15 13:05:19 +02:00
Michal Wajdeczko
92ba2032a1 drm/xe/pf: Move GGTT config KLVs encoding to helper
In upcoming patch we will want to encode GGTT config KLVs based
on raw numbers, without relying on the allocated GGTT node.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Link: https://lore.kernel.org/r/20250711193316.1920-4-michal.wajdeczko@intel.com
2025-07-15 13:05:18 +02:00
Michal Wajdeczko
1c38dd6afa drm/xe/pf: Resend PF provisioning after GT reset
If we reload the GuC due to suspend/resume or GT reset then we
have to resend not only any VFs provisioning data, but also PF
configuration, like scheduling parameters (EQ, PT), as otherwise
GuC will continue to use default values.

Fixes: 411220808c ("drm/xe/pf: Restart VFs provisioning after GT reset")
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Link: https://lore.kernel.org/r/20250711193316.1920-3-michal.wajdeczko@intel.com
2025-07-15 13:05:17 +02:00
Michal Wajdeczko
9f50b729dd drm/xe/pf: Prepare to stop SR-IOV support prior GT reset
As part of the resume or GT reset, the PF driver schedules work
which is then used to complete restarting of the SR-IOV support,
including resending to the GuC configurations of provisioned VFs.

However, in case of short delay between those two actions, which
could be seen by triggering a GT reset on the suspened device:

 $ echo 1 > /sys/kernel/debug/dri/0000:00:02.0/gt0/force_reset

this PF worker might be still busy, which lead to errors due to
just stopped or disabled GuC CTB communication:

 [ ] xe 0000:00:02.0: [drm:xe_gt_resume [xe]] GT0: resumed
 [ ] xe 0000:00:02.0: [drm] GT0: trying reset from force_reset_show [xe]
 [ ] xe 0000:00:02.0: [drm] GT0: reset queued
 [ ] xe 0000:00:02.0: [drm] GT0: reset started
 [ ] xe 0000:00:02.0: [drm:guc_ct_change_state [xe]] GT0: GuC CT communication channel stopped
 [ ] xe 0000:00:02.0: [drm:guc_ct_send_recv [xe]] GT0: H2G request 0x5503 canceled!
 [ ] xe 0000:00:02.0: [drm] GT0: PF: Failed to push VF1 12 config KLVs (-ECANCELED)
 [ ] xe 0000:00:02.0: [drm] GT0: PF: Failed to push VF1 configuration (-ECANCELED)
 [ ] xe 0000:00:02.0: [drm:guc_ct_change_state [xe]] GT0: GuC CT communication channel disabled
 [ ] xe 0000:00:02.0: [drm] GT0: PF: Failed to push VF2 12 config KLVs (-ENODEV)
 [ ] xe 0000:00:02.0: [drm] GT0: PF: Failed to push VF2 configuration (-ENODEV)
 [ ] xe 0000:00:02.0: [drm] GT0: PF: Failed to push 2 of 2 VFs configurations
 [ ] xe 0000:00:02.0: [drm:pf_worker_restart_func [xe]] GT0: PF: restart completed

While this VFs reprovisioning will be successful during next spin
of the worker, to avoid those errors, make sure to cancel restart
worker if we are about to trigger next reset.

Fixes: 411220808c ("drm/xe/pf: Restart VFs provisioning after GT reset")
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Link: https://lore.kernel.org/r/20250711193316.1920-2-michal.wajdeczko@intel.com
2025-07-15 13:05:15 +02:00
Lucas De Marchi
f4d51b6ce5 drm/xe/lrc: Add table with LRC layout
Add a table to document the LRC's BO layout to make it easier to
visualize how each region stacks on top of each other.

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Link: https://lore.kernel.org/r/20250710-lrc-refactors-v2-4-a5e2ca03f6bd@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-14 13:40:17 -07:00
Tvrtko Ursulin
aded26ccaa drm/xe: Waste fewer instructions in emit_wa_job()
I was debugging some unrelated issue and noticed the current code was
very verbose. We can improve it easily by using the more common batch
buffer building pattern.

Before:
                bb->cs[bb->len++] = MI_LOAD_REGISTER_REG | MI_LRR_DST_CS_MMIO;
     c4d:       41 8b 56 10             mov    0x10(%r14),%edx
     c51:       49 8b 4e 08             mov    0x8(%r14),%rcx
     c55:       8d 72 01                lea    0x1(%rdx),%esi
     c58:       41 89 76 10             mov    %esi,0x10(%r14)
     c5c:       c7 04 91 01 00 08 15    movl   $0x15080001,(%rcx,%rdx,4)
                        bb->cs[bb->len++] = entry->reg.addr;
     c63:       8b 08                   mov    (%rax),%ecx
     c65:       41 8b 56 10             mov    0x10(%r14),%edx
     c69:       49 8b 76 08             mov    0x8(%r14),%rsi
     c6d:       81 e1 ff ff 3f 00       and    $0x3fffff,%ecx
     c73:       8d 7a 01                lea    0x1(%rdx),%edi
     c76:       41 89 7e 10             mov    %edi,0x10(%r14)
     c7a:       89 0c 96                mov    %ecx,(%rsi,%rdx,4)
 ..etc..

After:
                *cs++ = MI_LOAD_REGISTER_REG | MI_LRR_DST_CS_MMIO;
     c52:       41 c7 04 24 01 00 08    movl   $0x15080001,(%r12)
     c59:       15
                        *cs++ = entry->reg.addr;
     c5a:       8b 10                   mov    (%rax),%edx
 ..etc..

Resulting in the following binary change:

	add/remove: 0/0 grow/shrink: 0/2 up/down: 0/-348 (-348)
	Function                                     old     new   delta
	xe_gt_record_default_lrcs.cold               304     296      -8
	xe_gt_record_default_lrcs                   2200    1860    -340
	Total: Before=13554, After=13206, chg -2.57%

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250710-lrc-refactors-v2-7-a5e2ca03f6bd@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-14 13:40:17 -07:00
Lucas De Marchi
f4b538245f drm/xe/gt: Drop third submission for default context
There's no need to submit the nop job again on the first queue. Any
state needed is already saved when the first LRC is switched out. The
comment is a little misleading regarding indirect W/A: first of all
there's still no indirect W/A enabled and secondly, even after they are,
there's no need to submit this job again for having their state
propagated: the indirect W/A will actually run on every LRC switch.

Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250710-lrc-refactors-v2-6-a5e2ca03f6bd@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-14 13:40:17 -07:00
Lucas De Marchi
6d891d22c6 drm/xe/lrc: Remove leftover TODO/FIXME
There isn't anything to set for CTX_TIMESTAMP handling in the empty
LRC: that is set on every LRC init since it should always start from 0
rather than the value saved in the image after first submission.

The FIXME about perma-pinning also doesn't make much sense as we will
always going to pin the lrc and the GGTT mapping has nothing to do with
VM bind.

Nuke these leftover comments.

Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250710-lrc-refactors-v2-5-a5e2ca03f6bd@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-14 13:40:17 -07:00
Lucas De Marchi
fab2cc0c09 drm/xe/gt: Extract emit_job_sync()
Both the nop and wa jobs are going through the same boiler plate calls
to emit the job with a timeout and handling error for both bb and job.
Extract emit_job_sync() so those functions create the bb, handling
possible errors and delegate the part about really emitting the job
and waiting for its completion.

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Link: https://lore.kernel.org/r/20250710-lrc-refactors-v2-3-a5e2ca03f6bd@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-14 13:20:03 -07:00
Lucas De Marchi
e4cb5823ba drm/xe: Count dwords before allocating
The bb allocation in emit_wa_job() is wrong in 2 ways: first it's
allocating enough space for the 3DSTATE or hardcoding 4k depending on
the engine. In the first case it doesn't account for the WAs and in the
former it may not be sufficient. Secondly it's using the size instead of
number of dwords, causing the buffer to be 4x bigger than needed:
xe_bb_new() receives number of dwords as parameter and its declaration
was also not following its implementation.

Lastly, reword the debug message since it's not only about the LRC WAs
anymore as it also include the 3DSTATE for render.

While it's unlikely this is causing any real issue, let's calculate the
needed space and allocate just enough.

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Link: https://lore.kernel.org/r/20250710-lrc-refactors-v2-2-a5e2ca03f6bd@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-14 13:20:02 -07:00
Lucas De Marchi
76650bcf2a drm/xe/lrc: Reduce scope of empty lrc data
The only case in which new lrc data is created from scratch is when it's
called prior to recording the default lrc. There's no need to check for
NULL init_data since in that case the function already failed: just move
the allocation where it's needed.

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250710-lrc-refactors-v2-1-a5e2ca03f6bd@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-14 13:20:02 -07:00
Michal Wajdeczko
b533b8e5a1 drm/xe/vf: Store negotiated VF/PF ABI version at device level
There is no need to maintain PF ABI version on per-GT level.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Link: https://lore.kernel.org/r/20250713103625.1964-8-michal.wajdeczko@intel.com
2025-07-14 18:19:39 +02:00
Michal Wajdeczko
a6c384b24f drm/xe/pf: Stop requiring VF/PF version negotiation on every GT
While some VF/PF relay actions must be handled on the GT level,
like query for runtime registers, it was clarified by the arch
team that initial version negotiation can be done by the VF just
once, by using any available GuC/GT.

Move handling of the VF/PF ABI version negotiation on the PF side
from the GT level functions to the device level functions.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/20250713103625.1964-7-michal.wajdeczko@intel.com
2025-07-14 18:19:31 +02:00
Michal Wajdeczko
d962178a88 drm/xe/pf: Expose basic info about VFs in debugfs
We already have function to print summary about VFs, but we missed
to add debugfs attribute to make it visible. Do it now.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Link: https://lore.kernel.org/r/20250713103625.1964-6-michal.wajdeczko@intel.com
2025-07-14 18:19:29 +02:00
Michal Wajdeczko
ffab82b062 drm/xe: Introduce xe_gt_is_main_type helper
Instead of checking for not being a media type GT provide a small
helper to explicitly express our intentions.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Link: https://lore.kernel.org/r/20250713103625.1964-5-michal.wajdeczko@intel.com
2025-07-14 18:19:29 +02:00
Michal Wajdeczko
76293a83a9 drm/xe: Introduce xe_tile_is_root helper
Instead of looking at the tile->id member provide a small helper
to explicitly express our intentions.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Link: https://lore.kernel.org/r/20250713103625.1964-4-michal.wajdeczko@intel.com
2025-07-14 18:19:28 +02:00
Michal Wajdeczko
73c0e8054f drm/xe: Move PF and VF device types to separate headers
We plan to add more PF and VF types and mixing them in a single
file is not desired.  Move them out to new dedicated files.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/20250713103625.1964-3-michal.wajdeczko@intel.com
2025-07-14 18:18:49 +02:00
Michal Wajdeczko
7dcae5288a drm/xe: Combine PF and VF device data into union
There is no need to keep PF and VF data fields fully separate
since we can be only in one mode at the time. Move them into
a anonymous union to save few bytes.

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Link: https://lore.kernel.org/r/20250713103625.1964-2-michal.wajdeczko@intel.com
2025-07-14 17:53:39 +02:00
Xin Wang
8d4aec43f6 drm/xe: Update register definitions in LRC layout header
Update the register definitions in xe_lrc_layout.h to align with the
official hardware specification (Bspec) terminology. Specifically:

- rename PVC_CTX_ACC_CTR_THOLD to CTX_ACC_CTR_THOLD
- rename PVC_CTX_ASID to CTX_ASID

Signed-off-by: Xin Wang <x.wang@intel.com>
Reviewed-by: Stuart Summers <stuart.summers@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250711060924.7373-1-x.wang@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-14 08:34:44 -07:00
Tvrtko Ursulin
fba1230763 drm/xe: Add plumbing for indirect context workarounds
Some upcoming workarounds need to be emitted from the indirect workaround
context so lets add some plumbing where they will be able to easily slot
in.

No functional changes for now since everything is still deactivated.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Bspec: 45954
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250711160153.49833-7-tvrtko.ursulin@igalia.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-14 08:22:10 -07:00
Tvrtko Ursulin
a3397b24ae drm/xe: Allow specifying number of extra dwords at the end of wa bb emission
Indirect context setup will need more than one.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250711160153.49833-6-tvrtko.ursulin@igalia.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-14 08:06:55 -07:00
Tvrtko Ursulin
5ce511ad2b drm/xe: Track number of written dwords from workaround batch buffer emission
Indirect context setup will need to get to the number of written dwords.
Lets add it as an output parameter so it can be accessed from the finish
helper regardless of whether code is writing directly or via an shadow
buffer.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250711160153.49833-5-tvrtko.ursulin@igalia.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-14 08:06:14 -07:00
Tvrtko Ursulin
1ec31d355c drm/xe: Rename utilization workaround emission function
Lucas suggested to consolidate to a slightly different naming scheme which
will align with the upcoming additions better.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Suggested-by: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250711160153.49833-4-tvrtko.ursulin@igalia.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-14 08:06:14 -07:00
Tvrtko Ursulin
81b79670a3 drm/xe: Pass wa bb setup arguments in a struct
Group the function arguments in a struct for more readable code and easier
extending.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250711160153.49833-3-tvrtko.ursulin@igalia.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-14 08:06:09 -07:00
Tvrtko Ursulin
fa7c2a2460 drm/xe: Generalize wa bb emission code
Generalize the wa bb emission by splitting it into three phases - setup,
emit and finish, and extract setup and finish steps into helpers.

This will enable using the same infrastructure for emitting the indirect
context workarounds.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20250711160153.49833-2-tvrtko.ursulin@igalia.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-14 08:04:35 -07:00
Lucas De Marchi
e08c0fa02e drm/xe: Fix missing kernel-doc
Fix warning:

	Warning: drivers/gpu/drm/xe/xe_device_types.h:658 struct member 'wa_active' not described in 'xe_device'

Fixes: 661a6950e0 ("drm/xe: Add infrastructure for Device OOB workarounds")
Cc: Matt Atwood <matthew.s.atwood@intel.com>
Reviewed-by: Jonathan Cavitt <joanthan.cavitt@intel.com>
Link: https://lore.kernel.org/r/20250711214911.2009714-2-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-14 08:01:19 -07:00
Dr. David Alan Gilbert
8f3d1c9fb0 drm/xe: Remove unused functions
xe_bo_create_from_data() last use was removed in 2023 by
commit 0e1a47fcab ("drm/xe: Add a helper for DRM device-lifetime BO
create")

xe_rtp_match_first_gslice_fused_off() last use was removed in 2023 by
commit 4e124151fc ("drm/xe/dg2: Drop pre-production workarounds")

Remove them, and xe_dss_mask_empty whose last use was by
xe_rtp_match_first_gslice_fused_off().

(Xe has a bunch ofother symbols that have been added but not used,
given how new it is, I've left those, as opposed to these that
had the code that used them removed).

Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Link: https://lore.kernel.org/r/20250713152531.219326-1-linux@treblig.org
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-14 07:55:18 -07:00
Lucas De Marchi
7b6db1731a drm/xe: Normalize default param values
Document xe module params with the default values following a similar
strategy for all of them:

	1) Define a DEFAULT_* macro with the default value. When the
	   value can't be directly stringified, also define a *_STR
	   variant
	2) Use __stringify() or the _STR variant to make sure the
	   default value shows up in the param description

This allows us to show the correct default according to the
configuration. max_vfs for example was wrongly documented for
CONFIG_DRM_XE_DEBUG and svm_notifier_size didn't have its default
documented.

Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://lore.kernel.org/r/20250626-guc-log-level-v3-1-c3ed8b452e91@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-11 12:56:38 -07:00
Lucas De Marchi
81e139db69 drm/xe/migrate: Fix alignment check
The check would fail if the address is unaligned, but not when
accounting the offset. Instead of `buf | offset` it should have
been `buf + offset`. To make it more readable and also drop the
uintptr_t, just use the IS_ALIGNED() macro.

Fixes: 270172f64b ("drm/xe: Update xe_ttm_access_memory to use GPU for non-visible access")
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://lore.kernel.org/r/20250710-migrate-aligned-v1-1-44003ef3c078@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-11 12:40:58 -07:00
Matthew Brost
4a1eaf7d11 drm/xe: Remove references to CONFIG_DRM_XE_DEVMEM_MIRROR
The prefetch code was referencing CONFIG_DRM_XE_DEVMEM_MIRROR, which has
been replaced by CONFIG_DRM_XE_PAGEMAP. As a result, prefetches were
limited to SRAM. Update the code to use CONFIG_DRM_XE_PAGEMAP instead of
the deprecated option.

Fixes: f86ad0ed62 ("drm/gpusvm, drm/pagemap: Move migration functionality to drm_pagemap")
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/20250710205413.1105595-1-matthew.brost@intel.com
2025-07-11 10:43:12 -07:00
Matthew Brost
beb72acb5b drm/xe: Move page fault init after topology init
We need the topology to determine GT page fault queue size, move page
fault init after topology init.

Cc: stable@vger.kernel.org
Fixes: 3338e4f90c ("drm/xe: Use topology to determine page fault queue size")
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Reviewed-by: Stuart Summers <stuart.summers@intel.com>
Link: https://lore.kernel.org/r/20250710191208.1040215-1-matthew.brost@intel.com
2025-07-11 10:43:11 -07:00