linux

mirror of https://github.com/torvalds/linux.git synced 2026-05-02 21:42:42 -04:00

Author	SHA1	Message	Date
Chaitanya Kumar Borah	d738e1be2b	drm/xe/wcl: Extend L3bank mask workaround The commit `9ab440a9d0` ("drm/xe/ptl: L3bank mask is not available on the media GT") added a workaround to ignore the fuse register that L3 bank availability as it did not contain valid values. Same is true for WCL therefore extend the workaround to cover it. Signed-off-by: Chaitanya Kumar Borah <chaitanya.kumar.borah@intel.com> Reviewed-by: Dnyaneshwar Bhadane <dnyaneshwar.bhadane@intel.com> Link: https://lore.kernel.org/r/20250822002512.1129144-1-chaitanya.kumar.borah@intel.com Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com>	2025-08-26 16:30:48 -03:00
Riana Tauro	d1f51a4f95	drm/xe/xe_hw_error: Add fault injection to trigger csc error handler Add a debugfs fault handler to trigger csc error handler that wedges the device and enables runtime survivability mode. v2: add debugfs only for bmg (Umesh) v3: do not use csc_fault attribute if debugfs is not enabled v4: rebase Cc: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Riana Tauro <riana.tauro@intel.com> Reviewed-by: Raag Jadav <raag.jadav@intel.com> Link: https://lore.kernel.org/r/20250826063419.3022216-11-riana.tauro@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2025-08-26 10:11:34 -04:00
Riana Tauro	a7df563b45	drm/xe/xe_hw_error: Handle CSC Firmware reported Hardware errors Add support to handle CSC firmware reported errors. When CSC firmware errors are encoutered, a error interrupt is received by the GFX device as a MSI interrupt. Device Source control registers indicates the source of the error as CSC The HEC error status register indicates that the error is firmware reported Depending on the type of error, the error cause is written to the HEC Firmware error register. On encountering such CSC firmware errors, the graphics device is non-recoverable from driver context. The only way to recover from these errors is firmware flash. System admin/userspace is notified of the necessity of firmware flash with a combination of vendor-specific drm device edged uevent, dmesg logs and runtime survivability sysfs. It is the responsiblity of the consumer to verify all the actions and then trigger a firmware flash using tools like fwupd. $ udevadm monitor --property --kernel monitor will print the received events for: KERNEL - the kernel uevent KERNEL[754.709341] change /devices/pci0000:00/0000:00:01.0/0000:01:00.0/0000:02:01.0/0000:03:00.0/drm/card0 (drm) ACTION=change DEVPATH=/devices/pci0000:00/0000:00:01.0/0000:01:00.0/0000:02:01.0/0000:03:00.0/drm/card0 SUBSYSTEM=drm WEDGED=vendor-specific DEVNAME=/dev/dri/card0 DEVTYPE=drm_minor SEQNUM=5973 MAJOR=226 MINOR=0 Logs xe 0000:03:00.0: [drm] ERROR [Hardware Error]: Tile0 reported NONFATAL error 0x20000 xe 0000:03:00.0: [drm] ERROR [Hardware Error]: NONFATAL: HEC Uncorrected FW FD Corruption error reported, bit[2] is set xe 0000:03:00.0: Runtime Survivability mode enabled xe 0000:03:00.0: [drm] ERROR CRITICAL: Xe has declared device 0000:03:00.0 as wedged. IOCTLs and executions are blocked. Only a rebind may clear the failure Please file a _new_ bug report at https://gitlab.freedesktop.org/drm/xe/kernel/issues/new xe 0000:03:00.0: [drm] device wedged, needs recovery xe 0000:03:00.0: Firmware flash required, Please refer to the userspace documentation for more details! Runtime survivability Sysfs: /sys/bus/pci/devices/<device>/survivability_mode v2: use vendor recovery method with runtime survivability (Christian, Rodrigo, Raag) v3: move declare wedged to runtime survivability mode (Rodrigo) v4: update commit message Signed-off-by: Riana Tauro <riana.tauro@intel.com> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Link: https://lore.kernel.org/r/20250826063419.3022216-10-riana.tauro@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2025-08-26 10:11:34 -04:00
Riana Tauro	0a2a873d61	drm/xe: Add support to handle hardware errors Gfx device reports two classes of errors: uncorrectable and correctable. Depending on the severity uncorrectable errors are further classified Non-Fatal and Fatal. Correctable and Non-Fatal errors: These errors are reported as MSI. Bits in the Master Interrupt Register indicate the class of the error. The source of the error is then read from the Device Error Source Register. Fatal errors: These are reported as PCIe errors When a PCIe error is asserted, the OS will perform a SBR (Secondary Bus reset) which causes the driver to reload. The error registers are sticky and the values are maintained through SBR. Add basic support to handle these errors. Bspec: 50875, 53073, 53074, 53075, 53076 v2: Format commit message (Umesh) v3: fix documentation (Stuart) Cc: Stuart Summers <stuart.summers@intel.com> Co-developed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Signed-off-by: Riana Tauro <riana.tauro@intel.com> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Link: https://lore.kernel.org/r/20250826063419.3022216-9-riana.tauro@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2025-08-26 10:11:34 -04:00
Riana Tauro	f646c9f937	drm/xe/doc: Document device wedged and runtime survivability Add documentation for vendor specific device wedged recovery method and runtime survivability. v2: fix documentation (Raag) v3: add userspace tool for firmware update (Raag) v4: use consistent documentation (Raag) v5: add more documentation Signed-off-by: Riana Tauro <riana.tauro@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Raag Jadav <raag.jadav@intel.com> Link: https://lore.kernel.org/r/20250826063419.3022216-8-riana.tauro@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2025-08-26 10:11:34 -04:00
Riana Tauro	a2ca0633a0	drm/xe/xe_survivability: Add support for Runtime survivability mode Certain runtime firmware errors can cause the device to be in a unusable state requiring a firmware flash to restore normal operation. Runtime Survivability Mode indicates firmware flash is necessary by wedging the device and exposing survivability mode sysfs. The below sysfs is an indication that device is in survivability mode /sys/bus/pci/devices/<device>/survivability_mode v2: Fix kernel-doc (Umesh) v3: Add user friendly dmesg (Frank) Signed-off-by: Riana Tauro <riana.tauro@intel.com> Reviewed-by: Raag Jadav <raag.jadav@intel.com> Link: https://lore.kernel.org/r/20250826063419.3022216-7-riana.tauro@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2025-08-26 10:11:34 -04:00
Riana Tauro	41ff795aff	drm/xe/xe_survivability: Refactor survivability mode Refactor survivability mode code to support both boot and runtime survivability. Signed-off-by: Riana Tauro <riana.tauro@intel.com> Reviewed-by: Raag Jadav <raag.jadav@intel.com> Link: https://lore.kernel.org/r/20250826063419.3022216-6-riana.tauro@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2025-08-26 10:11:34 -04:00
Riana Tauro	60439ac3f2	drm/xe: Add a helper function to set recovery method Add a helper function to set recovery method. The recovery method can be set before declaring the device wedged and sending the drm wedged uevent. If no method is set, default unbind/re-bind method will be set. v2: fix documentation (Raag) Signed-off-by: Riana Tauro <riana.tauro@intel.com> Reviewed-by: Raag Jadav <raag.jadav@intel.com> Link: https://lore.kernel.org/r/20250826063419.3022216-5-riana.tauro@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2025-08-26 10:11:34 -04:00
Riana Tauro	90fdcf5f89	drm/xe: Set GT as wedged before sending wedged uevent Userspace should be notified after setting the device as wedged. Re-order function calls to set gt wedged before sending uevent. Cc: Matthew Brost <matthew.brost@intel.com> Suggested-by: Raag Jadav <raag.jadav@intel.com> Signed-off-by: Riana Tauro <riana.tauro@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250826063419.3022216-4-riana.tauro@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2025-08-26 10:11:34 -04:00
Riana Tauro	9c857a9d84	drm: Add a vendor-specific recovery method to drm device wedged uevent Address the need for a recovery method (firmware flash on Firmware errors) introduced in the later patches of Xe KMD. Whenever XE KMD detects a firmware error, a firmware flash is required to recover the device to normal operation. The initial proposal to use 'firmware-flash' as a recovery method was not applicable to other drivers and could cause multiple recovery methods specific to vendors to be added. To address this a more generic 'vendor-specific' method is introduced, guiding users to refer to vendor specific documentation and system logs for detailed vendor specific recovery procedure. Add a recovery method 'WEDGED=vendor-specific' for such errors. Vendors must provide additional recovery documentation if this method is used. It is the responsibility of the consumer to refer to the correct vendor specific documentation and usecase before attempting a recovery. For example: If driver is XE KMD, the consumer must refer to the documentation of 'Device Wedging' under 'Documentation/gpu/xe/'. v2: fix documentation (Raag) v3: add more details to commit message (Sima, Rodrigo, Raag) add an example script to the documentation (Raag) v4: use consistent naming (Raag) v5: fix commit message v6: add more documentation Cc: André Almeida <andrealmeid@igalia.com> Cc: Christian König <christian.koenig@amd.com> Cc: David Airlie <airlied@gmail.com> Cc: Simona Vetter <simona.vetter@ffwll.ch> Cc: Maxime Ripard <mripard@kernel.org> Signed-off-by: Riana Tauro <riana.tauro@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Acked-by: Maxime Ripard <mripard@kernel.org> Link: https://lore.kernel.org/r/20250826063419.3022216-3-riana.tauro@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2025-08-26 10:11:34 -04:00
Riana Tauro	38fc73b8c7	drm/xe: Add documentation for Xe Device Wedging Add documentation for Xe Device Wedging so that file can be referenced in following patches. Signed-off-by: Riana Tauro <riana.tauro@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://lore.kernel.org/r/20250826063419.3022216-2-riana.tauro@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2025-08-26 10:11:34 -04:00
Himal Prasad Ghimiray	418807860e	drm/xe/uapi: Add UAPI for querying VMA count and memory attributes Introduce the DRM_IOCTL_XE_VM_QUERY_MEMORY_RANGE_ATTRS ioctl to allow userspace to query memory attributes of VMAs within a user specified virtual address range. Userspace first calls the ioctl with num_mem_ranges = 0, sizeof_mem_ranges_attr = 0 and vector_of_vma_mem_attr = NULL to retrieve the number of memory ranges (vmas) and size of each memory range attribute. Then, it allocates a buffer of that size and calls the ioctl again to fill the buffer with memory range attributes. This two-step interface allows userspace to first query the required buffer size, then retrieve detailed attributes efficiently. v2 (Matthew Brost) - Use same ioctl to overload functionality v3 - Add kernel-doc v4 - Make uapi future proof by passing struct size (Matthew Brost) - make lock interruptible (Matthew Brost) - set reserved bits to zero (Matthew Brost) - s/__copy_to_user/copy_to_user (Matthew Brost) - Avod using VMA term in uapi (Thomas) - xe_vm_put(vm) is missing (Shuicheng) v5 - Nits - Fix kernel-doc Cc: Matthew Brost <matthew.brost@intel.com> Cc: Shuicheng Lin <shuicheng.lin@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250821173104.3030148-21-himal.prasad.ghimiray@intel.com Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>	2025-08-26 11:25:36 +05:30
Himal Prasad Ghimiray	e80b05b09f	drm/xe: Enable madvise ioctl for xe Ioctl enables setting up of memory attributes in user provided range. Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250821173104.3030148-20-himal.prasad.ghimiray@intel.com Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>	2025-08-26 11:25:36 +05:30
Himal Prasad Ghimiray	a2eb8aec3e	drm/xe: Reset VMA attributes to default in SVM garbage collector Restore default memory attributes for VMAs during garbage collection if they were modified by madvise. Reuse existing VMA if fully overlapping; otherwise, allocate a new mirror VMA. v2 (Matthew Brost) - Add helper for vma split - Add retry to get updated vma v3 - Rebase on gpuvm layer Suggested-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250821173104.3030148-19-himal.prasad.ghimiray@intel.com Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>	2025-08-26 11:25:36 +05:30
Himal Prasad Ghimiray	58dc430d89	drm/xe/vm: Add helper to check for default VMA memory attributes Introduce a new helper function `xe_vma_has_default_mem_attrs()` to determine whether a VMA's memory attributes are set to their default values. This includes checks for atomic access, PAT index, and preferred location. Also, add a new field `default_pat_index` to `struct xe_vma_mem_attr` to track the initial PAT index set during the first bind. This helps distinguish between default and user-modified pat index, such as those changed via madvise. Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250821173104.3030148-18-himal.prasad.ghimiray@intel.com Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>	2025-08-26 11:25:36 +05:30
Himal Prasad Ghimiray	002f817d61	drm/xe/madvise: Skip vma invalidation if mem attr are unchanged If a VMA within the madvise input range already has the same memory attribute as the one requested by the user, skip PTE zapping for that VMA to avoid unnecessary invalidation. v2 (Matthew Brost) - fix skip_invalidation for new attributes - s/u32/bool - Remove unnecessary assignment for kzalloc'ed Suggested-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250821173104.3030148-17-himal.prasad.ghimiray@intel.com Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>	2025-08-26 11:25:36 +05:30
Himal Prasad Ghimiray	293032eec4	drm/xe/bo: Update atomic_access attribute on madvise Update the bo_atomic_access based on user-provided input and determine the migration to smem during a CPU fault v2 (Matthew Brost) - Avoid cpu unmapping if bo is already in smem - check atomics on smem too for ioctl - Add comments v3 - Avoid migration in prefetch v4 (Matthew Brost) - make sanity check function bool - add assert for smem placement - fix doc v5 (Matthew Brost) - NACK atomic fault with DRM_XE_ATOMIC_CPU Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250821173104.3030148-16-himal.prasad.ghimiray@intel.com Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>	2025-08-26 11:25:36 +05:30
Himal Prasad Ghimiray	072e299982	drm/xe/bo: Add attributes field to xe_bo A single BO can be linked to multiple VMAs, making VMA attributes insufficient for determining the placement and PTE update attributes of the BO. To address this, an attributes field has been added to the BO. Reviewed-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://lore.kernel.org/r/20250821173104.3030148-15-himal.prasad.ghimiray@intel.com Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>	2025-08-26 11:25:36 +05:30
Himal Prasad Ghimiray	c1bb69a2e8	drm/xe/svm: Consult madvise preferred location in prefetch When prefetch region is DRM_XE_CONSULT_MEM_ADVISE_PREF_LOC, prefetch svm ranges to preferred location provided by madvise. v2 (Matthew Brost) - Fix region, devmem_fd usages - consult madvise is applicable for other vma's too. v3 - Fix atomic handling v4 - Fix xe_svm_range_validate to check for DRM_XE_CONSULT_MEM_ADVISE_PREF_LOC too. Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250821173104.3030148-14-himal.prasad.ghimiray@intel.com Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>	2025-08-26 11:25:36 +05:30
Himal Prasad Ghimiray	18d36fd6d1	drm/xe/svm: Support DRM_XE_SVM_MEM_RANGE_ATTR_PAT memory attribute This attributes sets the pat_index for the svm used vma range, which is utilized to ascertain the coherence. v2 (Matthew Brost) - Pat index sanity check Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250821173104.3030148-12-himal.prasad.ghimiray@intel.com Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>	2025-08-26 11:25:36 +05:30
Himal Prasad Ghimiray	a894c27407	drm/xe/madvise: Update migration policy based on preferred location When the user sets the valid devmem_fd as a preferred location, GPU fault will trigger migration to tile of device associated with devmem_fd. If the user sets an invalid devmem_fd the preferred location is current placement(smem) only. v2(Matthew Brost) - Default should be faulting tile - remove devmem_fd used as region v3 (Matthew Brost) - Add migration_policy - Fix return condition - fix migrate condition v4 -Rebase v5 - Add check for userptr and bo based vmas Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250821173104.3030148-11-himal.prasad.ghimiray@intel.com Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>	2025-08-26 11:25:36 +05:30
Himal Prasad Ghimiray	d6db171167	drm/xe/svm: Add svm ranges migration policy on atomic access If the platform does not support atomic access on system memory, and the ranges are in system memory, but the user requires atomic accesses on the VMA, then migrate the ranges to VRAM. Apply this policy for prefetch operations as well. v2 - Drop unnecessary vm_dbg v3 (Matthew Brost) - fix atomic policy - prefetch shouldn't have any impact of atomic - bo can be accessed from vma, avoid duplicate parameter v4 (Matthew Brost) - Remove TODO comment - Fix comment - Dont allow gpu atomic ops when user is setting atomic attr as CPU v5 (Matthew Brost) - Fix atomic checks - Add userptr checks Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250821173104.3030148-10-himal.prasad.ghimiray@intel.com Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>	2025-08-26 11:25:36 +05:30
Himal Prasad Ghimiray	ada7486c56	drm/xe: Implement madvise ioctl for xe This driver-specific ioctl enables UMDs to control the memory attributes for GPU VMAs within a specified input range. If the start or end addresses fall within an existing VMA, the VMA is split accordingly. The attributes of the VMA are modified as provided by the users. The old mappings of the VMAs are invalidated, and TLB invalidation is performed if necessary. v2(Matthew brost) - xe_vm_in_fault_mode can't be enabled by Mesa, hence allow ioctl in non fault mode too - fix tlb invalidation skip for same ranges in multiple op - use helper for tlb invalidation - use xe_svm_notifier_lock/unlock helper - s/lockdep_assert_held/lockdep_assert_held_write - Add kernel-doc v3(Matthew Brost) - make vfunc fail safe - Add sanitizing input args before vfunc v4(Matthew Brost/Shuicheng) - Make locks interruptable - Error handling fixes - vm_put fixes v5(Matthew Brost) - Flush garbage collector before any locking. - Add check for null vma Cc: Matthew Brost <matthew.brost@intel.com> Cc: Shuicheng Lin <shuicheng.lin@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250821173104.3030148-9-himal.prasad.ghimiray@intel.com Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>	2025-08-26 11:25:35 +05:30
Himal Prasad Ghimiray	6ca463ef0d	drm/xe/svm: Add xe_svm_ranges_zap_ptes_in_range() for PTE zapping Introduce xe_svm_ranges_zap_ptes_in_range(), a function to zap page table entries (PTEs) for all SVM ranges within a user-specified address range. -v2 (Matthew Brost) Lock should be called even for tlb_invalidation v3(Matthew Brost) - Update comment - s/notifier->itree.start/drm_gpusvm_notifier_start - s/notifier->itree.last + 1/drm_gpusvm_notifier_end - use WRITE_ONCE Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250821173104.3030148-8-himal.prasad.ghimiray@intel.com Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>	2025-08-26 11:25:35 +05:30
Himal Prasad Ghimiray	6ad887f378	drm/xe: Allow CPU address mirror VMA unbind with gpu bindings for madvise In the case of the MADVISE ioctl, if the start or end addresses fall within a VMA and existing SVM ranges are present, remove the existing SVM mappings. Then, continue with ops_parse to create new VMAs by REMAP unmapping of old one. v2 (Matthew Brost) - Use vops flag to call unmapping of ranges in vm_bind_ioctl_ops_parse - Rename the function v3 - Fix doc v4 - check if range is already in garbage collector (Matthew Brost) Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250821173104.3030148-7-himal.prasad.ghimiray@intel.com Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>	2025-08-26 11:25:35 +05:30
Himal Prasad Ghimiray	186b526abd	drm/xe/svm: Split system allocator vma incase of madvise call If the start or end of input address range lies within system allocator vma split the vma to create new vma's as per input range. v2 (Matthew Brost) - Add lockdep_assert_write for vm->lock - Remove unnecessary page aligned checks - Add kerrnel-doc and comments - Remove unnecessary unwind_ops and return v3 - Fix copying of attributes v4 - Nit fixes v5 - Squash identifier for madvise in xe_vma_ops to this patch v6/v7/v8 - Rebase on drm_gpuvm changes Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250821173104.3030148-6-himal.prasad.ghimiray@intel.com Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>	2025-08-26 11:25:35 +05:30
Himal Prasad Ghimiray	29c39c56a0	drm/xe/vma: Modify new_vma to accept struct xe_vma_mem_attr as parameter This change simplifies the logic by ensuring that remapped previous or next VMAs are created with the same memory attributes as the original VMA. By passing struct xe_vma_mem_attr as a parameter, we maintain consistency in memory attributes. -v2 dst = src (Matthew Brost) -v3 (Matthew Brost) Drop unnecessary helper pass attr ptr as input to new_vma and vma_create Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250821173104.3030148-5-himal.prasad.ghimiray@intel.com Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>	2025-08-26 11:25:35 +05:30
Himal Prasad Ghimiray	11974fe8c7	drm/xe/vma: Move pat_index to vma attributes The PAT index determines how PTEs are encoded and can be modified by madvise. Therefore, it is now part of the vma attributes. Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250821173104.3030148-4-himal.prasad.ghimiray@intel.com Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>	2025-08-26 11:25:35 +05:30
Himal Prasad Ghimiray	99a89e4e2d	drm/xe/vm: Add attributes struct as member of vma The attribute of xe_vma will determine the migration policy and the encoding of the page table entries (PTEs) for that vma. This attribute helps manage how memory pages are moved and how their addresses are translated. It will be used by madvise to set the behavior of the vma. v2 (Matthew Brost) - Add docs v3 (Matthew Brost) - Add uapi references - 80 characters line wrap Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://lore.kernel.org/r/20250821173104.3030148-3-himal.prasad.ghimiray@intel.com Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>	2025-08-26 11:25:35 +05:30
Lucas De Marchi	9d527c4f14	Merge drm/drm-next into drm-xe-next Sync with drm-misc-next which is necessary for changes in gpuvm and gpusvm that will be used in xe. Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-08-25 22:08:34 -07:00
Carlos Llamas	41be792f5b	drm/xe: switch to local xbasename() helper Commit `b0a2ee5567` ("drm/xe: prepare xe_gen_wa_oob to be multi-use") introduced a call to basename(). The GNU version of this function is not portable and fails to build with alternative libc implementations like musl or bionic. This causes the following build error: drivers/gpu/drm/xe/xe_gen_wa_oob.c:130:12: error: assignment to ‘const char *’ from ‘int’ makes pointer from integer without a cast [-Wint-conversion] 130 \| fn = basename(fn); \| ^ While a POSIX version of basename() could be used, it would require a separate header plus the behavior differs from GNU version in that it might modify its argument. Not great. Instead, implement a local xbasename() helper based on strrchr() that provides the same functionality and avoids portability issues. Fixes: `b0a2ee5567` ("drm/xe: prepare xe_gen_wa_oob to be multi-use") Suggested-by: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Tiffany Yang <ynaffit@google.com> Signed-off-by: Carlos Llamas <cmllamas@google.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250825155743.1132433-1-cmllamas@google.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-08-25 22:04:26 -07:00
Matthew Brost	ffdf968762	drm/xe: Don't trigger rebind on initial dma-buf validation On the first validate of an imported dma-buf (initial bind), the device has no GPU mappings, so a rebind is unnecessary. Rebinding here is harmful in multi-GPU setups and for VMs using preempt-fence mode, as it would evict in-flight GPU work. v2: - Drop dma_buf_validated, check for XE_PL_SYSTEM (Thomas) Fixes: `dd08ebf6c3` ("drm/xe: Introduce a new DRM driver for Intel GPUs") Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://lore.kernel.org/r/20250825152841.3837378-1-matthew.brost@intel.com	2025-08-25 21:59:54 -07:00
Thomas Hellström	358ee50ab5	drm/xe/vm: Clear the scratch_pt pointer on error Avoid triggering a dereference of an error pointer on cleanup in xe_vm_free_scratch() by clearing any scratch_pt error pointer. Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Fixes: `06951c2ee7` ("drm/xe: Use NULL PTEs as scratch PTEs") Cc: Brian Welty <brian.welty@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: <stable@vger.kernel.org> # v6.8+ Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250821143045.106005-4-thomas.hellstrom@linux.intel.com	2025-08-25 10:24:13 +02:00
Thomas Hellström	b5dd1505a3	drm/xe/tests/xe_dma_buf: Set the drm_object::dma_buf member This member is set when exporting using prime. However the xe_gem_prime_export() alone doesn't set it, since it's done later in the prime export flow. For the test, set it manually and remove the hack that set it temporarily when it was really needed. Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250821143045.106005-3-thomas.hellstrom@linux.intel.com	2025-08-25 10:23:55 +02:00
Thomas Hellström	0a51bf3e54	drm/xe/vm: Don't pin the vm_resv during validation The pinning has the odd side-effect that unlocking any resv during validation triggers an "unlocking pinned lock" warning. Cc: Matthew Brost <matthew.brost@intel.com> Fixes: `9d5558649f` ("drm/xe: Rework eviction rejection of bound external bos") Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250821143045.106005-2-thomas.hellstrom@linux.intel.com	2025-08-25 10:23:42 +02:00
Zbigniew Kempczyński	8ae04fe9ff	drm/xe/xe_sync: avoid race during ufence signaling Marking ufence as signalled after copy_to_user() is too late. Worker thread which signals ufence by memory write might be raced with another userspace vm-bind call. In map/unmap scenario unmap may still see ufence is not signalled causing -EBUSY. Change the order of marking / write to user-fence fixes this issue. Fixes: `977e5b82e0` ("drm/xe: Expose user fence from xe_sync_entry") Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/5536 Signed-off-by: Zbigniew Kempczyński <zbigniew.kempczynski@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250820083903.2109891-2-zbigniew.kempczynski@intel.com	2025-08-24 22:15:25 -07:00
Lucas De Marchi	13dda74a16	drm/xe/configfs: Dump custom settings when binding Device configuration using configfs could be prepared long time prior the driver load. Currently all the xe configfs entries are for things that are important to have in the log if a non-default value is being used. Add a info-level message about that with the individual entries that are different than the default. Based on previous patch by Michal Wajdeczko. Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Link: https://lore.kernel.org/r/20250821-psmi-v5-12-34ab7550d3d8@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-08-22 11:46:46 -07:00
Lucas De Marchi	66b21c338e	drm/xe/configfs: Minor fixes to documentation Add a few missing punctuation and line breaks and make the syntax for code snippets common to all of them. Reviewed-by: Dnyaneshwar Bhadane <dnyaneshwar.bhadane@intel.com> Link: https://lore.kernel.org/r/20250821-psmi-v5-11-34ab7550d3d8@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-08-22 11:46:46 -07:00
Lucas De Marchi	e2b33fce5e	drm/xe/configfs: Improve documentation steps The steps are roughly: 1. Load the module without binding to the device 2. Configure the desired device 3. Bind the device Move the binding part to the "Create devices" since it's not exclusive to the survivability_mode attribute and better document the steps. Reviewed-by: Riana Tauro <riana.tauro@intel.com> Link: https://lore.kernel.org/r/20250821-psmi-v5-10-34ab7550d3d8@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-08-22 11:46:46 -07:00
Lucas De Marchi	3eb2280f6a	drm/xe/configfs: Use tree-like output in documentation When documenting the directories, use an output similar to the `tree` command and add VFs and missing attributes. Reviewed-by: Dnyaneshwar Bhadane <dnyaneshwar.bhadane@intel.com> Link: https://lore.kernel.org/r/20250821-psmi-v5-9-34ab7550d3d8@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-08-22 11:46:45 -07:00
Lucas De Marchi	734197a933	drm/xe/configfs: Use guard() for dev->lock Instead of the manual lock()/unlock() pattern, use guard() which will make things easier for handling errors or early returns. Reviewed-by: Dnyaneshwar Bhadane <dnyaneshwar.bhadane@intel.com> Link: https://lore.kernel.org/r/20250821-psmi-v5-8-34ab7550d3d8@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-08-22 11:46:45 -07:00
Lucas De Marchi	afe902848b	drm/xe/configfs: Allow to enable PSMI Now that additional WAs are in place and it's possible to allocate buffers through debugfs, add the configfs attribute to turn PSMI on. Cc: Matt Roper <matthew.d.roper@intel.com> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Riana Tauro <riana.tauro@intel.com> Link: https://lore.kernel.org/r/20250821-psmi-v5-7-34ab7550d3d8@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-08-22 11:46:45 -07:00
Lucas De Marchi	49245b4961	drm/xe/configfs: Simplify kernel doc From the caller perspective reading the documentation, there's no need to be so specific about everything the function is doing/checking. Just document the functionality a caller cares about. Reviewed-by: Riana Tauro <riana.tauro@intel.com> Link: https://lore.kernel.org/r/20250821-psmi-v5-6-34ab7550d3d8@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-08-22 11:46:44 -07:00
Vinay Belgaumkar	95b3899b4d	drm/xe/psmi: Add Wa_16023683509 This WA ensures GuC will restore the media MCFG registers at C6 exit. Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Reviewed-by: Matt Atwood <matthew.s.atwood@intel.com> Link: https://lore.kernel.org/r/20250821-psmi-v5-5-34ab7550d3d8@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-08-22 11:46:44 -07:00
Badal Nilawar	29042df3ac	drm/xe/psmi: Add Wa_14020001231 Enable Wa 14020001231 to block psmi interrupts during C6 entry exit flow. It's only enabled if PSMI is enabled in runtime. Signed-off-by: Badal Nilawar <badal.nilawar@intel.com> Reviewed-by: Matt Atwood <matthew.s.atwood@intel.com> Link: https://lore.kernel.org/r/20250821-psmi-v5-4-34ab7550d3d8@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-08-22 11:46:44 -07:00
Lucas De Marchi	d67b1dfad0	drm/xe/rtp: Add match for psmi Add match to be used on WAs for only enabling workarounds if psmi is intended to be used. Reviewed-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Link: https://lore.kernel.org/r/20250821-psmi-v5-3-34ab7550d3d8@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-08-22 11:46:43 -07:00
Lucas De Marchi	aaa0c1f50a	drm/xe/psmi: Add debugfs interface for PSMI Requirement for PSMI capture is to have a physically contiguous buffer. All the needed configuration is done by the userspace tool directly to the GPU via mmio access. This interface only support allocating from VRAM regions. For integrated devices, the PSMI buffer is in SYSTEM memory and should be allocated by userspace using hugetlbfs. Here we add the ability to allocate a region of physically contiguous memory by writing to debugfs file (listed below). For multi-tile devices, the capture tool requires ability to allocate a capture buffer per tile (VRAM region) and so user can specify a region_mask. The tool then can mmap the buffers via direct mmap of the PCIBAR via sysfs. To support the capture tool, 3 new debugfs entries are added: psmi_capture_addr - physical address per VRAM region's capture buffer psmi_capture_region_mask - select which region(s) to allocate a buffer psmi_capture_size - size of current capture buffer Writing psmi_capture_size will allocate new buffer of requested size per region after freeing any current buffers. Cc: Matt Roper <matthew.d.roper@intel.com> Cc: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Original-author: Brian Welty <brian.welty@intel.com> Reviewed-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com> # v2 Link: https://lore.kernel.org/r/20250821-psmi-v5-2-34ab7550d3d8@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-08-22 11:46:43 -07:00
Lucas De Marchi	efeb036ffd	drm/xe/psmi: Add GuC flag to enable PSMI PSMI allows to capture data from the GPU useful for early validation. From the kernel side there isn't much to be done, just a few things: 1) Toggle the feature support in GuC 2) Enable some additional WAs 3) Allocate buffers Here is the first step, with the next ones to follow. For now everything is disabled through a check in configfs that is currently hardcoded to disabled. Cc: Matt Roper <matthew.d.roper@intel.com> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Link: https://lore.kernel.org/r/20250821-psmi-v5-1-34ab7550d3d8@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-08-22 11:46:43 -07:00
Stuart Summers	2515d2b9ab	drm/xe/pcode: Initialize data0 for pcode read routine There are two registers filled in when reading data from pcode besides the mailbox itself. Currently, we allow a NULL value for the second of these two (data1) and assume the first is defined. However, many of the routines that are calling this function assume that pcode will ignore the value being passed in and so leave that first value (data0) defined but uninitialized. To be safe, make sure this value is always initialized to something (0 generally) in the event pcode behavior changes and starts using this value. v2: Fix sob/author Signed-off-by: Stuart Summers <stuart.summers@intel.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Link: https://lore.kernel.org/r/20250819201054.393220-1-stuart.summers@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2025-08-22 12:26:06 -04:00
Michal Wajdeczko	bc2b206268	drm/xe/kunit: Extend platform generator with PTL Our list of typical platforms used to generate test device objects does not contain any PANTHERLAKE. Add one. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250818192032.633-1-michal.wajdeczko@intel.com	2025-08-21 17:53:14 +02:00

1 2 3 4 5 ...

116363 Commits