linux

mirror of https://github.com/torvalds/linux.git synced 2026-04-22 00:33:58 -04:00

Author	SHA1	Message	Date
Linus Torvalds	b08494a8f7	Merge tag 'drm-next-2025-05-28' of https://gitlab.freedesktop.org/drm/kernel Pull drm updates from Dave Airlie: "As part of building up nova-core/nova-drm pieces we've brought in some rust abstractions through this tree, aux bus being the main one, with devres changes also in the driver-core tree. Along with the drm core abstractions and enough nova-core/nova-drm to use them. This is still all stub work under construction, to build the nova driver upstream. The other big NVIDIA related one is nouveau adds support for Hopper/Blackwell GPUs, this required a new GSP firmware update to 570.144, and a bunch of rework in order to support multiple fw interfaces. There is also the introduction of an asahi uapi header file as a precursor to getting the real driver in later, but to unblock userspace mesa packages while the driver is trapped behind rust enablement. Otherwise it's the usual mixture of stuff all over, amdgpu, i915/xe, and msm being the main ones, and some changes to vsprintf. new drivers: - bring in the asahi uapi header standalone - nova-drm: stub driver rust dependencies (for nova-core): - auxiliary - bus abstractions - driver registration - sample driver - devres changes from driver-core - revocable changes core: - add Apple fourcc modifiers - add virtio capset definitions - extend EXPORT_SYNC_FILE for timeline syncobjs - convert to devm_platform_ioremap_resource - refactor shmem helper page pinning - DP powerup/down link helpers - extended %p4cc in vsprintf.c to support fourcc prints - change vsprintf %p4cn to %p4chR, remove %p4cn - Add drm_file_err function - IN_FORMATS_ASYNC property - move sitronix from tiny to their own subdir rust: - add drm core infrastructure rust abstractions (device/driver, ioctl, file, gem) dma-buf: - adjust sg handling to not cache map on attach - allow setting dma-device for import - Add a helper to sort and deduplicate dma_fence arrays docs: - updated drm scheduler docs - fbdev todo update - fb rendering - actual brightness ttm: - fix delayed destroy resv object bridge: - add kunit tests - convert tc358775 to atomic - convert drivers to devm_drm_bridge_alloc - convert rk3066_hdmi to bridge driver scheduler: - add kunit tests panel: - refcount panels to improve lifetime handling - Powertip PH128800T004-ZZA01 - NLT NL13676BC25-03F, Tianma TM070JDHG34-00 - Himax HX8279/HX8279-D DDIC - Visionox G2647FB105 - Sitronix ST7571 - ZOTAC rotation quirk vkms: - allow attaching more displays i915: - xe3lpd display updates - vrr refactor - intel_display struct conversions - xe2hpd memory type identification - add link rate/count to i915_display_info - cleanup VGA plane handling - refactor HDCP GSC - fix SLPC wait boosting reference counting - add 20ms delay to engine reset - fix fence release on early probe errors xe: - SRIOV updates - BMG PCI ID update - support separate firmware for each GT - SVM fix, prelim SVM multi-device work - export fan speed - temp disable d3cold on BMG - backup VRAM in PM notifier instead of suspend/freeze - update xe_ttm_access_memory to use GPU for non-visible access - fix guc_info debugfs for VFs - use copy_from_user instead of __copy_from_user - append PCIe gen5 limitations to xe_firmware document amdgpu: - DSC cleanup - DC Scaling updates - Fused I2C-over-AUX updates - DMUB updates - Use drm_file_err in amdgpu - Enforce isolation updates - Use new dma_fence helpers - USERQ fixes - Documentation updates - SR-IOV updates - RAS updates - PSP 12 cleanups - GC 9.5 updates - SMU 13.x updates - VCN / JPEG SR-IOV updates amdkfd: - Update error messages for SDMA - Userptr updates - XNACK fixes radeon: - CIK doorbell cleanup nouveau: - add support for NVIDIA r570 GSP firmware - enable Hopper/Blackwell support nova-core: - fix task list - register definition infrastructure - move firmware into own rust module - register auxiliary device for nova-drm nova-drm: - initial driver skeleton msm: - GPU: - ACD (adaptive clock distribution) for X1-85 - drop fictional address_space_size - improve GMU HFI response time out robustness - fix crash when throttling during boot - DPU: - use single CTL path for flushing on DPU 5.x+ - improve SSPP allocation code for better sharing - Enabled SmartDMA on SM8150, SC8180X, SC8280XP, SM8550 - Added SAR2130P support - Disabled DSC support on MSM8937, MSM8917, MSM8953, SDM660 - DP: - switch to new audio helpers - better LTTPR handling - DSI: - Added support for SA8775P - Added SAR2130P support - HDMI: - Switched to use new helpers for ACR data - Fixed old standing issue of HPD not working in some cases amdxdna: - add dma-buf support - allow empty command submits renesas: - add dma-buf support - add zpos, alpha, blend support panthor: - fail properly for NO_MMAP bos - add SET_LABEL ioctl - debugfs BO dumping support imagination: - update DT bindings - support TI AM68 GPU hibmc: - improve interrupt handling and HPD support virtio: - add panic handler support rockchip: - add RK3588 support - add DP AUX bus panel support ivpu: - add heartbeat based hangcheck mediatek: - prepares support for MT8195/99 HDMIv2/DDCv2 anx7625: - improve HPD tegra: - speed up firmware loading * tag 'drm-next-2025-05-28' of https://gitlab.freedesktop.org/drm/kernel: (1627 commits) drm/nouveau/tegra: Fix error pointer vs NULL return in nvkm_device_tegra_resource_addr() drm/xe: Default auto_link_downgrade status to false drm/xe/guc: Make creation of SLPC debugfs files conditional drm/i915/display: Add check for alloc_ordered_workqueue() and alloc_workqueue() drm/i915/dp_mst: Work around Thunderbolt sink disconnect after SINK_COUNT_ESI read drm/i915/ptl: Use everywhere the correct DDI port clock select mask drm/nouveau/kms: add support for GB20x drm/dp: add option to disable zero sized address only transactions. drm/nouveau: add support for GB20x drm/nouveau/gsp: add hal for fifo.chan.doorbell_handle drm/nouveau: add support for GB10x drm/nouveau/gf100-: track chan progress with non-WFI semaphore release drm/nouveau/nv50-: separate CHANNEL_GPFIFO handling out from CHANNEL_DMA drm/nouveau: add helper functions for allocating pinned/cpu-mapped bos drm/nouveau: add support for GH100 drm/nouveau: improve handling of 64-bit BARs drm/nouveau/gv100-: switch to volta semaphore methods drm/nouveau/gsp: support deeper page tables in COPY_SERVER_RESERVED_PDES drm/nouveau/gsp: init client VMMs with NV0080_CTRL_DMA_SET_PAGE_DIRECTORY drm/nouveau/gsp: fetch level shift and PDE from BAR2 VMM ...	2025-05-28 09:46:39 -07:00
Matthew Brost	794f5493f5	drm/xe: Strict migration policy for atomic SVM faults Mixing GPU and CPU atomics does not work unless a strict migration policy of GPU atomics must be device memory. Enforce a policy of must be in VRAM with a retry loop of 3 attempts, if retry loop fails abort fault. Removing always_migrate_to_vram modparam as we now have real migration policy. v2: - Only retry migration on atomics - Drop alway migrate modparam v3: - Only set vram_only on DGFX (Himal) - Bail on get_pages failure if vram_only and retry count exceeded (Himal) - s/vram_only/devmem_only - Update xe_svm_range_is_valid to accept devmem_only argument v4: - Fix logic bug get_pages failure v5: - Fix commit message (Himal) - Mention removing always_migrate_to_vram in commit message (Lucas) - Fix xe_svm_range_is_valid to check for devmem pages - Bail on devmem_only && !migrate_devmem (Thomas) v6: - Add READ_ONCE barriers for opportunistic checks (Thomas) - Pair READ_ONCE with WRITE_ONCE (Thomas) v7: - Adjust comments (Thomas) Fixes: `2f118c9491` ("drm/xe: Add SVM VRAM migration") Cc: stable@vger.kernel.org Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Acked-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://lore.kernel.org/r/20250512135500.1405019-3-matthew.brost@intel.com (cherry picked from commit `a9ac0fa455`) Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-05-14 09:03:29 -07:00
Oak Zeng	5b658b7e89	drm/xe: Clear scratch page on vm_bind When a vm runs under fault mode, if scratch page is enabled, we need to clear the scratch page mapping on vm_bind for the vm_bind address range. Under fault mode, we depend on recoverable page fault to establish mapping in page table. If scratch page is not cleared, GPU access of address won't cause page fault because it always hits the existing scratch page mapping. When vm_bind with IMMEDIATE flag, there is no need of clearing as immediate bind can overwrite the scratch page mapping. So far only is xe2 and xe3 products are allowed to enable scratch page under fault mode. On other platform we don't allow scratch page under fault mode, so no need of such clearing. v2: Rework vm_bind pipeline to clear scratch page mapping. This is similar to a map operation, with the exception that PTEs are cleared instead of pointing to valid physical pages. (Matt, Thomas) TLB invalidation is needed after clear scratch page mapping as larger scratch page mapping could be backed by physical page and cached in TLB. (Matt, Thomas) v3: Fix the case of clearing huge pte (Thomas) Improve commit message (Thomas) v4: TLB invalidation on all LR cases, not only the clear on bind cases (Thomas) v5: Misc cosmetic changes (Matt) Drop pt_update_ops.invalidate_on_bind. Directly wire xe_vma_op.map.invalidata_on_bind to bind_op_prepare/commit (Matt) v6: checkpatch fix (Matt) v7: No need to check platform needs_scratch deciding invalidate_on_bind (Matt) v8: rebase v9: rebase v10: fix an error in xe_pt_stage_bind_entry, introduced in v9 rebase Signed-off-by: Oak Zeng <oak.zeng@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Link: https://lore.kernel.org/r/20250403165328.2438690-3-oak.zeng@intel.com Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>	2025-04-07 11:17:15 +05:30
Matthew Auld	8e8e9c2663	drm/xe: unconditionally apply PINNED for pin_map() Some users apply PINNED and some don't when using pin_map(). The pin in pin_map() should imply PINNED so just unconditionally apply it and clean up all users. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Satyanarayana K V P <satyanarayana.k.v.p@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com> Link: https://lore.kernel.org/r/20250403102440.266113-14-matthew.auld@intel.com	2025-04-04 11:41:08 +01:00
Matthew Auld	7f387e6012	drm/xe: add XE_BO_FLAG_PINNED_LATE_RESTORE With the idea of having more pinned objects using the blitter engine where possible, during suspend/resume, mark the pinned objects which can be done during the late phase once submission/migration has been setup. Start out simple with lrc and page-tables from userspace. v2: - s/early_restore/late_restore; early restore was way too bold with too many places being impacted at once. v3: - Split late vs early into separate lists, to align with newly added apply-to-pinned infra. v4: - Rebase. v5: - Make sure we restore the late phase kernel_bo_present in igpu. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Satyanarayana K V P <satyanarayana.k.v.p@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com> Link: https://lore.kernel.org/r/20250403102440.266113-13-matthew.auld@intel.com	2025-04-04 11:41:05 +01:00
Thomas Hellström	32cb8dc550	drm/xe: Fix xe_pt_stage_bind_walk kerneldoc The structure was missing a proper kerneldoc header and once that was added a number of typos and errors became obvious. Fix those. Reported-by: Lucas De Marchi <lucas.demarchi@intel.com> Closes: https://lore.kernel.org/intel-xe/x53tcs5bjldw6lcorjemuheklxcmepdvr2u7lvt3hpqrzqoc4h@nsu6hs25taqj/ Fixes: `b2d4b03b03` ("drm/xe: Make the PT code handle placement per PTE rather than per vma / range") Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250402122924.25526-1-thomas.hellstrom@linux.intel.com	2025-04-03 12:23:28 +02:00
Thomas Hellström	b2d4b03b03	drm/xe: Make the PT code handle placement per PTE rather than per vma / range With SVM, ranges forwarded to the PT code for binding can, mostly due to races when migrating, point to both VRAM and system / foreign device memory. Make the PT code able to handle that by checking, for each PTE set up, whether it points to local VRAM or to system memory. v2: - Fix system memory GPU atomic access. v3: - Avoid the UAPI change. It needs more thought. Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Link: https://lore.kernel.org/r/20250326080551.40201-6-thomas.hellstrom@linux.intel.com	2025-03-27 12:08:33 +01:00
Thomas Hellström	6c55404d4f	drm/xe: Introduce CONFIG_DRM_XE_GPUSVM Don't rely on CONFIG_DRM_GPUSVM because other drivers may enable it causing us to compile in SVM support unintentionally. Also take the opportunity to leave more code out of compilation if !CONFIG_DRM_XE_GPUSVM and !CONFIG_DRM_XE_DEVMEM_MIRROR v3: - Fixes for compilation errors on 32-bit. This changes the Kconfig logic a bit. Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250326080551.40201-2-thomas.hellstrom@linux.intel.com	2025-03-27 11:46:06 +01:00
Matthew Brost	d92eabb370	drm/xe: Add SVM debug Add some useful SVM debug logging fro SVM range which prints the range's state. v2: - Update logging with latest structure layout v3: - Better commit message (Thomas) - New range structure (Thomas) - s/COLLECTOT/s/COLLECTOR (Thomas) v4: - Drop partial evict message (Thomas) - Use %p for pointers print (Thomas) v6: - Cast dma_addr to u64 (CI) - Only compile if CONFIG_DRM_GPUSVM selected (CI, Lucas) Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250306012657.3505757-30-matthew.brost@intel.com	2025-03-06 11:36:57 -08:00
Matthew Brost	d1e6efdfab	drm/xe: Add unbind to SVM garbage collector Add unbind to SVM garbage collector. To facilitate add unbind support function to VM layer which unbinds a SVM range. Also teach PT layer to understand unbinds of SVM ranges. v3: - s/INVALID_VMA/XE_INVALID_VMA (Thomas) - Kernel doc (Thomas) - New GPU SVM range structure (Thomas) - s/DRM_GPUVA_OP_USER/DRM_GPUVA_OP_DRIVER (Thomas) v4: - Use xe_vma_op_unmap_range (Himal) v5: - s/PY/PT (Thomas) Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250306012657.3505757-17-matthew.brost@intel.com	2025-03-06 11:35:47 -08:00
Matthew Brost	7d1d48fb17	drm/xe: Add (re)bind to SVM page fault handler Add (re)bind to SVM page fault handler. To facilitate add support function to VM layer which (re)binds a SVM range. Also teach PT layer to understand (re)binds of SVM ranges. v2: - Don't assert BO lock held for range binds - Use xe_svm_notifier_lock/unlock helper in xe_svm_close - Use drm_pagemap dma cursor - Take notifier lock in bind code to check range state v3: - Use new GPU SVM range structure (Thomas) - Kernel doc (Thomas) - s/DRM_GPUVA_OP_USER/DRM_GPUVA_OP_DRIVER (Thomas) v5: - Kernel doc (Thomas) v6: - Only compile if CONFIG_DRM_GPUSVM selected (CI, Lucas) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Tested-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250306012657.3505757-15-matthew.brost@intel.com	2025-03-06 11:35:43 -08:00
Matthew Brost	ab498828fa	drm/xe: Add SVM range invalidation and page fault Add SVM range invalidation vfunc which invalidates PTEs. A new PT layer function which accepts a SVM range is added to support this. In addition, add the basic page fault handler which allocates a SVM range which is used by SVM range invalidation vfunc. v2: - Don't run invalidation if VM is closed - Cycle notifier lock in xe_svm_close - Drop xe_gt_tlb_invalidation_fence_fini v3: - Better commit message (Thomas) - Add lockdep asserts (Thomas) - Add kernel doc (Thomas) - s/change/changed (Thomas) - Use new GPU SVM range / notifier structures - Ensure PTEs are zapped / dma mappings are unmapped on VM close (Thomas) v4: - Fix macro (Checkpatch) v5: - Use range start/end helpers (Thomas) - Use notifier start/end helpers (Thomas) v6: - Use min/max helpers (Himal) - Only compile if CONFIG_DRM_GPUSVM selected (CI, Lucas) Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250306012657.3505757-13-matthew.brost@intel.com	2025-03-06 11:35:40 -08:00
Matthew Brost	074e40d9c2	drm/xe: Nuke VM's mapping upon close Clear root PT entry and invalidate entire VM's address space when closing the VM. Will prevent the GPU from accessing any of the VM's memory after closing. v2: - s/vma/vm in kernel doc (CI) - Don't nuke migration VM as this occur at driver unload (CI) v3: - Rebase and pull into SVM series (Thomas) - Wait for pending binds (Thomas) v5: - Remove xe_gt_tlb_invalidation_fence_fini in error case (Matt Auld) - Drop local migration bool (Thomas) v7: - Add drm_dev_enter/exit protecting invalidation (CI, Matt Auld) Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250306012657.3505757-12-matthew.brost@intel.com	2025-03-06 11:35:38 -08:00
Matthew Brost	b43e864af0	drm/xe/uapi: Add DRM_XE_VM_BIND_FLAG_CPU_ADDR_MIRROR Add the DRM_XE_VM_BIND_FLAG_CPU_ADDR_MIRROR flag, which is used to create unpopulated virtual memory areas (VMAs) without memory backing or GPU page tables. These VMAs are referred to as CPU address mirror VMAs. The idea is that upon a page fault or prefetch, the memory backing and GPU page tables will be populated. CPU address mirror VMAs only update GPUVM state; they do not have an internal page table (PT) state, nor do they have GPU mappings. It is expected that CPU address mirror VMAs will be mixed with buffer object (BO) VMAs within a single VM. In other words, system allocations and runtime allocations can be mixed within a single user-mode driver (UMD) program. Expected usage: - Bind the entire virtual address (VA) space upon program load using the DRM_XE_VM_BIND_FLAG_CPU_ADDR_MIRROR flag. - If a buffer object (BO) requires GPU mapping (runtime allocation), allocate a CPU address using mmap(PROT_NONE), bind the BO to the mmapped address using existing bind IOCTLs. If a CPU map of the BO is needed, mmap it again to the same CPU address using mmap(MAP_FIXED) - If a BO no longer requires GPU mapping, munmap it from the CPU address space and them bind the mapping address with the DRM_XE_VM_BIND_FLAG_CPU_ADDR_MIRROR flag. - Any malloc'd or mmapped CPU address accessed by the GPU will be faulted in via the SVM implementation (system allocation). - Upon freeing any mmapped or malloc'd data, the SVM implementation will remove GPU mappings. Only supporting 1 to 1 mapping between user address space and GPU address space at the moment as that is the expected use case. uAPI defines interface for non 1 to 1 but enforces 1 to 1, this restriction can be lifted if use cases arrise for non 1 to 1 mappings. This patch essentially short-circuits the code in the existing VM bind paths to avoid populating page tables when the DRM_XE_VM_BIND_FLAG_CPU_ADDR_MIRROR flag is set. v3: - Call vm_bind_ioctl_ops_fini on -ENODATA - Don't allow DRM_XE_VM_BIND_FLAG_CPU_ADDR_MIRROR on non-faulting VMs - s/DRM_XE_VM_BIND_FLAG_SYSTEM_ALLOCATOR/DRM_XE_VM_BIND_FLAG_CPU_ADDR_MIRROR (Thomas) - Rework commit message for expected usage (Thomas) - Describe state of code after patch in commit message (Thomas) v4: - Fix alignment (Checkpatch) Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250306012657.3505757-9-matthew.brost@intel.com	2025-03-06 11:35:33 -08:00
Matthew Brost	6f39b0c5ef	drm/xe: Add staging tree for VM binds Concurrent VM bind staging and zapping of PTEs from a userptr notifier do not work because the view of PTEs is not stable. VM binds cannot acquire the notifier lock during staging, as memory allocations are required. To resolve this race condition, use a staging tree for VM binds that is committed only under the userptr notifier lock during the final step of the bind. This ensures a consistent view of the PTEs in the userptr notifier. A follow up may only use staging for VM in fault mode as this is the only mode in which the above race exists. v3: - Drop zap PTE change (Thomas) - s/xe_pt_entry/xe_pt_entry_staging (Thomas) Suggested-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: <stable@vger.kernel.org> Fixes: `e8babb280b` ("drm/xe: Convert multiple bind ops into single job") Fixes: `a708f6501c` ("drm/xe: Update PT layer with better error handling") Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250228073058.59510-5-thomas.hellstrom@linux.intel.com Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>	2025-03-05 09:30:52 +01:00
Thomas Hellström	100a5b8dad	drm/xe: Fix fault mode invalidation with unbind Fix fault mode invalidation racing with unbind leading to the PTE zapping potentially traversing an invalid page-table tree. Do this by holding the notifier lock across PTE zapping. This might transfer any contention waiting on the notifier seqlock read side to the notifier lock read side, but that shouldn't be a major problem. At the same time get rid of the open-coded invalidation in the bind code by relying on the notifier even when the vma bind is not yet committed. Finally let userptr invalidation call a dedicated xe_vm function performing a full invalidation. Fixes: `e8babb280b` ("drm/xe: Convert multiple bind ops into single job") Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Cc: <stable@vger.kernel.org> # v6.12+ Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250228073058.59510-4-thomas.hellstrom@linux.intel.com	2025-03-05 09:09:30 +01:00
Nitin Gote	75fd04f276	drm/xe: Fix all typos in xe Fix all typos in files of xe, reported by codespell tool. Signed-off-by: Nitin Gote <nitin.r.gote@intel.com> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Reviewed-by: Stuart Summers <stuart.summers@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250106102646.1400146-2-nitin.r.gote@intel.com Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>	2025-01-09 17:58:09 +01:00
Daniele Ceraolo Spurio	65338639b7	drm/xe: Call invalidation_fence_fini for PT inval fences in error state Invalidation_fence_init takes a PM reference, which is released in its _fini counterpart, so we need to make sure that the latter is called, even if the fence is in an error state. Since we already have a function that calls _fini() and signals the fence in the tlb inval code, we can expose that and call it from the PT code. Fixes: `f002702290` ("drm/xe: Hold a PM ref when GT TLB invalidations are inflight") Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: <stable@vger.kernel.org> # v6.11+ Cc: Matthew Brost <matthew.brost@intel.com> Cc: Nirmoy Das <nirmoy.das@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Nirmoy Das <nirmoy.das@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241206015022.1567113-1-daniele.ceraolospurio@intel.com	2024-12-10 13:11:14 -08:00
Francois Dugast	9d42476f71	drm/xe: Allow fault injection in vm create and vm bind IOCTLs Use fault injection infrastructure to allow specific functions to be configured over debugfs for failing during the execution of xe_vm_create_ioctl() and xe_vm_bind_ioctl(). This allows more thorough testing from user space by going through code paths for error handling and unwinding which cannot be reached by simply injecting errors in IOCTL arguments. This can help increase code robustness. v2: Add xe_pt_update_ops_{prepare,run} (Matthew Brost) Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241113162212.2154103-1-francois.dugast@intel.com Signed-off-by: Francois Dugast <francois.dugast@intel.com>	2024-11-14 11:48:13 +01:00
Matthew Brost	63e0695597	drm/xe: Fix memory leak when aborting binds Make sure to call xe_pt_update_ops_fini in xe_pt_update_ops_abort to free any memory the bind allocated. Caught by kmemleak when running Vulkan CTS tests on LNL. The leak seems to happen only when there's some kind of failure happening, like the lack of memory. Example output: unreferenced object 0xffff9120bdf62000 (size 8192): comm "deqp-vk", pid 115008, jiffies 4310295728 hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 1b 05 f9 28 01 00 00 40 ...........(...@ 00 00 00 00 00 00 00 00 1b 15 f9 28 01 00 00 40 ...........(...@ backtrace (crc 7a56be79): [<ffffffff86dd81f0>] __kmalloc_cache_noprof+0x310/0x3d0 [<ffffffffc08e8211>] xe_pt_new_shared.constprop.0+0x81/0xb0 [xe] [<ffffffffc08e8309>] xe_pt_insert_entry+0xb9/0x140 [xe] [<ffffffffc08eab6d>] xe_pt_stage_bind_entry+0x12d/0x5b0 [xe] [<ffffffffc08ecbca>] xe_pt_walk_range+0xea/0x280 [xe] [<ffffffffc08eccea>] xe_pt_walk_range+0x20a/0x280 [xe] [<ffffffffc08eccea>] xe_pt_walk_range+0x20a/0x280 [xe] [<ffffffffc08eccea>] xe_pt_walk_range+0x20a/0x280 [xe] [<ffffffffc08eccea>] xe_pt_walk_range+0x20a/0x280 [xe] [<ffffffffc08e9eff>] xe_pt_stage_bind.constprop.0+0x25f/0x580 [xe] [<ffffffffc08eb21a>] bind_op_prepare+0xea/0x6e0 [xe] [<ffffffffc08ebab8>] xe_pt_update_ops_prepare+0x1c8/0x440 [xe] [<ffffffffc08ffbf3>] ops_execute+0x143/0x850 [xe] [<ffffffffc0900b64>] vm_bind_ioctl_ops_execute+0x244/0x800 [xe] [<ffffffffc0906467>] xe_vm_bind_ioctl+0x1877/0x2370 [xe] [<ffffffffc05e92b3>] drm_ioctl_kernel+0xb3/0x110 [drm] unreferenced object 0xffff9120bdf72000 (size 8192): comm "deqp-vk", pid 115008, jiffies 4310295728 hex dump (first 32 bytes): 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk backtrace (crc 23b2f0b5): [<ffffffff86dd81f0>] __kmalloc_cache_noprof+0x310/0x3d0 [<ffffffffc08e8211>] xe_pt_new_shared.constprop.0+0x81/0xb0 [xe] [<ffffffffc08e8453>] xe_pt_stage_unbind_post_descend+0xb3/0x150 [xe] [<ffffffffc08ecd26>] xe_pt_walk_range+0x246/0x280 [xe] [<ffffffffc08eccea>] xe_pt_walk_range+0x20a/0x280 [xe] [<ffffffffc08eccea>] xe_pt_walk_range+0x20a/0x280 [xe] [<ffffffffc08eccea>] xe_pt_walk_range+0x20a/0x280 [xe] [<ffffffffc08ece31>] xe_pt_walk_shared+0xc1/0x110 [xe] [<ffffffffc08e7b2a>] xe_pt_stage_unbind+0x9a/0xd0 [xe] [<ffffffffc08e913d>] unbind_op_prepare+0xdd/0x270 [xe] [<ffffffffc08eb9f6>] xe_pt_update_ops_prepare+0x106/0x440 [xe] [<ffffffffc08ffbf3>] ops_execute+0x143/0x850 [xe] [<ffffffffc0900b64>] vm_bind_ioctl_ops_execute+0x244/0x800 [xe] [<ffffffffc0906467>] xe_vm_bind_ioctl+0x1877/0x2370 [xe] [<ffffffffc05e92b3>] drm_ioctl_kernel+0xb3/0x110 [drm] [<ffffffffc05e95a0>] drm_ioctl+0x280/0x4e0 [drm] Reported-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/2877 Fixes: `a708f6501c` ("drm/xe: Update PT layer with better error handling") Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240927232228.3255246-1-matthew.brost@intel.com	2024-10-02 06:32:51 -07:00
Matthew Brost	bf758226c7	drm/xe: Invalidate media_gt TLBs in PT code Testing on LNL has shown media GT's TLBs need to be invalidated via the GuC, update PT code appropriately. v2: - Do dma_fence_get before first call of invalidation_fence_init (Himal) - No need to check for valid chain fence (Himal) v3: - Use dma-fence-array Fixes: `3330361543` ("drm/xe/lnl: Add LNL platform definition") Signed-off-by: Matthew Brost <matthew.brost@intel.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240826170144.2492062-3-matthew.brost@intel.com	2024-08-30 11:41:26 -07:00
Matthew Brost	25ebe10e3f	Revert "drm/xe: Invalidate media_gt TLBs in PT code" This reverts commit `40520283e0`. We can't install dma-fence-chain in timeline sync objs. Signed-off-by: Matthew Brost <matthew.brost@intel.com> Acked-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240823162207.2168887-1-matthew.brost@intel.com	2024-08-23 09:51:47 -07:00
Matthew Brost	40520283e0	drm/xe: Invalidate media_gt TLBs in PT code Testing on LNL has shown media GT's TLBs need to be invalidated via the GuC, update PT code appropriately. v2: - Do dma_fence_get before first call of invalidation_fence_init (Himal) - No need to check for valid chain fence (Himal) Fixes: `3330361543` ("drm/xe/lnl: Add LNL platform definition") Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240820161632.987369-1-matthew.brost@intel.com	2024-08-21 19:34:07 -07:00
Matthew Brost	8d5309b7f6	drm/xe: Only check last fence on user binds We only set the last fence on user binds, so no need to check last fence kernel issued binds. Will avoid blowing up last fence lockdep asserts. Cc: Francois Dugast <francois.dugast@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240805200233.3050325-1-matthew.brost@intel.com	2024-08-05 14:28:07 -07:00
Matthew Brost	649b93dbb9	drm/xe: Fix xe_pt_abort_unbind When restoring the children PT entries on a bind failure the incorrect loop index has used resulting in PT entries being leaked. This is shown by running xe_vm.bind-array-conflict-error-inject on a VRAM device going into a suspend state after the test completes. v2: - s/childern/children (CI, Matt Auld) Fixes: `a708f6501c` ("drm/xe: Update PT layer with better error handling") Cc: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240723010230.1652707-1-matthew.brost@intel.com	2024-07-23 08:02:43 -07:00
Matthew Brost	61ac035361	drm/xe: Drop xe_gt_tlb_invalidation_wait Having two methods to wait on GT TLB invalidations is not ideal. Remove xe_gt_tlb_invalidation_wait and only use GT TLB invalidation fences. In addition to two methods being less than ideal, once GT TLB invalidations are coalesced the seqno cannot be assigned during xe_gt_tlb_invalidation_ggtt/range. Thus xe_gt_tlb_invalidation_wait would not have a seqno to wait one. A fence however can be armed and later signaled. v3: - Add explaination about coalescing to commit message v4: - Don't put dma fence if defined on stack (CI) v5: - Initialize ret to zero (CI) v6: - Use invalidation_fence_signal helper in tlb timeout (Matthew Auld) Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Nirmoy Das <nirmoy.das@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240719172905.1527927-3-matthew.brost@intel.com	2024-07-19 19:45:31 -07:00
Matthew Brost	a522b285c6	drm/xe: Add xe_gt_tlb_invalidation_fence_init helper Other layers should not be touching struct xe_gt_tlb_invalidation_fence directly, add helper for initialization. v2: - Add dma_fence_get and list init to xe_gt_tlb_invalidation_fence_init Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Nirmoy Das <nirmoy.das@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240719172905.1527927-2-matthew.brost@intel.com	2024-07-19 19:45:29 -07:00
Matthew Brost	04e9c0ce19	drm/xe: Add VM bind IOCTL error injection Add VM bind IOCTL error injection which steals MSB of the bind flags field which if set injects errors at various points in the VM bind IOCTL. Intended to validate error paths. Enabled by CONFIG_DRM_XE_DEBUG. v4: - Change define layout (Jonathan) Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240704041652.272920-8-matthew.brost@intel.com	2024-07-03 22:28:07 -07:00
Matthew Brost	a708f6501c	drm/xe: Update PT layer with better error handling Update PT layer so if a memory allocation for a PTE fails the error can be propagated to the user without requiring the VM to be killed. v5: - change return value invalidation_fence_init to void (Matthew Auld) v7: - Invert i,j usage in two places (Matthew Auld) - s/0/NULL (Matthew Auld) - Don't ignore return value of xe_pt_new_shared (Matthew Auld) - Don't check for NULL in xe_pt_entry (Matthew Auld) Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240704041652.272920-7-matthew.brost@intel.com	2024-07-03 22:28:06 -07:00
Matthew Brost	e8babb280b	drm/xe: Convert multiple bind ops into single job This aligns with the uAPI of an array of binds or single bind that results in multiple GPUVA ops to be considered a single atomic operations. The design is roughly: - xe_vma_ops is a list of xe_vma_op (GPUVA op) - each xe_vma_op resolves to 0-3 PT ops - xe_vma_ops creates a single job - if at any point during binding a failure occurs, xe_vma_ops contains the information necessary unwind the PT and VMA (GPUVA) state v2: - add missing dma-resv slot reservation (CI, testing) v4: - Fix TLB invalidation (Paulo) - Add missing xe_sched_job_last_fence_add/test_dep check (Inspection) v5: - Invert i, j usage (Matthew Auld) - Add helper to test and add job dep (Matthew Auld) - Return on anything but -ETIME for cpu bind (Matthew Auld) - Return -ENOBUFS if suballoc of BB fails due to size (Matthew Auld) - s/do/Do (Matthew Auld) - Add missing comma (Matthew Auld) - Do not assign return value to xe_range_fence_insert (Matthew Auld) v6: - s/0x1ff/MAX_PTE_PER_SDI (Matthew Auld, CI) - Check to large of SA in Xe to avoid triggering WARN (Matthew Auld) - Fix checkpatch issues v7: - Rebase - Support more than 510 PTEs updates in a bind job (Paulo, mesa testing) v8: - Rebase Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240704041652.272920-5-matthew.brost@intel.com	2024-07-03 22:28:04 -07:00
Radhakrishna Sripada	501c4255c4	drm/xe/trace: Print device_id in xe_trace events In multi-gpu environments it is important to know the device gt events belongs to. The tracing information includes the device_id to indicate the device the event is associated with. v2: Use variable sized variant to display dev name(Gustavo) v3: Pass single argument to __assign_str to fix kunit error v4: Remove unused sting_helper library include Suggested-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com> Signed-off-by: Radhakrishna Sripada <radhakrishna.sripada@intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240607182943.3572524-6-radhakrishna.sripada@intel.com	2024-06-12 09:25:13 -07:00
Nirmoy Das	735940f999	drm/xe: Add warn when level can not be zero. At xe_pt_zap_ptes_entry() and xe_pt_stage_unbind_entry, the level cannot be 0. Therefore, add an independent check for the level. Since the level cannot be zero at this point, there is no need to check for `is_compact`, so remove that instead. Cc: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240521103623.11645-1-nirmoy.das@intel.com Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>	2024-05-21 22:09:20 +02:00
Matthew Brost	c8ff26b82c	drm/xe: Only zap PTEs as needed If PTEs are already invalidated no need to invalidate again. Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240514232325.84508-1-matthew.brost@intel.com	2024-05-15 08:07:57 -07:00
Nirmoy Das	a0862cf2fe	drm/xe: Refactor default device atomic settings The default behavior of device atomics depends on the VM type and buffer allocation types. Device atomics are expected to function with all types of allocations for traditional applications/APIs. Additionally, in compute/SVM API scenarios with fault mode or LR mode VMs, device atomics must work with single-region allocations. In all other cases device atomics should be disabled by default also on platforms where we know device atomics doesn't on work on particular allocations types. v3: fault mode requires LR mode so only check for LR mode to determine compute API(Jose). Handle SMEM+LMEM BO's migration to LMEM where device atomics is expected to work. (Brian). v2: Fix platform checks to correct atomics behaviour on PVC. Acked-by: Michal Mrozek <michal.mrozek@intel.com> Reviewed-by: Oak Zeng <oak.zeng@intel.com> Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240430162529.21588-6-nirmoy.das@intel.com Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>	2024-05-06 18:14:11 +02:00
Matthew Brost	c4f1870362	drm/xe: Add xe_gt_tlb_invalidation_range and convert PT layer to use this xe_gt_tlb_invalidation_range accepts a start and end address rather than a VMA. This will enable multiple VMAs to be invalidated in a single invalidation. Update the PT layer to use this new function. Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Oak Zeng <oak.zeng@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240425045513.1913039-13-matthew.brost@intel.com	2024-04-26 12:10:07 -07:00
Michal Wajdeczko	48651e18bb	drm/xe: Move PTE/PDE bit definitions to proper header We already have dedicated header for GGTT/PPGTT definitions. It's also cleaner to separate them from implementation macros. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Acked-by: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240405123520.847-1-michal.wajdeczko@intel.com	2024-04-05 19:58:54 +02:00
Lucas De Marchi	62742d1266	drm/xe: Normalize bo flags macros The flags stored in the BO grew over time without following much a naming pattern. First of all, get rid of the _BIT suffix that was banned from everywhere else due to the guideline in drivers/gpu/drm/i915/i915_reg.h that xe kind of follows: Define bits using ``REG_BIT(N)``. Do not add ``_BIT`` suffix to the name. Here the flags aren't for a register, but it's good practice to keep it consistent. Second divergence on names is the use or not of "CREATE". This is because most of the flags are passed to xe_bo_create() family of functions, changing its behavior. However, since the flags are also stored in the bo itself and checked elsewhere in the code, it seems better to just omit the CREATE part. With those 2 guidelines, all the flags are given the form XE_BO_FLAG_<FLAG_NAME> with the following commands: git grep -le "XE_BO_" -- drivers/gpu/drm/xe \| xargs sed -i \ -e "s/XE_BO_\([_A-Z0-9]\)_BIT/XE_BO_\1/g" \ -e 's/XE_BO_CREATE_/XE_BO_FLAG_/g' git grep -le "XE_BO_" -- drivers/gpu/drm/xe \| xargs sed -i -r \ -e 's/XE_BO_(DEFER_BACKING\|SCANOUT\|FIXED_PLACEMENT\|PAGETABLE\|NEEDS_CPU_ACCESS\|NEEDS_UC\|INTERNAL_TEST\|INTERNAL_64K\|GGTT_INVALIDATE)/XE_BO_FLAG_\1/g' And then the defines in drivers/gpu/drm/xe/xe_bo.h are adjusted to follow the coding style. Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240322142702.186529-3-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2024-04-02 10:33:57 -07:00
Thomas Hellström	7ee7dd6f30	drm/xe: Move vma rebinding to the drm_exec locking loop Rebinding might allocate page-table bos, causing evictions. To support blocking locking during these evictions, perform the rebinding in the drm_exec locking loop. Also Reserve fence slots where actually needed rather than trying to predict how many fence slots will be needed over a complete wound-wait transaction. v2: - Remove a leftover call to xe_vm_rebind() (Matt Brost) - Add a helper function xe_vm_validate_rebind() (Matt Brost) v3: - Add comments and squash with previous patch (Matt Brost) Fixes: `24f947d58f` ("drm/xe: Use DRM GPUVM helpers for external- and evicted objects") Fixes: `29f424eb87` ("drm/xe/exec: move fence reservation") Cc: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240327091136.3271-5-thomas.hellstrom@linux.intel.com	2024-03-28 08:39:30 +01:00
Thomas Hellström	0453f17575	drm/xe: Make TLB invalidation fences unordered They can actually complete out-of-order, so allocate a unique fence context for each fence. Fixes: `5387e865d9` ("drm/xe: Add TLB invalidation fence after rebinds issued from execs") Cc: Matthew Brost <matthew.brost@intel.com> Cc: <stable@vger.kernel.org> # v6.8+ Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240327091136.3271-4-thomas.hellstrom@linux.intel.com	2024-03-28 08:39:29 +01:00
Thomas Hellström	5a091aff50	drm/xe: Rework rebinding Instead of handling the vm's rebind fence separately, which is error prone if they are not strictly ordered, attach rebind fences as kernel fences to the vm's resv. Fixes: `dd08ebf6c3` ("drm/xe: Introduce a new DRM driver for Intel GPUs") Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: <stable@vger.kernel.org> # v6.8+ Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240327091136.3271-3-thomas.hellstrom@linux.intel.com	2024-03-28 08:39:28 +01:00
Thomas Hellström	4fc4899e86	drm/xe: Use ring ops TLB invalidation for rebinds For each rebind we insert a GuC TLB invalidation and add a corresponding unordered TLB invalidation fence. This might add a huge number of TLB invalidation fences to wait for so rather than doing that, defer the TLB invalidation to the next ring ops for each affected exec queue. Since the TLB is invalidated on exec_queue switch, we need to invalidate once for each affected exec_queue. v2: - Simplify if-statements around the tlb_flush_seqno. (Matthew Brost) - Add some comments and asserts. Fixes: `5387e865d9` ("drm/xe: Add TLB invalidation fence after rebinds issued from execs") Cc: Matthew Brost <matthew.brost@intel.com> Cc: <stable@vger.kernel.org> # v6.8+ Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240327091136.3271-2-thomas.hellstrom@linux.intel.com	2024-03-28 08:39:26 +01:00
Nirmoy Das	6d74e387aa	drm/xe: Drop bogus vma NULL check The vma pointer can't be NULL here. Cc: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Nirmoy Das <nirmoy.das@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240318093547.16326-1-nirmoy.das@intel.com	2024-03-19 08:58:42 +00:00
Matthew Brost	0f688c0eb6	drm/xe: Return 2MB page size for compact 64k PTEs Compact 64k PTEs are only intended to be used within a single VMA which covers the entire 2MB range of the compact 64k PTEs. Add XE_VMA_PTE_COMPACT VMA flag to indicate compact 64k PTEs are used and update xe_vma_max_pte_size to return at least 2MB if set. v2: Include missing changes Fixes: `8f33b4f054` ("drm/xe: Avoid doing rebinds") Fixes: `c47794bdd6` ("drm/xe: Set max pte size when skipping rebinds") Reported-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/758 Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240219211942.3633795-4-matthew.brost@intel.com	2024-02-20 08:39:45 -08:00
Matthew Brost	15f0e0c2c4	drm/xe: Add XE_VMA_PTE_64K VMA flag Add XE_VMA_PTE_64K VMA flag to ensure skipping rebinds does not cross 64k page boundaries. Fixes: `8f33b4f054` ("drm/xe: Avoid doing rebinds") Fixes: `c47794bdd6` ("drm/xe: Set max pte size when skipping rebinds") Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240219211942.3633795-3-matthew.brost@intel.com	2024-02-20 08:39:34 -08:00
Thomas Hellström	157261c58b	drm/xe/pt: Allow for stricter type- and range checking Distinguish between xe_pt and the xe_pt_dir subclass when allocating and freeing. Also use a fixed-size array for the xe_pt_dir page entries to make life easier for dynamic range- checkers. Finally rename the page-directory child pointer array to "children". While no functional change, this fixes ubsan splats similar to: [ 51.463021] ------------[ cut here ]------------ [ 51.463022] UBSAN: array-index-out-of-bounds in drivers/gpu/drm/xe/xe_pt.c:47:9 [ 51.463023] index 0 is out of range for type 'xe_ptw []' [ 51.463024] CPU: 5 PID: 2778 Comm: xe_vm Tainted: G U 6.8.0-rc1+ #218 [ 51.463026] Hardware name: ASUS System Product Name/PRIME B560M-A AC, BIOS 2001 02/01/2023 [ 51.463027] Call Trace: [ 51.463028] <TASK> [ 51.463029] dump_stack_lvl+0x47/0x60 [ 51.463030] __ubsan_handle_out_of_bounds+0x95/0xd0 [ 51.463032] xe_pt_destroy+0xa5/0x150 [xe] [ 51.463088] __xe_pt_unbind_vma+0x36c/0x9b0 [xe] [ 51.463144] xe_vm_unbind+0xd8/0x580 [xe] [ 51.463204] ? drm_exec_prepare_obj+0x3f/0x60 [drm_exec] [ 51.463208] __xe_vma_op_execute+0x5da/0x910 [xe] [ 51.463268] ? __drm_gpuvm_sm_unmap+0x1cb/0x220 [drm_gpuvm] [ 51.463272] ? radix_tree_node_alloc.constprop.0+0x89/0xc0 [ 51.463275] ? drm_gpuva_it_remove+0x1f3/0x2a0 [drm_gpuvm] [ 51.463279] ? drm_gpuva_remove+0x2f/0xc0 [drm_gpuvm] [ 51.463283] xe_vm_bind_ioctl+0x1a55/0x20b0 [xe] [ 51.463344] ? __pfx_xe_vm_bind_ioctl+0x10/0x10 [xe] [ 51.463414] drm_ioctl_kernel+0xb6/0x120 [ 51.463416] drm_ioctl+0x287/0x4e0 [ 51.463418] ? __pfx_xe_vm_bind_ioctl+0x10/0x10 [xe] [ 51.463481] __x64_sys_ioctl+0x94/0xd0 [ 51.463484] do_syscall_64+0x86/0x170 [ 51.463486] ? syscall_exit_to_user_mode+0x7d/0x200 [ 51.463488] ? do_syscall_64+0x96/0x170 [ 51.463490] ? do_syscall_64+0x96/0x170 [ 51.463492] entry_SYSCALL_64_after_hwframe+0x6e/0x76 [ 51.463494] RIP: 0033:0x7f246bfe817d [ 51.463498] Code: 04 25 28 00 00 00 48 89 45 c8 31 c0 48 8d 45 10 c7 45 b0 10 00 00 00 48 89 45 b8 48 8d 45 d0 48 89 45 c0 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 1a 48 8b 45 c8 64 48 2b 04 25 28 00 00 00 [ 51.463501] RSP: 002b:00007ffc1bd19ad0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [ 51.463502] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f246bfe817d [ 51.463504] RDX: 00007ffc1bd19b60 RSI: 0000000040886445 RDI: 0000000000000003 [ 51.463505] RBP: 00007ffc1bd19b20 R08: 0000000000000000 R09: 0000000000000000 [ 51.463506] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffc1bd19b60 [ 51.463508] R13: 0000000040886445 R14: 0000000000000003 R15: 0000000000010000 [ 51.463510] </TASK> [ 51.463517] ---[ end trace ]--- v2 - Fix kerneldoc warning (Matthew Brost) Fixes: `dd08ebf6c3` ("drm/xe: Introduce a new DRM driver for Intel GPUs") Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240209112655.4872-1-thomas.hellstrom@linux.intel.com	2024-02-12 22:57:35 +01:00
Matthew Brost	5fcbf83e39	drm/xe: Drop rebind argument from xe_pt_prepare_bind This is unused, drop it. Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Oak Zeng <oak.zeng@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240201184844.2317004-1-matthew.brost@intel.com	2024-02-01 18:43:11 -08:00
Thomas Hellström	5bd24e7882	drm/xe/vm: Subclass userptr vmas The construct allocating only parts of the vma structure when the userptr part is not needed is very fragile. A developer could add additional fields below the userptr part, and the code could easily attempt to access the userptr part even if its not persent. So introduce xe_userptr_vma which subclasses struct xe_vma the proper way, and accordingly modify a couple of interfaces. This should also help if adding userptr helpers to drm_gpuvm. v2: - Fix documentation of to_userptr_vma() (Matthew Brost) - Fix allocation and freeing of vmas to clearer distinguish between the types. Closes: https://lore.kernel.org/intel-xe/0c4cc1a7-f409-4597-b110-81f9e45d1ffe@embeddedor.com/T/#u Fixes: `a4cc60a55f` ("drm/xe: Only alloc userptr part of xe_vma for userptrs") Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240131091628.12318-1-thomas.hellstrom@linux.intel.com	2024-02-01 09:49:26 +01:00
Thomas Hellström	06951c2ee7	drm/xe: Use NULL PTEs as scratch PTEs Currently scratch PTEs are write-enabled and points to a single scratch page. This has the side effect that buggy applications with out-of-bounds memory accesses may not notice the bad access since what's written may be read back. Instead use NULL PTEs as scratch PTEs. These always return 0 when reading, and writing has no effect. As a slight benefit, we can also use huge NULL PTEs. One drawback pointed out is that debugging may be hampered since previously when inspecting the content of the scratch page, it might be possible to detect writes to out-of-bound addresses and possibly also from where the out-of-bounds address originated. However since the scratch page-table structure is kept, it will be easy to add back the single RW-enabled scratch page under a debug define if needed. Also update the kerneldoc accordingly and move the function to create the scratch page-tables from xe_pt.c to xe_pt.h since it is accessing vm structure internals and this also makes it possible to make it static. v2: - Don't try to encode scratch PTEs larger than 1GiB. - Move xe_pt_create_scratch(), Update kerneldoc. v3: - Rebase. Cc: Brian Welty <brian.welty@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Acked-by: Lucas De Marchi <lucas.demarchi@intel.com> #for general direction. Reviewed-by: Brian Welty <brian.welty@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231209151843.7903-3-thomas.hellstrom@linux.intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2023-12-21 11:45:28 -05:00
Thomas Hellström	e84d716dd4	drm/xe: Restrict huge PTEs to 1GiB Add a define for the highest level for which we can encode a huge PTE, and use it for page-table building. Also update an assert that checks that we don't try to encode for larger sizes. Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Brian Welty <brian.welty@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20231209151843.7903-2-thomas.hellstrom@linux.intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2023-12-21 11:45:28 -05:00
Matthew Auld	e1fbc4f18d	drm/xe/uapi: support pat_index selection with vm_bind Allow userspace to directly control the pat_index for a given vm binding. This should allow directly controlling the coherency, caching behaviour, compression and potentially other stuff in the future for the ppGTT binding. The exact meaning behind the pat_index is very platform specific (see BSpec or PRMs) but effectively maps to some predefined memory attributes. From the KMD pov we only care about the coherency that is provided by the pat_index, which falls into either NONE, 1WAY or 2WAY. The vm_bind coherency mode for the given pat_index needs to be at least 1way coherent when using cpu_caching with DRM_XE_GEM_CPU_CACHING_WB. For platforms that lack the explicit coherency mode attribute, we treat UC/WT/WC as NONE and WB as AT_LEAST_1WAY. For userptr mappings we lack a corresponding gem object, so the expected coherency mode is instead implicit and must fall into either 1WAY or 2WAY. Trying to use NONE will be rejected by the kernel. For imported dma-buf (from a different device) the coherency mode is also implicit and must also be either 1WAY or 2WAY. v2: - Undefined coh_mode(pat_index) can now be treated as programmer error. (Matt Roper) - We now allow gem_create.coh_mode <= coh_mode(pat_index), rather than having to match exactly. This ensures imported dma-buf can always just use 1way (or even 2way), now that we also bundle 1way/2way into at_least_1way. We still require 1way/2way for external dma-buf, but the policy can now be the same for self-import, if desired. - Use u16 for pat_index in uapi. u32 is massive overkill. (José) - Move as much of the pat_index validation as we can into vm_bind_ioctl_check_args. (José) v3 (Matt Roper): - Split the pte_encode() refactoring into separate patch. v4: - Rebase v5: - Check for and reject !coh_mode which would indicate hw reserved pat_index on xe2. v6: - Rebase on removal of coh_mode from uapi. We just need to reject cpu_caching=wb + pat_index with coh_none. Testcase: igt@xe_pat Bspec: 45101, 44235 #xe Bspec: 70552, 71582, 59400 #xe2 Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Pallavi Mishra <pallavi.mishra@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Cc: José Roberto de Souza <jose.souza@intel.com> Cc: Filip Hazubski <filip.hazubski@intel.com> Cc: Carl Zhang <carl.zhang@intel.com> Cc: Effie Yu <effie.yu@intel.com> Cc: Zhengguo Xu <zhengguo.xu@intel.com> Cc: Francois Dugast <francois.dugast@intel.com> Tested-by: José Roberto de Souza <jose.souza@intel.com> Reviewed-by: José Roberto de Souza <jose.souza@intel.com> Acked-by: Zhengguo Xu <zhengguo.xu@intel.com> Acked-by: Bartosz Dunajski <bartosz.dunajski@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2023-12-21 11:45:07 -05:00

1 2 3

102 Commits