Support destination migration over interconnect when migrating from
device-private pages with the same dev_pagemap owner.
Since we now also collect device-private pages to migrate,
also abort migration if the range to migrate is already
fully populated with pages from the desired pagemap.
Finally, return -EBUSY from drm_pagemap_populate_mm()
if the migration can't be completed without first migrating all
pages in the range to system. It is expected that the caller
will perform that before retrying the call to
drm_pagemap_populate_mm().
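
The caller-side contract described above can be sketched as follows.
This is a minimal userspace model, not the kernel code: struct
demo_range, demo_populate() and demo_evict_to_system() are hypothetical
stand-ins, and the real drm_pagemap_populate_mm() signature is not
reproduced here.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of a range; not a kernel structure. */
struct demo_range {
	bool has_blocking_devmem_pages;	/* pages that prevent direct migration */
	bool in_target_devmem;
};

/* Stand-in for drm_pagemap_populate_mm(); the real signature differs. */
static int demo_populate(struct demo_range *r)
{
	if (r->has_blocking_devmem_pages)
		return -EBUSY;	/* caller must migrate to system first */
	r->in_target_devmem = true;
	return 0;
}

/* Stand-in for migrating all pages in the range to system memory. */
static void demo_evict_to_system(struct demo_range *r)
{
	r->has_blocking_devmem_pages = false;
	r->in_target_devmem = false;
}

/* Caller-side pattern: on -EBUSY, evict to system once and retry. */
static int demo_populate_with_retry(struct demo_range *r)
{
	int err = demo_populate(r);

	if (err == -EBUSY) {
		demo_evict_to_system(r);
		err = demo_populate(r);
	}
	return err;
}
```

The sketch retries exactly once; whether a real caller bounds the
number of retries differently is not stated in this message.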
v3:
- Fix a bug where the p2p dma-address was never used.
- Postpone enabling destination interconnect migration,
since xe devices require source interconnect migration to
ensure the source L2 cache is flushed at migration time.
- Update the drm_pagemap_migrate_to_devmem() interface to
pass migration details.
v4:
- Define XE_INTERCONNECT_P2P unconditionally (CI)
- Include a missing header (CI)
v5:
- Use page order increments where possible (Matt Brost).
- Fix a negated value of can_migrate_same_pagemap.
- Move removal of some dead code to a separate patch (Matt Brost).
- Remove an unnecessary zdd get() and put() (Matt Brost).
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> # For merging through drm-xe.
Link: https://patch.msgid.link/20251219113320.183860-23-thomas.hellstrom@linux.intel.com
Mimic the dma-buf method using dma_[map|unmap]_resource to map
for pcie-p2p dma.
There's an ongoing area of work upstream to sort out how this best
should be done. One method proposed is to add an additional
pci_p2p_dma_pagemap aliasing the device_private pagemap and use
the corresponding pci_p2p_dma_pagemap page as input for
dma_map_page(). However, that would double both the memory and the
latency needed to set up the drm_pagemap, and given the huge amount
of memory present on modern GPUs, that is not workable.
Hence the simple approach used in this patch.
v2:
- Simplify xe_page_to_pcie(). (Matt Brost)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20251219113320.183860-17-thomas.hellstrom@linux.intel.com
Use device file descriptors and regions to represent pagemaps on
foreign or local devices.
The underlying files are type-checked at madvise time, and
references are kept on the drm_pagemap as long as there are
madvises pointing to it.
Extend the madvise preferred_location UAPI to support the region
instance to identify the foreign placement.
v2:
- Improve UAPI documentation. (Matt Brost)
- Sanitize preferred_mem_loc.region_instance madvise. (Matt Brost)
- Clarify madvise drm_pagemap vs xe_pagemap refcounting. (Matt Brost)
- Don't allow a foreign drm_pagemap madvise without a fast
interconnect.
v3:
- Add a comment about reference-counting in xe_devmem_open() and
remove the reference-count get-and-put. (Matt Brost)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20251219113320.183860-16-thomas.hellstrom@linux.intel.com
Honor the drm_pagemap vma attribute when migrating SVM pages.
Ensure that when the desired placement is validated as device
memory, we also check that the requested drm_pagemap is
consistent with the current one.
v2:
- Initialize a struct drm_pagemap pointer to NULL that could
otherwise be dereferenced uninitialized. (CI)
- Remove a redundant assignment (Matt Brost)
- Slightly improved commit message (Matt Brost)
- Extended drm_pagemap validation.
v3:
- Fix a compilation error if CONFIG_DRM_GPUSVM is not enabled.
(kernel test robot <lkp@intel.com>)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Link: https://patch.msgid.link/20251219113320.183860-14-thomas.hellstrom@linux.intel.com
Register a driver-wide owner list, provide a callback to identify
fast interconnects and use the drm_pagemap_util helper to allocate
or reuse a suitable owner struct. For now we consider pagemaps on
different tiles on the same device as having fast interconnect and
thus the same owner.
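
A rough illustration of the owner assignment described above, as a
userspace model. All names here (demo_pagemap, demo_owner_list,
demo_has_interconnect(), demo_acquire_owner()) are hypothetical and do
not mirror the drm_pagemap_util helper's actual interface; the only
policy taken from the text is that tiles on the same device count as
having a fast interconnect.

```c
#include <stdbool.h>
#include <stddef.h>

#define DEMO_MAX_PEERS 8

/* Hypothetical pagemap descriptor. */
struct demo_pagemap {
	int device_id;
	int tile_id;
	int owner;	/* owner token; equal tokens mean "same owner" */
};

/* Hypothetical driver-wide owner list. */
struct demo_owner_list {
	const struct demo_pagemap *peers[DEMO_MAX_PEERS];
	size_t count;
	int next_owner;
};

/* Callback: for now, only tiles on the same device are "fast". */
static bool demo_has_interconnect(const struct demo_pagemap *a,
				  const struct demo_pagemap *b)
{
	return a->device_id == b->device_id;
}

/* Reuse the owner of a fast-interconnect peer, else allocate a new one. */
static int demo_acquire_owner(struct demo_owner_list *list,
			      struct demo_pagemap *pm)
{
	size_t i;

	if (list->count >= DEMO_MAX_PEERS)
		return -1;

	pm->owner = -1;
	for (i = 0; i < list->count; i++) {
		if (demo_has_interconnect(list->peers[i], pm)) {
			pm->owner = list->peers[i]->owner;
			break;
		}
	}
	if (pm->owner < 0)
		pm->owner = list->next_owner++;

	list->peers[list->count++] = pm;
	return 0;
}
```

With this model, two tiles of one device end up with the same owner
token, while a pagemap on another device gets a fresh one.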
v2:
- Fix up the error onion unwind in xe_pagemap_create(). (Matt Brost)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20251219113320.183860-12-thomas.hellstrom@linux.intel.com
The Xe driver fails to build when CONFIG_DRM_XE_GPUSVM is disabled
but CONFIG_DRM_GPUSVM is turned on, due to the clash of two commits:
In file included from drivers/gpu/drm/xe/xe_vm_madvise.c:8:
drivers/gpu/drm/xe/xe_svm.h: In function 'xe_svm_init':
include/linux/stddef.h:8:14: error: passing argument 5 of 'drm_gpusvm_init' makes integer from pointer without a cast [-Wint-conversion]
drivers/gpu/drm/xe/xe_svm.h:217:38: note: in expansion of macro 'NULL'
217 | NULL, NULL, 0, 0, 0, NULL, NULL, 0);
| ^~~~
In file included from drivers/gpu/drm/xe/xe_bo_types.h:11,
from drivers/gpu/drm/xe/xe_bo.h:11,
from drivers/gpu/drm/xe/xe_vm_madvise.c:11:
include/drm/drm_gpusvm.h:254:35: note: expected 'long unsigned int' but argument is of type 'void *'
254 | unsigned long mm_start, unsigned long mm_range,
| ~~~~~~~~~~~~~~^~~~~~~~
In file included from drivers/gpu/drm/xe/xe_vm_madvise.c:14:
drivers/gpu/drm/xe/xe_svm.h:216:16: error: too many arguments to function 'drm_gpusvm_init'; expected 10, have 11
216 | return drm_gpusvm_init(&vm->svm.gpusvm, "Xe SVM (simple)", &vm->xe->drm,
| ^~~~~~~~~~~~~~~
217 | NULL, NULL, 0, 0, 0, NULL, NULL, 0);
| ~
include/drm/drm_gpusvm.h:251:5: note: declared here
Adapt the caller to the new argument list by removing the extraneous
NULL argument.
Fixes: 9e97874148 ("drm/xe/userptr: replace xe_hmm with gpusvm")
Fixes: 10aa5c8060 ("drm/gpusvm, drm/xe: Fix userptr to not allow device private pages")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://patch.msgid.link/20251204094704.1030933-1-arnd@kernel.org
When userptr is used on SVM-enabled VMs, a non-NULL
hmm_range::dev_private_owner value might mean that
hmm_range_fault() attempts to return device private pages.
Either that will fail, or the userptr code will not know
how to handle those.
Use NULL for hmm_range::dev_private_owner to migrate
such pages to system. In order to do that, move the
struct drm_gpusvm::device_private_page_owner field to
struct drm_gpusvm_ctx::device_private_page_owner so that
it doesn't remain immutable over the drm_gpusvm lifetime.
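
To illustrate why the owner needs to be a per-call value, here is a
small model of the owner check. The types and the helper name are
hypothetical; the only behavior assumed is the one described above,
namely that a device-private page is returned in place only when the
caller's owner matches, and is otherwise migrated to system.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical page model: devmem_owner is NULL for system pages. */
struct demo_page {
	const void *devmem_owner;
};

/* Hypothetical per-call context carrying the owner, as in the ctx move. */
struct demo_ctx {
	const void *device_private_page_owner;
};

/*
 * Returns true if the page can be handed back as-is, false if it has
 * to be migrated to system memory first.
 */
static bool demo_can_return_in_place(const struct demo_page *p,
				     const struct demo_ctx *ctx)
{
	if (!p->devmem_owner)
		return true;	/* already a system page */

	return ctx->device_private_page_owner &&
	       ctx->device_private_page_owner == p->devmem_owner;
}
```

An SVM caller would pass its owner in the context, while a userptr
caller would pass NULL so that device-private pages are never returned.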
v2:
- Don't conditionally compile xe_svm_devm_owner().
- Kerneldoc xe_svm_devm_owner().
Fixes: 9e97874148 ("drm/xe/userptr: replace xe_hmm with gpusvm")
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://lore.kernel.org/r/20250930122752.96034-1-thomas.hellstrom@linux.intel.com
The goal here is to cut over to gpusvm and remove xe_hmm, relying
instead on common code. The core facilities we need are get_pages(),
unmap_pages() and free_pages() for a given userptr range, plus a
VM-level notifier lock, all of which are now provided by gpusvm.
v2:
- Reuse the same SVM vm struct we use for full SVM, that way we can
use the same lock (Matt B & Himal)
v3:
- Re-use svm_init/fini for userptr.
v4:
- Allow building xe without userptr if we are missing DRM_GPUSVM
config. (Matt B)
- Always make .read_only match xe_vma_read_only() for the ctx. (Dafna)
v5:
- Fix missing conversion with CONFIG_DRM_XE_USERPTR_INVAL_INJECT
v6:
- Convert the new user in xe_vm_madvise.
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Dafna Hirschfeld <dafna.hirschfeld@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250828142430.615826-17-matthew.auld@intel.com
When the user sets a valid devmem_fd as the preferred location, a GPU
fault will trigger migration to the tile of the device associated with
that devmem_fd.
If the user sets an invalid devmem_fd, the preferred location is the
current placement (smem) only.
v2 (Matthew Brost)
- Default should be faulting tile
- remove devmem_fd used as region
v3 (Matthew Brost)
- Add migration_policy
- Fix return condition
- fix migrate condition
v4
- Rebase
v5
- Add check for userptr and bo based vmas
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250821173104.3030148-11-himal.prasad.ghimiray@intel.com
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
In the case of the MADVISE ioctl, if the start or end addresses fall
within a VMA and existing SVM ranges are present, remove the existing
SVM mappings. Then continue with ops_parse to create new VMAs via a
REMAP, unmapping the old one.
v2 (Matthew Brost)
- Use vops flag to call unmapping of ranges in vm_bind_ioctl_ops_parse
- Rename the function
v3
- Fix doc
v4
- check if range is already in garbage collector (Matthew Brost)
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250821173104.3030148-7-himal.prasad.ghimiray@intel.com
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
The xe_svm_range_validate() function checks if a range is
valid and located in the desired memory region.
xe_svm_range_migrate_to_smem() checks if a range has pages in devmem
and migrates them to smem.
v2
- Fix function stub in xe_svm.h
- Fix doc
v3 (Matthew Brost)
- Remove extra new line
- s/range->base.flags.has_devmem_pages/xe_svm_range_in_vram
v4 (Matthew Brost)
- s/xe_svm_range_in_vram/range->base.flags.has_devmem_pages
- Move eviction logic to separate function
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Acked-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://lore.kernel.org/r/20250513040228.470682-12-himal.prasad.ghimiray@intel.com
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Mixing GPU and CPU atomics does not work unless a strict migration
policy requires that memory targeted by GPU atomics be in device
memory. Enforce a must-be-in-VRAM policy with a retry loop of 3
attempts; if the retry loop fails, abort the fault.
Remove the always_migrate_to_vram modparam, as we now have a real
migration policy.
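
The retry policy can be modeled roughly as below. This is a userspace
sketch under stated assumptions: struct demo_fault and both functions
are hypothetical, the failure count is a simulation knob, and -EIO is
an arbitrary placeholder for "abort the fault" since the actual error
code is not given here.

```c
#include <errno.h>
#include <stdbool.h>

#define DEMO_MAX_MIGRATE_RETRIES 3

/* Hypothetical fault descriptor for the simulation. */
struct demo_fault {
	bool devmem_only;	/* true for GPU atomics that need device memory */
	int fail_first;		/* simulation: number of attempts that fail */
	int attempts;		/* migration attempts made so far */
};

/* Simulated migration: succeeds once fail_first attempts have failed. */
static bool demo_try_migrate(struct demo_fault *f)
{
	return ++f->attempts > f->fail_first;
}

static int demo_handle_fault(struct demo_fault *f)
{
	int tries;

	for (tries = 0; tries < DEMO_MAX_MIGRATE_RETRIES; tries++) {
		if (demo_try_migrate(f))
			return 0;	/* pages are in device memory */
		if (!f->devmem_only)
			return 0;	/* non-atomic: system memory is fine */
	}
	return -EIO;	/* placeholder: retries exhausted, abort the fault */
}
```

Only faults marked devmem_only loop; other faults fall back to system
memory after a single failed attempt, matching the v2 note that only
atomics retry migration.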
v2:
- Only retry migration on atomics
- Drop always migrate modparam
v3:
- Only set vram_only on DGFX (Himal)
- Bail on get_pages failure if vram_only and retry count exceeded (Himal)
- s/vram_only/devmem_only
- Update xe_svm_range_is_valid to accept devmem_only argument
v4:
- Fix logic bug on get_pages failure
v5:
- Fix commit message (Himal)
- Mention removing always_migrate_to_vram in commit message (Lucas)
- Fix xe_svm_range_is_valid to check for devmem pages
- Bail on devmem_only && !migrate_devmem (Thomas)
v6:
- Add READ_ONCE barriers for opportunistic checks (Thomas)
- Pair READ_ONCE with WRITE_ONCE (Thomas)
v7:
- Adjust comments (Thomas)
Fixes: 2f118c9491 ("drm/xe: Add SVM VRAM migration")
Cc: stable@vger.kernel.org
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Acked-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://lore.kernel.org/r/20250512135500.1405019-3-matthew.brost@intel.com
Add a new entry in stats for SVM page faults. If CONFIG_DEBUG_FS is
enabled, the count can be viewed via the per-GT stats debugfs file.
This is similar to what is already in place for VMA page faults.
Example output:
cat /sys/kernel/debug/dri/0/gt0/stats
svm_pagefault_count: 6
tlb_inval_count: 78
vma_pagefault_count: 0
vma_pagefault_kb: 0
v2: Fix build with CONFIG_DRM_GPUSVM disabled
v3: Update argument in kernel doc
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250312092749.164232-1-francois.dugast@intel.com
Signed-off-by: Francois Dugast <francois.dugast@intel.com>
Migration is implemented with range granularity, with VRAM backing being
a VM private TTM BO (i.e., shares dma-resv with VM). The lifetime of the
TTM BO is limited to when the SVM range is in VRAM (i.e., when a VRAM
SVM range is migrated to SRAM, the TTM BO is destroyed).
The design choice for using TTM BO for VRAM backing store, as opposed to
direct buddy allocation, is as follows:
- DRM buddy allocations are not at page granularity, offering no
advantage over a BO.
- Unified eviction is required (SVM VRAM and TTM BOs need to be able to
evict each other).
- For exhaustive eviction [1], SVM VRAM allocations will almost certainly
require a dma-resv.
- The likely allocation size is 2M, which makes the size of a BO (872)
acceptable per allocation (872 / 2M == .0004158).
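
The overhead figure in the last bullet can be reproduced with a few
lines of C. The helper name is made up for illustration, and the
assumption that both numbers are in bytes is mine; the text only gives
the bare values 872 and 2M.

```c
#include <stddef.h>

/*
 * Ratio of per-allocation bookkeeping size to the allocation size.
 * Both arguments are assumed to be in bytes.
 */
static double demo_bo_overhead(size_t bo_struct_size, size_t alloc_size)
{
	if (alloc_size == 0)
		return 0.0;	/* avoid dividing by zero */

	return (double)bo_struct_size / (double)alloc_size;
}
```

Calling demo_bo_overhead(872, 2UL * 1024 * 1024) gives roughly
0.0004158, i.e. about 0.04% of each 2M allocation.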
With this, using TTM BO for VRAM backing store seems to be an obvious
choice as it allows leveraging of the TTM eviction code.
The current migration policy is to migrate any SVM range greater than
or equal to 64k once.
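
One way to read this policy, as a hedged sketch: the struct and helper
below are hypothetical, and interpreting "once" as a per-range flag
that is set on the first attempt is an assumption (the flag name
borrows from the skip_migrate rename mentioned in the v6 notes).

```c
#include <stdbool.h>

#define DEMO_SZ_64K (64UL * 1024)

/* Hypothetical range model; end is exclusive. */
struct demo_svm_range {
	unsigned long start;
	unsigned long end;
	bool skip_migrate;	/* set after the one migration attempt */
};

/* Returns true if the range should be migrated to VRAM now. */
static bool demo_should_migrate(struct demo_svm_range *r)
{
	if (r->skip_migrate || r->end - r->start < DEMO_SZ_64K)
		return false;

	r->skip_migrate = true;	/* "once": do not try again later */
	return true;
}
```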
[1] https://patchwork.freedesktop.org/series/133643/
v2:
- Rebase on latest GPU SVM
- Retry page fault on get pages returning mixed allocation
- Use drm_gpusvm_devmem
v3:
- Use new BO flags
- New range structure (Thomas)
- Hide migration behind Kconfig
- Kernel doc (Thomas)
- Use check_pages_threshold
v4:
- Don't evict partial unmaps in garbage collector (Thomas)
- Use %pe to print errors (Thomas)
- Use %p to print pointers (Thomas)
v5:
- Use range size helper (Thomas)
- Make BO external (Thomas)
- Set tile to NULL for BO creation (Thomas)
- Drop BO mirror flag (Thomas)
- Hold BO dma-resv lock across migration (Auld, Thomas)
v6:
- s/drm_info/drm_dbg (Thomas)
- s/migrated/skip_migrate (Himal)
- Better debug message on VRAM migration failure (Himal)
- Drop return BO from VRAM allocation function (Thomas)
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250306012657.3505757-28-matthew.brost@intel.com
uAPI is designed with the use case that only mapping a BO to a malloc'd
address will unbind a CPU-address mirror VMA. Therefore, allowing a
CPU-address mirror VMA to unbind when the GPU has bindings in the range
being unbound does not make much sense. This behavior is not supported,
as it simplifies the code. This decision can always be revisited if a
use case arises.
v3:
- s/arrises/arises (Thomas)
- s/system allocator/GPU address mirror (Thomas)
- Kernel doc (Thomas)
- Newline between function defs (Thomas)
v5:
- Kernel doc (Thomas)
v6:
- Only compile if CONFIG_DRM_GPUSVM selected (CI, Lucas)
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250306012657.3505757-18-matthew.brost@intel.com
Add a basic SVM garbage collector which destroys an SVM range upon an
MMU UNMAP event. The garbage collector runs on a worker or in the GPU
fault handler. It is required because destroying a range needs locks
that are in the path of reclaim and cannot be taken in the notifier.
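
The deferral pattern can be sketched in userspace C as follows. Locking
is deliberately left out, and all types and function names are
hypothetical; the sketch only shows the split between a notifier that
queues ranges and a collector that drains the queue later.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical range with an intrusive garbage-collector link. */
struct demo_range {
	struct demo_range *gc_next;
	bool queued;
	bool destroyed;
};

/* Hypothetical VM holding the garbage-collector list. */
struct demo_vm {
	struct demo_range *gc_head;
	bool closed;
};

/* Notifier side: no heavy locks allowed, so only queue the range. */
static void demo_notifier_unmap(struct demo_vm *vm, struct demo_range *r)
{
	if (vm->closed || r->queued)
		return;	/* closed VM or range already queued */

	r->queued = true;
	r->gc_next = vm->gc_head;
	vm->gc_head = r;
}

/* Worker or fault-handler side: drain the list and destroy ranges. */
static int demo_garbage_collect(struct demo_vm *vm)
{
	struct demo_range *r;
	int destroyed = 0;

	while ((r = vm->gc_head) != NULL) {
		vm->gc_head = r->gc_next;
		r->gc_next = NULL;
		r->destroyed = true;	/* stand-in for the real teardown */
		destroyed++;
	}
	return destroyed;
}
```

The collector pops the first entry until the list is empty, similar in
spirit to the list_first_entry_or_null loop mentioned in the v3 notes.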
v2:
- Flush garbage collector in xe_svm_close
v3:
- Better commit message (Thomas)
- Kernel doc (Thomas)
- Use list_first_entry_or_null for garbage collector loop (Thomas)
- Don't add to garbage collector if VM is closed (Thomas)
v4:
- Use %pe to print error (Thomas)
v5:
- s/visable/visible (Thomas)
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250306012657.3505757-16-matthew.brost@intel.com
Add (re)bind to the SVM page fault handler. To facilitate this, add a
support function to the VM layer which (re)binds an SVM range. Also
teach the PT layer to understand (re)binds of SVM ranges.
v2:
- Don't assert BO lock held for range binds
- Use xe_svm_notifier_lock/unlock helper in xe_svm_close
- Use drm_pagemap dma cursor
- Take notifier lock in bind code to check range state
v3:
- Use new GPU SVM range structure (Thomas)
- Kernel doc (Thomas)
- s/DRM_GPUVA_OP_USER/DRM_GPUVA_OP_DRIVER (Thomas)
v5:
- Kernel doc (Thomas)
v6:
- Only compile if CONFIG_DRM_GPUSVM selected (CI, Lucas)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Tested-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250306012657.3505757-15-matthew.brost@intel.com
Add an SVM range invalidation vfunc which invalidates PTEs. A new PT
layer function which accepts an SVM range is added to support this. In
addition, add the basic page fault handler which allocates an SVM
range, which is in turn used by the SVM range invalidation vfunc.
v2:
- Don't run invalidation if VM is closed
- Cycle notifier lock in xe_svm_close
- Drop xe_gt_tlb_invalidation_fence_fini
v3:
- Better commit message (Thomas)
- Add lockdep asserts (Thomas)
- Add kernel doc (Thomas)
- s/change/changed (Thomas)
- Use new GPU SVM range / notifier structures
- Ensure PTEs are zapped / dma mappings are unmapped on VM close (Thomas)
v4:
- Fix macro (Checkpatch)
v5:
- Use range start/end helpers (Thomas)
- Use notifier start/end helpers (Thomas)
v6:
- Use min/max helpers (Himal)
- Only compile if CONFIG_DRM_GPUSVM selected (CI, Lucas)
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250306012657.3505757-13-matthew.brost@intel.com
Add SVM init / close / fini to faulting VMs. Minimal implementation
acting as a placeholder for follow-on patches.
v2:
- Add close function
v3:
- Better commit message (Thomas)
- Kernel doc (Thomas)
- Update chunk array to be unsigned long (Thomas)
- Use new drm_gpusvm.h header location (Thomas)
- Newlines between functions in xe_svm.h (Thomas)
- Call drm_gpusvm_driver_set_lock in init (Thomas)
v6:
- Only compile if CONFIG_DRM_GPUSVM selected (CI, Lucas)
v7:
- Only select CONFIG_DRM_GPUSVM if DEVICE_PRIVATE (CI)
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250306012657.3505757-10-matthew.brost@intel.com