Commit Graph

151 Commits

Author SHA1 Message Date
Lukasz Laguna
57a5f45b3b drm/xe/migrate: Add function to copy of VRAM data in chunks
Introduce a new function to copy data between VRAM and sysmem objects.
The existing xe_migrate_copy() is tailored for eviction and restore
operations, which involves additional logic and operates on entire
objects.
The xe_migrate_vram_copy_chunk() allows copying chunks of data to or
from a dedicated buffer object, which is essential in case of VF
migration.

Signed-off-by: Lukasz Laguna <lukasz.laguna@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/20251112132220.516975-22-michal.winiarski@intel.com
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
2025-11-13 11:48:20 +01:00
Matthew Brost
b2d7ec41f2 drm/xe: Attach last fence to TLB invalidation job queues
Add support for attaching the last fence to TLB invalidation job queues
to address serialization issues during bursts of unbind jobs. Ensure
that user fence signaling for a bind job reflects both the bind job
itself and the last fences of all related TLB invalidations. Maintain
submission order based solely on the state of the bind and TLB
invalidation queues.

Introduce support functions for last fence attachment to TLB
invalidation queues.

v3:
 - Fix assert in xe_exec_queue_tlb_inval_last_fence_set (CI)
 - Ensure migrate lock held for migrate queues (Testing)
v5:
 - Style nits (Thomas)
 - Rewrite commit message (Thomas)

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://patch.msgid.link/20251031234050.3043507-3-matthew.brost@intel.com
2025-11-04 08:20:57 -08:00
Sanjay Yadav
dd5d11b657 drm/xe: Fix spelling and typos across Xe driver files
Corrected various spelling mistakes and typos in multiple
files under the Xe directory. These fixes improve clarity
and maintain consistency in documentation.

v2
- Replaced all instances of "XE" with "Xe" where it referred
  to the driver name
- of -> for
- Typical -> Typically

v3
- Revert "Xe" to "XE" for macro prefix reference

Signed-off-by: Sanjay Yadav <sanjay.kumar.yadav@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Stuart Summers <stuart.summers@intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patch.msgid.link/20251023121453.1182035-2-sanjay.kumar.yadav@intel.com
2025-10-27 13:00:11 +00:00
Matthew Auld
f558630a7d drm/xe/migrate: skip bounce buffer path on xe2
Now that we support MEM_COPY we should be able to use the PAGE_COPY
mode, otherwise falling back to BYTE_COPY mode when we have odd
sizing/alignment.

v2:
 - Use info.has_mem_copy_instr
 - Rebase on latest changes.
v3 (Matt Brost):
 - Allow various pitches including 1byte pitch for MEM_COPY

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20251022163836.191405-8-matthew.auld@intel.com
2025-10-23 10:48:41 +01:00
Matthew Auld
1e12dbae9d drm/xe/migrate: support MEM_COPY instruction
Make this the default on xe2+ when doing a copy. This has a few
advantages over the exiting copy instruction:

1) It has a special PAGE_COPY mode that claims to be optimised for
   page-in/page-out, which is the vast majority of current users.

2) It also has a simple BYTE_COPY mode that supports byte granularity
   copying without any restrictions.

With 2) we can now easily skip the bounce buffer flow when copying
buffers with strange sizing/alignment, like for memory_access. But that
is left for the next patch.

v2 (Matt Brost):
  - Use device info to check whether device should use the MEM_COPY
    path. This should fit better with making this a configfs tunable.
  - And with that also keep old path still functional on xe2 for possible
    experimentation.
  - Add a define for PAGE_COPY page-size.
v3 (Matt Brost):
  - Fallback to an actual linear copy for pitch=1.
  - Also update NVL.

BSpec: 57561
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20251022163836.191405-7-matthew.auld@intel.com
2025-10-23 10:48:39 +01:00
Matthew Auld
0171dcce33 drm/xe/migrate: trim batch buffer sizing
We have an extra two dwords, but it looks like we should only need one
for the extra bb_end. Likely this is just leftover from back when the
arb handling was moved into the ring programming.

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20251022163836.191405-6-matthew.auld@intel.com
2025-10-23 10:48:38 +01:00
Matthew Auld
1413329456 drm/xe/migrate: fix batch buffer sizing
In xe_migrate_vram() the copy can straddle page boundaries, so the len
might look like a single page, but actually accounting for the offset
within the page we will need to emit more than one PTE. Otherwise in
some cases the batch buffer will be undersized leading to warnings
later.  We already have npages so use that instead.

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20251022163836.191405-5-matthew.auld@intel.com
2025-10-23 10:48:37 +01:00
Matthew Auld
fb188d8b00 drm/xe/migrate: fix chunk handling for 2M page emit
On systems with PAGE_SIZE > 4K the chunk will likely be rounded down to
zero, if say we have single 2M page, so one huge pte, since we also try
to align the chunk to PAGE_SIZE / XE_PAGE_SIZE, which will be 16 on 64K
systems. Make the ALIGN_DOWN conditional for 4K PTEs where we can
encounter gpu_page_size < PAGE_SIZE.

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20251022163836.191405-4-matthew.auld@intel.com
2025-10-23 10:48:36 +01:00
Matthew Auld
aaeef7a9c8 drm/xe/migrate: rework size restrictions for sram pte emit
We allow the input size to not be aligned to PAGE_SIZE, which leads to
various bugs in build_pt_update_batch_sram() for PAGE_SIZE > 4K systems.
For example if ptes is exactly one gpu_page_size then the chunk size is
rounded down to zero.  The simplest fix looks to be forcing PAGE_SIZE
aligned inputs.

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20251022163836.191405-3-matthew.auld@intel.com
2025-10-23 10:48:34 +01:00
Matthew Auld
3c767f762b drm/xe/migrate: fix offset and len check
Restriction here is pitch of 4bytes to match pixel width (32b), and hw
restriction where src and dst must be aligned to 64bytes. If any of that
is not possible then we need a bounce buffer.

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20251022163836.191405-2-matthew.auld@intel.com
2025-10-23 10:48:33 +01:00
Matthew Auld
641bcf8731 drm/xe/migrate: don't misalign current bytes
If current bytes exceeds the max copy size, ensure the clamped size
still accounts for the XE_CACHELINE_BYTES alignment, otherwise we
trigger the assert in xe_migrate_vram with the size now being out of
alignment.

Fixes: 8c2d61e0e9 ("drm/xe/migrate: don't overflow max copy size")
Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/6212
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Stuart Summers <stuart.summers@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20251010162020.190962-2-matthew.auld@intel.com
2025-10-15 15:26:51 +01:00
Matthew Brost
dd83b101a4 drm/xe: Enable 2M pages in xe_migrate_vram
Using 2M pages in xe_migrate_vram has two benefits: we issue fewer
instructions per 2M copy (1 vs. 512), and the cache hit rate should be
higher. This results in increased copy engine bandwidth, as shown by
benchmark IGTs.

Enable 2M pages by reserving PDEs in the migrate VM and using 2M pages
in xe_migrate_vram if the DMA address order matches 2M.

v2:
 - Reuse build_pt_update_batch_sram (Stuart)
 - Fix build_pt_update_batch_sram for PAGE_SIZE > 4K
v3:
 - More fixes for PAGE_SIZE > 4K, align chunk, decrement chunk as needed
 - Use stack incr var in xe_migrate_vram_use_pde (Stuart)
v4:
 - Split PAGE_SIZE > 4K fix out in different patch (Stuart)

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Stuart Summers <stuart.summers@intel.com>
Link: https://lore.kernel.org/r/20251013034555.4121168-3-matthew.brost@intel.com
2025-10-13 12:31:22 -07:00
Matthew Brost
55991d854f drm/xe: Fix build_pt_update_batch_sram for non-4K PAGE_SIZE
The build_pt_update_batch_sram function in the Xe migrate layer assumes
PAGE_SIZE == XE_PAGE_SIZE (4K), which is not a valid assumption on
non-x86 platforms. This patch updates build_pt_update_batch_sram to
correctly handle PAGE_SIZE > 4K by programming multiple 4K GPU pages per
CPU page.

v5:
 - Mask off non-address bits during compare

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Tested-by: Simon Richter <Simon.Richter@hogyros.de>
Reviewed-by: Stuart Summers <stuart.summers@intel.com>
Link: https://lore.kernel.org/r/20251013034555.4121168-2-matthew.brost@intel.com
2025-10-13 12:31:21 -07:00
Thomas Hellström
381f1ed151 drm/xe/migrate: Fix an error path
The exhaustive eviction accidently changed an error path goto to
a return. Fix this.

Fixes: 59eabff2a3 ("drm/xe: Convert xe_bo_create_pin_map() for exhaustive eviction")
Cc: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Francois Dugast <francois.dugast@intel.com>
Link: https://lore.kernel.org/r/20250910160939.103473-1-thomas.hellstrom@linux.intel.com
2025-10-10 11:35:11 +02:00
Satyanarayana K V P
673167d9f0 drm/xe: Use PPGTT addresses for TLB invalidation to avoid GGTT fixups
The migrate VM builds the CCS metadata save/restore batch buffer (BB) in
advance and retains it so the GuC can submit it directly when saving a
VM’s state.

When a VM migrates between VFs, the GGTT base can change. Any GGTT-based
addresses embedded in the BB would then have to be parsed and patched.

Use PPGTT addresses in the BB (including for TLB invalidation) so the BB
remains GGTT-agnostic and requires no address fixups during migration.

Signed-off-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20251008214532.3442967-31-matthew.brost@intel.com
2025-10-09 03:24:07 -07:00
Thomas Hellström
187e16f69d drm/xe: Work around clang multiple goto-label error
When using drm_exec_retry_on_contention(), clang may consider
all labels for which we take addresses in a function as
potential retry goto targets, although strictly only one
is possible. It will then in some situations generate false
positive errors.

In this case, the compiler, for some architectures, consider the

might_lock(&m->job_mutex);

as a potential goto target from drm_exec_retry_on_contention(),
and errors.

Work around that by moving the xe_validate / drm_exec
transaction to a separate function.

v2:
- New commit message based on analysis of Nathan Chancellor

Fixes: 59eabff2a3 ("drm/xe: Convert xe_bo_create_pin_map() for exhaustive eviction")
Cc: Matthew Brost <matthew.brost@intel.com>
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202509101853.nDmyxTEM-lkp@intel.com/
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: Nathan Chancellor <nathan@kernel.org> # build
Link: https://lore.kernel.org/r/20250911080324.180307-1-thomas.hellstrom@linux.intel.com
2025-09-18 08:27:00 +02:00
Thomas Hellström
59eabff2a3 drm/xe: Convert xe_bo_create_pin_map() for exhaustive eviction
Introduce an xe_bo_create_pin_map_novm() function that does not
take the drm_exec paramenter to simplify the conversion of many
callsites.
For the rest, ensure that the same drm_exec context that was used
for locking the vm is passed down to validation.

Use xe_validation_guard() where appropriate.

v2:
- Avoid gotos from within xe_validation_guard(). (Matt Brost)
- Break out the change to pf_provision_vf_lmem8 to a separate
  patch.
- Adapt to signature change of xe_validation_guard().

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250908101246.65025-12-thomas.hellstrom@linux.intel.com
2025-09-10 09:16:06 +02:00
Sanjay Yadav
c4dfa0bea2 drm/xe/migrate: Remove unneeded emit_pte() when copying CCS only
In xe_migrate_copy(), when copy_only_ccs is true, we only need two
emit_pte() calls one for the BO and one for the raw CCS storage.
However, the current implementation issues three emit_pte() calls,
resulting in an unnecessary PTE programming job.

This fix removes the redundant emit_pte() call to avoid programming
the same PTEs twice and reducing overhead during CCS-only migration.

v2: Preserve correct behavior on DG2, which requires both CCS and
page copies.

Signed-off-by: Sanjay Yadav <sanjay.kumar.yadav@intel.com>
Suggested-by: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Link: https://lore.kernel.org/r/20250904161423.2448727-1-sanjay.kumar.yadav@intel.com
2025-09-05 13:29:20 +01:00
Matthew Auld
edb1745fc6 drm/xe: improve dma-resv handling for backup object
Since the dma-resv is shared we don't need to reserve and add a fence
slot fence twice, plus no need to loop through the dependencies.

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://lore.kernel.org/r/20250829164715.720735-2-matthew.auld@intel.com
2025-09-05 11:53:00 +01:00
Matthew Auld
81a45cb7ea drm/xe/migrate: make MI_TLB_INVALIDATE conditional
When clearing VRAM we should be able to skip invalidating the TLBs if we
are only using the identity map to access VRAM (which is the common
case), since no modifications are made to PTEs on the fly. Also since we
use huge 1G entries within the identity map, there should be a pretty
decent chance that the next packet(s) (if also clears) can avoid a tree
walk if we don't shoot down the TLBs, like if we have to process a long
stream of clears.

For normal moves/copies, we usually always end up with the src or dst
being system memory, meaning we can't only rely on the identity map and
will also need to emit PTEs and so will always require a TLB flush.

v2:
  - Update commit to explain the situation for normal copies (Matt B)
  - Rebase on latest changes

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250808110452.467513-2-matthew.auld@intel.com
2025-08-28 09:58:19 +01:00
Simon Richter
b85bb2d677 drm/xe: Make page size consistent in loop
If PAGE_SIZE != XE_PAGE_SIZE (which is currently locked behind
CONFIG_BROKEN), this would generate the wrong number of PDEs.

Since these PDEs are consumed by the GPU, the GPU page size needs to be
used.

Signed-off-by: Simon Richter <Simon.Richter@hogyros.de>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250818064806.2835-1-Simon.Richter@hogyros.de
2025-08-18 10:52:31 -07:00
Piotr Piórkowski
9337166fa1 drm/xe: Assign ioctl xe file handler to vm in xe_vm_create
In several code paths, such as xe_pt_create(), the vm->xef field is used
to determine whether a VM originates from userspace or the kernel.

Previously, this handler was only assigned in xe_vm_create_ioctl(),
after the VM was created by xe_vm_create(). However, xe_vm_create()
triggers page table creation, and that function assumes vm->xef should
be already set. This could lead to incorrect origin detection.

To fix this problem and ensure consistency in the initialization of
the VM object, let's move the assignment of this handler to
xe_vm_create.

v2:
 - take reference to the xe file object only when xef is not NULL
 - release the reference to the xe file object on the error path (Matthew)

Fixes: 7f387e6012 ("drm/xe: add XE_BO_FLAG_PINNED_LATE_RESTORE")
Signed-off-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://lore.kernel.org/r/20250811104358.2064150-2-piotr.piorkowski@intel.com
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
2025-08-12 13:03:36 +02:00
Matthew Auld
17593a69b7 drm/xe: rework PDE PAT index selection
For non-leaf paging structures we end up selecting a random index
between [0, 3], depending on the first user if the page-table is shared,
since non-leaf structures only have two bits in the HW for encoding the
PAT index, and here we are just passing along the full user provided
index, which can be an index as large as ~31 on xe2+. The user provided
index is meant for the leaf node, which maps the actual BO pages where
we have more PAT bits, and not the non-leaf nodes which are only mapping
other paging structures, and so only needs a minimal PAT index range.
Also the chosen index might need to consider how the driver mapped the
paging structures on the host side, like wc vs wb, which is separate
from the user provided index.

With that move the PDE PAT index selection under driver control. For now
just use a coherent index on platforms with page-tables that are cached
on host side, and incoherent otherwise. Using a coherent index could
potentially be expensive, and would be overkill if we know the page-table
is always uncached on host side.

v2 (Stuart):
  - Add some documentation and split into separate helper.

BSpec: 59510
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Stuart Summers <stuart.summers@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Stuart Summers <stuart.summers@intel.com>
Link: https://lore.kernel.org/r/20250808103455.462424-2-matthew.auld@intel.com
2025-08-11 17:41:02 +01:00
Satyanarayana K V P
9f8aa0bcd1 drm/xe/vf: Refactor CCS save/restore to use default migration context
Previously, CCS save/restore operations created separate migration
contexts with new VM memory allocations, resulting in significant
overhead.

This commit eliminates redundant context creation reusing the default
migration context by registering new execution queues for CCS save and
restore on the existing migrate VM.

Signed-off-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com>
Suggested-by: Matthew Brost <matthew.brost@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Stuart Summers <stuart.summers@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250808073628.32745-2-satyanarayana.k.v.p@intel.com
2025-08-08 10:29:37 -07:00
Matthew Auld
9b7ca35ed2 drm/xe/migrate: prevent potential UAF
If we hit the error path, the previous fence (if there is one) has
already been put() prior to this, so doing a fence_wait could lead to
UAF. Tweak the flow to do to the put() until after we do the wait.

Fixes: 270172f64b ("drm/xe: Update xe_ttm_access_memory to use GPU for non-visible access")
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Maciej Patelczyk <maciej.patelczyk@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Stuart Summers <stuart.summers@intel.com>
Link: https://lore.kernel.org/r/20250731093807.207572-8-matthew.auld@intel.com
2025-08-07 16:59:22 +01:00
Matthew Auld
8c2d61e0e9 drm/xe/migrate: don't overflow max copy size
With non-page aligned copy, we need to use 4 byte aligned pitch, however
the size itself might still be close to our maximum of ~8M, and so the
dimensions of the copy can easily exceed the S16_MAX limit of the copy
command leading to the following assert:

xe 0000:03:00.0: [drm] Assertion `size / pitch <= ((s16)(((u16)~0U) >> 1))` failed!
platform: BATTLEMAGE subplatform: 1
graphics: Xe2_HPG 20.01 step A0
media: Xe2_HPM 13.01 step A1
tile: 0 VRAM 10.0 GiB
GT: 0 type 1

WARNING: CPU: 23 PID: 10605 at drivers/gpu/drm/xe/xe_migrate.c:673 emit_copy+0x4b5/0x4e0 [xe]

To fix this account for the pitch when calculating the number of current
bytes to copy.

Fixes: 270172f64b ("drm/xe: Update xe_ttm_access_memory to use GPU for non-visible access")
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Maciej Patelczyk <maciej.patelczyk@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Stuart Summers <stuart.summers@intel.com>
Link: https://lore.kernel.org/r/20250731093807.207572-7-matthew.auld@intel.com
2025-08-07 16:59:22 +01:00
Matthew Auld
38b34e928a drm/xe/migrate: prevent infinite recursion
If the buf + offset is not aligned to XE_CAHELINE_BYTES we fallback to
using a bounce buffer. However the bounce buffer here is allocated on
the stack, and the only alignment requirement here is that it's
naturally aligned to u8, and not XE_CACHELINE_BYTES. If the bounce
buffer is also misaligned we then recurse back into the function again,
however the new bounce buffer might also not be aligned, and might never
be until we eventually blow through the stack, as we keep recursing.

Instead of using the stack use kmalloc, which should respect the
power-of-two alignment request here. Fixes a kernel panic when
triggering this path through eudebug.

v2 (Stuart):
 - Add build bug check for power-of-two restriction
 - s/EINVAL/ENOMEM/

Fixes: 270172f64b ("drm/xe: Update xe_ttm_access_memory to use GPU for non-visible access")
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Maciej Patelczyk <maciej.patelczyk@intel.com>
Cc: Stuart Summers <stuart.summers@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Stuart Summers <stuart.summers@intel.com>
Link: https://lore.kernel.org/r/20250731093807.207572-6-matthew.auld@intel.com
2025-08-07 16:59:18 +01:00
Francois Dugast
321d420325 drm/xe/migrate: Populate struct drm_pagemap_addr array
Workaround to ensure all addresses are populated in the array as
this is expected when creating the copy batch. This is required
because the migrate layer does not support 2MB GPU pages yet. A
proper fix will come in a follow-up.

Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250805140028.599361-6-francois.dugast@intel.com
Signed-off-by: Francois Dugast <francois.dugast@intel.com>
2025-08-06 13:35:05 +02:00
Francois Dugast
f35a6cdf8a drm/pagemap: Use struct drm_pagemap_addr in mapping and copy functions
This struct embeds more information than just the DMA address. This will
help later to support folio orders greater than zero. At this point, there
is no functional change as the only struct member used is addr.

In Xe, adapt to the new drm_gpusvm_devmem_ops type signatures using struct
drm_pagemap_addr, as well as the internal xe SVM functions implementing
those operations. The use of this struct is propagated to xe_migrate as it
makes indexed accesses to the next DMA address but they are no longer
contiguous.

v2:
- Rename drm_pagemap_device_addr to drm_pagemap_addr (Matthew Brost)
- Squash with patch for Xe (Matthew Brost)
- Set proto and dir for completeness (Matthew Brost)
- Assess DMA map protocol (Matthew Brost)

Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://lore.kernel.org/r/20250805140028.599361-3-francois.dugast@intel.com
Signed-off-by: Francois Dugast <francois.dugast@intel.com>
2025-08-06 13:34:42 +02:00
Satyanarayana K V P
a843b98947 drm/xe/vf: Fix VM crash during VF driver release
The VF CCS save/restore series (patchwork #149108) has a dependency
on the migration framework. A recent migration update in commit
d65ff1ec85 ("drm/xe: Split xe_migrate allocation from initialization")
caused a VM crash during XE driver release for iGPU devices.

Oops: general protection fault, probably for non-canonical address
0x6b6b6b6b6b6b6b83: 0000 [#1] SMP NOPTI
RIP: 0010:xe_lrc_ring_head+0x12/0xb0 [xe]
Call Trace:
 xe_sriov_vf_ccs_fini+0x1e/0x40 [xe]
 devm_action_release+0x12/0x30
 release_nodes+0x3a/0x120
 devres_release_all+0x96/0xd0
 device_unbind_cleanup+0x12/0x80
 device_release_driver_internal+0x23a/0x280
 device_release_driver+0x12/0x20
 pci_stop_bus_device+0x69/0x90
 pci_stop_and_remove_bus_device+0x12/0x30
 pci_iov_remove_virtfn+0xbd/0x130
 sriov_disable+0x42/0x100
 pci_disable_sriov+0x34/0x50
 xe_pci_sriov_configure+0xf71/0x1020 [xe]

Update the VF CCS migration initialization sequence to align with the new
migration framework changes, resolving the release-time crash.

Fixes: f3009272ff ("drm/xe/vf: Create contexts for CCS read write")
Signed-off-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Piotr Piórkowski <piotr.piorkowski@intel.com>
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250729120720.13990-1-satyanarayana.k.v.p@intel.com
2025-07-29 22:05:14 -07:00
Matthew Brost
dba89840a9 drm/xe: Add GT TLB invalidation jobs
Add GT TLB invalidation jobs which issue GT TLB invalidations. Built on
top of Xe generic dependency scheduler.

v2:
 - Fix checkpatch
v3:
 - Fix kernel doc in xe_gt_tlb_inval_job_alloc_dep,
   xe_gt_tlb_inval_job_push
 - Use IS_ERR_OR_NULL in xe_gt_tlb_inval_job_put
 - Squash migrate lock / unlock helpers into this patch (Stuart)

Suggested-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Stuart Summers <stuart.summers@intel.com>
Link: https://lore.kernel.org/r/20250724191216.4076566-6-matthew.brost@intel.com
2025-07-24 18:27:22 -07:00
Matthew Brost
c3ead4ecfc drm/xe: Explicitly mark migration queues with flag
Rather than inferring if an exec queue is a migration queue for a flag,
explicitly mark migration queues with a flag.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Francois Dugast <francois.dugast@intel.com>
Reviewed-by: Stuart Summers <stuart.summers@intel.com>
Link: https://lore.kernel.org/r/20250724191216.4076566-2-matthew.brost@intel.com
2025-07-24 18:25:55 -07:00
Satyanarayana K V P
916ee4704a drm/xe/vf: Register CCS read/write contexts with Guc
Register read write contexts with newly added flags with GUC and
enable the context immediately after registration.
Re-register the context with Guc when resuming from runtime suspend as
soft reset is applied to Guc during xe_pm_runtime_resume().
Make Ring head=tail while unbinding device to avoid issues with VF pause
after device is unbinded.

Signed-off-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250722120506.6483-4-satyanarayana.k.v.p@intel.com
2025-07-23 07:22:34 -07:00
Satyanarayana K V P
864690cf4d drm/xe/vf: Attach and detach CCS copy commands with BO
Attach CCS read/write copy commands to BO for old and new mem types as
NULL -> tt or system -> tt.
Detach the CCS read/write copy commands from BO while deleting ttm bo
from xe_ttm_bo_delete_mem_notify().

Signed-off-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250722120506.6483-3-satyanarayana.k.v.p@intel.com
2025-07-23 07:22:31 -07:00
Piotr Piórkowski
d65ff1ec85 drm/xe: Split xe_migrate allocation from initialization
Currently, xe_migrate_init handled both allocation and initialization,
Lets moves allocation to xe_tile_alloc via a new xe_migrate_alloc
function, and keep initialization in xe_migrate_init.
This will allow the migration pointers to be passed to other structures
before full initialization.
Also replaces devm_kzalloc with drmm_kzalloc for better
DRM-managed memory.

Signed-off-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250714184818.89201-5-piotr.piorkowski@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-16 12:12:49 -07:00
Piotr Piórkowski
f92cfd72d9 drm/xe: Use dynamic allocation for tile and device VRAM region structures
In future platforms, we will need to represent the device and tile
VRAM regions in a more dynamic way, so let's abandon the static
allocation of these structures and start use a dynamic allocation.

v2:
 - Add a helpers for accessing fields of the xe_vram_region structure
v3:
- Add missing EXPORT_SYMBOL_IF_KUNIT for
  xe_vram_region_actual_physical_size

Signed-off-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Cc: Stuart Summers <stuart.summers@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Satyanarayana K V P <satyanarayana.k.v.p@intel.com>
Reviewed-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com>
Acked-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250714184818.89201-3-piotr.piorkowski@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-16 12:08:31 -07:00
Lucas De Marchi
81e139db69 drm/xe/migrate: Fix alignment check
The check would fail if the address is unaligned, but not when
accounting the offset. Instead of `buf | offset` it should have
been `buf + offset`. To make it more readable and also drop the
uintptr_t, just use the IS_ALIGNED() macro.

Fixes: 270172f64b ("drm/xe: Update xe_ttm_access_memory to use GPU for non-visible access")
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://lore.kernel.org/r/20250710-migrate-aligned-v1-1-44003ef3c078@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-07-11 12:40:58 -07:00
Matthew Auld
c12fe703ca drm/xe/migrate: fix copy direction in access_memory
After we do the modification on the host side, ensure we write the
result back to VRAM and not the other way around, otherwise the
modification will be lost if treated like a read.

Fixes: 270172f64b ("drm/xe: Update xe_ttm_access_memory to use GPU for non-visible access")
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250710134128.800756-2-matthew.auld@intel.com
2025-07-11 16:33:29 +01:00
Matthew Auld
f7a2fd776e drm/xe/bmg: fix compressed VRAM handling
There looks to be an issue in our compression handling when the BO pages
are very fragmented, where we choose to skip the identity map and
instead fall back to emitting the PTEs by hand when migrating memory,
such that we can hopefully do more work per blit operation. However in
such a case we need to ensure the src PTEs are correctly tagged with a
compression enabled PAT index on dgpu xe2+, otherwise the copy will
simply treat the src memory as uncompressed, leading to corruption if
the memory was compressed by the user.

To fix this pass along use_comp_pat into emit_pte() on the src side, to
indicate that compression should be considered.

v2 (Jonathan): tweak the commit message

Fixes: 523f191cc0 ("drm/xe/xe_migrate: Handle migration logic for xe2+ dgfx")
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Akshata Jahagirdar <akshata.jahagirdar@intel.com>
Cc: <stable@vger.kernel.org> # v6.12+
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Link: https://lore.kernel.org/r/20250701103949.83116-2-matthew.auld@intel.com
2025-07-04 12:31:06 +01:00
Matthew Brost
ec9223b49a drm/xe: Drop bo->size
bo->size is redundant because the base GEM object already has a size
field with the same value. Drop bo->size and use the base GEM object’s
size instead. While at it, introduce xe_bo_size() to abstract the BO
size.

v2:
 - Fix typo in kernel doc (Ashutosh)
 - Fix kunit (CI)
 - Fix line wrap (Checkpatch)
v3:
 - Fix sriov build (CI)
v4:
 - Fix display build (CI)

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Link: https://lore.kernel.org/r/20250625144128.2827577-1-matthew.brost@intel.com
2025-06-27 14:52:31 -07:00
Jia Yao
c038bdba98 drm/xe: Fix out-of-bounds field write in MI_STORE_DATA_IMM
According to Bspec, bits 0~9 of MI_STORE_DATA_IMM must not exceed 0x3FE.
The macro MI_SDI_NUM_QW(x) evaluates to 2 * x + 1, which means the
condition 2 * x + 1 <= 0x3FE must be satisfied. Therefore, the maximum
valid value for x is 0x1FE, not 0x1FF.

v2
 - Replace 0x1fe with macro MAX_PTE_PER_SDI (Auld, Matthew & Patelczyk, Maciej)

v3
 - Change macro MAX_PTE_PER_SDI from 0x1fe to 0x1feU (De Marchi, Lucas)

Bspec: 60246

Fixes: 9c44fd5f6e ("drm/xe: Add migrate layer functions for SVM support")
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Brian3 Nguyen <brian3.nguyen@intel.com>
Cc: Alex Zuo <alex.zuo@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Maciej Patelczyk <maciej.patelczyk@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Suggested-by: Shuicheng Lin <shuicheng.lin@intel.com>
Signed-off-by: Jia Yao <jia.yao@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Maciej Patelczyk <maciej.patelczyk@intel.com>
Link: https://lore.kernel.org/r/20250612224620.161105-1-jia.yao@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
2025-06-26 14:10:28 -07:00
Matthew Brost
270172f64b drm/xe: Update xe_ttm_access_memory to use GPU for non-visible access
Add migrate layer functions to access VRAM and update
xe_ttm_access_memory to use for non-visible access and large (more than
16k) BO access. 8G devcoreump on BMG observed 3 minute CPU copy time vs.
3s GPU copy time.

v4:
 - Fix non-page aligned accesses
 - Add support for small / unaligned access
 - Update commit message indicating migrate used for large accesses (Auld)
 - Fix warning in xe_res_cursor for non-zero offset
v5:
 - Fix 32 bit build (CI)
v6:
 - Rebase and use SVM migration copy functions
v7:
 - Fix build error (CI)
v8:
 - Remove ifdef around VRAM copy functions (CI)
 - Use break statement in dma unmmaping (Jonathan)
 - Use if/else rather than goto (Jonathan)
 - Use single return point (Jonathan)

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
Link: https://lore.kernel.org/r/20250423171725.597955-3-matthew.brost@intel.com
2025-04-24 15:51:39 -07:00
Matthew Auld
8e8e9c2663 drm/xe: unconditionally apply PINNED for pin_map()
Some users apply PINNED and some don't when using pin_map(). The pin in
pin_map() should imply PINNED so just unconditionally apply it and clean
up all users.

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Satyanarayana K V P <satyanarayana.k.v.p@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com>
Link: https://lore.kernel.org/r/20250403102440.266113-14-matthew.auld@intel.com
2025-04-04 11:41:08 +01:00
Matthew Auld
58fa61ce4a drm/xe/migrate: ignore CCS for kernel objects
For kernel BOs we don't clear the CCS state on creation, therefore we
should be careful to ignore it when copying pages. In a future patch we
opt for using the copy path here for kernel BOs, so this now needs to be
considered.

v2:
 - Drop bogus asserts (CI)

Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Satyanarayana K V P <satyanarayana.k.v.p@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com>
Link: https://lore.kernel.org/r/20250403102440.266113-12-matthew.auld@intel.com
2025-04-04 11:41:04 +01:00
Arnd Bergmann
c909225750 drm/xe: avoid plain 64-bit division
Building the xe driver for i386 results in a link time warning:

x86_64-linux-ld: drivers/gpu/drm/xe/xe_migrate.o: in function `xe_migrate_vram':
xe_migrate.c:(.text+0x1e15): undefined reference to `__udivdi3'

Avoid this by using DIV_U64_ROUND_UP() instead of DIV_ROUND_UP().  The driver
is unlikely to be used on 32=bit hardware, so the extra cost here is not
too important.

Fixes: 9c44fd5f6e ("drm/xe: Add migrate layer functions for SVM support")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://lore.kernel.org/r/20250324210612.2927194-1-arnd@kernel.org
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-03-28 14:28:24 -04:00
Matthew Brost
762b7e9536 drm/xe: Use local fence in error path of xe_migrate_clear
The intent of the error path in xe_migrate_clear is to wait on locally
generated fence and then return. The code is waiting on m->fence which
could be the local fence but this is only stable under the job mutex
leading to a possible UAF. Fix code to wait on local fence.

Fixes: dd08ebf6c3 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Cc: stable@vger.kernel.org
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://lore.kernel.org/r/20250311182915.3606291-1-matthew.brost@intel.com
2025-03-27 15:30:57 -07:00
Aradhya Bhatia
c045e03634 drm/xe/migrate: Switch from drm to dev managed actions
Change the scope of the migrate subsystem to be dev managed instead of
drm managed.

The parent pci struct &device, that the xe struct &drm_device is a part
of, gets removed when a hot unplug is triggered, which causes the
underlying iommu group to get destroyed as well.

The migrate subsystem, which handles the lifetime of the page-table tree
(pt) BO, doesn't get a chance to keep the BO back during the hot unplug,
as all the references to DRM haven't been put back.
When all the references to DRM are indeed put back later, the migrate
subsystem tries to put back the pt BO. Since the underlying iommu group
has been already destroyed, a kernel NULL ptr dereference takes place
while attempting to keep back the pt BO.

Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/3914
Suggested-by: Thomas Hellstrom <thomas.hellstrom@intel.com>
Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
Signed-off-by: Aradhya Bhatia <aradhya.bhatia@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250326151929.1495972-1-aradhya.bhatia@intel.com
Signed-off-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
2025-03-27 18:06:09 +05:30
Thomas Hellström
4c2007540f drm/xe/migrate: Allow xe_migrate_vram() also on non-pagefault capable devices
The drm_pagemap functionality does not depend on the device having
recoverable pagefaults available. So allow xe_migrate_vram() also for
such devices. Even if this will have little use in practice, it's
beneficial for testin multi-device SVM, since a memory provider could
be a non-pagefault capable gpu.

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250326080551.40201-5-thomas.hellstrom@linux.intel.com
2025-03-27 12:08:33 +01:00
Thomas Hellström
6c55404d4f drm/xe: Introduce CONFIG_DRM_XE_GPUSVM
Don't rely on CONFIG_DRM_GPUSVM because other drivers may enable it
causing us to compile in SVM support unintentionally.

Also take the opportunity to leave more code out of compilation if
!CONFIG_DRM_XE_GPUSVM and !CONFIG_DRM_XE_DEVMEM_MIRROR

v3:
- Fixes for compilation errors on 32-bit. This changes the Kconfig
  logic a bit.

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20250326080551.40201-2-thomas.hellstrom@linux.intel.com
2025-03-27 11:46:06 +01:00
Matthew Brost
9c44fd5f6e drm/xe: Add migrate layer functions for SVM support
Add functions which migrate to / from VRAM accepting a single DPA
argument (VRAM) and array of dma addresses (SRAM). Used for SVM
migrations.

v2:
 - Don't unlock job_mutex in error path of xe_migrate_vram
v3:
 - Kernel doc (Thomas)
 - Better commit message (Thomas)
 - s/dword/num_dword (Thomas)
 - Return error on to large of migration (Thomas)

Signed-off-by: Oak Zeng <oak.zeng@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250306012657.3505757-21-matthew.brost@intel.com
2025-03-06 11:35:53 -08:00