Allow userspace to specify the CPU caching mode at object creation.
Modify gem create handler and introduce xe_bo_create_user to replace
xe_bo_create. In a later patch we will support setting the pat_index as
part of vm_bind, where expectation is that the coherency mode extracted
from the pat_index must be least 1way coherent if using cpu_caching=wb.
v2
- s/smem_caching/smem_cpu_caching/ and
s/XE_GEM_CACHING/XE_GEM_CPU_CACHING/. (Matt Roper)
- Drop COH_2WAY and just use COH_NONE + COH_AT_LEAST_1WAY; KMD mostly
just cares that zeroing/swap-in can't be bypassed with the given
smem_caching mode. (Matt Roper)
- Fix broken range check for coh_mode and smem_cpu_caching and also
don't use constant value, but the already defined macros. (José)
- Prefer switch statement for smem_cpu_caching -> ttm_caching. (José)
- Add note in kernel-doc for dgpu and coherency modes for system
memory. (José)
v3 (José):
- Make sure to reject coh_mode == 0 for VRAM-only.
- Also make sure to actually pass along the (start, end) for
__xe_bo_create_locked.
v4
- Drop UC caching mode. Can be added back if we need it. (Matt Roper)
- s/smem_cpu_caching/cpu_caching. Idea is that VRAM is always WC, but
that is currently implicit and KMD controlled. Make it explicit in
the uapi with the limitation that it currently must be WC. For VRAM
+ SYS objects userspace must now select WC. (José)
- Make sure to initialize bo_flags. (José)
v5
- Make to align with the other uapi and prefix uapi constants with
DRM_ (José)
v6:
- Make it clear that zero cpu_caching is only allowed for kernel
objects. (José)
v7: (Oak)
- With all the changes from the original design, it looks we can
further simplify here and drop the explicit coh_mode. We can just
infer the coh_mode from the cpu_caching. i.e reject cpu_caching=wb +
coh_none. It's one less thing for userspace to maintain so seems
worth it.
v8:
- Make sure to also update the kselftests.
Testcase: igt@xe_mmap@cpu-caching
Signed-off-by: Pallavi Mishra <pallavi.mishra@intel.com>
Co-developed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: José Roberto de Souza <jose.souza@intel.com>
Cc: Filip Hazubski <filip.hazubski@intel.com>
Cc: Carl Zhang <carl.zhang@intel.com>
Cc: Effie Yu <effie.yu@intel.com>
Cc: Zhengguo Xu <zhengguo.xu@intel.com>
Cc: Francois Dugast <francois.dugast@intel.com>
Cc: Oak Zeng <oak.zeng@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Acked-by: Zhengguo Xu <zhengguo.xu@intel.com>
Acked-by: Bartosz Dunajski <bartosz.dunajski@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
The concern here is that we may have platforms with dedicated media GT,
and we anyway allocate the object on the tile, which just means running
the same test twice (i.e primary vs media GT).
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Nirmoy Das <nirmoy.das@intel.com>
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
We need to sanitize and reset each GT, since xe_bo_evict_all() will
evict everything regardless of GT, which can leave other GTs in a broken
state.
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Nirmoy Das <nirmoy.das@intel.com>
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
It looks like bulk_move is set during object construction, but is only
removed on object close, however in various places we might not yet have
an actual fd to close, like on the error paths for the gem_create ioctl,
and also one internal user for the evict_test_run_gt() selftest. Try to
handle those cases by manually resetting the bulk_move. This should
prevent triggering:
WARNING: CPU: 7 PID: 8252 at drivers/gpu/drm/ttm/ttm_bo.c:327
ttm_bo_release+0x25e/0x2a0 [ttm]
v2 (Nirmoy):
- It should be safe to just unconditionally call
__xe_bo_unset_bulk_move() in most places.
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Test seems to be failing badly after calling xe_bo_restore_kernel().
Taking a snapshot of the CTB and copying back a potentially old version
seems risky, depending on what might have been inflight. Also it seems
snapshotting the ADS object and copying back results in serious
breakage. Normally when calling xe_bo_restore_kernel() we always fully
restart the GT, which re-intializes such things. We could potentially
skip saving and restoring such objects in xe_bo_evict_all() however
seems quite fragile not to also restart the GT. Try to do that here by
triggering a GT reset.
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Acked-by: Nirmoy Das <nirmoy.das@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
The GPU job will keep the device awake, however assumption here is that
caller of xe_migrate_clear() is also holding mem_access.ref otherwise we
hit the asserts in xe_sa_bo_flush_write() prior to the job construction.
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
We are calling fairly low level things like xe_bo_restore_kernel() which
expect caller to be holding mem_access.ref. Since we are doing stuff
like evict_all we likely don't want to race with rpm suspend, since that
potentially wants to do the same thing, so just wrap the whole test.
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Nirmoy Das <nirmoy.das@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Migration primarily focuses on the memory associated with a tile, so it
makes more sense to track this at the tile level (especially since the
driver was already skipping migration operations on media GTs).
Note that the blitter engine used to perform the migration always lives
in the tile's primary GT today. In theory that could change if media
GTs ever start including blitter engines in the future, but we can
extend the design if/when that happens in the future.
v2:
- Fix kunit test build
- Kerneldoc parameter name update
v3:
- Removed leftover prototype for removed function. (Gustavo)
- Remove unrelated / unwanted error handling change. (Gustavo)
Cc: Gustavo Sousa <gustavo.sousa@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Acked-by: Gustavo Sousa <gustavo.sousa@intel.com>
Link: https://lore.kernel.org/r/20230601215244.678611-15-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Since memory and address spaces are a tile concept rather than a GT
concept, we need to plumb tile-based handling through lots of
memory-related code.
Note that one remaining shortcoming here that will need to be addressed
before media GT support can be re-enabled is that although the address
space is shared between a tile's GTs, each GT caches the PTEs
independently in their own TLB and thus TLB invalidation should be
handled at the GT level.
v2:
- Fix kunit test build.
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20230601215244.678611-13-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Create a new xe_tile structure to begin separating the concept of "tile"
from "GT." A tile is effectively a complete GPU, and a GT is just one
part of that. On platforms like MTL, there's only a single full GPU
(tile) which has its IP blocks provided by two GTs. In contrast, a
"multi-tile" platform like PVC is basically multiple complete GPUs
packed behind a single PCI device.
For now, just create xe_tile as a simple wrapper around xe_gt. The
items in xe_gt that are truly tied to the tile rather than the GT will
be moved in future patches. Support for multiple GTs per tile (i.e.,
the MTL standalone media case) will also be re-introduced in a future
patch.
v2:
- Fix kunit test build
- Move hunk from next patch to use local tile variable rather than
direct xe->tiles[id] accesses. (Lucas)
- Mention compute in kerneldoc. (Rodrigo)
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20230601215244.678611-3-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Instead of simply using EXPORT_SYMBOL() to export the functions needed
in xe.ko to be be called across modules, use EXPORT_SYMBOL_IF_KUNIT()
which will export the symbol under the EXPORTED_FOR_KUNIT_TESTING
namespace.
This avoids accidentally "leaking" these functions and letting them be
called from outside the kunit tests. If these functiosn are accidentally
called from another module, they receive a modpost error like below:
ERROR: modpost: module XXXXXXX uses symbol
xe_ccs_migrate_kunit from namespace EXPORTED_FOR_KUNIT_TESTING,
but does not import it.
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Link: https://lore.kernel.org/r/20230401085151.1786204-4-lucas.demarchi@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Although xe_migrate_clear() has a value argument, currently the driver
is only passing 0 at all the places this function is invoked with the
exception the kunit tests are using the parameter to validate this
function with different values.
xe_migrate_clear() is failing on platforms with link copy engines
because xe_migrate_clear() via emit_clear() is using the blitter
instruction XY_FAST_COLOR_BLT to clear the memory. But this instruction
is not supported by link copy engine.
So the solution is to use the alternate instruction MEM_SET when
platform contains link copy engine. But MEM_SET instruction accepts only
8-bit value for setting whereas the value agrument of xe_migrate_clear()
is 32-bit.
So instead of spreading this limitation around all invocations of
xe_migrate_clear() and causing more confusion, it was decided to not
accept any value itself as driver does not really need this currently.
All the kunit tests are adapted as per the new function prototype.
This will be followed by a patch to add support for link copy engines.
Signed-off-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
In order to avoid -Werror=missing-prototypes, add the prototypes
in a separate tests/<test-name>_test.h file that is included by both
the implementation (tests/xe_<testname>.c, injected in xe.ko) and the
kunit module (tests/xe_<testname>_test.c -> xe-<testname>-test.ko).
v2: Add header and don't add ifdef to files that are already not built
when not using kunit (Matt Auld)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Xe, is a new driver for Intel GPUs that supports both integrated and
discrete platforms starting with Tiger Lake (first Intel Xe Architecture).
The code is at a stage where it is already functional and has experimental
support for multiple platforms starting from Tiger Lake, with initial
support implemented in Mesa (for Iris and Anv, our OpenGL and Vulkan
drivers), as well as in NEO (for OpenCL and Level0).
The new Xe driver leverages a lot from i915.
As for display, the intent is to share the display code with the i915
driver so that there is maximum reuse there. But it is not added
in this patch.
This initial work is a collaboration of many people and unfortunately
the big squashed patch won't fully honor the proper credits. But let's
get some git quick stats so we can at least try to preserve some of the
credits:
Co-developed-by: Matthew Brost <matthew.brost@intel.com>
Co-developed-by: Matthew Auld <matthew.auld@intel.com>
Co-developed-by: Matt Roper <matthew.d.roper@intel.com>
Co-developed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Co-developed-by: Francois Dugast <francois.dugast@intel.com>
Co-developed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Co-developed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Co-developed-by: Philippe Lecluse <philippe.lecluse@intel.com>
Co-developed-by: Nirmoy Das <nirmoy.das@intel.com>
Co-developed-by: Jani Nikula <jani.nikula@intel.com>
Co-developed-by: José Roberto de Souza <jose.souza@intel.com>
Co-developed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Co-developed-by: Dave Airlie <airlied@redhat.com>
Co-developed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Co-developed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Co-developed-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>