The GSC notifies us of a proxy request via the HECI2 interrupt. The
interrupt must be enabled both in the HECI layer and in our usual gt irq
programming; for the latter, the interrupt is enabled via the same enable
register as the GSC CS, but it does have its own mask register. When the
interrupt is received, we also need to de-assert it in both layers.
The handling of the proxy request is deferred to the same worker that we
use for GSC load. New flags have been added to distinguish between the
init case and the proxy interrupt.
v2: rename irq define, fix include ordering (Alan)
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Alan Previn <alan.previn.teres.alexis@intel.com>
Cc: Suraj Kandpal <suraj.kandpal@intel.com>
Reviewed-by: Alan Previn <alan.previn.teres.alexis@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240117182621.2653049-3-daniele.ceraolospurio@intel.com
The GSC uC needs to communicate with the CSME to perform certain
operations. Since the GSC can't perform this communication directly on
platforms where it is integrated in GT, the graphics driver needs to
transfer the messages from GSC to CSME and back. The proxy flow must be
manually started after the GSC is loaded to signal to GSC that we're
ready to handle its messages and allow it to query its init data from
CSME.
Note that the component must be removed before the pci_remove call
completes, so we can't use a drmm helper for it and we need to instead
perform the cleanup as part of the removal flow.
v2: add function documentation, more targeted memory clear, clearer logs
and variable names (Alan)
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Alan Previn <alan.previn.teres.alexis@intel.com>
Cc: Suraj Kandpal <suraj.kandpal@intel.com>
Reviewed-by: Alan Previn <alan.previn.teres.alexis@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240117182621.2653049-2-daniele.ceraolospurio@intel.com
The GuC handles the WA, the KMD just needs to set the flag to enable
it on the appropriate platforms.
v2:
- Fixed CI checkpatch warning, alignment should match open parenthesis.
- Fixed GUC FW version check to use XE_UC_FW_VER_RELEASE which points to
current GUC FW version instead of XE_UC_FW_VER_COMPATIBILITY which
holds GUC FW I/F version (Badal).
v3:
- Removed extra character in debug print.
Signed-off-by: Karthik Poosa <karthik.poosa@intel.com>
Reviewed-by: Badal Nilawar <badal.nilawar@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240117055035.2417711-1-karthik.poosa@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Creating one module for each compilation unit to be tested seems
excessive as the number of tests increase. Group them all in a single
kunit test module called xe_test.ko.
The tests requiring the physical device, aka "live" tests, are still
kept in separate modules since they are normally triggered via igt,
and not via kunit.py. After igt is converted, those can be merged in
a single module as well.
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Acked-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231122203147.988021-2-lucas.demarchi@intel.com
Requesting all memory regions on PVC will fill bo->placements up to
XE_BO_MAX_PLACEMENTS. The subsequent call to try_add_stolen() will trip
over the bounds checking even though XE_PL_STOLEN is not expected to
be used in this case.
This is hit with igt@xe_exec_fault_mode@once-basic-prefetch:
xe 0000:8c:00.0: [drm] Assertion `*c < (sizeof(bo->placements) / sizeof((bo->placements)[0]) + ((int)(sizeof(struct { int:(-!!(__builtin_types_compatible_p(typeof((bo->placements)), typeof(&(bo->placements)[0])))); }))))` failed!
WARNING: CPU: 30 PID: 6161 at drivers/gpu/drm/xe/xe_bo.c:203 __xe_bo_placement_for_flags+0x218/0x240 [xe]
Is fixed here by moving the bounds checks closer to where we actually
write into the bo->placement array.
Fixes: 8c54ee8a86 ("drm/xe: Ensure that we don't access the placements array out-of-bounds")
Link: https://patchwork.freedesktop.org/patch/msgid/20240111002111.10190-1-brian.welty@intel.com
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Brian Welty <brian.welty@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
(cherry picked from commit 52e3fa3e3e)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Since the migrate code is using the identity map for addressing VRAM,
copy chunks may become as small as 64K if the VRAM resource is fragmented.
However, a chunk size smaller that 1MiB may lead to the *next* chunk's
offset into the CCS metadata backup memory may not be page-aligned, and
the XY_CTRL_SURF_COPY_BLT command can't handle that, and even if it could,
the current code doesn't handle the offset calculaton correctly.
To fix this, make sure we align the size of VRAM copy chunks to 1MiB. If
the remaining data to copy is smaller than that, that's not a problem,
so use the remaining size. If the VRAM copy cunk becomes fragmented due
to the size alignment restriction, don't use the identity map, but instead
emit PTEs into the page-table like we do for system memory.
v2:
- Rebase
v3:
- Future proof somewhat by taking into account the real data size to
flat CCS metadata size ratio. (Matt Roper)
- Invert a couple of if-statements for better readability.
- Fix support for 4K-granularity VRAM sizes. (Tested on DG1).
v4:
- Fix up code comments
- Fix debug printout format typo.
v5:
- Add a Fixes: tag.
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Fixes: e89b384cde ("drm/xe/migrate: Update emit_pte to cope with a size level than 4k")
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240110163415.524165-1-thomas.hellstrom@linux.intel.com
(cherry picked from commit ef51d7542d)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
This error path should clean up before returning.
Smatch detected this bug:
drivers/gpu/drm/xe/xe_device.c:487 xe_device_probe() warn: missing unwind goto?
Fixes: 4cb12b7192 ("drm/xe/xe2: Determine bios enablement for flat ccs on igfx")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
(cherry picked from commit c10da95afa)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Building drivers/gpu/drm/xe/xe_gt_pagefault.c with GCC 11 results
in the following build errors:
./include/linux/fortify-string.h:57:33: error: writing 16 bytes into a region of size 0 [-Werror=stringop-overflow=]
57 | #define __underlying_memcpy __builtin_memcpy
| ^
./include/linux/fortify-string.h:644:9: note: in expansion of macro ‘__underlying_memcpy’
644 | __underlying_##op(p, q, __fortify_size); \
| ^~~~~~~~~~~~~
./include/linux/fortify-string.h:689:26: note: in expansion of macro ‘__fortify_memcpy_chk’
689 | #define memcpy(p, q, s) __fortify_memcpy_chk(p, q, s, \
| ^~~~~~~~~~~~~~~~~~~~
drivers/gpu/drm/xe/xe_gt_pagefault.c:340:17: note: in expansion of macro ‘memcpy’
340 | memcpy(pf_queue->data + pf_queue->tail, msg, len * sizeof(u32));
| ^~~~~~
In file included from drivers/gpu/drm/xe/xe_device_types.h:17,
from drivers/gpu/drm/xe/xe_vm_types.h:16,
from drivers/gpu/drm/xe/xe_bo.h:13,
from drivers/gpu/drm/xe/xe_gt_pagefault.c:16:
drivers/gpu/drm/xe/xe_gt_types.h:102:25: note: at offset [1144, 265324] into destination object ‘tile’ of size 8
102 | struct xe_tile *tile;
| ^~~~
Fix these by removing -Wstringop-overflow from drm/xe builds.
Closes: https://lore.kernel.org/all/45ad1d0f-a10f-483e-848a-76a30252edbe@paulmck-laptop/
Fixes: 7a8bc11782 ("drm/xe: Enable W=1 warnings by default")
Suggested-by: Stephen Rothwell <sfr@rothwell.id.au>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
[ This particular warning is broken on GCC11. In future changes it will
be moved to the normal C flags in the top level Makefile (out of
Makefile.extrawarn), but accounting for the compiler support. Just
remove it out of xe's forced extra warnings for now ]
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
(cherry picked from commit a109d19992)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
After exec_queue has been created, we cannot simply modify q->priority.
This needs to be done by the backend via q->ops. However in this case,
it would be more efficient to simply pass a flag when creating the
exec_queue and set the desired priority upfront during queue creation.
To that end: new flag EXEC_QUEUE_FLAG_HIGH_PRIORITY is introduced.
The priority field is moved to be with other scheduling properties and
is now exec_queue.sched_props.priority. This is no longer set to initial
value by the backend, but is now set within __xe_exec_queue_create().
Fixes: b4eecedc75 ("drm/xe: Fix potential deadlock handling page faults")
Signed-off-by: Brian Welty <brian.welty@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
(cherry picked from commit a8004af338)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
We need to set q->priority prior to calling guc_exec_queue_add_msg() as
that will call init_policies() and sets the scheduling properties to those
stored in the exec_queue.
Fixes: dd08ebf6c3 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Signed-off-by: Brian Welty <brian.welty@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
(cherry picked from commit b16483f9f8)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
The intent is to return -EWOULDBLOCK to the user if a long running exec
queue is full during the exec IOCTL. -EWOULDBLOCK aliases to -EAGAIN
which results in the exec IOCTL doing a retry loop. Fix this by ensuring
the retry loop is broken when returning -EWOULDBLOCK.
Fixes: 8ae8a2e8dd ("drm/xe: Long running job update")
Reported-by: Sai Gowtham Ch <sai.gowtham.ch@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Brian Welty <brian.welty@intel.com>
(cherry picked from commit 97d0047cbb)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
i915 defines it as unsigned long so Xe should do the same to avoid
compilation warnings:
CC [M] drivers/gpu/drm/i915/i915_gem.o
CC [M] drivers/gpu/drm/xe/i915-display/intel_display_power_well.o
In file included from ./include/drm/drm_mm.h:51,
from drivers/gpu/drm/xe/xe_bo_types.h:11,
from drivers/gpu/drm/xe/xe_bo.h:11,
from ./drivers/gpu/drm/xe/compat-i915-headers/gem/i915_gem_object.h:11,
from ./drivers/gpu/drm/xe/compat-i915-headers/i915_drv.h:15,
from drivers/gpu/drm/i915/display/intel_display_power.c:8:
drivers/gpu/drm/i915/display/intel_display_power.c: In function ‘print_async_put_domains_state’:
drivers/gpu/drm/i915/display/intel_display_power.c:408:29: warning: format ‘%lu’ expects argument of type ‘long unsigned int’, but argument 5 has type ‘int’ [-Wformat=]
408 | drm_dbg(&i915->drm, "async_put_wakeref %lu\n",
| ^~~~~~~~~~~~~~~~~~~~~~~~~
409 | power_domains->async_put_wakeref);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
| |
| int
./include/drm/drm_print.h:410:39: note: in definition of macro ‘drm_dev_dbg’
410 | __drm_dev_dbg(NULL, dev, cat, fmt, ##__VA_ARGS__)
| ^~~
./include/drm/drm_print.h:510:33: note: in expansion of macro ‘drm_dbg_driver’
510 | #define drm_dbg(drm, fmt, ...) drm_dbg_driver(drm, fmt, ##__VA_ARGS__)
| ^~~~~~~~~~~~~~
drivers/gpu/drm/i915/display/intel_display_power.c:408:9: note: in expansion of macro ‘drm_dbg’
408 | drm_dbg(&i915->drm, "async_put_wakeref %lu\n",
| ^~~~~~~
drivers/gpu/drm/i915/display/intel_display_power.c:408:50: note: format string is defined here
408 | drm_dbg(&i915->drm, "async_put_wakeref %lu\n",
| ~~^
| |
| long unsigned int
| %u
CC [M] drivers/gpu/drm/i915/i915_gem_evict.o
CC [M] drivers/gpu/drm/i915/i915_gem_gtt.o
CC [M] drivers/gpu/drm/xe/i915-display/intel_display_trace.o
CC [M] drivers/gpu/drm/xe/i915-display/intel_display_wa.o
CC [M] drivers/gpu/drm/i915/i915_query.o
Fixes: 44e694958b ("drm/xe/display: Implement display support")
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
(cherry picked from commit fdbadf5043)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Requesting all memory regions on PVC will fill bo->placements up to
XE_BO_MAX_PLACEMENTS. The subsequent call to try_add_stolen() will trip
over the bounds checking even though XE_PL_STOLEN is not expected to
be used in this case.
This is hit with igt@xe_exec_fault_mode@once-basic-prefetch:
xe 0000:8c:00.0: [drm] Assertion `*c < (sizeof(bo->placements) / sizeof((bo->placements)[0]) + ((int)(sizeof(struct { int:(-!!(__builtin_types_compatible_p(typeof((bo->placements)), typeof(&(bo->placements)[0])))); }))))` failed!
WARNING: CPU: 30 PID: 6161 at drivers/gpu/drm/xe/xe_bo.c:203 __xe_bo_placement_for_flags+0x218/0x240 [xe]
Is fixed here by moving the bounds checks closer to where we actually
write into the bo->placement array.
Fixes: 8c54ee8a86 ("drm/xe: Ensure that we don't access the placements array out-of-bounds")
Link: https://patchwork.freedesktop.org/patch/msgid/20240111002111.10190-1-brian.welty@intel.com
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Brian Welty <brian.welty@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Although MI_STORE_DATA_IMM's "length" field is 10-bits, 0x3FE is
considered the largest legal value accepted. Since that instruction
field is always encoded in (val-2) format, this translates to 0x400
dwords for the true maximum length of the instruction. Subtracting the
instruction header (1 dword) and address (2 dwords), that leaves 0x3FD
dwords (i.e., 0x1FE qwords) for PTE values.
Bspec: 60246, 45753
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20240111220238.1467572-2-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Soon we will be required to exclude some of the GGTT addresses
from the allocations, since on some platforms running the SR-IOV VF
mode, we will be able to use only selected range of the GGTT space.
Add helper functions to manage such GGTT range exclusions, and
follow the naming from the similar concept used by GVT-g.
Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com>
Link: https://lore.kernel.org/r/20240111182559.629-1-michal.wajdeczko@intel.com
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Not all CTB responses from the GuC are fixed size and we need to
pass response length to the caller, if there was a response_buffer.
Easiest solution is to return it as positive value from all
xe_guc_ct_send_recv() functions. The CTB response length is always
between 1 and 254 (ie. GUC_HXG_MSG_MIN_LEN and GUC_CTB_MAX_DWORDS
- GUC_HXG_MSG_MIN_LEN).
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20240111152724.497-1-michal.wajdeczko@intel.com
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Since the migrate code is using the identity map for addressing VRAM,
copy chunks may become as small as 64K if the VRAM resource is fragmented.
However, a chunk size smaller that 1MiB may lead to the *next* chunk's
offset into the CCS metadata backup memory may not be page-aligned, and
the XY_CTRL_SURF_COPY_BLT command can't handle that, and even if it could,
the current code doesn't handle the offset calculaton correctly.
To fix this, make sure we align the size of VRAM copy chunks to 1MiB. If
the remaining data to copy is smaller than that, that's not a problem,
so use the remaining size. If the VRAM copy cunk becomes fragmented due
to the size alignment restriction, don't use the identity map, but instead
emit PTEs into the page-table like we do for system memory.
v2:
- Rebase
v3:
- Future proof somewhat by taking into account the real data size to
flat CCS metadata size ratio. (Matt Roper)
- Invert a couple of if-statements for better readability.
- Fix support for 4K-granularity VRAM sizes. (Tested on DG1).
v4:
- Fix up code comments
- Fix debug printout format typo.
v5:
- Add a Fixes: tag.
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Fixes: e89b384cde ("drm/xe/migrate: Update emit_pte to cope with a size level than 4k")
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240110163415.524165-1-thomas.hellstrom@linux.intel.com
This error path should clean up before returning.
Smatch detected this bug:
drivers/gpu/drm/xe/xe_device.c:487 xe_device_probe() warn: missing unwind goto?
Fixes: 4cb12b7192 ("drm/xe/xe2: Determine bios enablement for flat ccs on igfx")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Convention for queues in Linux is the producer moves the head and
consumer moves the tail. Fix the access counter queue to conform to
this convention.
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
If ACC_QUEUE_NUM_DW % ACC_MSG_LEN_DW != 0 then the access counter queue
logic does not work when wrapping occurs. Add a build bug on to assert
ACC_QUEUE_NUM_DW % ACC_MSG_LEN_DW == 0 to enforce this restriction and
document the code.
v2:
- s/NUM_ACC_QUEUE/ACC_QUEUE_NUM_DW (Brian)
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Convention for queues in Linux is the producer moves the head and
consumer moves the tail. Fix the page fault queue to conform to this
convention.
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
If PF_QUEUE_NUM_DW % PF_MSG_LEN_DW != 0 then the page fault queue logic
does not work when wrapping occurs. Add a build bug on to assert
PF_QUEUE_NUM_DW % PF_MSG_LEN_DW == 0 to enforce this restriction and
document the code.
v2:
- s/NUM_PF_QUEUE/PF_QUEUE_NUM_DW (Brian)
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Setting of exec_queue user extensions is moved from the end of the ioctl
function earlier, into __xe_exec_queue_alloc().
This fixes bug in that the USM attributes for access counters were being
applied too late, and effectively were ignored.
However, in order to apply user extensions this early, we can no longer
call q->ops functions. Instead, make it more efficient. The user extension
functions can simply update the q->sched_props values and they will be
applied by the backend during q->ops->init().
v2: minor changes for readability (Matt)
Signed-off-by: Brian Welty <brian.welty@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>