Commit Graph

18 Commits

Author SHA1 Message Date
Himal Prasad Ghimiray
9cca49021c drm/xe/xe2: Updates on XY_CTRL_SURF_COPY_BLT
- The XY_CTRL_SURF_COPY_BLT instruction operating on ccs data expects
size in pages of main memory for which CCS data should be copied.
- The bitfield representing copy size in XY_CTRL_SURF_COPY_BLT has
shifted one bit higher in the instruction.

v2:
 - Fix the num_pages for ccs size calculation.
 - Address nits (Thomas)

v3:
- Use FIELD_PREP and FIELD_FIT instead of shifts and numbers.(Matt)

Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-21 11:46:15 -05:00
Himal Prasad Ghimiray
064686272b drm/xe/xe2: Modify main memory to ccs memory ratio.
On xe2 platforms each byte of CCS data now represents 512 bytes of
main memory data.

Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-21 11:46:09 -05:00
Tejas Upadhyay
0ac3d319cb drm/xe/xe2: Add workaround 16020292621
Workaround applies to Graphics 20.04 as part of ring
submission

V4(MattR):
  - Rule for engine in oob WA not supported, add explicitly
V3(MattR):
  - Pass hwe and rename API name to hint end of ring work
  - Use existing RING_NOPID API
V2:
  - Marking this WA for 20.04 instead of 20.00

Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-21 11:45:10 -05:00
Matt Roper
0134f130e7 drm/xe: Extract MI_* instructions to their own header
Extracting the common MI_* instructions that can be used with any engine
to their own header will make it easier as we add additional engine
instructions in upcoming patches.

Also, since the majority of GPU instructions (both MI and non-MI) have
a "length" field in bits 7:0 of the instruction header, a common define
is added for that.  Instruction-specific length fields are still defined
for special case instructions that have larger/smaller length fields.

v2:
 - Use "instr" instead of "inst" as the short form of "instruction"
   everywhere.  (Lucas)
 - Include xe_reg_defs.h instead of the i915 compat header.  (Lucas)

Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20231016163449.1300701-12-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-21 11:43:00 -05:00
Matt Roper
14a1e6a4a4 drm/xe: Clarify number of dwords/qwords stored by MI_STORE_DATA_IMM
MI_STORE_DATA_IMM can store either dword values or qword values, and can
store more than one value if the instruction's length field is large
enough.  Create explicit defines to specify the number of dwords/qwords
to be stored, which will set the instruction length correctly and, if
necessary, turn on the 'store qword' bit.

While we're here, also replace an open-coded version of
MI_STORE_DATA_IMM with the common macros.

Bspec: 60246
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20231016163449.1300701-11-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-21 11:43:00 -05:00
Matt Roper
e12a64881e drm/xe: Separate number of registers from MI_LRI opcode
Keeping the number of registers to be loaded as a separate macro from
the instruction opcode will simplify some upcoming LRC parsing code.

Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20231016163449.1300701-10-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-21 11:43:00 -05:00
Matt Roper
de54bb81d9 drm/xe: Make MI_FLUSH_DW immediate size more explicit
Despite its name, MI_FLUSH_DW instruction can write an immediate value
of either dword size or qword size, depending on the 'length' field of
the instruction.  Since "length" excludes the first two dwords of the
instruction, a value of 2 in the length field implies a dword write and
a value of 3 implies a qword write.  Even in cases where the flush
instruction's post-sync operation is set to "no write" we're still
expected to size the overall instruction as if we were doing a dword or
qword write (i.e., a length of 1 shouldn't be used on modern platforms).

Rather than baking a size of "1" into the #define and then adding
another unexplained "+ 1" at all the spots where the definition gets
used, lets just create MI_FLUSH_IMM_DW and MI_FLUSH_IMM_QW definitions
that should be OR'd into the instruction header to make it more explicit
what behavior we're requesting.

Bspec: 60229
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20231016163449.1300701-9-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-21 11:43:00 -05:00
Haridhar Kalvala
30603b5b0f drm/xe/xe2: Update MOCS fields in blitter instructions
Xe2 changes or adds bits for mocs in a few BLT instructions:
XY_CTRL_SURF_COPY_BLT, XY_FAST_COLOR_BLT, XY_FAST_COPY_BLT, and MEM_SET.
Modify the code to deal with the new location. Unlike Xe1, the MOCS
field in those instructions is only the MOCS index and not the
Structure_MEMORY_OBJECT_CONTROL_STATE anymore. The pxp bit is now
explicitly documented separately.

Bspec: 57567,57566,57565,57562
Cc: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Haridhar Kalvala <haridhar.kalvala@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://lore.kernel.org/r/20230929213640.3189912-5-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-21 11:42:08 -05:00
Haridhar Kalvala
4bdd8c2ed9 drm/xe/xe2: Set tile y type in XY_FAST_COPY_BLT to Tile4
Set bits 30 and 31 of XY_FAST_COPY_BLT's dword1 for XeHP and above.

Destination or source being Y-Major is selected on dword0 and there's
nothing to set on dword1. According to the bspec for Xe2,
"Behavior is undefined when programmed the value 0". Also for XeHP,
the only value allowed in those bits is 0b11, not being possible to
select "Legacy Tile-Y" anymore, only the newer Tile4.

So, unconditionally set those bits for graphics IP 12.50 and above.

v2: Reword commit message and extend it to graphics version >= 12.50
    (Matt Roper)

Bspec: 57567
Cc: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Haridhar Kalvala <haridhar.kalvala@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://lore.kernel.org/r/20230929213640.3189912-4-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-21 11:42:04 -05:00
Haridhar Kalvala
c690f0e6b7 drm/xe: Rename MEM_SET instruction
PVC_MS_* doesn't reflect the real name of the instruction. Rename
it to follow the name used in the bspec.

Cc: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Haridhar Kalvala <haridhar.kalvala@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://lore.kernel.org/r/20230929213640.3189912-3-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-21 11:42:04 -05:00
Haridhar Kalvala
2c0ac321d9 drm/xe: Adjust mocs field mask definitions
Instead of using xe_mocs_index_to_value(), simply define the bitmask
with the shift left applied. This will make it easier to adapt to new
platforms that simply use the index.

This also fixes PVC bug in emit_clear_link_copy() where the MOCS was
getting shifted both by PVC_MS_MOCS_INDEX_MASK definition and by the
xe_moc_index_to_value function.

Bspec: 44509
Cc: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Haridhar Kalvala <haridhar.kalvala@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://lore.kernel.org/r/20230929213640.3189912-2-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-21 11:42:03 -05:00
Thomas Hellström
9f8f93bee3 drm/xe: Emit a render cache flush after each rcs/ccs batch
We need to flush render caches before fence signalling, where we might
release the memory for reuse. We can't rely on userspace doing this,
so flush render caches after the batch, but before user fence- and
dma_fence signalling.

Copy the cache flush from i915, but omit PIPE_CONTROL_FLUSH_L3, since it
should be implied by the other flushes. Also omit
PIPE_CONTROL_TLB_INVALIDATE since there should be no apparent need to
invalidate TLB after batch completion.

v2:
- Update Makefile for OOB WA.

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Tested-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com> #1
Reported-by: José Roberto de Souza <jose.souza@intel.com>
Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/291
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/291
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-19 18:35:21 -05:00
Thomas Hellström
85dbfe47d0 drm/xe: Invalidate TLB also on bind if in scratch page mode
For scratch table mode we need to cover the case where a scratch PTE might
have been pre-fetched and cached and used instead of that of the newly
bound vma.
For compute vms, invalidate TLB globally using GuC before signalling
bind complete. For !long-running vms, invalidate TLB at batch start.

Also document how TLB invalidation works.

v2:
- Fix a pointer to the comment about TLB invalidation (Jose Souza).
- Add a bool to the vm whether we want to invalidate TLB at batch start.
- Invalidate TLB also on BCS- and video engines at batch start where
  needed.
- Use BIT() macro instead of explicit shift.

Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Tested-by: José Roberto de Souza <jose.souza@intel.com> #v1
Reported-by: José Roberto de Souza <jose.souza@intel.com> #v1
Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/291
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/291
Acked-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-19 18:35:20 -05:00
Lucas De Marchi
d9b79ad275 drm/xe: Drop gen afixes from registers
The defines for the registers were brought over from i915 while
bootstrapping the driver. As xe supports TGL and later only, it doesn't
make sense to keep the GEN* prefixes and suffixes in the registers: TGL
is graphics version 12, previously called "GEN12". So drop the prefix
everywhere.

v2:
  - Also drop _TGL suffix and reword commit message as suggested
    by Matt Roper. While at it, rename VSUNIT_CLKGATE_DIS_TGL to
    VSUNIT_CLKGATE2_DIS with the additional "2", so it doesn't clash
    with the define for the other register

Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://lore.kernel.org/r/20230427223256.1432787-3-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-19 18:32:15 -05:00
Lucas De Marchi
56492dacee drm/xe: Rename instruction field to avoid confusion
There was both BLT_DEPTH_32 and XY_FAST_COLOR_BLT_DEPTH_32 - also add
the prefix to the first to make it clear this is about the FAST_**COPY**
operation. While at it, remove the GEN9_ prefix.

Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-19 18:31:46 -05:00
Matthew Brost
4f1411e2da drm/xe: Reinstate render / compute cache invalidation in ring ops
Render / compute engines have additional caches (not just TLBs) that
need to be invalidated each batch, reinstate these invalidations in ring
ops.

Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Suggested-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-19 18:30:21 -05:00
Balasubramani Vivekanandan
11a2407ed5 drm/xe: Stop accepting value in xe_migrate_clear
Although xe_migrate_clear() has a value argument, currently the driver
is only passing 0 at all the places this function is invoked with the
exception the kunit tests are using the parameter to validate this
function with different values.
xe_migrate_clear() is failing on platforms with link copy engines
because xe_migrate_clear() via emit_clear() is using the blitter
instruction XY_FAST_COLOR_BLT to clear the memory. But this instruction
is not supported by link copy engine.
So the solution is to use the alternate instruction MEM_SET when
platform contains link copy engine. But MEM_SET instruction accepts only
8-bit value for setting whereas the value agrument of xe_migrate_clear()
is 32-bit.
So instead of spreading this limitation around all invocations of
xe_migrate_clear() and causing more confusion, it was decided to not
accept any value itself as driver does not really need this currently.

All the kunit tests are adapted as per the new function prototype.

This will be followed by a patch to add support for link copy engines.

Signed-off-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-19 18:30:20 -05:00
Lucas De Marchi
63955b3bfa drm/xe: Remove dependency on intel_gpu_commands.h
Copy the macros used by xe in intel_gpu_commands.h to
regs/xe_gpu_commands.h. PIPE_CONTROL_3D_ENGINE_FLAGS and
PIPE_CONTROL_3D_ARCH_FLAGS were already defined in
drivers/gpu/drm/xe/xe_ring_ops.c and only used there. So let that define
to be used instead of also adding to the new header.

v2: Let PIPE_CONTROL_3D_ENGINE_FLAGS/PIPE_CONTROL_3D_ARCH_FLAGS in the
    only .c that uses it instead of redefining (Matt Roper)

Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2023-12-19 18:29:21 -05:00