linux

mirror of https://github.com/torvalds/linux.git synced 2026-04-19 15:24:02 -04:00

Author	SHA1	Message	Date
Himal Prasad Ghimiray	c0e2508cb1	drm/xe/xe2: Use XE_CACHE_WB pat index The pat table entry associated with XE_CACHE_WB is coherent whereas XE_CACHE_NONE is non coherent. Migration expects the coherency with cpu therefore use the coherent entry XE_CACHE_WB for buffers not supporting compression. For read/write to flat ccs region the issue is not related to coherency with cpu. The hardware expects the pat index associated with GPUVA for indirect access to be compression enabled hence use XE_CACHE_NONE_COMPRESSION. v2 - Fix the argument to emit_pte, pass the bool directly. (Thomas) v3 - Rebase - Update commit message (Matt) v4 - Add a Fixes: tag. (Thomas) Cc: Matt Roper <matthew.d.roper@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Fixes: `65ef8dbad1` ("drm/xe/xe2: Update emit_pte to use compression enabled PAT index") Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240119041826.1670496-1-himal.prasad.ghimiray@intel.com (cherry picked from commit `6a02867560`) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>	2024-01-24 11:13:33 +01:00
Thomas Hellström	7425c43c26	drm/xe/migrate: Fix CCS copy for small VRAM copy chunks Since the migrate code is using the identity map for addressing VRAM, copy chunks may become as small as 64K if the VRAM resource is fragmented. However, a chunk size smaller that 1MiB may lead to the next chunk's offset into the CCS metadata backup memory may not be page-aligned, and the XY_CTRL_SURF_COPY_BLT command can't handle that, and even if it could, the current code doesn't handle the offset calculaton correctly. To fix this, make sure we align the size of VRAM copy chunks to 1MiB. If the remaining data to copy is smaller than that, that's not a problem, so use the remaining size. If the VRAM copy cunk becomes fragmented due to the size alignment restriction, don't use the identity map, but instead emit PTEs into the page-table like we do for system memory. v2: - Rebase v3: - Future proof somewhat by taking into account the real data size to flat CCS metadata size ratio. (Matt Roper) - Invert a couple of if-statements for better readability. - Fix support for 4K-granularity VRAM sizes. (Tested on DG1). v4: - Fix up code comments - Fix debug printout format typo. v5: - Add a Fixes: tag. Cc: Matt Roper <matthew.d.roper@intel.com> Cc: Matthew Auld <matthew.william.auld@gmail.com> Cc: Matthew Brost <matthew.brost@intel.com> Fixes: `e89b384cde` ("drm/xe/migrate: Update emit_pte to cope with a size level than 4k") Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240110163415.524165-1-thomas.hellstrom@linux.intel.com (cherry picked from commit `ef51d7542d`) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>	2024-01-15 15:37:00 +01:00
Brian Welty	19c0222524	drm/xe: Fix modifying exec_queue priority in xe_migrate_init After exec_queue has been created, we cannot simply modify q->priority. This needs to be done by the backend via q->ops. However in this case, it would be more efficient to simply pass a flag when creating the exec_queue and set the desired priority upfront during queue creation. To that end: new flag EXEC_QUEUE_FLAG_HIGH_PRIORITY is introduced. The priority field is moved to be with other scheduling properties and is now exec_queue.sched_props.priority. This is no longer set to initial value by the backend, but is now set within __xe_exec_queue_create(). Fixes: `b4eecedc75` ("drm/xe: Fix potential deadlock handling page faults") Signed-off-by: Brian Welty <brian.welty@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> (cherry picked from commit `a8004af338`) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>	2024-01-15 15:36:50 +01:00
Himal Prasad Ghimiray	266c858852	drm/xe/xe2: Handle flat ccs move for igfx. - Clear flat ccs during user bo creation. - copy ccs meta data between flat ccs and bo during eviction and restore. - Add a bool field ccs_cleared in bo, true means ccs region of bo is already cleared. v2: - Rebase. v3: - Maintain order of xe_bo_move_notify for ttm_bo_type_sg. v4: - xe_migrate_copy can be used to copy src to dst bo on igfx too. Add a bool which handles only ccs metadata copy. v5: - on dgfx ccs should be cleared even if the bo is not compression enabled. Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2023-12-21 11:46:15 -05:00
Himal Prasad Ghimiray	65ef8dbad1	drm/xe/xe2: Update emit_pte to use compression enabled PAT index For indirect accessed buffer use compression enabled PAT index. v2: - Fix parameter name. v3: - use a relevant define instead of fix number. Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2023-12-21 11:46:15 -05:00
Himal Prasad Ghimiray	0942752679	drm/xe/xe2: Update chunk size for each iteration of ccs copy In xe2 platform XY_CTRL_SURF_COPY_BLT can handle ccs copy for max of 1024 main surface pages. v2: - Use better logic to determine chunk size (Matt/Thomas) v3: - use function instead of macro(Thomas) Cc: Matt Roper <matthew.d.roper@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2023-12-21 11:46:15 -05:00
Himal Prasad Ghimiray	9116eabb6d	drm/xe/xe_migrate: Use NULL 1G PTE mapped at 255GiB VA for ccs clear Get rid of the cleared bo, instead use null 1G PTE mapped at 255GiB offset, this can be used for both dgfx and igfx. v2: - Remove xe_migrate::cleared_bo. - Add a comment for NULL mapping.(Thomas) Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2023-12-21 11:46:15 -05:00
Himal Prasad Ghimiray	9cca49021c	drm/xe/xe2: Updates on XY_CTRL_SURF_COPY_BLT - The XY_CTRL_SURF_COPY_BLT instruction operating on ccs data expects size in pages of main memory for which CCS data should be copied. - The bitfield representing copy size in XY_CTRL_SURF_COPY_BLT has shifted one bit higher in the instruction. v2: - Fix the num_pages for ccs size calculation. - Address nits (Thomas) v3: - Use FIELD_PREP and FIELD_FIT instead of shifts and numbers.(Matt) Cc: Matt Roper <matthew.d.roper@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2023-12-21 11:46:15 -05:00
Matthew Brost	eb9702ad29	drm/xe: Allow num_batch_buffer / num_binds == 0 in IOCTLs The idea being out-syncs can signal indicating all previous operations on the bind queue are complete. An example use case of this would be support for implementing vkQueueWaitIdle easily. All in-syncs are waited on before signaling out-syncs. This is implemented by forming a composite software fence of in-syncs and installing this fence in the out-syncs and exec queue last fence slot. The last fence must be added as a dependency for jobs on user exec queues as it is possible for the last fence to be a composite software fence (unordered, ioctl with zero bb or binds) rather than hardware fence (ordered, previous job on queue). Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2023-12-21 11:46:09 -05:00
Lucas De Marchi	5a92da34dd	drm/xe: Rename info.supports_* to info.has_* Rename supports_mmio_ext and supports_usm to use a has_ prefix so the flags are grouped together. This settles on just one variant for positive info matching ("has_") and one for negative ("skip_"). Also make sure the has_* flags are grouped together in xe_pci.c. Reviewed-by: Koby Elbaz <kelbaz@habana.ai> Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com> Link: https://lore.kernel.org/r/20231205145235.2114761-1-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2023-12-21 11:45:27 -05:00
Brian Welty	a682b6a42d	drm/xe: Support device page faults on integrated platforms Update xe_migrate_prepare_vm() to use the usm batch buffer even for servicing device page faults on integrated platforms. And as we have no VRAM on integrated platforms, device pagefault handler should not attempt to migrate into VRAM. LNL is first integrated platform to support device pagefaults. Signed-off-by: Brian Welty <brian.welty@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2023-12-21 11:45:10 -05:00
Thomas Hellström	a21fe5ee59	drm/xe/bo: Rename xe_bo_get_sg() to xe_bo_sg() Using "get" typically refers to obtaining a refcount, which we don't do here so rename to xe_bo_sg(). Suggested-by: Ohad Sharabi <osharabi@habana.ai> Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/946 Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Ohad Sharabi<osharabi@habana.ai> Link: https://patchwork.freedesktop.org/patch/msgid/20231122110359.4087-3-thomas.hellstrom@linux.intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2023-12-21 11:44:57 -05:00
Matthew Auld	a667cf56db	drm/xe/bo: consider dma-resv fences for clear job There could be active fences already in the dma-resv for the object prior to clearing. Make sure to input them as dependencies for the clear job. v2 (Matt B): - We can use USAGE_KERNEL here, since it's only the move fences we care about here. Also add a comment. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2023-12-21 11:43:31 -05:00
Matthew Auld	4202dd9fc4	drm/xe/migrate: fix MI_ARB_ON_OFF usage Spec says: "This is a privileged command; it will not be effective (will be converted to a no-op) if executed from within a non-privileged batch buffer." However here it looks like we are just emitting it inside some bb which was jumped to via the ppGTT, which should be considered a non-privileged address space. It looks like we just need some way of preventing things like the emit_pte() and later copy/clear being preempted in-between so rather just emit directly in the ring for migration jobs. Bspec: 45716 Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2023-12-21 11:43:31 -05:00
Matt Roper	0134f130e7	drm/xe: Extract MI_* instructions to their own header Extracting the common MI_* instructions that can be used with any engine to their own header will make it easier as we add additional engine instructions in upcoming patches. Also, since the majority of GPU instructions (both MI and non-MI) have a "length" field in bits 7:0 of the instruction header, a common define is added for that. Instruction-specific length fields are still defined for special case instructions that have larger/smaller length fields. v2: - Use "instr" instead of "inst" as the short form of "instruction" everywhere. (Lucas) - Include xe_reg_defs.h instead of the i915 compat header. (Lucas) Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20231016163449.1300701-12-matthew.d.roper@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2023-12-21 11:43:00 -05:00
Matt Roper	14a1e6a4a4	drm/xe: Clarify number of dwords/qwords stored by MI_STORE_DATA_IMM MI_STORE_DATA_IMM can store either dword values or qword values, and can store more than one value if the instruction's length field is large enough. Create explicit defines to specify the number of dwords/qwords to be stored, which will set the instruction length correctly and, if necessary, turn on the 'store qword' bit. While we're here, also replace an open-coded version of MI_STORE_DATA_IMM with the common macros. Bspec: 60246 Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20231016163449.1300701-11-matthew.d.roper@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2023-12-21 11:43:00 -05:00
Matthew Auld	e814389ff1	drm/xe: directly use pat_index for pte_encode In a future patch userspace will be able to directly set the pat_index as part of vm_bind. To support this we need to get away from using xe_cache_level in the low level routines and rather just use the pat_index directly. v2: Rebase v3: Some missed conversions, also prefer tile_to_xe() (Niranjana) v4: remove leftover const (Lucas) Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> Cc: Pallavi Mishra <pallavi.mishra@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: José Roberto de Souza <jose.souza@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Pallavi Mishra <pallavi.mishra@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2023-12-21 11:42:58 -05:00
David Kershner	d9e85dd5c2	drm/xe/xe_migrate.c: Use DPA offset for page table entries. Device Physical Address (DPA) is the starting offset device memory. Update xe_migrate identity map base PTE entries to start at dpa_base instead of 0. The VM offset value should be 0 relative instead of DPA relative. Reviewed-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> Reviewed-by: "Michael J. Ruhl" <michael.j.ruhl@intel.com> Signed-off-by: David Kershner <david.kershner@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2023-12-21 11:42:56 -05:00
Haridhar Kalvala	30603b5b0f	drm/xe/xe2: Update MOCS fields in blitter instructions Xe2 changes or adds bits for mocs in a few BLT instructions: XY_CTRL_SURF_COPY_BLT, XY_FAST_COLOR_BLT, XY_FAST_COPY_BLT, and MEM_SET. Modify the code to deal with the new location. Unlike Xe1, the MOCS field in those instructions is only the MOCS index and not the Structure_MEMORY_OBJECT_CONTROL_STATE anymore. The pxp bit is now explicitly documented separately. Bspec: 57567,57566,57565,57562 Cc: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Haridhar Kalvala <haridhar.kalvala@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://lore.kernel.org/r/20230929213640.3189912-5-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2023-12-21 11:42:08 -05:00
Haridhar Kalvala	4bdd8c2ed9	drm/xe/xe2: Set tile y type in XY_FAST_COPY_BLT to Tile4 Set bits 30 and 31 of XY_FAST_COPY_BLT's dword1 for XeHP and above. Destination or source being Y-Major is selected on dword0 and there's nothing to set on dword1. According to the bspec for Xe2, "Behavior is undefined when programmed the value 0". Also for XeHP, the only value allowed in those bits is 0b11, not being possible to select "Legacy Tile-Y" anymore, only the newer Tile4. So, unconditionally set those bits for graphics IP 12.50 and above. v2: Reword commit message and extend it to graphics version >= 12.50 (Matt Roper) Bspec: 57567 Cc: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Haridhar Kalvala <haridhar.kalvala@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://lore.kernel.org/r/20230929213640.3189912-4-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2023-12-21 11:42:04 -05:00
Haridhar Kalvala	c690f0e6b7	drm/xe: Rename MEM_SET instruction PVC_MS_* doesn't reflect the real name of the instruction. Rename it to follow the name used in the bspec. Cc: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Haridhar Kalvala <haridhar.kalvala@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://lore.kernel.org/r/20230929213640.3189912-3-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2023-12-21 11:42:04 -05:00
Haridhar Kalvala	2c0ac321d9	drm/xe: Adjust mocs field mask definitions Instead of using xe_mocs_index_to_value(), simply define the bitmask with the shift left applied. This will make it easier to adapt to new platforms that simply use the index. This also fixes PVC bug in emit_clear_link_copy() where the MOCS was getting shifted both by PVC_MS_MOCS_INDEX_MASK definition and by the xe_moc_index_to_value function. Bspec: 44509 Cc: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Haridhar Kalvala <haridhar.kalvala@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://lore.kernel.org/r/20230929213640.3189912-2-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2023-12-21 11:42:03 -05:00
Lucas De Marchi	fcd75139cd	drm/xe: Use pat_index to encode pde/pte Change the xelp_pte_encode() and xelp_pde_encode() functions to use the platform-dependent pat_index. The same function can be used for all platforms as they only need to encode the pat_index bits in the same pte/pde layout. For platforms that don't have the most significant bit, as long as they don't return a bogus index they should be fine. v2: Use the same logic to encode pde as it's compatible with previous logic, it's more future proof and also fixes the cache setting for PVC (Matt Roper) Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://lore.kernel.org/r/20230927193902.2849159-10-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2023-12-21 11:41:20 -05:00
Lucas De Marchi	23c8495efe	drm/xe/migrate: Do not hand-encode pte Instead of encoding the pte, call a new vfunc from xe_vm to handle that. The encoding may not be the same on every platform, so keeping it in one place helps to better support them. Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://lore.kernel.org/r/20230927193902.2849159-5-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2023-12-21 11:41:20 -05:00
Lucas De Marchi	0e5e77bd97	drm/xe: Use vfunc for pte/pde ppgtt encoding Move the function to encode pte/pde to be vfuncs inside struct xe_vm. This will allow to easily extend to platforms that don't have a compatible encoding. v2: Fix kunit build Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://lore.kernel.org/r/20230927193902.2849159-4-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2023-12-21 11:41:19 -05:00
Matt Roper	1951dad534	drm/xe: Infer service copy functionality from engine list On platforms with multiple BCS engines (i.e., PVC and Xe2), not all BCS engines are created equal. The BCS0 engine is what the specs refer to as a "resource copy engine," which supports the platform's full set of copy/fill instructions. In contast, the non-BCS0 "service copy" engines are more streamlined and only support a subset of the GPU instructions supported by the resource copy engine. Platforms with both types of copy engines always support the MEM_COPY and MEM_SET instructions which can be used for simple copy and fill operations on either type of BCS engine. Since the simple MEM_SET instruction meets the needs of Xe's migrate code (and since the more elaborate XY_FAST_COLOR_BLT instruction isn't available to use on service copy engines), we always prefer to use MEM_SET for clearing buffers on our newer platforms. We've been using a 'has_link_copy_engine' feature flag to keep track of which platforms should use MEM_SET for fills. However a feature flag like this is unnecessary since we can already derive the presence of service copy engines (and in turn the MEM_SET instruction) just by looking at the platform's pre-fusing engine list. Utilizing the engine list for this also avoids mistakes like we've made on Xe2 where we forget to set the feature flag in the IP definition. For clarity, "service copy" is a general term that covers any blitter engines that support a limited subset of the overall blitter instruction set (in practice this is any non-BCS0 blitter engine). The "link copy engines" introduced on PVC and the "paging copy engine" present in Xe2 are both instances of service copy engines. v2: - Rewrite / expand the commit message. (Bala) - Fix checkpatch whitespace error. Bspec: 65019 Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com> Reviewed-by: Haridhar Kalvala <haridhar.kalvala@intel.com> Link: https://lore.kernel.org/r/20230927205143.2695089-2-matthew.d.roper@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2023-12-21 11:41:16 -05:00
Francois Dugast	c73acc1eeb	drm/xe: Use Xe assert macros instead of XE_WARN_ON macro The XE_WARN_ON macro maps to WARN_ON which is not justified in many cases where only a simple debug check is needed. Replace the use of the XE_WARN_ON macro with the new xe_assert macros which relies on drm_*. This takes a struct drm_device argument, which is one of the main changes in this commit. The other main change is that the condition is reversed, as with XE_WARN_ON a message is displayed if the condition is true, whereas with xe_assert it is if the condition is false. v2: - Rebase - Keep WARN splats in xe_wopcm.c (Matt Roper) v3: - Rebase Signed-off-by: Francois Dugast <francois.dugast@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2023-12-21 11:41:08 -05:00
Thomas Hellström	d00e9cc28e	drm/xe/vm: Simplify and document xe_vm_lock() The xe_vm_lock() function was unnecessarily using ttm_eu_reserve_buffers(). Simplify and document the interface. v4: - Improve on xe_vm_lock() documentation (Matthew Brost) v5: - Rebase conflict. Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230908091716.36984-3-thomas.hellstrom@linux.intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2023-12-21 11:41:07 -05:00
Niranjana Vishwanathapura	a043fbab7a	drm/xe/pvc: Use fast copy engines as migrate engine on PVC Some copy hardware engine instances are faster than others on PVC. Use a virtual engine of these plus the reserved instance for the migrate engine on PVC. The idea being if a fast instance is available it will be used and the throughput of kernel copies, clears, and pagefault servicing will be higher. v2: Use OOB WA, use all copy engines if no WA is required Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2023-12-21 11:40:28 -05:00
Daniele Ceraolo Spurio	923e423817	drm/xe: split kernel vs permanent engine flags If an engine is only destroyed on driver unload, we can skip its clean-up steps with the GuC because the GuC is going to be tuned off as well, so it doesn't matter if we're in sync with it or not. Currently, we apply this optimization to all engines marked as kernel, but this stops us to supporting kernel engines that don't stick around until unload. To remove this limitation, add a separate flag to indicate if the engine is expected to only be destryed on driver unload and use that to trigger the optimzation. While at it, add a small comment to explain what each engine flag represents. v2: s/XE_BUG_ON/XE_WARN_ON, s/ENGINE/EXEC_QUEUE v3: rebased Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20230822173334.1664332-3-daniele.ceraolospurio@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2023-12-21 11:40:27 -05:00
Oak Zeng	0887a2e7ab	drm/xe: Make xe_mem_region struct Make a xe_mem_region structure which will be used in the coming patches. The new structure is used in both xe device level (xe->mem.vram) and xe_tile level (tile->vram). Make the definition of xe_mem_region.dpa_base to be the DPA base of this memory region and change codes according to this new definition. v1: - rename xe_mem_region.base to dpa_base per conversation with Mike Ruhl Signed-off-by: Oak Zeng <oak.zeng@intel.com> Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2023-12-21 11:40:19 -05:00
Francois Dugast	9b9529ce37	drm/xe: Rename engine to exec_queue Engine was inappropriately used to refer to execution queues and it also created some confusion with hardware engines. Where it applies the exec_queue variable name is changed to q and comments are also updated. Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/162 Signed-off-by: Francois Dugast <francois.dugast@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2023-12-21 11:39:20 -05:00
Francois Dugast	c22a4ed0c3	drm/xe: Rename xe_engine.[ch] to xe_exec_queue.[ch] This is a preparation commit for a larger renaming of engine to exec queue. Signed-off-by: Francois Dugast <francois.dugast@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2023-12-21 11:39:17 -05:00
Francois Dugast	99fea68288	drm/xe: Prefer WARN() over BUG() to avoid crashing the kernel Replace calls to XE_BUG_ON() with calls XE_WARN_ON() which in turn calls WARN() instead of BUG(). BUG() crashes the kernel and should only be used when it is absolutely unavoidable in case of catastrophic and unrecoverable failures, which is not the case here. Signed-off-by: Francois Dugast <francois.dugast@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2023-12-21 11:39:17 -05:00
Lucas De Marchi	b23ebae7ab	drm/xe: Set PTE_DM bit for stolen on MTL Integrated graphics 1270 and beyond should set the PTE_LM bit in the PTE when it's stolen memory. Add a new function, xe_bo_is_stolen_devmem(), and use it when encoding the PTE. In some places in the spec the PTE bit is called "Local Memory", abbreviated as LM, and in others it's called "Device Memory" (DM). Since we moved away from "Local Memory" and preferred the "vram" terminology, also rename the macros as DM to follow the name of the new function. Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://lore.kernel.org/r/20230726160708.3967790-7-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2023-12-21 11:39:05 -05:00
Lucas De Marchi	937b4be72b	drm/xe: Decouple vram check from xe_bo_addr() The output arg is_vram in xe_bo_addr() is unused by several callers. It's also not what the function is mainly doing. Remove the argument and let the interested callers to call xe_bo_is_vram(). Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://lore.kernel.org/r/20230726160708.3967790-6-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2023-12-21 11:39:05 -05:00
Lucas De Marchi	621c1fbd9b	drm/xe: Remove vma arg from xe_pte_encode() All the callers pass a NULL vma, so the buffer is always the BO. Remove the argument and the side effects of dealing with it. Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20230726160708.3967790-5-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2023-12-21 11:39:04 -05:00
Matthew Brost	fd84041d09	drm/xe: Make bind engines safe We currently have a race between bind engines which can result in corrupted page tables leading to faults. A simple example: bind A 0x0000-0x1000, engine A, has unsatisfied in-fence bind B 0x1000-0x2000, engine B, no in-fences exec A uses 0x1000-0x2000 Bind B will pass bind A and exec A will fault. This occurs as bind A programs the root of the page table in a bind job which is held up by an in-fence. Bind B in this case just programs a leaf entry of the structure. To fix use range-fence utility to track cross bind engine conflicts. In the above example bind A would insert an dependency into the range-fence tree with a key of 0x0-0x7fffffffff, bind B would find that dependency and its bind job would scheduled behind the unsatisfied in-fence and bind A's job. Reviewed-by: Maarten Lankhorst<maarten.lankhorst@linux.intel.com> Co-developed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2023-12-21 11:37:52 -05:00
Francois Dugast	72e8d73b71	drm/xe: Cleanup style warnings and errors Fix 6 errors and 20 warnings reported by checkpatch.pl. Signed-off-by: Francois Dugast <francois.dugast@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2023-12-21 11:37:52 -05:00
Lucas De Marchi	0d39b6daa5	drm/xe: Normalize XE_VM_FLAG* names Rename XE_VM_FLAGS_64K to XE_VM_FLAG_64K to follow the other names and s/GT/TILE/ that got missed in commit `08dea76745` ("drm/xe: Move migration from GT to tile"). Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://lore.kernel.org/r/20230718193924.3084759-1-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2023-12-21 11:37:37 -05:00
Francois Dugast	3e8e7ee6a3	drm/xe: Cleanup style warnings Reduce the number of warnings reported by checkpatch.pl from 118 to 48 by addressing those warnings types: LEADING_SPACE LINE_SPACING BRACES TRAILING_SEMICOLON CONSTANT_COMPARISON BLOCK_COMMENT_STYLE RETURN_VOID ONE_SEMICOLON SUSPECT_CODE_INDENT LINE_CONTINUATIONS UNNECESSARY_ELSE UNSPECIFIED_INT UNNECESSARY_INT MISORDERED_TYPE Signed-off-by: Francois Dugast <francois.dugast@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2023-12-21 11:37:31 -05:00
Matthew Brost	b06d47be7c	drm/xe: Port Xe to GPUVA Rather than open coding VM binds and VMA tracking, use the GPUVA library. GPUVA provides a common infrastructure for VM binds to use mmap / munmap semantics and support for VK sparse bindings. The concepts are: 1) xe_vm inherits from drm_gpuva_manager 2) xe_vma inherits from drm_gpuva 3) xe_vma_op inherits from drm_gpuva_op 4) VM bind operations (MAP, UNMAP, PREFETCH, UNMAP_ALL) call into the GPUVA code to generate an VMA operations list which is parsed, committed, and executed. v2 (CI): Add break after default in case statement. v3: Rebase v4: Fix some error handling v5: Use unlocked version VMA in error paths v6: Rebase, address some review feedback mainly Thomas H v7: Fix compile error in xe_vma_op_unwind, address checkpatch Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2023-12-21 11:35:18 -05:00
Thomas Hellström	b747411964	drm/xe: Make page-table updates using the default engine happen in order If the default engine m->eng was used, there is no check for idle and a cpu page-table update may thus happen in parallel with a gpu one. Don't allow CPU page-table updates with the default engine until the engine is idle. Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230629205134.111849-2-thomas.hellstrom@linux.intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2023-12-21 11:35:07 -05:00
Lucas De Marchi	a0ea91db61	drm/xe: Rename pte/pde encoding functions Remove the leftover TODO by renameing the functions to use xe prefix. Since the static __gen8_pte_encode() already has a double score, just remove the prefix. Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20230611222447.2837573-1-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2023-12-21 11:34:14 -05:00
Matt Roper	f6929e80cd	drm/xe: Allocate GT dynamically In preparation for re-adding media GT support, switch the primary GT within the tile to a dynamic allocation. Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20230601215244.678611-19-matthew.d.roper@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2023-12-19 18:34:15 -05:00
Matt Roper	08dea76745	drm/xe: Move migration from GT to tile Migration primarily focuses on the memory associated with a tile, so it makes more sense to track this at the tile level (especially since the driver was already skipping migration operations on media GTs). Note that the blitter engine used to perform the migration always lives in the tile's primary GT today. In theory that could change if media GTs ever start including blitter engines in the future, but we can extend the design if/when that happens in the future. v2: - Fix kunit test build - Kerneldoc parameter name update v3: - Removed leftover prototype for removed function. (Gustavo) - Remove unrelated / unwanted error handling change. (Gustavo) Cc: Gustavo Sousa <gustavo.sousa@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Acked-by: Gustavo Sousa <gustavo.sousa@intel.com> Link: https://lore.kernel.org/r/20230601215244.678611-15-matthew.d.roper@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2023-12-19 18:34:15 -05:00
Matt Roper	876611c2b7	drm/xe: Memory allocations are tile-based, not GT-based Since memory and address spaces are a tile concept rather than a GT concept, we need to plumb tile-based handling through lots of memory-related code. Note that one remaining shortcoming here that will need to be addressed before media GT support can be re-enabled is that although the address space is shared between a tile's GTs, each GT caches the PTEs independently in their own TLB and thus TLB invalidation should be handled at the GT level. v2: - Fix kunit test build. Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20230601215244.678611-13-matthew.d.roper@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2023-12-19 18:34:14 -05:00
Michael J. Ruhl	fb31517cd7	drm/xe: Rename GPU offset helper to reflect true usage The _io_offset helper function is returning an offset into the GPU address space. Using the CPU address offset (io_) is not correct. Rename to reflect usage. Update to use GPU offset information. Update PT dma_offset to use the helper Reviewed-by: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2023-12-19 18:34:10 -05:00
Matthew Auld	a2f9f4ff07	drm/xe/migrate: retain CCS aux state for vram -> vram There is no mention that migrate_copy() will skip copying the CCS aux state for all types of vram -> vram transfers. Currently we don't need such a facility but might be surprising if we ever do. v2: (Lucas): - s/lmem/vram/ in the commit message - Tidy up the code a bit; use one emit_copy_ccs() v3: - Reword the commit message Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Acked-by: Nirmoy Das <nirmoy.das@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2023-12-19 18:34:04 -05:00
Thomas Hellström	3690a01ba9	drm/xe: Support copying of data between system memory bos Modify the xe_migrate_copy() function somewhat to explicitly allow copying of data between two buffer objects including system memory buffer objects. Update the migrate test accordingly. v2: - Check that buffer object sizes match when copying (Matthew Auld) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2023-12-19 18:34:04 -05:00

1 2

66 Commits