linux

mirror of https://github.com/torvalds/linux.git synced 2026-04-22 00:33:58 -04:00

Author	SHA1	Message	Date
Matthew Brost	270172f64b	drm/xe: Update xe_ttm_access_memory to use GPU for non-visible access Add migrate layer functions to access VRAM and update xe_ttm_access_memory to use for non-visible access and large (more than 16k) BO access. 8G devcoreump on BMG observed 3 minute CPU copy time vs. 3s GPU copy time. v4: - Fix non-page aligned accesses - Add support for small / unaligned access - Update commit message indicating migrate used for large accesses (Auld) - Fix warning in xe_res_cursor for non-zero offset v5: - Fix 32 bit build (CI) v6: - Rebase and use SVM migration copy functions v7: - Fix build error (CI) v8: - Remove ifdef around VRAM copy functions (CI) - Use break statement in dma unmmaping (Jonathan) - Use if/else rather than goto (Jonathan) - Use single return point (Jonathan) Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Link: https://lore.kernel.org/r/20250423171725.597955-3-matthew.brost@intel.com	2025-04-24 15:51:39 -07:00
Matthew Auld	8e8e9c2663	drm/xe: unconditionally apply PINNED for pin_map() Some users apply PINNED and some don't when using pin_map(). The pin in pin_map() should imply PINNED so just unconditionally apply it and clean up all users. Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Satyanarayana K V P <satyanarayana.k.v.p@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com> Link: https://lore.kernel.org/r/20250403102440.266113-14-matthew.auld@intel.com	2025-04-04 11:41:08 +01:00
Matthew Auld	58fa61ce4a	drm/xe/migrate: ignore CCS for kernel objects For kernel BOs we don't clear the CCS state on creation, therefore we should be careful to ignore it when copying pages. In a future patch we opt for using the copy path here for kernel BOs, so this now needs to be considered. v2: - Drop bogus asserts (CI) Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Satyanarayana K V P <satyanarayana.k.v.p@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com> Link: https://lore.kernel.org/r/20250403102440.266113-12-matthew.auld@intel.com	2025-04-04 11:41:04 +01:00
Arnd Bergmann	c909225750	drm/xe: avoid plain 64-bit division Building the xe driver for i386 results in a link time warning: x86_64-linux-ld: drivers/gpu/drm/xe/xe_migrate.o: in function `xe_migrate_vram': xe_migrate.c:(.text+0x1e15): undefined reference to `__udivdi3' Avoid this by using DIV_U64_ROUND_UP() instead of DIV_ROUND_UP(). The driver is unlikely to be used on 32=bit hardware, so the extra cost here is not too important. Fixes: `9c44fd5f6e` ("drm/xe: Add migrate layer functions for SVM support") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/r/20250324210612.2927194-1-arnd@kernel.org Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2025-03-28 14:28:24 -04:00
Matthew Brost	762b7e9536	drm/xe: Use local fence in error path of xe_migrate_clear The intent of the error path in xe_migrate_clear is to wait on locally generated fence and then return. The code is waiting on m->fence which could be the local fence but this is only stable under the job mutex leading to a possible UAF. Fix code to wait on local fence. Fixes: `dd08ebf6c3` ("drm/xe: Introduce a new DRM driver for Intel GPUs") Cc: stable@vger.kernel.org Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://lore.kernel.org/r/20250311182915.3606291-1-matthew.brost@intel.com	2025-03-27 15:30:57 -07:00
Aradhya Bhatia	c045e03634	drm/xe/migrate: Switch from drm to dev managed actions Change the scope of the migrate subsystem to be dev managed instead of drm managed. The parent pci struct &device, that the xe struct &drm_device is a part of, gets removed when a hot unplug is triggered, which causes the underlying iommu group to get destroyed as well. The migrate subsystem, which handles the lifetime of the page-table tree (pt) BO, doesn't get a chance to keep the BO back during the hot unplug, as all the references to DRM haven't been put back. When all the references to DRM are indeed put back later, the migrate subsystem tries to put back the pt BO. Since the underlying iommu group has been already destroyed, a kernel NULL ptr dereference takes place while attempting to keep back the pt BO. Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/3914 Suggested-by: Thomas Hellstrom <thomas.hellstrom@intel.com> Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com> Signed-off-by: Aradhya Bhatia <aradhya.bhatia@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250326151929.1495972-1-aradhya.bhatia@intel.com Signed-off-by: Tejas Upadhyay <tejas.upadhyay@intel.com>	2025-03-27 18:06:09 +05:30
Thomas Hellström	4c2007540f	drm/xe/migrate: Allow xe_migrate_vram() also on non-pagefault capable devices The drm_pagemap functionality does not depend on the device having recoverable pagefaults available. So allow xe_migrate_vram() also for such devices. Even if this will have little use in practice, it's beneficial for testin multi-device SVM, since a memory provider could be a non-pagefault capable gpu. Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250326080551.40201-5-thomas.hellstrom@linux.intel.com	2025-03-27 12:08:33 +01:00
Thomas Hellström	6c55404d4f	drm/xe: Introduce CONFIG_DRM_XE_GPUSVM Don't rely on CONFIG_DRM_GPUSVM because other drivers may enable it causing us to compile in SVM support unintentionally. Also take the opportunity to leave more code out of compilation if !CONFIG_DRM_XE_GPUSVM and !CONFIG_DRM_XE_DEVMEM_MIRROR v3: - Fixes for compilation errors on 32-bit. This changes the Kconfig logic a bit. Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250326080551.40201-2-thomas.hellstrom@linux.intel.com	2025-03-27 11:46:06 +01:00
Matthew Brost	9c44fd5f6e	drm/xe: Add migrate layer functions for SVM support Add functions which migrate to / from VRAM accepting a single DPA argument (VRAM) and array of dma addresses (SRAM). Used for SVM migrations. v2: - Don't unlock job_mutex in error path of xe_migrate_vram v3: - Kernel doc (Thomas) - Better commit message (Thomas) - s/dword/num_dword (Thomas) - Return error on to large of migration (Thomas) Signed-off-by: Oak Zeng <oak.zeng@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250306012657.3505757-21-matthew.brost@intel.com	2025-03-06 11:35:53 -08:00
Nitin Gote	75fd04f276	drm/xe: Fix all typos in xe Fix all typos in files of xe, reported by codespell tool. Signed-off-by: Nitin Gote <nitin.r.gote@intel.com> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Reviewed-by: Stuart Summers <stuart.summers@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250106102646.1400146-2-nitin.r.gote@intel.com Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>	2025-01-09 17:58:09 +01:00
José Roberto de Souza	88fca61ba5	Revert "drm/xe: Force write completion of MI_STORE_DATA_IMM" This reverts commit `1460bb1fef`. In all places the MI_STORE_DATA_IMM are not followed by a read of the same memory address in the same batch buffer and the posted writes are flushed with PIPE_CONTROL or MI_FLUSH_DW in xe_ring_ops.c functions so there is no need to set this register. Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Ashutosh Dixit <ashutosh.dixit@intel.com> Fixes: `1460bb1fef` ("drm/xe: Force write completion of MI_STORE_DATA_IMM") Signed-off-by: José Roberto de Souza <jose.souza@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241227183230.101334-1-jose.souza@intel.com	2025-01-02 05:35:16 -08:00
José Roberto de Souza	1460bb1fef	drm/xe: Force write completion of MI_STORE_DATA_IMM With Force write completion unset there is no guarantees of when the write will be globally visible what is not the behavior wanted. Fixes: `9c57bc0865` ("drm/xe/lnl: Drop force_probe requirement") Signed-off-by: José Roberto de Souza <jose.souza@intel.com> Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241217160732.46280-1-jose.souza@intel.com	2024-12-18 10:11:14 -08:00
Matthew Auld	febc689b27	drm/xe/migrate: use XE_BO_FLAG_PAGETABLE On some HW we want to avoid the host caching PTEs, since access from GPU side can be incoherent. However here the special migrate object is mapping PTEs which are written from the host and potentially cached. Use XE_BO_FLAG_PAGETABLE to ensure that non-cached mapping is used, on platforms where this matters. Fixes: `7a060d786c` ("drm/xe/mtl: Map PPGTT as CPU:WC") Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Nirmoy Das <nirmoy.das@intel.com> Cc: <stable@vger.kernel.org> # v6.8+ Reviewed-by: Nirmoy Das <nirmoy.das@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241126181259.159713-4-matthew.auld@intel.com	2024-11-27 10:54:20 +00:00
Matthew Auld	f3dc9246f9	drm/xe/migrate: fix pat index usage XE_CACHE_WB must be converted into the per-platform pat index for that particular caching mode, otherwise we are just encoding whatever happens to be the value of that enum. Fixes: `e8babb280b` ("drm/xe: Convert multiple bind ops into single job") Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Nirmoy Das <nirmoy.das@intel.com> Cc: <stable@vger.kernel.org> # v6.12+ Reviewed-by: Nirmoy Das <nirmoy.das@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241126181259.159713-3-matthew.auld@intel.com	2024-11-27 10:54:19 +00:00
Jani Nikula	87d8ecf015	drm/xe: replace #include <drm/xe_drm.h> with <uapi/drm/xe_drm.h> include/drm/xe_drm.h does not exist. Prefer the explicit uapi include. Signed-off-by: Jani Nikula <jani.nikula@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240827091539.4136838-1-jani.nikula@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2024-08-28 15:17:54 -04:00
Matthew Brost	852856e3b6	drm/xe: Use reserved copy engine for user binds on faulting devices User binds map to engines with can fault, faults depend on user binds completion, thus we can deadlock. Avoid this by using reserved copy engine for user binds on faulting devices. While we are here, normalize bind queue creation with a helper. v2: - Pass in extensions to bind queue creation (CI) v3: - s/resevered/reserved (Lucas) - Fix NULL hwe check (Jonathan) Fixes: `dd08ebf6c3` ("drm/xe: Introduce a new DRM driver for Intel GPUs") Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240816034033.53837-1-matthew.brost@intel.com	2024-08-17 14:12:19 -07:00
Nirmoy Das	b8cdc47adf	drm/xe/migrate: Parameterize ccs and bo data clear in xe_migrate_clear() Parameterize clearing ccs and bo data in xe_migrate_clear() which higher layers can utilize. This patch will be used later on when doing bo data clear for igfx as well. v2: Replace multiple params with flags in xe_migrate_clear (Matt B) v3: s/CLEAR_BO_DATA_FLAG_/XE_MIGRATE_CLEAR_FLAG_ and move to xe_migrate.h. other nits(Matt B) Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Akshata Jahagirdar <akshata.jahagirdar@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240809220347.25330-1-nirmoy.das@intel.com Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>	2024-08-13 10:11:07 +02:00
Matt Roper	7657d7c966	drm/xe/migrate: Future-proof compressed PAT check Although all current Xe2 platforms support FlatCCS, we probably shouldn't assume that will be universally true forever. In the past we've had platforms like PVC that didn't support compression, and the same could show up again at some point in the future. Future-proof the migration code by adding an explicit check for FlatCCS support to the condition that decides whether to use a compressed PAT index for migration. While we're at it, we can drop the IS_DGFX check since it's redundant with the src_is_vram check (only dGPUs have VRAM). Cc: Akshata Jahagirdar <akshata.jahagirdar@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Akshata Jahagirdar <akshata.jahagirdar@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240726171757.2728819-2-matthew.d.roper@intel.com	2024-07-29 08:15:56 -07:00
Akshata Jahagirdar	523f191cc0	drm/xe/xe_migrate: Handle migration logic for xe2+ dgfx During eviction (vram->sysmem), we use compressed -> uncompressed mapping. During restore (sysmem->vram), we need to use mapping from uncompressed -> uncompressed. Handle logic for selecting the compressed identity map for eviction, and selecting uncompressed map for restore operations. v2: Move check of xe_migrate_ccs_emit() before calling xe_migrate_ccs_copy(). (Nirmoy) Signed-off-by: Akshata Jahagirdar <akshata.jahagirdar@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/79b3a016e686a662ae68c32b5fc7f0f2ac8043e9.1721250309.git.akshata.jahagirdar@intel.com	2024-07-17 17:02:31 -07:00
Akshata Jahagirdar	2b808d6b29	drm/xe/xe2: Introduce identity map for compressed pat for vram Xe2+ has unified compression (exactly one compression mode/format), where compression is now controlled via PAT at PTE level. This simplifies KMD operations, as it can now decompress freely without concern for the buffer's original compression format—unlike DG2, which had multiple compression formats and thus required copying the raw CCS state during VRAM eviction. In addition mixed VRAM and system memory buffers were not supported with compression enabled. On Xe2 dGPU compression is still only supported with VRAM, however we can now support compression with VRAM and system memory buffers, with GPU access being seamless underneath. So long as when doing VRAM -> system memory the KMD uses compressed -> uncompressed, to decompress it. This also allows CPU access to such buffers, assuming that userspace first decompress the corresponding pages being accessed. If the pages are already in system memory then KMD would have already decompressed them. When restoring such buffers with sysmem -> VRAM the KMD can't easily know which pages were originally compressed, so we always use uncompressed -> uncompressed here. With this it also means we can drop all the raw CCS handling on such platforms (including needing to allocate extra CCS storage). In order to support this we now need to have two different identity mappings for compressed and uncompressed VRAM. In this patch, we set up the additional identity map for the VRAM with compressed pat_index. We then select the appropriate mapping during migration/clear. During eviction (vram->sysmem), we use the mapping from compressed -> uncompressed. During restore (sysmem->vram), we need the mapping from uncompressed -> uncompressed. Therefore, we need to have two different mappings for compressed and uncompressed vram. We set up an additional identity map for the vram with compressed pat_index. We then select the appropriate mapping during migration/clear. v2: Formatting nits, Updated code to match recent changes in xe_migrate_prepare_vm(). (Matt) v3: Move identity map loop to a helper function. (Matt Brost) v4: Split helper function in different patch, and add asserts and nits. (Matt Brost) v5: Convert the 2 bool arguments of pte_update_size to flags argument (Matt Brost) v6: Formatting nits (Matt Brost) Signed-off-by: Akshata Jahagirdar <akshata.jahagirdar@intel.com> Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/b00db5c7267e54260cb6183ba24b15c1e6ae52a3.1721250309.git.akshata.jahagirdar@intel.com	2024-07-17 17:02:30 -07:00
Akshata Jahagirdar	8d79acd567	drm/xe/migrate: Add helper function to program identity map Add an helper function to program identity map. v2: Formatting nits Signed-off-by: Akshata Jahagirdar <akshata.jahagirdar@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/91dc05f05bd33076fb9a9f74f8495b48d2abff53.1721250309.git.akshata.jahagirdar@intel.com	2024-07-17 17:02:29 -07:00
Akshata Jahagirdar	108c972a11	drm/xe/migrate: Handle clear ccs logic for xe2 dgfx For Xe2 dGPU, we clear the bo by modifying the VRAM using an uncompressed pat index which then indirectly updates the compression status as uncompressed i.e zeroed CCS. So xe_migrate_clear() should be updated for BMG to not emit CCS surf copy commands. v2: Moved xe_device_needs_ccs_emit() to xe_migrate.c and changed name to xe_migrate_needs_ccs_emit() since its very specific to migration.(Matt) Signed-off-by: Akshata Jahagirdar <akshata.jahagirdar@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/8dd869dd8dda5e17ace28c04f1a48675f5540874.1721250309.git.akshata.jahagirdar@intel.com	2024-07-17 17:02:27 -07:00
Matthew Brost	e8babb280b	drm/xe: Convert multiple bind ops into single job This aligns with the uAPI of an array of binds or single bind that results in multiple GPUVA ops to be considered a single atomic operations. The design is roughly: - xe_vma_ops is a list of xe_vma_op (GPUVA op) - each xe_vma_op resolves to 0-3 PT ops - xe_vma_ops creates a single job - if at any point during binding a failure occurs, xe_vma_ops contains the information necessary unwind the PT and VMA (GPUVA) state v2: - add missing dma-resv slot reservation (CI, testing) v4: - Fix TLB invalidation (Paulo) - Add missing xe_sched_job_last_fence_add/test_dep check (Inspection) v5: - Invert i, j usage (Matthew Auld) - Add helper to test and add job dep (Matthew Auld) - Return on anything but -ETIME for cpu bind (Matthew Auld) - Return -ENOBUFS if suballoc of BB fails due to size (Matthew Auld) - s/do/Do (Matthew Auld) - Add missing comma (Matthew Auld) - Do not assign return value to xe_range_fence_insert (Matthew Auld) v6: - s/0x1ff/MAX_PTE_PER_SDI (Matthew Auld, CI) - Check to large of SA in Xe to avoid triggering WARN (Matthew Auld) - Fix checkpatch issues v7: - Rebase - Support more than 510 PTEs updates in a bind job (Paulo, mesa testing) v8: - Rebase Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240704041652.272920-5-matthew.brost@intel.com	2024-07-03 22:28:04 -07:00
Matthew Brost	67d90d679e	drm/xe: s/xe_tile_migrate_engine/xe_tile_migrate_exec_queue Engine is old nomenclature, replace with exec queue. Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240704041652.272920-2-matthew.brost@intel.com	2024-07-03 22:26:59 -07:00
Matthew Auld	ce6b63336f	drm/xe: fix error handling in xe_migrate_update_pgtables Don't call drm_suballoc_free with sa_bo pointing to PTR_ERR. References: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/2120 Fixes: `dd08ebf6c3` ("drm/xe: Introduce a new DRM driver for Intel GPUs") Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: <stable@vger.kernel.org> # v6.8+ Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240620102025.127699-2-matthew.auld@intel.com	2024-06-27 11:00:36 +01:00
Francois Dugast	de8390b101	drm/xe/sched_job: Promote xe_sched_job_add_deps() Move it out of the xe_migrate compilation unit so it can be re-used in other places. Cc: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Francois Dugast <francois.dugast@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240614094433.775866-1-francois.dugast@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2024-06-14 16:51:43 -04:00
Radhakrishna Sripada	e46d3f813a	drm/xe/trace: Extract bo, vm, vma traces xe_trace.h is starting to get over crowded. Move the traces related to bo, vm, vma's to its own file. v2: Update year in License(Gustavo) Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com> Suggested-by: Jani Nikula <jani.nikula@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Radhakrishna Sripada <radhakrishna.sripada@intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240607182943.3572524-2-radhakrishna.sripada@intel.com	2024-06-12 09:25:07 -07:00
Matthew Brost	6d3581edff	drm/xe: Don't overmap identity VRAM mapping Overmapping the identity VRAM mapping is triggering hardware bugs on certain platforms. Use 2M pages for the last unaligned (to 1G) VRAM chunk. v2: - Always use 2M pages for last chunk (Fei Yang) - break loop when 2M pages are used - Add assert for usable_size being 2M aligned v3: - Fix checkpatch Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Fei Yang <fei.yang@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Fei Yang <fei.yang@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240603181824.1927675-1-matthew.brost@intel.com	2024-06-05 11:40:09 -07:00
Thomas Hellström	50e52592fb	drm/xe: Move job creation out of the struct xe_migrate::job_mutex In order to be able to run gpu jobs from reclaim context, move job creation (where allocation takes place) out of the struct xe_migrate::job_mutex, and prime that mutex as reclaim tainted. Jobs that may need to run from reclaim context include CCS metadata extraction at shrinking time. Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240527135912.152156-6-thomas.hellstrom@linux.intel.com	2024-05-27 21:26:07 +02:00
Matthew Brost	04f4a70a18	drm/xe: Only use reserved BCS instances for usm migrate exec queue The GuC context scheduling queue is 2 entires deep, thus it is possible for a migration job to be stuck behind a fault if migration exec queue shares engines with user jobs. This can deadlock as the migrate exec queue is required to service page faults. Avoid deadlock by only using reserved BCS instances for usm migrate exec queue. Fixes: `a043fbab7a` ("drm/xe/pvc: Use fast copy engines as migrate engine on PVC") Cc: Matt Roper <matthew.d.roper@intel.com> Cc: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240415190453.696553-2-matthew.brost@intel.com Reviewed-by: Brian Welty <brian.welty@intel.com>	2024-05-14 13:12:27 -07:00
Lucas De Marchi	61549a2ee5	drm/xe: Drop __engine_mask Not really used, it's just a copy of engine_mask, which already reads the fuses to mark engines as available/not-available. Reviewed-by: Michał Winiarski <michal.winiarski@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240513213751.1017791-1-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2024-05-13 21:21:13 -07:00
Michal Wajdeczko	62010b3cd6	drm/xe: Move xe_gpu_commands.h file to instructions/ All other files with commands definitions are in instructions/ folder. Move xe_gpu_commands.h also there. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240508174856.1908-1-michal.wajdeczko@intel.com	2024-05-09 21:17:57 +02:00
José Roberto de Souza	335ad807d5	drm/xe: Remove debug message from migrate_clear() This messages is printed a lot and from my understanding it do not bring any value, so here dropping it. Signed-off-by: José Roberto de Souza <jose.souza@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240405153849.44906-1-jose.souza@intel.com	2024-04-08 07:11:02 -07:00
Michal Wajdeczko	48651e18bb	drm/xe: Move PTE/PDE bit definitions to proper header We already have dedicated header for GGTT/PPGTT definitions. It's also cleaner to separate them from implementation macros. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Acked-by: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240405123520.847-1-michal.wajdeczko@intel.com	2024-04-05 19:58:54 +02:00
Himal Prasad Ghimiray	34820967ae	drm/xe/xe_migrate: Cast to output precision before multiplying operands Addressing potential overflow in result of multiplication of two lower precision (u32) operands before widening it to higher precision (u64). -v2 Fix commit message and description. (Rodrigo) Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240401175300.3823653-1-himal.prasad.ghimiray@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2024-04-03 15:04:56 -04:00
Lucas De Marchi	62742d1266	drm/xe: Normalize bo flags macros The flags stored in the BO grew over time without following much a naming pattern. First of all, get rid of the _BIT suffix that was banned from everywhere else due to the guideline in drivers/gpu/drm/i915/i915_reg.h that xe kind of follows: Define bits using ``REG_BIT(N)``. Do not add ``_BIT`` suffix to the name. Here the flags aren't for a register, but it's good practice to keep it consistent. Second divergence on names is the use or not of "CREATE". This is because most of the flags are passed to xe_bo_create() family of functions, changing its behavior. However, since the flags are also stored in the bo itself and checked elsewhere in the code, it seems better to just omit the CREATE part. With those 2 guidelines, all the flags are given the form XE_BO_FLAG_<FLAG_NAME> with the following commands: git grep -le "XE_BO_" -- drivers/gpu/drm/xe \| xargs sed -i \ -e "s/XE_BO_\([_A-Z0-9]\)_BIT/XE_BO_\1/g" \ -e 's/XE_BO_CREATE_/XE_BO_FLAG_/g' git grep -le "XE_BO_" -- drivers/gpu/drm/xe \| xargs sed -i -r \ -e 's/XE_BO_(DEFER_BACKING\|SCANOUT\|FIXED_PLACEMENT\|PAGETABLE\|NEEDS_CPU_ACCESS\|NEEDS_UC\|INTERNAL_TEST\|INTERNAL_64K\|GGTT_INVALIDATE)/XE_BO_FLAG_\1/g' And then the defines in drivers/gpu/drm/xe/xe_bo.h are adjusted to follow the coding style. Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240322142702.186529-3-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2024-04-02 10:33:57 -07:00
Arnd Bergmann	1408784b59	drm/xe/xe2: fix 64-bit division in pte_update_size This function does not build on 32-bit targets when the compiler fails to reduce DIV_ROUND_UP() into a shift: ld.lld: error: undefined symbol: __aeabi_uldivmod >>> referenced by xe_migrate.c >>> drivers/gpu/drm/xe/xe_migrate.o:(pte_update_size) in archive vmlinux.a There are two instances in this function. Change the first to use an open-coded shift with the same behavior, and the second one to a 32-bit calculation, which is sufficient here as the size is never more than 2^32 pages (16TB). Fixes: `237412e453` ("drm/xe: Enable 32bits build") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240226124736.1272949-3-arnd@kernel.org Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2024-02-28 11:38:12 -08:00
Dafna Hirschfeld	a24d909977	drm/xe: Do not include current dir for generated/xe_wa_oob.h The generated file 'generated/xe_wa_oob.h' is included using: "generated/xe_wa_oob.h" which first look inside the source code. But the file resides in the build directory and should therefore be included using: <generated/xe_wa_oob.h> Signed-off-by: Dafna Hirschfeld <dhirschfeld@habana.ai> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240221083622.1584492-1-dhirschfeld@habana.ai	2024-02-21 21:53:15 -08:00
Matthew Brost	72f86ed3c8	drm/xe: Map both mem.kernel_bb_pool and usm.bb_pool For integrated devices we need to map both mem.kernel_bb_pool and usm.bb_pool to be able to run batches from both pools. Fixes: `a682b6a42d` ("drm/xe: Support device page faults on integrated platforms") Tested-by: Brian Welty <brian.welty@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Brian Welty <brian.welty@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240202033440.2351862-1-matthew.brost@intel.com	2024-02-02 17:37:58 -08:00
Matthew Brost	a856b67a84	drm/xe: Take a reference in xe_exec_queue_last_fence_get() Take a reference in xe_exec_queue_last_fence_get(). Also fix a reference counting underflow bug VM bind and unbind. Fixes: `dd08ebf6c3` ("drm/xe: Introduce a new DRM driver for Intel GPUs") Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240201004849.2219558-2-matthew.brost@intel.com	2024-02-02 06:45:57 -08:00
Fei Yang	348769d1cb	drm/xe: correct the assertion for number of PTEs While one MI_STORE_DATA_IMM can take no more than 0x1fe qwords, the size of the pgtable can be 512 entries. Fixes: `43d48379c9` ("drm/xe: correct the calculation of remaining size") Cc: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Fei Yang <fei.yang@intel.com> Tested-by: José Roberto de Souza <jose.souza@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240125065245.1204731-2-fei.yang@intel.com	2024-01-26 18:29:05 -08:00
Himal Prasad Ghimiray	6a02867560	drm/xe/xe2: Use XE_CACHE_WB pat index The pat table entry associated with XE_CACHE_WB is coherent whereas XE_CACHE_NONE is non coherent. Migration expects the coherency with cpu therefore use the coherent entry XE_CACHE_WB for buffers not supporting compression. For read/write to flat ccs region the issue is not related to coherency with cpu. The hardware expects the pat index associated with GPUVA for indirect access to be compression enabled hence use XE_CACHE_NONE_COMPRESSION. v2 - Fix the argument to emit_pte, pass the bool directly. (Thomas) v3 - Rebase - Update commit message (Matt) v4 - Add a Fixes: tag. (Thomas) Cc: Matt Roper <matthew.d.roper@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Fixes: `65ef8dbad1` ("drm/xe/xe2: Update emit_pte to use compression enabled PAT index") Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240119041826.1670496-1-himal.prasad.ghimiray@intel.com	2024-01-22 12:36:36 +01:00
Fei Yang	43d48379c9	drm/xe: correct the calculation of remaining size In function write_pgtable, the calculation of chunk in the do-while loop is wrong, we should always compare against remaining size instead of the total size update->qwords. Signed-off-by: Fei Yang <fei.yang@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240116223709.652585-2-fei.yang@intel.com	2024-01-18 14:43:44 -08:00
Matt Roper	ca630876aa	drm/xe/migrate: Cap PTEs written by MI_STORE_DATA_IMM to 510 Although MI_STORE_DATA_IMM's "length" field is 10-bits, 0x3FE is considered the largest legal value accepted. Since that instruction field is always encoded in (val-2) format, this translates to 0x400 dwords for the true maximum length of the instruction. Subtracting the instruction header (1 dword) and address (2 dwords), that leaves 0x3FD dwords (i.e., 0x1FE qwords) for PTE values. Bspec: 60246, 45753 Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20240111220238.1467572-2-matthew.d.roper@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com>	2024-01-12 09:09:06 -08:00
Thomas Hellström	ef51d7542d	drm/xe/migrate: Fix CCS copy for small VRAM copy chunks Since the migrate code is using the identity map for addressing VRAM, copy chunks may become as small as 64K if the VRAM resource is fragmented. However, a chunk size smaller that 1MiB may lead to the next chunk's offset into the CCS metadata backup memory may not be page-aligned, and the XY_CTRL_SURF_COPY_BLT command can't handle that, and even if it could, the current code doesn't handle the offset calculaton correctly. To fix this, make sure we align the size of VRAM copy chunks to 1MiB. If the remaining data to copy is smaller than that, that's not a problem, so use the remaining size. If the VRAM copy cunk becomes fragmented due to the size alignment restriction, don't use the identity map, but instead emit PTEs into the page-table like we do for system memory. v2: - Rebase v3: - Future proof somewhat by taking into account the real data size to flat CCS metadata size ratio. (Matt Roper) - Invert a couple of if-statements for better readability. - Fix support for 4K-granularity VRAM sizes. (Tested on DG1). v4: - Fix up code comments - Fix debug printout format typo. v5: - Add a Fixes: tag. Cc: Matt Roper <matthew.d.roper@intel.com> Cc: Matthew Auld <matthew.william.auld@gmail.com> Cc: Matthew Brost <matthew.brost@intel.com> Fixes: `e89b384cde` ("drm/xe/migrate: Update emit_pte to cope with a size level than 4k") Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240110163415.524165-1-thomas.hellstrom@linux.intel.com	2024-01-11 10:00:25 +01:00
Brian Welty	25ce7c5063	drm/xe: Finish refactoring of exec_queue_create Setting of exec_queue user extensions is moved from the end of the ioctl function earlier, into __xe_exec_queue_alloc(). This fixes bug in that the USM attributes for access counters were being applied too late, and effectively were ignored. However, in order to apply user extensions this early, we can no longer call q->ops functions. Instead, make it more efficient. The user extension functions can simply update the q->sched_props values and they will be applied by the backend during q->ops->init(). v2: minor changes for readability (Matt) Signed-off-by: Brian Welty <brian.welty@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com>	2024-01-10 15:01:53 -08:00
Brian Welty	a8004af338	drm/xe: Fix modifying exec_queue priority in xe_migrate_init After exec_queue has been created, we cannot simply modify q->priority. This needs to be done by the backend via q->ops. However in this case, it would be more efficient to simply pass a flag when creating the exec_queue and set the desired priority upfront during queue creation. To that end: new flag EXEC_QUEUE_FLAG_HIGH_PRIORITY is introduced. The priority field is moved to be with other scheduling properties and is now exec_queue.sched_props.priority. This is no longer set to initial value by the backend, but is now set within __xe_exec_queue_create(). Fixes: `b4eecedc75` ("drm/xe: Fix potential deadlock handling page faults") Signed-off-by: Brian Welty <brian.welty@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com>	2024-01-09 14:11:58 -08:00
Himal Prasad Ghimiray	266c858852	drm/xe/xe2: Handle flat ccs move for igfx. - Clear flat ccs during user bo creation. - copy ccs meta data between flat ccs and bo during eviction and restore. - Add a bool field ccs_cleared in bo, true means ccs region of bo is already cleared. v2: - Rebase. v3: - Maintain order of xe_bo_move_notify for ttm_bo_type_sg. v4: - xe_migrate_copy can be used to copy src to dst bo on igfx too. Add a bool which handles only ccs metadata copy. v5: - on dgfx ccs should be cleared even if the bo is not compression enabled. Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2023-12-21 11:46:15 -05:00
Himal Prasad Ghimiray	65ef8dbad1	drm/xe/xe2: Update emit_pte to use compression enabled PAT index For indirect accessed buffer use compression enabled PAT index. v2: - Fix parameter name. v3: - use a relevant define instead of fix number. Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2023-12-21 11:46:15 -05:00
Himal Prasad Ghimiray	0942752679	drm/xe/xe2: Update chunk size for each iteration of ccs copy In xe2 platform XY_CTRL_SURF_COPY_BLT can handle ccs copy for max of 1024 main surface pages. v2: - Use better logic to determine chunk size (Matt/Thomas) v3: - use function instead of macro(Thomas) Cc: Matt Roper <matthew.d.roper@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2023-12-21 11:46:15 -05:00

1 2 3

110 Commits