linux

mirror of https://github.com/torvalds/linux.git synced 2026-04-18 23:03:57 -04:00

Author	SHA1	Message	Date
Niranjana Vishwanathapura	caaed1dda7	Revert "drm/xe/multi_queue: Support active group after primary is destroyed" This reverts commit `3131a43ecb`. There is no must have requirement for this feature from Compute UMD. Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20260106191051.2866538-5-niranjana.vishwanathapura@intel.com	2026-01-06 11:13:54 -08:00
Thomas Hellström	dff547e137	drm/xe/uapi: Extend the madvise functionality to support foreign pagemap placement for svm Use device file descriptors and regions to represent pagemaps on foreign or local devices. The underlying files are type-checked at madvise time, and references are kept on the drm_pagemap as long as there is are madvises pointing to it. Extend the madvise preferred_location UAPI to support the region instance to identify the foreign placement. v2: - Improve UAPI documentation. (Matt Brost) - Sanitize preferred_mem_loc.region_instance madvise. (Matt Brost) - Clarify madvise drm_pagemap vs xe_pagemap refcounting. (Matt Brost) - Don't allow a foreign drm_pagemap madvise without a fast interconnect. v3: - Add a comment about reference-counting in xe_devmem_open() and remove the reference-count get-and-put. (Matt Brost) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20251219113320.183860-16-thomas.hellstrom@linux.intel.com	2025-12-23 10:00:48 +01:00
Ashutosh Dixit	ab39e2a8f7	drm/xe/oa/uapi: Expose MERT OA unit A MERT OA unit is available in the SoC on some platforms. Add support for this OA unit and expose it to userspace. The MERT OA unit does not have any HW engines attached, but is otherwise similar to an OAM unit. Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Link: https://patch.msgid.link/20251205212613.826224-2-ashutosh.dixit@intel.com	2025-12-16 17:08:24 -08:00
Shuicheng Lin	b07bac9bd7	drm/xe: Limit num_syncs to prevent oversized allocations The exec and vm_bind ioctl allow userspace to specify an arbitrary num_syncs value. Without bounds checking, a very large num_syncs can force an excessively large allocation, leading to kernel warnings from the page allocator as below. Introduce DRM_XE_MAX_SYNCS (set to 1024) and reject any request exceeding this limit. " ------------[ cut here ]------------ WARNING: CPU: 0 PID: 1217 at mm/page_alloc.c:5124 __alloc_frozen_pages_noprof+0x2f8/0x2180 mm/page_alloc.c:5124 ... Call Trace: <TASK> alloc_pages_mpol+0xe4/0x330 mm/mempolicy.c:2416 ___kmalloc_large_node+0xd8/0x110 mm/slub.c:4317 __kmalloc_large_node_noprof+0x18/0xe0 mm/slub.c:4348 __do_kmalloc_node mm/slub.c:4364 [inline] __kmalloc_noprof+0x3d4/0x4b0 mm/slub.c:4388 kmalloc_noprof include/linux/slab.h:909 [inline] kmalloc_array_noprof include/linux/slab.h:948 [inline] xe_exec_ioctl+0xa47/0x1e70 drivers/gpu/drm/xe/xe_exec.c:158 drm_ioctl_kernel+0x1f1/0x3e0 drivers/gpu/drm/drm_ioctl.c:797 drm_ioctl+0x5e7/0xc50 drivers/gpu/drm/drm_ioctl.c:894 xe_drm_ioctl+0x10b/0x170 drivers/gpu/drm/xe/xe_device.c:224 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:598 [inline] __se_sys_ioctl fs/ioctl.c:584 [inline] __x64_sys_ioctl+0x18b/0x210 fs/ioctl.c:584 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0xbb/0x380 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x77/0x7f ... " v2: Add "Reported-by" and Cc stable kernels. v3: Change XE_MAX_SYNCS from 64 to 1024. (Matt & Ashutosh) v4: s/XE_MAX_SYNCS/DRM_XE_MAX_SYNCS/ (Matt) v5: Do the check at the top of the exec func. (Matt) Fixes: `dd08ebf6c3` ("drm/xe: Introduce a new DRM driver for Intel GPUs") Reported-by: Koen Koning <koen.koning@intel.com> Reported-by: Peter Senna Tschudin <peter.senna@linux.intel.com> Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/6450 Cc: <stable@vger.kernel.org> # v6.12+ Cc: Matthew Brost <matthew.brost@intel.com> Cc: Michal Mrozek <michal.mrozek@intel.com> Cc: Carl Zhang <carl.zhang@intel.com> Cc: José Roberto de Souza <jose.souza@intel.com> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: Ivan Briano <ivan.briano@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Ashutosh Dixit <ashutosh.dixit@intel.com> Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20251205234715.2476561-5-shuicheng.lin@intel.com	2025-12-15 13:33:56 -08:00
Niranjana Vishwanathapura	3131a43ecb	drm/xe/multi_queue: Support active group after primary is destroyed Add support to keep the group active after the primary queue is destroyed. Instead of killing the primary queue during exec_queue destroy ioctl, kill it when all the secondary queues of the group are killed. Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20251211010249.1647839-34-niranjana.vishwanathapura@intel.com	2025-12-11 19:22:05 -08:00
Niranjana Vishwanathapura	2a31ea17d5	drm/xe/multi_queue: Add exec_queue set_property ioctl support This patch adds support for exec_queue set_property ioctl. It is derived from the original work which is part of https://patchwork.freedesktop.org/series/112188/ Currently only DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_QUEUE_PRIORITY property can be dynamically set. v2: Check for and update kernel-doc which property this ioctl supports (Matt Brost) Signed-off-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Pallavi Mishra <pallavi.mishra@intel.com> Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20251211010249.1647839-25-niranjana.vishwanathapura@intel.com	2025-12-11 19:21:04 -08:00
Niranjana Vishwanathapura	898a00f4b4	drm/xe/multi_queue: Add multi queue priority property Add support for queues of a multi queue group to set their priority within the queue group by adding property DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_QUEUE_PRIORITY. This is the only other property supported by secondary queues of a multi queue group, other than DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_QUEUE. v2: Add kernel doc for enum xe_multi_queue_priority, Add assert for priority values, fix includes and declarations (Matt Brost) v3: update uapi kernel-doc (Matt Brost) v4: uapi change due to rebase Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20251211010249.1647839-23-niranjana.vishwanathapura@intel.com	2025-12-11 19:20:51 -08:00
Niranjana Vishwanathapura	d9ec634746	drm/xe/multi_queue: Add user interface for multi queue support Multi Queue is a new mode of execution supported by the compute and blitter copy command streamers (CCS and BCS, respectively). It is an enhancement of the existing hardware architecture and leverages the same submission model. It enables support for efficient, parallel execution of multiple queues within a single context. All the queues of a group must use the same address space (VM). The new DRM_XE_EXEC_QUEUE_SET_PROPERTY_MULTI_QUEUE execution queue property supports creating a multi queue group and adding queues to a queue group. All queues of a multi queue group share the same context. A exec queue create ioctl call with above property specified with value DRM_XE_SUPER_GROUP_CREATE will create a new multi queue group with the queue being created as the primary queue (aka q0) of the group. To add secondary queues to the group, they need to be created with the above property with id of the primary queue as the value. The properties of the primary queue (like priority, timeslice) applies to the whole group. So, these properties can't be set for secondary queues of a group. Once destroyed, the secondary queues of a multi queue group can't be replaced. However, they can be dynamically added to the group up to a total of 64 queues per group. Once the primary queue is destroyed, secondary queues can't be added to the queue group. v2: Remove group->lock, fix xe_exec_queue_group_add()/delete() function semantics, add additional comments, remove unused group->list_lock, add XE_BO_FLAG_GGTT_INVALIDATE for cgp bo, Assert LRC is valid, update uapi kernel doc. (Matt Brost) v3: Use XE_BO_FLAG_PINNED_LATE_RESTORE/USER_VRAM/GGTT_INVALIDATE flags for cgp bo (Matt) v4: Ensure queue is not a vm_bind queue uapi change due to rebase Signed-off-by: Stuart Summers <stuart.summers@intel.com> Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20251211010249.1647839-21-niranjana.vishwanathapura@intel.com	2025-12-11 19:20:37 -08:00
Ashutosh Dixit	16e076b036	drm/xe/oa/uapi: Add gt_id to struct drm_xe_oa_unit gt_id was previously omitted from 'struct drm_xe_oa_unit' because it could be determine from hwe's attached to the OA unit. However, we now have OA units which don't have any hwe's attached to them. Hence add gt_id to 'struct drm_xe_oa_unit' in order to provide this needed information to userspace. Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Link: https://patch.msgid.link/20251202025115.373546-3-ashutosh.dixit@intel.com	2025-12-04 13:33:20 -08:00
Sanjay Yadav	78d91ba6bd	drm/xe/uapi: Add NO_COMPRESSION BO flag and query capability Introduce DRM_XE_GEM_CREATE_FLAG_NO_COMPRESSION to let userspace opt out of CCS compression on a per-BO basis. When set, the driver maps this to XE_BO_FLAG_NO_COMPRESSION, skips CCS metadata allocation/clearing, and rejects compressed PAT indices at vm_bind. This avoids extra memory ops and manual CCS state handling for buffers. To allow userspace to detect at runtime whether the kernel supports this feature, add DRM_XE_QUERY_CONFIG_FLAG_HAS_NO_COMPRESSION_HINT and expose it via query_config() on Xe2+ platforms. Mesa PR: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38425 IGT PR: https://patchwork.freedesktop.org/patch/685180/ v2 - Changed error code from -EINVAL to -EOPNOTSUPP for unsupported flag usage on pre-Xe2 platforms - Fixed checkpatch warning in xe_vm.c - Fixed kernel-doc formatting in xe_drm.h v3 - Rebase - Updated commit title and description - Added UAPI for DRM_XE_QUERY_CONFIG_FLAG_HAS_NO_COMPRESSION_HINT and exposed it via query_config() v4 - Rebase v5 - Included Mesa PR and IGT PR in the commit description - Used xe_pat_index_get_comp_en() to extract the compression v6 - Added XE_IOCTL_DBG() checks for argument validation Suggested-by: Matthew Auld <matthew.auld@intel.com> Suggested-by: José Roberto de Souza <jose.souza@intel.com> Acked-by: José Roberto de Souza <jose.souza@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Sanjay Yadav <sanjay.kumar.yadav@intel.com> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Link: https://patch.msgid.link/20251204040402.2692921-2-sanjay.kumar.yadav@intel.com	2025-12-04 11:31:11 +00:00
Matthew Brost	b80961a86b	drm/xe/uapi: Add DRM_XE_EXEC_QUEUE_SET_HANG_REPLAY_STATE Add DRM_XE_EXEC_QUEUE_SET_HANG_REPLAY_STATE which accepts a user pointer to populate the exec queue state so that a GPU hang can be replayed via a Mesa tool. v2: Update the value for HANG_REPLAY_STATE flag Cc: José Roberto de Souza <jose.souza@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Carlos Santa <carlos.santa@intel-corp-partner.google.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Acked-by: José Roberto de Souza <jose.souza@intel.com> Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patch.msgid.link/20251126185952.546277-8-matthew.brost@intel.com	2025-12-01 10:03:17 -08:00
Thomas Hellström	59a2d3f38a	drm/xe/uapi: Hide the madvise autoreset behind a VM_BIND flag The madvise implementation currently resets the SVM madvise if the underlying CPU map is unmapped. This is in an attempt to mimic the CPU madvise behaviour. However, it's not clear that this is a desired behaviour since if the end app user relies on it for malloc()ed objects or stack objects, it may not work as intended. Instead of having the autoreset functionality being a direct application-facing implicit UAPI, make the UMD explicitly choose this behaviour if it wants to expose it by introducing DRM_XE_VM_BIND_FLAG_MADVISE_AUTORESET, and add a semantics description. v2: - Kerneldoc fixes. Fix a commit log message. Fixes: `a2eb8aec3e` ("drm/xe: Reset VMA attributes to default in SVM garbage collector") Cc: Matthew Brost <matthew.brost@intel.com> Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Cc: "Falkowski, John" <john.falkowski@intel.com> Cc: "Mrozek, Michal" <michal.mrozek@intel.com> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Link: https://lore.kernel.org/r/20251015170726.178685-2-thomas.hellstrom@linux.intel.com	2025-10-17 10:26:15 +02:00
Sanjay Yadav	409b949909	drm/xe/uapi: Add documentation for DRM_XE_GEM_CREATE_FLAG_SCANOUT Add documentation for drm_xe_gem_create structure flag DRM_XE_GEM_CREATE_FLAG_SCANOUT. Signed-off-by: Sanjay Yadav <sanjay.kumar.yadav@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Link: https://lore.kernel.org/r/20251014142823.3701228-2-sanjay.kumar.yadav@intel.com	2025-10-15 16:35:56 +01:00
Himal Prasad Ghimiray	eeb8117f5f	drm/xe/uapi: Fix kernel-doc formatting for madvise and vma_query Correct kernel-doc formatting issues in the UAPI definitions for madvise and VMA query interfaces to resolve docutils warnings during documentation build. Fixes: `418807860e` ("drm/xe/uapi: Add UAPI for querying VMA count and memory attributes") Fixes: `231bb0ee7a` ("drm/xe/uapi: Add madvise interface") Cc: Matthew Brost <matthew.brost@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250828071516.3838110-1-himal.prasad.ghimiray@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2025-08-28 07:08:08 -07:00
Himal Prasad Ghimiray	418807860e	drm/xe/uapi: Add UAPI for querying VMA count and memory attributes Introduce the DRM_IOCTL_XE_VM_QUERY_MEMORY_RANGE_ATTRS ioctl to allow userspace to query memory attributes of VMAs within a user specified virtual address range. Userspace first calls the ioctl with num_mem_ranges = 0, sizeof_mem_ranges_attr = 0 and vector_of_vma_mem_attr = NULL to retrieve the number of memory ranges (vmas) and size of each memory range attribute. Then, it allocates a buffer of that size and calls the ioctl again to fill the buffer with memory range attributes. This two-step interface allows userspace to first query the required buffer size, then retrieve detailed attributes efficiently. v2 (Matthew Brost) - Use same ioctl to overload functionality v3 - Add kernel-doc v4 - Make uapi future proof by passing struct size (Matthew Brost) - make lock interruptible (Matthew Brost) - set reserved bits to zero (Matthew Brost) - s/__copy_to_user/copy_to_user (Matthew Brost) - Avod using VMA term in uapi (Thomas) - xe_vm_put(vm) is missing (Shuicheng) v5 - Nits - Fix kernel-doc Cc: Matthew Brost <matthew.brost@intel.com> Cc: Shuicheng Lin <shuicheng.lin@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250821173104.3030148-21-himal.prasad.ghimiray@intel.com Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>	2025-08-26 11:25:36 +05:30
Himal Prasad Ghimiray	fa1a82c985	drm/xe/uapi: Add flag for consulting madvise hints on svm prefetch Introduce flag DRM_XE_CONSULT_MEM_ADVISE_PREF_LOC to ensure prefetching in madvise-advised memory regions v2 (Matthew Brost) - Add kernel-doc v3 (Matthew Brost) - Fix kernel-doc Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://lore.kernel.org/r/20250821173104.3030148-13-himal.prasad.ghimiray@intel.com Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>	2025-08-26 11:25:36 +05:30
Himal Prasad Ghimiray	231bb0ee7a	drm/xe/uapi: Add madvise interface This commit introduces a new madvise interface to support driver-specific ioctl operations. The madvise interface allows for more efficient memory management by providing hints to the driver about the expected memory usage and pte update policy for gpuvma. v2 (Matthew/Thomas) - Drop num_ops support - Drop purgeable support - Add kernel-docs - IOWR/IOW v3 (Matthew/Thomas) - Reorder attributes - use __u16 for migration_policy - use __u64 for reserved in unions - Avoid usage of vma Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250821173104.3030148-2-himal.prasad.ghimiray@intel.com Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>	2025-08-26 11:25:35 +05:30
Priyanka Dandamudi	4df0bd5eb4	drm/xe/uapi: Add documentation for DRM_XE_GEM_CREATE_FLAG_DEFER_BACKING Add documentation for drm_xe_gem_create structure flag DRM_XE_GEM_CREATE_FLAG_DEFER_BACKING. v2: Modified to be in a more generalised way. Signed-off-by: Priyanka Dandamudi <priyanka.dandamudi@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250728043336.3319521-1-priyanka.dandamudi@intel.com Signed-off-by: Tejas Upadhyay <tejas.upadhyay@intel.com>	2025-07-29 11:21:00 +05:30
Shuicheng Lin	771f002ef1	drm/xe/uapi: Correct sync type definition in comments Commit `37d078e51b` ("drm/xe/uapi: Split xe_sync types from flags") renamed some DRM_XE_SYNC_* defines but later commits kept using the old names. Correct them with the new definition. v2: correct fixes tag and update commit message to explain why (Lucas) Fixes: `9329f06672` ("drm/xe/uapi: Use LR abbrev for long-running vms") Fixes: `4b437893a8` ("drm/xe/uapi: More uAPI documentation additions and cosmetic updates") Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Francois Dugast <francois.dugast@intel.com> Cc: Zongyao Bai <zongyao.bai@intel.com> Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com> Link: https://lore.kernel.org/r/20250608230133.1250849-1-shuicheng.lin@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2025-06-25 10:15:54 -04:00
Ashutosh Dixit	e04dac12ce	drm/xe/oa/uapi: Expose media OA units On Xe2+ platforms, media engines are attached to "SCMI" OA media (OAM) units. One or more SCMI OAM units might be present on a platform. In addition there is another OAM unit for global events, called OAM-SAG. Performance metrics for media workloads can be obtained from these OAM units, similar to OAG. Expose these OAM units for userspace to use. OAM-SAG is exposed as an OA unit without any attached engines. Bspec: 70819, 67103, 63844, 72572, 74476, 61284 v2: Fix xe_gt_WARN_ON in __hwe_oam_unit for < 12.7 platforms v3: Return XE_OA_UNIT_INVALID for < 12.7 to indicate no OAM units v4: Move xe_oa_print_oa_units() to separate patch v5: Introduce DRM_XE_OA_UNIT_TYPE_OAM_SAG v6: Introduce DRM_XE_OA_CAPS_OAM Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Link: https://lore.kernel.org/r/20250606192618.4133817-2-ashutosh.dixit@intel.com	2025-06-17 11:31:50 -07:00
Daniele Ceraolo Spurio	21784ca960	drm/xe/pxp: Clarify PXP queue creation behavior if PXP is not ready The expected flow of operations when using PXP is to query the PXP status and wait for it to transition to "ready" before attempting to create an exec_queue. This flow is followed by the Mesa driver, but there is no guarantee that an incorrectly coded (or malicious) app will not attempt to create the queue first without querying the status. Therefore, we need to clarify what the expected behavior of the queue creation ioctl is in this scenario. Currently, the ioctl always fails with an -EBUSY code no matter the error, but for consistency it is better to distinguish between "failed to init" (-EIO) and "not ready" (-EBUSY), the same way the query ioctl does. Note that, while this is a change in the return code of an ioctl, the behavior of the ioctl in this particular corner case was not clearly spec'd, so no one should have been relying on it (and we know that Mesa, which is the only known userspace for this, didn't). v2: Minor rework of the doc (Rodrigo) Fixes: `72d479601d` ("drm/xe/pxp/uapi: Add userspace and LRC support for PXP-using queues") Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Cc: José Roberto de Souza <jose.souza@intel.com> Reviewed-by: José Roberto de Souza <jose.souza@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://lore.kernel.org/r/20250522225401.3953243-7-daniele.ceraolospurio@intel.com	2025-06-02 08:28:48 -07:00
Oak Zeng	ae28e34400	drm/xe: Allow scratch page under fault mode for certain platform Normally scratch page is not allowed when a vm is operate under page fault mode, i.e., in the existing codes, DRM_XE_VM_CREATE_FLAG_SCRATCH_PAGE and DRM_XE_VM_CREATE_FLAG_FAULT_MODE are mutual exclusive. The reason is fault mode relies on recoverable page to work, while scratch page can mute recoverable page fault. On xe2 and xe3, out of bound prefetch can cause page fault and further system hang because xekmd can't resolve such page fault. SYCL and OCL language runtime requires out of bound prefetch to be silently dropped without causing any functional problem, thus the existing behavior doesn't meet language runtime requirement. At the same time, HW prefetching can cause page fault interrupt. Due to page fault interrupt overhead (i.e., need Guc and KMD involved to fix the page fault), HW prefetching can be slowed by many orders of magnitude. Fix those problems by allowing scratch page under fault mode for xe2 and xe3. With scratch page in place, HW prefetching could always hit scratch page instead of causing interrupt. A side effect is, scratch page could hide application program error. Application out of bound accesses are hided by scratch page mapping, instead of get reported to user. v2: Refine commit message (Thomas) v3: Move the scratch page flag check to after scratch page wa (Thomas) v4: drop NEEDS_SCRATCH macro (matt) Add a comment to DRM_XE_VM_CREATE_FLAG_SCRATCH_PAGE Signed-off-by: Oak Zeng <oak.zeng@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Link: https://lore.kernel.org/r/20250403165328.2438690-4-oak.zeng@intel.com Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>	2025-04-07 11:17:30 +05:30
Matthew Brost	77613a2e10	drm/xe/uapi: Add DRM_XE_QUERY_CONFIG_FLAG_HAS_CPU_ADDR_MIRROR Add the DRM_XE_QUERY_CONFIG_FLAG_HAS_CPU_ADDR_MIRROR device query flag, which indicates whether the device supports CPU address mirroring. The intent is for UMDs to use this query to determine if a VM can be set up with CPU address mirroring. This flag is implemented by checking if the device supports GPU faults. v7: - Only report enabled if CONFIG_DRM_GPUSVM is selected (CI) Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250306012657.3505757-20-matthew.brost@intel.com	2025-03-06 11:35:52 -08:00
Matthew Brost	b43e864af0	drm/xe/uapi: Add DRM_XE_VM_BIND_FLAG_CPU_ADDR_MIRROR Add the DRM_XE_VM_BIND_FLAG_CPU_ADDR_MIRROR flag, which is used to create unpopulated virtual memory areas (VMAs) without memory backing or GPU page tables. These VMAs are referred to as CPU address mirror VMAs. The idea is that upon a page fault or prefetch, the memory backing and GPU page tables will be populated. CPU address mirror VMAs only update GPUVM state; they do not have an internal page table (PT) state, nor do they have GPU mappings. It is expected that CPU address mirror VMAs will be mixed with buffer object (BO) VMAs within a single VM. In other words, system allocations and runtime allocations can be mixed within a single user-mode driver (UMD) program. Expected usage: - Bind the entire virtual address (VA) space upon program load using the DRM_XE_VM_BIND_FLAG_CPU_ADDR_MIRROR flag. - If a buffer object (BO) requires GPU mapping (runtime allocation), allocate a CPU address using mmap(PROT_NONE), bind the BO to the mmapped address using existing bind IOCTLs. If a CPU map of the BO is needed, mmap it again to the same CPU address using mmap(MAP_FIXED) - If a BO no longer requires GPU mapping, munmap it from the CPU address space and them bind the mapping address with the DRM_XE_VM_BIND_FLAG_CPU_ADDR_MIRROR flag. - Any malloc'd or mmapped CPU address accessed by the GPU will be faulted in via the SVM implementation (system allocation). - Upon freeing any mmapped or malloc'd data, the SVM implementation will remove GPU mappings. Only supporting 1 to 1 mapping between user address space and GPU address space at the moment as that is the expected use case. uAPI defines interface for non 1 to 1 but enforces 1 to 1, this restriction can be lifted if use cases arrise for non 1 to 1 mappings. This patch essentially short-circuits the code in the existing VM bind paths to avoid populating page tables when the DRM_XE_VM_BIND_FLAG_CPU_ADDR_MIRROR flag is set. v3: - Call vm_bind_ioctl_ops_fini on -ENODATA - Don't allow DRM_XE_VM_BIND_FLAG_CPU_ADDR_MIRROR on non-faulting VMs - s/DRM_XE_VM_BIND_FLAG_SYSTEM_ALLOCATOR/DRM_XE_VM_BIND_FLAG_CPU_ADDR_MIRROR (Thomas) - Rework commit message for expected usage (Thomas) - Describe state of code after patch in commit message (Thomas) v4: - Fix alignment (Checkpatch) Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250306012657.3505757-9-matthew.brost@intel.com	2025-03-06 11:35:33 -08:00
Tejas Upadhyay	5488bec96b	drm/xe/uapi: Use hint for guc to set GT frequency Allow user to provide a low latency hint. When set, KMD sends a hint to GuC which results in special handling for that process. SLPC will ramp the GT frequency aggressively every time it switches to this process. We need to enable the use of SLPC Compute strategy during init, but it will apply only to processes that set this bit during process creation. Improvement with this approach as below: Before, :~$ NEOReadDebugKeys=1 EnableDirectSubmission=0 clpeak --kernel-latency Platform: Intel(R) OpenCL Graphics Device: Intel(R) Graphics [0xe20b] Driver version : 24.52.0 (Linux x64) Compute units : 160 Clock frequency : 2850 MHz Kernel launch latency : 283.16 us After, :~$ NEOReadDebugKeys=1 EnableDirectSubmission=0 clpeak --kernel-latency Platform: Intel(R) OpenCL Graphics Device: Intel(R) Graphics [0xe20b] Driver version : 24.52.0 (Linux x64) Compute units : 160 Clock frequency : 2850 MHz Kernel launch latency : 63.38 us Compute PR: https://github.com/intel/compute-runtime/pull/794 Mesa PR: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33214 IGT PR: https://patchwork.freedesktop.org/patch/639989/ V10(Lucas): - Remove doc from drm-uapi.rst v9(Vinay): - remove extra line, align commit message v8(Vinay): - Add separate example for using low latency hint v7(Jose): - Update UMD PR - applicable to all gpus V6: - init flags, remove redundant flags check (MAuld) V5: - Move uapi doc to documentation and GuC ABI specific change (Rodrigo) - Modify logic to restrict exec queue flags (MAuld) V4: - To make it clear, dont use exec queue word (Vinay) - Correct typo in description of flag (Jose/Vinay) - rename set_strategy api and replace ctx with exec queue(Vinay) - Start with 0th bit to indentify user flags (Jose) V3: - Conver user flag to kernel internal flag and use (Oak) - Support query config for use to check kernel support (Jose) - Dont need to take runtime pm (Vinay) V2: - DRM_XE_EXEC_QUEUE_LOW_LATENCY_HINT 1 planned for other hint(Szymon) - Add motivation to description (Lucas) Acked-by: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250228070224.739295-2-tejas.upadhyay@intel.com Signed-off-by: Tejas Upadhyay <tejas.upadhyay@intel.com>	2025-03-05 09:54:24 +05:30
Harish Chegondi	cd5bbb2532	drm/xe/uapi: Add a device query to get EU stall sampling information User space can get the EU stall data record size, EU stall capabilities, EU stall sampling rates, and per XeCore buffer size with query IOCTL DRM_IOCTL_XE_DEVICE_QUERY with .query set to DRM_XE_DEVICE_QUERY_EU_STALL. A struct drm_xe_query_eu_stall will be returned to the user space along with an array of supported sampling rates sorted in the fastest sampling rate first order. sampling_rates in struct drm_xe_query_eu_stall will point to the array of sampling rates. Any capabilities in EU stall sampling as of this patch are considered as base capabilities. New capability bits will be added for any new functionality added later. v12: Rename has_eu_stall_sampling_support() to xe_eu_stall_supported_on_platform() and move it to header file. v11: Check if EU stall sampling is supported on the platform. v10: Change comments and variable names as per feedback v9: Move reserved fields above num_sampling_rates in struct drm_xe_query_eu_stall. v7: Change sampling_rates from a pointer to flexible array. v6: Include EU stall sampling rates information and per XeCore buffer size in the query information. Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Signed-off-by: Harish Chegondi <harish.chegondi@intel.com> Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/67ba42796a5a99d648239c315694cd222812a49b.1740533885.git.harish.chegondi@intel.com	2025-02-26 11:31:05 -08:00
Harish Chegondi	1537ec85eb	drm/xe/uapi: Introduce API for EU stall sampling A new hardware feature first introduced in PVC gives capability to periodically sample EU stall state and record counts for different stall reasons, on a per IP basis, aggregate across all EUs in a subslice and record the samples in a buffer in each subslice. Eventually, the aggregated data is written out to a buffer in the memory. This feature is also supported in XE2 and later architecture GPUs. Use an existing IOCTL - DRM_IOCTL_XE_OBSERVATION as the interface into the driver from the user space to do initial setup and obtain a file descriptor for the EU stall data stream. Input parameter to the IOCTL is a struct drm_xe_observation_param in which observation_type should be set to DRM_XE_OBSERVATION_TYPE_EU_STALL, observation_op should be DRM_XE_OBSERVATION_OP_STREAM_OPEN and param should point to a chain of drm_xe_ext_set_property structures in which each structure has a pair of property and value. The EU stall sampling input properties are defined in drm_xe_eu_stall_property_id enum. With the file descriptor obtained from DRM_IOCTL_XE_OBSERVATION, user space can enable and disable EU stall sampling with the IOCTLs: DRM_XE_OBSERVATION_IOCTL_ENABLE and DRM_XE_OBSERVATION_IOCTL_DISABLE. User space can also call poll() to check for availability of data in the buffer. The data can be read with read(). Finally, the file descriptor can be closed with close(). v11: Changed a couple of variables in struct eu_stall_open_properties from unsigned int to int. v10: Use extension number while parsing chain of extensions. Remove function description for static functions. Move code around as per review feedback. v9: Changed some u32 to unsigned int. Moved some code around as per review feedback from v8. v8: Used div_u64 instead of / to fix 32-bit build issue. Changed copyright year in xe_eu_stall.c/h to 2025. v7: Renamed input property DRM_XE_EU_STALL_PROP_EVENT_REPORT_COUNT to DRM_XE_EU_STALL_PROP_WAIT_NUM_REPORTS to be consistent with OA. Renamed the corresponding internal variables. Fixed some commit messages based on review feedback. v6: Change the input sampling rate to GPU cycles instead of GPU cycles multiplier. Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Signed-off-by: Harish Chegondi <harish.chegondi@intel.com> Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/bb707a27975c33e4a912b9839b023acb7a1f9c90.1740533885.git.harish.chegondi@intel.com	2025-02-26 11:30:57 -08:00
Daniele Ceraolo Spurio	41a97c4a12	drm/xe/pxp/uapi: Add API to mark a BO as using PXP The driver needs to know if a BO is encrypted with PXP to enable the display decryption at flip time. Furthermore, we want to keep track of the status of the encryption and reject any operation that involves a BO that is encrypted using an old key. There are two points in time where such checks can kick in: 1 - at VM bind time, all operations except for unmapping will be rejected if the key used to encrypt the BO is no longer valid. This check is opt-in via a new VM_BIND flag, to avoid a scenario where a malicious app purposely shares an invalid BO with a non-PXP aware app (such as a compositor). If the VM_BIND was failed, the compositor would be unable to display anything at all. Allowing the bind to go through means that output still works, it just displays garbage data within the bounds of the illegal BO. 2 - at job submission time, if the queue is marked as using PXP, all objects bound to the VM will be checked and the submission will be rejected if any of them was encrypted with a key that is no longer valid. Note that there is no risk of leaking the encrypted data if a user does not opt-in to those checks; the only consequence is that the user will not realize that the encryption key is changed and that the data is no longer valid. v2: Better commnnts and descriptions (John), rebase v3: Properly return the result of key_assign up the stack, do not use xe_bo in display headers (Jani) v4: improve key_instance variable documentation (John) Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Cc: Jani Nikula <jani.nikula@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250129174140.948829-11-daniele.ceraolospurio@intel.com	2025-02-03 11:51:23 -08:00
Daniele Ceraolo Spurio	bd98ac2e05	drm/xe/pxp/uapi: Add a query for PXP status PXP prerequisites (SW proxy and HuC auth via GSC) are completed asynchronously from driver load, which means that userspace can start submitting before we're ready to start a PXP session. Therefore, we need a query that userspace can use to check not only if PXP is supported but also to wait until the prerequisites are done. v2: Improve doc, do not report TYPE_NONE as supported (José) v3: Better comments, remove unneeded copy_from_user (John) Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: José Roberto de Souza <jose.souza@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250129174140.948829-10-daniele.ceraolospurio@intel.com	2025-02-03 11:51:21 -08:00
Daniele Ceraolo Spurio	72d479601d	drm/xe/pxp/uapi: Add userspace and LRC support for PXP-using queues Userspace is required to mark a queue as using PXP to guarantee that the PXP instructions will work. In addition to managing the PXP sessions, when a PXP queue is created the driver will set the relevant bits in its context control register. On submission of a valid PXP queue, the driver will validate all encrypted objects mapped to the VM to ensured they were encrypted with the current key. v2: Remove pxp_types include outside of PXP code (Jani), better comments and code cleanup (John) v3: split the internal PXP management to a separate patch for ease of review. re-order ioctl checks to always return -EINVAL if parameters are invalid, rebase on msix changes. Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250129174140.948829-9-daniele.ceraolospurio@intel.com	2025-02-03 11:51:18 -08:00
Rodrigo Vivi	a46ea12eca	drm/xe/uapi: Fix documentation indentation Fix these issues: Documentation/gpu/driver-uapi:29: include/uapi/drm/xe_drm.h:817: WARNING: +Bullet list ends without a blank line; unexpected unindent. Documentation/gpu/driver-uapi:29: include/uapi/drm/xe_drm.h:835: WARNING: +Definition list ends without a blank line; unexpected unindent. Fixes: `75d37750a7` ("drm/xe/mmap: Add mmap support for PCI memory barrier") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Closes: https://lore.kernel.org/intel-xe/20250117164023.3fdc00b9@canb.auug.org.au/ Cc: Tejas Upadhyay <tejas.upadhyay@intel.com> Tested-by: Bagas Sanjaya <bagasdotme@gmail.com> Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250117193827.91779-1-rodrigo.vivi@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2025-01-21 08:45:28 -05:00
Tejas Upadhyay	75d37750a7	drm/xe/mmap: Add mmap support for PCI memory barrier In order to avoid having userspace to use MI_MEM_FENCE, we are adding a mechanism for userspace to generate a PCI memory barrier with low overhead (avoiding IOCTL call as well as writing to VRAM will adds some overhead). This is implemented by memory-mapping a page as uncached that is backed by MMIO on the dGPU and thus allowing userspace to do memory write to the page without invoking an IOCTL. We are selecting the MMIO so that it is not accessible from the PCI bus so that the MMIO writes themselves are ignored, but the PCI memory barrier will still take action as the MMIO filtering will happen after the memory barrier effect. When we detect special defined offset in mmap(), We are mapping 4K page which contains the last of page of doorbell MMIO range to userspace for same purpose. For user to query special offset we are adding special flag in mmap_offset ioctl which needs to be passed as follows, struct drm_xe_gem_mmap_offset mmo = { .handle = 0, /* this must be 0 */ .flags = DRM_XE_MMAP_OFFSET_FLAG_PCI_BARRIER, }; igt_ioctl(fd, DRM_IOCTL_XE_GEM_MMAP_OFFSET, &mmo); map = mmap(NULL, size, PROT_WRITE, MAP_SHARED, fd, mmo); IGT : `b2dbc6f228` UMD : https://github.com/intel/compute-runtime/pull/772 V7: - Dgpu filter added V6(MAuld) - Move physical mmap to fault handler - Modify kernel-doc and attach UMD PR when ready V5(MAuld) - Return invalid early in case of non 4K PAGE_SIZE - Format kernel-doc and add note for 4K PAGE_SIZE HW limit V4(MAuld) - Add kernel-doc for uapi change - Restrict page size to 4K V3(MAuld) - Remove offset defination from UAPI to be able to change later - Edit commit message for special flag addition V2(MAuld) - Add fault handler with dummy page to handle unplug device - Add Build check for special offset to be below normal start page - Test d3hot, mapping seems to be valid in d3hot as well - Add more info to commit message Cc: Matthew Auld <matthew.auld@intel.com> Acked-by: Michal Mrozek <michal.mrozek@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Tejas Upadhyay <tejas.upadhyay@intel.com> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250113114201.3178806-1-tejas.upadhyay@intel.com	2025-01-16 11:50:00 +00:00
Ashutosh Dixit	5637797add	drm/xe/oa/uapi: Expose an unblock after N reports OA property Expose an "unblock after N reports" OA property, to allow userspace threads to be woken up less frequently. Co-developed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241212224903.1853862-1-ashutosh.dixit@intel.com	2024-12-16 18:04:14 -08:00
Sai Teja Pottumuttu	720f63a838	drm/xe/oa/uapi: Make OA buffer size configurable Add a new property called DRM_XE_OA_PROPERTY_OA_BUFFER_SIZE to allow OA buffer size to be configurable from userspace. With this OA buffer size can be configured to any power of 2 size between 128KB and 128MB and it would default to 16MB in case the size is not supplied. v2: - Rebase v3: - Add oa buffer size to capabilities [Ashutosh] - Address several nitpicks [Ashutosh] - Fix commit message/subject [Ashutosh] BSpec: 61100, 61228 Signed-off-by: Sai Teja Pottumuttu <sai.teja.pottumuttu@intel.com> Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241205041913.883767-2-sai.teja.pottumuttu@intel.com	2024-12-10 10:26:55 -08:00
Ashutosh Dixit	c8507a25ce	drm/xe/oa/uapi: Define and parse OA sync properties Now that we have laid the groundwork, introduce OA sync properties in the uapi and parse the input xe_sync array as is done elsewhere in the driver. Also add DRM_XE_OA_CAPS_SYNCS bit in OA capabilities for userspace. v2: Fix and document DRM_XE_SYNC_TYPE_USER_FENCE for OA (Matt B) Add DRM_XE_OA_CAPS_SYNCS bit to OA capabilities (Jose) Acked-by: José Roberto de Souza <jose.souza@intel.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241022200352.1192560-3-ashutosh.dixit@intel.com	2024-10-23 12:42:15 -07:00
Shekhar Chauhan	9ab440a9d0	drm/xe/ptl: L3bank mask is not available on the media GT On PTL platforms with media version 30.00, the fuse registers for reporting L3 bank availability to the GT just read out as ~0 and do not provide proper values. Xe does not use the L3 bank mask for anything internally; it only passes the mask through to userspace via the GT topology query. Since we don't have any way to get the real L3 bank mask, we don't want to pass garbage to userspace. Passing a zeroed mask or a copy of the primary GT's L3 bank mask would also be inaccurate and likely to cause confusion for userspace. The best approach is to simply not include L3 in the list of masks returned by the topology query in cases where we aren't able to provide a meaningful value. This won't change the behavior for any existing platforms (where we can always obtain L3 masks successfully for all GTs), it will only prevent us from mis-reporting bad information on upcoming platform(s). There's a good chance this will become a formal workaround in the future, but for now we don't have a lineage number so "no_media_l3" is used in place of a lineage as the OOB workaround descriptor. v2: - Re-calculate query size to properly match data returned. (Gustavo) - Update kerneldoc to clarify that the L3bank mask may not be included in the query results if the hardware doesn't make it available. (Gustavo) Cc: Matt Atwood <matthew.s.atwood@intel.com> Cc: Gustavo Sousa <gustavo.sousa@intel.com> Signed-off-by: Shekhar Chauhan <shekhar.chauhan@intel.com> Co-developed-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com> Acked-by: Francois Dugast <francois.dugast@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20241007154143.2021124-2-matthew.d.roper@intel.com	2024-10-08 06:56:51 -07:00
Geert Uytterhoeven	f2881dfdaa	drm/xe/oa/uapi: Make bit masks unsigned When building with gcc-5: In function ‘decode_oa_format.isra.26’, inlined from ‘xe_oa_set_prop_oa_format’ at drivers/gpu/drm/xe/xe_oa.c:1664:6: ././include/linux/compiler_types.h:510:38: error: call to ‘__compiletime_assert_1336’ declared with attribute error: FIELD_GET: mask is not constant [...] ./include/linux/bitfield.h:155:3: note: in expansion of macro ‘__BF_FIELD_CHECK’ __BF_FIELD_CHECK(_mask, _reg, 0U, "FIELD_GET: "); \ ^ drivers/gpu/drm/xe/xe_oa.c:1573:18: note: in expansion of macro ‘FIELD_GET’ u32 bc_report = FIELD_GET(DRM_XE_OA_FORMAT_MASK_BC_REPORT, fmt); ^ Fixes: `b6fd51c621` ("drm/xe/oa/uapi: Define and parse OA stream properties") Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240729092634.2227611-1-geert+renesas@glider.be Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2024-07-30 13:45:38 -07:00
Lucas De Marchi	7108b4a589	drm/xe/uapi: Expose SIMD16 EU mask in topology query PVC, Xe2 and later platforms have 16-wide EUs. We were implicitly reporting for PVC the number of 16-wide EUs without giving userspace any hint that they were different than for other platforms. Xe2 and later also have 16-wide, but in those cases the reported number would correspond to the 8-wide count. To avoid confusion and make sure the right number is used by userspace depending on the platform, add a new item to the topology query and drop the one that is not available. The new mask reported for both PVC and Xe2 should now match the numbers reported via hwconfig. v2: Use a different topo item with EU type in its name to report the new mask instead of adding the type itself as the item (Matt Roper) Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Acked-by: José Roberto de Souza <jose.souza@intel.com> Acked-by: Mateusz Jablonski <mateusz.jablonski@intel.com> Acked-by: Wenbin Lu <wenbin.lu@intel.com> Acked-by: Effie Yu <effie.yu@intel.com> Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240710220446.2169797-1-lucas.demarchi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2024-07-18 13:20:30 -07:00
Thomas Hellström	01e0cfc994	drm/xe: Use write-back caching mode for system memory on DGFX The caching mode for buffer objects with VRAM as a possible placement was forced to write-combined, regardless of placement. However, write-combined system memory is expensive to allocate and even though it is pooled, the pool is expensive to shrink, since it involves global CPU TLB flushes. Moreover write-combined system memory from TTM is only reliably available on x86 and DGFX doesn't have an x86 restriction. So regardless of the cpu caching mode selected for a bo, internally use write-back caching mode for system memory on DGFX. Coherency is maintained, but user-space clients may perceive a difference in cpu access speeds. v2: - Update RB- and Ack tags. - Rephrase wording in xe_drm.h (Matt Roper) v3: - Really rephrase wording. Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Fixes: `622f709ca6` ("drm/xe/uapi: Add support for CPU caching mode") Cc: Pallavi Mishra <pallavi.mishra@intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Cc: dri-devel@lists.freedesktop.org Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Effie Yu <effie.yu@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Jose Souza <jose.souza@intel.com> Cc: Michal Mrozek <michal.mrozek@intel.com> Cc: <stable@vger.kernel.org> # v6.8+ Acked-by: Matthew Auld <matthew.auld@intel.com> Acked-by: José Roberto de Souza <jose.souza@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Fixes: `622f709ca6` ("drm/xe/uapi: Add support for CPU caching mode") Acked-by: Michal Mrozek <michal.mrozek@intel.com> Acked-by: Effie Yu <effie.yu@intel.com> #On chat Link: https://patchwork.freedesktop.org/patch/msgid/20240705132828.27714-1-thomas.hellstrom@linux.intel.com	2024-07-06 11:05:46 +02:00
Ashutosh Dixit	8169b2097d	drm/xe/uapi: Rename xe perf layer as xe observation layer In Xe, the perf layer allows capture of HW counter streams. These HW counters are generally performance related but don't have to be necessarily so. Also, the name "perf" is a carryover from i915 and is not preferred. Here we propose the name "observation" for this common layer which allows capture of different types of these counter streams. v2: Rename observability layer to observation layer (Lucas/Rodrigo) v3: Rename sysctl file to "observation_paranoid" (Jose) Fixes: `52c2e956dc` ("drm/xe/perf/uapi: "Perf" layer to support multiple perf counter stream types") Fixes: `fe8929bdf8` ("drm/xe/perf/uapi: Add perf_stream_paranoid sysctl") Acked-by: Lucas De Marchi <lucas.demarchi@intel.com> Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Acked-by: José Roberto de Souza <jose.souza@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240703164801.2561423-1-ashutosh.dixit@intel.com	2024-07-03 16:46:02 -07:00
Ashutosh Dixit	406d058dc3	drm/xe/oa/uapi: Allow preemption to be disabled on the stream exec queue Mesa VK_KHR_performance_query use case requires preemption and timeslicing to be disabled for the stream exec queue. Implement this functionality here. v2: Minor change to debug print to print both ret values (Umesh) Acked-by: José Roberto de Souza <jose.souza@intel.com> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240626181817.1516229-3-ashutosh.dixit@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2024-06-26 18:25:46 -04:00
Ashutosh Dixit	7e5161da9d	drm/xe/oa: Fix kernel doc in xe_drm.h Fix kernel doc in xe_drm.h. Also eliminate private/non-abi enum definitions. v2: Remove __DRM_XE_PERF_TYPE_MAX since it is unused (Michal) v3: Also remove DRM_XE_OA_PROPERTY_MAX since it can also be eliminated (Michal) Suggested-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240623203119.3840283-1-ashutosh.dixit@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2024-06-26 18:24:38 -04:00
Ashutosh Dixit	dd6b4718c3	drm/xe/oa/uapi: Query OA unit properties Implement query for properties of OA units present on a device. v2: Clean up reserved/pad fields (Umesh) Follow the same scheme as other query structs v3: Skip reporting reserved engines attached to OA units v4: Expose oa_buf_size via DRM_XE_PERF_IOCTL_INFO (Umesh) v5: Don't expose capabilities as OR of properties (Umesh) v6: Add extensions to query output structs: drm_xe_oa_unit, drm_xe_query_oa_units and drm_xe_oa_stream_info v7: Change oa_units[] array to __u64 type Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240618014609.3233427-13-ashutosh.dixit@intel.com	2024-06-18 12:40:40 -07:00
Ashutosh Dixit	efb315d0a0	drm/xe/oa/uapi: Read file_operation Implement the OA stream read file_operation. Both blocking and non-blocking reads are supported. As part of read system call, the read copies OA perf data from the OA buffer to the user buffer, after appending packet headers for status and data packets. v2: Drop OA report headers, implement DRM_XE_PERF_IOCTL_STATUS (Umesh) v3: Introduce 'struct drm_xe_oa_stream_status' v4: Define oa_status register bitfields (Umesh) v5: Add extensions to 'struct drm_xe_oa_stream_status' v6: Minor cleanup, eliminate report32 variable v7: Use -EIO to signal to userspace to read OASTATUS using DRM_XE_PERF_IOCTL_STATUS, change previous sites returning -EIO to return -EINVAL Make drm_xe_oa_stream_status bits contiguous (Jose, Umesh) rmw oa_status bits (Umesh) Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Acked-by: José Roberto de Souza <jose.souza@intel.com> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240618014609.3233427-10-ashutosh.dixit@intel.com	2024-06-18 12:40:36 -07:00
Ashutosh Dixit	e936f885f1	drm/xe/oa/uapi: Expose OA stream fd The OA stream open perf op returns an fd with its own file_operations for the newly initialized OA stream. These file_operations allow userspace to enable or disable the stream, as well as apply a different metric configuration for the OA stream. Userspace can also poll for data availability. OA stream initialization is completed in this commit by enabling the OA stream. When sampling is enabled this starts a hrtimer which periodically checks for data availablility. v2: Use stream properties for stream reconfiguration with DRM_XE_PERF_IOCTL_CONFIG v3: Hold runtime_pm reference across oa buffer alloc/free v4: Fix 32 bit build Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Acked-by: José Roberto de Souza <jose.souza@intel.com> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240618014609.3233427-9-ashutosh.dixit@intel.com	2024-06-18 12:40:35 -07:00
Ashutosh Dixit	b6fd51c621	drm/xe/oa/uapi: Define and parse OA stream properties Properties for OA streams are specified by user space, when the stream is opened, as a chain of drm_xe_ext_set_property struct's. Parse and validate these stream properties. v2: Remove struct drm_xe_oa_open_param (Harish Chegondi) Drop DRM_XE_OA_PROPERTY_POLL_OA_PERIOD_US (Umesh) Eliminate comparison with xe_oa_max_sample_rate (Umesh) Drop 'struct drm_xe_oa_record_header' (Umesh) v3: s/DRM_XE_OA_PROPERTY_OA_EXPONENT/ \ DRM_XE_OA_PROPERTY_OA_PERIOD_EXPONENT/ (Jose) v4: Fix 32 bit build v5: Add non-static function kernel doc (Michal) Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Acked-by: José Roberto de Souza <jose.souza@intel.com> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240618014609.3233427-7-ashutosh.dixit@intel.com	2024-06-18 12:40:32 -07:00
Ashutosh Dixit	cdf02fe1a9	drm/xe/oa/uapi: Add/remove OA config perf ops Introduce add/remove config perf ops for OA. OA configurations consist of a set of event/counter select register address/value pairs. The add_config perf op validates and stores such configurations and also exposes them in the metrics sysfs. These configurations will be programmed to OA unit HW when an OA stream using a configuration is opened. The OA stream can also switch to other stored configurations. v2: Start config id's from 1 and other minor review comments (Umesh) v3: Add 32 bit build v4: Add kernel doc for non-static functions (Michal) Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Acked-by: José Roberto de Souza <jose.souza@intel.com> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240618014609.3233427-6-ashutosh.dixit@intel.com	2024-06-18 12:40:31 -07:00
Ashutosh Dixit	a9f905ae7b	drm/xe/oa/uapi: Initialize OA units Initialize OA unit data struct's for each gt during device probe. Also assign OA units for hardware engines. v2: Remove XE_OA_UNIT_OAG/XE_OA_UNIT_OAM_SAMEDIA_0 enum (Umesh) Change mtl_oa_base to 0x13000 (Umesh) v3: Switch to drmm_ functions and other cleanups (Michal) Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Acked-by: José Roberto de Souza <jose.souza@intel.com> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240618014609.3233427-5-ashutosh.dixit@intel.com	2024-06-18 12:40:29 -07:00
Ashutosh Dixit	67977882a2	drm/xe/oa/uapi: Add OA data formats Add and initialize supported OA data formats for various platforms (including Xe2). User can request OA data in any supported format. Bspec: 52198, 60942, 61101 v2: Start 'xe_oa_format_name' enum from 0 (Umesh) Fix error rewind with OA (Umesh) v3: Use graphics versions rather than absolute platform names v4: Add missing kernel doc for struct memebers and enum and other minor changes (Michal) Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Acked-by: José Roberto de Souza <jose.souza@intel.com> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240618014609.3233427-4-ashutosh.dixit@intel.com	2024-06-18 12:40:27 -07:00
Ashutosh Dixit	52c2e956dc	drm/xe/perf/uapi: "Perf" layer to support multiple perf counter stream types In Xe, the plan is to support multiple types of perf counter streams (OA is only one type of these streams). Rather than introduce NxM ioctls for these (N perf streams with M ioctl's per perf stream), we decide to multiplex these (N different stream types and the M ops for each of these stream types) through a single PERF ioctl. This multiplexing is the purpose of the PERF layer. In addition to PERF DRM ioctl's, another set of ioctl's on the PERF fd are defined. These are expected to be common to different PERF stream types and therefore defined at the PERF layer itself. v2: Add param_size to 'struct drm_xe_perf_param' (Umesh) v3: Rename 'enum drm_xe_perf_ops' to 'enum drm_xe_perf_ioctls' (Guy Zadicario) Add DRM_ prefix to ioctl names to indicate uapi names v4: Add 'enum drm_xe_perf_op' previously missed out (Guy Zadicario) v5: Squash the ops and PERF layer patches into a single patch (Umesh) Remove param_size from struct 'drm_xe_perf_param' (Umesh) v6: Add DRM_XE_PERF_IOCTL_STATUS v7: Add DRM_XE_PERF_IOCTL_INFO v8: Fix Copyright years, fix DRM_XE_PERF_TYPE_MAX, move '#include "xe_perf.h"' to xe_perf.c, add kernel doc (Michal) Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Acked-by: Guy Zadicario <gzadicario@habana.ai> Acked-by: José Roberto de Souza <jose.souza@intel.com> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240618014609.3233427-2-ashutosh.dixit@intel.com	2024-06-18 12:40:24 -07:00

1 2 3 4

151 Commits