linux

mirror of https://github.com/torvalds/linux.git synced 2026-04-18 14:53:58 -04:00

Author	SHA1	Message	Date
Matthew Auld	bfe9e314d7	drm/xe: always keep track of remap prev/next During 3D workload, user is reporting hitting: [ 413.361679] WARNING: drivers/gpu/drm/xe/xe_vm.c:1217 at vm_bind_ioctl_ops_unwind+0x1e2/0x2e0 [xe], CPU#7: vkd3d_queue/9925 [ 413.361944] CPU: 7 UID: 1000 PID: 9925 Comm: vkd3d_queue Kdump: loaded Not tainted 7.0.0-070000rc3-generic #202603090038 PREEMPT(lazy) [ 413.361949] RIP: 0010:vm_bind_ioctl_ops_unwind+0x1e2/0x2e0 [xe] [ 413.362074] RSP: 0018:ffffd4c25c3df930 EFLAGS: 00010282 [ 413.362077] RAX: 0000000000000000 RBX: ffff8f3ee817ed10 RCX: 0000000000000000 [ 413.362078] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 [ 413.362079] RBP: ffffd4c25c3df980 R08: 0000000000000000 R09: 0000000000000000 [ 413.362081] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8f41fbf99380 [ 413.362082] R13: ffff8f3ee817e968 R14: 00000000ffffffef R15: ffff8f43d00bd380 [ 413.362083] FS: 00000001040ff6c0(0000) GS:ffff8f4696d89000(0000) knlGS:00000000330b0000 [ 413.362085] CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033 [ 413.362086] CR2: 00007ddfc4747000 CR3: 00000002e6262005 CR4: 0000000000f72ef0 [ 413.362088] PKRU: 55555554 [ 413.362089] Call Trace: [ 413.362092] <TASK> [ 413.362096] xe_vm_bind_ioctl+0xa9a/0xc60 [xe] Which seems to hint that the vma we are re-inserting for the ops unwind is either invalid or overlapping with something already inserted in the vm. It shouldn't be invalid since this is a re-insertion, so must have worked before. Leaving the likely culprit as something already placed where we want to insert the vma. Following from that, for the case where we do something like a rebind in the middle of a vma, and one or both mapped ends are already compatible, we skip doing the rebind of those vma and set next/prev to NULL. As well as then adjust the original unmap va range, to avoid unmapping the ends. However, if we trigger the unwind path, we end up with three va, with the two ends never being removed and the original va range in the middle still being the shrunken size. If this occurs, one failure mode is when another unwind op needs to interact with that range, which can happen with a vector of binds. For example, if we need to re-insert something in place of the original va. In this case the va is still the shrunken version, so when removing it and then doing a re-insert it can overlap with the ends, which were never removed, triggering a warning like above, plus leaving the vm in a bad state. With that, we need two things here: 1) Stop nuking the prev/next tracking for the skip cases. Instead relying on checking for skip prev/next, where needed. That way on the unwind path, we now correctly remove both ends. 2) Undo the unmap va shrinkage, on the unwind path. With the two ends now removed the unmap va should expand back to the original size again, before re-insertion. v2: - Update the explanation in the commit message, based on an actual IGT of triggering this issue, rather than conjecture. - Also undo the unmap shrinkage, for the skip case. With the two ends now removed, the original unmap va range should expand back to the original range. v3: - Track the old start/range separately. vma_size/start() uses the va info directly. Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/7602 Fixes: `8f33b4f054` ("drm/xe: Avoid doing rebinds") Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: <stable@vger.kernel.org> # v6.8+ Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20260318100208.78097-2-matthew.auld@intel.com (cherry picked from commit `aec6969f75`) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2026-03-25 08:05:33 -04:00
Matt Roper	56781a4597	drm/xe: Implement recent spec updates to Wa_16025250150 The hardware teams noticed that the originally documented workaround steps for Wa_16025250150 may not be sufficient to fully avoid a hardware issue. The workaround documentation has been augmented to suggest programming one additional register; make the corresponding change in the driver. Fixes: `7654d51f1f` ("drm/xe/xe2hpg: Add Wa_16025250150") Reviewed-by: Matt Atwood <matthew.s.atwood@intel.com> Link: https://patch.msgid.link/20260319-wa_16025250150_part2-v1-1-46b1de1a31b2@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com> (cherry picked from commit `a31566762d`) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2026-03-24 09:29:10 -04:00
Michał Winiarski	87997b6c65	drm/xe/pf: Fix use-after-free in migration restore When an error is returned from xe_sriov_pf_migration_restore_produce(), the data pointer is not set to NULL, which can trigger use-after-free in subsequent .write() calls. Set the pointer to NULL upon error to fix the problem. Fixes: `1ed30397c0` ("drm/xe/pf: Add support for encap/decap of bitstream to/from packet") Reported-by: Sebastian Österlund <sebastian.osterlund@intel.com> Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/7230 Reviewed-by: Shuicheng Lin <shuicheng.lin@intel.com> Link: https://patch.msgid.link/20260217154118.176902-1-michal.winiarski@intel.com Signed-off-by: Michał Winiarski <michal.winiarski@intel.com> (cherry picked from commit `4f53d8c6d2`) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2026-03-23 10:10:50 -04:00
Sanjay Yadav	65d046b2d8	drm/xe: Fix missing runtime PM reference in ccs_mode_store ccs_mode_store() calls xe_gt_reset() which internally invokes xe_pm_runtime_get_noresume(). That function requires the caller to already hold an outer runtime PM reference and warns if none is held: [46.891177] xe 0000:03:00.0: [drm] Missing outer runtime PM protection [46.891178] WARNING: drivers/gpu/drm/xe/xe_pm.c:885 at xe_pm_runtime_get_noresume+0x8b/0xc0 Fix this by protecting xe_gt_reset() with the scope-based guard(xe_pm_runtime)(xe), which is the preferred form when the reference lifetime matches a single scope. v2: - Use scope-based guard(xe_pm_runtime)(xe) (Shuicheng) - Update commit message accordingly Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/7593 Fixes: `480b358e7d` ("drm/xe: Do not wake device during a GT reset") Cc: <stable@vger.kernel.org> # v6.19+ Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Shuicheng Lin <shuicheng.lin@intel.com> Suggested-by: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Sanjay Yadav <sanjay.kumar.yadav@intel.com> Reviewed-by: Shuicheng Lin <shuicheng.lin@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Link: https://patch.msgid.link/20260313071608.3459480-2-sanjay.kumar.yadav@intel.com (cherry picked from commit `7937ea733f`) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>	2026-03-19 18:05:04 +01:00
Matthew Brost	01f2557aa6	drm/xe: Open-code GGTT MMIO access protection GGTT MMIO access is currently protected by hotplug (drm_dev_enter), which works correctly when the driver loads successfully and is later unbound or unloaded. However, if driver load fails, this protection is insufficient because drm_dev_unplug() is never called. Additionally, devm release functions cannot guarantee that all BOs with GGTT mappings are destroyed before the GGTT MMIO region is removed, as some BOs may be freed asynchronously by worker threads. To address this, introduce an open-coded flag, protected by the GGTT lock, that guards GGTT MMIO access. The flag is cleared during the dev_fini_ggtt devm release function to ensure MMIO access is disabled once teardown begins. Cc: stable@vger.kernel.org Fixes: `919bb54e98` ("drm/xe: Fix missing runtime outer protection for ggtt_remove_node") Reviewed-by: Zhanjun Dong <zhanjun.dong@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20260310225039.1320161-8-zhanjun.dong@intel.com (cherry picked from commit `4f3a998a17`) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>	2026-03-19 17:13:54 +01:00
Umesh Nerlige Ramappa	e6e3ea52bf	drm/xe/lrc: Fix uninitialized new_ts when capturing context timestamp Getting engine specific CTX TIMESTAMP register can fail. In that case, if the context is active, new_ts is uninitialized. Fix that case by initializing new_ts to the last value that was sampled in SW - lrc->ctx_timestamp. Flagged by static analysis. v2: Fix new_ts initialization (Ashutosh) Fixes: `bb63e7257e` ("drm/xe: Avoid toggling schedule state to check LRC timestamp in TDR") Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Link: https://patch.msgid.link/20260312125308.3126607-2-umesh.nerlige.ramappa@intel.com (cherry picked from commit `466e75d480`) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>	2026-03-19 14:23:04 +01:00
Ashutosh Dixit	9be6fd9fbd	drm/xe/oa: Allow reading after disabling OA stream Some OA data might be present in the OA buffer when OA stream is disabled. Allow UMD's to retrieve this data, so that all data till the point when OA stream is disabled can be retrieved. v2: Update tail pointer after disable (Umesh) Fixes: `efb315d0a0` ("drm/xe/oa/uapi: Read file_operation") Cc: stable@vger.kernel.org Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Reviewed-by: Umesh Nerlige Ramappa<umesh.nerlige.ramappa@intel.com> Link: https://patch.msgid.link/20260313053630.3176100-1-ashutosh.dixit@intel.com (cherry picked from commit `4ff57c5e8d`) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>	2026-03-19 14:22:58 +01:00
Brian Nguyen	38b8dcde23	drm/xe: Skip over non leaf pte for PRL generation The check using xe_child->base.children was insufficient in determining if a pte was a leaf node. So explicitly skip over every non-leaf pt and conditionally abort if there is a scenario where a non-leaf pt is interleaved between leaf pt, which results in the page walker skipping over some leaf pt. Note that the behavior being targeted for abort is PD[0] = 2M PTE PD[1] = PT -> 512 4K PTEs PD[2] = 2M PTE results in abort, page walker won't descend PD[1]. With new abort, ensuring valid PRL before handling a second abort. v2: - Revert to previous assert. - Revised non-leaf handling for interleaf child pt and leaf pte. - Update comments to specifications. (Stuart) - Remove unnecessary XE_PTE_PS64. (Matthew B) v3: - Modify secondary abort to only check non-leaf PTEs. (Matthew B) Fixes: `b912138df2` ("drm/xe: Create page reclaim list on unbind") Signed-off-by: Brian Nguyen <brian3.nguyen@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Cc: Stuart Summers <stuart.summers@intel.com> Link: https://patch.msgid.link/20260305171546.67691-6-brian3.nguyen@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com> (cherry picked from commit `1d12358752`) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>	2026-03-19 14:22:53 +01:00
Zhanjun Dong	7838dd8367	drm/xe/guc: Ensure CT state transitions via STOP before DISABLED The GuC CT state transition requires moving to the STOP state before entering the DISABLED state. Update the driver teardown sequence to make the proper state machine transitions. Fixes: `ee4b32220a` ("drm/xe/guc: Add devm release action to safely tear down CT") Cc: stable@vger.kernel.org Signed-off-by: Zhanjun Dong <zhanjun.dong@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20260310225039.1320161-6-zhanjun.dong@intel.com (cherry picked from commit `dace8cb003`) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>	2026-03-19 14:22:39 +01:00
Matthew Brost	e0f82655df	drm/xe: Trigger queue cleanup if not in wedged mode 2 The intent of wedging a device is to allow queues to continue running only in wedged mode 2. In other modes, queues should initiate cleanup and signal all remaining fences. Fix xe_guc_submit_wedge to correctly clean up queues when wedge mode != 2. Fixes: `7dbe8af13c` ("drm/xe: Wedge the entire device") Cc: stable@vger.kernel.org Reviewed-by: Zhanjun Dong <zhanjun.dong@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20260310225039.1320161-4-zhanjun.dong@intel.com (cherry picked from commit `e25ba41c82`) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>	2026-03-19 14:22:33 +01:00
Matthew Brost	fb3738693c	drm/xe: Forcefully tear down exec queues in GuC submit fini In GuC submit fini, forcefully tear down any exec queues by disabling CTs, stopping the scheduler (which cleans up lost G2H), killing all remaining queues, and resuming scheduling to allow any remaining cleanup actions to complete and signal any remaining fences. Split guc_submit_fini into device related and software only part. Using device-managed and drm-managed action guarantees the correct ordering of cleanup. Fixes: `dd08ebf6c3` ("drm/xe: Introduce a new DRM driver for Intel GPUs") Cc: stable@vger.kernel.org Reviewed-by: Zhanjun Dong <zhanjun.dong@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20260310225039.1320161-3-zhanjun.dong@intel.com (cherry picked from commit `a6ab444a11`) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>	2026-03-19 14:22:28 +01:00
Matthew Brost	26c638d560	drm/xe: Always kill exec queues in xe_guc_submit_pause_abort xe_guc_submit_pause_abort is intended to be called after something disastrous occurs (e.g., VF migration fails, device wedging, or driver unload) and should immediately trigger the teardown of remaining submission state. With that, kill any remaining queues in this function. Fixes: `7c4b7e34c8` ("drm/xe/vf: Abort VF post migration recovery on failure") Cc: stable@vger.kernel.org Signed-off-by: Zhanjun Dong <zhanjun.dong@intel.com> Reviewed-by: Stuart Summers <stuart.summers@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20260310225039.1320161-2-zhanjun.dong@intel.com (cherry picked from commit `78f3bf00be`) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>	2026-03-19 14:22:22 +01:00
Daniele Ceraolo Spurio	9b72283ec9	drm/xe/guc: Fail immediately on GuC load error By using the same variable for both the return of poll_timeout_us and the return of the polled function guc_wait_ucode, the return value of the latter is overwritten and lost after exiting the polling loop. Since guc_wait_ucode returns -1 on GuC load failure, we lose that information and always continue as if the GuC had been loaded correctly. This is fixed by simply using 2 separate variables. Fixes: `a4916b4da4` ("drm/xe/guc: Refactor GuC load to use poll_timeout_us()") Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Link: https://patch.msgid.link/20260303001732.2540493-2-daniele.ceraolospurio@intel.com (cherry picked from commit `c85ec5c575`) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>	2026-03-19 14:22:17 +01:00
Varun Gupta	0cfe9c4838	drm/xe: Fix memory leak in xe_vm_madvise_ioctl When check_bo_args_are_sane() validation fails, jump to the new free_vmas cleanup label to properly free the allocated resources. This ensures proper cleanup in this error path. Fixes: `293032eec4` ("drm/xe/bo: Update atomic_access attribute on madvise") Cc: stable@vger.kernel.org # v6.18+ Reviewed-by: Shuicheng Lin <shuicheng.lin@intel.com> Signed-off-by: Varun Gupta <varun.gupta@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20260223175145.1532801-1-varun.gupta@intel.com Signed-off-by: Tejas Upadhyay <tejas.upadhyay@intel.com> (cherry picked from commit `29bd06faf7`) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2026-03-04 08:54:19 -05:00
Shuicheng Lin	3091723785	drm/xe/reg_sr: Fix leak on xa_store failure Free the newly allocated entry when xa_store() fails to avoid a memory leak on the error path. v2: use goto fail_free. (Bala) Fixes: `e5283bd4df` ("drm/xe/reg_sr: Remove register pool") Cc: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patch.msgid.link/20260204172810.1486719-2-shuicheng.lin@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com> (cherry picked from commit `6bc6fec71a`) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2026-03-04 08:54:19 -05:00
Matt Roper	89865e6dc8	drm/xe/xe2_hpg: Correct implementation of Wa_16025250150 Wa_16025250150 asks us to set five register fields of the register to 0x1 each. However we were just OR'ing this into the existing register value (which has a default of 0x4 for each nibble-sized field) resulting in final field values of 0x5 instead of the desired 0x1. Correct the RTP programming (use FIELD_SET instead of SET) to ensure each field is assigned to exactly the value we want. Cc: Aradhya Bhatia <aradhya.bhatia@intel.com> Cc: Tejas Upadhyay <tejas.upadhyay@intel.com> Cc: stable@vger.kernel.org # v6.16+ Fixes: `7654d51f1f` ("drm/xe/xe2hpg: Add Wa_16025250150") Reviewed-by: Ngai-Mint Kwan <ngai-mint.kwan@linux.intel.com> Link: https://patch.msgid.link/20260227164341.3600098-2-matthew.d.roper@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com> (cherry picked from commit `d139209ef8`) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2026-03-04 08:54:18 -05:00
Zhanjun Dong	b3368ecca9	drm/xe/gsc: Fix GSC proxy cleanup on early initialization failure xe_gsc_proxy_remove undoes what is done in both xe_gsc_proxy_init and xe_gsc_proxy_start; however, if we fail between those 2 calls, it is possible that the HW forcewake access hasn't been initialized yet and so we hit errors when the cleanup code tries to write GSC register. To avoid that, split the cleanup in 2 functions so that the HW cleanup is only called if the HW setup was completed successfully. Since the HW cleanup (interrupt disabling) is now removed from xe_gsc_proxy_remove, the cleanup on error paths in xe_gsc_proxy_start must be updated to disable interrupts before returning. Fixes: `ff6cd29b69` ("drm/xe: Cleanup unwind of gt initialization") Signed-off-by: Zhanjun Dong <zhanjun.dong@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://patch.msgid.link/20260220225308.101469-1-zhanjun.dong@intel.com (cherry picked from commit `2b37c401b2`) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2026-03-04 08:54:18 -05:00
Tomasz Lis	99f9b5343c	drm/xe/queue: Call fini on exec queue creation fail Every call to queue init should have a corresponding fini call. Skipping this would mean skipping removal of the queue from GuC list (which is part of guc_id allocation). A damaged queue stored in exec_queue_lookup list would lead to invalid memory reference, sooner or later. Call fini to free guc_id. This must be done before any internal LRCs are freed. Since the finalization with this extra call became very similar to __xe_exec_queue_fini(), reuse that. To make this reuse possible, alter xe_lrc_put() so it can survive NULL parameters, like other similar functions. v2: Reuse _xe_exec_queue_fini(). Make xe_lrc_put() aware of NULLs. Fixes: `3c1fa4aa60` ("drm/xe: Move queue init before LRC creation") Signed-off-by: Tomasz Lis <tomasz.lis@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> (v1) Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://patch.msgid.link/20260226212701.2937065-2-tomasz.lis@intel.com (cherry picked from commit `393e5fea6f`) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2026-03-02 11:12:40 -05:00
Shuicheng Lin	e377182f02	drm/xe/configfs: Free ctx_restore_mid_bb in release ctx_restore_mid_bb memory is allocated in wa_bb_store(), but xe_config_device_release() only frees ctx_restore_post_bb. Free ctx_restore_mid_bb[0].cs as well to avoid leaking the allocation when the configfs device is removed. Fixes: `b30d5de3d4` ("drm/xe/configfs: Add mid context restore bb") Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com> Reviewed-by: Nitin Gote <nitin.r.gote@intel.com> Link: https://patch.msgid.link/20260225013448.3547687-2-shuicheng.lin@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com> (cherry picked from commit `a235e7d009`) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2026-03-02 11:12:34 -05:00
Matthew Brost	cdc8a1e11f	drm/xe: Do not preempt fence signaling CS instructions If a batch buffer is complete, it makes little sense to preempt the fence signaling instructions in the ring, as the largest portion of the work (the batch buffer) is already done and fence signaling consists of only a few instructions. If these instructions are preempted, the GuC would need to perform a context switch just to signal the fence, which is costly and delays fence signaling. Avoid this scenario by disabling preemption immediately after the BB start instruction and re-enabling it after executing the fence signaling instructions. Fixes: `dd08ebf6c3` ("drm/xe: Introduce a new DRM driver for Intel GPUs") Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Carlos Santa <carlos.santa@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://patch.msgid.link/20260115004546.58060-1-matthew.brost@intel.com (cherry picked from commit `2bcbf2dcde`) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2026-03-02 11:12:28 -05:00
Shuicheng Lin	0879c3f04f	drm/xe/sync: Fix user fence leak on alloc failure When dma_fence_chain_alloc() fails, properly release the user fence reference to prevent a memory leak. Fixes: `0995c2fc39` ("drm/xe: Enforce correct user fence signaling order using") Cc: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20260219233516.2938172-6-shuicheng.lin@intel.com (cherry picked from commit `a5d5634cde`) Cc: stable@vger.kernel.org Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2026-02-23 16:48:43 -05:00
Shuicheng Lin	1bfd757509	drm/xe/sync: Cleanup partially initialized sync on parse failure xe_sync_entry_parse() can allocate references (syncobj, fence, chain fence, or user fence) before hitting a later failure path. Several of those paths returned directly, leaving partially initialized state and leaking refs. Route these error paths through a common free_sync label and call xe_sync_entry_cleanup(sync) before returning the error. Fixes: `dd08ebf6c3` ("drm/xe: Introduce a new DRM driver for Intel GPUs") Cc: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20260219233516.2938172-5-shuicheng.lin@intel.com (cherry picked from commit `f939bdd920`) Cc: stable@vger.kernel.org Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2026-02-23 16:48:29 -05:00
Matt Roper	43d37df67f	drm/xe/wa: Steer RMW of MCR registers while building default LRC When generating the default LRC, if a register is not masked, we apply any save-restore programming necessary via a read-modify-write sequence that will ensure we only update the relevant bits/fields without clobbering the rest of the register. However some of the registers that need to be updated might be MCR registers which require steering to a non-terminated instance to ensure we can read back a valid, non-zero value. The steering of reads originating from a command streamer is controlled by register CS_MMIO_GROUP_INSTANCE_SELECT. Emit additional MI_LRI commands to update the steering before any RMW of an MCR register to ensure the reads are performed properly. Note that needing to perform a RMW of an MCR register while building the default LRC is pretty rare. Most of the MCR registers that are part of an engine's LRCs are also masked registers, so no MCR is necessary. Fixes: `f2f90989cc` ("drm/xe: Avoid reading RMW registers in emit_wa_job") Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com> Link: https://patch.msgid.link/20260206223058.387014-2-matthew.d.roper@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com> (cherry picked from commit `6c2e331c91`) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2026-02-23 13:54:48 -05:00
Linus Torvalds	32a92f8c89	Convert more 'alloc_obj' cases to default GFP_KERNEL arguments This converts some of the visually simpler cases that have been split over multiple lines. I only did the ones that are easy to verify the resulting diff by having just that final GFP_KERNEL argument on the next line. Somebody should probably do a proper coccinelle script for this, but for me the trivial script actually resulted in an assertion failure in the middle of the script. I probably had made it a bit _too_ trivial. So after fighting that far a while I decided to just do some of the syntactically simpler cases with variations of the previous 'sed' scripts. The more syntactically complex multi-line cases would mostly really want whitespace cleanup anyway. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2026-02-21 20:03:00 -08:00
Linus Torvalds	323bbfcf1e	Convert 'alloc_flex' family to use the new default GFP_KERNEL argument This is the exact same thing as the 'alloc_obj()' version, only much smaller because there are a lot fewer users of the alloc_flex() interface. As with alloc_obj() version, this was done entirely with mindless brute force, using the same script, except using 'flex' in the pattern rather than 'objs'. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2026-02-21 17:09:51 -08:00
Linus Torvalds	bf4afc53b7	Convert 'alloc_obj' family to use the new default GFP_KERNEL argument This was done entirely with mindless brute force, using git grep -l '\<k[vmz]alloc_objs(., GFP_KERNEL)' \| xargs sed -i 's/\(alloc_objs(.*\), GFP_KERNEL)/\1)/' to convert the new alloc_obj() users that had a simple GFP_KERNEL argument to just drop that argument. Note that due to the extreme simplicity of the scripting, any slightly more complex cases spread over multiple lines would not be triggered: they definitely exist, but this covers the vast bulk of the cases, and the resulting diff is also then easier to check automatically. For the same reason the 'flex' versions will be done as a separate conversion. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2026-02-21 17:09:51 -08:00
Kees Cook	69050f8d6d	treewide: Replace kmalloc with kmalloc_obj for non-scalar types This is the result of running the Coccinelle script from scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to avoid scalar types (which need careful case-by-case checking), and instead replace kmalloc-family calls that allocate struct or union object instances: Single allocations: kmalloc(sizeof(TYPE), ...) are replaced with: kmalloc_obj(TYPE, ...) Array allocations: kmalloc_array(COUNT, sizeof(TYPE), ...) are replaced with: kmalloc_objs(TYPE, COUNT, ...) Flex array allocations: kmalloc(struct_size(PTR, FAM, COUNT), ...) are replaced with: kmalloc_flex(PTR, FAM, COUNT, ...) (where TYPE may also be VAR) The resulting allocations no longer return "void ", instead returning "TYPE ". Signed-off-by: Kees Cook <kees@kernel.org>	2026-02-21 01:02:28 -08:00
Linus Torvalds	d4a292c5f8	Merge tag 'drm-next-2026-02-21' of https://gitlab.freedesktop.org/drm/kernel Pull drm fixes from Dave Airlie: "This is the fixes and cleanups for the end of the merge window, it's nearly all amdgpu, with some amdkfd, then a pagemap core fix, i915/xe display fixes, and some xe driver fixes. Nothing seems out of the ordinary, except amdgpu is a little more volume than usual. pagemap: - drm/pagemap: pass pagemap_addr by reference amdgpu: - DML 2.1 fixes - Panel replay fixes - Display writeback fixes - MES 11 old firmware compat fix - DC CRC improvements - DPIA fixes - XGMI fixes - ASPM fix - SMU feature bit handling fixes - DC LUT fixes - RAS fixes - Misc memory leak in error path fixes - SDMA queue reset fixes - PG handling fixes - 5 level GPUVM page table fix - SR-IOV fix - Queue reset fix - SMU 13.x fixes - DC resume lag fix - MPO fixes - DCN 3.6 fix - VSDB fixes - HWSS clean up - Replay fixes - DCE cursor fixes - DCN 3.5 SR DDR5 latency fixes - HPD fixes - Error path unwind fixes - SMU13/14 mode1 reset fixes - PSP 15 updates - SMU 15 updates - Sync fix in amdgpu_dma_buf_move_notify() - HAINAN fix - PSP 13.x fix - GPUVM locking fix - Fixes for DC analog support - DC FAMS fixes - DML 2.1 fixes - eDP fixes - Misc DC fixes - Fastboot fix - 3DLUT fixes - GPUVM fixes - 64bpp format fix - Fix for MacBooks with switchable gfx amdkfd: - Fix possible double deletion of validate list - Event setup fix - Device disconnect regression fix - APU GTT as VRAM fix - Fix piority inversion with MQDs - NULL check fix radeon: - HAINAN fix i915/xe display: - Regresion fix for HDR 4k displays (#15503) - Fixup for Dell XPS 13 7390 eDP rate limit - Memory leak fix on ACPI _DSM handling - Add missing slice count check during DP mode validation xe: - drm/xe: Prevent VFs from exposing the CCS mode sysfs file - SRIOV related fixes - PAT cache fix - MMIO read fix - W/a fixes - Adjust type of xe_modparam.force_vram_bar_size - Wedge mode fix - HWMon fix * tag 'drm-next-2026-02-21' of https://gitlab.freedesktop.org/drm/kernel: (143 commits) drm/amd/display: Remove unneeded DAC link encoder register drm/amd/display: Enable DAC in DCE link encoder drm/amd/display: Set CRTC source for DAC using registers drm/amd/display: Initialize DAC in DCE link encoder using VBIOS drm/amd/display: Turn off DAC in DCE link encoder using VBIOS drm/amd/display: Don't call find_analog_engine() twice drm/amdgpu: fix 4-level paging if GMC supports 57-bit VA v2 drm/amdgpu: keep vga memory on MacBooks with switchable graphics drm/amdgpu: Set atomics to true for xgmi drm/amdkfd: Check for NULL return values drm/amd/display: Use same max plane scaling limits for all 64 bpp formats drm/amdgpu: Set vmid0 PAGE_TABLE_DEPTH for GFX12.1 drm/amdkfd: Disable MQD queue priority drm/amd/display: Remove conditional for shaper 3DLUT power-on drm/amd/display: Check return of shaper curve to HW format drm/amd/display: Correct logic check error for fastboot drm/amd/display: Skip eDP detection when no sink Revert "drm/amd/display: Add Gfx Base Case For Linear Tiling Handling" Revert "drm/amd/display: Correct hubp GfxVersion verification" Revert "drm/amd/display: Add Handling for gfxversion DcGfxBase" ...	2026-02-20 15:36:38 -08:00
Nareshkumar Gollakoti	2d01d88a53	drm/xe: Prevent VFs from exposing the CCS mode sysfs file Skip creating CCS sysfs files in VF mode to ensure VFs do not try to change CCS mode, as it is predefined and immutable in the SR-IOV mode. Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Signed-off-by: Nareshkumar Gollakoti <naresh.kumar.g@intel.com> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://patch.msgid.link/20260202170810.1393147-5-naresh.kumar.g@intel.com (cherry picked from commit `4e8f602ac3`) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2026-02-17 19:42:00 -05:00
Karthik Poosa	93d08a1cfc	drm/xe/hwmon: Prevent unintended VRAM channel creation Remove the unnecessary VRAM channel entry introduced in xe_hwmon_channel. Without this, adding any new hwmon channel causes extra VRAM channel to appear. This remained unnoticed earlier because VRAM was the final xe hwmon channel. v2: Use MAX_VRAM_CHANNELS with in_range() instead of CHANNEL_VRAM_N_MAX. (Raag) Fixes: `49a4983384` ("drm/xe/hwmon: Expose individual VRAM channel temperature") Signed-off-by: Karthik Poosa <karthik.poosa@intel.com> Reviewed-by: Raag Jadav <raag.jadav@intel.com> Link: https://patch.msgid.link/20260206081655.2115439-1-karthik.poosa@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> (cherry picked from commit `48eb073c7d`) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2026-02-17 19:39:50 -05:00
Arnd Bergmann	b61d565166	drm/pagemap: pass pagemap_addr by reference Passing a structure by value into a function is sometimes problematic, for a number of reasons. Of of these is a warning from the 32-bit arm compiler: drivers/gpu/drm/drm_gpusvm.c: In function '__drm_gpusvm_unmap_pages': drivers/gpu/drm/drm_gpusvm.c:1152:33: note: parameter passing for argument of type 'struct drm_pagemap_addr' changed in GCC 9.1 1152 \| dpagemap->ops->device_unmap(dpagemap, \| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 1153 \| dev, *addr); \| ~~~~~~~~~~~ This particular problem is harmless since we are not mixing compiler versions inside of the compiler. However, passing this by reference avoids the warning along with providing slightly better calling conventions as it avoids an extra copy on the stack. Fixes: `75af93b3f5` ("drm/pagemap, drm/xe: Support destination migration over interconnect") Fixes: `2df55d9e66` ("drm/xe: Support pcie p2p dma as a fast interconnect") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://patch.msgid.link/20260216134644.1025365-1-arnd@kernel.org Acked-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> (cherry picked from commit `95162db020`) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2026-02-17 19:39:44 -05:00
Raag Jadav	4e83a8d58e	drm/xe/bo: Redirect faults to dummy page for wedged device As per uapi documentation[1], the prerequisite for wedged device is to redirected page faults to a dummy page. Follow it. [1] Documentation/gpu/drm-uapi.rst v2: Add uapi reference and fixes tag (Matthew Brost) Fixes: `7bc00751f8` ("drm/xe: Use device wedged event") Signed-off-by: Raag Jadav <raag.jadav@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20260212055622.2054991-1-raag.jadav@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com> (cherry picked from commit `c020fff70d`) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2026-02-17 19:39:38 -05:00
Shuicheng Lin	1acec6ef05	drm/xe: Make xe_modparam.force_vram_bar_size signed vram_bar_size is registered as an int module parameter and is documented to accept negative values to disable BAR resizing. Store it as an int in xe_modparam as well, so negative values work as intended and the module_param type matches. Fixes: `80742a1aa2` ("drm/xe: Allow to drop vram resizing") Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com> Link: https://patch.msgid.link/20260202181853.1095736-2-shuicheng.lin@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com> (cherry picked from commit `25c9aa4dcb`) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2026-02-17 19:39:30 -05:00
Piotr Piórkowski	5e905ec672	drm/xe/vf: Avoid reading media version when media GT is disabled When the media GT is not allowed, a VF must not attempt to read the media version from the GuC. The GuC may not be loaded, and any attempt to communicate with it would result in a timeout and a VF probe failure: (...) [ 1912.406046] xe 0000:01:00.1: [drm] ERROR Tile0: GT1: GuC mmio request 0x5507: no reply 0x5507 [ 1912.407277] xe 0000:01:00.1: [drm] ERROR Tile0: GT1: [GUC COMMUNICATION] MMIO send failed (-ETIMEDOUT) [ 1912.408689] xe 0000:01:00.1: [drm] ERROR VF: Tile0: GT1: Failed to reset GuC state (-ETIMEDOUT) [ 1912.413986] xe 0000:01:00.1: probe with driver xe failed with error -110 Let's skip reading the media version for VFs when the media GT is not allowed. v2: move the condition directly to the VF path Fixes: `7abd69278b` ("drm/xe/configfs: Add attribute to disable GT types") Signed-off-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Shuicheng Lin <shuicheng.lin@intel.com> Reviewed-by: Shuicheng Lin <shuicheng.lin@intel.com> Link: https://patch.msgid.link/20260202115041.2863357-1-piotr.piorkowski@intel.com Signed-off-by: Michał Winiarski <michal.winiarski@intel.com> (cherry picked from commit `0bcacf56dc`) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2026-02-17 19:39:23 -05:00
Matt Roper	bc6387a2e0	drm/xe/xe2_hpg: Fix handling of Wa_14019988906 & Wa_14019877138 The PSS_CHICKEN register has been part of the RCS engine's LRC since it was first introduced in Xe_LP. That means that any workarounds that adjust its value (such as Wa_14019988906 and Wa_14019877138) need to be implemented in the lrc_was[] table so that they become part of the default LRC from which all subsequent LRCs are copied. Although these workarounds were implemented correctly on most platforms, they were incorrectly placed on the engine_was[] table for Xe2_HPG. Move the workarounds to the proper lrc_was[] table and switch the 'xe_rtp_match_first_render_or_compute' rule to specifically match the RCS since that's the engine whose LRC manages the register. Bspec: 65182 Fixes: `7f3ee7d880` ("drm/xe/xe2hpg: Add initial GT workarounds") Reviewed-by: Shekhar Chauhan <shekhar.chauhan@intel.com> Link: https://patch.msgid.link/20260205220508.51905-2-matthew.d.roper@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com> (cherry picked from commit `e04c609eed`) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2026-02-17 19:39:17 -05:00
Shuicheng Lin	4a9b4e1fa5	drm/xe/mmio: Avoid double-adjust in 64-bit reads xe_mmio_read64_2x32() was adjusting register addresses and then calling xe_mmio_read32(), which applies the adjustment again. This may shift accesses twice if adj_offset < adj_limit. There is no issue currently, as for media gt, adj_offset > adj_limit, so the 2nd adjust will be a no-op. But it may not work in future. To fix it, replace the adjusted-address comparison with a direct sanity check that ensures the MMIO address adjustment cutoff never falls within the 8-byte range of a 64-bit register. And let xe_mmio_read32() handle address translation. v2: rewrite the sanity check in a more natural way. (Matt) v3: Add Fixes tag. (Jani) Fixes: `07431945d8` ("drm/xe: Avoid 64-bit register reads") Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Cc: Jani Nikula <jani.nikula@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com> Link: https://patch.msgid.link/20260130165621.471408-2-shuicheng.lin@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com> (cherry picked from commit `a30f999681`) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2026-02-17 19:39:10 -05:00
Jia Yao	fbbe32618e	drm/xe: Add bounds check on pat_index to prevent OOB kernel read in madvise When user provides a bogus pat_index value through the madvise IOCTL, the xe_pat_index_get_coh_mode() function performs an array access without validating bounds. This allows a malicious user to trigger an out-of-bounds kernel read from the xe->pat.table array. The vulnerability exists because the validation in madvise_args_are_sane() directly calls xe_pat_index_get_coh_mode(xe, args->pat_index.val) without first checking if pat_index is within [0, xe->pat.n_entries). Although xe_pat_index_get_coh_mode() has a WARN_ON to catch this in debug builds, it still performs the unsafe array access in production kernels. v2(Matthew Auld) - Using array_index_nospec() to mitigate spectre attacks when the value is used v3(Matthew Auld) - Put the declarations at the start of the block Fixes: `ada7486c56` ("drm/xe: Implement madvise ioctl for xe") Reviewed-by: Matthew Auld <matthew.auld@intel.com> Cc: <stable@vger.kernel.org> # v6.18+ Cc: Matthew Brost <matthew.brost@intel.com> Cc: Shuicheng Lin <shuicheng.lin@intel.com> Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Cc: "Thomas Hellström" <thomas.hellstrom@linux.intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Jia Yao <jia.yao@intel.com> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Link: https://patch.msgid.link/20260205161529.1819276-1-jia.yao@intel.com (cherry picked from commit `944a3329b0`) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2026-02-17 19:39:04 -05:00
Michal Wajdeczko	2a673fb4d7	drm/xe/configfs: Fix 'parameter name omitted' errors On some configs and old compilers we can get following build errors: ../drivers/gpu/drm/xe/xe_configfs.h: In function 'xe_configfs_get_ctx_restore_mid_bb': ../drivers/gpu/drm/xe/xe_configfs.h:40:76: error: parameter name omitted static inline u32 xe_configfs_get_ctx_restore_mid_bb(struct pci_dev pdev, enum xe_engine_class, ^~~~~~~~~~~~~~~~~~~~ ../drivers/gpu/drm/xe/xe_configfs.h: In function 'xe_configfs_get_ctx_restore_post_bb': ../drivers/gpu/drm/xe/xe_configfs.h:42:77: error: parameter name omitted static inline u32 xe_configfs_get_ctx_restore_post_bb(struct pci_dev pdev, enum xe_engine_class, ^~~~~~~~~~~~~~~~~~~~ when trying to define our configfs stub functions. Fix that. Fixes: `7a4756b2fd` ("drm/xe/lrc: Allow to add user commands mid context switch") Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Shuicheng Lin <shuicheng.lin@intel.com> Link: https://patch.msgid.link/20260203193745.576-1-michal.wajdeczko@intel.com (cherry picked from commit `f59cde8a24`) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2026-02-17 19:38:57 -05:00
Michal Wajdeczko	bf7172cd25	drm/xe/pf: Fix sysfs initialization In case of devm_add_action_or_reset() failure the provided cleanup action will be run immediately on the not yet initialized kobject. This may lead to errors like: [ ] kobject: '(null)' (ff110001393608e0): is not initialized, yet kobject_put() is being called. [ ] WARNING: lib/kobject.c:734 at kobject_put+0xd9/0x250, CPU#0: kworker/0:0/9 [ ] RIP: 0010:kobject_put+0xdf/0x250 [ ] Call Trace: [ ] xe_sriov_pf_sysfs_init+0x21/0x100 [xe] [ ] xe_sriov_pf_init_late+0x87/0x2b0 [xe] [ ] xe_sriov_init_late+0x5f/0x2c0 [xe] [ ] xe_device_probe+0x5f2/0xc20 [xe] [ ] xe_pci_probe+0x396/0x610 [xe] [ ] local_pci_probe+0x47/0xb0 [ ] refcount_t: underflow; use-after-free. [ ] WARNING: lib/refcount.c:28 at refcount_warn_saturate+0x68/0xb0, CPU#0: kworker/0:0/9 [ ] RIP: 0010:refcount_warn_saturate+0x68/0xb0 [ ] Call Trace: [ ] kobject_put+0x174/0x250 [ ] xe_sriov_pf_sysfs_init+0x21/0x100 [xe] [ ] xe_sriov_pf_init_late+0x87/0x2b0 [xe] [ ] xe_sriov_init_late+0x5f/0x2c0 [xe] [ ] xe_device_probe+0x5f2/0xc20 [xe] [ ] xe_pci_probe+0x396/0x610 [xe] [ ] local_pci_probe+0x47/0xb0 Fix that by calling kobject_init() and kobject_add() separately and register cleanup action after the kobject is initialized. Also make this cleanup registration a part of the create helper to fix another mistake, as in the loop we were wrongly passing parent kobject while registering cleanup action, and this resulted in some undetected leaks. Fixes: `5c170a4d9c` ("drm/xe/pf: Prepare sysfs for SR-IOV admin attributes") Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Shuicheng Lin <shuicheng.lin@intel.com> Link: https://patch.msgid.link/20260203235332.1350-1-michal.wajdeczko@intel.com (cherry picked from commit `98b16727f0`) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2026-02-17 19:38:50 -05:00
Linus Torvalds	939faf71cf	Merge tag 'drm-next-2026-02-11' of https://gitlab.freedesktop.org/drm/kernel Pull drm updates from Dave Airlie: "Highlights: - amdgpu support for lots of new IP blocks which means newer GPUs - xe has a lot of SR-IOV and SVM improvements - lots of intel display refactoring across i915/xe - msm has more support for gen8 platforms - Given up on kgdb/kms integration, it's too hard on modern hw core: - drop kgdb support - replace system workqueue with percpu - account for property blobs in memcg - MAINTAINERS updates for xe + buddy rust: - Fix documentation for Registration constructors - Use pin_init::zeroed() for fops initialization - Annotate DRM helpers with __rust_helper - Improve safety documentation for gem::Object::new() - Update AlwaysRefCounted imports - mm: Prevent integer overflow in page_align() atomic: - add drm_device pointer to drm_private_obj - introduce gamma/degamma LUT size check buddy: - fix free_trees memory leak - prevent BUG_ON bridge: - introduce drm_bridge_unplug/enter/exit - add connector argument to .hpd_notify - lots of recounting conversions - convert rockchip inno hdmi to bridge - lontium-lt9611uxc: switch to HDMI audio helpers - dw-hdmi-qp: add support for HPD-less setups - Algoltek AG6311 support panels: - edp: CSW MNE007QB3-1, AUO B140HAN06.4, AUO B140QAX01.H - st75751: add SPI support - Sitronix ST7920, Samsung LTL106HL02 - LG LH546WF1-ED01, HannStar HSD156J - BOE NV130WUM-T08 - Innolux G150XGE-L05 - Anbernic RG-DS dma-buf: - improve sg_table debugging - add tracepoints - call clear_page instead of memset - start to introduce cgroup memory accounting in heaps - remove sysfs stats dma-fence: - add new helpers dp: - mst: avoid oob access with vcpi=0 hdmi: - limit infoframes exposure to userspace gem: - reduce page table overhead with THP - fix leak in drm_gem_get_unmapped_area gpuvm: - API sanitation for rust bindings sched: - introduce new helpers panic: - report invalid panic modes - add kunit tests i915/xe display: - Expose sharpness only if num_scalers is >= 2 - Add initial Xe3P_LPD for NVL - BMG FBC support - Add MTL+ platforms to support dpll framework _ fix DIMM_S DRM decoding on ICL - Return to using AUX interrupts - PSR/Panel replay refactoring - use consolidation HDMI tables - Xe3_LPD CD2X dividier changes xe: - vfio: add vfio_pci for intel GPU - multi queue support - dynamic pagemaps and multi-device SVM - expose temp attribs in hwmon - NO_COMPRESSION bo flag - expose MERT OA unit - sysfs survivability refactor - SRIOV PF: add MERT support - enable SR-IOV VF migration - Enable I2C/NVM on Crescent Island - Xe3p page reclaimation support - introduce SRIOV scheduler groups - add SoC remappt support in system controller - insert compiler barriers in GuC code - define NVL GuC firmware - handle GT resume failure - fix drm scheduler layering violations - enable GSC loading and PXP for PTL - disable GuC Power DCC strategy on PTL - unregister drm device on probe error i915: - move to kernel standard fault injection - bump recommended GuC version for DG2 and MTL amdgpu: - SMUIO 15.x, PSP 15.x support - IH 6.1.1/7.1 support - MMHUB 3.4/4.2 support - GC 11.5.4/12.1 support - SDMA 6.1.4/7.1/7.11.4 support - JPEG 5.3 support - UserQ updates - GC 9 gfx queue reset support - TTM memory ops parallelization - convert legacy logging to new helpers - DC analog fixes amdkfd: - GC 11.5.4/12.1 suppport - SDMA 6.1.4/7.1 support - per context support - increase kfd process hash table - Reserved SDMA rework radeon: - convert legacy logging to new helpers - use devm for i2c adapters msm: - GPU - Document a612/RGMU dt bindings - UBWC 6.0 support (for A840 / Kaanapali) - a225 support - DPU: - Switch to use virtual planes by default - Fix DSI CMD panels on DPU 3.x - Rewrite format handling to remove intermediate representation - Fix watchdog on DPU 8.x+ - Fix TE / Vsync source setting on DPU 8.x+ - Add 3D_Mux on SC7280 - Kaanapali platform support - Fix UBWC register programming - Make RM reserve DSPP-enabled mixers for CRTCs with LMs - Gamma correction support - DP: - Enable support for eDP 1.4+ link rate tables - Fix MDSS1 DP indices on SA8775P, making them to work - Fix msm_dp_ctrl_config_msa() to work with LLVM 20 - DSI: - Document QCS8300 as compatible with SA8775P - Kaanapali platform support - DSI PHY: - switch to divider_determine_rate() - MDP5: - Drop support for MSM8998, SDM660 and SDM630 (switch over to DPU) - MDSS: - Kaanapali platform support - Fixed UBWC register programming nova-core: - Prepare for Turing support. This includes parsing and handling Turing-specific firmware headers and sections as well as a Turing Falcon HAL implementation - Get rid of the Result<impl PinInit<T, E>> anti-pattern - Relocate initializer-specific code into the appropriate initializer - Use CStr::from_bytes_until_nul() to remove custom helpers - Improve handling of unexpected firmware values - Clean up redundant debug prints - Replace c_str!() with native Rust C-string literals - Update nova-core task list nova: - Align GEM object size to system page size tyr: - Use generated uAPI bindings for GpuInfo - Replace manual sleeps with read_poll_timeout() - Replace c_str!() with native Rust C-string literals - Suppress warnings for unread fields - Fix incorrect register name in print statement nouveau: - fix big page table support races in PTE management - improve reclocking on tegra 186+ amdxdna: - fix suspend race conditions - improve handling of zero tail pointers - fix cu_idx overwritten during command setup - enable hardware context priority - remove NPU2 support - update message buffer allocation requirements - update firmware version check ast: - support imported cursor buffers - big endian fixes etnaviv: - add PPU flop reset support imagination: - add AM62P support - introduce hw version checks ivpu: - implement warm boot flow panfrost: - add bo sync ioctl - add GPU_PM_RT support for RZ/G3E SoC panthor: - add bo sync ioctl - enable timestamp propagation - scheduler robustness improvements - VM termination fixes - huge page support rockchip: - RK3368 HDMI Support - get rid of atomic_check fixups - RK3506 support - RK3576/RK3588 improved HPD handling rz-du: - RZ/V2H(P) MIPI-DSI Support v3d: - fix DMA segment size - convert to new logging helpers mediatek: - move DP training to hotplug thread - convert logging to new helpers - add support for HS speed DSI - Genio 510/700/1200-EVK, Radxa NIO-12L HDMI support atmel-hlcdc: - switch to drmm resource - support nomodeset - use newer helpers hisilicon: - fix various DP bugs renesas: - fix kernel panic on reboot exynos: - fix vidi_connection_ioctl using wrong device - fix vidi_connection deref user ptr - fix concurrency regression with vidi_context vkms: - add configfs support for display configuration * tag 'drm-next-2026-02-11' of https://gitlab.freedesktop.org/drm/kernel: (1610 commits) drm/xe/pm: Disable D3Cold for BMG only on specific platforms drm/xe: Fix kerneldoc for xe_tlb_inval_job_alloc_dep drm/xe: Fix kerneldoc for xe_gt_tlb_inval_init_early drm/xe: Fix kerneldoc for xe_migrate_exec_queue drm/xe/query: Fix topology query pointer advance drm/xe/guc: Fix kernel-doc warning in GuC scheduler ABI header drm/xe/guc: Fix CFI violation in debugfs access. accel/amdxdna: Move RPM resume into job run function accel/amdxdna: Fix incorrect DPM level after suspend/resume nouveau/vmm: start tracking if the LPT PTE is valid. (v6) nouveau/vmm: increase size of vmm pte tracker struct to u32 (v2) nouveau/vmm: rewrite pte tracker using a struct and bitfields. accel/amdxdna: Fix incorrect error code returned for failed chain command accel/amdxdna: Remove hardware context status drm/bridge: imx8qxp-pixel-combiner: Fix bailout for imx8qxp_pc_bridge_probe() drm/panel: ilitek-ili9882t: Remove duplicate initializers in tianma_il79900a_dsc drm/i915/display: fix the pixel normalization handling for xe3p_lpd drm/exynos: vidi: use ctx->lock to protect struct vidi_context member variables related to memory alloc/free drm/exynos: vidi: fix to avoid directly dereferencing user pointer drm/exynos: vidi: use priv->vidi_dev for ctx lookup in vidi_connection_ioctl() ...	2026-02-11 12:55:44 -08:00
Karthik Poosa	666c654a5a	drm/xe/pm: Disable D3Cold for BMG only on specific platforms Restrict D3Cold disablement for BMG to unsupported NUC platforms, instead of disabling it on all platforms. Signed-off-by: Karthik Poosa <karthik.poosa@intel.com> Fixes: `3e331a6715` ("drm/xe/pm: Temporarily disable D3Cold on BMG") Link: https://patch.msgid.link/20260123173238.1642383-1-karthik.poosa@intel.com Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> (cherry picked from commit `39125eaf88`) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2026-02-05 08:03:58 -05:00
Shuicheng Lin	51cedb93da	drm/xe: Fix kerneldoc for xe_tlb_inval_job_alloc_dep Correct the function name in the kerneldoc. It is for below warning: "Warning: drivers/gpu/drm/xe/xe_tlb_inval_job.c:210 expecting prototype for xe_tlb_inval_alloc_dep(). Prototype was for xe_tlb_inval_job_alloc_dep() instead" Fixes: `15366239e2` ("drm/xe: Decouple TLB invalidations from GT") Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://patch.msgid.link/20260129233834.419977-8-shuicheng.lin@intel.com (cherry picked from commit `9f9c117ac5`) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2026-02-05 08:03:52 -05:00
Shuicheng Lin	904b2e5063	drm/xe: Fix kerneldoc for xe_gt_tlb_inval_init_early Correct the function name in the kerneldoc. It is for below warning: "Warning: drivers/gpu/drm/xe/xe_tlb_inval.c:136 expecting prototype for xe_gt_tlb_inval_init(). Prototype was for xe_gt_tlb_inval_init_early() instead" v2: add () for the function. (Michal) Fixes: `db16f9d90c` ("drm/xe: Split TLB invalidation code in frontend and backend") Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://patch.msgid.link/20260129233834.419977-7-shuicheng.lin@intel.com (cherry picked from commit `0651dbb9d6`) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2026-02-05 08:03:46 -05:00
Shuicheng Lin	5d5ef69549	drm/xe: Fix kerneldoc for xe_migrate_exec_queue Correct the function name in the kerneldoc. It is for below warning: "Warning: drivers/gpu/drm/xe/xe_migrate.c:1262 expecting prototype for xe_get_migrate_exec_queue(). Prototype was for xe_migrate_exec_queue() instead" Fixes: `916ee4704a` ("drm/xe/vf: Register CCS read/write contexts with Guc") Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://patch.msgid.link/20260129233834.419977-6-shuicheng.lin@intel.com (cherry picked from commit `9fd8da7179`) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2026-02-05 08:03:41 -05:00
Shuicheng Lin	8b52d9ba08	drm/xe/query: Fix topology query pointer advance The topology query helper advanced the user pointer by the size of the pointer, not the size of the structure. This can misalign the output blob and corrupt the following mask. Fix the increment to use sizeof(topo). There is no issue currently, as sizeof(topo) happens to be equal to sizeof(topo) on 64-bit systems (both evaluate to 8 bytes). Fixes: `dd08ebf6c3` ("drm/xe: Introduce a new DRM driver for Intel GPUs") Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patch.msgid.link/20260130043907.465128-2-shuicheng.lin@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com> (cherry picked from commit `c2a6859138`) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2026-02-05 08:03:35 -05:00
Chaitanya Kumar Borah	6282995188	drm/xe/guc: Fix kernel-doc warning in GuC scheduler ABI header The GuC scheduler ABI header contains a file-level comment that is not intended to document a kernel-doc symbol. Using kernel-doc comment syntax (/** /) triggers kernel-doc warnings. With "-Werror", this causes the build to fail. Convert the comment to a regular block comment. HDRTEST drivers/gpu/drm/xe/abi/guc_scheduler_abi.h Warning: drivers/gpu/drm/xe/abi/guc_scheduler_abi.h:11 This comment starts with '/', but isn't a kernel-doc comment. Refer to Documentation/doc-guide/kernel-doc.rst Generic defines required for registration with and submissions to the GuC 1 warnings as errors make[6]: * [drivers/gpu/drm/xe/Makefile:377: drivers/gpu/drm/xe/abi/guc_scheduler_abi.hdrtest] Error 3 make[5]: * [scripts/Makefile.build:544: drivers/gpu/drm/xe] Error 2 make[4]: * [scripts/Makefile.build:544: drivers/gpu/drm] Error 2 make[3]: * [scripts/Makefile.build:544: drivers/gpu] Error 2 make[2]: * [scripts/Makefile.build:544: drivers] Error 2 make[1]: * [/home/kbuild2/kernel/Makefile:2088: .] Error 2 make: *** [Makefile:248: __sub-make] Error 2 v2: - Add Fixes tag (Daniele) Fixes: `b0c5cf4f59` ("drm/gt/guc: extract scheduler-related defines from guc_fwif.h") Signed-off-by: Chaitanya Kumar Borah <chaitanya.kumar.borah@intel.com> Reviewed-by: Shuicheng Lin <shuicheng.lin@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Link: https://patch.msgid.link/20260130135210.2659200-1-chaitanya.kumar.borah@intel.com (cherry picked from commit `f89dbe14a0`) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2026-02-05 08:03:30 -05:00
Daniele Ceraolo Spurio	6e035abf98	drm/xe/guc: Fix CFI violation in debugfs access. xe_guc_print_info is void-returning, but the function pointer it is assigned to expects an int-returning function, leading to the following CFI error: [ 206.873690] CFI failure at guc_debugfs_show+0xa1/0xf0 [xe] (target: xe_guc_print_info+0x0/0x370 [xe]; expected type: 0xbe3bc66a) Fix this by updating xe_guc_print_info to return an integer. Fixes: `e15826bb3c` ("drm/xe/guc: Refactor GuC debugfs initialization") Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: George D Sworo <george.d.sworo@intel.com> Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://patch.msgid.link/20260129182547.32899-2-daniele.ceraolospurio@intel.com (cherry picked from commit `dd8ea2f2ab`) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2026-02-05 08:03:25 -05:00
Daniele Ceraolo Spurio	4cb1b32713	drm/xe/guc: Fix CFI violation in debugfs access. xe_guc_print_info is void-returning, but the function pointer it is assigned to expects an int-returning function, leading to the following CFI error: [ 206.873690] CFI failure at guc_debugfs_show+0xa1/0xf0 [xe] (target: xe_guc_print_info+0x0/0x370 [xe]; expected type: 0xbe3bc66a) Fix this by updating xe_guc_print_info to return an integer. Fixes: `e15826bb3c` ("drm/xe/guc: Refactor GuC debugfs initialization") Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: George D Sworo <george.d.sworo@intel.com> Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://patch.msgid.link/20260129182547.32899-2-daniele.ceraolospurio@intel.com (cherry picked from commit `dd8ea2f2ab`) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>	2026-02-05 09:45:22 +01:00
Karthik Poosa	bb36170d95	drm/xe/pm: Disable D3Cold for BMG only on specific platforms Restrict D3Cold disablement for BMG to unsupported NUC platforms, instead of disabling it on all platforms. Signed-off-by: Karthik Poosa <karthik.poosa@intel.com> Fixes: `3e331a6715` ("drm/xe/pm: Temporarily disable D3Cold on BMG") Link: https://patch.msgid.link/20260123173238.1642383-1-karthik.poosa@intel.com Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> (cherry picked from commit `39125eaf88`) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>	2026-02-04 16:41:07 +01:00
Shuicheng Lin	16264a3b59	drm/xe: Fix kerneldoc for xe_tlb_inval_job_alloc_dep Correct the function name in the kerneldoc. It is for below warning: "Warning: drivers/gpu/drm/xe/xe_tlb_inval_job.c:210 expecting prototype for xe_tlb_inval_alloc_dep(). Prototype was for xe_tlb_inval_job_alloc_dep() instead" Fixes: `15366239e2` ("drm/xe: Decouple TLB invalidations from GT") Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://patch.msgid.link/20260129233834.419977-8-shuicheng.lin@intel.com (cherry picked from commit `9f9c117ac5`) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>	2026-02-04 16:40:54 +01:00

1 2 3 4 5 ...

4952 Commits