linux

mirror of https://github.com/torvalds/linux.git synced 2026-04-22 16:53:59 -04:00

Author	SHA1	Message	Date
Thomas Hellström	77c8ede611	drm/xe: Don't copy pinned kernel bos twice on suspend We were copying the bo content the bos on the list "xe->pinned.late.kernel_bo_present" twice on suspend. Presumingly the intent is to copy the pinned external bos on the first pass. This is harmless since we (currently) should have no pinned external bos needing copy since a) exernal system bos don't have compressed content, b) We do not (yet) allow pinning of VRAM bos. Still, fix this up so that we copy pinned external bos on the first pass. We're about to allow bos pinned in VRAM. Fixes: `c6a4d46ec1` ("drm/xe: evict user memory in PM notifier") Cc: Matthew Auld <matthew.auld@intel.com> Cc: <stable@vger.kernel.org> # v6.16+ Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://lore.kernel.org/r/20250918092207.54472-2-thomas.hellstrom@linux.intel.com (cherry picked from commit `9e69bafece`) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2025-09-22 12:13:18 -04:00
Lucas De Marchi	b67e7422d2	drm/xe: Fix build with CONFIG_MODULES=n When building with CONFIG_MODULES=n, the __exit functions are dropped. However our init functions may call them for error handling, so they are not good candidates for the exit sections. Fix this error reported by 0day: ld.lld: error: relocation refers to a symbol in a discarded section: xe_configfs_exit >>> defined in vmlinux.a(drivers/gpu/drm/xe/xe_configfs.o) >>> referenced by xe_module.c >>> drivers/gpu/drm/xe/xe_module.o:(init_funcs) in archive vmlinux.a This is the only exit function using __exit. Drop it to fix the build. Cc: Riana Tauro <riana.tauro@intel.com> Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202506092221.1FmUQmI8-lkp@intel.com/ Fixes: `16280ded45` ("drm/xe: Add configfs to enable survivability mode") Reviewed-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com> Link: https://lore.kernel.org/r/20250912-fix-nomodule-build-v1-1-d11b70a92516@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> (cherry picked from commit `d9b2623319`) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2025-09-22 12:13:14 -04:00
Michal Wajdeczko	500dad428e	drm/xe/vf: Don't expose sysfs attributes not applicable for VFs VFs can't read BMG_PCIE_CAP(0x138340) register nor access PCODE (already guarded by the info.skip_pcode flag) so we shouldn't expose attributes that require any of them to avoid errors like: [] xe 0000:03:00.1: [drm] Tile0: GT0: VF is trying to read an \ inaccessible register 0x138340+0x0 [] RIP: 0010:xe_gt_sriov_vf_read32+0x6c2/0x9a0 [xe] [] Call Trace: [] xe_mmio_read32+0x110/0x280 [xe] [] auto_link_downgrade_capable_show+0x2e/0x70 [xe] [] dev_attr_show+0x1a/0x70 [] sysfs_kf_seq_show+0xaa/0x120 [] kernfs_seq_show+0x41/0x60 Fixes: `0e414bf7ad` ("drm/xe: Expose PCIe link downgrade attributes") Fixes: `cdc36b66cd` ("drm/xe: Expose fan control and voltage regulator version") Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Lukasz Laguna <lukasz.laguna@intel.com> Reviewed-by: Raag Jadav <raag.jadav@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250916170029.3313-2-michal.wajdeczko@intel.com (cherry picked from commit `a2d6223d22`) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2025-09-22 12:13:08 -04:00
Daniele Ceraolo Spurio	26caeae9fb	drm/xe/guc: Set RCS/CCS yield policy All recent platforms (including all the ones officially supported by the Xe driver) do not allow concurrent execution of RCS and CCS workloads from different address spaces, with the HW blocking the context switch when it detects such a scenario. The DUAL_QUEUE flag helps with this, by causing the GuC to not submit a context it knows will not be able to execute. This, however, causes a new problem: if RCS and CCS queues have pending workloads from different address spaces, the GuC needs to choose from which of the 2 queues to pick the next workload to execute. By default, the GuC prioritizes RCS submissions over CCS ones, which can lead to CCS workloads being significantly (or completely) starved of execution time. The driver can tune this by setting a dedicated scheduling policy KLV; this KLV allows the driver to specify a quantum (in ms) and a ratio (percentage value between 0 and 100), and the GuC will prioritize the CCS for that percentage of each quantum. Given that we want to guarantee enough RCS throughput to avoid missing frames, we set the yield policy to 20% of each 80ms interval. v2: updated quantum and ratio, improved comment, use xe_guc_submit_disable in gt_sanitize Fixes: `d9a1ae0d17` ("drm/xe/guc: Enable WA_DUAL_QUEUE for newer platforms") Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Cc: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Tested-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Link: https://lore.kernel.org/r/20250905235632.3333247-2-daniele.ceraolospurio@intel.com (cherry picked from commit `8843444843`) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> [Rodrigo added #include "xe_guc_submit.h" while backporting]	2025-09-17 20:23:47 -04:00
Daniele Ceraolo Spurio	ae5fbbda34	drm/xe: Fix error handling if PXP fails to start Since the PXP start comes after __xe_exec_queue_init() has completed, we need to cleanup what was done in that function in case of a PXP start error. __xe_exec_queue_init calls the submission backend init() function, so we need to introduce an opposite for that. Unfortunately, while we already have a fini() function pointer, it performs other operations in addition to cleaning up what was done by the init(). Therefore, for clarity, the existing fini() has been renamed to destroy(), while a new fini() has been added to only clean up what was done by the init(), with the latter being called by the former (via xe_exec_queue_fini). Fixes: `72d479601d` ("drm/xe/pxp/uapi: Add userspace and LRC support for PXP-using queues") Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://lore.kernel.org/r/20250909221240.3711023-3-daniele.ceraolospurio@intel.com (cherry picked from commit `626667321d`) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2025-09-17 12:28:55 -04:00
Zongyao Bai	ff89a4d285	drm/xe/sysfs: Add cleanup action in xe_device_sysfs_init On partial failure, some sysfs files created before the failure might not be removed. Add common cleanup step to remove them all immediately, as is should be harmless to attempt to remove non-existing files. Fixes: `0e414bf7ad` ("drm/xe: Expose PCIe link downgrade attributes") Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Stuart Summers <stuart.summers@intel.com> Cc: Shuicheng Lin <shuicheng.lin@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Signed-off-by: Zongyao Bai <zongyao.bai@intel.com> Reviewed-by: Shuicheng Lin <shuicheng.lin@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250915214716.1327379-2-zongyao.bai@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> (cherry picked from commit `1a869168d9`) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2025-09-17 12:28:55 -04:00
Dan Carpenter	cbc7f3b4f6	drm/xe: Fix a NULL vs IS_ERR() in xe_vm_add_compute_exec_queue() The xe_preempt_fence_create() function returns error pointers. It never returns NULL. Update the error checking to match. Fixes: `dd08ebf6c3` ("drm/xe: Introduce a new DRM driver for Intel GPUs") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/aJTMBdX97cof_009@stanley.mountain Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> (cherry picked from commit `75cc23ffe5`) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2025-09-16 11:49:16 -04:00
Nitin Gote	7926ba2143	drm/xe: defer free of NVM auxiliary container to device release callback Do not kfree the intel_dg_nvm_dev in xe_nvm_fini() right after auxiliary_device_delete/uninit. The auxiliary_device embeds the device/kobject (and its name); freeing it too early can race with asynchronous device_del/udev processing and cause a use-after-free. Signed-off-by: Nitin Gote <nitin.r.gote@intel.com> Fixes: `c28bfb107d` ("drm/xe/nvm: add on-die non-volatile memory device") Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250911052823.226696-1-nitin.r.gote@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> (cherry picked from commit `d4c3ed963e`) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2025-09-15 13:12:19 -04:00
Mallesh Koujalagi	dcfd151d32	drm/xe/hwmon: Remove type casting Refactor: eliminate type casts by using proper u32 declarations. v2: - Address review comments. (Karthik) v3: - Use the proper u32 type and drop cast. (Lucas De Marchi) - Modify variable when actually using u64 value. - Change r value to reg_value with u32 type. v4: - Remove newline between trailer and Signed-off-by. (Lucas De Marchi) - Change reg_val to val for more user-friendly logging. - Use mul_u32_u32 function since both values are u32. v5: - mul_u32_u32 function with shift. (Lucas De Marchi) Fixes: `7596d839f6` ("drm/xe/hwmon: Add support to manage power limits though mailbox") Signed-off-by: Mallesh Koujalagi <mallesh.koujalagi@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250912113458.2815172-1-mallesh.koujalagi@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> (cherry picked from commit `4e1d3b5e64`) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2025-09-15 13:11:40 -04:00
Michal Wajdeczko	fef8b64e48	drm/xe/pf: Drop rounddown_pow_of_two fair LMEM limitation This effectively reverts commit `4c3fe5eae4` ("drm/xe/pf: Limit fair VF LMEM provisioning") since we don't need it any more after non-contig VRAM allocations were fixed. This allows larger LMEM auto-provisioning for VFs, so instead: [ ] GT0: PF: LMEM available(14096M) fair(1 x 8192M) [ ] GT0: PF: VF1 provisioned with 8589934592 (8.00 GiB) LMEM or [ ] GT0: PF: LMEM available(14096M) fair(2 x 4096M) [ ] GT0: PF: VF1..VF2 provisioned with 4294967296 (4.00 GiB) LMEM we may get: [ ] GT0: PF: LMEM available(14096M) fair(1 x 14096M) [ ] GT0: PF: VF1 provisioned with 14780727296 (13.8 GiB) LMEM and [ ] GT0: PF: LMEM available(14096M) fair(2 x 7048M) [ ] GT0: PF: VF1..VF2 provisioned with 7390363648 (6.88 GiB) LMEM Fixes: `1e32ffbc9d` ("drm/xe/sriov: support non-contig VRAM provisioning") Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Link: https://lore.kernel.org/r/20250910222439.32869-1-michal.wajdeczko@intel.com (cherry picked from commit `95c1cfa306`) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2025-09-15 08:26:26 -04:00
Shuicheng Lin	013e484dbd	drm/xe/tile: Release kobject for the failure path Call kobject_put() for the failure path to release the kobject v2: remove extra newline. (Matt) Fixes: `e3d0839aa5` ("drm/xe/tile: Abort driver load for sysfs creation failure") Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com> Link: https://lore.kernel.org/r/20250819153950.2973344-2-shuicheng.lin@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> (cherry picked from commit `b98775bca9`) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2025-09-15 08:26:19 -04:00
Julia Filipchuk	fd99415ec8	drm/xe: Extend Wa_13011645652 to PTL-H, WCL Expand workaround to additional graphics architectures. Cc: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Cc: Stuart Summers <stuart.summers@intel.com> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: intel-xe@lists.freedesktop.org Cc: <stable@vger.kernel.org> # v6.17+ Signed-off-by: Julia Filipchuk <julia.filipchuk@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250903190122.1028373-2-julia.filipchuk@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> (cherry picked from commit `6fc957185e`) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2025-09-09 13:20:36 -04:00
Thomas Hellström	eb5723a751	drm/xe: Block exec and rebind worker while evicting for suspend / hibernate When the xe pm_notifier evicts for suspend / hibernate, there might be racing tasks trying to re-validate again. This can lead to suspend taking excessive time or get stuck in a live-lock. This behaviour becomes much worse with the fix that actually makes re-validation bring back bos to VRAM rather than letting them remain in TT. Prevent that by having exec and the rebind worker waiting for a completion that is set to block by the pm_notifier before suspend and is signaled by the pm_notifier after resume / wakeup. It's probably still possible to craft malicious applications that block suspending. More work is pending to fix that. v3: - Avoid wait_for_completion() in the kernel worker since it could potentially cause work item flushes from freezable processes to wait forever. Instead terminate the rebind workers if needed and re-launch at resume. (Matt Auld) v4: - Fix some bad naming and leftover debug printouts. - Fix kerneldoc. - Use drmm_mutex_init() for the xe->rebind_resume_lock (Matt Auld). - Rework the interface of xe_vm_rebind_resume_worker (Matt Auld). Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/4288 Fixes: `c6a4d46ec1` ("drm/xe: evict user memory in PM notifier") Cc: Matthew Auld <matthew.auld@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: <stable@vger.kernel.org> # v6.16+ Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://lore.kernel.org/r/20250904160715.2613-4-thomas.hellstrom@linux.intel.com (cherry picked from commit `599334572a`) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2025-09-09 13:20:31 -04:00
Thomas Hellström	d84820309e	drm/xe: Allow the pm notifier to continue on failure Its actions are opportunistic anyway and will be completed on device suspend. Marking as a fix to simplify backporting of the fix that follows in the series. v2: - Keep the runtime pm reference over suspend / hibernate and document why. (Matt Auld, Rodrigo Vivi): Fixes: `c6a4d46ec1` ("drm/xe: evict user memory in PM notifier") Cc: Matthew Auld <matthew.auld@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: <stable@vger.kernel.org> # v6.16+ Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://lore.kernel.org/r/20250904160715.2613-3-thomas.hellstrom@linux.intel.com (cherry picked from commit `ebd546fdff`) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2025-09-09 13:20:26 -04:00
Thomas Hellström	5c87fee3c9	drm/xe: Attempt to bring bos back to VRAM after eviction VRAM+TT bos that are evicted from VRAM to TT may remain in TT also after a revalidation following eviction or suspend. This manifests itself as applications becoming sluggish after buffer objects get evicted or after a resume from suspend or hibernation. If the bo supports placement in both VRAM and TT, and we are on DGFX, mark the TT placement as fallback. This means that it is tried only after VRAM + eviction. This flaw has probably been present since the xe module was upstreamed but use a Fixes: commit below where backporting is likely to be simple. For earlier versions we need to open- code the fallback algorithm in the driver. v2: - Remove check for dgfx. (Matthew Auld) - Update the xe_dma_buf kunit test for the new strategy (CI) - Allow dma-buf to pin in current placement (CI) - Make xe_bo_validate() for pinned bos a NOP. Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/5995 Fixes: `a78a8da51b` ("drm/ttm: replace busy placement with flags v6") Cc: Matthew Brost <matthew.brost@intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Cc: <stable@vger.kernel.org> # v6.9+ Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://lore.kernel.org/r/20250904160715.2613-2-thomas.hellstrom@linux.intel.com (cherry picked from commit `cb3d7b3b46`) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2025-09-09 13:20:22 -04:00
Michal Wajdeczko	7934fdc25a	drm/xe/configfs: Don't touch survivability_mode on fini This is a user controlled configfs attribute, we should not modify that outside the configfs attr.store() implementation. Fixes: `bc417e54e2` ("drm/xe: Enable configfs support for survivability mode") Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Riana Tauro <riana.tauro@intel.com> Reviewed-by: Stuart Summers <stuart.summers@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250904103521.7130-1-michal.wajdeczko@intel.com (cherry picked from commit `079a5c83db`) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2025-09-09 13:20:17 -04:00
Thomas Hellström	379b3c983f	drm/xe: Fix incorrect migration of backed-up object to VRAM If an object is backed up to shmem it is incorrectly identified as not having valid data by the move code. This means moving to VRAM skips the -EMULTIHOP step and the bo is cleared. This causes all sorts of weird behaviour on DGFX if an already evicted object is targeted by the shrinker. Fix this by using ttm_tt_is_swapped() to identify backed-up objects. Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/5996 Fixes: `00c8efc318` ("drm/xe: Add a shrinker for xe bos") Cc: Matthew Brost <matthew.brost@intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Cc: <stable@vger.kernel.org> # v6.15+ Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://lore.kernel.org/r/20250828134837.5709-1-thomas.hellstrom@linux.intel.com (cherry picked from commit `1047bd8279`) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2025-09-02 09:00:47 -04:00
Carlos Llamas	75671d90fd	drm/xe: switch to local xbasename() helper Commit `b0a2ee5567` ("drm/xe: prepare xe_gen_wa_oob to be multi-use") introduced a call to basename(). The GNU version of this function is not portable and fails to build with alternative libc implementations like musl or bionic. This causes the following build error: drivers/gpu/drm/xe/xe_gen_wa_oob.c:130:12: error: assignment to ‘const char *’ from ‘int’ makes pointer from integer without a cast [-Wint-conversion] 130 \| fn = basename(fn); \| ^ While a POSIX version of basename() could be used, it would require a separate header plus the behavior differs from GNU version in that it might modify its argument. Not great. Instead, implement a local xbasename() helper based on strrchr() that provides the same functionality and avoids portability issues. Fixes: `b0a2ee5567` ("drm/xe: prepare xe_gen_wa_oob to be multi-use") Suggested-by: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Tiffany Yang <ynaffit@google.com> Signed-off-by: Carlos Llamas <cmllamas@google.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250825155743.1132433-1-cmllamas@google.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> (cherry picked from commit `41be792f5b`) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2025-08-26 10:12:11 -04:00
Matthew Brost	16ca06aa2c	drm/xe: Don't trigger rebind on initial dma-buf validation On the first validate of an imported dma-buf (initial bind), the device has no GPU mappings, so a rebind is unnecessary. Rebinding here is harmful in multi-GPU setups and for VMs using preempt-fence mode, as it would evict in-flight GPU work. v2: - Drop dma_buf_validated, check for XE_PL_SYSTEM (Thomas) Fixes: `dd08ebf6c3` ("drm/xe: Introduce a new DRM driver for Intel GPUs") Signed-off-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://lore.kernel.org/r/20250825152841.3837378-1-matthew.brost@intel.com (cherry picked from commit `ffdf968762`) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2025-08-26 10:12:11 -04:00
Thomas Hellström	2b55ddf362	drm/xe/vm: Clear the scratch_pt pointer on error Avoid triggering a dereference of an error pointer on cleanup in xe_vm_free_scratch() by clearing any scratch_pt error pointer. Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Fixes: `06951c2ee7` ("drm/xe: Use NULL PTEs as scratch PTEs") Cc: Brian Welty <brian.welty@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: <stable@vger.kernel.org> # v6.8+ Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250821143045.106005-4-thomas.hellstrom@linux.intel.com (cherry picked from commit `358ee50ab5`) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2025-08-26 10:12:11 -04:00
Thomas Hellström	7551865cd1	drm/xe/vm: Don't pin the vm_resv during validation The pinning has the odd side-effect that unlocking any resv during validation triggers an "unlocking pinned lock" warning. Cc: Matthew Brost <matthew.brost@intel.com> Fixes: `5cc3325584` ("drm/xe: Rework eviction rejection of bound external bos") Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250821143045.106005-2-thomas.hellstrom@linux.intel.com (cherry picked from commit `0a51bf3e54`) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2025-08-26 10:12:11 -04:00
Zbigniew Kempczyński	04e1f683cd	drm/xe/xe_sync: avoid race during ufence signaling Marking ufence as signalled after copy_to_user() is too late. Worker thread which signals ufence by memory write might be raced with another userspace vm-bind call. In map/unmap scenario unmap may still see ufence is not signalled causing -EBUSY. Change the order of marking / write to user-fence fixes this issue. Fixes: `977e5b82e0` ("drm/xe: Expose user fence from xe_sync_entry") Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/5536 Signed-off-by: Zbigniew Kempczyński <zbigniew.kempczynski@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250820083903.2109891-2-zbigniew.kempczynski@intel.com (cherry picked from commit `8ae04fe9ff`) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2025-08-26 10:12:11 -04:00
Christoph Manszewski	111fb43a55	drm/xe: Fix vm_bind_ioctl double free bug If the argument check during an array bind fails, the bind_ops are freed twice as seen below. Fix this by setting bind_ops to NULL after freeing. ================================================================== BUG: KASAN: double-free in xe_vm_bind_ioctl+0x1b2/0x21f0 [xe] Free of addr ffff88813bb9b800 by task xe_vm/14198 CPU: 5 UID: 0 PID: 14198 Comm: xe_vm Not tainted 6.16.0-xe-eudebug-cmanszew+ #520 PREEMPT(full) Hardware name: Intel Corporation Alder Lake Client Platform/AlderLake-P DDR5 RVP, BIOS ADLPFWI1.R00.2411.A02.2110081023 10/08/2021 Call Trace: <TASK> dump_stack_lvl+0x82/0xd0 print_report+0xcb/0x610 ? __virt_addr_valid+0x19a/0x300 ? xe_vm_bind_ioctl+0x1b2/0x21f0 [xe] kasan_report_invalid_free+0xc8/0xf0 ? xe_vm_bind_ioctl+0x1b2/0x21f0 [xe] ? xe_vm_bind_ioctl+0x1b2/0x21f0 [xe] check_slab_allocation+0x102/0x130 kfree+0x10d/0x440 ? should_fail_ex+0x57/0x2f0 ? xe_vm_bind_ioctl+0x1b2/0x21f0 [xe] xe_vm_bind_ioctl+0x1b2/0x21f0 [xe] ? __pfx_xe_vm_bind_ioctl+0x10/0x10 [xe] ? __lock_acquire+0xab9/0x27f0 ? lock_acquire+0x165/0x300 ? drm_dev_enter+0x53/0xe0 [drm] ? find_held_lock+0x2b/0x80 ? drm_dev_exit+0x30/0x50 [drm] ? drm_ioctl_kernel+0x128/0x1c0 [drm] drm_ioctl_kernel+0x128/0x1c0 [drm] ? __pfx_xe_vm_bind_ioctl+0x10/0x10 [xe] ? find_held_lock+0x2b/0x80 ? __pfx_drm_ioctl_kernel+0x10/0x10 [drm] ? should_fail_ex+0x57/0x2f0 ? __pfx_xe_vm_bind_ioctl+0x10/0x10 [xe] drm_ioctl+0x352/0x620 [drm] ? __pfx_drm_ioctl+0x10/0x10 [drm] ? __pfx_rpm_resume+0x10/0x10 ? do_raw_spin_lock+0x11a/0x1b0 ? find_held_lock+0x2b/0x80 ? __pm_runtime_resume+0x61/0xc0 ? rcu_is_watching+0x20/0x50 ? trace_irq_enable.constprop.0+0xac/0xe0 xe_drm_ioctl+0x91/0xc0 [xe] __x64_sys_ioctl+0xb2/0x100 ? rcu_is_watching+0x20/0x50 do_syscall_64+0x68/0x2e0 entry_SYSCALL_64_after_hwframe+0x76/0x7e RIP: 0033:0x7fa9acb24ded Fixes: `b43e864af0` ("drm/xe/uapi: Add DRM_XE_VM_BIND_FLAG_CPU_ADDR_MIRROR") Cc: Matthew Brost <matthew.brost@intel.com> Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Christoph Manszewski <christoph.manszewski@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250813101231.196632-2-christoph.manszewski@intel.com (cherry picked from commit `a01b704527`) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2025-08-21 15:06:58 -04:00
Piotr Piórkowski	8a30114073	drm/xe: Move ASID allocation and user PT BO tracking into xe_vm_create Currently, ASID assignment for user VMs and page-table BO accounting for client memory tracking are performed in xe_vm_create_ioctl. To consolidate VM object initialization, move this logic to xe_vm_create. v2: - removed unnecessary duplicate BO tracking code - using the local variable xef to verify whether the VM is being created by userspace Fixes: `658a1c8e0a` ("drm/xe: Assign ioctl xe file handler to vm in xe_vm_create") Suggested-by: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://lore.kernel.org/r/20250811104358.2064150-3-piotr.piorkowski@intel.com Signed-off-by: Michał Winiarski <michal.winiarski@intel.com> (cherry picked from commit `30e0c3f43a`) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> [Rodrigo: Added fixes tag]	2025-08-21 15:06:58 -04:00
Piotr Piórkowski	658a1c8e0a	drm/xe: Assign ioctl xe file handler to vm in xe_vm_create In several code paths, such as xe_pt_create(), the vm->xef field is used to determine whether a VM originates from userspace or the kernel. Previously, this handler was only assigned in xe_vm_create_ioctl(), after the VM was created by xe_vm_create(). However, xe_vm_create() triggers page table creation, and that function assumes vm->xef should be already set. This could lead to incorrect origin detection. To fix this problem and ensure consistency in the initialization of the VM object, let's move the assignment of this handler to xe_vm_create. v2: - take reference to the xe file object only when xef is not NULL - release the reference to the xe file object on the error path (Matthew) Fixes: `7f387e6012` ("drm/xe: add XE_BO_FLAG_PINNED_LATE_RESTORE") Signed-off-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://lore.kernel.org/r/20250811104358.2064150-2-piotr.piorkowski@intel.com Signed-off-by: Michał Winiarski <michal.winiarski@intel.com> (cherry picked from commit `9337166fa1`) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2025-08-19 10:15:08 -04:00
Michał Winiarski	94eae6ee4c	drm/xe/pf: Set VF LMEM BAR size LMEM is partitioned between multiple VFs and we expect that the more VFs we have, the less LMEM is assigned to each VF. This means that we can achieve full LMEM BAR access without the need to attempt full VF LMEM BAR resize via pci_resize_resource(). Always try to set the largest possible BAR size that allows to fit the number of enabled VFs and inform the user in case the resize attempt is not successful. Signed-off-by: Michał Winiarski <michal.winiarski@intel.com> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Link: https://lore.kernel.org/r/20250527120637.665506-7-michal.winiarski@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> (cherry picked from commit `32a4d1b98e`) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2025-08-14 10:30:53 -04:00
Karthik Poosa	55d49f0616	drm/xe/hwmon: Add SW clamp for power limits writes Clamp writes to power limits powerX_crit/currX_crit, powerX_cap, powerX_max, to the maximum supported by the pcode mailbox when sysfs-provided values exceed this limit. Although the pcode already performs clamping, values beyond the pcode mailbox's supported range get truncated, leading to incorrect critical power settings. This patch ensures proper clamping to prevent such truncation. v2: - Address below review comments. (Riana) - Split comments into multiple sentences. - Use local variables for readability. - Add a debug log. - Use u64 instead of unsigned long. v3: - Change drm_dbg logs to drm_info. (Badal) v4: - Rephrase the drm_info log. (Rodrigo, Riana) - Rename variable max_mbx_power_limit to max_supp_power_limit, as limit is same for platforms with and without mailbox power limit support. Signed-off-by: Karthik Poosa <karthik.poosa@intel.com> Fixes: `92d44a422d` ("drm/xe/hwmon: Expose card reactive critical power") Fixes: `fb1b70607f` ("drm/xe/hwmon: Expose power attributes") Reviewed-by: Riana Tauro <riana.tauro@intel.com> Link: https://lore.kernel.org/r/20250808185310.3466529-1-karthik.poosa@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> (cherry picked from commit `d301eb950d`) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2025-08-12 12:52:26 -04:00
Thomas Hellström	2dd7a47669	drm/xe: Defer buffer object shrinker write-backs and GPU waits When the xe buffer-object shrinker allows GPU waits and write-back, (typically from kswapd), perform multiple passes, skipping subsequent passes if the shrinker number of scanned objects target is reached. 1) Without GPU waits and write-back 2) Without write-back 3) With both GPU-waits and write-back This is to avoid stalls and costly write- and readbacks unless they are really necessary. v2: - Don't test for scan completion twice. (Stuart Summers) - Update tags. Reported-by: melvyn <melvyn2@dnsense.pub> Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/5557 Cc: Summers Stuart <stuart.summers@intel.com> Fixes: `00c8efc318` ("drm/xe: Add a shrinker for xe bos") Cc: <stable@vger.kernel.org> # v6.15+ Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Stuart Summers <stuart.summers@intel.com> Link: https://lore.kernel.org/r/20250805074842.11359-1-thomas.hellstrom@linux.intel.com (cherry picked from commit `80944d3341`) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2025-08-12 12:52:26 -04:00
Matthew Auld	145832fbdd	drm/xe/migrate: prevent potential UAF If we hit the error path, the previous fence (if there is one) has already been put() prior to this, so doing a fence_wait could lead to UAF. Tweak the flow to do to the put() until after we do the wait. Fixes: `270172f64b` ("drm/xe: Update xe_ttm_access_memory to use GPU for non-visible access") Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Maciej Patelczyk <maciej.patelczyk@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Stuart Summers <stuart.summers@intel.com> Link: https://lore.kernel.org/r/20250731093807.207572-8-matthew.auld@intel.com (cherry picked from commit `9b7ca35ed2`) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2025-08-12 12:52:26 -04:00
Matthew Auld	4126cb327a	drm/xe/migrate: don't overflow max copy size With non-page aligned copy, we need to use 4 byte aligned pitch, however the size itself might still be close to our maximum of ~8M, and so the dimensions of the copy can easily exceed the S16_MAX limit of the copy command leading to the following assert: xe 0000:03:00.0: [drm] Assertion `size / pitch <= ((s16)(((u16)~0U) >> 1))` failed! platform: BATTLEMAGE subplatform: 1 graphics: Xe2_HPG 20.01 step A0 media: Xe2_HPM 13.01 step A1 tile: 0 VRAM 10.0 GiB GT: 0 type 1 WARNING: CPU: 23 PID: 10605 at drivers/gpu/drm/xe/xe_migrate.c:673 emit_copy+0x4b5/0x4e0 [xe] To fix this account for the pitch when calculating the number of current bytes to copy. Fixes: `270172f64b` ("drm/xe: Update xe_ttm_access_memory to use GPU for non-visible access") Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Maciej Patelczyk <maciej.patelczyk@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Stuart Summers <stuart.summers@intel.com> Link: https://lore.kernel.org/r/20250731093807.207572-7-matthew.auld@intel.com (cherry picked from commit `8c2d61e0e9`) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2025-08-12 12:52:26 -04:00
Matthew Auld	9d7a1cbebb	drm/xe/migrate: prevent infinite recursion If the buf + offset is not aligned to XE_CAHELINE_BYTES we fallback to using a bounce buffer. However the bounce buffer here is allocated on the stack, and the only alignment requirement here is that it's naturally aligned to u8, and not XE_CACHELINE_BYTES. If the bounce buffer is also misaligned we then recurse back into the function again, however the new bounce buffer might also not be aligned, and might never be until we eventually blow through the stack, as we keep recursing. Instead of using the stack use kmalloc, which should respect the power-of-two alignment request here. Fixes a kernel panic when triggering this path through eudebug. v2 (Stuart): - Add build bug check for power-of-two restriction - s/EINVAL/ENOMEM/ Fixes: `270172f64b` ("drm/xe: Update xe_ttm_access_memory to use GPU for non-visible access") Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Maciej Patelczyk <maciej.patelczyk@intel.com> Cc: Stuart Summers <stuart.summers@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Stuart Summers <stuart.summers@intel.com> Link: https://lore.kernel.org/r/20250731093807.207572-6-matthew.auld@intel.com (cherry picked from commit `38b34e928a`) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2025-08-12 12:52:26 -04:00
Linus Torvalds	ffe8ac927d	Merge tag 'drm-next-2025-08-08' of https://gitlab.freedesktop.org/drm/kernel Pull drm fixes from Dave Airlie: "This is the fixes that built up in the merge window, mostly amdgpu and xe with one i915 display fix, seems like things are pretty good for rc1. i915: - DP LPFS fixes xe: - SRIOV: PF fixes and removal of need of module param - Fix driver unbind around Devcoredump - Mark xe driver as BROKEN if kernel page size is not 4kB amdgpu: - GC 9.5.0 fixes - SMU fix - DCE 6 DC fixes - mmhub client ID fixes - VRR fix - Backlight fix - UserQ fix - Legacy reset fix - Misc fixes amdkfd: - CRIU fix - Debugfs fix" * tag 'drm-next-2025-08-08' of https://gitlab.freedesktop.org/drm/kernel: (28 commits) drm/amdgpu: add missing vram lost check for LEGACY RESET drm/amdgpu/discovery: fix fw based ip discovery drm/amdkfd: Destroy KFD debugfs after destroy KFD wq amdgpu/amdgpu_discovery: increase timeout limit for IFWI init drm/amdgpu: Update SDMA firmware version check for user queue support drm/amdgpu: Add NULL check for asic_funcs drm/amd/display: Revert "drm/amd/display: Fix AMDGPU_MAX_BL_LEVEL value" drm/amd/display: fix a Null pointer dereference vulnerability drm/amd/display: Add primary plane to commits for correct VRR handling drm/amdgpu: update mmhub 3.3 client id mappings drm/amdgpu: update mmhub 3.0.1 client id mappings drm/amdgpu: Retain job->vm in amdgpu_job_prepare_job drm/amd/display: Fix DCE 6.0 and 6.4 PLL programming. drm/amd/display: Don't overwrite dce60_clk_mgr drm/amdkfd: Fix checkpoint-restore on multi-xcc drm/amd: Restore cached manual clock settings during resume drm/amd: Restore cached power limit during resume drm/amdgpu: Update external revid for GC v9.5.0 drm/amdgpu: Update supported modes for GC v9.5.0 Mark xe driver as BROKEN if kernel page size is not 4kB ...	2025-08-08 06:48:14 +03:00
Simon Richter	022906afdf	Mark xe driver as BROKEN if kernel page size is not 4kB This driver, for the time being, assumes that the kernel page size is 4kB, so it fails on loong64 and aarch64 with 16kB pages, and ppc64el with 64kB pages. Signed-off-by: Simon Richter <Simon.Richter@hogyros.de> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Fixes: `dd08ebf6c3` ("drm/xe: Introduce a new DRM driver for Intel GPUs") Cc: stable@vger.kernel.org # v6.8+ Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Link: https://lore.kernel.org/r/20250802024152.3021-1-Simon.Richter@hogyros.de (cherry picked from commit `0521a86822`) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2025-08-04 11:59:11 -04:00
Michal Wajdeczko	cb7a3f949a	drm/xe/pf: Make sure PF is ready to configure VFs The PF driver might be resumed just to configure VFs, but since it is doing some asynchronous GuC reconfigurations after fresh reset, we should wait until all pending works are completed. This is especially important in case of LMEM provisioning, since we also need to update the LMTT and send invalidation requests to all GuCs, which are expected to be already in the VGT mode. Fixes: `68ae022278` ("drm/xe/pf: Force GuC virtualization mode") Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Cc: Jonathan Cavitt <jonathan.cavitt@intel.com> Link: https://lore.kernel.org/r/20250801142822.180530-3-michal.wajdeczko@intel.com (cherry picked from commit `c6c86441c4`) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2025-08-04 11:59:06 -04:00
Michal Wajdeczko	c286ce6b01	drm/xe/pf: Disable PF restart worker on device removal We can't let restart worker run once device is removed, since other data that it might want to access could be already released. Explicitly disable worker as part of device cleanup action. Fixes: `a4d1c5d0b9` ("drm/xe/pf: Move VFs reprovisioning to worker") Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Piotr Piórkowski <piotr.piorkowski@intel.com> Cc: Jonathan Cavitt <jonathan.cavitt@intel.com> Link: https://lore.kernel.org/r/20250801142822.180530-2-michal.wajdeczko@intel.com (cherry picked from commit `a424353937`) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2025-08-04 11:59:01 -04:00
Balasubramani Vivekanandan	465f1dba74	drm/xe/devcoredump: Defer devcoredump initialization during probe Doing devcoredump initializing before GT though look harmless, it leads to problem during driver unbind. Because of this order, GT/Engine release functions will be called before xe devcoredump release function (xe_driver_devcoredump_fini) leading to the following kernel crash[1] because the devcoredump functions might still use GT/Engine datastructures after those are freed. The following crash is observed while running the IGT xe_wedged@wedged-at-any-timeout. The test forces a wedged state by submitting a workload which hangs. Then does a unbind/rebind of the driver to recover from the wedged state. The hanged workload leads to a devcoredump. The following crash is noticed when the devcoredump capture races with the driver unbind. During driver unbind, the release function hw_engine_fini() will be called which assigns NULL to hwe->gt. But the same data structure is accessed during the coredump capture in the function xe_engine_snapshot_print by reading snapshot->hwe->gt. With this patch, we make sure the devcoredump is stopped before deinitializing the core driver functions. [1]: BUG: kernel NULL pointer dereference, address: 0000000000000000 Workqueue: events_unbound xe_devcoredump_deferred_snap_work [xe] RIP: 0010:xe_engine_snapshot_print+0x47/0x420 [xe] Call Trace: <TASK> ? drm_printf+0x64/0x90 __xe_devcoredump_read+0x23f/0x2d0 [xe] ? __pfx___drm_printfn_coredump+0x10/0x10 ? __pfx___drm_puts_coredump+0x10/0x10 xe_devcoredump_deferred_snap_work+0x17a/0x190 [xe] process_one_work+0x22e/0x6f0 worker_thread+0x1e8/0x3d0 ? __pfx_worker_thread+0x10/0x10 kthread+0x11f/0x250 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x47/0x70 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1a/0x30 v2: Detailed commit description (Rodrigo) v3: FIXME added (Rodrigo, Stuart) Fixes: `4209d635a8` ("drm/xe: Remove devcoredump during driver release") Reviewed-by: Stuart Summers <stuart.summers@intel.com> Link: https://lore.kernel.org/r/20250731061300.14320-1-balasubramani.vivekanandan@intel.com Signed-off-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com> Link: https://lore.kernel.org/r/20250801052356.21885-1-balasubramani.vivekanandan@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> (cherry picked from commit `1fdc4c381f`) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2025-08-04 11:58:56 -04:00
Michal Wajdeczko	df9bdd4381	drm/xe/pf: Enable SR-IOV PF mode by default We already claim official support for SR-IOV PF/VF modes on PTL and BMG platforms, but by default we start the Xe driver on those platforms in non-virtualized mode (native) since we still have max_vfs modparam set to disable creation of the VFs. It's time to let the Xe driver support SR-IOV PF mode by default. We were already testing this on our CI, which was relying on the patch that was enabling it for CONFIG_DRM_XE_DEBUG used by our CI. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Thomas Hellstrom <thomas.hellstrom@linux.intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250722182618.30811-3-michal.wajdeczko@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> (cherry picked from commit `a2b461bd6f`) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2025-08-04 11:56:18 -04:00
Linus Torvalds	e991acf1bc	Merge tag 'mm-nonmm-stable-2025-08-03-12-47' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull non-MM updates from Andrew Morton: "Significant patch series in this pull request: - "squashfs: Remove page->mapping references" (Matthew Wilcox) gets us closer to being able to remove page->mapping - "relayfs: misc changes" (Jason Xing) does some maintenance and minor feature addition work in relayfs - "kdump: crashkernel reservation from CMA" (Jiri Bohac) switches us from static preallocation of the kdump crashkernel's working memory over to dynamic allocation. So the difficulty of a-priori estimation of the second kernel's needs is removed and the first kernel obtains extra memory - "generalize panic_print's dump function to be used by other kernel parts" (Feng Tang) implements some consolidation and rationalization of the various ways in which a failing kernel splats information at the operator * tag 'mm-nonmm-stable-2025-08-03-12-47' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (80 commits) tools/getdelays: add backward compatibility for taskstats version kho: add test for kexec handover delaytop: enhance error logging and add PSI feature description samples: Kconfig: fix spelling mistake "instancess" -> "instances" fat: fix too many log in fat_chain_add() scripts/spelling.txt: add notifer\|\|notifier to spelling.txt xen/xenbus: fix typo "notifer" net: mvneta: fix typo "notifer" drm/xe: fix typo "notifer" cxl: mce: fix typo "notifer" KVM: x86: fix typo "notifer" MAINTAINERS: add maintainers for delaytop ucount: use atomic_long_try_cmpxchg() in atomic_long_inc_below() ucount: fix atomic_long_inc_below() argument type kexec: enable CMA based contiguous allocation stackdepot: make max number of pools boot-time configurable lib/xxhash: remove unused functions init/Kconfig: restore CONFIG_BROKEN help text lib/raid6: update recov_rvv.c zero page usage docs: update docs after introducing delaytop ...	2025-08-03 16:23:09 -07:00
WangYuli	26197b0fd2	drm/xe: fix typo "notifer" There is a spelling mistake of 'notifer' in the comment which should be 'notifier'. Link: https://lkml.kernel.org/r/94190C5F54A19F3E+20250722073431.21983-3-wangyuli@uniontech.com Signed-off-by: WangYuli <wangyuli@uniontech.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2025-08-02 12:01:39 -07:00
Linus Torvalds	89748acdf2	Merge tag 'drm-next-2025-08-01' of https://gitlab.freedesktop.org/drm/kernel Pull drm fixes from Dave Airlie: "Just a bunch of amdgpu and xe fixes. amdgpu: - DSC divide by 0 fix - clang fix - DC debugfs fix - Userq fixes - Avoid extra evict-restore with KFD - Backlight fix - Documentation fix - RAS fix - Add new kicker handling - DSC fix for DCN 3.1.4 - PSR fix - Atomic fix - DC reset fixes - DCN 3.0.1 fix - MMHUB client mapping fix xe: - Fix BMG probe on unsupported mailbox command - Fix OA static checker warning about null gt - Fix a NULL vs IS_ERR() bug in xe_i2c_register_adapter - Fix missing unwind goto in GuC/HuC - Don't register I2C devices if VF - Clear whole GuC g2h_fence during initialization - Avoid call kfree for drmm_kzalloc - Fix pci_dev reference leak on configfs - SRIOV: Disable CSC support on VF * tag 'drm-next-2025-08-01' of https://gitlab.freedesktop.org/drm/kernel: (24 commits) drm/xe/vf: Disable CSC support on VF drm/amdgpu: update mmhub 4.1.0 client id mappings drm/amd/display: Allow DCN301 to clear update flags drm/amd/display: Pass up errors for reset GPU that fails to init HW drm/amd/display: Only finalize atomic_obj if it was initialized drm/amd/display: Avoid configuring PSR granularity if PSR-SU not supported drm/amd/display: Disable dsc_power_gate for dcn314 by default drm/amdgpu: add kicker fws loading for gfx12/smu14/psp14 drm/amd/amdgpu: fix missing lock for cper.ring->rptr/wptr access drm/amd/display: Fix misuse of /** to /* in 'dce_i2c_hw.c' drm/amd/display: fix initial backlight brightness calculation drm/amdgpu: Avoid extra evict-restore process. drm/amdgpu: track whether a queue is a kernel queue in amdgpu_mqd_prop drm/amdgpu: check if hubbub is NULL in debugfs/amdgpu_dm_capabilities drm/amdgpu: Initialize data to NULL in imu_v12_0_program_rlc_ram() drm/amd/display: Fix divide by zero when calculating min ODM factor drm/xe/configfs: Fix pci_dev reference leak drm/xe/hw_engine_group: Avoid call kfree() for drmm_kzalloc() drm/xe/guc: Clear whole g2h_fence during initialization drm/xe/vf: Don't register I2C devices if VF ...	2025-07-31 21:47:36 -07:00
Linus Torvalds	260f6f4fda	Merge tag 'drm-next-2025-07-30' of https://gitlab.freedesktop.org/drm/kernel Pull drm updates from Dave Airlie: "Highlights: - Intel xe enable Panthor Lake, started adding WildCat Lake - amdgpu has a bunch of reset improvments along with the usual IP updates - msm got VM_BIND support which is important for vulkan sparse memory - more drm_panic users - gpusvm common code to handle a bunch of core SVM work outside drivers. Detail summary: Changes outside drm subdirectory: - 'shrink_shmem_memory()' for better shmem/hibernate interaction - Rust support infrastructure: - make ETIMEDOUT available - add size constants up to SZ_2G - add DMA coherent allocation bindings - mtd driver for Intel GPU non-volatile storage - i2c designware quirk for Intel xe core: - atomic helpers: tune enable/disable sequences - add task info to wedge API - refactor EDID quirks - connector: move HDR sink to drm_display_info - fourcc: half-float and 32-bit float formats - mode_config: pass format info to simplify dma-buf: - heaps: Give CMA heap a stable name ci: - add device tree validation and kunit displayport: - change AUX DPCD access probe address - add quirk for DPCD probe - add panel replay definitions - backlight control helpers fbdev: - make CONFIG_FIRMWARE_EDID available on all arches fence: - fix UAF issues format-helper: - improve tests gpusvm: - introduce devmem only flag for allocation - add timeslicing support to GPU SVM ttm: - improve eviction sched: - tracing improvements - kunit improvements - memory leak fixes - reset handling improvements color mgmt: - add hardware gamma LUT handling helpers bridge: - add destroy hook - switch to reference counted drm_bridge allocations - tc358767: convert to devm_drm_bridge_alloc - improve CEC handling panel: - switch to reference counter drm_panel allocations - fwnode panel lookup - Huiling hl055fhv028c support - Raspberry Pi 7" 720x1280 support - edp: KDC KD116N3730A05, N160JCE-ELL CMN, N116BCJ-EAK - simple: AUO P238HAN01 - st7701: Winstar wf40eswaa6mnn0 - visionox: rm69299-shift - Renesas R61307, Renesas R69328 support - DJN HX83112B hdmi: - add CEC handling - YUV420 output support xe: - WildCat Lake support - Enable PanthorLake by default - mark BMG as SRIOV capable - update firmware recommendations - Expose media OA units - aux-bux support for non-volatile memory - MTD intel-dg driver for non-volatile memory - Expose fan control and voltage regulator in sysfs - restructure migration for multi-device - Restore GuC submit UAF fix - make GEM shrinker drm managed - SRIOV VF Post-migration recovery of GGTT nodes - W/A additions/reworks - Prefetch support for svm ranges - Don't allocate managed BO for each policy change - HWMON fixes for BMG - Create LRC BO without VM - PCI ID updates - make SLPC debugfs files optional - rework eviction rejection of bound external BOs - consolidate PAT programming logic for pre/post Xe2 - init changes for flicker-free boot - Enable GuC Dynamic Inhibit Context switch i915: - drm_panic support for i915/xe - initial flip queue off by default for LNL/PNL - Wildcat Lake Display support - Support for DSC fractional link bpp - Support for simultaneous Panel Replay and Adaptive sync - Support for PTL+ double buffer LUT - initial PIPEDMC event handling - drm_panel_follower support - DPLL interface renames - allocate struct intel_display dynamically - flip queue preperation - abstract DRAM detection better - avoid GuC scheduling stalls - remove DG1 force probe requirement - fix MEI interrupt handler on RT kernels - use backlight control helpers for eDP - more shared display code refactoring amdgpu: - add userq slot to INFO ioctl - SR-IOV hibernation support - Suspend improvements - Backlight improvements - Use scaling for non-native eDP modes - cleaner shader updates for GC 9.x - Remove fence slab - SDMA fw checks for userq support - RAS updates - DMCUB updates - DP tunneling fixes - Display idle D3 support - Per queue reset improvements - initial smartmux support amdkfd: - enable KFD on loongarch - mtype fix for ext coherent system memory radeon: - CS validation additional GL extensions - drop console lock during suspend/resume - bump driver version msm: - VM BIND support - CI: infrastructure updates - UBWC single source of truth - decouple GPU and KMS support - DP: rework I/O accessors - DPU: SM8750 support - DSI: SM8750 support - GPU: X1-45 support and speedbin support for X1-85 - MDSS: SM8750 support nova: - register! macro improvements - DMA object abstraction - VBIOS parser + fwsec lookup - sysmem flush page support - falcon: generic falcon boot code and HAL - FWSEC-FRTS: fb setup and load/execute ivpu: - Add Wildcat Lake support - Add turbo flag ast: - improve hardware generations implementation imx: - IMX8qxq Display Controller support lima: - Rockchip RK3528 GPU support nouveau: - fence handling cleanup panfrost: - MT8370 support - bo labeling - 64-bit register access qaic: - add RAS support rockchip: - convert inno_hdmi to a bridge rz-du: - add RZ/V2H(P) support - MIPI-DSI DCS support sitronix: - ST7567 support sun4i: - add H616 support tidss: - add TI AM62L support - AM65x OLDI bridge support bochs: - drm panic support vkms: - YUV and R* format support - use faux device vmwgfx: - fence improvements hyperv: - move out of simple - add drm_panic support" * tag 'drm-next-2025-07-30' of https://gitlab.freedesktop.org/drm/kernel: (1479 commits) drm/tidss: oldi: convert to devm_drm_bridge_alloc() API drm/tidss: encoder: convert to devm_drm_bridge_alloc() drm/amdgpu: move reset support type checks into the caller drm/amdgpu/sdma7: re-emit unprocessed state on ring reset drm/amdgpu/sdma6: re-emit unprocessed state on ring reset drm/amdgpu/sdma5.2: re-emit unprocessed state on ring reset drm/amdgpu/sdma5: re-emit unprocessed state on ring reset drm/amdgpu/gfx12: re-emit unprocessed state on ring reset drm/amdgpu/gfx11: re-emit unprocessed state on ring reset drm/amdgpu/gfx10: re-emit unprocessed state on ring reset drm/amdgpu/gfx9.4.3: re-emit unprocessed state on kcq reset drm/amdgpu/gfx9: re-emit unprocessed state on kcq reset drm/amdgpu: Add WARN_ON to the resource clear function drm/amd/pm: Use cached metrics data on SMUv13.0.6 drm/amd/pm: Use cached data for min/max clocks gpu: nova-core: fix bounds check in PmuLookupTableEntry::new drm/amdgpu: Replace HQD terminology with slots naming drm/amdgpu: Add user queue instance count in HW IP info drm/amd/amdgpu: Add helper functions for isp buffers drm/amd/amdgpu: Initialize swnode for ISP MFD device ...	2025-07-30 19:26:49 -07:00
Lukasz Laguna	f62408efc8	drm/xe/vf: Disable CSC support on VF CSC is not accessible by VF drivers, so disable its support flag on VF to prevent further initialization attempts. Fixes: `e02cea83d3` ("drm/xe/gsc: add Battlemage support") Signed-off-by: Lukasz Laguna <lukasz.laguna@intel.com> Cc: Alexander Usyskin <alexander.usyskin@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://lore.kernel.org/r/20250729123437.5933-1-lukasz.laguna@intel.com (cherry picked from commit `552dbba1ca`) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2025-07-30 15:09:27 -04:00
Linus Torvalds	9669b2499e	Merge tag 'platform-drivers-x86-v6.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86 Pull x86 platform drivers from Ilpo Järvinen: - alienware: Add more precise labels to fans - amd/hsmp: Improve misleading probe errors (make the legacy driver aware when HSMP is supported through the ACPI driver) - amd/pmc: Add Lenovo Yoga 6 13ALCL6 to pmc quirk list - drm/xe: Correct (D)VSEC information to support PMT crashlog feature - fujitsu: Clamp charge threshold instead of returning an error - ideapad: Expore change types - intel/pmt: - Add PMT Discovery driver - Add API to retrieve telemetry regions by feature - Fix crashlog NULL access - Support Battlemage GPU (BMG) crashlog - intel/vsec: - Add Discovery feature - Add feature dependency support using device links - lenovo: - Move lenovo drivers under drivers/platform/x86/lenovo/ - Add WMI drivers for Lenovo Gaming series - Improve DMI handling - oxpec: - Add support for OneXPlayer X1 Mini Pro (Strix Point variant) - Fix EC registers for G1 AMD - samsung-laptop: Expose change types - wmi: Fix WMI device naming issue (same GUID corner cases) - x86-android-tables: Add ovc-capacity-table to generic battery nodes - Miscellaneous cleanups / refactoring / improvements * tag 'platform-drivers-x86-v6.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86: (63 commits) platform/x86: oxpec: Add support for OneXPlayer X1 Mini Pro (Strix Point) platform/x86: oxpec: Fix turbo register for G1 AMD platform/x86/intel/pmt: support BMG crashlog platform/x86/intel/pmt: use a version struct platform/x86/intel/pmt: refactor base parameter platform/x86/intel/pmt: add register access helpers platform/x86/intel/pmt: decouple sysfs and namespace platform/x86/intel/pmt: correct types platform/x86/intel/pmt: re-order trigger logic platform/x86/intel/pmt: use guard(mutex) platform/x86/intel/pmt: mutex clean up platform/x86/intel/pmt: white space cleanup drm/xe: Correct BMG VSEC header sizing drm/xe: Correct the rev value for the DVSEC entries platform/x86/intel/pmt: fix a crashlog NULL pointer access platform/x86: samsung-laptop: Expose charge_types platform/x86/amd: pmc: Add Lenovo Yoga 6 13ALC6 to pmc quirk list platform/x86: dell-uart-backlight: Use blacklight power constant platform/x86/intel/pmt: fix build dependency for kunit test platform/x86: lenovo: gamezone needs "other mode" ...	2025-07-28 23:21:28 -07:00
Michal Wajdeczko	942ac8da63	drm/xe/configfs: Fix pci_dev reference leak We are using pci_get_domain_bus_and_slot() function to verify if the given config directory name matches any existing PCI device, but we missed to call matching pci_dev_put() to release reference. While around, also change error code in case of no device match, to make it more specific than generic formatting error. Fixes: `16280ded45` ("drm/xe: Add configfs to enable survivability mode") Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Link: https://lore.kernel.org/r/20250722141059.30707-2-michal.wajdeczko@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> (cherry picked from commit `0bdd05c2a8`) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2025-07-28 10:22:48 -04:00
Shuicheng Lin	4846856c3a	drm/xe/hw_engine_group: Avoid call kfree() for drmm_kzalloc() Memory allocated with drmm_kzalloc() should not be freed using kfree(), as it is managed by the DRM subsystem. The memory will be automatically freed when the associated drm_device is released. These 3 group pointers are allocated using drmm_kzalloc() in hw_engine_group_alloc(), so they don't require manual deallocation. Fixes: `6797906074` ("drm/xe/hw_engine_group: Fix potential leak") Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://lore.kernel.org/r/20250724193854.1124510-2-shuicheng.lin@intel.com (cherry picked from commit `f98de826b4`) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2025-07-28 10:22:43 -04:00
Michal Wajdeczko	a2e1407eb8	drm/xe/guc: Clear whole g2h_fence during initialization The struct g2h_fence must be explicitly initializated using the g2h_fence_init() function to avoid trash values in its members, but we missed to update this helper function with the new member. To fix that and avoid any future mistakes, memset the whole struct first, then update remaining non-zero members. Fixes: `94de94d24e` ("drm/xe/guc: Cancel ongoing H2G requests when stopping CT") Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Lukasz Laguna <lukasz.laguna@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250723175639.206875-1-michal.wajdeczko@intel.com (cherry picked from commit `159afd92ba`) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2025-07-28 10:22:38 -04:00
Lukasz Laguna	cccb918e02	drm/xe/vf: Don't register I2C devices if VF VF drivers can't access I2C devices, so skip their registration when running as VF. Signed-off-by: Lukasz Laguna <lukasz.laguna@intel.com> Fixes: `f0e53aadd7` ("drm/xe: Support for I2C attached MCUs") Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://lore.kernel.org/r/20250717155420.25298-1-lukasz.laguna@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> (cherry picked from commit `9a220e0659`) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2025-07-28 10:22:33 -04:00
Zhanjun Dong	dc94168eaa	drm/xe/uc: Fix missing unwind goto Fix missing unwind goto on error handling. Fixes: `b2c4ac219f` ("drm/xe/uc: Disable GuC communication on hardware initialization error") Signed-off-by: Zhanjun Dong <zhanjun.dong@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://lore.kernel.org/r/20250721214520.954014-1-zhanjun.dong@intel.com (cherry picked from commit `176f44a5ec`) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2025-07-28 10:22:28 -04:00
Dan Carpenter	2bd986021c	drm/xe: Fix a NULL vs IS_ERR() bug in xe_i2c_register_adapter() The fwnode_create_software_node() function returns error pointers. It never returns NULL. Update the checks to match. Fixes: `f0e53aadd7` ("drm/xe: Support for I2C attached MCUs") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://lore.kernel.org/r/65825d00-81ab-4665-af51-4fff6786a250@sabinyo.mountain Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> (cherry picked from commit `2f264d58cc`) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2025-07-28 10:22:22 -04:00
Ashutosh Dixit	6aaceed7fe	drm/xe/oa: Fix static checker warning about null gt There is a static checker warning that gt returned by xe_device_get_gt can be NULL and that is being dereferenced. Use xe_root_mmio_gt instead, which is equivalent and cannot return a NULL gt 0. Fixes: `10d42ef34b` ("drm/xe/oa: Assign hwe for OAM_SAG") Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Link: https://lore.kernel.org/r/20250715181422.2807624-1-ashutosh.dixit@intel.com (cherry picked from commit `308dc9b278`) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2025-07-28 10:22:17 -04:00

1 2 3 4 5 ...

3784 Commits