linux

mirror of https://github.com/torvalds/linux.git synced 2026-04-26 18:42:25 -04:00

Author	SHA1	Message	Date
Alex Deucher	ac4ed2da4c	Revert "drm/amdgpu: fix incorrect vm flags to map bo" This reverts commit `b08425fa77`. Revert this to align with 6.17 because the fixes tag was wrong on this commit. Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `be33e8a239`)	2025-08-27 14:00:38 -04:00
Alex Deucher	29f155c5e8	drm/amdgpu/gfx12: set MQD as appriopriate for queue types Set the MQD as appropriate for the kernel vs user queues. Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `7b9110f289`) Cc: stable@vger.kernel.org	2025-08-27 13:59:53 -04:00
Alex Deucher	27f5e0c132	drm/amdgpu/gfx11: set MQD as appriopriate for queue types Set the MQD as appropriate for the kernel vs user queues. Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `063d668320`) Cc: stable@vger.kernel.org	2025-08-27 13:59:25 -04:00
Alex Deucher	7e2a5b0a9a	drm/amdgpu/userq: fix error handling of invalid doorbell If the doorbell is invalid, be sure to set the r to an error state so the function returns an error. Reviewed-by: David (Ming Qiang) Wu <David.Wu3@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-08-27 13:57:51 -04:00
Jesse.Zhang	5f976c9939	drm/amdgpu: update firmware version checks for user queue support The minimum firmware versions required for user queue functionality have been increased to address an issue where the queue privilege state was lost during queue connect operations. The problem occurred because the privilege state was being restored to its initial value at the beginning of the function, overwriting the state that was properly set during the queue connect case. This commit updates the minimum version requirements: - ME firmware from 2390 to 2420 - PFP firmware from 2530 to 2580 - MEC firmware from 2600 to 2650 - MES firmware remains at 120 These updated firmware versions contain the necessary fixes to properly maintain queue privilege state throughout connect operations. Fixes: `61ca97e959` ("drm/amdgpu: Add fw minimum version check for usermode queue") Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-08-27 13:57:51 -04:00
Alex Deucher	ec813f384b	drm/amdgpu/vpe: cancel delayed work in hw_fini We need to cancel any outstanding work at both suspend and driver teardown. Move the cancel to hw_fini which gets called in both cases. Reviewed-by: David (Ming Qiang) Wu <David.Wu3@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-08-27 13:57:51 -04:00
David (Ming Qiang) Wu	061a09b4dc	drm/amdgpu/vcn: remove unused code in vcn_v1_0.c Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: David (Ming Qiang) Wu <David.Wu3@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-08-27 13:57:51 -04:00
Shaoyun Liu	87e6505261	drm/amd/amdgpu : Use the MES INV_TLBS API for tlb invalidation on gfx12 From MES version 0x81, it provide the new API INV_TLBS that support invalidate tlbs with PASID. Signed-off-by: Shaoyun Liu <shaoyun.liu@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-08-27 13:57:51 -04:00
Jesse.Zhang	a7a411e246	drm/amdgpu: fix shift-out-of-bounds in amdgpu_debugfs_jpeg_sched_mask_set Fix a UBSAN shift-out-of-bounds warning in amdgpu_debugfs_jpeg_sched_mask_set when the shift exponent reaches or exceeds 32 bits. The issue occurred because a 32-bit integer '1' was being shifted by up to 32 bits, which is undefined behavior. Replace '1' with '1ULL' to ensure 64-bit arithmetic, matching the u64 type of 'val' and preventing the shift overflow. This is consistent with the existing mask calculation that already uses 1ULL. The error manifested as: UBSAN: shift-out-of-bounds in drivers/gpu/drm/amd/amdgpu/amdgpu_jpeg.c:373:17 shift exponent 32 is too large for 32-bit type 'int' v2: remove debug log Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Suggested-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-08-27 13:57:51 -04:00
Jack Xiao	c350d9e268	Reapply "drm/amdgpu: fix incorrect vm flags to map bo" It should use vm flags instead of pte flags to specify bo vm attributes. This reverts commit 1263ceea2a1327014d9de2858a122f3c27dfa4dd. Reapply this patch with the proper fixes tag. Fixes: `6716a823d1` ("drm/amdgpu: rework how PTE flags are generated v3") Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Reviewed-by: Likun Gao <Likun.Gao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-08-27 13:57:50 -04:00
Alex Deucher	be33e8a239	Revert "drm/amdgpu: fix incorrect vm flags to map bo" This reverts commit `b08425fa77`. Revert this to align with 6.17 because the fixes tag was wrong on this commit. Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-08-27 13:57:50 -04:00
Alex Deucher	1c65502f81	drm/amdgpu/vpe: add ring reset support Implement ring reset for VPE. Similar to VCN and JPEG, just powergate the the IP to reset it. v2: Properly set per queue reset flag Reviewed-by: Jesse Zhang <jesse.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-08-27 13:57:49 -04:00
Alex Deucher	fa064d50b7	drm/amdgpu/vcn: drop extra cancel_delayed_work_sync() We already call this in the hw_fini() methods for all VCN instances, so no need to call it again in amdgpu_vcn_suspend(). Tested-by: David (Ming Qiang) Wu <David.Wu3@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-08-27 13:57:49 -04:00
Yifan Zhang	e68197aa2b	drm/amdgpu: remove redundant AMDGPU_HAS_VRAM AMDGPU_HAS_VRAM is redundant with is_app_apu, as both refer to APUs with no carve-out. Since AMDGPU_HAS_VRAM only occurs once, remove AMDGPU_HAS_VRAM definition. The tmr allocation can be covered with AMDGPU_GEM_DOMAIN_GTT \| AMDGPU_GEM_DOMAIN_VRAM in both vram and non vram ASICs. Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-08-27 13:57:48 -04:00
Ce Sun	d8442bcad0	drm/amdgpu: Correct the loss of aca bank reg info By polling, poll ACA bank count to ensure that valid ACA bank reg info can be obtained v2: add corresponding delay before send msg to SMU to query mca bank info (Stanley) v3: the loop cannot exit. (Thomas) v4: remove amdgpu_aca_clear_bank_count. (Kevin) v5: continuously inject ce. If a creation interruption occurs at this time, bank reg info will be lost. (Thomas) v5: each cycle is delayed by 100ms. (Tao) Signed-off-by: Ce Sun <cesun102@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-08-27 13:57:47 -04:00
Ce Sun	0989b764f4	drm/amdgpu: Add a mutex lock to protect poison injection When poison is triggered multiple times, competition will occur. Add a mutex lock to protect poison injection Signed-off-by: Ce Sun <cesun102@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-08-27 13:57:47 -04:00
Ce Sun	907813e5d7	drm/amdgpu: Correct the counts of nr_banks and nr_errors Correct the counts of nr_banks and nr_errors Signed-off-by: Ce Sun <cesun102@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-08-27 13:57:47 -04:00
Liao Yuanhong	ee6ba1e69d	drm/amdgpu/fence: Remove redundant 0 value initialization The amdgpu_fence struct is already zeroed by kzalloc(). It's redundant to initialize am_fence->context to 0. Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Liao Yuanhong <liaoyuanhong@vivo.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-08-27 13:57:47 -04:00
Hawking Zhang	22dcb283d6	drm/amdgpu: Allocate psp fw private buffer in vram It's not necessarily to allocate psp firmware private buffer in different memory domain in sriov and bare metal environment Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Le Ma <le.ma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-08-27 13:57:47 -04:00
Alex Deucher	7b9110f289	drm/amdgpu/gfx12: set MQD as appriopriate for queue types Set the MQD as appropriate for the kernel vs user queues. Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-08-27 13:57:42 -04:00
Alex Deucher	063d668320	drm/amdgpu/gfx11: set MQD as appriopriate for queue types Set the MQD as appropriate for the kernel vs user queues. Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-08-27 13:57:37 -04:00
Thomas Zimmermann	08fb45446e	drm/amdgpu: Pin buffers while vmap'ing exported dma-buf objects Current dma-buf vmap semantics require that the mapped buffer remains in place until the corresponding vunmap has completed. For GEM-SHMEM, this used to be guaranteed by a pin operation while creating an S/G table in import. GEM-SHMEN can now import dma-buf objects without creating the S/G table, so the pin is missing. Leads to page-fault errors, such as the one shown below. [ 102.101726] BUG: unable to handle page fault for address: ffffc90127000000 [...] [ 102.157102] RIP: 0010:udl_compress_hline16+0x219/0x940 [udl] [...] [ 102.243250] Call Trace: [ 102.245695] <TASK> [ 102.2477V95] ? validate_chain+0x24e/0x5e0 [ 102.251805] ? __lock_acquire+0x568/0xae0 [ 102.255807] udl_render_hline+0x165/0x341 [udl] [ 102.260338] ? __pfx_udl_render_hline+0x10/0x10 [udl] [ 102.265379] ? local_clock_noinstr+0xb/0x100 [ 102.269642] ? __lock_release.isra.0+0x16c/0x2e0 [ 102.274246] ? mark_held_locks+0x40/0x70 [ 102.278177] udl_primary_plane_helper_atomic_update+0x43e/0x680 [udl] [ 102.284606] ? __pfx_udl_primary_plane_helper_atomic_update+0x10/0x10 [udl] [ 102.291551] ? lockdep_hardirqs_on_prepare.part.0+0x92/0x170 [ 102.297208] ? lockdep_hardirqs_on+0x88/0x130 [ 102.301554] ? _raw_spin_unlock_irq+0x24/0x50 [ 102.305901] ? wait_for_completion_timeout+0x2bb/0x3a0 [ 102.311028] ? drm_atomic_helper_calc_timestamping_constants+0x141/0x200 [ 102.317714] ? drm_atomic_helper_commit_planes+0x3b6/0x1030 [ 102.323279] drm_atomic_helper_commit_planes+0x3b6/0x1030 [ 102.328664] drm_atomic_helper_commit_tail+0x41/0xb0 [ 102.333622] commit_tail+0x204/0x330 [...] [ 102.529946] ---[ end trace 0000000000000000 ]--- [ 102.651980] RIP: 0010:udl_compress_hline16+0x219/0x940 [udl] In this stack strace, udl (based on GEM-SHMEM) imported and vmap'ed a dma-buf from amdgpu. Amdgpu relocated the buffer, thereby invalidating the mapping. Provide a custom dma-buf vmap method in amdgpu that pins the object before mapping it's buffer's pages into kernel address space. Do the opposite in vunmap. Note that dma-buf vmap differs from GEM vmap in how it handles relocation. While dma-buf vmap keeps the buffer in place, GEM vmap requires the caller to keep the buffer in place. Hence, this fix is in amdgpu's dma-buf code instead of its GEM code. A discussion of various approaches to solving the problem is available at [1]. v3: - try (GTT \| VRAM); drop CPU domain (Christian) v2: - only use mapable domains (Christian) - try pinning to domains in preferred order Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Fixes: `660cd44659` ("drm/shmem-helper: Import dmabuf without mapping its sg_table") Reported-by: Thomas Zimmermann <tzimmermann@suse.de> Closes: https://lore.kernel.org/dri-devel/ba1bdfb8-dbf7-4372-bdcb-df7e0511c702@suse.de/ Cc: Shixiong Ou <oushixiong@kylinos.cn> Cc: Thomas Zimmermann <tzimmermann@suse.de> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Maxime Ripard <mripard@kernel.org> Cc: David Airlie <airlied@gmail.com> Cc: Simona Vetter <simona@ffwll.ch> Cc: Sumit Semwal <sumit.semwal@linaro.org> Cc: "Christian König" <christian.koenig@amd.com> Cc: dri-devel@lists.freedesktop.org Cc: linux-media@vger.kernel.org Cc: linaro-mm-sig@lists.linaro.org Link: https://lore.kernel.org/dri-devel/9792c6c3-a2b8-4b2b-b5ba-fba19b153e21@suse.de/ # [1] Reviewed-by: Christian König <christian.koenig@amd.com> Link: https://lore.kernel.org/r/20250821064031.39090-1-tzimmermann@suse.de	2025-08-22 14:05:31 +02:00
Maxime Ripard	1a2cf179e2	Merge drm/drm-fixes into drm-misc-fixes Update drm-misc-fixes to -rc2. Signed-off-by: Maxime Ripard <mripard@kernel.org>	2025-08-20 16:08:49 +02:00
Rodrigo Siqueira	3856a53db6	drm/amdgpu/vcn: Remove unnecessary check The function amdgpu_vcn_sysfs_reset_mask_init already returns 0, which makes the check of the result unnecessary in the vcn_v4_0_3_sw_init(). Just return the amdgpu_vcn_sysfs_reset_mask_init directly. Signed-off-by: Rodrigo Siqueira <siqueira@igalia.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-08-18 17:06:19 -04:00
Chenglei Xie	d2fa0ec6e0	drm/amdgpu: refactor bad_page_work for corner case handling When a poison is consumed on the guest before the guest receives the host's poison creation msg, a corner case may occur to have poison_handler complete processing earlier than it should to cause the guest to hang waiting for the req_bad_pages reply during a VF FLR, resulting in the VM becoming inaccessible in stress tests. To fix this issue, this patch refactored the mailbox sequence by seperating the bad_page_work into two parts req_bad_pages_work and handle_bad_pages_work. Old sequence: 1.Stop data exchange work 2.Guest sends MB_REQ_RAS_BAD_PAGES to host and keep polling for IDH_RAS_BAD_PAGES_READY 3.If the IDH_RAS_BAD_PAGES_READY arrives within timeout limit, re-init the data exchange region for updated bad page info else timeout with error message New sequence: req_bad_pages_work: 1.Stop data exhange work 2.Guest sends MB_REQ_RAS_BAD_PAGES to host Once Guest receives IDH_RAS_BAD_PAGES_READY event handle_bad_pages_work: 3.re-init the data exchange region for updated bad page info Signed-off-by: Chenglei Xie <Chenglei.Xie@amd.com> Reviewed-by: Shravan Kumar Gande <Shravankumar.Gande@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-08-15 13:07:30 -04:00
Qiang Liu	fc4e990a32	drm/amdgpu: remove duplicated argument wptr_va The duplicate judgment of wptr_va could be removed to simplify the logic Signed-off-by: Qiang Liu <liuqiang@kylinos.cn> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-08-15 13:07:25 -04:00
Jesse.Zhang	655d6403ad	drm/amd/vcn: Add late_init callback for VCN v4.0.3 reset handling This change reorganizes VCN reset capability detection by: 1. Moving reset mask configuration from sw_init to new late_init phase 2. Adding vcn_v4_0_3_late_init() to properly check for per-queue reset support 3. Only setting soft full reset mask as fallback when per-queue reset isn't supported 4. Removing TODO comment now that queue reset support is implemented V2: Removed unrelated changes. Keep amdgpu_get_soft_full_reset_mask in place and remove TODO comment. (Alex) v3: set the flags at one place (all in late_init) (Lijo) Suggested-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Ruili Ji <ruiliji2@amd.com> Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-08-15 13:05:06 -04:00
Heng Zhou	859958a7fa	drm/amdgpu: fix nullptr err of vm_handle_moved If a amdgpu_bo_va is fpriv->prt_va, the bo of this one is always NULL. So, such kind of amdgpu_bo_va should be updated separately before amdgpu_vm_handle_moved. Signed-off-by: Heng Zhou <Heng.Zhou@amd.com> Reviewed-by: Kasiviswanathan, Harish <Harish.Kasiviswanathan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-08-15 13:04:07 -04:00
Thomas Zimmermann	3eb61d7cb7	Revert "drm/amdgpu: Use dma_buf from GEM object instance" This reverts commit `515986100d`. The dma_buf field in struct drm_gem_object is not stable over the object instance's lifetime. The field becomes NULL when user space releases the final GEM handle on the buffer object. This resulted in a NULL-pointer deref. Workarounds in commit `5307dce878` ("drm/gem: Acquire references on GEM handles for framebuffers") and commit `f6bfc9afc7` ("drm/framebuffer: Acquire internal references on GEM handles") only solved the problem partially. They especially don't work for buffer objects without a DRM framebuffer associated. Hence, this revert to going back to using .import_attach->dmabuf. Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Reviewed-by: Simona Vetter <simona.vetter@ffwll.ch> Acked-by: Alex Deucher <alexander.deucher@amd.com> Link: https://lore.kernel.org/r/20250715082635.34974-1-tzimmermann@suse.de	2025-08-13 11:14:17 +02:00
Liu01 Tong	aa5fc4362f	drm/amdgpu: fix task hang from failed job submission during process kill During process kill, drm_sched_entity_flush() will kill the vm entities. The following job submissions of this process will fail, and the resources of these jobs have not been released, nor have the fences been signalled, causing tasks to hang and timeout. Fix by check entity status in amdgpu_vm_ready() and avoid submit jobs to stopped entity. v2: add amdgpu_vm_ready() check before amdgpu_vm_clear_freed() in function amdgpu_cs_vm_handling(). Fixes: `1f02f2044b` ("drm/amdgpu: Avoid extra evict-restore process.") Signed-off-by: Liu01 Tong <Tong.Liu01@amd.com> Signed-off-by: Lin.Cao <lincao12@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `f101c13a87`)	2025-08-12 16:16:36 -04:00
Jack Xiao	040bc6d0e0	drm/amdgpu: fix incorrect vm flags to map bo It should use vm flags instead of pte flags to specify bo vm attributes. Fixes: `7946340fa3` ("drm/amdgpu: Move csa related code to separate file") Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Reviewed-by: Likun Gao <Likun.Gao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `b08425fa77`)	2025-08-12 16:08:04 -04:00
YiPeng Chai	10ef476aad	drm/amdgpu: fix vram reservation issue The vram block allocation flag must be cleared before making vram reservation, otherwise reserving addresses within the currently freed memory range will always fail. Fixes: `c9cad937c0` ("drm/amdgpu: add drm buddy support to amdgpu") Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `d38eaf27de`)	2025-08-12 16:07:45 -04:00
Frank Min	e67b8afcb6	drm/amdgpu: Add PSP fw version check for fw reserve GFX command The fw reserved GFX command is only supported starting from PSP fw version 0x3a0e14 and 0x3b0e0d. Older versions do not support this command. Add a version guard to ensure the command is only used when the running PSP fw meets the minimum version requirement. This ensures backward compatibility and safe operation across fw revisions. Fixes: `a3b7f9c306` ("drm/amdgpu: reclaim psp fw reservation memory region") Signed-off-by: Frank Min <Frank.Min@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `065e23170a`)	2025-08-12 16:07:31 -04:00
Liu01 Tong	f101c13a87	drm/amdgpu: fix task hang from failed job submission during process kill During process kill, drm_sched_entity_flush() will kill the vm entities. The following job submissions of this process will fail, and the resources of these jobs have not been released, nor have the fences been signalled, causing tasks to hang and timeout. Fix by check entity status in amdgpu_vm_ready() and avoid submit jobs to stopped entity. v2: add amdgpu_vm_ready() check before amdgpu_vm_clear_freed() in function amdgpu_cs_vm_handling(). Signed-off-by: Liu01 Tong <Tong.Liu01@amd.com> Signed-off-by: Lin.Cao <lincao12@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-08-12 14:22:54 -04:00
Jack Xiao	b08425fa77	drm/amdgpu: fix incorrect vm flags to map bo It should use vm flags instead of pte flags to specify bo vm attributes. Fixes: `7946340fa3` ("drm/amdgpu: Move csa related code to separate file") Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Reviewed-by: Likun Gao <Likun.Gao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-08-12 14:19:57 -04:00
YiPeng Chai	d38eaf27de	drm/amdgpu: fix vram reservation issue The vram block allocation flag must be cleared before making vram reservation, otherwise reserving addresses within the currently freed memory range will always fail. Fixes: `c9cad937c0` ("drm/amdgpu: add drm buddy support to amdgpu") Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-08-12 14:15:03 -04:00
Frank Min	065e23170a	drm/amdgpu: Add PSP fw version check for fw reserve GFX command The fw reserved GFX command is only supported starting from PSP fw version 0x3a0e14 and 0x3b0e0d. Older versions do not support this command. Add a version guard to ensure the command is only used when the running PSP fw meets the minimum version requirement. This ensures backward compatibility and safe operation across fw revisions. Fixes: `a3b7f9c306` ("drm/amdgpu: reclaim psp fw reservation memory region") Signed-off-by: Frank Min <Frank.Min@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-08-12 14:13:03 -04:00
Lijo Lazar	d543489aa1	drm/amdgpu: Add description for partition commands Add string description for partition commands. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-08-12 14:12:57 -04:00
Cryolitia PukNgae	388b68aef7	drm/amdgpu: fix incorrect comment format Comments should not have a leading plus sign. Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Cryolitia PukNgae <cryolitia@uniontech.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-08-11 11:13:41 -04:00
Vitaly Prosyak	c31f486bc8	drm/amdgpu: add to custom amdgpu_drm_release drm_dev_enter/exit User queues are disabled before GEM objects are released (protecting against user app crashes). No races with PCI hot-unplug (because drm_dev_enter prevents cleanup if iewdevice is being removed). Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-08-11 11:13:03 -04:00
Lijo Lazar	1dd2fa0e00	drm/amdgpu: Save and restore switch state During a DPC error kernel waits for the link to be active before notifying downstream devices. On certain platforms with Broadcom switch in synthetiic mode, switch responds with values even though the link is not fully ready. The config space restoration done by pcie port driver for SWUS/DS of dGPU is thus not effective as the switch is still doing internal enumeration. As a workaround, save state of SWUS/DS device in driver. Add additional check to see if link is active and restore the values during DPC error callbacks. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-08-11 11:12:53 -04:00
Linus Torvalds	ffe8ac927d	Merge tag 'drm-next-2025-08-08' of https://gitlab.freedesktop.org/drm/kernel Pull drm fixes from Dave Airlie: "This is the fixes that built up in the merge window, mostly amdgpu and xe with one i915 display fix, seems like things are pretty good for rc1. i915: - DP LPFS fixes xe: - SRIOV: PF fixes and removal of need of module param - Fix driver unbind around Devcoredump - Mark xe driver as BROKEN if kernel page size is not 4kB amdgpu: - GC 9.5.0 fixes - SMU fix - DCE 6 DC fixes - mmhub client ID fixes - VRR fix - Backlight fix - UserQ fix - Legacy reset fix - Misc fixes amdkfd: - CRIU fix - Debugfs fix" * tag 'drm-next-2025-08-08' of https://gitlab.freedesktop.org/drm/kernel: (28 commits) drm/amdgpu: add missing vram lost check for LEGACY RESET drm/amdgpu/discovery: fix fw based ip discovery drm/amdkfd: Destroy KFD debugfs after destroy KFD wq amdgpu/amdgpu_discovery: increase timeout limit for IFWI init drm/amdgpu: Update SDMA firmware version check for user queue support drm/amdgpu: Add NULL check for asic_funcs drm/amd/display: Revert "drm/amd/display: Fix AMDGPU_MAX_BL_LEVEL value" drm/amd/display: fix a Null pointer dereference vulnerability drm/amd/display: Add primary plane to commits for correct VRR handling drm/amdgpu: update mmhub 3.3 client id mappings drm/amdgpu: update mmhub 3.0.1 client id mappings drm/amdgpu: Retain job->vm in amdgpu_job_prepare_job drm/amd/display: Fix DCE 6.0 and 6.4 PLL programming. drm/amd/display: Don't overwrite dce60_clk_mgr drm/amdkfd: Fix checkpoint-restore on multi-xcc drm/amd: Restore cached manual clock settings during resume drm/amd: Restore cached power limit during resume drm/amdgpu: Update external revid for GC v9.5.0 drm/amdgpu: Update supported modes for GC v9.5.0 Mark xe driver as BROKEN if kernel page size is not 4kB ...	2025-08-08 06:48:14 +03:00
Alex Deucher	81699fe81b	drm/amdgpu: add missing vram lost check for LEGACY RESET Legacy resets reset the memory controllers so VRAM contents may be unreliable after reset. Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `aae94897b6`) Cc: stable@vger.kernel.org	2025-08-06 16:54:25 -04:00
Alex Deucher	514678da56	drm/amdgpu/discovery: fix fw based ip discovery We only need the fw based discovery table for sysfs. No need to parse it. Additionally parsing some of the board specific tables may result in incorrect data on some boards. just load the binary and don't parse it on those boards. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4441 Fixes: `80a0e82829` ("drm/amdgpu/discovery: optionally use fw based ip discovery") Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `62eedd150f`) Cc: stable@vger.kernel.org	2025-08-06 16:54:04 -04:00
Xaver Hugl	928587381b	amdgpu/amdgpu_discovery: increase timeout limit for IFWI init With a timeout of only 1 second, my rx 5700XT fails to initialize, so this increases the timeout to 2s. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3697 Signed-off-by: Xaver Hugl <xaver.hugl@kde.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `9ed3d7bdf2`) Cc: stable@vger.kernel.org	2025-08-06 16:51:26 -04:00
Sathishkumar S	111821e4b5	drm/amdgpu/vcn: Hold pg_lock before vcn power off Acquire vcn_pg_lock before changes to vcn power state and release it after power off in idle work handler. Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-08-06 14:34:54 -04:00
Sathishkumar S	0e7581eda8	drm/amdgpu/jpeg: Hold pg_lock before jpeg poweroff Acquire jpeg_pg_lock before changes to jpeg power state and release it after power off from idle work handler. Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-08-06 14:34:50 -04:00
Lijo Lazar	0b4d79dafa	drm/amdgpu: Assign unique id to compute partition Assign unique id to compute partition. This is the unique id of the first XCD instance belonging to the partition. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-08-06 14:34:47 -04:00
Alex Deucher	aae94897b6	drm/amdgpu: add missing vram lost check for LEGACY RESET Legacy resets reset the memory controllers so VRAM contents may be unreliable after reset. Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-08-06 14:34:14 -04:00
Alex Deucher	62eedd150f	drm/amdgpu/discovery: fix fw based ip discovery We only need the fw based discovery table for sysfs. No need to parse it. Additionally parsing some of the board specific tables may result in incorrect data on some boards. just load the binary and don't parse it on those boards. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4441 Fixes: `80a0e82829` ("drm/amdgpu/discovery: optionally use fw based ip discovery") Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-08-06 14:32:38 -04:00

... 15 16 17 18 19 ...

17069 Commits