linux

mirror of https://github.com/torvalds/linux.git synced 2026-04-18 14:53:58 -04:00

Author	SHA1	Message	Date
Alex Deucher	90d239cc53	drm/amd/display: Fix DCE LVDS handling LVDS does not use an HPD pin so it may be invalid. Handle this case correctly in link encoder creation. Fixes: `7c8fb3b8e9` ("drm/amd/display: Add hpd_source index check for DCE60/80/100/110/112/120 link encoders") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/5012 Cc: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Cc: Roman Li <roman.li@amd.com> Reviewed-by: Roman Li <roman.li@amd.com> Reviewed-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `3b5620f7ee`) Cc: stable@vger.kernel.org	2026-03-24 13:55:47 -04:00
Donet Tom	4e9597f22a	drm/amdgpu: Handle GPU page faults correctly on non-4K page systems During a GPU page fault, the driver restores the SVM range and then maps it into the GPU page tables. The current implementation passes a GPU-page-size (4K-based) PFN to svm_range_restore_pages() to restore the range. SVM ranges are tracked using system-page-size PFNs. On systems where the system page size is larger than 4K, using GPU-page-size PFNs to restore the range causes two problems: Range lookup fails: Because the restore function receives PFNs in GPU (4K) units, the SVM range lookup does not find the existing range. This will result in a duplicate SVM range being created. VMA lookup failure: The restore function also tries to locate the VMA for the faulting address. It converts the GPU-page-size PFN into an address using the system page size, which results in an incorrect address on non-4K page-size systems. As a result, the VMA lookup fails with the message: "address 0xxxx VMA is removed". This patch passes the system-page-size PFN to svm_range_restore_pages() so that the SVM range is restored correctly on non-4K page systems. Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Donet Tom <donettom@linux.ibm.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `074fe395fb`)	2026-03-24 13:55:29 -04:00
Yang Wang	28922a43fd	drm/amd/pm: disable OD_FAN_CURVE if temp or pwm range invalid for smu v14 Forcibly disable the OD_FAN_CURVE feature when temperature or PWM range is invalid, otherwise PMFW will reject this configuration on smu v14.0.2/14.0.3. example: $ sudo cat /sys/bus/pci/devices/<BDF>/gpu_od/fan_ctrl/fan_curve OD_FAN_CURVE: 0: 0C 0% 1: 0C 0% 2: 0C 0% 3: 0C 0% 4: 0C 0% OD_RANGE: FAN_CURVE(hotspot temp): 0C 0C FAN_CURVE(fan speed): 0% 0% $ echo "0 50 40" \| sudo tee fan_curve kernel log: [ 969.761627] amdgpu 0000:03:00.0: amdgpu: Fan curve temp setting(50) must be within [0, 0]! [ 1010.897800] amdgpu 0000:03:00.0: amdgpu: Fan curve temp setting(50) must be within [0, 0]! Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `ab4905d466`) Cc: stable@vger.kernel.org	2026-03-24 13:54:42 -04:00
Srinivasan Shanmugam	429aec2bc0	drm/amdkfd: Fix NULL pointer check order in kfd_ioctl_create_process In kfd_ioctl_create_process(), the pointer 'p' is used before checking if it is NULL. The code accesses p->context_id before validating 'p'. This can lead to a possible NULL pointer dereference. Move the NULL check before using 'p' so that the pointer is validated before access. Fixes the below: drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_chardev.c:3177 kfd_ioctl_create_process() warn: variable dereferenced before check 'p' (see line 3174) Fixes: `cc6b66d661` ("amdkfd: introduce new ioctl AMDKFD_IOC_CREATE_PROCESS") Cc: Zhu Lingshan <lingshan.zhu@amd.com> Cc: Felix Kuehling <felix.kuehling@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `19d4149b22`)	2026-03-24 13:54:19 -04:00
Alex Deucher	9da4f9964a	drm/amd/display: check if ext_caps is valid in BL setup LVDS connectors don't have extended backlight caps so check if the pointer is valid before accessing it. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/5012 Fixes: `1454642960` ("drm/amd: Re-introduce property to control adaptive backlight modulation") Cc: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `3f797396d7`) Cc: stable@vger.kernel.org	2026-03-24 13:53:46 -04:00
Srinivasan Shanmugam	7150850146	drm/amdgpu: Fix fence put before wait in amdgpu_amdkfd_submit_ib amdgpu_amdkfd_submit_ib() submits a GPU job and gets a fence from amdgpu_ib_schedule(). This fence is used to wait for job completion. Currently, the code drops the fence reference using dma_fence_put() before calling dma_fence_wait(). If dma_fence_put() releases the last reference, the fence may be freed before dma_fence_wait() is called. This can lead to a use-after-free. Fix this by waiting on the fence first and releasing the reference only after dma_fence_wait() completes. Fixes the below: drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c:697 amdgpu_amdkfd_submit_ib() warn: passing freed memory 'f' (line 696) Fixes: `9ae55f030d` ("drm/amdgpu: Follow up change to previous drm scheduler change.") Cc: Felix Kuehling <Felix.Kuehling@amd.com> Cc: Dan Carpenter <dan.carpenter@linaro.org> Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `8b9e5259ad`)	2026-03-24 13:53:33 -04:00
Yang Wang	3e6dd28a11	drm/amd/pm: disable OD_FAN_CURVE if temp or pwm range invalid for smu v13 Forcibly disable the OD_FAN_CURVE feature when temperature or PWM range is invalid, otherwise PMFW will reject this configuration on smu v13.0.x example: $ sudo cat /sys/bus/pci/devices/<BDF>/gpu_od/fan_ctrl/fan_curve OD_FAN_CURVE: 0: 0C 0% 1: 0C 0% 2: 0C 0% 3: 0C 0% 4: 0C 0% OD_RANGE: FAN_CURVE(hotspot temp): 0C 0C FAN_CURVE(fan speed): 0% 0% $ echo "0 50 40" \| sudo tee fan_curve kernel log: [ 756.442527] amdgpu 0000:03:00.0: amdgpu: Fan curve temp setting(50) must be within [0, 0]! [ 777.345800] amdgpu 0000:03:00.0: amdgpu: Fan curve temp setting(50) must be within [0, 0]! Closes: https://github.com/ROCm/amdgpu/issues/208 Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `470891606c`) Cc: stable@vger.kernel.org	2026-03-23 14:49:15 -04:00
Asad Kamal	2f0e491fae	drm/amd/pm: Return -EOPNOTSUPP for unsupported OD_MCLK on smu_v13_0_6 When SET_UCLK_MAX capability is absent, return -EOPNOTSUPP from smu_v13_0_6_emit_clk_levels() for OD_MCLK instead of 0. This makes unsupported OD_MCLK reporting consistent with other clock types and allows callers to skip the entry cleanly. Signed-off-by: Asad Kamal <asad.kamal@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `d82e0a72d9`) Cc: stable@vger.kernel.org	2026-03-23 14:48:52 -04:00
Asad Kamal	cdbc3b62cf	drm/amd/pm: Skip redundant UCLK restore in smu_v13_0_6 Only reapply UCLK soft limits during PP_OD_RESTORE_DEFAULT when the current max differs from the DPM table max. This avoids redundant SMC updates and prevents -EINVAL on restore when no change is needed. Fixes: `b7a9003445` ("drm/amd/pm: Allow setting max UCLK on SMU v13.0.6") Signed-off-by: Asad Kamal <asad.kamal@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `17f11bbbc7`)	2026-03-23 14:48:45 -04:00
Alex Hung	37c2caa167	drm/amd/display: Fix drm_edid leak in amdgpu_dm [WHAT] When a sink is connected, aconnector->drm_edid was overwritten without freeing the previous allocation, causing a memory leak on resume. [HOW] Free the previous drm_edid before updating it. Reviewed-by: Roman Li <roman.li@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Chuanyu Tseng <chuanyu.tseng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `52024a94e7`) Cc: stable@vger.kernel.org	2026-03-23 14:48:32 -04:00
Eric Huang	14b81abe7b	drm/amdgpu: prevent immediate PASID reuse case PASID resue could cause interrupt issue when process immediately runs into hw state left by previous process exited with the same PASID, it's possible that page faults are still pending in the IH ring buffer when the process exits and frees up its PASID. To prevent the case, it uses idr cyclic allocator same as kernel pid's. Signed-off-by: Eric Huang <jinhuieric.huang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `8f1de51f49`) Cc: stable@vger.kernel.org	2026-03-23 14:48:06 -04:00
Ruijing Dong	2d300ebfc4	drm/amdgpu: fix strsep() corrupting lockup_timeout on multi-GPU (v3) amdgpu_device_get_job_timeout_settings() passes a pointer directly to the global amdgpu_lockup_timeout[] buffer into strsep(). strsep() destructively replaces delimiter characters with '\0' in-place. On multi-GPU systems, this function is called once per device. When a multi-value setting like "0,0,0,-1" is used, the first GPU's call transforms the global buffer into "0\00\00\0-1". The second GPU then sees only "0" (terminated at the first '\0'), parses a single value, hits the single-value fallthrough (index == 1), and applies timeout=0 to all rings — causing immediate false job timeouts. Fix this by copying into a stack-local array before calling strsep(), so the global module parameter buffer remains intact across calls. The buffer is AMDGPU_MAX_TIMEOUT_PARAM_LENGTH (256) bytes, which is safe for the stack. v2: wrap commit message to 72 columns, add Assisted-by tag. v3: use stack array with strscpy() instead of kstrdup()/kfree() to avoid unnecessary heap allocation (Christian). This patch was developed with assistance from Claude (claude-opus-4-6). Assisted-by: Claude:claude-opus-4-6 Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Ruijing Dong <ruijing.dong@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `94d79f51ef`) Cc: stable@vger.kernel.org	2026-03-23 14:47:47 -04:00
Yussuf Khalil	aed3d041ab	drm/amd/display: Do not skip unrelated mode changes in DSC validation Starting with commit `17ce8a6907` ("drm/amd/display: Add dsc pre-validation in atomic check"), amdgpu resets the CRTC state mode_changed flag to false when recomputing the DSC configuration results in no timing change for a particular stream. However, this is incorrect in scenarios where a change in MST/DSC configuration happens in the same KMS commit as another (unrelated) mode change. For example, the integrated panel of a laptop may be configured differently (e.g., HDR enabled/disabled) depending on whether external screens are attached. In this case, plugging in external DP-MST screens may result in the mode_changed flag being dropped incorrectly for the integrated panel if its DSC configuration did not change during precomputation in pre_validate_dsc(). At this point, however, dm_update_crtc_state() has already created new streams for CRTCs with DSC-independent mode changes. In turn, amdgpu_dm_commit_streams() will never release the old stream, resulting in a memory leak. amdgpu_dm_atomic_commit_tail() will never acquire a reference to the new stream either, which manifests as a use-after-free when the stream gets disabled later on: BUG: KASAN: use-after-free in dc_stream_release+0x25/0x90 [amdgpu] Write of size 4 at addr ffff88813d836524 by task kworker/9:9/29977 Workqueue: events drm_mode_rmfb_work_fn Call Trace: <TASK> dump_stack_lvl+0x6e/0xa0 print_address_description.constprop.0+0x88/0x320 ? dc_stream_release+0x25/0x90 [amdgpu] print_report+0xfc/0x1ff ? srso_alias_return_thunk+0x5/0xfbef5 ? __virt_addr_valid+0x225/0x4e0 ? dc_stream_release+0x25/0x90 [amdgpu] kasan_report+0xe1/0x180 ? dc_stream_release+0x25/0x90 [amdgpu] kasan_check_range+0x125/0x200 dc_stream_release+0x25/0x90 [amdgpu] dc_state_destruct+0x14d/0x5c0 [amdgpu] dc_state_release.part.0+0x4e/0x130 [amdgpu] dm_atomic_destroy_state+0x3f/0x70 [amdgpu] drm_atomic_state_default_clear+0x8ee/0xf30 ? drm_mode_object_put.part.0+0xb1/0x130 __drm_atomic_state_free+0x15c/0x2d0 atomic_remove_fb+0x67e/0x980 Since there is no reliable way of figuring out whether a CRTC has unrelated mode changes pending at the time of DSC validation, remember the value of the mode_changed flag from before the point where a CRTC was marked as potentially affected by a change in DSC configuration. Reset the mode_changed flag to this earlier value instead in pre_validate_dsc(). Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/5004 Fixes: `17ce8a6907` ("drm/amd/display: Add dsc pre-validation in atomic check") Signed-off-by: Yussuf Khalil <dev@pp3345.net> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `cc7c7121ae`)	2026-03-23 14:38:56 -04:00
Alex Deucher	9787f7da18	drm/amdgpu: apply state adjust rules to some additional HAINAN vairants They need a similar workaround. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/1839 Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `0de31d92a1`) Cc: stable@vger.kernel.org	2026-03-17 18:04:03 -04:00
Alex Deucher	e9f58ff991	drm/amdgpu: rework how we handle TLB fences Add a new VM flag to indicate whether or not we need a TLB fence. Userqs (KFD or KGD) require a TLB fence. A TLB fence is not strictly required for kernel queues, but it shouldn't hurt. That said, enabling this unconditionally should be fine, but it seems to tickle some issues in KIQ/MES. Only enable them for KFD, or when KGD userq queues are enabled (currently via module parameter). Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4798 Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4749 Fixes: `f3854e04b7` ("drm/amdgpu: attach tlb fence to the PTs update") Cc: Christian König <christian.koenig@amd.com> Cc: Prike Liang <Prike.Liang@amd.com> Reviewed-by: Prike Liang <Prike.Liang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `69c5fbd2b9`) Cc: stable@vger.kernel.org	2026-03-17 18:03:09 -04:00
Pratap Nirujogi	3fc4648b53	drm/amdgpu: Fix ISP segfault issue in kernel v7.0 Add NULL pointer checks for dev->type before accessing dev->type->name in ISP genpd add/remove functions to prevent kernel crashes. This regression was introduced in v7.0 as the wakeup sources are registered using physical device instead of ACPI device. This led to adding wakeup source device as the first child of AMDGPU device without initializing dev-type variable, and resulted in segfault when accessed it in the amdgpu isp driver. Fixes: `057edc58aa` ("ACPI: PM: Register wakeup sources under physical devices") Suggested-by: Bin Du <Bin.Du@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Pratap Nirujogi <pratap.nirujogi@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `c51632d1ed`)	2026-03-17 12:19:29 -04:00
Alex Deucher	f39e127027	drm/amdgpu/gmc9.0: add bounds checking for cid The value should never exceed the array size as those are the only values the hardware is expected to return, but add checks anyway. Cc: Benjamin Cheng <benjamin.cheng@amd.com> Reviewed-by: Benjamin Cheng <benjamin.cheng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `e14d468304`) Cc: stable@vger.kernel.org	2026-03-17 12:19:23 -04:00
Alex Deucher	9c52f49545	drm/amdgpu/mmhub4.2.0: add bounds checking for cid The value should never exceed the array size as those are the only values the hardware is expected to return, but add checks anyway. Reviewed-by: Benjamin Cheng <benjamin.cheng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `dea5f235ba`) Cc: stable@vger.kernel.org	2026-03-17 12:19:17 -04:00
Alex Deucher	3cdd405831	drm/amdgpu/mmhub4.1.0: add bounds checking for cid The value should never exceed the array size as those are the only values the hardware is expected to return, but add checks anyway. Reviewed-by: Benjamin Cheng <benjamin.cheng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `04f063d850`) Cc: stable@vger.kernel.org	2026-03-17 12:19:11 -04:00
Alex Deucher	cdb82ecbec	drm/amdgpu/mmhub3.0: add bounds checking for cid The value should never exceed the array size as those are the only values the hardware is expected to return, but add checks anyway. Reviewed-by: Benjamin Cheng <benjamin.cheng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `f14f27bbe2`) Cc: stable@vger.kernel.org	2026-03-17 12:19:04 -04:00
Alex Deucher	e5e6d67b1c	drm/amdgpu/mmhub3.0.2: add bounds checking for cid The value should never exceed the array size as those are the only values the hardware is expected to return, but add checks anyway. Reviewed-by: Benjamin Cheng <benjamin.cheng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `1441f52c7f`) Cc: stable@vger.kernel.org	2026-03-17 12:18:58 -04:00
Alex Deucher	5d4e88bcfe	drm/amdgpu/mmhub3.0.1: add bounds checking for cid The value should never exceed the array size as those are the only values the hardware is expected to return, but add checks anyway. Reviewed-by: Benjamin Cheng <benjamin.cheng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `5f76083183`) Cc: stable@vger.kernel.org	2026-03-17 12:18:52 -04:00
Alex Deucher	a54403a534	drm/amdgpu/mmhub2.3: add bounds checking for cid The value should never exceed the array size as those are the only values the hardware is expected to return, but add checks anyway. Reviewed-by: Benjamin Cheng <benjamin.cheng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `89cd90375c`) Cc: stable@vger.kernel.org	2026-03-17 12:18:46 -04:00
Alex Deucher	0b26edac4a	drm/amdgpu/mmhub2.0: add bounds checking for cid The value should never exceed the array size as those are the only values the hardware is expected to return, but add checks anyway. Reviewed-by: Benjamin Cheng <benjamin.cheng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `e064cef4b5`) Cc: stable@vger.kernel.org	2026-03-17 12:18:34 -04:00
Andy Nguyen	39f44f54af	drm/amd: fix dcn 2.01 check The ASICREV_IS_BEIGE_GOBY_P check always took precedence, because it includes all chip revisions upto NV_UNKNOWN. Fixes: `54b822b3ea` ("drm/amd/display: Use dce_version instead of chip_id") Signed-off-by: Andy Nguyen <theofficialflow1996@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `9c7be0efa6`)	2026-03-17 12:15:57 -04:00
Srinivasan Shanmugam	2323b01965	drm/amd/display: Fix DisplayID not-found handling in parse_edid_displayid_vrr() parse_edid_displayid_vrr() searches the EDID extension blocks for a DisplayID extension before parsing the dynamic video timing range. The code previously checked whether edid_ext was NULL after the search loop. However, edid_ext is assigned during each iteration of the loop, so it will never be NULL once the loop has executed. If no DisplayID extension is found, edid_ext ends up pointing to the last extension block, and the NULL check does not correctly detect the failure case. Instead, check whether the loop completed without finding a matching DisplayID block by testing "i == edid->extensions". This ensures the function exits early when no DisplayID extension is present and avoids parsing an unrelated EDID extension block. Also simplify the EDID validation check using "!edid \|\| !edid->extensions". Fixes the below: drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:13079 parse_edid_displayid_vrr() warn: variable dereferenced before check 'edid_ext' (see line 13075) Fixes: `a638b837d0` ("drm/amd/display: Fix refresh rate range for some panel") Cc: Roman Li <roman.li@amd.com> Cc: Alex Hung <alex.hung@amd.com> Cc: Jerry Zuo <jerry.zuo@amd.com> Cc: Sun peng Li <sunpeng.li@amd.com> Cc: Tom Chung <chiahsuan.chung@amd.com> Cc: Dan Carpenter <dan.carpenter@linaro.org> Cc: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Tom Chung <chiahsuan.chung@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `91c7e6342e`)	2026-03-17 12:15:49 -04:00
Xi Ruoyao	ebe82c6e75	drm/amd/display: Wrap dcn32_override_min_req_memclk() in DC_FP_{START, END} [Why] The dcn32_override_min_req_memclk function is in dcn32_fpu.c, which is compiled with CC_FLAGS_FPU into FP instructions. So when we call it we must use DC_FP_{START,END} to save and restore the FP context, and prepare the FP unit on architectures like LoongArch where the FP unit isn't always on. Reported-by: LiarOnce <liaronce@hotmail.com> Fixes: `ee7be8f3de` ("drm/amd/display: Limit DCN32 8 channel or less parts to DPM1 for FPO") Signed-off-by: Xi Ruoyao <xry111@xry111.site> Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `25bb1d54ba`) Cc: stable@vger.kernel.org	2026-03-17 12:12:11 -04:00
Calvin Owens	1071815989	drm/amd/display: Fix uninitialized variable use which breaks full LTO Commit `e1b385726f` ("drm/amd/display: Add additional checks for PSP footer size") introduced a use of an uninitialized stack variable in dm_dmub_sw_init() (region_params.bss_data_size). Interestingly, this seems to cause no issue on normal kernels. But when full LTO is enabled, it causes the compiler to "optimize" out huge swaths of amdgpu initialization code, and the driver is unusable: amdgpu 0000:03:00.0: [drm] Loading DMUB firmware via PSP: version=0x07002F00 amdgpu 0000:03:00.0: sw_init of IP block <dm> failed 5 amdgpu 0000:03:00.0: amdgpu_device_ip_init failed amdgpu 0000:03:00.0: Fatal error during GPU init It surprises me that neither gcc nor clang emit a warning about this: I only found it by bisecting the LTO breakage. Fix by using the bss_data_size field from fw_meta_info_params, as was presumably intended. Fixes: `e1b385726f` ("drm/amd/display: Add additional checks for PSP footer size") Signed-off-by: Calvin Owens <calvin@wbinvd.org> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `b7f1402f6a`)	2026-03-17 12:11:49 -04:00
Jesse.Zhang	6270b1a5da	drm/amdgpu: Limit BO list entry count to prevent resource exhaustion Userspace can pass an arbitrary number of BO list entries via the bo_number field. Although the previous multiplication overflow check prevents out-of-bounds allocation, a large number of entries could still cause excessive memory allocation (up to potentially gigabytes) and unnecessarily long list processing times. Introduce a hard limit of 128k entries per BO list, which is more than sufficient for any realistic use case (e.g., a single list containing all buffers in a large scene). This prevents memory exhaustion attacks and ensures predictable performance. Return -EINVAL if the requested entry count exceeds the limit Reviewed-by: Christian König <christian.koenig@amd.com> Suggested-by: Christian König <christian.koenig@amd.com> Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `688b87d39e`) Cc: stable@vger.kernel.org	2026-03-17 12:10:16 -04:00
Alex Hung	b49814033c	drm/amd/display: Fix gamma 2.2 colorop TFs Use GAMMA22 for degamma/blend and GAMMA22_INV for shaper so curves match the color pipeline. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/5016 Tested-by: Xaver Hugl <xaver.hugl@kde.org> Reviewed-by: Melissa Wen <mwen@igalia.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `d8f9f42eff`)	2026-03-17 12:08:46 -04:00
Mario Limonciello	3646ff2878	drm/amd: Set num IP blocks to 0 if discovery fails If discovery has failed for any reason (such as no support for a block) then there is no need to unwind all the IP blocks in fini. In this condition there can actually be failures during the unwind too. Reset num_ip_blocks to zero during failure path and skip the unnecessary cleanup path. Suggested-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `fae5984296`) Cc: stable@vger.kernel.org	2026-03-11 14:04:08 -04:00
Philip Yang	2ce75a0b7e	drm/amdkfd: Unreserve bo if queue update failed Error handling path should unreserve bo then return failed. Fixes: `305cd109b7` ("drm/amdkfd: Validate user queue update") Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Alex Sierra <alex.sierra@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `c24afed7de`)	2026-03-11 14:02:45 -04:00
Ivan Lipski	becbab4a5a	drm/amd/display: Check for S0i3 to be done before DCCG init on DCN21 [WHY] On DCN21, dccg2_init() is called in dcn10_init_hw() before bios_golden_init(). During S0i3 resume, BIOS sets MICROSECOND_TIME_BASE_DIV to 0x00120464 as a marker. dccg2_init() overwrites this to 0x00120264, causing dcn21_s0i3_golden_init_wa() to misdetect the state and skip golden init. Eventually during the resume sequence, a flip timeout occurs. [HOW] Skip DCCG on dccg2_is_s0i3_golden_init_wa_done() on DCN21. Fixes: `4c595e7511` ("drm/amd/display: Migrate DCCG registers access from hwseq to dccg component.") Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Ivan Lipski <ivan.lipski@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `c61eda4343`)	2026-03-11 14:01:39 -04:00
Ivan Lipski	33efc6346e	drm/amd/display: Add missing DCCG register entries for DCN20-DCN316 Commit `4c595e7511` ("drm/amd/display: Migrate DCCG registers access from hwseq to dccg component.") moved register writes from hwseq to dccg2_*() functions but did not add the registers to the DCCG register list macros. The struct fields default to 0, so REG_WRITE() targets MMIO offset 0, causing a GPU hang on resume (seen on DCN21/DCN30 during IGT kms_cursor_crc@cursor-suspend). Add - MICROSECOND_TIME_BASE_DIV - MILLISECOND_TIME_BASE_DIV - DCCG_GATE_DISABLE_CNTL - DCCG_GATE_DISABLE_CNTL2 - DC_MEM_GLOBAL_PWR_REQ_CNTL to macros in dcn20_dccg.h, dcn301_dccg.h, dcn31_dccg.h, and dcn314_dccg.h. Fixes: `4c595e7511` ("drm/amd/display: Migrate DCCG registers access from hwseq to dccg component.") Reported-by: Rafael Passos <rafael@rcpassos.me> Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Ivan Lipski <ivan.lipski@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `e6e2b956fc`)	2026-03-11 14:01:16 -04:00
Mario Limonciello	72ecb1dae7	drm/amd: Fix a few more NULL pointer dereference in device cleanup I found a few more paths that cleanup fails due to a NULL version pointer on unsupported hardware. Add NULL checks as applicable. Fixes: `39fc2bc4da` ("drm/amdgpu: Protect GPU register accesses in powergated state in some paths") Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `f5a05f8414`) Cc: stable@vger.kernel.org	2026-03-06 17:19:14 -05:00
Yang Wang	a6571045cf	drm/amdgpu: fix gpu idle power consumption issue for gfx v12 Older versions of the MES firmware may cause abnormal GPU power consumption. When performing inference tasks on the GPU (e.g., with Ollama using ROCm), the GPU may show abnormal power consumption in idle state and incorrect GPU load information. This issue has been fixed in firmware version 0x8b and newer. Closes: https://github.com/ROCm/ROCm/issues/5706 Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `4e22a5fe6e`)	2026-03-06 17:11:46 -05:00
Cristian Ciocaltea	52289ce48e	drm/amdgpu: Fix kernel-doc comments for some LUT properties The following members of struct amdgpu_mode_info do not have valid references in the related kernel-doc sections: - plane_shaper_lut_property - plane_shaper_lut_size_property, - plane_lut3d_size_property Correct all affected comment blocks. Fixes: `f545d82479` ("drm/amd/display: add plane shaper LUT and TF driver-specific properties") Fixes: `671994e3bf` ("drm/amd/display: add plane 3D LUT driver-specific properties") Reviewed-by: Melissa Wen <mwen@igalia.com> Signed-off-by: Cristian Ciocaltea <cristian.ciocaltea@collabora.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `ec5708d6e5`)	2026-03-06 17:11:15 -05:00
Mario Limonciello	062ea905ff	drm/amd: Fix NULL pointer dereference in device cleanup When GPU initialization fails due to an unsupported HW block IP blocks may have a NULL version pointer. During cleanup in amdgpu_device_fini_hw, the code calls amdgpu_device_set_pg_state and amdgpu_device_set_cg_state which iterate over all IP blocks and access adev->ip_blocks[i].version without NULL checks, leading to a kernel NULL pointer dereference. Add NULL checks for adev->ip_blocks[i].version in both amdgpu_device_set_cg_state and amdgpu_device_set_pg_state to prevent dereferencing NULL pointers during GPU teardown when initialization has failed. Fixes: `39fc2bc4da` ("drm/amdgpu: Protect GPU register accesses in powergated state in some paths") Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `b7ac77468c`) Cc: stable@vger.kernel.org	2026-03-06 17:10:40 -05:00
Yang Wang	9d4837a261	drm/amd/pm: add missing od setting PP_OD_FEATURE_ZERO_FAN_BIT for smu v14 add missing od setting PP_OD_FEATURE_ZERO_FAN_BIT for smu v14.0.2/14.0.3 Fixes: `9710b84e2a` ("drm/amd/pm: add overdrive support on smu v14.0.2/3") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/5018 Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `1b5cf07d80`)	2026-03-06 17:10:26 -05:00
Yang Wang	cb47c882c3	drm/amd/pm: add missing od setting PP_OD_FEATURE_ZERO_FAN_BIT for smu v13 add missing od setting PP_OD_FEATURE_ZERO_FAN_BIT for smu v13.0.0/13.0.7 Fixes: `cfffd980bf` ("drm/amd/pm: add zero RPM OD setting support for SMU13") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/5018 Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `576a10797b`)	2026-03-06 17:10:13 -05:00
Sunil Khatri	65b5c326ce	drm/amdgpu/userq: refcount userqueues to avoid any race conditions To avoid race condition and avoid UAF cases, implement kref based queues and protect the below operations using xa lock a. Getting a queue from xarray b. Increment/Decrement it's refcount Every time some one want to access a queue, always get via amdgpu_userq_get to make sure we have locks in place and get the object if active. A userqueue is destroyed on the last refcount is dropped which typically would be via IOCTL or during fini. v2: Add the missing drop in one the condition in the signal ioclt [Alex] v3: remove the queue from the xarray first in the free queue ioctl path [Christian] - Pass queue to the amdgpu_userq_put directly. - make amdgpu_userq_put xa_lock free since we are doing put for each get only and final put is done via destroy and we remove the queue from xa with lock. - use userq_put in fini too so cleanup is done fully. v4: Use xa_erase directly rather than doing load and erase in free ioctl. Also remove some of the error logs which could be exploited by the user to flood the logs [Christian] Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `4952189b28`) Cc: <stable@vger.kernel.org> # `048c1c4e51`: drm/amdgpu/userq: Consolidate wait ioctl exit path Cc: <stable@vger.kernel.org>	2026-03-04 13:15:00 -05:00
Tvrtko Ursulin	048c1c4e51	drm/amdgpu/userq: Consolidate wait ioctl exit path If we gate the fence destruction with a check telling us whether there are valid pointers in there we can eliminate the need for dual, basically identical, exit paths. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `bea29bb0dd`)	2026-03-04 13:15:00 -05:00
sguttula	a145bbff6f	drm/amdgpu/psp: Use Indirect access address for GFX to PSP mailbox The reason the RAP is not granting access to 0x58200 is that a dedicated RSMU slot would have to be spent for this address range, and MPASP is close to running out of RSMU slots. This will help to fix PSP TOC load failure during secureboot. GFX Driver Need to use indirect access for SMN address regs. Signed-off-by: sguttula <suresh.guttula@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `9b822e26ee`)	2026-03-04 13:15:00 -05:00
Alysa Liu	2c1030f2e8	drm/amdgpu: Fix use-after-free race in VM acquire Replace non-atomic vm->process_info assignment with cmpxchg() to prevent race when parent/child processes sharing a drm_file both try to acquire the same VM after fork(). Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Signed-off-by: Alysa Liu <Alysa.Liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `c7c573275e`) Cc: stable@vger.kernel.org	2026-03-04 13:15:00 -05:00
Yang Wang	68785c5e79	drm/amd/pm: remove invalid gpu_metrics.energy_accumulator on smu v13.0.x v1: The metrics->EnergyAccumulator field has been deprecated on newer pmfw. v2: add smu 13.0.0/13.0.7/13.0.10 support. Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `8de9edb359`) Cc: stable@vger.kernel.org	2026-03-04 13:14:59 -05:00
Dillon Varone	30d937f63b	drm/amd/display: Fallback to boot snapshot for dispclk [WHY & HOW] If the dentist is unavailable, fallback to reading CLKIP via the boot snapshot to get the current dispclk. Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: Dillon Varone <Dillon.Varone@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Cc: Mario Limonciello <mario.limonciello@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `2ab77600d1`) Cc: stable@vger.kernel.org	2026-03-02 17:13:52 -05:00
sguttula	389c2024ca	drm/amdgpu: Enable DPG support for VCN5 This will set DPG flags for enabling power gating on GFX11_5_4 Signed-off-by: sguttula <suresh.guttula@amd.com> Reviewed-by: Pratik Vishwakarma <Pratik.Vishwakarma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `a503c266d7`)	2026-03-02 17:13:29 -05:00
Alex Hung	a4fa2355e0	drm/amd/display: Enable DEGAMMA and reject COLOR_PIPELINE+DEGAMMA_LUT [WHAT] Create DEGAMMA properties even if color pipeline is enabled, and enforce the mutual exclusion in atomic check by rejecting any commit that attempts to enable both COLOR_PIPELINE on the plane and DEGAMMA_LUT on the CRTC simultaneously. Fixes: `18a4127e93` ("drm/amd/display: Disable CRTC degamma when color pipeline is enabled") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4963 Reviewed-by: Melissa Wen <mwen@igalia.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `196a6aa727`)	2026-03-02 17:13:13 -05:00
Alex Hung	c28b3ec3ca	drm/amd/display: Use mpc.preblend flag to indicate 3D LUT [WHAT] New ASIC's 3D LUT is indicated by mpc.preblend. Fixes: `0de2b1afea` ("drm/amd/display: add 3D LUT colorop") Reviewed-by: Melissa Wen <mwen@igalia.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `43175f6164`)	2026-03-02 17:12:58 -05:00
Mario Limonciello	6b0d812971	drm/amd: Disable MES LR compute W/A A workaround was introduced in commit `1fb710793c` ("drm/amdgpu: Enable MES lr_compute_wa by default") to help with some hangs observed in gfx1151. This WA didn't fully fix the issue. It was actually fixed by adjusting the VGPR size to the correct value that matched the hardware in commit `b42f3bf953` ("drm/amdkfd: bump minimum vgpr size for gfx1151"). There are reports of instability on other products with newer GC microcode versions, and I believe they're caused by this workaround. As we don't need the workaround any more, remove it. Fixes: `b42f3bf953` ("drm/amdkfd: bump minimum vgpr size for gfx1151") Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `9973e64bd6`) Cc: stable@vger.kernel.org	2026-02-25 17:58:06 -05:00

1 2 3 4 5 ...

36005 Commits