linux

mirror of https://github.com/torvalds/linux.git synced 2026-04-19 15:24:02 -04:00

Author	SHA1	Message	Date
Alex Deucher	9787f7da18	drm/amdgpu: apply state adjust rules to some additional HAINAN vairants They need a similar workaround. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/1839 Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `0de31d92a1`) Cc: stable@vger.kernel.org	2026-03-17 18:04:03 -04:00
Alex Deucher	e9f58ff991	drm/amdgpu: rework how we handle TLB fences Add a new VM flag to indicate whether or not we need a TLB fence. Userqs (KFD or KGD) require a TLB fence. A TLB fence is not strictly required for kernel queues, but it shouldn't hurt. That said, enabling this unconditionally should be fine, but it seems to tickle some issues in KIQ/MES. Only enable them for KFD, or when KGD userq queues are enabled (currently via module parameter). Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4798 Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4749 Fixes: `f3854e04b7` ("drm/amdgpu: attach tlb fence to the PTs update") Cc: Christian König <christian.koenig@amd.com> Cc: Prike Liang <Prike.Liang@amd.com> Reviewed-by: Prike Liang <Prike.Liang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `69c5fbd2b9`) Cc: stable@vger.kernel.org	2026-03-17 18:03:09 -04:00
Pratap Nirujogi	3fc4648b53	drm/amdgpu: Fix ISP segfault issue in kernel v7.0 Add NULL pointer checks for dev->type before accessing dev->type->name in ISP genpd add/remove functions to prevent kernel crashes. This regression was introduced in v7.0 as the wakeup sources are registered using physical device instead of ACPI device. This led to adding wakeup source device as the first child of AMDGPU device without initializing dev-type variable, and resulted in segfault when accessed it in the amdgpu isp driver. Fixes: `057edc58aa` ("ACPI: PM: Register wakeup sources under physical devices") Suggested-by: Bin Du <Bin.Du@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Pratap Nirujogi <pratap.nirujogi@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `c51632d1ed`)	2026-03-17 12:19:29 -04:00
Alex Deucher	f39e127027	drm/amdgpu/gmc9.0: add bounds checking for cid The value should never exceed the array size as those are the only values the hardware is expected to return, but add checks anyway. Cc: Benjamin Cheng <benjamin.cheng@amd.com> Reviewed-by: Benjamin Cheng <benjamin.cheng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `e14d468304`) Cc: stable@vger.kernel.org	2026-03-17 12:19:23 -04:00
Alex Deucher	9c52f49545	drm/amdgpu/mmhub4.2.0: add bounds checking for cid The value should never exceed the array size as those are the only values the hardware is expected to return, but add checks anyway. Reviewed-by: Benjamin Cheng <benjamin.cheng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `dea5f235ba`) Cc: stable@vger.kernel.org	2026-03-17 12:19:17 -04:00
Alex Deucher	3cdd405831	drm/amdgpu/mmhub4.1.0: add bounds checking for cid The value should never exceed the array size as those are the only values the hardware is expected to return, but add checks anyway. Reviewed-by: Benjamin Cheng <benjamin.cheng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `04f063d850`) Cc: stable@vger.kernel.org	2026-03-17 12:19:11 -04:00
Alex Deucher	cdb82ecbec	drm/amdgpu/mmhub3.0: add bounds checking for cid The value should never exceed the array size as those are the only values the hardware is expected to return, but add checks anyway. Reviewed-by: Benjamin Cheng <benjamin.cheng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `f14f27bbe2`) Cc: stable@vger.kernel.org	2026-03-17 12:19:04 -04:00
Alex Deucher	e5e6d67b1c	drm/amdgpu/mmhub3.0.2: add bounds checking for cid The value should never exceed the array size as those are the only values the hardware is expected to return, but add checks anyway. Reviewed-by: Benjamin Cheng <benjamin.cheng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `1441f52c7f`) Cc: stable@vger.kernel.org	2026-03-17 12:18:58 -04:00
Alex Deucher	5d4e88bcfe	drm/amdgpu/mmhub3.0.1: add bounds checking for cid The value should never exceed the array size as those are the only values the hardware is expected to return, but add checks anyway. Reviewed-by: Benjamin Cheng <benjamin.cheng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `5f76083183`) Cc: stable@vger.kernel.org	2026-03-17 12:18:52 -04:00
Alex Deucher	a54403a534	drm/amdgpu/mmhub2.3: add bounds checking for cid The value should never exceed the array size as those are the only values the hardware is expected to return, but add checks anyway. Reviewed-by: Benjamin Cheng <benjamin.cheng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `89cd90375c`) Cc: stable@vger.kernel.org	2026-03-17 12:18:46 -04:00
Alex Deucher	0b26edac4a	drm/amdgpu/mmhub2.0: add bounds checking for cid The value should never exceed the array size as those are the only values the hardware is expected to return, but add checks anyway. Reviewed-by: Benjamin Cheng <benjamin.cheng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `e064cef4b5`) Cc: stable@vger.kernel.org	2026-03-17 12:18:34 -04:00
Andy Nguyen	39f44f54af	drm/amd: fix dcn 2.01 check The ASICREV_IS_BEIGE_GOBY_P check always took precedence, because it includes all chip revisions upto NV_UNKNOWN. Fixes: `54b822b3ea` ("drm/amd/display: Use dce_version instead of chip_id") Signed-off-by: Andy Nguyen <theofficialflow1996@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `9c7be0efa6`)	2026-03-17 12:15:57 -04:00
Srinivasan Shanmugam	2323b01965	drm/amd/display: Fix DisplayID not-found handling in parse_edid_displayid_vrr() parse_edid_displayid_vrr() searches the EDID extension blocks for a DisplayID extension before parsing the dynamic video timing range. The code previously checked whether edid_ext was NULL after the search loop. However, edid_ext is assigned during each iteration of the loop, so it will never be NULL once the loop has executed. If no DisplayID extension is found, edid_ext ends up pointing to the last extension block, and the NULL check does not correctly detect the failure case. Instead, check whether the loop completed without finding a matching DisplayID block by testing "i == edid->extensions". This ensures the function exits early when no DisplayID extension is present and avoids parsing an unrelated EDID extension block. Also simplify the EDID validation check using "!edid \|\| !edid->extensions". Fixes the below: drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:13079 parse_edid_displayid_vrr() warn: variable dereferenced before check 'edid_ext' (see line 13075) Fixes: `a638b837d0` ("drm/amd/display: Fix refresh rate range for some panel") Cc: Roman Li <roman.li@amd.com> Cc: Alex Hung <alex.hung@amd.com> Cc: Jerry Zuo <jerry.zuo@amd.com> Cc: Sun peng Li <sunpeng.li@amd.com> Cc: Tom Chung <chiahsuan.chung@amd.com> Cc: Dan Carpenter <dan.carpenter@linaro.org> Cc: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Tom Chung <chiahsuan.chung@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `91c7e6342e`)	2026-03-17 12:15:49 -04:00
Xi Ruoyao	ebe82c6e75	drm/amd/display: Wrap dcn32_override_min_req_memclk() in DC_FP_{START, END} [Why] The dcn32_override_min_req_memclk function is in dcn32_fpu.c, which is compiled with CC_FLAGS_FPU into FP instructions. So when we call it we must use DC_FP_{START,END} to save and restore the FP context, and prepare the FP unit on architectures like LoongArch where the FP unit isn't always on. Reported-by: LiarOnce <liaronce@hotmail.com> Fixes: `ee7be8f3de` ("drm/amd/display: Limit DCN32 8 channel or less parts to DPM1 for FPO") Signed-off-by: Xi Ruoyao <xry111@xry111.site> Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `25bb1d54ba`) Cc: stable@vger.kernel.org	2026-03-17 12:12:11 -04:00
Calvin Owens	1071815989	drm/amd/display: Fix uninitialized variable use which breaks full LTO Commit `e1b385726f` ("drm/amd/display: Add additional checks for PSP footer size") introduced a use of an uninitialized stack variable in dm_dmub_sw_init() (region_params.bss_data_size). Interestingly, this seems to cause no issue on normal kernels. But when full LTO is enabled, it causes the compiler to "optimize" out huge swaths of amdgpu initialization code, and the driver is unusable: amdgpu 0000:03:00.0: [drm] Loading DMUB firmware via PSP: version=0x07002F00 amdgpu 0000:03:00.0: sw_init of IP block <dm> failed 5 amdgpu 0000:03:00.0: amdgpu_device_ip_init failed amdgpu 0000:03:00.0: Fatal error during GPU init It surprises me that neither gcc nor clang emit a warning about this: I only found it by bisecting the LTO breakage. Fix by using the bss_data_size field from fw_meta_info_params, as was presumably intended. Fixes: `e1b385726f` ("drm/amd/display: Add additional checks for PSP footer size") Signed-off-by: Calvin Owens <calvin@wbinvd.org> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `b7f1402f6a`)	2026-03-17 12:11:49 -04:00
Jesse.Zhang	6270b1a5da	drm/amdgpu: Limit BO list entry count to prevent resource exhaustion Userspace can pass an arbitrary number of BO list entries via the bo_number field. Although the previous multiplication overflow check prevents out-of-bounds allocation, a large number of entries could still cause excessive memory allocation (up to potentially gigabytes) and unnecessarily long list processing times. Introduce a hard limit of 128k entries per BO list, which is more than sufficient for any realistic use case (e.g., a single list containing all buffers in a large scene). This prevents memory exhaustion attacks and ensures predictable performance. Return -EINVAL if the requested entry count exceeds the limit Reviewed-by: Christian König <christian.koenig@amd.com> Suggested-by: Christian König <christian.koenig@amd.com> Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `688b87d39e`) Cc: stable@vger.kernel.org	2026-03-17 12:10:16 -04:00
Alex Hung	b49814033c	drm/amd/display: Fix gamma 2.2 colorop TFs Use GAMMA22 for degamma/blend and GAMMA22_INV for shaper so curves match the color pipeline. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/5016 Tested-by: Xaver Hugl <xaver.hugl@kde.org> Reviewed-by: Melissa Wen <mwen@igalia.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `d8f9f42eff`)	2026-03-17 12:08:46 -04:00
Mario Limonciello	3646ff2878	drm/amd: Set num IP blocks to 0 if discovery fails If discovery has failed for any reason (such as no support for a block) then there is no need to unwind all the IP blocks in fini. In this condition there can actually be failures during the unwind too. Reset num_ip_blocks to zero during failure path and skip the unnecessary cleanup path. Suggested-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `fae5984296`) Cc: stable@vger.kernel.org	2026-03-11 14:04:08 -04:00
Philip Yang	2ce75a0b7e	drm/amdkfd: Unreserve bo if queue update failed Error handling path should unreserve bo then return failed. Fixes: `305cd109b7` ("drm/amdkfd: Validate user queue update") Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Alex Sierra <alex.sierra@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `c24afed7de`)	2026-03-11 14:02:45 -04:00
Ivan Lipski	becbab4a5a	drm/amd/display: Check for S0i3 to be done before DCCG init on DCN21 [WHY] On DCN21, dccg2_init() is called in dcn10_init_hw() before bios_golden_init(). During S0i3 resume, BIOS sets MICROSECOND_TIME_BASE_DIV to 0x00120464 as a marker. dccg2_init() overwrites this to 0x00120264, causing dcn21_s0i3_golden_init_wa() to misdetect the state and skip golden init. Eventually during the resume sequence, a flip timeout occurs. [HOW] Skip DCCG on dccg2_is_s0i3_golden_init_wa_done() on DCN21. Fixes: `4c595e7511` ("drm/amd/display: Migrate DCCG registers access from hwseq to dccg component.") Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Ivan Lipski <ivan.lipski@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `c61eda4343`)	2026-03-11 14:01:39 -04:00
Ivan Lipski	33efc6346e	drm/amd/display: Add missing DCCG register entries for DCN20-DCN316 Commit `4c595e7511` ("drm/amd/display: Migrate DCCG registers access from hwseq to dccg component.") moved register writes from hwseq to dccg2_*() functions but did not add the registers to the DCCG register list macros. The struct fields default to 0, so REG_WRITE() targets MMIO offset 0, causing a GPU hang on resume (seen on DCN21/DCN30 during IGT kms_cursor_crc@cursor-suspend). Add - MICROSECOND_TIME_BASE_DIV - MILLISECOND_TIME_BASE_DIV - DCCG_GATE_DISABLE_CNTL - DCCG_GATE_DISABLE_CNTL2 - DC_MEM_GLOBAL_PWR_REQ_CNTL to macros in dcn20_dccg.h, dcn301_dccg.h, dcn31_dccg.h, and dcn314_dccg.h. Fixes: `4c595e7511` ("drm/amd/display: Migrate DCCG registers access from hwseq to dccg component.") Reported-by: Rafael Passos <rafael@rcpassos.me> Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Ivan Lipski <ivan.lipski@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `e6e2b956fc`)	2026-03-11 14:01:16 -04:00
Mario Limonciello	72ecb1dae7	drm/amd: Fix a few more NULL pointer dereference in device cleanup I found a few more paths that cleanup fails due to a NULL version pointer on unsupported hardware. Add NULL checks as applicable. Fixes: `39fc2bc4da` ("drm/amdgpu: Protect GPU register accesses in powergated state in some paths") Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `f5a05f8414`) Cc: stable@vger.kernel.org	2026-03-06 17:19:14 -05:00
Yang Wang	a6571045cf	drm/amdgpu: fix gpu idle power consumption issue for gfx v12 Older versions of the MES firmware may cause abnormal GPU power consumption. When performing inference tasks on the GPU (e.g., with Ollama using ROCm), the GPU may show abnormal power consumption in idle state and incorrect GPU load information. This issue has been fixed in firmware version 0x8b and newer. Closes: https://github.com/ROCm/ROCm/issues/5706 Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `4e22a5fe6e`)	2026-03-06 17:11:46 -05:00
Cristian Ciocaltea	52289ce48e	drm/amdgpu: Fix kernel-doc comments for some LUT properties The following members of struct amdgpu_mode_info do not have valid references in the related kernel-doc sections: - plane_shaper_lut_property - plane_shaper_lut_size_property, - plane_lut3d_size_property Correct all affected comment blocks. Fixes: `f545d82479` ("drm/amd/display: add plane shaper LUT and TF driver-specific properties") Fixes: `671994e3bf` ("drm/amd/display: add plane 3D LUT driver-specific properties") Reviewed-by: Melissa Wen <mwen@igalia.com> Signed-off-by: Cristian Ciocaltea <cristian.ciocaltea@collabora.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `ec5708d6e5`)	2026-03-06 17:11:15 -05:00
Mario Limonciello	062ea905ff	drm/amd: Fix NULL pointer dereference in device cleanup When GPU initialization fails due to an unsupported HW block IP blocks may have a NULL version pointer. During cleanup in amdgpu_device_fini_hw, the code calls amdgpu_device_set_pg_state and amdgpu_device_set_cg_state which iterate over all IP blocks and access adev->ip_blocks[i].version without NULL checks, leading to a kernel NULL pointer dereference. Add NULL checks for adev->ip_blocks[i].version in both amdgpu_device_set_cg_state and amdgpu_device_set_pg_state to prevent dereferencing NULL pointers during GPU teardown when initialization has failed. Fixes: `39fc2bc4da` ("drm/amdgpu: Protect GPU register accesses in powergated state in some paths") Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `b7ac77468c`) Cc: stable@vger.kernel.org	2026-03-06 17:10:40 -05:00
Yang Wang	9d4837a261	drm/amd/pm: add missing od setting PP_OD_FEATURE_ZERO_FAN_BIT for smu v14 add missing od setting PP_OD_FEATURE_ZERO_FAN_BIT for smu v14.0.2/14.0.3 Fixes: `9710b84e2a` ("drm/amd/pm: add overdrive support on smu v14.0.2/3") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/5018 Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `1b5cf07d80`)	2026-03-06 17:10:26 -05:00
Yang Wang	cb47c882c3	drm/amd/pm: add missing od setting PP_OD_FEATURE_ZERO_FAN_BIT for smu v13 add missing od setting PP_OD_FEATURE_ZERO_FAN_BIT for smu v13.0.0/13.0.7 Fixes: `cfffd980bf` ("drm/amd/pm: add zero RPM OD setting support for SMU13") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/5018 Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `576a10797b`)	2026-03-06 17:10:13 -05:00
Sunil Khatri	65b5c326ce	drm/amdgpu/userq: refcount userqueues to avoid any race conditions To avoid race condition and avoid UAF cases, implement kref based queues and protect the below operations using xa lock a. Getting a queue from xarray b. Increment/Decrement it's refcount Every time some one want to access a queue, always get via amdgpu_userq_get to make sure we have locks in place and get the object if active. A userqueue is destroyed on the last refcount is dropped which typically would be via IOCTL or during fini. v2: Add the missing drop in one the condition in the signal ioclt [Alex] v3: remove the queue from the xarray first in the free queue ioctl path [Christian] - Pass queue to the amdgpu_userq_put directly. - make amdgpu_userq_put xa_lock free since we are doing put for each get only and final put is done via destroy and we remove the queue from xa with lock. - use userq_put in fini too so cleanup is done fully. v4: Use xa_erase directly rather than doing load and erase in free ioctl. Also remove some of the error logs which could be exploited by the user to flood the logs [Christian] Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `4952189b28`) Cc: <stable@vger.kernel.org> # `048c1c4e51`: drm/amdgpu/userq: Consolidate wait ioctl exit path Cc: <stable@vger.kernel.org>	2026-03-04 13:15:00 -05:00
Tvrtko Ursulin	048c1c4e51	drm/amdgpu/userq: Consolidate wait ioctl exit path If we gate the fence destruction with a check telling us whether there are valid pointers in there we can eliminate the need for dual, basically identical, exit paths. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `bea29bb0dd`)	2026-03-04 13:15:00 -05:00
sguttula	a145bbff6f	drm/amdgpu/psp: Use Indirect access address for GFX to PSP mailbox The reason the RAP is not granting access to 0x58200 is that a dedicated RSMU slot would have to be spent for this address range, and MPASP is close to running out of RSMU slots. This will help to fix PSP TOC load failure during secureboot. GFX Driver Need to use indirect access for SMN address regs. Signed-off-by: sguttula <suresh.guttula@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `9b822e26ee`)	2026-03-04 13:15:00 -05:00
Alysa Liu	2c1030f2e8	drm/amdgpu: Fix use-after-free race in VM acquire Replace non-atomic vm->process_info assignment with cmpxchg() to prevent race when parent/child processes sharing a drm_file both try to acquire the same VM after fork(). Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Signed-off-by: Alysa Liu <Alysa.Liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `c7c573275e`) Cc: stable@vger.kernel.org	2026-03-04 13:15:00 -05:00
Yang Wang	68785c5e79	drm/amd/pm: remove invalid gpu_metrics.energy_accumulator on smu v13.0.x v1: The metrics->EnergyAccumulator field has been deprecated on newer pmfw. v2: add smu 13.0.0/13.0.7/13.0.10 support. Signed-off-by: Yang Wang <kevinyang.wang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `8de9edb359`) Cc: stable@vger.kernel.org	2026-03-04 13:14:59 -05:00
Dillon Varone	30d937f63b	drm/amd/display: Fallback to boot snapshot for dispclk [WHY & HOW] If the dentist is unavailable, fallback to reading CLKIP via the boot snapshot to get the current dispclk. Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: Dillon Varone <Dillon.Varone@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Cc: Mario Limonciello <mario.limonciello@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `2ab77600d1`) Cc: stable@vger.kernel.org	2026-03-02 17:13:52 -05:00
sguttula	389c2024ca	drm/amdgpu: Enable DPG support for VCN5 This will set DPG flags for enabling power gating on GFX11_5_4 Signed-off-by: sguttula <suresh.guttula@amd.com> Reviewed-by: Pratik Vishwakarma <Pratik.Vishwakarma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `a503c266d7`)	2026-03-02 17:13:29 -05:00
Alex Hung	a4fa2355e0	drm/amd/display: Enable DEGAMMA and reject COLOR_PIPELINE+DEGAMMA_LUT [WHAT] Create DEGAMMA properties even if color pipeline is enabled, and enforce the mutual exclusion in atomic check by rejecting any commit that attempts to enable both COLOR_PIPELINE on the plane and DEGAMMA_LUT on the CRTC simultaneously. Fixes: `18a4127e93` ("drm/amd/display: Disable CRTC degamma when color pipeline is enabled") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4963 Reviewed-by: Melissa Wen <mwen@igalia.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `196a6aa727`)	2026-03-02 17:13:13 -05:00
Alex Hung	c28b3ec3ca	drm/amd/display: Use mpc.preblend flag to indicate 3D LUT [WHAT] New ASIC's 3D LUT is indicated by mpc.preblend. Fixes: `0de2b1afea` ("drm/amd/display: add 3D LUT colorop") Reviewed-by: Melissa Wen <mwen@igalia.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `43175f6164`)	2026-03-02 17:12:58 -05:00
Mario Limonciello	6b0d812971	drm/amd: Disable MES LR compute W/A A workaround was introduced in commit `1fb710793c` ("drm/amdgpu: Enable MES lr_compute_wa by default") to help with some hangs observed in gfx1151. This WA didn't fully fix the issue. It was actually fixed by adjusting the VGPR size to the correct value that matched the hardware in commit `b42f3bf953` ("drm/amdkfd: bump minimum vgpr size for gfx1151"). There are reports of instability on other products with newer GC microcode versions, and I believe they're caused by this workaround. As we don't need the workaround any more, remove it. Fixes: `b42f3bf953` ("drm/amdkfd: bump minimum vgpr size for gfx1151") Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `9973e64bd6`) Cc: stable@vger.kernel.org	2026-02-25 17:58:06 -05:00
Lijo Lazar	b57c4ec98c	drm/amdgpu: Fix error handling in slot reset If the device has not recovered after slot reset is called, it goes to out label for error handling. There it could make decision based on uninitialized hive pointer and could result in accessing an uninitialized list. Initialize the list and hive properly so that it handles the error situation and also releases the reset domain lock which is acquired during error_detected callback. Fixes: `732c6cefc1` ("drm/amdgpu: Replace tmp_adev with hive in amdgpu_pci_slot_reset") Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Ce Sun <cesun102@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `bb71362182`)	2026-02-25 17:57:55 -05:00
sguttula	a5fe1a5451	drm/amdgpu/vcn5: Add SMU dpm interface type This will set AMDGPU_VCN_SMU_DPM_INTERFACE_* smu_type based on soc type and fixing ring timeout issue seen for DPM enabled case. Signed-off-by: sguttula <suresh.guttula@amd.com> Reviewed-by: Pratik Vishwakarma <Pratik.Vishwakarma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `f0f23c315b`)	2026-02-25 17:57:06 -05:00
Bart Van Assche	480ad5f6ea	drm/amdgpu: Fix locking bugs in error paths Do not unlock psp->ras_context.mutex if it has not been locked. This has been detected by the Clang thread-safety analyzer. Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: YiPeng Chai <YiPeng.Chai@amd.com> Cc: Hawking Zhang <Hawking.Zhang@amd.com> Cc: amd-gfx@lists.freedesktop.org Fixes: `b3fb79cda5` ("drm/amdgpu: add mutex to protect ras shared memory") Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `6fa01b4335`)	2026-02-25 17:56:50 -05:00
Bart Van Assche	5e0bcc7b88	drm/amdgpu: Unlock a mutex before destroying it Mutexes must be unlocked before these are destroyed. This has been detected by the Clang thread-safety analyzer. Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Yang Wang <kevinyang.wang@amd.com> Cc: Hawking Zhang <Hawking.Zhang@amd.com> Cc: amd-gfx@lists.freedesktop.org Fixes: `f5e4cc8461` ("drm/amdgpu: implement RAS ACA driver framework") Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `270258ba32`)	2026-02-25 17:56:43 -05:00
Natalie Vock	28dfe43175	drm/amd/display: Use GFP_ATOMIC in dc_create_stream_for_sink This can be called while preemption is disabled, for example by dcn32_internal_validate_bw which is called with the FPU active. Fixes "BUG: scheduling while atomic" messages I encounter on my Navi31 machine. Signed-off-by: Natalie Vock <natalie.vock@gmx.de> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `b42dae2ebc`) Cc: stable@vger.kernel.org	2026-02-25 17:56:22 -05:00
Sunil Khatri	64ac7c09fc	drm/amdgpu: add upper bound check on user inputs in wait ioctl Huge input values in amdgpu_userq_wait_ioctl can lead to a OOM and could be exploited. So check these input value against AMDGPU_USERQ_MAX_HANDLES which is big enough value for genuine use cases and could potentially avoid OOM. v2: squash in Srini's fix Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `fcec012c66`) Cc: stable@vger.kernel.org	2026-02-25 17:54:57 -05:00
Sunil Khatri	ea78f8c68f	drm/amdgpu: add upper bound check on user inputs in signal ioctl Huge input values in amdgpu_userq_signal_ioctl can lead to a OOM and could be exploited. So check these input value against AMDGPU_USERQ_MAX_HANDLES which is big enough value for genuine use cases and could potentially avoid OOM. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `be267e15f9`) Cc: stable@vger.kernel.org	2026-02-25 17:49:48 -05:00
Tvrtko Ursulin	7b7d7693a5	drm/amdgpu/userq: Do not allow userspace to trivially triger kernel warnings Userspace can either deliberately pass in the too small num_fences, or the required number can legitimately grow between the two calls to the userq wait ioctl. In both cases we do not want the emit the kernel warning backtrace since nothing is wrong with the kernel and userspace will simply get an errno reported back. So lets simply drop the WARN_ONs. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Fixes: `a292fdecd7` ("drm/amdgpu: Implement userqueue signal/wait IOCTL") Cc: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `2c333ea579`)	2026-02-25 17:49:28 -05:00
Tvrtko Ursulin	49abfa8126	drm/amdgpu/userq: Fix reference leak in amdgpu_userq_wait_ioctl Drop reference to syncobj and timeline fence when aborting the ioctl due output array being too small. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com> Fixes: `a292fdecd7` ("drm/amdgpu: Implement userqueue signal/wait IOCTL") Cc: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `68951e9c3e`) Cc: <stable@vger.kernel.org> # v6.16+	2026-02-25 17:49:02 -05:00
Kees Cook	189f164e57	Convert remaining multi-line kmalloc_obj/flex GFP_KERNEL uses Conversion performed via this Coccinelle script: // SPDX-License-Identifier: GPL-2.0-only // Options: --include-headers-for-types --all-includes --include-headers --keep-comments virtual patch @gfp depends on patch && !(file in "tools") && !(file in "samples")@ identifier ALLOC = {kmalloc_obj,kmalloc_objs,kmalloc_flex, kzalloc_obj,kzalloc_objs,kzalloc_flex, kvmalloc_obj,kvmalloc_objs,kvmalloc_flex, kvzalloc_obj,kvzalloc_objs,kvzalloc_flex}; @@ ALLOC(... - , GFP_KERNEL ) $ make coccicheck MODE=patch COCCI=gfp.cocci Build and boot tested x86_64 with Fedora 42's GCC and Clang: Linux version 6.19.0+ (user@host) (gcc (GCC) 15.2.1 20260123 (Red Hat 15.2.1-7), GNU ld version 2.44-12.fc42) #1 SMP PREEMPT_DYNAMIC 1970-01-01 Linux version 6.19.0+ (user@host) (clang version 20.1.8 (Fedora 20.1.8-4.fc42), LLD 20.1.8) #1 SMP PREEMPT_DYNAMIC 1970-01-01 Signed-off-by: Kees Cook <kees@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2026-02-22 08:26:33 -08:00
Linus Torvalds	32a92f8c89	Convert more 'alloc_obj' cases to default GFP_KERNEL arguments This converts some of the visually simpler cases that have been split over multiple lines. I only did the ones that are easy to verify the resulting diff by having just that final GFP_KERNEL argument on the next line. Somebody should probably do a proper coccinelle script for this, but for me the trivial script actually resulted in an assertion failure in the middle of the script. I probably had made it a bit _too_ trivial. So after fighting that far a while I decided to just do some of the syntactically simpler cases with variations of the previous 'sed' scripts. The more syntactically complex multi-line cases would mostly really want whitespace cleanup anyway. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2026-02-21 20:03:00 -08:00
Linus Torvalds	323bbfcf1e	Convert 'alloc_flex' family to use the new default GFP_KERNEL argument This is the exact same thing as the 'alloc_obj()' version, only much smaller because there are a lot fewer users of the alloc_flex() interface. As with alloc_obj() version, this was done entirely with mindless brute force, using the same script, except using 'flex' in the pattern rather than 'objs'. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2026-02-21 17:09:51 -08:00
Linus Torvalds	bf4afc53b7	Convert 'alloc_obj' family to use the new default GFP_KERNEL argument This was done entirely with mindless brute force, using git grep -l '\<k[vmz]alloc_objs(., GFP_KERNEL)' \| xargs sed -i 's/$alloc_objs(.*$, GFP_KERNEL)/\1)/' to convert the new alloc_obj() users that had a simple GFP_KERNEL argument to just drop that argument. Note that due to the extreme simplicity of the scripting, any slightly more complex cases spread over multiple lines would not be triggered: they definitely exist, but this covers the vast bulk of the cases, and the resulting diff is also then easier to check automatically. For the same reason the 'flex' versions will be done as a separate conversion. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2026-02-21 17:09:51 -08:00

1 2 3 4 5 ...

35992 Commits