linux

mirror of https://github.com/torvalds/linux.git synced 2026-04-26 10:32:25 -04:00

Author	SHA1	Message	Date
Sathishkumar S	0e7581eda8	drm/amdgpu/jpeg: Hold pg_lock before jpeg poweroff Acquire jpeg_pg_lock before changes to jpeg power state and release it after power off from idle work handler. Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-08-06 14:34:50 -04:00
Lijo Lazar	0b4d79dafa	drm/amdgpu: Assign unique id to compute partition Assign unique id to compute partition. This is the unique id of the first XCD instance belonging to the partition. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-08-06 14:34:47 -04:00
Alex Deucher	aae94897b6	drm/amdgpu: add missing vram lost check for LEGACY RESET Legacy resets reset the memory controllers so VRAM contents may be unreliable after reset. Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-08-06 14:34:14 -04:00
Alex Deucher	62eedd150f	drm/amdgpu/discovery: fix fw based ip discovery We only need the fw based discovery table for sysfs. No need to parse it. Additionally parsing some of the board specific tables may result in incorrect data on some boards. just load the binary and don't parse it on those boards. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4441 Fixes: `80a0e82829` ("drm/amdgpu/discovery: optionally use fw based ip discovery") Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-08-06 14:32:38 -04:00
Lijo Lazar	589ea8a1fd	drm/amdgpu: Add helpers to set/get unique ids Add a struct to store unique id information for each type. Add helper to fetch the unique id. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-08-06 14:22:08 -04:00
Lijo Lazar	892bac995b	drm/amdgpu: Prevent hardware access in dpc state Don't allow hardware access while in dpc state. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Ce Sun <cesun102@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-08-06 14:22:03 -04:00
Lijo Lazar	1a0e57eb96	drm/amdgpu/vcn: Fix double-free of vcn dump buffer The buffer is already freed as part of amdgpu_vcn_reg_dump_fini(). The issue is introduced by below patch series. Fixes: `de55cbff5c` ("drm/amdgpu/vcn: Add regdump helper functions") Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-08-06 14:21:16 -04:00
Lijo Lazar	c5c62160a5	drm/amdgpu: Log reset source during recovery To get more context, add reset source to identify the source of gpu recovery - job timeout, RAS, HWS hang etc. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-08-06 14:21:05 -04:00
Xiang Liu	b3505c2c48	drm/amdgpu: Generate BP threshold exceed CPER once threshold exceeded The bad pages threshold exceed CPER should be generated once threshold exceeded, no matter the bad_page_threshold setted or not. Signed-off-by: Xiang Liu <xiang.liu@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-08-06 14:21:01 -04:00
Lijo Lazar	91c4fd4164	drm/amdgpu: Set dpc status appropriately Set the dpc status based on hardware state. Also, clear the status before reinitialization after a successful reset. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Ce Sun <cesun102@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-08-06 14:18:35 -04:00
Lijo Lazar	32f73741d6	drm/amdgpu: Wait for bootloader after PSPv11 reset Some PSPv11 SOCs take a longer time for PSP based mode-1 reset. Instead of checking for C2PMSG_33 status, add the callback wait_for_bootloader. Wait for bootloader to be back to steady state is already part of the generic mode-1 reset flow. Increase the retry count for bootloader wait and also fix the mask to prevent fake pass. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-08-06 14:17:50 -04:00
Ethan Carter Edwards	66f92d1035	drm/amdgpu/gfx9.4.3: remove redundant repeated nested 0 check The repeated checks on grbm_soft_reset are unnecessary. Remove them. Signed-off-by: Ethan Carter Edwards <ethan@ethancedwards.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-08-06 14:17:47 -04:00
Ethan Carter Edwards	90e1d0324a	drm/amdgpu/gfx9: remove redundant repeated nested 0 check The repeated checks on grbm_soft_reset are unnecessary. Remove them. Signed-off-by: Ethan Carter Edwards <ethan@ethancedwards.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-08-06 14:17:45 -04:00
Ethan Carter Edwards	8802ec0eba	drm/amdgpu/gfx10: remove redundant repeated nested 0 check The repeated checks on grbm_soft_reset are unnecessary. Remove them. Signed-off-by: Ethan Carter Edwards <ethan@ethancedwards.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-08-06 14:17:40 -04:00
Xaver Hugl	9ed3d7bdf2	amdgpu/amdgpu_discovery: increase timeout limit for IFWI init With a timeout of only 1 second, my rx 5700XT fails to initialize, so this increases the timeout to 2s. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3697 Signed-off-by: Xaver Hugl <xaver.hugl@kde.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-08-06 14:17:03 -04:00
Jesse.Zhang	124ffa2970	drm/amdgpu: Update SDMA firmware version check for user queue support This commit fixes a firmware version check for enabling user queue support in SDMA v7.0. The previous version check (7836028) was incorrect and could lead to issues with PROTECTED_FENCE_SIGNAL commands causing register conflicts between MCU_DBG0 and MCU_DBG1. Fixes: `8c011408ed` ("drm/amdgpu/sdma7: add ucode version checks for userq support") Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `92e2449241`) Cc: stable@vger.kernel.org	2025-08-04 15:48:14 -04:00
Lijo Lazar	c2fe914d50	drm/amdgpu: Add NULL check for asic_funcs If driver load fails too early, asic_funcs pointer remains unassigned. Add NULL check to sanitize unwind path. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `582bf7c515`) Cc: stable@vger.kernel.org	2025-08-04 15:47:50 -04:00
Alex Deucher	9f9bddfa31	drm/amdgpu: update mmhub 3.3 client id mappings Update the client id mapping so the correct clients get printed when there is a mmhub page fault. v2: fix typos spotted by David Wu. v3: fix additional typo spotted by David. Reviewed-by: David (Ming Qiang) Wu <David.Wu3@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `e932f4779a`) Cc: stable@vger.kernel.org	2025-08-04 15:42:15 -04:00
Alex Deucher	0bae62cc98	drm/amdgpu: update mmhub 3.0.1 client id mappings Update the client id mapping so the correct clients get printed when there is a mmhub page fault. Reviewed-by: David (Ming Qiang) Wu <David.Wu3@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `2a2681eda7`) Cc: stable@vger.kernel.org	2025-08-04 15:41:52 -04:00
YuanShang	c00d8b79fd	drm/amdgpu: Retain job->vm in amdgpu_job_prepare_job The field job->vm is used in function amdgpu_job_run to get the page table re-generation counter and decide whether the job should be skipped. Specifically, function amdgpu_vm_generation checks if the VM is valid for this job to use. For instance, if a gfx job depends on a cancelled sdma job from entity vm->delayed, then the gfx job should be skipped. Fixes: `26c95e838e` ("drm/amdgpu: set the VM pointer to NULL in amdgpu_job_prepare") Signed-off-by: YuanShang <YuanShang.Mao@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `ed76936c6b`) Cc: stable@vger.kernel.org	2025-08-04 15:41:09 -04:00
Lijo Lazar	05c8b69051	drm/amdgpu: Update external revid for GC v9.5.0 Use different external revid for GC v9.5.0 SOCs. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `21c6764ed4`) Cc: stable@vger.kernel.org	2025-08-04 15:32:37 -04:00
Lijo Lazar	389d79a195	drm/amdgpu: Update supported modes for GC v9.5.0 For GC v9.5.0 SOCs, both CPX and QPX compute modes are also supported in NPS2 mode. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Acked-by: Mangesh Gadre <Mangesh.Gadre@amd.com> Reviewed-by: Asad Kamal <asad.kamal@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `9d1ac25c7f`) Cc: stable@vger.kernel.org	2025-08-04 15:32:07 -04:00
Xiang Liu	58364f01db	drm/amdgpu: Fix vcn v4.0.3 poison irq call trace on sriov guest Sriov guest side doesn't init ras feature hence the poison irq shouldn't be put during hw fini. [25209.468816] Call Trace: [25209.468817] <TASK> [25209.468818] ? srso_alias_return_thunk+0x5/0x7f [25209.468820] ? show_trace_log_lvl+0x28e/0x2ea [25209.468822] ? show_trace_log_lvl+0x28e/0x2ea [25209.468825] ? vcn_v4_0_3_hw_fini+0xaf/0xe0 [amdgpu] [25209.468936] ? show_regs.part.0+0x23/0x29 [25209.468939] ? show_regs.cold+0x8/0xd [25209.468940] ? amdgpu_irq_put+0x9e/0xc0 [amdgpu] [25209.469038] ? __warn+0x8c/0x100 [25209.469040] ? amdgpu_irq_put+0x9e/0xc0 [amdgpu] [25209.469135] ? report_bug+0xa4/0xd0 [25209.469138] ? handle_bug+0x39/0x90 [25209.469140] ? exc_invalid_op+0x19/0x70 [25209.469142] ? asm_exc_invalid_op+0x1b/0x20 [25209.469146] ? amdgpu_irq_put+0x9e/0xc0 [amdgpu] [25209.469241] vcn_v4_0_3_hw_fini+0xaf/0xe0 [amdgpu] [25209.469343] amdgpu_ip_block_hw_fini+0x34/0x61 [amdgpu] [25209.469511] amdgpu_device_fini_hw+0x3b3/0x467 [amdgpu] Fixes: `4c4a891496` ("drm/amdgpu: Register aqua vanjaram vcn poison irq") Signed-off-by: Xiang Liu <xiang.liu@amd.com> Reviewed-by: Stanley.Yang <Stanley.Yang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-08-04 14:43:51 -04:00
Xiang Liu	d3d73bdb02	drm/amdgpu: Fix jpeg v4.0.3 poison irq call trace on sriov guest Sriov guest side doesn't init ras feature hence the poison irq shouldn't be put during hw fini. [25209.467154] Call Trace: [25209.467156] <TASK> [25209.467158] ? srso_alias_return_thunk+0x5/0x7f [25209.467162] ? show_trace_log_lvl+0x28e/0x2ea [25209.467166] ? show_trace_log_lvl+0x28e/0x2ea [25209.467171] ? jpeg_v4_0_3_hw_fini+0x6f/0x90 [amdgpu] [25209.467300] ? show_regs.part.0+0x23/0x29 [25209.467303] ? show_regs.cold+0x8/0xd [25209.467304] ? amdgpu_irq_put+0x9e/0xc0 [amdgpu] [25209.467403] ? __warn+0x8c/0x100 [25209.467407] ? amdgpu_irq_put+0x9e/0xc0 [amdgpu] [25209.467503] ? report_bug+0xa4/0xd0 [25209.467508] ? handle_bug+0x39/0x90 [25209.467511] ? exc_invalid_op+0x19/0x70 [25209.467513] ? asm_exc_invalid_op+0x1b/0x20 [25209.467518] ? amdgpu_irq_put+0x9e/0xc0 [amdgpu] [25209.467613] ? amdgpu_irq_put+0x5f/0xc0 [amdgpu] [25209.467709] jpeg_v4_0_3_hw_fini+0x6f/0x90 [amdgpu] [25209.467805] amdgpu_ip_block_hw_fini+0x34/0x61 [amdgpu] [25209.467971] amdgpu_device_fini_hw+0x3b3/0x467 [amdgpu] Fixes: `1b2231de41` ("drm/amdgpu: Register aqua vanjaram jpeg poison irq") Signed-off-by: Xiang Liu <xiang.liu@amd.com> Reviewed-by: Stanley.Yang <Stanley.Yang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-08-04 14:42:25 -04:00
Lijo Lazar	5c2b3226d0	drm/amdgpu: Add wrapper function for dpc state Use wrapper functions to set/indicate dpc status. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Ce Sun <cesun102@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-08-04 14:42:21 -04:00
Jesse.Zhang	92e2449241	drm/amdgpu: Update SDMA firmware version check for user queue support This commit fixes a firmware version check for enabling user queue support in SDMA v7.0. The previous version check (7836028) was incorrect and could lead to issues with PROTECTED_FENCE_SIGNAL commands causing register conflicts between MCU_DBG0 and MCU_DBG1. Fixes: `8c011408ed` ("drm/amdgpu/sdma7: add ucode version checks for userq support") Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-08-04 14:41:06 -04:00
Lijo Lazar	582bf7c515	drm/amdgpu: Add NULL check for asic_funcs If driver load fails too early, asic_funcs pointer remains unassigned. Add NULL check to sanitize unwind path. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-08-04 14:40:39 -04:00
Mangesh Gadre	82594ac858	drm/amdgpu: Initialize vcn v5_0_1 ras function Initialize vcn v5_0_1 ras function Signed-off-by: Mangesh Gadre <Mangesh.Gadre@amd.com> Reviewed-by: Stanley.Yang <Stanley.Yang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-08-04 14:37:48 -04:00
Yunxiang Li	ba5e322b26	drm/amdgpu: skip mgpu fan boost for multi-vf On multi-vf setup if the VM have two vf assigned, perhaps from two different gpus, mgpu fan boost will fail. Signed-off-by: Yunxiang Li <Yunxiang.Li@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-08-04 14:37:27 -04:00
Mangesh Gadre	01fa9758c8	drm/amdgpu: Initialize jpeg v5_0_1 ras function Initialize jpeg v5_0_1 ras function Signed-off-by: Mangesh Gadre <Mangesh.Gadre@amd.com> Reviewed-by: Stanley.Yang <Stanley.Yang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-08-04 14:37:24 -04:00
Xiang Liu	8e8e08c831	drm/amdgpu: Skip poison aca bank from UE channel Avoid GFX poison consumption errors logged when fatal error occurs. Signed-off-by: Xiang Liu <xiang.liu@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-08-04 14:37:20 -04:00
Arnd Bergmann	4d22db6d07	drm/amdgpu: fix link error for !PM_SLEEP When power management is not enabled in the kernel build, the newly added hibernation changes cause a link failure: arm-linux-gnueabi-ld: drivers/gpu/drm/amd/amdgpu/amdgpu_drv.o: in function `amdgpu_pmops_thaw': amdgpu_drv.c:(.text+0x1514): undefined reference to `pm_hibernate_is_recovering' Make the power management code in this driver conditional on CONFIG_PM and CONFIG_PM_SLEEP Fixes: `530694f54d` ("drm/amdgpu: do not resume device in thaw for normal hibernation") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Link: https://lore.kernel.org/r/20250714081635.4071570-1-arnd@kernel.org Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-08-04 14:36:54 -04:00
Alex Deucher	e932f4779a	drm/amdgpu: update mmhub 3.3 client id mappings Update the client id mapping so the correct clients get printed when there is a mmhub page fault. v2: fix typos spotted by David Wu. v3: fix additional typo spotted by David. Reviewed-by: David (Ming Qiang) Wu <David.Wu3@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-08-04 14:36:47 -04:00
Alex Deucher	2a2681eda7	drm/amdgpu: update mmhub 3.0.1 client id mappings Update the client id mapping so the correct clients get printed when there is a mmhub page fault. Reviewed-by: David (Ming Qiang) Wu <David.Wu3@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-08-04 14:36:40 -04:00
Sathishkumar S	26a63590fe	drm/amdgpu/vcn: Register dump cleanup in VCN2_5 Use generic vcn devcoredump helper functions for VCN2_5 and VCN2_6 Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Acked-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-08-04 14:36:36 -04:00
Sathishkumar S	53c4be7a59	drm/amdgpu/vcn: Register dump cleanup in VCN2_0_0 Use generic vcn devcoredump helper functions for VCN2_0_0 Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Acked-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-08-04 14:36:31 -04:00
Sathishkumar S	b2d532b588	drm/amdgpu/vcn: Register dump cleanup in VCN3_0 Use generic vcn devcoredump helper functions for VCN3_0 Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Acked-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-08-04 14:36:28 -04:00
Sathishkumar S	69cc37647b	drm/amdgpu/vcn: Register dump cleanup in VCN4_0_3 Use generic vcn devcoredump helper functions for VCN4_0_3 Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Acked-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-08-04 14:36:23 -04:00
Sathishkumar S	793b97c4ad	drm/amdgpu/vcn: Register dump cleanup in VCN4_0_5 Use generic vcn devcoredump helper functions for VCN4_0_5 Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Acked-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-08-04 14:36:20 -04:00
Sathishkumar S	4e011af912	drm/amdgpu/vcn: Register dump cleanup in VCN4_0_0 Use generic vcn devcoredump helper functions for VCN4_0_0 Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Acked-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-08-04 14:36:17 -04:00
Sathishkumar S	f4c3be28d5	drm/amdgpu/vcn: Register dump cleanup in VCN5 Use generic vcn devcoredump helper functions for VCN5 Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Acked-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-08-04 14:36:14 -04:00
Stanley.Yang	08e27c9d92	drm/amdgpu: Add new error code for VCN/JPEG new chain Add VIDS and JPEG8/9 S\|D chain error code for VCN/JPEG v5.0.1. Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-08-04 14:36:06 -04:00
Stanley.Yang	b1b29aa88f	drm/amdgpu: Fix vcn v5.0.1 poison irq call trace Why: [13014.890792] Call Trace: [13014.890793] <TASK> [13014.890795] ? show_trace_log_lvl+0x1d6/0x2ea [13014.890799] ? show_trace_log_lvl+0x1d6/0x2ea [13014.890800] ? vcn_v5_0_1_hw_fini+0xe9/0x110 [amdgpu] [13014.890872] ? show_regs.part.0+0x23/0x29 [13014.890873] ? show_regs.cold+0x8/0xd [13014.890874] ? amdgpu_irq_put+0xc6/0xe0 [amdgpu] [13014.890934] ? __warn+0x8c/0x100 [13014.890936] ? amdgpu_irq_put+0xc6/0xe0 [amdgpu] [13014.890995] ? report_bug+0xa4/0xd0 [13014.890999] ? handle_bug+0x39/0x90 [13014.891001] ? exc_invalid_op+0x19/0x70 [13014.891003] ? asm_exc_invalid_op+0x1b/0x20 [13014.891005] ? amdgpu_irq_put+0xc6/0xe0 [amdgpu] [13014.891065] ? amdgpu_irq_put+0x63/0xe0 [amdgpu] [13014.891124] vcn_v5_0_1_hw_fini+0xe9/0x110 [amdgpu] [13014.891189] amdgpu_ip_block_hw_fini+0x3b/0x78 [amdgpu] [13014.891309] amdgpu_device_fini_hw+0x3c1/0x479 [amdgpu] How: Add omitted vcn poison irq get call. Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-08-04 14:36:00 -04:00
Sathishkumar S	de55cbff5c	drm/amdgpu/vcn: Add regdump helper functions Add generic helper functions for vcn devcoredump support which can be re-used for all vcn versions. Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Acked-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-08-04 14:35:54 -04:00
Meng Li	e6c2b0f232	drm/amd/amdgpu: Release xcp drm memory after unplug Add a new API amdgpu_xcp_drm_dev_free(). After unplug xcp device, need to release xcp drm memory etc. Co-developed-by: Jiang Liu <gerry@linux.alibaba.com> Signed-off-by: Jiang Liu <gerry@linux.alibaba.com> Signed-off-by: Meng Li <li.meng@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-08-04 14:35:48 -04:00
YuanShang	ed76936c6b	drm/amdgpu: Retain job->vm in amdgpu_job_prepare_job The field job->vm is used in function amdgpu_job_run to get the page table re-generation counter and decide whether the job should be skipped. Specifically, function amdgpu_vm_generation checks if the VM is valid for this job to use. For instance, if a gfx job depends on a cancelled sdma job from entity vm->delayed, then the gfx job should be skipped. Fixes: `26c95e838e` ("drm/amdgpu: set the VM pointer to NULL in amdgpu_job_prepare") Signed-off-by: YuanShang <YuanShang.Mao@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-08-04 14:31:34 -04:00
Mario Limonciello	cc51bbc7d7	drm/amd: Use drm_() macros instead of DRM_() for amdgpu_cs Some of the IOCTL messages can be called for different GPUs and it might not be obvious which one called them from a problem. Using the drm_*() macros the correct device will be shown in the messages. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Link: https://lore.kernel.org/r/20250715212420.2254925-1-superm1@kernel.org Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-08-04 14:31:25 -04:00
Yunshui Jiang	130c7ed88f	drm/amdgpu: use kmalloc_array() instead of kmalloc() Use kmalloc_array() instead of kmalloc() with multiplication. kmalloc_array() is a safer way because of its multiply overflow check. Signed-off-by: Yunshui Jiang <jiangyunshui@kylinos.cn> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-08-04 14:30:31 -04:00
Sathishkumar S	46b0e6b9d7	drm/amdgpu: Fix unintended error log in VCN5_0_0 The error log is supposed to be gaurded under if failure condition. Fixes: `faab5ea083` ("drm/amdgpu: Check vcn sram load return value") Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-08-04 14:29:14 -04:00
Ce Sun	da46735229	drm/amdgpu: Effective health check before reset Move amdgpu_device_health_check into amdgpu_device_gpu_recover to ensure that if the device is present can be checked before reset The reason is: 1.During the dpc event, the device where the dpc event occurs is not present on the bus 2.When both dpc event and ATHUB event occur simultaneously,the dpc thread holds the reset domain lock when detecting error,and the gpu recover thread acquires the hive lock.The device is simultaneously in the states of amdgpu_ras_in_recovery and occurs_dpc,so gpu recover thread will not go to amdgpu_device_health_check.It waits for the reset domain lock held by the dpc thread, but dpc thread has not released the reset domain lock.In the dpc callback slot_reset,to obtain the hive lock, the hive lock is held by the gpu recover thread at this time.So a deadlock occurred Signed-off-by: Ce Sun <cesun102@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-08-04 14:27:49 -04:00

... 8 9 10 11 12 ...

16673 Commits