Likun Gao
1ecef55893
drm/amdgpu: init TA fw for psp v14
...
Add support to init TA firmware for psp v14.
Signed-off-by: Likun Gao <Likun.Gao@amd.com >
Acked-by: Alex Deucher <alexander.deucher@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2024-06-19 12:52:43 -04:00
Yang Wang
017d0b67bf
drm/amdgpu: refine gfx6 firmware loading
...
refine gfx6 firmware loading
Signed-off-by: Yang Wang <kevinyang.wang@amd.com >
Reviewed-by: Christian König <christian.koenig@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2024-06-19 12:52:36 -04:00
Yang Wang
a4fcb5f733
Revert "drm/amdgpu: change aca bank error lock type to spinlock"
...
This reverts commit f6bce954f4.
Revert this patch to change the lock type back to 'mutex' and avoid the
kernel call trace below.
[ 602.668806] Workqueue: amdgpu-reset-dev amdgpu_ras_do_recovery [amdgpu]
[ 602.668939] Call Trace:
[ 602.668940] <TASK>
[ 602.668941] dump_stack_lvl+0x4c/0x70
[ 602.668945] dump_stack+0x14/0x20
[ 602.668946] __schedule_bug+0x5a/0x70
[ 602.668950] __schedule+0x940/0xb30
[ 602.668952] ? srso_alias_return_thunk+0x5/0xfbef5
[ 602.668955] ? hrtimer_reprogram+0x77/0xb0
[ 602.668957] ? srso_alias_return_thunk+0x5/0xfbef5
[ 602.668959] ? hrtimer_start_range_ns+0x126/0x370
[ 602.668961] schedule+0x39/0xe0
[ 602.668962] schedule_hrtimeout_range_clock+0xb1/0x140
[ 602.668964] ? __pfx_hrtimer_wakeup+0x10/0x10
[ 602.668966] schedule_hrtimeout_range+0x17/0x20
[ 602.668967] usleep_range_state+0x69/0x90
[ 602.668970] psp_cmd_submit_buf+0x132/0x570 [amdgpu]
[ 602.669066] psp_ras_invoke+0x75/0x1a0 [amdgpu]
[ 602.669156] psp_ras_query_address+0x9c/0x120 [amdgpu]
[ 602.669245] umc_v12_0_update_ecc_status+0x16d/0x520 [amdgpu]
[ 602.669337] ? srso_alias_return_thunk+0x5/0xfbef5
[ 602.669339] ? stack_depot_save+0x12/0x20
[ 602.669342] ? srso_alias_return_thunk+0x5/0xfbef5
[ 602.669343] ? set_track_prepare+0x52/0x70
[ 602.669346] ? kmemleak_alloc+0x4f/0x90
[ 602.669348] ? __kmalloc_node+0x34b/0x450
[ 602.669352] amdgpu_umc_update_ecc_status+0x23/0x40 [amdgpu]
[ 602.669438] mca_umc_mca_get_err_count+0x85/0xc0 [amdgpu]
[ 602.669554] mca_smu_parse_mca_error_count+0x120/0x1d0 [amdgpu]
[ 602.669655] amdgpu_mca_dispatch_mca_set.part.0+0x141/0x250 [amdgpu]
[ 602.669743] ? kmemleak_free+0x36/0x60
[ 602.669745] ? kvfree+0x32/0x40
[ 602.669747] ? srso_alias_return_thunk+0x5/0xfbef5
[ 602.669749] ? kfree+0x15d/0x2a0
[ 602.669752] amdgpu_mca_smu_log_ras_error+0x1f6/0x210 [amdgpu]
[ 602.669839] amdgpu_ras_query_error_status_helper+0x2ad/0x390 [amdgpu]
[ 602.669924] ? srso_alias_return_thunk+0x5/0xfbef5
[ 602.669925] ? __call_rcu_common.constprop.0+0xa6/0x2b0
[ 602.669929] amdgpu_ras_query_error_status+0xf3/0x620 [amdgpu]
[ 602.670014] ? srso_alias_return_thunk+0x5/0xfbef5
[ 602.670017] amdgpu_ras_log_on_err_counter+0xe1/0x170 [amdgpu]
[ 602.670103] amdgpu_ras_do_recovery+0xd2/0x2c0 [amdgpu]
[ 602.670187] ? srso_alias_return_thunk+0x5/0
Signed-off-by: Yang Wang <kevinyang.wang@amd.com >
Reviewed-by: YiPeng Chai <yipeng.chai@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2024-06-19 12:51:23 -04:00
Yang Wang
8c9ee18019
Revert "drm/amdgpu: change bank cache lock type to spinlock"
...
This reverts commit 258ed689bc.
Revert this patch to change the lock type back to 'mutex' and avoid the
kernel call trace below.
[ 602.668806] Workqueue: amdgpu-reset-dev amdgpu_ras_do_recovery [amdgpu]
[ 602.668939] Call Trace:
[ 602.668940] <TASK>
[ 602.668941] dump_stack_lvl+0x4c/0x70
[ 602.668945] dump_stack+0x14/0x20
[ 602.668946] __schedule_bug+0x5a/0x70
[ 602.668950] __schedule+0x940/0xb30
[ 602.668952] ? srso_alias_return_thunk+0x5/0xfbef5
[ 602.668955] ? hrtimer_reprogram+0x77/0xb0
[ 602.668957] ? srso_alias_return_thunk+0x5/0xfbef5
[ 602.668959] ? hrtimer_start_range_ns+0x126/0x370
[ 602.668961] schedule+0x39/0xe0
[ 602.668962] schedule_hrtimeout_range_clock+0xb1/0x140
[ 602.668964] ? __pfx_hrtimer_wakeup+0x10/0x10
[ 602.668966] schedule_hrtimeout_range+0x17/0x20
[ 602.668967] usleep_range_state+0x69/0x90
[ 602.668970] psp_cmd_submit_buf+0x132/0x570 [amdgpu]
[ 602.669066] psp_ras_invoke+0x75/0x1a0 [amdgpu]
[ 602.669156] psp_ras_query_address+0x9c/0x120 [amdgpu]
[ 602.669245] umc_v12_0_update_ecc_status+0x16d/0x520 [amdgpu]
[ 602.669337] ? srso_alias_return_thunk+0x5/0xfbef5
[ 602.669339] ? stack_depot_save+0x12/0x20
[ 602.669342] ? srso_alias_return_thunk+0x5/0xfbef5
[ 602.669343] ? set_track_prepare+0x52/0x70
[ 602.669346] ? kmemleak_alloc+0x4f/0x90
[ 602.669348] ? __kmalloc_node+0x34b/0x450
[ 602.669352] amdgpu_umc_update_ecc_status+0x23/0x40 [amdgpu]
[ 602.669438] mca_umc_mca_get_err_count+0x85/0xc0 [amdgpu]
[ 602.669554] mca_smu_parse_mca_error_count+0x120/0x1d0 [amdgpu]
[ 602.669655] amdgpu_mca_dispatch_mca_set.part.0+0x141/0x250 [amdgpu]
[ 602.669743] ? kmemleak_free+0x36/0x60
[ 602.669745] ? kvfree+0x32/0x40
[ 602.669747] ? srso_alias_return_thunk+0x5/0xfbef5
[ 602.669749] ? kfree+0x15d/0x2a0
[ 602.669752] amdgpu_mca_smu_log_ras_error+0x1f6/0x210 [amdgpu]
[ 602.669839] amdgpu_ras_query_error_status_helper+0x2ad/0x390 [amdgpu]
[ 602.669924] ? srso_alias_return_thunk+0x5/0xfbef5
[ 602.669925] ? __call_rcu_common.constprop.0+0xa6/0x2b0
[ 602.669929] amdgpu_ras_query_error_status+0xf3/0x620 [amdgpu]
[ 602.670014] ? srso_alias_return_thunk+0x5/0xfbef5
[ 602.670017] amdgpu_ras_log_on_err_counter+0xe1/0x170 [amdgpu]
[ 602.670103] amdgpu_ras_do_recovery+0xd2/0x2c0 [amdgpu]
[ 602.670187] ? srso_alias_return_thunk+0x5/0xfbef5
[ 602.670189] ? __schedule+0x37d/0xb30
[ 602.670191] process_one_work+0x176/0x350
[ 602.670194] worker_thread+0x2f7/0x420
[ 602.670197] ?
Signed-off-by: Yang Wang <kevinyang.wang@amd.com >
Reviewed-by: YiPeng Chai <YiPeng.Chai@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2024-06-19 12:50:31 -04:00
Alex Deucher
19797687e6
drm/amdgpu: remove amdgpu_mes_fence_wait_polling()
...
No longer used so remove it.
Reviewed-by: Mukul Joshi <mukul.joshi@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2024-06-19 12:50:24 -04:00
Alex Deucher
fffe347e14
drm/amdgpu: cleanup MES12 command submission
...
The approach of having a separate WB slot for each submission doesn't
really work well and, for example, breaks GPU reset.
Use a status query packet for the fence update instead. Since those
should always succeed, we can use the fence of the original packet to
signal the state of the operation.
While at it, clean up the coding style.
Fixes: ade887c633 ("drm/amdgpu/mes12: Use a separate fence per transaction")
Reviewed-by: Mukul Joshi <mukul.joshi@amd.com >
Suggested-by: Christian König <christian.koenig@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2024-06-19 12:49:59 -04:00
Yang Wang
3af2c80ae2
drm/amdgpu: refine gfx10 firmware loading
...
refine gfx10 firmware loading
Signed-off-by: Yang Wang <kevinyang.wang@amd.com >
Reviewed-by: Christian König <christian.koenig@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2024-06-19 12:49:53 -04:00
Yang Wang
23fc94795b
drm/amdgpu: refine gfx9 firmware loading
...
refine gfx9 firmware loading
Signed-off-by: Yang Wang <kevinyang.wang@amd.com >
Reviewed-by: Christian König <christian.koenig@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2024-06-19 12:49:28 -04:00
Christian König
de32462541
drm/amdgpu: cleanup MES11 command submission
...
The approach of having a separate WB slot for each submission doesn't
really work well and, for example, breaks GPU reset.
Use a status query packet for the fence update instead. Since those
should always succeed, we can use the fence of the original packet to
signal the state of the operation.
While at it, clean up the coding style.
Fixes: eef016ba89 ("drm/amdgpu/mes11: Use a separate fence per transaction")
Reviewed-by: Mukul Joshi <mukul.joshi@amd.com >
Signed-off-by: Christian König <christian.koenig@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2024-06-19 12:48:25 -04:00
Christian König
a6328c9c3d
drm/amdgpu: fix using the reserved VMID with gang submit
...
We need to ensure that, even when using a reserved VMID, the gang
members can still run in parallel.
Signed-off-by: Christian König <christian.koenig@amd.com >
Reviewed-by: Alex Deucher <alexander.deucher@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2024-06-19 12:48:00 -04:00
Victor Lu
b32563859d
drm/amdgpu: Do not wait for MP0_C2PMSG_33 IFWI init in SRIOV
...
SRIOV does not need to wait for IFWI init, and MP0_C2PMSG_33 is blocked
for VF access.
Signed-off-by: Victor Lu <victorchengchi.lu@amd.com >
Reviewed-by: Vignesh Chander <Vignesh.Chander@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2024-06-19 12:47:52 -04:00
Mukul Joshi
4d14a74054
Revert "drm/amdgpu: Add missing locking for MES API calls"
...
This reverts commit 3612702852.
This is causing a BUG message during suspend.
[ 61.603542] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:283
[ 61.603550] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 2028, name: kworker/u64:14
[ 61.603553] preempt_count: 1, expected: 0
[ 61.603555] RCU nest depth: 0, expected: 0
[ 61.603557] Preemption disabled at:
[ 61.603559] [<ffffffffc08a3261>] amdgpu_gfx_disable_kgq+0x61/0x160 [amdgpu]
[ 61.603789] CPU: 9 PID: 2028 Comm: kworker/u64:14 Tainted: G W 6.8.0+ #7
[ 61.603795] Workqueue: events_unbound async_run_entry_fn
[ 61.603801] Call Trace:
[ 61.603803] <TASK>
[ 61.603806] dump_stack_lvl+0x37/0x50
[ 61.603811] ? amdgpu_gfx_disable_kgq+0x61/0x160 [amdgpu]
[ 61.604007] dump_stack+0x10/0x20
[ 61.604010] __might_resched+0x16f/0x1d0
[ 61.604016] __might_sleep+0x43/0x70
[ 61.604020] mutex_lock+0x1f/0x60
[ 61.604024] amdgpu_mes_unmap_legacy_queue+0x6d/0x100 [amdgpu]
[ 61.604226] gfx11_kiq_unmap_queues+0x3dc/0x430 [amdgpu]
[ 61.604422] ? srso_alias_return_thunk+0x5/0xfbef5
[ 61.604429] amdgpu_gfx_disable_kgq+0x122/0x160 [amdgpu]
[ 61.604621] gfx_v11_0_hw_fini+0xda/0x100 [amdgpu]
[ 61.604814] gfx_v11_0_suspend+0xe/0x20 [amdgpu]
[ 61.605008] amdgpu_device_ip_suspend_phase2+0x135/0x1d0 [amdgpu]
[ 61.605175] amdgpu_device_suspend+0xec/0x180 [amdgpu]
Signed-off-by: Mukul Joshi <mukul.joshi@amd.com >
Reviewed-by: Alex Deucher <alexander.deucher@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2024-06-19 12:46:39 -04:00
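The BUG report in the revert above (4d14a74054) shows `mutex_lock()` being reached with `preempt_count: 1, expected: 0`. The following is a minimal userspace sketch of that rule, not kernel code: every `toy_*` name is hypothetical, and the counter only models the idea that a sleeping lock may not be taken while preemption is disabled.

```c
#include <stdbool.h>

/* Toy stand-in for the kernel's preempt_count (hypothetical model). */
int toy_preempt_count;

/* Spinlock-style section: entering it disables preemption in this model. */
void toy_spin_lock(void)   { toy_preempt_count++; }
void toy_spin_unlock(void) { toy_preempt_count--; }

/* A sleeping lock such as a mutex is only legal when the count is zero. */
bool toy_may_sleep(void)
{
    return toy_preempt_count == 0;
}

/* Reports whether taking a mutex inside the spinlock-style section is legal. */
bool toy_mutex_allowed_under_spinlock(void)
{
    bool ok;

    toy_spin_lock();
    ok = toy_may_sleep();   /* count is 1 here, so sleeping is not allowed */
    toy_spin_unlock();
    return ok;
}
```

In this model the check fails only while the spinlock-style section is active, which mirrors why adding a mutex on a path entered with preemption disabled produced the report.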
Yang Wang
3a3be8bb97
drm/amdgpu: refine gfx8 firmware loading
...
refine gfx8 firmware loading
Signed-off-by: Yang Wang <kevinyang.wang@amd.com >
Reviewed-by: Christian König <christian.koenig@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2024-06-14 16:18:27 -04:00
Tao Zhou
09a3d8202d
drm/amdgpu: set RAS fed status for more cases
...
Indicate fatal error for each RAS block and NBIO.
Signed-off-by: Tao Zhou <tao.zhou1@amd.com >
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2024-06-14 16:18:26 -04:00
Tao Zhou
7e4371676e
drm/amdgpu: create amdgpu_ras_in_recovery to simplify code
...
Reduce redundant code so that callers don't need to pay attention to
RAS details.
Signed-off-by: Tao Zhou <tao.zhou1@amd.com >
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2024-06-14 16:18:26 -04:00
Tao Zhou
5f7697bbc1
drm/amdgpu: trigger mode1 reset for RAS RMA status
...
Check RMA status in bad page retirement flow.
v2: fix coding bugs in v1.
Signed-off-by: Tao Zhou <tao.zhou1@amd.com >
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2024-06-14 16:18:26 -04:00
Yang Wang
9d26e0cfc2
drm/amdgpu: refine gfx7 firmware loading
...
refine gfx7 firmware loading
Signed-off-by: Yang Wang <kevinyang.wang@amd.com >
Reviewed-by: Christian König <christian.koenig@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2024-06-14 16:17:15 -04:00
Christian König
030631e97b
drm/amdgpu: revert "take runtime pm reference when we attach a buffer" v2
...
This reverts commit b8c415e3bf ("drm/amdgpu: take runtime pm reference
when we attach a buffer") and commit 425285d39a ("drm/amdgpu: add amdgpu
runpm usage trace for separate funcs").
Taking a runtime pm reference for DMA-buf is actually completely
unnecessary and even dangerous.
The problem is that calling pm_runtime_get_sync() from the DMA-buf
callbacks is illegal because we have the reservation locked here
which is also taken during resume. So this would deadlock.
When the buffer is in GTT it is still accessible even when the GPU
is powered down and when it is in VRAM the buffer gets migrated to
GTT before powering down.
The only use case which would make it mandatory to keep the runtime
pm reference would be if we pin the buffer into VRAM, and that's not
something we currently do.
v2: improve the commit message
Signed-off-by: Christian König <christian.koenig@amd.com >
Reviewed-by: Alex Deucher <alexander.deucher@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
CC: stable@vger.kernel.org
2024-06-14 16:17:13 -04:00
Yang Wang
c37b8f7868
drm/amdgpu: refine imu firmware loading
...
refine imu firmware loading
Signed-off-by: Yang Wang <kevinyang.wang@amd.com >
Reviewed-by: Christian König <christian.koenig@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2024-06-14 16:17:12 -04:00
Yang Wang
cd093c24ee
drm/amdgpu: refine gmc firmware loading
...
refine gmc firmware loading
Signed-off-by: Yang Wang <kevinyang.wang@amd.com >
Reviewed-by: Christian König <christian.koenig@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2024-06-14 16:17:12 -04:00
Yang Wang
8d7ff60f36
drm/amdgpu: refine vpe firmware loading
...
refine vpe firmware loading
Signed-off-by: Yang Wang <kevinyang.wang@amd.com >
Reviewed-by: Christian König <christian.koenig@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2024-06-14 16:17:12 -04:00
Yang Wang
b441e9ac9d
drm/amdgpu: refine vcn firmware loading
...
refine vcn firmware loading
Signed-off-by: Yang Wang <kevinyang.wang@amd.com >
Reviewed-by: Christian König <christian.koenig@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2024-06-14 16:17:12 -04:00
Yang Wang
9817f06173
drm/amdgpu: move aca/mca init functions into ras_init() stage
...
Adjust the function position to better match the aca/mca fini code in ras_fini().
Signed-off-by: Yang Wang <kevinyang.wang@amd.com >
Reviewed-by: Tao Zhou <tao.zhou1@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2024-06-14 16:17:12 -04:00
Bob Zhou
be6a69b21a
drm/amdgpu: fix overflowed constant warning in mmhub_set_clockgating()
...
To fix potential overflowed constant warning, modify the variables to u32
for getting the return value of RREG32_SOC15().
Signed-off-by: Bob Zhou <bob.zhou@amd.com >
Acked-by: Christian König <christian.koenig@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2024-06-14 16:17:12 -04:00
Harish Kasiviswanathan
199d69d5f9
drm/amdgpu: Indicate CU harvest info to CP
...
To achieve full occupancy, CP hardware needs to know if CUs in SE are
symmetrically or asymmetrically harvested.
v2: Reset is_symmetric_cus for each loop
Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com >
Acked-by: Alex Deucher <alexander.deucher@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2024-06-14 16:17:12 -04:00
Lijo Lazar
3a86fdc422
drm/amdgpu: Skip coredump during resets for debug
...
Skip scheduling coredump when gpu reset is intentionally triggered
through debugfs.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com >
Reviewed-by: Asad Kamal <asad.kamal@amd.com >
Reviewed-by: Christian König <christian.koenig@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2024-06-14 16:17:12 -04:00
Yang Wang
3618fa26c8
drm/amdgpu: refine sdma firmware loading
...
refine sdma firmware loading
Signed-off-by: Yang Wang <kevinyang.wang@amd.com >
Reviewed-by: Christian König <christian.koenig@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2024-06-14 16:17:12 -04:00
Yang Wang
8cae4b578e
drm/amdgpu: refine psp firmware loading
...
refine psp firmware loading
Signed-off-by: Yang Wang <kevinyang.wang@amd.com >
Reviewed-by: Christian König <christian.koenig@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2024-06-14 16:17:11 -04:00
Yang Wang
bf349b036d
drm/amdgpu: refine mes firmware loading
...
v1:
refine mes firmware loading
v2:
use dev_info instead of DRM_INFO
Signed-off-by: Yang Wang <kevinyang.wang@amd.com >
Reviewed-by: Christian König <christian.koenig@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2024-06-14 16:17:11 -04:00
Mukul Joshi
3612702852
drm/amdgpu: Add missing locking for MES API calls
...
Add missing locking at a few places when calling MES APIs to ensure
exclusive access to MES queue.
Signed-off-by: Mukul Joshi <mukul.joshi@amd.com >
Reviewed-by: Kent Russell <kent.russell@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2024-06-14 16:17:11 -04:00
Mario Limonciello
2fe87f54ab
drm/amd/display: Set default brightness according to ACPI
...
Currently, amdgpu will always set up the brightness at 100% when it
loads. However, this is jarring when the BIOS has previously
programmed it to a much lower value.
The ACPI ATIF method includes two members for "ac_level" and "dc_level".
These represent the default values that should be used if the system is
brought up in AC and DC respectively.
Use these values to set up the default brightness when the backlight
device is registered.
v2: squash in ACPI fix
Reviewed-by: Leo Li <sunpeng.li@amd.com >
Acked-by: Alex Deucher <alexander.deucher@amd.com >
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2024-06-14 16:17:11 -04:00
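The selection described in the commit above (2fe87f54ab) can be sketched as follows. This is an illustration only: `struct backlight_defaults` and `initial_brightness()` are hypothetical names, and the units of the two levels are not stated in the commit message, so the sketch treats them as opaque integers.

```c
#include <stdbool.h>

/* Hypothetical container for the two defaults reported by the ATIF method. */
struct backlight_defaults {
    int ac_level;   /* default when the system is brought up on AC */
    int dc_level;   /* default when the system is brought up on DC */
};

/* Pick the initial brightness for a newly registered backlight device. */
int initial_brightness(const struct backlight_defaults *d, bool on_ac)
{
    return on_ac ? d->ac_level : d->dc_level;
}
```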
David (Ming Qiang) Wu
ee3942d9ab
drm/amdgpu: drop some kernel messages in VCN code
...
Similar to commit 813e7d4cd0, where some kernel log
messages were dropped. With this commit, more log
messages in older versions of the VCN/JPEG code are dropped.
Acked-by: Leo Liu <leo.liu@amd.com >
Signed-off-by: David (Ming Qiang) Wu <David.Wu3@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2024-06-14 16:16:51 -04:00
Yang Wang
a777c9d70a
drm/amdgpu: refine gpu_info firmware loading
...
refine gpu_info firmware loading
Signed-off-by: Yang Wang <kevinyang.wang@amd.com >
Reviewed-by: Christian König <christian.koenig@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2024-06-14 16:15:59 -04:00
Yang Wang
1bfe5e7746
drm/amdgpu: enhance amdgpu_ucode_request() function flexibility
...
v1:
Adding formatting string feature to improve function flexibility.
v2:
modify macro name to ADMGPU_UCODE_MAX_NAME.
Signed-off-by: Yang Wang <kevinyang.wang@amd.com >
Reviewed-by: Christian König <christian.koenig@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2024-06-14 16:15:59 -04:00
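As a rough illustration of the "formatting string feature" mentioned in the commit above (1bfe5e7746), a printf-style helper can build the firmware name into a fixed-size buffer. This is a userspace sketch under assumptions: `build_fw_name()` and `FW_NAME_MAX` are hypothetical and do not reproduce the real `amdgpu_ucode_request()` signature.

```c
#include <stdarg.h>
#include <stdio.h>

#define FW_NAME_MAX 128   /* hypothetical buffer size for this sketch */

/*
 * Build a firmware file name from a printf-style format.
 * Returns 0 on success, -1 if formatting failed or the name was truncated.
 */
int build_fw_name(char out[FW_NAME_MAX], const char *fmt, ...)
{
    va_list ap;
    int len;

    va_start(ap, fmt);
    len = vsnprintf(out, FW_NAME_MAX, fmt, ap);
    va_end(ap);

    return (len < 0 || len >= FW_NAME_MAX) ? -1 : 0;
}
```

With a helper of this shape, a caller passes the format and its arguments directly, for example `build_fw_name(name, "amdgpu/%s_%s.bin", chip, block)`, instead of formatting the name into a local buffer first.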
Bob Zhou
37f432481d
drm/amdgpu: fix the overflowed constant warning for RREG32_SOC15()
...
To fix potential overflowed constant warning reported by Coverity,
modify the variables to uint32_t.
Signed-off-by: Bob Zhou <bob.zhou@amd.com >
Acked-by: Christian König <christian.koenig@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2024-06-14 16:15:59 -04:00
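The pattern behind the commit above (37f432481d) can be sketched in plain C: keep a 32-bit register value in an unsigned 32-bit variable from read to write. `fake_read_reg()` and `CG_ENABLE_MASK` are hypothetical stand-ins, and the exact expression Coverity flagged is not given in the commit message.

```c
#include <stdint.h>

#define CG_ENABLE_MASK 0x80000000u   /* hypothetical field in bit 31 */

/* Hypothetical stand-in for a 32-bit register read. */
uint32_t fake_read_reg(void)
{
    return 0x00000001u;
}

/* The value stays unsigned end to end, so setting bit 31 is well defined. */
uint32_t enable_clock_gating(void)
{
    uint32_t data = fake_read_reg();

    data |= CG_ENABLE_MASK;
    return data;
}
```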
Yunxiang Li
18f2525d31
drm/amdgpu: add lock in amdgpu_gart_invalidate_tlb
...
We need to take the reset domain lock before flushing HDP. We can't put
the lock inside amdgpu_device_flush_hdp itself because it is used during
reset, where we already hold the write side lock.
Signed-off-by: Yunxiang Li <Yunxiang.Li@amd.com >
Reviewed-by: Christian König <christian.koenig@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2024-06-14 16:15:59 -04:00
Yunxiang Li
9c33e5fd4f
drm/amdgpu: fix locking scope when flushing tlb
...
Which method is used to flush the tlb does not depend on whether a reset
is in progress or not. We should skip the flush altogether if the GPU will
get reset. So put both paths under the reset_domain read lock.
Signed-off-by: Yunxiang Li <Yunxiang.Li@amd.com >
Reviewed-by: Christian König <christian.koenig@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
CC: stable@vger.kernel.org
2024-06-14 16:15:59 -04:00
Yunxiang Li
ba531117a8
drm/amdgpu: call flush_gpu_tlb directly in gfxhub enable
...
Here we are in reset and already hold the reset_domain write side lock,
so we can't use the flush tlb helper, which tries to take the read
side.
Signed-off-by: Yunxiang Li <Yunxiang.Li@amd.com >
Reviewed-by: Christian König <christian.koenig@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2024-06-14 16:15:59 -04:00
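The locking rule shared by the three TLB flush commits above (18f2525d31, 9c33e5fd4f, ba531117a8) can be modelled in userspace. This is a toy single-threaded model with hypothetical `toy_*` names; it assumes the helper uses a try-lock on the read side, which the commit messages do not spell out.

```c
#include <stdbool.h>

/* Toy reader/writer lock: one writer or any number of readers. */
struct toy_rwsem {
    bool writer;
    int readers;
};

bool toy_down_read_trylock(struct toy_rwsem *s)
{
    if (s->writer)
        return false;
    s->readers++;
    return true;
}

void toy_up_read(struct toy_rwsem *s)
{
    s->readers--;
}

int flush_count;

/* Direct flush: used by the reset path, which already holds the write side. */
void toy_flush_tlb_direct(void)
{
    flush_count++;
}

/* Helper for normal paths: skip the flush if a reset holds the write side. */
bool toy_flush_tlb_helper(struct toy_rwsem *reset_sem)
{
    if (!toy_down_read_trylock(reset_sem))
        return false;
    toy_flush_tlb_direct();
    toy_up_read(reset_sem);
    return true;
}
```

While the write side is held, the helper never flushes, so a caller that is itself the reset path has to call the direct function.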
Yunxiang Li
c1f9d82b92
drm/amdgpu: use helper in amdgpu_gart_unbind
...
When the amdgpu_gart_invalidate_tlb helper was introduced, this part was
left out of the conversion. Avoid the code duplication here.
Signed-off-by: Yunxiang Li <Yunxiang.Li@amd.com >
Reviewed-by: Christian König <christian.koenig@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2024-06-14 16:15:59 -04:00
Yunxiang Li
4b0e76e4c1
drm/amdgpu: remove tlb flush in amdgpu_gtt_mgr_recover
...
At this point the gart is not set up, so there's no point in invalidating
the tlb here, and it could even be harmful.
Signed-off-by: Yunxiang Li <Yunxiang.Li@amd.com >
Reviewed-by: Christian König <christian.koenig@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2024-06-14 16:15:58 -04:00
Yunxiang Li
5c0a1cdd17
drm/amdgpu: fix sriov host flr handler
...
We send back the ready to reset message before we stop anything. This is
wrong. Move it to when we are actually ready for the FLR to happen.
In the current state, since we take tens of seconds to stop everything,
it is very likely that the host would give up waiting and reset the GPU
before we send ready, so it would be the same as before. But this gets
rid of the hack with reset_domain locking and also lets us tell from the
host how slow ready to reset actually is. The ready to reset speed can
be improved later.
Signed-off-by: Yunxiang Li <Yunxiang.Li@amd.com >
Acked-by: Christian König <christian.koenig@amd.com >
Reviewed-by: Emily Deng <Emily.Deng@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2024-06-14 16:15:58 -04:00
Yunxiang Li
b3948ad1ac
drm/amdgpu: add skip_hw_access checks for sriov
...
Accessing registers via host is missing the check for skip_hw_access and
the lockdep check that comes with it.
Signed-off-by: Yunxiang Li <Yunxiang.Li@amd.com >
Reviewed-by: Christian König <christian.koenig@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2024-06-14 16:15:58 -04:00
Eric Huang
bac640ddb5
drm/amdgpu: add reset source in various cases
...
To fulfill the reset event description.
Suggested-by: Lijo Lazar <Lijo.Lazar@amd.com >
Signed-off-by: Eric Huang <jinhuieric.huang@amd.com >
Reviewed-by: Christian König <christian.koenig@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2024-06-14 16:15:58 -04:00
Eric Huang
7bed1df814
drm/amdgpu: fix NULL pointer in amdgpu_reset_get_desc
...
amdgpu_job_ring may return NULL, which causes a kernel NULL
pointer error. Use another way to print the ring name instead
of ring->name.
Suggested-by: Lijo Lazar <Lijo.Lazar@amd.com >
Signed-off-by: Eric Huang <jinhuieric.huang@amd.com >
Acked-by: Alex Deucher <alexander.deucher@amd.com >
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2024-06-14 16:15:58 -04:00
Jesse Zhang
27b500b77b
drm/amdgpu: remove dead code in atom_get_src_int
...
The align value is computed as align = (attr >> 3) & 7, so its range is 0~7.
In the case of ATOM_ARG_IMM, the code therefore cannot reach the default case.
So there is no need for "break".
Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com >
Suggested-by: Tim Huang <Tim.Huang@amd.com >
Reviewed-by: Tim Huang <Tim.Huang@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2024-06-14 15:34:10 -04:00
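The range argument in the commit above (27b500b77b) follows directly from the mask. A small sketch, where `attr_align()` is a hypothetical wrapper around the expression quoted in the commit message:

```c
#include <stdint.h>

/* Extract the 3-bit align field from attr, as quoted in the commit message. */
unsigned int attr_align(uint32_t attr)
{
    return (attr >> 3) & 7;
}
```

Because the result is masked with 7, it can only take the eight values 0 through 7, so a switch that handles all eight leaves its default branch unreachable.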
Frank Min
faa64f633c
drm/amdgpu: add sdma 7.0 support for copy dcc buffer
...
1. Add dcc buffer flag for copy buffer
2. Add sdma 7.0 support for copying dcc buffer
Signed-off-by: Likun Gao <Likun.Gao@amd.com >
Signed-off-by: Frank Min <Frank.Min@amd.com >
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com >
Reviewed-by: Alex Deucher <alexander.deucher@amd.com >
Reviewed-by: Christian König <christian.koenig@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2024-06-14 15:22:14 -04:00
Likun Gao
7c85e97083
drm/amdgpu: support for DCC feature
...
Deal with AMDGPU_GEM_CREATE_GFX12_DCC to set DCC bit
when needed.
Signed-off-by: Likun Gao <Likun.Gao@amd.com >
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2024-06-14 15:21:51 -04:00
Alex Deucher
6b83b94a94
drm/amdgpu: add additional VM bits
...
Add additional VM PTE bits.
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2024-06-14 15:20:56 -04:00
Maxime Ripard
14731a640e
Merge drm/drm-fixes into drm-misc-fixes
...
Roll -rc3 and current drm/fixes in.
This will also unstick our for-next branch.
Signed-off-by: Maxime Ripard <mripard@kernel.org >
2024-06-14 09:55:46 +02:00
Dave Airlie
1ddaaa2440
Merge tag 'amd-drm-next-6.11-2024-06-07' of https://gitlab.freedesktop.org/agd5f/linux into drm-next
...
amd-drm-next-6.11-2024-06-07:
amdgpu:
- DCN 4.0.x support
- DCN 3.5 updates
- GC 12.0 support
- DP MST fixes
- Cursor fixes
- MES11 updates
- MMHUB 4.1 support
- DML2 Updates
- DCN 3.1.5 fixes
- IPS fixes
- Various code cleanups
- GMC 12.0 support
- SDMA 7.0 support
- SMU 13 updates
- SR-IOV fixes
- VCN 5.x fixes
- MES12 support
- SMU 14.x updates
- Devcoredump improvements
- Fixes for HDP flush on platforms with >4k pages
- GC 9.4.3 fixes
- RAS ACA updates
- Silence UBSAN flex array warnings
- MMHUB 3.3 updates
amdkfd:
- Contiguous VRAM allocations
- GC 12.0 support
- SDMA 7.0 support
- SR-IOV fixes
radeon:
- Backlight workaround for iMac
- Silence UBSAN flex array warnings
UAPI:
- GFX12 modifier and DCC support
Proposed Mesa changes:
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29510
- KFD GFX ALU exceptions
Proposed ROCdebugger changes:
08c760622b
944fe1c141
- KFD Contiguous VRAM allocation flag
Proposed ROCr/HIP changes:
f7b4a26991
26e8530d05
1d48f2a1ab
Signed-off-by: Dave Airlie <airlied@redhat.com >
From: Alex Deucher <alexander.deucher@amd.com >
Link: https://patchwork.freedesktop.org/patch/msgid/20240607195900.902537-1-alexander.deucher@amd.com
2024-06-11 14:01:55 +10:00