linux

mirror of https://github.com/torvalds/linux.git synced 2026-04-20 15:53:59 -04:00

Author	SHA1	Message	Date
Mukul Joshi	2f77b9a242	drm/amdkfd: Update MQD management on multi XCC setup Update MQD management for both HIQ and user-mode compute queues on a multi XCC setup. MQDs needs to be allocated, initialized, loaded and destroyed for each XCC in the KFD node. v2: squash in fix "drm/amdkfd: Fix SDMA+HIQ HQD allocation on GFX9.4.3" Signed-off-by: Mukul Joshi <mukul.joshi@amd.com> Signed-off-by: Amber Lin <Amber.Lin@amd.com> Tested-by: Amber Lin <Amber.Lin@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-06-09 09:42:36 -04:00
Mukul Joshi	8dc1db3172	drm/amdkfd: Introduce kfd_node struct (v5) Introduce a new structure, kfd_node, which will now represent a compute node. kfd_node is carved out of kfd_dev structure. kfd_dev struct now will become the parent of kfd_node, and will store common resources such as doorbells, GTT sub-alloctor etc. kfd_node struct will store all resources specific to a compute node, such as device queue manager, interrupt handling etc. This is the first step in adding compute partition support in KFD. v2: introduce kfd_node struct to gc v11 (Hawking) v3: make reference to kfd_dev struct through kfd_node (Morris) v4: use kfd_node instead for kfd isr/mqd functions (Morris) v5: rebase (Alex) Signed-off-by: Mukul Joshi <mukul.joshi@amd.com> Tested-by: Amber Lin <Amber.Lin@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Morris Zhang <Shiwu.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-06-09 09:42:27 -04:00
Lijo Lazar	5cf1675591	drm/amdgpu: Add mode2 reset logic for v13.0.6 Mode2 reset for v13.0.6 has similar workflow as v13.0.2 Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-06-09 09:42:20 -04:00
Lijo Lazar	e6a02e2cc7	drm/amdgpu: Add some XCC programming Add additional XCC programming sequences. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-06-09 09:42:17 -04:00
Le Ma	15091a6f43	drm/amdgpu: add node_id to physical id conversion in EOP handler A new field nodeid in interrupt cookie indicates the node ID. Signed-off-by: Le Ma <le.ma@amd.com> Reviewed-by: Shiwu Zhang <shiwu.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-06-09 09:42:13 -04:00
Shiwu Zhang	147862d00b	drm/amdgpu: enable the ring and IB test for slave kcq With the mec FW update to utilize the mqd base set by driver for kcq mapping, slave kcq ring test and IB test can be re-enabled. Signed-off-by: Shiwu Zhang <shiwu.zhang@amd.com> Reviewed-by: Le Ma <Le.Ma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-06-09 09:42:08 -04:00
Hawking Zhang	89cf4549a9	drm/amdgpu: support gc v9_4_3 ring_test running on all xcc Each xcc has its own sratch_reg offset Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Le Ma <Le.Ma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-06-09 09:42:05 -04:00
James Zhu	ae972ed5e0	drm/amdgpu: fix vcn doorbell range setting Should use vcn_ring0_1 instead of doorbell index to set nbio doorbell range. Signed-off-by: James Zhu <James.Zhu@amd.com> Reviewed-by: Sonny Jiang <sonny.jiang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-06-09 09:42:02 -04:00
James Zhu	6ddae0f3ab	drm/amdgpu/jpeg: enable jpeg doorbell for jpeg4.0.3 Enable jpeg doorbell for jpeg4.0.3. Signed-off-by: James Zhu <James.Zhu@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-06-09 09:42:00 -04:00
James Zhu	c21d446ba7	drm/amdgpu/vcn: enable vcn doorbell for vcn4.0.3 Enable vcn doorbell for vcn4.0.3. Signed-off-by: James Zhu <James.Zhu@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-06-09 09:41:57 -04:00
James Zhu	d7fd2a9e39	drm/amdgpu/nbio: update vcn doorbell range VCN4.0.3 used up to 16 doorbells per partition. Signed-off-by: James Zhu <James.Zhu@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-06-09 09:41:55 -04:00
James Zhu	db77081fe3	drm/amdgpu/jpeg: add multiple jpeg rings support for vcn4_0_3 Add multiple jpeg rings support for vcn4_0_3 Signed-off-by: James Zhu <James.Zhu@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-06-09 09:41:46 -04:00
James Zhu	bc22455384	drm/amdgpu/jpeg: add multiple jpeg rings support Add multiple jpeg rings support. Signed-off-by: James Zhu <James.Zhu@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-06-09 09:41:43 -04:00
James Zhu	31c0ec84f9	drm/amdgpu/vcn: enable vcn DPG mode for VCN4_0_3 Enable vcn DPG mode for VCN4_0_3. Signed-off-by: James Zhu <James.Zhu@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-06-09 09:41:41 -04:00
James Zhu	ef3aa0b40c	drm/amdgpu/vcn: enable vcn pg for VCN4_0_3 Enable vcn pg for VCN4_0_3. Signed-off-by: James Zhu <James.Zhu@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-06-09 09:41:39 -04:00
James Zhu	342397db6d	drm/amdgpu/vcn: enable vcn cg for VCN4_0_3 Enable vcn cg for VCN4_0_3. Signed-off-by: James Zhu <James.Zhu@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-06-09 09:41:36 -04:00
James Zhu	b7179fc29f	drm/amdgpu/jpeg: enable jpeg pg for VCN4_0_3 Enable jpeg pg for VCN4_0_3. Signed-off-by: James Zhu <James.Zhu@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-06-09 09:41:32 -04:00
James Zhu	380302f8b8	drm/amdgpu/jpeg: enable jpeg cg for VCN4_0_3 Enable jpeg cg for VCN4_0_3. Signed-off-by: James Zhu <James.Zhu@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-06-09 09:41:29 -04:00
James Zhu	b889ef4ac9	drm/amdgpu/vcn: add vcn support for VCN4_0_3 Add vcn support for VCN4_0_3. Signed-off-by: James Zhu <James.Zhu@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-06-09 09:41:27 -04:00
James Zhu	e684e654eb	drm/amdgpu/jpeg: add jpeg support for VCN4_0_3 Add jpeg support for VCN4_0_3. v2: squash in delayed work typo fix (Alex) Signed-off-by: James Zhu <James.Zhu@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-06-09 09:40:56 -04:00
James Zhu	76e5e4c701	drm/amdgpu: add VCN4_0_3 firmware Add VCN4_0_3 firmware. v2: fix fw name (Alex) Signed-off-by: James Zhu <James.Zhu@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-06-09 09:40:53 -04:00
James Zhu	81283fee15	drm/amdgpu/: add more macro to support offset variant Add more macro to support offset variant and simplify macro SOC15_WAIT_ON_RREG. Signed-off-by: James Zhu <James.Zhu@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-06-09 09:40:46 -04:00
Lijo Lazar	9b4fd27601	drm/amdgpu: Use the correct API to read register Use SOC15 API so that the register offset is calculated correctly. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-06-09 09:40:43 -04:00
Amber Lin	f544afac3f	drm/amdgpu: Add kgd2kfd for GC 9.4.3 New GC (v9.4.3) and ATHUB (v1.8.0) versions are used. Add kgd_gfx_v9_4_3_* functions if registers in use of kgd_gfx_v9_* functions are changed or have different offset. Signed-off-by: Amber Lin <Amber.Lin@amd.com> Acked-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Mukul Joshi <mukul.joshi@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-06-09 09:40:40 -04:00
Shiwu Zhang	62e790879e	drm/amdgpu: alloc vm inv engines for every vmhub There are AMDGPU_MAX_VMHUBS of vmhub in maximum and need to init the vm_inv_engs for all of them. In this way, the below error can be ruled out. [ 217.317752] amdgpu 0000:02:00.0: amdgpu: no VM inv eng for ring sdma0 Signed-off-by: Shiwu Zhang <shiwu.zhang@amd.com> Reviewed-by: Christian Koenig <Christian.Koenig@amd.com> Reviewed-by: Le Ma <Le.Ma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-06-09 09:40:35 -04:00
Shiwu Zhang	0fa49d1083	drm/amdgpu: override partition mode through module parameter Add a module parameter to override the partition mode. Signed-off-by: Shiwu Zhang <shiwu.zhang@amd.com> Reviewed-by: Le Ma <Le.Ma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-06-09 09:40:32 -04:00
Shiwu Zhang	99951878b0	drm/amdgpu: make the WREG32_SOC15_xx macro to support multi GC To write regs on different GCDs, use the inst index. Signed-off-by: Shiwu Zhang <shiwu.zhang@amd.com> Reviewed-by: Le Ma <Le.Ma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-06-09 09:40:29 -04:00
Le Ma	98a54e88e8	drm/amdgpu: add sysfs node for compute partition mode Add current/available compute partitin mode sysfs node. v2: make the sysfs node as IP independent one in amdgpu_gfx.c Signed-off-by: Le Ma <le.ma@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-06-09 09:40:25 -04:00
Le Ma	3566938b34	drm/amdgpu: assign different AMDGPU_GFXHUB for rings on each xcc Pass the xcc_id to AMDGPU_GFXHUB(x) Signed-off-by: Le Ma <le.ma@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-06-09 09:40:22 -04:00
Le Ma	ce8a12a532	drm/amdgpu: init vmhubs bitmask for GC 9.4.3 Each XCD owns one GFXHUB. v2: switch to the new VMHUB layout Signed-off-by: Le Ma <le.ma@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-06-09 09:40:20 -04:00
Le Ma	d9426c3d9b	drm/amdgpu: add bitmask to iterate vmhubs As the layout of VMHUB definition has been changed to cover multiple XCD/AID case, the original num_vmhubs is not appropriate to do vmhub iteration any more. Drop num_vmhubs and introduce vmhubs_mask instead. v2: switch to the new VMHUB layout v3: use DECLARE_BITMAP to define vmhubs_mask Signed-off-by: Le Ma <le.ma@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-06-09 09:40:17 -04:00
Le Ma	b35ce49ab9	drm/amdgpu: assign register address for vmhub object on each XCD Each XCD has its own gfxhub. v2: switch to the new VMHUB layout v3: fix mistake Signed-off-by: Le Ma <le.ma@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-06-09 09:40:12 -04:00
Hawking Zhang	f4caf58426	drm/amdgpu: introduce vmhub definition for multi-partition cases (v3) v1: Each partition has its own gfxhub or mmhub. adjust the num of MAX_VMHUBS and the GFXHUB/MMHUB layout (Le) v2: re-design the AMDGPU_GFXHUB/AMDGPU_MMHUB layout (Le) v3: apply the gfxhub/mmhub layout to new IPs (Hawking) v4: fix up gmc11 (Alex) v5: rebase (Alex) Signed-off-by: Le Ma <le.ma@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-06-09 09:40:03 -04:00
Guchun Chen	3083b1007d	drm/amdgpu: skip disabling fence driver src_irqs when device is unplugged When performing device unbind or halt, we have disabled all irqs at the very begining like amdgpu_pci_remove or amdgpu_device_halt. So amdgpu_irq_put for irqs stored in fence driver should not be called any more, otherwise, below calltrace will arrive. [ 139.114088] WARNING: CPU: 2 PID: 1550 at drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c:616 amdgpu_irq_put+0xf6/0x110 [amdgpu] [ 139.114655] Call Trace: [ 139.114655] <TASK> [ 139.114657] amdgpu_fence_driver_hw_fini+0x93/0x130 [amdgpu] [ 139.114836] amdgpu_device_fini_hw+0xb6/0x350 [amdgpu] [ 139.114955] amdgpu_driver_unload_kms+0x51/0x70 [amdgpu] [ 139.115075] amdgpu_pci_remove+0x63/0x160 [amdgpu] [ 139.115193] ? __pm_runtime_resume+0x64/0x90 [ 139.115195] pci_device_remove+0x3a/0xb0 [ 139.115197] device_remove+0x43/0x70 [ 139.115198] device_release_driver_internal+0xbd/0x140 Signed-off-by: Guchun Chen <guchun.chen@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-06-09 09:39:09 -04:00
Alex Sierra	6e87c42295	drm/amdgpu: improve wait logic at fence polling Accomplish this by reading the seq number right away instead of sleep for 5us. There are certain cases where the fence is ready almost immediately. Sleep number granularity was also reduced as the majority of the kiq tlb flush takes between 2us to 6us. Signed-off-by: Alex Sierra <alex.sierra@amd.com> Acked-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-06-09 09:39:06 -04:00
Alex Deucher	17d62410ae	drm/amdgpu/gmc11: implement get_vbios_fb_size() Implement get_vbios_fb_size() so we can properly reserve the vbios splash screen to avoid potential artifacts on the screen during the transition from the pre-OS console to the OS console. Acked-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-06-09 09:39:01 -04:00
Jesse Zhang	82ad22bbad	drm/amdgpu: Differentiate between Raven2 and Raven/Picasso according to revision id Due to the raven2 and raven/picasso maybe have the same GC_HWIP version. So differentiate them by revision id. Signed-off-by: shanshengwang <shansheng.wang@amd.com> Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Acked-by: Evan Quan <evan.quan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-06-09 09:38:54 -04:00
Yifan Zhang	6d99f3f4ea	drm/amdgpu: change gfx 11.0.4 external_id range gfx 11.0.4 range starts from 0x80. Fixes: `311d52367d` ("drm/amdgpu: add soc21 common ip block support for GC 11.0.4") Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com> Reported-by: Yogesh Mohan Marimuthu <Yogesh.Mohanmarimuthu@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Tim Huang <Tim.Huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-06-09 09:38:25 -04:00
Srinivasan Shanmugam	1d6ecab1ac	drm/amd/amdgpu: Fix warnings in amdgpu _object, _ring.c Fix below warnings reported by checkpatch: WARNING: Prefer 'unsigned int' to bare use of 'unsigned' WARNING: static const char * array should probably be static const char * const WARNING: space prohibited between function name and open parenthesis '(' WARNING: braces {} are not necessary for single statement blocks WARNING: Symbolic permissions 'S_IRUGO' are not preferred. Consider using octal permissions '0444'. Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-06-09 09:38:20 -04:00
Guilherme G. Piccoli	ee30b8001c	drm/amdgpu/gfx11: Adjust gfxoff before powergating on gfx11 as well (Bas: speculative change to mirror gfx10/gfx9) Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com> Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-06-09 09:38:15 -04:00
Bas Nieuwenhuizen	a39b52c838	drm/amdgpu/gfx10: Disable gfxoff before disabling powergating. Otherwise we get a full system lock (looks like a FW mess). Copied the order from the GFX9 powergating code. Fixes: `366468ff6c` ("drm/amdgpu: Allow GfxOff on Vangogh as default") Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2545 Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-06-09 09:37:58 -04:00
Dan Carpenter	3fb9dd5fef	drm/amdgpu: release correct lock in amdgpu_gfx_enable_kgq() This function was releasing the incorrect lock on the error path. Reported-by: kernel test robot <lkp@intel.com> Fixes: `1156e1a60f` ("drm/amdgpu: add [en/dis]able_kgq() functions") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-06-09 09:36:53 -04:00
Saleemkhan Jamadar	c488a9370d	drm/amdgpu/jpeg: Remove harvest checking for JPEG3 Register CC_UVD_HARVESTING is obsolete for JPEG 3.1.2 Signed-off-by: Saleemkhan Jamadar <saleemkhan.jamadar@amd.com> Reviewed-by: Veerabadhran Gopalakrishnan <Veerabadhran.Gopalakrishnan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-06-09 09:36:45 -04:00
Guchun Chen	d97b02bb9c	drm/amdgpu/gfx: disable gfx9 cp_ecc_error_irq only when enabling legacy gfx ras gfx9 cp_ecc_error_irq is only enabled when legacy gfx ras is assert. So in gfx_v9_0_hw_fini, interrupt disablement for cp_ecc_error_irq should be executed under such condition, otherwise, an amdgpu_irq_put calltrace will occur. [ 7283.170322] RIP: 0010:amdgpu_irq_put+0x45/0x70 [amdgpu] [ 7283.170964] RSP: 0018:ffff9a5fc3967d00 EFLAGS: 00010246 [ 7283.170967] RAX: ffff98d88afd3040 RBX: ffff98d89da20000 RCX: 0000000000000000 [ 7283.170969] RDX: 0000000000000000 RSI: ffff98d89da2bef8 RDI: ffff98d89da20000 [ 7283.170971] RBP: ffff98d89da20000 R08: ffff98d89da2ca18 R09: 0000000000000006 [ 7283.170973] R10: ffffd5764243c008 R11: 0000000000000000 R12: 0000000000001050 [ 7283.170975] R13: ffff98d89da38978 R14: ffffffff999ae15a R15: ffff98d880130105 [ 7283.170978] FS: 0000000000000000(0000) GS:ffff98d996f00000(0000) knlGS:0000000000000000 [ 7283.170981] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 7283.170983] CR2: 00000000f7a9d178 CR3: 00000001c42ea000 CR4: 00000000003506e0 [ 7283.170986] Call Trace: [ 7283.170988] <TASK> [ 7283.170989] gfx_v9_0_hw_fini+0x1c/0x6d0 [amdgpu] [ 7283.171655] amdgpu_device_ip_suspend_phase2+0x101/0x1a0 [amdgpu] [ 7283.172245] amdgpu_device_suspend+0x103/0x180 [amdgpu] [ 7283.172823] amdgpu_pmops_freeze+0x21/0x60 [amdgpu] [ 7283.173412] pci_pm_freeze+0x54/0xc0 [ 7283.173419] ? __pfx_pci_pm_freeze+0x10/0x10 [ 7283.173425] dpm_run_callback+0x98/0x200 [ 7283.173430] __device_suspend+0x164/0x5f0 v2: drop gfx11 as it's fixed in a different solution by retiring cp_ecc_irq funcs(Hawking) Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2522 Signed-off-by: Guchun Chen <guchun.chen@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-06-09 09:36:38 -04:00
Guchun Chen	eb4f01784e	drm/amdgpu: disable sdma ecc irq only when sdma RAS is enabled in suspend sdma_v4_0_ip is shared on a few asics, but in sdma_v4_0_hw_fini, driver unconditionally disables ecc_irq which is only enabled on those asics enabling sdma ecc. This will introduce a warning in suspend cycle on those chips with sdma ip v4.0, while without sdma ecc. So this patch correct this. [ 7283.166354] RIP: 0010:amdgpu_irq_put+0x45/0x70 [amdgpu] [ 7283.167001] RSP: 0018:ffff9a5fc3967d08 EFLAGS: 00010246 [ 7283.167019] RAX: ffff98d88afd3770 RBX: 0000000000000001 RCX: 0000000000000000 [ 7283.167023] RDX: 0000000000000000 RSI: ffff98d89da30390 RDI: ffff98d89da20000 [ 7283.167025] RBP: ffff98d89da20000 R08: 0000000000036838 R09: 0000000000000006 [ 7283.167028] R10: ffffd5764243c008 R11: 0000000000000000 R12: ffff98d89da30390 [ 7283.167030] R13: ffff98d89da38978 R14: ffffffff999ae15a R15: ffff98d880130105 [ 7283.167032] FS: 0000000000000000(0000) GS:ffff98d996f00000(0000) knlGS:0000000000000000 [ 7283.167036] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 7283.167039] CR2: 00000000f7a9d178 CR3: 00000001c42ea000 CR4: 00000000003506e0 [ 7283.167041] Call Trace: [ 7283.167046] <TASK> [ 7283.167048] sdma_v4_0_hw_fini+0x38/0xa0 [amdgpu] [ 7283.167704] amdgpu_device_ip_suspend_phase2+0x101/0x1a0 [amdgpu] [ 7283.168296] amdgpu_device_suspend+0x103/0x180 [amdgpu] [ 7283.168875] amdgpu_pmops_freeze+0x21/0x60 [amdgpu] [ 7283.169464] pci_pm_freeze+0x54/0xc0 Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2522 Signed-off-by: Guchun Chen <guchun.chen@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-06-09 09:36:30 -04:00
Lin.Cao	4994d1f0a7	drm/amdgpu: Fix vram recover doesn't work after whole GPU reset (v2) v1: Vmbo->shadow is used to back vram bo up when vram lost. So that we should set shadow as vmbo->shadow to recover vmbo->bo v2: Modify if(vmbo->shadow) shadow = vmbo->shadow as if(!vmbo->shadow) continue; Fixes: `e18aaea733` ("drm/amdgpu: move shadow_list to amdgpu_bo_vm") Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Lin.Cao <lincao12@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-06-09 09:35:30 -04:00
Horatio Zhang	2f48965bdc	drm/amdgpu: drop gfx_v11_0_cp_ecc_error_irq_funcs The gfx.cp_ecc_error_irq is retired in gfx11. In gfx_v11_0_hw_fini still use amdgpu_irq_put to disable this interrupt, which caused the call trace in this function. [ 102.873958] Call Trace: [ 102.873959] <TASK> [ 102.873961] gfx_v11_0_hw_fini+0x23/0x1e0 [amdgpu] [ 102.874019] gfx_v11_0_suspend+0xe/0x20 [amdgpu] [ 102.874072] amdgpu_device_ip_suspend_phase2+0x240/0x460 [amdgpu] [ 102.874122] amdgpu_device_ip_suspend+0x3d/0x80 [amdgpu] [ 102.874172] amdgpu_device_pre_asic_reset+0xd9/0x490 [amdgpu] [ 102.874223] amdgpu_device_gpu_recover.cold+0x548/0xce6 [amdgpu] [ 102.874321] amdgpu_debugfs_reset_work+0x4c/0x70 [amdgpu] [ 102.874375] process_one_work+0x21f/0x3f0 [ 102.874377] worker_thread+0x200/0x3e0 [ 102.874378] ? process_one_work+0x3f0/0x3f0 [ 102.874379] kthread+0xfd/0x130 [ 102.874380] ? kthread_complete_and_exit+0x20/0x20 [ 102.874381] ret_from_fork+0x22/0x30 v2: - Handle umc and gfx ras cases in separated patch - Retired the gfx_v11_0_cp_ecc_error_irq_funcs in gfx11 v3: - Improve the subject and code comments - Add judgment on gfx11 in the function of amdgpu_gfx_ras_late_init v4: - Drop the define of CP_ME1_PIPE_INST_ADDR_INTERVAL and SET_ECC_ME_PIPE_STATE which using in gfx_v11_0_set_cp_ecc_error_state - Check cp_ecc_error_irq.funcs rather than ip version for a more sustainable life v5: - Simplify judgment conditions Signed-off-by: Horatio Zhang <Hongkun.Zhang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Guchun Chen <guchun.chen@amd.com> Reviewed-by: Feifei Xu <Feifei.Xu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-06-09 09:35:20 -04:00
YuBiao Wang	db5dcd476e	drm/amdgpu: set default num_kcq to 2 under sriov The number of kernel queues has impact on the latency under sriov usecase. So to reduce the latency we set the default num_kcq = 2 under sriov if not set manually. Signed-off-by: YuBiao Wang <YuBiao.Wang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-06-09 09:34:35 -04:00
Lang Yu	187916e6ed	drm/amdgpu: install stub fence into potential unused fence pointers When using cpu to update page tables, vm update fences are unused. Install stub fence into these fence pointers instead of NULL to avoid NULL dereference when calling dma_fence_wait() on them. Suggested-by: Christian König <christian.koenig@amd.com> Signed-off-by: Lang Yu <Lang.Yu@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-06-09 09:34:15 -04:00
Jiapeng Chong	9e72813f69	drm/amdgpu: Remove the unused variable golden_settings_gc_9_4_3 Variable golden_settings_gc_9_4_3 is not effectively used, so delete it. drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c:48:38: warning: ‘golden_settings_gc_9_4_3’ defined but not used. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=4877 Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2023-06-09 09:34:03 -04:00

... 13 14 15 16 17 ...

12937 Commits