Commit Graph

13528 Commits

Author SHA1 Message Date
Philip Yang
0c2dece8fb drm/amdkfd: Page aligned memory reserve size
Use page aligned size to reserve memory usage because page aligned TTM
BO size is used to unreserve memory usage, otherwise no page aligned
size causes memory usage accounting unbalanced.

Change vram_used definition type to int64_t to be able to trigger
WARN_ONCE(adev && adev->kfd.vram_used < 0, "..."), to help debug the
accounting issue with warning and backtrace.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-11 16:41:03 -05:00
Philip Yang
23b02b0e46 drm/amdkfd: Cleanup vm process info if init vm failed
If acquire_vm failed when initializing KFD vm, set vm->process_info to
NULL and free process info, otherwise, the future acquire_vm will
always fail as vm->process_info is not NULL.

Pass avm as parameter to remove the duplicate code.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-11 16:40:45 -05:00
Eric Huang
a6941f89d7 drm/amdkfd: Fix NULL pointer error for GC 11.0.1 on mGPU
The point bo->kfd_bo is NULL for queue's write pointer BO
when creating queue on mGPU. To avoid using the pointer
fixes the error.

Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-10 18:11:08 -05:00
Eric Huang
7f347e3f82 drm/amdkfd: Fix NULL pointer error for GC 11.0.1 on mGPU
The point bo->kfd_bo is NULL for queue's write pointer BO
when creating queue on mGPU. To avoid using the pointer
fixes the error.

Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-10 18:05:03 -05:00
YiPeng Chai
40794dfd20 drm/amdgpu: Fixed bug on error when unloading amdgpu
Fixed bug on error when unloading amdgpu.

The error message is as follows:
[  377.706202] kernel BUG at drivers/gpu/drm/drm_buddy.c:278!
[  377.706215] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
[  377.706222] CPU: 4 PID: 8610 Comm: modprobe Tainted: G          IOE      6.0.0-thomas #1
[  377.706231] Hardware name: ASUS System Product Name/PRIME Z390-A, BIOS 2004 11/02/2021
[  377.706238] RIP: 0010:drm_buddy_free_block+0x26/0x30 [drm_buddy]
[  377.706264] Code: 00 00 00 90 0f 1f 44 00 00 48 8b 0e 89 c8 25 00 0c 00 00 3d 00 04 00 00 75 10 48 8b 47 18 48 d3 e0 48 01 47 28 e9 fa fe ff ff <0f> 0b 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 54 55 48 89 f5 53
[  377.706282] RSP: 0018:ffffad2dc4683cb8 EFLAGS: 00010287
[  377.706289] RAX: 0000000000000000 RBX: ffff8b1743bd5138 RCX: 0000000000000000
[  377.706297] RDX: ffff8b1743bd5160 RSI: ffff8b1743bd5c78 RDI: ffff8b16d1b25f70
[  377.706304] RBP: ffff8b1743bd59e0 R08: 0000000000000001 R09: 0000000000000001
[  377.706311] R10: ffff8b16c8572400 R11: ffffad2dc4683cf0 R12: ffff8b16d1b25f70
[  377.706318] R13: ffff8b16d1b25fd0 R14: ffff8b1743bd59c0 R15: ffff8b16d1b25f70
[  377.706325] FS:  00007fec56c72c40(0000) GS:ffff8b1836500000(0000) knlGS:0000000000000000
[  377.706334] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  377.706340] CR2: 00007f9b88c1ba50 CR3: 0000000110450004 CR4: 00000000003706e0
[  377.706347] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  377.706354] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  377.706361] Call Trace:
[  377.706365]  <TASK>
[  377.706369]  drm_buddy_free_list+0x2a/0x60 [drm_buddy]
[  377.706376]  amdgpu_vram_mgr_fini+0xea/0x180 [amdgpu]
[  377.706572]  amdgpu_ttm_fini+0x12e/0x1a0 [amdgpu]
[  377.706650]  amdgpu_bo_fini+0x22/0x90 [amdgpu]
[  377.706727]  gmc_v11_0_sw_fini+0x26/0x30 [amdgpu]
[  377.706821]  amdgpu_device_fini_sw+0xa1/0x3c0 [amdgpu]
[  377.706897]  amdgpu_driver_release_kms+0x12/0x30 [amdgpu]
[  377.706975]  drm_dev_release+0x20/0x40 [drm]
[  377.707006]  release_nodes+0x35/0xb0
[  377.707014]  devres_release_all+0x8b/0xc0
[  377.707020]  device_unbind_cleanup+0xe/0x70
[  377.707027]  device_release_driver_internal+0xee/0x160
[  377.707033]  driver_detach+0x44/0x90
[  377.707039]  bus_remove_driver+0x55/0xe0
[  377.707045]  pci_unregister_driver+0x3b/0x90
[  377.707052]  amdgpu_exit+0x11/0x6c [amdgpu]
[  377.707194]  __x64_sys_delete_module+0x142/0x2b0
[  377.707201]  ? fpregs_assert_state_consistent+0x22/0x50
[  377.707208]  ? exit_to_user_mode_prepare+0x3e/0x190
[  377.707215]  do_syscall_64+0x38/0x90
[  377.707221]  entry_SYSCALL_64_after_hwframe+0x63/0xcd

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-10 14:32:58 -05:00
Mario Limonciello
030001288f drm/amd: make amdgpu_ucode_validate static
No consumers outside of amdgpu_ucode.c use amdgpu_ucode_validate
anymore, so make the function static.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-10 14:32:58 -05:00
Mario Limonciello
b31d306378 drm/amd: Use amdgpu_ucode_* helpers for GPU info bin
The `amdgpu_ucode_request` helper will ensure that the return code for
missing firmware is -ENODEV so that early_init can fail.

The `amdgpu_ucode_release` helper is for symmetry on unloading.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-10 14:32:58 -05:00
Mario Limonciello
e5a7d047f4 drm/amd: Use amdgpu_ucode_* helpers for CGS
The `amdgpu_ucode_request` helper will ensure that the return code for
missing firmware is -ENODEV so that early_init can fail.

The `amdgpu_ucode_release` helper is for symmetry on unloading.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-10 14:32:58 -05:00
Mario Limonciello
52215e2a5d drm/amd: Use amdgpu_ucode_* helpers for VCE
The `amdgpu_ucode_request` helper will ensure that the return code for
missing firmware is -ENODEV so that early_init can fail.

The `amdgpu_ucode_release` helper is for symmetry on unloading.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-10 14:32:58 -05:00
Mario Limonciello
b406477c61 drm/amd: Use amdgpu_ucode_* helpers for UVD
The `amdgpu_ucode_request` helper will ensure that the return code for
missing firmware is -ENODEV so that early_init can fail.

The `amdgpu_ucode_release` helper is for symmetry on unloading.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-10 14:32:58 -05:00
Mario Limonciello
cb9bdfad22 drm/amd: Use amdgpu_ucode_* helpers for SDMA on CIK
The `amdgpu_ucode_request` helper will ensure that the return code for
missing firmware is -ENODEV so that early_init can fail.

The `amdgpu_ucode_release` helper is for symmetry on unloading.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-10 14:32:58 -05:00
Mario Limonciello
d7f50961aa drm/amd: Use amdgpu_ucode_* helpers for SDMA3.0
The `amdgpu_ucode_request` helper will ensure that the return code for
missing firmware is -ENODEV so that early_init can fail.

The `amdgpu_ucode_release` helper is for symmetry on unloading.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-10 14:32:58 -05:00
Mario Limonciello
10024cd73d drm/amd: Use amdgpu_ucode_* helpers for SDMA2.4
The `amdgpu_ucode_request` helper will ensure that the return code for
missing firmware is -ENODEV so that early_init can fail.

The `amdgpu_ucode_release` helper is for symmetry on unloading.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-10 14:32:57 -05:00
Mario Limonciello
4b1c8b6429 drm/amd: Use amdgpu_ucode_* helpers for GMC8
The `amdgpu_ucode_request` helper will ensure that the return code for
missing firmware is -ENODEV so that early_init can fail.

The `amdgpu_ucode_release` helper is for symmetry on unloading.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-10 14:32:57 -05:00
Mario Limonciello
ee138d86ef drm/amd: Use amdgpu_ucode_* helpers for GMC7
The `amdgpu_ucode_request` helper will ensure that the return code for
missing firmware is -ENODEV so that early_init can fail.

The `amdgpu_ucode_release` helper is for symmetry on unloading.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-10 14:32:57 -05:00
Mario Limonciello
2d70575b38 drm/amd: Use amdgpu_ucode_* helpers for GMC6
The `amdgpu_ucode_request` helper will ensure that the return code for
missing firmware is -ENODEV so that early_init can fail.

The `amdgpu_ucode_release` helper is for symmetry on unloading.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-10 14:32:57 -05:00
Mario Limonciello
0aaafb7359 drm/amd: Use amdgpu_ucode_* helpers for GFX8
The `amdgpu_ucode_request` helper will ensure that the return code for
missing firmware is -ENODEV so that early_init can fail.

The `amdgpu_ucode_release` helper is for symmetry on unloading.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-10 14:32:57 -05:00
Mario Limonciello
469f199e47 drm/amd: Use amdgpu_ucode_* helpers for GFX7
The `amdgpu_ucode_request` helper will ensure that the return code for
missing firmware is -ENODEV so that early_init can fail.

The `amdgpu_ucode_release` helper is for symmetry on unloading.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-10 14:32:57 -05:00
Mario Limonciello
39d3649b16 drm/amd: Use amdgpu_ucode_* helpers for GFX6
The `amdgpu_ucode_request` helper will ensure that the return code for
missing firmware is -ENODEV so that early_init can fail.

The `amdgpu_ucode_release` helper is for symmetry on unloading.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-10 14:32:57 -05:00
Mario Limonciello
62a27480b7 drm/amd: Optimize SRIOV switch/case for PSP microcode load
Now that IP version decoding is used, a number of case statements
can be combined.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-10 14:32:57 -05:00
Mario Limonciello
07dbfc6b10 drm/amd: Use amdgpu_ucode_* helpers for PSP
The `amdgpu_ucode_request` helper will ensure that the return code for
missing firmware is -ENODEV so that early_init can fail.

The `amdgpu_ucode_release` helper is for symmetry on unloading.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-10 14:32:57 -05:00
Mario Limonciello
2d39c7ae37 drm/amd: Load PSP microcode during early_init
Simplifies the code so that all PSP versions will get the firmware
name from `amdgpu_ucode_ip_version_decode` and then use this filename
to load microcode as part of the early_init process.

Any failures will cause the driver to fail to probe before the firmware
framebuffer has been removed.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-10 14:32:56 -05:00
Mario Limonciello
93fec4f8c1 drm/amd: Avoid BUG() for case of SRIOV missing IP version
No need to crash the kernel.  AMDGPU will now fail to probe.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-10 14:32:56 -05:00
Mario Limonciello
994a97447e drm/amd: Parse both v1 and v2 TA microcode headers using same function
Several IP versions duplicate code and can't use the common helpers.
Move this code into a single function so that the helpers can be used.

v2: squash in fix from Mario to remove duplicate ta parsing

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-10 14:32:56 -05:00
Christian König
3bd68b32c9 drm/amdgpu: fix pipeline sync v2
This fixes a potential memory leak of dma_fence objects in the CS code
as well as glitches in firefox because of missing pipeline sync.

v2: use the scheduler instead of the fence context

Signed-off-by: Christian König <christian.koenig@amd.com>
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2323
Tested-by: Michal Kubecek mkubecek@suse.cz
Tested-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230109130120.73389-1-christian.koenig@amd.com
2023-01-10 11:45:37 +01:00
Uwe Kleine-König
000458b596 drm: Only select I2C_ALGOBIT for drivers that actually need it
While working on a drm driver that doesn't need the i2c algobit stuff I
noticed that DRM selects this code even though only 8 drivers actually use
it. While also only some drivers use i2c, keep the select for I2C for the
next cleanup patch. Still prepare this already by also selecting I2C for
the individual drivers.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>
Signed-off-by: Javier Martinez Canillas <javierm@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221219083627.1401627-1-u.kleine-koenig@pengutronix.de
2023-01-10 11:15:44 +01:00
Yang Li
b8743f5dcc drm/amdgpu: clean up some inconsistent indentings
drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c:65 amdgpu_gem_fault() warn: inconsistent indenting

Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=3639
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-09 17:02:18 -05:00
Mario Limonciello
e045aec89d drm/amd: Load GFX11 microcode during early_init
If GFX11 microcode is required but not available during early init, the
firmware framebuffer will have already been released and the screen will
freeze.

Move the request for GFX11 microcode into the early_init phase
so that if it's not available, driver init will fail.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-09 17:02:18 -05:00
Mario Limonciello
c88135c089 drm/amd: Use amdgpu_ucode_* helpers for GFX11
The `amdgpu_ucode_request` helper will ensure that the return code for
missing firmware is -ENODEV so that early_init can fail.

The `amdgpu_ucode_release` helper will provide symmetery on unload.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-09 17:02:18 -05:00
Mario Limonciello
9931b67690 drm/amd: Load GFX10 microcode during early_init
Simplifies the code so that GFX10 will get the firmware
name from `amdgpu_ucode_ip_version_decode` and then use this filename
to load microcode as part of the early_init process.

Any failures will cause the driver to fail to probe before the firmware
framebuffer has been removed.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-09 17:02:18 -05:00
Mario Limonciello
3da9b71563 drm/amd: Use amdgpu_ucode_* helpers for GFX10
The `amdgpu_ucode_request` helper will ensure that the return code for
missing firmware is -ENODEV so that early_init can fail.

The `amdgpu_ucode_release` helper is for symmetry on unload.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-09 17:02:18 -05:00
Mario Limonciello
1c21885ec1 drm/amd: Load GFX9 microcode during early_init
If GFX9 microcode is required but not available during early init, the
firmware framebuffer will have already been released and the screen will
freeze.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-09 17:02:18 -05:00
Mario Limonciello
ec787deb2d drm/amd: Use amdgpu_ucode_* helpers for GFX9
The `amdgpu_ucode_request` helper will ensure that the return code for
missing firmware is -ENODEV so that early_init can fail.

The `amdgpu_ucode_release` helper will provide symmetry on unload.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-09 17:02:18 -05:00
Mario Limonciello
e78105c8c4 drm/amd: Remove superfluous assignment for adev->mes.adev
`amdgpu_mes_init` already sets `adev->mes.adev`, so there is no need
to also set it in the IP specific versions.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-09 17:02:18 -05:00
Mario Limonciello
11e0b0067e drm/amd: Use amdgpu_ucode_* helpers for MES
The `amdgpu_ucode_request` helper will ensure that the return code for
missing firmware is -ENODEV so that early_init can fail.

The `amdgpu_ucode_release` helper provides symmetry for releasing firmware.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-09 17:02:18 -05:00
Mario Limonciello
cc42e76e7d drm/amd: Load MES microcode during early_init
Add an early_init phase to MES for fetching and validating microcode
from the filesystem.

If MES microcode is required but not available during early init, the
firmware framebuffer will have already been released and the screen will
freeze.

Move the request for MES microcode into the early_init phase
so that if it's not available, early_init will fail.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-09 17:02:18 -05:00
Mario Limonciello
69939009bd drm/amd: Load VCN microcode during early_init
Simplifies the code so that all VCN versions will get the firmware
name from `amdgpu_ucode_ip_version_decode` and then use this filename
to load microcode as part of the early_init process.

Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-09 17:02:18 -05:00
Mario Limonciello
33efaf829d drm/amd: Use amdgpu_ucode_* helpers for VCN
The `amdgpu_ucode_request` helper will ensure that the return code for
missing firmware is -ENODEV so that early_init can fail.

The `amdgpu_ucode_release` helper is for symmetry.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-09 17:02:17 -05:00
Mario Limonciello
6675402a47 drm/amd: Make SDMA firmware load failures less noisy.
When firmware is missing we get failures at every step.
```
[    3.855086] amdgpu 0000:04:00.0: Direct firmware load for amdgpu/green_sardine_sdma.bin failed with error -2
[    3.855087] [drm:amdgpu_sdma_init_microcode [amdgpu]] *ERROR* SDMA: Failed to init firmware "amdgpu/green_sardine_sdma.bin"
[    3.855398] [drm:sdma_v4_0_early_init [amdgpu]] *ERROR* Failed to load sdma firmware!
```
Realistically we don't need all of these, a user can tell from the first one
that request_firmware emitted what happened. Drop the others.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-09 17:02:17 -05:00
Mario Limonciello
1336b4e72c drm/amd: Convert SDMA to use amdgpu_ucode_ip_version_decode
Simplifies the code so that all SDMA versions will get the firmware
name from `amdgpu_ucode_ip_version_decode`.

v2: squash in fix from Srinivasan

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-09 17:02:17 -05:00
Mario Limonciello
e43229824d drm/amd: Use amdgpu_ucode_request helper for SDMA
The `amdgpu_ucode_request` helper will ensure that the return code for
missing firmware is -ENODEV so that early_init can fail.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-09 17:02:09 -05:00
YiPeng Chai
99f1a36c90 drm/amdgpu: Fixed bug on error when unloading amdgpu
Fixed bug on error when unloading amdgpu.

The error message is as follows:
[  377.706202] kernel BUG at drivers/gpu/drm/drm_buddy.c:278!
[  377.706215] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
[  377.706222] CPU: 4 PID: 8610 Comm: modprobe Tainted: G          IOE      6.0.0-thomas #1
[  377.706231] Hardware name: ASUS System Product Name/PRIME Z390-A, BIOS 2004 11/02/2021
[  377.706238] RIP: 0010:drm_buddy_free_block+0x26/0x30 [drm_buddy]
[  377.706264] Code: 00 00 00 90 0f 1f 44 00 00 48 8b 0e 89 c8 25 00 0c 00 00 3d 00 04 00 00 75 10 48 8b 47 18 48 d3 e0 48 01 47 28 e9 fa fe ff ff <0f> 0b 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 54 55 48 89 f5 53
[  377.706282] RSP: 0018:ffffad2dc4683cb8 EFLAGS: 00010287
[  377.706289] RAX: 0000000000000000 RBX: ffff8b1743bd5138 RCX: 0000000000000000
[  377.706297] RDX: ffff8b1743bd5160 RSI: ffff8b1743bd5c78 RDI: ffff8b16d1b25f70
[  377.706304] RBP: ffff8b1743bd59e0 R08: 0000000000000001 R09: 0000000000000001
[  377.706311] R10: ffff8b16c8572400 R11: ffffad2dc4683cf0 R12: ffff8b16d1b25f70
[  377.706318] R13: ffff8b16d1b25fd0 R14: ffff8b1743bd59c0 R15: ffff8b16d1b25f70
[  377.706325] FS:  00007fec56c72c40(0000) GS:ffff8b1836500000(0000) knlGS:0000000000000000
[  377.706334] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  377.706340] CR2: 00007f9b88c1ba50 CR3: 0000000110450004 CR4: 00000000003706e0
[  377.706347] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  377.706354] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  377.706361] Call Trace:
[  377.706365]  <TASK>
[  377.706369]  drm_buddy_free_list+0x2a/0x60 [drm_buddy]
[  377.706376]  amdgpu_vram_mgr_fini+0xea/0x180 [amdgpu]
[  377.706572]  amdgpu_ttm_fini+0x12e/0x1a0 [amdgpu]
[  377.706650]  amdgpu_bo_fini+0x22/0x90 [amdgpu]
[  377.706727]  gmc_v11_0_sw_fini+0x26/0x30 [amdgpu]
[  377.706821]  amdgpu_device_fini_sw+0xa1/0x3c0 [amdgpu]
[  377.706897]  amdgpu_driver_release_kms+0x12/0x30 [amdgpu]
[  377.706975]  drm_dev_release+0x20/0x40 [drm]
[  377.707006]  release_nodes+0x35/0xb0
[  377.707014]  devres_release_all+0x8b/0xc0
[  377.707020]  device_unbind_cleanup+0xe/0x70
[  377.707027]  device_release_driver_internal+0xee/0x160
[  377.707033]  driver_detach+0x44/0x90
[  377.707039]  bus_remove_driver+0x55/0xe0
[  377.707045]  pci_unregister_driver+0x3b/0x90
[  377.707052]  amdgpu_exit+0x11/0x6c [amdgpu]
[  377.707194]  __x64_sys_delete_module+0x142/0x2b0
[  377.707201]  ? fpregs_assert_state_consistent+0x22/0x50
[  377.707208]  ? exit_to_user_mode_prepare+0x3e/0x190
[  377.707215]  do_syscall_64+0x38/0x90
[  377.707221]  entry_SYSCALL_64_after_hwframe+0x63/0xcd

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2023-01-09 16:51:24 -05:00
Mario Limonciello
1923bc5a56 drm/amd: Delay removal of the firmware framebuffer
Removing the firmware framebuffer from the driver means that even
if the driver doesn't support the IP blocks in a GPU it will no
longer be functional after the driver fails to initialize.

This change will ensure that unsupported IP blocks at least cause
the driver to work with the EFI framebuffer.

Cc: stable@vger.kernel.org
Suggested-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-09 16:50:00 -05:00
Luben Tuikov
0be7ed8e7e drm/amdgpu: Fix potential NULL dereference
Fix potential NULL dereference, in the case when "man", the resource manager
might be NULL, when/if we print debug information.

Cc: Alex Deucher <Alexander.Deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: AMD Graphics <amd-gfx@lists.freedesktop.org>
Cc: Dan Carpenter <error27@gmail.com>
Cc: kernel test robot <lkp@intel.com>
Fixes: 7554886daa ("drm/amdgpu: Fix size validation for non-exclusive domains (v4)")
Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-09 16:49:43 -05:00
Mario Limonciello
2210af50ae drm/amd: Add a new helper for loading/validating microcode
All microcode runs a basic validation after it's been loaded. Each
IP block as part of init will run both.

Introduce a wrapper for request_firmware and amdgpu_ucode_validate.
This wrapper will also remap any error codes from request_firmware
to -ENODEV.  This is so that early_init will fail if firmware couldn't
be loaded instead of the IP block being disabled.

Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-09 16:27:58 -05:00
Mario Limonciello
54a3e03234 drm/amd: Add a legacy mapping to "amdgpu_ucode_ip_version_decode"
This will allow other parts of the driver that currently special
case firmware file names to before IP version style naming to just
have a single call to `amdgpu_ucode_ip_version_decode`.

Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-09 16:27:24 -05:00
Mario Limonciello
b7cdb41e7d drm/amd: Delay removal of the firmware framebuffer
Removing the firmware framebuffer from the driver means that even
if the driver doesn't support the IP blocks in a GPU it will no
longer be functional after the driver fails to initialize.

This change will ensure that unsupported IP blocks at least cause
the driver to work with the EFI framebuffer.

Cc: stable@vger.kernel.org
Suggested-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-09 16:26:48 -05:00
Luben Tuikov
08e60fac1d drm/amdgpu: Fix potential NULL dereference
Fix potential NULL dereference, in the case when "man", the resource manager
might be NULL, when/if we print debug information.

Cc: Alex Deucher <Alexander.Deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: AMD Graphics <amd-gfx@lists.freedesktop.org>
Cc: Dan Carpenter <error27@gmail.com>
Cc: kernel test robot <lkp@intel.com>
Fixes: 7554886daa ("drm/amdgpu: Fix size validation for non-exclusive domains (v4)")
Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-09 16:24:26 -05:00
Christian König
41cc108b24 drm/amdgpu: fix missing dma_fence_put in error path
When the fence can't be added we need to drop the reference.

Suggested-by: Bert Karwatzki <spasswolf@web.de>
Signed-off-by: Christian König <christian.koenig@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230105111703.52695-2-christian.koenig@amd.com
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
2023-01-05 20:34:58 +01:00
Christian König
ed21f6c3fe drm/amdgpu: fix another missing fence reference in the CS code
drm_sched_job_add_dependency() consumes the references of the gang
members. Only triggered by mesh shaders.

Signed-off-by: Christian König <christian.koenig@amd.com>
Fixes: 1728baa7e4 ("drm/amdgpu: use scheduler dependencies for CS")
Tested-by: Mike Lothian <mike@fireburn.co.uk>
Tested-by: Bert Karwatzki <spasswolf@web.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20230105111703.52695-1-christian.koenig@amd.com
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
2023-01-05 20:34:58 +01:00