Commit Graph

11783 Commits

Author SHA1 Message Date
Alex Deucher
f732e2b3c6 drm/amdgpu/vcn4: add missing encoder cap
VCN4.x supports AV1 encode.

Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-24 13:26:26 -05:00
David (Ming Qiang) Wu
f823323b4a drm/amdgpu: limit AV1 to the first instance on VCN4 encode
AV1 is only supported on the first instance.
Added vcn_v4_0_enc_find_ib_param() to help search for an IB param.

Signed-off-by: David (Ming Qiang) Wu <David.Wu3@amd.com>
Reviewed-by: Ruijing Dong <ruijing.dong@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-24 13:26:24 -05:00
Wayne Lin
f0127cb112 drm/amdgpu/display/mst: adjust the naming of mst_port and port of aconnector
[why & how]
The term (i.e. port & mst_port) that we used to use in amdgpu is a bit
confusing. Rename them to mst_output_port & mst_root respectively.

Signed-off-by: Wayne Lin <Wayne.Lin@amd.com>
Signed-off-by: Harry Wentland <harry.wentland@amd.com>
Acked-by: Harry Wentland <harry.wentland@amd.com>
Reviewed-by: Lyude Paul <lyude@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-24 13:26:12 -05:00
Tim Huang
e11c775030 drm/amdgpu: skip psp suspend for IMU enabled ASICs mode2 reset
The psp suspend & resume should be skipped to avoid destroy
the TMR and reload FWs again for IMU enabled APU ASICs.

Signed-off-by: Tim Huang <tim.huang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-24 12:27:37 -05:00
Li Ma
a462ef872f drm/amdgpu: declare firmware for new MES 11.0.4
To support new mes ip block

Signed-off-by: Li Ma <li.ma@amd.com>
Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-24 12:24:34 -05:00
Li Ma
96a5dec18e drm/amdgpu: enable imu firmware for GC 11.0.4
The GC 11.0.4 needs load IMU to power up GFX before loads GFX firmware.

Signed-off-by: Li Ma <li.ma@amd.com>
Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-24 12:24:24 -05:00
Jonathan Kim
601ff52237 drm/amdgpu: remove unconditional trap enable on add gfx11 queues
Rebase of driver has incorrect unconditional trap enablement
for GFX11 when adding mes queues.

Reported-by: Graham Sider <graham.sider@amd.com>
Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Graham Sider <graham.sider@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-24 12:23:45 -05:00
Marek Olšák
e3e84b0a03 drm/amdgpu: return the PCIe gen and lanes from the INFO ioctl
For computing PCIe bandwidth in userspace and troubleshooting PCIe
bandwidth issues. Note that this intentionally fills holes and padding
in drm_amdgpu_info_device.

Mesa MR: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20790

Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-19 17:24:26 -05:00
Pierre-Eric Pelloux-Prayer
26fd808b01 drm/amdgpu: print bo inode number instead of ptr
This allows to correlate the infos printed by
/sys/kernel/debug/dri/n/amdgpu_gem_info to the ones found
in /proc/.../fdinfo and /sys/kernel/debug/dma_buf/bufinfo.

Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-19 17:24:26 -05:00
Tao Zhou
071f526a13 drm/amdgpu: retire unused get_umc_v6_7_channel_index
Fix the following compile warning:

drivers/gpu/drm/amd/amdgpu/umc_v6_7.c:53:24: warning: unused function 'get_umc_v6_7_channel_index' [-Wunused-function]
static inline uint32_t get_umc_v6_7_channel_index(struct amdgpu_device *adev,
                          ^
1 warning generated.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-19 17:24:26 -05:00
YiPeng Chai
2cfb737b4b drm/amdgpu: Optimize sdma ras block initialization code for sdma v4_0
Optimize sdma ras block initialization code for sdma v4_0.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-19 17:24:26 -05:00
YiPeng Chai
a57b24e170 drm/amdgpu: Add sdma ras function on sdma v6_0_3
Add sdma ras function on sdma v6_0_3.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-19 17:24:26 -05:00
Mario Limonciello
96b810d8c6 drm/amd: decrease message about missing PSP runtime database to debug
Laptops with APUs from a variety of manufacturers and generations
show a warning about a missing PSP runtime database.

As it's not required for PSP to dump this database into framebuffer,
decrease messages about it missing to debug.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-19 17:24:25 -05:00
Alex Deucher
6482ba5d4b drm/amdgpu/vcn4: fail to schedule IB for AV1 if VCN0 is harvested
Only VCN0 supports AV1.

Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-19 17:24:25 -05:00
Alex Deucher
a6de636eb0 drm/amdgpu/soc21: don't expose AV1 if VCN0 is harvested
Only VCN0 supports AV1.

Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-19 17:24:25 -05:00
Alex Deucher
3c6f90f4aa drm/amdgpu/vcn3: fail to schedule IB for AV1 if VCN0 is harvested
Only VCN0 supports AV1.

Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-19 17:24:25 -05:00
Alex Deucher
384334120b drm/amdgpu/nv: don't expose AV1 if VCN0 is harvested
Only VCN0 supports AV1.

Reviewed-by: Leo Liu <leo.liu@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-19 17:24:25 -05:00
Lang Yu
25959dd67d drm/amdgpu: allow multipipe policy on ASICs with one MEC
Always enable multipipe policy on ASICs with GC VERSION > 9.0.0
instead of MEC number > 1.

This will allow multipipe policy on ASICs with one MEC,
e.g., gfx11 APUs.

Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Reviewed-by: Aaron Liu <aaron.liu@amd.com>
Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-18 22:48:49 -05:00
Lang Yu
99761aaa1c drm/amdgpu: correct MEC number for gfx11 APUs
There is only one MEC on these APUs.

Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Reviewed-by: Aaron Liu <aaron.liu@amd.com>
Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-18 22:48:41 -05:00
Guilherme G. Piccoli
09eb3ea391 drm/amdgpu/vcn: Remove redundant indirect SRAM HW model check
The HW model validation that guards the indirect SRAM checking in the
VCN code path is redundant - there's no model that's not included in the
switch, making it useless in practice [0].

So, let's remove this switch statement for good.

[0] lore.kernel.org/amd-gfx/MN0PR12MB61013D20B8A2263B22AE1BCFE2C19@MN0PR12MB6101.namprd12.prod.outlook.com

Suggested-by: Alex Deucher <Alexander.Deucher@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Cc: James Zhu <James.Zhu@amd.com>
Cc: Lazar Lijo <Lijo.Lazar@amd.com>
Cc: Leo Liu <leo.liu@amd.com>
Cc: Sonny Jiang <sonny.jiang@amd.com>
Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-18 22:46:27 -05:00
Guilherme G. Piccoli
2ed9e22ed7 drm/amdgpu/vcn: Adjust firmware names indentation
This is an incredibly trivial fix, just for the sake of
"aesthetical" organization of the defines. Some were space based,
most were tab based and there was a lack of "alignment", now it's
all the same and aligned.

Cc: James Zhu <James.Zhu@amd.com>
Cc: Lazar Lijo <Lijo.Lazar@amd.com>
Cc: Leo Liu <leo.liu@amd.com>
Cc: Mario Limonciello <mario.limonciello@amd.com>
Cc: Sonny Jiang <sonny.jiang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-18 22:46:13 -05:00
Leo Liu
cf22ef78f2 drm/amdgpu: Use the sched from entity for amdgpu_cs trace
The problem is that base sched hasn't been assigned yet at this moment,
causing something like "ring=0" all the time from trace.

mpv:cs0-3473    [002] ..... 129.047431: amdgpu_cs: ring=0, dw=48, fences=0
mpv:cs0-3473    [002] ..... 129.089125: amdgpu_cs: ring=0, dw=48, fences=0
mpv:cs0-3473    [002] ..... 129.130987: amdgpu_cs: ring=0, dw=48, fences=0
mpv:cs0-3473    [002] ..... 129.172478: amdgpu_cs: ring=0, dw=48, fences=0

Fixes: 4624459c84 ("drm/amdgpu: add gang submit frontend v6")
Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-17 16:11:52 -05:00
Stanley.Yang
442d61af79 drm/amdgpu: correct query xgmi3x16 pcs error status
There is xgmi3x16 pcs error status for aldebaran, driver should
check xgmi3x16 pcs error status field instead of gopx16 pcs error
status field.

Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-17 16:11:52 -05:00
Stanley.Yang
828fc79dcf drm/amdgpu: support check xgmi/walf error mask bit for aldebaran
The pcs error count should be determined by PCS ERROR status and
PCS ERROR MASK registers, only PCS ERROR status register can not
refect error counts accurately.

Changed from V1:
	remove clean noncorrectable mask registers
	optimize query pcs error status

Changed from V2:
	remove check mask_value bits
	correct set value corresponding bit

Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-17 16:11:52 -05:00
Christian König
1427a72027 drm/amdgpu: fix amdgpu_job_free_resources v2
It can be that neither fence were initialized when we run out of UVD
streams for example.

v2: fix typo breaking compile

Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2324
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-17 16:11:52 -05:00
YiPeng Chai
4da9932efe drm/amdgpu: Optimize gfx ras block initialization code for gfx v9_0
Use gfx ras common initialization interface to
initialize gfx ras block.

V2:
  Update function call due to amdgpu_gfx_ras_sw_init
interface changes.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-17 16:11:52 -05:00
Thomas Zimmermann
53a17b6b75 drm/amdgpu: Fix coding style
Align a closing brace and remove trailing whitespaces. No functional
changes.

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-17 16:11:52 -05:00
Mario Limonciello
ced6950276 drm/amd: Evaluate early init for all IP blocks even if one fails
If early init fails for a single IP block, then no further IP blocks
are evaluated.  This means that if a user was missing more than one
firmware binary they would have to keep adding binaries and re-probing
until they discovered the ones missing.

To make this easier, run early init for each IP block and report a single
failure if not all passed.

Reviewed-by: Aaron Liu <aaron.liu@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-17 16:11:51 -05:00
Mario Limonciello
bda88a26f5 drm/amd: Remove needless break for legacy IP discovery MP0 9.0.0
There is already a "default" case in the switch block, so there is
no need to have a break after the switch block.

Reviewed-by: Aaron Liu <aaron.liu@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-17 16:11:51 -05:00
Christian König
4d3d5e6c07 drm/amdgpu: fix cleaning up reserved VMID on release
We need to reset this or otherwise run into list corruption later on.

Fixes: e44a0fe630 ("drm/amdgpu: rework reserved VMID handling")
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Tested-by: Candice Li <candice.li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-17 16:11:51 -05:00
YiPeng Chai
8f453c51cf drm/amdgpu: Adjust ras support check condition for special asic
[Why]:
     Amdgpu ras uses amdgpu_ras_is_supported to check whether
  the ras block supports the ras function. amdgpu_ras_is_supported
  uses .ras_enabled to determine whether the ras function of the
  block is enabled.
     But for special asic with mem ecc enabled but sram ecc not
  enabled, some ras blocks support poison mode but their ras function
  is not enabled on .ras_enabled, these ras blocks will run abnormally.

[How]:
    If the ras block is not supported on .ras_enabled but the asic
  supports poison mode and the ras block has ras configuration, it
  can be considered that the ras block supports ras function.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-17 16:11:51 -05:00
YiPeng Chai
8c305a3fdf drm/amdgpu: Remove unnecessary ras block support check
[Why]:
   For special asic with mem ecc enabled but sram ecc
not enabled, some ras blocks can register their ras
configuration to ras list, but these ras blocks are not
enabled on .ras_enabled, so it can not get ras block
object using amdgpu_ras_get_ras_block.

[How]:
   Remove ras block support check. Even if the ras block
checked is not in the ras list, it will return a null
pointer and will have no effect.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-17 16:11:51 -05:00
YiPeng Chai
ac7b25d92c drm/amdgpu: Perform gpu reset after gfx finishes processing ras poison consumption on gfx_v11_0_3
Perform gpu reset after gfx finishes processing
ras poison consumption on gfx_v11_0_3.

V2:
 Move gfx poison consumption handler from hw_ops to ip
 function level.

V3:
 Adjust the calling position of amdgpu_gfx_poison_consumation_handler.

V4:
   Since gfx v11_0_3 does not have .hw_ops instance, the .hw_ops null
 pointer check in amdgpu_ras_interrupt_poison_consumption_handler
 needs to be adjusted.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-17 16:11:51 -05:00
YiPeng Chai
790bef488b drm/amdgpu: Add gfx cp ecc error irq handling on gfx v11_0_3
V2:
  Optimize gfx_v11_0_set_cp_ecc_error_state function.

V3:
  Define macro constant for me pipe instance address interval.

V5:
  Register and handle gfx cp ecc error irq on gfx v11_0_3.

V6:
  Remove invalid intermediate function call.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-17 16:11:51 -05:00
YiPeng Chai
ae6f2db4d5 drm/amdgpu: Add gfx ras poison consumption irq handling on gfx v11_0_3
Add gfx ras poison consumption irq handling on gfx v11_0_3.

V2:
  Move ras poison consumption irq handling code of gfx
     v11_0_3 to gfx_v11_0_3.c.
V5:
  Create dedicated irq handler for RLC_GC_FED_INTERRUPT.

V6:
  Remove invalid function call.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-17 16:11:51 -05:00
YiPeng Chai
89e4c44881 drm/amdgpu: Add gfx ras function on gfx v11_0_3
Add gfx ras function on gfx v11_0_3.

V2:
 1. Add separate source files for gfx v11_0_3.
 2. Create a common function to initialize gfx ras block.

V3:
 1. Rename amdgpu_gfx_ras_block_init to amdgpu_gfx_ras_sw_init.
 2. Adjust the calling position of amdgpu_gfx_ras_sw_init.
 3. Remove gfx_v11_0_3_ras_ops.

V4:
 Revert changes in amdgpu_ras_interrupt_poison_consumption_handler.

V5:
 1. Remove invalid include file in gfx_v11_0_3.c.
 2. Reduce the number of parameters of amdgpu_gfx_ras_sw_init.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-17 16:11:50 -05:00
Mario Limonciello
0604897bc6 drm/amd: Fix renoir/green sardine MP0 IP version detection
The existing codebase never had a case for detecting MP0 version on
Renoir and instead relied upon hardcoded chip name.  This was missed as
part of the changes to migrate all IP blocks to build filenames from
`amdgpu_ucode.c`.

Consequently, Renoir tries to fetch a binary with 11_0_3 in the filename
and since it's supposed to have "renoir" in the filename fails to probe.
The fbdev still works though so the series worked.

Add a case for Renoir into the legacy table to ensure the right ASD and
TA firmware load again.

Reported-by: Ekene Akuneme <Ekene.Akuneme@amd.com>
Reported-by: Nicholas Choi <Nicholas.Choi@amd.com>
Cc: Alex Hung <Alex.Hung@amd.com>
Fixes: 994a97447e ("drm/amd: Parse both v1 and v2 TA microcode headers using same function")
Fixes: 54a3e03234 ("drm/amd: Add a legacy mapping to "amdgpu_ucode_ip_version_decode"")
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-17 16:11:19 -05:00
Dan Carpenter
19d88e1df0 drm/amdgpu: Add a missing tab
This tab was deleted accidentally and triggers a Smatch warning:

    drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c:1006 gfx_v8_0_init_microcode()
    warn: inconsistent indenting

Add it back.

Fixes: 0aaafb7359 ("drm/amd: Use `amdgpu_ucode_*` helpers for GFX8")
Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-17 15:01:09 -05:00
Yifan Zha
60b7342974 drm/amdgpu: Skip specific mmhub and sdma registers accessing under sriov
[Why]
SDMA0_CNTL and MMHUB system aperture related registers are blocked by L1 Policy.
Therefore, they cannot be accessed by VF and loged in violation.

[How]
For MMHUB registers, they will be programmed by PF. So VF will skip to program them in mmhubv3_0.
For SDMA0_CNTL which is a PF_only register, VF don't need to program it in sdma_v6_0.

Signed-off-by: Yifan Zha <Yifan.Zha@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-13 14:59:32 -05:00
Mario Limonciello
2b89da46a7 drm/amd: fix some dead code in gfx_v9_0_init_cp_compute_microcode
Some dead code was introduced as part of utilizing the `amdgpu_ucode_*`
helpers. Adjust the control flow to make sure that firmware is released
in the appropriate error flows.

Reported-by: coverity-bot <keescook+coverity-bot@chromium.org>
Addresses-Coverity-ID: 1530548 ("Control flow issues")
Fixes: ec787deb2d ("drm/amd: Use `amdgpu_ucode_*` helpers for GFX9")
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-13 14:58:04 -05:00
Philip Yang
0c2dece8fb drm/amdkfd: Page aligned memory reserve size
Use page aligned size to reserve memory usage because page aligned TTM
BO size is used to unreserve memory usage, otherwise no page aligned
size causes memory usage accounting unbalanced.

Change vram_used definition type to int64_t to be able to trigger
WARN_ONCE(adev && adev->kfd.vram_used < 0, "..."), to help debug the
accounting issue with warning and backtrace.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-11 16:41:03 -05:00
Philip Yang
23b02b0e46 drm/amdkfd: Cleanup vm process info if init vm failed
If acquire_vm failed when initializing KFD vm, set vm->process_info to
NULL and free process info, otherwise, the future acquire_vm will
always fail as vm->process_info is not NULL.

Pass avm as parameter to remove the duplicate code.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-11 16:40:45 -05:00
Eric Huang
7f347e3f82 drm/amdkfd: Fix NULL pointer error for GC 11.0.1 on mGPU
The point bo->kfd_bo is NULL for queue's write pointer BO
when creating queue on mGPU. To avoid using the pointer
fixes the error.

Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-10 18:05:03 -05:00
YiPeng Chai
40794dfd20 drm/amdgpu: Fixed bug on error when unloading amdgpu
Fixed bug on error when unloading amdgpu.

The error message is as follows:
[  377.706202] kernel BUG at drivers/gpu/drm/drm_buddy.c:278!
[  377.706215] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
[  377.706222] CPU: 4 PID: 8610 Comm: modprobe Tainted: G          IOE      6.0.0-thomas #1
[  377.706231] Hardware name: ASUS System Product Name/PRIME Z390-A, BIOS 2004 11/02/2021
[  377.706238] RIP: 0010:drm_buddy_free_block+0x26/0x30 [drm_buddy]
[  377.706264] Code: 00 00 00 90 0f 1f 44 00 00 48 8b 0e 89 c8 25 00 0c 00 00 3d 00 04 00 00 75 10 48 8b 47 18 48 d3 e0 48 01 47 28 e9 fa fe ff ff <0f> 0b 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 54 55 48 89 f5 53
[  377.706282] RSP: 0018:ffffad2dc4683cb8 EFLAGS: 00010287
[  377.706289] RAX: 0000000000000000 RBX: ffff8b1743bd5138 RCX: 0000000000000000
[  377.706297] RDX: ffff8b1743bd5160 RSI: ffff8b1743bd5c78 RDI: ffff8b16d1b25f70
[  377.706304] RBP: ffff8b1743bd59e0 R08: 0000000000000001 R09: 0000000000000001
[  377.706311] R10: ffff8b16c8572400 R11: ffffad2dc4683cf0 R12: ffff8b16d1b25f70
[  377.706318] R13: ffff8b16d1b25fd0 R14: ffff8b1743bd59c0 R15: ffff8b16d1b25f70
[  377.706325] FS:  00007fec56c72c40(0000) GS:ffff8b1836500000(0000) knlGS:0000000000000000
[  377.706334] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  377.706340] CR2: 00007f9b88c1ba50 CR3: 0000000110450004 CR4: 00000000003706e0
[  377.706347] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  377.706354] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  377.706361] Call Trace:
[  377.706365]  <TASK>
[  377.706369]  drm_buddy_free_list+0x2a/0x60 [drm_buddy]
[  377.706376]  amdgpu_vram_mgr_fini+0xea/0x180 [amdgpu]
[  377.706572]  amdgpu_ttm_fini+0x12e/0x1a0 [amdgpu]
[  377.706650]  amdgpu_bo_fini+0x22/0x90 [amdgpu]
[  377.706727]  gmc_v11_0_sw_fini+0x26/0x30 [amdgpu]
[  377.706821]  amdgpu_device_fini_sw+0xa1/0x3c0 [amdgpu]
[  377.706897]  amdgpu_driver_release_kms+0x12/0x30 [amdgpu]
[  377.706975]  drm_dev_release+0x20/0x40 [drm]
[  377.707006]  release_nodes+0x35/0xb0
[  377.707014]  devres_release_all+0x8b/0xc0
[  377.707020]  device_unbind_cleanup+0xe/0x70
[  377.707027]  device_release_driver_internal+0xee/0x160
[  377.707033]  driver_detach+0x44/0x90
[  377.707039]  bus_remove_driver+0x55/0xe0
[  377.707045]  pci_unregister_driver+0x3b/0x90
[  377.707052]  amdgpu_exit+0x11/0x6c [amdgpu]
[  377.707194]  __x64_sys_delete_module+0x142/0x2b0
[  377.707201]  ? fpregs_assert_state_consistent+0x22/0x50
[  377.707208]  ? exit_to_user_mode_prepare+0x3e/0x190
[  377.707215]  do_syscall_64+0x38/0x90
[  377.707221]  entry_SYSCALL_64_after_hwframe+0x63/0xcd

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-10 14:32:58 -05:00
Mario Limonciello
030001288f drm/amd: make amdgpu_ucode_validate static
No consumers outside of amdgpu_ucode.c use amdgpu_ucode_validate
anymore, so make the function static.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-10 14:32:58 -05:00
Mario Limonciello
b31d306378 drm/amd: Use amdgpu_ucode_* helpers for GPU info bin
The `amdgpu_ucode_request` helper will ensure that the return code for
missing firmware is -ENODEV so that early_init can fail.

The `amdgpu_ucode_release` helper is for symmetry on unloading.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-10 14:32:58 -05:00
Mario Limonciello
e5a7d047f4 drm/amd: Use amdgpu_ucode_* helpers for CGS
The `amdgpu_ucode_request` helper will ensure that the return code for
missing firmware is -ENODEV so that early_init can fail.

The `amdgpu_ucode_release` helper is for symmetry on unloading.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-10 14:32:58 -05:00
Mario Limonciello
52215e2a5d drm/amd: Use amdgpu_ucode_* helpers for VCE
The `amdgpu_ucode_request` helper will ensure that the return code for
missing firmware is -ENODEV so that early_init can fail.

The `amdgpu_ucode_release` helper is for symmetry on unloading.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-10 14:32:58 -05:00
Mario Limonciello
b406477c61 drm/amd: Use amdgpu_ucode_* helpers for UVD
The `amdgpu_ucode_request` helper will ensure that the return code for
missing firmware is -ENODEV so that early_init can fail.

The `amdgpu_ucode_release` helper is for symmetry on unloading.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-10 14:32:58 -05:00
Mario Limonciello
cb9bdfad22 drm/amd: Use amdgpu_ucode_* helpers for SDMA on CIK
The `amdgpu_ucode_request` helper will ensure that the return code for
missing firmware is -ENODEV so that early_init can fail.

The `amdgpu_ucode_release` helper is for symmetry on unloading.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-10 14:32:58 -05:00