We guard the suspend entry code from running unless we have proper
BIOS support for either S3 mode or s0ix mode.
If a user's system doesn't support either of these modes the kernel
still does offer s2idle in `/sys/power/mem_sleep` so there is an
expectation from users that it works even if the power consumption
remains very high.
Rafael Ávila de Espíndola reports that a system of his has a
non-functional graphics stack after resuming. That system doesn't
support S3 and the FADT doesn't indicate support for low power idle.
Through some experimentation it was concluded that even without the
hardware s0i3 support provided by the amd_pmc driver the power
consumption over suspend is decreased by running amdgpu's s0ix
suspend routine.
The numbers over suspend showed:
* No patch: 9.2W
* Skip amdgpu suspend entirely: 10.5W
* Run amdgpu s0ix routine: 7.7W
As this does improve the power, remove some of the guard rails in
`amdgpu_acpi.c` for only running s0ix suspend routines in the right
circumstances.
However if this turns out to cause regressions for anyone, we should
revert this change and instead opt for skipping suspend/resume routines
entirely or try to fix the underlying behavior that makes graphics fail
after resume without underlying platform support.
Reported-by: Rafael Ávila de Espíndola <rafael@espindo.la>
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2364
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This reverts commit fac53471d0.
The following change: move the drm_dev_unplug call after
amdgpu_driver_unload_kms in amdgpu_pci_remove. The reason is
the following: amdgpu_pci_remove calls drm_dev_unregister
and it should be called first to ensure userspace can't access the
device instance anymore. If we call drm_dev_unplug after
amdgpu_driver_unload_kms then we observe IGT PCI software unplug
test failure (kernel hung) for all ASICs. This is how this
regression was found.
After this revert, the following commands do work not, but it would
be fixed in the next commit:
- sudo modprobe -r amdgpu
- sudo modprobe amdgpu
Signed-off-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Reviewed-by Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Due to holidays we started -next with more -fixes in-flight than
usual, and people have been asking where they are. Backmerge to get
things better in sync.
Conflicts:
- Tiny conflict in drm_fbdev_generic.c between variable rename and
missing error handling that got added.
- Conflict in drm_fb_helper.c between the added call to vgaswitcheroo
in drm_fb_helper_single_fb_probe and a refactor patch that extracted
lots of helpers and incidentally removed the dev local variable.
Readd it to make things compile.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
AV1 is only supported on the first instance.
Added vcn_v4_0_enc_find_ib_param() to help search for an IB param.
Signed-off-by: David (Ming Qiang) Wu <David.Wu3@amd.com>
Reviewed-by: Ruijing Dong <ruijing.dong@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This allows to correlate the infos printed by
/sys/kernel/debug/dri/n/amdgpu_gem_info to the ones found
in /proc/.../fdinfo and /sys/kernel/debug/dma_buf/bufinfo.
Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Laptops with APUs from a variety of manufacturers and generations
show a warning about a missing PSP runtime database.
As it's not required for PSP to dump this database into framebuffer,
decrease messages about it missing to debug.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Always enable multipipe policy on ASICs with GC VERSION > 9.0.0
instead of MEC number > 1.
This will allow multipipe policy on ASICs with one MEC,
e.g., gfx11 APUs.
Signed-off-by: Lang Yu <Lang.Yu@amd.com>
Reviewed-by: Aaron Liu <aaron.liu@amd.com>
Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The problem is that base sched hasn't been assigned yet at this moment,
causing something like "ring=0" all the time from trace.
mpv:cs0-3473 [002] ..... 129.047431: amdgpu_cs: ring=0, dw=48, fences=0
mpv:cs0-3473 [002] ..... 129.089125: amdgpu_cs: ring=0, dw=48, fences=0
mpv:cs0-3473 [002] ..... 129.130987: amdgpu_cs: ring=0, dw=48, fences=0
mpv:cs0-3473 [002] ..... 129.172478: amdgpu_cs: ring=0, dw=48, fences=0
Fixes: 4624459c84 ("drm/amdgpu: add gang submit frontend v6")
Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
There is xgmi3x16 pcs error status for aldebaran, driver should
check xgmi3x16 pcs error status field instead of gopx16 pcs error
status field.
Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The pcs error count should be determined by PCS ERROR status and
PCS ERROR MASK registers, only PCS ERROR status register can not
refect error counts accurately.
Changed from V1:
remove clean noncorrectable mask registers
optimize query pcs error status
Changed from V2:
remove check mask_value bits
correct set value corresponding bit
Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Use gfx ras common initialization interface to
initialize gfx ras block.
V2:
Update function call due to amdgpu_gfx_ras_sw_init
interface changes.
Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Align a closing brace and remove trailing whitespaces. No functional
changes.
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
If early init fails for a single IP block, then no further IP blocks
are evaluated. This means that if a user was missing more than one
firmware binary they would have to keep adding binaries and re-probing
until they discovered the ones missing.
To make this easier, run early init for each IP block and report a single
failure if not all passed.
Reviewed-by: Aaron Liu <aaron.liu@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
There is already a "default" case in the switch block, so there is
no need to have a break after the switch block.
Reviewed-by: Aaron Liu <aaron.liu@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]:
Amdgpu ras uses amdgpu_ras_is_supported to check whether
the ras block supports the ras function. amdgpu_ras_is_supported
uses .ras_enabled to determine whether the ras function of the
block is enabled.
But for special asic with mem ecc enabled but sram ecc not
enabled, some ras blocks support poison mode but their ras function
is not enabled on .ras_enabled, these ras blocks will run abnormally.
[How]:
If the ras block is not supported on .ras_enabled but the asic
supports poison mode and the ras block has ras configuration, it
can be considered that the ras block supports ras function.
Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]:
For special asic with mem ecc enabled but sram ecc
not enabled, some ras blocks can register their ras
configuration to ras list, but these ras blocks are not
enabled on .ras_enabled, so it can not get ras block
object using amdgpu_ras_get_ras_block.
[How]:
Remove ras block support check. Even if the ras block
checked is not in the ras list, it will return a null
pointer and will have no effect.
Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Perform gpu reset after gfx finishes processing
ras poison consumption on gfx_v11_0_3.
V2:
Move gfx poison consumption handler from hw_ops to ip
function level.
V3:
Adjust the calling position of amdgpu_gfx_poison_consumation_handler.
V4:
Since gfx v11_0_3 does not have .hw_ops instance, the .hw_ops null
pointer check in amdgpu_ras_interrupt_poison_consumption_handler
needs to be adjusted.
Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Add gfx ras function on gfx v11_0_3.
V2:
1. Add separate source files for gfx v11_0_3.
2. Create a common function to initialize gfx ras block.
V3:
1. Rename amdgpu_gfx_ras_block_init to amdgpu_gfx_ras_sw_init.
2. Adjust the calling position of amdgpu_gfx_ras_sw_init.
3. Remove gfx_v11_0_3_ras_ops.
V4:
Revert changes in amdgpu_ras_interrupt_poison_consumption_handler.
V5:
1. Remove invalid include file in gfx_v11_0_3.c.
2. Reduce the number of parameters of amdgpu_gfx_ras_sw_init.
Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>