The GPU is halted when it hits a MMU exception, so there is no point in
waiting for the job timeout to expire or try to work out if the GPU is
still making progress in the timeout handler, as we know that the GPU
won't make any more progress.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Now that it is only used to track the driver internal state of
the MMU global and cmdbuf objects, we can get rid of this property
by making the free/finit functions of those objects safe to call
on an uninitialized object.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Instead of only tracking if the FE is running, use a enum to better
describe the various states the GPU can be in. This allows some
additional validation to make sure that functions that expect a
certain GPU state are only called when the GPU is actually in that
state.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Nothing in this callpath actually touches the GPU, so there is no reason
to get it out of suspend state here. Only if runtime PM isn't enabled at
all we must make sure to enable the clocks, so the GPU init routine can
access the GPU later on.
This also removes the need to guard against the state where the driver
isn't fully initialized yet in the runtime PM resume handler.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Currently the clock is enabled in the runtime resume function, but are
disabled a level further down in the callstack in the suspend function.
Move the clock disable into the suspend function to make handling
symmetrical between resume and suspend.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Conceptually events are the right abstraction to handle the GPU
runtime PM state: as long as any event is pending the GPU can not
be idle. Events are also properly freed and reallocated when the
GPU has been reset after a hang.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Clearing the whole bitmap at once is only a minor optimization in
a path that should be extremely cold. Free the events by calling
event_free() instead of directly manipulating the completion count
and event bitmap.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
So it can use the event_free function without adding another
forward declaration. No functional change.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
This is the 2D GPU found on the i.MX8MP SoC. Feature bits taken
from the downstream kernel driver 6.4.3.p4.4.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
This is the NPU found on the NXP i.MX8MP SoC. Feature bits taken
from the downstream kernel driver 6.4.3.p4.4.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
gpu->mmu_context is the MMU context of the last job in the HW queue, which
isn't necessarily the same as the context from the bad job. Dump the MMU
context from the scheduler determined bad submit to make it work as intended.
Fixes: 17e4660ae3 ("drm/etnaviv: implement per-process address spaces on MMUv2")
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
In below thousands of screen rotation loop tests with virtual display
enabled, a CPU hard lockup issue may happen, leading system to unresponsive
and crash.
do {
xrandr --output Virtual --rotate inverted
xrandr --output Virtual --rotate right
xrandr --output Virtual --rotate left
xrandr --output Virtual --rotate normal
} while (1);
NMI watchdog: Watchdog detected hard LOCKUP on cpu 1
? hrtimer_run_softirq+0x140/0x140
? store_vblank+0xe0/0xe0 [drm]
hrtimer_cancel+0x15/0x30
amdgpu_vkms_disable_vblank+0x15/0x30 [amdgpu]
drm_vblank_disable_and_save+0x185/0x1f0 [drm]
drm_crtc_vblank_off+0x159/0x4c0 [drm]
? record_print_text.cold+0x11/0x11
? wait_for_completion_timeout+0x232/0x280
? drm_crtc_wait_one_vblank+0x40/0x40 [drm]
? bit_wait_io_timeout+0xe0/0xe0
? wait_for_completion_interruptible+0x1d7/0x320
? mutex_unlock+0x81/0xd0
amdgpu_vkms_crtc_atomic_disable
It's caused by a stuck in lock dependency in such scenario on different
CPUs.
CPU1 CPU2
drm_crtc_vblank_off hrtimer_interrupt
grab event_lock (irq disabled) __hrtimer_run_queues
grab vbl_lock/vblank_time_block amdgpu_vkms_vblank_simulate
amdgpu_vkms_disable_vblank drm_handle_vblank
hrtimer_cancel grab dev->event_lock
So CPU1 stucks in hrtimer_cancel as timer callback is running endless on
current clock base, as that timer queue on CPU2 has no chance to finish it
because of failing to hold the lock. So NMI watchdog will throw the errors
after its threshold, and all later CPUs are impacted/blocked.
So use hrtimer_try_to_cancel to fix this, as disable_vblank callback
does not need to wait the handler to finish. And also it's not necessary
to check the return value of hrtimer_try_to_cancel, because even if it's
-1 which means current timer callback is running, it will be reprogrammed
in hrtimer_start with calling enable_vblank to make it works.
v2: only re-arm timer when vblank is enabled (Christian) and add a Fixes
tag as well
v3: drop warn printing (Christian)
v4: drop superfluous check of blank->enabled in timer function, as it's
guaranteed in drm_handle_vblank (Christian)
Fixes: 84ec374bd5 ("drm/amdgpu: create amdgpu_vkms (v4)")
Cc: stable@vger.kernel.org
Suggested-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why&How]
DCN301 does not have FAMS hence the workaround needed on other DCN3x
variants related to OTG min/max selector programming is not applicable for it.
Hence isolate it and have it use the old sequence without workaround.
Fixes: 1598fc5764 ("drm/amd/display: Program OTG vtotal min/max selectors unconditionally for DCN1+")
Reviewed-by: Swapnil Patel <swapnil.patel@amd.com>
Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why&How]
Make a few functions non static so that they can be reused for other
asic. This is in preparation for separating out OTG programming sequence
for DCN301
Fixes: 1598fc5764 ("drm/amd/display: Program OTG vtotal min/max selectors unconditionally for DCN1+")
Reviewed-by: Swapnil Patel <swapnil.patel@amd.com>
Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
SMU7 does a check if the dGPU is inserted into a Rocket Lake system,
to turn off DPM. Extend this check to all systems that have problems
with dynamic switching by using the
amdgpu_device_pcie_dynamic_switching_supported() helper.
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Fix below checkpatch error & warnings:
ERROR: trailing statements should be on next line
+ default: BUG();
WARNING: braces {} are not necessary for single statement blocks
WARNING: braces {} are not necessary for any arm of this statement
WARNING: Block comments should align the * on each line
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
When a new mode is set to modeset->mode, the previous mode should be freed.
This fixes the following kmemleak report:
drm_mode_duplicate+0x45/0x220 [drm]
drm_client_modeset_probe+0x944/0xf50 [drm]
__drm_fb_helper_initial_config_and_unlock+0xb4/0x2c0 [drm_kms_helper]
drm_fbdev_client_hotplug+0x2bc/0x4d0 [drm_kms_helper]
drm_client_register+0x169/0x240 [drm]
ast_pci_probe+0x142/0x190 [ast]
local_pci_probe+0xdc/0x180
work_for_cpu_fn+0x4e/0xa0
process_one_work+0x8b7/0x1540
worker_thread+0x70a/0xed0
kthread+0x29f/0x340
ret_from_fork+0x1f/0x30
cc: <stable@vger.kernel.org>
Reported-by: Zhang Yi <yizhan@redhat.com>
Signed-off-by: Jocelyn Falempe <jfalempe@redhat.com>
Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>
Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20230711092203.68157-3-jfalempe@redhat.com
dmt_mode is allocated and never freed in this function.
It was found with the ast driver, but most drivers using generic fbdev
setup are probably affected.
This fixes the following kmemleak report:
backtrace:
[<00000000b391296d>] drm_mode_duplicate+0x45/0x220 [drm]
[<00000000e45bb5b3>] drm_client_target_cloned.constprop.0+0x27b/0x480 [drm]
[<00000000ed2d3a37>] drm_client_modeset_probe+0x6bd/0xf50 [drm]
[<0000000010e5cc9d>] __drm_fb_helper_initial_config_and_unlock+0xb4/0x2c0 [drm_kms_helper]
[<00000000909f82ca>] drm_fbdev_client_hotplug+0x2bc/0x4d0 [drm_kms_helper]
[<00000000063a69aa>] drm_client_register+0x169/0x240 [drm]
[<00000000a8c61525>] ast_pci_probe+0x142/0x190 [ast]
[<00000000987f19bb>] local_pci_probe+0xdc/0x180
[<000000004fca231b>] work_for_cpu_fn+0x4e/0xa0
[<0000000000b85301>] process_one_work+0x8b7/0x1540
[<000000003375b17c>] worker_thread+0x70a/0xed0
[<00000000b0d43cd9>] kthread+0x29f/0x340
[<000000008d770833>] ret_from_fork+0x1f/0x30
unreferenced object 0xff11000333089a00 (size 128):
cc: <stable@vger.kernel.org>
Fixes: 1d42bbc8f7 ("drm/fbdev: fix cloning on fbcon")
Reported-by: Zhang Yi <yizhan@redhat.com>
Signed-off-by: Jocelyn Falempe <jfalempe@redhat.com>
Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>
Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20230711092203.68157-2-jfalempe@redhat.com
The commit e254b584db ("drm/ssd130x: Remove hardcoded bits-per-pixel in
ssd130x_buf_alloc()") used a pixel format info rather than a hardcoded bpp
to calculate the size of the buffer allocated to store the native pixels.
But it wrongly used the DRM_FORMAT_C1 fourcc pixel format. That is for
color-indexed frame buffer formats, while the ssd103x controllers don't
support different single-channel colors nor a Color Lookup Table (CLUT).
So the correct pixel format to use in this case is DRM_FORMAT_R1 instead.
Since both formats use a eight pixels/byte, there is no functional change
in practice by this patch. Still, the correct pixel format should be used.
Suggested-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Javier Martinez Canillas <javierm@redhat.com>
Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20230713085859.907127-1-javierm@redhat.com
The feature to save the dispatch ID in trap temporaries 6 & 7 on context
save is unconditionally enabled during MQD initialization.
Now that TTMPs are always setup regardless of debug mode for GC 9.4.3, we
should report that the dispatch ID is always available for debug/trap
handling.
Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
SMU13 overrides dynamic PCIe lane width and dynamic speed by when on
certain hosts. commit 38e4ced804 ("drm/amd/pm: conditionally disable
pcie lane switching for some sienna_cichlid SKUs") worked around this
issue by setting up certain SKUs to set up certain limits, but the same
fundamental problem with those hosts affects all SMU11 implmentations
as well, so align the SMU11 and SMU13 driver handling.
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
SMU13 overrides dynamic PCIe lane width and dynamic speed by when on
certain hosts. commit 38e4ced804 ("drm/amd/pm: conditionally disable
pcie lane switching for some sienna_cichlid SKUs") worked around this
issue by setting up certain SKUs to set up certain limits, but the same
fundamental problem with those hosts affects all SMU11 implmentations
as well, so align the SMU11 and SMU13 driver handling.
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 6.1.x
the smu driver_table is used for all types of smu
tables data transcation (e.g: PPtable, Metrics, i2c, Ecc..).
it is necessary to hold this lock to avoiding data tampering
during the i2c read operation.
Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org