Always setup overdrive tables after resume. Preserve only some
user-defined settings in user_overdrive_table if they're set.
Copy restored user_overdrive_table into od_table to get correct
values.
On cold boot, BTC was triggered and GfxVfCurve was calibrated. We
got VfCurve settings (a). On resuming back, BTC will be triggered
again and GfxVfCurve will be recalibrated. VfCurve settings (b)
got may be different from those of cold boot. So if we reuse
those VfCurve settings (a) got on cold boot on suspend, we can
run into discrepencies.
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1897
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2276
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Błażej Szczygieł <mumei6102@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
During miration to vram prange->offset is valid after vram buffer is located,
either use old one or allocate a new one. Move svm_range_vram_node_new before
migrate for each vma to get valid prange->offset.
v2: squash in warning fix
Fixes: b4ee960637 ("drm/amdkfd: Fix BO offset for multi-VMA page migration")
Signed-off-by: Xiaogang Chen <Xiaogang.Chen@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
svm_migrate_ram_to_vram migrates a prange from sys ram to vram. The prange may
cross multiple vma. Need remember current dst vram offset in the TTM resource for
each migration.
v2: squash in warning fix (Alex)
Signed-off-by: Xiaogang Chen <Xiaogang.Chen@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drm-misc-next for v6.4-rc1:
Note: Only changes since pull request from 2023-02-23 are included here.
UAPI Changes:
- Convert rockchip bindings to YAML.
- Constify kobj_type structure in dma-buf.
- FBDEV cmdline parser fixes, and other small fbdev fixes for mode
parsing.
Cross-subsystem Changes:
- Add Neil Armstrong as linaro maintainer.
- Actually signal the private stub dma-fence.
Core Changes:
- Add function for adding syncobj dep to sched_job and use it in panfrost, v3d.
- Improve DisplayID 2.0 topology parsing and EDID parsing in general.
- Add a gem eviction function and callback for generic GEM shrinker
purposes.
- Prepare to convert shmem helper to use the GEM reservation lock instead of own
locking. (Actual commit itself got reverted for now)
- Move the suballocator from radeon and amdgpu drivers to core in preparation
for Xe.
- Assorted small fixes and documentation.
- Fixes to HPD polling.
- Assorted small fixes in simpledrm, bridge, accel, shmem-helper,
and the selftest of format-helper.
- Remove dummy resource when ttm bo is created, and during pipelined
gutting. Fix all drivers to accept a NULL ttm_bo->resource.
- Handle pinned BO moving prevention in ttm core.
- Set drm panel-bridge orientation before connector is registered.
- Remove dumb_destroy callback.
- Add documentation to GEM_CLOSE, PRIME_HANDLE_TO_FD, PRIME_FD_TO_HANDLE, GETFB2 ioctl's.
- Add atomic enable_plane callback, use it in ast, mgag200, tidss.
Driver Changes:
- Use drm_gem_objects_lookup in vc4.
- Assorted small fixes to virtio, ast, bridge/tc358762, meson, nouveau.
- Allow virtio KMS to be disabled and compiled out.
- Add Radxa 8/10HD, Samsung AMS495QA01 panels.
- Fix ivpu compiler errors.
- Assorted fixes to drm/panel, malidp, rockchip, ivpu, amdgpu, vgem,
nouveau, vc4.
- Assorted cleanups, simplifications and fixes to vmwgfx.
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/ac1f5186-54bb-02f4-ac56-907f5b76f3de@linux.intel.com
The VCN firmware loading path enables the indirect SRAM mode if it's
advertised as supported. We might have some cases of FW issues that
prevents this mode to working properly though, ending-up in a failed
probe. An example below, observed in the Steam Deck:
[...]
[drm] failed to load ucode VCN0_RAM(0x3A)
[drm] psp gfx command LOAD_IP_FW(0x6) failed and response status is (0xFFFF0000)
amdgpu 0000:04:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring vcn_dec_0 test failed (-110)
[drm:amdgpu_device_init.cold [amdgpu]] *ERROR* hw_init of IP block <vcn_v3_0> failed -110
amdgpu 0000:04:00.0: amdgpu: amdgpu_device_ip_init failed
amdgpu 0000:04:00.0: amdgpu: Fatal error during GPU init
[...]
Disabling the VCN block circumvents this, but it's a very invasive
workaround that turns off the entire feature. So, let's add a quirk
on VCN loading that checks for known problematic BIOSes on Vangogh,
so we can proactively disable the indirect SRAM mode and allow the
HW proper probe and VCN IP block to work fine.
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2385
Fixes: 82132ecc54 ("drm/amdgpu: enable Vangogh VCN indirect sram mode")
Cc: stable@vger.kernel.org
Cc: James Zhu <James.Zhu@amd.com>
Cc: Leo Liu <leo.liu@amd.com>
Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
We should be checking if drm_dp_dpcd_read() returns the size that we are
asking it to read instead of just checking if it is greater than zero.
So, compare the return value of drm_dp_dpcd_read() to the requested
read size.
Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
When FB_DAMAGE_CLIPS are provided in a non-MPO scenario, the loop does
not use the counter i. This causes the fill_dc_dity_rect() to always
fill dirty_rects[0], causing graphical artifacts when a damage clip
aware DRM client sends more than 1 damage clip.
Instead, use the flip_addrs->dirty_rect_count which is incremented by
fill_dc_dirty_rect() on a successful fill.
Fixes: 30ebe41582 ("drm/amd/display: add FB_DAMAGE_CLIPS support")
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2453
Signed-off-by: Benjamin Cheng <ben@bcheng.me>
Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Initialize hdp ras block only when mmhub ip block
supports ras features. Driver queries ras capabilities
after early_init, ras block init needs to be moved to
sw_init.
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Stanley Yang <Stanley.Yang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Initialize mmhub ras block only when mmhub ip block
supports ras features. Driver queries ras capabilities
after early_init, ras block init needs to be moved to
sw_init.
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Stanley Yang <Stanley.Yang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Initialize umc ras block only when umc ip block
supports ras. Driver queries ras capabilities after
early_init, ras block init needs to be moved to
sw_init.
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Stanley Yang <Stanley.Yang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Initialize vcn ras block only when vcn ip block
supports ras features. Driver queries ras capabilities
after early_init, ras block init needs to be moved to
sw_int.
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Stanley Yang <Stanley.Yang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Initialize jpeg ras block only when jpeg ip block
supports ras features. Driver queries ras capabilities
after early_init, ras block init needs to be moved to
sw_int.
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Stanley Yang <Stanley.Yang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Some amd asics having reliable hotplug support don't call
drm_kms_helper_poll_init in driver init sequence. However,
due to the unified suspend/resume path for all asics, because
the output_poll_work->func is not set for these asics, a warning
arrives when suspending.
[ 90.656049] <TASK>
[ 90.656050] ? console_unlock+0x4d/0x100
[ 90.656053] ? __irq_work_queue_local+0x27/0x60
[ 90.656056] ? irq_work_queue+0x2b/0x50
[ 90.656057] ? __wake_up_klogd+0x40/0x60
[ 90.656059] __cancel_work_timer+0xed/0x180
[ 90.656061] drm_kms_helper_poll_disable.cold+0x1f/0x2c [drm_kms_helper]
[ 90.656072] amdgpu_device_suspend+0x81/0x170 [amdgpu]
[ 90.656180] amdgpu_pmops_runtime_suspend+0xb5/0x1b0 [amdgpu]
[ 90.656269] pci_pm_runtime_suspend+0x61/0x1b0
drm_kms_helper_poll_enable/disable is valid when poll_init is called in
amdgpu code, which is only used in non DC path. So move such codes into
non-DC path code to get rid of such warnings.
v1: introduce use_kms_poll flag in amdgpu as the poll stuff check
v2: use dc_enabled as the flag to simply code
v3: move code into non DC path instead of relying on any flag
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2411
Fixes: a4e771729a ("drm/probe_helper: sort out poll_running vs poll_enabled")
Reported-by: Bert Karwatzki <spasswolf@web.de>
Suggested-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Suggested-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
On resume some displays are not ready for HDCP, so they will fail if we
start the hdcp authentintication too soon.
Add a delay so that the displays can be ready before we start.
NOTE: Previoulsy this delay was set to 3 seconds but it was causing
issues with compliance, 2 seconds should enough for compliance and the
s3 resume case.
[How]
Change the Delay to 2 seconds.
Reviewed-by: Aurabindo Pillai <Aurabindo.Pillai@amd.com>
Acked-by: Qingqing Zhuo <qingqing.zhuo@amd.com>
Signed-off-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
kgd_mem pointers returned by kfd_process_device_translate_handle are
only guaranteed to be valid while p->mutex is held. As soon as the mutex
is unlocked, another thread can free the BO.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
Currently there aren't any methods to determine PSR state residency.
[How]
create a sysfs entry for reading residency and internally hook it up
to existing functionality of reading PSR residency from firmware.
[Hamza: dropped the link.h include and made checkpatch happy]
Signed-off-by: Shirish S <shirish.s@amd.com>
Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
During miration to vram prange->offset is valid after vram buffer is located,
either use old one or allocate a new one. Move svm_range_vram_node_new before
migrate for each vma to get valid prange->offset.
v2: squash in warning fix
Fixes: 9473b6b25b ("drm/amdkfd: Fix BO offset for multi-VMA page migration")
Signed-off-by: Xiaogang Chen <Xiaogang.Chen@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Always setup overdrive tables after resume. Preserve only some
user-defined settings in user_overdrive_table if they're set.
Copy restored user_overdrive_table into od_table to get correct
values.
On cold boot, BTC was triggered and GfxVfCurve was calibrated. We
got VfCurve settings (a). On resuming back, BTC will be triggered
again and GfxVfCurve will be recalibrated. VfCurve settings (b)
got may be different from those of cold boot. So if we reuse
those VfCurve settings (a) got on cold boot on suspend, we can
run into discrepencies.
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1897
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2276
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Błażej Szczygieł <mumei6102@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The Wa_14017073508 require to send Media Busy/Idle mailbox while
accessing Media tile. As of now it is getting handled while __gt_unpark,
__gt_park. But there are various corner cases where forcewakes are taken
without __gt_unpark i.e. without sending Busy Mailbox especially during
register reads. Forcewakes are taken without busy mailbox leads to
GPU HANG. So bringing mailbox calls under forcewake calls are no feasible
option as forcewake calls are atomic and mailbox calls are blocking.
The issue already fixed in B step so disabling MC6 on A step and
reverting previous commit which handles Wa_14017073508
Fixes: 8f70f1ec58 ("drm/i915/mtl: Add Wa_14017073508 for SAMedia")
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Badal Nilawar <badal.nilawar@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Anshuman Gupta <anshuman.gupta@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230310061339.2495416-2-badal.nilawar@intel.com
Users reported oopses on list corruptions when using i915 perf with a
number of concurrently running graphics applications. Root cause analysis
pointed at an issue in barrier processing code -- a race among perf open /
close replacing active barriers with perf requests on kernel context and
concurrent barrier preallocate / acquire operations performed during user
context first pin / last unpin.
When adding a request to a composite tracker, we try to reuse an existing
fence tracker, already allocated and registered with that composite. The
tracker we obtain may already track another fence, may be an idle barrier,
or an active barrier.
If the tracker we get occurs a non-idle barrier then we try to delete that
barrier from a list of barrier tasks it belongs to. However, while doing
that we don't respect return value from a function that performs the
barrier deletion. Should the deletion ever fail, we would end up reusing
the tracker still registered as a barrier task. Since the same structure
field is reused with both fence callback lists and barrier tasks list,
list corruptions would likely occur.
Barriers are now deleted from a barrier tasks list by temporarily removing
the list content, traversing that content with skip over the node to be
deleted, then populating the list back with the modified content. Should
that intentionally racy concurrent deletion attempts be not serialized,
one or more of those may fail because of the list being temporary empty.
Related code that ignores the results of barrier deletion was initially
introduced in v5.4 by commit d8af05ff38 ("drm/i915: Allow sharing the
idle-barrier from other kernel requests"). However, all users of the
barrier deletion routine were apparently serialized at that time, then the
issue didn't exhibit itself. Results of git bisect with help of a newly
developed igt@gem_barrier_race@remote-request IGT test indicate that list
corruptions might start to appear after commit 311770173f ("drm/i915/gt:
Schedule request retirement when timeline idles"), introduced in v5.5.
Respect results of barrier deletion attempts -- mark the barrier as idle
only if successfully deleted from the list. Then, before proceeding with
setting our fence as the one currently tracked, make sure that the tracker
we've got is not a non-idle barrier. If that check fails then don't use
that tracker but go back and try to acquire a new, usable one.
v3: use unlikely() to document what outcome we expect (Andi),
- fix bad grammar in commit description.
v2: no code changes,
- blame commit 311770173f ("drm/i915/gt: Schedule request retirement
when timeline idles"), v5.5, not commit d8af05ff38 ("drm/i915: Allow
sharing the idle-barrier from other kernel requests"), v5.4,
- reword commit description.
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/6333
Fixes: 311770173f ("drm/i915/gt: Schedule request retirement when timeline idles")
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: stable@vger.kernel.org # v5.5
Cc: Andi Shyti <andi.shyti@linux.intel.com>
Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230302120820.48740-1-janusz.krzysztofik@linux.intel.com
(cherry picked from commit 5060060557)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
It seems that commit bc3c5e0809 ("drm/i915/sseu: Don't try to store EU
mask internally in UAPI format") exposed a potential out-of-bounds
access, reported by UBSAN as following on a laptop with a gen 11 i915
card:
UBSAN: array-index-out-of-bounds in drivers/gpu/drm/i915/gt/intel_sseu.c:65:27
index 6 is out of range for type 'u16 [6]'
CPU: 2 PID: 165 Comm: systemd-udevd Not tainted 6.2.0-9-generic #9-Ubuntu
Hardware name: Dell Inc. XPS 13 9300/077Y9N, BIOS 1.11.0 03/22/2022
Call Trace:
<TASK>
show_stack+0x4e/0x61
dump_stack_lvl+0x4a/0x6f
dump_stack+0x10/0x18
ubsan_epilogue+0x9/0x3a
__ubsan_handle_out_of_bounds.cold+0x42/0x47
gen11_compute_sseu_info+0x121/0x130 [i915]
intel_sseu_info_init+0x15d/0x2b0 [i915]
intel_gt_init_mmio+0x23/0x40 [i915]
i915_driver_mmio_probe+0x129/0x400 [i915]
? intel_gt_probe_all+0x91/0x2e0 [i915]
i915_driver_probe+0xe1/0x3f0 [i915]
? drm_privacy_screen_get+0x16d/0x190 [drm]
? acpi_dev_found+0x64/0x80
i915_pci_probe+0xac/0x1b0 [i915]
...
According to the definition of sseu_dev_info, eu_mask->hsw is limited to
a maximum of GEN_MAX_SS_PER_HSW_SLICE (6) sub-slices, but
gen11_sseu_info_init() can potentially set 8 sub-slices, in the
!IS_JSL_EHL(gt->i915) case.
Fix this by reserving up to 8 slots for max_subslices in the eu_mask
struct.
Reported-by: Emil Renner Berthing <emil.renner.berthing@canonical.com>
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Fixes: bc3c5e0809 ("drm/i915/sseu: Don't try to store EU mask internally in UAPI format")
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230220171858.131416-1-andrea.righi@canonical.com
(cherry picked from commit 3cba09a6ac)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
The hardware settings for color format and pitch are state of the
primary plane. Store the values in the primary plane's state structure
struct cirrus_primary_plane_state. Adapt all callers.
All fields in struct cirrus_device are now considered immutable after
initialization. Plane updates consider the difference between the old
and the new plane state before updating format or pitch.
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Acked-by: Gerd Hoffmann <kraxel@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230215161517.5113-17-tzimmermann@suse.de