Commit Graph

34031 Commits

Author SHA1 Message Date
Dave Airlie
4b1c24ef50 Merge tag 'amd-drm-fixes-6.17-2025-08-28' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes
amd-drm-fixes-6.17-2025-08-28:

amdgpu:
- UserQ fixes
- Revert CSA fix
- SR-IOV fix

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Alex Deucher <alexander.deucher@amd.com>
Link: https://lore.kernel.org/r/20250828173904.75850-1-alexander.deucher@amd.com
2025-08-29 08:50:44 +10:00
Dave Airlie
60d98e1a8d Merge tag 'drm-misc-fixes-2025-08-28' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes
Several nouveau fixes to remove unused code, fix an error path and be
less restrictive with the formats it accepts. A fix for amdgpu to pin
vmapped dma-buf, and a revert for tegra for a regression in the dma-buf
/ GEM code.

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Maxime Ripard <mripard@redhat.com>
Link: https://lore.kernel.org/r/20250828-hypersonic-colorful-squirrel-64f04b@houat
2025-08-29 08:44:53 +10:00
Alex Deucher
c767d74a9c drm/amdgpu/userq: fix error handling of invalid doorbell
If the doorbell is invalid, be sure to set the r to an error
state so the function returns an error.

Reviewed-by: David (Ming Qiang) Wu <David.Wu3@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 7e2a5b0a9a)
Cc: stable@vger.kernel.org
2025-08-27 14:01:52 -04:00
Jesse.Zhang
ee38ea0ae4 drm/amdgpu: update firmware version checks for user queue support
The minimum firmware versions required for user queue functionality
have been increased to address an issue where the queue privilege
state was lost during queue connect operations.

The problem occurred because the privilege state was being restored
to its initial value at the beginning of the function, overwriting
the state that was properly set during the queue connect case.

This commit updates the minimum version requirements:
- ME firmware from 2390 to 2420
- PFP firmware from 2530 to 2580
- MEC firmware from 2600 to 2650
- MES firmware remains at 120

These updated firmware versions contain the necessary fixes to
properly maintain queue privilege state throughout connect operations.

Fixes: 61ca97e959 ("drm/amdgpu: Add fw minimum version check for usermode queue")
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 5f976c9939)
Cc: stable@vger.kernel.org
2025-08-27 14:01:32 -04:00
Yang Wang
5dff50802b drm/amd/amdgpu: disable hwmon power1_cap* for gfx 11.0.3 on vf mode
the PPSMC_MSG_GetPptLimit msg is not valid for gfx 11.0.3 on vf mode,
so skiped to create power1_cap* hwmon sysfs node.

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit e82a8d4410)
Cc: stable@vger.kernel.org
2025-08-27 14:01:08 -04:00
Alex Deucher
ac4ed2da4c Revert "drm/amdgpu: fix incorrect vm flags to map bo"
This reverts commit b08425fa77.

Revert this to align with 6.17 because the fixes tag
was wrong on this commit.

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit be33e8a239)
2025-08-27 14:00:38 -04:00
Alex Deucher
29f155c5e8 drm/amdgpu/gfx12: set MQD as appriopriate for queue types
Set the MQD as appropriate for the kernel vs user queues.

Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 7b9110f289)
Cc: stable@vger.kernel.org
2025-08-27 13:59:53 -04:00
Alex Deucher
27f5e0c132 drm/amdgpu/gfx11: set MQD as appriopriate for queue types
Set the MQD as appropriate for the kernel vs user queues.

Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 063d668320)
Cc: stable@vger.kernel.org
2025-08-27 13:59:25 -04:00
Dave Airlie
f9915c391c Merge tag 'drm-misc-fixes-2025-08-21' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes
A bunch of fixes for 6.17:
  - analogix_dp: devm_drm_bridge_alloc() error handling fix
  - gaudi: Memory deallocation fix
  - gpuvm: Documentation warning fix
  - hibmc: Various misc fixes
  - nouveau: Memory leak fixes, typos
  - panic: u64 division handling on 32 bits architecture fix
  - rockchip: Kconfig fix, register caching fix
  - rust: memory layout and safety fixes
  - tests: Endianness fixes

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Maxime Ripard <mripard@redhat.com>
Link: https://lore.kernel.org/r/20250821-economic-dandelion-rooster-c57fa9@houat
2025-08-23 06:45:53 +10:00
Thomas Zimmermann
08fb45446e drm/amdgpu: Pin buffers while vmap'ing exported dma-buf objects
Current dma-buf vmap semantics require that the mapped buffer remains
in place until the corresponding vunmap has completed.

For GEM-SHMEM, this used to be guaranteed by a pin operation while creating
an S/G table in import. GEM-SHMEN can now import dma-buf objects without
creating the S/G table, so the pin is missing. Leads to page-fault errors,
such as the one shown below.

[  102.101726] BUG: unable to handle page fault for address: ffffc90127000000
[...]
[  102.157102] RIP: 0010:udl_compress_hline16+0x219/0x940 [udl]
[...]
[  102.243250] Call Trace:
[  102.245695]  <TASK>
[  102.2477V95]  ? validate_chain+0x24e/0x5e0
[  102.251805]  ? __lock_acquire+0x568/0xae0
[  102.255807]  udl_render_hline+0x165/0x341 [udl]
[  102.260338]  ? __pfx_udl_render_hline+0x10/0x10 [udl]
[  102.265379]  ? local_clock_noinstr+0xb/0x100
[  102.269642]  ? __lock_release.isra.0+0x16c/0x2e0
[  102.274246]  ? mark_held_locks+0x40/0x70
[  102.278177]  udl_primary_plane_helper_atomic_update+0x43e/0x680 [udl]
[  102.284606]  ? __pfx_udl_primary_plane_helper_atomic_update+0x10/0x10 [udl]
[  102.291551]  ? lockdep_hardirqs_on_prepare.part.0+0x92/0x170
[  102.297208]  ? lockdep_hardirqs_on+0x88/0x130
[  102.301554]  ? _raw_spin_unlock_irq+0x24/0x50
[  102.305901]  ? wait_for_completion_timeout+0x2bb/0x3a0
[  102.311028]  ? drm_atomic_helper_calc_timestamping_constants+0x141/0x200
[  102.317714]  ? drm_atomic_helper_commit_planes+0x3b6/0x1030
[  102.323279]  drm_atomic_helper_commit_planes+0x3b6/0x1030
[  102.328664]  drm_atomic_helper_commit_tail+0x41/0xb0
[  102.333622]  commit_tail+0x204/0x330
[...]
[  102.529946] ---[ end trace 0000000000000000 ]---
[  102.651980] RIP: 0010:udl_compress_hline16+0x219/0x940 [udl]

In this stack strace, udl (based on GEM-SHMEM) imported and vmap'ed a
dma-buf from amdgpu. Amdgpu relocated the buffer, thereby invalidating the
mapping.

Provide a custom dma-buf vmap method in amdgpu that pins the object before
mapping it's buffer's pages into kernel address space. Do the opposite in
vunmap.

Note that dma-buf vmap differs from GEM vmap in how it handles relocation.
While dma-buf vmap keeps the buffer in place, GEM vmap requires the caller
to keep the buffer in place. Hence, this fix is in amdgpu's dma-buf code
instead of its GEM code.

A discussion of various approaches to solving the problem is available
at [1].

v3:
- try (GTT | VRAM); drop CPU domain (Christian)
v2:
- only use mapable domains (Christian)
- try pinning to domains in preferred order

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Fixes: 660cd44659 ("drm/shmem-helper: Import dmabuf without mapping its sg_table")
Reported-by: Thomas Zimmermann <tzimmermann@suse.de>
Closes: https://lore.kernel.org/dri-devel/ba1bdfb8-dbf7-4372-bdcb-df7e0511c702@suse.de/
Cc: Shixiong Ou <oushixiong@kylinos.cn>
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Maxime Ripard <mripard@kernel.org>
Cc: David Airlie <airlied@gmail.com>
Cc: Simona Vetter <simona@ffwll.ch>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: dri-devel@lists.freedesktop.org
Cc: linux-media@vger.kernel.org
Cc: linaro-mm-sig@lists.linaro.org
Link: https://lore.kernel.org/dri-devel/9792c6c3-a2b8-4b2b-b5ba-fba19b153e21@suse.de/ # [1]
Reviewed-by: Christian König <christian.koenig@amd.com>
Link: https://lore.kernel.org/r/20250821064031.39090-1-tzimmermann@suse.de
2025-08-22 14:05:31 +02:00
Maxime Ripard
1a2cf179e2 Merge drm/drm-fixes into drm-misc-fixes
Update drm-misc-fixes to -rc2.

Signed-off-by: Maxime Ripard <mripard@kernel.org>
2025-08-20 16:08:49 +02:00
Timur Kristóf
297a4833a6 drm/amd/display: Fix DP audio DTO1 clock source on DCE 6.
On DCE 6, DP audio was not working. However, it worked when an
HDMI monitor was also plugged in.

Looking at dce_aud_wall_dto_setup it seems that the main
difference is that we use DTO1 when only DP is plugged in.

When programming DTO1, it uses audio_dto_source_clock_in_khz
which is set from get_dp_ref_freq_khz

The dce60_get_dp_ref_freq_khz implementation looks incorrect,
because DENTIST_DISPCLK_CNTL seems to be always zero on DCE 6,
so it isn't usable.
I compared dce60_get_dp_ref_freq_khz to the legacy display code,
specifically dce_v6_0_audio_set_dto, and it turns out that in
case of DCE 6, it needs to use the display clock. With that,
DP audio started working on Pitcairn, Oland and Cape Verde.

However, it still didn't work on Tahiti. Despite having the
same DCE version, Tahiti seems to have a different audio device.
After some trial and error I realized that it works with the
default display clock as reported by the VBIOS, not the current
display clock.

The patch was tested on all four SI GPUs:

* Pitcairn (DCE 6.0)
* Oland (DCE 6.4)
* Cape Verde (DCE 6.0)
* Tahiti (DCE 6.0 but different)

The testing was done on Samsung Odyssey G7 LS28BG700EPXEN on
each of the above GPUs, at the following settings:

* 4K 60 Hz
* 1080p 60 Hz
* 1080p 144 Hz

Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Rodrigo Siqueira <siqueira@igalia.com>
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 645cc7863d)
Cc: stable@vger.kernel.org
2025-08-18 18:00:56 -04:00
Timur Kristóf
1050747846 drm/amd/display: Fix fractional fb divider in set_pixel_clock_v3
For later VBIOS versions, the fractional feedback divider is
calculated as the remainder of dividing the feedback divider by
a factor, which is set to 1000000. For reference, see:
- calculate_fb_and_fractional_fb_divider
- calc_pll_max_vco_construct

However, in case of old VBIOS versions that have
set_pixel_clock_v3, they only have 1 byte available for the
fractional feedback divider, and it's expected to be set to the
remainder from dividing the feedback divider by 10.
For reference see the legacy display code:
- amdgpu_pll_compute
- amdgpu_atombios_crtc_program_pll

This commit fixes set_pixel_clock_v3 by dividing the fractional
feedback divider passed to the function by 100000.

Fixes: 4562236b3b ("drm/amd/dc: Add dc display driver (v2)")
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Rodrigo Siqueira <siqueira@igalia.com>
Reviewed-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 027e7acc7e)
Cc: stable@vger.kernel.org
2025-08-18 18:00:34 -04:00
Timur Kristóf
f14ee2e7a8 drm/amd/display: Don't print errors for nonexistent connectors
When getting the number of connectors, the VBIOS reports
the number of valid indices, but it doesn't say which indices
are valid, and not every valid index has an actual connector.
If we don't find a connector on an index, that is not an error.

Considering these are not actual errors, don't litter the logs.

Fixes: 60df562814 ("drm/amd/display: handle invalid connector indices")
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Rodrigo Siqueira <siqueira@igalia.com>
Reviewed-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 249d4bc5f1)
2025-08-18 18:00:20 -04:00
Timur Kristóf
8246147f1f drm/amd/display: Don't warn when missing DCE encoder caps
On some GPUs the VBIOS just doesn't have encoder caps,
or maybe not for every encoder.

This isn't really a problem and it's handled well,
so let's not litter the logs with it.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Rodrigo Siqueira <siqueira@igalia.com>
Reviewed-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 33e0227ee9)
2025-08-18 18:00:11 -04:00
Timur Kristóf
7d07140d37 drm/amd/display: Fill display clock and vblank time in dce110_fill_display_configs
Also needed by DCE 6.
This way the code that gathers this info can be shared between
different DCE versions and doesn't have to be repeated.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Rodrigo Siqueira <siqueira@igalia.com>
Reviewed-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 8107432dff)
Cc: stable@vger.kernel.org
2025-08-18 17:59:53 -04:00
Timur Kristóf
669f73a26f drm/amd/display: Find first CRTC and its line time in dce110_fill_display_configs
dce110_fill_display_configs is shared between DCE 6-11, and
finding the first CRTC and its line time is relevant to DCE 6 too.
Move the code to find it from DCE 11 specific code.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Rodrigo Siqueira <siqueira@igalia.com>
Reviewed-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 4ab09785f8)
Cc: stable@vger.kernel.org
2025-08-18 17:59:34 -04:00
Timur Kristóf
1fc931be2f drm/amd/display: Adjust DCE 8-10 clock, don't overclock by 15%
Adjust the nominal (and performance) clocks for DCE 8-10,
and set them to 625 MHz, which is the value used by the legacy
display code in amdgpu_atombios_get_clock_info.

This was tested with Hawaii, Tonga and Fiji.
These GPUs can output 4K 60Hz (10-bit depth) at 625 MHz.

The extra 15% clock was added as a workaround for a Polaris issue
which uses DCE 11, and should not have been used on DCE 8-10 which
are already hardcoded to the highest possible display clock.
Unfortunately, the extra 15% was mistakenly copied and kept
even on code paths which don't affect Polaris.

This commit fixes that and also	adds a check to	make sure
not to exceed the maximum DCE 8-10 display clock.

Fixes: 8cd61c313d ("drm/amd/display: Raise dispclk value for Polaris")
Fixes: dc88b4a684 ("drm/amd/display: make clk mgr soc specific")
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Rodrigo Siqueira <siqueira@igalia.com>
Reviewed-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 1ae45b5d4f)
2025-08-18 17:59:02 -04:00
Timur Kristóf
cb7b7ae53b drm/amd/display: Don't overclock DCE 6 by 15%
The extra 15% clock was added as a workaround for a Polaris issue
which uses DCE 11, and should not have been used on DCE 6 which
is already hardcoded to the highest possible display clock.
Unfortunately, the extra 15% was mistakenly copied and kept
even on code paths which don't affect Polaris.

This commit fixes that and also adds a check to make sure
not to exceed the maximum DCE 6 display clock.

Fixes: 8cd61c313d ("drm/amd/display: Raise dispclk value for Polaris")
Fixes: dc88b4a684 ("drm/amd/display: make clk mgr soc specific")
Fixes: 3ecb3b794e ("drm/amd/display: dc/clk_mgr: add support for SI parts (v2)")
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Rodrigo Siqueira <siqueira@igalia.com>
Reviewed-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 427980c1cb)
Cc: stable@vger.kernel.org
2025-08-18 17:57:45 -04:00
Chenyuan Yang
7a2ca2ea64 drm/amd/display: Add null pointer check in mod_hdcp_hdcp1_create_session()
The function mod_hdcp_hdcp1_create_session() calls the function
get_first_active_display(), but does not check its return value.
The return value is a null pointer if the display list is empty.
This will lead to a null pointer dereference.

Add a null pointer check for get_first_active_display() and return
MOD_HDCP_STATUS_DISPLAY_NOT_FOUND if the function return null.

This is similar to the commit c3e9826a22
("drm/amd/display: Add null pointer check for get_first_active_display()").

Fixes: 2deade5ede ("drm/amd/display: Remove hdcp display state with mst fix")
Signed-off-by: Chenyuan Yang <chenyuan0y@gmail.com>
Reviewed-by: Alex Hung <alex.hung@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 5e43eb3cd7)
2025-08-18 17:57:12 -04:00
Tom Chung
66af73a1c3 drm/amd/display: Fix Xorg desktop unresponsive on Replay panel
[WHY & HOW]
IPS & self-fresh feature can cause vblank counter resets between
vblank disable and enable.
It may cause system stuck due to wait the vblank counter.

Call the drm_crtc_vblank_restore() during vblank enable to estimate
missed vblanks by using timestamps and update the vblank counter in
DRM.

It can make the vblank counter increase smoothly and resolve this issue.

Cc: Mario Limonciello <mario.limonciello@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Sun peng (Leo) Li <sunpeng.li@amd.com>
Signed-off-by: Tom Chung <chiahsuan.chung@amd.com>
Signed-off-by: Alex Hung <alex.hung@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 34d66bc7ff)
Cc: stable@vger.kernel.org
2025-08-18 17:56:53 -04:00
Mario Limonciello
07b93a5704 drm/amd/display: Avoid a NULL pointer dereference
[WHY]
Although unlikely drm_atomic_get_new_connector_state() or
drm_atomic_get_old_connector_state() can return NULL.

[HOW]
Check returns before dereference.

Cc: Mario Limonciello <mario.limonciello@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Hung <alex.hung@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 1e5e8d672f)
Cc: stable@vger.kernel.org
2025-08-18 17:56:25 -04:00
Alex Deucher
79e25cd06e drm/amdgpu/swm14: Update power limit logic
Take into account the limits from the vbios.  Ported
from the SMU13 code.

Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4352
Reviewed-by: Jesse Zhang <Jesse.Zhang@amd.com>
Reviewed-by: Kenneth Feng <kenneth.feng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 203cc7f1dd86f2c8de5c3c6182f19adac7c9c206)
Cc: stable@vger.kernel.org
2025-08-18 17:41:43 -04:00
Gabe Teeger
0aa86640eb drm/amd/display: Revert Add HPO encoder support to Replay
This reverts commits:
commit 1f26214d26 ("drm/amd/display: Add HPO encoder support to Replay")
commit 3bfce48b10 ("drm/amd/display: Add support for Panel Replay on DP1 eDP (panel_inst=1)")
due to visual confirm issue.

Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Gabe Teeger <gabe.teeger@amd.com>
Signed-off-by: Wayne Lin <wayne.lin@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 92f68f6a1b)
2025-08-18 17:41:43 -04:00
Thomas Zimmermann
3eb61d7cb7 Revert "drm/amdgpu: Use dma_buf from GEM object instance"
This reverts commit 515986100d.

The dma_buf field in struct drm_gem_object is not stable over the
object instance's lifetime. The field becomes NULL when user space
releases the final GEM handle on the buffer object. This resulted
in a NULL-pointer deref.

Workarounds in commit 5307dce878 ("drm/gem: Acquire references on
GEM handles for framebuffers") and commit f6bfc9afc7 ("drm/framebuffer:
Acquire internal references on GEM handles") only solved the problem
partially. They especially don't work for buffer objects without a DRM
framebuffer associated.

Hence, this revert to going back to using .import_attach->dmabuf.

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Reviewed-by: Simona Vetter <simona.vetter@ffwll.ch>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Link: https://lore.kernel.org/r/20250715082635.34974-1-tzimmermann@suse.de
2025-08-13 11:14:17 +02:00
Liu01 Tong
aa5fc4362f drm/amdgpu: fix task hang from failed job submission during process kill
During process kill, drm_sched_entity_flush() will kill the vm
entities. The following job submissions of this process will fail, and
the resources of these jobs have not been released, nor have the fences
been signalled, causing tasks to hang and timeout.

Fix by check entity status in amdgpu_vm_ready() and avoid submit jobs to
stopped entity.

v2: add amdgpu_vm_ready() check before amdgpu_vm_clear_freed() in
function amdgpu_cs_vm_handling().

Fixes: 1f02f2044b ("drm/amdgpu: Avoid extra evict-restore process.")
Signed-off-by: Liu01 Tong <Tong.Liu01@amd.com>
Signed-off-by: Lin.Cao <lincao12@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit f101c13a87)
2025-08-12 16:16:36 -04:00
Jack Xiao
040bc6d0e0 drm/amdgpu: fix incorrect vm flags to map bo
It should use vm flags instead of pte flags
to specify bo vm attributes.

Fixes: 7946340fa3 ("drm/amdgpu: Move csa related code to separate file")
Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Reviewed-by: Likun Gao <Likun.Gao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit b08425fa77)
2025-08-12 16:08:04 -04:00
YiPeng Chai
10ef476aad drm/amdgpu: fix vram reservation issue
The vram block allocation flag must be cleared
before making vram reservation, otherwise reserving
addresses within the currently freed memory range
will always fail.

Fixes: c9cad937c0 ("drm/amdgpu: add drm buddy support to amdgpu")
Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit d38eaf27de)
2025-08-12 16:07:45 -04:00
Frank Min
e67b8afcb6 drm/amdgpu: Add PSP fw version check for fw reserve GFX command
The fw reserved GFX command is only supported starting from PSP fw
version 0x3a0e14 and 0x3b0e0d. Older versions do not support this command.

Add a version guard to ensure the command is only used when the running
PSP fw meets the minimum version requirement.

This ensures backward compatibility and safe operation across fw
revisions.

Fixes: a3b7f9c306 ("drm/amdgpu: reclaim psp fw reservation memory region")
Signed-off-by: Frank Min <Frank.Min@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 065e23170a)
2025-08-12 16:07:31 -04:00
Linus Torvalds
ffe8ac927d Merge tag 'drm-next-2025-08-08' of https://gitlab.freedesktop.org/drm/kernel
Pull drm fixes from Dave Airlie:
 "This is the fixes that built up in the merge window, mostly amdgpu and
  xe with one i915 display fix, seems like things are pretty good for
  rc1.

  i915:
   - DP LPFS fixes

  xe:
   - SRIOV: PF fixes and removal of need of module param
   - Fix driver unbind around Devcoredump
   - Mark xe driver as BROKEN if kernel page size is not 4kB

  amdgpu:
   - GC 9.5.0 fixes
   - SMU fix
   - DCE 6 DC fixes
   - mmhub client ID fixes
   - VRR fix
   - Backlight fix
   - UserQ fix
   - Legacy reset fix
   - Misc fixes

  amdkfd:
   - CRIU fix
   - Debugfs fix"

* tag 'drm-next-2025-08-08' of https://gitlab.freedesktop.org/drm/kernel: (28 commits)
  drm/amdgpu: add missing vram lost check for LEGACY RESET
  drm/amdgpu/discovery: fix fw based ip discovery
  drm/amdkfd: Destroy KFD debugfs after destroy KFD wq
  amdgpu/amdgpu_discovery: increase timeout limit for IFWI init
  drm/amdgpu: Update SDMA firmware version check for user queue support
  drm/amdgpu: Add NULL check for asic_funcs
  drm/amd/display: Revert "drm/amd/display: Fix AMDGPU_MAX_BL_LEVEL value"
  drm/amd/display: fix a Null pointer dereference vulnerability
  drm/amd/display: Add primary plane to commits for correct VRR handling
  drm/amdgpu: update mmhub 3.3 client id mappings
  drm/amdgpu: update mmhub 3.0.1 client id mappings
  drm/amdgpu: Retain job->vm in amdgpu_job_prepare_job
  drm/amd/display: Fix DCE 6.0 and 6.4 PLL programming.
  drm/amd/display: Don't overwrite dce60_clk_mgr
  drm/amdkfd: Fix checkpoint-restore on multi-xcc
  drm/amd: Restore cached manual clock settings during resume
  drm/amd: Restore cached power limit during resume
  drm/amdgpu: Update external revid for GC v9.5.0
  drm/amdgpu: Update supported modes for GC v9.5.0
  Mark xe driver as BROKEN if kernel page size is not 4kB
  ...
2025-08-08 06:48:14 +03:00
Alex Deucher
81699fe81b drm/amdgpu: add missing vram lost check for LEGACY RESET
Legacy resets reset the memory controllers so VRAM contents
may be unreliable after reset.

Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit aae94897b6)
Cc: stable@vger.kernel.org
2025-08-06 16:54:25 -04:00
Alex Deucher
514678da56 drm/amdgpu/discovery: fix fw based ip discovery
We only need the fw based discovery table for sysfs.  No
need to parse it.  Additionally parsing some of the board
specific tables may result in incorrect data on some boards.
just load the binary and don't parse it on those boards.

Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4441
Fixes: 80a0e82829 ("drm/amdgpu/discovery: optionally use fw based ip discovery")
Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 62eedd150f)
Cc: stable@vger.kernel.org
2025-08-06 16:54:04 -04:00
Amber Lin
2e58401a24 drm/amdkfd: Destroy KFD debugfs after destroy KFD wq
Since KFD proc content was moved to kernel debugfs, we can't destroy KFD
debugfs before kfd_process_destroy_wq. Move kfd_process_destroy_wq prior
to kfd_debugfs_fini to fix a kernel NULL pointer problem. It happens
when /sys/kernel/debug/kfd was already destroyed in kfd_debugfs_fini but
kfd_process_destroy_wq calls kfd_debugfs_remove_process. This line
    debugfs_remove_recursive(entry->proc_dentry);
tries to remove /sys/kernel/debug/kfd/proc/<pid> while
/sys/kernel/debug/kfd is already gone. It hangs the kernel by kernel
NULL pointer.

Signed-off-by: Amber Lin <Amber.Lin@amd.com>
Reviewed-by: Eric Huang <jinhuieric.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 0333052d90)
Cc: stable@vger.kernel.org
2025-08-06 16:52:08 -04:00
Xaver Hugl
928587381b amdgpu/amdgpu_discovery: increase timeout limit for IFWI init
With a timeout of only 1 second, my rx 5700XT fails to initialize,
so this increases the timeout to 2s.

Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3697
Signed-off-by: Xaver Hugl <xaver.hugl@kde.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 9ed3d7bdf2)
Cc: stable@vger.kernel.org
2025-08-06 16:51:26 -04:00
Jesse.Zhang
124ffa2970 drm/amdgpu: Update SDMA firmware version check for user queue support
This commit fixes a firmware version check for enabling user queue
support in SDMA v7.0. The previous version check (7836028) was
incorrect and could lead to issues with PROTECTED_FENCE_SIGNAL
commands causing register conflicts between MCU_DBG0 and MCU_DBG1.

Fixes: 8c011408ed ("drm/amdgpu/sdma7: add ucode version checks for userq support")
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 92e2449241)
Cc: stable@vger.kernel.org
2025-08-04 15:48:14 -04:00
Lijo Lazar
c2fe914d50 drm/amdgpu: Add NULL check for asic_funcs
If driver load fails too early, asic_funcs pointer remains unassigned.
Add NULL check to sanitize unwind path.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 582bf7c515)
Cc: stable@vger.kernel.org
2025-08-04 15:47:50 -04:00
Mario Limonciello
8e6a18cbf3 drm/amd/display: Revert "drm/amd/display: Fix AMDGPU_MAX_BL_LEVEL value"
This reverts commit 66abb99699.
This broke custom brightness curves but it wasn't obvious because
of other related changes. Custom brightness curves are always
from a 0-255 input signal. The correct fix was to fix the default
value which was done by [1].

Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4412
Link: https://lore.kernel.org/amd-gfx/0f094c4b-d2a3-42cd-824c-dc2858a5618d@kernel.org/T/#m69f875a7e69aa22df3370b3e3a9e69f4a61fdaf2
Reviewed-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Roman Li <roman.li@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 6ec8a5cbec)
Cc: stable@vger.kernel.org
2025-08-04 15:46:36 -04:00
Siyang Liu
1bcf63a443 drm/amd/display: fix a Null pointer dereference vulnerability
[Why]
A null pointer dereference vulnerability exists in the AMD display driver's
(DC module) cleanup function dc_destruct().
When display control context (dc->ctx) construction fails
(due to memory allocation failure), this pointer remains NULL.
During subsequent error handling when dc_destruct() is called,
there's no NULL check before dereferencing the perf_trace member
(dc->ctx->perf_trace), causing a kernel null pointer dereference crash.

[How]
Check if dc->ctx is non-NULL before dereferencing.

Link: https://lore.kernel.org/r/tencent_54FF4252EDFB6533090A491A25EEF3EDBF06@qq.com
Co-developed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
(Updated commit text and removed unnecessary error message)
Signed-off-by: Siyang Liu <Security@tencent.com>
Signed-off-by: Roman Li <roman.li@amd.com>
Reviewed-by: Alex Hung <alex.hung@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 9dd8e2ba26)
Cc: stable@vger.kernel.org
2025-08-04 15:44:55 -04:00
Michel Dänzer
3477c1b097 drm/amd/display: Add primary plane to commits for correct VRR handling
amdgpu_dm_commit_planes calls update_freesync_state_on_stream only for
the primary plane. If a commit affects a CRTC but not its primary plane,
it would previously not trigger a refresh cycle or affect LFC, violating
current UAPI semantics.

Fixes e.g. atomic commits affecting only the cursor plane being limited
to the minimum refresh rate.

Don't do this for the legacy cursor ioctls though, it would break the
UAPI semantics for those.

Suggested-by: Xaver Hugl <xaver.hugl@kde.org>
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3034
Signed-off-by: Michel Dänzer <mdaenzer@redhat.com>
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit cc7bfba959)
Cc: stable@vger.kernel.org
2025-08-04 15:43:58 -04:00
Alex Deucher
9f9bddfa31 drm/amdgpu: update mmhub 3.3 client id mappings
Update the client id mapping so the correct clients
get printed when there is a mmhub page fault.

v2: fix typos spotted by David Wu.
v3: fix additional typo spotted by David.

Reviewed-by: David (Ming Qiang) Wu <David.Wu3@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit e932f4779a)
Cc: stable@vger.kernel.org
2025-08-04 15:42:15 -04:00
Alex Deucher
0bae62cc98 drm/amdgpu: update mmhub 3.0.1 client id mappings
Update the client id mapping so the correct clients
get printed when there is a mmhub page fault.

Reviewed-by: David (Ming Qiang) Wu <David.Wu3@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 2a2681eda7)
Cc: stable@vger.kernel.org
2025-08-04 15:41:52 -04:00
YuanShang
c00d8b79fd drm/amdgpu: Retain job->vm in amdgpu_job_prepare_job
The field job->vm is used in function amdgpu_job_run to get the page
table re-generation counter and decide whether the job should be skipped.

Specifically, function amdgpu_vm_generation checks if the VM is valid for this job to use.
For instance, if a gfx job depends on a cancelled sdma job from entity vm->delayed,
then the gfx job should be skipped.

Fixes: 26c95e838e ("drm/amdgpu: set the VM pointer to NULL in amdgpu_job_prepare")
Signed-off-by: YuanShang <YuanShang.Mao@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit ed76936c6b)
Cc: stable@vger.kernel.org
2025-08-04 15:41:09 -04:00
Timur Kristóf
1c8dc3e088 drm/amd/display: Fix DCE 6.0 and 6.4 PLL programming.
Apparently, both DCE 6.0 and 6.4 have 3 PLLs, but PLL0 can only
be used for DP. Make sure to initialize the correct amount of PLLs
in DC for these DCE versions and use PLL0 only for DP.

Also, on DCE 6.0 and 6.4, the PLL0 needs to be powered on at
initialization as opposed to DCE 6.1 and 7.x which use a different
clock source for DFS.

The following functions were used as reference from the	old
radeon driver implementation of	DCE 6.x:
- radeon_atom_pick_pll
- atombios_crtc_set_disp_eng_pll

Reviewed-by: Rodrigo Siqueira <siqueira@igalia.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 35222b5934)
Cc: stable@vger.kernel.org
2025-08-04 15:39:42 -04:00
Timur Kristóf
4db9cd5548 drm/amd/display: Don't overwrite dce60_clk_mgr
dc_clk_mgr_create accidentally overwrites the dce60_clk_mgr
with the dce_clk_mgr, causing incorrect behaviour on DCE6.
Fix it by removing the extra dce_clk_mgr_construct.

Fixes: 62eab49faa ("drm/amd/display: hide VGH asic specific structs")
Reviewed-by: Rodrigo Siqueira <siqueira@igalia.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit bbddcbe36a)
Cc: stable@vger.kernel.org
2025-08-04 15:39:21 -04:00
David Yat Sin
f6c0f3d244 drm/amdkfd: Fix checkpoint-restore on multi-xcc
GPUs with multi-xcc have multiple MQDs per queue. This patch saves and
restores all the MQDs within the partition.

Signed-off-by: David Yat Sin <David.YatSin@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit a578f2a58c)
Cc: stable@vger.kernel.org
2025-08-04 15:38:49 -04:00
Mario Limonciello
796ff8a7e0 drm/amd: Restore cached manual clock settings during resume
If the SCLK limits have been set before S3 they will not
be restored. The limits are however cached in the driver and so
they can be restored by running a commit sequence during resume.

Acked-by: Alex Deucher <alexander.deucher@amd.com>
Link: https://lore.kernel.org/r/20250725031222.3015095-3-superm1@kernel.org
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 4e9526924d)
Cc: stable@vger.kernel.org
2025-08-04 15:37:26 -04:00
Mario Limonciello
ed4efe426a drm/amd: Restore cached power limit during resume
The power limit will be cached in smu->current_power_limit but
if the ASIC goes into S3 this value won't be restored.

Restore the value during SMU resume.

Acked-by: Alex Deucher <alexander.deucher@amd.com>
Link: https://lore.kernel.org/r/20250725031222.3015095-2-superm1@kernel.org
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 26a609e053)
Cc: stable@vger.kernel.org
2025-08-04 15:37:05 -04:00
Lijo Lazar
05c8b69051 drm/amdgpu: Update external revid for GC v9.5.0
Use different external revid for GC v9.5.0 SOCs.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 21c6764ed4)
Cc: stable@vger.kernel.org
2025-08-04 15:32:37 -04:00
Lijo Lazar
389d79a195 drm/amdgpu: Update supported modes for GC v9.5.0
For GC v9.5.0 SOCs, both CPX and QPX compute modes are also supported in
NPS2 mode.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Acked-by: Mangesh Gadre <Mangesh.Gadre@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 9d1ac25c7f)
Cc: stable@vger.kernel.org
2025-08-04 15:32:07 -04:00
Linus Torvalds
89748acdf2 Merge tag 'drm-next-2025-08-01' of https://gitlab.freedesktop.org/drm/kernel
Pull drm fixes from Dave Airlie:
 "Just a bunch of amdgpu and xe fixes.

  amdgpu:
   - DSC divide by 0 fix
   - clang fix
   - DC debugfs fix
   - Userq fixes
   - Avoid extra evict-restore with KFD
   - Backlight fix
   - Documentation fix
   - RAS fix
   - Add new kicker handling
   - DSC fix for DCN 3.1.4
   - PSR fix
   - Atomic fix
   - DC reset fixes
   - DCN 3.0.1 fix
   - MMHUB client mapping fix

  xe:
   - Fix BMG probe on unsupported mailbox command
   - Fix OA static checker warning about null gt
   - Fix a NULL vs IS_ERR() bug in xe_i2c_register_adapter
   - Fix missing unwind goto in GuC/HuC
   - Don't register I2C devices if VF
   - Clear whole GuC g2h_fence during initialization
   - Avoid call kfree for drmm_kzalloc
   - Fix pci_dev reference leak on configfs
   - SRIOV: Disable CSC support on VF

* tag 'drm-next-2025-08-01' of https://gitlab.freedesktop.org/drm/kernel: (24 commits)
  drm/xe/vf: Disable CSC support on VF
  drm/amdgpu: update mmhub 4.1.0 client id mappings
  drm/amd/display: Allow DCN301 to clear update flags
  drm/amd/display: Pass up errors for reset GPU that fails to init HW
  drm/amd/display: Only finalize atomic_obj if it was initialized
  drm/amd/display: Avoid configuring PSR granularity if PSR-SU not supported
  drm/amd/display: Disable dsc_power_gate for dcn314 by default
  drm/amdgpu: add kicker fws loading for gfx12/smu14/psp14
  drm/amd/amdgpu: fix missing lock for cper.ring->rptr/wptr access
  drm/amd/display: Fix misuse of /** to /* in 'dce_i2c_hw.c'
  drm/amd/display: fix initial backlight brightness calculation
  drm/amdgpu: Avoid extra evict-restore process.
  drm/amdgpu: track whether a queue is a kernel queue in amdgpu_mqd_prop
  drm/amdgpu: check if hubbub is NULL in debugfs/amdgpu_dm_capabilities
  drm/amdgpu: Initialize data to NULL in imu_v12_0_program_rlc_ram()
  drm/amd/display: Fix divide by zero when calculating min ODM factor
  drm/xe/configfs: Fix pci_dev reference leak
  drm/xe/hw_engine_group: Avoid call kfree() for drmm_kzalloc()
  drm/xe/guc: Clear whole g2h_fence during initialization
  drm/xe/vf: Don't register I2C devices if VF
  ...
2025-07-31 21:47:36 -07:00