Commit Graph

2073 Commits

Author SHA1 Message Date
Yang Wang
ee8d07cd57 drm/amd/pm: fix race in power state check before mutex lock
The power state check in amdgpu_dpm_set_powergating_by_smu() is done
before acquiring the pm mutex, leading to a race condition where:
1. Thread A checks state and thinks no change is needed
2. Thread B acquires mutex and modifies the state
3. Thread A returns without updating state, causing inconsistency

Fix this by moving the mutex lock before the power state check,
ensuring atomicity of the state check and modification.

Fixes: 6ee27ee27b ("drm/amd/pm: avoid duplicate powergate/ungate setting")
Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Kenneth Feng <kenneth.feng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 7a3fbdfd19)
2026-01-27 18:25:15 -05:00
Yang Wang
239d0ccf56 drm/amd/pm: fix smu v14 soft clock frequency setting issue
v1:
resolve the issue where some freq frequencies cannot be set correctly
due to insufficient floating-point precision.

v2:
patch this convert on 'max' value only.

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 53868dd877)
Cc: stable@vger.kernel.org
2026-01-27 18:24:21 -05:00
Yang Wang
c764b7af15 drm/amd/pm: fix smu v13 soft clock frequency setting issue
v1:
resolve the issue where some freq frequencies cannot be set correctly
due to insufficient floating-point precision.

v2:
patch this convert on 'max' value only.

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 6194f60c70)
Cc: stable@vger.kernel.org
2026-01-27 18:24:00 -05:00
Timur Kristóf
764a90eb02 drm/amd/pm: Workaround SI powertune issue on Radeon 430 (v2)
Radeon 430 and 520 are OEM GPUs from 2016~2017
They have the same device id: 0x6611 and revision: 0x87

On the Radeon 430, powertune is buggy and throttles the GPU,
never allowing it to reach its maximum SCLK. Work around this
bug by raising the TDP limits we program to the SMC from
24W (specified by the VBIOS on Radeon 430) to 32W.

Disabling powertune entirely is	not a viable workaround,
because	it causes the Radeon 520 to heat up above 100 C,
which I prefer to avoid.

Additionally, revise the maximum SCLK limit. Considering the
above issue, these GPUs never reached a high SCLK on Linux,
and the workarounds were added before the GPUs were released,
so the workaround likely didn't target these specifically.
Use 780 MHz (the maximum SCLK according to the VBIOS on the
Radeon 430). Note that the Radeon 520 VBIOS has a higher
maximum SCLK: 905 MHz, but in practice it doesn't seem to
perform better with the higher clock, only heats up more.

v2:
Move the workaround to si_populate_smc_tdp_limits.

Fixes: 841686df9f ("drm/amdgpu: add SI DPM support (v4)")
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 966d70f1e1)
2026-01-21 14:55:33 -05:00
Timur Kristóf
d5077426e1 drm/amd/pm: Don't clear SI SMC table when setting power limit
There is no reason to clear the SMC table.
We also don't need to recalculate the power limit then.

Fixes: 841686df9f ("drm/amdgpu: add SI DPM support (v4)")
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit e214d62625)
2026-01-21 14:55:33 -05:00
Timur Kristóf
4ca284c6d1 drm/amd/pm: Fix si_dpm mmCG_THERMAL_INT setting
Use WREG32 to write mmCG_THERMAL_INT.
This is a direct access register.

Fixes: 841686df9f ("drm/amdgpu: add SI DPM support (v4)")
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 2555f4e4a7)
2026-01-21 14:53:51 -05:00
Yang Wang
90dbc0bc2a drm/amd/pm: fix smu overdrive data type wrong issue on smu 14.0.2
resolving the issue of incorrect type definitions potentially causing calculation errors.

Fixes: 54f7f3ca98 ("drm/amdgpu/swm14: Update power limit logic")
Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit e3a03d0ae1)
2026-01-14 15:05:52 -05:00
Perry Yuan
0de604d035 drm/amd/pm: Disable MMIO access during SMU Mode 1 reset
During Mode 1 reset, the ASIC undergoes a reset cycle and becomes
temporarily inaccessible via PCIe. Any attempt to access MMIO registers
during this window (e.g., from interrupt handlers or other driver threads)
can result in uncompleted PCIe transactions, leading to NMI panics or
system hangs.

To prevent this, set the `no_hw_access` flag to true immediately after
triggering the reset. This signals other driver components to skip
register accesses while the device is offline.

A memory barrier `smp_mb()` is added to ensure the flag update is
globally visible to all cores before the driver enters the sleep/wait
state.

Signed-off-by: Perry Yuan <perry.yuan@amd.com>
Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 7edb503fe4)
2026-01-07 17:24:10 -05:00
Yang Wang
dc8a887de1 drm/amd/pm: force send pcie parmater on navi1x
v1:
the PMFW didn't initialize the PCIe DPM parameters
and requires the KMD to actively provide these parameters.

v2:
clean & remove unused code logic (lijo)

Fixes: 1a18607c07 ("drm/amd/pm: override pcie dpm parameters only if it is necessary")
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4671
Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit b0dbd5db7c)
2026-01-05 17:28:45 -05:00
Yang Wang
4f74c2dd97 drm/amd/pm: fix wrong pcie parameter on navi1x
fix wrong pcie dpm parameter on navi1x

Fixes: 1a18607c07 ("drm/amd/pm: override pcie dpm parameters only if it is necessary")
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4671
Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Co-developed-by: Kenneth Feng <kenneth.feng@amd.com>
Signed-off-by: Kenneth Feng <kenneth.feng@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 5c5189cf4b)
2026-01-05 17:24:46 -05:00
mythilam
7a372e214f drm/amd/pm: restore SCLK settings after S0ix resume
User-configured SCLK(GPU core clock)frequencies were not persisting
across S0ix suspend/resume cycles on smu v14 hardware.
The issue occurred because of the code resetting clock frequency
to zero during resume.

This patch addresses the problem by:
- Preserving user-configured values in driver and sets the
  clock frequency across resume
- Preserved settings are sent to the hardware during resume

Signed-off-by: mythilam <mythilam@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 20ba98326f)
2025-12-16 14:16:48 -05:00
Yang Wang
5de8ce0f37 drm/amd/pm: adjust the visibility of pp_table sysfs node
v1:
- make pp_table invisible on VF mode (only valid on BM)
- make pp_table invisible on Mi* chips (Not supported)
- make pp_table invisible if scpm feature is enabled.

v2:
move pp_table invisible code logic into amdgpu_dpm_get_pp_table() function.

v3:
add table buffer pointer check both on powerplay & swsmu.

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-24 12:36:12 -05:00
Yang Wang
e12603bf2c drm/amd/pm: fix amdgpu_irq enabled counter unbalanced on smu v11.0
v1:
- fix amdgpu_irq enabled counter unbalanced issue on smu_v11_0_disable_thermal_alert.

v2:
- re-enable smu thermal alert to make amdgpu irq counter balance for smu v11.0 if in runpm state

[75582.361561] ------------[ cut here ]------------
[75582.361565] WARNING: CPU: 42 PID: 533 at drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c:639 amdgpu_irq_put+0xd8/0xf0 [amdgpu]
...
[75582.362211] Tainted: [E]=UNSIGNED_MODULE
[75582.362214] Hardware name: GIGABYTE MZ01-CE0-00/MZ01-CE0-00, BIOS F14a 08/14/2020
[75582.362218] Workqueue: pm pm_runtime_work
[75582.362225] RIP: 0010:amdgpu_irq_put+0xd8/0xf0 [amdgpu]
[75582.362556] Code: 31 f6 31 ff e9 c9 bf cf c2 44 89 f2 4c 89 e6 4c 89 ef e8 db fc ff ff 5b 41 5c 41 5d 41 5e 5d 31 d2 31 f6 31 ff e9 a8 bf cf c2 <0f> 0b eb c3 b8 fe ff ff ff eb 97 e9 84 e8 8b 00 0f 1f 84 00 00 00
[75582.362560] RSP: 0018:ffffd50d51297b80 EFLAGS: 00010246
[75582.362564] RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000
[75582.362568] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[75582.362570] RBP: ffffd50d51297ba0 R08: 0000000000000000 R09: 0000000000000000
[75582.362573] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8e72091d2008
[75582.362576] R13: ffff8e720af80000 R14: 0000000000000000 R15: ffff8e720af80000
[75582.362579] FS:  0000000000000000(0000) GS:ffff8e9158262000(0000) knlGS:0000000000000000
[75582.362582] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[75582.362585] CR2: 000074869d040c14 CR3: 0000001e37a3e000 CR4: 00000000003506f0
[75582.362588] Call Trace:
[75582.362591]  <TASK>
[75582.362597]  smu_v11_0_disable_thermal_alert+0x17/0x30 [amdgpu]
[75582.362983]  smu_smc_hw_cleanup+0x79/0x4f0 [amdgpu]
[75582.363375]  smu_suspend+0x92/0x110 [amdgpu]
[75582.363762]  ? gfx_v10_0_hw_fini+0xd5/0x150 [amdgpu]
[75582.364098]  amdgpu_ip_block_suspend+0x27/0x80 [amdgpu]
[75582.364377]  ? timer_delete_sync+0x10/0x20
[75582.364384]  amdgpu_device_ip_suspend_phase2+0x190/0x450 [amdgpu]
[75582.364665]  amdgpu_device_suspend+0x1ae/0x2f0 [amdgpu]
[75582.364948]  amdgpu_pmops_runtime_suspend+0xf3/0x1f0 [amdgpu]
[75582.365230]  pci_pm_runtime_suspend+0x6d/0x1f0
[75582.365237]  ? __pfx_pci_pm_runtime_suspend+0x10/0x10
[75582.365242]  __rpm_callback+0x4c/0x190
[75582.365246]  ? srso_return_thunk+0x5/0x5f
[75582.365252]  ? srso_return_thunk+0x5/0x5f
[75582.365256]  ? ktime_get_mono_fast_ns+0x43/0xe0
[75582.365263]  rpm_callback+0x6e/0x80
[75582.365267]  rpm_suspend+0x124/0x5f0
[75582.365271]  ? srso_return_thunk+0x5/0x5f
[75582.365275]  ? __schedule+0x439/0x15e0
[75582.365281]  ? srso_return_thunk+0x5/0x5f
[75582.365285]  ? __queue_delayed_work+0xb8/0x180
[75582.365293]  pm_runtime_work+0xc6/0xe0
[75582.365297]  process_one_work+0x1a1/0x3f0
[75582.365303]  worker_thread+0x2ba/0x3d0
[75582.365309]  kthread+0x107/0x220
[75582.365313]  ? __pfx_worker_thread+0x10/0x10
[75582.365318]  ? __pfx_kthread+0x10/0x10
[75582.365323]  ret_from_fork+0xa2/0x120
[75582.365328]  ? __pfx_kthread+0x10/0x10
[75582.365332]  ret_from_fork_asm+0x1a/0x30
[75582.365343]  </TASK>
[75582.365345] ---[ end trace 0000000000000000 ]---
[75582.365350] amdgpu 0000:05:00.0: amdgpu: Fail to disable thermal alert!
[75582.365379] amdgpu 0000:05:00.0: amdgpu: suspend of IP block <smu> failed -22

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-24 12:34:31 -05:00
Timur Kristóf
53cc70f8f1 drm/amd/pm/si: Hook up VCE1 to SI DPM
On SI GPUs, the SMC needs to be aware of whether or not the VCE1
is used. The VCE1 is enabled/disabled through the DPM code.

Also print VCE clocks in amdgpu_pm_info.
Users can inspect the current power state using:
cat /sys/kernel/debug/dri/<card>/amdgpu_pm_info

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-11 21:54:18 -05:00
Asad Kamal
9dc8d07ce0 drm/amd/pm: Remove power2_average node
SOC power consumption is reported by power1_average.
power2_cap_default/min/max only represent second level limits
and don't represent a different type of power or power consumption
by a subsection of the SOC. Therefore power2_average does not serve any
purpose and hence removing power2_average sysfs node

Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-11 21:54:17 -05:00
Asad Kamal
af2e61d61b drm/amd/pm: Enable ppt1 caps for smu_v13_0_12
Enable ppt1 caps to fetch and configure ppt1 for smu_v13_0_12

Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-11 21:54:17 -05:00
Asad Kamal
12c958d1db drm/amd/pm: Expose ppt1 limit for gc_v9_5_0
Expose power2_cap hwmon node for retrieving and configuring ppt1
limit on supported boards for gc_v9_5_0

v2: Remove version check (Lijo)

v3: Remove power2_average (Lijo)

v4: Put back power2_average, will be removed separately (Lijo)

Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-11 21:54:16 -05:00
Asad Kamal
d88edb2460 drm/amd/pm: Add ppt1 support for smu_v13_0_12
Add support to configure and retrieve ppt1 limit for smu_v13_0_12

v2: Add update_caps function and update ppt1 cap based on max ppt1
value, optimize the return values (Lijo)

v3: Add Null ptr check, return not supported in case of invalid
level/type (Lijo)

Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-11 21:54:16 -05:00
Asad Kamal
efa6bffae5 drm/amd/pm: Update pmfw headers for smu_v13_0_12
Update pmfw headers for smu_v13_0_12 to include ppt1 messages and
static parameters

Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-11 21:54:16 -05:00
Gangliang Xie
0ea8176ce6 drm/amd/pm: remove unnecessary prints for smu busy
smu busy is a normal case when calling SMU_MSG_GetBadPageCount, so no need
to print error status at each time.Instead, only print error status when
timeout given by user is reached.

Signed-off-by: Gangliang Xie <ganglxie@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-11 21:54:15 -05:00
Asad Kamal
42993bcf1c drm/amd/pm: Add NULL check for power limit
Add NULL check for smu power limit pointer

v2: Update error code on failure (Lijo)

Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-11 21:54:14 -05:00
Asad Kamal
2e640e8e7b drm/amd/pm: Update default power1_cap
Update default power1_cap to max limit for smu_v13_0_6 and smu_v13_0_12

Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-06 10:02:19 -05:00
Gangliang Xie
a448c40ff2 drm/amd/pm: check pmfw eeprom feature bit
get and check the pmfw eeprom feature bit to
decide if pmfw eeprom is supported

Signed-off-by: Gangliang Xie <ganglxie@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-06 09:56:19 -05:00
Gangliang Xie
f5346a176c drm/amd/pm: add smu ras driver framework
add functions to get smu ras driver

Signed-off-by: Gangliang Xie <ganglxie@amd.com>
Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04 11:53:58 -05:00
Gangliang Xie
77dbd7c0a2 drm/amd/pm: implement ras_smu_drv interface for smu v13.0.12
implement ras_smu_drv interface for smu v13.0.12

Signed-off-by: Gangliang Xie <ganglxie@amd.com>
Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04 11:53:58 -05:00
Gangliang Xie
0c6f09e65b drm/amd/pm: add new message definitions for pmfw eeprom interface
Add new message definitions for pmfw eeprom interface

Signed-off-by: Gangliang Xie <ganglxie@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04 11:53:58 -05:00
Alex Deucher
fd39b5a583 drm/amdgpu/smu: Handle S0ix for vangogh
Fix the flows for S0ix.  There is no need to stop
rlc or reintialize PMFW in S0ix.

Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4659
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Reported-by: Antheas Kapenekakis <lkml@antheas.dev>
Tested-by: Antheas Kapenekakis <lkml@antheas.dev>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04 11:53:22 -05:00
Lijo Lazar
c83fd2a665 drm/amd/pm: Update SMUv13.0.12 partition metrics
Update SMUv13.0.12 partition metrics to partition metrics v1.1 schema.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04 11:53:21 -05:00
Lijo Lazar
56aeca499a drm/amd/pm: Update SMUv13.0.6 partition metrics
For SMU v13.0.6 SOCs, move to partition metrics v1.1 schema

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04 11:53:21 -05:00
Lijo Lazar
b480f573a8 drm/amd/pm: Use gpu metrics 1.9 for SMUv13.0.12
Fill and publish GPU metrics in v1.9 format for SMUv13.0.12 SOCs

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04 11:53:21 -05:00
Alex Deucher
960e30a61e drm/amdgpu: Drop PMFW RLC notifier from amdgpu_device_suspend()
For S3 on vangogh, PMFW needs to be notified before the
driver powers down RLC.  This already happens in smu_disable_dpms()
so drop the superfluous call in amdgpu_device_suspend().

Co-developed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04 11:52:48 -05:00
Timur Kristóf
9f1cb2c3fa drm/amd/pm/si: Delete unused structs and fields
The contents of si_dpm.h seem to have been copied from the
old radeon driver, including a lot of structs and fields which
were only relevant to GPU generations even older than SI.

A lot of these can be deleted without causing much churn to the
actual SI DPM code. Let's delete them to make the code easier
to understand.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04 11:52:47 -05:00
Lijo Lazar
b4f748f22d drm/amd/pm: Use gpu metrics 1.9 for SMUv13.0.6
Fill and publish GPU metrics in v1.9 format for SMUv13.0.6 SOCs

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04 11:52:47 -05:00
Lijo Lazar
c3cd00fea6 drm/amd/pm: Add helper functions for gpu metrics
Add helper macros to define metrics struct definitions. It will define
structs with field type followed by actual field. A helper macro is also
added to initialize the field encoding for all fields and to initialize
the field members to 0xFFs.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04 11:52:46 -05:00
Yang Wang
4c4c138a1c drm/amd/pm: fix missing device_attr cleanup in amdgpu_pm_sysfs_init()
Use the correct label to complete all cleanup work.

Fixes: 4d154b1ca5 ("drm/amd/pm: Add support for DPM policies")
Fixes: 25e82f2e2c ("drm/amd/pm: Add temperature metrics sysfs entry")
Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04 11:52:46 -05:00
Yang Wang
8f94d5d0d7 drm/amd/pm: fix the issue of size calculation error for smu 13.0.6
v1:
the driver should handle return value of smu_v13_0_6_printk_clk_levels()
to return the correct size for sysfs reads.

v2:
fix the issue of size calculation error in smu_v13_0_6_print_clks()

Fixes: cdfdec6f16 ("drm/amd/pm: Avoid writing nulls into `pp_od_clk_voltage`")
Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04 11:52:46 -05:00
Asad Kamal
d80391dd03 drm/amdgpu: Remove invalidate and flush hdp macros
Remove amdgpu_asic_flush_hdp & amdgpu_asic_invalidate_hdp functions and
directly use the mapped ones

Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04 11:33:54 -05:00
Sakari Ailus
ef4a4b8781 drm/amd: Remove redundant pm_runtime_mark_last_busy() calls
pm_runtime_put_autosuspend(), pm_runtime_put_sync_autosuspend(),
pm_runtime_autosuspend() and pm_request_autosuspend() now include a call
to pm_runtime_mark_last_busy(). Remove the now-redundant explicit call to
pm_runtime_mark_last_busy().

Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-10-28 11:31:45 -04:00
Jesse.Zhang
6f1ee58a5e drm/amd/pm: smu13: Enable VCN_RESET for pgm 7 with appropriate firmware version
This patch extends the VCN_RESET capability check to include pgm 7 when the firmware version is 0x07551400 or newer.

Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-10-28 09:54:36 -04:00
John Smith
92b0a6ae66 drm/amd/pm/powerplay/smumgr: Fix PCIeBootLinkLevel value on Iceland
Previously this was initialized with zero which represented PCIe Gen
1.0 instead of using the
maximum value from the speed table which is the behaviour of all other
smumgr implementations.

Fixes: 18aafc59b1 ("drm/amd/powerplay: implement fw related smu interface for iceland.")
Signed-off-by: John Smith <itistotalbotnet@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-10-28 09:50:31 -04:00
John Smith
c52238c9fb drm/amd/pm/powerplay/smumgr: Fix PCIeBootLinkLevel value on Fiji
Previously this was initialized with zero which represented PCIe Gen
1.0 instead of using the
maximum value from the speed table which is the behaviour of all other
smumgr implementations.

Fixes: 18edef19ea ("drm/amd/powerplay: implement fw image related smu interface for Fiji.")
Signed-off-by: John Smith <itistotalbotnet@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-10-28 09:49:58 -04:00
Yang Wang
fca0c66b22 drm/amd/pm: fix smu table id bound check issue in smu_cmn_update_table()
'table_index' is a variable defined by the smu driver (kmd)
'table_id' is a variable defined by the hw smu (pmfw)

This code should use table_index as a bounds check.

Fixes: caad2613dc ("drm/amd/powerplay: move table setting common code to smu_cmn.c")
Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-10-28 09:48:55 -04:00
Ilya Zlobintsev
cdfdec6f16 drm/amd/pm: Avoid writing nulls into pp_od_clk_voltage
Calling `smu_cmn_get_sysfs_buf` aligns the
offset used by `sysfs_emit_at` to the current page boundary, which was
previously directly returned from the various `print_clk_levels`
implementations to be added to the buffer position.
Instead, only the relative offset showing how much was written
to the buffer should be returned, regardless of how it was changed
for alignment purposes.

Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Ilya Zlobintsev <ilya.zlobintsev@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-10-20 18:25:17 -04:00
Mario Limonciello
4b6ec94fda drm/amd: Drop calls to restore power limit and clock from smu_resume()
User requested power limits and clock settings are already restored as
part of smu_restore_dpm_user_profile(). It's unnecessary to call the
same restore as part of smu_resume().

Revert the following commits to drop that extra restore:
commit ed4efe426a ("drm/amd: Restore cached power limit during resume")
commit 796ff8a7e0 ("drm/amd: Restore cached manual clock settings during resume")
commit f9b80514a7 ("drm/amd: Only restore cached manual clock settings in restore if OD enabled")

Suggested-by: Lijo Lazar <Lijo.Lazar@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-10-13 14:14:36 -04:00
YiPeng Chai
80e462c5b1 drm/amd/pm: export a function amdgpu_smu_ras_send_msg to allow send msg directly
provide a interface that allows ras client send msg to smu/pmfw directly.

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-10-13 14:14:36 -04:00
Lijo Lazar
2b5b3f9b69 drm/amd/pm: Grant interface access after full init
Allow access to user interfaces like sysfs/hwmon only after full
initialization of the device. When device is part of XGMI hive and a
reset is required during initialization, the inteface files will be
created as part of minimal device initialization. Full initialization of
the device will be done only after all devices in XGMI hive are probed
and a reset is done together on all.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-10-13 14:14:35 -04:00
Mario Limonciello
3cd7ceee9a drm/amd: Save and restore all limit types
Vangogh has separate limits for default PPT and fast PPT. Add
infrastructure to save both of these limits and restore both of them.

Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-10-13 14:14:35 -04:00
Mario Limonciello
56a207c39d drm/amd: Remove second call to set_power_limit()
The min/max limits only make sense for default PPT. Restructure
smu_set_power_limit() to only use them in that case.

Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-10-13 14:14:35 -04:00
Mario Limonciello
5f4f49a41c drm/amd: Stop overloading power limit with limit type
When passed around internally the upper 8 bits of power limit include
the limit type. This is non-obvious without digging into the nuances
of each function. Instead pass the limit type as an argument to all
applicable layers.

Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-10-13 14:14:35 -04:00
Mario Limonciello
861fc60b17 drm/amd: Remove some unncessary header includes
Unnecessary headers can slow down the build, drop em.

No intended functional changes.

Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Tested-by: Robert Beckett <bob.beckett@collabora.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-10-13 14:14:33 -04:00