The smu needs to get the rlc power down message to sync the rlc state
with smu, the rlc state updating message need to be sent at while smu
begin suspend sequence , otherwise SMU will crash while RLC state is not
notified by driver, and rlc state probally changed after that
notification, so it needs to notify rlc state to smu at the end of the
suspend sequence in amdgpu_device_suspend() that can make sure the rlc
state is correctly set to SMU.
[ 101.000590] amdgpu 0000:03:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x0000001E SMN_C2PMSG_82:0x00000000
[ 101.000598] amdgpu 0000:03:00.0: amdgpu: Failed to disable gfxoff!
[ 110.838026] amdgpu 0000:03:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x0000001E SMN_C2PMSG_82:0x00000000
[ 110.838035] amdgpu 0000:03:00.0: amdgpu: Failed to disable smu features.
[ 110.838039] amdgpu 0000:03:00.0: amdgpu: Fail to disable dpm features!
[ 110.838040] [drm:amdgpu_device_ip_suspend_phase2 [amdgpu]] *ERROR* suspend of IP block <smu> failed -62
[ 110.884394] PM: suspend of devices aborted after 21213.620 msecs
[ 110.884402] PM: start suspend of devices aborted after 21213.882 msecs
[ 110.884405] PM: Some devices failed to suspend, or early wake event detected
Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com>
Signed-off-by: Perry Yuan <perry.yuan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Support for getting power1_cap_min value on smu13 and smu11.
For other Asics, we still use 0 as the default value.
Signed-off-by: Ma Jun <Jun.Ma2@amd.com>
Reviewed-by: Kenneth Feng <kenneth.feng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
smu_check_fw_version is called in smu hw init, thus smu if version
and version are garenteed to be stored in smu context. No need to
call smu_cmn_get_smc_version again after system boot up.
Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Reviewed-by: Kenneth Feng <kenneth.feng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
for older BIOS, smu won't fill average field of gpu_metrics_table, so we acquire
average_* from current field. but now average value is available in gpu_metrics_v2_4
Signed-off-by: Kun Liu <Kun.Liu2@amd.com>
Acked-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
To acquire the voltage and current info from gpu_metrics interface,
but gpu_metrics_v2_3 doesn't contain them, and to be backward compatible,
add new gpu_metrics_v2_4 structure.
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Acked-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Wenyou Yang <WenYou.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This patch reverses the DPM clocks levels output of pp_dpm_mclk
and pp_dpm_fclk.
On dGPUs and older APUs we expose the levels from lowest clocks
to highest clocks. But for some APUs, the clocks levels that from
the DFPstateTable are given the reversed orders by PMFW. Like the
memory DPM clocks that are exposed by pp_dpm_mclk.
It's not intuitive that they are reversed on these APUs. All tools
and software that talks to the driver then has to know different ways
to interpret the data depending on the asic.
So we need to reverse them to expose the clocks levels from the
driver consistently.
Signed-off-by: Tim Huang <Tim.Huang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Capped and Uncapped workload types are supported, each workload type
has different performance thresholds and pstate conditions.
* capped mode is used by power centric workload
* uncapped mode is used by perf centric workload
Acked-by: Kenneth Feng <kenneth.feng@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Perry Yuan <perry.yuan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Add new gpu_metrics_v2_3 to acquire average temperature info from SMU metrics. To acquire average temp info from gpu_metrics interface, but gpu_metrics_v2_2 only has members to show current temp info.
---
v1:
Only add average_temperature_gfx in gpu_metrics_v2_3.
v2:
Add average temp members for soc, core and l3 in gpu_metrics_v2_3 and put these new members at the end of gpu_metrics_v2_3. Add operation to read average temp info from metrics table.
v3:
Merge v1 and v2 and rename the patch.
v4:
Merge v3. Add firmware version judgment in vangogh_common_get_gpu_metrics to maintain backward compatibility and rename the patch. "return ret" on error scenario in smu_cmn_get_smc_version.
Signed-off-by: Li Ma <li.ma@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Implement functions to get and set GFXOFF's entry count and residency
for vangogh.
Signed-off-by: André Almeida <andrealmeid@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
So we can eventaully use them in the common smu code for
accessing the SMU mailboxes without needing a lot of
per asic logic in the common code.
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The following scenarios make the driver cache for enabled ppfeatures
outdated and invalid:
- Other tools interact with PMFW to change the enabled ppfeatures.
- PMFW may enable/disable some features behind driver's back. E.g.
for sienna_cichild, on gfxoff entering, PMFW will disable gfx
related DPM features. All those are performed without driver's
notice.
Also considering driver does not actually interact with PMFW such
frequently, the benefit brought by such cache is very limited.
Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The supported features should be retrieved just after EnableAllDpmFeatures message
complete. And the check(whether some dpm feature is supported) is only needed when we
decide to enable or disable it.
Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Use uint64_t instead of an array of uint32_t. This can avoid
some non-necessary intermediate uint32_t -> uint64_t conversions.
Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
add support that allow the userspace tool like RGP to get the GFX clock
value at runtime, the fix follow the old way to show the min/current/max
clocks level for compatible consideration.
=== Test ===
$ cat /sys/class/drm/card0/device/pp_dpm_sclk
0: 200Mhz *
1: 1100Mhz
2: 1600Mhz
then run stress test on one APU system.
$ cat /sys/class/drm/card0/device/pp_dpm_sclk
0: 200Mhz
1: 1040Mhz *
2: 1600Mhz
The current GFXCLK value is updated at runtime.
BugLink: https://gitlab.freedesktop.org/mesa/mesa/-/issues/5260
Reviewed-by: Huang Ray <Ray.Huang@amd.com>
Signed-off-by: Perry Yuan <Perry.Yuan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
'watermarks_table' must be freed instead 'clocks_table', because
'clocks_table' is known to be NULL at this point and 'watermarks_table' is
never freed if the last kzalloc fails.
Fixes: c98ee89736 ("drm/amd/pm: add the fine grain tuning function for vangogh")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
As OOB(out-of-band) interface may be used to update the power limits.
Thus to make sure the power limits reporting of our driver always
reflects the correct values, the internal cache must be aligned
carefully.
V2: add support for out-of-band of other ASICs
align cached current power limit with OOB imposed
Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-By: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This patch is to revise two names of sensor values for vangogh.
New smu metrics table is supported by new pmfw
(from version 4.63.36.00 ), it includes two parts, one part is
the current smu metrics table data and the other part is the
average smu metrics table data. The hwmon will read the current gfxclk
and mclk from the current smu metrics table data.
Signed-off-by: Xiaojian Du <Xiaojian.Du@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This patch is to remove the "set" function of pp_dpm_mclk for vangogh.
For vangogh, mclk bonds with fclk, they will lock each other
on the same perfomance level. But according to the smu message from pmfw,
only fclk is allowed to set value manually, so remove the unnecessary
code of "set" function for mclk.
Signed-off-by: Xiaojian Du <Xiaojian.Du@amd.com>
Acked-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This patch is to add support for new smu metrics table for vangogh.
It will support new and legacy smu metrics table in the meanwhile.
New pmfw version is 4.63.36.00, and new smu interface version is #3.
v1: check smu pmfw version to determine to use new or legacy smu metrics
table
v2: check smu interface version to determine to use new or legacy smu
metrics table
v3: revise wrong symbol
Signed-off-by: Xiaojian Du <Xiaojian.Du@amd.com>
Reviewed-by: Kevin Wang <kevin1.wang@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This patch is to add the callback to get vbios bootup values for
vangogh, it will get the bootup values of gfxclk, mclk, socclk and so
on.
Signed-off-by: Xiaojian Du <Xiaojian.Du@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Do the same thing we do for Renoir. We can check, but since
the sbios has started DPM, it will always return true which
causes the driver to skip some of the SMU init when it shouldn't.
Reviewed-by: Zhan Liu <zhan.liu@amd.com>
Acked-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Update the gpu_metrics interface implementations to use the latest
upgraded data structures.
V2: fit the data type change of energy_accumulator
Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This patch is to correct the name of one function for vangogh.
This function is used to print the clock levels of all kinds of IP
components.
Signed-off-by: Xiaojian Du <Xiaojian.Du@amd.com>
Reviewed-by: Kevin Wang <kevin1.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Per discussions with PMFW team, the driver only needs to
notify the PMFW when the RLC is disabled. The RLC FW will notify
the PMFW directly when it's enabled.
Acked-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
We should commit the value after restore them back to default as well.
$ echo "r" > pp_od_clk_voltage
$ echo "c" > pp_od_clk_voltage
Signed-off-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
There are spelling mistakes in error and warning messages, the text
power_dpm_force_perfomance_level is missing a letter r and should be
power_dpm_force_performance_level. Fix them.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This patch is to make the error log more clear for fine grain tuning
function, it covers Raven/Raven2/Picasso/Renoir/Vangogh.
The fine grain tuning function uses the sysfs file -- pp_od_clk_voltage,
but only when another sysfs file -- power_dpm_force_performance_level is
switched to "manual" mode, it is allowed to access "pp_od_clk_voltage".
Signed-off-by: Xiaojian Du <Xiaojian.Du@amd.com>
Acked-by: Huang Rui <ray.huang@amd.com>
Acked-by: Kevin Wang <kevin1.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Implement hwmon API for reading/setting slow and fast PPT limit.
APU power is managed to system-level requirements through the PPT
(package power tracking) feature. PPT is intended to limit power to the
requirements of the power source and could be dynamically updated to
maximize APU performance within the system power budget.
Here FAST_PPT_LIMIT manages the ~10 ms moving average of APU power,
while SLOW_PPT_LIMIT manages the configurable, thermally significant
moving average of APU power (default ~5000 ms).
User could read slow/fast ppt limit using command "cat power*_cap" or
"sensors" in the hwmon device directory. User could adjust values of
slow/fast ppt limit as needed depending on workloads through command
"echo ## > power*_cap".
Example:
$ echo 15000000 > power1_cap
$ echo 18000000 > power2_cap
$ sensors
amdgpu-pci-0300
Adapter: PCI adapter
slowPPT: 9.04W (cap = 15.00 W)
fastPPT: 9.04W (cap = 18.00 W)
v2: align with existing interfaces for the getting/setting of PPT
limits. Encode the upper 8 bits of limit value to distinguish
slow and fast power limit type.
Signed-off-by: Xiaomeng Hou <Xiaomeng.Hou@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>