Implement caching separately for SystemMetrics table from PMFW. The same
table could be used for multiple interfaces. Hence, cache it internally
to avoid multiple queries to the firmware. For SystemMetrics table, 5ms
cache interval is sufficient.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
v1:
Returns different error codes based on the scenario to help the user app understand
the AMDGPU device status when an exception occurs.
v2:
change -NODEV to -EBUSY.
Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Use int instead of uint32_t for 'ret' variable to store negative error
codes or zero returned by other functions.
Storing the negative error codes in unsigned type, doesn't cause an issue
at runtime but can be confusing. Additionally, assigning negative error
codes to unsigned type may trigger a GCC warning when the -Wsign-conversion
flag is enabled.
No effect on runtime.
Signed-off-by: Qianfeng Rong <rongqianfeng@vivo.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
For ternary operators in the form of "a ? true : false", if 'a' itself
returns a boolean result, the ternary operator can be omitted. Remove
redundant ternary operators to clean up the code. Swap variable positions
on either side of '==' to enhance readability.
Signed-off-by: Liao Yuanhong <liaoyuanhong@vivo.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
For ternary operators in the form of "a ? true : false", if 'a' itself
returns a boolean result, the ternary operator can be omitted. Remove
redundant ternary operators to clean up the code. Swap variable positions
on either side of '!=' to enhance readability.
Signed-off-by: Liao Yuanhong <liaoyuanhong@vivo.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
For ternary operators in the form of "a ? true : false", if 'a' itself
returns a boolean result, the ternary operator can be omitted. Remove
redundant ternary operators to clean up the code.
Signed-off-by: Liao Yuanhong <liaoyuanhong@vivo.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Port of commit 227545b9a0 ("drm/radeon/dpm: Disable sclk
switching on Oland when two 4K 60Hz monitors are connected")
This is an ad-hoc DPM fix, necessary because we don't have
proper bandwidth calculation for DCE 6.
We define "high pixelclock" for SI as higher than necessary
for 4K 30Hz. For example, 4K 60Hz and 1080p 144Hz fall into
this category.
When two high pixel clock displays are connected to Oland,
additionally disable shader clock switching, which results in
a higher voltage, thereby addressing some visible flickering.
v2:
Add more comments.
v3:
Split into two commits for easier review.
Fixes: 841686df9f ("drm/amdgpu: add SI DPM support (v4)")
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
According to pp_pm_compute_clocks the non-DC display code
has "issues with mclk switching with refresh rates over 120 hz".
The workaround is to disable MCLK switching in this case.
Do the same for legacy DPM.
Fixes: 6ddbd37f10 ("drm/amd/pm: optimize the amdgpu_pm_compute_clocks() implementations")
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Some parts of the code base expect that MCLK switching is turned
off when the vblank time is set to zero.
According to pp_pm_compute_clocks the non-DC code has issues
with MCLK switching with refresh rates over 120 Hz.
v3:
Add code comment to explain this better.
Add an if statement instead of changing the switch_limit.
Fixes: 841686df9f ("drm/amdgpu: add SI DPM support (v4)")
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Based on some comments in dm_pp_display_configuration
above the crtc_index and line_time fields, these values
are programmed to the SMC to work around an SMC hang
when it switches MCLK.
According to Alex, the Windows driver programs them to:
mclk_change_block_cp_min = 200 / line_time
mclk_change_block_cp_max = 100 / line_time
Let's use the same for the sake of consistency.
Previously we used the watermark values, but it seemed buggy
as the code was mixing up low/high and A/B watermarks, and
was not saving a low watermark value on DCE 6, so
mclk_change_block_cp_max would be always zero previously.
Split this change off from the previous si_upload_smc_data
to make it easier to bisect, in case it causes any issues.
Fixes: 841686df9f ("drm/amdgpu: add SI DPM support (v4)")
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The si_upload_smc_data function uses si_write_smc_soft_register
to set some register values in the SMC, and expects the result
to be PPSMC_Result_OK which is 1.
The PPSMC_Result_OK / PPSMC_Result_Failed values are used for
checking the result of a command sent to the SMC.
However, the si_write_smc_soft_register actually doesn't send
any commands to the SMC and returns zero on success,
so this check was incorrect.
Fix that by not checking the return value, just like other
calls to si_write_smc_soft_register.
v3:
Additionally, when no display is plugged in, there is no need
to restrict MCLK switching, so program the registers to zero.
Fixes: 841686df9f ("drm/amdgpu: add SI DPM support (v4)")
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The SMC can take an excessive amount of time to process some
messages under some conditions.
Background:
Sending a message to the SMC works by writing the message into
the mmSMC_MESSAGE_0 register and its optional parameter into
the mmSMC_SCRATCH0, and then polling mmSMC_RESP_0. Previously
the timeout was AMDGPU_MAX_USEC_TIMEOUT, ie. 100 ms.
Increase the timeout to 200 ms for all messages and to 1 sec for
a few messages which I've observed to be especially slow:
PPSMC_MSG_NoForcedLevel
PPSMC_MSG_SetEnabledLevels
PPSMC_MSG_SetForcedLevels
PPSMC_MSG_DisableULV
PPSMC_MSG_SwitchToSwState
This fixes the following problems on Tahiti when switching
from a lower clock power state to a higher clock state, such
as when DC turns on a display which was previously turned off.
* si_restrict_performance_levels_before_switch would fail
(if the user previously forced high clocks using sysfs)
* si_set_sw_state would fail (always)
It turns out that both of those failures were SMC timeouts and
that the SMC actually didn't fail or hang, just needs more time
to process those.
Add a warning when there is an SMC timeout to make it easier to
identify this type of problem in the future.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Always send PPSMC_MSG_DisableULV to the SMC, even if ULV mode
is unsupported, to make sure it is properly turned off.
v3:
Simplify si_disable_ulv further.
Always check the return value of amdgpu_si_send_msg_to_smc.
Fixes: 841686df9f ("drm/amdgpu: add SI DPM support (v4)")
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Update the conditions for setting the SMU vcn reset caps in the SMU v13.0.6 PPT
initialization function. Specifically:
- Add support for VCN reset capability for firmware versions 0x00558200 and
above when the program version is 0.
- Add support for VCN reset capability for firmware versions 0x05551800 and
above when the program version is 5.
v2: correct the smu mp1 version for program 5 (Lijo)
Suggested-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This commit implements VCN reset capability for SMU v13.0.6 with the following changes:
1. Added new PPSMC message ID (0x5B) for VCN reset in SMU firmware interface
2. Extended SMU capabilities to include VCN_RESET support
3. Implemented VCN reset support check:
- Added smu_v13_0_6_reset_vcn_is_supported() function
4. Updated SMU v13.0.6 PPT functions to include VCN reset operations
v2: clean up debug info (Alex)
v3: remove unsupported message and split smu v13.0.6 changes to a separate patch (Lijo)
v4: simply the function (smu_v13_0_6_reset_vcn_is_supported) (Lijo)
Suggested-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Ruili Ji <ruiliji2@amd.com>
Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This change introduces infrastructure to check whether VCN reset
is supported by the SMU firmware. Key changes include:
1. Added new functions to query VCN reset support:
- amdgpu_dpm_reset_vcn_is_supported()
- smu_reset_vcn_is_supported()
- pptable_funcs.reset_vcn_is_supported callback
2. Implemented proper locking in the DPM layer with mutex protection
3. Maintained consistency with existing SDMA reset support checks
The new capability allows callers to check for VCN reset support
before attempting the operation, preventing unnecessary attempts
on unsupported platforms.
v2: clean up debug info(Alex)
Suggested-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Ruili Ji <ruiliji2@amd.com>
Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Remove caching logic of temperature metrics from SMUv13.0.12. The
caching logic needs to be moved to a higher level.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The bad pages threshold exceed CPER should be generated once threshold
exceeded, no matter the bad_page_threshold setted or not.
Signed-off-by: Xiang Liu <xiang.liu@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Fetch system metrics table to fill gpuboard/baseboard temperature
metrics data for smu_v13_0_12
v2: Remove unnecessary checks, used separate metrics time for
temperature metrics table(Lijo)
v3: Use cached values for back to back system metrics query(Lijo)
Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Add smu interface to get baseboard/gpuboard temperature metrics
v2: Rename is_support to is_supported(Lijo)
Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Add dpm interface to get gpuboard/baseboard temperature metrics
v2: Add temperature metrics support check(Lijo)
v3: Return error code in case of operation not supported(Lijo)
Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Certain messages will processed with high priority by PMFW even if it
hasn't responded to a previous message. Send the priority message
regardless of the success/fail status of the previous message. Add
support on SMUv13.0.6 and SMUv13.0.12
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cached metrics data validity is 1ms on arcturus. It's not reasonable for
any client to query gpu_metrics at a faster rate and constantly
interrupt PMFW.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cached metrics data validity is 1ms on aldebaran. It's not reasonable
for any client to query gpu_metrics at a faster rate and constantly
interrupt PMFW.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cached metrics data validity is 1ms on SMUv13.0.6 SOCs. It's not
reasonable for any client to query gpu_metrics at a faster rate and
constantly interrupt PMFW.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
If dpm tables are already populated on SMU v13.0.6 SOCs, use the cached
data. Otherwise, fetch values from firmware.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
PMFW interface to get max/min frequencies is not available on aldebaran
VFs. Use data, if available, in DPM tables to get the max/min
frequencies.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
There is a small typo in phm_wait_on_indirect_register().
Swap mask and value arguments provided to phm_wait_on_register() so that
they satisfy the function signature and actual usage scheme.
Found by Linux Verification Center (linuxtesting.org) with Svace static
analysis tool.
In practice this doesn't fix any issues because the only place this
function is used uses the same value for the value and mask.
Fixes: 3bace35914 ("drm/amd/powerplay: add hardware manager sub-component")
Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Pass amdgpu device context instead of drm device context to some
amdgpu_device_* functions. DRM device context is not required in those
functions. No functional change.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>