Commit Graph

73 Commits

Author SHA1 Message Date
Yang Wang
0cd2bc06de drm/amd/pm: enable amdgpu smu send message log
v1:
enable amdgpu smu driver message log.

v2:
add smu/pmfw response value into debug log.

Signed-off-by: Yang Wang <KevinYang.Wang@amd.com>
Reviewed-by: Kenneth Feng <kenneth.feng@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-18 15:47:52 -05:00
Asad Kamal
a62503ca85 drm/amd/pm: Add gpu_metrics_v1_5
Add new gpu_metrics_v1_5 to acquire vcn/jpeg activity
& pcie nak error counters

Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-01-03 11:16:05 -05:00
Li Ma
34ec3cedca drm/amd/swsmu: update smu v14_0_0 driver if and metrics table
Update driver if headers and metrics table in smu v14_0_0 after smu fw promotion.
Drop the legacy metrics table and add warning of checking pmfw version.

Signed-off-by: Li Ma <li.ma@amd.com>
Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-11-03 12:18:33 -04:00
Asad Kamal
011d99ee71 drm/amd/pm: Add gpu_metrics_v1_4
Add new gpu_metrics_v1_4 to acquire XGMI data transfer,
pcie bandwidth & Clock lock status

v2:
Add pcie error counter to gpu metric table v1_4

Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-10-13 10:59:02 -04:00
Asad Kamal
f1d1abd616 drm/amd/pm: Update pci link speed for smu v13.0.6
Update pcie link speed registers for smu v13.0.6 &
populate gpu metric table with pcie link speed rather than
gen for smu v13_0_0, smu v13_0_6 & smu v13_0_7

v2:
Update ESM register address
Used macro to convert pcie gen to speed

v3:
Chaged macro to inline function for pcie gen to speed

Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-16 11:34:37 -04:00
Ran Sun
7406f963bf drm/amd/pm: Clean up errors in arcturus_ppt.c
Fix the following errors reported by checkpatch:

ERROR: "foo* bar" should be "foo *bar"
ERROR: spaces required around that '=' (ctx:VxW)
ERROR: space prohibited before that close parenthesis ')'

Signed-off-by: Ran Sun <sunran001@208suo.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-27 14:47:43 -04:00
Wenyou Yang
41cec40bc9 drm/amd/pm: Vangogh: Add new gpu_metrics_v2_4 to acquire gpu_metrics
To acquire the voltage and current info from gpu_metrics interface,
but gpu_metrics_v2_3 doesn't contain them, and to be backward compatible,
add new gpu_metrics_v2_4 structure.

Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Acked-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Wenyou Yang <WenYou.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-07-25 13:47:27 -04:00
Perry Yuan
31865e96f9 drm/amdgpu/pm: add capped/uncapped power profile modes
Capped and uncapped workload types switching are supported on Vangogh,
User can switch the power profile and check current type with below commands.

1) switch to capped mode:
`# echo 8 > /sys/class/drm/card0/device/pp_power_profile_mode`

2) switch to uncapped mode:
`# echo 9 > /sys/class/drm/card0/device/pp_power_profile_mode`

3) check current mode:
$ cat /sys/class/drm/card0/device/pp_power_profile_mode
 1 3D_FULL_SCREEN
 3          VIDEO
 4             VR
 5        COMPUTE
 6         CUSTOM
 8         CAPPED
 9       UNCAPPED*

Acked-by: Kenneth Feng <kenneth.feng@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Perry Yuan <perry.yuan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-03-07 14:22:39 -05:00
Candice Li
51097df1b2 drm/amd/pm: Support RAS fatal error mode1 reset on smu v13_0_0 and v13_0_10
Support RAS fatal error mode1 reset on smu v13_0_0 and v13_0_10.

Signed-off-by: Candice Li <candice.li@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-01-13 14:59:26 -05:00
Kenneth Feng
60cfad329a drm/amd/pm: enable mode1 reset on smu_v13_0_10
enable mode1 reset and prioritize debug port on smu_v13_0_10
as a more reliable message processing

v2 - move mode1 reset callback to smu_v13_0_0_ppt.c

Signed-off-by: Kenneth Feng <kenneth.feng@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-11-09 17:41:42 -05:00
Li Ma
0d6516efff drm/amd/pm:add new gpu_metrics_v2_3 to acquire average temperature info
Add new gpu_metrics_v2_3 to acquire average temperature info from SMU metrics. To acquire average temp info from gpu_metrics interface, but gpu_metrics_v2_2 only has members to show current temp info.
---
v1:
	Only add average_temperature_gfx in gpu_metrics_v2_3.
v2:
	Add average temp members for soc, core and l3 in gpu_metrics_v2_3 and put these new members at the end of gpu_metrics_v2_3. Add operation to read average temp info from metrics table.
v3:
	Merge v1 and v2 and rename the patch.
v4:
	Merge v3. Add firmware version judgment in vangogh_common_get_gpu_metrics to maintain backward compatibility and rename the patch. "return ret" on error scenario in smu_cmn_get_smc_version.

Signed-off-by: Li Ma <li.ma@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-09-14 12:38:41 -04:00
Evan Quan
6f73d67626 drm/amd/pm: optimize the interface for dpm feature status query
Drop extra CMN2ASIC_MAPPING_FEATURE transform. Also some cosmetic
fixes for better readability.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-06-08 11:42:44 -04:00
Alex Deucher
7a09f61f8e drm/amdgpu/swsmu: use new register offsets for smu_cmn.c
Use the per asic offsets so the we don't have to have
asic specific logic in the common code.

Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-06-03 16:45:00 -04:00
Huang Rui
8763e4c1a0 drm/amdgpu/pm: update MP v13_0_4 smu message register marco
Update MP v13_0_4 register macro for SMU message

v2: squash in missed case (Alex)

Signed-off-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Tim Huang <Tim.Huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-06-03 16:44:15 -04:00
Kenneth Feng
334682ae81 drm/amd/pm: enable workload type change on smu_v13_0_7
enable workload type change on smu_v13_0_7

v2: squash in out of bounds fix (Alex)

Signed-off-by: Kenneth Feng <kenneth.feng@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-05-06 10:36:12 -04:00
Evan Quan
276c03a054 drm/amd/smu: Update SMU13 support for SMU 13.0.0
Modify the common smu13 code and add a new smu
13.0.0 ppt file to handle the smu 13.0.0 specific
configuration.

v2: squash in typo fix in profile name

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-05-04 09:58:46 -04:00
Evan Quan
6a2d7a229e drm/amd/pm: enable the support for retrieving combo pptable
We need to relay on this way to get the raw PPTable when
SCPM feature is enabled.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-05-04 09:58:33 -04:00
Darren Powell
f24044bd9b amdgpu/pm: Clarify documentation of error handling in send_smc_mesg
Clarify the smu_cmn_send_smc_msg_with_param documentation to mention two
cases exist where messages are silently dropped with no error returned.
These cases occur in unusual situations where either:
 1. the message type is not allowed to a virtual GPU, or
 2. a PCI recovery is underway and the HW is not yet in sync with the SW

For more details see
 commit 4ea5081c82 ("drm/amd/powerplay: enable SMC message filter")
 commit bf36b52e78 ("drm/amdgpu: Avoid accessing HW when suspending SW state")

(v2)
  Reworked with suggestions from Luben & Paul

(v3)
  Updated wording as per Luben's feedback
  Corrected error stating all messages denied on virtual GPU
  (each GPU has mask of which messages are allowed)

Signed-off-by: Darren Powell <darren.powell@amd.com>
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-04-19 13:58:32 -04:00
Dan Carpenter
508a47d434 drm/amd/pm: fix indenting in __smu_cmn_reg_print_error()
Smatch complains that the dev_err_ratelimited() is indented one tab more
than the surrounding lines.

	drivers/gpu/drm/amd/amdgpu/../pm/swsmu/smu_cmn.c:174
	__smu_cmn_reg_print_error() warn: inconsistent indenting

It looks like it's not a bug, just that the indenting needs to be cleaned
up.

Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-15 15:01:12 -04:00
Lijo Lazar
8718ca1dbf drm/amd/pm: Send message when resp status is 0xFC
When PMFW is really busy, it will respond with 0xFC. However, it doesn't
change the response register state when it becomes free. Driver should
retry and proceed to send message if the response status is 0xFC.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-15 14:25:16 -04:00
Yifan Zhang
7c916f95f5 drm/amdgpu: change registers in error checking for smu 13.0.5
smu 13.0.5 use new registers for smu msg and param.

Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-03-15 14:25:15 -04:00
Lijo Lazar
a1235a01e0 drm/amd/pm: Fix missing prototype warning
Fix below warning
warning: no previous prototype for '__smu_get_enabled_features' [-Wmissing-prototypes]

Fixes: f141e25147 ("drm/amd/pm: validate SMU feature enable message for getting feature enabled mask")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-02-23 14:26:36 -05:00
Prike Liang
f141e25147 drm/amd/pm: validate SMU feature enable message for getting feature enabled mask
There's always miss the SMU feature enabled checked in the NPI phase,
so let validate the SMU feature enable message directly rather than
add more and more MP1 version check.

Signed-off-by: Prike Liang <Prike.Liang@amd.com>
Signed-off-by: Lijo Lazar <Lijo.Lazar@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-02-22 14:44:36 -05:00
Yifan Zhang
cec24112e1 drm/amd/pm: update smc message sequence for smu 13.0.5
this patch updates smc message sequence for smu 13.0.5.

Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-02-18 14:07:00 -05:00
Prike Liang
db090ff8f9 drm/amd/pm: Add support for MP1 13.0.8
Set smu sw function and enable swSMU support for MP1.

Signed-off-by: Prike Liang <Prike.Liang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-02-16 17:30:03 -05:00
Evan Quan
cc188a73ad drm/amd/pm: fix enabled features retrieving on Renoir and Cyan Skillfish
For Cyan Skillfish and Renoir, there is no interface provided by PMFW
to retrieve the enabled features. So, we assume all features are enabled.

Fixes: 7ade3ca9cd ("drm/amd/pm: correct the usage for 'supported' member of smu_feature structure")

Signed-off-by: Evan Quan <evan.quan@amd.com>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-02-11 16:07:56 -05:00
Evan Quan
f69c15e15e drm/amd/pm: revise the implementation of smu_cmn_disable_all_features_with_exception
As there is no internal cache for enabled ppfeatures now. Thus the 2nd
parameter will be not needed any more.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-02-07 18:01:16 -05:00
Evan Quan
a89ef0448c drm/amd/pm: avoid consecutive retrieving for enabled ppfeatures
As the enabled ppfeatures are just retrieved ahead. We can use
that directly instead of retrieving again and again.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-02-07 18:01:16 -05:00
Evan Quan
3c6591e947 drm/amd/pm: drop the cache for enabled ppfeatures
The following scenarios make the driver cache for enabled ppfeatures
outdated and invalid:
  - Other tools interact with PMFW to change the enabled ppfeatures.
  - PMFW may enable/disable some features behind driver's back. E.g.
    for sienna_cichild, on gfxoff entering, PMFW will disable gfx
    related DPM features. All those are performed without driver's
    notice.
Also considering driver does not actually interact with PMFW such
frequently, the benefit brought by such cache is very limited.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-02-07 18:01:16 -05:00
Evan Quan
2d282665d2 drm/amd/pm: update the data type for retrieving enabled ppfeatures
Use uint64_t instead of an array of uint32_t. This can avoid
some non-necessary intermediate uint32_t -> uint64_t conversions.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-02-07 18:01:16 -05:00
Evan Quan
5af779adc3 drm/amd/pm: unify the interface for retrieving enabled ppfeatures
Instead of having two which do the same thing.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-02-07 18:01:16 -05:00
Evan Quan
bd42571168 drm/amd/pm: correct the way for retrieving enabled ppfeatures on Renoir
As other dGPU asics, Renoir should use smu_cmn_get_enabled_mask() for
that job.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-02-07 18:01:16 -05:00
Evan Quan
1f2cf08aa0 drm/amd/pm: drop unneeded feature->mutex
As all those related APIs are already well protected by adev->pm.mutex.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-01-25 18:00:32 -05:00
Evan Quan
da11407f06 drm/amd/pm: drop unneeded smu->metrics_lock
As all those related APIs are already well protected by
adev->pm.mutex and smu->message_lock.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-01-25 18:00:32 -05:00
Dave Airlie
b06103b532 Merge tag 'amd-drm-next-5.17-2021-12-16' of https://gitlab.freedesktop.org/agd5f/linux into drm-next
amdgpu:
- Add some display debugfs entries
- RAS fixes
- SR-IOV fixes
- W=1 fixes
- Documentation fixes
- IH timestamp fix
- Misc power fixes
- IP discovery fixes
- Large driver documentation updates
- Multi-GPU memory use reductions
- Misc display fixes and cleanups
- Add new SMU debug option

amdkfd:
- SVM fixes

radeon:
- Fix typo in comment

From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211216202731.5900-1-alexander.deucher@amd.com
2021-12-23 11:55:28 +10:00
Evan Quan
7e31a8585b drm/amdgpu: move smu_debug_mask to a more proper place
As the smu_context will be invisible from outside(of power). Also,
the smu_debug_mask can be shared around all power code instead of
some specific framework(swSMU) only.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-14 16:09:11 -05:00
Lang Yu
6ff7fddbd1 drm/amdgpu: add support for SMU debug option
SMU firmware expects the driver maintains error context
and doesn't interact with SMU any more when SMU errors
occurred. That will aid in debugging SMU firmware issues.

Add SMU debug option support for this request, it can be
enabled or disabled via amdgpu_smu_debug debugfs file.
Use a 32-bit mask to indicate corresponding debug modes.
Currently, only one mode(HALT_ON_ERROR) is supported.
When enabled, it brings hardware to a kind of halt state
so that no one can touch it any more in the envent of SMU
errors.

The dirver interacts with SMU via sending messages. And
threre are three ways to sending messages to SMU in current
implementation. Handle them respectively as following:

1, smu_cmn_send_smc_msg_with_param() for normal timeout cases

  Halt on any error.

2, smu_cmn_send_msg_without_waiting()/smu_cmn_wait_for_response()
for longer timeout cases

  Halt on errors apart from ETIME. Otherwise this way won't work.
  Let the user handle ETIME error in such a case.

3, smu_cmn_send_msg_without_waiting() for no waiting cases

  Halt on errors apart from ETIME. Otherwise second way won't work.

== Command Guide ==

1, enable HALT_ON_ERROR mode

 # echo 0x1 > /sys/kernel/debug/dri/0/amdgpu_smu_debug

2, disable HALT_ON_ERROR mode

 # echo 0x0 > /sys/kernel/debug/dri/0/amdgpu_smu_debug

v5:
 - Use bit mask to allow more debug features.(Evan)
 - Use WRAN() instead of BUG().(Evan)

v4:
 - Set to halt state instead of a simple hang.(Christian)

v3:
 - Use debugfs_create_bool().(Christian)
 - Put variable into smu_context struct.
 - Don't resend command when timeout.

v2:
 - Resend command when timeout.(Lijo)
 - Use debugfs file instead of module parameter.

Signed-off-by: Lang Yu <lang.yu@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-13 16:33:16 -05:00
Isabella Basso
bbe04dec5c drm/amd: fix improper docstring syntax
This fixes various warnings relating to erroneous docstring syntax, of
which some are listed below:

 warning: Function parameter or member 'adev' not described in
 'amdgpu_atomfirmware_ras_rom_addr'
 ...
 warning: expecting prototype for amdgpu_atpx_validate_functions().
 Prototype was for amdgpu_atpx_validate() instead
 ...
 warning: Excess function parameter 'mem' description in 'amdgpu_preempt_mgr_new'
 ...
 warning: Cannot understand  * @kfd_get_cu_occupancy - Collect number of
 waves in-flight on this device
 ...
 warning: This comment starts with '/**', but isn't a kernel-doc
 comment. Refer Documentation/doc-guide/kernel-doc.rst

Signed-off-by: Isabella Basso <isabbasso@riseup.net>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-12-13 16:32:34 -05:00
Dave Airlie
f8eb96b4df Merge tag 'amd-drm-next-5.17-2021-12-02' of https://gitlab.freedesktop.org/agd5f/linux into drm-next
amd-drm-next-5.17-2021-12-02:

amdgpu:
- Use generic drm fb helpers
- PSR fixes
- Rework DCN3.1 clkmgr
- DPCD 1.3 fixes
- Misc display fixes can cleanups
- Clock query fixes for APUs
- LTTPR fixes
- DSC fixes
- Misc PM fixes
- RAS fixes
- OLED backlight fix
- SRIOV fixes
- Add STB (Smart Trace Buffer) for supported dGPUs
- IH rework
- Enable seamless boot for DCN3.01

amdkfd:
- Rework more stuff around IP discovery enumeration
- Further clean up of interfaces with amdgpu
- SVM fixes

radeon:
- Indentation fixes

UAPI:
- Add a new KFD header that defines some of the sysfs bitfields and enums that userspace has been using for a while
  The corresponding bit-fields and enums in user mode are defined in
  https://github.com/RadeonOpenCompute/ROCT-Thunk-Interface/blob/master/include/hsakmttypes.h

Signed-off-by: Dave Airlie <airlied@redhat.com>

# Conflicts:
#	drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211202191643.5970-1-alexander.deucher@amd.com
2021-12-10 13:52:51 +10:00
Luben Tuikov
e771d71d8d drm/amd/pm: Print the error on command submission
Print the error on command submission immediately after submitting to
the SMU. This is rate-limited. It helps to immediately know there was an
error on command submission, rather than leave it up to clients to report
the error, as sometimes they do not.

Cc: Alex Deucher <Alexander.Deucher@amd.com>
Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
Acked-by: Alex Deucher <Alexander.Deucher@amd.com>
Reviewed-by: Alex Deucher <Alexander.Deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-11-24 14:06:53 -05:00
Luben Tuikov
ca4b32bb2d drm/amd/pm: Add debug prints
Add prints where there are none and none are printed in the callee.

Remove the word "previous" from comment and print to make it shorter and
avoid confusion in various prints.

Cc: Alex Deucher <Alexander.Deucher@amd.com>
Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
Reviewed-by: Alex Deucher <Alexander.Deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-11-24 14:06:53 -05:00
Luben Tuikov
38a268b391 drm/amd/pm: Enhanced reporting also for a stuck command
Also print the message index and parameter of the stuck command.

Cc: Alex Deucher <Alexander.Deucher@amd.com>
Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
Acked-by: Alex Deucher <Alexander.Deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-11-17 17:40:46 -05:00
Luben Tuikov
8bd1b7c29b drm/amd/pm: Enhanced reporting also for a stuck command
Also print the message index and parameter of the stuck command.

Cc: Alex Deucher <Alexander.Deucher@amd.com>
Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
Acked-by: Alex Deucher <Alexander.Deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-11-17 17:08:44 -05:00
Evan Quan
8b514e898e drm/amd/pm: fix runpm hang when amdgpu loaded prior to sound driver
Current RUNPM mechanism relies on PMFW to master the timing for BACO
in/exit. And that needs cooperation from sound driver for dstate
change notification for function 1(audio). Otherwise(on sound driver
missing), BACO cannot be kicked in correctly and hang will be observed
on RUNPM exit.

By switching back to legacy message way on sound driver missing,
we are able to fix the runpm hang observed for the scenario below:
amdgpu driver loaded -> runpm suspend kicked -> sound driver loaded

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reported-and-tested-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Guchun Chen <guchun.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2021-09-16 09:56:24 -04:00
Darren Powell
828db598bf amdgpu/pm: Replace navi10 usage of sprintf with sysfs_emit
initial modification of files
  smu_cmn.c
  navi10_ppt.c

=== Test ===
AMDGPU_PCI_ADDR=`lspci -nn | grep "VGA\|Display" | cut -d " " -f 1`
AMDGPU_HWMON=`ls -la /sys/class/hwmon | grep $AMDGPU_PCI_ADDR | awk '{print $9}'`
HWMON_DIR=/sys/class/hwmon/${AMDGPU_HWMON}
LOGFILE=pp_printf.test.log

lspci -nn | grep "VGA\|Display"  > $LOGFILE
FILES="pp_dpm_sclk
pp_sclk_od
pp_mclk_od
pp_dpm_pcie
pp_od_clk_voltage
pp_power_profile_mode "

for f in $FILES
do
  echo === $f === >> $LOGFILE
  cat $HWMON_DIR/device/$f >> $LOGFILE
done
cat $LOGFILE

Signed-off-by: Darren Powell <darren.powell@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-08-10 10:34:37 -04:00
Luben Tuikov
544dcd74b7 drm/amd/pm: Fix a bug in semaphore double-lock
Fix a bug in smu_cmn_send_msg_without_waiting() in
that this function does not need to take the
smu->message_lock mutex in order to send a message
down to the SMU. The mutex is acquired by the
caller of this function instead.

Cc: Alex Deucher <Alexander.Deucher@amd.com>
Cc: Changfeng Zhu <Changfeng.Zhu@amd.com>
Cc: Huang Rui <ray.huang@amd.com>
Fixes: 5810323ba6 ("drm/amd/pm: Fix a bug communicating with the SMU (v5)")
Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
Reviewed-by: Alex Deucher <Alexander.Deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-07-28 22:15:44 -04:00
Luben Tuikov
5810323ba6 drm/amd/pm: Fix a bug communicating with the SMU (v5)
This fixes a bug which if we probe a non-existing
I2C device, and the SMU returns 0xFF, from then on
we can never communicate with the SMU, because the
code before this patch reads and interprets 0xFF
as a terminal error, and thus we never write 0
into register 90 to clear the status (and
subsequently send a new command to the SMU.)

It is not an error that the SMU returns status
0xFF. This means that the SMU executed the last
command successfully (execution status), but the
command result is an error of some sort (execution
result), depending on what the command was.

When doing a status check of the SMU, before we
send a new command, the only status which
precludes us from sending a new command is 0--the
SMU hasn't finished executing a previous command,
and 0xFC--the SMU is busy.

This bug was seen as the following line in the
kernel log,

amdgpu: Msg issuing pre-check failed(0xff) and SMU may be not in the right state!

when subsequent SMU commands, not necessarily
related to I2C, were sent to the SMU.

This patch fixes this bug.

v2: Add a comment to the description of
__smu_cmn_poll_stat() to explain why we're NOT
defining the SMU FW return codes as macros, but
are instead hard-coding them. Such a change, can
be followed up by a subsequent patch.

v3: The changes are,
a) Add comments to break labels in
   __smu_cmn_reg2errno().

b) When an unknown/unspecified/undefined result is
   returned back from the SMU, map that to
   -EREMOTEIO, to distinguish failure at the SMU
   FW.

c) Add kernel-doc to
   smu_cmn_send_msg_without_waiting(),
   smu_cmn_wait_for_response(),
   smu_cmn_send_smc_msg_with_param().

d) In smu_cmn_send_smc_msg_with_param(), since we
   wait for completion of the command, if the
   result of the completion is
   undefined/unknown/unspecified, we print that to
   the kernel log.

v4: a) Add macros as requested, though redundant, to
    be removed when SMU consolidates for all
    ASICs--see comment in code.
    b) Get out if the SMU code is unknown.

v5: Rename the macro names.

Cc: Alex Deucher <Alexander.Deucher@amd.com>
Cc: Evan Quan <evan.quan@amd.com>
Cc: Lijo Lazar <Lijo.Lazar@amd.com>
Fixes: fcb1fe9c9e ("drm/amd/powerplay: pre-check the SMU state before issuing message")
Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
Reviewed-by: Alex Deucher <Alexander.Deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-07-23 10:09:28 -04:00
Evan Quan
1e75be2b67 drm/amd/pm: update the cached dpm feature status
For some ASICs, the real dpm feature disablement job is handled by
PMFW during baco reset and custom pptable loading. Cached dpm feature
status need to be updated to pair that.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-06-11 16:02:30 -04:00
Graham Sider
c23083cd37 drm/amd/pm: Add common throttler translation func
Defines smu_cmn_get_indep_throttler_status which performs ASIC
independent translation given a corresponding lookup table.

Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-06-10 11:44:25 -04:00
Graham Sider
22a7dcf580 drm/amd/pm: Add u64 throttler status field to gpu_metrics
This patch set adds support for a new ASIC independant u64 throttler
status field (indep_throttle_status). Piggybacks off the
gpu_metrics_v1_3 bump and similarly bumps gpu_metrics_v2 version (to
v2_2) to add field.

Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-06-10 11:44:25 -04:00