Starting in Linux 5.8, the graphics and memory clock frequency were not being
reported for CIK cards. This is a regression, since they were reported correctly
in Linux 5.7.
After investigation, I discovered that the smum_send_msg_to_smc() function,
attempts to call the corresponding get_argument() function of ci_smu_funcs.
However, the get_argument() function is not defined in ci_smu_funcs.
This patch fixes the bug by specifying the correct get_argument() function.
Fixes: a0ec225633 ("drm/amd/powerplay: unified interfaces for message issuing and response checking")
Signed-off-by: Sandeep Raghuraman <sandy.8925@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Reproducing bug report here:
After hibernating and resuming, DPM is not enabled. This remains the case
even if you test hibernate using the steps here:
https://www.kernel.org/doc/html/latest/power/basic-pm-debugging.html
I debugged the problem, and figured out that in the file hardwaremanager.c,
in the function, phm_enable_dynamic_state_management(), the check
'if (!hwmgr->pp_one_vf && smum_is_dpm_running(hwmgr) && !amdgpu_passthrough(adev) && adev->in_suspend)'
returns true for the hibernate case, and false for the suspend case.
This means that for the hibernate case, the AMDGPU driver doesn't enable DPM
(even though it should) and simply returns from that function.
In the suspend case, it goes ahead and enables DPM, even though it doesn't need to.
I debugged further, and found out that in the case of suspend, for the
CIK/Hawaii GPUs, smum_is_dpm_running(hwmgr) returns false, while in the case of
hibernate, smum_is_dpm_running(hwmgr) returns true.
For CIK, the ci_is_dpm_running() function calls the ci_is_smc_ram_running() function,
which is ultimately used to determine if DPM is currently enabled or not,
and this seems to provide the wrong answer.
I've changed the ci_is_dpm_running() function to instead use the same method that
some other AMD GPU chips do (e.g Fiji), which seems to read the voltage controller.
I've tested on my R9 390 and it seems to work correctly for both suspend and
hibernate use cases, and has been stable so far.
Bug: https://bugzilla.kernel.org/show_bug.cgi?id=208839
Signed-off-by: Sandeep Raghuraman <sandy.8925@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
I updated my system with Radeon VII from kernel 5.6 to kernel 5.7, and
following started to happen on each boot:
...
BUG: kernel NULL pointer dereference, address: 0000000000000128
...
CPU: 9 PID: 1940 Comm: modprobe Tainted: G E 5.7.2-200.im0.fc32.x86_64 #1
Hardware name: System manufacturer System Product Name/PRIME X570-P, BIOS 1407 04/02/2020
RIP: 0010:lock_bus+0x42/0x60 [amdgpu]
...
Call Trace:
i2c_smbus_xfer+0x3d/0xf0
i2c_default_probe+0xf3/0x130
i2c_detect.isra.0+0xfe/0x2b0
? kfree+0xa3/0x200
? kobject_uevent_env+0x11f/0x6a0
? i2c_detect.isra.0+0x2b0/0x2b0
__process_new_driver+0x1b/0x20
bus_for_each_dev+0x64/0x90
? 0xffffffffc0f34000
i2c_register_driver+0x73/0xc0
do_one_initcall+0x46/0x200
? _cond_resched+0x16/0x40
? kmem_cache_alloc_trace+0x167/0x220
? do_init_module+0x23/0x260
do_init_module+0x5c/0x260
__do_sys_init_module+0x14f/0x170
do_syscall_64+0x5b/0xf0
entry_SYSCALL_64_after_hwframe+0x44/0xa9
...
Error appears when some i2c device driver tries to probe for devices
using adapter registered by `smu_v11_0_i2c_eeprom_control_init()`.
Code supporting this adapter requires `adev->psp.ras.ras` to be not
NULL, which is true only when `amdgpu_ras_init()` detects HW support by
calling `amdgpu_ras_check_supported()`.
Before 9015d60c9e, adapter was registered by
-> amdgpu_device_ip_init()
-> amdgpu_ras_recovery_init()
-> amdgpu_ras_eeprom_init()
-> smu_v11_0_i2c_eeprom_control_init()
after verifying that `adev->psp.ras.ras` is not NULL in
`amdgpu_ras_recovery_init()`. Currently it is registered
unconditionally by
-> amdgpu_device_ip_init()
-> pp_sw_init()
-> hwmgr_sw_init()
-> vega20_smu_init()
-> smu_v11_0_i2c_eeprom_control_init()
Fix simply adds HW support check (ras == NULL => no support) before
calling `smu_v11_0_i2c_eeprom_control_{init,fini}()`.
Please note that there is a chance that similar fix is also required for
CHIP_ARCTURUS. I do not know whether any actual Arcturus hardware without
RAS exist, and whether calling `smu_i2c_eeprom_init()` makes any sense
when there is no HW support.
Cc: stable@vger.kernel.org
Fixes: 9015d60c9e ("drm/amdgpu: Move EEPROM I2C adapter to amdgpu_device")
Signed-off-by: Ivan Mironov <mironov.ivan@gmail.com>
Tested-by: Bjorn Nostvold <bjorn.nostvold@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Initializes Powertune data for a specific Hawaii card by fixing what
looks like a typo in the code. The device ID 66B1 is not a supported
device ID for this driver, and is not mentioned elsewhere. 67B1 is a
valid device ID, and is a Hawaii Pro GPU.
I have tested on my R9 390 which has device ID 67B1, and it works
fine without problems.
Signed-off-by: Sandeep Raghuraman <sandy.8925@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Add some APU flags to simplify handling of different APU
variants. It's easier to understand the special cases
if we use names flags rather than checking device ids and
silicon revisions.
v2: rebase on latest code
Acked-by: Evan Quan <evan.quan@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Puts the i2c adapter in common place for sharing by RAS
and upcoming data read from FRU EEPROM feature.
v2:
Move i2c adapter to amdgpu_pm and rename it.
v3: Move i2c adapter init to ASIC specific code and get rid
of the switch case in amdgpu_device
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Originally, due to the restriction from PSP and SMU, VF has
to send message to hypervisor driver to handle powerplay
change which is complicated and redundant. Currently, SMU
and PSP can support VF to directly handle powerplay
change by itself. Therefore, the old code about the handshake
between VF and PF to handle powerplay will be removed and VF
will use new the registers below to handshake with SMU.
mmMP1_SMN_C2PMSG_101: register to handle SMU message
mmMP1_SMN_C2PMSG_102: register to handle SMU parameter
mmMP1_SMN_C2PMSG_103: register to handle SMU response
v2: remove module parameter pp_one_vf
v3: fix the parens
v4: forbid vf to change smu feature
v5: use hwmon_attributes_visible to skip sepicified hwmon atrribute
v6: change skip condition at vega10_copy_table_to_smc
Signed-off-by: Yintian Tao <yttao@amd.com>
Acked-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Kenneth Feng <kenneth.feng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drivers/gpu/drm/amd/amdgpu/../powerplay/smumgr/vegam_smumgr.c:
In function vegam_populate_clock_stretcher_data_table:
drivers/gpu/drm/amd/amdgpu/../powerplay/smumgr/vegam_smumgr.c:1489:29:
warning: variable stretch_amount2 set but not used [-Wunused-but-set-variable]
It is never used, so can be removed.
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The variables HiSidd and LoSidd are being initialized with values that
are never read and are being updated a little later with a new value.
The initialization is redundant and can be removed.
Addresses-Coverity: ("Unused value")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
return errno code to caller when error occur, and meanwhile
remove gcc '-Wunused-but-set-variable' warning.
drivers/gpu/drm/amd/amdgpu/../powerplay/smumgr/vegam_smumgr.c: In function vegam_populate_smc_boot_level:
drivers/gpu/drm/amd/amdgpu/../powerplay/smumgr/vegam_smumgr.c:1364:6: warning: variable result set but not used [-Wunused-but-set-variable]
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Chen Wandun <chenwandun@huawei.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Fixes gcc '-Wunused-but-set-variable' warning:
drivers/gpu/drm/amd/powerplay/smumgr/vegam_smumgr.c: In
function ‘vegam_populate_smc_acpi_level’:
drivers/gpu/drm/amd/powerplay/smumgr/vegam_smumgr.c:1117:11:
warning: variable 'us_mvdd' set but not used [-Wunused-but-set-variable]
It is never used, so can be removed.
Fixes: ac7822b002 ("drm/amd/powerplay: add smumgr support for VEGAM (v2)")
Signed-off-by: yu kuai <yukuai3@huawei.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Fixes gcc '-Wunused-but-set-variable' warning:
drivers/gpu/drm/amd/powerplay/smumgr/fiji_smumgr.c: In function fiji_populate_single_graphic_level:
drivers/gpu/drm/amd/powerplay/smumgr/fiji_smumgr.c:943:11: warning: variable threshold set but not used [-Wunused-but-set-variable]
drivers/gpu/drm/amd/powerplay/smumgr/fiji_smumgr.c: In function fiji_populate_memory_timing_parameters:
drivers/gpu/drm/amd/powerplay/smumgr/fiji_smumgr.c:1504:8: warning: variable state set but not used [-Wunused-but-set-variable]
They are introduced by commit 2e112b4ae3 ("drm/amd/pp:
remove fiji_smc/smumgr split."), but never used,
so remove it.
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: zhengbin <zhengbin13@huawei.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drm-next-5.5-2019-10-09:
amdgpu:
- Additional RAS enablement for vega20
- RAS page retirement and bad page storage in EEPROM
- No GPU reset with unrecoverable RAS errors
- Reserve vram for page tables rather than trying to evict
- Fix issues with GPU reset and xgmi hives
- DC i2c over aux fixes
- Direct submission for clears, PTE/PDE updates
- Improvements to help support recoverable GPU page faults
- Silence harmless SAD block messages
- Clean up code for creating a bo at a fixed location
- Initial DC HDCP support
- Lots of documentation fixes
- GPU reset for renoir
- Add IH clockgating support for soc15 asics
- Powerplay improvements
- DC MST cleanups
- Add support for MSI-X
- Misc cleanups and bug fixes
amdkfd:
- Query KFD device info by asic type rather than pci ids
- Add navi14 support
- Add renoir support
- Add navi12 support
- gfx10 trap handler improvements
- pasid cleanups
- Check against device cgroup
ttm:
- Return -EBUSY with pipelining with no_gpu_wait
radeon:
- Silence harmless SAD block messages
device_cgroup:
- Export devcgroup_check_permission
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexdeucher@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20191010041713.3412-1-alexander.deucher@amd.com
Polaris and vegam use count for the value rather than
level. This looks like a copy paste typo from when
the code was adapted from previous asics.
I'm not sure that the SMU actually uses this value, so
I don't know that it actually is a bug per se.
Bug: https://bugs.freedesktop.org/show_bug.cgi?id=108609
Reported-by: Robert Strube <rstrube@gmail.com>
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Even though 'smu8_smu' is declared, it is not used after below statement.
smu8_smu = hwmgr->smu_backend;
So 'unused variable' could be safely removed
to stop warning message as below:
drivers/gpu/drm/amd/amdgpu/../powerplay/smumgr/smu8_smumgr.c:180:22:
warning: variable ‘smu8_smu’ set but not used
[-Wunused-but-set-variable]
struct smu8_smumgr *smu8_smu;
^
Signed-off-by: Austin Kim <austindh.kim@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Null pointer dereference check should have been checked,
ahead of below routine.
struct amdgpu_device *adev = hwmgr->adev;
With this commit, it could avoid potential NULL dereference.
Signed-off-by: Austin Kim <austindh.kim@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
v2:
PPSMC_MSG_RequestI2CBus seems not to work and so to avoid conflict
over I2C bus and engine disable thermal control access to
force SMU stop using the I2C bus until the issue is reslolved.
Expose and call vega20_is_smc_ram_running to skip locking when SMU
FW is not yet loaded.
v3:
Remove the prevoius hack as the SMU found the bug.
v5: Typo fix
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drm-misc-next for v5.3:
UAPI Changes:
Cross-subsystem Changes:
- Add code to signal all dma-fences when freed with pending signals.
- Annotate reservation object access in CONFIG_DEBUG_MUTEXES
Core Changes:
- Assorted documentation fixes.
- Use irqsave/restore spinlock to add crc entry.
- Move code around to drm_client, for internal modeset clients.
- Make drm_crtc.h and drm_debugfs.h self-contained.
- Remove drm_fb_helper_connector.
- Add bootsplash to todo.
- Fix lock ordering in pan_display_legacy.
- Support pinning buffers to current location in gem-vram.
- Remove the now unused locking functions from gem-vram.
- Remove the now unused kmap-object argument from vram helpers.
- Stop checking return value of debugfs_create.
- Add atomic encoder enable/disable helpers.
- pass drm_atomic_state to atomic connector check.
- Add atomic support for bridge enable/disable.
- Add self refresh helpers to core.
Driver Changes:
- Add extra delay to make MTP SDM845 work.
- Small fixes to virtio, vkms, sii902x, sii9234, ast, mcde, analogix, rockchip.
- Add zpos and ?BGR8888 support to meson.
- More removals of drm_os_linux and drmP headers for amd, radeon, sti, r128, r128, savage, sis.
- Allow synopsis to unwedge the i2c hdmi bus.
- Add orientation quirks for GPD panels.
- Edid cleanups and fixing handling for edid < 1.2.
- Add runtime pm to stm.
- Handle s/r in dw-hdmi.
- Add hooks for power on/off to dsi for stm.
- Remove virtio dirty tracking code, done in drm core.
- Rework BO handling in ast and mgag200.
Tiny conflict in drivers/gpu/drm/amd/display/dc/clk_mgr/clk_mgr.c,
needed #include <linux/slab.h> to make it compile.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/0e01de30-9797-853c-732f-4a5bd6e61445@linux.intel.com
This patch enables gfxoff and stutter mode again, since we take more testing on
raven series. For raven2 and picasso, we can enable it directly. And for raven,
we need check the RLC/SMC ucode version cannot be less than #531/0x1e45.
v2: add smc version checking for raven.
Signed-off-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com> (v1)
Tested-by: Likun Gao <Likun.Gao@amd.com> (v2)
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This patch uses REG32_PCIE wrapper instead of writting pci_index2 and reading
pci_data2 for powerplay. This sequence should be protected by pcie_idx_lock.
Signed-off-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This was noticed by Gustavo and his -Wimplicit-fallthrough
patches. However, in this case, I believe we should have breaks
rather than falling though, that said, in practice we should
never fall through in the first place so there should be no
change in behavior.
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Missing firmware declaration caused firmware requirement to
not be noted by the module and may cause firmware to not
be available in initrd.
Fixes: bc4b539e38 "drm/amdgpu: remove old CI DPM implementation"
Reviewed-by: James Zhu <James.Zhu@amd.com>
Tested-by: James Zhu <James.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>