Commit Graph

12923 Commits

Author SHA1 Message Date
Evan Quan
1af3d0a8e8 drm/amd/pm: update the LC_L1_INACTIVITY setting to address possible noise issue
It is proved that insufficient LC_L1_INACTIVITY setting can cause audio
noise on some platform. With the LC_L1_INACTIVITY increased to 4ms, the
issue can be resolved.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-23 15:46:22 -04:00
Zhigang Luo
acbe761046 drm/amdgpu: Skip TMR for MP0_HWIP 13.0.6
For SRIOV VF, no TMR needed.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Zhigang Luo <Zhigang.Luo@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-23 15:36:48 -04:00
Evan Quan
fd21987274 drm/amd/pm: revise the ASPM settings for thunderbolt attached scenario
Also, correct the comment for NAVI10_PCIE__LC_L1_INACTIVITY_TBT_DEFAULT
as 0x0000000E stands for 400ms instead of 4ms.

Signed-off-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-23 15:36:03 -04:00
Samuel Pitoiset
ea2c3c0855 drm/amdgpu: fix clearing mappings for BOs that are always valid in VM
Per VM BOs must be marked as moved or otherwise their ranges are not
updated on use which might be necessary when the replace operation
splits mappings.

This fixes random GPU hangs when replacing sparse mappings from the
userspace, while OP_MAP/OP_UNMAP works fine because always valid BOs
are correctly handled there.

Cc: stable@vger.kernel.org
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-23 15:35:51 -04:00
Lijo Lazar
184d838482 drm/amdgpu: Add vbios attribute only if supported
Not all devices carry VBIOS version information. Add the device
attribute only if supported.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-23 15:35:26 -04:00
Alex Deucher
c09b3bf736 drm/amdgpu/atomfirmware: fix LPDDR5 width reporting
LPDDR5 channels are 32 bit rather than 64, report the width properly
in the log.

v2: Only LPDDR5 are 32 bits per channel.  DDR5 is 64 bits per channel

Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2468
Acked-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-23 15:35:16 -04:00
Nathan Chancellor
4e3f85d1c0 drm/amdgpu: Remove CONFIG_DEBUG_FS guard around body of amdgpu_rap_debugfs_init()
After commit 8020f0f931 ("drm/amd/amdgpu: enable W=1 for amdgpu"),
there is an instance of -Wunused-const-variable when CONFIG_DEBUG_FS is
disabled:

  drivers/gpu/drm/amd/amdgpu/amdgpu_rap.c:110:37: error: unused variable 'amdgpu_rap_debugfs_ops' [-Werror,-Wunused-const-variable]
    110 | static const struct file_operations amdgpu_rap_debugfs_ops = {
        |                                     ^
  1 error generated.

There is no reason for the body of this function to be guarded when
CONFIG_DEBUG_FS is disabled, as debugfs_create_file() is a stub that
just returns an error pointer in that situation. Remove the preprocessor
guards so that the variable never appears unused, while not changing
anything at run time.

Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-23 15:33:09 -04:00
Lijo Lazar
025654ae42 drm/amdgpu: Move calculation of xcp per memory node
Its value is required for finding the memory id of xcp.

Fixes: d26ea1b346 ("drm/amdgpu: Add xcp manager num_xcp_per_mem_partition")
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-23 15:31:55 -04:00
Jiadong Zhu
ef3c36a6e0 drm/amdgpu: Skip mark offset for high priority rings
Only low priority rings are using chunks to save the offset.
Bypass the mark offset callings from high priority rings.

Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-23 15:31:50 -04:00
Thomas Zimmermann
042aeecc02 drm/amdgpu: Remove struct drm_driver.gem_prime_mmap
The callback struct drm_driver.gem_prime_mmap as been removed in
commit 0adec22702 ("drm: Remove struct drm_driver.gem_prime_mmap").
Do not assign to it. The assigned function, drm_gem_prime_mmap(), is
now the default for the operation, so there is no change in functionality.

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Fixes: 0adec22702 ("drm: Remove struct drm_driver.gem_prime_mmap")
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: "Pan, Xinhui" <Xinhui.Pan@amd.com>
Cc: amd-gfx@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org
Reviewed-by: Christian König <christian.koenig@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230619141129.2002-1-tzimmermann@suse.de
2023-06-19 16:35:27 +02:00
Thomas Zimmermann
de8a334f21 Merge drm/drm-next into drm-misc-next
Backmerging into drm-misc-next to get commit 2c1c7ba457
("drm/amdgpu: support partition drm devices"), which is required to fix
commit 0adec22702 ("drm: Remove struct drm_driver.gem_prime_mmap").

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
2023-06-19 16:33:14 +02:00
Thomas Zimmermann
0adec22702 drm: Remove struct drm_driver.gem_prime_mmap
All drivers initialize this field with drm_gem_prime_mmap(). Call
the function directly and remove the field. Simplifies the code and
resolves a long-standing TODO item.

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230613150441.17720-3-tzimmermann@suse.de
2023-06-19 13:56:40 +02:00
Philip Yang
5d1c70bb6e drm/amdgpu: Increase hmm range get pages timeout
If hmm_range_fault returns -EBUSY, we should call hmm_range_fault again
to validate the remaining pages. On one system with NUMA auto balancing
enabled, hmm_range_fault takes 6 seconds for 1GB range because CPU
migrate the range one page at a time. To be safe, increase timeout value
to 1 second for 128MB range.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-15 17:55:41 -04:00
Philip Yang
d728eda3c5 drm/amdgpu: Enable translate further for GC v9.4.3
To extend UTCL2 reach.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-15 17:55:41 -04:00
Lijo Lazar
0e41639d9a drm/amdgpu: Remove unused NBIO interface
Set compute partition mode interface in NBIO is no longer used. Remove
the only implementation from NBIO v7.9

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-15 14:33:56 -04:00
ZhenGuo Yin
71eaac368d drm/amdgpu: add entity error check in amdgpu_ctx_get_entity
[Why]
UMD is not aware of entity error, and will keep submitting jobs
into the error entity.

[How]
Add entity error check when getting entity from ctx.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: ZhenGuo Yin <zhenguo.yin@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-15 11:37:55 -04:00
Christian König
f88e295e90 drm/amdgpu: add VM generation token
Instead of using the VRAM lost counter add a 64bit token which indicates
if a context or job is still valid to use.

Should the VRAM be lost or the page tables need re-creation the token will
change indicating that userspace needs to act and re-create the contexts
and re-submit the work.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-15 11:37:55 -04:00
Christian König
55bf196f60 drm/amdgpu: reset VM when an error is detected
When some problem with the updates of page tables is detected reset the
state machine of the VM and re-create all page tables from scratch.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-15 11:37:55 -04:00
Christian König
e84e697d92 drm/amdgpu: abort submissions during prepare on error
Forward errors from previous submissions to this one.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-15 11:37:55 -04:00
Christian König
89fae8dc41 drm/amdgpu: mark soft recovered fences with -ENODATA
Set the fence error code before trying to soft-recover it.

It gets overwritten when a hard recovery is required.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-15 11:37:55 -04:00
Christian König
0a33b11d26 drm/amdgpu: mark force completed fences with -ECANCELED
When we force complete fences we should mark them as canceled.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-15 11:37:55 -04:00
Christian König
b13eb02ba8 drm/amdgpu: add amdgpu_error_* debugfs file
This allows us to insert some error codes into the bottom of the pipeline
on an engine.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-15 11:37:54 -04:00
Alex Deucher
2eb841bdbc drm/amdgpu: mark GC 9.4.3 experimental for now
Mark as experimental for now until we get closer to production
to avoid possible undesireable behavior when mixing newer
boards with older kernels.

Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-15 11:37:54 -04:00
Lijo Lazar
b00f55374c drm/amdgpu: Use PSP FW API for partition switch
Use PSP firmware interface for switching compute partitions.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-15 11:06:59 -04:00
Lijo Lazar
fe381726c9 drm/amdgpu: Change nbio v7.9 xcp status definition
PARTITION_MODE field in PARTITION_COMPUTE_STATUS register is defined as
below by firmware.

	SPX = 0, DPX = 1, TPX = 2, QPX = 3, CPX = 4

Change driver definition accordingly.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-15 11:06:59 -04:00
Christian König
ca0b954a43 drm/amdgpu: make sure that BOs have a backing store
It's perfectly possible that the BO is about to be destroyed and doesn't
have a backing store associated with it.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Guchun Chen <guchun.chen@amd.com>
Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-15 11:06:59 -04:00
Christian König
e2ad8e2df4 drm/amdgpu: make sure BOs are locked in amdgpu_vm_get_memory
We need to grab the lock of the BO or otherwise can run into a crash
when we try to inspect the current location.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Guchun Chen <guchun.chen@amd.com>
Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-15 11:06:59 -04:00
Stanley.Yang
43aedbf4da drm/amdgpu: Add checking mc_vram_size
Do not compare injection address with mc_vram_size
if mc_vram_size is zero.

Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-15 11:06:59 -04:00
Stanley.Yang
38298ce6fc drm/amdgpu: Optimize checking ras supported
Using "is_app_apu" to identify device in the native
APU mode or carveout mode.

Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-15 11:06:59 -04:00
Candice Li
6fac3964a9 drm/amdgpu: Add channel_dis_num to ras init flags
Add disabled channel number to ras init flags.

Signed-off-by: Candice Li <candice.li@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-15 11:06:59 -04:00
Candice Li
bcd9a5f8b9 drm/amdgpu: Update total channel number for umc v8_10
Update total channel number for umc v8_10.

Signed-off-by: Candice Li <candice.li@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-15 11:06:59 -04:00
Luben Tuikov
71344a718a drm/amdgpu: Fix usage of UMC fill record in RAS
The fixed commit listed in the Fixes tag below, introduced a bug in
amdgpu_ras.c::amdgpu_reserve_page_direct(), in that when introducing the new
amdgpu_umc_fill_error_record() and internally in that new function the physical
address (argument "uint64_t retired_page"--wrong name) is right-shifted by
AMDGPU_GPU_PAGE_SHIFT. Thus, in amdgpu_reserve_page_direct() when we pass
"address" to that new function, we should NOT right-shift it, since this
results, erroneously, in the page address to be 0 for first
2^(2*AMDGPU_GPU_PAGE_SHIFT) memory addresses.

This commit fixes this bug.

Cc: Tao Zhou <tao.zhou1@amd.com>
Cc: Hawking Zhang <Hawking.Zhang@amd.com>
Cc: Alex Deucher <Alexander.Deucher@amd.com>
Fixes: 400013b268 ("drm/amdgpu: add umc_fill_error_record to make code more simple")
Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
Link: https://lore.kernel.org/r/20230610113536.10621-1-luben.tuikov@amd.com
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-15 11:06:58 -04:00
Alex Deucher
e5df16d942 drm/amdgpu/sdma4: set align mask to 255
The wptr needs to be incremented at at least 64 dword intervals,
use 256 to align with windows.  This should fix potential hangs
with unaligned updates.

Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Aaron Liu <aaron.liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-15 11:06:58 -04:00
Luben Tuikov
740f42a28f drm/amdgpu: Report ras_num_recs in debugfs
Report the number of records stored in the RAS EEPROM table in debugfs.

This can be used by user-space to calculate the capacity of the RAS EEPROM
table since "bad_page_cnt_threshold" is also reported in the same place in
debugfs.

See commit 7fb6407145 ("drm/amdgpu: Add bad_page_cnt_threshold to debugfs").

ras_num_recs can already be inferred by dumping the RAS EEPROM table, also in
the same debugfs location, see commit reference c65b0805e7 (drm/amdgpu:
RAS EEPROM table is now in debugfs, 2021-04-08). This commit makes it an
integer value easily shown in a single file.

Cc: Alex Deucher <Alexander.Deucher@amd.com>
Cc: Hawking Zhang <Hawking.Zhang@amd.com>
Cc: Tao Zhou <tao.zhou1@amd.com>
Cc: Stanley Yang <Stanley.Yang@amd.com>
Cc: John Clements <john.clements@amd.com>
Signed-off-by: Luben Tuikov <luben.tuikov@amd.com>
Link: https://lore.kernel.org/r/20230603051043.211548-1-luben.tuikov@amd.com
Acked-by: Alex Deucher <Alexander.Deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-15 11:06:58 -04:00
Lijo Lazar
82a1f42f6a drm/amdgpu: Release SDMAv4.4.2 ecc irq properly
Release ECC irq only if irq is enabled - only when RAS feature is enabled
ECC irq gets enabled.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-15 11:06:58 -04:00
Likun Gao
d4a4ff1c8e drm/amdgpu: add wait_for helper for spirom update
Spirom update typically requires extremely long
duration for command execution, and special helper
function to wait for it completion.

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-15 11:06:58 -04:00
Ma Jun
a1c23485b8 drm/amdgpu: Print client id for the unregistered interrupt resource
Modify the debug information and print the clien id for these
interrupts as well.

Signed-off-by: Ma Jun <Jun.Ma2@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-15 11:06:57 -04:00
Arunpravin Paneer Selvam
59eddd4e21 Revert "drm/amdgpu: remove TOPDOWN flags when allocating VRAM in large bar system"
This reverts commit c105518679.

This patch disables the TOPDOWN flag for APU and few dGPU cards
which has the VRAM size equal to the BAR size.

When we enable the TOPDOWN flag, we get the free blocks at
the highest available memory region and we don't split the
lower order blocks. This change is required to keep off
the fragmentation related issues particularly in ASIC
which has VRAM space <= 500MiB

Hence, we are reverting this patch.

Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2270
Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-15 10:43:36 -04:00
Shiwu Zhang
8d8ffe3740 drm/amdgpu: expose num_hops and num_links xgmi info through dev attr
Add these two dev attrs for xgmi info details which is helpful for
developers checking the xgmi topology by catting the sys file directly.

Take 4 cards with xgmi connection as an example, get the num_hops for each
device or node through xmig_hive_info dir like,
cat /sys/bus/pci/devices/0000:41:00.0/xgmi_hive_info/node1/num_hops
will return "00 41 41 41" where "00" stands for the hops to node1 itself
and "41" is the hops in hex format to every other node in the same hive.
There are node1/node2/node3/node4 representing 4 cards in the hive.

The same for num_links dev attr.

Signed-off-by: Shiwu Zhang <shiwu.zhang@amd.com>
Acked-by: Le Ma <le.ma@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-15 10:43:31 -04:00
Sonny Jiang
188d3f80fc drm/amdgpu: vcn_4_0 set instance 0 init sched score to 1
Only vcn0 can process AV1 codecx. In order to use both vcn0 and
vcn1 in h264/265 transcode to AV1 cases, set vcn0 sched score to 1
at initialization time.

Signed-off-by: Sonny Jiang <sonjiang@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-15 10:43:26 -04:00
Hamza Mahfooz
8020f0f931 drm/amd/amdgpu: enable W=1 for amdgpu
We have a clean build with W=1 as of
commit c168feed5d ("drm/amd/display/amdgpu_dm/amdgpu_dm_helpers: Move
SYNAPTICS_DEVICE_ID into CONFIG_DRM_AMD_DC_DCN ifdef"). So, let's enable
these checks unconditionally for the entire module to catch these errors
during development.

Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Nathan Chancellor <nathan@kernel.org>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-15 10:42:42 -04:00
Srinivasan Shanmugam
e06da81749 drm/amdgpu: Fix kdoc warning
Fixes the following gcc with W=1:

drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c:76: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
 * EEPROM Table structure v1
drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c:98: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
 * EEPROM Table structrue v2.1

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-15 10:42:39 -04:00
Mukul Joshi
41ce6d6d03 drm/amdgpu: Rename DRM schedulers in amdgpu TTM
Rename mman.entity to mman.high_pr to make the distinction
clearer that this is a high priority scheduler. Similarly,
rename the recently added mman.delayed to mman.low_pr to
make it clear it is a low priority scheduler.
No functional change in this patch.

Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-15 10:42:33 -04:00
Dave Airlie
901bdf5ea1 Merge tag 'amd-drm-next-6.5-2023-06-09' of https://gitlab.freedesktop.org/agd5f/linux into drm-next
amd-drm-next-6.5-2023-06-02:

amdgpu:
- SR-IOV fixes
- Warning fixes
- Misc code cleanups and spelling fixes
- DCN 3.2 updates
- Improved DC FAMS support for better power management
- Improved DC SubVP support for better power management
- DCN 3.1.x fixes
- Max IB size query
- DC GPU reset fixes
- RAS updates
- DCN 3.0.x fixes
- S/G display fixes
- CP shadow buffer support
- Implement connector force callback
- Z8 power improvements
- PSP 13.0.10 vbflash support
- Mode2 reset fixes
- Store MQDs in VRAM to improve queue switch latency
- VCN 3.x fixes
- JPEG 3.x fixes
- Enable DC_FP on LoongArch
- GFXOFF fixes
- GC 9.4.3 partition support
- SDMA 4.4.2 partition support
- VCN/JPEG 4.0.3 partition support
- VCN 4.0.3 updates
- NBIO 7.9 updates
- GC 9.4.3 updates
- Take NUMA into account when allocating memory
- Handle NUMA for partitions
- SMU 13.0.6 updates
- GC 9.4.3 RAS updates
- Stop including unused swiotlb.h
- SMU 13.0.7 fixes
- Fix clock output ordering on some APUs
- Clean up DC FPGA code
- GFX9 preemption fixes
- Misc irq fixes
- S0ix fixes
- Add new DRM_AMDGPU_WERROR config parameter to help with CI
- PCIe fix for RDNA2
- kdoc fixes
- Documentation updates

amdkfd:
- Query TTM mem limit rather than hardcoding it
- GC 9.4.3 partition support
- Handle NUMA for partitions

radeon:
- Fix possible double free
- Stop including unused swiotlb.h
- Fix possible division by zero

ttm:
- Add query for TTM mem limit
- Add NUMA awareness to pools
- Export ttm_pool_fini()

UAPI:
- Add new ctx query flag to better handle GPU resets
  Mesa MR: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22290
- Add new interface to query and set shadow buffer for RDNA3
  Mesa MR: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21986
- Add new INFO query for max IB size
  Proposed userspace: https://gitlab.freedesktop.org/bnieuwenhuizen/mesa/-/commits/ib-rejection-v3

amd-drm-next-6.5-2023-06-09:

amdgpu:
- S0ix fixes
- Initial SMU13 Overdrive support
- kdoc fixes
- Misc clode cleanups
- Flexible array fixes
- Display OTG fixes
- SMU 13.0.6 updates
- Revert some broken clock counter updates
- Misc display fixes
- GFX9 preemption fixes
- Add support for newer EEPROM bad page table format
- Add missing radeon secondary id
- Add support for new colorspace KMS API
- CSA fix
- Stable pstate fixes for APUs
- make vbl interface admin only
- Handle PCI accelerator class

amdkfd:
- Add debugger support for gdb

radeon:
- Fix possible UAF

drm:
- Add Colorspace functionality

UAPI:
- Add debugger interface for enabling gdb
  Proposed userspace: https://github.com/ROCm-Developer-Tools/ROCdbgapi/tree/wip-dbgapi
- Add KMS colorspace API
  Discussion: https://lists.freedesktop.org/archives/dri-devel/2023-June/408128.html

From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230609174817.7764-1-alexander.deucher@amd.com
2023-06-15 14:11:22 +10:00
Arunpravin Paneer Selvam
34e5a54327 Revert "drm/amdgpu: remove TOPDOWN flags when allocating VRAM in large bar system"
This reverts commit c105518679.

This patch disables the TOPDOWN flag for APU and few dGPU cards
which has the VRAM size equal to the BAR size.

When we enable the TOPDOWN flag, we get the free blocks at
the highest available memory region and we don't split the
lower order blocks. This change is required to keep off
the fragmentation related issues particularly in ASIC
which has VRAM space <= 500MiB

Hence, we are reverting this patch.

Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2270
Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2023-06-13 17:12:41 -04:00
Sonny Jiang
9db5ec1ceb drm/amdgpu: vcn_4_0 set instance 0 init sched score to 1
Only vcn0 can process AV1 codecx. In order to use both vcn0 and
vcn1 in h264/265 transcode to AV1 cases, set vcn0 sched score to 1
at initialization time.

Signed-off-by: Sonny Jiang <sonjiang@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 6.1.x
2023-06-13 17:08:42 -04:00
Mario Limonciello
7ab1a4913d drm/amd: Tighten permissions on VBIOS flashing attributes
Non-root users shouldn't be able to try to trigger a VBIOS flash
or query the flashing status.  This should be reserved for users with the
appropriate permissions.

Cc: stable@vger.kernel.org
Fixes: 8424f2ccb3 ("drm/amdgpu/psp: Add vbflash sysfs interface support")
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-13 17:04:51 -04:00
Mario Limonciello
3eb1a3a040 drm/amd: Make sure image is written to trigger VBIOS image update flow
The VBIOS image update flow requires userspace to:
1) Write the image to `psp_vbflash`
2) Read `psp_vbflash`
3) Poll `psp_vbflash_status` to check for completion

If userspace reads `psp_vbflash` before writing an image, it's
possible that it causes problems that can put the dGPU into an invalid
state.

Explicitly check that an image has been written before letting a read
succeed.

Cc: stable@vger.kernel.org
Fixes: 8424f2ccb3 ("drm/amdgpu/psp: Add vbflash sysfs interface support")
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-13 17:04:45 -04:00
Alex Deucher
e61f67749b drm/amdgpu: add missing radeon secondary PCI ID
0x5b70 is a missing RV370 secondary id.  Add it so
we don't try and probe it with amdgpu.

Cc: michel@daenzer.net
Reviewed-by: Michel Dänzer <mdaenzer@redhat.com>
Tested-by: Michel Dänzer <mdaenzer@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2023-06-13 17:02:51 -04:00
Jiadong Zhu
5b711e7f9c drm/amdgpu: Implement gfx9 patch functions for resubmission
Patch the packages including CONTEXT_CONTROL and WRITE_DATA for gfx9
during the resubmission scenario.

Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 6.3.x
2023-06-13 16:58:23 -04:00