Kenneth Feng
5f38ac54e6
drm/amd/pm: fix the high voltage and temperature issue
...
fix the high voltage and temperature issue after the driver is unloaded on smu 13.0.0,
smu 13.0.7 and smu 13.0.10
v2 - fix the code format and make sure it is used on the unload case only.
Signed-off-by: Kenneth Feng <kenneth.feng@amd.com >
Reviewed-by: Yang Wang <kevinyang.wang@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2023-10-31 16:40:16 -04:00
Yifan Zhang
a17f574ab4
drm/amdgpu: remove amdgpu_mes_self_test in gpu recover
...
gpu tlb flush is skipped if reset sem is held, it makes
mes_self_test fail since it involves add_hw_queue/remove_hw_queue
which needs tlb flush functional. Remove mes_self_test in gpu
recover sequence.
This patch is to fix the recover failure in gfx11.
[ 1831.768292] [drm] ring sdma_32769.3.3 was added
[ 1831.768313] [drm] ring gfx_32769.1.1 ib test pass
[ 1831.768337] [drm] ring compute_32769.2.2 ib test pass
[ 1831.768399] amdgpu 0000:c2:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:24 vmid:8 pasid:32769, for process pid 0 thread pid 0)
[ 1831.768434] amdgpu 0000:c2:00.0: amdgpu: in page starting at address 0x0000aec200000000 from client 10
[ 1831.768456] amdgpu 0000:c2:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00800A30
[ 1831.768473] amdgpu 0000:c2:00.0: amdgpu: Faulty UTCL2 client ID: CPC (0x5)
[ 1831.768489] amdgpu 0000:c2:00.0: amdgpu: MORE_FAULTS: 0x0
[ 1831.768501] amdgpu 0000:c2:00.0: amdgpu: WALKER_ERROR: 0x0
[ 1831.768513] amdgpu 0000:c2:00.0: amdgpu: PERMISSION_FAULTS: 0x3
[ 1831.768521] amdgpu 0000:c2:00.0: amdgpu: MAPPING_ERROR: 0x0
[ 1831.768529] amdgpu 0000:c2:00.0: amdgpu: RW: 0x0
[ 1831.931229] amdgpu 0000:c2:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring sdma_32769.3.3 test failed (-110)
[ 1832.062917] [drm:mes_v11_0_submit_pkt_and_poll_completion.constprop.0 [amdgpu]] *ERROR* MES failed to response msg=3
[ 1832.063107] [drm:amdgpu_mes_remove_hw_queue [amdgpu]] *ERROR* failed to remove hardware queue, queue id = 3
Fixes: e2e3788850 ("drm/amdgpu: rework lock handling for flush_tlb v2")
Reported-by: Li Ma <li.ma@amd.com >
Reviewed-by: Christian König <christian.koenig@amd.com >
Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2023-10-31 16:40:15 -04:00
Candice Li
e020d01575
drm/amdgpu: Drop deferred error in uncorrectable error check
...
Drop checking deferred error which can be handled by poison
consumption.
Signed-off-by: Candice Li <candice.li@amd.com >
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2023-10-31 16:40:15 -04:00
Tao Zhou
d1d4c0b7b6
drm/amdgpu: check RAS supported first in ras_reset_error_count
...
Not all platforms support RAS.
Fixes: 73582be11a ("drm/amdgpu: bypass RAS error reset in some conditions")
Signed-off-by: Tao Zhou <tao.zhou1@amd.com >
Reviewed-by: Yang Wang <kevinyang.wang@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2023-10-31 16:40:15 -04:00
Dave Airlie
631808095a
Merge tag 'amd-drm-next-6.7-2023-10-27' of https://gitlab.freedesktop.org/agd5f/linux into drm-next
...
amd-drm-next-6.7-2023-10-27:
amdgpu:
- RAS fixes
- Seamless boot fixes
- NBIO 7.7 fix
- SMU 14.0 fixes
- GC 11.5 fixes
- DML2 fixes
- ASPM fixes
- VPE fixes
- Misc code cleanups
- SRIOV fixes
- Add some missing copyright notices
- DCN 3.5 fixes
- FAMS fixes
- Backlight fix
- S/G display fix
- fdinfo cleanups
- EXT_COHERENT fixes for APU and NUMA systems
amdkfd:
- Misc fixes
- Misc code cleanups
- SVM fixes
Signed-off-by: Dave Airlie <airlied@redhat.com >
From: Alex Deucher <alexander.deucher@amd.com >
Link: https://patchwork.freedesktop.org/patch/msgid/20231027200343.57132-1-alexander.deucher@amd.com
2023-10-31 12:37:19 +10:00
Dave Airlie
915b6d034b
Merge tag 'drm-misc-next-2023-10-27' of git://anongit.freedesktop.org/drm/drm-misc into drm-next
...
drm-misc-next for v6.7-rc1:
drm-misc-next-2023-10-19 + following:
UAPI Changes:
Cross-subsystem Changes:
- Convert fbdev drivers to use fbdev i/o mem helpers.
Core Changes:
- Use cross-references for macros in docs.
- Make drm_client_buffer_addb use addfb2.
- Add NV20 and NV30 YUV formats.
- Documentation updates for create_dumb ioctl.
- CI fixes.
- Allow variable number of run-queues in scheduler.
Driver Changes:
- Rename drm/ast constants.
- Make ili9882t its own driver.
- Assorted fixes in ivpu, vc4, bridge/synopsis, amdgpu.
- Add planar formats to rockchip.
Signed-off-by: Dave Airlie <airlied@redhat.com >
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com >
Link: https://patchwork.freedesktop.org/patch/msgid/3d92fae8-9b1b-4165-9ca8-5fda11ee146b@linux.intel.com
2023-10-31 10:47:50 +10:00
Umio Yasuno
dd3dd9829b
drm/amdgpu: Remove unused variables from amdgpu_show_fdinfo
...
Remove unused variables from amdgpu_show_fdinfo
Reviewed-by: Christian König <christian.koenig@amd.com >
Signed-off-by: Umio Yasuno <coelacanth_dream@protonmail.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2023-10-27 14:23:01 -04:00
Rob Clark
e8e696c307
drm/amdgpu: Remove duplicate fdinfo fields
...
Some of the fields that are handled by drm_show_fdinfo() crept back in
when rebasing the patch. Remove them again.
Fixes: 376c25f8ca ("drm/amdgpu: Switch to fdinfo helper")
Reviewed-by: Christian König <christian.koenig@amd.com >
Signed-off-by: Rob Clark <robdclark@chromium.org >
Reviewed-by: <alexander.deucher@amd.com >
Co-developed-by: Umio Yasuno <coelacanth_dream@protonmail.com >
Signed-off-by: Umio Yasuno <coelacanth_dream@protonmail.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2023-10-27 14:23:01 -04:00
Kenneth Feng
3ea8dd3758
drm/amd/amdgpu: avoid to disable gfxhub interrupt when driver is unloaded
...
avoid to disable gfxhub interrupt when driver is unloaded on gmc 11
Signed-off-by: Kenneth Feng <kenneth.feng@amd.com >
Reviewed-by: Alex Deucher <alexander.deucher@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2023-10-27 14:15:39 -04:00
David Francis
142262a1c0
drm/amdgpu: Add EXT_COHERENT support for APU and NUMA systems
...
On gfx943 APU, EXT_COHERENT should give MTYPE_CC for local and
MTYPE_UC for nonlocal memory.
On NUMA systems, local memory gets the local mtype, set by an
override callback. If EXT_COHERENT is set, memory will be set as
MTYPE_UC by default, with local memory MTYPE_CC.
Add an option in the override function for this case, and
add a check to ensure it is not used on UNCACHED memory.
V2: Combined APU and NUMA code into one patch
V3: Fixed a potential nullptr in amdgpu_vm_bo_update
Signed-off-by: David Francis <David.Francis@amd.com >
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2023-10-27 14:15:16 -04:00
Candice Li
a395f7ffce
drm/amdgpu: Retrieve CE count from ce_count_lo_chip in EccInfo table
...
Retrieve correctable error count from ce_count_lo_chip instead of
mca_umc_status.
Signed-off-by: Candice Li <candice.li@amd.com >
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2023-10-27 14:15:10 -04:00
Candice Li
d59fcfb084
drm/amdgpu: Identify data parity error corrected in replay mode
...
Use ErrorCodeExt field to identify data parity error in replay mode.
Signed-off-by: Candice Li <candice.li@amd.com >
Reviewed-by: Tao Zhou <tao.zhou1@amd.com >
Reviewed-by: Yang Wang <kevinyang.wang@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2023-10-27 14:15:03 -04:00
Mukul Joshi
f7a17b2b36
drm/amdgpu: Fix typo in IP discovery parsing
...
Fix a typo in parsing of the GC info table header when
reading the IP discovery table.
Fixes: 0e64c9aad0 ("drm/amdgpu: add type conversion for gc info")
Signed-off-by: Mukul Joshi <mukul.joshi@amd.com >
Reviewed-by: Alex Deucher <alexander.deucher@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2023-10-27 14:13:27 -04:00
Dave Airlie
44117828ed
Merge tag 'amd-drm-fixes-6.6-2023-10-25' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes
...
amd-drm-fixes-6.6-2023-10-25:
amdgpu:
- Extend VI APSM quirks to more platforms
Signed-off-by: Dave Airlie <airlied@redhat.com >
From: Alex Deucher <alexander.deucher@amd.com >
Link: https://patchwork.freedesktop.org/patch/msgid/20231026035452.14921-1-alexander.deucher@amd.com
2023-10-27 12:17:26 +10:00
Dave Airlie
6366ffa6ed
Merge tag 'drm-misc-fixes-2023-10-26' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes
...
Short summary of fixes pull:
amdgpu:
- ignore duplicated BOs in CS parser
- remove redundant call to amdgpu_ctx_priority_is_valid()
amdkfd:
- reserve fence slot while locking BO
dp_mst:
- Fix NULL deref in get_mst_branch_device_by_guid_helper()
logicvc:
- Kconfig: Select REGMAP and REGMAP_MMIO
ivpu:
- Fix missing VPUIP interrupts
Signed-off-by: Dave Airlie <airlied@redhat.com >
From: Thomas Zimmermann <tzimmermann@suse.de >
Link: https://patchwork.freedesktop.org/patch/msgid/20231026110132.GA10591@linux-uq9g.fritz.box
2023-10-27 11:51:35 +10:00
Lijo Lazar
d055714a21
drm/amdgpu: Use pcie domain of xcc acpi objects
...
PCI domain/segment information of xccs is available through ACPI DSM
methods. Consider that also while looking for devices.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com >
Reviewed-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com >
Acked-by: Alex Deucher <alexander.deucher@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2023-10-26 19:04:39 -04:00
Lijo Lazar
3f69d5860f
drm/amdgpu: Add a read to GFX v9.4.3 ring test
...
Issue a read to confirm the register write before ringing doorbell. With
multiple XCCs there is chance for race condition.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com >
Acked-by: Hawking Zhang <Hawking.Zhang@amd.com >
Reviewed-by: Asad Kamal <asad.kamal@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2023-10-26 19:04:13 -04:00
Tao Zhou
2cea7bb911
drm/amdgpu: get RAS poison status from DF v4_6_2
...
Add DF block and RAS poison mode query for DF v4_6_2.
Signed-off-by: Tao Zhou <tao.zhou1@amd.com >
Reviewed-by: Stanley.Yang <Stanley.Yang@amd.com >
Acked-by: Hawking Zhang <Hawking.Zhang@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2023-10-26 19:02:52 -04:00
Lijo Lazar
dd2687f5d9
drm/amdgpu: Use discovery table's subrevision
...
Use subrevision of IP version in discovery table to identify SOC
revision id for NBIO v7.9 SOCs. Only newer bootloaders update
subrevision field.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com >
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com >
Reviewed-by: Le Ma <le.ma@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2023-10-26 19:02:44 -04:00
Mario Limonciello
2757a848cb
drm/amd: Explicitly disable ASPM when dynamic switching disabled
...
Currently there are separate but related checks:
* amdgpu_device_should_use_aspm()
* amdgpu_device_aspm_support_quirk()
* amdgpu_device_pcie_dynamic_switching_supported()
Simplify into checking whether DPM was enabled or not in the auto
case. This works because amdgpu_device_pcie_dynamic_switching_supported()
populates that value.
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com >
Reviewed-by: Alex Deucher <alexander.deucher@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2023-10-26 18:41:23 -04:00
Mario Limonciello
1a6513de49
drm/amd: Move AMD_IS_APU check for ASPM into top level function
...
There is no need for every ASIC driver to perform the same check.
Move the duplicated code into amdgpu_device_should_use_aspm().
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com >
Reviewed-by: Alex Deucher <alexander.deucher@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2023-10-26 18:41:23 -04:00
Mario Limonciello
fbf1035b03
drm/amd: Disable PP_PCIE_DPM_MASK when dynamic speed switching not supported
...
Rather than individual ASICs checking for the quirk, set the quirk at the
driver level.
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com >
Reviewed-by: Alex Deucher <alexander.deucher@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2023-10-26 18:41:23 -04:00
Lin.Cao
9ee819285c
drm/amdgpu remove restriction of sriov max_pfn on Vega10
...
Remove restriction of sriov max_pfn so that TBA and TMA can move to high
47 bits address.
Regression test: change range alloc flag of libdrm as
AMDGPU_VA_RANGE_HIGH and there is no flr occur when testing amdgpu_test
of drm.
Signed-off-by: Lin.Cao <lincao12@amd.com >
Acked-by: Christian König <christian.koenig@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2023-10-26 18:41:22 -04:00
Qu Huang
5104fdf50d
drm/amdgpu: Fix a null pointer access when the smc_rreg pointer is NULL
...
In certain types of chips, such as VEGA20, reading the amdgpu_regs_smc file could result in an abnormal null pointer access when the smc_rreg pointer is NULL. Below are the steps to reproduce this issue and the corresponding exception log:
1. Navigate to the directory: /sys/kernel/debug/dri/0
2. Execute command: cat amdgpu_regs_smc
3. Exception Log::
[4005007.702554] BUG: kernel NULL pointer dereference, address: 0000000000000000
[4005007.702562] #PF: supervisor instruction fetch in kernel mode
[4005007.702567] #PF: error_code(0x0010) - not-present page
[4005007.702570] PGD 0 P4D 0
[4005007.702576] Oops: 0010 [#1 ] SMP NOPTI
[4005007.702581] CPU: 4 PID: 62563 Comm: cat Tainted: G OE 5.15.0-43-generic #46-Ubunt u
[4005007.702590] RIP: 0010:0x0
[4005007.702598] Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6.
[4005007.702600] RSP: 0018:ffffa82b46d27da0 EFLAGS: 00010206
[4005007.702605] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffa82b46d27e68
[4005007.702609] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff9940656e0000
[4005007.702612] RBP: ffffa82b46d27dd8 R08: 0000000000000000 R09: ffff994060c07980
[4005007.702615] R10: 0000000000020000 R11: 0000000000000000 R12: 00007f5e06753000
[4005007.702618] R13: ffff9940656e0000 R14: ffffa82b46d27e68 R15: 00007f5e06753000
[4005007.702622] FS: 00007f5e0755b740(0000) GS:ffff99479d300000(0000) knlGS:0000000000000000
[4005007.702626] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[4005007.702629] CR2: ffffffffffffffd6 CR3: 00000003253fc000 CR4: 00000000003506e0
[4005007.702633] Call Trace:
[4005007.702636] <TASK>
[4005007.702640] amdgpu_debugfs_regs_smc_read+0xb0/0x120 [amdgpu]
[4005007.703002] full_proxy_read+0x5c/0x80
[4005007.703011] vfs_read+0x9f/0x1a0
[4005007.703019] ksys_read+0x67/0xe0
[4005007.703023] __x64_sys_read+0x19/0x20
[4005007.703028] do_syscall_64+0x5c/0xc0
[4005007.703034] ? do_user_addr_fault+0x1e3/0x670
[4005007.703040] ? exit_to_user_mode_prepare+0x37/0xb0
[4005007.703047] ? irqentry_exit_to_user_mode+0x9/0x20
[4005007.703052] ? irqentry_exit+0x19/0x30
[4005007.703057] ? exc_page_fault+0x89/0x160
[4005007.703062] ? asm_exc_page_fault+0x8/0x30
[4005007.703068] entry_SYSCALL_64_after_hwframe+0x44/0xae
[4005007.703075] RIP: 0033:0x7f5e07672992
[4005007.703079] Code: c0 e9 b2 fe ff ff 50 48 8d 3d fa b2 0c 00 e8 c5 1d 02 00 0f 1f 44 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 0f 05 <48> 3d 00 f0 ff ff 77 56 c3 0f 1f 44 00 00 48 83 e c 28 48 89 54 24
[4005007.703083] RSP: 002b:00007ffe03097898 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[4005007.703088] RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 00007f5e07672992
[4005007.703091] RDX: 0000000000020000 RSI: 00007f5e06753000 RDI: 0000000000000003
[4005007.703094] RBP: 00007f5e06753000 R08: 00007f5e06752010 R09: 00007f5e06752010
[4005007.703096] R10: 0000000000000022 R11: 0000000000000246 R12: 0000000000022000
[4005007.703099] R13: 0000000000000003 R14: 0000000000020000 R15: 0000000000020000
[4005007.703105] </TASK>
[4005007.703107] Modules linked in: nf_tables libcrc32c nfnetlink algif_hash af_alg binfmt_misc nls_ iso8859_1 ipmi_ssif ast intel_rapl_msr intel_rapl_common drm_vram_helper drm_ttm_helper amd64_edac t tm edac_mce_amd kvm_amd ccp mac_hid k10temp kvm acpi_ipmi ipmi_si rapl sch_fq_codel ipmi_devintf ipm i_msghandler msr parport_pc ppdev lp parport mtd pstore_blk efi_pstore ramoops pstore_zone reed_solo mon ip_tables x_tables autofs4 ib_uverbs ib_core amdgpu(OE) amddrm_ttm_helper(OE) amdttm(OE) iommu_v 2 amd_sched(OE) amdkcl(OE) drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops cec rc_core drm igb ahci xhci_pci libahci i2c_piix4 i2c_algo_bit xhci_pci_renesas dca
[4005007.703184] CR2: 0000000000000000
[4005007.703188] ---[ end trace ac65a538d240da39 ]---
[4005007.800865] RIP: 0010:0x0
[4005007.800871] Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6.
[4005007.800874] RSP: 0018:ffffa82b46d27da0 EFLAGS: 00010206
[4005007.800878] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffa82b46d27e68
[4005007.800881] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff9940656e0000
[4005007.800883] RBP: ffffa82b46d27dd8 R08: 0000000000000000 R09: ffff994060c07980
[4005007.800886] R10: 0000000000020000 R11: 0000000000000000 R12: 00007f5e06753000
[4005007.800888] R13: ffff9940656e0000 R14: ffffa82b46d27e68 R15: 00007f5e06753000
[4005007.800891] FS: 00007f5e0755b740(0000) GS:ffff99479d300000(0000) knlGS:0000000000000000
[4005007.800895] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[4005007.800898] CR2: ffffffffffffffd6 CR3: 00000003253fc000 CR4: 00000000003506e0
Signed-off-by: Qu Huang <qu.huang@linux.dev >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2023-10-26 18:41:22 -04:00
Tao Zhou
73582be11a
drm/amdgpu: bypass RAS error reset in some conditions
...
PMFW is responsible for RAS error reset in some conditions, driver can
skip the operation.
v2: add check for ras->in_recovery, it's set earlier than
amdgpu_in_reset.
v3: fix error in gpu reset check.
Signed-off-by: Tao Zhou <tao.zhou1@amd.com >
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2023-10-26 18:41:22 -04:00
Tao Zhou
f3a3bbf156
drm/amdgpu: enable RAS poison mode for APU
...
Enable it by default on APU platform.
Signed-off-by: Tao Zhou <tao.zhou1@amd.com >
Reviewed-by: Stanley.Yang <Stanley.Yang@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2023-10-26 18:41:22 -04:00
Lang Yu
fc4981b69c
drm/amdgpu/vpe: correct queue stop programing
...
Otherwise IB test would fail during GPU reset.
Signed-off-by: Lang Yu <Lang.Yu@amd.com >
Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2023-10-26 18:41:22 -04:00
Mario Limonciello
e5f52a84bf
drm/amd: Disable ASPM for VI w/ all Intel systems
...
Originally we were quirking ASPM disabled specifically for VI when
used with Alder Lake, but it appears to have problems with Rocket
Lake as well.
Like we've done in the case of dpm for newer platforms, disable
ASPM for all Intel systems.
Cc: stable@vger.kernel.org # 5.15+
Fixes: 0064b0ce85 ("drm/amd/pm: enable ASPM by default")
Reported-and-tested-by: Paolo Gentili <paolo.gentili@canonical.com >
Closes: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2036742
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com >
Reviewed-by: Alex Deucher <alexander.deucher@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2023-10-26 18:41:22 -04:00
Lijo Lazar
8eece69ace
drm/amdgpu: Add API to get full IP version
...
Fetch the full version of IP including variant and subrevision.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com >
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com >
Reviewed-by: Le Ma <le.ma@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2023-10-26 18:41:21 -04:00
Jiadong Zhu
037fb9c600
drm/amdgpu: add tmz support for GC IP v11.5.0
...
Add tmz support for GC 11.5.0.
Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com >
Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2023-10-26 18:41:21 -04:00
Li Ma
493c75bbe3
drm/amdgpu: modify if condition in nbio_v7_7.c
...
remove unnecessary "enable" in if condition.
Signed-off-by: Li Ma <li.ma@amd.com >
Reviewed-by: Tim Huang <Tim.Huang@amd.com >
Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2023-10-26 18:41:21 -04:00
Yang Wang
ec3e0a9167
drm/amdgpu: refine ras error kernel log print
...
refine ras error kernel log to avoid user-ridden ambiguity.
Signed-off-by: Yang Wang <kevinyang.wang@amd.com >
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2023-10-26 18:41:21 -04:00
Yang Wang
53d4d77927
drm/amdgpu: fix find ras error node error
...
the origin function might return the wrong node.
Fixes: 5b1270beb3 ("drm/amdgpu: add ras_err_info to identify RAS error source")
Signed-off-by: Yang Wang <kevinyang.wang@amd.com >
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2023-10-26 18:41:21 -04:00
Alex Deucher
b70438004a
drm/amdgpu: move buffer funcs setting up a level
...
Rather than doing this in the IP code for the SDMA paging
engine, move it up to the core device level init level.
This should fix the scheduler init ordering.
v2: drop extra parens
v3: drop SDMA helpers
v4: Added a Fixes tag because amdgpu dereferences an uninitialized
scheduler without this patch, and this patch fixes this. (Luben)
Tested-by: Luben Tuikov <luben.tuikov@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
Link: https://lore.kernel.org/r/20231025171928.3318505-1-alexander.deucher@amd.com
Acked-by: Christian König <christian.koenig@amd.com >
Fixes: 56e449603f ("drm/sched: Convert the GPU scheduler to variable number of run-queues")
Signed-off-by: Luben Tuikov <ltuikov89@gmail.com >
2023-10-26 16:04:24 -04:00
Luben Tuikov
56e449603f
drm/sched: Convert the GPU scheduler to variable number of run-queues
...
The GPU scheduler has now a variable number of run-queues, which are set up at
drm_sched_init() time. This way, each driver announces how many run-queues it
requires (supports) per each GPU scheduler it creates. Note, that run-queues
correspond to scheduler "priorities", thus if the number of run-queues is set
to 1 at drm_sched_init(), then that scheduler supports a single run-queue,
i.e. single "priority". If a driver further sets a single entity per
run-queue, then this creates a 1-to-1 correspondence between a scheduler and
a scheduled entity.
Cc: Lucas Stach <l.stach@pengutronix.de >
Cc: Russell King <linux+etnaviv@armlinux.org.uk >
Cc: Qiang Yu <yuq825@gmail.com >
Cc: Rob Clark <robdclark@gmail.com >
Cc: Abhinav Kumar <quic_abhinavk@quicinc.com >
Cc: Dmitry Baryshkov <dmitry.baryshkov@linaro.org >
Cc: Danilo Krummrich <dakr@redhat.com >
Cc: Matthew Brost <matthew.brost@intel.com >
Cc: Boris Brezillon <boris.brezillon@collabora.com >
Cc: Alex Deucher <alexander.deucher@amd.com >
Cc: Christian König <christian.koenig@amd.com >
Cc: Emma Anholt <emma@anholt.net >
Cc: etnaviv@lists.freedesktop.org
Cc: lima@lists.freedesktop.org
Cc: linux-arm-msm@vger.kernel.org
Cc: freedreno@lists.freedesktop.org
Cc: nouveau@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org
Signed-off-by: Luben Tuikov <luben.tuikov@amd.com >
Acked-by: Christian König <christian.koenig@amd.com >
Link: https://lore.kernel.org/r/20231023032251.164775-1-luben.tuikov@amd.com
2023-10-26 12:03:47 -04:00
Mario Limonciello
64ffd2f1d0
drm/amd: Disable ASPM for VI w/ all Intel systems
...
Originally we were quirking ASPM disabled specifically for VI when
used with Alder Lake, but it appears to have problems with Rocket
Lake as well.
Like we've done in the case of dpm for newer platforms, disable
ASPM for all Intel systems.
Cc: stable@vger.kernel.org # 5.15+
Fixes: 0064b0ce85 ("drm/amd/pm: enable ASPM by default")
Reported-and-tested-by: Paolo Gentili <paolo.gentili@canonical.com >
Closes: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2036742
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com >
Reviewed-by: Alex Deucher <alexander.deucher@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2023-10-25 09:53:17 -04:00
Dave Airlie
0ecf4aa32b
Merge tag 'amd-drm-next-6.7-2023-10-20' of https://gitlab.freedesktop.org/agd5f/linux into drm-next
...
amd-drm-next-6.7-2023-10-20:
amdgpu:
- SMU 13 updates
- UMSCH updates
- DC MPO fixes
- RAS updates
- MES 11 fixes
- Fix possible memory leaks in error pathes
- GC 11.5 fixes
- Kernel doc updates
- PSP updates
- APU IMU fixes
- Misc code cleanups
- SMU 11 fixes
- OD fix
- Frame size warning fixes
- SR-IOV fixes
- NBIO 7.11 updates
- NBIO 7.7 updates
- XGMI fixes
- devcoredump updates
amdkfd:
- Misc code cleanups
- SVM fixes
Signed-off-by: Dave Airlie <airlied@redhat.com >
From: Alex Deucher <alexander.deucher@amd.com >
Link: https://patchwork.freedesktop.org/patch/msgid/20231020195043.4937-1-alexander.deucher@amd.com
2023-10-25 10:54:22 +10:00
Christian König
4984fc578a
drm/amdkfd: reserve a fence slot while locking the BO
...
Looks like the KFD still needs this.
Signed-off-by: Christian König <christian.koenig@amd.com >
Fixes: 8abc1eb298 ("drm/amdkfd: switch over to using drm_exec v3")
Acked-by: Alex Deucher <alexander.deucher@amd.com >
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com >
Link: https://patchwork.freedesktop.org/patch/msgid/20231020123306.43978-1-christian.koenig@amd.com
2023-10-23 14:48:47 +02:00
Dave Airlie
7cd62eab9b
BackMerge tag 'v6.6-rc7' into drm-next
...
This is needed to add the msm pr which is based on a higher base.
Signed-off-by: Dave Airlie <airlied@redhat.com >
2023-10-23 18:20:06 +10:00
Luben Tuikov
d3df66fd98
drm/amdgpu: Remove redundant call to priority_is_valid()
...
Remove a redundant call to amdgpu_ctx_priority_is_valid() from
amdgpu_ctx_priority_permit(), which is called from amdgpu_ctx_init() which is
called from amdgpu_ctx_alloc() which is called from amdgpu_ctx_ioctl(), where
we've called amdgpu_ctx_priority_is_valid() already first thing in the
function.
Cc: Alex Deucher <Alexander.Deucher@amd.com >
Cc: Christian König <christian.koenig@amd.com >
Signed-off-by: Luben Tuikov <luben.tuikov@amd.com >
Acked-by: Christian König <christian.koenig@amd.com >
Link: https://lore.kernel.org/r/20231018010359.30393-1-luben.tuikov@amd.com
2023-10-21 20:27:15 -04:00
André Almeida
de009982c6
drm/amdgpu: Create version number for coredumps
...
Even if there's nothing currently parsing amdgpu's coredump files, if
we eventually have such tools they will be glad to find a version field
to properly read the file.
Create a version number to be displayed on top of coredump file, to be
incremented when the file format or content get changed.
Signed-off-by: André Almeida <andrealmeid@igalia.com >
Reviewed-by: Shashank Sharma <shashank.sharma@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2023-10-20 15:11:29 -04:00
André Almeida
69619868d3
drm/amdgpu: Move coredump code to amdgpu_reset file
...
Giving that we use codedump just for device resets, move it's functions
and structs to a more semantic file, the amdgpu_reset.{c, h}.
Signed-off-by: André Almeida <andrealmeid@igalia.com >
Signed-off-by: Shashank Sharma <shashank.sharma@amd.com >
Reviewed-by: Shashank Sharma <shashank.sharma@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2023-10-20 15:11:29 -04:00
André Almeida
2d6a2a28cd
drm/amdgpu: Encapsulate all device reset info
...
To better organize struct amdgpu_device, keep all reset information
related fields together in a separated struct.
Signed-off-by: André Almeida <andrealmeid@igalia.com >
Signed-off-by: Shashank Sharma <shashank.sharma@amd.com >
Reviewed-by: Shashank Sharma <shashank.sharma@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2023-10-20 15:11:28 -04:00
Shiwu Zhang
723fac64d0
drm/amdgpu: support the port num info based on the capability flag
...
XGMI TA will set the capability flag to indicate whether the port_num
info is supported or not. KGD checks the flag and accordingly picks up
the right buffer format and send the right command to TA to retrieve
the info.
v2: simplify the code by reusing the same statement (lijo)
Signed-off-by: Shiwu Zhang <shiwu.zhang@amd.com >
Acked-by: Lijo Lazar <lijo.lazar@amd.com >
Reviewed-by: Le Ma <le.ma@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2023-10-20 15:11:28 -04:00
Shiwu Zhang
e8a5ded36b
drm/amdgpu: prepare the output buffer for GET_PEER_LINKS command
...
Per the xgmi ta implementation, KGD needs to fill in node_ids
in concern into the shared command output buffer rather than the
command input buffer.
Input buffer is not used for GET_PEER_LINKS command execution.
In this way, xgmi ta can reuse the node info in the output buffer
just filled in and populate the same buffer with link info directly.
Signed-off-by: Shiwu Zhang <shiwu.zhang@amd.com >
Reviewed-by: Le Ma <le.ma@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2023-10-20 15:11:28 -04:00
Tao Zhou
d9443ac4f9
drm/amdgpu: drop status query/reset for GCEA 9.4.3 and MMEA 1.8
...
PMFW will be responsible for them.
v2: remove query interfaces.
Signed-off-by: Tao Zhou <tao.zhou1@amd.com >
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2023-10-20 15:11:28 -04:00
Shiwu Zhang
626121fce4
drm/amdgpu: update the xgmi ta interface header
...
Update the header file to the v20.00.00.13
v1: rename TA_COMMAND_XGMI__GET_GET_TOPOLOGY_INFO to
TA_COMMAND_XGMI__GET_TOPOLOGY_INFO
And also rename struct ta_xgmi_cmd_get_peer_link_info_output to
ta_xgmi_cmd_get_peer_link_info accordingly
v2: add structs to support xgmi GET_EXTEND_PEER_LINK command
Signed-off-by: Shiwu Zhang <shiwu.zhang@amd.com >
Reviewed-by: Le Ma <le.ma@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2023-10-20 15:11:28 -04:00
Tao Zhou
8096df7664
drm/amdgpu: add set/get mca debug mode operations
...
Record the debug mode status in RAS.
Signed-off-by: Tao Zhou <tao.zhou1@amd.com >
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2023-10-20 15:11:28 -04:00
Tao Zhou
21226f02d7
drm/amdgpu: replace reset_error_count with amdgpu_ras_reset_error_count
...
Simplify the code.
Signed-off-by: Tao Zhou <tao.zhou1@amd.com >
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2023-10-20 15:11:28 -04:00
Li Ma
9d7a965e22
drm/amdgpu: add clockgating support for NBIO v7.7.1
...
add clockgating support for NBIO ip 7.7.1
Signed-off-by: Li Ma <li.ma@amd.com >
Reviewed-by: Tim Huang <Tim.Huang@amd.com >
Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com >
Signed-off-by: Alex Deucher <alexander.deucher@amd.com >
2023-10-20 15:11:28 -04:00