Commit Graph

82 Commits

Author SHA1 Message Date
Yang Wang
a6571045cf drm/amdgpu: fix gpu idle power consumption issue for gfx v12
Older versions of the MES firmware may cause abnormal GPU power consumption.
When performing inference tasks on the GPU (e.g., with Ollama using ROCm),
the GPU may show abnormal power consumption in idle state and incorrect GPU load information.
This issue has been fixed in firmware version 0x8b and newer.

Closes: https://github.com/ROCm/ROCm/issues/5706
Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 4e22a5fe6e)
2026-03-06 17:11:46 -05:00
Mario Limonciello
6b0d812971 drm/amd: Disable MES LR compute W/A
A workaround was introduced in commit 1fb710793c ("drm/amdgpu: Enable
MES lr_compute_wa by default") to help with some hangs observed in gfx1151.

This WA didn't fully fix the issue.  It was actually fixed by adjusting
the VGPR size to the correct value that matched the hardware in commit
b42f3bf953 ("drm/amdkfd: bump minimum vgpr size for gfx1151").

There are reports of instability on other products with newer GC microcode
versions, and I believe they're caused by this workaround. As we don't
need the workaround any more, remove it.

Fixes: b42f3bf953 ("drm/amdkfd: bump minimum vgpr size for gfx1151")
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 9973e64bd6)
Cc: stable@vger.kernel.org
2026-02-25 17:58:06 -05:00
Mario Limonciello (AMD)
e291729873 drm/amd: Convert DRM_*() to drm_*()
The drm_*() macros include the device which is helpful for debugging
issues in multi-GPU systems.

Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-01-05 16:59:55 -05:00
Jack Xiao
d09c7e266c drm/amdgpu/mes: add multi-xcc support
a. extend mes pipe instances to num_xcc * max_mes_pipe
b. initialize mes schq/kiq pipes per xcc
c. submit mes packet to mes ring according to xcc_id

v2: rebase (Alex)

Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-12-08 13:56:29 -05:00
Jonathan Kim
72ea12f6be drm/amdgpu: update remove after reset flag for MES remove queue
Remove queue after reset flag is required to remove a queue that has
been successfully reset to clean up the MES' internal state.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-10-13 14:14:36 -04:00
Jonathan Kim
0ef930e1fa drm/amdgpu: fix hung reset queue array memory allocation
By design the MES will return an array result that is twice the number
of hung doorbells it can report.

i.e. if up k reported doorbells are supported, then the
second half of the array, also of length k, holds the HQD information
(type/queue/pipe) where queue 1 corresponds to index 0 and k,
queue 2 corresponds to index 1 and k + 1 etc ...

The driver will use the HDQ info to target queue/pipe reset for
hardware scheduled user compute queues.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-10-13 14:14:28 -04:00
Jonathan Kim
d0de79f66a drm/amdgpu: fix gfx12 mes packet status return check
GFX12 MES uses low 32 bits of status return for success (1 or 0)
and high bits for debug information if low bits are 0.

GFX11 MES doesn't do this so checking full 64-bit status return
for 1 or 0 is still valid.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
2025-10-13 14:14:16 -04:00
Mario Limonciello
1fb710793c drm/amdgpu: Enable MES lr_compute_wa by default
The MES set resources packet has an optional bit 'lr_compute_wa'
which can be used for preventing MES hangs on long compute jobs.

Set this bit by default.

Co-developed-by: Yifan Zhang <yifan1.zhang@amd.com>
Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-23 10:22:38 -04:00
Jesse.Zhang
724471254e drm/amdgpu/mes12: implement detect and reset callback
Implement support for the hung queue detect and reset
functionality.

v2: Always use AMDGPU_MES_SCHED_PIPE

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-05 17:38:38 -04:00
Jesse.Zhang
6abd725fdf drm/amd/amdgpu: Implement MES suspend/resume gang functionality for v12
This commit implements the actual MES (Micro Engine Scheduler) suspend
and resume gang operations for version 12 hardware. Previously these
functions were just stubs returning success.

v2: Always use AMDGPU_MES_SCHED_PIPE

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
2025-09-05 17:38:07 -04:00
Colin Ian King
4320fd9e0d drm/amd/amdgpu: Fix a less than zero check on a uint32_t struct field
Currently the error check from the call to mes_v12_inv_tlb_convert_hub_id
is always false because a uint32_t struct field hub_id is being used to
to perform the less than zero error check. Fix this by using the int
variable ret to perform the check.

Fixes: 87e6505261 ("drm/amd/amdgpu : Use the MES INV_TLBS API for tlb invalidation on gfx12")
Reviewed-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-05 15:59:11 -04:00
Shaoyun Liu
87e6505261 drm/amd/amdgpu : Use the MES INV_TLBS API for tlb invalidation on gfx12
From MES version 0x81, it provide the new API INV_TLBS that support
invalidate tlbs with PASID.

Signed-off-by: Shaoyun Liu <shaoyun.liu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-08-27 13:57:51 -04:00
Alex Deucher
0180e0a5dd drm/amdgpu/mes: add compatibility checks for set_hw_resource_1
Seems some older MES firmware versions do not properly support
this packet.  Add back some the compatibility checks.

v2: switch to fw version check (Shaoyun)

Fixes: f81cd79311 ("drm/amd/amdgpu: Fix MES init sequence")
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4295
Cc: Shaoyun Liu <shaoyun.liu@amd.com>
Reviewed-by: shaoyun.liu <shaoyun.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-06-24 09:54:33 -04:00
Alex Deucher
2e828a25f8 drm/amdgpu/mes: use correct MES pipe for resets
Use the KIQ pipe for kernel queues and the SCHED pipe for
user queues.

Fixes: 2408b0272b ("drm/amdgpu/mes: consolidate on a single mes reset callback")
Cc: Michael Chen <Michael.Chen@amd.com>
Cc: Shaoyun Liu <Shaoyun.Liu@amd.com>
Reviewed-by: Michael Chen <michael.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-30 18:16:14 -04:00
Alex Deucher
2408b0272b drm/amdgpu/mes: consolidate on a single mes reset callback
Use the legacy one as it covers both kernel queues and
user queues.

Reviewed-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-30 18:16:07 -04:00
Alex Deucher
6535348a3e drm/amdgpu/mes: remove more unused functions
These were leftover from mes bring up and are unused.

Reviewed-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-30 18:15:57 -04:00
Jesse.Zhang
ad7c088e31 drm/amdgpu: Fix API status offset for MES queue reset
The mes_v11_0_reset_hw_queue and mes_v12_0_reset_hw_queue functions were
using the wrong union type (MESAPI__REMOVE_QUEUE) when getting the offset
for api_status. Since these functions handle queue reset operations, they
should use MESAPI__RESET union instead.

This fixes the polling of API status during hardware queue reset operations
in the MES for both v11 and v12 versions.

Signed-off-by: Jesse Zhang <jesse.zhang@amd.com>
Reviewed-By: Shaoyun.liu <Shaoyun.liu@amd.com>
Reviewed-by: Prike Liang <Prike.Liang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-30 18:10:06 -04:00
Alex Deucher
a83be6e479 drm/amdgpu/mes12: add conversion for priority levels
Convert driver priority levels to MES11 priority levels.
At the moment they are the same, but they may not always
be.

Reviewed-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Jesse.Zhang <Jesse.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-21 10:56:36 -04:00
Alex Deucher
2e0454b730 drm/amdgpu: adjust enforce_isolation handling
Switch from a bool to an enum and allow more options
for enforce isolation.  There are now 3 modes of operation:
- Disabled (0)
- Enabled (serialization and cleaner shader) (1)
- Enabled in legacy mode (no serialization or cleaner shader) (2)
This provides better flexibility for more use cases.

Acked-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-11 16:58:15 -04:00
Alex Deucher
b86fd212f3 drm/amdgpu/mes12: use the device value for enforce isolation
Use the local setting rather than the global parameter.

Acked-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-11 16:58:12 -04:00
Alex Deucher
9e2bbba1d5 drm/amdgpu/mes: centralize gfx_hqd mask management
Move it to amdgpu_mes to align with the compute and
sdma hqd masks. No functional change.

v2: rebase on new changes
v3: misc optimizations

Reviewed-by: Prike Liang <Prike.Liang@amd.com>
Reviewed-by: Sunil Khatri<sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-08 16:48:21 -04:00
Arunpravin Paneer Selvam
ac4a1f7f13 drm/amdgpu: Remove the MES self test
Remove MES self test as this conflicts the userqueue fence
interrupts.

v2:(Christian)
  - remove the amdgpu_mes_self_test() function and any now unused code.

Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-08 16:48:16 -04:00
Arvind Yadav
9d3afcb7b9 drm/amdgpu: fix MES GFX mask
Current MES GFX mask prevents FW to enable oversubscription. This patch
does the following:
- Fixes the mask values and adds a description for the same
- Removes the central mask setup and makes it IP specific, as it would
  be different when the number of pipes and queues are different.

v2: squash in fix from Shashank

Cc: Christian König <Christian.Koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Shashank Sharma <shashank.sharma@amd.com>
Signed-off-by: Arvind Yadav <arvind.yadav@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-08 16:48:15 -04:00
Alex Deucher
9e7b08d239 drm/amdgpu/mes12: optimize MES pipe FW version fetching
Don't fetch it again if we already have it.  It seems the
registers don't reliably have the value at resume in some
cases.

Fixes: 785f0f9fe7 ("drm/amdgpu: Add mes v12_0 ip block support (v4)")
Reviewed-by: Shaoyun.liu <Shaoyun.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-04-08 16:48:13 -04:00
Shaoyun Liu
f81cd79311 drm/amd/amdgpu: Fix MES init sequence
When MES is been used , the set_hw_resource_1 API is required to
initialize MES internal context correctly

Signed-off-by: Shaoyun Liu <shaoyun.liu@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-03-13 23:16:08 -04:00
Alex Deucher
4343f814e5 drm/amdgpu/mes12: drop amdgpu_mes_suspend()/amdgpu_mes_resume() calls
They are noops on GFX12.  There is no suspend/resume all support
in firmware so the function doesn't do anything.  KFD already
handles its own queues and they should already be unmapped at this
point so even if this runs, it's not doing anything.

Reviewed-by: Shaoyun.liu <Shaoyun.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-27 15:52:43 -05:00
Alex Deucher
27b7915147 drm/amdgpu/mes: keep enforce isolation up to date
Re-send the mes message on resume to make sure the
mes state is up to date.

Fixes: 8521e3c5f0 ("drm/amd/amdgpu: limit single process inside MES")
Acked-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: Shaoyun Liu <shaoyun.liu@amd.com>
Cc: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-25 11:44:00 -05:00
Likun Gao
71209c9663 drm/amdgpu: correct the name of mes_pipe structure
Correct the structure name admgpu_mes_pipe to amdgpu_mes_pipe.

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Mukul Joshi <mukul.joshi@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-25 11:43:58 -05:00
Alex Deucher
f3e10e1a0c drm/amdgpu/mes12: allocate hw_resource_1 buffer once
Allocate the buffer at sw init time so we don't alloc
and free it for every suspend/resume or reset cycle.

Reviewed-by: Shaoyun.liu <shaouyun.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-17 14:09:29 -05:00
Alex Deucher
7845438718 drm/amdgpu/mes: Add cleaner shader fence address handling in MES for GFX12
This commit introduces enhancements to the handling of the cleaner
shader fence in the AMDGPU MES driver:

- The MES (Microcode Execution Scheduler) now sends a PM4 packet to the
  KIQ (Kernel Interface Queue) to request the cleaner shader, ensuring
  that requests are handled in a controlled manner and avoiding the
  race conditions.
- The CP (Compute Processor) firmware has been updated to use a private
  bus for accessing specific registers, avoiding unnecessary operations
  that could lead to issues in VF (Virtual Function) mode.
- The cleaner shader fence memory address is now set correctly in the
  `mes_set_hw_res_pkt` structure, allowing for proper synchronization of
  the cleaner shader execution.

Cc: Christian König <christian.koenig@amd.com>
Cc: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Suggested-by: Shaoyun Liu <shaoyun.liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-02-12 21:06:37 -05:00
Shaoyun Liu
335acfb64e drm/amd/amdgpu: Enable scratch data dump for mes 12
MES internal will check CP_MES_MSCRATCH_LO/HI register to set scratch
data location during ucode start, driver side need to start the MES
one by one with different setting for each pipe

Signed-off-by: Shaoyun Liu <shaoyun.liu@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-01-24 09:56:13 -05:00
Jesse.zhang@amd.com
a73a83241e drm/amdgpu/mes12: Implement reset gfx/compute queue function by mmio
Reset gfx/compute queue through mmio based on me_id and queue_id.

Signed-off-by: Jesse Zhang <jesse.zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-12-11 17:32:13 -05:00
Jesse.zhang@amd.com
0f8666138f drm/amdgpu/mes12: Implement reset sdmav7 queue function by mmio
Reset sdma queue through mmio based on me_id and queue_id.

Signed-off-by: Jesse Zhang <jesse.zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-12-11 17:32:06 -05:00
Prike Liang
66f4f7d5aa drm/amdgpu: reduce the mmio writes in kiq setting
There's no need to perform the two MMIO writes in the KIQ
Setting registers programmed period, and reducing the MMIO
writes will save the driver loading time.

Signed-off-by: Prike Liang <Prike.Liang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-12-10 10:26:45 -05:00
Shaoyun Liu
8521e3c5f0 drm/amd/amdgpu: limit single process inside MES
This is for MES to limit only one process for the user queues

Signed-off-by: Shaoyun Liu <shaoyun.liu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-12 17:02:04 -05:00
Jack Xiao
cfe98204a0 drm/amdgpu/mes12: correct kiq unmap latency
Correct kiq unmap queue timeout value.

Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-11-11 12:22:58 -05:00
Michael Chen
144df260f3 drm/amdgpu/mes: fix issue of writing to the same log buffer from 2 MES pipes
With Unified MES enabled in gfx12, need separate event log buffer for the
2 MES pipes to avoid data overwrite.

Signed-off-by: Michael Chen <michael.chen@amd.com>
Reviewed-by: Jack Xiao <Jack.Xiao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-15 11:21:08 -04:00
Sunil Khatri
692d2cd180 drm/amdgpu: update the handle ptr in hw_fini
Update the *handle to amdgpu_ip_block ptr for all
functions pointers of hw_fini.

Also update the ip_block ptr where ever needed as
there were cyclic dependency of hw_fini on suspend
and some followed clean up.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-07 14:03:25 -04:00
Sunil Khatri
58608034ed drm/amdgpu: update the handle ptr in hw_init
Update the *handle to amdgpu_ip_block ptr for all
functions pointers of hw_init.

Also update the ip_block ptr where ever needed as
there were cyclic dependency of hw_init on resume.

v2: squash in isp fix

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-07 14:03:25 -04:00
Sunil Khatri
7feb4f3ad8 drm/amdgpu: update the handle ptr in resume
Update the *handle to amdgpu_ip_block ptr for all
functions pointers of resume.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-07 14:02:50 -04:00
Sunil Khatri
982d7f9bfe drm/amdgpu: update the handle ptr in suspend
Update the *handle to amdgpu_ip_block ptr for all
functions pointers of suspend.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-07 14:02:45 -04:00
Sunil Khatri
36aa9ab9c0 drm/amdgpu: update the handle ptr in sw_fini
update the *handle to amdgpu_ip_block ptr for all
functions pointers of sw_fini.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-01 17:40:43 -04:00
Sunil Khatri
d5347e8d27 drm/amdgpu: update the handle ptr in sw_init
update the *handle to amdgpu_ip_block ptr for all
functions pointers of sw_init.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-01 17:40:37 -04:00
Sunil Khatri
3138ab2c5b drm/amdgpu: update the handle ptr in late_init
Update the ptr handle to amdgpu_ip_block ptr in all
the functions of late_init function ptr.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-01 17:40:31 -04:00
Sunil Khatri
146b085ead drm/amdgpu: update the handle ptr in early_init
update the handle ptr to amdgpu_ip_block ptr
for all functions pointers on early_init.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-10-01 17:40:22 -04:00
Jack Xiao
4771d2ecb7 drm/amdgpu/mes12: set enable_level_process_quantum_check
enable_level_process_quantum_check is requried to enable process
quantum based scheduling.

Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 6.11.x
2024-09-25 12:55:14 -04:00
Alex Deucher
84f76408ab drm/amdgpu/mes12: reduce timeout
The firmware timeout is 2s.  Reduce the driver timeout to
2.1 seconds to avoid back pressure on queue submissions.

Fixes: 94b51a3d01 ("drm/amdgpu/mes12: increase mes submission timeout")
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 6.11.x
2024-09-18 16:15:13 -04:00
Jack Xiao
3c75518cf2 drm/amdgpu/mes12: switch SET_SHADER_DEBUGGER pkt to mes schq pipe
The SET_SHADER_DEBUGGER packet must work with the added
hardware queue, switch the packet submitting to mes schq pipe.

Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org # 6.11.x
2024-09-18 16:14:27 -04:00
Jack Xiao
52491d97aa drm/amdgpu/mes: add mes mapping legacy queue switch
For mes11 old firmware has issue to map legacy queue,
add a flag to switch mes to map legacy queue.

Fixes: f9d8c5c785 ("drm/amdgpu/gfx: enable mes to map legacy queue support")
Reported-by: Andrew Worsley <amworsley@gmail.com>
Link: https://lists.freedesktop.org/archives/amd-gfx/2024-August/112773.html
Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-29 13:41:02 -04:00
Alex Deucher
32aada4d0a drm/amdgpu/mes12: add API for user queue reset
Add API for resetting user queues.

Acked-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-08-16 14:25:09 -04:00