Commit Graph

57 Commits

Author SHA1 Message Date
Linus Torvalds
bf4afc53b7 Convert 'alloc_obj' family to use the new default GFP_KERNEL argument
This was done entirely with mindless brute force, using

    git grep -l '\<k[vmz]*alloc_objs*(.*, GFP_KERNEL)' |
        xargs sed -i 's/\(alloc_objs*(.*\), GFP_KERNEL)/\1)/'

to convert the new alloc_obj() users that had a simple GFP_KERNEL
argument to just drop that argument.

Note that due to the extreme simplicity of the scripting, any slightly
more complex cases spread over multiple lines would not be triggered:
they definitely exist, but this covers the vast bulk of the cases, and
the resulting diff is also then easier to check automatically.

For the same reason the 'flex' versions will be done as a separate
conversion.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-21 17:09:51 -08:00
Kees Cook
69050f8d6d treewide: Replace kmalloc with kmalloc_obj for non-scalar types
This is the result of running the Coccinelle script from
scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to
avoid scalar types (which need careful case-by-case checking), and
instead replace kmalloc-family calls that allocate struct or union
object instances:

Single allocations:	kmalloc(sizeof(TYPE), ...)
are replaced with:	kmalloc_obj(TYPE, ...)

Array allocations:	kmalloc_array(COUNT, sizeof(TYPE), ...)
are replaced with:	kmalloc_objs(TYPE, COUNT, ...)

Flex array allocations:	kmalloc(struct_size(PTR, FAM, COUNT), ...)
are replaced with:	kmalloc_flex(*PTR, FAM, COUNT, ...)

(where TYPE may also be *VAR)

The resulting allocations no longer return "void *", instead returning
"TYPE *".

Signed-off-by: Kees Cook <kees@kernel.org>
2026-02-21 01:02:28 -08:00
Andrew Martin
73463e26f7 drm/amdkfd: Disable MQD queue priority
This solves a priority inversion issue, caused by the language
runtime making high-priority queues wait for activity on
lower-priority queues.

Signed-off-by: Andrew Martin <andrew.martin@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-02-19 12:16:11 -05:00
Lang Yu
3aca6f835b drm/amdkfd: Adjust parameter of allocate_mqd
Make allocate_mqd consistent with other callbacks.
Prepare for next patch to use mqd_manager->mqd_size.

Signed-off-by: Lang Yu <lang.yu@amd.com>
Reviewed-by: David Belanger <david.belanger@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Mukul Joshi <mukul.joshi@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-01-29 12:26:58 -05:00
Mukul Joshi
0991a4c192 drm/amdkfd: Check preemption status on all XCDs
This patch adds the following functionality:
- Check the queue preemption status on all XCDs in a partition
  for GFX 9.4.3.
- Update the queue preemption debug message to print the queue
  doorbell id for which preemption failed.
- Change the signature of check preemption failed function to
  return a bool instead of uint32_t and pass the MQD manager
  as an argument.

Suggested-by: Jay Cornwall <jay.cornwall@amd.com>
Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-03-20 13:38:12 -04:00
Mukul Joshi
26d97182bb drm/amdkfd: Rename read_doorbell_id in MQD functions
Rename read_doorbell_id function to a more meaningful name,
implying what it is used for. No functional change.

Suggested-by: Jay Cornwall <jay.cornwall@amd.com>
Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2024-03-20 13:38:12 -04:00
Mukul Joshi
fc6efed2c7 drm/amdkfd: Update CU masking for GFX 9.4.3
The CU mask passed from user-space will change based on
different spatial partitioning mode. As a result, update
CU masking code for GFX9.4.3 to work for all partitioning
modes.

Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-09-11 18:17:27 -04:00
Alex Deucher
c99a2e7ae2 drm/amdkfd: drop IOMMUv2 support
Now that we use the dGPU path for all APUs, drop the
IOMMUv2 support.

v2: drop the now unused queue manager functions for gfx7/8 APUs

Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Tested-by: Mike Lothian <mike@fireburn.co.uk>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-08-11 14:47:25 -04:00
Jonathan Kim
69a8c3ae2d drm/amdkfd: apply trap workaround for gfx11
Due to a HW bug, waves in only half the shader arrays can enter trap.

When starting a debug session, relocate all waves to the first shader
array of each shader engine and mask off the 2nd shader array as
unavailable.

When ending a debug session, re-enable the 2nd shader array per
shader engine.

User CU masking per queue cannot be guaranteed to remain functional
if requested during debugging (e.g. user cu mask requests only 2nd shader
array as an available resource leading to zero HW resources available)
nor can runtime be alerted of any of these changes during execution.

Make user CU masking and debugging mutual exclusive with respect to
availability.

If the debugger tries to attach to a process with a user cu masked
queue, return the runtime status as enabled but busy.

If the debugger tries to attach and fails to reallocate queue waves to
the first shader array of each shader engine, return the runtime status
as enabled but with an error.

In addition, like any other mutli-process debug supported devices,
disable trap temporary setup per-process to avoid performance impact from
setup overhead.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 12:35:52 -04:00
Mukul Joshi
e2069a7b08 drm/amdkfd: Add XCC instance to kgd2kfd interface (v3)
Gfx 9 starts to have multiple XCC instances in one device. Add instance
parameter to kgd2kfd functions where XCC instance was hard coded as 0.
Also, update code to pass the correct instance number when running
on a multi-XCC setup.

v2: introduce the XCC instance to gfx v11 (Morris)
v3: rebase (Alex)

Signed-off-by: Amber Lin <Amber.Lin@amd.com>
Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Tested-by: Amber Lin <Amber.Lin@amd.com>
Signed-off-by: Morris Zhang <Shiwu.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 09:42:43 -04:00
Mukul Joshi
2f77b9a242 drm/amdkfd: Update MQD management on multi XCC setup
Update MQD management for both HIQ and user-mode compute
queues on a multi XCC setup. MQDs needs to be allocated,
initialized, loaded and destroyed for each XCC in the KFD
node.

v2: squash in fix "drm/amdkfd: Fix SDMA+HIQ HQD allocation on GFX9.4.3"

Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Signed-off-by: Amber Lin <Amber.Lin@amd.com>
Tested-by: Amber Lin <Amber.Lin@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 09:42:36 -04:00
Mukul Joshi
8dc1db3172 drm/amdkfd: Introduce kfd_node struct (v5)
Introduce a new structure, kfd_node, which will now represent
a compute node. kfd_node is carved out of kfd_dev structure.
kfd_dev struct now will become the parent of kfd_node, and will
store common resources such as doorbells, GTT sub-alloctor etc.
kfd_node struct will store all resources specific to a compute
node, such as device queue manager, interrupt handling etc.

This is the first step in adding compute partition support in KFD.

v2: introduce kfd_node struct to gc v11 (Hawking)
v3: make reference to kfd_dev struct through kfd_node (Morris)
v4: use kfd_node instead for kfd isr/mqd functions (Morris)
v5: rebase (Alex)

Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Tested-by: Amber Lin <Amber.Lin@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Morris Zhang <Shiwu.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-09 09:42:27 -04:00
Rajneesh Bhardwaj
d87f36a063 drm/amdkfd: update SPDX license header
Update the SPDX License header for all the KFD files.

Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-02-14 15:08:40 -05:00
Mukul Joshi
a439b890db drm/amdkfd: Consolidate MQD manager functions
A few MQD manager functions are duplicated for all versions of
MQD manager. Remove this duplication by moving the common
functions into kfd_mqd_manager.c file.

Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-02-09 16:57:51 -05:00
David Yat Sin
3a9822d7bd drm/amdkfd: CRIU checkpoint and restore queue control stack
Checkpoint contents of queue control stacks on CRIU dump and restore them
during CRIU restore.

Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: David Yat Sin <david.yatsin@amd.com>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-02-07 17:59:52 -05:00
David Yat Sin
42c6c48214 drm/amdkfd: CRIU checkpoint and restore queue mqds
Checkpoint contents of queue MQD's on CRIU dump and restore them during
CRIU restore.

Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: David Yat Sin <david.yatsin@amd.com>
Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2022-02-07 17:59:52 -05:00
Graham Sider
420185fdad drm/amdkfd: replace kgd_dev in hqd/mqd kfd2kgd funcs
Modified definitions:

- hqd_load
- hiq_mqd_load
- hqd_sdma_load
- hqd_dump
- hqd_sdma_dump
- hqd_is_occupied
- hqd_destroy
- hqd_sdma_is_occupied
- hqd_sdma_destroy

Signed-off-by: Graham Sider <Graham.Sider@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-11-17 16:58:01 -05:00
Lang Yu
7c695a2c54 drm/amdkfd: Remove cu mask from struct queue_properties(v2)
Actually, cu_mask has been copied to mqd memory and
does't have to persist in queue_properties. Remove it
from queue_properties.

And use struct mqd_update_info to store such properties,
then pass it to update queue operation.

v2:
* Rename pqm_update_queue to pqm_update_queue_properties.
* Rename struct queue_update_info to struct mqd_update_info.
* Rename pqm_set_cu_mask to pqm_update_mqd.

Suggested-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Lang Yu <lang.yu@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-10-28 14:26:13 -04:00
Lang Yu
c6e559eb3b drm/amdkfd: Add an optional argument into update queue operation(v2)
Currently, queue is updated with data in queue_properties.
And all allocated resource in queue_properties will not
be freed until the queue is destroyed.

But some properties(e.g., cu mask) bring some memory
management headaches(e.g., memory leak) and make code
complex. Actually they have been copied to mqd and
don't have to persist in queue_properties.

Add an argument into update queue to pass such properties,
then we can remove them from queue_properties.

v2: Don't use void *.

Suggested-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Lang Yu <lang.yu@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-10-28 14:26:13 -04:00
Oak Zeng
51a0f459f1 drm/amdkfd: Check HIQ's MQD for queue preemption status
MEC firmware can silently fail the queue preemption request
without time out. In this case, HIQ's MQD's queue_doorbell_id
will be set. Check this field to see whether last queue preemption
was successful or not.

Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Suggested-by: Jay Cornwall <Jay.Cornwall@amd.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2021-03-23 22:59:25 -04:00
Yong Zhao
7633c5e0bd drm/amdkfd: DIQ should not use HIQ way to allocate memory
In the mqd_diq_sdma buffer, there should be only one HIQ mqd. All DIQs
should be allocated somewhere else using the regular way.

Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
Reviewed-by: Oak Zeng <Oak.Zeng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-11-22 14:27:11 -05:00
Yong Zhao
d7c0b0477b drm/amdkfd: Delete KFD_MQD_TYPE_COMPUTE
It is the same as KFD_MQD_TYPE_CP, so delete it. As a result, we will
have one less mqd mananger per device.

Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-11-22 14:27:11 -05:00
Oak Zeng
8636e53c47 drm/amdkfd: Separate mqd allocation and initialization
Introduce a new mqd allocation interface and split the original
init_mqd function into two functions: allocate_mqd and init_mqd.
Also renamed uninit_mqd to free_mqd. This is preparation work to
fix a circular lock dependency.

Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-06-11 12:56:59 -05:00
Oak Zeng
0ccbc7cdf5 drm/amdkfd: CP queue priority controls
Translate queue priority into pipe priority and write to MQDs.
The priority values are used to perform queue and pipe arbitration.

Signed-off-by: Oak Zeng <Oak.Zeng@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-06-11 12:35:15 -05:00
Felix Kuehling
bb2d2128a5 drm/amdkfd: Simplify eviction state logic
Always mark evicted queues with q->properties.is_evicted = true, even
queues that are inactive for other reason. This simplifies maintaining
the eviction state as it doesn't require updating is_evicted when other
queue activation conditions change.

On the other hand, we now need to check those other queue activation
conditions whenever an evicted queues is restored. To minimize code
duplication, move the queue activation check into a macro so it can be
maintained in one central place.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Philip Cox <Philip.Cox@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-06-11 11:57:45 -05:00
Oak Zeng
0803e7a9e8 drm/amdkfd: Allocate hiq and sdma mqd from mqd trunk
Instead of allocat hiq and sdma mqd from sub-allocator, allocate
them from a mqd trunk pool. This is done for all asics

Signed-off-by: Oak Zeng <ozeng@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-05-24 12:21:02 -05:00
Oak Zeng
d1f8f0d17d drm/amdkfd: Move non-sdma mqd allocation out of init_mqd
This is preparation work to introduce more mqd allocation
scheme

Signed-off-by: Oak Zeng <ozeng@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-05-24 12:21:02 -05:00
Oak Zeng
6c6cde557a drm/amdkfd: Add mqd size in mqd manager struct
Also initialize mqd size on mqd manager initialization

Signed-off-by: Oak Zeng <ozeng@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-05-24 12:21:02 -05:00
Oak Zeng
59f650a06f drm/amdkfd: Introduce DIQ type mqd manager
With introduction of new mqd allocation scheme for HIQ,
DIQ and HIQ use different mqd allocation scheme, DIQ
can't reuse HIQ mqd manager

Signed-off-by: Oak Zeng <ozeng@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-05-24 12:21:02 -05:00
Kevin Wang
cac734c2db drm/amdkfd: use init_mqd function to allocate object for hid_mqd (CI)
if use the legacy method to allocate object, when mqd_hiq need to run
uninit code, it will be cause WARNING call trace.

eg: (s3 suspend test)
[   34.918944] Call Trace:
[   34.918948]  [<ffffffff92961dc1>] dump_stack+0x19/0x1b
[   34.918950]  [<ffffffff92297648>] __warn+0xd8/0x100
[   34.918951]  [<ffffffff9229778d>] warn_slowpath_null+0x1d/0x20
[   34.918991]  [<ffffffffc03ce1fe>] uninit_mqd_hiq_sdma+0x4e/0x50 [amdgpu]
[   34.919028]  [<ffffffffc03d0ef7>] uninitialize+0x37/0xe0 [amdgpu]
[   34.919064]  [<ffffffffc03d15a6>] kernel_queue_uninit+0x16/0x30 [amdgpu]
[   34.919086]  [<ffffffffc03d26c2>] pm_uninit+0x12/0x20 [amdgpu]
[   34.919107]  [<ffffffffc03d4915>] stop_nocpsch+0x15/0x20 [amdgpu]
[   34.919129]  [<ffffffffc03c1dce>] kgd2kfd_suspend.part.4+0x2e/0x50 [amdgpu]
[   34.919150]  [<ffffffffc03c2667>] kgd2kfd_suspend+0x17/0x20 [amdgpu]
[   34.919171]  [<ffffffffc03c103a>] amdgpu_amdkfd_suspend+0x1a/0x20 [amdgpu]
[   34.919187]  [<ffffffffc02ec428>] amdgpu_device_suspend+0x88/0x3a0 [amdgpu]
[   34.919189]  [<ffffffff922e22cf>] ? enqueue_entity+0x2ef/0xbe0
[   34.919205]  [<ffffffffc02e8220>] amdgpu_pmops_suspend+0x20/0x30 [amdgpu]
[   34.919207]  [<ffffffff925c56ff>] pci_pm_suspend+0x6f/0x150
[   34.919208]  [<ffffffff925c5690>] ? pci_pm_freeze+0xf0/0xf0
[   34.919210]  [<ffffffff926b45c6>] dpm_run_callback+0x46/0x90
[   34.919212]  [<ffffffff926b49db>] __device_suspend+0xfb/0x2a0
[   34.919213]  [<ffffffff926b4b9f>] async_suspend+0x1f/0xa0
[   34.919214]  [<ffffffff922c918f>] async_run_entry_fn+0x3f/0x130
[   34.919216]  [<ffffffff922b9d4f>] process_one_work+0x17f/0x440
[   34.919217]  [<ffffffff922bade6>] worker_thread+0x126/0x3c0
[   34.919218]  [<ffffffff922bacc0>] ? manage_workers.isra.25+0x2a0/0x2a0
[   34.919220]  [<ffffffff922c1c31>] kthread+0xd1/0xe0
[   34.919221]  [<ffffffff922c1b60>] ? insert_kthread_work+0x40/0x40
[   34.919222]  [<ffffffff92974c1d>] ret_from_fork_nospec_begin+0x7/0x21
[   34.919224]  [<ffffffff922c1b60>] ? insert_kthread_work+0x40/0x40
[   34.919224] ---[ end trace 38cd9f65c963adad ]---

Signed-off-by: Kevin Wang <kevin1.wang@amd.com>
Reviewed-by: Oak Zeng <Oak.Zeng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2019-02-27 22:19:07 -05:00
Felix Kuehling
39e7f33186 drm/amdkfd: Add CU-masking ioctl to KFD
CU-masking allows a KFD client to control the set of CUs used by a
user mode queue for executing compute dispatches. This can be used
for optimizing the partitioning of the GPU and minimize conflicts
between concurrent tasks.

Signed-off-by: Flora Cui <flora.cui@amd.com>
Signed-off-by: Kent Russell <kent.russell@amd.com>
Signed-off-by: Eric Huang <JinHuiEric.Huang@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2018-07-14 19:05:59 -04:00
Felix Kuehling
1cd106ecfc drm/amdkfd: Stop using GFP_NOIO explicitly
This is no longer needed with the memalloc_nofs_save/restore in
dqm_lock/unlock.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2018-07-11 22:32:45 -04:00
Felix Kuehling
ccb76b149e drm/amdkfd: Remove initialization of cp_hqd_ib_control on CIK
The initialization is not necessary. amd-kfd-staging and ROCm
releases have worked without it for two years.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2018-05-01 17:56:09 -04:00
Felix Kuehling
d1853f42b6 drm/amdkfd: GFP_NOIO while holding locks taken in MMU notifier
When an MMU notifier runs in memory reclaim context, it can deadlock
trying to take locks that are already held in the thread causing the
memory reclaim. The solution is to avoid memory reclaim while holding
locks that are taken in MMU notifiers by using GFP_NOIO.

This commit fixes memory allocations done while holding the dqm->lock
which is needed in the MMU notifier (dqm->ops.evict_process_queues).

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2018-03-23 15:32:31 -04:00
Felix Kuehling
26103436da drm/amdkfd: Implement KFD process eviction/restore
When the TTM memory manager in KGD evicts BOs, all user mode queues
potentially accessing these BOs must be evicted temporarily. Once
user mode queues are evicted, the eviction fence is signaled,
allowing the migration of the BO to proceed.

A delayed worker is scheduled to restore all the BOs belonging to
the evicted process and restart its queues.

During suspend/resume of the GPU we also evict all processes to allow
KGD to save BOs in system memory, since VRAM will be lost.

v2:
* Account for eviction when updating of q->is_active in MQD manager

Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2018-02-06 20:32:45 -05:00
Felix Kuehling
ee04955af6 drm/amdkfd: Add dGPU support to the MQD manager
On dGPUs don't set ATC addressing bits and use MTYPE_UC for coherent
memory.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2018-01-04 17:17:45 -05:00
Felix Kuehling
851a645efd drm/amdkfd: Add debugfs support to KFD
This commit adds several debugfs entries for kfd:

kfd/hqds: dumps all HQDs on all GPUs for KFD-controlled compute and
    SDMA RLC queues

kfd/mqds: dumps all MQDs of all KFD processes on all GPUs

kfd/rls: dumps HWS runlists on all GPUs

Signed-off-by: Yong Zhao <yong.zhao@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2017-11-27 18:29:49 -05:00
Felix Kuehling
115c8c4104 drm/amdkfd: Use order_base_2 to get log2 of buffes sizes
Replace (ffs(size) - 1) with order_base_2(size) as a more straight
forward way to get log2 of buffer sizes.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2017-11-06 14:52:28 -05:00
Felix Kuehling
6d56693025 drm/amdkfd: Hardware DWORD size is 4 bytes
Don't use sizeof(uint32_t) or similar types for hardware or firmware
DWORD size. The hardware and firmware don't care about Linux types.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2017-11-06 14:52:27 -05:00
Felix Kuehling
97b9ad12ba drm/amdkfd: Use ASIC-specific SDMA MQD type
Signed-off-by: shaoyun liu <shaoyun.liu@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2017-11-01 19:22:01 -04:00
Felix Kuehling
7ce66118aa drm/amd: Update kgd_kfd interface for resuming SDMA queues
Add wptr and mm parameters to hqd_sdma_load and pass these parameters
from device_queue_manager through the mqd_manager.

SDMA doesn't support polling while the engine believes it's idle. The
driver must update the wptr. The new parameters will be used for looking
up the updated value from the specified mm when SDMA queues are resumed
after being disabled.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2017-11-01 19:21:58 -04:00
shaoyunl
d12fb13f23 drm/amdkfd: Fix SDMA ring buffer size calculation
ffs function return the position of the first bit set on 1 based.
(bit zero returns 1).

Signed-off-by: shaoyun liu <shaoyun.liu@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2017-11-26 11:31:32 +02:00
Jay Cornwall
bba9662db7 drm/amdkfd: Disable CP/SDMA ring/doorbell in MQD
The MQD represents an inactive context and should not have ring or
doorbell enable bits set. Doing so interferes with HWS which streams
the MQD onto the HQD. If enable bits are set this activates the ring
or doorbell before the HQD is fully configured.

Signed-off-by: Jay Cornwall <Jay.Cornwall@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2017-11-01 19:21:27 -04:00
Felix Kuehling
70539bd795 drm/amd: Update MEC HQD loading code for KFD
Various bug fixes and improvements that accumulated over the last two
years.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2017-08-15 23:00:17 -04:00
Felix Kuehling
32fa821958 drm/amdkfd: Handle remaining BUG_ONs more gracefully v2
In most cases, BUG_ONs can be replaced with WARN_ON with an error
return. In some void functions just turn them into a WARN_ON and
possibly an early exit.

v2:
* Cleaned up error handling in pm_send_unmap_queue
* Removed redundant WARN_ON in kfd_process_destroy_delayed

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2017-08-15 23:00:12 -04:00
Felix Kuehling
4f52f2256e drm/amdkfd: Remove BUG_ONs for NULL pointer arguments
Remove BUG_ONs that check for NULL pointer arguments that are
dereferenced in the same function. Dereferencing the NULL pointer
will generate a BUG anyway, so the explicit check is redundant and
unnecessary overhead.

Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2017-08-15 23:00:09 -04:00
Kent Russell
dbf56ab11a drm/amdkfd: Remove usage of alloc(sizeof(struct...
See https://kernel.org/doc/html/latest/process/coding-style.html
under "14) Allocating Memory" for rationale behind removing the
x=alloc(sizeof(struct) style and using x=alloc(sizeof(*x) instead

Signed-off-by: Kent Russell <kent.russell@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2017-08-15 23:00:08 -04:00
Kent Russell
4eacc26b3b drm/amdkfd: Change x==NULL/false references to !x
Upstream prefers the !x notation to x==NULL or x==false. Along those lines
change the ==true or !=NULL references as well. Also make the references
to !x the same, excluding () for readability.

Signed-off-by: Kent Russell <kent.russell@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2017-08-15 23:00:06 -04:00
Kent Russell
79775b627d drm/amdkfd: Consolidate and clean up log commands
Consolidate log commands so that dev_info(NULL, "Error...") uses the more
accurate pr_err, remove the module name from the log (can be seen via
dynamic debugging with +m), and the function name (can be seen via
dynamic debugging with +f). We also don't need debug messages saying
what function we're in. Those can be added by devs when needed

Don't print vendor and device ID in error messages. They are typically
the same for all GPUs in a multi-GPU system. So this doesn't add any
value to the message.

Lastly, remove parentheses around %d, %i and 0x%llX.
According to kernel.org:
"Printing numbers in parentheses (%d) adds no value and should be
avoided."

Signed-off-by: Kent Russell <kent.russell@amd.com>
Signed-off-by: Yong Zhao <Yong.Zhao@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2017-08-15 23:00:05 -04:00
Kent Russell
8eabaf54cf drm/amdkfd: Clean up KFD style errors and warnings v2
Using checkpatch.pl -f <file> showed a number of style issues. This
patch addresses as many of them as possible. Some long lines have been
left for readability, but attempts to minimize them have been made.

v2: Broke long lines in gfx_v7 get_fw_version

Signed-off-by: Kent Russell <kent.russell@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2017-08-15 23:00:04 -04:00