drm/amdgpu: rework how isolation is enforced v2

Limiting the number of available VMIDs to enforce isolation causes some
issues with gang submit and applying certain HW workarounds which
require multiple VMIDs to work correctly.

So instead start to track all submissions to the relevant engines in a
per partition data structure and use the dma_fences of the submissions
to enforce isolation similar to what a VMID limit does.
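
The tracking idea can be modelled in plain userspace C. This is only a sketch under stated assumptions, not the driver's code: `fake_fence`, `isolation_state`, `isolation_submit` and `MAX_TRACKED` are hypothetical names, and a boolean flag stands in for a real dma_fence. Each partition remembers the fences of the current owner. When the owner changes, those fences become the set that the new owner's jobs must wait for.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_TRACKED 8	/* hypothetical limit, chosen for the sketch */

/* Hypothetical stand-in for a dma_fence: only a signaled flag. */
struct fake_fence {
	bool signaled;
};

/* Hypothetical per-partition state; not the driver's actual layout. */
struct isolation_state {
	const void *owner;			/* owner of the active jobs */
	struct fake_fence *active[MAX_TRACKED];	/* jobs of the current owner */
	size_t num_active;
	struct fake_fence *prev[MAX_TRACKED];	/* jobs of the previous owner */
	size_t num_prev;
};

/*
 * Track a new submission and return how many fences it has to wait for,
 * or -1 when the tracking array is full.
 */
static int isolation_submit(struct isolation_state *s, const void *owner,
			    struct fake_fence *job)
{
	size_t i;
	int deps = 0;

	if (owner != s->owner) {
		/*
		 * Owner changed: the active set becomes the previous set.
		 * The old previous set can be dropped because every active
		 * job already had to wait for it.
		 */
		for (i = 0; i < s->num_active; ++i)
			s->prev[i] = s->active[i];
		s->num_prev = s->num_active;
		s->num_active = 0;
		s->owner = owner;
	}

	if (s->num_active == MAX_TRACKED)
		return -1;

	/* Wait for every unfinished job of the previous owner. */
	for (i = 0; i < s->num_prev; ++i)
		if (!s->prev[i]->signaled)
			++deps;

	s->active[s->num_active++] = job;
	return deps;
}

/* Two processes alternate on one partition. */
static void isolation_demo(void)
{
	struct isolation_state part = { 0 };
	struct fake_fence a1 = { false }, a2 = { false }, a3 = { false };
	struct fake_fence b1 = { false }, b2 = { false };
	int proc_a, proc_b;	/* only the addresses are used, as owner cookies */

	assert(isolation_submit(&part, &proc_a, &a1) == 0);
	assert(isolation_submit(&part, &proc_a, &a2) == 0);
	assert(isolation_submit(&part, &proc_b, &b1) == 2); /* waits for a1, a2 */
	a1.signaled = true;
	assert(isolation_submit(&part, &proc_b, &b2) == 1); /* a2 still pending */
	assert(isolation_submit(&part, &proc_a, &a3) == 2); /* waits for b1, b2 */
}
```

Jobs from the same owner never wait for each other in this model, so they can still run concurrently and use as many VMIDs as they need. Only a change of owner introduces a dependency.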

v2: use ~0l for jobs without isolation to distinguish them from kernel
    submissions, which use NULL for the owner. Add a warning when we
    are OOM.
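
A minimal sketch of the owner distinction described in the v2 note, assuming an LP64 platform such as Linux where `long` and pointers have the same width. The macro name `OWNER_NO_ISOLATION`, the enum and `classify_owner` are hypothetical. Only the values NULL and ~0l come from the commit message.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical name; the commit message only specifies the value ~0l. */
#define OWNER_NO_ISOLATION ((void *)~0l)

enum owner_kind {
	OWNER_KERNEL,		/* NULL owner: kernel submission */
	OWNER_UNISOLATED,	/* ~0l owner: job without isolation */
	OWNER_PROCESS,		/* any other pointer: a real owner */
};

static enum owner_kind classify_owner(const void *owner)
{
	if (!owner)
		return OWNER_KERNEL;
	if (owner == OWNER_NO_ISOLATION)
		return OWNER_UNISOLATED;
	return OWNER_PROCESS;
}

static void classify_demo(void)
{
	int process;	/* only the address is used, as an owner cookie */

	assert(classify_owner(NULL) == OWNER_KERNEL);
	assert(classify_owner(OWNER_NO_ISOLATION) == OWNER_UNISOLATED);
	assert(classify_owner(&process) == OWNER_PROCESS);
}
```

An all-ones pointer value is not a valid object address on common platforms, so it cannot collide with a real owner, and it stays distinct from NULL.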

Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Christian König
2025-01-15 13:44:26 +01:00
committed by Alex Deucher
parent 7f11c59e07
commit bd22e44ad4
6 changed files with 155 additions and 35 deletions

@@ -405,6 +405,25 @@ int amdgpu_sync_clone(struct amdgpu_sync *source, struct amdgpu_sync *clone)
	return 0;
}

/**
 * amdgpu_sync_move - move all fences from src to dst
 *
 * @src: source of the fences, empty after function
 * @dst: destination for the fences
 *
 * Moves all fences from source to destination. All fences in destination are
 * freed and source is empty after the function call.
 */
void amdgpu_sync_move(struct amdgpu_sync *src, struct amdgpu_sync *dst)
{
	unsigned int i;

	amdgpu_sync_free(dst);

	for (i = 0; i < HASH_SIZE(src->fences); ++i)
		hlist_move_list(&src->fences[i], &dst->fences[i]);
}

/**
 * amdgpu_sync_push_to_job - push fences into job
 * @sync: sync object to get the fences from