drm/amdgpu: add gang submit frontend v6

Allows submitting jobs as gang which needs to run on multiple engines at the
same time.

All members of the gang get the same implicit, explicit and VM dependencies. So
no gang member will start running until everything else is ready.

The last job is considered the gang leader (usually a submission to the GFX
ring) and used for signaling output dependencies.

Each job is remembered individually as user of a buffer object, so there is no
joining of work at the end.

v2: rebase and fix review comments from Andrey and Yogesh
v3: use READ instead of BOOKKEEP for now because of VM unmaps, set gang
    leader only when necessary
v4: fix order of pushing jobs and adding fences found by Trigger.
v5: fix job index calculation and adding IBs to jobs
v6: fix typo found by Alex

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This commit is contained in:
Christian König
2022-03-02 16:39:34 +01:00
committed by Alex Deucher
parent 68ce8b2422
commit 4624459c84
5 changed files with 195 additions and 99 deletions

View File

@@ -140,8 +140,10 @@ TRACE_EVENT(amdgpu_bo_create,
);
TRACE_EVENT(amdgpu_cs,
TP_PROTO(struct amdgpu_cs_parser *p, int i),
TP_ARGS(p, i),
TP_PROTO(struct amdgpu_cs_parser *p,
struct amdgpu_job *job,
struct amdgpu_ib *ib),
TP_ARGS(p, job, ib),
TP_STRUCT__entry(
__field(struct amdgpu_bo_list *, bo_list)
__field(u32, ring)
@@ -151,10 +153,10 @@ TRACE_EVENT(amdgpu_cs,
TP_fast_assign(
__entry->bo_list = p->bo_list;
__entry->ring = to_amdgpu_ring(p->entity->rq->sched)->idx;
__entry->dw = p->job->ibs[i].length_dw;
__entry->ring = to_amdgpu_ring(job->base.sched)->idx;
__entry->dw = ib->length_dw;
__entry->fences = amdgpu_fence_count_emitted(
to_amdgpu_ring(p->entity->rq->sched));
to_amdgpu_ring(job->base.sched));
),
TP_printk("bo_list=%p, ring=%u, dw=%u, fences=%u",
__entry->bo_list, __entry->ring, __entry->dw,