drm/amdgpu: Fix double deletion of validate_list

If amdgpu_amdkfd_gpuvm_free_memory_of_gpu() fails after kgd_mem is
removed from validate_list, the mem handle still lingers in the KFD idr.
This means when process is terminated,
kfd_process_free_outstanding_kfd_bos() will call
amdgpu_amdkfd_gpuvm_free_memory_of_gpu() again resulting in double
deletion.

To avoid this -
 (a) Check if list is empty before deleting it
 (b) Rearragne amdgpu_amdkfd_gpuvm_free_memory_of_gpu() such that it can
     be safely called again if it returns failure the first time.

Signed-off-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Reviewed-by: Philip Yang <Philip.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This commit is contained in:
Harish Kasiviswanathan
2026-01-09 15:26:36 -05:00
committed by Alex Deucher
parent 08b1eb621c
commit 6ba60345f4

View File

@@ -1924,21 +1924,21 @@ int amdgpu_amdkfd_gpuvm_free_memory_of_gpu(
/* Make sure restore workers don't access the BO any more */
mutex_lock(&process_info->lock);
list_del(&mem->validate_list);
if (!list_empty(&mem->validate_list))
list_del_init(&mem->validate_list);
mutex_unlock(&process_info->lock);
/* Cleanup user pages and MMU notifiers */
if (amdgpu_ttm_tt_get_usermm(mem->bo->tbo.ttm)) {
amdgpu_hmm_unregister(mem->bo);
mutex_lock(&process_info->notifier_lock);
amdgpu_hmm_range_free(mem->range);
mutex_unlock(&process_info->notifier_lock);
}
ret = reserve_bo_and_cond_vms(mem, NULL, BO_VM_ALL, &ctx);
if (unlikely(ret))
return ret;
/* Cleanup user pages and MMU notifiers */
if (amdgpu_ttm_tt_get_usermm(mem->bo->tbo.ttm)) {
amdgpu_hmm_unregister(mem->bo);
amdgpu_hmm_range_free(mem->range);
mem->range = NULL;
}
amdgpu_amdkfd_remove_eviction_fence(mem->bo,
process_info->eviction_fence);
pr_debug("Release VA 0x%llx - 0x%llx\n", mem->va,