drm/amdkfd: support hot unplug/replug of amdgpu devices

This patch allows the kfd driver to function correctly when AMD GPU devices
are unplugged and replugged at run time.

When an AMD GPU device is unplugged, the kfd driver first stops all queues and
then gracefully terminates the existing kfd processes by sending SIGBUS to the
user processes. After that, user space can still use the remaining AMD GPU
devices. Once all AMD GPU devices in the system have been removed, the kfd
driver no longer responds to new requests.

Unplugged AMD GPU devices can be re-plugged. The kfd driver will then use the
added devices and function as usual.

The purpose of this patch is to make the kfd driver behave as expected during
and after run-time unplug/replug of AMD GPU devices.

Signed-off-by: Xiaogang Chen <Xiaogang.Chen@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This commit is contained in:
Xiaogang Chen
2026-01-13 20:45:14 -06:00
committed by Alex Deucher
parent d81e52fc61
commit 6cca686dfc
8 changed files with 156 additions and 2 deletions

@@ -3510,6 +3510,7 @@ static int amdgpu_device_ip_fini_early(struct amdgpu_device *adev)
amdgpu_device_set_cg_state(adev, AMD_CG_STATE_UNGATE);
amdgpu_amdkfd_suspend(adev, true);
amdgpu_amdkfd_teardown_processes(adev);
amdgpu_userq_suspend(adev);
/* Workaround for ASICs need to disable SMC first */