drm/amdgpu: Implement unrecoverable error message handling for VFs

The unrecoverable error notification may arrive in the VF mailbox while
the VF is polling for a response to another event.

This patch covers the following scenarios:

- If VF is already in RMA state, then do not attempt to contact the host.
  Host will ignore the VF after sending the notification.

- If the notification is detected during polling, then set the RMA status,
  and return error to caller.

- If the notification arrives by interrupt, then set the RMA status and
  queue a reset.  This reset will fail and VF will stop runtime services.
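
The three scenarios above can be sketched in standalone C as follows.
This is a minimal illustration, not the driver code: struct vf_state,
vf_send_request(), vf_poll_msg() and vf_handle_irq() are hypothetical
names, the choice of -EIO and -EAGAIN is an assumption, and only the two
event IDs are taken from the idh_event hunk in this patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Event IDs as they appear in the idh_event hunk of this patch. */
enum idh_event {
	IDH_RAS_BAD_PAGES_NOTIFICATION = 16,
	IDH_UNRECOV_ERR_NOTIFICATION = 17,
};

/* Hypothetical per-VF state; not the driver's real structure. */
struct vf_state {
	bool rma;          /* host has reported an unrecoverable error */
	bool reset_queued; /* a reset has been scheduled */
};

/* Scenario 1: a VF already in RMA state does not contact the host. */
static int vf_send_request(struct vf_state *vf)
{
	assert(vf != NULL);
	if (vf->rma)
		return -EIO; /* assumed error code */
	return 0;            /* the request would be sent here */
}

/*
 * Scenario 2: the notification shows up while polling for another event.
 * Set the RMA status and return an error to the caller.
 */
static int vf_poll_msg(struct vf_state *vf, int received, int expected)
{
	assert(vf != NULL);
	if (received == IDH_UNRECOV_ERR_NOTIFICATION) {
		vf->rma = true;
		return -EIO;
	}
	/* -EAGAIN is an assumed "keep polling" result. */
	return received == expected ? 0 : -EAGAIN;
}

/*
 * Scenario 3: the notification arrives by interrupt.
 * Set the RMA status and queue a reset.
 */
static void vf_handle_irq(struct vf_state *vf, int received)
{
	assert(vf != NULL);
	if (received == IDH_UNRECOV_ERR_NOTIFICATION) {
		vf->rma = true;
		vf->reset_queued = true;
	}
}
```

In this sketch, once vf_poll_msg() or vf_handle_irq() has set the RMA
flag, every later vf_send_request() fails without touching the mailbox.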

Reviewed-by: Shravan Kumar Gande <Shravankumar.Gande@amd.com>
Signed-off-by: Victor Skvortsov <victor.skvortsov@amd.com>
Signed-off-by: Ellen Pan <yunru.pan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Ellen Pan
2025-04-29 17:18:44 -04:00
committed by Alex Deucher
parent 6be34e1d1f
commit 086809c82c
5 changed files with 52 additions and 6 deletions

@@ -57,6 +57,7 @@ enum idh_event {
IDH_RAS_ERROR_DETECTED,
IDH_RAS_BAD_PAGES_READY = 15,
IDH_RAS_BAD_PAGES_NOTIFICATION = 16,
IDH_UNRECOV_ERR_NOTIFICATION = 17,
IDH_TEXT_MESSAGE = 255,
};