Mirror of https://github.com/torvalds/linux.git (synced 2026-04-19 07:13:56 -04:00)

drm/amdgpu: break driver init process when it's bad GPU(v5)

When a bad GPU tag is retrieved from the EEPROM, GPU init should
fail, as the GPU needs to be retired for further checks.

v2: Fix spelling typo, correct the condition to detect
    bad gpu tag and refine error message.
v3: Refine function argument name.
v4: Fix missing check of the return value in the i2c
    initialization error case.
v5: Use dev_err to print PCI information in dmesg instead
    of DRM_ERROR.

Signed-off-by: Guchun Chen <guchun.chen@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

Committed by: Alex Deucher
Parent: 1d6a9d122d
Commit: b82e65a935

@@ -2055,13 +2055,19 @@ static int amdgpu_device_ip_init(struct amdgpu_device *adev)
 	 * it should be called after amdgpu_device_ip_hw_init_phase2 since
 	 * for some ASICs the RAS EEPROM code relies on SMU fully functioning
 	 * for I2C communication which only true at this point.
-	 * recovery_init may fail, but it can free all resources allocated by
-	 * itself and its failure should not stop amdgpu init process.
+	 *
+	 * amdgpu_ras_recovery_init may fail, but the upper only cares the
+	 * failure from bad gpu situation and stop amdgpu init process
+	 * accordingly. For other failed cases, it will still release all
+	 * the resource and print error message, rather than returning one
+	 * negative value to upper level.
 	 *
 	 * Note: theoretically, this should be called before all vram allocations
 	 * to protect retired page from abusing
 	 */
-	amdgpu_ras_recovery_init(adev);
+	r = amdgpu_ras_recovery_init(adev);
+	if (r)
+		goto init_failed;
 
 	if (adev->gmc.xgmi.num_physical_nodes > 1)
 		amdgpu_xgmi_add_device(adev);
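
The error-handling contract described in the new comment can be illustrated with a small user-space sketch. This is not the driver code: `struct fake_device`, `fake_ras_recovery_init` and `fake_device_ip_init` are hypothetical stand-ins, and the `-EINVAL` return value is an assumption made for illustration. It only shows the shape of the change, where the callee reports the bad GPU case alone and the caller stops init on any nonzero result.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for struct amdgpu_device; not the kernel type. */
struct fake_device {
	bool bad_gpu_tag;     /* EEPROM reports the bad GPU tag */
	bool eeprom_io_error; /* some other, non-fatal recovery failure */
	bool xgmi_added;      /* set once the step after recovery init runs */
};

/*
 * Models the contract from the comment in the diff: only the bad GPU
 * case is reported to the caller; other failures are logged and
 * swallowed. The -EINVAL value is an assumption made for this sketch.
 */
static int fake_ras_recovery_init(struct fake_device *dev)
{
	assert(dev != NULL);

	if (dev->bad_gpu_tag) {
		fprintf(stderr, "bad GPU tag found, aborting init\n");
		return -EINVAL;
	}

	if (dev->eeprom_io_error) {
		/* Release local resources, report the error, and carry on. */
		fprintf(stderr, "recovery init failed, continuing\n");
		return 0;
	}

	return 0;
}

/* Models the caller: any nonzero result stops the init sequence. */
static int fake_device_ip_init(struct fake_device *dev)
{
	int r;

	r = fake_ras_recovery_init(dev);
	if (r)
		goto init_failed;

	dev->xgmi_added = true; /* later init steps run only on success */

init_failed:
	return r;
}
```

In this sketch, a device with `bad_gpu_tag` set makes `fake_device_ip_init` return a negative value without reaching the later step, while a device with only `eeprom_io_error` set still completes init. That mirrors the intent stated in the commit message, under the assumptions listed above.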