Tao Zhou | 2c684b9342 | 2024-02-29 20:35:14 -05:00
drm/amdgpu: add deferred error check for UMC v12 address query
Both RAS UE and deferred errors need page retirement.
Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

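A minimal C sketch of the rule in the commit above: a status word that reports either an uncorrectable error (UE) or a deferred error triggers page retirement. The bit positions are assumptions modeled on the AMD MCA_STATUS layout used by x86 machine-check banks. The real UMC MCA status fields may differ, and `needs_page_retirement` is a hypothetical name, not a driver function.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit positions (AMD MCA_STATUS style); not taken from the driver. */
#define MCA_STATUS_VAL      (1ULL << 63) /* register holds a valid error */
#define MCA_STATUS_UC       (1ULL << 61) /* uncorrected error */
#define MCA_STATUS_PCC      (1ULL << 57) /* processor context corrupt */
#define MCA_STATUS_TCC      (1ULL << 55) /* task context corrupt */
#define MCA_STATUS_DEFERRED (1ULL << 44) /* error deferred to the consumer */

/* A valid entry with any of UC, PCC or TCC set counts as a UE here. */
static bool is_uncorrectable(uint64_t status)
{
	return (status & MCA_STATUS_VAL) &&
	       (status & (MCA_STATUS_UC | MCA_STATUS_PCC | MCA_STATUS_TCC));
}

/* A valid entry with the Deferred bit set counts as a deferred error. */
static bool is_deferred(uint64_t status)
{
	return (status & MCA_STATUS_VAL) && (status & MCA_STATUS_DEFERRED);
}

/* Hypothetical helper: both UE and deferred errors need page retirement. */
bool needs_page_retirement(uint64_t status)
{
	return is_uncorrectable(status) || is_deferred(status);
}
```

An entry without the valid bit is ignored even if other bits are set, so stale register contents do not retire pages.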
Tao Zhou | 01087a1974 | 2024-01-31 14:05:19 -05:00
drm/amdgpu: use PSP address query command
Get UMC physical address from PSP in RAS error address conversion.
Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

YiPeng Chai | 0795b5d234 | 2024-01-22 17:13:25 -05:00
drm/amdgpu:Support retiring multiple MCA error address pages
Support retiring multiple MCA error address pages in
one in-band query for umc v12_0.
Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

YiPeng Chai | afb617f38f | 2024-01-22 17:13:25 -05:00
drm/amdgpu: add interface to check mca umc status
Add interface to check mca umc status.
Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

YiPeng Chai | 22f6e3e112 | 2024-01-22 17:13:25 -05:00
drm/amdgpu: Add log info for umc_v12_0
Add log info for umc_v12_0.
v2:
Delete redundant logs.
Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

Tao Zhou | a9e4f61df1 | 2024-01-18 15:47:24 -05:00
drm/amdgpu: update error condition check for umc_v12_0_query_error_address
Deferred error is also taken into account.
Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

Candice Li | 46e2231ce0 | 2024-01-15 18:35:37 -05:00
drm/amdgpu: Log deferred error separately
Separate deferred error from UE and CE and log it
individually.
Signed-off-by: Candice Li <candice.li@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

Yang Wang | f38765de83 | 2024-01-15 18:35:36 -05:00
drm/amdgpu: add umc v12.0 ACA support
Add umc v12.0 ACA driver support.
Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

YiPeng Chai | 99cab331a4 | 2023-12-19 14:59:03 -05:00
drm/amdgpu: Add umc page retirement for umc v12_0
Add umc page retirement for umc v12_0.
V2:
1. Changed umc page retirement check condition to call umc_v12_0_is_uncorrectable_error.
2. Use memset to clear the contents of the umc error address structure.
Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

YiPeng Chai | a8c77a121c | 2023-12-19 14:59:03 -05:00
drm/amdgpu: Add poison mode check error condition for umc v12_0
Add poison mode check error condition for umc v12_0.
Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

YiPeng Chai | 9f91e983ee | 2023-12-19 14:59:03 -05:00
drm/amdgpu: MCA supports recording umc address information
MCA supports recording umc address information.
V2:
Move err_addr variable from struct ras_err_node to
struct ras_err_info.
Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

Yang Wang | bf13da6ae1 | 2023-11-09 17:01:20 -05:00
drm/amdgpu: correct smu v13.0.6 umc ras error check
Correct smu v13.0.6 umc ras error check.
Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

Candice Li | e020d01575 | 2023-10-31 16:40:15 -04:00
drm/amdgpu: Drop deferred error in uncorrectable error check
Drop checking deferred error which can be handled by poison
consumption.
Signed-off-by: Candice Li <candice.li@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

Candice Li | d59fcfb084 | 2023-10-27 14:15:03 -04:00
drm/amdgpu: Identify data parity error corrected in replay mode
Use ErrorCodeExt field to identify data parity error in replay mode.
Signed-off-by: Candice Li <candice.li@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

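A minimal C sketch of how an ErrorCodeExt field could be used to recognize a data parity error that replay mode recovered, so it is logged as a correctable error rather than a UE. Several details are assumptions: ErrorCodeExt is taken to occupy bits [21:16] as in the AMD MCA_STATUS layout, a recovered error is assumed to be reported without the UC bit set, and `ERR_CODE_EXT_DATA_PARITY` is a placeholder value, not the code the hardware uses.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit positions (AMD MCA_STATUS style); not taken from the driver. */
#define MCA_STATUS_VAL (1ULL << 63) /* register holds a valid error */
#define MCA_STATUS_UC  (1ULL << 61) /* uncorrected error */

/* Assumed location of ErrorCodeExt: bits [21:16]. */
#define MCA_ERR_CODE_EXT_SHIFT 16
#define MCA_ERR_CODE_EXT_MASK  0x3fULL

/* Hypothetical placeholder for the "data parity" ErrorCodeExt value. */
#define ERR_CODE_EXT_DATA_PARITY 0x5

/* Extract the ErrorCodeExt field from a raw status word. */
static unsigned int mca_error_code_ext(uint64_t status)
{
	return (unsigned int)((status >> MCA_ERR_CODE_EXT_SHIFT) &
			      MCA_ERR_CODE_EXT_MASK);
}

/*
 * Hypothetical check: a valid data parity entry without UC set is treated
 * as recovered by replay and therefore counted as a correctable error.
 */
bool is_parity_error_corrected_by_replay(uint64_t status)
{
	return (status & MCA_STATUS_VAL) &&
	       !(status & MCA_STATUS_UC) &&
	       mca_error_code_ext(status) == ERR_CODE_EXT_DATA_PARITY;
}
```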
Candice Li | afcf949cf3 | 2023-10-20 15:11:26 -04:00
drm/amdgpu: Log UE corrected by replay as correctable error
Support replay mode where UE could be converted to CE.
Signed-off-by: Candice Li <candice.li@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

Yang Wang | 3bba4bc6a0 | 2023-10-13 11:36:11 -04:00
drm/amdgpu: add RAS error info support for umc_v12_0
Add RAS error info support for umc_v12_0.
Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

Tao Zhou | f8754f58d6 | 2023-09-20 16:25:17 -04:00
drm/amdgpu: print channel index for UMC bad page
Print channel index for UMC v12.
Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

Tao Zhou | ced575203a | 2023-09-11 17:15:15 -04:00
drm/amdgpu: print more address info of UMC bad page
Print out row, column and bank value of UMC error address for UMC v12.
Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

Tao Zhou | 3cb9ebc9d6 | 2023-09-11 17:10:58 -04:00
drm/amdgpu: add channel index table for UMC v12
Get UMC physical channel index according to node id, umc instance and
channel instance.
Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

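The commit above describes a lookup keyed by node id, UMC instance and channel instance. A minimal C sketch of that shape follows, using a three-dimensional constant table. The dimensions, the table contents and the function name `get_physical_channel_index` are all hypothetical placeholders, not the real UMC v12 topology.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical dimensions; the real UMC v12 topology may differ. */
#define NODE_INST_NUM    2
#define UMC_INST_NUM     2
#define CHANNEL_INST_NUM 2

/* Placeholder physical channel index per (node, UMC instance, channel instance). */
static const uint32_t channel_idx_tbl[NODE_INST_NUM][UMC_INST_NUM][CHANNEL_INST_NUM] = {
	{ { 0, 1 }, { 4, 5 } },
	{ { 2, 3 }, { 6, 7 } },
};

/* Look up the physical channel index; indices must be in range. */
uint32_t get_physical_channel_index(uint32_t node_id, uint32_t umc_inst,
				    uint32_t ch_inst)
{
	assert(node_id < NODE_INST_NUM);
	assert(umc_inst < UMC_INST_NUM);
	assert(ch_inst < CHANNEL_INST_NUM);
	return channel_idx_tbl[node_id][umc_inst][ch_inst];
}
```

A constant table keeps the logical-to-physical mapping explicit and makes the lookup O(1), at the cost of having to update the table if the board topology changes.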
Tao Zhou | 40a08fe890 | 2023-09-11 17:10:35 -04:00
drm/amdgpu: add address conversion for UMC v12
Convert MCA error address to physical address and find out all pages in
one physical row.
Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

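Finding "all pages in one physical row" can be pictured as follows: a DRAM row covers every column value, so every page that differs from the error address only in its column bits shares the row. The minimal C sketch below clears those bits and enumerates all combinations. It assumes 4 KiB pages and a hypothetical set of column-bit positions (`col_bits`), all above the page offset so that each combination yields a distinct page. The real UMC v12 address mapping is not shown here, and `pages_in_row` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12 /* 4 KiB pages (assumed) */

/* Hypothetical physical-address bits carrying DRAM column bits above the page offset. */
static const unsigned int col_bits[] = { 12, 13, 20 };
#define NUM_COL_BITS (sizeof(col_bits) / sizeof(col_bits[0]))

/*
 * Write the page frame number of every page sharing a row with err_addr
 * into pfns[]. Returns the number of entries written (at most max_pfns).
 */
size_t pages_in_row(uint64_t err_addr, uint64_t *pfns, size_t max_pfns)
{
	uint64_t base = err_addr;
	size_t n = 0;

	/* Clear the column bits so only the row/bank/channel part remains. */
	for (size_t i = 0; i < NUM_COL_BITS; i++)
		base &= ~(1ULL << col_bits[i]);

	/* Walk every combination of column bits; each gives one page in the row. */
	for (uint64_t combo = 0; combo < (1ULL << NUM_COL_BITS) && n < max_pfns; combo++) {
		uint64_t addr = base;

		for (size_t i = 0; i < NUM_COL_BITS; i++)
			if (combo & (1ULL << i))
				addr |= 1ULL << col_bits[i];
		pfns[n++] = addr >> PAGE_SHIFT;
	}
	return n;
}
```

With three assumed column bits, one error address expands to eight pages, which is why retiring a row can involve several pages from a single query.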
Candice Li | 7e6ec09974 | 2023-09-06 14:38:00 -04:00
drm/amdgpu: Add umc v12_0 ras functions
Add umc v12_0 ras error querying.
Signed-off-by: Candice Li <candice.li@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>