Commit Graph

38 Commits

Author SHA1 Message Date
Linus Torvalds
bf4afc53b7 Convert 'alloc_obj' family to use the new default GFP_KERNEL argument
This was done entirely with mindless brute force, using

    git grep -l '\<k[vmz]*alloc_objs*(.*, GFP_KERNEL)' |
        xargs sed -i 's/\(alloc_objs*(.*\), GFP_KERNEL)/\1)/'

to convert the new alloc_obj() users that had a simple GFP_KERNEL
argument to just drop that argument.

Note that due to the extreme simplicity of the scripting, any slightly
more complex cases spread over multiple lines would not be triggered:
they definitely exist, but this covers the vast bulk of the cases, and
the resulting diff is also then easier to check automatically.

For the same reason the 'flex' versions will be done as a separate
conversion.
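
A minimal user-space sketch of how an allocation macro can make its GFP argument optional, so that both alloc_obj(*p) and alloc_obj(*p, GFP_ATOMIC) compile. This is not the kernel's definition. gfp_t, the flag values, mock_kmalloc(), last_gfp and mock_alloc_obj() are hypothetical stand-ins, and the default-argument trick relies on the GNU ", ##__VA_ARGS__" extension, which GCC and Clang support.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical user-space stand-ins for the kernel's gfp_t flags. */
typedef unsigned int gfp_t;
#define GFP_KERNEL ((gfp_t)0x1u)
#define GFP_ATOMIC ((gfp_t)0x2u)

/* Remembers the flags of the most recent allocation so callers can inspect them. */
static gfp_t last_gfp;

static void *mock_kmalloc(size_t size, gfp_t flags)
{
	last_gfp = flags;
	return malloc(size);
}

/* Expands to its second argument: the caller's flags, or the default. */
#define PICK_GFP(unused, flags, ...) flags

/*
 * Hypothetical alloc_obj(): allocate one object of the type of VAR (or of
 * a named type) and return a typed pointer. With no flags, the GNU
 * ", ##__VA_ARGS__" extension drops the comma, so GFP_KERNEL becomes the
 * second argument of PICK_GFP(); with flags, the caller's value wins.
 */
#define mock_alloc_obj(VAR, ...) \
	((__typeof__(VAR) *)mock_kmalloc(sizeof(VAR), \
		PICK_GFP(_, ##__VA_ARGS__, GFP_KERNEL, _)))
```

In this sketch, omitting the flags yields GFP_KERNEL, which is why dropping an explicit GFP_KERNEL argument, as the sed script does, leaves the behavior of converted call sites unchanged.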

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-21 17:09:51 -08:00
Kees Cook
69050f8d6d treewide: Replace kmalloc with kmalloc_obj for non-scalar types
This is the result of running the Coccinelle script from
scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to
avoid scalar types (which need careful case-by-case checking), and
instead replace kmalloc-family calls that allocate struct or union
object instances:

Single allocations:	kmalloc(sizeof(TYPE), ...)
are replaced with:	kmalloc_obj(TYPE, ...)

Array allocations:	kmalloc_array(COUNT, sizeof(TYPE), ...)
are replaced with:	kmalloc_objs(TYPE, COUNT, ...)

Flex array allocations:	kmalloc(struct_size(PTR, FAM, COUNT), ...)
are replaced with:	kmalloc_flex(*PTR, FAM, COUNT, ...)

(where TYPE may also be *VAR)

The resulting allocations no longer return "void *", instead returning
"TYPE *".
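
A hedged user-space sketch of the three forms listed above, showing why the result is typed: each macro casts the allocation to a pointer to the object's type instead of returning void *. The mock_* macros and the mock_flex_size() helper are hypothetical; the kernel's real macros use its own allocator and saturating size helpers, which this only loosely imitates.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * base + count * elem, saturating to SIZE_MAX on overflow so that the
 * following malloc() fails instead of returning a too-small buffer.
 */
static size_t mock_flex_size(size_t base, size_t count, size_t elem)
{
	if (elem != 0 && count > (SIZE_MAX - base) / elem)
		return SIZE_MAX;
	return base + count * elem;
}

/* kmalloc(sizeof(TYPE), ...) -> kmalloc_obj(TYPE, ...) */
#define mock_kmalloc_obj(TYPE) \
	((__typeof__(TYPE) *)malloc(sizeof(TYPE)))

/* kmalloc_array(COUNT, sizeof(TYPE), ...) -> kmalloc_objs(TYPE, COUNT, ...) */
#define mock_kmalloc_objs(TYPE, COUNT) \
	((__typeof__(TYPE) *)malloc(mock_flex_size(0, (COUNT), sizeof(TYPE))))

/* kmalloc(struct_size(PTR, FAM, COUNT), ...) -> kmalloc_flex(*PTR, FAM, COUNT, ...) */
#define mock_kmalloc_flex(TYPE, FAM, COUNT) \
	((__typeof__(TYPE) *)malloc(mock_flex_size(sizeof(TYPE), (COUNT), \
		sizeof(((__typeof__(TYPE) *)0)->FAM[0]))))
```

Because the result is cast to the object's pointer type, assigning it to a pointer of a different type draws an incompatible-pointer-types diagnostic, which a void * return would silently accept.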

Signed-off-by: Kees Cook <kees@kernel.org>
2026-02-21 01:02:28 -08:00
Stanley.Yang
c40c94693c drm/amd/ras: statistic xgmi training error count
Report the uncorrectable error count for xgmi training errors.

Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-02-05 17:17:48 -05:00
Jinzhou Su
0099f2e92c drm/amd/ras: Replace NPS flags in ras module
Replace AMDGPU_NPS8_PARTITION_MODE with
UMC_MEMORY_PARTITION_MODE_NPS8 so that the sriov
build compiles.

Signed-off-by: Jinzhou Su <jinzhou.su@amd.com>
Reviewed-by: YiPeng Chai <YiPeng.Chai@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-01-05 17:00:00 -05:00
Jinzhou Su
d3336c935e drm/amd/ras: Support physical address convert
Support converting physical addresses to pages
under the current NPS mode in uniras.

Signed-off-by: Jinzhou Su <jinzhou.su@amd.com>
Reviewed-by: YiPeng Chai <YiPeng.Chai@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2026-01-05 16:26:12 -05:00
Candice Li
90254524ee drm/amd/ras: Add vram_type to ras_ta_init_flags
Add vram_type to ras_ta_init_flags.

Signed-off-by: Candice Li <candice.li@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-12-10 17:38:13 -05:00
Srinivasan Shanmugam
8b971ce0cb drm/amd/ras: Reduce stack usage in ras_umc_handle_bad_pages()
The ras_umc_handle_bad_pages() function used a large local array:
  struct eeprom_umc_record records[MAX_ECC_NUM_PER_RETIREMENT];

Move this array off the stack by allocating it with kcalloc()
and freeing it before returning.

This reduces the stack frame size of ras_umc_handle_bad_pages()
and avoids the frame size warning.
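
The pattern can be sketched in user-space C, with calloc()/free() standing in for the kernel's kcalloc()/kfree(). The record layout, the MAX_ECC_NUM_PER_RETIREMENT value, get_new_records() and handle_bad_pages() below are hypothetical placeholders, not the definitions from ras_umc.c.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical size and layout; the real ones live in the amdgpu ras code. */
#define MAX_ECC_NUM_PER_RETIREMENT 32

struct eeprom_umc_record {
	uint64_t address;
	uint64_t retired_page;
	uint32_t ts;
};

/* Hypothetical producer: writes up to max records and returns the count. */
static int get_new_records(struct eeprom_umc_record *recs, int max)
{
	int n = max < 4 ? max : 4;

	for (int i = 0; i < n; i++)
		recs[i].retired_page = 0x1000u + (uint64_t)i;
	return n;
}

/* The array is heap-allocated, so it no longer counts toward the stack frame. */
static int handle_bad_pages(void)
{
	struct eeprom_umc_record *records;
	int count;

	records = calloc(MAX_ECC_NUM_PER_RETIREMENT, sizeof(*records));
	if (!records)
		return -ENOMEM;

	count = get_new_records(records, MAX_ECC_NUM_PER_RETIREMENT);
	/* ... retire the pages described by records[0..count-1] ... */

	free(records);  /* freed before returning */
	return count;
}
```

In this sketch, the cost is one allocation per call and a new -ENOMEM failure path, in exchange for a stack frame that stays under the -Wframe-larger-than limit.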

Fixes the below:
drivers/gpu/drm/amd/amdgpu/../ras/rascore/ras_umc.c:498:5: warning: stack frame size (1208) exceeds limit (1024) in 'ras_umc_handle_bad_pages' [-Wframe-larger-than]

v2: Removed the duplicate ras_umc_get_new_records() invocation. (Lijo)

Cc: Tao Zhou <tao.zhou1@amd.com>
Cc: Hawking Zhang <Hawking.Zhang@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-12-08 14:25:37 -05:00
YiPeng Chai
11dcf72eb5 drm/amd/ras: Support high-frequency querying sriov ras block error count
Support high-frequency querying of the sriov ras block error count:
1. Create shared memory and fill it with the
   RAS_CMD__GET_ALL_BLOCK_ECC_STATUS ras command.
2. Register the RAS_CMD__GET_ALL_BLOCK_ECC_STATUS command and the shared
   memory with the sriov host's ras auto-update list
   via the RAS_CMD_SET_CMD_AUTO_UPDATE command.
3. Once the sriov host detects a ras error, it automatically executes
   the RAS_CMD__GET_ALL_BLOCK_ECC_STATUS command and writes the result to
   the shared memory.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-12-08 13:56:33 -05:00
YiPeng Chai
d95ca7f515 drm/amdgpu: suspend ras module before gpu reset
During a gpu reset, all GPU-related resources are
inaccessible. To avoid affecting ras functionality,
suspend the ras module before the gpu reset and resume
it after the reset is complete.

V2:
  Rename functions to avoid misunderstanding.

V3:
  Move flush_delayed_work to amdgpu_ras_process_pause,
  Move schedule_delayed_work to amdgpu_ras_process_unpause.

V4:
  Rename functions.

V5:
  Move the function to amdgpu_ras.c.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Acked-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04 11:53:59 -05:00
YiPeng Chai
3f16007d86 drm/amd/ras: Add ras support for umc v12_5_0
Add ras support for umc v12_5_0.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04 11:53:22 -05:00
YiPeng Chai
d7f105a402 drm/amd/ras: Add ras support for nbio v7_9_1
Add ras support for nbio v7_9_1.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04 11:53:22 -05:00
Xiang Liu
7cf422ed33 drm/amd/ras: Fix format truncation
../ras/rascore/ras_cper.c: In function ‘cper_generate_fatal_record.isra’:
../ras/rascore/ras_cper.c:75:36: error: ‘%llX’ directive output may be truncated writing between 1 and 14 bytes into a region of size between 0 and 7 [-Werror=format-truncation=]
   75 |         snprintf(record_id, 9, "%d:%llX", dev_info.socket_id,
      |                                    ^~~~
../ras/rascore/ras_cper.c:75:32: note: directive argument in the range [0, 72057594037927935]
   75 |         snprintf(record_id, 9, "%d:%llX", dev_info.socket_id,
      |                                ^~~~~~~~~
../ras/rascore/ras_cper.c:75:9: note: ‘snprintf’ output between 4 and 27 bytes into a destination of size 9
   75 |         snprintf(record_id, 9, "%d:%llX", dev_info.socket_id,
      |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   76 |                     RAS_LOG_SEQNO_TO_BATCH_IDX(trace->seqno));
      |                     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../ras/rascore/ras_cper.c: In function ‘cper_generate_runtime_record.isra’:
../ras/rascore/ras_cper.c:75:36: error: ‘%llX’ directive output may be truncated writing between 1 and 14 bytes into a region of size between 0 and 7 [-Werror=format-truncation=]
   75 |         snprintf(record_id, 9, "%d:%llX", dev_info.socket_id,
      |                                    ^~~~
../ras/rascore/ras_cper.c:75:32: note: directive argument in the range [0, 72057594037927935]
   75 |         snprintf(record_id, 9, "%d:%llX", dev_info.socket_id,
      |                                ^~~~~~~~~
../ras/rascore/ras_cper.c:75:9: note: ‘snprintf’ output between 4 and 27 bytes into a destination of size 9
   75 |         snprintf(record_id, 9, "%d:%llX", dev_info.socket_id,
      |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   76 |                     RAS_LOG_SEQNO_TO_BATCH_IDX(trace->seqno));
      |                     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
cc1: all warnings being treated as errors
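
The log above shows only the diagnostics, not the change that fixed them. One common way to resolve -Wformat-truncation is to size the destination for the worst case and check the return value of snprintf(); the sketch below assumes that approach, uses a hypothetical format_record_id() helper and RECORD_ID_LEN constant, and may differ from what the commit actually does.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Worst case for "%d:%llX": 11 chars for INT_MIN, 1 for ':',
 * 16 hex digits for a 64-bit value, plus the terminating NUL = 29 bytes.
 */
#define RECORD_ID_LEN 29

/* Hypothetical helper: returns 0 on success, -1 if the id would be truncated. */
static int format_record_id(char *buf, size_t len, int socket_id,
			    unsigned long long batch_idx)
{
	int n = snprintf(buf, len, "%d:%llX", socket_id, batch_idx);

	/* snprintf() returns the length it wanted to write; >= len means truncation. */
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

With a 9-byte destination, as in the original call, any batch index wider than a few hex digits is cut off; the return-value check makes that visible instead of silently storing a shortened id.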

Signed-off-by: Xiang Liu <xiang.liu@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04 11:53:21 -05:00
Xiang Liu
988fd51e45 drm/amd/ras: Use correct severity for BP threshold exceed event
The CPER severity for the BP threshold exceed event should be set to
FATAL to match the OOB implementation.

Signed-off-by: Xiang Liu <xiang.liu@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04 11:52:47 -05:00
Xiang Liu
bfdffc2995 drm/amd/ras: Correct info field of bad page threshold exceed CPER
Correct valid_bits and ms_chk_bits in the section info field of the bad page
threshold exceed CPER to match OOB's behavior.

Signed-off-by: Xiang Liu <xiang.liu@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04 11:52:47 -05:00
Xiang Liu
87208c1068 drm/amd/ras: Update IPID value for bad page threshold CPER
According to the latest definition, the IPID register value for the bad
page threshold CPER now holds socket_id info.

Signed-off-by: Xiang Liu <xiang.liu@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-11-04 11:52:46 -05:00
YiPeng Chai
25c1e7414b drm/amd/ras: Update function and remove redundant code
Update function and remove redundant code:
1. Update function to prepare for internal use.
2. Remove unused function code previously prepared
   for ioctl.

V2:
  Update commit message content.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-10-20 18:27:37 -04:00
YiPeng Chai
4c74635afd drm/amd/ras: Update ras command context structure name
Based on how this structure is actually used, "context"
is a more appropriate name; a structure name containing
"ioctl" is easily misunderstood.

V2:
  Update commit message content.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-10-20 18:27:33 -04:00
YiPeng Chai
ace232eff5 drm/amdgpu: Add ras module files into amdgpu
Add ras module files into amdgpu.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-10-13 14:14:36 -04:00
YiPeng Chai
cef10272e7 drm/amd/ras: Add files to ras core Makefile
Add files to ras core Makefile.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-10-13 14:14:30 -04:00
YiPeng Chai
13c91b5b43 drm/amd/ras: Add rascore unified interface function
1. Complete the initialization call of all
   sub-functions.
2. Export common interfaces.

V2:
  Remove the use of typedef to define function pointer.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-10-13 14:14:30 -04:00
YiPeng Chai
54ad42c23d drm/amd/ras: Add cper conversion function
Add cper conversion function.

V3:
  Change commit message and update the calling function.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Signed-off-by: Xiang Liu <xiang.liu@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-10-13 14:14:30 -04:00
YiPeng Chai
0ec9ed84fb drm/amd/ras: Use ring buffer to record ras ecc data
Use ring buffer to record ras ecc data.
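
A minimal sketch of a fixed-size ring buffer for ECC records, assuming an overwrite-oldest policy when full. The ecc_entry fields, ECC_RING_SIZE and the function names are hypothetical, not taken from the rascore sources, and the sketch does no locking, which a kernel version shared between contexts would need.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical capacity; a power of two so indices can be masked. */
#define ECC_RING_SIZE 8

/* Hypothetical record layout. */
struct ecc_entry {
	uint32_t block;
	uint64_t ce_count;
	uint64_t ue_count;
};

struct ecc_ring {
	struct ecc_entry buf[ECC_RING_SIZE];
	uint32_t head;  /* free-running count of pushes */
	uint32_t tail;  /* free-running count of pops and drops */
};

/* Append an entry; when the ring is full, drop the oldest one first. */
static void ecc_ring_push(struct ecc_ring *r, const struct ecc_entry *e)
{
	if (r->head - r->tail == ECC_RING_SIZE)
		r->tail++;
	r->buf[r->head & (ECC_RING_SIZE - 1)] = *e;
	r->head++;
}

/* Remove the oldest entry into *out; returns false if the ring is empty. */
static bool ecc_ring_pop(struct ecc_ring *r, struct ecc_entry *out)
{
	if (r->head == r->tail)
		return false;
	*out = r->buf[r->tail & (ECC_RING_SIZE - 1)];
	r->tail++;
	return true;
}
```

Free-running 32-bit head/tail counters with a power-of-two size keep head - tail correct across wraparound, so no separate count field is needed.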

V3:
  Change commit message and rename the file and
  function names.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-10-13 14:14:30 -04:00
YiPeng Chai
ea61341b90 drm/amd/ras: Add thread to handle ras events
Add thread to handle ras events.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-10-13 14:14:30 -04:00
YiPeng Chai
19030244e1 drm/amd/ras: Add ras ioctl command handler
Add ras ioctl command handler.

V2:
  Remove ras global device list.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-10-13 14:14:29 -04:00
YiPeng Chai
c49ef01183 drm/amd/ras: Add psp ras common functions
Add psp ras common functions.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-10-13 14:14:29 -04:00
YiPeng Chai
9f3083dc9f drm/amd/ras: Add psp v13_0 ras functions
Add psp v13_0 ras functions.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-10-13 14:14:29 -04:00
YiPeng Chai
5c3be5defc drm/amd/ras: Add eeprom ras functions
Add eeprom ras functions.

V5:
  Remove duplicate data structure definition.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-10-13 14:14:29 -04:00
YiPeng Chai
a8f2352a41 drm/amd/ras: Add gfx common ras functions
Add gfx common ras functions.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-10-13 14:14:29 -04:00
YiPeng Chai
4b23ebf7a0 drm/amd/ras: Add gfx v9_0 ras functions
Add gfx v9_0 ras functions.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-10-13 14:14:29 -04:00
YiPeng Chai
7a3f9c0992 drm/amd/ras: Add umc common ras functions
Add umc common ras functions.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-10-13 14:14:29 -04:00
YiPeng Chai
8bd7fe95a4 drm/amd/ras: Add umc v12_0 ras functions
Add umc v12_0 ras functions.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-10-13 14:14:29 -04:00
YiPeng Chai
df2d8574c5 drm/amd/ras: Add nbio common ras functions
Add nbio common ras functions.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-10-13 14:14:29 -04:00
YiPeng Chai
fa4fe20f45 drm/amd/ras: Add nbio v7_9 ras functions
Add nbio v7_9 ras functions.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-10-13 14:14:29 -04:00
YiPeng Chai
adf0e0e089 drm/amd/ras: Add mp1 common ras functions
Add mp1 common ras functions.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-10-13 14:14:28 -04:00
YiPeng Chai
71abe27a9a drm/amd/ras: Add mp1 v13_0 ras functions
Add mp1 v13_0 ras functions.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-10-13 14:14:28 -04:00
YiPeng Chai
88e379e5b8 drm/amd/ras: Add aca common ras functions
Add aca common ras functions:
1. Aca hw init/fini.
2. Get ecc count of each ras block.
3. Update query ecc count from mp1.
4. Clear ras block ecc count.

V3:
  Update the calling function.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-10-13 14:14:28 -04:00
YiPeng Chai
fd98319f73 drm/amd/ras: Add ras aca parser v1.0
Add ras aca parser v1.0.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-10-13 14:14:28 -04:00
YiPeng Chai
2330437da0 drm/amd/ras: Add rascore status definition
Add rascore status definition.

V5:
  Merge the previous empty files.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-23 10:36:02 -04:00