mirror of
https://github.com/torvalds/linux.git
synced 2026-04-18 06:44:00 -04:00
Add support for decoding NVIDIA-specific CPER sections delivered via the APEI GHES vendor record notifier chain. NVIDIA hardware generates vendor-specific CPER sections containing error signatures and diagnostic register dumps. This implementation registers a notifier_block with the GHES vendor record notifier and decodes these sections, printing error details via dev_info(). The driver binds to ACPI device NVDA2012, present on NVIDIA server platforms. The NVIDIA CPER section contains a fixed header with error metadata (signature, error type, severity, socket) followed by variable-length register address-value pairs for hardware diagnostics. This work is based on libcper [1]. Example output: nvidia-ghes NVDA2012:00: NVIDIA CPER section, error_data_length: 544 nvidia-ghes NVDA2012:00: signature: CMET-INFO nvidia-ghes NVDA2012:00: error_type: 0 nvidia-ghes NVDA2012:00: error_instance: 0 nvidia-ghes NVDA2012:00: severity: 3 nvidia-ghes NVDA2012:00: socket: 0 nvidia-ghes NVDA2012:00: number_regs: 32 nvidia-ghes NVDA2012:00: instance_base: 0x0000000000000000 nvidia-ghes NVDA2012:00: register[0]: address=0x8000000100000000 value=0x0000000100000000 https://github.com/openbmc/libcper/commit/683e055061ce [1] Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Signed-off-by: Kai-Heng Feng <kaihengf@nvidia.com> [ rjw: Changelog edits ] Link: https://patch.msgid.link/20260330094203.38022-4-kaihengf@nvidia.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>