habanalabs/gaudi: add debugfs to fetch internal sync status

When Gaudi device is secured the monitors data in the configuration
space is blocked from PCI access.
As we need to enable user to get sync-manager monitors registers when
debugging, this patch adds a debugfs that dumps the information to a
binary file (blob).
When a root user will trigger the dump, the driver will send request to
the f/w to fill a data structure containing dump of all monitors
registers.

Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
This commit is contained in:
Ohad Sharabi
2022-03-22 14:32:40 +02:00
committed by Greg Kroah-Hartman
parent f5d85fe05a
commit d0b59cf68c
7 changed files with 211 additions and 16 deletions

View File

@@ -190,6 +190,30 @@ Description: Check and display page fault or access violation mmu errors for
echo "0x200" > /sys/kernel/debug/habanalabs/hl0/mmu_error
cat /sys/kernel/debug/habanalabs/hl0/mmu_error
What: /sys/kernel/debug/habanalabs/hl<n>/monitor_dump
Date: Mar 2022
KernelVersion: 5.19
Contact: osharabi@habana.ai
Description: Allows the root user to dump monitors status from the device's
protected config space.
This property is a binary blob that contains the result of the
monitors registers dump.
This custom interface is needed (instead of using the generic
Linux user-space PCI mapping) because this space is protected
and cannot be accessed using PCI read.
This interface doesn't support concurrency in the same device.
Only supported on GAUDI.
What: /sys/kernel/debug/habanalabs/hl<n>/monitor_dump_trig
Date: Mar 2022
KernelVersion: 5.19
Contact: osharabi@habana.ai
Description: Triggers dump of monitor data. The value to trigger the operation
must be 1. Triggering the monitor dump operation initiates dump of
current registers values of all monitors.
When the write is finished, the user can read the "monitor_dump"
blob
What: /sys/kernel/debug/habanalabs/hl<n>/set_power_state
Date: Jan 2019
KernelVersion: 5.1