40 Commits

Author SHA1 Message Date
Dani Liberman
0c88760f8f habanalabs/gaudi2: add secured attestation info uapi
User will provide a nonce via the ioctl, and will retrieve
secured attestation data of the boot, generated using given
nonce.

Signed-off-by: Dani Liberman <dliberman@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-09-19 15:08:40 +03:00
Ofir Bitton
0626fa1a4d habanalabs: add support for new cpucp return codes
Firmware now responds with more detailed cpucp return codes.
Driver can now distinguish between error and debug return codes.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-09-19 15:08:38 +03:00
farah kassabri
4745b2f0d0 habanalabs: send device active message to f/w
As part of the RAS that is done by the f/w, we should send a message
to the f/w when a user either acquires or releases the device.

Signed-off-by: farah kassabri <fkassabri@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-09-19 15:08:37 +03:00
Ofir Bitton
a85e389a84 habanalabs/gaudi2: reset device upon critical ECC event
Correctable ECC events are not fatal, but as they accumulate, the f/w
can decide that a hard-reset is required. This indication is
propagated to the host using the existing ECC event interface.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-07-12 09:09:28 +03:00
Oded Gabbay
d7bb1ac89b habanalabs: add gaudi2 asic-specific code
Add the ASIC-specific code for Gaudi2. Supply (almost) all of the
function callbacks that the driver's common code needs to initialize,
finalize and submit workloads to the Gaudi2 ASIC.

It also contains the code to initialize the F/W of the Gaudi2 ASIC
and to receive events from the F/W.

It contains a new debugfs entry to dump razwi events. razwi is a case
where the device's engines create a transaction that reaches an
invalid destination.

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-07-12 09:09:27 +03:00
ran shalit
e41c641856 habanalabs: add critical indication in sram ecc
Multiple SRAM SERR events are treated as critical events,
and host should be notified about it. Thus, adding is_critical
indication as part of SRAM ECC failure packet.

Signed-off-by: ran shalit <rshalit@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-07-12 09:09:23 +03:00
Oded Gabbay
368b0b4fd6 habanalabs: update firmware header
Update cpucp_if.h to latest version.

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-05-22 21:01:20 +02:00
Ohad Sharabi
d0b59cf68c habanalabs/gaudi: add debugfs to fetch internal sync status
When a Gaudi device is secured, the monitor data in the configuration
space is blocked from PCI access.
As we need to enable the user to get the sync-manager monitor registers
when debugging, this patch adds a debugfs entry that dumps the
information to a binary file (blob).
When a root user triggers the dump, the driver sends a request to
the f/w to fill a data structure containing a dump of all monitor
registers.

Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-05-22 20:57:37 +02:00
Linus Torvalds
02e2af20f4 Merge tag 'char-misc-5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc and other driver updates from Greg KH:
 "Here is the big set of char/misc and other small driver subsystem
  updates for 5.18-rc1.

  Included in here are merges from driver subsystems which contain:

   - iio driver updates and new drivers

   - fsi driver updates

   - fpga driver updates

   - habanalabs driver updates and support for new hardware

   - soundwire driver updates and new drivers

   - phy driver updates and new drivers

   - coresight driver updates

   - icc driver updates

  Individual changes include:

   - mei driver updates

   - interconnect driver updates

   - new PECI driver subsystem added

   - vmci driver updates

   - lots of tiny misc/char driver updates

  All of these have been in linux-next for a while with no reported
  problems"

* tag 'char-misc-5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (556 commits)
  firmware: google: Properly state IOMEM dependency
  kgdbts: fix return value of __setup handler
  firmware: sysfb: fix platform-device leak in error path
  firmware: stratix10-svc: add missing callback parameter on RSU
  arm64: dts: qcom: add non-secure domain property to fastrpc nodes
  misc: fastrpc: Add dma handle implementation
  misc: fastrpc: Add fdlist implementation
  misc: fastrpc: Add helper function to get list and page
  misc: fastrpc: Add support to secure memory map
  dt-bindings: misc: add fastrpc domain vmid property
  misc: fastrpc: check before loading process to the DSP
  misc: fastrpc: add secure domain support
  dt-bindings: misc: add property to support non-secure DSP
  misc: fastrpc: Add support to get DSP capabilities
  misc: fastrpc: add support for FASTRPC_IOCTL_MEM_MAP/UNMAP
  misc: fastrpc: separate fastrpc device from channel context
  dt-bindings: nvmem: brcm,nvram: add basic NVMEM cells
  dt-bindings: nvmem: make "reg" property optional
  nvmem: brcm_nvram: parse NVRAM content into NVMEM cells
  nvmem: dt-bindings: Fix the error of dt-bindings check
  ...
2022-03-28 12:27:35 -07:00
Rajaravi Krishna Katta
4c01e524b2 habanalabs: sysfs support for fw os version
Adds new sysfs entry to display firmware os version
/sys/class/habanalabs/hl<n>/fw_os_ver

Signed-off-by: Rajaravi Krishna Katta <rkatta@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2022-02-28 14:22:02 +02:00
Gustavo A. R. Silva
5224f79096 treewide: Replace zero-length arrays with flexible-array members
There is a regular need in the kernel to provide a way to declare
having a dynamically sized set of trailing elements in a structure.
Kernel code should always use “flexible array members”[1] for these
cases. The older style of one-element or zero-length arrays should
no longer be used[2].

This code was transformed with the help of Coccinelle:
(next-20220214$ spatch --jobs $(getconf _NPROCESSORS_ONLN) --sp-file script.cocci --include-headers --dir . > output.patch)

@@
identifier S, member, array;
type T1, T2;
@@

struct S {
  ...
  T1 member;
  T2 array[
- 0
  ];
};

UAPI and wireless changes were intentionally excluded from this patch
and will be sent out separately.

[1] https://en.wikipedia.org/wiki/Flexible_array_member
[2] https://www.kernel.org/doc/html/v5.16/process/deprecated.html#zero-length-and-one-element-arrays

Link: https://github.com/KSPP/linux/issues/78
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
2022-02-17 07:00:39 -06:00
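For illustration only, the conversion described in the commit above can be sketched in userspace C as follows. The names `struct sample_pkt`, `count`, `data` and `sample_pkt_alloc()` are hypothetical and are not taken from the kernel tree; the sketch only shows what a flexible-array member looks like and how the trailing storage is typically sized.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical structure; not taken from the kernel sources. */
struct sample_pkt {
	uint32_t count;   /* number of valid entries in data[] */
	uint32_t data[];  /* flexible-array member (old style: "data[0]") */
};

/* Allocate a packet with room for n trailing elements, zero-initialized. */
static struct sample_pkt *sample_pkt_alloc(uint32_t n)
{
	struct sample_pkt *p = malloc(sizeof(*p) + (size_t)n * sizeof(p->data[0]));

	if (!p)
		return NULL;
	p->count = n;
	memset(p->data, 0, (size_t)n * sizeof(p->data[0]));
	return p;
}
```

Note that sizeof(struct sample_pkt) does not include the trailing elements, which is why the allocation adds them explicitly.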
Tomer Tayar
f297a0e9fe habanalabs: add CPU-CP packet for engine core ASID cfg
In some cases the driver cannot configure ASID of some engines due to
the security level of the relevant registers.
For this a new CPU-CP packet is introduced, which will allow the driver
to ask the F/W to do this configuration instead.

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-12-26 14:39:53 +02:00
Ofir Bitton
b5c92b8882 habanalabs: sysfs support for two infineon versions
Currently sysfs supports dumping a single infineon version; in
future ASICs we will have two infineon versions.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-12-26 08:59:09 +02:00
Ofir Bitton
3eb7754ff4 habanalabs: debugfs support for larger I2C transactions
I2C debugfs support is limited to 1 byte. We extend functionality
to more than 1 byte by using one of the pad fields as a length.
No backward compatibility issues as new F/W versions will treat 0
length as a 1 byte length transaction.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-12-26 08:59:05 +02:00
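The backward-compatibility rule in the commit above (a zero length is treated as a 1-byte transaction) can be sketched as below. This is a minimal illustration of the rule as the commit message states it, not the firmware's code; the function name `i2c_effective_len()` and the parameter `len_field` are hypothetical.

```c
#include <stdint.h>

/*
 * Hypothetical helper, for illustration only.
 * An older driver leaves the former pad field at 0, so a length of 0
 * is interpreted as a 1-byte transaction.
 */
static inline uint8_t i2c_effective_len(uint8_t len_field)
{
	return len_field ? len_field : 1;
}
```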
farah kassabri
49c052dad6 habanalabs: add new opcodes for INFO IOCTL
Add implementation for new opcodes in the INFO IOCTL:
1. Retrieve the replaced DRAM rows from f/w.
2. Retrieve the pending DRAM rows from f/w.

Signed-off-by: farah kassabri <fkassabri@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-12-26 08:59:05 +02:00
Rajaravi Krishna Katta
e84e31a912 habanalabs: add dedicated message towards f/w to set power
CPUCP_PACKET_POWER_GET packet type was used for both
hl_get_power() and hl_set_power().

To align with other sensor functions hl_set_power()
should use CPUCP_PACKET_POWER_SET.

This packet will only be used with newer ASICs, so we need to add
a compatibility flag to the asic properties to indicate whether to use
this packet or the GET packet.

Signed-off-by: Rajaravi Krishna Katta <rkatta@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-12-26 08:59:04 +02:00
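A rough sketch of the opcode selection described in the commit above is given here. The two packet names come from the commit message, but their numeric values, the `struct sample_asic_props` type, the `use_power_set_pkt` flag name and the `set_power_opcode()` helper are all assumptions made for this example.

```c
#include <stdbool.h>

/* Names follow the commit message; the numeric values are placeholders. */
enum sample_opcode {
	CPUCP_PACKET_POWER_GET = 1,
	CPUCP_PACKET_POWER_SET = 2,
};

/* Hypothetical stand-in for the ASIC properties structure. */
struct sample_asic_props {
	bool use_power_set_pkt; /* hypothetical name for the compatibility flag */
};

/* Pick the packet used for setting power: SET on newer ASICs, GET otherwise. */
static enum sample_opcode set_power_opcode(const struct sample_asic_props *prop)
{
	return prop->use_power_set_pkt ? CPUCP_PACKET_POWER_SET
				       : CPUCP_PACKET_POWER_GET;
}
```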
Oded Gabbay
efc6b04b86 habanalabs: update firmware files
Update the firmware headers to the latest version

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-10-18 12:05:47 +03:00
Rajaravi Krishna Katta
2b28485d0a habanalabs: enable power info via HWMON framework
Add support to retrieve following power info via HWMON:
- instantaneous power value
- highest value since last reset
- reset the highest place holder

Signed-off-by: Rajaravi Krishna Katta <rkatta@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-10-18 12:05:46 +03:00
Oded Gabbay
c2aa713618 habanalabs: update to latest firmware headers
Add several new packets between driver and firmware.
Add matching compatibility bits for backward compatibility.
Add support for 4K event types.
Add information about pcie errors.

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-09-01 18:38:24 +03:00
Oded Gabbay
5dc9ffaff1 habanalabs: expose server type in INFO IOCTL
Add the server type property to the hl_info_hw_ip_info structure
that is exposed to the user via the INFO IOCTL.

This is needed by the userspace s/w stack to know the connections map
of the internal links that connect the ASIC among themselves inside the
server.

The F/W will tell us, as part of the NIC information, the server type
that the GAUDI is located in.

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-09-01 18:38:24 +03:00
Ohad Sharabi
e1222c2794 habanalabs: report EQ fault during heartbeat
In case we have EQ fault we would like to know about it.
For this, a status bitmask was added in which EQ_FAULT bit is
set by FW in case of EQ fault.

Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18 15:23:41 +03:00
Ofir Bitton
254fac6d1a habanalabs/gaudi: add FW alive event support
In order for driver to be aware of process or thread crashes inside
GAUDI's CPU, we introduce a new event which contains all relevant
information. Upon event reception, driver will dump information and
will reset the device.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18 15:23:41 +03:00
Oded Gabbay
1242e9f0f4 habanalabs: check running index in eqe control
To harden the event queue mechanism, we add a running index to the
control header of the entry.

The firmware writes the index in each entry and the driver verifies
that the index of the current entry is larger by 1 than the index of
the previous entry.

In case it isn't, the driver will treat the entry as if it wasn't
valid (it won't process it but won't skip it).

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18 15:23:40 +03:00
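The running-index check described in the commit above might look roughly like the following sketch. The type `struct sample_eq`, its `prev_index` field and the function `eq_entry_index_ok()` are hypothetical; the width of the real index field and its wrap-around handling are not stated in the commit message and are not modelled here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical consumer-side state for an event queue. */
struct sample_eq {
	uint32_t prev_index; /* running index of the last accepted entry */
};

/*
 * Return true if the entry should be processed.
 * An entry whose index is not exactly prev_index + 1 is treated as not
 * valid: it is not processed and the stored index does not advance, so
 * the same slot is examined again later instead of being skipped.
 */
static bool eq_entry_index_ok(struct sample_eq *eq, uint32_t entry_index)
{
	if (entry_index != eq->prev_index + 1)
		return false;

	eq->prev_index = entry_index;
	return true;
}
```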
Oded Gabbay
90bd4798a8 habanalabs: update to latest f/w headers
Update the common and GAUDI firmware header files to the latest version.

The latest version uses the correct endianness types so this commit also
contains minor changes to the code to use the correct conversions when
reading/writing to the firmware structures.

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18 15:23:39 +03:00
Oded Gabbay
3b39840083 habanalabs: update firmware files to latest
Update the firmware files to the latest from the firmware team.

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-06-18 15:23:38 +03:00
Ohad Sharabi
669b018835 habanalabs: update to latest F/W communication header
Update files to the latest version from the F/W team.

Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-04-09 14:09:25 +03:00
Ohad Sharabi
e9c2003be4 habanalabs: send dynamic msi-x indexes to f/w
In order to minimize hard coded values between F/W and the driver, we
send msi-x indexes dynamically to the F/W.

Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-04-09 14:09:24 +03:00
Ohad Sharabi
e8f9392a5c habanalabs: support legacy and new pll indexes
In order to use a minimum of hard-coded values common to LKD and F/W,
a dynamic method to work with PLLs is introduced in this patch.
The formerly ASIC-specific PLL numbering is now common for all ASICs.
To be backward compatible, a bit in dev status is defined; if the bit is
not set, LKD will keep working with the old PLL numbering.

Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-04-09 14:09:24 +03:00
Sagiv Ozeri
586f2caf0e habanalabs: return current power via INFO IOCTL
Add driver implementation for reading the current power from the device
CPU F/W.

Signed-off-by: Sagiv Ozeri <sozeri@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-04-09 14:09:23 +03:00
Ohad Sharabi
5d6a198f9d habanalabs: reset device in case of sync error
As the F/W is the first to detect an out-of-sync event, a new event is
added to notify the driver of such an event, in which case the driver
performs a hard reset.

Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-04-09 14:09:22 +03:00
Oded Gabbay
7838504171 habanalabs: update SyncManager interrupt handling
The firmware provides more information about SyncManager events.
Adjust the code to the latest firmware interface file.

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-01-27 21:03:51 +02:00
Ofir Bitton
f8bc7f091c habanalabs/gaudi: print sync manager SEI interrupt info
Driver must print sync manager SEI information upon receiving
interrupt from FW.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2021-01-27 21:03:50 +02:00
Alon Mizrahi
d2bbf2ca33 habanalabs: add ull to PLL masks
These defines are 64-bit defines so they need ull suffix.

Signed-off-by: Alon Mizrahi <amizrahi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-11-30 10:47:37 +02:00
Alon Mizrahi
4147864e8d habanalabs: fetch pll frequency from firmware
Once firmware security is enabled, driver must fetch pll frequencies
through the firmware message interface instead of reading the registers
directly.

Signed-off-by: Alon Mizrahi <amizrahi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-11-30 10:47:36 +02:00
Ofir Bitton
5a2998f46c habanalabs/gaudi: fetch HBM ecc info from FW
Once FW security is enabled there is no access to HBM ecc registers,
need to read values from FW using a dedicated interface.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-11-30 10:47:34 +02:00
Ofir Bitton
323b726706 habanalabs: fetch security indication from FW
Add support for fetching security indication from FW.
This indication is needed in order to skip unnecessary
initializations done by FW.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-11-30 10:47:31 +02:00
Oded Gabbay
b3a9c0bd2f habanalabs/gaudi: add NIC firmware-related definitions
Add new structures and messages that the driver uses to interact with the
firmware to receive information and events (errors) about GAUDI's NIC.

Signed-off-by: Omer Shpigelman <oshpigelman@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2020-11-30 10:47:29 +02:00
Oded Gabbay
219b8f2ff0 habanalabs: update firmware interface file
Add new packet to fetch PLL information from firmware. This will be needed
in the future when the driver won't be able to access the PLL registers
directly

Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-09-22 18:49:54 +03:00
Ofir Bitton
763a0b4d81 habanalabs: Fix alignment issue in cpucp_info structure
Because the device CPU compiler aligns structures to 8 bytes,
struct cpucp_info has an alignment issue as some parts
in the structure are not aligned to 8 bytes.
It is preferred that we explicitly insert placeholders inside
the structure to avoid confusion.

In order to validate this scenario, we printed both pointers:

__u8 cpucp_version[VERSION_MAX_LEN]; (0xffff899c67ed4cbc)
__le64 dram_size;                    (0xffff899c67ed4d40)

We see a difference of 132 bytes although the first array
is only 128 bytes long, meaning the compiler added 4 bytes of padding.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-09-22 18:49:52 +03:00
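The padding effect described in the commit above can be reproduced with a small sketch. Both structures below are hypothetical: the `before` member stands in for whatever fields leave the 128-byte array starting on a 4-byte (not 8-byte) boundary, as the printed addresses suggest, and `alignas(8)` models a compiler that aligns the 64-bit member to 8 bytes. The second structure shows the explicit-placeholder approach the commit prefers.

```c
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

#define SAMPLE_VERSION_MAX_LEN 128

/* Implicit padding: the compiler inserts 4 hidden bytes before dram_size. */
struct sample_implicit {
	uint32_t before; /* hypothetical preceding field */
	uint8_t cpucp_version[SAMPLE_VERSION_MAX_LEN];
	alignas(8) uint64_t dram_size;
};

/* Explicit padding: the same layout, with the 4 bytes spelled out. */
struct sample_explicit {
	uint32_t before; /* hypothetical preceding field */
	uint8_t cpucp_version[SAMPLE_VERSION_MAX_LEN];
	uint8_t pad[4];  /* explicit placeholder */
	uint64_t dram_size;
};

/* Distance in bytes from the start of the array to dram_size. */
static size_t implicit_gap(void)
{
	return offsetof(struct sample_implicit, dram_size) -
	       offsetof(struct sample_implicit, cpucp_version);
}

static size_t explicit_gap(void)
{
	return offsetof(struct sample_explicit, dram_size) -
	       offsetof(struct sample_explicit, cpucp_version);
}
```

In both layouts the gap is 132 bytes, matching the difference between the two printed pointers; only the explicit version makes the extra 4 bytes visible in the header.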
Oded Gabbay
2f55342c5e habanalabs: replace armcp with the generic cpucp
ArmCP mandates that the device CPU is always an ARM processor, which might
be wrong in the future.

Most of this change is an internal renaming of variables, functions and
defines but there are two entries in sysfs which have armcp in their
names. Add identical cpucp entries but don't remove the armcp entries yet.
Those will be deprecated next year. Add the documentation about it in sysfs
documentation.

Signed-off-by: Moti Haimovski <mhaimovski@habana.ai>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
2020-09-22 18:49:51 +03:00