Commit Graph

25 Commits

Author SHA1 Message Date
Tomer Tayar
c517068349 accel/habanalabs: keep explicit size of reserved memory for FW
The reserved memory for FW is currently saved in an ASIC property in
units of MB, just like the value that comes from FW.
Except the fact that it is not clear from the property's name, it means
also that a calculation to actual size is required everywhere that it is
used.
Modify the property to hold the size in bytes.

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Reviewed-by: Carl Vanderlip <quic_carlv@quicinc.com>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2024-02-26 09:47:27 +02:00
Tomer Tayar
db45bbdd02 accel/habanalabs: handle reserved memory request when working with full FW
Currently the reserved memory request from FW is handled when running
with preboot only, but this request is relevant also when running with
full FW.
Modify to always handle this reservation request.

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Reviewed-by: Carl Vanderlip <quic_carlv@quicinc.com>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2024-02-26 09:47:24 +02:00
Dani Liberman
fd8d2fa066 accel/habanalabs: fix error print
The unmasking is for event and it can be other event than RAZWI.

Signed-off-by: Dani Liberman <dliberman@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Reviewed-by: Carl Vanderlip <quic_carlv@quicinc.com>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2024-02-26 09:47:12 +02:00
Tomer Tayar
7159813c91 accel/habanalabs: modify print for skip loading linux FW to debug log
Skip loading a linux FW image into the device with the current supported
ASICs is done for test purposes only.
Moreover, for future supported ASICs it is possible that there won't be
a need to load such an image.
The print in such a case is therefore not needed in most cases, so
replace the used dev_info() with dev_dbg().

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Reviewed-by: Carl Vanderlip <quic_carlv@quicinc.com>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2024-02-26 09:46:52 +02:00
Moti Haimovski
7259eb7b53 accel/habanalabs/gaudi2: add signed dev info uAPI
User will provide a nonce via the INFO ioctl, and will retrieve
the signed device info generated using given nonce.

Signed-off-by: Moti Haimovski <mhaimovski@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2023-12-19 11:09:43 +02:00
Farah Kassabri
fbc2a09e09 accel/habanalabs: update device boot error check
Use a predefined mask which set the device critical boot errors.
Driver will fail and stop its loading, only upon detecting at least
one of those errors defined in this mask.

Signed-off-by: Farah Kassabri <fkassabri@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2023-12-19 11:09:42 +02:00
Ariel Suller
de8773fdc5 accel/habanalabs: update boot status print
FW shutdown preparation status was added to spec.

Signed-off-by: Ariel Suller <asuller@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2023-10-09 12:37:23 +03:00
Dafna Hirschfeld
674f77798e accel/habanalabs: extend preboot timeout when preboot might take longer
There are cases such when FW runs MBIST, that preboot is expected to take
longer than the usual. In such cases the firmware reports status
SECURITY_READY/IN_PREBOOT and we extend the timeout waiting for it.
This is currently implemented for Gaudi2 only.

Signed-off-by: Dafna Hirschfeld <dhirschfeld@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2023-10-09 12:37:22 +03:00
David Meriin
2b76129c5a accel/habanalabs: move cpucp interface to linux/habanalabs
The CPUCP interface is moved to a shared folder outside of accel as
a pre-requisite to upstream the NIC drivers that will also include
this file.

Signed-off-by: David Meriin <dmeriin@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2023-10-09 12:37:21 +03:00
Dani Liberman
8887279092 accel/habanalabs: handle f/w reserved dram space request
It is possible for FW to request reserved space in dram.
If the device supports this option, it will retrieve the size from the
f/w and will reserve it.

Currently we add the common code infrastructure to support it.

Signed-off-by: Dani Liberman <dliberman@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2023-10-09 12:37:19 +03:00
Ofir Bitton
d33c3d0541 accel/habanalabs: dump temperature threshold boot error
Add dump of an error reported from f/w during boot time.
This error indicates a failure with setting temperature threshold.

Signed-off-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2023-10-09 12:37:19 +03:00
farah kassabri
bffd2f16ae accel/habanalabs: fix standalone preboot descriptor request
The preboot used to statically allocate memory for the comms descriptor
on the device memory when driver requested the descriptor information.
Now preboot moved to dynamic memory allocation where it wants to check
the size the driver expects vs. what the f/w expects.
Note there are no backward compatibility issues as older f/w versions
simply ignore this value.

Signed-off-by: farah kassabri <fkassabri@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2023-10-09 12:37:18 +03:00
Koby Elbaz
7e63f317c0 accel/habanalabs: update state when loading boot fit
Any FW component we load must be followed by a corresponding state
update. However, it seems that so far we skipped doing so for the
bootfit case, so fix that.

Signed-off-by: Koby Elbaz <kelbaz@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2023-06-08 12:35:56 +03:00
Koby Elbaz
67d19a2f49 accel/habanalabs: poll for device status update following WFE cmd
Currently, we rely on COMMS protocol's ack to verify that WFE command
has been acknowledged by the FW. However, this does not guarantee that
the device status has been updated.
Although unlikely, this could trigger a race since the driver expects
the device to be halted at that stage, but it might not be.
Therefore, we increase WFE's robustness by polling on the status
register that will be updated once the device is actually halted.

Signed-off-by: Koby Elbaz <kelbaz@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2023-06-08 12:35:54 +03:00
Dafna Hirschfeld
cc7b790d41 accel/habanalabs: do soft-reset using cpucp packet
This is done depending on the FW version. The cpucp method is
preferable and saves scratchpads resource.

Signed-off-by: Dafna Hirschfeld <dhirschfeld@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2023-06-05 15:31:34 +03:00
Dafna Hirschfeld
dd5667ff6f accel/habanalabs: extract and save the FW's SW major/minor/sub-minor
It is not always possible to know the FW's SW version from the inner FW
version. Therefore we should extract the general SW version in addition
to the FW version and use it in functions like
'hl_is_fw_ver_below_1_9' etc.

Signed-off-by: Dafna Hirschfeld <dhirschfeld@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2023-06-05 15:31:33 +03:00
Dafna Hirschfeld
3071247ca0 accel/habanalabs: rename fw_{major/minor}_version to fw_inner_{major/minor}_ver
We later want to add fields for Firmware SW version. The current
extracted FW version is the inner FW versioning so the new name
is better and also better differentiate from the FW's SW version.

Signed-off-by: Dafna Hirschfeld <dhirschfeld@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2023-06-05 15:31:33 +03:00
Dafna Hirschfeld
9ef23f05ae accel/habanalabs: add helper to extract the FW major/minor
the helper is extract_u32_until_given_char and can later be used to
also get the major/minor of the sw version.

Signed-off-by: Dafna Hirschfeld <dhirschfeld@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2023-06-05 15:31:33 +03:00
Koby Elbaz
c19350efa9 accel/habanalabs: don't wait for STS_OK after sending COMMS WFE
Sending COMMS_GOTO_WFE instructs the FW's CPU to halt (WFE state).
Once sent, FW's CPU isn't expected to continue communicating with LKD.
Therefore, the stage of waiting for COMMS_STS_OK should be skipped or
else waiting for COMMS_STS_OK will simply timeout, which will trigger
unexpected behavior.

Signed-off-by: Koby Elbaz <kelbaz@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2023-04-08 10:39:34 +03:00
Dafna Hirschfeld
12f7701138 accel/habanalabs: improvements to FW ver extraction
1. Rename the func to hl_get_preboot_major_minor because we also set
   the extracted values in hdev fields.

2. Free the allocated string in the calling function which makes more
   sense

Signed-off-by: Dafna Hirschfeld <dhirschfeld@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2023-04-08 10:39:33 +03:00
Koby Elbaz
9d7fef7c59 accel/habanalabs: change COMMS warning messages to error level
COMMS protocol is used for LKD <--> FW communication, and any
communication failure between the two might turn out to be
destructive, hence, it should be well emphasized.

Signed-off-by: Koby Elbaz <kelbaz@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
2023-04-08 10:39:33 +03:00
Tomer Tayar
7cd6b5625b accel/habanalabs: fix a missing-braces compilation warning
Replace initialization of "struct cpucp_packet" from "{0} to "{}" to
avoid a "missing braces around initializer" compilation warning.

Signed-off-by: Tomer Tayar <ttayar@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2023-03-20 17:35:34 +02:00
Moti Haimovski
eba773d30d habanalabs: enhance info printed on FW load errors
This commit enhances the following error messages to also provide the
type of error occurred, this in order to ease debugging of errors
detected during firmware-load.

Signed-off-by: Moti Haimovski <mhaimovski@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2023-01-26 11:52:13 +02:00
Ohad Sharabi
ab509d81c9 habanalabs: add set engines masks ASIC function
This function shall be used whenever components enable/binning masks
should be updated.

Usage is in one of the below cases:
- update user (or default) component masks
- update when getting the masks from FW (either CPUCP or COMMS)

Signed-off-by: Ohad Sharabi <osharabi@habana.ai>
Reviewed-by: Oded Gabbay <ogabbay@kernel.org>
Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2023-01-26 11:52:11 +02:00
Oded Gabbay
e65e175b07 habanalabs: move driver to accel subsystem
Now that we have a subsystem for compute accelerators, move the
habanalabs driver to it.

This patch only moves the files and fixes the Makefiles. Future
patches will change the existing code to register to the accel
subsystem and expose the accel device char files instead of the
habanalabs device char files.

Update the MAINTAINERS file to reflect this change.

Signed-off-by: Oded Gabbay <ogabbay@kernel.org>
2023-01-26 11:52:10 +02:00