linux

mirror of https://github.com/torvalds/linux.git synced 2026-05-03 22:12:32 -04:00

Author	SHA1	Message	Date
Ofir Bitton	f8422017b2	accel/habanalabs/gaudi2: reduce interrupt count to 128 Some systems allow a maximum number of 128 MSI-X interrupts. Hence we reduce the interrupt count to 128 instead of 512. Reviewed-by: Tomer Tayar <ttayar@habana.ai> Signed-off-by: Ofir Bitton <obitton@habana.ai>	2024-06-23 09:53:07 +03:00
Rakesh Ughreja	3309887c6f	accel/habanalabs/gaudi2: unsecure edma max outstanding register Network EDMAs use more outstanding transfers, so this needs to be programmed by EDMA firmware. Signed-off-by: Rakesh Ughreja <rughreja@habana.ai> Reviewed-by: Ofir Bitton <obitton@habana.ai> Signed-off-by: Ofir Bitton <obitton@habana.ai>	2024-06-23 09:53:04 +03:00
Tomer Tayar	93a296dde1	accel/habanalabs: move hl_eq_heartbeat_event_handle() to common code hl_eq_heartbeat_event_handle() doesn't have ASIC specific code, and therefore can be moved from Gaudi2-only code to common code, and possibly used for other ASICs. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Ofir Bitton <obitton@habana.ai> Signed-off-by: Ofir Bitton <obitton@habana.ai>	2024-06-23 09:53:04 +03:00
Tomer Tayar	c8c10dcaca	accel/habanalabs/gaudi2: assume hard-reset by FW upon MC SEI severe error FW initiates a hard reset upon an MC SEI severe error. Align the driver to expect this reset and avoid accessing the device until the reset is done. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Ofir Bitton <obitton@habana.ai> Signed-off-by: Ofir Bitton <obitton@habana.ai>	2024-06-23 09:53:03 +03:00
Tomer Tayar	c754bcf9dd	accel/habanalabs/gaudi2: revise return value handling in gaudi2_hbm_sei_handle_read_err() The return value in gaudi2_hbm_sei_handle_read_err() is boolean and not a bitmask, so there is no need for "|= true". In addition, rename the 'rc' variable, as no "return code" is returned here but an indication if a hard reset is required. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Ofir Bitton <obitton@habana.ai> Signed-off-by: Ofir Bitton <obitton@habana.ai>	2024-06-23 09:53:03 +03:00
Farah Kassabri	31bd26931d	accel/habanalabs: add heartbeat debug info It is hard to debug the reason for heartbeat check failures. As an attempt to ease this task, this patch will provide more information when this failure happens. Heartbeat checks the communication with FW, so printing the CPU queue pi/ci and the counter of how many times that event was received would help in debugging the issue. Signed-off-by: Farah Kassabri <fkassabri@habana.ai> Reviewed-by: Ofir Bitton <obitton@habana.ai> Signed-off-by: Ofir Bitton <obitton@habana.ai>	2024-06-23 09:52:53 +03:00
Ohad Sharabi	ecda35d461	accel/habanalabs: no CPUCP prints on heartbeat failure If we detect a heartbeat event while some daemon in the background sends CPUCP messages (via the driver interface), the dmesg will be flooded. Instead, a slight refactor in hl_fw_send_cpu_message() returns -EAGAIN when the CPU is disabled (i.e. heartbeat failure) and only then. Later, all calling functions that may be invoked by user space can issue prints only if the error code is not -EAGAIN. Signed-off-by: Ohad Sharabi <osharabi@habana.ai> Reviewed-by: Ofir Bitton <obitton@habana.ai> Signed-off-by: Ofir Bitton <obitton@habana.ai>	2024-06-23 09:45:51 +03:00
Ohad Sharabi	892bc64827	accel/habanalabs/gaudi2: use single function to compare FW versions Currently, the code contains 2 types of FW version comparison functions: - hl_is_fw_sw_ver_[below/equal_or_greater]() - gaudi2 specific function of the type gaudi2_is_fw_ver_[below/above]x_y_z() Moreover, some functions use the inner FW version, which should be used only as a stage during development and not for version dependencies. Finally, some tests are done against deprecated FW versions with which LKD should hold no compatibility. This commit aligns all APIs to a single function that just compares the version and returns an integer indicator (similar in some way to strcmp()). In addition, this generic function now also considers the sub-minor FW version, and the commit removes dead code that resulted from compatibility with deprecated FW versions. Signed-off-by: Ohad Sharabi <osharabi@habana.ai> Reviewed-by: Ofir Bitton <obitton@habana.ai> Signed-off-by: Ofir Bitton <obitton@habana.ai>	2024-06-23 09:42:24 +03:00
Ofir Bitton	3bf6ef981f	accel/habanalabs/gaudi2: drain event lacks rd/wr indication Due to a H/W issue, AXI drain event does not include a read/write indication, hence we remove this print. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Reviewed-by: Carl Vanderlip <quic_carlv@quicinc.com> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2024-02-26 09:47:16 +02:00
Tomer Tayar	e855869bec	accel/habanalabs: fix glbl error cause handling The glbl error cause handling has a wrong assumption that all error bits are consecutive. Fix the handling to check all relevant error bits per ASIC. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Reviewed-by: Carl Vanderlip <quic_carlv@quicinc.com> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2024-02-26 09:47:00 +02:00
Tomer Tayar	c1e89ae455	accel/habanalabs/gaudi2: check extended errors according to PCIe addr_dec interrupt info The FW interrupt info for a PCIe addr_dec event is set correctly, so check for either global errors or razwi according to the indications there. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Reviewed-by: Carl Vanderlip <quic_carlv@quicinc.com> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2024-02-26 09:46:55 +02:00
Farah Kassabri	c14e5cd3ed	accel/habanalabs: remove hop size from asic properties The hop size related properties are MMU properties and not ASIC properties, as PMMU and HMMU could have different sizes. Signed-off-by: Farah Kassabri <fkassabri@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Reviewed-by: Carl Vanderlip <quic_carlv@quicinc.com> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2024-02-26 09:46:40 +02:00
Tomer Tayar	01f8cd0faf	accel/habanalabs/gaudi2: fail memory memset when failing to copy QM packet to device gaudi2_memset_memory_chunk_using_edma_qm() calls the access_dev_mem() ASIC function, but ignores its return value. Add this missing check. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2024-02-26 09:30:40 +02:00
Dani Liberman	731d320e68	accel/habanalabs: remove call to deprecated function In newer kernel versions, irq_set_affinity_hint() is deprecated. Instead, use the newer version which is irq_set_affinity_and_hint(). Signed-off-by: Dani Liberman <dliberman@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2024-02-26 09:30:40 +02:00
Farah Kassabri	f728c17fc9	accel/habanalabs/gaudi2: move HMMU page tables to device memory Currently the HMMU page tables reside in the host memory, which will cause host access from the device for every page walk. This can affect PCIe bandwidth in certain scenarios. To prevent that problem, HMMU page tables will be moved to the device memory so the miss transaction will read the hops from there instead of going to the host. Signed-off-by: Farah Kassabri <fkassabri@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2024-02-26 09:30:40 +02:00
Dani Liberman	e91c37f194	accel/habanalabs/gaudi2: add interrupt affinity for user interrupts User interrupts are MSIx interrupts coming from Gaudi2, that have specific range of IDs and are assigned to the sole use of the user process that opened the Gaudi2 device (reminder: there can be only a single user process running on Gaudi2 at any given time). The interrupts are allocated and managed by the driver and therefore, the user expects the driver to initialize them properly, which also includes setting the affinity to the related CPU cores of the device's NUMA node to get maximum performance. Signed-off-by: Dani Liberman <dliberman@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2024-02-26 09:30:40 +02:00
Tomer Tayar	bc5f15abcf	accel/habanalabs/gaudi2: avoid overriding existing undefined opcode data Part of the undefined opcode data is updated in gaudi2_handle_qman_err_generic() and some in handle_lower_qman_data_on_err(). However, the 'write_enable' flag is checked only in gaudi2_handle_qman_err_generic(), and information of more than a single error can be mixed there. Moreover, handle_lower_qman_data_on_err() is called only for the lower QMAN, so for an error in the upper QMAN there is only a partial info. Move all the data update to be done in a single place, protected by the 'write_enable' flag. As mainly the lower QMAN's info is interesting, avoid saving the partial info for the upper QMAN. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2023-12-19 11:09:44 +02:00
Tomer Tayar	565ee78840	accel/habanalabs/gaudi2: add zero padding when printing QM CP instruction QM instructions are in multiples of 64 bits and the command type is in the upper bits of first QWORD. To make it clearer that an undefined command is due to a type of 0x0, always print all 64 bits and add a zero padding if needed. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2023-12-19 11:09:44 +02:00
Tomer Tayar	5bc155cfea	accel/habanalabs/gaudi2: use correct registers to dump QM CQ info The QM CQ PTR_LO/PTR_HI/TSIZE registers are for pushing a CQ entry, and although they are updated by HW even when descriptors are fetched by PQ and CB addresses are fed into CQ, the correct registers to use when dumping the CQ info are the ones with the _STS suffix. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2023-12-19 11:09:43 +02:00
Tomer Tayar	ae303d885d	accel/habanalabs/gaudi2: get the correct QM CQ info upon an error Upon a QM error, the address/size from both the CQ and the ARC_CQ are printed, although the instruction that led to the error was received from only one of them. Moreover, in case of a QM undefined opcode, only one of these address/size sets will be captured based on the value of ARC_CQ_PTR. However, this value can be non-zero even if currently the CQ is used, in case the CQ/ARC_CQ are alternately used. Under the assumption of having a stop-on-error configuration, modify to use CP_STS.CUR_CQ field to get the relevant CQ for the QM error. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2023-12-19 11:09:43 +02:00
Dafna Hirschfeld	0ec3467796	accel/habanalabs/gaudi2: fix undef opcode reporting Currently the undefined opcode event bit is set only for lower cp and only if 'write_enable' is true. It should be set anyway and for all streams in order to report that event to userspace. Signed-off-by: Dafna Hirschfeld <dhirschfeld@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2023-12-19 11:09:43 +02:00
Tomer Tayar	c648548233	accel/habanalabs/gaudi2: assume hard-reset by FW upon PCIe AXI drain When a PCIe AXI drain event happens, it is possible that the driver cannot access the device through PCIe, and therefore cannot send a hard-reset request to FW. Starting from FW version 1.13, FW will initiate a hard-reset in such a case without waiting for a reset request from the driver. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2023-12-19 11:09:42 +02:00
Oded Gabbay	4db74c0fde	accel/habanalabs/gaudi2: fix spmu mask creation event_types_num received from the user can be 0. In that case, the event_mask should be 0. In addition, to create a correct mask we need to match the number of event types to the bit location such that bit 0 represents a single event type, bit 1 represents 2 types and so on. Signed-off-by: Oded Gabbay <ogabbay@kernel.org> Reviewed-by: Ofir Bitton <obitton@habana.ai>	2023-10-09 12:37:24 +03:00
Ohad Sharabi	ff92d01052	accel/habanalabs: trace dma map sgtable Traces the DMA [un]map_sgtable using the new traces we added. Signed-off-by: Ohad Sharabi <osharabi@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2023-10-09 12:37:23 +03:00
Oded Gabbay	d7aa294805	accel/habanalabs: remove unused asic functions asic_dma_{un}map_single() asic-specific functions are no longer called from the common code, so delete these functions. In addition, delete the gaudi2 implementation as they are also not called. Signed-off-by: Oded Gabbay <ogabbay@kernel.org> Reviewed-by: Ofir Bitton <obitton@habana.ai>	2023-10-09 12:37:23 +03:00
Dafna Hirschfeld	674f77798e	accel/habanalabs: extend preboot timeout when preboot might take longer There are cases, such as when FW runs MBIST, in which preboot is expected to take longer than usual. In such cases the firmware reports status SECURITY_READY/IN_PREBOOT and we extend the timeout waiting for it. This is currently implemented for Gaudi2 only. Signed-off-by: Dafna Hirschfeld <dhirschfeld@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2023-10-09 12:37:22 +03:00
farah kassabri	ba24b5ec78	accel/habanalabs: split user interrupts pending list Currently the driver maintains one list for both the pending user interrupts that wait until the CQ reaches its target value and the ones that seek to get timestamp records when the CQ reaches its target value. This delays the handling of the waiters, which get higher priority than the timestamp records. In order to solve this, let's split the list into two, one for each case, each protected by its own spinlock. Waiters will be handled within the interrupt context first, then the timestamp records will be set. Freeing the timestamp related memory will be handled in a workqueue. Signed-off-by: farah kassabri <fkassabri@habana.ai> Reviewed-by: Tomer Tayar <ttayar@habana.ai> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2023-10-09 12:37:22 +03:00
farah kassabri	764bfd138f	accel/habanalabs/gaudi2: add eq health check using irq This is the second patch for applying the eq health check mechanism, which will add support for the interrupt flow for gaudi2 asic. More info about the interrupt mechanism: Set a dedicated msix for the eq error interrupt, and add an interrupt handler for it. When FW detects some issue with EQ like EQ_FULL, it'll raise that interrupt and the driver should reset the device. The driver will inform the FW which msix index to use through the already existing handshake mechanism, which will send an msix info message to fw. Signed-off-by: farah kassabri <fkassabri@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2023-10-09 12:37:21 +03:00
farah kassabri	7c4130e6dd	accel/habanalabs/gaudi2: handle eq health heartbeat check Add mechanism for fw eq health check. This will be done using two flows: using the heartbeat mechanism and raising a dedicated interrupt to indicate an eq failure like EQ full. This patch will add implementation for the eq heartbeat for gaudi2 asic. More info about the heartbeat mechanism: Expand the heartbeat mechanism to monitor a new event that will be sent from FW upon receiving a heartbeat message. That way the driver can know whether the eq is working. Signed-off-by: farah kassabri <fkassabri@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2023-10-09 12:37:21 +03:00
Moti Haimovski	72bff371b2	accel/habanalabs/gaudi2: print power-mode changes Print to kernel log any device power mode changes events reported by the FW. Signed-off-by: Moti Haimovski <mhaimovski@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2023-10-09 12:37:21 +03:00
David Meriin	2b76129c5a	accel/habanalabs: move cpucp interface to linux/habanalabs The CPUCP interface is moved to a shared folder outside of accel as a pre-requisite to upstream the NIC drivers that will also include this file. Signed-off-by: David Meriin <dmeriin@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2023-10-09 12:37:21 +03:00
Ofir Bitton	d261b0ab13	accel/habanalabs/gaudi2: include block id in ECC error reporting During ECC event handling, Memory wrapper id was mistakenly printed as block id. Fix the print and in addition fetch the actual block-id from firmware. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2023-10-09 12:37:21 +03:00
Benjamin Dotan	10d260f655	accel/habanalabs: improve etf configuration Coresight ETF blocks have different sizes. As a result, sync packets need to be aligned based on fifo size. Signed-off-by: Benjamin Dotan <bdotan@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2023-10-09 12:37:20 +03:00
Christophe JAILLET	90f3de6162	accel/habanalabs/gaudi2: Fix incorrect string length computation in gaudi2_psoc_razwi_get_engines() snprintf() returns the "number of characters which would be generated for the given input", not the size really generated. In order to avoid too large values for 'str_size' (and potential negative values for "PSOC_RAZWI_ENG_STR_SIZE - str_size") use scnprintf() instead of snprintf(). Fixes: `c0e6df9160` ("accel/habanalabs: fix address decode RAZWI handling") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2023-10-09 12:37:20 +03:00
Justin Stitt	a45d5cf09d	accel/habanalabs: refactor deprecated strncpy to strscpy_pad `strncpy` is deprecated for use on NUL-terminated destination strings [1]. We see that `prop->cpucp_info.card_name` is supposed to be NUL-terminated based on its usage within `__hwmon_device_register()` (wherein it's called "name"): | if (name && (!strlen(name) || strpbrk(name, "-* \t\n"))) | dev_warn(dev, | "hwmon: '%s' is not a valid name attribute, please fix\n", | name); A suitable replacement is `strscpy_pad` [2] due to the fact that it guarantees both NUL-termination and NUL-padding on its destination buffer. NUL-padding on `prop->cpucp_info.card_name` is not strictly necessary as `hdev->prop` is explicitly zero-initialized but should be used regardless as it gets copied out to userspace directly -- as per Kees' suggestion. Link: https://www.kernel.org/doc/html/latest/process/deprecated.html#strncpy-on-nul-terminated-strings [1] Link: https://manpages.debian.org/testing/linux-manual-4.8/strscpy.9.en.html [2] Link: https://github.com/KSPP/linux/issues/90 Cc: linux-hardening@vger.kernel.org Signed-off-by: Justin Stitt <justinstitt@google.com> Suggested-by: Kees Cook <keescook@chromium.org> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2023-10-09 12:37:20 +03:00
Benjamin Dotan	428f6882a6	accel/habanalabs: fix ETR/ETF flush logic When config_etr or config_etf are called we need to validate the parameters that are passed into them to make sure the requested operation is valid. Signed-off-by: Benjamin Dotan <bdotan@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2023-10-09 12:37:20 +03:00
Benjamin Dotan	cf1ed52d12	accel/habanalabs/gaudi2: remove psoc_arc access Because firmware is blocking PSOC_ARC_DBG, we need to disable access to this block. Signed-off-by: Benjamin Dotan <bdotan@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2023-10-09 12:37:20 +03:00
Igor Grinberg	01ab1629ad	accel/habanalabs/gaudi2: prepare to remove cpu_rst_status The soft reset has transitioned to CPUCP packet instead of plain register write and is about to be removed from the struct cpu_dyn_regs. As a preparation for removing the cpu_rst_status field from struct cpu_dyn_regs, switch to use the plain macro - this keeps the backward compatibility. Signed-off-by: Igor Grinberg <igrinberg@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2023-10-09 12:37:20 +03:00
Ofir Bitton	a8ab1a81cc	accel/habanalabs: add info ioctl for engine error reports The user gets a notification for every engine error report, but still lacks the exact engine information. Hence, we allow the user to query for the exact engine that reported an error. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2023-10-09 12:37:19 +03:00
Oded Gabbay	fa46c7bb50	accel/habanalabs/gaudi2: fix missing check of kernel ctx If we are initializing the kernel context when we have a Gaudi2 device, we don't need to do any late initializing of that context with specific Gaudi2 code. Reviewed-by: Ofir Bitton <obitton@habana.ai> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2023-10-09 12:37:19 +03:00
Igor Grinberg	15c0bb1623	accel/habanalabs/gaudi2: prepare to remove soft_rst_irq The soft reset has transitioned to CPUCP packet instead of plain register write and is about to be removed from the struct cpu_dyn_regs. As a preparation for removing the gic_host_soft_rst_irq field from struct cpu_dyn_regs, switch to use the plain macro - this keeps the backward compatibility. Signed-off-by: Igor Grinberg <igrinberg@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2023-10-09 12:37:19 +03:00
Ofir Bitton	1e3a78270b	accel/habanalabs/gaudi2: unsecure tpc count registers As TPC kernels now must use those registers we unsecure them. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2023-10-09 12:37:19 +03:00
Tomer Tayar	5a8487ac54	accel/habanalabs/gaudi2: un-secure register for engine cores interrupt The F/W dynamically allocates one of the PSOC scratchpad registers for the engine cores, so they can raise events towards the F/W. To allow the engine cores to access this register, this register must be non-secured. Signed-off-by: Tomer Tayar <ttayar@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2023-10-09 12:37:19 +03:00
Dani Liberman	43d8acce60	accel/habanalabs: handle arc farm razwi Implement razwi handling for arc farm and add it to arc farm sei event handler. Signed-off-by: Dani Liberman <dliberman@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2023-10-09 12:37:18 +03:00
Ofir Bitton	f17182d036	accel/habanalabs: stop fetching MME SBTE error cause Because in this case we have only a single possible cause, we can safely stop fetching the cause from firmware. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2023-10-09 12:37:18 +03:00
Ofir Bitton	c6a4f256ae	accel/habanalabs: notify user about undefined opcode event In order for user to be aware of undefined opcode events, we must store all relevant information and notify user about the failure. The user will fetch the stored info via info ioctl. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2023-10-09 12:37:17 +03:00
Ofir Bitton	fac91dd54f	accel/habanalabs: add event queue extra validation In order to increase reliability of the event queue interface, we apply to Gaudi2 the same mechanism we have in Gaudi1. The extra validation is basically checking that the received event index matches the expected index. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2023-06-08 12:35:56 +03:00
Ofir Bitton	19aa21b980	accel/habanalabs: unsecure TSB_CFG_MTRR regs In order to utilize Engine Barrier padding, user must have access to this register set. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2023-06-08 12:35:56 +03:00
Ofir Bitton	8a20b38164	accel/habanalabs: fix bug of not fetching addr_dec info addr_dec info should always be fetched, regardless of cause value. Signed-off-by: Ofir Bitton <obitton@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2023-06-08 12:35:56 +03:00
Dani Liberman	5d658d0c51	accel/habanalabs: mask part of hmmu page fault captured address When receiving page fault from hmmu, the captured address is scrambled both by HW and by driver. The driver part is unscrambled but the HW part isn't getting unscrambled. To avoid declaring wrong address, the HW scrambled part will be masked. Signed-off-by: Dani Liberman <dliberman@habana.ai> Reviewed-by: Oded Gabbay <ogabbay@kernel.org> Signed-off-by: Oded Gabbay <ogabbay@kernel.org>	2023-06-08 12:35:56 +03:00
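
The snprintf() versus scnprintf() distinction described in commit 90f3de6162 can be sketched in user space as follows. This is a minimal illustration under stated assumptions, not the driver's code: bounded_printf() is a hypothetical stand-in for the kernel's scnprintf() built on the standard vsnprintf(), and join_names() and its arguments are invented for the example.

```c
#include <stdarg.h>
#include <stdio.h>

/*
 * Hypothetical user-space stand-in for the kernel's scnprintf().
 * Returns the number of characters actually stored in buf (excluding
 * the terminating NUL), never the "would have been written" length
 * that snprintf()/vsnprintf() report on truncation.
 */
static int bounded_printf(char *buf, size_t size, const char *fmt, ...)
{
	va_list args;
	int n;

	if (size == 0)
		return 0;

	va_start(args, fmt);
	n = vsnprintf(buf, size, fmt, args);
	va_end(args);

	if (n < 0) {
		/* Formatting error: leave an empty string behind. */
		buf[0] = '\0';
		return 0;
	}

	/* On truncation only size - 1 characters were stored. */
	return (size_t)n >= size ? (int)(size - 1) : n;
}

/*
 * Illustrative helper (not from the driver): append names to a fixed
 * buffer, separated by " or ". Because bounded_printf() never reports
 * more than it stored, 'len' stays below 'size' and 'size - len'
 * cannot wrap around.
 */
static size_t join_names(char *buf, size_t size,
			 const char *const *names, size_t count)
{
	size_t len = 0;
	size_t i;

	if (size)
		buf[0] = '\0';

	for (i = 0; i < count; i++)
		len += bounded_printf(buf + len, size - len, "%s%s",
				      i ? " or " : "", names[i]);

	return len;
}
```

If plain snprintf() were accumulated the same way, a truncated write would push the running length past the buffer size, and the remaining-space computation would wrap, which is the failure mode the commit message describes.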


120 Commits