IB-core and rdma-core restrict the sgid_index specified by users,
which is uint8_t/u8 data type, to only be within the range of 0-255,
so it's meaningless to support excessively large gid_table_len.
On the other hand, ib-core creates as many sysfs gid files as
gid_table_len, most of which are not only useless because of the
reason above, but also greatly increase the traversal time of
the sysfs gid files for applications.
This patch limits the maximum length of gid table to 256.
Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com>
Link: https://lore.kernel.org/r/20231207114231.2872104-4-huangjunxian6@hisilicon.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
During device init, a struct for HW stats will be allocated. As HW
stats are not supported for VF and HIP08, currently
hns_roce_alloc_hw_port_stats() returns NULL in this case. However,
ib-core considers the returned NULL pointer as memory allocation
failure and returns ENOMEM, eventually leading to the failure of VF
and HIP08 init.
In the case where the driver does not support the .alloc_hw_port_stats()
ops, ib-core will return EOPNOTSUPP and ignore this error code in the
upper layer function. So for VF and HIP08, just don't set the HW stats
ops to ib-core.
Fixes: 5a87279591 ("RDMA/hns: Support hns HW stats")
Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com>
Link: https://lore.kernel.org/r/20231017125239.164455-8-huangjunxian6@hisilicon.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
ucmd in hns_roce_create_qp_common() are not initialized. But it works fine
until new member sdb_addr is added to struct hns_roce_ib_create_qp.
If the user-mode driver uses an old version ABI, then the value of the new
member will be undefined after ib_copy_from_udata().
This patch fixes it by initialize this variable to 0. And the default value
of the new member sdb_addr will be 0 which is invalid.
Fixes: 0425e3e6e0 ("RDMA/hns: Support flush cqe for hip08 in kernel space")
Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>
Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com>
Link: https://lore.kernel.org/r/20231017125239.164455-3-huangjunxian6@hisilicon.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
The current driver will print all asynchronous events. Some of the
print levels are set improperly, e.g. SRQ limit reach and SRQ last
wqe reach, which may also occur during normal operation of the software.
Currently, the information of these event is printed as a warning,
which causes a large amount of printing even during normal use of the
application. As a result, the service performance deteriorates.
This patch fixes the printing storms by modifying the print level.
Fixes: b00a92c8f2 ("RDMA/hns: Move all prints out of irq handle")
Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>
Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com>
Link: https://lore.kernel.org/r/20231017125239.164455-2-huangjunxian6@hisilicon.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Compared with normal doorbell, using record doorbell can shorten the
process of ringing the doorbell and reduce the latency.
Add a flag HNS_ROCE_CAP_FLAG_SRQ_RECORD_DB to allow FW to
enable/disable SRQ record doorbell.
If the flag above is set, allocate the dma buffer for SRQ record
doorbell and write the buffer address into SRQC during SRQ creation.
For userspace SRQ, add a flag HNS_ROCE_RSP_SRQ_CAP_RECORD_DB to notify
userspace whether the SRQ record doorbell is enabled.
Signed-off-by: Yangyang Li <liyangyang20@huawei.com>
Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com>
Link: https://lore.kernel.org/r/20230926130026.583088-1-huangjunxian6@hisilicon.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Currently, direct wqe is not supported for wr-list. RoCE driver excludes
direct wqe for wr-list by judging whether the number of wr is 1.
For a wr-list where the second wr is a length-error atomic wr, the
post-send driver handles the first wr and adds 1 to the wr number counter
firstly. While handling the second wr, the driver finds out a length error
and terminates the wr handle process, remaining the counter at 1. This
causes the driver mistakenly judges there is only 1 wr and thus enters
the direct wqe process, carrying the current length-error atomic wqe.
This patch fixes the error by adding a judgement whether the current wr
is a bad wr. If so, use the normal doorbell process but not direct wqe
despite the wr number is 1.
Fixes: 01584a5edc ("RDMA/hns: Add support of direct wqe")
Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com>
Link: https://lore.kernel.org/r/20230804012711.808069-3-huangjunxian6@hisilicon.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Long message loopback slice is used for achieving traffic balance between
QPs. It prevents the problem that QPs with large traffic occupying the
hardware pipeline for a long time and QPs with small traffic cannot be
scheduled.
Currently, its maximum value is set to 16K, which means only after a QP
sends 16K will the second QP be scheduled. This value is too large, which
will lead to unbalanced traffic scheduling, and thus it needs to be
modified.
The setting range of the long message loopback slice is modified to be
from 1024 (the lower limit supported by hardware) to mtu. Actual testing
shows that this value can significantly reduce error in hardware traffic
scheduling.
This solution is compatible with both HIP08 and HIP09. The modified
lp_pktn_ini has a maximum value of 2 (when mtu is 256), so the range
checking code for lp_pktn_ini is no longer necessary and needs to be
deleted.
Fixes: 0e60778efb ("RDMA/hns: Modify the value of MAX_LP_MSG_LEN to meet hardware compatibility")
Link: https://lore.kernel.org/r/20230512092245.344442-4-huangjunxian6@hisilicon.com
Signed-off-by: Yangyang Li <liyangyang20@huawei.com>
Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
For hns, the specification of an entry like resource (E.g. WQE/CQE/EQE)
depends on BT page size, buf page size and hopnum. For user mode, the buf
page size depends on UMEM. Therefore, the actual specification is
controlled by BT page size and hopnum.
The current BT page size and hopnum are obtained from firmware. This makes
the driver inflexible and introduces unnecessary constraints. Resource
allocation failures occur in many scenarios.
This patch will calculate whether the BT page size set by firmware is
sufficient before allocating BT, and increase the BT page size if it is
insufficient.
Fixes: 1133401412 ("RDMA/hns: Optimize base address table config flow for qp buffer")
Link: https://lore.kernel.org/r/20230512092245.344442-3-huangjunxian6@hisilicon.com
Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>
Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
On HIP08, the queried timeout attr is different from the timeout attr
configured by the user.
It is found by rdma-core testcase test_rdmacm_async_traffic:
======================================================================
FAIL: test_rdmacm_async_traffic (tests.test_rdmacm.CMTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
File "./tests/test_rdmacm.py", line 33, in test_rdmacm_async_traffic
self.two_nodes_rdmacm_traffic(CMAsyncConnection, self.rdmacm_traffic,
File "./tests/base.py", line 382, in two_nodes_rdmacm_traffic
raise(res)
AssertionError
Fixes: 926a01dc00 ("RDMA/hns: Add QP operations support for hip08 SoC")
Link: https://lore.kernel.org/r/20230512092245.344442-2-huangjunxian6@hisilicon.com
Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>
Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Pull rdma updates from Jason Gunthorpe:
"Usual size of updates, a new driver, and most of the bulk focusing on
rxe:
- Usual typos, style, and language updates
- Driver updates for mlx5, irdma, siw, rts, srp, hfi1, hns, erdma,
mlx4, srp
- Lots of RXE updates:
* Improve reply error handling for bad MR operations
* Code tidying
* Debug printing uses common loggers
* Remove half implemented RD related stuff
* Support IBA's recently defined Atomic Write and Flush operations
- erdma support for atomic operations
- New driver 'mana' for Ethernet HW available in Azure VMs. This
driver only supports DPDK"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (122 commits)
IB/IPoIB: Fix queue count inconsistency for PKEY child interfaces
RDMA: Add missed netdev_put() for the netdevice_tracker
RDMA/rxe: Enable RDMA FLUSH capability for rxe device
RDMA/cm: Make QP FLUSHABLE for supported device
RDMA/rxe: Implement flush completion
RDMA/rxe: Implement flush execution in responder side
RDMA/rxe: Implement RC RDMA FLUSH service in requester side
RDMA/rxe: Extend rxe packet format to support flush
RDMA/rxe: Allow registering persistent flag for pmem MR only
RDMA/rxe: Extend rxe user ABI to support flush
RDMA: Extend RDMA kernel verbs ABI to support flush
RDMA: Extend RDMA user ABI to support flush
RDMA/rxe: Fix incorrect responder length checking
RDMA/rxe: Fix oops with zero length reads
RDMA/mlx5: Remove not-used IB_FLOW_SPEC_IB define
RDMA/hns: Fix XRC caps on HIP08
RDMA/hns: Fix error code of CMD
RDMA/hns: Fix page size cap from firmware
RDMA/hns: Fix PBL page MTR find
RDMA/hns: Fix AH attr queried by query_qp
...
The queried AH attr is invalid. This patch fix it.
This problem is found by rdma-core test test_mr_rereg_pd
ERROR: test_mr_rereg_pd (tests.test_mr.MRTest)
Test that cover rereg MR's PD with this flow:
----------------------------------------------------------------------
Traceback (most recent call last):
File "./tests/test_mr.py", line 157, in test_mr_rereg_pd
self.restate_qps()
File "./tests/test_mr.py", line 113, in restate_qps
self.server.qp.to_rts(self.server_qp_attr)
File "qp.pyx", line 1137, in pyverbs.qp.QP.to_rts
File "qp.pyx", line 1123, in pyverbs.qp.QP.to_rtr
pyverbs.pyverbs_error.PyverbsRDMAError: Failed to modify QP state to RTR.
Errno: 22, Invalid argument
Fixes: 926a01dc00 ("RDMA/hns: Add QP operations support for hip08 SoC")
Link: https://lore.kernel.org/r/20221126102911.2921820-3-xuhaoyue1@hisilicon.com
Signed-off-by: Chengchang Tang <tangchengchang@huawei.com>
Signed-off-by: Haoyue Xu <xuhaoyue1@hisilicon.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
After the hns roce driver is loaded, if you modify the mac address of the
network port, the following error will appear:
__ib_cache_gid_add: unable to add gid fe80:0000:0000:0000:4600:4dff:fe22:abb5 error=-28
hns3 0000:7d:00.0 hns_0: attr path_mtu(1) invalid while modify qp
The reason for the error is that the gid being occupied will cause the
failure to modify the gid. The gid is occupied by the loopback QP used by
free mr. When the mac address is modified, the gid will change. If there
is a busy QP at this time, the gid will not be released and the
modification will fail. The QP of free mr is created using the ib
interface. The ib interface will add a reference count to the gid,
resulting in this error scenario.
Considering that free mr is solving a bug in HIP08, not an actual
business, it is not necessary to use ib interfaces.
Fixes: 70f9252158 ("RDMA/hns: Use the reserved loopback QPs to free MR before destroying MPT")
Link: https://lore.kernel.org/r/20221126102911.2921820-2-xuhaoyue1@hisilicon.com
Signed-off-by: Yixing Liu <liuyixing1@huawei.com>
Signed-off-by: Haoyue Xu <xuhaoyue1@hisilicon.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>