linux

mirror of https://github.com/torvalds/linux.git synced 2026-04-27 19:12:29 -04:00

Author	SHA1	Message	Date
Jason Gunthorpe	faa72102b1	RDMA/ionic: Fix kernel stack leak in ionic_create_cq() struct ionic_cq_resp resp { __u32 cqid[2]; // offset 0 - PARTIALLY SET (see below) __u8 udma_mask; // offset 8 - SET (resp.udma_mask = vcq->udma_mask) __u8 rsvd[7]; // offset 9 - NEVER SET <- LEAK }; rsvd[7]: 7 bytes of stack memory leaked unconditionally. cqid[2]: The loop at line 1256 iterates over udma_idx but skips indices where !(vcq->udma_mask & BIT(udma_idx)). The array has 2 entries but udma_count could be 1, meaning cqid[1] might never be written via ionic_create_cq_common(). If udma_mask only has bit 0 set, cqid[1] (4 bytes) is also leaked. So potentially 11 bytes leaked. Cc: stable@vger.kernel.org Fixes: `e8521822c7` ("RDMA/ionic: Register device ops for control path") Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://patch.msgid.link/4-v1-83e918d69e73+a9-rdma_udata_rc_jgg@nvidia.com Acked-by: Abhijit Gangurde <abhijit.gangurde@amd.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2026-02-24 05:03:15 -05:00
Jason Gunthorpe	74586c6da9	RDMA/irdma: Fix kernel stack leak in irdma_create_user_ah() struct irdma_create_ah_resp { // 8 bytes, no padding __u32 ah_id; // offset 0 - SET (uresp.ah_id = ah->sc_ah.ah_info.ah_idx) __u8 rsvd[4]; // offset 4 - NEVER SET <- LEAK }; rsvd[4]: 4 bytes of stack memory leaked unconditionally. Only ah_id is assigned before ib_respond_udata(). The reserved members of the structure were not zeroed. Cc: stable@vger.kernel.org Fixes: `b48c24c2d7` ("RDMA/irdma: Implement device supported verb APIs") Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://patch.msgid.link/3-v1-83e918d69e73+a9-rdma_udata_rc_jgg@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2026-02-24 05:03:15 -05:00
Jason Gunthorpe	117942ca43	IB/mthca: Add missed mthca_unmap_user_db() for mthca_create_srq() Fix a user triggerable leak on the system call failure path. Cc: stable@vger.kernel.org Fixes: `ec34a922d2` ("[PATCH] IB/mthca: Add SRQ implementation") Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://patch.msgid.link/2-v1-83e918d69e73+a9-rdma_udata_rc_jgg@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2026-02-24 05:03:15 -05:00
Jason Gunthorpe	f22c77ce49	RDMA/efa: Fix typo in efa_alloc_mr() The pattern is to check the entire driver request space, not just sizeof something unrelated. Fixes: `40909f664d` ("RDMA/efa: Add EFA verbs implementation") Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://patch.msgid.link/1-v1-83e918d69e73+a9-rdma_udata_rc_jgg@nvidia.com Acked-by: Michael Margolin <mrgolin@amazon.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2026-02-24 05:01:32 -05:00
Kamal Heib	fd80bd7105	RDMA/ionic: Fix potential NULL pointer dereference in ionic_query_port The function ionic_query_port() calls ib_device_get_netdev() without checking the return value which could lead to NULL pointer dereference, Fix it by checking the return value and return -ENODEV if the 'ndev' is NULL. Fixes: `2075bbe8ef` ("RDMA/ionic: Register device ops for miscellaneous functionality") Signed-off-by: Kamal Heib <kheib@redhat.com> Link: https://patch.msgid.link/20260220222125.16973-2-kheib@redhat.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2026-02-24 04:54:10 -05:00
Siva Reddy Kallam	3d2e5d12a2	RDMA/bng_re: Unwind bng_re_dev_init properly Fix below smatch warning: drivers/infiniband/hw/bng_re/bng_dev.c:270 bng_re_dev_init() warn: missing unwind goto? Current bng_re_dev_init function is not having clear unwinding code. So, added proper unwinding with ladder. Fixes: `4f830cd8d7` ("RDMA/bng_re: Add infrastructure for enabling Firmware channel") Reported-by: Simon Horman <horms@kernel.org> Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <error27@gmail.com> Closes: https://lore.kernel.org/r/202601010413.sWadrQel-lkp@intel.com/ Signed-off-by: Siva Reddy Kallam <siva.kallam@broadcom.com> Link: https://patch.msgid.link/20260218091246.1764808-3-siva.kallam@broadcom.com Signed-off-by: Leon Romanovsky <leonro@nvidia.com>	2026-02-24 03:56:28 -05:00
Siva Reddy Kallam	7a23af417d	RDMA/bng_re: Remove unnessary validity checks Fix below smatch warning: drivers/infiniband/hw/bng_re/bng_dev.c:113 bng_re_net_ring_free() warn: variable dereferenced before check 'rdev' (see line 107) current driver has unnessary validity checks. So, removing these unnessary validity checks. Fixes: `4f830cd8d7` ("RDMA/bng_re: Add infrastructure for enabling Firmware channel") Fixes: `745065770c` ("RDMA/bng_re: Register and get the resources from bnge driver") Fixes: `04e031ff6e` ("RDMA/bng_re: Initialize the Firmware and Hardware") Fixes: `d0da769c19` ("RDMA/bng_re: Add Auxiliary interface") Reported-by: Simon Horman <horms@kernel.org> Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <error27@gmail.com> Closes: https://lore.kernel.org/r/202601010413.sWadrQel-lkp@intel.com/ Signed-off-by: Siva Reddy Kallam <siva.kallam@broadcom.com> Link: https://patch.msgid.link/20260218091246.1764808-2-siva.kallam@broadcom.com Signed-off-by: Leon Romanovsky <leonro@nvidia.com>	2026-02-24 03:53:24 -05:00
Kees Cook	189f164e57	Convert remaining multi-line kmalloc_obj/flex GFP_KERNEL uses Conversion performed via this Coccinelle script: // SPDX-License-Identifier: GPL-2.0-only // Options: --include-headers-for-types --all-includes --include-headers --keep-comments virtual patch @gfp depends on patch && !(file in "tools") && !(file in "samples")@ identifier ALLOC = {kmalloc_obj,kmalloc_objs,kmalloc_flex, kzalloc_obj,kzalloc_objs,kzalloc_flex, kvmalloc_obj,kvmalloc_objs,kvmalloc_flex, kvzalloc_obj,kvzalloc_objs,kvzalloc_flex}; @@ ALLOC(... - , GFP_KERNEL ) $ make coccicheck MODE=patch COCCI=gfp.cocci Build and boot tested x86_64 with Fedora 42's GCC and Clang: Linux version 6.19.0+ (user@host) (gcc (GCC) 15.2.1 20260123 (Red Hat 15.2.1-7), GNU ld version 2.44-12.fc42) #1 SMP PREEMPT_DYNAMIC 1970-01-01 Linux version 6.19.0+ (user@host) (clang version 20.1.8 (Fedora 20.1.8-4.fc42), LLD 20.1.8) #1 SMP PREEMPT_DYNAMIC 1970-01-01 Signed-off-by: Kees Cook <kees@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2026-02-22 08:26:33 -08:00
Linus Torvalds	32a92f8c89	Convert more 'alloc_obj' cases to default GFP_KERNEL arguments This converts some of the visually simpler cases that have been split over multiple lines. I only did the ones that are easy to verify the resulting diff by having just that final GFP_KERNEL argument on the next line. Somebody should probably do a proper coccinelle script for this, but for me the trivial script actually resulted in an assertion failure in the middle of the script. I probably had made it a bit _too_ trivial. So after fighting that far a while I decided to just do some of the syntactically simpler cases with variations of the previous 'sed' scripts. The more syntactically complex multi-line cases would mostly really want whitespace cleanup anyway. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2026-02-21 20:03:00 -08:00
Linus Torvalds	323bbfcf1e	Convert 'alloc_flex' family to use the new default GFP_KERNEL argument This is the exact same thing as the 'alloc_obj()' version, only much smaller because there are a lot fewer users of the alloc_flex() interface. As with alloc_obj() version, this was done entirely with mindless brute force, using the same script, except using 'flex' in the pattern rather than 'objs'. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2026-02-21 17:09:51 -08:00
Linus Torvalds	bf4afc53b7	Convert 'alloc_obj' family to use the new default GFP_KERNEL argument This was done entirely with mindless brute force, using git grep -l '\<k[vmz]alloc_objs(., GFP_KERNEL)' \| xargs sed -i 's/$alloc_objs(.*$, GFP_KERNEL)/\1)/' to convert the new alloc_obj() users that had a simple GFP_KERNEL argument to just drop that argument. Note that due to the extreme simplicity of the scripting, any slightly more complex cases spread over multiple lines would not be triggered: they definitely exist, but this covers the vast bulk of the cases, and the resulting diff is also then easier to check automatically. For the same reason the 'flex' versions will be done as a separate conversion. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2026-02-21 17:09:51 -08:00
Kees Cook	69050f8d6d	treewide: Replace kmalloc with kmalloc_obj for non-scalar types This is the result of running the Coccinelle script from scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to avoid scalar types (which need careful case-by-case checking), and instead replace kmalloc-family calls that allocate struct or union object instances: Single allocations: kmalloc(sizeof(TYPE), ...) are replaced with: kmalloc_obj(TYPE, ...) Array allocations: kmalloc_array(COUNT, sizeof(TYPE), ...) are replaced with: kmalloc_objs(TYPE, COUNT, ...) Flex array allocations: kmalloc(struct_size(PTR, FAM, COUNT), ...) are replaced with: kmalloc_flex(PTR, FAM, COUNT, ...) (where TYPE may also be VAR) The resulting allocations no longer return "void ", instead returning "TYPE ". Signed-off-by: Kees Cook <kees@kernel.org>	2026-02-21 01:02:28 -08:00
Linus Torvalds	311aa68319	Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma Pull rdma updates from Jason Gunthorpe: "Usual smallish cycle. The NFS biovec work to push it down into RDMA instead of indirecting through a scatterlist is pretty nice to see, been talked about for a long time now. - Various code improvements in irdma, rtrs, qedr, ocrdma, irdma, rxe - Small driver improvements and minor bug fixes to hns, mlx5, rxe, mana, mlx5, irdma - Robusness improvements in completion processing for EFA - New query_port_speed() verb to move past limited IBA defined speed steps - Support for SG_GAPS in rts and many other small improvements - Rare list corruption fix in iwcm - Better support different page sizes in rxe - Device memory support for mana - Direct bio vec to kernel MR for use by NFS-RDMA - QP rate limiting for bnxt_re - Remote triggerable NULL pointer crash in siw - DMA-buf exporter support for RDMA mmaps like doorbells" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (66 commits) RDMA/mlx5: Implement DMABUF export ops RDMA/uverbs: Add DMABUF object type and operations RDMA/uverbs: Support external FD uobjects RDMA/siw: Fix potential NULL pointer dereference in header processing RDMA/umad: Reject negative data_len in ib_umad_write IB/core: Extend rate limit support for RC QPs RDMA/mlx5: Support rate limit only for Raw Packet QP RDMA/bnxt_re: Report QP rate limit in debugfs RDMA/bnxt_re: Report packet pacing capabilities when querying device RDMA/bnxt_re: Add support for QP rate limiting MAINTAINERS: Drop RDMA files from Hyper-V section RDMA/uverbs: Add __GFP_NOWARN to ib_uverbs_unmarshall_recv() kmalloc svcrdma: use bvec-based RDMA read/write API RDMA/core: add rdma_rw_max_sge() helper for SQ sizing RDMA/core: add MR support for bvec-based RDMA operations RDMA/core: use IOVA-based DMA mapping for bvec RDMA operations RDMA/core: add bio_vec based RDMA read/write API RDMA/irdma: Use kvzalloc for paged memory DMA address array RDMA/rxe: Fix race condition in QP timer handlers RDMA/mana_ib: Add device‑memory support ...	2026-02-12 17:05:20 -08:00
Vikas Gupta	42d1c54d62	bnge/bng_re: Add a new HSI The HSI is shared between the firmware and the driver and is automatically generated. Add a new HSI for the BNGE driver. The current HSI refers to BNXT, which will become incompatible with ThorUltra devices as the BNGE driver adds more features. The BNGE driver will not use the HSI located in the bnxt folder. Also, add an HSI for ThorUltra RoCE driver. Changes in v3: - Fix in bng_roce_hsi.h reported by Jakub (AI review) https://lore.kernel.org/netdev/20260207051422.4181717-1-kuba@kernel.org/ - Add an entry in MAINTAINERS Signed-off-by: Vikas Gupta <vikas.gupta@broadcom.com> Signed-off-by: Siva Reddy Kallam <siva.kallam@broadcom.com> Reviewed-by: Bhargava Chenna Marreddy <bhargava.marreddy@broadcom.com> Link: https://patch.msgid.link/20260208172925.1861255-1-vikas.gupta@broadcom.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-02-11 13:44:47 +01:00
Yishai Hadas	d6c58f4eb3	RDMA/mlx5: Implement DMABUF export ops Enable p2pdma on the mlx5 PCI device to allow DMABUF-based peer-to-peer DMA mappings. Add implementation of the mmap_get_pfns and pgoff_to_mmap_entry device operations required for DMABUF support in the mlx5 RDMA driver. The pgoff_to_mmap_entry operation converts a page offset to the corresponding rdma_user_mmap_entry by extracting the command and index from the offset and looking it up in the ucontext's mmap_xa. The mmap_get_pfns operation retrieves the physical address and length from the mmap entry and obtains the p2pdma provider for the underlying PCI device, which is needed for peer-to-peer DMA operations with DMABUFs. Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Signed-off-by: Edward Srouji <edwards@nvidia.com> Link: https://patch.msgid.link/20260201-dmabuf-export-v3-3-da238b614fe3@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2026-02-08 23:50:41 -05:00
Yael Chemla	215b53099b	net/mlx5: Fix 1600G link mode enum naming Rename TAUI/TBASE to GAUI/GBASE in 1600G link mode identifier and its usage in ethtool and link-info tables. Reported-by: Dawid Osuchowski <dawid.osuchowski@linux.intel.com> Signed-off-by: Yael Chemla <ychemla@nvidia.com> Reviewed-by: Shahar Shitrit <shshitrit@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Dawid Osuchowski <dawid.osuchowski@linux.intel.com> Reported-by: Dawid Osuchowski <dawid.osuchowski@linux.intel.com> Signed-off-by: Yael Chemla <ychemla@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Link: https://patch.msgid.link/20260204194324.1723534-1-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-02-05 18:29:04 -08:00
Kalesh AP	cae42d97d9	RDMA/mlx5: Support rate limit only for Raw Packet QP mlx5 based hardware supports rate limiting only on Raw ethernet QPs. Added an explicit check to fail the operation on any other QP types. The rate limit support has been enahanced in the stack for RC QPs too. Compile tested only. CC: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Link: https://patch.msgid.link/20260202133413.3182578-5-kalesh-anakkur.purayil@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2026-02-02 08:37:59 -05:00
Kalesh AP	949e7c062d	RDMA/bnxt_re: Report QP rate limit in debugfs Update QP info debugfs hook to report the rate limit applied on the QP. 0 means unlimited. Signed-off-by: Damodharam Ammepalli <damodharam.ammepalli@broadcom.com> Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Link: https://patch.msgid.link/20260202133413.3182578-4-kalesh-anakkur.purayil@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2026-02-02 08:37:59 -05:00
Kalesh AP	13edc7d4e0	RDMA/bnxt_re: Report packet pacing capabilities when querying device Enable the support to report packet pacing capabilities from kernel to user space. Packet pacing allows to limit the rate to any number between the maximum and minimum. The capabilities are exposed to user space through query_device. The following capabilities are reported: 1. The maximum and minimum rate limit in kbps. 2. Bitmap showing which QP types support rate limit. Signed-off-by: Damodharam Ammepalli <damodharam.ammepalli@broadcom.com> Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Link: https://patch.msgid.link/20260202133413.3182578-3-kalesh-anakkur.purayil@broadcom.com Reviewed-by: Anantha Prabhu <anantha.prabhu@broadcom.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2026-02-02 08:37:59 -05:00
Kalesh AP	e72d45d274	RDMA/bnxt_re: Add support for QP rate limiting Broadcom P7 chips supports applying rate limit to RC QPs. It allows adjust shaper rate values during the INIT -> RTR, RTR -> RTS, RTS -> RTS state changes or after QP transitions to RTR or RTS. Signed-off-by: Damodharam Ammepalli <damodharam.ammepalli@broadcom.com> Reviewed-by: Hongguang Gao <hongguang.gao@broadcom.com> Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Link: https://patch.msgid.link/20260202133413.3182578-2-kalesh-anakkur.purayil@broadcom.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2026-02-02 08:37:59 -05:00
Carlos Bilbao	959d2c356e	RDMA/irdma: Use kvzalloc for paged memory DMA address array Allocate array chunk->dmainfo.dmaaddrs using kvzalloc() to allow the allocation to fall back to vmalloc when contiguous memory is unavailable (instead of failing and logging page allocation warnings). Acked-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com> Signed-off-by: Carlos Bilbao (Lambda) <carlos.bilbao@kernel.org> Link: https://patch.msgid.link/20260128014446.405247-1-carlos.bilbao@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>	2026-01-28 05:44:04 -05:00
Konstantin Taranov	a01745ccf7	RDMA/mana_ib: Add device‑memory support Introduce a basic DM implementation that enables creating and registering device memory, and using the associated memory keys for networking operations. Signed-off-by: Konstantin Taranov <kotaranov@microsoft.com> Link: https://patch.msgid.link/20260127082649.429018-1-kotaranov@linux.microsoft.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2026-01-27 09:16:11 -05:00
Zilin Guan	9b9d253908	RDMA/mlx5: Fix memory leak in GET_DATA_DIRECT_SYSFS_PATH handler The UVERBS_HANDLER(MLX5_IB_METHOD_GET_DATA_DIRECT_SYSFS_PATH) function allocates memory for the device path using kobject_get_path(). If the length of the device path exceeds the output buffer length, the function returns -ENOSPC but does not free the allocated memory, resulting in a memory leak. Add a kfree() call to the error path to ensure the allocated memory is properly freed. Compile tested only. Issue found using a prototype static analysis tool and code review. Fixes: `ec7ad65309` ("RDMA/mlx5: Introduce GET_DATA_DIRECT_SYSFS_PATH ioctl") Signed-off-by: Zilin Guan <zilin@seu.edu.cn> Link: https://patch.msgid.link/20260126074801.627898-1-zilin@seu.edu.cn Signed-off-by: Leon Romanovsky <leon@kernel.org>	2026-01-27 07:04:18 -05:00
Jacob Moroni	2529aead51	RDMA/irdma: Use CQ ID for CEQE context The hardware allows for an opaque CQ context field to be carried over into CEQEs for the CQ. Previously, a pointer to the CQ was used for this context. In the normal CQ destroy flow, the CEQ ring is scrubbed to remove any preexisting CEQEs for the CQ that may not have been processed yet so that the CQ structure is not dereferenced in the CEQ ISR after the CQ has been freed. However, in some cases, it is possible for a CEQE to be in flight in HW even after the CQ destroy command completion is received, so it could be missed during the scrub. To protect against this, we can take advantage of the CQ table that already exists and use the CQ ID for this context rather than a CQ pointer. Signed-off-by: Jacob Moroni <jmoroni@google.com> Link: https://patch.msgid.link/20260120212546.1893076-2-jmoroni@google.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2026-01-25 08:54:20 -05:00
Jacob Moroni	2b7c2ba130	RDMA/irdma: Add enum defs for reserved CQs/QPs Added definitions for the special reserved CQs and QPs. Signed-off-by: Jacob Moroni <jmoroni@google.com> Link: https://patch.msgid.link/20260120212546.1893076-1-jmoroni@google.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2026-01-25 08:54:20 -05:00
Or Har-Toov	18ea78e2ae	IB/mlx5: Fix port speed query for representors When querying speed information for a representor in switchdev mode, the code previously used the first device in the eswitch, which may not match the device that actually owns the representor. In setups such as multi-port eswitch or LAG, this led to incorrect port attributes being reported. Fix this by retrieving the correct core device from the representor's eswitch before querying its port attributes. Fixes: `27f9e0ccb6` ("net/mlx5: Lag, Add single RDMA device in multiport mode") Signed-off-by: Or Har-Toov <ohartoov@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Edward Srouji <edwards@nvidia.com> Link: https://patch.msgid.link/20260115-port-speed-query-fix-v2-1-3bde6a3c78e7@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2026-01-18 11:36:59 -05:00
Chiara Meiohas	ebc2164a4c	RDMA/mlx5: Fix UMR hang in LAG error state unload During firmware reset in LAG mode, a race condition causes the driver to hang indefinitely while waiting for UMR completion during device unload. See [1]. In LAG mode the bond device is only registered on the master, so it never sees sys_error events from the slave. During firmware reset this causes UMR waits to hang forever on unload as the slave is dead but the master hasn't entered error state yet, so UMR posts succeed but completions never arrive. Fix this by adding a sys_error notifier that gets registered before MLX5_IB_STAGE_IB_REG and stays alive until after ib_unregister_device(). This ensures error events reach the bond device throughout teardown. [1] Call Trace: __schedule+0x2bd/0x760 schedule+0x37/0xa0 schedule_preempt_disabled+0xa/0x10 __mutex_lock.isra.6+0x2b5/0x4a0 __mlx5_ib_dereg_mr+0x606/0x870 [mlx5_ib] ? __xa_erase+0x4a/0xa0 ? _cond_resched+0x15/0x30 ? wait_for_completion+0x31/0x100 ib_dereg_mr_user+0x48/0xc0 [ib_core] ? rdmacg_uncharge_hierarchy+0xa0/0x100 destroy_hw_idr_uobject+0x20/0x50 [ib_uverbs] uverbs_destroy_uobject+0x37/0x150 [ib_uverbs] __uverbs_cleanup_ufile+0xda/0x140 [ib_uverbs] uverbs_destroy_ufile_hw+0x3a/0xf0 [ib_uverbs] ib_uverbs_remove_one+0xc3/0x140 [ib_uverbs] remove_client_context+0x8b/0xd0 [ib_core] disable_device+0x8c/0x130 [ib_core] __ib_unregister_device+0x10d/0x180 [ib_core] ib_unregister_device+0x21/0x30 [ib_core] __mlx5_ib_remove+0x1e4/0x1f0 [mlx5_ib] auxiliary_bus_remove+0x1e/0x30 device_release_driver_internal+0x103/0x1f0 bus_remove_device+0xf7/0x170 device_del+0x181/0x410 mlx5_rescan_drivers_locked.part.10+0xa9/0x1d0 [mlx5_core] mlx5_disable_lag+0x253/0x260 [mlx5_core] mlx5_lag_disable_change+0x89/0xc0 [mlx5_core] mlx5_eswitch_disable+0x67/0xa0 [mlx5_core] mlx5_unload+0x15/0xd0 [mlx5_core] mlx5_unload_one+0x71/0xc0 [mlx5_core] mlx5_sync_reset_reload_work+0x83/0x100 [mlx5_core] process_one_work+0x1a7/0x360 worker_thread+0x30/0x390 ? create_worker+0x1a0/0x1a0 kthread+0x116/0x130 ? kthread_flush_work_fn+0x10/0x10 ret_from_fork+0x22/0x40 Fixes: `ede132a5cf` ("RDMA/mlx5: Move events notifier registration to be after device registration") Signed-off-by: Chiara Meiohas <cmeiohas@nvidia.com> Signed-off-by: Maher Sanalla <msanalla@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Edward Srouji <edwards@nvidia.com> Link: https://patch.msgid.link/20260113-umr-hand-lag-fix-v1-1-3dc476e00cd9@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2026-01-18 11:04:07 -05:00
Konstantin Taranov	f972bde732	RDMA/mana_ib: Take CQ type from the device type Get CQ type from the used gdma device. The MANA_IB_CREATE_RNIC_CQ flag is ignored. It was used in older kernel versions where the mana_ib was shared between ethernet and rnic. Fixes: `d4293f96ce` ("RDMA/mana_ib: unify mana_ib functions to support any gdma device") Signed-off-by: Konstantin Taranov <kotaranov@microsoft.com> Link: https://patch.msgid.link/20260115093625.177306-1-kotaranov@linux.microsoft.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2026-01-15 06:03:22 -05:00
Chengchang Tang	354e7a6d44	RDMA/hns: Support drain SQ and RQ Some ULPs, e.g. rpcrdma, rely on drain_qp() to ensure all outstanding requests are completed before releasing related memory. If drain_qp() fails, ULPs may release memory directly, and in-flight WRs may later be flushed after the memory is freed, potentially leading to UAF. drain_qp() failures can happen when HW enters an error state or is reset. Add support to drain SQ and RQ in such cases by posting a fake WR during reset, so the driver can process all remaining WRs in sequence and generate corresponding completions. Always invoke comp_handler() in drain process to ensure completions are not lost under concurrency (e.g. concurrent post_send() and reset, or QPs created during reset). If the CQ is already processed, cancel any already scheduled comp_handler() to avoid concurrency issues. Signed-off-by: Chengchang Tang <tangchengchang@huawei.com> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://patch.msgid.link/20260108113032.856306-1-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2026-01-15 04:59:53 -05:00
Jacob Moroni	5c3f795d17	RDMA/irdma: Remove fixed 1 ms delay during AH wait loop The AH CQP command wait loop executes in an atomic context and was using a fixed 1 ms delay. Since many AH create commands can complete much faster than 1 ms, use poll_timeout_us_atomic with a 1 us delay. Also, use the timeout value indicated during the capability exchange rather than a hard-coded value. Signed-off-by: Jacob Moroni <jmoroni@google.com> Link: https://patch.msgid.link/20260105180550.2907858-1-jmoroni@google.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2026-01-13 08:19:11 -05:00
Jacob Moroni	52f3d34c29	RDMA/irdma: Remove redundant dma_wmb() before writel() A dma_wmb() is not necessary before a writel() because writel() already has an even stronger store barrier. A dma_wmb() is only required to order writes to consistent/DMA memory whereas the barrier in writel() is specified to order writes to DMA memory as well as MMIO. Signed-off-by: Jacob Moroni <jmoroni@google.com> Link: https://patch.msgid.link/20260103172517.2088895-1-jmoroni@google.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2026-01-13 08:01:37 -05:00
Michael Chan	fdb573d675	bnxt_en: Update FW interface to 1.10.3.151 The main changes are the new HWRM_PORT_PHY_FDRSTAT command to collect FEC histogram bins and the new HWRM_NVM_DEFRAG command to defragment the NVRAM. There is also a minor name change in struct hwrm_vnic_cfg_input that requires updating the bnxt_re driver's main.c. Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20260108183521.215610-2-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-01-10 15:19:50 -08:00
Leon Romanovsky	325e3b5431	RDMA/ocrdma: Remove unused OCRDMA_UVERBS definition The OCRDMA_UVERBS() macro is unused, so remove it to clean up the code. Link: https://patch.msgid.link/20260104-ib-core-misc-v1-6-00367f77f3a8@nvidia.com Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>	2026-01-05 04:04:01 -05:00
Leon Romanovsky	cc016ebeb1	RDMA/qedr: Remove unused defines Perform basic cleanup by removing unused defines from qedr.h. Link: https://patch.msgid.link/20260104-ib-core-misc-v1-5-00367f77f3a8@nvidia.com Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>	2026-01-05 04:03:46 -05:00
Leon Romanovsky	522a5c1c56	RDMA/mlx5: Avoid direct access to DMA device pointer The dma_device field is marked as internal and must not be accessed by drivers or ULPs. Remove all direct mlx5 references to this field. Link: https://patch.msgid.link/20260104-ib-core-misc-v1-4-00367f77f3a8@nvidia.com Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>	2026-01-05 04:02:51 -05:00
Maher Sanalla	6dc78c53de	RDMA/mlx5: Fix ucaps init error flow In mlx5_ib_stage_caps_init(), if mlx5_ib_init_ucaps() fails after mlx5_ib_init_var_table() succeeds, the VAR bitmap is leaked since the function returns without cleanup. Thus, cleanup the var table bitmap in case of error of initializing ucaps before exiting, preventing the leak above. Fixes: `cf7174e898` ("RDMA/mlx5: Create UCAP char devices for supported device capabilities") Signed-off-by: Maher Sanalla <msanalla@nvidia.com> Reviewed-by: Yishai Hadas <yishaih@nvidia.com> Link: https://patch.msgid.link/20260104-ib-core-misc-v1-3-00367f77f3a8@nvidia.com Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2026-01-05 04:02:40 -05:00
Or Har-Toov	aaecff5e13	RDMA/mlx5: Implement query_port_speed callback Implement the query_port_speed callback for mlx5 driver to support querying effective port bandwidth. For LAG configurations, query the aggregated speed from the LAG layer or from the modified vport max_tx_speed. Signed-off-by: Or Har-Toov <ohartoov@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Edward Srouji <edwards@nvidia.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2026-01-05 02:43:17 -05:00
Or Har-Toov	3fd984d5cd	RDMA/mlx5: Raise async event on device speed change Raise IB_EVENT_DEVICE_SPEED_CHANGE whenever the speed of one of the device's ports changes. Usually all ports of the device changes together. This ensures user applications and upper-layer software are immediately notified when bandwidth changes, improving traffic management in dynamic environments. This is especially useful for vports which are part of a LAG configuration, to know if the effective speed of the LAG was changed. Signed-off-by: Or Har-Toov <ohartoov@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Edward Srouji <edwards@nvidia.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2026-01-05 02:43:12 -05:00
Chengchang Tang	0789f92990	RDMA/hns: Notify ULP of remaining soft-WCs during reset During a reset, software-generated WCs cannot be reported via interrupts. This may cause the ULP to miss some WCs. To avoid this, add check in the CQ arm process: if a hardware reset has occurred and there are still unreported soft-WCs, notify the ULP to handle the remaining WCs, thereby preventing any loss of completions. Fixes: `626903e935` ("RDMA/hns: Add support for reporting wc as software mode") Signed-off-by: Chengchang Tang <tangchengchang@huawei.com> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://patch.msgid.link/20260104064057.1582216-5-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2026-01-04 10:09:51 -05:00
Junxian Huang	84bd5d60f0	RDMA/hns: Fix RoCEv1 failure due to DSCP DSCP is not supported in RoCEv1, but get_dscp() is still called. If get_dscp() returns an error, it'll eventually cause create_ah to fail even when using RoCEv1. Correct the return value and avoid calling get_dscp() when using RoCEv1. Fixes: `ee20cc17e9` ("RDMA/hns: Support DSCP") Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://patch.msgid.link/20260104064057.1582216-4-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2026-01-04 10:09:51 -05:00
Junxian Huang	8cda8acbb1	RDMA/hns: Return actual error code instead of fixed EINVAL query_cqc() and query_mpt() may return various error codes in different cases. Return actual error code instead of fixed EINVAL. Fixes: `f2b070f36d` ("RDMA/hns: Support CQ's restrack raw ops for hns driver") Fixes: `3d67e7e236` ("RDMA/hns: Support MR's restrack raw ops for hns driver") Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://patch.msgid.link/20260104064057.1582216-3-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2026-01-04 10:09:51 -05:00
Chengchang Tang	c0a26bbd3f	RDMA/hns: Fix WQ_MEM_RECLAIM warning When sunrpc is used, if a reset triggered, our wq may lead the following trace: workqueue: WQ_MEM_RECLAIM xprtiod:xprt_rdma_connect_worker [rpcrdma] is flushing !WQ_MEM_RECLAIM hns_roce_irq_workq:flush_work_handle [hns_roce_hw_v2] WARNING: CPU: 0 PID: 8250 at kernel/workqueue.c:2644 check_flush_dependency+0xe0/0x144 Call trace: check_flush_dependency+0xe0/0x144 start_flush_work.constprop.0+0x1d0/0x2f0 __flush_work.isra.0+0x40/0xb0 flush_work+0x14/0x30 hns_roce_v2_destroy_qp+0xac/0x1e0 [hns_roce_hw_v2] ib_destroy_qp_user+0x9c/0x2b4 rdma_destroy_qp+0x34/0xb0 rpcrdma_ep_destroy+0x28/0xcc [rpcrdma] rpcrdma_ep_put+0x74/0xb4 [rpcrdma] rpcrdma_xprt_disconnect+0x1d8/0x260 [rpcrdma] xprt_rdma_connect_worker+0xc0/0x120 [rpcrdma] process_one_work+0x1cc/0x4d0 worker_thread+0x154/0x414 kthread+0x104/0x144 ret_from_fork+0x10/0x18 Since QP destruction frees memory, this wq should have the WQ_MEM_RECLAIM. Fixes: `ffd541d457` ("RDMA/hns: Add the workqueue framework for flush cqe handler") Signed-off-by: Chengchang Tang <tangchengchang@huawei.com> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://patch.msgid.link/20260104064057.1582216-2-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2026-01-04 10:09:51 -05:00
Lianfa Weng	8818ffb04b	RDMA/hns: Introduce limit_bank mode with better performance In limit_bank mode, QPs/CQs are restricted to using half of the banks. HW concentrates resources on these banks, thereby improving performance compared to the default mode. Switch between limit_bank mode and default mode by setting the cap flag in FW. Since the number of QPs and CQs will be halved, this mode is suitable for scenarios where fewer QPs and CQs are required. Signed-off-by: Lianfa Weng <wenglianfa@huawei.com> Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com> Link: https://patch.msgid.link/20251230154911.3397584-1-huangjunxian6@hisilicon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2025-12-31 05:33:34 -05:00
Thomas Fourier	fcd431a962	RDMA/bnxt_re: fix dma_free_coherent() pointer The dma_alloc_coherent() allocates a dma-mapped buffer, pbl->pg_arr[i]. The dma_free_coherent() should pass the same buffer to dma_free_coherent() and not page-aligned. Fixes: `1ac5a40479` ("RDMA/bnxt_re: Add bnxt_re RoCE driver") Signed-off-by: Thomas Fourier <fourier.thomas@gmail.com> Link: https://patch.msgid.link/20251230085121.8023-2-fourier.thomas@gmail.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2025-12-30 06:45:51 -05:00
Kalesh AP	3d70e0fb0f	RDMA/bnxt_re: Fix to use correct page size for PDE table In bnxt_qplib_alloc_init_hwq(), while allocating memory for PDE table driver incorrectly is using the "pg_size" value passed to the function. Fixed to use the right value 4K. Also, fixed the allocation size for PBL table. Fixes: `0c4dcd6028` ("RDMA/bnxt_re: Refactor hardware queue memory allocation") Signed-off-by: Damodharam Ammepalli <damodharam.ammepalli@broadcom.com> Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Link: https://patch.msgid.link/20251223131855.145955-1-kalesh-anakkur.purayil@broadcom.com Reviewed-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2025-12-23 09:23:22 -05:00
Ding Hui	9b68a1cc96	RDMA/bnxt_re: Fix OOB write in bnxt_re_copy_err_stats() Commit `ef56081d18` ("RDMA/bnxt_re: RoCE related hardware counters update") added three new counters and placed them after BNXT_RE_OUT_OF_SEQ_ERR. BNXT_RE_OUT_OF_SEQ_ERR acts as a boundary marker for allocating hardware statistics with different num_counters values on chip_gen_p5_p7 devices. As a result, BNXT_RE_NUM_STD_COUNTERS are used when allocating hw_stats, which leads to an out-of-bounds write in bnxt_re_copy_err_stats(). The counters BNXT_RE_REQ_CQE_ERROR, BNXT_RE_RESP_CQE_ERROR, and BNXT_RE_RESP_REMOTE_ACCESS_ERRS are applicable to generic hardware, not only p5/p7 devices. Fix this by moving these counters before BNXT_RE_OUT_OF_SEQ_ERR so they are included in the generic counter set. Fixes: `ef56081d18` ("RDMA/bnxt_re: RoCE related hardware counters update") Reported-by: Yingying Zheng <zhengyingying@sangfor.com.cn> Signed-off-by: Ding Hui <dinghui@sangfor.com.cn> Link: https://patch.msgid.link/20251208072110.28874-1-dinghui@sangfor.com.cn Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Tested-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2025-12-22 03:56:35 -05:00
Alok Tiwari	f01765a236	RDMA/bnxt_re: Fix IB_SEND_IP_CSUM handling in post_send The bnxt_re SEND path checks wr->send_flags to enable features such as IP checksum offload. However, send_flags is a bitmask and may contain multiple flags (e.g. IB_SEND_SIGNALED \| IB_SEND_IP_CSUM), while the existing code uses a switch() statement that only matches when send_flags is exactly IB_SEND_IP_CSUM. As a result, checksum offload is not enabled when additional SEND flags are present. Replace the switch() with a bitmask test: if (wr->send_flags & IB_SEND_IP_CSUM) This ensures IP checksum offload is enabled correctly when multiple SEND flags are used. Fixes: `1ac5a40479` ("RDMA/bnxt_re: Add bnxt_re RoCE driver") Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com> Link: https://patch.msgid.link/20251219093308.2415620-1-alok.a.tiwari@oracle.com Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2025-12-22 03:02:26 -05:00
Alok Tiwari	145a417a39	RDMA/bnxt_re: Fix incorrect BAR check in bnxt_qplib_map_creq_db() RCFW_COMM_CONS_PCI_BAR_REGION is defined as BAR 2, so checking !creq_db->reg.bar_id is incorrect and always false. pci_resource_start() returns the BAR base address, and a value of 0 indicates that the BAR is unassigned. Update the condition to test bar_base == 0 instead. This ensures the driver detects and logs an error for an unassigned RCFW communication BAR. Fixes: `cee0c7bba4` ("RDMA/bnxt_re: Refactor command queue management code") Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com> Link: https://patch.msgid.link/20251217100158.752504-1-alok.a.tiwari@oracle.com Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2025-12-21 04:32:47 -05:00
Yonatan Nachum	dab5825491	RDMA/efa: Improve admin completion context state machine Add a new unused state to the admin completion contexts state machine instead of the occupied field. This improves the completion validity check because it now enforce the context to be in submitted state prior to completing it. Also add allocated state as a intermediate state between unused and submitted. Reviewed-by: Daniel Kranzdorf <dkkranzd@amazon.com> Reviewed-by: Michael Margolin <mrgolin@amazon.com> Signed-off-by: Yonatan Nachum <ynachum@amazon.com> Link: https://patch.msgid.link/20251210130614.36460-3-ynachum@amazon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2025-12-18 10:12:38 -05:00
Yonatan Nachum	4b01ec0f13	RDMA/efa: Check stored completion CTX command ID with received one In admin command completion, we receive a CQE with the command ID which is constructed from context index and entropy bits from the admin queue producer counter. To try to detect memory corruptions in the received CQE, validate the full command ID of the fetched context with the CQE command ID. If there is a mismatch, complete the CQE with error. Also use LSBs of the admin queue producer counter to better detect entropy mismatch between smaller number of commands. Reviewed-by: Daniel Kranzdorf <dkkranzd@amazon.com> Reviewed-by: Michael Margolin <mrgolin@amazon.com> Signed-off-by: Yonatan Nachum <ynachum@amazon.com> Link: https://patch.msgid.link/20251210130614.36460-2-ynachum@amazon.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2025-12-18 10:12:38 -05:00

1 2 3 4 5 ...

9403 Commits