linux

mirror of https://github.com/torvalds/linux.git synced 2026-04-30 20:42:33 -04:00

Author	SHA1	Message	Date
Linus Torvalds	323bbfcf1e	Convert 'alloc_flex' family to use the new default GFP_KERNEL argument This is the exact same thing as the 'alloc_obj()' version, only much smaller because there are a lot fewer users of the alloc_flex() interface. As with alloc_obj() version, this was done entirely with mindless brute force, using the same script, except using 'flex' in the pattern rather than 'objs'. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2026-02-21 17:09:51 -08:00
Linus Torvalds	bf4afc53b7	Convert 'alloc_obj' family to use the new default GFP_KERNEL argument This was done entirely with mindless brute force, using git grep -l '\<k[vmz]alloc_objs(., GFP_KERNEL)' \| xargs sed -i 's/$alloc_objs(.*$, GFP_KERNEL)/\1)/' to convert the new alloc_obj() users that had a simple GFP_KERNEL argument to just drop that argument. Note that due to the extreme simplicity of the scripting, any slightly more complex cases spread over multiple lines would not be triggered: they definitely exist, but this covers the vast bulk of the cases, and the resulting diff is also then easier to check automatically. For the same reason the 'flex' versions will be done as a separate conversion. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2026-02-21 17:09:51 -08:00
Kees Cook	69050f8d6d	treewide: Replace kmalloc with kmalloc_obj for non-scalar types This is the result of running the Coccinelle script from scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to avoid scalar types (which need careful case-by-case checking), and instead replace kmalloc-family calls that allocate struct or union object instances: Single allocations: kmalloc(sizeof(TYPE), ...) are replaced with: kmalloc_obj(TYPE, ...) Array allocations: kmalloc_array(COUNT, sizeof(TYPE), ...) are replaced with: kmalloc_objs(TYPE, COUNT, ...) Flex array allocations: kmalloc(struct_size(PTR, FAM, COUNT), ...) are replaced with: kmalloc_flex(PTR, FAM, COUNT, ...) (where TYPE may also be VAR) The resulting allocations no longer return "void ", instead returning "TYPE ". Signed-off-by: Kees Cook <kees@kernel.org>	2026-02-21 01:02:28 -08:00
Linus Torvalds	311aa68319	Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma Pull rdma updates from Jason Gunthorpe: "Usual smallish cycle. The NFS biovec work to push it down into RDMA instead of indirecting through a scatterlist is pretty nice to see, been talked about for a long time now. - Various code improvements in irdma, rtrs, qedr, ocrdma, irdma, rxe - Small driver improvements and minor bug fixes to hns, mlx5, rxe, mana, mlx5, irdma - Robusness improvements in completion processing for EFA - New query_port_speed() verb to move past limited IBA defined speed steps - Support for SG_GAPS in rts and many other small improvements - Rare list corruption fix in iwcm - Better support different page sizes in rxe - Device memory support for mana - Direct bio vec to kernel MR for use by NFS-RDMA - QP rate limiting for bnxt_re - Remote triggerable NULL pointer crash in siw - DMA-buf exporter support for RDMA mmaps like doorbells" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (66 commits) RDMA/mlx5: Implement DMABUF export ops RDMA/uverbs: Add DMABUF object type and operations RDMA/uverbs: Support external FD uobjects RDMA/siw: Fix potential NULL pointer dereference in header processing RDMA/umad: Reject negative data_len in ib_umad_write IB/core: Extend rate limit support for RC QPs RDMA/mlx5: Support rate limit only for Raw Packet QP RDMA/bnxt_re: Report QP rate limit in debugfs RDMA/bnxt_re: Report packet pacing capabilities when querying device RDMA/bnxt_re: Add support for QP rate limiting MAINTAINERS: Drop RDMA files from Hyper-V section RDMA/uverbs: Add __GFP_NOWARN to ib_uverbs_unmarshall_recv() kmalloc svcrdma: use bvec-based RDMA read/write API RDMA/core: add rdma_rw_max_sge() helper for SQ sizing RDMA/core: add MR support for bvec-based RDMA operations RDMA/core: use IOVA-based DMA mapping for bvec RDMA operations RDMA/core: add bio_vec based RDMA read/write API RDMA/irdma: Use kvzalloc for paged memory DMA address array RDMA/rxe: Fix race condition in QP timer handlers RDMA/mana_ib: Add device‑memory support ...	2026-02-12 17:05:20 -08:00
YunJe Shin	14ab3da122	RDMA/siw: Fix potential NULL pointer dereference in header processing If siw_get_hdr() returns -EINVAL before set_rx_fpdu_context(), qp->rx_fpdu can be NULL. The error path in siw_tcp_rx_data() dereferences qp->rx_fpdu->more_ddp_segs without checking, which may lead to a NULL pointer deref. Only check more_ddp_segs when rx_fpdu is present. KASAN splat: [ 101.384271] KASAN: null-ptr-deref in range [0x00000000000000c0-0x00000000000000c7] [ 101.385869] RIP: 0010:siw_tcp_rx_data+0x13ad/0x1e50 Fixes: `8b6a361b8c` ("rdma/siw: receive path") Signed-off-by: YunJe Shin <ioerts@kookmin.ac.kr> Link: https://patch.msgid.link/20260204092546.489842-1-ioerts@kookmin.ac.kr Acked-by: Bernard Metzler <bernard.metzler@linux.dev> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2026-02-05 07:46:52 -05:00
Li Zhijian	87bf646921	RDMA/rxe: Fix race condition in QP timer handlers I encontered the following warning: WARNING: drivers/infiniband/sw/rxe/rxe_task.c:249 at rxe_sched_task+0x1c8/0x238 [rdma_rxe], CPU#0: swapper/0/0 ... libsha1 [last unloaded: ip6_udp_tunnel] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Tainted: G C 6.19.0-rc5-64k-v8+ #37 PREEMPT Tainted: [C]=CRAP Hardware name: Raspberry Pi 4 Model B Rev 1.2 Call trace: rxe_sched_task+0x1c8/0x238 [rdma_rxe] (P) retransmit_timer+0x130/0x188 [rdma_rxe] call_timer_fn+0x68/0x4d0 __run_timers+0x630/0x888 ... WARNING: drivers/infiniband/sw/rxe/rxe_task.c:38 at rxe_sched_task+0x1c0/0x238 [rdma_rxe], CPU#0: swapper/0/0 ... WARNING: drivers/infiniband/sw/rxe/rxe_task.c:111 at do_work+0x488/0x5c8 [rdma_rxe], CPU#3: kworker/u17:4/93400 ... refcount_t: underflow; use-after-free. WARNING: lib/refcount.c:28 at refcount_warn_saturate+0x138/0x1a0, CPU#3: kworker/u17:4/93400 The issue is caused by a race condition between retransmit_timer() and rxe_destroy_qp, leading to the Queue Pair's (QP) reference count dropping to zero during timer handler execution. It seems this warning is harmless because rxe_qp_do_cleanup() will flush all pending timers and requests. Example of flow causing the issue: CPU0 CPU1 retransmit_timer() { spin_lock_irqsave rxe_destroy_qp() __rxe_cleanup() __rxe_put() // qp->ref_count decrease to 0 rxe_qp_do_cleanup() { if (qp->valid) { rxe_sched_task() { WARN_ON(rxe_read(task->qp) <= 0); } } spin_unlock_irqrestore } spin_lock_irqsave qp->valid = 0 spin_unlock_irqrestore } Ensure the QP's reference count is maintained and its validity is checked within the timer callbacks by adding calls to rxe_get(qp) and corresponding rxe_put(qp) after use. Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> Fixes: `d946716325` ("RDMA/rxe: Rewrite rxe_task.c") Link: https://patch.msgid.link/20260120074437.623018-1-lizhijian@fujitsu.com Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2026-01-28 05:02:30 -05:00
Li Zhijian	12985e5915	RDMA/rxe: Fix iova-to-va conversion for MR page sizes != PAGE_SIZE The current implementation incorrectly handles memory regions (MRs) with page sizes different from the system PAGE_SIZE. The core issue is that rxe_set_page() is called with mr->page_size step increments, but the page_list stores individual struct page pointers, each representing PAGE_SIZE of memory. ib_sg_to_page() has ensured that when i>=1 either a) SG[i-1].dma_end and SG[i].dma_addr are contiguous or b) SG[i-1].dma_end and SG[i].dma_addr are mr->page_size aligned. This leads to incorrect iova-to-va conversion in scenarios: 1) page_size < PAGE_SIZE (e.g., MR: 4K, system: 64K): ibmr->iova = 0x181800 sg[0]: dma_addr=0x181800, len=0x800 sg[1]: dma_addr=0x173000, len=0x1000 Access iova = 0x181800 + 0x810 = 0x182010 Expected VA: 0x173010 (second SG, offset 0x10) Before fix: - index = (0x182010 >> 12) - (0x181800 >> 12) = 1 - page_offset = 0x182010 & 0xFFF = 0x10 - xarray[1] stores system page base 0x170000 - Resulting VA: 0x170000 + 0x10 = 0x170010 (wrong) 2) page_size > PAGE_SIZE (e.g., MR: 64K, system: 4K): ibmr->iova = 0x18f800 sg[0]: dma_addr=0x18f800, len=0x800 sg[1]: dma_addr=0x170000, len=0x1000 Access iova = 0x18f800 + 0x810 = 0x190010 Expected VA: 0x170010 (second SG, offset 0x10) Before fix: - index = (0x190010 >> 16) - (0x18f800 >> 16) = 1 - page_offset = 0x190010 & 0xFFFF = 0x10 - xarray[1] stores system page for dma_addr 0x170000 - Resulting VA: system page of 0x170000 + 0x10 = 0x170010 (wrong) Yi Zhang reported a kernel panic[1] years ago related to this defect. Solution: 1. Replace xarray with pre-allocated rxe_mr_page array for sequential indexing (all MR page indices are contiguous) 2. Each rxe_mr_page stores both struct page* and offset within the system page 3. Handle MR page_size != PAGE_SIZE relationships: - page_size > PAGE_SIZE: Split MR pages into multiple system pages - page_size <= PAGE_SIZE: Store offset within system page 4. Add boundary checks and compatibility validation This ensures correct iova-to-va conversion regardless of MR page size and system PAGE_SIZE relationship, while improving performance through array-based sequential access. Tests on 4K and 64K PAGE_SIZE hosts: - rdma-core/pytests $ ./build/bin/run_tests.py --dev eth0_rxe - blktest: $ TIMEOUT=30 QUICK_RUN=1 USE_RXE=1 NVMET_TRTYPES=rdma ./check nvme srp rnbd [1] https://lore.kernel.org/all/CAHj4cs9XRqE25jyVw9rj9YugffLn5+f=1znaBEnu1usLOciD+g@mail.gmail.com/T/ Fixes: `592627ccbd` ("RDMA/rxe: Replace rxe_map and rxe_phys_buf by xarray") Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> Link: https://patch.msgid.link/20260116032753.2574363-1-lizhijian@fujitsu.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2026-01-25 08:33:19 -05:00
Li Zhijian	d3922f6dad	RDMA/rxe: Remove unused page_offset member In rxe_map_mr_sg(), the `page_offset` member of the `rxe_mr` struct was initialized based on `ibmr.iova`, which will be updated inside ib_sg_to_pages() later. Consequently, the value assigned to `page_offset` was incorrect. However, since `page_offset` was never utilized throughout the code, it can be safely removed to clean up the codebase and avoid future confusion. Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> Link: https://patch.msgid.link/20260116032833.2574627-1-lizhijian@fujitsu.com Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2026-01-18 11:46:15 -05:00
Jiasheng Jiang	0beefd0e15	RDMA/rxe: Fix double free in rxe_srq_from_init In rxe_srq_from_init(), the queue pointer 'q' is assigned to 'srq->rq.queue' before copying the SRQ number to user space. If copy_to_user() fails, the function calls rxe_queue_cleanup() to free the queue, but leaves the now-invalid pointer in 'srq->rq.queue'. The caller of rxe_srq_from_init() (rxe_create_srq) eventually calls rxe_srq_cleanup() upon receiving the error, which triggers a second rxe_queue_cleanup() on the same memory, leading to a double free. The call trace looks like this: kmem_cache_free+0x.../0x... rxe_queue_cleanup+0x1a/0x30 [rdma_rxe] rxe_srq_cleanup+0x42/0x60 [rdma_rxe] rxe_elem_release+0x31/0x70 [rdma_rxe] rxe_create_srq+0x12b/0x1a0 [rdma_rxe] ib_create_srq_user+0x9a/0x150 [ib_core] Fix this by moving 'srq->rq.queue = q' after copy_to_user. Fixes: `aae0484e15` ("IB/rxe: avoid srq memory leak") Signed-off-by: Jiasheng Jiang <jiashengjiangcool@gmail.com> Link: https://patch.msgid.link/20260112015412.29458-1-jiashengjiangcool@gmail.com Reviewed-by: Zhu Yanjun <yanjun.Zhu@linux.dev> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2026-01-15 04:59:53 -05:00
Li Zhijian	3c68cf6823	IB/rxe: Fix missing umem_odp->umem_mutex unlock on error path rxe_odp_map_range_and_lock() must release umem_odp->umem_mutex when an error occurs, including cases where rxe_check_pagefault() fails. Fixes: `2fae67ab63` ("RDMA/rxe: Add support for Send/Recv/Write/Read with ODP") Signed-off-by: Li Zhijian <lizhijian@fujitsu.com> Link: https://patch.msgid.link/20251226094112.3042583-1-lizhijian@fujitsu.com Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2025-12-30 04:22:59 -05:00
Stefan Metzmacher	de41cbc64d	RDMA/rxe: let rxe_reclassify_recv_socket() call sk_owner_put() On kernels build with CONFIG_PROVE_LOCKING, CONFIG_MODULES and CONFIG_DEBUG_LOCK_ALLOC 'rmmod rdma_rxe' is no longer possible. For the global recv sockets rxe_net_exit() is where we call rxe_release_udp_tunnel-> udp_tunnel_sock_release(), which means the sockets are destroyed before 'rmmod rdma_rxe' finishes, so there's no need to protect against rxe_recv_slock_key and rxe_recv_sk_key disappearing while the sockets are still alive. Fixes: `80a85a771d` ("RDMA/rxe: reclassify sockets in order to avoid false positives from lockdep") Cc: Zhu Yanjun <zyjzyj2000@gmail.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Leon Romanovsky <leon@kernel.org> Cc: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com> Cc: linux-rdma@vger.kernel.org Cc: netdev@vger.kernel.org Cc: linux-cifs@vger.kernel.org Signed-off-by: Stefan Metzmacher <metze@samba.org> Link: https://patch.msgid.link/20251219140408.2300163-1-metze@samba.org Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev> Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2025-12-21 05:29:11 -05:00
Linus Torvalds	55aa394a5e	Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma Pull rdma updates from Jason Gunthorpe: "This has another new RDMA driver 'bng_en' for latest generation Broadcom NICs. There might be one more new driver still to come. Otherwise it is a fairly quite cycle. Summary: - Minor driver bug fixes and updates to cxgb4, rxe, rdmavt, bnxt_re, mlx5 - Many bug fix patches for irdma - WQ_PERCPU annotations and system_dfl_wq changes - Improved mlx5 support for "other eswitches" and multiple PFs - 1600Gbps link speed reporting support. Four Digits Now! - New driver bng_en for latest generation Broadcom NICs - Bonding support for hns - Adjust mlx5's hmm based ODP to work with the very large address space created by the new 5 level paging default on x86 - Lockdep fixups in rxe and siw" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (65 commits) RDMA/rxe: reclassify sockets in order to avoid false positives from lockdep RDMA/siw: reclassify sockets in order to avoid false positives from lockdep RDMA/bng_re: Remove prefetch instruction RDMA/core: Reduce cond_resched() frequency in __ib_umem_release RDMA/irdma: Fix SRQ shadow area address initialization RDMA/irdma: Remove doorbell elision logic RDMA/irdma: Do not set IBK_LOCAL_DMA_LKEY for GEN3+ RDMA/irdma: Do not directly rely on IB_PD_UNSAFE_GLOBAL_RKEY RDMA/irdma: Add missing mutex destroy RDMA/irdma: Fix SIGBUS in AEQ destroy RDMA/irdma: Add a missing kfree of struct irdma_pci_f for GEN2 RDMA/irdma: Fix data race in irdma_free_pble RDMA/irdma: Fix data race in irdma_sc_ccq_arm RDMA/mlx5: Add support for 1600_8x lane speed RDMA/core: Add new IB rate for XDR (8x) support IB/mlx5: Reduce IMR KSM size when 5-level paging is enabled RDMA/bnxt_re: Pass correct flag for dma mr creation RDMA/bnxt_re: Fix the inline size for GenP7 devices RDMA/hns: Support reset recovery for bond RDMA/hns: Support link state reporting for bond ...	2025-12-04 18:54:37 -08:00
Stefan Metzmacher	80a85a771d	RDMA/rxe: reclassify sockets in order to avoid false positives from lockdep While developing IPPROTO_SMBDIRECT support for the code under fs/smb/common/smbdirect [1], I noticed false positives like this: [+0,003927] ============================================ [+0,000532] WARNING: possible recursive locking detected [+0,000611] 6.18.0-rc5-metze-kasan-lockdep.02+ #1 Tainted: G OE [+0,000835] -------------------------------------------- [+0,000729] ksmbd:r5445/3609 is trying to acquire lock: [+0,000709] ffff88800b9570f8 (k-sk_lock-AF_INET){+.+.}-{0:0}, at: inet_shutdown+0x52/0x360 [+0,000831] but task is already holding lock: [+0,000684] ffff88800654af78 (k-sk_lock-AF_INET){+.+.}-{0:0}, at: smbdirect_sk_close+0x122/0x790 [smbdirect] [+0,000928] other info that might help us debug this: [+0,005552] Possible unsafe locking scenario: [+0,000723] CPU0 [+0,000359] ---- [+0,000377] lock(k-sk_lock-AF_INET); [+0,000478] lock(k-sk_lock-AF_INET); [+0,000498] * DEADLOCK * [+0,001012] May be due to missing lock nesting notation [+0,000831] 3 locks held by ksmbd:r5445/3609: [+0,000484] #0: ffff88800654af78 (k-sk_lock-AF_INET){+.+.}-{0:0}, at: smbdirect_sk_close+0x122/0x790 [smbdirect] [+0,001000] #1: ffff888020a40458 (&id_priv->handler_mutex){+.+.}-{4:4}, at: rdma_lock_handler+0x17/0x30 [rdma_cm] [+0,000982] #2: ffff888020a40350 (&id_priv->qp_mutex){+.+.}-{4:4}, at: rdma_destroy_qp+0x5d/0x1f0 [rdma_cm] [+0,000934] stack backtrace: [+0,000589] CPU: 0 UID: 0 PID: 3609 Comm: ksmbd:r5445 Kdump: loaded Tainted: G OE 6.18.0-rc5-metze-kasan-lockdep.02+ #1 PREEMPT(voluntary) [+0,000023] Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE [+0,000004] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 ... [+0,000010] print_deadlock_bug+0x245/0x330 [+0,000014] validate_chain+0x32a/0x590 [+0,000012] __lock_acquire+0x535/0xc30 [+0,000013] lock_acquire.part.0+0xb3/0x240 [+0,000017] ? inet_shutdown+0x52/0x360 [+0,000013] ? srso_alias_return_thunk+0x5/0xfbef5 [+0,000007] ? mark_held_locks+0x46/0x90 [+0,000012] lock_acquire+0x60/0x140 [+0,000006] ? inet_shutdown+0x52/0x360 [+0,000028] lock_sock_nested+0x3b/0xf0 [+0,000009] ? inet_shutdown+0x52/0x360 [+0,000008] inet_shutdown+0x52/0x360 [+0,000010] kernel_sock_shutdown+0x5b/0x90 [+0,000011] rxe_qp_do_cleanup+0x4ef/0x810 [rdma_rxe] [+0,000043] ? __pfx_rxe_qp_do_cleanup+0x10/0x10 [rdma_rxe] [+0,000030] execute_in_process_context+0x2b/0x170 [+0,000013] rxe_qp_cleanup+0x1c/0x30 [rdma_rxe] [+0,000021] __rxe_cleanup+0x1cf/0x2e0 [rdma_rxe] [+0,000036] ? __pfx___rxe_cleanup+0x10/0x10 [rdma_rxe] [+0,000020] ? srso_alias_return_thunk+0x5/0xfbef5 [+0,000006] ? __kasan_check_read+0x11/0x20 [+0,000012] rxe_destroy_qp+0xe1/0x230 [rdma_rxe] [+0,000035] ib_destroy_qp_user+0x217/0x450 [ib_core] [+0,000074] rdma_destroy_qp+0x83/0x1f0 [rdma_cm] [+0,000034] smbdirect_connection_destroy_qp+0x98/0x2e0 [smbdirect] [+0,000017] ? __pfx_smb_direct_logging_needed+0x10/0x10 [ksmbd] [+0,000044] smbdirect_connection_destroy+0x698/0xed0 [smbdirect] [+0,000023] ? __pfx_smbdirect_connection_destroy+0x10/0x10 [smbdirect] [+0,000033] ? __pfx_smb_direct_logging_needed+0x10/0x10 [ksmbd] [+0,000031] smbdirect_connection_destroy_sync+0x42b/0x9f0 [smbdirect] [+0,000029] ? mark_held_locks+0x46/0x90 [+0,000012] ? __pfx_smbdirect_connection_destroy_sync+0x10/0x10 [smbdirect] [+0,000019] ? srso_alias_return_thunk+0x5/0xfbef5 [+0,000007] ? trace_hardirqs_on+0x64/0x70 [+0,000029] ? srso_alias_return_thunk+0x5/0xfbef5 [+0,000010] ? srso_alias_return_thunk+0x5/0xfbef5 [+0,000006] ? __smbdirect_connection_schedule_disconnect+0x339/0x4b0 [+0,000021] smbdirect_sk_destroy+0xb0/0x680 [smbdirect] [+0,000024] ? srso_alias_return_thunk+0x5/0xfbef5 [+0,000006] ? trace_hardirqs_on+0x64/0x70 [+0,000006] ? srso_alias_return_thunk+0x5/0xfbef5 [+0,000005] ? __local_bh_enable_ip+0xba/0x150 [+0,000011] sk_common_release+0x66/0x340 [+0,000010] smbdirect_sk_close+0x12a/0x790 [smbdirect] [+0,000023] ? ip_mc_drop_socket+0x1e/0x240 [+0,000013] inet_release+0x10a/0x240 [+0,000011] smbdirect_sock_release+0x502/0xe80 [smbdirect] [+0,000015] ? srso_alias_return_thunk+0x5/0xfbef5 [+0,000024] sock_release+0x91/0x1c0 [+0,000010] smb_direct_free_transport+0x31/0x50 [ksmbd] [+0,000025] ksmbd_conn_free+0x1d0/0x240 [ksmbd] [+0,000040] smb_direct_disconnect+0xb2/0x120 [ksmbd] [+0,000023] ? srso_alias_return_thunk+0x5/0xfbef5 [+0,000018] ksmbd_conn_handler_loop+0x94e/0xf10 [ksmbd] ... I'll also add reclassify to the smbdirect socket code [1], but I think it's better to have it in both direction (below and above the RDMA layer). [1] https://git.samba.org/?p=metze/linux/wip.git;a=shortlog;h=refs/heads/master-ipproto-smbdirect Cc: Zhu Yanjun <zyjzyj2000@gmail.com> Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Leon Romanovsky <leon@kernel.org> Cc: linux-rdma@vger.kernel.org Cc: netdev@vger.kernel.org Cc: linux-cifs@vger.kernel.org Signed-off-by: Stefan Metzmacher <metze@samba.org> Link: https://patch.msgid.link/20251127105614.2040922-1-metze@samba.org Signed-off-by: Leon Romanovsky <leon@kernel.org>	2025-11-27 07:10:02 -05:00
Stefan Metzmacher	3a2c32d357	RDMA/siw: reclassify sockets in order to avoid false positives from lockdep While developing IPPROTO_SMBDIRECT support for the code under fs/smb/common/smbdirect [1], I noticed false positives like this: [T79] ====================================================== [T79] WARNING: possible circular locking dependency detected [T79] 6.18.0-rc4-metze-kasan-lockdep.01+ #1 Tainted: G OE [T79] ------------------------------------------------------ [T79] kworker/2:0/79 is trying to acquire lock: [T79] ffff88801f968278 (sk_lock-AF_INET){+.+.}-{0:0}, at: sock_set_reuseaddr+0x14/0x70 [T79] but task is already holding lock: [T79] ffffffffc10f7230 (lock#9){+.+.}-{4:4}, at: rdma_listen+0x3d2/0x740 [rdma_cm] [T79] which lock already depends on the new lock. [T79] the existing dependency chain (in reverse order) is: [T79] -> #1 (lock#9){+.+.}-{4:4}: [T79] __lock_acquire+0x535/0xc30 [T79] lock_acquire.part.0+0xb3/0x240 [T79] lock_acquire+0x60/0x140 [T79] __mutex_lock+0x1af/0x1c10 [T79] mutex_lock_nested+0x1b/0x30 [T79] cma_get_port+0xba/0x7d0 [rdma_cm] [T79] rdma_bind_addr_dst+0x598/0x9a0 [rdma_cm] [T79] cma_bind_addr+0x107/0x320 [rdma_cm] [T79] rdma_resolve_addr+0xa3/0x830 [rdma_cm] [T79] destroy_lease_table+0x12b/0x420 [ksmbd] [T79] ksmbd_NTtimeToUnix+0x3e/0x80 [ksmbd] [T79] ndr_encode_posix_acl+0x6e9/0xab0 [ksmbd] [T79] ndr_encode_v4_ntacl+0x53/0x870 [ksmbd] [T79] __sys_connect_file+0x131/0x1c0 [T79] __sys_connect+0x111/0x140 [T79] __x64_sys_connect+0x72/0xc0 [T79] x64_sys_call+0xe7d/0x26a0 [T79] do_syscall_64+0x93/0xff0 [T79] entry_SYSCALL_64_after_hwframe+0x76/0x7e [T79] -> #0 (sk_lock-AF_INET){+.+.}-{0:0}: [T79] check_prev_add+0xf3/0xcd0 [T79] validate_chain+0x466/0x590 [T79] __lock_acquire+0x535/0xc30 [T79] lock_acquire.part.0+0xb3/0x240 [T79] lock_acquire+0x60/0x140 [T79] lock_sock_nested+0x3b/0xf0 [T79] sock_set_reuseaddr+0x14/0x70 [T79] siw_create_listen+0x145/0x1540 [siw] [T79] iw_cm_listen+0x313/0x5b0 [iw_cm] [T79] cma_iw_listen+0x271/0x3c0 [rdma_cm] [T79] rdma_listen+0x3b1/0x740 [rdma_cm] [T79] cma_listen_on_dev+0x46a/0x750 [rdma_cm] [T79] rdma_listen+0x4b0/0x740 [rdma_cm] [T79] ksmbd_rdma_init+0x12b/0x270 [ksmbd] [T79] ksmbd_conn_transport_init+0x26/0x70 [ksmbd] [T79] server_ctrl_handle_work+0x1e5/0x280 [ksmbd] [T79] process_one_work+0x86c/0x1930 [T79] worker_thread+0x6f0/0x11f0 [T79] kthread+0x3ec/0x8b0 [T79] ret_from_fork+0x314/0x400 [T79] ret_from_fork_asm+0x1a/0x30 [T79] other info that might help us debug this: [T79] Possible unsafe locking scenario: [T79] CPU0 CPU1 [T79] ---- ---- [T79] lock(lock#9); [T79] lock(sk_lock-AF_INET); [T79] lock(lock#9); [T79] lock(sk_lock-AF_INET); [T79] * DEADLOCK * [T79] 5 locks held by kworker/2:0/79: [T79] #0: ffff88800120b158 ((wq_completion)events_long){+.+.}-{0:0}, at: process_one_work+0xfca/0x1930 [T79] #1: ffffc9000474fd00 ((work_completion)(&ctrl->ctrl_work)) {+.+.}-{0:0}, at: process_one_work+0x804/0x1930 [T79] #2: ffffffffc11307d0 (ctrl_lock){+.+.}-{4:4}, at: server_ctrl_handle_work+0x21/0x280 [ksmbd] [T79] #3: ffffffffc11347b0 (init_lock){+.+.}-{4:4}, at: ksmbd_conn_transport_init+0x18/0x70 [ksmbd] [T79] #4: ffffffffc10f7230 (lock#9){+.+.}-{4:4}, at: rdma_listen+0x3d2/0x740 [rdma_cm] [T79] stack backtrace: [T79] CPU: 2 UID: 0 PID: 79 Comm: kworker/2:0 Kdump: loaded Tainted: G OE 6.18.0-rc4-metze-kasan-lockdep.01+ #1 PREEMPT(voluntary) [T79] Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE [T79] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 [T79] Workqueue: events_long server_ctrl_handle_work [ksmbd] ... [T79] print_circular_bug+0xfd/0x130 [T79] check_noncircular+0x150/0x170 [T79] check_prev_add+0xf3/0xcd0 [T79] validate_chain+0x466/0x590 [T79] __lock_acquire+0x535/0xc30 [T79] ? srso_alias_return_thunk+0x5/0xfbef5 [T79] lock_acquire.part.0+0xb3/0x240 [T79] ? sock_set_reuseaddr+0x14/0x70 [T79] ? srso_alias_return_thunk+0x5/0xfbef5 [T79] ? __kasan_check_write+0x14/0x30 [T79] ? srso_alias_return_thunk+0x5/0xfbef5 [T79] ? apparmor_socket_post_create+0x180/0x700 [T79] lock_acquire+0x60/0x140 [T79] ? sock_set_reuseaddr+0x14/0x70 [T79] lock_sock_nested+0x3b/0xf0 [T79] ? sock_set_reuseaddr+0x14/0x70 [T79] sock_set_reuseaddr+0x14/0x70 [T79] siw_create_listen+0x145/0x1540 [siw] [T79] ? srso_alias_return_thunk+0x5/0xfbef5 [T79] ? local_clock_noinstr+0xe/0xd0 [T79] ? __pfx_siw_create_listen+0x10/0x10 [siw] [T79] ? trace_preempt_on+0x4c/0x130 [T79] ? __raw_spin_unlock_irqrestore+0x4a/0x90 [T79] ? srso_alias_return_thunk+0x5/0xfbef5 [T79] ? preempt_count_sub+0x52/0x80 [T79] iw_cm_listen+0x313/0x5b0 [iw_cm] [T79] cma_iw_listen+0x271/0x3c0 [rdma_cm] [T79] ? srso_alias_return_thunk+0x5/0xfbef5 [T79] rdma_listen+0x3b1/0x740 [rdma_cm] [T79] ? _raw_spin_unlock+0x2c/0x60 [T79] ? __pfx_rdma_listen+0x10/0x10 [rdma_cm] [T79] ? rdma_restrack_add+0x12c/0x630 [ib_core] [T79] ? srso_alias_return_thunk+0x5/0xfbef5 [T79] cma_listen_on_dev+0x46a/0x750 [rdma_cm] [T79] rdma_listen+0x4b0/0x740 [rdma_cm] [T79] ? __pfx_rdma_listen+0x10/0x10 [rdma_cm] [T79] ? cma_get_port+0x30d/0x7d0 [rdma_cm] [T79] ? srso_alias_return_thunk+0x5/0xfbef5 [T79] ? rdma_bind_addr_dst+0x598/0x9a0 [rdma_cm] [T79] ksmbd_rdma_init+0x12b/0x270 [ksmbd] [T79] ? __pfx_ksmbd_rdma_init+0x10/0x10 [ksmbd] [T79] ? srso_alias_return_thunk+0x5/0xfbef5 [T79] ? srso_alias_return_thunk+0x5/0xfbef5 [T79] ? register_netdevice_notifier+0x1dc/0x240 [T79] ksmbd_conn_transport_init+0x26/0x70 [ksmbd] [T79] server_ctrl_handle_work+0x1e5/0x280 [ksmbd] [T79] process_one_work+0x86c/0x1930 [T79] ? __pfx_process_one_work+0x10/0x10 [T79] ? srso_alias_return_thunk+0x5/0xfbef5 [T79] ? assign_work+0x16f/0x280 [T79] worker_thread+0x6f0/0x11f0 I was not able to reproduce this as I was testing with various runs switching siw and rxe as well as IPPROTO_SMBDIRECT sockets, while the above stack used siw with the non IPPROTO_SMBDIRECT patches [1]. Even if this patch doesn't solve the above I think it's a good idea to reclassify the sockets used by siw, I also send patches for rxe to reclassify, as well as my IPPROTO_SMBDIRECT socket patches [1] will do it, this should minimize potential false positives. [1] https://git.samba.org/?p=metze/linux/wip.git;a=shortlog;h=refs/heads/master-ipproto-smbdirect Cc: Bernard Metzler <bernard.metzler@linux.dev> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Leon Romanovsky <leon@kernel.org> Cc: linux-rdma@vger.kernel.org Cc: netdev@vger.kernel.org Cc: linux-cifs@vger.kernel.org Signed-off-by: Stefan Metzmacher <metze@samba.org> Link: https://patch.msgid.link/20251126150842.1837072-1-metze@samba.org Acked-by: Bernard Metzler <bernard.metzler@linux.dev> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2025-11-27 05:07:24 -05:00
Marco Crivellari	7196156b0c	IB/rdmavt: WQ_PERCPU added to alloc_workqueue users Currently if a user enqueue a work item using schedule_delayed_work() the used wq is "system_wq" (per-cpu wq) while queue_delayed_work() use WORK_CPU_UNBOUND (used when a cpu is not specified). The same applies to schedule_work() that is using system_wq and queue_work(), that makes use again of WORK_CPU_UNBOUND. This lack of consistentcy cannot be addressed without refactoring the API. alloc_workqueue() treats all queues as per-CPU by default, while unbound workqueues must opt-in via WQ_UNBOUND. This default is suboptimal: most workloads benefit from unbound queues, allowing the scheduler to place worker threads where they’re needed and reducing noise when CPUs are isolated. This change adds a new WQ_PERCPU flag to explicitly request alloc_workqueue() to be per-cpu when WQ_UNBOUND has not been specified. With the introduction of the WQ_PERCPU flag (equivalent to !WQ_UNBOUND), any alloc_workqueue() caller that doesn’t explicitly specify WQ_UNBOUND must now use WQ_PERCPU. Once migration is complete, WQ_UNBOUND can be removed and unbound will become the implicit default. CC: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Suggested-by: Tejun Heo <tj@kernel.org> Signed-off-by: Marco Crivellari <marco.crivellari@suse.com> Link: https://patch.msgid.link/20251101163121.78400-6-marco.crivellari@suse.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2025-11-06 02:23:23 -05:00
Kees Cook	85cb0757d7	net: Convert proto_ops connect() callbacks to use sockaddr_unsized Update all struct proto_ops connect() callback function prototypes from "struct sockaddr " to "struct sockaddr_unsized " to avoid lying to the compiler about object sizes. Calls into struct proto handlers gain casts that will be removed in the struct proto conversion patch. No binary changes expected. Signed-off-by: Kees Cook <kees@kernel.org> Link: https://patch.msgid.link/20251104002617.2752303-3-kees@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-04 19:10:32 -08:00
Kees Cook	0e50474fa5	net: Convert proto_ops bind() callbacks to use sockaddr_unsized Update all struct proto_ops bind() callback function prototypes from "struct sockaddr " to "struct sockaddr_unsized " to avoid lying to the compiler about object sizes. Calls into struct proto handlers gain casts that will be removed in the struct proto conversion patch. No binary changes expected. Signed-off-by: Kees Cook <kees@kernel.org> Link: https://patch.msgid.link/20251104002617.2752303-2-kees@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-04 19:10:32 -08:00
Zhu Yanjun	503a5e4690	RDMA/rxe: Fix null deref on srq->rq.queue after resize failure A NULL pointer dereference can occur in rxe_srq_chk_attr() when ibv_modify_srq() is invoked twice in succession under certain error conditions. The first call may fail in rxe_queue_resize(), which leads rxe_srq_from_attr() to set srq->rq.queue = NULL. The second call then triggers a crash (null deref) when accessing srq->rq.queue->buf->index_mask. Call Trace: <TASK> rxe_modify_srq+0x170/0x480 [rdma_rxe] ? __pfx_rxe_modify_srq+0x10/0x10 [rdma_rxe] ? uverbs_try_lock_object+0x4f/0xa0 [ib_uverbs] ? rdma_lookup_get_uobject+0x1f0/0x380 [ib_uverbs] ib_uverbs_modify_srq+0x204/0x290 [ib_uverbs] ? __pfx_ib_uverbs_modify_srq+0x10/0x10 [ib_uverbs] ? tryinc_node_nr_active+0xe6/0x150 ? uverbs_fill_udata+0xed/0x4f0 [ib_uverbs] ib_uverbs_handler_UVERBS_METHOD_INVOKE_WRITE+0x2c0/0x470 [ib_uverbs] ? __pfx_ib_uverbs_handler_UVERBS_METHOD_INVOKE_WRITE+0x10/0x10 [ib_uverbs] ? uverbs_fill_udata+0xed/0x4f0 [ib_uverbs] ib_uverbs_run_method+0x55a/0x6e0 [ib_uverbs] ? __pfx_ib_uverbs_handler_UVERBS_METHOD_INVOKE_WRITE+0x10/0x10 [ib_uverbs] ib_uverbs_cmd_verbs+0x54d/0x800 [ib_uverbs] ? __pfx_ib_uverbs_cmd_verbs+0x10/0x10 [ib_uverbs] ? __pfx___raw_spin_lock_irqsave+0x10/0x10 ? __pfx_do_vfs_ioctl+0x10/0x10 ? ioctl_has_perm.constprop.0.isra.0+0x2c7/0x4c0 ? __pfx_ioctl_has_perm.constprop.0.isra.0+0x10/0x10 ib_uverbs_ioctl+0x13e/0x220 [ib_uverbs] ? __pfx_ib_uverbs_ioctl+0x10/0x10 [ib_uverbs] __x64_sys_ioctl+0x138/0x1c0 do_syscall_64+0x82/0x250 ? fdget_pos+0x58/0x4c0 ? ksys_write+0xf3/0x1c0 ? __pfx_ksys_write+0x10/0x10 ? do_syscall_64+0xc8/0x250 ? __pfx_vm_mmap_pgoff+0x10/0x10 ? fget+0x173/0x230 ? fput+0x2a/0x80 ? ksys_mmap_pgoff+0x224/0x4c0 ? do_syscall_64+0xc8/0x250 ? do_user_addr_fault+0x37b/0xfe0 ? clear_bhb_loop+0x50/0xa0 ? clear_bhb_loop+0x50/0xa0 ? clear_bhb_loop+0x50/0xa0 entry_SYSCALL_64_after_hwframe+0x76/0x7e Fixes: `8700e3e7c4` ("Soft RoCE driver") Tested-by: Liu Yi <asatsuyu.liu@gmail.com> Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev> Link: https://patch.msgid.link/20251027215203.1321-1-yanjun.zhu@linux.dev Signed-off-by: Leon Romanovsky <leon@kernel.org>	2025-10-28 04:04:07 -04:00
Colin Ian King	1511efaca0	RDMA/rxe: Remove redundant assignment to variable page_offset The variable page_offset is being assigned a value at the start of a loop and being redundantly zero'd at the end of the loop, there is no code that reads the zero'd value. The assignment is redundant and can be removed. Signed-off-by: Colin Ian King <coking@nvidia.com> Link: https://patch.msgid.link/20251014120343.2528608-1-coking@nvidia.com Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2025-10-19 08:06:21 -04:00
Linus Torvalds	2ccb4d203f	Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma Pull rdma updates from Jason Gunthorpe: "A new Pensando ionic driver, a new Gen 3 HW support for Intel irdma, and lots of small bnxt_re improvements. - Small bug fixes and improves to hfi1, efa, mlx5, erdma, rdmarvt, siw - Allow userspace access to IB service records through the rdmacm - Optimize dma mapping for erdma - Fix shutdown of the GSI QP in mana - Support relaxed ordering MR and fix a corruption bug with mlx5 DMA Data Direct - Many improvement to bnxt_re: - Debugging features and counters - Improve performance of some commands - Change flow_label reporting in completions - Mirror vnic - RDMA flow support - New RDMA driver for Pensando Ethernet devices: ionic - Gen 3 hardware support for the Intel irdma driver - Fix rdma routing resolution with VRFs" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (85 commits) RDMA/ionic: Fix memory leak of admin q_wr RDMA/siw: Always report immediate post SQ errors RDMA/bnxt_re: improve clarity in ALLOC_PAGE handler RDMA/irdma: Remove unused struct irdma_cq fields RDMA/irdma: Fix positive vs negative error codes in irdma_post_send() RDMA/bnxt_re: Remove non-statistics counters from hw_counters RDMA/bnxt_re: Add debugfs info entry for device and resource information RDMA/bnxt_re: Fix incorrect errno used in function comments RDMA: Use %pe format specifier for error pointers RDMA/ionic: Use ether_addr_copy instead of memcpy RDMA/ionic: Fix build failure on SPARC due to xchg() operand size RDMA/rxe: Fix race in do_task() when draining IB/sa: Fix sa_local_svc_timeout_ms read race IB/ipoib: Ignore L3 master device RDMA/core: Use route entry flag to decide on loopback traffic RDMA/core: Resolve MAC of next-hop device without ARP support RDMA/core: Squash a single user static function RDMA/irdma: Update Kconfig RDMA/irdma: Extend CQE Error and Flush Handling for GEN3 Devices RDMA/irdma: Add Atomic Operations support ...	2025-10-03 18:35:22 -07:00
Bernard Metzler	fdd0fe94d6	RDMA/siw: Always report immediate post SQ errors In siw_post_send(), any immediate error encountered during processing of the work request list must be reported to the caller, even if previous work requests in that list were just accepted and added to the send queue. Not reporting those errors confuses the caller, which would wait indefinitely for the failing and potentially subsequently aborted work requests completion. This fixes a case where immediate errors were overwritten by subsequent code in siw_post_send(). Fixes: `303ae1cdfd` ("rdma/siw: application interface") Link: https://patch.msgid.link/r/20250923144536.103825-1-bernard.metzler@linux.dev Suggested-by: Stefan Metzmacher <metze@samba.org> Signed-off-by: Bernard Metzler <bernard.metzler@linux.dev> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-09-26 13:12:19 -03:00
Gui-Dong Han	8ca7eada62	RDMA/rxe: Fix race in do_task() when draining When do_task() exhausts its iteration budget (!ret), it sets the state to TASK_STATE_IDLE to reschedule, without a secondary check on the current task->state. This can overwrite the TASK_STATE_DRAINING state set by a concurrent call to rxe_cleanup_task() or rxe_disable_task(). While state changes are protected by a spinlock, both rxe_cleanup_task() and rxe_disable_task() release the lock while waiting for the task to finish draining in the while(!is_done(task)) loop. The race occurs if do_task() hits its iteration limit and acquires the lock in this window. The cleanup logic may then proceed while the task incorrectly reschedules itself, leading to a potential use-after-free. This bug was introduced during the migration from tasklets to workqueues, where the special handling for the draining case was lost. Fix this by restoring the original pre-migration behavior. If the state is TASK_STATE_DRAINING when iterations are exhausted, set cont to 1 to force a new loop iteration. This allows the task to finish its work, so that a subsequent iteration can reach the switch statement and correctly transition the state to TASK_STATE_DRAINED, stopping the task as intended. Fixes: `9b4b7c1f9f` ("RDMA/rxe: Add workqueue support for rxe tasks") Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev> Signed-off-by: Gui-Dong Han <hanguidong02@gmail.com> Link: https://patch.msgid.link/20250919025212.1682087-1-hanguidong02@gmail.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2025-09-21 07:22:28 -04:00
Qianfeng Rong	490a253cb4	RDMA/rdmavt: Use int type to store negative error codes Change 'ret' from u32 to int in alloc_qpn() to store -EINVAL, and remove the 'bail' label as it simply returns 'ret'. Storing negative error codes in an u32 causes no runtime issues, but it's ugly as pants, Change 'ret' from u32 to int type - this change has no runtime impact. Signed-off-by: Qianfeng Rong <rongqianfeng@vivo.com> Link: https://patch.msgid.link/20250826150556.541440-1-rongqianfeng@vivo.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2025-09-11 02:18:35 -04:00
Zhu Yanjun	3c3e9a9f29	RDMA/rxe: Flush delayed SKBs while releasing RXE resources When skb packets are sent out, these skb packets still depends on the rxe resources, for example, QP, sk, when these packets are destroyed. If these rxe resources are released when the skb packets are destroyed, the call traces will appear. To avoid skb packets hang too long time in some network devices, a timestamp is added when these skb packets are created. If these skb packets hang too long time in network devices, these network devices can free these skb packets to release rxe resources. Reported-by: syzbot+8425ccfb599521edb153@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=8425ccfb599521edb153 Tested-by: syzbot+8425ccfb599521edb153@syzkaller.appspotmail.com Fixes: `1a633bdc8f` ("RDMA/rxe: Let destroy qp succeed with stuck packet") Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev> Link: https://patch.msgid.link/20250726013104.463570-1-yanjun.zhu@linux.dev Signed-off-by: Leon Romanovsky <leon@kernel.org>	2025-08-13 06:20:00 -04:00
Linus Torvalds	2095cf558f	Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma Pull rdma fix from Jason Gunthorpe: "Single fix to correct the iov_iter construction in soft iwarp. This avoids blktest crashes with recent changes to the allocators" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: RDMA/siw: Fix the sendmsg byte count in siw_tcp_sendpages	2025-08-07 07:36:23 +03:00
Pedro Falcato	c18646248f	RDMA/siw: Fix the sendmsg byte count in siw_tcp_sendpages Ever since commit `c2ff29e99a` ("siw: Inline do_tcp_sendpages()"), we have been doing this: static int siw_tcp_sendpages(struct socket s, struct page page, int offset, size_t size) [...] / Calculate the number of bytes we need to push, for this page * specifically / size_t bytes = min_t(size_t, PAGE_SIZE - offset, size); / If we can't splice it, then copy it in, as normal / if (!sendpage_ok(page[i])) msg.msg_flags &= ~MSG_SPLICE_PAGES; / Set the bvec pointing to the page, with len $bytes / bvec_set_page(&bvec, page[i], bytes, offset); / Set the iter to $size, aka the size of the whole sendpages (!!!) / iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, size); try_page_again: lock_sock(sk); / Sendmsg with $size size (!!!) / rv = tcp_sendmsg_locked(sk, &msg, size); This means we've been sending oversized iov_iters and tcp_sendmsg calls for a while. This has a been a benign bug because sendpage_ok() always returned true. With the recent slab allocator changes being slowly introduced into next (that disallow sendpage on large kmalloc allocations), we have recently hit out-of-bounds crashes, due to slight differences in iov_iter behavior between the MSG_SPLICE_PAGES and "regular" copy paths: (MSG_SPLICE_PAGES) skb_splice_from_iter iov_iter_extract_pages iov_iter_extract_bvec_pages uses i->nr_segs to correctly stop in its tracks before OoB'ing everywhere skb_splice_from_iter gets a "short" read (!MSG_SPLICE_PAGES) skb_copy_to_page_nocache copy=iov_iter_count [...] copy_from_iter / this doesn't help */ if (unlikely(iter->count < len)) len = iter->count; iterate_bvec ... and we run off the bvecs Fix this by properly setting the iov_iter's byte count, plus sending the correct byte count to tcp_sendmsg_locked. Link: https://patch.msgid.link/r/20250729120348.495568-1-pfalcato@suse.de Cc: stable@vger.kernel.org Fixes: `c2ff29e99a` ("siw: Inline do_tcp_sendpages()") Reported-by: kernel test robot <oliver.sang@intel.com> Closes: https://lore.kernel.org/oe-lkp/202507220801.50a7210-lkp@intel.com Reviewed-by: David Howells <dhowells@redhat.com> Signed-off-by: Pedro Falcato <pfalcato@suse.de> Acked-by: Bernard Metzler <bernard.metzler@linux.dev> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-08-05 11:33:10 -03:00
Linus Torvalds	7ce4de1cda	Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma Pull rdma updates from Jason Gunthorpe: - Various minor code cleanups and fixes for hns, iser, cxgb4, hfi1, rxe, erdma, mana_ib - Prefetch supprot for rxe ODP - Remove memory window support from hns as new device FW is no longer support it - Remove qib, it is very old and obsolete now, Cornelis wishes to restructure the hfi1/qib shared layer - Fix a race in destroying CQs where we can still end up with work running because the work is cancled before the driver stops triggering it - Improve interaction with namespaces: * Follow the devlink namespace for newly spawned RDMA devices * Create iopoib net devces in the parent IB device's namespace * Allow CAP_NET_RAW checks to pass in user namespaces - A new flow control scheme for IB MADs to try and avoid queue overflows in the network - Fix 2G message sizes in bnxt_re - Optimize mkey layout for mlx5 DMABUF - New "DMA Handle" concept to allow controlling PCI TPH and steering tags * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (71 commits) RDMA/siw: Change maintainer email address RDMA/mana_ib: add support of multiple ports RDMA/mlx5: Refactor optional counters steering code RDMA/mlx5: Add DMAH support for reg_user_mr/reg_user_dmabuf_mr IB: Extend UVERBS_METHOD_REG_MR to get DMAH RDMA/mlx5: Add DMAH object support RDMA/core: Introduce a DMAH object and its alloc/free APIs IB/core: Add UVERBS_METHOD_REG_MR on the MR object net/mlx5: Add support for device steering tag net/mlx5: Expose IFC bits for TPH PCI/TPH: Expose pcie_tph_get_st_table_size() RDMA/mlx5: Fix incorrect MKEY masking RDMA/mlx5: Fix returned type from _mlx5r_umr_zap_mkey() RDMA/mlx5: remove redundant check on err on return expression RDMA/mana_ib: add additional port counters RDMA/mana_ib: Fix DSCP value in modify QP RDMA/efa: Add CQ with external memory support RDMA/core: Add umem "is_contiguous" and "start_dma_addr" helpers RDMA/uverbs: Add a common way to create CQ with umem RDMA/mlx5: Optimize DMABUF mkey page size ...	2025-07-31 12:19:55 -07:00
Yishai Hadas	a272019a46	IB: Extend UVERBS_METHOD_REG_MR to get DMAH Extend UVERBS_METHOD_REG_MR to get DMAH and pass it to all drivers. It will be used in mlx5 driver as part of the next patch from the series. Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Reviewed-by: Edward Srouji <edwards@nvidia.com> Link: https://patch.msgid.link/2ae1e628c0675db81f092cc00d3ad6fbf6139405.1752752567.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>	2025-07-23 01:42:11 -04:00
Stanislav Fomichev	93893a57ef	net: s/dev_get_flags/netif_get_flags/ Commit `cc34acd577` ("docs: net: document new locking reality") introduced netif_ vs dev_ function semantics: the former expects locked netdev, the latter takes care of the locking. We don't strictly follow this semantics on either side, but there are more dev_xxx handlers now that don't fit. Rename them to netif_xxx where appropriate. Signed-off-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250717172333.1288349-6-sdf@fomichev.me Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-18 17:27:47 -07:00
Mark Bloch	8cffca866b	RDMA/core: Extend RDMA device registration to be net namespace aware Presently, RDMA devices are always registered within the init network namespace, even if the associated devlink device's namespace was changed via a devlink reload. This mismatch leads to discrepancies between the network namespace of the devlink device and that of the RDMA device. Therefore, extend the RDMA device allocation API to optionally take the net namespace. This isn't limited to devices that support devlink but allows all users to provide the network namespace if they need to do so. If a network namespace is provided during device allocation, it's up to the caller to make sure the namespace stays valid until ib_register_device() is called. Signed-off-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>	2025-06-26 08:10:07 -04:00
Dan Carpenter	19564a8576	RDMA/rxe: Fix a couple IS_ERR() vs NULL bugs The lookup_mr() function returns NULL on error. It never returns error pointers. Fixes: `9284bc34c7` ("RDMA/rxe: Enable asynchronous prefetch for ODP MRs") Fixes: `3576b0df15` ("RDMA/rxe: Implement synchronous prefetch for ODP MRs") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Link: https://patch.msgid.link/685c1430.050a0220.18b0ef.da83@mx.google.com Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2025-06-26 05:19:56 -04:00
Arnd Bergmann	12423d8e18	RDMA/siw: work around clang stack size warning clang inlines a lot of functions into siw_qp_sq_process(), with the aggregate stack frame blowing the warning limit in some configurations: drivers/infiniband/sw/siw/siw_qp_tx.c:1014:5: error: stack frame size (1544) exceeds limit (1280) in 'siw_qp_sq_process' [-Werror,-Wframe-larger-than] The real problem here is the array of kvec structures in siw_tx_hdt that makes up the majority of the consumed stack space. Ideally there would be a way to avoid allocating the array on the stack, but that would require a larger rework. Add a noinline_for_stack annotation to avoid the warning for now, and make clang behave the same way as gcc here. The combined stack usage is still similar, but is spread over multiple functions now. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Link: https://patch.msgid.link/20250620114332.4072051-1-arnd@kernel.org Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev> Acked-by: Bernard Metzler <bmt@zurich.ibm.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2025-06-26 05:19:56 -04:00
Daisuke Matsuda	c81fef2202	RDMA/rxe: Remove redundant page presence check hmm_pfn_to_page() does not return NULL. ib_umem_odp_map_dma_and_lock() should return an error in case the target pages cannot be mapped until timeout, so these checks can safely be removed. Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev> Signed-off-by: Daisuke Matsuda <dskmtsd@gmail.com> Link: https://patch.msgid.link/20250611162758.10000-1-dskmtsd@gmail.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2025-06-12 07:07:40 -04:00
Daisuke Matsuda	9284bc34c7	RDMA/rxe: Enable asynchronous prefetch for ODP MRs Calling ibv_advise_mr(3) with flags other than IBV_ADVISE_MR_FLAG_FLUSH invokes an asynchronous request. It is best-effort, and thus can safely be deferred to the system-wide workqueue. The reference counter in rxe_mr is used to ensure that the MRs persist and that rxe is not terminated until the queued work is done. Signed-off-by: Daisuke Matsuda <dskmtsd@gmail.com> Link: https://patch.msgid.link/20250522111955.3227-3-dskmtsd@gmail.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2025-06-12 04:09:42 -04:00
Daisuke Matsuda	3576b0df15	RDMA/rxe: Implement synchronous prefetch for ODP MRs Minimal implementation of ibv_advise_mr(3) requires synchronous calls being successful with the IBV_ADVISE_MR_FLAG_FLUSH flag. Asynchronous requests, which are best-effort, will be supported subsequently. Signed-off-by: Daisuke Matsuda <dskmtsd@gmail.com> Link: https://patch.msgid.link/20250522111955.3227-2-dskmtsd@gmail.com Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2025-06-12 04:07:04 -04:00
Ingo Molnar	41cb08555c	treewide, timers: Rename from_timer() to timer_container_of() Move this API to the canonical timer_*() namespace. [ tglx: Redone against pre rc1 ] Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/aB2X0jCKQO56WdMt@gmail.com	2025-06-08 09:07:37 +02:00
Linus Torvalds	dd91b5e1d6	Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma Pull rdma updates from Jason Gunthorpe: "Usual collection of driver fixes: - Small bug fixes and cleansup in hfi, hns, rxe, mlx5, mana siw - Further ODP functionality in rxe - Remote access MRs in mana, along with more page sizes - Improve CM scalability with a rwlock around the agent - More trace points for hns - ODP hmm conversion to the new two step dma API - Support the ethernet HW device in mana as well as the RNIC - Cleanups: - Use secs_to_jiffies() when appropriate - Use ERR_CAST() instead of naked casts - Don't use %pK in printk - Unusued functions removed - Allocation type matching" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (57 commits) RDMA/cma: Fix hang when cma_netevent_callback fails to queue_work RDMA/bnxt_re: Support extended stats for Thor2 VF RDMA/hns: Fix endian issue in trace events RDMA/mlx5: Avoid flexible array warning IB/cm: Remove dead code and adjust naming RDMA/core: Avoid hmm_dma_map_alloc() for virtual DMA devices RDMA/rxe: Break endless pagefault loop for RO pages RDMA/bnxt_re: Fix return code of bnxt_re_configure_cc RDMA/bnxt_re: Fix missing error handling for tx_queue RDMA/bnxt_re: Fix incorrect display of inactivity_cp in debugfs output RDMA/mlx5: Add support for 200Gbps per lane speeds RDMA/mlx5: Remove the redundant MLX5_IB_STAGE_UAR stage RDMA/iwcm: Fix use-after-free of work objects after cm_id destruction net: mana: Add support for auxiliary device servicing events RDMA/mana_ib: unify mana_ib functions to support any gdma device RDMA/mana_ib: Add support of mana_ib for RNIC and ETH nic net: mana: Probe rdma device in mana driver RDMA/siw: replace redundant ternary operator with just rv RDMA/umem: Separate implicit ODP initialization from explicit ODP RDMA/core: Convert UMEM ODP DMA mapping to caching IOVA and page linkage ...	2025-05-30 10:18:56 -07:00
Jason Gunthorpe	ef2233850e	Merge tag 'v6.15' into rdma.git for-next Following patches need the RDMA rc branch since we are past the RC cycle now. Merge conflicts resolved based on Linux-next: - For RXE odp changes keep for-next version and fixup new places that need to call is_odp_mr() https://lore.kernel.org/r/20250422143019.500201bd@canb.auug.org.au https://lore.kernel.org/r/20250514122455.3593b083@canb.auug.org.au - irdma is keeping the while/kfree bugfix from -rc and the pf/cdev_info change from for-next https://lore.kernel.org/r/20250513130630.280ee6c5@canb.auug.org.au Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-05-26 15:33:52 -03:00
Jakub Kicinski	33e1b1b399	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Cross-merge networking fixes after downstream PR (net-6.15-rc8). Conflicts: `80f2ab46c2` ("irdma: free iwdev->rf after removing MSI-X") `4bcc063939` ("ice, irdma: fix an off by one in error handling code") `c24a65b6a2` ("iidc/ice/irdma: Update IDC to support multiple consumers") https://lore.kernel.org/20250513130630.280ee6c5@canb.auug.org.au No extra adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-05-22 09:42:41 -07:00
Leon Romanovsky	0b261d7c1c	RDMA/rxe: Break endless pagefault loop for RO pages RO pages has "perm" equal to 0, that caused to the situation where such pages were marked as needed to have fault and caused to infinite loop. Fixes: `eedd5b1276` ("RDMA/umem: Store ODP access mask information in PFN") Reported-by: Daisuke Matsuda <dskmtsd@gmail.com> Closes: https://lore.kernel.org/all/3016329a-4edd-4550-862f-b298a1b79a39@gmail.com/ Link: https://patch.msgid.link/096fab178d48ed86942ee22eafe9be98e29092aa.1747913377.git.leonro@nvidia.com Tested-by: Daisuke Matsuda <dskmtsd@gmail.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>	2025-05-22 12:05:21 -04:00
Eric Biggers	62673b7df9	RDMA/siw: use skb_crc32c() instead of __skb_checksum() Instead of calling __skb_checksum() with a skb_checksum_ops struct that does CRC32C, just call the new function skb_crc32c(). This is faster and simpler. Acked-by: Leon Romanovsky <leon@kernel.org> Reviewed-by: Bernard Metzler <bmt@zurich.ibm.com> Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://patch.msgid.link/20250519175012.36581-5-ebiggers@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-05-21 15:40:05 -07:00
Colin Ian King	8536666a52	RDMA/siw: replace redundant ternary operator with just rv The use of the ternary operator on rv is redundant, rv is either the initialized value of 0 or a negative error return code, so it can never be greater than zero, and hence the zero assignment in ternary operator is redundant. Just return rv instead. Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Link: https://patch.msgid.link/20250507131834.253823-1-colin.i.king@gmail.com Acked-by: Bernard Metzler <bmt@zurich.ibm.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2025-05-12 06:20:24 -04:00
Leon Romanovsky	1efe8c0670	RDMA/core: Convert UMEM ODP DMA mapping to caching IOVA and page linkage Reuse newly added DMA API to cache IOVA and only link/unlink pages in fast path for UMEM ODP flow. Tested-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>	2025-05-12 06:06:51 -04:00
Leon Romanovsky	eedd5b1276	RDMA/umem: Store ODP access mask information in PFN As a preparation to remove dma_list, store access mask in PFN pointer and not in dma_addr_t. Tested-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>	2025-05-12 06:06:46 -04:00
Dr. David Alan Gilbert	e56b4eab9c	RDMA/siw: Remove unused siw_mem_add siw_mem_add() was added in 2019 by commit `2251334dca` ("rdma/siw: application buffer management") but has remained unused. Remove it. Link: https://patch.msgid.link/r/20250505210226.88994-1-linux@treblig.org Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Acked-by: Bernard Metzler <bmt@zurich.ibm.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2025-05-06 14:30:13 -03:00
Daisuke Matsuda	23ea3c70ee	RDMA/rxe: Remove 32-bit architecture support Major linux distibutions have phased out support for 32-bit machines. Since rxe is primarily used for development and testing, the benefit of maintaining 32-bit support is minimal. This change simplifies ATOMIC WRITE implementations and improves maintainability of the driver. Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com> Link: https://patch.msgid.link/20250421025101.3588139-1-matsuda-daisuke@fujitsu.com Signed-off-by: Leon Romanovsky <leon@kernel.org>	2025-04-21 04:16:29 -04:00
Dr. David Alan Gilbert	d85080df12	RDMA/rxe: Remove unused rxe_run_task rxe_run_task() has been unused since 2024's commit `23bc06af54` ("RDMA/rxe: Don't call direct between tasks") Remove it. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Link: https://patch.msgid.link/20250419132725.199785-1-linux@treblig.org Signed-off-by: Leon Romanovsky <leon@kernel.org>	2025-04-20 11:27:39 -04:00
Zhu Yanjun	1c7eec4d5f	RDMA/rxe: Fix "trying to register non-static key in rxe_qp_do_cleanup" bug Call Trace: <TASK> __dump_stack lib/dump_stack.c:94 [inline] dump_stack_lvl+0x116/0x1f0 lib/dump_stack.c:120 assign_lock_key kernel/locking/lockdep.c:986 [inline] register_lock_class+0x4a3/0x4c0 kernel/locking/lockdep.c:1300 __lock_acquire+0x99/0x1ba0 kernel/locking/lockdep.c:5110 lock_acquire kernel/locking/lockdep.c:5866 [inline] lock_acquire+0x179/0x350 kernel/locking/lockdep.c:5823 __timer_delete_sync+0x152/0x1b0 kernel/time/timer.c:1644 rxe_qp_do_cleanup+0x5c3/0x7e0 drivers/infiniband/sw/rxe/rxe_qp.c:815 execute_in_process_context+0x3a/0x160 kernel/workqueue.c:4596 __rxe_cleanup+0x267/0x3c0 drivers/infiniband/sw/rxe/rxe_pool.c:232 rxe_create_qp+0x3f7/0x5f0 drivers/infiniband/sw/rxe/rxe_verbs.c:604 create_qp+0x62d/0xa80 drivers/infiniband/core/verbs.c:1250 ib_create_qp_kernel+0x9f/0x310 drivers/infiniband/core/verbs.c:1361 ib_create_qp include/rdma/ib_verbs.h:3803 [inline] rdma_create_qp+0x10c/0x340 drivers/infiniband/core/cma.c:1144 rds_ib_setup_qp+0xc86/0x19a0 net/rds/ib_cm.c:600 rds_ib_cm_initiate_connect+0x1e8/0x3d0 net/rds/ib_cm.c:944 rds_rdma_cm_event_handler_cmn+0x61f/0x8c0 net/rds/rdma_transport.c:109 cma_cm_event_handler+0x94/0x300 drivers/infiniband/core/cma.c:2184 cma_work_handler+0x15b/0x230 drivers/infiniband/core/cma.c:3042 process_one_work+0x9cc/0x1b70 kernel/workqueue.c:3238 process_scheduled_works kernel/workqueue.c:3319 [inline] worker_thread+0x6c8/0xf10 kernel/workqueue.c:3400 kthread+0x3c2/0x780 kernel/kthread.c:464 ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:153 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245 </TASK> The root cause is as below: In the function rxe_create_qp, the function rxe_qp_from_init is called to create qp, if this function rxe_qp_from_init fails, rxe_cleanup will be called to handle all the allocated resources, including the timers: retrans_timer and rnr_nak_timer. The function rxe_qp_from_init calls the function rxe_qp_init_req to initialize the timers: retrans_timer and rnr_nak_timer. But these timers are initialized in the end of rxe_qp_init_req. If some errors occur before the initialization of these timers, this problem will occur. The solution is to check whether these timers are initialized or not. If these timers are not initialized, ignore these timers. Fixes: `8700e3e7c4` ("Soft RoCE driver") Reported-by: syzbot+4edb496c3cad6e953a31@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=4edb496c3cad6e953a31 Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev> Link: https://patch.msgid.link/20250419080741.1515231-1-yanjun.zhu@linux.dev Signed-off-by: Leon Romanovsky <leon@kernel.org>	2025-04-20 11:25:37 -04:00
Zhu Yanjun	f81b33582f	RDMA/rxe: Fix slab-use-after-free Read in rxe_queue_cleanup bug Call Trace: <TASK> __dump_stack lib/dump_stack.c:94 [inline] dump_stack_lvl+0x7d/0xa0 lib/dump_stack.c:120 print_address_description mm/kasan/report.c:378 [inline] print_report+0xcf/0x610 mm/kasan/report.c:489 kasan_report+0xb5/0xe0 mm/kasan/report.c:602 rxe_queue_cleanup+0xd0/0xe0 drivers/infiniband/sw/rxe/rxe_queue.c:195 rxe_cq_cleanup+0x3f/0x50 drivers/infiniband/sw/rxe/rxe_cq.c:132 __rxe_cleanup+0x168/0x300 drivers/infiniband/sw/rxe/rxe_pool.c:232 rxe_create_cq+0x22e/0x3a0 drivers/infiniband/sw/rxe/rxe_verbs.c:1109 create_cq+0x658/0xb90 drivers/infiniband/core/uverbs_cmd.c:1052 ib_uverbs_create_cq+0xc7/0x120 drivers/infiniband/core/uverbs_cmd.c:1095 ib_uverbs_write+0x969/0xc90 drivers/infiniband/core/uverbs_main.c:679 vfs_write fs/read_write.c:677 [inline] vfs_write+0x26a/0xcc0 fs/read_write.c:659 ksys_write+0x1b8/0x200 fs/read_write.c:731 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xaa/0x1b0 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f In the function rxe_create_cq, when rxe_cq_from_init fails, the function rxe_cleanup will be called to handle the allocated resources. In fact, some memory resources have already been freed in the function rxe_cq_from_init. Thus, this problem will occur. The solution is to let rxe_cleanup do all the work. Fixes: `8700e3e7c4` ("Soft RoCE driver") Link: https://paste.ubuntu.com/p/tJgC42wDf6/ Tested-by: liuyi <liuy22@mails.tsinghua.edu.cn> Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev> Link: https://patch.msgid.link/20250412075714.3257358-1-yanjun.zhu@linux.dev Reviewed-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2025-04-20 06:14:49 -04:00
Daisuke Matsuda	29610226c3	RDMA/rxe: Fix mismatched type declarations Some functions return int values while they are defined as enum resp_states variables. This patch resolves the mismatches in rxe. Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com> Link: https://patch.msgid.link/20250409102701.1275265-1-matsuda-daisuke@fujitsu.com Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev> Signed-off-by: Leon Romanovsky <leon@kernel.org>	2025-04-11 13:45:07 -04:00

1 2 3 4 5 ...

1219 Commits