linux

mirror of https://github.com/torvalds/linux.git synced 2026-04-18 06:44:00 -04:00

Author	SHA1	Message	Date
Jakub Kicinski	b6e39e4846	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Cross-merge networking fixes after downstream PR (net-7.0-rc8). Conflicts: net/ipv6/seg6_iptunnel.c `c3812651b5` ("seg6: separate dst_cache for input and output paths in seg6 lwtunnel") `78723a62b9` ("seg6: add per-route tunnel source address") https://lore.kernel.org/adZhwtOYfo-0ImSa@sirena.org.uk net/ipv4/icmp.c `fde29fd934` ("ipv4: icmp: fix null-ptr-deref in icmp_build_probe()") `d98adfbdd5` ("ipv4: drop ipv6_stub usage and use direct function calls") https://lore.kernel.org/adO3dccqnr6j-BL9@sirena.org.uk Adjacent changes: drivers/net/ethernet/stmicro/stmmac/chain_mode.c `51f4e090b9` ("net: stmmac: fix integer underflow in chain mode") `6b4286e055` ("net: stmmac: rename STMMAC_GET_ENTRY() -> STMMAC_NEXT_ENTRY()") Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-09 13:20:59 -07:00
Linus Torvalds	a55f7f5f29	Merge tag 'net-7.0-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from netfilter, IPsec and wireless. This is again considerably bigger than the old average. No known outstanding regressions. Current release - regressions: - net: increase IP_TUNNEL_RECURSION_LIMIT to 5 - eth: ice: fix PTP timestamping broken by SyncE code on E825C Current release - new code bugs: - eth: stmmac: dwmac-motorcomm: fix eFUSE MAC address read failure Previous releases - regressions: - core: fix cross-cache free of KFENCE-allocated skb head - sched: act_csum: validate nested VLAN headers - rxrpc: fix call removal to use RCU safe deletion - xfrm: - wait for RCU readers during policy netns exit - fix refcount leak in xfrm_migrate_policy_find - wifi: rt2x00usb: fix devres lifetime - mptcp: fix slab-use-after-free in __inet_lookup_established - ipvs: fix NULL deref in ip_vs_add_service error path - eth: - airoha: fix memory leak in airoha_qdma_rx_process() - lan966x: fix use-after-free and leak in lan966x_fdma_reload() Previous releases - always broken: - ipv6: ioam: fix potential NULL dereferences in __ioam6_fill_trace_data() - ipv4: nexthop: avoid duplicate NHA_HW_STATS_ENABLE on nexthop group dump - bridge: guard local VLAN-0 FDB helpers against NULL vlan group - xsk: tailroom reservation and MTU validation - rxrpc: - fix to request an ack if window is limited - fix RESPONSE authenticator parser OOB read - netfilter: nft_ct: fix use-after-free in timeout object destroy - batman-adv: hold claim backbone gateways by reference - eth: - stmmac: fix PTP ref clock for Tegra234 - idpf: fix PREEMPT_RT raw/bh spinlock nesting for async VC handling - ipa: fix GENERIC_CMD register field masks for IPA v5.0+" * tag 'net-7.0-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (104 commits) net: lan966x: fix use-after-free and leak in lan966x_fdma_reload() net: lan966x: fix page pool leak in error paths net: lan966x: fix page_pool error handling in lan966x_fdma_rx_alloc_page_pool() nfc: pn533: allocate rx skb before consuming bytes l2tp: Drop large packets with UDP encap net: ipa: fix event ring index not programmed for IPA v5.0+ net: ipa: fix GENERIC_CMD register field masks for IPA v5.0+ MAINTAINERS: Add Prashanth as additional maintainer for amd-xgbe driver devlink: Fix incorrect skb socket family dumping af_unix: read UNIX_DIAG_VFS data under unix_state_lock Revert "mptcp: add needs_id for netlink appending addr" mptcp: fix slab-use-after-free in __inet_lookup_established net: txgbe: leave space for null terminators on property_entry net: ioam6: fix OOB and missing lock rxrpc: proc: size address buffers for %pISpc output rxrpc: only handle RESPONSE during service challenge rxrpc: Fix buffer overread in rxgk_do_verify_authenticator() rxrpc: Fix leak of rxgk context in rxgk_verify_response() rxrpc: Fix integer overflow in rxgk_verify_response() rxrpc: Fix missing error checks for rxkad encryption/decryption failure ...	2026-04-09 08:39:25 -07:00
Linus Torvalds	8b02520ec5	Merge tag 'iommu-fixes-v7.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux Pull IOMMU fix from Will Deacon: - Fix regression introduced by the empty MMU gather fix in -rc7, where the ->iotlb_sync() callback can be elided incorrectly, resulting in boot failures (hangs), crashes and potential memory corruption. * tag 'iommu-fixes-v7.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux: iommu: Ensure .iotlb_sync is called correctly	2026-04-09 08:36:31 -07:00
Linus Torvalds	acfa7a3544	Merge tag 'platform-drivers-x86-v7.0-4' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86 Pull x86 platform drivers fixes from Ilpo Järvinen: - amd/pmc: Add Thinkpad L14 Gen3 to quirk_s2idle_bug - asus-armoury: Add support for FA607NU, GU605MU, and GV302XU. - intel-uncore-freq: Handle autonomous UFS status bit - ISST: Handle cases with less than max buckets correctly - intel-uncore-freq & ISST: Mark minor version 3 supported (no additional driver changes required) * tag 'platform-drivers-x86-v7.0-4' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86: platform/x86: asus-armoury: add support for GU605MU platform/x86: asus-armoury: add support for FA607NU platform/x86: asus-armoury: add support for GV302XU platform/x86/amd: pmc: Add Thinkpad L14 Gen3 to quirk_s2idle_bug platform/x86/intel-uncore-freq: Increase minor version platform/x86: ISST: Increase minor version platform/x86/intel-uncore-freq: Handle autonomous UFS status bit platform/x86: ISST: Reset core count to 0	2026-04-09 08:34:08 -07:00
Paolo Abeni	b4afe3fa76	Merge branch 'net-lan966x-fix-page_pool-error-handling-and-error-paths' David Carlier says: ==================== net: lan966x: fix page_pool error handling and error paths This series fixes error handling around the lan966x page pool: 1/3 adds the missing IS_ERR check after page_pool_create(), preventing a kernel oops when the error pointer flows into xdp_rxq_info_reg_mem_model(). 2/3 plugs page pool leaks in the lan966x_fdma_rx_alloc() and lan966x_fdma_init() error paths, now reachable after 1/3. 3/3 fixes a use-after-free and page pool leak in the lan966x_fdma_reload() restore path, where the hardware could resume DMA into pages already returned to the page pool. ==================== Link: https://patch.msgid.link/20260405055241.35767-1-devnexen@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-04-09 15:17:25 +02:00
David Carlier	59c3d55a94	net: lan966x: fix use-after-free and leak in lan966x_fdma_reload() When lan966x_fdma_reload() fails to allocate new RX buffers, the restore path restarts DMA using old descriptors whose pages were already freed via lan966x_fdma_rx_free_pages(). Since page_pool_put_full_page() can release pages back to the buddy allocator, the hardware may DMA into memory now owned by other kernel subsystems. Additionally, on the restore path, the newly created page pool (if allocation partially succeeded) is overwritten without being destroyed, leaking it. Fix both issues by deferring the release of old pages until after the new allocation succeeds. Save the old page array before the allocation so old pages can be freed on the success path. On the failure path, the old descriptors, pages and page pool are all still valid, making the restore safe. Also ensure the restore path re-enables NAPI and wakes the netdev, matching the success path. Fixes: `89ba464fcf` ("net: lan966x: refactor buffer reload function") Cc: stable@vger.kernel.org Signed-off-by: David Carlier <devnexen@gmail.com> Link: https://patch.msgid.link/20260405055241.35767-4-devnexen@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-04-09 15:17:23 +02:00
David Carlier	076344a6ad	net: lan966x: fix page pool leak in error paths lan966x_fdma_rx_alloc() creates a page pool but does not destroy it if the subsequent fdma_alloc_coherent() call fails, leaking the pool. Similarly, lan966x_fdma_init() frees the coherent DMA memory when lan966x_fdma_tx_alloc() fails but does not destroy the page pool that was successfully created by lan966x_fdma_rx_alloc(), leaking it. Add the missing page_pool_destroy() calls in both error paths. Fixes: `11871aba19` ("net: lan96x: Use page_pool API") Cc: stable@vger.kernel.org Signed-off-by: David Carlier <devnexen@gmail.com> Link: https://patch.msgid.link/20260405055241.35767-3-devnexen@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-04-09 15:17:23 +02:00
David Carlier	3fd0da4fd8	net: lan966x: fix page_pool error handling in lan966x_fdma_rx_alloc_page_pool() page_pool_create() can return an ERR_PTR on failure. The return value is used unconditionally in the loop that follows, passing the error pointer through xdp_rxq_info_reg_mem_model() into page_pool_use_xdp_mem(), which dereferences it, causing a kernel oops. Add an IS_ERR check after page_pool_create() to return early on failure. Fixes: `11871aba19` ("net: lan96x: Use page_pool API") Cc: stable@vger.kernel.org Signed-off-by: David Carlier <devnexen@gmail.com> Link: https://patch.msgid.link/20260405055241.35767-2-devnexen@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-04-09 15:17:23 +02:00
Robin Murphy	7e0548525a	iommu: Ensure .iotlb_sync is called correctly Many drivers have no reason to use the iotlb_gather mechanism, but do still depend on .iotlb_sync being called to properly complete an unmap. Since the core code is now relying on the gather to detect when there is legitimately something to sync, it should also take care of encoding a successful unmap when the driver does not touch the gather itself. Fixes: `90c5def10b` ("iommu: Do not call drivers for empty gathers") Reported-by: Jon Hunter <jonathanh@nvidia.com> Closes: https://lore.kernel.org/r/8800a38b-8515-4bbe-af15-0dae81274bf7@nvidia.com Signed-off-by: Robin Murphy <robin.murphy@arm.com> Tested-by: Jon Hunter <jonathanh@nvidia.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Tested-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Will Deacon <will@kernel.org>	2026-04-09 13:07:13 +01:00
Pengpeng Hou	c71ba669b5	nfc: pn533: allocate rx skb before consuming bytes pn532_receive_buf() reports the number of accepted bytes to the serdev core. The current code consumes bytes into recv_skb and may already hand a complete frame to pn533_recv_frame() before allocating a fresh receive buffer. If that alloc_skb() fails, the callback returns 0 even though it has already consumed bytes, and it leaves recv_skb as NULL for the next receive callback. That breaks the receive_buf() accounting contract and can also lead to a NULL dereference on the next skb_put_u8(). Allocate the receive skb lazily before consuming the next byte instead. If allocation fails, return the number of bytes already accepted. Fixes: `c656aa4c27` ("nfc: pn533: add UART phy driver") Cc: stable@vger.kernel.org Signed-off-by: Pengpeng Hou <pengpeng@iscas.ac.cn> Link: https://patch.msgid.link/20260405094003.3-pn533-v2-pengpeng@iscas.ac.cn Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-04-09 13:54:37 +02:00
Paolo Abeni	9700282a7e	Merge branch 'r8152-add-support-for-the-rtl8157-5gbit-usb-ethernet-chip' Birger Koblitz says: ==================== r8152: Add support for the RTL8157 5Gbit USB Ethernet chip Add support for the RTL8157, which is a 5GBit USB-Ethernet adapter chip in the RTL815x family of chips. The RTL8157 uses a different frame descriptor format, and different SRAM/ADV access methods, plus offers 5GBit/s Ethernet, so support for these features is added in addition to chip initialization and configuration. The module was tested with an OEM RTL8157 USB adapter: [25758.328238] usb 4-1: new SuperSpeed Plus Gen 2x1 USB device number 2 using xhci_hcd [25758.345565] usb 4-1: New USB device found, idVendor=0bda, idProduct=8157, bcdDevice=30.00 [25758.345585] usb 4-1: New USB device strings: Mfr=1, Product=2, SerialNumber=7 [25758.345593] usb 4-1: Product: USB 10/100/1G/2.5G/5G LAN [25758.345599] usb 4-1: Manufacturer: Realtek [25758.345605] usb 4-1: SerialNumber: 000300E04C68xxxx [25758.534241] r8152-cfgselector 4-1: reset SuperSpeed Plus Gen 2x1 USB device number 2 using xhci_hcd [25758.603511] r8152 4-1:1.0: skip request firmware [25758.653351] r8152 4-1:1.0 eth0: v1.12.13 [25758.689271] r8152 4-1:1.0 enx00e04c68xxxx: renamed from eth0 [25763.271682] r8152 4-1:1.0 enx00e04c68xxxx: carrier on The RTL8157 adapter was tested against an AQC107 PCIe-card supporting 10GBit/s and an RTL8126 5Gbit PCIe-card supporting 5GBit/s for performance, link speed and EEE negotiation. Using USB3.2 Gen 1 with the RTL8157 USB adapter and running iperf3 against the AQC107 PCIe card resulted in 3.47 Gbits/sec, whereas using USB3.2 Gen2 resulted in 4.70 Gbits/sec, speeds against the RTL8126-card were the same. As the code integrates the RTL8157-specific code with existing RTL8156 code in order to improve code maintainability (instead of adding RTL8157-specific functions duplicaing most of the RTL8156 code), regression tests were done with an Edimax EU-4307 V1.0 USB-Ethernet adapter with RTL8156. The code is based on the out-of-tree r8152 driver published by Realtek under the GPL. This patch is on top of linux-next as the code re-uses the 2.5 Gbit EEE recently added in r8152.c. Signed-off-by: Birger Koblitz <mail@birger-koblitz.de> ==================== Link: https://patch.msgid.link/20260404-rtl8157_next-v7-0-039121318f23@birger-koblitz.de Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-04-09 12:16:48 +02:00
Birger Koblitz	fd3c7d080d	r8152: Add support for the RTL8157 hardware The RTL8157 uses a different packet descriptor format compared to the previous generation of chips. Add support for this format by adding a descriptor format structure into the r8152 structure and corresponding desc_ops functions which abstract the vlan-tag, tx/rx len and tx/rx checksum algorithms. Also, add support for the ADV indirect access interface of the RTL8157 and PHY setup. For initialization of the RTL8157, combine the existing RTL8156B and RTL8156 init functions and add RTL8157-specific functinality in order to improve code readability and maintainability. r8156_init() is now called with RTL_VER_10 and RTL_VER_11 for the RTL8156, with RTL_VER_12, RTL_VER_13 and RTL_VER_15 for the RTL8156B and with RTL_VER_16 for the RTL8157 and checks the version for chip-specific code. Also add USB power control functions for the RTL8157. Add support for the USB device ID of Realtek RTL8157-based adapters. Detect the RTL8157 as RTL_VER_16 and set it up. Signed-off-by: Birger Koblitz <mail@birger-koblitz.de> Link: https://patch.msgid.link/20260404-rtl8157_next-v7-2-039121318f23@birger-koblitz.de Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-04-09 12:16:46 +02:00
Birger Koblitz	ebe5fd2ed2	r8152: Add support for 5Gbit Link Speeds and EEE The RTL8157 supports 5GBit Link speeds. Add support for this speed in the setup and setting/getting through ethtool. Also add 5GBit EEE. Add functionality for setup and ethtool get/set methods. Signed-off-by: Birger Koblitz <mail@birger-koblitz.de> Link: https://patch.msgid.link/20260404-rtl8157_next-v7-1-039121318f23@birger-koblitz.de Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-04-09 12:16:46 +02:00
Alice Mikityanska	ebe560ea5f	l2tp: Drop large packets with UDP encap syzbot reported a WARN on my patch series [1]. The actual issue is an overflow of 16-bit UDP length field, and it exists in the upstream code. My series added a debug WARN with an overflow check that exposed the issue, that's why syzbot tripped on my patches, rather than on upstream code. syzbot's repro: r0 = socket$pppl2tp(0x18, 0x1, 0x1) r1 = socket$inet6_udp(0xa, 0x2, 0x0) connect$inet6(r1, &(0x7f00000000c0)={0xa, 0x0, 0x0, @loopback, 0xfffffffc}, 0x1c) connect$pppl2tp(r0, &(0x7f0000000240)=@pppol2tpin6={0x18, 0x1, {0x0, r1, 0x4, 0x0, 0x0, 0x0, {0xa, 0x4e22, 0xffff, @ipv4={'\x00', '\xff\xff', @empty}}}}, 0x32) writev(r0, &(0x7f0000000080)=[{&(0x7f0000000000)="ee", 0x34000}], 0x1) It basically sends an oversized (0x34000 bytes) PPPoL2TP packet with UDP encapsulation, and l2tp_xmit_core doesn't check for overflows when it assigns the UDP length field. The value gets trimmed to 16 bites. Add an overflow check that drops oversized packets and avoids sending packets with trimmed UDP length to the wire. syzbot's stack trace (with my patch applied): len >= 65536u WARNING: ./include/linux/udp.h:38 at udp_set_len_short include/linux/udp.h:38 [inline], CPU#1: syz.0.17/5957 WARNING: ./include/linux/udp.h:38 at l2tp_xmit_core net/l2tp/l2tp_core.c:1293 [inline], CPU#1: syz.0.17/5957 WARNING: ./include/linux/udp.h:38 at l2tp_xmit_skb+0x1204/0x18d0 net/l2tp/l2tp_core.c:1327, CPU#1: syz.0.17/5957 Modules linked in: CPU: 1 UID: 0 PID: 5957 Comm: syz.0.17 Not tainted syzkaller #0 PREEMPT(full) Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014 RIP: 0010:udp_set_len_short include/linux/udp.h:38 [inline] RIP: 0010:l2tp_xmit_core net/l2tp/l2tp_core.c:1293 [inline] RIP: 0010:l2tp_xmit_skb+0x1204/0x18d0 net/l2tp/l2tp_core.c:1327 Code: 0f 0b 90 e9 21 f9 ff ff e8 e9 05 ec f6 90 0f 0b 90 e9 8d f9 ff ff e8 db 05 ec f6 90 0f 0b 90 e9 cc f9 ff ff e8 cd 05 ec f6 90 <0f> 0b 90 e9 de fa ff ff 44 89 f1 80 e1 07 80 c1 03 38 c1 0f 8c 4f RSP: 0018:ffffc90003d67878 EFLAGS: 00010293 RAX: ffffffff8ad985e3 RBX: ffff8881a6400090 RCX: ffff8881697f0000 RDX: 0000000000000000 RSI: 0000000000034010 RDI: 000000000000ffff RBP: dffffc0000000000 R08: 0000000000000003 R09: 0000000000000004 R10: dffffc0000000000 R11: fffff520007acf00 R12: ffff8881baf20900 R13: 0000000000034010 R14: ffff8881a640008e R15: ffff8881760f7000 FS: 000055557e81f500(0000) GS:ffff8882a9467000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000200000033000 CR3: 00000001612f4000 CR4: 00000000000006f0 Call Trace: <TASK> pppol2tp_sendmsg+0x40a/0x5f0 net/l2tp/l2tp_ppp.c:302 sock_sendmsg_nosec net/socket.c:727 [inline] __sock_sendmsg net/socket.c:742 [inline] sock_write_iter+0x503/0x550 net/socket.c:1195 do_iter_readv_writev+0x619/0x8c0 fs/read_write.c:-1 vfs_writev+0x33c/0x990 fs/read_write.c:1059 do_writev+0x154/0x2e0 fs/read_write.c:1105 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0x14d/0xf80 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7f636479c629 Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 e8 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007ffffd4241c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000014 RAX: ffffffffffffffda RBX: 00007f6364a15fa0 RCX: 00007f636479c629 RDX: 0000000000000001 RSI: 0000200000000080 RDI: 0000000000000003 RBP: 00007f6364832b39 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 R13: 00007f6364a15fac R14: 00007f6364a15fa0 R15: 00007f6364a15fa0 </TASK> [1]: https://lore.kernel.org/all/20260226201600.222044-1-alice.kernel@fastmail.im/ Fixes: `3557baabf2` ("[L2TP]: PPP over L2TP driver core") Reported-by: syzbot+ci3edea60a44225dec@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/69a1dfba.050a0220.3a55be.0026.GAE@google.com/ Signed-off-by: Alice Mikityanska <alice@isovalent.com> Link: https://patch.msgid.link/20260403174949.843941-1-alice.kernel@fastmail.im Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-04-09 10:19:05 +02:00
Alexander Koskovich	56007972c0	net: ipa: fix event ring index not programmed for IPA v5.0+ For IPA v5.0+, the event ring index field moved from CH_C_CNTXT_0 to CH_C_CNTXT_1. The v5.0 register definition intended to define this field in the CH_C_CNTXT_1 fmask array but used the old identifier of ERINDEX instead of CH_ERINDEX. Without a valid event ring, GSI channels could never signal transfer completions. This caused gsi_channel_trans_quiesce() to block forever in wait_for_completion(). At least for IPA v5.2 this resolves an issue seen where runtime suspend, system suspend, and remoteproc stop all hanged forever. It also meant the IPA data path was completely non functional. Fixes: `faf0678ec8` ("net: ipa: add IPA v5.0 GSI register definitions") Signed-off-by: Alexander Koskovich <akoskovich@pm.me> Signed-off-by: Luca Weiss <luca.weiss@fairphone.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20260403-milos-ipa-v1-2-01e9e4e03d3e@fairphone.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-04-09 09:47:31 +02:00
Alexander Koskovich	9709b56d90	net: ipa: fix GENERIC_CMD register field masks for IPA v5.0+ Fix the field masks to match the hardware layout documented in downstream GSI (GSI_V3_0_EE_n_GSI_EE_GENERIC_CMD_*). Notably this fixes a WARN I was seeing when I tried to send "stop" to the MPSS remoteproc while IPA was up. Fixes: `faf0678ec8` ("net: ipa: add IPA v5.0 GSI register definitions") Signed-off-by: Alexander Koskovich <akoskovich@pm.me> Signed-off-by: Luca Weiss <luca.weiss@fairphone.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20260403-milos-ipa-v1-1-01e9e4e03d3e@fairphone.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-04-09 09:47:31 +02:00
Jakub Kicinski	2607c0907d	Merge branch '200GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2026-04-06 (idpf, ice, ixgbe, ixgbevf, igb, e1000) Emil converts to use spinlock_t for virtchnl transactions to make consistent use of the xn_bm_lock when accessing the free_xn_bm bitmap, while also avoiding nested raw/bh spinlock issue on PREEMPT_RT kernels. He also sets payload size before calling the async handler, to make sure it doesn't error out prematurely due to invalid size check for idpf. Kohei Enju changes WARN_ON for missing PTP control PF to a dev_info() on ice as there are cases where this is expected and acceptable. Petr Oros fixes conditions in which error paths failed to call ice_ptp_port_phy_restart() breaking PTP functionality on ice. Alex significantly reduces reporting of driver information, and time under RTNL locl, on ixgbe e610 devices by reducing reads of flash info only on events that could change it. Michal Schmidt adds missing Hyper-V op on ixgbevf. Alex Dvoretsky removes call to napi_synchronize() in igb_down() to resolve a deadlock. Agalakov Daniil adds error check on e1000 for failed EEPROM read. * '200GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue: e1000: check return value of e1000_read_eeprom igb: remove napi_synchronize() in igb_down() ixgbevf: add missing negotiate_features op to Hyper-V ops table ixgbe: stop re-reading flash on every get_drvinfo for e610 ice: fix PTP timestamping broken by SyncE code on E825C ice: ptp: don't WARN when controlling PF is unavailable idpf: set the payload size before calling the async handler idpf: improve locking around idpf_vc_xn_push_free() idpf: fix PREEMPT_RT raw/bh spinlock nesting for async VC handling ==================== Link: https://patch.msgid.link/20260406213038.444732-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-08 20:05:10 -07:00
Jakub Kicinski	58dd34dbd5	Merge branch 'devlink-add-per-port-resource-support' Tariq Toukan says: ==================== devlink: add per-port resource support This series by Or adds devlink per-port resource support: Currently, devlink resources are only available at the device level. However, some resources are inherently per-port, such as the maximum number of subfunctions (SFs) that can be created on a specific PF port. This limitation prevents user space from obtaining accurate per-port capacity information. This series adds infrastructure for per-port resources in devlink core and implements it in the mlx5 driver to expose the max_SFs resource on PF devlink ports. Patch #1 refactors resource functions to be generic Patch #2 adds port-level resource registration infrastructure Patch #3 registers SF resource on PF port representor in mlx5 Patch #4 adds devlink port resource registration to netdevsim for testing Patch #5 adds dump support for device-level resources Patch #6 includes port resources in the resource dump dumpit path Patch #7 adds port-specific option to resource dump doit path Patch #8 adds selftest for devlink port resource doit Patch #9 documents port-level resources and full dump Patch #10 adds resource scope filtering to resource dump Patch #11 adds selftest for resource dump and scope filter Patch #12 documents resource scope filtering ==================== Link: https://patch.msgid.link/20260407194107.148063-1-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-08 19:55:43 -07:00
Or Har-Toov	78c327c172	devlink: Document resource scope filtering Document the scope parameter for devlink resource show, which allows filtering the dump to device-level or port-level resources only. Signed-off-by: Or Har-Toov <ohartoov@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20260407194107.148063-13-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-08 19:55:39 -07:00
Or Har-Toov	2a8e912352	selftest: netdevsim: Add resource dump and scope filter test Add resource_dump_test() which verifies dumping resources for all devices and ports, and tests that scope=dev returns only device-level resources and scope=port returns only port resources. Skip if userspace does not support the scope parameter. Signed-off-by: Or Har-Toov <ohartoov@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20260407194107.148063-12-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-08 19:55:39 -07:00
Or Har-Toov	1bc45341a6	devlink: Add resource scope filtering to resource dump Allow filtering the resource dump to device-level or port-level resources using the 'scope' option. Example - dump only device-level resources: $ devlink resource show scope dev pci/0000:03:00.0: name max_local_SFs size 128 unit entry dpipe_tables none name max_external_SFs size 128 unit entry dpipe_tables none pci/0000:03:00.1: name max_local_SFs size 128 unit entry dpipe_tables none name max_external_SFs size 128 unit entry dpipe_tables none Example - dump only port-level resources: $ devlink resource show scope port pci/0000:03:00.0/196608: name max_SFs size 128 unit entry dpipe_tables none pci/0000:03:00.0/196609: name max_SFs size 128 unit entry dpipe_tables none pci/0000:03:00.1/196708: name max_SFs size 128 unit entry dpipe_tables none pci/0000:03:00.1/196709: name max_SFs size 128 unit entry dpipe_tables none Signed-off-by: Or Har-Toov <ohartoov@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20260407194107.148063-11-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-08 19:55:39 -07:00
Or Har-Toov	170e160a0e	devlink: Document port-level resources and full dump Document the port-level resource support and the option to dump all resources, including both device-level and port-level entries. Signed-off-by: Or Har-Toov <ohartoov@nvidia.com> Reviewed-by: Shay Drori <shayd@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20260407194107.148063-10-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-08 19:55:39 -07:00
Or Har-Toov	3961353771	selftest: netdevsim: Add devlink port resource doit test Tests that querying a specific port handle returns the expected resource name and size. Signed-off-by: Or Har-Toov <ohartoov@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20260407194107.148063-9-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-08 19:55:39 -07:00
Or Har-Toov	7511ff14f3	devlink: Add port-specific option to resource dump doit Allow querying devlink resources per-port via the resource-dump doit handler. When a port-index attribute is provided, only that port's resources are returned. When no port-index is given, only device-level resources are returned, preserving backward compatibility. Signed-off-by: Or Har-Toov <ohartoov@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20260407194107.148063-8-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-08 19:55:39 -07:00
Or Har-Toov	810b76394d	devlink: Include port resources in resource dump dumpit Allow querying devlink resources per-port via the resource-dump dumpit handler. Both device-level and all ports resources are included in the reply. For example: $ devlink resource show pci/0000:03:00.0: name local_max_SFs size 508 unit entry name external_max_SFs size 508 unit entry pci/0000:03:00.0/196608: name max_SFs size 20 unit entry pci/0000:03:00.1: name local_max_SFs size 508 unit entry name external_max_SFs size 508 unit entry pci/0000:03:00.1/262144: name max_SFs size 20 unit entry Signed-off-by: Or Har-Toov <ohartoov@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20260407194107.148063-7-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-08 19:55:38 -07:00
Or Har-Toov	11636b550e	devlink: Add dump support for device-level resources Add dumpit handler for resource-dump command to iterate over all devlink devices and show their resources. $ devlink resource show pci/0000:08:00.0: name local_max_SFs size 508 unit entry name external_max_SFs size 508 unit entry pci/0000:08:00.1: name local_max_SFs size 508 unit entry name external_max_SFs size 508 unit entry Signed-off-by: Or Har-Toov <ohartoov@nvidia.com> Reviewed-by: Shay Drori <shayd@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20260407194107.148063-6-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-08 19:55:38 -07:00
Or Har-Toov	085b234b28	netdevsim: Add devlink port resource registration Register port-level resources for netdevsim ports to enable testing of the port resource infrastructure. Signed-off-by: Or Har-Toov <ohartoov@nvidia.com> Reviewed-by: Shay Drori <shayd@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20260407194107.148063-5-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-08 19:55:38 -07:00
Or Har-Toov	4be8326d81	net/mlx5: Register SF resource on PF port representor The device-level "resource show" displays max_local_SFs and max_external_SFs without indicating which port each resource belongs to. Users cannot determine the controller number and pfnum associated with each SF pool. Register max_SFs resource on the host PF representor port to expose per-port SF limits. Users can correlate the port resource with the controller number and pfnum shown in 'devlink port show'. Future patches will introduce an ECPF that manages multiple PFs, where each PF has its own SF pool. Example usage: $ devlink resource show pci/0000:03:00.0/196608 pci/0000:03:00.0/196608: name max_SFs size 20 unit entry $ devlink port show pci/0000:03:00.0/196608 pci/0000:03:00.0/196608: type eth netdev pf0hpf flavour pcipf controller 1 pfnum 0 external true splittable false function: hw_addr b8:3f:d2:e1:8f:dc roce enable max_io_eqs 120 We can create up to 20 SFs over devlink port pci/0000:03:00.0/196608, with pfnum 0 and controller 1. Signed-off-by: Or Har-Toov <ohartoov@nvidia.com> Reviewed-by: Shay Drori <shayd@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20260407194107.148063-4-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-08 19:55:38 -07:00
Or Har-Toov	6f38acfed5	devlink: Add port-level resource registration infrastructure The current devlink resource infrastructure supports only device-level resources. Some hardware resources are associated with specific ports rather than the entire device, and today we have no way to show resource per-port. Add support for registering resources at the port level. Signed-off-by: Or Har-Toov <ohartoov@nvidia.com> Reviewed-by: Shay Drori <shayd@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20260407194107.148063-3-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-08 19:55:38 -07:00
Or Har-Toov	7be3163c49	devlink: Refactor resource functions to be generic Currently the resource functions take devlink pointer as parameter and take the resource list from there. Allow resource functions to work with other resource lists that will be added in next patches and not only with the devlink's resource list. Signed-off-by: Or Har-Toov <ohartoov@nvidia.com> Reviewed-by: Shay Drori <shayd@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20260407194107.148063-2-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-08 19:55:38 -07:00
Leon Hwang	5ae4ba98d7	selftests/drivers/net: Add an xdp test to xdp.py In "bpf: Disallow freplace on XDP with mismatched xdp_has_frags values" [1], this XDP test is suggested to add to xdp.py. 1. Verify the failure of updating frag-capable prog with non-frag-capable prog, when the frag-capable prog attaches to mtu=9k driver. The test has been verified against Mellanox CX6 and Intel 82599ES NICs. With dropping other tests, here is the test log. # ethtool -i eth0 driver: mlx5_core version: 6.19.0-061900-generic # NETIF=eth0 python3 xdp.py TAP version 13 1..1 ok 1 xdp.test_xdp_native_update_mb_to_sb # Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0 # ethtool -i eth0 driver: ixgbe version: 6.19.0-061900-generic # NETIF=eth0 python3 xdp.py TAP version 13 1..1 # CMD: ip link set dev eth0 xdpdrv obj /path/to/tools/testing/selftests/net/lib/xdp_dummy.bpf.o sec xdp.frags # EXIT: 2 # STDERR: RTNETLINK answers: Invalid argument ok 1 xdp.test_xdp_native_update_mb_to_sb # SKIP device does not support multi-buffer XDP # Totals: pass:0 fail:0 xfail:0 xpass:0 skip:1 error:0 Signed-off-by: Leon Hwang <leon.huangfu@shopee.com> Link: https://patch.msgid.link/20260406072655.368173-1-leon.huangfu@shopee.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-08 19:48:56 -07:00
Raju Rangoju	30f3b767ae	MAINTAINERS: Add Prashanth as additional maintainer for amd-xgbe driver Add Prashanth as an additional maintainer for the amd-xgbe Ethernet driver to help with ongoing development and maintenance. Cc: Prashanth Kumar K R <PrashanthKumar.K.R@amd.com> Signed-off-by: Raju Rangoju <Raju.Rangoju@amd.com> Link: https://patch.msgid.link/20260406073816.3218387-1-Raju.Rangoju@amd.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-08 19:43:13 -07:00
Jakub Kicinski	68911235cf	Merge branch 'dsa_loop-and-platform_data-cleanups' Vladimir Oltean says: ==================== dsa_loop and platform_data cleanups While working to add some new features to dsa_loop, I gathered a number of cleanup patches. They mostly remove some data structures that became unused after the multi-switch platforms were migrated to the modern DT bindings. ==================== Link: https://patch.msgid.link/20260406212158.721806-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-08 19:38:56 -07:00
Vladimir Oltean	da9008674d	net: dsa: eliminate <linux/dsa/loop.h> There is no reason at all to export these data types to the global include directory. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://patch.msgid.link/20260406212158.721806-5-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-08 19:38:52 -07:00
Vladimir Oltean	c3b09190e6	net: dsa: remove unused platform_data definitions Pretty self-explanatory, nobody needs these. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://patch.msgid.link/20260406212158.721806-4-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-08 19:38:52 -07:00
Vladimir Oltean	dc915f375e	net: dsa: clean up struct dsa_chip_data This has accumulated some fields which are no longer parsed by the core or set by any driver. Remove them. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://patch.msgid.link/20260406212158.721806-3-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-08 19:38:52 -07:00
Vladimir Oltean	b773b99352	net: dsa: remove struct platform_data This is not used anywhere in the kernel. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://patch.msgid.link/20260406212158.721806-2-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-08 19:38:52 -07:00
Li RongQing	0006c6f109	devlink: Fix incorrect skb socket family dumping The devlink_fmsg_dump_skb function was incorrectly using the socket type (sk->sk_type) instead of the socket family (sk->sk_family) when filling the "family" field in the fast message dump. This patch fixes this to properly display the socket family. Fixes: `3dbfde7f6b` ("devlink: add devlink_fmsg_dump_skb() function") Signed-off-by: Li RongQing <lirongqing@baidu.com> Link: https://patch.msgid.link/20260407022730.2393-1-lirongqing@baidu.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-08 19:34:38 -07:00
Jiexun Wang	39897df386	af_unix: read UNIX_DIAG_VFS data under unix_state_lock Exact UNIX diag lookups hold a reference to the socket, but not to u->path. Meanwhile, unix_release_sock() clears u->path under unix_state_lock() and drops the path reference after unlocking. Read the inode and device numbers for UNIX_DIAG_VFS while holding unix_state_lock(), then emit the netlink attribute after dropping the lock. This keeps the VFS data stable while the reply is being built. Fixes: `5f7b056946` ("unix_diag: Unix inode info NLA") Reported-by: Yifan Wu <yifanwucs@gmail.com> Reported-by: Juefei Pu <tomapufckgml@gmail.com> Co-developed-by: Yuan Tan <yuantan098@gmail.com> Signed-off-by: Yuan Tan <yuantan098@gmail.com> Suggested-by: Xin Liu <bird@lzu.edu.cn> Tested-by: Ren Wei <enjou1224z@gmail.com> Signed-off-by: Jiexun Wang <wangjiexun2025@gmail.com> Signed-off-by: Ren Wei <n05ec@lzu.edu.cn> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260407080015.1744197-1-n05ec@lzu.edu.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-08 19:33:52 -07:00
Jakub Kicinski	3723c3b656	Merge branch 'mptcp-autotune-related-improvement' Matthieu Baerts says: ==================== mptcp: autotune related improvement Here are two patches from Paolo that have been crafted a couple of months ago, but needed more validation because they were indirectly causing instabilities in the sefltests. The root cause has been fixed in 'net' recently in commit `8c09412e58` ("selftests: mptcp: more stable simult_flows tests"). These patches refactor the receive space and RTT estimator, overall making DRS more correct while avoiding receive buffer drifting to tcp_rmem[2], which in turn makes the throughput more stable and less bursty, especially with high bandwidth and low delay environments. Note that the first patch addresses a very old issue. 'net-next' is targeted because the change is quite invasive and based on a recent backlog refactor. The 'Fixes' tag is then there more as a FYI, because backporting this patch will quickly be blocked due to large conflicts. ==================== Link: https://patch.msgid.link/20260407-net-next-mptcp-reduce-rbuf-v2-0-0d1d135bf6f6@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-08 19:32:05 -07:00
Paolo Abeni	7272d8131a	mptcp: add receive queue awareness in tcp_rcv_space_adjust() This is the MPTCP counter-part of commit `ea33537d82` ("tcp: add receive queue awareness in tcp_rcv_space_adjust()"). Prior to this commit: ESTAB 33165568 0 192.168.255.2:5201 192.168.255.1:53380 \ skmem:(r33076416,rb33554432,t0,tb91136,f448,w0,o0,bl0,d0) After: ESTAB 3279168 0 192.168.255.2:5201 192.168.255.1]:53042 \ skmem:(r3190912,rb3719956,t0,tb91136,f1536,w0,o0,bl0,d0) Same throughput. Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20260407-net-next-mptcp-reduce-rbuf-v2-2-0d1d135bf6f6@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-08 19:32:01 -07:00
Paolo Abeni	d2000361e4	mptcp: better mptcp-level RTT estimator The current MPTCP-level RTT estimator has several issues. On high speed links, the MPTCP-level receive buffer auto-tuning happens with a frequency well above the TCP-level's one. That in turn can cause excessive/unneeded receive buffer increase. On such links, the initial rtt_us value is considerably higher than the actual delay, and the current mptcp_rcv_space_adjust() updates msk->rcvq_space.rtt_us with a period equal to the such field previous value. If the initial rtt_us is 40ms, its first update will happen after 40ms, even if the subflows see actual RTT orders of magnitude lower. Additionally: - setting the msk RTT to the maximum among all the subflows RTTs makes DRS constantly overshooting the rcvbuf size when a subflow has considerable higher latency than the other(s). - during unidirectional bulk transfers with multiple active subflows, the TCP-level RTT estimator occasionally sees considerably higher value than the real link delay, i.e. when the packet scheduler reacts to an incoming ACK on given subflow pushing data on a different subflow. - currently inactive but still open subflows (i.e. switched to backup mode) are always considered when computing the msk-level RTT. Address the all the issues above with a more accurate RTT estimation strategy: the MPTCP-level RTT is set to the minimum of all the subflows actually feeding data into the MPTCP receive buffer, using a small sliding window. While at it, also use EWMA to compute the msk-level scaling_ratio, to that MPTCP can avoid traversing the subflow list is mptcp_rcv_space_adjust(). Use some care to avoid updating msk and ssk level fields too often. Fixes: `a6b118febb` ("mptcp: add receive buffer auto-tuning") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20260407-net-next-mptcp-reduce-rbuf-v2-1-0d1d135bf6f6@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-08 19:32:00 -07:00
Matthieu Baerts (NGI0)	8e2760eaab	Revert "mptcp: add needs_id for netlink appending addr" This commit was originally adding the ability to add MPTCP endpoints with ID 0 by accident. The in-kernel PM, handling MPTCP endpoints at the net namespace level, is not supposed to handle endpoints with such ID, because this ID 0 is reserved to the initial subflow, as mentioned in the MPTCPv1 protocol [1], a per-connection setting. Note that 'ip mptcp endpoint add id 0' stops early with an error, but other tools might still request the in-kernel PM to create MPTCP endpoints with this restricted ID 0. In other words, it was wrong to call the mptcp_pm_has_addr_attr_id helper to check whether the address ID attribute is set: if it was set to 0, a new MPTCP endpoint would be created with ID 0, which is not expected, and might cause various issues later. Fixes: `584f389426` ("mptcp: add needs_id for netlink appending addr") Cc: stable@vger.kernel.org Link: https://datatracker.ietf.org/doc/html/rfc8684#section-3.2-9 [1] Reviewed-by: Geliang Tang <geliang@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20260407-net-mptcp-revert-pm-needs-id-v2-1-7a25cbc324f8@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-08 19:31:16 -07:00
Jiayuan Chen	9b55b25390	mptcp: fix slab-use-after-free in __inet_lookup_established The ehash table lookups are lockless and rely on SLAB_TYPESAFE_BY_RCU to guarantee socket memory stability during RCU read-side critical sections. Both tcp_prot and tcpv6_prot have their slab caches created with this flag via proto_register(). However, MPTCP's mptcp_subflow_init() copies tcpv6_prot into tcpv6_prot_override during inet_init() (fs_initcall, level 5), before inet6_init() (module_init/device_initcall, level 6) has called proto_register(&tcpv6_prot). At that point, tcpv6_prot.slab is still NULL, so tcpv6_prot_override.slab remains NULL permanently. This causes MPTCP v6 subflow child sockets to be allocated via kmalloc (falling into kmalloc-4k) instead of the TCPv6 slab cache. The kmalloc-4k cache lacks SLAB_TYPESAFE_BY_RCU, so when these sockets are freed without SOCK_RCU_FREE (which is cleared for child sockets by design), the memory can be immediately reused. Concurrent ehash lookups under rcu_read_lock can then access freed memory, triggering a slab-use-after-free in __inet_lookup_established. Fix this by splitting the IPv6-specific initialization out of mptcp_subflow_init() into a new mptcp_subflow_v6_init(), called from mptcp_proto_v6_init() before protocol registration. This ensures tcpv6_prot_override.slab correctly inherits the SLAB_TYPESAFE_BY_RCU slab cache. Fixes: `b19bc2945b` ("mptcp: implement delegated actions") Cc: stable@vger.kernel.org Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev> Reviewed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20260406031512.189159-1-jiayuan.chen@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-08 19:30:08 -07:00
Jiayuan Chen	1a6b396538	net: initialize sk_rx_queue_mapping in sk_clone() sk_clone() initializes sk_tx_queue_mapping via sk_tx_queue_clear() but does not initialize sk_rx_queue_mapping. Since this field is in the sk_dontcopy region, it is neither copied from the parent socket by sock_copy() nor zeroed by sk_prot_alloc() (called without __GFP_ZERO from sk_clone). Commit `03cfda4fa6` ("tcp: fix another uninit-value (sk_rx_queue_mapping)") attempted to fix this by introducing sk_mark_napi_id_set() with force_set=true in tcp_child_process(). However, sk_mark_napi_id_set() -> sk_rx_queue_set() only writes when skb_rx_queue_recorded(skb) is true. If the 3-way handshake ACK arrives through a device that does not record rx_queue (e.g. loopback or veth), sk_rx_queue_mapping remains uninitialized. When a subsequent data packet arrives with a recorded rx_queue, sk_mark_napi_id() -> sk_rx_queue_update() reads the uninitialized field for comparison (force_set=false path), triggering KMSAN. This was reproduced by establishing a TCP connection over loopback (which does not call skb_record_rx_queue), then attaching a BPF TC program on lo ingress to set skb->queue_mapping on data packets: BUG: KMSAN: uninit-value in tcp_v4_do_rcv (net/ipv4/tcp_ipv4.c:1875) tcp_v4_do_rcv (net/ipv4/tcp_ipv4.c:1875) tcp_v4_rcv (net/ipv4/tcp_ipv4.c:2287) ip_protocol_deliver_rcu (net/ipv4/ip_input.c:207) ip_local_deliver_finish (net/ipv4/ip_input.c:242) ip_local_deliver (net/ipv4/ip_input.c:262) ip_rcv (net/ipv4/ip_input.c:573) __netif_receive_skb (net/core/dev.c:6294) process_backlog (net/core/dev.c:6646) __napi_poll (net/core/dev.c:7710) net_rx_action (net/core/dev.c:7929) handle_softirqs (kernel/softirq.c:623) do_softirq (kernel/softirq.c:523) __local_bh_enable_ip (kernel/softirq.c:?) __dev_queue_xmit (net/core/dev.c:?) ip_finish_output2 (net/ipv4/ip_output.c:237) ip_output (net/ipv4/ip_output.c:438) __ip_queue_xmit (net/ipv4/ip_output.c:534) __tcp_transmit_skb (net/ipv4/tcp_output.c:1693) tcp_write_xmit (net/ipv4/tcp_output.c:3064) tcp_sendmsg_locked (net/ipv4/tcp.c:?) tcp_sendmsg (net/ipv4/tcp.c:1465) inet_sendmsg (net/ipv4/af_inet.c:865) sock_write_iter (net/socket.c:1195) vfs_write (fs/read_write.c:688) ... Uninit was created at: kmem_cache_alloc_noprof (mm/slub.c:4873) sk_prot_alloc (net/core/sock.c:2239) sk_alloc (net/core/sock.c:2301) inet_create (net/ipv4/af_inet.c:334) __sock_create (net/socket.c:1605) __sys_socket (net/socket.c:1747) Fix this at the root by adding sk_rx_queue_clear() alongside sk_tx_queue_clear() in sk_clone(). Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260407084219.95718-1-jiayuan.chen@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-08 19:28:05 -07:00
Ioana Ciornei	fff75dba79	selftests: forwarding: lib: rewrite processing of command line arguments The piece of code which processes the command line arguments and populates NETIFS based on them is really unobvious. Rewrite it so that the intention is clear and the code is easy to follow. Suggested-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Link: https://patch.msgid.link/20260407102058.867279-1-ioana.ciornei@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-08 19:26:44 -07:00
Florian Fainelli	686a7587bd	net: bcmasp: Switch to page pool for RX path This shows an improvement of 1.9% in reducing the CPU cycles and data cache misses. Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com> Reviewed-by: Justin Chen <justin.chen@broadcom.com> Reviewed-by: Nicolai Buchwitz <nb@tipi-net.de> Link: https://patch.msgid.link/20260408001813.635679-1-florian.fainelli@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-08 19:24:12 -07:00
Eric Dumazet	202ab59941	net: dropreason: add MACVLAN_BROADCAST_BACKLOG and IPVLAN_MULTICAST_BACKLOG ipvlan and macvlan use queues to process broadcast/multicast packets from a work queue. Under attack these queues can drop packets. Add MACVLAN_BROADCAST_BACKLOG drop_reason for macvlan broadcast queue. Add IPVLAN_MULTICAST_BACKLOG drop_reason for ipvlan multicast queue. Use different reasons as some deployments use both ipvlan and macvlan. Also change ipvlan_rcv_frame() to use SKB_DROP_REASON_DEV_READY when the device is not UP. Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260407150710.1640747-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-08 19:19:18 -07:00
Eric Dumazet	ea25e03da7	codel: annotate data-races in codel_dump_stats() codel_dump_stats() only runs with RTNL held, reading fields that can be changed in qdisc fast path. Add READ_ONCE()/WRITE_ONCE() annotations. Alternative would be to acquire the qdisc spinlock, but our long-term goal is to make qdisc dump operations lockless as much as we can. tc_codel_xstats fields don't need to be latched atomically, otherwise this bug would have been caught earlier. No change in kernel size: $ scripts/bloat-o-meter -t vmlinux.0 vmlinux add/remove: 0/0 grow/shrink: 1/1 up/down: 3/-1 (2) Function old new delta codel_qdisc_dequeue 2462 2465 +3 codel_dump_stats 250 249 -1 Total: Before=29739919, After=29739921, chg +0.00% Fixes: `76e3cc126b` ("codel: Controlled Delay AQM") Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260407143053.1570620-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-08 19:18:52 -07:00
Aleksander Jan Bajkowski	dbc2bb4e87	net: phy: realtek: get rid of magic numbers in rtl8201_config_intr() Replace the magic numbers with defines. Register names were obtained from publicly available documentation[1]. This should make it clear what's going on in the code. 1. RTL8201F/RTL8201FL/RTL8201FN Rev. 1.4 Datasheet Signed-off-by: Aleksander Jan Bajkowski <olek2@wp.pl> Reviewed-by: Daniel Golle <daniel@makrotopia.org> Reviewed-by: Nicolai Buchwitz nb@tipi-net.de Link: https://patch.msgid.link/20260406201222.1043396-1-olek2@wp.pl Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-08 19:17:17 -07:00

1 2 3 4 5 ...

1431226 Commits