linux

mirror of https://github.com/torvalds/linux.git synced 2026-04-18 14:53:58 -04:00

Author	SHA1	Message	Date
Birger Koblitz	ebe5fd2ed2	r8152: Add support for 5Gbit Link Speeds and EEE The RTL8157 supports 5GBit Link speeds. Add support for this speed in the setup and setting/getting through ethtool. Also add 5GBit EEE. Add functionality for setup and ethtool get/set methods. Signed-off-by: Birger Koblitz <mail@birger-koblitz.de> Link: https://patch.msgid.link/20260404-rtl8157_next-v7-1-039121318f23@birger-koblitz.de Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-04-09 12:16:46 +02:00
Jakub Kicinski	58dd34dbd5	Merge branch 'devlink-add-per-port-resource-support' Tariq Toukan says: ==================== devlink: add per-port resource support This series by Or adds devlink per-port resource support: Currently, devlink resources are only available at the device level. However, some resources are inherently per-port, such as the maximum number of subfunctions (SFs) that can be created on a specific PF port. This limitation prevents user space from obtaining accurate per-port capacity information. This series adds infrastructure for per-port resources in devlink core and implements it in the mlx5 driver to expose the max_SFs resource on PF devlink ports. Patch #1 refactors resource functions to be generic Patch #2 adds port-level resource registration infrastructure Patch #3 registers SF resource on PF port representor in mlx5 Patch #4 adds devlink port resource registration to netdevsim for testing Patch #5 adds dump support for device-level resources Patch #6 includes port resources in the resource dump dumpit path Patch #7 adds port-specific option to resource dump doit path Patch #8 adds selftest for devlink port resource doit Patch #9 documents port-level resources and full dump Patch #10 adds resource scope filtering to resource dump Patch #11 adds selftest for resource dump and scope filter Patch #12 documents resource scope filtering ==================== Link: https://patch.msgid.link/20260407194107.148063-1-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-08 19:55:43 -07:00
Or Har-Toov	78c327c172	devlink: Document resource scope filtering Document the scope parameter for devlink resource show, which allows filtering the dump to device-level or port-level resources only. Signed-off-by: Or Har-Toov <ohartoov@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20260407194107.148063-13-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-08 19:55:39 -07:00
Or Har-Toov	2a8e912352	selftest: netdevsim: Add resource dump and scope filter test Add resource_dump_test() which verifies dumping resources for all devices and ports, and tests that scope=dev returns only device-level resources and scope=port returns only port resources. Skip if userspace does not support the scope parameter. Signed-off-by: Or Har-Toov <ohartoov@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20260407194107.148063-12-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-08 19:55:39 -07:00
Or Har-Toov	1bc45341a6	devlink: Add resource scope filtering to resource dump Allow filtering the resource dump to device-level or port-level resources using the 'scope' option. Example - dump only device-level resources: $ devlink resource show scope dev pci/0000:03:00.0: name max_local_SFs size 128 unit entry dpipe_tables none name max_external_SFs size 128 unit entry dpipe_tables none pci/0000:03:00.1: name max_local_SFs size 128 unit entry dpipe_tables none name max_external_SFs size 128 unit entry dpipe_tables none Example - dump only port-level resources: $ devlink resource show scope port pci/0000:03:00.0/196608: name max_SFs size 128 unit entry dpipe_tables none pci/0000:03:00.0/196609: name max_SFs size 128 unit entry dpipe_tables none pci/0000:03:00.1/196708: name max_SFs size 128 unit entry dpipe_tables none pci/0000:03:00.1/196709: name max_SFs size 128 unit entry dpipe_tables none Signed-off-by: Or Har-Toov <ohartoov@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20260407194107.148063-11-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-08 19:55:39 -07:00
Or Har-Toov	170e160a0e	devlink: Document port-level resources and full dump Document the port-level resource support and the option to dump all resources, including both device-level and port-level entries. Signed-off-by: Or Har-Toov <ohartoov@nvidia.com> Reviewed-by: Shay Drori <shayd@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20260407194107.148063-10-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-08 19:55:39 -07:00
Or Har-Toov	3961353771	selftest: netdevsim: Add devlink port resource doit test Tests that querying a specific port handle returns the expected resource name and size. Signed-off-by: Or Har-Toov <ohartoov@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20260407194107.148063-9-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-08 19:55:39 -07:00
Or Har-Toov	7511ff14f3	devlink: Add port-specific option to resource dump doit Allow querying devlink resources per-port via the resource-dump doit handler. When a port-index attribute is provided, only that port's resources are returned. When no port-index is given, only device-level resources are returned, preserving backward compatibility. Signed-off-by: Or Har-Toov <ohartoov@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20260407194107.148063-8-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-08 19:55:39 -07:00
Or Har-Toov	810b76394d	devlink: Include port resources in resource dump dumpit Allow querying devlink resources per-port via the resource-dump dumpit handler. Both device-level and all ports resources are included in the reply. For example: $ devlink resource show pci/0000:03:00.0: name local_max_SFs size 508 unit entry name external_max_SFs size 508 unit entry pci/0000:03:00.0/196608: name max_SFs size 20 unit entry pci/0000:03:00.1: name local_max_SFs size 508 unit entry name external_max_SFs size 508 unit entry pci/0000:03:00.1/262144: name max_SFs size 20 unit entry Signed-off-by: Or Har-Toov <ohartoov@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20260407194107.148063-7-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-08 19:55:38 -07:00
Or Har-Toov	11636b550e	devlink: Add dump support for device-level resources Add dumpit handler for resource-dump command to iterate over all devlink devices and show their resources. $ devlink resource show pci/0000:08:00.0: name local_max_SFs size 508 unit entry name external_max_SFs size 508 unit entry pci/0000:08:00.1: name local_max_SFs size 508 unit entry name external_max_SFs size 508 unit entry Signed-off-by: Or Har-Toov <ohartoov@nvidia.com> Reviewed-by: Shay Drori <shayd@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20260407194107.148063-6-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-08 19:55:38 -07:00
Or Har-Toov	085b234b28	netdevsim: Add devlink port resource registration Register port-level resources for netdevsim ports to enable testing of the port resource infrastructure. Signed-off-by: Or Har-Toov <ohartoov@nvidia.com> Reviewed-by: Shay Drori <shayd@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20260407194107.148063-5-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-08 19:55:38 -07:00
Or Har-Toov	4be8326d81	net/mlx5: Register SF resource on PF port representor The device-level "resource show" displays max_local_SFs and max_external_SFs without indicating which port each resource belongs to. Users cannot determine the controller number and pfnum associated with each SF pool. Register max_SFs resource on the host PF representor port to expose per-port SF limits. Users can correlate the port resource with the controller number and pfnum shown in 'devlink port show'. Future patches will introduce an ECPF that manages multiple PFs, where each PF has its own SF pool. Example usage: $ devlink resource show pci/0000:03:00.0/196608 pci/0000:03:00.0/196608: name max_SFs size 20 unit entry $ devlink port show pci/0000:03:00.0/196608 pci/0000:03:00.0/196608: type eth netdev pf0hpf flavour pcipf controller 1 pfnum 0 external true splittable false function: hw_addr b8:3f:d2:e1:8f:dc roce enable max_io_eqs 120 We can create up to 20 SFs over devlink port pci/0000:03:00.0/196608, with pfnum 0 and controller 1. Signed-off-by: Or Har-Toov <ohartoov@nvidia.com> Reviewed-by: Shay Drori <shayd@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20260407194107.148063-4-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-08 19:55:38 -07:00
Or Har-Toov	6f38acfed5	devlink: Add port-level resource registration infrastructure The current devlink resource infrastructure supports only device-level resources. Some hardware resources are associated with specific ports rather than the entire device, and today we have no way to show resource per-port. Add support for registering resources at the port level. Signed-off-by: Or Har-Toov <ohartoov@nvidia.com> Reviewed-by: Shay Drori <shayd@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20260407194107.148063-3-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-08 19:55:38 -07:00
Or Har-Toov	7be3163c49	devlink: Refactor resource functions to be generic Currently the resource functions take devlink pointer as parameter and take the resource list from there. Allow resource functions to work with other resource lists that will be added in next patches and not only with the devlink's resource list. Signed-off-by: Or Har-Toov <ohartoov@nvidia.com> Reviewed-by: Shay Drori <shayd@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20260407194107.148063-2-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-08 19:55:38 -07:00
Leon Hwang	5ae4ba98d7	selftests/drivers/net: Add an xdp test to xdp.py In "bpf: Disallow freplace on XDP with mismatched xdp_has_frags values" [1], this XDP test is suggested to add to xdp.py. 1. Verify the failure of updating frag-capable prog with non-frag-capable prog, when the frag-capable prog attaches to mtu=9k driver. The test has been verified against Mellanox CX6 and Intel 82599ES NICs. With dropping other tests, here is the test log. # ethtool -i eth0 driver: mlx5_core version: 6.19.0-061900-generic # NETIF=eth0 python3 xdp.py TAP version 13 1..1 ok 1 xdp.test_xdp_native_update_mb_to_sb # Totals: pass:1 fail:0 xfail:0 xpass:0 skip:0 error:0 # ethtool -i eth0 driver: ixgbe version: 6.19.0-061900-generic # NETIF=eth0 python3 xdp.py TAP version 13 1..1 # CMD: ip link set dev eth0 xdpdrv obj /path/to/tools/testing/selftests/net/lib/xdp_dummy.bpf.o sec xdp.frags # EXIT: 2 # STDERR: RTNETLINK answers: Invalid argument ok 1 xdp.test_xdp_native_update_mb_to_sb # SKIP device does not support multi-buffer XDP # Totals: pass:0 fail:0 xfail:0 xpass:0 skip:1 error:0 Signed-off-by: Leon Hwang <leon.huangfu@shopee.com> Link: https://patch.msgid.link/20260406072655.368173-1-leon.huangfu@shopee.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-08 19:48:56 -07:00
Jakub Kicinski	68911235cf	Merge branch 'dsa_loop-and-platform_data-cleanups' Vladimir Oltean says: ==================== dsa_loop and platform_data cleanups While working to add some new features to dsa_loop, I gathered a number of cleanup patches. They mostly remove some data structures that became unused after the multi-switch platforms were migrated to the modern DT bindings. ==================== Link: https://patch.msgid.link/20260406212158.721806-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-08 19:38:56 -07:00
Vladimir Oltean	da9008674d	net: dsa: eliminate <linux/dsa/loop.h> There is no reason at all to export these data types to the global include directory. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://patch.msgid.link/20260406212158.721806-5-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-08 19:38:52 -07:00
Vladimir Oltean	c3b09190e6	net: dsa: remove unused platform_data definitions Pretty self-explanatory, nobody needs these. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://patch.msgid.link/20260406212158.721806-4-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-08 19:38:52 -07:00
Vladimir Oltean	dc915f375e	net: dsa: clean up struct dsa_chip_data This has accumulated some fields which are no longer parsed by the core or set by any driver. Remove them. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://patch.msgid.link/20260406212158.721806-3-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-08 19:38:52 -07:00
Vladimir Oltean	b773b99352	net: dsa: remove struct platform_data This is not used anywhere in the kernel. Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://patch.msgid.link/20260406212158.721806-2-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-08 19:38:52 -07:00
Jakub Kicinski	3723c3b656	Merge branch 'mptcp-autotune-related-improvement' Matthieu Baerts says: ==================== mptcp: autotune related improvement Here are two patches from Paolo that were crafted a couple of months ago, but needed more validation because they were indirectly causing instabilities in the selftests. The root cause has been fixed in 'net' recently in commit `8c09412e58` ("selftests: mptcp: more stable simult_flows tests"). These patches refactor the receive space and RTT estimator, overall making DRS more correct while avoiding receive buffer drifting to tcp_rmem[2], which in turn makes the throughput more stable and less bursty, especially with high bandwidth and low delay environments. Note that the first patch addresses a very old issue. 'net-next' is targeted because the change is quite invasive and based on a recent backlog refactor. The 'Fixes' tag is then there more as a FYI, because backporting this patch will quickly be blocked due to large conflicts. ==================== Link: https://patch.msgid.link/20260407-net-next-mptcp-reduce-rbuf-v2-0-0d1d135bf6f6@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-08 19:32:05 -07:00
Paolo Abeni	7272d8131a	mptcp: add receive queue awareness in tcp_rcv_space_adjust() This is the MPTCP counter-part of commit `ea33537d82` ("tcp: add receive queue awareness in tcp_rcv_space_adjust()"). Prior to this commit: ESTAB 33165568 0 192.168.255.2:5201 192.168.255.1:53380 \ skmem:(r33076416,rb33554432,t0,tb91136,f448,w0,o0,bl0,d0) After: ESTAB 3279168 0 192.168.255.2:5201 192.168.255.1:53042 \ skmem:(r3190912,rb3719956,t0,tb91136,f1536,w0,o0,bl0,d0) Same throughput. Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20260407-net-next-mptcp-reduce-rbuf-v2-2-0d1d135bf6f6@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-08 19:32:01 -07:00
Paolo Abeni	d2000361e4	mptcp: better mptcp-level RTT estimator The current MPTCP-level RTT estimator has several issues. On high speed links, the MPTCP-level receive buffer auto-tuning happens with a frequency well above the TCP-level's one. That in turn can cause excessive/unneeded receive buffer increase. On such links, the initial rtt_us value is considerably higher than the actual delay, and the current mptcp_rcv_space_adjust() updates msk->rcvq_space.rtt_us with a period equal to that field's previous value. If the initial rtt_us is 40ms, its first update will happen after 40ms, even if the subflows see actual RTT orders of magnitude lower. Additionally: - setting the msk RTT to the maximum among all the subflows RTTs makes DRS constantly overshoot the rcvbuf size when a subflow has considerably higher latency than the other(s). - during unidirectional bulk transfers with multiple active subflows, the TCP-level RTT estimator occasionally sees considerably higher value than the real link delay, i.e. when the packet scheduler reacts to an incoming ACK on given subflow pushing data on a different subflow. - currently inactive but still open subflows (i.e. switched to backup mode) are always considered when computing the msk-level RTT. Address all the issues above with a more accurate RTT estimation strategy: the MPTCP-level RTT is set to the minimum of all the subflows actually feeding data into the MPTCP receive buffer, using a small sliding window. While at it, also use EWMA to compute the msk-level scaling_ratio, so that MPTCP can avoid traversing the subflow list in mptcp_rcv_space_adjust(). Use some care to avoid updating msk and ssk level fields too often.
Fixes: `a6b118febb` ("mptcp: add receive buffer auto-tuning") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20260407-net-next-mptcp-reduce-rbuf-v2-1-0d1d135bf6f6@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-08 19:32:00 -07:00
Jiayuan Chen	1a6b396538	net: initialize sk_rx_queue_mapping in sk_clone() sk_clone() initializes sk_tx_queue_mapping via sk_tx_queue_clear() but does not initialize sk_rx_queue_mapping. Since this field is in the sk_dontcopy region, it is neither copied from the parent socket by sock_copy() nor zeroed by sk_prot_alloc() (called without __GFP_ZERO from sk_clone). Commit `03cfda4fa6` ("tcp: fix another uninit-value (sk_rx_queue_mapping)") attempted to fix this by introducing sk_mark_napi_id_set() with force_set=true in tcp_child_process(). However, sk_mark_napi_id_set() -> sk_rx_queue_set() only writes when skb_rx_queue_recorded(skb) is true. If the 3-way handshake ACK arrives through a device that does not record rx_queue (e.g. loopback or veth), sk_rx_queue_mapping remains uninitialized. When a subsequent data packet arrives with a recorded rx_queue, sk_mark_napi_id() -> sk_rx_queue_update() reads the uninitialized field for comparison (force_set=false path), triggering KMSAN. This was reproduced by establishing a TCP connection over loopback (which does not call skb_record_rx_queue), then attaching a BPF TC program on lo ingress to set skb->queue_mapping on data packets: BUG: KMSAN: uninit-value in tcp_v4_do_rcv (net/ipv4/tcp_ipv4.c:1875) tcp_v4_do_rcv (net/ipv4/tcp_ipv4.c:1875) tcp_v4_rcv (net/ipv4/tcp_ipv4.c:2287) ip_protocol_deliver_rcu (net/ipv4/ip_input.c:207) ip_local_deliver_finish (net/ipv4/ip_input.c:242) ip_local_deliver (net/ipv4/ip_input.c:262) ip_rcv (net/ipv4/ip_input.c:573) __netif_receive_skb (net/core/dev.c:6294) process_backlog (net/core/dev.c:6646) __napi_poll (net/core/dev.c:7710) net_rx_action (net/core/dev.c:7929) handle_softirqs (kernel/softirq.c:623) do_softirq (kernel/softirq.c:523) __local_bh_enable_ip (kernel/softirq.c:?) __dev_queue_xmit (net/core/dev.c:?) 
ip_finish_output2 (net/ipv4/ip_output.c:237) ip_output (net/ipv4/ip_output.c:438) __ip_queue_xmit (net/ipv4/ip_output.c:534) __tcp_transmit_skb (net/ipv4/tcp_output.c:1693) tcp_write_xmit (net/ipv4/tcp_output.c:3064) tcp_sendmsg_locked (net/ipv4/tcp.c:?) tcp_sendmsg (net/ipv4/tcp.c:1465) inet_sendmsg (net/ipv4/af_inet.c:865) sock_write_iter (net/socket.c:1195) vfs_write (fs/read_write.c:688) ... Uninit was created at: kmem_cache_alloc_noprof (mm/slub.c:4873) sk_prot_alloc (net/core/sock.c:2239) sk_alloc (net/core/sock.c:2301) inet_create (net/ipv4/af_inet.c:334) __sock_create (net/socket.c:1605) __sys_socket (net/socket.c:1747) Fix this at the root by adding sk_rx_queue_clear() alongside sk_tx_queue_clear() in sk_clone(). Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260407084219.95718-1-jiayuan.chen@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-08 19:28:05 -07:00
Ioana Ciornei	fff75dba79	selftests: forwarding: lib: rewrite processing of command line arguments The piece of code which processes the command line arguments and populates NETIFS based on them is really unobvious. Rewrite it so that the intention is clear and the code is easy to follow. Suggested-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Link: https://patch.msgid.link/20260407102058.867279-1-ioana.ciornei@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-08 19:26:44 -07:00
Florian Fainelli	686a7587bd	net: bcmasp: Switch to page pool for RX path This shows an improvement of 1.9% in reducing the CPU cycles and data cache misses. Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com> Reviewed-by: Justin Chen <justin.chen@broadcom.com> Reviewed-by: Nicolai Buchwitz <nb@tipi-net.de> Link: https://patch.msgid.link/20260408001813.635679-1-florian.fainelli@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-08 19:24:12 -07:00
Eric Dumazet	202ab59941	net: dropreason: add MACVLAN_BROADCAST_BACKLOG and IPVLAN_MULTICAST_BACKLOG ipvlan and macvlan use queues to process broadcast/multicast packets from a work queue. Under attack these queues can drop packets. Add MACVLAN_BROADCAST_BACKLOG drop_reason for macvlan broadcast queue. Add IPVLAN_MULTICAST_BACKLOG drop_reason for ipvlan multicast queue. Use different reasons as some deployments use both ipvlan and macvlan. Also change ipvlan_rcv_frame() to use SKB_DROP_REASON_DEV_READY when the device is not UP. Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260407150710.1640747-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-08 19:19:18 -07:00
Eric Dumazet	ea25e03da7	codel: annotate data-races in codel_dump_stats() codel_dump_stats() only runs with RTNL held, reading fields that can be changed in qdisc fast path. Add READ_ONCE()/WRITE_ONCE() annotations. Alternative would be to acquire the qdisc spinlock, but our long-term goal is to make qdisc dump operations lockless as much as we can. tc_codel_xstats fields don't need to be latched atomically, otherwise this bug would have been caught earlier. No change in kernel size: $ scripts/bloat-o-meter -t vmlinux.0 vmlinux add/remove: 0/0 grow/shrink: 1/1 up/down: 3/-1 (2) Function old new delta codel_qdisc_dequeue 2462 2465 +3 codel_dump_stats 250 249 -1 Total: Before=29739919, After=29739921, chg +0.00% Fixes: `76e3cc126b` ("codel: Controlled Delay AQM") Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260407143053.1570620-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-08 19:18:52 -07:00
Aleksander Jan Bajkowski	dbc2bb4e87	net: phy: realtek: get rid of magic numbers in rtl8201_config_intr() Replace the magic numbers with defines. Register names were obtained from publicly available documentation[1]. This should make it clear what's going on in the code. 1. RTL8201F/RTL8201FL/RTL8201FN Rev. 1.4 Datasheet Signed-off-by: Aleksander Jan Bajkowski <olek2@wp.pl> Reviewed-by: Daniel Golle <daniel@makrotopia.org> Reviewed-by: Nicolai Buchwitz <nb@tipi-net.de> Link: https://patch.msgid.link/20260406201222.1043396-1-olek2@wp.pl Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-08 19:17:17 -07:00
Xiang Mei	f81f4e79b1	bonding: remove unused bond_is_first_slave and bond_is_last_slave macros Since commit `2884bf72fb` ("net: bonding: fix use-after-free in bond_xmit_broadcast()"), bond_is_last_slave() was only used in bond_xmit_broadcast(). After the recent fix replaced that usage with a simple index comparison, bond_is_last_slave() has no remaining callers. bond_is_first_slave() likewise has no callers. Remove both unused macros. Signed-off-by: Xiang Mei <xmei5@asu.edu> Link: https://patch.msgid.link/20260404220412.444753-1-xmei5@asu.edu Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-08 19:07:08 -07:00
Jakub Kicinski	bd5c24e400	docs: netdev: improve wording of reviewer guidance Reword the reviewer guidance based on behavior we see on the list. Steer folks: - towards sending tags - away from process issues. Reviewed-by: Joe Damato <joe@dama.to> Reviewed-by: Nicolai Buchwitz <nb@tipi-net.de> Link: https://patch.msgid.link/20260406175334.3153451-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-08 19:03:00 -07:00
Jakub Kicinski	1795654f00	Merge tag 'nf-next-26-04-08' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next Florian Westphal says: ==================== netfilter: updates for net-next 1) Fix ancient sparse warnings in nf conntrack nat modules, from Sun Jian. 2) Fix typo in enum description, from Jelle van der Waa. 3) remove redundant refetch of netns pointer in nf_conntrack_sip. 4) add a deprecation warning for dccp match. We can extend the deadline later if needed, but plan atm is to remove the feature. 5) remove nf_conntrack_h323 debug code that can read out-of-bounds with malformed messages. This code was commented out, but better remove this. 6+7) add more netlink policy validations in netfilter. This could theoretically cause issues when a client sends e.g. unsupported feature flags that were previously ignored, so we may have to relax some changes. For now, try to be stricter and reject upfront. 8+9) minor code cleanup in nft_set_pipapo (an nftables set backend). 10) Add nftables matching support for double-tagged vlan and pppoe frames, from Pablo Neira Ayuso. 11) Fix up indentation of debug messages in nf_conntrack_h323 conntrack helper, from David Laight. 12) Add a helper to iterate to next flow action and bail out if the maximum number of actions is reached, also from Pablo.
* tag 'nf-next-26-04-08' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next: netfilter: nf_tables_offload: add nft_flow_action_entry_next() and use it netfilter: nf_conntrack_h323: Correct indentation when H323_TRACE defined netfilter: nft_meta: add double-tagged vlan and pppoe support netfilter: nft_set_pipapo_avx2: remove redundant loop in lookup_slow netfilter: nft_set_pipapo: increment data in one step netfilter: nf_tables: add netlink policy based cap on registers netfilter: add more netlink-based policy range checks netfilter: nf_conntrack_h323: remove unreliable debug code in decode_octstr netfilter: add deprecation warning for dccp support netfilter: nf_conntrack_sip: remove net variable shadowing netfilter: nf_tables: Fix typo in enum description netfilter: use function typedefs for __rcu NAT helper hook pointers ==================== Link: https://patch.msgid.link/20260408060419.25258-1-fw@strlen.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-08 18:58:08 -07:00
Jakub Kicinski	ea0f90d1ed	Merge tag 'ipsec-next-2026-04-08' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next Steffen Klassert says: ==================== pull request (net-next): ipsec-next 2026-04-08 1) Update outdated comment in xfrm_dst_check(). From kexinsun. 2) Drop support for HMAC-RIPEMD-160 from IPsec. From Eric Biggers. * tag 'ipsec-next-2026-04-08' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next: xfrm: Drop support for HMAC-RIPEMD-160 xfrm: update outdated comment ==================== Link: https://patch.msgid.link/20260408094258.148555-1-steffen.klassert@secunet.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-08 18:51:54 -07:00
Pablo Neira Ayuso	c6f8557758	netfilter: nf_tables_offload: add nft_flow_action_entry_next() and use it Add a new helper function to retrieve the next action entry in flow rule, check if the maximum number of actions is reached, bail out in such case. Replace existing opencoded iteration on the action array by this helper function. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Florian Westphal <fw@strlen.de>	2026-04-08 07:51:31 +02:00
David Laight	f33fad8dbf	netfilter: nf_conntrack_h323: Correct indentation when H323_TRACE defined The trace lines are indented using PRINT("%*.s", xx, " "). Userspace will treat this as "%*.0s" and will output no characters when 'xx' is zero, the kernel treats it as "%*s" and will output a single ' ' - which is probably what is intended. Change all the formats to "%*s" removing the default precision. This gives a single space indent when level is zero. Signed-off-by: David Laight <david.laight.linux@gmail.com> Reviewed-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Florian Westphal <fw@strlen.de>	2026-04-08 07:51:31 +02:00
Pablo Neira Ayuso	3785091c6c	netfilter: nft_meta: add double-tagged vlan and pppoe support Currently: add rule netdev x y ip saddr 1.1.1.1 works with neither double-tagged vlan nor pppoe packets. This is because the network and transport header offset are not pointing to the IP and transport protocol headers in the stack. This patch expands NFT_META_PROTOCOL and NFT_META_L4PROTO to parse double-tagged vlan and pppoe packets so matching network and transport header fields becomes possible with the existing userspace generated bytecode. Note that this parser only supports double-tagged vlan which is composed of vlan offload + vlan header in the skb payload area for simplicity. NFT_META_PROTOCOL is used by bridge and netdev family as an implicit dependency in the bytecode to match on network header fields. Similarly, there is also NFT_META_L4PROTO, which is also used as an implicit dependency when matching on the transport protocol header fields. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Florian Westphal <fw@strlen.de>	2026-04-08 07:51:31 +02:00
Florian Westphal	a3f1e6a19a	netfilter: nft_set_pipapo_avx2: remove redundant loop in lookup_slow nft_pipapo_avx2_lookup_slow will never be used in reality, because the common sizes are handled by avx2 optimized versions. However, nft_pipapo_avx2_lookup_slow loops over the data just like the avx2 functions. However, _slow doesn't need to do that. As-is, first loop sets all the right result bits and the next iterations boil down to 'x = x & x'. Remove the loop. Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de>	2026-04-08 07:51:31 +02:00
Florian Westphal	04e1ca21a5	netfilter: nft_set_pipapo: increment data in one step Since commit `e807b13cb3` ("nft_set_pipapo: Generalise group size for buckets") there is no longer a need to increment the data pointer in two steps. Switch to a single invocation of NFT_PIPAPO_GROUPS_PADDED_SIZE() helper, like the avx2 implementation. [ Stefano: Improve commit message ] Reviewed-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de>	2026-04-08 07:51:31 +02:00
Florian Westphal	8e57338c36	netfilter: nf_tables: add netlink policy based cap on registers Should have no effect in practice; all of these use the nft_parse_register_load/store apis which is mandatory anyway due to the need to further validate the register load/store, e.g. that the size argument doesn't result in out-of-bounds load/store. OTOH this is a simple method to reject obviously wrong input at earlier stage. Signed-off-by: Florian Westphal <fw@strlen.de>	2026-04-08 07:51:31 +02:00
Florian Westphal	66b75e6bbe	netfilter: add more netlink-based policy range checks These spots either already check the attribute range manually before use or the consuming functions tolerate unexpected values. Nevertheless, add more range checks via netlink policy so we gain more users and avoid possible re-use in other places that might not have the required manual checks. This also improves error reporting: netlink core can generate extack errors. Signed-off-by: Florian Westphal <fw@strlen.de>	2026-04-08 07:51:30 +02:00
Florian Westphal	390a57dd61	netfilter: nf_conntrack_h323: remove unreliable debug code in decode_octstr The debug code (not enabled in any build) reads up to 6 octets of the input buffer, but does so without bound checks. Zap this. Signed-off-by: Florian Westphal <fw@strlen.de>	2026-04-08 07:51:27 +02:00
Florian Westphal	606bd17ef0	netfilter: add deprecation warning for dccp support Add a deprecation warning for the xt_dccp match and the nft exthdr code. Signed-off-by: Florian Westphal <fw@strlen.de>	2026-04-08 07:51:27 +02:00
Florian Westphal	7970d6aaf7	netfilter: nf_conntrack_sip: remove net variable shadowing net is already set, derived from nf_conn. I don't see how the device could be living in a different netns than the conntrack entry. Remove the extra variable and re-use existing one. Signed-off-by: Florian Westphal <fw@strlen.de>	2026-04-08 07:51:27 +02:00
Jelle van der Waa	1f290c497c	netfilter: nf_tables: Fix typo in enum description Fix the spelling of "options". Signed-off-by: Jelle van der Waa <jelle@vdwaa.nl> Signed-off-by: Florian Westphal <fw@strlen.de>	2026-04-08 07:51:26 +02:00
Sun Jian	6e6f2b9b33	netfilter: use function typedefs for __rcu NAT helper hook pointers After commit `07919126ec` ("netfilter: annotate NAT helper hook pointers with __rcu"), sparse can warn about type/address-space mismatches when RCU-dereferencing NAT helper hook function pointers. The hooks are __rcu-annotated and accessed via rcu_dereference(), but the combination of complex function pointer declarators and the WRITE_ONCE() machinery used by RCU_INIT_POINTER()/rcu_assign_pointer() can confuse sparse and trigger false positives. Introduce typedefs for the NAT helper function types, so __rcu applies to a simple "fn_t __rcu " pointer form. Also replace local typeof(hook) variables with "fn_t " to avoid propagating __rcu address space into temporaries. No functional change intended. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202603022359.3dGE9fwI-lkp@intel.com/ Signed-off-by: Sun Jian <sun.jian.kdev@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de>	2026-04-08 07:51:26 +02:00
Jakub Kicinski	b3e69fc319	Merge branch 'net-pull-gso-packet-headers-in-core-stack' Eric Dumazet says: ==================== net: pull gso packet headers in core stack Most ndo_start_xmit() methods expect headers of gso packets to be already in skb->head. net/core/tso.c users are particularly at risk, because tso_build_hdr() does a memcpy(hdr, skb->data, hdr_len); qdisc_pkt_len_segs_init() already does a dissection of gso packets. Use pskb_may_pull() instead of skb_header_pointer() to make sure drivers do not have to reimplement this. First patch is a small cleanup to ease second patch review. ==================== Link: https://patch.msgid.link/20260403221540.3297753-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-07 19:02:18 -07:00
Eric Dumazet	7fb4c19670	net: pull headers in qdisc_pkt_len_segs_init() Most ndo_start_xmit() methods expect headers of gso packets to be already in skb->head. net/core/tso.c users are particularly at risk, because tso_build_hdr() does a memcpy(hdr, skb->data, hdr_len); qdisc_pkt_len_segs_init() already does a dissection of gso packets. Use pskb_may_pull() instead of skb_header_pointer() to make sure drivers do not have to reimplement this. Some malicious packets could be fed, detect them so that we can drop them sooner with a new SKB_DROP_REASON_SKB_BAD_GSO drop_reason. Fixes: `e876f208af` ("net: Add a software TSO helper API") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Joe Damato <joe@dama.to> Link: https://patch.msgid.link/20260403221540.3297753-3-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-07 19:02:13 -07:00
Eric Dumazet	30e02ec3b4	net: qdisc_pkt_len_segs_init() cleanup Reduce indentation level by returning early if the transport header was not set. Add an unlikely() clause as this is not the common case. No functional change. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Joe Damato <joe@dama.to> Link: https://patch.msgid.link/20260403221540.3297753-2-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-07 19:02:13 -07:00
Jakub Kicinski	e65d8b6f30	selftests: drv-net: adjust to socat changes socat v1.8.1.0 now defaults to shut-null, it sends an extra 0-length UDP packet when sender disconnects. This breaks our tests which expect the exact packet sequence. Add shut-none which was the old default where necessary. Acked-by: Stanislav Fomichev <sdf@fomichev.me> Reviewed-by: Joe Damato <joe@dama.to> Reviewed-by: Breno Leitao <leitao@debian.org> Link: https://patch.msgid.link/20260404230103.2719103-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-07 18:54:03 -07:00
Fernando Fernandez Mancera	2ce8a41113	net: hsr: emit notification for PRP slave2 changed hw addr on port deletion On PRP protocol, when deleting the port the MAC address change notification was missing. In addition to that, make sure to only perform the MAC address change on slave2 deletion and PRP protocol as the operation isn't necessary for HSR nor slave1. Note that the eth_hw_addr_set() is correct on PRP context as the slaves are either in promiscuous mode or forward offload enabled. Reported-by: Luka Gejak <luka.gejak@linux.dev> Closes: https://lore.kernel.org/netdev/DHFCZEM93FTT.1RWFBIE32K7OT@linux.dev/ Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de> Reviewed-by: Felix Maurer <fmaurer@redhat.com> Link: https://patch.msgid.link/20260403123928.4249-2-fmancera@suse.de Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-04-07 17:06:16 +02:00

1430776 Commits