linux

mirror of https://github.com/torvalds/linux.git synced 2026-04-18 14:53:58 -04:00

Author	SHA1	Message	Date
Pablo Neira Ayuso	3785091c6c	netfilter: nft_meta: add double-tagged vlan and pppoe support Currently: add rule netdev x y ip saddr 1.1.1.1 does not work with neither double-tagged vlan nor pppoe packets. This is because the network and transport header offset are not pointing to the IP and transport protocol headers in the stack. This patch expands NFT_META_PROTOCOL and NFT_META_L4PROTO to parse double-tagged vlan and pppoe packets so matching network and transport header fields becomes possible with the existing userspace generated bytecode. Note that this parser only supports double-tagged vlan which is composed of vlan offload + vlan header in the skb payload area for simplicity. NFT_META_PROTOCOL is used by bridge and netdev family as an implicit dependency in the bytecode to match on network header fields. Similarly, there is also NFT_META_L4PROTO, which is also used as an implicit dependency when matching on the transport protocol header fields. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Florian Westphal <fw@strlen.de>	2026-04-08 07:51:31 +02:00
Eric Dumazet	7fb4c19670	net: pull headers in qdisc_pkt_len_segs_init() Most ndo_start_xmit() methods expects headers of gso packets to be already in skb->head. net/core/tso.c users are particularly at risk, because tso_build_hdr() does a memcpy(hdr, skb->data, hdr_len); qdisc_pkt_len_segs_init() already does a dissection of gso packets. Use pskb_may_pull() instead of skb_header_pointer() to make sure drivers do not have to reimplement this. Some malicious packets could be fed, detect them so that we can drop them sooner with a new SKB_DROP_REASON_SKB_BAD_GSO drop_reason. Fixes: `e876f208af` ("net: Add a software TSO helper API") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Joe Damato <joe@dama.to> Link: https://patch.msgid.link/20260403221540.3297753-3-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-07 19:02:13 -07:00
Geliang Tang	eb477fdd68	tcp: add recv_should_stop helper Factor out a new helper tcp_recv_should_stop() from tcp_recvmsg_locked() and tcp_splice_read() to check whether to stop receiving. And use this helper in mptcp_recvmsg() and mptcp_splice_read() to reduce redundant code. Suggested-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Acked-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20260403-net-next-mptcp-msg_eor-misc-v1-3-b0b33bea3fed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-06 19:14:27 -07:00
Daniel Golle	f259e08494	net: dsa: add bridge member iteration macro Drivers that offload bridges need to iterate over the ports that are members of a given bridge, for example to rebuild per-port forwarding bitmaps when membership changes. Currently drivers typically open-code this by combining dsa_switch_for_each_user_port() with a dsa_port_offloads_bridge_dev() check, or cache bridge membership within the driver. Add dsa_switch_for_each_bridge_member() macro to express this pattern directly, and use it for the existing dsa_bridge_ports() inline helper. Signed-off-by: Daniel Golle <daniel@makrotopia.org> Link: https://patch.msgid.link/e7136aaa26773f39e805a00fe4ecf13cd2b83fc0.1775049897.git.daniel@makrotopia.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-06 18:30:33 -07:00
Daniel Golle	b0a79590d1	net: dsa: move dsa_bridge_ports() helper to dsa.h The yt921x driver contains a helper to create a bitmap of ports which are members of a bridge. Move the helper as static inline function into dsa.h, so other driver can make use of it as well. Signed-off-by: Daniel Golle <daniel@makrotopia.org> Link: https://patch.msgid.link/4f8bbfce3e4e3a02064fc4dc366263136c6e0383.1775049897.git.daniel@makrotopia.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-06 18:30:33 -07:00
Jakub Kicinski	8ffb33d770	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Cross-merge networking fixes after downstream PR (net-7.0-rc7). Conflicts: net/vmw_vsock/af_vsock.c `b18c833888` ("vsock: initialize child_ns_mode_locked in vsock_net_init()") `0de607dc4f` ("vsock: add G2H fallback for CIDs not owned by H2G transport") Adjacent changes: drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c `ceee35e567` ("bnxt_en: Refactor some basic ring setup and adjustment logic") `57cdfe0dc7` ("bnxt_en: Resize RSS contexts on channel count change") drivers/net/wireless/intel/iwlwifi/mld/mac80211.c `4d56037a02` ("wifi: iwlwifi: mld: block EMLSR during TDLS connections") `687a95d204` ("wifi: iwlwifi: mld: correctly set wifi generation data") drivers/net/wireless/intel/iwlwifi/mld/scan.h `b6045c899e` ("wifi: iwlwifi: mld: Refactor scan command handling") `ec66ec6a5a` ("wifi: iwlwifi: mld: Fix MLO scan timing") drivers/net/wireless/intel/iwlwifi/mvm/fw.c `078df640ef` ("wifi: iwlwifi: mld: add support for iwl_mcc_allowed_ap_type_cmd v 2") `323156c354` ("wifi: iwlwifi: mvm: don't send a 6E related command when not supported") Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-02 11:03:13 -07:00
Jeremy Kerr	22cb45afd2	net: mctp: perform source address lookups when we populate our dst Rather than querying the output device for its address in mctp_local_output, set up the source address when we're populating the dst structure. If no address is assigned, use MCTP_ADDR_NULL. This will allow us more flexibility when routing for NULL-source-eid cases. For now though, we still reject a NULL source address in the output path. We need to update the tests a little, so that addresses are assigned before we do the dst lookups. Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Link: https://patch.msgid.link/20260331-dev-mctp-null-eids-v1-1-b4d047372eaf@codeconstruct.com.au Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-04-02 13:31:36 +02:00
Fernando Fernandez Mancera	964870b4b9	ipv6: remove ipv6_stub infrastructure completely As IPv6 is built-in only and there are no more users of ipv6_stub, the ipv6_stub is now entirely obsolete. Remove all the code related to the definition, initialization and usage. Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de> Tested-by: Ricardo B. Marlière <rbm@suse.com> Link: https://patch.msgid.link/20260325120928.15848-11-fmancera@suse.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-29 11:21:24 -07:00
Fernando Fernandez Mancera	ad84b1eefe	bpf: remove ipv6_bpf_stub completely and use direct function calls As IPv6 is built-in only, the ipv6_bpf_stub can be removed completely. Convert all ipv6_bpf_stub usage to direct function calls instead. The fallback functions introduced previously will prevent linkage errors when CONFIG_IPV6 is disabled. Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de> Tested-by: Ricardo B. Marlière <rbm@suse.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://patch.msgid.link/20260325120928.15848-10-fmancera@suse.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-29 11:21:24 -07:00
Fernando Fernandez Mancera	d76f6b170a	net: convert remaining ipv6_stub users to direct function calls As IPv6 is built-in only, the ipv6_stub infrastructure is no longer necessary. Convert remaining ipv6_stub users to make direct function calls. The fallback functions introduced previously will prevent linkage errors when CONFIG_IPV6 is disabled. Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de> Tested-by: Ricardo B. Marlière <rbm@suse.com> Link: https://patch.msgid.link/20260325120928.15848-9-fmancera@suse.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-29 11:21:23 -07:00
Fernando Fernandez Mancera	4b70b20215	ipv6: prepare headers for ipv6_stub removal In preparation for dropping ipv6_stub and converting its users to direct function calls, introduce static inline dummy functions and fallback macros in the IPv6 networking headers. In addition, introduce checks on fib6_nh_init(), ip6_dst_lookup_flow() and ip6_fragment() to avoid a crash due to ipv6.disable=1 set during booting. The other functions are safe as they cannot be called with ipv6.disable=1 set. These fallbacks ensure that when CONFIG_IPV6 is completely disabled, there are no compiling or linking errors due to code paths not guarded by preprocessor macro IS_ENABLED(CONFIG_IPV6). In addition, export ndisc_send_na(), ip6_route_input() and ip6_fragment(). Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de> Tested-by: Ricardo B. Marlière <rbm@suse.com> Link: https://patch.msgid.link/20260325120928.15848-6-fmancera@suse.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-29 11:21:23 -07:00
Fernando Fernandez Mancera	fde39f7df1	ipv6: replace IS_BUILTIN(CONFIG_IPV6) with IS_ENABLED(CONFIG_IPV6) As IPv6 is built-in only, it does not make sense to continue using IS_BUILTIN(CONFIG_IPV6). Therefore, replace it with IS_ENABLED() when necessary and drop it if it isn't valid anymore. Notice that there is still one instance related to ICMPv6, as it requires more changes it will be handle separately. Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de> Tested-by: Ricardo B. Marlière <rbm@suse.com> Acked-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://patch.msgid.link/20260325120928.15848-4-fmancera@suse.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-29 11:21:23 -07:00
Fernando Fernandez Mancera	0557a34487	net: remove EXPORT_IPV6_MOD() and EXPORT_IPV6_MOD_GPL() macros As IPv6 is built-in only, the macro is always evaluating to an empty one. Remove it completely from the code. Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de> Link: https://patch.msgid.link/20260325120928.15848-3-fmancera@suse.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-29 11:21:22 -07:00
Jiayuan Chen	552994294f	tcp: Fix inconsistent indenting warning Suppress such warning reported by test robot: include/net/tcp.h:1449 tcp_ca_event() warn: inconsistent indenting Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202603251430.gQ3VuiKV-lkp@intel.com/ Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260325071854.805-1-jiayuan.chen@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-26 20:44:45 -07:00
Sabrina Dubroca	629ec78ef8	mpls: add seqcount to protect the platform_label{,s} pair The RCU-protected codepaths (mpls_forward, mpls_dump_routes) can have an inconsistent view of platform_labels vs platform_label in case of a concurrent resize (resize_platform_label_table, under platform_mutex). This can lead to OOB accesses. This patch adds a seqcount, so that we get a consistent snapshot. Note that mpls_label_ok is also susceptible to this, so the check against RTA_DST in rtm_to_route_config, done outside platform_mutex, is not sufficient. This value gets passed to mpls_label_ok once more in both mpls_route_add and mpls_route_del, so there is no issue, but that additional check must not be removed. Reported-by: Yuan Tan <tanyuan98@outlook.com> Reported-by: Yifan Wu <yifanwucs@gmail.com> Reported-by: Juefei Pu <tomapufckgml@gmail.com> Reported-by: Xin Liu <bird@lzu.edu.cn> Fixes: `7720c01f3f` ("mpls: Add a sysctl to control the size of the mpls label table") Fixes: `dde1b38e87` ("mpls: Convert mpls_dump_routes() to RCU.") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Link: https://patch.msgid.link/cd8fca15e3eb7e212b094064cd83652e20fd9d31.1774284088.git.sd@queasysnail.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-26 18:32:14 -07:00
Jakub Kicinski	dbd94b9831	Merge tag 'wireless-next-2026-03-26' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next Johannes Berg says: ==================== A fairly big set of changes all over, notably with: - cfg80211: new APIs for NAN (Neighbor Aware Networking, aka Wi-Fi Aware) so less work must be in firmware - mt76: - mt7996/mt7925 MLO fixes/improvements - mt7996 NPU support (HW eth/wifi traffic offload) - iwlwifi: UNII-9 and continuing UHR work * tag 'wireless-next-2026-03-26' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (230 commits) wifi: mac80211: ignore reserved bits in reconfiguration status wifi: cfg80211: allow protected action frame TX for NAN wifi: ieee80211: Add some missing NAN definitions wifi: nl80211: Add a notification to notify NAN channel evacuation wifi: nl80211: add NL80211_CMD_NAN_ULW_UPDATE notification wifi: nl80211: allow reporting spurious NAN Data frames wifi: cfg80211: allow ToDS=0/FromDS=0 data frames on NAN data interfaces wifi: nl80211: define an API for configuring the NAN peer's schedule wifi: nl80211: add support for NAN stations wifi: cfg80211: separately store HT, VHT and HE capabilities for NAN wifi: cfg80211: add support for NAN data interface wifi: cfg80211: make sure NAN chandefs are valid wifi: cfg80211: Add an API to configure local NAN schedule wifi: mac80211: cleanup error path of ieee80211_do_open wifi: mac80211: extract channel logic from link logic wifi: iwlwifi: mld: set RX_FLAG_RADIOTAP_TLV_AT_END generically wifi: iwlwifi: reduce the number of prints upon firmware crash wifi: iwlwifi: fix the description of SESSION_PROTECTION_CMD wifi: iwlwifi: mld: introduce iwl_mld_vif_fw_id_valid wifi: iwlwifi: mld: block EMLSR during TDLS connections ... ==================== Link: https://patch.msgid.link/20260326152021.305959-3-johannes@sipsolutions.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-26 18:17:14 -07:00
Jakub Kicinski	9ebcf66cd6	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Cross-merge networking fixes after downstream PR (net-7.0-rc6). No conflicts, or adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-26 12:09:57 -07:00
Long Li	45b2b84ac6	net: mana: Set default number of queues to 16 Set the default number of queues per vPort to MANA_DEF_NUM_QUEUES (16), as 16 queues can achieve optimal throughput for typical workloads. The actual number of queues may be lower if it exceeds the hardware reported limit. Users can increase the number of queues up to max_queues via ethtool if needed. Signed-off-by: Long Li <longli@microsoft.com> Link: https://patch.msgid.link/20260323194925.1766385-1-longli@microsoft.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-03-26 15:04:31 +01:00
Pablo Neira Ayuso	02a3231b6d	netfilter: nf_conntrack_expect: store netns and zone in expectation __nf_ct_expect_find() and nf_ct_expect_find_get() are called under rcu_read_lock() but they dereference the master conntrack via exp->master. Since the expectation does not hold a reference on the master conntrack, this could be dying conntrack or different recycled conntrack than the real master due to SLAB_TYPESAFE_RCU. Store the netns, the master_tuple and the zone in struct nf_conntrack_expect as a safety measure. This patch is required by the follow up fix not to dump expectations that do not belong to this netns. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2026-03-26 13:24:40 +01:00
Pablo Neira Ayuso	bffcaad9af	netfilter: ctnetlink: ensure safe access to master conntrack Holding reference on the expectation is not sufficient, the master conntrack object can just go away, making exp->master invalid. To access exp->master safely: - Grab the nf_conntrack_expect_lock, this gets serialized with clean_from_lists() which also holds this lock when the master conntrack goes away. - Hold reference on master conntrack via nf_conntrack_find_get(). Not so easy since the master tuple to look up for the master conntrack is not available in the existing problematic paths. This patch goes for extending the nf_conntrack_expect_lock section to address this issue for simplicity, in the cases that are described below this is just slightly extending the lock section. The add expectation command already holds a reference to the master conntrack from ctnetlink_create_expect(). However, the delete expectation command needs to grab the spinlock before looking up for the expectation. Expand the existing spinlock section to address this to cover the expectation lookup. Note that, the nf_ct_expect_iterate_net() calls already grabs the spinlock while iterating over the expectation table, which is correct. The get expectation command needs to grab the spinlock to ensure master conntrack does not go away. This also expands the existing spinlock section to cover the expectation lookup too. I needed to move the netlink skb allocation out of the spinlock to keep it GFP_KERNEL. For the expectation events, the IPEXP_DESTROY event is already delivered under the spinlock, just move the delivery of IPEXP_NEW under the spinlock too because the master conntrack event cache is reached through exp->master. While at it, add lockdep notations to help identify what codepaths need to grab the spinlock. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2026-03-26 13:18:32 +01:00
Pablo Neira Ayuso	9c42bc9db9	netfilter: nf_conntrack_expect: honor expectation helper field The expectation helper field is mostly unused. As a result, the netfilter codebase relies on accessing the helper through exp->master. Always set on the expectation helper field so it can be used to reach the helper. nf_ct_expect_init() is called from packet path where the skb owns the ct object, therefore accessing exp->master for the newly created expectation is safe. This saves a lot of updates in all callsites to pass the ct object as parameter to nf_ct_expect_init(). This is a preparation patches for follow up fixes. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>	2026-03-26 13:18:31 +01:00
Miri Korenblit	154b0296c0	wifi: nl80211: Add a notification to notify NAN channel evacuation If all available channel resources are used for NAN channels, and one of them is shared with another interface, and that interface needs to move to a different channel (for example STA interface that needs to do a channel or a link switch), then the driver can evacuate one of the NAN channels (i.e. detach it from its channel resource and announce to the peers that this channel is ULWed). In that case, the driver needs to notify user space about the channel evacuation, so the user space can adjust the local schedule accordingly. Add a notification to let userspace know about it. Reviewed-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20260219114327.d5bebfd5ff73.Iaaf5ef17e1ab7a38c19d60558e68fcf517e2b400@changeid Link: https://patch.msgid.link/20260318123926.206536-11-miriam.rachel.korenblit@intel.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2026-03-25 20:56:55 +01:00
Miri Korenblit	44ea50a5bf	wifi: nl80211: add NL80211_CMD_NAN_ULW_UPDATE notification Add a new notification command that allows drivers to notify user space when the device's ULW (Unaligned Schedule) blob has been updated. This enables user space to attach the updated ULW blob to frames sent to NAN peers. Reviewed-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20260219114327.32b715af4ebb.Ibdb6e33941afd94abf77245245f87e4338d729d3@changeid Link: https://patch.msgid.link/20260318123926.206536-10-miriam.rachel.korenblit@intel.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2026-03-25 20:56:55 +01:00
Miri Korenblit	f826534483	wifi: nl80211: allow reporting spurious NAN Data frames Currently we have this ability for AP and GO. But it is now needed also for NAN_DATA mode - as per Wi-Fi Aware (TM) 4.0 specification 6.2.5: "If a NAN Device receives a unicast NAN Data frame destined for it, but with A1 address and A2 address that are not assigned to the NDP, it shall discard the frame, and should send a Data Path Termination NAF to the frame transmitter" To allow this, change NL80211_CMD_UNEXPECTED_FRAME to support also NAN_DATA, so drivers can report such cases and the user space can act accordingly. Reviewed-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20260108102921.5cf9f1351655.I47c98ce37843730b8b9eb8bd8e9ef62ed6c17613@changeid Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20260219094725.3846371-6-miriam.rachel.korenblit@intel.com Link: https://patch.msgid.link/20260318123926.206536-9-miriam.rachel.korenblit@intel.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2026-03-25 20:56:55 +01:00
Miri Korenblit	c4aa273ff6	wifi: nl80211: define an API for configuring the NAN peer's schedule Add an NL80211 command to configure the NAN schedule of a NAN peer. Such a schedule contains a list of NAN channels, and a mapping from each time slots to the corresponding channel (or unscheduled). Also contains more information about the schedule, such as sequence ID and map ID. Not all of the restrictions are validated in this patch. In particular, comparison of two maps of the same peer requires storing/retrieving each map of each peer, only for validation. Therefore, it is the responsibilty of the driver to check that. Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20260219114327.5b13fa5af4f6.If0e214ff5b52c9666e985fefa3f7be0ad14d93fb@changeid Link: https://patch.msgid.link/20260318123926.206536-7-miriam.rachel.korenblit@intel.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2026-03-25 20:56:55 +01:00
Miri Korenblit	1f1101c29e	wifi: nl80211: add support for NAN stations There are 2 types of logical links with a NAN peer: - management (NMI), which is used for Tx/Rx of NAN management frames. - data (NDI), which is used for Tx/Rx of data frames, or non-NAN management frames. The NMI station has two roles: - representation of the NAN peer - for example, the peer's schedule and the HT, VHT, HE capabilities - belong to the NMI station, and not to the NDI ones. - Tx/Rx of NAN management frames to/from the peer. The NDI station is used for Tx/Rx data frames of a specific NDP that was established with the NAN peer. Note that a peer can choose to reuse its NMI address as the NDI address. In that case, it is expected that two stations will be added even though they will have the same address. - An NDI station can only be added after the corresponding NMI station was configured with capabilities. - All the NDI stations will be removed before the NDI interface is brought down. - All NMI stations will be removed before NAN is stopped. - Before NMI sta removal, all corresponding NDI stations will be removed Add support for adding, removing, and changing NMI and NDI stations. Reviewed-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20260219114327.d280936ee832.I6d859eee759bb5824a9ffd2984410faf879ba00e@changeid Link: https://patch.msgid.link/20260318123926.206536-6-miriam.rachel.korenblit@intel.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2026-03-25 20:56:54 +01:00
Miri Korenblit	bd11c96604	wifi: cfg80211: separately store HT, VHT and HE capabilities for NAN In NAN, unlike in other modes, there is only one set of (HT, VHT, HE) capabilities that is used for all channels (and bands) used in the NAN data path. This set of capabilities will have to be a special one, for example - have the minimum of (HT-for-5 GHz, HT-for-2.4 GHz), careful handling of the bits that have a different meaning for each band, etc. While we could use the exiting sband/iftype capabilities, and require identical capabilities for all bands (makes no sense since this means that we will have VHT capabilities in the 2.4 GHz slot), or require that only one of the sbands will be set, or have logic to extract the minimum and handle the conflicting bits - it seems simpler to add a dedicated set of capabilities which is special for NAN, and is band agnostic, to be populated by the driver. That way we also let the driver decide how it wants to handle the conflicting bits. Add this special set of these capabilities to wiphy:nan_capabilities, to be populated by the driver. Send it to user space. Reviewed-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20260219114327.4b6f3e4a81b4.I45422adc0df3ad4101d857a92e83f0de5cf241e1@changeid Link: https://patch.msgid.link/20260318123926.206536-5-miriam.rachel.korenblit@intel.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2026-03-25 20:56:54 +01:00
Miri Korenblit	0e8ec738a7	wifi: cfg80211: add support for NAN data interface This new interface type represents a NAN data interface (NDI). It is used for data communication with NAN peers. Note that the existing NL80211_IFTYPE_NAN interface, which is the NAN Management Interface (NMI), is used for management communication. An NDI interface is started when a new NAN data path is about to be established, and is stopped after the NAN data path is terminated. - An NDI interface can only be started if the NMI is running, and NAN is started. - Before the NMI is stopped, the NDI interfaces will be stopped. Add the new interface type, handle add/remove operations for it, and makes sure of the conditions above. Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20260219114327.0d681335c2e2.I92973483e927820ae2297853c141842fdb262747@changeid Link: https://patch.msgid.link/20260318123926.206536-4-miriam.rachel.korenblit@intel.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2026-03-25 20:56:53 +01:00
Miri Korenblit	6e78b70c9a	wifi: cfg80211: Add an API to configure local NAN schedule Add an nl80211 API to allow user space to configure the local NAN schedule. The local schedule consists of a list of channel definitions and a schedule map, in which each element covers a time slot and indicates on what channel the device should be in that time slot. Channels can be added to schedule even without being scheduled, for reservation purposes. A schedule can be configured either immedietally or be deferred, in case there are already connected peers. When the deferred flag is set, the command is a request from the device to perform an announced schedule update: send the updated NAN Availability - as set in this command - to the peers, and do the actual switch to the new schedule on the right time (i.e. at the end of the slot after the slot in which the update was sent to the peers). In addition, a notification will be sent to indicate a deferred update completion. Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com> Link: https://patch.msgid.link/20260219114327.ecca178a2de0.Ic977ab08b4ed5cf9b849e55d3a59b01ad3fbd08e@changeid Link: https://patch.msgid.link/20260318123926.206536-2-miriam.rachel.korenblit@intel.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2026-03-25 20:56:05 +01:00
Eric Dumazet	d1e59a4697	tcp: add cwnd_event_tx_start to tcp_congestion_ops (tcp_congestion_ops)->cwnd_event() is called very often, with @event oscillating between CA_EVENT_TX_START and other values. This is not branch prediction friendly. Provide a new cwnd_event_tx_start pointer dedicated for CA_EVENT_TX_START. Both BBR and CUBIC benefit from this change, since they only care about CA_EVENT_TX_START. No change in kernel size: $ scripts/bloat-o-meter -t vmlinux.0 vmlinux add/remove: 4/4 grow/shrink: 3/1 up/down: 564/-568 (-4) Function old new delta bbr_cwnd_event_tx_start - 450 +450 cubictcp_cwnd_event_tx_start - 70 +70 __pfx_cubictcp_cwnd_event_tx_start - 16 +16 __pfx_bbr_cwnd_event_tx_start - 16 +16 tcp_unregister_congestion_control 93 99 +6 tcp_update_congestion_control 518 521 +3 tcp_register_congestion_control 422 425 +3 __tcp_transmit_skb 3308 3306 -2 __pfx_cubictcp_cwnd_event 16 - -16 __pfx_bbr_cwnd_event 16 - -16 cubictcp_cwnd_event 80 - -80 bbr_cwnd_event 454 - -454 Total: Before=25240512, After=25240508, chg -0.00% Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260323234920.1097858-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-24 21:00:38 -07:00
Jonas Köppeler	815980fe6d	net_sched: codel: fix stale state for empty flows in fq_codel When codel_dequeue() finds an empty queue, it resets vars->dropping but does not reset vars->first_above_time. The reference CoDel algorithm (Nichols & Jacobson, ACM Queue 2012) resets both: dodeque_result codel_queue_t::dodeque(time_t now) { ... if (r.p == NULL) { first_above_time = 0; // <-- Linux omits this } ... } Note that codel_should_drop() does reset first_above_time when called with a NULL skb, but codel_dequeue() returns early before ever calling codel_should_drop() in the empty-queue case. The post-drop code paths do reach codel_should_drop(NULL) and correctly reset the timer, so a dropped packet breaks the cycle -- but the next delivered packet re-arms first_above_time and the cycle repeats. For sparse flows such as ICMP ping (one packet every 200ms-1s), the first packet arms first_above_time, the flow goes empty, and the second packet arrives after the interval has elapsed and gets dropped. The pattern repeats, producing sustained loss on flows that are not actually congested. Test: veth pair, fq_codel, BQL disabled, 30000 iptables rules in the consumer namespace (NAPI-64 cycle ~14ms, well above fq_codel's 5ms target), ping at 5 pps under UDP flood: Before fix: 26% ping packet loss After fix: 0% ping packet loss Fix by resetting first_above_time to zero in the empty-queue path of codel_dequeue(), matching the reference algorithm. Fixes: `76e3cc126b` ("codel: Controlled Delay AQM") Fixes: `d068ca2ae2` ("codel: split into multiple files") Co-developed-by: Jesper Dangaard Brouer <hawk@kernel.org> Signed-off-by: Jesper Dangaard Brouer <hawk@kernel.org> Signed-off-by: Jonas Köppeler <j.koeppeler@tu-berlin.de> Reported-by: Chris Arges <carges@cloudflare.com> Tested-by: Jonas Köppeler <j.koeppeler@tu-berlin.de> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/all/20260318134826.1281205-7-hawk@kernel.org/ Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260323174920.253526-1-hawk@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-24 20:57:57 -07:00
Paolo Abeni	51a209ee33	Merge tag 'ipsec-2026-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== pull request (net): ipsec 2026-03-23 1) Add missing extack for XFRMA_SA_PCPU in add_acquire and allocspi. From Sabrina Dubroca. 2) Fix the condition on x->pcpu_num in xfrm_sa_len by using the proper check. From Sabrina Dubroca. 3) Call xdo_dev_state_delete during state update to properly cleanup the xdo device state. From Sabrina Dubroca. 4) Fix a potential skb leak in espintcp when async crypto is used. From Sabrina Dubroca. 5) Validate inner IPv4 header length in IPTFS payload to avoid parsing malformed packets. From Roshan Kumar. 6) Fix skb_put() panic on non-linear skb during IPTFS reassembly. From Fernando Fernandez Mancera. 7) Silence various sparse warnings related to RCU, state, and policy handling. From Sabrina Dubroca. 8) Fix work re-schedule race after cancel in xfrm_nat_keepalive_net_fini(). From Hyunwoo Kim. 9) Prevent policy_hthresh.work from racing with netns teardown by using a proper cleanup mechanism. From Minwoo Ra. 10) Validate that the family of the source and destination addresses match in pfkey_send_migrate(). From Eric Dumazet. 11) Only publish mode_data after the clone is setup in the IPTFS receive path. This prevents leaving x->mode_data pointing at freed memory on error. From Paul Moses. Please pull or let me know if there are problems. ipsec-2026-03-23 * tag 'ipsec-2026-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec: xfrm: iptfs: only publish mode_data after clone setup af_key: validate families in pfkey_send_migrate() xfrm: prevent policy_hthresh.work from racing with netns teardown xfrm: Fix work re-schedule after cancel in xfrm_nat_keepalive_net_fini() xfrm: avoid RCU warnings around the per-netns netlink socket xfrm: add rcu_access_pointer to silence sparse warning for xfrm_input_afinfo xfrm: policy: silence sparse warning in xfrm_policy_unregister_afinfo xfrm: policy: fix sparse warnings in xfrm_policy_{init,fini} xfrm: state: silence sparse warnings during netns exit xfrm: remove rcu/state_hold from xfrm_state_lookup_spi_proto xfrm: state: add xfrm_state_deref_prot to state_by* walk under lock xfrm: state: fix sparse warnings around XFRM_STATE_INSERT xfrm: state: fix sparse warnings in xfrm_state_init xfrm: state: fix sparse warnings on xfrm_state_hold_rcu xfrm: iptfs: fix skb_put() panic on non-linear skb during reassembly xfrm: iptfs: validate inner IPv4 header length in IPTFS payload esp: fix skb leak with espintcp and async crypto xfrm: call xdo_dev_state_delete during state update xfrm: fix the condition on x->pcpu_num in xfrm_sa_len xfrm: add missing extack for XFRMA_SA_PCPU in add_acquire and allocspi ==================== Link: https://patch.msgid.link/20260323083440.2741292-1-steffen.klassert@secunet.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-03-24 15:16:28 +01:00
Martin KaFai Lau	e537dd15d0	udp: Fix wildcard bind conflict check when using hash2 When binding a udp_sock to a local address and port, UDP uses two hashes (udptable->hash and udptable->hash2) for collision detection. The current code switches to "hash2" when hslot->count > 10. "hash2" is keyed by local address and local port. "hash" is keyed by local port only. The issue can be shown in the following bind sequence (pseudo code): bind(fd1, "[fd00::1]:8888") bind(fd2, "[fd00::2]:8888") bind(fd3, "[fd00::3]:8888") bind(fd4, "[fd00::4]:8888") bind(fd5, "[fd00::5]:8888") bind(fd6, "[fd00::6]:8888") bind(fd7, "[fd00::7]:8888") bind(fd8, "[fd00::8]:8888") bind(fd9, "[fd00::9]:8888") bind(fd10, "[fd00::10]:8888") /* Correctly return -EADDRINUSE because "hash" is used * instead of "hash2". udp_lib_lport_inuse() detects the * conflict. / bind(fail_fd, "[::]:8888") / After one more socket is bound to "[fd00::11]:8888", * hslot->count exceeds 10 and "hash2" is used instead. / bind(fd11, "[fd00::11]:8888") bind(fail_fd, "[::]:8888") / succeeds unexpectedly */ The same issue applies to the IPv4 wildcard address "0.0.0.0" and the IPv4-mapped wildcard address "::ffff:0.0.0.0". For example, if there are existing sockets bound to "192.168.1.[1-11]:8888", then binding "0.0.0.0:8888" or "[::ffff:0.0.0.0]:8888" can also miss the conflict when hslot->count > 10. TCP inet_csk_get_port() already has the correct check in inet_use_bhash2_on_bind(). Rename it to inet_use_hash2_on_bind() and move it to inet_hashtables.h so udp.c can reuse it in this fix. Fixes: `30fff9231f` ("udp: bind() optimisation") Reported-by: Andrew Onyshchuk <oandrew@meta.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260319181817.1901357-1-martin.lau@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-23 18:46:45 -07:00
Kuniyuki Iwashima	4be7b99c25	ipv6: Don't remove permanent routes with exceptions from tb6_gc_hlist. The cited commit mechanically put fib6_remove_gc_list() just after every fib6_clean_expires() call. When a temporary route is promoted to a permanent route, there may already be exception routes tied to it. If fib6_remove_gc_list() removes the route from tb6_gc_hlist, such exception routes will no longer be aged. Let's replace fib6_remove_gc_list() with a new helper fib6_may_remove_gc_list() and use fib6_age_exceptions() there. Note that net->ipv6 is only compiled when CONFIG_IPV6 is enabled, so fib6_{add,remove,may_remove}_gc_list() are guarded. Fixes: `5eb902b8e7` ("net/ipv6: Remove expired routes with a separated list of routes.") Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/20260320072317.2561779-3-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-23 16:59:31 -07:00
Jakub Kicinski	edab1ca5ec	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Cross-merge networking fixes after downstream PR (net-7.0-rc5). net/netfilter/nft_set_rbtree.c `598adea720` ("netfilter: revert nft_set_rbtree: validate open interval overlap") `3aea466a43` ("netfilter: nft_set_rbtree: don't disable bh when acquiring tree lock") https://lore.kernel.org/abgaQBpeGstdN4oq@sirena.org.uk No adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-19 14:16:00 -07:00
Luiz Augusto von Dentz	761fb8ec87	Bluetooth: L2CAP: Fix regressions caused by reusing ident This attempt to fix regressions caused by reusing ident which apparently is not handled well on certain stacks causing the stack to not respond to requests, so instead of simple returning the first unallocated id this stores the last used tx_ident and then attempt to use the next until all available ids are exausted and then cycle starting over to 1. Link: https://bugzilla.kernel.org/show_bug.cgi?id=221120 Link: https://bugzilla.kernel.org/show_bug.cgi?id=221177 Fixes: `6c3ea155e5` ("Bluetooth: L2CAP: Fix not tracking outstanding TX ident") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Tested-by: Christian Eggers <ceggers@arri.de>	2026-03-19 14:44:25 -04:00
Paolo Abeni	9ac76f3d0b	Merge tag 'wireless-next-2026-03-19' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next Johannes Berg says: ==================== Aside from various small improvements/cleanups, not much: - cfg80211/mac80211: S1G and UHR improvements - hwsim: incumbent signal report test support * tag 'wireless-next-2026-03-19' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (31 commits) qtnfmac: use alloc_netdev macro for single queue devices wifi: libertas: don't kill URBs in interrupt context wifi: libertas: use USB anchors for tracking in-flight URBs wifi: nl80211: use int for band coming from netlink wifi: rsi_91x_usb: do not pause rfkill polling when stopping mac80211 wifi: mac80211: fix STA link removal during link removal wifi: nl80211: reject S1G/60G with HT chantype wifi: ieee80211: fix definition of EHT-MCS 15 in MRU wifi: cfg80211: check non-S1G width with S1G chandef wifi: cfg80211: restrict cfg80211_chandef_create() to only HT-based bands wifi: mac80211: don't use cfg80211_chandef_create() for default chandef wifi: mac80211: Remove deleted sta links in ieee80211_ml_reconf_work() wifi: b43: use register definitions in nphy_op_software_rfkill wifi: cfg80211: split control freq check from chandef check wifi: mac80211: always use full chanctx compatible check wifi: mac80211: refactor chandef tracing macros wifi: mac80211: validate HE 6 GHz operation when EHT is used wifi: nl80211: split out UHR operation information wifi: mwifiex: drop redundant device reference wifi: rt2x00: drop redundant device reference ... ==================== Link: https://patch.msgid.link/20260319082439.79875-3-johannes@sipsolutions.net Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-03-19 15:30:20 +01:00
Eric Woudstra	96450df197	bridge: No DEV_PATH_BR_VLAN_UNTAG_HW for dsa foreign In network setup as below: fastpath bypass .----------------------------------------. / \ \| IP - forwarding \| \| / \ v \| / wan ... \| / \| \| \| \| \| brlan.1 \| \| \| +-------------------------------+ \| \| vlan 1 \| \| \| \| \| \| brlan (vlan-filtering) \| \| \| +---------------+ \| \| \| DSA-SWITCH \| \| \| vlan 1 \| \| \| \| to \| \| \| \| untagged 1 vlan 1 \| \| +---------------+---------------+ . / \ ----->wlan1 lan0 . . . ^ ^ vlan 1 tagged packets untagged packets br_vlan_fill_forward_path_mode() sets DEV_PATH_BR_VLAN_UNTAG_HW when filling in from brlan.1 towards wlan1. But it should be set to DEV_PATH_BR_VLAN_UNTAG in this case. Using BR_VLFLAG_ADDED_BY_SWITCHDEV is not correct. The dsa switchdev adds it as a foreign port. The same problem for all foreignly added dsa vlans on the bridge. First add the vlan, trying only native devices. If this fails, we know this may be a vlan from a foreign device. Use BR_VLFLAG_TAGGING_BY_SWITCHDEV to make sure DEV_PATH_BR_VLAN_UNTAG_HW is set only when there if no foreign device involved. Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Eric Woudstra <ericwouds@gmail.com> Link: https://patch.msgid.link/20260317110347.363875-1-ericwouds@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-03-19 13:14:00 +01:00
Haiyang Zhang	d01440e10a	net: mana: Add ethtool counters for RX CQEs in coalesced type For RX CQEs with type CQE_RX_COALESCED_4, to measure the coalescing efficiency, add counters to count how many contains 2, 3, 4 packets respectively. Also, add a counter for the error case of first packet with length == 0. Reviewed-by: Long Li <longli@microsoft.com> Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Link: https://patch.msgid.link/20260317191826.1346111-4-haiyangz@linux.microsoft.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-18 20:01:10 -07:00
Haiyang Zhang	c2fe3ff3d6	net: mana: Add support for RX CQE Coalescing Our NIC can have up to 4 RX packets on 1 CQE. To support this feature, check and process the type CQE_RX_COALESCED_4. The default setting is disabled, to avoid possible regression on latency. And, add ethtool handler to switch this feature. To turn it on, run: ethtool -C <nic> rx-cqe-frames 4 To turn it off: ethtool -C <nic> rx-cqe-frames 1 The rx-cqe-nsec is the time out value in nanoseconds after the first packet arrival in a coalesced CQE to be sent. It's read-only for this NIC. Reviewed-by: Long Li <longli@microsoft.com> Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Link: https://patch.msgid.link/20260317191826.1346111-3-haiyangz@linux.microsoft.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-18 20:01:10 -07:00
Jakub Kicinski	7c46bd845d	Merge tag 'wireless-2026-03-18' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless Johannes Berg says: ==================== Just a few updates: - cfg80211: - guarantee pmsr work is cancelled - mac80211: - reject TDLS operations on non-TDLS stations - fix crash in AP_VLAN bandwidth change - fix leak or double-free on some TX preparation failures - remove keys needed for beacons _after_ stopping those - fix debugfs static branch race - avoid underflow in inactive time - fix another NULL dereference in mesh on invalid frames - ti/wlcore: avoid infinite realloc loop * tag 'wireless-2026-03-18' of https://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless: wifi: mac80211: always free skb on ieee80211_tx_prepare_skb() failure wifi: wlcore: Return -ENOMEM instead of -EAGAIN if there is not enough headroom wifi: mac80211: fix NULL deref in mesh_matches_local() wifi: mac80211: check tdls flag in ieee80211_tdls_oper wifi: cfg80211: cancel pmsr_free_wk in cfg80211_pmsr_wdev_down wifi: mac80211: Fix static_branch_dec() underflow for aql_disable. mac80211: fix crash in ieee80211_chan_bw_change for AP_VLAN stations wifi: mac80211: use jiffies_delta_to_msecs() for sta_info inactive times wifi: mac80211: remove keys after disabling beaconing wifi: mac80211_hwsim: fully initialise PMSR capabilities ==================== Link: https://patch.msgid.link/20260318172515.381148-3-johannes@sipsolutions.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-18 19:25:41 -07:00
Xiang Mei	b3a6df291f	udp_tunnel: fix NULL deref caused by udp_sock_create6 when CONFIG_IPV6=n When CONFIG_IPV6 is disabled, the udp_sock_create6() function returns 0 (success) without actually creating a socket. Callers such as fou_create() then proceed to dereference the uninitialized socket pointer, resulting in a NULL pointer dereference. The captured NULL deref crash: BUG: kernel NULL pointer dereference, address: 0000000000000018 RIP: 0010:fou_nl_add_doit (net/ipv4/fou_core.c:590 net/ipv4/fou_core.c:764) [...] Call Trace: <TASK> genl_family_rcv_msg_doit.constprop.0 (net/netlink/genetlink.c:1114) genl_rcv_msg (net/netlink/genetlink.c:1194 net/netlink/genetlink.c:1209) [...] netlink_rcv_skb (net/netlink/af_netlink.c:2550) genl_rcv (net/netlink/genetlink.c:1219) netlink_unicast (net/netlink/af_netlink.c:1319 net/netlink/af_netlink.c:1344) netlink_sendmsg (net/netlink/af_netlink.c:1894) __sock_sendmsg (net/socket.c:727 (discriminator 1) net/socket.c:742 (discriminator 1)) __sys_sendto (./include/linux/file.h:62 (discriminator 1) ./include/linux/file.h:83 (discriminator 1) net/socket.c:2183 (discriminator 1)) __x64_sys_sendto (net/socket.c:2213 (discriminator 1) net/socket.c:2209 (discriminator 1) net/socket.c:2209 (discriminator 1)) do_syscall_64 (arch/x86/entry/syscall_64.c:63 (discriminator 1) arch/x86/entry/syscall_64.c:94 (discriminator 1)) entry_SYSCALL_64_after_hwframe (net/arch/x86/entry/entry_64.S:130) This patch makes udp_sock_create6 return -EPFNOSUPPORT instead, so callers correctly take their error paths. There is only one caller of the vulnerable function and only privileged users can trigger it. Fixes: `fd384412e1` ("udp_tunnel: Seperate ipv6 functions into its own file.") Reported-by: Weiming Shi <bestswngs@gmail.com> Signed-off-by: Xiang Mei <xmei5@asu.edu> Link: https://patch.msgid.link/20260317010241.1893893-1-xmei5@asu.edu Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-18 18:00:07 -07:00
Felix Fietkau	d5ad6ab61c	wifi: mac80211: always free skb on ieee80211_tx_prepare_skb() failure ieee80211_tx_prepare_skb() has three error paths, but only two of them free the skb. The first error path (ieee80211_tx_prepare() returning TX_DROP) does not free it, while invoke_tx_handlers() failure and the fragmentation check both do. Add kfree_skb() to the first error path so all three are consistent, and remove the now-redundant frees in callers (ath9k, mt76, mac80211_hwsim) to avoid double-free. Document the skb ownership guarantee in the function's kdoc. Signed-off-by: Felix Fietkau <nbd@nbd.name> Link: https://patch.msgid.link/20260314065455.2462900-1-nbd@nbd.name Fixes: `06be6b149f` ("mac80211: add ieee80211_tx_prepare_skb() helper function") Signed-off-by: Johannes Berg <johannes.berg@intel.com>	2026-03-18 09:09:58 +01:00
Daniel Borkmann	a0671125d4	clsact: Fix use-after-free in init/destroy rollback asymmetry Fix a use-after-free in the clsact qdisc upon init/destroy rollback asymmetry. The latter is achieved by first fully initializing a clsact instance, and then in a second step having a replacement failure for the new clsact qdisc instance. clsact_init() initializes ingress first and then takes care of the egress part. This can fail midway, for example, via tcf_block_get_ext(). Upon failure, the kernel will trigger the clsact_destroy() callback. Commit `1cb6f0bae5` ("bpf: Fix too early release of tcx_entry") details the way how the transition is happening. If tcf_block_get_ext on the q->ingress_block ends up failing, we took the tcx_miniq_inc reference count on the ingress side, but not yet on the egress side. clsact_destroy() tests whether the {ingress,egress}_entry was non-NULL. However, even in midway failure on the replacement, both are in fact non-NULL with a valid egress_entry from the previous clsact instance. What we really need to test for is whether the qdisc instance-specific ingress or egress side previously got initialized. This adds a small helper for checking the miniq initialization called mini_qdisc_pair_inited, and utilizes that upon clsact_destroy() in order to fix the use-after-free scenario. Convert the ingress_destroy() side as well so both are consistent to each other. Fixes: `1cb6f0bae5` ("bpf: Fix too early release of tcx_entry") Reported-by: Keenan Dong <keenanat2000@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Martin KaFai Lau <martin.lau@kernel.org> Acked-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://patch.msgid.link/20260313065531.98639-1-daniel@iogearbox.net Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-03-17 12:09:16 +01:00
Jamal Hadi Salim	66360460ca	net/sched: teql: Fix double-free in teql_master_xmit Whenever a TEQL devices has a lockless Qdisc as root, qdisc_reset should be called using the seq_lock to avoid racing with the datapath. Failure to do so may cause crashes like the following: [ 238.028993][ T318] BUG: KASAN: double-free in skb_release_data (net/core/skbuff.c:1139) [ 238.029328][ T318] Free of addr ffff88810c67ec00 by task poc_teql_uaf_ke/318 [ 238.029749][ T318] [ 238.029900][ T318] CPU: 3 UID: 0 PID: 318 Comm: poc_teql_ke Not tainted 7.0.0-rc3-00149-ge5b31d988a41 #704 PREEMPT(full) [ 238.029906][ T318] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [ 238.029910][ T318] Call Trace: [ 238.029913][ T318] <TASK> [ 238.029916][ T318] dump_stack_lvl (lib/dump_stack.c:122) [ 238.029928][ T318] print_report (mm/kasan/report.c:379 mm/kasan/report.c:482) [ 238.029940][ T318] ? skb_release_data (net/core/skbuff.c:1139) [ 238.029944][ T318] ? srso_alias_return_thunk (arch/x86/lib/retpoline.S:221) ... [ 238.029957][ T318] ? skb_release_data (net/core/skbuff.c:1139) [ 238.029969][ T318] kasan_report_invalid_free (mm/kasan/report.c:221 mm/kasan/report.c:563) [ 238.029979][ T318] ? skb_release_data (net/core/skbuff.c:1139) [ 238.029989][ T318] check_slab_allocation (mm/kasan/common.c:231) [ 238.029995][ T318] kmem_cache_free (mm/slub.c:2637 (discriminator 1) mm/slub.c:6168 (discriminator 1) mm/slub.c:6298 (discriminator 1)) [ 238.030004][ T318] skb_release_data (net/core/skbuff.c:1139) ... [ 238.030025][ T318] sk_skb_reason_drop (net/core/skbuff.c:1256) [ 238.030032][ T318] pfifo_fast_reset (./include/linux/ptr_ring.h:171 ./include/linux/ptr_ring.h:309 ./include/linux/skb_array.h:98 net/sched/sch_generic.c:827) [ 238.030039][ T318] ? srso_alias_return_thunk (arch/x86/lib/retpoline.S:221) ... [ 238.030054][ T318] qdisc_reset (net/sched/sch_generic.c:1034) [ 238.030062][ T318] teql_destroy (./include/linux/spinlock.h:395 net/sched/sch_teql.c:157) [ 238.030071][ T318] __qdisc_destroy (./include/net/pkt_sched.h:328 net/sched/sch_generic.c:1077) [ 238.030077][ T318] qdisc_graft (net/sched/sch_api.c:1062 net/sched/sch_api.c:1053 net/sched/sch_api.c:1159) [ 238.030089][ T318] ? __pfx_qdisc_graft (net/sched/sch_api.c:1091) [ 238.030095][ T318] ? srso_alias_return_thunk (arch/x86/lib/retpoline.S:221) [ 238.030102][ T318] ? srso_alias_return_thunk (arch/x86/lib/retpoline.S:221) [ 238.030106][ T318] ? srso_alias_return_thunk (arch/x86/lib/retpoline.S:221) [ 238.030114][ T318] tc_get_qdisc (net/sched/sch_api.c:1529 net/sched/sch_api.c:1556) ... [ 238.072958][ T318] Allocated by task 303 on cpu 5 at 238.026275s: [ 238.073392][ T318] kasan_save_stack (mm/kasan/common.c:58) [ 238.073884][ T318] kasan_save_track (mm/kasan/common.c:64 (discriminator 5) mm/kasan/common.c:79 (discriminator 5)) [ 238.074230][ T318] __kasan_slab_alloc (mm/kasan/common.c:369) [ 238.074578][ T318] kmem_cache_alloc_node_noprof (./include/linux/kasan.h:253 mm/slub.c:4542 mm/slub.c:4869 mm/slub.c:4921) [ 238.076091][ T318] kmalloc_reserve (net/core/skbuff.c:616 (discriminator 107)) [ 238.076450][ T318] __alloc_skb (net/core/skbuff.c:713) [ 238.076834][ T318] alloc_skb_with_frags (./include/linux/skbuff.h:1383 net/core/skbuff.c:6763) [ 238.077178][ T318] sock_alloc_send_pskb (net/core/sock.c:2997) [ 238.077520][ T318] packet_sendmsg (net/packet/af_packet.c:2926 net/packet/af_packet.c:3019 net/packet/af_packet.c:3108) [ 238.081469][ T318] [ 238.081870][ T318] Freed by task 299 on cpu 1 at 238.028496s: [ 238.082761][ T318] kasan_save_stack (mm/kasan/common.c:58) [ 238.083481][ T318] kasan_save_track (mm/kasan/common.c:64 (discriminator 5) mm/kasan/common.c:79 (discriminator 5)) [ 238.085348][ T318] kasan_save_free_info (mm/kasan/generic.c:587 (discriminator 1)) [ 238.085900][ T318] __kasan_slab_free (mm/kasan/common.c:287) [ 238.086439][ T318] kmem_cache_free (mm/slub.c:6168 (discriminator 3) mm/slub.c:6298 (discriminator 3)) [ 238.087007][ T318] skb_release_data (net/core/skbuff.c:1139) [ 238.087491][ T318] consume_skb (net/core/skbuff.c:1451) [ 238.087757][ T318] teql_master_xmit (net/sched/sch_teql.c:358) [ 238.088116][ T318] dev_hard_start_xmit (./include/linux/netdevice.h:5324 ./include/linux/netdevice.h:5333 net/core/dev.c:3871 net/core/dev.c:3887) [ 238.088468][ T318] sch_direct_xmit (net/sched/sch_generic.c:347) [ 238.088820][ T318] __qdisc_run (net/sched/sch_generic.c:420 (discriminator 1)) [ 238.089166][ T318] __dev_queue_xmit (./include/net/sch_generic.h:229 ./include/net/pkt_sched.h:121 ./include/net/pkt_sched.h:117 net/core/dev.c:4196 net/core/dev.c:4802) Workflow to reproduce: 1. Initialize a TEQL topology (dummy0 and ifb0 as slaves, teql0 up). 2. Start multiple sender workers continuously transmitting packets through teql0 to drive teql_master_xmit(). 3. In parallel, repeatedly delete and re-add the root qdisc on dummy0 and ifb0 via RTNETLINK, forcing frequent teardown and reset activity (teql_destroy() / qdisc_reset()). 4. After running both workloads concurrently for several iterations, KASAN reports slab-use-after-free or double-free in the skb free path. Fix this by moving dev_reset_queue to sch_generic.h and calling it, instead of qdisc_reset, in teql_destroy since it handles both the lock and lockless cases correctly for root qdiscs. Fixes: `96009c7d50` ("sched: replace __QDISC_STATE_RUNNING bit with a spin lock") Reported-by: Xianrui Dong <keenanat2000@gmail.com> Tested-by: Xianrui Dong <keenanat2000@gmail.com> Co-developed-by: Victor Nogueira <victor@mojatatu.com> Signed-off-by: Victor Nogueira <victor@mojatatu.com> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Link: https://patch.msgid.link/20260315155422.147256-1-jhs@mojatatu.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-16 19:40:32 -07:00
Maciej Fijalkowski	cc6421acd9	xsk: remove repeated defines Seems we have been carrying around repeated defines for unaligned mode logic. Remove redundant ones. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Acked-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20260313111931.438911-1-maciej.fijalkowski@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-16 19:28:21 -07:00
Jiri Pirko	1850e76b38	devlink: introduce shared devlink instance for PFs on same chip Multiple PFs may reside on the same physical chip, running a single firmware. Some of the resources and configurations may be shared among these PFs. Currently, there is no good object to pin the configuration knobs on. Introduce a shared devlink instance, instantiated upon probe of the first PF and removed during remove of the last PF. The shared devlink instance is not backed by any device device, as there is no PCI device related to it. The implementation uses reference counting to manage the lifecycle: each PF that probes calls devlink_shd_get() to get or create the shared instance, and calls devlink_shd_put() when it removes. The shared instance is automatically destroyed when the last PF removes. Example: pci/0000:08:00.0: index 0 nested_devlink: auxiliary/mlx5_core.eth.0 devlink_index/1: index 1 nested_devlink: pci/0000:08:00.0 pci/0000:08:00.1 auxiliary/mlx5_core.eth.0: index 2 pci/0000:08:00.1: index 3 nested_devlink: auxiliary/mlx5_core.eth.1 auxiliary/mlx5_core.eth.1: index 4 Signed-off-by: Jiri Pirko <jiri@nvidia.com> Link: https://patch.msgid.link/20260312100407.551173-12-jiri@resnulli.us Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-14 13:08:50 -07:00
Jiri Pirko	20b0f383aa	devlink: add devlink_dev_driver_name() helper and use it in trace events In preparation to dev-less devlinks, add devlink_dev_driver_name() that returns the driver name stored in devlink struct, and use it in all trace events. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Link: https://patch.msgid.link/20260312100407.551173-9-jiri@resnulli.us Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-14 13:08:49 -07:00
Jiri Pirko	0f5531879a	devlink: add helpers to get bus_name/dev_name Introduce devlink_bus_name() and devlink_dev_name() helpers and convert all direct accesses to devlink->dev->bus->name and dev_name(devlink->dev) to use them. This prepares for dev-less devlink instances where these helpers will be extended to handle the missing device. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Link: https://patch.msgid.link/20260312100407.551173-3-jiri@resnulli.us Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-14 13:08:47 -07:00
Eric Dumazet	d15d3de94a	net: dropreason: add SKB_DROP_REASON_RECURSION_LIMIT ip[6]tunnel_xmit() can drop packets if a too deep recursion level is detected. Add SKB_DROP_REASON_RECURSION_LIMIT drop reason. We will use this reason later in __dev_queue_xmit(). Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Joe Damato <joe@dama.to> Link: https://patch.msgid.link/20260312201824.203093-2-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-03-14 08:38:06 -07:00

1 2 3 4 5 ...

19201 Commits