linux

mirror of https://github.com/torvalds/linux.git synced 2026-04-18 23:03:57 -04:00

Author	SHA1	Message	Date
Linus Torvalds	bf4afc53b7	Convert 'alloc_obj' family to use the new default GFP_KERNEL argument This was done entirely with mindless brute force, using git grep -l '\<k[vmz]alloc_objs(., GFP_KERNEL)' \| xargs sed -i 's/$alloc_objs(.*$, GFP_KERNEL)/\1)/' to convert the new alloc_obj() users that had a simple GFP_KERNEL argument to just drop that argument. Note that due to the extreme simplicity of the scripting, any slightly more complex cases spread over multiple lines would not be triggered: they definitely exist, but this covers the vast bulk of the cases, and the resulting diff is also then easier to check automatically. For the same reason the 'flex' versions will be done as a separate conversion. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2026-02-21 17:09:51 -08:00
Kees Cook	69050f8d6d	treewide: Replace kmalloc with kmalloc_obj for non-scalar types This is the result of running the Coccinelle script from scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to avoid scalar types (which need careful case-by-case checking), and instead replace kmalloc-family calls that allocate struct or union object instances: Single allocations: kmalloc(sizeof(TYPE), ...) are replaced with: kmalloc_obj(TYPE, ...) Array allocations: kmalloc_array(COUNT, sizeof(TYPE), ...) are replaced with: kmalloc_objs(TYPE, COUNT, ...) Flex array allocations: kmalloc(struct_size(PTR, FAM, COUNT), ...) are replaced with: kmalloc_flex(PTR, FAM, COUNT, ...) (where TYPE may also be VAR) The resulting allocations no longer return "void ", instead returning "TYPE ". Signed-off-by: Kees Cook <kees@kernel.org>	2026-02-21 01:02:28 -08:00
Gustavo A. R. Silva	4156c3745f	virtio_net: Fix misalignment bug in struct virtnet_info Use the new TRAILING_OVERLAP() helper to fix a misalignment bug along with the following warning: drivers/net/virtio_net.c:429:46: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end] This helper creates a union between a flexible-array member (FAM) and a set of members that would otherwise follow it (in this case `u8 rss_hash_key_data[VIRTIO_NET_RSS_MAX_KEY_SIZE];`). This overlays the trailing members (rss_hash_key_data) onto the FAM (hash_key_data) while keeping the FAM and the start of MEMBERS aligned. The static_assert() ensures this alignment remains. Notice that due to tail padding in flexible `struct virtio_net_rss_config_trailer`, `rss_trailer.hash_key_data` (at offset 83 in struct virtnet_info) and `rss_hash_key_data` (at offset 84 in struct virtnet_info) are misaligned by one byte. See below: struct virtio_net_rss_config_trailer { __le16 max_tx_vq; /* 0 2 / __u8 hash_key_length; / 2 1 / __u8 hash_key_data[]; / 3 0 / / size: 4, cachelines: 1, members: 3 / / padding: 1 / / last cacheline: 4 bytes / }; struct virtnet_info { ... struct virtio_net_rss_config_trailer rss_trailer; / 80 4 / / XXX last struct has 1 byte of padding / u8 rss_hash_key_data[40]; / 84 40 / ... / size: 832, cachelines: 13, members: 48 / / sum members: 801, holes: 8, sum holes: 31 / / paddings: 2, sum paddings: 5 / }; After changes, those members are correctly aligned at offset 795: struct virtnet_info { ... union { struct virtio_net_rss_config_trailer rss_trailer; / 792 4 / struct { unsigned char __offset_to_hash_key_data[3]; / 792 3 / u8 rss_hash_key_data[40]; / 795 40 / }; / 792 43 / }; / 792 44 / ... / size: 840, cachelines: 14, members: 47 / / sum members: 801, holes: 8, sum holes: 35 / / padding: 4 / / paddings: 1, sum paddings: 4 / / last cacheline: 8 bytes / }; As a result, the RSS key passed to the device is shifted by 1 byte: the last byte is cut off, and instead a (possibly uninitialized) byte is added at the beginning. As a last note `struct virtio_net_rss_config_hdr rss_hdr;` is also moved to the end, since it seems those three members should stick around together. :) Cc: stable@vger.kernel.org Fixes: `ed3100e90d` ("virtio_net: Use new RSS config structs") Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Acked-by: Michael S. Tsirkin <mst@redhat.com> Link: https://patch.msgid.link/aWIItWq5dV9XTTCJ@kspp Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-01-15 10:18:42 +01:00
Bui Quang Minh	a0c159647e	virtio-net: clean up __virtnet_rx_pause/resume The delayed refill worker is removed which makes virtnet_rx_pause/resume quite the same as __virtnet_rx_pause/resume. So remove __virtnet_rx_pause/resume and move the code to virtnet_rx_pause/resume. Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com> Link: https://patch.msgid.link/20260106150438.7425-4-minhquangbui99@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-01-10 11:13:02 -08:00
Bui Quang Minh	1e7b90aa79	virtio-net: remove unused delayed refill worker Since we switched to retry refilling receive buffer in NAPI poll instead of delayed worker, remove all now unused delayed refill worker code. Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com> Link: https://patch.msgid.link/20260106150438.7425-3-minhquangbui99@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-01-10 11:13:02 -08:00
Bui Quang Minh	fcdef3bcbb	virtio-net: don't schedule delayed refill worker When we fail to refill the receive buffers, we schedule a delayed worker to retry later. However, this worker creates some concurrency issues. For example, when the worker runs concurrently with virtnet_xdp_set, both need to temporarily disable queue's NAPI before enabling again. Without proper synchronization, a deadlock can happen when napi_disable() is called on an already disabled NAPI. That napi_disable() call will be stuck and so will the subsequent napi_enable() call. To simplify the logic and avoid further problems, we will instead retry refilling in the next NAPI poll. Fixes: `4bc12818b3` ("virtio-net: disable delayed refill when pausing rx") Reported-by: Paolo Abeni <pabeni@redhat.com> Closes: https://lore.kernel.org/526b5396-459d-4d02-8635-a222d07b46d7@redhat.com Cc: stable@vger.kernel.org Suggested-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Link: https://patch.msgid.link/20260106150438.7425-2-minhquangbui99@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-01-10 11:12:48 -08:00
Kommula Shiva Shankar	acb4bc6e1b	virtio_net: fix device mismatch in devm_kzalloc/devm_kfree Initial rss_hdr allocation uses virtio_device->device, but virtnet_set_queues() frees using net_device->device. This device mismatch causing below devres warning [ 3788.514041] ------------[ cut here ]------------ [ 3788.514044] WARNING: drivers/base/devres.c:1095 at devm_kfree+0x84/0x98, CPU#16: vdpa/1463 [ 3788.514054] Modules linked in: octep_vdpa virtio_net virtio_vdpa [last unloaded: virtio_vdpa] [ 3788.514064] CPU: 16 UID: 0 PID: 1463 Comm: vdpa Tainted: G W 6.18.0 #10 PREEMPT [ 3788.514067] Tainted: [W]=WARN [ 3788.514069] Hardware name: Marvell CN106XX board (DT) [ 3788.514071] pstate: 63400009 (nZCv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--) [ 3788.514074] pc : devm_kfree+0x84/0x98 [ 3788.514076] lr : devm_kfree+0x54/0x98 [ 3788.514079] sp : ffff800084e2f220 [ 3788.514080] x29: ffff800084e2f220 x28: ffff0003b2366000 x27: 000000000000003f [ 3788.514085] x26: 000000000000003f x25: ffff000106f17c10 x24: 0000000000000080 [ 3788.514089] x23: ffff00045bb8ab08 x22: ffff00045bb8a000 x21: 0000000000000018 [ 3788.514093] x20: ffff0004355c3080 x19: ffff00045bb8aa00 x18: 0000000000080000 [ 3788.514098] x17: 0000000000000040 x16: 000000000000001f x15: 000000000007ffff [ 3788.514102] x14: 0000000000000488 x13: 0000000000000005 x12: 00000000000fffff [ 3788.514106] x11: ffffffffffffffff x10: 0000000000000005 x9 : ffff800080c8c05c [ 3788.514110] x8 : ffff800084e2eeb8 x7 : 0000000000000000 x6 : 000000000000003f [ 3788.514115] x5 : ffff8000831bafe0 x4 : ffff800080c8b010 x3 : ffff0004355c3080 [ 3788.514119] x2 : ffff0004355c3080 x1 : 0000000000000000 x0 : 0000000000000000 [ 3788.514123] Call trace: [ 3788.514125] devm_kfree+0x84/0x98 (P) [ 3788.514129] virtnet_set_queues+0x134/0x2e8 [virtio_net] [ 3788.514135] virtnet_probe+0x9c0/0xe00 [virtio_net] [ 3788.514139] virtio_dev_probe+0x1e0/0x338 [ 3788.514144] really_probe+0xc8/0x3a0 [ 3788.514149] __driver_probe_device+0x84/0x170 [ 3788.514152] driver_probe_device+0x44/0x120 [ 3788.514155] __device_attach_driver+0xc4/0x168 [ 3788.514158] bus_for_each_drv+0x8c/0xf0 [ 3788.514161] __device_attach+0xa4/0x1c0 [ 3788.514164] device_initial_probe+0x1c/0x30 [ 3788.514168] bus_probe_device+0xb4/0xc0 [ 3788.514170] device_add+0x614/0x828 [ 3788.514173] register_virtio_device+0x214/0x258 [ 3788.514175] virtio_vdpa_probe+0xa0/0x110 [virtio_vdpa] [ 3788.514179] vdpa_dev_probe+0xa8/0xd8 [ 3788.514183] really_probe+0xc8/0x3a0 [ 3788.514186] __driver_probe_device+0x84/0x170 [ 3788.514189] driver_probe_device+0x44/0x120 [ 3788.514192] __device_attach_driver+0xc4/0x168 [ 3788.514195] bus_for_each_drv+0x8c/0xf0 [ 3788.514197] __device_attach+0xa4/0x1c0 [ 3788.514200] device_initial_probe+0x1c/0x30 [ 3788.514203] bus_probe_device+0xb4/0xc0 [ 3788.514206] device_add+0x614/0x828 [ 3788.514209] _vdpa_register_device+0x58/0x88 [ 3788.514211] octep_vdpa_dev_add+0x104/0x228 [octep_vdpa] [ 3788.514215] vdpa_nl_cmd_dev_add_set_doit+0x2d0/0x3c0 [ 3788.514218] genl_family_rcv_msg_doit+0xe4/0x158 [ 3788.514222] genl_rcv_msg+0x218/0x298 [ 3788.514225] netlink_rcv_skb+0x64/0x138 [ 3788.514229] genl_rcv+0x40/0x60 [ 3788.514233] netlink_unicast+0x32c/0x3b0 [ 3788.514237] netlink_sendmsg+0x170/0x3b8 [ 3788.514241] __sys_sendto+0x12c/0x1c0 [ 3788.514246] __arm64_sys_sendto+0x30/0x48 [ 3788.514249] invoke_syscall.constprop.0+0x58/0xf8 [ 3788.514255] do_el0_svc+0x48/0xd0 [ 3788.514259] el0_svc+0x48/0x210 [ 3788.514264] el0t_64_sync_handler+0xa0/0xe8 [ 3788.514268] el0t_64_sync+0x198/0x1a0 [ 3788.514271] ---[ end trace 0000000000000000 ]--- Fix by using virtio_device->device consistently for allocation and deallocation Fixes: `4944be2f5a` ("virtio_net: Allocate rss_hdr with devres") Signed-off-by: Kommula Shiva Shankar <kshankar@marvell.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Link: https://patch.msgid.link/20260102101900.692770-1-kshankar@marvell.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-01-04 10:55:25 -08:00
Jakub Kicinski	db4029859d	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Conflicts: net/xdp/xsk.c `0ebc27a4c6` ("xsk: avoid data corruption on cq descriptor number") `8da7bea7db` ("xsk: add indirect call for xsk_destruct_skb") `30ed05adca` ("xsk: use a smaller new lock for shared pool case") https://lore.kernel.org/20251127105450.4a1665ec@canb.auug.org.au https://lore.kernel.org/eb4eee14-7e24-4d1b-b312-e9ea738fefee@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-27 12:19:08 -08:00
Jon Kohler	1cd1c47234	virtio-net: avoid unnecessary checksum calculation on guest RX Commit `a2fb4bc4e2` ("net: implement virtio helpers to handle UDP GSO tunneling.") inadvertently altered checksum offload behavior for guests not using UDP GSO tunneling. Before, tun_put_user called tun_vnet_hdr_from_skb, which passed has_data_valid = true to virtio_net_hdr_from_skb. After, tun_put_user began calling tun_vnet_hdr_tnl_from_skb instead, which passes has_data_valid = false into both call sites. This caused virtio hdr flags to not include VIRTIO_NET_HDR_F_DATA_VALID for SKBs where skb->ip_summed == CHECKSUM_UNNECESSARY. As a result, guests are forced to recalculate checksums unnecessarily. Restore the previous behavior by ensuring has_data_valid = true is passed in the !tnl_gso_type case, but only from tun side, as virtio_net_hdr_tnl_from_skb() is used also by the virtio_net driver, which in turn must not use VIRTIO_NET_HDR_F_DATA_VALID on tx. cc: stable@vger.kernel.org Fixes: `a2fb4bc4e2` ("net: implement virtio helpers to handle UDP GSO tunneling.") Signed-off-by: Jon Kohler <jon@nutanix.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Link: https://patch.msgid.link/20251125222754.1737443-1-jon@nutanix.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-26 19:45:54 -08:00
Liming Wu	cfeb7cd80f	virtio_net: enhance wake/stop tx queue statistics accounting This patch refines and strengthens the statistics collection of TX queue wake/stop events introduced by commit `c39add9b24` ("virtio_net: Add TX stopped and wake counters"). Previously, the driver only recorded partial wake/stop statistics for TX queues. Some wake events triggered by 'skb_xmit_done()' or resume operations were not counted, which made the per-queue metrics incomplete. Signed-off-by: Liming Wu <liming.wu@jaguarmicro.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Link: https://patch.msgid.link/20251120015320.1418-1-liming.wu@jaguarmicro.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-25 19:38:01 -08:00
Jakub Kicinski	c99ebb6132	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Cross-merge networking fixes after downstream PR (net-6.18-rc6). No conflicts, adjacent changes in: drivers/net/phy/micrel.c `96a9178a29` ("net: phy: micrel: lan8814 fix reset of the QSGMII interface") `61b7ade9ba` ("net: phy: micrel: Add support for non PTP SKUs for lan8814") and a trivial one in tools/testing/selftests/drivers/net/Makefile. Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-13 12:35:38 -08:00
Xuan Zhuo	0eff2eaa53	virtio-net: fix incorrect flags recording in big mode The purpose of commit `703eec1b24` ("virtio_net: fixing XDP for fully checksummed packets handling") is to record the flags in advance, as their value may be overwritten in the XDP case. However, the flags recorded under big mode are incorrect, because in big mode, the passed buf does not point to the rx buffer, but rather to the page of the submitted buffer. This commit fixes this issue. For the small mode, the commit `c11a49d58a` ("virtio_net: Fix mismatched buf address when unmapping for small packets") fixed it. Tested-by: Alyssa Ross <hi@alyssa.is> Fixes: `703eec1b24` ("virtio_net: fixing XDP for fully checksummed packets handling") Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Link: https://patch.msgid.link/20251111090828.23186-1-xuanzhuo@linux.alibaba.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-11-13 13:16:30 +01:00
Jakub Kicinski	1ec9871fbb	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Cross-merge networking fixes after downstream PR (net-6.18-rc5). Conflicts: drivers/net/wireless/ath/ath12k/mac.c `9222582ec5` ("Revert "wifi: ath12k: Fix missing station power save configuration"") `6917e268c4` ("wifi: ath12k: Defer vdev bring-up until CSA finalize to avoid stale beacon") https://lore.kernel.org/11cece9f7e36c12efd732baa5718239b1bf8c950.camel@sipsolutions.net Adjacent changes: drivers/net/ethernet/intel/Kconfig `b1d16f7c00` ("libie: depend on DEBUG_FS when building LIBIE_FWLOG") `93f53db9f9` ("ice: switch to Page Pool") Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-06 09:27:40 -08:00
Bui Quang Minh	0c71670396	virtio-net: fix received length check in big packets Since commit `4959aebba8` ("virtio-net: use mtu size as buffer length for big packets"), when guest gso is off, the allocated size for big packets is not MAX_SKB_FRAGS * PAGE_SIZE anymore but depends on negotiated MTU. The number of allocated frags for big packets is stored in vi->big_packets_num_skbfrags. Because the host announced buffer length can be malicious (e.g. the host vhost_net driver's get_rx_bufs is modified to announce incorrect length), we need a check in virtio_net receive path. Currently, the check is not adapted to the new change which can lead to NULL page pointer dereference in the below while loop when receiving length that is larger than the allocated one. This commit fixes the received length check corresponding to the new change. Fixes: `4959aebba8` ("virtio-net: use mtu size as buffer length for big packets") Cc: stable@vger.kernel.org Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Tested-by: Lei Yang <leiyang@redhat.com> Link: https://patch.msgid.link/20251030144438.7582-1-minhquangbui99@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-04 18:49:29 -08:00
Michael S. Tsirkin	c3838262b8	virtio_net: fix alignment for virtio_net_hdr_v1_hash Changing alignment of header would mean it's no longer safe to cast a 2 byte aligned pointer between formats. Use two 16 bit fields to make it 2 byte aligned as previously. This fixes the performance regression since commit ("virtio_net: enable gso over UDP tunnel support.") as it uses virtio_net_hdr_v1_hash_tunnel which embeds virtio_net_hdr_v1_hash. Pktgen in guest + XDP_DROP on TAP + vhost_net shows the TX PPS is recovered from 2.4Mpps to 4.45Mpps. Fixes: `56a06bd40f` ("virtio_net: enable gso over UDP tunnel support.") Cc: stable@vger.kernel.org Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Tested-by: Lei Yang <leiyang@redhat.com> Link: https://patch.msgid.link/20251031060551.126-1-jasowang@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-04 17:14:07 -08:00
Chu Guangqing	f4b2786fb1	virtio_net: Fix a typo error in virtio_net Fix the spelling error of "separate". Signed-off-by: Chu Guangqing <chuguangqing@inspur.com> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Link: https://patch.msgid.link/20251103074305.4727-1-chuguangqing@inspur.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-11-04 17:00:37 -08:00
Bui Quang Minh	1ab6658174	virtio-net: drop the multi-buffer XDP packet in zerocopy In virtio-net, we have not yet supported multi-buffer XDP packet in zerocopy mode when there is a binding XDP program. However, in that case, when receiving multi-buffer XDP packet, we skip the XDP program and return XDP_PASS. As a result, the packet is passed to normal network stack which is an incorrect behavior (e.g. a XDP program for packet count is installed, multi-buffer XDP packet arrives and does go through XDP program. As a result, the packet count does not increase but the packet is still received from network stack).This commit instead returns XDP_ABORTED in that case. Fixes: `99c861b44e` ("virtio_net: xsk: rx: support recv merge mode") Cc: stable@vger.kernel.org Acked-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com> Link: https://patch.msgid.link/20251022155630.49272-1-minhquangbui99@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-10-23 17:30:40 -07:00
Linus Torvalds	bf897d2626	Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost Pull virtio updates from Michael Tsirkin: "Just fixes and cleanups this time around. The mapping cleanups are preparing the ground for new features, though" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: virtio-vdpa: Drop redundant conversion to bool vduse: Use fixed 4KB bounce pages for non-4KB page size vduse: switch to use virtio map API instead of DMA API vdpa: introduce map ops vdpa: support virtio_map virtio: introduce map ops in virtio core virtio_ring: rename dma_handle to map_handle virtio: introduce virtio_map container union virtio: rename dma helpers virtio_ring: switch to use dma_{map\|unmap}_page() virtio_ring: constify virtqueue pointer for DMA helpers virtio_balloon: Remove redundant __GFP_NOWARN vhost: vringh: Fix copy_to_iter return value check vhost: vringh: Modify the return value check	2025-10-04 08:48:16 -07:00
Jason Wang	b41cb3bcf6	virtio: rename dma helpers Following patch will introduce virtio mapping function to avoid abusing DMA API for device that doesn't do DMA. To ease the introduction, this patch rename "dma" to "map" for the current dma mapping helpers. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Message-Id: <20250821064641.5025-4-jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Tested-by: Lei Yang <leiyang@redhat.com> Reviewed-by: Eugenio Pérez <eperezma@redhat.com>	2025-10-01 07:24:43 -04:00
Breno Leitao	483446690a	net: virtio_net: add get_rxrings ethtool callback for RX ring queries Replace the existing virtnet_get_rxnfc callback with a dedicated virtnet_get_rxrings implementation to provide the number of RX rings directly via the new ethtool_ops get_rx_ring_count pointer. This simplifies the RX ring count retrieval and aligns virtio_net with the new ethtool API for querying RX ring parameters. Signed-off-by: Breno Leitao <leitao@debian.org> Link: https://patch.msgid.link/20250917-gxrings-v4-8-dae520e2e1cb@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-09-18 07:07:39 -07:00
Jakub Kicinski	1827f773e4	net: xdp: pass full flags to xdp_update_skb_shared_info() xdp_update_skb_shared_info() needs to update skb state which was maintained in xdp_buff / frame. Pass full flags into it, instead of breaking it out bit by bit. We will need to add a bit for unreadable frags (even tho XDP doesn't support those the driver paths may be common), at which point almost all call sites would become: xdp_update_skb_shared_info(skb, num_frags, sinfo->xdp_frags_size, MY_PAGE_SIZE * num_frags, xdp_buff_is_frag_pfmemalloc(xdp), xdp_buff_is_frag_unreadable(xdp)); Keep a helper for accessing the flags, in case we need to transform them somehow in the future (e.g. to cover up xdp_buff vs xdp_frame differences). While we are touching call callers - rename the helper to xdp_update_skb_frags_info(), previous name may have implied that it's shinfo that's updated. We are updating flags in struct sk_buff based on frags that got attched. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Jesper Dangaard Brouer <hawk@kernel.org> Link: https://patch.msgid.link/20250905221539.2930285-2-kuba@kernel.org Acked-by: Stanislav Fomichev <sdf@fomichev.me> Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-09-11 12:00:20 +02:00
Junnan Wu	45d8ef6322	virtio_net: adjust the execution order of function `virtnet_close` during freeze "Use after free" issue appears in suspend once race occurs when napi poll scheduls after `netif_device_detach` and before napi disables. For details, during suspend flow of virtio-net, the tx queue state is set to "__QUEUE_STATE_DRV_XOFF" by CPU-A. And at some coincidental times, if a TCP connection is still working, CPU-B does `virtnet_poll` before napi disable. In this flow, the state "__QUEUE_STATE_DRV_XOFF" of tx queue will be cleared. This is not the normal process it expects. After that, CPU-A continues to close driver then virtqueue is removed. Sequence likes below: -------------------------------------------------------------------------- CPU-A CPU-B ----- ----- suspend is called A TCP based on virtio-net still work virtnet_freeze \|- virtnet_freeze_down \| \|- netif_device_detach \| \| \|- netif_tx_stop_all_queues \| \| \|- netif_tx_stop_queue \| \| \|- set_bit \| \| (__QUEUE_STATE_DRV_XOFF,...) \| \| softirq rasied \| \| \|- net_rx_action \| \| \|- napi_poll \| \| \|- virtnet_poll \| \| \|- virtnet_poll_cleantx \| \| \|- netif_tx_wake_queue \| \| \|- test_and_clear_bit \| \| (__QUEUE_STATE_DRV_XOFF,...) \| \|- virtnet_close \| \|- virtnet_disable_queue_pair \| \|- virtnet_napi_tx_disable \|- remove_vq_common -------------------------------------------------------------------------- When TCP delayack timer is up, a cpu gets softirq and irq handler `tcp_delack_timer_handler` will be called, which will finally call `start_xmit` in virtio net driver. Then the access to tx virtq will cause panic. The root cause of this issue is that napi tx is not disable before `netif_tx_stop_queue`, once `virnet_poll` schedules in such coincidental time, the tx queue state will be cleared. To solve this issue, adjusts the order of function `virtnet_close` in `virtnet_freeze_down`. Co-developed-by: Ying Xu <ying123.xu@samsung.com> Signed-off-by: Ying Xu <ying123.xu@samsung.com> Signed-off-by: Junnan Wu <junnan01.wu@samsung.com> Message-Id: <20250812090817.3463403-1-junnan01.wu@samsung.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>	2025-08-26 03:38:20 -04:00
Jakub Kicinski	af2d6148d2	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Cross-merge networking fixes after downstream PR (net-6.16-rc7). Conflicts: Documentation/netlink/specs/ovpn.yaml `880d43ca9a` ("netlink: specs: clean up spaces in brackets") `af52020fc5` ("ovpn: reject unexpected netlink attributes") drivers/net/phy/phy_device.c `a44312d58e` ("net: phy: Don't register LEDs for genphy") `f0f2b992d8` ("net: phy: Don't register LEDs for genphy") https://lore.kernel.org/20250710114926.7ec3a64f@kernel.org drivers/net/wireless/intel/iwlwifi/fw/regulatory.c drivers/net/wireless/intel/iwlwifi/mld/regulatory.c `5fde0fcbd7` ("wifi: iwlwifi: mask reserved bits in chan_state_active_bitmap") `ea045a0de3` ("wifi: iwlwifi: add support for accepting raw DSM tables by firmware") net/ipv6/mcast.c `ae3264a25a` ("ipv6: mcast: Delay put pmc->idev in mld_del_delrec()") `a8594c956c` ("ipv6: mcast: Avoid a duplicate pointer check in mld_del_delrec()") https://lore.kernel.org/8cc52891-3653-4b03-a45e-05464fe495cf@kernel.org No adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-17 11:00:33 -07:00
Zigit Zo	be5dcaed69	virtio-net: fix recursived rtnl_lock() during probe() The deadlock appears in a stack trace like: virtnet_probe() rtnl_lock() virtio_config_changed_work() netdev_notify_peers() rtnl_lock() It happens if the VMM sends a VIRTIO_NET_S_ANNOUNCE request while the virtio-net driver is still probing. The config_work in probe() will get scheduled until virtnet_open() enables the config change notification via virtio_config_driver_enable(). Fixes: `df28de7b00` ("virtio-net: synchronize operstate with admin state on up/down") Signed-off-by: Zigit Zo <zuozhijie@bytedance.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Link: https://patch.msgid.link/20250716115717.1472430-1-zuozhijie@bytedance.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-17 07:37:59 -07:00
Liming Wu	2f82e99546	virtio_net: simplify tx queue wake condition check Consolidate the two nested if conditions for checking tx queue wake conditions into a single combined condition. This improves code readability without changing functionality. And move netif_tx_wake_queue into if condition to reduce unnecessary checks for queue stops. Signed-off-by: Liming Wu <liming.wu@jaguarmicro.com> Tested-by: Lei Yang <leiyang@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Link: https://patch.msgid.link/20250710023208.846-1-liming.wu@jaguarmicro.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-11 15:59:54 -07:00
Jakub Kicinski	b430f6c38d	Merge branch 'virtio_udp_tunnel_08_07_2025' of https://github.com/pabeni/linux-devel Paolo Abeni says: ==================== virtio: introduce GSO over UDP tunnel Some virtualized deployments use UDP tunnel pervasively and are impacted negatively by the lack of GSO support for such kind of traffic in the virtual NIC driver. The virtio_net specification recently introduced support for GSO over UDP tunnel, this series updates the virtio implementation to support such a feature. Currently the kernel virtio support limits the feature space to 64, while the virtio specification allows for a larger number of features. Specifically the GSO-over-UDP-tunnel-related virtio features use bits 65-69. The first four patches in this series rework the virtio and vhost feature support to cope with up to 128 bits. The limit is set by a define and could be easily raised in future, as needed. This implementation choice is aimed at keeping the code churn as limited as possible. For the same reason, only the virtio_net driver is reworked to leverage the extended feature space; all other virtio/vhost drivers are unaffected, but could be upgraded to support the extended features space in a later time. The last four patches bring in the actual GSO over UDP tunnel support. As per specification, some additional fields are introduced into the virtio net header to support the new offload. The presence of such fields depends on the negotiated features. New helpers are introduced to convert the UDP-tunneled skb metadata to an extended virtio net header and vice versa. Such helpers are used by the tun and virtio_net driver to cope with the newly supported offloads. Tested with basic stream transfer with all the possible permutations of host kernel/qemu/guest kernel with/without GSO over UDP tunnel support. ==================== Link: https://patch.msgid.link/cover.1751874094.git.pabeni@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-10 13:32:35 -07:00
Bui Quang Minh	f47e8f618c	virtio-net: xsk: rx: move the xdp->data adjustment to buf_to_xdp() This commit does not do any functional changes. It moves xdp->data adjustment for buffer other than first buffer to buf_to_xdp() helper so that the xdp_buff adjustment does not scatter over different functions. Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com> Link: https://patch.msgid.link/20250705075515.34260-1-minhquangbui99@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-07-09 18:41:35 -07:00
Paolo Abeni	56a06bd40f	virtio_net: enable gso over UDP tunnel support. If the related virtio feature is set, enable transmission and reception of gso over UDP tunnel packets. Most of the work is done by the previously introduced helper, just need to determine the UDP tunnel features inside the virtio_net_hdr and update accordingly the virtio net hdr size. Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-07-08 18:05:59 +02:00
Paolo Abeni	3b17aa1301	virtio_net: add supports for extended offloads The virtio_net driver needs it to implement GSO over UDP tunnel offload. The only missing piece is mapping them to/from the extended features. Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-07-08 18:05:33 +02:00
Paolo Abeni	6b9fd8857b	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Cross-merge networking fixes after downstream PR (net-6.16-rc5). No conflicts. No adjacent changes. Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-07-04 08:03:18 +02:00
Laurent Vivier	24b2f5df86	virtio_net: Enforce minimum TX ring size for reliability The `tx_may_stop()` logic stops TX queues if free descriptors (`sq->vq->num_free`) fall below the threshold of (`MAX_SKB_FRAGS` + 2). If the total ring size (`ring_num`) is not strictly greater than this value, queues can become persistently stopped or stop after minimal use, severely degrading performance. A single sk_buff transmission typically requires descriptors for: - The virtio_net_hdr (1 descriptor) - The sk_buff's linear data (head) (1 descriptor) - Paged fragments (up to MAX_SKB_FRAGS descriptors) This patch enforces that the TX ring size ('ring_num') must be strictly greater than (MAX_SKB_FRAGS + 2). This ensures that the ring is always large enough to hold at least one maximally-fragmented packet plus at least one additional slot. Reported-by: Lei Yang <leiyang@redhat.com> Signed-off-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Link: https://patch.msgid.link/20250521092236.661410-4-lvivier@redhat.com Tested-by: Lei Yang <leiyang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-07-03 11:40:02 +02:00
Laurent Vivier	bd2948d258	virtio_net: Cleanup '2+MAX_SKB_FRAGS' Improve consistency by using everywhere it is needed 'MAX_SKB_FRAGS + 2' rather than '2+MAX_SKB_FRAGS' or '2 + MAX_SKB_FRAGS'. No functional change. Signed-off-by: Laurent Vivier <lvivier@redhat.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com> Link: https://patch.msgid.link/20250521092236.661410-3-lvivier@redhat.com Tested-by: Lei Yang <leiyang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-07-03 11:40:02 +02:00
Bui Quang Minh	5177373c31	virtio-net: xsk: rx: fix the frame's length check When calling buf_to_xdp, the len argument is the frame data's length without virtio header's length (vi->hdr_len). We check that len with xsk_pool_get_rx_frame_size() + vi->hdr_len to ensure the provided len does not larger than the allocated chunk size. The additional vi->hdr_len is because in virtnet_add_recvbuf_xsk, we use part of XDP_PACKET_HEADROOM for virtio header and ask the vhost to start placing data from hard_start + XDP_PACKET_HEADROOM - vi->hdr_len not hard_start + XDP_PACKET_HEADROOM But the first buffer has virtio_header, so the maximum frame's length in the first buffer can only be xsk_pool_get_rx_frame_size() not xsk_pool_get_rx_frame_size() + vi->hdr_len like in the current check. This commit adds an additional argument to buf_to_xdp differentiate between the first buffer and other ones to correctly calculate the maximum frame's length. Cc: stable@vger.kernel.org Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Fixes: `a4e7ba7027` ("virtio_net: xsk: rx: support recv small mode") Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com> Link: https://patch.msgid.link/20250630151315.86722-2-minhquangbui99@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-07-03 11:23:03 +02:00
Bui Quang Minh	7d4a119e45	virtio-net: use the check_mergeable_len helper Replace the current repeated code to check received length in mergeable mode with the new check_mergeable_len helper. Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com> Acked-by: Jason Wang <jasowang@redhat.com> Link: https://patch.msgid.link/20250630144212.48471-4-minhquangbui99@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-07-03 10:56:55 +02:00
Bui Quang Minh	4be2193b33	virtio-net: remove redundant truesize check with PAGE_SIZE The truesize is guaranteed not to exceed PAGE_SIZE in get_mergeable_buf_len(). It is saved in mergeable context, which is not changeable by the host side, so the check in receive path is quite redundant. Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com> Link: https://patch.msgid.link/20250630144212.48471-3-minhquangbui99@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-07-03 10:56:55 +02:00
Bui Quang Minh	315dbdd7cd	virtio-net: ensure the received length does not exceed allocated size In xdp_linearize_page, when reading the following buffers from the ring, we forget to check the received length with the true allocate size. This can lead to an out-of-bound read. This commit adds that missing check. Cc: <stable@vger.kernel.org> Fixes: `4941d472bf` ("virtio-net: do not reset during XDP set") Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com> Acked-by: Jason Wang <jasowang@redhat.com> Link: https://patch.msgid.link/20250630144212.48471-2-minhquangbui99@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-07-03 10:56:55 +02:00
Jakub Kicinski	63d474cfb5	net: drv: virtio: migrate to new RXFH callbacks Add support for the new rxfh_fields callbacks, instead of de-muxing the rxnfc calls. This driver does not support flow filtering so the set_rxnfc callback is completely removed. Acked-by: Jason Wang <jasowang@redhat.com> Link: https://patch.msgid.link/20250611145949.2674086-9-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-06-12 17:46:58 -07:00
Jakub Kicinski	001160ec8c	virtio-net: fix total qstat values NIPA tests report that the interface statistics reported via qstat are lower than those reported via ip link. Looks like this is because some tests flip the queue count up and down, and we end up with some of the traffic accounted on disabled queues. Add up counters from disabled queues. Fixes: `d888f04c09` ("virtio-net: support queue stat") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Link: https://patch.msgid.link/20250507003221.823267-3-kuba@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2025-05-08 11:56:12 +02:00
Jakub Kicinski	4397684a29	virtio-net: free xsk_buffs on error in virtnet_xsk_pool_enable() The selftests added to our CI by Bui Quang Minh recently reveals that there is a mem leak on the error path of virtnet_xsk_pool_enable(): unreferenced object 0xffff88800a68a000 (size 2048): comm "xdp_helper", pid 318, jiffies 4294692778 hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace (crc 0): __kvmalloc_node_noprof+0x402/0x570 virtnet_xsk_pool_enable+0x293/0x6a0 (drivers/net/virtio_net.c:5882) xp_assign_dev+0x369/0x670 (net/xdp/xsk_buff_pool.c:226) xsk_bind+0x6a5/0x1ae0 __sys_bind+0x15e/0x230 __x64_sys_bind+0x72/0xb0 do_syscall_64+0xc1/0x1d0 entry_SYSCALL_64_after_hwframe+0x77/0x7f Acked-by: Jason Wang <jasowang@redhat.com> Fixes: `e9f3962441` ("virtio_net: xsk: rx: support fill with xsk buffer") Link: https://patch.msgid.link/20250430163836.3029761-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-05-05 13:54:40 -07:00
Jakub Kicinski	1e20324b23	virtio-net: don't re-enable refill work too early when NAPI is disabled Commit `4bc12818b3` ("virtio-net: disable delayed refill when pausing rx") fixed a deadlock between reconfig paths and refill work trying to disable the same NAPI instance. The refill work can't run in parallel with reconfig because trying to double-disable a NAPI instance causes a stall under the instance lock, which the reconfig path needs to re-enable the NAPI and therefore unblock the stalled thread. There are two cases where we re-enable refill too early. One is in the virtnet_set_queues() handler. We call it when installing XDP: virtnet_rx_pause_all(vi); ... virtnet_napi_tx_disable(..); ... virtnet_set_queues(..); ... virtnet_rx_resume_all(..); We want the work to be disabled until we call virtnet_rx_resume_all(), but virtnet_set_queues() kicks it before NAPIs were re-enabled. The other case is a more trivial case of mis-ordering in __virtnet_rx_resume() found by code inspection. Taking the spin lock in virtnet_set_queues() (requested during review) may be unnecessary as we are under rtnl_lock and so are all paths writing to ->refill_enabled. Acked-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Bui Quang Minh <minhquangbui99@gmail.com> Fixes: `4bc12818b3` ("virtio-net: disable delayed refill when pausing rx") Fixes: `413f0271f3` ("net: protect NAPI enablement with netdev_lock()") Link: https://patch.msgid.link/20250430163758.3029367-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-05-05 13:53:53 -07:00
Bui Quang Minh	4bc12818b3	virtio-net: disable delayed refill when pausing rx When pausing rx (e.g. set up xdp, xsk pool, rx resize), we call napi_disable() on the receive queue's napi. In delayed refill_work, it also calls napi_disable() on the receive queue's napi. When napi_disable() is called on an already disabled napi, it will sleep in napi_disable_locked while still holding the netdev_lock. As a result, later napi_enable gets stuck too as it cannot acquire the netdev_lock. This leads to refill_work and the pause-then-resume tx are stuck altogether. This scenario can be reproducible by binding a XDP socket to virtio-net interface without setting up the fill ring. As a result, try_fill_recv will fail until the fill ring is set up and refill_work is scheduled. This commit adds virtnet_rx_(pause/resume)_all helpers and fixes up the virtnet_rx_resume to disable future and cancel all inflights delayed refill_work before calling napi_disable() to pause the rx. Fixes: `413f0271f3` ("net: protect NAPI enablement with netdev_lock()") Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Bui Quang Minh <minhquangbui99@gmail.com> Acked-by: Jason Wang <jasowang@redhat.com> Link: https://patch.msgid.link/20250417072806.18660-2-minhquangbui99@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-04-22 18:29:13 -07:00
Linus Torvalds	1a9239bb42	Merge tag 'net-next-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from Jakub Kicinski: "Core & protocols: - Continue Netlink conversions to per-namespace RTNL lock (IPv4 routing, routing rules, routing next hops, ARP ioctls) - Continue extending the use of netdev instance locks. As a driver opt-in protect queue operations and (in due course) ethtool operations with the instance lock and not RTNL lock. - Support collecting TCP timestamps (data submitted, sent, acked) in BPF, allowing for transparent (to the application) and lower overhead tracking of TCP RPC performance. - Tweak existing networking Rx zero-copy infra to support zero-copy Rx via io_uring. - Optimize MPTCP performance in single subflow mode by 29%. - Enable GRO on packets which went thru XDP CPU redirect (were queued for processing on a different CPU). Improving TCP stream performance up to 2x. - Improve performance of contended connect() by 200% by searching for an available 4-tuple under RCU rather than a spin lock. Bring an additional 229% improvement by tweaking hash distribution. - Avoid unconditionally touching sk_tsflags on RX, improving performance under UDP flood by as much as 10%. - Avoid skb_clone() dance in ping_rcv() to improve performance under ping flood. - Avoid FIB lookup in netfilter if socket is available, 20% perf win. - Rework network device creation (in-kernel) API to more clearly identify network namespaces and their roles. There are up to 4 namespace roles but we used to have just 2 netns pointer arguments, interpreted differently based on context. - Use sysfs_break_active_protection() instead of trylock to avoid deadlocks between unregistering objects and sysfs access. - Add a new sysctl and sockopt for capping max retransmit timeout in TCP. - Support masking port and DSCP in routing rule matches. - Support dumping IPv4 multicast addresses with RTM_GETMULTICAST. - Support specifying at what time packet should be sent on AF_XDP sockets. - Expose TCP ULP diagnostic info (for TLS and MPTCP) to non-admin users. - Add Netlink YAML spec for WiFi (nl80211) and conntrack. - Introduce EXPORT_IPV6_MOD() and EXPORT_IPV6_MOD_GPL() for symbols which only need to be exported when IPv6 support is built as a module. - Age FDB entries based on Rx not Tx traffic in VxLAN, similar to normal bridging. - Allow users to specify source port range for GENEVE tunnels. - netconsole: allow attaching kernel release, CPU ID and task name to messages as metadata Driver API: - Continue rework / fixing of Energy Efficient Ethernet (EEE) across the SW layers. Delegate the responsibilities to phylink where possible. Improve its handling in phylib. - Support symmetric OR-XOR RSS hashing algorithm. - Support tracking and preserving IRQ affinity by NAPI itself. - Support loopback mode speed selection for interface selftests. Device drivers: - Remove the IBM LCS driver for s390 - Remove the sb1000 cable modem driver - Add support for SFP module access over SMBus - Add MCTP transport driver for MCTP-over-USB - Enable XDP metadata support in multiple drivers - Ethernet high-speed NICs: - Broadcom (bnxt): - add PCIe TLP Processing Hints (TPH) support for new AMD platforms - support dumping RoCE queue state for debug - opt into instance locking - Intel (100G, ice, idpf): - ice: rework MSI-X IRQ management and distribution - ice: support for E830 devices - iavf: add support for Rx timestamping - iavf: opt into instance locking - nVidia/Mellanox: - mlx4: use page pool memory allocator for Rx - mlx5: support for one PTP device per hardware clock - mlx5: support for 200Gbps per-lane link modes - mlx5: move IPSec policy check after decryption - AMD/Solarflare: - support FW flashing via devlink - Cisco (enic): - use page pool memory allocator for Rx - enable 32, 64 byte CQEs - get max rx/tx ring size from the device - Meta (fbnic): - support flow steering and RSS configuration - report queue stats - support TCP segmentation - support IRQ coalescing - support ring size configuration - Marvell/Cavium: - support AF_XDP - Wangxun: - support for PTP clock and timestamping - Huawei (hibmcge): - checksum offload - add more statistics - Ethernet virtual: - VirtIO net: - aggressively suppress Tx completions, improve perf by 96% with 1 CPU and 55% with 2 CPUs - expose NAPI to IRQ mapping and persist NAPI settings - Google (gve): - support XDP in DQO RDA Queue Format - opt into instance locking - Microsoft vNIC: - support BIG TCP - Ethernet NICs consumer, and embedded: - Synopsys (stmmac): - cleanup Tx and Tx clock setting and other link-focused cleanups - enable SGMII and 2500BASEX mode switching for Intel platforms - support Sophgo SG2044 - Broadcom switches (b53): - support for BCM53101 - TI: - iep: add perout configuration support - icssg: support XDP - Cadence (macb): - implement BQL - Xilinx (axinet): - support dynamic IRQ moderation and changing coalescing at runtime - implement BQL - report standard stats - MediaTek: - support phylink managed EEE - Intel: - igc: don't restart the interface on every XDP program change - RealTek (r8169): - support reading registers of internal PHYs directly - increase max jumbo packet size on RTL8125/RTL8126 - Airoha: - support for RISC-V NPU packet processing unit - enable scatter-gather and support MTU up to 9kB - Tehuti (tn40xx): - support cards with TN4010 MAC and an Aquantia AQR105 PHY - Ethernet PHYs: - support for TJA1102S, TJA1121 - dp83tg720: add randomized polling intervals for link detection - dp83822: support changing the transmit amplitude voltage - support for LEDs on 88q2xxx - CAN: - canxl: support Remote Request Substitution bit access - flexcan: add S32G2/S32G3 SoC - WiFi: - remove cooked monitor support - strict mode for better AP testing - basic EPCS support - OMI RX bandwidth reduction support - batman-adv: add support for jumbo frames - WiFi drivers: - RealTek (rtw88): - support RTL8814AE and RTL8814AU - RealTek (rtw89): - switch using wiphy_lock and wiphy_work - add BB context to manipulate two PHY as preparation of MLO - improve BT-coexistence mechanism to play A2DP smoothly - Intel (iwlwifi): - add new iwlmld sub-driver for latest HW/FW combinations - MediaTek (mt76): - preparation for mt7996 Multi-Link Operation (MLO) support - Qualcomm/Atheros (ath12k): - continued work on MLO - Silabs (wfx): - Wake-on-WLAN support - Bluetooth: - add support for skb TX SND/COMPLETION timestamping - hci_core: enable buffer flow control for SCO/eSCO - coredump: log devcd dumps into the monitor - Bluetooth drivers: - intel: add support to configure TX power - nxp: handle bootloader error during cmd5 and cmd7" * tag 'net-next-6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1681 commits) unix: fix up for "apparmor: add fine grained af_unix mediation" mctp: Fix incorrect tx flow invalidation condition in mctp-i2c net: usb: asix: ax88772: Increase phy_name size net: phy: Introduce PHY_ID_SIZE — minimum size for PHY ID string net: libwx: fix Tx L4 checksum net: libwx: fix Tx descriptor content for some tunnel packets atm: Fix NULL pointer dereference net: tn40xx: add pci-id of the aqr105-based Tehuti TN4010 cards net: tn40xx: prepare tn40xx driver to find phy of the TN9510 card net: tn40xx: create swnode for mdio and aqr105 phy and add to mdiobus net: phy: aquantia: add essential functions to aqr105 driver net: phy: aquantia: search for firmware-name in fwnode net: phy: aquantia: add probe function to aqr105 for firmware loading net: phy: Add swnode support to mdiobus_scan gve: add XDP DROP and PASS support for DQ gve: update XDP allocation path support RX buffer posting gve: merge packet buffer size fields gve: update GQ RX to use buf_size gve: introduce config-based allocation for XDP gve: remove xdp_xsk_done and xdp_xsk_wakeup statistics ...	2025-03-26 21:48:21 -07:00
Akihiko Odaki	4944be2f5a	virtio_net: Allocate rss_hdr with devres virtnet_probe() lacks the code to free rss_hdr in its error path. Allocate rss_hdr with devres so that it will be automatically freed. Fixes: `86a48a00ef` ("virtio_net: Support dynamic rss indirection table size") Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Acked-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Tested-by: Lei Yang <leiyang@redhat.com> Link: https://patch.msgid.link/20250321-virtio-v2-4-33afb8f4640b@daynix.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-03-25 09:30:22 -07:00
Akihiko Odaki	ed3100e90d	virtio_net: Use new RSS config structs The new RSS configuration structures allow easily constructing data for VIRTIO_NET_CTRL_MQ_RSS_CONFIG as they strictly follow the order of data for the command. Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Acked-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Tested-by: Lei Yang <leiyang@redhat.com> Link: https://patch.msgid.link/20250321-virtio-v2-3-33afb8f4640b@daynix.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-03-25 09:30:21 -07:00
Akihiko Odaki	97841341e3	virtio_net: Fix endian with virtio_net_ctrl_rss Mark the fields of struct virtio_net_ctrl_rss as little endian as they are in struct virtio_net_rss_config, which it follows. Fixes: `c7114b1249` ("drivers/net/virtio_net: Added basic RSS support.") Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com> Acked-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Tested-by: Lei Yang <leiyang@redhat.com> Link: https://patch.msgid.link/20250321-virtio-v2-2-33afb8f4640b@daynix.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-03-25 09:30:21 -07:00
Joe Damato	d5d715207e	virtio_net: Use persistent NAPI config Use persistent NAPI config so that NAPI IDs are not renumbered as queue counts change. $ sudo ethtool -l ens4 \| tail -5 \| egrep -i '(current\|combined)' Current hardware settings: Combined: 4 $ ./tools/net/ynl/pyynl/cli.py \ --spec Documentation/netlink/specs/netdev.yaml \ --dump queue-get --json='{"ifindex": 2}' [{'id': 0, 'ifindex': 2, 'napi-id': 8193, 'type': 'rx'}, {'id': 1, 'ifindex': 2, 'napi-id': 8194, 'type': 'rx'}, {'id': 2, 'ifindex': 2, 'napi-id': 8195, 'type': 'rx'}, {'id': 3, 'ifindex': 2, 'napi-id': 8196, 'type': 'rx'}, {'id': 0, 'ifindex': 2, 'type': 'tx'}, {'id': 1, 'ifindex': 2, 'type': 'tx'}, {'id': 2, 'ifindex': 2, 'type': 'tx'}, {'id': 3, 'ifindex': 2, 'type': 'tx'}] Now adjust the queue count, note that the NAPI IDs are not renumbered: $ sudo ethtool -L ens4 combined 1 $ ./tools/net/ynl/pyynl/cli.py \ --spec Documentation/netlink/specs/netdev.yaml \ --dump queue-get --json='{"ifindex": 2}' [{'id': 0, 'ifindex': 2, 'napi-id': 8193, 'type': 'rx'}, {'id': 0, 'ifindex': 2, 'type': 'tx'}] $ sudo ethtool -L ens4 combined 8 $ ./tools/net/ynl/pyynl/cli.py \ --spec Documentation/netlink/specs/netdev.yaml \ --dump queue-get --json='{"ifindex": 2}' [{'id': 0, 'ifindex': 2, 'napi-id': 8193, 'type': 'rx'}, {'id': 1, 'ifindex': 2, 'napi-id': 8194, 'type': 'rx'}, {'id': 2, 'ifindex': 2, 'napi-id': 8195, 'type': 'rx'}, {'id': 3, 'ifindex': 2, 'napi-id': 8196, 'type': 'rx'}, {'id': 4, 'ifindex': 2, 'napi-id': 8197, 'type': 'rx'}, {'id': 5, 'ifindex': 2, 'napi-id': 8198, 'type': 'rx'}, {'id': 6, 'ifindex': 2, 'napi-id': 8199, 'type': 'rx'}, {'id': 7, 'ifindex': 2, 'napi-id': 8200, 'type': 'rx'}, [...] Signed-off-by: Joe Damato <jdamato@fastly.com> Reviewed-by: Gerhard Engleder <gerhard@engleder-embedded.com> Acked-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Tested-by: Lei Yang <leiyang@redhat.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Link: https://patch.msgid.link/20250307011215.266806-5-jdamato@fastly.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-03-10 13:09:22 -07:00
Joe Damato	e7231f49d5	virtio-net: Map NAPIs to queues Use netif_queue_set_napi to map NAPIs to queue IDs so that the mapping can be accessed by user apps. Note that the netif_queue_set_napi currently requires RTNL, so care must be taken to ensure RTNL is held on paths where this API might be reached. The paths in the driver where this API can be reached appear to be: - ndo_open, ndo_close, which hold RTNL so no driver change is needed. - rx_pause, rx_resume, tx_pause, tx_resume are reached either via an ethtool ioctl or via XSK - neither path requires a driver change. - power management paths (which call open and close), which have been updated to hold/release RTNL. $ ethtool -i ens4 \| grep driver driver: virtio_net $ sudo ethtool -L ens4 combined 4 $ ./tools/net/ynl/pyynl/cli.py \ --spec Documentation/netlink/specs/netdev.yaml \ --dump queue-get --json='{"ifindex": 2}' [{'id': 0, 'ifindex': 2, 'napi-id': 8289, 'type': 'rx'}, {'id': 1, 'ifindex': 2, 'napi-id': 8290, 'type': 'rx'}, {'id': 2, 'ifindex': 2, 'napi-id': 8291, 'type': 'rx'}, {'id': 3, 'ifindex': 2, 'napi-id': 8292, 'type': 'rx'}, {'id': 0, 'ifindex': 2, 'type': 'tx'}, {'id': 1, 'ifindex': 2, 'type': 'tx'}, {'id': 2, 'ifindex': 2, 'type': 'tx'}, {'id': 3, 'ifindex': 2, 'type': 'tx'}] Note that virtio_net has TX-only NAPIs which do not have NAPI IDs, so the lack of 'napi-id' in the above output is expected. Signed-off-by: Joe Damato <jdamato@fastly.com> Acked-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Link: https://patch.msgid.link/20250307011215.266806-4-jdamato@fastly.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-03-10 13:09:21 -07:00
Joe Damato	986a930451	virtio-net: Refactor napi_disable paths Create virtnet_napi_disable helper and refactor virtnet_napi_tx_disable to take a struct send_queue. Signed-off-by: Joe Damato <jdamato@fastly.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Tested-by: Lei Yang <leiyang@redhat.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Link: https://patch.msgid.link/20250307011215.266806-3-jdamato@fastly.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-03-10 13:09:21 -07:00
Joe Damato	2af5adf962	virtio-net: Refactor napi_enable paths Refactor virtnet_napi_enable and virtnet_napi_tx_enable to take a struct receive_queue. Create a helper, virtnet_napi_do_enable, which contains the logic to enable a NAPI. Signed-off-by: Joe Damato <jdamato@fastly.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Tested-by: Lei Yang <leiyang@redhat.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Link: https://patch.msgid.link/20250307011215.266806-2-jdamato@fastly.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2025-03-10 13:09:21 -07:00
Yury Norov	f02f2a1fe5	virtio_net: simplify virtnet_set_affinity() The inner loop may be replaced with the dedicated for_each_online_cpu_wrap. Use it as it improves readability and simplifies maintenance. Signed-off-by: Yury Norov <yury.norov@gmail.com> Reviewed-by: Nick Child <nnac123@linux.ibm.com> Acked-by: Michael S. Tsirkin <mst@redhat.com>	2025-02-24 16:36:59 -05:00

1 2 3 4 5 ...

826 Commits