linux

mirror of https://github.com/torvalds/linux.git synced 2026-04-24 09:35:52 -04:00

Author	SHA1	Message	Date
Bailey Forrest	36e3b949e3	gve: Fix an edge case for TSO skb validity check The NIC requires each TSO segment to not span more than 10 descriptors. NIC further requires each descriptor to not exceed 16KB - 1 (GVE_TX_MAX_BUF_SIZE_DQO). The descriptors for an skb are generated by gve_tx_add_skb_no_copy_dqo() for DQO RDA queue format. gve_tx_add_skb_no_copy_dqo() loops through each skb frag and generates a descriptor for the entire frag if the frag size is not greater than GVE_TX_MAX_BUF_SIZE_DQO. If the frag size is greater than GVE_TX_MAX_BUF_SIZE_DQO, it is split into descriptor(s) of size GVE_TX_MAX_BUF_SIZE_DQO and a descriptor is generated for the remainder (frag size % GVE_TX_MAX_BUF_SIZE_DQO). gve_can_send_tso() checks if the descriptors thus generated for an skb would meet the requirement that each TSO-segment not span more than 10 descriptors. However, the current code misses an edge case when a TSO segment spans multiple descriptors within a large frag. This change fixes the edge case. gve_can_send_tso() relies on the assumption that max gso size (9728) is less than GVE_TX_MAX_BUF_SIZE_DQO and therefore within an skb fragment a TSO segment can never span more than 2 descriptors. Fixes: `a57e5de476` ("gve: DQO: Add TX path") Signed-off-by: Praveen Kaligineedi <pkaligineedi@google.com> Signed-off-by: Bailey Forrest <bcf@google.com> Reviewed-by: Jeroen de Borst <jeroendb@google.com> Cc: stable@vger.kernel.org Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20240724143431.3343722-1-pkaligineedi@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-07-25 07:58:07 -07:00
Joshua Washington	1b9f756344	gve: ignore nonrelevant GSO type bits when processing TSO headers TSO currently fails when the skb's gso_type field has more than one bit set. TSO packets can be passed from userspace using PF_PACKET, TUNTAP and a few others, using virtio_net_hdr (e.g., PACKET_VNET_HDR). This includes virtualization, such as QEMU, a real use-case. The gso_type and gso_size fields as passed from userspace in virtio_net_hdr are not trusted blindly by the kernel. It adds gso_type \|= SKB_GSO_DODGY to force the packet to enter the software GSO stack for verification. This issue might similarly come up when the CWR bit is set in the TCP header for congestion control, causing the SKB_GSO_TCP_ECN gso_type bit to be set. Fixes: `a57e5de476` ("gve: DQO: Add TX path") Signed-off-by: Joshua Washington <joshwash@google.com> Reviewed-by: Praveen Kaligineedi <pkaligineedi@google.com> Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Suggested-by: Eric Dumazet <edumazet@google.com> Acked-by: Andrei Vagin <avagin@gmail.com> v2 - Remove unnecessary comments, remove line break between fixes tag and signoffs. v3 - Add back unrelated empty line removal. Link: https://lore.kernel.org/r/20240610225729.2985343-1-joshwash@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-06-11 19:42:35 -07:00
Shailend Chand	ee24284e2a	gve: Alloc and free QPLs with the rings Every tx and rx ring has its own queue-page-list (QPL) that serves as the bounce buffer. Previously we were allocating QPLs for all queues before the queues themselves were allocated and later associating a QPL with a queue. This is avoidable complexity: it is much more natural for each queue to allocate and free its own QPL. Moreover, the advent of new queue-manipulating ndo hooks make it hard to keep things as is: we would need to transfer a QPL from an old queue to a new queue, and that is unpleasant. Tested-by: Mina Almasry <almasrymina@google.com> Reviewed-by: Praveen Kaligineedi <pkaligineedi@google.com> Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com> Signed-off-by: Shailend Chand <shailend@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-05-05 14:35:34 +01:00
Ziwei Xiao	fdf4123743	gve: Remove qpl_cfg struct since qpl_ids map with queues respectively The qpl_cfg struct was used to make sure that no two different queues are using QPL with the same qpl_id. We can remove that qpl_cfg struct since now the qpl_ids map with the queues respectively as follows: For tx queues: qpl_id = tx_qid For rx queues: qpl_id = max_tx_queues + rx_qid And when XDP is used, it will need the user to reduce the tx queues to be at most half of the max_tx_queues. Then it will use the same number of tx queues starting from the end of existing tx queues for XDP. So the XDP queues will not exceed the max_tx_queues range and will not overlap with the rx queues, where the qpl_ids will not have overlapping too. Considering of that, we remove the qpl_cfg struct to get the qpl_id directly based on the queue id. Unless we are erroneously allocating a rx/tx queue that has already been allocated, we would never allocate the qpl with the same qpl_id twice. In that case, it should fail much earlier than the QPL assignment. Suggested-by: Praveen Kaligineedi <pkaligineedi@google.com> Signed-off-by: Ziwei Xiao <ziweixiao@google.com> Reviewed-by: Harshitha Ramamurthy <hramamurthy@google.com> Reviewed-by: Shailend Chand <shailend@google.com> Link: https://lore.kernel.org/r/20240417205757.778551-1-ziweixiao@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-04-18 18:38:34 -07:00
Harshitha Ramamurthy	5dee3c702c	gve: make the completion and buffer ring size equal for DQO For the DQO queue format, the gve driver stores two ring sizes for both TX and RX - one for completion queue ring and one for data buffer ring. This is supposed to enable asymmetric sizes for these two rings but that is not supported. Make both fields reference the same single variable. This change renders reading supported TX completion ring size and RX buffer ring size for DQO from the device useless, so change those fields to reserved and remove related code. Reviewed-by: Praveen Kaligineedi <pkaligineedi@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Harshitha Ramamurthy <hramamurthy@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2024-04-03 11:11:15 +01:00
Shailend Chand	f13697cc7a	gve: Switch to config-aware queue allocation The new config-aware functions will help achieve the goal of being able to allocate resources for new queues while there already are active queues serving traffic. These new functions work off of arbitrary queue allocation configs rather than just the currently active config in priv, and they return the newly allocated resources instead of writing them into priv. Signed-off-by: Shailend Chand <shailend@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Jeroen de Borst <jeroendb@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240122182632.1102721-4-shailend@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-01-23 17:41:31 -08:00
Eric Dumazet	18de1e517e	gve: add gve_features_check() It is suboptimal to attempt skb linearization from ndo_start_xmit() if a gso skb has pathological layout, or if host stack does not have access to the payload (TCP direct). Linearization of large skbs can also fail under memory pressure. We should instead have an ndo_features_check() so that we can fallback to GSO, which is supported even for TCP direct, and generally much more efficient (no payload copy). Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Bailey Forrest <bcf@google.com> Cc: Willem de Bruijn <willemb@google.com> Cc: Jeroen de Borst <jeroendb@google.com> Cc: Praveen Kaligineedi <pkaligineedi@google.com> Cc: Shailend Chand <shailend@google.com> Cc: Ziwei Xiao <ziweixiao@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-11-17 03:29:16 +00:00
Rushil Gupta	a6fb8d5a8b	gve: Tx path for DQO-QPL Each QPL page is divided into GVE_TX_BUFS_PER_PAGE_DQO buffers. When a packet needs to be transmitted, we break the packet into max GVE_TX_BUF_SIZE_DQO sized chunks and transmit each chunk using a TX descriptor. We allocate the TX buffers from the free list in dqo_tx. We store these TX buffer indices in an array in the pending_packet structure. The TX buffers are returned to the free list in dqo_compl after receiving packet completion or when removing packets from miss completions list. Signed-off-by: Rushil Gupta <rushilg@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Praveen Kaligineedi <pkaligineedi@google.com> Signed-off-by: Bailey Forrest <bcf@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-08-06 08:34:36 +01:00
Coco Li	a695641c8e	gve: Support IPv6 Big TCP on DQ Add support for using IPv6 Big TCP on DQ which can handle large TSO/GRO packets. See https://lwn.net/Articles/895398/. This can improve the throughput and CPU usage. Perf test result: ip -d link show $DEV gso_max_size 185000 gso_max_segs 65535 tso_max_size 262143 tso_max_segs 65535 gro_max_size 185000 For performance, tested with neper using 9k MTU on hardware that supports 200Gb/s line rate. In single streams when line rate is not saturated, we expect throughput improvements. When the networking is performing at line rate, we expect cpu usage improvements. Tcp_stream (unidirectional stream test, T=thread, F=flow): skb=180kb, T=1, F=1, no zerocopy: throughput average=64576.88 Mb/s, sender stime=8.3, receiver stime=10.68 skb=64kb, T=1, F=1, no zerocopy: throughput average=64862.54 Mb/s, sender stime=9.96, receiver stime=12.67 skb=180kb, T=1, F=1, yes zerocopy: throughput average=146604.97 Mb/s, sender stime=10.61, receiver stime=5.52 skb=64kb, T=1, F=1, yes zerocopy: throughput average=131357.78 Mb/s, sender stime=12.11, receiver stime=12.25 skb=180kb, T=20, F=100, no zerocopy: throughput average=182411.37 Mb/s, sender stime=41.62, receiver stime=79.4 skb=64kb, T=20, F=100, no zerocopy: throughput average=182892.02 Mb/s, sender stime=57.39, receiver stime=72.69 skb=180kb, T=20, F=100, yes zerocopy: throughput average=182337.65 Mb/s, sender stime=27.94, receiver stime=39.7 skb=64kb, T=20, F=100, yes zerocopy: throughput average=182144.20 Mb/s, sender stime=47.06, receiver stime=39.01 Signed-off-by: Ziwei Xiao <ziweixiao@google.com> Signed-off-by: Coco Li <lixiaoyan@google.com> Reviewed-by: Simon Horman <simon.horman@corigine.com> Link: https://lore.kernel.org/r/20230522201552.3585421-1-ziweixiao@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-05-23 21:11:35 -07:00
Jeroen de Borst	a5affbd8a7	gve: Handle alternate miss completions The virtual NIC has 2 ways of indicating a miss-path completion. This handles the alternate. Signed-off-by: Jeroen de Borst <jeroendb@google.com> Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-11-21 10:52:14 +00:00
Eric Dumazet	504148fedb	net: add skb_[inner_]tcp_all_headers helpers Most drivers use "skb_transport_offset(skb) + tcp_hdrlen(skb)" to compute headers length for a TCP packet, but others use more convoluted (but equivalent) ways. Add skb_tcp_all_headers() and skb_inner_tcp_all_headers() helpers to harmonize this a bit. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2022-07-02 16:22:25 +01:00
Jilin Yuan	577d7685d5	google/gve:fix repeated words in comments Delete the redundant word 'a'. Signed-off-by: Jilin Yuan <yuanjilin@cdjrlc.com> Link: https://lore.kernel.org/r/20220629130013.3273-1-yuanjilin@cdjrlc.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-06-30 20:04:10 -07:00
Arnd Bergmann	1e0083bd07	gve: DQO: avoid unused variable warnings The use of dma_unmap_addr()/dma_unmap_len() in the driver causes multiple warnings when these macros are defined as empty, e.g. in an ARCH=i386 allmodconfig build: drivers/net/ethernet/google/gve/gve_tx_dqo.c: In function 'gve_tx_add_skb_no_copy_dqo': drivers/net/ethernet/google/gve/gve_tx_dqo.c:494:40: error: unused variable 'buf' [-Werror=unused-variable] 494 \| struct gve_tx_dma_buf *buf = This is not how the NEED_DMA_MAP_STATE macros are meant to work, as they rely on never using local variables or a temporary structure like gve_tx_dma_buf. Remote the gve_tx_dma_buf definition and open-code the contents in all places to avoid the warning. This causes some rather long lines but otherwise ends up making the driver slightly smaller. Fixes: `a57e5de476` ("gve: DQO: Add TX path") Link: https://lore.kernel.org/netdev/20210723231957.1113800-1-bcf@google.com/ Link: https://lore.kernel.org/netdev/20210721151100.2042139-1-arnd@kernel.org/ Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-09-28 15:24:36 +01:00
Bailey Forrest	e8192476de	gve: Fix warnings reported for DQO patchset https://patchwork.kernel.org/project/netdevbpf/list/?series=506637&state=* - Remove unused variable - Use correct integer type for string formatting. - Remove `inline` in C files Fixes: `9c1a59a2f4` ("gve: DQO: Add ring allocation and initialization") Fixes: `a57e5de476` ("gve: DQO: Add TX path") Signed-off-by: Bailey Forrest <bcf@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-24 15:38:29 -07:00
Bailey Forrest	a57e5de476	gve: DQO: Add TX path TX SKBs will have their buffers DMA mapped with the device. Each buffer will have at least one TX descriptor associated. Each SKB will also have a metadata descriptor. Each TX queue maintains an array of `gve_tx_pending_packet_dqo` objects. Every TX SKB will have an associated pending_packet object. A TX SKB's descriptors will use its pending_packet's index as the completion tag, which will be returned on the TX completion queue. The device implements a "flow-miss model". Most packets will simply receive a packet completion. The flow-miss system may choose to process a packet based on its contents. A TX packet which experiences a flow miss would receive a miss completion followed by a later reinjection completion. The miss-completion is received when the packet starts to be processed by the flow-miss system and the reinjection completion is received when the flow-miss system completes processing the packet and sends it on the wire. Notable mentions: - Buffers may be freed after receiving the miss-completion, but in order to avoid packet reordering, we do not complete the SKB until receiving the reinjection completion. - The driver must robustly handle the unlikely scenario where a miss completion does not have an associated reinjection completion. This is accomplished by maintaining a list of packets which have a pending reinjection completion. After a short timeout (5 seconds), the SKB and buffers are released and the pending_packet is moved to a second list which has a longer timeout (60 seconds), where the pending_packet will not be reused. When the longer timeout elapses, the driver may assume the reinjection completion would never be received and the pending_packet may be reused. - Completion handling is triggered by an interrupt and is done in the NAPI poll function. Because the TX path and completion exist in different threading contexts they maintain their own lists for free pending_packet objects. The TX path uses a lock-free approach to steal the list from the completion path. - Both the TSO context and general context descriptors have metadata bytes. The device requires that if multiple descriptors contain the same field, each descriptor must have the same value set for that field. Signed-off-by: Bailey Forrest <bcf@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Catherine Sullivan <csully@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-24 12:47:38 -07:00
Bailey Forrest	9c1a59a2f4	gve: DQO: Add ring allocation and initialization Allocate the buffer and completion ring structures. Do not populate the rings yet. That will happen in the respective rx and tx datapath follow-on patches Signed-off-by: Bailey Forrest <bcf@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Catherine Sullivan <csully@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-24 12:47:38 -07:00
Bailey Forrest	5e8c5adf95	gve: DQO: Add core netdev features Add napi netdev device registration, interrupt handling and initial tx and rx polling stubs. The stubs will be filled in follow-on patches. Also: - LRO feature advertisement and handling - Also update ethtool logic Signed-off-by: Bailey Forrest <bcf@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Catherine Sullivan <csully@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2021-06-24 12:47:38 -07:00

17 Commits