Commit Graph

41982 Commits

Author SHA1 Message Date
Maciej Fijalkowski
9d87e41a6d i40e, xsk: Get rid of redundant 'fallthrough'
Intel drivers translate actions returned from XDP programs to their own
return codes that have the following mapping:

XDP_REDIRECT -> I40E_XDP_{REDIR,CONSUMED}
XDP_TX -> I40E_XDP_{TX,CONSUMED}
XDP_DROP -> I40E_XDP_CONSUMED
XDP_ABORTED -> I40E_XDP_CONSUMED
XDP_PASS -> I40E_XDP_PASS

Commit b8aef650e5 ("i40e, xsk: Terminate Rx side of NAPI when XSK Rx
queue gets full") introduced new translation

XDP_REDIRECT -> I40E_XDP_EXIT

which is set when XSK RQ gets full and to indicate that driver should
stop further Rx processing. This happens for unsuccessful
xdp_do_redirect() so it is valuable to call trace_xdp_exception() for
this case. In order to avoid I40E_XDP_EXIT -> IXGBE_XDP_CONSUMED
overwrite, XDP_DROP case was moved above which in turn made the
'fallthrough' that is in XDP_ABORTED useless as it became the last label
in the switch statement.

Simply drop this leftover.

Fixes: b8aef650e5 ("i40e, xsk: Terminate Rx side of NAPI when XSK Rx queue gets full")
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220421132126.471515-3-maciej.fijalkowski@intel.com
2022-04-21 16:31:10 +02:00
Maciej Fijalkowski
e130e8d543 ixgbe, xsk: Get rid of redundant 'fallthrough'
Intel drivers translate actions returned from XDP programs to their own
return codes that have the following mapping:

XDP_REDIRECT -> IXGBE_XDP_{REDIR,CONSUMED}
XDP_TX -> IXGBE_XDP_{TX,CONSUMED}
XDP_DROP -> IXGBE_XDP_CONSUMED
XDP_ABORTED -> IXGBE_XDP_CONSUMED
XDP_PASS -> IXGBE_XDP_PASS

Commit c7dd09fd46 ("ixgbe, xsk: Terminate Rx side of NAPI when XSK Rx
queue gets full") introduced new translation

XDP_REDIRECT -> IXGBE_XDP_EXIT

which is set when XSK RQ gets full and to indicate that driver should
stop further Rx processing. This happens for unsuccessful
xdp_do_redirect() so it is valuable to call trace_xdp_exception() for
this case. In order to avoid IXGBE_XDP_EXIT -> IXGBE_XDP_CONSUMED
overwrite, XDP_DROP case was moved above which in turn made the
'fallthrough' that is in XDP_ABORTED useless as it became the last label
in the switch statement.

Simply drop this leftover.

Fixes: c7dd09fd46 ("ixgbe, xsk: Terminate Rx side of NAPI when XSK Rx queue gets full")
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220421132126.471515-2-maciej.fijalkowski@intel.com
2022-04-21 16:31:10 +02:00
Maciej Fijalkowski
4efad19616 ice, xsk: Avoid refilling single Rx descriptors
Call alloc Rx routine for ZC driver only when the amount of unallocated
descriptors exceeds given threshold.

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220413153015.453864-14-maciej.fijalkowski@intel.com
2022-04-15 21:11:13 +02:00
Maciej Fijalkowski
a817ead415 stmmac, xsk: Diversify return values from xsk_wakeup call paths
Currently, when debugging AF_XDP workloads, one can correlate the -ENXIO
return code as the case that XSK is not in the bound state. Returning
same code from ndo_xsk_wakeup can be misleading and simply makes it
harder to follow what is going on.

Change ENXIOs in stmmac's ndo_xsk_wakeup() implementation to EINVALs, so
that when probing it is clear that something is wrong on the driver
side, not the xsk_{recv,send}msg.

There is a -ENETDOWN that can happen from both kernel/driver sides
though, but I don't have a correct replacement for this on one of the
sides, so let's keep it that way.

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220413153015.453864-13-maciej.fijalkowski@intel.com
2022-04-15 21:11:05 +02:00
Maciej Fijalkowski
7b7f2f273d mlx5, xsk: Diversify return values from xsk_wakeup call paths
Currently, when debugging AF_XDP workloads, one can correlate the -ENXIO
return code as the case that XSK is not in the bound state. Returning
same code from ndo_xsk_wakeup can be misleading and simply makes it
harder to follow what is going on.

Change ENXIO in mlx5's ndo_xsk_wakeup() implementation to EINVAL, so
that when probing it is clear that something is wrong on the driver
side, not the xsk_{recv,send}msg.

There is a -ENETDOWN that can happen from both kernel/driver sides
though, but I don't have a correct replacement for this on one of the
sides, so let's keep it that way.

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220413153015.453864-12-maciej.fijalkowski@intel.com
2022-04-15 21:11:01 +02:00
Maciej Fijalkowski
0f8bf01889 ixgbe, xsk: Diversify return values from xsk_wakeup call paths
Currently, when debugging AF_XDP workloads, one can correlate the -ENXIO
return code as the case that XSK is not in the bound state. Returning
same code from ndo_xsk_wakeup can be misleading and simply makes it
harder to follow what is going on.

Change ENXIOs in ixgbe's ndo_xsk_wakeup() implementation to EINVALs, so
that when probing it is clear that something is wrong on the driver
side, not the xsk_{recv,send}msg.

There is a -ENETDOWN that can happen from both kernel/driver sides
though, but I don't have a correct replacement for this on one of the
sides, so let's keep it that way.

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220413153015.453864-11-maciej.fijalkowski@intel.com
2022-04-15 21:10:57 +02:00
Maciej Fijalkowski
ed7ae2d622 i40e, xsk: Diversify return values from xsk_wakeup call paths
Currently, when debugging AF_XDP workloads, one can correlate the -ENXIO
return code as the case that XSK is not in the bound state. Returning
same code from ndo_xsk_wakeup can be misleading and simply makes it
harder to follow what is going on.

Change ENXIOs in i40e's ndo_xsk_wakeup() implementation to EINVALs, so
that when probing it is clear that something is wrong on the driver
side, not the xsk_{recv,send}msg.

There is a -ENETDOWN that can happen from both kernel/driver sides
though, but I don't have a correct replacement for this on one of the
sides, so let's keep it that way.

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220413153015.453864-10-maciej.fijalkowski@intel.com
2022-04-15 21:10:54 +02:00
Maciej Fijalkowski
ed8a6bc60f ice, xsk: Diversify return values from xsk_wakeup call paths
Currently, when debugging AF_XDP workloads, one can correlate the -ENXIO
return code as the case that XSK is not in the bound state. Returning
same code from ndo_xsk_wakeup can be misleading and simply makes it
harder to follow what is going on.

Change ENXIOs in ice's ndo_xsk_wakeup() implementation to EINVALs, so
that when probing it is clear that something is wrong on the driver
side, not the xsk_{recv,send}msg.

There is a -ENETDOWN that can happen from both kernel/driver sides
though, but I don't have a correct replacement for this on one of the
sides, so let's keep it that way.

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220413153015.453864-9-maciej.fijalkowski@intel.com
2022-04-15 21:10:49 +02:00
Maciej Fijalkowski
c7dd09fd46 ixgbe, xsk: Terminate Rx side of NAPI when XSK Rx queue gets full
When XSK pool uses need_wakeup feature, correlate -ENOBUFS that was
returned from xdp_do_redirect() with a XSK Rx queue being full. In such
case, terminate the Rx processing that is being done on the current HW
Rx ring and let the user space consume descriptors from XSK Rx queue so
that there is room that driver can use later on.

Introduce new internal return code IXGBE_XDP_EXIT that will indicate case
described above.

Note that it does not affect Tx processing that is bound to the same
NAPI context, nor the other Rx rings.

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220413153015.453864-8-maciej.fijalkowski@intel.com
2022-04-15 21:10:45 +02:00
Maciej Fijalkowski
b8aef650e5 i40e, xsk: Terminate Rx side of NAPI when XSK Rx queue gets full
When XSK pool uses need_wakeup feature, correlate -ENOBUFS that was
returned from xdp_do_redirect() with a XSK Rx queue being full. In such
case, terminate the Rx processing that is being done on the current HW
Rx ring and let the user space consume descriptors from XSK Rx queue so
that there is room that driver can use later on.

Introduce new internal return code I40E_XDP_EXIT that will indicate case
described above.

Note that it does not affect Tx processing that is bound to the same
NAPI context, nor the other Rx rings.

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220413153015.453864-7-maciej.fijalkowski@intel.com
2022-04-15 21:10:41 +02:00
Maciej Fijalkowski
50ae066480 ice, xsk: Terminate Rx side of NAPI when XSK Rx queue gets full
When XSK pool uses need_wakeup feature, correlate -ENOBUFS that was
returned from xdp_do_redirect() with a XSK Rx queue being full. In such
case, terminate the Rx processing that is being done on the current HW
Rx ring and let the user space consume descriptors from XSK Rx queue so
that there is room that driver can use later on.

Introduce new internal return code ICE_XDP_EXIT that will indicate case
described above.

Note that it does not affect Tx processing that is bound to the same
NAPI context, nor the other Rx rings.

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220413153015.453864-6-maciej.fijalkowski@intel.com
2022-04-15 21:10:37 +02:00
Maciej Fijalkowski
d090c88586 ixgbe, xsk: Decorate IXGBE_XDP_REDIR with likely()
ixgbe_run_xdp_zc() suggests to compiler that XDP_REDIRECT is the most
probable action returned from BPF program that AF_XDP has in its
pipeline. Let's also bring this suggestion up to the callsite of
ixgbe_run_xdp_zc() so that compiler will be able to generate more
optimized code which in turn will make branch predictor happy.

Suggested-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220413153015.453864-5-maciej.fijalkowski@intel.com
2022-04-15 21:10:32 +02:00
Maciej Fijalkowski
0bd5ab511e ice, xsk: Decorate ICE_XDP_REDIR with likely()
ice_run_xdp_zc() suggests to compiler that XDP_REDIRECT is the most
probable action returned from BPF program that AF_XDP has in its
pipeline. Let's also bring this suggestion up to the callsite of
ice_run_xdp_zc() so that compiler will be able to generate more
optimized code which in turn will make branch predictor happy.

Suggested-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20220413153015.453864-4-maciej.fijalkowski@intel.com
2022-04-15 21:10:25 +02:00
Bert Kenward
bd4a2697e5 sfc: use hardware tx timestamps for more than PTP
The 8000 series and newer NICs all get hardware timestamps from the MAC
 and can provide timestamps on a normal TX queue, rather than via a slow
 path through the MC. As such we can use this path for any packet where a
 hardware timestamp is requested.
This also enables support for PTP over transports other than IPv4+UDP.

Signed-off-by: Bert Kenward <bkenward@solarflare.com>
Signed-off-by: Edward Cree <ecree@xilinx.com>
Link: https://lore.kernel.org/r/510652dc-54b4-0e11-657e-e37ee3ca26a9@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-08 14:43:10 -07:00
David S. Miller
4a778f3d53 Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/nex
t-queue

Tony Nguyen says:

====================
100GbE Intel Wired LAN Driver Updates 2022-04-07

Alexander Lobakin says:

This hunts down several places around packet templates/dummies for
switch rules which are either repetitive, fragile or just not
really readable code.
It's a common need to add new packet templates and to review such
changes as well, try to simplify both with the help of a pair
macros and aliases.
ice_find_dummy_packet() became very complex at this point with tons
of nested if-elses. It clearly showed this approach does not scale,
so convert its logics to the simple mask-match + static const array.

bloat-o-meter is happy about that (built w/ LLVM 13):

add/remove: 0/1 grow/shrink: 1/1 up/down: 2/-1058 (-1056)
Function                                     old     new   delta
ice_fill_adv_dummy_packet                    289     291      +2
ice_adv_add_update_vsi_list                  201       -    -201
ice_add_adv_rule                            2950    2093    -857
Total: Before=414512, After=413456, chg -0.25%
add/remove: 53/52 grow/shrink: 0/0 up/down: 4660/-3988 (672)
RO Data                                      old     new   delta
ice_dummy_pkt_profiles                         -     672    +672
Total: Before=37895, After=38567, chg +1.77%

Diffstat also looks nice, and adding new packet templates now takes
less lines.

We'll probably come out with dynamic template crafting in a while,
but for now let's improve what we have currently.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-08 13:41:31 +01:00
Andy Gospodarek
9f4b28301c bnxt: XDP multibuffer enablement
Allow aggregation buffers to be in place in the receive path and
allow XDP programs to be attached when using a larger than 4k MTU.

v3: Add a check to sure XDP program supports multipage packets.

Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-08 11:52:48 +01:00
Andy Gospodarek
a7559bc8c1 bnxt: support transmit and free of aggregation buffers
This patch adds the following features:
- Support for XDP_TX and XDP_DROP action when using xdp_buff
  with frags
- Support for freeing all frags attached to an xdp_buff
- Cleanup of TX ring buffers after transmits complete
- Slight change in definition of bnxt_sw_tx_bd since nr_frags
  and RX producer may both need to be used
- Clear out skb_shared_info at the end of the buffer

v2: Fix uninitialized variable warning in bnxt_xdp_buff_frags_free().

Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-08 11:52:48 +01:00
Andy Gospodarek
1dc4c557bf bnxt: adding bnxt_xdp_build_skb to build skb from multibuffer xdp_buff
Since we have an xdp_buff with frags there needs to be a way to
convert that into a valid sk_buff in the event that XDP_PASS is
the resulting operation.  This adds a new rx_skb_func when the
netdev has an MTU that prevents the packets from sitting in a
single page.

This also make sure that GRO/LRO stay disabled even when using
the aggregation ring for large buffers.

v3: Use BNXT_PAGE_MODE_BUF_SIZE for build_skb

Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-08 11:52:48 +01:00
Andy Gospodarek
9a6aa35048 bnxt: add page_pool support for aggregation ring when using xdp
If we are using aggregation rings with XDP enabled, allocate page
buffers for the aggregation rings from the page_pool.

Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-08 11:52:48 +01:00
Andy Gospodarek
3286123619 bnxt: change receive ring space parameters
Modify ring header data split and jumbo parameters to account
for the fact that the design for XDP multibuffer puts close to
the first 4k of data in a page and the remaining portions of
the packet go in the aggregation ring.

v3: Simplified code around initial buffer size calculation

Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-08 11:52:47 +01:00
Andy Gospodarek
31b9998bf2 bnxt: set xdp_buff pfmemalloc flag if needed
Set the pfmemaloc flag in the xdp buff so that this can be
copied to the skb if needed for an XDP_PASS action.

Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-08 11:52:47 +01:00
Andy Gospodarek
4c6c123c9a bnxt: adding bnxt_rx_agg_pages_xdp for aggregated xdp
This patch adds a new function that will read pages from the
aggregation ring and create an xdp_buff with frags based on
the entries in the aggregation ring.

Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-08 11:52:47 +01:00
Andy Gospodarek
23e4c0469a bnxt: rename bnxt_rx_pages to bnxt_rx_agg_pages_skb
Clarify that this is reading buffers from the aggregation ring.

Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-08 11:52:47 +01:00
Andy Gospodarek
ca1df2dd8e bnxt: refactor bnxt_rx_pages operate on skb_shared_info
Rather than operating on an sk_buff, add frags from the aggregation
ring into the frags of an skb_shared_info.  This will allow the
caller to use either an sk_buff or xdp_buff.

Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-08 11:52:47 +01:00
Andy Gospodarek
ee536dcbdc bnxt: add flag to denote that an xdp program is currently attached
This will be used to determine if bnxt_rx_xdp should be called
rather than calling it every time.

Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-08 11:52:47 +01:00
Andy Gospodarek
b231c3f341 bnxt: refactor bnxt_rx_xdp to separate xdp_init_buff/xdp_prepare_buff
Move initialization of xdp_buff outside of bnxt_rx_xdp to prepare
for allowing bnxt_rx_xdp to operate on multibuffer xdp_buffs.

v2: Fix uninitalized variables warning in bnxt_xdp.c.
v3: Add new define BNXT_PAGE_MODE_BUF_SIZE

Signed-off-by: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-08 11:52:47 +01:00
Jakub Kicinski
dc2e0617f1 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
No conflicts.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-07 23:24:23 -07:00
Xiaomeng Tong
4daf5f1956 qed: remove an unneed NULL check on list iterator
The define for_each_pci_dev(d) is:
 while ((d = pci_get_device(PCI_ANY_ID, PCI_ANY_ID, d)) != NULL)

Thus, the list iterator 'd' is always non-NULL so it doesn't need to
be checked. So just remove the unnecessary NULL check. Also remove the
unnecessary initializer because the list iterator is always initialized.

Signed-off-by: Xiaomeng Tong <xiam0nd.tong@gmail.com>
Link: https://lore.kernel.org/r/20220406015921.29267-1-xiam0nd.tong@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-07 21:04:10 -07:00
Robin Murphy
6a62924c0a sfc: Stop using iommu_present()
Even if an IOMMU might be present for some PCI segment in the system,
that doesn't necessarily mean it provides translation for the device
we care about. It appears that what we care about here is specifically
whether DMA mapping ops involve any IOMMU overhead or not, so check for
translation actually being active for our device.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Acked-by: Edward Cree <ecree.xilinx@gmail.com>
Link: https://lore.kernel.org/r/7350f957944ecfce6cce90f422e3992a1f428775.1649166055.git.robin.murphy@arm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-07 21:04:10 -07:00
Alexander Lobakin
e33163a40d ice: switch: convert packet template match code to rodata
Trade text size for rodata size and replace tons of nested if-elses
to the const mask match based structs. The almost entire
ice_find_dummy_packet() now becomes just one plain while-increment
loop. The order in ice_dummy_pkt_profiles[] should be same with the
if-elses order previously, as masks become less and less strict
through the array to follow the original code flow.
Apart from removing 80 locs of 4-level if-elses, it brings a solid
text size optimization:

add/remove: 0/1 grow/shrink: 1/1 up/down: 2/-1058 (-1056)
Function                                     old     new   delta
ice_fill_adv_dummy_packet                    289     291      +2
ice_adv_add_update_vsi_list                  201       -    -201
ice_add_adv_rule                            2950    2093    -857
Total: Before=414512, After=413456, chg -0.25%
add/remove: 53/52 grow/shrink: 0/0 up/down: 4660/-3988 (672)
RO Data                                      old     new   delta
ice_dummy_pkt_profiles                         -     672    +672
Total: Before=37895, After=38567, chg +1.77%

Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Tested-by: Marcin Szycik <marcin.szycik@linux.intel.com>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-04-07 08:20:10 -07:00
Alexander Lobakin
07a28842bb ice: switch: use convenience macros to declare dummy pkt templates
Declarations of dummy/template packet headers and offsets can be
minified to improve readability and simplify adding new templates.
Move all the repetitive constructions into two macros and let them
do the name and type expansions.
Linewrap removal is yet another positive side effect.

Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Tested-by: Marcin Szycik <marcin.szycik@linux.intel.com>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-04-07 08:20:10 -07:00
Alexander Lobakin
1b699f81db ice: switch: use a struct to pass packet template params
ice_find_dummy_packet() contains a lot of boilerplate code and a
nice room for copy-paste mistakes.
Instead of passing 3 separate pointers back and forth to get packet
template (dummy) params, directly return a structure containing
them. Then, use a macro to compose compound literals and avoid code
duplication on return path.
Now, dummy packet type/name is needed only once to return a full
correct triple pkt-pkt_len-offsets, and those are all one-liners.
dummy_ipv4_gtpu_ipv4_packet_offsets is just moved around and renamed
(as well as dummy_ipv6_gtp_packet_offsets) with no function changes.

Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Tested-by: Marcin Szycik <marcin.szycik@linux.intel.com>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-04-07 08:20:10 -07:00
Alexander Lobakin
27ffa273a0 ice: switch: unobscurify bitops loop in ice_fill_adv_dummy_packet()
A loop performing header modification according to the provided mask
in ice_fill_adv_dummy_packet() is very cryptic (and error-prone).
Replace two identical cast-deferences with a variable. Replace three
struct-member-array-accesses with a variable. Invert the condition,
reduce the indentation by one -> eliminate line wraps.

Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Tested-by: Marcin Szycik <marcin.szycik@linux.intel.com>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-04-07 08:20:10 -07:00
Alexander Lobakin
135a161a5e ice: switch: add and use u16[] aliases to ice_adv_lkup_elem::{h, m}_u
ice_adv_lkup_elem fields h_u and m_u are being accessed as raw u16
arrays in several places.
To reduce cast and braces burden, add permanent array-of-u16 aliases
with the same size as the `union ice_prot_hdr` itself via anonymous
unions to the actual struct declaration, and just access them
directly.

This:
 - removes the need to cast the union to u16[] and then dereference
   it each time -> reduces the horizon for potential bugs;
 - improves -Warray-bounds coverage -- the array size is now known
   at compilation time;
 - addresses cppcheck complaints.

Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Tested-by: Marcin Szycik <marcin.szycik@linux.intel.com>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-04-07 08:20:10 -07:00
Volodymyr Mytnyk
e8bd70250a prestera: acl: add action hw_stats support
Currently, when user adds a tc action and the action gets offloaded,
the user expects the HW stats to be counted also. This limits the
amount of supported offloaded filters, as HW counter resources may
be quite limited. Without counter assigned, the HW is capable to
carry much more filters.

To resolve the issue above, the following types of HW stats are
offloaded and supported by the driver:

any       - current default, user does not care about the type.
delayed   - polled from HW periodically.
disabled  - no HW stats needed.
immediate - not supported.

Example:
  tc filter add dev PORT ingress proto ip flower skip_sw ip_proto 0x11 \
    action drop
  tc filter add dev PORT ingress proto ip flower skip_sw ip_proto 0x12 \
    action drop hw_stats disabled
  tc filter add dev sw1p1 ingress proto ip flower skip_sw ip_proto 0x14 \
    action drop hw_stats delayed

Signed-off-by: Volodymyr Mytnyk <vmytnyk@marvell.com>
Link: https://lore.kernel.org/r/1649164814-18731-1-git-send-email-volodymyr.mytnyk@plvision.eu
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-06 22:47:38 -07:00
Borislav Petkov
8dd7cdb0f4 bnx2x: Fix undefined behavior due to shift overflowing the constant
Fix:

  drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c: In function ‘bnx2x_check_blocks_with_parity3’:
  drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c:4917:4: error: case label does not reduce to an integer constant
      case AEU_INPUTS_ATTN_BITS_MCP_LATCHED_SCPAD_PARITY:
      ^~~~

See https://lore.kernel.org/r/YkwQ6%2BtIH8GQpuct@zn.tnic for the gory
details as to why it triggers with older gccs only.

Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: Ariel Elior <aelior@marvell.com>
Cc: Sudarsana Kalluru <skalluru@marvell.com>
Cc: Manish Chopra <manishc@marvell.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Link: https://lore.kernel.org/r/20220405151517.29753-4-bp@alien8.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-04-06 12:05:48 -07:00
Xiaomeng Tong
b423e54ba9 myri10ge: fix an incorrect free for skb in myri10ge_sw_tso
All remaining skbs should be released when myri10ge_xmit fails to
transmit a packet. Fix it within another skb_list_walk_safe.

Signed-off-by: Xiaomeng Tong <xiam0nd.tong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-06 15:29:18 +01:00
Wang Qing
be8d9d0527 net: ethernet: xilinx: use of_property_read_bool() instead of of_get_property
"little-endian" has no specific content, use more helper function
of_property_read_bool() instead of of_get_property()

Signed-off-by: Wang Qing <wangqing@vivo.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-06 15:18:03 +01:00
Jamie Bainbridge
4e910dbe36 qede: confirm skb is allocated before using
qede_build_skb() assumes build_skb() always works and goes straight
to skb_reserve(). However, build_skb() can fail under memory pressure.
This results in a kernel panic because the skb to reserve is NULL.

Add a check in case build_skb() failed to allocate and return NULL.

The NULL return is handled correctly in callers to qede_build_skb().

Fixes: 8a8633978b ("qede: Add build_skb() support.")
Signed-off-by: Jamie Bainbridge <jamie.bainbridge@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-06 15:16:23 +01:00
David S. Miller
74edbe9ede Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue
Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2022-04-05

Maciej Fijalkowski says:

We were solving issues around AF_XDP busy poll's not-so-usual scenarios,
such as very big busy poll budgets applied to very small HW rings. This
set carries the things that were found during that work that apply to
net tree.

One thing that was fixed for all in-tree ZC drivers was missing on ice
side all the time - it's about syncing RCU before destroying XDP
resources. Next one fixes the bit that is checked in ice_xsk_wakeup and
third one avoids false setting of DD bits on Tx descriptors.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-06 15:03:50 +01:00
Felix Fietkau
33fc42de33 net: ethernet: mtk_eth_soc: support creating mac address based offload entries
This will be used to implement a limited form of bridge offloading.
Since the hardware does not support flow table entries with just source
and destination MAC address, the driver has to emulate it.

The hardware automatically creates entries entries for incoming flows, even
when they are bridged instead of routed, and reports when packets for these
flows have reached the minimum PPS rate for offloading.

After this happens, we look up the L2 flow offload entry based on the MAC
header and fill in the output routing information in the flow table.
The dynamically created per-flow entries are automatically removed when
either the hardware flowtable entry expires, is replaced, or if the offload
rule they belong to is removed

Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-06 14:08:51 +01:00
Felix Fietkau
8ff25d3774 net: ethernet: mtk_eth_soc: remove bridge flow offload type entry support
According to MediaTek, this feature is not supported in current hardware

Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-06 14:08:50 +01:00
Felix Fietkau
c4f033d9e0 net: ethernet: mtk_eth_soc: rework hardware flow table management
The hardware was designed to handle flow detection and creation of flow entries
by itself, relying on the software primarily for filling in egress routing
information.
When there is a hash collision between multiple flows, this allows the hardware
to maintain the entry for the most active flow.
Additionally, the hardware only keeps offloading active for entries with at
least 30 packets per second.

With this rework, the code no longer creates a hardware entries directly.
Instead, the hardware entry is only created when the PPE reports a matching
unbound flow with the minimum target rate.
In order to reduce CPU overhead, looking for flows belonging to a hash entry
is rate limited to once every 100ms.

This rework is also used as preparation for emulating bridge offload by
managing L4 offload entries on demand.

Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-06 14:08:50 +01:00
Felix Fietkau
1ccc723b58 net: ethernet: mtk_eth_soc: allocate struct mtk_ppe separately
Preparation for adding more data to it, which will increase its size.

Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-06 14:08:50 +01:00
Felix Fietkau
bb14c19122 net: ethernet: mtk_eth_soc: support TC_SETUP_BLOCK for PPE offload
This allows offload entries to be created from user space

Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-06 14:08:50 +01:00
David Bentham
817b2fdf16 net: ethernet: mtk_eth_soc: add ipv6 flow offload support
Add the missing IPv6 flow offloading support for routing only.
Hardware flow offloading is done by the packet processing engine (PPE)
of the Ethernet MAC and as it doesn't support mangling of IPv6 packets,
IPv6 NAT cannot be supported.

Signed-off-by: David Bentham <db260179@gmail.com>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-06 14:08:50 +01:00
Felix Fietkau
a333215e10 net: ethernet: mtk_eth_soc: implement flow offloading to WED devices
This allows hardware flow offloading from Ethernet to WLAN on MT7622 SoC

Co-developed-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-06 14:08:49 +01:00
Felix Fietkau
804775dfc2 net: ethernet: mtk_eth_soc: add support for Wireless Ethernet Dispatch (WED)
The Wireless Ethernet Dispatch subsystem on the MT7622 SoC can be
configured to intercept and handle access to the DMA queues and
PCIe interrupts for a MT7615/MT7915 wireless card.
It can manage the internal WDMA (Wireless DMA) controller, which allows
ethernet packets to be passed from the packet switch engine (PSE) to the
wireless card, bypassing the CPU entirely.
This can be used to implement hardware flow offloading from ethernet to
WLAN.

Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-06 14:08:49 +01:00
Felix Fietkau
d776a57e4a net: ethernet: mtk_eth_soc: add support for coherent DMA
It improves performance by eliminating the need for a cache flush on rx and tx
In preparation for supporting WED (Wireless Ethernet Dispatch), also add a
function for disabling coherent DMA at runtime.

Signed-off-by: Felix Fietkau <nbd@nbd.name>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-06 14:08:48 +01:00
Andy Chiu
19c7a43912 net: axiemac: use a phandle to reference pcs_phy
In some SGMII use cases where both a fixed link external PHY and the
internal PCS/PMA PHY need to be configured, we should explicitly use a
phandle "pcs-phy" to get the reference to the PCS/PMA PHY. Otherwise, the
driver would use "phy-handle" in the DT as the reference to both the
external and the internal PCS/PMA PHY.

In other cases where the core is connected to a SFP cage, we could still
point phy-handle to the intenal PCS/PMA PHY, and let the driver connect
to the SFP module, if exist, via phylink.

Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
Reviewed-by: Greentime Hu <greentime.hu@sifive.com>
Reviewed-by: Robert Hancock <robert.hancock@calian.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Radhey Shyam Pandey <radhey.shyam.pandey@xilinx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2022-04-06 13:54:52 +01:00