Commit Graph

8814 Commits

Author SHA1 Message Date
Maciej Fijalkowski
0cbb08707c i40e: update xdp_rxq_info::frag_size for ZC enabled Rx queue
Now that i40e driver correctly sets up frag_size in xdp_rxq_info, let us
make it work for ZC multi-buffer as well. i40e_ring::rx_buf_len for ZC
is being set via xsk_pool_get_rx_frame_size() and this needs to be
propagated up to xdp_rxq_info.

Fixes: 1c9ba9c146 ("i40e: xsk: add RX multi-buffer support")
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/r/20240124191602.566724-12-maciej.fijalkowski@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-24 16:24:07 -08:00
Maciej Fijalkowski
a045d2f2d0 i40e: set xdp_rxq_info::frag_size
i40e support XDP multi-buffer so it is supposed to use
__xdp_rxq_info_reg() instead of xdp_rxq_info_reg() and set the
frag_size. It can not be simply converted at existing callsite because
rx_buf_len could be un-initialized, so let us register xdp_rxq_info
within i40e_configure_rx_ring(), which happen to be called with already
initialized rx_buf_len value.

Commit 5180ff1364 ("i40e: use int for i40e_status") converted 'err' to
int, so two variables to deal with return codes are not needed within
i40e_configure_rx_ring(). Remove 'ret' and use 'err' to handle status
from xdp_rxq_info registration.

Fixes: e213ced19b ("i40e: add support for XDP multi-buffer Rx")
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/r/20240124191602.566724-11-maciej.fijalkowski@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-24 16:24:07 -08:00
Maciej Fijalkowski
3de38c8717 ice: update xdp_rxq_info::frag_size for ZC enabled Rx queue
Now that ice driver correctly sets up frag_size in xdp_rxq_info, let us
make it work for ZC multi-buffer as well. ice_rx_ring::rx_buf_len for ZC
is being set via xsk_pool_get_rx_frame_size() and this needs to be
propagated up to xdp_rxq_info.

Use a bigger hammer and instead of unregistering only xdp_rxq_info's
memory model, unregister it altogether and register it again and have
xdp_rxq_info with correct frag_size value.

Fixes: 1bbc04de60 ("ice: xsk: add RX multi-buffer support")
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/r/20240124191602.566724-9-maciej.fijalkowski@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-24 16:24:07 -08:00
Maciej Fijalkowski
290779905d intel: xsk: initialize skb_frag_t::bv_offset in ZC drivers
Ice and i40e ZC drivers currently set offset of a frag within
skb_shared_info to 0, which is incorrect. xdp_buffs that come from
xsk_buff_pool always have 256 bytes of a headroom, so they need to be
taken into account to retrieve xdp_buff::data via skb_frag_address().
Otherwise, bpf_xdp_frags_increase_tail() would be starting its job from
xdp_buff::data_hard_start which would result in overwriting existing
payload.

Fixes: 1c9ba9c146 ("i40e: xsk: add RX multi-buffer support")
Fixes: 1bbc04de60 ("ice: xsk: add RX multi-buffer support")
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/r/20240124191602.566724-8-maciej.fijalkowski@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-24 16:24:07 -08:00
Maciej Fijalkowski
2ee788c064 ice: remove redundant xdp_rxq_info registration
xdp_rxq_info struct can be registered by drivers via two functions -
xdp_rxq_info_reg() and __xdp_rxq_info_reg(). The latter one allows
drivers that support XDP multi-buffer to set up xdp_rxq_info::frag_size
which in turn will make it possible to grow the packet via
bpf_xdp_adjust_tail() BPF helper.

Currently, ice registers xdp_rxq_info in two spots:
1) ice_setup_rx_ring() // via xdp_rxq_info_reg(), BUG
2) ice_vsi_cfg_rxq()   // via __xdp_rxq_info_reg(), OK

Cited commit under fixes tag took care of setting up frag_size and
updated registration scheme in 2) but it did not help as
1) is called before 2) and as shown above it uses old registration
function. This means that 2) sees that xdp_rxq_info is already
registered and never calls __xdp_rxq_info_reg() which leaves us with
xdp_rxq_info::frag_size being set to 0.

To fix this misbehavior, simply remove xdp_rxq_info_reg() call from
ice_setup_rx_ring().

Fixes: 2fba7dc515 ("ice: Add support for XDP multi-buffer on Rx side")
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/r/20240124191602.566724-7-maciej.fijalkowski@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-24 16:24:06 -08:00
Tirthendu Sarkar
83014323c6 i40e: handle multi-buffer packets that are shrunk by xdp prog
XDP programs can shrink packets by calling the bpf_xdp_adjust_tail()
helper function. For multi-buffer packets this may lead to reduction of
frag count stored in skb_shared_info area of the xdp_buff struct. This
results in issues with the current handling of XDP_PASS and XDP_DROP
cases.

For XDP_PASS, currently skb is being built using frag count of
xdp_buffer before it was processed by XDP prog and thus will result in
an inconsistent skb when frag count gets reduced by XDP prog. To fix
this, get correct frag count while building the skb instead of using
pre-obtained frag count.

For XDP_DROP, current page recycling logic will not reuse the page but
instead will adjust the pagecnt_bias so that the page can be freed. This
again results in inconsistent behavior as the page refcnt has already
been changed by the helper while freeing the frag(s) as part of
shrinking the packet. To fix this, only adjust pagecnt_bias for buffers
that are stillpart of the packet post-xdp prog run.

Fixes: e213ced19b ("i40e: add support for XDP multi-buffer Rx")
Reported-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Tirthendu Sarkar <tirthendu.sarkar@intel.com>
Link: https://lore.kernel.org/r/20240124191602.566724-6-maciej.fijalkowski@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-24 16:24:06 -08:00
Maciej Fijalkowski
ad2047cf5d ice: work on pre-XDP prog frag count
Fix an OOM panic in XDP_DRV mode when a XDP program shrinks a
multi-buffer packet by 4k bytes and then redirects it to an AF_XDP
socket.

Since support for handling multi-buffer frames was added to XDP, usage
of bpf_xdp_adjust_tail() helper within XDP program can free the page
that given fragment occupies and in turn decrease the fragment count
within skb_shared_info that is embedded in xdp_buff struct. In current
ice driver codebase, it can become problematic when page recycling logic
decides not to reuse the page. In such case, __page_frag_cache_drain()
is used with ice_rx_buf::pagecnt_bias that was not adjusted after
refcount of page was changed by XDP prog which in turn does not drain
the refcount to 0 and page is never freed.

To address this, let us store the count of frags before the XDP program
was executed on Rx ring struct. This will be used to compare with
current frag count from skb_shared_info embedded in xdp_buff. A smaller
value in the latter indicates that XDP prog freed frag(s). Then, for
given delta decrement pagecnt_bias for XDP_DROP verdict.

While at it, let us also handle the EOP frag within
ice_set_rx_bufs_act() to make our life easier, so all of the adjustments
needed to be applied against freed frags are performed in the single
place.

Fixes: 2fba7dc515 ("ice: Add support for XDP multi-buffer on Rx side")
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/r/20240124191602.566724-5-maciej.fijalkowski@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-24 16:24:06 -08:00
Maciej Fijalkowski
f7f6aa8e24 xsk: make xsk_buff_pool responsible for clearing xdp_buff::flags
XDP multi-buffer support introduced XDP_FLAGS_HAS_FRAGS flag that is
used by drivers to notify data path whether xdp_buff contains fragments
or not. Data path looks up mentioned flag on first buffer that occupies
the linear part of xdp_buff, so drivers only modify it there. This is
sufficient for SKB and XDP_DRV modes as usually xdp_buff is allocated on
stack or it resides within struct representing driver's queue and
fragments are carried via skb_frag_t structs. IOW, we are dealing with
only one xdp_buff.

ZC mode though relies on list of xdp_buff structs that is carried via
xsk_buff_pool::xskb_list, so ZC data path has to make sure that
fragments do *not* have XDP_FLAGS_HAS_FRAGS set. Otherwise,
xsk_buff_free() could misbehave if it would be executed against xdp_buff
that carries a frag with XDP_FLAGS_HAS_FRAGS flag set. Such scenario can
take place when within supplied XDP program bpf_xdp_adjust_tail() is
used with negative offset that would in turn release the tail fragment
from multi-buffer frame.

Calling xsk_buff_free() on tail fragment with XDP_FLAGS_HAS_FRAGS would
result in releasing all the nodes from xskb_list that were produced by
driver before XDP program execution, which is not what is intended -
only tail fragment should be deleted from xskb_list and then it should
be put onto xsk_buff_pool::free_list. Such multi-buffer frame will never
make it up to user space, so from AF_XDP application POV there would be
no traffic running, however due to free_list getting constantly new
nodes, driver will be able to feed HW Rx queue with recycled buffers.
Bottom line is that instead of traffic being redirected to user space,
it would be continuously dropped.

To fix this, let us clear the mentioned flag on xsk_buff_pool side
during xdp_buff initialization, which is what should have been done
right from the start of XSK multi-buffer support.

Fixes: 1bbc04de60 ("ice: xsk: add RX multi-buffer support")
Fixes: 1c9ba9c146 ("i40e: xsk: add RX multi-buffer support")
Fixes: 24ea50127e ("xsk: support mbuf on ZC RX")
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Link: https://lore.kernel.org/r/20240124191602.566724-3-maciej.fijalkowski@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-24 16:24:06 -08:00
Michal Schmidt
359724fa3a idpf: distinguish vports by the dev_port attribute
idpf registers multiple netdevs (virtual ports) for one PCI function,
but it does not provide a way for userspace to distinguish them with
sysfs attributes. Per Documentation/ABI/testing/sysfs-class-net, it is
a bug not to set dev_port for independent ports on the same PCI bus,
device and function.

Without dev_port set, systemd-udevd's default naming policy attempts
to assign the same name ("ens2f0") to all four idpf netdevs on my test
system and obviously fails, leaving three of them with the initial
eth<N> name.

With this patch, systemd-udevd is able to assign unique names to the
netdevs (e.g. "ens2f0", "ens2f0d1", "ens2f0d2", "ens2f0d3").

The Intel-provided out-of-tree idpf driver already sets dev_port. In
this patch I chose to do it in the same place in the idpf_cfg_netdev
function.

Fixes: 0fe45467a1 ("idpf: add create vport and netdev configuration")
Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-01-21 18:11:09 +00:00
Tony Nguyen
9cfd3b5021 i40e: Include types.h to some headers
Commit 56df345917 ("i40e: Remove circular header dependencies and fix
headers") redistributed a number of includes from one large header file
to the locations they were needed. In some environments, types.h is not
included and causing compile issues. The driver should not rely on
implicit inclusion from other locations; explicitly include it to these
files.

Snippet of issue. Entire log can be seen through the Closes: link.

In file included from drivers/net/ethernet/intel/i40e/i40e_diag.h:7,
                 from drivers/net/ethernet/intel/i40e/i40e_diag.c:4:
drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h:33:9: error: unknown type name '__le16'
   33 |         __le16 flags;
      |         ^~~~~~
drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h:34:9: error: unknown type name '__le16'
   34 |         __le16 opcode;
      |         ^~~~~~
...
drivers/net/ethernet/intel/i40e/i40e_diag.h:22:9: error: unknown type name 'u32'
   22 |         u32 elements;   /* number of elements if array */
      |         ^~~
drivers/net/ethernet/intel/i40e/i40e_diag.h:23:9: error: unknown type name 'u32'
   23 |         u32 stride;     /* bytes between each element */

Reported-by: Martin Zaharinov <micron10@gmail.com>
Closes: https://lore.kernel.org/netdev/21BBD62A-F874-4E42-B347-93087EEA8126@gmail.com/
Fixes: 56df345917 ("i40e: Remove circular header dependencies and fix headers")
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Arpana Arland <arpanax.arland@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://lore.kernel.org/r/20240117172534.3555162-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-18 09:52:52 -08:00
Jakub Kicinski
e63c1822ac Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR.

Conflicts:

drivers/net/ethernet/broadcom/bnxt/bnxt.c
  e009b2efb7 ("bnxt_en: Remove mis-applied code from bnxt_cfg_ntp_filters()")
  0f2b214779 ("bnxt_en: Fix compile error without CONFIG_RFS_ACCEL")
https://lore.kernel.org/all/20240105115509.225aa8a2@canb.auug.org.au/

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-04 18:06:46 -08:00
Jakub Kicinski
2e957f9c32 Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue
Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2024-01-03 (i40e, ice, igc)

This series contains updates to i40e, ice, and igc drivers.

Ke Xiao fixes use after free for unicast filters on i40e.

Andrii restores VF MSI-X flag after PCI reset on i40e.

Paul corrects admin queue link status structure to fulfill firmware
expectations for ice.

Rodrigo Cataldo corrects value used for hicredit calculations on igc.

* '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
  igc: Fix hicredit calculation
  ice: fix Get link status data length
  i40e: Restore VF MSI-X state during PCI reset
  i40e: fix use-after-free in i40e_aqc_add_filters()
====================

Link: https://lore.kernel.org/r/20240103193254.822968-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-04 08:19:14 -08:00
David S. Miller
5403d39b4b Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/nex
t-queue

Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2024-01-02 (ice)

This series contains updates to ice driver only.

Karol adds support for capable devices to receive timestamp via
interrupt rather than polling to allow for less delay.

Andrii adds support switchdev hardware packet mirroring.

Jake reworks VF rebuild to avoid destroying objects that do not need to
be.

Jan S removes reporting of rx_len_errors as they are incorrectly reported
by hardware.

Jan G adds const modifier to some uses that are applicable.

Kunwu Chan adds some checks for failed memory allocations.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2024-01-04 10:40:07 +00:00
Jakub Kicinski
b77c1e3b9f Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue
Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2023-12-27 (igc)

This series contains updates to igc driver only.

Kurt Kanzenbach resolves issues around VLAN ntuple rules; correctly
reporting back added rules and checking for valid values.

* '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
  igc: Check VLAN EtherType mask
  igc: Check VLAN TCI mask
  igc: Report VLAN EtherType matching back to user
====================

Link: https://lore.kernel.org/r/20231227210041.3035055-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-03 17:02:45 -08:00
Jakub Kicinski
0fa4f912ee Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue
Tony Nguyen says:

====================
Intel Wired LAN Driver Updates 2023-12-27 (ice, i40e)

This series contains updates to ice and i40e drivers.

Katarzyna changes message to no longer be reported as error under
certain conditions as it can be expected on ice.

Ngai-Mint ensures VSI is always closed when stopping interface to
prevent NULL pointer dereference for ice.

Arkadiusz corrects reporting of phase offset value for ice.

Sudheer corrects checking on ADQ filters to prevent invalid values on
i40e.

* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
  i40e: Fix filter input checks to prevent config with invalid values
  ice: dpll: fix phase offset value
  ice: Shut down VSI with "link-down-on-close" enabled
  ice: Fix link_down_on_close message
====================

Link: https://lore.kernel.org/r/20231227182541.3033124-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-03 17:00:15 -08:00
Rodrigo Cataldo
947dfc8138 igc: Fix hicredit calculation
According to the Intel Software Manual for I225, Section 7.5.2.7,
hicredit should be multiplied by the constant link-rate value, 0x7736.

Currently, the old constant link-rate value, 0x7735, from the boards
supported on igb are being used, most likely due to a copy'n'paste, as
the rest of the logic is the same for both drivers.

Update hicredit accordingly.

Fixes: 1ab011b0bf ("igc: Add support for CBS offloading")
Reviewed-by: Kurt Kanzenbach <kurt@linutronix.de>
Signed-off-by: Rodrigo Cataldo <rodrigo.cadore@l-acoustics.com>
Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-01-03 11:19:47 -08:00
Paul Greenwalt
9fb266dcb6 ice: fix Get link status data length
Get link status version 2 (opcode 0x0607) is returning an error because FW
expects a data length of 56 bytes, and this is causing the driver to fail
probe.

Update the get link status version 2 data length to 56 bytes by adding 5
byte reserved5 field to the end of struct ice_aqc_get_link_status_data and
passing it as parameter to offsetofend() to the fix error.

Fixes: 2777d24ec6 ("ice: Add ice_get_link_status_datalen")
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Paul Greenwalt <paul.greenwalt@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-01-03 11:19:47 -08:00
Andrii Staikov
371e576ff3 i40e: Restore VF MSI-X state during PCI reset
During a PCI FLR the MSI-X Enable flag in the VF PCI MSI-X capability
register will be cleared. This can lead to issues when a VF is
assigned to a VM because in these cases the VF driver receives no
indication of the PF PCI error/reset and additionally it is incapable
of restoring the cleared flag in the hypervisor configuration space
without fully reinitializing the driver interrupt functionality.

Since the VF driver is unable to easily resolve this condition on its own,
restore the VF MSI-X flag during the PF PCI reset handling.

Fixes: 19b7960b2d ("i40e: implement split PCI error reset handler")
Co-developed-by: Karen Ostrowska <karen.ostrowska@intel.com>
Signed-off-by: Karen Ostrowska <karen.ostrowska@intel.com>
Co-developed-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Andrii Staikov <andrii.staikov@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-01-03 11:18:54 -08:00
Ke Xiao
6a15584e99 i40e: fix use-after-free in i40e_aqc_add_filters()
Commit 3116f59c12 ("i40e: fix use-after-free in
i40e_sync_filters_subtask()") avoided use-after-free issues,
by increasing refcount during update the VSI filter list to
the HW. However, it missed the unicast situation.

When deleting an unicast FDB entry, the i40e driver will release
the mac_filter, and i40e_service_task will concurrently request
firmware to add the mac_filter, which will lead to the following
use-after-free issue.

Fix again for both netdev->uc and netdev->mc.

BUG: KASAN: use-after-free in i40e_aqc_add_filters+0x55c/0x5b0 [i40e]
Read of size 2 at addr ffff888eb3452d60 by task kworker/8:7/6379

CPU: 8 PID: 6379 Comm: kworker/8:7 Kdump: loaded Tainted: G
Workqueue: i40e i40e_service_task [i40e]
Call Trace:
 dump_stack+0x71/0xab
 print_address_description+0x6b/0x290
 kasan_report+0x14a/0x2b0
 i40e_aqc_add_filters+0x55c/0x5b0 [i40e]
 i40e_sync_vsi_filters+0x1676/0x39c0 [i40e]
 i40e_service_task+0x1397/0x2bb0 [i40e]
 process_one_work+0x56a/0x11f0
 worker_thread+0x8f/0xf40
 kthread+0x2a0/0x390
 ret_from_fork+0x1f/0x40

Allocated by task 21948:
 kasan_kmalloc+0xa6/0xd0
 kmem_cache_alloc_trace+0xdb/0x1c0
 i40e_add_filter+0x11e/0x520 [i40e]
 i40e_addr_sync+0x37/0x60 [i40e]
 __hw_addr_sync_dev+0x1f5/0x2f0
 i40e_set_rx_mode+0x61/0x1e0 [i40e]
 dev_uc_add_excl+0x137/0x190
 i40e_ndo_fdb_add+0x161/0x260 [i40e]
 rtnl_fdb_add+0x567/0x950
 rtnetlink_rcv_msg+0x5db/0x880
 netlink_rcv_skb+0x254/0x380
 netlink_unicast+0x454/0x610
 netlink_sendmsg+0x747/0xb00
 sock_sendmsg+0xe2/0x120
 __sys_sendto+0x1ae/0x290
 __x64_sys_sendto+0xdd/0x1b0
 do_syscall_64+0xa0/0x370
 entry_SYSCALL_64_after_hwframe+0x65/0xca

Freed by task 21948:
 __kasan_slab_free+0x137/0x190
 kfree+0x8b/0x1b0
 __i40e_del_filter+0x116/0x1e0 [i40e]
 i40e_del_mac_filter+0x16c/0x300 [i40e]
 i40e_addr_unsync+0x134/0x1b0 [i40e]
 __hw_addr_sync_dev+0xff/0x2f0
 i40e_set_rx_mode+0x61/0x1e0 [i40e]
 dev_uc_del+0x77/0x90
 rtnl_fdb_del+0x6a5/0x860
 rtnetlink_rcv_msg+0x5db/0x880
 netlink_rcv_skb+0x254/0x380
 netlink_unicast+0x454/0x610
 netlink_sendmsg+0x747/0xb00
 sock_sendmsg+0xe2/0x120
 __sys_sendto+0x1ae/0x290
 __x64_sys_sendto+0xdd/0x1b0
 do_syscall_64+0xa0/0x370
 entry_SYSCALL_64_after_hwframe+0x65/0xca

Fixes: 3116f59c12 ("i40e: fix use-after-free in i40e_sync_filters_subtask()")
Fixes: 41c445ff0f ("i40e: main driver core")
Signed-off-by: Ke Xiao <xiaoke@sangfor.com.cn>
Signed-off-by: Ding Hui <dinghui@sangfor.com.cn>
Cc: Di Zhu <zhudi2@huawei.com>
Reviewed-by: Jan Sokolowski <jan.sokolowski@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-01-03 10:27:13 -08:00
Simon Horman
55f96e8bbe i40e: Avoid unnecessary use of comma operator
Although it does not seem to have any untoward side-effects,
the use of ';' to separate to assignments seems more appropriate than ','.

Flagged by clang-17 -Wcomma

No functional change intended.
Compile tested only.

Signed-off-by: Simon Horman <horms@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-01-02 13:46:18 -08:00
Andrii Staikov
31deb12e85 i40e: Fix VF disable behavior to block all traffic
Currently, if a VF is disabled using the
'ip link set dev $ETHX vf $VF_NUM state disable' command, the VF is still
able to receive traffic.

Fix the behavior of the 'ip link set dev $ETHX vf $VF_NUM state disable'
to completely shutdown the VF's queues making it entirely disabled and
not able to receive or send any traffic.

Modify the behavior of the 'ip link set $ETHX vf $VF_NUM state enable'
command to make a VF do reinitialization bringing the queues back up.

Co-developed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Jan Sokolowski <jan.sokolowski@intel.com>
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Andrii Staikov <andrii.staikov@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-01-02 13:46:18 -08:00
Jedrzej Jagielski
5795f533f3 ixgbe: Refactor returning internal error codes
Change returning codes to the kernel ones instead of
the internal ones for the entire ixgbe driver.

Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com>
Tested-by: Sunitha Mekala <sunithax.d.mekala@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-01-02 13:46:00 -08:00
Jedrzej Jagielski
6c1b4af8c1 ixgbe: Refactor overtemp event handling
Currently ixgbe driver is notified of overheating events
via internal IXGBE_ERR_OVERTEMP error code.

Change the approach for handle_lasi() to use freshly introduced
is_overtemp function parameter which set when such event occurs.
Change check_overtemp() to bool and return true if overtemp
event occurs.

Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Sunitha Mekala <sunithax.d.mekala@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-01-02 13:30:26 -08:00
Ovidiu Panait
738808ae82 ixgbe: report link state for VF devices
The link state of VF devices can be controlled via "ip link set", but the
current state (auto/disabled) is not reported by "ip link show".

Update ixgbe_ndo_get_vf_config() to make this info available to userspace.

Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Ovidiu Panait <ovidiu.panait@windriver.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-01-02 13:30:26 -08:00
Kunwu Chan
3027e7b15b ice: Fix some null pointer dereference issues in ice_ptp.c
devm_kasprintf() returns a pointer to dynamically allocated memory
which can be NULL upon failure.

Fixes: d938a8cca8 ("ice: Auxbus devices & driver for E822 TS")
Cc: Kunwu Chan <kunwu.chan@hotmail.com>
Suggested-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Kunwu Chan <chentao@kylinos.cn>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-01-02 11:18:32 -08:00
Jan Glaza
b8ab885819 ice: ice_base.c: Add const modifier to params and vars
Add const modifier to function parameters and variables where appropriate
in ice_base.c and corresponding declarations in ice_base.h.

The reason for starting the change is that read-only pointers should be
marked as const when possible to allow for smoother and more optimal code
generation and optimization as well as allowing the compiler to warn the
developer about potentially unwanted modifications, while not carrying
noticeable negative impact.

Reviewed-by: Andrii Staikov <andrii.staikov@intel.com>
Reviewed-by: Sachin Bahadur <sachin.bahadur@intel.com>
Signed-off-by: Jan Glaza <jan.glaza@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-01-02 11:18:32 -08:00
Jan Sokolowski
f9f9de23dc ice: remove rx_len_errors statistic
It was found that this statistic is incorrectly
reported by HW and thus, useless.

As RX length error statistics are shown to the
end user when requested, the values reported
are misleading.

Thus, that value is no longer reported and
doesn't count anymore when adding all rx errors.

Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com>
Reviewed-by: Mateusz Polchlopek <mateusz.polchlopek@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-01-02 11:18:32 -08:00
Jacob Keller
2a2cb4c6c1 ice: replace ice_vf_recreate_vsi() with ice_vf_reconfig_vsi()
The ice_vf_create_vsi() function and its VF ops helper introduced by commit
a4c785e8162e ("ice: convert vf_ops .vsi_rebuild to .create_vsi") are used
during an individual VF reset to re-create the VSI. This was done in order
to ensure that the VSI gets properly reconfigured within the hardware.

This is somewhat heavy handed as we completely release the VSI memory and
structure, and then create a new VSI. This can also potentially force a
change of the VSI index as we will re-use the first open slot in the VSI
array which may not be the same.

As part of implementing devlink reload, commit 6624e780a5 ("ice: split
ice_vsi_setup into smaller functions") split VSI setup into smaller
functions, introducing both ice_vsi_cfg() and ice_vsi_decfg() which can be
used to configure or deconfigure an existing software VSI structure.

Rather than completely removing the VSI and adding a new one using the
.create_vsi() VF operation, simply use ice_vsi_decfg() to remove the
current configuration. Save the VSI type and then call ice_vsi_cfg() to
reconfigure the VSI as the same type that it was before.

The existing reset logic assumes that all hardware filters will be removed,
so also call ice_fltr_remove_all() before re-configuring the VSI.

This new operation does not re-create the VSI, so rename it to
ice_vf_reconfig_vsi().

The new approach can safely share the exact same flow for both SR-IOV VFs
as well as the Scalable IOV VFs being worked on. This uses less code and is
a better abstraction over fully deleting the VSI and adding a new one.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Petr Oros <poros@redhat.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-01-02 11:18:32 -08:00
Andrii Staikov
aa4967d852 ice: Add support for packet mirroring using hardware in switchdev mode
Switchdev mode allows to add mirroring rules to mirror incoming and
outgoing packets to the interface's port representor. Previously, this was
available only using software functionality. Add possibility to offload
this functionality to the NIC hardware.

Introduce ICE_MIRROR_PACKET filter action to the ice_sw_fwd_act_type enum
to identify the desired action and pass it to the hardware as well as the
VSI to mirror.

Example of tc mirror command using hardware:
  tc filter add dev ens1f0np0 ingress protocol ip prio 1 flower src_mac
  b4:96:91:a5:c7:a7 skip_sw action mirred egress mirror dev eth1

ens1f0np0 - PF
b4:96:91:a5:c7:a7 - source MAC address
eth1 - PR of a VF to mirror to

Co-developed-by: Marcin Szycik <marcin.szycik@intel.com>
Signed-off-by: Marcin Szycik <marcin.szycik@intel.com>
Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com>
Signed-off-by: Andrii Staikov <andrii.staikov@intel.com>
Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-01-02 11:18:32 -08:00
Karol Kolacinski
82e71b226e ice: Enable SW interrupt from FW for LL TS
Introduce new capability - Low Latency Timestamping with Interrupt.
On supported devices, driver can request a single timestamp from FW
without polling the register afterwards. Instead, FW can issue
a dedicated interrupt when the timestamp was read from the PHY register
and its value is available to read from the register.
This eliminates the need of bottom half scheduling, which results in
minimal delay for timestamping.

For this mode, allocate TS indices sequentially, so that timestamps are
always completed in FIFO manner.

Co-developed-by: Yochai Hagvi <yochai.hagvi@intel.com>
Signed-off-by: Yochai Hagvi <yochai.hagvi@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-01-02 11:18:32 -08:00
Karol Kolacinski
00d5000144 ice: Schedule service task in IRQ top half
Schedule service task and EXTTS in the top half to avoid bottom half
scheduling if possible, which significantly reduces timestamping delay.

Co-developed-by: Michal Michalik <michal.michalik@intel.com>
Signed-off-by: Michal Michalik <michal.michalik@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-01-02 11:18:31 -08:00
Kurt Kanzenbach
7afd49a38e igc: Check VLAN EtherType mask
Currently the driver accepts VLAN EtherType steering rules regardless of
the configured mask. And things might fail silently or with confusing error
messages to the user. The VLAN EtherType can only be matched by full
mask. Therefore, add a check for that.

For instance the following rule is invalid, but the driver accepts it and
ignores the user specified mask:
|root@host:~# ethtool -N enp3s0 flow-type ether vlan-etype 0x8100 \
|             m 0x00ff action 0
|Added rule with ID 63
|root@host:~# ethtool --show-ntuple enp3s0
|4 RX rings available
|Total 1 rules
|
|Filter: 63
|        Flow Type: Raw Ethernet
|        Src MAC addr: 00:00:00:00:00:00 mask: FF:FF:FF:FF:FF:FF
|        Dest MAC addr: 00:00:00:00:00:00 mask: FF:FF:FF:FF:FF:FF
|        Ethertype: 0x0 mask: 0xFFFF
|        VLAN EtherType: 0x8100 mask: 0x0
|        VLAN: 0x0 mask: 0xffff
|        User-defined: 0x0 mask: 0xffffffffffffffff
|        Action: Direct to queue 0

After:
|root@host:~# ethtool -N enp3s0 flow-type ether vlan-etype 0x8100 \
|             m 0x00ff action 0
|rmgr: Cannot insert RX class rule: Operation not supported

Fixes: 2b477d057e ("igc: Integrate flex filter into ethtool ops")
Suggested-by: Suman Ghosh <sumang@marvell.com>
Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-12-27 10:26:04 -08:00
Kurt Kanzenbach
b5063cbe14 igc: Check VLAN TCI mask
Currently the driver accepts VLAN TCI steering rules regardless of the
configured mask. And things might fail silently or with confusing error
messages to the user.

There are two ways to handle the VLAN TCI mask:

 1. Match on the PCP field using a VLAN prio filter
 2. Match on complete TCI field using a flex filter

Therefore, add checks and code for that.

For instance the following rule is invalid and will be converted into a
VLAN prio rule which is not correct:
|root@host:~# ethtool -N enp3s0 flow-type ether vlan 0x0001 m 0xf000 \
|             action 1
|Added rule with ID 61
|root@host:~# ethtool --show-ntuple enp3s0
|4 RX rings available
|Total 1 rules
|
|Filter: 61
|        Flow Type: Raw Ethernet
|        Src MAC addr: 00:00:00:00:00:00 mask: FF:FF:FF:FF:FF:FF
|        Dest MAC addr: 00:00:00:00:00:00 mask: FF:FF:FF:FF:FF:FF
|        Ethertype: 0x0 mask: 0xFFFF
|        VLAN EtherType: 0x0 mask: 0xffff
|        VLAN: 0x1 mask: 0x1fff
|        User-defined: 0x0 mask: 0xffffffffffffffff
|        Action: Direct to queue 1

After:
|root@host:~# ethtool -N enp3s0 flow-type ether vlan 0x0001 m 0xf000 \
|             action 1
|rmgr: Cannot insert RX class rule: Operation not supported

Fixes: 7991487ecb ("igc: Allow for Flex Filters to be installed")
Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-12-27 10:26:04 -08:00
Kurt Kanzenbach
088464abd4 igc: Report VLAN EtherType matching back to user
Currently the driver allows to configure matching by VLAN EtherType.
However, the retrieval function does not report it back to the user. Add
it.

Before:
|root@host:~# ethtool -N enp3s0 flow-type ether vlan-etype 0x8100 action 0
|Added rule with ID 63
|root@host:~# ethtool --show-ntuple enp3s0
|4 RX rings available
|Total 1 rules
|
|Filter: 63
|        Flow Type: Raw Ethernet
|        Src MAC addr: 00:00:00:00:00:00 mask: FF:FF:FF:FF:FF:FF
|        Dest MAC addr: 00:00:00:00:00:00 mask: FF:FF:FF:FF:FF:FF
|        Ethertype: 0x0 mask: 0xFFFF
|        Action: Direct to queue 0

After:
|root@host:~# ethtool -N enp3s0 flow-type ether vlan-etype 0x8100 action 0
|Added rule with ID 63
|root@host:~# ethtool --show-ntuple enp3s0
|4 RX rings available
|Total 1 rules
|
|Filter: 63
|        Flow Type: Raw Ethernet
|        Src MAC addr: 00:00:00:00:00:00 mask: FF:FF:FF:FF:FF:FF
|        Dest MAC addr: 00:00:00:00:00:00 mask: FF:FF:FF:FF:FF:FF
|        Ethertype: 0x0 mask: 0xFFFF
|        VLAN EtherType: 0x8100 mask: 0x0
|        VLAN: 0x0 mask: 0xffff
|        User-defined: 0x0 mask: 0xffffffffffffffff
|        Action: Direct to queue 0

Fixes: 2b477d057e ("igc: Integrate flex filter into ethtool ops")
Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-12-27 10:26:04 -08:00
Sudheer Mogilappagari
3e48041d98 i40e: Fix filter input checks to prevent config with invalid values
Prevent VF from configuring filters with unsupported actions or use
REDIRECT action with invalid tc number. Current checks could cause
out of bounds access on PF side.

Fixes: e284fc2804 ("i40e: Add and delete cloud filter")
Reviewed-by: Andrii Staikov <andrii.staikov@intel.com>
Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com>
Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Bharathi Sreenivas <bharathi.sreenivas@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-12-27 10:07:11 -08:00
Arkadiusz Kubalewski
8278a6a43d ice: dpll: fix phase offset value
Stop dividing the phase_offset value received from firmware. This fault
is present since the initial implementation.
The phase_offset value received from firmware is in 0.01ps resolution.
Dpll subsystem is using the value in 0.001ps, raw value is adjusted
before providing it to the user.

The user can observe the value of phase offset with response to
`pin-get` netlink message of dpll subsystem for an active pin:
$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/dpll.yaml \
	--do pin-get --json '{"id":2}'

Where example of correct response would be:
{'board-label': 'C827_0-RCLKA',
 'capabilities': 6,
 'clock-id': 4658613174691613800,
 'frequency': 1953125,
 'id': 2,
 'module-name': 'ice',
 'parent-device': [{'direction': 'input',
                    'parent-id': 6,
                    'phase-offset': -216839550,
                    'prio': 9,
                    'state': 'connected'},
                   {'direction': 'input',
                    'parent-id': 7,
                    'phase-offset': -42930,
                    'prio': 8,
                    'state': 'connected'}],
 'phase-adjust': 0,
 'phase-adjust-max': 16723,
 'phase-adjust-min': -16723,
 'type': 'mux'}

Provided phase-offset value (-42930) shall be divided by the user with
DPLL_PHASE_OFFSET_DIVIDER to get actual value of -42.930 ps.

Before the fix, the response was not correct:
{'board-label': 'C827_0-RCLKA',
 'capabilities': 6,
 'clock-id': 4658613174691613800,
 'frequency': 1953125,
 'id': 2,
 'module-name': 'ice',
 'parent-device': [{'direction': 'input',
                    'parent-id': 6,
                    'phase-offset': -216839,
                    'prio': 9,
                    'state': 'connected'},
                   {'direction': 'input',
                    'parent-id': 7,
                    'phase-offset': -42,
                    'prio': 8,
                    'state': 'connected'}],
 'phase-adjust': 0,
 'phase-adjust-max': 16723,
 'phase-adjust-min': -16723,
 'type': 'mux'}

Where phase-offset value (-42), after division
(DPLL_PHASE_OFFSET_DIVIDER) would be: -0.042 ps.

Fixes: 8a3a565ff2 ("ice: add admin commands to access cgu configuration")
Fixes: 90e1c90750 ("ice: dpll: implement phase related callbacks")
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-12-27 10:07:11 -08:00
Ngai-Mint Kwan
6d05ff55ef ice: Shut down VSI with "link-down-on-close" enabled
Disabling netdev with ethtool private flag "link-down-on-close" enabled
can cause NULL pointer dereference bug. Shut down VSI regardless of
"link-down-on-close" state.

Fixes: 8ac7132704 ("ice: Fix interface being down after reset with link-down-on-close flag on")
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Ngai-Mint Kwan <ngai-mint.kwan@intel.com>
Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-12-27 10:07:11 -08:00
Katarzyna Wieczerzycka
6a8d8bb55e ice: Fix link_down_on_close message
The driver should not report an error message when for a medialess port
the link_down_on_close flag is enabled and the physical link cannot be
set down.

Fixes: 8ac7132704 ("ice: Fix interface being down after reset with link-down-on-close flag on")
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Katarzyna Wieczerzycka <katarzyna.wieczerzycka@intel.com>
Signed-off-by: Wojciech Drewek <wojciech.drewek@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-12-27 10:07:11 -08:00
Pavan Kumar Linga
a613fb464d idpf: avoid compiler introduced padding in virtchnl2_rss_key struct
Size of the virtchnl2_rss_key struct should be 7 bytes but the
compiler introduces a padding byte for the structure alignment.
This results in idpf sending an additional byte of memory to the device
control plane than the expected buffer size. As the control plane
enforces virtchnl message size checks to validate the message,
set RSS key message fails resulting in the driver load failure.

Remove implicit compiler padding by using "__packed" structure
attribute for the virtchnl2_rss_key struct.

Also there is no need to use __DECLARE_FLEX_ARRAY macro for the
'key_flex' struct field. So drop it.

Fixes: 0d7502a9b4 ("virtchnl: add virtchnl version 2 ops")
Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com>
Signed-off-by: Pavan Kumar Linga <pavan.kumar.linga@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Scott Register <scott.register@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-12-26 09:05:57 -08:00
Alexander Lobakin
fea7b71b87 idpf: fix corrupted frames and skb leaks in singleq mode
idpf_ring::skb serves only for keeping an incomplete frame between
several NAPI Rx polling cycles, as one cycle may end up before
processing the end of packet descriptor. The pointer is taken from
the ring onto the stack before entering the loop and gets written
there after the loop exits. When inside the loop, only the onstack
pointer is used.
For some reason, the logics is broken in the singleq mode, where the
pointer is taken from the ring each iteration. This means that if a
frame got fragmented into several descriptors, each fragment will have
its own skb, but only the last one will be passed up the stack
(containing garbage), leaving the rest leaked.
Then, on ifdown, rxq::skb is being freed only in the splitq mode, while
it can point to a valid skb in singleq as well. This can lead to a yet
another skb leak.
Just don't touch the ring skb field inside the polling loop, letting
the onstack skb pointer work as expected: build a new skb if it's the
first frame descriptor and attach a frag otherwise. On ifdown, free
rxq::skb unconditionally if the pointer is non-NULL.

Fixes: a5ab9ee0df ("idpf: add singleq start_xmit and napi poll")
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Tested-by: Scott Register <scott.register@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-12-26 09:04:53 -08:00
David S. Miller
afa9e26748 Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue
Tony Nguyen says:

====================
intel: use bitfield operations

Jesse Brandeburg says:

After repeatedly getting review comments on new patches, and sporadic
patches to fix parts of our drivers, we should just convert the Intel code
to use FIELD_PREP() and FIELD_GET().  It's then "common" in the code and
hopefully future change-sets will see the context and do-the-right-thing.

This conversion was done with a coccinelle script which is mentioned in the
commit messages. Generally there were only a couple conversions that were
"undone" after the automatic changes because they tried to convert a
non-contiguous mask.

Patch 1 is required at the beginning of this series to fix a "forever"
issue in the e1000e driver that fails the compilation test after conversion
because the shift / mask was out of range.

The second patch just adds all the new #includes in one go.

The patch titled: "ice: fix pre-shifted bit usage" is needed to allow the
use of the FIELD_* macros and fix up the unexpected "shifts included"
defines found while creating this series.

The rest are the conversion to use FIELD_PREP()/FIELD_GET(), and the
occasional leXX_{get,set,encode}_bits() call, as suggested by Alex.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-22 12:09:52 +00:00
Paolo Abeni
56794e5358 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR.

Adjacent changes:

drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
  23c93c3b62 ("bnxt_en: do not map packet buffers twice")
  6d1add9553 ("bnxt_en: Modify TX ring indexing logic.")

tools/testing/selftests/net/Makefile
  2258b66648 ("selftests: add vlan hw filter tests")
  a0bc96c0cd ("selftests: net: verify fq per-band packet limit")

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-12-21 22:17:23 +01:00
Jakub Kicinski
c49b292d03 Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Alexei Starovoitov says:

====================
pull-request: bpf-next 2023-12-18

This PR is larger than usual and contains changes in various parts
of the kernel.

The main changes are:

1) Fix kCFI bugs in BPF, from Peter Zijlstra.

End result: all forms of indirect calls from BPF into kernel
and from kernel into BPF work with CFI enabled. This allows BPF
to work with CONFIG_FINEIBT=y.

2) Introduce BPF token object, from Andrii Nakryiko.

It adds an ability to delegate a subset of BPF features from privileged
daemon (e.g., systemd) through special mount options for userns-bound
BPF FS to a trusted unprivileged application. The design accommodates
suggestions from Christian Brauner and Paul Moore.

Example:
$ sudo mkdir -p /sys/fs/bpf/token
$ sudo mount -t bpf bpffs /sys/fs/bpf/token \
             -o delegate_cmds=prog_load:MAP_CREATE \
             -o delegate_progs=kprobe \
             -o delegate_attachs=xdp

3) Various verifier improvements and fixes, from Andrii Nakryiko, Andrei Matei.

 - Complete precision tracking support for register spills
 - Fix verification of possibly-zero-sized stack accesses
 - Fix access to uninit stack slots
 - Track aligned STACK_ZERO cases as imprecise spilled registers.
   It improves the verifier "instructions processed" metric from single
   digit to 50-60% for some programs.
 - Fix verifier retval logic

4) Support for VLAN tag in XDP hints, from Larysa Zaremba.

5) Allocate BPF trampoline via bpf_prog_pack mechanism, from Song Liu.

End result: better memory utilization and lower I$ miss for calls to BPF
via BPF trampoline.

6) Fix race between BPF prog accessing inner map and parallel delete,
from Hou Tao.

7) Add bpf_xdp_get_xfrm_state() kfunc, from Daniel Xu.

It allows BPF interact with IPSEC infra. The intent is to support
software RSS (via XDP) for the upcoming ipsec pcpu work.
Experiments on AWS demonstrate single tunnel pcpu ipsec reaching
line rate on 100G ENA nics.

8) Expand bpf_cgrp_storage to support cgroup1 non-attach, from Yafang Shao.

9) BPF file verification via fsverity, from Song Liu.

It allows BPF progs get fsverity digest.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (164 commits)
  bpf: Ensure precise is reset to false in __mark_reg_const_zero()
  selftests/bpf: Add more uprobe multi fail tests
  bpf: Fail uprobe multi link with negative offset
  selftests/bpf: Test the release of map btf
  s390/bpf: Fix indirect trampoline generation
  selftests/bpf: Temporarily disable dummy_struct_ops test on s390
  x86/cfi,bpf: Fix bpf_exception_cb() signature
  bpf: Fix dtor CFI
  cfi: Add CFI_NOSEAL()
  x86/cfi,bpf: Fix bpf_struct_ops CFI
  x86/cfi,bpf: Fix bpf_callback_t CFI
  x86/cfi,bpf: Fix BPF JIT call
  cfi: Flip headers
  selftests/bpf: Add test for abnormal cnt during multi-kprobe attachment
  selftests/bpf: Don't use libbpf_get_error() in kprobe_multi_test
  selftests/bpf: Add test for abnormal cnt during multi-uprobe attachment
  bpf: Limit the number of kprobes when attaching program to multiple kprobes
  bpf: Limit the number of uprobes when attaching program to multiple uprobes
  bpf: xdp: Register generic_kfunc_set with XDP programs
  selftests/bpf: utilize string values for delegate_xxx mount options
  ...
====================

Link: https://lore.kernel.org/r/20231219000520.34178-1-alexei.starovoitov@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-18 16:46:08 -08:00
Jesse Brandeburg
6aa7ca3c7d idpf: refactor some missing field get/prep conversions
Most of idpf correctly uses FIELD_GET and FIELD_PREP, but a couple spots
were missed so fix those.

Automated conversion with coccinelle script and manually fixed up,
including audits for opportunities to convert to {get,encode,replace}
bits functions.

Add conversions to le16_get/encode/replace_bits where appropriate. And
in one place fix up a cast from a u16 to a u16.

@prep2@
constant shift,mask;
type T;
expression a;
@@
-(((T)(a) << shift) & mask)
+FIELD_PREP(mask, a)

@prep@
constant shift,mask;
type T;
expression a;
@@
-((T)((a) << shift) & mask)
+FIELD_PREP(mask, a)

@get@
constant shift,mask;
type T;
expression a;
@@
-((T)((a) & mask) >> shift)
+FIELD_GET(mask, a)

and applied via:
spatch --sp-file field_prep.cocci --in-place --dir \
 drivers/net/ethernet/intel/

CC: Alexander Lobakin <aleksander.lobakin@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-12-18 11:20:43 -08:00
Jesse Brandeburg
316a28daa8 ice: cleanup inconsistent code
It was found while doing further testing of the previous commit
fbf32a9bab91 ("ice: field get conversion") that one of the FIELD_GET
conversions should really be a FIELD_PREP. The previous code was styled
as a match to the FIELD_GET conversion, which always worked because the
shift value was 0.  The code makes way more sense as a FIELD_PREP and
was in fact the only FIELD_GET with two constant arguments in this
series.

Didn't squash this patch to make it easier to call out the
(non-impactful) bug.

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-12-18 11:20:43 -08:00
Jesse Brandeburg
5a259f8e0b ice: field get conversion
Refactor the ice driver to use FIELD_GET() for mask and shift reads,
which reduces lines of code and adds clarity of intent.

This code was generated by the following coccinelle/spatch script and
then manually repaired.

@get@
constant shift,mask;
type T;
expression a;
@@
-(((T)(a) & mask) >> shift)
+FIELD_GET(mask, a)

and applied via:
spatch --sp-file field_prep.cocci --in-place --dir \
 drivers/net/ethernet/intel/

CC: Alexander Lobakin <aleksander.lobakin@intel.com>
Cc: Julia Lawall <Julia.Lawall@inria.fr>
Reviewed-by: Marcin Szycik <marcin.szycik@linux.intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-12-18 11:20:43 -08:00
Jesse Brandeburg
65db56d5fa iavf: field get conversion
Refactor the iavf driver to use FIELD_GET() for mask and shift reads,
which reduces lines of code and adds clarity of intent.

This code was generated by the following coccinelle/spatch script and
then manually repaired in a later patch.

@get@
constant shift,mask;
type T;
expression a;
@@
-((T)((a) & mask) >> shift)
+FIELD_GET(mask, a)

and applied via:
spatch --sp-file field_prep.cocci --in-place --dir \
 drivers/net/ethernet/intel/

Cc: Julia Lawall <Julia.Lawall@inria.fr>
Reviewed-by: Marcin Szycik <marcin.szycik@linux.intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-12-18 11:20:43 -08:00
Jesse Brandeburg
62589808d7 i40e: field get conversion
Refactor the i40e driver to use FIELD_GET() for mask and shift reads,
which reduces lines of code and adds clarity of intent.

This code was generated by the following coccinelle/spatch script and
then manually repaired.

While making one of the conversions, an if() check was inverted to
return early and avoid un-necessary indentation of the remainder of the
function. In some other cases a stack variable was moved inside the
block where it was used while doing cleanups/review.

A couple places were changed to use le16_get_bits() instead of FIELD_GET
with a le16_to_cpu combination.

@get@
constant shift,mask;
metavariable type T;
expression a;
@@
-(((T)(a) & mask) >> shift)
+FIELD_GET(mask, a)

and applied via:
spatch --sp-file field_prep.cocci --in-place --dir \
 drivers/net/ethernet/intel/

Cc: Julia Lawall <Julia.Lawall@inria.fr>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Reviewed-by: Marcin Szycik <marcin.szycik@linux.intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-12-18 11:20:43 -08:00
Jesse Brandeburg
a8e0c7a680 igc: field get conversion
Refactor the igc driver to use FIELD_GET() for mask and shift reads,
which reduces lines of code and adds clarity of intent.

This code was generated by the following coccinelle/spatch script and
then manually repaired in a later patch.

@get@
constant shift,mask;
type T;
expression a;
@@
-((T)((a) & mask) >> shift)
+FIELD_GET(mask, a)

and applied via:
spatch --sp-file field_prep.cocci --in-place --dir \
 drivers/net/ethernet/intel/

Cc: Julia Lawall <Julia.Lawall@inria.fr>
Reviewed-by: Marcin Szycik <marcin.szycik@linux.intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-12-18 11:20:43 -08:00
Jesse Brandeburg
b9a4525450 intel: legacy: field get conversion
Refactor several older Intel drivers to use FIELD_GET(), which reduces
lines of code and adds clarity of intent.

This code was generated by the following coccinelle/spatch script and
then manually repaired.

@get@
constant shift,mask;
type T;
expression a;
@@
(
-((T)((a) & mask) >> shift)
+FIELD_GET(mask, a)

and applied via:
spatch --sp-file field_prep.cocci --in-place --dir \
 drivers/net/ethernet/intel/

Cc: Julia Lawall <Julia.Lawall@inria.fr>
CC: Alexander Lobakin <aleksander.lobakin@intel.com>
Reviewed-by: Marcin Szycik <marcin.szycik@linux.intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2023-12-18 11:20:43 -08:00