Commit Graph

4228 Commits

Author SHA1 Message Date
Lenny Szubowicz
e0efe83ed3 tg3: Disable tg3 PCIe AER on system reboot
Disable PCIe AER on the tg3 device on system reboot on a limited
list of Dell PowerEdge systems. This prevents a fatal PCIe AER event
on the tg3 device during the ACPI _PTS (prepare to sleep) method for
S5 on those systems. The _PTS is invoked by acpi_enter_sleep_state_prep()
as part of the kernel's reboot sequence as a result of commit
38f34dba80 ("PM: ACPI: reboot: Reinstate S5 for reboot").

There was an earlier fix for this problem by commit 2ca1c94ce0
("tg3: Disable tg3 device on system reboot to avoid triggering AER").
But it was discovered that this earlier fix caused a reboot hang
when some Dell PowerEdge servers were booted via ipxe. To address
this reboot hang, the earlier fix was essentially reverted by commit
9fc3bc7643 ("tg3: power down device only on SYSTEM_POWER_OFF").
This re-exposed the tg3 PCIe AER on reboot problem.

This fix is not an ideal solution because the root cause of the AER
is in system firmware. Instead, it's a targeted work-around in the
tg3 driver.

Note also that the PCIe AER must be disabled on the tg3 device even
if the system is configured to use "firmware first" error handling.

V3:
   - Fix sparse warning on improper comparison of pdev->current_state
   - Adhere to netdev comment style

Fixes: 9fc3bc7643 ("tg3: power down device only on SYSTEM_POWER_OFF")
Signed-off-by: Lenny Szubowicz <lszubowi@redhat.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2025-02-03 10:12:14 +00:00
Florian Fainelli
46ded70923 net: bcmgenet: Correct overlaying of PHY and MAC Wake-on-LAN
Some Wake-on-LAN modes such as WAKE_FILTER may only be supported by the MAC,
while others might be only supported by the PHY. Make sure that the .get_wol()
returns the union of both rather than only that of the PHY if the PHY supports
Wake-on-LAN.

When disabling Wake-on-LAN, make sure that this is done at both the PHY
and MAC level, rather than doing an early return from the PHY driver.

Fixes: 7e400ff35c ("net: bcmgenet: Add support for PHY-based Wake-on-LAN")
Fixes: 9ee09edc05 ("net: bcmgenet: Properly overlay PHY and MAC Wake-on-LAN capabilities")
Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://patch.msgid.link/20250129231342.35013-1-florian.fainelli@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-02-01 16:53:33 -08:00
Linus Torvalds
c2933b2bef Merge tag 'net-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
 "Including fixes from IPSec, netfilter and Bluetooth.

  Nothing really stands out, but as usual there's a slight concentration
  of fixes for issues added in the last two weeks before the merge
  window, and driver bugs from 6.13 which tend to get discovered upon
  wider distribution.

  Current release - regressions:

   - net: revert RTNL changes in unregister_netdevice_many_notify()

   - Bluetooth: fix possible infinite recursion of btusb_reset

   - eth: adjust locking in some old drivers which protect their state
     with spinlocks to avoid sleeping in atomic; core protects netdev
     state with a mutex now

  Previous releases - regressions:

   - eth:
      - mlx5e: make sure we pass node ID, not CPU ID to kvzalloc_node()
      - bgmac: reduce max frame size to support just 1500 bytes; the
        jumbo frame support would previously cause OOB writes, but now
        fails outright

   - mptcp: blackhole only if 1st SYN retrans w/o MPC is accepted, avoid
     false detection of MPTCP blackholing

  Previous releases - always broken:

   - mptcp: handle fastopen disconnect correctly

   - xfrm:
      - make sure skb->sk is a full sock before accessing its fields
      - fix taking a lock with preempt disabled for RT kernels

   - usb: ipheth: improve safety of packet metadata parsing; prevent
     potential OOB accesses

   - eth: renesas: fix missing rtnl lock in suspend/resume path"

* tag 'net-6.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (88 commits)
  MAINTAINERS: add Neal to TCP maintainers
  net: revert RTNL changes in unregister_netdevice_many_notify()
  net: hsr: fix fill_frame_info() regression vs VLAN packets
  doc: mptcp: sysctl: blackhole_timeout is per-netns
  mptcp: blackhole only if 1st SYN retrans w/o MPC is accepted
  netfilter: nf_tables: reject mismatching sum of field_len with set key length
  net: sh_eth: Fix missing rtnl lock in suspend/resume path
  net: ravb: Fix missing rtnl lock in suspend/resume path
  selftests/net: Add test for loading devbound XDP program in generic mode
  net: xdp: Disallow attaching device-bound programs in generic mode
  tcp: correct handling of extreme memory squeeze
  bgmac: reduce max frame size to support just MTU 1500
  vsock/test: Add test for connect() retries
  vsock/test: Add test for UAF due to socket unbinding
  vsock/test: Introduce vsock_connect_fd()
  vsock/test: Introduce vsock_bind()
  vsock: Allow retrying on connect() failure
  vsock: Keep the binding until socket destruction
  Bluetooth: L2CAP: accept zero as a special value for MTU auto-selection
  Bluetooth: btnxpuart: Fix glitches seen in dual A2DP streaming
  ...
2025-01-30 12:24:20 -08:00
Rafał Miłecki
752e5fcc2e bgmac: reduce max frame size to support just MTU 1500
bgmac allocates new replacement buffer before handling each received
frame. Allocating & DMA-preparing 9724 B each time consumes a lot of CPU
time. Ideally bgmac should just respect currently set MTU but it isn't
the case right now. For now just revert back to the old limited frame
size.

This change bumps NAT masquerade speed by ~95%.

Since commit 8218f62c9c ("mm: page_frag: use initial zero offset for
page_frag_alloc_align()"), the bgmac driver fails to open its network
interface successfully and runs out of memory in the following call
stack:

bgmac_open
  -> bgmac_dma_init
    -> bgmac_dma_rx_skb_for_slot
      -> netdev_alloc_frag

BGMAC_RX_ALLOC_SIZE = 10048 and PAGE_FRAG_CACHE_MAX_SIZE = 32768.

Eventually we land into __page_frag_alloc_align() with the following
parameters across multiple successive calls:

__page_frag_alloc_align: fragsz=10048, align_mask=-1, size=32768, offset=0
__page_frag_alloc_align: fragsz=10048, align_mask=-1, size=32768, offset=10048
__page_frag_alloc_align: fragsz=10048, align_mask=-1, size=32768, offset=20096
__page_frag_alloc_align: fragsz=10048, align_mask=-1, size=32768, offset=30144

So in that case we do indeed have offset + fragsz (40192) > size (32768)
and so we would eventually return NULL. Reverting to the older 1500
bytes MTU allows the network driver to be usable again.

Fixes: 8c7da63978 ("bgmac: configure MTU and add support for frames beyond 8192 byte size")
Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
[florian: expand commit message about recent commits]
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://patch.msgid.link/20250127175159.1788246-1-florian.fainelli@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-29 18:57:42 -08:00
Jakub Kicinski
8ed47e4e0b eth: tg3: fix calling napi_enable() in atomic context
tg3 has a spin lock protecting most of the config,
switch to taking netdev_lock() explicitly on enable/start
paths. Disable/stop paths seem to not be under the spin
lock (since napi_disable() already needs to sleep),
so leave that side as is.

tg3_restart_hw() releases and re-takes the spin lock,
we need to do the same because dev_close() needs to
take netdev_lock().

Fixes: 413f0271f3 ("net: protect NAPI enablement with netdev_lock()")
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Link: https://lore.kernel.org/dcfd56bc-de32-4b11-9e19-d8bd1543745d@stanley.mountain
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20250124031841.1179756-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-27 14:30:49 -08:00
Linus Torvalds
0afd22092d Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull rdma updates from Jason Gunthorpe:
 "Lighter that normal, but the now usual collection of driver fixes and
  small improvements:

   - Small fixes and minor improvements to cxgb4, bnxt_re, rxe, srp,
     efa, cxgb4

   - Update mlx4 to use the new umem APIs, avoiding direct use of
     scatterlist

   - Support ROCEv2 in erdma

   - Remove various uncalled functions, constify bin_attribute

   - Provide core infrastructure to catch netdev events and route them
     to drivers, consolidating duplicated driver code

   - Fix rare race condition crashes in mlx5 ODP flows"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (63 commits)
  RDMA/mlx5: Fix implicit ODP use after free
  RDMA/mlx5: Fix a race for an ODP MR which leads to CQE with error
  RDMA/qib: Constify 'struct bin_attribute'
  RDMA/hfi1: Constify 'struct bin_attribute'
  RDMA/rxe: Fix the warning "__rxe_cleanup+0x12c/0x170 [rdma_rxe]"
  RDMA/cxgb4: Notify rdma stack for IB_EVENT_QP_LAST_WQE_REACHED event
  RDMA/bnxt_re: Allocate dev_attr information dynamically
  RDMA/bnxt_re: Pass the context for ulp_irq_stop
  RDMA/bnxt_re: Add support to handle DCB_CONFIG_CHANGE event
  RDMA/bnxt_re: Query firmware defaults of CC params during probe
  RDMA/bnxt_re: Add Async event handling support
  bnxt_en: Add ULP call to notify async events
  RDMA/mlx5: Fix indirect mkey ODP page count
  MAINTAINERS: Update the bnxt_re maintainers
  RDMA/hns: Clean up the legacy CONFIG_INFINIBAND_HNS
  RDMA/rtrs: Add missing deinit() call
  RDMA/efa: Align interrupt related fields to same type
  RDMA/bnxt_re: Fix to drop reference to the mmap entry in case of error
  RDMA/mlx5: Fix link status down event for MPV
  RDMA/erdma: Support create_ah/destroy_ah in non-sleepable contexts
  ...
2025-01-24 12:21:28 -08:00
Jakub Kicinski
99d028c634 eth: bnxt: update header sizing defaults
300-400B RPC requests are fairly common. With the current default
of 256B HDS threshold bnxt ends up splitting those, lowering PCIe
bandwidth efficiency and increasing the number of memory allocation.

Increase the HDS threshold to fit 4 buffers in a 4k page.
This works out to 640B as the threshold on a typical kernel confing.
This change increases the performance for a microbenchmark which
receives 400B RPCs and sends empty responses by 4.5%.
Admittedly this is just a single benchmark, but 256B works out to
just 6 (so 2 more) packets per head page, because shinfo size
dominates the headers.

Now that we use page pool for the header pages I was also tempted
to default rx_copybreak to 0, but in synthetic testing the copybreak
size doesn't seem to make much difference.

Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20250119020518.1962249-8-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-20 11:44:58 -08:00
Jakub Kicinski
bee018052d eth: bnxt: allocate enough buffer space to meet HDS threshold
Now that we can configure HDS threshold separately from the rx_copybreak
HDS threshold may be higher than rx_copybreak.

We need to make sure that we have enough space for the headers.

Fixes: 6b43673a25 ("bnxt_en: add support for hds-thresh ethtool command")
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20250119020518.1962249-7-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-20 11:44:58 -08:00
Jakub Kicinski
928459bbda net: ethtool: populate the default HDS params in the core
The core has the current HDS config, it can pre-populate the values
for the drivers. While at it, remove the zero-setting in netdevsim.
Zero are the default values since the config is zalloc'ed.

Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20250119020518.1962249-6-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-20 11:44:58 -08:00
Jakub Kicinski
e58263e911 eth: bnxt: apply hds_thrs settings correctly
Use the pending config for hds_thrs. Core will only update the "current"
one after we return success. Without this change 2 reconfigs would be
required for the setting to reach the device.

Fixes: 6b43673a25 ("bnxt_en: add support for hds-thresh ethtool command")
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20250119020518.1962249-5-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-20 11:44:57 -08:00
Jakub Kicinski
3c836451ca net: move HDS config from ethtool state
Separate the HDS config from the ethtool state struct.
The HDS config contains just simple parameters, not state.
Having it as a separate struct will make it easier to clone / copy
and also long term potentially make it per-queue.

Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20250119020518.1962249-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-20 11:44:57 -08:00
Jakub Kicinski
17656eb5cf eth: bnxt: fix string truncation warning in FW version
W=1 builds with gcc 14.2.1 report:

drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c:4193:32: error: ‘%s’ directive output may be truncated writing up to 31 bytes into a region of size 27 [-Werror=format-truncation=]
 4193 |                          "/pkg %s", buf);

It's upset that we let buf be full length but then we use 5
characters for "/pkg ".

The builds is also clear with clang version 19.1.5 now.

Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20250117183726.1481524-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-18 17:32:45 -08:00
Jakub Kicinski
2ee738e90e Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR (net-6.13-rc8).

Conflicts:

drivers/net/ethernet/realtek/r8169_main.c
  1f691a1fc4 ("r8169: remove redundant hwmon support")
  152d00a913 ("r8169: simplify setting hwmon attribute visibility")
https://lore.kernel.org/20250115122152.760b4e8d@canb.auug.org.au

Adjacent changes:

drivers/net/ethernet/broadcom/bnxt/bnxt.c
  152f4da05a ("bnxt_en: add support for rx-copybreak ethtool command")
  f0aa6a37a3 ("eth: bnxt: always recalculate features after XDP clearing, fix null-deref")

drivers/net/ethernet/intel/ice/ice_type.h
  50327223a8 ("ice: add lock to protect low latency interface")
  dc26548d72 ("ice: Fix quad registers read on E825")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-16 10:34:59 -08:00
Taehee Yoo
6b43673a25 bnxt_en: add support for hds-thresh ethtool command
The bnxt_en driver has configured the hds_threshold value automatically
when TPA is enabled based on the rx-copybreak default value.
Now the hds-thresh ethtool command is added, so it adds an
implementation of hds-thresh option.

Configuration of the hds-thresh is applied only when
the tcp-data-split is enabled. The default value of
hds-thresh is 256, which is the default value of
rx-copybreak, which used to be the hds_thresh value.

The maximum hds-thresh is 1023.

   # Example:
   # ethtool -G enp14s0f0np0 tcp-data-split on hds-thresh 256
   # ethtool -g enp14s0f0np0
   Ring parameters for enp14s0f0np0:
   Pre-set maximums:
   ...
   HDS thresh:  1023
   Current hardware settings:
   ...
   TCP data split:         on
   HDS thresh:  256

Tested-by: Stanislav Fomichev <sdf@fomichev.me>
Tested-by: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20250114142852.3364986-9-ap420073@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-15 14:42:12 -08:00
Taehee Yoo
87c8f8496a bnxt_en: add support for tcp-data-split ethtool command
NICs that uses bnxt_en driver supports tcp-data-split feature by the
name of HDS(header-data-split).
But there is no implementation for the HDS to enable by ethtool.
Only getting the current HDS status is implemented and The HDS is just
automatically enabled only when either LRO, HW-GRO, or JUMBO is enabled.
The hds_threshold follows rx-copybreak value. and it was unchangeable.

This implements `ethtool -G <interface name> tcp-data-split <value>`
command option.
The value can be <on> and <auto>.
The value is <auto> and one of LRO/GRO/JUMBO is enabled, HDS is
automatically enabled and all LRO/GRO/JUMBO are disabled, HDS is
automatically disabled.

HDS feature relies on the aggregation ring.
So, if HDS is enabled, the bnxt_en driver initializes the aggregation ring.
This is the reason why BNXT_FLAG_AGG_RINGS contains HDS condition.

Acked-by: Jakub Kicinski <kuba@kernel.org>
Tested-by: Stanislav Fomichev <sdf@fomichev.me>
Tested-by: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20250114142852.3364986-8-ap420073@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-15 14:42:12 -08:00
Taehee Yoo
152f4da05a bnxt_en: add support for rx-copybreak ethtool command
The bnxt_en driver supports rx-copybreak, but it couldn't be set by
userspace. Only the default value(256) has worked.
This patch makes the bnxt_en driver support following command.
`ethtool --set-tunable <devname> rx-copybreak <value> ` and
`ethtool --get-tunable <devname> rx-copybreak`.

By this patch, hds_threshol is set to the rx-copybreak value.
But it will be set by `ethtool -G eth0 hds-thresh N`
in the next patch.

Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Brett Creeley <brett.creeley@amd.com>
Tested-by: Stanislav Fomichev <sdf@fomichev.me>
Tested-by: Andy Gospodarek <gospo@broadcom.com>
Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20250114142852.3364986-7-ap420073@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-15 14:42:11 -08:00
Russell King (Oracle)
21f56ad1b2 net: bcm: asp2: convert to phylib managed EEE
Convert the Broadcom ASP2 driver to use phylib managed EEE support.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Tested-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://patch.msgid.link/E1tXk81-000r4x-TS@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-15 13:17:56 -08:00
Russell King (Oracle)
df8017e8a1 net: bcm: asp2: remove tx_lpi_enabled
Phylib maintains a copy of tx_lpi_enabled, which will be used to
populate the member when phy_ethtool_get_eee(). Therefore, writing to
this member before phy_ethtool_get_eee() will have no effect. Remove
it. Also remove setting our copy of info->eee.tx_lpi_enabled which
becomes write-only.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Tested-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://patch.msgid.link/E1tXk7w-000r4r-Pq@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-15 13:17:56 -08:00
Russell King (Oracle)
54033f5512 net: bcm: asp2: fix LPI timer handling
Fix the LPI timer handling in Broadcom ASP2 driver after the phylib
managed EEE patches were merged.

Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Tested-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://patch.msgid.link/E1tXk7r-000r4l-Li@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-15 13:17:56 -08:00
Kalesh AP
57e6464c22 RDMA/bnxt_re: Pass the context for ulp_irq_stop
ulp_irq_stop() can be invoked from a context where FW is healthy or
when FW is in a reset state. In the latter case, ULP must stop all
interactions with HW/FW and also with application and stack. Added a
new parameter to the ulp_irq_stop() function to achieve that.

Reviewed-by: Vikas Gupta <vikas.gupta@broadcom.com>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Reviewed-by: Chandramohan Akula <chandramohan.akula@broadcom.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Link: https://patch.msgid.link/1736446693-6692-2-git-send-email-selvin.xavier@broadcom.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-01-14 06:22:10 -05:00
Kalesh AP
7fea327840 RDMA/bnxt_re: Add Async event handling support
Using the option provided by Ethernet driver, register for FW Async
event. During probe, while registeriung with Ethernet driver, provide
the ulp hook 'ulp_async_notifier' for receiving the firmware events.

Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Link: https://patch.msgid.link/20250107024553.2926983-3-kalesh-anakkur.purayil@broadcom.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-01-14 04:05:20 -05:00
Michael Chan
184fe6f238 bnxt_en: Add ULP call to notify async events
When the driver receives an async event notification from the Firmware,
we make the new ulp_async_notifier() call to inform the RDMA driver that
a firmware async event has been received. RDMA driver can then take
necessary actions based on the event type.

In the next patch, we will implement the ulp_async_notifier() callbacks
in the RDMA driver.

Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Link: https://patch.msgid.link/20250107024553.2926983-2-kalesh-anakkur.purayil@broadcom.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-01-14 03:39:46 -05:00
Jakub Kicinski
f0aa6a37a3 eth: bnxt: always recalculate features after XDP clearing, fix null-deref
Recalculate features when XDP is detached.

Before:
  # ip li set dev eth0 xdp obj xdp_dummy.bpf.o sec xdp
  # ip li set dev eth0 xdp off
  # ethtool -k eth0 | grep gro
  rx-gro-hw: off [requested on]

After:
  # ip li set dev eth0 xdp obj xdp_dummy.bpf.o sec xdp
  # ip li set dev eth0 xdp off
  # ethtool -k eth0 | grep gro
  rx-gro-hw: on

The fact that HW-GRO doesn't get re-enabled automatically is just
a minor annoyance. The real issue is that the features will randomly
come back during another reconfiguration which just happens to invoke
netdev_update_features(). The driver doesn't handle reconfiguring
two things at a time very robustly.

Starting with commit 98ba1d931f ("bnxt_en: Fix RSS logic in
__bnxt_reserve_rings()") we only reconfigure the RSS hash table
if the "effective" number of Rx rings has changed. If HW-GRO is
enabled "effective" number of rings is 2x what user sees.
So if we are in the bad state, with HW-GRO re-enablement "pending"
after XDP off, and we lower the rings by / 2 - the HW-GRO rings
doing 2x and the ethtool -L doing / 2 may cancel each other out,
and the:

  if (old_rx_rings != bp->hw_resc.resv_rx_rings &&

condition in __bnxt_reserve_rings() will be false.
The RSS map won't get updated, and we'll crash with:

  BUG: kernel NULL pointer dereference, address: 0000000000000168
  RIP: 0010:__bnxt_hwrm_vnic_set_rss+0x13a/0x1a0
    bnxt_hwrm_vnic_rss_cfg_p5+0x47/0x180
    __bnxt_setup_vnic_p5+0x58/0x110
    bnxt_init_nic+0xb72/0xf50
    __bnxt_open_nic+0x40d/0xab0
    bnxt_open_nic+0x2b/0x60
    ethtool_set_channels+0x18c/0x1d0

As we try to access a freed ring.

The issue is present since XDP support was added, really, but
prior to commit 98ba1d931f ("bnxt_en: Fix RSS logic in
__bnxt_reserve_rings()") it wasn't causing major issues.

Fixes: 1054aee823 ("bnxt_en: Use NETIF_F_GRO_HW.")
Fixes: 98ba1d931f ("bnxt_en: Fix RSS logic in __bnxt_reserve_rings()")
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Link: https://patch.msgid.link/20250109043057.2888953-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-10 18:01:29 -08:00
Jakub Kicinski
14ea4cd1b1 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR (net-6.13-rc7).

Conflicts:
  a42d71e322 ("net_sched: sch_cake: Add drop reasons")
  737d4d91d3 ("sched: sch_cake: add bounds checks to host bulk flow fairness counts")

Adjacent changes:

drivers/net/ethernet/meta/fbnic/fbnic.h
  3a856ab347 ("eth: fbnic: add IRQ reuse support")
  95978931d5 ("eth: fbnic: Revert "eth: fbnic: Add hardware monitoring support via HWMON interface"")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-09 16:11:47 -08:00
Michael Chan
40452969a5 bnxt_en: Fix DIM shutdown
DIM work will call the firmware to adjust the coalescing parameters on
the RX rings.  We should cancel DIM work before we call the firmware
to free the RX rings.  Otherwise, FW will reject the call from DIM
work if the RX ring has been freed.  This will generate an error
message like this:

bnxt_en 0000:21:00.1 ens2f1np1: hwrm req_type 0x53 seq id 0x6fca error 0x2

and cause unnecessary concern for the user.  It is also possible to
modify the coalescing parameters of the wrong ring if the ring has
been re-allocated.

To prevent this, cancel DIM work right before freeing the RX rings.
We also have to add a check in NAPI poll to not schedule DIM if the
RX rings are shutting down.  Check that the VNIC is active before we
schedule DIM.  The VNIC is always disabled before we free the RX rings.

Fixes: 0bc0b97fca ("bnxt_en: cleanup DIM work on device shutdown")
Reviewed-by: Hongguang Gao <hongguang.gao@broadcom.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20250104043849.3482067-3-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-06 16:40:26 -08:00
Kalesh AP
c8dafb0e43 bnxt_en: Fix possible memory leak when hwrm_req_replace fails
When hwrm_req_replace() fails, the driver is not invoking bnxt_req_drop()
which could cause a memory leak.

Fixes: bbf33d1d98 ("bnxt_en: update all firmware calls to use the new APIs")
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20250104043849.3482067-2-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-06 16:40:26 -08:00
Jakub Kicinski
385f186aba Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR (net-6.13-rc6).

No conflicts.

Adjacent changes:

include/linux/if_vlan.h
  f91a5b8089 ("af_packet: fix vlan_get_protocol_dgram() vs MSG_PEEK")
  3f330db306 ("net: reformat kdoc return statements")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-01-03 16:29:29 -08:00
Vitalii Mordan
b255ef45fc eth: bcmsysport: fix call balance of priv->clk handling routines
Check the return value of clk_prepare_enable to ensure that priv->clk has
been successfully enabled.

If priv->clk was not enabled during bcm_sysport_probe, bcm_sysport_resume,
or bcm_sysport_open, it must not be disabled in any subsequent execution
paths.

Fixes: 31bc72d976 ("net: systemport: fetch and use clock resources")
Signed-off-by: Vitalii Mordan <mordan@ispras.ru>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Link: https://patch.msgid.link/20241227123007.2333397-1-mordan@ispras.ru
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-30 17:33:46 -08:00
Michael Chan
bf2afe0f14 bnxt_en: Skip reading PXP registers during ethtool -d if unsupported
Newer firmware does not allow reading the PXP registers during
ethtool -d, so skip the firmware call in that case.  Userspace
(bnxt.c) always expects the register block to be populated so
zeroes will be returned instead.

Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20241217182620.2454075-6-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-19 17:30:00 -08:00
Michael Chan
b45a850585 bnxt_en: Skip MAC loopback selftest if it is unsupported by FW
Call the new HWRM_PORT_MAC_QCAPS to check if mac loopback is
supported.  Skip the MAC loopback ethtool self test if it is
not supported.

Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Link: https://patch.msgid.link/20241217182620.2454075-5-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-19 17:30:00 -08:00
Michael Chan
36d1e70a90 bnxt_en: Skip PHY loopback ethtool selftest if unsupported by FW
Skip PHY loopback selftest if firmware advertises that it is unsupported
in the HWRM_PORT_PHY_QCAPS call.  Only show PHY loopback test result to
be 0 if the test has run and passes.  Do the same for external loopback
to be consistent.

Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20241217182620.2454075-4-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-19 17:30:00 -08:00
Michael Chan
fac5472fc8 bnxt_en: Do not allow ethtool -m on an untrusted VF
Block all ethtool module operations on an untrusted VF.  The firmware
won't allow it and will return error.

Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20241217182620.2454075-3-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-19 17:30:00 -08:00
Hongguang Gao
b1b66ae094 bnxt_en: Use FW defined resource limits for RoCE
If FW supports setting resource limits for RoCE, then just use the
FW limits instead of using some fixed values in the driver.  These
limits will be used to allocate context memory for QP, SRQ, AH, and
MR resources for RoCE.

Reviewed-by: Damodharam Ammepalli <damodharam.ammepalli@broadcom.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Hongguang Gao <hongguang.gao@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20241217182620.2454075-2-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-19 17:30:00 -08:00
Joe Hattori
0cb2c504d7 net: ethernet: bgmac-platform: fix an OF node reference leak
The OF node obtained by of_parse_phandle() is not freed. Call
of_node_put() to balance the refcount.

This bug was found by an experimental static analysis tool that I am
developing.

Fixes: 1676aba5ef ("net: ethernet: bgmac: device tree phy enablement")
Signed-off-by: Joe Hattori <joe@pf.is.s.u-tokyo.ac.jp>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20241214014912.2810315-1-joe@pf.is.s.u-tokyo.ac.jp
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-12-17 13:22:05 +01:00
Michael Chan
24c6843b73 bnxt_en: Fix aggregation ID mask to prevent oops on 5760X chips
The 5760X (P7) chip's HW GRO/LRO interface is very similar to that of
the previous generation (5750X or P5).  However, the aggregation ID
fields in the completion structures on P7 have been redefined from
16 bits to 12 bits.  The freed up 4 bits are redefined for part of the
metadata such as the VLAN ID.  The aggregation ID mask was not modified
when adding support for P7 chips.  Including the extra 4 bits for the
aggregation ID can potentially cause the driver to store or fetch the
packet header of GRO/LRO packets in the wrong TPA buffer.  It may hit
the BUG() condition in __skb_pull() because the SKB contains no valid
packet header:

kernel BUG at include/linux/skbuff.h:2766!
Oops: invalid opcode: 0000 1 PREEMPT SMP NOPTI
CPU: 4 UID: 0 PID: 0 Comm: swapper/4 Kdump: loaded Tainted: G           OE      6.12.0-rc2+ #7
Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
Hardware name: Dell Inc. PowerEdge R760/0VRV9X, BIOS 1.0.1 12/27/2022
RIP: 0010:eth_type_trans+0xda/0x140
Code: 80 00 00 00 eb c1 8b 47 70 2b 47 74 48 8b 97 d0 00 00 00 83 f8 01 7e 1b 48 85 d2 74 06 66 83 3a ff 74 09 b8 00 04 00 00 eb a5 <0f> 0b b8 00 01 00 00 eb 9c 48 85 ff 74 eb 31 f6 b9 02 00 00 00 48
RSP: 0018:ff615003803fcc28 EFLAGS: 00010283
RAX: 00000000000022d2 RBX: 0000000000000003 RCX: ff2e8c25da334040
RDX: 0000000000000040 RSI: ff2e8c25c1ce8000 RDI: ff2e8c25869f9000
RBP: ff2e8c258c31c000 R08: ff2e8c25da334000 R09: 0000000000000001
R10: ff2e8c25da3342c0 R11: ff2e8c25c1ce89c0 R12: ff2e8c258e0990b0
R13: ff2e8c25bb120000 R14: ff2e8c25c1ce89c0 R15: ff2e8c25869f9000
FS:  0000000000000000(0000) GS:ff2e8c34be300000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055f05317e4c8 CR3: 000000108bac6006 CR4: 0000000000773ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
 <IRQ>
 ? die+0x33/0x90
 ? do_trap+0xd9/0x100
 ? eth_type_trans+0xda/0x140
 ? do_error_trap+0x65/0x80
 ? eth_type_trans+0xda/0x140
 ? exc_invalid_op+0x4e/0x70
 ? eth_type_trans+0xda/0x140
 ? asm_exc_invalid_op+0x16/0x20
 ? eth_type_trans+0xda/0x140
 bnxt_tpa_end+0x10b/0x6b0 [bnxt_en]
 ? bnxt_tpa_start+0x195/0x320 [bnxt_en]
 bnxt_rx_pkt+0x902/0xd90 [bnxt_en]
 ? __bnxt_tx_int.constprop.0+0x89/0x300 [bnxt_en]
 ? kmem_cache_free+0x343/0x440
 ? __bnxt_tx_int.constprop.0+0x24f/0x300 [bnxt_en]
 __bnxt_poll_work+0x193/0x370 [bnxt_en]
 bnxt_poll_p5+0x9a/0x300 [bnxt_en]
 ? try_to_wake_up+0x209/0x670
 __napi_poll+0x29/0x1b0

Fix it by redefining the aggregation ID mask for P5_PLUS chips to be
12 bits.  This will work because the maximum aggregation ID is less
than 4096 on all P5_PLUS chips.

Fixes: 13d2d3d381 ("bnxt_en: Add new P7 hardware interface definitions")
Reviewed-by: Damodharam Ammepalli <damodharam.ammepalli@broadcom.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20241209015448.1937766-1-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-10 18:25:38 -08:00
Hongguang Gao
fab4b4d2c9 bnxt_en: Fix potential crash when dumping FW log coredump
If the FW log context memory is retained after FW reset, the existing
code is not handling the condition correctly and zeroes out the data
structures.  This potentially will cause a division by zero crash
when the user runs ethtool -w.  The last_type is also not set
correctly when the context memory is retained.  This will cause errors
because the last_type signals to the FW that all context memory types
have been configured.

Oops: divide error: 0000 1 PREEMPT SMP NOPTI
CPU: 53 UID: 0 PID: 7019 Comm: ethtool Kdump: loaded Tainted: G           OE      6.12.0-rc7+ #1
Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
Hardware name: Supermicro SYS-621C-TN12R/X13DDW-A, BIOS 1.4 08/10/2023
RIP: 0010:__bnxt_copy_ctx_mem.constprop.0.isra.0+0x86/0x160 [bnxt_en]
Code: 0a 31 d2 4c 89 6c 24 10 45 8b a5 fc df ff ff 4c 8b 74 24 20 31 db 66 89 44 24 06 48 63 c5 c1 e5 09 4c 0f af e0 48 8b 44 24 30 <49> f7 f4 4c 89 64 24 08 48 63 c5 4d 89 ec 31 ed 48 89 44 24 18 49
RSP: 0018:ff480591603d78b8 EFLAGS: 00010206
RAX: 0000000000100000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ff23959e46740000 RDI: 0000000000000000
RBP: 0000000000000000 R08: 0000000000100000 R09: ff23959e46740000
R10: ff480591603d7a18 R11: 0000000000000010 R12: 0000000000000000
R13: ff23959e46742008 R14: 0000000000000000 R15: 0000000000000000
FS:  00007f04227c1740(0000) GS:ff2395adbf680000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f04225b33a5 CR3: 000000108b9a4001 CR4: 0000000000773ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
 <TASK>
 ? die+0x33/0x90
 ? do_trap+0xd9/0x100
 ? __bnxt_copy_ctx_mem.constprop.0.isra.0+0x86/0x160 [bnxt_en]
 ? do_error_trap+0x65/0x80
 ? __bnxt_copy_ctx_mem.constprop.0.isra.0+0x86/0x160 [bnxt_en]
 ? exc_divide_error+0x36/0x50
 ? __bnxt_copy_ctx_mem.constprop.0.isra.0+0x86/0x160 [bnxt_en]
 ? asm_exc_divide_error+0x16/0x20
 ? __bnxt_copy_ctx_mem.constprop.0.isra.0+0x86/0x160 [bnxt_en]
 ? __bnxt_copy_ctx_mem.constprop.0.isra.0+0xda/0x160 [bnxt_en]
 bnxt_get_ctx_coredump.constprop.0+0x1ed/0x390 [bnxt_en]
 ? __memcg_slab_post_alloc_hook+0x21c/0x3c0
 ? __bnxt_get_coredump+0x473/0x4b0 [bnxt_en]
 __bnxt_get_coredump+0x473/0x4b0 [bnxt_en]
 ? security_file_alloc+0x74/0xe0
 ? cred_has_capability.isra.0+0x78/0x120
 bnxt_get_coredump_length+0x4b/0xf0 [bnxt_en]
 bnxt_get_dump_flag+0x40/0x60 [bnxt_en]
 __dev_ethtool+0x17e4/0x1fc0
 ? syscall_exit_to_user_mode+0xc/0x1d0
 ? do_syscall_64+0x85/0x150
 ? unmap_page_range+0x299/0x4b0
 ? vma_interval_tree_remove+0x215/0x2c0
 ? __kmalloc_cache_noprof+0x10a/0x300
 dev_ethtool+0xa8/0x170
 dev_ioctl+0x1b5/0x580
 ? sk_ioctl+0x4a/0x110
 sock_do_ioctl+0xab/0xf0
 sock_ioctl+0x1ca/0x2e0
 __x64_sys_ioctl+0x87/0xc0
 do_syscall_64+0x79/0x150

Fixes: 24d694aec1 ("bnxt_en: Allocate backing store memory for FW trace logs")
Signed-off-by: Hongguang Gao <hongguang.gao@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20241204215918.1692597-3-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-06 17:39:13 -08:00
Michael Chan
de37faf41a bnxt_en: Fix GSO type for HW GRO packets on 5750X chips
The existing code is using RSS profile to determine IPV4/IPV6 GSO type
on all chips older than 5760X.  This won't work on 5750X chips that may
be using modified RSS profiles.  This commit from 2018 has updated the
driver to not use RSS profile for HW GRO packets on newer chips:

50f011b63d ("bnxt_en: Update RSS setup and GRO-HW logic according to the latest spec.")

However, a recent commit to add support for the newest 5760X chip broke
the logic.  If the GRO packet needs to be re-segmented by the stack, the
wrong GSO type will cause the packet to be dropped.

Fix it to only use RSS profile to determine GSO type on the oldest
5730X/5740X chips which cannot use the new method and is safe to use the
RSS profiles.

Also fix the L3/L4 hash type for RX packets by not using the RSS
profile for the same reason.  Use the ITYPE field in the RX completion
to determine L3/L4 hash types correctly.

Fixes: a7445d6980 ("bnxt_en: Add support for new RX and TPA_START completion types for P7")
Reviewed-by: Colin Winegarden <colin.winegarden@broadcom.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20241204215918.1692597-2-michael.chan@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-06 17:39:13 -08:00
David Wei
bd649c5cc9 bnxt_en: handle tpa_info in queue API implementation
Commit 7ed816be35 ("eth: bnxt: use page pool for head frags") added a
page pool for header frags, which may be distinct from the existing pool
for the aggregation ring. Prior to this change, frags used in the TPA
ring rx_tpa were allocated from system memory e.g. napi_alloc_frag()
meaning their lifetimes were not associated with a page pool. They can
be returned at any time and so the queue API did not alloc or free
rx_tpa.

But now frags come from a separate head_pool which may be different to
page_pool. Without allocating and freeing rx_tpa, frags allocated from
the old head_pool may be returned to a different new head_pool which
causes a mismatch between the pp hold/release count.

Fix this problem by properly freeing and allocating rx_tpa in the queue
API implementation.

Signed-off-by: David Wei <dw@davidwei.uk>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20241204041022.56512-4-dw@davidwei.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-04 19:23:35 -08:00
David Wei
bf1782d70d bnxt_en: refactor bnxt_alloc_rx_rings() to call bnxt_alloc_rx_agg_bmap()
Refactor bnxt_alloc_rx_rings() to call bnxt_alloc_rx_agg_bmap() for
allocating rx_agg_bmap.

Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: David Wei <dw@davidwei.uk>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20241204041022.56512-3-dw@davidwei.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-04 19:23:35 -08:00
David Wei
5883a3e0ba bnxt_en: refactor tpa_info alloc/free into helpers
Refactor bnxt_rx_ring_info->tpa_info operations into helpers that work
on a single tpa_info in prep for queue API using them.

There are 2 pairs of operations:

* bnxt_alloc_one_tpa_info()
* bnxt_free_one_tpa_info()

These alloc/free the tpa_info array itself.

* bnxt_alloc_one_tpa_info_data()
* bnxt_free_one_tpa_info_data()

These alloc/free the frags stored in tpa_info array.

Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: David Wei <dw@davidwei.uk>
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Link: https://patch.msgid.link/20241204041022.56512-2-dw@davidwei.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-12-04 19:23:35 -08:00
Daniel Xu
be75cda92a bnxt_en: ethtool: Supply ntuple rss context action
Commit 2f4f9fe5bf ("bnxt_en: Support adding ntuple rules on RSS
contexts") added support for redirecting to an RSS context as an ntuple
rule action. However, it forgot to update the ETHTOOL_GRXCLSRULE
codepath. This caused `ethtool -n` to always report the action as
"Action: Direct to queue 0" which is wrong.

Fix by teaching bnxt driver to report the RSS context when applicable.

Fixes: 2f4f9fe5bf ("bnxt_en: Support adding ntuple rules on RSS contexts")
Reviewed-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
Link: https://patch.msgid.link/2e884ae39e08dc5123be7c170a6089cefe6a78f7.1732748253.git.dxu@dxuuu.xyz
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-30 14:16:12 -08:00
Linus Torvalds
65ae975e97 Merge tag 'net-6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
 "Including fixes from bluetooth.

  Current release - regressions:

   - rtnetlink: fix rtnl_dump_ifinfo() error path

   - bluetooth: remove the redundant sco_conn_put

  Previous releases - regressions:

   - netlink: fix false positive warning in extack during dumps

   - sched: sch_fq: don't follow the fast path if Tx is behind now

   - ipv6: delete temporary address if mngtmpaddr is removed or
     unmanaged

   - tcp: fix use-after-free of nreq in reqsk_timer_handler().

   - bluetooth: fix slab-use-after-free Read in set_powered_sync

   - l2tp: fix warning in l2tp_exit_net found

   - eth:
       - bnxt_en: fix receive ring space parameters when XDP is active
       - lan78xx: fix double free issue with interrupt buffer allocation
       - tg3: set coherent DMA mask bits to 31 for BCM57766 chipsets

  Previous releases - always broken:

   - ipmr: fix tables suspicious RCU usage

   - iucv: MSG_PEEK causes memory leak in iucv_sock_destruct()

   - eth:
       - octeontx2-af: fix low network performance
       - stmmac: dwmac-socfpga: set RX watchdog interrupt as broken
       - rtase: correct the speed for RTL907XD-V1

  Misc:

   - some documentation fixup"

* tag 'net-6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (49 commits)
  ipmr: fix build with clang and DEBUG_NET disabled.
  Documentation: tls_offload: fix typos and grammar
  Fix spelling mistake
  ipmr: fix tables suspicious RCU usage
  ip6mr: fix tables suspicious RCU usage
  ipmr: add debug check for mr table cleanup
  selftests: rds: move test.py to TEST_FILES
  net_sched: sch_fq: don't follow the fast path if Tx is behind now
  tcp: Fix use-after-free of nreq in reqsk_timer_handler().
  net: phy: fix phy_ethtool_set_eee() incorrectly enabling LPI
  net: Comment copy_from_sockptr() explaining its behaviour
  rxrpc: Improve setsockopt() handling of malformed user input
  llc: Improve setsockopt() handling of malformed user input
  Bluetooth: SCO: remove the redundant sco_conn_put
  Bluetooth: MGMT: Fix possible deadlocks
  Bluetooth: MGMT: Fix slab-use-after-free Read in set_powered_sync
  bnxt_en: Unregister PTP during PCI shutdown and suspend
  bnxt_en: Refactor bnxt_ptp_init()
  bnxt_en: Fix receive ring space parameters when XDP is active
  bnxt_en: Fix queue start to update vnic RSS table
  ...
2024-11-28 10:15:20 -08:00
Michael Chan
3661c05c54 bnxt_en: Unregister PTP during PCI shutdown and suspend
If we go through the PCI shutdown or suspend path, we shutdown the
NIC but PTP remains registered.  If the kernel continues to run for
a little bit, the periodic PTP .do_aux_work() function may be called
and it will read the PHC from the BAR register.  Since the device
has already been disabled, it will cause a PCIe completion timeout.
Fix it by calling bnxt_ptp_clear() in the PCI shutdown/suspend
handlers.  bnxt_ptp_clear() will unregister from PTP and
.do_aux_work() will be canceled.

In bnxt_resume(), we need to re-initialize PTP.

Fixes: a521c8a01d ("bnxt_en: Move bnxt_ptp_init() from bnxt_open() back to bnxt_init_one()")
Cc: Richard Cochran <richardcochran@gmail.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-11-26 15:29:31 +01:00
Michael Chan
1e9614cd95 bnxt_en: Refactor bnxt_ptp_init()
Instead of passing the 2nd parameter phc_cfg to bnxt_ptp_init().
Store it in bp->ptp_cfg so that the caller doesn't need to know what
the value should be.

In the next patch, we'll need to call bnxt_ptp_init() in bnxt_resume()
and this will make it easier.

Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-11-26 15:29:31 +01:00
Shravya KN
3051a77a09 bnxt_en: Fix receive ring space parameters when XDP is active
The MTU setting at the time an XDP multi-buffer is attached
determines whether the aggregation ring will be used and the
rx_skb_func handler.  This is done in bnxt_set_rx_skb_mode().

If the MTU is later changed, the aggregation ring setting may need
to be changed and it may become out-of-sync with the settings
initially done in bnxt_set_rx_skb_mode().  This may result in
random memory corruption and crashes as the HW may DMA data larger
than the allocated buffer size, such as:

BUG: kernel NULL pointer dereference, address: 00000000000003c0
PGD 0 P4D 0
Oops: 0000 [#1] PREEMPT SMP NOPTI
CPU: 17 PID: 0 Comm: swapper/17 Kdump: loaded Tainted: G S         OE      6.1.0-226bf9805506 #1
Hardware name: Wiwynn Delta Lake PVT BZA.02601.0150/Delta Lake-Class1, BIOS F0E_3A12 08/26/2021
RIP: 0010:bnxt_rx_pkt+0xe97/0x1ae0 [bnxt_en]
Code: 8b 95 70 ff ff ff 4c 8b 9d 48 ff ff ff 66 41 89 87 b4 00 00 00 e9 0b f7 ff ff 0f b7 43 0a 49 8b 95 a8 04 00 00 25 ff 0f 00 00 <0f> b7 14 42 48 c1 e2 06 49 03 95 a0 04 00 00 0f b6 42 33f
RSP: 0018:ffffa19f40cc0d18 EFLAGS: 00010202
RAX: 00000000000001e0 RBX: ffff8e2c805c6100 RCX: 00000000000007ff
RDX: 0000000000000000 RSI: ffff8e2c271ab990 RDI: ffff8e2c84f12380
RBP: ffffa19f40cc0e48 R08: 000000000001000d R09: 974ea2fcddfa4cbf
R10: 0000000000000000 R11: ffffa19f40cc0ff8 R12: ffff8e2c94b58980
R13: ffff8e2c952d6600 R14: 0000000000000016 R15: ffff8e2c271ab990
FS:  0000000000000000(0000) GS:ffff8e3b3f840000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000000003c0 CR3: 0000000e8580a004 CR4: 00000000007706e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
 <IRQ>
 __bnxt_poll_work+0x1c2/0x3e0 [bnxt_en]

To address the issue, we now call bnxt_set_rx_skb_mode() within
bnxt_change_mtu() to properly set the AGG rings configuration and
update rx_skb_func based on the new MTU value.
Additionally, BNXT_FLAG_NO_AGG_RINGS is cleared at the beginning of
bnxt_set_rx_skb_mode() to make sure it gets set or cleared based on
the current MTU.

Fixes: 08450ea98a ("bnxt_en: Fix max_mtu setting for multi-buf XDP")
Co-developed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Shravya KN <shravya.k-n@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-11-26 15:29:31 +01:00
Somnath Kotur
5ac066b7b0 bnxt_en: Fix queue start to update vnic RSS table
HWRM_RING_FREE followed by a HWRM_RING_ALLOC is not guaranteed to
have the same FW ring ID as before.  So we must reinitialize the
RSS table with the correct ring IDs.  Otherwise, traffic may not
resume properly if the restarted ring ID is stale.  Since this
feature is only supported on P5_PLUS chips, we call
bnxt_vnic_set_rss_p5() to update the HW RSS table.

Fixes: 2d694c27d3 ("bnxt_en: implement netdev_queue_mgmt_ops")
Cc: David Wei <dw@davidwei.uk>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-11-26 15:29:31 +01:00
Shravya KN
5007991670 bnxt_en: Set backplane link modes correctly for ethtool
Use the return value from bnxt_get_media() to determine the port and
link modes.  bnxt_get_media() returns the proper BNXT_MEDIA_KR when
the PHY is backplane.  This will correct the ethtool settings for
backplane devices.

Fixes: 5d4e1bf606 ("bnxt_en: extend media types to supported and autoneg modes")
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Shravya KN <shravya.k-n@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-11-26 15:29:31 +01:00
Saravanan Vajravel
5311598f7f bnxt_en: Reserve rings after PCIe AER recovery if NIC interface is down
After successful PCIe AER recovery, FW will reset all resource
reservations.  If it is IF_UP, the driver will call bnxt_open() and
all resources will be reserved again.  It it is IF_DOWN, we should
call bnxt_reserve_rings() so that we can reserve resources including
RoCE resources to allow RoCE to resume after AER.  Without this
patch, RoCE fails to resume in this IF_DOWN scenario.

Later, if it becomes IF_UP, bnxt_open() will see that resources have
been reserved and will not reserve again.

Fixes: fb1e6e562b ("bnxt_en: Fix AER recovery.")
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Reviewed-by: Kashyap Desai <kashyap.desai@broadcom.com>
Signed-off-by: Saravanan Vajravel <saravanan.vajravel@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-11-26 15:29:31 +01:00
Pavan Chebbi
614f4d166e tg3: Set coherent DMA mask bits to 31 for BCM57766 chipsets
The hardware on Broadcom 1G chipsets have a known limitation
where they cannot handle DMA addresses that cross over 4GB.
When such an address is encountered, the hardware sets the
address overflow error bit in the DMA status register and
triggers a reset.

However, BCM57766 hardware is setting the overflow bit and
triggering a reset in some cases when there is no actual
underlying address overflow. The hardware team analyzed the
issue and concluded that it is happening when the status
block update has an address with higher (b16 to b31) bits
as 0xffff following a previous update that had lowest bits
as 0xffff.

To work around this bug in the BCM57766 hardware, set the
coherent dma mask from the current 64b to 31b. This will
ensure that upper bits of the status block DMA address are
always at most 0x7fff, thus avoiding the improper overflow
check described above. This work around is intended for only
status block and ring memories and has no effect on TX and
RX buffers as they do not require coherent memory.

Fixes: 72f2afb8a6 ("[TG3]: Add DMA address workaround")
Reported-by: Salam Noureddine <noureddine@arista.com>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
Link: https://patch.msgid.link/20241119055741.147144-1-pavan.chebbi@broadcom.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-24 16:45:48 -08:00
Linus Torvalds
2a163a4cea Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull rdma updates from Jason Gunthorpe:
 "Seveal fixes scattered across the drivers and a few new features:

   - Minor updates and bug fixes to hfi1, efa, iopob, bnxt, hns

   - Force disassociate the userspace FD when hns does an async reset

   - bnxt new features for optimized modify QP to skip certain stayes,
     CQ coalescing, better debug dumping

   - mlx5 new data placement ordering feature

   - Faster destruction of mlx5 devx HW objects

   - Improvements to RDMA CM mad handling"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (51 commits)
  RDMA/bnxt_re: Correct the sequence of device suspend
  RDMA/bnxt_re: Use the default mode of congestion control
  RDMA/bnxt_re: Support different traffic class
  IB/cm: Rework sending DREQ when destroying a cm_id
  IB/cm: Do not hold reference on cm_id unless needed
  IB/cm: Explicitly mark if a response MAD is a retransmission
  RDMA/mlx5: Move events notifier registration to be after device registration
  RDMA/bnxt_re: Cache MSIx info to a local structure
  RDMA/bnxt_re: Refurbish CQ to NQ hash calculation
  RDMA/bnxt_re: Refactor NQ allocation
  RDMA/bnxt_re: Fail probe early when not enough MSI-x vectors are reserved
  RDMA/hns: Fix different dgids mapping to the same dip_idx
  RDMA/bnxt_re: Add set_func_resources support for P5/P7 adapters
  RDMA/bnxt_re: Enhance RoCE SRIOV resource configuration design
  bnxt_en: Add support for RoCE sriov configuration
  RDMA/hns: Fix NULL pointer derefernce in hns_roce_map_mr_sg()
  RDMA/hns: Fix out-of-order issue of requester when setting FENCE
  RDMA/nldev: Add IB device and net device rename events
  RDMA/mlx5: Add implementation for ufile_hw_cleanup device operation
  RDMA/core: Move ib_uverbs_file struct to uverbs_types.h
  ...
2024-11-22 20:03:57 -08:00