Commit Graph

201 Commits

Author SHA1 Message Date
Dipayaan Roy
fa103fc8f5 net: mana: fix use-after-free in mana_hwc_destroy_channel() by reordering teardown
A potential race condition exists in mana_hwc_destroy_channel() where
hwc->caller_ctx is freed before the HWC's Completion Queue (CQ) and
Event Queue (EQ) are destroyed. This allows an in-flight CQ interrupt
handler to dereference freed memory, leading to a use-after-free or
NULL pointer dereference in mana_hwc_handle_resp().

mana_smc_teardown_hwc() signals the hardware to stop but does not
synchronize against IRQ handlers already executing on other CPUs. The
IRQ synchronization only happens in mana_hwc_destroy_cq() via
mana_gd_destroy_eq() -> mana_gd_deregister_irq(). Since this runs
after kfree(hwc->caller_ctx), a concurrent mana_hwc_rx_event_handler()
can dereference freed caller_ctx (and rxq->msg_buf) in
mana_hwc_handle_resp().

Fix this by reordering teardown to reverse-of-creation order: destroy
the TX/RX work queues and CQ/EQ before freeing hwc->caller_ctx. This
ensures all in-flight interrupt handlers complete before the memory they
access is freed.

Fixes: ca9c54d2d6 ("net: mana: Add a driver for Microsoft Azure Network Adapter (MANA)")
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: Dipayaan Roy <dipayanroy@linux.microsoft.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/abHA3AjNtqa1nx9k@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-14 10:40:50 -07:00
Shiraz Saleem
87c2302813 net/mana: Null service_wq on setup error to prevent double destroy
In mana_gd_setup() error path, set gc->service_wq to NULL after
destroy_workqueue() to match the cleanup in mana_gd_cleanup().
This prevents a use-after-free if the workqueue pointer is checked
after a failed setup.

Fixes: f975a09552 ("net: mana: Fix double destroy_workqueue on service rescan PCI path")
Signed-off-by: Shiraz Saleem <shirazsaleem@microsoft.com>
Signed-off-by: Konstantin Taranov <kotaranov@microsoft.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20260309172443.688392-1-kotaranov@linux.microsoft.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-11 20:21:45 -07:00
Long Li
dabffd0854 net: mana: Ring doorbell at 4 CQ wraparounds
MANA hardware requires at least one doorbell ring every 8 wraparounds
of the CQ. The driver rings the doorbell as a form of flow control to
inform hardware that CQEs have been consumed.

The NAPI poll functions mana_poll_tx_cq() and mana_poll_rx_cq() can
poll up to CQE_POLLING_BUFFER (512) completions per call. If the CQ
has fewer than 512 entries, a single poll call can process more than
4 wraparounds without ringing the doorbell. The doorbell threshold
check also uses ">" instead of ">=", delaying the ring by one extra
CQE beyond 4 wraparounds. Combined, these issues can cause the driver
to exceed the 8-wraparound hardware limit, leading to missed
completions and stalled queues.

Fix this by capping the number of CQEs polled per call to 4 wraparounds
of the CQ in both TX and RX paths. Also change the doorbell threshold
from ">" to ">=" so the doorbell is rung as soon as 4 wraparounds are
reached.

Cc: stable@vger.kernel.org
Fixes: 58a63729c9 ("net: mana: Fix doorbell out of order violation and avoid unnecessary doorbell rings")
Signed-off-by: Long Li <longli@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Link: https://patch.msgid.link/20260226192833.1050807-1-longli@microsoft.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-27 19:29:38 -08:00
Linus Torvalds
b9c8fc2cae Merge tag 'net-7.0-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
 "Including fixes from IPsec, Bluetooth and netfilter

  Current release - regressions:

   - wifi: fix dev_alloc_name() return value check

   - rds: fix recursive lock in rds_tcp_conn_slots_available

  Current release - new code bugs:

   - vsock: lock down child_ns_mode as write-once

  Previous releases - regressions:

   - core:
      - do not pass flow_id to set_rps_cpu()
      - consume xmit errors of GSO frames

   - netconsole: avoid OOB reads, msg is not nul-terminated

   - netfilter: h323: fix OOB read in decode_choice()

   - tcp: re-enable acceptance of FIN packets when RWIN is 0

   - udplite: fix null-ptr-deref in __udp_enqueue_schedule_skb().

   - wifi: brcmfmac: fix potential kernel oops when probe fails

   - phy: register phy led_triggers during probe to avoid AB-BA deadlock

   - eth:
      - bnxt_en: fix deleting of Ntuple filters
      - wan: farsync: fix use-after-free bugs caused by unfinished tasklets
      - xscale: check for PTP support properly

  Previous releases - always broken:

   - tcp: fix potential race in tcp_v6_syn_recv_sock()

   - kcm: fix zero-frag skb in frag_list on partial sendmsg error

   - xfrm:
      - fix race condition in espintcp_close()
      - always flush state and policy upon NETDEV_UNREGISTER event

   - bluetooth:
      - purge error queues in socket destructors
      - fix response to L2CAP_ECRED_CONN_REQ

   - eth:
      - mlx5:
         - fix circular locking dependency in dump
         - fix "scheduling while atomic" in IPsec MAC address query
      - gve: fix incorrect buffer cleanup for QPL
      - team: avoid NETDEV_CHANGEMTU event when unregistering slave
      - usb: validate USB endpoints"

* tag 'net-7.0-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (72 commits)
  netfilter: nf_conntrack_h323: fix OOB read in decode_choice()
  dpaa2-switch: validate num_ifs to prevent out-of-bounds write
  net: consume xmit errors of GSO frames
  vsock: document write-once behavior of the child_ns_mode sysctl
  vsock: lock down child_ns_mode as write-once
  selftests/vsock: change tests to respect write-once child ns mode
  net/mlx5e: Fix "scheduling while atomic" in IPsec MAC address query
  net/mlx5: Fix missing devlink lock in SRIOV enable error path
  net/mlx5: E-switch, Clear legacy flag when moving to switchdev
  net/mlx5: LAG, disable MPESW in lag_disable_change()
  net/mlx5: DR, Fix circular locking dependency in dump
  selftests: team: Add a reference count leak test
  team: avoid NETDEV_CHANGEMTU event when unregistering slave
  net: mana: Fix double destroy_workqueue on service rescan PCI path
  MAINTAINERS: Update maintainer entry for QUALCOMM ETHQOS ETHERNET DRIVER
  dpll: zl3073x: Remove redundant cleanup in devm_dpll_init()
  selftests/net: packetdrill: Verify acceptance of FIN packets when RWIN is 0
  tcp: re-enable acceptance of FIN packets when RWIN is 0
  vsock: Use container_of() to get net namespace in sysctl handlers
  net: usb: kaweth: validate USB endpoints
  ...
2026-02-26 08:00:13 -08:00
Dipayaan Roy
f975a09552 net: mana: Fix double destroy_workqueue on service rescan PCI path
While testing corner cases in the driver, a use-after-free crash
was found on the service rescan PCI path.

When mana_serv_reset() calls mana_gd_suspend(), mana_gd_cleanup()
destroys gc->service_wq. If the subsequent mana_gd_resume() fails
with -ETIMEDOUT or -EPROTO, the code falls through to
mana_serv_rescan() which triggers pci_stop_and_remove_bus_device().
This invokes the PCI .remove callback (mana_gd_remove), which calls
mana_gd_cleanup() a second time, attempting to destroy the already-
freed workqueue. Fix this by NULL-checking gc->service_wq in
mana_gd_cleanup() and setting it to NULL after destruction.

Call stack of issue for reference:
[Sat Feb 21 18:53:48 2026] Call Trace:
[Sat Feb 21 18:53:48 2026]  <TASK>
[Sat Feb 21 18:53:48 2026]  mana_gd_cleanup+0x33/0x70 [mana]
[Sat Feb 21 18:53:48 2026]  mana_gd_remove+0x3a/0xc0 [mana]
[Sat Feb 21 18:53:48 2026]  pci_device_remove+0x41/0xb0
[Sat Feb 21 18:53:48 2026]  device_remove+0x46/0x70
[Sat Feb 21 18:53:48 2026]  device_release_driver_internal+0x1e3/0x250
[Sat Feb 21 18:53:48 2026]  device_release_driver+0x12/0x20
[Sat Feb 21 18:53:48 2026]  pci_stop_bus_device+0x6a/0x90
[Sat Feb 21 18:53:48 2026]  pci_stop_and_remove_bus_device+0x13/0x30
[Sat Feb 21 18:53:48 2026]  mana_do_service+0x180/0x290 [mana]
[Sat Feb 21 18:53:48 2026]  mana_serv_func+0x24/0x50 [mana]
[Sat Feb 21 18:53:48 2026]  process_one_work+0x190/0x3d0
[Sat Feb 21 18:53:48 2026]  worker_thread+0x16e/0x2e0
[Sat Feb 21 18:53:48 2026]  kthread+0xf7/0x130
[Sat Feb 21 18:53:48 2026]  ? __pfx_worker_thread+0x10/0x10
[Sat Feb 21 18:53:48 2026]  ? __pfx_kthread+0x10/0x10
[Sat Feb 21 18:53:48 2026]  ret_from_fork+0x269/0x350
[Sat Feb 21 18:53:48 2026]  ? __pfx_kthread+0x10/0x10
[Sat Feb 21 18:53:48 2026]  ret_from_fork_asm+0x1a/0x30
[Sat Feb 21 18:53:48 2026]  </TASK>

Fixes: 505cc26bca ("net: mana: Add support for auxiliary device servicing events")
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: Dipayaan Roy <dipayanroy@linux.microsoft.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/aZ2bzL64NagfyHpg@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-25 19:16:09 -08:00
Linus Torvalds
32a92f8c89 Convert more 'alloc_obj' cases to default GFP_KERNEL arguments
This converts some of the visually simpler cases that have been split
over multiple lines.  I only did the ones that are easy to verify the
resulting diff by having just that final GFP_KERNEL argument on the next
line.

Somebody should probably do a proper coccinelle script for this, but for
me the trivial script actually resulted in an assertion failure in the
middle of the script.  I probably had made it a bit _too_ trivial.

So after fighting that far a while I decided to just do some of the
syntactically simpler cases with variations of the previous 'sed'
scripts.

The more syntactically complex multi-line cases would mostly really want
whitespace cleanup anyway.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-21 20:03:00 -08:00
Linus Torvalds
323bbfcf1e Convert 'alloc_flex' family to use the new default GFP_KERNEL argument
This is the exact same thing as the 'alloc_obj()' version, only much
smaller because there are a lot fewer users of the *alloc_flex()
interface.

As with alloc_obj() version, this was done entirely with mindless brute
force, using the same script, except using 'flex' in the pattern rather
than 'objs*'.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-21 17:09:51 -08:00
Linus Torvalds
bf4afc53b7 Convert 'alloc_obj' family to use the new default GFP_KERNEL argument
This was done entirely with mindless brute force, using

    git grep -l '\<k[vmz]*alloc_objs*(.*, GFP_KERNEL)' |
        xargs sed -i 's/\(alloc_objs*(.*\), GFP_KERNEL)/\1)/'

to convert the new alloc_obj() users that had a simple GFP_KERNEL
argument to just drop that argument.

Note that due to the extreme simplicity of the scripting, any slightly
more complex cases spread over multiple lines would not be triggered:
they definitely exist, but this covers the vast bulk of the cases, and
the resulting diff is also then easier to check automatically.

For the same reason the 'flex' versions will be done as a separate
conversion.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-21 17:09:51 -08:00
Kees Cook
69050f8d6d treewide: Replace kmalloc with kmalloc_obj for non-scalar types
This is the result of running the Coccinelle script from
scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to
avoid scalar types (which need careful case-by-case checking), and
instead replace kmalloc-family calls that allocate struct or union
object instances:

Single allocations:	kmalloc(sizeof(TYPE), ...)
are replaced with:	kmalloc_obj(TYPE, ...)

Array allocations:	kmalloc_array(COUNT, sizeof(TYPE), ...)
are replaced with:	kmalloc_objs(TYPE, COUNT, ...)

Flex array allocations:	kmalloc(struct_size(PTR, FAM, COUNT), ...)
are replaced with:	kmalloc_flex(*PTR, FAM, COUNT, ...)

(where TYPE may also be *VAR)

The resulting allocations no longer return "void *", instead returning
"TYPE *".

Signed-off-by: Kees Cook <kees@kernel.org>
2026-02-21 01:02:28 -08:00
Alice Mikityanska
c0165fcb8d net: mana: Remove jumbo_remove step from TX path
Now that the kernel doesn't insert HBH for BIG TCP IPv6 packets, remove
unnecessary steps from the mana TX path, that used to check and remove
HBH.

Signed-off-by: Alice Mikityanska <alice@isovalent.com>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260205133925.526371-11-alice.kernel@fastmail.im
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-06 20:50:12 -08:00
Breno Leitao
3eb7225718 net: mana: convert to use .get_rx_ring_count
Use the newly introduced .get_rx_ring_count ethtool ops callback instead
of handling ETHTOOL_GRXRINGS directly in .get_rxnfc().

Since ETHTOOL_GRXRINGS was the only command handled by mana_get_rxnfc(),
remove the function entirely.

Reviewed-by: Subbaraya Sundeep <sbhatta@marvell.com>
Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Brett Creeley <brett.creeley@amd.com>
Link: https://patch.msgid.link/20260122-grxring_big_v4-v2-4-94dbe4dcaa10@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-23 10:53:05 -08:00
Dipayaan Roy
3b194343c2 net: mana: Implement ndo_tx_timeout and serialize queue resets per port.
Implement .ndo_tx_timeout for MANA so any stalled TX queue can be detected
and a device-controlled port reset for all queues can be scheduled to a
ordered workqueue. The reset for all queues on stall detection is
recomended by hardware team.

Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: Dipayaan Roy <dipayanroy@linux.microsoft.com>
Link: https://patch.msgid.link/20260112130552.GA11785@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-13 19:14:36 -08:00
Dipayaan Roy
3387a7ad47 net: mana: Fix use-after-free in reset service rescan path
When mana_serv_reset() encounters -ETIMEDOUT or -EPROTO from
mana_gd_resume(), it performs a PCI rescan via mana_serv_rescan().

mana_serv_rescan() calls pci_stop_and_remove_bus_device(), which can
invoke the driver's remove path and free the gdma_context associated
with the device. After returning, mana_serv_reset() currently jumps to
the out label and attempts to clear gc->in_service, dereferencing a
freed gdma_context.

The issue was observed with the following call logs:
[  698.942636] BUG: unable to handle page fault for address: ff6c2b638088508d
[  698.943121] #PF: supervisor write access in kernel mode
[  698.943423] #PF: error_code(0x0002) - not-present page
[S[  698.943793] Pat Dec  6 07:GD5 100000067 P4D 1002f7067 PUD 1002f8067 PMD 101bef067 PTE 0
0:56 2025] hv_[n e 698.944283] Oops: Oops: 0002 [#1] SMP NOPTI
tvsc f8615163-00[  698.944611] CPU: 28 UID: 0 PID: 249 Comm: kworker/28:1
...
[Sat Dec  6 07:50:56 2025] R10: [  699.121594] mana 7870:00:00.0 enP30832s1: Configured vPort 0 PD 18 DB 16
000000000000001b R11: 0000000000000000 R12: ff44cf3f40270000
[Sat Dec  6 07:50:56 2025] R13: 0000000000000001 R14: ff44cf3f402700c8 R15: ff44cf3f4021b405
[Sat Dec  6 07:50:56 2025] FS:  0000000000000000(0000) GS:ff44cf7e9fcf9000(0000) knlGS:0000000000000000
[Sat Dec  6 07:50:56 2025] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[Sat Dec  6 07:50:56 2025] CR2: ff6c2b638088508d CR3: 000000011fe43001 CR4: 0000000000b73ef0
[Sat Dec  6 07:50:56 2025] Call Trace:
[Sat Dec  6 07:50:56 2025]  <TASK>
[Sat Dec  6 07:50:56 2025]  mana_serv_func+0x24/0x50 [mana]
[Sat Dec  6 07:50:56 2025]  process_one_work+0x190/0x350
[Sat Dec  6 07:50:56 2025]  worker_thread+0x2b7/0x3d0
[Sat Dec  6 07:50:56 2025]  kthread+0xf3/0x200
[Sat Dec  6 07:50:56 2025]  ? __pfx_worker_thread+0x10/0x10
[Sat Dec  6 07:50:56 2025]  ? __pfx_kthread+0x10/0x10
[Sat Dec  6 07:50:56 2025]  ret_from_fork+0x21a/0x250
[Sat Dec  6 07:50:56 2025]  ? __pfx_kthread+0x10/0x10
[Sat Dec  6 07:50:56 2025]  ret_from_fork_asm+0x1a/0x30
[Sat Dec  6 07:50:56 2025]  </TASK>

Fix this by returning immediately after mana_serv_rescan() to avoid
accessing GC state that may no longer be valid.

Fixes: 9bf66036d6 ("net: mana: Handle hardware recovery events when probing the device")
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Long Li <longli@microsoft.com>
Signed-off-by: Dipayaan Roy <dipayanroy@linux.microsoft.com>
Link: https://patch.msgid.link/20251218131054.GA3173@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-12-28 10:34:00 +01:00
Long Li
9bf66036d6 net: mana: Handle hardware recovery events when probing the device
When MANA is being probed, it's possible that hardware is in recovery
mode and the device may get GDMA_EQE_HWC_RESET_REQUEST over HWC in the
middle of the probe. Detect such condition and go through the recovery
service procedure.

Signed-off-by: Long Li <longli@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Link: https://patch.msgid.link/1764193552-9712-1-git-send-email-longli@linux.microsoft.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-12-01 13:53:53 -08:00
Aditya Garg
45120304e8 net: mana: Drop TX skb on post_work_request failure and unmap resources
Drop TX packets when posting the work request fails and ensure DMA
mappings are always cleaned up.

Signed-off-by: Aditya Garg <gargaditya@linux.microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Link: https://patch.msgid.link/1763464269-10431-3-git-send-email-gargaditya@linux.microsoft.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-19 20:11:57 -08:00
Aditya Garg
934fa943b5 net: mana: Handle SKB if TX SGEs exceed hardware limit
The MANA hardware supports a maximum of 30 scatter-gather entries (SGEs)
per TX WQE. Exceeding this limit can cause TX failures.
Add ndo_features_check() callback to validate SKB layout before
transmission. For GSO SKBs that would exceed the hardware SGE limit, clear
NETIF_F_GSO_MASK to enforce software segmentation in the stack.
Add a fallback in mana_start_xmit() to linearize non-GSO SKBs that still
exceed the SGE limit.

Also, Add ethtool counter for SKBs linearized

Co-developed-by: Dipayaan Roy <dipayanroy@linux.microsoft.com>
Signed-off-by: Dipayaan Roy <dipayanroy@linux.microsoft.com>
Signed-off-by: Aditya Garg <gargaditya@linux.microsoft.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Link: https://patch.msgid.link/1763464269-10431-2-git-send-email-gargaditya@linux.microsoft.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-19 20:11:57 -08:00
Erni Sri Satya Vennela
be4f1d67ec net: mana: Add standard counter rx_missed_errors
Report standard counter stats->rx_missed_errors
using hc_rx_discards_no_wqe from the hardware.

Add a global workqueue to periodically run
mana_query_gf_stats every 2 seconds to get the latest
info in eth_stats and define a driver capability flag
to notify hardware of the periodic queries.

To avoid repeated failures and log flooding, the workqueue
is not rescheduled if mana_query_gf_stats fails on HWC timeout
error and the stats are reset to 0. Other errors are transient
which will not need a VF reset for recovery.

Signed-off-by: Erni Sri Satya Vennela <ernis@linux.microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Link: https://patch.msgid.link/1763120599-6331-3-git-send-email-ernis@linux.microsoft.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-17 19:52:30 -08:00
Erni Sri Satya Vennela
e275d9091c net: mana: Move hardware counter stats from per-port to per-VF context
Move hardware counter (HC) statistics from mana_port_context to
mana_context to enable sharing stats across multiple network ports
on the same MANA VF. Previously, each network port queried
hardware counters independently using MANA_QUERY_GF_STAT command
(GF = Generic Function stats from GDMA hardware), resulting in
redundant queries when multiple ports existed on the same device.

Isolate hardware counter stats by introducing mana_ethtool_hc_stats
in mana_context and update the code to ensure all stats are properly
reported via ethtool -S <interface>, maintaining consistency with
previous behavior.

Signed-off-by: Erni Sri Satya Vennela <ernis@linux.microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Link: https://patch.msgid.link/1763120599-6331-2-git-send-email-ernis@linux.microsoft.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-17 19:52:30 -08:00
Erni Sri Satya Vennela
140039580e net: mana: Fix incorrect speed reported by debugfs
Once the netshaper is created for MANA, the current bandwidth
is reported in debugfs like this:

$ sudo ./tools/net/ynl/pyynl/cli.py \
  --spec Documentation/netlink/specs/net_shaper.yaml \
  --do set \
  --json '{"ifindex":'3',
           "handle":{ "scope": "netdev", "id":'1' },
           "bw-max": 200000000 }'
None

$ sudo cat /sys/kernel/debug/mana/1/vport0/current_speed
200

After the shaper  is deleted, it is expected to report
the maximum speed supported by the SKU. But currently it is
reporting 0, which is incorrect.

Fix this inconsistency, by resetting apc->speed to apc->max_speed
during deletion of the shaper object. This will improve
readability and debuggability.

Signed-off-by: Erni Sri Satya Vennela <ernis@linux.microsoft.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/1762369468-32570-1-git-send-email-ernis@linux.microsoft.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-07 18:49:14 -08:00
Haiyang Zhang
54133f9b4b net: mana: Support HW link state events
Handle the NIC hardware link state events received from the HW
channel, then set the proper link state accordingly.

And, add a feature bit, GDMA_DRV_CAP_FLAG_1_HW_VPORT_LINK_AWARE,
to inform the NIC hardware this handler exists.

Our MANA NIC only sends out the link state down/up messages
when we need to let the VM rerun DHCP client and change IP
address. So, add netif_carrier_on() in the probe(), let the NIC
show the right initial state in /sys/class/net/ethX/operstate.

Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Link: https://patch.msgid.link/1761770601-16920-1-git-send-email-haiyangz@linux.microsoft.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-31 15:56:53 -07:00
Haiyang Zhang
c4deabbc1a net: mana: Reduce waiting time if HWC not responding
If HW Channel (HWC) is not responding, reduce the waiting time, so further
steps will fail quickly.
This will prevent getting stuck for a long time (30 minutes or more), for
example, during unloading while HWC is not responding.

Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Link: https://patch.msgid.link/1757537841-5063-1-git-send-email-haiyangz@linux.microsoft.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-14 11:54:07 -07:00
Saurabh Sengar
38611e5ada net: mana: Remove redundant netdev_lock_ops_to_full() calls
NET_SHAPER is always selected for MANA driver. When NET_SHAPER is enabled,
netdev_lock_ops_to_full() reduces effectively to only an assert for lock,
which is always held in the path when NET_SHAPER is enabled.

Remove the redundant netdev_lock_ops_to_full() call.

Signed-off-by: Saurabh Sengar <ssengar@linux.microsoft.com>
Link: https://patch.msgid.link/1757393830-20837-1-git-send-email-ssengar@linux.microsoft.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-10 17:57:09 -07:00
Dipayaan Roy
730ff06d3f net: mana: Use page pool fragments for RX buffers instead of full pages to improve memory efficiency.
This patch enhances RX buffer handling in the mana driver by allocating
pages from a page pool and slicing them into MTU-sized fragments, rather
than dedicating a full page per packet. This approach is especially
beneficial on systems with large base page sizes like 64KB.

Key improvements:

- Proper integration of page pool for RX buffer allocations.
- MTU-sized buffer slicing to improve memory utilization.
- Reduce overall per Rx queue memory footprint.
- Automatic fallback to full-page buffers when:
   * Jumbo frames are enabled (MTU > PAGE_SIZE / 2).
   * The XDP path is active, to avoid complexities with fragment reuse.

Testing on VMs with 64KB pages shows around 200% throughput improvement.
Memory efficiency is significantly improved due to reduced wastage in page
allocations. Example: We are now able to fit 35 rx buffers in a single 64kb
page for MTU size of 1500, instead of 1 rx buffer per page previously.

Tested:

- iperf3, iperf2, and nttcp benchmarks.
- Jumbo frames with MTU 9000.
- Native XDP programs (XDP_PASS, XDP_DROP, XDP_TX, XDP_REDIRECT) for
  testing the XDP path in driver.
- Memory leak detection (kmemleak).
- Driver load/unload, reboot, and stress scenarios.

Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Saurabh Sengar <ssengar@linux.microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: Dipayaan Roy <dipayanroy@linux.microsoft.com>
Link: https://patch.msgid.link/20250814140410.GA22089@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-08-19 14:42:44 +02:00
Jakub Kicinski
8b5a19b4ff Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR (net-6.16-rc8).

Conflicts:

drivers/net/ethernet/microsoft/mana/gdma_main.c
  9669ddda18 ("net: mana: Fix warnings for missing export.h header inclusion")
  7553911210 ("net: mana: Allocate MSI-X vectors dynamically")
https://lore.kernel.org/20250711130752.23023d98@canb.auug.org.au

Adjacent changes:

drivers/net/ethernet/ti/icssg/icssg_prueth.h
  6e86fb73de ("net: ti: icssg-prueth: Fix buffer allocation for ICSSG")
  ffe8a49091 ("net: ti: icssg-prueth: Read firmware-names from device tree")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-24 11:10:46 -07:00
Shradha Gupta
380a8891fd net: mana: fix spelling for mana_gd_deregiser_irq()
Fix the typo in function name mana_gd_deregiser_irq()

Signed-off-by: Shradha Gupta <shradhagupta@linux.microsoft.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/1752068580-27215-1-git-send-email-shradhagupta@linux.microsoft.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-10 18:07:25 -07:00
Naman Jain
9669ddda18 net: mana: Fix warnings for missing export.h header inclusion
Fix below warning in Hyper-V's MANA drivers that comes when kernel is
compiled with W=1 option. Include export.h in driver files to fix it.
* warning: EXPORT_SYMBOL() is used, but #include <linux/export.h>
is missing

Signed-off-by: Naman Jain <namjain@linux.microsoft.com>
Reviewed-by: Saurabh Sengar <ssengar@linux.microsoft.com>
Link: https://lore.kernel.org/r/20250611100459.92900-7-namjain@linux.microsoft.com
Signed-off-by: Wei Liu <wei.liu@kernel.org>
Message-ID: <20250611100459.92900-7-namjain@linux.microsoft.com>
2025-07-09 23:46:06 +00:00
Haiyang Zhang
fbe346ce9d net: mana: Handle Reset Request from MANA NIC
Upon receiving the Reset Request, pause the connection and clean up
queues, wait for the specified period, then resume the NIC.
In the cleanup phase, the HWC is no longer responding, so set hwc_timeout
to zero to skip waiting on the response.

Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Link: https://patch.msgid.link/1751055983-29760-1-git-send-email-haiyangz@linux.microsoft.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-07-01 19:17:53 -07:00
Erni Sri Satya Vennela
11cd020698 net: mana: Fix build errors when CONFIG_NET_SHAPER is disabled
Fix build errors when CONFIG_NET_SHAPER is disabled, including:

drivers/net/ethernet/microsoft/mana/mana_en.c:804:10: error:
'const struct net_device_ops' has no member named 'net_shaper_ops'

     804 |         .net_shaper_ops         = &mana_shaper_ops,

drivers/net/ethernet/microsoft/mana/mana_en.c:804:35: error:
initialization of 'int (*)(struct net_device *, struct neigh_parms *)'
from incompatible pointer type 'const struct net_shaper_ops *'
[-Werror=incompatible-pointer-types]

     804 |         .net_shaper_ops         = &mana_shaper_ops,

Signed-off-by: Erni Sri Satya Vennela <ernis@linux.microsoft.com>
Fixes: 75cabb4693 ("net: mana: Add support for net_shaper_ops")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202506230625.bfUlqb8o-lkp@intel.com/
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/1750851355-8067-1-git-send-email-ernis@linux.microsoft.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-26 17:08:26 -07:00
Jakub Kicinski
28aa52b618 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR (net-6.16-rc4).

Conflicts:

Documentation/netlink/specs/mptcp_pm.yaml
  9e6dd4c256 ("netlink: specs: mptcp: replace underscores with dashes in names")
  ec362192aa ("netlink: specs: fix up indentation errors")
https://lore.kernel.org/20250626122205.389c2cd4@canb.auug.org.au

Adjacent changes:

Documentation/netlink/specs/fou.yaml
  791a9ed0a4 ("netlink: specs: fou: replace underscores with dashes in names")
  880d43ca9a ("netlink: specs: clean up spaces in brackets")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-26 10:40:50 -07:00
Long Li
e0fca6f2ce net: mana: Record doorbell physical address in PF mode
MANA supports RDMA in PF mode. The driver should record the doorbell
physical address when in PF mode.

The doorbell physical address is used by the RDMA driver to map
doorbell pages of the device to user-mode applications through RDMA
verbs interface. In the past, they have been mapped to user-mode while
the device is in VF mode. With the support for PF mode implemented,
also expose those pages in PF mode.

Support for PF mode is implemented in
290e5d3c49 ("net: mana: Add support for Multi Vports on Bare metal")

Signed-off-by: Long Li <longli@microsoft.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/1750210606-12167-1-git-send-email-longli@linuxonhyperv.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-19 15:55:22 -07:00
Shradha Gupta
7399ef9840 net: mana: Set tx_packets to post gso processing packet count
Allow tx_packets and tx_bytes counter in the driver to represent
the packets transmitted post GSO processing.

Currently they are populated as bigger pre-GSO packets and bytes

Signed-off-by: Shradha Gupta <shradhagupta@linux.microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2025-06-19 23:53:14 +01:00
Erni Sri Satya Vennela
ca8ac489ca net: mana: Handle unsupported HWC commands
If any of the HWC commands are not recognized by the
underlying hardware, the hardware returns the response
header status of -1. Log the information using
netdev_info_once to avoid multiple error logs in dmesg.

Signed-off-by: Erni Sri Satya Vennela <ernis@linux.microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: Shradha Gupta <shradhagupta@linux.microsoft.com>
Reviewed-by: Saurabh Singh Sengar <ssengar@linux.microsoft.com>
Reviewed-by: Dipayaan Roy <dipayanroy@linux.microsoft.com>
Link: https://patch.msgid.link/1750144656-2021-5-git-send-email-ernis@linux.microsoft.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-06-19 15:13:49 +02:00
Erni Sri Satya Vennela
a6d5edf11e net: mana: Add speed support in mana_get_link_ksettings
Allow mana ethtool get_link_ksettings operation to report
the maximum speed supported by the SKU in mbps.

The driver retrieves this information by issuing a
HWC command to the hardware via mana_query_link_cfg(),
which retrieves the SKU's maximum supported speed.

These APIs when invoked on hardware that are older/do
not support these APIs, the speed would be reported as UNKNOWN.

Before:
$ethtool enP30832s1
> Settings for enP30832s1:
        Supported ports: [  ]
        Supported link modes:   Not reported
        Supported pause frame use: No
        Supports auto-negotiation: No
        Supported FEC modes: Not reported
        Advertised link modes:  Not reported
        Advertised pause frame use: No
        Advertised auto-negotiation: No
        Advertised FEC modes: Not reported
        Speed: Unknown!
        Duplex: Full
        Auto-negotiation: off
        Port: Other
        PHYAD: 0
        Transceiver: internal
        Link detected: yes

After:
$ethtool enP30832s1
> Settings for enP30832s1:
        Supported ports: [  ]
        Supported link modes:   Not reported
        Supported pause frame use: No
        Supports auto-negotiation: No
        Supported FEC modes: Not reported
        Advertised link modes:  Not reported
        Advertised pause frame use: No
        Advertised auto-negotiation: No
        Advertised FEC modes: Not reported
        Speed: 16000Mb/s
        Duplex: Full
        Auto-negotiation: off
        Port: Other
        PHYAD: 0
        Transceiver: internal
        Link detected: yes

Signed-off-by: Erni Sri Satya Vennela <ernis@linux.microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: Shradha Gupta <shradhagupta@linux.microsoft.com>
Reviewed-by: Saurabh Singh Sengar <ssengar@linux.microsoft.com>
Reviewed-by: Long Li <longli@microsoft.com>
Link: https://patch.msgid.link/1750144656-2021-4-git-send-email-ernis@linux.microsoft.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-06-19 15:13:49 +02:00
Erni Sri Satya Vennela
75cabb4693 net: mana: Add support for net_shaper_ops
Introduce support for net_shaper_ops in the MANA driver,
enabling configuration of rate limiting on the MANA NIC.

To apply rate limiting, the driver issues a HWC command via
mana_set_bw_clamp() and updates the corresponding shaper object
in the net_shaper cache. If an error occurs during this process,
the driver restores the previous speed by querying the current link
configuration using mana_query_link_cfg().

The minimum supported bandwidth is 100 Mbps, and only values that are
exact multiples of 100 Mbps are allowed. Any other values are rejected.

To remove a shaper, the driver resets the bandwidth to the maximum
supported by the SKU using mana_set_bw_clamp() and clears the
associated cache entry. If an error occurs during this process,
the shaper details are retained.

On the hardware that does not support these APIs, the net-shaper
calls to set speed would fail.

Set the speed:
./tools/net/ynl/pyynl/cli.py \
 --spec Documentation/netlink/specs/net_shaper.yaml \
 --do set --json '{"ifindex":'$IFINDEX',
		   "handle":{"scope": "netdev", "id":'$ID' },
		   "bw-max": 200000000 }'

Get the shaper details:
./tools/net/ynl/pyynl/cli.py \
 --spec Documentation/netlink/specs/net_shaper.yaml \
 --do get --json '{"ifindex":'$IFINDEX',
		      "handle":{"scope": "netdev", "id":'$ID' }}'

> {'bw-max': 200000000,
> 'handle': {'scope': 'netdev'},
> 'ifindex': $IFINDEX,
> 'metric': 'bps'}

Delete the shaper object:
./tools/net/ynl/pyynl/cli.py \
 --spec Documentation/netlink/specs/net_shaper.yaml \
 --do delete --json '{"ifindex":'$IFINDEX',
		      "handle":{"scope": "netdev","id":'$ID' }}'

Signed-off-by: Erni Sri Satya Vennela <ernis@linux.microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: Shradha Gupta <shradhagupta@linux.microsoft.com>
Reviewed-by: Saurabh Singh Sengar <ssengar@linux.microsoft.com>
Reviewed-by: Long Li <longli@microsoft.com>
Link: https://patch.msgid.link/1750144656-2021-3-git-send-email-ernis@linux.microsoft.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-06-19 15:13:49 +02:00
Erni Sri Satya Vennela
d5c8f0e4e0 net: mana: Fix potential deadlocks in mana napi ops
When net_shaper_ops are enabled for MANA, netdev_ops_lock
becomes active.

MANA VF setup/teardown by netvsc follows this call chain:

netvsc_vf_setup()
        dev_change_flags()
		...
         __dev_open() OR __dev_close()

dev_change_flags() holds the netdev mutex via netdev_lock_ops.

Meanwhile, mana_create_txq() and mana_create_rxq() in mana_open()
path call NAPI APIs (netif_napi_add_tx(), netif_napi_add_weight(),
napi_enable()), which also try to acquire the same lock, risking
deadlock.

Similarly in the teardown path (mana_close()), netif_napi_disable()
and netif_napi_del(), contend for the same lock.

Switch to the _locked variants of these APIs to avoid deadlocks
when the netdev_ops_lock is held.

Fixes: d4c22ec680 ("net: hold netdev instance lock during ndo_open/ndo_stop")
Signed-off-by: Erni Sri Satya Vennela <ernis@linux.microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: Shradha Gupta <shradhagupta@linux.microsoft.com>
Reviewed-by: Saurabh Singh Sengar <ssengar@linux.microsoft.com>
Reviewed-by: Long Li <longli@microsoft.com>
Link: https://patch.msgid.link/1750144656-2021-2-git-send-email-ernis@linux.microsoft.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-06-19 15:13:49 +02:00
Jakub Kicinski
0f0decc777 Merge branch 'shradha_v6.16-rc1' of https://github.com/shradhagupta6/linux
Shradha Gupta says:

====================
Allow dyn MSI-X vector allocation of MANA

In this patchset we want to enable the MANA driver to be able to
allocate MSI-X vectors in PCI dynamically.

The first patch exports pci_msix_prepare_desc() in PCI to be able to
correctly prepare descriptors for dynamically added MSI-X vectors.

The second patch adds the support of dynamic vector allocation in
pci-hyperv PCI controller by enabling the MSI_FLAG_PCI_MSIX_ALLOC_DYN
flag and using the pci_msix_prepare_desc() exported in first patch.

The third patch adds a detailed description of the irq_setup(), to
help understand the function design better.

The fourth patch is a preparation patch for mana changes to support
dynamic IRQ allocation. It contains changes in irq_setup() to allow
skipping first sibling CPU sets, in case certain IRQs are already
affinitized to them.

The fifth patch has the changes in MANA driver to be able to allocate
MSI-X vectors dynamically. If the support does not exist it defaults to
older behavior.

* 'shradha_v6.16-rc1' of https://github.com/shradhagupta6/linux:
  net: mana: Allocate MSI-X vectors dynamically
  net: mana: Allow irq_setup() to skip cpus for affinity
  net: mana: explain irq_setup() algorithm
  PCI: hv: Allow dynamic MSI-X vector allocation
  PCI/MSI: Export pci_msix_prepare_desc() for dynamic MSI-X allocations
====================

Link: https://patch.msgid.link/1749650984-9193-1-git-send-email-shradhagupta@linux.microsoft.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-17 14:44:41 -07:00
Shradha Gupta
7553911210 net: mana: Allocate MSI-X vectors dynamically
Currently, the MANA driver allocates MSI-X vectors statically based on
MANA_MAX_NUM_QUEUES and num_online_cpus() values and in some cases ends
up allocating more vectors than it needs. This is because, by this time
we do not have a HW channel and do not know how many IRQs should be
allocated.

To avoid this, we allocate 1 MSI-X vector during the creation of HWC and
after getting the value supported by hardware, dynamically add the
remaining MSI-X vectors.

Signed-off-by: Shradha Gupta <shradhagupta@linux.microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
2025-06-17 06:15:15 +00:00
Shradha Gupta
845c62c543 net: mana: Allow irq_setup() to skip cpus for affinity
In order to prepare the MANA driver to allocate the MSI-X IRQs
dynamically, we need to enhance irq_setup() to allow skipping
affinitizing IRQs to the first CPU sibling group.

This would be for cases when the number of IRQs is less than or equal
to the number of online CPUs. In such cases for dynamically added IRQs
the first CPU sibling group would already be affinitized with HWC IRQ.

Signed-off-by: Shradha Gupta <shradhagupta@linux.microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: Yury Norov [NVIDIA] <yury.norov@gmail.com>
2025-06-17 06:15:14 +00:00
Yury Norov
4607617af1 net: mana: explain irq_setup() algorithm
Commit 91bfe210e1 ("net: mana: add a function to spread IRQs per CPUs")
added the irq_setup() function that distributes IRQs on CPUs according
to a tricky heuristic. The corresponding commit message explains the
heuristic.

Duplicate it in the source code to make available for readers without
digging git in history. Also, add more detailed explanation about how
the heuristics is implemented.

Signed-off-by: Yury Norov <yury.norov@gmail.com>
Signed-off-by: Shradha Gupta <shradhagupta@linux.microsoft.com>
2025-06-17 06:15:14 +00:00
Haiyang Zhang
7768c5f417 net: mana: Add handler for hardware servicing events
To collaborate with hardware servicing events, upon receiving the special
EQE notification from the HW channel, remove the devices on this bus.
Then, after a waiting period based on the device specs, rescan the parent
bus to recover the devices.

Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: Shradha Gupta <shradhagupta@linux.microsoft.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/1749834034-18498-1-git-send-email-haiyangz@linux.microsoft.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-16 15:19:48 -07:00
Dipayaan Roy
c09ef59e17 net: mana: Expose additional hardware counters for drop and TC via ethtool.
Add support for reporting additional hardware counters for drop and
TC using the ethtool -S interface.

These counters include:

- Aggregate Rx/Tx drop counters
- Per-TC Rx/Tx packet counters
- Per-TC Rx/Tx byte counters
- Per-TC Rx/Tx pause frame counters

The counters are exposed using ethtool_ops->get_ethtool_stats and
ethtool_ops->get_strings. This feature/counters are not available
to all versions of hardware.

Signed-off-by: Dipayaan Roy <dipayanroy@linux.microsoft.com>
Reviewed-by: Subbaraya Sundeep <sbhatta@marvell.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Link: https://patch.msgid.link/20250609100103.GA7102@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-10 14:23:19 -07:00
Linus Torvalds
dd91b5e1d6 Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull rdma updates from Jason Gunthorpe:
 "Usual collection of driver fixes:

   - Small bug fixes and cleansup in hfi, hns, rxe, mlx5, mana siw

   - Further ODP functionality in rxe

   - Remote access MRs in mana, along with more page sizes

   - Improve CM scalability with a rwlock around the agent

   - More trace points for hns

   - ODP hmm conversion to the new two step dma API

   - Support the ethernet HW device in mana as well as the RNIC

   - Cleanups:
       - Use secs_to_jiffies() when appropriate
       - Use ERR_CAST() instead of naked casts
       - Don't use %pK in printk
       - Unusued functions removed
       - Allocation type matching"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (57 commits)
  RDMA/cma: Fix hang when cma_netevent_callback fails to queue_work
  RDMA/bnxt_re: Support extended stats for Thor2 VF
  RDMA/hns: Fix endian issue in trace events
  RDMA/mlx5: Avoid flexible array warning
  IB/cm: Remove dead code and adjust naming
  RDMA/core: Avoid hmm_dma_map_alloc() for virtual DMA devices
  RDMA/rxe: Break endless pagefault loop for RO pages
  RDMA/bnxt_re: Fix return code of bnxt_re_configure_cc
  RDMA/bnxt_re: Fix missing error handling for tx_queue
  RDMA/bnxt_re: Fix incorrect display of inactivity_cp in debugfs output
  RDMA/mlx5: Add support for 200Gbps per lane speeds
  RDMA/mlx5: Remove the redundant MLX5_IB_STAGE_UAR stage
  RDMA/iwcm: Fix use-after-free of work objects after cm_id destruction
  net: mana: Add support for auxiliary device servicing events
  RDMA/mana_ib: unify mana_ib functions to support any gdma device
  RDMA/mana_ib: Add support of mana_ib for RNIC and ETH nic
  net: mana: Probe rdma device in mana driver
  RDMA/siw: replace redundant ternary operator with just rv
  RDMA/umem: Separate implicit ODP initialization from explicit ODP
  RDMA/core: Convert UMEM ODP DMA mapping to caching IOVA and page linkage
  ...
2025-05-30 10:18:56 -07:00
Haiyang Zhang
290e5d3c49 net: mana: Add support for Multi Vports on Bare metal
To support Multi Vports on Bare metal, increase the device config response
version. And, skip the register HW vport, and register filter steps, when
the Bare metal hostmode is set.

Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Link: https://patch.msgid.link/1747671636-5810-1-git-send-email-haiyangz@microsoft.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-05-28 08:30:46 +02:00
Shiraz Saleem
505cc26bca net: mana: Add support for auxiliary device servicing events
Handle soc servicing events which require the rdma auxiliary device resources to
be cleaned up during a suspend, and re-initialized during a resume.

Signed-off-by: Shiraz Saleem <shirazsaleem@microsoft.com>
Signed-off-by: Konstantin Taranov <kotaranov@microsoft.com>
Link: https://patch.msgid.link/1746633545-17653-5-git-send-email-kotaranov@linux.microsoft.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-05-12 07:31:48 -04:00
Konstantin Taranov
ced82fce77 net: mana: Probe rdma device in mana driver
Initialize gdma device for rdma inside mana module.
For each gdma device, initialize an auxiliary ib device.

Signed-off-by: Konstantin Taranov <kotaranov@microsoft.com>
Link: https://patch.msgid.link/1746633545-17653-2-git-send-email-kotaranov@linux.microsoft.com
Reviewed-by: Long Li <longli@microsoft.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-05-12 06:44:52 -04:00
Konstantin Taranov
f1652d76f4 RDMA/mana_ib: Add support of 4M, 1G, and 2G pages
Check PF capability flag whether the 4M, 1G, and 2G pages are
supported. Add these pages sizes to mana_ib, if supported.

Define possible page sizes in enum gdma_page_type and
remove unused enum atb_page_size.

Signed-off-by: Konstantin Taranov <kotaranov@microsoft.com>
Link: https://patch.msgid.link/1744621234-26114-4-git-send-email-kotaranov@linux.microsoft.com
Reviewed-by: Long Li <longli@microsoft.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-04-20 06:36:26 -04:00
Linus Torvalds
acc4d5ff0b Merge tag 'net-6.15-rc0' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
 "Rather tiny pull request, mostly so that we can get into our trees
  your fix to the x86 Makefile.

  Current release - regressions:

   - Revert "tcp: avoid atomic operations on sk->sk_rmem_alloc", error
     queue accounting was missed

  Current release - new code bugs:

   - 5 fixes for the netdevice instance locking work

  Previous releases - regressions:

   - usbnet: restore usb%d name exception for local mac addresses

  Previous releases - always broken:

   - rtnetlink: allocate vfinfo size for VF GUIDs when supported, avoid
     spurious GET_LINK failures

   - eth: mana: Switch to page pool for jumbo frames

   - phy: broadcom: Correct BCM5221 PHY model detection

  Misc:

   - selftests: drv-net: replace helpers for referring to other files"

* tag 'net-6.15-rc0' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (22 commits)
  Revert "tcp: avoid atomic operations on sk->sk_rmem_alloc"
  bnxt_en: bring back rtnl lock in bnxt_shutdown
  eth: gve: add missing netdev locks on reset and shutdown paths
  selftests: mptcp: ignore mptcp_diag binary
  selftests: mptcp: close fd_in before returning in main_loop
  selftests: mptcp: fix incorrect fd checks in main_loop
  mptcp: fix NULL pointer in can_accept_new_subflow
  octeontx2-af: Free NIX_AF_INT_VEC_GEN irq
  octeontx2-af: Fix mbox INTR handler when num VFs > 64
  net: fix use-after-free in the netdev_nl_sock_priv_destroy()
  selftests: net: use Path helpers in ping
  selftests: net: use the dummy bpf from net/lib
  selftests: drv-net: replace the rpath helper with Path objects
  net: lapbether: use netdev_lockdep_set_classes() helper
  net: phy: broadcom: Correct BCM5221 PHY model detection
  net: usb: usbnet: restore usb%d name exception for local mac addresses
  net/mlx5e: SHAMPO, Make reserved size independent of page size
  net: mana: Switch to page pool for jumbo frames
  MAINTAINERS: Add dedicated entries for phy_link_topology
  net: move replay logic to tc_modify_qdisc
  ...
2025-04-01 20:00:51 -07:00
Linus Torvalds
092e335082 Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull rdma updates from Jason Gunthorpe:

 - Usual minor updates and fixes for bnxt_re, hfi1, rxe, mana, iser,
   mlx5, vmw_pvrdma, hns

 - Make rxe work on tun devices

 - mana gains more standard verbs as it moves toward supporting
   in-kernel verbs

 - DMABUF support for mana

 - Fix page size calculations when memory registration exceeds 4G

 - On Demand Paging support for rxe

 - mlx5 support for RDMA TRANSPORT flow tables and a new ucap mechanism
   to access control use of them

 - Optional RDMA_TX/RX counters per QP in mlx5

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (73 commits)
  IB/mad: Check available slots before posting receive WRs
  RDMA/mana_ib: Fix integer overflow during queue creation
  RDMA/mlx5: Fix calculation of total invalidated pages
  RDMA/mlx5: Fix mlx5_poll_one() cur_qp update flow
  RDMA/mlx5: Fix page_size variable overflow
  RDMA/mlx5: Drop access_flags from _mlx5_mr_cache_alloc()
  RDMA/mlx5: Fix cache entry update on dereg error
  RDMA/mlx5: Fix MR cache initialization error flow
  RDMA/mlx5: Support optional-counters binding for QPs
  RDMA/mlx5: Compile fs.c regardless of INFINIBAND_USER_ACCESS config
  RDMA/core: Pass port to counter bind/unbind operations
  RDMA/core: Add support to optional-counters binding configuration
  RDMA/core: Create and destroy rdma_counter using rdma_zalloc_drv_obj()
  RDMA/mlx5: Add optional counters for RDMA_TX/RX_packets/bytes
  RDMA/core: Fix use-after-free when rename device name
  RDMA/bnxt_re: Support perf management counters
  RDMA/rxe: Fix incorrect return value of rxe_odp_atomic_op()
  RDMA/uverbs: Propagate errors from rdma_lookup_get_uobject()
  RDMA/mana_ib: Handle net event for pointing to the current netdev
  net: mana: Change the function signature of mana_get_primary_netdev_rcu
  ...
2025-03-29 11:12:28 -07:00
Haiyang Zhang
fa37a88496 net: mana: Switch to page pool for jumbo frames
Frag allocators, such as netdev_alloc_frag(), were not designed to
work for fragsz > PAGE_SIZE.

So, switch to page pool for jumbo frames instead of using page frag
allocators. This driver is using page pool for smaller MTUs already.

Cc: stable@vger.kernel.org
Fixes: 80f6215b45 ("net: mana: Add support for jumbo frame")
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: Long Li <longli@microsoft.com>
Reviewed-by: Shradha Gupta <shradhagupta@linux.microsoft.com>
Link: https://patch.msgid.link/1742920357-27263-1-git-send-email-haiyangz@microsoft.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-28 04:43:18 -07:00
Lorenzo Bianconi
c313d35f60 net: mana: Add metadata support for xdp mode
Set metadata size building the skb from xdp_buff in mana driver.
mana driver sets xdp headroom to XDP_PACKET_HEADROOM so the headroom is
large enough to contain xdp_frame and xdp metadata.
Please note this patch is just compiled tested.

Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Link: https://patch.msgid.link/20250318-mvneta-xdp-meta-v2-6-b6075778f61f@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-03-25 04:35:42 -07:00