When the PF and VF drivers both support flexible rx descriptors and have
negotiated the VIRTCHNL_VF_OFFLOAD_RX_FLEX_DESC capability, the VF driver
queries the PF for the list of supported descriptor formats
(VIRTCHNL_OP_GET_SUPPORTED_RXDIDS). The PF driver is supposed to set the
supported_rxdids bits that correspond to the descriptor formats the
firmware implements. The legacy 32-byte rx desc format is always
supported, even though it is not expressed in GLFLXP_RXDID_FLAGS.
The ice driver does not advertise the legacy 32-byte rx desc support,
which leads to this failure to bring up the VF using the Intel
out-of-tree iavf driver:
iavf 0000:41:01.0: PF does not list support for default Rx descriptor format
...
iavf 0000:41:01.0: PF returned error -5 (VIRTCHNL_STATUS_ERR_PARAM) to our request 6
The in-tree iavf driver does not expose this bug, because it does not
yet implement VIRTCHNL_VF_OFFLOAD_RX_FLEX_DESC.
The ice driver must always set the ICE_RXDID_LEGACY_1 bit in
supported_rxdids. The Intel out-of-tree ice driver and the ice driver in
DPDK both do this.
I copied this piece of the code and the comment text from the Intel
out-of-tree driver.
Fixes: e753df8fbc ("ice: Add support Flex RXD")
Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Link: https://lore.kernel.org/r/20230920115439.61172-1-mschmidt@redhat.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
When users attempt to obtain the coalesce setting using the
ethtool command, current code always returns 0 for tx-usecs.
This is because I225/6 always uses a queue pair setting, hence
tx_coalesce_usecs does not return a value during the
igc_ethtool_get_coalesce() callback process. The pair queue
condition checking in igc_ethtool_get_coalesce() is removed by
this patch so that the user gets information of the value of tx-usecs.
Even if i225/6 is using queue pair setting, there is no harm in
notifying the user of the tx-usecs. The implementation of the current
code may have previously been a copy of the legacy code i210.
Since I225 has the queue pair setting enabled, tx-usecs will always adhere
to the user-set rx-usecs value. An error message will appear when the user
attempts to set the tx-usecs value for the input parameters because,
by default, they should only set the rx-usecs value.
This patch also adds the helper function to get the
previous rx coalesce value similar to tx coalesce.
How to test:
User can get the coalesce value using ethtool command.
Example command:
Get: ethtool -c <interface>
Previous output:
rx-usecs: 3
rx-frames: n/a
rx-usecs-irq: n/a
rx-frames-irq: n/a
tx-usecs: 0
tx-frames: n/a
tx-usecs-irq: n/a
tx-frames-irq: n/a
New output:
rx-usecs: 3
rx-frames: n/a
rx-usecs-irq: n/a
rx-frames-irq: n/a
tx-usecs: 3
tx-frames: n/a
tx-usecs-irq: n/a
tx-frames-irq: n/a
Fixes: 8c5ad0dae9 ("igc: Add ethtool support")
Signed-off-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Link: https://lore.kernel.org/r/20230919170331.1581031-1-anthony.l.nguyen@intel.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
queue
Tony Nguyen says:
====================
This series contains updates to iavf and i40e drivers.
Radoslaw prevents admin queue operations being added when the driver is
being removed for iavf.
Petr Oros immediately starts reconfiguration on changes to VLANs on
iavf.
Ivan Vecera moves reset of VF to occur after port VLAN values are set
on i40e.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
When an XDP redirect happens before the link is ready, that
transmission will not finish and will timeout, causing an adapter
reset. If the redirects do not stop, the adapter will not stop
resetting.
Wait for the driver to signal that there's a carrier before allowing
transmissions to proceed.
Previous code was relying that when __IGC_DOWN is cleared, the NIC is
ready to transmit as all the queues are ready, what happens is that
the carrier presence will only be signaled later, after the watchdog
workqueue has a chance to run. And during this interval (between
clearing __IGC_DOWN and the watchdog running) if any transmission
happens the timeout is emitted (detected by igc_tx_timeout()) which
causes the reset, with the potential for the infinite loop.
Fixes: 4ff3203610 ("igc: Add support for XDP_REDIRECT action")
Reported-by: Ferenc Fejes <ferenc.fejes@ericsson.com>
Closes: https://lore.kernel.org/netdev/0caf33cf6adb3a5bf137eeaa20e89b167c9986d5.camel@ericsson.com/
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Tested-by: Ferenc Fejes <ferenc.fejes@ericsson.com>
Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If port VLAN is configured on a VF then any other VLANs on top of this VF
are broken.
During i40e_ndo_set_vf_port_vlan() call the i40e driver reset the VF and
iavf driver asks PF (using VIRTCHNL_OP_GET_VF_RESOURCES) for VF capabilities
but this reset occurs too early, prior setting of vf->info.pvid field
and because this field can be zero during i40e_vc_get_vf_resources_msg()
then VIRTCHNL_VF_OFFLOAD_VLAN capability is reported to iavf driver.
This is wrong because iavf driver should not report VLAN offloading
capability when port VLAN is configured as i40e does not support QinQ
offloading.
Fix the issue by moving VF reset after setting of vf->port_vlan_id
field.
Without this patch:
$ echo 1 > /sys/class/net/enp2s0f0/device/sriov_numvfs
$ ip link set enp2s0f0 vf 0 vlan 3
$ ip link set enp2s0f0v0 up
$ ip link add link enp2s0f0v0 name vlan4 type vlan id 4
$ ip link set vlan4 up
...
$ ethtool -k enp2s0f0v0 | grep vlan-offload
rx-vlan-offload: on
tx-vlan-offload: on
$ dmesg -l err | grep iavf
[1292500.742914] iavf 0000:02:02.0: Failed to add VLAN filter, error IAVF_ERR_INVALID_QP_ID
With this patch:
$ echo 1 > /sys/class/net/enp2s0f0/device/sriov_numvfs
$ ip link set enp2s0f0 vf 0 vlan 3
$ ip link set enp2s0f0v0 up
$ ip link add link enp2s0f0v0 name vlan4 type vlan id 4
$ ip link set vlan4 up
...
$ ethtool -k enp2s0f0v0 | grep vlan-offload
rx-vlan-offload: off [requested on]
tx-vlan-offload: off [requested on]
$ dmesg -l err | grep iavf
Fixes: f9b4b6278d ("i40e: Reset the VF upon conflicting VLAN configuration")
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
When the iavf driver wants to reconfigure the VLAN filters
(iavf_add_vlan, iavf_del_vlan), it sets a flag in
aq_required:
adapter->aq_required |= IAVF_FLAG_AQ_ADD_VLAN_FILTER;
or:
adapter->aq_required |= IAVF_FLAG_AQ_DEL_VLAN_FILTER;
This is later processed by the watchdog_task, but it runs periodically
every 2 seconds, so it can be a long time before it processes the request.
In the worst case, the interface is unable to receive traffic for more
than 2 seconds for no objective reason.
Fixes: 5eae00c57f ("i40evf: main driver core")
Signed-off-by: Petr Oros <poros@redhat.com>
Co-developed-by: Michal Schmidt <mschmidt@redhat.com>
Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Co-developed-by: Ivan Vecera <ivecera@redhat.com>
Signed-off-by: Ivan Vecera <ivecera@redhat.com>
Reviewed-by: Ahmed Zaki <ahmed.zaki@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Prevent schedule operations for adminq during device remove and when
__IAVF_IN_REMOVE_TASK flag is set. Currently, the iavf_down function
adds operations for adminq that shouldn't be processed when the device
is in the __IAVF_REMOVE state.
Reproduction:
echo 4 > /sys/bus/pci/devices/0000:17:00.0/sriov_numvfs
ip link set dev ens1f0 vf 0 trust on
ip link set dev ens1f0 vf 1 trust on
ip link set dev ens1f0 vf 2 trust on
ip link set dev ens1f0 vf 3 trust on
ip link set dev ens1f0 vf 0 mac 00:22:33:44:55:66
ip link set dev ens1f0 vf 1 mac 00:22:33:44:55:67
ip link set dev ens1f0 vf 2 mac 00:22:33:44:55:68
ip link set dev ens1f0 vf 3 mac 00:22:33:44:55:69
echo 0000:17:02.0 > /sys/bus/pci/devices/0000\:17\:02.0/driver/unbind
echo 0000:17:02.1 > /sys/bus/pci/devices/0000\:17\:02.1/driver/unbind
echo 0000:17:02.2 > /sys/bus/pci/devices/0000\:17\:02.2/driver/unbind
echo 0000:17:02.3 > /sys/bus/pci/devices/0000\:17\:02.3/driver/unbind
sleep 10
echo 0000:17:02.0 > /sys/bus/pci/drivers/iavf/bind
echo 0000:17:02.1 > /sys/bus/pci/drivers/iavf/bind
echo 0000:17:02.2 > /sys/bus/pci/drivers/iavf/bind
echo 0000:17:02.3 > /sys/bus/pci/drivers/iavf/bind
modprobe vfio-pci
echo 8086 154c > /sys/bus/pci/drivers/vfio-pci/new_id
qemu-system-x86_64 -accel kvm -m 4096 -cpu host \
-drive file=centos9.qcow2,if=none,id=virtio-disk0 \
-device virtio-blk-pci,drive=virtio-disk0,bootindex=0 -smp 4 \
-device vfio-pci,host=17:02.0 -net none \
-device vfio-pci,host=17:02.1 -net none \
-device vfio-pci,host=17:02.2 -net none \
-device vfio-pci,host=17:02.3 -net none \
-daemonize -vnc :5
Current result:
There is a probability that the mac of VF in guest is inconsistent with
it in host
Expected result:
When passthrough NIC VF to guest, the VF in guest should always get
the same mac as it in host.
Fixes: 14756b2ae2 ("iavf: Fix __IAVF_RESETTING state usage")
Signed-off-by: Radoslaw Tyl <radoslawx.tyl@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
After commit 50f303496d ("igb: Enable SR-IOV after reinit"), removing
the igb module could hang or crash (depending on the machine) when the
module has been loaded with the max_vfs parameter set to some value != 0.
In case of one test machine with a dual port 82580, this hang occurred:
[ 232.480687] igb 0000:41:00.1: removed PHC on enp65s0f1
[ 233.093257] igb 0000:41:00.1: IOV Disabled
[ 233.329969] pcieport 0000:40:01.0: AER: Multiple Uncorrected (Non-Fatal) err0
[ 233.340302] igb 0000:41:00.0: PCIe Bus Error: severity=Uncorrected (Non-Fata)
[ 233.352248] igb 0000:41:00.0: device [8086:1516] error status/mask=00100000
[ 233.361088] igb 0000:41:00.0: [20] UnsupReq (First)
[ 233.368183] igb 0000:41:00.0: AER: TLP Header: 40000001 0000040f cdbfc00c c
[ 233.376846] igb 0000:41:00.1: PCIe Bus Error: severity=Uncorrected (Non-Fata)
[ 233.388779] igb 0000:41:00.1: device [8086:1516] error status/mask=00100000
[ 233.397629] igb 0000:41:00.1: [20] UnsupReq (First)
[ 233.404736] igb 0000:41:00.1: AER: TLP Header: 40000001 0000040f cdbfc00c c
[ 233.538214] pci 0000:41:00.1: AER: can't recover (no error_detected callback)
[ 233.538401] igb 0000:41:00.0: removed PHC on enp65s0f0
[ 233.546197] pcieport 0000:40:01.0: AER: device recovery failed
[ 234.157244] igb 0000:41:00.0: IOV Disabled
[ 371.619705] INFO: task irq/35-aerdrv:257 blocked for more than 122 seconds.
[ 371.627489] Not tainted 6.4.0-dirty #2
[ 371.632257] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this.
[ 371.641000] task:irq/35-aerdrv state:D stack:0 pid:257 ppid:2 f0
[ 371.650330] Call Trace:
[ 371.653061] <TASK>
[ 371.655407] __schedule+0x20e/0x660
[ 371.659313] schedule+0x5a/0xd0
[ 371.662824] schedule_preempt_disabled+0x11/0x20
[ 371.667983] __mutex_lock.constprop.0+0x372/0x6c0
[ 371.673237] ? __pfx_aer_root_reset+0x10/0x10
[ 371.678105] report_error_detected+0x25/0x1c0
[ 371.682974] ? __pfx_report_normal_detected+0x10/0x10
[ 371.688618] pci_walk_bus+0x72/0x90
[ 371.692519] pcie_do_recovery+0xb2/0x330
[ 371.696899] aer_process_err_devices+0x117/0x170
[ 371.702055] aer_isr+0x1c0/0x1e0
[ 371.705661] ? __set_cpus_allowed_ptr+0x54/0xa0
[ 371.710723] ? __pfx_irq_thread_fn+0x10/0x10
[ 371.715496] irq_thread_fn+0x20/0x60
[ 371.719491] irq_thread+0xe6/0x1b0
[ 371.723291] ? __pfx_irq_thread_dtor+0x10/0x10
[ 371.728255] ? __pfx_irq_thread+0x10/0x10
[ 371.732731] kthread+0xe2/0x110
[ 371.736243] ? __pfx_kthread+0x10/0x10
[ 371.740430] ret_from_fork+0x2c/0x50
[ 371.744428] </TASK>
The reproducer was a simple script:
#!/bin/sh
for i in `seq 1 5`; do
modprobe -rv igb
modprobe -v igb max_vfs=1
sleep 1
modprobe -rv igb
done
It turned out that this could only be reproduce on 82580 (quad and
dual-port), but not on 82576, i350 and i210. Further debugging showed
that igb_enable_sriov()'s call to pci_enable_sriov() is failing, because
dev->is_physfn is 0 on 82580.
Prior to commit 50f303496d ("igb: Enable SR-IOV after reinit"),
igb_enable_sriov() jumped into the "err_out" cleanup branch. After this
commit it only returned the error code.
So the cleanup didn't take place, and the incorrect VF setup in the
igb_adapter structure fooled the igb driver into assuming that VFs have
been set up where no VF actually existed.
Fix this problem by cleaning up again if pci_enable_sriov() fails.
Fixes: 50f303496d ("igb: Enable SR-IOV after reinit")
Signed-off-by: Corinna Vinschen <vinschen@redhat.com>
Reviewed-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The commit in fixes introduced flags to control the status of hardware
configuration while processing packets. At the same time another structure
is used to provide configuration of timestamper to user-space applications.
The way it was coded makes this structures go out of sync easily. The
repro is easy for 82599 chips:
[root@hostname ~]# hwstamp_ctl -i eth0 -r 12 -t 1
current settings:
tx_type 0
rx_filter 0
new settings:
tx_type 1
rx_filter 12
The eth0 device is properly configured to timestamp any PTPv2 events.
[root@hostname ~]# hwstamp_ctl -i eth0 -r 1 -t 1
current settings:
tx_type 1
rx_filter 12
SIOCSHWTSTAMP failed: Numerical result out of range
The requested time stamping mode is not supported by the hardware.
The error is properly returned because HW doesn't support all packets
timestamping. But the adapter->flags is cleared of timestamp flags
even though no HW configuration was done. From that point no RX timestamps
are received by user-space application. But configuration shows good
values:
[root@hostname ~]# hwstamp_ctl -i eth0
current settings:
tx_type 1
rx_filter 12
Fix the issue by applying new flags only when the HW was actually
configured.
Fixes: a9763f3cb5 ("ixgbe: Update PTP to support X550EM_x devices")
Signed-off-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change the minimum value of RX/TX descriptors to 64 to enable setting the rx/tx
value between 64 and 80. All igb devices can use as low as 64 descriptors.
This change will unify igb with other drivers.
Based on commit 7b1be1987c ("e1000e: lower ring minimum size to 64")
Fixes: 9d5c824399 ("igb: PCI-Express 82575 Gigabit Ethernet driver")
Signed-off-by: Olga Zaborska <olga.zaborska@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Change the minimum value of RX/TX descriptors to 64 to enable setting the rx/tx
value between 64 and 80. All igbvf devices can use as low as 64 descriptors.
This change will unify igbvf with other drivers.
Based on commit 7b1be1987c ("e1000e: lower ring minimum size to 64")
Fixes: d4e0fe01a3 ("igbvf: add new driver to support 82576 virtual functions")
Signed-off-by: Olga Zaborska <olga.zaborska@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Change the minimum value of RX/TX descriptors to 64 to enable setting the rx/tx
value between 64 and 80. All igc devices can use as low as 64 descriptors.
This change will unify igc with other drivers.
Based on commit 7b1be1987c ("e1000e: lower ring minimum size to 64")
Fixes: 0507ef8a03 ("igc: Add transmit and receive fastpath and interrupt handlers")
Signed-off-by: Olga Zaborska <olga.zaborska@intel.com>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Disable virtualization features on 82580 just as on i210/i211.
This avoids that virt functions are acidentally called on 82850.
Fixes: 55cac248ca ("igb: Add full support for 82580 devices")
Signed-off-by: Corinna Vinschen <vinschen@redhat.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
The ice hardware has a synchronization mechanism used to drive the
simultaneous application of commands on both PHY ports and the source timer
in the MAC.
When issuing a sync via ice_ptp_exec_tmr_cmd(), the hardware will
simultaneously apply the commands programmed for the main timer and each
PHY port. Neither the main timer command register, nor the PHY port command
registers auto clear on command execution.
During the execution of a timer command intended for a single port on E822
devices, such as those used to configure a PHY during link up, the driver
is not correctly clearing the previous commands.
This results in unintentionally executing the last programmed command on
the main timer and other PHY ports whenever performing reconfiguration on
E822 ports after link up. This results in unintended side effects on other
timers, depending on what command was previously programmed.
To fix this, the driver must ensure that the main timer and all other PHY
ports are properly initialized to perform no action.
The enumeration for timer commands does not include an enumeration value
for doing nothing. Introduce ICE_PTP_NOP for this purpose. When writing a
timer command to hardware, leave the command bits set to zero which
indicates that no operation should be performed on that port.
Modify ice_ptp_one_port_cmd() to always initialize all ports. For all ports
other than the one being configured, write their timer command register to
ICE_PTP_NOP. This ensures that no side effect happens on the timer command.
To fix this for the PHY ports, modify ice_ptp_one_port_cmd() to always
initialize all other ports to ICE_PTP_NOP. This ensures that no side
effects happen on the other ports.
Call ice_ptp_src_cmd() with a command value if ICE_PTP_NOP in
ice_sync_phy_timer_e822() and ice_start_phy_timer_e822().
With both of these changes, the driver should no longer execute a stale
command on the main timer or another PHY port when reconfiguring one of the
PHY ports after link up.
Fixes: 3a7496234d ("ice: implement basic E822 PTP support")
Signed-off-by: Siddaraju DH <siddaraju.dh@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Sunitha Mekala <sunithax.d.mekala@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add devices IDs for the next LOM generations that will be available on the
next Intel Client platforms. This patch provides the initial support for
these devices.
Signed-off-by: Sasha Neftin <sasha.neftin@intel.com>
Tested-by: Naama Meir <naamax.meir@linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Add check for pf->vf not being NULL before dereferencing
pf->vf[vsi->vf_id] in updating VSI filter sync.
Add a similar check before dereferencing !pf->vf[vsi->vf_id].trusted
in the condition for clearing promisc mode bit.
Fixes: c87c938f62 ("i40e: Add VF VLAN pruning")
Signed-off-by: Andrii Staikov <andrii.staikov@intel.com>
Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
During stress test with attaching and detaching VF from KVM and
simultaneously changing VFs spoofcheck and trust there was a
NULL pointer dereference in ice_reset_vf that VF's VSI is null.
More than one instance of ice_reset_vf() can be running at a given
time. When we rebuild the VSI in ice_reset_vf, another reset can be
triaged from ice_service_task. In this case we can access the currently
uninitialized VSI and cause panic. The window for this racing condition
has been around for a long time but it's much worse after commit
227bf4500a ("ice: move VSI delete outside deconfig") because
the reset runs faster. ice_reset_vf() using vf->cfg_lock and when
we move this lock before accessing to the VF VSI, we can fix
BUG for all cases.
Panic occurs sometimes in ice_vsi_is_rx_queue_active() and sometimes
in ice_vsi_stop_all_rx_rings()
With our reproducer, we can hit BUG:
~8h before commit 227bf4500a ("ice: move VSI delete outside deconfig").
~20m after commit 227bf4500a ("ice: move VSI delete outside deconfig").
After this fix we are not able to reproduce it after ~48h
There was commit cf90b74341 ("ice: Fix call trace with null VSI during
VF reset") which also tried to fix this issue, but it was only
partially resolved and the bug still exists.
[ 6420.658415] BUG: kernel NULL pointer dereference, address: 0000000000000000
[ 6420.665382] #PF: supervisor read access in kernel mode
[ 6420.670521] #PF: error_code(0x0000) - not-present page
[ 6420.675659] PGD 0
[ 6420.677679] Oops: 0000 [#1] PREEMPT SMP NOPTI
[ 6420.682038] CPU: 53 PID: 326472 Comm: kworker/53:0 Kdump: loaded Not tainted 5.14.0-317.el9.x86_64 #1
[ 6420.691250] Hardware name: Dell Inc. PowerEdge R750/04V528, BIOS 1.6.5 04/15/2022
[ 6420.698729] Workqueue: ice ice_service_task [ice]
[ 6420.703462] RIP: 0010:ice_vsi_is_rx_queue_active+0x2d/0x60 [ice]
[ 6420.705860] ice 0000:ca:00.0: VF 0 is now untrusted
[ 6420.709494] Code: 00 00 66 83 bf 76 04 00 00 00 48 8b 77 10 74 3e 31 c0 eb 0f 0f b7 97 76 04 00 00 48 83 c0 01 39 c2 7e 2b 48 8b 97 68 04 00 00 <0f> b7 0c 42 48 8b 96 20 13 00 00 48 8d 94 8a 00 00 12 00 8b 12 83
[ 6420.714426] ice 0000:ca:00.0 ens7f0: Setting MAC 22:22:22:22:22:00 on VF 0. VF driver will be reinitialized
[ 6420.733120] RSP: 0018:ff778d2ff383fdd8 EFLAGS: 00010246
[ 6420.733123] RAX: 0000000000000000 RBX: ff2acf1916294000 RCX: 0000000000000000
[ 6420.733125] RDX: 0000000000000000 RSI: ff2acf1f2c6401a0 RDI: ff2acf1a27301828
[ 6420.762346] RBP: ff2acf1a27301828 R08: 0000000000000010 R09: 0000000000001000
[ 6420.769476] R10: ff2acf1916286000 R11: 00000000019eba3f R12: ff2acf19066460d0
[ 6420.776611] R13: ff2acf1f2c6401a0 R14: ff2acf1f2c6401a0 R15: 00000000ffffffff
[ 6420.783742] FS: 0000000000000000(0000) GS:ff2acf28ffa80000(0000) knlGS:0000000000000000
[ 6420.791829] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 6420.797575] CR2: 0000000000000000 CR3: 00000016ad410003 CR4: 0000000000773ee0
[ 6420.804708] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 6420.811034] vfio-pci 0000:ca:01.0: enabling device (0000 -> 0002)
[ 6420.811840] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 6420.811841] PKRU: 55555554
[ 6420.811842] Call Trace:
[ 6420.811843] <TASK>
[ 6420.811844] ice_reset_vf+0x9a/0x450 [ice]
[ 6420.811876] ice_process_vflr_event+0x8f/0xc0 [ice]
[ 6420.841343] ice_service_task+0x23b/0x600 [ice]
[ 6420.845884] ? __schedule+0x212/0x550
[ 6420.849550] process_one_work+0x1e2/0x3b0
[ 6420.853563] ? rescuer_thread+0x390/0x390
[ 6420.857577] worker_thread+0x50/0x3a0
[ 6420.861242] ? rescuer_thread+0x390/0x390
[ 6420.865253] kthread+0xdd/0x100
[ 6420.868400] ? kthread_complete_and_exit+0x20/0x20
[ 6420.873194] ret_from_fork+0x1f/0x30
[ 6420.876774] </TASK>
[ 6420.878967] Modules linked in: vfio_pci vfio_pci_core vfio_iommu_type1 vfio iavf vhost_net vhost vhost_iotlb tap tun xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 nft_compat nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_counter nf_tables bridge stp llc sctp ip6_udp_tunnel udp_tunnel nfp tls nfnetlink bluetooth mlx4_en mlx4_core rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace fscache netfs rfkill sunrpc intel_rapl_msr intel_rapl_common i10nm_edac nfit libnvdimm ipmi_ssif x86_pkg_temp_thermal intel_powerclamp coretemp irdma kvm_intel i40e kvm iTCO_wdt dcdbas ib_uverbs irqbypass iTCO_vendor_support mgag200 mei_me ib_core dell_smbios isst_if_mmio isst_if_mbox_pci rapl i2c_algo_bit drm_shmem_helper intel_cstate drm_kms_helper syscopyarea sysfillrect isst_if_common sysimgblt intel_uncore fb_sys_fops dell_wmi_descriptor wmi_bmof intel_vsec mei i2c_i801 acpi_ipmi ipmi_si i2c_smbus ipmi_devintf intel_pch_thermal acpi_power_meter pcspk
r
Fixes: efe4186000 ("ice: Fix memory corruption in VF driver")
Fixes: f23df5220d ("ice: Fix spurious interrupt during removal of trusted VF")
Signed-off-by: Petr Oros <poros@redhat.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
This reverts commit 7255355a06.
After this commit we are not able to attach VF to VM:
virsh attach-interface v0 hostdev --managed 0000:41:01.0 --mac 52:52:52:52:52:52
error: Failed to attach interface
error: Cannot set interface MAC to 52:52:52:52:52:52 for ifname enp65s0f0np0 vf 0: Resource temporarily unavailable
ice_check_vf_ready_for_cfg() already contain waiting for reset.
New condition in ice_check_vf_ready_for_reset() causing only problems.
Fixes: 7255355a06 ("ice: Fix ice VF reset during iavf initialization")
Signed-off-by: Petr Oros <poros@redhat.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The driver is misconfiguring the hardware for some values of MTU such that
it could use multiple descriptors to receive a packet when it could have
simply used one.
Change the driver to use a round-up instead of the result of a shift, as
the shift can truncate the lower bits of the size, and result in the
problem noted above. It also aligns this driver with similar code in i40e.
The insidiousness of this problem is that everything works with the wrong
size, it's just not working as well as it could, as some MTU sizes end up
using two or more descriptors, and there is no way to tell that is
happening without looking at ice_trace or a bus analyzer.
Fixes: efc2214b60 ("ice: Add support for XDP")
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2023-08-17 (ice)
This series contains updates to ice driver only.
Jan removes unused functions and refactors code to make, possible,
functions static.
Jake rearranges some functions to be logically grouped.
Marcin removes an unnecessary call to disable VLAN stripping.
Yang Yingliang utilizes list_for_each_entry() helper for a couple list
traversals.
Przemek removes some parameters from ice_aq_alloc_free_res() which were
always the same and reworks ice_aq_wait_for_event() to reduce chance of
race.
* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
ice: split ice_aq_wait_for_event() func into two
ice: embed &ice_rq_event_info event into struct ice_aq_task
ice: ice_aq_check_events: fix off-by-one check when filling buffer
ice: drop two params from ice_aq_alloc_free_res()
ice: use list_for_each_entry() helper
ice: Remove redundant VSI configuration in eswitch setup
ice: move E810T functions to before device agnostic ones
ice: refactor ice_vsi_is_vlan_pruning_ena
ice: refactor ice_ptp_hw to make functions static
ice: refactor ice_sched to make functions static
ice: Utilize assign_bit() helper
ice: refactor ice_vf_lib to make functions static
ice: refactor ice_lib to make functions static
ice: refactor ice_ddp to make functions static
ice: remove unused methods
====================
Link: https://lore.kernel.org/r/20230817212239.2601543-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Tony Nguyen says:
====================
virtchnl: fix fake 1-elem arrays
Alexander Lobakin says:
6.5-rc1 started spitting warning splats when composing virtchnl
messages, precisely on virtchnl_rss_key and virtchnl_lut:
[ 84.167709] memcpy: detected field-spanning write (size 52) of single
field "vrk->key" at drivers/net/ethernet/intel/iavf/iavf_virtchnl.c:1095
(size 1)
[ 84.169915] WARNING: CPU: 3 PID: 11 at drivers/net/ethernet/intel/
iavf/iavf_virtchnl.c:1095 iavf_set_rss_key+0x123/0x140 [iavf]
...
[ 84.191982] Call Trace:
[ 84.192439] <TASK>
[ 84.192900] ? __warn+0xc9/0x1a0
[ 84.193353] ? iavf_set_rss_key+0x123/0x140 [iavf]
[ 84.193818] ? report_bug+0x12c/0x1b0
[ 84.194266] ? handle_bug+0x42/0x70
[ 84.194714] ? exc_invalid_op+0x1a/0x50
[ 84.195149] ? asm_exc_invalid_op+0x1a/0x20
[ 84.195592] ? iavf_set_rss_key+0x123/0x140 [iavf]
[ 84.196033] iavf_watchdog_task+0xb0c/0xe00 [iavf]
...
[ 84.225476] memcpy: detected field-spanning write (size 64) of single
field "vrl->lut" at drivers/net/ethernet/intel/iavf/iavf_virtchnl.c:1127
(size 1)
[ 84.227190] WARNING: CPU: 27 PID: 1044 at drivers/net/ethernet/intel/
iavf/iavf_virtchnl.c:1127 iavf_set_rss_lut+0x123/0x140 [iavf]
...
[ 84.246601] Call Trace:
[ 84.247228] <TASK>
[ 84.247840] ? __warn+0xc9/0x1a0
[ 84.248263] ? iavf_set_rss_lut+0x123/0x140 [iavf]
[ 84.248698] ? report_bug+0x12c/0x1b0
[ 84.249122] ? handle_bug+0x42/0x70
[ 84.249549] ? exc_invalid_op+0x1a/0x50
[ 84.249970] ? asm_exc_invalid_op+0x1a/0x20
[ 84.250390] ? iavf_set_rss_lut+0x123/0x140 [iavf]
[ 84.250820] iavf_watchdog_task+0xb16/0xe00 [iavf]
Gustavo already tried to fix those back in 2021[0][1]. Unfortunately,
a VM can run a different kernel than the host, meaning that those
structures are sorta ABI.
However, it is possible to have proper flex arrays + struct_size()
calculations and still send the very same messages with the same sizes.
The common rule is:
elem[1] -> elem[]
size = struct_size() + <difference between the old and the new msg size>
The "old" size in the current code is calculated 3 different ways for
10 virtchnl structures total. Each commit addresses one of the ways
cumulatively instead of per-structure.
I was planning to send it to -net initially, but given that virtchnl was
renamed from i40evf and got some fat style cleanup commits in the past,
it's not very straightforward to even pick appropriate SHAs, not
speaking of automatic portability. I may send manual backports for
a couple of the latest supported kernels later on if anyone needs it
at all.
[0] https://lore.kernel.org/all/20210525230912.GA175802@embeddedor
[1] https://lore.kernel.org/all/20210525231851.GA176647@embeddedor
* '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue:
virtchnl: fix fake 1-elem arrays for structures allocated as `nents`
virtchnl: fix fake 1-elem arrays in structures allocated as `nents + 1`
virtchnl: fix fake 1-elem arrays in structs allocated as `nents + 1` - 1
====================
Link: https://lore.kernel.org/r/20230816210657.1326772-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2023-08-16 (iavf, i40e)
This series contains updates to iavf and i40e drivers.
Piotr adds checks for unsupported Flow Director rules on iavf.
Andrii replaces incorrect 'write' messaging on read operations for i40e.
* '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
i40e: fix misleading debug logs
iavf: fix FDIR rule fields masks validation
====================
Link: https://lore.kernel.org/r/20230816193308.1307535-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Mitigate race between registering on wait list and receiving
AQ Response from FW.
ice_aq_prep_for_event() should be called before sending AQ command,
ice_aq_wait_for_event() should be called after sending AQ command,
to wait for AQ Response.
Please note, that this was found by reading the code,
an actual race has not yet materialized.
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Expose struct ice_aq_task to callers,
what takes burden of memory ownership out from AQ-wait family of functions,
and reduces need for heap-based allocations.
Embed struct ice_rq_event_info event into struct ice_aq_task
(instead of it being a ptr) to remove some more code from the callers.
Subsequent commit will improve more based on this one.
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Drop @num_entries and @cd params, latter of which was always NULL.
Number of entities to alloc is passed in internal buffer, the outer layer
(that @num_entries was assigned to) meaning is closer to "the number of
requests", which was =1 in all cases.
ice_free_hw_res() was always called with 1 as its @num arg.
Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Commit 885fe6932a ("ice: Add support for SMA control multiplexer")
accidentally placed all of the E810T SMA control functions in the middle of
the device agnostic functions section of ice_ptp_hw.c
This works fine, but makes it harder for readers to follow. The
ice_ptp_hw.c file is laid out such that each hardware family has the
specific functions in one block, with the access functions placed at the
end of the file.
Move the E810T functions so that they are in a block just after the E810
functions. Also move the ice_get_phy_tx_tstamp_ready_e810 which got added
at the end of the E810T block.
This keeps the functions laid out in a logical order and avoids intermixing
the generic access functions with the device specific implementations.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Tested-by: Arpana Arland <arpanax.arland@intel.com> (A Contingent worker at Intel)
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The if/else check for bit setting can be replaced by using the
assign_bit() helper so do so.
Suggested-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
As following methods are not used outside ice_vf_lib,
they can be made static:
ice_vf_rebuild_host_vlan_cfg
ice_vf_rebuild_host_tx_rate_cfg
ice_vf_set_host_trust_cfg
ice_vf_rebuild_host_mac_cfg
ice_vf_rebuild_aggregator_node_cfg
ice_vf_rebuild_host_cfg
ice_set_vf_state_qs_dis
ice_vf_set_initialized
In order to achieve that, the order in which these
were defined was reorganized.
Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Finally, fix 3 structures which are allocated technically correctly,
i.e. the calculated size equals to the one that struct_size() would
return, except for sizeof(). For &virtchnl_vlan_filter_list_v2, use
the same approach when there are no enough space as taken previously
for &virtchnl_vlan_filter_list, i.e. let the maximum size be calculated
automatically instead of trying to guestimate it using maths.
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
There are five virtchnl structures, which are allocated and checked in
the code as `nents + 1`, meaning that they always have memory for one
excessive element regardless of their actual number. This comes from
that their sizeof() includes space for 1 element and then they get
allocated via struct_size() or its open-coded equivalents, passing
the actual number of elements.
Expand virtchnl_struct_size() to handle such structures and replace
those 1-elem arrays with proper flex ones. Also fix several places
which open-code %IAVF_VIRTCHNL_VF_RESOURCE_SIZE. Finally, let the
virtchnl_ether_addr_list size be computed automatically when there's
no enough space for the whole list, otherwise we have to open-code
reverse struct_size() logics.
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
The two most problematic virtchnl structures are virtchnl_rss_key and
virtchnl_rss_lut. Their "flex" arrays have the type of u8, thus, when
allocating / checking, the actual size is calculated as `sizeof +
nents - 1 byte`. But their sizeof() is not 1 byte larger than the size
of such structure with proper flex array, it's two bytes larger due to
the padding. That said, their size is always 1 byte larger unless
there are no tail elements -- then it's +2 bytes.
Add virtchnl_struct_size() macro which will handle this case (and later
other cases as well). Make its calling conv the same as we call
struct_size() to allow it to be drop-in, even though it's unlikely to
become possible to switch to generic API. The macro will calculate a
proper size of a structure with a flex array at the end, so that it
becomes transparent for the compilers, but add the difference from the
old values, so that the real size of sorta-ABI-messages doesn't change.
Use it on the allocation side in IAVF and the receiving side (defined
as static inline in virtchnl.h) for the mentioned two structures.
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Tested-by: Rafal Romanowski <rafal.romanowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>