When a remote device sends a completion event to the host, it contains a
pointer to the consumed TRE. The host uses this pointer to process all of
the TREs between it and the host's local copy of the ring's read pointer.
This works when processing completion for chained transactions, but can
lead to nasty results if the device sends an event for a single-element
transaction with a read pointer that is multiple elements ahead of the
host's read pointer.
For instance, if the host accesses an event ring while the device is
updating it, the pointer inside of the event might still point to an old
TRE. If the host uses the channel's xfer_cb() to directly free the buffer
pointed to by the TRE, the buffer will be double-freed.
This behavior was observed on an ep that used upstream EP stack without
'commit 6f18d174b7 ("bus: mhi: ep: Update read pointer only after buffer
is written")'. Where the device updated the events ring pointer before
updating the event contents, so it left a window where the host was able to
access the stale data the event pointed to, before the device had the
chance to update them. The usual pattern was that the host received an
event pointing to a TRE that is not immediately after the last processed
one, so it got treated as if it was a chained transaction, processing all
of the TREs in between the two read pointers.
This commit aims to harden the host by ensuring transactions where the
event points to a TRE that isn't local_rp + 1 are chained.
Fixes: 1d3173a3ba ("bus: mhi: core: Add support for processing events from client device")
Signed-off-by: Youssef Samir <quic_yabdulra@quicinc.com>
[mani: added stable tag and reworded commit message]
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Reviewed-by: Jeff Hugo <jeff.hugo@oss.qualcomm.com>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20250714163039.3438985-1-quic_yabdulra@quicinc.com
T99W696 modem is based on Qualcomm SDX61 chipset, which is an economic
version compared to the baseline SDX62/SDX65 chipsets. Add support for it
by introducing a new 'mhi_channel_config'. Since this modem supports the
NMEA channel, a new config is needed.
Signed-off-by: Slark Xiao <slark_xiao@163.com>
[mani: reworded commit message]
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Link: https://patch.msgid.link/20250528092232.16111-1-slark_xiao@163.com
Add MHI controller config for EM929x. It uses the same configuration
as EM919x. Also set the MRU to 32768 to improve downlink throughput.
02:00.0 Unassigned class [ff00]: Qualcomm Technologies, Inc Device 0308
Subsystem: Device 18d7:0301
Signed-off-by: Adam Xue <zxue@semtech.com>
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Link: https://patch.msgid.link/20250528175943.12739-1-zxue@semtech.com
On big endian platform like PowerPC, the MHI bus (which is little endian)
does not start properly. The following example shows the error messages by
using QCN9274 WLAN device with ath12k driver:
ath12k_pci 0001:01:00.0: BAR 0: assigned [mem 0xc00000000-0xc001fffff 64bit]
ath12k_pci 0001:01:00.0: MSI vectors: 1
ath12k_pci 0001:01:00.0: Hardware name: qcn9274 hw2.0
ath12k_pci 0001:01:00.0: failed to set mhi state: POWER_ON(2)
ath12k_pci 0001:01:00.0: failed to start mhi: -110
ath12k_pci 0001:01:00.0: failed to power up :-110
ath12k_pci 0001:01:00.0: failed to create soc core: -110
ath12k_pci 0001:01:00.0: failed to init core: -110
ath12k_pci: probe of 0001:01:00.0 failed with error -110
The issue seems to be with the incorrect DMA address/size used for
transferring the firmware image over BHI. So fix it by converting the DMA
address and size of the BHI vector table to little endian format before
sending them to the device.
Fixes: 6cd330ae76 ("bus: mhi: core: Add support for ringing channel/event ring doorbells")
Signed-off-by: Alexander Wilhelm <alexander.wilhelm@westermo.com>
[mani: added stable tag and reworded commit message]
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Reviewed-by: Jeff Hugo <jeff.hugo@oss.qualcomm.com>
Reviewed-by: Krishna Chaitanya Chundru <krishna.chundru@oss.qualcomm.com>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20250519145837.958153-1-alexander.wilhelm@westermo.com
Manivannan writes:
MHI Host
========
- Fix conflict between MHI power up and SYSERR state transitions by issuing MHI
reset only if the device is in SYSERR state while in SBL/PBL EEs. The device
won't respond to reset if it is not in SYSERR state in SBL/PBL EEs.
- Remove redundant call to pci_assign_resource() since PCI core calls this API
internally.
- Add Telit FN920C04 modem which is based on Qcom SDX35 chipset.
* tag 'mhi-for-v6.16' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/mani/mhi:
bus: mhi: host: pci_generic: Add Telit FN920C04 modem support
bus: mhi: host: pci_generic: Remove redundant assign resource usage
bus: mhi: host: Fix conflict between power_up and SYSERR
When mhi_async_power_up() enables IRQs, it is possible that we could
receive a SYSERR notification from the device if the firmware has crashed
for some reason. Then the SYSERR notification queues a work item that
cannot execute until the pm_mutex is released by mhi_async_power_up().
So the SYSERR work item will be pending. If mhi_async_power_up() detects
the SYSERR, it will handle it. If the device is in PBL, then the PBL state
transition event will be queued, resulting in a work item after the
pending SYSERR work item. Once mhi_async_power_up() releases the pm_mutex,
the SYSERR work item can run. It will blindly attempt to reset the MHI
state machine, which is the recovery action for SYSERR. PBL/SBL are not
interrupt driven and will ignore the MHI Reset unless SYSERR is actively
advertised. This will cause the SYSERR work item to timeout waiting for
reset to be cleared, and will leave the host state in SYSERR processing.
The PBL transition work item will then run, and immediately fail because
SYSERR processing is not a valid state for PBL transition.
This leaves the device uninitialized.
This issue has a fairly unique signature in the kernel log:
mhi mhi3: Requested to power ON
Qualcomm Cloud AI 100 0000:36:00.0: Fatal error received from
device. Attempting to recover
mhi mhi3: Power on setup success
mhi mhi3: Device failed to exit MHI Reset state
mhi mhi3: Device MHI is not in valid state
We cannot remove the SYSERR handling from mhi_async_power_up() because the
device may be in the SYSERR state, but we missed the notification as the
irq was fired before irqs were enabled. We also can't queue the SYSERR work
item from mhi_async_power_up() if SYSERR is detected because that may
result in a duplicate work item, and cause the same issue since the
duplicate item will blindly issue MHI reset even if SYSERR is no longer
active.
Instead, add a check in the SYSERR work item to make sure that MHI reset is
only issued if the device is in SYSERR state for PBL or SBL EEs.
Fixes: a6e2e3522f ("bus: mhi: core: Add support for PM state transitions")
Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Signed-off-by: Jeff Hugo <jeff.hugo@oss.qualcomm.com>
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Reviewed-by: Troy Hanson <quic_thanson@quicinc.com>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20250328163526.3365497-1-jeff.hugo@oss.qualcomm.com
timer_delete[_sync]() replaces del_timer[_sync](). Convert the whole tree
over and remove the historical wrapper inlines.
Conversion was done with coccinelle plus manual fixups where necessary.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull char / misc / IIO driver updates from Greg KH:
"Here is the big set of char, misc, iio, and other smaller driver
subsystems for 6.15-rc1. Lots of stuff in here, including:
- loads of IIO changes and driver updates
- counter driver updates
- w1 driver updates
- faux conversions for some drivers that were abusing the platform
bus interface
- coresight driver updates
- rust miscdevice binding updates based on real-world-use
- other minor driver updates
All of these have been in linux-next with no reported issues for quite
a while"
* tag 'char-misc-6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (292 commits)
samples: rust_misc_device: fix markup in top-level docs
Coresight: Fix a NULL vs IS_ERR() bug in probe
misc: lis3lv02d: convert to use faux_device
tlclk: convert to use faux_device
regulator: dummy: convert to use the faux device interface
bus: mhi: host: Fix race between unprepare and queue_buf
coresight: configfs: Constify struct config_item_type
doc: iio: ad7380: describe offload support
iio: ad7380: add support for SPI offload
iio: light: Add check for array bounds in veml6075_read_int_time_ms
iio: adc: ti-ads7924 Drop unnecessary function parameters
staging: iio: ad9834: Use devm_regulator_get_enable()
staging: iio: ad9832: Use devm_regulator_get_enable()
iio: gyro: bmg160_spi: add of_match_table
dt-bindings: iio: adc: Add i.MX94 and i.MX95 support
iio: adc: ad7768-1: remove unnecessary locking
Documentation: ABI: add wideband filter type to sysfs-bus-iio
iio: adc: ad7768-1: set MOSI idle state to prevent accidental reset
iio: adc: ad7768-1: Fix conversion result sign
iio: adc: ad7124: Benefit of dev = indio_dev->dev.parent in ad7124_parse_channel_config()
...
Manivannan writes:
MHI Host
========
- Remove unused mhi_device_get() and mhi_queue_dma() APIs.
- Add support for SA8775p MHI endpoint device with IP_SW0 channel.
- Fix the race between mhi_unprepare_from_transfer() and mhi_queue_buf().
* tag 'mhi-for-v6.15' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/mani/mhi:
bus: mhi: host: Fix race between unprepare and queue_buf
bus: mhi: host: pci_generic: Add support for SA8775P endpoint
bus: mhi: host: Remove unused functions
A client driver may use mhi_unprepare_from_transfer() to quiesce
incoming data during the client driver's tear down. The client driver
might also be processing data at the same time, resulting in a call to
mhi_queue_buf() which will invoke mhi_gen_tre(). If mhi_gen_tre() runs
after mhi_unprepare_from_transfer() has torn down the channel, a panic
will occur due to an invalid dereference leading to a page fault.
This occurs because mhi_gen_tre() does not verify the channel state
after locking it. Fix this by having mhi_gen_tre() confirm the channel
state is valid, or return error to avoid accessing deinitialized data.
Cc: stable@vger.kernel.org # 6.8
Fixes: b89b6a863d ("bus: mhi: host: Add spinlock to protect WP access when queueing TREs")
Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Signed-off-by: Jeff Hugo <jeff.hugo@oss.qualcomm.com>
Reviewed-by: Krishna Chaitanya Chundru <krishna.chundru@oss.qualcomm.com>
Reviewed-by: Youssef Samir <quic_yabdulra@quicinc.com>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Reviewed-by: Troy Hanson <quic_thanson@quicinc.com>
Link: https://lore.kernel.org/r/20250306172913.856982-1-jeff.hugo@oss.qualcomm.com
[mani: added stable tag]
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
There are multiple places from where the recovery work gets scheduled
asynchronously. Also, there are multiple places where the caller waits
synchronously for the recovery to be completed. One such place is during
the PM shutdown() callback.
If the device is not alive during recovery_work, it will try to reset the
device using pci_reset_function(). This function internally will take the
device_lock() first before resetting the device. By this time, if the lock
has already been acquired, then recovery_work will get stalled while
waiting for the lock. And if the lock was already acquired by the caller
which waits for the recovery_work to be completed, it will lead to
deadlock.
This is what happened on the X1E80100 CRD device when the device died
before shutdown() callback. Driver core calls the driver's shutdown()
callback while holding the device_lock() leading to deadlock.
And this deadlock scenario can occur on other paths as well, like during
the PM suspend() callback, where the driver core would hold the
device_lock() before calling driver's suspend() callback. And if the
recovery_work was already started, it could lead to deadlock. This is also
observed on the X1E80100 CRD.
So to fix both issues, use pci_try_reset_function() in recovery_work. This
function first checks for the availability of the device_lock() before
trying to reset the device. If the lock is available, it will acquire it
and reset the device. Otherwise, it will return -EAGAIN. If that happens,
recovery_work will fail with the error message "Recovery failed" as not
much could be done.
Cc: stable@vger.kernel.org # 5.12
Reported-by: Johan Hovold <johan@kernel.org>
Closes: https://lore.kernel.org/mhi/Z1me8iaK7cwgjL92@hovoldconsulting.com
Fixes: 7389337f0a ("mhi: pci_generic: Add suspend/resume/recovery procedure")
Reviewed-by: Johan Hovold <johan+linaro@kernel.org>
Tested-by: Johan Hovold <johan+linaro@kernel.org>
Analyzed-by: Johan Hovold <johan@kernel.org>
Link: https://lore.kernel.org/mhi/Z2KKjWY2mPen6GPL@hovoldconsulting.com/
Reviewed-by: Loic Poulain <loic.poulain@linaro.org>
Link: https://lore.kernel.org/r/20250108-mhi_recovery_fix-v1-1-a0a00a17da46@linaro.org
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
We need the IIO fixes in here as well, and it resolves a merge conflict
in:
drivers/iio/adc/ti-ads1119.c
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Foxconn uses a unique firmware for their MHI based modems. So the generic
firmware from Qcom won't work. Hence, update the EDL firmware path to
include the 'foxconn' subdirectory based on the modem SoC so that the
Foxconn specific firmware could be used.
Respective firmware will be upstreamed to linux-firmware repo.
Cc: stable@vger.kernel.org # 6.11
Fixes: bf30a75e6e ("bus: mhi: host: Add support for Foxconn SDX72 modems")
Signed-off-by: Slark Xiao <slark_xiao@163.com>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Link: https://lore.kernel.org/r/20240725022941.65948-1-slark_xiao@163.com
[mani: Reworded the subject and description]
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Add Netprisma LCUR57 and FCUN69 hardware revision:
LCUR57:
02:00.0 Unassigned class [ff00]: Device 203e:1000
Subsystem: Device 203e:1000
FCUN69:
02:00.0 Unassigned class [ff00]: Device 203e:1001
Subsystem: Device 203e:1001
Both of these modules create IP interfaces through MBIM.
And these modules can be checked for successful recognition through the
following command:
$ mmcli -L
/org/freedesktop/ModemManager1/Modem/0 [NetPrisma] LCUR57-WWD
$ mmcli -L
/org/freedesktop/ModemManager1/Modem/0 [NetPrisma] FCUN69-WWD
Signed-off-by: Mank Wang <mank.wang@netprisma.us>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Link: https://lore.kernel.org/r/PH7PR22MB30386647BE2D813B502226CF81942@PH7PR22MB3038.namprd22.prod.outlook.com
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Pull driver core updates from Greg KH:
"Here is the big set of driver core changes for 6.11-rc1.
Lots of stuff in here, with not a huge diffstat, but apis are evolving
which required lots of files to be touched. Highlights of the changes
in here are:
- platform remove callback api final fixups (Uwe took many releases
to get here, finally!)
- Rust bindings for basic firmware apis and initial driver-core
interactions.
It's not all that useful for a "write a whole driver in rust" type
of thing, but the firmware bindings do help out the phy rust
drivers, and the driver core bindings give a solid base on which
others can start their work.
There is still a long way to go here before we have a multitude of
rust drivers being added, but it's a great first step.
- driver core const api changes.
This reached across all bus types, and there are some fix-ups for
some not-common bus types that linux-next and 0-day testing shook
out.
This work is being done to help make the rust bindings more safe,
as well as the C code, moving toward the end-goal of allowing us to
put driver structures into read-only memory. We aren't there yet,
but are getting closer.
- minor devres cleanups and fixes found by code inspection
- arch_topology minor changes
- other minor driver core cleanups
All of these have been in linux-next for a very long time with no
reported problems"
* tag 'driver-core-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (55 commits)
ARM: sa1100: make match function take a const pointer
sysfs/cpu: Make crash_hotplug attribute world-readable
dio: Have dio_bus_match() callback take a const *
zorro: make match function take a const pointer
driver core: module: make module_[add|remove]_driver take a const *
driver core: make driver_find_device() take a const *
driver core: make driver_[create|remove]_file take a const *
firmware_loader: fix soundness issue in `request_internal`
firmware_loader: annotate doctests as `no_run`
devres: Correct code style for functions that return a pointer type
devres: Initialize an uninitialized struct member
devres: Fix memory leakage caused by driver API devm_free_percpu()
devres: Fix devm_krealloc() wasting memory
driver core: platform: Switch to use kmemdup_array()
driver core: have match() callback in struct bus_type take a const *
MAINTAINERS: add Rust device abstractions to DRIVER CORE
device: rust: improve safety comments
MAINTAINERS: add Danilo as FIRMWARE LOADER maintainer
MAINTAINERS: add Rust FW abstractions to FIRMWARE LOADER
firmware: rust: improve safety comments
...
In the match() callback, the struct device_driver * should not be
changed, so change the function callback to be a const *. This is one
step of many towards making the driver core safe to have struct
device_driver in read-only memory.
Because the match() callback is in all busses, all busses are modified
to handle this properly. This does entail switching some container_of()
calls to container_of_const() to properly handle the constant *.
For some busses, like PCI and USB and HV, the const * is cast away in
the match callback as those busses do want to modify those structures at
this point in time (they have a local lock in the driver structure.)
That will have to be changed in the future if they wish to have their
struct device * in read-only-memory.
Cc: Rafael J. Wysocki <rafael@kernel.org>
Reviewed-by: Alex Elder <elder@kernel.org>
Acked-by: Sumit Garg <sumit.garg@linaro.org>
Link: https://lore.kernel.org/r/2024070136-wrongdoer-busily-01e8@gregkh
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Currently, a single 'mhi_pci_dev_info' is shared across different product
families. Even though it makes the device functional, it misleads the users
by sharing the common product name.
For instance, below message will be printed for Foxconn SDX62 modem during
boot:
"MHI PCI device found: foxconn-sdx65"
But this is quite misleading to the users since the actual modem plugged in
could be 'T99W373' which is based on SDX62.
So fix this issue by using a unique 'mhi_pci_dev_info' for product
families. This allows us to specify a unique product name for each product
family. Also, once this name is exposed to client drivers, they may use
this name to identify the modems and use any modem specific configuration.
Modems of unknown product families are not impacted by this change.
CC: Slark Xiao <slark_xiao@163.com>
Reviewed-by: Slark Xiao <slark_xiao@163.com>
Link: https://lore.kernel.org/r/20240626053237.4227-1-manivannan.sadhasivam@linaro.org
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
With the rework of how the __string() handles dynamic strings where it
saves off the source string in field in the helper structure[1], the
assignment of that value to the trace event field is stored in the helper
value and does not need to be passed in again.
This means that with:
__string(field, mystring)
Which use to be assigned with __assign_str(field, mystring), no longer
needs the second parameter and it is unused. With this, __assign_str()
will now only get a single parameter.
There's over 700 users of __assign_str() and because coccinelle does not
handle the TRACE_EVENT() macro I ended up using the following sed script:
git grep -l __assign_str | while read a ; do
sed -e 's/\(__assign_str([^,]*[^ ,]\) *,[^;]*/\1)/' $a > /tmp/test-file;
mv /tmp/test-file $a;
done
I then searched for __assign_str() that did not end with ';' as those
were multi line assignments that the sed script above would fail to catch.
Note, the same updates will need to be done for:
__assign_str_len()
__assign_rel_str()
__assign_rel_str_len()
I tested this with both an allmodconfig and an allyesconfig (build only for both).
[1] https://lore.kernel.org/linux-trace-kernel/20240222211442.634192653@goodmis.org/
Link: https://lore.kernel.org/linux-trace-kernel/20240516133454.681ba6a0@rorschach.local.home
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Julia Lawall <Julia.Lawall@inria.fr>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Acked-by: Jani Nikula <jani.nikula@intel.com>
Acked-by: Christian König <christian.koenig@amd.com> for the amdgpu parts.
Acked-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> #for
Acked-by: Rafael J. Wysocki <rafael@kernel.org> # for thermal
Acked-by: Takashi Iwai <tiwai@suse.de>
Acked-by: Darrick J. Wong <djwong@kernel.org> # xfs
Tested-by: Guenter Roeck <linux@roeck-us.net>
Some of the MHI modems like SDX65 based ones are capable of entering the
EDL mode as per the standard triggering mechanism defined in the MHI spec
v1.2. So let's add a common mhi_pci_generic_edl_trigger() function that
triggers the EDL mode in the device when user writes to the
/sys/bus/mhi/devices/.../trigger_edl file.
As per the spec, the EDL mode can be triggered by writing a cookie to the
EDL doorbell register and then resetting the device.
Devices supporting this standard way of entering EDL mode can set the
mhi_pci_dev_info::edl_trigger flag.
Signed-off-by: Qiang Yu <quic_qianyu@quicinc.com>
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Link: https://lore.kernel.org/r/1713928915-18229-4-git-send-email-quic_qianyu@quicinc.com
[mani: reworded commit message]
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Some controllers may want to access a specific doorbell register. Hence add
a new API that reads the CHDBOFF register and returns the offset of the
doorbell registers from MMIO base, so that the controller can calculate the
address of the specific doorbell register by adding the register offset
with doorbell offset and MMIO base address.
Signed-off-by: Qiang Yu <quic_qianyu@quicinc.com>
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Link: https://lore.kernel.org/r/1713928915-18229-3-git-send-email-quic_qianyu@quicinc.com
[mani: reworded commit message and Kdoc]
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Add sysfs entry to allow users of MHI bus to force device to enter EDL
(Emergency Download) mode to download the device firmware. Since there is
no guarantee that all the devices will support EDL mode, the sysfs entry
is kept as an optional one and will appear only for the supported devices.
Controllers supporting the EDL mode are expected to provide edl_trigger()
callback that puts the device into EDL mode.
Signed-off-by: Qiang Yu <quic_qianyu@quicinc.com>
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Link: https://lore.kernel.org/r/1713928915-18229-2-git-send-email-quic_qianyu@quicinc.com
[mani: fixed the kernel version and reworded the commit message]
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>