This reverts commit 3316ab2b45.
The MHI spec owner pointed out that the SOC_HW_VERSION register is part
of the BHIe segment, and only valid on devices which implement BHIe.
Only a small subset of MHI devices implement BHIe so blindly accessing
the register for all devices is not correct. Also, since the BHIe
segment offset is not used when accessing the register, any
implementation which moves the BHIe segment will result in accessing
some other register. We've seen that accessing this register on AIC100
which does not support BHIe can result in initialization failures.
We could try to put checks into the code to address these issues, but in
the roughly 4 years this functionality has existed, no one has used it.
Easier to drop this dead code and address the issues if anyone comes up
with a real world use for it.
Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Link: https://lore.kernel.org/r/20240219180748.1591527-1-quic_jhugo@quicinc.com
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
This change adds ftrace support for following functions which
helps in debugging the issues when there is Channel state & MHI
state change and also when we receive data and control events:
1. mhi_intvec_mhi_states
2. mhi_process_data_event_ring
3. mhi_process_ctrl_ev_ring
4. mhi_gen_tre
5. mhi_update_channel_state
6. mhi_tryset_pm_state
7. mhi_pm_st_worker
Change the implementation of the arrays which has enum to strings mapping
to make it consistent in both trace header file and other files.
Where ever the trace events are added, debug messages are removed.
Signed-off-by: Krishna chaitanya chundru <quic_krichai@quicinc.com>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Link: https://lore.kernel.org/r/20240206-ftrace_support-v11-1-3f71dc187544@quicinc.com
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
The OEM PK HASH registers in the BHI region are read once during firmware
load (boot), cached, and displayed on demand via sysfs. This has a few
problems - if firmware load is skipped, the registers will not be read and
if the register values change over the life of the device the local cache
will be out of sync.
Qualcomm Cloud AI 100 can expose both these problems. It is possible for
mhi_async_power_up() to be invoked while the device is in AMSS EE, which
would bypass firmware loading. Also, Qualcomm Cloud AI 100 has 5 PK HASH
slots which can be dynamically provisioned while the device is active,
which would result in the values changing and users may want to know what
keys are active.
Address these concerns by reading the PK HASH registers on-demand during
the sysfs read. This will result in showing the most current information.
Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Reviewed-by: Pranjal Ramajor Asha Kanojiya <quic_pkanojiy@quicinc.com>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Link: https://lore.kernel.org/r/20240105174253.863388-1-quic_jhugo@quicinc.com
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
When processing a SYSERR, if the device does not respond to the MHI_RESET
from the host, the host will be stuck in a difficult to recover state.
The host will remain in MHI_PM_SYS_ERR_PROCESS and not clean up the host
channels. Clients will not be notified of the SYSERR via the destruction
of their channel devices, which means clients may think that the device is
still up. Subsequent SYSERR events such as a device fatal error will not
be processed as the state machine cannot transition from PROCESS back to
DETECT. The only way to recover from this is to unload the mhi module
(wipe the state machine state) or for the mhi controller to initiate
SHUTDOWN.
This issue was discovered by stress testing soc_reset events on AIC100
via the sysfs node.
soc_reset is processed entirely in hardware. When the register write
hits the endpoint hardware, it causes the soc to reset without firmware
involvement. In stress testing, there is a rare race where soc_reset N
will cause the soc to reset and PBL to signal SYSERR (fatal error). If
soc_reset N+1 is triggered before PBL can process the MHI_RESET from the
host, then the soc will reset again, and re-run PBL from the beginning.
This will cause PBL to lose all state. PBL will be waiting for the host
to respond to the new syserr, but host will be stuck expecting the
previous MHI_RESET to be processed.
Additionally, the AMSS EE firmware (QSM) was hacked to synthetically
reproduce the issue by simulating a FW hang after the QSM issued a
SYSERR. In this case, soc_reset would not recover the device.
For this failure case, to recover the device, we need a state similar to
PROCESS, but can transition to DETECT. There is not a viable existing
state to use. POR has the needed transitions, but assumes the device is
in a good state and could allow the host to attempt to use the device.
Allowing PROCESS to transition to DETECT invites the possibility of
parallel SYSERR processing which could get the host and device out of
sync.
Thus, invent a new state - MHI_PM_SYS_ERR_FAIL
This essentially a holding state. It allows us to clean up the host
elements that are based on the old state of the device (channels), but
does not allow us to directly advance back to an operational state. It
does allow the detection and processing of another SYSERR which may
recover the device, or allows the controller to do a clean shutdown.
Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Reviewed-by: Carl Vanderlip <quic_carlv@quicinc.com>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Link: https://lore.kernel.org/r/20240112180800.536733-1-quic_jhugo@quicinc.com
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Clang warns about a parameter that is decremented but never evaluated here:
bus/mhi/host/main.c:803:13: error: parameter 'event_quota' set but not used [-Werror,-Wunused-but-set-parameter]
u32 event_quota)
Remove the access to the variable to avoid that warning.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Link: https://lore.kernel.org/r/20230811134547.3231160-1-arnd@kernel.org
[mani: minor spelling fix to commit message]
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Currently MHI loads the firmware image from the path provided by client
devices. ath11k needs to support firmware image embedded along with meta
data (named as firmware-2.bin). So allow the client driver to request the
firmware file from user space on it's own and provide the firmware image
data and size to MHI via a pointer struct mhi_controller::fw_data.
This is an optional feature, if fw_data is NULL MHI load the firmware using
the name from struct mhi_controller::fw_image string as before.
Tested with ath11k and WCN6855 hw2.0.
Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com>
Reviewed-by: Manivannan Sadhasivam <mani@kernel.org>
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Link: https://lore.kernel.org/r/20230727100430.3603551-2-kvalo@kernel.org
[mani: wrapped commit message to 75 columns]
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
If firmware loading fails, the controller's pm_state is updated to
MHI_PM_FW_DL_ERR unconditionally. This can corrupt the pm_state as the
update is not done under the proper lock, and also does not validate
the state transition. The firmware loading can fail due to a detected
syserr, but if MHI_PM_FW_DL_ERR is unconditionally set as the pm_state,
the handling of the syserr can break when it attempts to transition from
syserr detect, to syserr process.
By grabbing the lock, we ensure we don't race with some other pm_state
update. By using mhi_try_set_pm_state(), we check that the transition
to MHI_PM_FW_DL_ERR is valid via the state machine logic. If it is not
valid, then some other transition is occurring like syserr processing, and
we assume that will resolve the firmware loading error.
Fixes: 12e050c77b ("bus: mhi: core: Move to an error state on any firmware load failure")
Cc: stable@vger.kernel.org
Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Reviewed-by: Carl Vanderlip <quic_carlv@quicinc.com>
Reviewed-by: Manivannan Sadhasivam <mani@kernel.org>
Link: https://lore.kernel.org/r/1681142292-27571-3-git-send-email-quic_jhugo@quicinc.com
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
If we detect a system error via intvec, we only process the syserr if the
current ee is different than the last observed ee. The reason for this
check is to prevent bhie from running multiple times, but with the single
queue handling syserr, that is not possible.
The check can cause an issue with device recovery. If PBL loads a bad SBL
via BHI, but that SBL hangs before notifying the host of an ee change,
then issuing soc_reset to crash the device and retry (after supplying a
fixed SBL) will not recover the device as the host will observe a PBL->PBL
transition and not process the syserr. The device will be stuck until
either the driver is reloaded, or the host is rebooted. Instead, remove
the check so that we can attempt to recover the device.
Fixes: ef2126c4e2 ("bus: mhi: core: Process execution environment changes serially")
Cc: stable@vger.kernel.org
Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Reviewed-by: Carl Vanderlip <quic_carlv@quicinc.com>
Reviewed-by: Manivannan Sadhasivam <mani@kernel.org>
Link: https://lore.kernel.org/r/1681142292-27571-2-git-send-email-quic_jhugo@quicinc.com
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
This reverts commit 2d5253a096.
There are 2 commits with commit message "Add a secondary AT port to Telit
FN990":
commit 2d5253a096 ("bus: mhi: host: pci_generic: Add a secondary AT port
to Telit FN990")
commit 479aa3b0ec ("bus: mhi: host: pci_generic: Add a secondary AT port
to Telit FN990")
This turned out to be due to the patch getting applied through different
trees and git settled on a resolution while applying it second time. But
the second AT port of Foxconn devices don't work in PCIe mode. So the
second commit needs to be reverted.
Cc: stable@vger.kernel.org # 6.2
Fixes: 2d5253a096 ("bus: mhi: host: pci_generic: Add a secondary AT port to Telit FN990")
Signed-off-by: Slark Xiao <slark_xiao@163.com>
Reviewed-by: Fabio Porcedda <fabio.porcedda@gmail.com>
Reviewed-by: Manivannan Sadhasivam <mani@kernel.org>
Link: https://lore.kernel.org/r/20230310101715.69209-1-slark_xiao@163.com
[mani: massaged the commit message a bit, added fixes tag and CCed stable]
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
pci_enable_pcie_error_reporting() enables the device to send ERR_*
Messages. Since commit <f26e58bf6f54> ("PCI/AER: Enable error reporting
when AER is native"), the PCI core does this for all devices during
enumeration, so the driver doesn't need to do it itself.
Remove the redundant pci_enable_pcie_error_reporting() call from the
driver. Also remove the corresponding pci_disable_pcie_error_reporting()
from the driver .remove() path.
Note that this only controls ERR_* Messages from the device. An ERR_*
Message may cause the Root Port to generate an interrupt, depending on the
AER Root Error Command register managed by the AER service driver.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Manivannan Sadhasivam <mani@kernel.org>
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Link: https://lore.kernel.org/r/20230307201625.879567-1-helgaas@kernel.org
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
mhi_poll() API is not used within the MHI stack and also not by any client
drivers in mainline. So let's remove it until any consumer is available.
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Pull driver core updates from Greg KH:
"Here is the large set of driver core changes for 6.3-rc1.
There's a lot of changes this development cycle, most of the work
falls into two different categories:
- fw_devlink fixes and updates. This has gone through numerous review
cycles and lots of review and testing by lots of different devices.
Hopefully all should be good now, and Saravana will be keeping a
watch for any potential regression on odd embedded systems.
- driver core changes to work to make struct bus_type able to be
moved into read-only memory (i.e. const) The recent work with Rust
has pointed out a number of areas in the driver core where we are
passing around and working with structures that really do not have
to be dynamic at all, and they should be able to be read-only
making things safer overall. This is the contuation of that work
(started last release with kobject changes) in moving struct
bus_type to be constant. We didn't quite make it for this release,
but the remaining patches will be finished up for the release after
this one, but the groundwork has been laid for this effort.
Other than that we have in here:
- debugfs memory leak fixes in some subsystems
- error path cleanups and fixes for some never-able-to-be-hit
codepaths.
- cacheinfo rework and fixes
- Other tiny fixes, full details are in the shortlog
All of these have been in linux-next for a while with no reported
problems"
[ Geert Uytterhoeven points out that that last sentence isn't true, and
that there's a pending report that has a fix that is queued up - Linus ]
* tag 'driver-core-6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (124 commits)
debugfs: drop inline constant formatting for ERR_PTR(-ERROR)
OPP: fix error checking in opp_migrate_dentry()
debugfs: update comment of debugfs_rename()
i3c: fix device.h kernel-doc warnings
dma-mapping: no need to pass a bus_type into get_arch_dma_ops()
driver core: class: move EXPORT_SYMBOL_GPL() lines to the correct place
Revert "driver core: add error handling for devtmpfs_create_node()"
Revert "devtmpfs: add debug info to handle()"
Revert "devtmpfs: remove return value of devtmpfs_delete_node()"
driver core: cpu: don't hand-override the uevent bus_type callback.
devtmpfs: remove return value of devtmpfs_delete_node()
devtmpfs: add debug info to handle()
driver core: add error handling for devtmpfs_create_node()
driver core: bus: update my copyright notice
driver core: bus: add bus_get_dev_root() function
driver core: bus: constify bus_unregister()
driver core: bus: constify some internal functions
driver core: bus: constify bus_get_kset()
driver core: bus: constify bus_register/unregister_notifier()
driver core: remove private pointer from struct bus_type
...
Pull char/misc driver updates from Greg KH:
"Here is the large set of char/misc and other driver subsystem changes
for 6.2-rc1. Nothing earth-shattering in here at all, just a lot of
new driver development and minor fixes.
Highlights include:
- fastrpc driver updates
- iio new drivers and updates
- habanalabs driver updates for new hardware and features
- slimbus driver updates
- speakup module parameters added to aid in boot time configuration
- i2c probe_new conversions for lots of different drivers
- other small driver fixes and additions
One semi-interesting change in here is the increase of the number of
misc dynamic minors available to 1048448 to handle new huge-cpu
systems.
All of these have been in linux-next for a while with no reported
problems"
* tag 'char-misc-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (521 commits)
extcon: usbc-tusb320: Convert to i2c's .probe_new()
extcon: rt8973: Convert to i2c's .probe_new()
extcon: fsa9480: Convert to i2c's .probe_new()
extcon: max77843: Replace irqchip mask_invert with unmask_base
chardev: fix error handling in cdev_device_add()
mcb: mcb-parse: fix error handing in chameleon_parse_gdd()
drivers: mcb: fix resource leak in mcb_probe()
coresight: etm4x: fix repeated words in comments
coresight: cti: Fix null pointer error on CTI init before ETM
coresight: trbe: remove cpuhp instance node before remove cpuhp state
counter: stm32-lptimer-cnt: fix the check on arr and cmp registers update
misc: fastrpc: Add dma_mask to fastrpc_channel_ctx
misc: fastrpc: Add mmap request assigning for static PD pool
misc: fastrpc: Safekeep mmaps on interrupted invoke
misc: fastrpc: Add support for audiopd
misc: fastrpc: Rework fastrpc_req_munmap
misc: fastrpc: Use fastrpc_map_put in fastrpc_map_create on fail
misc: fastrpc: Add fastrpc_remote_heap_alloc
misc: fastrpc: Add reserved mem support
misc: fastrpc: Rename audio protection domain to root
...
These cases were done with this Coccinelle:
@@
expression H;
expression L;
@@
- (get_random_u32_below(H) + L)
+ get_random_u32_inclusive(L, H + L - 1)
@@
expression H;
expression L;
expression E;
@@
get_random_u32_inclusive(L,
H
- + E
- - E
)
@@
expression H;
expression L;
expression E;
@@
get_random_u32_inclusive(L,
H
- - E
- + E
)
@@
expression H;
expression L;
expression E;
expression F;
@@
get_random_u32_inclusive(L,
H
- - E
+ F
- + E
)
@@
expression H;
expression L;
expression E;
expression F;
@@
get_random_u32_inclusive(L,
H
- + E
+ F
- - E
)
And then subsequently cleaned up by hand, with several automatic cases
rejected if it didn't make sense contextually.
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> # for infiniband
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
The Foxconn T99W175 modem has an HP variant, which has
the following output from lspci:
01:00.0 Wireless controller [0d40]: Device 03f0:0a6c
It also has some HP-specific serial numbers on the
metal case. It works well with this driver, so add
support for this to the pci_generic driver.
Signed-off-by: Song Fuchang <song.fc@gmail.com>
Reviewed-by: Manivannan Sadhasivam <mani@kernel.org>
[mani: manually applied the patch]
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
There is a race condition where mhi_prepare_channel() updates the
read and write pointers as the base address and in parallel, if
an M0 transition occurs, the tasklet goes ahead and rings
doorbells for all channels with a delta in TRE rings assuming
they are already enabled. This causes a null pointer access. Fix
it by adding a channel enabled check before ringing channel
doorbells.
Cc: stable@vger.kernel.org # 5.19
Fixes: a6e2e3522f "bus: mhi: core: Add support for PM state transitions"
Signed-off-by: Qiang Yu <quic_qianyu@quicinc.com>
Reviewed-by: Manivannan Sadhasivam <mani@kernel.org>
Link: https://lore.kernel.org/r/1665889532-13634-1-git-send-email-quic_qianyu@quicinc.com
[mani: CCed stable list]
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Pull char/misc and other driver updates from Greg KH:
"Here is the large set of char/misc and other small driver subsystem
changes for 6.1-rc1. Loads of different things in here:
- IIO driver updates, additions, and changes. Probably the largest
part of the diffstat
- habanalabs driver update with support for new hardware and
features, the second largest part of the diff.
- fpga subsystem driver updates and additions
- mhi subsystem updates
- Coresight driver updates
- gnss subsystem updates
- extcon driver updates
- icc subsystem updates
- fsi subsystem updates
- nvmem subsystem and driver updates
- misc driver updates
- speakup driver additions for new features
- lots of tiny driver updates and cleanups
All of these have been in the linux-next tree for a while with no
reported issues"
* tag 'char-misc-6.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (411 commits)
w1: Split memcpy() of struct cn_msg flexible array
spmi: pmic-arb: increase SPMI transaction timeout delay
spmi: pmic-arb: block access for invalid PMIC arbiter v5 SPMI writes
spmi: pmic-arb: correct duplicate APID to PPID mapping logic
spmi: pmic-arb: add support to dispatch interrupt based on IRQ status
spmi: pmic-arb: check apid against limits before calling irq handler
spmi: pmic-arb: do not ack and clear peripheral interrupts in cleanup_irq
spmi: pmic-arb: handle spurious interrupt
spmi: pmic-arb: add a print in cleanup_irq
drivers: spmi: Directly use ida_alloc()/free()
MAINTAINERS: add TI ECAP driver info
counter: ti-ecap-capture: capture driver support for ECAP
Documentation: ABI: sysfs-bus-counter: add frequency & num_overflows items
dt-bindings: counter: add ti,am62-ecap-capture.yaml
counter: Introduce the COUNTER_COMP_ARRAY component type
counter: Consolidate Counter extension sysfs attribute creation
counter: Introduce the Count capture component
counter: 104-quad-8: Add Signal polarity component
counter: Introduce the Signal polarity component
counter: interrupt-cnt: Implement watch_validate callback
...
Manivannan writes:
"MHI Host
--------
- Print the modem name while probing the MHI host pci-generic driver. This has
been exposed as a debug information so far but on a low storate embedded
devices such as OpenWRT based products, this helps in identifying the
attached modem without enabling the debug logs."
* tag 'mhi-for-v6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/mani/mhi:
bus: mhi: host: always print detected modem name
This harmless print provides a very easy way of knowing
if the modem is detected properly during probing.
Promote it to an informational print so no hassle is required
enabling kernel debugging info to obtain it.
The rationale here is that:
On a lot of low-storage embedded devices, extensive kernel
debugging info is not always present as this would
increase it's size to much causing partition size issues.
Signed-off-by: Koen Vandeputte <koen.vandeputte@citymesh.com>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Reviewed-by: Loic Poulain <loic.poulain@linaro.org>
Link: https://lore.kernel.org/r/20220831100349.1488762-1-koen.vandeputte@citymesh.com
[mani: added missing review tags]
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
During runtime, the MHI endpoint may be powered up/down several times.
So instead of allocating and destroying the IRQs all the time, let's just
enable/disable IRQs during power up/down.
The IRQs will be allocated during mhi_register_controller() and freed
during mhi_unregister_controller(). This works well for things like PCI
hotplug also as once the PCI device gets removed, the controller will
get unregistered. And once it comes back, it will get registered back
and even if the IRQ configuration changes (MSI), that will get accounted.
Signed-off-by: Qiang Yu <quic_qianyu@quicinc.com>
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Reviewed-by: Manivannan Sadhasivam <mani@kernel.org>
Link: https://lore.kernel.org/r/1655952183-66792-1-git-send-email-quic_qianyu@quicinc.com
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>