This change adds ftrace support for following functions which
helps in debugging the issues when there is Channel state & MHI
state change and also when we receive data and control events:
1. mhi_intvec_mhi_states
2. mhi_process_data_event_ring
3. mhi_process_ctrl_ev_ring
4. mhi_gen_tre
5. mhi_update_channel_state
6. mhi_tryset_pm_state
7. mhi_pm_st_worker
Change the implementation of the arrays which has enum to strings mapping
to make it consistent in both trace header file and other files.
Where ever the trace events are added, debug messages are removed.
Signed-off-by: Krishna chaitanya chundru <quic_krichai@quicinc.com>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Link: https://lore.kernel.org/r/20240206-ftrace_support-v11-1-3f71dc187544@quicinc.com
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Clang warns about a parameter that is decremented but never evaluated here:
bus/mhi/host/main.c:803:13: error: parameter 'event_quota' set but not used [-Werror,-Wunused-but-set-parameter]
u32 event_quota)
Remove the access to the variable to avoid that warning.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Link: https://lore.kernel.org/r/20230811134547.3231160-1-arnd@kernel.org
[mani: minor spelling fix to commit message]
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
If we detect a system error via intvec, we only process the syserr if the
current ee is different than the last observed ee. The reason for this
check is to prevent bhie from running multiple times, but with the single
queue handling syserr, that is not possible.
The check can cause an issue with device recovery. If PBL loads a bad SBL
via BHI, but that SBL hangs before notifying the host of an ee change,
then issuing soc_reset to crash the device and retry (after supplying a
fixed SBL) will not recover the device as the host will observe a PBL->PBL
transition and not process the syserr. The device will be stuck until
either the driver is reloaded, or the host is rebooted. Instead, remove
the check so that we can attempt to recover the device.
Fixes: ef2126c4e2 ("bus: mhi: core: Process execution environment changes serially")
Cc: stable@vger.kernel.org
Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Reviewed-by: Carl Vanderlip <quic_carlv@quicinc.com>
Reviewed-by: Manivannan Sadhasivam <mani@kernel.org>
Link: https://lore.kernel.org/r/1681142292-27571-2-git-send-email-quic_jhugo@quicinc.com
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
mhi_poll() API is not used within the MHI stack and also not by any client
drivers in mainline. So let's remove it until any consumer is available.
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
mhi_recycle_ev_ring() computes the shared write pointer for the ring
(ctxt_wp) using a read/modify/write pattern where the ctxt_wp value in the
shared memory is read, incremented, and written back. There are no checks
on the read value, it is assumed that it is kept in sync with the locally
cached value. Per the MHI spec, this is correct. The device should only
read ctxt_wp, never write it.
However, there are devices in the wild that violate the spec, and can
update the ctxt_wp in a specific scenario. This can cause corruption, and
violate the above assumption that the ctxt_wp is in sync with the cached
value.
This can occur when the device has loaded firmware from the host, and is
transitioning from the SBL EE to the AMSS EE. As part of shutting down
SBL, the SBL flushes it's local MHI context to the shared memory since
the local context will not persist across an EE change. In the case of
the event ring, SBL will flush its entire context, not just the parts that
it is allowed to update. This means SBL will write to ctxt_wp, and
possibly corrupt it.
An example:
Host Device
---- ---
Update ctxt_wp to 0x1f0
SBL observes 0x1f0
Update ctxt_wp to 0x0
Starts transition to AMSS EE
Context flush, writes 0x1f0 to ctxt_wp
Update ctxt_wp to 0x200
Update ctxt_wp to 0x210
AMSS observes 0x210
0x210 exceeds ring size
AMSS signals syserr
The reason the ctxt_wp goes off the end of the ring is that the rollover
check is only performed on the cached wp, which is out of sync with
ctxt_wp.
Since the host is the authority of the value of ctxt_wp per the MHI spec,
we can fix this issue by not reading ctxt_wp from the shared memory, and
instead compute it based on the cached value. If SBL corrupts ctxt_wp,
the host won't observe it, and will correct the value at some point later.
Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Reviewed-by: Hemant Kumar <quic_hemantk@quicinc.com>
Reviewed-by: Bhaumik Bhatt <quic_bbhatt@quicinc.com>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Link: https://lore.kernel.org/r/1649868113-18826-1-git-send-email-quic_jhugo@quicinc.com
[mani: used the quicinc domain for Hemant and Bhaumik]
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Functions like mhi_read_reg_field(), mhi_poll_reg_field() and
mhi_write_reg_field() could be modified to not depend on the shift value
passed as an argument. Instead, the bitfield operation could be used to
extract the shift value from the mask itself.
This eliminates the need to define _SHIFT (and _SHFT) macros and
simplifies the code a bit. For shift values those cannot be determined
during build time, "__ffs()" helper is used find the shift value during
runtime.
While at it, let's also get rid of 32-bit masks like CHDBOFF_CHDBOFF_MASK
by doing the full 32-bit register read.
Suggested-by: Alex Elder <elder@linaro.org>
Reviewed-by: Alex Elder <elder@linaro.org>
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Link: https://lore.kernel.org/r/20220301160308.107452-6-manivannan.sadhasivam@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>