The polled UART operations are used by the kernel debugger (KDB, KGDB),
which can interrupt the kernel at any point in time. The current
Qualcomm GENI implementation does not really work when there is on-going
serial output as it inadvertently "hijacks" the current tx command,
which can corrupt both the initial debugger output and any on-going
serial output (up to 4k characters) when execution resumes:
0190: abcdefghijklmnopqrstuvwxyz0123456789 0190: abcdefghijklmnopqrstuvwxyz0123456789
0191: abcdefghijklmnop[ 50.825552] sysrq: DEBUG
qrstuvwxyz0123456789 0191: abcdefghijklmnopqrstuvwxyz0123456789
Entering kdb (current=0xffff53510b4cd280, pid 640) on processor 2 due to Keyboard Entry
[2]kdb> go
omlji3h3h2g2g1f1f0e0ezdzdycycxbxbwawav :t72r2rp
o9n976k5j5j4i4i3h3h2g2g1f1f0e0ezdzdycycxbxbwawavu:t7t8s8s8r2r2q0q0p
o9n9n8ml6k6k5j5j4i4i3h3h2g2g1f1f0e0ezdzdycycxbxbwawav v u:u:t9t0s4s4rq0p
o9n9n8m8m7l7l6k6k5j5j40q0p p o
o9n9n8m8m7l7l6k6k5j5j4i4i3h3h2g2g1f1f0e0ezdzdycycxbxbwawav :t8t9s4s4r4r4q0q0p
Fix this by making sure that the polled output implementation waits for
the tx fifo to drain before cancelling any on-going longer transfers. As
the polled code cannot take any locks, leave the state variables as they
are and instead make sure that the interrupt handler always starts a new
tx command when there is data in the write buffer.
Since the debugger can interrupt the interrupt handler when it is
writing data to the tx fifo, it is currently not possible to fully
prevent losing up to 64 bytes of tty output on resume.
Fixes: c4f528795d ("tty: serial: msm_geni_serial: Add serial driver support for GENI based QUP")
Cc: stable@vger.kernel.org # 4.17
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Tested-by: Nícolas F. R. A. Prado <nfraprado@collabora.com>
Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
Link: https://lore.kernel.org/r/20240906131336.23625-9-johan+linaro@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Disable the GENI interrupts during console writes to reduce the risk of
having interrupt handlers spinning on the port lock on other cores for
extended periods of time.
This can, for example, reduce the total amount of time spent in the
interrupt handler during boot of the x1e80100 CRD by up to a factor of
nine (e.g. from 274 ms to 30 ms) while the worst case processing time
drops from 19 ms to 8 ms.
Fixes: c4f528795d ("tty: serial: msm_geni_serial: Add serial driver support for GENI based QUP")
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Tested-by: Nícolas F. R. A. Prado <nfraprado@collabora.com>
Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
Link: https://lore.kernel.org/r/20240906131336.23625-8-johan+linaro@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The Qualcomm serial console implementation is broken and can lose
characters when the serial port is also used for tty output.
Specifically, the console code only waits for the current tx command to
complete when all data has already been written to the fifo. When there
are on-going longer transfers this often means that console output is
lost when the console code inadvertently "hijacks" the current tx
command instead of starting a new one.
This can, for example, be observed during boot when console output that
should have been interspersed with init output is truncated:
[ 9.462317] qcom-snps-eusb2-hsphy fde000.phy: Registered Qcom-eUSB2 phy
[ OK ] Found device KBG50ZNS256G KIOXIA Wi[ 9.471743ndows.
[ 9.539915] xhci-hcd xhci-hcd.0.auto: xHCI Host Controller
Add a new state variable to track how much data has been written to the
fifo and use it to determine when the fifo and shift register are both
empty. This is needed since there is currently no other known way to
determine when the shift register is empty.
This in turn allows the console code to interrupt long transfers without
losing data.
Note that the oops-in-progress case is similarly broken as it does not
cancel any active command and also waits for the wrong status flag when
attempting to drain the fifo (TX_FIFO_NOT_EMPTY_EN is only set when
cancelling a command leaves data in the fifo).
Fixes: c4f528795d ("tty: serial: msm_geni_serial: Add serial driver support for GENI based QUP")
Fixes: a1fee899e5 ("tty: serial: qcom_geni_serial: Fix softlock")
Fixes: 9e957a1550 ("serial: qcom-geni: Don't cancel/abort if we can't get the port lock")
Cc: stable@vger.kernel.org # 4.17
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Tested-by: Nícolas F. R. A. Prado <nfraprado@collabora.com>
Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
Link: https://lore.kernel.org/r/20240906131336.23625-7-johan+linaro@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Commit 663abb1a7a ("tty: serial: qcom_geni_serial: Fix UART hang")
addressed an issue with stalled tx after the console code interrupted
the last bytes of a tx command by re-enabling the watermark interrupt if
there is data in the write buffer. This can however break software flow
control by re-enabling tx after the user has stopped it.
Address the original issue by not clearing the CMD_DONE flag after
polling for command completion. This allows the interrupt handler to
start another transfer when the CMD_DONE interrupt has not been disabled
due to flow control.
Fixes: c4f528795d ("tty: serial: msm_geni_serial: Add serial driver support for GENI based QUP")
Fixes: 663abb1a7a ("tty: serial: qcom_geni_serial: Fix UART hang")
Cc: stable@vger.kernel.org # 4.17
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Tested-by: Nícolas F. R. A. Prado <nfraprado@collabora.com>
Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
Link: https://lore.kernel.org/r/20240906131336.23625-3-johan+linaro@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The qcom_geni_serial_poll_bit() helper can be used to wait for events
like command completion and is supposed to wait for the time it takes
to clear a full fifo before timing out.
As noted by Doug, the current implementation does not account for start,
stop and parity bits when determining the timeout. The helper also does
not currently account for the shift register and the two-word
intermediate transfer register.
Specifically, a timeout that is too short can lead to lost characters
when waiting for a transfer to complete as the transfer is cancelled on
timeout.
Instead of determining the poll timeout on every call, store the fifo
timeout when updating it in set_termios() and make sure to take the
shift and intermediate registers into account. Note that serial core has
already added a 20 ms margin to the fifo timeout.
Also note that the current uart_fifo_timeout() interface does
unnecessary calculations on every call and did not exist in earlier
kernels so only store its result once. This facilitates backports too as
earlier kernels can derive the timeout from uport->timeout, which has
since been removed.
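As a rough illustration of the calculation described above, the
following user-space sketch (not the driver code) estimates how long a
full fifo, the two-word intermediate register and the shift register
take to drain. The function names, the MARGIN_US constant and the exact
formula are assumptions made for illustration only; the driver stores a
value derived from the serial core timeout rather than computing it
this way.

```c
#include <assert.h>
#include <stdint.h>

#define MARGIN_US 20000u /* assumed 20 ms margin, mirroring the one noted above */

/* Bits on the wire per character: start + data + optional parity + stop. */
static unsigned int frame_bits(unsigned int data_bits, int parity,
			       unsigned int stop_bits)
{
	return 1 + data_bits + (parity ? 1 : 0) + stop_bits;
}

/*
 * Time in microseconds needed to shift out a full fifo plus the
 * two-word intermediate register and the one-character shift register.
 */
static uint64_t drain_timeout_us(unsigned int baud, unsigned int fifo_bytes,
				 unsigned int bytes_per_word,
				 unsigned int bits_per_frame)
{
	uint64_t chars, bits;

	assert(baud > 0);
	chars = (uint64_t)fifo_bytes + 2 * bytes_per_word + 1;
	bits = chars * bits_per_frame;

	/* Round up so the timeout is never shorter than the drain time. */
	return (bits * 1000000u + baud - 1) / baud + MARGIN_US;
}
```

For example, at 115200 baud with a 64-byte fifo, 4-byte words and 8N1
framing (10 bits per character), drain_timeout_us(115200, 64, 4, 10)
covers 73 characters and yields 26337 us including the margin.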
Fixes: c4f528795d ("tty: serial: msm_geni_serial: Add serial driver support for GENI based QUP")
Cc: stable@vger.kernel.org # 4.17
Reported-by: Douglas Anderson <dianders@chromium.org>
Tested-by: Nícolas F. R. A. Prado <nfraprado@collabora.com>
Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
Link: https://lore.kernel.org/r/20240906131336.23625-2-johan+linaro@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The Qualcomm GENI serial driver did not handle buffer flushing and used
to print discarded characters when the circular buffer was cleared.
Since commit 1788cf6a91 ("tty: serial: switch from circ_buf to kfifo")
this instead resulted in a hard lockup due to
qcom_geni_serial_send_chunk_fifo() spinning indefinitely in the
interrupt handler.
The underlying bugs have now been fixed, but make sure to output NUL
characters instead of killing the machine if a similar driver bug is
ever reintroduced.
Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
Link: https://lore.kernel.org/r/20240704101805.30612-4-johan+linaro@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The Qualcomm GENI serial driver does not handle buffer flushing and used
to continue printing discarded characters when the circular buffer was
cleared. Since commit 1788cf6a91 ("tty: serial: switch from circ_buf
to kfifo") this instead results in a hard lockup due to
qcom_geni_serial_send_chunk_fifo() spinning indefinitely in the
interrupt handler.
This is easily triggered by interrupting a command such as dmesg in a
serial console but can also happen when stopping a serial getty on
reboot.
Implement the flush_buffer() callback and use it to cancel any active TX
command when the write buffer has been emptied.
Reported-by: Douglas Anderson <dianders@chromium.org>
Link: https://lore.kernel.org/lkml/20240610222515.3023730-1-dianders@chromium.org/
Fixes: 1788cf6a91 ("tty: serial: switch from circ_buf to kfifo")
Fixes: a1fee899e5 ("tty: serial: qcom_geni_serial: Fix softlock")
Cc: stable@vger.kernel.org # 5.0
Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
Link: https://lore.kernel.org/r/20240704101805.30612-3-johan+linaro@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The stop_tx() callback is used to implement software flow control and
must not discard data as the Qualcomm GENI driver is currently doing
when there is an active TX command.
Cancelling an active command can also leave data in the hardware FIFO,
which prevents the watermark interrupt from being enabled when TX is
later restarted. This results in a soft lockup and is easily triggered
by stopping TX using software flow control in a serial console but this
can also happen after suspend.
Fix this by only stopping any active command, and effectively clearing
the hardware fifo, when shutting down the port. When TX is later
restarted, a transfer command may need to be issued to discard any stale
data that could prevent the watermark interrupt from firing.
Fixes: c4f528795d ("tty: serial: msm_geni_serial: Add serial driver support for GENI based QUP")
Cc: stable@vger.kernel.org # 4.17
Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
Link: https://lore.kernel.org/r/20240704101805.30612-2-johan+linaro@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Marek reports that the -next commit 1788cf6a91 ("tty: serial: switch
from circ_buf to kfifo") broke meson_uart and qcom_geni_serial. The
commit mistakenly advanced the kfifo twice: once by
uart_fifo_get()/kfifo_out() and a second time by uart_xmit_advance().
To advance the fifo only once, drop the superfluous uart_xmit_advance()
from both.
To count the TX statistics properly, use uart_fifo_out() in
qcom_geni_serial (meson_uart_start_tx() already uses that).
I checked all other uses of uart_xmit_advance() and they appear correct:
either they are finishing DMA transfers or are after peek/linear_ptr
(i.e. they do not advance fifo).
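The effect of advancing twice can be reproduced with a minimal
user-space sketch. The mini_fifo type and its helpers are hypothetical
stand-ins, only loosely modeled on a fifo whose "out" operation both
copies and advances; they are not the kfifo or serial core API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical linear fifo: data[out..len) is still waiting to be sent. */
struct mini_fifo {
	const char *data;
	size_t out;	/* read position */
	size_t len;	/* total number of bytes queued */
};

/* Copy out up to n bytes AND advance the read position. */
static size_t mini_fifo_out(struct mini_fifo *f, char *dst, size_t n)
{
	size_t avail, i;

	assert(f->out <= f->len);
	avail = f->len - f->out;
	if (n > avail)
		n = avail;
	for (i = 0; i < n; i++)
		dst[i] = f->data[f->out + i];
	f->out += n;
	return n;
}

/* Advance without copying: stands in for the superfluous second advance. */
static void mini_fifo_skip(struct mini_fifo *f, size_t n)
{
	size_t avail;

	assert(f->out <= f->len);
	avail = f->len - f->out;
	f->out += n > avail ? avail : n;
}
```

Calling mini_fifo_out() for two bytes and then mini_fifo_skip() for the
same two bytes makes the next mini_fifo_out() return the fifth and sixth
bytes, so the third and fourth are never transmitted.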
Signed-off-by: Jiri Slaby (SUSE) <jirislaby@kernel.org>
Reported-by: Marek Szyprowski <m.szyprowski@samsung.com>
Fixes: 1788cf6a91 ("tty: serial: switch from circ_buf to kfifo")
Cc: Neil Armstrong <neil.armstrong@linaro.org>
Cc: Kevin Hilman <khilman@baylibre.com>
Cc: Jerome Brunet <jbrunet@baylibre.com>
Cc: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Link: https://lore.kernel.org/r/20240416054825.6211-1-jirislaby@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Pull tty / serial driver updates from Greg KH:
"Here is the big set of TTY/Serial driver updates and cleanups for
6.9-rc1. Included in here are:
- more tty cleanups from Jiri
- loads of 8250 driver cleanups from Andy
- max310x driver updates
- samsung serial driver updates
- uart_prepare_sysrq_char() updates for many drivers
- platform driver remove callback void cleanups
- stm32 driver updates
- other small tty/serial driver updates
All of these have been in linux-next for a long time with no reported
issues"
* tag 'tty-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: (199 commits)
dt-bindings: serial: stm32: add power-domains property
serial: 8250_dw: Replace ACPI device check by a quirk
serial: Lock console when calling into driver before registration
serial: 8250_uniphier: Switch to use uart_read_port_properties()
serial: 8250_tegra: Switch to use uart_read_port_properties()
serial: 8250_pxa: Switch to use uart_read_port_properties()
serial: 8250_omap: Switch to use uart_read_port_properties()
serial: 8250_of: Switch to use uart_read_port_properties()
serial: 8250_lpc18xx: Switch to use uart_read_port_properties()
serial: 8250_ingenic: Switch to use uart_read_port_properties()
serial: 8250_dw: Switch to use uart_read_port_properties()
serial: 8250_bcm7271: Switch to use uart_read_port_properties()
serial: 8250_bcm2835aux: Switch to use uart_read_port_properties()
serial: 8250_aspeed_vuart: Switch to use uart_read_port_properties()
serial: port: Introduce a common helper to read properties
serial: core: Add UPIO_UNKNOWN constant for unknown port type
serial: core: Move struct uart_port::quirks closer to possible values
serial: sh-sci: Call sci_serial_{in,out}() directly
serial: core: only stop transmit when HW fifo is empty
serial: pch: Use uart_prepare_sysrq_char().
...
This reverts commit 5c7e105cd1.
As identified by KASAN, the simplification done by the cleanup patch
was not legal.
From tracing through the code, it can be seen that we're transmitting
from a 4096-byte circular buffer. We copy anywhere from 1-4 bytes from
it each time. The simplification runs into trouble when we get near
the end of the circular buffer. For instance, we might start out with
xmit->tail = 4094 and we want to transfer 4 bytes. With the code
before simplification this was no problem. We'd read buf[4094],
buf[4095], buf[0], and buf[1]. With the new code we'll do a
memcpy(&buf[4094], 4) which reads 2 bytes past the end of the buffer
and then skips transmitting what's at buf[0] and buf[1].
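The wraparound case above can be illustrated with a small user-space
sketch. This is not the driver code: RING_SIZE, pack_bytewise() and
linear_overrun() are hypothetical names, and placing the first byte in
the low bits of the word is an assumption made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 4096u /* power of two, matching the 4096-byte buffer above */

/*
 * Pack up to 4 bytes starting at `tail` into one 32-bit word, wrapping
 * the index at the end of the ring (first byte in the low bits).
 */
static uint32_t pack_bytewise(const uint8_t *ring, size_t tail, size_t n)
{
	uint32_t word = 0;
	size_t i;

	assert(n <= 4);
	for (i = 0; i < n; i++)
		word |= (uint32_t)ring[(tail + i) & (RING_SIZE - 1)] << (8 * i);
	return word;
}

/* Bytes that a linear copy of `n` bytes from `tail` would read past the end. */
static size_t linear_overrun(size_t tail, size_t n)
{
	size_t room;

	assert(tail < RING_SIZE);
	room = RING_SIZE - tail;
	return n > room ? n - room : 0;
}
```

With tail = 4094 and 4 bytes to send, pack_bytewise() reads indices
4094, 4095, 0 and 1, whereas linear_overrun(4094, 4) reports the 2 bytes
that a single linear copy would read beyond the buffer.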
KASAN isn't 100% consistent at reporting this for me, but to be extra
confident in the analysis, I added traces of the tail and tx_bytes and
then wrote a test program:
while true; do
echo -n "abcdefghijklmnopqrstuvwxyz0" > /dev/ttyMSM0
sleep .1
done
I watched the traces over SSH and saw:
qcom_geni_serial_send_chunk_fifo: 4093 4
qcom_geni_serial_send_chunk_fifo: 1 3
Which indicated that one byte should be missing. Sure enough the
output that should have been:
abcdefghijklmnopqrstuvwxyz0
In one case was actually missing a byte:
abcdefghijklmnopqrstuvwyz0
Running "ls -al" on large directories also made the missing bytes
obvious since columns didn't line up.
While the original code may not be the most elegant, we're only talking
about copying up to 4 bytes here. Let's just go back to the code that
worked.
Fixes: 5c7e105cd1 ("tty: serial: simplify qcom_geni_serial_send_chunk_fifo()")
Cc: stable <stable@kernel.org>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Acked-by: Jiri Slaby <jirislaby@kernel.org>
Tested-by: Johan Hovold <johan+linaro@kernel.org>
Link: https://lore.kernel.org/r/20240304174952.1.I920a314049b345efd1f69d708e7f74d2213d0b49@changeid
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
As of commit d7402513c9 ("arm64: smp: IPI_CPU_STOP and
IPI_CPU_CRASH_STOP should try for NMI"), if we've got pseudo-NMI
enabled then we'll use it to stop CPUs at panic time. This is nice,
but it does mean that there's a pretty good chance that we'll end up
stopping a CPU while it holds the port lock for the console
UART. Specifically, I see a CPU get stopped while holding the port
lock nearly 100% of the time on my sc7180-trogdor based Chromebook by
enabling the "buddy" hardlockup detector and then doing:
sysctl -w kernel.hardlockup_all_cpu_backtrace=1
sysctl -w kernel.hardlockup_panic=1
echo HARDLOCKUP > /sys/kernel/debug/provoke-crash/DIRECT
UART drivers are _supposed_ to handle this case OK and this is why
UART drivers check "oops_in_progress" and only do a "trylock" in that
case. However, before we enabled pseudo-NMI to stop CPUs it wasn't a
very well-tested situation.
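The "trylock when an oops is in progress" pattern mentioned above can be
sketched in user space as follows. The fake_port type and its helpers
are hypothetical stand-ins built on C11 atomics, not the kernel's
spinlock or uart_port API, and the sketch ignores interrupt state.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a UART port protected by a spinlock. */
struct fake_port {
	atomic_bool locked;
};

static bool fake_port_trylock(struct fake_port *p)
{
	/* The exchange returns the previous value: false means we got it. */
	return !atomic_exchange(&p->locked, true);
}

static void fake_port_lock(struct fake_port *p)
{
	while (!fake_port_trylock(p))
		;	/* spin until the holder releases the lock */
}

static void fake_port_unlock(struct fake_port *p)
{
	assert(atomic_load(&p->locked));
	atomic_store(&p->locked, false);
}

/*
 * Take the port lock for a console write. Returns true if the lock is
 * held and must be released afterwards, false if the write has to
 * proceed without it (the holder may be a CPU that was stopped).
 */
static bool console_lock_port(struct fake_port *p, bool oops_in_progress)
{
	if (oops_in_progress)
		return fake_port_trylock(p);

	fake_port_lock(p);
	return true;
}
```

The point of the pattern is that the oops path never spins: if the lock
holder has been stopped, console_lock_port() returns false immediately
and the caller must cope with not owning the port.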
Now that we're testing the situation a lot, it can be seen that the
Qualcomm GENI UART driver is pretty broken. Specifically, when I run
my test case and look at the console output I just see a bunch of
garbled output like:
[ 201.069084] NMI backtrace[ 201.069084] NM[ 201.069087] CPU: 6
PID: 10296 Comm: dnsproxyd Not tainted 6.7.0-06265-gb13e8c0ede12
#1 01112b9f14923cbd0b[ 201.069090] Hardware name: Google Lazor
([ 201.069092] pstate: 80400009 (Nzcv daif +PAN -UAO -TCO -DI[
201.069095] pc : smp_call_function_man[ 201.069099]
That's obviously not so great. This happens because each call to the
console driver exits after the data has been written to the FIFO but
before it's actually been flushed out of the serial port. When we have
multiple calls into the console one after the other then (if we can't
get the lock) each call tells the UART to throw away any data in the
FIFO that hadn't been transferred yet.
I've posted up a patch to change the arm64 core to avoid this
situation most of the time [1] much like x86 seems to do, but even if
that patch lands the GENI driver should still be fixed.
From testing, it appears that we can just delete the cancel/abort in
the case where we weren't able to get the UART lock and the output
looks good. It makes sense that we'd be able to do this since that
means we'll just call into __qcom_geni_serial_console_write() and
__qcom_geni_serial_console_write() looks much like
qcom_geni_serial_poll_put_char() but with a loop. However, it seems
safest to poll the FIFO and make sure it's empty before our
transfer. This should reliably make sure that we're not
interrupting/clobbering any existing transfers.
As part of this change, we'll also avoid re-setting up a TX at the end
of the console write function if we weren't able to get the lock,
since accessing "port->tx_remaining" without the lock is not
safe. This is only needed to re-start userspace initiated transfers.
[1] https://lore.kernel.org/r/20231207170251.1.Id4817adef610302554b8aa42b090d57270dc119c@changeid
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Link: https://lore.kernel.org/r/20240112150307.2.Idb1553d1d22123c377f31eacb4486432f6c9ac8d@changeid
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is ignored (apart
from emitting a warning) and this typically results in resource leaks.
To improve here there is a quest to make the remove callback return
void. In the first step of this quest all drivers are converted to
.remove_new(), which already returns void. Eventually after all drivers
are converted, .remove_new() will be renamed to .remove().
Trivially convert this driver from always returning zero in the remove
callback to the void returning variant.
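The shape of the conversion can be sketched in user space as follows.
The fake_pdev and fake_driver types and fake_core_unbind() are
hypothetical stand-ins for the platform bus, reduced to the two callback
variants discussed above; the real core also warns about a non-zero
return value, which this model omits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for a platform device and a platform driver. */
struct fake_pdev {
	bool registered;
};

struct fake_driver {
	int (*remove)(struct fake_pdev *pdev);		/* legacy variant */
	void (*remove_new)(struct fake_pdev *pdev);	/* void-returning variant */
};

/* Before: always returns zero, so the int carries no information. */
static int demo_remove_legacy(struct fake_pdev *pdev)
{
	pdev->registered = false;
	return 0;
}

/* After: same teardown, but there is no return value to misuse. */
static void demo_remove(struct fake_pdev *pdev)
{
	pdev->registered = false;
}

/* Simplified model of the core: unbinding proceeds whatever remove() returns. */
static void fake_core_unbind(const struct fake_driver *drv,
			     struct fake_pdev *pdev)
{
	assert(drv->remove || drv->remove_new);
	if (drv->remove_new)
		drv->remove_new(pdev);
	else
		(void)drv->remove(pdev);
}
```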
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Link: https://lore.kernel.org/r/20231110152927.70601-32-u.kleine-koenig@pengutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When a serial port is used for kernel console output, then all
modifications to the UART registers which are done from other contexts,
e.g. getty, termios, are interference points for the kernel console.
So far this has been ignored and the printk output is based on the
principle of hope. The rework of the console infrastructure, which aims to
support threaded and atomic consoles, requires marking sections which
modify the UART registers as unsafe. This allows the atomic write function
to make informed decisions and eventually to restore operational state. It
also makes it possible to prevent the regular UART code from modifying UART
registers while printk output is in progress.
All modifications of UART registers are guarded by the UART port lock,
which provides an obvious synchronization point with the console
infrastructure.
To avoid adding this functionality to all UART drivers, wrap the
spin_[un]lock*() invocations for uart_port::lock into helper functions
which just contain the spin_[un]lock*() invocations for now. In a
subsequent step these helpers will gain the console synchronization
mechanisms.
Converted with coccinelle. No functional change.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Bjorn Andersson <quic_bjorande@quicinc.com>
Signed-off-by: John Ogness <john.ogness@linutronix.de>
Link: https://lore.kernel.org/r/20230914183831.587273-50-john.ogness@linutronix.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The DT of_device.h and of_platform.h date back to the separate
of_platform_bus_type before it was merged into the regular platform bus.
As part of that merge prepping Arm DT support 13 years ago, they
"temporarily" include each other. They also include platform_device.h
and of.h. As a result, there's a pretty much random mix of those include
files used throughout the tree. In order to detangle these headers and
replace the implicit includes with struct declarations, users need to
explicitly include the correct includes.
Signed-off-by: Rob Herring <robh@kernel.org>
Acked-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> # for imx
Link: https://lore.kernel.org/r/20230724205440.767071-1-robh@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The runtime PM state should not be changed by drivers that do not
implement runtime PM even if it happens to work around a bug in PM core.
With the wake irq arming now fixed, drop the bogus runtime PM state
update which left the device in active state (and could potentially
prevent a parent device from suspending).
Fixes: f3974413cf ("tty: serial: qcom_geni_serial: Wakeup IRQ cleanup")
Cc: 5.6+ <stable@vger.kernel.org> # 5.6+
Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
Reviewed-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The driver has a race, experienced only with the PREEMPT_RT patchset:
CPU0 | CPU1
==================================================================
qcom_geni_serial_probe |
uart_add_one_port |
| serdev_drv_probe
| qca_serdev_probe
| serdev_device_open
| uart_open
| uart_startup
| qcom_geni_serial_startup
| enable_irq
| __irq_startup
| WARN_ON()
| IRQ not activated
request_threaded_irq |
irq_domain_activate_irq |
The warning:
894000.serial: ttyHS1 at MMIO 0x894000 (irq = 144, base_baud = 0) is a MSM
serial serial0: tty port ttyHS1 registered
WARNING: CPU: 7 PID: 107 at kernel/irq/chip.c:241 __irq_startup+0x78/0xd8
...
qcom_geni_serial 894000.serial: serial engine reports 0 RX bytes in!
Adding the UART port triggers a probe of the child serial devices (serdev
and eventually the Qualcomm Bluetooth hci_qca driver). This opens the UART
port, which enables the interrupt before it has been activated in
request_threaded_irq(). The issue originates in commit f3974413cf
("tty: serial: qcom_geni_serial: Wakeup IRQ cleanup") and the discussion on
the mailing list [1]. However, the above commit does not explain why
uart_add_one_port() was moved above requesting the interrupt.
[1] https://lore.kernel.org/all/5d9f3dfa.1c69fb81.84c4b.30bf@mx.google.com/
Fixes: f3974413cf ("tty: serial: qcom_geni_serial: Wakeup IRQ cleanup")
Cc: <stable@vger.kernel.org>
Cc: Stephen Boyd <swboyd@chromium.org>
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Link: https://lore.kernel.org/r/20230505152301.2181270-1-krzysztof.kozlowski@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
On sc7180 Chromebooks, I did the following:
* Didn't enable earlycon in the kernel command line.
* Didn't enable serial console in the kernel command line.
* Didn't enable an agetty or any other client of "/dev/ttyMSM0".
* Added "kgdboc=ttyMSM0" to the kernel command line.
After I did that, I tried to enter kdb with this command over an ssh
session:
echo g > /proc/sysrq-trigger
When I did that the system just hung.
Although I thought I'd tested this scenario before, I couldn't go back
and find a time when it was working. Previous testing must have relied
on either the UART acting as the kernel console or an agetty running.
It turns out to be pretty easy to fix: we can just use
qcom_geni_serial_port_setup() as the .poll_init() function. This,
together with the patch ("serial: uart_poll_init() should power on the
UART"), allows the debugger to work even if there are no other users
of the serial port.
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Link: https://lore.kernel.org/r/20230316132027.RESEND.2.Ie678853bb101091afe78cc8c22344bf3ff3aed74@changeid
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
A recent commit added back the calls to stop tx and rx to shutdown()
which had previously been removed by commit e83766334f ("tty: serial:
qcom_geni_serial: No need to stop tx/rx on UART shutdown") in order to
be able to use kgdb after stopping the getty.
Not only did this again break kgdb, but it also broke serial consoles
more generally by hanging TX when stopping the getty during reboot.
The underlying problem has been there since the driver was first merged
and fixing it is going to be a bit involved so simply stop calling the
broken stop functions during shutdown for consoles for now.
Fixes: d8aca2f968 ("tty: serial: qcom-geni-serial: stop operations in progress at shutdown")
Cc: stable <stable@kernel.org>
Cc: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Signed-off-by: Johan Hovold <johan+linaro@kernel.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Tested-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org>
Reviewed-by: Andrew Halaney <ahalaney@redhat.com>
Tested-by: Andrew Halaney <ahalaney@redhat.com> # sa8540p-ride
Link: https://lore.kernel.org/r/20230307164405.14218-2-johan+linaro@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
When -Woverride-init is enabled in a build, gcc points out that
qcom_geni_serial_pm_ops contains conflicting initializers:
drivers/tty/serial/qcom_geni_serial.c:1586:20: error: initialized field overwritten [-Werror=override-init]
1586 | .restore = qcom_geni_serial_sys_hib_resume,
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/tty/serial/qcom_geni_serial.c:1586:20: note: (near initialization for 'qcom_geni_serial_pm_ops.restore')
drivers/tty/serial/qcom_geni_serial.c:1587:17: error: initialized field overwritten [-Werror=override-init]
1587 | .thaw = qcom_geni_serial_sys_hib_resume,
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Open-code the initializers with the version that was already used,
and use the pm_sleep_ptr() method to deal with unused ones,
in place of the __maybe_unused annotation.
Fixes: 35781d8356 ("tty: serial: qcom-geni-serial: Add support for Hibernation feature")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Link: https://lore.kernel.org/r/20221215165453.1864836-1-arnd@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The driver's probe allocates memory for the RX FIFO (port->rx_fifo) based
on the default RX FIFO depth, e.g. 16. Later, during serial startup,
qcom_geni_serial_port_setup() updates the RX FIFO depth
(port->rx_fifo_depth) to match the real device capabilities, e.g. to 32.
The RX UART handling code will read "port->rx_fifo_depth" number of words
into the "port->rx_fifo" buffer, thus exceeding the bounds. This can be
observed in certain configurations with Qualcomm Bluetooth HCI UART
device and KASAN:
Bluetooth: hci0: QCA Product ID :0x00000010
Bluetooth: hci0: QCA SOC Version :0x400a0200
Bluetooth: hci0: QCA ROM Version :0x00000200
Bluetooth: hci0: QCA Patch Version:0x00000d2b
Bluetooth: hci0: QCA controller version 0x02000200
Bluetooth: hci0: QCA Downloading qca/htbtfw20.tlv
bluetooth hci0: Direct firmware load for qca/htbtfw20.tlv failed with error -2
Bluetooth: hci0: QCA Failed to request file: qca/htbtfw20.tlv (-2)
Bluetooth: hci0: QCA Failed to download patch (-2)
==================================================================
BUG: KASAN: slab-out-of-bounds in handle_rx_uart+0xa8/0x18c
Write of size 4 at addr ffff279347d578c0 by task swapper/0/0
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.1.0-rt5-00350-gb2450b7e00be-dirty #26
Hardware name: Qualcomm Technologies, Inc. Robotics RB5 (DT)
Call trace:
dump_backtrace.part.0+0xe0/0xf0
show_stack+0x18/0x40
dump_stack_lvl+0x8c/0xb8
print_report+0x188/0x488
kasan_report+0xb4/0x100
__asan_store4+0x80/0xa4
handle_rx_uart+0xa8/0x18c
qcom_geni_serial_handle_rx+0x84/0x9c
qcom_geni_serial_isr+0x24c/0x760
__handle_irq_event_percpu+0x108/0x500
handle_irq_event+0x6c/0x110
handle_fasteoi_irq+0x138/0x2cc
generic_handle_domain_irq+0x48/0x64
If the RX FIFO depth changes after probe, be sure to resize the buffer.
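The resize step can be sketched in user space as follows. The rx_port
type and rx_port_set_depth() are hypothetical, and realloc() stands in
for whichever kernel allocator the driver actually uses; the sketch only
shows that the buffer has to follow the depth whenever the depth changes.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the port state relevant to RX buffering. */
struct rx_port {
	uint32_t *rx_fifo;	/* one 32-bit word per fifo entry */
	unsigned int rx_depth;	/* number of words rx_fifo can hold */
};

/*
 * Make the buffer match the depth reported by the hardware. Returns 0
 * on success and -1 on allocation failure, in which case the old buffer
 * and depth are left untouched.
 */
static int rx_port_set_depth(struct rx_port *p, unsigned int new_depth)
{
	uint32_t *buf;

	assert(new_depth > 0);
	if (p->rx_fifo && new_depth == p->rx_depth)
		return 0;

	buf = realloc(p->rx_fifo, new_depth * sizeof(*buf));
	if (!buf)
		return -1;

	p->rx_fifo = buf;
	p->rx_depth = new_depth;
	return 0;
}
```

Updating rx_depth only after a successful allocation keeps the two
fields consistent, so a reader that copies rx_depth words never exceeds
the buffer.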
Fixes: f9d690b6ec ("tty: serial: qcom_geni_serial: Allocate port->rx_fifo buffer in probe")
Cc: <stable@vger.kernel.org>
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Reviewed-by: Jiri Slaby <jirislaby@kernel.org>
Link: https://lore.kernel.org/r/20221221164022.1087814-1-krzysztof.kozlowski@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>