There is another cause for soft lock-up of GPU in empty ring-buffer:
race between GPU executing last commands and CPU checking ring for
emptiness. On GPU side IRQ for retire is triggered by CACHE_FLUSH_TS
event and RPTR shadow (which is used to check ring emptiness) is updated
a bit later from CP_CONTEXT_SWITCH_YIELD. Thus if GPU is executing its
last commands slow enough or we check that ring too fast we will miss a
chance to trigger switch to lower priority ring because current ring isn't
empty just yet. This can escalate to lock-up situation described in
previous patch.
To work-around this issue we keep track of last submit sequence number
for each ring and compare it with one written to memptrs from GPU during
execution of CACHE_FLUSH_TS event.
Fixes: b1fc2839d2 ("drm/msm: Implement preemption for A5XX targets")
Signed-off-by: Vladimir Lypak <vladimir.lypak@gmail.com>
Patchwork: https://patchwork.freedesktop.org/patch/612047/
Signed-off-by: Rob Clark <robdclark@chromium.org>
Fine grain preemption (switching from/to points within submits)
requires extra handling in command stream of those submits, especially
when rendering with tiling (using GMEM). However this handling is
missing at this point in mesa (and always was). For this reason we get
random GPU faults and hangs if more than one priority level is used
because local preemption is enabled prior to executing command stream
from submit.
With that said it was ahead of time to enable local preemption by
default considering the fact that even on downstream kernel it is only
enabled if requested via UAPI.
Fixes: a7a4c19c36 ("drm/msm/a5xx: fix setting of the CP_PREEMPT_ENABLE_LOCAL register")
Signed-off-by: Vladimir Lypak <vladimir.lypak@gmail.com>
Patchwork: https://patchwork.freedesktop.org/patch/612041/
Signed-off-by: Rob Clark <robdclark@chromium.org>
According to downstream we should be setting RBBM_NC_MODE_CNTL to a
non-default value on a663 and a680, we don't support a663 and on a680
we're leaving it at the wrong (suboptimal) value. Just set it on all
GPUs. Similarly, plumb through level2_swizzling_dis which will be
necessary on a663.
ubwc_mode is expanded and renamed to ubwc_swizzle to match the name on
the display side. Similarly macrotile_mode should match the display
side.
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
Patchwork: https://patchwork.freedesktop.org/patch/607397/
Signed-off-by: Rob Clark <robdclark@chromium.org>
Since the revision becomes an opaque identifier with future GPUs, move
away from treating different ranges of bits as having a given meaning.
This means that we need to explicitly list different patch revisions in
the device table.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Patchwork: https://patchwork.freedesktop.org/patch/549782/
The commit 010c8bbad2 ("drm: msm: adreno: Disable preemption on Adreno
510") added special handling for a510 (this SKU doesn't seem to support
preemption, so the driver should clamp nr_rings to 1). However the
gpu->revn is not yet set (it is set later, in adreno_gpu_init()) and
thus the condition is always false. Check config->rev instead.
Fixes: 010c8bbad2 ("drm: msm: adreno: Disable preemption on Adreno 510")
Reported-by: Adam Skladowski <a39.skl@gmail.com>
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Tested-by: Adam Skladowski <a39.skl@gmail.com>
Patchwork: https://patchwork.freedesktop.org/patch/531511/
Signed-off-by: Rob Clark <robdclark@chromium.org>
msm-fixes for v6.3-rc2
- Fix for possible invalid ptr free in submit ioctl syncobj cleanup path.
- Synchronize GMU removal in driver teardown path
- a5xx preemption fixes
- Fix runpm imbalance at unbind
- DPU hw catalog fixes:
- set DPU_MDP_PERIPH_0_REMOVED for sc8280xp as this is another chipset
where the PERIPH_0 block of registers is not there
- fix the DPU features supported in QCM2290 by comparing it with the
downstream device tree
- fix the length of registers in the sc7180_ctl from 0xe4 to 0x1dc
- fix the max mixer line width for sm6115 and qcm2290 chipsets in the
DPU catalog
- fix the scaler version on sm8550, sc8280xp, sm8450, sm8250, sm8350
and sm6115. This was incorrectly populated on the SW version of the
scaler library and not the scaler HW version
- Drop dim layer support for msm8998 as its not indicated to be
supported in the downstream DTSI
- fix the DPU_CLK_CTRL bits for msm 8998 sspp blocks
- Use DPU_CLK_CTRL_DMA* prefix instead of DPU_CLK_CTRL_CURSOR*
for all chipsets for the DMA sspp blocks
- fix the ping-pong block base address for sc7280 in the DPU HW catalog
- Fix stack corruption issue in the dpu_hw_ctl_setup_blendstage() function
as it was causing a negative left shift by protecting against an invalid
index
- Clear the DSPP reservations in dpu_rm_release(). This was missed out and
as as result the DSPP was not released from the resource manager global
state.
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Rob Clark <robdclark@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/CAF6AEGvH+VH_Wx3mFMG51CMnoiU06CM-+-WMhM73M42Qx7Bp4A@mail.gmail.com
From testing on sc7180-trogdor devices, reading the GMU registers
needs the GMU clocks to be enabled. Those clocks get turned on in
a6xx_gmu_resume(). Confusingly enough, that function is called as a
result of the runtime_pm of the GPU "struct device", not the GMU
"struct device". Unfortunately the current a6xx_gpu_busy() grabs a
reference to the GMU's "struct device".
The fact that we were grabbing the wrong reference was easily seen to
cause crashes that happen if we change the GPU's pm_runtime usage to
not use autosuspend. It's also believed to cause some long tail GPU
crashes even with autosuspend.
We could look at changing it so that we do pm_runtime_get_if_in_use()
on the GPU's "struct device", but then we run into a different
problem. pm_runtime_get_if_in_use() will return 0 for the GPU's
"struct device" the whole time when we're in the "autosuspend
delay". That is, when we drop the last reference to the GPU but we're
waiting a period before actually suspending then we'll think the GPU
is off. One reason that's bad is that if the GPU didn't actually turn
off then the cycle counter doesn't lose state and that throws off all
of our calculations.
Let's change the code to keep track of the suspend state of
devfreq. msm_devfreq_suspend() is always called before we actually
suspend the GPU and msm_devfreq_resume() after we resume it. This
means we can use the suspended state to know if we're powered or not.
NOTE: one might wonder when exactly our status function is called when
devfreq is supposed to be disabled. The stack crawl I captured was:
msm_devfreq_get_dev_status
devfreq_simple_ondemand_func
devfreq_update_target
qos_notifier_call
qos_max_notifier_call
blocking_notifier_call_chain
pm_qos_update_target
freq_qos_apply
apply_constraint
__dev_pm_qos_update_request
dev_pm_qos_update_request
msm_devfreq_idle_work
Fixes: eadf79286a ("drm/msm: Check for powered down HW in the devfreq callbacks")
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Patchwork: https://patchwork.freedesktop.org/patch/489124/
Link: https://lore.kernel.org/r/20220610124639.v4.1.Ie846c5352bc307ee4248d7cab998ab3016b85d06@changeid
Signed-off-by: Rob Clark <robdclark@chromium.org>
A CP_PROTECT entry for SMMU registers is missing for A540. According to
downstream sources its length is same as on A530 - 0x20000 bytes.
On all other revisions SMMU region length is 0x10000 bytes. Despite
this, we setup region of length 0x20000 on all revisions. This doesn't
cause any issues on those GPUs. As for preventing accesses to the region
from protected mode it was tested to work the same.
This patch drops the "if" condition in setup of CP_PROTECT entry because
it already includes all supported revisions except A540.
Signed-off-by: Vladimir Lypak <vladimir.lypak@gmail.com>
Link: https://lore.kernel.org/r/20211212160333.980343-2-vladimir.lypak@gmail.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
On a5xx and a6xx devices that are using CP_WHERE_AM_I to update a
ringbuffer read-ptr shadow value, periodically emit a CP_WHERE_AM_I
every 32 commands, so that a later submit waiting for ringbuffer
space to become available sees partial progress, rather than not
seeing rptr advance at all until the GPU gets to the end of the
submit that it is currently chewing on.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Acked-by: Jordan Crouse <jordan@cosmicpenguin.net>
Link: https://lore.kernel.org/r/20210428193654.1498482-3-robdclark@gmail.com
Signed-off-by: Rob Clark <robdclark@chromium.org>
The upstream API for some reason uses logbase2 instead of
just passing the argument as-is, whereas downstream CAF
kernel does the latter.
Hence, a mistake has been made when porting:
4 is the value that's supposed to be passed, but
log2(4) = 2. Changing the value to 16 (= 2^4) fixes
the issue.
Signed-off-by: Konrad Dybcio <konrad.dybcio@somainline.org>
Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@somainline.org>
Reviewed-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@chromium.org>
Resetting the VBIF before power collapse is done to avoid getting
bogus FIFO entries during the suspend sequence or subsequent resume,
but this is doable only on Adreno 510 and Adreno 530, as the other
units will tendentially lock up.
Especially on Adreno 508, the GPU will show lockups and very bad
slownesses after processing the first frame.
Avoiding to execute the RBBM SW Reset before suspend will stop the
lockup issue from happening on at least Adreno 508/509/512.
Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@somainline.org>
Reviewed-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@chromium.org>
The Adreno 508/509/512 GPUs are stripped versions of the Adreno
5xx found in the mid-end SoCs such as SDM630, SDM636, SDM660 and
SDA variants; these SoCs are usually provided with ZAP firmwares,
but they have no available GPMU.
Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@somainline.org>
Tested-by: Martin Botka <martin.botka1@gmail.com>
Reviewed-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@chromium.org>
The "main" if branch where we program the other registers for the
Adreno 5xx family of GPUs should not contain the PC_DBG_ECO_CNTL
register programming because this has logical similarity
differences from all the others.
A later commit will show the entire sense of this.
Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@somainline.org>
Reviewed-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@chromium.org>
The PC_DBG_ECO_CNTL register on the Adreno A5xx family gets
programmed to some different values on a per-model basis.
At least, this is what we intend to do here;
Unfortunately, though, this register is being overwritten with a
static magic number, right after applying the GPU-specific
configuration (including the GPU-specific quirks) and that is
effectively nullifying the efforts.
Let's remove the redundant and wrong write to the PC_DBG_ECO_CNTL
register in order to retain the wanted configuration for the
target GPU.
Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@somainline.org>
Reviewed-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@chromium.org>
nr_rings is reset to 1, but when this function is called for a second
(and third!) time nr_rings > 1 is false, thus the else case is entered
to set up a buffer for the RPTR shadow and consequently written to
RB_RPTR_ADDR, hanging platforms without WHERE_AM_I firmware support.
Restructure the condition in such a way that shadow buffer setup only
ever happens when has_whereami is true; otherwise preemption is only
finalized when the number of ring buffers has not been reset to 1 yet.
Fixes: 8907afb476 ("drm/msm: Allow a5xx to mark the RPTR shadow as privileged")
Signed-off-by: Marijn Suijten <marijn.suijten@somainline.org>
Tested-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@somainline.org>
Reviewed-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@chromium.org>
Similar to the previous patch, clear shadow on suspend to avoid timeouts
waiting for ringbuffer space.
Fixes: 8907afb476 ("drm/msm: Allow a5xx to mark the RPTR shadow as privileged")
Signed-off-by: Rob Clark <robdclark@chromium.org>
The microcode bo's should never be madvise(WONTNEED), so these should
not be using msm_gem_get_vaddr_active().
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Signed-off-by: Rob Clark <robdclark@chromium.org>
* DSI support for sm8150/sm8250
* Support for per-process GPU pagetables (finally!) for a6xx.
There are still some iommu/arm-smmu changes required to
enable, without which it will fallback to the current single
pgtable state. The first part (ie. what doesn't depend on
drm side patches) is queued up for v5.10[1].
* DisplayPort support. Userspace DP compliance tool support
is already merged in IGT[2]
* The usual assortment of smaller fixes/cleanups
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Rob Clark <robdclark@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/CAF6AEGvqjuzH=Po_9EzzFsp2Xq3tqJUTKfsA2g09XY7_+6Ypfw@mail.gmail.com