There are cases when we want to have separate DRM devices for GPU and
display pipelines.
One example is development, when it is beneficial to be able to bind the
GPU driver separately, without the display pipeline (and without the
hacks adding "amd,imageon" to the compatible string).
Another example is some of Qualcomm platforms, which have two MDSS
units, but only one GPU. With current approach it is next to impossible
to support this usecase properly, while separate binding allows users to
have three DRM devices: two for MDSS units and a single headless GPU.
Add kernel param msm.separate_gpu_kms, which if set to true forces
creation of separate display and GPU DRM devices. Mesa supports this
setup by using the kmsro wrapper.
The param is disabled by default, in order to be able to test userspace
for the compatibility issues. Simple clients are able to handle this
setup automatically.
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Patchwork: https://patchwork.freedesktop.org/patch/662590/
[Rob: renamed the modparam to separate_gpu_kms, and add missing
DRIVER_GEM_GPUVA]
Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com>
Currently the msm driver creates an extra interim platform device for
Imageon GPUs. This is not ideal, as the device doesn't have
corresponding OF node. If the headless mode is used for newer GPUs, then
the msm_use_mmu() function can not detect corresponding IOMMU devices.
Also the DRM device (although it's headless) is created with modesetting
flags being set.
To solve all these issues, rework the way the Imageon devices are bound.
Remove the interim device, don't register a component and instead use a
cut-down version of the normal functions to probe or remove the driver.
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Patchwork: https://patchwork.freedesktop.org/patch/662584/
Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com>
When userspace opts in to VM_BIND, the submit no longer holds references
keeping the VMA alive. This makes it difficult to distinguish between
UMD/KMD/app bugs. So add a debug option for logging the most recent VM
updates and capturing these in GPU devcoredumps.
The submitqueue id is also captured, a value of zero means the operation
did not go via a submitqueue (ie. comes from msm_gem_vm_close() tearing
down the remaining mappings when the device file is closed.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com>
Tested-by: Antonino Maniscalco <antomani103@gmail.com>
Reviewed-by: Antonino Maniscalco <antomani103@gmail.com>
Patchwork: https://patchwork.freedesktop.org/patch/661518/
With user managed VMs and multiple queues, it is in theory possible to
trigger map/unmap errors. These will (in a later patch) mark the VM as
unusable. But we want to tell the io-pgtable helpers not to spam the
log. In addition, in the unmap path, we don't want to bail early from
the unmap, to ensure we don't leave some dangling pages mapped.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com>
Tested-by: Antonino Maniscalco <antomani103@gmail.com>
Reviewed-by: Antonino Maniscalco <antomani103@gmail.com>
Patchwork: https://patchwork.freedesktop.org/patch/661520/
Add PRR (Partial Resident Region) is a bypass address which make GPU
writes go to /dev/null and reads return zero. This is used to implement
vulkan sparse residency.
To support PRR/NULL mappings, we allocate a page to reserve a physical
address which we know will not be used as part of a GEM object, and
configure the SMMU to use this address for PRR/NULL mappings.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com>
Tested-by: Antonino Maniscalco <antomani103@gmail.com>
Reviewed-by: Antonino Maniscalco <antomani103@gmail.com>
Patchwork: https://patchwork.freedesktop.org/patch/661486/
Now that we've realigned deletion and allocation, switch over to using
drm_gpuvm/drm_gpuva. This allows us to support multiple VMAs per BO per
VM, to allow mapping different parts of a single BO at different virtual
addresses, which is a key requirement for sparse/VM_BIND.
This prepares us for using drm_gpuvm to translate a batch of MAP/
MAP_NULL/UNMAP operations from userspace into a sequence of map/remap/
unmap steps for updating the page tables.
Since, unlike our prior vm/vma setup, with drm_gpuvm the vm_bo holds a
reference to the GEM object. To prevent reference loops causing us to
leak all GEM objects, we implicitly tear down the mapping when the GEM
handle is close or when the obj is unpinned. Which means the submit
needs to also hold a reference to the vm_bo, to prevent the VMA from
being torn down while the submit is in-flight.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com>
Tested-by: Antonino Maniscalco <antomani103@gmail.com>
Reviewed-by: Antonino Maniscalco <antomani103@gmail.com>
Patchwork: https://patchwork.freedesktop.org/patch/661479/
Back-merge drm-next to (indirectly) get arm-smmu updates for making
stall-on-fault more reliable.
Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com>
If we have a newer dtb than kernel, we could end up in a situation where
the GPU device is present in the dtb, but not in the drivers device
table. We don't want this to prevent the display from probing. So
check that we recognize the GPU before adding the GPU component.
v2: use %pOF
Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Patchwork: https://patchwork.freedesktop.org/patch/657701/
Calling this packet is necessary when we switch contexts because there
are various pieces of state used by userspace to synchronize between BR
and BV that are persistent across submits and we need to make sure that
they are in a "safe" state when switching contexts. Otherwise a
userspace submission in one context could cause another context to
function incorrectly and hang, effectively a denial of service (although
without leaking data). This was missed during initial a7xx bringup.
Fixes: af66706acc ("drm/msm/a6xx: Add skeleton A7xx support")
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
Patchwork: https://patchwork.freedesktop.org/patch/654924/
Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com>
When things go wrong, the GPU is capable of quickly generating millions
of faulting translation requests per second. When that happens, in the
stall-on-fault model each access will stall until it wins the race to
signal the fault and then the RESUME register is written. This slows
processing page faults to a crawl as the GPU can generate faults much
faster than the CPU can acknowledge them. It also means that all
available resources in the SMMU are saturated waiting for the stalled
transactions, so that other transactions such as transactions generated
by the GMU, which shares translation resources with the GPU, cannot
proceed. This causes a GMU watchdog timeout, which leads to a failed
reset because GX cannot collapse when there is a transaction pending and
a permanently hung GPU.
On older platforms with qcom,smmu-v2, it seems that when one transaction
is stalled subsequent faulting transactions are terminated, which avoids
this problem, but the MMU-500 follows the spec here.
To work around these problems, disable stall-on-fault as soon as we get a
page fault until a cooldown period after pagefaults stop. This allows
the GMU some guaranteed time to continue working. We only use
stall-on-fault to halt the GPU while we collect a devcoredump and we
always terminate the transaction afterward, so it's fine to miss some
subsequent page faults. We also keep it disabled so long as the current
devcoredump hasn't been deleted, because in that case we likely won't
capture another one if there's a fault.
After this commit HFI messages still occasionally time out, because the
crashdump handler doesn't run fast enough to let the GMU resume, but the
driver seems to recover from it. This will probably go away after the
HFI timeout is increased.
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Rob Clark <robdclark@gmail.com>
Patchwork: https://patchwork.freedesktop.org/patch/654891/
Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com>
The driver handles the case where gpu fw is not in the initrd. OTOH it
doesn't always handle the case where _some_ fw is in the initrd, but
others are not. In particular the zap fw tends to be signed with an OEM
specific key, so the paths/names differ across devices with the same
SoC/GPU, so we cannot sanely list them with MODULE_FIRMWARE().
So MODULE_FIRMWARE() just ends up causing problems without actually
solving anything. Remove them!
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Patchwork: https://patchwork.freedesktop.org/patch/652195/
Really the only purpose of this was to limit the address space size to
4GB to avoid 32b rollover problems in 64b pointer math in older sqe fw.
So replace the address_space_size with a quirk limiting the address
space to 4GB. In all other cases, use the SMMU input address size (IAS)
to determine the address space size.
v2: Properly account for vm_start
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Patchwork: https://patchwork.freedesktop.org/patch/649467/
If the GMU takes too long to respond to an HFI message, we may return
early. If the GMU does eventually respond, and then we send a second
message, we will see the response for the first, throw another error,
and keep going. But we don't currently wait for the interrupt from the
GMU again, so if the second response isn't there immediately we may
prematurely return. This can cause a continuous cycle of missed HFI
messages, and for reasons I don't quite understand the GMU does not shut
down properly when this happens.
Fix this by waiting for the GMU interrupt when we see an empty queue. If
the GMU never responds then the queue really is empty and we quit. We
can't wait for the interrupt when we see a wrong response seqnum because
the GMU might have already queued both responses by the time we clear
the interrupt the first time so we do need to check the queue before
waiting on the interrupt again.
Signed-off-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Patchwork: https://patchwork.freedesktop.org/patch/650013/
Signed-off-by: Rob Clark <robdclark@chromium.org>
Merge drm-misc-next to get commit Fixes: fec450ca15 ("drm/display:
hdmi: provide central data authority for ACR params").
Signed-off-by: Rob Clark <robdclark@chromium.org>