With the conversion to drm_gpuvm, we lost the lazy VMA cleanup, which
means that the fb cleanup/unpin path, when pageflipping to a new
scanout buffer, immediately unmaps the old scanout buffer. This is
costly: with tlbinv, it can take 4-6ms for a 1080p scanout buffer, and
more for higher resolutions!
To avoid this, introduce a vma_ref, which is incremented whenever
userspace has a GEM handle or dma-buf fd for the buffer. When
unpinning, if the vm is the kms->vm, we defer tearing down the VMA
until the vma_ref drops to zero. If the buffer is still part of a
flip-chain, userspace will be holding some sort of reference to the
BO, via a GEM handle and/or a dma-buf fd. So this avoids unmapping
the VMA when there is a strong possibility that it will be needed
again.
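As a rough sketch of the scheme (field and helper names here are
illustrative, not necessarily the exact driver API):

    /* Bumped whenever userspace gains a GEM handle or dma-buf fd: */
    static void msm_gem_vma_get(struct drm_gem_object *obj)
    {
        atomic_inc(&to_msm_bo(obj)->vma_ref);
    }

    /* Dropped when the handle/fd goes away.  Teardown of VMAs in
     * the kms->vm is deferred until the last userspace reference
     * is gone:
     */
    static void msm_gem_vma_put(struct drm_gem_object *obj)
    {
        if (atomic_dec_return(&to_msm_bo(obj)->vma_ref) == 0)
            msm_gem_unmap_kms_vma(obj);  /* hypothetical helper */
    }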
Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com>
Tested-by: Antonino Maniscalco <antomani103@gmail.com>
Reviewed-by: Antonino Maniscalco <antomani103@gmail.com>
Patchwork: https://patchwork.freedesktop.org/patch/661538/
This resolves a potential deadlock vs msm_gem_vm_close(). Otherwise for
_NO_SHARE buffers msm_gem_describe() could be trying to acquire the
shared vm resv, while already holding priv->obj_lock. But _vm_close()
might drop the last reference to a GEM obj while already holding the vm
resv, and msm_gem_free_object() needs to grab priv->obj_lock, a locking
inversion.
OTOH this is only for debugfs and it isn't critical if we undercount by
skipping a locked obj. So just use trylock() and move along if we can't
get the lock.
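The debugfs side then looks roughly like this (a sketch, assuming
obj->resv is the shared vm resv for _NO_SHARE buffers):

    /* called with priv->obj_lock already held: */
    static void describe_obj(struct drm_gem_object *obj, struct seq_file *m)
    {
        /* undercount rather than risk the locking inversion: */
        if (!dma_resv_trylock(obj->resv))
            return;

        /* ... print size, madv, VMAs, etc ... */

        dma_resv_unlock(obj->resv);
    }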
Signed-off-by: Rob Clark <robdclark@chromium.org>
Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com>
Tested-by: Antonino Maniscalco <antomani103@gmail.com>
Reviewed-by: Antonino Maniscalco <antomani103@gmail.com>
Patchwork: https://patchwork.freedesktop.org/patch/661525/
Convert to using the gpuvm's r_obj for serializing access to the VM.
This way we can use the drm_exec helper for dealing with deadlock
detection and backoff.
This will let us deal with upcoming locking order conflicts with the
VM_BIND implementation (ie. in some scenarios we need to acquire the
obj lock first, for example to iterate all the VMs an obj is bound in,
and in other scenarios we need to acquire the VM lock first).
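The resulting locking pattern is roughly (a sketch; the
drm_exec_init() signature shown matches recent kernels):

    struct drm_exec exec;
    int ret;

    drm_exec_init(&exec, DRM_EXEC_INTERRUPTIBLE_WAIT, 0);
    drm_exec_until_all_locked(&exec) {
        /* lock the VM's resv obj, then the GEM obj, with
         * automatic backoff and retry on contention:
         */
        ret = drm_exec_lock_obj(&exec, drm_gpuvm_resv_obj(vm));
        drm_exec_retry_on_contention(&exec);
        if (ret)
            break;

        ret = drm_exec_lock_obj(&exec, obj);
        drm_exec_retry_on_contention(&exec);
        if (ret)
            break;
    }

    if (!ret) {
        /* ... both locks held, update the VM ... */
    }

    drm_exec_fini(&exec);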
Signed-off-by: Rob Clark <robdclark@chromium.org>
Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com>
Tested-by: Antonino Maniscalco <antomani103@gmail.com>
Reviewed-by: Antonino Maniscalco <antomani103@gmail.com>
Patchwork: https://patchwork.freedesktop.org/patch/661478/
Now that we've realigned deletion and allocation, switch over to using
drm_gpuvm/drm_gpuva. This allows us to support multiple VMAs per BO per
VM, so that different parts of a single BO can be mapped at different
virtual addresses, which is a key requirement for sparse/VM_BIND.
This prepares us for using drm_gpuvm to translate a batch of MAP/
MAP_NULL/UNMAP operations from userspace into a sequence of map/remap/
unmap steps for updating the page tables.
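For a MAP operation this boils down to something like the following (a
sketch; the op fields are illustrative, and the driver supplies the
drm_gpuvm_ops callbacks that apply each generated step to the page
tables):

    /* drm_gpuvm splits the request against existing VMAs and calls
     * back into the driver with the resulting map/remap/unmap steps:
     */
    ret = drm_gpuvm_sm_map(&vm->base, vm, op->iova, op->range,
                           obj, op->obj_offset);

with drm_gpuvm_sm_unmap() handling the UNMAP case analogously.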
Unlike our prior vm/vma setup, with drm_gpuvm the vm_bo holds a
reference to the GEM object. To prevent reference loops from leaking
all GEM objects, we implicitly tear down the mapping when the GEM
handle is closed or when the obj is unpinned. This means the submit
needs to also hold a reference to the vm_bo, to prevent the VMA from
being torn down while the submit is in-flight.
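The submit-side referencing amounts to something like (a sketch; the
submit->bos[] layout is illustrative, while drm_gpuvm_bo_get()/_put()
are the drm_gpuvm refcount helpers):

    /* at submit creation, while the VMA is known valid: */
    submit->bos[i].vm_bo = drm_gpuvm_bo_get(vma->vm_bo);

    /* ... submit executes on the GPU ... */

    /* at submit retirement, allowing the VMA to be torn down: */
    drm_gpuvm_bo_put(submit->bos[i].vm_bo);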
Signed-off-by: Rob Clark <robdclark@chromium.org>
Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com>
Tested-by: Antonino Maniscalco <antomani103@gmail.com>
Reviewed-by: Antonino Maniscalco <antomani103@gmail.com>
Patchwork: https://patchwork.freedesktop.org/patch/661479/
Previously we'd also tear down the VMA, making the address space
available again. But with the drm_gpuvm conversion, this would require
holding the locks of all VMs the GEM object is mapped in, which is
problematic for the shrinker.
Instead just let the VMA hang around until the GEM object is freed.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com>
Tested-by: Antonino Maniscalco <antomani103@gmail.com>
Reviewed-by: Antonino Maniscalco <antomani103@gmail.com>
Patchwork: https://patchwork.freedesktop.org/patch/661472/
Error messages resulting from incorrect usage of the kernel UABI
should not spam dmesg by default, but it is useful to be able to
enable them when debugging userspace. So demote them to DRM_UT_DRIVER.
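For example (illustrative message, showing only the category change):

    /* before: lands in dmesg unconditionally */
    DRM_ERROR("invalid cmdstream size: %u\n", size);

    /* after: only shown with the DRM_UT_DRIVER category enabled,
     * e.g. drm.debug=0x2
     */
    DRM_DEBUG_DRIVER("invalid cmdstream size: %u\n", size);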
Signed-off-by: Rob Clark <robdclark@chromium.org>
Patchwork: https://patchwork.freedesktop.org/patch/564189/
The EXT_external_objects extension is a bit awkward as it doesn't pass
explicit modifiers, leaving the importer to guess with incomplete
information. In the case of vk (turnip) exporting and gl (freedreno)
importing, the "OPTIMAL_TILING_EXT" layout depends on VkImageCreateInfo
flags (among other things), which the importer does not know. That
unfortunately leaves us needing a metadata back-channel.
The contents of the metadata are defined by userspace. The
EXT_external_objects extension is only required to work between
compatible versions of gl and vk drivers, as defined by device and
driver UUIDs.
v2: add missing metadata kfree
v3: Rework to move copy_from/to_user out from under gem obj lock
to avoid angering lockdep about deadlocks against fs-reclaim
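The v3 ordering is roughly (a sketch; the lock helpers and metadata
fields are illustrative):

    /* copy from userspace *before* taking the obj lock, so any
     * allocation/fault here cannot recurse into fs-reclaim while
     * the lock is held:
     */
    void *buf = memdup_user(u64_to_user_ptr(args->metadata),
                            args->metadata_size);
    if (IS_ERR(buf))
        return PTR_ERR(buf);

    msm_gem_lock(obj);
    swap(msm_obj->metadata, buf);
    msm_obj->metadata_size = args->metadata_size;
    msm_gem_unlock(obj);

    kfree(buf);  /* frees the old metadata after the swap */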
Signed-off-by: Rob Clark <robdclark@chromium.org>
Patchwork: https://patchwork.freedesktop.org/patch/566157/
Updates for v6.6, which includes a backmerge of msm-fixes to avoid conflicts.
Core:
- SM6125 MDSS support
DPU:
- SM6125 DPU support
- Added subblocks to display snapshot
- Use UBWC data from MDSS driver rather than duplicating it
- dpu_core_perf cleanup
DSI:
- Enabled burst mode to fix CMD mode panels
- Runtime PM support
- refgen regulator support
DSI PHY:
- SM6125 support in 14nm DSI PHY driver
GPU:
- Rework GPU identification to prepare for a7xx, and other a7xx prep
- Cleanups and fixes
- Disallow legacy relocs on a6xx and newer
- a690: switch to using a660_gmu.bin fw as this is what we have in
linux-firmware and we see no evidence that it should be different
from other a660 family (a6xx subgen 4) devices
- Submit overhead optimizations, ranging from 1.6x faster for
  NO_IMPLICIT_SYNC submits with 100 BOs to 2.5x faster with 1000 BOs
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Rob Clark <robdclark@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/CAF6AEGv_01g-edjdfKLWWcb-rO5aSyLsv5FpbKrTkXVL9+ngTQ@mail.gmail.com
This was not strictly necessary, as page unpinning (ie. shrinker) only
cares about the resv. It did give us some extra sanity checking for
userspace-controlled iova, and was useful to catch issues on the kernel
and userspace side when enabling userspace iova. But if userspace screws
this up, it just corrupts its own gpu buffers and/or gets iova faults.
So we can just let userspace shoot its own foot and drop the extra per-
buffer SUBMIT overhead.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Patchwork: https://patchwork.freedesktop.org/patch/551023/
Split out pin_count incrementing and lru updating into a separate loop
so we can take the lru lock only once for all objs. Since we are still
holding the obj lock, it is safe to split this up.
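Roughly (a sketch with illustrative helper names, assuming all obj
locks are already held):

    /* pass 1: pin backing pages, per-obj work only */
    for (i = 0; i < nr_bos; i++) {
        ret = pin_obj_pages(bos[i]);  /* may allocate */
        if (ret)
            goto fail;
    }

    /* pass 2: a single LRU lock acquisition for the whole batch */
    mutex_lock(&priv->lru.lock);
    for (i = 0; i < nr_bos; i++) {
        to_msm_bo(bos[i])->pin_count++;
        drm_gem_lru_move_tail_locked(&priv->lru.pinned, bos[i]);
    }
    mutex_unlock(&priv->lru.lock);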
Signed-off-by: Rob Clark <robdclark@chromium.org>
Patchwork: https://patchwork.freedesktop.org/patch/551025/
Backmerging into drm-misc-next to get commit 2c1c7ba457
("drm/amdgpu: support partition drm devices"), which is required to fix
commit 0adec22702 ("drm: Remove struct drm_driver.gem_prime_mmap").
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Only the msm driver provides its own implementation of gem_prime_mmap
from struct drm_driver. All other drivers use the drm_gem_prime_mmap()
helper.
Initialize the mmap offset when constructing the buffer object in msm
and reduce the gem_prime_mmap code to the generic helper. Prepares
msm for the removal of struct drm_driver.gem_prime_mmap.
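The construction-time change amounts to something like (a sketch; the
function name is hypothetical and msm's real init path does more):

    int msm_gem_mmap_offset_init(struct drm_gem_object *obj)
    {
        /* done once when the BO is created, instead of lazily in
         * a driver-specific gem_prime_mmap callback:
         */
        return drm_gem_create_mmap_offset(obj);
    }

With the offset always present, imported buffers can be mapped through
the generic drm_gem_prime_mmap() helper with no driver hook.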
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230613150441.17720-2-tzimmermann@suse.de
Merge the drm-next tree to pick up the DRM DSC helpers (merged via
drm-intel-next tree). MSM DSC v1.2 patches depend on these helpers.
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
Now that everything that controls which LRU an obj lives in, *except*
the backing pages, is protected by the LRU lock, add a special unpin
path for job_run(), where we are assured that we already have backing
pages and will not be racing against eviction (because the GEM object's
dma_resv contains the fence that will be signaled when the submit/job
completes).
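Sketched out, the job_run()-side unpin is something like (field names
approximate the driver's):

    mutex_lock(&priv->lru.lock);
    if (--msm_obj->pin_count == 0)
        drm_gem_lru_move_tail_locked(&priv->lru.willneed, obj);
    mutex_unlock(&priv->lru.lock);

Note that no obj lock (resv) is taken, which matters because job_run()
is on the fence-signaling path.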
Signed-off-by: Rob Clark <robdclark@chromium.org>
Patchwork: https://patchwork.freedesktop.org/patch/527845/
Link: https://lore.kernel.org/r/20230320144356.803762-10-robdclark@gmail.com
Since the LRU lock is already acquired when moving an obj between LRUs,
we can use it to protect pin_count and madv without any significant
change in locking (ie. it just expands the scope of the lock by a hand-
ful of instructions). This prepares the way to decrement the pin_count
in the job_run() path without needing to hold the obj lock, to avoid a
potential deadlock (or rather stall) caused by the fence-signaling path
(job_run()) blocking on shrinker/reclaim. (It is only a stall because
the wait for fence signaling in wait_for_idle() is not infinite.)
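The pin side under the same lock looks roughly like (a sketch; field
names approximate the driver's):

    static void msm_gem_pin_obj(struct drm_gem_object *obj)
    {
        struct msm_drm_private *priv = obj->dev->dev_private;

        mutex_lock(&priv->lru.lock);
        to_msm_bo(obj)->pin_count++;
        drm_gem_lru_move_tail_locked(&priv->lru.pinned, obj);
        mutex_unlock(&priv->lru.lock);
    }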
Signed-off-by: Rob Clark <robdclark@chromium.org>
Patchwork: https://patchwork.freedesktop.org/patch/527843/
Link: https://lore.kernel.org/r/20230320144356.803762-9-robdclark@gmail.com