This ensures that the GPU isn't able to write into already freed
objects, as doing this in the IOVA reaper isn't enough, as the
gem_free_object path will also cause unmaps to happen.
On MMUv2 this also ensures that stale entries, which may have
been prefetched into the TLB will be purged.
The flush is low overhead, as it gets batched up with the next
user command buffer, so this isn't incuring an overhead for
each buffer map/unmap.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Declare etnaviv_iommu_ops structure as const as it is only used when
the reference of one of its field is stored in the ops field of a
iommu_domain structure. This ops field is of type const, so
etnaviv_iommu_ops structures having similar properties can be declared
const too.
Done using Coccinelle.
Before and after size details of .o file remains the same after
cross compiling for arm architecture.
lst: Trimmed commit message, apply the same change to iommu_v2.
Signed-off-by: Bhumika Goyal <bhumirks@gmail.com>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Set up the PULSE_EATER register (0x0010C) in etnaviv_gpu_hw_init. This
ports three mostly undocumented model/revision-specific register
overrides from the Vivante kernel driver.
This is relevant as at least the "disable internal DFS" for revisions >
0x5420 has shown to have a huge impact on shader performance (sped up
memory read performance by 7.5x and write performance by 1.5x) on an
affected GPU.
Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
After rollover of the IOVA space, we want to get a low IOVA address,
otherwise the the games we play by remembering the last IOVA are
pointless. When we search for a free hole with DRM_MM_SEARCH_DEFAULT,
drm_mm will pop the next entry from the free holes stack, which will
likely be a high IOVA. By using DRM_MM_SEARCH_BELOW we can trick
drm_mm into reversing the search and provide us with a low IOVA.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Wladimir van der Laan <laanwj@gmail.com>
Patch series "mm: unexport __get_user_pages_unlocked()".
This patch series continues the cleanup of get_user_pages*() functions
taking advantage of the fact we can now pass gup_flags as we please.
It firstly adds an additional 'locked' parameter to
get_user_pages_remote() to allow for its callers to utilise
VM_FAULT_RETRY functionality. This is necessary as the invocation of
__get_user_pages_unlocked() in process_vm_rw_single_vec() makes use of
this and no other existing higher level function would allow it to do
so.
Secondly existing callers of __get_user_pages_unlocked() are replaced
with the appropriate higher-level replacement -
get_user_pages_unlocked() if the current task and memory descriptor are
referenced, or get_user_pages_remote() if other task/memory descriptors
are referenced (having acquiring mmap_sem.)
This patch (of 2):
Add a int *locked parameter to get_user_pages_remote() to allow
VM_FAULT_RETRY faulting behaviour similar to get_user_pages_[un]locked().
Taking into account the previous adjustments to get_user_pages*()
functions allowing for the passing of gup_flags, we are now in a
position where __get_user_pages_unlocked() need only be exported for his
ability to allow VM_FAULT_RETRY behaviour, this adjustment allows us to
subsequently unexport __get_user_pages_unlocked() as well as allowing
for future flexibility in the use of get_user_pages_remote().
[sfr@canb.auug.org.au: merge fix for get_user_pages_remote API change]
Link: http://lkml.kernel.org/r/20161122210511.024ec341@canb.auug.org.au
Link: http://lkml.kernel.org/r/20161027095141.2569-2-lstoakes@gmail.com
Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Hugh Dickins <hughd@google.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krcmar <rkrcmar@redhat.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- fix dma-buf export path to return correct SG table
- trivially implement direct dma-buf mapping
- allow DRAW_INSTANCED commands in validator
- make the driver work on i.MX6SX, yielding a working 2D/3D stack
together with Mareks MXS DRM driver
* 'drm-etnaviv-next' of git://git.pengutronix.de/lst/linux:
MAINTAINERS: add etnaviv mailinglist
drm/etnaviv: move linear window on MC1.0 parts if necessary
drm/etnaviv: don't invoke OOM killer from dump code
drm/etnaviv: fix gem_prime_get_sg_table to return new SG table
drm/etnaviv: Allow DRAW_INSTANCED commands
drm/etnaviv: implement dma-buf mmap
On i.MX6SX the physical memory is placed above the 2GB mark, so the GPU
linear window has to be moved for the GPU to work at all. This doesn't
mix with the FAST_CLEAR feature, as the TS unit doesn't take the linear
window offset into account and will corrupt memory when used with a
non-zero offset.
Move the linear window if it's necessary for the GPU to work, but avoid
announcing FAST_CLEAR support to userspace in this case.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Tested-by: Marek Vasut <marex@denx.de>
The dumper is only a debugging aid so we don't want to invoke the OOM
killer if buffer for the potentially large GPU state can't be vmalloced.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
The object internal SG table must not be returned, as the caller
will take ownership of the returned table.
Construct a new table from the object pages and return this one
instead.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Vivante GPUs with HALTI0 feature support a DRAW_INSTANCED command in the
command stream to draw a number of instances of the same geometry.
The information that has been figured out about the command can be found
here: https://github.com/etnaviv/etna_viv/blob/master/rnndb/cmdstream.xml#L270
This command is not allowed currently by the DRM driver because it
was not known before. This patch enables parsing it in command
streams and allows using it by userspace drivers.
Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
This adds the required boilerplate to allow direct mmap of exported
etnaviv BOs.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Tested-by: Philipp Zabel <p.zabel@pengutronix.de>
First -misc pull for 4.10:
- drm_format rework from Laurent
- reservation patches from Chris that missed 4.9.
- aspect ratio support in infoframe helpers and drm mode/edid code
(Shashank Sharma)
- rotation rework from Ville (first parts at least)
- another attempt at the CRC debugfs interface from Tomeu
- piles and piles of misc patches all over
* tag 'topic/drm-misc-2016-10-24' of git://anongit.freedesktop.org/drm-intel: (55 commits)
drm: Use u64 for intermediate dotclock calculations
drm/i915: Use the per-plane rotation property
drm/omap: Use per-plane rotation property
drm/omap: Set rotation property initial value to BIT(DRM_ROTATE_0) insted of 0
drm/atmel-hlcdc: Use per-plane rotation property
drm/arm: Use per-plane rotation property
drm: Add support for optional per-plane rotation property
drm/atomic: Reject attempts to use multiple rotation angles at once
drm: Add drm_rotation_90_or_270()
dma-buf/sync_file: hold reference to fence when creating sync_file
drm/virtio: kconfig: Fixup white space.
drm/fence: release fence reference when canceling event
drm/i915: Handle early failure during intel_get_load_detect_pipe
drm/fb_cma_helper: do not free fbdev if there is none
drm: fix sparse warnings on undeclared symbols in crc debugfs
gpu: Remove depends on RESET_CONTROLLER when not a provider
i915: don't call drm_atomic_state_put on invalid pointer
drm: Don't export the drm_fb_get_bpp_depth() function
drm/arm: mali-dp: Replace drm_fb_get_bpp_depth() with drm_format_plane_cpp()
drm: vmwgfx: Replace drm_fb_get_bpp_depth() with drm_format_info()
...
2 more patches to stabilize the new MMUv2 support.
* 'drm-etnaviv-fixes' of git://git.pengutronix.de/lst/linux:
drm/etnaviv: block 64K of address space behind each cmdstream
drm/etnaviv: ensure write caches are flushed at end of user cmdstream
This removes the 'write' and 'force' from get_user_pages_remote() and
replaces them with 'gup_flags' to make the use of FOLL_FORCE explicit in
callers as use of this flag can result in surprising behaviour (and
hence bugs) within the mm subsystem.
Signed-off-by: Lorenzo Stoakes <lstoakes@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
To make sure we don't place anything there which might confuse
the FE prefetcher. This gets rid of another case of FE MMU faults
when the address space gets crowded before triggering the reaper.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
If the GPU is done with one user command stream the buffers referenced
by this command stream may go away and get unmapped from the MMU. If
the write caches are still dirty at this point later evictions will run
into MMU faults, killing the GPU.
Make sure the write caches are flushed before signaling completion
of the user command stream.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Notable changes:
- Cleanups from Fabio to some error paths and proper error propagation.
- Lots of refactoring and new code to support the new MMU version 2,
still relatively unoptimized and doesn't yet provide better process
isolation than MMUv1, but enough to get newer cores up and running.
- New hardware support: GC3000, as found on the NXP i.MX6 QuadPlus SoC.
* 'drm-etnaviv-next' of git://git.pengutronix.de/git/lst/linux: (25 commits)
drm/etnaviv: mark whole context as lost in recover worker
drm/etnaviv: record correct cmdbuf IOVA in dump
drm/etnaviv: space out IOVA layout for cmdbufs on MMUv2
drm/etnaviv: fix up model and revision for GC2000+
drm/etnaviv: implement IOMMUv2 translation
drm/etnaviv: handle MMU exception in IRQ handler
drm/etnaviv: add flushing logic for MMUv2
drm/etnaviv: add function to construct MMUv2 init buffer
drm/etnaviv: map cmdbuf through MMU on version 2
drm/etnaviv: split out iova search and MMU reaping logic
drm/etnaviv: split out FE start
drm/etnaviv: split out wait for gpu idle
drm/etnaviv: move gpu_va() to etnaviv mmu
drm/etnaviv: remove unused iommu_v2 header
drm/etnaviv: move IOMMU domain allocation into etnaviv MMU
drm/etnaviv: indirect IOMMU restore through etnaviv MMU
drm/etnaviv: move linear window setup into etnaviv_iommuv1_restore
drm/etnaviv: rename etnaviv_iommu_domain_restore to etnaviv_iommuv1_restore
drm/etnaviv: only check if the cmdbuf is inside the linear window on MMUv1
drm/etnaviv: only try to use the linear window on MMUv1
...
If we reset the GPU to get it back into a usable state we lose
all context, not just the MMU one. Mark the whole context as
lost to trigger a restore of the exec and MMU state.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
For cmdbufs the CPU IOVA was recorded instead of the GPU one.
Fix this to make it consistent with other BOs and to make
reading the dumps easier.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
At least on the GC3000 the FE MMU is not properly flushing stale TLB
entries. Make sure to map the cmdbufs with a big enough spacing in
the IOVAs to not hit old/prefetched TLB entries when jumping to a
newly mapped cmdbuf.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
GC2000+ on the i.MX6QP is just a re-branded GC3000, lets call it by
its real name to avoid confusion in other parts of the driver.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
All other parts are now in place, so implement the actual translation
step and hook it up, so the driver claims support for cores with
the new MMU.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Bit 30 of the interrupt status signals an MMU exception. Handle this
condition properly and dump some useful registers.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Flushing works differently on MMUv2, in that it's only necessary
to set a single bit in the control register to flush all translation
units. A semaphore stall then makes sure that the flush has propagated
properly.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Both the safe/scratch address and the master TLB address are per pipe
with the CPU mapped registers not properly propagating to the
different translation units.
The only way to correctly configure all translation units is to have
a command stream snipped executed by the FE, before any other execution
can start.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
With MMUv2 all buffers need to be mapped through the MMU once it
is enabled. Align the buffer size to 4K, as the MMU is only able to
map page aligned buffers.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
With MMUv2 the command buffers need to be mapped through the MMU.
Split out the iova search and MMU reaping logic so it can be reused
for the cmdbuf mapping, where no GEM object is involved.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Split out into a new externally visible function, as the IOMMUv2
code needs this functionality, too.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Split out into a new externally visible function, as the IOMMUv2
code needs this functionality, too.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
The GPU virtual address for the command buffers differs depending on
the IOMMU version. Move the calculation of the iova into etnaviv
mmu, to enable proper dispatch.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
The GPU code doesn't need to deal with the IOMMU directly, instead
it can all be hidden behind the etnaviv mmu interface. Move the
last remaining part into etnaviv mmu.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
This function has external visibility and only handles the Vivant IOMMU
version 1. Rename to make this more clear and allow a clear separation
of the different IOMMU versions.
Also drop the domain parameter, as we can infer it from the GPU we are
dealing with.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
There is no linear window on MMUv2 and the FE can access the full 4GB
address space either directly (as long as the MMU isn't configured) or
through the MMU, once it is up.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
As the comment above the code states, the linear window is only
available on MMUv1. Don't try to use it on MMUv2.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
The driver doesn't ever enable individual clocks alone, so there
is no need to scatter the clock enable/disable sequences through
multiple functions. Fold them into the top one.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
In the etnaviv_gpu_platform_probe() error path the 'fail' label is
used to just return the error code.
This can be simplified by returning the error code immediately, so
get rid of the unneeded 'fail' label.
Signed-off-by: Fabio Estevam <festevam@gmail.com>
clk_prepare_enable() may fail, so we should better check for its return
value and propagate it in the case of failure.
Signed-off-by: Fabio Estevam <festevam@gmail.com>