The etnaviv_cmdbuf.c doesn't reference any functions or data members
defined in drm_mm.h, remove unneeded headers may reduce kernel compile
times.
Signed-off-by: Sui Jingfeng <sui.jingfeng@linux.dev>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
The shader L1 cache is a writeback cache for shader loads/stores
and thus must be flushed before any BOs backing the shader buffers
are potentially freed.
Cc: stable@vger.kernel.org
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Since the kernel ringbuffers are allocated from a larger suballocated
area, same as the user commandbufs, they don't need to be CPU page
sized. Allocate 4KB for the kernel ring buffers, as we never use more
than that.
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Etnaviv assumes that GPU page size is 4KiB, however, GPUVA ranges collision
when using softpin capable GPUs on a non 4KiB CPU page size configuration.
The root cause is that kernel side BO takes up bigger address space than
userspace expect, the size of backing memory of GEM buffer objects are
required to align to the CPU PAGE_SIZE. Therefore, results in userspace
allocated GPUVA range fails to be inserted to the specified hole exactly.
To solve this problem, record the GPU visiable size of a BO firstly, then
map and unmap the SG entry strictly with respect to the total GPUVA size.
Signed-off-by: Sui Jingfeng <sui.jingfeng@linux.dev>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
The GPU visible size of a GEM BO is not necessarily PAGE_SIZE aligned,
which happens when CPU page size is not equal to GPU page size. Extra
precious resources such as GPU page tables and GPU TLBs may being paid
because of this but never get used.
Track the size of GPU visible part of GEM BO separately, ensure no
GPUVA range wasting by aligning that size to GPU page size.
Signed-off-by: Sui Jingfeng <sui.jingfeng@linux.dev>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Large draws can make the GPU appear to be stuck to the current hangcheck
logic as the FE address will not move until the draw is finished. However,
the FE has a debug register, which records the current primitive ID within
a draw. Using this debug register we can extend the timeout as long as the
draw progresses.
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Update state_hi.xml.h header from etna_viv commit
8f43a34fd9cd ("rndb: document FE current primitve debug reg")
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
A later change will use the FE debug registers to improve GPU
progress monitoring. Instead of having to keep track of the
usage state of the debug registers and lock access to the
VIVS_HI_CLOCK_CONTROL register, statically enable debug
register access during GPU init.
The Vivante downstream driver seems to do the same thing since
a while, so it should be okay to keep access enabled. (See
gckHARDWARE_InitializeHardware in 6.4.11 downstream driver).
Many debug registers contain bogus data if clock gating is
enabled, so even if they are always accessible performance
profiling still needs to manage some prerequisites.
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
The perf counter read functions don't just read registers, but they
also mutate state to direct the reads towards the correct pipe and
engine. Assert that the GPU mutex is held at this point, so that
those state changes don't interfere with others.
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
The perfmon sampling mutates shared GPU state (e.g. VIVS_HI_CLOCK_CONTROL
to select the pipe for the perf counter reads). To avoid clashing with
other functions mutating the same state (e.g. etnaviv_gpu_update_clock)
the perfmon sampling needs to hold the GPU lock.
Fixes: 68dc0b295d ("drm/etnaviv: use 'sync points' for performance monitor requests")
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
In the etnaviv_pdev_probe() and etnaviv_gpu_platform_probe() function, the
value of '&pdev->dev' has been cached to the local auto variable 'dev'.
But some callers use 'dev', while the rest use '&pdev->dev'. To keep it
consistent, use 'dev' uniformly.
Tested-by: Christian Gmeiner <cgmeiner@igalia.com>
Signed-off-by: Sui Jingfeng <sui.jingfeng@linux.dev>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Currently, the calling of mutex_destroy() is ignored on error handling
code path. It is safe for now, since mutex_destroy() actually does
nothing in non-debug builds. But the mutex_destroy() is used to mark
the mutex uninitialized on debug builds, and any subsequent use of the
mutex is forbidden.
It also could lead to problems if mutex_destroy() gets extended, add
missing mutex_destroy() to eliminate potential concerns.
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Signed-off-by: Sui Jingfeng <sui.jingfeng@linux.dev>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Currently, the etnaviv_gem_submit.c isn't call any runtime power management
functions. So drop this unused header, we can include it back when it
really get used though.
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Signed-off-by: Sui Jingfeng <sui.jingfeng@linux.dev>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
The unpin_user_pages() function takes an 'unsigned long' argument to
store the number of userspace pages, and the struct drm_gem_object::size
is a size_t type. The number of pages can not be negative, hence, use
'unsigned' variable to count the number of pages.
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Signed-off-by: Sui Jingfeng <sui.jingfeng@linux.dev>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
The drm_prime_pages_to_sg() function takes an 'unsigned int' argument to
store the length of the page vector. The size of the object in number of
CPU pages can not be negative, hence, use 'unsigned' variable to store
the number of pages, instead of the 'signed' one.
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Signed-off-by: Sui Jingfeng <sui.jingfeng@linux.dev>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Remove __GFP_HIGHMEM when requesting a page from DMA32 zone,
and since all vivante GPUs in the system will share the same
DMA constraints, move the check of whether to get a page from
DMA32 to etnaviv_bind().
Fixes: b72af445cd ("drm/etnaviv: request pages from DMA32 zone when needed")
Suggested-by: Sui Jingfeng <sui.jingfeng@linux.dev>
Signed-off-by: Xiaolei Wang <xiaolei.wang@windriver.com>
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
The current implementation of drm_sched_start uses a hardcoded
-ECANCELED to dispose of a job when the parent/hw fence is NULL.
This results in drm_sched_job_done being called with -ECANCELED for
each job with a NULL parent in the pending list, making it difficult
to distinguish between recovery methods, whether a queue reset or a
full GPU reset was used.
To improve this, we first try a soft recovery for timeout jobs and
use the error code -ENODATA. If soft recovery fails, we proceed with
a queue reset, where the error code remains -ENODATA for the job.
Finally, for a full GPU reset, we use error codes -ECANCELED or
-ETIME. This patch adds an error code parameter to drm_sched_start,
allowing us to differentiate between queue reset and GPU reset
failures. This enables user mode and test applications to validate
the expected correctness of the requested operation. After a
successful queue reset, the only way to continue normal operation is
to call drm_sched_job_done with the specific error code -ENODATA.
v1: Initial implementation by Jesse utilized amdgpu_device_lock_reset_domain
and amdgpu_device_unlock_reset_domain to allow user mode to track
the queue reset status and distinguish between queue reset and
GPU reset.
v2: Christian suggested using the error codes -ENODATA for queue reset
and -ECANCELED or -ETIME for GPU reset, returned to
amdgpu_cs_wait_ioctl.
v3: To meet the requirements, we introduce a new function
drm_sched_start_ex with an additional parameter to set
dma_fence_set_error, allowing us to handle the specific error
codes appropriately and dispose of bad jobs with the selected
error code depending on whether it was a queue reset or GPU reset.
v4: Alex suggested using a new name, drm_sched_start_with_recovery_error,
which more accurately describes the function's purpose.
Additionally, it was recommended to add documentation details
about the new method.
v5: Fixed declaration of new function drm_sched_start_with_recovery_error.(Alex)
v6 (chk): rebase on upstream changes, cleanup the commit message,
drop the new function again and update all callers,
apply the errno also to scheduler fences with hw fences
v7 (chk): rebased
Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
Signed-off-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240826122541.85663-1-christian.koenig@amd.com
Since 45ecaea738 ("drm/sched: Partial revert of 'drm/sched: Keep
s_fence->parent pointer'") still active jobs aren't put back in the
pending list on drm_sched_start(), as they don't have a active
parent fence anymore, so if the GPU is still working and the timeout
is extended, all currently active jobs will be freed.
To avoid prematurely freeing jobs that are still active on the GPU,
don't block the scheduler until we are fully committed to actually
reset the GPU.
As the current job is already removed from the pending list and
will not be put back when drm_sched_start() isn't called, we must
make sure to put the job back on the pending list when extending
the timeout.
Cc: stable@vger.kernel.org #6.0
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
A single IRQ might signal the completion of multiple jobs/fences
at once. There is no point in attaching a new timestamp to each
fence that only differs in when exactly the IRQ handler was able
to process this fence.
Get a single timestamp when the IRQ handler has determined that
there are completed jobs and reuse this for all fences that get
signalled by the handler.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
The dma sync operation needs to be done with DMA_BIDIRECTIONAL when
the BO is prepared for both read and write operations.
Fixes: a8c21a5451 ("drm/etnaviv: add initial etnaviv DRM driver")
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
The etnaviv devcoredump is created in the GPU reset path, which
must make forward progress to avoid stalling memory reclaim on
unsignalled dma fences. The currently used __GFP_NORETRY does not
prohibit sleeping on direct reclaim, breaking the forward progress
guarantee. Switch to GFP_NOWAIT, which allows background reclaim
to be triggered, but avoids any stalls waiting for direct reclaim.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
On MMUv2 cores the linear window is only relevant when starting the FE,
before the MMU has been activated. Once the MMU is active, all accesses
are translated with no way to bypass the MMU via the linear window. Thus
TS ignoring the linear window offset is not an issue on cores with MMUv2
present and there is no need to disable TS when we need to move the
linear window.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Tested-by: Joao Paulo Goncalves <joao.goncalves@toradex.com>
On some hardware (such at the GC7000 rev 6009), these registers need to be
read twice to return the correct value. Hide that in gpu_read().
Signed-off-by: Derek Foreman <derek.foreman@collabora.com>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Turn the etnaviv_is_model_rev() macro into a static inline function.
Use the raw model number as a parameter instead of the chipModel_GCxxxx
defines. This reduces synchronization requirements for the generated
headers. For newer hardware, the GCxxxx names are not the correct model
names anyway. For example, model 0x8000 NPUs are called VIPNano-QI/SI(+)
by VeriSilicon.
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Update the state HI header from the rnndb commit
8d7ee714cfe2 ("Merge pull request #24 from pH5/unknown-3950").
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Core in platform_driver_register() already sets the .owner, so driver
does not need to. Whatever is set here will be anyway overwritten by
main driver calling platform_driver_register().
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
This reverts commit 1dccdba084.
In userspace a different approach was choosen - hwdb. As a result, there
is no need for these values.
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: Tomeu Vizoso <tomeu@tomeuvizoso.net>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
commit 4bce244272 ("drm/etnaviv: disable tx clock gating for GC7000
rev6203") accidentally applied the fix for i.MX8MN errata ERR050226 to
GC2000 instead of GC7000, failing to disable tx clock gating for GC7000
rev 0x6023 as intended.
Additional clean-up further propagated this issue, partially breaking
the clock gating fixes added for GC7000 rev 6202 in commit 432f51e7de
("drm/etnaviv: add clock gating workaround for GC7000 r6202").
Signed-off-by: Derek Foreman <derek.foreman@collabora.com>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
The hwdb selection logic as a feature that allows it to mark some fields
as 'don't care'. If we match with such a field we memcpy(..)
the current etnaviv_chip_identity into ident.
This step can overwrite some id values read from the GPU with the
'don't care' value.
Fix this issue by restoring the affected values after the memcpy(..).
As this is crucial for user space to know when this feature works as
expected increment the minor version too.
Fixes: 4078a1186d ("drm/etnaviv: update hwdb selection logic")
Cc: stable@vger.kernel.org
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: Tomeu Vizoso <tomeu@tomeuvizoso.net>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
0x1540 is the address of 4th render target address pair (two pixel pipes).
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
These ones will be needed to make use fo the NN and TP units in the NPUs
based on Vivante IP.
Also fix the number of NN cores in the VIPNano-qi.
Signed-off-by: Tomeu Vizoso <tomeu@tomeuvizoso.net>
Acked-by: Christian Gmeiner <cgmeiner@igalia.com>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Module level clock gating and the pulse eater might interfere with
the GPU reset, as they both have the potential to stop the clock
and thus reset propagation to parts of the GPU.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
The 'len' parameter is the 4th argument, because it is not get used, so
drop it. No functional change.
Signed-off-by: Sui Jingfeng <suijingfeng@loongson.cn>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
As in the etnaviv_gem_get_pages() function, the point to the drm_device
has already been cached to the 'dev' local variable. We can use it
directly, While at it, using 'unsigned int' type to count the number of
pages. As the drm_prime_pages_to_sg() function takes an unsigned int type
for its third argument. No functional change.
Signed-off-by: Sui Jingfeng <suijingfeng@loongson.cn>
[lst: Reword subject to make more generic and match patch content]
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
This patch make the code in the etnaviv_pdev_probe() less twisted, and it
also make it easier to drop the reference to device node after finished.
Before apply this patch, there is no call to of_node_put() when done. We
should call of_node_put() when done.
Signed-off-by: Sui Jingfeng <suijingfeng@loongson.cn>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
The .remove() callback for a platform driver returns an int which makes
many driver authors wrongly assume it's possible to do error handling by
returning an error code. However the value returned is (mostly) ignored
and this typically results in resource leaks. To improve here there is a
quest to make the remove callback return void. In the first step of this
quest all drivers are converted to .remove_new() which already returns
void.
Trivially convert the etnaviv drm driver from always returning zero in
the remove callback to the void returning variant.
Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de>
Reviewed-by: Jyri Sarha <jyri.sarha@iki.fi>
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20231102165640.3307820-25-u.kleine-koenig@pengutronix.de
Currently, job flow control is implemented simply by limiting the number
of jobs in flight. Therefore, a scheduler is initialized with a credit
limit that corresponds to the number of jobs which can be sent to the
hardware.
This implies that for each job, drivers need to account for the maximum
job size possible in order to not overflow the ring buffer.
However, there are drivers, such as Nouveau, where the job size has a
rather large range. For such drivers it can easily happen that job
submissions not even filling the ring by 1% can block subsequent
submissions, which, in the worst case, can lead to the ring run dry.
In order to overcome this issue, allow for tracking the actual job size
instead of the number of jobs. Therefore, add a field to track a job's
credit count, which represents the number of credits a job contributes
to the scheduler's credit limit.
Signed-off-by: Danilo Krummrich <dakr@redhat.com>
Reviewed-by: Luben Tuikov <ltuikov89@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231110001638.71750-1-dakr@redhat.com
In Xe, the new Intel GPU driver, a choice has made to have a 1 to 1
mapping between a drm_gpu_scheduler and drm_sched_entity. At first this
seems a bit odd but let us explain the reasoning below.
1. In Xe the submission order from multiple drm_sched_entity is not
guaranteed to be the same completion even if targeting the same hardware
engine. This is because in Xe we have a firmware scheduler, the GuC,
which allowed to reorder, timeslice, and preempt submissions. If a using
shared drm_gpu_scheduler across multiple drm_sched_entity, the TDR falls
apart as the TDR expects submission order == completion order. Using a
dedicated drm_gpu_scheduler per drm_sched_entity solve this problem.
2. In Xe submissions are done via programming a ring buffer (circular
buffer), a drm_gpu_scheduler provides a limit on number of jobs, if the
limit of number jobs is set to RING_SIZE / MAX_SIZE_PER_JOB we get flow
control on the ring for free.
A problem with this design is currently a drm_gpu_scheduler uses a
kthread for submission / job cleanup. This doesn't scale if a large
number of drm_gpu_scheduler are used. To work around the scaling issue,
use a worker rather than kthread for submission / job cleanup.
v2:
- (Rob Clark) Fix msm build
- Pass in run work queue
v3:
- (Boris) don't have loop in worker
v4:
- (Tvrtko) break out submit ready, stop, start helpers into own patch
v5:
- (Boris) default to ordered work queue
v6:
- (Luben / checkpatch) fix alignment in msm_ringbuffer.c
- (Luben) s/drm_sched_submit_queue/drm_sched_wqueue_enqueue
- (Luben) Update comment for drm_sched_wqueue_enqueue
- (Luben) Positive check for submit_wq in drm_sched_init
- (Luben) s/alloc_submit_wq/own_submit_wq
v7:
- (Luben) s/drm_sched_wqueue_enqueue/drm_sched_run_job_queue
v8:
- (Luben) Adjust var names / comments
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Link: https://lore.kernel.org/r/20231031032439.1558703-3-matthew.brost@intel.com
Signed-off-by: Luben Tuikov <ltuikov89@gmail.com>
The newly introduced functions are etnaviv_create_platform_device() and
etnaviv_destroy_platform_device(). Those two function are pure function
and can be shared for other use case. Currently, the benefit is that we
no longer need to call of_node_put() for three different cases, we only
need to call it once in the etnaviv_init() function.
Signed-off-by: Sui Jingfeng <suijingfeng@loongson.cn>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
1) Keep the curly brace aligned.
2) No indentation by double tabs where single tab indentation is enough.
Signed-off-by: Sui Jingfeng <suijingfeng@loongson.cn>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
The mentioned second parameter is the 'u32 size', but it is not get used by
the etnaviv_gem_new_impl() function, so drop it. No functional change.
Signed-off-by: Sui Jingfeng <suijingfeng@loongson.cn>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>