These values are never quite useful for dynamic allocations of the page
tables. Getting rid of them will help prevent later confusion.
v2: Updated to use unmap_and_free_pd functions.
v3: Updated gen8_ppgtt_free after teardown logic was removed.
v4: Rebase after s/page_tables/page_table/.
v5: Keep allocating all page directories in GEN8+ systems with less
than 4GB of memory. Updated gen6_for_all_pdes.
v6: Prevent (harmless) out of range access in gen6_for_all_pdes.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2+)
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
One important part of this patch is we now write a scratch page
directory into any unused PDP descriptors. This matters for 2 reasons,
first, we're not allowed to just use 0, or an invalid pointer, and second,
we must wipe out any previous contents from the last context.
The latter point only matters with full PPGTT. The former point only
effect platforms with less than 4GB memory.
v2: Updated commit message to point that we must set unused PDPs to the
scratch page.
v3: Unmap scratch_pd in gen8_ppgtt_free.
v4: Initialize scratch_pd. (Mika)
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2+)
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Start using gen8_for_each_pde macro to allocate page tables.
v2: teardown_va_range references removed.
v3: Rebase after s/page_tables/page_table/.
v4: Keep setting up page tables for all page directories in systems with
less than 4GB of memory.
v5: Also initialize the page tables. (Mika)
v6: Initialize all page tables, including the extra ones from systems
with less than 4GB of memory. (Mika)
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2+)
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Start using gen8_for_each_pdpe macro to allocate the page directories.
Similar to PTs, while setting up a page directory, make all entries of
the pd point to the scratch pd before mapping (and make all its entries
point to the scratch page); this is to be safe in case of out of bound
access or proactive prefetch. Systems without LLC require an explicit
flush.
v2: Rebased after s/free_pt_*/unmap_and_free_pt/ change.
v3: Rebased after teardown va range logic was removed.
v4: Keep setting up all page directories for systems with less than 4GB
of memory.
v5: Initialize PDs. (Mika)
v6: Initialize also the extra PDs from systems with less than 4GB of
memory. (Mika)
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2+)
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Similar to gen6, we will use for_each_pde/for_each_pdpe
and pte/pde/pdpe_index to iterate over these new structures.
v2: Match trace_i915_va_teardown params
v3: Multiple rebases.
v4: Updated to use unmap_and_free_pt.
v5: teardown_va_range logic no longer needed.
v6: Rebase after s/page_tables/page_table/.
v7: Renamed commit to match what it does now (it was "Use dynamic
allocation idioms on free").
v8: Prevent (harmless) out of range access in gen8_for_each_pde and
gen8_for_each_pdpe_e.
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2+)
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
[danvet: s/BUG/WARN/]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Similar to gen6, while setting up a page table, make all entries of the
pt point to the scratch page before mapping; this is to be safe in case
of out of bound access or proactive prefetch.
Systems without LLC require an explicit flush.
v2: Expanded commit text and fixed indentation (Mika)
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Lets try to keep this consistent:
Page Directory Pointer (PDP).
Page Directory (PD), also known as page directory pointer entries.
Page Table (PT), also known as page directory entries.
s/struct i915_page_table_entry/struct i915_page_table/
s/struct i915_page_directory_entry/struct i915_page_directory/
s/struct i915_page_directory_pointer_entry/struct
i915_page_directory_pointer/
Suggested-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Signed-off-by: Michel Thierry <michel.thierry@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This is mostly useful for execlists where the rings switch between
contexts (and so checking that the ring's start register matches the
context is important).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This is just so that I don't have to read about the batch pool on
systems that are not using it! Rather than using a newline between the
kernel clients and userspace clients, just distinguish the internal
allocations with a '[k]'
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Since we use obj->active as a hint in many places throughout the code,
knowing its state in debugfs is extremely useful.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Now with the trimmed memcpy before the command parser, we try to
allocate many different sizes of batches, predominantly one or two
pages. We can therefore speed up searching for a good sized batch by
keeping the objects of buckets of roughly the same size.
v2: Add a comment about bucket sizes
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
At runtime, this helps ensure that the batch pools are kept trim and
fast. Then at suspend, this releases memory that we do not need to
restore. It also ties into the oom-notifier to ensure that we recover as
much kernel memory as possible during OOM.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
I woke up one morning and found 50k objects sitting in the batch pool
and every search seemed to iterate the entire list... Painting the
screen in oils would provide a more fluid display.
One issue with the current design is that we only check for retirements
on the current ring when preparing to submit a new batch. This means
that we can have thousands of "active" batches on another ring that we
have to walk over. The simplest way to avoid that is to split the pools
per ring and then our LRU execution ordering will also ensure that the
inactive buffers remain at the front.
v2: execlists still requires duplicate code.
v3: execlists requires more duplicate code
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Move the madvise logic out of the execbuffer main path into the
relatively rare allocation path, making the execbuffer manipulation less
fragile.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
In the next patch, I want to use the structure elsewhere and so require
it defined earlier. Rather than move the definition to an earlier location
where it feels very odd, place it in its own header file.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
This reverts commit ec5cc0f9b0
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Thu Jun 12 10:28:55 2014 +0100
drm/i915: Restrict GPU boost to the RCS engine
The premise that media/blitter workloads are not affected by boosting is
patently false with a trip through igt. The question that remains is
what exactly is going wrong with the media workload that prompted this?
Hopefully that would be fixed by the missing agressive downclocking, in
addition to the extra restrictions imposed on how frequent a process is
allowed to boost.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Deepak S <deepak.s@linux.intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll>
Acked-by: Deepak S <deepak.s@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
With boosting for missed pageflips, we have a much stronger indication
of when we need to (temporarily) boost GPU frequency to ensure smooth
delivery of frames. So now only allow each client to perform one RPS boost
in each period of GPU activity due to stalling on results.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Deepak S <deepak.s@linux.intel.com>
Reviewed-by: Deepak S <deepak.s@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
If we hit a vblank and see that have a pageflip queue but not yet
processed, ensure that the GPU is running at maximum in order to clear
the backlog. Pageflips are only queued for the following vblank, if we
miss it, there will be a visible stutter. Boosting the GPU frequency
doesn't prevent us from missing the target vblank, but it should help
the subsequent frames hitting theirs.
v2: Reorder vblank vs flip-complete so that we only check for a missed
flip after processing the completion events, and avoid spurious boosts.
v3: Rename missed_vblank
v4: Rebase
v5: Cancel the outstanding work in runtime suspend
v6: Rebase
v7: Rebase required fixing
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Deepak S<deepak.s@linux.intel.com>
Reviewed-by: Deepak S<deepak.s@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Reuse the same reclocking strategy for Baytail as on its bigger brethren,
Sandybridge and Ivybridge. In particular, this makes the device quicker
to reclock (both up and down) though the tendency now is to downclock
more aggressively to compensate for the RPS boosts.
v2: Rebase
v3: Exclude Cherrytrail as Deepak was concerned that the increased
number of register writes would wake the common powerwell too often.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Deepak S <deepak.s@linux.intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Deepak S <deepak.s@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Currently we emit semaphore synchronisation as if we were going to flip
using the target CS engine, but we then change our minds and do the flip
using the CPU. Consequently we write instructions to the ring but never
use them - even to the point of filling that ring up entirely and never
submitting a request.
The wrinkle in the ointment is that we have to tell a white lie to
pin-to-display for it to skip the synchronisation for mmioflips as we
will create a task specifically for that slow synchronisation. An oddity
of note is the discrepancy in requests that we tell to pin-display to
serialise to and that we then eventually wait upon. This is due to a
limitation in the i915_gem_object_sync() routine that will be lifted
later.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The biggest user of i915_gem_object_get_page() is the relocation
processing during execbuffer. Typically userspace passes in a set of
relocations in sorted order. Sadly, we alternate between relocations
increasing from the start of the buffers, and relocations decreasing
from the end. However the majority of consecutive lookups will still be
in the same page. We could cache the start of the last sg chain, however
for most callers, the entire sgl is inside a single chain and so we see
no improve from the extra layer of caching.
v2: Avoid the double increment inside unlikely()
References: https://bugs.freedesktop.org/show_bug.cgi?id=88308
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
For the conversion to atomic. The pre_enable() hooks are called as part
of the crtc enable sequence, at which point the staged config was
already made effective. Furthermore, the function actually changes
hardware state, so it should anyway deal with current and not staged
config.
Signed-off-by: Ander Conselvan de Oliveira <ander.conselvan.de.oliveira@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Now that we use a drm atomic state for the legacy modeset, it is
possible to get rid of the usage of intel_crtc->new_config in the
function intel_mode_max_pixclk().
Signed-off-by: Ander Conselvan de Oliveira <ander.conselvan.de.oliveira@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
../drivers/gpu/drm/i915/intel_pm.c:3185:45: warning: Initializer entry defined twice
../drivers/gpu/drm/i915/intel_pm.c:3185:52: also defined here
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Sometimes userspace wants a true overlay that is never clipped. In such
cases, we need to disable the destination colorkey. However, it is
currently unconditionally enabled in the overlay with no means of
disabling. So rectify that by always default to on, and extending the
UPDATE_ATTR ioctl to support explicit disabling of the colorkey.
This is contrast to the spite code which requires explicit enabling of
either the destination or source colorkey. Handling source colorkey is
still todo for the overlay. (Of course it may be worth migrating overlay
to sprite before then.)
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Modify the Gen9 SSEU device status logic to support Broxton.
Broxton reuses the Skylake power gate acknowledgment registers but
has at most 1 slice and 3 subslices. Broxton supports subslice
power gating within its single slice.
Signed-off-by: Jeff McGee <jeff.mcgee@intel.com>
Reviewed-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Modify the Gen9 SSEU info initialization logic to support
Broxton. Broxton reuses the SKL fuse registers but has at most
1 slice and 6 EU per subslice.
Signed-off-by: Jeff McGee <jeff.mcgee@intel.com>
Reviewed-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
On Broxton per specification the GTT has to be mapped as uncached.
This was caught by the PTE write readback warning, which showed a
corrupted PTE value with using the current write-combine mapping.
v2:
- add comment explaining how the problem with WC mapping manifests
(Daniel)
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Antti Koskipää <antti.koskipaa@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The caching options for page table entries have remained the same as
Cherryview. This patch fixes it so the right code path is taken on BXT.
v2: Fix up commit message (Mike)
Signed-off-by: Sumit Singh <sumit.k.singh@intel.com>
Signed-off-by: Damien Lespiau <damien.lespiau@intel.com>
Reviewed-by: Sivakumar Thulasimani <sivakumar.thulasimani@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Starting from GEN5 the FBC base register is the same on all platforms.
GEN>=5 is the same condition as HAS_PCH_SPLIT except on BXT, so make
things work on BXT as well.
Motivated by Rodrigo's request to check FBC support on BXT.
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Antti Koskipää <antti.koskipaa@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>