Instead of implementing the full tracking + dynamic allocation, this
patch does a bit less than half of the work, by tracking and warning on
unexpected conditions. The tracking itself follows which PTEs within a
page table are currently being used for objects. The next patch will
modify this to actually allocate the page tables only when necessary.
With the current patch there isn't much in the way of making a gen
agnostic range allocation function. However, in the next patch we'll add
more specificity which makes having separate functions a bit easier to
manage.
One important change introduced here is that DMA mappings are
created/destroyed at the same page directories/tables are
allocated/deallocated.
Notice that aliasing PPGTT is not managed here. The patch which actually
begins dynamic allocation/teardown explains the reasoning for this.
v2: s/pdp.page_directory/pdp.page_directories
Make a scratch page allocation helper
v3: Rebase and expand commit message.
v4: Allocate required pagetables only when it is needed, _bind_to_vm
instead of bind_vma (Daniel).
v5: Rebased to remove the unnecessary noise in the diff, also:
- PDE mask is GEN agnostic, renamed GEN6_PDE_MASK to I915_PDE_MASK.
- Removed unnecessary checks in gen6_alloc_va_range.
- Changed map/unmap_px_single macros to use dma functions directly and
be part of a static inline function instead.
- Moved drm_device plumbing through page tables operation to its own
patch.
- Moved allocate/teardown_va_range calls until they are fully
implemented (in subsequent patch).
- Merged pt and scratch_pt unmap_and_free path.
- Moved scratch page allocator helper to the patch that will use it.
v6: Reduce complexity by not tearing down pagetables dynamically, the
same can be achieved while freeing empty vms. (Daniel)
v7: s/i915_dma_map_px_single/i915_dma_map_single
s/gen6_write_pdes/gen6_write_pde
Prevent a NULL case when only GGTT is available. (Mika)
v8: Rebased after s/page_tables/page_table/.
v9: Reworked i915_pte_index and i915_pte_count.
Also exercise bitmap allocation here (gen6_alloc_va_range) and fix
incorrect write_page_range in i915_gem_restore_gtt_mappings (Mika).
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v3+)
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
In Gen8, PDPs are saved and restored with legacy contexts (legacy contexts
only exist on the render ring). So change the ordering of LRI vs MI_SET_CONTEXT
for the initialization of the context. Also the only cases in which we
need to manually update the PDPs are when MI_RESTORE_INHIBIT has been
set in MI_SET_CONTEXT (i.e. when the context is not yet initialized or
it is the default context).
Legacy submission is not available post GEN8, so it isn't necessary to
add extra checks for newer generations.
v2: Use new functions to replace the logic right away (Daniel)
v3: Add missing pd load logic.
v4: Add warning in case pd_load_pre & pd_load_post are true, and add
missing trace_switch_mm. Cleaned up pd_load conditions. Add more
information about when is pd_load_post needed. (Mika)
Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Michel Thierry <michel.thierry@intel.com> (v2+)
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
And remove one bogus * from i915_gem_gtt.c since that's not a
kerneldoc there.
v2: Review from Chris:
- Clarify memory space to better distinguish from address space.
- Add note that shrink doesn't guarantee the freed memory and that
users must fall back to shrink_all.
- Explain how pinning ties in with eviction/shrinker.
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Two code changes:
- Extract i915_gem_shrinker_init.
- Inline i915_gem_object_is_purgeable since we open-code it everywhere
else too.
This already has the benefit of pulling all the shrinker code
together, next patch adds a bit of kerneldoc.
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Use both up/down manual ei calcuations for symmetry and greater
flexibility for reclocking, instead of faking the down interrupt based
on a fixed integer number of up interrupts.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Deepak S<deepak.s@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Rewrite commit 31685c258e
Author: Deepak S <deepak.s@linux.intel.com>
Date: Thu Jul 3 17:33:01 2014 -0400
drm/i915/vlv: WA for Turbo and RC6 to work together.
Other than code clarity, the major improvement is to disable the extra
interrupts generated when idle. However, the reclocking remains rather
slow under the new manual regime, in particular it fails to downclock as
quickly as desired. The second major improvement is that for certain
workloads, like games, we need to combine render+media activity counters
as the work of displaying the frame is split across the engines and both
need to be taken into account when deciding the global GPU frequency as
memory cycles are shared.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Deepak S <deepak.s@linux.intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Deepak S<deepak.s@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
When we idle, we set the GPU frequency to the hardware minimum (not user
minimum). We introduce a new variable to distinguish between the
different roles, and to allow easy tuning of the idle frequency without
impacting over aspects of RPS. Setting the minimum frequency should be a
safety blanket as the pcu on the GPU should be power gating itself
anyway. However, in order for us to do set the absolute minimum
frequency, we need to relax a few of our assertions that we do not
exceed the user limits.
v2: Add idle_freq
v3: Init idle_freq for vlv and add a bunch of WARNs
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Deepak S <deepak.s@linux.intel.com>
Reviewed-by: Deepak S<deepak.s@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
If the batch buffer is too large to fit into the aperture and we need a
GTT mapping for relocations, we currently fail. This only applies to a
subset of machines for a subset of environments, quite undesirable. We
can simply check after failing to insert the batch into the GTT as to
whether we only need a mappable binding for relocation and, if so, we can
revert to using a non-mappable binding and an alternate relocation
method. However, using relocate_entry_cpu() is excruciatingly slow for
large buffers on non-LLC as the entire buffer requires clflushing before
and after the relocation handling. Alternatively, we can implement a
third relocation method that only clflushes around the relocation entry.
This is still slower than updating through the GTT, so we prefer using
the GTT where possible, but is orders of magnitude faster as we
typically do not have to then clflush the entire buffer.
An alternative idea of using a temporary WC mapping of the backing store
is promising (it should be faster than using the GTT itself), but
requires fairly extensive arch/x86 support - along the lines of
kmap_atomic_prof_pfn() (which is not universally implemented even for
x86).
Testcase: igt/gem_exec_big #pnv,byt
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88392
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
[danvet: Add a WARN_ONCE for the impossible reloc case and explain in
a short comment why we want to avoid ping-pong.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
In the original code then if WARN_ON(i915_is_ggtt(vm) != !!ggtt_view)
was true then we leak "vma". Presumably that doesn't happen often but
static checkers complain and this bug is easy to fix.
Fixes: c3bbb6f2825d ('drm/i915: Do not use ggtt_view with (aliasing) PPGTT')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Allow for a larger receive data size, and check if the receiver returned
the number of bytes written. Without this, we've basically skipped all
the unwritten bytes for short writes.
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
To keep things clear rename the intel_dp->supported_rates[] to
intel_dp->sink_rates[], and rename the supported_rates[] name we used
elsewhere for the intersection of source and sink rates to
common_rates[].
Cc: Sonika Jindal <sonika.jindal@intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Sonika Jindal <sonika.jindal@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
"P1273_DPLL_Programming Spreadsheet.xlsm" lists a boatload of
frequencies for eDP. Try to use them all.
For now I've decided not to add hardcoded DPLL dividers for these cases
since chv_find_best_dpll() works just fine.
I've not actually tested any of these since I don't have an eDP 1.4 panel.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Sonika Jindal <sonika.jindal@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Complain loudly if we ever attempt to overflow the the supported_rates[]
array. This should never happen since the sink_rates[] array will always
be smaller or of equal size. But should someone change that we want to
catch it without scribblign over the stack.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Sonika Jindal <sonika.jindal@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Now that intel_dp_max_link_bw() no longer considers the source
restrictions we may try to enable MST with 5.4GHz even when the source
doesn't support it. To fix that switch the code over to handle the link
rate in the same way as the SST code handles it.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Sonika Jindal <sonika.jindal@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
intel_dp_compute_config() only really needs to know the rates supported
by both source and sink, so hide the raw source and sink arrays from it.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Sonika Jindal <sonika.jindal@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Remove the sink vs. source limit mess from intel_dp_max_link_bw() and
just move the source restriction checks to intel_dp_source_rates().
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Sonika Jindal <sonika.jindal@intel.com>
[danvet: Resolve conflict with WaDisableHBR2:skl patch.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Once we've read the rates from the sink we don't have to mess with them,
so the caller can just look at the stored rates without doing extra
copies. If the sink doesn't support the new link rate stuff, we just
point the caller at the default_rates[] array.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Sonika Jindal <sonika.jindal@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
GGTT views are only applicable when dealing with GGTT. Change the code to
reject ggtt_view where it should not be used and require it when it should
be.
v2:
- Dropped _ppgtt_ infixes, allow both types to be passed
- Disregard other but normal views when no view is specified
- More checks that valid parameters are passed
- More readable error checking
v3:
- Prefer WARN_ONCE over BUG_ON when there is code path for failure
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
[danvet: Drop unecessary forward decl from earlier patch iterations.]
[danvet: Remove unused variable spotted by Tvrtko.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Backmerge because of numerous and interleaving conflicts and git
rerere getting confused a bit too often.
Conflicts:
drivers/gpu/drm/i915/intel_display.c
All conflicts are because of -next patches backported to -fixes, so
just go with the code in -next.
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Backporting a couple of plane related fixes from drm-next to v4.0.
* tag 'drm-intel-fixes-2015-03-19' of git://anongit.freedesktop.org/drm-intel:
drm/i915: Make sure the primary plane is enabled before reading out the fb state
drm/i915: Ensure plane->state->fb stays in sync with plane->fb
- Fixing SDMA initialization when in non-HWS mode (debug mode)
- Memory leak fix when destroying kernel queue
- Fix number of available compute pipelines according to new firmware
* tag 'drm-amdkfd-fixes-2015-03-19' of git://people.freedesktop.org/~gabbayo/linux:
drm/radeon: Changing number of compute pipe lines
drm/amdkfd: Fix SDMA queue init. in non-HWS mode
drm/amdkfd: destroy mqd when destroying kernel queue
This adds initial DP 1.2 MST support to radeon, on CAYMAN
and up in theory.
This is off by default.
v2: agd5f:
- add UNIPHY3 offsets
- move atom cmd table code into atombios_encoders.c
- whitespace cleanup
- replace some magic numbers with proper defines
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
For MST we need to be able to pick front end encoders
separate from backend, but only for MST, so we need to
make the encoder picking interface smarter.
v2: agd5f: squash in:
drm/radeon: release digital encoder before asking for new one
Reported-by: Dieter Nützel <Dieter@nuetzel-hh.de>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
These allow overriding the encoder id with the frontend,
we need this for setting up MST.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This adds support for processing short irqs, and triggering
the dp_work.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This is to be called on short HPD irqs, just introduce
the basic infrastructure for it.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
radeon requires this to get the slots for later filling
out a table on every transition.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The atombios tables have an unfortunate restriction on only
being able to write 12 bytes, MST really wants 16-bytes here,
and since the hw can do it, we should just write directly to it.
This uses a module option to allow for it now, and maybe
we should provide the old code as a fallback for a while.
v2: (agd5f)
- move registers to a proper register header
- only enable on DCE5+
- enable by default on DCE5+
- Switch pad to aux mode before using it
- reformat instance handling to better match the
rest of the driver
Signed-off-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This allows us to query certain registers from userspace
for profiling and harvest configuration. E.g., it can
be used by the GALLIUM_HUD for profiling the status of
various gfx blocks.
Tested-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>