Moving all GPU features to the platform struct definition allows for
- standard place when adding new features from new platforms
- possible to see supported features when dumping struct
definitions
Signed-off-by: Carlos Santa <carlos.santa@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Moving all GPU features to the platform struct definition allows for
- standard place when adding new features from new platforms
- possible to see supported features when dumping struct
definitions
Signed-off-by: Carlos Santa <carlos.santa@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Moving all GPU features to the platform struct definition allows for
- standard place when adding new features from new platforms
- possible to see supported features when dumping struct
definitions
Signed-off-by: Carlos Santa <carlos.santa@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
[patch series] Moving all GPU features to the platform struct definition
allows for
- standard place when adding new features from new platforms
- possible to see supported features when dumping struct definition
Signed-off-by: Carlos Santa <carlos.santa@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
We don't have safe 64-bit mmio writes as they are really split into
2x32-bit writes. This tearing is dangerous as the hardware *will*
operate on the intermediate value, requiring great care when assigning.
(See, for example, i965_write_fence_reg.) As such we don't currently use
them and strongly advise not to us them. Go one step further and remove
the 64-bit write vfuncs.
v2: Add some more details to the comment about why WRITE64 is absent,
and why you need to think twice before using READ64.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160906144538.4204-1-chris@chris-wilson.co.uk
In an upcoming patch we'll need the actual mask of subslices in addition
to their count, so convert the subslice_per_slice field to a mask.
Also we can easily calculate subslice_total from the other fields, so
instead of storing a cached version of this, add a helper to calculate
it.
v2:
- Use hweight8() on u8 typed vars instead of hweight32(). (Ben)
Reviewed-by: Robert Bragg <robert@sixbynine.org> (v1)
Reviewed-by: Ben Widawsky <benjamin.widawsky@intel.com> (v1)
Tested-by: Ben Widawsky <benjamin.widawsky@intel.com> (v1)
Signed-off-by: Imre Deak <imre.deak@intel.com>
Move all slice/subslice/eu related properties to the sseu_dev_info
struct.
No functional change.
v2:
- s/info/sseu/ based on the new struct name. (Ben)
Reviewed-by: Robert Bragg <robert@sixbynine.org> (v1)
Reviewed-by: Ben Widawsky <benjamin.widawsky@intel.com> (v1)
Tested-by: Ben Widawsky <benjamin.widawsky@intel.com> (v1)
Signed-off-by: Imre Deak <imre.deak@intel.com>
The data in this struct is provided both by getting the
slice/subslice/eu features available on a given platform and the actual
runtime state of these same features which depends on the HW's current
power saving state.
Atm members of this struct are duplicated in sseu_dev_status and
intel_device_info. For clarity and code reuse we can share one struct
for both of the above purposes. This patch only moves the struct to the
header file, the next patch will convert users of intel_device_info to
use this struct too.
Instead of unsigned int u8 is used now, which is big enough and is used
anyway in intel_device_info.
No functional change.
v2:
- s/stat/sseu/ based on the new struct name (Ben)
Reviewed-by: Robert Bragg <robert@sixbynine.org> (v1)
Reviewed-by: Ben Widawsky <benjamin.widawsky@intel.com> (v1)
Tested-by: Ben Widawsky <benjamin.widawsky@intel.com> (v1)
Signed-off-by: Imre Deak <imre.deak@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1472659987-10417-2-git-send-email-imre.deak@intel.com
Rather than walk the full array of engines checking whether each is in
the mask in turn, we can use the mask to jump to the right engines. This
should quicker for a sparse array of engines or mask, whilst generating
smaller code:
text data bss dec hex filename
1251010 4579 800 1256389 132bc5 drivers/gpu/drm/i915/i915.ko
1250530 4579 800 1255909 1329e5 drivers/gpu/drm/i915/i915.ko
The downside is that we have to pass in a temporary, alas no C99
iterators yet.
[P.S. Joonas doesn't like having to pass extra temporaries into the
macro, and even less that I called them tmp. As yet, we haven't found a
macro that avoids passing in a temporary that is smaller. We probably
will get C99 iterators first!]
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160827075401.16470-2-chris@chris-wilson.co.uk
Just like with sysfs, we do some major overhaul.
Pass dev_priv instead of dev to all feature macros (IS_, HAS_,
INTEL_, etc.). This has the side effect that a bunch of functions
now get dev_priv passed instead of dev.
All calls to INTEL_INFO()->gen have been replaced with
INTEL_GEN().
We want access to to_i915(node->minor->dev) in a lot of places,
so add the node_to_i915() helper to accommodate for this.
Finally, we have quite a few cases where we get a void * pointer,
and need to cast it to drm_device *, only to run to_i915() on it.
Add cast_to_i915() to do this.
v2: Don't introduce extra dev (Chris)
v3: Make pipe_crc_info have a pointer to drm_i915_private instead of
drm_device. This saves a bit of space, since we never use
drm_device anywhere in these functions.
Also some minor fixup that I missed in the previous version.
v4: Changed the code a bit so that dev_priv is passed directly
to various functions, thus removing the need for the
cast_to_i915() helper. Also did some additional cleanup.
v5: Additional cleanup of newly introduced changes.
v6: Rebase again because of conflict.
Signed-off-by: David Weinehall <david.weinehall@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160822105931.pcbe2lpsgzckzboa@boom
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Various cleanup for i915_sysfs.c; we now use dev_priv whenever
possible. The kdev_to_drm_minor() helper function has been
replaced by one that converts from struct device *
to struct drm_i915_private *.
We already have a seemingly identical helper (kdev_to_i915())
in i915_drv.h. But that one cannot be used here.
Unlike the version in i915_drv.h, this helper
reaches i915 through drm_minor.
v2: Rename kdev_to_i915_dm() to kdev_minor_to_i915() (Chris)
Signed-off-by: David Weinehall <david.weinehall@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160822103245.24069-4-david.weinehall@linux.intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Since the watermark calculations for Skylake are still broken, we're apt
to hitting underruns very easily under multi-monitor configurations.
While it would be lovely if this was fixed, it's not. Another problem
that's been coming from this however, is the mysterious issue of
underruns causing full system hangs. An easy way to reproduce this with
a skylake system:
- Get a laptop with a skylake GPU, and hook up two external monitors to
it
- Move the cursor from the built-in LCD to one of the external displays
as quickly as you can
- You'll get a few pipe underruns, and eventually the entire system will
just freeze.
After doing a lot of investigation and reading through the bspec, I
found the existence of the SAGV, which is responsible for adjusting the
system agent voltage and clock frequencies depending on how much power
we need. According to the bspec:
"The display engine access to system memory is blocked during the
adjustment time. SAGV defaults to enabled. Software must use the
GT-driver pcode mailbox to disable SAGV when the display engine is not
able to tolerate the blocking time."
The rest of the bspec goes on to explain that software can simply leave
the SAGV enabled, and disable it when we use interlaced pipes/have more
then one pipe active.
Sure enough, with this patchset the system hangs resulting from pipe
underruns on Skylake have completely vanished on my T460s. Additionally,
the bspec mentions turning off the SAGV with more then one pipe enabled
as a workaround for display underruns. While this patch doesn't entirely
fix that, it looks like it does improve the situation a little bit so
it's likely this is going to be required to make watermarks on Skylake
fully functional.
This will still need additional work in the future: we shouldn't be
enabling the SAGV if any of the currently enabled planes can't enable WM
levels that introduce latencies >= 30 µs.
Changes since v11:
- Add skl_can_enable_sagv()
- Make sure we don't enable SAGV when not all planes can enable
watermarks >= the SAGV engine block time. I was originally going to
save this for later, but I recently managed to run into a machine
that was having problems with a single pipe configuration + SAGV.
- Make comparisons to I915_SKL_SAGV_NOT_CONTROLLED explicit
- Change I915_SAGV_DYNAMIC_FREQ to I915_SAGV_ENABLE
- Move printks outside of mutexes
- Don't print error messages twice
Changes since v10:
- Apparently sandybridge_pcode_read actually writes values and reads
them back, despite it's misleading function name. This means we've
been doing this mostly wrong and have been writing garbage to the
SAGV control. Because of this, we no longer attempt to read the SAGV
status during initialization (since there are no helpers for this).
- mlankhorst noticed that this patch was breaking on some very early
pre-release Skylake machines, which apparently don't allow you to
disable the SAGV. To prevent machines from failing tests due to SAGV
errors, if the first time we try to control the SAGV results in the
mailbox indicating an invalid command, we just disable future attempts
to control the SAGV state by setting dev_priv->skl_sagv_status to
I915_SKL_SAGV_NOT_CONTROLLED and make a note of it in dmesg.
- Move mutex_unlock() a little higher in skl_enable_sagv(). This
doesn't actually fix anything, but lets us release the lock a little
sooner since we're finished with it.
Changes since v9:
- Only enable/disable sagv on Skylake
Changes since v8:
- Add intel_state->modeset guard to the conditional for
skl_enable_sagv()
Changes since v7:
- Remove GEN9_SAGV_LOW_FREQ, replace with GEN9_SAGV_IS_ENABLED (that's
all we use it for anyway)
- Use GEN9_SAGV_IS_ENABLED instead of 0x1 for clarification
- Fix a styling error that snuck past me
Changes since v6:
- Protect skl_enable_sagv() with intel_state->modeset conditional in
intel_atomic_commit_tail()
Changes since v5:
- Don't use is_power_of_2. Makes things confusing
- Don't use the old state to figure out whether or not to
enable/disable the sagv, use the new one
- Split the loop in skl_disable_sagv into it's own function
- Move skl_sagv_enable/disable() calls into intel_atomic_commit_tail()
Changes since v4:
- Use is_power_of_2 against active_crtcs to check whether we have > 1
pipe enabled
- Fix skl_sagv_get_hw_state(): (temp & 0x1) indicates disabled, 0x0
enabled
- Call skl_sagv_enable/disable() from pre/post-plane updates
Changes since v3:
- Use time_before() to compare timeout to jiffies
Changes since v2:
- Really apply minor style nitpicks to patch this time
Changes since v1:
- Added comments about this probably being one of the requirements to
fixing Skylake's watermark issues
- Minor style nitpicks from Matt Roper
- Disable these functions on Broxton, since it doesn't have an SAGV
Signed-off-by: Lyude <cpaul@redhat.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1471463761-26796-3-git-send-email-cpaul@redhat.com
[mlankhorst: ENOSYS -> ENXIO, whitespace fixes]
In the recent patch
bc3d674 drm/i915: Allow userspace to request no-error-capture upon ...
the final version moved the flags and the associated #defines around
so they were adjacent; unfortunately, they ended up between a comment
and the thing (hw_id) to which the comment applies :(
So this patch reshuffles the comment and subject back together.
Also, as we're touching 'hw_id', let's change it from just 'unsigned'
to a fully-specified 'unsigned int', because some code checking tools
(including checkpatch) object to plain 'unsigned'.
Fixes: bc3d674462 ("drm/i915: Allow userspace to request no-error-capture...")
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1471616622-6919-1-git-send-email-david.s.gordon@intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
This is a companion to i915_gem_obj_prepare_shmem_read() that prepares
the backing storage for direct writes. It first serialises with the GPU,
pins the backing storage and then indicates what clfushes are required in
order for the writes to be coherent.
Whilst here, fix support for ancient CPUs without clflush for which we
cannot do the GTT+clflush tricks.
v2: Add i915_gem_obj_finish_shmem_access() for symmetry
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-8-chris@chris-wilson.co.uk
No longer is knowing how much of the GTT (both mappable aperture and
beyond) relevant, and the output clutters the real information - that is
how many objects are allocated and bound (and by who) so that we can
quickly grasp if there is a leak.
v2: Relent, and rename pinned to indicate display only. Since the
display objects are semi-static and are of variable size, they are the
interesting objects to watch over time for aperture leaking. The other
pins are either static (such as the scratch page) or very short lived
(such as execbuf) and not part of the precious GGTT.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1471254551-25805-6-git-send-email-chris@chris-wilson.co.uk
This patch provides the infrastructure for performing a 16-byte aligned
read from WC memory using non-temporal instructions introduced with sse4.1.
Using movntdqa we can bypass the CPU caches and read directly from memory
and ignoring the page attributes set on the CPU PTE i.e. negating the
impact of an otherwise UC access. Copying using movntdqa from WC is almost
as fast as reading from WB memory, modulo the possibility of both hitting
the CPU cache or leaving the data in the CPU cache for the next consumer.
(The CPU cache itself my be flushed for the region of the movntdqa and on
later access the movntdqa reads from a separate internal buffer for the
cacheline.) The write back to the memory is however cached.
This will be used in later patches to accelerate accessing WC memory.
v2: Report whether the accelerated copy is successful/possible.
v3: Function alignment override was only necessary when using the
function target("sse4.1") - which is not necessary for emitting movntdqa
from __asm__.
v4: Improve notes on CPU cache behaviour vs non-temporal stores.
v5: Fix byte offsets for unrolled moves.
v6: Find all remaining typos of "movntqda", use kernel_fpu_begin.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Akash Goel <akash.goel@intel.com>
Cc: Damien Lespiau <damien.lespiau@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1471001999-17787-2-git-send-email-chris@chris-wilson.co.uk
vmaps has a provision for controlling the page protection bits, with which
we can use to control the mapping type, e.g. WB, WC, UC or even WT.
To allow the caller to choose their mapping type, we add a parameter to
i915_gem_object_pin_map - but we still only allow one vmap to be cached
per object. If the object is currently not pinned, then we recreate the
previous vmap with the new access type, but if it was pinned we report an
error. This effectively limits the access via i915_gem_object_pin_map to a
single mapping type for the lifetime of the object. Not usually a problem,
but something to be aware of when setting up the object's vmap.
We will want to vary the access type to enable WC mappings of ringbuffer
and context objects on !llc platforms, as well as other objects where we
need coherent access to the GPU's pages without going through the GTT
v2: Remove the redundant braces around pin count check and fix the marker
in documentation (Chris)
v3:
- Add a new enum for the vmalloc mapping type & pass that as an argument to
i915_object_pin_map. (Tvrtko)
- Use PAGE_MASK to extract or filter the mapping type info and remove a
superfluous BUG_ON.(Tvrtko)
v4:
- Rename the enums and clean up the pin_map function. (Chris)
v5: Drop the VM_NO_GUARD, minor cosmetics.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Akash Goel <akash.goel@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1471001999-17787-1-git-send-email-chris@chris-wilson.co.uk
The PPS registers are pretty much the same everywhere, the differences
being:
- Register fields appearing, disappearing from one platform to the
next: panel-reset-on-powerdown, backlight-on, panel-port,
register-unlock
- Different register base addresses
- Different number of PPS instances: 2 on VLV/CHV/BXT, 1 everywhere
else.
We can merge the separate set of PPS definitions by extending the PPS
instance argument to all platforms and using instance 0 on platforms
with a single instance. This means we'll need to calculate the register
addresses dynamically based on the given platform and PPS instance.
v2:
- Simplify if ladder in intel_pps_get_registers(). (Ville)
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1470827254-21954-1-git-send-email-imre.deak@intel.com