If we try and fail to allocate a i915_request, we apply some
backpressure on the clients to throttle the memory allocations coming
from i915.ko. Currently, we wait until completely idle, but this is far
too heavy and leads to some situations where the only escape is to
declare a client hung and reset the GPU. The intent is to only ratelimit
the allocation requests and to allow ourselves to recycle requests and
memory from any long queues built up by a client hog.
Although the system memory is inherently a global resources, we don't
want to overly penalize an unlucky client to pay the price of reaping a
hog. To reduce the influence of one client on another, we can instead of
waiting for the entire GPU to idle, impose a barrier on the local client.
(One end goal for request allocation is for scalability to many
concurrent allocators; simultaneous execbufs.)
To prevent ourselves from getting caught out by long running requests
(requests that may never finish without userspace intervention, whom we
are blocking) we need to impose a finite timeout, ideally shorter than
hangcheck. A long time ago Paul McKenney suggested that RCU users should
ratelimit themselves using judicious use of cond_synchronize_rcu(). This
gives us the opportunity to reduce our indefinite wait for the GPU to
idle to a wait for the RCU grace period of the previous allocation along
this timeline to expire, satisfying both the local and finite properties
we desire for our ratelimiting.
There are still a few global steps (reclaim not least amongst those!)
when we exhaust the immediate slab pool, at least now the wait is itself
decoupled from struct_mutex for our glorious highly parallel future!
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106680
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180914080017.30308-1-chris@chris-wilson.co.uk
drm-misc-next for 4.20:
UAPI Changes:
- Add host endian variants for the most common formats (Gerd)
- Fail ADDFB2 for big-endian drivers that don't advertise BE quirk (Gerd)
- clear smem_start in fbdev for drm drivers to avoid leaking fb addr (Daniel)
Cross-subsystem Changes:
Core Changes:
- fix drm_mode_addfb() on big endian machines (Gerd)
- add timeline point to syncobj find+replace (Chunming)
- more drmP.h removal effort (Daniel)
- split uapi portions of drm_atomic.c into drm_atomic_uapi.c (Daniel)
Driver Changes:
- bochs: Convert open-coded portions to use helpers (Peter)
- vkms: Add cursor support (Haneen)
- udmabuf: Lots of fixups (mostly cosmetic afaict) (Gerd)
- qxl: Convert to use fbdev helper (Peter)
Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: Chunming Zhou <david1.zhou@amd.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Peter Wu <peter@lekensteyn.nl>
Cc: Haneen Mohammed <hamohammed.sa@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Sean Paul <sean@poorly.run>
Link: https://patchwork.freedesktop.org/patch/msgid/20180913130254.GA156437@art_vandelay
This patch adds support to decode system memory bandwidth and other
parameters for skylake and Gen9+ platforms, which will be used for
arbitrated display memory bandwidth calculation in GEN9 based
platforms and WM latency level-0 Work-around calculation on GEN9+.
Changes Since V1:
- s/memdev_info/dram_info
- create a struct to hold channel info
Changes Since V2:
- rewrite code to adhere i915 coding style
- not valid for GLK
Signed-off-by: Mahesh Kumar <mahesh1.kumar@intel.com>
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180824093225.12598-3-mahesh1.kumar@intel.com
If we have framebuffers that are >= 4GiB in size we will overflow
the fb size check in intel_fill_fb_info().
Currently that is only possible with NV12 and CCS as offsets[1]
may be anything between 0 and 0xffffffff. offsets[0] is currently
required to be 0 so we can't hit the overflow with any single
plane format (thanks to max fb size of 8kx8k and max stride of
32 KiB).
In the future we may allow almost any framebuffer to exceed 4GiB
in size so we really should fix the overflow. Not that the overflow
is particularly dangerous. It's mostly just a sanity check against
insane userspace. The display engine can't write to memory anyway
so I suppose in the worst case we might anger the hw by attempting
scanout past the end of the ggtt, or we might scan out some data
that we're not supposed to see from other parts of the ggtt.
Note that triggering this overflow depends on the driver
aligning the fb height to the next tile boundary to push the
calculated size above 4GiB. With linear buffers the effective
tile height is one so that never happens, and the core already
has a check for 32bit overflow of offsets[]+pitches[]*height.
v2: Drop the unnecessary cast (Chris)
Testcase: igt/kms_big_fb/x-tiled-addfb-size-offset-overflow
Testcase: igt/kms_big_fb/y-tiled-addfb-size-offset-overflow
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20180912180443.28649-1-ville.syrjala@linux.intel.com
We can remove the update-via-batch-buffer code path, which is basically an
effective duplicate of update-via-context-image path, if we notice that
after we have idled the GPU, we can update the context image even of the
kernel context directly. (Update-via-batch-buffer path existed only to
solve the problem of how to update the kernel context image.)
Only additional thing needed is to activate the edited configuration by
sending one empty request down the pipe. This accomplishes context restore
of the updated kernel context and so the OA configuration gets written out
to it's control registers.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20180912152930.28237-1-tvrtko.ursulin@linux.intel.com
Split up intel_check_primary_plane() and intel_check_sprite_plane()
into per-platform variants. This way we can get a unified behaviour
between the SKL universal planes, and we stop checking for non-SKL
specific scaling limits for the "sprite" planes. And we now get
a natural place where to add more plarform specific checks.
v2: Split the .check_plane() calling convention change out (José)
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180907152413.15761-10-ville.syrjala@linux.intel.com
Let's store the final plane stride in the plane state. This avoids
having to pick between the normal vs. rotated stride during hardware
programming. And once we get GTT remapping the plane stride will
no longer match the fb stride so we'll need a place to store it
anyway.
v2: Keep checking fb->pitches[0] for cursor as later on we won't
populate plane_state->color_plane[0].stride for invisible planes
and we have been checking the cursor fb stride even for invisible
planes
v3: s/betwen/between in commit msg (José)
v4: Check color_plane[0].stride instead of fb->pitches[0] in
the skl_check_main_surface() X-tiling kludge
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180911150139.23922-1-ville.syrjala@linux.intel.com
If the caller supplies more than 4G of objects and than one that has to
be in the low 4G, it is possible for the low 4G to be full before we
attempt to find room for the last object that must be there. As we don't
reorder the two types, every pass hits the same problem and we fail with
ENOSPC. However, if we impose a little bit of ordering between the two
classes of objects, on the second pass we will be able to fit the
special object as we do it first. For setups that only use !48b objects,
we now reverse the order between passes, hopefully making the subsequent
passes more likely to succeed given that we are trying a different
order (rather than repeating the previous pass!)
v2: Quick one line explanation for the relative priorities given to
reservations.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180912101133.31377-1-chris@chris-wilson.co.uk
Userspace should be free to race against itself and shoot itself in
the foot if it so desires to adjust a parameter at the same time as
submitting a batch to that context. As such, the struct_mutex in context
setparam is only being used to serialise userspace against itself and
not for any protection of internal structs and so is superfluous.
v2: Separate user_flags from internal flags to reduce chance of
interference; and use locked bit ops for user updates.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180911132206.23032-1-chris@chris-wilson.co.uk
Given that we are now reasonably confident in our ability to detect and
reserve the stolen memory (physical memory reserved for graphics by the
BIOS) for ourselves on most machines, we can put it to use. In this
case, we need a page to hold the overlay registers.
On an i915g running MythTv, H Buus noticed that
commit 6a2c4232ec
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Tue Nov 4 04:51:40 2014 -0800
drm/i915: Make the physical object coherent with GTT
introduced stuttering into his video playback. After discarding the
likely suspect of it being the physical cursor updates, we were left
with the use of the phys object for the overlay. And lo, if we
completely avoid using the phys object (allocated just once on module
load!) by switching to stolen memory, the stuttering goes away.
For lack of a better explanation, claim victory and kill two birds with
one stone.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107600
Fixes: 6a2c4232ec ("drm/i915: Make the physical object coherent with GTT")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180906190144.1272-1-chris@chris-wilson.co.uk
(cherry picked from commit c8124d3992)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Given that we are now reasonably confident in our ability to detect and
reserve the stolen memory (physical memory reserved for graphics by the
BIOS) for ourselves on most machines, we can put it to use. In this
case, we need a page to hold the overlay registers.
On an i915g running MythTv, H Buus noticed that
commit 6a2c4232ec
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Tue Nov 4 04:51:40 2014 -0800
drm/i915: Make the physical object coherent with GTT
introduced stuttering into his video playback. After discarding the
likely suspect of it being the physical cursor updates, we were left
with the use of the phys object for the overlay. And lo, if we
completely avoid using the phys object (allocated just once on module
load!) by switching to stolen memory, the stuttering goes away.
For lack of a better explanation, claim victory and kill two birds with
one stone.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107600
Fixes: 6a2c4232ec ("drm/i915: Make the physical object coherent with GTT")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180906190144.1272-1-chris@chris-wilson.co.uk
Merge tag 'gvt-next-2018-09-04'
drm-intel-next-2018-09-06-1:
UAPI Changes:
- GGTT coherency GETPARAM: GGTT has turned out to be non-coherent for some
platforms, which we've failed to communicate to userspace so far. SNA was
modified to do extra flushing on non-coherent GGTT access, while Mesa will
mitigate by always requiring WC mapping (which is non-coherent anyway).
- Neuter Resource Streamer uAPI: There never really were users for the feature,
so neuter it while keeping the interface bits for compatibility. This is a
long due item from past.
Cross-subsystem Changes:
- Backmerge of branch drm-next-4.19 for DP_DPCD_REV_14 changes
Core Changes:
- None
Driver Changes:
- A load of Icelake (ICL) enabling patches (Paulo, Manasi)
- Enabled full PPGTT for IVB,VLV and HSW (Chris)
- Bugzilla #107113: Distribute DDB based on display resolutions (Mahesh)
- Bugzillas #100023,#107476,#94921: Support limited range DP displays (Jani)
- Bugzilla #107503: Increase LSPCON timeout (Fredrik)
- Avoid boosting GPU due to an occasional stall in interactive workloads (Chris)
- Apply GGTT coherency W/A only for affected systems instead of all (Chris)
- Fix for infinite link training loop for faulty USB-C MST hubs (Nathan)
- Keep KMS functional on Gen4 and earlier when GPU is wedged (Chris)
- Stop holding ppGTT reference from closed VMAs (Chris)
- Clear error registers after error capture (Lionel)
- Various Icelake fixes (Anusha, Jyoti, Ville, Tvrtko)
- Add missing Coffeelake (CFL) PCI IDs (Rodrigo)
- Flush execlists tasklet directly from reset-finish (Chris)
- Fix LPE audio runtime PM (Chris)
- Fix detection of out of range surface positions (GLK/CNL) (Ville)
- Remove wait-for-idle for PSR2 (Dhinakaran)
- Power down existing display hardware resources when display is disabled (Chris)
- Don't allow runtime power management if RC6 doesn't exist (Chris)
- Add debugging checks for runtime power management paths (Imre)
- Increase symmetry in display power init/fini paths (Imre)
- Isolate GVT specific macros from i915_reg.h (Lucas)
- Increase symmetry in power management enable/disable paths (Chris)
- Increase IP disable timeout to 100 ms to avoid DRM_ERROR (Imre)
- Fix memory leak from HDMI HDCP write function (Brian, Rodrigo)
- Reject Y/Yf tiling on interlaced modes (Ville)
- Use a cached mapping for the physical HWS on older gens (Chris)
- Force slow path of writing relocations to buffer if unable to write to userspace (Chris)
- Do a full device reset after being wedged (Chris)
- Keep forcewake counts over reset (in case of debugfs user) (Imre, Chris)
- Avoid false-positive errors from power wells during init (Imre)
- Reset engines forcibly in exchange of declaring whole device wedged (Mika)
- Reduce context HW ID lifetime in preparation for Icelake (Chris)
- Attempt to recover from module load failures (Chris)
- Keep select interrupts over a reset to avoid missing/losing them (Chris)
- GuC submission backend improvements (Jakub)
- Terminate context images with BB_END (Chris, Lionel)
- Make GCC evaluate GGTT view struct size assertions again (Ville)
- Add selftest to exercise suspend/hibernate code-paths for GEM (Chris)
- Use a full emulation of a user ppgtt context in selftests (Chris)
- Exercise resetting in the middle of a wait-on-fence in selftests (Chris)
- Fix coherency issues on selftests for Baytrail (Chris)
- Various other GEM fixes / self-test updates (Chris, Matt)
- GuC doorbell self-tests (Daniele)
- PSR mode control through debugfs for IGTs (Maarten)
- Degrade expected WM latency errors to DRM_DEBUG_KMS (Chris)
- Cope with errors better in MST link training (Dhinakaran)
- Fix WARN on KBL external displays (Azhar)
- Power well code cleanups (Imre)
- Fixes to PSR debugging (Dhinakaran)
- Make forcewake errors louder for easier catching in CI (WARNs) (Chris)
- Fortify tiling code against programmer errors (Chris)
- Bunch of fixes for CI exposed corner cases (multiple authors, mostly Chris)
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180907105446.GA22860@jlahtine-desk.ger.corp.intel.com
drm-misc-next for 4.20:
UAPI Changes:
- Add userspace dma-buf device to turn memfd regions into dma-bufs (Gerd)
- Add per-plane blend mode property (Lowry)
- Change in drm_fourcc.h is documentation only (Brian)
Cross-subsystem Changes:
- None
Core Changes:
- Remove user logspam and useless lock in vma_offset_mgr destroy (Chris)
- Add get/verify_crc_source for improved crc source selection (Mahesh)
- Add __drm_atomic_helper_plane_reset to reduce copypasta (Alexandru)
Driver Changes:
- various: Replance ref/unref calls with drm_dev_get/put (Thomas)
- bridge: Add driver for TI SN65DSI86 chip (Sandeep)
- rockchip: Add PX30 support (Sandy)
- sun4i: Add support for R40 TCON (Jernej)
- vkms: Continued building out vkms, added gem support (Haneen)Driver Changes:
- various: fbdev: Wrap remove_conflicting_framebuffers with resource_len
accessors to remove a bunch of cargo-cult (Michał)
- rockchip: Add rgb output iface support + fixes (Sandy/Heiko)
- nouveau/amdgpu: Add cec-over-aux support (Hans)
- sun4i: Add support for Allwinner A64 (Jagan)
Cc: Gerd Hoffmann <kraxel@redhat.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Cc: Heiko Stuebner <heiko@sntech.de>
Cc: Sandy Huang <hjc@rock-chips.com>
Cc: Hans Verkuil <hans.verkuil@cisco.com>
Cc: Jagan Teki <jagan@amarulasolutions.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Sean Paul <sean@poorly.run>
Link: https://patchwork.freedesktop.org/patch/msgid/20180905202210.GA95199@art_vandelay