It was only needed to protect the connector_list walking, see
commit 8c4ccc4ab6
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date: Thu Jul 9 23:44:26 2015 +0200
drm/probe-helper: Grab mode_config.mutex in poll_init/enable
Unfortunately the commit message of that patch fails to mention that
the new locking check was for the connector_list.
But that requirement disappeared in
commit c36a3254f7
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date: Thu Dec 15 16:58:43 2016 +0100
drm: Convert all helpers to drm_connector_list_iter
and so we can drop this again.
This fixes a locking inversion on nouveau, where the rpm code needs to
re-enable. But in other places the rpm_get() calls are nested within
the big modeset locks.
While at it, also improve the kerneldoc for these two functions a
notch.
v2: Update the kerneldoc even more to explain that these functions
can't be called concurrently, or bad things happen (Chris).
Cc: Dave Airlie <airlied@gmail.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Tested-by: Lyude <lyude@redhat.com>
Reviewed-by: Lyude <lyude@redhat.com>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170111090117.5134-1-daniel.vetter@ffwll.ch
The WaDisableLSQCROPERFforOCL workaround has the side effect of
disabling an L3SQ optimization that has huge performance implications
and is unlikely to be necessary for the correct functioning of usual
graphic workloads. Userspace is free to re-enable the workaround on
demand, and is generally in a better position to determine whether the
workaround is necessary than the DRM is (e.g. only during the
execution of compute kernels that rely on both L3 fences and HDC R/W
requests).
The same workaround seems to apply to BDW (at least to production
stepping G1) and SKL as well (the internal workaround database claims
that it does for all steppings, while the BSpec workaround table only
mentions pre-production steppings), but the DRM doesn't do anything
beyond whitelisting the L3SQCREG4 register so userspace can enable it
when it sees fit. Do the same on KBL platforms.
Improves performance of the GFXBench4 gl_manhattan31 benchmark by 60%,
and gl_4 (AKA car chase) by 14% on a KBL GT2 running Mesa master --
This is followed by a regression of 35% and 10% respectively for the
same benchmarks and platform caused by my recent patch series
switching userspace to use the dataport constant cache instead of the
sampler to implement uniform pull constant loads, which caused us to
hit more heavily the L3 cache (and on platforms other than KBL had the
opposite effect of improving performance of the same two benchmarks).
The overall effect on KBL of this change combined with the recent
userspace change is respectively 4.6% and 2.6%. SynMark2 OglShMapPcf
was affected by the constant cache changes (though it improved as it
did on other platforms rather than regressing), but is not
significantly affected by this patch (with statistical significance of
5% and sample size 20).
v2: Drop some more code to avoid unused variable warning.
Fixes: 738fa1b312 ("drm/i915/kbl: Add WaDisableLSQCROPERFforOCL")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99256
Signed-off-by: Francisco Jerez <currojerez@riseup.net>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Cc: Eero Tamminen <eero.t.tamminen@intel.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: beignet@lists.freedesktop.org
Cc: <stable@vger.kernel.org> # v4.7+
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
[Removed double Fixes tag]
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1484217894-20505-1-git-send-email-mika.kuoppala@intel.com
Performing an eviction search can be very, very slow especially for a
range restricted replacement. For example, a workload like
gem_concurrent_blit will populate the entire GTT and then cause aperture
thrashing. Since the GTT is a mix of active and inactive tiny objects,
we have to search through almost 400k objects before finding anything
inside the mappable region, and as this search is required before every
operation performance falls off a cliff.
Instead of performing the full search, we do a trial replacement of the
node at a random location fitting the specified restrictions. We lose
the strict LRU property of the GTT in exchange for avoiding the slow
search (several orders of runtime improvement for gem_concurrent_blit
4KiB-global-gtt, e.g. from 5000s to 20s). The loss of LRU replacement is
(later) mitigated firstly by only doing replacement if we find no
freespace and secondly by execbuf doing a PIN_NONBLOCK search first before
it starts thrashing (i.e. the random replacement will only occur from the
already inactive set of objects).
v2: Ascii-art, and check preconditionst
v3: Rephrase final sentence in comment to explain why we don't bother
with if (i915_is_ggtt(vm)) for preferring random replacement.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170111112312.31493-3-chris@chris-wilson.co.uk
When we evict from the GTT to make room for an object, the hole we
create is put onto the MRU stack inside the drm_mm range manager. On the
next search pass, we can speed up a PIN_HIGH allocation by referencing
that stack for the new hole.
v2: Pull together the 3 identical implements (ahem, a couple were
outdated) into a common routine for allocating a node and evicting as
necessary.
v3: Detect invalid calls to i915_gem_gtt_insert()
v4: kerneldoc
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170111112312.31493-1-chris@chris-wilson.co.uk
Start converting over from the byte count to its semantic macro, either
we want to allocate the size of a physical page in main memory or we
want the size of a virtual page in the GTT. 4096 could mean either, but
PAGE_SIZE and I915_GTT_PAGE_SIZE are explicit and should help improve
code comprehension and future changes. In the future, we may want to use
variable GTT page sizes and so have the challenge of knowing which
hardcoded values were used to represent a physical page vs the virtual
page.
v2: Look for a few more 4096s to convert, discover IS_ALIGNED().
v3: 4096ul paranoia, make fence alignment a distinct value of 4096, keep
bdw stolen w/a as 4096 until we know better.
v4: Add asserts that i915_vma_insert() start/end are aligned to GTT page
sizes.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170110144734.26052-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Let userspace know if its request was resubmitted due to it being
executed at the time of a global reset. In this case, the reset was for
a guilty request on another engine, and this request was an innocent
victim that will be re-executed upon restarting. However, since it was
running at the time of the reset, we can not guarantee that it suffered
no ill-effects from the reset (e.g. some context state may be lost, or
some self-modifying fragment shaders will be restarted from the final
state not their initial state), to let userspace know that it has been
corrupted set a special value on the fence->error, -EAGAIN.
If the request does hang on resubmission, the error will be overwritten
with -EIO.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170110172246.27297-3-chris@chris-wilson.co.uk
The core provides now an ABI to userspace for generation of frame CRCs,
so implement the ->set_crc_source() callback and reuse as much code as
possible with the previous ABI implementation.
When handling the pageflip interrupt, we skip 1 or 2 frames depending on
the HW because they contain wrong values. For the legacy ABI for
generating frame CRCs, this was done in userspace but now that we have a
generic ABI it's better if it's not exposed by the kernel.
v2:
- Leave the legacy implementation in place as the ABI implementation
in the core is incompatible with it.
v3:
- Use the "cooked" vblank counter so we have a whole 32 bits.
- Make sure we don't mess with the state of the legacy CRC capture
ABI implementation.
v4:
- Keep use of get_vblank_counter as in the legacy code, will be
changed in a followup commit.
v5:
- Skip first frame or two as it's known that they contain wrong
data.
- A few fixes suggested by Emil Velikov.
v6:
- Rework programming of the HW registers to preserve previous
behavior.
v7:
- Address whitespace issue.
- Added a comment on why in the implementation of the new ABI we
skip the 1st or 2nd frames.
v9:
- Add stub for intel_crtc_set_crc_source.
v12:
- Rebased.
- Remove stub for intel_crtc_set_crc_source and instead set the
callback to NULL (Jani Nikula).
v15:
- Rebased.
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Emil Velikov <emil.velikov@collabora.com>
Reviewed-by: Robert Foss <robert.foss@collabora.com>
irq
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/20170110134305.26326-2-tomeu.vizoso@collabora.com
Pull in latest drm-next from Dave Airlie to get at all the drm-misc
goodies, specifically:
- dma_fence error state handling rework (Chris needs that for error
recovery)
- crc support locking changes (Tomeu's i915 crc patches need that).
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
The fence size/alignment is a combination of the vma size plus object
tiling parameters. Those parameters are rarely changed, making the fence
size/alignemnt roughly constant for the lifetime of the VMA. We can
simplify subsequent calculations by precalculating the size/alignment
required for GGTT vma taking fencing into account (with an update if we
do change the tiling or stride).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170109161613.11881-4-chris@chris-wilson.co.uk
Back to regular -misc pulls with reasonable sizes:
- dma_fence error clarification (Chris)
- drm_crtc_from_index helper (Shawn), pile more patches on the m-l to roll
this out to drivers
- mmu-less support for fbdev helpers from Benjamin
- piles of kerneldoc work
- some polish for crc support from Tomeu and Benjamin
- odd misc stuff all over
* tag 'drm-misc-next-2017-01-09' of git://anongit.freedesktop.org/git/drm-misc: (48 commits)
dma-fence: Introduce drm_fence_set_error() helper
dma-fence: Wrap querying the fence->status
dma-fence: Clear fence->status during dma_fence_init()
drm: fix compilations issues introduced by "drm: allow to use mmuless SoC"
drm: Change the return type of the unload hook to void
drm: add more document for drm_crtc_from_index()
drm: remove useless parameters from drm_pick_cmdline_mode function
drm: crc: Call wake_up_interruptible() each time there is a new CRC entry
drm: allow to use mmuless SoC
drm: compile drm_vm.c only when needed
fbmem: add a default get_fb_unmapped_area function
drm: crc: Wait for a frame before returning from open()
drm: Move locking into drm_debugfs_crtc_crc_add
drm/imx: imx-tve: Remove unused variable
Revert "drm: nouveau: fix build when LEDS_CLASS=m"
drm: Add kernel-doc for drm_crtc_commit_get/put
drm/atomic: Fix outdated comment.
drm: reference count event->completion
gpu: drm: mgag200: mgag200_main:- Handle error from pci_iomap
drm: Document deprecated load/unload hook
...
More 4.11 stuff, holidays edition (i.e. not much):
- docs and cleanups for shared dpll code (Ander)
- some kerneldoc work (Chris)
- fbc by default on gen9+ too, yeah! (Paulo)
- fixes, polish and other small things all over gem code (Chris)
- and a few small things on top
Plus a backmerge, because Dave was enjoying time off too.
* tag 'drm-intel-next-2017-01-09' of git://anongit.freedesktop.org/git/drm-intel: (275 commits)
drm/i915: Update DRIVER_DATE to 20170109
drm/i915: Drain freed objects for mmap space exhaustion
drm/i915: Purge loose pages if we run out of DMA remap space
drm/i915: Fix phys pwrite for struct_mutex-less operation
drm/i915: Simplify testing for am-I-the-kernel-context?
drm/i915: Use range_overflows()
drm/i915: Use fixed-sized types for stolen
drm/i915: Use phys_addr_t for the address of stolen memory
drm/i915: Consolidate checks for memcpy-from-wc support
drm/i915: Only skip requests once a context is banned
drm/i915: Move a few more utility macros to i915_utils.h
drm/i915: Clear ret before unbinding in i915_gem_evict_something()
drm/i915/guc: Exclude the upper end of the Global GTT for the GuC
drm/i915: Move a few utility macros into a separate header
drm/i915/execlists: Reorder execlists register enabling
drm/i915: Assert that we do create the deferred context
drm/i915: Assert all timeline requests are gone before fini
drm/i915: Revoke fenced GTT mmapings across GPU reset
drm/i915: enable FBC on gen9+ too
drm/i915: actually drive the BDW reserved IDs
...
In gvt, almost all memory allocations are in sleepable contexts. It's
fault-prone to use GFP_ATOMIC everywhere. Replace it with GFP_KERNEL
wherever possible.
Signed-off-by: Jike Song <jike.song@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>