Define a mutex for the exclusive use of interacting with the per-file
context-idr, that was previously guarded by struct_mutex. This allows us
to reduce the coverage of struct_mutex, with a view to removing the last
bits coordinating GEM context later. (In the short term, we avoid taking
struct_mutex while using the extended constructor functions, preventing
some nasty recursion.)
v2: s/context_lock/context_idr_lock/
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190321140711.11190-2-chris@chris-wilson.co.uk
In order to make it easier to bring up new platforms
without having to take care about all corner cases
that was previously taken care for previous platforms
we already use comparative INTEL_GEN statements.
Let's start doing the same with PCH.
The only caveats are:
- less-than comparisons need to be avoided or done with
attention and check > PCH_NONE as well.
- It is not necessarily a chronological order, but a matter
of south display compatibility/inheritance.
v2: Rebased on top of Jani's clean-up which removed the
need for less-than comparison
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190308214300.25057-3-rodrigo.vivi@intel.com
Currently we assume that we know the order in which requests run and so
can determine if we need to reissue a switch-to-kernel-context prior to
idling. That assumption does not hold for the future, so instead of
tracking which barriers have been used, simply determine if we have ever
switched away from the kernel context by using the engine and before
idling ensure that all engines that have been used since the last idle
are synchronously switched back to the kernel context for safety (and
else of shrinking memory while idle).
v2: Use intel_engine_mask_t and ALL_ENGINES
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190308093657.8640-3-chris@chris-wilson.co.uk
When the system idles, we switch to the kernel context as a defensive
measure (no users are harmed if the kernel context is lost). Currently,
we issue a switch to kernel context and then come back later to see if
the kernel context is still current and the system is idle. However,
if we are no longer privy to the runqueue ordering, then we have to
relax our assumptions about the logical state of the GPU and the only
way to ensure that the kernel context is currently loaded is by issuing
a request to run after all others, and wait for it to complete all while
preventing anyone else from issuing their own requests.
v2: Pull wedging into switch_to_kernel_context_sync() but only after
waiting (though only for the same short delay) for the active context to
finish.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190308093657.8640-1-chris@chris-wilson.co.uk
In the next patch, we are introducing a broad virtual engine to encompass
multiple physical engines, losing the 1:1 nature of BIT(engine->id). To
reflect the broader set of engines implied by the virtual instance, lets
store the full bitmask.
v2: Use intel_engine_mask_t (s/ring_mask/engine_mask/)
v3: Tvrtko voted for moah churn so teach everyone to not mention ring
and use $class$instance throughout.
v4: Comment upon the disparity in bspec for using VCS1,VCS2 in gen8 and
VCS[0-4] in later gen. We opt to keep the code consistent and use
0-index naming throughout.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190305180332.30900-1-chris@chris-wilson.co.uk
We currently use a worker queued from an rcu callback to determine when
a how grace period has elapsed while we remained idle. We use this idle
delay to infer that we will be idle for a while and this is a suitable
point at which we can trim our global memory caches.
Since we wrote that, this mechanism now exists as rcu_work, and having
converted the idle shrinkers over to using that, we can remove our own
variant.
v2: Say goodbye to gt.epoch as well.
v3: Remove the misplaced and redundant comment before parking globals
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190228102035.5857-3-chris@chris-wilson.co.uk
As kmem_caches share the same properties (size, allocation/free behaviour)
for all potential devices, we can use global caches. While this
potential has worse fragmentation behaviour (one can argue that
different devices would have different activity lifetimes, but you can
also argue that activity is temporal across the system) it is the
default behaviour of the system at large to amalgamate matching caches.
The benefit for us is much reduced pointer dancing along the frequent
allocation paths.
v2: Defer shrinking until after a global grace period for futureproofing
multiple consumers of the slab caches, similar to the current strategy
for avoiding shrinking too early.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190228102035.5857-1-chris@chris-wilson.co.uk
Defining the mei-i915 interface functions and initialization of
the interface.
v2:
Adjust to the new interface changes. [Tomas]
Added further debug logs for the failures at MEI i/f.
port in hdcp_port data is equipped to handle -ve values.
v3:
mei comp is matched for global i915 comp master. [Daniel]
In hdcp_shim hdcp_protocol() is replaced with const variable. [Daniel]
mei wrappers are adjusted as per the i/f change [Daniel]
v4:
port initialization is done only at hdcp2_init only [Danvet]
v5:
I915 registers a subcomponent to be matched with mei_hdcp [Daniel]
v6:
HDCP_disable for all connectors incase of comp_unbind.
Tear down HDCP comp interface at i915_unload [Daniel]
v7:
Component init and fini are moved out of connector ops [Daniel]
hdcp_disable is not called from unbind. [Daniel]
v8:
subcomponent name is dropped as it is already merged.
Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> [v11]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/1550338640-17470-5-git-send-email-ramalingam.c@intel.com
At a few points in our uABI, we check to see if the driver is wedged and
report -EIO back to the user in that case. However, as we perform the
check and reset asynchronously (where once before they were both
serialised by the struct_mutex), we may instead see the temporary wedging
used to cancel inflight rendering to avoid a deadlock during reset
(caused by either us timing out in our reset handler,
i915_wedge_on_timeout or with malice aforethought in intel_reset_prepare
for a stuck modeset). If we suspect this is the case, that is we see a
wedged driver *and* reset in progress, then wait until the reset is
resolved before reporting upon the wedged status.
v2: might_sleep() (Mika)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109580
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190220145637.23503-1-chris@chris-wilson.co.uk
As time goes by, usage of generic ioctls such as drm_syncobj and
sync_file are on the increase bypassing i915-specific ioctls like
GEM_WAIT. Currently, we only apply waitboosting to our driver ioctls as
we track the file/client and account the waitboosting to them. However,
since commit 7b92c1bd05 ("drm/i915: Avoid keeping waitboost active for
signaling threads"), we no longer have been applying the client
ratelimiting on waitboosts and so that information has only been used
for debug tracking.
Push the application of waitboosting down to the common
i915_request_wait, and apply it to all foreign fence waits as well.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Eero Tamminen <eero.t.tamminen@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190213092504.25709-1-chris@chris-wilson.co.uk
Previously, we were able to rely on the recursive properties of
struct_mutex to allow us to serialise revoking mmaps and reacquiring the
FENCE registers with them being clobbered over a global device reset.
I then proceeded to throw out the baby with the bath water in order to
pursue a struct_mutex-less reset.
Perusing LWN for alternative strategies, the dilemma on how to serialise
access to a global resource on one side was answered by
https://lwn.net/Articles/202847/ -- Sleepable RCU:
1 int readside(void) {
2 int idx;
3 rcu_read_lock();
4 if (nomoresrcu) {
5 rcu_read_unlock();
6 return -EINVAL;
7 }
8 idx = srcu_read_lock(&ss);
9 rcu_read_unlock();
10 /* SRCU read-side critical section. */
11 srcu_read_unlock(&ss, idx);
12 return 0;
13 }
14
15 void cleanup(void)
16 {
17 nomoresrcu = 1;
18 synchronize_rcu();
19 synchronize_srcu(&ss);
20 cleanup_srcu_struct(&ss);
21 }
No more worrying about stop_machine, just an uber-complex mutex,
optimised for reads, with the overhead pushed to the rare reset path.
However, we do run the risk of a deadlock as we allocate underneath the
SRCU read lock, and the allocation may require a GPU reset, causing a
dependency cycle via the in-flight requests. We resolve that by declaring
the driver wedged and cancelling all in-flight rendering.
v2: Use expedited rcu barriers to match our earlier timing
characteristics.
v3: Try to annotate locking contexts for sparse
v4: Reduce selftest lock duration to avoid a reset deadlock with fences
v5: s/srcu/reset_backoff_srcu/
v6: Remove more stale comments
Testcase: igt/gem_mmap_gtt/hang
Fixes: eb8d0f5af4 ("drm/i915: Remove GPU reset dependence on struct_mutex")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190208153708.20023-2-chris@chris-wilson.co.uk
Changing the i915_edp_psr_debug was enabling, disabling or switching
PSR version by directly calling intel_psr_disable_locked() and
intel_psr_enable_locked(), what is not the default PSR path that will
be executed by real users.
So lets force a fastset in the PSR CRTC to trigger a pipe update and
stress the default code path.
Recently a bug was found when switching from PSR2 to PSR1 while
enable_psr kernel parameter was set to the default parameter, this
changes fix it and also fixes the bug linked bellow were DRRS was
left enabled together with PSR when enabling PSR from debugfs.
v2: Handling missing case: disabled to PSR1
v3: Not duplicating the whole atomic state(Maarten)
v4: Adding back the missing call to intel_psr_irq_control(Dhinakaran)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108341
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Dhinakaran Pandiyan <dhinakaran.pandiyan@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Dhinakaran Pandiyan <dhinakaran.pandiyan@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190206211845.5322-1-jose.souza@intel.com
Split the color management hooks along the single vs. double
buffered registers line. Of the currently programmed registers
GAMMA_MODE and the ilk+ pipe CSC are double buffered, the
LUTS and CHV CGM block are single buffered.
The double buffered register will be programmed during the
normal pipe update with evasion, and also during pipe enable
so that the settings will already be correct when the pipe
starts up before the planes are enabled.
The single buffered registers are currently programmed before
the vblank evade. Which is totally wrong, but we'll correct
that later.
v2: Add some docs to explain the two vfuncs (Matt,Uma)
Rebase
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190205160848.24662-6-ville.syrjala@linux.intel.com
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
First of all GMCH can be considered a feature by itself
since it is a chip present in some platforms that connects
the IA processor to memory and other components in PC.
Also with the introduction of display block at device info,
we got a redundant definition:
.display.has_gmch_display = 1,
So, let's clean up things a bit and use the standardized
way of has_feature on displays side.
No functional change and no manual interaction to generate
this patch.
It is only:
sed -si -e 's/has_gmch_display/has_gmch/g' \
-e 's/HAS_GMCH_DISPLAY/HAS_GMCH/g' drivers/gpu/drm/i915/*{c,h}
Cc: José Roberto de Souza <jose.souza@intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190204222538.15842-1-rodrigo.vivi@intel.com
We want to expose the ability to reconfigure the slices, subslice and
eu per context and per engine. To facilitate that, store the current
configuration on the context for each engine, which is initially set
to the device default upon creation.
v2: record sseu configuration per context & engine (Chris)
v3: introduce the i915_gem_context_sseu to store powergating
programming, sseu_dev_info has grown quite a bit (Lionel)
v4: rename i915_gem_sseu into intel_sseu (Chris)
use to_intel_context() (Chris)
v5: More to_intel_context() (Tvrtko)
Switch intel_sseu from union to struct (Tvrtko)
Move context default sseu in existing loop (Chris)
v6: s/intel_sseu_from_device_sseu/intel_device_default_sseu/ (Tvrtko)
Tvrtko Ursulin:
v7:
* Pass intel_sseu by pointer instead of value to make_rpcs.
* Rebase for make_rpcs changes.
v8:
* Rebase for RPCS edit on pin.
v9:
* Rebase for context image setup changes.
v10:
* Rename dev_priv to i915. (Chris Wilson)
v11:
* Rebase.
v12:
* Rebase for IS_GEN changes.
v13:
* Rebase for RUNTIME_INFO.
v14:
* Rebase for intel_context_init.
v15:
* Rebase for drm-tip changes.
v16:
* Moved struct intel_sseu definition to i915_gem_context.h.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190205095032.22673-1-tvrtko.ursulin@linux.intel.com
On icl+ bspec tells us to calculate a separate minimum ddb
allocation from the blocks watermark. Both have to be checked
against the actual ddb allocation, but since we do things the
other way around we'll just calculat the minimum acceptable
ddb allocation by taking the maximum of the two values.
We'll also replace the memcmp() with a full trawl over the
the watermarks so that it'll ignore the min_ddb_alloc
because we can't directly read that out from the hw. I suppose
we could reconstruct it from the other values, but I was
too lazy to do that now.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20181221171436.8218-6-ville.syrjala@linux.intel.com
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
If we restrict ourselves to only using a cacheline for each timeline's
HWSP (we could go smaller, but want to avoid needless polluting
cachelines on different engines between different contexts), then we can
suballocate a single 4k page into 64 different timeline HWSP. By
treating each fresh allocation as a slab of 64 entries, we can keep it
around for the next 64 allocation attempts until we need to refresh the
slab cache.
John Harrison noted the issue of fragmentation leading to the same worst
case performance of one page per timeline as before, which can be
mitigated by adopting a freelist.
v2: Keep all partially allocated HWSP on a freelist
This is still without migration, so it is possible for the system to end
up with each timeline in its own page, but we ensure that no new
allocation would needless allocate a fresh page!
v3: Throw a selftest at the allocator to try and catch invalid cacheline
reuse.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190128181812.22804-4-chris@chris-wilson.co.uk
Now that the submission backends are controlled via their own spinlocks,
with a wave of a magic wand we can lift the struct_mutex requirement
around GPU reset. That is we allow the submission frontend (userspace)
to keep on submitting while we process the GPU reset as we can suspend
the backend independently.
The major change is around the backoff/handoff strategy for performing
the reset. With no mutex deadlock, we no longer have to coordinate with
any waiter, and just perform the reset immediately.
Testcase: igt/gem_mmap_gtt/hang # regresses
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190125132230.22221-3-chris@chris-wilson.co.uk