When running the GuC the GPU can't be considered idle if the GuC still
has contexts pinned. As such, a call has been added in
intel_gt_wait_for_idle to idle the UC and in turn the GuC by waiting for
the number of unpinned contexts to go to zero.
v2: rtimeout -> remaining_timeout
v3: Drop unnecessary includes, guc_submission_busy_loop ->
guc_submission_send_busy_loop, drop negatie timeout trick, move a
refactor of guc_context_unpin to earlier path (John H)
v4: Add stddef.h back into intel_gt_requests.h, sort circuit idle
function if not in GuC submission mode
Cc: John Harrison <john.c.harrison@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210721215101.139794-16-matthew.brost@intel.com
With GuC scheduling, it isn't safe to unpin a context while scheduling
is enabled for that context as the GuC may touch some of the pinned
state (e.g. LRC). To ensure scheduling isn't enabled when an unpin is
done, a call back is added to intel_context_unpin when pin count == 1
to disable scheduling for that context. When the response CTB is
received it is safe to do the final unpin.
Future patches may add a heuristic / delay to schedule the disable
call back to avoid thrashing on schedule enable / disable.
v2:
(John H)
- s/drm_dbg/drm_err
(Daneiel)
- Clean up sched state function
Cc: John Harrison <john.c.harrison@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210721215101.139794-9-matthew.brost@intel.com
Implement GuC context operations which includes GuC specific operations
alloc, pin, unpin, and destroy.
v2:
(Daniel Vetter)
- Use msleep_interruptible rather than cond_resched in busy loop
(Michal)
- Remove C++ style comment
v3:
(Matthew Brost)
- Drop GUC_ID_START
(John Harrison)
- Fix a bunch of typos
- Use drm_err rather than drm_dbg for G2H errors
(Daniele)
- Fix ;; typo
- Clean up sched state functions
- Add lockdep for guc_id functions
- Don't call __release_guc_id when guc_id is invalid
- Use MISSING_CASE
- Add comment in guc_context_pin
- Use shorter path to rpm
(Daniele / CI)
- Don't call release_guc_id on an invalid guc_id in destroy
v4:
(Daniel Vetter)
- Add FIXME comment
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210721215101.139794-7-matthew.brost@intel.com
Implement GuC submission tasklet for new interface. The new GuC
interface uses H2G to submit contexts to the GuC. Since H2G use a single
channel, a single tasklet is used for the submission path.
Also the per engine interrupt handler has been updated to disable the
rescheduling of the physical engine tasklet, when using GuC scheduling,
as the physical engine tasklet is no longer used.
In this patch the field, guc_id, has been added to intel_context and is
not assigned. Patches later in the series will assign this value.
v2:
(John Harrison)
- Clean up some comments
v3:
(John Harrison)
- More comment cleanups
Cc: John Harrison <john.c.harrison@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210721215101.139794-5-matthew.brost@intel.com
This essentially reverts
commit 84a1074920
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Wed Jan 24 11:36:08 2018 +0000
drm/i915: Shrink the GEM kmem_caches upon idling
mm/vmscan.c:do_shrink_slab() is a thing, if there's an issue with it
then we need to fix that there, not hand-roll our own slab shrinking
code in i915.
Also when this was added there was only one other caller of
kmem_cache_shrink (added 2005 to the acpi code). Now there's a 2nd one
outside of i915 code in a kunit test, which seems legit since that
wants to very carefully control what's in the kmem_cache. This out of
a total of over 500 calls to kmem_cache_create. This alone should have
been warning sign enough that we're doing something silly.
Noticed while reviewing a patch set from Jason to fix up some issues
in our i915_init() and i915_exit() module load/cleanup code. Now that
i915_globals.c isn't any different than normal init/exit functions, we
should convert them over to one unified table and remove
i915_globals.[hc] entirely.
v2: Improve commit message (Jason)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: David Airlie <airlied@linux.ie>
Cc: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210721183229.4136488-1-daniel.vetter@ffwll.ch
Workarounds are documented in the bspec with an exclusive upper bound
(i.e., a "fixed" stepping that no longer needs the workaround). This
makes our driver's use of an inclusive upper bound for stepping ranges
confusing; the differing notation between code and bspec makes it very
easy for mistakes to creep in.
Let's switch the upper bound of our IS_{GT,DISP}_STEP macros over to use
an exclusive upper bound like the bspec does. This also has the benefit
of helping make sure workarounds are properly handled for new minor
steppings that show up (e.g., an A1 between the A0 and B0 we already
knew about) --- if the new intermediate stepping pulls in hardware fixes
early, there will be an update to the workaround definition which lets
us know we need to change our code. If the new stepping does not pull a
hardware fix earlier, then the new stepping will already be captured
properly by the "[begin, fix)" range in the code.
We'll probably need to be extra vigilant in code review of new
workarounds for the near future to make sure developers notice the new
semantics of workaround bounds. But we just migrated a bunch of our
platforms from the IS_REVID bounds over to IS_{GT,DISP}_STEP, so people
are already adjusting to the new macros and now is a good time to make
this change too.
[mattrope: Split out GT changes to apply through gt-next tree]
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210717051426.4120328-8-matthew.d.roper@intel.com
While doing a quick sanity check of the ICL workarounds in the driver I
noticed a few things that should be updated:
* There's no mention in the bspec that WaPipelineFlushCoherentLines
is needed on gen11 (both the current WA database and the old,
deprecated page 20196 were checked); it appears this might have just
been copied from the gen9 list? Even if this were needed, it doesn't
seem like this was the correct implementation anyway since the gen9
workaround is supposed to be implemented in the indirect context bb
(as we do in gen8_emit_flush_coherentl3_wa() on gen8/gen9).
* WaForwardProgressSoftReset does not appear in the current workaround
database. The old deprecated workaround list has a note indicating
the workaround was dropped in 2017, so we should be safe to drop it
from the code too.
While we're at it, add the formal workaround ID number to
WaDisableBankHangMode (our hardware team made a transition from
text-based workaround names to ID numbers partway through the
development of ICL, which is why some workarounds only have names, some
only have numbers, and some have both).
Bspec: 33450
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210717051426.4120328-3-matthew.d.roper@intel.com
On SKL we've been applying this workaround on H0+ steppings, which is
actually backwards; H0 is supposed to be the first stepping where the
workaround is no longer needed. Flip the bounds so that the workaround
applies to all steppings _before_ H0.
On BXT we've been applying this workaround to all steppings, but the
bspec tells us it's only needed until C0. Pre-C0 GT steppings only
appeared in pre-production hardware, which we no longer support in the
driver, so we can drop the workaround completely for this platform.
On ICL we've been applying this workaround to all steppings, but there
doesn't seem to be any indication that this workaround was ever needed
for this platform (even now-deprecated page 20196 of the bspec doesn't
mention it). We can go ahead and drop it.
I also don't see any mention of this workaround being needed for KBL,
although this may be an oversight since the workaround is needed for all
steppings of CFL. I'll leave the workaround in place for KBL to be
safe.
Bspec: 14091, 33450
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210717051426.4120328-2-matthew.d.roper@intel.com
The switch from old old IS_FOO_REVID() macros to the new table-based
IS_FOO_{GT,DISP}_STEP() macros is needed on both drm-intel-next (for
display-based DMC matching) and drm-intel-gt-next (for workaround
guards). To avoid conflicts, we'll apply the patches to a topic branch
and merge it to both intel branches to ensure the transition to the
new macros is clean.
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Switch ICL to use a revid->stepping table as we're trying to do on all
platforms going forward. While we're at it, let's include some
additional steppings that have popped up, even if we don't yet have any
workarounds tied to those steppings (we probably need to audit our
workaround list soon to see if any of the bounds have moved or if new
workarounds have appeared).
Note that the current bspec table is missing information about how to
map PCI revision ID to GT/display steppings; it only provides an SoC
stepping. The mapping to GT/display steppings (which aren't always the
same as the SoC stepping) used to be in the bspec, but was apparently
dropped during an update in Nov 2019; I've made my changes here based on
an older bspec snapshot that still had the necessary information. We've
requested that the missing information be restored.
I'm only including the production revids in the table here since we're
past the point at which we usually stop trying to support pre-production
hardware. An appropriate check is added to
intel_detect_preproduction_hw() to print an error and taint the kernel
just in case someone still tries to load the driver on old
pre-production hardware.
v2:
- Drop pre-production steppings and add error/taint at startup when
loading on pre-production hardware.
Bspec: 21141 # pre-Nov 2019 snapshot
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210713193635.3390052-8-matthew.d.roper@intel.com
Switch SKL to use a revid->stepping table as we're trying to do on all
platforms going forward. Also drop the preproduction revisions and add
the newer steppings we hadn't already handled.
Note that SKL has a case where a newer revision ID corresponds to an
older GT/disp stepping (0x9 -> STEP_J0, 0xA -> STEP_I1). Also, the lack
of a revision ID 0x8 in the table is intentional and not an oversight.
We'll re-write the KBL-specific comment to make it clear that these kind
of quirks are expected.
v2:
- Since GT and display steppings are always identical on SKL use a
macro to set both values at once in a more readable manner. (Anusha)
- Drop preproduction steppings.
Bspec: 13626
Cc: Anusha Srivatsa <anusha.srivatsa@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Anusha Srivatsa <anusha.srivatsa@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210713193635.3390052-4-matthew.d.roper@intel.com
We skip filling out the pt with scratch entries if the va range covers
the entire pt, since we later have to fill it with the PTEs for the
object pages anyway. However this might leave open a small window where
the PTEs don't point to anything valid for the HW to consume.
When for example using 2M GTT pages this fill_px() showed up as being
quite significant in perf measurements, and ends up being completely
wasted since we ignore the pt and just use the pde directly.
Anyway, currently we have our PTE construction split between alloc and
insert, which is probably slightly iffy nowadays, since the alloc
doesn't actually allocate anything anymore, instead it just sets up the
page directories and points the PTEs at the scratch page. Later when we
do the insert step we re-program the PTEs again. Better might be to
squash the alloc and insert into a single step, then bringing back this
optimisation(along with some others) should be possible.
Fixes: 1482667324 ("drm/i915: Only initialize partially filled pagetables")
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Jon Bloomfield <jon.bloomfield@intel.com>
Cc: Chris Wilson <chris.p.wilson@intel.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: <stable@vger.kernel.org> # v4.15+
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20210713130431.2392740-1-matthew.auld@intel.com
(cherry picked from commit 8f88ca76b3)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
We skip filling out the pt with scratch entries if the va range covers
the entire pt, since we later have to fill it with the PTEs for the
object pages anyway. However this might leave open a small window where
the PTEs don't point to anything valid for the HW to consume.
When for example using 2M GTT pages this fill_px() showed up as being
quite significant in perf measurements, and ends up being completely
wasted since we ignore the pt and just use the pde directly.
Anyway, currently we have our PTE construction split between alloc and
insert, which is probably slightly iffy nowadays, since the alloc
doesn't actually allocate anything anymore, instead it just sets up the
page directories and points the PTEs at the scratch page. Later when we
do the insert step we re-program the PTEs again. Better might be to
squash the alloc and insert into a single step, then bringing back this
optimisation(along with some others) should be possible.
Fixes: 1482667324 ("drm/i915: Only initialize partially filled pagetables")
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Jon Bloomfield <jon.bloomfield@intel.com>
Cc: Chris Wilson <chris.p.wilson@intel.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: <stable@vger.kernel.org> # v4.15+
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20210713130431.2392740-1-matthew.auld@intel.com
CTB writes are now in the path of command submission and should be
optimized for performance. Rather than reading CTB descriptor values
(e.g. head, tail) which could result in accesses across the PCIe bus,
store shadow local copies and only read/write the descriptor values when
absolutely necessary. Also store the current space in the each channel
locally.
v2:
(Michal)
- Add additional sanity checks for head / tail pointers
- Use GUC_CTB_HDR_LEN rather than magic 1
v3:
(Michal / John H)
- Drop redundant check of head value
v4:
(John H)
- Drop redundant checks of tail / head values
v5:
(Michal)
- Address more nits
v6:
(Michal)
- Add GEM_BUG_ON sanity check on ctb->space
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210708162055.129996-7-matthew.brost@intel.com
Add non blocking CTB send function, intel_guc_send_nb. GuC submission
will send CTBs in the critical path and does not need to wait for these
CTBs to complete before moving on, hence the need for this new function.
The non-blocking CTB now must have a flow control mechanism to ensure
the buffer isn't overrun. A lazy spin wait is used as we believe the
flow control condition should be rare with a properly sized buffer.
The function, intel_guc_send_nb, is exported in this patch but unused.
Several patches later in the series make use of this function.
v2:
(Michal)
- Use define for H2G room calculations
- Move INTEL_GUC_SEND_NB define
(Daniel Vetter)
- Use msleep_interruptible rather than cond_resched
v3:
(Michal)
- Move includes to following patch
- s/INTEL_GUC_SEND_NB/INTEL_GUC_CT_SEND_NB/g
v4:
(John H)
- Update comment, add type local variable
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210708162055.129996-5-matthew.brost@intel.com
The conversion to ww mutexes failed to address the fence code which
already returns -EDEADLK when we run out of fences. Ww mutexes on
the other hand treat -EDEADLK as an internal errno value indicating
a need to restart the operation due to a deadlock. So now when the
fence code returns -EDEADLK the higher level code erroneously
restarts everything instead of returning the error to userspace
as is expected.
To remedy this let's switch the fence code to use a different errno
value for this. -ENOBUFS seems like a semi-reasonable unique choice.
Apart from igt the only user of this I could find is sna, and even
there all we do is dump the current fence registers from debugfs
into the X server log. So no user visible functionality is affected.
If we really cared about preserving this we could of course convert
back to -EDEADLK higher up, but doesn't seem like that's worth
the hassle here.
Not quite sure which commit specifically broke this, but I'll
just attribute it to the general gem ww mutex work.
Cc: stable@vger.kernel.org
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Thomas Hellström <thomas.hellstrom@intel.com>
Testcase: igt/gem_pread/exhaustion
Testcase: igt/gem_pwrite/basic-exhaustion
Testcase: igt/gem_fenced_exec_thrash/too-many-fences
Fixes: 80f0b679d6 ("drm/i915: Add an implementation for i915_gem_ww_ctx locking, v2.")
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210630164413.25481-1-ville.syrjala@linux.intel.com
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
(cherry picked from commit 78d2ad7eb4)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
The conversion to ww mutexes failed to address the fence code which
already returns -EDEADLK when we run out of fences. Ww mutexes on
the other hand treat -EDEADLK as an internal errno value indicating
a need to restart the operation due to a deadlock. So now when the
fence code returns -EDEADLK the higher level code erroneously
restarts everything instead of returning the error to userspace
as is expected.
To remedy this let's switch the fence code to use a different errno
value for this. -ENOBUFS seems like a semi-reasonable unique choice.
Apart from igt the only user of this I could find is sna, and even
there all we do is dump the current fence registers from debugfs
into the X server log. So no user visible functionality is affected.
If we really cared about preserving this we could of course convert
back to -EDEADLK higher up, but doesn't seem like that's worth
the hassle here.
Not quite sure which commit specifically broke this, but I'll
just attribute it to the general gem ww mutex work.
Cc: stable@vger.kernel.org
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Thomas Hellström <thomas.hellstrom@intel.com>
Testcase: igt/gem_pread/exhaustion
Testcase: igt/gem_pwrite/basic-exhaustion
Testcase: igt/gem_fenced_exec_thrash/too-many-fences
Fixes: 80f0b679d6 ("drm/i915: Add an implementation for i915_gem_ww_ctx locking, v2.")
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210630164413.25481-1-ville.syrjala@linux.intel.com
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
There's a big comment saying how useful it is but no one is using this
for anything anymore.
It was added in 2bfa996e03 ("drm/i915: Store owning file on the
i915_address_space") and used for debugfs at the time as well as telling
the difference between the global GTT and a PPGTT. In f6e8aa3871
("drm/i915: Report the number of closed vma held by each context in
debugfs") we removed one use of it by switching to a context walk and
comparing with the VM in the context. Finally, VM stats for debugfs
were entirely nuked in db80a1294c ("drm/i915/gem: Remove per-client
stats from debugfs/i915_gem_objects")
v2 (Daniel Vetter):
- Delete a struct drm_i915_file_private pre-declaration
- Add a comment to the commit message about history
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20210708154835.528166-24-jason@jlekstrand.net
Even though FENCE_SUBMIT is only documented to wait until the request in
the in-fence starts instead of waiting until it completes, it has a bit
more magic than that. If FENCE_SUBMIT is used to submit something to a
balanced engine, we would wait to assign engines until the primary
request was ready to start and then attempt to assign it to a different
engine than the primary. There is an IGT test (the bonded-slice subtest
of gem_exec_balancer) which exercises this by submitting a primary batch
to a specific VCS and then using FENCE_SUBMIT to submit a secondary
which can run on any VCS and have i915 figure out which VCS to run it on
such that they can run in parallel.
However, this functionality has never been used in the real world. The
media driver (the only user of FENCE_SUBMIT) always picks exactly two
physical engines to bond and never asks us to pick which to use.
v2 (Daniel Vetter):
- Mention the exact IGT test this breaks
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20210708154835.528166-11-jason@jlekstrand.net
This adds a bunch of complexity which the media driver has never
actually used. The media driver does technically bond a balanced engine
to another engine but the balanced engine only has one engine in the
sibling set. This doesn't actually result in a virtual engine.
This functionality was originally added to handle cases where we may
have more than two video engines and media might want to load-balance
their bonded submits by, for instance, submitting to a balanced vcs0-1
as the primary and then vcs2-3 as the secondary. However, no such
hardware has shipped thus far and, if we ever want to enable such
use-cases in the future, we'll use the up-and-coming parallel submit API
which targets GuC submission.
This makes I915_CONTEXT_ENGINES_EXT_BOND a total no-op. We leave the
validation code in place in case we ever decide we want to do something
interesting with the bonding information.
v2 (Jason Ekstrand):
- Don't delete quite as much code.
v3 (Tvrtko Ursulin):
- Add some history to the commit message
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20210708154835.528166-10-jason@jlekstrand.net
This API allows one context to grab bits out of another context upon
creation. It can be used as a short-cut for setparam(getparam()) for
things like I915_CONTEXT_PARAM_VM. However, it's never been used by any
real userspace. It's used by a few IGT tests and that's it. Since it
doesn't add any real value (most of the stuff you can CLONE you can copy
in other ways), drop it.
There is one thing that this API allows you to clone which you cannot
clone via getparam/setparam: timelines. However, timelines are an
implementation detail of i915 and not really something that needs to be
exposed to userspace. Also, sharing timelines between contexts isn't
obviously useful and supporting it has the potential to complicate i915
internally. It also doesn't add any functionality that the client can't
get in other ways. If a client really wants a shared timeline, they can
use a syncobj and set it as an in and out fence on every submit.
v2 (Jason Ekstrand):
- More detailed commit message
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20210708154835.528166-7-jason@jlekstrand.net