If two requests are on the same ring, they are explicitly ordered by the
HW. So, a submission fence is sufficient to ensure ordering when using
the new GuC submission interface. Conversely, if two requests share a
timeline and are on the same physical engine but different context this
doesn't ensure ordering on the new GuC submission interface. So, a
completion fence needs to be used to ensure ordering.
v2:
(Daniele)
- Don't delete spin lock
v3:
(Daniele)
- Delete forward dec
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210721215101.139794-13-matthew.brost@intel.com
With GuC scheduling, it isn't safe to unpin a context while scheduling
is enabled for that context as the GuC may touch some of the pinned
state (e.g. LRC). To ensure scheduling isn't enabled when an unpin is
done, a call back is added to intel_context_unpin when pin count == 1
to disable scheduling for that context. When the response CTB is
received it is safe to do the final unpin.
Future patches may add a heuristic / delay to schedule the disable
call back to avoid thrashing on schedule enable / disable.
v2:
(John H)
- s/drm_dbg/drm_err
(Daneiel)
- Clean up sched state function
Cc: John Harrison <john.c.harrison@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210721215101.139794-9-matthew.brost@intel.com
Implement GuC context operations which includes GuC specific operations
alloc, pin, unpin, and destroy.
v2:
(Daniel Vetter)
- Use msleep_interruptible rather than cond_resched in busy loop
(Michal)
- Remove C++ style comment
v3:
(Matthew Brost)
- Drop GUC_ID_START
(John Harrison)
- Fix a bunch of typos
- Use drm_err rather than drm_dbg for G2H errors
(Daniele)
- Fix ;; typo
- Clean up sched state functions
- Add lockdep for guc_id functions
- Don't call __release_guc_id when guc_id is invalid
- Use MISSING_CASE
- Add comment in guc_context_pin
- Use shorter path to rpm
(Daniele / CI)
- Don't call release_guc_id on an invalid guc_id in destroy
v4:
(Daniel Vetter)
- Add FIXME comment
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210721215101.139794-7-matthew.brost@intel.com
Implement GuC submission tasklet for new interface. The new GuC
interface uses H2G to submit contexts to the GuC. Since H2G use a single
channel, a single tasklet is used for the submission path.
Also the per engine interrupt handler has been updated to disable the
rescheduling of the physical engine tasklet, when using GuC scheduling,
as the physical engine tasklet is no longer used.
In this patch the field, guc_id, has been added to intel_context and is
not assigned. Patches later in the series will assign this value.
v2:
(John Harrison)
- Clean up some comments
v3:
(John Harrison)
- More comment cleanups
Cc: John Harrison <john.c.harrison@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210721215101.139794-5-matthew.brost@intel.com
As we begin applying XeHP and DG2 patches, the basic platform
definitions and macros (like IS_DG2()) will be needed in both
drm-intel-next and drm-intel-gt-next. Those initial definition patches
are applied to a topic branch and merged to both trees.
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
DG2 doesn't have a SAGV or QGV points that determine memory bandwidth.
Instead it has a constant amount of memory bandwidth available to
display that does not need to be reduced based on the number of active
planes.
For simplicity, we'll just modify driver initialization to create a
single dummy QGV point with the proper amount of memory bandwidth,
rather than trying to query the pcode for this information.
Bspec: 64631
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Anusha Srivatsa <anusha.srivatsa@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210721223043.834562-19-matthew.d.roper@intel.com
DG2 extends our DDB to four DBuf slices; pipes A+B only have access to
the first two slices, whereas pipes C+D only have access to the second
two.
Confusingly, our bspec decided to switch from 1-based numbering
of dbuf slices (S1, S2) to 0-based numbering (S0, S1, S2, S3) in
Display13. At the moment we're using the 0-based number scheme for the
DBUF_CTL_S() register addressing, but the 1-based number scheme in the
actual slice assignment tables. We may want to consider switching the
assignment over to 0-based numbering too at some point...
Bspec: 49255
Bspec: 50057
Cc: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210721223043.834562-16-matthew.d.roper@intel.com
As we begin applying XeHP and DG2 patches, the basic platform
definitions and macros (like IS_DG2()) will be needed in both
drm-intel-next and drm-intel-gt-next. Those initial definition patches
are applied to a topic branch and merged to both trees.
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Our _FEATURES macro went back to GEN7, extending each other, making it
difficult to grasp what was really enabled/disabled. Take the
opportunity of the GEN -> XE_HP name break and also break with the
feature inheritance.
For XE_HP this basically goes from GEN12 back to GEN7 coalescing the
features making sure the overrides remain, remove all the
display-specific features and sort it.
Then also remove the definitions that would be overridden by
DGFX_FEATURES and those that were 0 (since that is the default).
Exception here is has_master_unit_irq: although it is a feature that
started with DG1 and is true for all DGFX platforms, it's also true for
XE_HP in general.
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210721223043.834562-2-matthew.d.roper@intel.com
Besides the arch version returned by GRAPHICS_VER(), new platforms
contain a "release id" to make clear the difference from one platform to
another.
The release id number is not formally defined by hardware until future
platforms that will expose it via a new GMD_ID register. For the
platforms we support before that register becomes available we will set
the values in software and we can set them as we please. So the plan is
to set them so we can group different features under a single
GRAPHICS_VER_FULL() check.
After GMD_ID is used, the usefulness of a "full version check" will be
greatly reduced and will be mostly used for deciding workarounds and a
few code paths. So it makes sense to keep it as a separate field from
graphics_ver. Also, as a platform with `release == n` may be closer
feature-wise to `n - 2` than to `n - 1`, use the word "release" rather
than the more common "minor" for this
This is a mix of 2 independent changes: one by me and the other by Matt
Roper.
v2:
- Reword commit message to make it clearer why we don't call it
"minor" (Matt Roper and Tvrtko)
- Rename variables s/*_ver_release/*_rel/ and print them in a single
line formatted as {ver}.{rel:2} (Jani and Matt Roper)
Cc: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210707235921.2416911-2-lucas.demarchi@intel.com
(cherry picked from commit ca6374e267)
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
There's no reason that I can tell why this should be per-i915_buddy_mm
and doing so causes KMEM_CACHE to throw dmesg warnings because it tries
to create a debugfs entry with the name i915_buddy_block multiple times.
We could handle this by carefully giving each slab its own name but that
brings its own pain because then we have to store that string somewhere
and manage the lifetimes of the different slabs. The most likely
outcome would be a global atomic which we increment to get a new name or
something like that.
The much easier solution is to use the i915_globals system like we do
for every other slab in i915. This ensures that we have exactly one of
them for each i915 driver load and it gets neatly created on module load
and destroyed on module unload. Using the globals system also means
that its now tied into the shrink handler so we can properly respond to
low-memory situations.
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Fixes: 88be9a0a06 ("drm/i915/ttm: add ttm_buddy_man")
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Christian König <christian.koenig@amd.com>
[danvet: Rebase against removal of global shrink code]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20210721152358.2893314-7-jason@jlekstrand.net
If the driver was not fully loaded, we may still have globals lying
around. If we don't tear those down in i915_exit(), we'll leak a bunch
of memory slabs. This can happen two ways: use_kms = false and if we've
run mock selftests. In either case, we have an early exit from
i915_init which happens after i915_globals_init() and we need to clean
up those globals.
The mock selftests case is especially sticky. The load isn't entirely
a no-op. We actually do quite a bit inside those selftests including
allocating a bunch of mock objects and running tests on them. Once all
those tests are complete, we exit early from i915_init(). Perviously,
i915_init() would return a non-zero error code on failure and a zero
error code on success. In the success case, we would get to i915_exit()
and check i915_pci_driver.driver.owner to detect if i915_init exited early
and do nothing. In the failure case, we would fail i915_init() but
there would be no opportunity to clean up globals.
The most annoying part is that you don't actually notice the failure as
part of the self-tests since leaking a bit of memory, while bad, doesn't
result in anything observable from userspace. Instead, the next time we
load the driver (usually for next IGT test), i915_globals_init() gets
invoked again, we go to allocate a bunch of new memory slabs, those
implicitly create debugfs entries, and debugfs warns that we're trying
to create directories and files that already exist. Since this all
happens as part of the next driver load, it shows up in the dmesg-warn
of whatever IGT test ran after the mock selftests.
While the obvious thing to do here might be to call i915_globals_exit()
after selftests, that's not actually safe. The dma-buf selftests call
i915_gem_prime_export which creates a file. We call dma_buf_put() on
the resulting dmabuf which calls fput() on the file. However, fput()
isn't immediate and gets flushed right before syscall returns. This
means that all the fput()s from the selftests don't happen until right
before the module load syscall used to fire off the selftests returns
which is after i915_init(). If we call i915_globals_exit() in
i915_init() after selftests, we end up freeing slabs out from under
objects which won't get released until fput() is flushed at the end of
the module load syscall.
The solution here is to let i915_init() return success early and detect
the early success in i915_exit() and only tear down globals and nothing
else. This way the module loads successfully, regardless of the success
or failure of the tests. Because we've not enumerated any PCI devices,
no device nodes are created and it's entirely useless from userspace.
The only thing the module does at that point is hold on to a bit of
memory until we unload it and i915_exit() is called. Importantly, this
means that everything from our selftests has the ability to properly
flush out between i915_init() and i915_exit() because there is at least
one syscall boundary in between.
In order to handle all the delicate init/exit cases, we convert the
whole thing to a table of init/exit pairs and track the init status in
the new init_progress global. This allows us to ensure that i915_exit()
always tears down exactly the things that i915_init() successfully
initialized. We also allow early-exit of i915_init() without failure by
an init function returning > 0. This is useful for nomodeset, and
selftests. For the mock selftests, we convert them to always return 1
so we get the desired behavior of the driver always succeeding to load
the driver and then properly tearing down the partially loaded driver.
v2 (Tvrtko Ursulin):
- Guard init_funcs[i].exit with GEM_BUG_ON(i >= ARRAY_SIZE(init_funcs))
v2 (Daniel Vetter):
- Update the docstring for i915.mock_selftests
Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/20210721152358.2893314-4-jason@jlekstrand.net
This essentially reverts
commit 84a1074920
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Wed Jan 24 11:36:08 2018 +0000
drm/i915: Shrink the GEM kmem_caches upon idling
mm/vmscan.c:do_shrink_slab() is a thing, if there's an issue with it
then we need to fix that there, not hand-roll our own slab shrinking
code in i915.
Also when this was added there was only one other caller of
kmem_cache_shrink (added 2005 to the acpi code). Now there's a 2nd one
outside of i915 code in a kunit test, which seems legit since that
wants to very carefully control what's in the kmem_cache. This out of
a total of over 500 calls to kmem_cache_create. This alone should have
been warning sign enough that we're doing something silly.
Noticed while reviewing a patch set from Jason to fix up some issues
in our i915_init() and i915_exit() module load/cleanup code. Now that
i915_globals.c isn't any different than normal init/exit functions, we
should convert them over to one unified table and remove
i915_globals.[hc] entirely.
v2: Improve commit message (Jason)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: David Airlie <airlied@linux.ie>
Cc: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210721183229.4136488-1-daniel.vetter@ffwll.ch
Workarounds are documented in the bspec with an exclusive upper bound
(i.e., a "fixed" stepping that no longer needs the workaround). This
makes our driver's use of an inclusive upper bound for stepping ranges
confusing; the differing notation between code and bspec makes it very
easy for mistakes to creep in.
Let's switch the upper bound of our IS_{GT,DISP}_STEP macros over to use
an exclusive upper bound like the bspec does. This also has the benefit
of helping make sure workarounds are properly handled for new minor
steppings that show up (e.g., an A1 between the A0 and B0 we already
knew about) --- if the new intermediate stepping pulls in hardware fixes
early, there will be an update to the workaround definition which lets
us know we need to change our code. If the new stepping does not pull a
hardware fix earlier, then the new stepping will already be captured
properly by the "[begin, fix)" range in the code.
We'll probably need to be extra vigilant in code review of new
workarounds for the near future to make sure developers notice the new
semantics of workaround bounds. But we just migrated a bunch of our
platforms from the IS_REVID bounds over to IS_{GT,DISP}_STEP, so people
are already adjusting to the new macros and now is a good time to make
this change too.
[mattrope: Split out display changes to apply through intel-next tree]
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210717051426.4120328-8-matthew.d.roper@intel.com
Workarounds are documented in the bspec with an exclusive upper bound
(i.e., a "fixed" stepping that no longer needs the workaround). This
makes our driver's use of an inclusive upper bound for stepping ranges
confusing; the differing notation between code and bspec makes it very
easy for mistakes to creep in.
Let's switch the upper bound of our IS_{GT,DISP}_STEP macros over to use
an exclusive upper bound like the bspec does. This also has the benefit
of helping make sure workarounds are properly handled for new minor
steppings that show up (e.g., an A1 between the A0 and B0 we already
knew about) --- if the new intermediate stepping pulls in hardware fixes
early, there will be an update to the workaround definition which lets
us know we need to change our code. If the new stepping does not pull a
hardware fix earlier, then the new stepping will already be captured
properly by the "[begin, fix)" range in the code.
We'll probably need to be extra vigilant in code review of new
workarounds for the near future to make sure developers notice the new
semantics of workaround bounds. But we just migrated a bunch of our
platforms from the IS_REVID bounds over to IS_{GT,DISP}_STEP, so people
are already adjusting to the new macros and now is a good time to make
this change too.
[mattrope: Split out GT changes to apply through gt-next tree]
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210717051426.4120328-8-matthew.d.roper@intel.com
While doing a quick sanity check of the ICL workarounds in the driver I
noticed a few things that should be updated:
* There's no mention in the bspec that WaPipelineFlushCoherentLines
is needed on gen11 (both the current WA database and the old,
deprecated page 20196 were checked); it appears this might have just
been copied from the gen9 list? Even if this were needed, it doesn't
seem like this was the correct implementation anyway since the gen9
workaround is supposed to be implemented in the indirect context bb
(as we do in gen8_emit_flush_coherentl3_wa() on gen8/gen9).
* WaForwardProgressSoftReset does not appear in the current workaround
database. The old deprecated workaround list has a note indicating
the workaround was dropped in 2017, so we should be safe to drop it
from the code too.
While we're at it, add the formal workaround ID number to
WaDisableBankHangMode (our hardware team made a transition from
text-based workaround names to ID numbers partway through the
development of ICL, which is why some workarounds only have names, some
only have numbers, and some have both).
Bspec: 33450
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210717051426.4120328-3-matthew.d.roper@intel.com
On SKL we've been applying this workaround on H0+ steppings, which is
actually backwards; H0 is supposed to be the first stepping where the
workaround is no longer needed. Flip the bounds so that the workaround
applies to all steppings _before_ H0.
On BXT we've been applying this workaround to all steppings, but the
bspec tells us it's only needed until C0. Pre-C0 GT steppings only
appeared in pre-production hardware, which we no longer support in the
driver, so we can drop the workaround completely for this platform.
On ICL we've been applying this workaround to all steppings, but there
doesn't seem to be any indication that this workaround was ever needed
for this platform (even now-deprecated page 20196 of the bspec doesn't
mention it). We can go ahead and drop it.
I also don't see any mention of this workaround being needed for KBL,
although this may be an oversight since the workaround is needed for all
steppings of CFL. I'll leave the workaround in place for KBL to be
safe.
Bspec: 14091, 33450
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210717051426.4120328-2-matthew.d.roper@intel.com