A starting point to counter the pervasive struct_mutex. For the goal of
avoiding (or at least blocking under them!) global locks during user
request submission, a simple but important step is being able to manage
each clients GTT separately. For which, we want to replace using the
struct_mutex as the guard for all things GTT/VM and switch instead to a
specific mutex inside i915_address_space.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190128102356.15037-2-chris@chris-wilson.co.uk
Our goal is to remove struct_mutex and replace it with fine grained
locking. One of the thorny issues is our eviction logic for reclaiming
space for an execbuffer (or GTT mmaping, among a few other examples).
While eviction itself is easy to move under a per-VM mutex, performing
the activity tracking is less agreeable. One solution is not to do any
MRU tracking and do a simple coarse evaluation during eviction of
active/inactive, with a loose temporal ordering of last
insertion/evaluation. That keeps all the locking constrained to when we
are manipulating the VM itself, neatly avoiding the tricky handling of
possible recursive locking during execbuf and elsewhere.
Note that discarding the MRU (currently implemented as a pair of lists,
to avoid scanning the active list for a NONBLOCKING search) is unlikely
to impact upon our efficiency to reclaim VM space (where we think a LRU
model is best) as our current strategy is to use random idle replacement
first before doing a search, and over time the use of softpinned 48b
per-ppGTT is growing (thereby eliminating any need to perform any eviction
searches, in theory at least) with the remaining users being found on
much older devices (gen2-gen6).
v2: Changelog and commentary rewritten to elaborate on the duality of a
single list being both an inactive and active list.
v3: Consolidate bool parameters into a single set of flags; don't
comment on the duality of a single variable being a multiplicity of
bits.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20190128102356.15037-1-chris@chris-wilson.co.uk
Define IS_GEN() similarly to our IS_GEN_RANGE(). but use gen instead of
gen_mask to do the comparison. Now callers can pass then gen as a parameter,
so we don't require one macro for each gen.
The following spatch was used to convert the users of these macros:
@@
expression e;
@@
(
- IS_GEN2(e)
+ IS_GEN(e, 2)
|
- IS_GEN3(e)
+ IS_GEN(e, 3)
|
- IS_GEN4(e)
+ IS_GEN(e, 4)
|
- IS_GEN5(e)
+ IS_GEN(e, 5)
|
- IS_GEN6(e)
+ IS_GEN(e, 6)
|
- IS_GEN7(e)
+ IS_GEN(e, 7)
|
- IS_GEN8(e)
+ IS_GEN(e, 8)
|
- IS_GEN9(e)
+ IS_GEN(e, 9)
|
- IS_GEN10(e)
+ IS_GEN(e, 10)
|
- IS_GEN11(e)
+ IS_GEN(e, 11)
)
v2: use IS_GEN rather than GT_GEN and compare to info.gen rather than
using the bitmask
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20181212181044.15886-2-lucas.demarchi@intel.com
In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.
Addresses-Coverity-ID: 141432
Addresses-Coverity-ID: 141433
Addresses-Coverity-ID: 141434
Addresses-Coverity-ID: 141435
Addresses-Coverity-ID: 141436
Addresses-Coverity-ID: 1357360
Addresses-Coverity-ID: 1357403
Addresses-Coverity-ID: 1357433
Addresses-Coverity-ID: 1392622
Addresses-Coverity-ID: 1415273
Addresses-Coverity-ID: 1435752
Addresses-Coverity-ID: 1441500
Addresses-Coverity-ID: 1454596
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180628223541.GA17665@embeddedor.com
Before this commit the WaSkipStolenMemoryFirstPage workaround code was
skipping the first 4k by passing 4096 as start of the address range passed
to drm_mm_init(). This means that calling drm_mm_reserve_node() to try and
reserve the firmware framebuffer so that we can inherit it would always
fail, as the firmware framebuffer starts at address 0.
Commit d435376104 ("drm/i915: skip the first 4k of stolen memory on
everything >= gen8") says in its commit message: "This is confirmed to fix
Skylake screen flickering issues (probably caused by the fact that we
initialized a ring in the first page of stolen, but I didn't 100% confirm
this theory)."
Which suggests that it is safe to use the first page for a linear
framebuffer as the firmware is doing (see note below).
This commit always passes 0 as start to drm_mm_init() and works around
WaSkipStolenMemoryFirstPage in i915_gem_stolen_insert_node_in_range()
by insuring the start address passed by to drm_mm_insert_node_in_range()
is always 4k or more. All entry points to i915_gem_stolen.c go through
i915_gem_stolen_insert_node_in_range(), so that any newly allocated
objects such as ring-buffers will not be allocated in the first 4k.
The one exception is i915_gem_object_create_stolen_for_preallocated()
which directly calls drm_mm_reserve_node() which now will be able to
use the first 4k.
This fixes the i915 driver no longer being able to inherit the firmware
framebuffer on gen8+, which fixes the video output changing from the
vendor logo to a black screen as soon as the i915 driver is loaded
(on systems without fbcon).
Some notes about the mapping of the BIOS framebuffer:
v1 led to some discussion if the assumption of the intel_display.c code
that the firmware framebuffer is a linear mapping of the stolen memory
starting at offset 0 is still correct, because that would mean that the
GOP does not implement the WaSkipStolenMemoryFirstPage workaround.
To verify this the following code was added at the end of
i915_gem_object_create_stolen_for_preallocated() :
pr_err("first ggtt entry before bind: 0x%016llx\n",
readq(dev_priv->ggtt.gsm));
ret = i915_vma_bind(vma,
HAS_LLC(dev_priv) ? I915_CACHE_LLC : I915_CACHE_NONE,
PIN_UPDATE);
pr_err("i915_vma_bind ret %d\n", ret);
pr_err("first ggtt entry after bind: 0x%016llx\n",
readq(dev_priv->ggtt.gsm));
Which prints the mapping of the first page, then does a vma_bind() to
force update the mapping with our linear view of the framebuffer and
then prints the mapping of the first page again.
On an Asrock B150M Pro4S/D3 mainboard with i5-6500 CPU this prints:
[ 1.651141] first ggtt entry before bind: 0x0000000078c00001
[ 1.651151] i915_vma_bind ret 0
[ 1.651152] first ggtt entry after bind: 0x0000000078c00083
And "sudo cat /proc/iomem | grep Stolen" gives:
78c00000-88bfffff : Graphics Stolen Memory
There are no visual changes with this patch (BIOS vendor logo still
stays in place when we inherit the BIOS framebuffer), so the vma_bind()
does not impact which memory is being scanned out.
The address of the first ggtt entry matches with the start of stolen
and the i915_vma_bind call only changes the first gtt entry's flags,
or-ing in _PAGE_RW (BIT(1)) and PPAT_CACHED (BIT(7)), which perfectly
matches what we would expect based on gen8_pte_encode()'s behavior.
So it seems that the GOP indeed does NOT implement the wa and the i915's
code assuming a linear mapping at the start of stolen for the BIOS fb
still holds true for gen8+.
I've also tested this on a Cherry Trail based device (a GPD Win)
with identical results (the flags are 0x1b after the vma_bind
on CHT, which matches with I915_CACHE_NONE).
Changed in v2: No code changes, extended the commit message with the
verification that the intel_display.c BIOS framebuffer mapping is still
correct.
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180420095933.16442-1-hdegoede@redhat.com
i915 is the only driver using those fields in the drm_gem_object
structure, so they only waste memory for all other drivers.
Move the fields into drm_i915_gem_object instead and patch the i915 code
with the following sed commands:
sed -i "s/obj->base.read_domains/obj->read_domains/g" drivers/gpu/drm/i915/*.c drivers/gpu/drm/i915/*/*.c
sed -i "s/obj->base.write_domain/obj->write_domain/g" drivers/gpu/drm/i915/*.c drivers/gpu/drm/i915/*/*.c
Change is only compile tested.
v2: move fields around as suggested by Chris.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20180216124338.9087-1-christian.koenig@amd.com
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
While I have no solid proof that ILK follows the ELK path when it
comes to the stolen memory reserved area, there are some hints that
it might be the case. Unfortunately my ILK doesn't have this enabled,
and no way to enable it via the BIOS it seems.
So let's have ILK use the ELK code path, and let's toss in a WARN
into the code to see if we catch anyone with an ILK that has this
enabled to further analyze the situation.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171102151737.23336-3-ville.syrjala@linux.intel.com
Acked-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Remove the struct_mutex requirement around dev_priv->mm.bound_list and
dev_priv->mm.unbound_list by giving it its own spinlock. This reduces
one more requirement for struct_mutex and in the process gives us
slightly more accurate unbound_list tracking, which should improve the
shrinker - but the drawback is that we drop the retirement before
counting so i915_gem_object_is_active() may be stale and lead us to
underestimate the number of objects that may be shrunk (see commit
bed50aea61 ("drm/i915/shrinker: Flush active on objects before
counting")).
v2: Crosslink the spinlock to the lists it protects, and btw this
changes s/obj->global_link/obj->mm.link/
v3: Fix decoupling of old links in i915_gem_object_attach_phys()
v3.1: Fix the fix, only unlink if it was linked
v3.2: Use a local for to_i915(obj->base.dev)->mm.obj_lock
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171016114037.5556-1-chris@chris-wilson.co.uk
Chris Wilson needs the new drm_driver->release callback to make sure
the shiny new dma-buf testcases don't oops the driver on unload.
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
The conversion of stolen to use phys_addr_t (from essentially u32)
sparked an interesting discussion. We treat stolen memory as only
accessible from the GPU (the DMA device) - an attempt to use it from the
CPU will generate a MCE on gen6 onwards, although it is in theory a
physical address that can be dereferenced from the CPU as demonstrated
by earlier generations. As such, using phys_addr_t has the wrong
connotations and as we pass the address into the DMA device via
dma_addr_t (through the scatterlists used to program the GTT entries),
we should treat it as dma_addr_t throughout.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20170127165531.28135-1-chris@chris-wilson.co.uk
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
The i915_stolen_to_physical() function has 'unsigned long' as its
return type but it returns the 'base' variable, which is of type
'u32'. The only place where this function is called assigns the
returned value to dev_priv->mm.stolen_base, which is of type
'phys_addr_t'. The return value is actually a physical address and
everything else in the stolen memory code seems to be using
phys_addr_t, so fix i915_stolen_to_physical() to use phys_addr_t.
v2: Add missing blank lines after declarations (Chris, checkpatch.pl).
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1485461947-16030-1-git-send-email-paulo.r.zanoni@intel.com
Start converting over from the byte count to its semantic macro, either
we want to allocate the size of a physical page in main memory or we
want the size of a virtual page in the GTT. 4096 could mean either, but
PAGE_SIZE and I915_GTT_PAGE_SIZE are explicit and should help improve
code comprehension and future changes. In the future, we may want to use
variable GTT page sizes and so have the challenge of knowing which
hardcoded values were used to represent a physical page vs the virtual
page.
v2: Look for a few more 4096s to convert, discover IS_ALIGNED().
v3: 4096ul paranoia, make fence alignment a distinct value of 4096, keep
bdw stolen w/a as 4096 until we know better.
v4: Add asserts that i915_vma_insert() start/end are aligned to GTT page
sizes.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/20170110144734.26052-1-chris@chris-wilson.co.uk
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Don't even tell the mm allocator to handle the first page of stolen on
the affected platforms. This means that we won't inherit the FB in
case the BIOS decides to put it at the start of stolen. But the BIOS
should not be putting it at the start of stolen since it's going to
get corrupted. I suppose the bug here is that some pixels at the very
top of the screen will be corrupted, so it's not exactly easy to
notice.
We have confirmation that the first page of stolen does actually get
corrupted, so I really think we should do this in order to avoid any
possible future headaches, even if that means losing BIOS framebuffer
inheritance. Let's not use the HW in a way it's not supposed to be
used.
Notice that now ggtt->stolen_usable_size won't reflect the ending
address of the stolen usable range anymore, so we have to fix the
places that rely on this. To simplify, we'll just use U64_MAX.
v2: don't even put the first page on the mm (Chris)
v3: drm_mm_init() takes size instead of end as argument (Ville)
v4: add a comment explaining the reserved ranges (Chris)
use 0 for start and U64_MAX for end when possible (Chris)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94605
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1481808235-27607-1-git-send-email-paulo.r.zanoni@intel.com
Where it is more appropriate and also to be consistent with
the direction of the driver.
v2: Leave out object alloc/free inlining. (Joonas Lahtinen)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>