RPS provides a feedback loop where we use the load during the previous
evaluation interval to decide whether to up or down clock the GPU
frequency. Our responsiveness is split into 3 regimes, a high and low
plateau with the intent to keep the gpu clocked high to cover occasional
stalls under high load, and low despite occasional glitches under steady
low load, and inbetween. However, we run into situations like kodi where
we want to stay at low power (video decoding is done efficiently
inside the fixed function HW and doesn't need high clocks even for high
bitrate streams), but just occasionally the pipeline is more complex
than a video decode and we need a smidgen of extra GPU power to present
on time. In the high power regime, we sample at sub frame intervals with
a bias to upclocking, and conversely at low power we sample over a few
frames worth to provide what we consider to be the right levels of
responsiveness respectively. At low power, we more or less expect to be
kicked out to high power at the start of a busy sequence by waitboosting.
Prior to commit e9af4ea2b9 ("drm/i915: Avoid waitboosting on the active
request") whenever we missed the frame or stalled, we would immediate go
full throttle and upclock the GPU to max. But in commit e9af4ea2b9, we
relaxed the waitboosting to only apply if the pipeline was deep to avoid
over-committing resources for a near miss. Sadly though, a near miss is
still a miss, and perceptible as jitter in the frame delivery.
To try and prevent the near miss before having to resort to boosting
after the fact, we use the pageflip queue as an indication that we are
in an "interactive" regime and so should sample the load more frequently
to provide power before the frame misses it vblank. This will make us
more favorable to providing a small power increase (one or two bins) as
required rather than going all the way to maximum and then having to
work back down again. (We still keep the waitboosting mechanism around
just in case a dramatic change in system load requires urgent uplocking,
faster than we can provide in a few evaluation intervals.)
v2: Reduce rps_set_interactive to a boolean parameter to avoid the
confusion of what if they wanted a new power mode after pinning to a
different mode (which to choose?)
v3: Only reprogram RPS while the GT is awake, it will be set when we
wake the GT, and while off warns about being used outside of rpm.
v4: Fix deferred application of interactive mode
v5: s/state/interactive/
v6: Group the mutex with its principle in a substruct
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107111
Fixes: e9af4ea2b9 ("drm/i915: Avoid waitboosting on the active request")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Radoslaw Szwichtenberg <radoslaw.szwichtenberg@intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180731132629.3381-1-chris@chris-wilson.co.uk
(cherry picked from commit 60548c554b)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
A long time ago, we were afraid of handling interrupts and signaling
waiters during a reset, worrying that the confusion in request handling
would interfere with our attempts to process the reset in an orderly
fashion. Since then, we have isolated our irq-driven request handling by
virtue of the engine->timeline.lock and control of kthreads where
required, eliminating the danger of concurrently processing interrupts.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180806145647.13131-1-chris@chris-wilson.co.uk
After disabling resource streamer on ICL (due to it actually not
existing there), I got feedback that there have been some experimental
patches for mesa to use RS years ago, but nothing ever landed or shipped
because there was no performance improvement.
This removes it from kernel keeping the uapi defines around for
compatibility.
v2: - re-add the inadvertent removal of CTX_CTRL_INHIBIT_SYN_CTX_SWITCH
- don't bother trying to document removed params on uapi header:
applications should know that from the query.
(from Chris)
v3: - disable CTX_CTRL_RS_CTX_ENABLE istead of removing it
- reword commit message after Daniele confirmed no performance
regression on his machine
- reword commit message to make clear RS is being removed due to
never been used
v4: - move I915_EXEC_RESOURCE_STREAMER to __I915_EXEC_ILLEGAL_FLAGS so
the check on ioctl() is made much earlier by
i915_gem_check_execbuffer() (suggested by Tvrtko)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Acked-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20180803232443.17193-1-lucas.demarchi@intel.com
The next patch in this series will have to make the definition of
irq_cpustat_t available to entering_irq().
Inclusion of asm/hardirq.h into asm/apic.h would cause circular header
dependencies like
asm/smp.h
asm/apic.h
asm/hardirq.h
linux/irq.h
linux/topology.h
linux/smp.h
asm/smp.h
or
linux/gfp.h
linux/mmzone.h
asm/mmzone.h
asm/mmzone_64.h
asm/smp.h
asm/apic.h
asm/hardirq.h
linux/irq.h
linux/irqdesc.h
linux/kobject.h
linux/sysfs.h
linux/kernfs.h
linux/idr.h
linux/gfp.h
and others.
This causes compilation errors because of the header guards becoming
effective in the second inclusion: symbols/macros that had been defined
before wouldn't be available to intermediate headers in the #include chain
anymore.
A possible workaround would be to move the definition of irq_cpustat_t
into its own header and include that from both, asm/hardirq.h and
asm/apic.h.
However, this wouldn't solve the real problem, namely asm/harirq.h
unnecessarily pulling in all the linux/irq.h cruft: nothing in
asm/hardirq.h itself requires it. Also, note that there are some other
archs, like e.g. arm64, which don't have that #include in their
asm/hardirq.h.
Remove the linux/irq.h #include from x86' asm/hardirq.h.
Fix resulting compilation errors by adding appropriate #includes to *.c
files as needed.
Note that some of these *.c files could be cleaned up a bit wrt. to their
set of #includes, but that should better be done from separate patches, if
at all.
Signed-off-by: Nicolai Stange <nstange@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
We don't have proper watermark NV12 support on ICL due to differences
in how it should be implemented. In commit 234059da0f
("drm/i915/icl: NV12 y-plane ddb is not in same plane") we avoided
writing the non-existent PLANE_NV12_BUF_CFG registers but we forgot to
also avoid them on the hardware state readout. While the code is still
not correct, at least now we can avoid unclaimed register error
messages when dealing with RGB formats, which makes CI happier.
Also add some FIXME comments in order to make it even more clear that
there's still work to do.
References: commit 234059da0f ("drm/i915/icl: NV12 y-plane ddb is
not in same plane")
Cc: Mahesh Kumar <mahesh1.kumar@intel.com>
Reviewed-by: Mahesh Kumar <mahesh1.kumar@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180801004614.22149-1-paulo.r.zanoni@intel.com
if a context is a restore inhibit context, gfx hw only load the first page
for ring context, so we only need to copy from guest the 1 page too.
v3: use "return" instead of "goto" for inhibit case. (zhenyu wang)
v2: move judgement of restore inhibit to a macro in mmio_context.h
Signed-off-by: Zhao Yan <yan.y.zhao@intel.com>
Acked-by: Hang Yuan <hang.yuan@linux.intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
MI_NOOP is a common command appearing in almost all command buffers, put it
into a fastpath can improve perfomance, especially in command buffers
contains lots of MI_NOOPs (0s).
Take glmark2 as an example, 3% performance increase is observed after
introduced this patch. Meanwhile, in case where abundant in MI_NOOPs,
up to 12% performance increase is measured.
v2: use lowercase for index of MI_NOOP in cmd_info (zhenyu wang)
Signed-off-by: Li Weinan <weinan.z.li@intel.com>
Signed-off-by: Zhao Yan <yan.y.zhao@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
A bit larger this time around, due to introduction of "dpu1" support
for the display controller in sdm845 and beyond. This has been on
list and undergoing refactoring since Feb (going from ~110kloc to
~30kloc), and all my review complaints have been addressed, so I'd be
happy to see this upstream so further feature work can procede on top
of upstream.
Also includes the gpu coredump support, which should be useful for
debugging gpu crashes. And various other misc fixes and such.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/CAF6AEGv-8y3zguY0Mj1vh=o+vrv_bJ8AwZ96wBXYPvMeQT2XcA@mail.gmail.com
RPS provides a feedback loop where we use the load during the previous
evaluation interval to decide whether to up or down clock the GPU
frequency. Our responsiveness is split into 3 regimes, a high and low
plateau with the intent to keep the gpu clocked high to cover occasional
stalls under high load, and low despite occasional glitches under steady
low load, and inbetween. However, we run into situations like kodi where
we want to stay at low power (video decoding is done efficiently
inside the fixed function HW and doesn't need high clocks even for high
bitrate streams), but just occasionally the pipeline is more complex
than a video decode and we need a smidgen of extra GPU power to present
on time. In the high power regime, we sample at sub frame intervals with
a bias to upclocking, and conversely at low power we sample over a few
frames worth to provide what we consider to be the right levels of
responsiveness respectively. At low power, we more or less expect to be
kicked out to high power at the start of a busy sequence by waitboosting.
Prior to commit e9af4ea2b9 ("drm/i915: Avoid waitboosting on the active
request") whenever we missed the frame or stalled, we would immediate go
full throttle and upclock the GPU to max. But in commit e9af4ea2b9, we
relaxed the waitboosting to only apply if the pipeline was deep to avoid
over-committing resources for a near miss. Sadly though, a near miss is
still a miss, and perceptible as jitter in the frame delivery.
To try and prevent the near miss before having to resort to boosting
after the fact, we use the pageflip queue as an indication that we are
in an "interactive" regime and so should sample the load more frequently
to provide power before the frame misses it vblank. This will make us
more favorable to providing a small power increase (one or two bins) as
required rather than going all the way to maximum and then having to
work back down again. (We still keep the waitboosting mechanism around
just in case a dramatic change in system load requires urgent uplocking,
faster than we can provide in a few evaluation intervals.)
v2: Reduce rps_set_interactive to a boolean parameter to avoid the
confusion of what if they wanted a new power mode after pinning to a
different mode (which to choose?)
v3: Only reprogram RPS while the GT is awake, it will be set when we
wake the GT, and while off warns about being used outside of rpm.
v4: Fix deferred application of interactive mode
v5: s/state/interactive/
v6: Group the mutex with its principle in a substruct
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107111
Fixes: e9af4ea2b9 ("drm/i915: Avoid waitboosting on the active request")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Radoslaw Szwichtenberg <radoslaw.szwichtenberg@intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180731132629.3381-1-chris@chris-wilson.co.uk
The i915 DRM driver very cleverly used ascii85 encoding for their
GPU state file. Move the encode functions to a general header file to
support other drivers that might be interested in the same
functionality.
v4: Make the return value const char * as suggested by Chris Wilson
v3: Fix error_puts -> err_puts pointed out by the 01.org bot
v2: Update API to be cleaner for the caller as suggested by Chris Wilson
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
The new recommendation from the spec is to simply not set this bit
anymore. Not setting the bit would prevent some hangs that our driver
manages to avoid since commit c8af5274c3 ("drm/i915: enable the
pipe/transcoder/planes later on HSW+"), and the theoretical downside
of not setting the bit doesn't seem realistic according to the HW
team. Let's follow their recommendation.
BSpec: 20233
References: commit c8af5274c3 ("drm/i915: enable the
pipe/transcoder/planes later on HSW+")
Cc: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180726001229.13791-1-paulo.r.zanoni@intel.com
Removing the pin bias from GuC allows us to not check for GuC every time
we pin a context, which fixes the assertion error on unresolved GuC
platform default in mock contexts selftest.
It also seems that we were using uninitialized WOPCM variables when
setting the GuC pin bias. The pin bias has to be set after the WOPCM,
but before the call to i915_gem_contexts_init where the first contexts
are pinned.
v2:
This also makes it so that there's no need to set GuC variables from
within the WOPCM init function or to move the WOPCM init, while keeping
the correct initialization order. Also for mock tests the pin bias is
left at 0 and we make sure that the pin bias with GuC will not be
smaller than without GuC.
v3:
Avoid unused i915 in intel_guc_ggtt_offset if debug is disabled.
v4:
Squash with WOPCM init reordering.
Moved the i915_ggtt_pin_bias helper to this patch, and made some
functions use it instead of directly dereferencing i915->ggtt.
v5:
Since we now don't use wopcm.guc.base for the pin bias there's no need to
validate it. It also has already been verified in WOPCM init.
v6:
Deleted the now unnecessarily introduced includes from previous versions.
Dropped naming changes from dev_priv to i915 for better patch readability.
v7:
Changed some comments to make more sense in the context they're in.
v8:
Moved and renamed the function which now returns the wopcm.guc.size to
intel_guc.c:intel_guc_reserved_gtt_size to avoid any possible confusion
with the pin_bias in ggtt, which should be used for pinning.
Fixed patch not applying or the most recent upstream.
Fixes: f7dc0157e4 ("drm/i915/uc: Fetch GuC/HuC firmwares from guc/huc specific init")
Testcase: igt/drv_selftest/mock_contexts #GuC
Signed-off-by: Jakub Bartmiński <jakub.bartminski@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Michał Winiarski <michal.winiarski@intel.com>
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20180727141148.30874-3-jakub.bartminski@intel.com
Immutable branch (mfd, chrome) due for the v4.19 window
Immutable Branch which moves the cros_ec_i2c and cros_ec_spi
transport drivers from mfd to platform/chrome. Changes in arm are a simple
rename in defconfigs. Change in input is a rename in help text.
On older HW, gen2/3, fence registers are used for detiling GPU commands
and as such changing those registers requires serialisation with the
requests on the GPU. Anything running on the GPU is subject to a hang,
and so we must be able to recover cleanly in the middle of a stuck wait
on a fence register.
We can simulate using the fence on the GPU simply by marking the fence
as active on the request for this vma, the interface being common to all
gen, thus broadening the test.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180719194746.19111-2-chris@chris-wilson.co.uk
If we fail during GEM initialisation, we scrub the HW state by
performing a device level GPU resuet. However, we want to leave the
system in a usable state (with functioning KMS but no GEM) so after
scrubbing the HW state, we need to restore some sane defaults and
re-enable the low-level common parts of the GPU (such as the GMCH).
v2: Restore GTT entries.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20180726085033.4044-2-chris@chris-wilson.co.uk
Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>