I see the main drm pull got merged, here's the first batch of fixes for
4.7 already. Fixes all around, a large portion cc: stable stuff.
[airlied: the DP++ stuff is a regression fix].
* tag 'drm-intel-next-fixes-2016-05-25' of git://anongit.freedesktop.org/drm-intel:
drm/i915: Stop automatically retiring requests after a GPU hang
drm/i915: Unify intel_ring_begin()
drm/i915: Ignore stale wm register values on resume on ilk-bdw (v2)
drm/i915/psr: Try to program link training times correctly
drm/i915/bxt: Adjusting the error in horizontal timings retrieval
drm/i915: Don't leave old junk in ilk active watermarks on readout
drm/i915: s/DPPL/DPLL/ for SKL DPLLs
drm/i915: Fix gen8 semaphores id for legacy mode
drm/i915: Set crtc_state->lane_count for HDMI
drm/i915/BXT: Retrieving the horizontal timing for DSI
drm/i915: Protect gen7 irq_seqno_barrier with uncore lock
drm/i915: Re-enable GGTT earlier during resume on pre-gen6 platforms
drm/i915: Determine DP++ type 1 DVI adaptor presence based on VBT
drm/i915: Enable/disable TMDS output buffers in DP++ adaptor as needed
drm/i915: Respect DP++ adaptor TMDS clock limit
drm: Add helper for DP++ adaptors
Pull drm updates from Dave Airlie:
"Here's the main drm pull request for 4.7, it's been a busy one, and
I've been a bit more distracted in real life this merge window. Lots
more ARM drivers, not sure if it'll ever end. I think I've at least
one more coming the next merge window.
But changes are all over the place, support for AMD Polaris GPUs is in
here, some missing GM108 support for nouveau (found in some Lenovos),
a bunch of MST and skylake fixes.
I've also noticed a few fixes from Arnd in my inbox, that I'll try and
get in asap, but I didn't think they should hold this up.
New drivers:
- Hisilicon kirin display driver
- Mediatek MT8173 display driver
- ARC PGU - bitstreamer on Synopsys ARC SDP boards
- Allwinner A13 initial RGB output driver
- Analogix driver for DisplayPort IP found in exynos and rockchip
DRM Core:
- UAPI headers fixes and C++ safety
- DRM connector reference counting
- DisplayID mode parsing for Dell 5K monitors
- Removal of struct_mutex from drivers
- Connector registration cleanups
- MST robustness fixes
- MAINTAINERS updates
- Lockless GEM object freeing
- Generic fbdev deferred IO support
panel:
- Support for a bunch of new panels
i915:
- VBT refactoring
- PLL computation cleanups
- DSI support for BXT
- Color manager support
- More atomic patches
- GEM improvements
- GuC fw loading fixes
- DP detection fixes
- SKL GPU hang fixes
- Lots of BXT fixes
radeon/amdgpu:
- Initial Polaris support
- GPUVM/Scheduler/Clock/Power improvements
- ASYNC pageflip support
- New mesa feature support
nouveau:
- GM108 support
- Power sensor support improvements
- GR init + ucode fixes.
- Use GPU provided topology information
vmwgfx:
- Add host messaging support
gma500:
- Some cleanups and fixes
atmel:
- Bridge support
- Async atomic commit support
fsl-dcu:
- Timing controller for LCD support
- Pixel clock polarity support
rcar-du:
- Misc fixes
exynos:
- Pipeline clock support
- Exynoss4533 SoC support
- HW trigger mode support
- export HDMI_PHY clock
- DECON5433 fixes
- Use generic prime functions
- use DMA mapping APIs
rockchip:
- Lots of little fixes
vc4:
- Render node support
- Gamma ramp support
- DPI output support
msm:
- Mostly cleanups and fixes
- Conversion to generic struct fence
etnaviv:
- Fix for prime buffer handling
- Allow hangcheck to be coalesced with other wakeups
tegra:
- Gamme table size fix"
* 'drm-next' of git://people.freedesktop.org/~airlied/linux: (1050 commits)
drm/edid: add displayid detailed 1 timings to the modelist. (v1.1)
drm/edid: move displayid validation to it's own function.
drm/displayid: Iterate over all DisplayID blocks
drm/edid: move displayid tiled block parsing into separate function.
drm: Nuke ->vblank_disable_allowed
drm/vmwgfx: Report vmwgfx version to vmware.log
drm/vmwgfx: Add VMWare host messaging capability
drm/vmwgfx: Kill some lockdep warnings
drm/nouveau/gr/gf100-: fix race condition in fecs/gpccs ucode
drm/nouveau/core: recognise GM108 chipsets
drm/nouveau/gr/gm107-: fix touching non-existent ppcs in attrib cb setup
drm/nouveau/gr/gk104-: share implementation of ppc exception init
drm/nouveau/gr/gk104-: move rop_active_fbps init to nonctx
drm/nouveau/bios/pll: check BIT table version before trying to parse it
drm/nouveau/bios/pll: prevent oops when limits table can't be parsed
drm/nouveau/volt/gk104: round up in gk104_volt_set
drm/nouveau/fb/gm200: setup mmu debug buffer registers at init()
drm/nouveau/fb/gk20a,gm20b: setup mmu debug buffer registers at init()
drm/nouveau/fb/gf100-: allocate mmu debug buffers
drm/nouveau/fb: allow chipset-specific actions for oneinit()
...
The newly-introduced function i915_gem_object_pin_map() returns an
ERR_PTR (not NULL) if the pin-and-map opertaion fails, so that's what we
must check for. And it's nicer not to assign such a pointer-or-error to
a structure being filled in until after it's been validated, so we
should keep it local and avoid exporting a bogus pointer. Also, for
clarity and symmetry, we should clear 'virtual_start' along with 'vma'
when unmapping a ringbuffer.
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
I caught a few errors in our current PHY/CDCLK programming by sanity
checking the actual programmed state, so I thought it would be also
useful for the future. In addition to verifying the state after
programming it also verify it after exiting DC5, to make sure DMC
restored/kept intact everything related.
v2:
- Inlining __phy_reg_verify_state() doesn't make sense and also
incorrect, so don't do it (PW/CI gcc)
v3:
- Rebase on latest -nightly
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: David Weinehall <david.weinehall@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1459780030-15781-1-git-send-email-imre.deak@intel.com
Conceptually, each request is a record of a hardware transaction - we
build up a list of pending commands and then either commit them to
hardware, or cancel them. However, whilst building up the list of
pending commands, we may modify state outside of the request and make
references to the pending request. If we do so and then cancel that
request, external objects then point to the deleted request leading to
both graphical and memory corruption.
The easiest example is to consider object/VMA tracking. When we mark an
object as active in a request, we store a pointer to this, the most
recent request, in the object. Then we want to free that object, we wait
for the most recent request to be idle before proceeding (otherwise the
hardware will write to pages now owned by the system, or we will attempt
to read from those pages before the hardware is finished writing). If
the request was cancelled instead, that wait completes immediately. As a
result, all requests must be committed and not cancelled if the external
state is unknown.
All that remains of i915_gem_request_cancel() users are just a couple of
extremely unlikely allocation failures, so remove the API entirely.
A consequence of committing all incomplete requests is that we generate
excess breadcrumbs and fill the ring much more often with dummy work. We
have completely undone the outstanding_last_seqno optimisation.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93907
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: stable@vger.kernel.org
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1460565315-7748-16-git-send-email-chris@chris-wilson.co.uk
Reporting -EIO from i915_wait_request() has proven very troublematic
over the years, with numerous hard-to-reproduce bugs cropping up in the
corner case of where a reset occurs and the code wasn't expecting such
an error.
If the we reset the GPU or have detected a hang and wish to reset the
GPU, the request is forcibly complete and the wait broken. Currently, we
report either -EAGAIN or -EIO in order for the caller to retreat and
restart the wait (if appropriate) after dropping and then reacquiring
the struct_mutex (essential to allow the GPU reset to proceed). However,
if we take the view that the request is complete (no further work will
be done on it by the GPU because it is dead and soon to be reset), then
we can proceed with the task at hand and then drop the struct_mutex
allowing the reset to occur. This transfers the burden of checking
whether it is safe to proceed to the caller, which in all but one
instance it is safe - completely eliminating the source of all spurious
-EIO.
Of note, we only have two API entry points where we expect that
userspace can observe an EIO. First is when submitting an execbuf, if
the GPU is terminally wedged, then the operation cannot succeed and an
-EIO is reported. Secondly, existing userspace uses the throttle ioctl
to detect an already wedged GPU before starting using HW acceleration
(or to confirm that the GPU is wedged after an error condition). So if
the GPU is wedged when the user calls throttle, also report -EIO.
v2: Split more carefully the change to i915_wait_request() and assorted
ABI from the reset handling.
v3: Add a couple of WARN_ON(EIO) to the interruptible modesetting code
so that we don't start to leak EIO there in future (and break our hang
resistant modesetting).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1460565315-7748-9-git-send-email-chris@chris-wilson.co.uk
Link: http://patchwork.freedesktop.org/patch/msgid/1460565315-7748-1-git-send-email-chris@chris-wilson.co.uk
As the request is only valid during the same global reset epoch, we can
record the current reset_counter when constructing the request and reuse
it when waiting upon that request in future. This removes a very hairy
atomic check serialised by the struct_mutex at the time of waiting and
allows us to transfer those waits to a central dispatcher for all
waiters and all requests.
PS: With per-engine resets, we obviously cannot assume a global reset
epoch for the requests - a per-engine epoch makes the most sense. The
challenge then is how to handle checking in the waiter for when to break
the wait, as the fine-grained reset may also want to requeue the
request (i.e. the assumption that just because the epoch changes the
request is completed may be broken - or we just avoid breaking that
assumption with the fine-grained resets).
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1460565315-7748-7-git-send-email-chris@chris-wilson.co.uk
In the reset_counter, we use two bits to track a GPU hang and reset. The
low bit is a "reset-in-progress" flag that we set to signal when we need
to break waiters in order for the recovery task to grab the mutex. As
soon as the recovery task has the mutex, we can clear that flag (which
we do by incrementing the reset_counter thereby incrementing the gobal
reset epoch). By clearing that flag when the recovery task holds the
struct_mutex, we can forgo a second flag that simply tells GEM to ignore
the "reset-in-progress" flag.
The second flag we store in the reset_counter is whether the
reset failed and we consider the GPU terminally wedged. Whilst this flag
is set, all access to the GPU (at least through GEM rather than direct mmio
access) is verboten.
PS: Fun is in store, as in the future we want to move from a global
reset epoch to a per-engine reset engine with request recovery.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1460565315-7748-6-git-send-email-chris@chris-wilson.co.uk
This is principally a little bit of syntatic sugar to hide the
atomic_read()s throughout the code to retrieve the current reset_counter.
It also provides the other utility functions to check the reset state on the
already read reset_counter, so that (in later patches) we can read it once
and do multiple tests rather than risk the value changing between tests.
v2: Be more strict on converting existing i915_reset_in_progress() over to
the more verbose i915_reset_in_progress_or_wedged().
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1460565315-7748-4-git-send-email-chris@chris-wilson.co.uk
Store the edram capabilities instead of only the size of
edram. This is preparatory patch to allow edram size calculation
based on edram capability bits for gen9+. With gen9 the
edram is behind llc and is a separate entity. With hsw/bdw
it was more of a victim cache for LLC so the name 'eLLC' might
be warranted. Regardless, rename all mentions of eLLC to EDRAM to
clear the confusion.
v2: return bytes for edram size (Chris)
s/eLLC/eDRAM in output if we are gen > 8
v3: rebase, INTEL_GEN (Chris)
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Rather than blindly waking up all forcewake domains on command
submission, we can teach each engine what is (or are) the correct
one to take.
On platforms with multiple forcewake domains like VLV, CHV, SKL
and BXT, this has the potential of lowering the GPU and CPU
power use and submission latency.
To implement it we add a function named
intel_uncore_forcewake_for_reg whose purpose is to query which
forcewake domains need to be taken to read or write a specific
register with raw mmio accessors.
These enables the execlists engine setup to query which
forcewake domains are relevant per engine on the currently
running platform.
v2:
* Kerneldoc.
* Split from intel_uncore.c macro extraction, WARN_ON,
no warns on old platforms. (Chris Wilson)
v3:
* Single domain per engine, mention all registers,
bi-directional function and a new name, fix handling
of gen6 and gen7 writes. (Chris Wilson)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1460468251-14069-1-git-send-email-tvrtko.ursulin@linux.intel.com
As the vast majority of users do not use the domain id variable,
we can eliminate it from the iterator and also change the latter
using the same principle as was recently done for for_each_engine.
For a couple of callers which do need the domain mask, store it
in the domain array (which already has the domain id), then both
can be retrieved thence.
Result is clearer code and smaller generated binary, especially
in the tight fw get/put loops. Also, relationship between domain
id and mask is no longer assumed in the macro.
v2: Improve grammar in the commit message and rename the
iterator to for_each_fw_domain_masked for consistency.
(Dave Gordon)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Dave Gordon <david.s.gordon@intel.com>
Because it is based on jiffies, current implementation releases the
forcewake at any time between straight away and between 1ms and 10ms,
depending on the kernel configuration (CONFIG_HZ).
This is probably not what has been desired, since the dynamics of keeping
parts of the GPU awake should not be correlated with this kernel
configuration parameter.
Change the auto-release mechanism to use hrtimers and set the timeout to
1ms with a 1ms of slack. This should make the GPU power consistent
across kernel configs, and timer slack should enable some timer coalescing
where multiple force-wake domains exist, or with unrelated timers.
For GlBench/T-Rex this decreases the number of forcewake releases from
~480 to ~300 per second, and for a heavy combined OGL/OCL test from
~670 to ~360 (HZ=1000 kernel).
Even though this reduction can be attributed to the average release period
extending from 0-1ms to 1-2ms, as discussed above, it will make the
forcewake timeout consistent for different CONFIG_HZ values.
Real life measurements with the above workload has shown that, with this
patch, both manage to auto-release the forcewake between 2-4 times per
10ms, even though the number of forcewake gets is dramatically different.
T-Rex requests between 5-10 explicit gets and 5-10 implict gets in each
10ms period, while the OGL/OCL test requests 250 and 380 times in the same
period.
The two data points together suggest that the nature of the forwake
accesses is bursty and that further changes and potential timeout
extensions, or moving the start of timeout from the first to the last
automatic forcewake grab, should be carefully measured for power and
performance effects.
v2:
* Commit spelling. (Dave Gordon)
* More discussion on numbers in the commit. (Chris Wilson)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Dave Gordon <david.s.gordon@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
We've had problems on several occasions with using the panel type
from the VBT block 40. Usually it seems to be 2, which often
doesn't give us the correct timings for the panel. After some
more digging I found a way to get a panel type via the OpRegion
SWSCI GBDA "Get Panel Details" method. Let's try to use it.
The spec has this to say about the output:
"Bits [15:8] - Panel Type
Bits contain the panel type user setting from CMOS
00h = Not Valid, use default Panel Type & Timings from VBT
01h - 0Fh = Panel Number"
Another version of the spec lists the valid range as 1-16, which makes
more sense since VBT supports 16 panels. Based on actual results
from Rob's G45, 1-16 is what we need to accept.
The other bits in the output don't look relevant for the problem at
hand.
The input is specified as:
"Bits [31:4] - Reserved
Reserved (must be zero)
Bits [3:0] - Panel Number
These bits contain the sequential index of Panel, starting at 0 and
counting upwards from the first integrated Internal Flat-Panel Display
Encoder present, and then from the first external Display Encoder
(e.g., S/DVO-B then S/DVO-C) which supports Internal Flat-Panels.
0h - 0Fh = Panel number"
For now I've just hardcoded the input panel number as 0. That would seem
like a decent choise for LVDS. Not so sure about eDP when port != A.
v2: Accept values 1-16
Filter out bogus results in opregion code (Jani)
Add debug logging for all the different branches (Jani)
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Rob Kramer <rob@solution-space.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94825
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1460359431-11003-1-git-send-email-ville.syrjala@linux.intel.com
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Tested-by: Rob Kramer <rob@solution-space.com>
When the GMBUS based i2c transfer times out, we try to fall back to
bit-banging and retry the operation that way. However if the bit-banging
attempt also fails, we should probably go back to the GMBUS method for
the next attempt. Maybe there simply wasn't anyone one the bus at this
time.
There's also a bit of a mess going on with the force_bit handling.
It's supposed to be a ref count actually, and it is as far as
intel_gmbus_force_bit() is concerned. But it's treated as just a
flag by the timeout based bit-banging fallback. I suppose that's
fine since we should never end up in the timeout fallback case
if force_bit was already non-zero. However now that we want to restore
things back to where they were after the bit-banging attempt failed,
we're going to have to do things a bit differently to avoid clobbering
the force_bit count as set up by intel_gmbus_force_bit(). So let's
dedicate the high bit as a flag for the low level timeout based fallback
and treat the rest of the bits as a ref count just as before.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1457366220-29409-4-git-send-email-ville.syrjala@linux.intel.com
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
We now have two implementations for vmapping a whole object, one for
dma-buf and one for the ringbuffer. If we couple the mapping into the
obj->pages lifetime, then we can reuse an obj->mapping for both and at
the same time couple it into the shrinker. There is a third vmapping
routine in the cmdparser that maps only a range within the object, for
the time being that is left alone, but will eventually use these routines
in order to cache the mapping between invocations.
v2: Mark the failable kmalloc() as __GFP_NOWARN (vsyrjala)
v3: Call unpin_vmap from the right dmabuf unmapper
v4: Rename vmap to map as we don't wish to imply the type of mapping
involved, just that it contiguously maps the object into kernel space.
Add kerneldoc and lockdep annotations
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Dave Gordon <david.s.gordon@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1460113874-17366-4-git-send-email-chris@chris-wilson.co.uk
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
In order to simplify future patches, extract the
lazy_coherency optimisation our of the engine->get_seqno() vfunc into
its own callback.
v2: Rename the barrier to engine->irq_seqno_barrier to try and better
reflect that the barrier is only required after the user interrupt before
reading the seqno (to ensure that the seqno update lands in time as we
do not have strict seqno-irq ordering on all platforms).
Reviewed-by: Dave Gordon <david.s.gordon@intel.com> [#v2]
v3: Comments for hangcheck paranoia. Mika wanted to keep the extra
barrier inside the hangcheck, just in case. I can argue that it doesn't
provide a barrier against anything, but the side-effects of applying the
barrier may prevent a false declaration of a hung GPU.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
Cc: Dave Gordon <david.s.gordon@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1460195877-20520-2-git-send-email-chris@chris-wilson.co.uk
This patch sets the invert bit for hpd detection for each port
based on VBT configuration. Since each AOB can be designed to
depend on invert bit or not, it is expected if an AOB requires
invert bit, the user will set respective bit in VBT.
v2: Separated VBT parsing from the rest of the logic. (Jani)
v3: Moved setting invert bit logic to bxt_hpd_irq_setup()
and changed its logic to avoid looping twice. (Ville)
v4: Changed the logic to mask out the bits first and then
set them to remove need of temporary variable. (Ville)
v5: Moved defines to existing set of defines for the register
and added required breaks. (Ville)
Signed-off-by: Sivakumar Thulasimani <sivakumar.thulasimani@intel.com>
Signed-off-by: Durgadoss R <durgadoss.r@intel.com>
Signed-off-by: Shubhangi Shrivastava <shubhangi.shrivastava@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
[Jani: fixed some checkpatch noise, added kernel-doc.]
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1459420907-11383-2-git-send-email-shubhangi.shrivastava@intel.com
Extract the GPLL reference frequency from CCK and use it in the
GPU freq<->opcode conversions on VLV/CHV. This eliminates all the
assumptions we have about which divider is used for which czclk
frequency.
Note that unlike most clocks from CCK, the GPLL ref clock is a divided
down version of the CZ clock rather than the HPLL clock. CZ clock itself
is a divided down version of the HPLL clock though, so in effect it just
gets divided down twice.
While at it, throw in a few comments explaining the remaining constants
for anyone who later wants to compare this to the spreadsheets.
v2: Add slow/fast notes for CHV clocks (Imre)
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1457120584-26080-2-git-send-email-ville.syrjala@linux.intel.com
Reviewed-by: Imre Deak <imre.deak@intel.com> (v1)
Due to timing issues in the HW, some of the status bits required for GuC
authentication occasionally don't get set; when that happens, the GuC
cannot be initialized and we will be left with a wedged GPU. The W/A
suggested is to perform a soft reset of the GuC and attempt to reload
the F/W again for few times before giving up.
As the failure is dependent on timing, tests performed by triggering
manual full gpu reset (i915_wedged) showed that we could sometimes hit
this after several thousand iterations, but sometimes tests ran even
longer without any issues. Reset and reload mechanism proved helpful
when we indeed hit f/w load failure, so it is better to include this
to improve driver stability.
This change implements the following WAs,
WaEnableuKernelHeaderValidFix:skl,bxt
WaEnableGuCBootHashCheckNotSet:skl,bxt
Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Reviewed-by: Alex Dai <yu.dai@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
DPLL_MD(PIPE_C) is AWOL on CHV. Instead of fixing it someone added
chicken bits to propagate the pixel multiplier from DPLL_MD(PIPE_B)
to either pipe B or C. So do that to make pixel repeat work on pipes
B and C. Pipe A is fine without any tricks.
Fortunately the pixel repeat propagation appears to be a oneshot
operation, so once the value has been written we can clear the
chicken bits. So it is still possible to drive pipe B and C with
different pixel multipliers simultaneosly.
Looks like DPLL_VGA_MODE_DIS must also be set in DPLL(PIPE_B)
for this to work. But since we keep that bit always set in all
DPLLs there's no problem.
This of course means we can't reliably read out the pixel multiplier
for pipes B and C. That would make the state checker unhappy, so I
added shadow copies of those registers in to dev_priv. The other
option would have been to skip pixel multiplier, dpll_md an dotclock
checks entirely on CHV, but that feels like a serious loss of cross
checking, so just pretending that we have working DPLL MD registers
seemed better. Obviously with the shadow copies we can't detect if
the pixel multiplier was properly configured, nor can we take over
its state from the BIOS, but hopefully people won't have displays
that would be limitd to such crappy modes.
There is one strange flicker still remaining. It's visible on
pipe C/HDMID when HDMIB is enabled while driven by pipe B.
It doesn't occur if pipe A drives HDMIB, nor is there any glitch
on pipe B/HDMIB when port C/HDMID starts up. I don't have a board
with HDMIC so not sure if it happens there too. So I'm not sure
if it's somehow tied in with this strange linkage between pipe B
and C. Sadly I was unable to find an enable sequence that would
avoid the glitch, but at least it's not fatal ie. the output
recovers afterwards.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1458052809-23426-4-git-send-email-ville.syrjala@linux.intel.com
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Refer to the GGTT VM consistently as "ggtt->base" instead of just "ggtt",
"vm" or indirectly through other variables like "dev_priv->ggtt.base"
to avoid confusion with the i915_ggtt object itself and PPGTT VMs.
Refer to the GGTT as "ggtt" instead of indirectly through chaining.
As a bonus gets rid of the long-standing i915_obj_to_ggtt vs.
i915_gem_obj_to_ggtt conflict, due to removal of i915_obj_to_ggtt!
v2:
- Added some more after grepping sources with Chris
v3:
- Refer to GGTT VM through ggtt->base consistently instead of ggtt_vm
(Chris)
v4:
- Convert all dev_priv->ggtt->foo accesses to ggtt->foo.
v5:
- Make patch checker happy
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
With async modesets this is no longer protected with connection_mutex,
so ensure that each pll has its own lock. The pll configuration state
is still protected; it's only the pll updates that need locking against
concurrency.
Changes since v1:
- Rebased.
- Fix locking to protect all accesses. (Durgadoss)
Changes since v2:
- Make the dpll_lock global to protect concurrent updates to the
same register, for example DPLL_CTRL1 on skl. (Ander)
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/56F29F50.1090708@linux.intel.com
Reviewed-by: Ander Conselvan de Oliveira <conselvan2@gmail.com>
Equivalent to the existing for_each_engine() macro, this will replace
the latter wherever the third argument *is* actually wanted (in most
places, it is not used). The third argument is renamed to emphasise
that it is an engine id (type enum intel_engine_id). All the callers of
the macro that actually need the third argument are updated to use this
version, and the argument (generally 'i') is also updated to be 'id'.
Other callers (where the third argument is unused) are untouched for
now; they will be updated in the next patch.
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>