GM20B requires an extra clock compared to GK20A. Add that information
into the platform data and acquire and enable this clock if necessary.
Signed-off-by: Alexandre Courbot <acourbot@nvidia.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
This pull request want to land the analogix_dp driver into drm/bridge directory,
which reused the Exynos DP code, and add Rockchip DP support. And those
patches have been:
* 'drm-next-analogix-dp-v2' of github.com:yakir-Yang/linux:
drm: bridge: analogix/dp: Fix the possible dead lock in bridge disable time
drm: bridge: analogix/dp: add panel prepare/unprepare in suspend/resume time
drm: bridge: analogix/dp: add edid modes parse in get_modes method
drm: bridge: analogix/dp: move hpd detect to connector detect function
drm: bridge: analogix/dp: try force hpd after plug in lookup failed
drm: bridge: analogix/dp: add max link rate and lane count limit for RK3288
drm: bridge: analogix/dp: add some rk3288 special registers setting
dt-bindings: add document for rockchip variant of analogix_dp
drm: rockchip: dp: add rockchip platform dp driver
ARM: dts: exynos/dp: remove some properties that deprecated by analogix_dp driver
dt-bindings: add document for analogix display port driver
drm: bridge: analogix/dp: dynamic parse sync_pol & interlace & dynamic_range
drm: bridge: analogix/dp: remove duplicate configuration of link rate and link count
drm: bridge: analogix/dp: fix some obvious code style
drm: bridge: analogix/dp: rename register constants
drm/exynos: dp: rename implementation specific driver part
drm: bridge: analogix/dp: split exynos dp driver to bridge directory
imx-drm: stricter plane parameter checking, dw_hdmi-imx and dmfc fixes
- Check whether plane parameters comply with IPU IDMAC limitations and
fix planar YUV 4:2:0 U/V offsets and stride
- Cleanup encoder in dw_hdmi-imx bind error path and
remove a superfluous platform_set_drvdata in dw_hdmi-imx
- DMFC setup fixes: lock the ipu_dmfc_init_channel function against
concurrent use, rename it to ipu_dmfc_config_wait4eot, and call
it after the FIFO size has been determined.
* tag 'imx-drm-next-2016-04-01' of git://git.pengutronix.de/git/pza/linux:
drm/imx: Don't set a gamma table size
drm/imx: ipuv3-plane: Configure DMFC wait4eot bit after slots are determined
gpu: ipu-v3: ipu-dmfc: Rename ipu_dmfc_init_channel to ipu_dmfc_config_wait4eot
gpu: ipu-v3: ipu-dmfc: Make function ipu_dmfc_init_channel() return void
gpu: ipu-v3: ipu-dmfc: Protect function ipu_dmfc_init_channel() with mutex
drm/imx: dw_hdmi: Don't call platform_set_drvdata()
drm/imx: dw_hdmi: Call drm_encoder_cleanup() in error path
drm/imx: ipuv3-plane: fix planar YUV 4:2:0 support
drm/imx: ipuv3-plane: Add more thorough checks for plane parameter limitations
gpu: ipu-cpmem: modify ipu_cpmem_set_yuv_planar_full for better control
- VBT code refactor for a clean split between parsing&using of firmware
information (Jani)
- untangle the pll computation code, and splitting up the monster
i9xx_crtc_compute_clocks (Ander)
- dsi support for bxt (Jani, Shashank Sharma and others)
- color manager (i.e. de-gamma, color conversion matrix & gamma support) from
Lionel Landwerlin
- Vulkan hsw support in the command parser (Jordan Justen)
- large-scale renaming of intel_engine_cs variables/parameters to avoid the epic
ring vs. engine confusion introduced in gen8 (Tvrtko Ursulin)
- few atomic patches from Maarten&Matt, big one is two-stage wm programming on ilk-bdw
- refactor driver load and add infrastructure to inject load failures for
testing, from Imre
- various small things all over
* tag 'drm-intel-next-2016-03-30' of git://anongit.freedesktop.org/drm-intel: (179 commits)
drm/i915: Update DRIVER_DATE to 20160330
drm/i915: Call intel_dp_mst_resume() before resuming displays
drm/i915: Fix races on fbdev
drm/i915: remove unused dev_priv->render_reclock_avail
drm/i915: move sdvo mappings to vbt data
drm/i915: move edp low vswing config to vbt data
drm/i915: use a substruct in vbt data for edp
drm/i915: replace for_each_engine()
drm/i915: introduce for_each_engine_id()
drm/i915/bxt: Fix DSI HW state readout
drm/i915: Remove vblank wait from hsw_enable_ips, v2.
drm/i915: Tidy aliasing_gtt_bind_vma()
drm/i915: Split PNV version of crtc_compute_clock()
drm/i915: Split g4x_crtc_compute_clock()
drm/i915: Split i8xx_crtc_compute_clock()
drm/i915: Split CHV and VLV specific crtc_compute_clock() hooks
drm/i915: Merge ironlake_compute_clocks() and ironlake_crtc_compute_clock()
drm/i915: Move fp divisor calculation into ironlake_compute_dpll()
drm/i915: Pass crtc_state->dpll directly to ->find_dpll()
drm/i915: Simplify ironlake_crtc_compute_clock() CPU eDP case
...
Extract the GPLL reference frequency from CCK and use it in the
GPU freq<->opcode conversions on VLV/CHV. This eliminates all the
assumptions we have about which divider is used for which czclk
frequency.
Note that unlike most clocks from CCK, the GPLL ref clock is a divided
down version of the CZ clock rather than the HPLL clock. CZ clock itself
is a divided down version of the HPLL clock though, so in effect it just
gets divided down twice.
While at it, throw in a few comments explaining the remaining constants
for anyone who later wants to compare this to the spreadsheets.
v2: Add slow/fast notes for CHV clocks (Imre)
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1457120584-26080-2-git-send-email-ville.syrjala@linux.intel.com
Reviewed-by: Imre Deak <imre.deak@intel.com> (v1)
The Tegra powergate and rail IDs are always positive values and so change
the type to be unsigned and remove the tests to see if the ID is less
than zero. Update the Tegra DC powergate type to be an unsigned as well.
Signed-off-by: Jon Hunter <jonathanh@nvidia.com>
Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Signed-off-by: Thierry Reding <treding@nvidia.com>
After a suspend-resume cycle, the resumed kernel has no idea what the
booted kernel may have done to the GuC before replacing itself with the
resumed image. In particular, it may have already loaded the GuC with
firmware, which will then cause this kernel's attempt to (re)load the
firmware to fail (GuC program memory is write-once!). The symptoms
(GuC firmware reload fails after hibernation) are further described
in the Bugzilla reference below.
So let's *always* reset the GuC just before (re)loading the firmware;
the hardware should then be in a well-known state, and we may even
avoid some of the issues arising from unpredictable timing.
Also added some more fields & values to the definition of the GUC_STATUS
register, which is the key diagnostic indicator if the GuC load fails.
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Reviewed-by: Arun Siluvery <arun.siluvery@linux.intel.com>
Cc: Alex Dai <yu.dai@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94390
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Due to timing issues in the HW, some of the status bits required for GuC
authentication occasionally don't get set; when that happens, the GuC
cannot be initialized and we will be left with a wedged GPU. The W/A
suggested is to perform a soft reset of the GuC and attempt to reload
the F/W again for few times before giving up.
As the failure is dependent on timing, tests performed by triggering
manual full gpu reset (i915_wedged) showed that we could sometimes hit
this after several thousand iterations, but sometimes tests ran even
longer without any issues. Reset and reload mechanism proved helpful
when we indeed hit f/w load failure, so it is better to include this
to improve driver stability.
This change implements the following WAs,
WaEnableuKernelHeaderValidFix:skl,bxt
WaEnableGuCBootHashCheckNotSet:skl,bxt
Signed-off-by: Arun Siluvery <arun.siluvery@linux.intel.com>
Signed-off-by: Dave Gordon <david.s.gordon@intel.com>
Reviewed-by: Alex Dai <yu.dai@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
This effectively reverts
commit 8e5fd599eb
Author: Ville Syrjälä <ville.syrjala@linux.intel.com>
Date: Wed Apr 9 13:28:50 2014 +0300
drm/i915/chv: Make CHV irq handler loop until all interrupts are consumed
as under continuous execlists load we can saturate the IRQ handler,
destablising the tsc clock and triggering the NMI watchdog to declare a hung
CPU.
[ 552.756051] clocksource: timekeeping watchdog on CPU0: Marking clocksource 'tsc' as unstable because the skew is too large:
[ 552.756080] clocksource: 'refined-jiffies' wd_now: 10003b480 wd_last: 10003b28c mask: ffffffff
[ 552.756091] clocksource: 'tsc' cs_now: d55d31aa50 cs_last: d17446166c mask: ffffffffffffffff
[ 552.756210] clocksource: Switched to clocksource refined-jiffies
[ 575.217870] NMI watchdog: Watchdog detected hard LOCKUP on cpu 1
[ 575.217893] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.5.0-rc7+ #18
[ 575.217905] Hardware name: /NUC5CPYB, BIOS PYBSWCEL.86A.0027.2015.0507.1758 05/07/2015
[ 575.217915] 0000000000000000 ffff88027fd05bc0 ffffffff81288c6d 0000000000000000
[ 575.217935] 0000000000000001 ffff88027fd05be0 ffffffff810e72d1 0000000000000000
[ 575.217951] ffff88027fd05c80 ffff88027fd05c20 ffffffff81114b60 0000000181015f1e
[ 575.217967] Call Trace:
[ 575.217973] <NMI> [<ffffffff81288c6d>] dump_stack+0x4f/0x72
[ 575.217994] [<ffffffff810e72d1>] watchdog_overflow_callback+0x151/0x160
[ 575.218003] [<ffffffff81114b60>] __perf_event_overflow+0xa0/0x1e0
[ 575.218016] [<ffffffff811154c4>] perf_event_overflow+0x14/0x20
[ 575.218028] [<ffffffff8101d2ca>] intel_pmu_handle_irq+0x1da/0x460
[ 575.218042] [<ffffffff814a8aae>] ? poll_idle+0x3e/0x70
[ 575.218052] [<ffffffff814a8aae>] ? poll_idle+0x3e/0x70
[ 575.218064] [<ffffffff81014ae8>] perf_event_nmi_handler+0x28/0x50
[ 575.218075] [<ffffffff81007540>] nmi_handle+0x60/0x130
[ 575.218086] [<ffffffff814a8aae>] ? poll_idle+0x3e/0x70
[ 575.218096] [<ffffffff810079c0>] do_nmi+0x140/0x470
[ 575.218108] [<ffffffff81559ec7>] end_repeat_nmi+0x1a/0x1e
[ 575.218119] [<ffffffff814a8aae>] ? poll_idle+0x3e/0x70
[ 575.218129] [<ffffffff814a8aae>] ? poll_idle+0x3e/0x70
[ 575.218139] [<ffffffff814a8aae>] ? poll_idle+0x3e/0x70
[ 575.218148] <<EOE>> [<ffffffff814a8353>] cpuidle_enter_state+0xf3/0x2f0
[ 575.218164] [<ffffffff814a8587>] cpuidle_enter+0x17/0x20
[ 575.218175] [<ffffffff810aaa3a>] call_cpuidle+0x2a/0x40
[ 575.218185] [<ffffffff810aade3>] cpu_startup_entry+0x273/0x330
[ 575.218196] [<ffffffff81033a1e>] start_secondary+0x10e/0x130
However, not servicing all available IIR within the handler does hurt the
throughput of pathological nop execbuf by about 20%, with a similar effect
upon the dispatch latency of a series of execbuf.
v2: use do {} while(0) for a smaller patch, and easier to revert again
I have reasonable confidence that we do not miss GT interrupts (as
execlists provides a stress case with a failure mechanism easily
detected by igt), however I have less confidence about all the other
sources of interrupts and worry that may lose a display hotplug
interrupt, for example.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93467
Testcase: igt/gem_exec_nop/basic # requires NMI watchdog
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Antti Koskipää <antti.koskipaa@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: stable@vger.kernel.org
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1457946117-6714-1-git-send-email-chris@chris-wilson.co.uk
(cherry picked from commit 579de73b04)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Since we need MST devices ready before we try to resume displays,
calling this after intel_display_resume() can result in some issues with
various laptop docks where the monitor won't turn back on after
suspending the system.
This order was originally changed in
commit e7d6f7d708 ("drm/i915: resume MST after reading back hw state")
In order to fix some unclaimed register errors, however the actual cause
of those has since been fixed.
CC: stable@vger.kernel.org
Signed-off-by: Lyude <cpaul@redhat.com>
[danvet: Resolve conflicts with locking changes.]
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
(cherry picked from commit a16b7658f4)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
After unplugging a DP MST display from the system, we have to go through
and destroy all of the DRM connectors associated with it since none of
them are valid anymore. Unfortunately, intel_dp_destroy_mst_connector()
doesn't do a good enough job of ensuring that throughout the destruction
process that no modesettings can be done with the connectors. As it is
right now, intel_dp_destroy_mst_connector() works like this:
* Take all modeset locks
* Clear the configuration of the crtc on the connector, if there is one
* Drop all modeset locks, this is required because of circular
dependency issues that arise with trying to remove the connector from
sysfs with modeset locks held
* Unregister the connector
* Take all modeset locks, again
* Do the rest of the required cleaning for destroying the connector
* Finally drop all modeset locks for good
This only works sometimes. During the destruction process, it's very
possible that a userspace application will attempt to do a modesetting
using the connector. When we drop the modeset locks, an ioctl handler
such as drm_mode_setcrtc has the oppurtunity to take all of the modeset
locks from us. When this happens, one thing leads to another and
eventually we end up committing a mode with the non-existent connector:
[drm:intel_dp_link_training_clock_recovery [i915]] *ERROR* failed to enable link training
[drm:intel_dp_aux_ch] dp_aux_ch timeout status 0x7cf0001f
[drm:intel_dp_start_link_train [i915]] *ERROR* failed to start channel equalization
[drm:intel_dp_aux_ch] dp_aux_ch timeout status 0x7cf0001f
[drm:intel_mst_pre_enable_dp [i915]] *ERROR* failed to allocate vcpi
And in some cases, such as with the T460s using an MST dock, this
results in breaking modesetting and/or panicking the system.
To work around this, we now unregister the connector at the very
beginning of intel_dp_destroy_mst_connector(), grab all the modesetting
locks, and then hold them until we finish the rest of the function.
CC: stable@vger.kernel.org
Signed-off-by: Lyude <cpaul@redhat.com>
Signed-off-by: Rob Clark <rclark@redhat.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1458155884-13877-1-git-send-email-cpaul@redhat.com
(cherry picked from commit 1f7717552e)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
It may caused a dead lock if we flush the hpd work in bridge disable time.
The normal flow would like:
IN --> DRM IOCTL
1. Acquire crtc_ww_class_mutex (DRM IOCTL)
IN --> analogix_dp_bridge
2. Acquire hpd work lock (Flush hpd work)
3. HPD work already in idle, no need to run the work function.
OUT <-- analogix_dp_bridge
OUT <-- DRM IOCTL
The dead lock flow would like:
IN --> DRM IOCTL
1. Acquire crtc_ww_class_mutex (DRM IOCTL)
IN --> analogix_dp_bridge
2. Acquire hpd work lock (Flush hpd work)
IN --> analogix_dp_hotplug
IN --> drm_helper_hpd_irq_event
3. Acquire mode_config lock (This lock already have been acquired in previous step 1)
** Dead Lock Now **
It's wrong to flush the hpd work in bridge->disable time, I guess the
original code just want to ensure the delay work must be finish before
encoder disabled.
The flush work in bridge disable time is try to ensure the HPD event
won't be missed before display card disabled, actually we can take a
fast respond way(interrupt thread) to update DRM HPD event to fix the
delay update and possible dead lock.
Tested-by: Caesar Wang <wxt@rock-chips.com>
Tested-by: Douglas Anderson <dianders@chromium.org>
Tested-by: Heiko Stuebner <heiko@sntech.de>
Tested-by: Javier Martinez Canillas <javier@osg.samsung.com>
Signed-off-by: Yakir Yang <ykk@rock-chips.com>
Display Port monitor could support kinds of mode which indicate
in monitor edid, not just one single display resolution which
defined in panel or devivetree property display timing.
Note: Gustavo Padovan try to remove the controller and phy
power on function in bind time at bellow commit:
drm/exynos: do not start enabling DP at bind() phase
But for now driver need to read edid message in .get_modes()
function, so controller must be inited in bind time, so we
need to add controller init back.
Tested-by: Caesar Wang <wxt@rock-chips.com>
Tested-by: Douglas Anderson <dianders@chromium.org>
Tested-by: Heiko Stuebner <heiko@sntech.de>
Tested-by: Javier Martinez Canillas <javier@osg.samsung.com>
Signed-off-by: Yakir Yang <ykk@rock-chips.com>
This change just make a little clean to make code more like
drm core expect, move hdp detect code from bridge->enable(),
and place them into connector->detect().
Note: Gustavo Padovan try to remove the controller and phy
power on function in bind time at bellow commit:
drm/exynos: do not start enabling DP at bind() phase
But for now the connector status don't hardcode to connected,
need to operate dp phy in .detect function, so we need to revert
parts if Gustavo Padovan's changes, add phy poweron
function in bind time.
Tested-by: Caesar Wang <wxt@rock-chips.com>
Tested-by: Douglas Anderson <dianders@chromium.org>
Tested-by: Heiko Stuebner <heiko@sntech.de>
Tested-by: Javier Martinez Canillas <javier@osg.samsung.com>
Signed-off-by: Yakir Yang <ykk@rock-chips.com>
Some edp screen do not have hpd signal, so we can't just return
failed when hpd plug in detect failed.
This is an hardware property, so we need add a devicetree property
"analogix,need-force-hpd" to indicate this sutiation.
Acked-by: Rob Herring <robh@kernel.org>
Tested-by: Caesar Wang <wxt@rock-chips.com>
Tested-by: Douglas Anderson <dianders@chromium.org>
Tested-by: Heiko Stuebner <heiko@sntech.de>
Tested-by: Javier Martinez Canillas <javier@osg.samsung.com>
Signed-off-by: Yakir Yang <ykk@rock-chips.com>
There are some IP limit on rk3288 that only support 4 physical lanes
of 2.7/1.6 Gbps/lane, so seprate them out by device_type flag.
Tested-by: Caesar Wang <wxt@rock-chips.com>
Tested-by: Douglas Anderson <dianders@chromium.org>
Tested-by: Heiko Stuebner <heiko@sntech.de>
Signed-off-by: Yakir Yang <ykk@rock-chips.com>
Both hsync/vsync polarity and interlace mode can be parsed from
drm display mode, and dynamic_range and ycbcr_coeff can be judge
by the video code.
But presumably Exynos still relies on the DT properties, so take
good use of mode_fixup() in to achieve the compatibility hacks.
Reviewed-by: Krzysztof Kozlowski <k.kozlowski@samsung.com>
Tested-by: Caesar Wang <wxt@rock-chips.com>
Tested-by: Douglas Anderson <dianders@chromium.org>
Tested-by: Heiko Stuebner <heiko@sntech.de>
Tested-by: Javier Martinez Canillas <javier@osg.samsung.com>
Signed-off-by: Yakir Yang <ykk@rock-chips.com>
link_rate and lane_count already configured in analogix_dp_set_link_train(),
so we don't need to config those repeatly after training finished, just
remove them out.
Beside Display Port 1.2 already support 5.4Gbps link rate, the maximum sets
would change from {1.62Gbps, 2.7Gbps} to {1.62Gbps, 2.7Gbps, 5.4Gbps}.
Tested-by: Caesar Wang <wxt@rock-chips.com>
Tested-by: Douglas Anderson <dianders@chromium.org>
Tested-by: Heiko Stuebner <heiko@sntech.de>
Tested-by: Javier Martinez Canillas <javier@osg.samsung.com>
Signed-off-by: Yakir Yang <ykk@rock-chips.com>
In the original split we kept the register constants intact to keep the
diff small. Still the constants are Analogix-specific, so rename them now.
Tested-by: Caesar Wang <wxt@rock-chips.com>
Tested-by: Douglas Anderson <dianders@chromium.org>
Tested-by: Javier Martinez Canillas <javier@osg.samsung.com>
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Signed-off-by: Yakir Yang <ykk@rock-chips.com>
The core functionality now resides in the generic bridge part so the
exynos-specific implementation details can get a more suitable nameing.
Tested-by: Caesar Wang <wxt@rock-chips.com>
Tested-by: Douglas Anderson <dianders@chromium.org>
Tested-by: Javier Martinez Canillas <javier@osg.samsung.com>
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Signed-off-by: Yakir Yang <ykk@rock-chips.com>
Split the dp core driver from exynos directory to bridge directory,
and rename the core driver to analogix_dp_*, rename the platform
code to exynos_dp.
Beside the new analogix_dp driver would export six hooks.
"analogix_dp_bind()" and "analogix_dp_unbind()"
"analogix_dp_suspned()" and "analogix_dp_resume()"
"analogix_dp_detect()" and "analogix_dp_get_modes()"
The bind/unbind symbols is used for analogix platform driver to connect
with analogix_dp core driver. And the detect/get_modes is used for analogix
platform driver to init the connector.
They reason why connector need register in helper driver is rockchip drm
haven't implement the atomic API, but Exynos drm have implement it, so
there would need two different connector helper functions, that's why we
leave the connector register in helper driver.
Acked-by: Inki Dae <inki.dae@samsung.com>
Tested-by: Caesar Wang <wxt@rock-chips.com>
Tested-by: Douglas Anderson <dianders@chromium.org>
Tested-by: Heiko Stuebner <heiko@sntech.de>
Tested-by: Javier Martinez Canillas <javier@osg.samsung.com>
Signed-off-by: Yakir Yang <ykk@rock-chips.com>
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
and revert fix following it accordingly
Revert "drm/amdgpu: stop trying to suspend UVD sessions v2"
Revert "drm/amdgpu: fix the UVD suspend sequence order"
Signed-off-by: Leo Liu <leo.liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
In commit e45708976a ("drm/dp-helper: Move the legacy helpers to
gma500") the legacy i2c helpers were moved to the only remaining user of
them, the gma500 driver. Together with that move, i2c_dp_aux_add_bus()
was marked deprecated and started warning about its remaining use.
It's now been a year and a half of annoying warning, and apparently
nobody cares enough about gma500 to try to move it along to the more
modern models.
Get rid of the warning - if even the gma500 people don't care enough,
then they should certainly not spam other innocent developers with a
warning that might hide other, much more real issues.
Cc: Daniel Vetter <daniel.vetter@intel.com>
Cc: Alan Cox <alan@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Merge PAGE_CACHE_SIZE removal patches from Kirill Shutemov:
"PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
ago with promise that one day it will be possible to implement page
cache with bigger chunks than PAGE_SIZE.
This promise never materialized. And unlikely will.
Let's stop pretending that pages in page cache are special. They are
not.
The first patch with most changes has been done with coccinelle. The
second is manual fixups on top.
The third patch removes macros definition"
[ I was planning to apply this just before rc2, but then I spaced out,
so here it is right _after_ rc2 instead.
As Kirill suggested as a possibility, I could have decided to only
merge the first two patches, and leave the old interfaces for
compatibility, but I'd rather get it all done and any out-of-tree
modules and patches can trivially do the converstion while still also
working with older kernels, so there is little reason to try to
maintain the redundant legacy model. - Linus ]
* PAGE_CACHE_SIZE-removal:
mm: drop PAGE_CACHE_* and page_cache_{get,release} definition
mm, fs: remove remaining PAGE_CACHE_* and page_cache_{get,release} usage
mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros
PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
ago with promise that one day it will be possible to implement page
cache with bigger chunks than PAGE_SIZE.
This promise never materialized. And unlikely will.
We have many places where PAGE_CACHE_SIZE assumed to be equal to
PAGE_SIZE. And it's constant source of confusion on whether
PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
especially on the border between fs and mm.
Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much
breakage to be doable.
Let's stop pretending that pages in page cache are special. They are
not.
The changes are pretty straight-forward:
- <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
- <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
- PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};
- page_cache_get() -> get_page();
- page_cache_release() -> put_page();
This patch contains automated changes generated with coccinelle using
script below. For some reason, coccinelle doesn't patch header files.
I've called spatch for them manually.
The only adjustment after coccinelle is revert of changes to
PAGE_CAHCE_ALIGN definition: we are going to drop it later.
There are few places in the code where coccinelle didn't reach. I'll
fix them manually in a separate patch. Comments and documentation also
will be addressed with the separate patch.
virtual patch
@@
expression E;
@@
- E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E
@@
expression E;
@@
- E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E
@@
@@
- PAGE_CACHE_SHIFT
+ PAGE_SHIFT
@@
@@
- PAGE_CACHE_SIZE
+ PAGE_SIZE
@@
@@
- PAGE_CACHE_MASK
+ PAGE_MASK
@@
expression E;
@@
- PAGE_CACHE_ALIGN(E)
+ PAGE_ALIGN(E)
@@
expression E;
@@
- page_cache_get(E)
+ get_page(E)
@@
expression E;
@@
- page_cache_release(E)
+ put_page(E)
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Doing a lot of work in the interrupt handler introduces huge
latencies to the system as a whole.
Most dramatic effect can be seen by running an all engine
stress test like igt/gem_exec_nop/all where, when the kernel
config is lean enough, the whole system can be brought into
multi-second periods of complete non-interactivty. That can
look for example like this:
NMI watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [kworker/u8:3:143]
Modules linked in: [redacted for brevity]
CPU: 0 PID: 143 Comm: kworker/u8:3 Tainted: G U L 4.5.0-160321+ #183
Hardware name: Intel Corporation Broadwell Client platform/WhiteTip Mountain 1
Workqueue: i915 gen6_pm_rps_work [i915]
task: ffff8800aae88000 ti: ffff8800aae90000 task.ti: ffff8800aae90000
RIP: 0010:[<ffffffff8104a3c2>] [<ffffffff8104a3c2>] __do_softirq+0x72/0x1d0
RSP: 0000:ffff88014f403f38 EFLAGS: 00000206
RAX: ffff8800aae94000 RBX: 0000000000000000 RCX: 00000000000006e0
RDX: 0000000000000020 RSI: 0000000004208060 RDI: 0000000000215d80
RBP: ffff88014f403f80 R08: 0000000b1b42c180 R09: 0000000000000022
R10: 0000000000000004 R11: 00000000ffffffff R12: 000000000000a030
R13: 0000000000000082 R14: ffff8800aa4d0080 R15: 0000000000000082
FS: 0000000000000000(0000) GS:ffff88014f400000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fa53b90c000 CR3: 0000000001a0a000 CR4: 00000000001406f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Stack:
042080601b33869f ffff8800aae94000 00000000fffc2678 ffff88010000000a
0000000000000000 000000000000a030 0000000000005302 ffff8800aa4d0080
0000000000000206 ffff88014f403f90 ffffffff8104a716 ffff88014f403fa8
Call Trace:
<IRQ>
[<ffffffff8104a716>] irq_exit+0x86/0x90
[<ffffffff81031e7d>] smp_apic_timer_interrupt+0x3d/0x50
[<ffffffff814f3eac>] apic_timer_interrupt+0x7c/0x90
<EOI>
[<ffffffffa01c5b40>] ? gen8_write64+0x1a0/0x1a0 [i915]
[<ffffffff814f2b39>] ? _raw_spin_unlock_irqrestore+0x9/0x20
[<ffffffffa01c5c44>] gen8_write32+0x104/0x1a0 [i915]
[<ffffffff8132c6a2>] ? n_tty_receive_buf_common+0x372/0xae0
[<ffffffffa017cc9e>] gen6_set_rps_thresholds+0x1be/0x330 [i915]
[<ffffffffa017eaf0>] gen6_set_rps+0x70/0x200 [i915]
[<ffffffffa0185375>] intel_set_rps+0x25/0x30 [i915]
[<ffffffffa01768fd>] gen6_pm_rps_work+0x10d/0x2e0 [i915]
[<ffffffff81063852>] ? finish_task_switch+0x72/0x1c0
[<ffffffff8105ab29>] process_one_work+0x139/0x350
[<ffffffff8105b186>] worker_thread+0x126/0x490
[<ffffffff8105b060>] ? rescuer_thread+0x320/0x320
[<ffffffff8105fa64>] kthread+0xc4/0xe0
[<ffffffff8105f9a0>] ? kthread_create_on_node+0x170/0x170
[<ffffffff814f351f>] ret_from_fork+0x3f/0x70
[<ffffffff8105f9a0>] ? kthread_create_on_node+0x170/0x170
I could not explain, or find a code path, which would explain
a +20 second lockup, but from some instrumentation it was
apparent the interrupts off proportion of time was between
10-25% under heavy load which is quite bad.
When a interrupt "cliff" is reached, which was >~320k irq/s on
my machine, the whole system goes into a terrible state of the
above described multi-second lockups.
By moving the GT interrupt handling to a tasklet in a most
simple way, the problem above disappears completely.
Testing the effect on sytem-wide latencies using
igt/gem_syslatency shows the following before this patch:
gem_syslatency: cycles=1532739, latency mean=416531.829us max=2499237us
gem_syslatency: cycles=1839434, latency mean=1458099.157us max=4998944us
gem_syslatency: cycles=1432570, latency mean=2688.451us max=1201185us
gem_syslatency: cycles=1533543, latency mean=416520.499us max=2498886us
This shows that the unrelated process is experiencing huge
delays in its wake-up latency. After the patch the results
look like this:
gem_syslatency: cycles=808907, latency mean=53.133us max=1640us
gem_syslatency: cycles=862154, latency mean=62.778us max=2117us
gem_syslatency: cycles=856039, latency mean=58.079us max=2123us
gem_syslatency: cycles=841683, latency mean=56.914us max=1667us
Showing a huge improvement in the unrelated process wake-up
latency. It also shows an approximate halving in the number
of total empty batches submitted during the test. This may
not be worrying since the test puts the driver under
a very unrealistic load with ncpu threads doing empty batch
submission to all GPU engines each.
Another benefit compared to the hard-irq handling is that now
work on all engines can be dispatched in parallel since we can
have up to number of CPUs active tasklets. (While previously
a single hard-irq would serially dispatch on one engine after
another.)
More interesting scenario with regards to throughput is
"gem_latency -n 100" which shows 25% better throughput and
CPU usage, and 14% better dispatch latencies.
I did not find any gains or regressions with Synmark2 or
GLbench under light testing. More benchmarking is certainly
required.
v2:
* execlists_lock should be taken as spin_lock_bh when
queuing work from userspace now. (Chris Wilson)
* uncore.lock must be taken with spin_lock_irq when
submitting requests since that now runs from either
softirq or process context.
v3:
* Expanded commit message with more testing data;
* converted missed locking sites to _bh;
* added execlist_lock comment. (Chris Wilson)
v4:
* Mention dispatch parallelism in commit. (Chris Wilson)
* Do not hold uncore.lock over MMIO reads since the block
is already serialised per-engine via the tasklet itself.
(Chris Wilson)
* intel_lrc_irq_handler should be static. (Chris Wilson)
* Cancel/sync the tasklet on GPU reset. (Chris Wilson)
* Document and WARN that tasklet cannot be active/pending
on engine cleanup. (Chris Wilson/Imre Deak)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Imre Deak <imre.deak@intel.com>
Testcase: igt/gem_exec_nop/all
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94350
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1459768316-6670-1-git-send-email-tvrtko.ursulin@linux.intel.com
We accidentally return PTR_ERR(NULL) which is success instead of a
negative error code.
Fixes: 879e40bea6f2 ('drm: ARM HDLCD - get rid of devm_clk_put()')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Liviu Dudau <Liviu.Dudau@arm.com>
Clock is acquired with devm_clk_get() which already manages
corresponding resource.
I.e. in case of driver removal or failure on installaiton
clock resources will be automatically released and explicit
call of devm_clk_put() is not required.
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: David Airlie <airlied@linux.ie>
Cc: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Alexey Brodkin <abrodkin@synopsys.com>
Signed-off-by: Liviu Dudau <Liviu.Dudau@arm.com>