SKL+ needs >4K alignment for tiled surfaces, so make
intel_compute_page_offset() handle it.
The way we do it is first we compute the closest tile boundary
as before, and then figure out how many tiles we need to go
to reach the desired alignment. The difference in the offset
is then added into the x/y offsets.
v2: Be less confusing wrt. units (pixels vs. bytes) (Daniel)
v3: Use u32 for offsets
Have intel_adjust_tile_offset() return the new offset (will be
useful later)
Add an offset_aligned variable (Daniel)
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1455569699-27905-5-git-send-email-ville.syrjala@linux.intel.com
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
The page aligned surface address calculation needs to know which way
things are rotated. The contract now says that the caller must pass the
rotate x/y coordinates, as well as the tile_height aligned stride in
the tile_height direction. This will make it fairly simple to deal with
90/270 degree rotation on SKL+ where we have to deal with the rotated
view into the GTT.
v2: Pass rotation instead of bool even thoughwe only care about 0/180 vs. 90/270
v3: Introduce intel_tile_dims(), and don't mix up different units so much
v4: Unconfuse bytes vs. pixels even more
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1455569699-27905-4-git-send-email-ville.syrjala@linux.intel.com
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Assorted changes in the areas of code cleanup, reduction of
invariant conditional in the interrupt handler and lock
contention and MMIO access optimisation.
* Remove needless initialization.
* Improve cache locality by reorganizing code and/or using
branch hints to keep unexpected or error conditions out
of line.
* Favor busy submit path vs. empty queue.
* Less branching in hot-paths.
v2:
* Avoid mmio reads when possible. (Chris Wilson)
* Use natural integer size for csb indices.
* Remove useless return value from execlists_update_context.
* Extract 32-bit ppgtt PDPs update so it is out of line and
shared with two callers.
* Grab forcewake across all mmio operations to ease the
load on uncore lock and use chepear mmio ops.
v3:
* Removed some more pointless u8 data types.
* Removed unused return from execlists_context_queue.
* Commit message updates.
v4:
* Unclumsify the unqueue if statement. (Chris Wilson)
* Hide forcewake from the queuing function. (Chris Wilson)
Version 3 now makes the irq handling code path ~20% smaller on
48-bit PPGTT hardware, and a little bit less elsewhere. Hot
paths are mostly in-line now and hammering on the uncore
spinlock is greatly reduced together with mmio traffic to an
extent.
Benchmarking with "gem_latency -n 100" (keep submitting
batches with 100 nop instruction) shows approximately 4% higher
throughput, 2% less CPU time and 22% smaller latencies. This was
on a big-core while small-cores could benefit even more.
Most likely reason for the improvements are the MMIO
optimization and uncore lock traffic reduction.
One odd result is with "gem_latency -n 0" (dispatching empty
batches) which shows 5% more throughput, 8% less CPU time,
25% better producer and consumer latencies, but 15% higher
dispatch latency which is yet unexplained.
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: http://patchwork.freedesktop.org/patch/msgid/1456505912-22286-1-git-send-email-tvrtko.ursulin@linux.intel.com
The vblank IRQ is only needed to trigger page flip work, so we
might as well disable it when there is no work to do.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
If the FB is backed by a GEM object with an dma-buf attached
we need to wait for any pending fences to signal before executing
the page flip.
The implementation is straight forward by deferring the flip to
a workqueue in that case.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
The DRM core only references the currently queued/active framebuffer.
So there is a period of time where the flip is not completed, but
the GEM object backing the FB is already unreferenced and could be
destroyed if userspace closes its handle.
Make sure to keep a reference to the GEM object until the flip is
actually executed clean things up in a worker running behind the
flip execution.
Also move the page flip event into the context of this worker, so
it gets cleaned up automatically.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Start tracking the flip state explicitly, as opposed to inferring
it from the presence if a new FB. This is a preparatory step to
introduce an new immediate state, where we can wait for a fence to
signal.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
The core already does the correct replacemet if the driver
page flip function returns without an error, so there is no
need to do it here.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
This patch changes the dev_info() call to dev_dbg() in ipu_plane_update()
to print out the information that a plane's CRTC is changed, because this
kind of information is only useful for debugging.
Signed-off-by: Liu Ying <Ying.Liu@freescale.com>
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
- lots and lots of fbc work from Paulo
- max pixel clock checks from Mika Kahola
- prep work for nv12 offset handling from Ville
- piles of small fixes and refactorings all around
* tag 'drm-intel-next-2016-02-14' of git://anongit.freedesktop.org/drm-intel: (113 commits)
drm/i915: Update DRIVER_DATE to 20160214
drm/i915: edp resume/On time optimization.
agp/intel-gtt: Only register fake agp driver for gen1
drm/i915: TV pixel clock check
drm/i915: CRT pixel clock check
drm/i915: SDVO pixel clock check
drm/i915: DisplayPort-MST pixel clock check
drm/i915: HDMI pixel clock check
drm/i915: DisplayPort pixel clock check
drm/i915: check that rpm ref is held when accessing ringbuf in stolen mem
drm/i915: fix error path in intel_setup_gmbus()
drm/i915: Stop depending upon CONFIG_AGP_INTEL
agp/intel-gtt: Don't leak the scratch page
drm/i915: Capture PCI revision and subsytem details in error state
drm/i915: fix context/engine cleanup order
drm/i915: Handle PipeC fused off on IVB/HSW/BDW
drm/i915/skl: Fix typo in DPLL_CFGCR1 definition
drm/i915: Skip DDI PLL selection for DSI
drm/i915/skl: Explicitly check for eDP in skl_ddi_pll_select()
drm/i915/skl: Don't skip mst encoders in skl_ddi_pll_select()
...
The owner must be per ring as long as we don't
support sharing VMIDs per process. Also move the
assigned VMID and page directory address into the
IB structure.
v3: assign the VMID to all IBs, not just the first one.
v4: use correct pointer for owner
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Chunming Zhou <david1.zhou@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
In addition to calculating final watermarks, let's also pre-calculate a
set of intermediate watermark values at atomic check time. These
intermediate watermarks are a combination of the watermarks for the old
state and the new state; they should satisfy the requirements of both
states which means they can be programmed immediately when we commit the
atomic state (without waiting for a vblank). Once the vblank does
happen, we can then re-program watermarks to the more optimal final
value.
v2: Significant rebasing/rewriting.
v3:
- Move 'need_postvbl_update' flag to CRTC state (Daniel)
- Don't forget to check intermediate watermark values for validity
(Maarten)
- Don't due async watermark optimization; just do it at the end of the
atomic transaction, after waiting for vblanks. We do want it to be
async eventually, but adding that now will cause more trouble for
Maarten's in-progress work. (Maarten)
- Don't allocate space in crtc_state for intermediate watermarks on
platforms that don't need it (gen9+).
- Move WaCxSRDisabledForSpriteScaling:ivb into intel_begin_crtc_commit
now that ilk_update_wm is gone.
v4:
- Add a wm_mutex to cover updates to intel_crtc->active and the
need_postvbl_update flag. Since we don't have async yet it isn't
terribly important yet, but might as well add it now.
- Change interface to program watermarks. Platforms will now expose
.initial_watermarks() and .optimize_watermarks() functions to do
watermark programming. These should lock wm_mutex, copy the
appropriate state values into intel_crtc->active, and then call
the internal program watermarks function.
v5:
- Skip intermediate watermark calculation/check during initial hardware
readout since we don't trust the existing HW values (and don't have
valid values of our own yet).
- Don't try to call .optimize_watermarks() on platforms that don't have
atomic watermarks yet. (Maarten)
v6:
- Rebase
v7:
- Further rebase
v8:
- A few minor indentation and line length fixes
v9:
- Yet another rebase since Maarten's patches reworked a bunch of the
code (wm_pre, wm_post, etc.) that this was previously based on.
v10:
- Move wm_mutex to dev_priv to protect against racing commits against
disjoint CRTC sets. (Maarten)
- Drop unnecessary clearing of cstate->wm.need_postvbl_update (Maarten)
v11:
- Now that we've moved to atomic watermark updates, make sure we call
the proper function to program watermarks in
{ironlake,haswell}_crtc_enable(); the failure to do so on the
previous patch iteration led to us not actually programming the
watermarks before turning on the CRTC, which was the cause of the
underruns that the CI system was seeing.
- Fix inverted logic for determining when to optimize watermarks. We
were needlessly optimizing when the intermediate/optimal values were
the same (harmless), but not actually optimizing when they differed
(also harmless, but wasteful from a power/bandwidth perspective).
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1456276813-5689-1-git-send-email-matthew.d.roper@intel.com
Add support for the HDMI PHY/PLL found in MSM8996/APQ8096.
Unlike the previous PHYs supported in the driver, this doesn't need
the powerup/powerdown ops. The PLL prepare/unprepare clock ops
enable/disable the phy itself.
Signed-off-by: Archit Taneja <architt@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
- Create separate domains for 8960 PHY and PLL
- Create separate domains for 8x60 PHY
Signed-off-by: Archit Taneja <architt@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
Remove the old PHY ops managed by hdmi_platform_config and use them as ops
provided by the HDMI PHY driver.
Remove the old HDMI 8960 PLL code that used the top level HDMI TX mmio
base.
NOTE: With this commit, HDMI functionality will break until the HDMI
PHY/PLL register offsets in hdmi.xml.h aren't updated to be used as
separate domains.
Signed-off-by: Archit Taneja <architt@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
Make HDMI core get its PHY by parsing the "phys" phandle. The core will use
this PHY reference to enable/disable PHY. The driver defers probe until PHY
isn't available.
The DT bindings used here is the same as the one used for PHYs using the
common PHY framework bindings.
Signed-off-by: Archit Taneja <architt@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
Add a helper to initialize PLL in the PHY driver. HDMI PLLs are going to
have their own mmio base different from that of PHY.
For the clock code in hdmi_phy_8960.c, some changes were needed for it to
work with the updated register offsets. Create a copy of the updated clock
code in hdmi_pll_8960.c, instead of rewriting it in hdmi_phy_8960.c
itself. This removes the need to place CONFIG_COMMON_CLOCK checks all
around, makes the code more legible, and also removes some old checkpatch
warnings with the original code.
The older hdmi pll clock ops in hdmi_phy_8960.c will be removed later. The
driver will use these until the HDMI PHY/PLL register offsets aren't
considered as separate domains (i.e. their offsets start from 0).
Signed-off-by: Archit Taneja <architt@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
Create a PHY device that represents the TX PHY and PLL parts of the HDMI
block.
This makes management of PHY specific resources (regulators and clocks)
much easier, and makes the PHY and PLL usable independently. It also
simplifies the core HDMI driver, which currently assigns phy ops among
many other things.
The PHY driver implementation done here is very similar to the PHY driver
we already have for DSI.
Keep the old hdmi_phy_funcs ops for now. The driver will use these until
the HDMI PHY/PLL register offsets aren't considered as separate
domains (i.e. their offsets start from 0).
The driver doesn't use the common PHY framework for now. This is because
it's hard to map our ops with the ops provided by the framework. The
bindings used for this is the generic phy bindings. So, this can be
adapted to the PHY framework in the future, if possible.
Signed-off-by: Archit Taneja <architt@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
Some platforms may not have a HPD gpio line to detect Hot Plug signal from
the connector. They need to rely only on reading REG_HDMI_HPD_INT_STATUS
for HPD.
Modify hdmi_connector_detect logic such that it checks for HPD only using
the status register if there is no HPD gpio.
Signed-off-by: Archit Taneja <architt@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
Make gpio allocation and usage iterative by parsing the gpios on a given
platform from a list. This gives us flexibility over what all gpios exist
for a platform, whether they are input or output, and what value they
should be set to.
In particular, this will make HDMI on 8x96 platforms easier to integrate
with the driver, as it doesn't have a HPD gpio input to them. Also, it
cleans things up a bit.
We still use the legacy gpio api here, as we might need to backport this
driver to downstream kernels.
Signed-off-by: Archit Taneja <architt@codeaurora.org>
Signed-off-by: Rob Clark <robdclark@gmail.com>
amdgpu must load only after amdkfd's loading has been completed. If that
is not enforced, then amdgpu's call into amdkfd's functions will cause a
kernel BUG.
When amdgpu and amdkfd are built as kernel modules, that rule is enforced
by the kernel's modules loading mechanism. When amdgpu and amdkfd are
built inside the kernel image, that rule is enforced by ordering in the
drm Makefile (amdkfd before amdgpu).
Instead of using drm Makefile ordering, we can now use deferred loading
as amdkfd now returns -EPROBE_DEFER in kgd2kfd_init() when it is not yet
loaded.
This patch defers amdgpu loading by propagating -EPROBE_DEFER to the
kernel's drivers loading infrastructure. That will put amdgpu into the
pending drivers list (see description in dd.c). Once amdkfd is loaded,
a call to kgd2kfd_init() will return successfully and amdgpu will be able
to load.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
radeon must load only after amdkfd's loading has been completed. If that
is not enforced, then radeon's call into amdkfd's functions will cause a
kernel BUG.
When radeon and amdkfd are built as kernel modules, that rule is
enforced by the kernel's modules loading mechanism. When radeon and
amdkfd are built inside the kernel image, that rule is enforced by
ordering in the drm Makefile (amdkfd before radeon).
Instead of using drm Makefile ordering, we can now use deferred
loading as amdkfd now returns -EPROBE_DEFER in kgd2kfd_init() when it is
not yet loaded.
This patch defers radeon loading by propagating -EPROBE_DEFER to the
kernel's drivers loading infrastructure. That will put radeon into the
pending drivers list (see description in dd.c). Once amdkfd is loaded,
a call to kgd2kfd_init() will return successfully and radeon will be
able to load.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Current dependencies between amdkfd and radeon/amdgpu force the loading
of amdkfd _before_ radeon and/or amdgpu are loaded. When all these kernel
drivers are built as modules, this ordering is enforced by the kernel
built-in mechanism of loading dependent modules.
However, there is no such mechanism in case where all these drivers are
compiled inside the kernel image (not as modules). The current way to
enforce loading of amdkfd before radeon/amdgpu, is to put amdkfd before
radeon/amdgpu in the drm Makefile, but that method is way too fragile.
In addition, there is no kernel mechanism to check whether a kernel
driver that is built inside the kernel image, has already been loaded.
To solve this, this patch adds to kfd_module.c a new static variable,
amdkfd_init_completed, that is set to 1 only when amdkfd's
module initialization function has been completed (successfully).
kgd2kfd_init(), which is the initialization function of the
kgd-->kfd interface, and which is the first function in amdkfd called by
radeon/amdgpu, will return successfully only if amdkfd_init_completed is
equal 1.
If amdkfd_init_completed is not equal to 1, kgd2kfd_init() will
return -EPROBE_DEFER to signal radeon/amdgpu they need to defer
their loading until amdkfd is loaded.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
We weren't updating the interlaced bit, so we'd scan out incorrectly
if the firmware had brought up the TV encoder and we were switching to
HDMI.
Signed-off-by: Eric Anholt <eric@anholt.net>
It looks like when I went to add the interlaced bits, I just took the
existing PV_VERT* block and indented it, instead of copy and pasting
it first. Without this, changing resolution never worked.
Signed-off-by: Eric Anholt <eric@anholt.net>
If the firmware hadn't brought up HDMI for us, we need to do its
power-on reset sequence (reset HD and and clear its STANDBY bits,
reset HDMI, and leave the PHY disabled).
Signed-off-by: Eric Anholt <eric@anholt.net>
We'd need X to queue up an async pageflip while another is
outstanding, and then take a SIGIO. I think X actually avoids sending
out the next pageflip while one's already queued, but I'm not sure.
Signed-off-by: Eric Anholt <eric@anholt.net>
It's simpler to just use snprintf() to print this to one buffer instead
of using strcpy() and strcat(). Also using snprintf() is slightly safer
than using sprintf().
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Elsewhere we have adopted the convention of using '_link' to denote
elements in the list (and '_list' for the actual list_head itself), and
that the name should indicate which list the link belongs to (and
preferrably not just where the link is being stored).
s/vma_link/obj_link/ (we iterate over obj->vma_list)
s/mm_list/vm_link/ (we iterate over vm->[in]active_list)
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>