Since old kernel versions wouldn't expose the IN_FORMATS_ASYNC blob,
userspace can't really use the absence of the blob to determine
that async flips aren't supported. Thus it seems better to always
expose the blob on all planes, whether they support async flips
or not. The blob will simply not indicate any format+modifier
combinations as supported on planes that aren't async flip capable.
Currently we expose the blob for all skl+ universal planes (even
though we implement async flips only for the first plane on each
pipe), and i9xx primary planes (for ilk+ we have async flips support,
for pre-ilk we do not). Complete the full set by also expsosing
the blob on pre-skl sprite planes, and cursors.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patch.msgid.link/20251112233030.24117-3-ville.syrjala@linux.intel.com
Reviewed-by: Jouni Högander <jouni.hogander@intel.com>
Pass the format info into plane->max_stride() from the
caller instead of doing yet another drm_format_info()
lookup on the spot.
drm_format_info() is both rather expensive, and technically
incorrect since it doesn't return the correct format info
for compressed formats (though that doesn't actually matter
for the current .max_stride() implementations since they
are just interested in the cpp value).
Most callers already have the format info available. The
only exception is intel_dumb_fb_max_stride() where we shall
use the actually correct drm_get_format_info() variant.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patch.msgid.link/20251107181126.5743-3-ville.syrjala@linux.intel.com
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
The remaining utils display needs from i915_utils.h are primarily
MISSING_CASE() and fetch_and_zero(), with a couple of
i915_inject_probe_failure() uses.
To avoid excessive churn, add duplicates of MISSING_CASE() and
fetch_and_zero() to intel_display_utils.h, and switch display to use the
display utils.
As long as there are display files that include i915_drv.h, which
includes i915_utils.h, we'll need #ifndef guards for MISSING_CASE() and
fetch_and_zero() in both utils headers. We can remove them once display
no longer depends on i915_drv.h.
A couple of files in display still need i915_utils.h for
i915_inject_probe_failure(). Annotate this. They will be handled
separately.
Reviewed-by: Luca Coelho <luciano.coelho@intel.com>
Link: https://patch.msgid.link/79f9e31ca64c8c045834d48e20ceb0c515d1e9e1.1761146196.git.jani.nikula@intel.com
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Currently we pre-compute the plane surface/base address
partially (only for cursor_needs_physical cases) in
intel_plane_pin_fb() and finish the calculation in the
plane->update_arm(). Let's just precompute the whole thing
instead.
One benefit is that we get rid of all the vma offset stuff
from the low level plane code. Another use I have in mind
is including the surface address in the plane tracepoints,
which should make it easier to analyze display faults.
v2: Deal with xe reuse_vma() hacks
v3: use intel_plane_ggtt_offset() still in reuse_vma()
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250717203216.31258-1-ville.syrjala@linux.intel.com
Bspec lists different VT-d guard numbers (the number of dummy
padding PTEs) for different platforms and plane types. Use those
instead of just assuming the max glk+ number for everything.
This could avoid a bit of overhead on older platforms due to
reduced padding, and it makes it easier to cross check with the
spec.
Note that VLV/CHV do not document this w/a at all, so not sure
if it's actually needed or not. Nor do we actually know how much
padding is required if it is needed. For now use the same 128
PTEs that we use for snb-bdw primary planes.
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20250122151755.6928-5-ville.syrjala@linux.intel.com
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
We need to be able to do both MMIO and DSB based pipe/plane
programming. To that end plumb the 'dsb' all way from the top
into the plane commit hooks.
The compiler appears smart enough to combine the branches from
all the back-to-back register writes into a single branch.
So the generated asm ends up looking more or less like this:
plane_hook()
{
if (dsb) {
intel_dsb_reg_write();
intel_dsb_reg_write();
...
} else {
intel_de_write_fw();
intel_de_write_fw();
...
}
}
which seems like a reasonably efficient way to do this.
An alternative I was also considering is some kind of closure
(register write function + display vs. dsb pointer passed to it).
That does result is smaller code as there are no branches anymore,
but having each register access go via function pointer sounds
less efficient.
Not that I actually measured the overhead of either approach yet.
Also the reg_rw tracepoint seems to be making a huge mess of the
generated code for the mmio path. And additionally there's some
kind of IS_GSI_REG() hack in __raw_uncore_read() which ends up
generating a pointless branch for every mmio register access.
So looks like there might be quite a bit of room for improvement
in the mmio path still.
Reviewed-by: Animesh Manna <animesh.manna@intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240930170415.23841-12-ville.syrjala@linux.intel.com
Different hardware generations have different scanout alignment
requirements. Introduce a new vfunc that will allow us to
make that distinction without horrible if-ladders.
For now we directly plug in the existing intel_surf_alignment()
and intel_cursor_alignment() functions.
For fbdev we (temporarily) introduce intel_fbdev_min_alignment()
that simply queries the alignment from the primary plane of
the first crtc.
TODO: someone will need to fix xe's alignment handling
Reviewed-by: Imre Deak <imre.deak@intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240612204712.31404-4-ville.syrjala@linux.intel.com
Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Split out frontbuffer related declarations and static inlines from
gem/i915_gem_object.h into new gem/i915_gem_object_frontbuffer.h.
The main goal is to reduce header interdependencies. With
gem/i915_gem_object.h including display/intel_frontbuffer.h,
modification of the latter causes a whopping 300+ objects to be rebuilt,
while many of the source files actually needing it aren't explicitly
including it at all.
After the change, only 21 objects depend on display/intel_frontbuffer.h,
directly or indirectly.
Cc: Jouni Högander <jouni.hogander@intel.com>
Reviewed-by: Jouni Högander <jouni.hogander@intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230830085127.2416842-1-jani.nikula@intel.com
Turns out many of the files that need i915_reg.h get it implicitly via
{display/intel_de.h, gt/intel_context.h} -> i915_trace.h -> i915_irq.h
-> i915_reg.h. Since i915_trace.h doesn't actually need i915_irq.h,
makes sense to drop it, but that requires adding quite a few new
includes all over the place.
Prefer including i915_reg.h where needed instead of adding another
implicit include, because eventually we'll want to split up i915_reg.h
and only include the specific registers at each place.
Also some places actually needed i915_irq.h too.
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/6e78a2e0ac1bffaf5af3b5ccc21dff05e6518cef.1668008071.git.jani.nikula@intel.com
Add display/intel_display_trace.[ch] for defining display
tracepoints. The main goal is to reduce cross-includes between gem and
display. It would be possible split up tracing even further, but that
would lead to more boilerplate.
We end up having to include intel_crtc.h in a few places because it was
pulled in implicitly via intel_de.h -> i915_trace.h -> intel_crtc.h, and
that's no longer the case.
There should be no changes to tracepoints.
v3:
- Rebase
v2:
- Define TRACE_INCLUDE_PATH relative to define_trace.h (Chris)
- Remove useless comments (Ville)
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/7862ad764fbd0748d903c76bc632d3d277874e5b.1638961423.git.jani.nikula@intel.com
Chop vlv_sprite_update() into two halves. Fist half becomes
the _noarm() variant, second part the _arm() variant.
Fortunately I have already previously grouped the register
writes into roughtly the correct order, so the split looks
surprisingly clean.
Looks like most of the hardware logic was copied from the
pre-ctg sprite C, so SPSTRIDE/POS/SIZE are armed by SPSURF,
while the rest are self arming. SPCONSTALPHA is the one
entirely new register that didn't exist in the old sprite C,
and looks like that one is self arming. The CHV pipe B CSC
is also self arming, like the rest of the CHV pipe B
additions.
I didn't have time to capture i915_update_info numbers for
these, but since all the other platforms generally showed
improvements, and crucially no regression, I am fairly
confident this should behave similarly.
Cc: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211018115030.3547-10-ville.syrjala@linux.intel.com
Reviewed-by: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com>
Chop g4x_sprite_update() into two halves. Fist half becomes
the _noarm() variant, second part the _arm() variant.
Fortunately I have already previously grouped the register
writes into roughtly the correct order, so the split looks
surprisingly clean.
Not much of a change in i915_update_info on these older
platforms that don't have so many planes or registers to
begin with. Here are the numbers from snb (totally unpatched
vs. both primary plane and sprite patched applied) running
kms_atomic_transition --r plane-all-transition --extended:
w/o patch w/ patch
Updates: 5404 Updates: 5405
| |
1us |****** 1us |******
|********* |*********
4us |*********** 4us |***********
|********** |**********
16us |** 16us |**
| |
66us | 66us |
| |
262us | 262us |
| |
1ms | 1ms |
| |
4ms | 4ms |
| |
17ms | 17ms |
| |
Min update: 1400ns Min update: 1307ns
Max update: 19809ns Max update: 20194ns
Average update: 6957ns Average update: 6432ns
Overruns > 100us: 0 Overruns > 100us: 0
But there seems to be a slight improvement with
lockdep enabled:
w/o patch w/ patch
Updates: 17612 Updates: 16364
| |
1us | 1us |
|****** |******
4us |********** 4us |**********
|************ |*************
16us |************* 16us |************
|*** |*
66us | 66us |
| |
262us | 262us |
| |
1ms | 1ms |
| |
4ms | 4ms |
| |
17ms | 17ms |
| |
Min update: 3141ns Min update: 3562ns
Max update: 126450ns Max update: 73354ns
Average update: 16373ns Average update: 15153ns
Overruns > 250us: 0 Overruns > 250us: 0
Cc: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211018115030.3547-8-ville.syrjala@linux.intel.com
Reviewed-by: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com>
The amount of plane registers we have to write has been steadily
increasing, putting more pressure on the vblank evasion mechanism
and forcing us to increase its time budget. Let's try to take some
of the pressure off by splitting plane updates into two parts:
1) write all non-self arming plane registers, ie. the registers
where the write actually does nothing until a separate arming
register is also written which will cause the hardware to latch
the new register values at the next start of vblank
2) write all self arming plane registers, ie. registers which always
just latch at the next start of vblank, and registers which also
arm other registers to do so
Here we just provide the mechanism, but don't actually implement
the split on any platform yet. so everything stays now in the _arm()
hooks. Subsequently we can move a whole bunch of stuff into the
_noarm() part, especially in more modern platforms where the number
of registers we have to write is also the greatest. On older
platforms this is less beneficial probably, but no real reason
to deviate from a common behaviour.
And let's sprinkle some TODOs around the areas that will need
adapting.
Cc: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211018115030.3547-5-ville.syrjala@linux.intel.com
Reviewed-by: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com>
The next patch needs to distinguish between a view's mapping and scanout
stride. Rename the current stride parameter to mapping_stride with the
script below. mapping_stride will keep the same meaning as stride had
on all platforms so far, while the meaning of it will change on ADLP.
No functional changes.
@@
identifier intel_fb_view;
identifier i915_color_plane_view;
identifier color_plane;
expression e;
type T;
@@
struct intel_fb_view {
...
struct i915_color_plane_view {
...
- T stride;
+ T mapping_stride;
...
} color_plane[e];
...
};
@@
struct i915_color_plane_view pv;
@@
pv.
- stride
+ mapping_stride
@@
struct i915_color_plane_view *pvp;
@@
pvp->
- stride
+ mapping_stride
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211026225105.2783797-6-imre.deak@intel.com
By using the modifier plane capability flags to encode the modifiers'
CCS type and tiling attributes, it becomes simpler to the check for
any of these capabilities when providing the list of supported
modifiers.
This also allows distinguishing modifiers on future platforms where
platforms with the same display version support different modifiers. An
example is DG2 and ADLP, both being D13, where DG2 supports only F and X
tiling, while ADLP supports only Y and X tiling. With the
INTEL_PLANE_CAP_TILING_* flags added in this patch we can provide
the correct modifiers for each platform.
v2:
- Define PLANE_HAS_* with macros instead of an enum. (Jani)
- Rename PLANE_HAS_*_ANY to PLANE_HAS_*_MASK. (Jani)
- Rename PLANE_HAS_* to INTEL_PLANE_CAP_*.
- Set the CCS_RC_CC cap only for DISPLAY_VER >= 12.
- Set the TILING_Y cap only for DISPLAY_VER < 13 || ADLP.
- Simplify the SKL plane cap display version checks and move them
to a separate function.
Cc: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20211027125150.2891371-1-imre.deak@intel.com
Add a table describing all the framebuffer modifiers used by i915 at one
place. This has the benefit of deduplicating the listing of supported
modifiers for each platform and checking the support of these modifiers
on a given plane. This also simplifies in a similar way getting some
attribute for a modifier, for instance checking if the modifier is a
CCS modifier type.
While at it drop the cursor plane filtering from skl_plane_has_rc_ccs(),
as the cursor plane is registered with DRM core elsewhere.
v1: Unchanged.
v2:
- Keep the plane caps calculation in the plane code and pass an enum
with these caps to intel_fb_get_modifiers(). (Ville)
- Get the modifiers calling intel_fb_get_modifiers() in i9xx_plane.c as
well.
v3:
- s/.id/.modifier/ (Ville)
- Keep modifier_desc vs. plane_cap filter conditions consistent. (Ville)
- Drop redundant cursor plane check from skl_plane_has_rc_ccs(). (Ville)
- Use from, until display version fields in modifier_desc instead of a mask. (Jani)
- Unexport struct intel_modifier_desc, separate its decl and init. (Jani)
- Remove enum pipe, plane_id forward decls from intel_fb.h, which are
not needed after v2.
v4:
- Reuse IS_DISPLAY_VER() instead of open-coding it. (Jani)
- Preserve the current modifier order exposed to user space. (Ville)
v5: Use }, { on one line to seperate the descriptor array elements. (Jani)
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Imre Deak <imre.deak@intel.com>
Reviewed-by: Juha-Pekka Heikkila <juhapekka.heikkila@gmail.com> (v3)
Link: https://patchwork.freedesktop.org/patch/msgid/20211020195138.1841242-2-imre.deak@intel.com