Active busyness of an engine is calculated using gt timestamp and the
context switch in time. While capturing the gt timestamp, it's possible
that the context switches out. This race could result in an active
busyness value that is greater than the actual context runtime value by a
small amount. This leads to a negative delta and throws off busyness
calculations for the user.
If a subsequent count is smaller than the previous one, just return the
previous one, since we expect the busyness to catch up.
Fixes: 77cdd054dd ("drm/i915/pmu: Connect engine busyness stats from GuC to pmu")
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241127174006.190128-3-umesh.nerlige.ramappa@intel.com
(cherry picked from commit cf907f6d29)
Signed-off-by: Tvrtko Ursulin <tursulin@ursulin.net>
The __to_intel_display() generics require the definition of struct
drm_i915_private i.e. inclusion of i915_drv.h. Add
intel_display_conversion.c with a __i915_to_display() function to do the
conversion without the intel_display_conversion.h having an implicit
dependency on i915_drv.h.
The long term goal is to remove __to_intel_display() and the
intel_display_conversion.[ch] files altoghether, and this is merely a
transitional step to make the dependencies on i915_drv.h explicit.
Reviewed-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/39e99b765b8c1a05d001659c39686a661ac268e2.1732104170.git.jani.nikula@intel.com
The long term goal is to remove the __to_intel_display() generics from
display macros, such as register macros. This requires that all such
macro usage passes struct intel_display * rather than struct
drm_i915_private * to the macros.
The short term goal is to hide the struct drm_i915_private access in
intel_display_conversions.h into a function. This is problematic with
gvt, because it's a separate module, and the conversion function would
need to be exported.
Make the conversion to always passing struct intel_display * in gvt to
unblock both of the above.
Reviewed-by: Luca Coelho <luciano.coelho@intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/266616e14db8d9a342fd93ec9752f561149a799b.1732104170.git.jani.nikula@intel.com
Active busyness of an engine is calculated using gt timestamp and the
context switch in time. While capturing the gt timestamp, it's possible
that the context switches out. This race could result in an active
busyness value that is greater than the actual context runtime value by a
small amount. This leads to a negative delta and throws off busyness
calculations for the user.
If a subsequent count is smaller than the previous one, just return the
previous one, since we expect the busyness to catch up.
Fixes: 77cdd054dd ("drm/i915/pmu: Connect engine busyness stats from GuC to pmu")
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241127174006.190128-3-umesh.nerlige.ramappa@intel.com
Xe3 is capable of switching automatically to min ddb allocation
(not using any extra blocks) or interim SAGV-adjusted allocation
in case if async flip is used. Introduce the minimum and interim
ddb allocation configuration for that purpose. Also i915 is
replaced with intel_display within the patch's context
v2: update min/interim ddb declarations and handling (Ville)
update to register definitions styling
consolidation of minimal wm0 check with min DDB check
Bspec: 69880, 72053
Signed-off-by: Stanislav Lisovskiy <stanislav.lisovskiy@intel.com>
Signed-off-by: Vinod Govindapillai <vinod.govindapillai@intel.com>
Reviewed-by: Uma Shankar <uma.shankar@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241121112726.510220-4-vinod.govindapillai@intel.com
Future platforms can have new additions in the plane_wm
registers. So update skl_wm_level_from_reg_val() to have
possiblity for such platform differentiations. This is in
preparation for the rest of the patches in this series where
hw support for the minimum and interim ddb allocations for
async flip is added. Replace all the i915 uses to intel_display
in this function while updating this function
Signed-off-by: Vinod Govindapillai <vinod.govindapillai@intel.com>
Reviewed-by: Uma Shankar <uma.shankar@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241121112726.510220-2-vinod.govindapillai@intel.com
G8 power state entry is disabled due to a limitation on DG2, so we
enable it from driver with Wa_14022698537. For now we enable it for
all DG2 devices with the exception of a few, for which, we enable
only when paired with whitelisted CPU models. This works with native
ASPM and reduces idle power consumption.
$ echo powersave > /sys/module/pcie_aspm/parameters/policy
$ lspci -s 0000:03:00.0 -vvv
LnkCtl: ASPM L1 Enabled; RCB 64 bytes, Disabled- CommClk-
v2: Fix Wa_ID and include it in subject (Badal)
Rephrase commit message (Jani)
v3: Move workaround to i915_pcode_init() (Badal, Anshuman)
Re-order macro (Riana)
v4: Spell fix (Riana)
Signed-off-by: Raag Jadav <raag.jadav@intel.com>
Acked-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Riana Tauro <riana.tauro@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241211115952.1659287-5-raag.jadav@intel.com
The v6.13-rc2 release included a bunch of breaking changes,
specifically the MODULE_IMPORT_NS commit.
Backmerge in order to fix them before the next pull-request.
Include the fix from Stephen Roswell.
Caused by commit
25c3fd1183 ("drm/virtio: Add a helper to map and note the dma addrs and lengths")
Interacting with commit
cdd30ebb1b ("module: Convert symbol namespace to string literal")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Link: https://patchwork.freedesktop.org/patch/msgid/20241209121717.2abe8026@canb.auug.org.au
Signed-off-by: Maarten Lankhorst <dev@lankhorst.se>
Catch up with -rc2 and fixing namespace conflict issue caused by
commit cdd30ebb1b ("module: Convert symbol namespace to string literal")
and commit 0c45e76fcc ("drm/xe/vsec: Support BMG devices")
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
DSB LUT register writes vs. palette anti-collision logic
appear to interact in interesting ways:
- posted DSB writes simply vanish into thin air while
anti-collision is active
- non-posted DSB writes actually get blocked by the anti-collision
logic, but unfortunately this ends up hogging the bus for
long enough that unrelated parallel CPU MMIO accesses start
to disappear instead
Even though we are updating the LUT during vblank we aren't
immune to the anti-collision logic because it kicks in briefly
for pipe prefill (initiated at frame start). The safe time
window for performing the LUT update is thus between the
undelayed vblank and frame start. Turns out that with low
enough CDCLK frequency (DSB execution speed depends on CDCLK)
we can exceed that.
As we are currently using non-posted writes for the legacy LUT
updates, in which case we can hit the far more severe failure
mode. The problem is exacerbated by the fact that non-posted
writes are much slower than posted writes (~4x it seems).
To mititage the problem let's switch to using posted DSB
writes for legacy LUT updates (which will involve using the
double write approach to avoid other problems with DSB
vs. legacy LUT writes). Despite writing each register twice
this will in fact make the legacy LUT update faster when
compared to the non-posted write approach, making the
problem less likely to appear. The failure mode is also
less severe.
This isn't the 100% solution we need though. That will involve
estimating how long the LUT update will take, and pushing
frame start and/or delayed vblank forward to guarantee that
the update will have finished by the time the pipe prefill
starts...
Cc: stable@vger.kernel.org
Fixes: 34d8311f4a ("drm/i915/dsb: Re-instate DSB for LUT updates")
Fixes: 25ea3411bd ("drm/i915/dsb: Use non-posted register writes for legacy LUT")
Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/12494
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241120164123.12706-3-ville.syrjala@linux.intel.com
Reviewed-by: Uma Shankar <uma.shankar@intel.com>
(cherry picked from commit 2504a316b3)
Signed-off-by: Tvrtko Ursulin <tursulin@ursulin.net>
Turns out the DSB indexed register write command has
rather significant initial overhead compared to the normal
MMIO write command. Based on some quick experiments on TGL
you have to write the register at least ~5 times for the
indexed write command to come out ahead. If you write the
register less times than that the MMIO write is faster.
So it seems my automagic indexed write logic was a bit
misguided. Go back to the original approach only use
indexed writes for the cases we know will benefit from
it (indexed LUT register updates).
Currently we shouldn't have any cases where this truly
matters (just some rare double writes to the precision
LUT index registers), but we will need to switch the
legacy LUT updates to write each LUT register twice (to
avoid some palette anti-collision logic troubles).
This would be close to the worst case for using indexed
writes (two writes per register, and 256 separate registers).
Using the MMIO write command should shave off around 30%
of the execution time compared to using the indexed write
command.
Cc: stable@vger.kernel.org
Fixes: 34d8311f4a ("drm/i915/dsb: Re-instate DSB for LUT updates")
Fixes: 25ea3411bd ("drm/i915/dsb: Use non-posted register writes for legacy LUT")
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241120164123.12706-2-ville.syrjala@linux.intel.com
Reviewed-by: Uma Shankar <uma.shankar@intel.com>
(cherry picked from commit ecba559a88)
Signed-off-by: Tvrtko Ursulin <tursulin@ursulin.net>
Commit 255fc1703e ("drm/i915/gem: Calculate object page offset for
partial memory mapping") introduced a new offset, which accounts for
userspace mapping not starting from the beginning of object's scatterlist.
This works fine for cases where first object pte is larger than the new
offset - "r->sgt.curr" counter is set to the offset to match the difference
in the number of total pages. However, if object's first pte's size is
equal to or smaller than the offset, then information about the offset
in userspace is covered up by moving "r->sgt" pointer in remap_sg():
r->sgt.curr += PAGE_SIZE;
if (r->sgt.curr >= r->sgt.max)
r->sgt = __sgt_iter(__sg_next(r->sgt.sgp), use_dma(r->iobase));
This means that two or more pages from virtual memory are counted for
only one page in object's memory, because after moving "r->sgt" pointer
"r->sgt.curr" will be 0.
We should account for this mismatch by moving "r->sgt" pointer to the
next pte. For that we may use "r.sgt.max", which already holds the max
allowed size. This change also eliminates possible confusion, when
looking at i915_scatterlist.h and remap_io_sg() code: former has
scatterlist pointer definition, which differentiates "s.max" value
based on "dma" flag (sg_dma_len() is used only when the flag is
enabled), while latter uses sg_dma_len() indiscriminately.
This patch aims to resolve issue:
https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/12031
v3:
- instead of checking if r.sgt.curr would exceed allowed max, changed
the value in the while loop to be aligned with `dma` value
v4:
- remove unnecessary parent relation
v5:
- update commit message with explanation about page counting mismatch
and link to the issue
Signed-off-by: Krzysztof Karas <krzysztof.karas@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/upbjdavlbcxku63ns4vstp5kgbn2anxwewpmnppszgb67fn66t@tfclfgkqijue