Commit Graph

3179 Commits

Author SHA1 Message Date
Daniele Ceraolo Spurio
b910f716f6 drm/i915/guc: define media GT GuC send regs
The media GT shares the G-unit with the root GT, so a second set of
communication registers is required for the media GuC.

Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Cc: Alan Previn <alan.previn.teres.alexis@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221108020600.3575467-6-daniele.ceraolospurio@intel.com
2022-11-14 10:11:47 -08:00
Aravind Iddamsetty
ee71434eeb drm/i915/mtl: Handle wopcm per-GT and limit calculations.
With MTL standalone media architecture the wopcm layout has changed,
with separate partitioning in WOPCM for the root GT GuC and the media
GT GuC. The size of WOPCM is 4MB with the lower 2MB reserved for the
media GT and the upper 2MB for the root GT.

Given that MTL has GuC deprivilege, the WOPCM registers are pre-locked
by the bios. Therefore, we can skip all the math for the partitioning
and just limit ourselves to sanity-checking the values.

v2: fix makefile file ordering (Jani)
v3: drop XELPM_SAMEDIA_WOPCM_SIZE, check huc instead of VDBOX (John)
v4: further clarify commit message, remove blank line (John)

Signed-off-by: Aravind Iddamsetty <aravind.iddamsetty@intel.com>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: John Harrison <john.c.harrison@intel.com>
Cc: Alan Previn <alan.previn.teres.alexis@intel.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221108020600.3575467-5-daniele.ceraolospurio@intel.com
2022-11-14 10:11:47 -08:00
Daniele Ceraolo Spurio
016241168d drm/i915/uc: use different ggtt pin offsets for uc loads
Our current FW loading process is the same for all FWs:

- Pin FW to GGTT at the start of the ggtt->uc_fw node
- Load the FW
- Unpin

This worked because we didn't have a case where 2 FWs would be loaded on
the same GGTT at the same time. On MTL, however, this can happen if both
GTs are reset at the same time, so we can't pin everything in the same
spot and we need to use separate offset. For simplicity, instead of
calculating the exact required size, we reserve a 2MB slot for each fw.

v2: fail fetch if FW is > 2MBs, improve comments (John)
v3: more comment improvements (John)

Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: John Harrison <john.c.harrison@intel.com>
Cc: Alan Previn <alan.previn.teres.alexis@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221108020600.3575467-3-daniele.ceraolospurio@intel.com
2022-11-14 09:58:31 -08:00
Daniele Ceraolo Spurio
472098c877 drm/i915/huc: only load HuC on GTs that have VCS engines
On MTL the primary GT doesn't have any media capabilities, so no video
engines and no HuC. We must therefore skip the HuC fetch and load on
that specific case. Given that other multi-GT platforms might have HuC
on the primary GT, we can't just check for that and it is easier to
instead check for the lack of VCS engines.

Based on code from Aravind Iddamsetty

v2: clarify which engine_mask is used for each GT and why (Tvrtko)

Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Aravind Iddamsetty <aravind.iddamsetty@intel.com>
Cc: John Harrison <john.c.harrison@intel.com>
Cc: Alan Previn <alan.previn.teres.alexis@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Reviewed-by: Aravind Iddamsetty <aravind.iddamsetty@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221108020600.3575467-1-daniele.ceraolospurio@intel.com
2022-11-14 09:58:29 -08:00
Tvrtko Ursulin
3653727560 drm/i915: Simplify internal helper function signature
Since we are now storing the GT backpointer in the wa list we can drop the
explicit struct intel_gt * argument to wa_list_apply.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Andrzej Hajda <andrzej.hajda@intel.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221110124633.3135026-1-tvrtko.ursulin@linux.intel.com
2022-11-14 11:35:21 +00:00
Badal Nilawar
8f70f1ec58 drm/i915/mtl: Add Wa_14017073508 for SAMedia
This workaround is added for Media tile of MTL A step. It is to help
pcode workaround which handles the hardware issue seen during package C2/C3
transitions due to RC6 entry/exit transitions on Media tile. As a part of
workaround pcode expect kmd to send mailbox message "media busy" when
components of Media tile are in use and "media idle" otherwise.
As per workaround description gucrc need to be disabled so enabled
host based RC for Media tile.

v2:
 - Correct workaround id (Matt)
 - Fix review comments (Rodrigo)

Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Radhakrishna Sripada <radhakrishna.sripada@intel.com>
Cc: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Badal Nilawar <badal.nilawar@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Anshuman Gupta <anshuman.gupta@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221103184559.2306481-1-badal.nilawar@intel.com
2022-11-14 16:58:22 +05:30
Riana Tauro
733827eea6 drm/i915/guc/slpc: Add selftest for slpc tile-tile interaction
Run a workload on tiles simultaneously by requesting for RP0 frequency.
Pcode can however limit the frequency being granted due to throttling
reasons. This test checks if there is any throttling but does not fail
if RP0 is not granted due to throttle reasons

v2: Fix build error
v3: Use IS_ERR_OR_NULL to check worker
    Addressed cosmetic review comments (Tvrtko)
v4: do not skip test on media engines if gt type is GT_MEDIA.
    Use correct PERF_LIMIT_REASONS register for MTL (Vinay)

Signed-off-by: Riana Tauro <riana.tauro@intel.com>
Reviewed-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Signed-off-by: Anshuman Gupta <anshuman.gupta@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221109112541.275021-2-riana.tauro@intel.com
2022-11-11 20:18:33 +05:30
Jani Nikula
801543b259 drm/i915: stop including i915_irq.h from i915_trace.h
Turns out many of the files that need i915_reg.h get it implicitly via
{display/intel_de.h, gt/intel_context.h} -> i915_trace.h -> i915_irq.h
-> i915_reg.h. Since i915_trace.h doesn't actually need i915_irq.h,
makes sense to drop it, but that requires adding quite a few new
includes all over the place.

Prefer including i915_reg.h where needed instead of adding another
implicit include, because eventually we'll want to split up i915_reg.h
and only include the specific registers at each place.

Also some places actually needed i915_irq.h too.

Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/6e78a2e0ac1bffaf5af3b5ccc21dff05e6518cef.1668008071.git.jani.nikula@intel.com
2022-11-11 13:05:19 +02:00
Jani Nikula
e563531ace drm/i915: split out intel_display_reg_defs.h
Split out the display register helper macros to a separate file. For
now, include it from i915_reg.h, but note that there are already files
that don't need i915_reg.h, such as intel_audio.c.

Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/3af47193ff5219b6d2cfe353b752ec4bb44de4f1.1668008071.git.jani.nikula@intel.com
2022-11-11 12:44:49 +02:00
Tvrtko Ursulin
a10234fda4 drm/i915: Partial abandonment of legacy DRM logging macros
Convert some usages of legacy DRM logging macros into versions which tell
us on which device have the events occurred.

v2:
 * Don't have struct drm_device as local. (Jani, Ville)

v3:
 * Store gt, not i915, in workaround list. (John)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Acked-by: Jani Nikula <jani.nikula@intel.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221109104633.2579245-1-tvrtko.ursulin@linux.intel.com
2022-11-10 12:35:46 +00:00
Daniele Ceraolo Spurio
8b693ea26c drm/i915/guc: don't hardcode BCS0 in guc_hang selftest
On MTL there are no BCS engines on the media GT, so we can't always use
BCS0 in the test. There is no actual reason to use a BCS engine over an
engine of a different class, so switch to using any available engine.

Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Acked-by: Nirmoy Das <nirmoy.das@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221102214310.2829310-1-daniele.ceraolospurio@intel.com
2022-11-07 11:28:29 -08:00
Daniele Ceraolo Spurio
194babe26b drm/i915/mtl: don't expose GSC command streamer to the user
There is no userspace user for this CS yet, we only need it for internal
kernel ops (e.g. HuC, PXP), so don't expose it.

v2: even if it's not exposed, rename the engine so it is easier to
identify in the debug logs (Matt)

Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221102171047.2787951-6-daniele.ceraolospurio@intel.com
2022-11-07 11:03:57 -08:00
Daniele Ceraolo Spurio
ef8281abb1 drm/i915/mtl: add GSC CS reset support
The GSC CS has its own dedicated bit in the GDRST register.

Bspec: 52549
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221102171047.2787951-5-daniele.ceraolospurio@intel.com
2022-11-07 11:03:53 -08:00
Daniele Ceraolo Spurio
c07ee63690 drm/i915/mtl: add GSC CS interrupt support
The GSC CS re-uses the same interrupt bits that the GSC used in older
platforms. This means that we can now have an engine interrupt coming
out of OTHER_CLASS, so we need to handle that appropriately.

v2: clean up the if statement for the engine irq (Tvrtko)

Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221102171047.2787951-4-daniele.ceraolospurio@intel.com
2022-11-07 11:03:45 -08:00
Daniele Ceraolo Spurio
c9c12ba72e drm/i915/mtl: pass the GSC CS info to the GuC
We need to tell the GuC that the GSC CS is there.

Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221102171047.2787951-3-daniele.ceraolospurio@intel.com
2022-11-07 11:03:39 -08:00
Daniele Ceraolo Spurio
5fd974d164 drm/i915/mtl: add initial definitions for GSC CS
Starting on MTL, the GSC is no longer managed with direct MMIO access,
but we instead have a dedicated command streamer for it. As a first step
for adding support for this CS, add the required definitions.
Note that, although it is now a CS, the GSC retains its old
class:instance value (OTHER_CLASS instance 6)

Bspec: 65308, 45605
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221102171047.2787951-2-daniele.ceraolospurio@intel.com
2022-11-07 11:03:35 -08:00
John Harrison
178b8a3668 drm/i915/guc: Don't deadlock busyness stats vs reset
The engine busyness stats has a worker function to do things like
64bit extend the 32bit hardware counters. The GuC's reset prepare
function flushes out this worker function to ensure no corruption
happens during the reset. Unforunately, the worker function has an
infinite wait for active resets to finish before doing its work. Thus
a deadlock would occur if the worker function had actually started
just as the reset starts.

The function being used to lock the reset-in-progress mutex is called
intel_gt_reset_trylock(). However, as noted it does not follow
standard 'trylock' conventions and exit if already locked. So rename
the current _trylock function to intel_gt_reset_lock_interruptible(),
which is the behaviour it actually provides. In addition, add a new
implementation of _trylock and call that from the busyness stats
worker instead.

v2: Rename existing trylock to interruptible rather than trying to
preserve the existing (confusing) naming scheme (review comments from
Tvrtko).

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221102192109.2492625-3-John.C.Harrison@Intel.com
2022-11-04 13:45:45 -07:00
John Harrison
de51de9672 drm/i915/guc: Properly initialise kernel contexts
If a context has already been registered prior to first submission
then context init code was not being called. The noticeable effect of
that was the scheduling priority was left at zero (meaning super high
priority) instead of being set to normal. This would occur with
kernel contexts at start of day as they are manually pinned up front
rather than on first submission. So add a call to initialise those
when they are pinned.

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221102192109.2492625-2-John.C.Harrison@Intel.com
2022-11-04 13:45:44 -07:00
John Harrison
cc2e0cf0ad drm/i915/guc: Remove excessive line feeds in state dumps
Some of the GuC state dump messages were adding extra line feeds. When
printing via a DRM printer to dmesg, for example, that messes up the
log formatting as it loses any prefixing from the printer. Given that
the extra line feeds are just in the middle of random bits of GuC
state, there isn't any real need for them. So just remove them
completely.

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221031220007.4176835-1-John.C.Harrison@Intel.com
2022-11-04 12:08:19 -07:00
Dave Airlie
60ba8c5bd9 Merge tag 'drm-intel-gt-next-2022-11-03' of git://anongit.freedesktop.org/drm/drm-intel into drm-next
Driver Changes:

- Fix for #7306: [Arc A380] white flickering when using arc as a
  secondary gpu (Matt A)
- Add Wa_18017747507 for DG2 (Wayne)
- Avoid spurious WARN on DG1 due to incorrect cache_dirty flag
  (Niranjana, Matt A)
- Corrections to CS timestamp support for Gen5 and earlier (Ville)

- Fix a build error used with clang compiler on hwmon (GG)
- Improvements to LMEM handling with RPM (Anshuman, Matt A)
- Cleanups in dmabuf code (Mike)

- Selftest improvements (Matt A)

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/Y2N11wu175p6qeEN@jlahtine-mobl.ger.corp.intel.com
2022-11-04 17:33:34 +10:00
Ville Syrjälä
1086af67b9 drm/i915/selftests: Run the perf MI_BB tests on gen4/5
Now that we know the ring timestamp frequency on gen4/5 we
can run the perf tests that depend on sampling the timestamp.

On g4x/ilk we must read the udw of the 64bit timestamp
register. Details in {g4x,gen5)_read_clock_frequency().

When executing the read via the CS i965 doesn't seem to need
the double read trick that CPU mmio reads need.

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221031135703.14670-7-ville.syrjala@linux.intel.com
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2022-11-02 08:55:56 +02:00
Ville Syrjälä
38530a37de drm/i915/selftests: Test RING_TIMESTAMP on gen4/5
Now that we actually know the cs timestamp frequency on gen4/5
let's run the corresponding test.

On g4x/ilk we must read the udw of the 64bit timestamp
register. Details in {g4x,gen5)_read_clock_frequency().

The one extra caveat is that on i965 (or at least CL, don't
recall if I ever tested on BW) we must read the register
twice to get an up to date value. For some unknown reason
the first read tends to return a stale value.

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221031135703.14670-6-ville.syrjala@linux.intel.com
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2022-11-02 08:54:59 +02:00
Ville Syrjälä
cf8a82de21 drm/i915/selftests: Run MI_BB perf selftests on SNB
SNB does have the RING_TIMESTAMP register on the RCS engine.
Run the MI_BB perf tests on it.

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221031135703.14670-5-ville.syrjala@linux.intel.com
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2022-11-02 08:54:48 +02:00
Ville Syrjälä
dbea79a502 drm/i915: Fix cs timestamp frequency for cl/bw
Despite what the spec says the TIMESTAMP register seems to
tick once every hrawclk (confirmed on i965gm and g35).

v2: Rebase

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221031135703.14670-4-ville.syrjala@linux.intel.com
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2022-11-02 08:53:59 +02:00
Ville Syrjälä
78e418d0ea drm/i915: Stop claiming cs timestamp frquency on gen2/3
Gen2/3 have no TIMESTAMP registers to sample so no point in thinking
we have any frequency for it either.

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221031135703.14670-3-ville.syrjala@linux.intel.com
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2022-11-02 08:53:47 +02:00
Ville Syrjälä
ad1ea98019 drm/i915: Fix cs timestamp frequency for ctg/elk/ilk
On ilk the UDW of TIMESTAMP increments every 1000 ns,
LDW is mbz. In order to represent that we'd need 52 bits,
but we only have 32 bits. Even worse most things want to
only deal with 32 bits of timestamp. So let's just set
up the timestamp frequency as if we only had the UDW.

On ctg/elk 63:20 of TIMESTAMP increments every 1/4 ns, 19:0
are mbz. To make life simpler let's ignore the LDW and set up
timestamp frequency based on the UDW only (increments every
1024 ns).

v2: Rebase

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221031135703.14670-2-ville.syrjala@linux.intel.com
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2022-11-02 08:53:32 +02:00
Wayne Boyer
ea9c6215ac drm/i915/dg2: Introduce Wa_18017747507
WA 18017747507 applies to all DG2 skus.

BSpec: 56035, 46121, 68173

Signed-off-by: Wayne Boyer <wayne.boyer@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221031131509.3411195-1-wayne.boyer@intel.com
2022-11-01 14:29:39 -07:00
Dave Airlie
f80c71f7a8 Merge tag 'drm-intel-next-2022-10-28' of git://anongit.freedesktop.org/drm/drm-intel into drm-next
- Hotplug code clean-up and organization (Jani, Gustavo)
- More VBT specific code clean-up, doc, organization,
  and improvements (Ville)
- More MTL enabling work (Matt, RK, Anusha, Jose)
- FBC related clean-ups and improvements (Ville)
- Removing unused sw_fence_await_reservation (Niranjana)
- Big chunch of display house clean-up (Ville)
- Many Watermark fixes and clean-ups (Ville)
- Fix device info for devices without display (Jani)
- Fix TC port PLLs after readout (Ville)
- DPLL ID clean-ups (Ville)
- Prep work for finishing (de)gamma readout (Ville)
- PSR fixes and improvements (Jouni, Jose)
- Reject excessive dotclocks early (Ville)
- DRRS related improvements (Ville)
- Simplify uncore register updates (Andrzej)
- Fix simulated GPU reset wrt. encoder HW readout (Imre)
- Add a ADL-P workaround (Jose)
- Fix clear mask in GEN7_MISCCPCTL update (Andrzej)
- Temporarily disable runtime_pm for discrete (Anshuman)
- Improve fbdev debugs (Nirmoy)
- Fix DP FRL link training status (Ankit)
- Other small display fixes (Ankit, Suraj)
- Allow panel fixed modes to have differing sync
  polarities (Ville)
- Clean up crtc state flag checks (Ville)
- Fix race conditions during DKL PHY accesses (Imre)
- Prep-work for cdclock squash and crawl modes (Anusha)
- ELD precompute and readout (Ville)

Signed-off-by: Dave Airlie <airlied@redhat.com>

From: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/Y1wd6ZJ8LdJpCfZL@intel.com
2022-11-01 17:48:17 +10:00
Anshuman Gupta
e66c8dcf99 drm/i915: Encapsulate lmem rpm stuff in intel_runtime_pm
Runtime pm is not really per GT, therefore it make sense to
move lmem_userfault_list, lmem_userfault_lock and
userfault_wakeref from intel_gt to intel_runtime_pm structure,
which is embedded to i915.

No functional change.

v2:
- Fixes the code comment nit. [Matt Auld]

Signed-off-by: Anshuman Gupta <anshuman.gupta@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221027092242.1476080-2-anshuman.gupta@intel.com
2022-10-31 18:29:17 +05:30
Matt Roper
876e9047a9 drm/i915/mtl: Add missing steering table terminators
The termination entries were missing for a couple of the recently-added
MTL steering tables.

Fixes: f32898c94a ("drm/i915/xelpg: Add multicast steering")
Fixes: a7ec65fc7e ("drm/i915/xelpmp: Add multicast steering for media GT")
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221028224022.964997-1-matthew.d.roper@intel.com
2022-10-28 17:36:56 -07:00
Vinay Belgaumkar
01e7427467 drm/i915/guc: Support OA when Wa_16011777198 is enabled
On DG2, a w/a resets RCS/CCS before it goes into RC6. This breaks OA
since OA does not expect engine resets during its use. Fix it by
disabling RC6.

v2: (Ashutosh)
- Bring back slpc_unset_param helper
- Update commit msg
- Use with_intel_runtime_pm helper for set/unset

v3: (Ashutosh)
- Just use intel_uc_uses_guc_rc

Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221026222102.5526-15-umesh.nerlige.ramappa@intel.com
2022-10-27 12:37:01 -07:00
Umesh Nerlige Ramappa
fa56980434 drm/i915/perf: Save/restore EU flex counters across reset
If a drm client is killed, then hw contexts used by the client are reset
immediately. This reset clears the EU flex counter configuration. If an
OA use case is running in parallel, it would start seeing zeroed eu
counter values following the reset even if the drm client is restarted.
Save/restore the EU flex counter config so that the EU counters can be
monitored continuously across resets.

v2:
- Save/restore eu flex config only for gen12, as for pre-gen12, these
  are saved and restored in the context image.

Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221026222102.5526-14-umesh.nerlige.ramappa@intel.com
2022-10-27 12:36:59 -07:00
Umesh Nerlige Ramappa
ed6b25aa6f drm/i915/perf: Add Wa_1508761755:dg2
Disable Clock gating in EU when gathering the events so that EU events
are not lost.

v2: Fix checkpatch issues
v3: User MCR helpers to write to MC reg
v4: Indent correctly (checkpatch)

Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221026222102.5526-12-umesh.nerlige.ramappa@intel.com
2022-10-27 12:36:49 -07:00
Umesh Nerlige Ramappa
9677a9f3b1 drm/i915/perf: Move gt-specific data from i915->perf to gt->perf
Make perf part of gt as the OAG buffer is specific to a gt. The refactor
eventually simplifies programming the right OA buffer and the right HW
registers when supporting multiple gts.

Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221026222102.5526-8-umesh.nerlige.ramappa@intel.com
2022-10-27 12:36:30 -07:00
Umesh Nerlige Ramappa
a5c3a3cbf0 drm/i915/perf: Determine gen12 oa ctx offset at runtime
Some SKUs of same gen12 platform may have different oactxctrl
offsets. For gen12, determine oactxctrl offsets at runtime.

v2: (Lionel)
- Move MI definitions to intel_gpu_commands.h
- Ensure __find_reg_in_lri does read past context image size

v3: (Ashutosh)
- Drop unnecessary use of double underscores
- fix find_reg_in_lri
- Return error if oa context offset is U32_MAX
- Error out if oa_ctx_ctrl_offset does not find offset

v4: (Ashutosh)
- Warn on odd MI LRI_LEN
- Remove unnecessary check for valid_oactxctrl_offset
- Drop valid_oactxctrl_offset macro

v5: Drop unrelated comment

Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221026222102.5526-5-umesh.nerlige.ramappa@intel.com
2022-10-27 12:36:13 -07:00
Umesh Nerlige Ramappa
2d9da58521 drm/i915/perf: Fix noa wait predication for DG2
Predication for batch buffer commands changed in XEHPSDV.
MI_BATCH_BUFFER_START predicates based on MI_SET_PREDICATE_RESULT
register. The MI_SET_PREDICATE_RESULT register can only be modified
with MI_SET_PREDICATE command. When configured, the MI_SET_PREDICATE
command sets MI_SET_PREDICATE_RESULT based on bit 0 of
MI_PREDICATE_RESULT_2. Use this to configure predication in noa_wait.

Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221026222102.5526-4-umesh.nerlige.ramappa@intel.com
2022-10-27 12:35:59 -07:00
Umesh Nerlige Ramappa
682aa4373f drm/i915/perf: Fix OA filtering logic for GuC mode
With GuC mode of submission, GuC is in control of defining the context
id field that is part of the OA reports. To filter reports, UMD and KMD
must know what sw context id was chosen by GuC. There is not interface
between KMD and GuC to determine this, so read the upper-dword of
EXECLIST_STATUS to filter/squash OA reports for the specific context.

v2: Explain guc id stealing w.r.t OA use case

Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221026222102.5526-2-umesh.nerlige.ramappa@intel.com
2022-10-27 12:35:56 -07:00
Nathan Chancellor
a8a4f0467d drm/i915: Fix CFI violations in gt_sysfs
When booting with CONFIG_CFI_CLANG, there are numerous violations when
accessing the files under
/sys/devices/pci0000:00/0000:00:02.0/drm/card0/gt/gt0:

  $ cd /sys/devices/pci0000:00/0000:00:02.0/drm/card0/gt/gt0

  $ grep . *
  id:0
  punit_req_freq_mhz:350
  rc6_enable:1
  rc6_residency_ms:214934
  rps_act_freq_mhz:1300
  rps_boost_freq_mhz:1300
  rps_cur_freq_mhz:350
  rps_max_freq_mhz:1300
  rps_min_freq_mhz:350
  rps_RP0_freq_mhz:1300
  rps_RP1_freq_mhz:350
  rps_RPn_freq_mhz:350
  throttle_reason_pl1:0
  throttle_reason_pl2:0
  throttle_reason_pl4:0
  throttle_reason_prochot:0
  throttle_reason_ratl:0
  throttle_reason_status:0
  throttle_reason_thermal:0
  throttle_reason_vr_tdc:0
  throttle_reason_vr_thermalert:0

  $ sudo dmesg &| grep "CFI failure at"
  [  214.595903] CFI failure at kobj_attr_show+0x19/0x30 (target: id_show+0x0/0x70 [i915]; expected type: 0xc527b809)
  [  214.596064] CFI failure at kobj_attr_show+0x19/0x30 (target: punit_req_freq_mhz_show+0x0/0x40 [i915]; expected type: 0xc527b809)
  [  214.596407] CFI failure at kobj_attr_show+0x19/0x30 (target: rc6_enable_show+0x0/0x40 [i915]; expected type: 0xc527b809)
  [  214.596528] CFI failure at kobj_attr_show+0x19/0x30 (target: rc6_residency_ms_show+0x0/0x270 [i915]; expected type: 0xc527b809)
  [  214.596682] CFI failure at kobj_attr_show+0x19/0x30 (target: act_freq_mhz_show+0x0/0xe0 [i915]; expected type: 0xc527b809)
  [  214.596792] CFI failure at kobj_attr_show+0x19/0x30 (target: boost_freq_mhz_show+0x0/0xe0 [i915]; expected type: 0xc527b809)
  [  214.596893] CFI failure at kobj_attr_show+0x19/0x30 (target: cur_freq_mhz_show+0x0/0xe0 [i915]; expected type: 0xc527b809)
  [  214.596996] CFI failure at kobj_attr_show+0x19/0x30 (target: max_freq_mhz_show+0x0/0xe0 [i915]; expected type: 0xc527b809)
  [  214.597099] CFI failure at kobj_attr_show+0x19/0x30 (target: min_freq_mhz_show+0x0/0xe0 [i915]; expected type: 0xc527b809)
  [  214.597198] CFI failure at kobj_attr_show+0x19/0x30 (target: RP0_freq_mhz_show+0x0/0xe0 [i915]; expected type: 0xc527b809)
  [  214.597301] CFI failure at kobj_attr_show+0x19/0x30 (target: RP1_freq_mhz_show+0x0/0xe0 [i915]; expected type: 0xc527b809)
  [  214.597405] CFI failure at kobj_attr_show+0x19/0x30 (target: RPn_freq_mhz_show+0x0/0xe0 [i915]; expected type: 0xc527b809)
  [  214.597538] CFI failure at kobj_attr_show+0x19/0x30 (target: throttle_reason_bool_show+0x0/0x50 [i915]; expected type: 0xc527b809)
  [  214.597701] CFI failure at kobj_attr_show+0x19/0x30 (target: throttle_reason_bool_show+0x0/0x50 [i915]; expected type: 0xc527b809)
  [  214.597836] CFI failure at kobj_attr_show+0x19/0x30 (target: throttle_reason_bool_show+0x0/0x50 [i915]; expected type: 0xc527b809)
  [  214.597952] CFI failure at kobj_attr_show+0x19/0x30 (target: throttle_reason_bool_show+0x0/0x50 [i915]; expected type: 0xc527b809)
  [  214.598071] CFI failure at kobj_attr_show+0x19/0x30 (target: throttle_reason_bool_show+0x0/0x50 [i915]; expected type: 0xc527b809)
  [  214.598177] CFI failure at kobj_attr_show+0x19/0x30 (target: throttle_reason_bool_show+0x0/0x50 [i915]; expected type: 0xc527b809)
  [  214.598307] CFI failure at kobj_attr_show+0x19/0x30 (target: throttle_reason_bool_show+0x0/0x50 [i915]; expected type: 0xc527b809)
  [  214.598439] CFI failure at kobj_attr_show+0x19/0x30 (target: throttle_reason_bool_show+0x0/0x50 [i915]; expected type: 0xc527b809)
  [  214.598542] CFI failure at kobj_attr_show+0x19/0x30 (target: throttle_reason_bool_show+0x0/0x50 [i915]; expected type: 0xc527b809)

With kCFI, indirect calls are validated against their expected type
versus actual type and failures occur when the two types do not match.
The ultimate issue is that these sysfs functions are expecting to be
called via dev_attr_show() but they may also be called via
kobj_attr_show(), as certain files are created under two different
kobjects that have two different sysfs_ops in intel_gt_sysfs_register(),
hence the warnings above. When accessing the gt_ files under
/sys/devices/pci0000:00/0000:00:02.0/drm/card0, which are using the same
sysfs functions, there are no violations, meaning the functions are
being called with the proper type.

To make everything work properly, adjust certain functions to match the
type of the ->show() and ->store() members in 'struct kobj_attribute'.
Add a macro to generate functions for that can be called via both
dev_attr_{show,store}() or kobj_attr_{show,store}() so that they can be
called through both kobject locations without violating kCFI and adjust
the attribute groups to account for this.

Link: https://github.com/ClangBuiltLinux/linux/issues/1716
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221013205909.1282545-1-nathan@kernel.org
2022-10-27 19:14:53 +02:00
Alan Previn
a7ac9d84b8 drm/i915/guc: Remove intel_context:number_committed_requests counter
With the introduction of the delayed disable-sched behavior,
we use the GuC's xarray of valid guc-id's as a way to
identify if new requests had been added to a context
when the said context is being checked for closure.

Additionally that prior change also closes the race for when
a new incoming request fails to cancel the pending
delayed disable-sched worker.

With these two complementary checks, we see no more
use for intel_context:guc_state:number_committed_requests.

Signed-off-by: Alan Previn <alan.previn.teres.alexis@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221006225121.826257-3-alan.previn.teres.alexis@intel.com
2022-10-26 17:29:47 -07:00
Matthew Brost
8332109430 drm/i915/guc: Delay disabling guc_id scheduling for better hysteresis
Add a delay, configurable via debugfs (default 34ms), to disable
scheduling of a context after the pin count goes to zero. Disable
scheduling is a costly operation as it requires synchronizing with
the GuC. So the idea is that a delay allows the user to resubmit
something before doing this operation. This delay is only done if
the context isn't closed and less than a given threshold
(default is 3/4) of the guc_ids are in use.

Alan Previn: Matt Brost first introduced this patch back in Oct 2021.
However no real world workload with measured performance impact was
available to prove the intended results. Today, this series is being
republished in response to a real world workload that benefited greatly
from it along with measured performance improvement.

Workload description: 36 containers were created on a DG2 device where
each container was performing a combination of 720p 3d game rendering
and 30fps video encoding. The workload density was configured in a way
that guaranteed each container to ALWAYS be able to render and
encode no less than 30fps with a predefined maximum render + encode
latency time. That means the totality of all 36 containers and their
workloads were not saturating the engines to their max (in order to
maintain just enough headrooom to meet the min fps and max latencies
of incoming container submissions).

Problem statement: It was observed that the CPU core processing the i915
soft IRQ work was experiencing severe load. Using tracelogs and an
instrumentation patch to count specific i915 IRQ events, it was confirmed
that the majority of the CPU cycles were caused by the
gen11_other_irq_handler() -> guc_irq_handler() code path. The vast
majority of the cycles was determined to be processing a specific G2H
IRQ: i.e. INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_DONE. These IRQs are sent
by GuC in response to i915 KMD sending H2G requests:
INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_SET. Those H2G requests are sent
whenever a context goes idle so that we can unpin the context from GuC.
The high CPU utilization % symptom was limiting density scaling.

Root Cause Analysis: Because the incoming execution buffers were spread
across 36 different containers (each with multiple contexts) but the
system in totality was NOT saturated to the max, it was assumed that each
context was constantly idling between submissions. This was causing
a thrashing of unpinning contexts from GuC at one moment, followed quickly
by repinning them due to incoming workload the very next moment. These
event-pairs were being triggered across multiple contexts per container,
across all containers at the rate of > 30 times per sec per context.

Metrics: When running this workload without this patch, we measured an
average of ~69K INTEL_GUC_ACTION_SCHED_CONTEXT_MODE_DONE events every 10
seconds or ~10 million times over ~25+ mins. With this patch, the count
reduced to ~480 every 10 seconds or about ~28K over ~10 mins. The
improvement observed is ~99% for the average counts per 10 seconds.

Design awareness: Selftest impact.
As temporary WA disable this feature for the selftests. Selftests are
very timing sensitive and any change in timing can cause failure. A
follow up patch will fixup the selftests to understand this delay.

Design awareness: Race between guc_request_alloc and guc_context_close.
If a context close is issued while there is a request submission in
flight and a delayed schedule disable is pending, guc_context_close
and guc_request_alloc will race to cancel the delayed disable.
To close the race, make sure that guc_request_alloc waits for
guc_context_close to finish running before checking any state.

Design awareness: GT Reset event.
If a gt reset is triggered, as preparation steps, add an additional step
to ensure all contexts that have a pending delay-disable-schedule task
be flushed of it. Move them directly into the closed state after cancelling
the worker. This is okay because the existing flow flushes all
yet-to-arrive G2H's dropping them anyway.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Alan Previn <alan.previn.teres.alexis@intel.com>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221006225121.826257-2-alan.previn.teres.alexis@intel.com
2022-10-26 17:29:43 -07:00
Alan Previn
befb231d5d drm/i915/guc: Fix GuC error capture sizing estimation and reporting
During GuC error capture initialization, we estimate the amount of size
we need for the error-capture-region of the shared GuC-log-buffer.
This calculation was incorrect so fix that. With the fixed calculation
we can reduce the allocation of error-capture region from 4MB to 1MB
(see note2 below for reasoning). Additionally, switch from drm_notice to
drm_debug for the 3X spare size check since that would be impossible to
hit without redesigning gpu_coredump framework to hold multiple captures.

NOTE1: Even for 1x the min size estimation case, actually running out
of space is a corner case because it can only occur if all engine
instances get reset all at once and i915 isn't able extract the capture
data fast enough within G2H handler worker.

NOTE2: With the corrected calculation, a DG2 part required ~77K and a PVC
required ~115K (1X min-est-size that is calculated as one-shot all-engine-
reset scenario).

Fixes: d7c15d76a5 ("drm/i915/guc: Check sizing of guc_capture output")
Cc: Alan Previn <alan.previn.teres.alexis@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Cc: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Cc: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Chris Wilson <chris.p.wilson@intel.com>
Signed-off-by: Alan Previn <alan.previn.teres.alexis@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221026060506.1007830-2-alan.previn.teres.alexis@intel.com
2022-10-26 15:00:39 -07:00
Vinay Belgaumkar
37d52e446e drm/i915/slpc: Use platform limits for min/max frequency
GuC will set the min/max frequencies to theoretical max on
ATS-M. This will break kernel ABI, so limit min/max frequency
to RP0(platform max) instead.

Also modify the SLPC selftest to update the min frequency
when we have a server part so that we can iterate between
platform min and max.

v2: Check softlimits instead of platform limits (Riana)
v3: More review comments (Ashutosh)
v4: No need to use saved_min_freq and other comments (Ashutosh)

Bug: https://gitlab.freedesktop.org/drm/intel/-/issues/7030

Acked-by: Nirmoy Das <nirmoy.das@intel.com>
Reviewed-by: Riana Tauro <riana.tauro@intel.com>
Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221024225453.4856-1-vinay.belgaumkar@intel.com
2022-10-26 14:24:30 -07:00
Vinay Belgaumkar
f864a29afc drm/i915/slpc: Optmize waitboost for SLPC
Waitboost (when SLPC is enabled) results in a H2G message. This can result
in thousands of messages during a stress test and fill up an already full
CTB. There is no need to request for boost if min softlimit is equal or
greater than it.

v2: Add the tracing back, and check requested freq
in the worker thread (Tvrtko)
v3: Check requested freq in dec_waiters as well
v4: Only check min_softlimit against boost_freq. Limit this
optimization for server parts for now.
v5: min_softlimit can be greater than boost (Ashutosh)

Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221024171108.14373-1-vinay.belgaumkar@intel.com
2022-10-26 12:20:25 -07:00
Gustavo Sousa
e62f31e173 drm/i915/xelp: Add Wa_1806527549
Workaround to be applied to platforms using XE_LP graphics.

BSpec: 52890
Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221019161334.119885-1-gustavo.sousa@intel.com
2022-10-26 10:50:13 -07:00
Alan Previn
5f8a3f65fc drm/i915/guc: Add compute reglist for guc err capture
We missed this at initial upstream because at that time
none of the GuC enabled platforms had a compute engine.
Add this now.

Signed-off-by: Alan Previn <alan.previn.teres.alexis@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221019072930.17755-3-alan.previn.teres.alexis@intel.com
2022-10-24 14:44:44 -07:00
Alan Previn
a894077890 drm/i915/guc: Add error-capture init warnings when needed
If GuC is being used and we initialized GuC-error-capture,
we need to be warning if we don't provide an error-capture
register list in the firmware ADS, for valid GT engines.
A warning makes sense as this would impact debugability
without realizing why a reglist wasn't retrieved and reported
by GuC.

However, depending on the platform, we might have certain
engines that have a register list for engine instance error state
but not for engine class. Thus, add a check only to warn if the
register list was non existent vs an empty list (use the
empty lists to skip the warning).

NOTE: if a future platform were to introduce new registers
in place of what was an empty list on existing / legacy hardware
engines no warning is provided as the empty list is meant
to be used intentionally. As an example, if a future hardware
were to add blitter engine-class-registers (new) on top
of the legacy blitter engine-instance-register (HEAD, TAIL, etc.),
no warning is generated.

Signed-off-by: Alan Previn <alan.previn.teres.alexis@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221019072930.17755-2-alan.previn.teres.alexis@intel.com
2022-10-24 14:44:41 -07:00
John Harrison
d7a8680ec9 drm/i915: Improve long running compute w/a for GuC submission
A workaround was added to the driver to allow compute workloads to run
'forever' by disabling pre-emption on the RCS engine for Gen12.
It is not totally unbound as the heartbeat will kick in eventually
and cause a reset of the hung engine.

However, this does not work well in GuC submission mode. In GuC mode,
the pre-emption timeout is how GuC detects hung contexts and triggers
a per engine reset. Thus, disabling the timeout means also losing all
per engine reset ability. A full GT reset will still occur when the
heartbeat finally expires, but that is a much more destructive and
undesirable mechanism.

The purpose of the workaround is actually to give compute tasks longer
to reach a pre-emption point after a pre-emption request has been
issued. This is necessary because Gen12 does not support mid-thread
pre-emption and compute tasks can have long running threads.

So, rather than disabling the timeout completely, just set it to a
'long' value.

v2: Review feedback from Tvrtko - must hard code the 'long' value
instead of determining it algorithmically. So make it an extra CONFIG
definition. Also, remove the execlist centric comment from the
existing pre-emption timeout CONFIG option given that it applies to
more than just execlists.

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Acked-by: Michal Mrozek <michal.mrozek@intel.com>
Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221006213813.1563435-5-John.C.Harrison@Intel.com
2022-10-24 12:12:47 -07:00
John Harrison
47daf84a8b drm/i915: Make the heartbeat play nice with long pre-emption timeouts
Compute workloads are inherently not pre-emptible for long periods on
current hardware. As a workaround for this, the pre-emption timeout
for compute capable engines was disabled. This is undesirable with GuC
submission as it prevents per engine reset of hung contexts. Hence the
next patch will re-enable the timeout but bumped up by an order of
magnitude.

However, the heartbeat might not respect that. Depending upon current
activity, a pre-emption to the heartbeat pulse might not even be
attempted until the last heartbeat period. Which means that only one
period is granted for the pre-emption to occur. With the aforesaid
bump, the pre-emption timeout could be significantly larger than this
heartbeat period.

So adjust the heartbeat code to take the pre-emption timeout into
account. When it reaches the final (high priority) period, it now
ensures the delay before hitting reset is bigger than the pre-emption
timeout.

v2: Fix for selftests which adjust the heartbeat period manually.
v3: Add FIXME comment about selftests. Add extra FIXME comment and
drm_notices when setting heartbeat to a non-default value (review
feedback from Tvrtko)

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221006213813.1563435-4-John.C.Harrison@Intel.com
2022-10-24 12:12:36 -07:00
John Harrison
c3bd49cd9a drm/i915: Fix compute pre-emption w/a to apply to compute engines
An earlier patch added support for compute engines. However, it missed
enabling the anti-pre-emption w/a for the new engine class. So move
the 'compute capable' flag earlier and use it for the pre-emption w/a
test.

Fixes: c674c5b934 ("drm/i915/xehp: CCS should use RCS setup functions")
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Aravind Iddamsetty <aravind.iddamsetty@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Cc: Jason Ekstrand <jason@jlekstrand.net>
Cc: "Michał Winiarski" <michal.winiarski@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tejas Upadhyay <tejaskumarx.surendrakumar.upadhyay@intel.com>
Cc: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Cc: "Thomas Hellström" <thomas.hellstrom@linux.intel.com>
Cc: Stuart Summers <stuart.summers@intel.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: Jani Nikula <jani.nikula@intel.com>
Cc: Ramalingam C <ramalingam.c@intel.com>
Cc: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221006213813.1563435-3-John.C.Harrison@Intel.com
2022-10-24 12:12:21 -07:00
John Harrison
568944af44 drm/i915/guc: Limit scheduling properties to avoid overflow
GuC converts the pre-emption timeout and timeslice quantum values into
clock ticks internally. That significantly reduces the point of 32bit
overflow. On current platforms, worst case scenario is approximately
110 seconds. Rather than allowing the user to set higher values and
then get confused by early timeouts, add limits when setting these
values.

v2: Add helper functions for clamping (review feedback from Tvrtko).
v3: Add a bunch of BUG_ON range checks in addition to the checks
already in the clamping functions (Tvrtko)

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Acked-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20221006213813.1563435-2-John.C.Harrison@Intel.com
2022-10-24 12:11:59 -07:00