Commit Graph

3179 Commits

Author SHA1 Message Date
Vinay Belgaumkar
dff0fc4990 drm/i915/guc/slpc: Initial definitions for SLPC
Add macros to check for SLPC support. This feature is currently supported
for Gen12+ and enabled whenever GuC submission is enabled/selected.

Include templates for SLPC init/fini and enable.

v2: Move SLPC helper functions to intel_guc_slpc.c/.h. Define
basic template for SLPC structure in intel_guc_slpc_types.h.
Fix copyright (Michal W)

v3: Review comments (Michal W)

v4: Include supported/selected inside slpc struct (Michal W)

Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Signed-off-by: Sundaresan Sujaritha <sujaritha.sundaresan@intel.com>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210730202119.23810-2-vinay.belgaumkar@intel.com
2021-08-03 16:05:20 -07:00
Lucas De Marchi
3989de0ef5 drm/i915/xehp: Fix missing sentinel on mcr_ranges_xehp
There's a missing sentinel since we are not using ARRAY_SIZE(), but rather
checking that the .start is 0 to stop the iteration in mcr_range().

	BUG: KASAN: global-out-of-bounds in mcr_range.isra.0+0x69/0xa0 [i915]
	Read of size 4 at addr ffffffffa0889928 by task modprobe/3881

Fixes: d8905ba705 ("drm/i915/xehp: Define multicast register ranges")
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210730191115.2514581-1-lucas.demarchi@intel.com
2021-07-30 19:06:13 -07:00
Lucas De Marchi
6266992cf1 drm/i915/gt: remove GRAPHICS_VER == 10
Replace all remaining handling of GRAPHICS_VER {==,>=} 10 with
{==,>=} 11. With the removal of CNL, there is no platform with graphics
version equals 10.

Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210728220326.1578242-5-lucas.demarchi@intel.com
2021-07-29 10:06:10 -07:00
Lucas De Marchi
701d31860d drm/i915/gt: rename CNL references in intel_engine.h
With the removal of CNL, let's consider ICL as the first platform using
that index.

Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210728220326.1578242-4-lucas.demarchi@intel.com
2021-07-29 10:06:09 -07:00
Lucas De Marchi
91a197e4e1 drm/i915/gt: remove explicit CNL handling from intel_sseu.c
CNL is the only platform with GRAPHICS_VER == 10. With its removal we
don't need to handle that version anymore.

Also we can now reduce the max number of slices: the call to
intel_sseu_set_info() with the highest number of slices comes from SKL
and BDW with 3 slices. Recent platforms actually increase the
number of subslices so the number of slices remain 1.

Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210728220326.1578242-3-lucas.demarchi@intel.com
2021-07-29 10:06:08 -07:00
Lucas De Marchi
94fd8400c2 drm/i915/gt: remove explicit CNL handling from intel_mocs.c
Only one reference to CNL that is not needed, but code is the same for
GEN9_BC, so leave the code around and just remove the special
case for CNL.

Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210728220326.1578242-2-lucas.demarchi@intel.com
2021-07-29 10:06:06 -07:00
Daniel Vetter
bb13ea2825 drm/i915: Remove i915_globals
No longer used.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210727121037.2041102-10-daniel.vetter@ffwll.ch
2021-07-28 17:19:52 +02:00
Daniel Vetter
2dcec7d3fe drm/i915: move intel_context slab to direct module init/exit
With the global kmem_cache shrink infrastructure gone there's nothing
special and we can convert them over.

I'm doing this split up into each patch because there's quite a bit of
noise with removing the static global.slab_ce to just a
slab_ce.

v2: Make slab static (Jason, 0day)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210727121037.2041102-4-daniel.vetter@ffwll.ch
2021-07-28 16:45:58 +02:00
Daniele Ceraolo Spurio
e754dccbc9 drm/i915/guc: Unblock GuC submission on Gen11+
Unblock GuC submission on Gen11+ platforms.

v2:
 (Martin Peres / John H)
  - Delete debug message when GuC is disabled by default on certain
    platforms

Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210727002348.97202-34-matthew.brost@intel.com
2021-07-27 17:32:30 -07:00
Matthew Brost
ee242ca704 drm/i915/guc: Implement GuC priority management
Implement a simple static mapping algorithm of the i915 priority levels
(int, -1k to 1k exposed to user) to the 4 GuC levels. Mapping is as
follows:

i915 level < 0          -> GuC low level     (3)
i915 level == 0         -> GuC normal level  (2)
i915 level < INT_MAX    -> GuC high level    (1)
i915 level == INT_MAX   -> GuC highest level (0)

We believe this mapping should cover the UMD use cases (3 distinct user
levels + 1 kernel level).

In addition to static mapping, a simple counter system is attached to
each context tracking the number of requests inflight on the context at
each level. This is needed as the GuC levels are per context while in
the i915 levels are per request.

v2:
 (Daniele)
  - Add BUILD_BUG_ON to enforce ordering of priority levels
  - Add missing lockdep to guc_prio_fini
  - Check for return before setting context registered flag
  - Map DISPLAY priority or higher to highest guc prio
  - Update comment for guc_prio

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210727002348.97202-33-matthew.brost@intel.com
2021-07-27 17:32:27 -07:00
John Harrison
3a7b72665e drm/i915/selftest: Bump selftest timeouts for hangcheck
Some testing environments and some heavier tests are slower than
previous limits allowed for. For example, it can take multiple seconds
for the 'context has been reset' notification handler to reach the
'kill the requests' code in the 'active' version of the 'reset
engines' test. During which time the selftest gets bored, gives up
waiting and fails the test.

There is also an async thread that the selftest uses to pump work
through the hardware in parallel to the context that is marked for
reset. That also could get bored waiting for completions and kill the
test off.

Lastly, the flush at the of various test sections can also see
timeouts due to the large amount of work backed up. This is also true
of the live_hwsp_read test.

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210727002348.97202-32-matthew.brost@intel.com
2021-07-27 17:32:26 -07:00
John Harrison
617e87c05c drm/i915/selftest: Fix hangcheck self test for GuC submission
When GuC submission is enabled, the GuC controls engine resets. Rather
than explicitly triggering a reset, the driver must submit a hanging
context to GuC and wait for the reset to occur.

Conversely, one of the tests specifically sends hanging batches to the
engines but wants them to sit around until a manual reset of the full
GT (including GuC itself). That means disabling GuC based engine
resets to prevent those from killing the hanging batch too soon. So,
add support to the scheduling policy helper for disabling resets as
well as making them quicker!

In GuC submission mode, the 'is engine idle' test basically turns into
'is engine PM wakelock held'. Independently, there is a heartbeat
disable helper function that the tests use. For unexplained reasons,
this acquires the engine wakelock before disabling the heartbeat and
only releases it when re-enabling the heartbeat. As one of the tests
tries to do a wait for idle in the middle of a heartbeat disabled
section, it is therefore guaranteed to always fail. Added a 'no_pm'
variant of the heartbeat helper that allows the engine to be asleep
while also having heartbeats disabled.

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210727002348.97202-31-matthew.brost@intel.com
2021-07-27 17:32:23 -07:00
Rahul Kumar Singh
064a1f35bf drm/i915/selftest: Fix MOCS selftest for GuC submission
When GuC submission is enabled, the GuC controls engine resets. Rather
than explicitly triggering a reset, the driver must submit a hanging
context to GuC and wait for the reset to occur.

Signed-off-by: Rahul Kumar Singh <rahul.kumar.singh@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210727002348.97202-29-matthew.brost@intel.com
2021-07-27 17:32:21 -07:00
Rahul Kumar Singh
3a4bfa091c drm/i915/selftest: Fix workarounds selftest for GuC submission
When GuC submission is enabled, the GuC controls engine resets. Rather
than explicitly triggering a reset, the driver must submit a hanging
context to GuC and wait for the reset to occur.

Signed-off-by: Rahul Kumar Singh <rahul.kumar.singh@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210727002348.97202-28-matthew.brost@intel.com
2021-07-27 17:32:18 -07:00
John Harrison
3f5dff6c18 drm/i915/selftest: Better error reporting from hangcheck selftest
There are many ways in which the hangcheck selftest can fail. Very few
of them actually printed an error message to say what happened. So,
fill in the missing messages.

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210727002348.97202-27-matthew.brost@intel.com
2021-07-27 17:32:17 -07:00
Matthew Brost
62eaf0ae21 drm/i915/guc: Support request cancellation
This adds GuC backend support for i915_request_cancel(), which in turn
makes CONFIG_DRM_I915_REQUEST_TIMEOUT work.

This implementation makes use of fence while there are likely simplier
options. A fence was chosen because of another feature coming soon
which requires a user to block on a context until scheduling is
disabled. In that case we return the fence to the user and the user can
wait on that fence.

v2:
 (Daniele)
  - A comment about locking the blocked incr / decr
  - A comments about the use of the fence
  - Update commit message explaining why fence
  - Delete redundant check blocked count in unblock function
  - Ring buffer implementation
  - Comment about blocked in submission path
  - Shorter rpm path
v3:
 (Checkpatch)
  - Fix typos in commit message
 (Daniel)
  - Rework to simplier locking structure in guc_context_block / unblock

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210727002348.97202-26-matthew.brost@intel.com
2021-07-27 17:32:14 -07:00
Matthew Brost
ae8ac10dfd drm/i915/guc: Implement banned contexts for GuC submission
When using GuC submission, if a context gets banned disable scheduling
and mark all inflight requests as complete.

Cc: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210727002348.97202-25-matthew.brost@intel.com
2021-07-27 17:32:12 -07:00
John Harrison
481d458cae drm/i915/guc: Add golden context to GuC ADS
The media watchdog mechanism involves GuC doing a silent reset and
continue of the hung context. This requires the i915 driver provide a
golden context to GuC in the ADS.

v2:
 (Matthew Brost):
  - Fix memory corruption in shmem_read
 (John H)
  - Use locals rather than defines for LR_* + SKIP_SIZE

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210727002348.97202-24-matthew.brost@intel.com
2021-07-27 17:32:09 -07:00
John Harrison
731c2ad5e1 drm/i915/guc: Include scheduling policies in the debugfs state dump
Added the scheduling policy parameters to the 'guc_info' debugfs state
dump.

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210727002348.97202-23-matthew.brost@intel.com
2021-07-27 17:32:08 -07:00
John Harrison
cb6cc81586 drm/i915/guc: Connect reset modparam updates to GuC policy flags
Changing the reset module parameter has no effect on a running GuC.
The corresponding entry in the ADS must be updated and then the GuC
informed via a Host2GuC message.

The new debugfs interface to module parameters allows this to happen.
However, connecting the parameter data address back to anything useful
is messy. One option would be to pass a new private data structure
address through instead of just the parameter pointer. However, that
means having a new (and different) data structure for each parameter
and a new (and different) write function for each parameter. This
method keeps everything generic by instead using a string lookup on
the directory entry name.

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210727002348.97202-22-matthew.brost@intel.com
2021-07-27 17:32:06 -07:00
John Harrison
7935785240 drm/i915/guc: Hook GuC scheduling policies up
Use the official driver default scheduling policies for configuring
the GuC scheduler rather than a bunch of hardcoded values.

v2:
 (Matthew Brost)
  - Move I915_ENGINE_WANT_FORCED_PREEMPTION to later patch

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Cc: Jose Souza <jose.souza@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210727002348.97202-21-matthew.brost@intel.com
2021-07-27 17:32:05 -07:00
John Harrison
dc0dad365c drm/i915/guc: Fix for error capture after full GPU reset with GuC
In the case of a full GPU reset (e.g. because GuC has died or because
GuC's hang detection has been disabled), the driver can't rely on GuC
reporting the guilty context. Instead, the driver needs to scan all
active contexts and find one that is currently executing, as per the
execlist mode behaviour. In GuC mode, this scan is different to
execlist mode as the active request list is handled very differently.

Similarly, the request state dump in debugfs needs to be handled
differently when in GuC submission mode.

Also refactured some of the request scanning code to avoid duplication
across the multiple code paths that are now replicating it.

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210727002348.97202-20-matthew.brost@intel.com
2021-07-27 17:32:02 -07:00
Matthew Brost
573ba126ae drm/i915/guc: Capture error state on context reset
We receive notification of an engine reset from GuC at its
completion. Meaning GuC has potentially cleared any HW state
we may have been interested in capturing. GuC resumes scheduling
on the engine post-reset, as the resets are meant to be transparent,
further muddling our error state.

There is ongoing work to define an API for a GuC debug state dump. The
suggestion for now is to manually disable FW initiated resets in cases
where debug state is needed.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210727002348.97202-19-matthew.brost@intel.com
2021-07-27 17:31:59 -07:00
John Harrison
c17b637928 drm/i915/guc: Enable GuC engine reset
Clear the 'disable resets' flag to allow GuC to reset hung contexts
(detected via pre-emption timeout).

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210727002348.97202-18-matthew.brost@intel.com
2021-07-27 17:31:58 -07:00
John Harrison
d75dc57fee drm/i915/guc: Don't complain about reset races
It is impossible to seal all race conditions of resets occurring
concurrent to other operations. At least, not without introducing
excesive mutex locking. Instead, don't complain if it occurs. In
particular, don't complain if trying to send a H2G during a reset.
Whatever the H2G was about should get redone once the reset is over.

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210727002348.97202-17-matthew.brost@intel.com
2021-07-27 17:31:57 -07:00
John Harrison
6de12da166 drm/i915/guc: Provide mmio list to be saved/restored on engine reset
The driver must provide GuC with a list of mmio registers
that should be saved/restored during a GuC-based engine reset.
Unfortunately, the list must be dynamically allocated as its size is
variable. That means the driver must generate the list twice - once to
work out the size and a second time to actually save it.

v2:
 (Alan / CI)
  - GEN7_GT_MODE -> GEN6_GT_MODE to fix WA selftest failure

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Fernando Pacheco <fernando.pacheco@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210727002348.97202-16-matthew.brost@intel.com
2021-07-27 17:31:55 -07:00
Matthew Brost
933864af11 drm/i915/guc: Enable the timer expired interrupt for GuC
The GuC can implement execution qunatums, detect hung contexts and
other such things but it requires the timer expired interrupt to do so.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
CC: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210727002348.97202-15-matthew.brost@intel.com
2021-07-27 17:31:54 -07:00
Matthew Brost
f7957e603c drm/i915/guc: Handle engine reset failure notification
GuC will notify the driver, via G2H, if it fails to
reset an engine. We recover by resorting to a full GPU
reset.

v2:
 (John Harrison):
  - s/drm_dbg/drm_err

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Fernando Pacheco <fernando.pacheco@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210727002348.97202-14-matthew.brost@intel.com
2021-07-27 17:31:52 -07:00
Matthew Brost
1e0fd2b5da drm/i915/guc: Handle context reset notification
GuC will issue a reset on detecting an engine hang and will notify
the driver via a G2H message. The driver will service the notification
by resetting the guilty context to a simple state or banning it
completely.

v2:
 (John Harrison)
  - Move msg[0] lookup after length check
v3:
 (John Harrison)
  - s/drm_dbg/drm_err

Cc: Matthew Brost <matthew.brost@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210727002348.97202-13-matthew.brost@intel.com
2021-07-27 17:31:50 -07:00
Matthew Brost
cad46a332f drm/i915/guc: Suspend/resume implementation for new interface
The new GuC interface introduces an MMIO H2G command,
INTEL_GUC_ACTION_RESET_CLIENT, which is used to implement suspend. This
MMIO tears down any active contexts generating a context reset G2H CTB
for each. Once that step completes the GuC tears down the CTB
channels. It is safe to suspend once this MMIO H2G command completes
and all G2H CTBs have been processed. In practice the i915 will likely
never receive a G2H as suspend should only be called after the GPU is
idle.

Resume is implemented in the same manner as before - simply reload the
GuC firmware and reinitialize everything (e.g. CTB channels, contexts,
etc..).

v2:
 (Michel / John H)
  - INTEL_GUC_ACTION_RESET_CLIENT 0x5B01 -> 0x5507

Cc: John Harrison <john.c.harrison@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210727002348.97202-12-matthew.brost@intel.com
2021-07-27 17:31:48 -07:00
Matthew Brost
e5a1ad0359 drm/i915/guc: Add disable interrupts to guc sanitize
Add disable GuC interrupts to intel_guc_sanitize(). Part of this
requires moving the guc_*_interrupt wrapper function into header file
intel_guc.h.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210727002348.97202-11-matthew.brost@intel.com
2021-07-27 17:31:47 -07:00
Matthew Brost
c41ee2873e drm/i915: Reset GPU immediately if submission is disabled
If submission is disabled by the backend for any reason, reset the GPU
immediately in the heartbeat code as the backend can't be reenabled
until the GPU is reset.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210727002348.97202-10-matthew.brost@intel.com
2021-07-27 17:31:45 -07:00
Matthew Brost
eb5e7da736 drm/i915/guc: Reset implementation for new GuC interface
Reset implementation for new GuC interface. This is the legacy reset
implementation which is called when the i915 owns the engine hang check.
Future patches will offload the engine hang check to GuC but we will
continue to maintain this legacy path as a fallback and this code path
is also required if the GuC dies.

With the new GuC interface it is not possible to reset individual
engines - it is only possible to reset the GPU entirely. This patch
forces an entire chip reset if any engine hangs.

v2:
 (Michal)
  - Check for -EPIPE rather than -EIO (CT deadlock/corrupt check)
v3:
 (John H)
  - Split into a series of smaller patches
v4:
 (John H)
  - Fix typo
  - Add braces around if statements in reset code
v5:
 (Checkpatch)
  - Fix warnings

Cc: John Harrison <john.c.harrison@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: John Harrison <john.c.harrison@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210727002348.97202-9-matthew.brost@intel.com
2021-07-27 17:31:42 -07:00
Matthew Brost
d1cee2d37a drm/i915: Move active request tracking to a vfunc
Move active request tracking to a backend vfunc rather than assuming all
backends want to do this in the manner. In the of case execlists /
ring submission the tracking is on the physical engine while with GuC
submission it is on the context.

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210727002348.97202-8-matthew.brost@intel.com
2021-07-27 17:31:39 -07:00
Matthew Brost
a95d116098 drm/i915/guc: Direct all breadcrumbs for a class to single breadcrumbs
With GuC virtual engines the physical engine which a request executes
and completes on isn't known to the i915. Therefore we can't attach a
request to a physical engines breadcrumbs. To work around this we create
a single breadcrumbs per engine class when using GuC submission and
direct all physical engine interrupts to this breadcrumbs.

v2:
 (John H)
  - Rework header file structure so intel_engine_mask_t can be in
    intel_engine_types.h

Signed-off-by: Matthew Brost <matthew.brost@intel.com>
CC: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210727002348.97202-6-matthew.brost@intel.com
2021-07-27 17:31:35 -07:00
John Harrison
96d3e0e1ad drm/i915/guc: Make hangcheck work with GuC virtual engines
The serial number tracking of engines happens at the backend of
request submission and was expecting to only be given physical
engines. However, in GuC submission mode, the decomposition of virtual
to physical engines does not happen in i915. Instead, requests are
submitted to their virtual engine mask all the way through to the
hardware (i.e. to GuC). This would mean that the heart beat code
thinks the physical engines are idle due to the serial number not
incrementing. Which in turns means hangcheck does not work for
GuC virtual engines.

This patch updates the tracking to decompose virtual engines into
their physical constituents and tracks the request against each. This
is not entirely accurate as the GuC will only be issuing the request
to one physical engine. However, it is the best that i915 can do given
that it has no knowledge of the GuC's scheduling decisions.

Downside of this is that all physical engines constituting a GuC
virtual engine will be periodically unparked (even during just a single
context executing) in order to be pinged with a heartbeat request.
However the power and performance cost of this is not expected to be
measurable (due low frequency of heartbeat pulses) and it is considered
an easier option than trying to make changes to GuC firmware.

v2:
 (Tvrtko)
  - Update commit message
  - Have default behavior if no vfunc present

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210727002348.97202-3-matthew.brost@intel.com
2021-07-27 17:31:31 -07:00
Matthew Brost
556120256e drm/i915/guc: GuC virtual engines
Implement GuC virtual engines. Rather simple implementation, basically
just allocate an engine, setup context enter / exit function to virtual
engine specific functions, set all other variables / functions to guc
versions, and set the engine mask to that of all the siblings.

v2: Update to work with proto-ctx
v3:
 (Daniele)
  - Drop include, add comment to intel_virtual_engine_has_heartbeat

Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210727002348.97202-2-matthew.brost@intel.com
2021-07-27 17:31:28 -07:00
Lucas De Marchi
816753c06f drm/i915/gt: nuke gen6_hw_id
This is only used by GRAPHICS_VER == 6 and GRAPHICS_VER == 7. All other
recent platforms do not depend on this field, so it doesn't make much
sense to keep it generic like that. Instead, just do a mapping from
engine class to HW ID in the single place that is needed.

v2: use macros with the direct register address instead of calculating
from the legacy HW_ID (Matt Roper)

Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210723002551.3906535-1-lucas.demarchi@intel.com
2021-07-26 08:15:18 -07:00
Matt Roper
bfac1e2b6e drm/i915/xehp: Xe_HP forcewake support
Implement Xe_HP forcewake handling.  While we're at it, let's reorder to
the forcewake assignment if/else ladder to match our usual driver
conventions.

Co-authored-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Stuart Summers <stuart.summers@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Matt Atwood <matthew.s.atwood@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210723174239.1551352-6-matthew.d.roper@intel.com
2021-07-24 07:18:18 -07:00
John Harrison
ddabf72176 drm/i915/xehp: Extra media engines - Part 3 (reset)
Xe_HP can have a lot of extra media engines. This patch adds the reset
support for them.

Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Matt Atwood <matthew.s.atwood@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210723174239.1551352-5-matthew.d.roper@intel.com
2021-07-24 07:17:41 -07:00
John Harrison
1b16b6b696 drm/i915/xehp: Extra media engines - Part 2 (interrupts)
Xe_HP can have a lot of extra media engines. This patch adds the
interrupt handler support for them.

Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210723174239.1551352-4-matthew.d.roper@intel.com
2021-07-24 07:17:12 -07:00
John Harrison
938c778f6a drm/i915/xehp: Extra media engines - Part 1 (engine definitions)
Xe_HP can have a lot of extra media engines. This patch adds the basic
definitions for them.

v2:
 - Re-order intel_gt_info and intel_device_info slightly to avoid
   unnecessary padding now that we've increased the size of
   intel_engine_mask_t.  (Tvrtko)
v3:
 - Drop the .hw_id assignments.  (Lucas)
v4:
 - Fix graphics_ver typo for VCS4 (should be 12, not 11).  (Lucas)

Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210723191024.1553405-1-matthew.d.roper@intel.com
2021-07-24 07:16:50 -07:00
Matt Roper
d8905ba705 drm/i915/xehp: Define multicast register ranges
Since we can't steer multicast register reads during ring-based
workaround verification, we need to define the multicast ranges where
failure to steer could potentially cause us to read back from a
fused-off register instance.

As with gen12, we can ignore the multicast ranges that the bspec
describes as 'SQIDI' since all instances of those registers will always
be present and we'll always be able to read back a workaround value that
was written with multicast.

Bspec: 66534
Cc: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210714031540.3539704-11-matthew.d.roper@intel.com
2021-07-23 10:28:16 -07:00
José Roberto de Souza
0b03d93fde drm/i915: Extend Wa_1406941453 to adl-p
Workaround also needed for alderlake-P.

HSDES: 14010801662
Cc: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210722192041.92346-1-jose.souza@intel.com
2021-07-23 10:24:19 -07:00
Tvrtko Ursulin
eea97e42f4 drm/i915/xehp: VDBOX/VEBOX fusing registers are enable-based
On Xe_HP the fusing register is renamed and changed to have the "enable"
semantics, but otherwise remains compatible (mmio address, bitmask
ranges) with older platforms.

To simplify things we do not add a new register definition but just stop
inverting the fusing masks before processing them.

Bspec: 52615
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Matt Atwood <matthew.s.atwood@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210721223043.834562-6-matthew.d.roper@intel.com
2021-07-22 19:15:06 -07:00
Lucas De Marchi
265b5ee0d3 drm/i915/gt: rename legacy engine->hw_id to engine->gen6_hw_id
We kept adding new engines and for that increasing hw_id unnecessarily:
it's not used since GRAPHICS_VER == 8. Prepend "gen6" to the field and
try to pack it in the structs to give a hint this field is actually not
used in recent platforms.

Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210720232014.3302645-4-lucas.demarchi@intel.com
2021-07-22 17:02:20 -07:00
Lucas De Marchi
f9be30003f drm/i915/gt: nuke unused legacy engine hw_id
The engine hw_id is only used by RING_FAULT_REG(), which is not used
by GRAPHICS_VER >= 8. We did use hw_id on recent platforms to set
the engine's guc_id, but that is not the case anymore since
commit c784e5249e ("drm/i915/guc: Update to use firmware v49.0.1"):
now we only use class and id information to generate guc_id.

We tend to keep adding new defines just to be consistent, but let's try
to remove them and let them defined to 0 for engines that only exist on
gen8+ platforms.

v2: Reword commit message and add information about when we stopped
using hw_id (Matt Roper)

Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210720232014.3302645-3-lucas.demarchi@intel.com
2021-07-22 17:02:14 -07:00
Lucas De Marchi
7894375e27 drm/i915/gt: fix platform prefix
gen8_clear_engine_error_register() is actually not used by
GRAPHICS_VER >= 8, since for those we are using another register that is
not engine-dependent. Fix the platform prefix, to make clear we are not
using any GEN6_RING_FAULT_REG_* one GRAPHICS_VER >= 8.

Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210720232014.3302645-2-lucas.demarchi@intel.com
2021-07-22 16:51:58 -07:00
Matthew Brost
e03b59064b drm/i915: Add intel_context tracing
Add intel_context tracing. These trace points are particular helpful
when debugging the GuC firmware and can be enabled via
CONFIG_DRM_I915_LOW_LEVEL_TRACEPOINTS kernel config option.

Cc: John Harrison <john.c.harrison@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210721215101.139794-19-matthew.brost@intel.com
2021-07-22 10:07:30 -07:00
Matthew Brost
dbf9da8d55 drm/i915/guc: Add trace point for GuC submit
Add trace point for GuC submit. Extended existing request trace points
to include submit fence value,, guc_id, and ring tail value.

v2: Fix white space alignment in i915_request_add trace point
v3: Delete dep_from , dep_to (Tvrtko)

Cc: John Harrison <john.c.harrison@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210721215101.139794-18-matthew.brost@intel.com
2021-07-22 10:07:29 -07:00