Driver Changes:
- Avoid infinite GPU waits by avoidin premature release of request's
reusable memory (Chris, Janusz)
- Expose RPS thresholds in sysfs (Tvrtko)
- Apply GuC SLPC min frequency softlimit correctly (Vinay)
- Restore SLPC efficient freq earlier (Vinay)
- Consider OA buffer boundary when zeroing out reports (Umesh)
- Extend Wa_14015795083 to TGL, RKL, DG1 and ADL (Matt R)
- Fix context workarounds with non-masked regs on MTL/DG2 (Lucas)
- Enable the CCS_FLUSH bit in the pipe control and in the CS for MTL+ (Andi)
- Update MTL workarounds 14018778641, 22016122933 (Tejas, Zhanjun)
- Ensure memory quiesced before AUX CCS invalidation (Jonathan)
- Add a gsc_info debugfs (Daniele)
- Invalidate the TLBs on each GT on multi-GT device (Chris)
- Fix a VMA UAF for multi-gt platform (Nirmoy)
- Do not use stolen on MTL due to HW bug (Nirmoy)
- Check HuC and GuC version compatibility on MTL (Daniele)
- Dump perf_limit_reasons for slow GuC init debug (Vinay)
- Replace kmap() with kmap_local_page() (Sumitra, Ira)
- Add sentinel to xehp_oa_b_counters for KASAN (Andrzej)
- Add the gen12_needs_ccs_aux_inv helper (Andi)
- Fixes and updates for GSC memory allocation (Daniele)
- Fix one wrong caching mode enum usage (Tvrtko)
- Fixes for GSC wakeref (Alan)
- Static checker fixes (Harshit, Arnd, Dan, Cristophe, David, Andi)
- Rename flags with bit_group_X according to the datasheet (Andi)
- Use direct alias for i915 in requests (Andrzej)
- Replace i915->gt0 with to_gt(i915) (Andi)
- Use the i915_vma_flush_writes helper (Tvrtko)
- Selftest improvements (Alan)
- Remove dead code (Tvrtko)
Signed-off-by: Dave Airlie <airlied@redhat.com>
# Conflicts:
# drivers/gpu/drm/i915/gt/uc/intel_gsc_fw.c
From: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/ZMy6kDd9npweR4uy@jlahtine-mobl.ger.corp.intel.com
On MTL, if the GSC Proxy init flows haven't completed, submissions to the
GSC engine will fail. Those init flows are dependent on the mei's
gsc_proxy component that is loaded in parallel with i915 and a
worker that could potentially start after i915 driver init is done.
That said, all subsytems that access the GSC engine today does check
for such init flow completion before using the GSC engine. However,
selftests currently don't wait on anything before starting.
To fix this, add a waiter function at the start of __run_selftests
that waits for gsc-proxy init flows to complete. Selftests shouldn't
care if the proxy-init failed as that should be flagged elsewhere.
Difference from prior versions:
v7: - Change the fw status to INTEL_UC_FIRMWARE_LOAD_FAIL if the
proxy-init fails so that intel_gsc_uc_fw_proxy_get_status
catches it. (Daniele)
v6: - Add a helper that returns something more than a boolean
so we selftest can stop waiting if proxy-init hadn't
completed but failed (Daniele).
v5: - Move the call to __wait_gsc_proxy_completed from common
__run_selftests dispatcher to the group-level selftest
function (Trvtko).
- change the pr_info to pr_warn if we hit the timeout.
v4: - Remove generalized waiters function table framework (Tvrtko).
- Remove mention of CI-framework-timeout from comments (Tvrtko).
v3: - Rebase to latest drm-tip.
v2: - Based on internal testing, increase the timeout for gsc-proxy
specific case to 8 seconds.
Signed-off-by: Alan Previn <alan.previn.teres.alexis@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230720230126.375566-1-alan.previn.teres.alexis@intel.com
intel_gsc_uc_fw_proxy_init_done is used by a few code paths
and usages. However, certain paths need a wakeref while others
can't take a wakeref such as from the runtime_pm_resume callstack.
Add a param into this helper to allow callers to direct whether
to take the wakeref or not. This resolves the following bug:
INFO: task sh:2607 blocked for more than 61 seconds.
Not tainted 6.3.0-pxp-gsc-final-jun14+ #297
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:sh state:D stack:13016 pid:2607 ppid:2602 flags:0x00004000
Call Trace:
<TASK>
__schedule+0x47b/0xe10
schedule+0x58/0xd0
rpm_resume+0x1cc/0x800
? __pfx_autoremove_wake_function+0x10/0x10
__pm_runtime_resume+0x42/0x80
__intel_runtime_pm_get+0x19/0x80 [i915]
gsc_uc_get_fw_status+0x10/0x50 [i915]
intel_gsc_uc_fw_init_done+0x9/0x20 [i915]
intel_gsc_uc_load_start+0x5b/0x130 [i915]
__uc_resume+0xa5/0x280 [i915]
intel_runtime_resume+0xd4/0x250 [i915]
? __pfx_pci_pm_runtime_resume+0x10/0x10
__rpm_callback+0x3c/0x160
Fixes: 8c33c3755b ("drm/i915/gsc: take a wakeref for the proxy-init-completion check")
Signed-off-by: Alan Previn <alan.previn.teres.alexis@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230615211940.4061378-1-alan.previn.teres.alexis@intel.com
A few fixes/updates are required around the GSC memory allocation and it
is easier to do them all at the same time. The changes are as follows:
1 - Switch the memory allocation to stolen memory. We need to avoid
accesses from GSC FW to normal memory after the suspend function has
completed and to do so we can either switch to using stolen or make sure
the GSC is gone to sleep before the end of the suspend function. Given
that the GSC waits for a bit before going idle even if there are no
pending operations, it is easier and quicker to just use stolen memory.
2 - Reduce the GSC allocation size to 4MBs, which is the POR requirement.
The 8MBs were needed only for early FW and I had misunderstood that as
being the expected POR size when I sent the original patch.
3 - Perma-map the GSC allocation. This isn't required immediately, but it
will be needed later to be able to quickly extract the GSC logs, which are
inside the allocation. Since the mapping code needs to be rewritten due to
switching to stolen, it makes sense to do the switch immediately to avoid
having to change it again later.
Note that the explicit setting of CACHE_NONE for Wa_22016122933 has been
dropped because that's the default setting for stolen memory on !LLC
platforms.
v2: only memset the memory we're not overwriting (Alan)
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Alan Previn <alan.previn.teres.alexis@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Cc: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Reviewed-by: Alan Previn <alan.previn.teres.alexis@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230612181529.2222451-2-daniele.ceraolospurio@intel.com
Now that each FW has its own reserved area, we can keep them always
pinned and skip the pin/unpin dance on reset. This will make things
easier for the 2-step HuC authentication, which requires the FW to be
pinned in GGTT after the xfer is completed.
Since the vma is now valid for a long time and not just for the quick
pin-load-unpin dance, the name "dummy" is no longer appropriare and has
been replaced with vma_res. All the functions have also been updated to
operate on vma_res for consistency.
Given that we pin the vma behind the allocator's back (which is ok
because we do the pinning in an area that was previously reserved for
thus purpose), we do need to explicitly re-pin on resume because the
automated helper won't cover us.
v2: better comments and commit message, s/dummy/vma_res/
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Alan Previn <alan.previn.teres.alexis@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230531235415.1467475-2-daniele.ceraolospurio@intel.com
The GSC notifies us of a proxy request via the HECI2 interrupt. The
interrupt must be enabled both in the HECI layer and in our usual gt irq
programming; for the latter, the interrupt is enabled via the same enable
register as the GSC CS, but it does have its own mask register. When the
interrupt is received, we also need to de-assert it in both layers.
The handling of the proxy request is deferred to the same worker that we
use for GSC load. New flags have been added to distinguish between the
init case and the proxy interrupt.
v2: Make sure not to set the reset bit when enabling/disabling the GSC
interrupts, fix defines (Alan)
v3: rebase on proxy status register check
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Alan Previn <alan.previn.teres.alexis@intel.com>
Reviewed-by: Alan Previn <alan.previn.teres.alexis@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230502163854.317653-5-daniele.ceraolospurio@intel.com
The GSC uC needs to communicate with the CSME to perform certain
operations. Since the GSC can't perform this communication directly
on platforms where it is integrated in GT, i915 needs to transfer the
messages from GSC to CSME and back.
The proxy flow is as follow:
1 - i915 submits a request to GSC asking for the message to CSME
2 - GSC replies with the proxy header + payload for CSME
3 - i915 sends the reply from GSC as-is to CSME via the mei proxy
component
4 - CSME replies with the proxy header + payload for GSC
5 - i915 submits a request to GSC with the reply from CSME
6 - GSC replies either with a new header + payload (same as step 2,
so we restart from there) or with an end message.
After GSC load, i915 is expected to start the first proxy message chain,
while all subsequent ones will be triggered by the GSC via interrupt.
To communicate with the CSME, we use a dedicated mei component, which
means that we need to wait for it to bind before we can initialize the
proxies. This usually happens quite fast, but given that there is a
chance that we'll have to wait a few seconds the GSC work has been moved
to a dedicated WQ to not stall other processes.
v2: fix code style, includes and variable naming (Alan)
v3: add extra check for proxy status, fix includes and comments
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Alan Previn <alan.previn.teres.alexis@intel.com>
Reviewed-by: Alan Previn <alan.previn.teres.alexis@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230502163854.317653-4-daniele.ceraolospurio@intel.com
The GSC FW load is a slow process (up to 250 ms), so we defer it to a
dedicated worker to avoid stalling the init flow for that long. However,
we currently start this worker before the HW init is complete, so there
is a chance that the GSC loading code submits to the HW before the
engine initialization has completed. We can easily fix this by starting
the thread later in the gt_resume flow.
From this later spot, the GSC code can still race with the default
submission code; we functionally don't care who wins the race (the GSC
load doesn't need any state), but since the whole point of the separate
worker is to make the main thread faster, we prefer the default
submission code to run first. Therefore, make an exception for driver
probe and only and start the gsc load from uc_init_late.
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Alan Previn <alan.previn.teres.alexis@intel.com>
Reviewed-by: Alan Previn <alan.previn.teres.alexis@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230223172120.3304293-3-daniele.ceraolospurio@intel.com
If we unload the driver and wedge before the GSC worker is complete,
the worker will hit an error on its submission to the GSC engine and
then exit. This is hard to hit for a user, but it is reproducible
with skipping selftests. The error is handled gracefully by the
worker, so there are no functional issues, but we still end up with
an error message in dmesg, which is something we want to avoid as
this is a supported scenario. We could modify the worker to better
handle a wedging occurring during its execution, but that gets
complicated for a couple of reasons:
- We do want the error on runtime wedging, because there are
implications for subsystems outside of GT (i.e., PXP, HDCP), it's
only the error on driver unload that we want to silence.
- The worker is responsible for multiple submissions (GSC FW load,
HuC auth, SW proxy), so all of those will have to be adapted to
handle the wedged_on_fini scenario.
Therefore, it's much simpler to just wait for the worker to be done
before wedging on driver removal, also considering that the worker
will likely already be idle in the great majority of non-selftest
scenarios.
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Alan Previn <alan.previn.teres.alexis@intel.com>
Reviewed-by: Alan Previn <alan.previn.teres.alexis@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230223172120.3304293-2-daniele.ceraolospurio@intel.com