Pull more drm fixes from Dave Airlie:
"This is the fixes for the last couple of weeks for i915 and last 3
weeks for amdgpu, lots of them but pretty scattered around and all
pretty small.
amdgpu:
- SR-IOV fixes
- DCN 3.2 fixes
- DC mclk handling fixes
- eDP fixes
- SubVP fixes
- HDCP regression fix
- DSC fixes
- DC FP fixes
- DCN 3.x fixes
- Display flickering fix when switching between vram and gtt
- Z8 power saving fix
- Fix hang when skipping modeset
- GPU reset fixes
- Doorbell fix when resizing BARs
- Fix spurious warnings in gmc
- Locking fix for AMDGPU_SCHED IOCTL
- SR-IOV fix
- DCN 3.1.4 fix
- DCN 3.2 fix
- Fix job cleanup when CS is aborted
i915:
- skl pipe source size check
- mtl transcoder mask fix
- DSI power on sequence fix
- GuC versioning corner case fix"
* tag 'drm-next-2023-05-05' of git://anongit.freedesktop.org/drm/drm: (48 commits)
drm/amdgpu: drop redundant sched job cleanup when cs is aborted
drm/amd/display: filter out invalid bits in pipe_fuses
drm/amd/display: Change default Z8 watermark values
drm/amdgpu: disable SDMA WPTR_POLL_ENABLE for SR-IOV
drm/amdgpu: add a missing lock for AMDGPU_SCHED
drm/amdgpu: fix an amdgpu_irq_put() issue in gmc_v9_0_hw_fini()
drm/amdgpu: fix amdgpu_irq_put call trace in gmc_v10_0_hw_fini
drm/amdgpu: fix amdgpu_irq_put call trace in gmc_v11_0_hw_fini
drm/amdgpu: Enable doorbell selfring after resize FB BAR
drm/amdgpu: Use the default reset when loading or reloading the driver
drm/amdgpu: Fix mode2 reset for sienna cichlid
drm/i915/dsi: Use unconditional msleep() instead of intel_dsi_msleep()
drm/i915/mtl: Add the missing CPU transcoder mask in intel_device_info
drm/i915/guc: Actually return an error if GuC version range check fails
drm/amd/display: Lowering min Z8 residency time
drm/amd/display: fix flickering caused by S/G mode
drm/amd/display: Set min_width and min_height capability for DCN30
drm/amd/display: Isolate remaining FPU code in DCN32
drm/amd/display: Update bounding box values for DCN321
drm/amd/display: Do not clear GPINT register when releasing DMUB from reset
...
When reduced version firmware files were added (matching major
component being the only strict requirement), the minor version was
still tracked and a notification reported if it was older. However,
the patch version should really be tracked as well for the same
reasons. The KMD can work without the change but if the effort has
been taken to release a new firmware with the change then there must
be a valid reason for doing so - important bug fix, security fix, etc.
And in that case it would be good to alert the user if they are
missing out on that new fix.
v2: Use correct patch version number and drop redunant debug print
(review by Daniele / CI results).
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230504202252.1104212-2-John.C.Harrison@Intel.com
The GSC notifies us of a proxy request via the HECI2 interrupt. The
interrupt must be enabled both in the HECI layer and in our usual gt irq
programming; for the latter, the interrupt is enabled via the same enable
register as the GSC CS, but it does have its own mask register. When the
interrupt is received, we also need to de-assert it in both layers.
The handling of the proxy request is deferred to the same worker that we
use for GSC load. New flags have been added to distinguish between the
init case and the proxy interrupt.
v2: Make sure not to set the reset bit when enabling/disabling the GSC
interrupts, fix defines (Alan)
v3: rebase on proxy status register check
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Alan Previn <alan.previn.teres.alexis@intel.com>
Reviewed-by: Alan Previn <alan.previn.teres.alexis@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230502163854.317653-5-daniele.ceraolospurio@intel.com
The GSC uC needs to communicate with the CSME to perform certain
operations. Since the GSC can't perform this communication directly
on platforms where it is integrated in GT, i915 needs to transfer the
messages from GSC to CSME and back.
The proxy flow is as follow:
1 - i915 submits a request to GSC asking for the message to CSME
2 - GSC replies with the proxy header + payload for CSME
3 - i915 sends the reply from GSC as-is to CSME via the mei proxy
component
4 - CSME replies with the proxy header + payload for GSC
5 - i915 submits a request to GSC with the reply from CSME
6 - GSC replies either with a new header + payload (same as step 2,
so we restart from there) or with an end message.
After GSC load, i915 is expected to start the first proxy message chain,
while all subsequent ones will be triggered by the GSC via interrupt.
To communicate with the CSME, we use a dedicated mei component, which
means that we need to wait for it to bind before we can initialize the
proxies. This usually happens quite fast, but given that there is a
chance that we'll have to wait a few seconds the GSC work has been moved
to a dedicated WQ to not stall other processes.
v2: fix code style, includes and variable naming (Alan)
v3: add extra check for proxy status, fix includes and comments
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Alan Previn <alan.previn.teres.alexis@intel.com>
Reviewed-by: Alan Previn <alan.previn.teres.alexis@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230502163854.317653-4-daniele.ceraolospurio@intel.com
The documentation is closer to not being kernel-doc, so just drop the
kernel-doc markers.
drivers/gpu/drm/i915/gt/uc/guc_capture_fwif.h:27: warning: Function parameter or member 'size' not described in '__guc_capture_bufstate'
drivers/gpu/drm/i915/gt/uc/guc_capture_fwif.h:27: warning: Function parameter or member 'data' not described in '__guc_capture_bufstate'
drivers/gpu/drm/i915/gt/uc/guc_capture_fwif.h:27: warning: Function parameter or member 'rd' not described in '__guc_capture_bufstate'
drivers/gpu/drm/i915/gt/uc/guc_capture_fwif.h:27: warning: Function parameter or member 'wr' not described in '__guc_capture_bufstate'
drivers/gpu/drm/i915/gt/uc/guc_capture_fwif.h:59: warning: Function parameter or member 'link' not described in '__guc_capture_parsed_output'
drivers/gpu/drm/i915/gt/uc/guc_capture_fwif.h:59: warning: Function parameter or member 'is_partial' not described in '__guc_capture_parsed_output'
drivers/gpu/drm/i915/gt/uc/guc_capture_fwif.h:59: warning: Function parameter or member 'eng_class' not described in '__guc_capture_parsed_output'
drivers/gpu/drm/i915/gt/uc/guc_capture_fwif.h:59: warning: Function parameter or member 'eng_inst' not described in '__guc_capture_parsed_output'
drivers/gpu/drm/i915/gt/uc/guc_capture_fwif.h:59: warning: Function parameter or member 'guc_id' not described in '__guc_capture_parsed_output'
drivers/gpu/drm/i915/gt/uc/guc_capture_fwif.h:59: warning: Function parameter or member 'lrca' not described in '__guc_capture_parsed_output'
drivers/gpu/drm/i915/gt/uc/guc_capture_fwif.h:59: warning: Function parameter or member 'reginfo' not described in '__guc_capture_parsed_output'
drivers/gpu/drm/i915/gt/uc/guc_capture_fwif.h:62: warning: wrong kernel-doc identifier on line:
* struct guc_debug_capture_list_header / struct guc_debug_capture_list
drivers/gpu/drm/i915/gt/uc/guc_capture_fwif.h:80: warning: wrong kernel-doc identifier on line:
* struct __guc_mmio_reg_descr / struct __guc_mmio_reg_descr_group
drivers/gpu/drm/i915/gt/uc/guc_capture_fwif.h:105: warning: wrong kernel-doc identifier on line:
* struct guc_state_capture_header_t / struct guc_state_capture_t /
drivers/gpu/drm/i915/gt/uc/guc_capture_fwif.h:163: warning: Function parameter or member 'is_valid' not described in '__guc_capture_ads_cache'
drivers/gpu/drm/i915/gt/uc/guc_capture_fwif.h:163: warning: Function parameter or member 'ptr' not described in '__guc_capture_ads_cache'
drivers/gpu/drm/i915/gt/uc/guc_capture_fwif.h:163: warning: Function parameter or member 'size' not described in '__guc_capture_ads_cache'
drivers/gpu/drm/i915/gt/uc/guc_capture_fwif.h:163: warning: Function parameter or member 'status' not described in '__guc_capture_ads_cache'
drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h:491: warning: Function parameter or member 'marker' not described in 'guc_log_buffer_state'
drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h:491: warning: Function parameter or member 'read_ptr' not described in 'guc_log_buffer_state'
drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h:491: warning: Function parameter or member 'write_ptr' not described in 'guc_log_buffer_state'
drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h:491: warning: Function parameter or member 'size' not described in 'guc_log_buffer_state'
drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h:491: warning: Function parameter or member 'sampled_write_ptr' not described in 'guc_log_buffer_state'
drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h:491: warning: Function parameter or member 'wrap_offset' not described in 'guc_log_buffer_state'
drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h:491: warning: Function parameter or member 'flush_to_file' not described in 'guc_log_buffer_state'
drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h:491: warning: Function parameter or member 'buffer_full_cnt' not described in 'guc_log_buffer_state'
drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h:491: warning: Function parameter or member 'reserved' not described in 'guc_log_buffer_state'
drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h:491: warning: Function parameter or member 'flags' not described in 'guc_log_buffer_state'
drivers/gpu/drm/i915/gt/uc/intel_guc_fwif.h:491: warning: Function parameter or member 'version' not described in 'guc_log_buffer_state'
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/9c210d53fdbd6da5fac42e435855d269504919d7.1683041799.git.jani.nikula@intel.com
GuC based register dumps in error capture logs were basically broken
for virtual engines. This can be seen in igt@gem_exec_balancer@hang:
[IGT] gem_exec_balancer: starting subtest hang
[drm] GPU HANG: ecode 12:4:e1524110, in gem_exec_balanc [6388]
[drm] GT0: GUC: No register capture node found for 0x1005 / 0xFEDC311D
[drm] GPU HANG: ecode 12:4:00000000, in gem_exec_balanc [6388]
[IGT] gem_exec_balancer: exiting, ret=0
The test causes a hang on both engines of a virtual engine context.
The engine instance zero hang gets a valid error capture but the
non-instance-zero hang does not.
Fix that by scanning through the list of pending register captures
when a hang notification for a virtual engine is received. That way,
the hang can be assigned to the correct physical engine prior to
starting the error capture process. So later on, when the error capture
handler tries to find the engine register list, it looks for one on
the correct engine.
Also, sneak in a missing blank line before a comment in the node
search code.
v2: Fix null pointer deref on non-GuC platforms.
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Alan Previn <alan.previn.teres.alexis@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230428185636.457407-5-John.C.Harrison@Intel.com
GPU accumulates the context runtime in a 32 bit counter - CTX_TIMESTAMP
in the context image. This value is saved/restored on context switches.
KMD accumulates these values into a 64 bit counter taking care of any
overflows as needed. This count provides the basis for client specific
busyness in the fdinfo interface.
KMD accumulation happens just before the context is unpinned and when
context switches out. This works for execlist back-end since execlist
scheduling has visibility into context switches. With GuC mode, KMD does
not have visibility into context switches and this counter is
accumulated only when context is unpinned. Context is unpinned once the
context scheduling is successfully disabled. Disabling context
scheduling is an asynchronous operation. Also if a context is servicing
frequent requests, scheduling may never be disabled on it.
For GuC mode, since updates to the context runtime may be delayed, add
hooks to update the context runtime in a worker thread as well as when
a user queries for it.
Limitation:
- If a context is never switched out or runs for a long period of time,
the runtime value of CTX_TIMESTAMP may never be updated, so the
counter value may be unreliable. This patch does not support such
cases. Such support must be available from the GuC FW and it is WIP.
This patch is an extract from previous work authored by John/Umesh here -
https://patchwork.freedesktop.org/patch/496441/?series=105085&rev=4
v2: (Ashutosh)
- Drop COPS_RUNTIME_ACTIVE_TOTAL
- s/guc_context_update_clks/__guc_context_update_stats
- Pin context before accessing in guc_timestamp_ping
- In guc_context_unpin, use spinlock to serialize access to runtime stats
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Co-developed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230427224705.2785566-2-umesh.nerlige.ramappa@intel.com
SLPC enables use of efficient freq at init by default. It is
possible for GuC to request frequencies that are higher than
the 'software' max if user has set it lower than the efficient
level.
Scenarios/tests that require strict fixing of freq below the efficient
level will need to disable it through this interface.
v2: Keep just one interface to toggle sysfs. With this, user will
be completely responsible for toggling efficient frequency if need
be. There will be no implicit disabling when user sets min < RP1 (Ashutosh)
v3: Remove unused label, review comments (Ashutosh)
v4: Toggle efficient freq usage in SLPC selftest and checkpatch fixes
v5: Review comments (Andi) and add a separate patch for selftest updates
Fixes: 95ccf312a1 ("drm/i915/guc/slpc: Allow SLPC to use efficient frequency")
Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230426003942.1924347-1-vinay.belgaumkar@intel.com
Pull driver core updates from Greg KH:
"Here is the large set of driver core changes for 6.4-rc1.
Once again, a busy development cycle, with lots of changes happening
in the driver core in the quest to be able to move "struct bus" and
"struct class" into read-only memory, a task now complete with these
changes.
This will make the future rust interactions with the driver core more
"provably correct" as well as providing more obvious lifetime rules
for all busses and classes in the kernel.
The changes required for this did touch many individual classes and
busses as many callbacks were changed to take const * parameters
instead. All of these changes have been submitted to the various
subsystem maintainers, giving them plenty of time to review, and most
of them actually did so.
Other than those changes, included in here are a small set of other
things:
- kobject logging improvements
- cacheinfo improvements and updates
- obligatory fw_devlink updates and fixes
- documentation updates
- device property cleanups and const * changes
- firwmare loader dependency fixes.
All of these have been in linux-next for a while with no reported
problems"
* tag 'driver-core-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (120 commits)
device property: make device_property functions take const device *
driver core: update comments in device_rename()
driver core: Don't require dynamic_debug for initcall_debug probe timing
firmware_loader: rework crypto dependencies
firmware_loader: Strip off \n from customized path
zram: fix up permission for the hot_add sysfs file
cacheinfo: Add use_arch[|_cache]_info field/function
arch_topology: Remove early cacheinfo error message if -ENOENT
cacheinfo: Check cache properties are present in DT
cacheinfo: Check sib_leaf in cache_leaves_are_shared()
cacheinfo: Allow early level detection when DT/ACPI info is missing/broken
cacheinfo: Add arm64 early level initializer implementation
cacheinfo: Add arch specific early level initializer
tty: make tty_class a static const structure
driver core: class: remove struct class_interface * from callbacks
driver core: class: mark the struct class in struct class_interface constant
driver core: class: make class_register() take a const *
driver core: class: mark class_release() as taking a const *
driver core: remove incorrect comment for device_create*
MIPS: vpe-cmp: remove module owner pointer from struct class usage.
...
On dGfx, the PL1 power limit being enabled and set to a low value results
in a low GPU operating freq. It also negates the freq raise operation which
is done before GuC firmware load. As a result GuC firmware load can time
out. Such timeouts were seen in the GL #8062 bug below (where the PL1 power
limit was enabled and set to a low value). Therefore disable the PL1 power
limit when allowed by HW when loading GuC firmware.
v2:
- Take mutex (to disallow writes to power1_max) across GuC reset/fw load
- Add hwm_power_max_restore to error return code path
v3 (Jani N):
- Add/remove explanatory comments
- Function renames
- Type corrections
- Locking annotation
v4:
- Don't hold the lock across GuC reset (Rodrigo)
- New locking scheme (suggested by Rodrigo)
- Eliminate rpm_get in power_max_disable/restore, not needed (Tvrtko)
v5:
- Fix uninitialized pl1en variable compile warning reported by kernel
build robot by creating new err_rps label
Link: https://gitlab.freedesktop.org/drm/intel/-/issues/8062
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230420164041.1428455-3-ashutosh.dixit@intel.com
This patch implements Wa_22016122933.
In MTL, memory writes initiated by the Media tile update the whole
cache line, even for partial writes. This creates a coherency
problem for cacheable memory if both CPU and GPU are writing data
to different locations within a single cache line.
This patch circumvents the issue by making CPU/GPU shared memory
uncacheable (WC on CPU side, and PAT index 2 for GPU). Additionally,
it ensures that CPU writes are visible to the GPU with an
intel_guc_write_barrier().
While fixing the CTB issue, we noticed some random GSC firmware
loading failure because the share buffers are cacheable (WB) on CPU
side but uncached on GPU side. To fix these issues we need to map
such shared buffers as WC on CPU side. Since such allocations are
not all done through GuC allocator, to avoid too many code changes,
the i915_coherent_map_type() is now hard coded to return WC for MTL.
v2: Simplify the commit message(Matt).
BSpec: 45101
Signed-off-by: Fei Yang <fei.yang@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Acked-by: Nirmoy Das <nirmoy.das@intel.com>
Reviewed-by: Andrzej Hajda <andrzej.hajda@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230424182902.3663500-3-fei.yang@intel.com
When HuC is loaded by GSC, there is no header definition for the kernel
to look at and firmware is just handed to GSC. However when reading the
version, it should still check the size of the blob to guarantee it's not
incurring into out-of-bounds array access.
If firmware is smaller than expected, the following message is now
printed:
# echo boom > /lib/firmware/i915/dg2_huc_gsc.bin
# dmesg | grep -i huc
[drm] GT0: HuC firmware i915/dg2_huc_gsc.bin: invalid size: 5 < 184
[drm] *ERROR* GT0: HuC firmware i915/dg2_huc_gsc.bin: fetch failed -ENODATA
...
Even without this change the size, header and signature are still
checked by GSC when loading, so this only avoids the out-of-bounds array
access.
Fixes: a7b516bd98 ("drm/i915/huc: Add fetch support for gsc-loaded HuC binary")
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Alan Previn <alan.previn.teres.alexis@intel.com>
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230413200349.3492571-1-lucas.demarchi@intel.com
(cherry picked from commit adfbae9ffe)
Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
When HuC is loaded by GSC, there is no header definition for the kernel
to look at and firmware is just handed to GSC. However when reading the
version, it should still check the size of the blob to guarantee it's not
incurring into out-of-bounds array access.
If firmware is smaller than expected, the following message is now
printed:
# echo boom > /lib/firmware/i915/dg2_huc_gsc.bin
# dmesg | grep -i huc
[drm] GT0: HuC firmware i915/dg2_huc_gsc.bin: invalid size: 5 < 184
[drm] *ERROR* GT0: HuC firmware i915/dg2_huc_gsc.bin: fetch failed -ENODATA
...
Even without this change the size, header and signature are still
checked by GSC when loading, so this only avoids the out-of-bounds array
access.
Fixes: a7b516bd98 ("drm/i915/huc: Add fetch support for gsc-loaded HuC binary")
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Alan Previn <alan.previn.teres.alexis@intel.com>
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230413200349.3492571-1-lucas.demarchi@intel.com
UAPI Changes:
- (Build-time only, should not have any impact)
drm/i915/uapi: Replace fake flex-array with flexible-array member
"Zero-length arrays as fake flexible arrays are deprecated and we are
moving towards adopting C99 flexible-array members instead."
This is on core kernel request moving towards GCC 13.
Driver Changes:
- Fix context runtime accounting on sysfs fdinfo for heavy workloads (Tvrtko)
- Add support for OA media units on MTL (Umesh)
- Add new workarounds for Meteorlake (Daniele, Radhakrishna, Haridhar)
- Fix sysfs to read actual frequency for MTL and Gen6 and earlier
(Ashutosh)
- Synchronize i915/BIOS on C6 enabling on MTL (Vinay)
- Fix DMAR error noise due to GPU error capture (Andrej)
- Fix forcewake during BAR resize on discrete (Andrzej)
- Flush lmem contents after construction on discrete (Chris)
- Fix GuC loading timeout on systems where IFWI programs low boot
frequency (John)
- Fix race condition UAF in i915_perf_add_config_ioctl (Min)
- Sanitycheck MMIO access early in driver load and during forcewake
(Matt)
- Wakeref fixes for GuC RC error scenario and active VM tracking (Chris)
- Cancel HuC delayed load timer on reset (Daniele)
- Limit double GT reset to pre-MTL (Daniele)
- Use i915 instead of dev_priv insied the file_priv structure (Andi)
- Improve GuC load error reporting (John)
- Simplify VCS/BSD engine selection logic (Tvrtko)
- Perform uc late init after probe error injection (Andrzej)
- Fix format for perf_limit_reasons in debugfs (Vinay)
- Create per-gt debugfs files (Andi)
- Documentation and kerneldoc fixes (Nirmoy, Lee)
- Selftest improvements (Fei, Jonathan)
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
From: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/ZC6APj/feB+jBf2d@jlahtine-mobl.ger.corp.intel.com
We need the fixes in here for testing, as well as the driver core
changes for documentation updates to build on.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The WA states that we need to alert the GSC FW before doing a GSC engine
reset and then wait for 200ms. The GuC owns engine reset, so on the i915
side we only need to apply this for full GT reset.
Given that we do full GT resets in the resume paths to cleanup the HW
state and that a long wait in those scenarios would not be acceptable,
a faster path has been introduced where, if the GSC is idle, we try first
to individually reset the GuC and all engines except the GSC and only fall
back to full reset if that fails.
Note: according to the WA specs, if the GSC is idle it should be possible
to only wait for the uC wakeup time (~15ms) instead of the whole 200ms.
However, the GSC FW team have mentioned that the wakeup time can change
based on other things going on in the HW and pcode, so a good security
margin would be required. Given that when the GSC is idle we already
skip the wait & reset entirely and that this reduced wait would still
likely be too long to use in resume paths, it's not worth adding support
for this reduced wait.
v2: add comment to explain why it is safe to skip the GSC reset (John)
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: John Harrison <john.c.harrison@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230323231857.2194435-2-daniele.ceraolospurio@intel.com
There are multiple ways in which the GuC load can fail. The driver was
reporting the status register as is, but not everyone can read the
matrix unfiltered. So add decoding of the common error cases.
Also, remove the comment about interrupt based load completion
checking being not recommended. The interrupt was removed from the GuC
firmware some time ago so it is no longer an option anyway. While at
it, also abort the timeout if a known error code is reported. No need
to keep waiting if the GuC has already given up the load.
v2: Fix mis-matched case and confusing 'success' variable (Daniele).
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230316220632.3312218-2-John.C.Harrison@Intel.com
Add function that takes care of sending command to gsc cs. We start
of with allocation of memory for our command intel_hdcp_gsc_message that
contains gsc cs memory header as directed in specs followed by the
actual payload hdcp message that we want to send.
Spec states that we need to poll pending bit of response header around
20 times each try being 50ms apart hence adding that to current
gsc_msg_send function
Also we use the same function to take care of both sending and receiving
hence no separate function to get the response.
--v4
-Create common function to fill in gsc_mtl_header [Alan]
-define host session bitmask [Alan]
--v5
-use i915 directly instead of gt->i915 [Alan]
-No need to make fields NULL as we are already
using kzalloc [Alan]
--v8
-change mechanism to reuse the same memory for one hdcp session[Alan]
-fix header ordering
-add comments to explain flags and host session mask [Alan]
--v9
-remove gem obj from hdcp message as we can use
i915_vma_unpin_and_release [Alan]
-move hdcp message allocation and deallocation from hdcp2_enable and
hdcp2_disable to init and teardown of HDCP [Alan]
--v10
-remove unnecessary i915_vma_unpin [Alan]
--v11
-fix comment style [Uma]
Cc: Ankit Nautiyal <ankit.k.nautiyal@intel.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Alan Pervin Teres <alan.previn.teres.alexis@intel.com>
Cc: Uma Shankar <uma.shankar@intel.com>
Cc: Anshuman Gupta <anshuman.gupta@intel.com>
Signed-off-by: Suraj Kandpal <suraj.kandpal@intel.com>
Reviewed-by: Alan Previn <alan.previn.teres.alexis@intel.com>
Reviewed-by: Uma Shankar <uma.shankar@intel.com>
Signed-off-by: Uma Shankar <uma.shankar@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230316092927.668980-6-suraj.kandpal@intel.com
To support multi-GT configurations, we need to generate
independent debug files for each GT.
To achieve this create a separate directory for each GT under the
debugfs directory. For instance, in a system with two GTs, the
debugfs structure would look like this:
/sys/kernel/debug/dri
└── 0
├── gt0
│ ├── drpc
│ ├── engines
│ ├── forcewake
│ ├── frequency
│ └── rps_boost
└── gt1
: ├── drpc
: ├── engines
: ├── forcewake
├── frequency
└── rps_boost
Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Radhakrishna Sripada <radhakrishna.sripada@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230318203616.183765-2-andi.shyti@linux.intel.com
The Wa_14017073508 require to send Media Busy/Idle mailbox while
accessing Media tile. As of now it is getting handled while __gt_unpark,
__gt_park. But there are various corner cases where forcewakes are taken
without __gt_unpark i.e. without sending Busy Mailbox especially during
register reads. Forcewakes are taken without busy mailbox leads to
GPU HANG. So bringing mailbox calls under forcewake calls are no feasible
option as forcewake calls are atomic and mailbox calls are blocking.
The issue already fixed in B step so disabling MC6 on A step and
reverting previous commit which handles Wa_14017073508
Fixes: 8f70f1ec58 ("drm/i915/mtl: Add Wa_14017073508 for SAMedia")
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Badal Nilawar <badal.nilawar@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Anshuman Gupta <anshuman.gupta@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230310061339.2495416-2-badal.nilawar@intel.com
(cherry picked from commit 038a24835a)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
The Wa_14017073508 require to send Media Busy/Idle mailbox while
accessing Media tile. As of now it is getting handled while __gt_unpark,
__gt_park. But there are various corner cases where forcewakes are taken
without __gt_unpark i.e. without sending Busy Mailbox especially during
register reads. Forcewakes are taken without busy mailbox leads to
GPU HANG. So bringing mailbox calls under forcewake calls are no feasible
option as forcewake calls are atomic and mailbox calls are blocking.
The issue already fixed in B step so disabling MC6 on A step and
reverting previous commit which handles Wa_14017073508
Fixes: 8f70f1ec58 ("drm/i915/mtl: Add Wa_14017073508 for SAMedia")
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Badal Nilawar <badal.nilawar@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Anshuman Gupta <anshuman.gupta@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230310061339.2495416-2-badal.nilawar@intel.com
The CI results for the 'fast request' patch set (enables error return
codes for fire-and-forget H2G messages) hit an issue with the KMD
sending context submission requests on an invalid context. That was
caused by a fault injection probe failing the context creation of a
kernel context. However, there was no return code checking on any of
the kernel context registration paths. So the driver kept going and
tried to use the kernel context for the record defaults process.
This would not cause any actual problems. The invalid requests would
be rejected by GuC and ultimately the start up sequence would
correctly wedge due to the context creation failure. But fixing the
issue correctly rather ignoring it means we won't get CI complaining
when the fast request patch lands and enables the extra error checking.
So fix it by checking for errors and aborting as appropriate when
creating kernel contexts. While at it, clean up some other submission
init related failure cleanup paths. Also, rename guc_init_lrc_mapping
to guc_init_submission as the former name hasn't been valid in a long
time.
v2: Add another wrapper to keep the flow balanced (Daniele)
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230217223308.3449737-3-John.C.Harrison@Intel.com
The GSC FW load is a slow process (up to 250 ms), so we defer it to a
dedicated worker to avoid stalling the init flow for that long. However,
we currently start this worker before the HW init is complete, so there
is a chance that the GSC loading code submits to the HW before the
engine initialization has completed. We can easily fix this by starting
the thread later in the gt_resume flow.
From this later spot, the GSC code can still race with the default
submission code; we functionally don't care who wins the race (the GSC
load doesn't need any state), but since the whole point of the separate
worker is to make the main thread faster, we prefer the default
submission code to run first. Therefore, make an exception for driver
probe and only and start the gsc load from uc_init_late.
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Alan Previn <alan.previn.teres.alexis@intel.com>
Reviewed-by: Alan Previn <alan.previn.teres.alexis@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230223172120.3304293-3-daniele.ceraolospurio@intel.com
If we unload the driver and wedge before the GSC worker is complete,
the worker will hit an error on its submission to the GSC engine and
then exit. This is hard to hit for a user, but it is reproducible
with skipping selftests. The error is handled gracefully by the
worker, so there are no functional issues, but we still end up with
an error message in dmesg, which is something we want to avoid as
this is a supported scenario. We could modify the worker to better
handle a wedging occurring during its execution, but that gets
complicated for a couple of reasons:
- We do want the error on runtime wedging, because there are
implications for subsystems outside of GT (i.e., PXP, HDCP), it's
only the error on driver unload that we want to silence.
- The worker is responsible for multiple submissions (GSC FW load,
HuC auth, SW proxy), so all of those will have to be adapted to
handle the wedged_on_fini scenario.
Therefore, it's much simpler to just wait for the worker to be done
before wedging on driver removal, also considering that the worker
will likely already be idle in the great majority of non-selftest
scenarios.
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Alan Previn <alan.previn.teres.alexis@intel.com>
Reviewed-by: Alan Previn <alan.previn.teres.alexis@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230223172120.3304293-2-daniele.ceraolospurio@intel.com
Register 0x9424 is not replicated on any platform, so it shouldn't be
declared with REG_MCR(). Declaring it with _MMIO() is basically
duplicate of the GEN7 version, so just remove the GEN8 and change all
the callers to use the right functions.
Old versions of the gen8 bspec page used to contain a table with MCR
registers, apparently implying 0x9400 - 0x94ff registers were
replicated. However that table went away and there is no information
related to the ranges for gen8 anymore. Moreover the current behavior of
the driver wouldn't do anything special for 0x9424 since there is no
equivalent table in intel_gt_mcr.c: the driver would just fallback to
intel_uncore_{read,write}(). Therefore, do not care about the possible
special case for gen8 and just use the register as non-MCR for all the
platforms.
One place doing read + write is also converted to intel_uncore_rmw().
v2: Reword commit message adding the justification wrt gen8
Fixes: a9e69428b1 ("drm/i915: Define MCR registers explicitly")
Cc: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Gustavo Sousa <gustavo.sousa@intel.com>
Cc: Matt Atwood <matthew.s.atwood@intel.com>
Cc: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230206165410.3056073-1-lucas.demarchi@intel.com
(cherry picked from commit 869bace73a)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>