Pull drm updates from Dave Airlie:
"There are two external interactions of note, the msm tree pull in some
opp tree, hopefully the opp tree arrives from the same git tree
however it normally does.
There is also a new cgroup controller for device memory, that is used
by drm, so is merging through my tree. This will hopefully help open
up gpu cgroup usage a bit more and move us forward.
There is a new accelerator driver for the AMD XDNA Ryzen AI NPUs.
Then the usual xe/amdgpu/i915/msm leaders and lots of changes and
refactors across the board:
core:
- device memory cgroup controller added
- Remove driver date from drm_driver
- Add drm_printer based hex dumper
- drm memory stats docs update
- scheduler documentation improvements
new driver:
- amdxdna - Ryzen AI NPU support
connector:
- add a mutex to protect ELD
- make connector setup two-step
panels:
- Introduce backlight quirks infrastructure
- New panels: KDB KD116N2130B12, Tianma TM070JDHG34-00,
- Multi-Inno Technology MI1010Z1T-1CP11
bridge:
- ti-sn65dsi83: Add ti,lvds-vod-swing optional properties
- Provide default implementation of atomic_check for HDMI bridges
- it605: HDCP improvements, MCCS Support
xe:
- make OA buffer size configurable
- GuC capture fixes
- add ufence and g2h flushes
- restore system memory GGTT mappings
- ioctl fixes
- SRIOV PF scheduling priority
- allow fault injection
- lots of improvements/refactors
- Enable GuC's WA_DUAL_QUEUE for newer platforms
- IRQ related fixes and improvements
i915:
- More accurate engine busyness metrics with GuC submission
- Ensure partial BO segment offset never exceeds allowed max
- Flush GuC CT receive tasklet during reset preparation
- Some DG2 refactor to fix DG2 bugs when operating with certain CPUs
- Fix DG1 power gate sequence
- Enabling uncompressed 128b/132b UHBR SST
- Handle hdmi connector init failures, and no HDMI/DP cases
- More robust engine resets on Haswell and older
i915/xe display:
- HDCP fixes for Xe3Lpd
- New GSC FW ARL-H/ARL-U
- support 3 VDSC engines 12 slices
- MBUS joining sanitisation
- reconcile i915/xe display power mgmt
- Xe3Lpd fixes
- UHBR rates for Thunderbolt
amdgpu:
- DRM panic support
- track BO memory stats at runtime
- Fix max surface handling in DC
- Cleaner shader support for gfx10.3 dGPUs
- fix drm buddy trim handling
- SDMA engine reset updates
- Fix doorbell ttm cleanup
- RAS updates
- ISP updates
- SDMA queue reset support
- Rework DPM powergating interfaces
- Documentation updates and cleanups
- DCN 3.5 updates
- Use a pm notifier to more gracefully handle VRAM eviction on
suspend or hibernate
- Add debugfs interfaces for forcing scheduling to specific engine
instances
- GG 9.5 updates
- IH 4.4 updates
- Make missing optional firmware less noisy
- PSP 13.x updates
- SMU 13.x updates
- VCN 5.x updates
- JPEG 5.x updates
- GC 12.x updates
- DC FAMS updates
amdkfd:
- GG 9.5 updates
- Logging improvements
- Shader debugger fixes
- Trap handler cleanup
- Cleanup includes
- Eviction fence wq fix
msm:
- MDSS:
- properly described UBWC registers
- added SM6150 (aka QCS615) support
- DPU:
- added SM6150 (aka QCS615) support
- enabled wide planes if virtual planes are enabled (by using two
SSPPs for a single plane)
- added CWB hardware blocks support
- DSI:
- added SM6150 (aka QCS615) support
- GPU:
- Print GMU core fw version
- GMU bandwidth voting for a740 and a750
- Expose uche trap base via uapi
- UAPI error reporting
rcar-du:
- Add r8a779h0 Support
ivpu:
- Fix qemu crash when using passthrough
nouveau:
- expose GSP-RM logging buffers via debugfs
panfrost:
- Add MT8188 Mali-G57 MC3 support
rockchip:
- Gamma LUT support
hisilicon:
- new HIBMC support
virtio-gpu:
- convert to helpers
- add prime support for scanout buffers
v3d:
- Add DRM_IOCTL_V3D_PERFMON_SET_GLOBAL
vc4:
- Add support for BCM2712
vkms:
- line-per-line compositing algorithm to improve performance
zynqmp:
- Add DP audio support
mediatek:
- dp: Add sdp path reset
- dp: Support flexible length of DP calibration data
etnaviv:
- add fdinfo memory support
- add explicit reset handling"
* tag 'drm-next-2025-01-17' of https://gitlab.freedesktop.org/drm/kernel: (1070 commits)
drm/bridge: fix documentation for the hdmi_audio_prepare() callback
doc/cgroup: Fix title underline length
drm/doc: Include new drm-compute documentation
cgroup/dmem: Fix parameters documentation
cgroup/dmem: Select PAGE_COUNTER
kernel/cgroup: Remove the unused variable climit
drm/display: hdmi: Do not read EDID on disconnected connectors
drm/tests: hdmi: Add connector disablement test
drm/connector: hdmi: Do atomic check when necessary
drm/amd/display: 3.2.316
drm/amd/display: avoid reset DTBCLK at clock init
drm/amd/display: improve dpia pre-train
drm/amd/display: Apply DML21 Patches
drm/amd/display: Use HW lock mgr for PSR1
drm/amd/display: Revised for Replay Pseudo vblank control
drm/amd/display: Add a new flag for replay low hz
drm/amd/display: Remove unused read_ono_state function from Hwss module
drm/amd/display: Do not elevate mem_type change to full update
drm/amd/display: Do not wait for PSR disable on vbl enable
drm/amd/display: Remove unnecessary eDP power down
...
Instead of handling the whitelist directly in the GuC ADS
initialization, make it follow the same logic as other engine registers
that are save-restored. Main benefit is that then the SW tracking then
shows it in debugfs and there's no risk of an engine workaround to write
to the same nopriv register that is being passed directly to GuC.
This means that xe_reg_whitelist_process_engine() only has to process
the RTP and convert them to entries for the hwe. With that all the
registers should be covered by xe_reg_sr_apply_mmio() to write to the HW
and there's no special handling in GuC ADS to also add these registers
to the list of registers that is passed to GuC.
Example for DG2:
# cat /sys/kernel/debug/dri/0000\:03\:00.0/gt0/register-save-restore
...
Engine
rcs0
...
REG[0x24d0] clr=0xffffffff set=0x1000dafc masked=no mcr=no
REG[0x24d4] clr=0xffffffff set=0x1000db01 masked=no mcr=no
REG[0x24d8] clr=0xffffffff set=0x0000db1c masked=no mcr=no
...
Whitelist
rcs0
REG[0xdafc-0xdaff]: allow read access
REG[0xdb00-0xdb1f]: allow read access
REG[0xdb1c-0xdb1f]: allow rw access
v2:
- Use ~0u for clr bits so it's just a write (Matt Roper)
- Simplify helpers now that unused slots are not written
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241209232739.147417-6-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
When we decide to kill a job, (from guc_exec_queue_timedout_job), we could
end up with 4 possible scenarios at this starting point of this decision:
1. the guc-captured register-dump is already there.
2. the driver is wedged.mode > 1, so GuC-engine-reset / GuC-err-capture
will not happen.
3. the user has started the driver in execlist-submission mode.
4. the guc-captured register-dump is not ready yet so we force GuC to kill
that context now, but:
A. we don't know yet if GuC will be successful on the engine-reset
and get the guc-err-capture, else kmd will do a manual reset later
OR B. guc will be successful and we will get a guc-err-capture
shortly.
So to accomdate the scenarios of 2 and 4A, we will need to do a manual KMD
capture first(which is not be reliable in guc-submission mode) and decide
later if we need to use that for the cases of 2 or 4A. So this flow is
part of the implementation for this patch.
Provide xe_guc_capture_get_reg_desc_list to get the register dscriptor
list.
Add manual capture by read from hw engine if GuC capture is not ready.
If it becomes ready at later time, GuC sourced data will be used.
Although there may only be a small delay between (1) the check for whether
guc-err-capture is available at the start of guc_exec_queue_timedout_job
and (2) the decision on using a valid guc-err-capture or manual-capture,
lets not take any chances and lock the matching node down so it doesn't
get re-claimed if GuC-Err-Capture subsystem is running out of pre-cached
nodes.
Signed-off-by: Zhanjun Dong <zhanjun.dong@intel.com>
Reviewed-by: Alan Previn <alan.previn.teres.alexis@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241004193428.3311145-6-zhanjun.dong@intel.com
The xe_guc_exec_queue_snapshot is not really a GuC internal thing and
is definitely not a GuC CT thing. So give it its own section heading.
The snapshot itself is really a capture of the submission backend's
internal state. Although all it currently prints out is the submission
contexts. So label it as 'Contexts'. If more general state is added
later then it could be change to 'Submission backend' or some such.
Further, everything from the GuC CT section onwards is GT specific but
there was no indication of which GT it was related to (and that is
impossible to work out from the other fields that are given). So add a
GT section heading. Also include the tile id of the GT, because again
significant information.
Lastly, drop a couple of unnecessary line feeds within sections.
v2: Add GT section heading, add tile id to device section.
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Julia Filipchuk <julia.filipchuk@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20241003004611.2323493-4-John.C.Harrison@Intel.com
A xe_hw_engine_group is a group of hw engines. Two hw engines belong to
the same xe_hw_engine_group if one hw engine cannot make progress while
the other is stuck on a page fault.
Typically, hw engines of the same group share some resources such as EUs,
but this really depends on the hardware configuration of the platforms.
The simple engines partitioning proposed here might be too conservative
but is intended to work for existing platforms. It can be optimized later
if more sets of independent engines are identified.
The hw engine groups are intended to be used in the context of faulting
long-running jobs submissions.
v2: Move to own files, improve error handling (Matt Brost)
v3: Fix build issue reported by CI, improve commit message (Matt Roper)
v4: Fix kernel doc
v5: Add switch case for XE_ENGINE_CLASS_OTHER
Signed-off-by: Francois Dugast <francois.dugast@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240809155156.1955925-2-francois.dugast@intel.com
Enable feature to allow memory reads to take a priority memory path.
This will reduce latency on the read path, but may introduce read after
write (RAW) hazards as read and writes will no longer be ordered.
To avoid RAW hazards, SW can use the MI_MEM_FENCE command or any other
MI command that generates non posted memory writes. This will ensure
data is coherent in memory prior to execution of commands which read
data from memory. RCS,BCS and CCS support this feature.
No pattern identified in KMD that could lead to a hazard.
v2: Modify commit message, enable priority mem read feature for media,
modify version range, modify bspec detail (Matt Roper)
v3: Rebase, fix cramped line-wrapping (jcavitt)
v4: Rebase
v5: Media does not support Priority Mem Read. Modify commit
to reflect the same.
v6: Rebase
Bspec: 60298, 60237, 60187, 60188
Signed-off-by: Pallavi Mishra <pallavi.mishra@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Acked-by: José Roberto de Souza <jose.souza@intel.com>
Acked-by: Carl Zhang <carl.zhang@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240731195622.1868401-1-pallavi.mishra@intel.com
Per client engine utilization uses RING_TIMESTAMP to return
drm-total-cycles to the user. Current code uses XE_FW_GT to read this
register on the first available engine in a GT. When testing on DG2, it
is observed that this value is 0 when running test on some engines. To
resolve that, get the hwe domain specific FW for reading the engine
timestamp.
v2:
- update commit message
- use domain specific FW (Matt)
v3:
- Drop check for hwe in the helper (Matt, Michal)
v4:
- checkpatch fixes
v5: Rebase
Fixes: 188ced1e0f ("drm/xe/client: Print runtime to fdinfo")
Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240627235105.2631135-1-umesh.nerlige.ramappa@intel.com
The flags stored in the BO grew over time without following
much a naming pattern. First of all, get rid of the _BIT suffix that was
banned from everywhere else due to the guideline in
drivers/gpu/drm/i915/i915_reg.h that xe kind of follows:
Define bits using ``REG_BIT(N)``. Do **not** add ``_BIT`` suffix to the name.
Here the flags aren't for a register, but it's good practice to keep it
consistent.
Second divergence on names is the use or not of "CREATE". This is
because most of the flags are passed to xe_bo_create*() family of
functions, changing its behavior. However, since the flags are also
stored in the bo itself and checked elsewhere in the code, it seems
better to just omit the CREATE part.
With those 2 guidelines, all the flags are given the form
XE_BO_FLAG_<FLAG_NAME> with the following commands:
git grep -le "XE_BO_" -- drivers/gpu/drm/xe | xargs sed -i \
-e "s/XE_BO_\([_A-Z0-9]*\)_BIT/XE_BO_\1/g" \
-e 's/XE_BO_CREATE_/XE_BO_FLAG_/g'
git grep -le "XE_BO_" -- drivers/gpu/drm/xe | xargs sed -i -r \
-e 's/XE_BO_(DEFER_BACKING|SCANOUT|FIXED_PLACEMENT|PAGETABLE|NEEDS_CPU_ACCESS|NEEDS_UC|INTERNAL_TEST|INTERNAL_64K|GGTT_INVALIDATE)/XE_BO_FLAG_\1/g'
And then the defines in drivers/gpu/drm/xe/xe_bo.h are adjusted to
follow the coding style.
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240322142702.186529-3-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Starting on Xe2, the GSCCS engine reset is a 2-step process. When the
driver or the GuC hits the GDRST register, the CS is immediately reset
and a success is reported, but the GSC shim continues its reset in the
background. While the shim reset is ongoing, the CS is able to accept
new context submission, but any commands that require the shim will
be stalled until the reset is completed. This means that we can keep
submitting to the GSCCS as long as we make sure that the preemption
timeout is big enough to cover any delay introduced by the reset; since
the GSC preempt timeout is not tunable at runtime, we only need to check
that the value set in kconfig is big enough (and increase it if it
isn't).
When the shim reset completes, a specific CS interrupt is triggered,
in response to which we need to check the GSCI_TIMER_STATUS register
to see if the reset was successful or not.
Note that the GSCI_TIMER_STATUS register is not power save/restored,
so it gets reset on MC6 entry. However, a reset failure stops MC6,
so in that scenario we're always guaranteed to find the correct value.
Since we can't check the register within interrupt context, the
existing GSC worker has been updated to handle it.
The expected action to take on ER failure is to trigger a driver FLR,
but we still don't support that, so for now we just print an error. A
comment has been added to the code to keep track of the FLR requirement.
v2: Add a check for the initial timeout value (Alan)
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Alan Previn <alan.previn.teres.alexis@intel.com>
Reviewed-by: Alan Previn <alan.previn.teres.alexis@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240304145634.820684-1-daniele.ceraolospurio@intel.com
Recommendation is to read FUSE4 register to check if WMTP has been
enabled/disabled by HW. If enabled we don't need to do anything special,
however if disabled recommendation is to also disable the WMTP mode in
the FF_SLICE_CS_CHICKEN2 register, falling back to thread-group and
mid-batch preemption only. However on Linux, the per-context CS_CHICKEN1
is how userspace controls pre-emption, so instead use the default lrc to
disable WMTP using CS_CHICKEN1, if disabled by HW. Userspace is still
free to set CS_CHICKEN1 to whatever they want later.
v2: remove redundant version check and also add descriptive name(Matt)
v3: remove usage of REG_FIELD_GET(Matt)
Cc: Matt Roper <matthew.d.roper@intel.com>
Co-developed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://lore.kernel.org/r/20240104182615.21327-1-nirmoy.das@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
When interrupts are delivered using memory based mechanism, engines
will write status to the report page at the offset (in bytes) that
corresponds to their interrupt bit from the GT_INTR_DW register.
Add engine interrupt offset definitions to engine info as we will
need this to process memory based interrupts.
Bspec: 46149, 50829, 50844
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://lore.kernel.org/r/20231214185955.1791-6-michal.wajdeczko@intel.com
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Disable dynamic HW load balancing of compute resource assignment
to engines and instead enabled fixed mode of mapping compute
resources to engines on all platforms with more than one compute
engine.
By default enable only one CCS engine with all compute slices
assigned to it. This is the desired configuration for common
workloads.
PVC platform supports only the fixed CCS mode (workaround 16016805146).
v2: Rebase, make it platform agnostic
v3: Minor code refactoring
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>