The GSC notifies us of a proxy request via the HECI2 interrupt. The
interrupt must be enabled both in the HECI layer and in our usual gt irq
programming; for the latter, the interrupt is enabled via the same enable
register as the GSC CS, but it does have its own mask register. When the
interrupt is received, we also need to de-assert it in both layers.
The handling of the proxy request is deferred to the same worker that we
use for GSC load. New flags have been added to distinguish between the
init case and the proxy interrupt.
v2: rename irq define, fix include ordering (Alan)
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Alan Previn <alan.previn.teres.alexis@intel.com>
Cc: Suraj Kandpal <suraj.kandpal@intel.com>
Reviewed-by: Alan Previn <alan.previn.teres.alexis@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240117182621.2653049-3-daniele.ceraolospurio@intel.com
Recommendation is to read FUSE4 register to check if WMTP has been
enabled/disabled by HW. If enabled we don't need to do anything special,
however if disabled recommendation is to also disable the WMTP mode in
the FF_SLICE_CS_CHICKEN2 register, falling back to thread-group and
mid-batch preemption only. However on Linux, the per-context CS_CHICKEN1
is how userspace controls pre-emption, so instead use the default lrc to
disable WMTP using CS_CHICKEN1, if disabled by HW. Userspace is still
free to set CS_CHICKEN1 to whatever they want later.
v2: remove redundant version check and also add descriptive name(Matt)
v3: remove usage of REG_FIELD_GET(Matt)
Cc: Matt Roper <matthew.d.roper@intel.com>
Co-developed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://lore.kernel.org/r/20240104182615.21327-1-nirmoy.das@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Disable dynamic HW load balancing of compute resource assignment
to engines and instead enabled fixed mode of mapping compute
resources to engines on all platforms with more than one compute
engine.
By default enable only one CCS engine with all compute slices
assigned to it. This is the desired configuration for common
workloads.
PVC platform supports only the fixed CCS mode (workaround 16016805146).
v2: Rebase, make it platform agnostic
v3: Minor code refactoring
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Skip the init/start/stop GuC PC functions and toggle C6 using
register writes instead. Also request max possible frequency
as dynamic freq management is disabled.
v2: Fix compile warning
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Starting GT freq is usually RPn. Raising freq to RP0 will
help speed up GuC load times. As an example, this data was
collected on DG2-
GuC Load time @RPn ~ 41 ms
GuC Load time @RP0 ~ 11 ms
v2: Raise GT freq before hwconfig init. This will speed up
both HuC and GuC loads. Address review comments (Rodrigo).
Also add a small usleep after requesting frequency which gives
pcode some time to react.
v3: Address checkpatch issue
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
This workaround is primarily implemented by the BIOS. However if the
BIOS applies the workaround it will reserve a small piece of our DSM
(which should be at the top, right below the WOPCM); we just need to
keep that region reserved so that nothing else attempts to re-use it.
v2 (Gustavo):
- Check for NULL media_gt
- Mask bits [5:0] to avoid potential issues in future platforms
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com>
Link: https://lore.kernel.org/r/20231102124855.1940491-1-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Add L3SQCREG5 as part of HW recommended settings. The recommended value
in Bspec is 00e0007f. For Xe2-LPG, bits 23:21 don't exist anymore, but
it's confirmed with HW engineers that setting them doesn't do anything.
They still exist on the media GT, Xe2-LPM, but they are already they are
already set as per HW default value. So for Xe2 platform, the only bits
that need to be set are 9:0 since HW's default is 0x1ff and the
recommended value is 0x7f.
Unlike most registers, which have the same relative offset on both
the primary and media GT, this register has a different base offset
on the media GT.
On MTL the register only exists for the primary (graphics) GT, so
there's no need to program it on the media gt. Also, it's part of the
RCS engine's context, so it needs to be added as a LRC workaround.
Bspec: 72161
Signed-off-by: Shekhar Chauhan <shekhar.chauhan@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://lore.kernel.org/r/20231024220739.224251-2-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Add the initial collection of gt/engine/lrc workarounds.
While at it, add some newlines around the platform/IP comments to make
them consistent across all workarounds.
v2:
- FF_MODE is an MCR register (Matt Roper)
- Group 18032247524 with other Xe2 workarounds (Matt Roper)
- Move WA changing PSS_CHICKEN to lrc_was[] as for Xe2 that register
is part of the render context image (Matt Roper)
- Apply WA 16020518922 only on render engine (Matt Roper)
Signed-off-by: Dnyaneshwar Bhadane <dnyaneshwar.bhadane@intel.com>
Signed-off-by: Shekhar Chauhan <shekhar.chauhan@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://lore.kernel.org/r/20231024220739.224251-1-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Force indirect state sampler data to only be in the dynamic state pool,
which is more convienent for the UMD. Behavior change mirrors similar
change for i915 in commit 16fc9c08f0 ("drm/i915: disable sampler
indirect state in bindless heap")
v2: split out per engine tuning into separate patch, commit message
(Lucas)
v3: rebase
v4: Change to match render only, g.ver 1200 to 1271 (MattR)
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Matt Atwood <matthew.s.atwood@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
There are a set of engine group busyness counters provided by HW which are
perfect fit to be exposed via PMU perf events.
BSPEC: 46559, 46560, 46722, 46729, 52071, 71028
events can be listed using:
perf list
xe_0000_03_00.0/any-engine-group-busy-gt0/ [Kernel PMU event]
xe_0000_03_00.0/copy-group-busy-gt0/ [Kernel PMU event]
xe_0000_03_00.0/interrupts/ [Kernel PMU event]
xe_0000_03_00.0/media-group-busy-gt0/ [Kernel PMU event]
xe_0000_03_00.0/render-group-busy-gt0/ [Kernel PMU event]
and can be read using:
perf stat -e "xe_0000_8c_00.0/render-group-busy-gt0/" -I 1000
time counts unit events
1.001139062 0 ns xe_0000_8c_00.0/render-group-busy-gt0/
2.003294678 0 ns xe_0000_8c_00.0/render-group-busy-gt0/
3.005199582 0 ns xe_0000_8c_00.0/render-group-busy-gt0/
4.007076497 0 ns xe_0000_8c_00.0/render-group-busy-gt0/
5.008553068 0 ns xe_0000_8c_00.0/render-group-busy-gt0/
6.010531563 43520 ns xe_0000_8c_00.0/render-group-busy-gt0/
7.012468029 44800 ns xe_0000_8c_00.0/render-group-busy-gt0/
8.013463515 0 ns xe_0000_8c_00.0/render-group-busy-gt0/
9.015300183 0 ns xe_0000_8c_00.0/render-group-busy-gt0/
10.017233010 0 ns xe_0000_8c_00.0/render-group-busy-gt0/
10.971934120 0 ns xe_0000_8c_00.0/render-group-busy-gt0/
The pmu base implementation is taken from i915.
v2:
Store last known value when device is awake return that while the GT is
suspended and then update the driver copy when read during awake.
v3:
1. drop init_samples, as storing counters before going to suspend should
be sufficient.
2. ported the "drm/i915/pmu: Make PMU sample array two-dimensional" and
dropped helpers to store and read samples.
3. use xe_device_mem_access_get_if_ongoing to check if device is active
before reading the OA registers.
4. dropped format attr as no longer needed
5. introduce xe_pmu_suspend to call engine_group_busyness_store
6. few other nits.
v4: minor nits.
v5: take forcewake when accessing the OAG registers
v6:
1. drop engine_busyness_sample_type
2. update UAPI documentation
v7:
1. update UAPI documentation
2. drop MEDIA_GT specific change for media busyness counter.
Co-developed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Co-developed-by: Bommu Krishnaiah <krishnaiah.bommu@intel.com>
Signed-off-by: Aravind Iddamsetty <aravind.iddamsetty@linux.intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Xe2 uses the same steering control register and steering semaphore
register as MTL. As with recent platforms, group/instance 0,0 is
sufficient to target a non-terminated instance for most classes of MCR
registers; the only types of ranges that need to consider platform
fusing to find a non-terminated instance are SLICE/DSS ranges and a new
SQIDI_PSMI type of range.
Note that the range of valid bits in XE2_NODE_ENABLE_MASK may be reduced
for some Xe2 SKUs. However the lowest bits are always valid and only
the lowest instance is obtained via __ffs(), so there's no need to
complicate the masking with extra platform/subplatform checks.
Also note that Wa_14017387313 suggests skipping MCR lock acquisition
around GAM and GAMWKR registers to prevent MCR register accesses in an
interrupt handler from deadlocking when the steering semaphore is
already held outside the interrupt context. At this time Xe never
issues MCR accesses from within an interrupt handler so the workaround
is not currently needed.
v2:
- [0x008700-0x0087FF] range to extend up to 0x887F (Matt Attwood)
- [0x00EF00-0x00F4FF] -> [0x00F000, 0xFFFF] to follow latest
bspec version (Bala)
Bspec: 71185
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com>
Reviewed-by: Matt Atwood <matthew.s.atwood@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
The workaround database tells us to set this bit, even though the bspec
indicates the bit doesn't exist on these platforms. Since this is a
write-only register, we also can't read back its value to verify whether
it's actually working or not. For now we'll trust that the workaround
database knows what it's talking about; if not, the hardware will just
ignore the attempt to write to a non-existent bit and it shouldn't cause
any problems.
Reviewed-by: Matt Atwood <matthew.s.atwood@intel.com>
Link: https://lore.kernel.org/r/20230727220920.2291913-2-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Add the registers to get C6 residency of MTL SAMedia and
C6 status of MTL gts
v2:
- move register definitions to regs header (Anshuman)
- correct reg definition for mtl rc status
- make idle_status function common (Badal)
v3:
- remove extra line in commit message
- use only media type check in initialization
- use graphics ver check (Anshuman)
v4:
- remove extra lines (Anshuman)
Bspec: 66300
Signed-off-by: Badal Nilawar <badal.nilawar@intel.com>
Signed-off-by: Riana Tauro <riana.tauro@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Reviewed-by: Anshuman Gupta <anshuman.gupta@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
This adds a handful of workarounds that apply to production steppings of
MTL:
- Wa_14018575942
- Wa_22016670082
- Wa_14017856879
- Wa_18019271663
Wa_22016670082 is currently only applied to the primary GT at the
moment, but may need to be extended to the media GT in the future if a
pending update to the workaround database gets finalized.
OOB workarounds will need to be implemented separately in future patches
for Wa_14016712196, Wa_16018063123, and Wa_18013179988.
Reviewed-by: Radhakrishna Sripada <radhakrishna.sripada@intel.com>
Link: https://lore.kernel.org/r/20230608181217.2385932-1-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
The _total_vram_size helper is device based and is not complete.
Teach the helper to be tile aware and add the ability to size
DG1 correctly.
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
For platforms with GMD_ID registers, the IP stepping should be
determined from the 'revid' field of those registers rather than from
the PCI revid.
The hardware teams have indicated that they plan to keep the revid =>
stepping mapping consistent across all GMD_ID platforms, with major
steppings (A0, B0, C0, etc.) having revids that are multiples of 4, and
minor steppings (A1, A2, A3, etc.) taking the intermediate values. For
now we'll trust that hardware follows through on this plan; if they have
to change direction in the future (e.g., they wind up needing something
like an "A4" that doesn't fit this scheme), we can add a GMD_ID-based
lookup table when the time comes.
v2:
- Set xe->info.platform before finding stepping; the pre-GMD_ID code
relies on this value to pick a lookup table.
v3:
- Also set xe->info.subplatform before picking the stepping for
pre-GMD_ID lookup.
Reviewed-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com>
Link: https://lore.kernel.org/r/20230524185952.666158-1-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
guc_mmio_regset_write() had a flags for the registers to be added to the
GuC's regset list. The only register actually using that was RCU_MODE,
but it was setting the flags to a bogus value. From
struct xe_guc_fwif.h,
#define GUC_REGSET_MASKED BIT(0)
#define GUC_REGSET_MASKED_WITH_VALUE BIT(2)
#define GUC_REGSET_RESTORE_ONLY BIT(3)
Cross checking with i915, the only flag to set in RCU_MODE is
GUC_REGSET_MASKED. That can be done automatically from the register, as
long as the definition is correct.
Add the XE_REG_OPTION_MASKED annotation to RCU_MODE and kill the "flags"
field in guc_mmio_regset_write(): guc_mmio_regset_write_one() can decide
that based on the register being passed.
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/20230429062332.354139-3-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
These should replace the _MMIO() and MCR_REG() from i915, with the goal
of being more extensible, allowing to pass the additional fields for
struct xe_reg and struct xe_reg_mcr. Replace all uses of _MMIO() and
MCR_REG() in xe.
Since the RTP, reg-save-restore and WA infra are not ready to use the
new type, just undef the macro like was done for the i915 types
previously. That conversion will come later.
v2: Remove MEDIA_SOFT_SCRATCH_COUNT/MEDIA_SOFT_SCRATCH re-added by
mistake (Matt Roper)
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://lore.kernel.org/r/20230427223256.1432787-8-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Convert the macro declarations to the equivalent GENMASK and
and bitfield prep for all registers.
v2 (Matt Roper):
- Fix wrong conversion of RPM_CONFIG0_CRYSTAL_CLOCK_FREQ_MASK
- Reorder fields of XEHP_SLICE_UNIT_LEVEL_CLKGATE for consistency
- Simplify CTC_SOURCE_* by only defining CTC_SOURCE_DIVIDE_LOGIC
as REG_BIT(0)
v3: Also remove DOP_CLOCK_GATE_ENABLE that is unused and wrongly defined
Reviewed-by: Matt Atwood <matthew.s.atwood@intel.com>
Link: https://lore.kernel.org/r/20230427223256.1432787-4-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
The defines for the registers were brought over from i915 while
bootstrapping the driver. As xe supports TGL and later only, it doesn't
make sense to keep the GEN* prefixes and suffixes in the registers: TGL
is graphics version 12, previously called "GEN12". So drop the prefix
everywhere.
v2:
- Also drop _TGL suffix and reword commit message as suggested
by Matt Roper. While at it, rename VSUNIT_CLKGATE_DIS_TGL to
VSUNIT_CLKGATE2_DIS with the additional "2", so it doesn't clash
with the define for the other register
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://lore.kernel.org/r/20230427223256.1432787-3-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Follow up commits will mass-remove the gen prefix/suffix. For GEN6_RC0
and GEN6_RC6 that would make the variable too short and easy to
conflict. So, add "GT_" prefix that is also part of the register name.
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
The write of GFX_FLSH_CNTL_GEN6 was inherited from the i915 codebase
where it was used to force a flush of the write-combine buffer in cases
where the GSM/GGTT were mapped as WC. Since Xe never uses WC mappings
of the GGTT, this register write is unnecessary. Furthermore, this
register was removed on Xe_HP-based platforms, so this write winds up
clobbering an unrelated register.
v2:
- Also drop GFX_FLSH_CNTL_GEN6 from the register file now that it's no
longer used. (Lucas)
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20230418230247.3802438-1-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
If graphics_desc and media_desc are not specified in a platform's
xe_device_desc, treat this as an indication that the IP version should
be determined from the hardware's GMD_ID register.
Note that leaving media_desc unset for a platform that simply doesn't
have the IP (e.g., PVC) is also okay --- a read of the GMD_ID register
offset will be attempted, but since there's no register at that location
a value of '0' will be returned, effectively disabling media support.
Mapping of version -> IP description is done via a table lookup; this
table will be re-used in future patches for some KUnit testing.
v2:
- Drop dummy structures. NULL can be safely used for both the GMD_ID
cases and the "media not present case."
- Use a table-based lookup of GMD_ID versions rather than a simple
switch statement; the table will allow us to easily perform kunit
testing of all the IP descriptors.
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Cc: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20230406235621.1914492-8-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>