Despite its name, the MI_FLUSH_DW instruction can write an immediate value
of either dword size or qword size, depending on the 'length' field of
the instruction. Since "length" excludes the first two dwords of the
instruction, a value of 2 in the length field implies a dword write and
a value of 3 implies a qword write. Even in cases where the flush
instruction's post-sync operation is set to "no write", we're still
expected to size the overall instruction as if we were doing a dword or
qword write (i.e., a length of 1 shouldn't be used on modern platforms).
Rather than baking a size of "1" into the #define and then adding
another unexplained "+ 1" at all the spots where the definition gets
used, let's just create MI_FLUSH_IMM_DW and MI_FLUSH_IMM_QW definitions
that should be OR'd into the instruction header to make it more explicit
what behavior we're requesting.
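As an illustration of the sizing rule, here is a minimal userspace C sketch. The opcode shift, the 6-bit length mask and the helpers mi_total_dwords() and emit_flush_no_write() are assumptions made for this example; only the "length = total dwords - 2" rule and the two flag names come from the text above.

```c
#include <stdint.h>

/*
 * Sketch only: the opcode encoding and the position/width of the length
 * field are assumptions for illustration, not copied from the Xe headers.
 * An MI "length" field holds (total dwords - 2).
 */
#define MI_LENGTH_MASK   0x3fu                    /* assumed: bits 5:0 */
#define MI_FLUSH_DW      ((uint32_t)0x26 << 23)   /* assumed opcode encoding */
#define MI_FLUSH_IMM_DW  0x2u  /* 4 dwords: header, addr lo, addr hi, data */
#define MI_FLUSH_IMM_QW  0x3u  /* 5 dwords: header, addr lo, addr hi, data lo/hi */

/* Number of dwords the whole instruction occupies, given its header. */
static unsigned int mi_total_dwords(uint32_t header)
{
	return (header & MI_LENGTH_MASK) + 2;
}

/*
 * Hypothetical emitter for a flush with no post-sync write: it is still
 * sized as a dword write, so the address and data dwords are present
 * (zeroed here). Returns the number of dwords written to cs[].
 */
static unsigned int emit_flush_no_write(uint32_t *cs)
{
	unsigned int i = 0;

	cs[i++] = MI_FLUSH_DW | MI_FLUSH_IMM_DW;
	cs[i++] = 0; /* address low */
	cs[i++] = 0; /* address high */
	cs[i++] = 0; /* immediate data */
	return i;
}
```

The point of OR-ing an explicit MI_FLUSH_IMM_* value into the header is that the emitted dword count and the header's length field can be checked against each other, instead of relying on a hidden "+ 1".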
Bspec: 60229
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20231016163449.1300701-9-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
The Xe driver currently supports 22-bit addresses for MMIO access.
Future platforms will have additional MMIO extensions with
larger address spaces, and to access them, the driver will
have to support a wider address representation.
Note that while the XE_REG macro is used for MMIO access, the
XE_REG_EXT macro will be used for MMIO-extension access.
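A 22-bit address field covers 2^22 bytes (4 MiB) of MMIO space, which is why offsets in a larger extension space need a different representation. The sketch below only checks whether an offset fits; XE_REG_ADDR_BITS and xe_reg_addr_fits() are hypothetical names, and the actual XE_REG_EXT layout is not modeled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed width of the address field in the current register struct. */
#define XE_REG_ADDR_BITS 22
#define XE_REG_ADDR_MAX  ((UINT32_C(1) << XE_REG_ADDR_BITS) - 1) /* 0x3fffff */

/* True if an MMIO offset can be encoded in the 22-bit address field. */
static bool xe_reg_addr_fits(uint32_t offset)
{
	return offset <= XE_REG_ADDR_MAX;
}
```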
Signed-off-by: Koby Elbaz <kelbaz@habana.ai>
Reviewed-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Moti Haimovski <mhaimovski@habana.ai>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Force indirect state sampler data to only be in the dynamic state pool,
which is more convenient for the UMD. The behavior change mirrors a
similar change for i915 in commit 16fc9c08f0 ("drm/i915: disable sampler
indirect state in bindless heap").
v2: split out per engine tuning into separate patch, commit message
(Lucas)
v3: rebase
v4: Change to match render only, g.ver 1200 to 1271 (MattR)
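A minimal sketch of the v4 matching rule, assuming graphics versions are compared as verx100 values (12.00 is 1200, 12.71 is 1271) and "render only" means the render engine class. The enum and sampler_tuning_applies() are hypothetical names for illustration; the driver uses its own rule-matching infrastructure.

```c
#include <stdbool.h>

/* Hypothetical engine classes for this sketch. */
enum engine_class { ENGINE_RENDER, ENGINE_COMPUTE, ENGINE_COPY };

/*
 * Returns true when the sampler tuning would apply: render engine on
 * graphics IP 12.00 through 12.71 inclusive (verx100 1200..1271).
 */
static bool sampler_tuning_applies(unsigned int graphics_verx100,
				   enum engine_class class)
{
	return class == ENGINE_RENDER &&
	       graphics_verx100 >= 1200 && graphics_verx100 <= 1271;
}
```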
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Matt Atwood <matthew.s.atwood@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
The code was reading (base) + 0x8c, but that is not a valid register;
it should read (base) + 0x68 instead.
So read the correct register here and remove the wrong, duplicated
one.
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Set bits 30 and 31 of XY_FAST_COPY_BLT's dword1 for XeHP and above.
Destination or source being Y-Major is selected on dword0 and there's
nothing to set on dword1. According to the bspec for Xe2,
"Behavior is undefined when programmed the value 0". Also, for XeHP,
the only value allowed in those bits is 0b11; it's no longer possible to
select "Legacy Tile-Y", only the newer Tile4.
So, unconditionally set those bits for graphics IP 12.50 and above.
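The rule can be sketched as a small helper, assuming the graphics IP version is available as a verx100 value (12.50 is 1250). fast_copy_dw1() and the macro name are hypothetical; only the bit positions and the version cut-off come from the text above.

```c
#include <stdint.h>

/* Bits 31:30 of XY_FAST_COPY_BLT dword1, per the text above. */
#define XY_FAST_COPY_BLT_D1_TILE_BITS  (UINT32_C(0x3) << 30)

/*
 * Hypothetical helper: on graphics IP 12.50 and later, unconditionally
 * set bits 31:30 to 0b11 (0 is undefined on Xe2, and Tile4 is the only
 * valid selection on XeHP). Other bits of dword1 are preserved.
 */
static uint32_t fast_copy_dw1(uint32_t dw1, unsigned int graphics_verx100)
{
	if (graphics_verx100 >= 1250)
		dw1 |= XY_FAST_COPY_BLT_D1_TILE_BITS;
	return dw1;
}
```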
v2: Reword commit message and extend it to graphics version >= 12.50
(Matt Roper)
Bspec: 57567
Cc: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Haridhar Kalvala <haridhar.kalvala@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://lore.kernel.org/r/20230929213640.3189912-4-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Although the vast majority of workarounds the driver needs to implement
are either GT-based or display-based, there are occasionally workarounds
that reside outside those parts of the hardware (i.e., they target
registers in the sgunit/soc); we can consider these to be "tile"
workarounds since there will be an instance of these registers per tile.
The registers in question should only lose their values during a
function-level reset, so they only need to be applied during probe and
resume; the registers will not be affected by GT/engine resets.
Tile workarounds are rare (there's only one, 22010954014, that's
relevant to Xe at the moment) so it's probably not worth updating the
xe_rtp design to handle tile-level workarounds yet, although we may want
to consider that in the future if/when more of these show up on future
platforms.
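A minimal sketch of the flow described above: tile workarounds are applied once per tile, from both the probe and the resume paths, and not from GT/engine reset handling. All types and function names here are hypothetical, and the register write is replaced by a counter so the flow can be checked.

```c
/* Hypothetical types; the real driver structures differ. */
#define MAX_TILES 2

struct sketch_tile { unsigned int id; unsigned int wa_applied; };
struct sketch_device { struct sketch_tile tiles[MAX_TILES]; unsigned int num_tiles; };

/* Program the per-tile (sgunit/soc) workaround registers for one tile. */
static void tile_process_workarounds(struct sketch_tile *tile)
{
	/* A real implementation would write the Wa_22010954014 register here. */
	tile->wa_applied++;
}

/* Called from both the probe and resume paths; GT/engine resets skip it. */
static void device_apply_tile_workarounds(struct sketch_device *dev)
{
	for (unsigned int i = 0; i < dev->num_tiles; i++)
		tile_process_workarounds(&dev->tiles[i]);
}
```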
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Acked-by: Jani Nikula <jani.nikula@intel.com>
Link: https://lore.kernel.org/r/20230913231411.291933-13-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
There is a set of engine group busyness counters provided by HW which are
a perfect fit to be exposed via PMU perf events.
BSPEC: 46559, 46560, 46722, 46729, 52071, 71028
Events can be listed using:
  perf list
    xe_0000_03_00.0/any-engine-group-busy-gt0/  [Kernel PMU event]
    xe_0000_03_00.0/copy-group-busy-gt0/        [Kernel PMU event]
    xe_0000_03_00.0/interrupts/                 [Kernel PMU event]
    xe_0000_03_00.0/media-group-busy-gt0/       [Kernel PMU event]
    xe_0000_03_00.0/render-group-busy-gt0/      [Kernel PMU event]
and can be read using:
  perf stat -e "xe_0000_8c_00.0/render-group-busy-gt0/" -I 1000
               time   counts unit events
        1.001139062        0 ns   xe_0000_8c_00.0/render-group-busy-gt0/
        2.003294678        0 ns   xe_0000_8c_00.0/render-group-busy-gt0/
        3.005199582        0 ns   xe_0000_8c_00.0/render-group-busy-gt0/
        4.007076497        0 ns   xe_0000_8c_00.0/render-group-busy-gt0/
        5.008553068        0 ns   xe_0000_8c_00.0/render-group-busy-gt0/
        6.010531563    43520 ns   xe_0000_8c_00.0/render-group-busy-gt0/
        7.012468029    44800 ns   xe_0000_8c_00.0/render-group-busy-gt0/
        8.013463515        0 ns   xe_0000_8c_00.0/render-group-busy-gt0/
        9.015300183        0 ns   xe_0000_8c_00.0/render-group-busy-gt0/
       10.017233010        0 ns   xe_0000_8c_00.0/render-group-busy-gt0/
       10.971934120        0 ns   xe_0000_8c_00.0/render-group-busy-gt0/
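The counters report busy time in nanoseconds, so a utilization figure is the busy-time delta divided by the wall-clock delta. For example, the 43520 ns reported for the interval ending at 6.010531563 s, which spans roughly one second, is about 0.004% busy. The helper below is a hypothetical userspace sketch of that arithmetic, not part of the driver or of perf.

```c
#include <stdint.h>

/*
 * Convert two samples of a cumulative busyness counter (ns) taken at two
 * wall-clock timestamps (ns) into a utilization percentage.
 */
static double busy_percent(uint64_t busy_prev_ns, uint64_t busy_now_ns,
			   uint64_t wall_prev_ns, uint64_t wall_now_ns)
{
	uint64_t wall = wall_now_ns - wall_prev_ns;

	if (wall == 0)
		return 0.0;
	return 100.0 * (double)(busy_now_ns - busy_prev_ns) / (double)wall;
}
```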
The PMU base implementation is taken from i915.
v2:
Store the last known value while the device is awake, return it while
the GT is suspended, and update the driver copy when read while awake.
v3:
1. drop init_samples, as storing counters before going to suspend should
be sufficient.
2. ported the "drm/i915/pmu: Make PMU sample array two-dimensional" and
dropped helpers to store and read samples.
3. use xe_device_mem_access_get_if_ongoing to check if device is active
before reading the OA registers.
4. dropped format attr as no longer needed
5. introduce xe_pmu_suspend to call engine_group_busyness_store
6. a few other nits.
v4: minor nits.
v5: take forcewake when accessing the OAG registers
v6:
1. drop engine_busyness_sample_type
2. update UAPI documentation
v7:
1. update UAPI documentation
2. drop MEDIA_GT specific change for media busyness counter.
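The caching behavior from v2/v3 can be sketched as follows. The struct and function names are hypothetical; the awake flag stands in for a check such as xe_device_mem_access_get_if_ongoing(), and the hw_value_ns argument stands in for the actual register read.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical PMU state for one counter. */
struct busy_counter {
	uint64_t last_ns; /* driver copy of the last value read from HW */
};

/*
 * Read the counter: when the device is awake, refresh the driver copy
 * from hardware; otherwise return the last known value.
 */
static uint64_t busy_counter_read(struct busy_counter *c, bool awake,
				  uint64_t hw_value_ns)
{
	if (awake)
		c->last_ns = hw_value_ns;
	return c->last_ns;
}

/* Suspend hook (cf. xe_pmu_suspend): store the latest value before sleeping. */
static void busy_counter_suspend(struct busy_counter *c, uint64_t hw_value_ns)
{
	c->last_ns = hw_value_ns;
}
```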
Co-developed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Co-developed-by: Bommu Krishnaiah <krishnaiah.bommu@intel.com>
Signed-off-by: Aravind Iddamsetty <aravind.iddamsetty@linux.intel.com>
Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Wa_16017236439 requires that we update BCS_SWCTRL
(via indirect context batch buffer) to set 64B
transfers when running on an even-numbered BCS
engine and 256B on an odd-numbered BCS engine.
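The selection rule reduces to the parity of the BCS instance number. This hypothetical helper returns the required transfer size in bytes; the BCS_SWCTRL field encoding and the indirect context batch buffer plumbing are not shown.

```c
/*
 * Hypothetical helper: transfer size (bytes) that Wa_16017236439 requires
 * for a given BCS engine instance: 64B for even, 256B for odd instances.
 */
static unsigned int bcs_transfer_size(unsigned int bcs_instance)
{
	return (bcs_instance % 2 == 0) ? 64 : 256;
}
```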
v2: Move WA from engine_was[] to lrc_was[]
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
As with PVC, Xe2 platforms require that the index of an uncached MOCS
entry be programmed into the GUC_SHIM_CONTROL register. This will
likely be needed on future platforms as well.
Xe2 also extends the size of the MOCS index register field from two bits
to four bits. Since these extra bits were unused on PVC, it should be
safe to just increase the size of the mask.
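The following sketch shows why widening the mask is safe: any index that fits in two bits encodes identically under the old and new masks, while the wider mask can also hold indices up to 15. The shift value and macro names are assumptions for illustration; only the 2-bit to 4-bit widening comes from the text above.

```c
#include <stdint.h>

/* Assumed field position; only the widening from 2 to 4 bits matters here. */
#define MOCS_INDEX_SHIFT     24
#define MOCS_INDEX_MASK_PVC  (UINT32_C(0x3) << MOCS_INDEX_SHIFT) /* 2 bits */
#define MOCS_INDEX_MASK_XE2  (UINT32_C(0xf) << MOCS_INDEX_SHIFT) /* 4 bits */

/* Replace the MOCS index field of a GUC_SHIM_CONTROL value. */
static uint32_t shim_set_mocs_index(uint32_t val, uint32_t mask, uint32_t index)
{
	return (val & ~mask) | ((index << MOCS_INDEX_SHIFT) & mask);
}
```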
Bspec: 60592
Cc: Haridhar Kalvala <haridhar.kalvala@intel.com>
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Xe2 uses the same steering control register and steering semaphore
register as MTL. As with recent platforms, group/instance 0,0 is
sufficient to target a non-terminated instance for most classes of MCR
registers; the only types of ranges that need to consider platform
fusing to find a non-terminated instance are SLICE/DSS ranges and a new
SQIDI_PSMI type of range.
Note that the range of valid bits in XE2_NODE_ENABLE_MASK may be reduced
for some Xe2 SKUs. However, the lowest bits are always valid, and only
the lowest instance is obtained via __ffs(), so there's no need to
complicate the masking with extra platform/subplatform checks.
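Because __ffs() returns the index of the lowest set bit, stray or invalid upper bits in XE2_NODE_ENABLE_MASK cannot change the result as long as a valid low bit is set. The hypothetical helper below is a userspace stand-in that uses the GCC/Clang builtin __builtin_ctz() in place of the kernel's __ffs() (both are undefined for 0, so the empty mask is handled separately).

```c
#include <stdint.h>

/* Return the lowest enabled instance in a node-enable mask, or -1 if none. */
static int lowest_enabled_instance(uint32_t node_enable_mask)
{
	if (node_enable_mask == 0)
		return -1;
	return __builtin_ctz(node_enable_mask);
}
```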
Also note that Wa_14017387313 suggests skipping MCR lock acquisition
around GAM and GAMWKR registers to prevent MCR register accesses in an
interrupt handler from deadlocking when the steering semaphore is
already held outside the interrupt context. At this time Xe never
issues MCR accesses from within an interrupt handler so the workaround
is not currently needed.
v2:
- [0x008700-0x0087FF] range to extend up to 0x887F (Matt Atwood)
- [0x00EF00-0x00F4FF] -> [0x00F000-0x00FFFF] to follow latest
bspec version (Bala)
Bspec: 71185
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com>
Reviewed-by: Matt Atwood <matthew.s.atwood@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
The workaround database tells us to set this bit, even though the bspec
indicates the bit doesn't exist on these platforms. Since this is a
write-only register, we also can't read back its value to verify whether
it's actually working or not. For now we'll trust that the workaround
database knows what it's talking about; if not, the hardware will just
ignore the attempt to write to a non-existent bit and it shouldn't cause
any problems.
Reviewed-by: Matt Atwood <matthew.s.atwood@intel.com>
Link: https://lore.kernel.org/r/20230727220920.2291913-2-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Add the registers to get the C6 residency of the MTL SAMedia GT and
the C6 status of the MTL GTs.
v2:
- move register definitions to regs header (Anshuman)
- correct reg definition for mtl rc status
- make idle_status function common (Badal)
v3:
- remove extra line in commit message
- use only media type check in initialization
- use graphics ver check (Anshuman)
v4:
- remove extra lines (Anshuman)
Bspec: 66300
Signed-off-by: Badal Nilawar <badal.nilawar@intel.com>
Signed-off-by: Riana Tauro <riana.tauro@intel.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Reviewed-by: Anshuman Gupta <anshuman.gupta@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
This adds a handful of workarounds that apply to production steppings of
MTL:
- Wa_14018575942
- Wa_22016670082
- Wa_14017856879
- Wa_18019271663
Wa_22016670082 is currently only applied to the primary GT, but may
need to be extended to the media GT in the future if a
pending update to the workaround database gets finalized.
OOB workarounds will need to be implemented separately in future patches
for Wa_14016712196, Wa_16018063123, and Wa_18013179988.
Reviewed-by: Radhakrishna Sripada <radhakrishna.sripada@intel.com>
Link: https://lore.kernel.org/r/20230608181217.2385932-1-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
For scratch table mode we need to cover the case where a scratch PTE might
have been pre-fetched and cached and used instead of that of the newly
bound vma.
For compute vms, invalidate TLB globally using GuC before signalling
bind complete. For !long-running vms, invalidate TLB at batch start.
Also document how TLB invalidation works.
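The policy above can be summarized as a small decision function. This is a hypothetical sketch: it assumes compute VMs are the long-running ones, and it only covers the scratch-table case this change addresses (returning "none" elsewhere says nothing about other invalidation paths).

```c
#include <stdbool.h>

/* Hypothetical summary of where a scratch-PTE TLB invalidation happens. */
enum tlb_inval_point {
	TLB_INVAL_NONE,                     /* outside the scope of this change */
	TLB_INVAL_GUC_BEFORE_BIND_COMPLETE, /* compute (long-running) VMs */
	TLB_INVAL_AT_BATCH_START,           /* !long-running VMs */
};

static enum tlb_inval_point
scratch_tlb_inval_point(bool scratch_table_mode, bool long_running)
{
	if (!scratch_table_mode)
		return TLB_INVAL_NONE;
	return long_running ? TLB_INVAL_GUC_BEFORE_BIND_COMPLETE
			    : TLB_INVAL_AT_BATCH_START;
}
```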
v2:
- Fix a pointer to the comment about TLB invalidation (Jose Souza).
- Add a bool to the vm whether we want to invalidate TLB at batch start.
- Invalidate TLB also on BCS- and video engines at batch start where
needed.
- Use BIT() macro instead of explicit shift.
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Tested-by: José Roberto de Souza <jose.souza@intel.com> #v1
Reported-by: José Roberto de Souza <jose.souza@intel.com> #v1
Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/291
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/291
Acked-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reformat the GuC register header according to the same rules used by
other register headers:
- Register definitions are ordered by offset
- Values of #defines start on column 49
- Lowercase used for hex values
No functional change.
This header has some things that aren't directly related to register
definitions (e.g., number of doorbells, doorbell info structure, GuC
interrupt vector layout, etc.). These items have been moved to the bottom
of the header.
Cc: Michal Wajdeczko <michal.wajdeczko@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20230602235210.1314028-1-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
The _total_vram_size helper is device-based and is not complete.
Teach the helper to be tile aware and add the ability to size
DG1 correctly.
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
For platforms with GMD_ID registers, the IP stepping should be
determined from the 'revid' field of those registers rather than from
the PCI revid.
The hardware teams have indicated that they plan to keep the revid =>
stepping mapping consistent across all GMD_ID platforms, with major
steppings (A0, B0, C0, etc.) having revids that are multiples of 4, and
minor steppings (A1, A2, A3, etc.) taking the intermediate values. For
now we'll trust that hardware follows through on this plan; if they have
to change direction in the future (e.g., they wind up needing something
like an "A4" that doesn't fit this scheme), we can add a GMD_ID-based
lookup table when the time comes.
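Under the scheme described above, the stepping follows directly from the revid: the major stepping letter is revid / 4 and the minor number is revid % 4 (revid 5 is B1, revid 8 is C0). The helper name below is hypothetical and returns a string purely for illustration.

```c
#include <stdio.h>

/*
 * Map a GMD_ID revid to a stepping name: major = revid / 4 (A, B, C, ...),
 * minor = revid % 4. Writes e.g. "B1" into buf (needs at least 3 bytes).
 */
static void gmdid_revid_to_stepping(unsigned int revid, char *buf, size_t len)
{
	snprintf(buf, len, "%c%u", (char)('A' + revid / 4), revid % 4);
}
```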
v2:
- Set xe->info.platform before finding stepping; the pre-GMD_ID code
relies on this value to pick a lookup table.
v3:
- Also set xe->info.subplatform before picking the stepping for
pre-GMD_ID lookup.
Reviewed-by: Balasubramani Vivekanandan <balasubramani.vivekanandan@intel.com>
Link: https://lore.kernel.org/r/20230524185952.666158-1-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
guc_mmio_regset_write() had a flags parameter for the registers to be
added to the GuC's regset list. The only register actually using it was
RCU_MODE, but it was setting the flags to a bogus value. From
xe_guc_fwif.h:
#define GUC_REGSET_MASKED BIT(0)
#define GUC_REGSET_MASKED_WITH_VALUE BIT(2)
#define GUC_REGSET_RESTORE_ONLY BIT(3)
Cross checking with i915, the only flag to set in RCU_MODE is
GUC_REGSET_MASKED. That can be done automatically from the register, as
long as the definition is correct.
Add the XE_REG_OPTION_MASKED annotation to RCU_MODE and kill the "flags"
field in guc_mmio_regset_write(): guc_mmio_regset_write_one() can decide
that based on the register being passed.
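The idea of deriving the flag from the register definition can be sketched as below. struct reg_sketch is a simplified stand-in for struct xe_reg, and guc_regset_flags() is a hypothetical name; the flag values are the ones quoted above.

```c
#include <stdint.h>

#define BIT(n) (UINT32_C(1) << (n))
#define GUC_REGSET_MASKED            BIT(0) /* values quoted above */
#define GUC_REGSET_MASKED_WITH_VALUE BIT(2)
#define GUC_REGSET_RESTORE_ONLY      BIT(3)

/* Simplified stand-in for struct xe_reg: only the fields used here. */
struct reg_sketch {
	uint32_t addr;
	unsigned int masked : 1; /* set via an XE_REG_OPTION_MASKED-style option */
};

/* Derive the GuC regset flags from the register definition itself. */
static uint32_t guc_regset_flags(struct reg_sketch reg)
{
	return reg.masked ? GUC_REGSET_MASKED : 0;
}
```

With this shape, callers no longer pass a separate flags argument that can drift out of sync with the register definition.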
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/20230429062332.354139-3-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Copy CS instructions that don't have an explicit MOCS field will use this
default MOCS value.
v2:
- move to xe_hw_engine.c
- remove BLIT_CCTL auxiliary macros
- removed MASKED_REG
v3:
- rebased
v4:
- process workaround in hwe->reg_lrc
v5:
- add a new function and call it from xe_gt_record_default_lrcs()
because hwe->reg_lrc is initialized later
BSpec: 45807
Cc: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
CS instructions that don't have an explicit MOCS field will use this
default MOCS value.
To do this, it was necessary to initialize part of the MOCS setup earlier
and add a new function that loads another array of RTP entries set
at run time.
Handling of the MOCS read for platforms with HAS_L3_CCS_READ (aka PVC)
is still missing.
v2:
- move to xe_hw_engine.c
- remove CMD_CCTL auxiliary macros
v3:
- rebased
Bspec: 45826
Cc: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
These should replace the _MMIO() and MCR_REG() macros from i915, with the
goal of being more extensible, allowing additional fields of
struct xe_reg and struct xe_reg_mcr to be passed. Replace all uses of
_MMIO() and MCR_REG() in xe.
Since the RTP, reg-save-restore and WA infra are not ready to use the
new type yet, just undef the macro, as was previously done for the i915
types. That conversion will come later.
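A minimal sketch of how designated initializers make such a macro extensible: callers can add optional fields without touching existing XE_REG(addr) users. The struct layout and the macro body are assumptions for illustration and not necessarily how Xe defines them; putting the address inside __VA_ARGS__ keeps the single-argument form valid C11.

```c
#include <stdint.h>

/* Simplified register type; the field set is an assumption for the sketch. */
struct xe_reg {
	uint32_t addr : 22;
	uint32_t masked : 1;
	uint32_t mcr : 1;
};

/*
 * XE_REG(0x2000) expands to { .addr = 0x2000 } and
 * XE_REG(0x2000, .masked = 1) to { .addr = 0x2000, .masked = 1 };
 * unspecified fields are zero-initialized.
 */
#define XE_REG(...) ((const struct xe_reg){ .addr = __VA_ARGS__ })
```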
v2: Remove MEDIA_SOFT_SCRATCH_COUNT/MEDIA_SOFT_SCRATCH re-added by
mistake (Matt Roper)
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://lore.kernel.org/r/20230427223256.1432787-8-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Stop using i915 types for registers. Use our own types. Unlike i915,
this keeps the knowledge about the different types of registers in the
register definition itself. For now, the "flags"/"options" are mcr and
masked, although only the former is being used.
Additionally, MCR registers have their own type. The only code that
should really look inside an xe_mcr_reg_t is the code dealing with
steering; using other APIs when the register is MCR has been a source
of problems in the past.
Most of the driver is agnostic to the register differences since it
either uses the definition from the header or already calls the correct
MCR_REG()/_MMIO() macros. By embedding the struct xe_reg inside the MCR
struct, it's also possible to guarantee the compiler will break if
RANDOM_MCR_REG.reg is attempted, since the u32 is now inside the
inner struct.
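The embedding trick can be sketched as follows. The type and member names (xe_reg_mcr, inner, mcr_reg_addr()) are assumptions for this example. Because an MCR register is a different struct type, passing it where a plain struct xe_reg is expected, or reading .addr on it directly, fails to compile; only code that deliberately unwraps .inner, such as steering logic, can reach the address.

```c
#include <stdint.h>

struct xe_reg {
	uint32_t addr : 22;
	uint32_t masked : 1;
	uint32_t mcr : 1;
};

/* MCR registers wrap struct xe_reg, so reg.addr is not directly reachable. */
struct xe_reg_mcr {
	struct xe_reg inner;
};

#define XE_REG_MCR(r_) \
	((const struct xe_reg_mcr){ .inner = { .addr = (r_), .mcr = 1 } })

/* Only steering-aware code is expected to unwrap the inner register. */
static uint32_t mcr_reg_addr(struct xe_reg_mcr reg)
{
	return reg.inner.addr;
}

/* e.g. `XE_REG_MCR(0xb100).addr` would be a compile error here. */
```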
v2:
- Keep a dedicated type for MCR registers to avoid misuse
(Matt Roper, Jani)
- Drop the typedef and just use a struct since it's not an opaque type
(Jani)
- Add more kernel-doc
v3:
- Use only 22 bits for the register address since all the platforms
supported so far have only 4MB of MMIO per tile (Matt Roper)
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://lore.kernel.org/r/20230427223256.1432787-7-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Convert the macro declarations to the equivalent GENMASK and
bitfield prep macros for all registers.
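For readers unfamiliar with the style, the sketch below uses userspace stand-ins for the kernel's REG_GENMASK()/REG_FIELD_PREP()/REG_FIELD_GET() (the kernel versions add compile-time checks) and shows that GENMASK(5, 3) is the same mask as the open-coded (0x7 << 3). EXAMPLE_FREQ_MASK is a hypothetical field, and __builtin_ctz() is a GCC/Clang builtin.

```c
#include <stdint.h>

/* Userspace stand-ins for the kernel helpers (no compile-time checks). */
#define GENMASK32(h, l)     ((~UINT32_C(0) >> (31 - (h))) & (~UINT32_C(0) << (l)))
#define FIELD_SHIFT(m)      ((uint32_t)__builtin_ctz(m))
#define FIELD_PREP32(m, v)  (((uint32_t)(v) << FIELD_SHIFT(m)) & (m))
#define FIELD_GET32(m, x)   (((uint32_t)(x) & (m)) >> FIELD_SHIFT(m))

/* Hypothetical field occupying bits 5:3 of some register. */
#define EXAMPLE_FREQ_MASK   GENMASK32(5, 3)
```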
v2 (Matt Roper):
- Fix wrong conversion of RPM_CONFIG0_CRYSTAL_CLOCK_FREQ_MASK
- Reorder fields of XEHP_SLICE_UNIT_LEVEL_CLKGATE for consistency
- Simplify CTC_SOURCE_* by only defining CTC_SOURCE_DIVIDE_LOGIC
as REG_BIT(0)
v3: Also remove DOP_CLOCK_GATE_ENABLE that is unused and wrongly defined
Reviewed-by: Matt Atwood <matthew.s.atwood@intel.com>
Link: https://lore.kernel.org/r/20230427223256.1432787-4-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>