- The XY_CTRL_SURF_COPY_BLT instruction operating on ccs data expects
size in pages of main memory for which CCS data should be copied.
- The bitfield representing copy size in XY_CTRL_SURF_COPY_BLT has
shifted one bit higher in the instruction.
v2:
- Fix the num_pages for ccs size calculation.
- Address nits (Thomas)
v3:
- Use FIELD_PREP and FIELD_FIT instead of shifts and numbers.(Matt)
Cc: Matt Roper <matthew.d.roper@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Disable dynamic HW load balancing of compute resource assignment
to engines and instead enabled fixed mode of mapping compute
resources to engines on all platforms with more than one compute
engine.
By default enable only one CCS engine with all compute slices
assigned to it. This is the desired configuration for common
workloads.
PVC platform supports only the fixed CCS mode (workaround 16016805146).
v2: Rebase, make it platform agnostic
v3: Minor code refactoring
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
This workaround applies to graphics 20.04 on all engines.
Workaround has three parts :
1. Pipe flush before MI_ATOMIC - This part isn't relevant to Xe
(at least not right now) since we don't use MI_ATOMIC anywhere
in the kernel mode driver.
2. Memory-based interrupt masking - Memory-based interrupt processing
isn't supported on physical functions, only virtual functions,
according to bspec 60352. So this is probably only relevant once
SRIOV support lands in the driver.
3. Disabling CSB/timestamp updates to the ghwsp and pphwsp - Workaround
is added by this change.
The CSB reports to gHWSP and ppHWSP have been discussed as part
of a different topic on some internal threads and we've confirmed
that neither the KMD nor the GuC firmware use those for anything,
so disabling them is always "safe" and should have no functional
or performance impact on system operation. The same is true for
the timestamp updates in the ppHWSP as well. Given that, it might
make sense to just combine these two workarounds into a single
record (and single patch) and apply it on all steppings. Disabling
the reports for RCS on higher steppings doesn't have any kind of
negative impact and will simplify the overall situation.
V3(MattR):
- Combine WA apply same WA for all engines, no performance impact
V2(MattR):
- Mention detail in commit message
- Reorder bit define
- Improve bit naming
- Remove workaround part which isnt relevant
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Starting on MTL, the HuC is authenticated twice, once via GuC (same as
with older integrated platforms) and once via GSC; the first
authentication allows the HuC to be used for clear-media workloads,
while the second one unlocks support for protected content.
Ahead of adding the authentication flow via GSC, this patch adds support
for differentiating the 2 auth steps and checking if they're complete.
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Alan Previn <alan.previn.teres.alexis@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: Vivaik Balasubrawmanian <vivaik.balasubrawmanian@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Workaround applies to Graphics 20.04 as part of ring
submission
V4(MattR):
- Rule for engine in oob WA not supported, add explicitly
V3(MattR):
- Pass hwe and rename API name to hint end of ring work
- Use existing RING_NOPID API
V2:
- Marking this WA for 20.04 instead of 20.00
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Skip the init/start/stop GuC PC functions and toggle C6 using
register writes instead. Also request max possible frequency
as dynamic freq management is disabled.
v2: Fix compile warning
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
When the GSC FW is loaded, we need to inform it when a GSCCS reset is
coming and then wait 200ms for it to get ready to process the reset.
v2: move WA code to GSC file, use variable in Makefile (John)
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: John Harrison <john.c.harrison@intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
The GSC FW must be copied in a 4MB stolen memory allocation, whose GGTT
address is then passed as a parameter to a dedicated load instruction
submitted via the GSC engine.
Since the GSC load is relatively slow (up to 250ms), we perform it
asynchronously via a worker. This requires us to make sure that the
worker has stopped before suspending/unloading.
Note that we can't yet use xe_migrate_copy for the copy because it
doesn't work with stolen memory right now, so we do a memcpy from the
CPU side instead.
v2: add comment about timeout value, fix GSC status checking
before load (John)
Bspec: 65306, 65346
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Alan Previn <alan.previn.teres.alexis@intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Reviewed-by: John Harrison <John.C.Harrison@Intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
This workaround applies to Xe2_LPM
V3(MattR):
- Reorder reg and wa placement
- Add base parameter to reg macro for better definition
V2(MattR):
- Change name of register
- Loop for all engines
- Driver permanent WA, applies to all steps
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Tejas Upadhyay <tejas.upadhyay@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
As for display, the intent is to share the display code with the i915
driver so that there is maximum reuse there.
We do this by recompiling i915/display code twice.
Now that i915 has been adapted to support the Xe build, we can add
the xe/display support.
This initial work is a collaboration of many people and unfortunately
this squashed patch won't fully honor the proper credits.
But let's try to add a few from the squashed patches:
Co-developed-by: Matthew Brost <matthew.brost@intel.com>
Co-developed-by: Jani Nikula <jani.nikula@intel.com>
Co-developed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Co-developed-by: Matt Roper <matthew.d.roper@intel.com>
Co-developed-by: Mauro Carvalho Chehab <mchehab@kernel.org>
Co-developed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Co-developed-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Starting GT freq is usually RPn. Raising freq to RP0 will
help speed up GuC load times. As an example, this data was
collected on DG2-
GuC Load time @RPn ~ 41 ms
GuC Load time @RP0 ~ 11 ms
v2: Raise GT freq before hwconfig init. This will speed up
both HuC and GuC loads. Address review comments (Rodrigo).
Also add a small usleep after requesting frequency which gives
pcode some time to react.
v3: Address checkpatch issue
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Vinay Belgaumkar <vinay.belgaumkar@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Driver initiated function-reset (FLR) is the highest level of reset
that we can trigger from within the driver. In contrast to PCI FLR it
doesn't require re-enumeration of PCI BAR. It can be useful in case
GT fails to reset. It is also the only way to trigger GSC reset from
the driver and can be used in future addition of GSC support.
v2:
- use regs from xe_regs.h
- move the flag to xe.mmio
- call flr only on root gt
- use BIOS protection check
- copy/paste comments from i915
v3:
- flr code moved to xe_device.c
v4:
- needs_flr_on_fini moved to xe_device
Signed-off-by: Andrzej Hajda <andrzej.hajda@intel.com>
Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
This workaround is primarily implemented by the BIOS. However if the
BIOS applies the workaround it will reserve a small piece of our DSM
(which should be at the top, right below the WOPCM); we just need to
keep that region reserved so that nothing else attempts to re-use it.
v2 (Gustavo):
- Check for NULL media_gt
- Mask bits [5:0] to avoid potential issues in future platforms
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com>
Link: https://lore.kernel.org/r/20231102124855.1940491-1-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Expose power1_max_interval, that is the tau corresponding to PL1, as a
custom hwmon attribute. Some bit manipulation is needed because of the
format of PKG_PWR_LIM_1_TIME in
PACKAGE_RAPL_LIMIT register (1.x * power(2,y))
v2: Get rpm wake ref while accessing power1_max_interval
v3: %s/hwmon/xe_hwmon/
v4:
- As power1_max_interval is rw attr take lock in read function as well
- Refine comment about val to fix point conversion (Andi)
- Update kernel version and date in doc
v5: Fix review comments (Anshuman)
Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com>
Reviewed-by: Anshuman Gupta <anshuman.gupta@intel.com>
Signed-off-by: Badal Nilawar <badal.nilawar@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231030115618.1382200-4-badal.nilawar@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Add L3SQCREG5 as part of HW recommended settings. The recommended value
in Bspec is 00e0007f. For Xe2-LPG, bits 23:21 don't exist anymore, but
it's confirmed with HW engineers that setting them doesn't do anything.
They still exist on the media GT, Xe2-LPM, but they are already they are
already set as per HW default value. So for Xe2 platform, the only bits
that need to be set are 9:0 since HW's default is 0x1ff and the
recommended value is 0x7f.
Unlike most registers, which have the same relative offset on both
the primary and media GT, this register has a different base offset
on the media GT.
On MTL the register only exists for the primary (graphics) GT, so
there's no need to program it on the media gt. Also, it's part of the
RCS engine's context, so it needs to be added as a LRC workaround.
Bspec: 72161
Signed-off-by: Shekhar Chauhan <shekhar.chauhan@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://lore.kernel.org/r/20231024220739.224251-2-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Add the initial collection of gt/engine/lrc workarounds.
While at it, add some newlines around the platform/IP comments to make
them consistent across all workarounds.
v2:
- FF_MODE is an MCR register (Matt Roper)
- Group 18032247524 with other Xe2 workarounds (Matt Roper)
- Move WA changing PSS_CHICKEN to lrc_was[] as for Xe2 that register
is part of the render context image (Matt Roper)
- Apply WA 16020518922 only on render engine (Matt Roper)
Signed-off-by: Dnyaneshwar Bhadane <dnyaneshwar.bhadane@intel.com>
Signed-off-by: Shekhar Chauhan <shekhar.chauhan@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://lore.kernel.org/r/20231024220739.224251-1-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Extracting the common MI_* instructions that can be used with any engine
to their own header will make it easier as we add additional engine
instructions in upcoming patches.
Also, since the majority of GPU instructions (both MI and non-MI) have
a "length" field in bits 7:0 of the instruction header, a common define
is added for that. Instruction-specific length fields are still defined
for special case instructions that have larger/smaller length fields.
v2:
- Use "instr" instead of "inst" as the short form of "instruction"
everywhere. (Lucas)
- Include xe_reg_defs.h instead of the i915 compat header. (Lucas)
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20231016163449.1300701-12-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
MI_STORE_DATA_IMM can store either dword values or qword values, and can
store more than one value if the instruction's length field is large
enough. Create explicit defines to specify the number of dwords/qwords
to be stored, which will set the instruction length correctly and, if
necessary, turn on the 'store qword' bit.
While we're here, also replace an open-coded version of
MI_STORE_DATA_IMM with the common macros.
Bspec: 60246
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20231016163449.1300701-11-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Despite its name, MI_FLUSH_DW instruction can write an immediate value
of either dword size or qword size, depending on the 'length' field of
the instruction. Since "length" excludes the first two dwords of the
instruction, a value of 2 in the length field implies a dword write and
a value of 3 implies a qword write. Even in cases where the flush
instruction's post-sync operation is set to "no write" we're still
expected to size the overall instruction as if we were doing a dword or
qword write (i.e., a length of 1 shouldn't be used on modern platforms).
Rather than baking a size of "1" into the #define and then adding
another unexplained "+ 1" at all the spots where the definition gets
used, lets just create MI_FLUSH_IMM_DW and MI_FLUSH_IMM_QW definitions
that should be OR'd into the instruction header to make it more explicit
what behavior we're requesting.
Bspec: 60229
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20231016163449.1300701-9-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Xe driver currently supports 22-bit addresses for MMIO access.
Future platforms will have additional MMIO extension with
larger address spaces, and to access them, the driver will
have to support wider address representation.
Please note that while the XE_REG macro is used for MMIO access,
XE_REG_EXT macro will be used for MMIO-extension access.
Signed-off-by: Koby Elbaz <kelbaz@habana.ai>
Reviewed-by: Ofir Bitton <obitton@habana.ai>
Reviewed-by: Moti Haimovski <mhaimovski@habana.ai>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Force indirect state sampler data to only be in the dynamic state pool,
which is more convienent for the UMD. Behavior change mirrors similar
change for i915 in commit 16fc9c08f0 ("drm/i915: disable sampler
indirect state in bindless heap")
v2: split out per engine tuning into separate patch, commit message
(Lucas)
v3: rebase
v4: Change to match render only, g.ver 1200 to 1271 (MattR)
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Matt Atwood <matthew.s.atwood@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
It was reading (base) + 0x8c but that is not a valid register
and instead it should read (base) + 0x68.
So here reading the correct register and removing the wrong and
duplicated.
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>