The majority of xe_gt_irq_postinstall() is really focused on the
hardware engine interrupts; other GT-related interrupts such as the GuC
are enabled/disabled independently. Renaming the function and making it
truly GT-specific will make it more clear what the intended focus is.
Disabling/masking of other interrupts (such as GuC interrupts) is
unnecessary since that has already happened during the irq_reset stage,
and doing so will become harmful once the media GT is re-enabled since
calls to xe_gt_irq_postinstall during media GT initialization would
incorrectly disable the primary GT's GuC interrupts.
Also, since this function is called from gt_fw_domain_init(), it's not
necessary to also call it earlier during xe_irq_postinstall; just
xe_irq_resume to handle runtime resume should be sufficient.
v2:
- Drop unnecessary !gt check. (Lucas)
- Reword some comments about enable/unmask for clarity. (Lucas)
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20230601215244.678611-26-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
The xe_irq_postinstall() never actually gets called after installing the
interrupt handler. This oversight seems to get papered over due to the
fact that the (misnamed) xe_gt_irq_postinstall does more than it really
should and gets called in the middle of the GT initialization. The
callstack for postinstall is also a bit muddled with top-level device
interrupt enablement happening within platform-specific functions called
from the per-tile xe_gt_irq_postinstall() function.
Clean this all up by adding the missing call to xe_irq_postinstall()
after installing the interrupt handler and pull top-level irq enablement
up to xe_irq_postinstall where we'd expect it to be.
The xe_gt_irq_postinstall() function is still a bit misnamed here; an
upcoming patch will refocus its purpose and rename it.
v2:
- Squash in patch to actually call xe_irq_postinstall() after
installing the interrupt handler.
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20230601215244.678611-25-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Although primary and media GuC share a single interrupt enable bit, they
each have distinct bits in the mask register. Although we always enable
interrupts for the primary GuC before the media GuC today (and never
disable either of them), this might not always be the case in the
future, so use a RMW when updating the mask register to ensure the other
GuC's mask doesn't get clobbered.
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20230601215244.678611-24-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
IRQ delivery and handling needs to be handled on a per-tile basis. Note
that this is true even for the "GT interrupts" relating to engines and
GuCs --- the interrupts relating to both GTs get raised through a single
set of registers in the tile's sgunit range.
On true multi-tile platforms, interrupts on remote tiles are internally
forwarded to the root tile; the first thing the top-level interrupt
handler should do is consult the root tile's instance of
DG1_MSTR_TILE_INTR to determine which tile(s) had interrupts. This
register is also responsible for enabling/disabling top-level reporting
of any interrupts to the OS. Although this register technically exists
on all tiles, it should only be used on the root tile.
The (mis)use of struct xe_gt as a target for MMIO operations in the
driver makes the code somewhat confusing since we wind up needing a GT
pointer to handle programming that's unrelated to the GT. To mitigate
this confusion, all of the xe_gt structures used solely as an MMIO
target in interrupt code are renamed to 'mmio' so that it's clear that
the structure being passed does not necessarily relate to any specific
GT (primary or media) that we might be dealing with interrupts for.
Reworking the driver's MMIO handling to not be dependent on xe_gt is
planned as a future patch series.
Note that GT initialization code currently calls xe_gt_irq_postinstall()
in an attempt to enable the HWE interrupts for the GT being initialized.
Unfortunately xe_gt_irq_postinstall() doesn't really match its name and
does a bunch of other stuff unrelated to the GT interrupts (such as
enabling the top-level device interrupts). That will be addressed in
future patches.
v2:
- Clarify commit message with explanation of why DG1_MSTR_TILE_INTR is
only used on the root tile, even though it's an sgunit register that
is technically present in each tile's MMIO space. (Aravind)
- Also clarify that the xe_gt used as a target for MMIO operations may
or may not relate to the GT we're dealing with for interrupts.
(Lucas)
Cc: Aravind Iddamsetty <aravind.iddamsetty@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Reviewed-by: Aravind Iddamsetty <aravind.iddamsetty@intel.com>
Link: https://lore.kernel.org/r/20230601215244.678611-22-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Now that tiles and GTs are handled separately, extra_gts[] doesn't
really provide any useful information that we can't just infer directly.
The primary GT of the root tile and of the remote tiles behave the same
way and don't need independent handling.
When we re-add support for media GTs in a future patch, the presence of
media can be determined from MEDIA_VER() (i.e., >= 13) and media's GSI
offset handling is expected to remain constant for all forseeable future
platforms, so it won't need to be provided in a definition structure
either.
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20230601215244.678611-18-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
There are a bunch of places in the driver where we need to perform
non-GT MMIO against the platform's primary tile (display code, top-level
interrupt enable/disable, driver initialization, etc.). Rename
'to_gt()' to 'xe_primary_mmio_gt()' to clarify that we're trying to get
a primary MMIO handle for these top-level operations.
In the future we need to move away from xe_gt as the target for MMIO
operations (most of which are completely unrelated to GT).
v2:
- s/xe_primary_mmio_gt/xe_root_mmio_gt/ for more consistency with how
we refer to tile 0. (Lucas)
v3:
- Tweak comment on xe_root_mmio_gt(). (Lucas)
Acked-by: Nirmoy Das <nirmoy.das@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20230601215244.678611-16-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Migration primarily focuses on the memory associated with a tile, so it
makes more sense to track this at the tile level (especially since the
driver was already skipping migration operations on media GTs).
Note that the blitter engine used to perform the migration always lives
in the tile's primary GT today. In theory that could change if media
GTs ever start including blitter engines in the future, but we can
extend the design if/when that happens in the future.
v2:
- Fix kunit test build
- Kerneldoc parameter name update
v3:
- Removed leftover prototype for removed function. (Gustavo)
- Remove unrelated / unwanted error handling change. (Gustavo)
Cc: Gustavo Sousa <gustavo.sousa@intel.com>
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Acked-by: Gustavo Sousa <gustavo.sousa@intel.com>
Link: https://lore.kernel.org/r/20230601215244.678611-15-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Since memory and address spaces are a tile concept rather than a GT
concept, we need to plumb tile-based handling through lots of
memory-related code.
Note that one remaining shortcoming here that will need to be addressed
before media GT support can be re-enabled is that although the address
space is shared between a tile's GTs, each GT caches the PTEs
independently in their own TLB and thus TLB invalidation should be
handled at the GT level.
v2:
- Fix kunit test build.
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20230601215244.678611-13-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Each tile has its own register region in the BAR, containing instances
of all registers for the platform. In contrast, the multiple GTs within
a tile share the same MMIO space; there's just a small subset of
registers (the GSI registers) which have multiple copies at different
offsets (0x0 for primary GT, 0x380000 for media GT). Move the register
MMIO region size/pointers to the tile structure, leaving just the GSI
offset information in the GT structure.
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20230601215244.678611-7-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Create a new xe_tile structure to begin separating the concept of "tile"
from "GT." A tile is effectively a complete GPU, and a GT is just one
part of that. On platforms like MTL, there's only a single full GPU
(tile) which has its IP blocks provided by two GTs. In contrast, a
"multi-tile" platform like PVC is basically multiple complete GPUs
packed behind a single PCI device.
For now, just create xe_tile as a simple wrapper around xe_gt. The
items in xe_gt that are truly tied to the tile rather than the GT will
be moved in future patches. Support for multiple GTs per tile (i.e.,
the MTL standalone media case) will also be re-introduced in a future
patch.
v2:
- Fix kunit test build
- Move hunk from next patch to use local tile variable rather than
direct xe->tiles[id] accesses. (Lucas)
- Mention compute in kerneldoc. (Rodrigo)
Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/20230601215244.678611-3-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
It looks like the driver only wants to track one vma for each external
object per vm. However it looks like bo_has_vm_references_locked() will
ignore any vma that is marked as vma->destroyed (not actually destroyed
yet). If we then mark our externally tracked vma as destroyed and then
create a new vma for the same object and vm, we can have two externally
tracked vma for the same object and vm. When the destroy actually
happens it tries to move the external tracking to a different vma, but
in this case it is already being tracked, leading to double list add
errors. It should be safe to simply drop the destroyed check in
bo_has_vm_references(), since the actual destroy will switch the
external tracking to the next available vma.
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/290
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
__emit_job_gen12_render_compute() masks some PIPE_CONTROL bits that
do not exist in platforms without render engine.
So here replacing the PVC check by something more generic that will
support any future platforms without render engine.
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
The _io_offset helper function is returning an offset into the GPU
address space. Using the CPU address offset (io_) is not correct.
Rename to reflect usage.
Update to use GPU offset information.
Update PT dma_offset to use the helper
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
The current method of sizing GT device memory is not quite right.
Update the algorithm to use the relevant HW information and offsets
to set up the sizing correctly.
Update the stolen memory sizing to reflect the changes, and to be
GT specific.
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
"Right sizing" the PCI BAR is not necessary. If rebar is needed
size to the maximum available.
Preserve the force_vram_bar_size sizing.
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
The _total_vram_size helper is device based and is not complete.
Teach the helper to be tile aware and add the ability to size
DG1 correctly.
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Lockdep is unhappy about ggtt->lock -> runtime_pm, where it seems
to think this can somehow get inverted. The ggtt->lock looks like a
potentially sensitive driver lock, so likely a sensible move to never
call the runtime_pm routines while holding it. Actually it looks like
d3cold wants to grab this, so perhaps this can indeed deadlock.
v2:
- Don't forget about xe_gt_tlb_invalidation_vma(), which now needs
explicit access_get.
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Seems to be a sensitive lock, where ct->lock looks to be primed with
fs_reclaim, so holding that and then allocating memory will cause
lockdep to complain. We need to change the ordering wrt to grabbing the
ct->lock and potentially grabbing the runtime_pm, since some of the
runtime_pm routines can allocate memory (or at least that's what lockdep
seems to suggest).
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
There is no mention that migrate_copy() will skip copying the CCS aux
state for all types of vram -> vram transfers. Currently we don't need
such a facility but might be surprising if we ever do.
v2: (Lucas):
- s/lmem/vram/ in the commit message
- Tidy up the code a bit; use one emit_copy_ccs()
v3:
- Reword the commit message
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Lucas De Marchi <lucas.demarchi@intel.com>
Acked-by: Nirmoy Das <nirmoy.das@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Modify the xe_migrate_copy() function somewhat to explicitly allow
copying of data between two buffer objects including system memory
buffer objects. Update the migrate test accordingly.
v2:
- Check that buffer object sizes match when copying (Matthew Auld)
Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Port Wa_14014475959 to xe_wa fixing its condition. The workaround should
only be applied on the primary GT, not on media. So just checking by
MTL platform is not enough: checking GT is of the right type is also
needed.
Since the GRAPHICS_STEP() does checks the GT type, we could leave the
first check as a platform one: it'd would be easier to understand and
not go out of sync with the graphics_ip_map[] in
drivers/gpu/drm/xe/xe_pci.c. However it also means that new platforms
using the same IP wouldn't match. Prefer using the IP version.
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://lore.kernel.org/r/20230526164358.86393-22-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
When running rules on MTL and beyond that have media as a standalone GT,
the rule should only match if the gt passed as parameter match the
version/range/stepping that the rule is checking. This allows
workarounds affecting only the media GT to be applied only on that GT
and vice-versa.
For platforms before MTL, the GT will not be of media type, even if it
includes media engines. Make sure to cover that case by checking if the
platforma has standalone media.
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://lore.kernel.org/r/20230526164358.86393-21-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Wa_16015675438 and Wa_18020744125 apply to DG2 using the same action and
conditions. Add both to the oob rules so they are both reported as
active. Note that previously they were not checking by platform or IP
version, hence making them not future-proof. Those workarounds should
only be active in PVC and DG2, besides the check for "no render engine".
v2: From current WA database, Wa_16015675438 applies to all DG2
subplatforms except G11. Migrate condition to use subplatform and
remove G11 from the match (Matt Roper)
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://lore.kernel.org/r/20230526164358.86393-19-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Wa_22012727170 and Wa_22012727685 apply to DG2 using the same action and
conditions. Add both to the oob rules so they are both reported as
active.
Do not Wa_22012727170 to PVC and MTL since only early A* steppings are
affected.
v2: Remove DG2_G10 from Wa_22012727685 to match current WA database
(Matt Roper)
v3: GRAPHICS_STEP(A0, FOREVER) can be left alone for DG2 as this means
all steppings
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://lore.kernel.org/r/20230526164358.86393-18-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
There are WAs that, due to their nature, cannot be applied from a
central place like xe_wa.c. Those are peppered around the rest of the
code, as needed. Now they have a new name: "out-of-band workarounds".
These workarounds have their names and rules still grouped in xe_wa.c,
inside the xe_wa_oob array, which is generated at compile time by
xe_wa_oob.rules and the hostprog xe_gen_wa_oob. The code generation
guarantees that the header xe_wa_oob.h contains the IDs for the
workarounds that match the index in the table. This way the runtime
checks that are spread throughout the code are simple tests against the
bitmap saved during initialization.
v2: Fix prev_name tracking not working when it's empty, i.e. when there
is more than 1 continuation rule.
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://lore.kernel.org/r/20230526164358.86393-13-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
When doing out-of-tree builds with O= or KBUILD_OUTPUT=, it's important
to also add the directory where the target is saved. Otherwise any file
generated by the build system may not be available for other targets
depending on it.
The $(obj) is added automatically when building the entire kernel,
but it's not added when M=drivers/gpu/drm/xe is added.
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://lore.kernel.org/r/20230526164358.86393-12-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Start differentiating the media and graphics stepping as it will be
important for MTL. Note that RTP is still not prepared to handle the
different types of GT, i.e. checking for graphics version/range/stepping
on a media gt or vice versa still matches regardless of the gt being
passed as parameter. Changing it to accommodate MTL is left for a future
patch.
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://lore.kernel.org/r/20230526164358.86393-10-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Allocate the data to track workarounds on each gt of the device,
and pass that to RTP so the active workarounds are tracked.
Even if the workarounds available until now are mostly device
or platform centric, with the different IP versions for media and
graphics starting with MTL, it's possible that some workarounds
need to be applied only on select GTs. Also, given the workaround
database is per IP block, for tracking purposes there is no need to
differentiate the workarounds per engine class. Hence the bitmask
to track active workarounds can be tracked per GT.
v2: Move the tracking from per-device to per-GT basis (Matt Roper)
Reviewed-by: Matt Roper <matthew.d.roper@intel.com>
Link: https://lore.kernel.org/r/20230526164358.86393-7-lucas.demarchi@intel.com
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>