Pull drm updates from Dave Airlie:
"As part of building up the nova-core/nova-drm pieces, we've brought in
some rust abstractions through this tree, auxiliary bus being the main
one, with devres changes also in the driver-core tree, along with the
drm core abstractions and enough nova-core/nova-drm to use them. This is
all still stub work under construction to build the nova driver upstream.
The other big NVIDIA-related change is that nouveau adds support for
Hopper/Blackwell GPUs; this required a GSP firmware update to 570.144
and a bunch of rework in order to support multiple firmware interfaces.
There is also the introduction of an asahi uapi header file as a
precursor to getting the real driver in later; it unblocks userspace
mesa packages while the driver remains gated on rust enablement.
Otherwise it's the usual mixture of stuff all over, amdgpu, i915/xe,
and msm being the main ones, and some changes to vsprintf.
new drivers:
- bring in the asahi uapi header standalone
- nova-drm: stub driver
rust dependencies (for nova-core):
- auxiliary
  - bus abstractions
  - driver registration
  - sample driver
- devres changes from driver-core
- revocable changes
core:
- add Apple fourcc modifiers
- add virtio capset definitions
- extend EXPORT_SYNC_FILE for timeline syncobjs
- convert to devm_platform_ioremap_resource
- refactor shmem helper page pinning
- DP powerup/down link helpers
- extended %p4cc in vsprintf.c to support fourcc prints
- change vsprintf %p4cn to %p4chR, remove %p4cn
- Add drm_file_err function
- IN_FORMATS_ASYNC property
- move sitronix from tiny to their own subdir
rust:
- add drm core infrastructure rust abstractions
(device/driver, ioctl, file, gem)
dma-buf:
- adjust sg handling to not cache map on attach
- allow setting dma-device for import
- Add a helper to sort and deduplicate dma_fence arrays
docs:
- updated drm scheduler docs
- fbdev todo update
- fb rendering
- actual brightness
ttm:
- fix delayed destroy resv object
bridge:
- add kunit tests
- convert tc358775 to atomic
- convert drivers to devm_drm_bridge_alloc
- convert rk3066_hdmi to bridge driver
scheduler:
- add kunit tests
panel:
- refcount panels to improve lifetime handling
- Powertip PH128800T004-ZZA01
- NLT NL13676BC25-03F, Tianma TM070JDHG34-00
- Himax HX8279/HX8279-D DDIC
- Visionox G2647FB105
- Sitronix ST7571
- ZOTAC rotation quirk
vkms:
- allow attaching more displays
i915:
- xe3lpd display updates
- vrr refactor
- intel_display struct conversions
- xe2hpd memory type identification
- add link rate/count to i915_display_info
- cleanup VGA plane handling
- refactor HDCP GSC
- fix SLPC wait boosting reference counting
- add 20ms delay to engine reset
- fix fence release on early probe errors
xe:
- SRIOV updates
- BMG PCI ID update
- support separate firmware for each GT
- SVM fix, prelim SVM multi-device work
- export fan speed
- temp disable d3cold on BMG
- backup VRAM in PM notifier instead of suspend/freeze
- update xe_ttm_access_memory to use GPU for non-visible access
- fix guc_info debugfs for VFs
- use copy_from_user instead of __copy_from_user
- append PCIe gen5 limitations to xe_firmware document
amdgpu:
- DSC cleanup
- DC Scaling updates
- Fused I2C-over-AUX updates
- DMUB updates
- Use drm_file_err in amdgpu
- Enforce isolation updates
- Use new dma_fence helpers
- USERQ fixes
- Documentation updates
- SR-IOV updates
- RAS updates
- PSP 12 cleanups
- GC 9.5 updates
- SMU 13.x updates
- VCN / JPEG SR-IOV updates
amdkfd:
- Update error messages for SDMA
- Userptr updates
- XNACK fixes
radeon:
- CIK doorbell cleanup
nouveau:
- add support for NVIDIA r570 GSP firmware
- enable Hopper/Blackwell support
nova-core:
- fix task list
- register definition infrastructure
- move firmware into own rust module
- register auxiliary device for nova-drm
nova-drm:
- initial driver skeleton
msm:
- GPU:
- ACD (adaptive clock distribution) for X1-85
- drop fictional address_space_size
- improve GMU HFI response time out robustness
- fix crash when throttling during boot
- DPU:
- use single CTL path for flushing on DPU 5.x+
- improve SSPP allocation code for better sharing
- Enabled SmartDMA on SM8150, SC8180X, SC8280XP, SM8550
- Added SAR2130P support
- Disabled DSC support on MSM8937, MSM8917, MSM8953, SDM660
- DP:
- switch to new audio helpers
- better LTTPR handling
- DSI:
- Added support for SA8775P
- Added SAR2130P support
- HDMI:
- Switched to use new helpers for ACR data
- Fixed long-standing issue of HPD not working in some cases
amdxdna:
- add dma-buf support
- allow empty command submits
renesas:
- add dma-buf support
- add zpos, alpha, blend support
panthor:
- fail properly for NO_MMAP bos
- add SET_LABEL ioctl
- debugfs BO dumping support
imagination:
- update DT bindings
- support TI AM68 GPU
hibmc:
- improve interrupt handling and HPD support
virtio:
- add panic handler support
rockchip:
- add RK3588 support
- add DP AUX bus panel support
ivpu:
- add heartbeat based hangcheck
mediatek:
- prepares support for MT8195/99 HDMIv2/DDCv2
anx7625:
- improve HPD
tegra:
- speed up firmware loading
* tag 'drm-next-2025-05-28' of https://gitlab.freedesktop.org/drm/kernel: (1627 commits)
drm/nouveau/tegra: Fix error pointer vs NULL return in nvkm_device_tegra_resource_addr()
drm/xe: Default auto_link_downgrade status to false
drm/xe/guc: Make creation of SLPC debugfs files conditional
drm/i915/display: Add check for alloc_ordered_workqueue() and alloc_workqueue()
drm/i915/dp_mst: Work around Thunderbolt sink disconnect after SINK_COUNT_ESI read
drm/i915/ptl: Use everywhere the correct DDI port clock select mask
drm/nouveau/kms: add support for GB20x
drm/dp: add option to disable zero sized address only transactions.
drm/nouveau: add support for GB20x
drm/nouveau/gsp: add hal for fifo.chan.doorbell_handle
drm/nouveau: add support for GB10x
drm/nouveau/gf100-: track chan progress with non-WFI semaphore release
drm/nouveau/nv50-: separate CHANNEL_GPFIFO handling out from CHANNEL_DMA
drm/nouveau: add helper functions for allocating pinned/cpu-mapped bos
drm/nouveau: add support for GH100
drm/nouveau: improve handling of 64-bit BARs
drm/nouveau/gv100-: switch to volta semaphore methods
drm/nouveau/gsp: support deeper page tables in COPY_SERVER_RESERVED_PDES
drm/nouveau/gsp: init client VMMs with NV0080_CTRL_DMA_SET_PAGE_DIRECTORY
drm/nouveau/gsp: fetch level shift and PDE from BAR2 VMM
...
411 lines
11 KiB
C
// SPDX-License-Identifier: MIT
/*
 * Copyright © 2021-2023 Intel Corporation
 */

#include "xe_mmio.h"

#include <linux/delay.h>
#include <linux/io-64-nonatomic-lo-hi.h>
#include <linux/minmax.h>
#include <linux/pci.h>

#include <drm/drm_managed.h>
#include <drm/drm_print.h>

#include "regs/xe_bars.h"
#include "regs/xe_regs.h"
#include "xe_device.h"
#include "xe_gt.h"
#include "xe_gt_printk.h"
#include "xe_gt_sriov_vf.h"
#include "xe_macros.h"
#include "xe_sriov.h"
#include "xe_trace.h"

static void tiles_fini(void *arg)
{
	struct xe_device *xe = arg;
	struct xe_tile *tile;
	int id;

	for_each_remote_tile(tile, xe, id)
		tile->mmio.regs = NULL;
}

/*
 * On multi-tile devices, partition the BAR space for MMIO on each tile,
 * possibly accounting for register override on the number of tiles available.
 * tile_mmio_size contains both the tile's 4MB register space, as well as
 * additional space for the GTT and other (possibly unused) regions.
 * Resulting memory layout is like below:
 *
 * .----------------------. <- tile_count * tile_mmio_size
 * |         ....         |
 * |----------------------| <- 2 * tile_mmio_size
 * |   tile1 GTT + other  |
 * |----------------------| <- 1 * tile_mmio_size + 4MB
 * |   tile1->mmio.regs   |
 * |----------------------| <- 1 * tile_mmio_size
 * |   tile0 GTT + other  |
 * |----------------------| <- 4MB
 * |   tile0->mmio.regs   |
 * '----------------------' <- 0MB
 */
static void mmio_multi_tile_setup(struct xe_device *xe, size_t tile_mmio_size)
{
	struct xe_tile *tile;
	u8 id;

	/*
	 * Nothing to be done as tile 0 has already been setup earlier with the
	 * entire BAR mapped - see xe_mmio_probe_early()
	 */
	if (xe->info.tile_count == 1)
		return;

	/* Possibly override number of tiles based on configuration register */
	if (!xe->info.skip_mtcfg) {
		struct xe_mmio *mmio = xe_root_tile_mmio(xe);
		u8 tile_count;
		u32 mtcfg;

		/*
		 * Although the per-tile mmio regs are not yet initialized, this
		 * is fine as it's going to the root tile's mmio, that's
		 * guaranteed to be initialized earlier in xe_mmio_probe_early()
		 */
		mtcfg = xe_mmio_read32(mmio, XEHP_MTCFG_ADDR);
		tile_count = REG_FIELD_GET(TILE_COUNT, mtcfg) + 1;

		if (tile_count < xe->info.tile_count) {
			drm_info(&xe->drm, "tile_count: %d, reduced_tile_count %d\n",
				 xe->info.tile_count, tile_count);
			xe->info.tile_count = tile_count;

			/*
			 * FIXME: Needs some work for standalone media, but
			 * should be impossible with multi-tile for now:
			 * multi-tile platform with standalone media doesn't
			 * exist
			 */
			xe->info.gt_count = xe->info.tile_count;
		}
	}

	for_each_remote_tile(tile, xe, id)
		xe_mmio_init(&tile->mmio, tile, xe->mmio.regs + id * tile_mmio_size, SZ_4M);
}
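
The partitioning described by the layout diagram above can be sanity-checked with plain arithmetic. Below is a minimal userspace sketch, not driver code: `tile_regs_offset`/`tile_regs_end` are hypothetical helper names, and the 16 MB per-tile slice and 4 MB register window mirror the SZ_16M/SZ_4M constants used in this file.

```c
#include <assert.h>
#include <stddef.h>

/* Mirrors of the constants used above: each tile owns a 16 MB slice of
 * the BAR, of which the first 4 MB are registers (the rest is GTT and
 * other regions). */
#define TILE_MMIO_SIZE (16u * 1024 * 1024)
#define TILE_REGS_SIZE (4u * 1024 * 1024)

/* Byte offset of tile @id's register window inside the mapped BAR,
 * matching xe->mmio.regs + id * tile_mmio_size in the setup loop. */
static size_t tile_regs_offset(unsigned int id)
{
	return (size_t)id * TILE_MMIO_SIZE;
}

/* End (exclusive) of tile @id's register window. */
static size_t tile_regs_end(unsigned int id)
{
	return tile_regs_offset(id) + TILE_REGS_SIZE;
}
```

For tile 1 this reproduces the "1 * tile_mmio_size" and "1 * tile_mmio_size + 4MB" boundaries in the diagram.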

int xe_mmio_probe_tiles(struct xe_device *xe)
{
	size_t tile_mmio_size = SZ_16M;

	mmio_multi_tile_setup(xe, tile_mmio_size);

	return devm_add_action_or_reset(xe->drm.dev, tiles_fini, xe);
}

static void mmio_fini(void *arg)
{
	struct xe_device *xe = arg;
	struct xe_tile *root_tile = xe_device_get_root_tile(xe);

	pci_iounmap(to_pci_dev(xe->drm.dev), xe->mmio.regs);
	xe->mmio.regs = NULL;
	root_tile->mmio.regs = NULL;
}

int xe_mmio_probe_early(struct xe_device *xe)
{
	struct xe_tile *root_tile = xe_device_get_root_tile(xe);
	struct pci_dev *pdev = to_pci_dev(xe->drm.dev);

	/*
	 * Map the entire BAR.
	 * The first 16MB of the BAR belong to the root tile, and include:
	 * registers (0-4MB), reserved space (4MB-8MB) and GGTT (8MB-16MB).
	 */
	xe->mmio.size = pci_resource_len(pdev, GTTMMADR_BAR);
	xe->mmio.regs = pci_iomap(pdev, GTTMMADR_BAR, 0);
	if (!xe->mmio.regs) {
		drm_err(&xe->drm, "failed to map registers\n");
		return -EIO;
	}

	/* Setup first tile; other tiles (if present) will be setup later. */
	xe_mmio_init(&root_tile->mmio, root_tile, xe->mmio.regs, SZ_4M);

	return devm_add_action_or_reset(xe->drm.dev, mmio_fini, xe);
}
ALLOW_ERROR_INJECTION(xe_mmio_probe_early, ERRNO); /* See xe_pci_probe() */

/**
 * xe_mmio_init() - Initialize an MMIO instance
 * @mmio: Pointer to the MMIO instance to initialize
 * @tile: The tile to which the MMIO region belongs
 * @ptr: Pointer to the start of the MMIO region
 * @size: The size of the MMIO region in bytes
 *
 * This is a convenience function for minimal initialization of struct xe_mmio.
 */
void xe_mmio_init(struct xe_mmio *mmio, struct xe_tile *tile, void __iomem *ptr, u32 size)
{
	xe_tile_assert(tile, size <= XE_REG_ADDR_MAX);

	mmio->regs = ptr;
	mmio->regs_size = size;
	mmio->tile = tile;
}

static void mmio_flush_pending_writes(struct xe_mmio *mmio)
{
#define DUMMY_REG_OFFSET	0x130030
	int i;

	if (mmio->tile->xe->info.platform != XE_LUNARLAKE)
		return;

	/* 4 dummy writes */
	for (i = 0; i < 4; i++)
		writel(0, mmio->regs + DUMMY_REG_OFFSET);
}

u8 xe_mmio_read8(struct xe_mmio *mmio, struct xe_reg reg)
{
	u32 addr = xe_mmio_adjusted_addr(mmio, reg.addr);
	u8 val;

	/* Wa_15015404425 */
	mmio_flush_pending_writes(mmio);

	val = readb(mmio->regs + addr);
	trace_xe_reg_rw(mmio, false, addr, val, sizeof(val));

	return val;
}

u16 xe_mmio_read16(struct xe_mmio *mmio, struct xe_reg reg)
{
	u32 addr = xe_mmio_adjusted_addr(mmio, reg.addr);
	u16 val;

	/* Wa_15015404425 */
	mmio_flush_pending_writes(mmio);

	val = readw(mmio->regs + addr);
	trace_xe_reg_rw(mmio, false, addr, val, sizeof(val));

	return val;
}

void xe_mmio_write32(struct xe_mmio *mmio, struct xe_reg reg, u32 val)
{
	u32 addr = xe_mmio_adjusted_addr(mmio, reg.addr);

	trace_xe_reg_rw(mmio, true, addr, val, sizeof(val));

	if (!reg.vf && IS_SRIOV_VF(mmio->tile->xe))
		xe_gt_sriov_vf_write32(mmio->sriov_vf_gt ?:
				       mmio->tile->primary_gt, reg, val);
	else
		writel(val, mmio->regs + addr);
}

u32 xe_mmio_read32(struct xe_mmio *mmio, struct xe_reg reg)
{
	u32 addr = xe_mmio_adjusted_addr(mmio, reg.addr);
	u32 val;

	/* Wa_15015404425 */
	mmio_flush_pending_writes(mmio);

	if (!reg.vf && IS_SRIOV_VF(mmio->tile->xe))
		val = xe_gt_sriov_vf_read32(mmio->sriov_vf_gt ?:
					    mmio->tile->primary_gt, reg);
	else
		val = readl(mmio->regs + addr);

	trace_xe_reg_rw(mmio, false, addr, val, sizeof(val));

	return val;
}

u32 xe_mmio_rmw32(struct xe_mmio *mmio, struct xe_reg reg, u32 clr, u32 set)
{
	u32 old, reg_val;

	old = xe_mmio_read32(mmio, reg);
	reg_val = (old & ~clr) | set;
	xe_mmio_write32(mmio, reg, reg_val);

	return old;
}

int xe_mmio_write32_and_verify(struct xe_mmio *mmio,
			       struct xe_reg reg, u32 val, u32 mask, u32 eval)
{
	u32 reg_val;

	xe_mmio_write32(mmio, reg, val);
	reg_val = xe_mmio_read32(mmio, reg);

	return (reg_val & mask) != eval ? -EINVAL : 0;
}
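
The read-modify-write and write-and-verify helpers above reduce to simple bit arithmetic. A standalone sketch of the two rules (hypothetical helper names, no MMIO involved):

```c
#include <assert.h>
#include <stdint.h>

/* Same update rule as xe_mmio_rmw32: clear the @clr bits of @old,
 * then set the @set bits. */
static uint32_t rmw32(uint32_t old, uint32_t clr, uint32_t set)
{
	return (old & ~clr) | set;
}

/* Same success test as xe_mmio_write32_and_verify: the masked
 * readback must equal the expected value @eval. */
static int verify32(uint32_t read_back, uint32_t mask, uint32_t eval)
{
	return (read_back & mask) == eval ? 0 : -1;
}
```

Note that, as in the driver, bits in both @clr and @set end up set, since the set step is applied last.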

bool xe_mmio_in_range(const struct xe_mmio *mmio,
		      const struct xe_mmio_range *range,
		      struct xe_reg reg)
{
	u32 addr = xe_mmio_adjusted_addr(mmio, reg.addr);

	return range && addr >= range->start && addr <= range->end;
}

/**
 * xe_mmio_read64_2x32() - Read a 64-bit register as two 32-bit reads
 * @mmio: MMIO target
 * @reg: register to read value from
 *
 * Although Intel GPUs have some 64-bit registers, the hardware officially
 * only supports GTTMMADR register reads of 32 bits or smaller. Even if
 * a readq operation may return a reasonable value, that violation of the
 * spec shouldn't be relied upon and all 64-bit register reads should be
 * performed as two 32-bit reads of the upper and lower dwords.
 *
 * When reading registers that may be changing (such as
 * counters), a rollover of the lower dword between the two 32-bit reads
 * can be problematic. This function attempts to ensure the upper dword has
 * stabilized before returning the 64-bit value.
 *
 * Note that because this function may re-read the register multiple times
 * while waiting for the value to stabilize it should not be used to read
 * any registers where read operations have side effects.
 *
 * Returns the value of the 64-bit register.
 */
u64 xe_mmio_read64_2x32(struct xe_mmio *mmio, struct xe_reg reg)
{
	struct xe_reg reg_udw = { .addr = reg.addr + 0x4 };
	u32 ldw, udw, oldudw, retries;

	reg.addr = xe_mmio_adjusted_addr(mmio, reg.addr);
	reg_udw.addr = xe_mmio_adjusted_addr(mmio, reg_udw.addr);

	/* we shouldn't adjust just one register address */
	xe_tile_assert(mmio->tile, reg_udw.addr == reg.addr + 0x4);

	oldudw = xe_mmio_read32(mmio, reg_udw);
	for (retries = 5; retries; --retries) {
		ldw = xe_mmio_read32(mmio, reg);
		udw = xe_mmio_read32(mmio, reg_udw);

		if (udw == oldudw)
			break;

		oldudw = udw;
	}

	drm_WARN(&mmio->tile->xe->drm, retries == 0,
		 "64-bit read of %#x did not stabilize\n", reg.addr);

	return (u64)udw << 32 | ldw;
}
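
The stabilization loop can be exercised against a plain in-memory value. This userspace sketch mirrors the retry structure above; `fake_counter`, `read_ldw` and `read_udw` are hypothetical stand-ins for the two 32-bit register reads, and the value is static here, whereas on real hardware it may advance between the two reads, which is exactly what the retry loop guards against.

```c
#include <assert.h>
#include <stdint.h>

/* A 64-bit value exposed only as two dwords, like a GTTMMADR register. */
static uint64_t fake_counter = 0x1234deadbeefULL;

static uint32_t read_ldw(void) { return (uint32_t)fake_counter; }
static uint32_t read_udw(void) { return (uint32_t)(fake_counter >> 32); }

/* Same stabilization idea as xe_mmio_read64_2x32: re-read until the
 * upper dword stops moving, then combine upper and lower dwords. */
static uint64_t read64_2x32(void)
{
	uint32_t ldw, udw, oldudw = read_udw();
	int retries;

	for (retries = 5; retries; --retries) {
		ldw = read_ldw();
		udw = read_udw();

		if (udw == oldudw)
			break;

		oldudw = udw;
	}

	return (uint64_t)udw << 32 | ldw;
}
```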

static int __xe_mmio_wait32(struct xe_mmio *mmio, struct xe_reg reg, u32 mask, u32 val,
			    u32 timeout_us, u32 *out_val, bool atomic, bool expect_match)
{
	ktime_t cur = ktime_get_raw();
	const ktime_t end = ktime_add_us(cur, timeout_us);
	int ret = -ETIMEDOUT;
	s64 wait = 10;
	u32 read;
	bool check;

	for (;;) {
		read = xe_mmio_read32(mmio, reg);

		check = (read & mask) == val;
		if (!expect_match)
			check = !check;

		if (check) {
			ret = 0;
			break;
		}

		cur = ktime_get_raw();
		if (!ktime_before(cur, end))
			break;

		if (ktime_after(ktime_add_us(cur, wait), end))
			wait = ktime_us_delta(end, cur);

		if (atomic)
			udelay(wait);
		else
			usleep_range(wait, wait << 1);
		wait <<= 1;
	}

	if (ret != 0) {
		read = xe_mmio_read32(mmio, reg);

		check = (read & mask) == val;
		if (!expect_match)
			check = !check;

		if (check)
			ret = 0;
	}

	if (out_val)
		*out_val = read;

	return ret;
}
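
The backoff schedule in the loop above starts at 10 us, doubles after each sleep, and clamps the last sleep to the deadline. A small model (the `backoff_sleeps` helper is hypothetical, pure arithmetic with no actual sleeping) shows how many sleeps fit in a given timeout:

```c
#include <assert.h>
#include <stdint.h>

/* Model of the backoff schedule in __xe_mmio_wait32: start at 10us,
 * clamp each sleep to the remaining time, double after sleeping.
 * Returns the number of sleeps taken before @timeout_us elapses,
 * assuming the polled condition never becomes true. */
static int backoff_sleeps(int64_t timeout_us)
{
	int64_t elapsed = 0, wait = 10;
	int sleeps = 0;

	while (elapsed < timeout_us) {
		if (elapsed + wait > timeout_us)
			wait = timeout_us - elapsed;	/* clamp to deadline */
		elapsed += wait;			/* "sleep" */
		wait <<= 1;				/* exponential backoff */
		sleeps++;
	}

	return sleeps;
}
```

For a 100 us timeout the modeled sleeps are 10, 20, 40, then a clamped 30 us, i.e. four polls spaced exponentially rather than ten fixed 10 us polls.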

/**
 * xe_mmio_wait32() - Wait for a register to match the desired masked value
 * @mmio: MMIO target
 * @reg: register to read value from
 * @mask: mask to be applied to the value read from the register
 * @val: desired value after applying the mask
 * @timeout_us: time out after this period of time. Wait logic tries to be
 * smart, applying an exponential backoff until @timeout_us is reached.
 * @out_val: if not NULL, points where to store the last unmasked value
 * @atomic: needs to be true if calling from an atomic context
 *
 * This function polls for the desired masked value and returns zero on success
 * or -ETIMEDOUT if timed out.
 *
 * Note that @timeout_us represents the minimum amount of time to wait before
 * giving up. The actual time taken by this function can be a little more than
 * @timeout_us for different reasons, especially in non-atomic contexts. Thus,
 * it is possible that this function succeeds even after @timeout_us has passed.
 */
int xe_mmio_wait32(struct xe_mmio *mmio, struct xe_reg reg, u32 mask, u32 val, u32 timeout_us,
		   u32 *out_val, bool atomic)
{
	return __xe_mmio_wait32(mmio, reg, mask, val, timeout_us, out_val, atomic, true);
}

/**
 * xe_mmio_wait32_not() - Wait for a register to return anything other than the given masked value
 * @mmio: MMIO target
 * @reg: register to read value from
 * @mask: mask to be applied to the value read from the register
 * @val: value not to be matched after applying the mask
 * @timeout_us: time out after this period of time
 * @out_val: if not NULL, points where to store the last unmasked value
 * @atomic: needs to be true if calling from an atomic context
 *
 * This function works exactly like xe_mmio_wait32() with the exception that
 * @val is expected not to be matched.
 */
int xe_mmio_wait32_not(struct xe_mmio *mmio, struct xe_reg reg, u32 mask, u32 val, u32 timeout_us,
		       u32 *out_val, bool atomic)
{
	return __xe_mmio_wait32(mmio, reg, mask, val, timeout_us, out_val, atomic, false);
}