Commit Graph

16673 Commits

Author SHA1 Message Date
Alex Deucher
f3b37ebf2c drm/amdgpu: fix SPDX headers on amdgpu_cper.c/h
These should be MIT.  The driver in general is MIT and
the license text at the top of the files is MIT so fix
it.

Fixes: 92d5d2a09d ("drm/amdgpu: Introduce funcs for populating CPER")
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4654
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit abd3f87640)
2025-10-28 11:02:36 -04:00
Mario Limonciello
ba10f8d92a drm/amd: Check that VPE has reached DPM0 in idle handler
[Why]
Newer VPE microcode has functionality that will decrease DPM level
only when a workload has run for 2 or more seconds.  If VPE is turned
off before this DPM decrease and the PMFW doesn't reset it when
power gating VPE, the SOC can get stuck with a higher DPM level.

This can happen from amdgpu's ring buffer test because it's a short
quick workload for VPE and VPE is turned off after 1s.

[How]
In idle handler besides checking fences are drained check PMFW version
to determine if it will reset DPM when power gating VPE.  If PMFW will
not do this, then check VPE DPM level. If it is not DPM0 reschedule
delayed work again until it is.

v2: squash in return fix (Alex)

Cc: Peyton.Lee@amd.com
Reported-by: Sultan Alsawaf <sultan@kerneltoast.com>
Reviewed-by: Sultan Alsawaf <sultan@kerneltoast.com>
Tested-by: Sultan Alsawaf <sultan@kerneltoast.com>
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4615
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 3ac635367e)
Cc: stable@vger.kernel.org
2025-10-28 10:58:34 -04:00
Timur Kristóf
f44aad39b6 drm/amdgpu: Use DC by default for Bonaire
Now that DC supports analog connectors, there is nothing stopping
us from using it by default on Bonaire.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-10-28 10:10:13 -04:00
Srinivasan Shanmugam
dfc74e37bd drm/amdkfd: Fix use-after-free of HMM range in svm_range_validate_and_map()
The function svm_range_validate_and_map() was freeing `range` when
amdgpu_hmm_range_get_pages() failed. But later, the code still used the
same `range` pointer and freed it again. This could cause a
use-after-free and double-free issue.

The fix sets `range = NULL` right after it is freed and checks for
`range` before using or freeing it again.

v2: Removed duplicate !r check in the condition for clarity.

v3: In amdgpu_hmm_range_get_pages(), when hmm_range_fault() fails, we
kvfree(pfns) but leave the pointer in hmm_range->hmm_pfns still pointing
to freed memory. The caller (or amdgpu_hmm_range_free(range)) may try to
free range->hmm_range.hmm_pfns again, causing a double free, Setting
hmm_range->hmm_pfns = NULL immediately after kvfree(pfns) prevents both
double free. (Philip)

In svm_range_validate_and_map(), When r == 0, it means success → range
is not NULL.  When r != 0, it means failure → already made range = NULL.
So checking both (!r && range) is unnecessary because the moment r == 0,
we automatically know range exists and is safe to use. (Philip)

Fixes: 737da5363c ("drm/amdgpu: update the functions to use amdgpu version of hmm")
Reported by: Dan Carpenter <dan.carpenter@linaro.org>
Cc: Philip Yang <Philip.Yang@amd.com>
Cc: Sunil Khatri <sunil.khatri@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Philip Yang<Philip.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-10-28 10:02:21 -04:00
Alex Deucher
a0559012a1 drm/amdgpu/userq: fix SDMA and compute validation
The CSA and EOP buffers have different alignement requirements.
Hardcode them for now as a bug fix.  A proper query will be added in
a subsequent patch.

v2: verify gfx shadow helper callback (Prike)

Fixes: 9e46b8bb05 ("drm/amdgpu: validate userq buffer virtual address and size")
Reviewed-by: Prike Liang <Prike.Liang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-10-28 09:59:48 -04:00
Jesse.Zhang
f18719ef4b drm/amdgpu: Convert amdgpu userqueue management from IDR to XArray
This commit refactors the AMDGPU userqueue management subsystem to replace
IDR (ID Allocation) with XArray for improved performance, scalability, and
maintainability. The changes address several issues with the previous IDR
implementation and provide better locking semantics.

Key changes:

1. **Global XArray Introduction**:
   - Added `userq_doorbell_xa` to `struct amdgpu_device` for global queue tracking
   - Uses doorbell_index as key for efficient global lookup
   - Replaces the previous `userq_mgr_list` linked list approach

2. **Per-process XArray Conversion**:
   - Replaced `userq_idr` with `userq_mgr_xa` in `struct amdgpu_userq_mgr`
   - Maintains per-process queue tracking with queue_id as key
   - Uses XA_FLAGS_ALLOC for automatic ID allocation

3. **Locking Improvements**:
   - Removed global `userq_mutex` from `struct amdgpu_device`
   - Replaced with fine-grained XArray locking using XArray's internal spinlocks

4. **Runtime Idle Check Optimization**:
   - Updated `amdgpu_runtime_idle_check_userq()` to use xa_empty

5. **Queue Management Functions**:
   - Converted all IDR operations to equivalent XArray functions:
     - `idr_alloc()` → `xa_alloc()`
     - `idr_find()` → `xa_load()`
     - `idr_remove()` → `xa_erase()`
     - `idr_for_each()` → `xa_for_each()`

Benefits:
- **Performance**: XArray provides better scalability for large numbers of queues
- **Memory Efficiency**: Reduced memory overhead compared to IDR
- **Thread Safety**: Improved locking semantics with XArray's internal spinlocks

v2: rename userq_global_xa/userq_xa to userq_doorbell_xa/userq_mgr_xa
    Remove xa_lock and use its own lock.

v3: Set queue->userq_mgr = uq_mgr in amdgpu_userq_create()
v4: use xa_store_irq (Christian)
    hold the read side of the reset lock while creating/destroying queues and the manager data structure. (Chritian)

Acked-by: Alex Deucher <alexander.deucher@amd.com>
Suggested-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-10-28 09:59:22 -04:00
Mario Limonciello
1454642960 drm/amd: Re-introduce property to control adaptive backlight modulation
commit 0887054d14 ("drm/amd: Drop abm_level property") dropped the
abm level property in favor of sysfs control. Since then there have
been discussions that compositors showed an interest in modifying
a vendor specific property instead.

So re-introduce the abm level property, but with different semantics.
Rather than being an integer it's now an enum. One of the enum options
is 'sysfs', and that is because there is still a sysfs file for use by
userspace when the compositor doesn't support this property.

If usespace has not modified this property, the default value will
be for sysfs to control it. Once userspace has set the property stop
allowing sysfs control.

The property is only attached to non-OLED eDP panels.

Cc: Xaver Hugl <xaver.hugl@kde.org>
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-10-28 09:56:14 -04:00
Srinivasan Shanmugam
90ef1dcb1d drm/amdgpu: Fix pointer casts when reading dynamic region sizes
The function amdgpu_virt_get_dynamic_data_info() writes a 64-bit size
value.  In two places (amdgpu_bios.c and amdgpu_discovery.c), the code
passed the address of a smaller variable by casting it to u64 *, which
is unsafe.

This could make the function write more bytes than the smaller variable
can hold, possibly overwriting nearby memory. Reported by static
analysis tools.

v2: Dynamic region size comes from the host (SR-IOV setup) and is always
fixed to 5 MB. (Lijo/Ellen)

5 MB easily fits inside a 32-bit value, so using a 64-bit type is not
needed. It also avoids extra type casts

Fixes: b4a8fcc782 ("drm/amdgpu: Add logic for VF ipd and VF bios to init from dynamic crit_region offsets")
Reported by: Dan Carpenter <dan.carpenter@linaro.org>
Cc: Ellen Pan <yunru.pan@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-10-28 09:55:16 -04:00
Sunil Khatri
84564d2920 drm/amdgpu: null check for hmm_pfns ptr before freeing it
Due to low memory or when num of pages is too big to be
accomodated, allocation could fail for pfn's.

Chekc hmm_pfns for NULL before calling the kvfree for the it.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Acked-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-10-28 09:54:46 -04:00
Srinivasan Shanmugam
00dc2ff519 drm/amdgpu: Make SR-IOV critical region checks overflow-safe
The function amdgpu_virt_init_critical_region() contained an invalid
check for a negative init_hdr_offset value:

    if (init_hdr_offset < 0)

Since init_hdr_offset is an unsigned 32-bit integer, this condition can
never be true and triggers a Smatch warning:

    warn: unsigned 'init_hdr_offset' is never less than zero

In addition, the subsequent bounds check: if ((init_hdr_offset +
init_hdr_size) > vram_size) was vulnerable to integer overflow when
adding the two unsigned values.  Thus, by promoting offset and size to
64-bit and using check_add_overflow() to safely validate the sum against
VRAM size.

Fixes: 07009df649 ("drm/amdgpu: Introduce SRIOV critical regions v2 during VF init")
Reported by: Dan Carpenter <dan.carpenter@linaro.org>
Cc: Ellen Pan <yunru.pan@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Ellen Pan <yunru.pan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-10-28 09:52:44 -04:00
Alex Deucher
102c4f7c55 drm/amdgpu: fix SPDX header on cyan_skillfish_reg_init.c
This should be MIT.  The driver in general is MIT and
the license text at the top of the file is MIT so fix
it.

Fixes: e8529dbc75 ("drm/amdgpu: add ip offset support for cyan skillfish")
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4654
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-10-28 09:52:40 -04:00
Alex Deucher
abd3f87640 drm/amdgpu: fix SPDX headers on amdgpu_cper.c/h
These should be MIT.  The driver in general is MIT and
the license text at the top of the files is MIT so fix
it.

Fixes: 92d5d2a09d ("drm/amdgpu: Introduce funcs for populating CPER")
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4654
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-10-28 09:51:50 -04:00
Dan Carpenter
6142aa0660 drm/amdgpu/userqueue: Fix use after free in amdgpu_userq_buffer_vas_list_cleanup()
The amdgpu_userq_buffer_va_list_del() function frees "va_cursor" but it
is dereferenced on the next line when we print the debug message.  Print
the debug message first and then free it.

Fixes: 2a28f9665d ("drm/amdgpu: track the userq bo va for its obj management")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-10-28 09:50:58 -04:00
Jinzhou Su
214eb7e83d drm/amdgpu: Add uniras version in sysfs
Display uniras version in sysfs version interface
when uniras enable.

v2: display ras version detail info

Signed-off-by: Jinzhou Su <jinzhou.su@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-10-28 09:47:49 -04:00
Perry Yuan
28d4de7ebf drm/amdgpu: get rev_id from strap register or IP-discovery table
Query the sub-revision field in the IP Discovery table for the VFs
to obtain their revision ID.
Meanwhile, read the revision ID from the strap register for the PF.

Signed-off-by: Perry Yuan <perry.yuan@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-10-28 09:47:45 -04:00
Jinzhou Su
edddaada9e drm/amdgpu: clear bad page info of ras module
Clear bad page info of ras module.

Signed-off-by: Jinzhou Su <jinzhou.su@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-10-28 09:47:33 -04:00
Mario Limonciello
3ac635367e drm/amd: Check that VPE has reached DPM0 in idle handler
[Why]
Newer VPE microcode has functionality that will decrease DPM level
only when a workload has run for 2 or more seconds.  If VPE is turned
off before this DPM decrease and the PMFW doesn't reset it when
power gating VPE, the SOC can get stuck with a higher DPM level.

This can happen from amdgpu's ring buffer test because it's a short
quick workload for VPE and VPE is turned off after 1s.

[How]
In idle handler besides checking fences are drained check PMFW version
to determine if it will reset DPM when power gating VPE.  If PMFW will
not do this, then check VPE DPM level. If it is not DPM0 reschedule
delayed work again until it is.

v2: squash in return fix (Alex)

Cc: Peyton.Lee@amd.com
Reported-by: Sultan Alsawaf <sultan@kerneltoast.com>
Reviewed-by: Sultan Alsawaf <sultan@kerneltoast.com>
Tested-by: Sultan Alsawaf <sultan@kerneltoast.com>
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4615
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-10-28 09:46:13 -04:00
Simona Vetter
098456f314 Merge tag 'drm-misc-next-2025-10-21' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-next
drm-misc-next for v6.19:

UAPI Changes:

amdxdna:
- Support reading last hardware error

Cross-subsystem Changes:

dma-buf:
- heaps: Create heap per CMA reserved location; Improve user-space documentation

Core Changes:

atomic:
- Clean up and improve state-handling interfaces, update drivers

bridge:
- Improve ref counting

buddy:
- Optimize block management

Driver Changes:

amdxdna:
- Fix runtime power management
- Support firmware debug output

ast:
- Set quirks for each chip model

atmel-hlcdc:
- Set LCDC_ATTRE register in plane disable
- Set correct values for plane scaler

bochs:
- Use vblank timer

bridge:
- synopsis: Support CEC; Init timer with correct frequency

cirrus-qemu:
- Use vblank timer

imx:
- Clean up

ivu:
- Update JSM API to 3.33.0
- Reset engine on more job errors
- Return correct error codes for jobs

komeda:
- Use drm_ logging functions

panel:
- edp: Support AUO B116XAN02.0

panfrost:
- Embed struct drm_driver in Panfrost device
- Improve error handling
- Clean up job handling

panthor:
- Support custom ASN_HASH for mt8196

renesas:
- rz-du: Fix dependencies

rockchip:
- dsi: Add support for RK3368
- Fix LUT size for RK3386

sitronix:
- Fix output position when clearing screens

qaic:
- Support dma-buf exports
- Support new firmware's READ_DATA implementation
- Replace kcalloc with memdup
- Replace snprintf() with sysfs_emit()
- Avoid overflows in arithmetics
- Clean up
- Fixes

qxl:
- Use vblank timer

rockchip:
- Clean up mode-setting code

vgem:
- Fix fence timer deadlock

virtgpu:
- Use vblank timer

Signed-off-by: Simona Vetter <simona.vetter@ffwll.ch>

From: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://lore.kernel.org/r/20251021111837.GA40643@linux.fritz.box
2025-10-24 13:25:20 +02:00
Simona Vetter
6200442de0 Merge tag 'drm-misc-next-2025-10-02' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-next
drm-misc-next for v6.19:

UAPI Changes:

Cross-subsystem Changes:
-  fbcon cleanups.
- Make drivers depend on FB_TILEBLITTING instead of selecting it,
  and hide FB_MODE_HELPERS.

Core Changes:
- More preparations for rust.
- Throttle dirty worker with vblank
- Use drm_for_each_bridge_in_chain_scoped in drm's bridge code and
  assorted fixes.
- Ensure drm_client_modeset tests are enabled in UML.
- Rename ttm_bo_put to ttm_bo_fini, as a further step in removing the
  TTM bo refcount.
- Add POST_LT_ADJ_REQ training sequence.
- Show list of removed but still allocated bridges.
- Add a simulated vblank interrupt for hardware without it,
  and add some helpers to use them in vkms and hypervdrm.

Driver Changes:
- Assorted small fixes, cleanups and updates to host1x, tegra,
  panthor,   amdxdna, gud, vc4, ssd130x, ivpu, panfrost, panthor,
  sysfb, bridge/sn65dsi86, solomon, ast, tidss.
- Convert drivers from using .round_rate() to .determine_rate()
- Add support for KD116N3730A07/A12, chromebook mt8189, JT101TM023,
  LQ079L1SX01, raspberrypi 5" panels.
- Improve reclocking on tegra186+ with nouveau.
- Improve runtime pm in amdxdna.
- Add support for HTX_PAI in imx.
- Use a helper to calculate dumb buffer sizes in most drivers.

Signed-off-by: Simona Vetter <simona.vetter@ffwll.ch>
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://lore.kernel.org/r/b412fb91-8545-466a-8102-d89c0f2758a7@linux.intel.com
2025-10-21 10:16:34 +02:00
Lijo Lazar
883687c307 drm/amdgpu: Remove unused members in amdgpu_mman
Discovery related members are now part of amdgpu_discovery_info.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-10-20 18:28:22 -04:00
YiPeng Chai
fe2ccc7b7b drm/amdgpu: query block error count of ras module
Query block error count of ras module.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-10-20 18:28:18 -04:00
Ellen Pan
91da591310 drm/amdgpu: Add logic for VF data exchange region to init from dynamic crit_region offsets
1. Added VF logic to init data exchange region using the offsets from dynamic(v2) critical regions;

Signed-off-by: Ellen Pan <yunru.pan@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-10-20 18:28:14 -04:00
Ellen Pan
b4a8fcc782 drm/amdgpu: Add logic for VF ipd and VF bios to init from dynamic crit_region offsets
1. Added VF logic in amdgpu_virt to init IP discovery using the offsets from dynamic(v2) critical regions;
2. Added VF logic in amdgpu_virt to init bios image using the offsets from dynamic(v2) critical regions;

Signed-off-by: Ellen Pan <yunru.pan@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-10-20 18:28:11 -04:00
Ellen Pan
13ccaa8443 drm/amdgpu: Reuse fw_vram_usage_* for dynamic critical region in SRIOV
- During guest driver init, asa VFs receive PF msg to
	init dynamic critical region(v2), VFs reuse fw_vram_usage_*
	 from ttm to store critical region tables in a 5MB chunk.

Signed-off-by: Ellen Pan <yunru.pan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-10-20 18:28:08 -04:00
Ellen Pan
07009df649 drm/amdgpu: Introduce SRIOV critical regions v2 during VF init
1. Introduced amdgpu_virt_init_critical_region during VF init.
     - VFs use init_data_header_offset and init_data_header_size_kb
            transmitted via PF2VF mailbox to fetch the offset of
            critical regions' offsets/sizes in VRAM and save to
            adev->virt.crit_region_offsets and adev->virt.crit_region_sizes_kb.

Signed-off-by: Ellen Pan <yunru.pan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-10-20 18:28:05 -04:00
Ellen Pan
6d2191d226 drm/amdgpu: Add SRIOV crit_region_version support
1. Added enum amd_sriov_crit_region_version to support multi versions
2. Added logic in SRIOV mailbox to regonize crit_region version during
   req_gpu_init_data

Signed-off-by: Ellen Pan <yunru.pan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-10-20 18:27:58 -04:00
Ellen Pan
d88c8bec18 drm/amdgpu: Updated naming of SRIOV critical region offsets/sizes with _V1 suffix
- This change prepares the later patches to intro  _v2 suffix to SRIOV critical regions

Signed-off-by: Ellen Pan <yunru.pan@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-10-20 18:27:52 -04:00
YiPeng Chai
a6b5a7a033 drm/amdgpu: query bad page info of ras module
Query bad page info of ras module.

V2:
  Update code to reuse bad page output code.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-10-20 18:27:49 -04:00
YiPeng Chai
62902b88ff drm/amdgpu: ras module supports error injection
ras module supports error injection.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-10-20 18:27:45 -04:00
Tiezhu Yang
46791d147d drm/amd: Fix set but not used warnings
There are many set but not used warnings under drivers/gpu/drm/amd when
compiling with the latest upstream mainline GCC:

  drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c:305:18: warning: variable ‘p’ set but not used [-Wunused-but-set-variable=]
  drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h:103:26: warning: variable ‘internal_reg_offset’ set but not used [-Wunused-but-set-variable=]
  ...
  drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h:164:26: warning: variable ‘internal_reg_offset’ set but not used [-Wunused-but-set-variable=]
  ...
  drivers/gpu/drm/amd/amdgpu/../display/dc/dc_dmub_srv.c:445:13: warning: variable ‘pipe_idx’ set but not used [-Wunused-but-set-variable=]
  drivers/gpu/drm/amd/amdgpu/../display/dc/dc_dmub_srv.c:875:21: warning: variable ‘pipe_idx’ set but not used [-Wunused-but-set-variable=]

Remove the variables actually not used or add __maybe_unused attribute for
the variables actually used to fix them, compile tested only.

Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-10-20 18:25:58 -04:00
YiPeng Chai
7169e706c8 drm/amdgpu: Add ras module ip block to amdgpu discovery
Add ras module ip block to amdgpu discovery.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-10-20 18:25:54 -04:00
Tao Zhou
036f18d0a2 drm/amdgpu: check save count before RAS bad page saving
It's possible that unit_num is larger than 0 but save_count is zero,
since we do get bad page address but the address is invalid. Check
unit_num and save_count together.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Candice Li <candice.li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-10-20 18:25:49 -04:00
Sunil Khatri
d5a62b7aa9 drm/amdgpu: add the kernel docs for alloc/free/valid range
Add kernel docs for the functions related to hmm_range.

Documents added for functions:
amdgpu_hmm_range_valid
amdgpu_hmm_range_alloc
amdgpu_hmm_range_free

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-10-20 18:25:46 -04:00
Victor Zhao
6169b555db drm/amdgpu: use GPU_HDP_FLUSH for sriov
Currently SRIOV runtime will use kiq to write HDP_MEM_FLUSH_CNTL for
hdp flush. This register need to be write from CPU for nbif to aware,
otherwise it will not work.

Implement amdgpu_kiq_hdp_flush and use kiq to do gpu hdp flush during
sriov runtime.

v2:
- fallback to amdgpu_asic_flush_hdp when amdgpu_kiq_hdp_flush failed
- add function amdgpu_mes_hdp_flush

v3:
- changed returned error

Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Victor Zhao <Victor.Zhao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-10-20 18:25:41 -04:00
Victor Zhao
e71ca1efd3 drm/amdgpu: Add kiq hdp flush callbacks
Add kiq hdp flush callbacks for gfx ips to support gpu hdp flush when no
ring presents

Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Victor Zhao <Victor.Zhao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-10-20 18:25:36 -04:00
Mario Limonciello
152dca4ea7 drm/amd: Add a helper to tell whether an IP block HW is enabled
There is already a helper for telling if a block is valid, but if
IP handling wants to check if it's HW is enabled no such helper
exists.

Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-10-20 18:25:31 -04:00
Alysa Liu
bd07b3f08a drm/amdgpu: Fix vram_usage underflow
vram_usage was subtracting non-vram memory size,
which caused it to become negative.

Signed-off-by: Alysa Liu <Alysa.Liu@amd.com>
Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-10-20 18:25:22 -04:00
Tvrtko Ursulin
8c62f75cb7 drm/amdgpu: Use memset32 for IB padding
Use memset32 instead of open coding it, just because it is
that bit nicer.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-10-20 18:25:10 -04:00
YiPeng Chai
04226ae1bc drm/amdgpu: Add ras module eeprom safety watermark check
Add ras module eeprom safety watermark check.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-10-20 18:19:02 -04:00
YiPeng Chai
43a90c0732 drm/amdgpu: Avoid hive seqno increment in legacy ras
The hive->event_mgr variable is used by both ras module
and legacy ras. To ensure the continuity of hive seqno
growth, after enabling ras module, it is forbidden to
operate the event_mgr variable in legacy ras.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-10-20 18:18:55 -04:00
YiPeng Chai
47ba675a19 drm/amdgpu: Avoid loading bad pages into legacy ras
When ras module is enabled, the bad pages will
be loaded by ras module.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-10-20 18:18:44 -04:00
YiPeng Chai
fe0f51d6d8 drm/amdgpu: add ras module rma check
Add ras module rma check.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-10-20 18:18:41 -04:00
YiPeng Chai
408bd841ad drm/amdgpu: Improve ras fatal error handling function
In multi-gpu case, a fatal error will generate several
fatal error interrupts. After improving this function,
the ras module can reuse this function to only
handle the first interrupt.

V3:
  Initialize event_id using RAS_EVENT_INVALID_ID.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-10-20 18:18:35 -04:00
YiPeng Chai
3d72d2e5f4 drm/amdgpu: Intercept ras interrupts to ras module
Intercept ras interrupts to ras module.

V2:
  Change function names in ras module.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-10-20 18:18:26 -04:00
Thomas Zimmermann
7910d69376 drm/client: Remove holds_console_lock parameter from suspend/resume
No caller of the client resume/suspend helpers holds the console
lock. The last such cases were removed from radeon in the patch
series at [1]. Now remove the related parameter and the TODO items.

v2:
- update placeholders for CONFIG_DRM_CLIENT=n

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/series/151624/ # [1]
Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Petr Vorel <pvorel@suse.cz>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Acked-by: Danilo Krummrich <dakr@kernel.org>
Reviewed-by: Lyude Paul <lyude@redhat.com>
Reviewed-by: Jocelyn Falempe <jfalempe@redhat.com>
Link: https://lore.kernel.org/r/20251001143709.419736-1-tzimmermann@suse.de
2025-10-18 17:35:09 +02:00
Jonathan Kim
72ea12f6be drm/amdgpu: update remove after reset flag for MES remove queue
Remove queue after reset flag is required to remove a queue that has
been successfully reset to clean up the MES' internal state.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-10-13 14:14:36 -04:00
YiPeng Chai
ace232eff5 drm/amdgpu: Add ras module files into amdgpu
Add ras module files into amdgpu.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-10-13 14:14:36 -04:00
Sunil Khatri
42f1487884 drm/amdgpu/userqueue: validate userptrs for userqueues
userptrs could be changed by the user at any time and
hence while locking all the bos before GPU start processing
validate all the userptr bos.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-10-13 14:14:36 -04:00
Sunil Khatri
737da5363c drm/amdgpu: update the functions to use amdgpu version of hmm
At times we need a bo reference for hmm and for that add
a new struct amdgpu_hmm_range which will hold an optional
bo member and hmm_range.

Use amdgpu_hmm_range instead of hmm_range and let the bo
as an optional argument for the caller if they want to
the bo reference to be taken or they want to handle that
explicitly.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-10-13 14:14:36 -04:00
Lijo Lazar
071bba9624 drm/amdgpu: Reserve discovery TMR only if needed
For legacy SOCs, discovery binary is sideloaded. Instead of checking for
binary blob, use a flag to determine if discovery region needs to be
reserved.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-10-13 14:14:36 -04:00