linux

mirror of https://github.com/torvalds/linux.git synced 2026-04-26 18:42:25 -04:00

Author	SHA1	Message	Date
Mario Limonciello	6d622755bc	drm/amd: Drop some common modes from amdgpu_connector_add_common_modes() [Why] DC and non-DC codepaths have different sets of common modes that are added for eDP and LVDS cases. This can cause different behaviors for turning on DC on hardware that can support both. [How] Drop extra modes from amdgpu_connector_add_common_modes() not present in amdgpu_dm_connector_add_common_modes(). Cc: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Link: https://lore.kernel.org/r/20250924161624.1975819-5-mario.limonciello@amd.com Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-09-25 15:54:12 -04:00
Alex Deucher	dbf2341569	drm/amdgpu: update MODULE_PARM_DESC for freesync_video To better describe what it does. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3756 Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-09-25 15:54:07 -04:00
Mario Limonciello	123a1750c5	drm/amd: Use dynamic array size declaration for amdgpu_connector_add_common_modes() [Why] Adding or removing a mode from common_modes[] can be fragile if a user forgot to update the for loop boundaries. [How] Use ARRAY_SIZE() to detect size of the array and use that instead. Cc: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Link: https://lore.kernel.org/r/20250924161624.1975819-4-mario.limonciello@amd.com Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-09-25 15:53:55 -04:00
Jesse.Zhang	b8ae2640f9	drm/amdgpu: Fix fence signaling race condition in userqueue This commit fixes a potential race condition in the userqueue fence signaling mechanism by replacing dma_fence_is_signaled_locked() with dma_fence_is_signaled(). The issue occurred because: 1. dma_fence_is_signaled_locked() should only be used when holding the fence's individual lock, not just the fence list lock 2. Using the locked variant without the proper fence lock could lead to double-signaling scenarios: - Hardware completion signals the fence - Software path also tries to signal the same fence By using dma_fence_is_signaled() instead, we properly handle the locking hierarchy and avoid the race condition while still maintaining the necessary synchronization through the fence_list_lock. v2: drop the comment (Christian) Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-09-25 15:53:23 -04:00
Mario Limonciello	210844d2c0	drm/amd: Drop unnecessary check in amdgpu_connector_add_common_modes() [Why] amdgpu_connector_add_common_modes() has a check for the width and height of common modes being too small, but the array of common_modes[] has fixed values. The check is dead code. [How] Drop unnecessary check. Cc: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Link: https://lore.kernel.org/r/20250924161624.1975819-3-mario.limonciello@amd.com Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-09-25 15:50:58 -04:00
Sunil Khatri	4e3b45d7b6	drm/amdgpu: remove the redeclaration of variable i Variable "i" has been redeclared as integer later in the function which is wrong and not serving any purpose. Fixes: `899fbde146` ("drm/amdgpu: replace get_user_pages with HMM mirror helpers") Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-09-25 15:46:35 -04:00
Prike Liang	883bd89d00	drm/amdgpu/userq: assign an error code for invalid userq va It should return an error code if userq VA validation fails. Fixes: `9e46b8bb05` ("drm/amdgpu: validate userq buffer virtual address and size") Signed-off-by: Prike Liang <Prike.Liang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-09-25 15:40:18 -04:00
Christian König	90e09ea4cf	drm/amdgpu: revert "rework reserved VMID handling" v2 This reverts commit `e44a0fe630`. Initially we used VMID reservation to enforce isolation between processes. That has now been replaced by proper fence handling. Both OpenGL, RADV and ROCm developers requested a way to reserve a VMID for SPM, so restore that approach by reverting back to only allowing a single process to use the reserved VMID. Only compile tested for now. v2: use -ENOENT instead of -EINVAL if VMID is not available Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-09-25 15:39:00 -04:00
Christian König	66f3883dbc	drm/amdgpu: remove leftover from enforcing isolation by VMID Initially we enforced isolation by reserving a VMID, but that practice was now removed. Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-09-25 15:38:54 -04:00
Jesse.Zhang	7469567d88	drm/amdgpu: Add fallback to pipe reset if KCQ ring reset fails Add a fallback mechanism to attempt pipe reset when KCQ reset fails to recover the ring. After performing the KCQ reset and queue remapping, test the ring functionality. If the ring test fails, initiate a pipe reset as an additional recovery step. v2: fix the typo (Lijo) v3: try pipeline reset when kiq mapping fails (Lijo) Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-09-25 15:38:48 -04:00
Mario Limonciello (AMD)	0a6e9e098f	drm/amd: Fix hybrid sleep [Why] commit `530694f54d` ("drm/amdgpu: do not resume device in thaw for normal hibernation") optimized the flow for systems that are going into S4 where the power would be turned off. Basically the thaw() callback wouldn't resume the device if the hibernation image was successfully created since the system would be powered off. This however isn't the correct flow for a system entering into s0i3 after the hibernation image is created. Some of the amdgpu callbacks have different behavior depending upon the intended state of the suspend. [How] Use pm_hibernation_mode_is_suspend() as an input to decide whether to run resume during thaw() callback. Reported-by: Ionut Nechita <ionut_n2001@yahoo.com> Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4573 Tested-by: Ionut Nechita <ionut_n2001@yahoo.com> Fixes: `530694f54d` ("drm/amdgpu: do not resume device in thaw for normal hibernation") Acked-by: Alex Deucher <alexander.deucher@amd.com> Tested-by: Kenneth Crudup <kenny@panix.com> Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org> Cc: 6.17+ <stable@vger.kernel.org> # 6.17+: `495c8d3503`: PM: hibernate: Add pm_hibernation_mode_is_suspend() Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2025-09-25 21:37:38 +02:00
Jesse.Zhang	5886090032	drm/amdgpu: Move VCN reset mask setup to late_init for VCN 5.0.1 This patch moves the initialization of the VCN supported_reset mask from sw_init to a new late_init function for VCN 5.0.1. The change ensures that all necessary hardware and firmware initialization is complete before determining the supported reset types. Key changes: - Added vcn_v5_0_1_late_init() function to handle late initialization - Moved supported_reset mask setup from sw_init to late_init - Added check for per-queue reset support via amdgpu_dpm_reset_vcn_is_supported() - Updated ip_funcs to use the new late_init function This change helps ensure proper reset behavior by waiting until all dependencies are initialized before determining available reset types. Reviewed-by: Sonny Jiang <sonny.jiang@amd.com> Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Signed-off-by: Ruili Ji <ruiliji2@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-09-23 10:41:37 -04:00
Jesse.Zhang	dc704458dd	drm/amdgpu: Add ring reset support for VCN v5.0.1 Implement the ring reset callback for VCN v5.0.1 to properly handle hardware recovery when encountering GPU hangs. The new functionality: 1. Adds vcn_v5_0_1_ring_reset() function that: - Prepares for reset using amdgpu_ring_reset_helper_begin() - Performs VCN instance reset via amdgpu_dpm_reset_vcn() - Re-initializes hardware through vcn_v5_0_1_hw_init_inst() - Restarts DPG mode with vcn_v5_0_1_start_dpg_mode() - Completes reset with amdgpu_ring_reset_helper_end() 2. Hooks the reset function into the unified ring functions via: - Adding .reset = vcn_v5_0_1_ring_reset to vcn_v5_0_1_unified_ring_vm_funcs 3. Maintains existing behavior for SR-IOV VF cases by checking RRMT status This provides proper hardware recovery capabilities for VCN 5.0.1 IP block during fault conditions, matching functionality available in other VCN versions. v2: Remove the RRMT_ENABLED cap setting in the reset function and replace adev->vcn.inst[ring->me].indirect_sram with vinst->indirect_sram (Lijo) Reviewed-by: Sonny Jiang <sonny.jiang@amd.com> Suggested-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Signed-off-by: Ruili Ji <ruiliji2@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-09-23 10:41:27 -04:00
Jesse.Zhang	eb6910cdaa	drm/amdgpu: Refactor VCN v5.0.1 HW init into separate instance function Split the per-instance initialization code from vcn_v5_0_1_hw_init() into a new vcn_v5_0_1_hw_init_inst() function. This improves code organization by: 1. Separating the instance-specific initialization logic 2. Making the main init function more readable 3. Following the pattern used in queue reset The SR-IOV specific initialization remains in the main function since it has different requirements. Reviewed-by: Sonny Jiang <sonny.jiang@amd.com> Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Signed-off-by: Ruili Ji <ruiliji2@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-09-23 10:41:11 -04:00
Rahul Kumar	86a54e45fd	drm/amdgpu: Use kmalloc_array() instead of kmalloc() Documentation/process/deprecated.rst recommends against the use of kmalloc with dynamic size calculations due to the risk of overflow and smaller allocation being made than the caller was expecting. Replace kmalloc() with kmalloc_array() in amdgpu_amdkfd_gfx_v10.c, amdgpu_amdkfd_gfx_v10_3.c, amdgpu_amdkfd_gfx_v11.c and amdgpu_amdkfd_gfx_v12.c to make the intended allocation size clearer and avoid potential overflow issues. Suggested-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Rahul Kumar <rk0006818@gmail.com> Signed-off-by: Felix Kuehling <felix.kuehling@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-09-23 10:35:54 -04:00
Sonny Jiang	854b9ab637	drm/amdgpu: Update amdgpu_vcn5_fw_shared for vcn_5_0_1 Align vcn5_fw_shared structure with FW Signed-off-by: Sonny Jiang <sonny.jiang@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-09-23 10:22:51 -04:00
Mario Limonciello	1fb710793c	drm/amdgpu: Enable MES lr_compute_wa by default The MES set resources packet has an optional bit 'lr_compute_wa' which can be used for preventing MES hangs on long compute jobs. Set this bit by default. Co-developed-by: Yifan Zhang <yifan1.zhang@amd.com> Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-09-23 10:22:38 -04:00
Sunil Khatri	c5b3cc417b	drm/amdgpu: use hmm_pfns instead of array of pages We don't need to allocate a local array of pages to hold the pages returned by HMM; instead we can use the hmm_range structure itself to get to hmm_pfn and get the required pages directly. This avoids many alloc/free calls. Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Suggested-by: Christian König <christian.koenig@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Acked-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-09-23 10:22:31 -04:00
Lijo Lazar	b29c22b8da	drm/amdgpu: Fix vbios build number parsing logic It's not necessary that the build string and atom header section has a difference of 32 bytes. Use the remaining bytes in the section as copy limit. Fixes: `d6fa802661` ("drm/amdgpu: Add vbios build number interface") Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-09-23 10:21:09 -04:00
Dave Airlie	342f141ba9	Merge tag 'amd-drm-next-6.18-2025-09-19' of https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-next-6.18-2025-09-19: amdgpu: - Fence drv clean up fix - DPC fixes - Misc display fixes - Support the MMIO remap page as a ttm pool - JPEG parser updates - UserQ updates - VCN ctx handling fixes - Documentation updates - Misc cleanups - SMU 13.0.x updates - SI DPM updates - GC 11.x cleaner shader updates - DMCUB updates - DML fixes - Improve fallback handling for pixel encoding - VCN reset improvements - DCE6 DC updates - DSC fixes - Use devm for i2c buses - GPUVM locking updates - GPUVM documentation improvements - Drop non-DC DCE11 code - S0ix fixes - Backlight fix - SR-IOV fixes amdkfd: - SVM updates Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://lore.kernel.org/r/20250919193354.2989255-1-alexander.deucher@amd.com	2025-09-22 08:45:51 +10:00
Guangshuo Li	cc9a8e238e	drm/amdgpu/atom: Check kcalloc() for WS buffer in amdgpu_atom_execute_table_locked() kcalloc() may fail. When WS is non-zero and allocation fails, ectx.ws remains NULL while ectx.ws_size is set, leading to a potential NULL pointer dereference in atom_get_src_int() when accessing WS entries. Return -ENOMEM on allocation failure to avoid the NULL dereference. Signed-off-by: Guangshuo Li <lgs201920130244@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-09-18 16:59:27 -04:00
Christian König	59e4405e9e	drm/amdgpu: revert to old status lock handling v3 It turned out that protecting the status of each bo_va with a spinlock was just hiding problems instead of solving them. Revert the whole approach, add a separate stats_lock and lockdep assertions that the correct reservation lock is held all over the place. This not only allows for better checks if a state transition is properly protected by a lock, but also switching back to using list macros to iterate over the state of lists protected by the dma_resv lock of the root PD. v2: re-add missing check v3: split into two patches Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-09-18 16:59:14 -04:00
Alex Deucher	9272bb34b0	drm/amdgpu: suspend KFD and KGD user queues for S0ix We need to make sure the user queues are preempted so GFX can enter gfxoff. Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Tested-by: David Perry <david.perry@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `f8b367e6fa`) Cc: stable@vger.kernel.org	2025-09-18 14:59:41 -04:00
Alex Deucher	2ade36eaa9	drm/amdkfd: add proper handling for S0ix When in S0i3, the GFX state is retained, so all we need to do is stop the runlist so GFX can enter gfxoff. Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Tested-by: David Perry <david.perry@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `4bfa860993`) Cc: stable@vger.kernel.org	2025-09-18 14:59:24 -04:00
Sunil Khatri	0aa09d8a6c	drm/amdgpu: add missing comment for the new argument In function 'amdgpu_vm_lock_done_list' update the comment for the new argument 'vm'. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202509180211.UAqME0zj-lkp@intel.com/ Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-09-18 09:43:38 -04:00
Alex Deucher	f8b367e6fa	drm/amdgpu: suspend KFD and KGD user queues for S0ix We need to make sure the user queues are preempted so GFX can enter gfxoff. Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Tested-by: David Perry <david.perry@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-09-18 09:43:30 -04:00
Alex Deucher	846de1384a	drm/amdgpu/userq: Optimize S0ix handling In S0i3, GFX state is retained, so it's preferable to preempt queues rather than unmapping them as the overhead is lower. Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Tested-by: David Perry <david.perry@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-09-18 09:43:23 -04:00
Joe.Wang	f05c03ffc7	drm/amdgpu: Fix PRT flag for gfx12 AMDGPU_PTE_PRT_GFX12 flag is missed during pageTable rework, add it back. Fixes: `6716a823d1` ("drm/amdgpu: rework how PTE flags are generated v3") Signed-off-by: Joe Wang <joe.wang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-09-18 09:43:10 -04:00
Xiang Liu	1ed511fb76	drm/amdgpu: Check VF critical region before RAS poison injection Check VF critical region before RAS poison injection to ensure that the poison injection will not hit the VF critical region. Signed-off-by: Xiang Liu <xiang.liu@amd.com> Reviewed-by: Shravan Kumar Gande <Shravankumar.Gande@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-09-18 09:43:10 -04:00
Alex Deucher	4bfa860993	drm/amdkfd: add proper handling for S0ix When in S0i3, the GFX state is retained, so all we need to do is stop the runlist so GFX can enter gfxoff. Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org> Tested-by: David Perry <david.perry@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-09-18 09:43:02 -04:00
Xiang Liu	f1fdeb3d07	drm/amdgpu: Introduce VF critical region check for RAS poison injection The SRIOV guest sends a request to the host via mailbox to check whether the poison injection address is in the VF critical region. Signed-off-by: Xiang Liu <xiang.liu@amd.com> Reviewed-by: Shravan Kumar Gande <Shravankumar.Gande@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-09-18 09:43:02 -04:00
Alex Deucher	18f769ff36	drm/amdgpu: remove non-DC DCE 11 code DC has been the default for ~8 years now and supports many things that the non-DC code does not (audio, DP MST, etc.). No DCE 11.x IPs ever supported analog encoders so that is not an issue. Finally drop this code. Acked-by: Timur Kristóf <timur.kristof@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-09-18 09:43:02 -04:00
Christian König	ed7a4397f5	drm/ttm: rename ttm_bo_put to _fini v3 Give TTM BOs a separate cleanup function. No functional change, but the next step in removing the TTM BO reference counting and replacing it with the GEM object reference counting. v2: move the code around a bit to make it clearer what's happening v3: fix nouveau_bo_fini as well Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Acked-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://lore.kernel.org/r/20250909144311.1927-1-christian.koenig@amd.com	2025-09-17 14:03:21 +02:00
Christian König	df99f6d112	drm/amdgpu: re-order and document VM code Re-order fields in the VM structure and try to improve the documentation a bit. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-09-16 17:51:48 -04:00
Christian König	930595df25	drm/amdgpu: remove check for BO reservation add assert instead We should leave such checks to lockdep and not implement something manually. Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Prike Liang <Prike.Liang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-09-16 17:51:35 -04:00
Rodrigo Siqueira	63137c7c8c	drm/amdgpu: Use devm_i2c_add_adapter() in SMU V11 Instead of using i2c_add_adapter() and i2c_del_adapter() in the SMU V11, use devm_i2c_add_adapter() to simplify the code path. Signed-off-by: Rodrigo Siqueira <siqueira@igalia.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-09-16 17:47:24 -04:00
Rodrigo Siqueira	0f36a3c6af	drm/amdgpu/amdgpu_i2c: Use devm_i2c_add_adapter instead of i2c_add_adapter This commit replaces i2c_add_adapter() with devm_i2c_add_adapter() and removes part of the cleanup logic since the new function handles the i2c removal. Signed-off-by: Rodrigo Siqueira <siqueira@igalia.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-09-16 17:47:22 -04:00
Christian König	39203f5e6d	drm/amdgpu: fix userq VM validation v4 That was actually complete nonsense and not validating the BOs at all. The code just cleared all VM areas where it couldn't grab the lock for a BO. Try to fix this. Only compile tested at the moment. v2: fix fence slot reservation as well as pointed out by Sunil. also validate PDs, PTs, per VM BOs and update PDEs v3: grab the status_lock while working with the done list. v4: rename functions, add some comments, fix waiting for updates to complete. v4: rename amdgpu_vm_lock_done_list(), add some more comments Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-09-16 17:47:06 -04:00
Christian König	d7ddcf921e	drm/amdgpu: reject gang submissions under SRIOV Gang submission means that the kernel driver guarantees that multiple submissions are executed on the HW at the same time on different engines. Background is that those submissions then depend on each other and each can't finish stand alone. SRIOV now uses world switch to preempt submissions on the engines to allow sharing the HW resources between multiple VFs. The problem is now that the SRIOV world switch can't know about such inter dependencies and will cause a timeout if it waits for a partially running gang submission. To conclude SRIOV and gang submissions are fundamentally incompatible at the moment. For now just disable them. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-09-16 17:47:00 -04:00
Srinivasan Shanmugam	c1b6b8c770	drm/amdgpu/gfx11: Add Cleaner Shader Support for GFX11.0.1/11.0.4 GPUs Enable the cleaner shader for additional GFX11.0.1/11.0.4 series GPUs to ensure data isolation among GPU tasks. The cleaner shader is tasked with clearing the Local Data Store (LDS), Vector General Purpose Registers (VGPRs), and Scalar General Purpose Registers (SGPRs), which helps avoid data leakage and guarantees the accuracy of computational results. This update extends cleaner shader support to GFX11.0.1/11.0.4 GPUs, previously available for GFX11.0.3. It enhances security by clearing GPU memory between processes and maintains a consistent GPU state across KGD and KFD workloads. Cc: Wasee Alam <wasee.alam@amd.com> Cc: Mario Sopena-Novales <mario.novales@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `0a71ceb27f`)	2025-09-15 17:23:42 -04:00
Christian König	2740509623	drm/amdgpu: revert "Implement new dummy vram manager" This should be unnecessary since a VRAM manager isn't mandatory in the first place. It could be that we have some missing checks inside AMDGPU or TTM but those should then be fixed instead of worked around like that. Signed-off-by: Christian König <christian.koenig@amd.com> Acked-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-09-15 17:04:49 -04:00
Christian König	a9273da04f	drm/amdgpu: add AMDGPU_IDS_FLAGS_GANG_SUBMIT Add a UAPI flag indicating if gang submit is supported or not. Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-09-15 17:04:42 -04:00
Shaoyun Liu	85442bac84	drm/amd/amdgpu: Fix the mes version that support inv_tlbs MES pipe0 will do VM invalidation with engine set 5 when assign VMID to a process, driver will submit inv_tlb package to mes pipe1. It might run into race condition if both pipes use the same invalidate engine set. From MES version 0x83 it will use invalidate engine set 6 for pipe1 to fix the issue Signed-off-by: Shaoyun Liu <shaoyun.liu@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-09-15 17:02:44 -04:00
Mario Limonciello (AMD)	531df041f2	drm/amd: Avoid evicting resources at S5 Normally resources are evicted on dGPUs at suspend or hibernate and on APUs at hibernate. These steps are unnecessary when using the S4 callbacks to put the system into S5. Cc: AceLan Kao <acelan.kao@canonical.com> Cc: Kai-Heng Feng <kaihengf@nvidia.com> Cc: Mark Pearson <mpearson-lenovo@squebb.ca> Cc: Denis Benato <benato.denis96@gmail.com> Cc: Merthan Karakaş <m3rthn.k@gmail.com> Tested-by: Eric Naim <dnaim@cachyos.org> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-09-15 17:02:39 -04:00
Jesse.Zhang	bb1d7f157e	drm/amdgpu: Switch user queues to use preempt/restore for eviction This patch modifies the user queue management to use preempt/restore operations instead of full map/unmap for queue eviction scenarios where applicable. The changes include: 1. Introduces new helper functions: - amdgpu_userqueue_preempt_helper() - amdgpu_userqueue_restore_helper() 2. Updates queue state management to track PREEMPTED state 3. Modifies eviction handling to use preempt instead of unmap: - amdgpu_userq_evict_all() now uses preempt_helper - amdgpu_userq_restore_all() now uses restore_helper The preempt/restore approach provides better performance during queue eviction by avoiding the overhead of full queue teardown and setup. Full map/unmap operations are still used for initial setup/teardown and system suspend scenarios. v2: rename amdgpu_userqueue_restore_helper/amdgpu_userqueue_preempt_helper to amdgpu_userq_restore_helper/amdgpu_userq_preempt_helper for consistency. (Alex) v3: amdgpu_userq_stop_sched_for_enforce_isolation() and amdgpu_userq_start_sched_for_enforce_isolation() should use preempt and restore (Alex) Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-09-15 17:02:33 -04:00
Jesse.Zhang	5cefcbb306	drm/amdgpu: adjust MES API used for suspend and resume Use the suspend and resume API rather than remove queue and add queue API. The former just preempts the queue while the latter remove it from the scheduler completely. There is no need to do that, we only need preemption in this case. V2: replace queue_active with queue state v3: set the suspend_fence_addr v4: allocate another per queue buffer for the suspend fence, and set the sequence number. also wait for the suspend fence. (Alex) v5: use a wb slot (Alex) v6: Change the timeout period. For MES, the default timeout is 2100000; /* 2100 ms */ (Alex) Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-09-15 17:02:28 -04:00
Hawking Zhang	46fbe1e349	Revert "drm/amdgpu: Allocate psp fw private buffer in vram" This reverts commit `22dcb283d6`. Need to certain APU platforms and will proceed to rework the patch accordingly Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Le Ma <Le.Ma@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-09-15 16:56:15 -04:00
Srinivasan Shanmugam	0a71ceb27f	drm/amdgpu/gfx11: Add Cleaner Shader Support for GFX11.0.1/11.0.4 GPUs Enable the cleaner shader for additional GFX11.0.1/11.0.4 series GPUs to ensure data isolation among GPU tasks. The cleaner shader is tasked with clearing the Local Data Store (LDS), Vector General Purpose Registers (VGPRs), and Scalar General Purpose Registers (SGPRs), which helps avoid data leakage and guarantees the accuracy of computational results. This update extends cleaner shader support to GFX11.0.1/11.0.4 GPUs, previously available for GFX11.0.3. It enhances security by clearing GPU memory between processes and maintains a consistent GPU state across KGD and KFD workloads. Cc: Wasee Alam <wasee.alam@amd.com> Cc: Mario Sopena-Novales <mario.novales@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-09-15 16:56:07 -04:00
Lijo Lazar	780f7a45e5	drm/amdgpu: Add virtual device capabilities Add a member to define the capabilities of virtual device. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-09-15 16:55:48 -04:00
Lijo Lazar	1f9ba8ea04	drm/amdgpu: Add generic capability class Define a utility macro for defining capabilities and their attributes. Capability attributes are read-only, write-only, read-write. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Acked-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-09-15 16:55:41 -04:00


17069 Commits