linux

mirror of https://github.com/torvalds/linux.git synced 2026-05-05 06:52:34 -04:00

Author	SHA1	Message	Date
Tao Zhou	b7674ae75b	drm/amdgpu: get RAS retire flip bits for new type of HBM Get RAS retire flip bits for HBM with different types in various NPS modes. Also set flip row bit and MCA R13 bit in PA in different NPS modes. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-05-13 09:32:08 -04:00
Tao Zhou	9b5b71895b	drm/amdgpu: implement get_retire_flip_bits for UMC v12 The RAS bad page retire flip bits can be set per vram type, vram vendor and NPS mode. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-05-13 09:32:05 -04:00
Tao Zhou	699bff37a5	drm/amdgpu: add get_retire_flip_bits for UMC Add the general interface to get flip bits for RAS bad page retirement. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-05-13 09:32:01 -04:00
fanhuang	80f66ca7a4	drm/amdgpu: add vcn v5_0_0 ip headers Add vcn v5_0_0 register offset and shift masks header files Only include the registers required for MMSCH initialization Signed-off-by: fanhuang <FangSheng.Huang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-05-13 09:31:51 -04:00
Tao Zhou	4ce5b99128	drm/amdgpu: adjust high bits for RAS retired page Per UMC address conversion algorithm, the high row bits of UMC MCA address are changed when they're converted into normalized address on specific ASICs. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-05-13 09:31:46 -04:00
Tao Zhou	1df57411a6	drm/amd: add definition for new memory type Support new version of HBM. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-05-13 09:31:40 -04:00
ganglxie	f5db59067c	Refine RAS bad page records counting and parsing in eeprom V3 There are only MCA records in V3, so there is no need to care about PA records. Recalculate the value of ras_num_bad_pages when parsing fails and continue with the remaining records instead of quitting. Signed-off-by: ganglxie <ganglxie@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-05-13 09:31:33 -04:00
Taimur Hassan	3ab3d680ff	drm/amd/display: Promote DC to 3.2.333 Summary * Refactor DMI quirks * Fix link-off issue triggered by quick unplug/replug * Fix race condition when submitting DMUB commands * Correct reply value when AUX Write incomplete * Backup / restore plane config only on update Acked-by: Tom Chung <chiahsuan.chung@amd.com> Signed-off-by: Taimur Hassan <Syed.Hassan@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-05-13 09:31:28 -04:00
Michael Strauss	8989cb919b	drm/amd/display: Add early 8b/10b channel equalization test pattern sequence [WHY] Early EQ pattern sequence is required for some LTTPR + old dongle combinations. [HOW] If DP_EARLY_8B10B_TPS2 chip cap is set, this new sequence programs phy to output TPS2 before initiating link training and writes TPS1 to LTTPR training pattern register as instructed by vendor. Add function to get embedded LTTPR target address offset. Reviewed-by: Wenjing Liu <wenjing.liu@amd.com> Signed-off-by: Michael Strauss <michael.strauss@amd.com> Signed-off-by: TungYu Lu <tungyu.lu@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-05-13 09:31:21 -04:00
Sung Lee	90af999835	drm/amd/display: Program triplebuffer on all pipes [WHY] Triplebuffer should be programmed on all pipes. Some code assumed it only needed to be called on top pipe, but as the HWSS function does not account for that, it must be called on every pipe. [HOW] Remove condition to not program triplebuffer on non-top/next pipe. Call the function unconditionally on all pipes. Reviewed-by: Dillon Varone <dillon.varone@amd.com> Reviewed-by: Aric Cyr <aric.cyr@amd.com> Signed-off-by: Sung Lee <Sung.Lee@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-05-13 09:31:15 -04:00
Taimur Hassan	e91c91e506	drm/amd/display: [FW Promotion] Release 0.1.10.0 Refactoring some IPS and panel replay structs Acked-by: Tom Chung <chiahsuan.chung@amd.com> Signed-off-by: Taimur Hassan <Syed.Hassan@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-05-13 09:31:12 -04:00
Samson Tam	c8d7e0be81	drm/amd/display: disable EASF narrow filter sharpening [Why & How] Default should be 1 to disable EASF narrow filter sharpening. Reviewed-by: Alvin Lee <alvin.lee2@amd.com> Signed-off-by: Samson Tam <Samson.Tam@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-05-13 09:31:08 -04:00
Wayne Lin	ea979dd401	drm/amd/display: Return the exact value for debugging [Why] It's unnecessary to set operation_result as invalid reply when p_notify->result != AUX_RET_SUCCESS. [How] Set operation_result as p_notify->result to better understand the reason for the error Reviewed-by: Ray Wu <ray.wu@amd.com> Signed-off-by: Wayne Lin <Wayne.Lin@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-05-13 09:30:58 -04:00
Mario Limonciello	de6485e3df	drm/amd/display: Restructure DMI quirks [Why] DMI quirks are relatively big code that makes amdgpu_dm 200 lines larger. [How] Move DMI quirks into a dedicated source file and make all quirks variables for `struct amdgpu_display_manager`. Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-05-13 09:30:52 -04:00
Aurabindo Pillai	f8ad62c0a9	drm/amd/display: check stream id dml21 wrapper to get plane_id [Why & How] Fix a false positive warning which occurs due to lack of correct checks when querying plane_id in DML21. This fixes the warning when performing a mode1 reset (cat /sys/kernel/debug/dri/1/amdgpu_gpu_recover): [ 35.751250] WARNING: CPU: 11 PID: 326 at /tmp/amd.PHpyAl7v/amd/amdgpu/../display/dc/dml2/dml2_dc_resource_mgmt.c:91 dml2_map_dc_pipes+0x243d/0x3f40 [amdgpu] [ 35.751434] Modules linked in: amdgpu(OE) amddrm_ttm_helper(OE) amdttm(OE) amddrm_buddy(OE) amdxcp(OE) amddrm_exec(OE) amd_sched(OE) amdkcl(OE) drm_suballoc_helper drm_ttm_helper ttm drm_display_helper cec rc_core i2c_algo_bit rfcomm qrtr cmac algif_hash algif_skcipher af_alg bnep amd_atl intel_rapl_msr intel_rapl_common snd_hda_codec_hdmi snd_hda_intel edac_mce_amd snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec kvm_amd snd_hda_core snd_hwdep snd_pcm kvm snd_seq_midi snd_seq_midi_event snd_rawmidi crct10dif_pclmul polyval_clmulni polyval_generic btusb ghash_clmulni_intel sha256_ssse3 btrtl sha1_ssse3 snd_seq btintel aesni_intel btbcm btmtk snd_seq_device crypto_simd sunrpc cryptd bluetooth snd_timer ccp binfmt_misc rapl snd i2c_piix4 wmi_bmof gigabyte_wmi k10temp i2c_smbus soundcore gpio_amdpt mac_hid sch_fq_codel msr parport_pc ppdev lp parport efi_pstore nfnetlink dmi_sysfs ip_tables x_tables autofs4 hid_generic usbhid hid crc32_pclmul igc ahci xhci_pci libahci xhci_pci_renesas video wmi [ 35.751501] CPU: 11 UID: 0 PID: 326 Comm: kworker/u64:9 Tainted: G OE 6.11.0-21-generic #21~24.04.1-Ubuntu [ 35.751504] Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE [ 35.751505] Hardware name: Gigabyte Technology Co., Ltd. X670E AORUS PRO X/X670E AORUS PRO X, BIOS F30 05/22/2024 [ 35.751506] Workqueue: amdgpu-reset-dev amdgpu_debugfs_reset_work [amdgpu] [ 35.751638] RIP: 0010:dml2_map_dc_pipes+0x243d/0x3f40 [amdgpu] [ 35.751794] Code: 6d 0c 00 00 8b 84 24 88 00 00 00 41 3b 44 9c 20 0f 84 fc 07 00 00 48 83 c3 01 48 83 fb 06 75 b3 4c 8b 64 24 68 4c 8b 6c 24 40 <0f> 0b b8 06 00 00 00 49 8b 94 24 a0 49 00 00 89 c3 83 f8 07 0f 87 [ 35.751796] RSP: 0018:ffffbfa3805d7680 EFLAGS: 00010246 [ 35.751798] RAX: 0000000000010000 RBX: 0000000000000006 RCX: 0000000000000000 [ 35.751799] RDX: 0000000000000000 RSI: 0000000000000005 RDI: 0000000000000000 [ 35.751800] RBP: ffffbfa3805d78f0 R08: 0000000000000000 R09: 0000000000000000 [ 35.751801] R10: 0000000000000000 R11: 0000000000000000 R12: ffffbfa383249000 [ 35.751802] R13: ffffa0e68f280000 R14: ffffbfa383249658 R15: 0000000000000000 [ 35.751803] FS: 0000000000000000(0000) GS:ffffa0edbe580000(0000) knlGS:0000000000000000 [ 35.751804] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 35.751805] CR2: 00005d847ef96c58 CR3: 000000041de3e000 CR4: 0000000000f50ef0 [ 35.751806] PKRU: 55555554 [ 35.751807] Call Trace: [ 35.751810] <TASK> [ 35.751816] ? show_regs+0x6c/0x80 [ 35.751820] ? __warn+0x88/0x140 [ 35.751822] ? dml2_map_dc_pipes+0x243d/0x3f40 [amdgpu] [ 35.751964] ? report_bug+0x182/0x1b0 [ 35.751969] ? handle_bug+0x6e/0xb0 [ 35.751972] ? exc_invalid_op+0x18/0x80 [ 35.751974] ? asm_exc_invalid_op+0x1b/0x20 [ 35.751978] ? dml2_map_dc_pipes+0x243d/0x3f40 [amdgpu] [ 35.752117] ? math_pow+0x48/0xa0 [amdgpu] [ 35.752256] ? srso_alias_return_thunk+0x5/0xfbef5 [ 35.752260] ? math_pow+0x48/0xa0 [amdgpu] [ 35.752400] ? srso_alias_return_thunk+0x5/0xfbef5 [ 35.752403] ? math_pow+0x11/0xa0 [amdgpu] [ 35.752524] ? srso_alias_return_thunk+0x5/0xfbef5 [ 35.752526] ? core_dcn4_mode_programming+0xe4d/0x20d0 [amdgpu] [ 35.752663] ? srso_alias_return_thunk+0x5/0xfbef5 [ 35.752669] dml21_validate+0x3d4/0x980 [amdgpu] Reviewed-by: Austin Zheng <austin.zheng@amd.com> Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-05-13 09:30:44 -04:00
George Shen	1561782686	drm/amd/display: fix link_set_dpms_off multi-display MST corner case [Why & How] When MST config is unplugged/replugged too quickly, it can potentially result in a scenario where previous DC state has not been reset before the HPD link detection sequence begins. In this case, driver will disable the streams/link prior to re-enabling the link for link training. There is a bug in the current logic that does not account for the fact that current_state can be released and cleared prior to swapping to a new state (resulting in the pipe_ctx stream pointers to be cleared) in between disabling streams. To resolve this, cache the original streams prior to committing any stream updates. Reviewed-by: Wenjing Liu <wenjing.liu@amd.com> Signed-off-by: George Shen <george.shen@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-05-13 09:30:28 -04:00
John Olender	53761b7ecd	drm/amd/display: Defer BW-optimization-blocked DRR adjustments [Why & How] Instead of dropping DRR updates, defer them. This fixes issues where monitor continues to see incorrect refresh rate after VRR was turned off by userspace. Fixes: `32953485c5` ("drm/amd/display: Do not update DRR while BW optimizations pending") Link: https://gitlab.freedesktop.org/drm/amd/-/issues/3546 Reviewed-by: Sun peng Li <sunpeng.li@amd.com> Signed-off-by: John Olender <john.olender@gmail.com> Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-05-13 09:28:41 -04:00
Gabe Teeger	9334c491cd	Revert: "drm/amd/display: Enable urgent latency adjustment on DCN35" This reverts commit `756c85e4d0` ("drm/amd/display: Enable urgent latency adjustment on DCN35") Reason for revert: Negative power impact. Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: Gabe Teeger <Gabe.Teeger@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-05-13 09:27:44 -04:00
Dillon Varone	dd141b16b3	drm/amd/display: Fix race in dmub_srv_wait_for_pending [WHY] If commands are being submitted to DMCUB while concurrently waiting for pending commands to complete, rptr and wptr may never match again, and reported command count will not update. [HOW] Modify dmub_srv_wait_for_pending to constantly check wptr and rptr match, and update inbox status whenever a message is sent to avoid the race and determine message completion or idle as quickly as possible. Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: Dillon Varone <dillon.varone@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-05-13 09:27:36 -04:00
Wayne Lin	7ac37f0dcd	drm/amd/display: Correct the reply value when AUX write incomplete [Why] Forcing aux->transfer to return 0 on an incomplete AUX write is inappropriate. It should return the number of bytes that have been transferred. [How] aux->transfer must not change the original msg except for the reply field of the drm_dp_aux_msg structure. Copy msg->buffer when it is a write request, and overwrite the first byte when the sink replies with 1 byte indicating the number of bytes partially written. Then we can return the correct value without changing the original msg. Fixes: `3637e457eb` ("drm/amd/display: Fix wrong handling for AUX_DEFER case") Cc: Mario Limonciello <mario.limonciello@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Ray Wu <ray.wu@amd.com> Signed-off-by: Wayne Lin <Wayne.Lin@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-05-13 09:25:43 -04:00
Dillon Varone	3705217501	drm/amd/display: Backup and restore plane configuration only on update [WHY&HOW] When backing up and restoring plane states for minimal transition cases, only configuration should be backed up and restored. Information only relevant to the object/allocation (like refcount) should be excluded. Also move this interface to dc_plane.h. Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Dillon Varone <Dillon.Varone@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-05-13 09:25:32 -04:00
Tim Huang	0a5c060b59	drm/amdgpu: fix incorrect MALL size for GFX1151 On GFX1151, the reported MALL cache size reflects only half of its actual size; this adjustment corrects the discrepancy. Signed-off-by: Tim Huang <tim.huang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-05-13 09:23:23 -04:00
Dr. David Alan Gilbert	4367ee3ed1	drm/amd/pm: Remove remainder of mode2_reset_is_support The previous patch removed smu_mode2_reset_is_support() which was the only function to call through the mode2_reset_is_support() method pointer. Remove the remaining functions that were assigned to it and the pointer itself. See discussion at: https://lore.kernel.org/all/DM4PR12MB5165D85BD85BC8FC8BF7A3B48E88A@DM4PR12MB5165.namprd12.prod.outlook.com/ Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-05-13 09:23:14 -04:00
Dr. David Alan Gilbert	da471b8b77	drm/amd/pm: Remove unused smu_mode2_reset_is_support smu_mode2_reset_is_support() was added in 2020 by commit `5c03e5843e` ("drm/amdgpu:add smu mode1/2 support for aldebaran") but has remained unused. See discussion at: https://lore.kernel.org/all/DM4PR12MB5165D85BD85BC8FC8BF7A3B48E88A@DM4PR12MB5165.namprd12.prod.outlook.com/ Remove it. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-05-13 09:22:45 -04:00
Dr. David Alan Gilbert	f6da61b956	drm/amd/pm/smu13: Remove unused smu_v13_0_init_display_count smu_v13_0_init_display_count() was added in 2020 by commit `c05d1c4015` ("drm/amd/swsmu: add aldebaran smu13 ip support (v3)") but is unused. See discussion on: https://lore.kernel.org/all/DM4PR12MB5165D85BD85BC8FC8BF7A3B48E88A@DM4PR12MB5165.namprd12.prod.outlook.com/ that it really isn't needed. Remove it. Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-05-13 09:22:07 -04:00
Arvind Yadav	010503a3cb	drm/amdgpu: Fix amdgpu_userq_wait_ioctl() warn missing error code 'r' To resolve the warning regarding the missing error code 'r' in amdgpu_userq_wait_ioctl(), assign the value 'r = -EINVAL'. Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Closes: https://lore.kernel.org/r/202505080458.rnV8YfiY-lkp@intel.com/ Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com> Reviewed-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Arvind Yadav <Arvind.Yadav@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-05-13 09:21:56 -04:00
Arvind Yadav	f10eb185ad	drm/amdgpu: Fix NULL dereference in amdgpu_userq_restore_worker Switch cancel_delayed_work() to cancel_delayed_work_sync() to ensure the delayed work has finished executing before proceeding with resource cleanup. This prevents a potential use-after-free or NULL dereference if the resume_work is still running during finalization. BUG: kernel NULL pointer dereference, address: 0000000000000140 [ +0.000050] #PF: supervisor read access in kernel mode [ +0.000019] #PF: error_code(0x0000) - not-present page [ +0.000021] PGD 0 P4D 0 [ +0.000015] Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI [ +0.000021] CPU: 17 UID: 0 PID: 196299 Comm: kworker/17:0 Tainted: G U 6.14.0-org-staging #1 [ +0.000032] Tainted: [U]=USER [ +0.000015] Hardware name: Gigabyte Technology Co., Ltd. X570 AORUS ELITE/X570 AORUS ELITE, BIOS F39 03/22/2024 [ +0.000029] Workqueue: events amdgpu_userq_restore_worker [amdgpu] [ +0.000426] RIP: 0010:drm_exec_lock_obj+0x32/0x210 [drm_exec] [ +0.000025] Code: e5 41 57 41 56 41 55 49 89 f5 41 54 49 89 fc 48 83 ec 08 4c 8b 77 30 4d 85 f6 0f 85 c0 00 00 00 4c 8d 7f 08 48 39 77 38 74 54 <49> 8b bd f8 00 00 00 4c 89 fe 41 f6 04 24 01 75 3c e8 08 50 bc e0 [ +0.000046] RSP: 0018:ffffab1b04da3ce8 EFLAGS: 00010297 [ +0.000020] RAX: 0000000000000001 RBX: ffff930cc60e4bc0 RCX: 0000000000000000 [ +0.000025] RDX: 0000000000000004 RSI: 0000000000000048 RDI: ffffab1b04da3d88 [ +0.000028] RBP: ffffab1b04da3d10 R08: ffff930cc60e4000 R09: 0000000000000000 [ +0.000022] R10: ffffab1b04da3d18 R11: 0000000000000001 R12: ffffab1b04da3d88 [ +0.000023] R13: 0000000000000048 R14: 0000000000000000 R15: ffffab1b04da3d90 [ +0.000023] FS: 0000000000000000(0000) GS:ffff9313dea80000(0000) knlGS:0000000000000000 [ +0.000024] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ +0.000021] CR2: 0000000000000140 CR3: 000000018351a000 CR4: 0000000000350ef0 [ +0.000025] Call Trace: [ +0.000018] <TASK> [ +0.000015] ? show_regs+0x69/0x80 [ +0.000022] ? __die+0x25/0x70 [ +0.000019] ? page_fault_oops+0x15d/0x510 [ +0.000024] ? do_user_addr_fault+0x312/0x690 [ +0.000024] ? sched_clock_cpu+0x10/0x1a0 [ +0.000028] ? exc_page_fault+0x78/0x1b0 [ +0.000025] ? asm_exc_page_fault+0x27/0x30 [ +0.000024] ? drm_exec_lock_obj+0x32/0x210 [drm_exec] [ +0.000024] drm_exec_prepare_obj+0x21/0x60 [drm_exec] [ +0.000021] amdgpu_vm_lock_pd+0x22/0x30 [amdgpu] [ +0.000266] amdgpu_userq_validate_bos+0x6c/0x320 [amdgpu] [ +0.000333] amdgpu_userq_restore_worker+0x4a/0x120 [amdgpu] [ +0.000316] process_one_work+0x189/0x3c0 [ +0.000021] worker_thread+0x2a4/0x3b0 [ +0.000022] kthread+0x109/0x220 [ +0.000018] ? __pfx_worker_thread+0x10/0x10 [ +0.000779] ? _raw_spin_unlock_irq+0x1f/0x40 [ +0.000560] ? __pfx_kthread+0x10/0x10 [ +0.000543] ret_from_fork+0x3c/0x60 [ +0.000507] ? __pfx_kthread+0x10/0x10 [ +0.000515] ret_from_fork_asm+0x1a/0x30 [ +0.000515] </TASK> v2: Replace cancel_delayed_work() with cancel_delayed_work_sync() in amdgpu_userq_destroy() and amdgpu_userq_evict(). Cc: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Arvind Yadav <arvind.yadav@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-05-13 09:21:39 -04:00
Philip Yang	7dbbfb3c17	drm/amdgpu: csa unmap use uninterruptible lock When unmapping the CSA and freeing the GPU VM after process exit, if a signal is accepted, the wait to take the VM lock is interrupted and returns, which causes a memory leak and the warning backtrace below. Use an uninterruptible wait for the lock to fix the issue. WARNING: CPU: 69 PID: 167800 at amd/amdgpu/amdgpu_kms.c:1525 amdgpu_driver_postclose_kms+0x294/0x2a0 [amdgpu] Call Trace: <TASK> drm_file_free.part.0+0x1da/0x230 [drm] drm_close_helper.isra.0+0x65/0x70 [drm] drm_release+0x6a/0x120 [drm] amdgpu_drm_release+0x51/0x60 [amdgpu] __fput+0x9f/0x280 ____fput+0xe/0x20 task_work_run+0x67/0xa0 do_exit+0x217/0x3c0 do_group_exit+0x3b/0xb0 get_signal+0x14a/0x8d0 arch_do_signal_or_restart+0xde/0x100 exit_to_user_mode_loop+0xc1/0x1a0 exit_to_user_mode_prepare+0xf4/0x100 syscall_exit_to_user_mode+0x17/0x40 do_syscall_64+0x69/0xc0 Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-05-13 09:21:31 -04:00
Aurabindo Pillai	102419cdad	drm/amd/display: use drm_dbg_driver() in amdgpu_dm.c Replace all uses of DRM_DEBUG_DRIVER in amdgpu_dm.c with drm_dbg_driver(). The latter prints the instance of the drm device associated with the error, which would be helpful in debugging scenarios involving multiple GPUs. Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-05-13 09:21:21 -04:00
Dave Airlie	1faeeb315f	Merge tag 'amd-drm-next-6.16-2025-05-09' of https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-next-6.16-2025-05-09: amdgpu: - IPS fixes - DSC cleanup - DC Scaling updates - DC FP fixes - Fused I2C-over-AUX updates - SubVP fixes - Freesync fix - DMUB AUX fixes - VCN fix - Hibernation fixes - HDP fixes - DCN 2.1 fixes - DPIA fixes - DMUB updates - Use drm_file_err in amdgpu - Enforce isolation updates - Use new dma_fence helpers - USERQ fixes - Documentation updates - Misc code cleanups - SR-IOV updates - RAS updates - PSP 12 cleanups amdkfd: - Update error messages for SDMA - Userptr updates drm: - Add drm_file_err function dma-buf: - Add a helper to sort and deduplicate dma_fence arrays From: Alex Deucher <alexander.deucher@amd.com> Link: https://lore.kernel.org/r/20250509230951.3871914-1-alexander.deucher@amd.com Signed-off-by: Dave Airlie <airlied@redhat.com>	2025-05-12 07:14:34 +10:00
Paul E. McKenney	c6f7f1b2c0	drm/amd/pm: Avoid open-coded use of ratelimit_state structure's internals The amdgpu_set_thermal_throttling_logging() function directly accesses the ratelimit_state structure's ->missed field, which works, but which also makes it more difficult to change this field. Therefore, make use of the ratelimit_state_reset_interval() function instead of directly accessing the ->missed field. Nevertheless, open-coded use of ->burst and ->interval is still permitted, for example, for runtime sysfs adjustment of these fields. Link: https://lore.kernel.org/all/fbe93a52-365e-47fe-93a4-44a44547d601@paulmck-laptop/ Link: https://lore.kernel.org/all/20250423115409.3425-1-spasswolf@web.de/ Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202503180826.EiekA1MB-lkp@intel.com/ Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Petr Mladek <pmladek@suse.com> Cc: Kenneth Feng <kenneth.feng@amd.com> Cc: "Christian König" <christian.koenig@amd.com> Cc: Xinhui Pan <Xinhui.Pan@amd.com> Cc: David Airlie <airlied@gmail.com> Cc: Simona Vetter <simona@ffwll.ch> Cc: <amd-gfx@lists.freedesktop.org> Cc: <dri-devel@lists.freedesktop.org>	2025-05-08 16:13:26 -07:00
Alex Deucher	5a11a27677	drm/amdgpu/hdp7: use memcfg register to post the write for HDP flush Reading back the remapped HDP flush register seems to cause problems on some platforms. All we need is a read, so read back the memcfg register. Fixes: `689275140c` ("drm/amdgpu/hdp7.0: do a posting read when flushing HDP") Reported-by: Alexey Klimov <alexey.klimov@linaro.org> Link: https://lists.freedesktop.org/archives/amd-gfx/2025-April/123150.html Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4119 Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3908 Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `dbc064adfc`) Cc: stable@vger.kernel.org	2025-05-08 11:48:12 -04:00
Alex Deucher	ca28e80abe	drm/amdgpu/hdp6: use memcfg register to post the write for HDP flush Reading back the remapped HDP flush register seems to cause problems on some platforms. All we need is a read, so read back the memcfg register. Fixes: `abe1cbaec6` ("drm/amdgpu/hdp6.0: do a posting read when flushing HDP") Reported-by: Alexey Klimov <alexey.klimov@linaro.org> Link: https://lists.freedesktop.org/archives/amd-gfx/2025-April/123150.html Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4119 Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3908 Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `84141ff615`) Cc: stable@vger.kernel.org	2025-05-08 11:47:54 -04:00
Alex Deucher	dbc988c689	drm/amdgpu/hdp5.2: use memcfg register to post the write for HDP flush Reading back the remapped HDP flush register seems to cause problems on some platforms. All we need is a read, so read back the memcfg register. Fixes: `f756dbac1c` ("drm/amdgpu/hdp5.2: do a posting read when flushing HDP") Reported-by: Alexey Klimov <alexey.klimov@linaro.org> Link: https://lists.freedesktop.org/archives/amd-gfx/2025-April/123150.html Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4119 Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3908 Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `4a89b7698e`) Cc: stable@vger.kernel.org	2025-05-08 11:47:23 -04:00
Alex Deucher	0e33e0f339	drm/amdgpu/hdp5: use memcfg register to post the write for HDP flush Reading back the remapped HDP flush register seems to cause problems on some platforms. All we need is a read, so read back the memcfg register. Fixes: `cf424020e0` ("drm/amdgpu/hdp5.0: do a posting read when flushing HDP") Reported-by: Alexey Klimov <alexey.klimov@linaro.org> Link: https://lists.freedesktop.org/archives/amd-gfx/2025-April/123150.html Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4119 Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3908 Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `a5cb344033`) Cc: stable@vger.kernel.org	2025-05-08 11:46:57 -04:00
Lijo Lazar	afc6053d4c	Reapply: drm/amdgpu: Use generic hdp flush function Except HDP v5.2 all use a common logic for HDP flush. Use a generic function. HDP v5.2 forces NO_KIQ logic, revisit it later. Reapply after fixing up an HDP regression. v2: merge the fix (Alex) Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> (v1) Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-05-08 11:21:37 -04:00
Alex Deucher	dbc064adfc	drm/amdgpu/hdp7: use memcfg register to post the write for HDP flush Reading back the remapped HDP flush register seems to cause problems on some platforms. All we need is a read, so read back the memcfg register. Fixes: `689275140c` ("drm/amdgpu/hdp7.0: do a posting read when flushing HDP") Reported-by: Alexey Klimov <alexey.klimov@linaro.org> Link: https://lists.freedesktop.org/archives/amd-gfx/2025-April/123150.html Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4119 Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3908 Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-05-08 11:21:12 -04:00
Alex Deucher	84141ff615	drm/amdgpu/hdp6: use memcfg register to post the write for HDP flush Reading back the remapped HDP flush register seems to cause problems on some platforms. All we need is a read, so read back the memcfg register. Fixes: `abe1cbaec6` ("drm/amdgpu/hdp6.0: do a posting read when flushing HDP") Reported-by: Alexey Klimov <alexey.klimov@linaro.org> Link: https://lists.freedesktop.org/archives/amd-gfx/2025-April/123150.html Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4119 Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3908 Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-05-08 11:20:48 -04:00
Huang Rui	793fa8ce4e	drm/amdgpu: cleanup sriov function for psp v12 PSP v12 won't have SRIOV function. Signed-off-by: Huang Rui <ray.huang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-05-08 11:20:43 -04:00
Alex Deucher	4a89b7698e	drm/amdgpu/hdp5.2: use memcfg register to post the write for HDP flush Reading back the remapped HDP flush register seems to cause problems on some platforms. All we need is a read, so read back the memcfg register. Fixes: `f756dbac1c` ("drm/amdgpu/hdp5.2: do a posting read when flushing HDP") Reported-by: Alexey Klimov <alexey.klimov@linaro.org> Link: https://lists.freedesktop.org/archives/amd-gfx/2025-April/123150.html Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4119 Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3908 Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-05-08 11:20:19 -04:00
Alex Deucher	a5cb344033	drm/amdgpu/hdp5: use memcfg register to post the write for HDP flush Reading back the remapped HDP flush register seems to cause problems on some platforms. All we need is a read, so read back the memcfg register. Fixes: `cf424020e0` ("drm/amdgpu/hdp5.0: do a posting read when flushing HDP") Reported-by: Alexey Klimov <alexey.klimov@linaro.org> Link: https://lists.freedesktop.org/archives/amd-gfx/2025-April/123150.html Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4119 Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3908 Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-05-08 11:18:30 -04:00
Huang Rui	518e22b42c	drm/amdgpu: remove re-route ih in psp v12 APUs don't have a second IH ring, so the re-routing action here is a no-op. Waiting for the PSP timeout during initialization takes a long time, so remove the function in psp v12. Signed-off-by: Huang Rui <ray.huang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-05-08 11:18:24 -04:00
Alex Deucher	f690e39747	drm/amdgpu/hdp4: use memcfg register to post the write for HDP flush Reading back the remapped HDP flush register seems to cause problems on some platforms. All we need is a read, so read back the memcfg register. Fixes: `c9b8dcabb5` ("drm/amdgpu/hdp4.0: do a posting read when flushing HDP") Reported-by: Alexey Klimov <alexey.klimov@linaro.org> Link: https://lists.freedesktop.org/archives/amd-gfx/2025-April/123150.html Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4119 Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3908 Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `5c937b4a60`) Cc: stable@vger.kernel.org	2025-05-07 18:24:56 -04:00
Alex Deucher	4aaffc8575	drm/amdgpu: fix pm notifier handling Set the s3/s0ix and s4 flags in the pm notifier so that we can skip the resource evictions properly in pm prepare based on whether we are suspending or hibernating. Drop the eviction as processes are not frozen at this time, so we can end up getting stuck trying to evict VRAM while applications continue to submit work which causes the buffers to get pulled back into VRAM. v2: Move suspend flags out of pm notifier (Mario) Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4178 Fixes: `2965e6355d` ("drm/amd: Add Suspend/Hibernate notification callback support") Cc: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `06f2dcc241`) Cc: stable@vger.kernel.org	2025-05-07 18:24:30 -04:00
Alex Deucher	d0ce1aaa85	Revert "drm/amd: Stop evicting resources on APUs in suspend" This reverts commit `3a9626c816`. This breaks S4 because we end up setting the s3/s0ix flags even when we are entering s4 since prepare is used by both flows. This causes both the S3/s0ix and s4 flags to be set which breaks several checks in the driver which assume they are mutually exclusive. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3634 Cc: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `ce8f7d9589`) Cc: stable@vger.kernel.org	2025-05-07 18:24:04 -04:00
Ruijing Dong	b7e84fb708	drm/amdgpu/vcn: using separate VCN1_AON_SOC offset VCN1_AON_SOC_ADDRESS_3_0 offset varies on different VCN generations, the issue in vcn4.0.5 is caused by a different VCN1_AON_SOC_ADDRESS_3_0 offset. This patch does the following: 1. use the same offset for other VCN generations. 2. use the vcn4.0.5 special offset 3. update vcn_4_0 and vcn_5_0 Acked-by: Saleemkhan Jamadar <saleemkhan.jamadar@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Ruijing Dong <ruijing.dong@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `5c89ceda99`) Cc: stable@vger.kernel.org	2025-05-07 18:23:40 -04:00
Wayne Lin	65924ec69b	drm/amd/display: Fix wrong handling for AUX_DEFER case [Why] We incorrectly ack all bytes get written when the reply actually is defer. When it's defer, means sink is not ready for the request. We should retry the request. [How] Only reply all data get written when receive I2C_ACK\|AUX_ACK. Otherwise, reply the number of actual written bytes received from the sink. Add some messages to facilitate debugging as well. Fixes: `ad6756b4d7` ("drm/amd/display: Shift dc link aux to aux_payload") Cc: Mario Limonciello <mario.limonciello@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Ray Wu <ray.wu@amd.com> Signed-off-by: Wayne Lin <Wayne.Lin@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `3637e457eb`) Cc: stable@vger.kernel.org	2025-05-07 18:23:09 -04:00
Wayne Lin	3924f45d4d	drm/amd/display: Copy AUX read reply data whenever length > 0 [Why] amdgpu_dm_process_dmub_aux_transfer_sync() should return all exact data reply from the sink side. Don't do the analysis job in it. [How] Remove unnecessary check condition AUX_TRANSACTION_REPLY_AUX_ACK. Fixes: `ead08b95fa` ("drm/amd/display: Fix race condition in DPIA AUX transfer") Cc: Mario Limonciello <mario.limonciello@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Ray Wu <ray.wu@amd.com> Signed-off-by: Wayne Lin <Wayne.Lin@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `9b540e3fe6`) Cc: stable@vger.kernel.org	2025-05-07 18:20:24 -04:00
Wayne Lin	396dc51b3b	drm/amd/display: Remove incorrect checking in dmub aux handler [Why & How] "Request length != reply length" is expected behavior defined in spec. It's not an invalid reply. Besides, replied data handling logic is not designed to be written in amdgpu_dm_process_dmub_aux_transfer_sync(). Remove the incorrectly handling section. Fixes: `ead08b95fa` ("drm/amd/display: Fix race condition in DPIA AUX transfer") Cc: Mario Limonciello <mario.limonciello@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Ray Wu <ray.wu@amd.com> Signed-off-by: Wayne Lin <Wayne.Lin@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `81b5c6fa62`) Cc: stable@vger.kernel.org	2025-05-07 18:19:36 -04:00
Wayne Lin	bc70e11b55	drm/amd/display: Fix the checking condition in dmub aux handling [Why & How] Fix the checking condition for detecting AUX_RET_ERROR_PROTOCOL_ERROR. It was wrongly checking by "not equals to" Reviewed-by: Ray Wu <ray.wu@amd.com> Signed-off-by: Wayne Lin <Wayne.Lin@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit `1db6c9e9b6`) Cc: stable@vger.kernel.org	2025-05-07 18:17:42 -04:00

33639 Commits