linux

mirror of https://github.com/torvalds/linux.git synced 2026-04-18 14:53:58 -04:00

Author	SHA1	Message	Date
Kees Cook	189f164e57	Convert remaining multi-line kmalloc_obj/flex GFP_KERNEL uses Conversion performed via this Coccinelle script: // SPDX-License-Identifier: GPL-2.0-only // Options: --include-headers-for-types --all-includes --include-headers --keep-comments virtual patch @gfp depends on patch && !(file in "tools") && !(file in "samples")@ identifier ALLOC = {kmalloc_obj,kmalloc_objs,kmalloc_flex, kzalloc_obj,kzalloc_objs,kzalloc_flex, kvmalloc_obj,kvmalloc_objs,kvmalloc_flex, kvzalloc_obj,kvzalloc_objs,kvzalloc_flex}; @@ ALLOC(... - , GFP_KERNEL ) $ make coccicheck MODE=patch COCCI=gfp.cocci Build and boot tested x86_64 with Fedora 42's GCC and Clang: Linux version 6.19.0+ (user@host) (gcc (GCC) 15.2.1 20260123 (Red Hat 15.2.1-7), GNU ld version 2.44-12.fc42) #1 SMP PREEMPT_DYNAMIC 1970-01-01 Linux version 6.19.0+ (user@host) (clang version 20.1.8 (Fedora 20.1.8-4.fc42), LLD 20.1.8) #1 SMP PREEMPT_DYNAMIC 1970-01-01 Signed-off-by: Kees Cook <kees@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2026-02-22 08:26:33 -08:00
Linus Torvalds	bf4afc53b7	Convert 'alloc_obj' family to use the new default GFP_KERNEL argument This was done entirely with mindless brute force, using git grep -l '\<k[vmz]alloc_objs(., GFP_KERNEL)' \| xargs sed -i 's/$alloc_objs(.*$, GFP_KERNEL)/\1)/' to convert the new alloc_obj() users that had a simple GFP_KERNEL argument to just drop that argument. Note that due to the extreme simplicity of the scripting, any slightly more complex cases spread over multiple lines would not be triggered: they definitely exist, but this covers the vast bulk of the cases, and the resulting diff is also then easier to check automatically. For the same reason the 'flex' versions will be done as a separate conversion. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2026-02-21 17:09:51 -08:00
Kees Cook	69050f8d6d	treewide: Replace kmalloc with kmalloc_obj for non-scalar types This is the result of running the Coccinelle script from scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to avoid scalar types (which need careful case-by-case checking), and instead replace kmalloc-family calls that allocate struct or union object instances: Single allocations: kmalloc(sizeof(TYPE), ...) are replaced with: kmalloc_obj(TYPE, ...) Array allocations: kmalloc_array(COUNT, sizeof(TYPE), ...) are replaced with: kmalloc_objs(TYPE, COUNT, ...) Flex array allocations: kmalloc(struct_size(PTR, FAM, COUNT), ...) are replaced with: kmalloc_flex(PTR, FAM, COUNT, ...) (where TYPE may also be VAR) The resulting allocations no longer return "void ", instead returning "TYPE ". Signed-off-by: Kees Cook <kees@kernel.org>	2026-02-21 01:02:28 -08:00
Kent Russell	5028a24aa8	drm/amdgpu: Send applicable RMA CPERs at end of RAS init Firmware and monitoring tools may not be ready to receive a CPER when we read the bad pages, so send the CPERs at the end of RAS initialization to ensure that the FW is ready to receive and process the CPER. This removes the previous CPER submission that was added during bad page load, and sends both in-band and out-of-band at the same time. Signed-off-by: Kent Russell <kent.russell@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2026-02-05 17:28:34 -05:00
Zilin Guan	ee41e5b63c	drm/amdgpu: Fix memory leak in amdgpu_ras_init() When amdgpu_nbio_ras_sw_init() fails in amdgpu_ras_init(), the function returns directly without freeing the allocated con structure, leading to a memory leak. Fix this by jumping to the release_con label to properly clean up the allocated memory before returning the error code. Compile tested only. Issue found using a prototype static analysis tool and code review. Fixes: `fdc94d3a8c` ("drm/amdgpu: Rework pcie_bif ras sw_init") Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Zilin Guan <zilin@seu.edu.cn> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2026-02-05 17:19:09 -05:00
Gangliang Xie	0028b86b52	drm/amdgpu: mark invalid records with U64_MAX set retired_page of invalid ras records to U64_MAX, and skip them when reading ras records Signed-off-by: Gangliang Xie <ganglxie@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2026-01-21 14:25:43 -05:00
Gangliang Xie	f9f281e839	drm/amdgpu: only check critical address when it is not reserved when an address is reserved already, no need to check if it is in critical or not, to save time Signed-off-by: Gangliang Xie <ganglxie@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2026-01-05 17:00:00 -05:00
Mario Limonciello (AMD)	e291729873	drm/amd: Convert DRM_() to drm_() The drm_*() macros include the device which is helpful for debugging issues in multi-GPU systems. Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org> Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2026-01-05 16:59:55 -05:00
Jinzhou Su	0621f21cf3	drm/amdgpu: Add address checking for uniras Add address checking for uniras Signed-off-by: Jinzhou Su <jinzhou.su@amd.com> Reviewed-by: YiPeng Chai <YiPeng.Chai@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2026-01-05 16:28:01 -05:00
Asad Kamal	bd68a1404b	drm/amdgpu/ras: Move ras data alloc before bad page check In the rare event if eeprom has only invalid address entries, allocation is skipped, this causes following NULL pointer issue [ 547.103445] BUG: kernel NULL pointer dereference, address: 0000000000000010 [ 547.118897] #PF: supervisor read access in kernel mode [ 547.130292] #PF: error_code(0x0000) - not-present page [ 547.141689] PGD 124757067 P4D 0 [ 547.148842] Oops: 0000 [#1] PREEMPT SMP NOPTI [ 547.158504] CPU: 49 PID: 8167 Comm: cat Tainted: G OE 6.8.0-38-generic #38-Ubuntu [ 547.177998] Hardware name: Supermicro AS -8126GS-TNMR/H14DSG-OD, BIOS 1.7 09/12/2025 [ 547.195178] RIP: 0010:amdgpu_ras_sysfs_badpages_read+0x2f2/0x5d0 [amdgpu] [ 547.210375] Code: e8 63 78 82 c0 45 31 d2 45 3b 75 08 48 8b 45 a0 73 44 44 89 f1 48 8b 7d 88 48 89 ca 48 c1 e2 05 48 29 ca 49 8b 4d 00 48 01 d1 <48> 83 79 10 00 74 17 49 63 f2 48 8b 49 08 41 83 c2 01 48 8d 34 76 [ 547.252045] RSP: 0018:ffa0000067287ac0 EFLAGS: 00010246 [ 547.263636] RAX: ff11000167c28130 RBX: ff11000127600000 RCX: 0000000000000000 [ 547.279467] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ff11000125b1c800 [ 547.295298] RBP: ffa0000067287b50 R08: 0000000000000000 R09: 0000000000000000 [ 547.311129] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 [ 547.326959] R13: ff11000217b1de00 R14: 0000000000000000 R15: 0000000000000092 [ 547.342790] FS: 0000746e59d14740(0000) GS:ff11017dfda80000(0000) knlGS:0000000000000000 [ 547.360744] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 547.373489] CR2: 0000000000000010 CR3: 000000019585e001 CR4: 0000000000f71ef0 [ 547.389321] PKRU: 55555554 [ 547.395316] Call Trace: [ 547.400737] <TASK> [ 547.405386] ? show_regs+0x6d/0x80 [ 547.412929] ? __die+0x24/0x80 [ 547.419697] ? page_fault_oops+0x99/0x1b0 [ 547.428588] ? do_user_addr_fault+0x2ee/0x6b0 [ 547.438249] ? exc_page_fault+0x83/0x1b0 [ 547.446949] ? asm_exc_page_fault+0x27/0x30 [ 547.456225] ? amdgpu_ras_sysfs_badpages_read+0x2f2/0x5d0 [amdgpu] [ 547.470040] ? mas_wr_modify+0xcd/0x140 [ 547.478548] sysfs_kf_bin_read+0x63/0xb0 [ 547.487248] kernfs_file_read_iter+0xa1/0x190 [ 547.496909] kernfs_fop_read_iter+0x25/0x40 [ 547.506182] vfs_read+0x255/0x390 This also result in space left assigned to negative values. Moving data alloc call before bad page check resolves both the issue. Signed-off-by: Asad Kamal <asad.kamal@amd.com> Suggested-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-12-08 13:56:34 -05:00
Tao Zhou	f752e79d38	drm/amdgpu: fix the calculation of RAS bad page number __amdgpu_ras_restore_bad_pages is responsible for the maintenance of bad page number, drop the unnecessary bad page number update in the error handling path of add_bad_pages. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-12-08 13:56:33 -05:00
Lijo Lazar	c58b5d8269	drm/amdgpu: Unregister mce notifier Unregister mce notifier on unload. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-11-18 10:52:19 -05:00
Tao Zhou	e84835940e	drm/amdgpu: get RAS bad page address from MCA address Instead of from physical address. v2: add comment to make the code more readable Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-11-11 21:54:14 -05:00
Tao Zhou	c154a96b55	drm/amdgpu: load RAS bad page from PMFW in page retirement In legacy way, bad page is queried from MCA registers, switch to getting it from PMFW when PMFW manages eeprom data. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-11-11 21:53:26 -05:00
Tao Zhou	e1ca536e17	drm/amdgpu: support to load RAS bad pages from PMFW PMFW manages eeprom bad page records, update bad page loading accrodingly. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-11-06 10:01:14 -05:00
YiPeng Chai	d95ca7f515	drm/amdgpu: suspend ras module before gpu reset During gpu reset, all GPU-related resources are inaccessible. To avoid affecting ras functionality, suspend ras module before gpu reset and resume it after gpu reset is complete. V2: Rename functions to avoid misunderstanding. V3: Move flush_delayed_work to amdgpu_ras_process_pause, Move schedule_delayed_work to amdgpu_ras_process_unpause. V4: Rename functions. V5: Move the function to amdgpu_ras.c. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Acked-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-11-04 11:53:59 -05:00
Gangliang Xie	f6cdcbd2c0	drm/amdgpu: add function to check if pmfw eeprom is supported add function to check if pmfw is supported, skip eeprom check and recover when pmfw eeprom is supported Signed-off-by: Gangliang Xie <ganglxie@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-11-04 11:53:58 -05:00
YiPeng Chai	812b727364	drm/amdgpu: Fix error injection parameter error Fix error injection parameter error. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-11-04 11:52:46 -05:00
Sakari Ailus	ef4a4b8781	drm/amd: Remove redundant pm_runtime_mark_last_busy() calls pm_runtime_put_autosuspend(), pm_runtime_put_sync_autosuspend(), pm_runtime_autosuspend() and pm_request_autosuspend() now include a call to pm_runtime_mark_last_busy(). Remove the now-redundant explicit call to pm_runtime_mark_last_busy(). Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-28 11:31:45 -04:00
Jinzhou Su	214eb7e83d	drm/amdgpu: Add uniras version in sysfs Display uniras version in sysfs version interface when uniras enable. v2: display ras version detail info Signed-off-by: Jinzhou Su <jinzhou.su@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-28 09:47:49 -04:00
Jinzhou Su	edddaada9e	drm/amdgpu: clear bad page info of ras module Clear bad page info of ras module. Signed-off-by: Jinzhou Su <jinzhou.su@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-28 09:47:33 -04:00
YiPeng Chai	fe2ccc7b7b	drm/amdgpu: query block error count of ras module Query block error count of ras module. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-20 18:28:18 -04:00
YiPeng Chai	a6b5a7a033	drm/amdgpu: query bad page info of ras module Query bad page info of ras module. V2: Update code to reuse bad page output code. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-20 18:27:49 -04:00
YiPeng Chai	62902b88ff	drm/amdgpu: ras module supports error injection ras module supports error injection. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-20 18:27:45 -04:00
Tao Zhou	036f18d0a2	drm/amdgpu: check save count before RAS bad page saving It's possible that unit_num is larger than 0 but save_count is zero, since we do get bad page address but the address is invalid. Check unit_num and save_count together. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Candice Li <candice.li@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-20 18:25:49 -04:00
YiPeng Chai	43a90c0732	drm/amdgpu: Avoid hive seqno increment in legacy ras The hive->event_mgr variable is used by both ras module and legacy ras. To ensure the continuity of hive seqno growth, after enabling ras module, it is forbidden to operate the event_mgr variable in legacy ras. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-20 18:18:55 -04:00
YiPeng Chai	47ba675a19	drm/amdgpu: Avoid loading bad pages into legacy ras When ras module is enabled, the bad pages will be loaded by ras module. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-20 18:18:44 -04:00
YiPeng Chai	fe0f51d6d8	drm/amdgpu: add ras module rma check Add ras module rma check. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-20 18:18:41 -04:00
YiPeng Chai	408bd841ad	drm/amdgpu: Improve ras fatal error handling function In multi-gpu case, a fatal error will generate several fatal error interrupts. After improving this function, the ras module can reuse this function to only handle the first interrupt. V3: Initialize event_id using RAS_EVENT_INVALID_ID. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-20 18:18:35 -04:00
YiPeng Chai	3d72d2e5f4	drm/amdgpu: Intercept ras interrupts to ras module Intercept ras interrupts to ras module. V2: Change function names in ras module. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-10-20 18:18:26 -04:00
Xiang Liu	1ed511fb76	drm/amdgpu: Check VF critical region before RAS poison injection Check VF critical region before RAS poison injection to ensure that the poison injection will not hit the VF critical region. Signed-off-by: Xiang Liu <xiang.liu@amd.com> Reviewed-by: Shravan Kumar Gande <Shravankumar.Gande@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-09-18 09:43:10 -04:00
Stanley.Yang	e09b081d8a	drm/amdgpu: wait pmfw polling mca bank info done wait 500ms to ensure pmfw polling mca bank info done. Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-09-15 16:55:27 -04:00
Dave Airlie	14579a6f18	Merge tag 'amd-drm-next-6.18-2025-08-29' of https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-next-6.18-2025-08-29: amdgpu: - Replay fixes - RAS updates - VCN SRAM load fixes - EDID read fixes - eDP ALPM support - AUX fixes - Documenation updates - Rework how PTE flags are generated - DCE6 fixes - VCN devcoredump cleanup - MMHUB client id fixes - SR-IOV fixes - VRR fixes - VCN 5.0.1 RAS support - Backlight fixes - UserQ fixes - Misc code cleanups - SMU 13.0.12 updates - Expanded PCIe DPC support - Expanded VCN reset support - SMU 13.0.x Updates - VPE per queue reset support - Cusor rotation fix - DSC fixes - GC 12 MES TLB invalidation update - Cursor fixes - Non-DC TMDS clock validation fix amdkfd: - debugfs fixes - Misc code cleanups - Page migration fixes - Partition fixes - SVM fixes radeon: - Misc code cleanups Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://lore.kernel.org/r/20250829190848.1921648-1-alexander.deucher@amd.com	2025-09-02 09:35:54 +10:00
Ce Sun	d8442bcad0	drm/amdgpu: Correct the loss of aca bank reg info By polling, poll ACA bank count to ensure that valid ACA bank reg info can be obtained v2: add corresponding delay before send msg to SMU to query mca bank info (Stanley) v3: the loop cannot exit. (Thomas) v4: remove amdgpu_aca_clear_bank_count. (Kevin) v5: continuously inject ce. If a creation interruption occurs at this time, bank reg info will be lost. (Thomas) v5: each cycle is delayed by 100ms. (Tao) Signed-off-by: Ce Sun <cesun102@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-08-27 13:57:47 -04:00
Ce Sun	0989b764f4	drm/amdgpu: Add a mutex lock to protect poison injection When poison is triggered multiple times, competition will occur. Add a mutex lock to protect poison injection Signed-off-by: Ce Sun <cesun102@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-08-27 13:57:47 -04:00
Yunshui Jiang	130c7ed88f	drm/amdgpu: use kmalloc_array() instead of kmalloc() Use kmalloc_array() instead of kmalloc() with multiplication. kmalloc_array() is a safer way because of its multiply overflow check. Signed-off-by: Yunshui Jiang <jiangyunshui@kylinos.cn> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-08-04 14:30:31 -04:00
Ce Sun	21c0ffa612	drm/amdgpu: Avoid rma causes GPU duplicate reset Try to ensure poison creation handle is completed in time to set device rma value. Signed-off-by: Ce Sun <cesun102@amd.com> Signed-off-by: Stanley.Yang <Stanley.Yang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-08-04 14:27:47 -04:00
Linus Torvalds	260f6f4fda	Merge tag 'drm-next-2025-07-30' of https://gitlab.freedesktop.org/drm/kernel Pull drm updates from Dave Airlie: "Highlights: - Intel xe enable Panthor Lake, started adding WildCat Lake - amdgpu has a bunch of reset improvments along with the usual IP updates - msm got VM_BIND support which is important for vulkan sparse memory - more drm_panic users - gpusvm common code to handle a bunch of core SVM work outside drivers. Detail summary: Changes outside drm subdirectory: - 'shrink_shmem_memory()' for better shmem/hibernate interaction - Rust support infrastructure: - make ETIMEDOUT available - add size constants up to SZ_2G - add DMA coherent allocation bindings - mtd driver for Intel GPU non-volatile storage - i2c designware quirk for Intel xe core: - atomic helpers: tune enable/disable sequences - add task info to wedge API - refactor EDID quirks - connector: move HDR sink to drm_display_info - fourcc: half-float and 32-bit float formats - mode_config: pass format info to simplify dma-buf: - heaps: Give CMA heap a stable name ci: - add device tree validation and kunit displayport: - change AUX DPCD access probe address - add quirk for DPCD probe - add panel replay definitions - backlight control helpers fbdev: - make CONFIG_FIRMWARE_EDID available on all arches fence: - fix UAF issues format-helper: - improve tests gpusvm: - introduce devmem only flag for allocation - add timeslicing support to GPU SVM ttm: - improve eviction sched: - tracing improvements - kunit improvements - memory leak fixes - reset handling improvements color mgmt: - add hardware gamma LUT handling helpers bridge: - add destroy hook - switch to reference counted drm_bridge allocations - tc358767: convert to devm_drm_bridge_alloc - improve CEC handling panel: - switch to reference counter drm_panel allocations - fwnode panel lookup - Huiling hl055fhv028c support - Raspberry Pi 7" 720x1280 support - edp: KDC KD116N3730A05, N160JCE-ELL CMN, N116BCJ-EAK - simple: AUO P238HAN01 - st7701: Winstar wf40eswaa6mnn0 - visionox: rm69299-shift - Renesas R61307, Renesas R69328 support - DJN HX83112B hdmi: - add CEC handling - YUV420 output support xe: - WildCat Lake support - Enable PanthorLake by default - mark BMG as SRIOV capable - update firmware recommendations - Expose media OA units - aux-bux support for non-volatile memory - MTD intel-dg driver for non-volatile memory - Expose fan control and voltage regulator in sysfs - restructure migration for multi-device - Restore GuC submit UAF fix - make GEM shrinker drm managed - SRIOV VF Post-migration recovery of GGTT nodes - W/A additions/reworks - Prefetch support for svm ranges - Don't allocate managed BO for each policy change - HWMON fixes for BMG - Create LRC BO without VM - PCI ID updates - make SLPC debugfs files optional - rework eviction rejection of bound external BOs - consolidate PAT programming logic for pre/post Xe2 - init changes for flicker-free boot - Enable GuC Dynamic Inhibit Context switch i915: - drm_panic support for i915/xe - initial flip queue off by default for LNL/PNL - Wildcat Lake Display support - Support for DSC fractional link bpp - Support for simultaneous Panel Replay and Adaptive sync - Support for PTL+ double buffer LUT - initial PIPEDMC event handling - drm_panel_follower support - DPLL interface renames - allocate struct intel_display dynamically - flip queue preperation - abstract DRAM detection better - avoid GuC scheduling stalls - remove DG1 force probe requirement - fix MEI interrupt handler on RT kernels - use backlight control helpers for eDP - more shared display code refactoring amdgpu: - add userq slot to INFO ioctl - SR-IOV hibernation support - Suspend improvements - Backlight improvements - Use scaling for non-native eDP modes - cleaner shader updates for GC 9.x - Remove fence slab - SDMA fw checks for userq support - RAS updates - DMCUB updates - DP tunneling fixes - Display idle D3 support - Per queue reset improvements - initial smartmux support amdkfd: - enable KFD on loongarch - mtype fix for ext coherent system memory radeon: - CS validation additional GL extensions - drop console lock during suspend/resume - bump driver version msm: - VM BIND support - CI: infrastructure updates - UBWC single source of truth - decouple GPU and KMS support - DP: rework I/O accessors - DPU: SM8750 support - DSI: SM8750 support - GPU: X1-45 support and speedbin support for X1-85 - MDSS: SM8750 support nova: - register! macro improvements - DMA object abstraction - VBIOS parser + fwsec lookup - sysmem flush page support - falcon: generic falcon boot code and HAL - FWSEC-FRTS: fb setup and load/execute ivpu: - Add Wildcat Lake support - Add turbo flag ast: - improve hardware generations implementation imx: - IMX8qxq Display Controller support lima: - Rockchip RK3528 GPU support nouveau: - fence handling cleanup panfrost: - MT8370 support - bo labeling - 64-bit register access qaic: - add RAS support rockchip: - convert inno_hdmi to a bridge rz-du: - add RZ/V2H(P) support - MIPI-DSI DCS support sitronix: - ST7567 support sun4i: - add H616 support tidss: - add TI AM62L support - AM65x OLDI bridge support bochs: - drm panic support vkms: - YUV and R* format support - use faux device vmwgfx: - fence improvements hyperv: - move out of simple - add drm_panic support" * tag 'drm-next-2025-07-30' of https://gitlab.freedesktop.org/drm/kernel: (1479 commits) drm/tidss: oldi: convert to devm_drm_bridge_alloc() API drm/tidss: encoder: convert to devm_drm_bridge_alloc() drm/amdgpu: move reset support type checks into the caller drm/amdgpu/sdma7: re-emit unprocessed state on ring reset drm/amdgpu/sdma6: re-emit unprocessed state on ring reset drm/amdgpu/sdma5.2: re-emit unprocessed state on ring reset drm/amdgpu/sdma5: re-emit unprocessed state on ring reset drm/amdgpu/gfx12: re-emit unprocessed state on ring reset drm/amdgpu/gfx11: re-emit unprocessed state on ring reset drm/amdgpu/gfx10: re-emit unprocessed state on ring reset drm/amdgpu/gfx9.4.3: re-emit unprocessed state on kcq reset drm/amdgpu/gfx9: re-emit unprocessed state on kcq reset drm/amdgpu: Add WARN_ON to the resource clear function drm/amd/pm: Use cached metrics data on SMUv13.0.6 drm/amd/pm: Use cached data for min/max clocks gpu: nova-core: fix bounds check in PmuLookupTableEntry::new drm/amdgpu: Replace HQD terminology with slots naming drm/amdgpu: Add user queue instance count in HW IP info drm/amd/amdgpu: Add helper functions for isp buffers drm/amd/amdgpu: Initialize swnode for ISP MFD device ...	2025-07-30 19:26:49 -07:00
YiPeng Chai	3fc96f60b6	drm/amdgpu: add critical address check for bad page retirement Add critical address check for bad page retirement. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-07-28 16:40:07 -04:00
YiPeng Chai	f348691897	drm/amdgpu: support ras critical address check Support ras critical address check. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-07-28 16:40:06 -04:00
Tao Zhou	d45c5e6845	drm/amdgpu: adjust the update of RAS bad page number One eeprom record may not map to unit number of bad pages, the accurate bad page number is gotten after bad page address check. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-07-28 16:40:06 -04:00
Tao Zhou	2b17c240e8	drm/amdgpu: add range check for RAS bad page address Exclude invalid bad pages. Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-07-28 16:40:06 -04:00
YiPeng Chai	a813437c33	drm/amdgpu: add command to check address validity Add command to check address validity and remove unused command codes. v2: The command interface adds new parameters to support multiple check address strategies. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-07-28 16:40:06 -04:00
ganglxie	48ee3d8e5e	drm/amdgpu: refine bad page loading when in the same nps mode when loading bad page in the same nps mode, need to set the other fields fields in eeprom records manually besides retired_page Signed-off-by: ganglxie <ganglxie@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-07-15 14:07:53 -04:00
YiPeng Chai	a6d6a86e94	drm/amdgpu: Remove useless timeout error message The timeout is only used to interrupt polling and not need to print a error message. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-06-30 12:08:00 -04:00
ganglxie	cfce8f4fa7	drm/amdgpu: refine ras error injection when eeprom initialization failed when eeprom initialization failed, we still support ras error injection, and reserve bad pages, but do not save bad pages to eeprom Signed-off-by: ganglxie <ganglxie@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-06-30 12:08:00 -04:00
Xiang Liu	f43411978d	drm/amdgpu: Add debug mask to disable CE logs Add debug mask to disable kernel logs of RAS correctable errors, including both ACA and CE error counter kernel messages. Signed-off-by: Xiang Liu <xiang.liu@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-06-18 12:19:18 -04:00
Lijo Lazar	3bdf8dd84e	drm/amdgpu: Clear reset flags from ras context Once RAS errors are cleared with appropriate recovery mechanism, clear reset flags also from RAS context. Otherwise, stale flag values could affect the subsequent RAS reset handling on the device. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-06-18 12:19:18 -04:00
Thomas Weißschuh	fb506e31b3	sysfs: treewide: switch back to attribute_group::bin_attrs The normal bin_attrs field can now handle const pointers. This makes the _new variant unnecessary. Switch all users back. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Link: https://lore.kernel.org/r/20250530-sysfs-const-bin_attr-final-v3-4-724bfcf05b99@weissschuh.net Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2025-06-17 10:44:15 +02:00
ganglxie	fce0afca35	drm/amdgpu: Get mca address for old eeprom records after getting mca address for old eeprom records with 'address==0', it can be correctly parsed under none-nps1, or it will be dropped. Signed-off-by: ganglxie <ganglxie@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>	2025-05-29 10:57:09 -04:00

1 2 3 4 5 ...

547 Commits