linux

mirror of https://github.com/torvalds/linux.git synced 2026-04-27 02:52:27 -04:00

Author	SHA1	Message	Date
Linus Torvalds	a859eca0e4	Merge tag 'drm-fixes-2025-12-13' of https://gitlab.freedesktop.org/drm/kernel Pull more drm fixes from Dave Airlie: "These are the enqueued fixes that ended up in our fixes branch, nouveau mostly, along with some small fixes in other places. plane: - Handle IS_ERR vs NULL in drm_plane_create_hotspot_properties() ttm: - fix devcoredump for evicted bos panel: - Fix stack usage warning in novatek-nt35560 nouveau: - alloc fwsec sb at boot to avoid s/r problems - fix strcpy usage - fix i2c encoder crash bridge: - Ignore spurious PLL_UNLOCK bit in ti-sn65dsi83 mgag200: - Fix bigendian handling in mgag200 tilcdc: - Fix probe failure in tilcdc" * tag 'drm-fixes-2025-12-13' of https://gitlab.freedesktop.org/drm/kernel: drm/mgag200: Fix big-endian support drm/tilcdc: Fix removal actions in case of failed probe drm/ttm: Avoid NULL pointer deref for evicted BOs drm: nouveau: Replace sprintf() with sysfs_emit() drm/nouveau: fix circular dep oops from vendored i2c encoder drm/nouveau: refactor deprecated strcpy drm/plane: Fix IS_ERR() vs NULL check in drm_plane_create_hotspot_properties() drm/bridge: ti-sn65dsi83: ignore PLL_UNLOCK errors drm/nouveau/gsp: Allocate fwsec-sb at boot drm/panel: novatek-nt35560: avoid on-stack device structure	2025-12-13 17:39:28 +12:00
Lyude Paul	da67179e55	drm/nouveau/gsp: Allocate fwsec-sb at boot At the moment - the memory allocation for fwsec-sb is created as-needed and is released after being used. Typically this is at some point well after driver load, which can cause runtime suspend/resume to initially work on driver load but then later fail on a machine that has been running for long enough with sufficiently high enough memory pressure: kworker/7:1: page allocation failure: order:5, mode:0xcc0(GFP_KERNEL), nodemask=(null),cpuset=/,mems_allowed=0 CPU: 7 UID: 0 PID: 875159 Comm: kworker/7:1 Not tainted 6.17.8-300.fc43.x86_64 #1 PREEMPT(lazy) Hardware name: SLIMBOOK Executive/Executive, BIOS N.1.10GRU06 02/02/2024 Workqueue: pm pm_runtime_work Call Trace: <TASK> dump_stack_lvl+0x5d/0x80 warn_alloc+0x163/0x190 ? __alloc_pages_direct_compact+0x1b3/0x220 __alloc_pages_slowpath.constprop.0+0x57a/0xb10 __alloc_frozen_pages_noprof+0x334/0x350 __alloc_pages_noprof+0xe/0x20 __dma_direct_alloc_pages.isra.0+0x1eb/0x330 dma_direct_alloc_pages+0x3c/0x190 dma_alloc_pages+0x29/0x130 nvkm_firmware_ctor+0x1ae/0x280 [nouveau] nvkm_falcon_fw_ctor+0x3e/0x60 [nouveau] nvkm_gsp_fwsec+0x10e/0x2c0 [nouveau] ? sysvec_apic_timer_interrupt+0xe/0x90 nvkm_gsp_fwsec_sb+0x27/0x70 [nouveau] tu102_gsp_fini+0x65/0x110 [nouveau] ? ktime_get+0x3c/0xf0 nvkm_subdev_fini+0x67/0xc0 [nouveau] nvkm_device_fini+0x94/0x140 [nouveau] nvkm_udevice_fini+0x50/0x70 [nouveau] nvkm_object_fini+0xb1/0x140 [nouveau] nvkm_object_fini+0x70/0x140 [nouveau] ? __pfx_pci_pm_runtime_suspend+0x10/0x10 nouveau_do_suspend+0xe4/0x170 [nouveau] nouveau_pmops_runtime_suspend+0x3e/0xb0 [nouveau] pci_pm_runtime_suspend+0x67/0x1a0 ? __pfx_pci_pm_runtime_suspend+0x10/0x10 __rpm_callback+0x45/0x1f0 ? __pfx_pci_pm_runtime_suspend+0x10/0x10 rpm_callback+0x6d/0x80 rpm_suspend+0xe5/0x5e0 ? finish_task_switch.isra.0+0x99/0x2c0 pm_runtime_work+0x98/0xb0 process_one_work+0x18f/0x350 worker_thread+0x25a/0x3a0 ? __pfx_worker_thread+0x10/0x10 kthread+0xf9/0x240 ? __pfx_kthread+0x10/0x10 ? __pfx_kthread+0x10/0x10 ret_from_fork+0xf1/0x110 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1a/0x30 </TASK> The reason this happens is because the fwsec-sb firmware image only supports being booted from a contiguous coherent sysmem allocation. If a system runs into enough memory fragmentation from memory pressure, such as what can happen on systems with low amounts of memory, this can lead to a situation where it later becomes impossible to find space for a large enough contiguous allocation to hold fwsec-sb. This causes us to fail to boot the firmware image, causing the GPU to fail booting and causing the driver to fail. Since this firmware can't use non-contiguous allocations, the best solution to avoid this issue is to simply allocate the memory for fwsec-sb during initial driver-load, and reuse the memory allocation when fwsec-sb needs to be used. We then release the memory allocations on driver unload. Signed-off-by: Lyude Paul <lyude@redhat.com> Fixes: `594766ca3e` ("drm/nouveau/gsp: move booter handling to GPU-specific code") Cc: <stable@vger.kernel.org> # v6.16+ Reviewed-by: Timur Tabi <ttabi@nvidia.com> Link: https://patch.msgid.link/20251202175918.63533-1-lyude@redhat.com	2025-12-04 20:35:18 -05:00
Timur Tabi	31d3354f42	drm/nouveau: verify that hardware supports the flush page address Ensure that the DMA address of the framebuffer flush page is not larger than its hardware register. On GPUs older than Hopper, the register for the address can hold up to a 40-bit address (right-shifted by 8 so that it fits in the 32-bit register), and on Hopper and later it can be 52 bits (64-bit register where bits 52-63 must be zero). Recently it was discovered that under certain conditions, the flush page could be allocated outside this range. Although this bug was fixed, we can ensure that any future changes to this code don't accidentally generate an invalid page address. Signed-off-by: Timur Tabi <ttabi@nvidia.com> Reviewed-by: Lyude Paul <lyude@redhat.com> Signed-off-by: Lyude Paul <lyude@redhat.com> Link: https://patch.msgid.link/20251113230323.1271726-2-ttabi@nvidia.com	2025-11-24 17:53:22 -05:00
Timur Tabi	04d98b3452	drm/nouveau: restrict the flush page to a 32-bit address The flush page DMA address is stored in a special register that is not associated with the GPU's standard DMA range. For example, on Turing, the GPU's MMU can handle 47-bit addresses, but the flush page address register is limited to 40 bits. At the point during device initialization when the flush page is allocated, the DMA mask is still at its default of 32 bits. So even though it's unlikely that the flush page could exist above a 40-bit address, the dma_map_page() call could fail, e.g. if IOMMU is disabled and the address is above 32 bits. The simplest way to achieve all constraints is to allocate the page in the DMA32 zone. Since the flush page is literally just a page, this is an acceptable limitation. The alternative is to temporarily set the DMA mask to 40 (or 52 for Hopper and later) bits, but that could have unforseen side effects. In situations where the flush page is allocated above 32 bits and IOMMU is disabled, you will get an error like this: nouveau 0000:65:00.0: DMA addr 0x0000000107c56000+4096 overflow (mask ffffffff, bus limit 0). Fixes: `5728d06419` ("drm/nouveau/fb: handle sysmem flush page from common code") Signed-off-by: Timur Tabi <ttabi@nvidia.com> Reviewed-by: Lyude Paul <lyude@redhat.com> Signed-off-by: Lyude Paul <lyude@redhat.com> Link: https://patch.msgid.link/20251113230323.1271726-1-ttabi@nvidia.com	2025-11-24 17:53:22 -05:00
Ben Skeggs	0ee6a72bb0	drm/nouveau/mmu/tu102: Add support for compressed kinds Allow compressed PTE kinds to be written into PTEs when GSP-RM is present, rather than reverting to their non-compressed versions. Signed-off-by: Ben Skeggs <bskeggs@nvidia.com> Signed-off-by: Mohamed Ahmed <mohamedahmedegypt2001@gmail.com> Signed-off-by: Mary Guillemard <mary@mary.zone> Reviewed-by: Lyude Paul <lyude@redhat.com> Reviewed-by: James Jones <jajones@nvidia.com> Signed-off-by: Lyude Paul <lyude@redhat.com> Link: https://patch.msgid.link/20251110-nouveau-compv6-v6-4-83b05475f57c@mary.zone	2025-11-12 12:23:40 -05:00
Ben Skeggs	a79d3845f9	drm/nouveau/mmu/gp100: Remove unused/broken support for compression From GP100 onwards it's not possible to initialise comptag RAM without PMU firmware, which nouveau has no support for. As such, this code is essentially a no-op and will always revert to the equivalent non-compressed kind due to comptag allocation failure. It's also broken for the needs of VM_BIND/Vulkan. Remove the code entirely to make way for supporting compression on GPUs that support GSM-RM. Signed-off-by: Ben Skeggs <bskeggs@nvidia.com> Signed-off-by: Mohamed Ahmed <mohamedahmedegypt2001@gmail.com> Signed-off-by: Mary Guillemard <mary@mary.zone> Reviewed-by: Lyude Paul <lyude@redhat.com> Reviewed-by: James Jones <jajones@nvidia.com> Signed-off-by: Lyude Paul <lyude@redhat.com> Link: https://patch.msgid.link/20251110-nouveau-compv6-v6-3-83b05475f57c@mary.zone	2025-11-12 12:23:40 -05:00
Aaron Kling	6ca1701cec	drm/nouveau: Support devfreq for Tegra Using pmu counters for usage stats. This enables dynamic frequency scaling on all of the currently supported Tegra gpus. The register offsets are valid for gk20a, gm20b, gp10b, and gv11b. If support is added for ga10b, this will need rearchitected. Signed-off-by: Aaron Kling <webgeek1234@gmail.com> Reviewed-by: Lyude Paul <lyude@redhat.com> [fixed tab alignment in gk20a_devfreq_target()] Signed-off-by: Lyude Paul <lyude@redhat.com> Link: https://lore.kernel.org/r/20250906-gk20a-devfreq-v2-1-0217f53ee355@gmail.com	2025-09-15 14:18:08 -04:00
Aaron Kling	d5603737e7	drm/nouveau: Support reclocking on gp10b Starting with Tegra186, gpu clock handling is done by the bpmp and there is little to be done by the kernel. The only thing necessary for reclocking is to set the gpcclk to the desired rate and the bpmp handles the rest. The pstate list is based on the downstream driver generates. Signed-off-by: Aaron Kling <webgeek1234@gmail.com> Reviewed-by: Lyude Paul <lyude@redhat.com> [added newline before gp10b_clk macro declaration for checkpatch error] Signed-off-by: Lyude Paul <lyude@redhat.com> Link: https://lore.kernel.org/r/20250823-gp10b-reclock-v2-1-90a1974a54e3@gmail.com	2025-09-15 14:15:55 -04:00
Dave Airlie	0d9f0083f7	Merge tag 'v6.17-rc6' into drm-next This is a backmerge of Linux 6.17-rc6, needed for msm, also requested by misc. Signed-off-by: Dave Airlie <airlied@redhat.com>	2025-09-15 17:51:07 +10:00
Dave Airlie	0ef5c4e4db	nouveau: fix disabling the nonstall irq due to storm code Nouveau has code that when it gets an IRQ with no allowed handler it disables it to avoid storms. However with nonstall interrupts, we often disable them from the drm driver, but still request their emission via the push submission. Just don't disable nonstall irqs ever in normal operation, the event handling code will filter them out, and the driver will just enable/disable them at load time. This fixes timeouts we've been seeing on/off for a long time, but they became a lot more noticeable on Blackwell. This doesn't fix all of them, there is a subsequent fence emission fix to fix the last few. Fixes: `3ebd64aa3c` ("drm/nouveau/intr: support multiple trees, and explicit interfaces") Cc: stable@vger.kernel.org Signed-off-by: Dave Airlie <airlied@redhat.com> Link: https://lore.kernel.org/r/20250829021633.1674524-1-airlied@gmail.com [ Fix a typo and a minor checkpatch.pl warning; remove "v2" from commit subject. - Danilo ] Signed-off-by: Danilo Krummrich <dakr@kernel.org>	2025-08-29 18:36:23 +02:00
Timur Tabi	66e82b6e0a	drm/nouveau: fix error path in nvkm_gsp_fwsec_v2 Function nvkm_gsp_fwsec_v2() sets 'ret' if the kmemdup() call fails, but it never uses or returns 'ret' after that point. We always need to release the firmware regardless, so do that and then check for error. Fixes: `176fdcbddf` ("drm/nouveau/gsp/r535: add support for booting GSP-RM") Cc: stable@vger.kernel.org # v6.7+ Signed-off-by: Timur Tabi <ttabi@nvidia.com> Link: https://lore.kernel.org/r/20250813001004.2986092-1-ttabi@nvidia.com Signed-off-by: Danilo Krummrich <dakr@kernel.org>	2025-08-25 18:09:29 +02:00
Qianfeng Rong	989fe67712	drm/nouveau/gsp: fix mismatched alloc/free for kvmalloc() Replace kfree() with kvfree() for memory allocated by kvmalloc(). Compile-tested only. Cc: stable@vger.kernel.org Fixes: `8a8b1ec526` ("drm/nouveau/gsp: split rpc handling out on its own") Signed-off-by: Qianfeng Rong <rongqianfeng@vivo.com> Reviewed-by: Timur Tabi <ttabi@nvidia.com> Acked-by: Zhi Wang <zhiw@nvidia.com> Link: https://lore.kernel.org/r/20250813125412.96178-1-rongqianfeng@vivo.com Signed-off-by: Danilo Krummrich <dakr@kernel.org>	2025-08-15 20:39:48 +02:00
Mel Henning	a3daf184bd	drm/nouveau: Improve message for missing firmware This is inteded to address concerns that users might get cryptic error messages or a failure to boot if they set nouveau.config=NvGspRm=0 on the kernel command line and their gpu requires gsp (Ada or newer). With this patch, that configuration results in error messages like this: nouveau 0000:01:00.0: gsp: Failed to load required firmware for device. nouveau 0000:01:00.0: gsp ctor failed: -22 nouveau 0000:01:00.0: probe with driver nouveau failed with error -22 When nouveau fails to load like this, we still fall back to the generic framebuffer device, so users will still have limited graphical output. Signed-off-by: Mel Henning <mhenning@darkrefraction.com> Signed-off-by: Lyude Paul <lyude@redhat.com> Link: https://lore.kernel.org/r/20250811213843.4294-4-mhenning@darkrefraction.com	2025-08-12 17:36:54 -04:00
Mel Henning	2e308a935f	drm/nouveau: Remove nvkm_gsp_fwif.enable This struct element is no longer used. Signed-off-by: Mel Henning <mhenning@darkrefraction.com> Reviewed-by: Ben Skeggs <bskeggs@nvidia.com> Signed-off-by: Lyude Paul <lyude@redhat.com> Link: https://lore.kernel.org/r/20250811213843.4294-3-mhenning@darkrefraction.com	2025-08-12 17:36:51 -04:00
Mel Henning	e0ed674acb	drm/nouveau: Remove DRM_NOUVEAU_GSP_DEFAULT config This option was originally intoduced because the GSP code path was not well tested and we wanted to leave it up to distros which code path they shipped by default. By now though, the GSP path is probably better tested than the old firmware eg. Fedora ships GSP by default and we generally run CTS on GSP. We've always been GSP-only on Ada and later. So, this path removes the option and effectively sets the option to always on. We still fall back to the old firmware if GSP is not found. This change only affects Turing and Ampere. Users can still set nouveau.config=NvGspRm=0 on the kernel command line to force using the old firmware on Turing/Ampere. Signed-off-by: Mel Henning <mhenning@darkrefraction.com> Reviewed-by: Ben Skeggs <bskeggs@nvidia.com> Signed-off-by: Lyude Paul <lyude@redhat.com> Link: https://lore.kernel.org/r/20250811213843.4294-2-mhenning@darkrefraction.com	2025-08-12 17:36:49 -04:00
Timur Tabi	27738c3003	drm/nouveau: always set RMDevidCheckIgnore for GSP-RM Always set the RMDevidCheckIgnore registry key for GSP-RM so that it will continue support newer variants of already supported GPUs. GSP-RM maintains an internal list of PCI IDs of GPUs that it supports, and checks if the current GPU is on this list. While the actual GPU architecture (as specified in the BOOT_0/BOOT_42 registers) determines how to enable the GPU, the PCI ID is used for the product name, e.g. "NVIDIA GeForce RTX 5090". Unfortunately, if there is no match, GSP-RM will refuse to initialize, even if the device is fully supported. Nouveau will get an error return code, but by then it's too late. This behavior may be corrected in a future version of GSP-RM, but that does not help Nouveau today. Fortunately, GSP-RM supports an undocumented registry key that tells it to ignore the mismatch. In such cases, the product name returned will be a blank string, but otherwise GSP-RM will continue. Unlike Nvidia's proprietary driver, Nouveau cannot update to newer firmware versions to keep up with every new hardware release. Instead, we can permanently set this registry key, and GSP-RM will continue to function the same with known hardware. Signed-off-by: Timur Tabi <ttabi@nvidia.com> Link: https://lore.kernel.org/r/20250808191340.1701983-1-ttabi@nvidia.com Signed-off-by: Danilo Krummrich <dakr@kernel.org>	2025-08-12 00:32:54 +02:00
Linus Torvalds	260f6f4fda	Merge tag 'drm-next-2025-07-30' of https://gitlab.freedesktop.org/drm/kernel Pull drm updates from Dave Airlie: "Highlights: - Intel xe enable Panthor Lake, started adding WildCat Lake - amdgpu has a bunch of reset improvments along with the usual IP updates - msm got VM_BIND support which is important for vulkan sparse memory - more drm_panic users - gpusvm common code to handle a bunch of core SVM work outside drivers. Detail summary: Changes outside drm subdirectory: - 'shrink_shmem_memory()' for better shmem/hibernate interaction - Rust support infrastructure: - make ETIMEDOUT available - add size constants up to SZ_2G - add DMA coherent allocation bindings - mtd driver for Intel GPU non-volatile storage - i2c designware quirk for Intel xe core: - atomic helpers: tune enable/disable sequences - add task info to wedge API - refactor EDID quirks - connector: move HDR sink to drm_display_info - fourcc: half-float and 32-bit float formats - mode_config: pass format info to simplify dma-buf: - heaps: Give CMA heap a stable name ci: - add device tree validation and kunit displayport: - change AUX DPCD access probe address - add quirk for DPCD probe - add panel replay definitions - backlight control helpers fbdev: - make CONFIG_FIRMWARE_EDID available on all arches fence: - fix UAF issues format-helper: - improve tests gpusvm: - introduce devmem only flag for allocation - add timeslicing support to GPU SVM ttm: - improve eviction sched: - tracing improvements - kunit improvements - memory leak fixes - reset handling improvements color mgmt: - add hardware gamma LUT handling helpers bridge: - add destroy hook - switch to reference counted drm_bridge allocations - tc358767: convert to devm_drm_bridge_alloc - improve CEC handling panel: - switch to reference counter drm_panel allocations - fwnode panel lookup - Huiling hl055fhv028c support - Raspberry Pi 7" 720x1280 support - edp: KDC KD116N3730A05, N160JCE-ELL CMN, N116BCJ-EAK - simple: AUO P238HAN01 - st7701: Winstar wf40eswaa6mnn0 - visionox: rm69299-shift - Renesas R61307, Renesas R69328 support - DJN HX83112B hdmi: - add CEC handling - YUV420 output support xe: - WildCat Lake support - Enable PanthorLake by default - mark BMG as SRIOV capable - update firmware recommendations - Expose media OA units - aux-bux support for non-volatile memory - MTD intel-dg driver for non-volatile memory - Expose fan control and voltage regulator in sysfs - restructure migration for multi-device - Restore GuC submit UAF fix - make GEM shrinker drm managed - SRIOV VF Post-migration recovery of GGTT nodes - W/A additions/reworks - Prefetch support for svm ranges - Don't allocate managed BO for each policy change - HWMON fixes for BMG - Create LRC BO without VM - PCI ID updates - make SLPC debugfs files optional - rework eviction rejection of bound external BOs - consolidate PAT programming logic for pre/post Xe2 - init changes for flicker-free boot - Enable GuC Dynamic Inhibit Context switch i915: - drm_panic support for i915/xe - initial flip queue off by default for LNL/PNL - Wildcat Lake Display support - Support for DSC fractional link bpp - Support for simultaneous Panel Replay and Adaptive sync - Support for PTL+ double buffer LUT - initial PIPEDMC event handling - drm_panel_follower support - DPLL interface renames - allocate struct intel_display dynamically - flip queue preperation - abstract DRAM detection better - avoid GuC scheduling stalls - remove DG1 force probe requirement - fix MEI interrupt handler on RT kernels - use backlight control helpers for eDP - more shared display code refactoring amdgpu: - add userq slot to INFO ioctl - SR-IOV hibernation support - Suspend improvements - Backlight improvements - Use scaling for non-native eDP modes - cleaner shader updates for GC 9.x - Remove fence slab - SDMA fw checks for userq support - RAS updates - DMCUB updates - DP tunneling fixes - Display idle D3 support - Per queue reset improvements - initial smartmux support amdkfd: - enable KFD on loongarch - mtype fix for ext coherent system memory radeon: - CS validation additional GL extensions - drop console lock during suspend/resume - bump driver version msm: - VM BIND support - CI: infrastructure updates - UBWC single source of truth - decouple GPU and KMS support - DP: rework I/O accessors - DPU: SM8750 support - DSI: SM8750 support - GPU: X1-45 support and speedbin support for X1-85 - MDSS: SM8750 support nova: - register! macro improvements - DMA object abstraction - VBIOS parser + fwsec lookup - sysmem flush page support - falcon: generic falcon boot code and HAL - FWSEC-FRTS: fb setup and load/execute ivpu: - Add Wildcat Lake support - Add turbo flag ast: - improve hardware generations implementation imx: - IMX8qxq Display Controller support lima: - Rockchip RK3528 GPU support nouveau: - fence handling cleanup panfrost: - MT8370 support - bo labeling - 64-bit register access qaic: - add RAS support rockchip: - convert inno_hdmi to a bridge rz-du: - add RZ/V2H(P) support - MIPI-DSI DCS support sitronix: - ST7567 support sun4i: - add H616 support tidss: - add TI AM62L support - AM65x OLDI bridge support bochs: - drm panic support vkms: - YUV and R* format support - use faux device vmwgfx: - fence improvements hyperv: - move out of simple - add drm_panic support" * tag 'drm-next-2025-07-30' of https://gitlab.freedesktop.org/drm/kernel: (1479 commits) drm/tidss: oldi: convert to devm_drm_bridge_alloc() API drm/tidss: encoder: convert to devm_drm_bridge_alloc() drm/amdgpu: move reset support type checks into the caller drm/amdgpu/sdma7: re-emit unprocessed state on ring reset drm/amdgpu/sdma6: re-emit unprocessed state on ring reset drm/amdgpu/sdma5.2: re-emit unprocessed state on ring reset drm/amdgpu/sdma5: re-emit unprocessed state on ring reset drm/amdgpu/gfx12: re-emit unprocessed state on ring reset drm/amdgpu/gfx11: re-emit unprocessed state on ring reset drm/amdgpu/gfx10: re-emit unprocessed state on ring reset drm/amdgpu/gfx9.4.3: re-emit unprocessed state on kcq reset drm/amdgpu/gfx9: re-emit unprocessed state on kcq reset drm/amdgpu: Add WARN_ON to the resource clear function drm/amd/pm: Use cached metrics data on SMUv13.0.6 drm/amd/pm: Use cached data for min/max clocks gpu: nova-core: fix bounds check in PmuLookupTableEntry::new drm/amdgpu: Replace HQD terminology with slots naming drm/amdgpu: Add user queue instance count in HW IP info drm/amd/amdgpu: Add helper functions for isp buffers drm/amd/amdgpu: Initialize swnode for ISP MFD device ...	2025-07-30 19:26:49 -07:00
Ben Skeggs	d133036a0b	drm/nouveau/gsp: fix potential leak of memory used during acpi init If any of the ACPI calls fail, memory allocated for the input buffer would be leaked. Fix failure paths to free allocated memory. Also add checks to ensure the allocations succeeded in the first place. Reported-by: Danilo Krummrich <dakr@kernel.org> Fixes: `176fdcbddf` ("drm/nouveau/gsp/r535: add support for booting GSP-RM") Signed-off-by: Ben Skeggs <bskeggs@nvidia.com> Signed-off-by: Danilo Krummrich <dakr@kernel.org> Link: https://lore.kernel.org/r/20250617040036.2932-1-bskeggs@nvidia.com	2025-07-07 16:32:44 +02:00
Dave Airlie	e79d0ba605	nouveau/gsp: add a 50ms delay between fbsr and driver unload rpcs This fixes a bunch of command hangs after runtime suspend/resume. This fixes a regression caused by code movement in the commit below, the commit seems to just change timings enough to cause this to happen now, and adding the sleep seems to avoid it. I've spent some time trying to root cause it to no great avail, it seems like a bug on the firmware side, but it could be a bug in our rpc handling that I can't find. Either way, we should land the workaround to fix the problem, while we continue to work out the root cause. Signed-off-by: Dave Airlie <airlied@redhat.com> Cc: Ben Skeggs <bskeggs@nvidia.com> Cc: Danilo Krummrich <dakr@kernel.org> Fixes: `c21b039715` ("drm/nouveau/gsp: add hals for fbsr.suspend/resume()") Signed-off-by: Danilo Krummrich <dakr@kernel.org> Link: https://lore.kernel.org/r/20250702232707.175679-1-airlied@gmail.com	2025-07-04 00:22:12 +02:00
Zhi Wang	9802f0a63b	drm/nouveau: fix a use-after-free in r535_gsp_rpc_push() The RPC container is released after being passed to r535_gsp_rpc_send(). When sending the initial fragment of a large RPC and passing the caller's RPC container, the container will be freed prematurely. Subsequent attempts to send remaining fragments will therefore result in a use-after-free. Allocate a temporary RPC container for holding the initial fragment of a large RPC when sending. Free the caller's container when all fragments are successfully sent. Fixes: `176fdcbddf` ("drm/nouveau/gsp/r535: add support for booting GSP-RM") Signed-off-by: Zhi Wang <zhiw@nvidia.com> Link: https://lore.kernel.org/r/20250527163712.3444-1-zhiw@nvidia.com [ Rebase onto Blackwell changes. - Danilo ] Signed-off-by: Danilo Krummrich <dakr@kernel.org>	2025-06-13 16:38:06 +02:00
Colin Ian King	80626ae6ff	drm/nouveau/gsp: Fix potential integer overflow on integer shifts The left shift int 32 bit integer constants 1 is evaluated using 32 bit arithmetic and then assigned to a 64 bit unsigned integer. In the case where the shift is 32 or more this can lead to an overflow. Avoid this by shifting using the BIT_ULL macro instead. Fixes: `6c3ac7bcfc` ("drm/nouveau/gsp: support deeper page tables in COPY_SERVER_RESERVED_PDES") Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Signed-off-by: Danilo Krummrich <dakr@kernel.org> Link: https://lore.kernel.org/r/20250522131512.2768310-1-colin.i.king@gmail.com	2025-06-13 16:25:37 +02:00
Thomas Zimmermann	c598d5eb9f	Merge drm/drm-next into drm-misc-next Backmerging to forward to v6.16-rc1 Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>	2025-06-11 09:01:34 +02:00
Chen Ni	04c8970771	drm/nouveau/vfn/r535: Convert comma to semicolon Replace comma between expressions with semicolons. Using a ',' in place of a ';' can have unintended side effects. Although that is not the case here, it is seems best to use ';' unless ',' is intended. Found by inspection. No functional change intended. Compile tested only. Signed-off-by: Chen Ni <nichen@iscas.ac.cn> Fixes: `cd3c62282b` ("drm/nouveau/gsp: add usermode class id to gpu hal") Signed-off-by: Dave Airlie <airlied@redhat.com> Link: https://lore.kernel.org/r/20250603061027.1310267-1-nichen@iscas.ac.cn	2025-06-06 14:00:06 +10:00
Maxime Ripard	7b1166dee8	Merge drm-next-2025-05-28 into drm-misc-next Christian needs a recent drm-next branch to merge fence patches. Signed-off-by: Maxime Ripard <mripard@kernel.org>	2025-06-03 15:07:39 +02:00
Ben Skeggs	284ad706ad	drm/nouveau: add support for GB20x This commit adds support for the GB20x GPUs found on GeForce RTX 50xx series boards. Beyond a few miscellaneous register moves and HW class ID plumbing, this reuses most of the code added to support GH100/GB10x. Signed-off-by: Ben Skeggs <bskeggs@nvidia.com> Reviewed-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Timur Tabi <ttabi@nvidia.com> Tested-by: Timur Tabi <ttabi@nvidia.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2025-05-19 07:14:45 +10:00
Ben Skeggs	56c36f590a	drm/nouveau/gsp: add hal for fifo.chan.doorbell_handle The doorbell register on GB20x GPUs has additional fields. Signed-off-by: Ben Skeggs <bskeggs@nvidia.com> Reviewed-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Timur Tabi <ttabi@nvidia.com> Tested-by: Timur Tabi <ttabi@nvidia.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2025-05-19 07:14:44 +10:00
Ben Skeggs	32cb1cc358	drm/nouveau: add support for GB10x This commit enables basic support for the GB100/GB102 Blackwell GPUs. Beyond HW class ID plumbing there's very little change here vs GH100. Signed-off-by: Ben Skeggs <bskeggs@nvidia.com> Reviewed-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Timur Tabi <ttabi@nvidia.com> Tested-by: Timur Tabi <ttabi@nvidia.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2025-05-19 07:14:44 +10:00
Ben Skeggs	44f93b209e	drm/nouveau: add support for GH100 This commit enables basic support for Hopper GPUs, and is intended primarily as a base supporting Blackwell GPUs, which reuse most of the code added here. Advanced features such as Confidential Compute are not supported. Beyond a few miscellaneous register moves and HW class ID plumbing, the bulk of the changes implemented here are to support the GSP-RM boot sequence used on Hopper/Blackwell GPUs, as well as a new page table layout. There should be no changes here that impact prior GPUs. Signed-off-by: Ben Skeggs <bskeggs@nvidia.com> Co-developed-by: Timur Tabi <ttabi@nvidia.com> Signed-off-by: Timur Tabi <ttabi@nvidia.com> Reviewed-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Timur Tabi <ttabi@nvidia.com> Tested-by: Timur Tabi <ttabi@nvidia.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2025-05-19 07:14:44 +10:00
Ben Skeggs	76b8f81a5b	drm/nouveau: improve handling of 64-bit BARs GPUs exist now with a 64-bit BAR0, which mean that BAR1 and BAR2's indices (as passed to pci_resource_len() etc) are bumped up by one. Modify nvkm_device.resource_addr/size() to take an enum instead of an integer bar index, and take IORESOURCE_MEM_64 into account when translating to the "raw" bar id. [airlied: fixup ERR_PTR] Signed-off-by: Ben Skeggs <bskeggs@nvidia.com> Reviewed-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Timur Tabi <ttabi@nvidia.com> Tested-by: Timur Tabi <ttabi@nvidia.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2025-05-19 07:14:35 +10:00
Ben Skeggs	6c3ac7bcfc	drm/nouveau/gsp: support deeper page tables in COPY_SERVER_RESERVED_PDES Use data from 'struct nvkm_vmm_page/desc' to determine which PDEs need to be mirrored to RM instead of hardcoded values for pre-Hopper page tables. Needed to support Hopper/Blackwell. Signed-off-by: Ben Skeggs <bskeggs@nvidia.com> Reviewed-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Timur Tabi <ttabi@nvidia.com> Tested-by: Timur Tabi <ttabi@nvidia.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2025-05-19 06:29:26 +10:00
Ben Skeggs	bc7849720b	drm/nouveau/gsp: init client VMMs with NV0080_CTRL_DMA_SET_PAGE_DIRECTORY The current code using NV90F1_CTRL_CMD_VASPACE_COPY_SERVER_RESERVED_PDES not only requires changes to support the new page table layout used on Hopper/Blackwell GPUs, but is also broken in that it always mirrors the PDEs used for virtual address 0, rather than the area reserved for RM. This works fine for the non-NVK case where the kernel has full control of the VMM layout and things end up in the right place, but NVK puts its kernel reserved area much higher in the address space. Fixing the code to work at any VA is not enough as some parts of RM want the reserved area in a specific location, and NVK would then hit other assertions in RM instead. Fortunately, it appears that RM never needs to allocate anything within its reserved area for DRM clients, and the COPY_SERVER_RESERVED_PDES control call primarily serves to allow RM to locate the root page table when initialising a channel's instance block. Flag VMMs allocated by the DRM driver as externally owned, and use NV0080_CTRL_CMD_DMA_SET_PAGE_DIRECTORY to inform RM of the root page table in a similar way to NVIDIA's UVM driver. The COPY_SERVER_RESERVED_PDES paths are kept for the golden context image and gr scrubber channel, where RM needs the reserved area. Signed-off-by: Ben Skeggs <bskeggs@nvidia.com> Reviewed-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Timur Tabi <ttabi@nvidia.com> Tested-by: Timur Tabi <ttabi@nvidia.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2025-05-19 06:29:26 +10:00
Ben Skeggs	708d81a9f5	drm/nouveau/gsp: fetch level shift and PDE from BAR2 VMM When mirroring BAR2 page tables to RM, we need to know the level shift for the root page table (which is currently hardcoded), as well as the raw PDE value (which is currently hardcoded in GP1xx-AD1xx format). In order to support GH100/GBxxx, modify the code to determine the page shift from per-GPU info in nvkm_vmm_page, as well as read the relevant PDE back from the root page table rather than recalculating it. Signed-off-by: Ben Skeggs <bskeggs@nvidia.com> Reviewed-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Timur Tabi <ttabi@nvidia.com> Tested-by: Timur Tabi <ttabi@nvidia.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2025-05-19 06:29:26 +10:00
Ben Skeggs	82df73d8ee	drm/nouveau/mmu: bump up the maximum page table depth GH100/GBxxx have 6-level page tables. Signed-off-by: Ben Skeggs <bskeggs@nvidia.com> Reviewed-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Timur Tabi <ttabi@nvidia.com> Tested-by: Timur Tabi <ttabi@nvidia.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2025-05-19 06:29:26 +10:00
Ben Skeggs	0adfd612c0	drm/nouveau/instmem: add hal for set_bar0_window_addr() GH100/GBxxx have moved the register that controls where in VRAM the the BAR0 NV_PRAMIN window points. Add a HAL for this, as the BAR0 window is needed for BAR2 bootstrap. Signed-off-by: Ben Skeggs <bskeggs@nvidia.com> Reviewed-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Timur Tabi <ttabi@nvidia.com> Tested-by: Timur Tabi <ttabi@nvidia.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2025-05-19 06:29:26 +10:00
Ben Skeggs	2f89bb3264	drm/nouveau/pci: add PRI address of config space mirror to nvkm_pci_func These registers have moved on GH100/GBxxx, and the GSP-RM init code uses hardcoded values from earlier GPUs to fill GspSystemInfo. Replace the per-GPU accessors in nvkm_pci_func with region info, and use it when initialising GspSystemInfo. Signed-off-by: Ben Skeggs <bskeggs@nvidia.com> Reviewed-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Timur Tabi <ttabi@nvidia.com> Tested-by: Timur Tabi <ttabi@nvidia.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2025-05-19 06:29:26 +10:00
Ben Skeggs	53dac06238	drm/nouveau/gsp: add support for 570.144 Add r570-specific HAL routines, and support loading of GSP-RM version 570.144 if firmware is available. There should be no impact on r535, or non-GSP paths. Signed-off-by: Ben Skeggs <bskeggs@nvidia.com> Reviewed-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Timur Tabi <ttabi@nvidia.com> Tested-by: Timur Tabi <ttabi@nvidia.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2025-05-19 06:29:26 +10:00
Ben Skeggs	1b9d7b9df8	drm/nouveau/gsp: add common client alloc code 570.144 has incompatible changes to NV0000_ALLOC_PARAMETERS. Factor out the common code so it can be shared. Signed-off-by: Ben Skeggs <bskeggs@nvidia.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2025-05-19 06:29:26 +10:00
Ben Skeggs	9c86a6010a	drm/nouveau/gsp: add hal for gsp.sr_data_size() 570.86.15 uses a slightly different calculation for the size of the sysmem buffer needed to store GSP-RM's vidmem data across suspend. Signed-off-by: Ben Skeggs <bskeggs@nvidia.com> Reviewed-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Timur Tabi <ttabi@nvidia.com> Tested-by: Timur Tabi <ttabi@nvidia.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2025-05-19 06:29:26 +10:00
Ben Skeggs	f82fb646e1	drm/nouveau/gsp: add hal for disp.chan.dmac_alloc() 565.57.01 has incompatible changes to NV50VAIO_CHANNELDMA_ALLOCATION_PARAMETERS. Signed-off-by: Ben Skeggs <bskeggs@nvidia.com> Reviewed-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Timur Tabi <ttabi@nvidia.com> Tested-by: Timur Tabi <ttabi@nvidia.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2025-05-19 06:29:26 +10:00
Ben Skeggs	8887abb8cb	drm/nouveau/gsp: add hal for fifo.rc_triggered() 565.57.01 has incompatible changes to rpc_rc_triggered_v17_02. Signed-off-by: Ben Skeggs <bskeggs@nvidia.com> Reviewed-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Timur Tabi <ttabi@nvidia.com> Tested-by: Timur Tabi <ttabi@nvidia.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2025-05-19 06:29:26 +10:00
Ben Skeggs	3194beda36	drm/nouveau/gsp: add hal for fifo.rsvd_chids 555.42.02 reserves some CHIDs for internal use. Signed-off-by: Ben Skeggs <bskeggs@nvidia.com> Reviewed-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Timur Tabi <ttabi@nvidia.com> Tested-by: Timur Tabi <ttabi@nvidia.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2025-05-19 06:29:25 +10:00
Ben Skeggs	27b13dc5d0	drm/nouveau/gsp: add hal for fifo.chan.alloc 570.86.16 has incompatible changes to NV_CHANNEL_ALLOC_PARAMS. At the same time, remove the duplicated channel allocation code from golden context init. Signed-off-by: Ben Skeggs <bskeggs@nvidia.com> Reviewed-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Timur Tabi <ttabi@nvidia.com> Tested-by: Timur Tabi <ttabi@nvidia.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2025-05-19 06:29:25 +10:00
Ben Skeggs	37a82fa330	drm/nouveau/gsp: add hal for disp.dp.get_caps() 555.42.02 has incompatible changes to NV0073_CTRL_CMD_DP_GET_CAPS. Signed-off-by: Ben Skeggs <bskeggs@nvidia.com> Reviewed-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Timur Tabi <ttabi@nvidia.com> Tested-by: Timur Tabi <ttabi@nvidia.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2025-05-19 06:29:25 +10:00
Ben Skeggs	cf6b2b5e18	drm/nouveau/gsp: add hal for disp.get_active() 555.42.02 has incompatible changes to NV0073_CTRL_CMD_SYSTEM_GET_ACTIVE. Signed-off-by: Ben Skeggs <bskeggs@nvidia.com> Reviewed-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Timur Tabi <ttabi@nvidia.com> Tested-by: Timur Tabi <ttabi@nvidia.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2025-05-19 06:29:25 +10:00
Ben Skeggs	bfbae411ed	drm/nouveau/gsp: add hal for disp.get_connect_state() 555.42.02 has incompatible changes to NV0073_CTRL_CMD_SYSTEM_GET_CONNECT_STATE. Signed-off-by: Ben Skeggs <bskeggs@nvidia.com> Reviewed-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Timur Tabi <ttabi@nvidia.com> Tested-by: Timur Tabi <ttabi@nvidia.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2025-05-19 06:29:25 +10:00
Ben Skeggs	1cf5940bdb	drm/nouveau/gsp: add hal for disp.get_supported() 555.42.02 has incompatible changes to NV0073_CTRL_CMD_SYSTEM_GET_SUPPORTED. Signed-off-by: Ben Skeggs <bskeggs@nvidia.com> Reviewed-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Timur Tabi <ttabi@nvidia.com> Tested-by: Timur Tabi <ttabi@nvidia.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2025-05-19 06:29:25 +10:00
Ben Skeggs	c21b039715	drm/nouveau/gsp: add hals for fbsr.suspend/resume() 555.42.02 has incompatible changes to FBSR. At the same time, move the calling of FBSR functions from the instmem subdev's suspend/resume paths, to GSP's. This is needed to fix ordering issues that arise from changes to FBSR in newer RM versions. Signed-off-by: Ben Skeggs <bskeggs@nvidia.com> Reviewed-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Timur Tabi <ttabi@nvidia.com> Tested-by: Timur Tabi <ttabi@nvidia.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2025-05-19 06:29:25 +10:00
Ben Skeggs	207c445b31	drm/nouveau/gsp: add hal for gsp.set_rmargs() 555.42.02 has incompatible changes to GSP_ARGUMENTS_CACHED. Signed-off-by: Ben Skeggs <bskeggs@nvidia.com> Reviewed-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Timur Tabi <ttabi@nvidia.com> Tested-by: Timur Tabi <ttabi@nvidia.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2025-05-19 06:29:25 +10:00
Ben Skeggs	2f9974fdd5	drm/nouveau/gsp: add hal for gr.get_ctxbufs_info() NV2080_CTRL_CMD_INTERNAL_STATIC_KGR_GET_CONTEXT_BUFFERS_INFO has incompatible changes in 550.40.07. Signed-off-by: Ben Skeggs <bskeggs@nvidia.com> Reviewed-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Timur Tabi <ttabi@nvidia.com> Tested-by: Timur Tabi <ttabi@nvidia.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2025-05-19 06:29:25 +10:00
Ben Skeggs	f308c9ffdc	drm/nouveau/gsp: add hal for fifo.ectx_size() NV2080_CTRL_CMD_INTERNAL_GET_CONSTRUCTED_FALCON_INFO is moved to NV2080_CTRL_CMD_GPU_GET_CONSTRUCTED_FALCON_INFO in 550.40.07. Signed-off-by: Ben Skeggs <bskeggs@nvidia.com> Reviewed-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Timur Tabi <ttabi@nvidia.com> Tested-by: Timur Tabi <ttabi@nvidia.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2025-05-19 06:29:25 +10:00

1 2 3 4 5 ...

1257 Commits