linux

mirror of https://github.com/torvalds/linux.git synced 2026-04-29 12:02:35 -04:00

Author	SHA1	Message	Date
Dave Airlie	8f8a4dce64	nouveau: add a third state to the fini handler. This is just refactoring to allow the lower layers to distinguish between suspend and runtime suspend. GSP 570 needs to set a flag with the GPU is going into GCOFF, this flag taken from the opengpu driver is set whenever runtime suspend is enterning GCOFF but not for normal suspend paths. This just refactors the code, a subsequent patch use the information. Fixes: `53dac06238` ("drm/nouveau/gsp: add support for 570.144") Cc: <stable@vger.kernel.org> Reviewed-by: Lyude Paul <lyude@redhat.com> Tested-by: Lyude Paul <lyude@redhat.com> Signed-off-by: Dave Airlie <airlied@redhat.com> Link: https://patch.msgid.link/20260203052431.2219998-3-airlied@gmail.com	2026-02-04 12:17:43 +10:00
Dave Airlie	e8b3627bec	nouveau: don't attempt fwsec on sb on newer platforms. The changes to always loads fwsec sb causes problems on newer GPUs which don't use this path. Add hooks and pass through the device specific layers. Fixes: `da67179e55` ("drm/nouveau/gsp: Allocate fwsec-sb at boot") Cc: <stable@vger.kernel.org> # v6.16+ Cc: Lyude Paul <lyude@redhat.com> Cc: Timur Tabi <ttabi@nvidia.com> Tested-by: Matthew Schwartz <matthew.schwartz@linux.dev> Tested-by: Christopher Snowhill <chris@kode54.net> Reviewed-by: Lyude Paul <lyude@redhat.com> Signed-off-by: Dave Airlie <airlied@redhat.com> Link: https://patch.msgid.link/20260102041829.2748009-1-airlied@gmail.com	2026-01-04 16:55:38 +10:00
Lyude Paul	da67179e55	drm/nouveau/gsp: Allocate fwsec-sb at boot At the moment - the memory allocation for fwsec-sb is created as-needed and is released after being used. Typically this is at some point well after driver load, which can cause runtime suspend/resume to initially work on driver load but then later fail on a machine that has been running for long enough with sufficiently high enough memory pressure: kworker/7:1: page allocation failure: order:5, mode:0xcc0(GFP_KERNEL), nodemask=(null),cpuset=/,mems_allowed=0 CPU: 7 UID: 0 PID: 875159 Comm: kworker/7:1 Not tainted 6.17.8-300.fc43.x86_64 #1 PREEMPT(lazy) Hardware name: SLIMBOOK Executive/Executive, BIOS N.1.10GRU06 02/02/2024 Workqueue: pm pm_runtime_work Call Trace: <TASK> dump_stack_lvl+0x5d/0x80 warn_alloc+0x163/0x190 ? __alloc_pages_direct_compact+0x1b3/0x220 __alloc_pages_slowpath.constprop.0+0x57a/0xb10 __alloc_frozen_pages_noprof+0x334/0x350 __alloc_pages_noprof+0xe/0x20 __dma_direct_alloc_pages.isra.0+0x1eb/0x330 dma_direct_alloc_pages+0x3c/0x190 dma_alloc_pages+0x29/0x130 nvkm_firmware_ctor+0x1ae/0x280 [nouveau] nvkm_falcon_fw_ctor+0x3e/0x60 [nouveau] nvkm_gsp_fwsec+0x10e/0x2c0 [nouveau] ? sysvec_apic_timer_interrupt+0xe/0x90 nvkm_gsp_fwsec_sb+0x27/0x70 [nouveau] tu102_gsp_fini+0x65/0x110 [nouveau] ? ktime_get+0x3c/0xf0 nvkm_subdev_fini+0x67/0xc0 [nouveau] nvkm_device_fini+0x94/0x140 [nouveau] nvkm_udevice_fini+0x50/0x70 [nouveau] nvkm_object_fini+0xb1/0x140 [nouveau] nvkm_object_fini+0x70/0x140 [nouveau] ? __pfx_pci_pm_runtime_suspend+0x10/0x10 nouveau_do_suspend+0xe4/0x170 [nouveau] nouveau_pmops_runtime_suspend+0x3e/0xb0 [nouveau] pci_pm_runtime_suspend+0x67/0x1a0 ? __pfx_pci_pm_runtime_suspend+0x10/0x10 __rpm_callback+0x45/0x1f0 ? __pfx_pci_pm_runtime_suspend+0x10/0x10 rpm_callback+0x6d/0x80 rpm_suspend+0xe5/0x5e0 ? finish_task_switch.isra.0+0x99/0x2c0 pm_runtime_work+0x98/0xb0 process_one_work+0x18f/0x350 worker_thread+0x25a/0x3a0 ? __pfx_worker_thread+0x10/0x10 kthread+0xf9/0x240 ? __pfx_kthread+0x10/0x10 ? __pfx_kthread+0x10/0x10 ret_from_fork+0xf1/0x110 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1a/0x30 </TASK> The reason this happens is because the fwsec-sb firmware image only supports being booted from a contiguous coherent sysmem allocation. If a system runs into enough memory fragmentation from memory pressure, such as what can happen on systems with low amounts of memory, this can lead to a situation where it later becomes impossible to find space for a large enough contiguous allocation to hold fwsec-sb. This causes us to fail to boot the firmware image, causing the GPU to fail booting and causing the driver to fail. Since this firmware can't use non-contiguous allocations, the best solution to avoid this issue is to simply allocate the memory for fwsec-sb during initial driver-load, and reuse the memory allocation when fwsec-sb needs to be used. We then release the memory allocations on driver unload. Signed-off-by: Lyude Paul <lyude@redhat.com> Fixes: `594766ca3e` ("drm/nouveau/gsp: move booter handling to GPU-specific code") Cc: <stable@vger.kernel.org> # v6.16+ Reviewed-by: Timur Tabi <ttabi@nvidia.com> Link: https://patch.msgid.link/20251202175918.63533-1-lyude@redhat.com	2025-12-04 20:35:18 -05:00
Mel Henning	2e308a935f	drm/nouveau: Remove nvkm_gsp_fwif.enable This struct element is no longer used. Signed-off-by: Mel Henning <mhenning@darkrefraction.com> Reviewed-by: Ben Skeggs <bskeggs@nvidia.com> Signed-off-by: Lyude Paul <lyude@redhat.com> Link: https://lore.kernel.org/r/20250811213843.4294-3-mhenning@darkrefraction.com	2025-08-12 17:36:51 -04:00
Ben Skeggs	32cb1cc358	drm/nouveau: add support for GB10x This commit enables basic support for the GB100/GB102 Blackwell GPUs. Beyond HW class ID plumbing there's very little change here vs GH100. Signed-off-by: Ben Skeggs <bskeggs@nvidia.com> Reviewed-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Timur Tabi <ttabi@nvidia.com> Tested-by: Timur Tabi <ttabi@nvidia.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2025-05-19 07:14:44 +10:00
Ben Skeggs	44f93b209e	drm/nouveau: add support for GH100 This commit enables basic support for Hopper GPUs, and is intended primarily as a base supporting Blackwell GPUs, which reuse most of the code added here. Advanced features such as Confidential Compute are not supported. Beyond a few miscellaneous register moves and HW class ID plumbing, the bulk of the changes implemented here are to support the GSP-RM boot sequence used on Hopper/Blackwell GPUs, as well as a new page table layout. There should be no changes here that impact prior GPUs. Signed-off-by: Ben Skeggs <bskeggs@nvidia.com> Co-developed-by: Timur Tabi <ttabi@nvidia.com> Signed-off-by: Timur Tabi <ttabi@nvidia.com> Reviewed-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Timur Tabi <ttabi@nvidia.com> Tested-by: Timur Tabi <ttabi@nvidia.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2025-05-19 07:14:44 +10:00
Ben Skeggs	207c445b31	drm/nouveau/gsp: add hal for gsp.set_rmargs() 555.42.02 has incompatible changes to GSP_ARGUMENTS_CACHED. Signed-off-by: Ben Skeggs <bskeggs@nvidia.com> Reviewed-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Timur Tabi <ttabi@nvidia.com> Tested-by: Timur Tabi <ttabi@nvidia.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2025-05-19 06:29:25 +10:00
Ben Skeggs	57fe0d30a0	drm/nouveau/gsp: add hal for wpr config info + meta init 545.23.06 increases the libos3 heap size requirements, and GH100/GBxxx will need their own implementation entirely. Signed-off-by: Ben Skeggs <bskeggs@nvidia.com> Reviewed-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Timur Tabi <ttabi@nvidia.com> Tested-by: Timur Tabi <ttabi@nvidia.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2025-05-19 06:29:24 +10:00
Ben Skeggs	befe75ae0d	drm/nouveau/gsp: add gpu hal stubs With GSP-RM handling the majority of the HW programming, NVKM's usual HALs are more elaborate than necessary, resulting in a fair amount of duplicated boilerplate. Adds 'nvkm_rm_gpu' which serves to provide GPU-specific constants and functions in a more streamlined manner. This is initially used in subsequent commits to store engine class IDs, and replace the per-engine/engobj boilerplate with common code for all GSP-RM supported engines - and is further extended when adding GH100, GB10x and GB20x support. Signed-off-by: Ben Skeggs <bskeggs@nvidia.com> Reviewed-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Timur Tabi <ttabi@nvidia.com> Tested-by: Timur Tabi <ttabi@nvidia.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2025-05-19 06:29:23 +10:00
Ben Skeggs	594766ca3e	drm/nouveau/gsp: move booter handling to GPU-specific code GH100/GBxxx have significant changes to the GSP-RM boot process. Signed-off-by: Ben Skeggs <bskeggs@nvidia.com> Reviewed-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Timur Tabi <ttabi@nvidia.com> Tested-by: Timur Tabi <ttabi@nvidia.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2025-05-19 06:29:23 +10:00
Ben Skeggs	7f022236b5	drm/nouveau/gsp: move firmware loading to GPU-specific code GH100/GBxxx use a slightly different set of firmwares to boot GSP-RM. Signed-off-by: Ben Skeggs <bskeggs@nvidia.com> Reviewed-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Timur Tabi <ttabi@nvidia.com> Tested-by: Timur Tabi <ttabi@nvidia.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2025-05-19 06:29:23 +10:00
Ben Skeggs	176fdcbddf	drm/nouveau/gsp/r535: add support for booting GSP-RM This commit adds the initial code needed to boot the GSP-RM firmware provided by NVIDIA, bringing with it the beginnings of Ada support. Until it's had more testing and time to bake, support is disabled by default (except on Ada). GSP-RM usage can be enabled by passing the "config=NvGspRm=1" module option. Signed-off-by: Ben Skeggs <bskeggs@redhat.com> Signed-off-by: Dave Airlie <airlied@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230918202149.4343-33-skeggsb@gmail.com	2023-10-31 15:08:15 +10:00
Ben Skeggs	015ef6187f	drm/nouveau/gsp: prepare for GSP-RM - move TOP after GSP, so we can disable TOP if GSP is in use - provide plumbing to support falcon-only and GSP-RM paths - provide a method for subdevs to detect GSP-RM paths - split tu102/tu116/ga100 paths from gv100, which can't support GSP-RM Signed-off-by: Ben Skeggs <bskeggs@redhat.com> Signed-off-by: Dave Airlie <airlied@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230918202149.4343-5-skeggsb@gmail.com	2023-10-31 15:08:10 +10:00
Ben Skeggs	74f9dcb0df	drm/nouveau/gsp: add funcs Ampere. Signed-off-by: Ben Skeggs <bskeggs@redhat.com> Reviewed-by: Lyude Paul <lyude@redhat.com>	2022-11-09 10:44:57 +10:00
Ben Skeggs	b240b21261	drm/nouveau/gsp: switch to instanced constructor Signed-off-by: Ben Skeggs <bskeggs@redhat.com> Reviewed-by: Lyude Paul <lyude@redhat.com>	2021-02-11 11:49:53 +10:00
Ben Skeggs	334815ef31	drm/nouveau/gsp: initialise SW state for falcon from constructor This will allow us to register the falcon with ACR, and further customise its behaviour by providing the nvkm_falcon_func structure directly. Signed-off-by: Ben Skeggs <bskeggs@redhat.com>	2020-01-15 10:50:26 +10:00
Ben Skeggs	78b10b7403	drm/nouveau/gsp: select implementation based on available firmware This will allow for further customisation of the subdev depending on what firmware is available. Signed-off-by: Ben Skeggs <bskeggs@redhat.com>	2020-01-15 10:50:26 +10:00

17 Commits