Commit Graph

17 Commits

Author SHA1 Message Date
Dave Airlie
8f8a4dce64 nouveau: add a third state to the fini handler.
This is just refactoring to allow the lower layers to distinguish
between suspend and runtime suspend.

GSP 570 needs to set a flag when the GPU is going into GCOFF.
This flag, taken from the opengpu driver, is set whenever runtime
suspend is entering GCOFF, but not for normal suspend paths.

This just refactors the code; a subsequent patch uses the information.

Fixes: 53dac06238 ("drm/nouveau/gsp: add support for 570.144")
Cc: <stable@vger.kernel.org>
Reviewed-by: Lyude Paul <lyude@redhat.com>
Tested-by: Lyude Paul <lyude@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Link: https://patch.msgid.link/20260203052431.2219998-3-airlied@gmail.com
2026-02-04 12:17:43 +10:00
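
The refactoring described in 8f8a4dce64 can be pictured with a minimal sketch, assuming the fini handler previously received a two-valued (boolean) argument. Every name below (`fini_state`, `fini_state_from_bool`, `fini_wants_gcoff_flag`) is hypothetical and chosen for illustration; none is taken from the nouveau source.

```c
#include <stdbool.h>

/* Hypothetical three-state argument for a fini handler.
 * These names are illustrative and are not nouveau identifiers. */
enum fini_state {
    FINI_TEARDOWN,        /* full shutdown, e.g. driver unload */
    FINI_SUSPEND,         /* normal (system) suspend */
    FINI_RUNTIME_SUSPEND, /* runtime PM suspend */
};

/* Assumption: older callers could only express two states with a bool,
 * so they map onto the first two values of the enum. */
static inline enum fini_state fini_state_from_bool(bool suspend)
{
    return suspend ? FINI_SUSPEND : FINI_TEARDOWN;
}

/* Per the commit message, the GCOFF flag is set for runtime suspend
 * but not for the normal suspend path. */
static inline bool fini_wants_gcoff_flag(enum fini_state state)
{
    return state == FINI_RUNTIME_SUSPEND;
}

/* Lower layers that only care whether state must be preserved can
 * treat both suspend flavours alike. */
static inline bool fini_is_suspend(enum fini_state state)
{
    return state != FINI_TEARDOWN;
}
```

With an enum in place of a bool, a lower layer can branch on `FINI_RUNTIME_SUSPEND` specifically, while code that does not care about the distinction keeps working through a helper such as `fini_is_suspend()`.
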
Dave Airlie
e8b3627bec nouveau: don't attempt fwsec on sb on newer platforms.
The change to always load fwsec-sb causes problems on newer GPUs,
which don't use this path.

Add hooks and pass them through the device-specific layers.

Fixes: da67179e55 ("drm/nouveau/gsp: Allocate fwsec-sb at boot")
Cc: <stable@vger.kernel.org> # v6.16+
Cc: Lyude Paul <lyude@redhat.com>
Cc: Timur Tabi <ttabi@nvidia.com>
Tested-by: Matthew Schwartz <matthew.schwartz@linux.dev>
Tested-by: Christopher Snowhill <chris@kode54.net>
Reviewed-by: Lyude Paul <lyude@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Link: https://patch.msgid.link/20260102041829.2748009-1-airlied@gmail.com
2026-01-04 16:55:38 +10:00
Lyude Paul
da67179e55 drm/nouveau/gsp: Allocate fwsec-sb at boot
At the moment, the memory allocation for fwsec-sb is created as needed and
is released after being used. Typically this is at some point well after
driver load, which can cause runtime suspend/resume to initially work on
driver load but then later fail on a machine that has been running for long
enough with sufficiently high memory pressure:

  kworker/7:1: page allocation failure: order:5, mode:0xcc0(GFP_KERNEL),
  nodemask=(null),cpuset=/,mems_allowed=0
  CPU: 7 UID: 0 PID: 875159 Comm: kworker/7:1 Not tainted
  6.17.8-300.fc43.x86_64 #1 PREEMPT(lazy)
  Hardware name: SLIMBOOK Executive/Executive, BIOS N.1.10GRU06 02/02/2024
  Workqueue: pm pm_runtime_work
  Call Trace:
   <TASK>
   dump_stack_lvl+0x5d/0x80
   warn_alloc+0x163/0x190
   ? __alloc_pages_direct_compact+0x1b3/0x220
   __alloc_pages_slowpath.constprop.0+0x57a/0xb10
   __alloc_frozen_pages_noprof+0x334/0x350
   __alloc_pages_noprof+0xe/0x20
   __dma_direct_alloc_pages.isra.0+0x1eb/0x330
   dma_direct_alloc_pages+0x3c/0x190
   dma_alloc_pages+0x29/0x130
   nvkm_firmware_ctor+0x1ae/0x280 [nouveau]
   nvkm_falcon_fw_ctor+0x3e/0x60 [nouveau]
   nvkm_gsp_fwsec+0x10e/0x2c0 [nouveau]
   ? sysvec_apic_timer_interrupt+0xe/0x90
   nvkm_gsp_fwsec_sb+0x27/0x70 [nouveau]
   tu102_gsp_fini+0x65/0x110 [nouveau]
   ? ktime_get+0x3c/0xf0
   nvkm_subdev_fini+0x67/0xc0 [nouveau]
   nvkm_device_fini+0x94/0x140 [nouveau]
   nvkm_udevice_fini+0x50/0x70 [nouveau]
   nvkm_object_fini+0xb1/0x140 [nouveau]
   nvkm_object_fini+0x70/0x140 [nouveau]
   ? __pfx_pci_pm_runtime_suspend+0x10/0x10
   nouveau_do_suspend+0xe4/0x170 [nouveau]
   nouveau_pmops_runtime_suspend+0x3e/0xb0 [nouveau]
   pci_pm_runtime_suspend+0x67/0x1a0
   ? __pfx_pci_pm_runtime_suspend+0x10/0x10
   __rpm_callback+0x45/0x1f0
   ? __pfx_pci_pm_runtime_suspend+0x10/0x10
   rpm_callback+0x6d/0x80
   rpm_suspend+0xe5/0x5e0
   ? finish_task_switch.isra.0+0x99/0x2c0
   pm_runtime_work+0x98/0xb0
   process_one_work+0x18f/0x350
   worker_thread+0x25a/0x3a0
   ? __pfx_worker_thread+0x10/0x10
   kthread+0xf9/0x240
   ? __pfx_kthread+0x10/0x10
   ? __pfx_kthread+0x10/0x10
   ret_from_fork+0xf1/0x110
   ? __pfx_kthread+0x10/0x10
   ret_from_fork_asm+0x1a/0x30
   </TASK>

This happens because the fwsec-sb firmware image only supports being
booted from a contiguous coherent sysmem allocation. If a system's memory
becomes fragmented enough under memory pressure, as can happen on systems
with little memory, it can later become impossible to find a large enough
contiguous allocation to hold fwsec-sb. We then fail to boot the firmware
image, so the GPU fails to boot and the driver fails.

Since this firmware can't use non-contiguous allocations, the best solution
to avoid this issue is to simply allocate the memory for fwsec-sb during
initial driver-load, and reuse the memory allocation when fwsec-sb needs to
be used. We then release the memory allocations on driver unload.

Signed-off-by: Lyude Paul <lyude@redhat.com>
Fixes: 594766ca3e ("drm/nouveau/gsp: move booter handling to GPU-specific code")
Cc: <stable@vger.kernel.org> # v6.16+
Reviewed-by: Timur Tabi <ttabi@nvidia.com>
Link: https://patch.msgid.link/20251202175918.63533-1-lyude@redhat.com
2025-12-04 20:35:18 -05:00
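
The approach described in da67179e55 (allocate once at driver load, reuse on every suspend, release at unload) can be illustrated with a small user-space sketch. It is not the driver's code: `struct fw_buf` and the `fw_buf_*` functions are hypothetical, and `malloc()` stands in for the contiguous coherent DMA allocation a real driver would need.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical holder for a preallocated firmware buffer.
 * malloc() is only a stand-in for a contiguous DMA allocation. */
struct fw_buf {
    void *mem;
    size_t size;
};

/* Called once at load time, when a large allocation is most likely
 * to succeed because memory is not yet fragmented. */
static int fw_buf_init(struct fw_buf *buf, size_t size)
{
    buf->mem = malloc(size);
    if (!buf->mem) {
        buf->size = 0;
        return -1;
    }
    buf->size = size;
    return 0;
}

/* Called each time the firmware is needed: copies the image into the
 * existing buffer and never allocates, so it cannot fail because of
 * memory fragmentation later on. */
static int fw_buf_load(struct fw_buf *buf, const void *image, size_t len)
{
    if (!buf->mem || len > buf->size)
        return -1;
    memcpy(buf->mem, image, len);
    return 0;
}

/* Called once at unload time to release the buffer. */
static void fw_buf_fini(struct fw_buf *buf)
{
    free(buf->mem);
    buf->mem = NULL;
    buf->size = 0;
}
```

The trade-off is that the buffer stays resident for the lifetime of the driver, in exchange for removing an allocation from the suspend path, where a failure is much harder to recover from.
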
Mel Henning
2e308a935f drm/nouveau: Remove nvkm_gsp_fwif.enable
This struct element is no longer used.

Signed-off-by: Mel Henning <mhenning@darkrefraction.com>
Reviewed-by: Ben Skeggs <bskeggs@nvidia.com>
Signed-off-by: Lyude Paul <lyude@redhat.com>
Link: https://lore.kernel.org/r/20250811213843.4294-3-mhenning@darkrefraction.com
2025-08-12 17:36:51 -04:00
Ben Skeggs
32cb1cc358 drm/nouveau: add support for GB10x
This commit enables basic support for the GB100/GB102 Blackwell GPUs.

Beyond HW class ID plumbing there's very little change here vs GH100.

Signed-off-by: Ben Skeggs <bskeggs@nvidia.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Timur Tabi <ttabi@nvidia.com>
Tested-by: Timur Tabi <ttabi@nvidia.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2025-05-19 07:14:44 +10:00
Ben Skeggs
44f93b209e drm/nouveau: add support for GH100
This commit enables basic support for Hopper GPUs, and is intended
primarily as a base supporting Blackwell GPUs, which reuse most of
the code added here.

Advanced features such as Confidential Compute are not supported.

Beyond a few miscellaneous register moves and HW class ID plumbing,
the bulk of the changes implemented here are to support the GSP-RM
boot sequence used on Hopper/Blackwell GPUs, as well as a new page
table layout.

There should be no changes here that impact prior GPUs.

Signed-off-by: Ben Skeggs <bskeggs@nvidia.com>
Co-developed-by: Timur Tabi <ttabi@nvidia.com>
Signed-off-by: Timur Tabi <ttabi@nvidia.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Timur Tabi <ttabi@nvidia.com>
Tested-by: Timur Tabi <ttabi@nvidia.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2025-05-19 07:14:44 +10:00
Ben Skeggs
207c445b31 drm/nouveau/gsp: add hal for gsp.set_rmargs()
555.42.02 has incompatible changes to GSP_ARGUMENTS_CACHED.

Signed-off-by: Ben Skeggs <bskeggs@nvidia.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Timur Tabi <ttabi@nvidia.com>
Tested-by: Timur Tabi <ttabi@nvidia.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2025-05-19 06:29:25 +10:00
Ben Skeggs
57fe0d30a0 drm/nouveau/gsp: add hal for wpr config info + meta init
545.23.06 increases the libos3 heap size requirements, and GH100/GBxxx
will need their own implementation entirely.

Signed-off-by: Ben Skeggs <bskeggs@nvidia.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Timur Tabi <ttabi@nvidia.com>
Tested-by: Timur Tabi <ttabi@nvidia.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2025-05-19 06:29:24 +10:00
Ben Skeggs
befe75ae0d drm/nouveau/gsp: add gpu hal stubs
With GSP-RM handling the majority of the HW programming, NVKM's usual
HALs are more elaborate than necessary, resulting in a fair amount of
duplicated boilerplate.

Adds 'nvkm_rm_gpu' which serves to provide GPU-specific constants and
functions in a more streamlined manner.

This is initially used in subsequent commits to store engine class IDs,
and replace the per-engine/engobj boilerplate with common code for all
GSP-RM supported engines - and is further extended when adding GH100,
GB10x and GB20x support.

Signed-off-by: Ben Skeggs <bskeggs@nvidia.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Timur Tabi <ttabi@nvidia.com>
Tested-by: Timur Tabi <ttabi@nvidia.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2025-05-19 06:29:23 +10:00
Ben Skeggs
594766ca3e drm/nouveau/gsp: move booter handling to GPU-specific code
GH100/GBxxx have significant changes to the GSP-RM boot process.

Signed-off-by: Ben Skeggs <bskeggs@nvidia.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Timur Tabi <ttabi@nvidia.com>
Tested-by: Timur Tabi <ttabi@nvidia.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2025-05-19 06:29:23 +10:00
Ben Skeggs
7f022236b5 drm/nouveau/gsp: move firmware loading to GPU-specific code
GH100/GBxxx use a slightly different set of firmwares to boot GSP-RM.

Signed-off-by: Ben Skeggs <bskeggs@nvidia.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Timur Tabi <ttabi@nvidia.com>
Tested-by: Timur Tabi <ttabi@nvidia.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2025-05-19 06:29:23 +10:00
Ben Skeggs
176fdcbddf drm/nouveau/gsp/r535: add support for booting GSP-RM
This commit adds the initial code needed to boot the GSP-RM firmware
provided by NVIDIA, bringing with it the beginnings of Ada support.

Until it's had more testing and time to bake, support is disabled by
default (except on Ada).  GSP-RM usage can be enabled by passing the
"config=NvGspRm=1" module option.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230918202149.4343-33-skeggsb@gmail.com
2023-10-31 15:08:15 +10:00
Ben Skeggs
015ef6187f drm/nouveau/gsp: prepare for GSP-RM
- move TOP after GSP, so we can disable TOP if GSP is in use
- provide plumbing to support falcon-only and GSP-RM paths
- provide a method for subdevs to detect GSP-RM paths
- split tu102/tu116/ga100 paths from gv100, which can't support GSP-RM

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230918202149.4343-5-skeggsb@gmail.com
2023-10-31 15:08:10 +10:00
Ben Skeggs
74f9dcb0df drm/nouveau/gsp: add funcs
Ampere.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Lyude Paul <lyude@redhat.com>
2022-11-09 10:44:57 +10:00
Ben Skeggs
b240b21261 drm/nouveau/gsp: switch to instanced constructor
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Lyude Paul <lyude@redhat.com>
2021-02-11 11:49:53 +10:00
Ben Skeggs
334815ef31 drm/nouveau/gsp: initialise SW state for falcon from constructor
This will allow us to register the falcon with ACR, and further customise
its behaviour by providing the nvkm_falcon_func structure directly.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2020-01-15 10:50:26 +10:00
Ben Skeggs
78b10b7403 drm/nouveau/gsp: select implementation based on available firmware
This will allow for further customisation of the subdev depending on what
firmware is available.

Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
2020-01-15 10:50:26 +10:00