The LOGINIT, LOGINTR, LOGRM, and LOGPMU buffers are circular buffers
that have printf-like logs from GSP-RM and PMU encoded in them.
LOGINIT, LOGINTR, and LOGRM are allocated by Nouveau and their DMA
addresses are passed to GSP-RM during initialization. The buffers are
required for GSP-RM to initialize properly.
LOGPMU is also allocated by Nouveau, but its contents are updated
when Nouveau receives an NV_VGPU_MSG_EVENT_UCODE_LIBOS_PRINT RPC from
GSP-RM. Nouveau then copies the RPC to the buffer.
The messages are encoded as an array of variable-length structures that
contain the parameters to an NV_PRINTF call. The format string and
parameter count are stored in a special ELF image that contains only
logging strings. This image is not currently shipped with the Nvidia
driver.
There are two methods to extract the logs.
OpenRM tries to load the logging ELF, and if present, parses the log
buffers in real time and outputs the strings to the kernel console.
Alternatively, and this is the method used by this patch, the buffers
can be exposed to user space, and a user-space tool (along with the
logging ELF image) can parse the buffer and dump the logs.
This method has the advantage that it allows the buffers to be parsed
even when the logging ELF file is not available to the user. However,
it has the disadvantage the debugfs entries need to remain until the
driver is unloaded.
The buffers are exposed via debugfs. If GSP-RM fails to initialize, then
Nouveau immediately shuts down the GSP interface. This would normally
also deallocate the logging buffers, thereby preventing the user from
capturing the debug logs.
To avoid this, introduce the keep-gsp-logging command line parameter. If
specified, and if at least one logging buffer has content, then Nouveau
will migrate these buffers into new debugfs entries that are retained
until the driver unloads.
An end-user can capture the logs using the following commands:
cp /sys/kernel/debug/nouveau/<path>/loginit loginit
cp /sys/kernel/debug/nouveau/<path>/logrm logrm
cp /sys/kernel/debug/nouveau/<path>/logintr logintr
cp /sys/kernel/debug/nouveau/<path>/logpmu logpmu
where (for a PCI device) <path> is the PCI ID of the GPU (e.g.
0000:65:00.0).
Since LOGPMU is not needed for normal GSP-RM operation, it is only
created if debugfs is available. Otherwise, the
NV_VGPU_MSG_EVENT_UCODE_LIBOS_PRINT RPCs are ignored.
A simple way to test the buffer migration feature is to have
nvkm_gsp_init() return an error code.
Tested-by: Ben Skeggs <bskeggs@nvidia.com>
Signed-off-by: Timur Tabi <ttabi@nvidia.com>
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20241030202952.694055-2-ttabi@nvidia.com
asm/unaligned.h is always an include of asm-generic/unaligned.h;
might as well move that thing to linux/unaligned.h and include
that - there's nothing arch-specific in that header.
auto-generated by the following:
for i in `git grep -l -w asm/unaligned.h`; do
sed -i -e "s/asm\/unaligned.h/linux\/unaligned.h/" $i
done
for i in `git grep -l -w asm-generic/unaligned.h`; do
sed -i -e "s/asm-generic\/unaligned.h/linux\/unaligned.h/" $i
done
git mv include/asm-generic/unaligned.h include/linux/unaligned.h
git mv tools/include/asm-generic/unaligned.h tools/include/linux/unaligned.h
sed -i -e "/unaligned.h/d" include/asm-generic/Kbuild
sed -i -e "s/__ASM_GENERIC/__LINUX/" include/linux/unaligned.h tools/include/linux/unaligned.h
These are some dodgy "convenience" macros for the DRM driver to peek
into NVKM state. They're still used in a few places, but don't belong
in nvif/device.h in any case.
Move them to nouveau_drv.h, and modify callers to pass a nouveau_drm
instead of an nvif_device.
v2:
- use drm->nvkm pointer for nvxx_*() macros, removing some void*
v3:
- add some explanation of the nvxx_*() macros
Signed-off-by: Ben Skeggs <bskeggs@nvidia.com>
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20240726043828.58966-28-bskeggs@nvidia.com
The next commit removes the nvif rd/wr methods from nvif_device, which
were probably a bad idea, and mostly intended as a fallback if ioremap()
failed (or wasn't available, as was the case in some tools I once used).
The nv04 KMS driver already mapped the device, because it's mostly been
kept alive on life-support for many years and still directly bashes PRI
a lot for modesetting.
Post-nv50, I tried pretty hard to keep PRI accesses out of the DRM code,
but there's still a few random places where we do, and those were using
the rd/wr paths prior to this commit.
This allocates and maps a new nvif_device (which will replace the usage
of nouveau_drm.master.device later on), and replicates this pointer to
all other possible users.
This will be cleaned up by the end of another patch series, after it's
been made safe to do so.
Signed-off-by: Ben Skeggs <bskeggs@nvidia.com>
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20240726043828.58966-25-bskeggs@nvidia.com
This commit adds the initial code needed to boot the GSP-RM firmware
provided by NVIDIA, bringing with it the beginnings of Ada support.
Until it's had more testing and time to bake, support is disabled by
default (except on Ada). GSP-RM usage can be enabled by passing the
"config=NvGspRm=1" module option.
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230918202149.4343-33-skeggsb@gmail.com
Now that we're supporting things like Ada and the GSP, there's situations
where we really need to actually know the display state that we're starting
with when loading the driver in order to prevent breaking GSP expectations.
The first step in doing this is making it so that we can read the current
state of IORs from nvkm in DRM, so that we can fill in said into into the
atomic state.
We do this by introducing an INHERIT ioctl to nvkm/nvif. This is basically
another form of ACQUIRE, except that it will only acquire the given output
path for userspace if it's already set up in hardware. This way, we can go
through and probe each outp object we have in DRM in order to figure out
the current hardware state of each one. If the outp isn't in use, it simply
returns -ENODEV.
This is also part of the work that will be required for implementing GSP
support for display. While the GSP should mostly work without this commit,
this commit should fix some edge case bugs that can occur on initial driver
load. This also paves the way for some of the initial groundwork for
fastboot support.
Signed-off-by: Lyude Paul <lyude@redhat.com>
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Acked-by: Danilo Krummrich <me@dakr.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20230919220442.202488-11-lyude@redhat.com
The new VM_BIND UAPI uses the DRM GPU VA manager to manage the VA space.
Hence, we a need a way to manipulate the MMUs page tables without going
through the internal range allocator implemented by nvkm/vmm.
This patch adds a raw interface for nvkm/vmm to pass the resposibility
for managing the address space and the corresponding map/unmap/sparse
operations to the upper layers.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Danilo Krummrich <dakr@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230804182406.5222-11-dakr@redhat.com
More arrays (and arguments) for dcpd were set to 16, when it looks like
DP_RECEIVER_CAP_SIZE (15) should be used. Fix the remaining cases, seen
with GCC 13:
../drivers/gpu/drm/nouveau/nvif/outp.c: In function 'nvif_outp_acquire_dp':
../include/linux/fortify-string.h:57:33: warning: array subscript 'unsigned char[16][0]' is partly outside array bounds of 'u8[15]' {aka 'unsigned char[15]'} [-Warray-bounds=]
57 | #define __underlying_memcpy __builtin_memcpy
| ^
...
../drivers/gpu/drm/nouveau/nvif/outp.c:140:9: note: in expansion of macro 'memcpy'
140 | memcpy(args.dp.dpcd, dpcd, sizeof(args.dp.dpcd));
| ^~~~~~
../drivers/gpu/drm/nouveau/nvif/outp.c:130:49: note: object 'dpcd' of size [0, 15]
130 | nvif_outp_acquire_dp(struct nvif_outp *outp, u8 dpcd[DP_RECEIVER_CAP_SIZE],
| ~~~^~~~~~~~~~~~~~~~~~~~~~~~~~
Fixes: 8134437213 ("drm/nouveau/disp: move DP link config into acquire")
Cc: Ben Skeggs <bskeggs@redhat.com>
Cc: Lyude Paul <lyude@redhat.com>
Cc: Karol Herbst <kherbst@redhat.com>
Cc: David Airlie <airlied@gmail.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Dave Airlie <airlied@redhat.com>
Cc: "Gustavo A. R. Silva" <gustavo@embeddedor.com>
Cc: dri-devel@lists.freedesktop.org
Cc: nouveau@lists.freedesktop.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Signed-off-by: Karol Herbst <git@karolherbst.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20230204184307.never.825-kees@kernel.org
Both Coverity and GCC with -Wstringop-overflow noticed that
nvif_outp_acquire_dp() accidentally defined its second argument with 1
additional element:
drivers/gpu/drm/nouveau/dispnv50/disp.c: In function 'nv50_pior_atomic_enable':
drivers/gpu/drm/nouveau/dispnv50/disp.c:1813:17: error: 'nvif_outp_acquire_dp' accessing 16 bytes in a region of size 15 [-Werror=stringop-overflow=]
1813 | nvif_outp_acquire_dp(&nv_encoder->outp, nv_encoder->dp.dpcd, 0, 0, false, false);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/gpu/drm/nouveau/dispnv50/disp.c:1813:17: note: referencing argument 2 of type 'u8[16]' {aka 'unsigned char[16]'}
drivers/gpu/drm/nouveau/include/nvif/outp.h:24:5: note: in a call to function 'nvif_outp_acquire_dp'
24 | int nvif_outp_acquire_dp(struct nvif_outp *, u8 dpcd[16],
| ^~~~~~~~~~~~~~~~~~~~
Avoid these warnings by defining the argument size using the matching
define (DP_RECEIVER_CAP_SIZE, 15) instead of having it be a literal
(and incorrect) value (16).
Reported-by: coverity-bot <keescook+coverity-bot@chromium.org>
Addresses-Coverity-ID: 1527269 ("Memory - corruptions")
Addresses-Coverity-ID: 1527268 ("Memory - corruptions")
Link: https://lore.kernel.org/lkml/202211100848.FFBA2432@keescook/
Link: https://lore.kernel.org/lkml/202211100848.F4C2819BB@keescook/
Fixes: 8134437213 ("drm/nouveau/disp: move DP link config into acquire")
Reviewed-by: Lyude Paul <lyude@redhat.com>
Cc: Ben Skeggs <bskeggs@redhat.com>
Cc: Karol Herbst <kherbst@redhat.com>
Cc: David Airlie <airlied@gmail.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Dave Airlie <airlied@redhat.com>
Cc: "Gustavo A. R. Silva" <gustavo@embeddedor.com>
Cc: dri-devel@lists.freedesktop.org
Cc: nouveau@lists.freedesktop.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20221127183036.never.139-kees@kernel.org
- replaces the hacked-up version that existed solely to support TTM
v2. remove earlier hack preventing use of non-stall intr for fences
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
- replaces the hacked-up version that existed solely to support TTM
- noop until the next commit, adding proper support for ampere host
v2. fixup for ga103 early merge
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Exposes a bunch of the new features that became possible as a result
of the earlier commits. DRM will build on this in the future to add
support for features such as SCG ("async compute") and multi-device
rendering, as part of the work necessary to be able to write a half-
decent vulkan driver - finally.
For the moment, this just crudely ports DRM to the API changes.
- channel class interfaces now the same for all HW classes
- channel group class exposed (SCG)
- channel runqueue selector exposed (SCG)
- channel sub-device id control exposed (multi-device rendering)
- channel names in logging will reflect creating process, not fd owner
- explicit USERD allocation required by VOLTA_CHANNEL_GPFIFO_A and newer
- drm is smarter about determining the appropriate channel class to use
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Lyude Paul <lyude@redhat.com>
DRM uses this to setup fence-related items.
- nouveau_chan.runlist will always be "0" for the moment, not an issue
as GPUs prior to ampere have system-wide channel IDs,
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Lyude Paul <lyude@redhat.com>
Adds the basic skeleton for common channel (group) interfaces.
- common behaviour between <gk104 and >=gk104 impl's
- separates priv/user channel objects
- passthrough to existing object for now, kludges removed later
Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Reviewed-by: Lyude Paul <lyude@redhat.com>