drm/mgag200: Add a workaround for low-latency

We found a regression in v5.10 on a real-time server using the
rt-kernel and the mgag200 driver. It's a really specialized
workload, with a <10us latency expectation on an isolated core.
After v5.10, the real-time tasks miss their <10us latency target
whenever something prints on the screen (fbcon or printk).

The regression has been bisected to 2 commits:
commit 0b34d58b6c ("drm/mgag200: Enable caching for SHMEM pages")
commit 4862ffaec5 ("drm/mgag200: Move vmap out of commit tail")

The first one changed the system memory framebuffer from Write-Combine
to the default caching.
Before the second commit, the mgag200 driver used to unmap the
framebuffer after each frame, which implicitly did a cache flush.
Both regressions are fixed by this commit, which restores the WC mapping
for the framebuffer in system memory and adds a cache flush.
This is only needed on x86_64, for low-latency workloads,
so the new Kconfig option DRM_MGAG200_IOBURST_WORKAROUND depends on
PREEMPT_RT and X86.
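
The driver changes themselves are not shown on this page. As a rough
illustration of the "cache flush after each frame" idea, here is a minimal
userspace sketch of how the byte range to flush could be derived from a
damage rectangle, assuming the flush covers whole scanlines of the shadow
framebuffer. The names clip_rect, flush_range and damage_flush_range are
hypothetical and are not taken from the driver.

```c
#include <stddef.h>

/* Hypothetical stand-in for a damage rectangle; x2/y2 are exclusive. */
struct clip_rect {
	int x1, y1, x2, y2;
};

/* Byte range of the shadow framebuffer that would need a cache flush. */
struct flush_range {
	size_t offset;	/* bytes from the start of the framebuffer */
	size_t length;	/* number of bytes to flush */
};

/*
 * Assumption: whole scanlines are flushed, so only the vertical extent
 * of the clip and the pitch (bytes per scanline) matter.
 */
static struct flush_range damage_flush_range(const struct clip_rect *clip,
					     size_t pitch)
{
	struct flush_range r;

	r.offset = (size_t)clip->y1 * pitch;
	r.length = (size_t)(clip->y2 - clip->y1) * pitch;
	return r;
}
```

In a kernel driver, a range like this would be handed to a cache-flush
helper after the damaged area has been copied to video memory. Flushing
whole scanlines keeps the arithmetic trivial at the cost of flushing a
few bytes outside the clip.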

For more context, the whole thread can be found here [1].

Signed-off-by: Jocelyn Falempe <jfalempe@redhat.com>
Link: https://lore.kernel.org/dri-devel/20231019135655.313759-1-jfalempe@redhat.com/ # 1
Acked-by: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20240208095125.377908-1-jfalempe@redhat.com
Jocelyn Falempe
2024-02-08 10:51:10 +01:00
parent 0475184905
commit bfa4437fd3
3 changed files with 37 additions and 0 deletions

@@ -11,3 +11,15 @@ config DRM_MGAG200
	 MGA G200 desktop chips and the server variants. It requires 0.3.0
	 of the modesetting userspace driver, and a version of mga driver
	 that will fail on KMS enabled devices.

config DRM_MGAG200_IOBURST_WORKAROUND
	bool "Disable buffer caching"
	depends on DRM_MGAG200 && PREEMPT_RT && X86
	help
	  Enable a workaround to avoid I/O bursts within the mgag200 driver at
	  the expense of overall display performance.
	  It restores the <v5.10 behavior, by mapping the framebuffer in system
	  RAM as Write-Combining, and flushing the cache after each write.
	  This is only useful on x86_64 if you want to run processes with
	  deterministic latency.
	  If unsure, say N.
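
Because of the "depends on" line above, the new option is only offered when
DRM_MGAG200 and PREEMPT_RT are both enabled on an x86 build. A .config
fragment that turns the workaround on might look like the following sketch.
Building mgag200 as a module (=m) is an assumption made here for
illustration; the option is a bool, so it is either y or not set.

```
CONFIG_PREEMPT_RT=y
CONFIG_DRM_MGAG200=m
CONFIG_DRM_MGAG200_IOBURST_WORKAROUND=y
```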