drm/mgag200: Add a workaround for low-latency

We found a regression in v5.10 on real-time server, using the rt-kernel and the mgag200 driver. It's some really specialized workload, with <10us latency expectation on isolated core. After the v5.10, the real time tasks missed their <10us latency when something prints on the screen (fbcon or printk) The regression has been bisected to 2 commits: commit 0b34d58b6c ("drm/mgag200: Enable caching for SHMEM pages") commit 4862ffaec5 ("drm/mgag200: Move vmap out of commit tail") The first one changed the system memory framebuffer from Write-Combine to the default caching. Before the second commit, the mgag200 driver used to unmap the framebuffer after each frame, which implicitly does a cache flush. Both regressions are fixed by this commit, which restore WC mapping for the framebuffer in system memory, and add a cache flush. This is only needed on x86_64, for low-latency workload, so the new kconfig DRM_MGAG200_IOBURST_WORKAROUND depends on PREEMPT_RT and X86. For more context, the whole thread can be found here [1] Signed-off-by: Jocelyn Falempe <jfalempe@redhat.com> Link: https://lore.kernel.org/dri-devel/20231019135655.313759-1-jfalempe@redhat.com/ # 1 Acked-by: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patchwork.freedesktop.org/patch/msgid/20240208095125.377908-1-jfalempe@redhat.com
2026-04-24 17:42:27 -04:00 · 2024-02-08 10:51:10 +01:00
parent 0475184905
commit bfa4437fd3
3 changed files with 37 additions and 0 deletions
--- a/drivers/gpu/drm/mgag200/Kconfig
+++ b/drivers/gpu/drm/mgag200/Kconfig
@@ -11,3 +11,15 @@ config DRM_MGAG200
 	 MGA G200 desktop chips and the server variants. It requires 0.3.0
 	 of the modesetting userspace driver, and a version of mga driver
 	 that will fail on KMS enabled devices.
+
+config DRM_MGAG200_IOBURST_WORKAROUND
+	bool "Disable buffer caching"
+	depends on DRM_MGAG200 && PREEMPT_RT && X86
+	help
+	  Enable a workaround to avoid I/O bursts within the mgag200 driver at
+	  the expense of overall display performance.
+	  It restores the <v5.10 behavior, by mapping the framebuffer in system
+	  RAM as Write-Combining, and flushing the cache after each write.
+	  This is only useful on x86_64 if you want to run processes with
+	  deterministic latency.
+	  If unsure, say N.