linux

mirror of https://github.com/torvalds/linux.git synced 2026-04-23 17:15:46 -04:00

Author	SHA1	Message	Date
Nirmoy Das	7546a8201b	Revert "drm/xe/lnl: Offload system clear page activity to GPU" This optimization relied on having to clear CCS on allocations. If there is no need to clear CCS on allocations then this would mostly help in reducing CPU utilization. Revert this patch at this moment because of: 1 Currently Xe can't do clear on free and using a invalid ttm flag, TTM_TT_FLAG_CLEARED_ON_FREE which could poison global ttm pool on multi-device setup. 2 Also for LNL CPU:WB doesn't require clearing CCS as such BO will not be allowed to bind with compression PTE. Subsequent patch will disable clearing CCS for CPU:WB BOs for LNL. This reverts commit `2368306180`. Cc: Christian König <christian.koenig@amd.com> Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Nirmoy Das <nirmoy.das@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240828083635.23601-1-nirmoy.das@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>	2024-08-28 06:45:52 -07:00
Nirmoy Das	2368306180	drm/xe/lnl: Offload system clear page activity to GPU On LNL because of flat CCS, driver creates migrates job to clear CCS meta data. Extend that to also clear system pages using GPU. Inform TTM to allocate pages without __GFP_ZERO to avoid double page clearing by clearing out TTM_TT_FLAG_ZERO_ALLOC flag and set TTM_TT_FLAG_CLEARED_ON_FREE while freeing to skip ttm pool's clear on free as XE now takes care of clearing pages. If a bo is in system placement such as BO created with DRM_XE_GEM_CREATE_FLAG_DEFER_BACKING and there is a cpu map then for such BO gpu clear will be avoided as there is no dma mapping for such BO at that moment to create migration jobs. Tested this patch api_overhead_benchmark_l0 from https://github.com/intel/compute-benchmarks Without the patch: api_overhead_benchmark_l0 --testFilter=UsmMemoryAllocation: UsmMemoryAllocation(api=l0 type=Host size=4KB) 84.206 us UsmMemoryAllocation(api=l0 type=Host size=1GB) 105775.56 us erf tool top 5 entries: 71.44% api_overhead_be [kernel.kallsyms] [k] clear_page_erms 6.34% api_overhead_be [kernel.kallsyms] [k] __pageblock_pfn_to_page 2.24% api_overhead_be [kernel.kallsyms] [k] cpa_flush 2.15% api_overhead_be [kernel.kallsyms] [k] pages_are_mergeable 1.94% api_overhead_be [kernel.kallsyms] [k] find_next_iomem_res With the patch: api_overhead_benchmark_l0 --testFilter=UsmMemoryAllocation: UsmMemoryAllocation(api=l0 type=Host size=4KB) 79.439 us UsmMemoryAllocation(api=l0 type=Host size=1GB) 98677.75 us Perf tool top 5 entries: 11.16% api_overhead_be [kernel.kallsyms] [k] __pageblock_pfn_to_page 7.85% api_overhead_be [kernel.kallsyms] [k] cpa_flush 7.59% api_overhead_be [kernel.kallsyms] [k] find_next_iomem_res 7.24% api_overhead_be [kernel.kallsyms] [k] pages_are_mergeable 5.53% api_overhead_be [kernel.kallsyms] [k] lookup_address_in_pgd_attr Without this patch clear_page_erms() dominates execution time which is also not pipelined with migration jobs. With this patch page clearing will get pipelined with migration job and will free CPU for more work. v2: Handle regression on dgfx(Himal) Update commit message as no ttm API changes needed. v3: Fix Kunit test. v4: handle data leak on cpu mmap(Thomas) v5: s/gpu_page_clear/gpu_page_clear_sys and move setting it to xe_ttm_sys_mgr_init() and other nits (Matt Auld) v6: Disable it when init_on_alloc and/or init_on_free is active(Matt) Use compute-benchmarks as reporter used it to report this allocation latency issue also a proper test application than mime. In v5, the test showed significant reduction in alloc latency but that is not the case any more, I think this was mostly because previous test was done on IFWI which had low mem BW from CPU. Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240816135154.19678-2-nirmoy.das@intel.com Signed-off-by: Nirmoy Das <nirmoy.das@intel.com>	2024-08-19 17:49:00 +02:00
Rodrigo Vivi	5b2b3a0fbb	drm/xe: Runtime PM wake on every debugfs call Let's ensure our PCI device is awaken on every debugfs call. Let's increase the runtime_pm protection and start moving that to the outer bounds. Also let's remove the mem_access_{get,put} from where they are not needed anymore. Reviewed-by: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240222163937.138342-7-rodrigo.vivi@intel.com	2024-02-26 09:06:45 -05:00
José Roberto de Souza	a121594006	drm/xe: Limit the system memory size to half of the system memory ttm_global_init() imposes this limitation. Cc: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Signed-off-by: José Roberto de Souza <jose.souza@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2023-12-19 18:31:42 -05:00
Chang, Bruce	1a545ed74b	drm/xe: fix pvc unload issue Currently, unload pvc driver will generate a null dereference and the call stack is as below. [ 4850.618000] Call Trace: [ 4850.620740] <TASK> [ 4850.623134] ttm_bo_cleanup_memtype_use+0x3f/0x50 [ttm] [ 4850.628661] ttm_bo_release+0x154/0x2c0 [ttm] [ 4850.633317] ? drm_buddy_fini+0x62/0x80 [drm_buddy] [ 4850.638487] ? __kmem_cache_free+0x27d/0x2c0 [ 4850.643054] ttm_bo_put+0x38/0x60 [ttm] [ 4850.647190] xe_gem_object_free+0x1f/0x30 [xe] [ 4850.651945] drm_gem_object_free+0x1e/0x30 [drm] [ 4850.656904] ggtt_fini_noalloc+0x9d/0xe0 [xe] [ 4850.661574] drm_managed_release+0xb5/0x150 [drm] [ 4850.666617] drm_dev_release+0x30/0x50 [drm] [ 4850.671209] devm_drm_dev_init_release+0x3c/0x60 [drm] There are a couple issues, but the main one is due to TTM has only one TTM_PL_TT region, but since pvc has 2 tiles and tries to setup 1 TTM_PL_TT each tile. The second will overwrite the first one. During unload time, the first tile will reset the TTM_PL_TT manger and when the second tile is trying to free Bo and it will generate the null reference since the TTM manage is already got reset to 0. The fix is to use one global TTM_PL_TT manager. v2: make gtt mgr global and change the name to sys_mgr Cc: Stuart Summers <stuart.summers@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Vivi, Rodrigo <rodrigo.vivi@intel.com> Signed-off-by: Bruce Chang <yu.bruce.chang@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2023-12-19 18:31:30 -05:00

5 Commits