The BO move throttling code is designed to allow VRAM to fill quickly if it
is relatively empty. However, this does not take into account situations
where the visible VRAM is smaller than total VRAM, and total VRAM may not
be close to full but the visible VRAM segment is under pressure. In such
situations, visible VRAM would experience unrestricted swapping and
performance would drop.
Add a separate counter specifically for moves involving visible VRAM, and
check it before moving BOs there.
v2: Only perform calculations for separate counter if visible VRAM is
smaller than total VRAM. (Michel Dänzer)
v3: [Michel Dänzer]
* Use BO's location rather than the AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED
flag to determine whether to account a move for visible VRAM in most
cases.
* Use a single
if (adev->mc.visible_vram_size < adev->mc.real_vram_size) {
block in amdgpu_cs_get_threshold_for_moves.
Fixes: 95844d20ae (drm/amdgpu: throttle buffer migrations at CS using a fixed MBps limit (v2))
Signed-off-by: John Brooks <john@fastquake.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Rather than checking the CONGIG_MEMSIZE register as that may
not be reliable on some APUs.
v2: The scratch register is only used on CIK+
Reviewed-by: Junwei Zhang <Jerry.Zhang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Call nbio init registers on hw_init to set up any
nbio registers that need initialization at hw init time.
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Used for nbio registers that need to be initialized. Currently
only used for a golden setting that got missed on some boards.
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This allows us to write the mapped PTEs into
an IB instead of the table directly.
v2: fix build with debugfs enabled, remove unused assignment
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The arrays pctl0_data and pctl1_data do not need to be in global scope,
so them both static.
Cleans up sparse warnings:
symbol 'pctl0_data' was not declared. Should it be static?
symbol 'pctl1_data' was not declared. Should it be static?
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
In previous case, driver can't enable psp via the kernel parameter for raven.
We should open this path and set it as direct by default till psp firmware
loading is workable.
Signed-off-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Junwei Zhang <Jerry.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
nbio hdp flush routine are called within atomic context.
Avoid use KIQ when write to the HDP_MEM_COHERENCY_FLUSH_CNTL register
since this register has its own VF copy
Signed-off-by: Shaoyun Liu <Shaoyun.Liu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
for SR-IOV, we must keep the pipeline-sync in the protection
of COND_EXEC, otherwise the command consumed by CPG is not
consistent when world switch triggerd, e.g.:
world switch hit and the IB frame is skipped so the fence
won't signal, thus CP will jump to the next DMAframe's pipeline-sync
command, and it will make CP hang foever.
after pipelin-sync moved into COND_EXEC the consistency can be
guaranteed
Signed-off-by: Monk Liu <Monk.Liu@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Now that we use a pointer to the scratch reg start offset,
most of the functions were duplicated.
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
ttm_place are not supposed to change at runtime. All functions
working with ttm_place provided by <drm/ttm/ttm_placement.h> work
with const ttm_place. So mark the non-const structs as const.
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Overwriting still used ring content has a low probability to cause
problems, not writing at all has 100% probability to cause problems.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>