drm/amdgpu: fix the issue of reserving bad pages failed

In amdgpu_ras_reset_gpu, because bad pages may not be freed,
it has high probability to reserve bad pages failed.

Change to reserve bad pages when freeing VRAM.

v2:
1. avoid allocating the drm_mm node outside of amdgpu_vram_mgr.c
2. move bad page reserving into amdgpu_ras_add_bad_pages, if vram mgr
   reserve bad page failed, it will put it into pending list, otherwise
   put it into processed list;
3. remove amdgpu_ras_release_bad_pages, because retired page's info has
   been moved into amdgpu_vram_mgr

v3:
1. formate code style;
2. rename amdgpu_vram_reserve_scope as amdgpu_vram_reservation;
3. rename scope_pending as reservations_pending;
4. rename scope_processed as reserved_pages;
5. change to iterate over all the pending ones and try to insert them
   with drm_mm_reserve_node();

v4:
1. rename amdgpu_vram_mgr_reserve_scope as
amdgpu_vram_mgr_reserve_range;
2. remove unused include "amdgpu_ras.h";
3. rename amdgpu_vram_mgr_check_and_reserve as
amdgpu_vram_mgr_do_reserve;
4. refine amdgpu_vram_mgr_reserve_range to call
amdgpu_vram_mgr_do_reserve.

Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Hawking Zhang <hawking.zhang@amd.com>
Signed-off-by: Dennis Li <Dennis.Li@amd.com>
Signed-off-by: Wenhui Sheng <Wenhui.Sheng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This commit is contained in:
Dennis Li
2020-10-22 17:44:55 +08:00
committed by Alex Deucher
parent 5eeb45934c
commit 676deb3877
4 changed files with 156 additions and 119 deletions

View File

@@ -41,10 +41,17 @@
#define AMDGPU_POISON 0xd0bed0be
struct amdgpu_vram_reservation {
struct list_head node;
struct drm_mm_node mm_node;
};
struct amdgpu_vram_mgr {
struct ttm_resource_manager manager;
struct drm_mm mm;
spinlock_t lock;
struct list_head reservations_pending;
struct list_head reserved_pages;
atomic64_t usage;
atomic64_t vis_usage;
};
@@ -122,6 +129,10 @@ void amdgpu_vram_mgr_free_sgt(struct amdgpu_device *adev,
struct sg_table *sgt);
uint64_t amdgpu_vram_mgr_usage(struct ttm_resource_manager *man);
uint64_t amdgpu_vram_mgr_vis_usage(struct ttm_resource_manager *man);
int amdgpu_vram_mgr_reserve_range(struct ttm_resource_manager *man,
uint64_t start, uint64_t size);
int amdgpu_vram_mgr_query_page_status(struct ttm_resource_manager *man,
uint64_t start);
int amdgpu_ttm_init(struct amdgpu_device *adev);
void amdgpu_ttm_late_init(struct amdgpu_device *adev);