This looks unnecessary and actually extremely harmful since using kmap()
is not possible while inside the ring reset.
Remove all the extra mapping and unmapping of the MQDs.
v2: also fix debugfs
v3: fix coding style typo
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
In the case of parsing GFX deferred error from SMU corrected error
channel, the error count should be set to 1 instead of parsing from
MISC0 register, which is 0.
Signed-off-by: Xiang Liu <xiang.liu@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
In the case of injecting uncorrected error with background workload,
the deferred error among uncorrected errors need to be specified
by checking the deferred and poison bits of status register.
v2: refine checking for deferred error
v2: log possiable DEs among CEs
v2: generate CPER records for DEs among UEs
Signed-off-by: Xiang Liu <xiang.liu@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
For gfx_v9_4_3 specifically, before regGB_ADDR_CONFIG is overwritten
in gfx hw_init it is read out to popluate the gb_addr_config_fields
in the sw_init stage, which causes mismatch.
Fix it by using the golden value in sw_init as well.
v2: This is a driver-set golden reg and keep as it is (Lijo)
Signed-off-by: Shiwu Zhang <shiwu.zhang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
In the case of poison inband log, the error type need to be specified
by checking the deferred or poison bit of status register.
v2: check both deferred and poison bit
Signed-off-by: Xiang Liu <xiang.liu@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
ACA error types managed by driver a direct 1:1
correspondence with those managed by firmware.
To address this, for each ACA bank, include
both the ACA error type and the ACA SMU type.
This addition is useful for creating CPER records.
Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Yang Wang <keivnyang.wang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Remove unnecessary IP version checks for GFX 9.4.3 and similar variants.
Wrap checks inside meaningful function.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Users might switch to ROCM build does not include MEC SJT FW and driver
needs to consider this case.w
Signed-off-by: yfeng1 <yfeng1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Some of the firmware that is loaded by amdgpu is not actually required.
For example the ISP firmware on some SoCs is optional, and if it's not
present the ISP IP block just won't be initialized.
The firmware loader core however will show a warning when this happens
like this:
```
Direct firmware load for amdgpu/isp_4_1_0.bin failed with error -2
```
To avoid confusion for non-required firmware, adjust the amd-ucode helper
to take an extra argument indicating if the firmware is required or
optional.
On optional firmware use firmware_request_nowarn() instead of
request_firmware() to avoid the warnings.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Link: https://lore.kernel.org/amd-gfx/df71d375-7abd-4b32-97ce-15e57846eed8@amd.com/T/#t
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Pass ip_block instead of adev in set_clockgating_state() callback
functions. Modify set_clockgating_state()for all correspoding ip blocks.
v2: remove all changes for is_idle(), remove type casting
Signed-off-by: Boyuan Zhang <boyuan.zhang@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
There's no need to perform the two MMIO writes in the KIQ
Setting registers programmed period, and reducing the MMIO
writes will save the driver loading time.
Signed-off-by: Prike Liang <Prike.Liang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Use second jump table in sriov for live migration or mulitple VF
support so different VF can load different version of MEC as long
as they support sjt
Signed-off-by: Victor Zhao <Victor.Zhao@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Add two sysfs interfaces for gfx and compute:
gfx_reset_mask
compute_reset_mask
These interfaces are read-only and show the resets supported by the IP.
For example, full adapter reset (mode1/mode2/BACO/etc),
soft reset, queue reset, and pipe reset.
V2: the sysfs node returns a text string instead of some flags (Christian)
v3: add a generic helper which takes the ring as parameter
and print the strings in the order they are applied (Christian)
check amdgpu_gpu_recovery before creating sysfs file itself,
and initialize supported_reset_types in IP version files (Lijo)
v4: Fixing uninitialized variables (Tim)
Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
Suggested-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Tim Huang <tim.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Optimize gfx_v9_4_3_ring_insert_nop() to call
optimized version of amdgpu_ring_insert_nop
instead of calling amdgpu_ring_write for number
of nop times.
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Update the *handle to amdgpu_ip_block ptr for all
functions pointers of hw_fini.
Also update the ip_block ptr where ever needed as
there were cyclic dependency of hw_fini on suspend
and some followed clean up.
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Update the *handle to amdgpu_ip_block ptr for all
functions pointers of hw_init.
Also update the ip_block ptr where ever needed as
there were cyclic dependency of hw_init on resume.
v2: squash in isp fix
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Need to make sure it's halted as we don't know what state
the GPU may have been left in previously.
Tested-by: Amber Lin <Amber.Lin@amd.com>
Acked-by: Amber Lin <Amber.Lin@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Need to set the pipe reset and cache invalidation bits
on halt otherwise we can get stale state if the CP firmware
changes (e.g., on module unload and reload).
Tested-by: Amber Lin <Amber.Lin@amd.com>
Reviewed-by: Amber Lin <Amber.Lin@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Drop wrapper used in one place. gfx_v9_4_3_xcc_cp_enable()
is used in one place. gfx_v9_4_3_xcc_cp_compute_enable()
is used everywhere else.
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Implement the compute pipe reset, and the driver will
fallback to pipe reset when queue reset fails.
The pipe reset only deactivates the queue which is
scheduled in the pipe, and meanwhile the MEC pipe
will be reset to the firmware _start pointer. So,
it seems pipe reset will cost more cycles than the
queue reset; therefore, the driver tries to recover
by doing queue reset first.
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Prike Liang <Prike.Liang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This commit applies isolation enforcement to the GFX and Compute rings
in the gfx_v9_4_3 module.
The commit sets `amdgpu_gfx_enforce_isolation_ring_begin_use` and
`amdgpu_gfx_enforce_isolation_ring_end_use` as the functions to be
called when a ring begins and ends its use, respectively.
`amdgpu_gfx_enforce_isolation_ring_begin_use` is called when a ring
begins its use. This function cancels any scheduled
`enforce_isolation_work` and, if necessary, signals the Kernel Fusion
Driver (KFD) to stop the runqueue.
`amdgpu_gfx_enforce_isolation_ring_end_use` is called when a ring ends
its use. This function schedules `enforce_isolation_work` to be run
after a delay.
These functions are part of the Enforce Isolation Handler, which
enforces shader isolation on AMD GPUs to prevent data leakage between
different processes.
The commit also includes a check for the type of the ring. If the type
of the ring is `AMDGPU_RING_TYPE_COMPUTE`, the `xcp_id` of the
`enforce_isolation` structure in the `gfx` structure of the
`amdgpu_device` is set to the `xcp_id` of the ring. This ensures that
the correct `xcp_id` is used when enforcing isolation on compute rings.
The `xcp_id` is an identifier for an XCP partition, and different rings
can be associated with different XCP partitions.
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
This commit extends the cleaner shader feature to support GFX9.4.4
hardware.
The cleaner shader feature is used to clear or initialize certain GPU
resources, such as Local Data Share (LDS), Vector General Purpose
Registers (VGPRs), and Scalar General Purpose Registers (SGPRs). This
operation needs to be performed in isolation, while no other tasks
should be running on the GPU at the same time.
Previously, the cleaner shader feature was implemented for GFX9.4.3
hardware. This commit adds support for GFX9.4.4 hardware by allowing the
cleaner shader to be used with this hardware version.
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>