mirror of
https://github.com/torvalds/linux.git
synced 2026-04-20 07:43:57 -04:00
Pull drm updates from Dave Airlie:
"Highlights:
- amdgpu support for lots of new IP blocks which means newer GPUs
- xe has a lot of SR-IOV and SVM improvements
- lots of intel display refactoring across i915/xe
- msm has more support for gen8 platforms
- Given up on kgdb/kms integration, it's too hard on modern hw
core:
- drop kgdb support
- replace system workqueue with percpu
- account for property blobs in memcg
- MAINTAINERS updates for xe + buddy
rust:
- Fix documentation for Registration constructors
- Use pin_init::zeroed() for fops initialization
- Annotate DRM helpers with __rust_helper
- Improve safety documentation for gem::Object::new()
- Update AlwaysRefCounted imports
- mm: Prevent integer overflow in page_align()
atomic:
- add drm_device pointer to drm_private_obj
- introduce gamma/degamma LUT size check
buddy:
- fix free_trees memory leak
- prevent BUG_ON
bridge:
- introduce drm_bridge_unplug/enter/exit
- add connector argument to .hpd_notify
- lots of refcounting conversions
- convert rockchip inno hdmi to bridge
- lontium-lt9611uxc: switch to HDMI audio helpers
- dw-hdmi-qp: add support for HPD-less setups
- Algoltek AG6311 support
panels:
- edp: CSW MNE007QB3-1, AUO B140HAN06.4, AUO B140QAX01.H
- st7571: add SPI support
- Sitronix ST7920, Samsung LTL106HL02
- LG LH546WF1-ED01, HannStar HSD156J
- BOE NV130WUM-T08
- Innolux G150XGE-L05
- Anbernic RG-DS
dma-buf:
- improve sg_table debugging
- add tracepoints
- call clear_page instead of memset
- start to introduce cgroup memory accounting in heaps
- remove sysfs stats
dma-fence:
- add new helpers
dp:
- mst: avoid oob access with vcpi=0
hdmi:
- limit infoframes exposure to userspace
gem:
- reduce page table overhead with THP
- fix leak in drm_gem_get_unmapped_area
gpuvm:
- API sanitation for rust bindings
sched:
- introduce new helpers
panic:
- report invalid panic modes
- add kunit tests
i915/xe display:
- Expose sharpness only if num_scalers is >= 2
- Add initial Xe3P_LPD for NVL
- BMG FBC support
- Add MTL+ platforms to support dpll framework
- fix DIMM_S DRAM decoding on ICL
- Return to using AUX interrupts
- PSR/Panel replay refactoring
- use consolidated HDMI tables
- Xe3_LPD CD2X divider changes
xe:
- vfio: add vfio_pci for intel GPU
- multi queue support
- dynamic pagemaps and multi-device SVM
- expose temp attribs in hwmon
- NO_COMPRESSION bo flag
- expose MERT OA unit
- sysfs survivability refactor
- SRIOV PF: add MERT support
- enable SR-IOV VF migration
- Enable I2C/NVM on Crescent Island
- Xe3p page reclamation support
- introduce SRIOV scheduler groups
- add SoC remapper support in system controller
- insert compiler barriers in GuC code
- define NVL GuC firmware
- handle GT resume failure
- fix drm scheduler layering violations
- enable GSC loading and PXP for PTL
- disable GuC Power DCC strategy on PTL
- unregister drm device on probe error
i915:
- move to kernel standard fault injection
- bump recommended GuC version for DG2 and MTL
amdgpu:
- SMUIO 15.x, PSP 15.x support
- IH 6.1.1/7.1 support
- MMHUB 3.4/4.2 support
- GC 11.5.4/12.1 support
- SDMA 6.1.4/7.1/7.11.4 support
- JPEG 5.3 support
- UserQ updates
- GC 9 gfx queue reset support
- TTM memory ops parallelization
- convert legacy logging to new helpers
- DC analog fixes
amdkfd:
- GC 11.5.4/12.1 support
- SDMA 6.1.4/7.1 support
- per context support
- increase kfd process hash table
- Reserved SDMA rework
radeon:
- convert legacy logging to new helpers
- use devm for i2c adapters
msm:
- GPU:
  - Document a612/RGMU dt bindings
  - UBWC 6.0 support (for A840 / Kaanapali)
  - a225 support
- DPU:
  - Switch to use virtual planes by default
  - Fix DSI CMD panels on DPU 3.x
  - Rewrite format handling to remove intermediate representation
  - Fix watchdog on DPU 8.x+
  - Fix TE / Vsync source setting on DPU 8.x+
  - Add 3D_Mux on SC7280
  - Kaanapali platform support
  - Fix UBWC register programming
  - Make RM reserve DSPP-enabled mixers for CRTCs with LMs
  - Gamma correction support
- DP:
  - Enable support for eDP 1.4+ link rate tables
  - Fix MDSS1 DP indices on SA8775P, making them work
  - Fix msm_dp_ctrl_config_msa() to work with LLVM 20
- DSI:
  - Document QCS8300 as compatible with SA8775P
  - Kaanapali platform support
- DSI PHY:
  - switch to divider_determine_rate()
- MDP5:
  - Drop support for MSM8998, SDM660 and SDM630 (switch over to DPU)
- MDSS:
  - Kaanapali platform support
  - Fixed UBWC register programming
nova-core:
- Prepare for Turing support. This includes parsing and handling
Turing-specific firmware headers and sections as well as a Turing
Falcon HAL implementation
- Get rid of the Result<impl PinInit<T, E>> anti-pattern
- Relocate initializer-specific code into the appropriate initializer
- Use CStr::from_bytes_until_nul() to remove custom helpers
- Improve handling of unexpected firmware values
- Clean up redundant debug prints
- Replace c_str!() with native Rust C-string literals
- Update nova-core task list
nova:
- Align GEM object size to system page size
tyr:
- Use generated uAPI bindings for GpuInfo
- Replace manual sleeps with read_poll_timeout()
- Replace c_str!() with native Rust C-string literals
- Suppress warnings for unread fields
- Fix incorrect register name in print statement
nouveau:
- fix big page table support races in PTE management
- improve reclocking on tegra 186+
amdxdna:
- fix suspend race conditions
- improve handling of zero tail pointers
- fix cu_idx overwritten during command setup
- enable hardware context priority
- remove NPU2 support
- update message buffer allocation requirements
- update firmware version check
ast:
- support imported cursor buffers
- big endian fixes
etnaviv:
- add PPU flop reset support
imagination:
- add AM62P support
- introduce hw version checks
ivpu:
- implement warm boot flow
panfrost:
- add bo sync ioctl
- add GPU_PM_RT support for RZ/G3E SoC
panthor:
- add bo sync ioctl
- enable timestamp propagation
- scheduler robustness improvements
- VM termination fixes
- huge page support
rockchip:
- RK3368 HDMI Support
- get rid of atomic_check fixups
- RK3506 support
- RK3576/RK3588 improved HPD handling
rz-du:
- RZ/V2H(P) MIPI-DSI Support
v3d:
- fix DMA segment size
- convert to new logging helpers
mediatek:
- move DP training to hotplug thread
- convert logging to new helpers
- add support for HS speed DSI
- Genio 510/700/1200-EVK, Radxa NIO-12L HDMI support
atmel-hlcdc:
- switch to drmm resource
- support nomodeset
- use newer helpers
hisilicon:
- fix various DP bugs
renesas:
- fix kernel panic on reboot
exynos:
- fix vidi_connection_ioctl using wrong device
- fix vidi_connection deref user ptr
- fix concurrency regression with vidi_context
vkms:
- add configfs support for display configuration"
* tag 'drm-next-2026-02-11' of https://gitlab.freedesktop.org/drm/kernel: (1610 commits)
drm/xe/pm: Disable D3Cold for BMG only on specific platforms
drm/xe: Fix kerneldoc for xe_tlb_inval_job_alloc_dep
drm/xe: Fix kerneldoc for xe_gt_tlb_inval_init_early
drm/xe: Fix kerneldoc for xe_migrate_exec_queue
drm/xe/query: Fix topology query pointer advance
drm/xe/guc: Fix kernel-doc warning in GuC scheduler ABI header
drm/xe/guc: Fix CFI violation in debugfs access.
accel/amdxdna: Move RPM resume into job run function
accel/amdxdna: Fix incorrect DPM level after suspend/resume
nouveau/vmm: start tracking if the LPT PTE is valid. (v6)
nouveau/vmm: increase size of vmm pte tracker struct to u32 (v2)
nouveau/vmm: rewrite pte tracker using a struct and bitfields.
accel/amdxdna: Fix incorrect error code returned for failed chain command
accel/amdxdna: Remove hardware context status
drm/bridge: imx8qxp-pixel-combiner: Fix bailout for imx8qxp_pc_bridge_probe()
drm/panel: ilitek-ili9882t: Remove duplicate initializers in tianma_il79900a_dsc
drm/i915/display: fix the pixel normalization handling for xe3p_lpd
drm/exynos: vidi: use ctx->lock to protect struct vidi_context member variables related to memory alloc/free
drm/exynos: vidi: fix to avoid directly dereferencing user pointer
drm/exynos: vidi: use priv->vidi_dev for ctx lookup in vidi_connection_ioctl()
...
460 lines
14 KiB
C
// SPDX-License-Identifier: MIT
/*
 * Copyright © 2025 Intel Corporation
 */

#include "xe_survivability_mode.h"
#include "xe_survivability_mode_types.h"

#include <linux/kobject.h>
#include <linux/pci.h>
#include <linux/sysfs.h>

#include "xe_configfs.h"
#include "xe_device.h"
#include "xe_heci_gsc.h"
#include "xe_i2c.h"
#include "xe_mmio.h"
#include "xe_nvm.h"
#include "xe_pcode_api.h"
#include "xe_vsec.h"

/**
 * DOC: Survivability Mode
 *
 * Survivability Mode is a software based workflow for recovering a system in a failed boot state.
 * Here system recoverability is concerned with recovering the firmware responsible for boot.
 *
 * Boot Survivability
 * ===================
 *
 * Boot Survivability is implemented by loading the driver with the bare minimum (no drm card) to
 * allow the firmware to be flashed through the mei driver and telemetry to be collected. The
 * driver's probe flow is modified such that it enters survivability mode when pcode
 * initialization is incomplete and boot status denotes a failure.
 *
 * Survivability mode can also be entered manually using the survivability mode attribute available
 * through configfs, which is beneficial in several use cases. It can be used to address scenarios
 * where pcode does not detect failure or for validation purposes. It can also be used in
 * In-Field-Repair (IFR) to repair a single card without impacting the other cards in a node.
 *
 * Use the below command to enable survivability mode manually::
 *
 *	# echo 1 > /sys/kernel/config/xe/0000:03:00.0/survivability_mode
 *
 * It is the responsibility of the user to clear the mode once firmware flash is complete.
 *
 * Refer to :ref:`xe_configfs` for more details on how to use configfs.
 *
 * Survivability mode is indicated by the below admin-only readable sysfs entry. It
 * provides information about the type of survivability mode (Boot/Runtime).
 *
 * .. code-block:: shell
 *
 *	# cat /sys/bus/pci/devices/<device>/survivability_mode
 *	  Boot
 *
 *
 * Any additional debug information, if present, will be visible under the directory
 * ``survivability_info``::
 *
 *	/sys/bus/pci/devices/<device>/survivability_info/
 *	├── aux_info0
 *	├── aux_info1
 *	├── aux_info2
 *	├── aux_info3
 *	├── aux_info4
 *	├── capability_info
 *	├── fdo_mode
 *	├── postcode_trace
 *	└── postcode_trace_overflow
 *
 * This directory has the following attributes:
 *
 * - ``capability_info`` : Indicates Boot status and support for additional information
 *
 * - ``postcode_trace``, ``postcode_trace_overflow`` : Each postcode is an 8-bit value and
 *   represents a boot failure event. When a new failure event is logged by PCODE the
 *   existing postcodes are shifted left. These entries provide a history of 8 postcodes.
 *
 * - ``aux_info<n>`` : Some failures have additional debug information
 *
 * - ``fdo_mode`` : To allow recovery in scenarios where MEI itself fails, a new SPI Flash
 *   Descriptor Override (FDO) mode is added in v2 survivability breadcrumbs. This mode is enabled
 *   by PCODE and provides the ability to directly update the firmware via the SPI driver without
 *   any dependency on MEI. Xe KMD initializes the nvm aux driver if FDO mode is enabled.
 *
 * Runtime Survivability
 * =====================
 *
 * Certain runtime firmware errors can cause the device to enter a wedged state
 * (:ref:`xe-device-wedging`) requiring a firmware flash to restore normal operation.
 * Runtime Survivability Mode indicates that a firmware flash is necessary to recover the device
 * and is indicated by the presence of the survivability mode sysfs entry.
 * The survivability mode sysfs entry provides information about the type of survivability mode.
 *
 * .. code-block:: shell
 *
 *	# cat /sys/bus/pci/devices/<device>/survivability_mode
 *	  Runtime
 *
 * When such errors occur, userspace is notified with the drm device wedged uevent and runtime
 * survivability mode. The user can then initiate a firmware flash using userspace tools like
 * fwupd to restore the device to normal operation.
 */

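The postcode history described in the comment above can be unpacked in userspace once the two attribute values have been read. The following is a minimal sketch, not part of the driver. It assumes that each 32-bit value packs four 8-bit postcodes, that the newest postcode sits in the lowest byte of `postcode_trace`, and that older entries spill into `postcode_trace_overflow`; the comment only says that postcodes are shifted left, so this byte order is an assumption. The helper name `decode_postcodes` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define POSTCODES_PER_REG 4	/* four 8-bit postcodes per 32-bit value */
#define POSTCODE_HISTORY  8	/* history depth stated in the DOC comment */

/*
 * Hypothetical helper: split the values read from postcode_trace and
 * postcode_trace_overflow into individual postcodes.
 * ASSUMPTION: out[0] (lowest byte of trace) is the newest entry and
 * out[7] (highest byte of overflow) is the oldest.
 */
void decode_postcodes(uint32_t trace, uint32_t overflow,
		      uint8_t out[POSTCODE_HISTORY])
{
	size_t i;

	assert(out != NULL);

	for (i = 0; i < POSTCODES_PER_REG; i++) {
		out[i] = (trace >> (8 * i)) & 0xff;
		out[i + POSTCODES_PER_REG] = (overflow >> (8 * i)) & 0xff;
	}
}
```

A caller would parse the hexadecimal strings printed by the two sysfs attributes (for example with `strtoul(buf, NULL, 16)`) and pass the results to the helper.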
static const char * const reg_map[] = {
	[CAPABILITY_INFO] = "Capability Info",
	[POSTCODE_TRACE] = "Postcode trace",
	[POSTCODE_TRACE_OVERFLOW] = "Postcode trace overflow",
	[AUX_INFO0] = "Auxiliary Info 0",
	[AUX_INFO1] = "Auxiliary Info 1",
	[AUX_INFO2] = "Auxiliary Info 2",
	[AUX_INFO3] = "Auxiliary Info 3",
	[AUX_INFO4] = "Auxiliary Info 4",
};

#define FDO_INFO	(MAX_SCRATCH_REG + 1)

struct xe_survivability_attribute {
	struct device_attribute attr;
	u8 index;
};

static struct
xe_survivability_attribute *dev_attr_to_survivability_attr(struct device_attribute *attr)
{
	return container_of(attr, struct xe_survivability_attribute, attr);
}

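`dev_attr_to_survivability_attr()` relies on the `container_of()` pattern: the show callback receives a pointer to the embedded `device_attribute`, and the enclosing wrapper is recovered by subtracting the member's offset. The standalone userspace sketch below illustrates that idea only. `my_container_of`, `fake_attr`, `fake_surv_attr` and `index_from_attr` are hypothetical stand-ins, and the kernel's real `container_of()` performs additional type checking.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the kernel macro: pointer minus member offset. */
#define my_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-in for struct device_attribute. */
struct fake_attr {
	const char *name;
};

/* Hypothetical stand-in for struct xe_survivability_attribute. */
struct fake_surv_attr {
	struct fake_attr attr;
	uint8_t index;
};

/* Recover the wrapper from the embedded attribute and return its index. */
uint8_t index_from_attr(struct fake_attr *attr)
{
	struct fake_surv_attr *sa;

	assert(attr != NULL);

	sa = my_container_of(attr, struct fake_surv_attr, attr);
	return sa->index;
}
```

Embedding the generic attribute inside a wrapper lets one show callback serve every register attribute, with the per-attribute index carried alongside it.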
static void set_survivability_info(struct xe_mmio *mmio, u32 *info, int id)
{
	info[id] = xe_mmio_read32(mmio, PCODE_SCRATCH(id));
}

static void populate_survivability_info(struct xe_device *xe)
{
	struct xe_survivability *survivability = &xe->survivability;
	u32 *info = survivability->info;
	struct xe_mmio *mmio;
	u32 id = 0, reg_value;

	mmio = xe_root_tile_mmio(xe);
	set_survivability_info(mmio, info, CAPABILITY_INFO);
	reg_value = info[CAPABILITY_INFO];

	survivability->version = REG_FIELD_GET(BREADCRUMB_VERSION, reg_value);
	/* FDO mode is exposed only from version 2 */
	if (survivability->version >= 2)
		survivability->fdo_mode = REG_FIELD_GET(FDO_MODE, reg_value);

	if (reg_value & HISTORY_TRACKING) {
		set_survivability_info(mmio, info, POSTCODE_TRACE);

		if (reg_value & OVERFLOW_SUPPORT)
			set_survivability_info(mmio, info, POSTCODE_TRACE_OVERFLOW);
	}

	/* Traverse the linked list of aux info registers */
	if (reg_value & AUXINFO_SUPPORT) {
		for (id = REG_FIELD_GET(AUXINFO_REG_OFFSET, reg_value);
		     id >= AUX_INFO0 && id < MAX_SCRATCH_REG;
		     id = REG_FIELD_GET(AUXINFO_HISTORY_OFFSET, info[id]))
			set_survivability_info(mmio, info, id);
	}
}

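The aux info traversal in `populate_survivability_info()` treats the scratch registers as a linked list: the capability register names the first aux register, and each aux register names the next one. The sketch below simulates that walk over a plain array. The bit layout (`NEXT_SHIFT`, `NEXT_MASK`), the enum values and the function names are assumptions made for illustration, not the hardware definitions, and the step bound is an addition of this sketch that the driver loop does not have.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register indices, loosely mirroring enum scratch_reg. */
enum { CAP_INFO, PC_TRACE, PC_TRACE_OVF, AUX0, AUX1, AUX2, AUX3, AUX4, MAX_REG };

/* ASSUMED layout: bits 7:4 hold the index of the next aux register. */
#define NEXT_SHIFT 4
#define NEXT_MASK  0xfu

static unsigned int next_index(uint32_t reg)
{
	return (reg >> NEXT_SHIFT) & NEXT_MASK;
}

/*
 * Follow the chain whose head is stored in hw[CAP_INFO], copying every
 * visited register into info[]. Returns the number of registers copied.
 * The walk stops when the next index falls outside the aux range; the
 * extra "copied < MAX_REG" bound guards this sketch against cycles.
 */
size_t walk_aux_chain(const uint32_t hw[MAX_REG], uint32_t info[MAX_REG])
{
	size_t copied = 0;
	unsigned int id;

	assert(hw != NULL && info != NULL);

	for (id = next_index(hw[CAP_INFO]);
	     id >= AUX0 && id < MAX_REG && copied < MAX_REG;
	     id = next_index(info[id])) {
		info[id] = hw[id];
		copied++;
	}

	return copied;
}
```

Registers that are never linked into the chain stay zero in `info[]`, which fits how the sysfs visibility callback later hides empty entries.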
static void log_survivability_info(struct pci_dev *pdev)
{
	struct xe_device *xe = pdev_to_xe_device(pdev);
	struct xe_survivability *survivability = &xe->survivability;
	u32 *info = survivability->info;
	int id;

	dev_info(&pdev->dev, "Survivability Boot Status : Critical Failure (%d)\n",
		 survivability->boot_status);
	for (id = 0; id < MAX_SCRATCH_REG; id++) {
		if (info[id])
			dev_info(&pdev->dev, "%s: 0x%x\n", reg_map[id], info[id]);
	}
}

static int check_boot_failure(struct xe_device *xe)
{
	struct xe_survivability *survivability = &xe->survivability;

	return survivability->boot_status == NON_CRITICAL_FAILURE ||
		survivability->boot_status == CRITICAL_FAILURE;
}

static ssize_t survivability_mode_show(struct device *dev,
				       struct device_attribute *attr, char *buff)
{
	struct pci_dev *pdev = to_pci_dev(dev);
	struct xe_device *xe = pdev_to_xe_device(pdev);
	struct xe_survivability *survivability = &xe->survivability;

	return sysfs_emit(buff, "%s\n", survivability->type ? "Runtime" : "Boot");
}

static DEVICE_ATTR_ADMIN_RO(survivability_mode);

static ssize_t survivability_info_show(struct device *dev,
				       struct device_attribute *attr, char *buff)
{
	struct xe_survivability_attribute *sa = dev_attr_to_survivability_attr(attr);
	struct pci_dev *pdev = to_pci_dev(dev);
	struct xe_device *xe = pdev_to_xe_device(pdev);
	struct xe_survivability *survivability = &xe->survivability;
	u32 *info = survivability->info;

	if (sa->index == FDO_INFO)
		return sysfs_emit(buff, "%s\n", str_enabled_disabled(survivability->fdo_mode));

	return sysfs_emit(buff, "0x%x\n", info[sa->index]);
}

#define SURVIVABILITY_ATTR_RO(name, _index)					\
	struct xe_survivability_attribute attr_##name = {			\
		.attr = __ATTR(name, 0400, survivability_info_show, NULL),	\
		.index = _index,						\
	}

static SURVIVABILITY_ATTR_RO(capability_info, CAPABILITY_INFO);
static SURVIVABILITY_ATTR_RO(postcode_trace, POSTCODE_TRACE);
static SURVIVABILITY_ATTR_RO(postcode_trace_overflow, POSTCODE_TRACE_OVERFLOW);
static SURVIVABILITY_ATTR_RO(aux_info0, AUX_INFO0);
static SURVIVABILITY_ATTR_RO(aux_info1, AUX_INFO1);
static SURVIVABILITY_ATTR_RO(aux_info2, AUX_INFO2);
static SURVIVABILITY_ATTR_RO(aux_info3, AUX_INFO3);
static SURVIVABILITY_ATTR_RO(aux_info4, AUX_INFO4);
static SURVIVABILITY_ATTR_RO(fdo_mode, FDO_INFO);

static void xe_survivability_mode_fini(void *arg)
{
	struct xe_device *xe = arg;
	struct pci_dev *pdev = to_pci_dev(xe->drm.dev);
	struct device *dev = &pdev->dev;

	device_remove_file(dev, &dev_attr_survivability_mode);
}

static umode_t survivability_info_attrs_visible(struct kobject *kobj, struct attribute *attr,
						int idx)
{
	struct xe_device *xe = kdev_to_xe_device(kobj_to_dev(kobj));
	struct xe_survivability *survivability = &xe->survivability;
	u32 *info = survivability->info;

	/*
	 * Last index in survivability_info_attrs is fdo mode and is applicable only in
	 * version 2 of survivability mode
	 */
	if (idx == MAX_SCRATCH_REG && survivability->version >= 2)
		return 0400;

	if (idx < MAX_SCRATCH_REG && info[idx])
		return 0400;

	return 0;
}

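The visibility rule in `survivability_info_attrs_visible()` can be read as a pure function of the attribute index, the breadcrumb version and the cached register values. The sketch below restates it in standalone form so the cases are easy to check. `N_SCRATCH` is a hypothetical stand-in for `MAX_SCRATCH_REG` (assumed to be 8 here), and `attr_mode` is a name invented for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define N_SCRATCH 8	/* ASSUMED value standing in for MAX_SCRATCH_REG */

/*
 * Return the sysfs mode for the attribute at position idx:
 *  - the entry after the scratch registers (fdo_mode) is shown only
 *    for breadcrumb version 2 and later;
 *  - a scratch register entry is shown only if its cached value is
 *    non-zero;
 *  - everything else is hidden (mode 0).
 */
unsigned int attr_mode(int idx, unsigned int version,
		       const uint32_t info[N_SCRATCH])
{
	assert(info != NULL);

	if (idx == N_SCRATCH && version >= 2)
		return 0400;

	if (idx >= 0 && idx < N_SCRATCH && info[idx])
		return 0400;

	return 0;
}
```

Returning 0 from an `is_visible` callback is how an attribute group omits a file entirely, so empty registers never show up as zero-valued entries.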
/* Attributes are ordered according to enum scratch_reg */
static struct attribute *survivability_info_attrs[] = {
	&attr_capability_info.attr.attr,
	&attr_postcode_trace.attr.attr,
	&attr_postcode_trace_overflow.attr.attr,
	&attr_aux_info0.attr.attr,
	&attr_aux_info1.attr.attr,
	&attr_aux_info2.attr.attr,
	&attr_aux_info3.attr.attr,
	&attr_aux_info4.attr.attr,
	&attr_fdo_mode.attr.attr,
	NULL,
};

static const struct attribute_group survivability_info_group = {
	.name = "survivability_info",
	.attrs = survivability_info_attrs,
	.is_visible = survivability_info_attrs_visible,
};

static int create_survivability_sysfs(struct pci_dev *pdev)
{
	struct device *dev = &pdev->dev;
	struct xe_device *xe = pdev_to_xe_device(pdev);
	int ret;

	ret = device_create_file(dev, &dev_attr_survivability_mode);
	if (ret) {
		dev_warn(dev, "Failed to create survivability sysfs files\n");
		return ret;
	}

	ret = devm_add_action_or_reset(xe->drm.dev,
				       xe_survivability_mode_fini, xe);
	if (ret)
		return ret;

	if (check_boot_failure(xe)) {
		ret = devm_device_add_group(dev, &survivability_info_group);
		if (ret)
			return ret;
	}

	return 0;
}

static int enable_boot_survivability_mode(struct pci_dev *pdev)
{
	struct device *dev = &pdev->dev;
	struct xe_device *xe = pdev_to_xe_device(pdev);
	struct xe_survivability *survivability = &xe->survivability;
	int ret = 0;

	ret = create_survivability_sysfs(pdev);
	if (ret)
		return ret;

	/* Make sure xe_heci_gsc_init() and xe_i2c_probe() are aware of survivability */
	survivability->mode = true;

	xe_heci_gsc_init(xe);

	xe_vsec_init(xe);

	if (survivability->fdo_mode) {
		ret = xe_nvm_init(xe);
		if (ret)
			goto err;
	}

	ret = xe_i2c_probe(xe);
	if (ret)
		goto err;

	dev_err(dev, "In Survivability Mode\n");

	return 0;

err:
	dev_err(dev, "Failed to enable Survivability Mode\n");
	survivability->mode = false;
	return ret;
}

/**
 * xe_survivability_mode_is_boot_enabled - check if boot survivability mode is enabled
 * @xe: xe device instance
 *
 * Return: true if survivability mode is enabled and is of type boot, false otherwise.
 */
bool xe_survivability_mode_is_boot_enabled(struct xe_device *xe)
{
	struct xe_survivability *survivability = &xe->survivability;

	return survivability->mode && survivability->type == XE_SURVIVABILITY_TYPE_BOOT;
}

/**
 * xe_survivability_mode_is_requested - check if it's possible to enable survivability
 *					mode that was requested by firmware or userspace
 * @xe: xe device instance
 *
 * This function reads configfs and boot status from Pcode.
 *
 * Return: true if platform support is available and boot status indicates
 * failure or if survivability mode is requested, false otherwise.
 */
bool xe_survivability_mode_is_requested(struct xe_device *xe)
{
	struct xe_survivability *survivability = &xe->survivability;
	struct xe_mmio *mmio = xe_root_tile_mmio(xe);
	struct pci_dev *pdev = to_pci_dev(xe->drm.dev);
	u32 data;
	bool survivability_mode;

	if (!IS_DGFX(xe) || IS_SRIOV_VF(xe) || xe->info.platform < XE_BATTLEMAGE)
		return false;

	survivability_mode = xe_configfs_get_survivability_mode(pdev);
	/* Enable survivability mode if set via configfs */
	if (survivability_mode)
		return true;

	data = xe_mmio_read32(mmio, PCODE_SCRATCH(0));
	survivability->boot_status = REG_FIELD_GET(BOOT_STATUS, data);

	return check_boot_failure(xe);
}

/**
 * xe_survivability_mode_runtime_enable - Initialize and enable runtime survivability mode
 * @xe: xe device instance
 *
 * Initialize survivability information and enable runtime survivability mode.
 * Runtime survivability mode is enabled when certain errors cause the device to be
 * in non-recoverable state. The device is declared wedged with the appropriate
 * recovery method and survivability mode sysfs exposed to userspace
 *
 * Return: 0 if runtime survivability mode is enabled, negative error code otherwise.
 */
int xe_survivability_mode_runtime_enable(struct xe_device *xe)
{
	struct xe_survivability *survivability = &xe->survivability;
	struct pci_dev *pdev = to_pci_dev(xe->drm.dev);
	int ret;

	if (!IS_DGFX(xe) || IS_SRIOV_VF(xe) || xe->info.platform < XE_BATTLEMAGE) {
		dev_err(&pdev->dev, "Runtime Survivability Mode not supported\n");
		return -EINVAL;
	}

	populate_survivability_info(xe);

	ret = create_survivability_sysfs(pdev);
	if (ret)
		dev_err(&pdev->dev, "Failed to create survivability mode sysfs\n");

	survivability->type = XE_SURVIVABILITY_TYPE_RUNTIME;
	dev_err(&pdev->dev, "Runtime Survivability mode enabled\n");

	xe_device_set_wedged_method(xe, DRM_WEDGE_RECOVERY_VENDOR);
	xe_device_declare_wedged(xe);
	dev_err(&pdev->dev, "Firmware flash required, Please refer to the userspace documentation for more details!\n");

	return 0;
}

/**
 * xe_survivability_mode_boot_enable - Initialize and enable boot survivability mode
 * @xe: xe device instance
 *
 * Initialize survivability information and enable boot survivability mode
 *
 * Return: 0 if boot survivability mode is enabled or not requested, negative error
 * code otherwise.
 */
int xe_survivability_mode_boot_enable(struct xe_device *xe)
{
	struct xe_survivability *survivability = &xe->survivability;
	struct pci_dev *pdev = to_pci_dev(xe->drm.dev);

	if (!xe_survivability_mode_is_requested(xe))
		return 0;

	populate_survivability_info(xe);

	/*
	 * v2 supports survivability mode for critical errors
	 */
	if (survivability->version < 2 && survivability->boot_status == CRITICAL_FAILURE) {
		log_survivability_info(pdev);
		return -ENXIO;
	}

	survivability->type = XE_SURVIVABILITY_TYPE_BOOT;

	return enable_boot_survivability_mode(pdev);
}