We need kernel scheduling entities to deal with handle clean up
if apps are not cleaned up properly. With commit 56e449603f
("drm/sched: Convert the GPU scheduler to variable number of run-queues")
the scheduler entities have to be created after scheduler init, so
change the ordering to fix this.
v2: Leave logic in UVD and VCE code
Fixes: 56e449603f ("drm/sched: Convert the GPU scheduler to variable number of run-queues")
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Luben Tuikov <ltuikov89@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: ltuikov89@gmail.com
The kfd_resume needs to touch GC registers to enable the interrupts,
it needs to be done before GFXOFF is enabled to ensure that the GFX is
not off and GC registers can be touched. So move kfd_resume before the
amdgpu_device_ip_late_init which enables the CGPG/GFXOFF.
Signed-off-by: Tim Huang <Tim.Huang@amd.com>
Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
If a user has disabled GFXOFF this may cause problems for the suspend
sequence. Ensure that it is enabled in amdgpu_acpi_is_s0ix_active().
The system won't reach the deepest state but it also won't hang.
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Currently, job flow control is implemented simply by limiting the number
of jobs in flight. Therefore, a scheduler is initialized with a credit
limit that corresponds to the number of jobs which can be sent to the
hardware.
This implies that for each job, drivers need to account for the maximum
job size possible in order to not overflow the ring buffer.
However, there are drivers, such as Nouveau, where the job size has a
rather large range. For such drivers it can easily happen that job
submissions not even filling the ring by 1% can block subsequent
submissions, which, in the worst case, can lead to the ring run dry.
In order to overcome this issue, allow for tracking the actual job size
instead of the number of jobs. Therefore, add a field to track a job's
credit count, which represents the number of credits a job contributes
to the scheduler's credit limit.
Signed-off-by: Danilo Krummrich <dakr@redhat.com>
Reviewed-by: Luben Tuikov <ltuikov89@gmail.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231110001638.71750-1-dakr@redhat.com
amdgpu_kiq_wreg/rreg is hardcoded to use MEC engine 0.
Add an xcc_id parameter to amdgpu_kiq_wreg/rreg, define W/RREG32_XCC
and amdgpu_device_xcc_wreg/rreg to use the new xcc_id parameter.
Using amdgpu_sriov_runtime to determine whether to access via kiq or
RLC is sufficient for now.
v5: add condition in amdgpu_device_xcc_w/rreg, remove trace func call
v4: avoid using amdgpu_sriov_w/rreg
v3: use W/RREG32_XCC to handle non-kiq case
v2: define amdgpu_device_xcc_wreg/rreg instead of changing parameters
of amdgpu_device_wreg/rreg
Signed-off-by: Victor Lu <victorchengchi.lu@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
refine smu mca driver to support query ras error from pmfw path.
- correct gfx smu bank hwid (from mp5 to smu bank)
- retire unused callback function in amdgpu_mca_smu_funcs{}
- add new mca_bank_set{} structure to collect mca bank
- move enum mca_reg_idx into amdgpu_mca.h header
- add mca status register field decode macro
Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The following regs can only be programmed by the PF:
HDP_MISC_CNTL
HDP_NONSURFACE_BASE
HDP_NONSURFACE_BASE_HI
v2: update commit message
Signed-off-by: Victor Lu <victorchengchi.lu@amd.com>
Reviewed-by: Samir Dhume <samir.dhume@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
WREG32/RREG32_SOC15_IP_NO_KIQ and amdgpu_virt_kiq_reg_write_reg_wait
are not using the correct rlcg interface or mec engine, respectively.
Add xcc instance parameter to them.
v4: Use GET_INST and squash commit with:
"drm/amdgpu: Add xcc_inst param to amdgpu_virt_kiq_reg_write_reg_wait"
v3: xcc not needed for MMMHUB
v2: rebase
Signed-off-by: Victor Lu <victorchengchi.lu@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
set_xgmi_plpd_mode may be unsupported and this isn't error, no need to
print warning for it.
v2: add ret2 to save the status of psp_ras_trigger_error.
Suggested-by: lijo.lazar@amd.com
Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Pull more drm updates from Dave Airlie:
"Geert pointed out I missed the renesas reworks in my main pull, so
this pull contains the renesas next work for atomic conversion and DT
support.
It also contains a bunch of amdgpu and some small ssd13xx fixes.
renesas:
- atomic conversion
- DT support
ssd13xx:
- dt binding fix for ssd132x
- Initialize ssd130x crtc_state to NULL.
amdgpu:
- Fix RAS support check
- RAS fixes
- MES fixes
- SMU13 fixes
- Contiguous memory allocation fix
- BACO fixes
- GPU reset fixes
- Min power limit fixes
- GFX11 fixes
- USB4/TB hotplug fixes
- ARM regression fix
- GFX9.4.3 fixes
- KASAN/KCSAN stack size check fixes
- SR-IOV fixes
- SMU14 fixes
- PSP13 fixes
- Display blend fixes
- Flexible array size fixes
amdkfd:
- GPUVM fix
radeon:
- Flexible array size fixes"
* tag 'drm-next-2023-11-07' of git://anongit.freedesktop.org/drm/drm: (83 commits)
drm/amd/display: Enable fast update on blendTF change
drm/amd/display: Fix blend LUT programming
drm/amd/display: Program plane color setting correctly
drm/amdgpu: Query and report boot status
drm/amdgpu: Add psp v13 function to query boot status
drm/amd/swsmu: remove fw version check in sw_init.
drm/amd/swsmu: update smu v14_0_0 driver if and metrics table
drm/amdgpu: Add C2PMSG_109/126 reg field shift/masks
drm/amdgpu: Optimize the asic type fix code
drm/amdgpu: fix GRBM read timeout when do mes_self_test
drm/amdgpu: check recovery status of xgmi hive in ras_reset_error_count
drm/amd/pm: only check sriov vf flag once when creating hwmon sysfs
drm/amdgpu: Attach eviction fence on alloc
drm/amdkfd: Improve amdgpu_vm_handle_moved
drm/amd/display: Increase frame warning limit with KASAN or KCSAN in dml2
drm/amd/display: Avoid NULL dereference of timing generator
drm/amdkfd: Update cache info for GFX 9.4.3
drm/amdkfd: Populate cache info for GFX 9.4.3
drm/amdgpu: don't put MQDs in VRAM on ARM | ARM64
drm/amdgpu/smu13: drop compute workload workaround
...
Doorbell rptr/wptr can be set through multiple ways including direct
register initialization. Disable doorbell during hw_fini once the ring
is disabled so that during next module reload direct initialization
takes effect. Also, move the direct initialization after minor update is
set to 1 since rptr/wptr are reinitialized back to 0 which could be
lower than the previous doorbell value (ex: cases like module reload).
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Le Ma <le.ma@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Tested-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The purpose of this patch is to disable XNACK or set XNACK OFF mode
on SRIOV platform which doesn't support it.
This will prevent user-space application to fail or result into
unexpected behaviour whenever the application need to run test-case
in XNACK ON mode.
Signed-off-by: Surbhi Kakarya <surbhi.kakarya@amd.com>
Reviewed-by: Shaoyun Liu <shaoyun.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Use a proper MEID to make sure the CP_HQD_* and CP_GFX_HQD_* registers
can be touched when initialize the compute and gfx mqd in mes_self_test.
Otherwise, we expect no response from CP and an GRBM eventual timeout.
Signed-off-by: Tim Huang <Tim.Huang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Yifan Zhang <yifan1.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Instead of attaching the eviction fence when a KFD BO is first mapped,
attach it when it is allocated or imported. This in preparation to allow
KFD BOs to be mapped using the render node API.
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Let amdgpu_vm_handle_moved update all BO VA mappings of BOs reserved by
the caller. This will be useful for handling extra BO VA mappings in
KFD VMs that are managed through the render node API.
v2: rebase against drm_exec changes (Alex)
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Issues were reported with commit 1cfb4d6121
("drm/amdgpu: put MQDs in VRAM") on an ADLINK Ampere
Altra Developer Platform (AVA developer platform).
Various ARM systems seem to have problems related
to PCIe and MMIO access. In this case, I'm not sure
if this is specific to the ADLINK platform or ARM
in general. Seems to be some coherency issue with
VRAM. For now, just don't put MQDs in VRAM on ARM.
Link: https://lists.freedesktop.org/archives/amd-gfx/2023-October/100453.html
Fixes: 1cfb4d6121 ("drm/amdgpu: put MQDs in VRAM")
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: alexey.klimov@linaro.org
AMD dGPUs have integrated FW that runs as soon as the
device gets power and initializes the board (determines
the amount of memory, provides configuration details to
the driver, etc.). For direct PCIe attached cards this
happens as soon as power is applied and normally completes
well before the OS has even started loading. However, with
hotpluggable ports like USB4, the driver needs to wait for
this to complete before initializing the device.
This normally takes 60-100ms, but could take longer on
some older boards periodically due to memory training.
Retry for up to a second. In the non-hotplug case, there
should be no change in behavior and this should complete
on the first try.
v2: adjust test criteria
v3: adjust checks for the masks, only enable on removable devices
v4: skip bif_fb_en check
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2925
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
[Why] During suspend, if GFX DPM is enabled and GFXOFF feature is
enabled the system may get hung. So, it is suggested to disable
GFXOFF feature during suspend and enable it after resume.
[How] Update the code to disable GFXOFF feature during suspend and enable
it after resume.
[ 311.396526] amdgpu 0000:03:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x0000001E SMN_C2PMSG_82:0x00000000
[ 311.396530] amdgpu 0000:03:00.0: amdgpu: Fail to disable dpm features!
[ 311.396531] [drm:amdgpu_device_ip_suspend_phase2 [amdgpu]] *ERROR* suspend of IP block <smu> failed -62
Acked-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Kenneth Feng <kenneth.feng@amd.com>
Signed-off-by: Perry Yuan <perry.yuan@amd.com>
Signed-off-by: Kun Liu <kun.liu2@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
The ATRM ACPI method is for fetching the dGPU vbios rom
image on laptops and all-in-one systems. It should not be
used for external add in cards. If the dGPU is thunderbolt
connected, don't try ATRM.
v2: pci_is_thunderbolt_attached only works for Intel. Use
pdev->external_facing instead.
v3: dev_is_removable() seems to be what we want
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2925
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
GFX doorbell range should be set after flr otherwise the gfx doorbell
range will be overlap with MEC.
v2: remove "amdgpu_sriov_vf" and "amdgpu_in_reset" check, and add grbm
select for the case of 2 gfx rings.
Signed-off-by: Lin.Cao <lincao12@amd.com>
Acked-by: ZhenGuo Yin <zhenguo.yin@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Pull pci updates from Bjorn Helgaas:
"Enumeration:
- Use acpi_evaluate_dsm_typed() instead of open-coding _DSM
evaluation to learn device characteristics (Andy Shevchenko)
- Tidy multi-function header checks using new PCI_HEADER_TYPE_MASK
definition (Ilpo Järvinen)
- Simplify config access error checking in various drivers (Ilpo
Järvinen)
- Use pcie_capability_clear_word() (not
pcie_capability_clear_and_set_word()) when only clearing (Ilpo
Järvinen)
- Add pci_get_base_class() to simplify finding devices using base
class only (ignoring subclass and programming interface) (Sui
Jingfeng)
- Add pci_is_vga(), which includes ancient PCI_CLASS_NOT_DEFINED_VGA
devices from before the Class Code was added to PCI (Sui Jingfeng)
- Use pci_is_vga() for vgaarb, sysfs "boot_vga", virtio, qxl to
include ancient VGA devices (Sui Jingfeng)
Resource management:
- Make pci_assign_unassigned_resources() non-init because sparc uses
it after init (Randy Dunlap)
Driver binding:
- Retain .remove() and .probe() callbacks (previously __init) because
sysfs may cause them to be called later (Uwe Kleine-König)
- Prevent xHCI driver from claiming AMD VanGogh USB3 DRD device, so
it can be claimed by dwc3 instead (Vicki Pfau)
PCI device hotplug:
- Add Ampere Altra Attention Indicator extension driver for acpiphp
(D Scott Phillips)
Power management:
- Quirk VideoPropulsion Torrent QN16e with longer delay after reset
(Lukas Wunner)
- Prevent users from overriding drivers that say we shouldn't use
D3cold (Lukas Wunner)
- Avoid PME from D3hot/D3cold for AMD Rembrandt and Phoenix USB4
because wakeup interrupts from those states don't work if amd-pmc
has put the platform in a hardware sleep state (Mario Limonciello)
IOMMU:
- Disable ATS for Intel IPU E2000 devices with invalidation message
endianness erratum (Bartosz Pawlowski)
Error handling:
- Factor out interrupt enable/disable into helpers (Kai-Heng Feng)
Peer-to-peer DMA:
- Fix flexible-array usage in struct pci_p2pdma_pagemap in case we
ever use pagemaps with multiple entries (Gustavo A. R. Silva)
ASPM:
- Revert a change that broke when drivers disabled L1 and users later
enabled an L1.x substate via sysfs, and fix a similar issue when
users disabled L1 via sysfs (Heiner Kallweit)
Endpoint framework:
- Fix double free in __pci_epc_create() (Dan Carpenter)
- Use IS_ERR_OR_NULL() to simplify endpoint core (Ruan Jinjie)
Cadence PCIe controller driver:
- Drop unused "is_rc" member (Li Chen)
Freescale Layerscape PCIe controller driver:
- Enable 64-bit addressing in endpoint mode (Guanhua Gao)
Intel VMD host bridge driver:
- Fix multi-function header check (Ilpo Järvinen)
Microsoft Hyper-V host bridge driver:
- Annotate struct hv_dr_state with __counted_by (Kees Cook)
NVIDIA Tegra194 PCIe controller driver:
- Drop setting of LNKCAP_MLW (max link width) since dw_pcie_setup()
already does this via dw_pcie_link_set_max_link_width() (Yoshihiro
Shimoda)
Qualcomm PCIe controller driver:
- Use PCIE_SPEED2MBS_ENC() to simplify encoding of link speed
(Manivannan Sadhasivam)
- Add a .write_dbi2() callback so DBI2 register writes, e.g., for
setting the BAR size, work correctly (Manivannan Sadhasivam)
- Enable ASPM for platforms that use 1.9.0 ops, because the PCI core
doesn't enable ASPM states that haven't been enabled by the
firmware (Manivannan Sadhasivam)
Renesas R-Car Gen4 PCIe controller driver:
- Add DesignWare core support (set max link width, EDMA_UNROLL flag,
.pre_init(), .deinit(), etc) for use by R-Car Gen4 driver
(Yoshihiro Shimoda)
- Add driver and DT schema for DesignWare-based Renesas R-Car Gen4
controller in both host and endpoint mode (Yoshihiro Shimoda)
Xilinx NWL PCIe controller driver:
- Update ECAM size to support 256 buses (Thippeswamy Havalige)
- Stop setting bridge primary/secondary/subordinate bus numbers,
since PCI core does this (Thippeswamy Havalige)
Xilinx XDMA controller driver:
- Add driver and DT schema for Zynq UltraScale+ MPSoCs devices with
Xilinx XDMA Soft IP (Thippeswamy Havalige)
Miscellaneous:
- Use FIELD_GET()/FIELD_PREP() to simplify and reduce use of _SHIFT
macros (Ilpo Järvinen, Bjorn Helgaas)
- Remove logic_outb(), _outw(), outl() duplicate declarations (John
Sanpe)
- Replace unnecessary UTF-8 in Kconfig help text because menuconfig
doesn't render it correctly (Liu Song)"
* tag 'pci-v6.7-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: (102 commits)
PCI: qcom-ep: Add dedicated callback for writing to DBI2 registers
PCI: Simplify pcie_capability_clear_and_set_word() to ..._clear_word()
PCI: endpoint: Fix double free in __pci_epc_create()
PCI: xilinx-xdma: Add Xilinx XDMA Root Port driver
dt-bindings: PCI: xilinx-xdma: Add schemas for Xilinx XDMA PCIe Root Port Bridge
PCI: xilinx-cpm: Move IRQ definitions to a common header
PCI: xilinx-nwl: Modify ECAM size to enable support for 256 buses
PCI: xilinx-nwl: Rename the NWL_ECAM_VALUE_DEFAULT macro
dt-bindings: PCI: xilinx-nwl: Modify ECAM size in the DT example
PCI: xilinx-nwl: Remove redundant code that sets Type 1 header fields
PCI: hotplug: Add Ampere Altra Attention Indicator extension driver
PCI/AER: Factor out interrupt toggling into helpers
PCI: acpiphp: Allow built-in drivers for Attention Indicators
PCI/portdrv: Use FIELD_GET()
PCI/VC: Use FIELD_GET()
PCI/PTM: Use FIELD_GET()
PCI/PME: Use FIELD_GET()
PCI/ATS: Use FIELD_GET()
PCI/ATS: Show PASID Capability register width in bitmasks
PCI/ASPM: Fix L1 substate handling in aspm_attr_store_common()
...
In Xe, the new Intel GPU driver, a choice has made to have a 1 to 1
mapping between a drm_gpu_scheduler and drm_sched_entity. At first this
seems a bit odd but let us explain the reasoning below.
1. In Xe the submission order from multiple drm_sched_entity is not
guaranteed to be the same completion even if targeting the same hardware
engine. This is because in Xe we have a firmware scheduler, the GuC,
which allowed to reorder, timeslice, and preempt submissions. If a using
shared drm_gpu_scheduler across multiple drm_sched_entity, the TDR falls
apart as the TDR expects submission order == completion order. Using a
dedicated drm_gpu_scheduler per drm_sched_entity solve this problem.
2. In Xe submissions are done via programming a ring buffer (circular
buffer), a drm_gpu_scheduler provides a limit on number of jobs, if the
limit of number jobs is set to RING_SIZE / MAX_SIZE_PER_JOB we get flow
control on the ring for free.
A problem with this design is currently a drm_gpu_scheduler uses a
kthread for submission / job cleanup. This doesn't scale if a large
number of drm_gpu_scheduler are used. To work around the scaling issue,
use a worker rather than kthread for submission / job cleanup.
v2:
- (Rob Clark) Fix msm build
- Pass in run work queue
v3:
- (Boris) don't have loop in worker
v4:
- (Tvrtko) break out submit ready, stop, start helpers into own patch
v5:
- (Boris) default to ordered work queue
v6:
- (Luben / checkpatch) fix alignment in msm_ringbuffer.c
- (Luben) s/drm_sched_submit_queue/drm_sched_wqueue_enqueue
- (Luben) Update comment for drm_sched_wqueue_enqueue
- (Luben) Positive check for submit_wq in drm_sched_init
- (Luben) s/alloc_submit_wq/own_submit_wq
v7:
- (Luben) s/drm_sched_wqueue_enqueue/drm_sched_run_job_queue
v8:
- (Luben) Adjust var names / comments
Signed-off-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Luben Tuikov <luben.tuikov@amd.com>
Link: https://lore.kernel.org/r/20231031032439.1558703-3-matthew.brost@intel.com
Signed-off-by: Luben Tuikov <ltuikov89@gmail.com>