This allows us to filter out VM faults in the GMC code.
v2: don't filter out all faults
v3: fix copy&paste typo, send all IV to the KFD, don't change message level
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This allows user mode to map doorbell pages into GPUVM address space.
That way GPUs can submit to user mode queues (self-dispatch).
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This is used for interoperability between ROCm compute and graphics
APIs. It allows importing graphics driver BOs into the ROCm SVM
address space for zero-copy GPU access.
The API is split into two steps (query and import) to allow user mode
to manage the virtual address space allocation for the imported buffer.
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
We don't want KFD processes evicting each other over VRAM usage.
Therefore prevent overcommitting VRAM among KFD applications with
a per-GPU limit. Also leave enough room for page tables on top
of the application memory usage.
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Reviewed-by: Eric Huang <JinHuiEric.Huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Avoid including mmu_context.h in amdgpu_amdkfd.h since that may be
included in other header files that define traces. This leads to
conflicts due to traces defined in other headers included via
mmu_context.h.
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[WHY] clarify dal input parameters to pplib interface, remove
un-used parameters. dal knows exactly which parameters needed
and their effects at pplib and smu sides.
current dal sequence for dcn1_update_clock to pplib:
1.smu10_display_clock_voltage_request for dcefclk
2.smu10_display_clock_voltage_request for fclk
3.phm_store_dal_configuration_data {
set_min_deep_sleep_dcfclk
set_active_display_count
store_cc6_data --- this data never be referenced
new sequence will be:
1. set_display_count --- need add new pplib interface
2. set_min_deep_sleep_dcfclk -- new pplib interface
3. set_hard_min_dcfclk_by_freq
4. set_hard_min_fclk_by_freq
after this code refactor, smu10_display_clock_voltage_request,
phm_store_dal_configuration_data will not be needed for rv.
[HOW] step 1: add new functions at pplib interface
step 2: add new functions at amdgpu dm and dc
Signed-off-by: hersen wu <hersenxs.wu@amd.com>
Reviewed-by: Rex Zhu <Rex.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
amdgpu_ring_soft_recovery would have Call-Trace,
when s_fence->parent was NULL inside amdgpu_job_timedout.
Check fence first, as drm_sched_hw_job_reset did.
Signed-off-by: Wentao Lou <Wentao.Lou@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
PSP only support VMR ring for SRIOV vf since v45 and all commands will
be send to VMR ring for executing.
VMR ring use C2PMSG 101 ~ 103 instead of C2PMSG 64 ~ 71.
Signed-off-by: Xiangliang Yu <Xiangliang.Yu@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
If PSP FW is running already, driver will not load PSP FW again and skip
it. So psp fw version is not correct if reading it from FW binary file,
need to get right version from register.
Signed-off-by: Xiangliang Yu <Xiangliang.Yu@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
It is perfectly possible that the BO list is created before the BO is
exported. While at it clean up setting shared to one instead of true.
v2: add comment and simplify logic
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Huang Rui <ray.huang@amd.com>
Acked-by: Junwei Zhang <Jerry.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
For Picasso && AM4 SOCKET board, we use picasso_rlc_am4.bin
For Picasso && FP5 SOCKET board, we use picasso_rlc.bin
Judgment method:
PCO AM4: revision >= 0xC8 && revision <= 0xCF
or revision >= 0xD8 && revision <= 0xDF
otherwise is PCO FP5
Signed-off-by: Aaron Liu <aaron.liu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher at amd.com>
Reviewed-by: Huang Rui <ray.huang at amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Check if the MC firmware supports FFC and tell the SMC so
mclk switching is handled properly.
Reviewed-by: Evan Quan <evan.quan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
According to Dave, once you've started an L2T flush, all L2T accesses
will be blocked until the flush completes. This fixes a consistent
3-4ms stall between the ioctl and running the job, and 3DMMES Taiji
goes from 27fps to 110fps.
v2: Leave a note about why we don't need to wait for completion.
Signed-off-by: Eric Anholt <eric@anholt.net>
Fixes: 57692c94dc ("drm/v3d: Introduce a new DRM driver for Broadcom V3D V3.x+")
Reviewed-by: Dave Emett <david.emett@broadcom.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20181203222438.25417-4-eric@anholt.net
Adding an extra MI_STORE_DWORD_IMM to the gpu relocation path for gen3
was good, but still not good enough. To survive 24+ hours under test we
needed to perform not one, not two but three extra store-dw. Doing so
for each GPU relocation was a little unsightly and since we need to
worry about userspace hitting the same issues, we should apply the dummy
store-dw into the EMIT_FLUSH.
Fixes: 7dd4f6729f ("drm/i915: Async GPU relocation processing")
References: 7fa28e1469 ("drm/i915: Write GPU relocs harder with gen3")
Testcase: igt/gem_tiled_fence_blits # blb/pnv
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20181207134037.11848-1-chris@chris-wilson.co.uk
Although commit fb6f0b64e4 ("drm/i915: Prevent machine hang from
Broxton's vtd w/a and error capture") applied cleanly after a 24 month
hiatus, the code had moved on with new methods for peeking and fetching
the captured gpu info. Make sure we catch all uses of the stashed error
state and avoid dereferencing the error pointer.
v2: Move error pointer determination into i915_gpu_capture_state
v3: Restore early check to avoid capturing and then throwing away
subsequent GPU error states.
Fixes: fb6f0b64e4 ("drm/i915: Prevent machine hang from Broxton's vtd w/a and error capture")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20181207110554.19897-1-chris@chris-wilson.co.uk
Currently we face a severe problem on Braswell that manifests as invalid
ppGTT accesses. The code tries to maintain the PDP (page directory
pointers) inside the context in two ways, direct write into the context
and a pipelined LRI update. The direct write into the context is
fundamentally racy as it is unserialised with any access (read or write)
the GPU is doing. By asserting that Braswell is not used with vGPU
(currently an unsupported platform) we can eliminate the dangerous
direct write into the context image and solely use the pipelined update.
However, the LRI of the PDP fouls up the GPU, causing it to freeze and
take out the machine with "forcewake ack timeouts". This seems possible
to workaround by preventing the GPU from sleeping (via means of
disabling the power-state management interface, i.e. forcing each ring
to remain awake) around the update. Equally, it seems an EMIT_INVALIDATE
before the LRI is sufficient to prevent the forcewake errors.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108656
References: https://bugs.freedesktop.org/show_bug.cgi?id=108714
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20181207090213.14352-3-chris@chris-wilson.co.uk
For a lot of use cases we need 64bit sequence numbers. Currently drivers
overload the dma_fence structure to store the additional bits.
Stop doing that and make the sequence number in the dma_fence always
64bit.
For compatibility with hardware which can do only 32bit sequences the
comparisons in __dma_fence_is_later only takes the lower 32bits as significant
when the upper 32bits are all zero.
v2: change the logic in __dma_fence_is_later
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Chunming Zhou <david1.zhou@amd.com>
Link: https://patchwork.freedesktop.org/patch/266927/
Recently gvt shadow ctx create ppgtt table and this ppgtt's root
pointer is modified at workload dispatch, then we lose the original
ppgtt's root pointer, this causes the ppgtt destroy function abnormal
as it will release the wrong root table.
This patch save i915 context ppgtt root pointer at shadow
ctx creation and restore it at shadow ctx destruction.
v2: Split save and restore function (Zhenyu)
Fixes:4f15665ccbba("drm/i915: Add ppgtt to GVT GEM context")
Signed-off-by: Xiong Zhang <xiong.y.zhang@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Host print below warning message when creating guest:
"gvt: vgpu(2) Invalid FORCE_NONPRIV write 83a8".
Register 0x83a8 should be in force-to-nonpriv whitelist as required by
guest
v2: update commit message to describe purpose of this patch in detail
(zhenyu wang)
Signed-off-by: Zhao Yan <yan.y.zhao@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
Final changes to drm-misc-next for v4.21:
UAPI Changes:
Core Changes:
- Add dma_fence_get_stub to dma-buf, and use it in drm/syncobj.
- Add and use DRM_MODESET_LOCK_BEGIN/END helpers.
- Small fixes to drm_atomic_helper_resume(), drm_mode_setcrtc() and
drm_atomic_helper_commit_duplicated_state()
- Fix drm_atomic_state_helper.[c] extraction.
Driver Changes:
- Small fixes to tinydrm, vkms, meson, rcar-du, virtio, vkms,
v3d, and pl111.
- vc4: Allow scaling and YUV formats on cursor planes.
- v3d: Enable use of the Texture Formatting Unit, and fix
prime imports of buffers from other drivers.
- Add support for the AUO G101EVN010 panel.
- sun4i: Enable support for the H6 display engine.
Signed-off-by: Dave Airlie <airlied@redhat.com>
[airlied: added drm/v3d: fix broken build to the merge commit]
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/321be9d3-ab75-5f92-8193-e5113662edef@linux.intel.com
We can move the remaining RCS workarounds applied to only gen8 to the
engine->wa_list, and then reduce all engine->init_hw callbacks to common
code. The benefit of using the new wa_list is that we verify that the
registers are indeed restored and keep their magic values.
v2: INSTPM_FORCE_ORDERING is already part of gen8_ctx_workarounds, and
as confirmed by the mmio verification is a part of the context image!
v3: MI_MODE is already part of gen8_ctx_workarounds...
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20181206180713.6827-2-chris@chris-wilson.co.uk
If the SOR is already up and running when the kernel driver is probed,
setting a mode will typically fail. This can be seen for example on
Jetson TX2. Under certain circumstances the generic power domain code
will cause the SOR to be reset. However, if the power domain is never
powered off (this can happen if the HDA controller is enabled, which
is part of the same power domain as the SOR), then the SOR will end up
not getting reset and fail to properly set a mode.
To work around this, try to get the reset control and assert/deassert
it, irrespective of whether or not a generic power domain is attached
to the SOR. On platforms where the kernel implements generic power
domains (up to Tegra210) this will fail, because the power domain will
already have acquired an exclusive reference to the reset control. But
on recent platforms there the BPMP provides an ABI to control power
domains, it's possible to acquire the reset control from SOR and use
it to put the SOR into a known good state at probe time.
The proper solution for this is to make the SOR driver capable of
dealing with hardware that's already up and running (by first grace-
fully shutting it down, or perhaps by seamlessly transitioning to the
kernel driver and taking over the running display configuration). That
is fairly involved, though, so we'll go with this quickfix for now.
Signed-off-by: Thierry Reding <treding@nvidia.com>
Remove the temporary workaround of storing the Tegra186 HDMI/DP I/O pad
ID in the SOR driver. The definition has long been available in the
soc/tegra/pmc.h header file.
Signed-off-by: Thierry Reding <treding@nvidia.com>
At enable/disable of the HDCP encryption, for encryption status change
we need minimum one frame duration. And we might program this bit any
point(start/End) in the previous frame.
With 20mSec, observed the timeout for change in encryption status.
Since this is not time critical operation and we need to hold on
until the status is changed, fixing the timeout to 50mSec. (Based on
trial and error method!)
v2:
%s/TIME_FOR_ENCRYPT_STATUS_CHANGE/ENCRYPT_STATUS_CHANGE_TIMEOUT_MS
[Sean Paul]
Signed-off-by: Ramalingam C <ramalingam.c@intel.com>
Reviewed-by: Sean Paul <sean@poorly.run>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/1544010283-20223-5-git-send-email-ramalingam.c@intel.com