Various DCE versions had trouble with 36 bpp lb depth, requiring fixes,
last time in commit 353ca0fa56 ("drm/amd/display: Fix 10bit 4K display
on CIK GPUs") for DCE-8. So far >= DCE-11.2 was considered ok, but now I
found out that on DCE-11.2 it causes dithering when there shouldn't be
any, so identity pixel passthrough with identity gamma LUTs doesn't work
when it should. This breaks various important neuroscience applications,
as reported to me by scientific users of Polaris cards under Ubuntu 22.04
with Linux 5.15, and confirmed by testing it myself on DCE-11.2.
Lets only use depth 36 for DCN engines, where my testing showed that it
is both necessary for high color precision output, e.g., RGBA16 fb's,
and not harmful, as far as more than one year in real-world use showed.
DCE engines seem to work fine for high precision output at 30 bpp, so
this ("famous last words") depth 30 should hopefully fix all known problems
without introducing new ones.
Successfully retested on DCE-11.2 Polaris and DCN-1.0 Raven Ridge on
top of Linux 5.19.0-rc2 + drm-next.
Fixes: 353ca0fa56 ("drm/amd/display: Fix 10bit 4K display on CIK GPUs")
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Tested-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Cc: stable@vger.kernel.org # 5.14.0
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[why]
First MST sideband message returns AUX_RET_ERROR_HPD_DISCON
on certain intel platform. Aux transaction considered failure
if HPD unexpected pulled low. The actual aux transaction success
in such case, hence do not return error.
[how]
Not returning error when AUX_RET_ERROR_HPD_DISCON detected
on the first sideband message.
v2: squash in additional DMI entries
v3: squash in static fix
Signed-off-by: Fangzhi Zuo <Jerry.Zuo@amd.com>
Acked-by: Solomon Chiu <solomon.chiu@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
The indentation is screwed up. I'm not sure quite how the logic
should flow. Someone more familiar with this code should
verify this.
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Check the value of per_pixel_alpha to decide whether the Coverage pixel
blend mode is applicable or not.
Fixes: 76818cdd11 ("drm/amd/display: add Coverage blend mode for overlay plane")
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Melissa Wen <mwen@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
divide error: 0000 [#1] SMP PTI
CPU: 3 PID: 78925 Comm: tee Not tainted 5.15.50-1-lts #1
Hardware name: MSI MS-7A59/Z270 SLI PLUS (MS-7A59), BIOS 1.90 01/30/2018
RIP: 0010:smu_v11_0_set_fan_speed_rpm+0x11/0x110 [amdgpu]
Speed is user-configurable through a file.
I accidentally set it to zero, and the driver crashed.
Reviewed-by: Evan Quan <evan.quan@amd.com>
Reviewed-by: André Almeida <andrealmeid@igalia.com>
Signed-off-by: Yefim Barashkin <mr.b34r@kolabnow.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Various DCE versions had trouble with 36 bpp lb depth, requiring fixes,
last time in commit 353ca0fa56 ("drm/amd/display: Fix 10bit 4K display
on CIK GPUs") for DCE-8. So far >= DCE-11.2 was considered ok, but now I
found out that on DCE-11.2 it causes dithering when there shouldn't be
any, so identity pixel passthrough with identity gamma LUTs doesn't work
when it should. This breaks various important neuroscience applications,
as reported to me by scientific users of Polaris cards under Ubuntu 22.04
with Linux 5.15, and confirmed by testing it myself on DCE-11.2.
Lets only use depth 36 for DCN engines, where my testing showed that it
is both necessary for high color precision output, e.g., RGBA16 fb's,
and not harmful, as far as more than one year in real-world use showed.
DCE engines seem to work fine for high precision output at 30 bpp, so
this ("famous last words") depth 30 should hopefully fix all known problems
without introducing new ones.
Successfully retested on DCE-11.2 Polaris and DCN-1.0 Raven Ridge on
top of Linux 5.19.0-rc2 + drm-next.
Fixes: 353ca0fa56 ("drm/amd/display: Fix 10bit 4K display on CIK GPUs")
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Tested-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Cc: stable@vger.kernel.org # 5.14.0
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This version brings along following fixes:
- Fixes for MST, MPO, PSRSU, DP 2.0, Freesync and others
- Add register offsets of NBI and DCN.
- Improvement of ALPM
- Removing assert statement for Linux DM
- Re-implementing ARGB16161616 pixel format
Acked-by: Solomon Chiu <solomon.chiu@amd.com>
Signed-off-by: Aric Cyr <aric.cyr@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
With single display odm 2:1 policy, when moving windowed MPO across
the display, we experience a momentary lag when we move between the
centre of the display and the right half of the display. This is
caused by the MPO pipe being reallocated when it crosses this
boundary
[How]
Handle two cases:
1. if the head pipe has a MPO pipe already allocated in the old
context, then use that pipe if it is available in the current
context
2. if the head pipe is on the left side, check the right side to
see if it has a MPO pipe already allocated. If so, don't use
that pipe if it is selected as the idle pipe in the current
context
Add new function pointer called .acquire_idle_pipe_for_head_pipe
that will pass in the head pipe and handle case 1
Add find_idle_secondary_pipe_check_mpo() to handle case 2
if we don't hit case 1.
In dc_add_plane_to_context(), start with head pipe and check
case 1 and 2 in call acquire_free_pipe_for_head().
If we are on the right side of the display, check case 1
again by passing in right side pipe as the new head in
call acquire_free_pipe_for_head().
Reviewed-by: Dmytro Laktyushkin <Dmytro.Laktyushkin@amd.com>
Reviewed-by: Ariel Bernstein <Eric.Bernstein@amd.com>
Acked-by: Solomon Chiu <solomon.chiu@amd.com>
Signed-off-by: Samson Tam <Samson.Tam@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why&How]
Add a field to store the DCN IP offset for use with runtime offset
calculation
This offset is indexed using reg*_BASE_IDX for the corresponding
group of registers. For example, address of DIG_BE_CNTL instance 0 is
calculated like: dcn_reg_offsets[regDIG0_DIG_BE_CNTL_BASE_IDX] +
regDIG0_DIG_BE_CNTL.
{dcn,nbio}_reg_offsets are used only for the ASICs for which runtime
initializaion of offsets are enabled through the modified SR* macros
that contain an additional REG_STRUCT element in the macro definition.
DCN3.5+ will fail dc_create() if {dcn,nbio}_reg_offsets are null. They
are applicable starting with DCN32/321 and are not used for ASICs
upstreamed before them. ASICs before DCN32/321 will not contain any
computation that involves {dcn,nbio}_reg_offsets. For them, the
address/offset computation is done during compile time.
This is evident from the BASE_INNER definition for compile time vs run
time initialization:
Compile time init: #define BASE_INNER(seg) DCN_BASE__INST0_SEG ## seg
Run time init: #define BASE_INNER(seg) ctx->dcn_reg_offsets[seg]
BASE_INNER macro is local to each dcnxx_resource.c and hence different
ASICs can have either runtime or compile time initialization of offsets.
The computation of offset is done for registers all at once during
driver load and hence it does not introduce any performance overhead
during normal operation.
Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com>
Acked-by: Solomon Chiu <solomon.chiu@amd.com>
Signed-off-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
ABGR16161616 colour format was added to dcn10/20/30, and set
any ARGB16161616 to the same value as it (26). As such, the
HDR10 Green Point y value was too far off of the EDID stated
value for DisplayPort.
[How]
Added back the pixel format as 22 for ARGB16161616 for
dcn10/20/30.
Reviewed-by: Reza Amini <reza.amini@amd.com>
Acked-by: Solomon Chiu <solomon.chiu@amd.com>
Signed-off-by: Ethan Wellenreiter <Ethan.Wellenreiter@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[why]
Unbounded request logic in resource/DML has some issues where
unbounded request is being enabled incorrectly. SW today enables
unbounded request unconditionally in hardware, on the assumption
that HW can always support it in single pipe scenarios.
This worked until now because the same assumption is made in DML.
A new DML update is needed to fix a bug, where there are single
pipe scenarios where unbounded cannot be enabled, and this change
in DML needs to be ported in, and dcn32 resource logic fixed.
[how]
First, dcn32_resource should program unbounded req in HW according
to unbounded req enablement output from DML, as opposed to DML input
Second, port in DML1 update which disables unbounded req in some
scenarios to fix an issue with poor stutter performance
Reviewed-by: Nevenko Stupar <Nevenko.Stupar@amd.com>
Reviewed-by: Dmytro Laktyushkin <Dmytro.Laktyushkin@amd.com>
Acked-by: Solomon Chiu <solomon.chiu@amd.com>
Signed-off-by: Jun Lei <jun.lei@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Move reset_context out of gpu recover function to make it configurable
for different reset purpose.
For the reset way of call gpu_recovery sysfs, force to use full reset
method. Otherwise, try soft reset by default if the related ASIC
supportted, if soft reset failed, will use full reset.
Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Support SDMA soft reset for SDMA v6.
V3: use ib test to check soft reset.
V4: squash in unused variable fix (Alex)
Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
Status flags definition is reduced to read
less bytes in SCDC transaction for status update.
[How]
Reduce definition of reserved bytes from 3 to 1
for status update.
Reviewed-by: Charlene Liu <Charlene.Liu@amd.com>
Acked-by: Solomon Chiu <solomon.chiu@amd.com>
Signed-off-by: Chris Park <chris.park@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
On the GC 10.3.7 platform the initial MEC release version #3 can support
atomic operation,so need correct and set its MEC atomic support version to #3.
Signed-off-by: Prike Liang <Prike.Liang@amd.com>
Reviewed-by: Aaron Liu <aaron.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[why]
Ideally link capability should be independent from the link
configuration that we decide to use in enable link. Otherwise if link
capability is changed after validation has completed, we could end up
enabling a link configuration with invalid configuration. This would
lead to over link bandwidth subscription or in the extreme case
causes us to enable HPO link to a DIO stream.
[how]
Add a new struct in pipe ctx called link config. This structure will
contain link configuration to enable a link. It will be populated
during map pool resources after we validate link bandwidth. Remove
the reference of verified link cap during enable link process and
use link config in pipe ctx instead.
Reviewed-by: George Shen <George.Shen@amd.com>
Acked-by: Solomon Chiu <solomon.chiu@amd.com>
Signed-off-by: Wenjing Liu <wenjing.liu@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[why]
First MST sideband message returns AUX_RET_ERROR_HPD_DISCON
on certain intel platform. Aux transaction considered failure
if HPD unexpected pulled low. The actual aux transaction success
in such case, hence do not return error.
[how]
Not returning error when AUX_RET_ERROR_HPD_DISCON detected
on the first sideband message.
v2: squash in additional DMI entries
v3: squash in static fix
Signed-off-by: Fangzhi Zuo <Jerry.Zuo@amd.com>
Acked-by: Solomon Chiu <solomon.chiu@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
At bind time, vc4_v3d_bind() will read a register to retrieve the v3d
version and make sure it's a version we're compatible with.
However, the v3d has an optional clock that is enabled only after the
register read-out and a power domain that wasn't enabled at all in the bind
implementation. This was working fine at boot because both were enabled,
but resulted in the version check failing if we were unbinding and
rebinding the driver because the unbinding would have turned them off.
The fix isn't as easy as calling pm_runtime_resume_and_get() prior to the
register access to power up the power domain though.
Indeed, the runtime_resume implementation will enable the clock mentioned
above, call vc4_v3d_init_hw() and then vc4_irq_enable().
Prior to the previous patch, vc4_irq_enable() needed to occur after our
call to platform_get_irq() and vc4_irq_install(), since vc4_irq_enable()
used to call enable_irq() and vc4_irq_install() will call request_irq().
vc4_irq_install() will also do some register access, so needs the power
domain to be on. So we ended up in a situation where
vc4_v3d_runtime_resume() needed vc4_irq_install() to have been called
before, and vc4_irq_install() needed vc4_v3d_runtime_resume().
The previous patch removed the enable_irq() call in vc4_irq_enable() and
thus removed the dependency of vc4_v3d_runtime_resume() on
vc4_irq_install().
Thus, we can now rework our bind implementation to call
pm_runtime_resume_and_get() before our register access to make sure the
power domain is on. vc4_v3d_runtime_resume() also takes care of turning the
clock on and calling vc4_v3d_init_hw() so we can remove them from bind.
Acked-by: Thomas Zimmermann <tzimmermann@suse.de>
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
Link: https://lore.kernel.org/r/20220711173939.1132294-69-maxime@cerno.tech
The vc4_irq_disable(), among other things, will call disable_irq() to
complete any in-flight interrupts.
This requires its counterpart, vc4_irq_enable(), to call enable_irq() which
causes issues addressed in a later patch.
However, vc4_irq_disable() is called by two callees: vc4_irq_uninstall()
and vc4_v3d_runtime_suspend().
vc4_irq_uninstall() also calls free_irq() which already disables the
interrupt line. We thus don't require an explicit disable_irq() for that
call site.
vc4_v3d_runtime_suspend() doesn't have any other code. However, the rest of
vc4_irq_disable() masks the interrupts coming from the v3d, so explictly
disabling the interrupt line is also redundant.
The only thing we really care about is thus to make sure we don't have any
handler in-flight, as suggested by the comment. We can thus replace
disable_irq() by synchronize_irq().
Acked-by: Thomas Zimmermann <tzimmermann@suse.de>
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
Link: https://lore.kernel.org/r/20220711173939.1132294-68-maxime@cerno.tech
The vc4 has a custom API to allow components to register a debugfs file
before the DRM driver has been registered and the debugfs_init hook has
been called.
However, the .late_register hook allows to have the debugfs file creation
deferred after that time already.
Let's remove our custom code to only register later our debugfs entries as
part of either debugfs_init or after it.
Acked-by: Thomas Zimmermann <tzimmermann@suse.de>
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
Link: https://lore.kernel.org/r/20220711173939.1132294-65-maxime@cerno.tech
Our current code now mixes some resources whose lifetime are tied to the
device (clocks, IO mappings, etc.) and some that are tied to the DRM device
(encoder, bridge).
The device one will be freed at unbind time, but the DRM one will only be
freed when the last user of the DRM device closes its file handle.
So we end up with a time window during which we can call the encoder hooks,
but we don't have access to the underlying resources and device.
Let's protect all those sections with drm_dev_enter() and drm_dev_exit() so
that we bail out if we are during that window.
Acked-by: Thomas Zimmermann <tzimmermann@suse.de>
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
Link: https://lore.kernel.org/r/20220711173939.1132294-63-maxime@cerno.tech
Whenever the device and driver are unbound, the main device and all the
subdevices will be removed by calling their unbind() method.
However, the DRM device itself will only be freed when the last user will
have closed it.
It means that there is a time window where the device and its resources
aren't there anymore, but the userspace can still call into our driver.
Fortunately, the DRM framework provides the drm_dev_enter() and
drm_dev_exit() functions to make sure our underlying device is still there
for the section protected by those calls. Let's add them to the VEC driver.
Acked-by: Thomas Zimmermann <tzimmermann@suse.de>
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
Link: https://lore.kernel.org/r/20220711173939.1132294-61-maxime@cerno.tech
The current code will call drm_connector_unregister() and
drm_connector_cleanup() when the device is unbound. However, by then, there
might still be some references held to that connector, including by the
userspace that might still have the DRM device open.
Let's switch to a DRM-managed initialization to clean up after ourselves
only once the DRM device has been last closed.
Acked-by: Thomas Zimmermann <tzimmermann@suse.de>
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
Link: https://lore.kernel.org/r/20220711173939.1132294-60-maxime@cerno.tech
The current code will call drm_encoder_cleanup() when the device is
unbound. However, by then, there might still be some references held to
that encoder, including by the userspace that might still have the DRM
device open.
Let's switch to a DRM-managed initialization to clean up after ourselves
only once the DRM device has been last closed.
Acked-by: Thomas Zimmermann <tzimmermann@suse.de>
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
Link: https://lore.kernel.org/r/20220711173939.1132294-59-maxime@cerno.tech
Our internal structure that stores the DRM entities structure is allocated
through a device-managed kzalloc.
This means that this will eventually be freed whenever the device is
removed. In our case, the most likely source of removal is that the main
device is going to be unbound, and component_unbind_all() is being run.
However, it occurs while the DRM device is still registered, which will
create dangling pointers, eventually resulting in use-after-free.
Switch to a DRM-managed allocation to keep our structure until the DRM
driver doesn't need it anymore.
Acked-by: Thomas Zimmermann <tzimmermann@suse.de>
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
Link: https://lore.kernel.org/r/20220711173939.1132294-57-maxime@cerno.tech
The VC4 VEC driver private structure contains only a pointer to the
encoder and connector it implements. This makes the overall structure
somewhat inconsistent with the rest of the driver, and complicates its
initialisation without any apparent gain.
Let's embed the drm_encoder structure (through the vc4_encoder one) and
drm_connector into struct vc4_vec to fix both issues.
Acked-by: Thomas Zimmermann <tzimmermann@suse.de>
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
Link: https://lore.kernel.org/r/20220711173939.1132294-56-maxime@cerno.tech
Our current code now mixes some resources whose lifetime are tied to the
device (clocks, IO mappings, etc.) and some that are tied to the DRM device
(encoder, bridge).
The device one will be freed at unbind time, but the DRM one will only be
freed when the last user of the DRM device closes its file handle.
So we end up with a time window during which we can call the encoder hooks,
but we don't have access to the underlying resources and device.
Let's protect all those sections with drm_dev_enter() and drm_dev_exit() so
that we bail out if we are during that window.
Acked-by: Thomas Zimmermann <tzimmermann@suse.de>
Signed-off-by: Maxime Ripard <maxime@cerno.tech>
Link: https://lore.kernel.org/r/20220711173939.1132294-54-maxime@cerno.tech