Replace the previous O(N^2) implementation of remove_duplicates() with
a O(N) version using a fast/slow pointer approach. The new version
keeps only the first occurrence of each element and compacts the array
in place, improving efficiency without changing functionality.
Signed-off-by: Kuan-Wei Chiu <visitorckw@gmail.com>
Reviewed-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
dml21_map_dc_state_into_dml_display_cfg calls (the call is usually
inlined by the compiler) populate_dml21_surface_config_from_plane_state
and populate_dml21_plane_config_from_plane_state which may use FPU. In
a x86-64 build:
$ objdump --disassemble=dml21_map_dc_state_into_dml_display_cfg \
> drivers/gpu/drm/amd/display/dc/dml2/dml21/dml21_translation_helper.o |
> grep %xmm -c
63
Thus it needs to be guarded with DC_FP_START. But we must note that the
current code quality of the in-kernel FPU use in AMD dml2 is very much
problematic: we are actually calling DC_FP_START in dml21_wrapper.c
here, and this translation unit is built with CC_FLAGS_FPU. Strictly
speaking this does not make any sense: with CC_FLAGS_FPU the compiler is
allowed to generate FPU uses anywhere in the translated code, perhaps
out of the DC_FP_START guard. This problematic pattern also occurs in
at least dml2_wrapper.c, dcn35_fpu.c, and dcn351_fpu.c. Thus we really
need a careful audit and refactor for the in-kernel FPU uses, and this
patch is simply whacking a mole. However per the reporter, whacking
this mole is enough to make a 9060XT "just work."
Reported-by: Asiacn <710187964@qq.com>
Closes: https://github.com/loongson-community/discussions/issues/102
Tested-by: Asiacn <710187964@qq.com>
Signed-off-by: Xi Ruoyao <xry111@xry111.site>
Reviewed-by: Huacai Chen <chenhuacai@loongson.cn>
Reviewed-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
-Certain OVT timings require DSC configurations which divide the
horizontal active unevenly across DSC slices
-DSC slices must be even, so padding needs to be added to the active
to make this possible
-The pixel clock of the HW now needs to be increased to accommodate
the extra padded pixels
-To keep the line time the same, the blank of the HW timing needs to
be increased as well
[How]
-Calculate h_active padding, h_total padding, and pixel clock based
off of the original OVT timing and DSC calculations
-Store these values in the pipe and program HW with these modifications
-Added general support for cases where DSC slice config does not evenly
split the horizontal active by fixing some slice width calculations
-Updated PPS calculations for these cases
Reviewed-by: Chris Park <chris.park@amd.com>
Reviewed-by: Wenjing Liu <wenjing.liu@amd.com>
Signed-off-by: Relja Vojvodic <rvojvodi@amd.com>
Signed-off-by: Ray Wu <ray.wu@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
Bounding box values can be stored in multiple locations. (e.g. PMFW, VBIOS, DMUB).
The source and interpretation of these values can vary with DCN revision
so there should be a component that can gather these values and translate
them accordingly
[How]
Have component start with the statically defined values as a base.
Then update them as needed with DCN-specific logic
Guard this component with FPU flags since values need to be in float point.
Reviewed-by: Jun Lei <jun.lei@amd.com>
Signed-off-by: Austin Zheng <Austin.Zheng@amd.com>
Signed-off-by: Alex Hung <alex.hung@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
-Pipe splitting allows for clocks to be reduced, but when using TMDS 420,
reduced clocks lead to missed clocks cycles on clock resyncing
[How]
-Impose a minimum clock when using TMDS 420
Reviewed-by: Chris Park <chris.park@amd.com>
Signed-off-by: Relja Vojvodic <rvojvodi@amd.com>
Signed-off-by: Alex Hung <alex.hung@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Use max() to reduce the code and improve readability.
No functional changes.
Signed-off-by: Qianfeng Rong <rongqianfeng@vivo.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[WHY & HOW]
Setup initial changes required to program another set of watermarks
for a 2nd stutter mode. The 2nd stutter mode will be lower power but
have higher enter/exit latencies.
PMFW to choose which stutter mode to use based on stutter efficiences
to see if original stutter (LP1) or low power stutter (LP2) will result
in better power savings.
Reviewed-by: Alvin Lee <alvin.lee2@amd.com>
Signed-off-by: Austin Zheng <Austin.Zheng@amd.com>
Signed-off-by: Alex Hung <alex.hung@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Replace "out == 0" with "!out" for pointer comparison to improve code
readability and conform to coding style.
Signed-off-by: Qianfeng Rong <rongqianfeng@vivo.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why & How]
- Consolidated the initialization of DML21 parameters into a single
function `dml21_populate_dml_init_params` to streamline the process
and improve code readability.
- Updated the function signatures in the header files to reflect changes
in parameter passing for DML context.
- Removed redundant debug option handling and integrated it into the new
configuration population function.
- Adjusted the DML21 initialization logic in the wrapper to accommodate
the new structure, ensuring compatibility with different DCN versions.
- Enhanced the handling of clock parameters and bounding box configurations
from various sources, including hardware defaults and software policies.
- Improved the clarity of the code by renaming functions and variables for
better understanding of their purposes.
Reviewed-by: Austin Zheng <austin.zheng@amd.com>
Signed-off-by: Wenjing Liu <wenjing.liu@amd.com>
Signed-off-by: Ray Wu <ray.wu@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
Pixel data bandwidth required in mode programming (MP) ends up being
higher than what was calculated in mode support (MS) even though
the prefetch bandwidths calculated in MP are lower than the MS ones.
MP used a different equ prefetch schedule than MS which lead a
slight difference in parameters. This resulted in the pixel data
bandwidth in MP to be higher than MS.
[How]
Rename the RequiredPrefetchBWOTO term so it can be applied generically.
Update the value with the EQU bandwidth if the EQU schedule is used.
Get the max prefetch bandwidth that MS calculated and use it
as part of the calculations for required bandwidth.
Reviewed-by: Dillon Varone <dillon.varone@amd.com>
Signed-off-by: Austin Zheng <Austin.Zheng@amd.com>
Signed-off-by: Fangzhi Zuo <jerry.zuo@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[WHY]
dml2_soc_bb struct can continuously receive updates for future ASICs.
Alignment issues may arise since VBIOS DMCUB contains an older version of
the SOC BB.
Populating the bounding box with values from DMCUB is no longer necessary
since values such as UCLK will be overridden by values acquired by PMFW
anyways.
[HOW]
Use bb_from_dmub to store DCN specific bounding box parameters in DMCUB.
Add helpers to translate DCN specific struct to the corresponding
dml2_soc_bb field.
To avoid alignment issues:
Deprecate applying DMCUB SoC BB for DCN4
For future projects:
Create a flattened struct containing all sensitive parameters in the
bounding box. New parameters can be added to the bottom of the new struct
as needed.
Reviewed-by: Dillon Varone <dillon.varone@amd.com>
Signed-off-by: Austin Zheng <Austin.Zheng@amd.com>
Signed-off-by: Alex Hung <alex.hung@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
The boolean fast_validate is used as an
input parameter in multiple functions. To
support more scenarios, we are
replacing it with enum dc_validate_mode.
[How]
The enum dc_validate_mode introduces three
possible values:
1) DC_VALIDATE_MODE_AND_PROGRAMMING:
Apply the mode to hardware
2) DC_VALIDATE_MODE_ONLY:
Check whether the mode can be supported
3) DC_VALIDATE_MODE_AND_STATE_INDEX:
Check if the mode can be supported, and
determine the optimal voltage level
needed to support it.
Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Yan Li <yan.li@amd.com>
Signed-off-by: Wayne Lin <wayne.lin@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
mcache allocation programming is not part of DML's core responsibilities.
Keeping this logic in DML leads to poor separation of concerns and complicates maintenance.
[How]
Refactored code to move mcache parameter preparation and mcache ID assignment
into the resource file.
Reviewed-by: Alvin Lee <alvin.lee2@amd.com>
Signed-off-by: Karthi Kandasamy <karthi.kandasamy@amd.com>
Signed-off-by: Tom Chung <chiahsuan.chung@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Replace 0 and 1 with false and true for boolean variables in
dml2_core_dcn4_calcs.c and dml2_core_utils.c to align with the Linux
kernel coding style guidelines, which recommend using C99 bool type
with true/false values.
Signed-off-by: Ivan Shamliev <ivan.shamliev.dev@abv.bg>
Reviewed-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[WHY]
There are several gaps that can result in SubVP being enabled with
incompatible HW cursor sizes, and unjust restrictions to cursor size due
to wrong predictions on future usage of SubVP.
[HOW]
- remove "prediction" logic in favor of tagging based on previous SubVP
usage
- block SubVP if current HW cursor settings are incompatible
- provide interface for DM to determine if HW cursor should be disabled
due to an attempt to enable SubVP
Reviewed-by: Alvin Lee <alvin.lee2@amd.com>
Signed-off-by: Dillon Varone <dillon.varone@amd.com>
Signed-off-by: Ray Wu <ray.wu@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
Memory allocation occurs within dml21_validate() for adding phantom planes.
May cause kernel to be tainted due to usage of FP Start.
[How]
Move FP start from dml21_validate to before mode programming/mode support.
Calculations requiring floating point are all done within mode programming
or mode support.
Reviewed-by: Alvin Lee <alvin.lee2@amd.com>
Signed-off-by: Austin Zheng <Austin.Zheng@amd.com>
Signed-off-by: Ray Wu <ray.wu@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[WHY]
There are several gaps that can result in SubVP being enabled with
incompatible HW cursor sizes, and unjust restrictions to cursor size due
to wrong predictions on future usage of SubVP
[HOW]
- remove "prediction" logic in favor of tagging based on previous SubVP
usage
- block SubVP if current HW cursor settings are incompatible
- provide interface for DM to determine if HW cursor should be disabled
due to an attempt to enable SubVP
Reviewed-by: Alvin Lee <alvin.lee2@amd.com>
Signed-off-by: Dillon Varone <dillon.varone@amd.com>
Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com>
Tested-by: Mark Broadworth <mark.broadworth@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
Large stack size observed in DCN4 mode support when compiling with clang.
Additional instrumentation added by compiler adds to stack size.
dml_core_mode_support ends up going over the stack size limit
due to the size of the function.
[How]
Move checks and calculations for prefetch to its own function.
Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Austin Zheng <Austin.Zheng@amd.com>
Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com>
Tested-by: Mark Broadworth <mark.broadworth@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
For some VR headsets with large blanks, it's possible
to overflow the OTG_VSTARTUP_PARAM:VSTARTUP_START
register. This can lead to incorrect DML calculations
and underflow downstream.
[How]
Min the calcualted max_vstartup_lines with the max
value of the register.
Reviewed-by: Dillon Varone <dillon.varone@amd.com>
Signed-off-by: Ryan Seto <ryanseto@amd.com>
Signed-off-by: Tom Chung <chiahsuan.chung@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
The informative structure needs to be extended by the total number of DPPs
required per each active plane.
The new informative field is going to be used as a statistical indicator.
[How]
The dml2_core_calcs_get_informative() routine must count a total number of DPPs.
Reviewed-by: Austin Zheng <austin.zheng@amd.com>
Signed-off-by: Oleh Kuzhylnyi <okuzhyln@amd.com>
Signed-off-by: Roman Li <roman.li@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
1) The current calculations for OTO prefetch bandwidth do not consider the number of DPP pipes in use.
As a result, OTO prefetch bandwidth may be larger than the vactive bandwidth if multiple DPP pipes are used.
OTO prefetch bandwidth should never exceed the vactive bandwidth.
2) Mode programming may be mismatched with mode support
In cases where mode support has chosen to use the equalized (equ) prefetch schedule,
mode programming may end up using oto prefetch schedule instead.
The bandwidth required to do the oto schedule may end up being higher than the equ schedule.
This can cause the required urgent bandwidth to exceed the available urgent bandwidth.
[How]
Output the oto prefetch bandwidth and incorperate it into the urgent bandwidth calculations
even if the prefetch schedule being used is not the oto schedule.
Reviewed-by: Dillon Varone <dillon.varone@amd.com>
Signed-off-by: Austin Zheng <Austin.Zheng@amd.com>
Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why/How]
vBlank used to determine the max vStartup is based on the smallest between
the vblank provided by the timing and vblank in ip_caps.
Extra vblank time is not considered if the vblank provided by the timing ends
up being higher than what's defined by the ip_caps
Use 1 less than the vblank size in case the timing is interlaced
so vstartup will always be less than vblank_nom.
Reviewed-by: Dillon Varone <dillon.varone@amd.com>
Signed-off-by: Austin Zheng <Austin.Zheng@amd.com>
Signed-off-by: Wayne Lin <wayne.lin@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
There are a couple of statements with two following semicolons, replace
these with just one semicolon.
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[WHAT & HOW]
hpo_stream_to_link_encoder_mapping has size MAX_HPO_DP2_ENCODERS(=4),
but location can have size up to 6. As a result, it is necessary to
check location against MAX_HPO_DP2_ENCODERS.
Similiarly, disp_cfg_stream_location can be used as an array index which
should be 0..5, so the ASSERT's conditions should be less without equal.
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3904
Reviewed-by: Austin Zheng <Austin.Zheng@amd.com>
Reviewed-by: Rodrigo Siqueira <rodrigo.siqueira@amd.com>
Signed-off-by: Alex Hung <alex.hung@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
When compiling allmodconfig (CONFIG_WERROR=y) with clang-19, see the
following errors:
.../display/dc/dml2/display_mode_core.c:6268:13: warning: stack frame size (3128) exceeds limit (3072) in 'dml_prefetch_check' [-Wframe-larger-than]
.../display/dc/dml2/dml21/src/dml2_core/dml2_core_dcn4_calcs.c:7236:13: warning: stack frame size (3256) exceeds limit (3072) in 'dml_core_mode_support' [-Wframe-larger-than]
Mark static functions called by dml_prefetch_check() and
dml_core_mode_support() noinline_for_stack to avoid them become huge
functions and thus exceed the frame size limit.
A way to reproduce:
$ git checkout next-20250107
$ mkdir build_dir
$ export PATH=/tmp/llvm-19.1.6-x86_64/bin:$PATH
$ make LLVM=1 O=build_dir allmodconfig
$ make LLVM=1 O=build_dir drivers/gpu/drm/ -j
The way how it chose static functions to mark:
[0] Unset CONFIG_WERROR in build_dir/.config.
To get display_mode_core.o without errors.
[1] Get a function list called by dml_prefetch_check().
$ sed -n '6268,6711p' drivers/gpu/drm/amd/display/dc/dml2/display_mode_core.c \
| sed -n -r 's/.*\W(\w+)\(.*/\1/p' | sort -u >/tmp/syms
[2] Get the non-inline function list.
Objdump won't show the symbols if they are inline functions.
$ make LLVM=1 O=build_dir drivers/gpu/drm/ -j
$ objdump -d build_dir/.../display_mode_core.o | \
./scripts/checkstack.pl x86_64 0 | \
grep -f /tmp/syms | cut -d' ' -f2- >/tmp/orig
[3] Get the full function list.
Append "-fno-inline" to `CFLAGS_.../display_mode_core.o` in
drivers/gpu/drm/amd/display/dc/dml2/Makefile.
$ make LLVM=1 O=build_dir drivers/gpu/drm/ -j
$ objdump -d build_dir/.../display_mode_core.o | \
./scripts/checkstack.pl x86_64 0 | \
grep -f /tmp/syms | cut -d' ' -f2- >/tmp/noinline
[4] Get the inline function list.
If a symbol only in /tmp/noinline but not in /tmp/orig, it is a good
candidate to mark noinline.
$ diff /tmp/orig /tmp/noinline
Chosen functions and their stack sizes:
CalculateBandwidthAvailableForImmediateFlip [display_mode_core.o]:144
CalculateExtraLatency [display_mode_core.o]:176
CalculateTWait [display_mode_core.o]:64
CalculateVActiveBandwithSupport [display_mode_core.o]:112
set_calculate_prefetch_schedule_params [display_mode_core.o]:48
CheckGlobalPrefetchAdmissibility [dml2_core_dcn4_calcs.o]:544
calculate_bandwidth_available [dml2_core_dcn4_calcs.o]:320
calculate_vactive_det_fill_latency [dml2_core_dcn4_calcs.o]:272
CalculateDCFCLKDeepSleep [dml2_core_dcn4_calcs.o]:208
CalculateODMMode [dml2_core_dcn4_calcs.o]:208
CalculateOutputLink [dml2_core_dcn4_calcs.o]:176
Signed-off-by: Tzung-Bi Shih <tzungbi@kernel.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>