Wachowski, Karol
72b96ec655
accel/ivpu: Make parts of FW image read-only
...
Implement setting specified buffer ranges as read-only.
In case if specified range is not 64K aligned and 64K contiguous
MMU600 pages are turned on, split 64K mapping to allow 4K granularity
for read-only configuration.
Signed-off-by: Wachowski, Karol <karol.wachowski@intel.com >
Reviewed-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com >
Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com >
Link: https://patchwork.freedesktop.org/patch/msgid/20240611120433.1012423-10-jacek.lawrynowicz@linux.intel.com
2024-06-14 09:14:29 +02:00
Kent Overstreet
0069455bcb
fix missing vmalloc.h includes
...
Patch series "Memory allocation profiling", v6.
Overview:
Low overhead [1] per-callsite memory allocation profiling. Not just for
debug kernels, overhead low enough to be deployed in production.
Example output:
root@moria-kvm:~# sort -rn /proc/allocinfo
127664128 31168 mm/page_ext.c:270 func:alloc_page_ext
56373248 4737 mm/slub.c:2259 func:alloc_slab_page
14880768 3633 mm/readahead.c:247 func:page_cache_ra_unbounded
14417920 3520 mm/mm_init.c:2530 func:alloc_large_system_hash
13377536 234 block/blk-mq.c:3421 func:blk_mq_alloc_rqs
11718656 2861 mm/filemap.c:1919 func:__filemap_get_folio
9192960 2800 kernel/fork.c:307 func:alloc_thread_stack_node
4206592 4 net/netfilter/nf_conntrack_core.c:2567 func:nf_ct_alloc_hashtable
4136960 1010 drivers/staging/ctagmod/ctagmod.c:20 [ctagmod] func:ctagmod_start
3940352 962 mm/memory.c:4214 func:alloc_anon_folio
2894464 22613 fs/kernfs/dir.c:615 func:__kernfs_new_node
...
Usage:
kconfig options:
- CONFIG_MEM_ALLOC_PROFILING
- CONFIG_MEM_ALLOC_PROFILING_ENABLED_BY_DEFAULT
- CONFIG_MEM_ALLOC_PROFILING_DEBUG
adds warnings for allocations that weren't accounted because of a
missing annotation
sysctl:
/proc/sys/vm/mem_profiling
Runtime info:
/proc/allocinfo
Notes:
[1]: Overhead
To measure the overhead we are comparing the following configurations:
(1) Baseline with CONFIG_MEMCG_KMEM=n
(2) Disabled by default (CONFIG_MEM_ALLOC_PROFILING=y &&
CONFIG_MEM_ALLOC_PROFILING_BY_DEFAULT=n)
(3) Enabled by default (CONFIG_MEM_ALLOC_PROFILING=y &&
CONFIG_MEM_ALLOC_PROFILING_BY_DEFAULT=y)
(4) Enabled at runtime (CONFIG_MEM_ALLOC_PROFILING=y &&
CONFIG_MEM_ALLOC_PROFILING_BY_DEFAULT=n && /proc/sys/vm/mem_profiling=1)
(5) Baseline with CONFIG_MEMCG_KMEM=y && allocating with __GFP_ACCOUNT
(6) Disabled by default (CONFIG_MEM_ALLOC_PROFILING=y &&
CONFIG_MEM_ALLOC_PROFILING_BY_DEFAULT=n) && CONFIG_MEMCG_KMEM=y
(7) Enabled by default (CONFIG_MEM_ALLOC_PROFILING=y &&
CONFIG_MEM_ALLOC_PROFILING_BY_DEFAULT=y) && CONFIG_MEMCG_KMEM=y
Performance overhead:
To evaluate performance we implemented an in-kernel test executing
multiple get_free_page/free_page and kmalloc/kfree calls with allocation
sizes growing from 8 to 240 bytes with CPU frequency set to max and CPU
affinity set to a specific CPU to minimize the noise. Below are results
from running the test on Ubuntu 22.04.2 LTS with 6.8.0-rc1 kernel on
56 core Intel Xeon:
kmalloc pgalloc
(1 baseline) 6.764s 16.902s
(2 default disabled) 6.793s (+0.43%) 17.007s (+0.62%)
(3 default enabled) 7.197s (+6.40%) 23.666s (+40.02%)
(4 runtime enabled) 7.405s (+9.48%) 23.901s (+41.41%)
(5 memcg) 13.388s (+97.94%) 48.460s (+186.71%)
(6 def disabled+memcg) 13.332s (+97.10%) 48.105s (+184.61%)
(7 def enabled+memcg) 13.446s (+98.78%) 54.963s (+225.18%)
Memory overhead:
Kernel size:
text data bss dec diff
(1) 26515311 18890222 17018880 62424413
(2) 26524728 19423818 16740352 62688898 264485
(3) 26524724 19423818 16740352 62688894 264481
(4) 26524728 19423818 16740352 62688898 264485
(5) 26541782 18964374 16957440 62463596 39183
Memory consumption on a 56 core Intel CPU with 125GB of memory:
Code tags: 192 kB
PageExts: 262144 kB (256MB)
SlabExts: 9876 kB (9.6MB)
PcpuExts: 512 kB (0.5MB)
Total overhead is 0.2% of total memory.
Benchmarks:
Hackbench tests run 100 times:
hackbench -s 512 -l 200 -g 15 -f 25 -P
baseline disabled profiling enabled profiling
avg 0.3543 0.3559 (+0.0016) 0.3566 (+0.0023)
stdev 0.0137 0.0188 0.0077
hackbench -l 10000
baseline disabled profiling enabled profiling
avg 6.4218 6.4306 (+0.0088) 6.5077 (+0.0859)
stdev 0.0933 0.0286 0.0489
stress-ng tests:
stress-ng --class memory --seq 4 -t 60
stress-ng --class cpu --seq 4 -t 60
Results posted at: https://evilpiepirate.org/~kent/memalloc_prof_v4_stress-ng/
[2] https://lore.kernel.org/all/20240306182440.2003814-1-surenb@google.com/
This patch (of 37):
The next patch drops vmalloc.h from a system header in order to fix a
circular dependency; this adds it to all the files that were pulling it in
implicitly.
[kent.overstreet@linux.dev: fix arch/alpha/lib/memcpy.c]
Link: https://lkml.kernel.org/r/20240327002152.3339937-1-kent.overstreet@linux.dev
[surenb@google.com: fix arch/x86/mm/numa_32.c]
Link: https://lkml.kernel.org/r/20240402180933.1663992-1-surenb@google.com
[kent.overstreet@linux.dev: a few places were depending on sizes.h]
Link: https://lkml.kernel.org/r/20240404034744.1664840-1-kent.overstreet@linux.dev
[arnd@arndb.de: fix mm/kasan/hw_tags.c]
Link: https://lkml.kernel.org/r/20240404124435.3121534-1-arnd@kernel.org
[surenb@google.com: fix arc build]
Link: https://lkml.kernel.org/r/20240405225115.431056-1-surenb@google.com
Link: https://lkml.kernel.org/r/20240321163705.3067592-1-surenb@google.com
Link: https://lkml.kernel.org/r/20240321163705.3067592-2-surenb@google.com
Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev >
Signed-off-by: Suren Baghdasaryan <surenb@google.com >
Signed-off-by: Arnd Bergmann <arnd@arndb.de >
Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com >
Tested-by: Kees Cook <keescook@chromium.org >
Cc: Alexander Viro <viro@zeniv.linux.org.uk >
Cc: Alex Gaynor <alex.gaynor@gmail.com >
Cc: Alice Ryhl <aliceryhl@google.com >
Cc: Andreas Hindborg <a.hindborg@samsung.com >
Cc: Benno Lossin <benno.lossin@proton.me >
Cc: "Björn Roy Baron" <bjorn3_gh@protonmail.com >
Cc: Boqun Feng <boqun.feng@gmail.com >
Cc: Christoph Lameter <cl@linux.com >
Cc: Dennis Zhou <dennis@kernel.org >
Cc: Gary Guo <gary@garyguo.net >
Cc: Miguel Ojeda <ojeda@kernel.org >
Cc: Peter Zijlstra <peterz@infradead.org >
Cc: Tejun Heo <tj@kernel.org >
Cc: Vlastimil Babka <vbabka@suse.cz >
Cc: Wedson Almeida Filho <wedsonaf@gmail.com >
Signed-off-by: Andrew Morton <akpm@linux-foundation.org >
2024-04-25 20:55:49 -07:00
Wachowski, Karol
8047d36fe5
accel/ivpu: Add debug prints for MMU map/unmap operations
...
It is common need to be able to see IOVA/physical to VPU addresses
mappings. Especially when debugging different kind of memory related
issues. Lack of such logs forces user to modify and recompile KMD manually.
This commit adds those logs under MMU debug mask which can be turned on
dynamically with module param during KMD load.
Signed-off-by: Wachowski, Karol <karol.wachowski@intel.com >
Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com >
Reviewed-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com >
Link: https://patchwork.freedesktop.org/patch/msgid/20240115134434.493839-4-jacek.lawrynowicz@linux.intel.com
2024-01-22 10:27:47 +01:00
Maxime Ripard
3bf3e21c15
Merge drm/drm-next into drm-misc-next
...
Let's kickstart the v6.8 release cycle.
Signed-off-by: Maxime Ripard <mripard@kernel.org >
2023-11-15 10:56:44 +01:00
Jacek Lawrynowicz
48aea7f2a2
accel/ivpu: Fix locking in ivpu_bo_remove_all_bos_from_context()
...
ivpu_bo_remove_all_bos_from_context() could race with ivpu_bo_free()
when prime buffer was closed after vpu device was closed.
Move the bo_list from context to vdev and use a dedicated lock to
sync it. This list is not modified when BO is added/removed from
a context.
Also rename ivpu_bo_free_vpu_addr() to ivpu_bo_unbind() because this
function does more then just free vpu_addr.
Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com >
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com >
Link: https://patchwork.freedesktop.org/patch/msgid/20231031073156.1301669-3-stanislaw.gruszka@linux.intel.com
2023-11-08 16:27:35 +01:00
Jacek Lawrynowicz
b035224134
accel/ivpu: Allocate vpu_addr in gem->open() callback
...
Use gem->open() callback to simplify the code and prepare for gem_shmem
conversion. It is called during handle creation for a gem object,
during prime import and in BO_CREATE ioctl. Hence can be used for vpu_addr
allocation. On the way remove unused bo->user_ptr field.
Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com >
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com >
Link: https://patchwork.freedesktop.org/patch/msgid/20231031073156.1301669-2-stanislaw.gruszka@linux.intel.com
2023-11-08 16:26:34 +01:00
Karol Wachowski
3bcc5209ba
accel/ivpu: Make DMA allocations for MMU600 write combined
...
Previously using dma_alloc_wc() API we created cache coherent
(mapped as write-back) mappings.
Because we disable MMU600 snooping it was required to do costly
page walk and cache flushes after each page table modification.
With write-combined buffers it's possible to do a single write memory
barrier to flush write-combined buffer to memory which simplifies the
driver and significantly reduce time of map/unmap operations.
Mapping time of 255 MB is reduced from 2.5 ms to 500 us.
Signed-off-by: Karol Wachowski <karol.wachowski@linux.intel.com >
Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com >
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com >
Signed-off-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com >
Link: https://patchwork.freedesktop.org/patch/msgid/20231028155936.1183342-7-stanislaw.gruszka@linux.intel.com
2023-10-31 16:45:45 +01:00
Dave Airlie
7cd62eab9b
BackMerge tag 'v6.6-rc7' into drm-next
...
This is needed to add the msm pr which is based on a higher base.
Signed-off-by: Dave Airlie <airlied@redhat.com >
2023-10-23 18:20:06 +10:00
Wludzik, Jozef
8f5ad367e8
accel/ivpu: Extend address range for MMU mmap
...
Allow to use whole address range in MMU context mmap which is up to 48
bits. Return invalid argument from MMU context mmap in case address is
not aligned to MMU page size, address is below MMU page size or address
is greater then 47 bits.
This fixes problem disallowing to run large models on VPU4
Signed-off-by: Wludzik, Jozef <jozef.wludzik@intel.com >
Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com >
Signed-off-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com >
Link: https://patchwork.freedesktop.org/patch/msgid/20231018110113.547208-1-stanislaw.gruszka@linux.intel.com
2023-10-19 08:01:20 +02:00
Karol Wachowski
34d03f2a17
accel/ivpu: Initialize context with SSID = 1
...
Context with SSID = 1 is reserved and accesses on that context happen
only when context is uninitialized on the VPU side. Such access triggers
MMU fault (0xa) "Invalid CD Fetch", which doesn't contain any useful
information besides context ID.
This commit will change that state, now (0x10) "Translation fault" will
be triggered and accessed address will shown in the log.
Signed-off-by: Karol Wachowski <karol.wachowski@linux.intel.com >
Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com >
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com >
Signed-off-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com >
Link: https://patchwork.freedesktop.org/patch/msgid/20230901094957.168898-7-stanislaw.gruszka@linux.intel.com
2023-09-04 11:01:26 +02:00
Stanislaw Gruszka
edee62c085
accel/ivpu: Add information about context on failure
...
Identify the mmu context that failed to initialize in the error messages.
This allows the error to be correlated with a specific user during debug.
Reviewed-by: Karol Wachowski <karol.wachowski@linux.intel.com >
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com >
Signed-off-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com >
Link: https://patchwork.freedesktop.org/patch/msgid/20230901094957.168898-5-stanislaw.gruszka@linux.intel.com
2023-09-04 11:01:26 +02:00
Jacek Lawrynowicz
0a9cd7924e
accel/ivpu: Remove duplicated error messages
...
Reduce the number of error messages per single failure in
ivpu_dev_init() and ivpu_probe().
Most error messages are already printed by functions called
from ivpu_dev_init(). Add missed error prints in ivpu_ipc_init()
and ivpu_mmu_context_init().
Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com >
Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com >
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com >
Signed-off-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com >
Link: https://patchwork.freedesktop.org/patch/msgid/20230901094957.168898-3-stanislaw.gruszka@linux.intel.com
2023-09-04 11:01:26 +02:00
Karol Wachowski
162f17b2d9
accel/ivpu: Refactor memory ranges logic
...
Add new dma range and change naming convention for virtual address
memory ranges managed by KMD.
New available ranges are named as follows:
* global range - global context accessible by FW
* aliased range - user context accessible by FW
* dma range - user context accessible by DMA
* shave range - user context accessible by shaves
* global shave range - global context accessible by shave nn
Signed-off-by: Karol Wachowski <karol.wachowski@linux.intel.com >
Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com >
Signed-off-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com >
Link: https://patchwork.freedesktop.org/patch/msgid/20230731161258.2987564-6-stanislaw.gruszka@linux.intel.com
2023-08-09 13:52:15 +02:00
Karol Wachowski
95d440188d
accel/ivpu: Mark 64 kB contiguous areas as contiguous in PTEs
...
Whenever KMD maps region larger than 64kB that is both aligned and
contiguous, set contiguous bit (52) in MMU PTE descriptor for each page
in that region.
This allows to treat 16 contiguous pages as one and reduce
number of MMU page walks required which results in lower latency.
Signed-off-by: Karol Wachowski <karol.wachowski@linux.intel.com >
Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com >
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com >
Signed-off-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com >
Link: https://patchwork.freedesktop.org/patch/msgid/20230518131605.650622-6-stanislaw.gruszka@linux.intel.com
2023-06-08 07:54:00 +02:00
Karol Wachowski
103d2ea139
accel/ivpu: Rename and cleanup MMU600 page tables
...
Simplify and unify naming convention in MMU600 page tables
configuration.
All DMA addresses in page tables directly accessed by VPU are called
with _dma sufix and all CPU pointers to those page tables have _ptr
sufix.
Base pointers used to do a page walk on the CPU have corresponding
names:
pud_ptrs (pointers used to get access to PUD DMA)
pmd_ptrs (pointers used to get access to PMD DMA)
pte_ptrs (pointers used to get access to PTE DMA)
with the following convention:
u64 *pud_dma_ptr = pud_ptrs[pgd_idx];
*pud_dma_ptr = pud_dma;
u64 *pmd_dma_ptr = pmd_ptrs[pgd_idx][pud_idx];
*pmd_dma_ptr = pmd_dma;
u64 *pte_dma_ptr = pte_ptrs[pgd_idx][pud_idx][pmd_idx];
*pte_dma_ptr = pte_dma;
On the way change to coherent dma allocation, _wc is only valid on ARM
and was used by mistake.
Signed-off-by: Karol Wachowski <karol.wachowski@linux.intel.com >
Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com >
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com >
Signed-off-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com >
Link: https://patchwork.freedesktop.org/patch/msgid/20230518131605.650622-5-stanislaw.gruszka@linux.intel.com
2023-06-08 07:53:51 +02:00
Karol Wachowski
a2fd4a6fae
accel/ivpu: Add MMU support for 4 level page mappings
...
Program additional fourth level required for mappings with VA above 38bits.
Co-developed-by: Raymond Tan <raymond.tan@intel.com >
Signed-off-by: Raymond Tan <raymond.tan@intel.com >
Signed-off-by: Karol Wachowski <karol.wachowski@linux.intel.com >
Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com >
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com >
Signed-off-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com >
Link: https://patchwork.freedesktop.org/patch/msgid/20230518131605.650622-3-stanislaw.gruszka@linux.intel.com
2023-06-08 07:53:33 +02:00
Jacek Lawrynowicz
263b2ba5fc
accel/ivpu: Add Intel VPU MMU support
...
VPU Memory Management Unit is based on ARM MMU-600.
It allows the creation of multiple virtual address spaces for
the device and map noncontinuous host memory (there is no dedicated
memory on the VPU).
Address space is implemented as a struct ivpu_mmu_context, it has an ID,
drm_mm allocator for VPU addresses and struct ivpu_mmu_pgtable that
holds actual 3-level, 4KB page table.
Context with ID 0 (global context) is created upon driver initialization
and it's mainly used for mapping memory required to execute
the firmware.
Contexts with non-zero IDs are user contexts allocated each time
the devices is open()-ed and they map command buffers and other
workload-related memory.
Workloads executing in a given contexts have access only
to the memory mapped in this context.
This patch is has two main files:
- ivpu_mmu_context.c handles MMU page tables and memory mapping
- ivpu_mmu.c implements a driver that programs the MMU device
Co-developed-by: Karol Wachowski <karol.wachowski@linux.intel.com >
Signed-off-by: Karol Wachowski <karol.wachowski@linux.intel.com >
Co-developed-by: Krystian Pradzynski <krystian.pradzynski@linux.intel.com >
Signed-off-by: Krystian Pradzynski <krystian.pradzynski@linux.intel.com >
Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com >
Reviewed-by: Oded Gabbay <ogabbay@kernel.org >
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com >
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch >
Link: https://patchwork.freedesktop.org/patch/msgid/20230117092723.60441-3-jacek.lawrynowicz@linux.intel.com
2023-01-19 11:07:22 +01:00