mirror of
https://github.com/torvalds/linux.git
synced 2026-04-18 14:53:58 -04:00
Pull MM updates from Andrew Morton:

 - "maple_tree: Replace big node with maple copy" (Liam Howlett)

   Mainly preparatory work for ongoing development but it does reduce
   stack usage and is an improvement.

 - "mm, swap: swap table phase III: remove swap_map" (Kairui Song)

   Offers memory savings by removing the static swap_map. It also yields
   some CPU savings and implements several cleanups.

 - "mm: memfd_luo: preserve file seals" (Pratyush Yadav)

   Add file seal preservation to LUO's memfd code

 - "mm: zswap: add per-memcg stat for incompressible pages" (Jiayuan Chen)

   Additional userspace stats reporting for zswap

 - "arch, mm: consolidate empty_zero_page" (Mike Rapoport)

   Some cleanups for our handling of ZERO_PAGE() and zero_pfn

 - "mm/kmemleak: Improve scan_should_stop() implementation" (Zhongqiu Han)

   A robustness improvement and some cleanups in the kmemleak code

 - "Improve khugepaged scan logic" (Vernon Yang)

   Improve khugepaged scan logic and reduce CPU consumption by
   prioritizing scanning tasks that access memory frequently

 - "Make KHO Stateless" (Jason Miu)

   Simplify Kexec Handover by transitioning KHO from an xarray-based
   metadata tracking system with serialization to a radix tree data
   structure that can be passed directly to the next kernel

 - "mm: vmscan: add PID and cgroup ID to vmscan tracepoints" (Thomas
   Ballasi and Steven Rostedt)

   Enhance vmscan's tracepointing

 - "mm: arch/shstk: Common shadow stack mapping helper and
   VM_NOHUGEPAGE" (Catalin Marinas)

   Cleanup for the shadow stack code: remove per-arch code in favour of
   a generic implementation

 - "Fix KASAN support for KHO restored vmalloc regions" (Pasha Tatashin)

   Fix a WARN() which can be emitted when KHO restores a vmalloc area

 - "mm: Remove stray references to pagevec" (Tal Zussman)

   Several cleanups, mainly updating references to "struct pagevec",
   which became folio_batch three years ago

 - "mm: Eliminate fake head pages from vmemmap optimization" (Kiryl
   Shutsemau)

   Simplify the HugeTLB vmemmap optimization (HVO) by changing how tail
   pages encode their relationship to the head page

 - "mm/damon/core: improve DAMOS quota efficiency for core layer
   filters" (SeongJae Park)

   Improve two problematic behaviors of DAMOS that make it less
   efficient when core layer filters are used

 - "mm/damon: strictly respect min_nr_regions" (SeongJae Park)

   Improve DAMON usability by extending the treatment of the
   min_nr_regions user-settable parameter

 - "mm/page_alloc: pcp locking cleanup" (Vlastimil Babka)

   The proper fix for a previously hotfixed SMP=n issue. Code
   simplifications and cleanups ensued

 - "mm: cleanups around unmapping / zapping" (David Hildenbrand)

   A bunch of cleanups around unmapping and zapping. Mostly
   simplifications, code movements, documentation and renaming of
   zapping functions

 - "support batched checking of the young flag for MGLRU" (Baolin Wang)

   Batched checking of the young flag for MGLRU. It's part cleanups; one
   benchmark shows large performance benefits for arm64

 - "memcg: obj stock and slab stat caching cleanups" (Johannes Weiner)

   memcg cleanup and robustness improvements

 - "Allow order zero pages in page reporting" (Yuvraj Sakshith)

   Enhance free page reporting - it is presently and undesirably order-0
   pages when reporting free memory.

 - "mm: vma flag tweaks" (Lorenzo Stoakes)

   Cleanup work following from the recent conversion of the VMA flags to
   a bitmap

 - "mm/damon: add optional debugging-purpose sanity checks" (SeongJae
   Park)

   Add some more developer-facing debug checks into DAMON core

 - "mm/damon: test and document power-of-2 min_region_sz requirement"
   (SeongJae Park)

   Add a DAMON kunit test and make some adjustments to the addr_unit
   parameter handling

 - "mm/damon/core: make passed_sample_intervals comparisons
   overflow-safe" (SeongJae Park)

   Fix a hard-to-hit time overflow issue in DAMON core

 - "mm/damon: improve/fixup/update ratio calculation, test and
   documentation" (SeongJae Park)

   A batch of misc/minor improvements and fixups for DAMON

 - "mm: move vma_(kernel|mmu)_pagesize() out of hugetlb.c" (David
   Hildenbrand)

   Fix a possible issue with dax-device when CONFIG_HUGETLB=n. Some code
   movement was required.

 - "zram: recompression cleanups and tweaks" (Sergey Senozhatsky)

   A somewhat random mix of fixups, recompression cleanups and
   improvements in the zram code

 - "mm/damon: support multiple goal-based quota tuning algorithms"
   (SeongJae Park)

   Extend DAMOS quotas goal auto-tuning to support multiple tuning
   algorithms that users can select

 - "mm: thp: reduce unnecessary start_stop_khugepaged()" (Breno Leitao)

   Fix the khugepaged sysfs handling so we no longer spam the logs with
   reams of junk when starting/stopping khugepaged

 - "mm: improve map count checks" (Lorenzo Stoakes)

   Provide some cleanups and slight fixes in the mremap, mmap and vma
   code

 - "mm/damon: support addr_unit on default monitoring targets for
   modules" (SeongJae Park)

   Extend the use of DAMON core's addr_unit tunable

 - "mm: khugepaged cleanups and mTHP prerequisites" (Nico Pache)

   Cleanups to khugepaged, serving as a base for Nico's planned
   khugepaged mTHP support

 - "mm: memory hot(un)plug and SPARSEMEM cleanups" (David Hildenbrand)

   Code movement and cleanups in the memhotplug and sparsemem code

 - "mm: remove CONFIG_ARCH_ENABLE_MEMORY_HOTREMOVE and cleanup
   CONFIG_MIGRATION" (David Hildenbrand)

   Rationalize some memhotplug Kconfig support

 - "change young flag check functions to return bool" (Baolin Wang)

   Cleanups to change all young flag check functions to return bool

 - "mm/damon/sysfs: fix memory leak and NULL dereference issues" (Josh
   Law and SeongJae Park)

   Fix a few potential DAMON bugs

 - "mm/vma: convert vm_flags_t to vma_flags_t in vma code" (Lorenzo
   Stoakes)

   Convert a lot of the existing use of the legacy vm_flags_t data type
   to the new vma_flags_t type which replaces it. Mainly in the vma
   code.

 - "mm: expand mmap_prepare functionality and usage" (Lorenzo Stoakes)

   Expand the mmap_prepare functionality, which is intended to replace
   the deprecated f_op->mmap hook which has been the source of bugs and
   security issues for some time. Cleanups, documentation, extension of
   mmap_prepare into filesystem drivers

 - "mm/huge_memory: refactor zap_huge_pmd()" (Lorenzo Stoakes)

   Simplify and clean up zap_huge_pmd(). Additional cleanups around
   vm_normal_folio_pmd() and the softleaf functionality are performed.

* tag 'mm-stable-2026-04-13-21-45' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (369 commits)
  mm: fix deferred split queue races during migration
  mm/khugepaged: fix issue with tracking lock
  mm/huge_memory: add and use has_deposited_pgtable()
  mm/huge_memory: add and use normal_or_softleaf_folio_pmd()
  mm: add softleaf_is_valid_pmd_entry(), pmd_to_softleaf_folio()
  mm/huge_memory: separate out the folio part of zap_huge_pmd()
  mm/huge_memory: use mm instead of tlb->mm
  mm/huge_memory: remove unnecessary sanity checks
  mm/huge_memory: deduplicate zap deposited table call
  mm/huge_memory: remove unnecessary VM_BUG_ON_PAGE()
  mm/huge_memory: add a common exit path to zap_huge_pmd()
  mm/huge_memory: handle buggy PMD entry in zap_huge_pmd()
  mm/huge_memory: have zap_huge_pmd return a boolean, add kdoc
  mm/huge: avoid big else branch in zap_huge_pmd()
  mm/huge_memory: simplify vma_is_specal_huge()
  mm: on remap assert that input range within the proposed VMA
  mm: add mmap_action_map_kernel_pages[_full]()
  uio: replace deprecated mmap hook with mmap_prepare in uio_info
  drivers: hv: vmbus: replace deprecated mmap hook with mmap_prepare
  mm: allow handling of stacked mmap_prepare hooks in more drivers
  ...
161 lines
4.2 KiB
C
// SPDX-License-Identifier: GPL-2.0
/*
 * Author: Andrei Vagin <avagin@openvz.org>
 * Author: Dmitry Safonov <dima@arista.com>
 */

#include <linux/cleanup.h>
#include <linux/mm.h>
#include <linux/time_namespace.h>
#include <linux/time.h>
#include <linux/vdso_datastore.h>

#include <vdso/clocksource.h>
#include <vdso/datapage.h>

#include "namespace_internal.h"

static struct timens_offset offset_from_ts(struct timespec64 off)
{
	struct timens_offset ret;

	ret.sec = off.tv_sec;
	ret.nsec = off.tv_nsec;

	return ret;
}

/*
 * A time namespace VVAR page has the same layout as the VVAR page which
 * contains the system wide VDSO data.
 *
 * For a normal task the VVAR pages are installed in the normal ordering:
 *     VVAR
 *     PVCLOCK
 *     HVCLOCK
 *     TIMENS   <- Not really required
 *
 * Now for a timens task the pages are installed in the following order:
 *     TIMENS
 *     PVCLOCK
 *     HVCLOCK
 *     VVAR
 *
 * The check for vdso_clock->clock_mode is in the unlikely path of
 * the seq begin magic. So for the non-timens case most of the time
 * 'seq' is even, so the branch is not taken.
 *
 * If 'seq' is odd, i.e. a concurrent update is in progress, the extra check
 * for vdso_clock->clock_mode is a non-issue. The task is spin waiting for the
 * update to finish and for 'seq' to become even anyway.
 *
 * Timens page has vdso_clock->clock_mode set to VDSO_CLOCKMODE_TIMENS which
 * enforces the time namespace handling path.
 */
static void timens_setup_vdso_clock_data(struct vdso_clock *vc,
					 struct time_namespace *ns)
{
	struct timens_offset *offset = vc->offset;
	struct timens_offset monotonic = offset_from_ts(ns->offsets.monotonic);
	struct timens_offset boottime = offset_from_ts(ns->offsets.boottime);

	vc->seq = 1;
	vc->clock_mode = VDSO_CLOCKMODE_TIMENS;
	offset[CLOCK_MONOTONIC] = monotonic;
	offset[CLOCK_MONOTONIC_RAW] = monotonic;
	offset[CLOCK_MONOTONIC_COARSE] = monotonic;
	offset[CLOCK_BOOTTIME] = boottime;
	offset[CLOCK_BOOTTIME_ALARM] = boottime;
}

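The comment above timens_setup_vdso_clock_data() describes how a permanently odd 'seq' on the timens page steers readers into the slow path, where clock_mode identifies the page as a time namespace page. The userspace sketch below illustrates that idea only. Every name in it (struct sketch_clock, sketch_read_mono(), the SKETCH_MODE_* constants) is hypothetical and heavily simplified; it is not the kernel's vDSO reader, it handles a single clock, and it reports "retry" to the caller instead of spinning.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins; these are NOT the kernel's struct vdso_clock. */
enum sketch_mode { SKETCH_MODE_NORMAL = 0, SKETCH_MODE_TIMENS = 1 };

struct sketch_clock {
	uint32_t seq;		/* odd: update in flight, or timens marker */
	enum sketch_mode mode;
	int64_t mono_sec;	/* host time; used on the real data page */
	int64_t mono_off_sec;	/* namespace offset; used on the timens page */
};

/*
 * Read monotonic seconds through the page found in the first mapping slot.
 * 'first' is either the real data page (normal task) or the timens page
 * (timens task); 'real' is always the system-wide data page.
 * Returns 0 on success, -1 if the caller has to retry.
 */
static int sketch_read_mono(const struct sketch_clock *first,
			    const struct sketch_clock *real, int64_t *out)
{
	const struct sketch_clock *vc = first;
	int64_t off = 0;

	assert(first && real && out);

	if (vc->seq & 1) {			/* unlikely path */
		if (vc->mode != SKETCH_MODE_TIMENS)
			return -1;		/* genuine update in progress */
		off = vc->mono_off_sec;		/* pick up the namespace offset */
		vc = real;			/* then read the real data page */
		if (vc->seq & 1)
			return -1;		/* real page is being updated */
	}

	*out = vc->mono_sec + off;
	return 0;
}
```

With a real page holding seq = 2 and mono_sec = 1000, a normal task reads 1000 without ever looking at the mode field, while a task whose first slot holds a page with seq = 1, SKETCH_MODE_TIMENS and an offset of -400 reads 600. The extra mode check costs nothing in the common case because it sits behind the odd-'seq' branch.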
struct page *find_timens_vvar_page(struct vm_area_struct *vma)
{
	if (likely(vma->vm_mm == current->mm))
		return current->nsproxy->time_ns->vvar_page;

	/*
	 * VM_PFNMAP | VM_IO protect .fault() handler from being called
	 * through interfaces like /proc/$pid/mem or
	 * process_vm_{readv,writev}() as long as there's no .access()
	 * in special_mapping_vmops().
	 * For more details check_vma_flags() and __access_remote_vm()
	 */

	WARN(1, "vvar_page accessed remotely");

	return NULL;
}

static void timens_set_vvar_page(struct task_struct *task,
				 struct time_namespace *ns)
{
	struct vdso_time_data *vdata;
	struct vdso_clock *vc;
	unsigned int i;

	if (ns == &init_time_ns)
		return;

	/* Fast-path, taken by every task in namespace except the first. */
	if (likely(ns->frozen_offsets))
		return;

	guard(mutex)(&timens_offset_lock);
	/* Nothing to-do: vvar_page has been already initialized. */
	if (ns->frozen_offsets)
		return;

	ns->frozen_offsets = true;
	vdata = page_address(ns->vvar_page);
	vc = vdata->clock_data;

	for (i = 0; i < CS_BASES; i++)
		timens_setup_vdso_clock_data(&vc[i], ns);

	if (IS_ENABLED(CONFIG_POSIX_AUX_CLOCKS)) {
		for (i = 0; i < ARRAY_SIZE(vdata->aux_clock_data); i++)
			timens_setup_vdso_clock_data(&vdata->aux_clock_data[i], ns);
	}
}

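timens_set_vvar_page() checks frozen_offsets once without the lock and once more after taking timens_offset_lock, so only the first task in a namespace pays for the setup. A minimal userspace sketch of that check/lock/re-check shape follows. It is an illustration under assumptions, not kernel code: struct sketch_ns, sketch_lock and sketch_freeze_once() are hypothetical names, a pthread mutex stands in for guard(mutex), and C11 atomics are used for the flag. Unlike the kernel function, which sets the flag before filling the page, the sketch publishes the flag last.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the namespace state touched by the setup. */
struct sketch_ns {
	atomic_bool frozen;	/* plays the role of frozen_offsets */
	int init_count;		/* how many times the setup body ran */
	int page_value;		/* stands in for the vvar page contents */
};

static pthread_mutex_t sketch_lock = PTHREAD_MUTEX_INITIALIZER;

/* Returns true only for the call that actually performed the setup. */
static bool sketch_freeze_once(struct sketch_ns *ns, int value)
{
	bool did_init = false;

	assert(ns);

	/* Fast path: every caller after the first returns here. */
	if (atomic_load_explicit(&ns->frozen, memory_order_acquire))
		return false;

	pthread_mutex_lock(&sketch_lock);
	/* Re-check under the lock: another thread may have won the race. */
	if (!atomic_load_explicit(&ns->frozen, memory_order_relaxed)) {
		ns->page_value = value;
		ns->init_count++;
		/* Publish only after the data is in place (sketch's choice). */
		atomic_store_explicit(&ns->frozen, true, memory_order_release);
		did_init = true;
	}
	pthread_mutex_unlock(&sketch_lock);

	return did_init;
}
```

Calling sketch_freeze_once() twice on a zero-initialized struct sketch_ns runs the setup body once: the first call returns true and stores its value, the second returns false from the fast path and leaves the stored value untouched.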
/*
 * The vvar page layout depends on whether a task belongs to the root or
 * non-root time namespace. Whenever a task changes its namespace, the VVAR
 * page tables are cleared and then they will be re-faulted with a
 * corresponding layout.
 * See also the comment near timens_setup_vdso_clock_data() for details.
 */
static int vdso_join_timens(struct task_struct *task, struct time_namespace *ns)
{
	struct mm_struct *mm = task->mm;
	struct vm_area_struct *vma;
	VMA_ITERATOR(vmi, mm, 0);

	guard(mmap_read_lock)(mm);
	for_each_vma(vmi, vma) {
		if (vma_is_special_mapping(vma, &vdso_vvar_mapping))
			zap_vma(vma);
	}
	return 0;
}

void timens_commit(struct task_struct *tsk, struct time_namespace *ns)
{
	timens_set_vvar_page(tsk, ns);
	vdso_join_timens(tsk, ns);
}

int timens_vdso_alloc_vvar_page(struct time_namespace *ns)
{
	ns->vvar_page = alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO);
	if (!ns->vvar_page)
		return -ENOMEM;

	return 0;
}

void timens_vdso_free_vvar_page(struct time_namespace *ns)
{
	__free_page(ns->vvar_page);
}