Marco Elver
e4588c25c9
compiler-context-analysis: Remove __cond_lock() function-like helper
...
As discussed in [1], removing __cond_lock() will improve the readability
of trylock code. Now that Sparse context tracking support has been
removed, we can also remove __cond_lock().
Change existing APIs to either drop __cond_lock() completely, or make
use of the __cond_acquires() function attribute instead.
In particular, spinlock and rwlock implementations required switching
over to inline helpers rather than statement-expressions for their
trylock_* variants.
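The shape of that conversion can be sketched in plain userspace C (names and the `__cond_acquires(ret, lock)` signature are taken as assumptions from this series; the no-op stub below merely stands in for the real attribute). An inline function gives the analysis one declaration to attach "acquires the lock iff the return value matches" to, which a `({ ... })` statement-expression macro cannot express:

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in: when no context analysis is enabled, the
 * attribute compiles away to nothing, as this stub does. */
#ifndef __cond_acquires
#define __cond_acquires(ret, lock)
#endif

static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;

/* Inline helper instead of a statement-expression macro: the
 * conditional-acquire contract lives on the function declaration. */
static inline bool demo_trylock(pthread_mutex_t *lock)
	__cond_acquires(true, lock)
{
	return pthread_mutex_trylock(lock) == 0;
}
```

A caller then reads naturally, with no `__cond_lock()` wrapper: `if (demo_trylock(&demo_lock)) { ... pthread_mutex_unlock(&demo_lock); }`.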
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Marco Elver <elver@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/all/20250207082832.GU7145@noisy.programming.kicks-ass.net/ [1]
Link: https://patch.msgid.link/20251219154418.3592607-25-elver@google.com
2026-01-05 16:43:33 +01:00
Matthew Wilcox (Oracle)
2197bb60f8
mm: add vma_start_write_killable()
...
Patch series "vma_start_write_killable", v2.
When we added the VMA lock, we made a major oversight in not adding a
killable variant. That can run us into trouble where a thread takes the
VMA lock for read (eg handling a page fault) and then goes out to lunch
for an hour (eg doing reclaim). Another thread tries to modify the VMA,
taking the mmap_lock for write, then attempts to lock the VMA for write.
That blocks on the first thread, and ensures that every other page fault
now tries to take the mmap_lock for read. Because everything's in an
uninterruptible sleep, we can't kill the task, which makes me angry.
This patchset just adds vma_start_write_killable() and converts one caller
to use it. Most users are somewhat tricky to convert, so expect follow-up
individual patches per call-site which need careful analysis to make sure
we've done proper cleanup.
This patch (of 2):
The vma can be held read-locked for a substantial period of time, eg if
memory allocation needs to go into reclaim. It's useful to be able to
send fatal signals to threads which are waiting for the write lock.
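The waiting behaviour being added can be illustrated with a minimal userspace sketch (all names here are hypothetical emulations, not the kernel API; the assumption is that the killable variant returns 0 on success and -EINTR when a fatal signal arrives while waiting, instead of sleeping uninterruptibly):

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Emulated VMA with a single write-lock bit. */
struct demo_vma {
	atomic_bool write_locked;
};

/* Stand-in for fatal_signal_pending() in this sketch. */
static atomic_bool fatal_signal;

/* Killable-acquire pattern: give up the wait when a fatal signal
 * is pending, rather than blocking forever. */
static int demo_vma_start_write_killable(struct demo_vma *vma)
{
	bool expected = false;

	/* Spinning stands in for the kernel's killable sleep. */
	while (!atomic_compare_exchange_weak(&vma->write_locked,
					     &expected, true)) {
		expected = false;
		if (atomic_load(&fatal_signal))
			return -EINTR;	/* killed while waiting */
	}
	return 0;			/* write lock acquired */
}
```

The caller pattern is the point: check the return value and bail out, so a SIGKILL can actually terminate the task instead of leaving it stuck behind a reader out to lunch in reclaim.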
Link: https://lkml.kernel.org/r/20251110203204.1454057-1-willy@infradead.org
Link: https://lkml.kernel.org/r/20251110203204.1454057-2-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Chris Li <chriscli@google.com>
Cc: Jann Horn <jannh@google.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-20 13:43:59 -08:00
Lorenzo Stoakes
80d1a81309
docs/mm: expand vma doc to highlight pte freeing, non-vma traversal
...
The process addresses documentation already contains a great deal of
information about mmap/VMA locking and page table traversal and
manipulation.
However, it waves its hands about non-VMA traversal. Add a section for this
and explain the caveats around this kind of traversal.
Additionally, commit 6375e95f38 ("mm: pgtable: reclaim empty PTE page in
madvise(MADV_DONTNEED)") caused zapping to also free empty PTE page
tables. Highlight this.
Link: https://lkml.kernel.org/r/20250604180308.137116-1-lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-09 22:41:52 -07:00
Suren Baghdasaryan
795f29616e
docs/mm: document latest changes to vm_lock
...
Change the documentation to reflect that vm_lock is integrated into the vma
and replaced with vm_refcnt. Document the newly introduced
vma_start_read_locked{_nested}() functions.
Link: https://lkml.kernel.org/r/20250213224655.1680278-19-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett@Oracle.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Tested-by: Shivank Garg <shivankg@amd.com>
Link: https://lkml.kernel.org/r/5e19ec93-8307-47c2-bb13-3ddf7150624e@amd.com
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Christian Brauner <brauner@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jann Horn <jannh@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Klara Modin <klarasmodin@gmail.com>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Cc: Mateusz Guzik <mjguzik@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Minchan Kim <minchan@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: "Paul E. McKenney" <paulmck@kernel.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Sourav Panda <souravpanda@google.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Will Deacon <will@kernel.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-16 22:06:21 -07:00
Qi Zheng
6c18ec9af8
mm: khugepaged: recheck pmd state in retract_page_tables()
...
Patch series "synchronously scan and reclaim empty user PTE pages", v4.
Previously, we tried to use a completely asynchronous method to reclaim
empty user PTE pages [1]. After discussing with David Hildenbrand, we
decided to implement synchronous reclamation in the case of
madvise(MADV_DONTNEED) as the first step.
So this series aims to synchronously free the empty PTE pages in
madvise(MADV_DONTNEED) case. We will detect and free empty PTE pages in
zap_pte_range(), and will add zap_details.reclaim_pt to exclude cases
other than madvise(MADV_DONTNEED).
In zap_pte_range(), mmu_gather is used to perform batch tlb flushing and
page freeing operations. Therefore, if we want to free the empty PTE page
in this path, the most natural way is to add it to mmu_gather as well.
Now, if CONFIG_MMU_GATHER_RCU_TABLE_FREE is selected, mmu_gather will free
page table pages by semi RCU:
- batch table freeing: asynchronous free by RCU
- single table freeing: IPI + synchronous free
But this is not enough to free the empty PTE page table pages in paths
other than the munmap and exit_mmap paths, because IPI cannot be
synchronized with rcu_read_lock() in pte_offset_map{_lock}(). So we should
let single tables also be freed by RCU, like batch table freeing.
As a first step, we support this feature on x86_64 and select the newly
introduced CONFIG_ARCH_SUPPORTS_PT_RECLAIM.
For other cases such as madvise(MADV_FREE), consider scanning and freeing
empty PTE pages asynchronously in the future.
Note: issues related to TLB flushing are not new to this series and are
tracked in the separate RFC patch [3]. For more context, please refer to
thread [4].
[1]. https://lore.kernel.org/lkml/cover.1718267194.git.zhengqi.arch@bytedance.com/
[2]. https://lore.kernel.org/lkml/cover.1727332572.git.zhengqi.arch@bytedance.com/
[3]. https://lore.kernel.org/lkml/20240815120715.14516-1-zhengqi.arch@bytedance.com/
[4]. https://lore.kernel.org/lkml/6f38cb19-9847-4f70-bbe7-06881bb016be@bytedance.com/
This patch (of 11):
In retract_page_tables(), the lock of new_folio is still held, so the page
fault path will be blocked, which prevents the pte entries from being set
again. So even though the old empty PTE page may be concurrently freed and
a new PTE page filled into the pmd entry, it is still empty and can be
removed.
So just refactor retract_page_tables() a little bit and recheck the pmd
state after holding the pmd lock.
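The recheck-under-lock pattern being applied here can be sketched in userspace C (all names are hypothetical stand-ins; the real code rechecks the pmd entry under the pmd spinlock, not a pthread mutex):

```c
#include <pthread.h>
#include <stdbool.h>

/* Emulated pmd: a lock plus a flag standing in for "this pmd still
 * maps an empty PTE table". */
struct demo_pmd {
	pthread_mutex_t lock;
	bool empty;
};

/* Check, lock, then recheck: the state observed before taking the
 * lock may have changed concurrently, so it is validated again once
 * the lock is held before acting on it. */
static bool demo_retract(struct demo_pmd *pmd)
{
	bool retracted = false;

	if (!pmd->empty)		/* cheap unlocked pre-check */
		return false;

	pthread_mutex_lock(&pmd->lock);
	/* Recheck: the entry may have been cleared or replaced between
	 * the pre-check and acquiring the lock. */
	if (pmd->empty) {
		pmd->empty = false;	/* "clear" the entry */
		retracted = true;
	}
	pthread_mutex_unlock(&pmd->lock);
	return retracted;
}
```

In the patch itself the point is stronger: because new_folio's lock blocks the fault path, even a concurrently refilled PTE page must still be empty, so clearing it after the recheck remains safe.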
Link: https://lkml.kernel.org/r/cover.1733305182.git.zhengqi.arch@bytedance.com
Link: https://lkml.kernel.org/r/70a51804cd19d44ccaf031825d9fb6eaf92f2bad.1733305182.git.zhengqi.arch@bytedance.com
Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
Suggested-by: Jann Horn <jannh@google.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Peter Xu <peterx@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Cc: Zach O'Keefe <zokeefe@google.com>
Cc: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-13 22:40:46 -08:00
Lorenzo Stoakes
dbf8be8218
docs/mm: add VMA locks documentation
...
Locking around VMAs is complicated and confusing. While we have a number
of disparate comments scattered around the place, we seem to be reaching a
level of complexity that justifies a serious effort at clearly documenting
how locks are expected to be used when it comes to interacting with
mm_struct and vm_area_struct objects.
This is especially pertinent as regards the efforts to find sensible
abstractions for these fundamental objects in kernel rust code whose
compiler strictly requires some means of expressing these rules (and
through this expression, self-document these requirements as well as
enforce them).
The document limits scope to mmap and VMA locks and those that are
immediately adjacent and relevant to them - so additionally covers page
table locking as this is so very closely tied to VMA operations (and
relies upon us handling these correctly).
The document tries to cover some of the nastier and more confusing edge
cases and concerns especially around lock ordering and page table
teardown.
The document is split between generally useful information for users of mm
interfaces, and separately a section intended for mm kernel developers
providing a discussion around internal implementation details.
[lorenzo.stoakes@oracle.com: v3]
Link: https://lkml.kernel.org/r/20241114205402.859737-1-lorenzo.stoakes@oracle.com
[lorenzo.stoakes@oracle.com: docs/mm: minor corrections]
Link: https://lkml.kernel.org/r/d3de735a-25ae-4eb2-866c-a9624fe6f795@lucifer.local
[jannh@google.com: docs/mm: add more warnings around page table access]
Link: https://lkml.kernel.org/r/20241118-vma-docs-addition1-onv3-v2-1-c9d5395b72ee@google.com
Link: https://lkml.kernel.org/r/20241108135708.48567-1-lorenzo.stoakes@oracle.com
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Acked-by: Qi Zheng <zhengqi.arch@bytedance.com>
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com>
Reviewed-by: Jann Horn <jannh@google.com>
Cc: Alice Ryhl <aliceryhl@google.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Hillf Danton <hdanton@sina.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Liam R. Howlett <Liam.Howlett@Oracle.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: SeongJae Park <sj@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-12-18 19:04:41 -08:00
Mike Rapoport
ee65728e10
docs: rename Documentation/vm to Documentation/mm
...
so it will be consistent with the mm code directory and with
Documentation/admin-guide/mm, and won't be confused with virtual machines.
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Suggested-by: Matthew Wilcox <willy@infradead.org>
Tested-by: Ira Weiny <ira.weiny@intel.com>
Acked-by: Jonathan Corbet <corbet@lwn.net>
Acked-by: Wu XiangCheng <bobwxc@email.cn>
2022-06-27 12:52:53 -07:00