mm: add documentation for the mmap_prepare file operation callback

This documentation makes it easier for a driver/file system implementer to
correctly use this callback.

It covers the fundamentals, whilst intentionally leaving the less lovely
possible actions one might take undocumented (for instance - the
success_hook, error_hook fields in mmap_action).

The document also covers the new VMA flags implementation which is the
only one which will work correctly with mmap_prepare.

Link: https://lkml.kernel.org/r/3aebf918c213fa2aecf00a31a444119b5bdd7801.1774045440.git.ljs@kernel.org
Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org>
Acked-by: Vlastimil Babka (SUSE) <vbabka@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexandre Torgue <alexandre.torgue@foss.st.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Bodo Stroesser <bostroesser@gmail.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Clemens Ladisch <clemens@ladisch.de>
Cc: David Hildenbrand <david@kernel.org>
Cc: David Howells <dhowells@redhat.com>
Cc: Dexuan Cui <decui@microsoft.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jann Horn <jannh@google.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Long Li <longli@microsoft.com>
Cc: Marc Dionne <marc.dionne@auristor.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Miquel Raynal <miquel.raynal@bootlin.com>
Cc: Pedro Falcato <pfalcato@suse.de>
Cc: Richard Weinberger <richard@nod.at>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vignesh Raghavendra <vigneshr@ti.com>
Cc: Wei Liu <wei.liu@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
This commit is contained in:
Lorenzo Stoakes (Oracle)
2026-03-20 22:39:28 +00:00
committed by Andrew Morton
parent 3e4bb27068
commit fdd2478413
2 changed files with 143 additions and 0 deletions

@@ -29,6 +29,7 @@ algorithms work.
fiemap
files
locks
mmap_prepare
multigrain-ts
mount_api
quota

@@ -0,0 +1,142 @@
.. SPDX-License-Identifier: GPL-2.0
===========================
mmap_prepare callback HOWTO
===========================
Introduction
============
The ``struct file->f_op->mmap()`` callback has been deprecated as it is both a
stability and a security risk, and doesn't always permit the merging of adjacent
mappings, resulting in unnecessary memory fragmentation.
It has been replaced with the ``file->f_op->mmap_prepare()`` callback which
solves these problems.
This hook is called right at the beginning of setting up the mapping, and
importantly it is invoked *before* any merging of adjacent mappings has taken
place.
An error may still arise after this callback has been invoked, so the callback
should be treated as effectively stateless.
That is, no resources should be allocated and no state updated to reflect that a
mapping has been established, because once the callback completes the mapping
may still be merged with another, or may fail to be established at all.
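As a rough illustration of what "effectively stateless" means, the following is
a userspace model rather than kernel code. Every ``model_`` name is
hypothetical, and the device is passed as a second argument purely to keep the
sketch self-contained (a real callback receives only the descriptor). The
prepare step validates the request and fills in the descriptor, but allocates
nothing and updates no device state:

```c
#include <assert.h>

/* Hypothetical userspace stand-ins; these are NOT the kernel definitions. */
struct model_vm_ops {
	const char *name;
};

struct model_desc {
	unsigned long start;			/* first address of the request */
	unsigned long end;			/* one past the last address */
	const struct model_vm_ops *vm_ops;	/* written by the prepare step */
	void *private_data;			/* written by the prepare step */
};

struct model_device {
	unsigned long region_size;	/* bytes that may be mapped */
	int nr_mappings;		/* must NOT be touched while preparing */
};

static const struct model_vm_ops model_ops = { .name = "model" };

/*
 * Stateless prepare step: check the request and describe the mapping.
 * Nothing is allocated and the device is left untouched, so abandoning
 * the mapping afterwards requires no cleanup.
 */
static int model_mmap_prepare(struct model_desc *desc,
			      struct model_device *dev)
{
	assert(desc->start < desc->end);

	if (desc->end - desc->start > dev->region_size)
		return -1;	/* stand-in for a negative errno value */

	desc->vm_ops = &model_ops;
	desc->private_data = dev;	/* points at state that already exists */
	return 0;
}
```

Because this step only writes into the descriptor, nothing has to be undone if
the mapping is subsequently merged or fails to be established.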
How To Use
==========
In your driver's ``struct file_operations``, specify an ``mmap_prepare``
callback rather than an ``mmap`` one, e.g. for ext4:
.. code-block:: C

    const struct file_operations ext4_file_operations = {
        ...
        .mmap_prepare = ext4_file_mmap_prepare,
    };

This has a signature of ``int (*mmap_prepare)(struct vm_area_desc *)``.
Examining the struct vm_area_desc type:
.. code-block:: C

    struct vm_area_desc {
        /* Immutable state. */
        const struct mm_struct *const mm;
        struct file *const file; /* May vary from vm_file in stacked callers. */
        unsigned long start;
        unsigned long end;

        /* Mutable fields. Populated with initial state. */
        pgoff_t pgoff;
        struct file *vm_file;
        vma_flags_t vma_flags;
        pgprot_t page_prot;

        /* Write-only fields. */
        const struct vm_operations_struct *vm_ops;
        void *private_data;

        /* Take further action? */
        struct mmap_action action;
    };

This is straightforward - you have all the fields you need to set up the
mapping, and you can update the mutable and write-only fields, for instance:
.. code-block:: C

    static int ext4_file_mmap_prepare(struct vm_area_desc *desc)
    {
        int ret;
        struct file *file = desc->file;
        struct inode *inode = file->f_mapping->host;
        ...
        file_accessed(file);
        if (IS_DAX(file_inode(file))) {
            desc->vm_ops = &ext4_dax_vm_ops;
            vma_desc_set_flags(desc, VMA_HUGEPAGE_BIT);
        } else {
            desc->vm_ops = &ext4_file_vm_ops;
        }
        return 0;
    }

Importantly, you no longer have to dance around with reference counts or locks
when updating these fields - **you can simply go ahead and change them**.
Everything is taken care of by the mapping code.
VMA Flags
---------
Along with ``mmap_prepare``, VMA flags have undergone an overhaul. Previously
you would invoke one of vm_flags_init(), vm_flags_reset(), vm_flags_set(),
vm_flags_clear(), or vm_flags_mod() to modify flags (and to have the
locking done correctly for you). This is no longer necessary.
Also, the legacy approach of specifying VMA flags via ``VM_READ``, ``VM_WRITE``,
etc. - i.e. using a ``VM_xxx`` macro - has changed too.
When implementing mmap_prepare(), reference flags by their bit number, defined
as a ``VMA_xxx_BIT`` macro, e.g. ``VMA_READ_BIT``, ``VMA_WRITE_BIT`` etc.,
and use one of (where ``desc`` is a pointer to struct vm_area_desc):
* ``vma_desc_test_any(desc, ...)`` - Specify a comma-separated list of flags
  you wish to test for (whether *any* are set), e.g.
  ``vma_desc_test_any(desc, VMA_WRITE_BIT, VMA_MAYWRITE_BIT)`` returns ``true``
  if either is set, otherwise ``false``.
* ``vma_desc_set_flags(desc, ...)`` - Update the VMA descriptor flags to set
  additional flags specified by a comma-separated list, e.g.
  ``vma_desc_set_flags(desc, VMA_PFNMAP_BIT, VMA_IO_BIT)``.
* ``vma_desc_clear_flags(desc, ...)`` - Update the VMA descriptor flags to clear
  flags specified by a comma-separated list, e.g.
  ``vma_desc_clear_flags(desc, VMA_WRITE_BIT, VMA_MAYWRITE_BIT)``.
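
To illustrate the calling convention, here is a userspace model of helpers that
accept a comma-separated list of bit numbers. It is only a sketch: the
``model_`` names and bit values are hypothetical, the flags are held in a single
``unsigned long`` for simplicity, and none of this is the kernel's
implementation of ``vma_flags_t`` or of the ``vma_desc_*()`` helpers.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit numbers, chosen for this model only. */
enum {
	MODEL_READ_BIT,
	MODEL_WRITE_BIT,
	MODEL_MAYWRITE_BIT,
	MODEL_IO_BIT,
	MODEL_NR_BITS
};

/* Stand-in descriptor; a single word holds every flag in this model. */
struct model_desc {
	unsigned long flags;
};

/* Fold an array of bit numbers into a single mask. */
static unsigned long model_mk_mask(const int *bits, int count)
{
	unsigned long mask = 0;

	for (int i = 0; i < count; i++) {
		assert(bits[i] >= 0 && bits[i] < MODEL_NR_BITS);
		mask |= 1UL << bits[i];
	}
	return mask;
}

/* Turn a comma-separated list of bit numbers into a mask. */
#define MODEL_MASK(...)							\
	model_mk_mask((const int[]){ __VA_ARGS__ },			\
		      (int)(sizeof((const int[]){ __VA_ARGS__ }) / sizeof(int)))

/* True if at least one of the listed bits is set. */
#define model_desc_test_any(desc, ...) \
	(((desc)->flags & MODEL_MASK(__VA_ARGS__)) != 0)

/* Set every listed bit, leaving the others untouched. */
#define model_desc_set_flags(desc, ...) \
	((void)((desc)->flags |= MODEL_MASK(__VA_ARGS__)))

/* Clear every listed bit, leaving the others untouched. */
#define model_desc_clear_flags(desc, ...) \
	((void)((desc)->flags &= ~MODEL_MASK(__VA_ARGS__)))
```

In this model, callers name flags by bit number and never build a mask by hand,
which mirrors the way the ``VMA_xxx_BIT`` macros are passed to the helpers
listed above.
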
Actions
=======
You can now very easily have actions be performed upon a mapping once set up by
utilising simple helper functions invoked upon the struct vm_area_desc
pointer. These are:
* mmap_action_remap() - Remaps a range consisting only of PFNs, starting at a
  given virtual address and PFN and covering a set size.
* mmap_action_remap_full() - Same as mmap_action_remap(), only remaps the
  entire mapping from ``start_pfn`` onward.
* mmap_action_ioremap() - Same as mmap_action_remap(), only performs an I/O
  remap.
* mmap_action_ioremap_full() - Same as mmap_action_ioremap(), only remaps
  the entire mapping from ``start_pfn`` onward.
**NOTE:** The ``action`` field should not normally be manipulated directly;
use one of these helpers instead.
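
The relationship between the ranged and ``_full`` helpers can be sketched with a
userspace model. This is not kernel code: the ``model_`` types, field names and
parameter order are assumptions made for illustration only. The sketch records
a requested remap in the descriptor and does not perform it, and the ``_full``
variant simply covers the whole range described by the descriptor:

```c
#include <assert.h>

/* Hypothetical action kinds for this model. */
enum model_action_type {
	MODEL_ACTION_NONE,
	MODEL_ACTION_REMAP_PFN
};

struct model_action {
	enum model_action_type type;
	unsigned long start;		/* first virtual address to remap */
	unsigned long start_pfn;	/* first page frame number to map */
	unsigned long size;		/* length of the range in bytes */
};

struct model_desc {
	unsigned long start;		/* first address of the mapping */
	unsigned long end;		/* one past the last address */
	struct model_action action;	/* filled in via the helpers only */
};

/* Request a PFN remap of [start, start + size) within the mapping. */
static void model_action_remap(struct model_desc *desc, unsigned long start,
			       unsigned long start_pfn, unsigned long size)
{
	/* The requested range must lie inside the mapping. */
	assert(start >= desc->start && start < desc->end);
	assert(size <= desc->end - start);

	desc->action = (struct model_action){
		.type = MODEL_ACTION_REMAP_PFN,
		.start = start,
		.start_pfn = start_pfn,
		.size = size,
	};
}

/* Request a PFN remap of the entire mapping from start_pfn onward. */
static void model_action_remap_full(struct model_desc *desc,
				    unsigned long start_pfn)
{
	model_action_remap(desc, desc->start, start_pfn,
			   desc->end - desc->start);
}
```

Going through a helper keeps the range checks in one place, which is one
plausible reason to avoid writing to the ``action`` field by hand.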