Commit Graph

297 Commits

Author SHA1 Message Date
Jeff Layton
0b2600f81c treewide: change inode->i_ino from unsigned long to u64
On 32-bit architectures, unsigned long is only 32 bits wide, which
causes 64-bit inode numbers to be silently truncated. Several
filesystems (NFS, XFS, BTRFS, etc.) can generate inode numbers that
exceed 32 bits, and this truncation can lead to inode number collisions
and other subtle bugs on 32-bit systems.

Change the type of inode->i_ino from unsigned long to u64 to ensure that
inode numbers are always represented as 64-bit values regardless of
architecture. Update all format specifiers treewide from %lu/%lx to
%llu/%llx to match the new type, along with corresponding local variable
types.

This is the bulk treewide conversion. Earlier patches in this series
handled trace events separately to allow trace field reordering for
better struct packing on 32-bit.

Signed-off-by: Jeff Layton <jlayton@kernel.org>
Link: https://patch.msgid.link/20260304-iino-u64-v3-12-2257ad83d372@kernel.org
Acked-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-03-06 14:31:28 +01:00
Linus Torvalds
3e48a11675 Merge tag 'f2fs-for-7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs updates from Jaegeuk Kim:
 "In this development cycle, we focused on several key performance
  optimizations:

   - introducing large folio support to enhance read speeds for
     immutable files

   - reducing checkpoint=enable latency by flushing only committed dirty
     pages

   - implementing tracepoints to diagnose and resolve lock priority
     inversion.

  Additionally, we introduced the packed_ssa feature to optimize the SSA
  footprint when utilizing large block sizes.

  Detail summary:

  Enhancements:
   - support large folio for immutable non-compressed case
   - support non-4KB block size without packed_ssa feature
   - optimize f2fs_enable_checkpoint() to avoid long delay
   - optimize f2fs_overwrite_io() for f2fs_iomap_begin
   - optimize NAT block loading during checkpoint write
   - add write latency stats for NAT and SIT blocks in
     f2fs_write_checkpoint
   - pin files do not require sbi->writepages lock for ordering
   - avoid f2fs_map_blocks() for consecutive holes in readpages
   - flush plug periodically during GC to maximize readahead effect
   - add tracepoints to catch lock overheads
   - add several sysfs entries to tune internal lock priorities

  Fixes:
   - fix lock priority inversion issue
   - fix incomplete block usage in compact SSA summaries
   - fix to show simulate_lock_timeout correctly
   - fix to avoid mapping wrong physical block for swapfile
   - fix IS_CHECKPOINTED flag inconsistency issue caused by
     concurrent atomic commit and checkpoint writes
   - fix to avoid UAF in f2fs_write_end_io()"

* tag 'f2fs-for-7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (61 commits)
  f2fs: sysfs: introduce critical_task_priority
  f2fs: introduce trace_f2fs_priority_update
  f2fs: fix lock priority inversion issue
  f2fs: optimize f2fs_overwrite_io() for f2fs_iomap_begin
  f2fs: fix incomplete block usage in compact SSA summaries
  f2fs: decrease maximum flush retry count in f2fs_enable_checkpoint()
  f2fs: optimize NAT block loading during checkpoint write
  f2fs: change size parameter of __has_cursum_space() to unsigned int
  f2fs: add write latency stats for NAT and SIT blocks in f2fs_write_checkpoint
  f2fs: pin files do not require sbi->writepages lock for ordering
  f2fs: fix to show simulate_lock_timeout correctly
  f2fs: introduce FAULT_SKIP_WRITE
  f2fs: check skipped write in f2fs_enable_checkpoint()
  Revert "f2fs: add timeout in f2fs_enable_checkpoint()"
  f2fs: fix to unlock folio in f2fs_read_data_large_folio()
  f2fs: fix error path handling in f2fs_read_data_large_folio()
  f2fs: use folio_end_read
  f2fs: fix to avoid mapping wrong physical block for swapfile
  f2fs: avoid f2fs_map_blocks() for consecutive holes in readpages
  f2fs: advance index and offset after zeroing in large folio read
  ...
2026-02-14 09:48:10 -08:00
Christoph Hellwig
70098d9327 fs,fsverity: clear out fsverity_info from common code
Free the fsverity_info directly in clear_inode instead of requiring file
systems to handle it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
Acked-by: David Sterba <dsterba@suse.com> # btrfs
Link: https://lore.kernel.org/r/20260128152630.627409-3-hch@lst.de
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2026-01-29 09:39:41 -08:00
Chao Yu
66e9e0d55d f2fs: trace elapsed time for cp_rwsem lock
Use f2fs_{down,up}_read_trace for cp_rwsem to trace lock elapsed time.

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2026-01-07 03:17:06 +00:00
Chao Yu
3cb396a2c7 f2fs: fix to do sanity check on nat entry of quota inode
As Zhiguo reported, nat entry of quota inode could be corrupted:

"ino/block_addr=NULL_ADDR in nid=4 entry"

We'd better to do sanity check on quota inode to detect and record
nat.blk_addr inconsistency, so that we can have a chance to repair
it w/ later fsck.

Reported-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Reviewed-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2026-01-01 03:26:56 +00:00
Jaegeuk Kim
05e65c14ea f2fs: support large folio for immutable non-compressed case
This patch enables large folio for limited case where we can get the high-order
memory allocation. It supports the encrypted and fsverity files, which are
essential for Android environment.

How to test:
- dd if=/dev/zero of=/mnt/test/test bs=1G count=4
- f2fs_io setflags immutable /mnt/test/test
- echo 3 > /proc/sys/vm/drop_caches
 : to reload inode with large folio
- f2fs_io read 32 0 1024 mmap 0 0 /mnt/test/test

Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-12-16 00:46:49 +00:00
Linus Torvalds
cb015814f8 Merge tag 'f2fs-for-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs updates from Jaegeuk Kim:
 "This series focuses on minor clean-ups and performance optimizations
  across sysfs, documentation, debugfs, tracepoints, slab allocation,
  and GC. Furthermore, it resolves several corner-case bugs caught by
  xfstests, as well as issues related to 16KB page support and
  f2fs_enable_checkpoint.

  Enhancement:
   - wrap ASCII tables in literal blocks to fix LaTeX build
   - optimize trace_f2fs_write_checkpoint with enums
   - support to show curseg.next_blkoff in debugfs
   - add a sysfs entry to show max open zones
   - add fadvise tracepoint
   - use global inline_xattr_slab instead of per-sb slab cache
   - set default valid_thresh_ratio to 80 for zoned devices
   - maintain one time GC mode is enabled during whole zoned GC cycle

  Bug fix:
   - ensure node page reads complete before f2fs_put_super() finishes
   - do not account invalid blocks in get_left_section_blocks()
   - revert summary entry count from 2048 to 512 in 16kb block support
   - detect recoverable inode during dryrun of find_fsync_dnodes()
   - fix age extent cache insertion skip on counter overflow
   - add sanity checks before unlinking and loading inodes
   - ensure minimum trim granularity accounts for all devices
   - block cache/dio write during f2fs_enable_checkpoint()
   - propagate error from f2fs_enable_checkpoint()
   - invalidate dentry cache on failed whiteout creation
   - avoid updating compression context during writeback
   - avoid updating zero-sized extent in extent cache
   - avoid potential deadlock"

* tag 'f2fs-for-6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (39 commits)
  f2fs: ignore discard return value
  f2fs: optimize trace_f2fs_write_checkpoint with enums
  f2fs: fix to not account invalid blocks in get_left_section_blocks()
  f2fs: support to show curseg.next_blkoff in debugfs
  docs: f2fs: wrap ASCII tables in literal blocks to fix LaTeX build
  f2fs: expand scalability of f2fs mount option
  f2fs: change default schedule timeout value
  f2fs: introduce f2fs_schedule_timeout()
  f2fs: use memalloc_retry_wait() as much as possible
  f2fs: add a sysfs entry to show max open zones
  f2fs: wrap all unusable_blocks_per_sec code in CONFIG_BLK_DEV_ZONED
  f2fs: simplify list initialization in f2fs_recover_fsync_data()
  f2fs: revert summary entry count from 2048 to 512 in 16kb block support
  f2fs: fix to detect recoverable inode during dryrun of find_fsync_dnodes()
  f2fs: fix return value of f2fs_recover_fsync_data()
  f2fs: add fadvise tracepoint
  f2fs: fix age extent cache insertion skip on counter overflow
  f2fs: Add sanity checks before unlinking and loading inodes
  f2fs: Rename f2fs_unlink exit label
  f2fs: ensure minimum trim granularity accounts for all devices
  ...
2025-12-09 12:06:20 +09:00
Nikola Z. Ivanov
f37981edcd f2fs: Add sanity checks before unlinking and loading inodes
Add check for inode->i_nlink == 1 for directories during unlink,
as their value is decremented twice, which can trigger a warning in
drop_nlink. In such case mark the filesystem as corrupted and return
from the function call with the relevant failure return value.

Additionally add the check for i_nlink == 1 in
sanity_check_inode in order to detect on-disk corruption early.

Reported-by: syzbot+c07d47c7bc68f47b9083@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=c07d47c7bc68f47b9083
Tested-by: syzbot+c07d47c7bc68f47b9083@syzkaller.appspotmail.com
Signed-off-by: Nikola Z. Ivanov <zlatistiv@gmail.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-12-04 02:00:03 +00:00
Mateusz Guzik
ba69118c52 f2fs: use the new ->i_state accessors
Change generated with coccinelle and fixed up by hand as appropriate.

Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-10-20 20:22:27 +02:00
Jaegeuk Kim
078cad8212 f2fs: drop inode from the donation list when the last file is closed
Let's drop the inode from the donation list when there is no other
open file.

Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-07-30 17:13:12 +00:00
Matthew Wilcox (Oracle)
8591db2a65 f2fs: Pass a folio to F2FS_NODE()
All callers now have a folio so pass it in

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-07-22 15:57:48 +00:00
Matthew Wilcox (Oracle)
4ecaf580ee f2fs: Add folio counterparts to page_private_flags functions
Name these new functions folio_test_f2fs_*(), folio_set_f2fs_*() and
folio_clear_f2fs_*().  Convert all callers which currently have a folio
and cast back to a page.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-07-22 15:57:05 +00:00
Matthew Wilcox (Oracle)
a5f3be6e65 f2fs: Pass a folio to IS_INODE()
All callers now have a folio so pass it in.  Also make it const to help
the compiler.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-07-22 15:57:02 +00:00
Matthew Wilcox (Oracle)
1fd0dffdb4 f2fs: Pass a folio to is_cold_node()
All callers now have a folio so pass it in.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-07-22 15:56:45 +00:00
Matthew Wilcox (Oracle)
5398745334 f2fs: Pass a folio to set_cold_node()
All callers have a folio so pass it in.  Also mark it as const to help
the compiler.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-07-22 15:56:20 +00:00
Matthew Wilcox (Oracle)
5ea99b6d70 f2fs: Pass a folio to f2fs_inode_chksum()
Both callers have a folio so pass it in.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-07-22 15:56:12 +00:00
Matthew Wilcox (Oracle)
6ebd7ba499 f2fs: Pass a folio to f2fs_enable_inode_chksum()
All callers have a folio so pass it in.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-07-22 15:56:09 +00:00
Matthew Wilcox (Oracle)
e3f1b76d87 f2fs: Pass a folio to f2fs_inode_chksum_set()
All callers have a folio so pass it in.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-07-22 15:56:06 +00:00
Matthew Wilcox (Oracle)
a63f2de2dd f2fs: Pass a folio to nid_of_node()
All callers have a folio so pass it in.  Also make the argument const
as the function does not modify it.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-07-22 15:55:50 +00:00
Matthew Wilcox (Oracle)
28fde0d7ff f2fs: Pass a folio to ino_of_node()
All callers have a folio so pass it in.  Also make the argument const
as the function does not modify it.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-07-22 15:55:47 +00:00
Matthew Wilcox (Oracle)
9d71780716 f2fs: Pass a folio to F2FS_INODE()
All callers now have a folio, so pass it in.  Also make it const as
F2FS_INODE() does not modify the struct folio passed in (the data it
describes is mutable, but it does not change the contents of the struct).
This may improve code generation.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-07-22 15:55:43 +00:00
Matthew Wilcox (Oracle)
1f6425e33d f2fs: Pass a folio to f2fs_sanity_check_inline_data()
The only caller has a folio, so pass it in.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-07-22 15:55:38 +00:00
Matthew Wilcox (Oracle)
ea3f2069ea f2fs: Pass a folio to sanity_check_inode()
The only caller has a folio, so pass it in.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-07-22 15:55:34 +00:00
Matthew Wilcox (Oracle)
afd42fa98b f2fs: Pass a folio to sanity_check_extent_cache()
The only caller has a folio, so pass it in.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-07-22 15:55:31 +00:00
Chao Yu
a509a55f8e f2fs: fix to avoid panic in f2fs_evict_inode
As syzbot [1] reported as below:

R10: 0000000000000100 R11: 0000000000000206 R12: 00007ffe17473450
R13: 00007f28b1c10854 R14: 000000000000dae5 R15: 00007ffe17474520
 </TASK>
---[ end trace 0000000000000000 ]---
==================================================================
BUG: KASAN: use-after-free in __list_del_entry_valid+0xa6/0x130 lib/list_debug.c:62
Read of size 8 at addr ffff88812d962278 by task syz-executor/564

CPU: 1 PID: 564 Comm: syz-executor Tainted: G        W          6.1.129-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 02/12/2025
Call Trace:
 <TASK>
 __dump_stack+0x21/0x24 lib/dump_stack.c:88
 dump_stack_lvl+0xee/0x158 lib/dump_stack.c:106
 print_address_description+0x71/0x210 mm/kasan/report.c:316
 print_report+0x4a/0x60 mm/kasan/report.c:427
 kasan_report+0x122/0x150 mm/kasan/report.c:531
 __asan_report_load8_noabort+0x14/0x20 mm/kasan/report_generic.c:351
 __list_del_entry_valid+0xa6/0x130 lib/list_debug.c:62
 __list_del_entry include/linux/list.h:134 [inline]
 list_del_init include/linux/list.h:206 [inline]
 f2fs_inode_synced+0xf7/0x2e0 fs/f2fs/super.c:1531
 f2fs_update_inode+0x74/0x1c40 fs/f2fs/inode.c:585
 f2fs_update_inode_page+0x137/0x170 fs/f2fs/inode.c:703
 f2fs_write_inode+0x4ec/0x770 fs/f2fs/inode.c:731
 write_inode fs/fs-writeback.c:1460 [inline]
 __writeback_single_inode+0x4a0/0xab0 fs/fs-writeback.c:1677
 writeback_single_inode+0x221/0x8b0 fs/fs-writeback.c:1733
 sync_inode_metadata+0xb6/0x110 fs/fs-writeback.c:2789
 f2fs_sync_inode_meta+0x16d/0x2a0 fs/f2fs/checkpoint.c:1159
 block_operations fs/f2fs/checkpoint.c:1269 [inline]
 f2fs_write_checkpoint+0xca3/0x2100 fs/f2fs/checkpoint.c:1658
 kill_f2fs_super+0x231/0x390 fs/f2fs/super.c:4668
 deactivate_locked_super+0x98/0x100 fs/super.c:332
 deactivate_super+0xaf/0xe0 fs/super.c:363
 cleanup_mnt+0x45f/0x4e0 fs/namespace.c:1186
 __cleanup_mnt+0x19/0x20 fs/namespace.c:1193
 task_work_run+0x1c6/0x230 kernel/task_work.c:203
 exit_task_work include/linux/task_work.h:39 [inline]
 do_exit+0x9fb/0x2410 kernel/exit.c:871
 do_group_exit+0x210/0x2d0 kernel/exit.c:1021
 __do_sys_exit_group kernel/exit.c:1032 [inline]
 __se_sys_exit_group kernel/exit.c:1030 [inline]
 __x64_sys_exit_group+0x3f/0x40 kernel/exit.c:1030
 x64_sys_call+0x7b4/0x9a0 arch/x86/include/generated/asm/syscalls_64.h:232
 do_syscall_x64 arch/x86/entry/common.c:51 [inline]
 do_syscall_64+0x4c/0xa0 arch/x86/entry/common.c:81
 entry_SYSCALL_64_after_hwframe+0x68/0xd2
RIP: 0033:0x7f28b1b8e169
Code: Unable to access opcode bytes at 0x7f28b1b8e13f.
RSP: 002b:00007ffe174710a8 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7
RAX: ffffffffffffffda RBX: 00007f28b1c10879 RCX: 00007f28b1b8e169
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000001
RBP: 0000000000000002 R08: 00007ffe1746ee47 R09: 00007ffe17472360
R10: 0000000000000009 R11: 0000000000000246 R12: 00007ffe17472360
R13: 00007f28b1c10854 R14: 000000000000dae5 R15: 00007ffe17474520
 </TASK>

Allocated by task 569:
 kasan_save_stack mm/kasan/common.c:45 [inline]
 kasan_set_track+0x4b/0x70 mm/kasan/common.c:52
 kasan_save_alloc_info+0x25/0x30 mm/kasan/generic.c:505
 __kasan_slab_alloc+0x72/0x80 mm/kasan/common.c:328
 kasan_slab_alloc include/linux/kasan.h:201 [inline]
 slab_post_alloc_hook+0x4f/0x2c0 mm/slab.h:737
 slab_alloc_node mm/slub.c:3398 [inline]
 slab_alloc mm/slub.c:3406 [inline]
 __kmem_cache_alloc_lru mm/slub.c:3413 [inline]
 kmem_cache_alloc_lru+0x104/0x220 mm/slub.c:3429
 alloc_inode_sb include/linux/fs.h:3245 [inline]
 f2fs_alloc_inode+0x2d/0x340 fs/f2fs/super.c:1419
 alloc_inode fs/inode.c:261 [inline]
 iget_locked+0x186/0x880 fs/inode.c:1373
 f2fs_iget+0x55/0x4c60 fs/f2fs/inode.c:483
 f2fs_lookup+0x366/0xab0 fs/f2fs/namei.c:487
 __lookup_slow+0x2a3/0x3d0 fs/namei.c:1690
 lookup_slow+0x57/0x70 fs/namei.c:1707
 walk_component+0x2e6/0x410 fs/namei.c:1998
 lookup_last fs/namei.c:2455 [inline]
 path_lookupat+0x180/0x490 fs/namei.c:2479
 filename_lookup+0x1f0/0x500 fs/namei.c:2508
 vfs_statx+0x10b/0x660 fs/stat.c:229
 vfs_fstatat fs/stat.c:267 [inline]
 vfs_lstat include/linux/fs.h:3424 [inline]
 __do_sys_newlstat fs/stat.c:423 [inline]
 __se_sys_newlstat+0xd5/0x350 fs/stat.c:417
 __x64_sys_newlstat+0x5b/0x70 fs/stat.c:417
 x64_sys_call+0x393/0x9a0 arch/x86/include/generated/asm/syscalls_64.h:7
 do_syscall_x64 arch/x86/entry/common.c:51 [inline]
 do_syscall_64+0x4c/0xa0 arch/x86/entry/common.c:81
 entry_SYSCALL_64_after_hwframe+0x68/0xd2

Freed by task 13:
 kasan_save_stack mm/kasan/common.c:45 [inline]
 kasan_set_track+0x4b/0x70 mm/kasan/common.c:52
 kasan_save_free_info+0x31/0x50 mm/kasan/generic.c:516
 ____kasan_slab_free+0x132/0x180 mm/kasan/common.c:236
 __kasan_slab_free+0x11/0x20 mm/kasan/common.c:244
 kasan_slab_free include/linux/kasan.h:177 [inline]
 slab_free_hook mm/slub.c:1724 [inline]
 slab_free_freelist_hook+0xc2/0x190 mm/slub.c:1750
 slab_free mm/slub.c:3661 [inline]
 kmem_cache_free+0x12d/0x2a0 mm/slub.c:3683
 f2fs_free_inode+0x24/0x30 fs/f2fs/super.c:1562
 i_callback+0x4c/0x70 fs/inode.c:250
 rcu_do_batch+0x503/0xb80 kernel/rcu/tree.c:2297
 rcu_core+0x5a2/0xe70 kernel/rcu/tree.c:2557
 rcu_core_si+0x9/0x10 kernel/rcu/tree.c:2574
 handle_softirqs+0x178/0x500 kernel/softirq.c:578
 run_ksoftirqd+0x28/0x30 kernel/softirq.c:945
 smpboot_thread_fn+0x45a/0x8c0 kernel/smpboot.c:164
 kthread+0x270/0x310 kernel/kthread.c:376
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295

Last potentially related work creation:
 kasan_save_stack+0x3a/0x60 mm/kasan/common.c:45
 __kasan_record_aux_stack+0xb6/0xc0 mm/kasan/generic.c:486
 kasan_record_aux_stack_noalloc+0xb/0x10 mm/kasan/generic.c:496
 call_rcu+0xd4/0xf70 kernel/rcu/tree.c:2845
 destroy_inode fs/inode.c:316 [inline]
 evict+0x7da/0x870 fs/inode.c:720
 iput_final fs/inode.c:1834 [inline]
 iput+0x62b/0x830 fs/inode.c:1860
 do_unlinkat+0x356/0x540 fs/namei.c:4397
 __do_sys_unlink fs/namei.c:4438 [inline]
 __se_sys_unlink fs/namei.c:4436 [inline]
 __x64_sys_unlink+0x49/0x50 fs/namei.c:4436
 x64_sys_call+0x958/0x9a0 arch/x86/include/generated/asm/syscalls_64.h:88
 do_syscall_x64 arch/x86/entry/common.c:51 [inline]
 do_syscall_64+0x4c/0xa0 arch/x86/entry/common.c:81
 entry_SYSCALL_64_after_hwframe+0x68/0xd2

The buggy address belongs to the object at ffff88812d961f20
 which belongs to the cache f2fs_inode_cache of size 1200
The buggy address is located 856 bytes inside of
 1200-byte region [ffff88812d961f20, ffff88812d9623d0)

The buggy address belongs to the physical page:
page:ffffea0004b65800 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x12d960
head:ffffea0004b65800 order:2 compound_mapcount:0 compound_pincount:0
flags: 0x4000000000010200(slab|head|zone=1)
raw: 4000000000010200 0000000000000000 dead000000000122 ffff88810a94c500
raw: 0000000000000000 00000000800c000c 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected
page_owner tracks the page as allocated
page last allocated via order 2, migratetype Reclaimable, gfp_mask 0x1d2050(__GFP_IO|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC|__GFP_HARDWALL|__GFP_RECLAIMABLE), pid 569, tgid 568 (syz.2.16), ts 55943246141, free_ts 0
 set_page_owner include/linux/page_owner.h:31 [inline]
 post_alloc_hook+0x1d0/0x1f0 mm/page_alloc.c:2532
 prep_new_page mm/page_alloc.c:2539 [inline]
 get_page_from_freelist+0x2e63/0x2ef0 mm/page_alloc.c:4328
 __alloc_pages+0x235/0x4b0 mm/page_alloc.c:5605
 alloc_slab_page include/linux/gfp.h:-1 [inline]
 allocate_slab mm/slub.c:1939 [inline]
 new_slab+0xec/0x4b0 mm/slub.c:1992
 ___slab_alloc+0x6f6/0xb50 mm/slub.c:3180
 __slab_alloc+0x5e/0xa0 mm/slub.c:3279
 slab_alloc_node mm/slub.c:3364 [inline]
 slab_alloc mm/slub.c:3406 [inline]
 __kmem_cache_alloc_lru mm/slub.c:3413 [inline]
 kmem_cache_alloc_lru+0x13f/0x220 mm/slub.c:3429
 alloc_inode_sb include/linux/fs.h:3245 [inline]
 f2fs_alloc_inode+0x2d/0x340 fs/f2fs/super.c:1419
 alloc_inode fs/inode.c:261 [inline]
 iget_locked+0x186/0x880 fs/inode.c:1373
 f2fs_iget+0x55/0x4c60 fs/f2fs/inode.c:483
 f2fs_fill_super+0x3ad7/0x6bb0 fs/f2fs/super.c:4293
 mount_bdev+0x2ae/0x3e0 fs/super.c:1443
 f2fs_mount+0x34/0x40 fs/f2fs/super.c:4642
 legacy_get_tree+0xea/0x190 fs/fs_context.c:632
 vfs_get_tree+0x89/0x260 fs/super.c:1573
 do_new_mount+0x25a/0xa20 fs/namespace.c:3056
page_owner free stack trace missing

Memory state around the buggy address:
 ffff88812d962100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff88812d962180: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88812d962200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                                                ^
 ffff88812d962280: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff88812d962300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================

[1] https://syzkaller.appspot.com/x/report.txt?x=13448368580000

This bug can be reproduced w/ the reproducer [2], once we enable
CONFIG_F2FS_CHECK_FS config, the reproducer will trigger panic as below,
so the direct reason of this bug is the same as the one below patch [3]
fixed.

kernel BUG at fs/f2fs/inode.c:857!
RIP: 0010:f2fs_evict_inode+0x1204/0x1a20
Call Trace:
 <TASK>
 evict+0x32a/0x7a0
 do_unlinkat+0x37b/0x5b0
 __x64_sys_unlink+0xad/0x100
 do_syscall_64+0x5a/0xb0
 entry_SYSCALL_64_after_hwframe+0x6e/0xd8
RIP: 0010:f2fs_evict_inode+0x1204/0x1a20

[2] https://syzkaller.appspot.com/x/repro.c?x=17495ccc580000
[3] https://lore.kernel.org/linux-f2fs-devel/20250702120321.1080759-1-chao@kernel.org

Tracepoints before panic:

f2fs_unlink_enter: dev = (7,0), dir ino = 3, i_size = 4096, i_blocks = 8, name = file1
f2fs_unlink_exit: dev = (7,0), ino = 7, ret = 0
f2fs_evict_inode: dev = (7,0), ino = 7, pino = 3, i_mode = 0x81ed, i_size = 10, i_nlink = 0, i_blocks = 0, i_advise = 0x0
f2fs_truncate_node: dev = (7,0), ino = 7, nid = 8, block_address = 0x3c05

f2fs_unlink_enter: dev = (7,0), dir ino = 3, i_size = 4096, i_blocks = 8, name = file3
f2fs_unlink_exit: dev = (7,0), ino = 8, ret = 0
f2fs_evict_inode: dev = (7,0), ino = 8, pino = 3, i_mode = 0x81ed, i_size = 9000, i_nlink = 0, i_blocks = 24, i_advise = 0x4
f2fs_truncate: dev = (7,0), ino = 8, pino = 3, i_mode = 0x81ed, i_size = 0, i_nlink = 0, i_blocks = 24, i_advise = 0x4
f2fs_truncate_blocks_enter: dev = (7,0), ino = 8, i_size = 0, i_blocks = 24, start file offset = 0
f2fs_truncate_blocks_exit: dev = (7,0), ino = 8, ret = -2

The root cause is: in the fuzzed image, dnode #8 belongs to inode #7,
after inode #7 eviction, dnode #8 was dropped.

However there is dirent that has ino #8, so, once we unlink file3, in
f2fs_evict_inode(), both f2fs_truncate() and f2fs_update_inode_page()
will fail due to we can not load node #8, result in we missed to call
f2fs_inode_synced() to clear inode dirty status.

Let's fix this by calling f2fs_inode_synced() in error path of
f2fs_evict_inode().

PS: As I verified, the reproducer [2] can trigger this bug in v6.1.129,
but it failed in v6.16-rc4, this is because the testcase will stop due to
other corruption has been detected by f2fs:

F2FS-fs (loop0): inconsistent node block, node_type:2, nid:8, node_footer[nid:8,ino:8,ofs:0,cpver:5013063228981249506,blkaddr:15366]
F2FS-fs (loop0): f2fs_lookup: inode (ino=9) has zero i_nlink

Fixes: 0f18b462b2 ("f2fs: flush inode metadata when checkpoint is doing")
Closes: https://syzkaller.appspot.com/x/report.txt?x=13448368580000
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-07-09 18:02:06 +00:00
Chao Yu
7c30d79930 f2fs: fix to avoid UAF in f2fs_sync_inode_meta()
syzbot reported an UAF issue as below: [1] [2]

[1] https://syzkaller.appspot.com/text?tag=CrashReport&x=16594c60580000

==================================================================
BUG: KASAN: use-after-free in __list_del_entry_valid+0xa6/0x130 lib/list_debug.c:62
Read of size 8 at addr ffff888100567dc8 by task kworker/u4:0/8

CPU: 1 PID: 8 Comm: kworker/u4:0 Tainted: G        W          6.1.129-syzkaller-00017-g642656a36791 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 02/12/2025
Workqueue: writeback wb_workfn (flush-7:0)
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:88 [inline]
 dump_stack_lvl+0x151/0x1b7 lib/dump_stack.c:106
 print_address_description mm/kasan/report.c:316 [inline]
 print_report+0x158/0x4e0 mm/kasan/report.c:427
 kasan_report+0x13c/0x170 mm/kasan/report.c:531
 __asan_report_load8_noabort+0x14/0x20 mm/kasan/report_generic.c:351
 __list_del_entry_valid+0xa6/0x130 lib/list_debug.c:62
 __list_del_entry include/linux/list.h:134 [inline]
 list_del_init include/linux/list.h:206 [inline]
 f2fs_inode_synced+0x100/0x2e0 fs/f2fs/super.c:1553
 f2fs_update_inode+0x72/0x1c40 fs/f2fs/inode.c:588
 f2fs_update_inode_page+0x135/0x170 fs/f2fs/inode.c:706
 f2fs_write_inode+0x416/0x790 fs/f2fs/inode.c:734
 write_inode fs/fs-writeback.c:1460 [inline]
 __writeback_single_inode+0x4cf/0xb80 fs/fs-writeback.c:1677
 writeback_sb_inodes+0xb32/0x1910 fs/fs-writeback.c:1903
 __writeback_inodes_wb+0x118/0x3f0 fs/fs-writeback.c:1974
 wb_writeback+0x3da/0xa00 fs/fs-writeback.c:2081
 wb_check_background_flush fs/fs-writeback.c:2151 [inline]
 wb_do_writeback fs/fs-writeback.c:2239 [inline]
 wb_workfn+0xbba/0x1030 fs/fs-writeback.c:2266
 process_one_work+0x73d/0xcb0 kernel/workqueue.c:2299
 worker_thread+0xa60/0x1260 kernel/workqueue.c:2446
 kthread+0x26d/0x300 kernel/kthread.c:386
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
 </TASK>

Allocated by task 298:
 kasan_save_stack mm/kasan/common.c:45 [inline]
 kasan_set_track+0x4b/0x70 mm/kasan/common.c:52
 kasan_save_alloc_info+0x1f/0x30 mm/kasan/generic.c:505
 __kasan_slab_alloc+0x6c/0x80 mm/kasan/common.c:333
 kasan_slab_alloc include/linux/kasan.h:202 [inline]
 slab_post_alloc_hook+0x53/0x2c0 mm/slab.h:768
 slab_alloc_node mm/slub.c:3421 [inline]
 slab_alloc mm/slub.c:3431 [inline]
 __kmem_cache_alloc_lru mm/slub.c:3438 [inline]
 kmem_cache_alloc_lru+0x102/0x270 mm/slub.c:3454
 alloc_inode_sb include/linux/fs.h:3255 [inline]
 f2fs_alloc_inode+0x2d/0x350 fs/f2fs/super.c:1437
 alloc_inode fs/inode.c:261 [inline]
 iget_locked+0x18c/0x7e0 fs/inode.c:1373
 f2fs_iget+0x55/0x4ca0 fs/f2fs/inode.c:486
 f2fs_lookup+0x3c1/0xb50 fs/f2fs/namei.c:484
 __lookup_slow+0x2b9/0x3e0 fs/namei.c:1689
 lookup_slow+0x5a/0x80 fs/namei.c:1706
 walk_component+0x2e7/0x410 fs/namei.c:1997
 lookup_last fs/namei.c:2454 [inline]
 path_lookupat+0x16d/0x450 fs/namei.c:2478
 filename_lookup+0x251/0x600 fs/namei.c:2507
 vfs_statx+0x107/0x4b0 fs/stat.c:229
 vfs_fstatat fs/stat.c:267 [inline]
 vfs_lstat include/linux/fs.h:3434 [inline]
 __do_sys_newlstat fs/stat.c:423 [inline]
 __se_sys_newlstat+0xda/0x7c0 fs/stat.c:417
 __x64_sys_newlstat+0x5b/0x70 fs/stat.c:417
 x64_sys_call+0x52/0x9a0 arch/x86/include/generated/asm/syscalls_64.h:7
 do_syscall_x64 arch/x86/entry/common.c:51 [inline]
 do_syscall_64+0x3b/0x80 arch/x86/entry/common.c:81
 entry_SYSCALL_64_after_hwframe+0x68/0xd2

Freed by task 0:
 kasan_save_stack mm/kasan/common.c:45 [inline]
 kasan_set_track+0x4b/0x70 mm/kasan/common.c:52
 kasan_save_free_info+0x2b/0x40 mm/kasan/generic.c:516
 ____kasan_slab_free+0x131/0x180 mm/kasan/common.c:241
 __kasan_slab_free+0x11/0x20 mm/kasan/common.c:249
 kasan_slab_free include/linux/kasan.h:178 [inline]
 slab_free_hook mm/slub.c:1745 [inline]
 slab_free_freelist_hook mm/slub.c:1771 [inline]
 slab_free mm/slub.c:3686 [inline]
 kmem_cache_free+0x291/0x560 mm/slub.c:3711
 f2fs_free_inode+0x24/0x30 fs/f2fs/super.c:1584
 i_callback+0x4b/0x70 fs/inode.c:250
 rcu_do_batch+0x552/0xbe0 kernel/rcu/tree.c:2297
 rcu_core+0x502/0xf40 kernel/rcu/tree.c:2557
 rcu_core_si+0x9/0x10 kernel/rcu/tree.c:2574
 handle_softirqs+0x1db/0x650 kernel/softirq.c:624
 __do_softirq kernel/softirq.c:662 [inline]
 invoke_softirq kernel/softirq.c:479 [inline]
 __irq_exit_rcu+0x52/0xf0 kernel/softirq.c:711
 irq_exit_rcu+0x9/0x10 kernel/softirq.c:723
 instr_sysvec_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1118 [inline]
 sysvec_apic_timer_interrupt+0xa9/0xc0 arch/x86/kernel/apic/apic.c:1118
 asm_sysvec_apic_timer_interrupt+0x1b/0x20 arch/x86/include/asm/idtentry.h:691

Last potentially related work creation:
 kasan_save_stack+0x3b/0x60 mm/kasan/common.c:45
 __kasan_record_aux_stack+0xb4/0xc0 mm/kasan/generic.c:486
 kasan_record_aux_stack_noalloc+0xb/0x10 mm/kasan/generic.c:496
 __call_rcu_common kernel/rcu/tree.c:2807 [inline]
 call_rcu+0xdc/0x10f0 kernel/rcu/tree.c:2926
 destroy_inode fs/inode.c:316 [inline]
 evict+0x87d/0x930 fs/inode.c:720
 iput_final fs/inode.c:1834 [inline]
 iput+0x616/0x690 fs/inode.c:1860
 do_unlinkat+0x4e1/0x920 fs/namei.c:4396
 __do_sys_unlink fs/namei.c:4437 [inline]
 __se_sys_unlink fs/namei.c:4435 [inline]
 __x64_sys_unlink+0x49/0x50 fs/namei.c:4435
 x64_sys_call+0x289/0x9a0 arch/x86/include/generated/asm/syscalls_64.h:88
 do_syscall_x64 arch/x86/entry/common.c:51 [inline]
 do_syscall_64+0x3b/0x80 arch/x86/entry/common.c:81
 entry_SYSCALL_64_after_hwframe+0x68/0xd2

The buggy address belongs to the object at ffff888100567a10
 which belongs to the cache f2fs_inode_cache of size 1360
The buggy address is located 952 bytes inside of
 1360-byte region [ffff888100567a10, ffff888100567f60)

The buggy address belongs to the physical page:
page:ffffea0004015800 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x100560
head:ffffea0004015800 order:3 compound_mapcount:0 compound_pincount:0
flags: 0x4000000000010200(slab|head|zone=1)
raw: 4000000000010200 0000000000000000 dead000000000122 ffff8881002c4d80
raw: 0000000000000000 0000000080160016 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected
page_owner tracks the page as allocated
page last allocated via order 3, migratetype Reclaimable, gfp_mask 0xd2050(__GFP_IO|__GFP_NOWARN|__GFP_NORETRY|__GFP_COMP|__GFP_NOMEMALLOC|__GFP_RECLAIMABLE), pid 298, tgid 298 (syz-executor330), ts 26489303743, free_ts 0
 set_page_owner include/linux/page_owner.h:33 [inline]
 post_alloc_hook+0x213/0x220 mm/page_alloc.c:2637
 prep_new_page+0x1b/0x110 mm/page_alloc.c:2644
 get_page_from_freelist+0x3a98/0x3b10 mm/page_alloc.c:4539
 __alloc_pages+0x234/0x610 mm/page_alloc.c:5837
 alloc_slab_page+0x6c/0xf0 include/linux/gfp.h:-1
 allocate_slab mm/slub.c:1962 [inline]
 new_slab+0x90/0x3e0 mm/slub.c:2015
 ___slab_alloc+0x6f9/0xb80 mm/slub.c:3203
 __slab_alloc+0x5d/0xa0 mm/slub.c:3302
 slab_alloc_node mm/slub.c:3387 [inline]
 slab_alloc mm/slub.c:3431 [inline]
 __kmem_cache_alloc_lru mm/slub.c:3438 [inline]
 kmem_cache_alloc_lru+0x149/0x270 mm/slub.c:3454
 alloc_inode_sb include/linux/fs.h:3255 [inline]
 f2fs_alloc_inode+0x2d/0x350 fs/f2fs/super.c:1437
 alloc_inode fs/inode.c:261 [inline]
 iget_locked+0x18c/0x7e0 fs/inode.c:1373
 f2fs_iget+0x55/0x4ca0 fs/f2fs/inode.c:486
 f2fs_fill_super+0x5360/0x6dc0 fs/f2fs/super.c:4488
 mount_bdev+0x282/0x3b0 fs/super.c:1445
 f2fs_mount+0x34/0x40 fs/f2fs/super.c:4743
 legacy_get_tree+0xf1/0x190 fs/fs_context.c:632
page_owner free stack trace missing

Memory state around the buggy address:
 ffff888100567c80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff888100567d00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff888100567d80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                              ^
 ffff888100567e00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ffff888100567e80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================

[2] https://syzkaller.appspot.com/text?tag=CrashLog&x=13654c60580000

[   24.675720][   T28] audit: type=1400 audit(1745327318.732:72): avc:  denied  { write } for  pid=298 comm="syz-executor399" name="/" dev="loop0" ino=3 scontext=root:sysadm_r:sysadm_t tcontext=system_u:object_r:unlabeled_t tclass=dir permissive=1
[   24.705426][  T296] ------------[ cut here ]------------
[   24.706608][   T28] audit: type=1400 audit(1745327318.732:73): avc:  denied  { remove_name } for  pid=298 comm="syz-executor399" name="file0" dev="loop0" ino=4 scontext=root:sysadm_r:sysadm_t tcontext=system_u:object_r:unlabeled_t tclass=dir permissive=1
[   24.711550][  T296] WARNING: CPU: 0 PID: 296 at fs/f2fs/inode.c:847 f2fs_evict_inode+0x1262/0x1540
[   24.734141][   T28] audit: type=1400 audit(1745327318.732:74): avc:  denied  { rename } for  pid=298 comm="syz-executor399" name="file0" dev="loop0" ino=4 scontext=root:sysadm_r:sysadm_t tcontext=system_u:object_r:unlabeled_t tclass=dir permissive=1
[   24.742969][  T296] Modules linked in:
[   24.765201][   T28] audit: type=1400 audit(1745327318.732:75): avc:  denied  { add_name } for  pid=298 comm="syz-executor399" name="bus" scontext=root:sysadm_r:sysadm_t tcontext=system_u:object_r:unlabeled_t tclass=dir permissive=1
[   24.768847][  T296] CPU: 0 PID: 296 Comm: syz-executor399 Not tainted 6.1.129-syzkaller-00017-g642656a36791 #0
[   24.799506][  T296] Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 02/12/2025
[   24.809401][  T296] RIP: 0010:f2fs_evict_inode+0x1262/0x1540
[   24.815018][  T296] Code: 34 70 4a ff eb 0d e8 2d 70 4a ff 4d 89 e5 4c 8b 64 24 18 48 8b 5c 24 28 4c 89 e7 e8 78 38 03 00 e9 84 fc ff ff e8 0e 70 4a ff <0f> 0b 4c 89 f7 be 08 00 00 00 e8 7f 21 92 ff f0 41 80 0e 04 e9 61
[   24.834584][  T296] RSP: 0018:ffffc90000db7a40 EFLAGS: 00010293
[   24.840465][  T296] RAX: ffffffff822aca42 RBX: 0000000000000002 RCX: ffff888110948000
[   24.848291][  T296] RDX: 0000000000000000 RSI: 0000000000000002 RDI: 0000000000000000
[   24.856064][  T296] RBP: ffffc90000db7bb0 R08: ffffffff822ac6a8 R09: ffffed10200b005d
[   24.864073][  T296] R10: 0000000000000000 R11: dffffc0000000001 R12: ffff888100580000
[   24.871812][  T296] R13: dffffc0000000000 R14: ffff88810fef4078 R15: 1ffff920001b6f5c

The root cause is w/ a fuzzed image, f2fs may missed to clear FI_DIRTY_INODE
flag for target inode, after f2fs_evict_inode(), the inode is still linked in
sbi->inode_list[DIRTY_META] global list, once it triggers checkpoint,
f2fs_sync_inode_meta() may access the released inode.

In f2fs_evict_inode(), let's always call f2fs_inode_synced() to clear
FI_DIRTY_INODE flag and drop inode from global dirty list to avoid this
UAF issue.

Fixes: 0f18b462b2 ("f2fs: flush inode metadata when checkpoint is doing")
Closes: https://syzkaller.appspot.com/bug?extid=849174b2efaf0d8be6ba
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-07-09 18:01:39 +00:00
Eric Biggers
d005af3b67 f2fs: remove unused sbi argument from checksum functions
Since __f2fs_crc32() now calls crc32() directly, it no longer uses its
sbi argument.  Remove that, and simplify its callers accordingly.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-05-27 23:52:35 +00:00
Matthew Wilcox (Oracle)
7d28f13c58 f2fs: Pass a folio to get_dnode_addr()
All callers except __get_inode_rdev() and __set_inode_rdev() now have a
folio, but the only callers of those two functions do have a folio, so
pass the folio to them and then into get_dnode_addr().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:26:47 +00:00
Matthew Wilcox (Oracle)
f92379289f f2fs: Pass a folio to f2fs_update_inode()
All callers now have a folio, so pass it in.  Remove two calls to
compound_head().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:26:47 +00:00
Matthew Wilcox (Oracle)
398c7df7bc f2fs: Pass a folio to f2fs_init_read_extent_tree()
The only caller alredy has a folio so pass it in.  Remove two calls
to compound_head().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:26:46 +00:00
Matthew Wilcox (Oracle)
d79bc8ab44 f2fs: Pass a folio to inline_data_addr()
All callers now have a folio, so pass it in.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:26:45 +00:00
Matthew Wilcox (Oracle)
1834406c98 f2fs: Pass a folio to __recover_inline_status()
The only caller has a folio so pass it in.  Removes two calls to
compound_head().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:26:45 +00:00
Matthew Wilcox (Oracle)
802aa48dba f2fs: Use a folio in do_read_inode()
Remove five calls to compound_head().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:26:41 +00:00
Matthew Wilcox (Oracle)
870ef8d3c4 f2fs: Use a folio in f2fs_update_inode_page()
Remove a call to compound_head().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:26:41 +00:00
Matthew Wilcox (Oracle)
7c213e98c7 f2fs: Pass a folio to f2fs_inode_chksum_verify()
Both callers now have a folio, so pass it in.  Removes three calls
to compound_head().

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-28 15:26:35 +00:00
Chao Yu
061cf3a84b f2fs: fix to do sanity check on ino and xnid
syzbot reported a f2fs bug as below:

INFO: task syz-executor140:5308 blocked for more than 143 seconds.
      Not tainted 6.14.0-rc7-syzkaller-00069-g81e4f8d68c66 #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:syz-executor140 state:D stack:24016 pid:5308  tgid:5308  ppid:5306   task_flags:0x400140 flags:0x00000006
Call Trace:
 <TASK>
 context_switch kernel/sched/core.c:5378 [inline]
 __schedule+0x190e/0x4c90 kernel/sched/core.c:6765
 __schedule_loop kernel/sched/core.c:6842 [inline]
 schedule+0x14b/0x320 kernel/sched/core.c:6857
 io_schedule+0x8d/0x110 kernel/sched/core.c:7690
 folio_wait_bit_common+0x839/0xee0 mm/filemap.c:1317
 __folio_lock mm/filemap.c:1664 [inline]
 folio_lock include/linux/pagemap.h:1163 [inline]
 __filemap_get_folio+0x147/0xb40 mm/filemap.c:1917
 pagecache_get_page+0x2c/0x130 mm/folio-compat.c:87
 find_get_page_flags include/linux/pagemap.h:842 [inline]
 f2fs_grab_cache_page+0x2b/0x320 fs/f2fs/f2fs.h:2776
 __get_node_page+0x131/0x11b0 fs/f2fs/node.c:1463
 read_xattr_block+0xfb/0x190 fs/f2fs/xattr.c:306
 lookup_all_xattrs fs/f2fs/xattr.c:355 [inline]
 f2fs_getxattr+0x676/0xf70 fs/f2fs/xattr.c:533
 __f2fs_get_acl+0x52/0x870 fs/f2fs/acl.c:179
 f2fs_acl_create fs/f2fs/acl.c:375 [inline]
 f2fs_init_acl+0xd7/0x9b0 fs/f2fs/acl.c:418
 f2fs_init_inode_metadata+0xa0f/0x1050 fs/f2fs/dir.c:539
 f2fs_add_inline_entry+0x448/0x860 fs/f2fs/inline.c:666
 f2fs_add_dentry+0xba/0x1e0 fs/f2fs/dir.c:765
 f2fs_do_add_link+0x28c/0x3a0 fs/f2fs/dir.c:808
 f2fs_add_link fs/f2fs/f2fs.h:3616 [inline]
 f2fs_mknod+0x2e8/0x5b0 fs/f2fs/namei.c:766
 vfs_mknod+0x36d/0x3b0 fs/namei.c:4191
 unix_bind_bsd net/unix/af_unix.c:1286 [inline]
 unix_bind+0x563/0xe30 net/unix/af_unix.c:1379
 __sys_bind_socket net/socket.c:1817 [inline]
 __sys_bind+0x1e4/0x290 net/socket.c:1848
 __do_sys_bind net/socket.c:1853 [inline]
 __se_sys_bind net/socket.c:1851 [inline]
 __x64_sys_bind+0x7a/0x90 net/socket.c:1851
 do_syscall_x64 arch/x86/entry/common.c:52 [inline]
 do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

Let's dump and check metadata of corrupted inode, it shows its xattr_nid
is the same to its i_ino.

dump.f2fs -i 3 chaseyu.img.raw
i_xattr_nid                             [0x       3 : 3]

So that, during mknod in the corrupted directory, it tries to get and
lock inode page twice, result in deadlock.

- f2fs_mknod
 - f2fs_add_inline_entry
  - f2fs_get_inode_page --- lock dir's inode page
   - f2fs_init_acl
    - f2fs_acl_create(dir,..)
     - __f2fs_get_acl
      - f2fs_getxattr
       - lookup_all_xattrs
        - __get_node_page --- try to lock dir's inode page

In order to fix this, let's add sanity check on ino and xnid.

Cc: stable@vger.kernel.org
Reported-by: syzbot+cc448dcdc7ae0b4e4ffa@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/linux-f2fs-devel/67e06150.050a0220.21942d.0005.GAE@google.com
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-10 03:59:58 +00:00
Chao Yu
db03c20c08 f2fs: fix to set atomic write status more clear
1. After we start atomic write in a database file, before committing
all data, we'd better not set inode w/ vfs dirty status to avoid
redundant updates, instead, we only set inode w/ atomic dirty status.

2. After we commit all data, before committing metadata, we need to
clear atomic dirty status, and set vfs dirty status to allow vfs flush
dirty inode.

Cc: Daeho Jeong <daehojeong@google.com>
Reported-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Reviewed-by: Daeho Jeong <daehojeong@google.com>
Reviewed-by: Zhiguo Niu <zhiguo.niu@unisoc.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-04-10 03:59:58 +00:00
Yeongjin Gil
f098aeba04 f2fs: fix to avoid atomicity corruption of atomic file
In the case of the following call stack for an atomic file,
FI_DIRTY_INODE is set, but FI_ATOMIC_DIRTIED is not subsequently set.

f2fs_file_write_iter
  f2fs_map_blocks
    f2fs_reserve_new_blocks
      inc_valid_block_count
        __mark_inode_dirty(dquot)
          f2fs_dirty_inode

If FI_ATOMIC_DIRTIED is not set, atomic file can encounter corruption
due to a mismatch between old file size and new data.

To resolve this issue, I changed to set FI_ATOMIC_DIRTIED when
FI_DIRTY_INODE is set. This ensures that FI_DIRTY_INODE, which was
previously cleared by the Writeback thread during the commit atomic, is
set and i_size is updated.

Cc: <stable@vger.kernel.org>
Fixes: fccaa81de8 ("f2fs: prevent atomic file from being dirtied before commit")
Reviewed-by: Sungjong Seo <sj1557.seo@samsung.com>
Reviewed-by: Sunmin Jeong <s_min.jeong@samsung.com>
Signed-off-by: Yeongjin Gil <youngjin.gil@samsung.com>
Reviewed-by: Daeho Jeong <daehojeong@google.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-03-17 17:38:33 +00:00
Chao Yu
986c50f6bc f2fs: fix to avoid accessing uninitialized curseg
syzbot reports a f2fs bug as below:

F2FS-fs (loop3): Stopped filesystem due to reason: 7
kworker/u8:7: attempt to access beyond end of device
BUG: unable to handle page fault for address: ffffed1604ea3dfa
RIP: 0010:get_ckpt_valid_blocks fs/f2fs/segment.h:361 [inline]
RIP: 0010:has_curseg_enough_space fs/f2fs/segment.h:570 [inline]
RIP: 0010:__get_secs_required fs/f2fs/segment.h:620 [inline]
RIP: 0010:has_not_enough_free_secs fs/f2fs/segment.h:633 [inline]
RIP: 0010:has_enough_free_secs+0x575/0x1660 fs/f2fs/segment.h:649
 <TASK>
 f2fs_is_checkpoint_ready fs/f2fs/segment.h:671 [inline]
 f2fs_write_inode+0x425/0x540 fs/f2fs/inode.c:791
 write_inode fs/fs-writeback.c:1525 [inline]
 __writeback_single_inode+0x708/0x10d0 fs/fs-writeback.c:1745
 writeback_sb_inodes+0x820/0x1360 fs/fs-writeback.c:1976
 wb_writeback+0x413/0xb80 fs/fs-writeback.c:2156
 wb_do_writeback fs/fs-writeback.c:2303 [inline]
 wb_workfn+0x410/0x1080 fs/fs-writeback.c:2343
 process_one_work kernel/workqueue.c:3236 [inline]
 process_scheduled_works+0xa66/0x1840 kernel/workqueue.c:3317
 worker_thread+0x870/0xd30 kernel/workqueue.c:3398
 kthread+0x7a9/0x920 kernel/kthread.c:464
 ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:148
 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244

Commit 8b10d36537 ("f2fs: introduce FAULT_NO_SEGMENT") allows to trigger
no free segment fault in allocator, then it will update curseg->segno to
NULL_SEGNO, though, CP_ERROR_FLAG has been set, f2fs_write_inode() missed
to check the flag, and access invalid curseg->segno directly in below call
path, then resulting in panic:

- f2fs_write_inode
 - f2fs_is_checkpoint_ready
  - has_enough_free_secs
   - has_not_enough_free_secs
    - __get_secs_required
     - has_curseg_enough_space
      - get_ckpt_valid_blocks
      : access invalid curseg->segno

To avoid this issue, let's:
- check CP_ERROR_FLAG flag in prior to f2fs_is_checkpoint_ready() in
f2fs_write_inode().
- in has_curseg_enough_space(), save curseg->segno into a temp variable,
and verify its validation before use.

Fixes: 8b10d36537 ("f2fs: introduce FAULT_NO_SEGMENT")
Reported-by: syzbot+b6b347b7a4ea1b2e29b6@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/67973c2b.050a0220.11b1bb.0089.GAE@google.com
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-03-11 03:25:53 +00:00
Chao Yu
1cf6b5670a f2fs: do sanity check on inode footer in f2fs_get_inode_page()
This patch introduces a new wrapper f2fs_get_inode_page(), then, caller
can use it to load inode block to page cache, meanwhile it will do sanity
check on inode footer.

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-03-11 03:25:53 +00:00
Jaegeuk Kim
ef0c333cad f2fs: keep POSIX_FADV_NOREUSE ranges
This patch records POSIX_FADV_NOREUSE ranges for users to reclaim the caches
instantly off from LRU.

Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-02-13 17:58:31 +00:00
Chao Yu
1534747d31 f2fs: don't retry IO for corrupted data scenario
F2FS-fs (dm-105): inconsistent node block, nid:430, node_footer[nid:2198964142,ino:598252782,ofs:118300154,cpver:5409237455940746069,blkaddr:2125070942]
F2FS-fs (dm-105): inconsistent node block, nid:430, node_footer[nid:2198964142,ino:598252782,ofs:118300154,cpver:5409237455940746069,blkaddr:2125070942]
F2FS-fs (dm-105): inconsistent node block, nid:430, node_footer[nid:2198964142,ino:598252782,ofs:118300154,cpver:5409237455940746069,blkaddr:2125070942]
F2FS-fs (dm-105): inconsistent node block, nid:430, node_footer[nid:2198964142,ino:598252782,ofs:118300154,cpver:5409237455940746069,blkaddr:2125070942]
F2FS-fs (dm-105): inconsistent node block, nid:430, node_footer[nid:2198964142,ino:598252782,ofs:118300154,cpver:5409237455940746069,blkaddr:2125070942]
F2FS-fs (dm-105): inconsistent node block, nid:430, node_footer[nid:2198964142,ino:598252782,ofs:118300154,cpver:5409237455940746069,blkaddr:2125070942]
F2FS-fs (dm-105): inconsistent node block, nid:430, node_footer[nid:2198964142,ino:598252782,ofs:118300154,cpver:5409237455940746069,blkaddr:2125070942]
F2FS-fs (dm-105): inconsistent node block, nid:430, node_footer[nid:2198964142,ino:598252782,ofs:118300154,cpver:5409237455940746069,blkaddr:2125070942]
F2FS-fs (dm-105): inconsistent node block, nid:430, node_footer[nid:2198964142,ino:598252782,ofs:118300154,cpver:5409237455940746069,blkaddr:2125070942]
F2FS-fs (dm-105): inconsistent node block, nid:430, node_footer[nid:2198964142,ino:598252782,ofs:118300154,cpver:5409237455940746069,blkaddr:2125070942]

If node block is loaded successfully, but its content is inconsistent, it
doesn't need to retry IO.

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-02-10 17:13:40 +00:00
Nathan Chancellor
a68905d48a f2fs: Fix format specifier in sanity_check_inode()
When building for 32-bit platforms, for which 'size_t' is 'unsigned int',
there is a warning due to an incorrect format specifier:

  fs/f2fs/inode.c:320:6: error: format specifies type 'unsigned long' but the argument has type 'unsigned int' [-Werror,-Wformat]
    318 |                 f2fs_warn(sbi, "%s: inode (ino=%lx) has corrupted i_inline_xattr_size: %d, min: %lu, max: %lu",
        |                                                                                                 ~~~
        |                                                                                                 %u
    319 |                           __func__, inode->i_ino, fi->i_inline_xattr_size,
    320 |                           MIN_INLINE_XATTR_SIZE, MAX_INLINE_XATTR_SIZE);
        |                           ^~~~~~~~~~~~~~~~~~~~~
  fs/f2fs/f2fs.h:1855:46: note: expanded from macro 'f2fs_warn'
   1855 |         f2fs_printk(sbi, false, KERN_WARNING fmt, ##__VA_ARGS__)
        |                                              ~~~    ^~~~~~~~~~~
  fs/f2fs/xattr.h:86:31: note: expanded from macro 'MIN_INLINE_XATTR_SIZE'
     86 | #define MIN_INLINE_XATTR_SIZE (sizeof(struct f2fs_xattr_header) / sizeof(__le32))
        |                               ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Use the format specifier for 'size_t', '%zu', to resolve the warning.

Fixes: 5c1768b672 ("f2fs: fix to do sanity check correctly on i_inline_xattr_size")
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-01-22 21:04:49 +00:00
Chao Yu
5c1768b672 f2fs: fix to do sanity check correctly on i_inline_xattr_size
syzbot reported an out-of-range access issue as below:

UBSAN: array-index-out-of-bounds in fs/f2fs/f2fs.h:3292:19
index 18446744073709550491 is out of range for type '__le32[923]' (aka 'unsigned int[923]')
CPU: 0 UID: 0 PID: 5338 Comm: syz.0.0 Not tainted 6.12.0-syzkaller-10689-g7af08b57bcb9 #0
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:94 [inline]
 dump_stack_lvl+0x241/0x360 lib/dump_stack.c:120
 ubsan_epilogue lib/ubsan.c:231 [inline]
 __ubsan_handle_out_of_bounds+0x121/0x150 lib/ubsan.c:429
 read_inline_xattr+0x273/0x280
 lookup_all_xattrs fs/f2fs/xattr.c:341 [inline]
 f2fs_getxattr+0x57b/0x13b0 fs/f2fs/xattr.c:533
 vfs_getxattr_alloc+0x472/0x5c0 fs/xattr.c:393
 ima_read_xattr+0x38/0x60 security/integrity/ima/ima_appraise.c:229
 process_measurement+0x117a/0x1fb0 security/integrity/ima/ima_main.c:353
 ima_file_check+0xd9/0x120 security/integrity/ima/ima_main.c:572
 security_file_post_open+0xb9/0x280 security/security.c:3121
 do_open fs/namei.c:3830 [inline]
 path_openat+0x2ccd/0x3590 fs/namei.c:3987
 do_file_open_root+0x3a7/0x720 fs/namei.c:4039
 file_open_root+0x247/0x2a0 fs/open.c:1382
 do_handle_open+0x85b/0x9d0 fs/fhandle.c:414
 do_syscall_x64 arch/x86/entry/common.c:52 [inline]
 do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

index: 18446744073709550491 (decimal, unsigned long long)
= 0xfffffffffffffb9b (hexadecimal) = -1125 (decimal, long long)
UBSAN detects that inline_xattr_addr() tries to access .i_addr[-1125].

w/ below testcase, it can reproduce this bug easily:
- mkfs.f2fs -f -O extra_attr,flexible_inline_xattr /dev/sdb
- mount -o inline_xattr_size=512 /dev/sdb /mnt/f2fs
- touch /mnt/f2fs/file
- umount /mnt/f2fs
- inject.f2fs --node --mb i_inline --nid 4 --val 0x1 /dev/sdb
- inject.f2fs --node --mb i_inline_xattr_size --nid 4 --val 2048 /dev/sdb
- mount /dev/sdb /mnt/f2fs
- getfattr /mnt/f2fs/file

The root cause is if metadata of filesystem and inode were fuzzed as below:
- extra_attr feature is enabled
- flexible_inline_xattr feature is enabled
- ri.i_inline_xattr_size = 2048
- F2FS_EXTRA_ATTR bit in ri.i_inline was not set

sanity_check_inode() will skip doing sanity check on fi->i_inline_xattr_size,
result in using invalid inline_xattr_size later incorrectly, fix it.

Meanwhile, let's fix to check lower boundary for .i_inline_xattr_size w/
MIN_INLINE_XATTR_SIZE like we did in parse_options().

There is a related issue reported by syzbot, Qasim Ijaz has anlyzed and
fixed it w/ very similar way [1], as discussed, we all agree that it will
be better to do sanity check in sanity_check_inode() for fix, so finally,
let's fix these two related bugs w/ current patch.

Including commit message from Qasim's patch as below, thanks a lot for
his contribution.

"In f2fs_getxattr(), the function lookup_all_xattrs() allocates a 12-byte
(base_size) buffer for an inline extended attribute. However, when
__find_inline_xattr() calls __find_xattr(), it uses the macro
"list_for_each_xattr(entry, addr)", which starts by calling
XATTR_FIRST_ENTRY(addr). This skips a 24-byte struct f2fs_xattr_header
at the beginning of the buffer, causing an immediate out-of-bounds read
in a 12-byte allocation. The subsequent !IS_XATTR_LAST_ENTRY(entry)
check then dereferences memory outside the allocated region, triggering
the slab-out-of bounds read.

This patch prevents the out-of-bounds read by adding a check to bail
out early if inline_size is too small and does not account for the
header plus the 4-byte value that IS_XATTR_LAST_ENTRY reads."

[1]: https://lore.kernel.org/linux-f2fs-devel/Z32y1rfBY9Qb5ZjM@qasdev.system/

Fixes: 6afc662e68 ("f2fs: support flexible inline xattr size")
Reported-by: syzbot+69f5379a1717a0b982a1@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/linux-f2fs-devel/674f4e7d.050a0220.17bd51.004f.GAE@google.com
Reported-by: syzbot <syzbot+f5e74075e096e757bdbf@syzkaller.appspotmail.com>
Closes: https://syzkaller.appspot.com/bug?extid=f5e74075e096e757bdbf
Tested-by: syzbot <syzbot+f5e74075e096e757bdbf@syzkaller.appspotmail.com>
Tested-by: Qasim Ijaz <qasdev00@gmail.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2025-01-16 17:27:51 +00:00
Daeho Jeong
128d333f0d f2fs: introduce device aliasing file
F2FS should understand how the device aliasing file works and support
deleting the file after use. A device aliasing file can be created by
mkfs.f2fs tool and it can map the whole device with an extent, not
using node blocks. The file space should be pinned and normally used for
read-only usages.

Signed-off-by: Daeho Jeong <daehojeong@google.com>
Signed-off-by: Chao Yu <chao@kernel.org>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-11-01 01:19:00 +00:00
Qi Han
d5c367ef82 f2fs: fix f2fs_bug_on when uninstalling filesystem call f2fs_evict_inode.
creating a large files during checkpoint disable until it runs out of
space and then delete it, then remount to enable checkpoint again, and
then unmount the filesystem triggers the f2fs_bug_on as below:

------------[ cut here ]------------
kernel BUG at fs/f2fs/inode.c:896!
CPU: 2 UID: 0 PID: 1286 Comm: umount Not tainted 6.11.0-rc7-dirty #360
Oops: invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
RIP: 0010:f2fs_evict_inode+0x58c/0x610
Call Trace:
 __die_body+0x15/0x60
 die+0x33/0x50
 do_trap+0x10a/0x120
 f2fs_evict_inode+0x58c/0x610
 do_error_trap+0x60/0x80
 f2fs_evict_inode+0x58c/0x610
 exc_invalid_op+0x53/0x60
 f2fs_evict_inode+0x58c/0x610
 asm_exc_invalid_op+0x16/0x20
 f2fs_evict_inode+0x58c/0x610
 evict+0x101/0x260
 dispose_list+0x30/0x50
 evict_inodes+0x140/0x190
 generic_shutdown_super+0x2f/0x150
 kill_block_super+0x11/0x40
 kill_f2fs_super+0x7d/0x140
 deactivate_locked_super+0x2a/0x70
 cleanup_mnt+0xb3/0x140
 task_work_run+0x61/0x90

The root cause is: creating large files during disable checkpoint
period results in not enough free segments, so when writing back root
inode will failed in f2fs_enable_checkpoint. When umount the file
system after enabling checkpoint, the root inode is dirty in
f2fs_evict_inode function, which triggers BUG_ON. The steps to
reproduce are as follows:

dd if=/dev/zero of=f2fs.img bs=1M count=55
mount f2fs.img f2fs_dir -o checkpoint=disable:10%
dd if=/dev/zero of=big bs=1M count=50
sync
rm big
mount -o remount,checkpoint=enable f2fs_dir
umount f2fs_dir

Let's redirty inode when there is not free segments during checkpoint
is disable.

Signed-off-by: Qi Han <hanqi@vivo.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-10-14 20:04:57 +00:00
Daeho Jeong
fccaa81de8 f2fs: prevent atomic file from being dirtied before commit
Keep atomic file clean while updating and make it dirtied during commit
in order to avoid unnecessary and excessive inode updates in the previous
fix.

Fixes: 4bf7832234 ("f2fs: mark inode dirty for FI_ATOMIC_COMMITTED flag")
Signed-off-by: Daeho Jeong <daehojeong@google.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-09-11 03:30:27 +00:00
Chao Yu
5697e94daa f2fs: get rid of page->index
Convert to use folio, so that we can get rid of 'page->index' to
prepare for removal of 'index' field in structure page [1].

[1] https://lore.kernel.org/all/Zp8fgUSIBGQ1TN0D@casper.infradead.org/

Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Chao Yu <chao@kernel.org>
Reviewed-by: Li Zetao <lizetao1@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-09-06 23:04:48 +00:00
Chao Yu
5bcde45578 f2fs: get rid of buffer_head use
Convert to use folio and related functionality.

Cc: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-08-15 15:26:40 +00:00
Chao Yu
7309871c03 f2fs: clean up F2FS_I()
Use temporary variable instead of F2FS_I() for cleanup.

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2024-07-10 23:13:15 +00:00