linux

mirror of https://github.com/torvalds/linux.git synced 2026-04-18 06:44:00 -04:00

Author	SHA1	Message	Date
Linus Torvalds	a436a0b847	Merge tag 'ext4_for_linux-7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 updates from Ted Ts'o: - Refactor code paths involved with partial block zero-out in prearation for converting ext4 to use iomap for buffered writes - Remove use of d_alloc() from ext4 in preparation for the deprecation of this interface - Replace some J_ASSERTS with a journal abort so we can avoid a kernel panic for a localized file system error - Simplify various code paths in mballoc, move_extent, and fast commit - Fix rare deadlock in jbd2_journal_cancel_revoke() that can be triggered by generic/013 when blocksize < pagesize - Fix memory leak when releasing an extended attribute when its value is stored in an ea_inode - Fix various potential kunit test bugs in fs/ext4/extents.c - Fix potential out-of-bounds access in check_xattr() with a corrupted file system - Make the jbd2_inode dirty range tracking safe for lockless reads - Avoid a WARN_ON when writeback files due to a corrupted file system; we already print an ext4 warning indicatign that data will be lost, so the WARN_ON is not necessary and doesn't add any new information * tag 'ext4_for_linux-7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (37 commits) jbd2: fix deadlock in jbd2_journal_cancel_revoke() ext4: fix missing brelse() in ext4_xattr_inode_dec_ref_all() ext4: fix possible null-ptr-deref in mbt_kunit_exit() ext4: fix possible null-ptr-deref in extents_kunit_exit() ext4: fix the error handling process in extents_kunit_init). ext4: call deactivate_super() in extents_kunit_exit() ext4: fix miss unlock 'sb->s_umount' in extents_kunit_init() ext4: fix bounds check in check_xattrs() to prevent out-of-bounds access ext4: zero post-EOF partial block before appending write ext4: move pagecache_isize_extended() out of active handle ext4: remove ctime/mtime update from ext4_alloc_file_blocks() ext4: unify SYNC mode checks in fallocate paths ext4: ensure zeroed partial blocks are persisted in SYNC mode ext4: move zero partial block range functions out of active handle ext4: pass allocate range as loff_t to ext4_alloc_file_blocks() ext4: remove handle parameters from zero partial block functions ext4: move ordered data handling out of ext4_block_do_zero_range() ext4: rename ext4_block_zero_page_range() to ext4_block_zero_range() ext4: factor out journalled block zeroing range ext4: rename and extend ext4_block_truncate_page() ...	2026-04-17 17:08:31 -07:00
Li Chen	4edafa81a1	jbd2: store jinode dirty range in PAGE_SIZE units jbd2_inode fields are updated under journal->j_list_lock, but some paths read them without holding the lock (e.g. fast commit helpers and ordered truncate helpers). READ_ONCE() alone is not sufficient for the dirty range fields when they are stored as loff_t because 32-bit platforms can observe torn loads. Store the dirty range in PAGE_SIZE units as pgoff_t instead. Represent the dirty range end as an exclusive end page. This avoids a special sentinel value and keeps MAX_LFS_FILESIZE on 32-bit representable. Publish a new dirty range by updating end_page before start_page, and treat start_page >= end_page as empty in the accessor for robustness. Use READ_ONCE() on the read side and WRITE_ONCE() on the write side for the dirty range and i_flags to match the existing lockless access pattern. Suggested-by: Jan Kara <jack@suse.cz> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Li Chen <me@linux.beauty> Link: https://patch.msgid.link/20260306085643.465275-5-me@linux.beauty Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2026-04-09 10:52:35 -04:00
Jeff Layton	0b2600f81c	treewide: change inode->i_ino from unsigned long to u64 On 32-bit architectures, unsigned long is only 32 bits wide, which causes 64-bit inode numbers to be silently truncated. Several filesystems (NFS, XFS, BTRFS, etc.) can generate inode numbers that exceed 32 bits, and this truncation can lead to inode number collisions and other subtle bugs on 32-bit systems. Change the type of inode->i_ino from unsigned long to u64 to ensure that inode numbers are always represented as 64-bit values regardless of architecture. Update all format specifiers treewide from %lu/%lx to %llu/%llx to match the new type, along with corresponding local variable types. This is the bulk treewide conversion. Earlier patches in this series handled trace events separately to allow trace field reordering for better struct packing on 32-bit. Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://patch.msgid.link/20260304-iino-u64-v3-12-2257ad83d372@kernel.org Acked-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Christian Brauner <brauner@kernel.org>	2026-03-06 14:31:28 +01:00
Linus Torvalds	32a92f8c89	Convert more 'alloc_obj' cases to default GFP_KERNEL arguments This converts some of the visually simpler cases that have been split over multiple lines. I only did the ones that are easy to verify the resulting diff by having just that final GFP_KERNEL argument on the next line. Somebody should probably do a proper coccinelle script for this, but for me the trivial script actually resulted in an assertion failure in the middle of the script. I probably had made it a bit _too_ trivial. So after fighting that far a while I decided to just do some of the syntactically simpler cases with variations of the previous 'sed' scripts. The more syntactically complex multi-line cases would mostly really want whitespace cleanup anyway. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2026-02-21 20:03:00 -08:00
Linus Torvalds	bf4afc53b7	Convert 'alloc_obj' family to use the new default GFP_KERNEL argument This was done entirely with mindless brute force, using git grep -l '\<k[vmz]alloc_objs(., GFP_KERNEL)' \| xargs sed -i 's/$alloc_objs(.*$, GFP_KERNEL)/\1)/' to convert the new alloc_obj() users that had a simple GFP_KERNEL argument to just drop that argument. Note that due to the extreme simplicity of the scripting, any slightly more complex cases spread over multiple lines would not be triggered: they definitely exist, but this covers the vast bulk of the cases, and the resulting diff is also then easier to check automatically. For the same reason the 'flex' versions will be done as a separate conversion. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2026-02-21 17:09:51 -08:00
Kees Cook	69050f8d6d	treewide: Replace kmalloc with kmalloc_obj for non-scalar types This is the result of running the Coccinelle script from scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to avoid scalar types (which need careful case-by-case checking), and instead replace kmalloc-family calls that allocate struct or union object instances: Single allocations: kmalloc(sizeof(TYPE), ...) are replaced with: kmalloc_obj(TYPE, ...) Array allocations: kmalloc_array(COUNT, sizeof(TYPE), ...) are replaced with: kmalloc_objs(TYPE, COUNT, ...) Flex array allocations: kmalloc(struct_size(PTR, FAM, COUNT), ...) are replaced with: kmalloc_flex(PTR, FAM, COUNT, ...) (where TYPE may also be VAR) The resulting allocations no longer return "void ", instead returning "TYPE ". Signed-off-by: Kees Cook <kees@kernel.org>	2026-02-21 01:02:28 -08:00
Ye Bin	6abfe10789	jbd2: fix the inconsistency between checksum and data in memory for journal sb Copying the file system while it is mounted as read-only results in a mount failure: [~]# mkfs.ext4 -F /dev/sdc [~]# mount /dev/sdc -o ro /mnt/test [~]# dd if=/dev/sdc of=/dev/sda bs=1M [~]# mount /dev/sda /mnt/test1 [ 1094.849826] JBD2: journal checksum error [ 1094.850927] EXT4-fs (sda): Could not load journal inode mount: mount /dev/sda on /mnt/test1 failed: Bad message The process described above is just an abstracted way I came up with to reproduce the issue. In the actual scenario, the file system was mounted read-only and then copied while it was still mounted. It was found that the mount operation failed. The user intended to verify the data or use it as a backup, and this action was performed during a version upgrade. Above issue may happen as follows: ext4_fill_super set_journal_csum_feature_set(sb) if (ext4_has_metadata_csum(sb)) incompat = JBD2_FEATURE_INCOMPAT_CSUM_V3; if (test_opt(sb, JOURNAL_CHECKSUM) jbd2_journal_set_features(sbi->s_journal, compat, 0, incompat); lock_buffer(journal->j_sb_buffer); sb->s_feature_incompat \|= cpu_to_be32(incompat); //The data in the journal sb was modified, but the checksum was not updated, so the data remaining in memory has a mismatch between the data and the checksum. unlock_buffer(journal->j_sb_buffer); In this case, the journal sb copied over is in a state where the checksum and data are inconsistent, so mounting fails. To solve the above issue, update the checksum in memory after modifying the journal sb. Fixes: `4fd5ea43bc` ("jbd2: checksum journal superblock") Signed-off-by: Ye Bin <yebin10@huawei.com> Reviewed-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Message-ID: <20251103010123.3753631-1-yebin@huaweicloud.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org	2025-11-26 17:05:47 -05:00
Wengang Wang	80d05f640a	jbd2: store more accurate errno in superblock when possible When jbd2_journal_abort() is called, the provided error code is stored in the journal superblock. Some existing calls hard-code -EIO even when the actual failure is not I/O related. This patch updates those calls to pass more accurate error codes, allowing the superblock to record the true cause of failure. This helps improve diagnostics and debugging clarity when analyzing journal aborts. Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Message-ID: <20251031210501.7337-1-wen.gang.wang@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2025-11-26 17:05:39 -05:00
Tetsuo Handa	524c385383	jbd2: use a per-journal lock_class_key for jbd2_trans_commit_key syzbot is reporting possibility of deadlock due to sharing lock_class_key for jbd2_handle across ext4 and ocfs2. But this is a false positive, for one disk partition can't have two filesystems at the same time. Reported-by: syzbot+6e493c165d26d6fcbf72@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=6e493c165d26d6fcbf72 Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Tested-by: syzbot+6e493c165d26d6fcbf72@syzkaller.appspotmail.com Reviewed-by: Jan Kara <jack@suse.cz> Message-ID: <987110fc-5470-457a-a218-d286a09dd82f@I-love.SAKURA.ne.jp> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org	2025-11-13 08:34:39 -05:00
Ingo Molnar	41cb08555c	treewide, timers: Rename from_timer() to timer_container_of() Move this API to the canonical timer_*() namespace. [ tglx: Redone against pre rc1 ] Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/aB2X0jCKQO56WdMt@gmail.com	2025-06-08 09:07:37 +02:00
Eric Biggers	fff6f35b9b	jbd2: remove journal_t argument from jbd2_superblock_csum() Since jbd2_superblock_csum() no longer uses its journal_t argument, remove it. Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Baokun Li <libaokun1@huawei.com> Link: https://patch.msgid.link/20250513053809.699974-5-ebiggers@kernel.org Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2025-05-20 10:31:12 -04:00
Eric Biggers	76005718cf	jbd2: remove journal_t argument from jbd2_chksum() Since jbd2_chksum() no longer uses its journal_t argument, remove it. Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Baokun Li <libaokun1@huawei.com> Link: https://patch.msgid.link/20250513053809.699974-4-ebiggers@kernel.org Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2025-05-20 10:31:12 -04:00
Zhang Yi	d6bf294773	ext4/jbd2: convert jbd2_journal_blocks_per_page() to support large folio jbd2_journal_blocks_per_page() returns the number of blocks in a single page. Rename it to jbd2_journal_blocks_per_folio() and make it returns the number of blocks in the largest folio, preparing for the calculation of journal credits blocks when allocating blocks within a large folio in the writeback path. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20250512063319.3539411-5-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2025-05-20 10:31:12 -04:00
Harshad Shirwadkar	857d32f261	ext4: rework fast commit commit path This patch reworks fast commit's commit path to remove locking the journal for the entire duration of a fast commit. Instead, we only lock the journal while marking all the eligible inodes as "committing". This allows handles to make progress in parallel with the fast commit. Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com> Link: https://patch.msgid.link/20250508175908.1004880-5-harshadshirwadkar@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2025-05-08 21:56:17 -04:00
Thomas Gleixner	8fa7292fee	treewide: Switch/rename to timer_delete[_sync]() timer_delete[_sync]() replaces del_timer[_sync](). Convert the whole tree over and remove the historical wrapper inlines. Conversion was done with coccinelle plus manual fixups where necessary. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2025-04-05 10:30:12 +02:00
Zhang Yi	aac45075f6	jbd2: add a missing data flush during file and fs synchronization When the filesystem performs file or filesystem synchronization (e.g., ext4_sync_file()), it queries the journal to determine whether to flush the file device through jbd2_trans_will_send_data_barrier(). If the target transaction has not started committing, it assumes that the journal will submit the flush command, allowing the filesystem to bypass a redundant flush command. However, this assumption is not always valid. If the journal is not located on the filesystem device, the journal commit thread will not submit the flush command unless the variable ->t_need_data_flush is set to 1. Consequently, the flush may be missed, and data may be lost following a power failure or system crash, even if the synchronization appears to succeed. Unfortunately, we cannot determine with certainty whether the target transaction will flush to the filesystem device before it commits. However, if it has not started committing, it must be the running transaction. Therefore, fix it by always set its t_need_data_flush to 1, ensuring that the committing thread will flush the filesystem device. Fixes: `bbd2be3691` ("jbd2: Add function jbd2_trans_will_send_data_barrier()") Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20241206111327.4171337-1-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2025-03-21 00:59:28 -04:00
Zhang Yi	18aba2adb3	jbd2: fix off-by-one while erasing journal In __jbd2_journal_erase(), the block_stop parameter includes the last block of a contiguous region; however, the calculation of byte_stop is incorrect, as it does not account for the bytes in that last block. Consequently, the page cache is not cleared properly, which occasionally causes the ext4/050 test to fail. Since block_stop operates on inclusion semantics, it involves repeated increments and decrements by 1, significantly increasing the complexity of the calculations. Optimize the calculation and fix the incorrect byte_stop by make both block_stop and byte_stop to use exclusion semantics. This fixes a failure in fstests ext4/050. Fixes: `01d5d96542` ("ext4: add discard/zeroout flags to journal flush") Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20250217065955.3829229-1-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2025-03-18 00:15:25 -04:00
Matthew Wilcox (Oracle)	08be56fec0	ext4: remove references to bh->b_page Buffer heads are attached to folios, not to pages. Also flush_dcache_page() is now deprecated in favour of flush_dcache_folio(). Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20250213182303.2133205-1-willy@infradead.org Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2025-03-18 00:15:25 -04:00
Eric Biggers	f6fc1584f5	jbd2: remove redundant function jbd2_journal_has_csum_v2or3_feature Since commit `dd348f054b` ("jbd2: switch to using the crc32c library"), jbd2_journal_has_csum_v2or3() and jbd2_journal_has_csum_v2or3_feature() are the same. Remove jbd2_journal_has_csum_v2or3_feature() and just keep jbd2_journal_has_csum_v2or3(). Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Link: https://patch.msgid.link/20250207031424.42755-1-ebiggers@kernel.org Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2025-03-17 11:19:41 -04:00
Jan Kara	e6eff39dd0	jbd2: remove wrong sb->s_sequence check Journal emptiness is not determined by sb->s_sequence == 0 but rather by sb->s_start == 0 (which is set a few lines above). Furthermore 0 is a valid transaction ID so the check can spuriously trigger. Remove the invalid WARN_ON. CC: stable@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20250206094657.20865-3-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2025-03-17 11:19:41 -04:00
Eric Biggers	dd348f054b	jbd2: switch to using the crc32c library Now that the crc32c() library function directly takes advantage of architecture-specific optimizations, it is unnecessary to go through the crypto API. Just use crc32c(). This is much simpler, and it improves performance due to eliminating the crypto API overhead. Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Acked-by: Theodore Ts'o <tytso@mit.edu> Link: https://lore.kernel.org/r/20241202010844.144356-18-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>	2024-12-01 17:23:02 -08:00
Daniel Martín Gómez	3e7c69cdb0	jbd2: Fix comment describing journal_init_common() The code indicates that journal_init_common() fills the journal_t object it returns while the comment incorrectly states that only a few fields are initialised. Also, the comment claims that journal structures could be created from scratch which isn't possible as journal_init_common() calls journal_load_superblock() which loads and checks journal superblock from disk. Signed-off-by: Daniel Martín Gómez <dalme@riseup.net> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20241107144538.3544-1-dalme@riseup.net Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2024-11-13 12:56:48 -05:00
Zhihao Cheng	abe1ac7ca8	jbd2: make b_frozen_data allocation always succeed The b_frozen_data allocation should not be failed during journal committing process, otherwise jbd2 will abort. Since commit 490c1b444ce653d("jbd2: do not fail journal because of frozen_buffer allocation failure") already added '__GFP_NOFAIL' flag in do_get_write_access(), just add '__GFP_NOFAIL' flag for all allocations in jbd2_journal_write_metadata_buffer(), like 'new_bh' allocation does. Besides, remove all error handling branches for do_get_write_access(). Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20241012085530.2147846-1-chengzhihao@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2024-11-12 23:54:15 -05:00
Kemeng Shi	6140ceb9b2	jbd2: remove unneeded check of ret in jbd2_fc_get_buf Simply return -EINVAL if j_fc_off is invalid to avoid repeated check of ret. Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20240801013815.2393869-9-shikemeng@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2024-08-26 23:49:15 -04:00
Kemeng Shi	1862304b06	jbd2: correct comment jbd2_mark_journal_empty After jbd2_mark_journal_empty, journal log is supposed to be empty. Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20240801013815.2393869-8-shikemeng@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2024-08-26 23:49:15 -04:00
Kemeng Shi	f47aa3ebe3	jbd2: move escape handle to futher improve jbd2_journal_write_metadata_buffer Move escape handle to futher improve code readability and remove some repeat check. Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20240801013815.2393869-7-shikemeng@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2024-08-26 23:49:15 -04:00
Kemeng Shi	7c48e7d5a1	jbd2: remove unneeded done_copy_out variable in jbd2_journal_write_metadata_buffer It's more intuitive to use jh_in->b_frozen_data directly instead of done_copy_out variable. Simply remove unneeded done_copy_out variable and use b_frozen_data instead. Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20240801013815.2393869-6-shikemeng@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2024-08-26 23:49:15 -04:00
Kemeng Shi	debbfd991f	jbd2: remove unneeded kmap for jh_in->b_frozen_data in jbd2_journal_write_metadata_buffer Remove kmap for page of b_frozen_data from jbd2_alloc() which always provides an address from the direct kernel mapping. Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20240801013815.2393869-5-shikemeng@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2024-08-26 23:49:15 -04:00
Kemeng Shi	fa10db138d	jbd2: remove unused return value of jbd2_fc_release_bufs Remove unused return value of jbd2_fc_release_bufs. Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20240801013815.2393869-4-shikemeng@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2024-08-26 23:49:15 -04:00
Kemeng Shi	f2917bda8a	jbd2: remove dead check in journal_alloc_journal_head We will alloc journal_head with __GFP_NOFAIL anyway, test for failure is pointless. Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20240801013815.2393869-3-shikemeng@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2024-08-26 23:49:15 -04:00
Kemeng Shi	f0e3c14802	jbd2: correctly compare tids with tid_geq function in jbd2_fc_begin_commit Use tid_geq to compare tids to work over sequence number wraps. Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Cc: stable@kernel.org Link: https://patch.msgid.link/20240801013815.2393869-2-shikemeng@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2024-08-26 23:49:15 -04:00
Luis Henriques (SUSE)	6db3c1575a	ext4: fix fast commit inode enqueueing during a full journal commit When a full journal commit is on-going, any fast commit has to be enqueued into a different queue: FC_Q_STAGING instead of FC_Q_MAIN. This enqueueing is done only once, i.e. if an inode is already queued in a previous fast commit entry it won't be enqueued again. However, if a full commit starts _after_ the inode is enqueued into FC_Q_MAIN, the next fast commit needs to be done into FC_Q_STAGING. And this is not being done in function ext4_fc_track_template(). This patch fixes the issue by re-enqueuing an inode into the STAGING queue during the fast commit clean-up callback when doing a full commit. However, to prevent a race with a fast-commit, the clean-up callback has to be called with the journal locked. This bug was found using fstest generic/047. This test creates several 32k bytes files, sync'ing each of them after it's creation, and then shutting down the filesystem. Some data may be loss in this operation; for example a file may have it's size truncated to zero. Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Luis Henriques (SUSE) <luis.henriques@linux.dev> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20240717172220.14201-1-luis.henriques@linux.dev Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org	2024-08-26 21:21:10 -04:00
Jan Kara	a794c9ad02	jbd2: increase maximum transaction size Originally, we were quite conservative in limiting maximum transaction size to a quarter of the journal because we were not accounting transaction descriptor and revoke blocks. These days we do properly account them and reserve space for them from the total transaction credits. Thus there's no need to be so conservative and we can increase the maximum transaction size to one third of the journal (even half should work fine in principle but the performance will likely suffer in that case). This also fixes failures to grow filesystems with tiny journals. Link: CA+hUFcuGs04JHZ_WzA1zGN57+ehL2qmHOt5a7RMpo+rv6Vyxtw@mail.gmail.com Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20240701132800.7158-1-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2024-07-08 23:59:37 -04:00
Jan Kara	1cf5b024a3	jbd2: drop pointless shrinker batch initialization In jbd2_journal_init_common() we set batch size of a shrinker shrinking checkpointed buffers to journal->j_max_transaction_buffers. But that is guaranteed to be 0 at that point so we effectively stay with the default shrinker batch size of 128. It has been like this since introduction of jbd2 shrinkers so just drop the pointless initialization. Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20240624170127.3253-4-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2024-07-08 23:59:37 -04:00
Jan Kara	e3a00a2378	jbd2: precompute number of transaction descriptor blocks Instead of computing the number of descriptor blocks a transaction can have each time we need it (which is currently when starting each transaction but will become more frequent later) precompute the number once during journal initialization together with maximum transaction size. We perform the precomputation whenever journal feature set is updated similarly as for computation of journal->j_revoke_records_per_block. CC: stable@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20240624170127.3253-2-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2024-07-08 23:59:37 -04:00
Jan Kara	4aa99c71e4	jbd2: make jbd2_journal_get_max_txn_bufs() internal There's no reason to have jbd2_journal_get_max_txn_bufs() public function. Currently all users are internal and can use journal->j_max_transaction_buffers instead. This saves some unnecessary recomputations of the limit as a bonus which becomes important as this function gets more complex in the following patch. CC: stable@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20240624170127.3253-1-jack@suse.cz Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2024-07-08 23:59:37 -04:00
Jeff Johnson	89a8718cef	jbd2: add missing MODULE_DESCRIPTION() Fix the 'make W=1' warning: WARNING: modpost: missing MODULE_DESCRIPTION() in fs/jbd2/jbd2.o Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com> Link: https://patch.msgid.link/20240526-md-fs-jbd2-v1-1-7bba6665327d@quicinc.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2024-07-05 13:26:19 -04:00
Zhang Yi	7c73ddb758	jbd2: speed up jbd2_transaction_committed() jbd2_transaction_committed() is used to check whether a transaction with the given tid has already committed, it holds j_state_lock in read mode and check the tid of current running transaction and committing transaction, but holding the j_state_lock is expensive. We have already stored the sequence number of the most recently committed transaction in journal t->j_commit_sequence, we could do this check by comparing it with the given tid instead. If the given tid isn't smaller than j_commit_sequence, we can ensure that the given transaction has been committed. That way we could drop the expensive lock and achieve about 10% ~ 20% performance gains in concurrent DIOs on may virtual machine with 100G ramdisk. fio -filename=/mnt/foo -direct=1 -iodepth=10 -rw=$rw -ioengine=libaio \ -bs=4k -size=10G -numjobs=10 -runtime=60 -overwrite=1 -name=test \ -group_reporting Before: overwrite IOPS=88.2k, BW=344MiB/s read IOPS=95.7k, BW=374MiB/s rand overwrite IOPS=98.7k, BW=386MiB/s randread IOPS=102k, BW=397MiB/s After: overwrite IOPS=105k, BW=410MiB/s read IOPS=112k, BW=436MiB/s rand overwrite IOPS=104k, BW=404MiB/s randread IOPS=111k, BW=432MiB/s CC: Dave Chinner <david@fromorbit.com> Suggested-by: Dave Chinner <david@fromorbit.com> Link: https://lore.kernel.org/linux-ext4/ZjILCPNZRHeazSqV@dread.disaster.area/ Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Link: https://patch.msgid.link/20240520131831.2910790-1-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2024-07-05 12:45:36 -04:00
Kemeng Shi	b07855348b	jbd2: remove unnecessary "should_sleep" in kjournald2 We only need to sleep if no running transaction is expired. Simply remove unnecessary "should_sleep". Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20240514112438.1269037-10-shikemeng@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2024-06-27 10:20:26 -04:00
Kemeng Shi	136d3e0703	jbd2: remove dead check of JBD2_UNMOUNT in kjournald2 We always set JBD2_UNMOUNT with j_state_lock held in journal_kill_thread. In kjournald2, we check JBD2_UNMOUNT flag two times under the same j_state_lock. Then the second check is unnecessary. Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20240514112438.1269037-9-shikemeng@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2024-06-27 10:20:26 -04:00
Kemeng Shi	d5c545735a	jbd2: remove dead equality check of j_commit_[sequence/request] in kjournald2 The j_commit_[sequence/request] are updated with j_state_lock held during runtime. In kjournald2, two equality checks of j_commit_[sequence/request] are under the same j_state_lock, then the second check is unnecessary. Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20240514112438.1269037-8-shikemeng@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2024-06-27 10:20:26 -04:00
Kemeng Shi	4eb9bd13eb	jbd2: use bh_in instead of jh2bh(jh_in) to simplify code We save jh2bh(jh_in) to bh_in, so use bh_in directly instead of jh2bh(jh_in) to simplify the code. Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20240514112438.1269037-7-shikemeng@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2024-06-27 10:20:26 -04:00
Kemeng Shi	daabedd664	jbd2: remove unneeded kmap to do escape in jbd2_journal_write_metadata_buffer The data to do escape could be accessed directly from b_frozen_data, just remove unneeded kmap. Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20240514112438.1269037-6-shikemeng@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2024-06-27 10:20:26 -04:00
Kemeng Shi	4c15129aaa	jbd2: jump to new copy_done tag when b_frozen_data is created concurrently If b_frozen_data is created concurrently, we can update new_folio and new_offset with b_frozen_data and then move forward Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20240514112438.1269037-5-shikemeng@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2024-06-27 10:20:26 -04:00
Kemeng Shi	5dd3e8c075	jbd2: remove unnedded "need_copy_out" in jbd2_journal_write_metadata_buffer As we only need to copy out when we should do escape, need_copy_out could be simply replaced by "do_escape". Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20240514112438.1269037-4-shikemeng@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2024-06-27 10:20:26 -04:00
Kemeng Shi	abe48a5225	jbd2: remove unused return info from jbd2_journal_write_metadata_buffer The done_copy_out info from jbd2_journal_write_metadata_buffer is not used. Simply remove it. Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20240514112438.1269037-3-shikemeng@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2024-06-27 10:20:26 -04:00
Kemeng Shi	cc102aa246	jbd2: avoid memleak in jbd2_journal_write_metadata_buffer The new_bh is from alloc_buffer_head, we should call free_buffer_head to free it in error case. Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20240514112438.1269037-2-shikemeng@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2024-06-27 10:20:26 -04:00
Al Viro	224941e837	use ->bd_mapping instead of ->bd_inode->i_mapping Just the low-hanging fruit... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Link: https://lore.kernel.org/r/20240411145346.2516848-2-viro@zeniv.linux.org.uk Signed-off-by: Christian Brauner <brauner@kernel.org>	2024-05-03 02:36:51 -04:00
Zhihao Cheng	62ec1707cb	jbd2: replace journal state flag by checking errseq Now JBD2 detects metadata writeback error of fs dev according to errseq. Replace journal state flag by checking errseq. Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Suggested-by: Jan Kara <jack@suse.cz> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20231213013224.2100050-3-chengzhihao1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2024-01-04 23:42:21 -05:00
Zhihao Cheng	990b6b5b13	jbd2: add errseq to detect client fs's bdev writeback error Add errseq in journal, so that JBD2 can detect whether metadata is successfully written to fs bdev. This patch adds detection in recovery process to replace original solution(using local variable wb_err). Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Suggested-by: Jan Kara <jack@suse.cz> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20231213013224.2100050-2-chengzhihao1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>	2024-01-04 23:42:21 -05:00

1 2 3 4 5 ...

343 Commits