linux

mirror of https://github.com/torvalds/linux.git synced 2026-04-18 06:44:00 -04:00

Author	SHA1	Message	Date
Linus Torvalds	7c8a4671dc	Merge tag 'vfs-7.1-rc1.mount.v2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs mount updates from Christian Brauner: - Add FSMOUNT_NAMESPACE flag to fsmount() that creates a new mount namespace with the newly created filesystem attached to a copy of the real rootfs. This returns a namespace file descriptor instead of an O_PATH mount fd, similar to how OPEN_TREE_NAMESPACE works for open_tree(). This allows creating a new filesystem and immediately placing it in a new mount namespace in a single operation, which is useful for container runtimes and other namespace-based isolation mechanisms. This accompanies OPEN_TREE_NAMESPACE and avoids a needless detour via OPEN_TREE_NAMESPACE to get the same effect. Will be especially useful when you mount an actual filesystem to be used as the container rootfs. - Currently, creating a new mount namespace always copies the entire mount tree from the caller's namespace. For containers and sandboxes that intend to build their mount table from scratch this is wasteful: they inherit a potentially large mount tree only to immediately tear it down. This series adds support for creating a mount namespace that contains only a clone of the root mount, with none of the child mounts. Two new flags are introduced: - CLONE_EMPTY_MNTNS (0x400000000) for clone3(), using the 64-bit flag space - UNSHARE_EMPTY_MNTNS (0x00100000) for unshare() Both flags imply CLONE_NEWNS. The resulting namespace contains a single nullfs root mount with an immutable empty directory. The intended workflow is to then mount a real filesystem (e.g., tmpfs) over the root and build the mount table from there. - Allow MOVE_MOUNT_BENEATH to target the caller's rootfs, allowing to switch out the rootfs without pivot_root(2). The traditional approach to switching the rootfs involves pivot_root(2) or a chroot_fs_refs()-based mechanism that atomically updates fs->root for all tasks sharing the same fs_struct. 
This has consequences for fork(), unshare(CLONE_FS), and setns(). This series instead decomposes root-switching into individually atomic, locally-scoped steps: fd_tree = open_tree(-EBADF, "/newroot", OPEN_TREE_CLONE | OPEN_TREE_CLOEXEC); fchdir(fd_tree); move_mount(fd_tree, "", AT_FDCWD, "/", MOVE_MOUNT_BENEATH | MOVE_MOUNT_F_EMPTY_PATH); chroot("."); umount2(".", MNT_DETACH); Since each step only modifies the caller's own state, the fork/unshare/setns races are eliminated by design. A key step to making this possible is to remove the locked mount restriction. Originally MOVE_MOUNT_BENEATH doesn't support mounting beneath a mount that is locked. The locked mount protects the underlying mount from being revealed. This is a core mechanism of unshare(CLONE_NEWUSER | CLONE_NEWNS). The mounts in the new mount namespace become locked. That effectively makes the new mount table useless as the caller cannot ever get rid of any of the mounts no matter how useless they are. We can lift this restriction though. We simply transfer the locked property from the top mount to the mount beneath. This works because what we care about is to protect the underlying mount aka the parent. The mount mounted between the parent and the top mount takes over the job of protecting the parent mount from the top mount. This leaves us free to remove the locked property from the top mount which can consequently be unmounted: unshare(CLONE_NEWUSER | CLONE_NEWNS) and we inherit a clone of procfs on /proc then currently we cannot unmount it as: umount -l /proc will fail with EINVAL because the procfs mount is locked. After this series we can now do: mount --beneath -t tmpfs tmpfs /proc umount -l /proc after which a tmpfs mount has been placed beneath the procfs mount. The tmpfs mount has become locked and the procfs mount has become unlocked. This means you can safely modify an inherited mount table after unprivileged namespace creation. 
Afterwards we simply make it possible to move a mount beneath the rootfs allowing to upgrade the rootfs. Removing the locked restriction makes this very useful for containers created with unshare(CLONE_NEWUSER | CLONE_NEWNS) to reshuffle an inherited mount table safely and MOVE_MOUNT_BENEATH makes it possible to switch out the rootfs instead of using the costly pivot_root(2). * tag 'vfs-7.1-rc1.mount.v2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: selftests/namespaces: remove unused utils.h include from listns_efault_test selftests/fsmount_ns: add missing TARGETS and fix cap test selftests/empty_mntns: fix wrong CLONE_EMPTY_MNTNS hex value in comment selftests/empty_mntns: fix statmount_alloc() signature mismatch selftests/statmount: remove duplicate wait_for_pid() mount: always duplicate mount selftests/filesystems: add MOVE_MOUNT_BENEATH rootfs tests move_mount: allow MOVE_MOUNT_BENEATH on the rootfs move_mount: transfer MNT_LOCKED selftests/filesystems: add clone3 tests for empty mount namespaces selftests/filesystems: add tests for empty mount namespaces namespace: allow creating empty mount namespaces selftests: add FSMOUNT_NAMESPACE tests selftests/statmount: add statmount_alloc() helper tools: update mount.h header mount: add FSMOUNT_NAMESPACE mount: simplify __do_loopback() mount: start iterating from start of rbtree	2026-04-14 19:59:25 -07:00
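The root-switching sequence quoted in the message above can be sketched as a small C helper. This is a minimal sketch, not code from the series: the helper name `switch_root_beneath()`, the raw `syscall()` wrappers, and the fallback constant definitions for older userspace headers are assumptions added for illustration. It needs the privileges the underlying syscalls require and a kernel that accepts MOVE_MOUNT_BENEATH on the rootfs.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/mount.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Fallbacks for older userspace headers (assumed to match the kernel UAPI). */
#ifndef OPEN_TREE_CLONE
#define OPEN_TREE_CLONE 1
#endif
#ifndef OPEN_TREE_CLOEXEC
#define OPEN_TREE_CLOEXEC O_CLOEXEC
#endif
#ifndef MOVE_MOUNT_F_EMPTY_PATH
#define MOVE_MOUNT_F_EMPTY_PATH 0x00000004
#endif
#ifndef MOVE_MOUNT_BENEATH
#define MOVE_MOUNT_BENEATH 0x00000200
#endif
#ifndef SYS_open_tree
#define SYS_open_tree 428
#endif
#ifndef SYS_move_mount
#define SYS_move_mount 429
#endif

/* Hypothetical thin wrappers; named sys_* to avoid clashing with libc. */
static int sys_open_tree(int dfd, const char *path, unsigned int flags)
{
	return (int)syscall(SYS_open_tree, dfd, path, flags);
}

static int sys_move_mount(int from_dfd, const char *from_path,
			  int to_dfd, const char *to_path, unsigned int flags)
{
	return (int)syscall(SYS_move_mount, from_dfd, from_path,
			    to_dfd, to_path, flags);
}

/* Returns 0 on success or a negative errno value from the failing step. */
int switch_root_beneath(const char *newroot)
{
	int err = 0;
	int fd_tree = sys_open_tree(-EBADF, newroot,
				    OPEN_TREE_CLONE | OPEN_TREE_CLOEXEC);

	if (fd_tree < 0)
		return -errno;

	/* Each step only touches the caller's own state. */
	if (fchdir(fd_tree) < 0 ||
	    sys_move_mount(fd_tree, "", AT_FDCWD, "/",
			   MOVE_MOUNT_BENEATH | MOVE_MOUNT_F_EMPTY_PATH) < 0 ||
	    chroot(".") < 0 ||
	    umount2(".", MNT_DETACH) < 0)
		err = -errno; /* save before close() can clobber errno */

	close(fd_tree);
	return err;
}
```

The helper stops at the first failing step and does not try to roll back earlier ones, so a real caller would need its own recovery policy.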
Linus Torvalds	91a4855d6c	Merge tag 'net-next-7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from Jakub Kicinski: "Core & protocols: - Support HW queue leasing, allowing containers to be granted access to HW queues for zero-copy operations and AF_XDP - Number of code moves to help the compiler with inlining. Avoid output arguments for returning drop reason where possible - Rework drop handling within qdiscs to include more metadata about the reason and dropping qdisc in the tracepoints - Remove the rtnl_lock use from IP Multicast Routing - Pack size information into the Rx Flow Steering table pointer itself. This allows making the table itself a flat array of u32s, thus making the table allocation size a power of two - Report TCP delayed ack timer information via socket diag - Add ip_local_port_step_width sysctl to allow distributing the randomly selected ports more evenly throughout the allowed space - Add support for per-route tunsrc in IPv6 segment routing - Start work of switching sockopt handling to iov_iter - Improve dynamic recvbuf sizing in MPTCP, limit burstiness and avoid buffer size drifting up - Support MSG_EOR in MPTCP - Add stp_mode attribute to the bridge driver for STP mode selection. This addresses concerns about call_usermodehelper() usage - Remove UDP-Lite support (as announced in 2023) - Remove support for building IPv6 as a module. Remove the now unnecessary function calling indirection Cross-tree stuff: - Move Michael MIC code from generic crypto into wireless, it's considered insecure but some WiFi networks still need it Netfilter: - Switch nft_fib_ipv6 module to no longer need temporary dst_entry object allocations by using fib6_lookup() + RCU. Florian W reports this gets us ~13% higher packet rate - Convert IPVS's global __ip_vs_mutex to per-net service_mutex and switch the service tables to be per-net. 
Convert some code that walks the service lists to use RCU instead of the service_mutex - Add more opinionated input validation to lower security exposure - Make IPVS hash tables to be per-netns and resizable Wireless: - Finished assoc frame encryption/EPPKE/802.1X-over-auth - Radar detection improvements - Add 6 GHz incumbent signal detection APIs - Multi-link support for FILS, probe response templates and client probing - New APIs and mac80211 support for NAN (Neighbor Aware Networking, aka Wi-Fi Aware) so less work must be in firmware Driver API: - Add numerical ID for devlink instances (to avoid having to create fake bus/device pairs just to have an ID). Support shared devlink instances which span multiple PFs - Add standard counters for reporting pause storm events (implement in mlx5 and fbnic) - Add configuration API for completion writeback buffering (implement in mana) - Support driver-initiated change of RSS context sizes - Support DPLL monitoring input frequency (implement in zl3073x) - Support per-port resources in devlink (implement in mlx5) Misc: - Expand the YAML spec for Netfilter Drivers - Software: - macvlan: support multicast rx for bridge ports with shared source MAC address - team: decouple receive and transmit enablement for IEEE 802.3ad LACP "independent control" - Ethernet high-speed NICs: - nVidia/Mellanox: - support high order pages in zero-copy mode (for payload coalescing) - support multiple packets in a page (for systems with 64kB pages) - Broadcom 25-400GE (bnxt): - implement XDP RSS hash metadata extraction - add software fallback for UDP GSO, lowering the IOMMU cost - Broadcom 800GE (bnge): - add link status and configuration handling - add various HW and SW statistics - Marvell/Cavium: - NPC HW block support for cn20k - Huawei (hinic3): - add mailbox / control queue - add rx VLAN offload - add driver info and link management - Ethernet NICs: - Marvell/Aquantia: - support reading SFP module info on some AQC100 cards - Realtek PCI 
(r8169): - add support for RTL8125cp - Realtek USB (r8152): - support for the RTL8157 5Gbit chip - add 2500baseT EEE status/configuration support - Ethernet NICs embedded and off-the-shelf IP: - Synopsys (stmmac): - cleanup and reorganize SerDes handling and PCS support - cleanup descriptor handling and per-platform data - cleanup and consolidate MDIO defines and handling - shrink driver memory use for internal structures - improve Tx IRQ coalescing - improve TCP segmentation handling - add support for Spacemit K3 - Cadence (macb): - support PHYs that have inband autoneg disabled with GEM - support IEEE 802.3az EEE - rework usrio capabilities and handling - AMD (xgbe): - improve power management for S0i3 - improve TX resilience for link-down handling - Virtual: - Google cloud vNIC: - support larger ring sizes in DQO-QPL mode - improve HW-GRO handling - support UDP GSO for DQO format - PCIe NTB: - support queue count configuration - Ethernet PHYs: - automatically disable PHY autonomous EEE if MAC is in charge - Broadcom: - add BCM84891/BCM84892 support - Micrel: - support for LAN9645X internal PHY - Realtek: - add RTL8224 pair order support - support PHY LEDs on RTL8211F-VD - support spread spectrum clocking (SSC) - Maxlinear: - add PHY-level statistics via ethtool - Ethernet switches: - Maxlinear (mxl862xx): - support for bridge offloading - support for VLANs - support driver statistics - Bluetooth: - large number of fixes and new device IDs - Mediatek: - support MT6639 (MT7927) - support MT7902 SDIO - WiFi: - Intel (iwlwifi): - UNII-9 and continuing UHR work - MediaTek (mt76): - mt7996/mt7925 MLO fixes/improvements - mt7996 NPU support (HW eth/wifi traffic offload) - Qualcomm (ath12k): - monitor mode support on IPQ5332 - basic hwmon temperature reporting - support IPQ5424 - Realtek: - add USB RX aggregation to improve performance - add USB TX flow control by tracking in-flight URBs - Cellular: - IPA v5.2 support" * tag 'net-next-7.1' of 
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1561 commits) net: pse-pd: fix kernel-doc function name for pse_control_find_by_id() wireguard: device: use exit_rtnl callback instead of manual rtnl_lock in pre_exit wireguard: allowedips: remove redundant space tools: ynl: add sample for wireguard wireguard: allowedips: Use kfree_rcu() instead of call_rcu() MAINTAINERS: Add netkit selftest files selftests/net: Add additional test coverage in nk_qlease selftests/net: Split netdevsim tests from HW tests in nk_qlease tools/ynl: Make YnlFamily closeable as a context manager net: airoha: Add missing PPE configurations in airoha_ppe_hw_init() net: airoha: Fix VIP configuration for AN7583 SoC net: caif: clear client service pointer on teardown net: strparser: fix skb_head leak in strp_abort_strp() net: usb: cdc-phonet: fix skb frags[] overflow in rx_complete() selftests/bpf: add test for xdp_master_redirect with bond not up net, bpf: fix null-ptr-deref in xdp_master_redirect() for down master net: airoha: Remove PCE_MC_EN_MASK bit in REG_FE_PCE_CFG configuration sctp: disable BH before calling udp_tunnel_xmit_skb() sctp: fix missing encap_port propagation for GSO fragments net: airoha: Rely on net_device pointer in ETS callbacks ...	2026-04-14 18:36:10 -07:00
Linus Torvalds	fabd5a8d24	Merge tag 'x86_cache_for_v7.1_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 resource control updates from Borislav Petkov: - Add return value descriptions to several internal functions, addressing kernel-doc complaints - Add the x86 maintainer mailing list to the resctrl section so they are automatically included in patch submissions, and reference the applicable contribution rules document - Allow users to apply a single Capacity Bitmask to all cache domains at once using '*' as a shorthand, instead of having to specify each domain individually. This is particularly user-friendly on high core-count systems with many cache clusters - When a user provides a non-existent domain ID while configuring cache allocation, ensure the failure reason is properly reported to the user rather than silently returning an error with a misleading "ok" status * tag 'x86_cache_for_v7.1_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: fs/resctrl: Add missing return value descriptions MAINTAINERS: Update resctrl entry fs/resctrl: Add "*" shorthand to set io_alloc CBM for all domains fs/resctrl: Report invalid domain ID when parsing io_alloc_cbm	2026-04-14 14:46:37 -07:00
Christian Brauner	ad4999496e	mount: always duplicate mount In the OPEN_TREE_NAMESPACE path vfs_open_tree() resolves a path via filename_lookup() without holding namespace_lock. Between the lookup and create_new_namespace() acquiring namespace_lock via LOCK_MOUNT_EXACT_COPY() another thread can unmount the mount, setting mnt->mnt_ns to NULL. When create_new_namespace() then checks !mnt->mnt_ns it incorrectly takes the swap-and-mntget path that was designed for fsmount()'s detached mounts. This reuses a mount whose mnt_mp_list is in an inconsistent state from the concurrent unmount, causing a general protection fault in __umount_mnt() -> hlist_del_init(&mnt->mnt_mp_list) during namespace teardown. Remove the !mnt->mnt_ns special case entirely. Instead, always duplicate the mount: - For OPEN_TREE_NAMESPACE use __do_loopback() which will properly clone the mount or reject it via may_copy_tree() if it was unmounted in the race window. - For fsmount() use clone_mnt() directly (via the new MOUNT_COPY_NEW flag) since the mount is freshly created by vfs_create_mount() and not in any namespace so __do_loopback()'s IS_MNT_UNBINDABLE, may_copy_tree, and __has_locked_children checks don't apply. Reported-by: syzbot+e4470cc28308f2081ec8@syzkaller.appspotmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>	2026-04-14 09:30:15 +02:00
Linus Torvalds	4793dae01f	Merge tag 'driver-core-7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core Pull driver core updates from Danilo Krummrich: "debugfs: - Fix NULL pointer dereference in debugfs_create_str() - Fix misplaced EXPORT_SYMBOL_GPL for debugfs_create_str() - Fix soundwire debugfs NULL pointer dereference from uninitialized firmware_file device property: - Make fwnode flags modifications thread safe; widen the field to unsigned long and use set_bit() / clear_bit() based accessors - Document how to check for the property presence devres: - Separate struct devres_node from its "subclasses" (struct devres, struct devres_group); give struct devres_node its own release and free callbacks for per-type dispatch - Introduce struct devres_action for devres actions, avoiding the ARCH_DMA_MINALIGN alignment overhead of struct devres - Export struct devres_node and its init/add/remove/dbginfo primitives for use by Rust Devres<T> - Fix missing node debug info in devm_krealloc() - Use guard(spinlock_irqsave) where applicable; consolidate unlock paths in devres_release_group() driver_override: - Convert PCI, WMI, vdpa, s390/cio, s390/ap, and fsl-mc to the generic driver_override infrastructure, replacing per-bus driver_override strings, sysfs attributes, and match logic; fixes a potential UAF from unsynchronized access to driver_override in bus match() callbacks - Simplify __device_set_driver_override() logic kernfs: - Send IN_DELETE_SELF and IN_IGNORED inotify events on kernfs file and directory removal - Add corresponding selftests for memcg platform: - Allow attaching software nodes when creating platform devices via a new 'swnode' field in struct platform_device_info - Add kerneldoc for struct platform_device_info software node: - Move software node initialization from postcore_initcall() to driver_init(), making it available early in the boot process - Move kernel_kobj initialization (ksysfs_init) earlier to support the above - 
Remove software_node_exit(); dead code in a built-in unit SoC: - Introduce of_machine_read_compatible() and of_machine_read_model() OF helpers and export soc_attr_read_machine() to replace direct accesses to of_root from SoC drivers; also enables CONFIG_COMPILE_TEST coverage for these drivers sysfs: - Constify attribute group array pointers to 'const struct attribute_group *const *' in sysfs functions, device_add_groups() / device_remove_groups(), and struct class Rust: - Devres: - Embed struct devres_node directly in Devres<T> instead of going through devm_add_action(), avoiding the extra allocation and the unnecessary ARCH_DMA_MINALIGN alignment - I/O: - Turn IoCapable from a marker trait into a functional trait carrying the raw I/O accessor implementation (io_read / io_write), providing working defaults for the per-type Io methods - Add RelaxedMmio wrapper type, making relaxed accessors usable in code generic over the Io trait - Remove overloaded per-type Io methods and per-backend macros from Mmio and PCI ConfigSpace - I/O (Register): - Add IoLoc trait and generic read/write/update methods to the Io trait, making I/O operations parameterizable by typed locations - Add register! macro for defining hardware register types with typed bitfield accessors backed by Bounded values; supports direct, relative, and array register addressing - Add write_reg() / try_write_reg() and LocatedRegister trait - Update PCI sample driver to demonstrate the register! macro Example:
```
register! {
    /// UART control register.
    CTRL(u32) @ 0x18 {
        /// Receiver enable.
        19:19   rx_enable => bool;
        /// Parity configuration.
        14:13   parity ?=> Parity;
    }

    /// FIFO watermark and counter register.
    WATER(u32) @ 0x2c {
        /// Number of datawords in the receive FIFO.
        26:24   rx_count;
        /// RX interrupt threshold.
        17:16   rx_water;
    }
}

impl WATER {
    fn rx_above_watermark(&self) -> bool {
        self.rx_count() > self.rx_water()
    }
}

fn init(bar: &pci::Bar<BAR0_SIZE>) {
    let water = WATER::zeroed()
        .with_const_rx_water::<1>(); // > 3 would not compile
    bar.write_reg(water);

    let ctrl = CTRL::zeroed()
        .with_parity(Parity::Even)
        .with_rx_enable(true);
    bar.write_reg(ctrl);
}

fn handle_rx(bar: &pci::Bar<BAR0_SIZE>) {
    if bar.read(WATER).rx_above_watermark() {
        // drain the FIFO
    }
}

fn set_parity(bar: &pci::Bar<BAR0_SIZE>, parity: Parity) {
    bar.update(CTRL, |r| r.with_parity(parity));
}
```
- IRQ: - Move 'static bounds from where clauses to trait declarations for IRQ handler traits - Misc: - Enable the generic_arg_infer Rust feature - Extend Bounded with shift operations, single-bit bool conversion, and const get() Misc: - Make deferred_probe_timeout default a Kconfig option - Drop auxiliary_dev_pm_ops; the PM core falls back to driver PM callbacks when no bus type PM ops are set - Add conditional guard support for device_lock() - Add ksysfs.c to the DRIVER CORE MAINTAINERS entry - Fix kernel-doc warnings in base.h - Fix stale reference to memory_block_add_nid() in documentation" * tag 'driver-core-7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core: (67 commits) bus: fsl-mc: use generic driver_override infrastructure s390/ap: use generic driver_override infrastructure s390/cio: use generic driver_override infrastructure vdpa: use generic driver_override infrastructure platform/wmi: use generic driver_override infrastructure PCI: use generic driver_override infrastructure driver core: make software nodes available earlier software node: remove software_node_exit() kernel: ksysfs: initialize kernel_kobj earlier MAINTAINERS: add ksysfs.c to the DRIVER CORE entry drivers/base/memory: fix stale reference to memory_block_add_nid() device property: Document how to check for the property presence soundwire: debugfs: initialize firmware_file to empty string debugfs: fix 
placement of EXPORT_SYMBOL_GPL for debugfs_create_str() debugfs: check for NULL pointer in debugfs_create_str() driver core: Make deferred_probe_timeout default a Kconfig option driver core: simplify __device_set_driver_override() clearing logic driver core: auxiliary bus: Drop auxiliary_dev_pm_ops device property: Make modifications of fwnode "flags" thread safe rust: devres: embed struct devres_node directly ...	2026-04-13 19:03:11 -07:00
Linus Torvalds	613b48bbd4	Merge tag 'execve-v7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull execve updates from Kees Cook: - use strnlen() in __set_task_comm (Thorsten Blum) - update task_struct->comm comment (Thorsten Blum) * tag 'execve-v7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: sched: update task_struct->comm comment exec: use strnlen() in __set_task_comm	2026-04-13 17:41:36 -07:00
Linus Torvalds	cae0d23288	Merge tag 'pstore-v7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull pstore updates from Kees Cook: - fix ftrace dump when ECC is enabled (Andrey Skvortsov) - fix resource leak when ioremap() fails (Cole Leavitt) - Remove useless memblock header (Guilherme G. Piccoli) - Fix ECC parameter help text (Guilherme G. Piccoli) - Keep ftrace module parameter and debugfs switch in sync (Guilherme G. Piccoli) - Factor KASLR offset in the core kernel instruction addresses (Guilherme G. Piccoli) * tag 'pstore-v7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: pstore/ftrace: Factor KASLR offset in the core kernel instruction addresses pstore/ftrace: Keep ftrace module parameter and debugfs switch in sync pstore/ram: fix resource leak when ioremap() fails pstore/ramoops: Fix ECC parameter help text pstore/ramoops: Remove useless memblock header pstore: fix ftrace dump, when ECC is enabled	2026-04-13 17:39:08 -07:00
Linus Torvalds	9932f00bf4	Merge tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/linux Pull fscrypt updates from Eric Biggers: - Various cleanups for the interface between fs/crypto/ and filesystems, from Christoph Hellwig - Simplify and optimize the implementation of v1 key derivation by using the AES library instead of the crypto_skcipher API * tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/linux: fscrypt: use AES library for v1 key derivation ext4: use a byte granularity cursor in ext4_mpage_readpages fscrypt: pass a real sector_t to fscrypt_zeroout_range fscrypt: pass a byte length to fscrypt_zeroout_range fscrypt: pass a byte offset to fscrypt_zeroout_range fscrypt: pass a byte length to fscrypt_zeroout_range_inline_crypt fscrypt: pass a byte offset to fscrypt_zeroout_range_inline_crypt fscrypt: pass a byte offset to fscrypt_set_bio_crypt_ctx fscrypt: pass a byte offset to fscrypt_mergeable_bio fscrypt: pass a byte offset to fscrypt_generate_dun fscrypt: move fscrypt_set_bio_crypt_ctx_bh to buffer.c ext4, fscrypt: merge fscrypt_mergeable_bio_bh into io_submit_need_new_bio ext4: factor out a io_submit_need_new_bio helper ext4: open code fscrypt_set_bio_crypt_ctx_bh ext4: initialize the write hint in io_submit_init_bio	2026-04-13 17:29:12 -07:00
Linus Torvalds	81dc1e4d32	Merge tag 'v7.1-rc1-part1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6 Pull smb client updates from Steve French: - Fix EAs bounds check - Fix OOB read in symlink response parsing - Add support for creating tmpfiles - Minor debug improvement for mount failure - Minor crypto cleanup - Add missing module description - mount fix for lease vs. nolease - Add Metze as maintainer for smbdirect - Minor error mapping header cleanup - Improve search speed of SMB1 maperror - Fix potential null ptr ref in smb2 map error tests * tag 'v7.1-rc1-part1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6: (26 commits) smb: client: allow both 'lease' and 'nolease' mount options smb: client: get rid of d_drop()+d_add() smb: client: set ATTR_TEMPORARY with O_TMPFILE | O_EXCL smb: client: add support for O_TMPFILE vfs: introduce d_mark_tmpfile_name() MAINTAINERS: create entry for smbdirect smb: client: add missing MODULE_DESCRIPTION() to smb1maperror_test smb: client: fix OOB reads parsing symlink error response smb: client: fix off-by-8 bounds check in check_wsl_eas() smb: client: Remove unnecessary selection of CRYPTO_ECB smb/client: move smb2maperror declarations to smb2proto.h smb/client: introduce KUnit tests to check DOS/SRV err mapping search smb/client: check if SMB1 DOS/SRV error mapping arrays are sorted smb/client: use binary search for SMB1 DOS/SRV error mapping smb/client: autogenerate SMB1 DOS/SRV to POSIX error mapping smb/client: annotate smberr.h with POSIX error codes smb/client: move ERRnetlogonNotStarted to DOS error class smb/client: introduce KUnit test to check ntstatus_to_dos_map search smb/client: check if ntstatus_to_dos_map is sorted smb/client: use binary search for NT status to DOS mapping ...	2026-04-13 17:09:00 -07:00
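Several of the commits listed above pair a binary search over an error mapping table with a check that the table is sorted. The general technique can be sketched with the C library's `bsearch()`. The table contents, the struct layout, the `-EIO` fallback, and the function names below are illustrative assumptions, not the smb client's actual tables or helpers.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical mapping entry: protocol error code to POSIX errno. */
struct err_map {
	uint16_t code;
	int posix_err;
};

/* Illustrative table; it must stay sorted by code for bsearch(). */
static const struct err_map demo_map[] = {
	{ 1, -EINVAL },
	{ 2, -ENOENT },
	{ 5, -EACCES },
	{ 8, -ENOMEM },
};
#define DEMO_MAP_LEN (sizeof(demo_map) / sizeof(demo_map[0]))

/* Sortedness check of the kind a unit test can run once over the table. */
bool map_is_sorted(const struct err_map *m, size_t n)
{
	for (size_t i = 1; i < n; i++)
		if (m[i - 1].code >= m[i].code)
			return false;
	return true;
}

static int cmp_code(const void *key, const void *elem)
{
	const uint16_t *k = key;
	const struct err_map *e = elem;

	return (*k > e->code) - (*k < e->code);
}

/* O(log n) lookup; unknown codes fall back to -EIO (an assumption). */
int map_error(uint16_t code)
{
	const struct err_map *hit = bsearch(&code, demo_map, DEMO_MAP_LEN,
					    sizeof(demo_map[0]), cmp_code);

	return hit ? hit->posix_err : -EIO;
}
```

Binary search silently returns wrong results on an unsorted table, which is why a sortedness test is a natural companion to the lookup change.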
Linus Torvalds	0b0128e64a	Merge tag 'xfs-merge-7.1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux Pull xfs updates from Carlos Maiolino: "There aren't any new features. The whole series is just a collection of bug fixes and code refactoring. There is some new information added a couple new tracepoints, new data added to mountstats, but no big changes" * tag 'xfs-merge-7.1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (41 commits) xfs: fix number of GC bvecs xfs: untangle the open zones reporting in mountinfo xfs: expose the number of open zones in sysfs xfs: reduce special casing for the open GC zone xfs: streamline GC zone selection xfs: refactor GC zone selection helpers xfs: rename xfs_zone_gc_iter_next to xfs_zone_gc_iter_irec xfs: put the open zone later xfs_open_zone_put xfs: add a separate tracepoint for stealing an open zone for GC xfs: delay initial open of the GC zone xfs: fix a resource leak in xfs_alloc_buftarg() xfs: handle too many open zones when mounting xfs: refactor xfs_mount_zones xfs: fix integer overflow in busy extent sort comparator xfs: fix integer overflow in deferred intent sort comparators xfs: fold xfs_setattr_size into xfs_vn_setattr_size xfs: remove a duplicate assert in xfs_setattr_size xfs: return default quota limits for IDs without a dquot xfs: start gc on zonegc_low_space attribute updates xfs: don't decrement the buffer LRU count for in-use buffers ...	2026-04-13 17:03:48 -07:00
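Two of the xfs commits above fix integer overflow in sort comparators. The log does not show the actual code, so the following is only a generic sketch of the usual pitfall and its remedy: returning a truncated difference of wide values from a comparator can yield the wrong sign, whereas comparing explicitly cannot. The `struct busy_extent` layout and the function names are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical record with a 64-bit sort key. */
struct busy_extent {
	uint64_t start;
	uint64_t len;
};

/*
 * Overflow-safe comparator: never returns (int)(a - b), which can
 * truncate or wrap when the keys are wider than int.
 */
int cmp_busy_extent(const void *a, const void *b)
{
	const struct busy_extent *ea = a;
	const struct busy_extent *eb = b;

	return (ea->start > eb->start) - (ea->start < eb->start);
}

void sort_busy_extents(struct busy_extent *v, size_t n)
{
	qsort(v, n, sizeof(*v), cmp_busy_extent);
}
```

The expression `(x > y) - (x < y)` evaluates to -1, 0, or 1 for any pair of values, so it stays correct for keys that differ by more than `INT_MAX`.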
Linus Torvalds	230fb3a33e	Merge tag 'erofs-for-7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs Pull erofs updates from Gao Xiang: - Validate xattr h_shared_count to report -EFSCORRUPTED explicitly for crafted images - Verify metadata accesses for file-backed mounts via rw_verify_area() - Fix FS_IOC_GETFSLABEL to include the trailing NUL byte, consistent with ext4 and xfs - Properly handle 48-bit on-disk blocks/uniaddr for extra devices - Fix an index underflow in the LZ4 in-place decompression that can cause out-of-bounds accesses with crafted images - Minor fixes and cleanups * tag 'erofs-for-7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs: erofs: error out obviously illegal extents in advance erofs: clean up encoded map flags erofs: fix unsigned underflow in z_erofs_lz4_handle_overlap() erofs: handle 48-bit blocks/uniaddr for extra devices erofs: include the trailing NUL in FS_IOC_GETFSLABEL erofs: ensure all folios are managed in erofs_try_to_free_all_cached_folios() erofs: verify metadata accesses for file-backed mounts erofs: harden h_shared_count in erofs_init_inode_xattrs()	2026-04-13 16:59:19 -07:00
Linus Torvalds	a62fe21079	Merge tag 'exfat-for-7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat Pull exfat updates from Namjae Jeon: - Implement FALLOC_FL_ALLOCATE_RANGE to add support for preallocating clusters without zeroing, helping to reduce file fragmentation - Add a unified block readahead helper for FAT chain conversion, bitmap allocation, and directory entry lookups - Optimize exfat_chain_cont_cluster() by caching buffer heads to minimize mark_buffer_dirty() and mirroring overhead during NO_FAT_CHAIN to FAT_CHAIN conversion - Switch to truncate_inode_pages_final() in evict_inode() to prevent BUG_ON caused by shadow entries during reclaim - Fix a 32-bit truncation bug in directory entry calculations by ensuring proper bitwise coercion - Fix sb->s_maxbytes calculation to correctly reflect the maximum possible volume size for a given cluster size, resolving xfstests generic/213 - Introduced exfat_cluster_walk() helper to traverse FAT chains by a specified step, handling both ALLOC_NO_FAT_CHAIN and ALLOC_FAT_CHAIN modes - Introduced exfat_chain_advance() helper to advance an exfat_chain structure, updating both the current cluster and remaining size - Remove dead assignments and fix Smatch warnings * tag 'exfat-for-7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat: exfat: use exfat_chain_advance helper exfat: introduce exfat_chain_advance helper exfat: remove NULL cache pointer case in exfat_ent_get exfat: use exfat_cluster_walk helper exfat: introduce exfat_cluster_walk helper exfat: fix incorrect directory checksum after rename to shorter name exfat: fix s_maxbytes exfat: fix passing zero to ERR_PTR() in exfat_mkdir() exfat: fix error handling for FAT table operations exfat: optimize exfat_chain_cont_cluster with cached buffer heads exfat: drop redundant sec parameter from exfat_mirror_bh exfat: use readahead helper in exfat_get_dentry exfat: use readahead helper in exfat_allocate_bitmap exfat: add block 
readahead in exfat_chain_cont_cluster exfat: add fallocate FALLOC_FL_ALLOCATE_RANGE support exfat: Fix bitwise operation having different size exfat: Drop dead assignment of num_clusters exfat: use truncate_inode_pages_final() at evict_inode()	2026-04-13 16:57:31 -07:00
Linus Torvalds	f2729827ae	Merge tag 'nilfs2-v7.1-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/vdubeyko/nilfs2 Pull nilfs2 updates from Viacheslav Dubeyko: "This contains fixes of syzbot reported issues in NILFS2 functionality: - The DAT inode's btree node cache (i_assoc_inode) is initialized lazily during btree operations. However, nilfs_mdt_save_to_shadow_map() assumes i_assoc_inode is already initialized when copying dirty pages to the shadow map during GC. If NILFS_IOCTL_CLEAN_SEGMENTS is called immediately after mount before any btree operation has occurred on the DAT inode, i_assoc_inode is NULL leading to a general protection fault. Fix this by calling nilfs_attach_btree_node_cache() on the DAT inode in nilfs_dat_read() at mount time, ensuring i_assoc_inode is always initialized before any GC operation can use it (Deepanshu Kartikey) - nilfs_ioctl_mark_blocks_dirty() uses bd_oblocknr to detect dead blocks by comparing it with the current block number bd_blocknr. If they differ, the block is considered dead and skipped. A corrupted ioctl request with bd_oblocknr set to 0 causes the comparison to incorrectly match when the lookup returns -ENOENT and sets bd_blocknr to 0, bypassing the dead block check and calling nilfs_bmap_mark() on a non- existent block. This causes nilfs_btree_do_lookup() to return -ENOENT, triggering the WARN_ON(ret == -ENOENT). Fix this by rejecting ioctl requests with bd_oblocknr set to 0 at the beginning of each iteration (Deepanshu Kartikey)" * tag 'nilfs2-v7.1-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/vdubeyko/nilfs2: nilfs2: reject zero bd_oblocknr in nilfs_ioctl_mark_blocks_dirty() nilfs2: fix NULL i_assoc_inode dereference in nilfs_mdt_save_to_shadow_map	2026-04-13 16:53:19 -07:00
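The second nilfs2 fix described above can be modelled in userspace to show why a zero bd_oblocknr slips past the dead-block check. This is a simplified sketch, not the kernel code: the reduced `struct bdesc`, the `current[]` array standing in for the block lookup (0 meaning the lookup found nothing), the function name, and the `-EINVAL` return value are assumptions.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Reduced, hypothetical descriptor with only the field that matters here. */
struct bdesc {
	uint64_t bd_oblocknr; /* block number the caller believes is current */
};

/*
 * current[i] stands in for the lookup result: the live block number, or 0
 * when the lookup failed with -ENOENT. Counts blocks that would be marked
 * dirty in *marked.
 */
int mark_blocks_dirty(const struct bdesc *descs, const uint64_t *current,
		      size_t n, size_t *marked)
{
	*marked = 0;

	for (size_t i = 0; i < n; i++) {
		/*
		 * Guard at the start of each iteration: without it, a zero
		 * bd_oblocknr would equal the 0 produced by a failed lookup
		 * and the non-existent block would be treated as live.
		 */
		if (descs[i].bd_oblocknr == 0)
			return -EINVAL;

		if (current[i] != descs[i].bd_oblocknr)
			continue; /* dead block, skip it */

		(*marked)++;
	}
	return 0;
}
```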
Linus Torvalds	4d9981429a	Merge tag 'hfs-v7.1-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/vdubeyko/hfs Pull hfsplus updates from Viacheslav Dubeyko: "This contains several fixes of syzbot reported issues and HFS+ fixes of xfstests failures. - Fix a syzbot reported issue of a KMSAN uninit-value in hfsplus_strcasecmp(). The root cause was that hfs_brec_read() doesn't validate that the on-disk record size matches the expected size for the record type being read. The fix introduced hfsplus_brec_read_cat() wrapper that validates the record size based on the type field and returns -EIO if size doesn't match (Deepanshu Kartikey) - Fix a syzbot reported issue of processing corrupted HFS+ images where the b-tree allocation bitmap indicates that the header node (Node 0) is free. Node 0 must always be allocated. Violating this invariant leads to allocator corruption, which cascades into kernel panics or undefined behavior. Prevent trusting a corrupted allocator state by adding a validation check during hfs_btree_open(). If corruption is detected, print a warning identifying the specific corrupted tree and force the filesystem to mount read-only (SB_RDONLY). This prevents kernel panics from corrupted images while enabling data recovery (Shardul Bankar) - Fix a potential deadlock in hfsplus_fill_super(). hfsplus_fill_super() calls hfs_find_init() to initialize a search structure, which acquires tree->tree_lock. If the subsequent call to hfsplus_cat_build_key() fails, the function jumps to the out_put_root error label without releasing the lock. Fix this by adding the missing hfs_find_exit(&fd) call before jumping to the out_put_root error label. 
This ensures that tree->tree_lock is properly released on the error path (Zilin Guan) - Update a file's ctime after rename in hfsplus_rename() (Yangtao Li) The rest of the patches introduce HFS+ fixes for the generic/348, generic/728, generic/533, generic/523, and generic/642 test cases of the xfstests suite" * tag 'hfs-v7.1-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/vdubeyko/hfs: hfsplus: fix generic/642 failure hfsplus: rework logic of map nodes creation in xattr b-tree hfsplus: fix logic of alloc/free b-tree node hfsplus: fix error processing issue in hfs_bmap_free() hfsplus: fix potential race conditions in b-tree functionality hfsplus: extract hidden directory search into a helper function hfsplus: fix held lock freed on hfsplus_fill_super() hfsplus: fix generic/523 test-case failure hfsplus: validate b-tree node 0 bitmap at mount time hfsplus: refactor b-tree map page access and add node-type validation hfsplus: fix to update ctime after rename hfsplus: fix generic/533 test-case failure hfsplus: set ctime after setxattr and removexattr hfsplus: fix uninit-value by validating catalog record size hfsplus: fix potential Allocation File corruption after fsync	2026-04-13 16:50:38 -07:00
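The Node 0 validation described in the hfsplus entry above can be modelled in a few lines. This is a hedged sketch rather than the kernel implementation: the function names are hypothetical, and it assumes an MSB-first bitmap in which node 0 is the top bit of the first byte.

```python
def node0_allocated(bitmap: bytes) -> bool:
    """True if the header node (node 0) is marked allocated.

    Assumption for this sketch: bits are stored MSB-first, so node 0
    corresponds to mask 0x80 of the first bitmap byte.
    """
    return len(bitmap) > 0 and bool(bitmap[0] & 0x80)


def open_btree(bitmap: bytes, read_only: bool = False) -> bool:
    """Return the effective read-only state after validating the bitmap."""
    if not node0_allocated(bitmap):
        # Models "print a warning and force SB_RDONLY" from the description.
        print("warning: b-tree bitmap marks node 0 as free; forcing read-only")
        return True
    return read_only
```

The point of the design is that a corrupted image still mounts, so data can be recovered, but the allocator is never trusted for writes.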
Linus Torvalds	f3756afb6f	Merge tag 'affs-for-7.1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull AFFS fix from David Sterba: "There's a potential out-of-bounds read in the directory hash table during readdir" * tag 'affs-for-7.1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: affs: bound hash_pos before table lookup in affs_readdir	2026-04-13 16:39:01 -07:00
Linus Torvalds	c92b4d3dd5	Merge tag 'for-7.1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs updates from David Sterba: "User visible changes: - move shutdown ioctl support out of experimental features, a forced stop of filesystem operation until the next unmount; additionally there's a super block operation to forcibly remove a device from under the filesystem that could lead to a shutdown or not if the redundancy allows that - report filesystem shutdown using fserror mechanism - tree-checker updates: - verify free space info, extent and bitmap items - verify remap-tree items and related data in block group items Performance improvements: - speed up clearing first extent in the tracked range (+10% throughput on sample workload) - reduce COW rewrites of extent buffers during the same transaction - avoid taking big device lock to update device stats during transaction commit - fix unnecessary flush on close when truncating empty files (observed in practice on a backup application) - prevent direct reclaim during compressed readahead to avoid stalls under memory pressure Notable fixes: - fix chunk allocation strategy on RAID1-like block groups with disproportionate device sizes, this could lead to ENOSPC due to skewed reservation estimates - adjust metadata reservation overcommit ratio to be less aggressive and also try to flush if possible, this avoids ENOSPC and potential transaction aborts in some edge cases (that are otherwise hard to reproduce) - fix silent IO error in encoded writes and ordered extent split in zoned mode, the error was not correctly propagated to the address space and could lead to zeroed ranges - don't mark inline files NOCOMPRESS unexpectedly, the intent was to do that for single block writes of regular files - fix deadlock between reflink and transaction commit when using flushoncommit - fix overly strict item check of a running dev-replace operation Core: - zoned mode space reservation fixes: - cap delayed refs 
metadata reservation to avoid overcommit - update logic to reclaim partially unusable zones - add another state to flush and reclaim partially used zone - limit number of zones reclaimed in one go to avoid blocking other operations - don't let log trees consume global reserve on overcommit and fall back to transaction commit - revalidate extent buffer when checking its up-to-date status - add self tests for zoned mode block group specifics - reduce atomic allocations in some qgroup paths - avoid unnecessary root node COW during snapshotting - start new transaction in block group relocation conditionally - faster check of NOCOW files on currently snapshotted root - change how compressed bio size is tracked from bio and reduce the structure size - new tracepoint for search slot restart tracking - checksum list manipulation improvements - type, parameter cleanups, refactoring - error handling improvements, transaction abort call adjustments" * tag 'for-7.1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (116 commits) btrfs: btrfs_log_dev_io_error() on all bio errors btrfs: fix silent IO error loss in encoded writes and zoned split btrfs: skip clearing EXTENT_DEFRAG for NOCOW ordered extents btrfs: use BTRFS_FS_UPDATE_UUID_TREE_GEN flag for UUID tree rescan check btrfs: remove duplicate journal_info reset on failure to commit transaction btrfs: tag as unlikely if statements that check for fs in error state btrfs: fix double free in create_space_info() error path btrfs: fix double free in create_space_info_sub_group() error path btrfs: do not reject a valid running dev-replace btrfs: only invalidate btree inode pages after all ebs are released btrfs: prevent direct reclaim during compressed readahead btrfs: replace BUG_ON() with error return in cache_save_setup() btrfs: zstd: don't cache sectorsize in a local variable btrfs: zlib: don't cache sectorsize in a local variable btrfs: zlib: drop redundant folio address variable btrfs: lzo: inline 
read/write length helpers btrfs: use common eb range validation in read_extent_buffer_to_user_nofault() btrfs: read eb folio index right before loops btrfs: rename local variable for offset in folio btrfs: unify types for binary search variables ...	2026-04-13 16:35:32 -07:00
Linus Torvalds	7fe6ac157b	Merge tag 'for-7.1/block-20260411' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux Pull block updates from Jens Axboe: - Add shared memory zero-copy I/O support for ublk, bypassing per-I/O copies between kernel and userspace by matching registered buffer PFNs at I/O time. Includes selftests. - Refactor bio integrity to support filesystem initiated integrity operations and arbitrary buffer alignment. - Clean up bio allocation, splitting bio_alloc_bioset() into clear fast and slow paths. Add bio_await() and bio_submit_or_kill() helpers, unify synchronous bi_end_io callbacks. - Fix zone write plug refcount handling and plug removal races. Add support for serializing zone writes at QD=1 for rotational zoned devices, yielding significant throughput improvements. - Add SED-OPAL ioctls for Single User Mode management and a STACK_RESET command. - Add io_uring passthrough (uring_cmd) support to the BSG layer. - Replace pp_buf in partition scanning with struct seq_buf. - zloop improvements and cleanups. - drbd genl cleanup, switching to pre_doit/post_doit. 
- NVMe pull request via Keith: - Fabrics authentication updates - Enhanced block queue limits support - Workqueue usage updates - A new write zeroes device quirk - Tagset cleanup fix for loop device - MD pull requests via Yu Kuai: - Fix raid5 soft lockup in retry_aligned_read() - Fix raid10 deadlock with check operation and nowait requests - Fix raid1 overlapping writes on writemostly disks - Fix sysfs deadlock on array_state=clear - Proactive RAID-5 parity building with llbitmap, with write_zeroes_unmap optimization for initial sync - Fix llbitmap barrier ordering, rdev skipping, and bitmap_ops version mismatch fallback - Fix bcache use-after-free and uninitialized closure - Validate raid5 journal metadata payload size - Various cleanups - Various other fixes, improvements, and cleanups * tag 'for-7.1/block-20260411' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux: (146 commits) ublk: fix tautological comparison warning in ublk_ctrl_reg_buf scsi: bsg: fix buffer overflow in scsi_bsg_uring_cmd() block: refactor blkdev_zone_mgmt_ioctl MAINTAINERS: update ublk driver maintainer email Documentation: ublk: address review comments for SHMEM_ZC docs ublk: allow buffer registration before device is started ublk: replace xarray with IDA for shmem buffer index allocation ublk: simplify PFN range loop in __ublk_ctrl_reg_buf ublk: verify all pages in multi-page bvec fall within registered range ublk: widen ublk_shmem_buf_reg.len to __u64 for 4GB buffer support xfs: use bio_await in xfs_zone_gc_reset_sync block: add a bio_submit_or_kill helper block: factor out a bio_await helper block: unify the synchronous bi_end_io callbacks xfs: fix number of GC bvecs selftests/ublk: add read-only buffer registration test selftests/ublk: add filesystem fio verify test for shmem_zc selftests/ublk: add hugetlbfs shmem_zc test for loop target selftests/ublk: add shared memory zero-copy test selftests/ublk: add UBLK_F_SHMEM_ZC support for loop target ...	2026-04-13 15:51:31 -07:00
Linus Torvalds	3ba310f2a3	Merge tag 'lsm-pr-20260410' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm Pull LSM updates from Paul Moore: "We only have five patches in the LSM tree, but three of the five are for an important bugfix relating to overlayfs and the mmap() and mprotect() access controls for LSMs. Highlights below: - Fix problems with the mmap() and mprotect() LSM hooks on overlayfs As we are dealing with problems both in mmap() and mprotect() there are essentially two components to this fix, spread across three patches with all marked for stable. The simplest portion of the fix is the creation of a new LSM hook, security_mmap_backing_file(), that is used to enforce LSM mmap() access controls on backing files in the stacked/overlayfs case. The existing security_mmap_file() does not have visibility past the user file. You can see from the associated SELinux hook callback the code is fairly straightforward. The mprotect() fix is a bit more complicated as there is no way in the mprotect() code path to inspect both the user and backing files, and bolting on a second file reference to vm_area_struct wasn't really an option. The solution taken here adds a LSM security blob and associated hooks to the backing_file struct that LSMs can use to capture and store relevant information from the user file. While the necessary SELinux information is relatively small, a single u32, I expect other LSMs to require more than that, and a dedicated backing_file LSM blob provides a storage mechanism without negatively impacting other filesystems. I want to note that other LSMs beyond SELinux have been involved in the discussion of the fixes presented here and they are working on their own related changes using these new hooks, but due to other issues those patches will be coming at a later date. 
- Use kstrdup_const()/kfree_const() for securityfs symlink targets - Resolve a handful of kernel-doc warnings in cred.h" * tag 'lsm-pr-20260410' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm: selinux: fix overlayfs mmap() and mprotect() access checks lsm: add backing_file LSM hooks fs: prepare for adding LSM blob to backing_file securityfs: use kstrdup_const() to manage symlink targets cred: fix kernel-doc warnings in cred.h	2026-04-13 15:17:28 -07:00
Linus Torvalds	ef3da345cc	Merge tag 'vfs-7.1-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull misc vfs updates from Christian Brauner: "Features: - coredump: add tracepoint for coredump events - fs: hide file and bfile caches behind runtime const machinery Fixes: - fix architecture-specific compat_ftruncate64 implementations - dcache: Limit the minimal number of bucket to two - fs/omfs: reject s_sys_blocksize smaller than OMFS_DIR_START - fs/mbcache: cancel shrink work before destroying the cache - dcache: permit dynamic_dname()s up to NAME_MAX Cleanups: - remove or unexport unused fs_context infrastructure - trivial ->setattr cleanups - selftests/filesystems: Assume that TIOCGPTPEER is defined - writeback: fix kernel-doc function name mismatch for wb_put_many() - autofs: replace manual symlink buffer allocation in autofs_dir_symlink - init/initramfs.c: trivial fix: FSM -> Finite-state machine - fs: remove stale and duplicate forward declarations - readdir: Introduce dirent_size() - fs: Replace user_access_{begin/end} by scoped user access - kernel: acct: fix duplicate word in comment - fs: write a better comment in step_into() concerning .mnt assignment - fs: attr: fix comment formatting and spelling issues" * tag 'vfs-7.1-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (28 commits) dcache: permit dynamic_dname()s up to NAME_MAX fs: attr: fix comment formatting and spelling issues fs: hide file and bfile caches behind runtime const machinery fs: write a better comment in step_into() concerning .mnt assignment proc: rename proc_notify_change to proc_setattr proc: rename proc_setattr to proc_nochmod_setattr affs: rename affs_notify_change to affs_setattr adfs: rename adfs_notify_change to adfs_setattr hfs: update comments on hfs_inode_setattr kernel: acct: fix duplicate word in comment fs: Replace user_access_{begin/end} by scoped user access readdir: Introduce dirent_size() coredump: add tracepoint for coredump events fs: 
remove do_sys_truncate fs: pass on FTRUNCATE_* flags to do_truncate fs: fix archiecture-specific compat_ftruncate64 fs: remove stale and duplicate forward declarations init/initramfs.c: trivial fix: FSM -> Finite-state machine autofs: replace manual symlink buffer allocation in autofs_dir_symlink fs/mbcache: cancel shrink work before destroying the cache ...	2026-04-13 14:20:11 -07:00
Linus Torvalds	07c3ef5822	Merge tag 'vfs-7.1-rc1.pidfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull clone and pidfs updates from Christian Brauner: "Add three new clone3() flags for pidfd-based process lifecycle management. CLONE_AUTOREAP: CLONE_AUTOREAP makes a child process auto-reap on exit without ever becoming a zombie. This is a per-process property in contrast to the existing auto-reap mechanism via SA_NOCLDWAIT or SIG_IGN for SIGCHLD which applies to all children of a given parent. Currently the only way to automatically reap children is to set SA_NOCLDWAIT or SIG_IGN on SIGCHLD. This is a parent-scoped property affecting all children which makes it unsuitable for libraries or applications that need selective auto-reaping of specific children while still being able to wait() on others. CLONE_AUTOREAP stores an autoreap flag in the child's signal_struct. When the child exits do_notify_parent() checks this flag and causes exit_notify() to transition the task directly to EXIT_DEAD. Since the flag lives on the child it survives reparenting: if the original parent exits and the child is reparented to a subreaper or init the child still auto-reaps when it eventually exits. This is cleaner than forcing the subreaper to get SIGCHLD and then reaping it. If the parent doesn't care the subreaper won't care. If there's a subreaper that would care it would be easy enough to add a prctl() that either just turns back on SIGCHLD and turns off auto-reaping or a prctl() that just notifies the subreaper whenever a child is reparented to it. CLONE_AUTOREAP can be combined with CLONE_PIDFD to allow the parent to monitor the child's exit via poll() and retrieve exit status via PIDFD_GET_INFO. Without CLONE_PIDFD it provides a fire-and-forget pattern. No exit signal is delivered so exit_signal must be zero. 
CLONE_THREAD and CLONE_PARENT are rejected: CLONE_THREAD because autoreap is a process-level property, and CLONE_PARENT because an autoreap child reparented via CLONE_PARENT could become an invisible zombie under a parent that never calls wait(). The flag is not inherited by the autoreap process's own children. Each child that should be autoreaped must be explicitly created with CLONE_AUTOREAP. CLONE_NNP: CLONE_NNP sets no_new_privs on the child at clone time. Unlike prctl(PR_SET_NO_NEW_PRIVS) which a process sets on itself, CLONE_NNP allows the parent to impose no_new_privs on the child at creation without affecting the parent's own privileges. CLONE_THREAD is rejected because threads share credentials. CLONE_NNP is useful on its own for any spawn-and-sandbox pattern but was specifically introduced to enable unprivileged usage of CLONE_PIDFD_AUTOKILL. CLONE_PIDFD_AUTOKILL: This flag ties a child's lifetime to the pidfd returned from clone3(). When the last reference to the struct file created by clone3() is closed the kernel sends SIGKILL to the child. A pidfd obtained via pidfd_open() for the same process does not keep the child alive and does not trigger autokill - only the specific struct file from clone3() has this property. This is useful for container runtimes, service managers, and sandboxed subprocess execution - any scenario where the child must die if the parent crashes or abandons the pidfd or just wants a throwaway helper process. CLONE_PIDFD_AUTOKILL requires both CLONE_PIDFD and CLONE_AUTOREAP. It requires CLONE_PIDFD because the whole point is tying the child's lifetime to the pidfd. It requires CLONE_AUTOREAP because a killed child with no one to reap it would become a zombie - the primary use case is the parent crashing or abandoning the pidfd so no one is around to call waitpid(). CLONE_THREAD is rejected because autokill targets a process not a thread. 
If CLONE_NNP is specified together with CLONE_PIDFD_AUTOKILL an unprivileged user may spawn a process that is autokilled. The child cannot escalate privileges via setuid/setgid exec after being spawned. If CLONE_PIDFD_AUTOKILL is specified without CLONE_NNP the caller must have CAP_SYS_ADMIN in its user namespace" * tag 'vfs-7.1-rc1.pidfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: selftests: check pidfd_info->coredump_code correctness pidfds: add coredump_code field to pidfd_info kselftest/coredump: reintroduce null pointer dereference selftests/pidfd: add CLONE_PIDFD_AUTOKILL tests selftests/pidfd: add CLONE_NNP tests selftests/pidfd: add CLONE_AUTOREAP tests pidfd: add CLONE_PIDFD_AUTOKILL clone: add CLONE_NNP clone: add CLONE_AUTOREAP	2026-04-13 13:27:11 -07:00
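The flag rules spelled out in the clone3() entry above can be collected into one validation function. The following is a user-space sketch of those rules only. The bit values chosen for CLONE_AUTOREAP, CLONE_NNP, and CLONE_PIDFD_AUTOKILL are placeholders (the message does not state them), and the specific error codes returned are assumptions.

```python
import errno

# Existing clone flags (values from the long-standing UAPI).
CLONE_PIDFD = 0x00001000
CLONE_PARENT = 0x00008000
CLONE_THREAD = 0x00010000

# PLACEHOLDER values for the new flags, for illustration only.
CLONE_AUTOREAP = 1 << 60
CLONE_NNP = 1 << 61
CLONE_PIDFD_AUTOKILL = 1 << 62


def check_clone_flags(flags, exit_signal=0, has_cap_sys_admin=False):
    """Return 0 if the combination is allowed, else a negative errno.

    The errno choices (EINVAL, EPERM) are assumptions of this sketch.
    """
    if flags & CLONE_AUTOREAP:
        # Autoreap is a process-level property and must not be reparented
        # via CLONE_PARENT; no exit signal is delivered.
        if flags & (CLONE_THREAD | CLONE_PARENT):
            return -errno.EINVAL
        if exit_signal != 0:
            return -errno.EINVAL

    if (flags & CLONE_NNP) and (flags & CLONE_THREAD):
        return -errno.EINVAL  # threads share credentials

    if flags & CLONE_PIDFD_AUTOKILL:
        # Autokill needs both the pidfd and autoreap, and targets a process.
        if not (flags & CLONE_PIDFD) or not (flags & CLONE_AUTOREAP):
            return -errno.EINVAL
        if flags & CLONE_THREAD:
            return -errno.EINVAL
        # Unprivileged callers must also impose no_new_privs on the child.
        if not (flags & CLONE_NNP) and not has_cap_sys_admin:
            return -errno.EPERM

    return 0
```

For example, `check_clone_flags(CLONE_PIDFD | CLONE_AUTOREAP | CLONE_PIDFD_AUTOKILL | CLONE_NNP)` is accepted for an unprivileged caller in this model, while dropping CLONE_NNP requires `has_cap_sys_admin=True`.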
Linus Torvalds	fc825e513c	Merge tag 'vfs-7.1-rc1.bh.metadata' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs buffer_head updates from Christian Brauner: "This cleans up the mess that has accumulated over the years in metadata buffer_head tracking for inodes. It moves the tracking into dedicated structure in filesystem-private part of the inode (so that we don't use private_list, private_data, and private_lock in struct address_space), and also moves couple other users of private_data and private_list so these are removed from struct address_space saving 3 longs in struct inode for 99% of inodes" * tag 'vfs-7.1-rc1.bh.metadata' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (42 commits) fs: Drop i_private_list from address_space fs: Drop mapping_metadata_bhs from address space ext4: Track metadata bhs in fs-private inode part minix: Track metadata bhs in fs-private inode part udf: Track metadata bhs in fs-private inode part fat: Track metadata bhs in fs-private inode part bfs: Track metadata bhs in fs-private inode part affs: Track metadata bhs in fs-private inode part ext2: Track metadata bhs in fs-private inode part fs: Provide functions for handling mapping_metadata_bhs directly fs: Switch inode_has_buffers() to take mapping_metadata_bhs fs: Make bhs point to mapping_metadata_bhs fs: Move metadata bhs tracking to a separate struct fs: Fold fsync_buffers_list() into sync_mapping_buffers() fs: Drop osync_buffers_list() kvm: Use private inode list instead of i_private_list fs: Remove i_private_data aio: Stop using i_private_data and i_private_lock hugetlbfs: Stop using i_private_data fs: Stop using i_private_data for metadata bh tracking ...	2026-04-13 12:46:42 -07:00
Linus Torvalds	2802f94072	Merge tag 'vfs-7.1-rc1.fat' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull FAT updates from Christian Brauner: "Minor fixes for the fat filesystem" * tag 'vfs-7.1-rc1.fat' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: fat: fix stack frame size warnings in KUnit tests fat: add KUnit tests for timestamp conversion helpers	2026-04-13 12:40:26 -07:00
Linus Torvalds	b7d74ea0fd	Merge tag 'vfs-7.1-rc1.kino' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs i_ino updates from Christian Brauner: "For historical reasons, the inode->i_ino field is an unsigned long, which means that it's 32 bits on 32 bit architectures. This has caused a number of filesystems to implement hacks to hash a 64-bit identifier into a 32-bit field, and deprives us of a universal identifier field for an inode. This changes the inode->i_ino field from an unsigned long to a u64. This shouldn't make any material difference on 64-bit hosts, but 32-bit hosts will see struct inode grow by at least 4 bytes. This could have effects on slabcache sizes and field alignment. The bulk of the changes are to format strings and tracepoints, since the kernel itself doesn't care that much about the i_ino field. The first patch changes some vfs function arguments, so check that one out carefully. With this change, we may be able to shrink some inode structures. For instance, struct nfs_inode has a fileid field that holds the 64-bit inode number. With this set of changes, that field could be eliminated. 
I'd rather leave that sort of cleanups for later just to keep this simple" * tag 'vfs-7.1-rc1.kino' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: nilfs2: fix 64-bit division operations in nilfs_bmap_find_target_in_group() EVM: add comment describing why ino field is still unsigned long vfs: remove externs from fs.h on functions modified by i_ino widening treewide: fix missed i_ino format specifier conversions ext4: fix signed format specifier in ext4_load_inode trace event treewide: change inode->i_ino from unsigned long to u64 nilfs2: widen trace event i_ino fields to u64 f2fs: widen trace event i_ino fields to u64 ext4: widen trace event i_ino fields to u64 zonefs: widen trace event i_ino fields to u64 hugetlbfs: widen trace event i_ino fields to u64 ext2: widen trace event i_ino fields to u64 cachefiles: widen trace event i_ino fields to u64 vfs: widen trace event i_ino fields to u64 net: change sock.sk_ino and sock_i_ino() to u64 audit: widen ino fields to u64 vfs: widen inode hash/lookup functions to u64	2026-04-13 12:19:01 -07:00
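To see why squeezing a 64-bit identifier into a 32-bit i_ino is lossy, consider one common style of folding, XOR-ing the high half into the low half. This is an illustrative sketch; the function name is hypothetical and individual filesystems may fold differently.

```python
def fold_ino(fileid: int) -> int:
    """Fold a 64-bit identifier into 32 bits by XOR-ing its two halves."""
    fileid &= 0xFFFFFFFFFFFFFFFF          # treat the input as a u64
    return (fileid ^ (fileid >> 32)) & 0xFFFFFFFF


# Two distinct 64-bit identifiers that collide once folded.
a = 0x0000000100000002
b = 0x0000000200000001
collide = fold_ino(a) == fold_ino(b)      # both fold to 3
```

With a u64 i_ino the identifier can be stored as-is, so no folding and no collisions of this kind are needed.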
Linus Torvalds	0f00132132	Merge tag 'vfs-7.1-rc1.integrity' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs integrity updates from Christian Brauner: "This adds support to generate and verify integrity information (aka T10 PI) in the file system, instead of the automatic below the covers support that is currently used. The implementation is based on refactoring the existing block layer PI code to be reusable for this use case, and then adding relatively small wrappers for the file system use case. These are then used in iomap to implement the semantics, and wired up in XFS with a small amount of glue code. Compared to the baseline this does not change performance for writes, but increases read performance up to 15% for 4k I/O, with the benefit decreasing with larger I/O sizes as even the baseline maxes out the device quickly on my older enterprise SSD" * tag 'vfs-7.1-rc1.integrity' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: xfs: support T10 protection information iomap: support T10 protection information iomap: support ioends for buffered reads iomap: add a bioset pointer to iomap_read_folio_ops ntfs3: remove copy and pasted iomap code iomap: allow file systems to hook into buffered read bio submission iomap: only call into ->submit_read when there is a read_ctx iomap: pass the iomap_iter to ->submit_read iomap: refactor iomap_bio_read_folio_range block: pass a maxlen argument to bio_iov_iter_bounce block: add fs_bio_integrity helpers block: make max_integrity_io_size public block: prepare generation / verification helpers for fs usage block: add a bdev_has_integrity_csum helper block: factor out a bio_integrity_setup_default helper block: factor out a bio_integrity_action helper	2026-04-13 10:40:26 -07:00
Linus Torvalds	3383589700	Merge tag 'vfs-7.1-rc1.directory' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs directory updates from Christian Brauner: "Recently 'start_creating', 'start_removing', 'start_renaming' and related interfaces were added which combine the locking and the lookup. At that time many callers were changed to use the new interfaces. However there is still an assortment of places outside of the core vfs where the directory is locked explicitly, whether with inode_lock() or lock_rename() or similar. These were missed in the first pass for an assortment of uninteresting reasons. This addresses the remaining places where explicit locking is used, and changes them to use the new interfaces, or otherwise removes the explicit locking. The biggest changes are in overlayfs. The other changes are quite simple, though maybe the cachefiles change is the least simple of those" * tag 'vfs-7.1-rc1.directory' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: VFS: unexport lock_rename(), lock_rename_child(), unlock_rename() ovl: remove ovl_lock_rename_workdir() ovl: use is_subdir() for testing if one thing is a subdir of another ovl: change ovl_create_real() to get a new lock when re-opening created file. ovl: pass name buffer to ovl_start_creating_temp() cachefiles: change cachefiles_bury_object to use start_renaming_dentry() ovl: Simplify ovl_lookup_real_one() VFS: make lookup_one_qstr_excl() static. nfsd: switch purge_old() to use start_removing_noperm() selinux: Use simple_start_creating() / simple_done_creating() Apparmor: Use simple_start_creating() / simple_done_creating() libfs: change simple_done_creating() to use end_creating() VFS: move the start_dirop() kerndoc comment to before start_dirop() fs/proc: Don't lock root inode when creating "self" and "thread-self" VFS: note error returns in documentation for various lookup functions	2026-04-13 10:24:33 -07:00
Linus Torvalds	c8db08110c	Merge tag 'vfs-7.1-rc1.xattr' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs xattr updates from Christian Brauner: "This reworks the simple_xattr infrastructure and adds support for user.* extended attributes on sockets. The simple_xattr subsystem currently uses an rbtree protected by a reader-writer spinlock. This series replaces the rbtree with an rhashtable giving O(1) average-case lookup with RCU-based lockless reads. This sped up concurrent access patterns on tmpfs quite a bit and it's an overall easy enough conversion to do and gets rid of rwlock_t. The conversion is done incrementally: a new rhashtable path is added alongside the existing rbtree, consumers are migrated one at a time (shmem, kernfs, pidfs), and then the rbtree code is removed. All three consumers switch from embedded structs to pointer-based lazy allocation so the rhashtable overhead is only paid for inodes that actually use xattrs. With this infrastructure in place the series adds support for user.* xattrs on sockets. Path-based AF_UNIX sockets inherit xattr support from the underlying filesystem (e.g. tmpfs) but sockets in sockfs - that is everything created via socket() including abstract namespace AF_UNIX sockets - had no xattr support at all. The xattr_permission() checks are reworked to allow user.* xattrs on S_IFSOCK inodes. Sockfs sockets get per-inode limits of 128 xattrs and 128KB total value size matching the limits already in use for kernfs. The practical motivation comes from several directions. systemd and GNOME are expanding their use of Varlink as an IPC mechanism. For D-Bus there are tools like dbus-monitor that can observe IPC traffic across the system but this only works because D-Bus has a central broker. For Varlink there is no broker and there is currently no way to identify which sockets speak Varlink. 
With user.* xattrs on sockets a service can label its socket with the IPC protocol it speaks (e.g., user.varlink=1) and an eBPF program can then selectively capture traffic on those sockets. Enumerating bound sockets via netlink combined with these xattr labels gives a way to discover all Varlink IPC entrypoints for debugging and introspection. Similarly, systemd-journald wants to use xattrs on the /dev/log socket for protocol negotiation to indicate whether RFC 5424 structured syslog is supported or whether only the legacy RFC 3164 format should be used. In containers these labels are particularly useful as high-privilege or more complicated solutions for socket identification aren't available. The series comes with comprehensive selftests covering path-based AF_UNIX sockets, sockfs socket operations, per-inode limit enforcement, and xattr operations across multiple address families (AF_INET, AF_INET6, AF_NETLINK, AF_PACKET)" * tag 'vfs-7.1-rc1.xattr' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: selftests/xattr: test xattrs on various socket families selftests/xattr: sockfs socket xattr tests selftests/xattr: path-based AF_UNIX socket xattr tests xattr: support extended attributes on sockets xattr,net: support limited amount of extended attributes on sockfs sockets xattr: move user limits for xattrs to generic infra xattr: switch xattr_permission() to switch statement xattr: add xattr_permission_error() xattr: remove rbtree-based simple_xattr infrastructure pidfs: adapt to rhashtable-based simple_xattrs kernfs: adapt to rhashtable-based simple_xattrs with lazy allocation shmem: adapt to rhashtable-based simple_xattrs with lazy allocation xattr: add rhashtable-based simple_xattr infrastructure xattr: add rcu_head and rhash_head to struct simple_xattr	2026-04-13 10:10:28 -07:00
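The socket-labelling idea in the xattr entry above (e.g. user.varlink=1) could be tried from user space roughly as follows. This is a sketch under assumptions: the helper name is hypothetical, and on kernels without sockfs xattr support the call is simply expected to fail, so the function reports an errno instead of raising.

```python
import errno
import os
import socket


def label_socket(sock, name="user.varlink", value=b"1"):
    """Try to attach a user.* xattr to a socket; return 0 or a positive errno."""
    if not hasattr(os, "setxattr"):
        return errno.ENOTSUP              # platform has no xattr syscalls
    try:
        # os.setxattr accepts an open file descriptor in place of a path.
        os.setxattr(sock.fileno(), name, value)
    except OSError as exc:
        return exc.errno                  # e.g. unsupported on older kernels
    return 0
```

A service would call `label_socket()` once after creating its listening socket; which errno an older kernel returns is not specified here and should be treated as "labels unavailable".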
Linus Torvalds	0e58e3f1c5	Merge tag 'vfs-7.1-rc1.writeback' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs writeback updates from Christian Brauner: "This introduces writeback helper APIs and converts f2fs, gfs2 and nfs to stop accessing writeback internals directly" * tag 'vfs-7.1-rc1.writeback' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: nfs: stop using writeback internals for WB_WRITEBACK accounting gfs2: stop using writeback internals for dirty_exceeded check f2fs: stop using writeback internals for dirty_exceeded checks writeback: prep helpers for dirty-limit and writeback accounting	2026-04-13 10:08:01 -07:00
Rajasi Mandal	4248ed1013	smb: client: allow both 'lease' and 'nolease' mount options Change the nolease mount option from fsparam_flag() to fsparam_flag_no() so that both 'lease' and 'nolease' are accepted as valid mount options. Previously, only 'nolease' was recognized. Passing 'lease' would fail with an unknown parameter error (or be silently ignored with 'sloppy'). With this change: - 'nolease' disables lease requests (same behavior as before) - 'lease' explicitly enables lease requests This also renames the enum value from Opt_nolease to Opt_lease and uses result.negated to set ctx->no_lease, which is the standard pattern used by other flag_no options in the cifs mount option parser. Signed-off-by: Rajasi Mandal <rajasimandal@microsoft.com> Reviewed-by: Meetakshi Setiya <msetiya@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>	2026-04-13 09:14:54 -05:00
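The negatable-flag pattern described in this commit can be sketched in plain userspace C. This is a minimal illustration only: `demo_ctx` and `demo_parse_lease()` are hypothetical names, and the real parser relies on the kernel's fs_parameter tables rather than hand-written string matching.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for the mount context; only no_lease mirrors the commit text. */
struct demo_ctx {
	bool no_lease;
};

/*
 * Parse one option token. Returns 0 on success, -1 for an unknown option.
 * A leading "no" marks the option as negated, playing the role that
 * result.negated plays in the commit message.
 */
int demo_parse_lease(struct demo_ctx *ctx, const char *opt)
{
	bool negated = false;

	if (strncmp(opt, "no", 2) == 0) {
		negated = true;
		opt += 2;
	}
	if (strcmp(opt, "lease") != 0)
		return -1;

	/* One option entry serves both spellings. */
	ctx->no_lease = negated;
	return 0;
}
```

With this shape, "lease" and "nolease" share a single option entry, and the stored value is simply the negation bit.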
Guilherme G. Piccoli	24b8f8dcb9	pstore/ftrace: Factor KASLR offset in the core kernel instruction addresses The pstore ftrace frontend works by purely collecting the instruction address, saving it on the persistent area through the backend and when the log is read, on next boot for example, the address is then resolved by using the regular printk symbol lookup (%pS for example). Problem: if we are running a relocatable kernel with KASLR enabled, this is a recipe for failure in the symbol resolution on next boots, since the addresses are offset'ed by the KASLR address. So, naturally the way to go is factor the KASLR address out of instruction address collection, and adding the fresh offset when resolving the symbol on future boots. Problem #2: modules also have varying addresses that float based on module base address and potentially the module ordering in memory, meaning factoring KASLR offset for them is useless. So, let's hereby only take KASLR offset into account for core kernel addresses, leaving module ones as is. And we have yet a 3rd complexity: not necessarily the check range for core kernel addresses holds true on future boots, since the module base address will vary. With that, the choice was to mark the addresses as being core vs module based on its MSB. And with that... ...we have the 4th challenge here: for some "simple" architectures, the CPU number is saved bit-encoded on the instruction pointer, to allow bigger timestamps - this is set through the PSTORE_CPU_IN_IP define for such architectures. Hence, the approach here is to skip such architectures (at least in a first moment). Finished? No. On top of all previous complexities, we have one extra pain point: kaslr_offset() is inlined and fully "resolved" at boot-time, after kernel decompression, through ELF relocation mechanism. Once the offset is known, it's patched to the kernel text area, wherever it is used. The mechanism, and its users, are only built-in - incompatible with module usage. 
Though there are possibly some hacks (as computing the offset using some kallsym lookup), the choice here is to restrict this optimization to the (hopefully common) case of CONFIG_PSTORE=y. TL;DR: let's factor KASLR offsets on pstore/ftrace for core kernel addresses, only when PSTORE is built-in and leaving module addresses out, as well as architectures that define PSTORE_CPU_IN_IP. Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com> Link: https://patch.msgid.link/20260410205848.2607169-1-gpiccoli@igalia.com Signed-off-by: Kees Cook <kees@kernel.org>	2026-04-10 23:59:41 -07:00
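The core idea of the pstore/ftrace change (store core kernel addresses without the current boot's KASLR offset, tag them via the MSB, and add the fresh offset back when reading) can be sketched as below. This is not the patch's code. The function names are hypothetical, and the tagging convention (MSB cleared for core addresses, on the assumption that every real kernel address has the MSB set) is an assumption made for illustration. Architectures that define PSTORE_CPU_IN_IP are not modeled.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_MSB (UINT64_C(1) << 63)

/* Record side: strip this boot's offset from core addresses and clear the MSB as a tag. */
uint64_t demo_encode_ip(uint64_t ip, bool is_core, uint64_t kaslr_off)
{
	if (!is_core)
		return ip;	/* module address: stored untouched, MSB stays set */
	return (ip - kaslr_off) & ~DEMO_MSB;
}

/* Read side (a later boot): restore the MSB and add the fresh offset. */
uint64_t demo_decode_ip(uint64_t stored, uint64_t new_kaslr_off)
{
	if (stored & DEMO_MSB)
		return stored;	/* tagged as a module address: left as is */
	return (stored | DEMO_MSB) + new_kaslr_off;
}
```

Because the core-versus-module decision travels with the stored value, the reader does not need to know where the module area sat on the boot that recorded the trace.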
Paulo Alcantara	dc0325b0aa	smb: client: get rid of d_drop()+d_add() Replace d_drop()+d_add() in cifs_tmpfile() and cifs_create() with d_instantiate(), and in cifs_atomic_open() with d_splice_alias() if in-lookup, otherwise d_instantiate(). Reported-by: Al Viro <viro@zeniv.linux.org.uk> Closes: https://lore.kernel.org/r/20260408065719.GF3836593@ZenIV Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.org> Cc: David Howells <dhowells@redhat.com> Cc: NeilBrown <neilb@ownmail.net> Cc: linux-fsdevel@vger.kernel.org Cc: linux-cifs@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com>	2026-04-10 20:18:45 -05:00
Paulo Alcantara	62e02084ab	smb: client: set ATTR_TEMPORARY with O_TMPFILE | O_EXCL Set ATTR_TEMPORARY attribute on temporary delete-on-close files when O_EXCL is specified in conjunction with O_TMPFILE to let some servers cache as much data as possible and possibly never persist them into storage, thereby improving performance. Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.org> Cc: David Howells <dhowells@redhat.com> Cc: linux-cifs@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com>	2026-04-10 11:25:35 -05:00
Paulo Alcantara	3e7d63037a	smb: client: add support for O_TMPFILE Implement O_TMPFILE support for SMB2+ in the CIFS client. Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.org> Cc: David Howells <dhowells@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: linux-fsdevel@vger.kernel.org Cc: linux-cifs@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com>	2026-04-10 11:25:35 -05:00
Paulo Alcantara	30a59dddd6	vfs: introduce d_mark_tmpfile_name() CIFS requires O_TMPFILE dentries to have names of newly created delete-on-close files in the server so it can build full pathnames from the root of the share when performing operations on them. Suggested-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Paulo Alcantara (Red Hat) <pc@manguebit.org> Cc: Christian Brauner <brauner@kernel.org> Cc: Jan Kara <jack@suse.cz> Cc: David Howells <dhowells@redhat.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: linux-fsdevel@vger.kernel.org Cc: linux-cifs@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com>	2026-04-10 11:25:34 -05:00
Venkat Rao Bagalkote	bc1a64d236	smb: client: add missing MODULE_DESCRIPTION() to smb1maperror_test On the latest linux-next following modpost warning is reported: WARNING: modpost: missing MODULE_DESCRIPTION() in fs/smb/client/smb1maperror_test.o Add MODULE_DESCRIPTION() to the test module to fix the warning. Reviewed-by: Saket Kumar Bhaskar <skb99@linux.ibm.com> Reviewed-by: ChenXiaoSong <chenxiaosong@kylinos.cn> Signed-off-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Signed-off-by: Steve French <stfrench@microsoft.com>	2026-04-10 11:25:17 -05:00
Linus Torvalds	7c6c4ed80b	Merge tag 'vfs-7.0-rc8.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs fixes from Christian Brauner: "The kernfs rbtree is keyed by (hash, ns, name) where the hash is seeded with the raw namespace pointer via init_name_hash(ns). The resulting hash values are exposed to userspace through readdir seek positions, and the pointer-based ordering in kernfs_name_compare() is observable through entry order. Switch from raw pointers to ns_common::ns_id for both hashing and comparison. A preparatory commit first replaces all const void * namespace parameters with const struct ns_common * throughout kernfs, sysfs, and kobject so the code can access ns->ns_id. Also compare the ns_id when hashes match in the rbtree to handle crafted collisions. Also fix eventpoll RCU grace period issue and a cachefiles refcount problem" * tag 'vfs-7.0-rc8.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: kernfs: make directory seek namespace-aware kernfs: use namespace id instead of pointer for hashing and comparison kernfs: pass struct ns_common instead of const void * for namespace tags eventpoll: defer struct eventpoll free to RCU grace period cachefiles: fix incorrect dentry refcount in cachefiles_cull()	2026-04-10 08:40:49 -07:00
Gao Xiang	a5242d37c8	erofs: error out obviously illegal extents in advance Detect some corrupted extent cases during metadata parsing rather than letting them result in harmless decompression failures later: - For full-reference compressed extents, the compressed size must not exceed the decompressed size, which is a strict on-disk layout constraint; - For plain (shifted/interlaced) extents, the decoded size must not exceed the encoded size, even accounting for partial decoding. Both ways work but it should be better to report illegal extents as metadata layout violations rather than deferring as decompression failure. Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>	2026-04-10 16:53:39 +08:00
Gao Xiang	5c40d2e9e3	erofs: clean up encoded map flags - Remove EROFS_MAP_ENCODED since it was always set together with EROFS_MAP_MAPPED for compressed extents and checked redundantly; - Replace the EROFS_MAP_FULL_MAPPED flag with the opposite EROFS_MAP_PARTIAL_MAPPED flag so that extents are implicitly fully mapped initially to simplify the logic; - Make fragment extents independent of EROFS_MAP_MAPPED since they are not directly allocated on disk; thus fragment extents are no longer twisted with mapped extents. Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>	2026-04-10 16:53:39 +08:00
Hyungjung Joo	6fa253b38b	affs: bound hash_pos before table lookup in affs_readdir affs_readdir() decodes ctx->pos into hash_pos and chain_pos and then dereferences AFFS_HEAD(dir_bh)->table[hash_pos] before validating that hash_pos is within the runtime table bound. Treat out-of-range positions as end-of-directory before the first table lookup. Signed-off-by: Hyungjung Joo <jhj140711@gmail.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>	2026-04-10 02:51:05 +02:00
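The ordering this fix enforces (validate the decoded index before the first table access) might look like the following sketch. The position encoding (`pos >> 16`) and the name `demo_lookup()` are assumptions for illustration, not the affs code.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Returns 1 and stores the table entry in *out when pos maps to a valid slot,
 * or 0 ("end of directory") when the decoded slot is out of range.
 */
int demo_lookup(const uint32_t *table, size_t table_size, uint64_t pos, uint32_t *out)
{
	uint64_t hash_pos = pos >> 16;	/* hypothetical encoding: high bits select the slot */

	if (hash_pos >= table_size)
		return 0;	/* checked before any table[] dereference */

	*out = table[hash_pos];
	return 1;
}
```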
Jakub Kicinski	b6e39e4846	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Cross-merge networking fixes after downstream PR (net-7.0-rc8). Conflicts: net/ipv6/seg6_iptunnel.c `c3812651b5` ("seg6: separate dst_cache for input and output paths in seg6 lwtunnel") `78723a62b9` ("seg6: add per-route tunnel source address") https://lore.kernel.org/adZhwtOYfo-0ImSa@sirena.org.uk net/ipv4/icmp.c `fde29fd934` ("ipv4: icmp: fix null-ptr-deref in icmp_build_probe()") `d98adfbdd5` ("ipv4: drop ipv6_stub usage and use direct function calls") https://lore.kernel.org/adO3dccqnr6j-BL9@sirena.org.uk Adjacent changes: drivers/net/ethernet/stmicro/stmmac/chain_mode.c `51f4e090b9` ("net: stmmac: fix integer underflow in chain mode") `6b4286e055` ("net: stmmac: rename STMMAC_GET_ENTRY() -> STMMAC_NEXT_ENTRY()") Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-04-09 13:20:59 -07:00
Junrui Luo	21e161de2d	erofs: fix unsigned underflow in z_erofs_lz4_handle_overlap() Some crafted images can have illegal (!partial_decoding && m_llen < m_plen) extents, and the LZ4 inplace decompression path can be wrongly hit, but it cannot handle (outpages < inpages) properly: "outpages - inpages" wraps to a large value and the subsequent rq->out[] access reads past the decompressed_pages array. However, such crafted cases can correctly result in a corruption report in the normal LZ4 non-inplace path. Let's add an additional check to fix this for backporting. Reproducible image (base64-encoded gzipped blob): H4sIAJGR12kCA+3SPUoDQRgG4MkmkkZk8QRbRFIIi9hbpEjrHQI5ghfwCN5BLCzTGtLbBI+g dilSJo1CnIm7GEXFxhT6PDDwfrs73/ywIQD/1ePD4r7Ou6ETsrq4mu7XcWfj++Pb58nJU/9i PNtbjhan04/9GtX4qVYc814WDqt6FaX5s+ZwXXeq52lndT6IuVvlblytLMvh4Gzwaf90nsvz 2DF/21+20T/ldgp5s1jXRaN4t/8izsy/OUB6e/Qa79r+JwAAAAAAAL52vQVuGQAAAP6+my1w ywAAAAAAAADwu14ATsEYtgBQAAA= $ mount -t erofs -o cache_strategy=disabled foo.erofs /mnt $ dd if=/mnt/data of=/dev/null bs=4096 count=1 Fixes: `598162d050` ("erofs: support decompress big pcluster for lz4 backend") Reported-by: Yuhao Jiang <danisjiang@gmail.com> Cc: stable@vger.kernel.org Signed-off-by: Junrui Luo <moonafterrain@outlook.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>	2026-04-10 01:45:47 +08:00
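The wraparound described here is ordinary unsigned arithmetic, and a small standalone sketch shows both the failure and a guard. The function names are hypothetical; the real check lives in the erofs LZ4 path and may be shaped differently.

```c
#include <stdbool.h>

/* Unsigned subtraction wraps: with outpages < inpages the result is a huge value. */
unsigned int demo_unchecked_gap(unsigned int outpages, unsigned int inpages)
{
	return outpages - inpages;
}

/* Guarded variant: refuse the in-place path instead of producing a wrapped index. */
bool demo_checked_gap(unsigned int outpages, unsigned int inpages, unsigned int *gap)
{
	if (outpages < inpages)
		return false;
	*gap = outpages - inpages;
	return true;
}
```

Using the wrapped value as an array index is what turns the arithmetic slip into an out-of-bounds read.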
Christian Brauner	cb76a81c7c	kernfs: make directory seek namespace-aware The rbtree backing kernfs directories is ordered by (hash, ns_id, name) but kernfs_dir_pos() only searches by hash when seeking to a position during readdir. When two nodes from different namespaces share the same hash value, the binary search can land on a node in the wrong namespace. The subsequent skip-forward loop walks rb_next() and may overshoot the correct node, silently dropping an entry from the readdir results. With the recent switch from raw namespace pointers to public namespace ids as hash seeds, computing hash collisions became an offline operation. An unprivileged user could unshare into a new network namespace, create a single interface whose name-hash collides with a target entry in init_net, and cause a victim's seekdir/readdir on /sys/class/net to miss that entry. Fix this by extending the rbtree search in kernfs_dir_pos() to also compare namespace ids when hashes match. Since the rbtree is already ordered by (hash, ns_id, name), this makes the seek land directly in the correct namespace's range, eliminating the wrong-namespace overshoot. Signed-off-by: Christian Brauner <brauner@kernel.org>	2026-04-09 14:36:52 +02:00
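To illustrate why the seek must compare namespace ids as well as hashes, here is a minimal sketch that uses a sorted array as a stand-in for the rbtree. `demo_node`, `demo_cmp()` and `demo_seek()` are hypothetical names. The ordering (hash, then ns_id) follows the commit message, and the name component is omitted for brevity.

```c
#include <stddef.h>
#include <stdint.h>

struct demo_node {
	uint32_t hash;
	uint64_t ns_id;
	const char *name;
};

/* Order by hash first, then by namespace id. */
int demo_cmp(uint32_t hash, uint64_t ns_id, const struct demo_node *n)
{
	if (hash != n->hash)
		return hash < n->hash ? -1 : 1;
	if (ns_id != n->ns_id)
		return ns_id < n->ns_id ? -1 : 1;
	return 0;
}

/* Lower bound: index of the first node not ordered before (hash, ns_id). */
size_t demo_seek(const struct demo_node *nodes, size_t count,
		 uint32_t hash, uint64_t ns_id)
{
	size_t lo = 0, hi = count;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (demo_cmp(hash, ns_id, &nodes[mid]) > 0)
			lo = mid + 1;
		else
			hi = mid;
	}
	return lo;
}
```

With two entries sharing a hash but living in different namespaces, the two-key search lands directly on the entry for the requested namespace instead of depending on a skip-forward loop.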
Christian Brauner	1fe989e1c4	kernfs: use namespace id instead of pointer for hashing and comparison kernfs uses the namespace tag as both a hash seed (via init_name_hash()) and a comparison key in the rbtree. The resulting hash values are exposed to userspace through directory seek positions (ctx->pos), and the raw pointer comparisons in kernfs_name_compare() encode kernel pointer ordering into the rbtree layout. This constitutes a KASLR information leak since the hash and ordering derived from kernel pointers can be observed from userspace. Fix this by using the 64-bit namespace id (ns_common::ns_id) instead of the raw pointer value for both hashing and comparison. The namespace id is a stable, non-secret identifier that is already exposed to userspace through other interfaces (e.g., /proc/pid/ns/, ioctl NS_GET_NSID). Introduce kernfs_ns_id() as a helper that extracts the namespace id from a potentially-NULL ns_common pointer, returning 0 for the no-namespace case. All namespace equality checks in the directory iteration and dentry revalidation paths are also switched from pointer comparison to ns_id comparison for consistency. Signed-off-by: Christian Brauner <brauner@kernel.org>	2026-04-09 14:36:52 +02:00
Christian Brauner	e3b2cf6e5d	kernfs: pass struct ns_common instead of const void * for namespace tags kernfs has historically used const void * to pass around namespace tags used for directory-level namespace filtering. The only current user of this is sysfs network namespace tagging where struct net pointers are cast to void *. Replace all const void * namespace parameters with const struct ns_common * throughout the kernfs, sysfs, and kobject namespace layers. This includes the kobj_ns_type_operations callbacks, kobject_namespace(), and all sysfs/kernfs APIs that accept or return namespace tags. Passing struct ns_common is needed because various codepaths require access to the underlying namespace. A struct ns_common can always be converted back to the concrete namespace type (e.g., struct net) via container_of() or to_ns_common() in the reverse direction. This is a preparatory change for switching to ns_id-based directory iteration to prevent a KASLR pointer leak through the current use of raw namespace pointers as hash seeds and comparison keys. Signed-off-by: Christian Brauner <brauner@kernel.org>	2026-04-09 14:36:52 +02:00
Viacheslav Dubeyko	c1307d18ca	hfsplus: fix generic/642 failure The xfstests' test-case generic/642 finishes with corrupted HFS+ volume: sudo ./check generic/642 [sudo] password for slavad: FSTYP -- hfsplus PLATFORM -- Linux/x86_64 hfsplus-testing-0001 7.0.0-rc1+ #26 SMP PREEMPT_DYNAMIC Mon Mar 23 17:24:32 PDT 2026 MKFS_OPTIONS -- /dev/loop51 MOUNT_OPTIONS -- /dev/loop51 /mnt/scratch generic/642 6s ... _check_generic_filesystem: filesystem on /dev/loop51 is inconsistent (see xfstests-dev/results//generic/642.full for details) Ran: generic/642 Failures: generic/642 Failed 1 of 1 tests sudo fsck.hfs -d /dev/loop51 /dev/loop51 Using cacheBlockSize=32K cacheTotalBlock=1024 cacheSize=32768K. Executing fsck_hfs (version 540.1-Linux). Checking non-journaled HFS Plus Volume. The volume name is untitled Checking extents overflow file. Checking catalog file. Checking multi-linked files. Checking catalog hierarchy. Checking extended attributes file. invalid free nodes - calculated 1637 header 1260 Invalid B-tree header Invalid map node (8, 0) Checking volume bitmap. Checking volume information. Verify Status: VIStat = 0x0000, ABTStat = 0xc000 EBTStat = 0x0000 CBTStat = 0x0000 CatStat = 0x00000000 Repairing volume. Rechecking volume. Checking non-journaled HFS Plus Volume. The volume name is untitled Checking extents overflow file. Checking catalog file. Checking multi-linked files. Checking catalog hierarchy. Checking extended attributes file. Checking volume bitmap. Checking volume information. The volume untitled was repaired successfully. The fsck tool detected that Extended Attributes b-tree is corrupted. Namely, the free nodes number is incorrect and map node bitmap has inconsistent state. Analysis has shown that during b-tree closing there are still some lost b-tree's nodes in the hash out of b-tree structure. 
But these orphaned b-tree nodes are still accounted as used in the map node bitmap: tree_cnid 8, nidx 0, node_count 1408, free_nodes 1403 tree_cnid 8, nidx 1, node_count 1408, free_nodes 1403 tree_cnid 8, nidx 3, node_count 1408, free_nodes 1403 tree_cnid 8, nidx 54, node_count 1408, free_nodes 1403 tree_cnid 8, nidx 67, node_count 1408, free_nodes 1403 tree_cnid 8, nidx 0, prev 0, next 0, parent 0, num_recs 3, type 0x1, height 0 tree_cnid 8, nidx 1, prev 0, next 0, parent 3, num_recs 1, type 0xff, height 1 tree_cnid 8, nidx 3, prev 0, next 0, parent 0, num_recs 1, type 0x0, height 2 tree_cnid 8, nidx 54, prev 29, next 46, parent 3, num_recs 0, type 0xff, height 1 tree_cnid 8, nidx 67, prev 8, next 14, parent 3, num_recs 0, type 0xff, height 1 This issue happens in the hfs_bnode_split() logic while detecting whether half of the records can be moved out of the node. The hfs_bnode_split() contains a loop that implements a roughly 50/50 split of the B-tree node's records by scanning the offset table to find where the data crosses the node's midpoint. If this logic detects that the node cannot be split, then it simply calls hfs_bnode_put() for the newly created node. However, the node is not marked as HFS_BNODE_DELETED and real deletion of the node doesn't happen. As a result, the empty node becomes orphaned but it is still accounted as used. Finally, the fsck tool detects this inconsistency of the HFS+ volume. This patch adds a call of hfs_bnode_unlink() before hfs_bnode_put() for the case when the new node cannot be used for splitting the existing node. sudo ./check generic/642 FSTYP -- hfsplus PLATFORM -- Linux/x86_64 hfsplus-testing-0001 7.0.0-rc1+ #26 SMP PREEMPT_DYNAMIC Fri Apr 3 12:39:13 PDT 2026 MKFS_OPTIONS -- /dev/loop51 MOUNT_OPTIONS -- /dev/loop51 /mnt/scratch generic/642 40s ... 
39s Ran: generic/642 Passed all 1 tests Closes: https://github.com/hfs-linux-kernel/hfs-linux-kernel/issues/242 cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> cc: Yangtao Li <frank.li@vivo.com> cc: linux-fsdevel@vger.kernel.org Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com> Link: https://lore.kernel.org/r/20260403230556.614171-6-slava@dubeyko.com Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>	2026-04-08 14:23:29 -07:00
Viacheslav Dubeyko	732af3aa63	hfsplus: rework logic of map nodes creation in xattr b-tree In hfsplus_init_header_node() when node_count > 63488 (header bitmap capacity), the code calculates map_nodes, subtracts them from free_nodes, and marks their positions used in the bitmap. However, it doesn't write the actual map node structure (type, record offsets, bitmap) for those physical positions, only node 0 is written. This patch reworks hfsplus_create_attributes_file() logic by introducing a specialized method of hfsplus_init_map_node() and writing the allocated map b-tree's nodes by means of hfsplus_write_attributes_file_node() method. cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> cc: Yangtao Li <frank.li@vivo.com> cc: linux-fsdevel@vger.kernel.org Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com> Link: https://lore.kernel.org/r/20260403230556.614171-5-slava@dubeyko.com Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>	2026-04-08 14:23:29 -07:00
Viacheslav Dubeyko	63584d7676	hfsplus: fix logic of alloc/free b-tree node The hfs_bmap_alloc() and hfs_bmap_free() modify the b-tree's counters and nodes' bitmap of b-tree. However, hfs_btree_write() synchronizes the state of in-core b-tree's counters and node's bitmap with b-tree's descriptor in header node. Postponing this synchronization could result in inconsistent state of file system volume. This patch adds calling of hfs_btree_write() in hfs_bmap_alloc() and hfs_bmap_free() methods. cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> cc: Yangtao Li <frank.li@vivo.com> cc: linux-fsdevel@vger.kernel.org Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com> Link: https://lore.kernel.org/r/20260403230556.614171-4-slava@dubeyko.com Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>	2026-04-08 14:23:29 -07:00
Viacheslav Dubeyko	cd3901f4c0	hfsplus: fix error processing issue in hfs_bmap_free() Currently, we check only -EINVAL error code in hfs_bmap_free() after calling the hfs_bmap_clear_bit(). It means that other error codes will be silently ignored. This patch adds the checking of all other error codes. cc: Shardul Bankar <shardul.b@mpiricsoftware.com> cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> cc: Yangtao Li <frank.li@vivo.com> cc: linux-fsdevel@vger.kernel.org Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com> Link: https://lore.kernel.org/r/20260403230556.614171-3-slava@dubeyko.com Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>	2026-04-08 14:23:29 -07:00
Viacheslav Dubeyko	6dca66d7ba	hfsplus: fix potential race conditions in b-tree functionality The HFS_BNODE_DELETED flag is checked in hfs_bnode_put() under locked tree->hash_lock. This patch adds locking for the case of setting the HFS_BNODE_DELETED flag in hfs_bnode_unlink() with the goal to avoid potential race conditions. The hfs_btree_write() method should be called under tree->tree_lock. This patch reworks logic by adding locking the tree->tree_lock for the calls of hfs_btree_write() in hfsplus_cat_write_inode() and hfsplus_system_write_inode(). This patch adds also the lockdep_assert_held() in hfs_bmap_reserve(), hfs_bmap_alloc(), and hfs_bmap_free(). cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> cc: Yangtao Li <frank.li@vivo.com> cc: linux-fsdevel@vger.kernel.org Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com> Link: https://lore.kernel.org/r/20260403230556.614171-2-slava@dubeyko.com Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>	2026-04-08 14:23:28 -07:00
Greg Kroah-Hartman	3df690bba2	smb: client: fix OOB reads parsing symlink error response When a CREATE returns STATUS_STOPPED_ON_SYMLINK, smb2_check_message() returns success without any length validation, leaving the symlink parsers as the only defense against an untrusted server. symlink_data() walks SMB 3.1.1 error contexts with the loop test "p < end", but reads p->ErrorId at offset 4 and p->ErrorDataLength at offset 0. When the server-controlled ErrorDataLength advances p to within 1-7 bytes of end, the next iteration will read past it. When the matching context is found, sym->SymLinkErrorTag is read at offset 4 from p->ErrorContextData with no check that the symlink header itself fits. smb2_parse_symlink_response() then bounds-checks the substitute name using SMB2_SYMLINK_STRUCT_SIZE as the offset of PathBuffer from iov_base. That value is computed as sizeof(smb2_err_rsp) + sizeof(smb2_symlink_err_rsp), which is correct only when ErrorContextCount == 0. With at least one error context the symlink data sits 8 bytes deeper, and each skipped non-matching context shifts it further by 8 + ALIGN(ErrorDataLength, 8). The check is too short, allowing the substitute name read to run past iov_len. The out-of-bounds heap bytes are UTF-16-decoded into the symlink target and returned to userspace via readlink(2). Fix this all up by making the loop test require the full context header to fit, rejecting sym if its header runs past end, and bounding the substitute name against the actual position of sym->PathBuffer rather than a fixed offset. Because sub_offs and sub_len are 16 bits, the pointer math will not overflow here with the new greater-than. 
Cc: Ronnie Sahlberg <ronniesahlberg@gmail.com> Cc: Shyam Prasad N <sprasad@microsoft.com> Cc: Tom Talpey <tom@talpey.com> Cc: Bharath SM <bharathsm@microsoft.com> Cc: linux-cifs@vger.kernel.org Cc: samba-technical@lists.samba.org Cc: stable <stable@kernel.org> Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.org> Assisted-by: gregkh_clanker_t1000 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2026-04-07 15:51:39 -05:00
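The loop hardening described in this commit (require the whole context header to fit before reading any field, then bound the payload) can be sketched as follows. `demo_err_ctx` and `demo_find_ctx()` are hypothetical. The 8-byte header with the length at offset 0 and the id at offset 4, and the 8-byte alignment of the payload, follow the commit message. Fields are read in host byte order for simplicity, which real wire parsing would not do.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical error-context header: length at offset 0, id at offset 4. */
struct demo_err_ctx {
	uint32_t data_len;
	uint32_t error_id;
};

/* Returns the offset of the matching context's data, or -1 if none fits in the buffer. */
long demo_find_ctx(const uint8_t *buf, size_t len, uint32_t want_id)
{
	const size_t hdr = sizeof(struct demo_err_ctx);
	size_t off = 0;

	/* The full header must fit; "off < len" alone would allow a partial read. */
	while (len - off >= hdr) {
		struct demo_err_ctx c;
		size_t step;

		memcpy(&c, buf + off, hdr);
		if (c.data_len > len - off - hdr)
			return -1;	/* payload would run past the end */
		if (c.error_id == want_id)
			return (long)(off + hdr);

		/* Skip the header plus the payload rounded up to 8 bytes. */
		step = hdr + ((size_t)c.data_len + 7) / 8 * 8;
		if (step > len - off)
			return -1;
		off += step;
	}
	return -1;
}
```

The returned offset is the actual position of the payload, so any later bounds check can be made relative to it rather than to a fixed structure size.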
Greg Kroah-Hartman	3d8b9d06bd	smb: client: fix off-by-8 bounds check in check_wsl_eas() The bounds check uses (u8 *)ea + nlen + 1 + vlen as the end of the EA name and value, but ea_data sits at offset sizeof(struct smb2_file_full_ea_info) = 8 from ea, not at offset 0. The strncmp() later reads ea->ea_data[0..nlen-1] and the value bytes follow at ea_data[nlen+1..nlen+vlen], so the actual end is ea->ea_data + nlen + 1 + vlen. Isn't pointer math fun? The earlier check (u8 *)ea > end - sizeof(*ea) only guarantees the 8-byte header is in bounds, but since the last EA is placed within 8 bytes of the end of the response, the name and value bytes are read past the end of iov. Fix this mess all up by using ea->ea_data as the base for the bounds check. An "untrusted" server can use this to leak up to 8 bytes of kernel heap into the EA name comparison and influence which WSL xattr the data is interpreted as. Cc: Ronnie Sahlberg <ronniesahlberg@gmail.com> Cc: Shyam Prasad N <sprasad@microsoft.com> Cc: Tom Talpey <tom@talpey.com> Cc: Bharath SM <bharathsm@microsoft.com> Cc: linux-cifs@vger.kernel.org Cc: samba-technical@lists.samba.org Cc: stable <stable@kernel.org> Assisted-by: gregkh_clanker_t1000 Reviewed-by: Paulo Alcantara (Red Hat) <pc@manguebit.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Steve French <stfrench@microsoft.com>	2026-04-07 15:51:01 -05:00
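The off-by-8 comes from measuring the name and value from the start of the record instead of from the start of its data area. A minimal sketch of a check based at the data area is shown below. `demo_ea` is a hypothetical layout with an 8-byte header, and `demo_ea_fits()` is not the kernel function.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical EA record: an 8-byte header followed by name, NUL, value. */
struct demo_ea {
	uint32_t next_entry_offset;
	uint8_t flags;
	uint8_t name_len;
	uint16_t value_len;
	uint8_t ea_data[];
};

/* True when header, name, NUL terminator and value all lie inside buf[0..len). */
bool demo_ea_fits(const uint8_t *buf, size_t len, size_t off)
{
	const size_t hdr = offsetof(struct demo_ea, ea_data);
	struct demo_ea ea;

	if (off > len || len - off < hdr)
		return false;	/* header itself does not fit */

	memcpy(&ea, buf + off, hdr);

	/* The payload starts after the header, so measure from ea_data, not from ea. */
	return (size_t)ea.name_len + 1 + ea.value_len <= len - off - hdr;
}
```

A check based at the record start would accept a record whose payload ends up to 8 bytes past the buffer, which matches the leak size the commit describes.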


104865 Commits