linux

mirror of https://github.com/torvalds/linux.git synced 2026-05-05 23:05:25 -04:00

Author	SHA1	Message	Date
Max Kellermann	28a3f6ab2f	fs/open: make chmod_common() and chown_common() killable Allows killing processes that are waiting for the inode lock. Signed-off-by: Max Kellermann <max.kellermann@ionos.com> Link: https://lore.kernel.org/20250513150327.1373061-2-max.kellermann@ionos.com Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-05-15 12:03:12 +02:00
Max Kellermann	d8c5507cd1	include/linux/fs.h: add inode_lock_killable() Prepare for making inode operations killable while they're waiting for the lock. Signed-off-by: Max Kellermann <max.kellermann@ionos.com> Link: https://lore.kernel.org/20250513150327.1373061-1-max.kellermann@ionos.com Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-05-15 12:03:11 +02:00
Miklos Szeredi	e0410e956b	readdir: supply dir_context.count as readdir buffer size hint This is a preparation for large readdir buffers in fuse. Simply setting the fuse buffer size to the userspace buffer size should work, the record sizes are similar (fuse's is slightly larger than libc's, so no overflow should ever happen). Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Jaco Kroon <jaco@uls.co.za> Link: https://lore.kernel.org/20250513151012.1476536-1-mszeredi@redhat.com Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-05-15 11:26:05 +02:00
Yafang Shao	e7b9cea718	vfs: Add sysctl vfs_cache_pressure_denom for bulk file operations On our HDFS servers with 12 HDDs per server, a HDFS datanode[0] startup involves scanning all files and caching their metadata (including dentries and inodes) in memory. Each HDD contains approximately 2 million files, resulting in a total of ~20 million cached dentries after initialization. To minimize dentry reclamation, we set vfs_cache_pressure to 1. Despite this configuration, memory pressure conditions can still trigger reclamation of up to 50% of cached dentries, reducing the cache from 20 million to approximately 10 million entries. During the subsequent cache rebuild period, any HDFS datanode restart operation incurs substantial latency penalties until full cache recovery completes. To maintain service stability, we need to preserve more dentries during memory reclamation. The current minimum reclaim ratio (1/100 of total dentries) remains too aggressive for our workload. This patch introduces vfs_cache_pressure_denom for more granular cache pressure control. The configuration [vfs_cache_pressure=1, vfs_cache_pressure_denom=10000] effectively maintains the full 20 million dentry cache under memory pressure, preventing datanode restart performance degradation. Link: https://hadoop.apache.org/docs/r1.2.1/hdfs_design.html#NameNode+and+DataNodes [0] Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Link: https://lore.kernel.org/20250511083624.9305-1-laoar.shao@gmail.com Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-05-15 11:12:59 +02:00
Miklos Szeredi	8d9117009d	fuse: don't allow signals to interrupt getdents copying When getting the directory contents, the entries are first fetched to a kernel buffer, then they are copied to userspace with dir_emit(). This second phase is non-blocking as long as the userspace buffer is not paged out, making it interruptible makes zero sense. Overload d_type as flags, since it only uses 4 bits from 32. Reviewed-by: Bernd Schubert <bschubert@ddn.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Link: https://lore.kernel.org/20250513112335.1473177-1-mszeredi@redhat.com Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-05-15 11:12:11 +02:00
Petr Vaněk	678927c0c9	Documentation: fix typo in root= kernel parameter description Fixes a typo in the root= parameter description, changing "this a a" to "this is a". Fixes: `c0c1a7dcb6` ("init: move the nfs/cifs/ram special cases out of name_to_dev_t") Signed-off-by: Petr Vaněk <arkamar@atlas.cz> Link: https://lore.kernel.org/20250512110827.32530-1-arkamar@atlas.cz Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-05-13 09:27:57 +02:00
Christian Brauner	e68ecc161f	Merge patch series "Minor namespace code simplication" Joel Savitz <jsavitz@redhat.com> says: The two patches are independent of each other. The first patch removes unnecssary NULL guards from free_nsproxy() and create_new_namespaces() in line with other usage of the put__ns() call sites. The second patch slightly reduces the size of the kernel when CONFIG_CGROUPS is not selected. patches from https://lore.kernel.org/20250508184930.183040-1-jsavitz@redhat.com: include/cgroup: separate {get,put}_cgroup_ns no-op case kernel/nsproxy: remove unnecessary guards Link: https://lore.kernel.org/20250508184930.183040-1-jsavitz@redhat.com Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-05-09 13:14:02 +02:00
Joel Savitz	79fb8d8d93	include/cgroup: separate {get,put}_cgroup_ns no-op case When CONFIG_CGROUPS is not selected, {get,put}_cgroup_ns become no-ops and therefore it is not necessary to compile in the code for changing the reference count. When CONFIG_CGROUP is selected, there is no valid case where either of {get,put}_cgroup_ns() will be called with a NULL argument. Signed-off-by: Joel Savitz <jsavitz@redhat.com> Link: https://lore.kernel.org/20250508184930.183040-3-jsavitz@redhat.com Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-05-09 13:13:54 +02:00
Joel Savitz	5caa2d89b7	kernel/nsproxy: remove unnecessary guards In free_nsproxy() and the error path of create_new_namesapces() the put__ns() calls are guarded by unnecessary NULL checks. put_pid_ns(), put_ipc_ns(), put_uts_ns(), and put_time_ns() will never receive a NULL argument unless their namespace type is disabled, and in this case all four become no-ops at compile time anyway. put_mnt_ns() will never receive a null argument at any time. This unguarded usage is in line with other call sites of put__ns(). Signed-off-by: Joel Savitz <jsavitz@redhat.com> Link: https://lore.kernel.org/20250508184930.183040-2-jsavitz@redhat.com Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-05-09 13:13:54 +02:00
Christoph Hellwig	bb01e8cc10	fs: use writeback_iter directly in mpage_writepages Stop using write_cache_pages and use writeback_iter directly. This removes an indirect call per written folio and makes the code easier to follow. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/20250507062124.3933305-1-hch@lst.de Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-05-09 12:37:48 +02:00
Jinliang Zheng	9f81d70702	fs: remove useless plus one in super_cache_scan() After commit `475d0db742` ("fs: Fix theoretical division by 0 in super_cache_scan()."), there's no need to plus one to prevent division by zero. Remove it to simplify the code. Signed-off-by: Jinliang Zheng <alexjlzheng@tencent.com> Link: https://lore.kernel.org/20250428135050.267297-1-alexjlzheng@tencent.com Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-04-29 13:08:29 +02:00
Christian Brauner	19bbfe7b5f	fs: add S_ANON_INODE This makes it easy to detect proper anonymous inodes and to ensure that we can detect them in codepaths such as readahead(). Readahead on anonymous inodes didn't work because they didn't have a proper mode. Now that they have we need to retain EINVAL being returned otherwise LTP will fail. We also need to ensure that ioctls aren't simply fired like they are for regular files so things like inotify inodes continue to correctly call their own ioctl handlers as in [1]. Reported-by: Xilin Wu <sophon@radxa.com> Link: https://lore.kernel.org/3A9139D5CD543962+89831381-31b9-4392-87ec-a84a5b3507d8@radxa.com [1] Link: https://lore.kernel.org/7a1a7076-ff6b-4cb0-94e7-7218a0a44028@sirena.org.uk Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-04-21 13:20:14 +02:00
Christian Brauner	c4044870ae	Merge patch series "two nits for path lookup" Mateusz Guzik <mjguzik@gmail.com> says: Since path looku is being looked at, two extra nits from me: 1. some trivial jump avoidance in inode_permission() 2. but more importantly avoiding a memory access which is most likely a cache miss when descending into devcgroup_inode_permission() the file seems to have no maintainer fwiw anyhow I'm confident the way forward is to add IOP_FAST_MAY_EXEC (or similar) to elide inode_permission() in the common case to begin with. There are quite a few branches which straight up don't need execute. On top of that btrfs has a permission hook only to check for MAY_WRITE, which in case of path lookup is not set. With the above flag the call will be avoided. * patches from https://lore.kernel.org/20250416221626.2710239-1-mjguzik@gmail.com: device_cgroup: avoid access to ->i_rdev in the common case in devcgroup_inode_permission() fs: touch up predicts in inode_permission() Link: https://lore.kernel.org/20250416221626.2710239-1-mjguzik@gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-04-21 10:27:59 +02:00
Christian Brauner	79beea2db0	fs: remove uselib() system call This system call has been deprecated for quite a while now. Let's try and remove it from the kernel completely. Link: https://lore.kernel.org/20250415-kanufahren-besten-02ac00e6becd@brauner Acked-by: Kees Cook <kees@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-04-21 10:27:59 +02:00
Mateusz Guzik	4ef4ac3601	device_cgroup: avoid access to ->i_rdev in the common case in devcgroup_inode_permission() The routine gets called for every path component during lookup. ->i_mode is going to be cached on account of permission checks, while ->i_rdev is an area which is most likely cache-cold. gcc 14.2 is kind enough to emit one branch: movzwl (%rbx),%eax mov %eax,%edx and $0xb000,%dx cmp $0x2000,%dx je 11bc <inode_permission+0xec> This patch is lazy in that I don't know if the ->i_rdev branch makes any sense with the newly added mode check upfront. I am not changing any semantics here though. Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Link: https://lore.kernel.org/20250416221626.2710239-3-mjguzik@gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-04-21 10:27:59 +02:00
Zijun Hu	d1f482108a	fs/fs_parse: Remove unused and problematic validate_constant_table() Remove validate_constant_table() since: - It has no caller. - It has below 3 bugs for good constant table array array[] which must end with a empty entry, and take below invocation for explaination: validate_constant_table(array, ARRAY_SIZE(array), ...) - Always return wrong value due to the last empty entry. - Imprecise error message for missorted case. - Potential NULL pointer dereference since the last pr_err() may use @tbl[i].name NULL pointer to print the last empty entry's name. Signed-off-by: Zijun Hu <quic_zijuhu@quicinc.com> Link: https://lore.kernel.org/20250415-fix_fs-v4-1-5d575124a3ff@quicinc.com Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-04-21 10:27:59 +02:00
Mateusz Guzik	875ccc0ddc	fs: touch up predicts in inode_permission() The routine only encounters errors when people try to access things they can't, which is a negligible amount of calls. The only questionable bit might be the pre-existing predict around MAY_WRITE. Currently the routine is predominantly used for MAY_EXEC, so this makes some sense. I verified this straightens out the asm. Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Link: https://lore.kernel.org/20250416221626.2710239-2-mjguzik@gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-04-21 10:27:59 +02:00
Zijun Hu	296b67059e	fs/fs_parse: Delete macro fsparam_u32hex() Delete macro fsparam_u32hex() since: - it has no caller. - it uses as type @fs_param_is_u32_hex which is never defined, so will cause compile error when caller uses it. Signed-off-by: Zijun Hu <quic_zijuhu@quicinc.com> Link: https://lore.kernel.org/20250411-fix_fs-v2-1-5d3395c102e4@quicinc.com Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-04-21 10:27:58 +02:00
Mateusz Guzik	8564124c36	fs: improve codegen in link_path_walk() Looking at the asm produced by gcc 13.3 for x86-64: 1. may_lookup() usage was not optimized for succeeding, despite the routine being inlined and rightfully starting with likely(!err) 2. the compiler assumed the path will have an indefinite amount of slashes to skip, after which the result will be an empty name As such: 1. predict may_lookup() succeeding 2. check for one slash, no explicit predicts. do roll forward with skipping more slashes while predicting there is only one 3. predict the path to find was not a mere slash This also has a side effect of shrinking the file: add/remove: 1/1 grow/shrink: 0/3 up/down: 934/-1012 (-78) Function old new delta link_path_walk - 934 +934 path_parentat 138 112 -26 path_openat 4864 4823 -41 path_lookupat 418 374 -44 link_path_walk.part.constprop 901 - -901 Total: Before=46639, After=46561, chg -0.17% Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Link: https://lore.kernel.org/20250412110935.2267703-1-mjguzik@gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-04-21 10:27:58 +02:00
Li RongQing	ef181fa11d	fs: Make file-nr output the total allocated file handles Make file-nr output the total allocated file handles, not per-cpu cache number, it's more precise, and not in hot path Signed-off-by: Li RongQing <lirongqing@baidu.com> Link: https://lore.kernel.org/20250410112117.2851-1-lirongqing@baidu.com Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-04-21 10:27:58 +02:00
Colin Ian King	6b24a702ec	select: core_sys_select add unlikely branch hint on return path Adding an unlikely() hint on the n < 0 comparison return path improves run-time performance of the select() system call, the negative value of n is very uncommon in normal select usage. Benchmarking on an Debian based Intel(R) Core(TM) Ultra 9 285K with a 6.15-rc1 kernel built with 14.2.0 using a select of 1000 file descriptors with zero timeout shows a consistent call reduction from 258 ns down to 254 ns, which is a ~1.5% performance improvement. Results based on running 25 tests with turbo disabled (to reduce clock freq turbo changes), with 30 second run per test and comparing the number of select() calls per second. The % standard deviation of the 25 tests was 0.24%, so results are reliable. Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Link: https://lore.kernel.org/20250414092426.53529-1-colin.i.king@gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-04-21 10:27:58 +02:00
Zijun Hu	1363c134ad	fs/filesystems: Fix potential unsigned integer underflow in fs_name() fs_name() has @index as unsigned int, so there is underflow risk for operation '@index--'. Fix by breaking the for loop when '@index == 0' which is also more proper than '@index <= 0' for unsigned integer comparison. Signed-off-by: Zijun Hu <quic_zijuhu@quicinc.com> Link: https://lore.kernel.org/20250410-fix_fs-v1-1-7c14ccc8ebaa@quicinc.com Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-04-14 13:05:59 +02:00
Zijun Hu	698d1b483c	fs/fs_context: Mark an unlikely if condition with unlikely() in vfs_parse_monolithic_sep() There is no mount option with pattern "...,=key_or_value,...", so the if condition '(value == key)' in while loop of vfs_parse_monolithic_sep() is is unlikely true. Mark the condition with unlikely() to improve both performance and readability. Signed-off-by: Zijun Hu <quic_zijuhu@quicinc.com> Link: https://lore.kernel.org/20250410-fix_fs-v1-5-7c14ccc8ebaa@quicinc.com Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-04-14 13:05:59 +02:00
Zijun Hu	1d17057d21	fs/fs_parse: Correct comments of fs_validate_description() For fs_validate_description(), its comments easily mislead reader that the function will search array @desc for duplicated entries with name specified by parameter @name, but @name is not used for search actually. Fix by marking name as owner's name of these parameter specifications. Signed-off-by: Zijun Hu <quic_zijuhu@quicinc.com> Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-04-14 13:05:40 +02:00
Zijun Hu	916148d24d	fs/fs_context: Use KERN_INFO for infof()|info_plog()|infofc() Use KERN_INFO instead of default KERN_NOTICE for infof()|info_plog()|infofc() to printk informational messages. Signed-off-by: Zijun Hu <quic_zijuhu@quicinc.com> Link: https://lore.kernel.org/20250410-rfc_fix_fs-v1-1-406e13b3608e@quicinc.com Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-04-11 16:10:51 +02:00
Colin Ian King	5730609ffd	select: do_pollfd: add unlikely branch hint return path Adding an unlikely() hint on the fd < 0 comparison return path improves run-time performance of the poll() system call. gcov based coverage analysis based on running stress-ng and a kernel build shows that this path return path is highly unlikely. Benchmarking on an Debian based Intel(R) Core(TM) Ultra 9 285K with a 6.15-rc1 kernel and a poll of 1024 file descriptors with zero timeout shows an call reduction from 32818 ns down to 32635 ns, which is a ~0.5% performance improvement. Results based on running 25 tests with turbo disabled (to reduce clock freq turbo changes), with 30 second run per test and comparing the number of poll() calls per second. The % standard deviation of the 25 tests was 0.08%, so results are reliable. Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Link: https://lore.kernel.org/20250409155510.577490-1-colin.i.king@gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-04-11 15:56:54 +02:00
David Howells	f1745496d3	netfs: Update main API document Bring the netfs documentation up to date. Signed-off-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/1690127.1744208325@warthog.procyon.org.uk Reviewed-by: "Paulo Alcantara (Red Hat)" <pc@manguebit.com> cc: Jeff Layton <jlayton@kernel.org> cc: Viacheslav Dubeyko <slava@dubeyko.com> cc: Alex Markuze <amarkuze@redhat.com> cc: Timothy Day <timday@amazon.com> cc: Jonathan Corbet <corbet@lwn.net> cc: netfs@lists.linux.dev cc: linux-doc@vger.kernel.org cc: linux-fsdevel@vger.kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-04-11 15:23:50 +02:00
Mateusz Guzik	e45960c279	fs: unconditionally use atime_needs_update() in pick_link() Vast majority of the time the func returns false. This avoids a branch to determine whether we are in RCU mode. Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Link: https://lore.kernel.org/20250408073641.1799151-1-mjguzik@gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-04-08 11:08:24 +02:00
Christian Brauner	c9b380a017	Merge patch series "fs: sort out cosmetic differences between stat funcs and add predicts" Predict fastpaths in stat and during fdput(). * patches from https://lore.kernel.org/20250406235806.1637000-1-mjguzik@gmail.com: fs: predict not having to do anything in fdput() fs: sort out cosmetic differences between stat funcs and add predicts Link: https://lore.kernel.org/20250406235806.1637000-1-mjguzik@gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-04-08 10:28:10 +02:00
Mateusz Guzik	5f3e0b4a1f	fs: predict not having to do anything in fdput() This matches the annotation in fdget(). Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Link: https://lore.kernel.org/20250406235806.1637000-2-mjguzik@gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-04-08 10:28:07 +02:00
Mateusz Guzik	eaec2cd167	fs: sort out cosmetic differences between stat funcs and add predicts This is a nop, but I did verify asm improves. Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Link: https://lore.kernel.org/20250406235806.1637000-1-mjguzik@gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-04-08 10:28:07 +02:00
Christian Brauner	9d36c5145a	Merge patch series "fs: harden anon inodes" Christian Brauner <brauner@kernel.org> says: * Anonymous inodes currently don't come with a proper mode causing issues in the kernel when we want to add useful VFS debug assert. Fix that by giving them a proper mode and masking it off when we report it to userspace which relies on them not having any mode. * Anonymous inodes currently allow to change inode attributes because the VFS falls back to simple_setattr() if i_op->setattr isn't implemented. This means the ownership and mode for every single user of anon_inode_inode can be changed. Block that as it's either useless or actively harmful. If specific ownership is needed the respective subsystem should allocate anonymous inodes from their own private superblock. * Port pidfs to the new anon_inode_{g,s}etattr() helpers. * Add proper tests for anonymous inode behavior. The anonymous inode specific fixes should ideally be backported to all LTS kernels. * patches from https://lore.kernel.org/20250407-work-anon_inode-v1-0-53a44c20d44e@kernel.org: selftests/filesystems: add fourth test for anonymous inodes selftests/filesystems: add third test for anonymous inodes selftests/filesystems: add second test for anonymous inodes selftests/filesystems: add first test for anonymous inodes anon_inode: raise SB_I_NODEV and SB_I_NOEXEC pidfs: use anon_inode_setattr() anon_inode: explicitly block ->setattr() pidfs: use anon_inode_getattr() anon_inode: use a proper mode internally Link: https://lore.kernel.org/20250407-work-anon_inode-v1-0-53a44c20d44e@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-04-07 16:20:15 +02:00
Christian Brauner	25a6cc9a63	selftests/filesystems: add open() test for anonymous inodes Test that anonymous inodes cannot be open()ed. Link: https://lore.kernel.org/20250407-work-anon_inode-v1-9-53a44c20d44e@kernel.org Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-04-07 16:20:15 +02:00
Christian Brauner	f8ca403ae7	selftests/filesystems: add exec() test for anonymous inodes Test that anonymous inodes cannot be exec()ed. Link: https://lore.kernel.org/20250407-work-anon_inode-v1-8-53a44c20d44e@kernel.org Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-04-07 16:20:14 +02:00
Christian Brauner	fcf31ec7ca	selftests/filesystems: add chmod() test for anonymous inodes Test that anonymous inodes cannot be chmod()ed. Link: https://lore.kernel.org/20250407-work-anon_inode-v1-7-53a44c20d44e@kernel.org Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-04-07 16:20:14 +02:00
Christian Brauner	c784159750	selftests/filesystems: add chown() test for anonymous inodes Test that anonymous inodes cannot be chown()ed. Link: https://lore.kernel.org/20250407-work-anon_inode-v1-6-53a44c20d44e@kernel.org Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-04-07 16:20:14 +02:00
Christian Brauner	1ed95281c0	anon_inode: raise SB_I_NODEV and SB_I_NOEXEC It isn't possible to execute anonymous inodes because they cannot be opened in any way after they have been created. This includes execution: execveat(fd_anon_inode, "", NULL, NULL, AT_EMPTY_PATH) Anonymous inodes have inode->f_op set to no_open_fops which sets no_open() which returns ENXIO. That means any call to do_dentry_open() which is the endpoint of the do_open_execat() will fail. There's no chance to execute an anonymous inode. Unless a given subsystem overrides it ofc. However, we should still harden this and raise SB_I_NODEV and SB_I_NOEXEC on the superblock itself so that no one gets any creative ideas. Link: https://lore.kernel.org/20250407-work-anon_inode-v1-5-53a44c20d44e@kernel.org Reviewed-by: Jeff Layton <jlayton@kernel.org> Cc: stable@vger.kernel.org # all LTS kernels Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-04-07 16:19:04 +02:00
Christian Brauner	c83b902496	pidfs: use anon_inode_setattr() So far pidfs did use it's own version. Just use the generic version. We use our own wrappers because we're going to be implementing properties soon. Link: https://lore.kernel.org/20250407-work-anon_inode-v1-4-53a44c20d44e@kernel.org Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-04-07 16:19:02 +02:00
Christian Brauner	22bdf3d658	anon_inode: explicitly block ->setattr() It is currently possible to change the mode and owner of the single anonymous inode in the kernel: int main(int argc, char *argv[]) { int ret, sfd; sigset_t mask; struct signalfd_siginfo fdsi; sigemptyset(&mask); sigaddset(&mask, SIGINT); sigaddset(&mask, SIGQUIT); ret = sigprocmask(SIG_BLOCK, &mask, NULL); if (ret < 0) _exit(1); sfd = signalfd(-1, &mask, 0); if (sfd < 0) _exit(2); ret = fchown(sfd, 5555, 5555); if (ret < 0) _exit(3); ret = fchmod(sfd, 0777); if (ret < 0) _exit(3); _exit(4); } This is a bug. It's not really a meaningful one because anonymous inodes don't really figure into path lookup and they cannot be reopened via /proc/<pid>/fd/<nr> and can't be used for lookup itself. So they can only ever serve as direct references. But it is still completely bogus to allow the mode and ownership or any of the properties of the anonymous inode to be changed. Block this! Link: https://lore.kernel.org/20250407-work-anon_inode-v1-3-53a44c20d44e@kernel.org Reviewed-by: Jeff Layton <jlayton@kernel.org> Cc: stable@vger.kernel.org # all LTS kernels Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-04-07 16:18:59 +02:00
Christian Brauner	37e62dafbf	pidfs: use anon_inode_getattr() So far pidfs did use it's own version. Just use the generic version. We use our own wrappers because we're going to be implementing our own retrieval properties soon. Link: https://lore.kernel.org/20250407-work-anon_inode-v1-2-53a44c20d44e@kernel.org Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-04-07 16:18:56 +02:00
Christian Brauner	cfd86ef7e8	anon_inode: use a proper mode internally This allows the VFS to not trip over anonymous inodes and we can add asserts based on the mode into the vfs. When we report it to userspace we can simply hide the mode to avoid regressions. I've audited all direct callers of alloc_anon_inode() and only secretmen overrides i_mode and i_op inode operations but it already uses a regular file. Link: https://lore.kernel.org/20250407-work-anon_inode-v1-1-53a44c20d44e@kernel.org Fixes: `af153bb63a` ("vfs: catch invalid modes in may_open()") Reviewed-by: Jeff Layton <jlayton@kernel.org> Cc: stable@vger.kernel.org # all LTS kernels Reported-by: syzbot+5d8e79d323a13aa0b248@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/67ed3fb3.050a0220.14623d.0009.GAE@google.com Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-04-07 16:18:46 +02:00
David Disseldorp	418556fa57	docs: initramfs: update compression and mtime descriptions Update the document to reflect that initramfs didn't replace initrd following kernel 2.5.x. The initramfs buffer format now supports many compression types in addition to gzip, so include them in the grammar section. c_mtime use is dependent on CONFIG_INITRAMFS_PRESERVE_MTIME. Signed-off-by: David Disseldorp <ddiss@suse.de> Link: https://lore.kernel.org/r/20250402033949.852-2-ddiss@suse.de Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Christian Brauner <brauner@kernel.org>	2025-04-07 09:38:01 +02:00
Linus Torvalds	0af2f6be1b	Linux 6.15-rc1 v6.15-rc1	2025-04-06 13:11:33 -07:00
Thomas Weißschuh	0efdedb335	tools/include: make uapi/linux/types.h usable from assembly The "real" linux/types.h UAPI header gracefully degrades to a NOOP when included from assembly code. Mirror this behaviour in the tools/ variant. Test for __ASSEMBLER__ over __ASSEMBLY__ as the former is provided by the toolchain automatically. Reported-by: Mark Brown <broonie@kernel.org> Closes: https://lore.kernel.org/lkml/af553c62-ca2f-4956-932c-dd6e3a126f58@sirena.org.uk/ Fixes: `c9fbaa8795` ("selftests: vDSO: parse_vdso: Use UAPI headers instead of libc headers") Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Link: https://patch.msgid.link/20250321-uapi-consistency-v1-1-439070118dc0@linutronix.de Signed-off-by: Mark Brown <broonie@kernel.org> Reviewed-by: Mark Brown <broonie@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2025-04-06 12:55:31 -07:00
Linus Torvalds	710329254d	Merge tag 'turbostat-2025.05.06' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux Pull turbostat updates from Len Brown: - support up to 8192 processors - add cpuidle governor debug telemetry, disabled by default - update default output to exclude cpuidle invocation counts - bug fixes * tag 'turbostat-2025.05.06' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux: tools/power turbostat: v2025.05.06 tools/power turbostat: disable "cpuidle" invocation counters, by default tools/power turbostat: re-factor sysfs code tools/power turbostat: Restore GFX sysfs fflush() call tools/power turbostat: Document GNR UncMHz domain convention tools/power turbostat: report CoreThr per measurement interval tools/power turbostat: Increase CPU_SUBSET_MAXCPUS to 8192 tools/power turbostat: Add idle governor statistics reporting tools/power turbostat: Fix names matching tools/power turbostat: Allow Zero return value for some RAPL registers tools/power turbostat: Clustered Uncore MHz counters should honor show/hide options	2025-04-06 12:32:43 -07:00
Linus Torvalds	59f392fa7c	Merge tag 'soundwire-6.15-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/soundwire Pull soundwire fix from Vinod Koul: - add missing config symbol CONFIG_SND_HDA_EXT_CORE required for asoc driver CONFIG_SND_SOF_SOF_HDA_SDW_BPT * tag 'soundwire-6.15-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/soundwire: ASoC: SOF: Intel: Let SND_SOF_SOF_HDA_SDW_BPT select SND_HDA_EXT_CORE	2025-04-06 12:04:53 -07:00
Len Brown	03e00e373c	tools/power turbostat: v2025.05.06 Support up to 8192 processors Add cpuidle governor debug telemetry, disabled by default Update default output to exclude cpuidle invocation counts Bug fixes Signed-off-by: Len Brown <len.brown@intel.com>	2025-04-06 14:49:20 -04:00
Len Brown	ec4acd3166	tools/power turbostat: disable "cpuidle" invocation counters, by default Create "pct_idle" counter group, the sofware notion of residency so it can now be singled out, independent of other counter groups. Create "cpuidle" group, the cpuidle invocation counts. Disable "cpuidle", by default. Create "swidle" = "cpuidle" + "pct_idle". Undocument "sysfs", the old name for "swidle", but keep it working for backwards compatibilty. Create "hwidle", all the HW idle counters Modify "idle", enabled by default "idle" = "hwidle" + "pct_idle" (and now excludes "cpuidle") Signed-off-by: Len Brown <len.brown@intel.com>	2025-04-06 14:29:57 -04:00
Linus Torvalds	dda8887894	Merge tag 'perf-urgent-2025-04-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf event fix from Ingo Molnar: "Fix a perf events time accounting bug" * tag 'perf-urgent-2025-04-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/core: Fix child_total_time_enabled accounting bug at task exit	2025-04-06 10:48:12 -07:00
Linus Torvalds	302deb109d	Merge tag 'sched-urgent-2025-04-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Ingo Molnar: - Fix a nonsensical Kconfig combination - Remove an unnecessary rseq-notification * tag 'sched-urgent-2025-04-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: rseq: Eliminate useless task_work on execve sched/isolation: Make CONFIG_CPU_ISOLATION depend on CONFIG_SMP	2025-04-06 10:44:58 -07:00
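
The arithmetic behind commit e7b9cea718 (vfs_cache_pressure_denom) can be illustrated with a small userspace sketch. It is not the kernel code: the function names scale_by_ratio() and reclaimable_objects() are hypothetical, and the formula (cached objects multiplied by vfs_cache_pressure and divided by the denominator) is an assumption based on the commit's description of the old fixed "1/100" minimum ratio.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Scale count by numer/denom without forming the full count * numer
 * product first, which could overflow for large counts.
 * Precondition: denom != 0.
 */
static uint64_t scale_by_ratio(uint64_t count, uint64_t numer, uint64_t denom)
{
	uint64_t quot, rem;

	assert(denom != 0);
	quot = count / denom;
	rem = count % denom;

	return quot * numer + (rem * numer) / denom;
}

/*
 * Hypothetical model (an assumption, not taken from the patch itself):
 * the number of cached objects exposed to reclaim is
 * cached * pressure / denom. A fixed denom of 100 models the behaviour
 * the commit message describes as the current minimum ratio.
 */
static uint64_t reclaimable_objects(uint64_t cached, uint64_t pressure,
				    uint64_t denom)
{
	return scale_by_ratio(cached, pressure, denom);
}
```

Under these assumptions, 20 million cached dentries with vfs_cache_pressure=1 expose 200,000 objects to reclaim with a denominator of 100, and 2,000 with a denominator of 10000, which matches the direction of the improvement the commit message reports.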


1351148 Commits