linux

mirror of https://github.com/torvalds/linux.git synced 2026-04-18 06:44:00 -04:00

Author	SHA1	Message	Date
Amery Hung	f75aeb2de8	bpf: Dissociate struct_ops program with map if map_update fails Currently, when bpf_struct_ops_map_update_elem() fails, the programs' st_ops_assoc will remain set. They may become dangling pointers if the map is freed later, but they will never be dereferenced since the struct_ops attachment did not succeed. However, if one of the programs is subsequently attached as part of another struct_ops map, its st_ops_assoc will be poisoned even though its old st_ops_assoc was stale from a failed attachment. Fix the spurious poisoned st_ops_assoc by dissociating struct_ops programs with a map if the attachment fails. Move bpf_prog_assoc_struct_ops() to after *plink++ to make sure bpf_prog_disassoc_struct_ops() will not miss a program when iterating st_map->links. Note that, dissociating a program from a map requires some attention as it must not reset a poisoned st_ops_assoc or a st_ops_assoc pointing to another map. The former is already guarded in bpf_prog_disassoc_struct_ops(). The latter also will not happen since st_ops_assoc of programs in st_map->links are set by bpf_prog_assoc_struct_ops(), which can only be poisoned or pointing to the current map. Signed-off-by: Amery Hung <ameryhung@gmail.com> Link: https://lore.kernel.org/r/20260417174900.2895486-1-ameryhung@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2026-04-17 12:04:14 -07:00
Puranjay Mohan	2845989f2e	bpf: Validate node_id in arena_alloc_pages() arena_alloc_pages() accepts a plain int node_id and forwards it through the entire allocation chain without any bounds checking. Validate node_id before passing it down the allocation chain in arena_alloc_pages(). Fixes: `317460317a` ("bpf: Introduce bpf_arena.") Signed-off-by: Puranjay Mohan <puranjay@kernel.org> Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com> Link: https://lore.kernel.org/r/20260417152135.1383754-1-puranjay@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2026-04-17 10:12:55 -07:00
Jiri Olsa	380044c40b	libbpf: Prevent double close and leak of btf objects Sashiko found a possible double close of the btf object fd [1], which happens when strdup in load_module_btfs fails, at which point obj->btf_module_cnt is already incremented. The error path closes the btf fd, and so does the later cleanup code in the bpf_object_post_load_cleanup function. Also, a libbpf_ensure_mem failure leaves the btf object unassigned, so it is leaked. Replace the err_out label with break to make the error path less confusing, as suggested by Alan. Increment obj->btf_module_cnt only if there is no failure, and release the btf object in the error path. Fixes: `91abb4a6d7` ("libbpf: Support attachment of BPF tracing programs to kernel modules") [1] https://sashiko.dev/#/patchset/20260324081846.2334094-1-jolsa%40kernel.org Signed-off-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20260416100034.1610852-1-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2026-04-16 16:00:10 -07:00
Alexei Starovoitov	d6f5841a4f	Merge branch 'bpf-allow-utf-8-literals-in-bpf_bprintf_prepare' Yihan Ding says: ==================== bpf: allow UTF-8 literals in bpf_bprintf_prepare() bpf_bprintf_prepare() currently rejects any non-ASCII byte in format strings, so helpers such as bpf_trace_printk() fail to emit UTF-8 literal text even when those bytes are not part of a format specifier. Keep plain text permissive while continuing to parse '%' sequences as ASCII-only. Patch 1 updates snprintf_negative() at the same time so the selftests stay consistent during bisection. Patch 2 then extends trace_printk coverage for both the valid UTF-8 literal case and the invalid non-ASCII-after-'%' case. Changes in v3: - drop Suggested-by trailers and move review credit into this changelog - update test_snprintf_negative() in patch 1/2 so plain non-ASCII text is accepted while non-ASCII after '%' is still rejected, keeping ./test_progs -t snprintf aligned with the new behavior. - clarify the trace_printk negative case with an explicit invalid format string and comment - address Paul Chaignon's review feedback and keep the negative coverage requested earlier by Alan Maguire Changes in v2: - split the core change and selftest updates into two patches - drop unnecessary isspace()/ispunct() casts - add comments to clarify plain-text vs format-specifier handling - add a negative selftest for non-ASCII bytes inside '%' sequences Testing: - Reproduced on x86_64 without the core fix: ASCII trace output works, while UTF-8 literal text in bpf_trace_printk() is rejected and produces no trace output - Verified with tools/testing/selftests/bpf: ./test_progs -t trace_printk - Verified with tools/testing/selftests/bpf: ./test_progs -t snprintf ==================== Link: https://patch.msgid.link/20260416120142.1420646-1-dingyihan@uniontech.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2026-04-16 15:53:32 -07:00
Yihan Ding	4198ff31ed	selftests/bpf: cover UTF-8 trace_printk output Extend trace_printk coverage to verify that UTF-8 literal text is emitted successfully and that '%' parsing still rejects non-ASCII bytes once format parsing starts. Use an explicitly invalid format string for the negative case so the ASCII-only parser expectation is visible from the test code itself. Signed-off-by: Yihan Ding <dingyihan@uniontech.com> Acked-by: Paul Chaignon <paul.chaignon@gmail.com> Link: https://lore.kernel.org/r/20260416120142.1420646-3-dingyihan@uniontech.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2026-04-16 15:53:32 -07:00
Yihan Ding	b960430ea8	bpf: allow UTF-8 literals in bpf_bprintf_prepare() bpf_bprintf_prepare() only needs ASCII parsing for conversion specifiers. Plain text can safely carry bytes >= 0x80, so allow UTF-8 literals outside '%' sequences while keeping ASCII control bytes rejected and format specifiers ASCII-only. This keeps existing parsing rules for format directives unchanged, while allowing helpers such as bpf_trace_printk() to emit UTF-8 literal text. Update test_snprintf_negative() in the same commit so selftests keep matching the new plain-text vs format-specifier split during bisection. Fixes: `48cac3f4a9` ("bpf: Implement formatted output helpers with bstr_printf") Signed-off-by: Yihan Ding <dingyihan@uniontech.com> Acked-by: Paul Chaignon <paul.chaignon@gmail.com> Link: https://lore.kernel.org/r/20260416120142.1420646-2-dingyihan@uniontech.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2026-04-16 15:53:32 -07:00
Alexei Starovoitov	766bf026d0	Merge branch 'bpf-fix-null-deref-when-storing-scalar-into-kptr-slot' Mykyta Yatsenko says: ==================== bpf: Fix NULL deref when storing scalar into kptr slot map_kptr_match_type() accesses reg->btf before confirming the register is PTR_TO_BTF_ID. A scalar store into a kptr slot has no btf, causing a NULL pointer dereference. Guard base_type() first. Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com> ==================== Link: https://patch.msgid.link/20260416-kptr_crash-v1-0-5589356584b4@meta.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2026-04-16 15:20:32 -07:00
Mykyta Yatsenko	fcd11ff8bd	selftests/bpf: Reject scalar store into kptr slot Verify that the verifier rejects a direct scalar write to a kptr map value slot without crashing. Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com> Link: https://lore.kernel.org/r/20260416-kptr_crash-v1-2-5589356584b4@meta.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2026-04-16 15:20:27 -07:00
Mykyta Yatsenko	4d0a375887	bpf: Fix NULL deref in map_kptr_match_type for scalar regs Commit `ab6c637ad0` ("bpf: Fix a bpf_kptr_xchg() issue with local kptr") refactored map_kptr_match_type() to branch on btf_is_kernel() before checking base_type(). A scalar register stored into a kptr slot has no btf, so the btf_is_kernel(reg->btf) call dereferences NULL. Move the base_type() != PTR_TO_BTF_ID guard before any reg->btf access. Fixes: `ab6c637ad0` ("bpf: Fix a bpf_kptr_xchg() issue with local kptr") Reported-by: Hiker Cl <clhiker365@gmail.com> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=221372 Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com> Acked-by: Paul Chaignon <paul.chaignon@gmail.com> Link: https://lore.kernel.org/r/20260416-kptr_crash-v1-1-5589356584b4@meta.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2026-04-16 15:20:26 -07:00
Daniel Borkmann	e5f635edd3	bpf: Fix precedence bug in convert_bpf_ld_abs alignment check Fix an operator precedence issue in convert_bpf_ld_abs() where the expression offset + ip_align % size evaluates as offset + (ip_align % size) due to % having higher precedence than +. That latter evaluation does not make any sense. The intended check is (offset + ip_align) % size == 0 to verify that the packet load offset is properly aligned for direct access. With NET_IP_ALIGN == 2, the bug causes the inline fast-path for direct packet loads to almost never be taken on !CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS platforms. This forces nearly all cBPF BPF_LD_ABS packet loads through the bpf_skb_load_helper slow path on the affected archs. Fixes: `e0cea7ce98` ("bpf: implement ld_abs/ld_ind in native bpf") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/r/20260416122719.661033-1-daniel@iogearbox.net Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2026-04-16 07:35:22 -07:00
Alexei Starovoitov	1cedfe17ba	Merge branch 'emit-endbr-bti-instructions-for-indirect' Xu Kuohai says: ==================== emit ENDBR/BTI instructions for indirect On architectures with CFI protection enabled that require landing pad instructions at indirect jump targets, such as x86 with CET/IBT enabled and arm64 with BTI enabled, kernel panics when an indirect jump lands on a target without landing pad. Therefore, the JIT must emit landing pad instructions for indirect jump targets. The verifier already recognizes which instructions are indirect jump targets during the verification phase. So we can store this information in env->insn_aux_data and pass it to the JIT as new parameter, allowing the JIT to consult env->insn_aux_data to determine which instructions are indirect jump targets. During JIT, constants blinding is performed. It rewrites the private copy of instructions for the JITed program, but it does not adjust the global env->insn_aux_data array. As a result, after constants blinding, the instruction indexes used by JIT may no longer match the indexes in env->insn_aux_data, so the JIT can not use env->insn_aux_data directly. To avoid this mismatch, and given that all existing arch-specific JITs already implement constants blinding with largely duplicated code, move constants blinding from JIT to generic code. 
v15: - Rebase and target bpf tree - Restore subprog_start of the fake 'exit' subprog on failure - Fix wrong function name used in comment v14: https://lore.kernel.org/all/cover.1776062885.git.xukuohai@hotmail.com/ - Rebase - Fix comment style - Fix incorrect variable and function name used in commit message v13: https://lore.kernel.org/bpf/20260411133847.1042658-1-xukuohai@huaweicloud.com - Use vmalloc to allocate memory for insn_aux_data copies to match with vfree - Do not free the copied memory of insn_aux_data when restoring from failure - Code cleanup v12: https://lore.kernel.org/bpf/20260403132811.753894-1-xukuohai@huaweicloud.com - Restore env->insn_aux_data on JIT failure - Fix incorrect error code sign (-EFAULT vs EFAULT) - Fix incorrect prog used in the restore path v11: https://lore.kernel.org/bpf/20260403090915.473493-1-xukuohai@huaweicloud.com - Restore env->subprog_info after jit_subprogs() fails - Clear prog->jit_requested and prog->blinding_requested on failure - Use the actual env->insn_aux_data size in clear_insn_aux_data() on failure v10: https://lore.kernel.org/bpf/20260324122052.342751-1-xukuohai@huaweicloud.com - Fix the incorrect call_imm restore in jit_subprogs - Define a dummy void version of bpf_jit_prog_release_other and bpf_patch_insn_data when the corresponding config is not set - Remove the unnecessary #ifdef in x86_64 JIT (Leon Hwang) v9: https://lore.kernel.org/bpf/20260312170255.3427799-1-xukuohai@huaweicloud.com - Make constant blinding available for classic bpf (Eduard) - Clear prog->bpf_func, prog->jited ... 
on the error path of extra pass (Eduard) - Fix spelling errors and remove unused parameter (Anton Protopopov) v8: https://lore.kernel.org/bpf/20260309140044.2652538-1-xukuohai@huaweicloud.com - Define void bpf_jit_blind_constants() function when CONFIG_BPF_JIT is not set - Move indirect_target fixup for insn patching from bpf_jit_blind_constants() to adjust_insn_aux_data() v7: https://lore.kernel.org/bpf/20260307103949.2340104-1-xukuohai@huaweicloud.com - Move constants blinding logic back to bpf/core.c - Compute ip address before switch statement in x86 JIT - Clear JIT state from error path on arm64 and loongarch v6: https://lore.kernel.org/bpf/20260306102329.2056216-1-xukuohai@huaweicloud.com - Move constants blinding from JIT to verifier - Move call to bpf_prog_select_runtime from bpf_prog_load to verifier v5: https://lore.kernel.org/bpf/20260302102726.1126019-1-xukuohai@huaweicloud.com - Switch to pass env to JIT directly to get rid of copying private insn_aux_data for each prog v4: https://lore.kernel.org/all/20260114093914.2403982-1-xukuohai@huaweicloud.com - Switch to the approach proposed by Eduard, using insn_aux_data to identify indirect jump targets, and emit ENDBR on x86 v3: https://lore.kernel.org/bpf/20251227081033.240336-1-xukuohai@huaweicloud.com - Get rid of unnecessary enum definition (Yonghong Song, Anton Protopopov) v2: https://lore.kernel.org/bpf/20251223085447.139301-1-xukuohai@huaweicloud.com - Exclude instruction arrays not used for indirect jumps (Anton Protopopov) v1: https://lore.kernel.org/bpf/20251127140318.3944249-1-xukuohai@huaweicloud.com ==================== Link: https://patch.msgid.link/20260416064341.151802-1-xukuohai@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2026-04-16 07:03:41 -07:00
Xu Kuohai	f6606a44bc	bpf, arm64: Emit BTI for indirect jump target On CPUs that support BTI, the indirect jump selftest triggers a kernel panic because there are no BTI instructions at the indirect jump targets. Fix it by emitting a BTI instruction for each indirect jump target. For reference, below is a sample panic log. Internal error: Oops - BTI: 0000000036000003 [#1] SMP ... Call trace: bpf_prog_2e5f1c71c13ac3e0_big_jump_table+0x54/0xf8 (P) bpf_prog_run_pin_on_cpu+0x140/0x468 bpf_prog_test_run_syscall+0x280/0x3b8 bpf_prog_test_run+0x22c/0x2c0 Fixes: `f4a66cf1cb` ("bpf: arm64: Add support for indirect jumps") Reviewed-by: Anton Protopopov <a.s.protopopov@gmail.com> # v8 Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com> # v12 Acked-by: Leon Hwang <leon.hwang@linux.dev> Signed-off-by: Xu Kuohai <xukuohai@huawei.com> Link: https://lore.kernel.org/r/20260416064341.151802-6-xukuohai@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2026-04-16 07:03:41 -07:00
Xu Kuohai	9a0e89dcc9	bpf, x86: Emit ENDBR for indirect jump targets On CPUs that support CET/IBT, the indirect jump selftest triggers a kernel panic because the indirect jump targets lack ENDBR instructions. To fix it, emit an ENDBR instruction to each indirect jump target. Since the ENDBR instruction shifts the position of original jited instructions, fix the instruction address calculation wherever the addresses are used. For reference, below is a sample panic log. Missing ENDBR: bpf_prog_2e5f1c71c13ac3e0_big_jump_table+0x97/0xe1 ------------[ cut here ]------------ kernel BUG at arch/x86/kernel/cet.c:133! Oops: invalid opcode: 0000 [#1] SMP NOPTI ... ? 0xffffffffc00fb258 ? bpf_prog_2e5f1c71c13ac3e0_big_jump_table+0x97/0xe1 bpf_prog_test_run_syscall+0x110/0x2f0 ? fdget+0xba/0xe0 __sys_bpf+0xe4b/0x2590 ? __kmalloc_node_track_caller_noprof+0x1c7/0x680 ? bpf_prog_test_run_syscall+0x215/0x2f0 __x64_sys_bpf+0x21/0x30 do_syscall_64+0x85/0x620 ? bpf_prog_test_run_syscall+0x1e2/0x2f0 Fixes: `493d9e0d60` ("bpf, x86: add support for indirect jumps") Reviewed-by: Anton Protopopov <a.s.protopopov@gmail.com> # v8 Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com> # v12 Acked-by: Leon Hwang <leon.hwang@linux.dev> Signed-off-by: Xu Kuohai <xukuohai@huawei.com> Link: https://lore.kernel.org/r/20260416064341.151802-5-xukuohai@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2026-04-16 07:03:40 -07:00
Xu Kuohai	07ae6c130b	bpf: Add helper to detect indirect jump targets Introduce helper bpf_insn_is_indirect_target to check whether a BPF instruction is an indirect jump target. Since the verifier knows which instructions are indirect jump targets, add a new flag indirect_target to struct bpf_insn_aux_data to mark them. The verifier sets this flag when verifying an indirect jump target instruction, and the helper checks the flag to determine whether an instruction is an indirect jump target. Reviewed-by: Anton Protopopov <a.s.protopopov@gmail.com> #v8 Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com> #v12 Signed-off-by: Xu Kuohai <xukuohai@huawei.com> Link: https://lore.kernel.org/r/20260416064341.151802-4-xukuohai@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2026-04-16 07:03:40 -07:00
Xu Kuohai	d9ef13f727	bpf: Pass bpf_verifier_env to JIT Pass bpf_verifier_env to bpf_int_jit_compile(). The follow-up patch will use env->insn_aux_data in the JIT stage to detect indirect jump targets. Since bpf_prog_select_runtime() can be called by cbpf and lib/test_bpf.c code without verifier, introduce helper __bpf_prog_select_runtime() to accept the env parameter. Remove the call to bpf_prog_select_runtime() in bpf_prog_load(), and switch to call __bpf_prog_select_runtime() in the verifier, with env variable passed. The original bpf_prog_select_runtime() is preserved for cbpf and lib/test_bpf.c, where env is NULL. Now all constants blinding calls are moved into the verifier, except the cbpf and lib/test_bpf.c cases. The instructions arrays are adjusted by bpf_patch_insn_data() function for normal cases, so there is no need to call adjust_insn_arrays() in bpf_jit_blind_constants(). Remove it. Reviewed-by: Anton Protopopov <a.s.protopopov@gmail.com> # v8 Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com> # v12 Acked-by: Hengqi Chen <hengqi.chen@gmail.com> # v14 Signed-off-by: Xu Kuohai <xukuohai@huawei.com> Link: https://lore.kernel.org/r/20260416064341.151802-3-xukuohai@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2026-04-16 07:03:40 -07:00
Xu Kuohai	d3e945223e	bpf: Move constants blinding out of arch-specific JITs During the JIT stage, constants blinding rewrites instructions but only rewrites the private instruction copy of the JITed subprog, leaving the global env->prog->insnsi and env->insn_aux_data untouched. This causes a mismatch between subprog instructions and the global state, making it difficult to use the global data in the JIT. To avoid this mismatch, and given that all arch-specific JITs already support constants blinding, move it to the generic verifier code, and switch to rewrite the global env->prog->insnsi with the global states adjusted, as other rewrites in the verifier do. This removes the constants blinding calls in each JIT, which are largely duplicated code across architectures. Since constants blinding is only required for JIT, and there are two JIT entry functions, jit_subprogs() for BPF programs with multiple subprogs and bpf_prog_select_runtime() for programs with no subprogs, move the constants blinding invocation into these two functions. In the verifier path, bpf_patch_insn_data() is used to keep global verifier auxiliary data in sync with patched instructions. A key question is whether this global auxiliary data should be restored on the failure path. Besides instructions, bpf_patch_insn_data() adjusts: - prog->aux->poke_tab - env->insn_array_maps - env->subprog_info - env->insn_aux_data For prog->aux->poke_tab, it is only used by JIT or only meaningful after JIT succeeds, so it does not need to be restored on the failure path. For env->insn_array_maps, when JIT fails, programs using insn arrays are rejected by bpf_insn_array_ready() due to missing JIT addresses. Hence, env->insn_array_maps is only meaningful for JIT and does not need to be restored. For subprog_info, if jit_subprogs fails and CONFIG_BPF_JIT_ALWAYS_ON is not enabled, kernel falls back to interpreter. In this case, env->subprog_info is used to determine subprogram stack depth. 
So it must be restored on failure. For env->insn_aux_data, it is freed by clear_insn_aux_data() at the end of bpf_check(). Before freeing, clear_insn_aux_data() loops over env->insn_aux_data to release jump targets recorded in it. The loop uses env->prog->len as the array length, but this length no longer matches the actual size of the adjusted env->insn_aux_data array after constants blinding. To address it, a simple approach is to keep insn_aux_data as adjusted after failure, since it will be freed shortly, and record its actual size for the loop in clear_insn_aux_data(). But since clear_insn_aux_data() uses the same index to loop over both env->prog->insnsi and env->insn_aux_data, this approach results in incorrect index for the insnsi array. So an alternative approach is adopted: clone the original env->insn_aux_data before blinding and restore it after failure, similar to env->prog. For classic BPF programs, constants blinding works as before since it is still invoked from bpf_prog_select_runtime(). Reviewed-by: Anton Protopopov <a.s.protopopov@gmail.com> # v8 Reviewed-by: Hari Bathini <hbathini@linux.ibm.com> # powerpc jit Reviewed-by: Pu Lehui <pulehui@huawei.com> # riscv jit Acked-by: Hengqi Chen <hengqi.chen@gmail.com> # loongarch jit Signed-off-by: Xu Kuohai <xukuohai@huawei.com> Link: https://lore.kernel.org/r/20260416064341.151802-2-xukuohai@huaweicloud.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2026-04-16 07:03:40 -07:00
Martin KaFai Lau	a204466529	Merge branch 'bpf-sockmap-fix-af_unix-null-ptr-deref-in-proto-update' Michal Luczaj says: ==================== bpf, sockmap: Fix af_unix null-ptr-deref in proto update Updating sockmap/sockhash using a unix sock races unix_stream_connect(): when sock_map_sk_state_allowed() passes (sk_state == TCP_ESTABLISHED), unix_peer(sk) in unix_stream_bpf_update_proto() may still return NULL. ==================== Link: https://patch.msgid.link/20260414-unix-proto-update-null-ptr-deref-v4-0-2af6fe97918e@rbox.co Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>	2026-04-15 17:23:18 -07:00
Michal Luczaj	64c2f93fc3	bpf, sockmap: Take state lock for af_unix iter When a BPF iterator program updates a sockmap, there is a race condition in unix_stream_bpf_update_proto() where the `peer` pointer can become stale[1] during a state transition TCP_ESTABLISHED -> TCP_CLOSE. CPU0 bpf CPU1 close -------- ---------- // unix_stream_bpf_update_proto() sk_pair = unix_peer(sk) if (unlikely(!sk_pair)) return -EINVAL; // unix_release_sock() skpair = unix_peer(sk); unix_peer(sk) = NULL; sock_put(skpair) sock_hold(sk_pair) // UaF More practically, this fix guarantees that the iterator program is consistently provided with a unix socket that remains stable during iterator execution. [1]: BUG: KASAN: slab-use-after-free in unix_stream_bpf_update_proto+0x155/0x490 Write of size 4 at addr ffff8881178c9a00 by task test_progs/2231 Call Trace: dump_stack_lvl+0x5d/0x80 print_report+0x170/0x4f3 kasan_report+0xe4/0x1c0 kasan_check_range+0x125/0x200 unix_stream_bpf_update_proto+0x155/0x490 sock_map_link+0x71c/0xec0 sock_map_update_common+0xbc/0x600 sock_map_update_elem+0x19a/0x1f0 bpf_prog_bbbf56096cdd4f01_selective_dump_unix+0x20c/0x217 bpf_iter_run_prog+0x21e/0xae0 bpf_iter_unix_seq_show+0x1e0/0x2a0 bpf_seq_read+0x42c/0x10d0 vfs_read+0x171/0xb20 ksys_read+0xff/0x200 do_syscall_64+0xf7/0x5e0 entry_SYSCALL_64_after_hwframe+0x76/0x7e Allocated by task 2236: kasan_save_stack+0x30/0x50 kasan_save_track+0x14/0x30 __kasan_slab_alloc+0x63/0x80 kmem_cache_alloc_noprof+0x1d5/0x680 sk_prot_alloc+0x59/0x210 sk_alloc+0x34/0x470 unix_create1+0x86/0x8a0 unix_stream_connect+0x318/0x15b0 __sys_connect+0xfd/0x130 __x64_sys_connect+0x72/0xd0 do_syscall_64+0xf7/0x5e0 entry_SYSCALL_64_after_hwframe+0x76/0x7e Freed by task 2236: kasan_save_stack+0x30/0x50 kasan_save_track+0x14/0x30 kasan_save_free_info+0x3b/0x70 __kasan_slab_free+0x47/0x70 kmem_cache_free+0x11c/0x590 __sk_destruct+0x432/0x6e0 unix_release_sock+0x9b3/0xf60 unix_release+0x8a/0xf0 __sock_release+0xb0/0x270 sock_close+0x18/0x20 
__fput+0x36e/0xac0 fput_close_sync+0xe5/0x1a0 __x64_sys_close+0x7d/0xd0 do_syscall_64+0xf7/0x5e0 entry_SYSCALL_64_after_hwframe+0x76/0x7e Fixes: `2c860a43dd` ("bpf: af_unix: Implement BPF iterator for UNIX domain socket.") Suggested-by: Kuniyuki Iwashima <kuniyu@google.com> Signed-off-by: Michal Luczaj <mhal@rbox.co> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260414-unix-proto-update-null-ptr-deref-v4-5-2af6fe97918e@rbox.co	2026-04-15 17:23:14 -07:00
Michal Luczaj	dca38b7734	bpf, sockmap: Fix af_unix null-ptr-deref in proto update unix_stream_connect() sets sk_state (`WRITE_ONCE(sk->sk_state, TCP_ESTABLISHED)`) _before_ it assigns a peer (`unix_peer(sk) = newsk`). sk_state == TCP_ESTABLISHED makes sock_map_sk_state_allowed() believe that socket is properly set up, which would include having a defined peer. IOW, there's a window when unix_stream_bpf_update_proto() can be called on socket which still has unix_peer(sk) == NULL. CPU0 bpf CPU1 connect -------- ------------ WRITE_ONCE(sk->sk_state, TCP_ESTABLISHED) sock_map_sk_state_allowed(sk) ... sk_pair = unix_peer(sk) sock_hold(sk_pair) sock_hold(newsk) smp_mb__after_atomic() unix_peer(sk) = newsk BUG: kernel NULL pointer dereference, address: 0000000000000080 RIP: 0010:unix_stream_bpf_update_proto+0xa0/0x1b0 Call Trace: sock_map_link+0x564/0x8b0 sock_map_update_common+0x6e/0x340 sock_map_update_elem_sys+0x17d/0x240 __sys_bpf+0x26db/0x3250 __x64_sys_bpf+0x21/0x30 do_syscall_64+0x6b/0x3a0 entry_SYSCALL_64_after_hwframe+0x76/0x7e Initial idea was to move peer assignment _before_ the sk_state update[1], but that involved an additional memory barrier, and changing the hot path was rejected. Then a NULL check during proto update in unix_stream_bpf_update_proto() was considered[2], but the follow-up discussion[3] focused on the root cause, i.e. sockmap update taking a wrong lock. Or, more specifically, missing unix_state_lock()[4]. In the end it was concluded that teaching sockmap about the af_unix locking would be unnecessarily complex[5]. Complexity aside, since BPF_PROG_TYPE_SCHED_CLS and BPF_PROG_TYPE_SCHED_ACT are allowed to update sockmaps, sock_map_update_elem() taking the unix lock, as it is currently implemented in unix_state_lock(): spin_lock(&unix_sk(s)->lock), would be problematic. unix_state_lock() taken in a process context, followed by a softirq-context TC BPF program attempting to take the same spinlock -- deadlock[6]. 
This way we circled back to the peer check idea[2]. [1]: https://lore.kernel.org/netdev/ba5c50aa-1df4-40c2-ab33-a72022c5a32e@rbox.co/ [2]: https://lore.kernel.org/netdev/20240610174906.32921-1-kuniyu@amazon.com/ [3]: https://lore.kernel.org/netdev/7603c0e6-cd5b-452b-b710-73b64bd9de26@linux.dev/ [4]: https://lore.kernel.org/netdev/CAAVpQUA+8GL_j63CaKb8hbxoL21izD58yr1NvhOhU=j+35+3og@mail.gmail.com/ [5]: https://lore.kernel.org/bpf/CAAVpQUAHijOMext28Gi10dSLuMzGYh+jK61Ujn+fZ-wvcODR2A@mail.gmail.com/ [6]: https://lore.kernel.org/bpf/dd043c69-4d03-46fe-8325-8f97101435cf@linux.dev/ Summary of scenarios where af_unix/stream connect() may race a sockmap update: 1. connect() vs. bpf(BPF_MAP_UPDATE_ELEM), i.e. sock_map_update_elem_sys() Implemented NULL check is sufficient. Once assigned, socket peer won't be released until socket fd is released. And that's not an issue because sock_map_update_elem_sys() bumps fd refcnt. 2. connect() vs BPF program doing update Update restricted per verifier.c:may_update_sockmap() to BPF_PROG_TYPE_TRACING/BPF_TRACE_ITER BPF_PROG_TYPE_SOCK_OPS (bpf_sock_map_update() only) BPF_PROG_TYPE_SOCKET_FILTER BPF_PROG_TYPE_SCHED_CLS BPF_PROG_TYPE_SCHED_ACT BPF_PROG_TYPE_XDP BPF_PROG_TYPE_SK_REUSEPORT BPF_PROG_TYPE_FLOW_DISSECTOR BPF_PROG_TYPE_SK_LOOKUP Plus one more race to consider: CPU0 bpf CPU1 connect -------- ------------ WRITE_ONCE(sk->sk_state, TCP_ESTABLISHED) sock_map_sk_state_allowed(sk) sock_hold(newsk) smp_mb__after_atomic() unix_peer(sk) = newsk sk_pair = unix_peer(sk) if (unlikely(!sk_pair)) return -EINVAL; CPU1 close ---------- skpair = unix_peer(sk); unix_peer(sk) = NULL; sock_put(skpair) // use after free? sock_hold(sk_pair) 2.1 BPF program invoking helper function bpf_sock_map_update() -> BPF_CALL_4(bpf_sock_map_update(), ...) Helper limited to BPF_PROG_TYPE_SOCK_OPS. Nevertheless, a unix sock might be accessible via bpf_map_lookup_elem(). Which implies sk already having psock, which in turn implies sk already having sk_pair. 
Since sk_psock_destroy() is queued as RCU work, sk_pair won't go away while BPF executes the update. 2.2 BPF program invoking helper function bpf_map_update_elem() -> sock_map_update_elem() 2.2.1 Unix sock accessible to BPF prog only via sockmap lookup in BPF_PROG_TYPE_SOCKET_FILTER, BPF_PROG_TYPE_SCHED_CLS, BPF_PROG_TYPE_SCHED_ACT, BPF_PROG_TYPE_XDP, BPF_PROG_TYPE_SK_REUSEPORT, BPF_PROG_TYPE_FLOW_DISSECTOR, BPF_PROG_TYPE_SK_LOOKUP. Pretty much the same as case 2.1. 2.2.2 Unix sock accessible to BPF program directly: BPF_PROG_TYPE_TRACING, narrowed down to BPF_TRACE_ITER. Sockmap iterator (sock_map_seq_ops) is safe: unix sock residing in a sockmap means that the sock already went through the proto update step. Unix sock iterator (bpf_iter_unix_seq_ops), on the other hand, gives access to socks that may still be unconnected. Which means iterator prog can race sockmap/proto update against connect(). BUG: KASAN: null-ptr-deref in unix_stream_bpf_update_proto+0x253/0x4d0 Write of size 4 at addr 0000000000000080 by task test_progs/3140 Call Trace: dump_stack_lvl+0x5d/0x80 kasan_report+0xe4/0x1c0 kasan_check_range+0x125/0x200 unix_stream_bpf_update_proto+0x253/0x4d0 sock_map_link+0x71c/0xec0 sock_map_update_common+0xbc/0x600 sock_map_update_elem+0x19a/0x1f0 bpf_prog_bbbf56096cdd4f01_selective_dump_unix+0x20c/0x217 bpf_iter_run_prog+0x21e/0xae0 bpf_iter_unix_seq_show+0x1e0/0x2a0 bpf_seq_read+0x42c/0x10d0 vfs_read+0x171/0xb20 ksys_read+0xff/0x200 do_syscall_64+0xf7/0x5e0 entry_SYSCALL_64_after_hwframe+0x76/0x7e While the introduced NULL check prevents null-ptr-deref in the BPF program path as well, it is insufficient to guard against a poorly timed close() leading to a use-after-free. This will be addressed in a subsequent patch. 
Fixes: `c63829182c` ("af_unix: Implement ->psock_update_sk_prot()") Closes: https://lore.kernel.org/netdev/ba5c50aa-1df4-40c2-ab33-a72022c5a32e@rbox.co/ Reported-by: Michal Luczaj <mhal@rbox.co> Reported-by: 钱一铭 <yimingqian591@gmail.com> Suggested-by: Kuniyuki Iwashima <kuniyu@google.com> Suggested-by: Martin KaFai Lau <martin.lau@linux.dev> Signed-off-by: Michal Luczaj <mhal@rbox.co> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260414-unix-proto-update-null-ptr-deref-v4-4-2af6fe97918e@rbox.co	2026-04-15 17:22:58 -07:00
Michal Luczaj	997b8483d4	selftests/bpf: Extend bpf_iter_unix to attempt deadlocking Updating a sockmap from a unix iterator prog may lead to a deadlock. Piggyback on the original selftest. Signed-off-by: Michal Luczaj <mhal@rbox.co> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Reviewed-by: Jiayuan Chen <jiayuan.chen@linux.dev> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260414-unix-proto-update-null-ptr-deref-v4-3-2af6fe97918e@rbox.co	2026-04-15 17:22:55 -07:00
Michal Luczaj	4d328dd695	bpf, sockmap: Fix af_unix iter deadlock bpf_iter_unix_seq_show() may deadlock when lock_sock_fast() takes the fast path and the iter prog attempts to update a sockmap. Which ends up spinning at sock_map_update_elem()'s bh_lock_sock(): WARNING: possible recursive locking detected test_progs/1393 is trying to acquire lock: ffff88811ec25f58 (slock-AF_UNIX){+...}-{3:3}, at: sock_map_update_elem+0xdb/0x1f0 but task is already holding lock: ffff88811ec25f58 (slock-AF_UNIX){+...}-{3:3}, at: __lock_sock_fast+0x37/0xe0 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(slock-AF_UNIX); lock(slock-AF_UNIX); *** DEADLOCK *** May be due to missing lock nesting notation 4 locks held by test_progs/1393: #0: ffff88814b59c790 (&p->lock){+.+.}-{4:4}, at: bpf_seq_read+0x59/0x10d0 #1: ffff88811ec25fd8 (sk_lock-AF_UNIX){+.+.}-{0:0}, at: bpf_seq_read+0x42c/0x10d0 #2: ffff88811ec25f58 (slock-AF_UNIX){+...}-{3:3}, at: __lock_sock_fast+0x37/0xe0 #3: ffffffff85a6a7c0 (rcu_read_lock){....}-{1:3}, at: bpf_iter_run_prog+0x51d/0xb00 Call Trace: dump_stack_lvl+0x5d/0x80 print_deadlock_bug.cold+0xc0/0xce __lock_acquire+0x130f/0x2590 lock_acquire+0x14e/0x2b0 _raw_spin_lock+0x30/0x40 sock_map_update_elem+0xdb/0x1f0 bpf_prog_2d0075e5d9b721cd_dump_unix+0x55/0x4f4 bpf_iter_run_prog+0x5b9/0xb00 bpf_iter_unix_seq_show+0x1f7/0x2e0 bpf_seq_read+0x42c/0x10d0 vfs_read+0x171/0xb20 ksys_read+0xff/0x200 do_syscall_64+0x6b/0x3a0 entry_SYSCALL_64_after_hwframe+0x76/0x7e Fixes: `2c860a43dd` ("bpf: af_unix: Implement BPF iterator for UNIX domain socket.") Suggested-by: Kuniyuki Iwashima <kuniyu@google.com> Suggested-by: Martin KaFai Lau <martin.lau@linux.dev> Signed-off-by: Michal Luczaj <mhal@rbox.co> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Reviewed-by: Jiayuan Chen <jiayuan.chen@linux.dev> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260414-unix-proto-update-null-ptr-deref-v4-2-2af6fe97918e@rbox.co	2026-04-15 17:22:47 -07:00
Michal Luczaj	a25566084e	bpf, sockmap: Annotate af_unix sock::sk_state data-races sock_map_sk_state_allowed() and sock_map_redirect_allowed() read af_unix socket sk_state locklessly. Use READ_ONCE(). Note that for sock_map_redirect_allowed() change affects not only af_unix, but all non-TCP sockets (UDP, af_vsock). Suggested-by: Kuniyuki Iwashima <kuniyu@google.com> Suggested-by: Martin KaFai Lau <martin.lau@linux.dev> Signed-off-by: Michal Luczaj <mhal@rbox.co> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Reviewed-by: Jiayuan Chen <jiayuan.chen@linux.dev> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260414-unix-proto-update-null-ptr-deref-v4-1-2af6fe97918e@rbox.co	2026-04-15 17:22:42 -07:00
Alexei Starovoitov	9d8e92e15f	Merge branch 'bpf-copy-bpf-token-from-main-program-to-subprograms' Eduard Zingerman says: ==================== bpf: copy BPF token from main program to subprograms bpf_jit_subprogs() omits aux->token when it creates struct bpf_prog_aux instances for subprograms. This means that for programs loaded via BPF token (i.e., from a non-init user namespace), subprograms fail the bpf_token_capable() check in bpf_prog_kallsyms_add() and don't appear in /proc/kallsyms. This in turn makes it impossible to freplace such subprograms. Changelog: v3 -> v4: - check sysctl_set calls for errors (sashiko). v2 -> v3: - mark selftest as serial (sashiko). v1 -> v2: - target bpf-next tree (fixups.c) instead of bpf tree (verifier.c). v1: https://lore.kernel.org/bpf/20260414-subprog-token-fix-v1-0-5b1a38e01546@gmail.com/T/ v2: https://lore.kernel.org/bpf/20260414-subprog-token-fix-v2-0-59146c31f6f1@gmail.com/T/ v3: https://lore.kernel.org/bpf/20260415-subprog-token-fix-v3-0-6fefe1d51646@gmail.com/T/ ==================== Link: https://patch.msgid.link/20260415-subprog-token-fix-v4-0-9bd000e8b068@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2026-04-15 16:46:55 -07:00
Eduard Zingerman	969fb456ff	selftests/bpf: verify kallsyms entries for token-loaded subprograms Add a test that loads an XDP program with a global subprogram using a BPF token from a user namespace, then verifies that both the main program and the subprogram appear in /proc/kallsyms. This exercises the bpf_prog_kallsyms_add() path for subprograms and would have caught the missing aux->token copy in bpf_jit_subprogs(). load_kallsyms_local() filters out kallsyms with zero addresses. For a process with limited capabilities to read kallsym addresses the following sysctl variables have to be set to zero: - /proc/sys/kernel/perf_event_paranoid - /proc/sys/kernel/kptr_restrict Set these variables using sysctl_set() utility function extracted from unpriv_bpf_disabled.c to a separate c/header. Since the test modifies global system state, mark it as serial. Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20260415-subprog-token-fix-v4-2-9bd000e8b068@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2026-04-15 16:46:47 -07:00
Eduard Zingerman	0251e40c48	bpf: copy BPF token from main program to subprograms bpf_jit_subprogs() copies various fields from the main program's aux to each subprogram's aux, but omits the BPF token. This causes bpf_prog_kallsyms_add() to fail for subprograms loaded via BPF token, as bpf_token_capable() falls back to capable() in init_user_ns when token is NULL. Copy prog->aux->token to func[i]->aux->token so that subprograms inherit the same capability delegation as the main program. Fixes: `d79a354975` ("bpf: Consistently use BPF token throughout BPF verifier logic") Signed-off-by: Tao Chen <ctao@meta.com> Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20260415-subprog-token-fix-v4-1-9bd000e8b068@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2026-04-15 16:46:47 -07:00
Alexei Starovoitov	d3fdb3db13	Merge branch 'fix-garbage-data-in-task-local-data' Amery Hung says: ==================== Fix garbage data in task local data Hi, The patchset fixes two scenarios where BPF side task local data API may see garbage data and adds corresponding selftests. ==================== Link: https://patch.msgid.link/20260413190259.358442-1-ameryhung@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2026-04-15 12:10:21 -07:00
Amery Hung	b4b0233730	selftests/bpf: Test small task local data allocation Make sure task local data is working correctly for different allocation sizes. Existing task local data selftests allocate the maximum amount of data possible but miss the garbage data issue when only small amount of data is allocated. Therefore, test small data allocations as well. Signed-off-by: Amery Hung <ameryhung@gmail.com> Link: https://lore.kernel.org/r/20260413190259.358442-4-ameryhung@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2026-04-15 12:10:20 -07:00
Amery Hung	615e55a241	selftests/bpf: Fix tld_get_data() returning garbage data BPF side tld_get_data() currently may return garbage when tld_data_u is not aligned to page_size. This can happen when small amount of memory is allocated for tld_data_u. The misalignment is supposed to be allowed and the BPF side will use tld_data_u->start to reference the tld_data_u in a page. However, since "start" is within tld_data_u, there is no way to know the correct "start" in the first place. As a result, BPF programs will see garbage data. The selftest did not catch this since it tries to allocate the maximum amount of data possible (i.e., a page) such that tld_data_u->start is always correct. Fix it by moving tld_data_u->start to tld_data_map->start. The original field is now renamed as unused instead of removing it because BPF side tld_get_data() views off = 0 returned from tld_fetch_key() as uninitialized. Signed-off-by: Amery Hung <ameryhung@gmail.com> Link: https://lore.kernel.org/r/20260413190259.358442-3-ameryhung@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2026-04-15 12:10:20 -07:00
Amery Hung	36bf7beb9d	selftests/bpf: Prevent allocating data larger than a page Fix a bug in the task local data library that may allocate more than a page for tld_data_u. This may happen when users set a too large TLD_DYN_DATA_SIZE, so check it when creating dynamic TLD fields and fix the corresponding selftest. Signed-off-by: Amery Hung <ameryhung@gmail.com> Link: https://lore.kernel.org/r/20260413190259.358442-2-ameryhung@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2026-04-15 12:10:20 -07:00
Alexei Starovoitov	b3dde701e7	Merge branch 'bpf-arm64-riscv-remove-redundant-icache-flush-after-pack-allocator-finalize' Puranjay Mohan says: ==================== bpf, arm64/riscv: Remove redundant icache flush after pack allocator finalize Changelog: v1: https://lore.kernel.org/all/20260413123256.3296452-1-puranjay@kernel.org/ Changes in v2: - Remove "#include <asm/cacheflush.h>" as it is not needed now. - Add Acked-by: Song Liu <song@kernel.org> When the BPF prog pack allocator was added for arm64 and riscv, the existing bpf_flush_icache() calls were retained after bpf_jit_binary_pack_finalize(). However, the finalize path copies the JITed code via architecture text patching routines (__text_poke on arm64, patch_text_nosync on riscv) that already perform a full flush_icache_range() internally. The subsequent bpf_flush_icache() repeats the same cache maintenance on the same range. Remove the redundant flush and the now-unused bpf_flush_icache() definitions on both architectures. ==================== Link: https://patch.msgid.link/20260413191111.3426023-1-puranjay@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2026-04-15 12:09:47 -07:00
Puranjay Mohan	46ee1342b8	bpf, riscv: Remove redundant bpf_flush_icache() after pack allocator finalize bpf_flush_icache() calls flush_icache_range() to clean the data cache and invalidate the instruction cache for the JITed code region. However, since commit `48a8f78c50` ("bpf, riscv: use prog pack allocator in the BPF JIT"), this flush is redundant. bpf_jit_binary_pack_finalize() copies the JITed instructions to the ROX region via bpf_arch_text_copy() -> patch_text_nosync(), and patch_text_nosync() already calls flush_icache_range() on the written range. The subsequent bpf_flush_icache() repeats the same cache maintenance on an overlapping range. Remove the redundant bpf_flush_icache() call and its now-unused definition. Fixes: `48a8f78c50` ("bpf, riscv: use prog pack allocator in the BPF JIT") Acked-by: Song Liu <song@kernel.org> Signed-off-by: Puranjay Mohan <puranjay@kernel.org> Reviewed-by: Pu Lehui <pulehui@huawei.com> Tested-by: Paul Chaignon <paul.chaignon@gmail.com> Link: https://lore.kernel.org/r/20260413191111.3426023-3-puranjay@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2026-04-15 12:09:46 -07:00
Puranjay Mohan	42f18ae530	bpf, arm64: Remove redundant bpf_flush_icache() after pack allocator finalize bpf_flush_icache() calls flush_icache_range() to clean the data cache and invalidate the instruction cache for the JITed code region. However, since commit `1dad391dae` ("bpf, arm64: use bpf_prog_pack for memory management"), this flush is redundant. bpf_jit_binary_pack_finalize() copies the JITed instructions to the ROX region via bpf_arch_text_copy() -> aarch64_insn_copy() -> __text_poke(), and __text_poke() already calls flush_icache_range() on the written range. The subsequent bpf_flush_icache() repeats the same cache maintenance on an overlapping range, including an unnecessary second synchronous IPI to all CPUs via kick_all_cpus_sync(). Remove the redundant bpf_flush_icache() call and its now-unused definition. Fixes: `1dad391dae` ("bpf, arm64: use bpf_prog_pack for memory management") Acked-by: Song Liu <song@kernel.org> Signed-off-by: Puranjay Mohan <puranjay@kernel.org> Acked-by: Breno Leitao <leitao@debian.org> Link: https://lore.kernel.org/r/20260413191111.3426023-2-puranjay@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2026-04-15 12:09:46 -07:00
Alexei Starovoitov	4fddde2a73	bpf: Fix use-after-free in arena_vm_close on fork arena_vm_open() only bumps vml->mmap_count but never registers the child VMA in arena->vma_list. The vml->vma always points at the parent VMA, so after parent munmap the pointer dangles. If the child then calls bpf_arena_free_pages(), zap_pages() reads the stale vml->vma triggering use-after-free. Fix this by preventing the arena VMA from being inherited across fork with VM_DONTCOPY, and preventing VMA splits via the may_split callback. Also reject mremap with a .mremap callback returning -EINVAL. A same-size mremap(MREMAP_FIXED) on the full arena VMA reaches copy_vma() through the following path: check_prep_vma() - returns 0 early: new_len == old_len skips VM_DONTEXPAND check prep_move_vma() - vm_start == old_addr and vm_end == old_addr + old_len so may_split is never called move_vma() copy_vma_and_data() copy_vma() vm_area_dup() - copies vm_private_data (vml pointer) vm_ops->open() - bumps vml->mmap_count vm_ops->mremap() - returns -EINVAL, rollback unmaps new VMA The refcount ensures the rollback's arena_vm_close does not free the vml shared with the original VMA. Reported-by: Weiming Shi <bestswngs@gmail.com> Reported-by: Xiang Mei <xmei5@asu.edu> Fixes: `317460317a` ("bpf: Introduce bpf_arena.") Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com> Link: https://lore.kernel.org/r/20260413194245.21449-1-alexei.starovoitov@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2026-04-15 12:08:55 -07:00
Daniel Borkmann	1dd8be4ec7	bpf, arm64: Fix off-by-one in check_imm signed range check check_imm(bits, imm) is used in the arm64 BPF JIT to verify that a branch displacement (in arm64 instruction units) fits into the signed N-bit immediate field of a B, B.cond or CBZ/CBNZ encoding before it is handed to the encoder. The macro currently tests for (imm > 0 && imm >> bits) || (imm < 0 && ~imm >> bits) which admits values in [-2^N, 2^N), effectively a signed (N+1)-bit range. A signed N-bit field only holds [-2^(N-1), 2^(N-1)), so the check admits one extra bit of range on each side. In particular, for check_imm19(), values in [2^18, 2^19) slip past the check but do not fit into the 19-bit signed imm19 field of B.cond. aarch64_insn_encode_immediate() then masks the raw value into the 19-bit field, setting bit 18 (the sign bit) and flipping a forward branch into a backward one. Same class of issue exists for check_imm26() and the B/BL encoding. Shift by (bits - 1) instead of bits so the actual signed N-bit range is enforced. Fixes: `e54bcde3d6` ("arm64: eBPF JIT compiler") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Puranjay Mohan <puranjay@kernel.org> Link: https://lore.kernel.org/r/20260415121403.639619-2-daniel@iogearbox.net Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2026-04-15 12:08:03 -07:00
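The off-by-one described in the check_imm commit above can be reproduced in userspace. The following is a minimal sketch, not the kernel macro itself: out_of_range_old(), out_of_range_new() and decode_signed_field() are hypothetical helper names that mirror the two shift expressions from the commit message and the sign extension a CPU would apply when decoding the field.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Old test from the commit message: shifts by `bits`, so it only
 * rejects values outside [-2^bits, 2^bits). */
static bool out_of_range_old(int bits, int32_t imm)
{
	return (imm > 0 && (imm >> bits)) || (imm < 0 && (~imm >> bits));
}

/* Fixed test: shifts by `bits - 1`, rejecting values outside
 * [-2^(bits-1), 2^(bits-1)), i.e. the real signed N-bit range. */
static bool out_of_range_new(int bits, int32_t imm)
{
	return (imm > 0 && (imm >> (bits - 1))) ||
	       (imm < 0 && (~imm >> (bits - 1)));
}

/* Mask `imm` into a `bits`-wide field and sign-extend it back,
 * which is how the encoded displacement would be interpreted. */
static int32_t decode_signed_field(int bits, int32_t imm)
{
	uint32_t field = (uint32_t)imm & ((1u << bits) - 1);
	uint32_t sign = 1u << (bits - 1);

	return (int32_t)(field ^ sign) - (int32_t)sign;
}

/* Illustration for the imm19 case: 2^18 passes the old check, fails
 * the new one, and decodes as a backward displacement. */
static void check_imm19_examples(void)
{
	assert(!out_of_range_old(19, 1 << 18));
	assert(out_of_range_new(19, 1 << 18));
	assert(decode_signed_field(19, 1 << 18) == -(1 << 18));
}
```

Calling check_imm19_examples() exercises the boundary value 2^18: the old expression accepts it, while masking it into 19 bits yields -2^18, which is the forward-to-backward flip the commit describes.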
Daniel Borkmann	48d83d9493	bpf, arm64: Reject out-of-range B.cond targets aarch64_insn_gen_cond_branch_imm() calls label_imm_common() to compute a 19-bit signed byte offset for a conditional branch, but unlike its siblings aarch64_insn_gen_branch_imm() and aarch64_insn_gen_comp_branch_imm(), it does not check whether label_imm_common() returned its out-of-range sentinel (range) before feeding the value to aarch64_insn_encode_immediate(). aarch64_insn_encode_immediate() unconditionally masks the value with the 19-bit field mask, so an offset that was rejected by label_imm_common() gets silently truncated. With the sentinel value SZ_1M, the resulting field ends up with bit 18 (the sign bit of the 19-bit signed displacement) set, and the CPU decodes it as a ~1 MiB backward branch, producing an incorrectly targeted B.cond instruction. For code-gen locations like the emit_bpf_tail_call() this function is the only barrier between an overflowing displacement and a silently miscompiled branch. Fix it by returning AARCH64_BREAK_FAULT when the offset is out of range, so callers see a loud failure instead of a silently misencoded branch. validate_code() scans the generated image for any AARCH64_BREAK_FAULT and then lets the JIT fail. Fixes: `345e0d35ec` ("arm64: introduce aarch64_insn_gen_cond_branch_imm()") Fixes: `c94ae4f7c5` ("arm64: insn: remove BUG_ON from codegen") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Puranjay Mohan <puranjay@kernel.org> Link: https://lore.kernel.org/r/20260415121403.639619-1-daniel@iogearbox.net Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2026-04-15 12:07:57 -07:00
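The sentinel truncation described in the B.cond commit above comes down to a few lines of arithmetic. This is a hedged sketch only: branch_offset_or_sentinel() and decode_imm19_bytes() are hypothetical stand-ins, not the kernel's label_imm_common() or aarch64_insn_encode_immediate(), and the alignment checks of the real code are omitted.

```c
#include <stdint.h>

#define SZ_1M 0x100000L

/* Hypothetical stand-in for the range check: an out-of-range byte
 * offset is reported by returning `range` itself as a sentinel. */
static long branch_offset_or_sentinel(long pc, long addr, long range)
{
	long offset = addr - pc;

	if (offset < -range || offset >= range)
		return range;
	return offset;
}

/* Convert a byte offset to instruction units, mask it into a 19-bit
 * field without any check, then decode the field back to bytes the
 * way a signed 19-bit displacement would be interpreted. */
static long decode_imm19_bytes(long byte_offset)
{
	uint32_t field = (uint32_t)(byte_offset >> 2) & 0x7ffff;
	int32_t insns = (int32_t)(field ^ 0x40000) - 0x40000;

	return (long)insns * 4;
}
```

With these helpers, branch_offset_or_sentinel(0, 2 * SZ_1M, SZ_1M) returns SZ_1M, and decode_imm19_bytes(SZ_1M) evaluates to -SZ_1M: the unchecked sentinel becomes a 1 MiB backward branch, which is why the fix returns a fault instruction instead of encoding it.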
Alexei Starovoitov	2865c3f3f6	Merge branch 'bpf-arg-tracking-for-imprecise-multi-offset-bpf_st-stx' Eduard Zingerman says: ==================== bpf: arg tracking for imprecise/multi-offset BPF_ST/STX When the static arg tracking analysis encounters a store through a pointer with imprecise or multi-offset destination, it must use weak updates (join) instead of strong updates (overwrite) for the affected at_stack slots. At runtime only one slot is actually written; the others retain their old values. Two cases are addressed: - BPF_STX, handled by spill_to_stack(). It was gated on `dst_is_local_fp = (frame == depth)`, which missed ARG_IMPRECISE pointers entirely. - BPF_ST, handled by clear_stack_for_all_offs(). It delegates to clear_overlapping_stack_slots() which unconditionally set `at_stack[i] = none`. Change to `at_stack[i] = join(old, none)` when multiple candidate slots exist (cnt != 1), so that untouched slots preserve their tracked values. No veristat diff compared to current master when tested on selftests, sched_ext, cilium and a set of Meta internal programs. This addresses issues reported by sashiko for patch #7 in [1]. [1] https://sashiko.dev/#/patchset/20260410-patch-set-v4-0-5d4eecb343db%40gmail.com Changelog: v2 -> v3: - Use check_add_overflow() in arg_add() (Alexei). - Add missing fixes tag (CI bot). - Remove unused __imm in the selftest (sashiko). v1 -> v2: - Delete the OFF_IMPRECISE constant, always rely on arg_track->cnt == 0 as a marker the offset is imprecise. (Alexei). - Squash all patches together to simplify backporting to 'bpf' branch (Alexei). v1: https://lore.kernel.org/bpf/20260413-stacklive-fixes-v1-0-9f48a9999d6e@gmail.com/T/ v2: https://lore.kernel.org/bpf/20260413-stacklive-fixes-v2-0-ff91c4f8d273@gmail.com/T/ --- ==================== Link: https://patch.msgid.link/20260413-stacklive-fixes-v2-0-398e126e5cf3@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2026-04-15 08:40:48 -07:00
Eduard Zingerman	d97cc8fc99	selftests/bpf: arg tracking for imprecise/multi-offset BPF_ST/STX Add test cases for clear_stack_for_all_offs and dst_is_local_fp handling of multi-offset and ARG_IMPRECISE stack pointers: - st_imm_join_with_multi_off: BPF_ST through multi-offset dst should join at_stack with none instead of overwriting both candidate slots. - st_imm_join_with_imprecise_off: BPF_ST through offset-imprecise dst should join at_stack with none instead of clearing all slots. - st_imm_join_with_single_off: a canary checking that BPF_ST with a known offset overwrites slot instead of joining. - imprecise_dst_spill_join: BPF_STX through ARG_IMPRECISE dst should be recognized as a local spill and join at_stack with the written value. Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20260413-stacklive-fixes-v2-2-398e126e5cf3@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2026-04-15 08:40:48 -07:00
Eduard Zingerman	ecdd4fd8a5	bpf: fix arg tracking for imprecise/multi-offset BPF_ST/STX BPF_STX through ARG_IMPRECISE dst should be recognized as a local spill and join at_stack with the written value. For example, consider the following situation: // r1 = ARG_IMPRECISE{mask=BIT(0)|BIT(1)} *(u64 *)(r1 + 0) = r8 Here the analysis should produce an equivalent of at_stack[] = join(old, r8) BPF_ST through multi-offset or imprecise dst should join at_stack with none instead of overwriting the slots. For example, consider the following situation: // r1 = ARG_IMPRECISE{mask=BIT(0)|BIT(1)} *(u64 *)(r1 + 0) = 0 Here the analysis should produce an equivalent of at_stack[r1] = join(old, none). Move the definition of the clear_overlapping_stack_slots() in order to have __arg_track_join() visible. Remove the OFF_IMPRECISE constant to avoid having two ways to express imprecise offset. Only 'offset-imprecise {frame=N, cnt=0}' remains. Fixes: `bf0c571f7f` ("bpf: introduce forward arg-tracking dataflow analysis") Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20260413-stacklive-fixes-v2-1-398e126e5cf3@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2026-04-15 08:40:47 -07:00
Shung-Hsi Yu	813f336269	selftests/bpf: Fix timer_start_deadlock failure due to hrtimer change Since commit `f2e388a019` ("hrtimer: Reduce trace noise in hrtimer_start()"), hrtimer_cancel tracepoint is no longer called when a hrtimer is re-armed. So instead of a hrtimer_cancel followed by hrtimer_start tracepoint events, there is now only a single hrtimer_start tracepoint event with the new was_armed field set to 1, to indicate that the hrtimer was previously armed. Update timer_start_deadlock accordingly so it traces hrtimer_start tracepoint instead, with was_armed used as guard. Signed-off-by: Shung-Hsi Yu <shung-hsi.yu@suse.com> Tested-by: Mykyta Yatsenko <yatsenko@meta.com> Acked-by: Mykyta Yatsenko <yatsenko@meta.com> Link: https://lore.kernel.org/r/20260415120329.129192-1-shung-hsi.yu@suse.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2026-04-15 08:25:04 -07:00
Linus Torvalds	1f5ffc6721	Fix mismerge of the arm64 / timer-core interrupt handling changes Commit `c43267e679` ("Merge tag 'arm64-upstream' of git://...") had a conflict in the irq entry/exit code due to commit `c5538d0141` ("entry: Split kernel mode logic from irqentry_{enter,exit}()") having moved the core code in irqentry_enter/exit() from kernel/entry/common.c into helper inline functions in include/linux/irq-entry-common.h. On the other side of the merge, the timer-core code had introduced deferred hrtimer rearming infrastructure in commit `0e98eb1481` ("entry: Prepare for deferred hrtimer rearming"), adding two calls to hrtimer_rearm_deferred() in irqentry_enter(). When merging the two, moving the two calls to the new location wasn't a problem, but afterwards I had made the mistake of looking what had happened in linux-next. And linux-next had a very different merge resolution in commit 04f02dc3ea74 ("Merge tag 'entry-for-arm64-26-04-08' into sched/hrtick"), which had unified the two calls into one single call-site in irqentry_exit_to_kernel_mode_preempt(). And that merge resolution looked cleverer than the straightforward one I had done, so I re-did my merge the way it had been done in linux-next. But it turns out nobody apparently tests linux-next, and the merge in linux-next was just wrong. The difference is that hrtimer_rearm_deferred() doesn't get called at all for the case when state.exit_rcu is true, and the boot will typically fail due to timers not triggering correctly. So this undoes the "clever" merge, and does the straightforward one instead. 
Fixes: `c43267e679` ("Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux") Reported-and-tested-by: Alexei Starovoitov <alexei.starovoitov@gmail.com> Link: https://lore.kernel.org/all/CAADnVQJ=MoiX4=guPWhL9vtnAELkpNx=GNm8RA1-aV424UFz2A@mail.gmail.com/ Link: https://lore.kernel.org/all/CAHk-=wg8+BER4VyFKG3rnPi2gXxbf-jbHS=EU+xhFqGVQfbutw@mail.gmail.com/ Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2026-04-14 23:03:02 -07:00
Linus Torvalds	5c0f43e853	Merge tag 'kernel-7.1-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull pid_namespace updates from Christian Brauner: - pid_namespace: make init creation more flexible Annotate ->child_reaper accesses with {READ,WRITE}_ONCE() to protect the unlocked readers from cpu/compiler reordering, and enforce that pid 1 in a pid namespace is always the first allocated pid (the set_tid path already required this). On top of that, allow opening pid_for_children before the pid namespace init has been created. This lets one process create the pid namespace and a different process create the init via setns(), which makes clone3(set_tid) usable in all cases evenly and is particularly useful to CRIU when restoring nested containers. A new selftest covers both the basic create-pidns-then-init flow and the cross-process variant, and a MAINTAINERS entry for the pid namespace code is added. - unrelated signal cleanup: update outdated comment for the removed freezable_schedule() * tag 'kernel-7.1-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: signal: update outdated comment for removed freezable_schedule() MAINTAINERS: add a pid namespace entry selftests: Add tests for creating pidns init via setns pid_namespace: allow opening pid_for_children before init was created pid: check init is created first after idr alloc pid_namespace: avoid optimization of accesses to ->child_reaper	2026-04-14 20:28:40 -07:00
Linus Torvalds	7c8a4671dc	Merge tag 'vfs-7.1-rc1.mount.v2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs mount updates from Christian Brauner: - Add FSMOUNT_NAMESPACE flag to fsmount() that creates a new mount namespace with the newly created filesystem attached to a copy of the real rootfs. This returns a namespace file descriptor instead of an O_PATH mount fd, similar to how OPEN_TREE_NAMESPACE works for open_tree(). This allows creating a new filesystem and immediately placing it in a new mount namespace in a single operation, which is useful for container runtimes and other namespace-based isolation mechanisms. This accompanies OPEN_TREE_NAMESPACE and avoids a needless detour via OPEN_TREE_NAMESPACE to get the same effect. Will be especially useful when you mount an actual filesystem to be used as the container rootfs. - Currently, creating a new mount namespace always copies the entire mount tree from the caller's namespace. For containers and sandboxes that intend to build their mount table from scratch this is wasteful: they inherit a potentially large mount tree only to immediately tear it down. This series adds support for creating a mount namespace that contains only a clone of the root mount, with none of the child mounts. Two new flags are introduced: - CLONE_EMPTY_MNTNS (0x400000000) for clone3(), using the 64-bit flag space - UNSHARE_EMPTY_MNTNS (0x00100000) for unshare() Both flags imply CLONE_NEWNS. The resulting namespace contains a single nullfs root mount with an immutable empty directory. The intended workflow is to then mount a real filesystem (e.g., tmpfs) over the root and build the mount table from there. - Allow MOVE_MOUNT_BENEATH to target the caller's rootfs, allowing to switch out the rootfs without pivot_root(2). The traditional approach to switching the rootfs involves pivot_root(2) or a chroot_fs_refs()-based mechanism that atomically updates fs->root for all tasks sharing the same fs_struct. 
This has consequences for fork(), unshare(CLONE_FS), and setns(). This series instead decomposes root-switching into individually atomic, locally-scoped steps: fd_tree = open_tree(-EBADF, "/newroot", OPEN_TREE_CLONE | OPEN_TREE_CLOEXEC); fchdir(fd_tree); move_mount(fd_tree, "", AT_FDCWD, "/", MOVE_MOUNT_BENEATH | MOVE_MOUNT_F_EMPTY_PATH); chroot("."); umount2(".", MNT_DETACH); Since each step only modifies the caller's own state, the fork/unshare/setns races are eliminated by design. A key step to making this possible is to remove the locked mount restriction. Originally MOVE_MOUNT_BENEATH doesn't support mounting beneath a mount that is locked. The locked mount protects the underlying mount from being revealed. This is a core mechanism of unshare(CLONE_NEWUSER | CLONE_NEWNS). The mounts in the new mount namespace become locked. That effectively makes the new mount table useless as the caller cannot ever get rid of any of the mounts no matter how useless they are. We can lift this restriction though. We simply transfer the locked property from the top mount to the mount beneath. This works because what we care about is to protect the underlying mount aka the parent. The mount mounted between the parent and the top mount takes over the job of protecting the parent mount from the top mount. This leaves us free to remove the locked property from the top mount which can consequently be unmounted: unshare(CLONE_NEWUSER | CLONE_NEWNS) and we inherit a clone of procfs on /proc then currently we cannot unmount it as: umount -l /proc will fail with EINVAL because the procfs mount is locked. After this series we can now do: mount --beneath -t tmpfs tmpfs /proc umount -l /proc after which a tmpfs mount has been placed beneath the procfs mount. The tmpfs mount has become locked and the procfs mount has become unlocked. This means you can safely modify an inherited mount table after unprivileged namespace creation. 
Afterwards we simply make it possible to move a mount beneath the rootfs allowing to upgrade the rootfs. Removing the locked restriction makes this very useful for containers created with unshare(CLONE_NEWUSER | CLONE_NEWNS) to reshuffle an inherited mount table safely and MOVE_MOUNT_BENEATH makes it possible to switch out the rootfs instead of using the costly pivot_root(2). * tag 'vfs-7.1-rc1.mount.v2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: selftests/namespaces: remove unused utils.h include from listns_efault_test selftests/fsmount_ns: add missing TARGETS and fix cap test selftests/empty_mntns: fix wrong CLONE_EMPTY_MNTNS hex value in comment selftests/empty_mntns: fix statmount_alloc() signature mismatch selftests/statmount: remove duplicate wait_for_pid() mount: always duplicate mount selftests/filesystems: add MOVE_MOUNT_BENEATH rootfs tests move_mount: allow MOVE_MOUNT_BENEATH on the rootfs move_mount: transfer MNT_LOCKED selftests/filesystems: add clone3 tests for empty mount namespaces selftests/filesystems: add tests for empty mount namespaces namespace: allow creating empty mount namespaces selftests: add FSMOUNT_NAMESPACE tests selftests/statmount: add statmount_alloc() helper tools: update mount.h header mount: add FSMOUNT_NAMESPACE mount: simplify __do_loopback() mount: start iterating from start of rbtree	2026-04-14 19:59:25 -07:00
Linus Torvalds	91a4855d6c	Merge tag 'net-next-7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from Jakub Kicinski: "Core & protocols: - Support HW queue leasing, allowing containers to be granted access to HW queues for zero-copy operations and AF_XDP - Number of code moves to help the compiler with inlining. Avoid output arguments for returning drop reason where possible - Rework drop handling within qdiscs to include more metadata about the reason and dropping qdisc in the tracepoints - Remove the rtnl_lock use from IP Multicast Routing - Pack size information into the Rx Flow Steering table pointer itself. This allows making the table itself a flat array of u32s, thus making the table allocation size a power of two - Report TCP delayed ack timer information via socket diag - Add ip_local_port_step_width sysctl to allow distributing the randomly selected ports more evenly throughout the allowed space - Add support for per-route tunsrc in IPv6 segment routing - Start work of switching sockopt handling to iov_iter - Improve dynamic recvbuf sizing in MPTCP, limit burstiness and avoid buffer size drifting up - Support MSG_EOR in MPTCP - Add stp_mode attribute to the bridge driver for STP mode selection. This addresses concerns about call_usermodehelper() usage - Remove UDP-Lite support (as announced in 2023) - Remove support for building IPv6 as a module. Remove the now unnecessary function calling indirection Cross-tree stuff: - Move Michael MIC code from generic crypto into wireless, it's considered insecure but some WiFi networks still need it Netfilter: - Switch nft_fib_ipv6 module to no longer need temporary dst_entry object allocations by using fib6_lookup() + RCU. Florian W reports this gets us ~13% higher packet rate - Convert IPVS's global __ip_vs_mutex to per-net service_mutex and switch the service tables to be per-net. 
Convert some code that walks the service lists to use RCU instead of the service_mutex - Add more opinionated input validation to lower security exposure - Make IPVS hash tables to be per-netns and resizable Wireless: - Finished assoc frame encryption/EPPKE/802.1X-over-auth - Radar detection improvements - Add 6 GHz incumbent signal detection APIs - Multi-link support for FILS, probe response templates and client probing - New APIs and mac80211 support for NAN (Neighbor Aware Networking, aka Wi-Fi Aware) so less work must be in firmware Driver API: - Add numerical ID for devlink instances (to avoid having to create fake bus/device pairs just to have an ID). Support shared devlink instances which span multiple PFs - Add standard counters for reporting pause storm events (implement in mlx5 and fbnic) - Add configuration API for completion writeback buffering (implement in mana) - Support driver-initiated change of RSS context sizes - Support DPLL monitoring input frequency (implement in zl3073x) - Support per-port resources in devlink (implement in mlx5) Misc: - Expand the YAML spec for Netfilter Drivers - Software: - macvlan: support multicast rx for bridge ports with shared source MAC address - team: decouple receive and transmit enablement for IEEE 802.3ad LACP "independent control" - Ethernet high-speed NICs: - nVidia/Mellanox: - support high order pages in zero-copy mode (for payload coalescing) - support multiple packets in a page (for systems with 64kB pages) - Broadcom 25-400GE (bnxt): - implement XDP RSS hash metadata extraction - add software fallback for UDP GSO, lowering the IOMMU cost - Broadcom 800GE (bnge): - add link status and configuration handling - add various HW and SW statistics - Marvell/Cavium: - NPC HW block support for cn20k - Huawei (hinic3): - add mailbox / control queue - add rx VLAN offload - add driver info and link management - Ethernet NICs: - Marvell/Aquantia: - support reading SFP module info on some AQC100 cards - Realtek PCI 
(r8169): - add support for RTL8125cp - Realtek USB (r8152): - support for the RTL8157 5Gbit chip - add 2500baseT EEE status/configuration support - Ethernet NICs embedded and off-the-shelf IP: - Synopsys (stmmac): - cleanup and reorganize SerDes handling and PCS support - cleanup descriptor handling and per-platform data - cleanup and consolidate MDIO defines and handling - shrink driver memory use for internal structures - improve Tx IRQ coalescing - improve TCP segmentation handling - add support for Spacemit K3 - Cadence (macb): - support PHYs that have inband autoneg disabled with GEM - support IEEE 802.3az EEE - rework usrio capabilities and handling - AMD (xgbe): - improve power management for S0i3 - improve TX resilience for link-down handling - Virtual: - Google cloud vNIC: - support larger ring sizes in DQO-QPL mode - improve HW-GRO handling - support UDP GSO for DQO format - PCIe NTB: - support queue count configuration - Ethernet PHYs: - automatically disable PHY autonomous EEE if MAC is in charge - Broadcom: - add BCM84891/BCM84892 support - Micrel: - support for LAN9645X internal PHY - Realtek: - add RTL8224 pair order support - support PHY LEDs on RTL8211F-VD - support spread spectrum clocking (SSC) - Maxlinear: - add PHY-level statistics via ethtool - Ethernet switches: - Maxlinear (mxl862xx): - support for bridge offloading - support for VLANs - support driver statistics - Bluetooth: - large number of fixes and new device IDs - Mediatek: - support MT6639 (MT7927) - support MT7902 SDIO - WiFi: - Intel (iwlwifi): - UNII-9 and continuing UHR work - MediaTek (mt76): - mt7996/mt7925 MLO fixes/improvements - mt7996 NPU support (HW eth/wifi traffic offload) - Qualcomm (ath12k): - monitor mode support on IPQ5332 - basic hwmon temperature reporting - support IPQ5424 - Realtek: - add USB RX aggregation to improve performance - add USB TX flow control by tracking in-flight URBs - Cellular: - IPA v5.2 support" * tag 'net-next-7.1' of 
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1561 commits) net: pse-pd: fix kernel-doc function name for pse_control_find_by_id() wireguard: device: use exit_rtnl callback instead of manual rtnl_lock in pre_exit wireguard: allowedips: remove redundant space tools: ynl: add sample for wireguard wireguard: allowedips: Use kfree_rcu() instead of call_rcu() MAINTAINERS: Add netkit selftest files selftests/net: Add additional test coverage in nk_qlease selftests/net: Split netdevsim tests from HW tests in nk_qlease tools/ynl: Make YnlFamily closeable as a context manager net: airoha: Add missing PPE configurations in airoha_ppe_hw_init() net: airoha: Fix VIP configuration for AN7583 SoC net: caif: clear client service pointer on teardown net: strparser: fix skb_head leak in strp_abort_strp() net: usb: cdc-phonet: fix skb frags[] overflow in rx_complete() selftests/bpf: add test for xdp_master_redirect with bond not up net, bpf: fix null-ptr-deref in xdp_master_redirect() for down master net: airoha: Remove PCE_MC_EN_MASK bit in REG_FE_PCE_CFG configuration sctp: disable BH before calling udp_tunnel_xmit_skb() sctp: fix missing encap_port propagation for GSO fragments net: airoha: Rely on net_device pointer in ETS callbacks ...	2026-04-14 18:36:10 -07:00
Linus Torvalds	f5ad410100	Merge tag 'bpf-next-7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Pull bpf updates from Alexei Starovoitov: - Welcome new BPF maintainers: Kumar Kartikeya Dwivedi, Eduard Zingerman while Martin KaFai Lau reduced his load to Reviewer. - Lots of fixes everywhere from many first-time contributors. Thank you All. - Diff stat is dominated by mechanical split of verifier.c into multiple components: - backtrack.c: backtracking logic and jump history - states.c: state equivalence - cfg.c: control flow graph, postorder, strongly connected components - liveness.c: register and stack liveness - fixups.c: post-verification passes: instruction patching, dead code removal, bpf_loop inlining, finalize fastcall 8k lines were moved. verifier.c still stands at 20k lines. Further refactoring is planned for the next release. - Replace dynamic stack liveness with static stack liveness based on data flow analysis. This improved the verification time by 2x for some programs and equally reduced memory consumption.
New logic is in liveness.c and supported by constant folding in const_fold.c (Eduard Zingerman, Alexei Starovoitov) - Introduce BTF layout to ease addition of new BTF kinds (Alan Maguire) - Use kmalloc_nolock() universally in BPF local storage (Amery Hung) - Fix several bugs in linked registers delta tracking (Daniel Borkmann) - Improve verifier support of arena pointers (Emil Tsalapatis) - Improve verifier tracking of register bounds in min/max and tnum domains (Harishankar Vishwanathan, Paul Chaignon, Hao Sun) - Further extend support for implicit arguments in the verifier (Ihor Solodrai) - Add support for nop,nop5 instruction combo for USDT probes in libbpf (Jiri Olsa) - Support merging multiple module BTFs (Josef Bacik) - Extend applicability of bpf_kptr_xchg (Kaitao Cheng) - Retire rcu_trace_implies_rcu_gp() (Kumar Kartikeya Dwivedi) - Support variable offset context access for 'syscall' programs (Kumar Kartikeya Dwivedi) - Migrate bpf_task_work and dynptr to kmalloc_nolock() (Mykyta Yatsenko) - Fix UAF in open-coded task_vma iterator (Puranjay Mohan) * tag 'bpf-next-7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (241 commits) selftests/bpf: cover short IPv4/IPv6 inputs with adjust_room bpf: reject short IPv4/IPv6 inputs in bpf_prog_test_run_skb selftests/bpf: Use memfd_create instead of shm_open in cgroup_iter_memcg selftests/bpf: Add test for cgroup storage OOB read bpf: Fix OOB in pcpu_init_value selftests/bpf: Fix reg_bounds to match new tnum-based refinement selftests/bpf: Add tests for non-arena/arena operations bpf: Allow instructions with arena source and non-arena dest registers bpftool: add missing fsession to the usage and docs of bpftool docs/bpf: add missing fsession attach type to docs bpf: add missing fsession to the verifier log bpf: Move BTF checking logic into check_btf.c bpf: Move backtracking logic to backtrack.c bpf: Move state equivalence logic to states.c bpf: Move check_cfg() into cfg.c bpf: Move 
compute_insn_live_regs() into liveness.c bpf: Move fixup/post-processing logic from verifier.c into fixups.c bpf: Simplify do_check_insn() bpf: Move checks for reserved fields out of the main pass bpf: Delete unused variable ...	2026-04-14 18:04:04 -07:00
Linus Torvalds	e997ac58ad	Merge tag 'linux_kselftest-next-7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest Pull kselftest updates from Shuah Khan: - cpu-hotplug: fix to check if cpu hotplug is supported to avoid test failures when cpu hotplug isn't supported. - ftrace: fix to quote relevant comparisons and path checks in the helper so it handles those patterns without spurious shell warnings. - runner.sh: add ktap support - tracing: fix to make --logdir option work again - tracing: fix to check awk supports non POSIX strtonum() - mqueue: fix incorrectly named settings file to make sure the test used the correct timeout value - kselftest: - fix to treat xpass as successful result - add ksft_reset_state() - kselftest_harness: - validate kselftest exit codes are handled explicitly - add detection of invalid mixing of kselftest and harness functionality - add validation of intermixing of kselftest and harness functionality - run_kselftest.sh: - remove unused $ROOT - resolve BASE_DIR with pwd -P to avoid dependency on realpath or readlink commands to generate a physical absolute path for BASE_DIR - allow choosing per-test log directory - preserve subtarget failures in all/install * tag 'linux_kselftest-next-7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: selftests/ftrace: Quote check_requires comparisons selftests: Preserve subtarget failures in all/install selftests/run_kselftest.sh: Allow choosing per-test log directory selftests/run_kselftest.sh: Resolve BASE_DIR with pwd -P selftests/run_kselftest.sh: Remove unused $ROOT selftests/cpu-hotplug: Fix check for cpu hotplug not supported selftests/mqueue: Fix incorrectly named file selftests: Use ktap helpers for runner.sh selftests: harness: Validate intermixing of kselftest and harness functionality selftests: harness: Detect illegal mixing of kselftest and harness functionality selftests: kselftest: Add ksft_reset_state() selftests: harness: Validate that 
explicit kselftest exitcodes are handled selftests: kselftest: Treat xpass as successful result selftests/tracing: Fix to check awk supports non POSIX strtonum() selftests/tracing: Fix to make --logdir option work again	2026-04-14 17:46:12 -07:00
Linus Torvalds	6198c86a97	Merge tag 'linux_kselftest-kunit-7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest Pull kunit tool updates from Shuah Khan: - terminate kernel under test on SIGINT when it catches SIGINT to make sure the TTY isn't messed up and terminate the running kernel - recommend --raw_output=all when KTAP header isn't found in the kernel output, it's useful to re-run the test with --raw_output=all to find out the reasons why the test didn't complete. - skip stty when stdin is not a tty to avoid writing noise to stderr. - show suites when user runs --list_suites option instead of entire list of tests to make the output user friendly and concise. * tag 'linux_kselftest-kunit-7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: kunit: tool: Terminate kernel under test on SIGINT kunit: tool: skip stty when stdin is not a tty kunit: tool: Recommend --raw_output=all if no KTAP found kunit: Add --list_suites to show suites	2026-04-14 17:39:42 -07:00
Linus Torvalds	88b29f3f57	Merge tag 'modules-7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/modules/linux Pull module updates from Sami Tolvanen: "Kernel symbol flags: - Replace the separate _gpl symbol sections (__ksymtab_gpl and __kcrctab_gpl) with a unified symbol table and a new __kflagstab section. This section stores symbol flags, such as the GPL-only flag, as an 8-bit bitset for each exported symbol. This is a cleanup that simplifies symbol lookup in the module loader by avoiding table fragmentation and will allow a cleaner way to add more flags later if needed. Module signature UAPI: - Move struct module_signature to the UAPI headers to allow reuse by tools outside the kernel proper, such as kmod and scripts/sign-file. This also renames a few constants for clarity and drops unused signature types as preparation for hash-based module integrity checking work that's in progress. Sysfs: - Add a /sys/module/<module>/import_ns sysfs attribute to show the symbol namespaces imported by loaded modules. This makes it easier to verify driver API access at runtime on systems that care about such things (e.g. Android). Cleanups and fixes: - Force sh_addr to 0 for all sections in module.lds. This prevents non-zero section addresses when linking modules with 'ld.bfd -r', which confused elfutils. - Fix a memory leak of charp module parameters on module unload when the kernel is configured with CONFIG_SYSFS=n. - Override the -EEXIST error code returned by module_init() to userspace. This prevents confusion with the errno reserved by the module loader to indicate that a module is already loaded. - Simplify the warning message and drop the stack dump on positive returns from module_init(). 
- Drop unnecessary extern keywords from function declarations and synchronize parse_args() arguments with their implementation" * tag 'modules-7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/modules/linux: (23 commits) module: Simplify warning on positive returns from module_init() module: Override -EEXIST module return documentation: remove references to _gpl sections module: remove _gpl sections from vmlinux and modules module: deprecate usage of _gpl sections in module loader module: use kflagstab instead of _gpl sections module: populate kflagstab in modpost module: add kflagstab section to vmlinux and modules module: define ksym_flags enumeration to represent kernel symbol flags selftests/bpf: verify_pkcs7_sig: Use 'struct module_signature' from the UAPI headers sign-file: use 'struct module_signature' from the UAPI headers tools uapi headers: add linux/module_signature.h module: Move 'struct module_signature' to UAPI module: Give MODULE_SIG_STRING a more descriptive name module: Give 'enum pkey_id_type' a more specific name module: Drop unused signature types extract-cert: drop unused definition of PKEY_ID_PKCS7 docs: symbol-namespaces: mention sysfs attribute module: expose imported namespaces via sysfs module: Remove extern keyword from param prototypes ...	2026-04-14 17:16:38 -07:00
Linus Torvalds	ee60c510fb	Merge tag 'nolibc-20260412-for-7.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/nolibc/linux-nolibc Pull nolibc updates from Thomas Weißschuh: - Many new features and optimizations to printf() - Rename non-standard symbols to avoid collisions with application code - Support for byteswap.h, endian.h, err.h and asprintf() - 64-bit dev_t - Smaller cleanups and fixes to the code and build system * tag 'nolibc-20260412-for-7.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/nolibc/linux-nolibc: (61 commits) selftests/nolibc: use gcc 15 tools/nolibc: support UBSAN on gcc tools/nolibc: create __nolibc_no_sanitize_ubsan selftests/nolibc: don't skip tests for unimplemented syscalls anymore selftests/nolibc: explicitly handle ENOSYS from ptrace() tools/nolibc: add byteorder conversions tools/nolibc: add the _syscall() macro tools/nolibc: move the call to __sysret() into syscall() tools/nolibc: rename the internal macros used in syscall() selftests/nolibc: only use libgcc when really necessary selftests/nolibc: test the memory allocator tools/nolibc: check for overflow in calloc() without divisions tools/nolibc: add support for asprintf() tools/nolibc: use __builtin_offsetof() tools/nolibc: use makedev() in fstatat() tools/nolibc: handle all major and minor numbers in makedev() and friends tools/nolibc: make dev_t 64 bits wide tools/nolibc: move the logic of makedev() and friends into functions selftests/nolibc: add a test for stat().st_rdev selftests/nolibc: add some tests for makedev() and friends ...	2026-04-14 17:13:09 -07:00
Linus Torvalds	3203a08c12	Merge tag 'powerpc-7.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Madhavan Srinivasan: - powerpc support for huge pfnmaps - Cleanups to use masked user access - Rework pnv_ioda_pick_m64_pe() to use better bitmap API - Convert powerpc to AUDIT_ARCH_COMPAT_GENERIC - Backup region offset update to elfcorehdr - Fixes for wii/ps3 platform - Implement JIT support for private stack in powerpc - Implement JIT support for fsession in powerpc64 trampoline - Add support for instruction array and indirect jump in powerpc - Misc selftest fixes and cleanups Thanks to Abhishek Dubey, Aditya Gupta, Alex Williamson, Amit Machhiwal, Andrew Donnellan, Bartosz Golaszewski, Cédric Le Goater, Chen Ni, Christophe Leroy (CS GROUP), Hari Bathini, J. Neuschäfer, Mukesh Kumar Chaurasiya (IBM), Nam Cao, Nilay Shroff, Pavithra Prakash, Randy Dunlap, Ritesh Harjani (IBM), Shrikanth Hegde, Sourabh Jain, Vaibhav Jain, Venkat Rao Bagalkote, and Yury Norov (NVIDIA) * tag 'powerpc-7.1-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (47 commits) mailmap: Add entry for Andrew Donnellan powerpc32/bpf: fix loading fsession func metadata using PPC_LI32 selftest/bpf: Enable gotox tests for powerpc64 powerpc64/bpf: Add support for indirect jump selftest/bpf: Enable instruction array test for powerpc powerpc/bpf: Add support for instruction array powerpc32/bpf: Add fsession support powerpc64/bpf: Implement fsession support selftests/bpf: Enable private stack tests for powerpc64 powerpc64/bpf: Implement JIT support for private stack powerpc: pci-ioda: Optimize pnv_ioda_pick_m64_pe() powerpc: pci-ioda: use bitmap_alloc() in pnv_ioda_pick_m64_pe() powerpc/net: Inline checksum wrappers and convert to scoped user access powerpc/sstep: Convert to scoped user access powerpc/align: Convert emulate_spe() to scoped user access powerpc/ptrace: Convert gpr32_set_common_user() to scoped user access powerpc/futex: Use masked 
user access powerpc/audit: Convert powerpc to AUDIT_ARCH_COMPAT_GENERIC cpuidle: powerpc: avoid double clear when breaking snooze powerpc/ps3: spu.c: fix enum and Return kernel-doc warnings ...	2026-04-14 17:10:15 -07:00
Linus Torvalds	e6b162a63f	Merge tag 'm68knommu-for-v7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu Pull m68knommu update from Greg Ungerer: - fix task info flags handling for 68000 nommu * tag 'm68knommu-for-v7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu: m68k: Fix task info flags handling for 68000	2026-04-14 17:07:45 -07:00

1434335 Commits