Commit c27cea4416 ("rcu: Re-implement RCU Tasks Trace in terms of SRCU-fast")
broke map_kptr selftest since it removed the function we were kprobing.
Use a new kfunc that invokes call_rcu_tasks_trace and sets a program
provided pointer to an integer to 1. Technically this can be unsafe if
the memory being written to from the callback disappears, but this is
just for usage in a test where we ensure we spin until we see the value
to be set to 1, so it's ok.
Reported-by: Shung-Hsi Yu <shung-hsi.yu@suse.com>
Fixes: c27cea4416 ("rcu: Re-implement RCU Tasks Trace in terms of SRCU-fast")
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20260211185747.3630539-1-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Update the expected result of the selftest as recursion of task local
storage syscall and helpers have been relaxed. Now that the percpu
counter is removed, task local storage helpers, bpf_task_storage_get()
and bpf_task_storage_delete() can now run on the same CPU at the same
time unless they cause deadlock.
Note that since there is no percpu counter preventing recursion in
task local storage helpers, bpf_trampoline now catches the recursion
of on_update as reported by recursion_misses.
on_enter: tp_btf/sys_enter
on_update: fentry/bpf_local_storage_update
Old behavior New behavior
____________ ____________
on_enter on_enter
bpf_task_storage_get(&map_a) bpf_task_storage_get(&map_a)
bpf_task_storage_trylock succeed bpf_local_storage_update(&map_a)
bpf_local_storage_update(&map_a)
on_update on_update
bpf_task_storage_get(&map_a) bpf_task_storage_get(&map_a)
bpf_task_storage_trylock fail on_update::misses++ (1)
return NULL create and return map_a::ptr
map_a::ptr += 1 (1)
bpf_task_storage_delete(&map_a)
return 0
bpf_task_storage_get(&map_b) bpf_task_storage_get(&map_b)
bpf_task_storage_trylock fail on_update::misses++ (2)
return NULL create and return map_b::ptr
map_b::ptr += 1 (1)
create and return map_a::ptr create and return map_a::ptr
map_a::ptr = 200 map_a::ptr = 200
bpf_task_storage_get(&map_b) bpf_task_storage_get(&map_b)
bpf_task_storage_trylock succeed lockless lookup succeed
bpf_local_storage_update(&map_b) return map_b::ptr
on_update
bpf_task_storage_get(&map_a)
bpf_task_storage_trylock fail
lockless lookup succeed
return map_a::ptr
map_a::ptr += 1 (201)
bpf_task_storage_delete(&map_a)
bpf_task_storage_trylock fail
return -EBUSY
nr_del_errs++ (1)
bpf_task_storage_get(&map_b)
bpf_task_storage_trylock fail
return NULL
create and return ptr
map_b::ptr = 100
Expected result:
map_a::ptr = 201 map_a::ptr = 200
map_b::ptr = 100 map_b::ptr = 1
nr_del_err = 1 nr_del_err = 0
on_update::recursion_misses = 0 on_update::recursion_misses = 2
On_enter::recursion_misses = 0 on_enter::recursion_misses = 0
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Amery Hung <ameryhung@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20260205222916.1788211-14-ameryhung@gmail.com
The issue occurs in TOO_MANY_FRAGS test case when xdp_zc_max_segs is set to
an odd number.
TOO_MANY_FRAGS test case contains an invalid packet consisting of
(xdp_zc_max_segs) frags. Every frag, even the last one has XDP_PKT_CONTD
flag set. This packet is expected to be dropped. After that, there is a
valid linear packet, which is expected to be received back.
Once (xdp_zc_max_segs) is an odd number, the last packet cannot be
received, if packet forwarding between Rx and Tx interfaces relies on
the ethernet header, e.g. checks for ETH_P_LOOPBACK. Packet is malformed,
if all traffic is looped.
Turns out, sending function processes multiple invalid frags as if they
were in 2-frag packets. So once the invalid mbuf packet contains an odd
number of those, the valid packet after gets paired with the previous
invalid descriptor, and hence does not get an ethernet header generated, so
it is either dropped or malformed.
Make invalid packets in verbatim mode always have only a single frag. For
such packets, number of frags is otherwise meaningless, as descriptor flags
are pre-configured in verbatim mode and packet data is not generated for
invalid descriptors.
Fixes: 697604492b ("selftests/xsk: add invalid descriptor test for multi-buffer")
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
Link: https://lore.kernel.org/r/20260203155103.2305816-3-larysa.zaremba@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Referenced commit reduced the scope of the variable pkt, so now it has to
be reinitialized via pkt_stream_get_next_rx_pkt(), which also increments
some counters. When the packet is interrupted by the batch ending, pkt
stream therefore proceeds to the next packet, while xsk ring still contains
the previous one, this results in a pkt_nb mismatch.
Decrement the affected counters when packet is interrupted.
Fixes: 8913e653e9 ("selftests/xsk: Iterate over all the sockets in the receive pkts function")
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
Link: https://lore.kernel.org/r/20260203155103.2305816-2-larysa.zaremba@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add stress tests for BPF timers that run in NMI context using perf_event
programs attached to PERF_COUNT_HW_CPU_CYCLES.
The tests cover three scenarios:
- nmi_race: Tests concurrent timer start and async cancel operations
- nmi_update: Tests updating a map element (effectively deleting and
inserting new for array map) from within a timer callback
- nmi_cancel: Tests timer self-cancellation attempt.
A common test_common() helper is used to share timer setup logic across
all test modes.
The tests spawn multiple threads in a child process to generate
perf events, which trigger the BPF programs in NMI context. Hit counters
verify that the NMI code paths were actually exercised.
Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20260201025403.66625-8-alexei.starovoitov@gmail.com
The added fsession does not prevent running on those architectures, that
haven't added fsession support.
For example, try to run fsession tests on arm64:
test_fsession_basic:PASS:fsession_test__open_and_load 0 nsec
test_fsession_basic:PASS:fsession_attach 0 nsec
check_result:FAIL:test_run_opts err unexpected error: -14 (errno 14)
In order to prevent such errors, add bpf_jit_supports_fsession() to guard
those architectures.
Fixes: 2d419c4465 ("bpf: add fsession support")
Acked-by: Puranjay Mohan <puranjay@kernel.org>
Tested-by: Puranjay Mohan <puranjay@kernel.org>
Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
Link: https://lore.kernel.org/r/20260131144950.16294-2-leon.hwang@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The test_bpftool_map.sh script tests that maps read/write accesses
are being properly allowed/refused by the kernel depending on a specific
fmod_ret program being attached on security_bpf_map function.
Rewrite this test to integrate it in the test_progs. The
new test spawns a few subtests:
#36/1 bpftool_maps_access/unprotected_unpinned:OK
#36/2 bpftool_maps_access/unprotected_pinned:OK
#36/3 bpftool_maps_access/protected_unpinned:OK
#36/4 bpftool_maps_access/protected_pinned:OK
#36/5 bpftool_maps_access/nested_maps:OK
#36/6 bpftool_maps_access/btf_list:OK
#36 bpftool_maps_access:OK
Summary: 1/6 PASSED, 0 SKIPPED, 0 FAILED
Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com>
Acked-by: Quentin Monnet <qmo@kernel.org>
Link: https://lore.kernel.org/r/20260123-bpftool-tests-v4-3-a6653a7f28e7@bootlin.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The test_bpftool_metadata.sh script validates that bpftool properly
returns in its ouptput any metadata generated by bpf programs through
some .rodata sections.
Port this test to the test_progs framework so that it can be executed
automatically in CI. The new test, similarly to the former script,
checks that valid data appears both for textual output and json output,
as well as for both data not used at all and used data. For the json
check part, the expected json string is hardcoded to avoid bringing a
new external dependency (eg: a json deserializer) for test_progs.
As the test is now converted into test_progs, remove the former script.
The newly converted test brings two new subtests:
#37/1 bpftool_metadata/metadata_unused:OK
#37/2 bpftool_metadata/metadata_used:OK
#37 bpftool_metadata:OK
Summary: 1/2 PASSED, 0 SKIPPED, 0 FAILED
Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com>
Link: https://lore.kernel.org/r/20260123-bpftool-tests-v4-2-a6653a7f28e7@bootlin.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
CI occasionally reports failures in the
percpu_alloc/cpu_flag_lru_percpu_hash selftest, for example:
First test_progs failure (test_progs_no_alu32-x86_64-llvm-21):
#264/15 percpu_alloc/cpu_flag_lru_percpu_hash
...
test_percpu_map_op_cpu_flag:FAIL:bpf_map_lookup_batch value on specified cpu unexpected bpf_map_lookup_batch value on specified cpu: actual 0 != expected 3735929054
The unexpected value indicates that an element was removed from the map.
However, the test never calls delete_elem(), so the only possible cause
is LRU eviction.
This can happen when the current task migrates to another CPU: an
update_elem() triggers eviction because there is no available LRU node
on local freelist and global freelist.
Harden the test against this behavior by provisioning sufficient spare
elements. Set max_entries to 'nr_cpus * 2' and restrict the test to using
the first nr_cpus entries, ensuring that updates do not spuriously trigger
LRU eviction.
Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20260119133417.19739-1-leon.hwang@linux.dev
Add a new selftest suite `exe_ctx` to verify the accuracy of the
bpf_in_task(), bpf_in_hardirq(), and bpf_in_serving_softirq() helpers
introduced in bpf_experimental.h.
Testing these execution contexts deterministically requires crossing
context boundaries within a single CPU. To achieve this, the test
implements a "Trigger-Observer" pattern using bpf_testmod:
1. Trigger: A BPF syscall program calls a new bpf_testmod kfunc
bpf_kfunc_trigger_ctx_check().
2. Task to HardIRQ: The kfunc uses irq_work_queue() to trigger a
self-IPI on the local CPU.
3. HardIRQ to SoftIRQ: The irq_work handler calls a dummy function
(observed by BPF fentry) and then schedules a tasklet to
transition into SoftIRQ context.
The user-space runner ensures determinism by pinning itself to CPU 0
before execution, forcing the entire interrupt chain to remain on a
single core. Dummy noinline functions with compiler barriers are
added to bpf_testmod.c to serve as stable attachment points for
fentry programs. A retry loop is used in user-space to wait for the
asynchronous SoftIRQ to complete.
Note that testing on s390x is avoided because supporting those helpers
purely in BPF on s390x is not possible at this point.
Reviewed-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Changwoo Min <changwoo@igalia.com>
Link: https://lore.kernel.org/r/20260125115413.117502-3-changwoo@igalia.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Test session cookie for fsession. Multiple fsession BPF progs is attached
to bpf_fentry_test1() and session cookie is read and write in the
testcase.
bpf_get_func_ip() will influence the layout of the session cookies, so we
test the cookie in two case: with and without bpf_get_func_ip().
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
Link: https://lore.kernel.org/r/20260124062008.8657-13-dongml2@chinatelecom.cn
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
If the argument 'pull_len' of run_test() is 'PULL_MAX' or
'PULL_MAX | PULL_PLUS_ONE', the eventual pull_len size
will close to the page size. On arm64 systems with 64K pages,
the pull_len size will be close to 64K. But the existing buffer
will be close to 9000 which is not enough to pull.
For those failed run_tests(), make buff size to
pg_sz + (pg_sz / 2)
This way, there will be enough buffer space to pull
regardless of page size.
Tested-by: Alan Maguire <alan.maguire@oracle.com>
Cc: Amery Hung <ameryhung@gmail.com>
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Acked-by: Amery Hung <ameryhung@gmail.com>
Link: https://lore.kernel.org/r/20260123055128.495265-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
On arm64 systems with 64K pages, the selftest task_local_data has the following
failures:
...
test_task_local_data_basic:PASS:tld_create_key 0 nsec
test_task_local_data_basic:FAIL:tld_create_key unexpected tld_create_key: actual 0 != expected -28
...
test_task_local_data_basic_thread:PASS:run task_main 0 nsec
test_task_local_data_basic_thread:FAIL:task_main retval unexpected error: 2 (errno 0)
test_task_local_data_basic_thread:FAIL:tld_get_data value0 unexpected tld_get_data value0: actual 0 != expected 6268
...
#447/1 task_local_data/task_local_data_basic:FAIL
...
#447/2 task_local_data/task_local_data_race:FAIL
#447 task_local_data:FAIL
When TLD_DYN_DATA_SIZE is 64K page size, for
struct tld_meta_u {
_Atomic __u8 cnt;
__u16 size;
struct tld_metadata metadata[];
};
field 'cnt' would overflow. For example, for 4K page, 'cnt' will
be 4096/64 = 64. But for 64K page, 'cnt' will be 65536/64 = 1024
and 'cnt' is not enough for 1024. To accommodate 64K page,
'_Atomic __u8 cnt' becomes '_Atomic __u16 cnt'. A few other places
are adjusted accordingly.
In test_task_local_data.c, the value for TLD_DYN_DATA_SIZE is changed
from 4096 to (getpagesize() - 8) since the maximum buffer size for
TLD_DYN_DATA_SIZE is (getpagesize() - 8).
Reviewed-by: Alan Maguire <alan.maguire@oracle.com>
Tested-by: Alan Maguire <alan.maguire@oracle.com>
Cc: Amery Hung <ameryhung@gmail.com>
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Acked-by: Amery Hung <ameryhung@gmail.com>
Link: https://lore.kernel.org/r/20260123055122.494352-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
When wq__attach() fails, serial_test_wq() returns early without calling
wq__destroy(), leaking the skeleton resources allocated by
wq__open_and_load(). This causes ASAN leak reports in selftests runs.
Fix this by jumping to a common clean_up label that calls wq__destroy()
on all exit paths after successful open_and_load.
Note that the early return after wq__open_and_load() failure is correct
and doesn't need fixing, since that function returns NULL on failure
(after internally cleaning up any partial allocations).
Fixes: 8290dba519 ("selftests/bpf: wq: add bpf_wq_start() checks")
Signed-off-by: Kery Qi <qikeyu2017@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/bpf/20260121094114.1801-3-qikeyu2017@gmail.com
Now BPF_DIV has range tracking support via interval analysis. This patch
adds selftests to cover various cases of BPF_DIV and BPF_MOD operations
when the divisor is a constant, also covering both signed and unsigned variants.
This patch includes several types of tests in 32-bit and 64-bit variants:
1. For UDIV
- positive divisor
- zero divisor
2. For SDIV
- positive divisor, positive dividend
- positive divisor, negative dividend
- positive divisor, mixed sign dividend
- negative divisor, positive dividend
- negative divisor, negative dividend
- negative divisor, mixed sign dividend
- zero divisor
- overflow (SIGNED_MIN/-1), normal dividend
- overflow (SIGNED_MIN/-1), constant dividend
3. For UMOD
- positive divisor
- positive divisor, small dividend
- zero divisor
4. For SMOD
- positive divisor, positive dividend
- positive divisor, negative dividend
- positive divisor, mixed sign dividend
- positive divisor, mixed sign dividend, small dividend
- negative divisor, positive dividend
- negative divisor, negative dividend
- negative divisor, mixed sign dividend
- negative divisor, mixed sign dividend, small dividend
- zero divisor
- overflow (SIGNED_MIN/-1), normal dividend
- overflow (SIGNED_MIN/-1), constant dividend
Specifically, these selftests are based on dead code elimination:
If the BPF verifier can precisely analyze the result of BPF_DIV/BPF_MOD
instruction, it can prune the path that leads to an error (here we use
invalid memory access as the error case), allowing the program to pass
verification.
Co-developed-by: Shenghao Yuan <shenghaoyuan0928@163.com>
Signed-off-by: Shenghao Yuan <shenghaoyuan0928@163.com>
Co-developed-by: Tianci Cao <ziye@zju.edu.cn>
Signed-off-by: Tianci Cao <ziye@zju.edu.cn>
Signed-off-by: Yazhou Tang <tangyazhou518@outlook.com>
Link: https://lore.kernel.org/r/20260119085458.182221-3-tangyazhou@zju.edu.cn
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add trivial end-to-end tests to validate that KF_IMPLICIT_ARGS flag is
properly handled by both resolve_btfids and the verifier.
Declare kfuncs in bpf_testmod. Check that bpf_prog_aux pointer is set
in the kfunc implementation. Verify that calls with implicit args and
a legacy case all work.
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Link: https://lore.kernel.org/r/20260120222638.3976562-7-ihor.solodrai@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
On my arm64 machine, I get the following failure:
...
tester_init:PASS:tester_log_buf 0 nsec
process_subtest:PASS:obj_open_mem 0 nsec
process_subtest:PASS:specs_alloc 0 nsec
serial_test_map_kptr:PASS:rcu_tasks_trace_gp__open_and_load 0 nsec
...
test_map_kptr_success:PASS:map_kptr__open_and_load 0 nsec
test_map_kptr_success:PASS:test_map_kptr_ref1 refcount 0 nsec
test_map_kptr_success:FAIL:test_map_kptr_ref1 retval unexpected error: 2 (errno 2)
test_map_kptr_success:PASS:test_map_kptr_ref2 refcount 0 nsec
test_map_kptr_success:FAIL:test_map_kptr_ref2 retval unexpected error: 1 (errno 2)
...
#201/21 map_kptr/success-map:FAIL
In serial_test_map_kptr(), before test_map_kptr_success(), one
kern_sync_rcu() is used to have some delay for freeing the map.
But in my environment, one kern_sync_rcu() seems not enough and
caused the test failure.
In bpf_map_free_in_work() in syscall.c, the queue time for
queue_work(system_dfl_wq, &map->work)
may be longer than expected. This may cause the test failure
since test_map_kptr_success() expects all previous maps having been freed.
Since it is not clear how long queue_work() time takes, a bpf prog
is added to count the reference after bpf_kfunc_call_test_acquire().
If the number of references is 2 (for initial ref and the one just
acquired), all previous maps should have been released. This will
resolve the above 'retval unexpected error' issue.
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/bpf/20260116052245.3692405-1-yonghong.song@linux.dev
Cross-merge BPF and other fixes after downstream PR.
No conflicts.
Adjacent:
Auto-merging MAINTAINERS
Auto-merging Makefile
Auto-merging kernel/bpf/verifier.c
Auto-merging kernel/sched/ext.c
Auto-merging mm/memcontrol.c
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The ldimm64 instruction for map value supports an offset.
For insn array maps it wasn't tested before, as normally
such instructions aren't generated. However, this is still
possible to pass such instructions, so add a few tests to
check that correct offsets work properly and incorrect
offsets are rejected.
Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com>
Link: https://lore.kernel.org/r/20260111153047.8388-4-a.s.protopopov@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The BPF verifier was recently updated to treat pointers to struct types
returned from BPF kfuncs as implicitly trusted by default. Add a new
test case to exercise this new implicit trust semantic.
The KF_ACQUIRE flag was dropped from the bpf_get_root_mem_cgroup()
kfunc because it returns a global pointer to root_mem_cgroup without
performing any explicit reference counting. This makes it an ideal
candidate to verify the new implicit trusted pointer semantics.
Signed-off-by: Matt Bobrowski <mattbobrowski@google.com>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20260113083949.2502978-3-mattbobrowski@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The current selftest sk_bypass_prot_mem only supports 4K page.
When running with 64K page on arm64, the following failure happens:
...
check_bypass:FAIL:no bypass unexpected no bypass: actual 3 <= expected 32
...
#385/1 sk_bypass_prot_mem/TCP :FAIL
...
check_bypass:FAIL:no bypass unexpected no bypass: actual 4 <= expected 32
...
#385/2 sk_bypass_prot_mem/UDP :FAIL
...
Adding support to 64K page as well fixed the failure.
Cc: Kuniyuki Iwashima <kuniyu@google.com>
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20260113061028.3798326-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
On arm64 with 64K page , I observed the following test failure:
...
subtest_dmabuf_iter_check_lots_of_buffers:FAIL:total_bytes_read unexpected total_bytes_read:
actual 4696 <= expected 65536
#97/3 dmabuf_iter/lots_of_buffers:FAIL
With 4K page on x86, the total_bytes_read is 4593.
With 64K page on arm64, the total_byte_read is 4696.
In progs/dmabuf_iter.c, for each iteration, the output is
BPF_SEQ_PRINTF(seq, "%lu\n%llu\n%s\n%s\n", inode, size, name, exporter);
The only difference between 4K and 64K page is 'size' in
the above BPF_SEQ_PRINTF. The 4K page will output '4096' and
the 64K page will output '65536'. So the total_bytes_read with 64K page
is slighter greater than 4K page.
Adjusting the total_bytes_read from 65536 to 4096 fixed the issue.
Cc: T.J. Mercier <tjmercier@google.com>
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20260113061023.3798085-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add test coverage for the new BPF_F_CPU and BPF_F_ALL_CPUS flags support
in percpu maps. The following APIs are exercised:
* bpf_map_update_batch()
* bpf_map_lookup_batch()
* bpf_map_update_elem()
* bpf_map__update_elem()
* bpf_map_lookup_elem_flags()
* bpf_map__lookup_elem()
For lru_percpu_hash map, set max_entries to
'libbpf_num_possible_cpus() + 1' and only use the first
'libbpf_num_possible_cpus()' entries. This ensures a spare entry is always
available in the LRU free list, avoiding eviction.
When updating an existing key in lru_percpu_hash map:
1. l_new = prealloc_lru_pop(); /* Borrow from free list */
2. l_old = lookup_elem_raw(); /* Found, key exists */
3. pcpu_copy_value(); /* In-place update */
4. bpf_lru_push_free(); /* Return l_new to free list */
Also add negative tests to verify that non-percpu array and hash maps
reject the BPF_F_CPU and BPF_F_ALL_CPUS flags.
Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
Link: https://lore.kernel.org/r/20260107022022.12843-8-leon.hwang@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Update the selftest to check that the metadata size check takes the
xdp_frame size into account in bpf_prog_test_run. The original
check (for meta size 256) was broken because the data frame supplied was
smaller than this, triggering a different EINVAL return. So supply a
larger data frame for this test to make sure we actually exercise the
check we think we are.
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Reviewed-by: Amery Hung <ameryhung@gmail.com>
Link: https://lore.kernel.org/r/20260105114747.1358750-2-toke@redhat.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
With trusted args now being the default, passing NULL to kfunc
parameters that are pointers causes verifier rejection rather than a
runtime error. The test_bpf_nf test was failing because it attempted to
pass NULL to bpf_xdp_ct_lookup() to verify runtime error handling.
Since the NULL check now happens at verification time, remove the
runtime test case that passed NULL to the bpf_tuple parameter and
instead add verification-time tests to ensure the verifier correctly
rejects programs that pass NULL to trusted arguments.
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Link: https://lore.kernel.org/r/20260102180038.2708325-11-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
As arena kfuncs can now be called from non-sleepable contexts, test this
by adding non-sleepable copies of tests in verifier_arena, this is done
by using a socket program instead of syscall.
Add a new test case in verifier_arena_large to check that the
bpf_arena_alloc_pages() works for more than 1024 pages.
1024 * sizeof(struct page *) is the upper limit of kmalloc_nolock() but
bpf_arena_alloc_pages() should still succeed because it re-uses this
array in a loop.
Augment the arena_list selftest to also run in non-sleepable context by
taking rcu_read_lock.
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Link: https://lore.kernel.org/r/20251222195022.431211-5-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add test coverage for the kfuncs that fetch memcg stats. Using some common
stats, test scenarios ensuring that the given stat increases by some
arbitrary amount. The stats selected cover the three categories represented
by the enums: node_stat_item, memcg_stat_item, vm_event_item.
Since only a subset of all stats are queried, use a static struct made up
of fields for each stat. Write to the struct with the fetched values when
the bpf program is invoked and read the fields in the user mode program for
verification.
Signed-off-by: JP Kobryn <inwardvessel@gmail.com>
Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
Link: https://lore.kernel.org/r/20251223044156.208250-6-roman.gushchin@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Currently resolve_btfids updates .BTF_ids section of an ELF file
in-place, based on the contents of provided BTF, usually within the
same input file, and optionally a BTF base.
Change resolve_btfids behavior to enable BTF transformations as part
of its main operation. To achieve this, in-place ELF write in
resolve_btfids is replaced with generation of the following binaries:
* ${1}.BTF with .BTF section data
* ${1}.BTF_ids with .BTF_ids section data if it existed in ${1}
* ${1}.BTF.base with .BTF.base section data for out-of-tree modules
The execution of resolve_btfids and consumption of its output is
orchestrated by scripts/gen-btf.sh introduced in this patch.
The motivation for emitting binary data is that it allows simplifying
resolve_btfids implementation by delegating ELF update to the $OBJCOPY
tool [1], which is already widely used across the codebase.
There are two distinct paths for BTF generation and resolve_btfids
application in the kernel build: for vmlinux and for kernel modules.
For the vmlinux binary a .BTF section is added in a roundabout way to
ensure correct linking. The patch doesn't change this approach, only
the implementation is a little different.
Before this patch it worked as follows:
* pahole consumed .tmp_vmlinux1 [2] and added .BTF section with
llvm-objcopy [3] to it
* then everything except the .BTF section was stripped from .tmp_vmlinux1
into a .tmp_vmlinux1.bpf.o object [2], later linked into vmlinux
* resolve_btfids was executed later on vmlinux.unstripped [4],
updating it in-place
After this patch gen-btf.sh implements the following:
* pahole consumes .tmp_vmlinux1 and produces a *detached* file with
raw BTF data
* resolve_btfids consumes .tmp_vmlinux1 and detached BTF to produce
(potentially modified) .BTF, and .BTF_ids sections data
* a .tmp_vmlinux1.bpf.o object is then produced with objcopy copying
BTF output of resolve_btfids
* .BTF_ids data gets embedded into vmlinux.unstripped in
link-vmlinux.sh by objcopy --update-section
For kernel modules, creating a special .bpf.o file is not necessary,
and so embedding of sections data produced by resolve_btfids is
straightforward with objcopy.
With this patch an ELF file becomes effectively read-only within
resolve_btfids, which allows deleting elf_update() call and satellite
code (like compressed_section_fix [5]).
Endianness handling of .BTF_ids data is also changed. Previously the
"flags" part of the section was bswapped in sets_patch() [6], and then
Elf_Type was modified before elf_update() to signal to libelf that
bswap may be necessary. With this patch we explicitly bswap entire
data buffer on load and on dump.
[1] https://lore.kernel.org/bpf/131b4190-9c49-4f79-a99d-c00fac97fa44@linux.dev/
[2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/scripts/link-vmlinux.sh?h=v6.18#n110
[3] https://git.kernel.org/pub/scm/devel/pahole/pahole.git/tree/btf_encoder.c?h=v1.31#n1803
[4] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/scripts/link-vmlinux.sh?h=v6.18#n284
[5] https://lore.kernel.org/bpf/20200819092342.259004-1-jolsa@kernel.org/
[6] https://lore.kernel.org/bpf/cover.1707223196.git.vmalik@redhat.com/
Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Tested-by: Alan Maguire <alan.maguire@oracle.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/bpf/20251219181825.1289460-3-ihor.solodrai@linux.dev