linux

mirror of https://github.com/torvalds/linux.git synced 2026-04-24 01:25:49 -04:00

Author	SHA1	Message	Date
Linus Torvalds	0c00ed308d	Merge tag 'for-7.0/block-20260206' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux Pull block updates from Jens Axboe: - Support for batch request processing for ublk, improving the efficiency of the kernel/ublk server communication. This can yield nice 7-12% performance improvements - Support for integrity data for ublk - Various other ublk improvements and additions, including a ton of selftests additions and updated - Move the handling of blk-crypto software fallback from below the block layer to above it. This reduces the complexity of dealing with bio splitting - Series fixing a number of potential deadlocks in blk-mq related to the queue usage counter and writeback throttling and rq-qos debugfs handling - Add an async_depth queue attribute, to resolve a performance regression that's been around for a qhilw related to the scheduler depth handling - Only use task_work for IOPOLL completions on NVMe, if it is necessary to do so. An earlier fix for an issue resulted in all these completions being punted to task_work, to guarantee that completions were only run for a given io_uring ring when it was local to that ring. With the new changes, we can detect if it's necessary to use task_work or not, and avoid it if possible. - rnbd fixes: - Fix refcount underflow in device unmap path - Handle PREFLUSH and NOUNMAP flags properly in protocol - Fix server-side bi_size for special IOs - Zero response buffer before use - Fix trace format for flags - Add .release to rnbd_dev_ktype - MD pull requests via Yu Kuai - Fix raid5_run() to return error when log_init() fails - Fix IO hang with degraded array with llbitmap - Fix percpu_ref not resurrected on suspend timeout in llbitmap - Fix GPF in write_page caused by resize race - Fix NULL pointer dereference in process_metadata_update - Fix hang when stopping arrays with metadata through dm-raid - Fix any_working flag handling in raid10_sync_request - Refactor sync/recovery code path, improve error handling for badblocks, and remove unused recovery_disabled field - Consolidate mddev boolean fields into mddev_flags - Use mempool to allocate stripe_request_ctx and make sure max_sectors is not less than io_opt in raid5 - Fix return value of mddev_trylock - Fix memory leak in raid1_run() - Add Li Nan as mdraid reviewer - Move phys_vec definitions to the kernel types, mostly in preparation for some VFIO and RDMA changes - Improve the speed for secure erase for some devices - Various little rust updates - Various other minor fixes, improvements, and cleanups * tag 'for-7.0/block-20260206' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux: (162 commits) blk-mq: ABI/sysfs-block: fix docs build warnings selftests: ublk: organize test directories by test ID block: decouple secure erase size limit from discard size limit block: remove redundant kill_bdev() call in set_blocksize() blk-mq: add documentation for new queue attribute async_dpeth block, bfq: convert to use request_queue->async_depth mq-deadline: covert to use request_queue->async_depth kyber: covert to use request_queue->async_depth blk-mq: add a new queue sysfs attribute async_depth blk-mq: factor out a helper blk_mq_limit_depth() blk-mq-sched: unify elevators checking for async requests block: convert nr_requests to unsigned int block: don't use strcpy to copy blockdev name blk-mq-debugfs: warn about possible deadlock blk-mq-debugfs: add missing debugfs_mutex in blk_mq_debugfs_register_hctxs() blk-mq-debugfs: remove blk_mq_debugfs_unregister_rqos() blk-mq-debugfs: make blk_mq_debugfs_register_rqos() static blk-rq-qos: fix possible debugfs_mutex deadlock blk-mq-debugfs: factor out a helper to register debugfs for all rq_qos blk-wbt: fix possible deadlock to nest pcpu_alloc_mutex under q_usage_counter ...	2026-02-09 17:57:21 -08:00
Ming Lei	e07a2039b6	selftests: ublk: add selftest for UBLK_F_NO_AUTO_PART_SCAN Add test_part_01.sh to test the UBLK_F_NO_AUTO_PART_SCAN feature flag which allows suppressing automatic partition scanning during device startup while still allowing manual partition probing. The test verifies: - Normal behavior: partitions are auto-detected without the flag - With flag: partitions are not auto-detected during START_DEV - Manual scan: blockdev --rereadpt works with the flag Also update kublk tool to support --no_auto_part_scan option and recognize the feature flag. Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-01-31 06:36:41 -07:00
Ming Lei	e8cd481cc6	selftests: ublk: support arbitrary threads/queues combination Enable flexible thread-to-queue mapping in batch I/O mode to support arbitrary combinations of threads and queues, improving resource utilization and scalability. Key improvements: - Support N:M thread-to-queue mapping (previously limited to 1:1) - Dynamic buffer allocation based on actual queue assignment per thread - Thread-safe queue preparation with spinlock protection - Intelligent buffer index calculation for multi-queue scenarios - Enhanced validation for thread/queue combination constraints Implementation details: - Add q_thread_map matrix to track queue-to-thread assignments - Dynamic allocation of commit and fetch buffers per thread - Round-robin queue assignment algorithm for load balancing - Per-queue spinlock to prevent race conditions during prep - Updated buffer index calculation using queue position within thread This enables efficient configurations like: - Any other N:M combinations for optimal resource matching Testing: - Added test_batch_02.sh: 4 threads vs 1 queue - Added test_batch_03.sh: 1 thread vs 4 queues - Validates correctness across different mapping scenarios Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-01-22 20:05:41 -07:00
Ming Lei	20aeab0b08	selftests: ublk: add --batch/-b for enabling F_BATCH_IO Add --batch/-b for enabling F_BATCH_IO. Add batch_01 for covering its basic function. Add stress_08 and stress_09 for covering stress test. Add recovery test for F_BATCH_IO in generic_04 and generic_05. Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-01-22 20:05:41 -07:00
Ming Lei	cb5a6b3087	selftests: ublk: handle UBLK_U_IO_FETCH_IO_CMDS Add support for UBLK_U_IO_FETCH_IO_CMDS to enable efficient batch fetching of I/O commands using multishot io_uring operations. Key improvements: - Implement multishot UBLK_U_IO_FETCH_IO_CMDS for continuous command fetching - Add fetch buffer management with page-aligned, mlocked buffers - Process fetched I/O command tags from kernel-provided buffers - Integrate fetch operations with existing batch I/O infrastructure - Significantly reduce uring_cmd issuing overhead through batching The implementation uses two fetch buffers per thread with automatic requeuing to maintain continuous I/O command flow. Each fetch operation retrieves multiple command tags in a single syscall, dramatically improving performance compared to individual command fetching. Technical details: - Fetch buffers are page-aligned and mlocked for optimal performance - Uses IORING_URING_CMD_MULTISHOT for continuous operation - Automatic buffer management and requeuing on completion - Enhanced CQE handling for fetch command completions Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-01-22 20:05:41 -07:00
Ming Lei	dee7024ffe	selftests: ublk: handle UBLK_U_IO_COMMIT_IO_CMDS Implement UBLK_U_IO_COMMIT_IO_CMDS to enable efficient batched completion of I/O operations in the batch I/O framework. This completes the batch I/O infrastructure by adding the commit phase that notifies the kernel about completed I/O operations: Key features: - Batch multiple I/O completions into single UBLK_U_IO_COMMIT_IO_CMDS - Dynamic commit buffer allocation and management per thread - Automatic commit buffer preparation before processing events - Commit buffer submission after processing completed I/Os - Integration with existing completion workflows Implementation details: - ublk_batch_prep_commit() allocates and initializes commit buffers - ublk_batch_complete_io() adds completed I/Os to current batch - ublk_batch_commit_io_cmds() submits batched completions to kernel - Modified ublk_process_io() to handle batch commit lifecycle - Enhanced ublk_complete_io() to route to batch or legacy completion The commit buffer stores completion information (tag, result, buffer details) for multiple I/Os, then submits them all at once, significantly reducing syscall overhead compared to individual I/O completions. Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-01-22 20:05:41 -07:00
Ming Lei	d468930a01	selftests: ublk: handle UBLK_U_IO_PREP_IO_CMDS Implement support for UBLK_U_IO_PREP_IO_CMDS in the batch I/O framework: - Add batch command initialization and setup functions - Implement prep command queueing with proper buffer management - Add command completion handling for prep and commit commands - Integrate batch I/O setup into thread initialization - Update CQE handling to support batch commands The implementation uses the previously established buffer management infrastructure to queue UBLK_U_IO_PREP_IO_CMDS commands. Commands are prepared in the first thread context and use commit buffers for efficient command batching. Key changes: - ublk_batch_queue_prep_io_cmds() prepares I/O command batches - ublk_batch_compl_cmd() handles batch command completions - Modified thread setup to use batch operations when enabled - Enhanced buffer index calculation for batch mode Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-01-22 20:05:41 -07:00
Ming Lei	dccbfa9d41	selftests: ublk: add batch buffer management infrastructure Add the foundational infrastructure for UBLK_F_BATCH_IO buffer management including: - Allocator utility functions for small sized per-thread allocation - Batch buffer allocation and deallocation functions - Buffer index management for commit buffers - Thread state management for batch I/O mode - Buffer size calculation based on device features This prepares the groundwork for handling batch I/O commands by establishing the buffer management layer needed for UBLK_U_IO_PREP_IO_CMDS and UBLK_U_IO_COMMIT_IO_CMDS operations. The allocator uses CPU sets for efficient per-thread buffer tracking, and commit buffers are pre-allocated with 2 buffers per thread to handle overlapping command operations. Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-01-22 20:05:41 -07:00
Ming Lei	f1d621b5a0	selftests: ublk: add ublk_io_buf_idx() for returning io buffer index Since UBLK_F_PER_IO_DAEMON is added, io buffer index may depend on current thread because the common way is to use per-pthread io_ring_ctx for issuing ublk uring_cmd. Add one helper for returning io buffer index, so we can hide the buffer index implementation details for target code. Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-01-22 20:05:41 -07:00
Ming Lei	584709ad5c	selftests: ublk: replace assert() with ublk_assert() Replace assert() with ublk_assert() since it is often triggered in daemon, and we may get nothing shown in terminal. Add ublk_assert(), so we can log something to syslog when assert() is triggered. Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-01-22 20:05:41 -07:00
Ming Lei	e7e1cc18f1	selftests/ublk: fix garbage output in foreground mode Initialize _evtfd to -1 in struct dev_ctx to prevent garbage output when running kublk in foreground mode. Without this, _evtfd is zero-initialized to 0 (stdin), and ublk_send_dev_event() writes binary data to stdin which appears as garbage on the terminal. Also fix debug message format string. Fixes: `6aecda00b7` ("selftests: ublk: add kernel selftests for ublk") Signed-off-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Caleb Sander Mateos <csander@purestorage.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-01-21 07:54:55 -07:00
Ming Lei	23e62cf755	selftests/ublk: fix error handling for starting device Fix error handling in ublk_start_daemon() when start_dev fails: 1. Call ublk_ctrl_stop_dev() to cancel inflight uring_cmd before cleanup. Without this, the device deletion may hang waiting for I/O completion that will never happen. 2. Add fail_start label so that pthread_join() is called on the error path. This ensures proper thread cleanup when startup fails. Fixes: `6aecda00b7` ("selftests: ublk: add kernel selftests for ublk") Signed-off-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Caleb Sander Mateos <csander@purestorage.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-01-21 07:54:55 -07:00
Ming Lei	75aad5ffe0	selftests/ublk: fix IO thread idle check Include cmd_inflight in ublk_thread_is_done() check. Without this, the thread may exit before all FETCH commands are completed, which may cause device deletion to hang. Fixes: `6aecda00b7` ("selftests: ublk: add kernel selftests for ublk") Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-01-21 07:54:55 -07:00
Ming Lei	65955a0993	selftests: ublk: add stop command with --safe option Add 'stop' subcommand to kublk utility that uses the new UBLK_CMD_TRY_STOP_DEV command when --safe option is specified. This allows stopping a device only if it has no active openers, returning -EBUSY otherwise. Also add test_generic_16.sh to test the new functionality. Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-01-12 15:07:31 -07:00
Caleb Sander Mateos	24f8a44b79	selftests: ublk: implement integrity user copy in kublk If integrity data is enabled for kublk, allocate an integrity buffer for each I/O. Extend ublk_user_copy() to copy the integrity data between the ublk request and the integrity buffer if the ublksrv_io_desc indicates that the request has integrity data. Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-01-12 09:16:38 -07:00
Caleb Sander Mateos	6ed6476c4a	selftests: ublk: add kublk support for integrity params Add integrity param command line arguments to kublk. Plumb these to struct ublk_params for the null and fault_inject targets, as they don't need to actually read or write the integrity data. Forbid the integrity params for loop or stripe until the integrity data copy is implemented. Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-01-12 09:16:38 -07:00
Caleb Sander Mateos	c1d7c0f9cd	selftests: ublk: display UBLK_F_INTEGRITY support Add support for printing the UBLK_F_INTEGRITY feature flag in the human-readable kublk features output. Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2026-01-12 09:16:38 -07:00
Caleb Sander Mateos	b9f0a94c3b	selftests: ublk: add support for user copy to kublk The ublk selftests mock ublk server kublk supports every data copy mode except user copy. Add support for user copy to kublk, enabled via the --user_copy (-u) command line argument. On writes, issue pread() calls to copy the write data into the ublk_io's buffer before dispatching the write to the target implementation. On reads, issue pwrite() calls to copy read data from the ublk_io's buffer before committing the request. Copy in 2 KB chunks to provide some coverage of the offseting logic. Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-12-12 12:50:41 -07:00
Caleb Sander Mateos	52bc483763	selftests: ublk: forbid multiple data copy modes The kublk mock ublk server allows multiple data copy mode arguments to be passed on the command line (--zero_copy, --get_data, and --auto_zc). The ublk device will be created with all the requested feature flags, however kublk will only use one of the modes to interact with request data (arbitrarily preferring auto_zc over zero_copy over get_data). To clarify the intent of the test, don't allow multiple data copy modes to be specified. --zero_copy and --auto_zc are allowed together for --auto_zc_fallback, which uses both copy modes. Don't set UBLK_F_USER_COPY for zero_copy, as it's a separate feature. Fix the test cases in test_stress_05 passing --get_data along with --zero_copy or --auto_zc. Signed-off-by: Caleb Sander Mateos <csander@purestorage.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-12-12 12:50:41 -07:00
Ming Lei	3f5b1169d2	selftests: ublk: make ublk_thread thread-local variable Refactor ublk_thread to be a thread-local variable instead of storing it in ublk_dev: - Remove pthread_t thread field from struct ublk_thread and move it to struct ublk_thread_info - Remove struct ublk_thread array from struct ublk_dev, reducing memory footprint - Define struct ublk_thread as local variable in __ublk_io_handler_fn() instead of accessing it from dev->threads[] - Extract main IO handling logic into __ublk_io_handler_fn() which is marked as noinline - Move CPU affinity setup to ublk_io_handler_fn() before calling __ublk_io_handler_fn() - Update ublk_thread_set_sched_affinity() to take struct ublk_thread_info * instead of struct ublk_thread *, and use pthread_setaffinity_np() instead of sched_setaffinity() - Reorder struct ublk_thread fields to group related state together This change makes each thread's ublk_thread structure truly local to the thread, improving cache locality and reducing memory usage. Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-11-03 08:34:59 -07:00
Ming Lei	0123bb91f4	selftests: ublk: set CPU affinity before thread initialization Move ublk_thread_set_sched_affinity() call before ublk_thread_init() to ensure memory allocations during thread initialization occur on the correct NUMA node. This leverages Linux's first-touch memory policy for better NUMA locality. Also convert ublk_thread_set_sched_affinity() to use pthread_setaffinity_np() instead of sched_setaffinity(), as the pthread API is the proper interface for setting thread affinity in multithreaded programs. Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-11-03 08:34:59 -07:00
Linus Torvalds	e1b1d03cee	Merge tag 'for-6.18/block-20250929' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux Pull block updates from Jens Axboe: - NVMe pull request via Keith: - FC target fixes (Daniel) - Authentication fixes and updates (Martin, Chris) - Admin controller handling (Kamaljit) - Target lockdep assertions (Max) - Keep-alive updates for discovery (Alastair) - Suspend quirk (Georg) - MD pull request via Yu: - Add support for a lockless bitmap. A key feature for the new bitmap are that the IO fastpath is lockless. If a user issues lots of write IO to the same bitmap bit in a short time, only the first write has additional overhead to update bitmap bit, no additional overhead for the following writes. By supporting only resync or recover written data, means in the case creating new array or replacing with a new disk, there is no need to do a full disk resync/recovery. - Switch ->getgeo() and ->bios_param() to using struct gendisk rather than struct block_device. - Rust block changes via Andreas. This series adds configuration via configfs and remote completion to the rnull driver. The series also includes a set of changes to the rust block device driver API: a few cleanup patches, and a few features supporting the rnull changes. The series removes the raw buffer formatting logic from `kernel::block` and improves the logic available in `kernel::string` to support the same use as the removed logic. - floppy arch cleanups - Reduce the number of dereferencing needed for ublk commands - Restrict supported sockets for nbd. Mostly done to eliminate a class of issues perpetually reported by syzbot, by using nonsensical socket setups. - A few s390 dasd block fixes - Fix a few issues around atomic writes - Improve DMA interation for integrity requests - Improve how iovecs are treated with regards to O_DIRECT aligment constraints. We used to require each segment to adhere to the constraints, now only the request as a whole needs to. - Clean up and improve p2p support, enabling use of p2p for metadata payloads - Improve locking of request lookup, using SRCU where appropriate - Use page references properly for brd, avoiding very long RCU sections - Fix ordering of recursively submitted IOs - Clean up and improve updating nr_requests for a live device - Various fixes and cleanups * tag 'for-6.18/block-20250929' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux: (164 commits) s390/dasd: enforce dma_alignment to ensure proper buffer validation s390/dasd: Return BLK_STS_INVAL for EINVAL from do_dasd_request ublk: remove redundant zone op check in ublk_setup_iod() nvme: Use non zero KATO for persistent discovery connections nvmet: add safety check for subsys lock nvme-core: use nvme_is_io_ctrl() for I/O controller check nvme-core: do ioccsz/iorcsz validation only for I/O controllers nvme-core: add method to check for an I/O controller blk-cgroup: fix possible deadlock while configuring policy blk-mq: fix null-ptr-deref in blk_mq_free_tags() from error path blk-mq: Fix more tag iteration function documentation selftests: ublk: fix behavior when fio is not installed ublk: don't access ublk_queue in ublk_unmap_io() ublk: pass ublk_io to __ublk_complete_rq() ublk: don't access ublk_queue in ublk_need_complete_req() ublk: don't access ublk_queue in ublk_check_commit_and_fetch() ublk: don't pass ublk_queue to ublk_fetch() ublk: don't access ublk_queue in ublk_config_io_buf() ublk: don't access ublk_queue in ublk_check_fetch_buf() ublk: pass q_id and tag to __ublk_check_and_get_req() ...	2025-10-02 10:16:56 -07:00
Uday Shankar	742bcc1101	selftests: ublk: kublk: add UBLK_F_BUF_REG_OFF_DAEMON to feat_map When UBLK_F_BUF_REG_OFF_DAEMON was added, we missed updating kublk's feat_map, which results in the feature being reported as "unknown." Add UBLK_F_BUF_REG_OFF_DAEMON to feat_map to fix this. Signed-off-by: Uday Shankar <ushankar@purestorage.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-09-19 11:06:15 -06:00
Uday Shankar	1f924cf781	selftests: ublk: kublk: simplify feat_map definition Simplify the definition of feat_map by introducing a helper macro FEAT_NAME to avoid having to type the feature name twice. As a side effect, this changes the names in the feature list to be the full macro name instead of the abbreviated names that were used before, but this is a good change for clarity. Using the full feature macro names ruins the alignment of the output, so change the output format to put each feature's hex value before its name, as this is easier to align nicely. The output now looks as follows: root# ./kublk features ublk_drv features: 0x7fff 0x1 : UBLK_F_SUPPORT_ZERO_COPY 0x2 : UBLK_F_URING_CMD_COMP_IN_TASK 0x4 : UBLK_F_NEED_GET_DATA 0x8 : UBLK_F_USER_RECOVERY 0x10 : UBLK_F_USER_RECOVERY_REISSUE 0x20 : UBLK_F_UNPRIVILEGED_DEV 0x40 : UBLK_F_CMD_IOCTL_ENCODE 0x80 : UBLK_F_USER_COPY 0x100 : UBLK_F_ZONED 0x200 : UBLK_F_USER_RECOVERY_FAIL_IO 0x400 : UBLK_F_UPDATE_SIZE 0x800 : UBLK_F_AUTO_BUF_REG 0x1000 : UBLK_F_QUIESCE 0x2000 : UBLK_F_PER_IO_DAEMON 0x4000 : unknown Signed-off-by: Uday Shankar <ushankar@purestorage.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-09-19 11:06:15 -06:00
Ming Lei	9b2785ea85	ublk selftests: add --no_ublk_fixed_fd for not using registered ublk char device Add a new command line option --no_ublk_fixed_fd that excludes the ublk control device (/dev/ublkcN) from io_uring's registered files array. When this option is used, only backing files are registered starting from index 1, while the ublk control device is accessed using its raw file descriptor. Add ublk_get_registered_fd() helper function that returns the appropriate file descriptor for use with io_uring operations. Key optimizations implemented: - Cache UBLKS_Q_NO_UBLK_FIXED_FD flag in ublk_queue.flags to avoid reading dev->no_ublk_fixed_fd in fast path - Cache ublk char device fd in ublk_queue.ublk_fd for fast access - Update ublk_get_registered_fd() to use ublk_queue * parameter - Update io_uring_prep_buf_register/unregister() to use ublk_queue * - Replace ublk_device * access with ublk_queue * access in fast paths Also pass --no_ublk_fixed_fd to test_stress_04.sh for covering plain ublk char device mode. Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20250827121602.2619736-3-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-08-28 07:56:57 -06:00
Akhilesh Patil	0227af355b	selftests: ublk: Use ARRAY_SIZE() macro to improve code Use ARRAY_SIZE() macro while calculating size of an array to improve code readability and reduce potential sizing errors. Implement this suggestion given by spatch tool by running coccinelle script - scripts/coccinelle/misc/array_size.cocci Follow ARRAY_SIZE() macro usage pattern in ublk.c introduced by, commit `ec12009318` ("selftests: ublk: fix ublk_find_tgt()") wherever appropriate to maintain consistency. Signed-off-by: Akhilesh Patil <akhilesh@ee.iitb.ac.in> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/aKGihYui6/Pcijbk@bhairav-test.ee.iitb.ac.in Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-08-18 05:36:29 -06:00
Ming Lei	c1dc9b0d9e	selftests: ublk: add helper ublk_handle_uring_cmd() for handle ublk command Add helper ublk_handle_uring_cmd() for handling ublk command, and make ublk_handle_cqe() more readable. Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20250713143415.2857561-17-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-07-15 08:04:17 -06:00
Ming Lei	a66f890176	selftests: ublk: improve flags naming Improve all kinds of flags naming by adding its host structure suffix for making code more readable. Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20250713143415.2857561-16-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-07-15 08:04:17 -06:00
Ming Lei	c3a6d48f86	selftests: ublk: remove ublk queue self-defined flags Remove ublk queue self-defined flags, and use the uapi flags directly. Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20250713143415.2857561-15-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-07-15 08:04:17 -06:00
Ming Lei	92dda98424	selftests: ublk: pass 'ublk_thread ' to more common helpers Pass 'ublk_thread ' to more common helpers, then we can avoid to store this reference into 'struct ublk_io'. Prepare for supporting to handle IO via different task context. Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20250713143415.2857561-14-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-07-15 08:04:17 -06:00
Ming Lei	e0054835bf	selftests: ublk: pass 'ublk_thread ' to ->queue_io() and ->tgt_io_done() 'struct thread' is task local structure, and the related code will become more readable if we pass it via parameter. Meantime pass 'ublk_thread ' to ublk_io_alloc_sqes(), and this way is natural since we use per-thread io_uring for handling IO. More importantly it helps much for removing the current ubq_daemon or per-io-task limit. Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20250713143415.2857561-13-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-07-15 08:04:17 -06:00
Ming Lei	b36c73251a	selftests: ublk: remove `tag` parameter of ->tgt_io_done() The `tag` parameter can be figured out from cqe->user_data, and that is also the only way to get the info, so remove `tag` parameter, and let target code retrieve it from cqe->user_data. Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20250713143415.2857561-12-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-07-15 08:04:17 -06:00
Uday Shankar	a2f4c1ae16	selftests: ublk: kublk: improve behavior on init failure Some failure modes are handled poorly by kublk. For example, if ublk_drv is built as a module but not currently loaded into the kernel, ./kublk add ... just hangs forever. This happens because in this case (and a few others), the worker process does not notify its parent (via a write to the shared eventfd) that it has tried and failed to initialize, so the parent hangs forever. Fix this by ensuring that we always notify the parent process of any initialization failure, and have the parent print a (not very descriptive) log line when this happens. Signed-off-by: Uday Shankar <ushankar@purestorage.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20250603-ublk_init_fail-v1-1-87c91486230e@purestorage.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-06-03 20:19:44 -06:00
Uday Shankar	abe54c1603	selftests: ublk: kublk: decouple ublk_queues from ublk server threads Add support in kublk for decoupled ublk_queues and ublk server threads. kublk now has two modes of operation: - (preexisting mode) threads and queues are paired 1:1, and each thread services all the I/Os of one queue - (new mode) thread and queue counts are independently configurable. threads service I/Os in a way that balances load across threads even if load is not balanced over queues. The default is the preexisting mode. The new mode is activated by passing the --per_io_tasks flag. Signed-off-by: Uday Shankar <ushankar@purestorage.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20250529-ublk_task_per_io-v8-6-e9d3b119336a@purestorage.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-05-31 14:38:43 -06:00
Uday Shankar	b9848ca7a7	selftests: ublk: kublk: move per-thread data out of ublk_queue Towards the goal of decoupling ublk_queues from ublk server threads, move resources/data that should be per-thread rather than per-queue out of ublk_queue and into a new struct ublk_thread. Signed-off-by: Uday Shankar <ushankar@purestorage.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20250529-ublk_task_per_io-v8-5-e9d3b119336a@purestorage.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-05-31 14:38:39 -06:00
Uday Shankar	8f75ba28b8	selftests: ublk: kublk: lift queue initialization out of thread Currently, each ublk server I/O handler thread initializes its own queue. However, as we move towards decoupled ublk_queues and ublk server threads, this model does not make sense anymore, as there will no longer be a concept of a thread having "its own" queue. So lift queue initialization out of the per-thread ublk_io_handler_fn and into a loop in ublk_start_daemon (which runs once for each device). There is a part of ublk_queue_init (ring initialization) which does actually need to happen on the thread that will use the ring; that is separated into a separate ublk_thread_init which is still called by each I/O handler thread. Signed-off-by: Uday Shankar <ushankar@purestorage.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20250529-ublk_task_per_io-v8-4-e9d3b119336a@purestorage.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-05-31 14:38:35 -06:00
Uday Shankar	9773709752	selftests: ublk: kublk: tie sqe allocation to io instead of queue We currently have a helper ublk_queue_alloc_sqes which the ublk targets use to allocate SQEs for their own operations. However, as we move towards decoupled ublk_queues and ublk server threads, this helper does not make sense anymore. SQEs are allocated from rings, and we will have one ring per thread to avoid locking. Change the SQE allocation helper to ublk_io_alloc_sqes. Currently this still allocates SQEs from the io's queue's ring, but when we fully decouple threads and queues, it will allocate from the io's thread's ring instead. Signed-off-by: Uday Shankar <ushankar@purestorage.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20250529-ublk_task_per_io-v8-3-e9d3b119336a@purestorage.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-05-31 14:38:30 -06:00
Uday Shankar	bf098d7269	selftests: ublk: kublk: plumb q_id in io_uring user_data Currently, when we process CQEs, we know which ublk_queue we are working on because we know which ring we are working on, and ublk_queues and rings are in 1:1 correspondence. However, as we decouple ublk_queues from ublk server threads, ublk_queues and rings will no longer be in 1:1 correspondence - each ublk server thread will have a ring, and each thread may issue commands against more than one ublk_queue. So in order to know which ublk_queue a CQE refers to, plumb that information in the associated SQE's user_data. Signed-off-by: Uday Shankar <ushankar@purestorage.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20250529-ublk_task_per_io-v8-2-e9d3b119336a@purestorage.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-05-31 14:38:26 -06:00
Ming Lei	533c87e2ed	selftests: ublk: add test for UBLK_F_QUIESCE Add test generic_11 for covering new control command of UBLK_U_CMD_QUIESCE_DEV. Add 'quiesce -n dev_id' sub-command on ublk utility for transitioning device state to quiesce states, then verify the feature via generic_10 by doing quiesce and recovery. Cc: Yoav Cohen <yoav@nvidia.com> Link: https://lore.kernel.org/linux-block/DM4PR12MB632807AB7CDCE77D1E5AB7D0A9B92@DM4PR12MB6328.namprd12.prod.outlook.com/ Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20250522163523.406289-4-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-05-23 09:42:12 -06:00
Ming Lei	f40b1f2670	selftests: ublk: add test case for UBLK_U_CMD_UPDATE_SIZE Add test generic_10 for covering new control command of UBLK_U_CMD_UPDATE_SIZE. Add 'update_size -s\|--size size_in_bytes' sub-command on ublk utility for supporting this feature, then verify the feature via generic_10. Cc: Omri Mann <omri@nvidia.com> Cc: Jared Holzman <jholzman@nvidia.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20250522163523.406289-2-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-05-23 09:42:12 -06:00
Ming Lei	6f1a182a87	selftests: ublk: add test for covering UBLK_AUTO_BUF_REG_FALLBACK Add test for covering UBLK_AUTO_BUF_REG_FALLBACK: - pass '--auto_zc_fallback' to null target, which requires both F_AUTO_BUF_REG and F_SUPPORT_ZERO_COPY for handling UBLK_AUTO_BUF_REG_FALLBACK - add ->buf_index() method for returning invalid buffer index to trigger UBLK_AUTO_BUF_REG_FALLBACK - add generic_09 for running the test - add --auto_zc_fallback test in stress_03/stress_04/stress_05 Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20250520045455.515691-7-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-05-20 10:24:45 -06:00
Ming Lei	8ccebc19ee	selftests: ublk: support UBLK_F_AUTO_BUF_REG Enable UBLK_F_AUTO_BUF_REG support for ublk utility by argument `--auto_zc`, meantime support this feature in null, loop and stripe target code. Add function test generic_08 for covering basic UBLK_F_AUTO_BUF_REG feature. Also cover UBLK_F_AUTO_BUF_REG in stress_03, stress_04 and stress_05 test too. 'fio/t/io_uring -p0 /dev/ublkb0' shows that F_AUTO_BUF_REG can improve IOPS by 50% compared with F_SUPPORT_ZERO_COPY in my test VM. Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20250520045455.515691-6-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-05-20 10:24:45 -06:00
Ming Lei	730d837979	selftests: ublk: fix UBLK_F_NEED_GET_DATA Commit `57e13a2e8c` ("selftests: ublk: support user recovery") starts to support UBLK_F_NEED_GET_DATA for covering recovery feature, however the ublk utility implementation isn't done correctly. Fix it by supporting UBLK_F_NEED_GET_DATA correctly. Also add test generic_07 for covering UBLK_F_NEED_GET_DATA. Reviewed-by: Caleb Sander Mateos <csander@purestorage.com> Fixes: `57e13a2e8c` ("selftests: ublk: support user recovery") Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20250429022941.1718671-2-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-04-29 06:01:36 -06:00
Ming Lei	5533bc70ae	selftests: ublk: fix recover test When adding recovery test: - 'break' is missed for handling '-g' argument - test name of test_generic_05.sh is wrong So fix the two. Fixes: `57e13a2e8c` ("selftests: ublk: support user recovery") Signed-off-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Uday Shankar <ushankar@purestorage.com> Link: https://lore.kernel.org/r/20250421235947.715272-2-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-04-23 13:58:55 -06:00
Uday Shankar	81586652bb	selftests: ublk: add generic_06 for covering fault inject Add one simple fault inject target, and verify if an application using ublk device sees an I/O error quickly after the ublk server dies. Signed-off-by: Uday Shankar <ushankar@purestorage.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20250416035444.99569-9-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-04-16 19:33:21 -06:00
Ming Lei	57e13a2e8c	selftests: ublk: support user recovery Add user recovery feature. Meantime add user recovery test: generic_04 and generic_05(zero copy) Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20250412023035.2649275-12-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-04-16 19:32:18 -06:00
Ming Lei	810b88f3dc	selftests: ublk: support target specific command line Support target specific command line for making related command line code handling more readable & clean. Also helps for adding new features. Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20250412023035.2649275-11-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-04-16 19:32:18 -06:00
Ming Lei	6c62fd04e8	selftests: ublk: increase max nr_queues and queue depth Increase max nr_queues to 32, and queue depth to 1024. Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20250412023035.2649275-10-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-04-16 19:32:18 -06:00
Ming Lei	2f0a692a93	selftests: ublk: set queue pthread's cpu affinity In NUMA machine, ublk IO performance is very sensitive with queue pthread's affinity setting. Retrieve queue's affinity and select the 1st cpu as queue thread's sched affinity, and it is observed that single cpu task affinity can get stable & good performance if client application is put on proper cpu. Dump this info when adding one ublk device. Use shmem to communicate queue's tid between parent and daemon. Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20250412023035.2649275-9-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-04-16 19:32:18 -06:00
Ming Lei	62867a046a	selftests: ublk: setup ring with IORING_SETUP_SINGLE_ISSUER/IORING_SETUP_DEFER_TASKRUN It is observed that this way is more efficient for fast nvme backing file. Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20250412023035.2649275-8-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2025-04-16 19:32:18 -06:00

1 2

65 Commits