mirror of
https://github.com/torvalds/linux.git
synced 2026-04-18 06:44:00 -04:00
The atomic use in the main I/O path is causing perf issues when using higher performance backend devices and multiple queues (more than 10 when using vhost-scsi) like with this fio workload: [global] bs=4K iodepth=128 direct=1 ioengine=libaio group_reporting time_based runtime=120 name=standard-iops rw=randread numjobs=16 cpus_allowed=0-15 To fix this issue, move the LUN stats to per CPU. Note: I forgot to include this patch with the delayed/ordered per CPU tracking and per device/device entry per CPU stats. With this patch you get the full 33% improvements when using fast backends, multiple queues and multiple IO submiters. Signed-off-by: Mike Christie <michael.christie@oracle.com> Reviewed-by: Dmitry Bogdanov <d.bogdanov@yadro.com> Link: https://patch.msgid.link/20250917221338.14813-4-michael.christie@oracle.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>