Implement a PCI target driver using the PCI endpoint framework. This requires hardware with a PCI controller capable of executing in endpoint mode.

The PCI endpoint framework is used to set up a PCI endpoint function and its BAR compatible with an NVMe PCI controller. The framework is also used to map local memory to the PCI address space to execute MMIO accesses for retrieving NVMe commands from submission queues and posting completion entries to completion queues. If supported, DMA is used for command retrieval and command data transfers, based on the PCI address segments indicated by the command using either PRPs or SGLs.

The NVMe target driver relies on the NVMe target core code to execute all commands issued by the host. The PCI target driver is mainly responsible for the following:

- Initialize and tear down the endpoint device and its backend PCI target controller. The PCI target controller is created using a subsystem and a port defined through configfs. The port used must be initialized with the "pci" transport type. The target controller is allocated and initialized when the PCI endpoint is started by binding it to the endpoint PCI device (nvmet_pci_epf_epc_init() function).

- Manage the endpoint controller state according to the PCI link state and the actions of the host (e.g. checking the CC.EN register), and propagate these actions to the PCI target controller. Polling of the controller enable/disable state is done using a delayed work scheduled every 5ms (nvmet_pci_epf_poll_cc() function); a rough sketch of this polling pattern follows this message. This work is started whenever the PCI link comes up (nvmet_pci_epf_link_up() notifier function) and stopped when the PCI link comes down (nvmet_pci_epf_link_down() notifier function). nvmet_pci_epf_poll_cc() enables and disables the PCI controller using the functions nvmet_pci_epf_enable_ctrl() and nvmet_pci_epf_disable_ctrl(). The controller admin queue is created using nvmet_pci_epf_create_cq(), which calls nvmet_cq_create(), and nvmet_pci_epf_create_sq(), which uses nvmet_sq_create(). nvmet_pci_epf_disable_ctrl() always resets the PCI controller to its initial state so that nvmet_pci_epf_enable_ctrl() can be called again. This ensures correct operation if, for instance, the host reboots, causing the PCI link to be temporarily down.

- Manage the controller admin and I/O submission queues using local memory. Commands are obtained from the submission queues using a work item that constantly polls the doorbells of all submission queues (nvmet_pci_epf_poll_sqs() function). This work is started whenever the controller is enabled (nvmet_pci_epf_enable_ctrl() function) and stopped when the controller is disabled (nvmet_pci_epf_disable_ctrl() function). When new commands are submitted by the host, DMA transfers are used to retrieve the commands.

- Initiate the execution of all admin and I/O commands using the target core code, by calling a request's execute() function. All commands are individually handled using a per-command work item (nvmet_pci_epf_iod_work() function); a rough sketch of this per-command flow also follows this message. A command's overall execution includes: initializing a struct nvmet_req request for the command, using nvmet_req_transfer_len() to get the command data transfer length, parsing the command PRPs or SGLs to get the PCI address segments of the command data buffer, retrieving data from the host (if the command is a write command), calling req->execute() to execute the command, and transferring data to the host (for read commands).

- Handle the completions of commands as notified by the ->queue_response() operation of the PCI target controller (nvmet_pci_epf_queue_response() function). Completed commands are added to a list of completed commands for their CQ. Each CQ's list of completed commands is processed using a work item (nvmet_pci_epf_cq_work() function), which posts entries for the completed commands in the CQ memory and raises an IRQ to the host to signal the completion. IRQ coalescing is supported, as mandated by the NVMe base specification for PCI controllers. Of note is that completion entries are transmitted to the host using MMIO, after mapping the completion queue memory to the host PCI address space. Unlike for retrieving commands from SQs, DMA is not used, as it degrades performance due to the transfer serialization needed (which delays the transmission of completion entries).

The configuration of an NVMe PCI endpoint controller is done using configfs. First, the NVMe PCI target controller configuration must be done to set up a subsystem and a port with the "pci" addr_trtype attribute. The subsystem can be set up using a file or block device backed namespace, or using a passthrough NVMe device. After this, the PCI endpoint can be configured and bound to the PCI endpoint controller to start the NVMe endpoint controller.

In order to not overcomplicate this initial implementation of an endpoint PCI target controller driver, protection information is not supported for now. If the PCI controller port and namespace are configured with protection information support, an error will be returned when the controller is created and initialized when the endpoint function is started. Protection information support will be added in a follow-up patch series.

Using a Rock5B board (Rockchip RK3588 SoC, PCI Gen3x4 endpoint controller) with a target PCI controller set up with 4 I/O queues and a null_blk block device as a namespace, the maximum performance measured with fio was 131 KIOPS for random 4K reads and up to 2.8 GB/s of throughput. Some data points are:

Rnd read, 4KB, QD=1, 1 job   : IOPS=16.9k, BW=66.2MiB/s (69.4MB/s)
Rnd read, 4KB, QD=32, 1 job  : IOPS=78.5k, BW=307MiB/s (322MB/s)
Rnd read, 4KB, QD=32, 4 jobs : IOPS=131k, BW=511MiB/s (536MB/s)
Seq read, 512KB, QD=32, 1 job: IOPS=5381, BW=2691MiB/s (2821MB/s)

The NVMe PCI endpoint target driver is not intended for production use. It is a tool for learning NVMe, exploring existing features, and testing implementations of new NVMe features.

Co-developed-by: Rick Wertenbroek <rick.wertenbroek@gmail.com>
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Tested-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Reviewed-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Signed-off-by: Keith Busch <kbusch@kernel.org>
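The following is a rough, illustrative sketch of the controller enable/disable polling described above. Only the workqueue APIs and the NVME_REG_CC/NVME_CC_ENABLE definitions are existing kernel interfaces; struct nvmet_pci_epf_ctrl, its fields and the nvmet_pci_epf_bar_read32() helper are hypothetical placeholders, not the driver's actual code.

/* Illustrative sketch only; names other than workqueue and nvme.h symbols are hypothetical. */
#define NVMET_PCI_EPF_CC_POLL_INTERVAL	msecs_to_jiffies(5)

static void nvmet_pci_epf_poll_cc(struct work_struct *work)
{
	struct nvmet_pci_epf_ctrl *ctrl =
		container_of(work, struct nvmet_pci_epf_ctrl, poll_cc.work);
	u32 new_cc;

	/* Read the Controller Configuration register from the BAR. */
	new_cc = nvmet_pci_epf_bar_read32(ctrl, NVME_REG_CC);

	/* React to the host enabling or disabling the controller. */
	if (!(ctrl->cc & NVME_CC_ENABLE) && (new_cc & NVME_CC_ENABLE))
		nvmet_pci_epf_enable_ctrl(ctrl);
	else if ((ctrl->cc & NVME_CC_ENABLE) && !(new_cc & NVME_CC_ENABLE))
		nvmet_pci_epf_disable_ctrl(ctrl);

	ctrl->cc = new_cc;

	/*
	 * Rearm the poll. Per the description above, this work is started
	 * by the link-up notifier and stopped by the link-down notifier.
	 */
	schedule_delayed_work(&ctrl->poll_cc, NVMET_PCI_EPF_CC_POLL_INTERVAL);
}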
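Similarly, the per-command flow can be pictured as below. nvmet_req_init(), nvmet_req_transfer_len(), nvmet_req_complete() and req->execute() are existing nvmet core APIs; struct nvmet_pci_epf_iod, its fields, the fabrics ops and the data-transfer helpers are hypothetical placeholders used only to illustrate the ordering of the steps.

/* Illustrative sketch only; the nvmet_pci_epf_* helpers shown here are hypothetical. */
static void nvmet_pci_epf_iod_work(struct work_struct *work)
{
	struct nvmet_pci_epf_iod *iod =
		container_of(work, struct nvmet_pci_epf_iod, work);
	struct nvmet_req *req = &iod->req;
	u16 status;

	/*
	 * Bind the request to its SQ/CQ and validate the NVMe command
	 * (nvmet_req_init() completes the request itself on failure).
	 */
	if (!nvmet_req_init(req, &iod->cq->nvme_cq, &iod->sq->nvme_sq,
			    &nvmet_pci_epf_fabrics_ops))
		return;

	/*
	 * Get the data transfer length and walk the command PRPs or SGLs
	 * to build the list of PCI address segments of the data buffer.
	 */
	req->transfer_len = nvmet_req_transfer_len(req);
	if (req->transfer_len) {
		status = nvmet_pci_epf_iod_parse_data_segments(iod);
		if (status) {
			nvmet_req_complete(req, status);
			return;
		}
	}

	/* For write commands, retrieve the data from the host first. */
	if (nvme_is_write(req->cmd)) {
		status = nvmet_pci_epf_transfer_data_from_host(iod);
		if (status) {
			nvmet_req_complete(req, status);
			return;
		}
	}

	/*
	 * Execute through the target core. For read commands, the data is
	 * transferred to the host once the command completes.
	 */
	req->execute(req);
}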
129 lines
3.6 KiB
Plaintext
# SPDX-License-Identifier: GPL-2.0-only

config NVME_TARGET
	tristate "NVMe Target support"
	depends on BLOCK
	depends on CONFIGFS_FS
	select NVME_KEYRING if NVME_TARGET_TCP_TLS
	select KEYS if NVME_TARGET_TCP_TLS
	select SGL_ALLOC
	help
	  This enables target side support for the NVMe protocol, that is
	  it allows the Linux kernel to implement NVMe subsystems and
	  controllers and export Linux block devices as NVMe namespaces.
	  You need to select at least one of the transports below to make this
	  functionality useful.

	  To configure the NVMe target you probably want to use the nvmetcli
	  tool from http://git.infradead.org/users/hch/nvmetcli.git.

config NVME_TARGET_DEBUGFS
	bool "NVMe Target debugfs support"
	depends on NVME_TARGET
	help
	  This enables debugfs support to display the connected controllers
	  to each subsystem.

	  If unsure, say N.

config NVME_TARGET_PASSTHRU
	bool "NVMe Target Passthrough support"
	depends on NVME_TARGET
	depends on NVME_CORE=y || NVME_CORE=NVME_TARGET
	help
	  This enables target side NVMe passthru controller support for the
	  NVMe Over Fabrics protocol. It allows for hosts to manage and
	  directly access an actual NVMe controller residing on the target
	  side, including executing Vendor Unique Commands.

	  If unsure, say N.

config NVME_TARGET_LOOP
	tristate "NVMe loopback device support"
	depends on NVME_TARGET
	select NVME_FABRICS
	select SG_POOL
	help
	  This enables the NVMe loopback device support, which can be useful
	  to test NVMe host and target side features.

	  If unsure, say N.

config NVME_TARGET_RDMA
	tristate "NVMe over Fabrics RDMA target support"
	depends on INFINIBAND && INFINIBAND_ADDR_TRANS
	depends on NVME_TARGET
	select SGL_ALLOC
	help
	  This enables the NVMe RDMA target support, which allows exporting NVMe
	  devices over RDMA.

	  If unsure, say N.

config NVME_TARGET_FC
	tristate "NVMe over Fabrics FC target driver"
	depends on NVME_TARGET
	depends on HAS_DMA
	select SGL_ALLOC
	help
	  This enables the NVMe FC target support, which allows exporting NVMe
	  devices over FC.

	  If unsure, say N.

config NVME_TARGET_FCLOOP
	tristate "NVMe over Fabrics FC Transport Loopback Test driver"
	depends on NVME_TARGET
	select NVME_FABRICS
	select SG_POOL
	depends on NVME_FC
	depends on NVME_TARGET_FC
	help
	  This enables the NVMe FC loopback test support, which can be useful
	  to test NVMe-FC transport interfaces.

	  If unsure, say N.

config NVME_TARGET_TCP
	tristate "NVMe over Fabrics TCP target support"
	depends on INET
	depends on NVME_TARGET
	help
	  This enables the NVMe TCP target support, which allows exporting NVMe
	  devices over TCP.

	  If unsure, say N.

config NVME_TARGET_TCP_TLS
	bool "NVMe over Fabrics TCP target TLS encryption support"
	depends on NVME_TARGET_TCP
	select NET_HANDSHAKE
	help
	  Enables TLS encryption for the NVMe TCP target using the netlink handshake API.

	  The TLS handshake daemon is available at
	  https://github.com/oracle/ktls-utils.

	  If unsure, say N.

config NVME_TARGET_AUTH
	bool "NVMe over Fabrics In-band Authentication in target side"
	depends on NVME_TARGET
	select NVME_AUTH
	help
	  This enables support for NVMe over Fabrics In-band Authentication on
	  the target side.

	  If unsure, say N.

config NVME_TARGET_PCI_EPF
	tristate "NVMe PCI Endpoint Function target support"
	depends on NVME_TARGET && PCI_ENDPOINT
	depends on NVME_CORE=y || NVME_CORE=NVME_TARGET
	help
	  This enables the NVMe PCI Endpoint Function target driver support,
	  which allows creating an NVMe PCI controller using an endpoint mode
	  capable PCI controller.

	  If unsure, say N.