Commit Graph

133 Commits

Author SHA1 Message Date
Linus Torvalds
1ddeeb2a05 Merge tag 'for-6.9/block-20240310' of git://git.kernel.dk/linux
Pull block updates from Jens Axboe:

 - MD pull requests via Song:
      - Cleanup redundant checks (Yu Kuai)
      - Remove deprecated headers (Marc Zyngier, Song Liu)
      - Concurrency fixes (Li Lingfeng)
      - Memory leak fix (Li Nan)
      - Refactor raid1 read_balance (Yu Kuai, Paul Luse)
      - Clean up and fix for md_ioctl (Li Nan)
      - Other small fixes (Gui-Dong Han, Heming Zhao)
      - MD atomic limits (Christoph)

 - NVMe pull request via Keith:
      - RDMA target enhancements (Max)
      - Fabrics fixes (Max, Guixin, Hannes)
      - Atomic queue_limits usage (Christoph)
      - Const use for class_register (Ricardo)
      - Identification error handling fixes (Shin'ichiro, Keith)

 - Improvement and cleanup for cached request handling (Christoph)

 - Moving towards atomic queue limits. Core changes and driver bits so
   far (Christoph)

 - Fix UAF issues in aoeblk (Chun-Yi)

 - Zoned fix and cleanups (Damien)

 - s390 dasd cleanups and fixes (Jan, Miroslav)

 - Block issue timestamp caching (me)

 - noio scope guarding for zoned IO (Johannes)

 - block/nvme PI improvements (Kanchan)

 - Ability to terminate long running discard loop (Keith)

 - bdev revalidation fix (Li)

 - Get rid of old nr_queues hack for kdump kernels (Ming)

 - Support for async deletion of ublk (Ming)

 - Improve IRQ bio recycling (Pavel)

 - Factor in CPU capacity for remote vs local completion (Qais)

 - Add shared_tags configfs entry for null_blk (Shin'ichiro

 - Fix for a regression in page refcounts introduced by the folio
   unification (Tony)

 - Misc fixes and cleanups (Arnd, Colin, John, Kunwu, Li, Navid,
   Ricardo, Roman, Tang, Uwe)

* tag 'for-6.9/block-20240310' of git://git.kernel.dk/linux: (221 commits)
  block: partitions: only define function mac_fix_string for CONFIG_PPC_PMAC
  block/swim: Convert to platform remove callback returning void
  cdrom: gdrom: Convert to platform remove callback returning void
  block: remove disk_stack_limits
  md: remove mddev->queue
  md: don't initialize queue limits
  md/raid10: use the atomic queue limit update APIs
  md/raid5: use the atomic queue limit update APIs
  md/raid1: use the atomic queue limit update APIs
  md/raid0: use the atomic queue limit update APIs
  md: add queue limit helpers
  md: add a mddev_is_dm helper
  md: add a mddev_add_trace_msg helper
  md: add a mddev_trace_remap helper
  bcache: move calculation of stripe_size and io_opt into bcache_device_init
  virtio_blk: Do not use disk_set_max_open/active_zones()
  aoe: fix the potential use-after-free problem in aoecmd_cfg_pkts
  block: move capacity validation to blkpg_do_ioctl()
  block: prevent division by zero in blk_rq_stat_sum()
  drbd: atomically update queue limits in drbd_reconsider_queue_parameters
  ...
2024-03-11 11:43:44 -07:00
Max Gurtovoy
ca2b221d89 nvmet: introduce new max queue size configuration entry
Using this port configuration, one will be able to set the maximal queue
size to be used for any controller that will be associated to the
configured port.

The default value stayed 1024 but each transport will be able to set the
its own values before enabling the port.

Introduce lower limit of 16 for minimal queue depth (same as we use in
the host fabrics drivers).

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Israel Rukshin <israelr@nvidia.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Guixin Liu <kanie@linux.alibaba.com>
Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-03-02 15:18:08 -08:00
Max Gurtovoy
63e8fd6240 nvmet: set maxcmd to be per controller
This is a preparation for having a dynamic configuration of max queue
size for a controller. Make sure that the maxcmd field stays the same as
the MQES (+1) value as we do today.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Israel Rukshin <israelr@nvidia.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-03-02 15:18:08 -08:00
Christian Brauner
e9a7254b03 nvme: port block device access to file
Link: https://lore.kernel.org/r/20240123-vfs-bdev-file-v2-15-adbd023e19cc@kernel.org
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-02-25 12:05:24 +01:00
Linus Torvalds
90d624af2e Merge tag 'for-6.7/block-2023-10-30' of git://git.kernel.dk/linux
Pull block updates from Jens Axboe:

 - Improvements to the queue_rqs() support, and adding null_blk support
   for that as well (Chengming)

 - Series improving badblocks support (Coly)

 - Key store support for sed-opal (Greg)

 - IBM partition string handling improvements (Jan)

 - Make number of ublk devices supported configurable (Mike)

 - Cancelation improvements for ublk (Ming)

 - MD pull requests via Song:
     - Handle timeout in md-cluster, by Denis Plotnikov
     - Cleanup pers->prepare_suspend, by Yu Kuai
     - Rewrite mddev_suspend(), by Yu Kuai
     - Simplify md_seq_ops, by Yu Kuai
     - Reduce unnecessary locking array_state_store(), by Mariusz
       Tkaczyk
     - Make rdev add/remove independent from daemon thread, by Yu Kuai
     - Refactor code around quiesce() and mddev_suspend(), by Yu Kuai

 - NVMe pull request via Keith:
     - nvme-auth updates (Mark)
     - nvme-tcp tls (Hannes)
     - nvme-fc annotaions (Kees)

 - Misc cleanups and improvements (Jiapeng, Joel)

* tag 'for-6.7/block-2023-10-30' of git://git.kernel.dk/linux: (95 commits)
  block: ublk_drv: Remove unused function
  md: cleanup pers->prepare_suspend()
  nvme-auth: allow mixing of secret and hash lengths
  nvme-auth: use transformed key size to create resp
  nvme-auth: alloc nvme_dhchap_key as single buffer
  nvmet-tcp: use 'spin_lock_bh' for state_lock()
  powerpc/pseries: PLPKS SED Opal keystore support
  block: sed-opal: keystore access for SED Opal keys
  block:sed-opal: SED Opal keystore
  ublk: simplify aborting request
  ublk: replace monitor with cancelable uring_cmd
  ublk: quiesce request queue when aborting queue
  ublk: rename mm_lock as lock
  ublk: move ublk_cancel_dev() out of ub->mutex
  ublk: make sure io cmd handled in submitter task context
  ublk: don't get ublk device reference in ublk_abort_queue()
  ublk: Make ublks_max configurable
  ublk: Limit dev_id/ub_number values
  md-cluster: check for timeout while a new disk adding
  nvme: rework NVME_AUTH Kconfig selection
  ...
2023-11-01 12:30:07 -10:00
Jan Kara
2a4936e933 nvmet: Convert to bdev_open_by_path()
Convert nvmet to use bdev_open_by_path() and pass the handle around.

CC: linux-nvme@lists.infradead.org
Acked-by: Christoph Hellwig <hch@lst.de>
Acked-by: Christian Brauner <brauner@kernel.org>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230927093442.25915-13-jack@suse.cz
Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-10-28 13:29:18 +02:00
Hannes Reinecke
70525e5d82 nvmet-tcp: peek icreq before starting TLS
Incoming connection might be either 'normal' NVMe-TCP connections
starting with icreq or TLS handshakes. To ensure that 'normal'
connections can still be handled we need to peek the first packet
and only start TLS handshake if it's not an icreq.
With that we can lift the restriction to always set TREQ to
'required' when TLS1.3 is enabled.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2023-10-11 10:29:59 -07:00
Hannes Reinecke
675b453e02 nvmet-tcp: enable TLS handshake upcall
TLS handshake is handled in userspace with the netlink tls handshake
protocol.

The patch adds a function to start the TLS handshake upcall for any
incoming network connections if the TCP TSAS sectype is set to 'tls1.3'.

A config option NVME_TARGET_TCP_TLS selects whether the TLS handshake
upcall should be compiled in.  The patch also adds reference counting
to struct nvmet_tcp_queue to ensure the queue is always valid when the
the TLS handshake completes.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2023-10-11 10:29:59 -07:00
Hannes Reinecke
eb398812bd nvmet: Set 'TREQ' to 'required' when TLS is enabled
The current implementation does not support secure concatenation,
so 'TREQ' is always set to 'required' when TLS is enabled.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2023-10-11 10:29:59 -07:00
Christophe JAILLET
9d16d26477 nvmet: Reorder fields in 'struct nvmet_ns'
Group some variables based on their sizes to reduce holes.
On x86_64, this shrinks the size of 'struct nvmet_ns' from 520 to 512
bytes.

When such a structure is allocated in nvmet_ns_alloc(), because of the way
memory allocation works, when 520 bytes were requested, 1024 bytes were
allocated.

So, on x86_64, this change saves 512 bytes per allocation.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2023-06-21 09:41:40 -07:00
Christophe JAILLET
c60651e32f nvmet: reorder fields in 'struct nvmet_sq'
Group some variables based on their sizes to reduce holes.
On x86_64, this shrinks the size of 'struct nvmet_sq' from 472 to 464
bytes when CONFIG_NVME_TARGET_AUTH is defined.

This structure is embedded into some other structures, so it helps reducing
their sizes as well.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2023-06-12 10:36:37 -07:00
Christoph Hellwig
c5a9abfad9 nvmet: remove nvmet_req_cns_error_complete
Just fold it into the only caller.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
2023-04-13 08:55:05 +02:00
Christoph Hellwig
9326353566 nvmet: rename nvmet_execute_identify_cns_cs_ns
nvmet_execute_identify_ns_zns is a more descriptive name for the
function handling the "I/O Command Set Specific Identify Namespace
Data Structure for the Zoned Namespace Command Set".

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
2023-04-13 08:55:04 +02:00
Damien Le Moal
a5a6ab0950 nvmet: fix I/O Command Set specific Identify Controller
For an identify command with cns set to NVME_ID_CNS_CS_CTRL, the NVMe
2.0 specification states that:

If the I/O Command Set specified by the CSI field does not have an
Identify Controller data structure, then the controller shall return
a zero filled data structure. If the host requests a data structure for
an I/O Command Set that the controller does not support, the controller
shall abort the command with a status code of Invalid Field in Command.

However, the current implementation of this identify command in
nvmet_execute_identify() only handles the ZNS command set, returning an
error for the NVM command set, which is not compliant with the
specifications as we do support this command set.

Fix this by:
1) Renaming nvmet_execute_identify_cns_cs_ctrl() to
   nvmet_execute_identify_ctrl_zns() to continue handling the
   ZNS command set as is.
2) Introduce a nvmet_execute_identify_ctrl_ns() helper to handle the
   NVM command set, returning a zero filled nvme_id_ctrl_nvm data
   structure.
3) Modify nvmet_execute_identify() to call these helpers based on
   the csi specified, returning an error for unsupported command sets.

Fixes: aaf2e048af ("nvmet: add ZBD over ZNS backend support")
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Tested-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2023-04-13 08:55:04 +02:00
Aleksandr Miloserdov
68c5444c31 nvmet: expose firmware revision to configfs
Allow user to set currently active firmware revision

Reviewed-by: Konstantin Shelekhin <k.shelekhin@yadro.com>
Reviewed-by: Dmitriy Bogdanov <d.bogdanov@yadro.com>
Signed-off-by: Aleksandr Miloserdov <a.miloserdov@yadro.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-11-21 08:35:58 +01:00
Aleksandr Miloserdov
23855abdc4 nvmet: expose IEEE OUI to configfs
Allow user to set OUI for the controller vendor.

Reviewed-by: Konstantin Shelekhin <k.shelekhin@yadro.com>
Reviewed-by: Dmitriy Bogdanov <d.bogdanov@yadro.com>
Signed-off-by: Aleksandr Miloserdov <a.miloserdov@yadro.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-11-21 08:35:58 +01:00
Christoph Hellwig
fa8f9ac423 nvmet: only allocate a single slab for bvecs
There is no need to have a separate slab cache for each namespace,
and having separate ones creates duplicate debugs file names as well.

Fixes: d5eff33ee6 ("nvmet: add simple file backed ns support")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
2022-11-15 10:50:22 +01:00
Christoph Hellwig
1befd944e0 nvmet-auth: don't try to cancel a non-initialized work_struct
Currently blktests nvme/002 trips up debugobjects if CONFIG_NVME_AUTH is
enabled, but authentication is not on a queue.  This is because
nvmet_auth_sq_free cancels sq->auth_expired_work unconditionaly, while
auth_expired_work is only ever initialized if authentication is enabled
for a given controller.

Fix this by calling most of what is nvmet_init_auth unconditionally
when initializing the SQ, and just do the setting of the result
field in the connect command handler.

Fixes: db1312dd95 ("nvmet: implement basic In-Band Authentication")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Hannes Reinecke <hare@suse.de>
2022-09-27 09:22:09 +02:00
Hannes Reinecke
1a70200f40 nvmet-auth: expire authentication sessions
Each authentication step is required to be completed within the
KATO interval (or two minutes if not set). So add a workqueue function
to reset the transaction ID and the expected next protocol step;
this will automatically the next authentication command referring
to the terminated authentication.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-02 17:14:50 -06:00
Hannes Reinecke
7a277c37d3 nvmet-auth: Diffie-Hellman key exchange support
Implement Diffie-Hellman key exchange using FFDHE groups for NVMe
In-Band Authentication.
This patch adds a new host configfs attribute 'dhchap_dhgroup' to
select the FFDHE group to use.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-02 17:14:50 -06:00
Hannes Reinecke
db1312dd95 nvmet: implement basic In-Band Authentication
Implement NVMe-oF In-Band authentication according to NVMe TPAR 8006.
This patch adds three additional configfs entries 'dhchap_key',
'dhchap_ctrl_key', and 'dhchap_hash' to the 'host' configfs directory.
The 'dhchap_key' and 'dhchap_ctrl_key' entries need to be in the ASCII
format as specified in NVMe Base Specification v2.0 section 8.13.5.8
'Secret representation'.
'dhchap_hash' defaults to 'hmac(sha256)', and can be written to to
switch to a different HMAC algorithm.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-02 17:14:49 -06:00
Hannes Reinecke
6490c9ed06 nvmet: parse fabrics commands on io queues
Some fabrics commands can be sent via io queues, so add a new
function nvmet_parse_fabrics_io_cmd() and rename the existing
nvmet_parse_fabrics_cmd() to nvmet_parse_fabrics_admin_cmd().

Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-08-02 17:14:49 -06:00
Alan Adamson
34ad61514c nvmet: add a clear_ids attribute for passthru targets
If the clear_ids attribute is set to true, the EUI/GUID/UUID is cleared
for the passthru target.  By default, loop targets will set clear_ids to
true.

This resolves an issue where a connect to a passthru target fails when
using a trtype of 'loop' because EUI/GUID/UUID is not unique.

Fixes: 2079f41ec6 ("nvme: check that EUI/GUID/UUID are globally unique")
Signed-off-by: Alan Adamson <alan.adamson@oracle.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-06-30 08:23:24 +02:00
Sagi Grimberg
8832cf9221 nvmet: use a private workqueue instead of the system workqueue
Any attempt to flush kernel-global WQs has possibility of deadlock
so we should simply stop using them, instead introduce nvmet_wq
which is the generic nvmet workqueue for work elements that
don't explicitly require a dedicated workqueue (by the mere fact
that they are using the system_wq).

Changes were done using the following replaces:

 - s/schedule_work(/queue_work(nvmet_wq, /g
 - s/schedule_delayed_work(/queue_delayed_work(nvmet_wq, /g
 - s/flush_scheduled_work()/flush_workqueue(nvmet_wq)/g

Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-03-29 09:29:04 +02:00
Christoph Hellwig
da78373396 nvmet: move the call to nvmet_ns_changed out of nvmet_ns_revalidate
nvmet_ns_changed states via lockdep that the ns->subsys->lock must be
held. The only caller of nvmet_ns_changed which does not acquire that
lock is nvmet_ns_revalidate. nvmet_ns_revalidate has 3 callers,
of which 2 do not acquire that lock: nvmet_execute_identify_cns_cs_ns
and nvmet_execute_identify_ns. The other caller
nvmet_ns_revalidate_size_store does acquire the lock.

Move the call to nvmet_ns_changed from nvmet_ns_revalidate to the callers
so that they can perform the correct locking as needed.

This issue was found using a static type-based analyser and manually
verified.

Reported-by: Niels Dossche <dossche.niels@gmail.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
2022-03-15 10:36:47 +01:00
Chaitanya Kulkarni
2caecd62ea nvmet: use i_size_read() to set size for file-ns
Instead of calling vfs_getattr() use i_size_read() to read the size of
file so we can read the size of not only file type but also block type
with one call. This is needed to implement buffered_io support for the
NVMeOF block device backend.

We also change return type of function nvmet_file_ns_revalidate() from
int to void, since this function does not return any meaning value.

Signed-off-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2022-02-28 13:45:05 +02:00
Hannes Reinecke
598e75934c nvmet: switch check for subsystem type
Invert the check for discovery subsystem type to allow for additional
discovery subsystem types.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-10-27 08:03:30 +02:00
Hannes Reinecke
a294711ed5 nvmet: add nvmet_is_disc_subsys() helper
Add a helper function to determine if a given subsystem is a discovery
subsystem.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-10-20 19:16:02 +02:00
Max Gurtovoy
6d1555cc41 nvmet: add get_max_queue_size op for controllers
Some transports, such as RDMA, would like to set the queue size
according to device/port/ctrl characteristics. Add a new nvmet transport
op that is called during ctrl initialization. This will not effect
transports that don't implement this option.

Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-10-20 19:16:01 +02:00
Christoph Hellwig
ab7a2737ac nvmet: return bool from nvmet_passthru_ctrl and nvmet_is_passthru_req
The target core code never needs the host-side nvme_ctrl structure.
Open code two uses of nvmet_is_passthru_req in passthru.c, and then
switch the helpers used by the core to return bool.  Also rename the
fuctions to better match their usage:

  nvmet_passthru_ctrl -> nvmet_is_passthru_subsys
  nvmet_req_passthru_ctrl -> nvmet_is_passthru_req

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
2021-09-06 09:59:03 +02:00
Adam Manzanares
77d651a655 nvmet: looks at the passthrough controller when initializing CAP
For a passthru controller make cap initialization dependent on the cap of
the passthru controller, given that multiple Command Set support needs
to be supported by the underlying controller.  For that move the
initialization of CAP later so that it can use the fully initialized
nvmet_ctrl structure.

Fixes: ab5d0b38c0 (nvmet: add Command Set Identifier support)
Signed-off-by: Adam Manzanares <a.manzanares@samsung.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
[hch: refactored the code a bit to keep it more contained in passthru.c]
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-09-06 09:59:03 +02:00
Linus Torvalds
440462198d Merge tag 'for-5.14/drivers-2021-06-29' of git://git.kernel.dk/linux-block
Pull block driver updates from Jens Axboe:
 "Pretty calm round, mostly just NVMe and a bit of MD:

   - NVMe updates (via Christoph)
        - improve the APST configuration algorithm (Alexey Bogoslavsky)
        - look for StorageD3Enable on companion ACPI device
          (Mario Limonciello)
        - allow selecting the network interface for TCP connections
          (Martin Belanger)
        - misc cleanups (Amit Engel, Chaitanya Kulkarni, Colin Ian King,
          Christoph)
        - move the ACPI StorageD3 code to drivers/acpi/ and add quirks
          for certain AMD CPUs (Mario Limonciello)
        - zoned device support for nvmet (Chaitanya Kulkarni)
        - fix the rules for changing the serial number in nvmet
          (Noam Gottlieb)
        - various small fixes and cleanups (Dan Carpenter, JK Kim,
          Chaitanya Kulkarni, Hannes Reinecke, Wesley Sheng, Geert
          Uytterhoeven, Daniel Wagner)

   - MD updates (Via Song)
        - iostats rewrite (Guoqing Jiang)
        - raid5 lock contention optimization (Gal Ofri)

   - Fall through warning fix (Gustavo)

   - Misc fixes (Gustavo, Jiapeng)"

* tag 'for-5.14/drivers-2021-06-29' of git://git.kernel.dk/linux-block: (78 commits)
  nvmet: use NVMET_MAX_NAMESPACES to set nn value
  loop: Fix missing discard support when using LOOP_CONFIGURE
  nvme.h: add missing nvme_lba_range_type endianness annotations
  nvme: remove zeroout memset call for struct
  nvme-pci: remove zeroout memset call for struct
  nvmet: remove zeroout memset call for struct
  nvmet: add ZBD over ZNS backend support
  nvmet: add Command Set Identifier support
  nvmet: add nvmet_req_bio put helper for backends
  nvmet: add req cns error complete helper
  block: export blk_next_bio()
  nvmet: remove local variable
  nvmet: use nvme status value directly
  nvmet: use u32 type for the local variable nsid
  nvmet: use u32 for nvmet_subsys max_nsid
  nvmet: use req->cmd directly in file-ns fast path
  nvmet: use req->cmd directly in bdev-ns fast path
  nvmet: make ver stable once connection established
  nvmet: allow mn change if subsys not discovered
  nvmet: make sn stable once connection was established
  ...
2021-06-30 12:21:16 -07:00
Chaitanya Kulkarni
aaf2e048af nvmet: add ZBD over ZNS backend support
NVMe TP 4053 – Zoned Namespaces (ZNS) allows host software to
communicate with a non-volatile memory subsystem using zones for NVMe
protocol-based controllers. NVMeOF already support the ZNS NVMe
Protocol compliant devices on the target in the passthru mode. There
are generic zoned block devices like  Shingled Magnetic Recording (SMR)
HDDs that are not based on the NVMe protocol.

This patch adds ZNS backend support for non-ZNS zoned block devices as
NVMeOF targets.

This support includes implementing the new command set NVME_CSI_ZNS,
adding different command handlers for ZNS command set such as NVMe
Identify Controller, NVMe Identify Namespace, NVMe Zone Append,
NVMe Zone Management Send and NVMe Zone Management Receive.

With the new command set identifier, we also update the target command
effects logs to reflect the ZNS compliant commands.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-06-17 15:51:21 +02:00
Chaitanya Kulkarni
ab5d0b38c0 nvmet: add Command Set Identifier support
NVMe TP 4056 allows controllers to support different command sets.
NVMeoF target currently only supports namespaces that contain
traditional logical blocks that may be randomly read and written. In
some applications there is a value in exposing namespaces that contain
logical blocks that have special access rules (e.g. sequentially write
required namespace such as Zoned Namespace (ZNS)).

In order to support the Zoned Block Devices (ZBD) backend, controllers
need to have support for ZNS Command Set Identifier (CSI).

In this preparation patch, we adjust the code such that it can now
support the default command set identifier. We update the namespace data
structure to store the CSI value which defaults to NVME_CSI_NVM
that represents traditional logical blocks namespace type.

The CSI support is required to implement the ZBD backend for NVMeOF
with host side NVMe ZNS interface, since ZNS commands belong to
the different command set than the default one.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-06-17 15:51:20 +02:00
Chaitanya Kulkarni
9a01b58c22 nvmet: add nvmet_req_bio put helper for backends
In current code there exists two backends which are using inline bio
optimization, that adds a duplicate code for freeing the bio.

For Zoned Block Device backend we also use the same optimzation and it
will lead to having duplicate code in the three backends: generic
bdev, passsthru, and generic zns.

Add a helper function to avoid duplicate code and update the respective
backends.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-06-17 15:51:20 +02:00
Chaitanya Kulkarni
6e597263f9 nvmet: add req cns error complete helper
We report error and complete the request when identify cns value is not
handled in nvmet_execute_identify(). This error reporting is also needed
for Zone Block Device backend for NVMeOF target.

Add a helper nvmet_req_cns_error_compplete() to report an error and
complete the request when idenitfy command cns not handled value.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-06-17 15:51:20 +02:00
Chaitanya Kulkarni
86693c43bb nvmet: use u32 for nvmet_subsys max_nsid
Use u32 type for the nsid_max member of the nvmet_subsys structure.
This avoids the type confusion when updating the subsys->nax_nsid from
ns->nsid. This also matches the nvmet_ns->nsid member.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-06-17 15:51:20 +02:00
Noam Gottlieb
7ae023c5aa nvmet: make sn stable once connection was established
Once some host has connected to the target, make sure that the serial
number is stable and cannot be changed.

Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Signed-off-by: Noam Gottlieb <ngottlieb@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-06-17 15:51:19 +02:00
Noam Gottlieb
e13b061589 nvmet: change sn size and check validity
According to the NVM specification, the serial_number should be 20 bytes
(bytes 23:04 of the Identify Controller data structure), and should
contain only ASCII characters.

In accordance, the serial_number size is changed to 20 bytes and before
any attempt to store a new value in serial_number we check that the
input is valid - i.e. contains only ASCII characters, is not empty and
does not exceed 20 bytes.

Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Signed-off-by: Noam Gottlieb <ngottlieb@nvidia.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-06-17 15:51:19 +02:00
Sagi Grimberg
aaeadd7075 nvmet: fix false keep-alive timeout when a controller is torn down
Controller teardown flow may take some time in case it has many I/O
queues, and the host may not send us keep-alive during this period.
Hence reset the traffic based keep-alive timer so we don't trigger
a controller teardown as a result of a keep-alive expiration.

Reported-by: Yi Zhang <yi.zhang@redhat.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Tested-by: Yi Zhang <yi.zhang@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-05-26 16:18:22 +02:00
Chaitanya Kulkarni
608a969046 nvmet: fix inline bio check for bdev-ns
When handling rw commands, for inline bio case we only consider
transfer size. This works well when req->sg_cnt fits into the
req->inline_bvec, but it will result in the warning in
__bio_add_page() when req->sg_cnt > NVMET_MAX_INLINE_BVEC.

Consider an I/O size 32768 and first page is not aligned to the page
boundary, then I/O is split in following manner :-

[ 2206.256140] nvmet: sg->length 3440 sg->offset 656
[ 2206.256144] nvmet: sg->length 4096 sg->offset 0
[ 2206.256148] nvmet: sg->length 4096 sg->offset 0
[ 2206.256152] nvmet: sg->length 4096 sg->offset 0
[ 2206.256155] nvmet: sg->length 4096 sg->offset 0
[ 2206.256159] nvmet: sg->length 4096 sg->offset 0
[ 2206.256163] nvmet: sg->length 4096 sg->offset 0
[ 2206.256166] nvmet: sg->length 4096 sg->offset 0
[ 2206.256170] nvmet: sg->length 656 sg->offset 0

Now the req->transfer_size == NVMET_MAX_INLINE_DATA_LEN i.e. 32768, but
the req->sg_cnt is (9) > NVMET_MAX_INLINE_BIOVEC which is (8).
This will result in the following warning message :-

nvmet_bdev_execute_rw()
	bio_add_page()
		__bio_add_page()
			WARN_ON_ONCE(bio_full(bio, len));

This scenario is very hard to reproduce on the nvme-loop transport only
with rw commands issued with the passthru IOCTL interface from the host
application and the data buffer is allocated with the malloc() and not
the posix_memalign().

Fixes: 73383adfad ("nvmet: don't split large I/Os unconditionally")
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-05-11 18:30:45 +02:00
Noam Gottlieb
48b4c010c8 nvmet: do not allow model_number exceed 40 bytes
According to the NVM specifications, the model number size should be
40 bytes (bytes 63:24 of the Identify Controller data structure).
Therefore, any attempt to store a value into model_number which
exceeds 40 bytes should return an error.

Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Signed-off-by: Noam Gottlieb <ngottlieb@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-04-02 18:48:27 +02:00
Chaitanya Kulkarni
de5878048e nvmet: remove unnecessary ctrl parameter
The function nvmet_ctrl_find_get() accepts out pointer to nvmet_ctrl
structure. This function returns the same error value from two places
that is :- NVME_SC_CONNECT_INVALID_PARAM | NVME_SC_DNR.

Move this to the caller so we can change the return type to nvmet_ctrl.

Now that we can changed the return type, instead of taking out pointer
to the nvmet_ctrl structure remove that function parameter and return
the valid nvmet_ctrl pointer on success and NULL on failure.

Also, add and rename the goto labels for more readability with comments.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-04-02 18:48:27 +02:00
Chaitanya Kulkarni
7798df6fcf nvmet: remove an unnecessary function parameter to nvmet_check_ctrl_status
In nvmet_check_ctrl_status() cmd can be derived from nvmet_req. Remove
the local variable cmd in the nvmet_check_ctrl_status() and function
parameter cmd for nvmet_check_ctrl_status(). Derive the cmd value from
req parameter in the nvmet_check_ctrl_status().

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-04-02 18:48:25 +02:00
Max Gurtovoy
d9f273b758 nvmet: model_number must be immutable once set
In case we have already established connection to nvmf target, it
shouldn't be allowed to change the model_number. E.g. if someone will
identify ctrl and get model_number of "my_model" later on will change
the model_numbel via configfs to "my_new_model" this will break the NVMe
specification for "Get Log Page – Persistent Event Log" that refers to
Model Number as: "This field contains the same value as reported in the
Model Number field of the Identify Controller data structure, bytes
63:24."

Although it doesn't mentioned explicitly that this field can't be
changed, we can assume it.

So allow setting this field only once: using configfs or in the first
identify ctrl operation.

Signed-off-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-03-05 13:41:03 +01:00
Chaitanya Kulkarni
20c2c3bb83 nvmet: add nvmet_req_subsys() helper
Just like what we have to get the passthru ctrl from the req, add an
helper to get the subsystem associated with the nvmet_req() instead
of open coding the chain of structures.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-02-10 16:38:05 +01:00
Chaitanya Kulkarni
d81d57cf1b nvmet: add helper to report invalid opcode
In the NVMeOF block device backend, file backend, and passthru backend
we reject and report the commands if opcode is not handled.

Add an helper and use it in block device backend to keep the code
and error message uniform.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-02-10 16:38:05 +01:00
Chaitanya Kulkarni
3a1f7c79ae nvmet: make nvmet_find_namespace() req based
The six callers of nvmet_find_namespace() duplicate the error log page
update and status setting code for each call on failure.

All callers are nvmet requests based functions, so we can pass req
to the nvmet_find_namesapce() & derive ctrl from req, that'll allow us
to update the error log page in nvmet_find_namespace(). Now that we
pass the request we can also get rid of the local variable in
nvmet_find_namespace() and use the req->ns and return the error code.

Replace the ctrl parameter with nvmet_req for nvmet_find_namespace(),
centralize the error log page update for non allocated namesapces, and
return uniform error for non-allocated namespace.

The nvmet_find_namespace() takes nsid parameter which is from NVMe
commands structures such as get_log_page, identify, rw and common. All
these commands have same offset for the nsid field.

Derive nsid from req->cmd->common.nsid) & remove the extra parameter
from the nvmet_find_namespace().

Lastly now we associate the ns to the req parameter that we pass to the
nvmet_find_namespace(), rename nvmet_find_namespace() to
nvmet_req_find_ns().

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-02-10 16:38:04 +01:00
Chaitanya Kulkarni
193fcf371f nvmet: add lba to sect conversion helpers
In this preparation patch, we add helpers to convert lbas to sectors &
sectors to lba. This is needed to eliminate code duplication in the ZBD
backend.

Use these helpers in the block device backend.

Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2021-02-02 10:26:11 +01:00
Amit
6d65aeab7b nvmet: remove unused ctrl->cqs
remove unused cqs from nvmet_ctrl struct
this will reduce the allocated memory.

Signed-off-by: Amit <amit.engel@dell.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
2020-12-01 20:36:36 +01:00