Commit Graph

6080 Commits

Author SHA1 Message Date
Christoph Hellwig
d250bf4e77 blk-mq: only iterate over inflight requests in blk_mq_tagset_busy_iter
We already check for started commands in all callbacks, but we should
also protect against already completed commands.  Do this by taking
the checks to common code.

Acked-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-05-30 11:31:34 -06:00
Kevin Vigor
5e3c3a7ece nbd: clear DISCONNECT_REQUESTED flag once disconnection occurs.
When a userspace client requests a NBD device be disconnected, the
DISCONNECT_REQUESTED flag is set. While this flag is set, the driver
will not inform userspace when a connection is closed.

Unfortunately the flag was never cleared, so once a disconnect was
requested the driver would thereafter never tell userspace about a
closed connection. Thus when connections failed due to timeout, no
attempt to reconnect was made and eventually the device would fail.

Fix by clearing the DISCONNECT_REQUESTED flag (and setting the
DISCONNECTED flag) once all connections are closed.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Kevin Vigor <kvigor@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-05-30 11:30:42 -06:00
Christoph Hellwig
0df0bb080a null_blk: complete requests from ->timeout
By completing the request entirely in the driver we can remove the
BLK_EH_HANDLED return value and thus the split responsibility between the
driver and the block layer that has been causing trouble.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-05-29 08:59:21 -06:00
Christoph Hellwig
c5fb85b7ff mtip32xx: complete requests from ->timeout
By completing the request entirely in the driver we can remove the
BLK_EH_HANDLED return value and thus the split responsibility between the
driver and the block layer that has been causing trouble.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-05-29 08:59:21 -06:00
Christoph Hellwig
e5eab01704 nbd: complete requests from ->timeout
By completing the request entirely in the driver we can remove the
BLK_EH_HANDLED return value and thus the split responsibility between the
driver and the block layer that has been causing trouble.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-05-29 08:59:21 -06:00
Christoph Hellwig
6600593cbd block: rename BLK_EH_NOT_HANDLED to BLK_EH_DONE
The BLK_EH_NOT_HANDLED implies nothing happen, but very often that
is not what is happening - instead the driver already completed the
command.  Fix the symbolic name to reflect that a little better.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-05-29 08:59:21 -06:00
Joe Perches
5657a819a8 block drivers/block: Use octal not symbolic permissions
Convert the S_<FOO> symbolic permissions to their octal equivalents as
using octal and not symbolic permissions is preferred by many as more
readable.

see: https://lkml.org/lkml/2016/8/2/1945

Done with automated conversion via:
$ ./scripts/checkpatch.pl -f --types=SYMBOLIC_PERMS --fix-inplace <files...>

Miscellanea:

o Wrapped modified multi-line calls to a single line where appropriate
o Realign modified multi-line calls to open parenthesis

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-05-24 13:38:59 -06:00
Linus Torvalds
b68ea0ee03 Merge tag 'for-linus-20180524' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
 "Two fixes that should go into this release:

   - a loop writeback error clearing fix from Jeff

   - the sr sense fix from myself"

* tag 'for-linus-20180524' of git://git.kernel.dk/linux-block:
  loop: clear wb_err in bd_inode when detaching backing file
  sr: pass down correctly sized SCSI sense buffer
2018-05-24 08:53:20 -07:00
Josef Bacik
6df133a149 nbd: set discard granularity properly
For some reason we had discard granularity set to 512 always even when
discards were disabled.  Fix this by having the default be 0, and then
if we turn it on set the discard granularity to the blocksize.

Signed-off-by: Josef Bacik <jbacik@fb.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-05-23 15:27:26 -06:00
Dan Melnic
2189c97cdb block/ndb: add WQ_UNBOUND to the knbd-recv workqueue
Add WQ_UNBOUND to the knbd-recv workqueue so we're not bound
to a single CPU that is selected at device creation time.

Signed-off-by: Dan Melnic <dmm@fb.com>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-05-22 11:47:34 -06:00
Jeff Layton
eedffa28c9 loop: clear wb_err in bd_inode when detaching backing file
When a loop block device encounters a writeback error, that error will
get propagated to the bd_inode's wb_err field. If we then detach the
backing file from it, attach another and fsync it, we'll get back the
writeback error that we had from the previous backing file.

This is a bit of a grey area as POSIX doesn't cover loop devices, but it
is somewhat counterintuitive.

If we detach a backing file from the loopdev while there are still
unreported errors, take it as a sign that we're no longer interested in
the previous file, and clear out the wb_err in the loop blockdev.

Reported-and-Tested-by: Theodore Y. Ts'o <tytso@mit.edu>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-05-21 12:36:03 -06:00
Josef Bacik
76aa1d3412 nbd: call nbd_bdev_reset instead of bd_set_size on disconnect
We need to make sure we don't just set the size of the bdev to 0 while
it's being used by a file system.  We have the appropriate check in
nbd_bdev_reset, simply use that helper instead.

Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-05-16 12:54:13 -06:00
Josef Bacik
fe1f9e6659 nbd: fix how we set bd_invalidated
bd_invalidated is kind of a pain wrt partitions as it really only
triggers the partition rescan if it is set after bd_ops->open() runs, so
setting it when we reset the device isn't useful.  We also sporadically
would still have partitions left over in some disconnect cases, so fix
this by always setting bd_invalidated on open if there's no
configuration or if we've had a disconnect action happen, that way the
partition table gets invalidated and rescanned properly.

Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-05-16 12:54:12 -06:00
Josef Bacik
96d97e1782 nbd: clear_sock on netlink disconnect
This is what the ioctl based nbd disconnect does as well.  Without this
the device will just sit there and wait for the connection to go away
(or IO to occur) before the device gets torn down.  Instead clear
everything up on our end so the configuration goes away as quickly as
possible.

Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-05-16 12:54:10 -06:00
Josef Bacik
9e2b19675d nbd: use bd_set_size when updating disk size
When we stopped relying on the bdev everywhere I broke updating the
block device size on the fly, which ceph relies on.  We can't just do
set_capacity, we also have to do bd_set_size so things like parted will
notice the device size change.

Fixes: 29eaadc ("nbd: stop using the bdev everywhere")
cc: stable@vger.kernel.org
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-05-16 12:54:09 -06:00
Josef Bacik
c3f7c93976 nbd: update size when connected
I messed up changing the size of an NBD device while it was connected by
not actually updating the device or doing the uevent.  Fix this by
updating everything if we're connected and we change the size.

cc: stable@vger.kernel.org
Fixes: 639812a ("nbd: don't set the device size until we're connected")
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-05-16 12:54:07 -06:00
Josef Bacik
8364da4751 nbd: fix nbd device deletion
This fixes a use after free bug, we shouldn't be doing disk->queue right
after we do del_gendisk(disk).  Save the queue and do the cleanup after
the del_gendisk.

Fixes: c6a4759ea0 ("nbd: add device refcounting")
cc: stable@vger.kernel.org
Signed-off-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-05-16 12:54:06 -06:00
Christoph Hellwig
004fd11db1 drbd: switch to proc_create_single
And stop messing with try_module_get on THIS_MODULE, which doesn't make
any sense here.

Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-05-16 07:24:30 +02:00
Christoph Hellwig
3f3942aca6 proc: introduce proc_create_single{,_data}
Variants of proc_create{,_data} that directly take a seq_file show
callback and drastically reduces the boilerplate code in the callers.

All trivial callers converted over.

Signed-off-by: Christoph Hellwig <hch@lst.de>
2018-05-16 07:23:35 +02:00
Kent Overstreet
45db54d58d block: Split out bio_list_copy_data()
Found a bug (with ASAN) where we were passing a bio to bio_copy_data()
with bi_next not NULL, when it should have been - a driver had left
bi_next set to something after calling bio_endio().

Since the normal case is only copying single bios, split out
bio_list_copy_data() to avoid more bugs like this in the future.

Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-05-14 13:16:10 -06:00
Christoph Hellwig
0eb0b63c1d block: consistently use GFP_NOIO instead of __GFP_NORECLAIM
Same numerical value (for now at least), but a much better documentation
of intent.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-05-14 08:55:18 -06:00
Christoph Hellwig
ff005a0662 block: sanitize blk_get_request calling conventions
Switch everyone to blk_get_request_flags, and then rename
blk_get_request_flags to blk_get_request.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-05-14 08:55:12 -06:00
Christoph Hellwig
e4f0e0cbf4 ps3disk: handle highmem pages
The ps3disk driver already kmaps all pages when copying from/to the
internal bounce buffer, so it can accept highmem pages just fine.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-05-11 15:08:03 -06:00
Christoph Hellwig
ad180f6f71 aoe: handle highmem pages
Use kmap_atomic when copying out of a bio_vec.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-05-11 15:08:00 -06:00
Christoph Hellwig
00f0a51f0b DAC960: don't use block layer bounce buffers
DAC960 just sets the block bounce limit to the dma mask, which means
that the iommu or swiotlb already take care of the bounce buffering,
and the block bouncing can be removed.

Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-05-11 15:07:54 -06:00
Christoph Hellwig
5c26e05039 mtip32xx: don't use block layer bounce buffers
mtip32xx just sets the block bounce limit to the dma mask, which means
that the iommu or swiotlb already take care of the bounce buffering,
and the block bouncing can be removed.

Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-05-11 15:07:52 -06:00
Christophe JAILLET
8e3c283fc6 mtip32xx: Fix an error handling path in 'mtip_pci_probe()'
Branch to the right label in the error handling path in order to keep it
logical.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-05-10 08:21:47 -06:00
Ilya Dryomov
0010f7052d libceph: add osd_req_op_extent_osd_data_bvecs()
... and store num_bvecs for client code's convenience.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Jeff Layton <jlayton@redhat.com>
Reviewed-by: "Yan, Zheng" <zyan@redhat.com>
2018-05-10 10:15:05 +02:00
SeongJae Park
316ba5736c brd: Mark as non-rotational
This commit sets QUEUE_FLAG_NONROT and clears up QUEUE_FLAG_ADD_RANDOM
to mark the ramdisks as non-rotational device.

Signed-off-by: SeongJae Park <sj38.park@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-05-09 08:43:57 -06:00
Tetsuo Handa
d3349b6b3c loop: remember whether sysfs_create_group() was done
syzbot is hitting WARN() triggered by memory allocation fault
injection [1] because loop module is calling sysfs_remove_group()
when sysfs_create_group() failed.
Fix this by remembering whether sysfs_create_group() succeeded.

[1] https://syzkaller.appspot.com/bug?id=3f86c0edf75c86d2633aeb9dd69eccc70bc7e90b

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reported-by: syzbot <syzbot+9f03168400f56df89dbc6f1751f4458fe739ff29@syzkaller.appspotmail.com>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Renamed sysfs_ready -> sysfs_inited.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-05-07 15:26:36 -06:00
Linus Torvalds
8fba70b085 Merge tag 'for-linus-20180425' of git://git.kernel.dk/linux-block
Pull block updates from Jens Axboe:
 "I ended up sitting on this about a week longer than I wanted to, since
  we were hashing out details with a timeout change. I've now killed
  that patch, so we can flush the existing queue in due time.

  This contains:

   - Fix for an old regression, where entering the queue can be
     disturbed by a signal to the process. This can cause spurious EIO.
     Fix from Alan Jenkins.

   - cdrom information leak fix from Dan.

   - Trivial helper for testing queue FUA from Dave Chinner, part of his
     O_DIRECT FUA series.

   - Series of swim fixes from Finn that actually makes it work again.

   - Loop O_DIRECT corruption fix, which caused data corruption in
     production for us. From me.

   - BFQ crash fix from me.

   - bcache maintainer update. Michael no longer has the time to do it,
     Coly has stepped up to serve as the new maintainer.

   - blkcg locking fixes from Jiang Biao.

   - Revert of a change from this merge window from Ming, that causes an
     issue on some hardware.

   - Minor clarification doc addition from Linus Walleij"

* tag 'for-linus-20180425' of git://git.kernel.dk/linux-block: (22 commits)
  Revert "blk-mq: remove code for dealing with remapping queue"
  block: mq: Add some minor doc for core structs
  bcache: mark Coly Li as bcache maintainer
  MAINTAINERS: Remove me as maintainer of bcache
  blkcg: init root blkcg_gq under lock
  blkcg: small fix on comment in blkcg_init_queue
  blkcg: don't hold blkcg lock when deactivating policy
  block: add blk_queue_fua() helper function
  cdrom: information leak in cdrom_ioctl_media_changed()
  bfq-iosched: ensure to clear bic/bfqq pointers when preparing request
  blk-mq: start request gstate with gen 1
  block/swim: Select appropriate drive on device open
  block/swim: Fix IO error at end of medium
  block/swim: Check drive type
  block/swim: Rename macros to avoid inconsistent inverted logic
  block/swim: Don't log an error message for an invalid ioctl
  block/swim: Remove extra put_disk() call from error path
  block/swim: Fix array bounds check
  m68k/mac: Don't remap SWIM MMIO region
  loop: handle short DIO reads
  ...
2018-04-25 21:05:15 -07:00
Finn Thain
b3906535cc block/swim: Select appropriate drive on device open
The driver supports internal and external FDD units so the floppy_open
function must not hard-code the drive location.

Cc: Laurent Vivier <lvivier@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: stable@vger.kernel.org # v4.14+
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Acked-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-04-16 21:49:35 -06:00
Finn Thain
5a13388d7a block/swim: Fix IO error at end of medium
Reading to the end of a 720K disk results in an IO error instead of EOF
because the block layer thinks the disk has 2880 sectors. (Partly this
is a result of inverted logic of the ONEMEG_MEDIA bit that's now fixed.)

Initialize the density and head count in swim_add_floppy() to agree
with the device size passed to set_capacity() during drive probe.

Call set_capacity() again upon device open, after refreshing the density
and head count values.

Cc: Laurent Vivier <lvivier@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: stable@vger.kernel.org # v4.14+
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Acked-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-04-16 21:49:35 -06:00
Finn Thain
8a500df63d block/swim: Check drive type
The SWIM chip is compatible with GCR-mode Sony 400K/800K drives but
this driver only supports MFM mode. Therefore only Sony FDHD drives
are supported. Skip incompatible drives.

Cc: Laurent Vivier <lvivier@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: stable@vger.kernel.org # v4.14+
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Acked-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-04-16 21:49:35 -06:00
Finn Thain
56a1c5ee54 block/swim: Rename macros to avoid inconsistent inverted logic
The Sony drive status bits use active-low logic. The swim_readbit()
function converts that to 'C' logic for readability. Hence, the
sense of the names of the status bit macros should not be inverted.

Mostly they are correct. However, the TWOMEG_DRIVE, MFM_MODE and
TWOMEG_MEDIA macros have inverted sense (like MkLinux). Fix this
inconsistency and make the following patches less confusing.

The same problem affects swim3.c so fix that too.

No functional change.

The FDHD drive status bits are documented in sonydriv.cpp from MAME
and in swimiii.h from MkLinux.

Cc: Laurent Vivier <lvivier@redhat.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Jens Axboe <axboe@kernel.dk>
Cc: stable@vger.kernel.org # v4.14+
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Acked-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-04-16 21:49:35 -06:00
Finn Thain
8e2ab5a4ef block/swim: Don't log an error message for an invalid ioctl
The 'eject' shell command may send various different ioctl commands.
This leads to error messages on the console even though the FDEJECT
ioctl succeeds.

~# eject floppy
SWIM floppy_ioctl: unknown cmd 21257
SWIM floppy_ioctl: unknown cmd 1

Don't log an error message for an invalid ioctl, just do as the
swim3 driver does and return -ENOTTY.

Cc: Laurent Vivier <lvivier@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: stable@vger.kernel.org # v4.14+
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Acked-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-04-16 21:49:35 -06:00
Finn Thain
c1d6207cc0 block/swim: Remove extra put_disk() call from error path
Cc: Laurent Vivier <lvivier@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: stable@vger.kernel.org # v4.14+
Fixes: 103db8b2df ("[PATCH] swim: stop sharing request queue across multiple gendisks")
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Acked-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-04-16 21:49:35 -06:00
Finn Thain
7ae6a2b6cc block/swim: Fix array bounds check
In the floppy_find() function in swim.c is a call to
get_disk(swd->unit[drive].disk). The actual parameter to this call
can be a NULL pointer when drive == swd->floppy_count. This causes
an oops in get_disk().

Data read fault at 0x00000198 in Super Data (pc=0x1be5b6)
BAD KERNEL BUSERR
Oops: 00000000
Modules linked in: swim_mod ipv6 mac8390
PC: [<001be5b6>] get_disk+0xc/0x76
SR: 2004  SP: 9a078bc1  a2: 0213ed90
d0: 00000000    d1: 00000000    d2: 00000000    d3: 000000ff
d4: 00000002    d5: 02983590    a0: 02332e00    a1: 022dfd64
Process dd (pid: 285, task=020ab25b)
Frame format=B ssw=074d isc=4a88 isb=6732 daddr=00000198 dobuf=00000000
baddr=001be5bc dibuf=bfffffff ver=f
Stack from 022dfca4:
        00000000 0203fc00 0213ed90 022dfcc0 02982936 00000000 00200000 022dfd08
        0020f85a 00200000 022dfd64 02332e00 004040fc 00000014 001be77e 022dfd64
        00334e4a 001be3f8 0800001d 022dfd64 01c04b60 01c04b70 022aba80 029828f8
        02332e00 022dfd2c 001be7ac 0203fc00 00200000 022dfd64 02103a00 01c04b60
        01c04b60 0200e400 022dfd68 000e191a 00200000 022dfd64 02103a00 0800001d
        00000000 00000003 000b89de 00500000 02103a00 01c04b60 02103a08 01c04c2e
Call Trace: [<02982936>] floppy_find+0x3e/0x4a [swim_mod]
 [<00200000>] uart_remove_one_port+0x1a2/0x260
 [<0020f85a>] kobj_lookup+0xde/0x132
 [<00200000>] uart_remove_one_port+0x1a2/0x260
 [<001be77e>] get_gendisk+0x0/0x130
 [<00334e4a>] mutex_lock+0x0/0x2e
 [<001be3f8>] disk_block_events+0x0/0x6c
 [<029828f8>] floppy_find+0x0/0x4a [swim_mod]
 [<001be7ac>] get_gendisk+0x2e/0x130
 [<00200000>] uart_remove_one_port+0x1a2/0x260
 [<000e191a>] __blkdev_get+0x32/0x45a
 [<00200000>] uart_remove_one_port+0x1a2/0x260
 [<000b89de>] complete_walk+0x0/0x8a
 [<000e1e22>] blkdev_get+0xe0/0x29a
 [<000e1fdc>] blkdev_open+0x0/0xb0
 [<000b89de>] complete_walk+0x0/0x8a
 [<000e1fdc>] blkdev_open+0x0/0xb0
 [<000e01cc>] bd_acquire+0x74/0x8a
 [<000e205c>] blkdev_open+0x80/0xb0
 [<000e1fdc>] blkdev_open+0x0/0xb0
 [<000abf24>] do_dentry_open+0x1a4/0x322
 [<00020000>] __do_proc_douintvec+0x22/0x27e
 [<000b89de>] complete_walk+0x0/0x8a
 [<000baa62>] link_path_walk+0x0/0x48e
 [<000ba3f8>] inode_permission+0x20/0x54
 [<000ac0e4>] vfs_open+0x42/0x78
 [<000bc372>] path_openat+0x2b2/0xeaa
 [<000bc0c0>] path_openat+0x0/0xeaa
 [<0004463e>] __irq_wake_thread+0x0/0x4e
 [<0003a45a>] task_tick_fair+0x18/0xc8
 [<000bd00a>] do_filp_open+0xa0/0xea
 [<000abae0>] do_sys_open+0x11a/0x1ee
 [<00020000>] __do_proc_douintvec+0x22/0x27e
 [<000abbf4>] SyS_open+0x1e/0x22
 [<00020000>] __do_proc_douintvec+0x22/0x27e
 [<00002b40>] syscall+0x8/0xc
 [<00020000>] __do_proc_douintvec+0x22/0x27e
 [<0000c00b>] dyadic+0x1/0x28
Code: 4e5e 4e75 4e56 fffc 2f0b 2f02 266e 0008 <206b> 0198 4a88 6732 2428 002c 661e 486b 0058 4eb9 0032 0b96 588f 4a88 672c 2008
Disabling lock debugging due to kernel taint

Fix the array index bounds check to avoid this.

Cc: Laurent Vivier <lvivier@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: stable@vger.kernel.org # v4.14+
Fixes: 8852ecd974 ("[PATCH] m68k: mac - Add SWIM floppy support")
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Acked-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-04-16 21:49:35 -06:00
Finn Thain
b64576cbf3 m68k/mac: Don't remap SWIM MMIO region
For reasons I don't understand, calling ioremap() then iounmap() on
the SWIM MMIO region causes a hang on 68030 (but not on 68040).

~# modprobe swim_mod
SWIM floppy driver Version 0.2 (2008-10-30)
SWIM device not found !
watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [modprobe:285]
Modules linked in: swim_mod(+)
Format 00  Vector: 0064  PC: 000075aa  Status: 2000    Not tainted
ORIG_D0: ffffffff  D0: d00c0000  A2: 007c2370  A1: 003f810c
A0: 00040000  D5: d0096800  D4: d0097e00
D3: 00000001  D2: 00000003  D1: 00000000
Non-Maskable Interrupt
Modules linked in: swim_mod(+)
PC: [<000075ba>] __iounmap+0x24/0x10e
SR: 2000  SP: 007abc48  a2: 007c2370
d0: d00c0000    d1: 000001a0    d2: 00000019    d3: 00000001
d4: d0097e00    d5: d0096800    a0: 00040000    a1: 003f810c
Process modprobe (pid: 285, task=007c2370)
Frame format=0
Stack from 007abc7c:
        ffffffed 00000000 006a4060 004712e0 007abca0 000076ea d0080000 00080000
        010bb4b8 007abcd8 010ba542 d0096000 00000000 00000000 00000001 010bb59c
        00000000 007abf30 010bb4b8 0047760a 0047763c 00477612 00616540 007abcec
        0020a91a 00477600 0047760a 010bb4cc 007abd18 002092f2 0047760a 00333b06
        007abd5c 00000000 0047760a 010bb4cc 00404f90 004776b8 00000001 007abd38
        00209446 010bb4cc 0047760a 010bb4cc 0020938e 0031f8be 00616540 007abd64
Call Trace: [<000076ea>] iounmap+0x46/0x5a
 [<00080000>] shrink_page_list+0x7f6/0xe06
 [<010ba542>] swim_probe+0xe4/0x496 [swim_mod]
 [<0020a91a>] platform_drv_probe+0x20/0x5e
 [<002092f2>] driver_probe_device+0x21c/0x2b8
 [<00333b06>] mutex_lock+0x0/0x2e
 [<00209446>] __driver_attach+0xb8/0xce
 [<0020938e>] __driver_attach+0x0/0xce
 [<0031f8be>] klist_next+0x0/0xa0
 [<00207562>] bus_for_each_dev+0x74/0xba
 [<000344c0>] blocking_notifier_call_chain+0x0/0x20
 [<00333b06>] mutex_lock+0x0/0x2e
 [<00208e44>] driver_attach+0x1a/0x1e
 [<0020938e>] __driver_attach+0x0/0xce
 [<00207e26>] bus_add_driver+0x188/0x234
 [<000344c0>] blocking_notifier_call_chain+0x0/0x20
 [<00209894>] driver_register+0x58/0x104
 [<000344c0>] blocking_notifier_call_chain+0x0/0x20
 [<010bd000>] swim_init+0x0/0x2c [swim_mod]
 [<0020a7be>] __platform_driver_register+0x38/0x3c
 [<010bd028>] swim_init+0x28/0x2c [swim_mod]
 [<000020dc>] do_one_initcall+0x38/0x196
 [<000344c0>] blocking_notifier_call_chain+0x0/0x20
 [<003331cc>] mutex_unlock+0x0/0x3e
 [<00333b06>] mutex_lock+0x0/0x2e
 [<003331cc>] mutex_unlock+0x0/0x3e
 [<00333b06>] mutex_lock+0x0/0x2e
 [<003331cc>] mutex_unlock+0x0/0x3e
 [<00333b06>] mutex_lock+0x0/0x2e
 [<003331cc>] mutex_unlock+0x0/0x3e
 [<00333b06>] mutex_lock+0x0/0x2e
 [<00075008>] __free_pages+0x0/0x38
 [<000045c0>] mangle_kernel_stack+0x30/0xda
 [<000344c0>] blocking_notifier_call_chain+0x0/0x20
 [<003331cc>] mutex_unlock+0x0/0x3e
 [<00333b06>] mutex_lock+0x0/0x2e
 [<0005ced4>] do_init_module+0x42/0x266
 [<010bd000>] swim_init+0x0/0x2c [swim_mod]
 [<000344c0>] blocking_notifier_call_chain+0x0/0x20
 [<0005eda0>] load_module+0x1a30/0x1e70
 [<0000465d>] mangle_kernel_stack+0xcd/0xda
 [<00331c64>] __generic_copy_from_user+0x0/0x46
 [<0033256e>] _cond_resched+0x0/0x32
 [<00331b9c>] memset+0x0/0x98
 [<0033256e>] _cond_resched+0x0/0x32
 [<0005f25c>] SyS_init_module+0x7c/0x112
 [<00002000>] _start+0x0/0x8
 [<00002000>] _start+0x0/0x8
 [<00331c82>] __generic_copy_from_user+0x1e/0x46
 [<0005f2b2>] SyS_init_module+0xd2/0x112
 [<0000465d>] mangle_kernel_stack+0xcd/0xda
 [<00002b40>] syscall+0x8/0xc
 [<0000465d>] mangle_kernel_stack+0xcd/0xda
 [<0008c00c>] pcpu_balance_workfn+0xb2/0x40e
Code: 2200 7419 e4a9 e589 2841 d9fc 0000 1000 <2414> 7203 c282 7602 b681 6600 0096 0242 fe00 0482 0000 0000 e9c0 11c3 ed89 2642

There's no need to call ioremap() for the SWIM address range, as it lies
within the usual IO device region at 0x5000 0000, which has already been
mapped by head.S.

Remove the redundant ioremap() and iounmap() calls to fix the hang.

Cc: Laurent Vivier <lvivier@redhat.com>
Cc: stable@vger.kernel.org # v4.14+
Tested-by: Stan Johnson <userm57@yahoo.com>
Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
Acked-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-04-16 21:49:35 -06:00
Ilya Dryomov
d93605407a rbd: notrim map option
Add an option to turn off discard and write zeroes offload support to
avoid deprovisioning a fully provisioned image.  When enabled, discard
requests will fail with -EOPNOTSUPP, write zeroes requests will fall
back to manually zeroing.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Tested-by: Hitoshi Kamei <hitoshi.kamei.xm@hitachi.com>
2018-04-16 09:38:40 +02:00
Ilya Dryomov
420efbdf4d rbd: adjust queue limits for "fancy" striping
In order to take full advantage of merging in ceph_file_to_extents(),
allow object set sized I/Os.  If the layout is not "fancy", an object
set consists of just one object.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-16 09:38:40 +02:00
Arnd Bergmann
c6244b3b23 rbd: avoid Wreturn-type warnings
In some configurations gcc cannot see that rbd_assert(0) leads to an
unreachable code path:

  drivers/block/rbd.c: In function 'rbd_img_is_write':
  drivers/block/rbd.c:1397:1: error: control reaches end of non-void function [-Werror=return-type]
  drivers/block/rbd.c: In function '__rbd_obj_handle_request':
  drivers/block/rbd.c:2499:1: error: control reaches end of non-void function [-Werror=return-type]
  drivers/block/rbd.c: In function 'rbd_obj_handle_write':
  drivers/block/rbd.c:2471:1: error: control reaches end of non-void function [-Werror=return-type]

As the rbd_assert() here shows has no extra information beyond the verbose
BUG(), we can simply use BUG() directly in its place.  This is reliably
detected as not returning on any architecture, since it doesn't depend
on the unlikely() comparison that confused gcc.

Fixes: 3da691bf43 ("rbd: new request handling code")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-16 09:38:40 +02:00
Dongsheng Yang
34f55d0b3a rbd: support timeout in rbd_wait_state_locked()
currently, the rbd_wait_state_locked() will wait forever if we
can't get our state locked. Example:

rbd map --exclusive test1  --> /dev/rbd0
rbd map test1  --> /dev/rbd1
dd if=/dev/zero of=/dev/rbd1 bs=1M count=1 --> IO blocked

To avoid this problem, this patch introduce a timeout design
in rbd_wait_state_locked(). Then rbd_wait_state_locked() will
return error when we reach a timeout.

This patch allow user to set the lock_timeout in rbd mapping.

Signed-off-by: Dongsheng Yang <dongsheng.yang@easystack.cn>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-16 09:38:40 +02:00
Ilya Dryomov
2f18d46683 rbd: refactor rbd_wait_state_locked()
In preparation for lock_timeout option, make rbd_wait_state_locked()
return error codes.

Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2018-04-16 09:38:40 +02:00
Jens Axboe
f9de14bc7e loop: handle short DIO reads
We ran into an issue with loop and btrfs, where btrfs would complain about
checksum errors. It turns out that is because we don't handle short reads
at all, we just zero fill the remainder. Worse than that, we don't handle
the filling properly, which results in loop trying to advance a single
bio by much more than its size, since it doesn't take chaining into
account.

Handle short reads appropriately, by simply retrying at the new correct
offset. End the remainder of the request with EIO, if we get a 0 read.

Fixes: bc07c10a36 ("block: loop: support DIO & AIO")
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-04-14 22:34:27 -06:00
Jens Axboe
1894e91654 loop: remove cmd->rq member
We can always get at the request from the payload, no need to store
a pointer to it.

Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-04-14 22:34:27 -06:00
Linus Torvalds
edda415314 Merge tag 'for-linus-20180413' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
 "Followup fixes for this merge window. This contains:

   - Series from Ming, fixing corner cases in our CPU <-> queue mapping.

     This triggered repeated warnings on especially s390, but I also hit
     it in cpu hot plug/unplug testing while doing IO on NVMe on x86-64.

   - Another fix from Ming, ensuring that we always order budget and
     driver tag identically, avoiding a deadlock on QD=1 devices.

   - Loop locking regression fix from this merge window, from Omar.

   - Another loop locking fix, this time missing an unlock, from Tetsuo
     Handa.

   - Fix for racing IO submission with device removal from Bart.

   - sr reference fix from me, fixing a case where disk change or
     getevents can race with device removal.

   - Set of nvme fixes by way of Keith, from various contributors"

* tag 'for-linus-20180413' of git://git.kernel.dk/linux-block: (28 commits)
  nvme: expand nvmf_check_if_ready checks
  nvme: Use admin command effects for admin commands
  nvmet: fix space padding in serial number
  nvme: check return value of init_srcu_struct function
  nvmet: Fix nvmet_execute_write_zeroes sector count
  nvme-pci: Separate IO and admin queue IRQ vectors
  nvme-pci: Remove unused queue parameter
  nvme-pci: Skip queue deletion if there are no queues
  nvme: target: fix buffer overflow
  nvme: don't send keep-alives to the discovery controller
  nvme: unexport nvme_start_keep_alive
  nvme-loop: fix kernel oops in case of unhandled command
  nvme: enforce 64bit offset for nvme_get_log_ext fn
  sr: get/drop reference to device in revalidate and check_events
  blk-mq: Revert "blk-mq: reimplement blk_mq_hw_queue_mapped"
  blk-mq: Avoid that submitting a bio concurrently with device removal triggers a crash
  backing: silence compiler warning using __printf
  blk-mq: remove code for dealing with remapping queue
  blk-mq: reimplement blk_mq_hw_queue_mapped
  blk-mq: don't check queue mapped in __blk_mq_delay_run_hw_queue()
  ...
2018-04-13 15:15:15 -07:00
Linus Torvalds
b284d4d5a6 Merge tag 'ceph-for-4.17-rc1' of git://github.com/ceph/ceph-client
Pull ceph updates from Ilya Dryomov:
 "The big ticket items are:

   - support for rbd "fancy" striping (myself).

     The striping feature bit is now fully implemented, allowing mapping
     v2 images with non-default striping patterns. This completes
     support for --image-format 2.

   - CephFS quota support (Luis Henriques and Zheng Yan).

     This set is based on the new SnapRealm code in the upcoming v13.y.z
     ("Mimic") release. Quota handling will be rejected on older
     filesystems.

   - memory usage improvements in CephFS (Chengguang Xu).

     Directory specific bits have been split out of ceph_file_info and
     some effort went into improving cap reservation code to avoid OOM
     crashes.

  Also included a bunch of assorted fixes all over the place from
  Chengguang and others"

* tag 'ceph-for-4.17-rc1' of git://github.com/ceph/ceph-client: (67 commits)
  ceph: quota: report root dir quota usage in statfs
  ceph: quota: add counter for snaprealms with quota
  ceph: quota: cache inode pointer in ceph_snap_realm
  ceph: fix root quota realm check
  ceph: don't check quota for snap inode
  ceph: quota: update MDS when max_bytes is approaching
  ceph: quota: support for ceph.quota.max_bytes
  ceph: quota: don't allow cross-quota renames
  ceph: quota: support for ceph.quota.max_files
  ceph: quota: add initial infrastructure to support cephfs quotas
  rbd: remove VLA usage
  rbd: fix spelling mistake: "reregisteration" -> "reregistration"
  ceph: rename function drop_leases() to a more descriptive name
  ceph: fix invalid point dereference for error case in mdsc destroy
  ceph: return proper bool type to caller instead of pointer
  ceph: optimize memory usage
  ceph: optimize mds session register
  libceph, ceph: add __init attribution to init funcitons
  ceph: filter out used flags when printing unused open flags
  ceph: don't wait on writeback when there is no more dirty pages
  ...
2018-04-10 12:25:30 -07:00
Omar Sandoval
bdac616db9 loop: fix LOOP_GET_STATUS lock imbalance
Commit 2d1d4c1e59 made loop_get_status() drop lo_ctx_mutex before
returning, but the loop_get_status_old(), loop_get_status64(), and
loop_get_status_compat() wrappers don't call loop_get_status() if the
passed argument is NULL. The callers expect that the lock is dropped, so
make sure we drop it in that case, too.

Reported-by: syzbot+31e8daa8b3fc129e75f2@syzkaller.appspotmail.com
Fixes: 2d1d4c1e59 ("loop: don't call into filesystem while holding lo_ctl_mutex")
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-04-10 08:38:46 -06:00
Tetsuo Handa
1e047eaab3 block/loop: fix deadlock after loop_set_status
syzbot is reporting deadlocks at __blkdev_get() [1].

----------------------------------------
[   92.493919] systemd-udevd   D12696   525      1 0x00000000
[   92.495891] Call Trace:
[   92.501560]  schedule+0x23/0x80
[   92.502923]  schedule_preempt_disabled+0x5/0x10
[   92.504645]  __mutex_lock+0x416/0x9e0
[   92.510760]  __blkdev_get+0x73/0x4f0
[   92.512220]  blkdev_get+0x12e/0x390
[   92.518151]  do_dentry_open+0x1c3/0x2f0
[   92.519815]  path_openat+0x5d9/0xdc0
[   92.521437]  do_filp_open+0x7d/0xf0
[   92.527365]  do_sys_open+0x1b8/0x250
[   92.528831]  do_syscall_64+0x6e/0x270
[   92.530341]  entry_SYSCALL_64_after_hwframe+0x42/0xb7

[   92.931922] 1 lock held by systemd-udevd/525:
[   92.933642]  #0: 00000000a2849e25 (&bdev->bd_mutex){+.+.}, at: __blkdev_get+0x73/0x4f0
----------------------------------------

The reason of deadlock turned out that wait_event_interruptible() in
blk_queue_enter() got stuck with bdev->bd_mutex held at __blkdev_put()
due to q->mq_freeze_depth == 1.

----------------------------------------
[   92.787172] a.out           S12584   634    633 0x80000002
[   92.789120] Call Trace:
[   92.796693]  schedule+0x23/0x80
[   92.797994]  blk_queue_enter+0x3cb/0x540
[   92.803272]  generic_make_request+0xf0/0x3d0
[   92.807970]  submit_bio+0x67/0x130
[   92.810928]  submit_bh_wbc+0x15e/0x190
[   92.812461]  __block_write_full_page+0x218/0x460
[   92.815792]  __writepage+0x11/0x50
[   92.817209]  write_cache_pages+0x1ae/0x3d0
[   92.825585]  generic_writepages+0x5a/0x90
[   92.831865]  do_writepages+0x43/0xd0
[   92.836972]  __filemap_fdatawrite_range+0xc1/0x100
[   92.838788]  filemap_write_and_wait+0x24/0x70
[   92.840491]  __blkdev_put+0x69/0x1e0
[   92.841949]  blkdev_close+0x16/0x20
[   92.843418]  __fput+0xda/0x1f0
[   92.844740]  task_work_run+0x87/0xb0
[   92.846215]  do_exit+0x2f5/0xba0
[   92.850528]  do_group_exit+0x34/0xb0
[   92.852018]  SyS_exit_group+0xb/0x10
[   92.853449]  do_syscall_64+0x6e/0x270
[   92.854944]  entry_SYSCALL_64_after_hwframe+0x42/0xb7

[   92.943530] 1 lock held by a.out/634:
[   92.945105]  #0: 00000000a2849e25 (&bdev->bd_mutex){+.+.}, at: __blkdev_put+0x3c/0x1e0
----------------------------------------

The reason of q->mq_freeze_depth == 1 turned out that loop_set_status()
forgot to call blk_mq_unfreeze_queue() at error paths for
info->lo_encrypt_type != NULL case.

----------------------------------------
[   37.509497] CPU: 2 PID: 634 Comm: a.out Tainted: G        W        4.16.0+ #457
[   37.513608] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 05/19/2017
[   37.518832] RIP: 0010:blk_freeze_queue_start+0x17/0x40
[   37.521778] RSP: 0018:ffffb0c2013e7c60 EFLAGS: 00010246
[   37.524078] RAX: 0000000000000000 RBX: ffff8b07b1519798 RCX: 0000000000000000
[   37.527015] RDX: 0000000000000002 RSI: ffffb0c2013e7cc0 RDI: ffff8b07b1519798
[   37.529934] RBP: ffffb0c2013e7cc0 R08: 0000000000000008 R09: 47a189966239b898
[   37.532684] R10: dad78b99b278552f R11: 9332dca72259d5ef R12: ffff8b07acd73678
[   37.535452] R13: 0000000000004c04 R14: 0000000000000000 R15: ffff8b07b841e940
[   37.538186] FS:  00007fede33b9740(0000) GS:ffff8b07b8e80000(0000) knlGS:0000000000000000
[   37.541168] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   37.543590] CR2: 00000000206fdf18 CR3: 0000000130b30006 CR4: 00000000000606e0
[   37.546410] Call Trace:
[   37.547902]  blk_freeze_queue+0x9/0x30
[   37.549968]  loop_set_status+0x67/0x3c0 [loop]
[   37.549975]  loop_set_status64+0x3b/0x70 [loop]
[   37.549986]  lo_ioctl+0x223/0x810 [loop]
[   37.549995]  blkdev_ioctl+0x572/0x980
[   37.550003]  block_ioctl+0x34/0x40
[   37.550006]  do_vfs_ioctl+0xa7/0x6d0
[   37.550017]  ksys_ioctl+0x6b/0x80
[   37.573076]  SyS_ioctl+0x5/0x10
[   37.574831]  do_syscall_64+0x6e/0x270
[   37.576769]  entry_SYSCALL_64_after_hwframe+0x42/0xb7
----------------------------------------

[1] https://syzkaller.appspot.com/bug?id=cd662bc3f6022c0979d01a262c318fab2ee9b56f

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reported-by: syzbot <bot+48594378e9851eab70bcd6f99327c7db58c5a28a@syzkaller.appspotmail.com>
Fixes: ecdd09597a ("block/loop: fix race between I/O and set_status")
Cc: Ming Lei <tom.leiming@gmail.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: stable <stable@vger.kernel.org>
Cc: Jens Axboe <axboe@fb.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2018-04-10 08:38:46 -06:00