The name ATA_QCFLAG_FAILED is misleading since it does not mean that a
QC completed in error, or that it didn't complete at all. It means that
libata decided to schedule EH for the QC, so the QC is now owned by the
libata error handler (EH).
The normal execution path is responsible for not accessing a QC owned
by EH. libata core enforces the rule by returning NULL from
ata_qc_from_tag() for QCs owned by EH.
It is quite easy to mistake that a QC marked with ATA_QCFLAG_FAILED was
an error. However, a QC that was actually an error is instead indicated
by having qc->err_mask set. E.g. when we have a NCQ error, we abort all
QCs, which currently will mark all QCs as ATA_QCFLAG_FAILED. However, it
will only be a single QC that is an error (i.e. has qc->err_mask set).
Rename ATA_QCFLAG_FAILED to ATA_QCFLAG_EH to more clearly highlight that
this flag simply means that a QC is now owned by EH. This new name will
not mislead to think that the QC was an error (which is instead
indicated by having qc->err_mask set).
This also makes it more obvious that the EH code skips all QCs that do
not have ATA_QCFLAG_EH set (rather than ATA_QCFLAG_FAILED), since the EH
code should simply only care about QCs that are owned by EH itself.
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Reviewed-by: John Garry <john.g.garry@oracle.com>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Factor out sas_ata_add_dev() and put it in sas_ata.c since it is a SATA
related interface. Also follow the standard coding style to define an
inline empty function when CONFIG_SCSI_SAS_ATA is not enabled.
Cc: John Garry <john.g.garry@oracle.com>
Signed-off-by: Jason Yan <yanaijie@huawei.com>
Reviewed-by: John Garry <john.g.garry@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The domain device 'child' is allocated in sas_ex_discover_end_dev() and
used to be added to the dev_list in this function. After the following two
fixes the device is added to the disco_list instead. As a result, the
list_del() and locking left behind is now redundant.
Fixes: 87c8331fcf ("[SCSI] libsas: prevent domain rediscovery competing with ata error handling")
Fixes: 92625f9bff ("[SCSI] libsas: restore scan order")
Cc: John Garry <john.g.garry@oracle.com>
Signed-off-by: Jason Yan <yanaijie@huawei.com>
Reviewed-by: John Garry <john.g.garry@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The coding style where calling this interface is inconsistent with other
interfaces for SATA devices. The standard style for other SATA interfaces
is like:
#ifdefine CONFIG_SCSI_SAS_ATA
void sas_ata_task_abort(struct sas_task *task);
#else
static inline void sas_ata_task_abort(struct sas_task *task)
{
}
#endif
And the callers does not have to do things like "#ifdefine CONFIG_SCSI_SAS_ATA"
and may call the interface directly. So follow the standard style here.
Cc: John Garry <john.g.garry@oracle.com>
Signed-off-by: Jason Yan <yanaijie@huawei.com>
Reviewed-by: Jack Wang <jinpu.wang@ionos.com>
Reviewed-by: John Garry <john.g.garry@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
There is a sas_get_ata_command_set() declaration above sas_get_ata_info()
to make it compile. However, this function is defined in the same
file. Move it up to save the forward declaration.
Also remove the variable 'fis' which is not needed in this function.
Cc: John Garry <john.g.garry@oracle.com>
Signed-off-by: Jason Yan <yanaijie@huawei.com>
Reviewed-by: John Garry <john.g.garry@oracle.com>
Reviewed-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Pull SCSI updates from James Bottomley:
"Updates to the usual drivers (target, ufs, smartpqi, lpfc).
There are some core changes, mostly around reworking some of our user
context assumptions in device put and moving some code around.
The remaining updates are bug fixes and minor changes"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (138 commits)
scsi: sg: Fix get_user() in call sg_scsi_ioctl()
scsi: megaraid_sas: Fix some spelling mistakes in comment
scsi: core: Use SCSI_SCAN_INITIAL in do_scsi_scan_host()
scsi: core: Use SCSI_SCAN_RESCAN in __scsi_add_device()
scsi: ufs: ufs-mediatek: Remove unnecessary return code
scsi: ufs: core: Fix the polling implementation
scsi: libsas: Do not export sas_ata_wait_after_reset()
scsi: hisi_sas: Fix SATA devices missing issue during I_T nexus reset
scsi: libsas: Add smp_ata_check_ready_type()
scsi: Revert "scsi: hisi_sas: Don't send bcast events from HW during nexus HA reset"
scsi: Revert "scsi: hisi_sas: Drain bcast events in hisi_sas_rescan_topology()"
scsi: ufs: ufs-mediatek: Modify the return value
scsi: ufs: ufs-mediatek: Remove unneeded code
scsi: device_handler: alua: Call scsi_device_put() from non-atomic context
scsi: device_handler: alua: Revert "Move a scsi_device_put() call out of alua_check_vpd()"
scsi: snic: Fix possible UAF in snic_tgt_create()
scsi: qla2xxx: Initialize vha->unknown_atio_[list, work] for NPIV hosts
scsi: qla2xxx: Remove duplicate of vha->iocb_work initialization
scsi: fcoe: Fix transport not deattached when fcoe_if_init() fails
scsi: sd: Use 16-byte SYNCHRONIZE CACHE on ZBC devices
...
Similar to how AHCI handles NCQ errors in ahci_error_intr() ->
ata_port_abort() -> ata_do_link_abort(), add an NCQ error handler for LLDDs
to call to initiate a link abort.
This will mark all outstanding QCs as failed and kick-off EH.
Note:
A "force reset" argument is added for drivers which require the ATA error
handling to always reset the device.
A driver may require this feature for when SATA device per-SCSI cmnd
resources are only released during reset for ATA EH. As such, we need an
option to force reset to be done, regardless of what any EH autopsy
decides.
The SATA device FIS fields are set to indicate a device error from
ata_eh_analyze_tf().
Suggested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Suggested-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Link: https://lore.kernel.org/r/1665998435-199946-2-git-send-email-john.garry@huawei.com
Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Tested-by: Niklas Cassel <niklas.cassel@wdc.com> # pm80xx
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Pull SCSI updates from James Bottomley:
"Updates to the usual drivers (qla2xxx, lpfc, ufs, hisi_sas, mpi3mr,
mpt3sas, target). The biggest change (from my biased viewpoint) being
that the mpi3mr now attached to the SAS transport class, making it the
first fusion type device to do so.
Beyond the usual bug fixing and security class reworks, there aren't a
huge number of core changes"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (141 commits)
scsi: iscsi: iscsi_tcp: Fix null-ptr-deref while calling getpeername()
scsi: mpi3mr: Remove unnecessary cast
scsi: stex: Properly zero out the passthrough command structure
scsi: mpi3mr: Update driver version to 8.2.0.3.0
scsi: mpi3mr: Fix scheduling while atomic type bug
scsi: mpi3mr: Scan the devices during resume time
scsi: mpi3mr: Free enclosure objects during driver unload
scsi: mpi3mr: Handle 0xF003 Fault Code
scsi: mpi3mr: Graceful handling of surprise removal of PCIe HBA
scsi: mpi3mr: Schedule IRQ kthreads only on non-RT kernels
scsi: mpi3mr: Support new power management framework
scsi: mpi3mr: Update mpi3 header files
scsi: mpt3sas: Revert "scsi: mpt3sas: Fix ioc->base_readl() use"
scsi: mpt3sas: Revert "scsi: mpt3sas: Fix writel() use"
scsi: wd33c93: Remove dead code related to the long-gone config WD33C93_PIO
scsi: core: Add I/O timeout count for SCSI device
scsi: qedf: Populate sysfs attributes for vport
scsi: pm8001: Replace one-element array with flexible-array member
scsi: 3w-xxxx: Replace one-element array with flexible-array member
scsi: hptiop: Replace one-element array with flexible-array member in struct hpt_iop_request_ioctl_command()
...
The function __ata_change_queue_depth() uses the helper
ata_scsi_find_dev() to get the ata_device structure of a scsi device and
set that device maximum queue depth. However, when the ata device is
managed by libsas, ata_scsi_find_dev() returns NULL, turning
__ata_change_queue_depth() into a nop, which prevents the user from
setting the maximum queue depth of ATA devices used with libsas based
HBAs.
Fix this by renaming __ata_change_queue_depth() to
ata_change_queue_depth() and adding a pointer to the ata_device
structure of the target device as argument. This pointer is provided by
ata_scsi_change_queue_depth() using ata_scsi_find_dev() in the case of
a libata managed device and by sas_change_queue_depth() using
sas_to_ata_dev() in the case of a libsas managed ata device.
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Tested-by: John Garry <john.garry@huawei.com>
When executing SMP task failed, the smp_execute_task_sg() calls del_timer()
to delete "slow_task->timer". However, if the timer handler
sas_task_internal_timedout() is running, the del_timer() in
smp_execute_task_sg() will not stop it and a UAF will happen. The process
is shown below:
(thread 1) | (thread 2)
smp_execute_task_sg() | sas_task_internal_timedout()
... |
del_timer() |
... | ...
sas_free_task(task) |
kfree(task->slow_task) //FREE|
| task->slow_task->... //USE
Fix by calling del_timer_sync() in smp_execute_task_sg(), which makes sure
the timer handler have finished before the "task->slow_task" is
deallocated.
Link: https://lore.kernel.org/r/20220920144213.10536-1-duoming@zju.edu.cn
Fixes: 2908d778ab ("[SCSI] aic94xx: new driver")
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When compiling with gcc 12, several warnings are thrown by gcc when
compiling drivers/scsi/libsas/sas_expander.c, e.g.:
In function ‘sas_get_ex_change_count’,
inlined from ‘sas_find_bcast_dev’ at
drivers/scsi/libsas/sas_expander.c:1816:8:
drivers/scsi/libsas/sas_expander.c:1781:20: warning: array subscript
‘struct smp_resp[0]’ is partly outside array bounds of ‘unsigned
char[32]’ [-Warray-bounds]
1781 | if (rg_resp->result != SMP_RESP_FUNC_ACC) {
| ~~~~~~~^~~~~~~~
This is due to the use of the struct smp_resp to aggregate all possible
response types using a union but allocating a response buffer with a size
exactly equal to the size of the response type needed. This leads to access
to fields of struct smp_resp from an allocated memory area that is smaller
than the size of struct smp_resp.
Fix this by defining struct smp_rg_resp for sas report general responses.
Link: https://lore.kernel.org/r/20220609022456.409087-3-damien.lemoal@opensource.wdc.com
Reviewed-by: John Garry <john.garry@huawei.com>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When compiling with gcc 12, several warnings are thrown by gcc when
compiling drivers/scsi/libsas/sas_expander.c, e.g.:
In function ‘sas_get_phy_change_count’,
inlined from ‘sas_find_bcast_phy.constprop’ at
drivers/scsi/libsas/sas_expander.c:1737:9:
drivers/scsi/libsas/sas_expander.c:1697:39: warning: array subscript
‘struct smp_resp[0]’ is partly outside array bounds of ‘unsigned
char[56]’ [-Warray-bounds]
1697 | *pcc = disc_resp->disc.change_count;
| ~~~~~~~~~~~~~~~^~~~~~~~~~~~~
This is due to the use of the struct smp_resp to aggregate all possible
response types using a union but allocating a response buffer with a size
exactly equal to the size of the response type needed. This leads to access
to fields of struct smp_resp from an allocated memory area that is smaller
than the size of struct smp_resp.
Fix this by defining struct smp_disc_resp for sas discovery operations.
Since this structure and the generic struct smp_resp are identical for
the little endian and big endian archs, move the definition of these
structures at the end of include/scsi/sas.h to avoid repeating their
definition.
Link: https://lore.kernel.org/r/20220609022456.409087-2-damien.lemoal@opensource.wdc.com
Reviewed-by: John Garry <john.garry@huawei.com>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The internal abort feature is common to hisi_sas and pm8001 HBAs, and the
driver support is similar also, so add a common handler.
Two modes of operation will be supported:
- single: Abort a single tagged command
- device: Abort all commands associated with a specific domain device
A new protocol is added, SAS_PROTOCOL_INTERNAL_ABORT, so the common queue
command API may be re-used.
Only add "single" support as a first step.
Link: https://lore.kernel.org/r/1647001432-239276-2-git-send-email-john.garry@huawei.com
Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Acked-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Sparse throws a warning about context imbalance ("different lock contexts
for basic block") in sas_form_port() as it gets confused with the fact that
a port is locked within one of the two search loops and unlocked afterward
outside of the search loops once the phy is added to the port. Since this
code is not easy to follow, improve it by factoring out the code adding the
phy to the port once the port is locked into the helper function
sas_form_port_add_phy(). This helper can then be called directly within the
port search loops, avoiding confusion and clearing the sparse warning.
Link: https://lore.kernel.org/r/20220228094857.557329-1-damien.lemoal@opensource.wdc.com
Reviewed-by: John Garry <john.garry@huawei.com>
Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Drivers using libsas have to issue their own TMFs, and the code to do this
is duplicated between drivers.
Add a common function to handle TMFs. This will be also used for sending
ATA commands, but use name "tmf" as the purpose is similar between SCSI and
ATA in the usage.
The force_phy_id argument will be used for a hisi_sas HW bug workaround.
We check the sas task status stat field against TMF codes, as according to
the SAS v1.1 spec 9.2.2.5.3, if response_data is in datapres, then the
response code should be a TMF code - see table 128.
Link: https://lore.kernel.org/r/1645112566-115804-10-git-send-email-john.garry@huawei.com
Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
LLDD TMF callbacks may return linux or other error codes instead of TMF
codes. This may cause problems in sas_scsi_find_task() ->
.lldd_query_task(), as only TMF codes are handled there. As such, we may
not return a task_disposition type from sas_scsi_find_task(). Function
sas_eh_handle_sas_errors() only handles that type, and will only progress
error handling for those recognised types.
Return TASK_ABORT_FAILED upon exit on the assumption that the command may
still be alive and error handling should be escalated.
Link: https://lore.kernel.org/r/1645112566-115804-2-git-send-email-john.garry@huawei.com
Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>