bianbu-linux-6.6

mirror of https://gitee.com/bianbu-linux/linux-6.6 synced 2025-07-18 01:23:36 -04:00

Author	SHA1	Message	Date
Saurav Kashyap	b09ea43fec	scsi: qedf: Do not kill timeout work for original I/O on RRQ completion The timer is already cancelled when abort is completed, hence no need to cancel it again. Link: https://lore.kernel.org/r/20200807110656.19965-4-jhasan@marvell.com Signed-off-by: Saurav Kashyap <skashyap@marvell.com> Signed-off-by: Javed Hasan <jhasan@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2020-08-20 21:41:51 -04:00
Saurav Kashyap	7fb8ff0806	scsi: qedf: Check the validity of rjt frame before processing This is reported by Klockwork. Link: https://lore.kernel.org/r/20200807110656.19965-3-jhasan@marvell.com Signed-off-by: Saurav Kashyap <skashyap@marvell.com> Signed-off-by: Javed Hasan <jhasan@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2020-08-20 21:41:51 -04:00
Saurav Kashyap	a521bbc38d	scsi: qedf: Check for port type and role before processing an event The rport lock gets initialized during offload. If a non-FCP or non-target rport got logout then this rport will be uninitialized. KASAN was complaining because of it. ========= [ 14.384434] the code is fine but needs lockdep annotation. [ 14.384482] turning off the locking correctness validator. ======== Link: https://lore.kernel.org/r/20200807110656.19965-2-jhasan@marvell.com Signed-off-by: Saurav Kashyap <skashyap@marvell.com> Signed-off-by: Javed Hasan <jhasan@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2020-08-20 21:41:51 -04:00
Sai Prakash Ranjan	68bdb3db6c	scsi: ufs-qcom: Remove unused MSM bus scaling APIs MSM bus scaling has moved on to use interconnect framework and downstream bus scaling APIs like msm_bus_scale*() do not exist anymore in the kernel. Currently they are guarded by a config which also does not exist and hence there are no build failures reported. Remove these unused interfaces as they are currently no-ops and the scaling support that may be added in future will use interconnect API. Link: https://lore.kernel.org/r/20200804161033.15586-1-saiprakash.ranjan@codeaurora.org Signed-off-by: Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2020-08-20 21:41:51 -04:00
Don Brace	ce60a2b827	scsi: smartpqi: Bump version to 1.2.16-010 Link: https://lore.kernel.org/r/159622931040.30579.9167901134341507088.stgit@brunhilda Reviewed-by: Scott Teel <scott.teel@microsemi.com> Reviewed-by: Scott Benesh <scott.benesh@microsemi.com> Reviewed-by: Gerry Morong <gerry.morong@microsemi.com> Reviewed-by: Kevin Barnett <kevin.barnett@microsemi.com> Reviewed-by: Martin Wilck <mwilck@suse.com> Signed-off-by: Don Brace <don.brace@microsemi.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2020-08-20 21:41:51 -04:00
Kevin Barnett	8b664fefa3	scsi: smartpqi: Add RAID bypass counter Add a counter to assist in verifying when RAID bypass is being used. Link: https://lore.kernel.org/r/159622930468.30579.13153724465552773544.stgit@brunhilda Reviewed-by: Scott Teel <scott.teel@microsemi.com> Reviewed-by: Scott Benesh <scott.benesh@microsemi.com> Reviewed-by: Martin Wilck <mwilck@suse.com> Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com> Signed-off-by: Don Brace <don.brace@microsemi.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2020-08-20 21:41:51 -04:00
Kevin Barnett	4d15ad3813	scsi: smartpqi: Support device deletion via sysfs Support device deletion via sysfs. I.e: echo 1 > /sys/block/sd<X>/device/delete Link: https://lore.kernel.org/r/159622929885.30579.2727491506675011534.stgit@brunhilda Reviewed-by: Scott Teel <scott.teel@microsemi.com> Reviewed-by: Scott Benesh <scott.benesh@microsemi.com> Reviewed-by: Martin Wilck <mwilck@suse.com> Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com> Signed-off-by: Don Brace <don.brace@microsemi.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2020-08-20 21:41:51 -04:00
Kevin Barnett	9e68cccc8e	scsi: smartpqi: Avoid crashing kernel for controller issues Eliminate kernel panics when getting invalid responses from controller. Take controller offline instead of causing kernel panics. Link: https://lore.kernel.org/r/159622929306.30579.16523318707596752828.stgit@brunhilda Reviewed-by: Scott Teel <scott.teel@microsemi.com> Reviewed-by: Scott Benesh <scott.benesh@microsemi.com> Reviewed-by: Prasad Munirathnam <Prasad.Munirathnam@microsemi.com> Reviewed-by: Martin Wilck <mwilck@suse.com> Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com> Signed-off-by: Don Brace <don.brace@microsemi.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2020-08-20 21:41:51 -04:00
Mahesh Rajashekhara	244ca45e15	scsi: smartpqi: Update logical volume size after expansion Have OS rescan after logical volume expansion to reflect new size. Link: https://lore.kernel.org/r/159622928727.30579.298277463169866711.stgit@brunhilda Reviewed-by: Scott Teel <scott.teel@microsemi.com> Reviewed-by: Scott Benesh <scott.benesh@microsemi.com> Reviewed-by: Martin Wilck <mwilck@suse.com> Signed-off-by: Mahesh Rajashekhara <mahesh.rajashekhara@microsemi.com> Signed-off-by: Don Brace <don.brace@microsemi.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2020-08-20 21:41:51 -04:00
Mahesh Rajashekhara	3af06083ba	scsi: smartpqi: Add id support for SmartRAID 3152-8i VID_9005, DID_028F, SVID_9005 and SDID_080A. Link: https://lore.kernel.org/r/159622928143.30579.14769183842894725454.stgit@brunhilda Reviewed-by: Scott Teel <scott.teel@microsemi.com> Reviewed-by: Scott Benesh <scott.benesh@microsemi.com> Reviewed-by: Kevin Barnett <kevin.barnett@microsemi.com> Reviewed-by: Martin Wilck <mwilck@suse.com> Signed-off-by: Mahesh Rajashekhara <mahesh.rajashekhara@microsemi.com> Signed-off-by: Don Brace <don.brace@microsemi.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2020-08-20 21:41:51 -04:00
Kevin Barnett	ce14379350	scsi: smartpqi: Identify physical devices without issuing INQUIRY Eliminate issuing INQUIRYs to problematic devices by using information provided by controller. Link: https://lore.kernel.org/r/159622927172.30579.3960527536810532094.stgit@brunhilda Reviewed-by: Scott Teel <scott.teel@microsemi.com> Reviewed-by: Scott Benesh <scott.benesh@microsemi.com> Reviewed-by: Martin Wilck <mwilck@suse.com> Signed-off-by: Kevin Barnett <kevin.barnett@microsemi.com> Signed-off-by: Don Brace <don.brace@microsemi.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2020-08-20 21:41:50 -04:00
Suganath Prabu S	0491bdc7ee	scsi: mpt3sas: Update driver version to 35.100.00.00 Updated driver version to 35.100.00.00 Link: https://lore.kernel.org/r/1596096229-3341-8-git-send-email-suganath-prabu.subramani@broadcom.com Signed-off-by: Suganath Prabu S <suganath-prabu.subramani@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2020-08-20 21:41:50 -04:00
Suganath Prabu S	711a923c14	scsi: mpt3sas: Postprocessing of target and LUN reset If driver has not received the interrupt for the aborted SCSI command before processing the TM reply, driver polls all the reply descriptor pools looking for the reply for the aborted SCSI command before marking TM as FAILED. If it finds the reply, then it marks the TM as SUCCESS otherwise it marks it FAILED. scsih_tm_cmd_map_status() checks whether TM has aborted the timed out SCSI command or not. If TM has aborted the IO, then it returns SUCCESS else it returns FAILED. Link: https://lore.kernel.org/r/1596096229-3341-7-git-send-email-suganath-prabu.subramani@broadcom.com Signed-off-by: Suganath Prabu S <suganath-prabu.subramani@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2020-08-20 21:41:50 -04:00
Suganath Prabu S	521e9c0b62	scsi: mpt3sas: Add functions to check if any cmd is outstanding on Target and LUN Add helper functions to check whether any SCSI command is outstanding on particular Target, LUN device. Also add function parameters 'channel', 'id' to function mpt3sas_scsih_issue_tm(). Link: https://lore.kernel.org/r/1596096229-3341-6-git-send-email-suganath-prabu.subramani@broadcom.com Signed-off-by: Suganath Prabu S <suganath-prabu.subramani@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2020-08-20 21:41:50 -04:00
Suganath Prabu S	5afa9d4444	scsi: mpt3sas: Rename and export interrupt mask/unmask functions Rename Function _base_unmask_interrupts() to mpt3sas_base_unmask_interrupts() and _base_mask_interrupts() to mpt3sas_base_mask_interrupts(). Also add function declarion to mpt3sas_base.h Link: https://lore.kernel.org/r/1596096229-3341-5-git-send-email-suganath-prabu.subramani@broadcom.com Signed-off-by: Suganath Prabu S <suganath-prabu.subramani@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2020-08-20 21:41:50 -04:00
Suganath Prabu S	9e73ed2e4c	scsi: mpt3sas: Cancel the running work during host reset It is not recommended to issue back-to-back host reset without any delay. However, if someone issues back-to-back host reset then we observe that target devices get unregistered and re-register with SML. And if OS drive is behind the HBA when it gets unregistered, then file-system goes into read-only mode. Normally during host reset, driver marks accessible target devices as responding and triggers the event MPT3SAS_REMOVE_UNRESPONDING_DEVICES to remove any non-responding devices through FW worker thread. While processing this event, driver unregisters the non-responding devices and clears the responding flag for all the devices. Currently, during host reset, driver is cancelling only those Firmware event works which are pending in Firmware event workqueue. It is not cancelling work which is currently running. Change the driver to cancel all events. Link: https://lore.kernel.org/r/1596096229-3341-4-git-send-email-suganath-prabu.subramani@broadcom.com Signed-off-by: Suganath Prabu S <suganath-prabu.subramani@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2020-08-20 21:41:50 -04:00
Suganath Prabu S	af6ec1eee5	scsi: mpt3sas: Dump system registers for debugging When controller fails to transition to READY state during driver probe, dump the system interface register set. This will give snapshot of the firmware status for debugging driver load issues. Link: https://lore.kernel.org/r/1596096229-3341-3-git-send-email-suganath-prabu.subramani@broadcom.com Signed-off-by: Suganath Prabu S <suganath-prabu.subramani@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2020-08-20 21:41:50 -04:00
Suganath Prabu S	f09219e48b	scsi: mpt3sas: Memset config_cmds.reply buffer with zeros Currently config_cmds.reply buffer is not memset to zero before posting config page request message. In some cases, for the current config request, the previous config reply is getting processed and we will observe PageType mismatch between request to reply buffer. It will be difficult to debug this type of issue and it confuses by thinking that HBA Firmware itself posted the wrong config reply. So it is better to memset the config_cmds.reply buffer with zeros before issuing the config request. Link: https://lore.kernel.org/r/1596096229-3341-2-git-send-email-suganath-prabu.subramani@broadcom.com Signed-off-by: Suganath Prabu S <suganath-prabu.subramani@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2020-08-20 21:41:50 -04:00
Can Guo	8bb2dde069	scsi: ufs: Properly release resources if a task is aborted successfully In current UFS task abort hook, namely ufshcd_abort(), if one task is aborted successfully, clk_gating.active_reqs held by this task is not decreased, which makes clk_gating.active_reqs stay above zero forever, thus clock gating would never happen. Instead of releasing resources of one task "manually", use the existing func __ufshcd_transfer_req_compl(). This change also eliminates a possible race of scsi_dma_unmap() from the real completion in IRQ handler path. Link: https://lore.kernel.org/r/1596975355-39813-10-git-send-email-cang@codeaurora.org Fixes: `1ab27c9cf8` ("ufs: Add support for clock gating") CC: Stanley Chu <stanley.chu@mediatek.com> Reviewed-by: Stanley Chu <stanley.chu@mediatek.com> Reviewed-by: Asutosh Das <asutoshd@codeaurora.org> Signed-off-by: Can Guo <cang@codeaurora.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2020-08-20 21:41:25 -04:00
Quinn Tran	dca93232b3	Revert "scsi: qla2xxx: Disable T10-DIF feature with FC-NVMe during probe" FCP T10-PI and NVMe features are independent of each other. This patch allows both features to co-exist. This reverts commit `5da05a26b8`. Link: https://lore.kernel.org/r/20200806111014.28434-12-njavali@marvell.com Fixes: `5da05a26b8` ("scsi: qla2xxx: Disable T10-DIF feature with FC-NVMe during probe") Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Quinn Tran <qutran@marvell.com> Signed-off-by: Nilesh Javali <njavali@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2020-08-17 22:43:55 -04:00
Saurav Kashyap	de7e619430	Revert "scsi: qla2xxx: Fix crash on qla2x00_mailbox_command" FCoE adapter initialization failed for ISP8021 with the following patch applied. In addition, reproduction of the issue the patch originally tried to address has been unsuccessful. This reverts commit `3cb182b3fa`. Link: https://lore.kernel.org/r/20200806111014.28434-11-njavali@marvell.com Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Saurav Kashyap <skashyap@marvell.com> Signed-off-by: Nilesh Javali <njavali@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2020-08-17 22:43:55 -04:00
Quinn Tran	83949613fa	scsi: qla2xxx: Fix null pointer access during disconnect from subsystem NVMEAsync command is being submitted to QLA while the same NVMe controller is in the middle of reset. The reset path has deleted the association and freed aen_op->fcp_req.private. Add a check for this private pointer before issuing the command. ... 6 [ffffb656ca11fce0] page_fault at ffffffff8c00114e [exception RIP: qla_nvme_post_cmd+394] RIP: ffffffffc0d012ba RSP: ffffb656ca11fd98 RFLAGS: 00010206 RAX: ffff8fb039eda228 RBX: ffff8fb039eda200 RCX: 00000000000da161 RDX: ffffffffc0d4d0f0 RSI: ffffffffc0d26c9b RDI: ffff8fb039eda220 RBP: 0000000000000013 R8: ffff8fb47ff6aa80 R9: 0000000000000002 R10: 0000000000000000 R11: ffffb656ca11fdc8 R12: ffff8fb27d04a3b0 R13: ffff8fc46dd98a58 R14: 0000000000000000 R15: ffff8fc4540f0000 ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 7 [ffffb656ca11fe08] nvme_fc_start_fcp_op at ffffffffc0241568 [nvme_fc] 8 [ffffb656ca11fe50] nvme_fc_submit_async_event at ffffffffc0241901 [nvme_fc] 9 [ffffb656ca11fe68] nvme_async_event_work at ffffffffc014543d [nvme_core] 10 [ffffb656ca11fe98] process_one_work at ffffffff8b6cd437 11 [ffffb656ca11fed8] worker_thread at ffffffff8b6cdcef 12 [ffffb656ca11ff10] kthread at ffffffff8b6d3402 13 [ffffb656ca11ff50] ret_from_fork at ffffffff8c000255 -- PID: 37824 TASK: ffff8fb033063d80 CPU: 20 COMMAND: "kworker/u97:451" 0 [ffffb656ce1abc28] __schedule at ffffffff8be629e3 1 [ffffb656ce1abcc8] schedule at ffffffff8be62fe8 2 [ffffb656ce1abcd0] schedule_timeout at ffffffff8be671ed 3 [ffffb656ce1abd70] wait_for_completion at ffffffff8be639cf 4 [ffffb656ce1abdd0] flush_work at ffffffff8b6ce2d5 5 [ffffb656ce1abe70] nvme_stop_ctrl at ffffffffc0144900 [nvme_core] 6 [ffffb656ce1abe80] nvme_fc_reset_ctrl_work at ffffffffc0243445 [nvme_fc] 7 [ffffb656ce1abe98] process_one_work at ffffffff8b6cd437 8 [ffffb656ce1abed8] worker_thread at ffffffff8b6cdb50 9 [ffffb656ce1abf10] kthread at ffffffff8b6d3402 10 [ffffb656ce1abf50] ret_from_fork at ffffffff8c000255 Link: https://lore.kernel.org/r/20200806111014.28434-10-njavali@marvell.com Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Quinn Tran <qutran@marvell.com> Signed-off-by: Nilesh Javali <njavali@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2020-08-17 22:40:15 -04:00
Saurav Kashyap	dffa114533	scsi: qla2xxx: Check if FW supports MQ before enabling OS boot during Boot from SAN was stuck at dracut emergency shell after enabling NVMe driver parameter. For non-MQ support the driver was enabling MQ. Add a check to confirm if FW supports MQ. Link: https://lore.kernel.org/r/20200806111014.28434-9-njavali@marvell.com Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Saurav Kashyap <skashyap@marvell.com> Signed-off-by: Nilesh Javali <njavali@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2020-08-17 22:40:15 -04:00
Arun Easi	897d68eb81	scsi: qla2xxx: Fix WARN_ON in qla_nvme_register_hba qla_nvme_register_hba() puts out a warning when there are not enough queue pairs available for FC-NVME. Just fail the NVME registration rather than a WARNING + call Trace. Link: https://lore.kernel.org/r/20200806111014.28434-8-njavali@marvell.com Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Arun Easi <aeasi@marvell.com> Signed-off-by: Nilesh Javali <njavali@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2020-08-17 22:40:14 -04:00
Arun Easi	49030003a3	scsi: qla2xxx: Allow ql2xextended_error_logging special value 1 to be set anytime ql2xextended_error_logging can now be set to 1 to get the default mask value, as opposed to at module load time only. Link: https://lore.kernel.org/r/20200806111014.28434-7-njavali@marvell.com Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Arun Easi <aeasi@marvell.com> Signed-off-by: Nilesh Javali <njavali@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2020-08-17 22:40:13 -04:00
Quinn Tran	81b9d1e19d	scsi: qla2xxx: Reduce noisy debug message Update debug level and message for ELS IOCB done. Link: https://lore.kernel.org/r/20200806111014.28434-6-njavali@marvell.com Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Quinn Tran <qutran@marvell.com> Signed-off-by: Nilesh Javali <njavali@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2020-08-17 22:40:13 -04:00
Quinn Tran	abb31aeaa9	scsi: qla2xxx: Fix login timeout Multipath errors were seen during failback due to login timeout. The remote device sent LOGO, the local host tore down the session and did relogin. The RSCN arrived indicates remote device is going through failover after which the relogin is in a 20s timeout phase. At this point the driver is stuck in the relogin process. Add a fix to delete the session as part of abort/flush the login. Link: https://lore.kernel.org/r/20200806111014.28434-5-njavali@marvell.com Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Quinn Tran <qutran@marvell.com> Signed-off-by: Nilesh Javali <njavali@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2020-08-17 22:40:12 -04:00
Quinn Tran	4709272f63	scsi: qla2xxx: Indicate correct supported speeds for Mezz card Correct the supported speeds for 16G Mezz card. Link: https://lore.kernel.org/r/20200806111014.28434-4-njavali@marvell.com Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Quinn Tran <qutran@marvell.com> Signed-off-by: Nilesh Javali <njavali@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2020-08-17 22:40:11 -04:00
Quinn Tran	a117579d02	scsi: qla2xxx: Flush I/O on zone disable Perform implicit logout to flush I/O on zone disable. Link: https://lore.kernel.org/r/20200806111014.28434-3-njavali@marvell.com Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Quinn Tran <qutran@marvell.com> Signed-off-by: Himanshu Madhani <hmadhani@marvell.com> Signed-off-by: Nilesh Javali <njavali@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2020-08-17 22:40:11 -04:00
Quinn Tran	10ae30ba66	scsi: qla2xxx: Flush all sessions on zone disable On Zone Disable, certain switches would ignore all commands. This causes timeout for both switch scan command and abort of that command. On detection of this condition, all sessions will be shutdown. Link: https://lore.kernel.org/r/20200806111014.28434-2-njavali@marvell.com Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Quinn Tran <qutran@marvell.com> Signed-off-by: Himanshu Madhani <hmadhani@marvell.com> Signed-off-by: Nilesh Javali <njavali@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2020-08-17 22:40:10 -04:00
Enzo Matsumiya	c314a014b1	scsi: qla2xxx: Use MBX_TOV_SECONDS for mailbox command timeout values Improves readability of qla_mbx.c. Link: https://lore.kernel.org/r/20200805200546.22497-1-ematsumiya@suse.de Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Reviewed-by: Roman Bolshakov <r.bolshakov@yadro.com> Signed-off-by: Enzo Matsumiya <ematsumiya@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2020-08-17 22:19:06 -04:00
Douglas Gilbert	223f91b480	scsi: scsi_debug: Fix scp is NULL errors John Garry reported 'sdebug_q_cmd_complete: scp is NULL' failures that were mainly seen on aarch64 machines (e.g. RPi 4 with four A72 CPUs). The problem was tracked down to a missing critical section on a "short circuit" path. Namely, the time to process the current command so far has already exceeded the requested command duration (i.e. the number of nanoseconds in the ndelay parameter). The random=1 parameter setting was pivotal in finding this error. The failure scenario involved first taking that "short circuit" path (due to a very short command duration) and then taking the more likely hrtimer_start() path (due to a longer command duration). With random=1 each command's duration is taken from the uniformly distributed [0..ndelay) interval. The fio utility also helped by reliably generating the error scenario at about once per minute on a RPi 4 (64 bit OS). Link: https://lore.kernel.org/r/20200813155738.109298-1-dgilbert@interlog.com Reported-by: John Garry <john.garry@huawei.com> Reviewed-by: Lee Duncan <lduncan@suse.com> Signed-off-by: Douglas Gilbert <dgilbert@interlog.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2020-08-17 22:13:17 -04:00
Bean Huo	d87a1f6d02	scsi: ufs: No need to send Abort Task if the task in DB was cleared If the bit corresponding to a task in the Doorbell register has been cleared, no need to poll the status of the task on the device side and to send an Abort Task TM. Instead, let it directly goto cleanup. In addition, to keep original debug output, move the goto below the debug print. Link: https://lore.kernel.org/r/20200811141859.27399-3-huobean@gmail.com Reviewed-by: Stanley Chu <stanley.chu@mediatek.com> Reviewed-by: Can Guo <cang@codeaurora.org> Signed-off-by: Bean Huo <beanhuo@micron.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2020-08-17 22:09:14 -04:00
Stanley Chu	b10178ee7f	scsi: ufs: Clean up completed request without interrupt notification If somehow no interrupt notification is raised for a completed request and its doorbell bit is cleared by host, UFS driver needs to cleanup its outstanding bit in ufshcd_abort(). Otherwise, system may behave abnormally in the following scenario: After ufshcd_abort() returns, this request will be requeued by SCSI layer with its outstanding bit set. Any future completed request will trigger ufshcd_transfer_req_compl() to handle all "completed outstanding bits". At this time the "abnormal outstanding bit" will be detected and the "requeued request" will be chosen to execute request post-processing flow. This is wrong because this request is still "alive". Link: https://lore.kernel.org/r/20200811141859.27399-2-huobean@gmail.com Reviewed-by: Can Guo <cang@codeaurora.org> Acked-by: Avri Altman <avri.altman@wdc.com> Signed-off-by: Stanley Chu <stanley.chu@mediatek.com> Signed-off-by: Bean Huo <beanhuo@micron.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2020-08-17 22:09:13 -04:00
Adrian Hunter	127d5f7c4b	scsi: ufs: Improve interrupt handling for shared interrupts For shared interrupts, the interrupt status might be zero, so check that first. Link: https://lore.kernel.org/r/20200811133936.19171-2-adrian.hunter@intel.com Reviewed-by: Avri Altman <avri.altman@wdc.com> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2020-08-17 22:03:29 -04:00
Adrian Hunter	6337f58cec	scsi: ufs: Fix interrupt error message for shared interrupts The interrupt might be shared, in which case it is not an error for the interrupt handler to be called when the interrupt status is zero, so don't print the message unless there was enabled interrupt status. Link: https://lore.kernel.org/r/20200811133936.19171-1-adrian.hunter@intel.com Fixes: `9333d77573` ("scsi: ufs: Fix irq return code") Reviewed-by: Avri Altman <avri.altman@wdc.com> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2020-08-17 22:03:29 -04:00
Adrian Hunter	8da76f71fe	scsi: ufs-pci: Add quirk for broken auto-hibernate for Intel EHL Intel EHL UFS host controller advertises auto-hibernate capability but it does not work correctly. Add a quirk for that. [mkp: checkpatch fix] Link: https://lore.kernel.org/r/20200810141024.28859-1-adrian.hunter@intel.com Fixes: `8c09d75276` ("scsi: ufshdc-pci: Add Intel PCI IDs for EHL") Acked-by: Stanley Chu <stanley.chu@mediatek.com> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2020-08-17 22:00:26 -04:00
Stanley Chu	215d326702	scsi: ufs-mediatek: Fix incorrect time to wait link status Fix incorrect calculation of "ms" based waiting time in function ufs_mtk_setup_clocks(). Link: https://lore.kernel.org/r/20200809055702.20140-1-stanley.chu@mediatek.com Fixes: `9006e3986f` ("scsi: ufs-mediatek: Do not gate clocks if auto-hibern8 is not entered yet") Reviewed-by: Avri Altman <avri.altman@wdc.com> Signed-off-by: Stanley Chu <stanley.chu@mediatek.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2020-08-17 21:53:37 -04:00
Stanley Chu	93b6c5db06	scsi: ufs: Fix possible infinite loop in ufshcd_hold In ufshcd_suspend(), after clk-gating is suspended and link is set as Hibern8 state, ufshcd_hold() is still possibly invoked before ufshcd_suspend() returns. For example, MediaTek's suspend vops may issue UIC commands which would call ufshcd_hold() during the command issuing flow. Now if UFSHCD_CAP_HIBERN8_WITH_CLK_GATING capability is enabled, then ufshcd_hold() may enter infinite loops because there is no clk-ungating work scheduled or pending. In this case, ufshcd_hold() shall just bypass, and keep the link as Hibern8 state. Link: https://lore.kernel.org/r/20200809050734.18740-1-stanley.chu@mediatek.com Reviewed-by: Avri Altman <avri.altman@wdc.com> Co-developed-by: Andy Teng <andy.teng@mediatek.com> Signed-off-by: Andy Teng <andy.teng@mediatek.com> Signed-off-by: Stanley Chu <stanley.chu@mediatek.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2020-08-17 21:52:45 -04:00
Jing Xiangfeng	2138d1c918	scsi: ufs: ti-j721e-ufs: Fix error return in ti_j721e_ufs_probe() Fix to return error code PTR_ERR() from the error handling case instead of 0. Link: https://lore.kernel.org/r/20200806070135.67797-1-jingxiangfeng@huawei.com Fixes: `22617e2163` ("scsi: ufs: ti-j721e-ufs: Fix unwinding of pm_runtime changes") Reviewed-by: Avri Altman <avri.altman@wdc.com> Signed-off-by: Jing Xiangfeng <jingxiangfeng@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2020-08-17 21:48:45 -04:00
Can Guo	5586dd8ea2	scsi: ufs: Fix a race condition between error handler and runtime PM ops The current IRQ handler blocks SCSI requests before scheduling eh_work, when error handler calls pm_runtime_get_sync, if ufshcd_suspend/resume sends a SCSI cmd, most likely the SSU cmd, since SCSI requests are blocked, pm_runtime_get_sync() will never return because ufshcd_suspend/resume is blocked by the SCSI cmd. - In queuecommand path, hba->ufshcd_state check and ufshcd_send_command should stay under the same spin lock. This is to make sure that no more commands leak into doorbell after hba->ufshcd_state is changed. - Don't block SCSI requests before error handler starts to run, let error handler block SCSI requests when it is ready to start error recovery. - Don't let SCSI layer keep requeuing the SCSI cmds sent from HBA runtime PM ops, let them pass or fail them. Let them pass if eh_work is scheduled due to non-fatal errors. Fail them if eh_work is scheduled due to fatal errors, otherwise the cmds may eventually time out since UFS is in bad state, which gets error handler blocked for too long. If we fail the SCSI cmds sent from HBA runtime PM ops, HBA runtime PM ops fails too, but it does not hurt since error handler can recover HBA runtime PM error. Link: https://lore.kernel.org/r/1596975355-39813-9-git-send-email-cang@codeaurora.org Reviewed-by: Bean Huo <beanhuo@micron.com> Reviewed-by: Asutosh Das <asutoshd@codeaurora.org> Signed-off-by: Can Guo <cang@codeaurora.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2020-08-17 20:54:56 -04:00
Can Guo	c3be8d1ee1	scsi: ufs: Move dumps in IRQ handler to error handler Performing dumps in the IRQ handler causes system stability issues. Move dumps to the error handler and only print basic host registers here. Link: https://lore.kernel.org/r/1596975355-39813-8-git-send-email-cang@codeaurora.org Reviewed-by: Bean Huo <beanhuo@micron.com> Reviewed-by: Asutosh Das <asutoshd@codeaurora.org> Signed-off-by: Can Guo <cang@codeaurora.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2020-08-17 20:54:55 -04:00
Can Guo	c72e79c0ad	scsi: ufs: Recover HBA runtime PM error in error handler The current error handler can not recover HBA runtime PM error if ufshcd_suspend/resume has failed due to UFS errors, e.g. hibern8 enter/exit error or SSU cmd error. When this happens, error handler may fail performing a full reset and restore because error handler always assumes that power, IRQs and clocks are ready after pm_runtime_get_sync returns, but actually they are not if ufshcd_resume fails[1]. If ufschd_suspend/resume fails due to UFS errors, runtime PM framework saves the error value to dev.power.runtime_error. After that, HBA dev runtime suspend/resume would not be invoked anymore unless runtime_error is cleared[2]. In case of ufshcd_suspend/resume fails due to UFS errors, for scenario [1], error handler cannot assume anything of pm_runtime_get_sync, meaning error handler should explicitly turn ON powers, IRQs and clocks again. To get the HBA runtime PM work as regard for scenario [2], error handler can clear the runtime_error by calling pm_runtime_set_active() if full reset and restore succeeds. And, more important, if pm_runtime_set_active() returns no error, which means runtime_error has been cleared, we also need to resume those scsi devices under HBA in case any of them has failed to be resumed due to HBA runtime resume failure. This is to unblock blk_queue_enter in case there are bios waiting inside it. Link: https://lore.kernel.org/r/1596975355-39813-7-git-send-email-cang@codeaurora.org Reviewed-by: Bean Huo <beanhuo@micron.com> Reviewed-by: Asutosh Das <asutoshd@codeaurora.org> Signed-off-by: Can Guo <cang@codeaurora.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2020-08-17 20:54:54 -04:00
Can Guo	4db7a23605	scsi: ufs: Fix concurrency of error handler and other error recovery paths Error recovery can be invoked from multiple code paths, including hibern8 enter/exit (from ufshcd_link_recovery), ufshcd_eh_host_reset_handler() and eh_work scheduled from IRQ context. Ultimately, these paths are all trying to invoke ufshcd_reset_and_restore() in either a synchronous or asynchronous manner. This causes problems: - If link recovery happens during ungate work, ufshcd_hold() would be called recursively. Although commit `53c12d0ef6` ("scsi: ufs: fix error recovery after the hibern8 exit failure") fixed a deadlock due to recursive calls of ufshcd_hold() by adding a check of eh_in_progress into ufshcd_hold, this check allows eh_work to run in parallel while link recovery is running. - Similar concurrency can also happen when error recovery is invoked from ufshcd_eh_host_reset_handler and ufshcd_link_recovery. - Concurrency can even happen between eh_works. eh_work, currently queued on system_wq, is allowed to have multiple instances running in parallel, but we don't have proper protection for that. If any of above concurrency scenarios happen, error recovery would fail and lead ufs device and host into bad states. To fix the concurrency problem, this change queues eh_work on a single threaded workqueue and removes link recovery calls from the hibern8 enter/exit path. In addition, make use of eh_work in eh_host_reset_handler instead of calling ufshcd_reset_and_restore. This unifies the UFS error recovery mechanism. According to the UFSHCI JEDEC spec, hibern8 enter/exit error occurs when the link is broken. This essentially applies to any power mode change operations (since they all use PACP_PWR cmds in UniPro layer). So, if a power mode change operation (including AH8 enter/exit) fails, mark link state as UIC_LINK_BROKEN_STATE and schedule the eh_work. In this case, error handler needs to do a full reset and restore to recover the link back to active. Before the link state is recovered to active, ufshcd_uic_pwr_ctrl simply returns -ENOLINK to avoid more errors. Link: https://lore.kernel.org/r/1596975355-39813-6-git-send-email-cang@codeaurora.org Reviewed-by: Bean Huo <beanhuo@micron.com> Reviewed-by: Asutosh Das <asutoshd@codeaurora.org> Signed-off-by: Can Guo <cang@codeaurora.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2020-08-17 20:54:54 -04:00
Can Guo	3f8af60447	scsi: ufs: Add some debug information to ufshcd_print_host_state() Information about the last interrupt status and timestamp is helpful when debugging system stability issues (IRQ starvation, for instance). Add this information to ufshcd_print_host_state() output. In addition, UFS device information such as model name and firmware version also comes in handy during debugging. This is printed as well. Link: https://lore.kernel.org/r/1596975355-39813-5-git-send-email-cang@codeaurora.org Reviewed-by: Avri Altman <avri.altman@wdc.com> Reviewed-by: Hongwu Su <hongwus@codeaurora.org> Reviewed-by: Asutosh Das <asutoshd@codeaurora.org> Reviewed-by: Stanley Chu <stanley.chu@mediatek.com> Reviewed-by: Bean Huo <beanhuo@micron.com> Signed-off-by: Can Guo <cang@codeaurora.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2020-08-17 20:54:53 -04:00
Can Guo	423cc66b51	scsi: ufs-qcom: Remove testbus dump in ufs_qcom_dump_dbg_regs Dumping testbus registers outputs a lot of information and can cause stability issues. Remove the dump code. Link: https://lore.kernel.org/r/1596975355-39813-4-git-send-email-cang@codeaurora.org Reviewed-by: Hongwu Su <hongwus@codeaurora.org> Reviewed-by: Avri Altman <avri.altman@wdc.com> Reviewed-by: Bean Huo <beanhuo@micron.com> Reviewed-by: Asutosh Das <asutoshd@codeaurora.org> Signed-off-by: Can Guo <cang@codeaurora.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2020-08-17 20:54:52 -04:00
Can Guo	89dd87acd4	scsi: ufs: ufs-qcom: Fix race conditions caused by ufs_qcom_testbus_config() If ufs_qcom_dump_dbg_regs() calls ufs_qcom_testbus_config() from ufshcd_suspend/resume and/or clk gate/ungate context, pm_runtime_get_sync() and ufshcd_hold() will cause a race condition. Fix this by removing the unnecessary calls of pm_runtime_get_sync() and ufshcd_hold(). Link: https://lore.kernel.org/r/1596975355-39813-3-git-send-email-cang@codeaurora.org Reviewed-by: Hongwu Su <hongwus@codeaurora.org> Reviewed-by: Avri Altman <avri.altman@wdc.com> Reviewed-by: Bean Huo <beanhuo@micron.com> Reviewed-by: Asutosh Das <asutoshd@codeaurora.org> Signed-off-by: Can Guo <cang@codeaurora.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2020-08-17 20:54:51 -04:00
Can Guo	2dec9475a4	scsi: ufs: Add checks before setting clk-gating states Clock gating can be turned on/off selectively which means the associated state information is only correct if the feature is enabled. This change makes sure that we only look at state of clk-gating if it is enabled. Link: https://lore.kernel.org/r/1596975355-39813-2-git-send-email-cang@codeaurora.org Reviewed-by: Avri Altman <avri.altman@wdc.com> Reviewed-by: Hongwu Su <hongwus@codeaurora.org> Reviewed-by: Stanley Chu <stanley.chu@mediatek.com> Reviewed-by: Bean Huo <beanhuo@micron.com> Reviewed-by: Asutosh Das <asutoshd@codeaurora.org> Signed-off-by: Can Guo <cang@codeaurora.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2020-08-17 20:54:50 -04:00
Linus Torvalds	c9c9735c46	SCSI misc on 20200814 This is the set of patches which arrived too late to stabilise in -next for the first pull. It's really just an lpfc driver update and an assortment of minor fixes, all in drivers. The only core update is to the zone block device driver, which isn't the one most people use. Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com> -----BEGIN PGP SIGNATURE----- iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCXza9yiYcamFtZXMuYm90 dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishYyiAQCUJJ8m vraqyBzjAPWpPsoNZmrvUxciXEDhoLrFo4Nl8wEAoJPVjUd79dXCJHB1Oq+MaeaF LODgv2bO1F4TCf59F+4= =+wdb -----END PGP SIGNATURE----- Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull more SCSI updates from James Bottomley: "This is the set of patches which arrived too late to stabilise in -next for the first pull. It's really just an lpfc driver update and an assortment of minor fixes, all in drivers. The only core update is to the zone block device driver, which isn't the one most people use" * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: lpfc: Update lpfc version to 12.8.0.3 scsi: lpfc: Fix LUN loss after cable pull scsi: lpfc: Fix validation of bsg reply lengths scsi: lpfc: Fix retry of PRLI when status indicates its unsupported scsi: lpfc: Fix oops when unloading driver while running mds diags scsi: lpfc: Fix RSCN timeout due to incorrect gidft counter scsi: lpfc: Fix no message shown for lpfc_hdw_queue out of range value scsi: lpfc: Fix FCoE speed reporting scsi: lpfc: Add missing misc_deregister() for lpfc_init() scsi: lpfc: nvmet: Avoid hang / use-after-free again when destroying targetport scsi: scsi_transport_sas: Add spaces around binary operator "\|" scsi: sd_zbc: Improve zone revalidation scsi: libfc: Free skb in fc_disc_gpn_id_resp() for valid cases scsi: fcoe: Memory leak fix in fcoe_sysfs_fcf_del() scsi: target: Make iscsit_register_transport() return void	2020-08-14 16:01:59 -07:00
Linus Torvalds	57b0779392	virtio: fixes, features IRQ bypass support for vdpa and IFC MLX5 vdpa driver Endian-ness fixes for virtio drivers Misc other fixes Signed-off-by: Michael S. Tsirkin <mst@redhat.com> -----BEGIN PGP SIGNATURE----- iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAl8yVEwPHG1zdEByZWRo YXQuY29tAAoJECgfDbjSjVRpNPEH/0Dtq1s1V4r/kxtLUoMophv9wuORpWCr98BQ 2aOveTmwTOVdZVOiw2tzTgO9nbWx+cL2HvkU7Aajfpz5hh93Z2VOo2n4a7hBC79f rlc3GXiG+pMk5RfmqGofIHTU+D6ony4D5SXlUDurLdtEwunyuqZwABiWkZjdclZJ bv90IL8Upzbz0rxYr7k3z8UepdOCt7r4QS/o7STHZBjJRyylxmO/R2yTnh6PtpRK Q/z35wJBJ3SKc8X3Fi0VOOSeGNZOiypkkl9ZnLVY5lExNAU1+2MMn2UK119SlCDV MSxb7quYFF4cksXH1g77GMBNi1uADRh1dtFMZdkKhZGljGxKLxo= =6VTZ -----END PGP SIGNATURE----- Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost Pull virtio updates from Michael Tsirkin: - IRQ bypass support for vdpa and IFC - MLX5 vdpa driver - Endianness fixes for virtio drivers - Misc other fixes * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (71 commits) vdpa/mlx5: fix up endian-ness for mtu vdpa: Fix pointer math bug in vdpasim_get_config() vdpa/mlx5: Fix pointer math in mlx5_vdpa_get_config() vdpa/mlx5: fix memory allocation failure checks vdpa/mlx5: Fix uninitialised variable in core/mr.c vdpa_sim: init iommu lock virtio_config: fix up warnings on parisc vdpa/mlx5: Add VDPA driver for supported mlx5 devices vdpa/mlx5: Add shared memory registration code vdpa/mlx5: Add support library for mlx5 VDPA implementation vdpa/mlx5: Add hardware descriptive header file vdpa: Modify get_vq_state() to return error code net/vdpa: Use struct for set/get vq state vdpa: remove hard coded virtq num vdpasim: support batch updating vhost-vdpa: support IOTLB batching hints vhost-vdpa: support get/set backend features vhost: generialize backend features setting/getting vhost-vdpa: refine ioctl pre-processing vDPA: dont change vq irq after DRIVER_OK ...	2020-08-11 14:34:17 -07:00

... 11 12 13 14 15 ...

20603 commits