run_interrupt_driven_hooks: still waiting after 60 seconds for xpt_config

mactoserver

Dabbler
Joined
Dec 23, 2019
Messages
12
Swapped HBAs and removed the IOM card to no avail. Removing one QSFP cable to see if that helps.
 

So I booted with just one cable left connected to see what would happen, and the system came up. Now I'm adding drives one by one and getting interesting logs:


Code:
Dec 23 17:46:10 san1 da0 at pmspcbsd0 bus 0 scbus1 target 1 lun 0
Dec 23 17:46:10 san1 da0: <HGST HUH728080AL4200 A7D8> Fixed Direct Access SPC-4 SCSI device
Dec 23 17:46:10 san1 da0: Serial Number 2EG6PK2P
Dec 23 17:46:10 san1 da0: 600.000MB/s transfers
Dec 23 17:46:10 san1 da0: Command Queueing enabled
Dec 23 17:46:10 san1 da0: Attempt to query device size failed: NOT READY, Logical unit not ready, notify
Dec 23 17:46:14 san1 ses1: da0,pass6: SAS Device Slot Element: 1 Phys at Slot 0
Dec 23 17:46:14 san1 ses1:  phy 0: SAS device type 1 id 1
Dec 23 17:46:14 san1 ses1:  phy 0: protocols: Initiator( None ) Target( SSP )
Dec 23 17:46:14 san1 ses1:  phy 0: parent 500a0980025eb03f addr 5000cca23b0c2c82
Dec 23 17:46:14 san1 da1 at pmspcbsd0 bus 0 scbus1 target 2 lun 0
Dec 23 17:46:14 san1 da1: <HGST HUH728080AL4200 A7D8> Fixed Direct Access SPC-4 SCSI device
Dec 23 17:46:14 san1 da1: Serial Number 2EG5XJHR
Dec 23 17:46:14 san1 da1: 600.000MB/s transfers
Dec 23 17:46:14 san1 da1: Command Queueing enabled
Dec 23 17:46:14 san1 da1: Attempt to query device size failed: NOT READY, Logical unit not ready, notify
Dec 23 17:46:18 san1 ses1: da1,pass7: SAS Device Slot Element: 1 Phys at Slot 1
Dec 23 17:46:18 san1 ses1:  phy 0: SAS device type 1 id 1
Dec 23 17:46:18 san1 ses1:  phy 0: protocols: Initiator( None ) Target( SSP )
Dec 23 17:46:18 san1 ses1:  phy 0: parent 500a0980025eb03f addr 5000cca23b0ac3da
Dec 23 17:46:19 san1 da2 at pmspcbsd0 bus 0 scbus1 target 3 lun 0
Dec 23 17:46:19 san1 da2: <HGST HUH728080AL4200 A7D8> Fixed Direct Access SPC-4 SCSI device
Dec 23 17:46:19 san1 da2: Serial Number 2EG5RWHR
Dec 23 17:46:19 san1 da2: 600.000MB/s transfers
Dec 23 17:46:19 san1 da2: Command Queueing enabled
Dec 23 17:46:19 san1 da2: Attempt to query device size failed: NOT READY, Logical unit not ready, notify
Dec 23 17:46:19 san1 devd: notify_clients: send() failed; dropping unresponsive client
Dec 23 17:46:21 san1 ses1: da2,pass8: SAS Device Slot Element: 1 Phys at Slot 2
Dec 23 17:46:21 san1 ses1:  phy 0: SAS device type 1 id 1
Dec 23 17:46:21 san1 ses1:  phy 0: protocols: Initiator( None ) Target( SSP )
Dec 23 17:46:21 san1 ses1:  phy 0: parent 500a0980025eb03f addr 5000cca23b0a6f16
Dec 23 17:46:21 san1 da3 at pmspcbsd0 bus 0 scbus1 target 4 lun 0
Dec 23 17:46:21 san1 da3: <HGST HUH728080AL4200 A7D8> Fixed Direct Access SPC-4 SCSI device
Dec 23 17:46:21 san1 da3: Serial Number 2EG6PR1P
Dec 23 17:46:21 san1 da3: 600.000MB/s transfers
Dec 23 17:46:21 san1 da3: Command Queueing enabled
Dec 23 17:46:21 san1 da3: Attempt to query device size failed: NOT READY, Logical unit not ready, notify
Dec 23 17:46:32 san1 da0 at pmspcbsd0 bus 0 scbus1 target 1 lun 0
Dec 23 17:46:32 san1 da0: <HGST HUH728080AL4200 A7D8> s/n 2EG6PK2P detached
Dec 23 17:46:32 san1 da1 at pmspcbsd0 bus 0 scbus1 target 2 lun 0
Dec 23 17:46:32 san1 da1: <HGST HUH728080AL4200 A7D8> s/n 2EG5XJHR detached
Dec 23 17:46:32 san1 (da0:pmspcbsd0:0:1:0): Periph destroyed
Dec 23 17:46:32 san1 (da1:pmspcbsd0:0:2:0): Periph destroyed
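
For anyone sifting through similar output, here is a rough Python sketch that summarizes per-drive state from log text like the excerpt above. The regular expressions are inferred only from the lines shown here, so treat them as assumptions; other FreeBSD or FreeNAS versions may word these messages differently. The function name `summarize` and the dictionary keys are made up for illustration.

```python
import re

# Patterns inferred from the log excerpt in this post (an assumption,
# not an official log format).
SERIAL = re.compile(r"\b(da\d+): Serial Number (\S+)")
SIZE_FAIL = re.compile(r"\b(da\d+): Attempt to query device size failed")
DETACH = re.compile(r"\b(da\d+): <[^>]*> s/n (\S+) detached")


def summarize(log_text):
    """Return {device: {"serial", "size_failed", "detached"}} for each daN seen."""
    drives = {}

    def entry(dev):
        # Create the record for a device the first time it is mentioned.
        return drives.setdefault(
            dev, {"serial": None, "size_failed": False, "detached": False}
        )

    for line in log_text.splitlines():
        if m := SERIAL.search(line):
            entry(m.group(1))["serial"] = m.group(2)
        elif m := SIZE_FAIL.search(line):
            entry(m.group(1))["size_failed"] = True
        elif m := DETACH.search(line):
            rec = entry(m.group(1))
            rec["detached"] = True
            # The detach line also carries the serial number; keep it if
            # no earlier "Serial Number" line was seen for this device.
            if rec["serial"] is None:
                rec["serial"] = m.group(2)
    return drives
```

Run against the excerpt above, this would flag da0 through da3 as having failed the size query and da0 and da1 as detached, which matches what is visible by eye in the log.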



 

After much (much) fighting, I finally ended up doing the following procedure:

  1. Boot the system with just one of the QSFP cables connected and all the HDDs disconnected
  2. Once booted, do the following one drive at a time, and do NOT do anything else:
    1. Insert drive into shelf
    2. While watching logs, wait for initial recognition by FreeNAS
    3. Once recognized, wait for FreeNAS to finish probing the drive (will show the drive size in the logs) - takes about a minute
    4. If the drive never makes it to 2.3, then the drive is irrecoverable (at least as far as I can tell). Remove it and go to the next.
    5. Now run `zpool import` to see what drives are recognized. If it takes more than a couple of minutes to get an answer, the newly inserted drive is bad as well. Remove it; you will likely have to start over by rebooting and inserting drives one at a time, skipping the bad drives you already identified.
  3. Now, `zpool import {poolname}`
  4. Mine threw an error so I had to do `zpool import -F {poolname}` then `zpool scrub {poolname}`
  5. Many of the drives were corrupted, so I am archiving the data and going to rebuild the entire project without the NetApp system.
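
The "if `zpool import` takes more than a couple of minutes, the drive is bad" check in step 2.5 could be scripted roughly as below. This is only a sketch: the helper name `responds_within` and the 120-second limit are my own choices, and it assumes the hung command can actually be killed. A process stuck on I/O inside the kernel may not die on timeout, in which case this helper would hang too.

```python
import subprocess


def responds_within(cmd, timeout_s):
    """Return True if cmd exits (with any status) within timeout_s seconds."""
    try:
        subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising this exception.
        return False
    return True


# Hypothetical usage after inserting a drive and letting it finish probing:
#   if not responds_within(["zpool", "import"], timeout_s=120):
#       print("zpool import did not answer in time; suspect the last drive")
```

The exit status is ignored on purpose, since the only question at this step is whether the command answers at all.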

Thanks for the help, much appreciated! :)

P.S. Merry Christmas!
 