What happened here? Transient issue with one disk that self-healed after the spare had been switched in?
I got an alert email:
Code:
New alerts: * The volume DataPool state is ONLINE: One or more devices is currently being resilvered. The pool will continue to function, possibly in a degraded state.
zpool status tells me:
Code:
  pool: DataPool
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Wed Mar 27 12:39:52 2019
        16.3T scanned at 992M/s, 16.0T issued at 973M/s, 16.3T total
        1.49T resilvered, 98.04% done, 0 days 00:05:44 to go
config:

        NAME                                              STATE     READ WRITE CKSUM
        DataPool                                          ONLINE       0     0     0
          raidz1-0                                        ONLINE       0     0     0
            gptid/95bc5b38-45d4-11e9-b26b-a0369f4e18bc    ONLINE       0     0     0
            gptid/0f0e1314-3784-11e9-a44b-1866da830726    ONLINE       0     0     0
            gptid/18382224-3784-11e9-a44b-1866da830726    ONLINE       0     0     0
          raidz1-1                                        ONLINE       0     0     0
            gptid/243629bc-3784-11e9-a44b-1866da830726    ONLINE       0     0     0
            gptid/bb2a79b2-43f7-11e9-a821-a0369f4e18bc    ONLINE       0     0     0
            gptid/3a1b891f-3784-11e9-a44b-1866da830726    ONLINE       0     0     0
          raidz1-2                                        ONLINE       0     0     0
            gptid/6d56f648-44b4-11e9-a821-a0369f4e18bc    ONLINE       0     0     0
            spare-1                                       ONLINE       0     0     0
              gptid/89c921a9-45a7-11e9-83e8-a0369f4e18bc  ONLINE       0     0     0
              gptid/d5888e7b-45d4-11e9-b26b-a0369f4e18bc  ONLINE       0     0     0
            gptid/8365c061-44b4-11e9-a821-a0369f4e18bc    ONLINE       0     0     0
        spares
          3049612206338135796                             INUSE     was /dev/gptid/d5888e7b-45d4-11e9-b26b-a0369f4e18bc
          gptid/d6aa21af-45d4-11e9-b26b-a0369f4e18bc      AVAIL

errors: No known data errors
And ten minutes later when it completed, all was back to normal:
Code:
  pool: DataPool
 state: ONLINE
  scan: resilvered 1.59T in 0 days 05:18:42 with 0 errors on Wed Mar 27 17:58:34 2019
config:

        NAME                                            STATE     READ WRITE CKSUM
        DataPool                                        ONLINE       0     0     0
          raidz1-0                                      ONLINE       0     0     0
            gptid/95bc5b38-45d4-11e9-b26b-a0369f4e18bc  ONLINE       0     0     0
            gptid/0f0e1314-3784-11e9-a44b-1866da830726  ONLINE       0     0     0
            gptid/18382224-3784-11e9-a44b-1866da830726  ONLINE       0     0     0
          raidz1-1                                      ONLINE       0     0     0
            gptid/243629bc-3784-11e9-a44b-1866da830726  ONLINE       0     0     0
            gptid/bb2a79b2-43f7-11e9-a821-a0369f4e18bc  ONLINE       0     0     0
            gptid/3a1b891f-3784-11e9-a44b-1866da830726  ONLINE       0     0     0
          raidz1-2                                      ONLINE       0     0     0
            gptid/6d56f648-44b4-11e9-a821-a0369f4e18bc  ONLINE       0     0     0
            gptid/89c921a9-45a7-11e9-83e8-a0369f4e18bc  ONLINE       0     0     0
            gptid/8365c061-44b4-11e9-a821-a0369f4e18bc  ONLINE       0     0     0
        spares
          gptid/d5888e7b-45d4-11e9-b26b-a0369f4e18bc    AVAIL
          gptid/d6aa21af-45d4-11e9-b26b-a0369f4e18bc    AVAIL

errors: No known data errors
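As an aside, the completed scan line works out to a fairly ordinary sustained rate for a single spinning disk. A quick back-of-the-envelope check (assuming zpool's "T" means TiB here, which is my assumption, not something the output states):

```shell
# Rough resilver rate from "resilvered 1.59T in 0 days 05:18:42" above.
awk 'BEGIN {
  bytes = 1.59 * 2^40          # 1.59 TiB resilvered (assumed binary units)
  secs  = 5*3600 + 18*60 + 42  # elapsed 05:18:42
  printf "%.0f MiB/s\n", bytes / secs / 2^20
}'
```

That lands in the ~87 MiB/s range, which is plausible for a 4TB 7200rpm drive doing a mix of reads and writes.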
Er. What? Do I need to worry about the da7 aka gptid/89c... disk? Should I twiddle any timeouts? Five hours of resilvering after what looks like a seven-second outage seems a bit strong.
/var/log/messages has some info. I can see the obvious dropout and retry, but I don't know any deeper interpretation:
Code:
Mar 27 12:22:31 Sisyphus kernel: ix1: link state changed to UP
Mar 27 12:38:25 Sisyphus (da7:mps0:0:27:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00 length 0 SMID 602 Aborting command 0xfffffe00015f2620
Mar 27 12:38:25 Sisyphus mps0: Sending reset from mpssas_send_abort for target ID 27
Mar 27 12:38:27 Sisyphus mps0: mpssas_prepare_remove: Sending reset for target ID 27
Mar 27 12:38:27 Sisyphus da7 at mps0 bus 0 scbus0 target 27 lun 0
Mar 27 12:38:27 Sisyphus da7: <ATA ST4000NM0033-9ZM GA6C> s/n Z1Z90VPC detached
Mar 27 12:38:28 Sisyphus (da7:mps0:0:27:0): READ(16). CDB: 88 00 00 00 00 01 17 83 18 c8 00 00 00 80 00 00 length 65536 SMID 623 terminated ioc 804b loginfo 31130000 scsi 0 state c xfer 0
Mar 27 12:38:28 Sisyphus (da7:mps0:0:27:0): READ(16). CDB: 88 00 00 00 00 01 17 83 73 48 00 00 00 80 00 00 length 65536 SMID 171 terminated ioc 804b loginfo 31130000 scsi 0 state c xfer 0
Mar 27 12:38:28 Sisyphus (da7:mps0:0:27:0): READ(16). CDB: 88 00 00 00 00 01 17 83 18 c8 00 00 00 80 00 00
Mar 27 12:38:28 Sisyphus (da7:mps0:0:27:0): READ(16). CDB: 88 00 00 00 00 01 85 f8 40 80 00 00 00 80 00 00 length 65536 SMID 895 terminated ioc 804b l(da7:mps0:0:27:0): CAM status: CCB request completed with an error
Mar 27 12:38:28 Sisyphus oginfo 31130000 scsi 0 state c xfer 0
Mar 27 12:38:28 Sisyphus mps0: (da7:mps0:0:27:0): Error 5, Periph was invalidated
Mar 27 12:38:28 Sisyphus Unfreezing devq for target ID 27
Mar 27 12:38:28 Sisyphus mps0: Unfreezing devq for target ID 27
Mar 27 12:38:28 Sisyphus (da7:mps0:0:27:0): READ(16). CDB: 88 00 00 00 00 01 17 83 73 48 00 00 00 80 00 00
Mar 27 12:38:28 Sisyphus (da7:mps0:0:27:0): CAM status: CCB request completed with an error
Mar 27 12:38:28 Sisyphus (da7:mps0:0:27:0): Error 5, Periph was invalidated
Mar 27 12:38:28 Sisyphus (da7:mps0:0:27:0): READ(16). CDB: 88 00 00 00 00 01 85 f8 40 80 00 00 00 80 00 00
Mar 27 12:38:28 Sisyphus (da7:mps0:0:27:0): CAM status: CCB request completed with an error
Mar 27 12:38:28 Sisyphus (da7:mps0:0:27:0): Error 5, Periph was invalidated
Mar 27 12:38:28 Sisyphus (da7:mps0:0:27:0): SYNCHRONIZE CACHE(10). CDB: 35 00 00 00 00 00 00 00 00 00
Mar 27 12:38:28 Sisyphus (da7:mps0:0:27:0): CAM status: Command timeout
Mar 27 12:38:28 Sisyphus (da7:mps0:0:27:0): Error 5, Periph was invalidated
Mar 27 12:38:28 Sisyphus GEOM_MIRROR: Device swap0: provider da7p1 disconnected.
Mar 27 12:38:28 Sisyphus ZFS: vdev state changed, pool_guid=14803431099891158073 vdev_guid=15465247203013771127
Mar 27 12:38:29 Sisyphus (da7:mps0:0:27:0): Periph destroyed
Mar 27 12:38:29 Sisyphus ZFS: vdev state changed, pool_guid=14803431099891158073 vdev_guid=15465247203013771127
Mar 27 12:38:29 Sisyphus ZFS: vdev state changed, pool_guid=14803431099891158073 vdev_guid=6396531611511845079
Mar 27 12:38:33 Sisyphus mps0: SAS Address for SATA device = 371b835464926847
Mar 27 12:38:33 Sisyphus mps0: SAS Address from SATA device = 371b835464926847
Mar 27 12:38:33 Sisyphus (probe0:mps0:0:27:0): INQUIRY. CDB: 12 00 00 00 24 00 length 36 SMID 398 terminated ioc 804b loginfo 31170000 scsi 0 state c xfer 0
Mar 27 12:38:33 Sisyphus (probe0:mps0:0:27:0): INQUIRY. CDB: 12 00 00 00 24 00
Mar 27 12:38:33 Sisyphus (probe0:mps0:0:27:0): CAM status: CCB request completed with an error
Mar 27 12:38:33 Sisyphus (probe0:mps0:0:27:0): Retrying command
Mar 27 12:38:35 Sisyphus ses0: da7,pass8: SAS Device Slot Element: 1 Phys at Slot 2, Not All Phys
Mar 27 12:38:35 Sisyphus ses0: phy 0: SATA device
Mar 27 12:38:35 Sisyphus ses0: phy 0: parent 50050cc10b3710bf addr 50050cc10b371099
Mar 27 12:38:35 Sisyphus da7 at mps0 bus 0 scbus0 target 27 lun 0
Mar 27 12:38:35 Sisyphus da7: <ATA ST4000NM0033-9ZM GA6C> Fixed Direct Access SPC-4 SCSI device
Mar 27 12:38:35 Sisyphus da7: Serial Number Z1Z90VPC
Mar 27 12:38:35 Sisyphus da7: 300.000MB/s transfers
Mar 27 12:38:35 Sisyphus da7: Command Queueing enabled
Mar 27 12:38:35 Sisyphus da7: 3815447MB (7814037168 512 byte sectors)
Mar 27 12:38:45 Sisyphus GEOM_ELI: Device mirror/swap0.eli destroyed.
Mar 27 12:38:45 Sisyphus GEOM_MIRROR: Device swap0: provider destroyed.
Mar 27 12:38:45 Sisyphus GEOM_MIRROR: Device swap0 destroyed.
Mar 27 12:38:47 Sisyphus GEOM_MIRROR: Device mirror/swap0 launched (2/2).
Mar 27 12:38:48 Sisyphus GEOM_ELI: Device mirror/swap0.eli created.
Mar 27 12:38:48 Sisyphus GEOM_ELI: Encryption: AES-XTS 128
Mar 27 12:38:48 Sisyphus GEOM_ELI: Crypto: hardware
Mar 27 12:38:49 Sisyphus ZFS: vdev state changed, pool_guid=14803431099891158073 vdev_guid=8411539843296667809
Mar 27 12:38:49 Sisyphus ZFS: vdev state changed, pool_guid=14803431099891158073 vdev_guid=15465247203013771127
Mar 27 12:38:49 Sisyphus ZFS: vdev state changed, pool_guid=14803431099891158073 vdev_guid=3049612206338135796
Mar 27 12:38:49 Sisyphus ZFS: vdev state changed, pool_guid=14803431099891158073 vdev_guid=8887614878914963373
...
Mar 27 17:58:35 Sisyphus ZFS: vdev state changed, pool_guid=14803431099891158073 vdev_guid=8411539843296667809
Mar 27 17:58:35 Sisyphus ZFS: vdev state changed, pool_guid=14803431099891158073 vdev_guid=15465247203013771127
Mar 27 17:58:35 Sisyphus ZFS: vdev state changed, pool_guid=14803431099891158073 vdev_guid=8887614878914963373
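For what it's worth, the windows can be read straight off the timestamps above. A rough sketch, bracketing the dropout by the "detached" and re-probe lines, and the resilver by the "in progress since" and "resilvered ... on" times (GNU date's `-d` assumed):

```shell
# Rough arithmetic on the log/zpool timestamps above (GNU date assumed).
detach=$(date -d "2019-03-27 12:38:27" +%s)   # da7 ... detached
attach=$(date -d "2019-03-27 12:38:35" +%s)   # da7 re-probed by CAM
start=$(date -d "2019-03-27 12:39:52" +%s)    # resilver in progress since
finish=$(date -d "2019-03-27 17:58:34" +%s)   # resilvered ... on

echo "dropout:  $((attach - detach)) s"
echo "resilver: $(( (finish - start) / 60 )) min"
```

Depending on which lines you bracket, the dropout is about seven or eight seconds, against roughly five hours and twenty minutes of resilvering.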