I have just had a very strange error: a disk randomly disconnected from the pool. I assumed it had failed and resigned myself to getting it RMA'd and replaced.
I offlined the disk in the GUI and then ran a query on the SMART status of the disk that had dropped out:
Code:
# smartctl -a /dev/da29
smartctl 7.0 2018-12-30 r4883 [FreeBSD 11.3-RELEASE-p9 amd64] (local build)
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor:               SEAGATE
Product:              ST10000NM0096
Revision:             E005
Compliance:           SPC-4
User Capacity:        10,000,831,348,736 bytes [10.0 TB]
Logical block size:   512 bytes
Physical block size:  4096 bytes
LU is fully provisioned
Rotation Rate:        7200 rpm
Form Factor:          3.5 inches
Logical Unit id:      0x5000c500ae10409b
Serial number:        <serial number>
Device type:          disk
Transport protocol:   SAS (SPL-3)
Local Time is:        Tue Jun 9 12:38:44 2020 BST
SMART support is:     Available - device has SMART capability.
SMART support is:     Enabled
Temperature Warning:  Enabled

=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK

Grown defects during certification <not available>
Total blocks reassigned during format <not available>
Total new blocks reassigned <not available>
Power on minutes since format <not available>
Current Drive Temperature:     34 C
Drive Trip Temperature:        60 C

Manufactured in week 24 of year 2019
Specified cycle count over device lifetime:  50000
Accumulated start-stop cycles:  140
Specified load-unload count over device lifetime:  600000
Accumulated load-unload cycles:  306
Elements in grown defect list: 0

Vendor (Seagate Cache) information
  Blocks sent to initiator = 50826152
  Blocks received from initiator = 1326407200
  Blocks read from cache and sent to initiator = 2455860
  Number of read and write commands whose size <= segment size = 15160418
  Number of read and write commands whose size > segment size = 147550

Vendor (Seagate/Hitachi) factory information
  number of hours powered up = 2592.82
  number of minutes until next internal SMART test = 38

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:    6382600        0         0   6382600          0         26.023           0
write:         0        0         0         0          0       2881.274           0

Non-medium error count:        6

[GLTSD (Global Logging Target Save Disable) set. Enable Save with '-S on']
SMART Self-test log
Num  Test              Status                 segment  LifeTime  LBA_first_err [SK ASC ASQ]
     Description                              number   (hours)
# 1  Background short  Completed                   -    2576                 - [-   -    -]
# 2  Background short  Completed                   -    2552                 - [-   -    -]
# 3  Background short  Completed                   -    2528                 - [-   -    -]
# 4  Background short  Completed                   -    2504                 - [-   -    -]
# 5  Background short  Completed                   -    2480                 - [-   -    -]
# 6  Background short  Completed                   -    2456                 - [-   -    -]
# 7  Background short  Completed                   -    2432                 - [-   -    -]
# 8  Background long   Completed                   -    2424                 - [-   -    -]
# 9  Background short  Completed                   -    2408                 - [-   -    -]
#10  Background short  Completed                   -    2384                 - [-   -    -]
#11  Background short  Completed                   -    2360                 - [-   -    -]
#12  Background short  Completed                   -    2336                 - [-   -    -]
#13  Background short  Completed                   -    2312                 - [-   -    -]
#14  Background short  Completed                   -    2288                 - [-   -    -]
#15  Background short  Completed                   -    2264                 - [-   -    -]
#16  Background short  Completed                   -    2240                 - [-   -    -]
#17  Background short  Completed                   -    2216                 - [-   -    -]
#18  Background short  Completed                   -    2192                 - [-   -    -]
#19  Background short  Completed                   -    2168                 - [-   -    -]
#20  Background short  Completed                   -    2144                 - [-   -    -]

Long (extended) Self-test duration: 55333 seconds [922.2 minutes]
This seems to indicate that there is nothing wrong with my disk.
After offlining the disk, I was able to wipe it through the GUI and then replace the disk with itself. The array is currently resilvering onto the same disk that dropped out!
The resilver is currently at 57% without any issues.
This is the output from the console:
Code:
Jun 9 10:58:24 backup ZFS: vdev state changed, pool_guid=3015116645274777860 vdev_guid=1888851438361380937
Jun 9 10:58:24 backup ZFS: vdev is removed, pool_guid=3015116645274777860 vdev_guid=1888851438361380937
Jun 9 10:58:24 backup mpr0: mprsas_prepare_remove: Sending reset for target ID 19
Jun 9 10:58:24 backup da11 at mpr0 bus 0 scbus12 target 19 lun 0
Jun 9 10:58:24 backup da11: <SEAGATE ST10000NM0096 E005> s/n <serial number> detached
Jun 9 10:58:24 backup GEOM_MULTIPATH: da11 in disk12 was disconnected
Jun 9 10:58:24 backup GEOM_MULTIPATH: da11 removed from disk12
Jun 9 10:58:24 backup (da11:mpr0:0:19:0): Periph destroyed
Jun 9 10:58:24 backup mpr0: clearing target 19 handle 0x0016
Jun 9 10:58:24 backup mpr0: At enclosure level 0, slot 11, connector name ( )
Jun 9 10:58:24 backup mpr0: Unfreezing devq for target ID 19
Jun 9 10:58:24 backup mpr0: mprsas_prepare_remove: Sending reset for target ID 124
Jun 9 10:58:24 backup da29 at mpr0 bus 0 scbus12 target 124 lun 0
Jun 9 10:58:24 backup da29: <SEAGATE ST10000NM0096 E005> s/n <serial number> detached
Jun 9 10:58:24 backup mpr0: clearing target 124 handle 0x002c
Jun 9 10:58:24 backup mpr0: At enclosure level 1, slot 11, connector name ( )
Jun 9 10:58:24 backup mpr0: Unfreezing devq for target ID 124
Jun 9 10:58:24 backup GEOM_MULTIPATH: da29 in disk12 was disconnected
Jun 9 10:58:24 backup GEOM_MULTIPATH: out of providers for disk12
Jun 9 10:58:24 backup GEOM_MULTIPATH: da29 removed from disk12
Jun 9 10:58:24 backup GEOM_MULTIPATH: destroying disk12
Jun 9 10:58:24 backup (da29:mpr0:0:124:0): Periph destroyed
Jun 9 10:58:24 backup GEOM_ELI: Device gptid/8695ac43-9388-11ea-b61d-ac1f6bbc06e4.eli destroyed.
Jun 9 10:58:24 backup GEOM_MULTIPATH: disk12 destroyed
Jun 9 10:58:24 backup GEOM_ELI: Detached gptid/8695ac43-9388-11ea-b61d-ac1f6bbc06e4.eli on last close.
Jun 9 11:14:41 backup mpr0: SAS Address from SAS device page0 = 5000c500ae104099
Jun 9 11:14:41 backup mpr0: Found device <401<SspTarg>,End Device> <12.0Gbps> handle<0x0016> enclosureHandle<0x0002> slot 11
Jun 9 11:14:41 backup mpr0: At enclosure level 0 and connector name ( )
Jun 9 11:14:41 backup mpr0: SAS Address from SAS device page0 = 5000c500ae104099
Jun 9 11:14:41 backup mpr0: Found device <401<SspTarg>,End Device> <12.0Gbps> handle<0x002c> enclosureHandle<0x0004> slot 11
Jun 9 11:14:41 backup mpr0: At enclosure level 1 and connector name ( )
Jun 9 11:14:42 backup (probe0:mpr0:0:19:0): INQUIRY. CDB: 12 00 00 00 24 00 length 36 SMID 741 terminated ioc 804b loginfo 31110e05 scsi 0 state c xfer 0
Jun 9 11:14:42 backup (probe0:mpr0:0:19:0): INQUIRY. CDB: 12 00 00 00 24 00
Jun 9 11:14:42 backup (probe0:mpr0:0:19:0): CAM status: CCB request completed with an error
Jun 9 11:14:42 backup (probe0:mpr0:0:19:0): Retrying command
Jun 9 11:14:42 backup (probe1:mpr0:0:124:0): INQUIRY. CDB: 12 00 00 00 24 00 length 36 SMID 584 terminated ioc 804b loginfo 31110e05 scsi 0 state c xfer 0
Jun 9 11:14:42 backup (probe1:mpr0:0:124:0): INQUIRY. CDB: 12 00 00 00 24 00
Jun 9 11:14:42 backup (probe1:mpr0:0:124:0): CAM status: CCB request completed with an error
Jun 9 11:14:42 backup (probe1:mpr0:0:124:0): Retrying command
Jun 9 11:15:02 backup ses5: da11,pass15,da29,pass36 in 'Slot11', SAS Slot: 1 phys at slot 11
Jun 9 11:15:02 backup ses5: phy 0: SAS device type 1 phy 0 Target ( SSP )
Jun 9 11:15:02 backup ses5: phy 0: parent 5003048020986aff addr 5000c500ae104099
Jun 9 11:15:02 backup da11 at mpr0 bus 0 scbus12 target 19 lun 0
Jun 9 11:15:02 backup da11: <SEAGATE ST10000NM0096 E005> Fixed Direct Access SPC-4 SCSI device
Jun 9 11:15:02 backup da11: Serial Number <serial number>
Jun 9 11:15:02 backup da11: 1200.000MB/s transfers
Jun 9 11:15:02 backup da11: Command Queueing enabled
Jun 9 11:15:02 backup da11: 9537536MB (19532873728 512 byte sectors)
Jun 9 11:15:02 backup da29 at mpr0 bus 0 scbus12 target 124 lun 0
Jun 9 11:15:02 backup da29: <SEAGATE ST10000NM0096 E005> Fixed Direct Access SPC-4 SCSI device
Jun 9 11:15:02 backup da29: Serial Number <serial number>
Jun 9 11:15:02 backup da29: 1200.000MB/s transfers
Jun 9 11:15:02 backup da29: Command Queueing enabled
Jun 9 11:15:02 backup da29: 9537536MB (19532873728 512 byte sectors)
Jun 9 11:15:02 backup GEOM_MULTIPATH: disk12 created
Jun 9 11:15:02 backup GEOM_MULTIPATH: da11 added to disk12
Jun 9 11:15:02 backup GEOM_MULTIPATH: da11 is now active path in disk12
Jun 9 11:15:03 backup GEOM_MULTIPATH: da29 added to disk12
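To make the sequence in that output easier to follow, here is a rough Python sketch that pulls only the GEOM_MULTIPATH "removed from" / "added to" events out of a handful of the lines above. The function names (`multipath_events`, `paths_by_moment`) are my own invention, and the regex assumes nothing beyond the syslog layout shown in this post. It is only a summarising aid, not a diagnosis: on this sample it reports that da11 and da29, the two paths of disk12, were removed within the same second.

```python
import re

# A few lines copied from the console output above (serials already redacted).
LOG = """\
Jun 9 10:58:24 backup da11: <SEAGATE ST10000NM0096 E005> s/n <serial number> detached
Jun 9 10:58:24 backup GEOM_MULTIPATH: da11 removed from disk12
Jun 9 10:58:24 backup da29: <SEAGATE ST10000NM0096 E005> s/n <serial number> detached
Jun 9 10:58:24 backup GEOM_MULTIPATH: da29 removed from disk12
Jun 9 11:15:02 backup GEOM_MULTIPATH: da11 added to disk12
Jun 9 11:15:03 backup GEOM_MULTIPATH: da29 added to disk12
"""

# Matches only GEOM_MULTIPATH add/remove lines; every other line is ignored.
EVENT = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+[\d:]{8})\s+\S+\s+GEOM_MULTIPATH:\s+"
    r"(?P<path>da\d+)\s+(?P<action>removed from|added to)\s+(?P<disk>\S+)$"
)


def multipath_events(text):
    """Return (timestamp, path, action, multipath device) tuples in log order."""
    events = []
    for line in text.splitlines():
        m = EVENT.match(line.strip())
        if m:
            events.append((m["stamp"], m["path"], m["action"], m["disk"]))
    return events


def paths_by_moment(events, action):
    """Group path names by timestamp for one action string."""
    grouped = {}
    for stamp, path, act, _disk in events:
        if act == action:
            grouped.setdefault(stamp, []).append(path)
    return grouped


if __name__ == "__main__":
    events = multipath_events(LOG)
    for stamp, paths in paths_by_moment(events, "removed from").items():
        print(stamp, "removed:", ", ".join(paths))
    for stamp, paths in paths_by_moment(events, "added to").items():
        print(stamp, "added:", ", ".join(paths))
```

The same two functions can be pointed at the full contents of the console log by reading the file into a string instead of using the `LOG` sample.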
What has gone wrong here? So far nothing seems to be wrong and the unit is operating normally.