Failed HDD causes boot failure?

carnogaunt

Cadet
Joined
Feb 21, 2016
Messages
6
First, let me emphasize that I do not care about recovering my data. I am just trying to learn more about what happened.

I am using FreeNAS-9.3-STABLE-201602020212

Hardware:
  • Intel Core i3-2120
  • 8GB (2x4) DDR3-1333
  • ASUS P8H61-M LE CSM
  • Dell SAS 6/iR
  • Pool Alpha: 4x Seagate 750GB 7200RPM SATA drives in RAIDZ2
  • Pool Beta: 4x Seagate 400GB 10kRPM SAS drives in RAIDZ2
So, I recently shut down and relocated this machine, and it seems that (coincidentally) two drives from pool Alpha have failed. During boot, it looked like the init script was trying over and over again to bring those drives online. I figured it would time out eventually and continue, but it was still going after leaving it overnight. I had to physically disconnect the drives from the HBA in order to get the system to boot.

Now, after booting once without the drives, and later reattaching them, I am able to boot just fine. However, zpool import says that the two drives are "missing," and one of them does not even appear in /dev.
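
As a rough illustration of the "does not even appear in /dev" check, here is a minimal Python sketch that reports which expected device nodes are absent. The names da6 and da7 are assumptions made for the example (the thread only shows da0 through da5), and the `exists` parameter is a hypothetical hook so the function can be tried without real hardware.

```python
import os


def missing_nodes(expected, dev_dir="/dev", exists=os.path.exists):
    """Return the names in `expected` that have no node under `dev_dir`."""
    return [
        name for name in expected
        if not exists(os.path.join(dev_dir, name))
    ]


# Hypothetical layout: six disks visible, two expected disks absent.
EXPECTED = ["da0", "da1", "da2", "da3", "da4", "da5", "da6", "da7"]
```

Calling `missing_nodes(EXPECTED)` on the NAS itself lists the disks the kernel never attached. A disk that is absent from /dev was not detected at all, which is a different situation from a disk that is present but returns errors.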

So, my questions are about the expected behavior in this situation:
  1. Can FreeNAS normally distinguish between an absent drive and a failed drive?
  2. Is it unusual for a failed drive to interfere with the system init process?
  3. Is this a hardware problem, i.e., is my HBA doing something funny?
 

Robert Trevellyan

Pony Wrangler
Joined
May 16, 2014
Messages
3,778
  1. Depends.
  2. Not particularly - I've seen a failed hard drive prevent a machine from POSTing at all.
  3. No idea. Did you have regular SMART tests set up, with properly configured email notifications?
 

carnogaunt

Cadet
Joined
Feb 21, 2016
Messages
6
Did you have regular SMART tests set up, with properly configured email notifications?
I did have SMART tests scheduled (weekly), although the surviving drives do not seem to have logged the results of those tests. I did not configure email.

There doesn't seem to be any problem starting tests manually.
 

Robert Trevellyan

Pony Wrangler
Joined
May 16, 2014
Messages
3,778
I asked because it's unusual for multiple drives to fail simultaneously, and with email notifications configured you would have been warned about the first failure.

Please post, between CODE tags, the output of smartctl -x /dev/daX for each drive X.
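
If typing the command once per drive gets tedious, something like the following Python sketch can generate and run it for every disk. The device count and the /dev/da prefix are assumptions based on the drive names used in this thread, and the helper names are made up for illustration. It has to run as root on a machine with smartctl installed.

```python
import subprocess


def smartctl_commands(device_count, prefix="/dev/da"):
    """Build one `smartctl -x` argument list per device da0..da(N-1)."""
    return [["smartctl", "-x", f"{prefix}{i}"] for i in range(device_count)]


def collect_reports(device_count):
    """Run each command and return a {device path: report text} mapping."""
    reports = {}
    for cmd in smartctl_commands(device_count):
        # smartctl's exit status is a bitmask of conditions, so a nonzero
        # code does not mean there is no report; keep stdout regardless.
        result = subprocess.run(cmd, capture_output=True, text=True)
        reports[cmd[-1]] = result.stdout
    return reports
```

For example, `collect_reports(6)` would cover da0 through da5.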
 

carnogaunt

Cadet
Joined
Feb 21, 2016
Messages
6
/dev/da0
Code:
smartctl 6.3 2014-07-26 r3976 [FreeBSD 9.3-RELEASE-p31 amd64] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor:               SEAGATE
Product:              ST3400755SS
Revision:             NS25
User Capacity:        400,000,000,000 bytes [400 GB]
Logical block size:   512 bytes
Rotation Rate:        10033 rpm
Logical Unit id:      0x5000c5000357e6d3
Serial number:        3RJ0HS2S
Device type:          disk
Transport protocol:   SAS (SPL-3)
Local Time is:        Tue Feb 23 15:15:20 2016 CST
SMART support is:     Available - device has SMART capability.
SMART support is:     Enabled
Temperature Warning:  Disabled or Not Supported
Read Cache is:        Enabled
Writeback Cache is:   Disabled

=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK

Current Drive Temperature:     30 C
Drive Trip Temperature:        68 C

Elements in grown defect list: 0

Vendor (Seagate) cache information
  Blocks sent to initiator = 1651383193
  Blocks received from initiator = 4088808182
  Blocks read from cache and sent to initiator = 1120155740
  Number of read and write commands whose size <= segment size = 424198276
  Number of read and write commands whose size > segment size = 58174175

Vendor (Seagate/Hitachi) factory information
  number of hours powered up = 52430.17
  number of minutes until next internal SMART test = 20

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:   3260420586        3         0  3260420589   3260420589     569216.108           0
write:         0        0         0         0          0     120868.222           0
verify: 567144053        0         0  567144053   567144053     106594.624           0

Non-medium error count:     4208

SMART Self-test log
Num  Test              Status                 segment  LifeTime  LBA_first_err [SK ASC ASQ]
     Description                              number   (hours)
# 1  Background long   Completed                   -   52407                 - [-   -    -]
# 2  Background long   Completed                   -       1                 - [-   -    -]
# 3  Background short  Completed                   -       0                 - [-   -    -]

Long (extended) Self Test duration: 5626 seconds [93.8 minutes]

Background scan results log
  Status: waiting until BMS interval timer expires
    Accumulated power on time, hours:minutes 52430:10 [3145810 minutes]
    Number of background scans performed: 316,  scan progress: 0.00%
    Number of background medium scans performed: 0

Protocol Specific port log page for SAS SSP
relative target port id = 1
  generation code = 0
  number of phys = 1
  phy identifier = 0
    attached device type: SAS or SATA device
    attached reason: unknown
    reason: unknown
    negotiated logical link rate: phy enabled; 3 Gbps
    attached initiator port: ssp=1 stp=1 smp=1
    attached target port: ssp=0 stp=0 smp=0
    SAS address = 0x5000c5000357e6d1
    attached SAS address = 0x50022190c5175800
    attached phy identifier = 0
    Invalid DWORD count = 0
    Running disparity error count = 0
    Loss of DWORD synchronization = 2
    Phy reset problem = 0
relative target port id = 2
  generation code = 0
  number of phys = 1
  phy identifier = 1
    attached device type: no device attached
    attached reason: unknown
    reason: unknown
    negotiated logical link rate: phy enabled; 3 Gbps
    attached initiator port: ssp=0 stp=0 smp=0
    attached target port: ssp=0 stp=0 smp=0
    SAS address = 0x5000c5000357e6d2
    attached SAS address = 0x0
    attached phy identifier = 0
    Invalid DWORD count = 0
    Running disparity error count = 0
    Loss of DWORD synchronization = 0
    Phy reset problem = 0



/dev/da1
Code:
smartctl 6.3 2014-07-26 r3976 [FreeBSD 9.3-RELEASE-p31 amd64] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor:               SEAGATE
Product:              ST3400755SS
Revision:             NS25
User Capacity:        400,000,000,000 bytes [400 GB]
Logical block size:   512 bytes
Rotation Rate:        10033 rpm
Logical Unit id:      0x5000c5000357f8c7
Serial number:        3RJ0JDCD
Device type:          disk
Transport protocol:   SAS (SPL-3)
Local Time is:        Tue Feb 23 15:15:30 2016 CST
SMART support is:     Available - device has SMART capability.
SMART support is:     Enabled
Temperature Warning:  Disabled or Not Supported
Read Cache is:        Enabled
Writeback Cache is:   Disabled

=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK

Current Drive Temperature:     31 C
Drive Trip Temperature:        68 C

Elements in grown defect list: 0

Vendor (Seagate) cache information
  Blocks sent to initiator = 1722350242
  Blocks received from initiator = 3643708106
  Blocks read from cache and sent to initiator = 890986536
  Number of read and write commands whose size <= segment size = 2965123857
  Number of read and write commands whose size > segment size = 60272260

Vendor (Seagate/Hitachi) factory information
  number of hours powered up = 53576.20
  number of minutes until next internal SMART test = 19

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:   307049579        0         0  307049579   307049579     151465.844           0
write:         0        0         0         0          0      98651.271           0
verify: 159004248        0         0  159004248   159004248     110114.647           0

Non-medium error count:       25

SMART Self-test log
Num  Test              Status                 segment  LifeTime  LBA_first_err [SK ASC ASQ]
     Description                              number   (hours)
# 1  Background long   Completed                   -   53553                 - [-   -    -]
# 2  Background long   Completed                   -       2                 - [-   -    -]
# 3  Background short  Completed                   -       0                 - [-   -    -]

Long (extended) Self Test duration: 5626 seconds [93.8 minutes]

Background scan results log
  Status: waiting until BMS interval timer expires
    Accumulated power on time, hours:minutes 53576:12 [3214572 minutes]
    Number of background scans performed: 318,  scan progress: 0.00%
    Number of background medium scans performed: 0

Protocol Specific port log page for SAS SSP
relative target port id = 1
  generation code = 0
  number of phys = 1
  phy identifier = 0
    attached device type: SAS or SATA device
    attached reason: unknown
    reason: unknown
    negotiated logical link rate: phy enabled; 3 Gbps
    attached initiator port: ssp=1 stp=1 smp=1
    attached target port: ssp=0 stp=0 smp=0
    SAS address = 0x5000c5000357f8c5
    attached SAS address = 0x50022190c5175801
    attached phy identifier = 1
    Invalid DWORD count = 8
    Running disparity error count = 7
    Loss of DWORD synchronization = 6
    Phy reset problem = 0
relative target port id = 2
  generation code = 0
  number of phys = 1
  phy identifier = 1
    attached device type: no device attached
    attached reason: unknown
    reason: unknown
    negotiated logical link rate: phy enabled; 3 Gbps
    attached initiator port: ssp=0 stp=0 smp=0
    attached target port: ssp=0 stp=0 smp=0
    SAS address = 0x5000c5000357f8c6
    attached SAS address = 0x0
    attached phy identifier = 0
    Invalid DWORD count = 0
    Running disparity error count = 0
    Loss of DWORD synchronization = 0
    Phy reset problem = 0



/dev/da2
Code:
smartctl 6.3 2014-07-26 r3976 [FreeBSD 9.3-RELEASE-p31 amd64] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor:               SEAGATE
Product:              ST3400755SS
Revision:             NS25
User Capacity:        400,000,000,000 bytes [400 GB]
Logical block size:   512 bytes
Rotation Rate:        10033 rpm
Logical Unit id:      0x5000c5000357e9d7
Serial number:        3RJ0HH1J
Device type:          disk
Transport protocol:   SAS (SPL-3)
Local Time is:        Tue Feb 23 15:15:40 2016 CST
SMART support is:     Available - device has SMART capability.
SMART support is:     Enabled
Temperature Warning:  Disabled or Not Supported
Read Cache is:        Enabled
Writeback Cache is:   Disabled

=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK

Current Drive Temperature:     32 C
Drive Trip Temperature:        68 C

Elements in grown defect list: 3

Vendor (Seagate) cache information
  Blocks sent to initiator = 4210272453
  Blocks received from initiator = 2894555283
  Blocks read from cache and sent to initiator = 4052064319
  Number of read and write commands whose size <= segment size = 2950029901
  Number of read and write commands whose size > segment size = 60272260

Vendor (Seagate/Hitachi) factory information
  number of hours powered up = 54190.90
  number of minutes until next internal SMART test = 20

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:   684290469        6         0  684290475   684290475     150134.634           0
write:         0        0         5         5          5      98267.720           0
verify: 306616577        2         0  306616579   306616579     110271.628           0

Non-medium error count:   124947

SMART Self-test log
Num  Test              Status                 segment  LifeTime  LBA_first_err [SK ASC ASQ]
     Description                              number   (hours)
# 1  Background long   Completed                   -   54168                 - [-   -    -]
# 2  Background long   Completed                   -       2                 - [-   -    -]
# 3  Background short  Completed                   -       0                 - [-   -    -]

Long (extended) Self Test duration: 5626 seconds [93.8 minutes]

Background scan results log
  Status: waiting until BMS interval timer expires
    Accumulated power on time, hours:minutes 54190:54 [3251454 minutes]
    Number of background scans performed: 323,  scan progress: 0.00%
    Number of background medium scans performed: 0

   #  when        lba(hex)    [sk,asc,ascq]    reassign_status
   1 40000:17  0000000003e6c5df  [1,17,1]   Recovered via rewrite in-place
   2 40170:35  0000000003e6c5df  [1,17,1]   Recovered via rewrite in-place
   3 41021:59  0000000003e6c5df  [1,17,3]   Recovered via rewrite in-place
   4 46476:17  000000000b9d7aa1  [1,17,3]   Successfully reassigned

Protocol Specific port log page for SAS SSP
relative target port id = 1
  generation code = 0
  number of phys = 1
  phy identifier = 0
    attached device type: SAS or SATA device
    attached reason: unknown
    reason: unknown
    negotiated logical link rate: phy enabled; 3 Gbps
    attached initiator port: ssp=1 stp=1 smp=1
    attached target port: ssp=0 stp=0 smp=0
    SAS address = 0x5000c5000357e9d5
    attached SAS address = 0x50022190c5175802
    attached phy identifier = 2
    Invalid DWORD count = 8
    Running disparity error count = 6
    Loss of DWORD synchronization = 6
    Phy reset problem = 0
relative target port id = 2
  generation code = 0
  number of phys = 1
  phy identifier = 1
    attached device type: no device attached
    attached reason: unknown
    reason: unknown
    negotiated logical link rate: phy enabled; 3 Gbps
    attached initiator port: ssp=0 stp=0 smp=0
    attached target port: ssp=0 stp=0 smp=0
    SAS address = 0x5000c5000357e9d6
    attached SAS address = 0x0
    attached phy identifier = 0
    Invalid DWORD count = 0
    Running disparity error count = 0
    Loss of DWORD synchronization = 0
    Phy reset problem = 0



/dev/da3
Code:
smartctl 6.3 2014-07-26 r3976 [FreeBSD 9.3-RELEASE-p31 amd64] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor:               SEAGATE
Product:              ST3400755SS
Revision:             NS25
User Capacity:        400,000,000,000 bytes [400 GB]
Logical block size:   512 bytes
Rotation Rate:        10033 rpm
Logical Unit id:      0x5000c5000357fba7
Serial number:        3RJ0JDGG
Device type:          disk
Transport protocol:   SAS (SPL-3)
Local Time is:        Tue Feb 23 15:15:51 2016 CST
SMART support is:     Available - device has SMART capability.
SMART support is:     Enabled
Temperature Warning:  Disabled or Not Supported
Read Cache is:        Enabled
Writeback Cache is:   Disabled

=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK

Current Drive Temperature:     30 C
Drive Trip Temperature:        68 C

Elements in grown defect list: 0

Vendor (Seagate) cache information
  Blocks sent to initiator = 123853339
  Blocks received from initiator = 61002980
  Blocks read from cache and sent to initiator = 2568461243
  Number of read and write commands whose size <= segment size = 261347687
  Number of read and write commands whose size > segment size = 58174175

Vendor (Seagate/Hitachi) factory information
  number of hours powered up = 52410.70
  number of minutes until next internal SMART test = 20

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:   984401217       37         0  984401254   984401254     564065.906           0
write:         0        0         0         0          0     118805.932           0
verify: 1082282375        0         0  1082282375   1082282375     106788.016           0

Non-medium error count:    12696

SMART Self-test log
Num  Test              Status                 segment  LifeTime  LBA_first_err [SK ASC ASQ]
     Description                              number   (hours)
# 1  Background long   Completed                   -   52388                 - [-   -    -]
# 2  Background long   Completed                   -       1                 - [-   -    -]
# 3  Background short  Completed                   -       0                 - [-   -    -]

Long (extended) Self Test duration: 5626 seconds [93.8 minutes]

Background scan results log
  Status: waiting until BMS interval timer expires
    Accumulated power on time, hours:minutes 52410:42 [3144642 minutes]
    Number of background scans performed: 316,  scan progress: 0.00%
    Number of background medium scans performed: 0

Protocol Specific port log page for SAS SSP
relative target port id = 1
  generation code = 0
  number of phys = 1
  phy identifier = 0
    attached device type: SAS or SATA device
    attached reason: unknown
    reason: unknown
    negotiated logical link rate: phy enabled; 3 Gbps
    attached initiator port: ssp=1 stp=1 smp=1
    attached target port: ssp=0 stp=0 smp=0
    SAS address = 0x5000c5000357fba5
    attached SAS address = 0x50022190c5175803
    attached phy identifier = 3
    Invalid DWORD count = 8
    Running disparity error count = 8
    Loss of DWORD synchronization = 6
    Phy reset problem = 0
relative target port id = 2
  generation code = 0
  number of phys = 1
  phy identifier = 1
    attached device type: no device attached
    attached reason: unknown
    reason: unknown
    negotiated logical link rate: phy enabled; 3 Gbps
    attached initiator port: ssp=0 stp=0 smp=0
    attached target port: ssp=0 stp=0 smp=0
    SAS address = 0x5000c5000357fba6
    attached SAS address = 0x0
    attached phy identifier = 0
    Invalid DWORD count = 0
    Running disparity error count = 0
    Loss of DWORD synchronization = 0
    Phy reset problem = 0






/dev/da4
Code:
smartctl 6.3 2014-07-26 r3976 [FreeBSD 9.3-RELEASE-p31 amd64] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda ES
Device Model:     ST3750640NS
Serial Number:    5QD2GC07
LU WWN Device Id: 5 000c50 0028d7f48
Firmware Version: 3BKH
User Capacity:    750,156,374,016 bytes [750 GB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA/ATAPI-7 (minor revision not indicated)
Local Time is:    Tue Feb 23 14:55:25 2016 CST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
See vendor-specific Attribute list for marginal Attributes.

General SMART Values:
Offline data collection status:  (0x82)    Offline data collection activity
                    was completed without error.
                    Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0)    The previous self-test routine completed
                    without error or no self-test has ever
                    been run.
Total time to complete Offline
data collection:         (  430) seconds.
Offline data collection
capabilities:              (0x5b) SMART execute Offline immediate.
                    Auto Offline data collection on/off support.
                    Suspend Offline collection upon new
                    command.
                    Offline surface scan supported.
                    Self-test supported.
                    No Conveyance Self-test supported.
                    Selective Self-test supported.
SMART capabilities:            (0x0003)    Saves SMART data before entering
                    power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:        (0x01)    Error logging supported.
                    General Purpose Logging supported.
Short self-test routine
recommended polling time:      (   2) minutes.
Extended self-test routine
recommended polling time:      ( 221) minutes.
SCT capabilities:            (0x003d)    SCT Status supported.
                    SCT Error Recovery Control supported.
                    SCT Feature Control supported.
                    SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   103   085   006    Pre-fail  Always       -       31303309
  3 Spin_Up_Time            0x0003   098   091   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       286
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       1
  7 Seek_Error_Rate         0x000f   065   060   030    Pre-fail  Always       -       171932148957
  9 Power_On_Hours          0x0032   035   035   000    Old_age   Always       -       57787
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       157
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   064   032   045    Old_age   Always   In_the_past 36 (Min/Max 34/41 #358)
194 Temperature_Celsius     0x0022   036   068   000    Old_age   Always       -       36 (0 21 0 0 0)
195 Hardware_ECC_Recovered  0x001a   062   057   000    Old_age   Always       -       231565928
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0000   100   253   000    Old_age   Offline      -       0
202 Data_Address_Mark_Errs  0x0032   100   253   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%     57767         -
# 2  Conveyance offline  Completed without error       00%     57764         -
# 3  Extended offline    Completed without error       00%         7         -
# 4  Short offline       Completed without error       00%         0         -

SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.



/dev/da5
Code:
smartctl 6.3 2014-07-26 r3976 [FreeBSD 9.3-RELEASE-p31 amd64] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda ES
Device Model:     ST3750640NS
Serial Number:    5QD2LNYN
LU WWN Device Id: 5 000c50 0028d982c
Firmware Version: 3BKH
User Capacity:    750,156,374,016 bytes [750 GB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA/ATAPI-7 (minor revision not indicated)
Local Time is:    Tue Feb 23 14:55:39 2016 CST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
See vendor-specific Attribute list for marginal Attributes.

General SMART Values:
Offline data collection status:  (0x82)    Offline data collection activity
                    was completed without error.
                    Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0)    The previous self-test routine completed
                    without error or no self-test has ever
                    been run.
Total time to complete Offline
data collection:         (  430) seconds.
Offline data collection
capabilities:              (0x5b) SMART execute Offline immediate.
                    Auto Offline data collection on/off support.
                    Suspend Offline collection upon new
                    command.
                    Offline surface scan supported.
                    Self-test supported.
                    No Conveyance Self-test supported.
                    Selective Self-test supported.
SMART capabilities:            (0x0003)    Saves SMART data before entering
                    power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:        (0x01)    Error logging supported.
                    General Purpose Logging supported.
Short self-test routine
recommended polling time:      (   2) minutes.
Extended self-test routine
recommended polling time:      ( 221) minutes.
SCT capabilities:            (0x003d)    SCT Status supported.
                    SCT Error Recovery Control supported.
                    SCT Feature Control supported.
                    SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   105   088   006    Pre-fail  Always       -       49225381
  3 Spin_Up_Time            0x0003   098   091   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       93
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       32
  7 Seek_Error_Rate         0x000f   080   060   030    Pre-fail  Always       -       108296625
  9 Power_On_Hours          0x0032   039   039   000    Old_age   Always       -       53637
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       93
187 Reported_Uncorrect      0x0032   099   099   000    Old_age   Always       -       1
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   066   030   045    Old_age   Always   In_the_past 34 (Min/Max 33/38)
194 Temperature_Celsius     0x0022   034   070   000    Old_age   Always       -       34 (0 19 0 0 0)
195 Hardware_ECC_Recovered  0x001a   064   056   000    Old_age   Always       -       190424106
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0000   100   253   000    Old_age   Offline      -       0
202 Data_Address_Mark_Errs  0x0032   100   253   000    Old_age   Always       -       0

SMART Error Log Version: 1
ATA Error Count: 1
    CR = Command Register [HEX]
    FR = Features Register [HEX]
    SC = Sector Count Register [HEX]
    SN = Sector Number Register [HEX]
    CL = Cylinder Low Register [HEX]
    CH = Cylinder High Register [HEX]
    DH = Device/Head Register [HEX]
    DC = Device Command Register [HEX]
    ER = Error register [HEX]
    ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 1 occurred at disk power-on lifetime: 10951 hours (456 days + 7 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 78 a2 db 40  Error: UNC at LBA = 0x00dba278 = 14393976

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  25 00 80 00 a2 db 40 00      07:58:16.441  READ DMA EXT
  25 00 80 80 a1 db 40 00      07:58:16.440  READ DMA EXT
  25 00 80 00 a1 db 40 00      07:58:16.440  READ DMA EXT
  25 00 80 80 a0 db 40 00      07:58:16.439  READ DMA EXT
  25 00 80 00 a0 db 40 00      07:58:16.436  READ DMA EXT

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%     53616         -
# 2  Conveyance offline  Completed without error       00%     53613         -
# 3  Extended offline    Completed without error       00%        14         -
# 4  Short offline       Completed without error       00%         7         -
# 5  Short offline       Completed without error       00%         0         -

SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.


 

Robert Trevellyan

Pony Wrangler
Joined
May 16, 2014
Messages
3,778
Here's what I see:
  1. Your PERC SAS 6/iR is not in IT mode, which limits the data we can see in the reports for da0-da3.
  2. da4 and da5 appear to be connected via SATA. Please confirm.
  3. Your drives ran for 6 years without any SMART tests being logged.
  4. da4 has 1 bad sector, which is not a big deal.
  5. da5 has 32 bad sectors, which is too high for comfort.
  6. Two drives are missing, as you stated. Either something got damaged or dislodged during the relocation, or they failed some time in the past and you were not notified due to not having email notifications set up.
All four drives for pool Beta appear to be present, so that pool should be working. Is it?
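
The bad-sector counts in items 4 and 5 come from the Reallocated_Sector_Ct rows of the ATA attribute tables. A minimal Python sketch for pulling such values out of a report follows. It assumes the ten-column table layout shown in the da4 and da5 output above, and the sample rows are copied from those reports. The function name is hypothetical.

```python
def raw_value(report, attribute):
    """Return the leading integer of RAW_VALUE for one ATA attribute row."""
    for line in report.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID followed by the name;
        # RAW_VALUE is the tenth whitespace-separated column.
        if len(fields) >= 10 and fields[0].isdigit() and fields[1] == attribute:
            return int(fields[9])
    return None  # attribute not present in this report


DA4 = "  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       1"
DA5 = (
    "  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       32\n"
    "187 Reported_Uncorrect      0x0032   099   099   000    Old_age   Always       -       1"
)

for name, report in (("da4", DA4), ("da5", DA5)):
    print(name, raw_value(report, "Reallocated_Sector_Ct"))
```

This only works for the ATA-style table. The SAS drives (da0 to da3) report "Elements in grown defect list" instead and would need separate handling.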

You said Alpha is RAIDZ2, in which case you should be able to recover it with 2 missing drives. I know you said recovery isn't important, so we could stop right here, but if you want to continue...
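
The arithmetic behind that statement, as a small Python sketch: a RAIDZ vdev tolerates as many missing members as it has parity disks. The table and function names are made up for illustration. Note that this is only a necessary condition; it says nothing about whether the remaining disks are themselves readable.

```python
# Number of member disks each RAIDZ level can lose and still be readable.
PARITY = {"raidz1": 1, "raidz2": 2, "raidz3": 3}


def vdev_survives(layout, missing):
    """True if a vdev of this layout can, in principle, tolerate the loss."""
    return missing <= PARITY[layout]


# Pool Alpha: 4-disk RAIDZ2 with 2 disks missing.
print(vdev_survives("raidz2", 2))
```

With a third disk missing, or with errors on either of the two survivors, the same vdev would no longer have enough redundancy.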

You could try forcing the pool to import with zpool import -f.
 

carnogaunt

Cadet
Joined
Feb 21, 2016
Messages
6
Your PERC SAS 6/iR is not in IT mode, which limits the data we can see in the reports for da0-da3.
Note that da4-5 are also connected to the SAS 6/iR. I'm not using any of the motherboard's SATA ports.

I suppose I assumed that since the drives were passed through it must have been in IT mode. Unfortunately, there doesn't seem to be an official source for the firmware anymore (or at least I can't find older LSI stuff on the new Avago website), and I can only find one unofficial link. If those aren't the correct versions for the BSD driver, I'm pretty much out of luck.

da4 and da5 appear to be connected via SATA. Please confirm.
Yes, they are SATA drives.

Your drives ran for 6 years without any SMART tests being logged.
The weekly test should have run 7 or 8 times in the period that I've been using them, so I am a little concerned about that. The previous owner used them in hardware RAID, so otherwise the lack of tests is not really a surprise to me.

All four drives for pool Beta appear to be present, so that pool should be working. Is it?
Yes, which is good news, but makes me reluctant to mess with the firmware on the HBA. I don't have any other hardware compatible with SAS drives, and losing access to that pool would be substantially more inconvenient. I have already successfully replaced one of those drives, and I have a couple more spares on hand. The SAS drive failed much more gracefully; it was marked as "offline" and the pool status was "degraded." I simply followed the normal replacement procedure and everything was fine. Perhaps luck was a factor; it sounds like proactively replacing drives is strongly preferable to allowing them to fail and replacing them after the fact. So, what risk is involved in flashing the firmware? If bricking my HBA is a possibility, I would probably not want to proceed at this point.

You could try forcing the pool to import with zpool import -f.
I did give that a shot, and it did not work. It gave me a status of "faulted." I have already repurposed the drives, so I can't try anything else.
 

Robert Trevellyan

Pony Wrangler
Joined
May 16, 2014
Messages
3,778
Note that da4-5 are also connected to the SAS 6/iR.
OK, so SAS drives respond differently to smartctl. Unfortunately I have no experience with SAS drives.
The weekly test should have run 7 or 8 times in the period that I've been using them, so I am a little concerned about that.
You need to figure this out or risk running into a similar situation, i.e. undetected disk failure leading to loss of data.
I assumed that since the drives were passed through it must have been in IT mode.
Not necessarily. Please post the output of dmesg (or just attach a debug dump).
there doesn't seem to be an official source for the firmware anymore
I used this to flash my SAS 6/iR:
SAS3081ER.png
what risk is involved in flashing the firmware? If bricking my HBA is a possibility, I would probably not want to proceed at this point.
Well, there are no guarantees, but if you did brick it I would be willing to send you my IT mode SAS 6/iR for the cost of shipping.
It gave me a status of "faulted."
Then my guess is there was at least one fault with the remaining two drives, which would have been the final straw.
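The reasoning behind that guess can be sketched in a few lines: RAIDZ2 carries two drives' worth of parity, so a vdev stays readable with up to two members unavailable and faults with a third. This is a simplified model that counts whole-device failures only; the function name is hypothetical, and real ZFS also has to cope with damaged labels and unreadable blocks on drives that are nominally present.

```python
def raidz_survives(parity, unavailable):
    """True if a RAIDZ vdev with `parity` parity drives is still readable
    when `unavailable` member drives are missing or faulted."""
    if parity not in (1, 2, 3):
        raise ValueError("RAIDZ parity level must be 1, 2 or 3")
    if unavailable < 0:
        raise ValueError("unavailable must be non-negative")
    return unavailable <= parity

# Pool Alpha: RAIDZ2 with two drives missing.
print(raidz_survives(2, 2))  # True: importable in principle
# The same pool if one of the remaining drives also has a fault.
print(raidz_survives(2, 3))  # False: the pool faults
```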

What you really need is new hardware. Your system is not server grade and doesn't appear to support ECC RAM.
 

Mirfster

Doesn't know what he's talking about
Joined
Oct 2, 2015
Messages
3,215
Don't forget to mention that the SAS 6/iR doesn't fully support drives larger than 2TB (it only sees something like 2.2 TB), so there's not much room for future growth...
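The 2.2 TB figure is what you get if the controller addresses drives with 32-bit sector numbers and 512-byte sectors. That is an assumption about where the limit comes from, not something confirmed in this thread, but the arithmetic is short:

```python
SECTOR_BYTES = 512             # logical sector size of the drives discussed here
ADDRESSABLE_SECTORS = 2 ** 32  # assumption: the controller uses 32-bit LBAs

limit_bytes = SECTOR_BYTES * ADDRESSABLE_SECTORS
print(limit_bytes)             # 2199023255552
print(limit_bytes / 10 ** 12)  # roughly 2.2 decimal terabytes
print(limit_bytes / 2 ** 40)   # 2.0 binary tebibytes
```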

Suggest you heed the advice already given and take the time to acquire proper hardware before your next venture with FreeNAS.
 

carnogaunt

Cadet
Joined
Feb 21, 2016
Messages
6
I managed to get the SAS 6/iR flashed to IT mode.

For comparison, here's the output of smartctl -x after the flash:

/dev/da0
Code:
smartctl 6.3 2014-07-26 r3976 [FreeBSD 9.3-RELEASE-p31 amd64] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor:               SEAGATE
Product:              ST3400755SS
Revision:             NS25
User Capacity:        400,000,000,000 bytes [400 GB]
Logical block size:   512 bytes
Rotation Rate:        10033 rpm
Logical Unit id:      0x5000c5000357e6d3
Serial number:        3RJ0HS2S
Device type:          disk
Transport protocol:   SAS (SPL-3)
Local Time is:        Wed Feb 24 17:33:30 2016 CST
SMART support is:     Available - device has SMART capability.
SMART support is:     Enabled
Temperature Warning:  Disabled or Not Supported
Read Cache is:        Enabled
Writeback Cache is:   Disabled

=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK

Current Drive Temperature:     26 C
Drive Trip Temperature:        68 C

Elements in grown defect list: 0

Vendor (Seagate) cache information
  Blocks sent to initiator = 1651494778
  Blocks received from initiator = 4091358526
  Blocks read from cache and sent to initiator = 1120181778
  Number of read and write commands whose size <= segment size = 424270760
  Number of read and write commands whose size > segment size = 58174175

Vendor (Seagate/Hitachi) factory information
  number of hours powered up = 52455.03
  number of minutes until next internal SMART test = 52

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:   3260438328        3         0  3260438331   3260438331     569216.177           0
write:         0        0         0         0          0     120869.542           0
verify: 567144053        0         0  567144053   567144053     106594.624           0

Non-medium error count:     4208

SMART Self-test log
Num  Test              Status                 segment  LifeTime  LBA_first_err [SK ASC ASQ]
     Description                              number   (hours)
# 1  Background long   Completed                   -   52407                 - [-   -    -]
# 2  Background long   Completed                   -       1                 - [-   -    -]
# 3  Background short  Completed                   -       0                 - [-   -    -]

Long (extended) Self Test duration: 5626 seconds [93.8 minutes]

Background scan results log
  Status: scan is active
    Accumulated power on time, hours:minutes 52455:03 [3147303 minutes]
    Number of background scans performed: 316,  scan progress: 9.11%
    Number of background medium scans performed: 0

Protocol Specific port log page for SAS SSP
relative target port id = 1
  generation code = 0
  number of phys = 1
  phy identifier = 0
    attached device type: SAS or SATA device
    attached reason: unknown
    reason: unknown
    negotiated logical link rate: phy enabled; 3 Gbps
    attached initiator port: ssp=1 stp=1 smp=1
    attached target port: ssp=0 stp=0 smp=0
    SAS address = 0x5000c5000357e6d1
    attached SAS address = 0x50022190c5175800
    attached phy identifier = 0
    Invalid DWORD count = 0
    Running disparity error count = 0
    Loss of DWORD synchronization = 0
    Phy reset problem = 0
relative target port id = 2
  generation code = 0
  number of phys = 1
  phy identifier = 1
    attached device type: no device attached
    attached reason: unknown
    reason: unknown
    negotiated logical link rate: phy enabled; 3 Gbps
    attached initiator port: ssp=0 stp=0 smp=0
    attached target port: ssp=0 stp=0 smp=0
    SAS address = 0x5000c5000357e6d2
    attached SAS address = 0x0
    attached phy identifier = 0
    Invalid DWORD count = 0
    Running disparity error count = 0
    Loss of DWORD synchronization = 0
    Phy reset problem = 0




/dev/da1
Code:
smartctl 6.3 2014-07-26 r3976 [FreeBSD 9.3-RELEASE-p31 amd64] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor:               SEAGATE
Product:              ST3400755SS
Revision:             NS25
User Capacity:        400,000,000,000 bytes [400 GB]
Logical block size:   512 bytes
Rotation Rate:        10033 rpm
Logical Unit id:      0x5000c5000357f8c7
Serial number:        3RJ0JDCD
Device type:          disk
Transport protocol:   SAS (SPL-3)
Local Time is:        Wed Feb 24 17:33:40 2016 CST
SMART support is:     Available - device has SMART capability.
SMART support is:     Enabled
Temperature Warning:  Disabled or Not Supported
Read Cache is:        Enabled
Writeback Cache is:   Disabled

=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK

Current Drive Temperature:     26 C
Drive Trip Temperature:        68 C

Elements in grown defect list: 0

Vendor (Seagate) cache information
  Blocks sent to initiator = 1722457659
  Blocks received from initiator = 3646264930
  Blocks read from cache and sent to initiator = 891013300
  Number of read and write commands whose size <= segment size = 2965196311
  Number of read and write commands whose size > segment size = 60272260

Vendor (Seagate/Hitachi) factory information
  number of hours powered up = 53601.17
  number of minutes until next internal SMART test = 52

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:   307054890        0         0  307054890   307054890     151465.907           0
write:         0        0         0         0          0      98652.595           0
verify: 159004248        0         0  159004248   159004248     110114.647           0

Non-medium error count:       25

SMART Self-test log
Num  Test              Status                 segment  LifeTime  LBA_first_err [SK ASC ASQ]
     Description                              number   (hours)
# 1  Background long   Completed                   -   53553                 - [-   -    -]
# 2  Background long   Completed                   -       2                 - [-   -    -]
# 3  Background short  Completed                   -       0                 - [-   -    -]

Long (extended) Self Test duration: 5626 seconds [93.8 minutes]

Background scan results log
  Status: scan is active
    Accumulated power on time, hours:minutes 53601:10 [3216070 minutes]
    Number of background scans performed: 318,  scan progress: 9.33%
    Number of background medium scans performed: 0

Protocol Specific port log page for SAS SSP
relative target port id = 1
  generation code = 0
  number of phys = 1
  phy identifier = 0
    attached device type: SAS or SATA device
    attached reason: unknown
    reason: unknown
    negotiated logical link rate: phy enabled; 3 Gbps
    attached initiator port: ssp=1 stp=1 smp=1
    attached target port: ssp=0 stp=0 smp=0
    SAS address = 0x5000c5000357f8c5
    attached SAS address = 0x50022190c5175801
    attached phy identifier = 1
    Invalid DWORD count = 0
    Running disparity error count = 0
    Loss of DWORD synchronization = 0
    Phy reset problem = 0
relative target port id = 2
  generation code = 0
  number of phys = 1
  phy identifier = 1
    attached device type: no device attached
    attached reason: unknown
    reason: unknown
    negotiated logical link rate: phy enabled; 3 Gbps
    attached initiator port: ssp=0 stp=0 smp=0
    attached target port: ssp=0 stp=0 smp=0
    SAS address = 0x5000c5000357f8c6
    attached SAS address = 0x0
    attached phy identifier = 0
    Invalid DWORD count = 0
    Running disparity error count = 0
    Loss of DWORD synchronization = 0
    Phy reset problem = 0





/dev/da2
Code:
smartctl 6.3 2014-07-26 r3976 [FreeBSD 9.3-RELEASE-p31 amd64] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor:               SEAGATE
Product:              ST3400755SS
Revision:             NS25
User Capacity:        400,000,000,000 bytes [400 GB]
Logical block size:   512 bytes
Rotation Rate:        10033 rpm
Logical Unit id:      0x5000c5000357e9d7
Serial number:        3RJ0HH1J
Device type:          disk
Transport protocol:   SAS (SPL-3)
Local Time is:        Wed Feb 24 17:33:50 2016 CST
SMART support is:     Available - device has SMART capability.
SMART support is:     Enabled
Temperature Warning:  Disabled or Not Supported
Read Cache is:        Enabled
Writeback Cache is:   Disabled

=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK

Current Drive Temperature:     26 C
Drive Trip Temperature:        68 C

Elements in grown defect list: 3

Vendor (Seagate) cache information
  Blocks sent to initiator = 4210376088
  Blocks received from initiator = 2897113491
  Blocks read from cache and sent to initiator = 4052089939
  Number of read and write commands whose size <= segment size = 2950102190
  Number of read and write commands whose size > segment size = 60272260

Vendor (Seagate/Hitachi) factory information
  number of hours powered up = 54215.78
  number of minutes until next internal SMART test = 52

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:   684297382        6         0  684297388   684297388     150134.694           0
write:         0        0         5         5          5      98269.044           0
verify: 306616577        2         0  306616579   306616579     110271.628           0

Non-medium error count:   124947

SMART Self-test log
Num  Test              Status                 segment  LifeTime  LBA_first_err [SK ASC ASQ]
     Description                              number   (hours)
# 1  Background long   Completed                   -   54168                 - [-   -    -]
# 2  Background long   Completed                   -       2                 - [-   -    -]
# 3  Background short  Completed                   -       0                 - [-   -    -]

Long (extended) Self Test duration: 5626 seconds [93.8 minutes]

Background scan results log
  Status: scan is active
    Accumulated power on time, hours:minutes 54215:47 [3252947 minutes]
    Number of background scans performed: 323,  scan progress: 9.17%
    Number of background medium scans performed: 0

   #  when        lba(hex)    [sk,asc,ascq]    reassign_status
   1 40000:17  0000000003e6c5df  [1,17,1]   Recovered via rewrite in-place
   2 40170:35  0000000003e6c5df  [1,17,1]   Recovered via rewrite in-place
   3 41021:59  0000000003e6c5df  [1,17,3]   Recovered via rewrite in-place
   4 46476:17  000000000b9d7aa1  [1,17,3]   Successfully reassigned

Protocol Specific port log page for SAS SSP
relative target port id = 1
  generation code = 0
  number of phys = 1
  phy identifier = 0
    attached device type: SAS or SATA device
    attached reason: unknown
    reason: unknown
    negotiated logical link rate: phy enabled; 3 Gbps
    attached initiator port: ssp=1 stp=1 smp=1
    attached target port: ssp=0 stp=0 smp=0
    SAS address = 0x5000c5000357e9d5
    attached SAS address = 0x50022190c5175802
    attached phy identifier = 2
    Invalid DWORD count = 0
    Running disparity error count = 0
    Loss of DWORD synchronization = 0
    Phy reset problem = 0
relative target port id = 2
  generation code = 0
  number of phys = 1
  phy identifier = 1
    attached device type: no device attached
    attached reason: unknown
    reason: unknown
    negotiated logical link rate: phy enabled; 3 Gbps
    attached initiator port: ssp=0 stp=0 smp=0
    attached target port: ssp=0 stp=0 smp=0
    SAS address = 0x5000c5000357e9d6
    attached SAS address = 0x0
    attached phy identifier = 0
    Invalid DWORD count = 0
    Running disparity error count = 0
    Loss of DWORD synchronization = 0
    Phy reset problem = 0




/dev/da3
Code:
smartctl 6.3 2014-07-26 r3976 [FreeBSD 9.3-RELEASE-p31 amd64] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Vendor:               SEAGATE
Product:              ST3400755SS
Revision:             NS25
User Capacity:        400,000,000,000 bytes [400 GB]
Logical block size:   512 bytes
Rotation Rate:        10033 rpm
Logical Unit id:      0x5000c5000357fba7
Serial number:        3RJ0JDGG
Device type:          disk
Transport protocol:   SAS (SPL-3)
Local Time is:        Wed Feb 24 17:34:05 2016 CST
SMART support is:     Available - device has SMART capability.
SMART support is:     Enabled
Temperature Warning:  Disabled or Not Supported
Read Cache is:        Enabled
Writeback Cache is:   Disabled

=== START OF READ SMART DATA SECTION ===
SMART Health Status: OK

Current Drive Temperature:     25 C
Drive Trip Temperature:        68 C

Elements in grown defect list: 0

Vendor (Seagate) cache information
  Blocks sent to initiator = 123966676
  Blocks received from initiator = 63556612
  Blocks read from cache and sent to initiator = 2568483031
  Number of read and write commands whose size <= segment size = 261420356
  Number of read and write commands whose size > segment size = 58174175

Vendor (Seagate/Hitachi) factory information
  number of hours powered up = 52435.58
  number of minutes until next internal SMART test = 52

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:   984426416       37         0  984426453   984426453     564065.972           0
write:         0        0         0         0          0     118807.254           0
verify: 1082282375        0         0  1082282375   1082282375     106788.016           0

Non-medium error count:    12696

SMART Self-test log
Num  Test              Status                 segment  LifeTime  LBA_first_err [SK ASC ASQ]
     Description                              number   (hours)
# 1  Background long   Completed                   -   52388                 - [-   -    -]
# 2  Background long   Completed                   -       1                 - [-   -    -]
# 3  Background short  Completed                   -       0                 - [-   -    -]

Long (extended) Self Test duration: 5626 seconds [93.8 minutes]

Background scan results log
  Status: scan is active
    Accumulated power on time, hours:minutes 52435:35 [3146135 minutes]
    Number of background scans performed: 316,  scan progress: 9.63%
    Number of background medium scans performed: 0

Protocol Specific port log page for SAS SSP
relative target port id = 1
  generation code = 0
  number of phys = 1
  phy identifier = 0
    attached device type: SAS or SATA device
    attached reason: unknown
    reason: unknown
    negotiated logical link rate: phy enabled; 3 Gbps
    attached initiator port: ssp=1 stp=1 smp=1
    attached target port: ssp=0 stp=0 smp=0
    SAS address = 0x5000c5000357fba5
    attached SAS address = 0x50022190c5175803
    attached phy identifier = 3
    Invalid DWORD count = 0
    Running disparity error count = 0
    Loss of DWORD synchronization = 0
    Phy reset problem = 0
relative target port id = 2
  generation code = 0
  number of phys = 1
  phy identifier = 1
    attached device type: no device attached
    attached reason: unknown
    reason: unknown
    negotiated logical link rate: phy enabled; 3 Gbps
    attached initiator port: ssp=0 stp=0 smp=0
    attached target port: ssp=0 stp=0 smp=0
    SAS address = 0x5000c5000357fba6
    attached SAS address = 0x0
    attached phy identifier = 0
    Invalid DWORD count = 0
    Running disparity error count = 0
    Loss of DWORD synchronization = 0
    Phy reset problem = 0




/dev/da4
Code:
smartctl 6.3 2014-07-26 r3976 [FreeBSD 9.3-RELEASE-p31 amd64] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda ES
Device Model:     ST3750640NS
Serial Number:    5QD2GC07
LU WWN Device Id: 5 000c50 0028d7f48
Firmware Version: 3BKH
User Capacity:    750,156,374,016 bytes [750 GB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA/ATAPI-7 (minor revision not indicated)
Local Time is:    Wed Feb 24 17:34:15 2016 CST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is:   Disabled
APM feature is:   Disabled
Rd look-ahead is: Enabled
Write cache is:   Enabled
ATA Security is:  Disabled, NOT FROZEN [SEC1]
Unexpected SCT status 0xffff (action_code=4, function_code=2)
Wt Cache Reorder: Unknown (SCT Feature Control command failed)

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
See vendor-specific Attribute list for marginal Attributes.

General SMART Values:
Offline data collection status:  (0x82)    Offline data collection activity
                    was completed without error.
                    Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0)    The previous self-test routine completed
                    without error or no self-test has ever
                    been run.
Total time to complete Offline
data collection:         (  430) seconds.
Offline data collection
capabilities:              (0x5b) SMART execute Offline immediate.
                    Auto Offline data collection on/off support.
                    Suspend Offline collection upon new
                    command.
                    Offline surface scan supported.
                    Self-test supported.
                    No Conveyance Self-test supported.
                    Selective Self-test supported.
SMART capabilities:            (0x0003)    Saves SMART data before entering
                    power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:        (0x01)    Error logging supported.
                    General Purpose Logging supported.
Short self-test routine
recommended polling time:      (   2) minutes.
Extended self-test routine
recommended polling time:      ( 221) minutes.
SCT capabilities:            (0x003d)    SCT Status supported.
                    SCT Error Recovery Control supported.
                    SCT Feature Control supported.
                    SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     POSR--   111   085   006    -    35454658
  3 Spin_Up_Time            PO----   095   091   000    -    0
  4 Start_Stop_Count        -O--CK   100   100   020    -    289
  5 Reallocated_Sector_Ct   PO--CK   100   100   036    -    1
  7 Seek_Error_Rate         POSR--   065   060   030    -    171933536023
  9 Power_On_Hours          -O--CK   035   035   000    -    57813
 10 Spin_Retry_Count        PO--C-   100   100   097    -    0
 12 Power_Cycle_Count       -O--CK   100   100   020    -    160
187 Reported_Uncorrect      -O--CK   100   100   000    -    0
189 High_Fly_Writes         -O-RCK   100   100   000    -    0
190 Airflow_Temperature_Cel -O---K   072   032   045    Past 28 (1 102 28 23 0)
194 Temperature_Celsius     -O---K   028   068   000    -    28 (0 21 0 0 0)
195 Hardware_ECC_Recovered  -O-RC-   093   057   000    -    49259086
197 Current_Pending_Sector  -O--C-   100   100   000    -    0
198 Offline_Uncorrectable   ----C-   100   100   000    -    0
199 UDMA_CRC_Error_Count    -OSRCK   200   200   000    -    0
200 Multi_Zone_Error_Rate   ------   100   253   000    -    0
202 Data_Address_Mark_Errs  -O--CK   100   253   000    -    0
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning

General Purpose Log Directory Version 1
SMART           Log Directory Version 1 [multi-sector log support]
Address    Access  R/W   Size  Description
0x00       GPL,SL  R/O      1  Log Directory
0x01       GPL,SL  R/O      1  Summary SMART error log
0x02       GPL,SL  R/O      5  Comprehensive SMART error log
0x03       GPL,SL  R/O      5  Ext. Comprehensive SMART error log
0x06       GPL,SL  R/O      1  SMART self-test log
0x07       GPL,SL  R/O      1  Extended self-test log
0x09       GPL,SL  R/W      1  Selective self-test log
0x10       GPL,SL  R/O      1  NCQ Command Error log
0x11       GPL,SL  R/O      1  SATA Phy Event Counters
0x20       GPL,SL  R/O      1  Streaming performance log [OBS-8]
0x21       GPL,SL  R/O      1  Write stream error log
0x22       GPL,SL  R/O      1  Read stream error log
0x23       GPL,SL  R/O      1  Delayed sector log [OBS-8]
0x80-0x9f  GPL,SL  R/W     16  Host vendor specific log
0xa0       GPL,SL  VS       1  Device vendor specific log
0xa1       GPL,SL  VS      20  Device vendor specific log
0xa2       GPL,SL  VS     101  Device vendor specific log
0xa8       GPL,SL  VS      20  Device vendor specific log
0xa9       GPL,SL  VS       1  Device vendor specific log
0xe0       GPL,SL  R/W      1  SCT Command/Status
0xe1       GPL,SL  R/W      1  SCT Data Transfer
0xff       GPL     -    23040  Reserved

SMART Extended Comprehensive Error Log Version: 1 (5 sectors)
No Errors Logged

SMART Extended Self-test Log Version: 1 (1 sectors)
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%     57767         -
# 2  Conveyance offline  Completed without error       00%     57764         -
# 3  Extended offline    Completed without error       00%         7         -
# 4  Short offline       Completed without error       00%         0         -

SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

SCT Status Version:                  2
SCT Version (vendor specific):       521 (0x0209)
SCT Support Level:                   1
Device State:                        Active (0)
Current Temperature:                 28 Celsius
Power Cycle Max Temperature:         28 Celsius
Lifetime    Max Temperature:         68 Celsius

Another SCT command is executing, abort Read Data Table
(SCT ext_status_code 0xffff, action_code=4, function_code=2)
Read SCT Temperature History failed

Another SCT command is executing, abort Error Recovery Control
(SCT ext_status_code 0xffff, action_code=4, function_code=2)
SCT (Get) Error Recovery Control command failed

Device Statistics (GP Log 0x04) not supported

SATA Phy Event Counters (GP Log 0x11)
ID      Size     Value  Description
0x000a  2            1  Device-to-host register FISes sent due to a COMRESET
0x0001  2            0  Command failed due to ICRC error
0x0003  2            0  R_ERR response for device-to-host data FIS
0x0004  2            0  R_ERR response for host-to-device data FIS
0x0006  2            0  R_ERR response for device-to-host non-data FIS
0x0007  2            0  R_ERR response for host-to-device non-data FIS





Hit character limit, continuing in a new post.
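As a quick sanity check on whether the scheduled tests are now being logged, the newest self-test entry can be compared with the drive's power-on hours. This is a minimal sketch using the numbers from the da4 report above; the function name and the weekly limit constant are assumptions for illustration.

```python
WEEK_HOURS = 7 * 24  # the weekly test schedule mentioned earlier in the thread

def hours_since_last_test(power_on_hours, test_lifetimes):
    """Hours between the newest logged self-test and now, or None if none are logged."""
    if not test_lifetimes:
        return None
    return power_on_hours - max(test_lifetimes)

# da4: Power_On_Hours is 57813; the self-test log lists entries at
# 57767, 57764, 7 and 0 lifetime hours.
age = hours_since_last_test(57813, [57767, 57764, 7, 0])
print(age)                                    # 46
print(age is not None and age <= WEEK_HOURS)  # True: tested within the last week

# A drive with an empty self-test log has nothing to compare against.
print(hours_since_last_test(57813, []))       # None
```

The same subtraction works for the SAS drives: da0 reports about 52455 hours powered up and its newest background long test at 52407 hours, a gap of roughly 48 hours.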
 

carnogaunt

Cadet
Joined
Feb 21, 2016
Messages
6
/dev/da5
Code:
smartctl 6.3 2014-07-26 r3976 [FreeBSD 9.3-RELEASE-p31 amd64] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda ES
Device Model:     ST3750640NS
Serial Number:    5QD2LNYN
LU WWN Device Id: 5 000c50 0028d982c
Firmware Version: 3BKH
User Capacity:    750,156,374,016 bytes [750 GB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA/ATAPI-7 (minor revision not indicated)
Local Time is:    Wed Feb 24 17:34:23 2016 CST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is:   Disabled
APM feature is:   Disabled
Rd look-ahead is: Enabled
Write cache is:   Enabled
ATA Security is:  Disabled, NOT FROZEN [SEC1]
Unexpected SCT status 0xffff (action_code=4, function_code=2)
Wt Cache Reorder: Unknown (SCT Feature Control command failed)

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
See vendor-specific Attribute list for marginal Attributes.

General SMART Values:
Offline data collection status:  (0x82)    Offline data collection activity
                    was completed without error.
                    Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0)    The previous self-test routine completed
                    without error or no self-test has ever
                    been run.
Total time to complete Offline
data collection:         (  430) seconds.
Offline data collection
capabilities:              (0x5b) SMART execute Offline immediate.
                    Auto Offline data collection on/off support.
                    Suspend Offline collection upon new
                    command.
                    Offline surface scan supported.
                    Self-test supported.
                    No Conveyance Self-test supported.
                    Selective Self-test supported.
SMART capabilities:            (0x0003)    Saves SMART data before entering
                    power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:        (0x01)    Error logging supported.
                    General Purpose Logging supported.
Short self-test routine
recommended polling time:      (   2) minutes.
Extended self-test routine
recommended polling time:      ( 221) minutes.
SCT capabilities:            (0x003d)    SCT Status supported.
                    SCT Error Recovery Control supported.
                    SCT Feature Control supported.
                    SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     POSR--   113   088   006    -    51857894
  3 Spin_Up_Time            PO----   095   091   000    -    0
  4 Start_Stop_Count        -O--CK   100   100   020    -    96
  5 Reallocated_Sector_Ct   PO--CK   100   100   036    -    32
  7 Seek_Error_Rate         POSR--   080   060   030    -    109682886
  9 Power_On_Hours          -O--CK   039   039   000    -    53662
 10 Spin_Retry_Count        PO--C-   100   100   097    -    0
 12 Power_Cycle_Count       -O--CK   100   100   020    -    96
187 Reported_Uncorrect      -O--CK   099   099   000    -    1
189 High_Fly_Writes         -O-RCK   100   100   000    -    0
190 Airflow_Temperature_Cel -O---K   071   030   045    Past 29 (Min/Max 25/29)
194 Temperature_Celsius     -O---K   029   070   000    -    29 (0 19 0 0 0)
195 Hardware_ECC_Recovered  -O-RC-   086   056   000    -    12821017
197 Current_Pending_Sector  -O--C-   100   100   000    -    0
198 Offline_Uncorrectable   ----C-   100   100   000    -    0
199 UDMA_CRC_Error_Count    -OSRCK   200   200   000    -    0
200 Multi_Zone_Error_Rate   ------   100   253   000    -    0
202 Data_Address_Mark_Errs  -O--CK   100   253   000    -    0
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning

General Purpose Log Directory Version 1
SMART           Log Directory Version 1 [multi-sector log support]
Address    Access  R/W   Size  Description
0x00       GPL,SL  R/O      1  Log Directory
0x01       GPL,SL  R/O      1  Summary SMART error log
0x02       GPL,SL  R/O      5  Comprehensive SMART error log
0x03       GPL,SL  R/O      5  Ext. Comprehensive SMART error log
0x06       GPL,SL  R/O      1  SMART self-test log
0x07       GPL,SL  R/O      1  Extended self-test log
0x09       GPL,SL  R/W      1  Selective self-test log
0x10       GPL,SL  R/O      1  NCQ Command Error log
0x11       GPL,SL  R/O      1  SATA Phy Event Counters
0x20       GPL,SL  R/O      1  Streaming performance log [OBS-8]
0x21       GPL,SL  R/O      1  Write stream error log
0x22       GPL,SL  R/O      1  Read stream error log
0x23       GPL,SL  R/O      1  Delayed sector log [OBS-8]
0x80-0x9f  GPL,SL  R/W     16  Host vendor specific log
0xa0       GPL,SL  VS       1  Device vendor specific log
0xa1       GPL,SL  VS      20  Device vendor specific log
0xa2       GPL,SL  VS     101  Device vendor specific log
0xa8       GPL,SL  VS      20  Device vendor specific log
0xa9       GPL,SL  VS       1  Device vendor specific log
0xe0       GPL,SL  R/W      1  SCT Command/Status
0xe1       GPL,SL  R/W      1  SCT Data Transfer
0xff       GPL     -    23040  Reserved

SMART Extended Comprehensive Error Log Version: 1 (5 sectors)
Device Error Count: 1
    CR     = Command Register
    FEATR  = Features Register
    COUNT  = Count (was: Sector Count) Register
    LBA_48 = Upper bytes of LBA High/Mid/Low Registers ]  ATA-8
    LH     = LBA High (was: Cylinder High) Register    ]   LBA
    LM     = LBA Mid (was: Cylinder Low) Register      ] Register
    LL     = LBA Low (was: Sector Number) Register     ]
    DV     = Device (was: Device/Head) Register
    DC     = Device Control Register
    ER     = Error register
    ST     = Status register
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 1 [0] occurred at disk power-on lifetime: 10951 hours (456 days + 7 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER -- ST COUNT  LBA_48  LH LM LL DV DC
  -- -- -- == -- == == == -- -- -- -- --
  40 -- 51 00 00 00 21 a2 78 db 78 40 00  Error: UNC at LBA = 0x21a278db78 = 144459750264

  Commands leading to the command that caused the error were:
  CR FEATR COUNT  LBA_48  LH LM LL DV DC  Powered_Up_Time  Command/Feature_Name
  -- == -- == -- == == == -- -- -- -- --  ---------------  --------------------
  25 00 00 00 80 00 21 a2 00 db 00 40 00     07:58:16.441  READ DMA EXT
  25 00 00 00 80 00 21 a1 00 db 80 40 00     07:58:16.440  READ DMA EXT
  25 00 00 00 80 00 21 a1 00 db 00 40 00     07:58:16.440  READ DMA EXT
  25 00 00 00 80 00 21 a0 00 db 80 40 00     07:58:16.439  READ DMA EXT
  25 00 00 00 80 00 21 a0 00 db 00 40 00     07:58:16.436  READ DMA EXT

SMART Extended Self-test Log Version: 1 (1 sectors)
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%     53616         -
# 2  Conveyance offline  Completed without error       00%     53613         -
# 3  Extended offline    Completed without error       00%        14         -
# 4  Short offline       Completed without error       00%         7         -
# 5  Short offline       Completed without error       00%         0         -

SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

SCT Status Version:                  2
SCT Version (vendor specific):       521 (0x0209)
SCT Support Level:                   1
Device State:                        Active (0)
Current Temperature:                 29 Celsius
Power Cycle Max Temperature:         29 Celsius
Lifetime    Max Temperature:         68 Celsius

Another SCT command is executing, abort Read Data Table
(SCT ext_status_code 0xffff, action_code=4, function_code=2)
Read SCT Temperature History failed

Another SCT command is executing, abort Error Recovery Control
(SCT ext_status_code 0xffff, action_code=4, function_code=2)
SCT (Get) Error Recovery Control command failed

Device Statistics (GP Log 0x04) not supported

SATA Phy Event Counters (GP Log 0x11)
ID      Size     Value  Description
0x000a  2            1  Device-to-host register FISes sent due to a COMRESET
0x0001  2            0  Command failed due to ICRC error
0x0003  2            0  R_ERR response for device-to-host data FIS
0x0004  2            0  R_ERR response for host-to-device data FIS
0x0006  2            0  R_ERR response for device-to-host non-data FIS
0x0007  2            0  R_ERR response for host-to-device non-data FIS




What you really need is new hardware.
I wholeheartedly agree.
 

Robert Trevellyan

Pony Wrangler
Joined
May 16, 2014
Messages
3,778
Looks like da2 also has at least one bad sector.
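
For anyone who wants to spot this kind of thing without reading the whole table by eye, here is a minimal Python sketch that scans a saved smartctl attribute table and flags rows whose raw value is nonzero. The sample rows are copied from the output above. The watch list (IDs 5, 187, 197, 198) and the function names are my own assumptions for illustration, not anything FreeNAS or smartmontools defines, and a nonzero raw value is only a hint to look closer, not proof that the drive is dead.

```python
import re

# Attribute IDs to watch. This list is an assumption for illustration:
# 5 Reallocated_Sector_Ct, 187 Reported_Uncorrect,
# 197 Current_Pending_Sector, 198 Offline_Uncorrectable.
WATCHED_IDS = {5, 187, 197, 198}

# Matches one row of the "ID# ATTRIBUTE_NAME FLAGS VALUE WORST THRESH FAIL RAW_VALUE"
# table. RAW_VALUE may contain spaces, so it takes the rest of the line.
ATTR_LINE = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+([A-Z-]{6})\s+(\d+)\s+(\d+)\s+(\d+)\s+(\S+)\s+(.*)$"
)


def parse_attributes(text):
    """Return one dict per attribute row found in the pasted smartctl text."""
    rows = []
    for line in text.splitlines():
        m = ATTR_LINE.match(line)
        if m is None:
            continue  # header, legend, or unrelated line
        attr_id, name, flags, value, worst, thresh, fail, raw = m.groups()
        rows.append({
            "id": int(attr_id),
            "name": name,
            "flags": flags,
            "value": int(value),
            "worst": int(worst),
            "thresh": int(thresh),
            "fail": fail,
            "raw": raw.strip(),
        })
    return rows


def suspect_attributes(rows):
    """Return watched rows whose raw value starts with a nonzero integer."""
    flagged = []
    for row in rows:
        parts = row["raw"].split()
        if row["id"] in WATCHED_IDS and parts and parts[0].isdigit() and int(parts[0]) > 0:
            flagged.append(row)
    return flagged


SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  5 Reallocated_Sector_Ct   PO--CK   100   100   036    -    32
187 Reported_Uncorrect      -O--CK   099   099   000    -    1
197 Current_Pending_Sector  -O--C-   100   100   000    -    0
198 Offline_Uncorrectable   ----C-   100   100   000    -    0
199 UDMA_CRC_Error_Count    -OSRCK   200   200   000    -    0
"""

for row in suspect_attributes(parse_attributes(SAMPLE)):
    print(f"{row['id']:>3} {row['name']}: raw={row['raw']}")
```

On the sample rows this flags Reallocated_Sector_Ct (raw 32) and Reported_Uncorrect (raw 1), which are the two entries that point to bad sectors. The normalized VALUE column is still well above THRESH for both, which is why the drive itself does not report a failure.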

Recapping, it looks like you lost pool Alpha because two drives failed and one of the remaining two is also failing (most likely da5), with no email notifications to warn you along the way.
I wholeheartedly agree.
:cool:
 