Critical: the Volume state is DEGRADED

matthewviking1

Dabbler
Joined
Dec 22, 2018
Messages
10
Relative newb here, having some issues with my FreeNAS box.
I get an alert in the Alert System that reads: Critical: The volume Hell state is DEGRADED: One or more devices has experienced an error resulting in data corruption. Applications may be affected.

I have run zpool status -v and zpool scrub -v Hell (Hell is my main volume) with varying results.
At first, the error went away after a scrub and a reboot. I re-did the scrub and now there are way more errors in the files, even after a reboot.

Any help with this would be awesome!
 

Attachments

  • FreeNAS errors.PNG

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
The errors will always go away after a reboot. As to what's causing them, hard to say with the very little information you've provided. FreeNAS version and full hardware would help. But it seems unlikely that all four of your disks would have died at the same time. There's a helpful resource on hard drive troubleshooting you should check out.
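
For reference, the scrub-then-check sequence discussed above could be scripted roughly as follows. This is only a minimal sketch, assuming an sh-compatible shell (run it under sh or bash if your login shell is something else). The function name scrub_and_report and the ZPOOL and POLL_SECONDS variables are made up for illustration; the pool name Hell is the one from the first post. It relies on zpool status printing "scrub in progress" while a scrub is still running.

```shell
# Hypothetical helper: the names below are illustrative, not part of FreeNAS.
ZPOOL="${ZPOOL:-zpool}"              # command to call; overridable for dry runs
POLL_SECONDS="${POLL_SECONDS:-60}"   # how often to re-check scrub progress

scrub_and_report() {
    pool="$1"
    [ -n "$pool" ] || { echo "usage: scrub_and_report POOL" >&2; return 2; }

    # Start the scrub; give up if the pool name is wrong or the scrub cannot start.
    "$ZPOOL" scrub "$pool" || return 1

    # Wait until the status output no longer reports a running scrub.
    while "$ZPOOL" status "$pool" | grep -q "scrub in progress"; do
        sleep "$POLL_SECONDS"
    done

    # Print the final state, including any files listed with permanent errors.
    "$ZPOOL" status -v "$pool"
}
```

Calling scrub_and_report Hell would start a scrub, wait for it to finish, and then print the verbose status in one go, which makes it easier to compare the error list from one scrub to the next.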
 

matthewviking1

Dabbler
Joined
Dec 22, 2018
Messages
10
Sorry, this is my first time posting.
FreeNAS version 11.1-U6 on a 32 GB flash drive
8 GB of RAM
4 x 2 TB IronWolf NAS drives in RAIDZ2

All my data is accessible and I am able to open the files, even the files that zpool status -v Hell shows as "degraded".
At first it was showing the errors in the snapshots. I destroyed the snapshots and now it's showing the faults in the actual data on my disks. Sorry if I am not giving you all the info, I'm very new to all of this.
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
Motherboard? CPU? Power supply? Disk controller?

It's not very common for more than one disk (much less four disks) to be showing issues at the same time, so the issue is more likely with something that's common to all of them. But let's get some information about your disks. Run smartctl -x /dev/ada0, and post the output here in code tags (see the forum rules for how to use code tags). You'll find this much easier if you enable SSH and connect using an SSH client rather than the shell button in the GUI. See this page for information on setting up SSH, though you don't need to bother with the public key stuff.
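
Since the full smartctl output is long, a small filter can help when comparing several disks. The following is a rough sketch, not an official tool: the function name smart_key_attrs is invented here, and the choice of attributes (5, 187, 197, 198, 199) is just the set commonly checked first for failing sectors or cabling trouble. It assumes the standard attribute table layout, where the raw value is the last column for these attributes.

```shell
# Hypothetical helper: reads smartctl attribute output on stdin and prints
# NAME=RAW_VALUE for a handful of attributes worth checking first.
smart_key_attrs() {
    awk '
        # Attribute rows start with a numeric ID followed by the attribute name.
        # Requiring a name that starts with a letter skips other numeric tables
        # (self-test spans, temperature history) in -a / -x output.
        $1 ~ /^[0-9]+$/ && $2 ~ /^[A-Za-z]/ &&
        ($1 == 5 || $1 == 187 || $1 == 197 || $1 == 198 || $1 == 199) {
            printf "%s=%s\n", $2, $NF
        }
    '
}
```

One way to use it, assuming an sh-compatible shell: for d in $(sysctl -n kern.disks); do echo "== $d"; smartctl -A /dev/$d | smart_key_attrs; done. Non-zero raw values on any of these lines are the ones to look at more closely.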
 

matthewviking1

Dabbler
Joined
Dec 22, 2018
Messages
10
4 x IPSG SEAGATE IRONWOLF 2TB HD
Core i3-8100 Coffee Lake 3.6 GHz LGA 1151
H370M-ITX/ac LGA 1151 mITX Intel Motherboard
Power supply: Corsair RM650X modular
No disk controller; the drives are attached to the motherboard via SATA 6 Gb/s ports.
Another thing that has been happening is that the NAS will reboot at random times. I have no idea why it does this. It was suggested to me to replace the PSU, which I did, and I am testing it now.


Code:
root@freenas:~ # smartctl -x /dev/ada0
smartctl 6.6 2017-11-05 r4594 [FreeBSD 11.1-STABLE amd64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate IronWolf
Device Model:     ST2000VN004-2E4164
Serial Number:    Z526WWJK
LU WWN Device Id: 5 000c50 0b33adb41
Firmware Version: SC60
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5900 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2, ACS-3 T13/2161-D revision 3b
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Sun Dec 23 19:07:58 2018 EST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is:   Unavailable
APM level is:     1 (minimum power consumption with standby)
Rd look-ahead is: Enabled
Write cache is:   Enabled
DSN feature is:   Unavailable
ATA Security is:  Disabled, frozen [SEC2]
Wt Cache Reorder: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (  107) seconds.
Offline data collection
capabilities:                    (0x73) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        No Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 262) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x10bd) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     POSR--   117   099   006    -    159753592
  3 Spin_Up_Time            PO----   098   096   000    -    0
  4 Start_Stop_Count        -O--CK   093   093   020    -    7850
  5 Reallocated_Sector_Ct   PO--CK   100   100   010    -    0
  7 Seek_Error_Rate         POSR--   063   060   030    -    2553754
  9 Power_On_Hours          -O--CK   100   100   000    -    696
 10 Spin_Retry_Count        PO--C-   100   100   097    -    0
 12 Power_Cycle_Count       -O--CK   100   100   020    -    30
184 End-to-End_Error        -O--CK   100   100   099    -    0
187 Reported_Uncorrect      -O--CK   100   100   000    -    0
188 Command_Timeout         -O--CK   100   100   000    -    0
189 High_Fly_Writes         -O-RCK   097   097   000    -    3
190 Airflow_Temperature_Cel -O---K   071   063   045    -    29 (Min/Max 27/29)
191 G-Sense_Error_Rate      -O--CK   100   100   000    -    0
192 Power-Off_Retract_Count -O--CK   100   100   000    -    0
193 Load_Cycle_Count        -O--CK   083   083   000    -    35806
194 Temperature_Celsius     -O---K   029   040   000    -    29 (0 19 0 0 0)
197 Current_Pending_Sector  -O--C-   100   100   000    -    0
198 Offline_Uncorrectable   ----C-   100   100   000    -    0
199 UDMA_CRC_Error_Count    -OSRCK   200   200   000    -    0
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning

General Purpose Log Directory Version 1
SMART           Log Directory Version 1 [multi-sector log support]
Address    Access  R/W   Size  Description
0x00       GPL,SL  R/O      1  Log Directory
0x01           SL  R/O      1  Summary SMART error log
0x02           SL  R/O      5  Comprehensive SMART error log
0x03       GPL     R/O      5  Ext. Comprehensive SMART error log
0x06           SL  R/O      1  SMART self-test log
0x07       GPL     R/O      1  Extended self-test log
0x09           SL  R/W      1  Selective self-test log
0x10       GPL     R/O      1  NCQ Command Error log
0x11       GPL     R/O      1  SATA Phy Event Counters log
0x15       GPL     R/W      1  Rebuild Assist log
0x21       GPL     R/O      1  Write stream error log
0x22       GPL     R/O      1  Read stream error log
0x80-0x9f  GPL,SL  R/W     16  Host vendor specific log
0xa1       GPL,SL  VS      20  Device vendor specific log
0xa2       GPL     VS    4496  Device vendor specific log
0xa8       GPL,SL  VS     129  Device vendor specific log
0xa9       GPL,SL  VS       1  Device vendor specific log
0xab       GPL     VS       1  Device vendor specific log
0xb0       GPL     VS    5176  Device vendor specific log
0xbe-0xbf  GPL     VS   65535  Device vendor specific log
0xc0       GPL,SL  VS       1  Device vendor specific log
0xc1       GPL,SL  VS      10  Device vendor specific log
0xc3       GPL,SL  VS       8  Device vendor specific log
0xc4       GPL,SL  VS       5  Device vendor specific log
0xd1       GPL,SL  VS       8  Device vendor specific log
0xe0       GPL,SL  R/W      1  SCT Command/Status
0xe1       GPL,SL  R/W      1  SCT Data Transfer

SMART Extended Comprehensive Error Log Version: 1 (5 sectors)
No Errors Logged

SMART Extended Self-test Log Version: 1 (1 sectors)
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Interrupted (host reset)      70%       676         -

SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

SCT Status Version:                  3
SCT Version (vendor specific):       522 (0x020a)
SCT Support Level:                   1
Device State:                        Active (0)
Current Temperature:                    27 Celsius
Power Cycle Min/Max Temperature:     27/27 Celsius
Lifetime    Min/Max Temperature:     19/37 Celsius
Under/Over Temperature Limit Count:   0/0

SCT Temperature History Version:     2
Temperature Sampling Period:         1 minute
Temperature Logging Interval:        94 minutes
Min/Max recommended Temperature:      1/61 Celsius
Min/Max Temperature Limit:            2/60 Celsius
Temperature History Size (Index):    128 (74)

Index    Estimated Time   Temperature Celsius
  75    2018-12-15 11:38    29  **********
  76    2018-12-15 13:12    29  **********
  77    2018-12-15 14:46    29  **********
  78    2018-12-15 16:20    31  ************
  79    2018-12-15 17:54    31  ************
  80    2018-12-15 19:28    30  ***********
  81    2018-12-15 21:02    31  ************
  82    2018-12-15 22:36    31  ************
  83    2018-12-16 00:10     ?  -
  84    2018-12-16 01:44    31  ************
  85    2018-12-16 03:18    32  *************
  86    2018-12-16 04:52    35  ****************
  87    2018-12-16 06:26    34  ***************
  88    2018-12-16 08:00    33  **************
...    ..(  5 skipped).    ..  **************
  94    2018-12-16 17:24    33  **************
  95    2018-12-16 18:58    36  *****************
  96    2018-12-16 20:32    37  ******************
  97    2018-12-16 22:06    34  ***************
  98    2018-12-16 23:40    33  **************
...    ..(  4 skipped).    ..  **************
103    2018-12-17 07:30    33  **************
104    2018-12-17 09:04    32  *************
105    2018-12-17 10:38    32  *************
106    2018-12-17 12:12    33  **************
107    2018-12-17 13:46    33  **************
108    2018-12-17 15:20    32  *************
...    ..(  2 skipped).    ..  *************
111    2018-12-17 20:02    32  *************
112    2018-12-17 21:36    33  **************
...    ..(  6 skipped).    ..  **************
119    2018-12-18 08:34    33  **************
120    2018-12-18 10:08    32  *************
121    2018-12-18 11:42    34  ***************
122    2018-12-18 13:16     ?  -
123    2018-12-18 14:50    23  ****
124    2018-12-18 16:24    33  **************
125    2018-12-18 17:58    32  *************
126    2018-12-18 19:32    33  **************
127    2018-12-18 21:06    33  **************
   0    2018-12-18 22:40    33  **************
   1    2018-12-19 00:14    34  ***************
   2    2018-12-19 01:48    32  *************
...    ..(  2 skipped).    ..  *************
   5    2018-12-19 06:30    32  *************
   6    2018-12-19 08:04    31  ************
   7    2018-12-19 09:38    32  *************
...    ..(  3 skipped).    ..  *************
  11    2018-12-19 15:54    32  *************
  12    2018-12-19 17:28    33  **************
  13    2018-12-19 19:02    33  **************
  14    2018-12-19 20:36    32  *************
...    ..(  3 skipped).    ..  *************
  18    2018-12-20 02:52    32  *************
  19    2018-12-20 04:26    33  **************
  20    2018-12-20 06:00    32  *************
...    ..(  2 skipped).    ..  *************
  23    2018-12-20 10:42    32  *************
  24    2018-12-20 12:16    31  ************
  25    2018-12-20 13:50    31  ************
  26    2018-12-20 15:24    31  ************
  27    2018-12-20 16:58    32  *************
...    ..( 11 skipped).    ..  *************
  39    2018-12-21 11:46    32  *************
  40    2018-12-21 13:20    31  ************
  41    2018-12-21 14:54    32  *************
  42    2018-12-21 16:28    31  ************
  43    2018-12-21 18:02    31  ************
  44    2018-12-21 19:36    31  ************
  45    2018-12-21 21:10    32  *************
  46    2018-12-21 22:44    31  ************
...    ..(  4 skipped).    ..  ************
  51    2018-12-22 06:34    31  ************
  52    2018-12-22 08:08    32  *************
  53    2018-12-22 09:42    32  *************
  54    2018-12-22 11:16    35  ****************
...    ..(  2 skipped).    ..  ****************
  57    2018-12-22 15:58    35  ****************
  58    2018-12-22 17:32    36  *****************
  59    2018-12-22 19:06     ?  -
  60    2018-12-22 20:40    33  **************
  61    2018-12-22 22:14    34  ***************
  62    2018-12-22 23:48    32  *************
  63    2018-12-23 01:22    31  ************
...    ..(  4 skipped).    ..  ************
  68    2018-12-23 09:12    31  ************
  69    2018-12-23 10:46    30  ***********
  70    2018-12-23 12:20    31  ************
  71    2018-12-23 13:54    32  *************
  72    2018-12-23 15:28    32  *************
  73    2018-12-23 17:02     ?  -
  74    2018-12-23 18:36    27  ********

SCT Error Recovery Control:
           Read:     70 (7.0 seconds)
          Write:     70 (7.0 seconds)

Device Statistics (GP/SMART Log 0x04) not supported

Pending Defects log (GP Log 0x0c) not supported

SATA Phy Event Counters (GP Log 0x11)
ID      Size     Value  Description
0x000a  2            4  Device-to-host register FISes sent due to a COMRESET
0x0001  2            0  Command failed due to ICRC error
0x0003  2            0  R_ERR response for device-to-host data FIS
0x0004  2            0  R_ERR response for host-to-device data FIS
0x0006  2            0  R_ERR response for device-to-host non-data FIS
0x0007  2            0  R_ERR response for host-to-device non-data FIS
 

matthewviking1

Dabbler
Joined
Dec 22, 2018
Messages
10
Here is the smartctl output for all drives.

Code:
root@freenas:~ # bash -c 'for i in `sysctl -n kern.disks` ; do smartctl -a /dev/$i ; done'
smartctl 6.6 2017-11-05 r4594 [FreeBSD 11.1-STABLE amd64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate IronWolf
Device Model:     ST2000VN004-2E4164
Serial Number:    Z526R3PM
LU WWN Device Id: 5 000c50 0b325a460
Firmware Version: SC60
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5900 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2, ACS-3 T13/2161-D revision 3b
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Sun Dec 23 19:15:10 2018 EST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (   97) seconds.
Offline data collection
capabilities:                    (0x73) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        No Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 252) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x10bd) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   118   100   006    Pre-fail  Always       -       200337064
  3 Spin_Up_Time            0x0003   098   096   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   093   093   020    Old_age   Always       -       7863
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   063   060   030    Pre-fail  Always       -       2583626
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       696
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       30
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   070   062   045    Old_age   Always       -       30 (Min/Max 28/30)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       0
193 Load_Cycle_Count        0x0032   083   083   000    Old_age   Always       -       35765
194 Temperature_Celsius     0x0022   030   040   000    Old_age   Always       -       30 (0 19 0 0 0)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Interrupted (host reset)      70%       676         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

smartctl 6.6 2017-11-05 r4594 [FreeBSD 11.1-STABLE amd64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate IronWolf
Device Model:     ST2000VN004-2E4164
Serial Number:    Z526T12Z
LU WWN Device Id: 5 000c50 0b3258bad
Firmware Version: SC60
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5900 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2, ACS-3 T13/2161-D revision 3b
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Sun Dec 23 19:15:11 2018 EST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (  107) seconds.
Offline data collection
capabilities:                    (0x73) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        No Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 260) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x10bd) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   118   099   006    Pre-fail  Always       -       173820192
  3 Spin_Up_Time            0x0003   098   096   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   093   093   020    Old_age   Always       -       7856
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   063   060   030    Pre-fail  Always       -       2401474
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       696
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       30
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   068   059   045    Old_age   Always       -       32 (Min/Max 30/33)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       0
193 Load_Cycle_Count        0x0032   083   083   000    Old_age   Always       -       35783
194 Temperature_Celsius     0x0022   032   041   000    Old_age   Always       -       32 (0 20 0 0 0)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Interrupted (host reset)      70%       676         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

smartctl 6.6 2017-11-05 r4594 [FreeBSD 11.1-STABLE amd64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate IronWolf
Device Model:     ST2000VN004-2E4164
Serial Number:    Z526PL3S
LU WWN Device Id: 5 000c50 0b30b3fd8
Firmware Version: SC60
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5900 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2, ACS-3 T13/2161-D revision 3b
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Sun Dec 23 19:15:11 2018 EST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (  107) seconds.
Offline data collection
capabilities:                    (0x73) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        No Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 263) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x10bd) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   117   099   006    Pre-fail  Always       -       147526752
  3 Spin_Up_Time            0x0003   098   096   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   093   093   020    Old_age   Always       -       7860
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   063   060   030    Pre-fail  Always       -       2466895
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       696
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       30
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   099   099   000    Old_age   Always       -       1
190 Airflow_Temperature_Cel 0x0022   068   060   045    Old_age   Always       -       32 (Min/Max 29/32)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       0
193 Load_Cycle_Count        0x0032   083   083   000    Old_age   Always       -       35796
194 Temperature_Celsius     0x0022   032   040   000    Old_age   Always       -       32 (0 19 0 0 0)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Interrupted (host reset)      70%       676         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

smartctl 6.6 2017-11-05 r4594 [FreeBSD 11.1-STABLE amd64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate IronWolf
Device Model:     ST2000VN004-2E4164
Serial Number:    Z526WWJK
LU WWN Device Id: 5 000c50 0b33adb41
Firmware Version: SC60
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5900 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2, ACS-3 T13/2161-D revision 3b
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Sun Dec 23 19:15:11 2018 EST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (  107) seconds.
Offline data collection
capabilities:                    (0x73) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        No Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 262) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x10bd) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   117   099   006    Pre-fail  Always       -       159762792
  3 Spin_Up_Time            0x0003   098   096   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   093   093   020    Old_age   Always       -       7850
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   063   060   030    Pre-fail  Always       -       2554137
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       696
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       30
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   097   097   000    Old_age   Always       -       3
190 Airflow_Temperature_Cel 0x0022   070   063   045    Old_age   Always       -       30 (Min/Max 27/30)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       0
193 Load_Cycle_Count        0x0032   083   083   000    Old_age   Always       -       35819
194 Temperature_Celsius     0x0022   030   040   000    Old_age   Always       -       30 (0 19 0 0 0)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Interrupted (host reset)      70%       676         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

smartctl 6.6 2017-11-05 r4594 [FreeBSD 11.1-STABLE amd64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

/dev/da0: Unknown USB bridge [0x13fe:0x5500 (0x110)]
Please specify device type with the -d option.

Use smartctl -h to get a usage summary

 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
OK, nothing obviously failing there, though a couple of issues:
  • The load cycle count is already really high--not sure how you'd address that with a Seagate drive though (with a WD, you'd want to look around for WDIDLE, but I don't expect it will work with a Seagate).
  • This disk has never completed a SMART self-test--you should have these running on a regular schedule. Short tests should run at least every few days; long tests at least every few weeks.
The motherboard isn't at all something we'd recommend (though at least it has Intel NICs), but as long as the disks aren't running through a Marvell controller chip I wouldn't expect it to be the source of the problem. Are the disks connected directly to the motherboard?
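For anyone who wants to repeat that check on the other disks without reading every table by eye, here is a minimal Python sketch that pulls the raw values out of pasted `smartctl` output. The watch list of attribute IDs and the helper names (`parse_attributes`, `has_completed_self_test`) are assumptions made for illustration, not part of smartmontools; the sample lines are copied from the output above.

```python
import re

# Attribute IDs often treated as warning signs when their raw value is non-zero.
# This watch list is an assumption for illustration, not an official rule.
WATCH_IDS = {5, 187, 197, 198, 199}

# One row of the "Vendor Specific SMART Attributes" table:
# ID, name, flag, value, worst, thresh, type, updated, when_failed, raw
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]+\s+(\d+)\s+(\d+)\s+(\d+)"
    r"\s+(\S+)\s+(\S+)\s+(\S+)\s+(\d+)"
)


def parse_attributes(text):
    """Return {id: {"name", "value", "worst", "thresh", "raw"}} from smartctl text."""
    attrs = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            attrs[int(m.group(1))] = {
                "name": m.group(2),
                "value": int(m.group(3)),
                "worst": int(m.group(4)),
                "thresh": int(m.group(5)),
                "raw": int(m.group(9)),  # leading integer of the raw column
            }
    return attrs


def has_completed_self_test(text):
    """True if any self-test log entry reports a completed test."""
    return any(
        line.startswith("#") and "Completed" in line
        for line in text.splitlines()
    )


SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
193 Load_Cycle_Count        0x0032   083   083   000    Old_age   Always       -       35796
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
# 1  Extended offline    Interrupted (host reset)      70%       676         -
"""

attrs = parse_attributes(SAMPLE)
flagged = [a["name"] for i, a in attrs.items() if i in WATCH_IDS and a["raw"] > 0]
print("flagged attributes:", flagged)
print("any completed self-test:", has_completed_self_test(SAMPLE))
```

On the sample lines nothing on the watch list is non-zero and no self-test entry is marked completed, which matches the two observations above.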
 

matthewviking1

Dabbler
Joined
Dec 22, 2018
Messages
10
This is a learning project for me, so far all I’ve been doing is setup and troubleshooting issues. This one so far has stumped me.
All 4 disks are connected directly to the motherboard.
 

matthewviking1

Dabbler
Joined
Dec 22, 2018
Messages
10
I was able to run a memtest; results came back fine, no errors. I was also able to run smartctl long tests on all the drives, and they came back with no errors. However, the system reset about 9 minutes ago. I saw a message in the log at the bottom of the GUI saying something like "configuration reset requested", then I logged back in, checked the uptime, and the system had rebooted.
Not sure what else I can do! This is getting very frustrating :(
 

Apollo

Wizard
Joined
Jun 13, 2013
Messages
1,458
Memtest is unfortunately the scapegoat in that regard. I think most people believe Memtest is the way to go for validating RAM, and it might be, but I think Memtest only shows failures in areas where the memory is pushed beyond its limits, and it has no means of sanity-checking the data, which ECC RAM obviously does.
 

Redcoat

MVP
Joined
Feb 18, 2014
Messages
2,925

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
So let's take a step back and regroup so we are all on the same page, and then I will provide my advice...

1. You have a system running FreeNAS 11.1-U6.
2. You have 8GB of RAM and four 2TB IronWolf drives.
3. Your system appears to reboot randomly.

Questions:
1. Do you have any jails set up? If yes, what are they?
2. What make/model of USB flash drive are you using?
3. Back up any data you have. Odds are the data could be wiped out or corrupted before we fix this.

What I'd recommend you do:
1. Run MemTest86 on your system for 2 solid days non-stop.
2. Run a CPU Burn-In test on the system for 3 hours. Odds are if this is an issue I think it would fail within the first hour.
3. Continue only if these first two tests pass. If they do not pass, you have a hardware issue that must be fixed.
4. Re-install FreeNAS 11.1-U4 on a clean USB Flash Drive (I say U4 because I've heard of issues for a few people with U6). Do not install jails. Set up standard routines such as SMART testing, your internet, etc... Create shares but just leave the jails alone. Run this configuration for a while and see if the problems continue. If they stop then maybe the problem was a jail or maybe U6. You will need to figure that out.
5. If the problems persist with U4 then you are going to need to test the hard drive interface I think.

What bugs me:
Something is bugging me: your load cycle counts are very high. I own 6TB IronWolf drives that have been running for 9030 hours and have a load cycle count of 505, and all of those are power-off related. Something is up with yours. You didn't set up values to try to sleep the drives, did you? You should look into this.
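To put rough numbers on that comparison, here is a small Python sketch that divides Load_Cycle_Count by Power_On_Hours for both drives. The figures come from the smartctl output earlier in the thread (35796 cycles in 696 hours) and from the 6TB drive described above (505 cycles in 9030 hours). The function name and the drive labels are made up for illustration.

```python
def load_cycles_per_hour(load_cycle_count, power_on_hours):
    """Average number of head load/unload cycles per power-on hour."""
    return load_cycle_count / power_on_hours


# (Load_Cycle_Count, Power_On_Hours) pairs taken from the thread.
drives = {
    "2TB IronWolf (this thread)": (35796, 696),
    "6TB IronWolf (comparison)": (505, 9030),
}

for label, (cycles, hours) in drives.items():
    rate = load_cycles_per_hour(cycles, hours)
    print(f"{label}: {rate:.2f} load cycles per power-on hour")
```

That works out to roughly 51 cycles per hour on the 2TB drive against well under one per hour on the 6TB drive, which is why an idle or sleep setting is worth ruling out.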

Got to run, the family is calling me to open Christmas Gifts. Good Luck!
 

matthewviking1

Dabbler
Joined
Dec 22, 2018
Messages
10
Joe, thanks for the advice. I will work on these items this coming weekend, as I have to work this week.
So far, what I have done:
Deleted all the jails and the shares that were causing the issues at first. The only shares currently online are some test shares that have no data. All my data is backed up and safe.
1 memtest, took about 2 hours, no errors.
smartctl on all drives, no errors.
Updated the BIOS to the latest version.
So far the system has been running for 22 hours and 48 minutes with no reboots. I am hopeful!
I'll run through your suggested tests this weekend and see what happens! I'll update you with my progress as I work through the items listed. Thanks!!!
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
I hope you find that your system is stable (that is, memtest and the CPU stress test both pass extended runs). If FreeNAS 11.1-U4 seems to be running fine, I'd stick with it for at least 1 month just to get some real time on it. After that, you could jump up to 11.1-U6 and see if anything bad happens; again, give it at least 1 month. Intermittent problems can take a long time to isolate. With any luck the problem just never shows up again, and then maybe you could say that it must have been a corrupt installation.

Keep us updated on the results.

Have a Happy New Year!
 