Degraded Pool

jspfunk

Dabbler
Joined
Oct 26, 2017
Messages
48
Sorry, I'm a noob with FreeNAS. I have a brand-new pool that is showing DEGRADED, and I'm not sure which drive is bad: ada1, ada2, or ada3. I did try running smartctl -t on /dev/ada1 through /dev/ada3, but it did not return any results. The screenshot is the only output it returned (after about an hour of waiting for all 3).

[Screenshot: output returned by the smartctl -t commands]


Here is what my pool status shows:
[Screenshot: pool status]

Is it possible that the drives have already gone bad after just a couple of weeks?

FreeNAS-11.2-STABLE
CPU: AMD Phenom 840 X4
Motherboard: Asus M4A88TD-V EVO/USB3
RAM: 12 GB
Hard drives:
WD80EMAZ 8 TB (x2)
Seagate ST8000DM004 8 TB
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
The smartctl -t command only starts the test; it doesn't report the results. Once the test has finished, run smartctl -a /dev/ada1 and review the output. Ideally, run this command from a remote shell (enable SSH in the Services panel; if you need a Windows client, try PuTTY) so that you can paste the text here inside [ CODE ] blocks.
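For reference, here is a minimal Python sketch of how the "Self-test execution status" line in the smartctl -a output can be read, for example to tell whether a test started with smartctl -t is still running. It assumes the usual ATA layout of that byte (upper four bits = status code, lower four bits = percent remaining in tens); the function name and the short status table are illustrative only and cover just a few codes.

```python
from __future__ import annotations

# Only a few status codes are listed; anything else is reported generically.
STATUS_CODES = {
    0: "completed without error (or never run)",
    1: "aborted by host",
    2: "interrupted by host reset",
    15: "in progress",
}


def decode_selftest_status(value: int) -> tuple[str, int]:
    """Decode the byte shown as "Self-test execution status: ( N)".

    Returns (status description, percent of the test still remaining).
    """
    if not 0 <= value <= 0xFF:
        raise ValueError("status must fit in one byte")
    code = value >> 4                # upper nibble: status code
    remaining = (value & 0x0F) * 10  # lower nibble: tens of percent left
    return STATUS_CODES.get(code, f"other status code {code}"), remaining


if __name__ == "__main__":
    # 245 == 0xF5: the value one of the WD drives reports later in this thread
    print(decode_selftest_status(245))  # ('in progress', 50)
```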

Given the zpool status, though, you seem to have two degraded drives in a RAIDZ1. Data corruption is possible at this point, but you may have gotten lucky: the blocks with checksum errors on ada1 might be reconstructable from ada2+3, and similarly the ada2 errors might be covered by ada1+3.
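A simplified way to picture why this can go either way: RAIDZ1's single parity works like an XOR across the columns of each stripe, so any one bad column can be rebuilt from the others, but two bad columns in the same stripe cannot. The Python sketch below models a 3-disk stripe (two data columns plus parity); real RAIDZ uses variable-width stripes and relies on ZFS checksums to tell which column is bad, so treat this as an illustration of the parity arithmetic only. The function names are hypothetical.

```python
from __future__ import annotations

from functools import reduce
from typing import Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length columns."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(columns: list[bytes]) -> bytes:
    """Single parity (simplified RAIDZ1 model): XOR of all data columns."""
    return reduce(xor_bytes, columns)


def reconstruct(stripe: list[Optional[bytes]]) -> list[bytes]:
    """Rebuild a stripe (data columns + parity); None marks a bad column.

    Single parity can recover at most one bad column per stripe.
    """
    missing = [i for i, col in enumerate(stripe) if col is None]
    if len(missing) > 1:
        raise ValueError("two or more bad columns in one stripe: unrecoverable")
    rebuilt = list(stripe)
    if missing:
        # The lost column is the XOR of every surviving column.
        survivors = [col for col in stripe if col is not None]
        rebuilt[missing[0]] = reduce(xor_bytes, survivors)
    return rebuilt


if __name__ == "__main__":
    d1, d2 = b"ZFS!", b"data"
    parity = make_parity([d1, d2])
    print(reconstruct([None, d2, parity])[0])  # b'ZFS!'
    try:
        reconstruct([None, None, parity])
    except ValueError as err:
        print(err)
```

In this model, errors on ada1 and on ada2 are survivable as long as they never land in the same stripe; any stripe where both disks are bad is lost.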

ST8000DM004
That's a shingled (SMR) drive. Is this ada1, by any chance? Either way, I strongly suggest shucking another WD 8 TB external and using it as the replacement.
 

jspfunk

Thank you, @HoneyBadger! I'm not sure what happened, but after a restart, all my folders and files are gone from that pool. Yes, the Seagate is ada1.

Here is the output of the -a command for ada1:

Code:
# smartctl -a /dev/ada1
smartctl 6.6 2017-11-05 r4594 [FreeBSD 11.2-STABLE amd64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     ST8000DM004-2CX188
Serial Number:    ZCT0XWQQ
LU WWN Device Id: 5 000c50 0b5ea9243
Firmware Version: 0001
User Capacity:    8,001,563,222,016 bytes [8.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5425 rpm
Form Factor:      3.5 inches
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-3 T13/2161-D revision 5
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Thu Sep  5 15:07:32 2019 PDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (    0) seconds.
Offline data collection
capabilities:                    (0x73) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        No Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   1) minutes.
Extended self-test routine
recommended polling time:        ( 984) minutes.
Conveyance self-test routine
recommended polling time:        (   2) minutes.
SCT capabilities:              (0x30a5) SCT Status supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   100   064   006    Pre-fail  Always       -       493632
  3 Spin_Up_Time            0x0003   097   093   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   099   099   020    Old_age   Always       -       1903
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   075   060   045    Pre-fail  Always       -       33720227
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       745 (91 187 0)
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   099   099   020    Old_age   Always       -       1903
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   068   064   040    Old_age   Always       -       32 (Min/Max 28/32)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       1920
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       1928
194 Temperature_Celsius     0x0022   032   040   000    Old_age   Always       -       32 (0 25 0 0 0)
195 Hardware_ECC_Recovered  0x001a   100   064   000    Old_age   Always       -       493632
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       726 (100 243 0)
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       6147050885
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       7043643213

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%       743         -
# 2  Short offline       Completed without error       00%       719         -
# 3  Extended offline    Interrupted (host reset)      00%       719         -

SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):


Did I need to post for each drive or were you just wanting ada1?
 
Joined
Oct 18, 2018
Messages
969
Did I need to post for each drive or were you just wanting ada1?
It would be good to post results for at least the drives showing as degraded. For readability, you can also wrap the output in [CODE] and [/CODE] tags.

I'm curious: did you set up your pool entirely through the GUI?
 

jspfunk

Again, I'm sorry. I'm not sure what you mean by surrounding the output with tags. I looked through the familybrown.org/dokuwiki/doku.php?id=fester:hvalid_hdd page, but I didn't see any "tags" there to copy. I have the output for ada3; do you know a source I could look at to learn about tags?

Yes, I set the pool up with the GUI. I know enough about this to be dangerous, as you can tell. Most of what I know is from YouTube.

Also, this is the new error I'm getting:

Pool Plex2 state is UNAVAIL: One or more devices are faulted in response to IO failures.
Fri, 06 Sep 2019 07:24:09 GMT
[Screenshot]


I have not removed any drive!
 

jspfunk

Code:
smartctl -a /dev/ada3
smartctl 6.6 2017-11-05 r4594 [FreeBSD 11.2-STABLE amd64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     WDC WD80EMAZ-00WJTA0
Serial Number:    7SH3VAWD
LU WWN Device Id: 5 000cca 252cfd789
Firmware Version: 83.H0A83
User Capacity:    8,001,563,222,016 bytes [8.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Form Factor:      3.5 inches
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-2, ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Fri Sep  6 06:01:45 2019 PDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (   93) seconds.
Offline data collection
capabilities:                    (0x5b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (1049) minutes.
SCT capabilities:              (0x003d) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0004   129   129   054    Old_age   Offline      -       112
  3 Spin_Up_Time            0x0007   253   253   024    Pre-fail  Always       -       162 (Average 120)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       2324
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000a   100   100   067    Old_age   Always       -       0
  8 Seek_Time_Performance   0x0004   128   128   020    Old_age   Offline      -       18
  9 Power_On_Hours          0x0012   099   099   000    Old_age   Always       -       10624
 10 Spin_Retry_Count        0x0012   100   100   060    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       39
 22 Unknown_Attribute       0x0023   100   100   025    Pre-fail  Always       -       100
192 Power-Off_Retract_Count 0x0032   086   086   000    Old_age   Always       -       17377
193 Load_Cycle_Count        0x0012   086   086   000    Old_age   Always       -       17377
194 Temperature_Celsius     0x0002   216   216   000    Old_age   Always       -       30 (Min/Max 21/50)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     10609         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):


I can't seem to run -a or -t on ada2.
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
The drive known as ada1 is in very bad shape with almost half a million read and approaching 34 million seek errors. This drive is destined for the garbage.

It looks like ada2 may be heading the same way.

ada3 looks OK from that output, but since your pool is only RAIDZ1, having two disks out means your data is lost (I hope you have a backup of anything important).
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
The drive known as ada1 is in very bad shape with almost half a million read and approaching 34 million seek errors.
Keep in mind, the disk is a Seagate, so the reported values are a "rate" and not a simple count. I think someone else has described how Seagate works those numbers, but it's not something I've paid much attention to.
I can't seem to run the -a or -t for ada2.
What happens when you try?
 

sretalla

the disk is a Seagate, so the reported values are a "rate" and not a simple count
Thanks for the pointer. I guess it might be errors per year, but any way you look at it, that's a lot of errors on the way.
 

danb35


sretalla

OK, I get it... although it's an incredibly complicated way to store two numbers in one number, which makes it very difficult to tell what either of those values actually is.
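One way to pull the two numbers apart, based on the commonly cited community interpretation of Seagate's raw values (not an official Seagate specification): the 48-bit raw value of Raw_Read_Error_Rate and Seek_Error_Rate is read as an error count in the upper 16 bits and an operation count in the lower 32 bits. The Python sketch below applies that assumption; the function name is hypothetical, and it accepts either the decimal value from the default output or a hex string.

```python
from __future__ import annotations


def split_seagate_rate(raw: int | str) -> tuple[int, int]:
    """Split a Seagate 48-bit raw value into (errors, operations).

    ASSUMPTION: upper 16 bits = error count, lower 32 bits = number of
    operations (a community interpretation, not a published Seagate spec).
    """
    # Accept a decimal int or a string such as "493632" or "0x000000078840".
    value = int(raw, 0) if isinstance(raw, str) else raw
    if not 0 <= value < 1 << 48:
        raise ValueError("raw value must fit in 48 bits")
    return value >> 32, value & 0xFFFFFFFF


if __name__ == "__main__":
    # Raw values from the ada1 (ST8000DM004) output earlier in this thread
    print(split_seagate_rate(493632))    # (0, 493632)   Raw_Read_Error_Rate
    print(split_seagate_rate(33720227))  # (0, 33720227) Seek_Error_Rate
```

Under that assumption, both ada1 values are below 2^32, so they would mean zero counted errors over roughly 0.5 million reads and 33.7 million seeks, rather than that many errors. This fits danb35's point that the values are rates, but it says nothing about the ZFS checksum errors in the pool.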

I'm going to stick with my habit of avoiding Seagate drives even more firmly now.
 

jspfunk

Thanks, everyone, for helping me.
@danb35 Here is what I get when I run the -a command:
Code:
# smartctl -a /dev/ada2
smartctl 6.6 2017-11-05 r4594 [FreeBSD 11.2-STABLE amd64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

/dev/ada2: Unable to detect device type
Please specify device type with the -d option.

Use smartctl -h to get a usage summary


Could it be the SATA ports on my motherboard? It's a really old board. I'm not sure why it says removed. It would suck if both drives were bad, as they are so new.
 

sretalla

As far as I can tell, it may help to see the smartctl output with the format switch set to hex, which makes it easier to separate errors from seeks in the Seagate attributes: smartctl -a -f hex /dev/ada1
 

jspfunk

I'm not sure if it's the same issue, but I'm getting this error, repeated, when I try to shut down the server.
Code:
rrdcache plugin: Faild to connect to RRDCached at unix:/var/run/rrdcached.sock: 
....
shut down terminated
Init: some processes would not die: ps axl advised
 

jspfunk

Here is what I get back from -a on ada1; it looks like ada2 is now ada1. So both WD drives look all right, if I'm reading this correctly.
Code:
~ # smartctl -a /dev/ada1
smartctl 6.6 2017-11-05 r4594 [FreeBSD 11.2-STABLE amd64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     WDC WD80EMAZ-00WJTA0
Serial Number:    7HKJ05VF
LU WWN Device Id: 5 000cca 257f18d28
Firmware Version: 83.H0A83
User Capacity:    8,001,563,222,016 bytes [8.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Form Factor:      3.5 inches
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-2, ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Fri Sep  6 15:46:02 2019 PDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x80) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      ( 245) Self-test routine in progress...
                                        50% of test remaining.
Total time to complete Offline
data collection:                (   93) seconds.
Offline data collection
capabilities:                    (0x5b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (1029) minutes.
SCT capabilities:              (0x003d) SCT Status supported.
                                        SCT Error Recovery Control supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0004   129   129   054    Old_age   Offline      -       112
  3 Spin_Up_Time            0x0007   253   253   024    Pre-fail  Always       -       286 (Average 48)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       325
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000a   100   100   067    Old_age   Always       -       0
  8 Seek_Time_Performance   0x0004   128   128   020    Old_age   Offline      -       18
  9 Power_On_Hours          0x0012   100   100   000    Old_age   Always       -       769
 10 Spin_Retry_Count        0x0012   100   100   060    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       325
22 Unknown_Attribute       0x0023   100   100   025    Pre-fail  Always       -       100
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       839
193 Load_Cycle_Count        0x0012   100   100   000    Old_age   Always       -       839
194 Temperature_Celsius     0x0002   216   216   000    Old_age   Always       -       30 (Min/Max 24/35)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%       720         -
# 2  Short offline       Completed without error       00%       720         -

SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
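Because FreeBSD device names such as ada1 and ada2 can shift when a disk drops out or the system reboots (as happened here, where ada2 became ada1), it is safer to track disks by serial number. A minimal Python sketch, assuming you have saved the text of smartctl -a for each device, that pulls the model and serial out of the information section; the helper name is hypothetical.

```python
from __future__ import annotations

import re


def drive_identity(smartctl_output: str) -> dict[str, str]:
    """Pull "Device Model" and "Serial Number" from `smartctl -a` text."""
    fields: dict[str, str] = {}
    for key in ("Device Model", "Serial Number"):
        # Matches lines such as "Serial Number:    7HKJ05VF"
        match = re.search(rf"^{key}:\s+(.+?)\s*$", smartctl_output, re.MULTILINE)
        if match:
            fields[key] = match.group(1)
    return fields


if __name__ == "__main__":
    sample = (
        "Device Model:     WDC WD80EMAZ-00WJTA0\n"
        "Serial Number:    7HKJ05VF\n"
    )
    print(drive_identity(sample))
```

Noting the serial (for example 7SH3VAWD vs. 7HKJ05VF for the two WD drives in this thread) makes it unambiguous which physical disk is which, whatever adaN name it gets after a reboot.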

 

sretalla

The data you shared for both WD drives shows no areas of concern (other than that only short tests have been recorded; there's no long test to confirm their health).
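To check for a completed long test without reading the whole report, here is a minimal Python sketch that scans the self-test log section of smartctl -a output for an "Extended offline" entry with the status "Completed without error". A long test is started with smartctl -t long /dev/adaN and, per the polling times in the outputs above, needs roughly 1000 minutes on these 8 TB drives. The function name is hypothetical, and the matching is plain text, so it assumes smartctl's English output format as shown in this thread.

```python
def has_passed_extended_test(smartctl_output: str) -> bool:
    """True if the SMART self-test log shows an extended test that completed cleanly."""
    for line in smartctl_output.splitlines():
        # Log entries look like:
        # "# 3  Extended offline    Interrupted (host reset)      00%       719         -"
        entry = line.lstrip()
        if entry.startswith("#") and "Extended offline" in entry:
            if "Completed without error" in entry:
                return True
    return False


if __name__ == "__main__":
    seagate_log = (
        "# 1  Short offline       Completed without error       00%       743         -\n"
        "# 3  Extended offline    Interrupted (host reset)      00%       719         -\n"
    )
    print(has_passed_extended_test(seagate_log))  # False
```

Against the logs posted in this thread (an interrupted extended test on the Seagate, short tests only on the WD drives), this returns False for all three drives.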
 

jspfunk

I'm not sure if this needs to go in a new post, but I put in my new drive (at about 7, Central Standard Time) and it started resilvering. It's at 1% after 24 hours. What should I do?
 

sretalla

You can look at dmesg to see whether you are getting read errors, but other than that, "wait longer" is the answer.

If you post the output from zpool status -v, maybe something in it will indicate what's going on.
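For a rough sense of scale, here is a naive linear extrapolation of the rate reported above (1% after 24 hours). Resilver speed is rarely constant, so this is only a back-of-the-envelope Python sketch; the function name is hypothetical.

```python
def naive_resilver_eta_hours(percent_done: float, hours_elapsed: float) -> float:
    """Hours remaining if progress continued at the same average rate."""
    if not 0 < percent_done <= 100:
        raise ValueError("percent_done must be in (0, 100]")
    if hours_elapsed <= 0:
        raise ValueError("hours_elapsed must be positive")
    rate = percent_done / hours_elapsed  # percent per hour
    return (100 - percent_done) / rate


if __name__ == "__main__":
    hours = naive_resilver_eta_hours(1, 24)
    print(f"{hours:.0f} hours (~{hours / 24:.0f} days)")  # 2376 hours (~99 days)
```

At that pace the resilver would take over three months, which is why checking dmesg for read errors is worthwhile before simply waiting.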
 

jspfunk

Here is the zpool status -v output:

Code:
dev> ...
        remove [-nps] <pool> <device> ...

        labelclear [-f] <vdev>

        checkpoint [--discard] <pool> ...

        list [-Hpv] [-o property[,...]] [-T d|u] [pool] ... [interval [count]]
        iostat [-v] [-T d|u] [pool] ... [interval [count]]
        status [-vx] [-T d|u] [pool] ... [interval [count]]

        online [-e] <pool> <device> ...
        offline [-t] <pool> <device> ...
        clear [-nF] <pool> [device]
        reopen <pool>

        attach [-f] <pool> <device> <new-device>
        detach <pool> <device>
        replace [-f] <pool> <device> [new-device]
        split [-n] [-R altroot] [-o mntopts]
            [-o property=value] <pool> <newpool> [<device> ...]

        scrub [-s | -p] <pool> ...

        import [-d dir] [-D]
        import [-o mntopts] [-o property=value] ...
            [-d dir | -c cachefile] [-D] [-f] [-m] [-N] [-R root] [-F [-n]] -a
        import [-o mntopts] [-o property=value] ...
            [-d dir | -c cachefile] [-D] [-f] [-m] [-N] [-R root] [-F [-n]] [-t]
            [--rewind-to-checkpoint] <pool | id> [newpool]
        export [-f] <pool> ...
        upgrade [-v]
        upgrade [-V version] <-a | pool ...>
        reguid <pool>

        history [-il] [<pool>] ...
        get [-Hp] [-o "all" | field[,...]] <"all" | property[,...]> <pool> ...
        set <property=value> <pool>
 