Degraded pool, 2 of 4 drives missing

·HeRmAN· · Mar 1, 2020

Hi

I'm a newbie to FreeNAS and have run into some problems with my setup.

The setup is as follows:
HP ProLiant MicroServer Gen8
16GB ECC unbuffered RAM
4x Seagate Barracuda Compute 2TB ST2000DM008-2FR102, brand new, in a RAIDZ2 pool, all drives in one vdev
Motherboard, disk controller and NIC are stock HP; nothing has been changed.

Both zpool status and the system alerts say:

Pool P1 state is DEGRADED: One or more devices has been removed by the administrator. Sufficient replicas exist for the pool to continue functioning in a degraded state.
Sun, 1 Mar 2020 04:26:27 AM (Europe/Stockholm)
This happened while I was asleep this morning.

A few days earlier I also got a Critical alert:
Device: /dev/ada2, 8 Currently unreadable (pending) sectors.
Tue, 25 Feb 2020 12:16:01 PM (Europe/Stockholm)

Code:

Last login: Thu Feb 27 18:09:43 on pts/2
FreeBSD 11.3-RELEASE-p5 (FreeNAS.amd64) #0 r325575+8ed1cd24b60(HEAD): Mon Jan 27 18:07:23 UTC 2020

        FreeNAS (c) 2009-2020, The FreeNAS Development Team
        All rights reserved.
        FreeNAS is released under the modified BSD license.

        For more information, documentation, help or support, go here:
        http://freenas.org
Welcome to HeRmAN's FreeNAS S1

Warning: settings changed through the CLI are not written to
the configuration database and will be reset on reboot.

root@S1[~]# zpool status
  pool: P1
state: DEGRADED
status: One or more devices has been removed by the administrator.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Online the device using 'zpool online' or replace the device with
        'zpool replace'.
  scan: resilvered 26.5M in 0 days 00:00:03 with 0 errors on Sun Mar  1 15:43:44 2020
config:

        NAME                                            STATE     READ WRITE CKSUM
        P1                                              DEGRADED     0     0     0
          raidz2-0                                      DEGRADED     0     0     0
            gptid/177edd20-4062-11ea-9994-94188238f940  ONLINE       0     0     0
            gptid/1874a33c-4062-11ea-9994-94188238f940  ONLINE       0     0     0
            14819823119617628312                        REMOVED      0     0     0  was /dev/gptid/194f9d5e-4062-11ea-9994-94188238f940
            3160331546298106103                         REMOVED      0     0     0  was /dev/gptid/1a4869cf-4062-11ea-9994-94188238f940

errors: No known data errors

  pool: freenas-boot
state: ONLINE
  scan: scrub repaired 0 in 0 days 00:00:14 with 0 errors on Thu Feb 27 12:45:14 2020
config:

If I need to provide more information, please let me know how to find it.

I'm new to FreeNAS but have some knowledge of Linux (which I know isn't the same as FreeNAS or BSD), and I'm slowly learning.
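When reading `zpool status` output like the one above, the rows to look at are those whose STATE is not ONLINE; ZFS also prints the old path (`was /dev/gptid/...`) for a device it can no longer see. A minimal Python sketch that pulls those rows out of saved output; the function name `unhealthy_devices` is hypothetical, and the set of states it recognises is an assumption covering the common vdev states.

```python
import re

# One device row of the `zpool status` config table: NAME, STATE, READ,
# WRITE, CKSUM, optionally followed by "was /dev/..." for a missing device.
# Leading whitespace is optional so flattened copies of the table also match.
ROW = re.compile(
    r"^\s*(?P<name>\S+)\s+"
    r"(?P<state>ONLINE|DEGRADED|FAULTED|OFFLINE|REMOVED|UNAVAIL)"
    r"\s+\S+\s+\S+\s+\S+"            # READ, WRITE and CKSUM counters
    r"(?:\s+was\s+(?P<was>\S+))?"    # previous path, only shown for lost devices
)


def unhealthy_devices(status_text):
    """Return (name, state, previous_path) for every row that is not ONLINE.

    Pool and vdev rows are included too, so a DEGRADED pool shows up
    alongside the REMOVED disks that caused it.
    """
    problems = []
    for line in status_text.splitlines():
        match = ROW.match(line)
        if match and match.group("state") != "ONLINE":
            problems.append(
                (match.group("name"), match.group("state"), match.group("was"))
            )
    return problems
```

After `zpool status > status.txt`, calling `unhealthy_devices(open("status.txt").read())` on the output in this post would list the pool, its raidz2-0 vdev and the two REMOVED disks with their old gptid paths.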

·HeRmAN· · Mar 1, 2020

I forgot to mention that the boot pool is a 256GB SSD, ada4.
The P1 pool consists of ada0, ada1, ada2 and ada3, and it's ada2 and ada3 that are missing now.

Code:

root@S1[/dev]# ls -la | grep ada
crw-r-----   1 root  operator  0x71 Mar  1 15:42 ada0
crw-r-----   1 root  operator  0x78 Mar  1 15:42 ada0p1
crw-r-----   1 root  operator  0x79 Mar  1 15:42 ada0p2
crw-r-----   1 root  operator  0x7a Mar  1 15:42 ada1
crw-r-----   1 root  operator  0x80 Mar  1 15:42 ada1p1
crw-r-----   1 root  operator  0x81 Mar  1 15:42 ada1p2
crw-r-----   1 root  operator  0x7d Mar  1 15:42 ada4
crw-r-----   1 root  operator  0x86 Mar  1 15:42 ada4p1
crw-r-----   1 root  operator  0x87 Mar  1 15:42 ada4p2
lrwxr-xr-x   1 root  wheel       11 Mar  1 15:42 dumpdev -> /dev/ada3p1
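Cross-checking a listing like this against the disks the pool is supposed to have makes the gap obvious. A minimal sketch, assuming P1 should consist of ada0 to ada3 as stated above; the helper name `missing_disks` is hypothetical, and on FreeBSD `camcontrol devlist` lists the attached disks directly if you would rather not parse `ls`.

```python
import re

# Whole-disk nodes such as ada0; partitions such as ada0p1 are skipped.
WHOLE_DISK = re.compile(r"^ada\d+$")


def missing_disks(ls_output, expected):
    """Return the expected disks that have no device node in `ls -la /dev` output."""
    present = set()
    for line in ls_output.splitlines():
        fields = line.split()
        # Device rows start with "c" (character device); the name is the last
        # column. Symlinks such as dumpdev start with "l" and are ignored.
        if fields and fields[0].startswith("c") and WHOLE_DISK.match(fields[-1]):
            present.add(fields[-1])
    return sorted(set(expected) - present)
```

Fed the listing above with `expected=["ada0", "ada1", "ada2", "ada3"]`, it returns `["ada2", "ada3"]`.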

·HeRmAN· · Mar 20, 2020

It's been a long time with no answers or help.

I replaced the 2 missing drives with 2 brand-new identical drives.
So far so good: resilvering took many hours, but the pool finally became healthy and I removed the old drives in the UI.

Overview

Platform: Generic

Version: FreeNAS-11.3-U1

FreeBSD 11.3-RELEASE-p6 (FreeNAS.amd64) #0 r325575+d5b100edfcb(HEAD): Fri Feb 21 18:53:26 UTC 2020

zpool status
  pool: P1
 state: ONLINE
status: Some supported features are not enabled on the pool. The pool can
        still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
        the pool may no longer be accessible by software that does not support
        the features. See zpool-features(7) for details.
  scan: resilvered 1.83T in 1 days 04:08:15 with 0 errors on Sat Mar 14 03:00:43 2020
config:

        NAME                                            STATE     READ WRITE CKSUM
        P1                                              ONLINE       0     0     0
          raidz2-0                                      ONLINE       0     0     0
            gptid/177edd20-4062-11ea-9994-94188238f940  ONLINE       0     0     0
            gptid/1874a33c-4062-11ea-9994-94188238f940  ONLINE       0     0     0
            gptid/c50f07f7-64aa-11ea-939f-94188238f940  ONLINE       0     0     0
            gptid/ba8dc60f-64ab-11ea-939f-94188238f940  ONLINE       0     0     0

errors: No known data errors

  pool: freenas-boot
 state: ONLINE
  scan: scrub repaired 0 in 0 days 00:00:21 with 0 errors on Thu Mar 19 03:45:21 2020
config:

        NAME          STATE     READ WRITE CKSUM
        freenas-boot  ONLINE       0     0     0
          ada4p2      ONLINE       0     0     0

errors: No known data errors
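The `scan:` line for P1 above gives enough to estimate the average resilver rate. A back-of-the-envelope Python sketch, assuming zpool's `T` suffix means TiB (2^40 bytes) and accepting that 1.83 is already rounded:

```python
# Figures copied from the scan line: "resilvered 1.83T in 1 days 04:08:15".
# Assumption: zpool's "T" suffix is binary (TiB); 1.83 is a rounded figure.
resilvered_bytes = 1.83 * 1024**4
elapsed_s = 1 * 86400 + 4 * 3600 + 8 * 60 + 15   # 1 day, 4 h, 8 min, 15 s

rate_mib_s = resilvered_bytes / elapsed_s / 1024**2
print(f"{elapsed_s} s, about {rate_mib_s:.1f} MiB/s")
```

That works out to roughly 19 MiB/s averaged over the whole resilver.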

Now one of the brand-new drives has raised critical alerts:

CRITICAL
Device: /dev/ada3, 8 Currently unreadable (pending) sectors.
Fri, 20 Mar 2020 10:32:15 AM (Europe/Stockholm)

CRITICAL
Device: /dev/ada3, 8 Offline uncorrectable sectors.
Fri, 20 Mar 2020 10:32:16 AM (Europe/Stockholm)

INFO
Scrub of pool 'freenas-boot' finished.
Thu, 19 Mar 2020 03:45:21 AM (Europe/Stockholm)

smartctl -A /dev/ada3
smartctl 7.0 2018-12-30 r4883 [FreeBSD 11.3-RELEASE-p6 amd64] (local build)
Copyright (C) 2002-18, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   061   060   006    Pre-fail  Always       -       18413888
  3 Spin_Up_Time            0x0003   100   100   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       1
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   075   060   045    Pre-fail  Always       -       35390853
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       182 (58 118 0)
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       1
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   064   062   040    Old_age   Always       -       36 (Min/Max 22/38)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       8
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       11
194 Temperature_Celsius     0x0022   036   040   000    Old_age   Always       -       36 (0 22 0 0 0)
195 Hardware_ECC_Recovered  0x001a   073   064   000    Old_age   Always       -       18413888
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       8
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       8
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       182 (71 16 0)
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       2217743002
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       478845

Any idea why this happens, and how to prevent it?
I'm trying to migrate from my very old HP MediaSmart EX495 Server, but this feels less secure.
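In `smartctl -A` output like the dump above, the pending and offline-uncorrectable counts are the lines that matter most. A minimal Python sketch that picks out a few watched attributes with a non-zero raw value; the watch list (IDs 5, 197 and 198) is an assumption, a common rule of thumb rather than a FreeNAS or smartmontools threshold, and `flagged_attributes` is a hypothetical name.

```python
# Assumption: a short watch list often used as a rule of thumb; it is not a
# FreeNAS or smartmontools definition of a failing drive.
WATCH = {
    5: "Reallocated_Sector_Ct",
    197: "Current_Pending_Sector",
    198: "Offline_Uncorrectable",
}


def flagged_attributes(smart_text, watch=WATCH):
    """Return {attribute_name: raw_value} for watched attributes with a non-zero raw value."""
    flagged = {}
    for line in smart_text.splitlines():
        fields = line.split()
        # Attribute rows: ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW...
        if len(fields) >= 10 and fields[0].isdigit():
            attr_id, raw = int(fields[0]), fields[9]
            if attr_id in watch and raw.isdigit() and int(raw) > 0:
                flagged[watch[attr_id]] = int(raw)
    return flagged
```

On the ada3 output above it returns `{'Current_Pending_Sector': 8, 'Offline_Uncorrectable': 8}`; Reallocated_Sector_Ct is still 0, so it is not reported.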

sretalla · Mar 20, 2020

I know Seagate drives use a strange double-byte system for their read and seek error rates (so those numbers may not actually be a problem), but these 2 look like the pertinent ones to me:
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       8
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       8

As far as I can see, it's just a bad drive (I guess both are the same model? Maybe even the same batch).
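The Seagate encoding mentioned above is commonly described like this: for Raw_Read_Error_Rate and Seek_Error_Rate, the 48-bit raw value packs an error count into the upper 16 bits and an operation count into the lower 32 bits. Seagate does not officially document this, so treat the layout as an assumption. A minimal Python sketch that decodes the two values from the ada3 output; `decode_seagate_rate` is a hypothetical name.

```python
def decode_seagate_rate(raw):
    """Split a Seagate 48-bit raw rate value into (error_count, operation_count).

    Assumption: the commonly cited (not officially documented) layout, with
    the error count in the upper 16 bits and the operation count in the
    lower 32 bits.
    """
    errors = (raw >> 32) & 0xFFFF
    operations = raw & 0xFFFFFFFF
    return errors, operations


for name, raw in [("Raw_Read_Error_Rate", 18413888), ("Seek_Error_Rate", 35390853)]:
    errors, ops = decode_seagate_rate(raw)
    print(f"{name}: {errors} errors in {ops} operations")
```

Both raw values are below 2^32, so under this reading both error counts are zero and the large numbers are just operation counters.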

Did you burn them in when setting up?

·HeRmAN· · Mar 20, 2020

sretalla said:
Did you burn them in when setting up?

No. Should I? And if so, where can I read how to do it?

It could be the same batch. Will check if I can see any manufacturing date on them.

tfran1990 · Mar 20, 2020

·HeRmAN· said:
Will check if I can see any manufacturing date on them

Look and see whether the serial numbers are close together.

·HeRmAN· said:
where can I read how to do it

Search the forum for "badblocks burn in".
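For orientation, a burn-in is usually a sequence of SMART self-tests around a destructive badblocks write pass. A minimal sketch that only prints such a sequence for one disk; the order and the 4096-byte block size are assumptions based on common practice rather than an official FreeNAS procedure, `burn_in_plan` is a hypothetical helper, and `/dev/adaX` is a placeholder. `badblocks -w` erases the whole disk, so it must only ever target a drive that is not part of a pool.

```python
def burn_in_plan(device, block_size=4096):
    """Return shell commands for one burn-in pass on an EMPTY disk.

    WARNING: the badblocks -w step overwrites every block on `device`.
    The sequence is an assumed common practice, not an official procedure.
    """
    return [
        f"smartctl -t short {device}",        # quick self-test
        f"smartctl -t conveyance {device}",   # not every drive supports this
        f"smartctl -t long {device}",         # full surface self-test
        f"badblocks -b {block_size} -ws {device}",  # destructive write/verify, shows progress
        f"smartctl -t long {device}",         # re-test after the write pass
        f"smartctl -A {device}",              # check 5/197/198 afterwards
    ]


for cmd in burn_in_plan("/dev/adaX"):  # placeholder device name
    print(cmd)
```

SMART self-tests run inside the drive in the background, so each one should be allowed to finish (progress is visible with `smartctl -l selftest`) before starting the next command.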

·HeRmAN· · Mar 20, 2020

They have dates.
The first two removed drives are:
S/N: WFL14526 DOM: 04OCT2018, was in Bay2
S/N: WFL0HAEA DOM: 21NOV2018, was in Bay4
The drive alerting today is:
S/N: WFL1SBOJ DOM: 31JAN2019, now ada3 in Bay4

Drives that are good so far:
S/N: WFL13AZD DOM: 04OCT2018, ada0 in Bay1
S/N: WFL1SBL2 DOM: 31JAN2019, ada2 in Bay2
S/N: WFL0FYB1 DOM: 21NOV2018, ada1 in Bay3
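The list can be tabulated to see whether the problem drives line up with a manufacturing date or with a bay. A minimal Python sketch that only regroups the six entries above; `DRIVES` and `group_status_by` are hypothetical names, and the grouping on its own says nothing about cause.

```python
from collections import defaultdict

# Copied from the post: serial, date of manufacture (DOM), bay, status.
DRIVES = [
    {"sn": "WFL14526", "dom": "04OCT2018", "bay": "Bay2", "status": "removed"},
    {"sn": "WFL0HAEA", "dom": "21NOV2018", "bay": "Bay4", "status": "removed"},
    {"sn": "WFL1SBOJ", "dom": "31JAN2019", "bay": "Bay4", "status": "alerting"},
    {"sn": "WFL13AZD", "dom": "04OCT2018", "bay": "Bay1", "status": "good"},
    {"sn": "WFL1SBL2", "dom": "31JAN2019", "bay": "Bay2", "status": "good"},
    {"sn": "WFL0FYB1", "dom": "21NOV2018", "bay": "Bay3", "status": "good"},
]


def group_status_by(key):
    """Map each value of `key` ("dom" or "bay") to the statuses seen there."""
    groups = defaultdict(list)
    for drive in DRIVES:
        groups[drive[key]].append(drive["status"])
    return dict(groups)


print(group_status_by("dom"))
print(group_status_by("bay"))
```

With these six entries, each manufacturing date has one problem drive and one healthy one, while two of the three problem drives sat in Bay4.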
