Problem with rebuilding RAIDZ2

Matess

Dabbler
Joined
Oct 6, 2019
Messages
13
Hello,
I am pretty new and inexperienced with FreeNAS. I ran hardware RAID for a long time, so I understand the basics, but not much about BSD and Linux.

I built a 6-drive data storage pool and one of the drives died after a few hours. I bought a new drive, replaced the bad one, and that's where the problems began.
I didn't know how to replace the drive, so I added the new drive as a spare and hoped everything would resolve itself automatically. Nothing happened, so I found a replace disk button under Pools. It didn't do anything, so I tried checking Force, and I tried scrubbing the pool... Now ada4p2 shows up twice: once as an online spare and once as a removed spare. On the dashboard I now see 7 drives, and if I disconnect that drive I see only 5 drives.

I don't know what I did wrong and I am clueless. I don't have enough storage to make a backup and build a new pool from scratch, but there is still one more drive of redundancy left before I get really nervous.

I know I should have searched the manual first instead of trying random things... I know it was stupid of me, but I would still appreciate help.

If any good soul can help me, I can provide access through TeamViewer or whatever else you need to help me resolve this.

some screenshots...

1.png 2.png 3.png

thank you

Matess
 
Joined
Oct 18, 2018
Messages
969
Hi @Matess, sorry to hear you're having problems. In general, if something isn't familiar to you or you don't know how to do something in FreeNAS, I suggest you look to the forums or the User Guide first. Unfortunately, if you take the wrong steps you can accidentally turn an okay situation into a bad one.

I built a 6-drive data storage pool and one of the drives died after a few hours. I bought a new drive, replaced the bad one, and that's where the problems began.
I didn't know how to replace the drive, so I added the new drive as a spare and hoped everything would resolve itself automatically. Nothing happened, so I found a replace disk button under Pools. It didn't do anything, so I tried checking Force, and I tried scrubbing the pool... Now ada4p2 shows up twice: once as an online spare and once as a removed spare. On the dashboard I now see 7 drives, and if I disconnect that drive I see only 5 drives.

I don't know what I did wrong and I am clueless. I don't have enough storage to make a backup and build a new pool from scratch, but there is still one more drive of redundancy left before I get really nervous.
It can be very worrying when things start to go wrong and people start to think about losing data. It is easy to panic and start clicking through the UI to try to fix things. Be careful, though, and don't panic. Make sure you follow the User Guide and ask for help where you're not sure; taking the wrong step may harm your system.

You have RAIDZ2, which is good. You're not at imminent risk of data loss.

Can you send me the output of zpool status? Please surround the copy-pasted output with code tags like the following.

[CODE]
some code here
[/CODE]
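That is, run something like this in the FreeNAS Shell and paste the output between the tags:

[CODE]
zpool status
[/CODE]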


Can you click through Storage -> Pools -> raid6 -> Status, then click the ... on the drive which failed (it looks like /dev/gptid/5eee...fc1c) and try to replace it? Please take a screenshot of any error that comes up, as well as exactly which disks show up in the drop-down menu for the replace. This will likely fail again like it did before; that is okay. I just want to see exactly what it shows.

Also, a quick nit: I noticed you have your pool labeled raid6. FreeNAS uses ZFS, which has no raid6; the analog in ZFS is RAIDZ2. The terminology is important for communicating clearly. For example, if you said raid1 to mean a mirror, someone might think you meant RAIDZ1, which is more like raid5. Take a look at this terminology primer if it helps.
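Purely as an illustration of the terminology (hypothetical pool and disk names; in FreeNAS you would build pools through the web UI rather than the shell):

[CODE]
# ZFS's RAIDZ2 is the analog of raid6: any 2 of these 6 disks can fail.
zpool create tank raidz2 da1 da2 da3 da4 da5 da6

# For comparison, RAIDZ1 is more like raid5, and a ZFS mirror is more like raid1.
[/CODE]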
 

Matess

Dabbler
Joined
Oct 6, 2019
Messages
13
Thank you for your time!
I think what happened is that I couldn't find the replace button (the manual shows the old UI and it is a little different), so I added the new drive as a spare via Extend. Then I found the replace button, but replace did nothing, so I checked Force, and I think that is the moment when the drive started showing up there twice. If I try to remove the spare, it shows an error like 'removal already in progress'. (Looking at it right now, it actually says 'removed', but that took almost two weeks.) If I try to replace, there is no drive in the list (it is already listed under the missing disk).

Code:
 
root@freenas[~]# zpool status
  pool: freenas-boot
 state: ONLINE
  scan: scrub repaired 0 in 0 days 01:31:25 with 0 errors on Wed Oct  2 05:16:25 2019
config:

        NAME        STATE     READ WRITE CKSUM
        freenas-boot  ONLINE       0     0     0
          da0p2     ONLINE       0     0     0

errors: No known data errors

  pool: raid6
 state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist for
        the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://illumos.org/msg/ZFS-8000-2Q
  scan: resilvered 321M in 0 days 00:00:24 with 0 errors on Sun Oct  6 19:05:27 2019
config:

        NAME                                              STATE     READ WRITE CKSUM
        raid6                                             DEGRADED     0     0   0
          raidz2-0                                        DEGRADED     0     0   0
            gptid/5a77b354-b31d-11e9-af29-74d435ecfc1c    ONLINE       0     0   0
            gptid/5b8a866b-b31d-11e9-af29-74d435ecfc1c    ONLINE       0     0   0
            gptid/5ca726f2-b31d-11e9-af29-74d435ecfc1c    ONLINE       0     0   0
            gptid/5ded7061-b31d-11e9-af29-74d435ecfc1c    ONLINE       0     0   0
            spare-4                                       DEGRADED     0     0   0
              11379530257861494868                        UNAVAIL      0     0   0  was /dev/gptid/5eee0512-b31d-11e9-af29-74d435ecfc1c
              gptid/a20f655d-d636-11e9-8c47-74d435ecfc1c  ONLINE       0     0   0
            gptid/5fef427e-b31d-11e9-af29-74d435ecfc1c    ONLINE       0     0   0
        spares
          7888162580806996450                             REMOVED   was /dev/gptid/a20f655d-d636-11e9-8c47-74d435ecfc1c

errors: No known data errors
 
Joined
Oct 18, 2018
Messages
969
If I try to remove the spare, it shows an error like 'removal already in progress'
Moving forward, please keep exact copies (screenshots if you must) of the error messages and note exactly what steps you took to get there. It will significantly help in troubleshooting.

Code:
  spare-4 DEGRADED 0 0 0 
    11379530257861494868 UNAVAIL 0 0 0 was /dev/gptid/5eee0512-b31d-11e9-af29-74d435ecfc1c 
    gptid/a20f655d-d636-11e9-8c47-74d435ecfc1c ONLINE 0 0 0
Code:
spares
  7888162580806996450 REMOVED was /dev/gptid/a20f655d-d636-11e9-8c47-74d435ecfc1c

This says to me that your issue may be that you added the spare and then removed it from your system. It looks like the system sees the disk in two ways: as a spare and as a possible replacement disk. If I were you, I'd try to remove the disk as a spare and resilver from there.

You can try to remove the device with zpool remove raid6 7888162580806996450 (I got that number from the zpool status above). If that doesn't work, you may need to do zpool remove raid6 /dev/gptid/a20f655d-d636-11e9-8c47-74d435ecfc1c instead.
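In code tags, those two commands are:

[CODE]
# the numeric GUID comes from the zpool status output above
zpool remove raid6 7888162580806996450

# fallback, only if the GUID form fails
zpool remove raid6 /dev/gptid/a20f655d-d636-11e9-8c47-74d435ecfc1c
[/CODE]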

Once it is removed, give it a few minutes and let's see the output of zpool status again. At this step you can also try clicking the replace button for 11379530257861494868 and selecting the new disk. If it doesn't work, report back exactly what errors you got and a fresh zpool status.
 

Matess

Dabbler
Joined
Oct 6, 2019
Messages
13
Code:
root@freenas[~]# zpool remove raid6 7888162580806996450
cannot remove 7888162580806996450: Pool busy; removal may already be in progress

Code:
root@freenas[~]# zpool remove raid6 /dev/gptid/a20f655d-d636-11e9-8c47-74d435ecfc1c
cannot remove /dev/gptid/a20f655d-d636-11e9-8c47-74d435ecfc1c: Pool busy; removal may already be in progress

zpool status
Code:
root@freenas[~]# zpool status
  pool: freenas-boot
 state: ONLINE
  scan: scrub repaired 0 in 0 days 01:31:25 with 0 errors on Wed Oct  2 05:16:25 2019
config:

        NAME        STATE     READ WRITE CKSUM
        freenas-boot  ONLINE       0     0     0
          da0p2     ONLINE       0     0     0

errors: No known data errors

  pool: raid6
 state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist for
        the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://illumos.org/msg/ZFS-8000-2Q
  scan: resilvered 321M in 0 days 00:00:24 with 0 errors on Sun Oct  6 19:05:27 2019
config:

        NAME                                              STATE     READ WRITE CKSUM
        raid6                                             DEGRADED     0     0   0
          raidz2-0                                        DEGRADED     0     0   0
            gptid/5a77b354-b31d-11e9-af29-74d435ecfc1c    ONLINE       0     0   0
            gptid/5b8a866b-b31d-11e9-af29-74d435ecfc1c    ONLINE       0     0   0
            gptid/5ca726f2-b31d-11e9-af29-74d435ecfc1c    ONLINE       0     0   0
            gptid/5ded7061-b31d-11e9-af29-74d435ecfc1c    ONLINE       0     0   0
            spare-4                                       DEGRADED     0     0   0
              11379530257861494868                        UNAVAIL      0     0   0  was /dev/gptid/5eee0512-b31d-11e9-af29-74d435ecfc1c
              gptid/a20f655d-d636-11e9-8c47-74d435ecfc1c  ONLINE       0     0   0
            gptid/5fef427e-b31d-11e9-af29-74d435ecfc1c    ONLINE       0     0   0
        spares
          7888162580806996450                             REMOVED   was /dev/gptid/a20f655d-d636-11e9-8c47-74d435ecfc1c

errors: No known data errors
 
Joined
Oct 18, 2018
Messages
969
scan: resilvered 321M in 0 days 00:00:24 with 0 errors on Sun Oct 6 19:05:27 2019
The resilver hasn't progressed at all. Give it another few hours, and if it still hasn't progressed you can try detaching the spare. In that case, run zpool detach raid6 7888162580806996450 or zpool detach raid6 /dev/gptid/a20f655d-d636-11e9-8c47-74d435ecfc1c (detach takes the pool name followed by the device, just like remove). Once that is done, you can try to replace 11379530257861494868 and choose the new drive as the replacement.
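In code tags:

[CODE]
zpool detach raid6 7888162580806996450

# or, if the numeric GUID is rejected, use the gptid path instead
zpool detach raid6 /dev/gptid/a20f655d-d636-11e9-8c47-74d435ecfc1c
[/CODE]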
 

Matess

Dabbler
Joined
Oct 6, 2019
Messages
13
oh my... I missed that I should wait a few hours...

zpool detach raid6 7888162580806996450 worked... it detached that drive from the upper list, but it still shows up there as a removed spare.

Code:
root@freenas[~]# zpool status
  pool: freenas-boot
 state: ONLINE
  scan: scrub repaired 0 in 0 days 01:31:25 with 0 errors on Wed Oct  2 05:16:25 2019
config:

        NAME        STATE     READ WRITE CKSUM
        freenas-boot  ONLINE       0     0     0
          da0p2     ONLINE       0     0     0

errors: No known data errors

  pool: raid6
 state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist for
        the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://illumos.org/msg/ZFS-8000-2Q
  scan: resilvered 321M in 0 days 00:00:24 with 0 errors on Sun Oct  6 19:05:27 2019
config:

        NAME                                            STATE     READ WRITE CKSUM
        raid6                                           DEGRADED     0     0 0
          raidz2-0                                      DEGRADED     0     0 0
            gptid/5a77b354-b31d-11e9-af29-74d435ecfc1c  ONLINE       0     0 0
            gptid/5b8a866b-b31d-11e9-af29-74d435ecfc1c  ONLINE       0     0 0
            gptid/5ca726f2-b31d-11e9-af29-74d435ecfc1c  ONLINE       0     0 0
            gptid/5ded7061-b31d-11e9-af29-74d435ecfc1c  ONLINE       0     0 0
            11379530257861494868                        UNAVAIL      0     0 0  was /dev/gptid/5eee0512-b31d-11e9-af29-74d435ecfc1c
            gptid/5fef427e-b31d-11e9-af29-74d435ecfc1c  ONLINE       0     0 0
        spares
          7888162580806996450                           REMOVED   was /dev/gptid/a20f655d-d636-11e9-8c47-74d435ecfc1c

errors: No known data errors

If I try to do that again it says:
Code:
root@freenas[~]# zpool detach raid6 7888162580806996450
cannot detach 7888162580806996450: device is reserved as a hot spare



When I try to replace that drive, the list is still empty. The spare's status says removed, so I tried to bring it online and got this:

4.png

To see that drive in the list... does it have to be a spare, or should I somehow remove the spare drive?
If you want, I can give you access through TeamViewer...

Matess
 
Joined
Oct 18, 2018
Messages
969
What I would suggest at this point is to force-remove 7888162580806996450 from the pool, and then try to select that disk as a replacement for 11379530257861494868. If that fails, try rebooting. Report back with any errors or hiccups.
 

Matess

Dabbler
Joined
Oct 6, 2019
Messages
13
The remove button worked without an error! That drive disappeared.
It still wasn't in the list when I clicked on replace, although I can see that HDD in the drives list.
I didn't try anything else... it is rebooting now... 15 minutes later and it is still not up... :(
 

Matess

Dabbler
Joined
Oct 6, 2019
Messages
13
It is finally up. It seemed like an update with problems. I have some pictures, but not enough strength to continue. I really have to go to sleep now. Thank you very much for your help. I will post more tomorrow.
 
Joined
Oct 18, 2018
Messages
969
It seemed like an update with problems.
I don't know what you mean by this.

What is the output of zpool status now?

After the reboot, are you able to select the drive from replace?
 

Matess

Dabbler
Joined
Oct 6, 2019
Messages
13
After the reboot (which took an hour) it had a problem with the disk not being empty, and a quick wipe did not work:

5.png 6.png

After a slow wipe it worked with no problem:

9.png

Disk replacement started :) and now it is at 65% resilvering...

If you want to see what it showed during reboot - here it is:

IMG_20191008_235650.jpg IMG_20191009_002148.jpg

I will try to reboot that machine when it is done. For now it seems to be working normally.

Again - thank you.
 
Joined
Oct 18, 2018
Messages
969
I will try to reboot that machine when it is done. For now it seems to be working normally.
You don't need to reboot the machine when it is done unless you have some other reason to do so.

Again - thank you.
No problem, glad it worked out.
 

Matess

Dabbler
Joined
Oct 6, 2019
Messages
13
It has been stuck at 65% resilvering for the third day - is that normal behavior?

I can hear the disks working pretty hard, so I am just not sure...
 

Matess

Dabbler
Joined
Oct 6, 2019
Messages
13
Code:
root@freenas[~]# zpool status
  pool: freenas-boot
 state: ONLINE
  scan: scrub repaired 0 in 0 days 01:00:33 with 0 errors on Thu Oct 10 04:45:33 2019
config:

        NAME        STATE     READ WRITE CKSUM
        freenas-boot  ONLINE       0     0     0
          da0p2     ONLINE       0     0     0

errors: No known data errors

  pool: raid6
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Sun Oct 13 16:15:39 2019
        1.37T scanned at 1.41G/s, 567G issued at 583M/s, 15.1T total
        94.3G resilvered, 3.66% done, 0 days 07:16:57 to go
config:

        NAME                                              STATE     READ WRITE CKSUM
        raid6                                             DEGRADED     0     0   0
          raidz2-0                                        DEGRADED     0     0   0
            gptid/5a77b354-b31d-11e9-af29-74d435ecfc1c    ONLINE       0     0   0
            gptid/5b8a866b-b31d-11e9-af29-74d435ecfc1c    ONLINE       0     0   0
            3910932811181458196                           REMOVED      0     0   0  was /dev/gptid/5ca726f2-b31d-11e9-af29-74d435ecfc1c
            gptid/5ded7061-b31d-11e9-af29-74d435ecfc1c    ONLINE       0     0   0
            replacing-4                                   DEGRADED     0     0   0
              11379530257861494868                        UNAVAIL      0     0   0  was /dev/gptid/5eee0512-b31d-11e9-af29-74d435ecfc1c
              gptid/56082ecc-ea62-11e9-acb4-74d435ecfc1c  ONLINE       0     0   0
            gptid/5fef427e-b31d-11e9-af29-74d435ecfc1c    ONLINE       0     0   0

errors: No known data errors


10.png 11.png

I think now it is time to get nervous....

Is it possible to find out how much time the replaced ada4 still needs?
 

Matess

Dabbler
Joined
Oct 6, 2019
Messages
13
I think that was a stupid question... I can see the progress in the shell. So if I understand this correctly, the problem was with another drive which stalled the rebuild process, and now that this drive is disconnected it is verifying everything again, so it should go pretty fast up to 65% and then slowly rebuild the rest - right?
 
Joined
Oct 18, 2018
Messages
969
It looks like another drive suddenly became UNAVAILABLE? If so, you may have issues with your controller. What motherboard are you using, and are you using any add-on cards to connect your drives to your system? Are you still using a hardware RAID card? Perhaps it is failing.
 

Matess

Dabbler
Joined
Oct 6, 2019
Messages
13
It is all on the motherboard chipset. There are 4 old drives and 2 new ones (I migrated from hardware raid5 to RAIDZ2). On the hardware RAID one drive pretty often gave up, and once even 2 drives were down. I thought it was a faulty cable... So the drives can be bad... (but the motherboard could also be bad).
 

Matess

Dabbler
Joined
Oct 6, 2019
Messages
13
Code:
root@freenas[~]# zpool status
  pool: freenas-boot
 state: ONLINE
  scan: scrub repaired 0 in 0 days 01:45:59 with 0 errors on Fri Oct 18 05:30:59 2019
config:

        NAME        STATE     READ WRITE CKSUM
        freenas-boot  ONLINE       0     0     0
          da0p2     ONLINE       0     0     0

errors: No known data errors

  pool: raid6
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Wed Oct  9 08:59:29 2019
        9.37T scanned at 580M/s, 8.91T issued at 517M/s, 15.2T total
        877G resilvered, 58.59% done, 0 days 03:33:04 to go
config:

        NAME                                              STATE     READ WRITE CKSUM
        raid6                                             DEGRADED     0     0  61
          raidz2-0                                        DEGRADED     0     0 124
            gptid/5a77b354-b31d-11e9-af29-74d435ecfc1c    DEGRADED     0     0   0  too many errors
            gptid/5b8a866b-b31d-11e9-af29-74d435ecfc1c    DEGRADED     0     0   0  too many errors
            3910932811181458196                           UNAVAIL      0     0   0  was /dev/gptid/5ca726f2-b31d-11e9-af29-74d435ecfc1c
            gptid/5ded7061-b31d-11e9-af29-74d435ecfc1c    DEGRADED     0     0   0  too many errors
            replacing-4                                   DEGRADED     0     0   0
              11379530257861494868                        UNAVAIL      0     0   0  was /dev/gptid/5eee0512-b31d-11e9-af29-74d435ecfc1c
              gptid/56082ecc-ea62-11e9-acb4-74d435ecfc1c  ONLINE       0     0   0
            12196532984740233970                          UNAVAIL      0     0   0  was /dev/gptid/5fef427e-b31d-11e9-af29-74d435ecfc1c

errors: 14 data errors, use '-v' for a list


I am not sure how to continue... I think I'll back up again and start over on new hardware... Any suggestions?
 
Joined
Oct 18, 2018
Messages
969
I would guess yes. I really suspect your controller or motherboard here. Unless you threw every drive down the stairs, it seems unlikely that so many drives would go down in this way in such a short amount of time.
 