Drive not online after resilvering

555NASE

Patron
Joined
Mar 3, 2017
Messages
202
Hello everyone,
A few days ago I got a notification from FreeNAS that my pool (RAIDZ1 with 8 WD Red HDDs) is degraded.

Consequently, I ordered a new HDD to start the resilvering process. The resilvering started and everything looked fine, but after 12 hours it got stuck, and it looks like the resilvering is no longer making any progress.

I followed the steps in the guide (the pool is unencrypted):
1. Set the failed disk offline
2. Swapped the old HDD for the new HDD
3. Chose "Replace" in the web UI and selected the newly installed disk
4. Waited and hoped for the resilvering to finish successfully

Attached are some screenshots of the current situation.

Any ideas on what to do next? Has anyone already had the same problem?

1583392669176.png


zpool info
1583392697918.png
 

JaimieV

Guru
Joined
Oct 12, 2012
Messages
742
That doesn't look good. It seems your new drive has failed and gone "OFFLINE" while resilvering. Pull it out and test it in another box or USB caddy to see whether it has "died after arrival" on you.

Do you have a backup of this zpool? You're in a risky situation right now since it's RAIDZ1. I'd recommend having two disks on hand in future, or better re-creating this as a RAIDZ2.
 

555NASE

Do you mean the new drive is defective?
I remember that while the resilvering was running, I looked at the server via IPMI and saw errors.
Do you mean I can repeat the resilvering on another SATA port?
 

JaimieV

Yes, I think the new drive has died during resilvering. Do you recall what the errors were? If you SSH onto the FreeNAS you can use 'dmesg' to show them again, or 'less /var/log/messages' which gives you datestamps too.
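If the log is long, a small filter can help narrow it down. The following is only a minimal Python sketch, not a FreeNAS tool: the function name `find_disk_errors`, the keyword list, and the sample lines are all assumptions made for illustration, and real kernel messages differ between controllers and drivers.

```python
import re

# Assumed keywords; real FreeBSD/FreeNAS messages vary by controller and driver.
ERROR_PATTERN = re.compile(r"CAM status|error|timeout|retry|CRC", re.IGNORECASE)


def find_disk_errors(lines, device_prefixes=("ada", "da")):
    """Return log lines that mention a disk device AND an error-like keyword."""
    prefixes = "|".join(re.escape(p) for p in device_prefixes)
    device_pattern = re.compile(r"\b(?:%s)\d+\b" % prefixes)
    hits = []
    for line in lines:
        if device_pattern.search(line) and ERROR_PATTERN.search(line):
            hits.append(line.rstrip("\n"))
    return hits


# Made-up sample lines, only to show the filtering; not taken from this system.
SAMPLE_LOG = [
    "Mar  5 03:12:44 freenas (ada3:ahcich3:0:0:0): CAM status: Uncorrectable parity/CRC error",
    "Mar  5 03:12:45 freenas ntpd[1234]: time sync error",
    "Mar  5 03:13:00 freenas ada2: <WDC WD40EFRX> ATA8-ACS SATA 3.x device",
]

for hit in find_disk_errors(SAMPLE_LOG):
    print(hit)
```

On the real system you would pass the open log file instead of `SAMPLE_LOG`, for example the lines of /var/log/messages, and adjust the keywords to whatever your messages actually contain.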

Test the drive in another non-FreeNAS machine, as the partial resilver may have left enough tracks on the new drive to confuse things if you plug it back in again. If it works, I would blank it in the other machine and try again in FreeNAS.
 

555NASE

I can test it on a Windows machine. I can read out the S.M.A.R.T. values.
 

Redcoat

MVP
Joined
Feb 18, 2014
Messages
2,925
JaimieV said: "If it works, I would blank it in the other machine and try again in FreeNAS."
I would run badblocks on it before anything...
 

555NASE

Do you know a tool for Windows?
 

JaimieV

Check the current SMART readings (paste here if you're unsure) and format the drive in Windows. If it passes both without issues then put it back into your FreeNAS and run a SMART long test then a badblocks (takes a couple of days).

Remember that you're running degraded all this time - I would very much suggest buying another drive and updating your backups while all this is happening.
 

Redcoat

No, I would create a bootable CD with Linux and badblocks on it, boot your windows box with it and run badblocks on the subject hard drive.
 

JaimieV

That works too. Puts the winbox out of action for two days.
 

555NASE

In an emergency.
I am already preparing to copy all the data and rebuild the volume as Raidz2 with 11 Hdds instead of 8.
 

Redcoat

555NASE said: "11 Hdds instead of 8."
Then I would run badblocks on any "new" hdd's you are introducing to your FreeNAS system.

Edit - or even consider running badblocks on all 11 (plus the spare that I imagine you will keep after this experience...?)
 

555NASE

2020-03-05 20_19_36-192.168.2.31 - PuTTY.png

These are the SMART values ;( from the new drive. Very bad.
 

tfran1990

Patron
Joined
Oct 18, 2017
Messages
294
CRC errors sometimes come from bad connections or cables. Having that many errors would most likely "offline" the drive.

Check the cables and connections. Also look at the port on the drive; it may have some sort of damage.
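To show what I mean, here is a rough Python sketch that reads a SMART attribute table in the layout printed by `smartctl -A` and flags the "CRC errors but no bad sectors" pattern that usually points at cabling rather than the platters. The helper names, the sample numbers, and the rule itself are my own assumptions for illustration; they are not the values from the screenshot and not an official diagnostic.

```python
def parse_smart_attributes(text):
    """Map attribute name -> raw value from a `smartctl -A` style table."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(fields) >= 10 and fields[0].isdigit():
            try:
                attrs[fields[1]] = int(fields[9])
            except ValueError:
                continue  # skip raw values that are not plain integers
    return attrs


def cable_suspect(attrs):
    """Assumed rule of thumb: CRC errors without reallocated/pending sectors."""
    crc = attrs.get("UDMA_CRC_Error_Count", 0)
    realloc = attrs.get("Reallocated_Sector_Ct", 0)
    pending = attrs.get("Current_Pending_Sector", 0)
    return crc > 0 and realloc == 0 and pending == 0


# Made-up sample table; the numbers are placeholders, not from this thread.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       1234
"""

attrs = parse_smart_attributes(SAMPLE)
print(attrs["UDMA_CRC_Error_Count"], cable_suspect(attrs))
```

A drive that also shows reallocated or pending sectors is a different story, and I would not trust it even after a cable swap.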

When I replace a disk, I:
1. Shut down
2. Connect the new device
3. Go to the failing disk and choose REPLACE, then select the disk I'm replacing it with (I don't offline it first)
4. Once the resilver is done, the gptid of the disk that was replaced will no longer show
 

555NASE

OK, I have replaced the cable and now the resilvering is running again automatically. I hope the problem is fixed.
Thank you for your help. I will write when it is done.
 

555NASE

OK! It's finished - all drives are online.
 

555NASE

I am thinking about creating a new volume as RAIDZ2 with 11 disks. Right now I have 8 disks. I bought 3 drives today ;)
 

tfran1990

If you are rebuilding the pool, make sure you consider your needs at this time. You would get more IO with 2 RAIDZ2 vdevs of 6 drives each (this would need 12 drives).
Think about what's more important: raw space, or space with better IO.
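To put rough numbers on that trade-off, here is a small Python sketch comparing the layouts. The function name `raidz_layout` and the 4 TB drive size are assumptions (the drive capacity is not stated in this thread), and the estimate ignores ZFS padding, metadata, and reserved space, so treat the results as ballpark figures only.

```python
def raidz_layout(vdevs, disks_per_vdev, parity, disk_tb):
    """Rough capacity estimate for a pool of identical RAIDZ vdevs."""
    if disks_per_vdev <= parity:
        raise ValueError("each vdev needs more disks than parity drives")
    data_disks = vdevs * (disks_per_vdev - parity)
    return {
        "vdevs": vdevs,
        "total_disks": vdevs * disks_per_vdev,
        "data_disks": data_disks,
        "approx_usable_tb": data_disks * disk_tb,
    }


DISK_TB = 4  # assumed drive size, purely for illustration

layouts = {
    "1 x 8-wide RAIDZ1 (current)": raidz_layout(1, 8, 1, DISK_TB),
    "1 x 11-wide RAIDZ2": raidz_layout(1, 11, 2, DISK_TB),
    "2 x 6-wide RAIDZ2": raidz_layout(2, 6, 2, DISK_TB),
}

for name, info in layouts.items():
    print(name, info)
```

The single 11-wide vdev ends up with 9 data disks from 11 drives, while the two 6-wide vdevs give 8 data disks from 12 drives. As a rule of thumb, random IO scales roughly with the number of vdevs, which is why the second layout trades some space for better IO.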
 

555NASE

So the better way is to have 12 drives for more speed and raw space. Is that right?
 