Critical error! What to do next?

Status
Not open for further replies.

JFD

Dabbler
Joined
Jul 25, 2016
Messages
46
Hi FreeNAS people,

I got the following message: The volume FreeNAS_Z1_1 (ZFS) state is DEGRADED: One or more devices has been removed by the administrator. Sufficient replicas exist for the pool to continue functioning in a degraded state. Now I have no idea what to do. Can somebody please guide me what to do to figure it out and solve this error?
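A quick way to see which member ZFS has flagged is `zpool status -v FreeNAS_Z1_1`. The sketch below filters that output down to the lines whose STATE column is not ONLINE. It assumes POSIX `sh` (start `sh` first if your login shell is csh) and the usual `NAME STATE READ WRITE CKSUM` table layout. `show_unhealthy_vdevs` is a hypothetical helper, not a FreeNAS tool.

```shell
# Hypothetical helper: read `zpool status` output on stdin and print the
# name and state of every config-table line that is not ONLINE.
# The pool and raidz lines are printed too when they are DEGRADED.
show_unhealthy_vdevs() {
    awk '
        # The config table starts after the "NAME STATE READ WRITE CKSUM" header.
        $1 == "NAME" && $2 == "STATE" { in_table = 1; next }
        # A blank line or the "errors:" line ends the table.
        in_table && (NF == 0 || $1 == "errors:") { in_table = 0 }
        # Standard ZFS states that indicate a problem.
        in_table && $2 ~ /^(DEGRADED|FAULTED|OFFLINE|REMOVED|UNAVAIL)$/ {
            print $1, $2
        }
    '
}

# Example (not run here):
# zpool status -v FreeNAS_Z1_1 | show_unhealthy_vdevs
```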
 

JFD

Dabbler
Joined
Jul 25, 2016
Messages
46
After a reboot I get this:


  • CRITICAL: May 1, 2018, 11:08 a.m. - Device: /dev/ada0, 1 Currently unreadable (pending) sectors
  • CRITICAL: May 1, 2018, 11:08 a.m. - The volume FreeNAS_Z1_1 (ZFS) state is ONLINE: One or more devices has experienced an unrecoverable error. An attempt was made to correct the error. Applications are unaffected.
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
Can somebody please guide me what to do to figure it out and solve this error?
Well, you could try searching the forums for any of the hundred other times this problem has come up. Or even look in the Resources section; there's a thorough resource on disk troubleshooting there.
 

JFD

Dabbler
Joined
Jul 25, 2016
Messages
46
I did search but couldn't find a solution for this. I didn't know about the Resources section and will go there right now.
 

garm

Wizard
Joined
Aug 19, 2017
Messages
1,556
This is basic stuff, and the manual explains how to replace a drive. The magic of ZFS is that it’s telling you this; other file systems would not, and would just let you keep using broken files. As long as you have backups you are fine. Hard drives break all the time. Just diagnose the issue, figure out whether it’s a false positive (I just replace them), and then replace the drive with a prepared spare, or order one, ASAP.
 

JFD

Dabbler
Joined
Jul 25, 2016
Messages
46
@garm thank you. After checking the cables and such, I rebooted the system and got:


  • CRITICAL: May 1, 2018, 11:22 a.m. - Device: /dev/ada0, 1 Currently unreadable (pending) sectors
This looks like an error I had some time ago, which I was advised to ignore after running some tests. Would you advise replacing the ada0 disk?
 

garm

Wizard
Joined
Aug 19, 2017
Messages
1,556
Well, you should look through the SMART reports and so on, but yeah, I would replace a drive reporting pending sectors.
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
Device: /dev/ada0, 1 Currently unreadable (pending) sectors
Again, there's a resource specifically on hard drive troubleshooting, and I'd encourage you to read it. I probably wouldn't replace a disk that I've had for a while with just one bad sector, if there were no other indications of problems. If it were a new disk, definitely.
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
CRITICAL: May 1, 2018, 11:08 a.m. - Device: /dev/ada0, 1 Currently unreadable (pending) sectors
This typically indicates that the drive is going bad and needs to be replaced, but not always, depending on a number of factors including the brand of drive. It is only "Pending"...
What is the pool type? I get the impression it is RAIDZ1 because of this:
The volume FreeNAS_Z1_1
How many drives do you have in the pool? Is the drive new enough that it is still in warranty?
Would you give us the SMART results for the drive like this user did:
https://forums.freenas.org/index.php?threads/3-currently-unreadable-pending-sectors.57230/
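For a pending-sector warning, the most relevant SMART attributes are 5 (Reallocated_Sector_Ct), 197 (Current_Pending_Sector) and 198 (Offline_Uncorrectable). As a minimal sketch, assuming POSIX `sh` and the standard ten-column `smartctl -A` table where RAW_VALUE is the tenth field, the hypothetical helper below prints just those three and returns non-zero if any raw value is above 0.

```shell
# Hypothetical helper: print ID, name and RAW_VALUE for SMART attributes
# 5 (Reallocated_Sector_Ct), 197 (Current_Pending_Sector) and
# 198 (Offline_Uncorrectable) from `smartctl -A` output on stdin.
# Returns 1 if any of those raw values is greater than 0, else 0.
smart_key_attrs() {
    awk '
        $1 == 5 || $1 == 197 || $1 == 198 {
            # RAW_VALUE is the 10th column in the standard table layout.
            print $1, $2, $10
            if ($10 + 0 > 0) bad = 1
        }
        END { exit (bad ? 1 : 0) }
    '
}

# Example (not run here):
# smartctl -A /dev/ada0 | smart_key_attrs
```

Other attributes (temperature, power-on hours) have non-zero raw values on a healthy disk, which is why the sketch looks only at these three.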
 

JFD

Dabbler
Joined
Jul 25, 2016
Messages
46
@Chris Moore it is a RAIDZ1 pool of three 4TB WD Red disks. I contacted the webshop and they will replace the drive under warranty. That means I order a new disk now, replace the old disk with the new one (I'm going to read up on how to do this this evening), send them the old disk, and get a refund.
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504

JFD

Dabbler
Joined
Jul 25, 2016
Messages
46
@danb35 thank you! This seems clearer than the manual (http://doc.freenas.org/9.10/storage.html#replacing-a-failed-drive)

But then there is this part: "Shut down your server, remove the failed/failing disk, and replace it with a good, burned-in, tested disk. Power up the server." Call me a n00b, but I'm getting a brand new disk. Can I not just replace the old one with the new one? Also, the disk is not offline now, and its info looks like this:

[Screenshot of the disk info: upload_2018-5-1_19-47-35.png]


So I thought I'd compare the serial number in the disk info with the serial numbers on the physical disks to identify ada0, so I know which one to replace.
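To match `ada0` to a physical drive, the serial number printed by `smartctl -i` can be compared with the label on each disk. A minimal sketch, assuming POSIX `sh` and smartctl's usual `Serial Number:` line; `serial_from_info` is a hypothetical helper name, and the device list in the commented loop is an assumption about this particular system.

```shell
# Hypothetical helper: print the serial number from `smartctl -i` output
# read on stdin (the "Serial Number:" line of the information section).
serial_from_info() {
    awk -F': *' '/^Serial Number:/ { print $2 }'
}

# Example (not run here): print each disk name next to its serial number.
# for d in /dev/ada0 /dev/ada1 /dev/ada2; do
#     printf '%s ' "$d"
#     smartctl -i "$d" | serial_from_info
# done
```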

The manual also says: A disk that is failing but has not completely failed can be replaced in place, without first removing it. Whether this is a good idea depends on the overall condition of the failing disk. A disk with a few newly-bad blocks that is otherwise functional can be left in place during the replacement to provide data redundancy. A drive that is experiencing continuous errors can actually slow down the replacement. In extreme cases, a disk with serious problems might spend so much time retrying failures that it could prevent the replacement resilvering from completing before another drive fails.

Some advice would be really appreciated :)
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
Can I not just replace the old one with the new one?
Yes, you can. But you still should burn in the disk before you make it part of your pool--infant mortality is still a thing, and disks have been known to be defective on arrival, or very shortly afterward.
A disk that is failing but has not completely failed can be replaced in place, without first removing it.
Correct, and often a good idea, especially with RAIDZ1, as it doesn't compromise the pool's redundancy during the replacement process. If you have a place to plug in the fourth disk, I'd recommend doing that; it's really only a slight variation on the process I document in the resource. Install the new (tested) disk in your server, go to Volume Status, select ada0, click Replace, select the replacement (which will probably be ada3), click Replace Disk. It will start resilvering; when it finishes, ada0 will be removed from your pool. At some convenient time, remove ada0 from your server and return it to your vendor.
 

JFD

Dabbler
Joined
Jul 25, 2016
Messages
46
I want to follow your advice, so please explain a bit more about the steps I need to take. If I follow your guide, I'm missing the part about burning in and testing a disk.
 

JFD

Dabbler
Joined
Jul 25, 2016
Messages
46
Install the new (tested) disk in your server, go to Volume Status, select ada0, click Replace, select the replacement (which will probably be ada3), click Replace Disk. It will start resilvering; when it finishes, ada0 will be removed from your pool. At some convenient time, remove ada0 from your server and return it to your vendor.

So this means I do not use the OFFLINE button; the disk will automatically be removed from the pool. Then I shut down the system and remove the HDD? Or do I need to put it OFFLINE before that?
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504

kdragon75

Wizard
Joined
Aug 7, 2016
Messages
2,457

JFD

Dabbler
Joined
Jul 25, 2016
Messages
46
@danb35 sorry for being a bit clueless and insecure... but just to be sure:

1: I shut down the FreeNAS system and install the new disk.
2: After booting FreeNAS, I use PuTTY to run

a short self-test:
Code:
smartctl -t short /dev/adaX

run a conveyance test:
Code:
smartctl -t conveyance /dev/adaX

a long test:
Code:
smartctl -t long /dev/adaX

the badblocks test, starting tmux first:
Code:
tmux

Code:
badblocks -b 4096 -ws /dev/adaX

run the long SMART test again:
Code:
smartctl -t long /dev/adaX

read the results:
Code:
smartctl -A /dev/adaX

3: When the RAW values show no errors, do I then follow the previous instructions?
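Each `smartctl -t` command above only starts a test inside the drive and returns immediately (smartctl prints an estimated duration), and smartctl will normally refuse to start a new test while one is still running. A long test on a 4TB disk can take several hours. The sketch below waits for the current test to finish before the next step; it assumes POSIX `sh` and the ATA wording "Self-test routine in progress" in `smartctl -c` output. Both function names are hypothetical.

```shell
# Succeeds (exit 0) while smartctl output on stdin says a self-test is
# still running; assumes the ATA wording "Self-test routine in progress".
selftest_running() {
    grep -q 'Self-test routine in progress'
}

# Hypothetical helper: wait until the self-test on the given device
# (for example /dev/ada3) has finished, checking every 10 minutes.
wait_for_selftest() {
    while smartctl -c "$1" | selftest_running; do
        sleep 600
    done
}

# Example (not run here):
# smartctl -t short /dev/ada3 && wait_for_selftest /dev/ada3
# smartctl -l selftest /dev/ada3   # show the self-test log afterwards
```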
 

garm

Wizard
Joined
Aug 19, 2017
Messages
1,556
If you can, don’t burn in disks in your FreeNAS host. The risk of typos is just not worth it.
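The risk here is concrete: `badblocks -w` overwrites the whole disk, so a typo that names a pool member instead of the new disk destroys that member's data. Burning in on a separate machine avoids this. If that isn't possible, a small guard like the sketch below at least forces a comparison against the pool members first. It assumes POSIX `sh` and that you have already listed the members as adaX names yourself (FreeNAS shows them as gptid labels in `zpool status`; `glabel status` maps those labels to adaX partitions). `guard_target` is a hypothetical helper.

```shell
# Hypothetical guard: fail if the target device appears in the list of
# devices you have identified as pool members.
# Usage: guard_target TARGET MEMBER...
guard_target() {
    target=$1
    shift
    for member in "$@"; do
        if [ "$target" = "$member" ]; then
            echo "refusing: $target is listed as a pool member" >&2
            return 1
        fi
    done
    return 0
}

# Example (not run here), with the member list filled in by hand:
# guard_target /dev/ada3 /dev/ada0 /dev/ada1 /dev/ada2 && badblocks -b 4096 -ws /dev/ada3
```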
 

JFD

Dabbler
Joined
Jul 25, 2016
Messages
46
@danb35 sorry for being clueless and insecure... but just to be certain:

1: I shut down the system and install the new hard drive (as the 4th drive in the system)
2: I start the FreeNAS system and connect using PuTTY
3: I test with the following commands:

Code:
smartctl -t short /dev/ada3

Code:
smartctl -t conveyance /dev/ada3

Code:
smartctl -t long /dev/ada3

Code:
tmux

Code:
badblocks -b 4096 -ws /dev/ada3

(If the session gets disconnected, to resume it and view the test status:)
Code:
tmux attach

After the test is complete:
Code:
exit

4: Reboot the FreeNAS system, run one more long test, and then read the results:
Code:
smartctl -t long /dev/ada3

Code:
smartctl -A /dev/ada3

And when the RAW values show no errors, I can start the replacement part. Is this correct? Do I need to format the disk with the correct filesystem somewhere first?

Thank you very much so far. Since English isn't my native language and IT is just a hobby for me, I need to be very sure about the correct way to get things done.
 