Alert System: CRITICAL: The volume xxxxx state is ONLINE

Status
Not open for further replies.

artis1sysop

Dabbler
Joined
Jul 31, 2017
Messages
21
I don't understand this CRITICAL message or what it means. Can someone explain it and tell me how to fix it? It's been in my Alerts since Sept 23, and it's now Oct 4 and hasn't gone away.

Code:
The volume BackupFiles state is ONLINE: One or more devices has experienced an unrecoverable error. An attempt was made to correct the error. Applications are unaffected.


upload_2018-10-4_11-23-38.png
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
This means exactly what it says on the tin: the volume "BackupFiles" is online, but one or more of the devices in it (your disk drives) has suffered an unrecoverable error. You have enough redundancy that ZFS was able to correct the error, and applications are unaffected.

Investigate your volume status under Storage -> Volumes: select the "BackupFiles" volume, then click the bottom button labeled "Volume Status". For easier copy-and-pasting to the forum, you can also open a shell or SSH session and run the command zpool status BackupFiles. If you do this, please paste the results inside CODE tags.
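If the status output is long, a small filter can pick out the devices with non-zero error counters. This is only a rough sketch: pool_errors is a made-up helper name, not part of ZFS or FreeNAS, and it assumes the usual NAME / STATE / READ / WRITE / CKSUM column layout that zpool status prints.

```shell
# Hypothetical helper (not part of ZFS or FreeNAS): reads `zpool status`
# output on stdin and prints every device row whose READ, WRITE or CKSUM
# counter is anything other than 0.
pool_errors() {
  awk '
    # Device rows have a known state in column 2 and at least 5 columns.
    $2 ~ /^(ONLINE|DEGRADED|FAULTED|OFFLINE|UNAVAIL|REMOVED)$/ && NF >= 5 &&
    ($3 != "0" || $4 != "0" || $5 != "0") {
      print $1, "read=" $3, "write=" $4, "cksum=" $5
    }
  '
}

# Usage on the NAS itself (pool name taken from this thread):
#   zpool status BackupFiles | pool_errors
```

An empty result means no device row reported errors. It does not replace reading the full zpool status output, which also carries the status and action text.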

Most likely you have a failing drive, and you should consider replacing it. See the documentation under 8.1.10 Replacing A Failed Drive.
http://doc.freenas.org/11/storage.html#replacing-a-failed-drive

If you have a spare SATA/SAS port, you should consider attaching the replacement drive there and following the "Replace in-place" steps in 8.1.11 Replacing Drives to Grow a ZFS Pool.

http://doc.freenas.org/11/storage.html#replacing-drives-to-grow-a-zfs-pool

You also are not running the SMART drive health checks, which will give you early-warning notices of drives that are experiencing difficulty.
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
artis1sysop said: "It's been in my Alerts since Sept 23, and it's now Oct 4 and hasn't gone away."
"I've had a critical alert for almost two weeks, but haven't bothered to even ask about it until now." I guess your data must not be very important.
 

artis1sysop

Dabbler
Joined
Jul 31, 2017
Messages
21
HoneyBadger said:
"This means exactly what it says on the tin: the volume "BackupFiles" is online, but one or more of the devices in it (your disk drives) has suffered an unrecoverable error. You have enough redundancy that ZFS was able to correct the error, and applications are unaffected.
<snip>
You also are not running the SMART drive health checks, which will give you early-warning notices of drives that are experiencing difficulty."

Thanks for the reply.

This is what I have on the volume status page:
upload_2018-10-4_13-9-41.png


As for the SMART health checks not running: the SMART service won't start. I think it has something to do with the fact that the drives are on a Dell PERC330 RAID controller and the PERC330 is doing the RAID, not software. Is there a way to fix this?
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
artis1sysop said: "This is what I have on the volume status page:"

You have two CSUM (checksum) errors on device da3, but you have a far bigger problem:

artis1sysop said: "the drives are on a Dell PERC330 RAID controller and the PERC330 is doing the RAID, not software."

You've successfully followed the FreeNAS Worst Practices Guide (http://www.freenas.org/blog/freenas-worst-practices/) and used a hardware RAID controller underneath ZFS.

artis1sysop said: "Is there a way to fix this?"

Back up your data, destroy the pool, remove the RAID card and replace it with an HBA, then recreate and restore your data.
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
artis1sysop said: "the PERC330 is doing the RAID, not software."
I don't think it is, as demonstrated by the fact that the volume status page is showing the RAID status. Unless you've created three arrays on your controller, and then used those to create a RAIDZ pool (which would be a spectacularly bad configuration).
HoneyBadger said: "Back up your data, destroy the pool, remove the RAID card and replace it with an HBA, then recreate and restore your data."
Agreed.
 

artis1sysop

Dabbler
Joined
Jul 31, 2017
Messages
21
It's been a while, but I don't recall creating a RAID from the Dell PERC H330. The drives are just plugged into a hot-swap backplane.

The Dell server is in another state, I'm asking a technician there to take a look at the hardware configuration. I don't know if the Dell PERC H330 is an on-board controller or a physical board that can be pulled. I suspect that it's built in on-board. I found some links on disabling the Dell on-board PERC S300 RAID controller (a different model; ours is the H330).

Is this what I want to do (assuming the instructions are the same for the Dell PERC H330)?
upload_2018-10-4_14-2-17.png
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
If it has the capability of acting as a simple disk controller, without doing anything RAID-y, or interfering with the system's direct access to the disks, it needs to be configured to do that. But if it's what's keeping smartd from running, that isn't a good sign.
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
artis1sysop said: "The Dell server is in another state, I'm asking a technician there to take a look at the hardware configuration."
Who configured the system? When? Exactly what hardware was used and why was it selected?
artis1sysop said: "and hasn't gone away."
Why would you expect a hardware fault to go away without intervention from the sysop?
 