ZFS concerns


hfromM

Cadet
Joined
Nov 25, 2014
Messages
9
Hello

I have set up my FreeNAS system and I am quite happy with it so far; I use 2 x 4TB drives in a mirror setup. I went through cyberjock's ZFS guide and stumbled upon this passage:

"When a VDev can no longer provide 100% of its data using checksums or mirrors, the VDev will fail. If any VDev in a zpool is failed you will lose the entire zpool with no chance of partial recovery. (Read this again so it sinks in)"

I have trouble understanding this. For instance, I plan on using a smaller single disk with ZFS on it for data I really don't care much about; I deliberately accept drive failure as a possibility in this case.
I will perform regular scrubs on this single drive. What happens if a scrub detects checksum irregularities for some files? According to the statement above, this sounds disastrous.
Also, does a checksum error mean the file is no longer accessible?

I don't need any ZFS features besides scrubs, so is ZFS really a good option for me? It seems like overkill, and in case of unexpected disk/boot/mount trouble I would have to go through quite some hassle, compared to an NTFS disk that I could just plug into someone's PC to restore files. I just want simple, reliable storage for my mirror array and less reliable storage for a smaller disk.

Thank you
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Of course the file is no longer available if it has a checksum error (the whole pool will probably need to be destroyed as well). What's the point of trying to open a file that is not correct?

Have you ever tried corrupting a random file to see what happens?

ZFS is designed to maintain integrity. You cannot maintain integrity without mirrors or similar setups. But that applies to NTFS as well, the difference being that NTFS will happily pass you whatever corrupted crap happens to be on the disk.
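
If you want to see for yourself, a rough way to test it safely is with a throwaway file-backed pool, something like this (pool name and paths are made up; do not try this on a pool holding real data):

    # build a disposable pool on a 100 MB file vdev
    truncate -s 100m /tmp/vdev0
    zpool create testpool /tmp/vdev0
    cp /boot/kernel/kernel /testpool/testfile
    # scribble over the backing file behind ZFS's back
    zpool export testpool
    dd if=/dev/urandom of=/tmp/vdev0 bs=1m count=1 seek=30 conv=notrunc
    zpool import -d /tmp testpool
    # the scrub will report any checksum errors it could not repair
    zpool scrub testpool
    zpool status -v testpool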
 

hfromM

Cadet
Joined
Nov 25, 2014
Messages
9
The whole pool will need to be destroyed? Let's say there is a single file with a checksum error; in the statement I quoted above, it sounds like the whole vdev, and then the pool, will automatically fail. To be more specific:
-Will I still be able to access all other unharmed files to copy them somewhere?
-Should I choose UFS over ZFS for a single drive?
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
It won't fail outright, typically, but don't expect it to be reliable.

You can also forget about UFS, as it's no longer supported in 9.3.
 
L

Guest
So this is just the same as any other filesystem. If I lose both disks in a mirror (100% of the redundancy), I will lose all the data. If I have 10 mirrored pairs and I lose a whole pair (both sides), I lose the pool just the same.

There is no filesystem/volume manager on the planet that will be able to provide redundancy when all the redundancy is gone.
 

SirMaster

Patron
Joined
Mar 19, 2014
Messages
241
You could enable multiple copies on the single-disk zpool so that if there is a checksum error in a file, the file can likely be repaired from the other copy.
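
For example (pool and dataset names here are just placeholders):

    # store two copies of every block written to this dataset from now on;
    # existing data is not rewritten, so set this before loading the disk
    zfs set copies=2 tank/scratch
    zfs get copies tank/scratch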

You should not have to destroy the zpool if you have a corrupted file, even if you only have a single-disk zpool with copies set to 1. ZFS will tell you which file is corrupted and unrecoverable, and you just need to delete that file, as well as any snapshots that reference it. Your zpool will then be happy again.
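
Roughly, the cleanup goes like this (pool, file, and snapshot names are made up for the example):

    # list the files with permanent (unrepairable) errors
    zpool status -v tank
    # remove the damaged file and any snapshots that still reference it
    rm /mnt/tank/data/damaged.iso
    zfs destroy tank/data@oldsnap
    # rescan, then reset the error counters
    zpool scrub tank
    zpool clear tank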

Only if your metadata gets corrupted will you need to destroy the zpool (because it will likely be unable to import). This is less likely to happen, because by default 2 copies of metadata are kept even on a single-disk zpool. Also, if you have copies set to 2, for instance, then 3 copies of the metadata are kept by default (copies + 1).
 

no_connection

Patron
Joined
Dec 15, 2013
Messages
480
If the drive starts to throw enough errors to be kicked out of the pool, it's arguably the same thing as a broken disk, as it is likely to fail anyway.

If ZFS won't serve you a corrupt file out of the box, there should be a way to force it. Turning off checksums ought to do that if need be.
It's easy to test if you are unsure.
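
If you go looking, the knob would be something like this (dataset name made up). Note, though, that it only applies to blocks written after you set it; blocks already on disk keep the checksums they were written with:

    # disables checksumming for new writes only
    zfs set checksum=off tank/junk
    zfs get checksum tank/junk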

In this case the strong point of ZFS is knowing whether data is corrupt, not necessarily providing data integrity or redundancy.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
The whole pool will need to be destroyed? Let's say there is a single file with a checksum error; in the statement I quoted above, it sounds like the whole vdev, and then the pool, will automatically fail. To be more specific:
-Will I still be able to access all other unharmed files to copy them somewhere?
-Should I choose UFS over ZFS for a single drive?

UFS isn't an option, so give that up right now. 9.3 has no UFS support.

As for accessing the files: so long as the corruption is limited to the contents of the file and you don't have metadata corruption, you should be able to access the other files. Of course, this is so rare you shouldn't expect it. The norm is metadata corruption, which means your box might crash and on reboot the pool will never mount again. This means all data in the pool is lost forever.

Some people have had metadata corruption and a pool that would still mount, but getting data off the pool is very difficult, and the amount of data you can recover ranges from 0% to 99.9%. Again, you shouldn't be banking on that 99.9%, because that's not typical.
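
For the record, if you're ever stuck with a pool that won't import, the last-ditch options look like this (pool name made up, and no promises it works):

    # dry-run a rewind import that throws away the last few transactions
    zpool import -F -n tank
    # if the dry run looks sane, do it for real, read-only to be safe
    zpool import -F -o readonly=on tank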
 

depasseg

FreeNAS Replicant
Joined
Sep 16, 2014
Messages
2,874
So you have 2 drives mirrored in a single vdev in a single pool. If both drives fail, then the vdev fails and then the pool fails. You can have multiple vdevs in a single pool; that is what the warning is about. Say you had 4 drives: 2 mirrored pairs. If both drives in the same mirrored pair (vdev) failed, the pool is lost. The file checksum has nothing to do with the vdev/pool loss warning.
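
In zpool terms, that 4-drive layout would be created like this (pool and device names are just examples):

    # one pool, two mirror vdevs; data is striped across the vdevs
    zpool create tank mirror ada0 ada1 mirror ada2 ada3
    # shows both mirrors; lose both disks of either one and the pool is gone
    zpool status tank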

For your single drive, checksums are almost pointless (if not impossible). It's like saying I want to run a mirrored pair, but with only 1 drive. There is no protection nor way to recover a corrupt file.
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
For your single drive, checksums are almost pointless (if not impossible). It's like saying I want to run a mirrored pair, but with only 1 drive. There is no protection nor way to recover a corrupt file.
Checksums are most certainly possible (and, in fact, the norm) in a single-disk pool, and still useful, in that they allow you (or, more accurately, they allow ZFS) to detect file corruption. ZFS will not knowingly provide invalid data, and it's the checksums that make that possible.
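
You can watch that detection work, too (pool name is an example):

    zpool scrub tank
    # the CKSUM column counts blocks whose checksums didn't match
    zpool status tank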
 

depasseg

FreeNAS Replicant
Joined
Sep 16, 2014
Messages
2,874
Checksums are most certainly possible (and, in fact, the norm) in a single-disk pool, and still useful, in that they allow you (or, more accurately, they allow ZFS) to detect file corruption. ZFS will not knowingly provide invalid data, and it's the checksums that make that possible.

Unless the system is configured with copies=2 (or more), which effectively halves (or further reduces) the usable capacity of a single drive, recovery isn't possible. So while, yes, a single drive with a single instance of a file is checksummed, and ZFS can detect corruption, without another copy of the data somewhere it can't be corrected.
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
Certainly the data can't be recovered without another copy. I guess it's a matter of opinion whether detecting corruption, even without the possibility of repair, is "almost pointless".
 

depasseg

FreeNAS Replicant
Joined
Sep 16, 2014
Messages
2,874
Certainly the data can't be recovered without another copy. I guess it's a matter of opinion whether detecting corruption, even without the possibility of repair, is "almost pointless".

Point taken. :smile:
 

depasseg

FreeNAS Replicant
Joined
Sep 16, 2014
Messages
2,874
no pun intended
 

Tywin

Contributor
Joined
Sep 19, 2014
Messages
163
Certainly the data can't be recovered without another copy. I guess it's a matter of opinion whether detecting corruption, even without the possibility of repair, is "almost pointless".

I don't think it's pointless at all. For example, let's say I have some bulk data that is annoying to replace, but not critical to my personal survival. I absolutely want the ability to validate the integrity of this data. If I know that it has become corrupted, I can go and replace it. What I absolutely do not want is silent corruption. This is even more important if one keeps offsite backups (I'm not talking about ZFS replication here). If data became silently corrupted, I could happily overwrite my backups with corrupted data before discovering there was a problem. If, on the other hand, in the course of attempting a backup I get an error warning me that my newer data is corrupt, then I can restore from my backup first.

I am not yet running FreeNAS, so this is the situation I am currently in with my NTFS-atop-RAID5 setup. I keep a checksum file of my bulk data directories, but verifying it manually is annoying; I would love to have this feature built into the operating and file systems.
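
For reference, the manual routine I mean is roughly this (GNU coreutils; paths are examples), which is essentially the bookkeeping ZFS does automatically per block:

    # build a manifest of checksums for the bulk data
    find /data/bulk -type f -exec sha256sum {} + > bulk.sha256
    # later: verify, printing only mismatches
    sha256sum -c --quiet bulk.sha256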
 

pjc

Contributor
Joined
Aug 26, 2014
Messages
187
"When a VDev can no longer provide 100% of its data using checksums or mirrors, the VDev will fail. If any VDev in a zpool is failed you will lose the entire zpool with no chance of partial recovery. (Read this again so it sinks in)"
The first sentence is a slight overstatement. It would be true if it read "100% of its metadata", which is essentially the directory structure (to gloss over many details).

If a vdev fails entirely (the second sentence), the pool fails. In a single-drive situation, that's hardly surprising. Cyberjock's admonition here is to warn people who use striping without redundancy: any failed drive tanks the whole pool.

what happens if the scrub detects some checksum irregularities for some files?
A few files with bad checksums are not the same as a failed device. But if you get more than a trivial number of checksum errors, you're well on the way to a failed device.

And without redundancy, that means some data loss.
 