ZFS concerns


hfromM

Cadet
Joined
Nov 25, 2014
Messages
9
Hello

I have set up my FreeNAS system and I am quite happy with it so far; I use 2 x 4TB drives in a mirror setup. I went through cyberjock's ZFS guide and stumbled upon this passage:

"When a VDev can no longer provide 100% of its data using checksums or mirrors, the VDev will fail. If any VDev in a zpool is failed you will lose the entire zpool with no chance of partial recovery. (Read this again so it sinks in)"

I have trouble understanding this. For instance, I plan on using a smaller single disk with ZFS on it for data I really don't care much about; I deliberately accept drive failure as a possibility in this case.
I will perform regular scrubs on this single drive. What happens if a scrub detects checksum irregularities for some files? According to the statement above, this sounds disastrous.
Also, does a checksum error mean the file is no longer accessible?

I don't need any ZFS features besides scrubs, so is ZFS really a good option for me? It seems like overkill, and in case of unexpected disk/boot/mount trouble I would have to go through quite some hassle, compared to an NTFS disk that I could just plug into someone's PC to restore files. I just want simple, reliable storage for my mirror array and less reliable storage for a smaller disk.

Thank you
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Of course the file is no longer available if it has a checksum error (the whole pool will probably need to be destroyed as well). What's the point of trying to open a file that is not correct?

Have you ever tried corrupting a random file to see what happens?

ZFS is designed to maintain integrity. You cannot maintain integrity without mirrors or similar setups. But that applies to NTFS as well, the difference being that NTFS will happily pass you whatever corrupted crap happens to be on the disk.
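
If you want to see for yourself, a rough way to test it safely is with a throwaway file-backed pool, something like this (pool name and paths are made up; do not try this on a pool holding real data):

    # build a disposable pool on a 100 MB file vdev
    truncate -s 100m /tmp/vdev0
    zpool create testpool /tmp/vdev0
    cp /boot/kernel/kernel /testpool/testfile
    # scribble over the backing file behind ZFS's back
    zpool export testpool
    dd if=/dev/urandom of=/tmp/vdev0 bs=1m count=1 seek=30 conv=notrunc
    zpool import -d /tmp testpool
    # the scrub will report any checksum errors it could not repair
    zpool scrub testpool
    zpool status -v testpool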
 

hfromM

Cadet
Joined
Nov 25, 2014
Messages
9
The whole pool will need to be destroyed? Let's say there is a single file with a checksum error; in the statement I quoted above, it sounds like the whole vdev, and then the pool, will automatically fail. To be more specific:
-Will I still be able to access all other unharmed files to copy them somewhere?
-Should I choose UFS over ZFS for a single drive?
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
It won't fail outright, typically, but don't expect it to be reliable.

You can also forget about UFS, as it's no longer supported in 9.3.
 
L

Guest
So this is just the same as any other filesystem. If I lose both disks in a mirror (100% of the redundancy), I will lose all the data. If I have 10 mirrored pairs and I lose a whole pair (both sides), I lose the pool just the same.

There is no filesystem/volume manager on the planet that will be able to provide redundancy when all the redundancy is gone.
 

SirMaster

Patron
Joined
Mar 19, 2014
Messages
241
You could enable multiple copies on the single-disk zpool so that if there is a checksum error in a file, the file can likely be repaired from the other copy.
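
For example (pool and dataset names here are just placeholders):

    # store two copies of every block written to this dataset from now on;
    # existing data is not rewritten, so set this before loading the disk
    zfs set copies=2 tank/scratch
    zfs get copies tank/scratch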

You should not have to destroy the zpool if you have a corrupted file, even if you only have a single-disk zpool with copies set to 1. ZFS will tell you which file is corrupted and unrecoverable, and you just need to delete that file, as well as any snapshots that reference it. Your zpool will then be happy again.
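
Roughly, the cleanup goes like this (pool, file, and snapshot names are made up for the example):

    # list the files with permanent (unrepairable) errors
    zpool status -v tank
    # remove the damaged file and any snapshots that still reference it
    rm /mnt/tank/data/damaged.iso
    zfs destroy tank/data@oldsnap
    # rescan, then reset the error counters
    zpool scrub tank
    zpool clear tank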

Only if your metadata gets corrupted will you need to destroy the zpool (because it will likely be unable to import). This is less likely to happen, because by default 2 copies of metadata are kept even on a single-disk zpool. Also, if you have copies set to 2, for instance, then 3 copies of the metadata are kept by default (copies + 1).
 

no_connection

Patron
Joined
Dec 15, 2013
Messages
480
If the drive starts to throw enough errors to be kicked out of the pool, it's arguably the same thing as a broken disk, as it is likely to fail anyway.

If ZFS won't serve you a corrupt file out of the box, there should be a way to force it. Turning off checksums ought to do that if need be.
It's easy to test if you are unsure.
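
If you go looking, the knob would be something like this (dataset name made up). Note, though, that it only applies to blocks written after you set it; blocks already on disk keep the checksums they were written with:

    # disables checksumming for new writes only
    zfs set checksum=off tank/junk
    zfs get checksum tank/junk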

In this case the strong point of ZFS is knowing whether data is corrupt, not necessarily providing data integrity or redundancy.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
The whole pool will need to be destroyed? Let's say there is a single file with a checksum error; in the statement I quoted above, it sounds like the whole vdev, and then the pool, will automatically fail. To be more specific:
-Will I still be able to access all other unharmed files to copy them somewhere?
-Should I choose UFS over ZFS for a single drive?

UFS isn't an option, so give that up right now. 9.3 has no UFS support.

As for accessing the files: so long as the corruption is limited to the contents of the file and you don't have metadata corruption, you should be able to access the other files. Of course, this is so rare you shouldn't expect it. The norm is metadata corruption, which means your box might crash and on reboot the pool will never mount again. This means all data in the pool is lost forever.

Some people have had metadata corruption and a pool that would still mount, but getting data off the pool is very difficult, and the amount of data you can recover ranges from 0% to 99.9%. Again, you shouldn't be banking on that 99.9%, because that's not typical.
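
For the record, if you're ever stuck with a pool that won't import, the last-ditch options look like this (pool name made up, and no promises it works):

    # dry-run a rewind import that throws away the last few transactions
    zpool import -F -n tank
    # if the dry run looks sane, do it for real, read-only to be safe
    zpool import -F -o readonly=on tank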
 

depasseg

FreeNAS Replicant
Joined
Sep 16, 2014
Messages
2,874
So you have 2 drives mirrored in a single vdev in a single pool. If both drives fail, then the vdev fails and then the pool fails. You can have multiple vdevs in a single pool; that is what the warning is about. Say you had 4 drives: 2 mirrored pairs. If both drives in the same mirrored pair (vdev) failed, the pool is lost. The file checksum has nothing to do with the vdev/pool loss warning.
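
In zpool terms, that 4-drive layout would be created like this (pool and device names are just examples):

    # one pool, two mirror vdevs; data is striped across the vdevs
    zpool create tank mirror ada0 ada1 mirror ada2 ada3
    # shows both mirrors; lose both disks of either one and the pool is gone
    zpool status tank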

For your single drive, checksums are almost pointless (if not impossible). It's like saying I want to run a mirrored pair, but with only 1 drive. There is no protection nor way to recover a corrupt file.
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
For your single drive, checksums are almost pointless (if not impossible). It's like saying I want to run a mirrored pair, but with only 1 drive. There is no protection nor way to recover a corrupt file.
Checksums are most certainly possible (and, in fact, the norm) in a single-disk pool, and still useful, in that they allow you (or, more accurately, they allow ZFS) to detect file corruption. ZFS will not knowingly provide invalid data, and it's the checksums that make that possible.
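
You can watch that detection work, too (pool name is an example):

    zpool scrub tank
    # the CKSUM column counts blocks whose checksums didn't match
    zpool status tank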
 

depasseg

FreeNAS Replicant
Joined
Sep 16, 2014
Messages
2,874
Checksums are most certainly possible (and, in fact, the norm) in a single-disk pool, and still useful, in that they allow you (or, more accurately, they allow ZFS) to detect file corruption. ZFS will not knowingly provide invalid data, and it's the checksums that make that possible.

Unless the system is configured with copies=2 (or more), which effectively halves (or further reduces) the usable capacity of a single drive, recovery isn't possible. So while, yes, a single drive with a single instance of a file is checksummed, and ZFS can detect corruption, without another copy of the data somewhere it can't be corrected.
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
Certainly the data can't be recovered without another copy. I guess it's a matter of opinion whether detecting corruption, even without the possibility of repair, is "almost pointless".
 

depasseg

FreeNAS Replicant
Joined
Sep 16, 2014
Messages
2,874
Certainly the data can't be recovered without another copy. I guess it's a matter of opinion whether detecting corruption, even without the possibility of repair, is "almost pointless".

Point taken. :smile:
 

depasseg

FreeNAS Replicant
Joined
Sep 16, 2014
Messages
2,874
no pun intended
 

Tywin

Contributor
Joined
Sep 19, 2014
Messages
163
Certainly the data can't be recovered without another copy. I guess it's a matter of opinion whether detecting corruption, even without the possibility of repair, is "almost pointless".

I don't think it's pointless at all. For example, let's say I have some bulk data that is annoying to replace, but not critical to my personal survival. I absolutely want the ability to validate the integrity of this data. If I know that it has become corrupted, I can go and replace it. What I absolutely do not want is silent corruption. This is even more important if one keeps offsite backups (I'm not talking about ZFS replication here). If data became silently corrupted, I could happily overwrite my backups with corrupted data before discovering there was a problem. If, on the other hand, in the course of attempting a backup I get an error warning me that my newer data is corrupt, then I can restore from my backup first.

I am not yet running FreeNAS, so this is the situation I am currently in with my NTFS-atop-RAID5 setup. I keep a checksum file of my bulk data directories, but verifying it manually is annoying; I would love to have this feature built into the operating and file systems.
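
For reference, the manual routine I mean is roughly this (GNU coreutils; paths are examples), which is essentially the bookkeeping ZFS does automatically per block:

    # build a manifest of checksums for the bulk data
    find /data/bulk -type f -exec sha256sum {} + > bulk.sha256
    # later: verify, printing only mismatches
    sha256sum -c --quiet bulk.sha256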
 

pjc

Contributor
Joined
Aug 26, 2014
Messages
187
"When a VDev can no longer provide 100% of its data using checksums or mirrors, the VDev will fail. If any VDev in a zpool is failed you will lose the entire zpool with no chance of partial recovery. (Read this again so it sinks in)"
The first sentence is a slight overstatement. It would be true if it read "100% of its metadata", which is essentially the directory structure (to gloss over many details).

If a vdev fails entirely (the second sentence), the pool fails. In a single-drive situation, that's hardly surprising. Cyberjock's admonition here is to warn people who use striping without redundancy: any failed drive tanks the whole pool.

what happens if the scrub detects some checksum irregularities for some files?
A few files with bad checksums are not the same as a failed device. But if you get more than a trivial number of checksum errors, you're well on the way to a failed device.

And without redundancy, that means some data loss.
 