ECC vs non-ECC RAM and ZFS

pantss · Jan 19, 2015

I tried reading through a few pages here and there but I'm afraid I don't understand.

Does it mean that without ECC, you can lose ALL your data on ALL your drives despite them being a RAID1 configuration? Is there even RAID1 in ZFS systems? What's the point of RAID1 if memory can corrupt the entire pool? There are people who say that even ECC rams can fail, even though the failure rate is lower.

Is there a way to have a simple RAID1 setup with FREENAS without ECC rams?
How can I have an exact copy of my drive that I dump files into without worrying about data corruption or ECC rams etc?

marbus90 · Jan 19, 2015

The point of raid1 (called mirrorset in zfs) is data protection. ECC RAM can fail, but corrects itself in most cases and alerts for all cases. that way ZFS knows not to use that data and retry it.

And no, I'm not going to give advice on ZFS without ECC.

pantss · Jan 19, 2015

Correct me if I'm wrong, but the only other system is UFS right? Can I have UFS and non-ECC ram and keep my data safe?

marbus90 said:
The point of raid1 (called mirrorset in zfs) is data protection. ECC RAM can fail, but corrects itself in most cases and alerts for all cases. that way ZFS knows not to use that data and retry it.

And no, I'm not going to give advice on ZFS without ECC.

anodos · Jan 19, 2015

pantss said:
Correct me if I'm wrong, but the only other system is UFS right? Can I have UFS and non-ECC ram and keep my data safe?

As of 9.3 FreeNAS doesn't support UFS.

depasseg · Jan 19, 2015

Non-ecc means not safe. Period. The filesystem doesn't matter.

jgreco · Jan 19, 2015

depasseg said:
Non-ecc means not safe. Period. The filesystem doesn't matter.

Well, actually the filesystem does matter a little bit. Your data is at risk with any filesystem if you do not use ECC. However, ZFS relies on the host system a lot more than your average filesystem, so your data is at more risk if you use ZFS without ECC. Counterintuitively, with ZFS, your data could potentially be rewritten for purposes of error correction if the host system thinks that it detects an error - even if you think you are just reading the data.

Non-ECC memory will corrupt data regardless of the type of file system. It is just more dangerous with ZFS.

depasseg · Jan 19, 2015

Oh sure, confuse the situation with technical details.

jgreco · Jan 19, 2015

depasseg said:
Oh sure, confuse the situation with technical details.

Go suffer a double bit ECC error.

depasseg · Jan 19, 2015

I guess that was my point. The filesystem can only help to a certain degree. But without being able to trust the RAM, it's a moo point ("you know, like what a cow thinks" Joey - Friends).

jgreco · Jan 19, 2015

depasseg said:
The filesystem can only help to a certain degree.

But the filesystem can also hurt you; with ZFS, the self-healing system is designed for a trusted computing platform (which implies ECC), and can actively do damage on a non-ECC system if the memory is flaky. That extends to mere reads of the pool corrupting the pool.

With UFS, the data might be corrupted while transiting the host system (either for write or read) but a read of intact data does not result in the intact data being destroyed.

depasseg said:
The filesystem doesn't matter.

That's why I take issue with this statement; non-ECC is unsafe for your data in all cases, but it is MORE dangerous with ZFS, so therefore the filesystem does matter at least somewhat.

depasseg · Jan 19, 2015

Understood and agree that ZFS makes the choice of non-ECC even more dangerous. I don't think people understand what ECC is or what it does. So to simplify, I tried to unlink the security that ZFS (or any filesystem) provides from the safety that ECC RAM provides. I agree that ZFS is much more sensitive to RAM errors. But I don't think the majority of users with the "Can I use RAIDZ2 to the offset the risk with using non-ECC RAM?" are going to get it. Hence my summary of "Non-ECC means not safe."

cyberjock · Jan 19, 2015

depasseg said:
Understood and agree that ZFS makes the choice of non-ECC even more dangerous. I don't think people understand what ECC is or what it does. So to simplify, I tried to unlink the security that ZFS (or any filesystem) provides from the safety that ECC RAM provides. I agree that ZFS is much more sensitive to RAM errors. But I don't think the majority of users with the "Can I use RAIDZ2 to the offset the risk with using non-ECC RAM?" are going to get it. Hence my summary of "Non-ECC means not safe." :)

And that's kind of how I do it. For that question they're making an (inappropriate) link to more parity being the solution to the potential of bad RAM. There is no link between the two at all with regards to protecting your data. RAIDZ2 doesn't offset the risk of bad non-ECC RAM, just like the risk of ECC RAM doesn't mean you can use RAIDZ1. ;)

Louwrentius · Jan 20, 2015

Hello!

I'm not sure if you've seen this but people assert with some explanation that it is not true that a scrub combined with rotten memory will kill your on-disk data.

https://clusterhq.com/blog/file-systems-data-loss-zfs/
http://www.reddit.com/r/DataHoarder/comments/2suurf/how_true_is_this_statement_if_you_experience_a/

The statement is like this:

If a checksum fails due to a rotten bit being inside either the checksum or the data itself, ZFS assumes that something is wrong with the data.
In that case, if parity or mirroring is used, ZFS will use the parity or the mirror. I understand that if the data is from a mirror, it's just exactly the
same data with the same checksum. Either that data will also be corrupt, due to another rotten bit in memmory, which will just result in a lost file.

How, with parity, I can understand that you can use the parity to reconstruct the data, but how does one validate that data with a RAIDZ1 or higher?
Assuming that we read a chunk of data of one drive and that it is seen as rotten due to a bitflip, by reading all the other blocks and do the parity calculation, it can be reconstructed. If the data itself was at fault, all seems well. If the checksum was at fault, the result of the parity calculation will not match the checksum and you lose the file.

At this point, data is still valid on disk, ZFS just can't serve the data. Replace the memory and all is well.

I'm really trying to find a reasonable scenario where a scrub will actually murder the data-at-rest on your system. I want to understand why.

If the bad ram only affects data or checksums, I would expect nothing permament should happen to your data-at-rest, even with a scrub.

The scenarios where I can consieve that you may lose your entire zpool is this:

1. bitflip hits kernel code, more specifically zfs code
2. bitflip hits ZFS in-memory meta data structures

1 -> sounds like a very remote chance, only a few hundred k of 'static' memory that might get affected.
2 -> sounds way more likely, although less likely than data getting (silently) corrupted.

I wonder what I'm missing here. The risk of not using ECC memory is not just losing a few files you just were transfering, but the statement is that existing data or even the zpool can get corrupted.

What I find interesting is that even the ZFS author Matt Ahrens states that non-ECC memory is not that of an issue for home use. But as seen in the discussion on BSDNOW, it seems that the risk of meta data getting corrupted, is not even acknowledged.

I would very much appreciate it if anyone would help shed light on this and help me understand.

Richard Yao Mats Taraldsvik • 4 months ago
There seems to be a misconception that ZFS has a "destroy data if checksum fails" mode. I will probably address this in a future blog post after finishing background research on how people came to think this. However, it is categorically false. If ZFS detects a bad block, it will automatically heal it by fetching a good block and writing that as a replacement with a report that it fixed a checksum failure. If ZFS cannot find a good block, it will report the file as corrupt for the system administrator to correct. In no way does ZFS ever make a bad situation (checksum failure) worse by replacing good data with bad data.

Something else comes to mind. ZFS calculates parity when using RAIDZ(1|2|3).
So we have a RAIDZ1 array. We have some data. We calculate the parity of those two
data blocks. That parity is also checksummed I assume, BUT before the checksum of
the parity block is calculated, the parity block is already mangled by bad memory.
So in this case, the data is fine, the data will checksum fine. But now there is a small landmine
hidden in your file system as part of the parity. Even if you replace the faulty data at a later date,
In case a drive fails, data is reconstructed based on false parity data, resulting in data corruption.
Long story, but is this in any way plausible? It's not that I've read the ZFS source or anything.

cyberjock · Jan 20, 2015

I personally am not even going to discuss this further. We've already got first-hand knowledge of the problem.

ZFS will *always* serve data, even if its corrupted and you can't repair it with parity! This is easy to validate too.. look at all the RAIDZ1 people that still have had pools after replacing a failed disk and a second disk had a few errors while rebuilding.

Anyway, that's all I'm gonna say. I've said what i needed to say in my first post.

Louwrentius · Jan 20, 2015

I'm not debating the ECC vs. non-ECC topic. I'm pro-ECC and probably one of the biggest proponents ;) There is this (maybe mediocre) blogpost about 'please use ZFS with ECC' ram I wrote, one of the first hits in google if you enter 'zfs ecc'. But I do want to be better informed and understand how it works and how it doesn't. Answering, the source code is there for you to read it, basically says: "I actually don't know".

I understand that you've seen some real-life cases where faulty memory was in play, but although I don't want to downplay the relevance of these incidents, they weren't really tests under lab conditions. That's why theory does count. Because then you can exactly describe the scenario's where you know you will fuck up your entire pool if xyz happens in memory due to bitflips. To me that's a better argument than a few anekdotes about people presumingly losing their data due to bad memory. In that case, although you may not care, people can say: "well, I wasn't there."

However, I forgot this nice summary from openzfs.org:

http://open-zfs.org/wiki/Hardware

It can cause metadata corruption. Recovery from such an event will depend on what was corrupted. In the worst, case, a pool could be rendered unimportable. But then they write this:

All filesystems have poor reliability in their absolute worst case bit-flip failure scenarios. Such scenarios should be considered extraordinarily rare.

I wonder on what basis they make such a likelihood estimation.

jgreco · Jan 21, 2015

Yes, the absolute worst case bit-flip failure scenario that totally torches each individual filesystem is extraordinarily rare.

The problem is that there are a lot of less-than-absolute-worst-case bit-flip scenarios. With ZFS, for example, the data is not necessarily protected by checksum while in-core.

The issue with metadata corruption on a system that lacks the equivalent of a fsck or chkdsk is that sooner or later the corruption will exercise a codepath that isn't fully protected from the bad metadata and the host system will do bad things ranging from further corrupting the pool all the way on up to panicking.

So if you want "tests under lab conditions," you can easily write a little program that introduces an error into a metadata block and writes it out to the pool with a valid checksum. That effectively replicates the sort of thing that is worrying. Or you can modify the ZFS source code itself to introduce such errors.

Schaapje said:
But I do want to be better informed and understand how it works and how it doesn't. Answering, the source code is there for you to read it, basically says: "I actually don't know".

No, the problem here is that we do have a basic idea of how it works. The problem is that understanding any given corruption is like getting on the freeway at 70 miles per hour, opening your car door, and trying to pick up a nail with your bare hand that you've been told is laying out there on the freeway at some point. Then being asked to prove that the nail would have ended up in someone's tire, causing an accident that could have resulted in death.

We know that putting nails out on the freeway is dangerous. We've seen nails end up puncturing tires. We can't prove that every nail always ends up in a tire, and wouldn't be idiotic enough to try. We certainly do not suggest that every nail results in an accident and death, but on the other hand, we know that it can and does happen.

So if you want to do "tests under lab conditions," you need to find a way to put nails out on the highway in a controlled manner and then run them over in a reproducible manner. For that, I would suggest that you actually look at the source code and evaluate (particularly) the code paths that ingest pool metadata to see how well they're protected against bad data being introduced. Then you introduce bad data specificially designed to plausibly replicate a single bit flip and see what happens. The problem is that the answer may well be "nothing ... right now" but that at some point in the far distant future when you've got a lot more data on the system, at some critical point that metadata is used as a deciding factor to take some action that ultimately loses you some data, or, worse, your pool. You have to remember that it isn't just a matter of what happens right now, but also any effects down the road for the remaining life of that metadata block, and for the remaining life of any other metadata that contains bad/errant/incorrect info that is introduced as a side effect of that original metadata error.

jgreco · Jan 21, 2015

Schaapje said:
I'm not sure if you've seen this but people assert with some explanation that it is not true that a scrub combined with rotten memory will kill your on-disk data. [...]
In case a drive fails, data is reconstructed based on false parity data, resulting in data corruption.
Long story, but is this in any way plausible? It's not that I've read the ZFS source or anything.

There's a big difference between "a scrub combined with rotten memory will kill your on-disk data" and "a scrub combined with rotten memory could kill your on-disk data."

The former is false. The latter is true.

Note that calling "a scrub combined with rotten memory will kill your on-disk data" false specifically DOES NOT MEAN the inverse, "a scrub combined with rotten memory will not kill your on-disk data," is true.

Computer people often have some difficulty reconciling/grasping things that have some level of indefiniteness. I used to work writing critical code for medical devices, and one of the things that you learn very quickly is that many "unlikely but possible" things actually do eventually happen. You learn to engineer to assume that they WILL happen and then recover as gracefully as possible. It is perfectly reasonable to be aware that a situation is unlikely and still wish to handle it.

For a filesystem, you have to recognize that the issue isn't necessarily one of immediacy. Let's forget ZFS specifics for a moment and just talk generally. That bit-flip today might not have any effect today, but we're talking storage systems, so if that bit flip happens to land in the block allocation table (which is likely to be held in-core for a long period of time), and causes an allocated block to appear as unallocated, that might not seem like a crisis. Let's even go so far as to assume that the errantly "freed" block in question is a data block out of a regular file.

That really isn't a problem - today. The file is intact. And in all likelihood the block won't be allocated because it's a single lonely block, and the system won't try to allocate that since most allocations are of multiple blocks. But one day, when you've stressed out your filesystem by filling it to 98%, that block gets written: it gets filled with metadata for another file. Now metadata for file #2 is in a data block for file #1.

Now you remove file #1. The metadata block is (incorrectly?) freed, since it appears to be a data block belonging to file #1. Then something else overwrites that metadata block with a file data block, and suddenly you have a real mess. From a single bit flip and some not-terribly-unlikely next steps come some real structural problems.

The problem is that over the lifetime of a pool, you might be performing petabytes of reads and writes to the pool. Each individual block is a new opportunity for things to go wrong.

In case a drive fails, data is reconstructed based on false parity data, resulting in data corruption.
Long story, but is this in any way plausible? It's not that I've read the ZFS source or anything.

Specific types of memory issues combined with a pool scrub increases the risk profile by giving more opportunity for things to go wrong. Of course it is entirely plausible that this will happen to someone, somewhere, sometime. Will it happen to you? Almost certainly not. But the key concept here is that this is a totally preventable class of catastrophe.

It's like wearing seatbelts in your car. You don't get into an accident every time you get into your car. Why do you put on your seatbelt every time you get in the car? It's because it can save you from an admittedly unusual but devastating type of catastrophe. Most of us do not need to personally recreate car crashes in our own lab in order to be convinced that the minor added cost of seat belts is totally warranted.

And with that, and because Cyberjock indicates he's not interested in beating the worn away patch that used to be a bloody red spot that used to be a dead horse, I'm closing the thread.

If you wish to use non-ECC, do so with the admonition that you are placing your data at some potential risk. We assume you come to FreeNAS for the data integrity benefits of ZFS, and ECC is an important component in creating the trusted computing platform that ZFS needs in order to do that job.

Important Announcement for The TrueNAS Community.

ECC vs non-ECC RAM and ZFS

pantss

Cadet

marbus90

Guru

pantss

Cadet

anodos

Sambassador

depasseg

FreeNAS Replicant

jgreco

Resident Grinch

depasseg

FreeNAS Replicant

jgreco

Resident Grinch

depasseg

FreeNAS Replicant

jgreco

Resident Grinch

depasseg

FreeNAS Replicant

cyberjock

Inactive Account

Louwrentius

Cadet

cyberjock

Inactive Account

Louwrentius

Cadet

jgreco

Resident Grinch

jgreco

Resident Grinch

Similar threads