James Snell
Explorer · Joined Jul 25, 2013 · Messages: 50
Disclaimer: I haven't bothered to study the architectural details of ZFS. As a user of the file system, I shouldn't really need to know the details either.
I've long operated a wide range of personal servers on non-ECC memory and never had a problem I could even remotely chalk up to a memory error. I could just be lucky, but back in the day I ran a ~6-8 disk Linux LVM with zero redundancy, holding my massive personal media archive, and it never once failed me. I added and removed drives maybe once a year, and replaced the motherboard, CPU, memory, etc. many times too. No data problems, ever, all on consumer-grade hardware. Granted, I favoured Intel for everything, but I never bought ECC memory, due to cost (and being a kid for at least part of that period). Still, I've never had a problem I could remotely trace to memory faults.
I'm now operating a FreeNAS box at home with a mirrored ZFS pool, on non-ECC memory. After skimming various discussions that basically call anyone who doesn't run ECC a "n00b", "idiot" or whatever, I'm feeling like juggling my hardware around, since I can shuffle my gear so that my FreeNAS ends up on a machine with all sorts of "server-grade" glory.
My statement: I'm annoyed by my (mis?)understanding that ZFS is supposed to be very fault tolerant. I'm sure it is in various ways. But being intolerant of memory errors to the point of failing entirely (whole-zpool losses) makes this file system sound like a cool academic toy to me, not something I feel inclined to rely on. What I don't know for certain is how other file systems handle the same sorts of failures. I've read that UFS is "more stable" than ZFS, whatever that means. I've used ReiserFS and ext2/3/4 for many, many purposes, some in extremely fault-intolerant (and insanely expensive) situations, always on non-ECC memory. My only problems were bottlenecks or the weird physics of moving HDD platters (2.5" mechanical drives will never work in helicopters, take my word for that).
Suffice to say, I've lived and breathed professional Unix programming and hardcore hobbyist Linux sysadmin work for over a decade, and this notion of risking everything by not running ECC is really distracting. Am I the luckiest jerk on the planet? Why hasn't this ever come up for me?
It's not like running on ECC hardware is complicated. Heck, I happen to have some great hardware of that sort lying around, so I have options personally. But it costs a lot more, so massive droves of users out there can't or won't afford it, especially with pasts like mine. Am I totally fooling myself that this only applies to ZFS? I'm sure enough untimely failures will trash any system, but in practice, does this sort of thing really happen? Really? (*shakes you, the reader, violently, desperately*) No seriously, does it ever happen in the actual real world? Tell me a freakin' actual story!
I think any seasoned computer user can agree there's no substitute for complete, regular, and multiple backups. That said, the only times I've ever used my own backups were the one occasion a hard drive's PCB died and the rare occasions when someone else deleted data they later decided they wanted.
Perhaps I've lived a profoundly, remarkably sheltered life, and boasting of my experiences could lead others to their demise (and perhaps my own)?
It seems to me that these sorts of errors are so incredibly rare that it's feasible to build software solutions. It's remarkable how much computing power I can get for $300, but I doubt a new motherboard, CPU, and stick of ECC memory will ever fall below $500. I would have thought some BSD/Linux kernel geek would have written a memory-management module that redundantly stores and soft-checks everything in memory, reducing the situations where ECC makes a real difference. Sure, a soft approach is an inefficient use of the gear. So? Don't you have better things to do with your time? People use fancy ATmega CPUs just to drive blinky lights (Arduinos); I bet those chips have more guts than the Apollo spacecraft needed. Efficiency is also measured in the effort it takes to get a thing working for your needs. So use your hardware inefficiently if it buys you +1% stability on something you really depend on.
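To make the idea concrete, here's a minimal user-space sketch of what I mean by "soft-checking": keep a shadow copy plus a checksum for each buffer, verify on every read, and heal the primary from the shadow when it goes bad. All the names here (soft_buf, soft_alloc, soft_write, soft_read) are made up for illustration; a real kernel-level module would obviously need far more than this.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Each buffer carries a redundant shadow copy and a CRC of the data. */
typedef struct {
    size_t   len;
    uint8_t *primary;  /* working copy, handed out on reads     */
    uint8_t *shadow;   /* redundant copy used to repair         */
    uint32_t crc;      /* checksum sealed at write time         */
} soft_buf;

/* Plain bitwise CRC32 (slow, but dependency-free). */
static uint32_t crc32_calc(const uint8_t *p, size_t n)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < n; i++) {
        crc ^= p[i];
        for (int b = 0; b < 8; b++)
            crc = (crc >> 1) ^ (0xEDB88320u & (uint32_t)(-(int32_t)(crc & 1)));
    }
    return ~crc;
}

static soft_buf *soft_alloc(size_t len)
{
    soft_buf *sb = malloc(sizeof *sb);
    if (!sb) return NULL;
    sb->len     = len;
    sb->primary = calloc(1, len);
    sb->shadow  = calloc(1, len);
    sb->crc     = crc32_calc(sb->primary, len);
    return sb;
}

/* Write the full buffer: duplicate the data, then seal it with a CRC. */
static void soft_write(soft_buf *sb, const void *src)
{
    memcpy(sb->primary, src, sb->len);
    memcpy(sb->shadow,  src, sb->len);
    sb->crc = crc32_calc(sb->primary, sb->len);
}

/* Verify before handing data back; repair from the shadow if the primary
 * no longer matches its CRC. Returns 0 on success, -1 if both copies are
 * corrupt (detected but uncorrectable). */
static int soft_read(soft_buf *sb, void *dst)
{
    if (crc32_calc(sb->primary, sb->len) != sb->crc) {
        if (crc32_calc(sb->shadow, sb->len) != sb->crc)
            return -1;                              /* both copies bad  */
        memcpy(sb->primary, sb->shadow, sb->len);   /* heal from shadow */
    }
    memcpy(dst, sb->primary, sb->len);
    return 0;
}

int main(void)
{
    soft_buf *sb = soft_alloc(16);
    soft_write(sb, "important bytes");
    sb->primary[3] ^= 0x40;   /* simulate a single bit flip in RAM */
    char out[16];
    printf("read %s after bit flip\n",
           soft_read(sb, out) == 0 ? "repaired" : "failed");
    free(sb->primary); free(sb->shadow); free(sb);
    return 0;
}
```

Doubling memory use and checksumming every read is obviously wasteful, but that's exactly the trade I'm arguing is acceptable on cheap hardware.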
As long as running ZFS practically requires doubly expensive hardware, it'll be an incomplete file system, and woeful tales of misery will be associated with its use. That's especially true when it's bundled with free software bearing names like FreeNAS and NAS4Free: people are going to run such software on the throwaway hardware they already have. If they have to run it on >$500 equipment, they'll just go buy a dedicated hardware NAS unit that has everything set up for them. FreeNAS' recent categorization of UFS as a legacy feature seems to only further provoke this trajectory of user dissent.
Hardware faults will happen. Effectively losing terabytes of data over a potential single-bit error seems remarkably useless. Maybe I'll switch over to UFS with FreeNAS, even with my ECC server in play.
Now, let the crap-fest of hateful responses commence...