Open ZFS vs. Btrfs | and other file systems

Our Senior Analyst’s take on this week’s Btrfs news from Red Hat

I don’t know who said it first, but hats off to them: “The only thing worse than competition is no competition.” This adage applies equally to market making, where no competition can mean no customers, and to monopolies and monocultures. Beyond the balance of freedom and control that Open Source provides, the sheer choice found in the Open Source ecosystem is one of its greatest strengths. Name any category of software, from complete operating systems on up, and you have a plethora of choices with drastically different philosophies, licenses, countries of origin, programming languages, and user experiences. I have personally invested my volunteer time and career in Open Source hypervisors and file systems, and I am saddened to hear that a fledgling alternative to OpenZFS suffered a setback this week with Red Hat’s announcement that it is deprecating Btrfs as a “Preview” file system. SUSE continues to support Btrfs in only RAID 10-equivalent configurations, and only time will tell whether bcachefs proves a compelling alternative to OpenZFS. This vote of no confidence from Red Hat leaves OpenZFS as the only proven Open Source data-validating enterprise file system, and with that role comes great responsibility.

“In their FACES, right?” Wrong. Monocultures risk becoming vulnerable monopolies, which is why virus writers target Microsoft Windows and why we may face an “Impending Crypto Monoculture”. My colleagues with the OpenBSD project are flattered by the popularity of OpenSSH but insist that they don’t want it to be the only game in town. Monoculturalism has long been a driving factor in computing and is often self-perpetuating: why not use and standardize on a good technology? OpenSSH was the right solution at the right time and remains the de facto remote login tool on Internet-connected systems, open source and proprietary alike. The same is becoming true of OpenZFS, the community branch of Sun Microsystems’ revolutionary, and eventually open-sourced, enterprise file system.

Fortunately, like OpenSSH, OpenZFS really is as good as people say it is. OpenZFS goes to unrivaled lengths to protect your data and is highly flexible and scalable. I have addressed the merits of OpenZFS at length in various ways, and I welcome you, in fact urge you, to verify those merits on your own. I invite you to start that journey with a simple question: “Can you verify without a doubt that your data has not suffered from bit rot?” I look forward to your answer. In the meantime, I personally am confident that OpenZFS truly addresses the shortcomings of other file systems and does so in a way that is extremely accessible to me:

  • OpenZFS has been my primary store under macOS for over three years and root file system under FreeBSD
  • I have moved OpenZFS-formatted multi-terabyte USB drives from my FreeNAS system to a Raspberry Pi 3 running FreeBSD and run my backup routine without issue
  • I have helped clients configure, maintain and optimize OpenZFS-based systems ranging from one to 500 terabytes in size
  • I have watched the OpenZFS community grow to include amazing volunteers and vendors who do what was impossible with storage at any price only a few years ago
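If you want to answer the bit-rot question yourself, OpenZFS makes the check a two-command affair. A minimal sketch, assuming an existing pool named “tank” (the pool name is illustrative):

```shell
# Re-read every allocated block in the pool and verify it against its checksum
zpool scrub tank

# Review progress and results; "errors: No known data errors" in the
# output means every block checked out -- no silent bit rot
zpool status tank
```

Regular scrubs (weekly or monthly via cron or a periodic script) are the conventional way to turn “I hope my data is fine” into a verifiable statement.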

It is an honor to work with the OpenZFS community, and with iXsystems in particular, which, thanks to FreeNAS, TrueNAS and TrueOS, has put OpenZFS in more hands than any other project or product on Earth. Both are just now accelerating from a trot to a gallop, and I am very glad that they have been cautious and calculating. Drama is not something you want to associate with file systems or the hardware they run on. Thanks to Illumos, FreeBSD and FreeNAS, no one is stopping you from building a petabyte of storage with whatever hardware you can afford. You really want to get the right hardware, but no artificial barriers stand in your way. As you can imagine, iXsystems is an excellent source of the right hardware for OpenZFS, but that too is something I invite you to verify on your own. I am, after all, a geek, not a salesperson.

If it’s so good, why isn’t OpenZFS as popular as GNU/Linux?

Short answer: the OpenZFS and Linux kernel licenses are incompatible, but for a reason. It took time, but I accept Bryan Cantrill’s assertion that Sun’s CDDL was essential to keeping Sun, and later Oracle, from doing evil things with ZFS. This pains me because I am not a believer in software patents and believe that permissively-licensed software is the way forward, even if paradoxically at times. I also believe in the six reasons for GPL lovers, haters, exploiters, and others to enjoy and support GPL enforcement, because all free software licenses need to be enforced to remain meaningful. In the case of GNU/Linux, OpenZFS’ CDDL license is incompatible with the Linux kernel’s General Public License according to the Free Software Foundation and the Software Freedom Conservancy. This is presumably why OpenZFS is not even a “Preview” file system in Red Hat Enterprise Linux, as Btrfs was. To comply with each license, the end user must build OpenZFS for Linux themselves, and for what it’s worth, this sounds like a great way to stay true to GNU/Linux’s DIY community roots. Embrace the license diversity and its obligations, or agree with me that permissive licensing of each project would resolve this incompatibility without consequences.
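That “build it yourself” step is less daunting than it sounds. A sketch of the two common routes on a Debian/Ubuntu-style system (package names and steps are illustrative and vary by distribution):

```shell
# Route 1: the DKMS package compiles the CDDL-licensed module locally
# against your installed GPL kernel, keeping the two works separate
sudo apt install zfs-dkms zfsutils-linux

# Route 2: build from the upstream source tree
git clone https://github.com/openzfs/zfs.git
cd zfs
./autogen.sh && ./configure
make -j"$(nproc)"
sudo make install
```

Either way, the module is compiled on the end user’s machine rather than distributed pre-linked with the kernel, which is the usual mechanism for staying on the right side of both licenses.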

To that point, I encourage the bcachefs project to consider a permissive license that would allow its incorporation into FreeBSD, OpenBSD, NetBSD, macOS and even Windows, letting its merits shine on equal footing and in the hands of as many users as possible. Until that happens, though, the Illumos distributions, FreeBSD, TrueOS and FreeNAS remain the only tier-one OpenZFS operating systems, and thus the places you want to keep your valuable data for the foreseeable future.

Michael Dexter
Independent Analyst

11 Comments

  1. Pedro

Great article: I agree fully about the dangers of monocultures. OTOH, if we must have monocultures, we are better off with fully open source ones instead of closed-source or even restrictively-licensed ones.

In my opinion, and with the mandatory disclaimer that IANAL, the CDDL is not as incompatible with the GPL as it is claimed to be. In a certain sense you could claim the GPL is incompatible with the CDDL, but both are copyleft, so there is practically no chance that you won’t comply with the CDDL if you already comply with the GPL. Oracle won’t sue you for complying with the CDDL, and the Linux developers can’t really sue you either: not only is this what Linus would call a grey area, but what would they ask for … the source code?

The real issue with the OpenZFS license is control. Neither Linux developers nor GNU “evangelists” want to see such a huge part of the kernel suddenly fall under a license different from the GPL. This is a difficult sell in the Linux community, and perhaps the same applies to hammerfs or any other filesystem that could be worked on in the future by multiple communities.

Developing a trustworthy filesystem with the same feature set as ZFS takes a huge effort and a lot of time to get right, so yes, for the foreseeable future we are “stuck” with OpenZFS. Luckily, OpenZFS is pretty good.

    Reply
    • Chris

No, the CDDL and GPL really are incompatible and cannot be used together in a single work (e.g. the Linux kernel). There is a pretty good discussion about it here:
      https://opensource.stackexchange.com/questions/2094/are-cddl-and-gpl-really-incompatible

However, you are also right that the Linux kernel developers are not too enthusiastic about merging it into the main tree to begin with. This has not so much to do with licensing as with the monolithic nature of the filesystem. ZFS takes over huge swaths of the VFS layer and the md RAID layer in addition to providing the block-layer filesystem. It does some of its own memory management, and even provides its own ACL implementation and some network services like NFS and SMB. So yeah, it’s a monolith. Breaking all of these pieces up into smaller bits that use as much native code as possible is a big task that the ZoL group has been working hard at, but it is still pretty far from being palatable to Linus and the other kernel developers.

      That said, I’ve used ZoL extensively and it is quite easy and robust. While it would definitely be nice for it to be available in the mainline kernel, it is not such a huge barrier to overcome if you want to implement ZFS on a Linux system.

      Reply
  2. Jerry Combs

While Red Hat’s action is certainly a blow to Btrfs, its importance is a bit overblown. Red Hat currently employs no Btrfs maintainers or contributors; this is simply a matter of resources. SUSE continues to use it as the default root filesystem and has a team working on enhancing and maintaining it. I prefer ZFS, but this is not yet a death blow to Btrfs.

    See https://news.ycombinator.com/item?id=14907771

    Reply
  3. MB

SUSE uses Btrfs as its default root file system, and it is supported in the RAID0, RAID1, and RAID10 profiles … RAID5 and higher RAID levels are not supported yet, but might be enabled with a future service pack.
So it is not only RAID10, and saying otherwise implies there is something wrong with Btrfs … and that is just not the case …
The thing is, Red Hat currently lacks developers able to support Btrfs, while SUSE, Oracle and Facebook have them and will continue to support Btrfs along with others from the community.
https://www.suse.com/communities/blog/butter-bei-die-fische/
So please refrain from premature celebration …

    Reply
  4. Evi1M4chine

ZFS has a single glaring problem: its MASSIVE RAM usage.
It was designed for servers, where the main purpose of RAM is to cache the slow storage, both for reading and for writing, and where sudden power loss is not expected.
But on normal personal computers, using a whole damn GB of RAM per TB of storage is orders of magnitude away from anything acceptable! And not writing data to disk ASAP also guarantees catastrophic data loss at some point, making the entire point of ZFS moot.
It is even worse on small single-board computers, like the ARM devices that are so popular nowadays (e.g. the Raspberry Pi). Those would make very nice NAS solutions, especially with modern ones having connectors for multiple SSDs. The only thing missing is a good file system, because with often only 1 GB of RAM, using ZFS simply isn’t technically possible.
And yes, you *can* force ZFS to reduce its wastefulness. But from what I’ve read, that is inviting a ton of problems down the road, as you’re using ZFS far outside its specifications. It being slow is just the tip of the resulting iceberg.

There simply is no option out there currently. Btrfs is broken by design, in many, many aspects, and its developers behave like something between immature teenagers, SJWs, and greasy-haired stereotypical hot-blooded Southern European / American ego roosters thumping their hairy chests to keep up their pride. No thanks.

I just wish somebody would make a HomeZFS: a mod of ZFS redesigned, from the ground up, to work nicely even on the first Raspberry Pi. … Yes, with full scrubbing.

    Reply
    • Corrodias

Please tell me more about the limits of ZFS. As far as I’m aware, as long as you’re not using deduplication, the memory requirements for ZFS are pretty tame: maybe 1 GB. It wouldn’t be able to cache much, but it doesn’t need to cache much on a desktop, any more than any OS needs to cache lots of disk I/O.

      Async writes would simply mean the last second or two of writes didn’t get saved, which isn’t usually the end of the world. A sudden power loss will accomplish the same thing whether or not your application *believed* that it finished writing your videogame save just before the power went out. But I think you could run it with all sync writes if you really wanted to.
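For what it’s worth, the ARC ceiling is tunable on ZFS on Linux rather than fixed. A sketch, assuming the `zfs` kernel module is in use (the 1 GiB figure is just an example for a small machine):

```shell
# Make a 1 GiB ARC cap persistent across reboots (value is in bytes)
echo "options zfs zfs_arc_max=1073741824" | sudo tee /etc/modprobe.d/zfs.conf

# Or apply it immediately at runtime via the module parameter
echo 1073741824 | sudo tee /sys/module/zfs/parameters/zfs_arc_max
```

The ARC also shrinks under memory pressure; the cap simply bounds how much it will ever claim, which is the usual mitigation on RAM-constrained boxes.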

      Reply
  5. GreyGeek

“There simply is no option out there currently. Btrfs is broken by design,”

That’s just your opinion, not fact. And it is absurd to suggest that Btrfs developers deliberately designed Btrfs to break. Btrfs has been out of the “experimental” stage for quite a while. Personally, I’ve been using it for over two years without a single problem. I began using it on KDE Neon User Edition over two years ago, as the root filesystem on a 750 GB HD. I then added a second 750 GB HD in my second bay, added it to the pool, and balanced them as RAID1, with system, data and metadata all as RAID1.

After a while I saw no point in having my DVD/CD-ROM drive taking up a port, so I replaced it with an HD caddy and stuck another 750 GB HD in it. I re-balanced the first two drives as SINGLE for data, metadata and system to double my usable space, and the third HD became a depository for dated send & receive snapshots of @ and @home.

As an old programmer with 40 years of experience, I experiment a lot with various Linux tools and environments. Recently I evaluated ALL of the P2P/mesh networks available to Linux users (including IPFS, P2P, FreeNet, I2P and ZeroNet). I first made a snapshot of my system. After I completed my testing and evaluation, I did a rollback to that snapshot. I did it manually and it took 3 minutes, and I didn’t have to uninstall or delete anything. With 200 MB added to @ and over 500 external links, a rollback eliminated them all. Also, if a package update introduces problems, instead of spending hours or days trying to fix or revert things, one is only a 3-minute rollback away from a pristine environment.

My experience is that Btrfs is as stable as a rock and certainly ready for any uses most Linux users might have for it. The only caveat is that when installing VirtualBox and creating dynamic virtual disks, make sure you first set the nocow property on the directory you designate for storing your virtual disks. Oshunluver, an admin at Kubuntuforums.net (and in real life), has extensive experience using Btrfs. He has installed four or five distros, all booting from the same GRUB and sharing the same Btrfs pool, and at boot time he chooses which distro he wants to boot into. He fully describes his system in a series of posts, along with a series of Btrfs tutorials. Since one can do maintenance on Btrfs while it is live, he can, as I do, create snapshots of any or all of his distros while remaining live in any particular one.
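The workflow described above maps onto a handful of btrfs commands. A sketch (device names, the /mnt/pool mount point, and the VM directory are all illustrative):

```shell
# Add a second disk to an existing Btrfs filesystem
btrfs device add /dev/sdb /mnt/pool

# Rebalance so both data and metadata are stored as RAID1 across the disks
btrfs balance start -dconvert=raid1 -mconvert=raid1 /mnt/pool

# Take a dated, read-only snapshot of the root subvolume
btrfs subvolume snapshot -r / "/snapshots/root-$(date +%F)"

# Disable copy-on-write for VM disk images; must be set on the
# (still empty) directory BEFORE any image files are created in it
mkdir -p /var/lib/vms
chattr +C /var/lib/vms
```

The `chattr +C` step is the “nocow property” mentioned above: heavily rewritten files such as VirtualBox or qcow2 images otherwise fragment badly under copy-on-write.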

I just finished an evaluation of OpenZFS, which is in the repository on my Neon. Only a VERY FEW distros offer ZFS as a root file system, but there are several instructional articles about how to manually set up a distro to use ZFS as its root file system. Most Linux users are not skilled enough to attempt such a task.

    “… in many, many aspects, and its developers behave like something between immature teenagers, SJWs and greasy-haired stereotypical hot-blooded Southern European / American ego roosters thumping their hairy chests to keep up their pride. No thanks.”

    Ad hominem attacks say more about your personality and your judgment than anything about Btrfs or its developers.

As far as ZFS is concerned, the CDDL license is a valid issue. So is the fact that even for headless servers, most prefer to avoid the root-file-system issue by first setting up a standard Linux server with FAT32 or EXT4 for boot, GRUB and the basic Linux system, and then creating ZFS pools on unformatted devices. Also, while one can shrink or grow a Btrfs pool by removing or adding HDs, one can only grow a ZFS pool. With Btrfs I can roll back to any dated snapshot without destroying more recent snapshots. ZFS, on the other hand, stores its snapshots within the pool, so rolling back to an earlier snapshot destroys all snapshots made after it. ZFS, however, shines at getting and setting a very large range of pool properties, which makes it very useful in a server or multi-user environment.
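The rollback difference is easy to demonstrate on ZFS. A sketch, assuming a hypothetical dataset tank/home with two snapshots:

```shell
zfs snapshot tank/home@monday
zfs snapshot tank/home@tuesday

# Rolling back to anything other than the most recent snapshot
# requires -r, which destroys the newer @tuesday snapshot:
zfs rollback -r tank/home@monday
```

Without `-r`, `zfs rollback` only permits returning to the latest snapshot, which is the behavior described above: intermediate snapshots cannot survive a rollback past them.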

    Reply
  6. John Judenrein

    > I invite you to start that journey with a simple question: “Can you verify without a doubt that your data has not suffered from bit rot?” I look forward to your answer.

    Yes.

    $ sudo btrfs scrub status /
    scrub status for
    scrub started at Wed May 9 12:27:17 2018 and finished after 00:03:44
    total bytes scrubbed: GiB with 0 errors

    You have very low standards if you think being able to do that is somehow impressive.

    Reply
  7. c3d

    Thanks for the good writeup. Indeed, having Red Hat “pull the plug” on btrfs is a bit disappointing.

Now, to be fair, btrfs may be a bit unstable in some cases. Within my first week of using btrfs on Fedora, I lost data to a filesystem bug, which had not happened to me in eons, except with ZFS (see below). The initial problem was apparently that I had not done what GreyGeek suggests regarding qcow2 storage files for KVM virtual machines, and I ended up with a file containing a few million extents, something that btrfs apparently had some trouble with. The second problem, compounding the first, was that btrfsck just died looking at the disk, which is not cool.

    That being said, I had been using btrfs for a while on a Synology NAS with nary a peep, so it’s not like you lose data every day when you use btrfs. But there is some truth to the fact that it’s not completely mature yet… The bug with btrfsck was relatively basic. I sent a fix to the mailing list, so at least now it does not die, though it still does not know how to repair the disk.

So was my experience with ZFS any better? Well, as I hinted above, not quite. It’s annoying that you have to add it manually on Linux because of license conflicts. That means it’s not an option for my Synology NAS, for example. It also has some administrative traps. For example, the first time I used it on a Mac external USB disk, I forgot to “export” before unmounting the disk, tried to attach it to another computer, and could not find my way out of that (it may be simple; it’s just that having to “export” when you unmount a disk is a rather unusual step, and how to recover after skipping it is apparently non-obvious, if possible at all). So I lost about 350 GB of pictures to ZFS that day, though fortunately that was only data copied for a quick test, so no real harm done. Frankly, I almost gave up on ZFS that day.
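For anyone who hits the same trap: moving a pool between machines is meant to go through an explicit export/import cycle. A sketch, with “tank” as an illustrative pool name:

```shell
# On the first machine, before unplugging the disk:
zpool export tank    # unmounts the datasets and releases the pool

# On the second machine, after attaching the disk:
zpool import tank

# If the pool was never cleanly exported, import can usually
# still be forced (use with care):
zpool import -f tank
```

The forced import exists precisely for the “forgot to export” case, though recovering that way on a different host is admittedly not obvious if you don’t know the flag exists.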

    I returned to ZFS recently for my largest disk array (12T in RAID5, 5 disks). I did it mostly because it’s portable, unlike btrfs, so I can read my disk both from Linux and macOS (as long as I export it ;-). And I do agree that from an admin point of view, it’s quite nice. But as I said, not without quirks.

For instance, another somewhat surprising thing is that when things go south, ZFS by default suspends I/O on a pool. I can guess the rationale, but if your pool is a single disk, you might as well return an error immediately. Right now, I have a bad disk which I used to test ZFS corner cases. I created a sparse image on it. It failed. Good: ZFS refuses to corrupt my backup, bonus points for that. But then, things went from bad to worse. First, the pool is in “UNAVAIL” state because of “too many I/O errors”, and the disk is in “FAULTED” state. That makes sense, but as I wrote earlier, if it’s a single-disk pool with no redundancy, what does ZFS hope to achieve by suspending I/O instead of returning an error? Magic-based restoration of the lost bits?

    What is even more frustrating is that I cannot unmount it (probably because there are some I/Os pending). I cannot even see the I/O errors, because it tells me “errors: List of errors unavailable (insufficient privileges)” (as root). So now I have this bad disk which ZFS won’t release no matter what I try, but that I cannot write to. Hmmm, looking in my dictionary, “bug” is the correct way to describe that behaviour.
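For what it’s worth, the suspend-on-failure behaviour described above is governed by the pool’s `failmode` property, which defaults to waiting for the device to return. A sketch (pool name illustrative):

```shell
# Inspect the current setting; the default is "wait"
zpool get failmode tank

# "continue" returns EIO on new I/O to a failed pool instead of
# suspending it; "panic" crashes the host (for clustered failover)
zpool set failmode=continue tank
```

Setting `failmode=continue` on a single-disk, no-redundancy pool gets closer to the “just return an error” behaviour the comment asks for, though pending I/O from before the failure may still hold the pool open.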

    So ZFS is not all bad, but frankly, if it’s “really as good as people say it is” to quote your article, I hope the people in question are not me, because after losing some data to a good disk due to an easy-to-make mistake, and then having to wait for a system reboot to be able to get rid of a bad disk, I’d say there is still some room for improvement 🙂 [In my face: I’m a developer, I love open source, I should just stop complaining and fix it, right?]

    Still, I’m a bit puzzled why Apple gave up on ZFS and decided to build their own APFS instead. I suspect it was mostly a strategic decision related to their need to have it work well on small devices such as the Apple Watch. But it’s a bit of a shame. At least, you can use ZFS for data disks and I suspect once I get a good enough disk, it will prove quite usable.

    Reply
  8. kamtaot

Very nice article! I learned some new things from it. I started using FreeNAS in September 2017 and so far I have not faced any issues. I do not have any experience with Btrfs (except that I installed openSUSE Leap 15 a couple of days ago). That is when I decided to check the difference between these file systems and found this article.

I do not know many of the technical details of ZFS, but I am one highly satisfied FreeNAS user, and probably due to ZFS (I mean, it may be one of the reasons).

    Reply

Submit a Comment

Your email address will not be published. Required fields are marked *