OpenZFS: Stronger Than Ever

The 2016 OpenZFS Developer Summit and Hackathon took place September 26th and 27th in San Francisco and showcased the amazing growth that OpenZFS is experiencing as a technology and a project. I attended the first “ZFSDay” back in 2012 and remember the uncertainty surrounding ZFS: Oracle had closed the ZFS and OpenSolaris sources a mere two years earlier and there was still a risk of ZFS being fragmented at the hands of vendors. Fast forward four years and we have a strong OpenZFS project backed by a strong community and consistent vendor contributions. ZFS co-founder Matt Ahrens announced that OpenZFS has joined Software in the Public Interest (SPI), allowing the project to formally accept tax-deductible donations, the first of which included the long-overdue openzfs.org domain name.

Day 0: Social Event

The first day of the event featured an informal gathering at a Mission District restaurant where we introduced ourselves and learned what people were working on. Josh Paetzel, Alexander Motin and I met with developers from Delphix, Intel and Canonical, plus a few independents. Kudos to Colin King of Canonical for not only introducing several of us to his “stress-ng” stress testing suite but also for extolling the virtues of unified source trees like those found in the BSDs. I took the liberty of introducing him to UbuntuBSD and I look forward to more collaboration between users of the different OpenZFS platforms.

Day 1: Talks

The talks day of the event took place at the Children’s Creativity Museum in downtown San Francisco, the same venue where the original ZFSDay took place, then surrounded by the overwhelming Oracle OpenWorld conference. Josh Paetzel, Alexander Motin, Jordan Hubbard, Dru Lavigne, Warren Block, Ronald Pagani, Ash Gokhale, Erin Clark and a few other iXsystems staff attended along with me. Matt Ahrens gave his annual State of the Union address and made the point that an astonishing 45% of OpenZFS lines of code have been added since 2011. This is remarkable given how easily OpenZFS could have fragmented; instead, the project adopted “feature flags” to extend its abilities in a compatible way. To this day I am not aware of any proprietary feature flags and am grateful to every vendor who has extended OpenZFS.

Brian Behlendorf of Lawrence Livermore National Laboratory and founder of the ZFS on Linux project gave a talk, “Lustre, Supercomputers, and ZFS” which not only described LLNL’s use of OpenZFS but also its integration of OpenZFS with the Lustre parallel file system. Rather than interface at the traditional POSIX file level or block level, LLNL has integrated Lustre with the “native” OpenZFS Data Management Unit or DMU. This follows a very exciting trend in storage of bypassing the various abstractions that have accumulated over the years that each introduce latency and constraints.

Sašo Kiselkov of Nexenta and Tom Caputi of Datto went on to describe the work they are doing on improving Scrub/Resilver Performance and ZFS-Native Encryption respectively. The secret to faster scrubs and resilvers turns out to be an initial indexing of the work to be done, not unlike how rsync maps out its work. With this index built in RAM, the heavy lifting is sped up significantly. ZFS-Native Encryption is exactly what it sounds like, but it is critical that this work be reviewed by cryptography experts and the FreeBSD community prior to upstreaming. Both of these technologies should appear in OpenZFS in less than a year.

From there, Don Brady of Intel and Justin Gibbs of the FreeBSD Foundation talked about the various Fault Management frameworks for OpenZFS which include the Fault Manager Daemon on Illumos, zfsd on FreeBSD and zed on Linux. Having OpenZFS “do the right thing” when something goes wrong is obviously important and goes hand in hand with the last talk, ZFS Validation & QA by Sydney Vanda & John Salinas of Intel. Delphix has long had a ZFS test suite and this team at Intel is working to port it to GNU/Linux. I joined the discussion of the test suite at the hackathon and am exploring both what would be needed to port it to FreeBSD and what features it could use.

The Monday social event included a visit by special guest ZFS co-developer Jeff Bonwick and many familiar faces such as George Wilson, Bryan Cantrill and the various speakers. This set the stage for the less-formal Hackathon the next day at the Delphix offices.

Day 2: Updates and Hackathon

Day 2 began with quick project updates before the work began. Going feature by feature, these quick presentations included:

Eager Zero by George Wilson: When thin provisioning storage with a cloud provider, initializing additional storage can be a rather slow operation. Not unlike stuffing your gear into a collapsed tent, “Eager Zero” eagerly zeros out free space as if you built the tent before using it. The space is empty but it’s all yours to use.

Persistent L2ARC by Saso Kiselkov: The Persistent L2ARC will allow your cached data to survive a reboot and not require “warming up”. This is a huge win for latency-sensitive applications that depend on flash-or-faster data access to perform. The Persistent L2ARC works in conjunction with both the compressed ARC and the forthcoming ZFS-Native Encryption, all of which should end up in OpenZFS.
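
In practical terms, an L2ARC device is attached to a pool as a `cache` vdev; once the persistence work lands, its contents would simply survive a reboot with no new commands required. A minimal sketch, with hypothetical pool and device names:

```sh
# Attach an SSD as an L2ARC cache device (pool and device names are
# illustrative).
zpool add tank cache /dev/ada2

# Inspect the cache device's status and how much of it is in use.
zpool status tank
zpool iostat -v tank
```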

Compressed Send and Receive by Dan Kimmel: Not unlike the Persistent L2ARC, this is another “no brainer” feature that will eliminate overhead. Currently, while OpenZFS supports on-disk compression, it decompresses data as it reads it to build a send stream; that data is then often recompressed for transmission, decompressed on arrival, and compressed yet again when stored. The shortcoming of this strategy should be obvious: if you know you want the data to be compressed on both the source and the destination, you may as well preserve that compression throughout the whole process.
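
This work later surfaced in OpenZFS as the `-c` (`--compressed`) flag to `zfs send`; a minimal sketch of the round-trip-free workflow, with hypothetical pool, dataset, and host names:

```sh
# The source dataset is already compressed on disk with LZ4.
zfs set compression=lz4 tank/data
zfs snapshot tank/data@backup

# Send the stream with its blocks left compressed; the receiver stores
# them as-is, skipping the decompress/recompress round trips.
zfs send -c tank/data@backup | ssh backuphost zfs receive backup/data
```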

Device Removal by Matt Ahrens: Many OpenZFS users discover the hard way that ‘zpool add’ and ‘zpool attach’ are easily confused. The distinction is between “striping” a device into a pool and mirroring a device with one or more existing ones. Until now, the first of these could not be undone and would generally put your data at risk because of the lack of redundancy it introduces. Device Removal will transfer data off of the striped-in device and effectively mask it from the system, allowing you to repurpose the device as you originally intended. This will not yet let you remove vdevs with redundancy like RAID-Z, but it sets the stage for that ability in the future.
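
The distinction, and the new escape hatch, can be sketched with `zpool` commands (pool and device names are hypothetical; removal of a data vdev is only possible where the Device Removal work has landed):

```sh
# "attach" mirrors a new disk with an existing one: redundancy goes up.
zpool attach tank ada1 ada2

# "add" stripes a new top-level vdev into the pool: capacity goes up,
# but a lone disk added this way carries no redundancy.
zpool add tank ada3

# Device Removal migrates the data off the striped-in device and
# removes it from the pool.
zpool remove tank ada3
```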

Parity Declustered RAID for ZFS (DRAID) by Isaac Huang: This is an interesting one. Parity Declustered RAID creates a “virtual” spare device that can be resilvered very quickly. The physical spare is resilvered as needed but the initial restoration of parity is very fast. The introduction of additional redundancy strategies like this is a tribute to the freedom and flexibility of OpenZFS feature flags.

SPA Metadata Allocation Classes by Don Brady: Currently, you can have a separate log device or SLOG. This project extends that concept to other forms of OpenZFS metadata for performance and fragmentation mitigation. Using large data blocks? There is no reason your much smaller metadata should be stored the same way, and it can now be placed on separate devices.
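
This project eventually shipped in OpenZFS as the `special` allocation class; a sketch under that assumption, with hypothetical pool and device names:

```sh
# Add a mirrored "special" vdev so metadata lands on fast devices.
zpool add tank special mirror nvd0 nvd1

# Optionally route small file blocks to the special vdev as well,
# keeping only the large data blocks on the main disks.
zfs set special_small_blocks=32K tank/data
```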

Redacted send/receive by Paul Dagnelie: Only want to send a subset of the files in a snapshotted or bookmarked dataset? This feature will allow for that.
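
As this feature later took shape in OpenZFS, the workflow is roughly: clone the snapshot, delete the files that must not be sent, and use a snapshot of the clone to define the redaction. A sketch with hypothetical names (the exact syntax may differ in the released version):

```sh
# Clone the snapshot and remove the sensitive files from the clone.
zfs clone tank/data@snap tank/data-clean
rm /tank/data-clean/secrets.txt
zfs snapshot tank/data-clean@clean

# Record the redaction in a bookmark, then send the redacted stream.
zfs redact tank/data@snap book1 tank/data-clean@clean
zfs send --redact book1 tank/data@snap | zfs receive backup/data
```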

SPA import and pool recovery by Pavel Zakharov: This, in my humble opinion, is the big one. Currently, OpenZFS has a very strict policy on data corruption that locks you out of what it considers corrupt data. This all but guarantees data loss if you lack a backup, or requires a visit to an extremely expensive data recovery service. Pavel arrived at Delphix to a difficult data recovery task and took the time to automate his work. The result is the ability to import pools that are missing entire vdevs and perform rollbacks beyond the current limits. You may not get all of your data back but this is the most promising option yet to try. With ‘copies=2’ enabled you may be able to retrieve all of your data from what previously would have been an inaccessible pool.
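
A sketch of the relevant commands, with a hypothetical pool name (`-F` requests a rewind import that discards the last few transactions to reach a consistent state; `-n` previews it without actually importing):

```sh
# Ask for extra redundancy up front: keep two copies of every block
# in this dataset, at the cost of doubled space usage.
zfs set copies=2 tank/important

# Attempt a rewind import of a damaged pool.
zpool import -F -n tank   # dry run: report what a rewind would do
zpool import -F tank      # perform the rewind import
```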

Many of these features will arrive in upstream OpenZFS in November and some of the authors will grant you early access to their code. iXsystems will closely track these features to keep TrueNAS and FreeNAS first-class OpenZFS platforms. The OpenZFS project has published a project roadmap and you are welcome to help with the various efforts. There is still much work to be done but the OpenZFS project has never had such strong community infrastructure, support and participation. Many thanks to every company and individual that has helped make OpenZFS the unrivaled file system it is and I look forward to next year’s OpenZFS Developer Summit.

Michael Dexter
Senior Analyst

3 Comments

  1. Matt Weatherford

    Thank you for this update/ overview of the amazing progress happening on the OpenZFS project!

    Matt

    • Scott

      These are all great features and I’m glad that OpenZFS is moving forward.

      However, how about addressing the “elephant in the room”? When is OpenZFS going to support expanding existing arrays like plain old antiquated Raid 5?!?

      As an example, I have a ZFS2 array with 6 drives in it. I would really like to increase the array size by 4 drives, but don’t have alternative storage to back up my FreeNAS to while I break the current array and create a new ZFS2 array with all 10 drives. I can’t afford alternative storage in that capacity and can’t afford the inefficiency of 6 drives to just add 4 drives of storage (pooling two ZFS2 arrays).

      This tech is already ancient and SHOULD be part of ZFS! I’m completely confounded as to why it isn’t there.

      Am I missing something? Is this really already “in the pipeline” but it is being kept secret for some reason?

      Any help to clarify this would be greatly appreciated!

  2. Maxim Doucet

    It looks promising! Thanks for this summary and congrats to the contributors!
