OpenZFS: Stronger Than Ever
The 2016 OpenZFS Developer Summit and Hackathon took place September 26th and 27th in San Francisco and showcased the amazing growth that OpenZFS is experiencing as both a technology and a project. I attended the first “ZFSDay” back in 2012 and remember the uncertainty surrounding ZFS: Oracle had closed the ZFS and OpenSolaris sources a mere two years earlier, and there was still a risk of ZFS being fragmented at the hands of vendors. Fast forward four years and we have a thriving OpenZFS project backed by an active community and consistent vendor contributions. ZFS co-founder Matt Ahrens announced that OpenZFS has joined Software in the Public Interest (SPI), allowing the project to formally accept tax-deductible donations, the first of which included the long-overdue openzfs.org domain name.
Day 0: Social Event
The event kicked off with an informal gathering at a Mission District restaurant where we introduced ourselves and learned what people were working on. Josh Paetzel, Alexander Motin and I met with developers from Delphix, Intel and Canonical, plus a few independents. Kudos to Colin King of Canonical for not only introducing several of us to his “stress-ng” stress testing suite but also for extolling the virtues of unified source trees like those found in the BSDs. I took the liberty of introducing him to UbuntuBSD and I look forward to more collaboration between users of the different OpenZFS platforms.
Day 1: Talks
The talks took place at the Children’s Creativity Museum in downtown San Francisco, the venue of the original ZFSDay, which back then was surrounded by the overwhelming Oracle OpenWorld conference. Josh Paetzel, Alexander Motin, Jordan Hubbard, Dru Lavigne, Warren Block, Ronald Pagani, Ash Gokhale, Erin Clark and a few other iXsystems staff attended along with myself. Matt Ahrens gave his annual State of the Union address and made the point that an astonishing 45% of the lines of code in OpenZFS have been added since 2011. This is remarkable given how easily OpenZFS could have fragmented; instead, the project adopted “feature flags” so that each platform and vendor can extend its abilities without breaking compatibility with the other implementations. To this day I am not aware of any proprietary feature flags and am grateful to every vendor who has extended OpenZFS.
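To make the feature-flag idea concrete, here is a minimal Python sketch of the decision an implementation faces when importing a pool. It assumes the behavior described in the zpool-features documentation as I understand it: a feature is disabled, enabled or active; only an active feature changes the on-disk format; and an unknown active feature that is marked read-only compatible still permits a read-only import. The feature names and the `import_mode` function are hypothetical, not OpenZFS code.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    DISABLED = "disabled"  # feature not turned on for this pool
    ENABLED = "enabled"    # turned on, but the on-disk format is unchanged so far
    ACTIVE = "active"      # on-disk format now depends on the feature


@dataclass(frozen=True)
class Feature:
    guid: str              # reverse-DNS style name; the ones used below are hypothetical
    state: State
    readonly_compat: bool  # can software lacking the feature still read the pool?


def import_mode(pool_features, supported_guids):
    """Hypothetical, simplified check returning "read-write", "read-only" or "refuse"."""
    mode = "read-write"
    for f in pool_features:
        if f.guid in supported_guids or f.state is not State.ACTIVE:
            continue  # known features and inactive features never block an import
        if not f.readonly_compat:
            return "refuse"  # unknown active feature changes how data must be read
        mode = "read-only"  # unknown, but safe to read as long as we do not write
    return mode


pool = [
    Feature("com.example:fast_thing", State.ACTIVE, readonly_compat=True),
    Feature("org.example:new_layout", State.ENABLED, readonly_compat=False),
]
print(import_mode(pool, supported_guids=set()))       # read-only
print(import_mode(pool, {"com.example:fast_thing"}))  # read-write
```

Because a pool only becomes unreadable to other software once a feature is actually active, a vendor can ship a new feature without forcing a format change on anyone who never turns it on.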
Sašo Kiselkov of Nexenta and Tom Caputi of Datto went on to describe the work they are doing on improving Scrub/Resilver Performance and on ZFS-Native Encryption, respectively. The secret to faster scrubs and resilvers turns out to be performing an initial indexing of the work to be done, not unlike how rsync maps out its work. With this index built in RAM, blocks can be read in on-disk order rather than in the order they are discovered, and the heavy lifting is sped up significantly. ZFS-Native Encryption is exactly what it sounds like, but it is critical that this work be reviewed by cryptography experts and the FreeBSD community prior to upstreaming. Both of these technologies should appear in OpenZFS in less than a year.
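A simplified sketch of the two-phase idea, assuming each block pointer can be reduced to an (offset, size) pair on a single device. The function `plan_scrub_io` and its `max_gap` parameter are hypothetical illustrations, not the code Sašo presented; a real implementation would also need to bound how much RAM the index may consume.

```python
from typing import Iterable, List, Tuple

Extent = Tuple[int, int]  # (offset, size) in bytes on one device


def plan_scrub_io(found: Iterable[Extent], max_gap: int = 0) -> List[Extent]:
    """Hypothetical two-phase scrub planner.

    Phase 1 builds an in-RAM index of every block found while walking metadata.
    Phase 2 issues reads in ascending offset order, merging blocks separated by
    at most max_gap bytes so the disk sees fewer, more sequential requests.
    """
    index = sorted(found)  # the "index": blocks ordered by on-disk offset
    plan: List[Extent] = []
    for off, size in index:
        if plan:
            prev_off, prev_size = plan[-1]
            prev_end = prev_off + prev_size
            if off - prev_end <= max_gap:
                # extend the previous read to cover this block and any small gap
                plan[-1] = (prev_off, max(prev_end, off + size) - prev_off)
                continue
        plan.append((off, size))
    return plan


# Blocks arrive in metadata-walk order, which is effectively random on disk.
walk_order = [(8192, 4096), (0, 4096), (65536, 4096), (4096, 4096)]
print(plan_scrub_io(walk_order))  # [(0, 12288), (65536, 4096)]
```

Four scattered reads become two, and the first is a single sequential 12 KiB request, which is where spinning disks in particular gain the most.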
From there, Don Brady of Intel and Justin Gibbs of the FreeBSD Foundation talked about the various Fault Management frameworks for OpenZFS, which include the Fault Manager Daemon on illumos, zfsd on FreeBSD and zed on Linux. Having OpenZFS “do the right thing” when something goes wrong is obviously important and goes hand in hand with the last talk, ZFS Validation & QA by Sydney Vanda and John Salinas of Intel. Delphix has long had a ZFS test suite, and this team at Intel is working to port it to GNU/Linux. I joined the discussion of the test suite at the hackathon and am exploring both what would be needed to port it to FreeBSD and what features it could use.
The Monday social event included a visit by special guest ZFS co-developer Jeff Bonwick and many familiar faces such as George Wilson, Bryan Cantrill and the various speakers. This set the stage for the less-formal Hackathon the next day at the Delphix offices.
Day 2: Updates and Hackathon
Eager Zero by George Wilson: With thin-provisioned storage from a cloud provider, the first write to each new region can be rather slow because the provider allocates the backing storage on demand. Rather than stuffing your gear into a collapsed tent, “Eager Zero” pitches the tent first: it eagerly writes zeros to free space so that later writes land on storage that is already allocated. The space is empty but it’s all yours to use.
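A toy Python model of why zeroing ahead of time helps, assuming the backend allocates storage in fixed 1 MiB chunks on first write. The chunk size, `ThinDevice` and `eager_zero` are all hypothetical; the point is only that zeroing the free ranges up front moves every slow first write out of the application’s path.

```python
CHUNK = 1 << 20  # hypothetical 1 MiB allocation unit of the thin-provisioned backend


class ThinDevice:
    """Toy thin-provisioned disk: backing storage is allocated on first write."""

    def __init__(self):
        self.allocated = set()  # chunk numbers that already have backing storage
        self.slow_first_writes = 0

    def write(self, offset, length):
        for chunk in range(offset // CHUNK, (offset + length - 1) // CHUNK + 1):
            if chunk not in self.allocated:
                self.allocated.add(chunk)    # backend allocates on demand...
                self.slow_first_writes += 1  # ...which is the slow part


def eager_zero(dev, free_ranges):
    """Hypothetical helper: write zeros over each free (offset, length) range up front."""
    for start, length in free_ranges:
        pos, end = start, start + length
        while pos < end:
            n = min(CHUNK - pos % CHUNK, end - pos)  # never cross a chunk boundary
            dev.write(pos, n)  # a real tool would write n zero bytes here
            pos += n


dev = ThinDevice()
eager_zero(dev, [(0, 4 * CHUNK)])
print(dev.slow_first_writes)  # 4
dev.write(CHUNK + 512, 4096)  # a later application write into zeroed space
print(dev.slow_first_writes)  # 4
```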
Persistent L2ARC by Sašo Kiselkov: The Persistent L2ARC will allow your cached data to survive a reboot and not require “warming up”. This is a huge win for latency-sensitive applications that depend on flash-or-faster data access to perform. The Persistent L2ARC works in conjunction with both the compressed ARC and the forthcoming ZFS-Native Encryption, all of which should end up in OpenZFS.
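One way to picture persistence, as a sketch only: alongside the cached buffers, write a small log of index entries to the cache device, then rebuild the in-memory index from that log after a reboot instead of starting cold. The 20-byte entry format and the function names are assumptions for illustration; the real on-disk structures differ and would carry checksums, which this sketch omits.

```python
import io
import struct

# Hypothetical log entry: 64-bit block key, 64-bit cache-device offset, 32-bit size.
ENTRY = struct.Struct("<QQI")


def write_log(log, entries):
    """Append index entries describing buffers already written to the cache device."""
    for key, dev_offset, size in entries:
        log.write(ENTRY.pack(key, dev_offset, size))


def rebuild_index(log):
    """After a reboot, scan the persisted log and rebuild the in-memory index."""
    index = {}
    log.seek(0)
    while True:
        raw = log.read(ENTRY.size)
        if len(raw) < ENTRY.size:
            break  # end of log; a torn final entry is simply ignored
        key, dev_offset, size = ENTRY.unpack(raw)
        index[key] = (dev_offset, size)
    return index


log = io.BytesIO()  # stands in for a reserved region of the cache device
write_log(log, [(0xA1, 0, 131072), (0xB2, 131072, 16384)])
index = rebuild_index(log)  # "reboot": RAM is empty, but the log survived
print(index[0xB2])          # (131072, 16384)
```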
Compressed Send and Receive by Dan Kimmel: Like the Persistent L2ARC, this is another “no brainer” feature that will eliminate overhead. Currently, while OpenZFS supports on-disk compression, ‘zfs send’ decompresses data before it is transferred to another dataset, and the stream is often recompressed in preparation for transmission. From there it is decompressed on arrival and recompressed when stored. The shortcoming of this strategy should be obvious: if you want the data to be compressed on both the source and the destination, you may as well preserve that compression throughout the whole process.
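The difference between the two pipelines can be sketched in Python, with `zlib` standing in for whatever compression the pool actually uses. The function names and the `(algorithm, payload)` record shape are hypothetical; the point is simply that the compressed path never calls the compressor or the decompressor.

```python
import zlib

# On-disk blocks as stored by the source pool (zlib stands in for the real algorithm).
ON_DISK = [zlib.compress(bytes([i]) * 65536) for i in range(4)]


def send_uncompressed(blocks, ops):
    """Current behavior: every block is decompressed before it goes into the stream."""
    for b in blocks:
        ops["decompress"] += 1
        yield zlib.decompress(b)


def receive_and_store(stream, ops):
    """The receiver must compress again to store the data compressed."""
    stored = []
    for payload in stream:
        ops["compress"] += 1
        stored.append(zlib.compress(payload))
    return stored


def send_compressed(blocks):
    """Hypothetical compressed send: ship the on-disk bytes untouched, tagged with their algorithm."""
    for b in blocks:
        yield ("zlib", b)


def receive_compressed(stream):
    """The receiver writes the already-compressed bytes straight to disk."""
    return [payload for _algo, payload in stream]


ops = {"compress": 0, "decompress": 0}
legacy = receive_and_store(send_uncompressed(ON_DISK, ops), ops)
print(ops)  # {'compress': 4, 'decompress': 4}
fast = receive_compressed(send_compressed(ON_DISK))
print(fast == ON_DISK)  # True
```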
Device Removal by Matt Ahrens: Many OpenZFS users discover the hard way that ‘zpool add’ and ‘zpool attach’ are easily confused. The distinction is between “striping” a new device into a pool (‘zpool add’) and mirroring a device with one or more existing devices (‘zpool attach’). Until now, an accidental ‘zpool add’ could not be undone and would generally put your data at risk because the new device has no redundancy. Device Removal will transfer the data off of the striped-in device and effectively mask it from the system, allowing you to reuse the device the way you intended. This will not yet let you remove vdevs with redundancy like RAID-Z but sets the stage for that ability in the future.
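Conceptually, removal needs a way to find data that used to live on the removed device. A minimal sketch, assuming a remap table from old offsets to new (vdev, offset) locations and a simple allocator on the remaining vdev. `IndirectMapping`, `remove_device` and `make_bump_allocator` are hypothetical names, not OpenZFS APIs, and the actual data copy is omitted.

```python
import bisect


class IndirectMapping:
    """Hypothetical remap table: old offset ranges on the removed device -> new homes."""

    def __init__(self):
        self.starts = []   # sorted old-offset starts, for binary search
        self.entries = []  # parallel list of (old_start, length, new_vdev, new_offset)

    def add(self, old_start, length, new_vdev, new_offset):
        i = bisect.bisect_left(self.starts, old_start)
        self.starts.insert(i, old_start)
        self.entries.insert(i, (old_start, length, new_vdev, new_offset))

    def translate(self, old_offset):
        """Redirect a read aimed at the removed device to where the data now lives."""
        i = bisect.bisect_right(self.starts, old_offset) - 1
        if i >= 0:
            start, length, vdev, new_off = self.entries[i]
            if old_offset < start + length:
                return vdev, new_off + (old_offset - start)
        raise KeyError(f"offset {old_offset} was never allocated on the removed device")


def remove_device(allocated_segments, allocate_elsewhere):
    """Copy each allocated segment off the device (copy omitted) and record where it went."""
    mapping = IndirectMapping()
    for start, length in allocated_segments:
        vdev, new_off = allocate_elsewhere(length)
        mapping.add(start, length, vdev, new_off)
    return mapping


def make_bump_allocator(vdev, start):
    """Hypothetical allocator handing out consecutive space on one remaining vdev."""
    cursor = start

    def alloc(length):
        nonlocal cursor
        off, cursor = cursor, cursor + length
        return vdev, off

    return alloc


m = remove_device([(0, 8192), (65536, 4096)], make_bump_allocator("mirror-0", 1 << 30))
print(m.translate(65536 + 100))  # ('mirror-0', 1073750116)
```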
Parity Declustered RAID for ZFS (DRAID) by Isaac Huang: This is an interesting one. Parity Declustered RAID creates a “virtual” spare device whose capacity is spread across all of the drives in the vdev, so it can be resilvered very quickly with every surviving drive sharing the work. The physical spare is resilvered as needed, but the initial restoration of parity is very fast. The introduction of additional redundancy strategies like this is a tribute to the freedom and flexibility of OpenZFS feature flags.
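Why a distributed spare rebuilds faster can be shown with a toy layout. This Python sketch assumes each row of the vdev reserves one spare slot and simply rotates that slot across the drives; real dRAID layouts are more elaborate, and `spare_slot_drive` and `rebuild_targets` are hypothetical. When a drive fails, the rebuild writes land on every surviving drive instead of queueing up on a single hot spare.

```python
from collections import Counter


def spare_slot_drive(row, n_drives):
    """Hypothetical layout: the spare slot simply rotates across drives row by row."""
    return row % n_drives


def rebuild_targets(failed, n_drives, n_rows):
    """Count how many rebuilt chunks each surviving drive receives."""
    targets = Counter()
    for row in range(n_rows):
        spare = spare_slot_drive(row, n_drives)
        if spare == failed:
            continue  # the failed drive held this row's (empty) spare slot: nothing lost
        targets[spare] += 1  # the failed drive's chunk is rebuilt into this spare slot
    return targets


t = rebuild_targets(failed=3, n_drives=8, n_rows=800)
print(len(t), set(t.values()))  # 7 {100}
```

With a traditional hot spare, all 700 rebuilt chunks would be written to one disk; here each of the seven survivors absorbs 100.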
SPA Metadata Allocation Classes by Don Brady: Currently, you can place the ZFS Intent Log on a separate log device, or SLOG. This project extends that concept to other kinds of OpenZFS metadata to improve performance and mitigate fragmentation. Using large data blocks? There is no reason your small metadata blocks should be interleaved with them, and they can now be located on separate devices.
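A sketch of the routing decision, assuming three classes and a hypothetical `small_block_limit` below which data blocks join the metadata on the dedicated devices. The enum, function and parameter names are illustrative assumptions rather than the interface Don described.

```python
from enum import Enum


class AllocClass(Enum):
    NORMAL = "normal"    # the pool's regular data vdevs
    SPECIAL = "special"  # hypothetical dedicated devices for metadata (and small blocks)
    LOG = "log"          # separate intent-log device (SLOG)


def choose_class(kind, size, small_block_limit=0, has_special=True, has_log=True):
    """Toy routing policy; all names and defaults here are illustrative assumptions."""
    if kind == "zil":
        return AllocClass.LOG if has_log else AllocClass.NORMAL
    if has_special and (kind == "metadata" or size <= small_block_limit):
        return AllocClass.SPECIAL
    return AllocClass.NORMAL


print(choose_class("metadata", 16 * 1024))                 # AllocClass.SPECIAL
print(choose_class("data", 1024 * 1024))                   # AllocClass.NORMAL
print(choose_class("data", 4096, small_block_limit=8192))  # AllocClass.SPECIAL
```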
Redacted send/receive by Paul Dagnelie: Only want to send a subset of the files in a snapshotted or bookmarked dataset? This feature will allow for that.
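A minimal sketch of the filtering step, assuming redaction is expressed as a set of object (file) numbers to leave out and that the receiver is told which blocks were withheld. The `Block` type and the `WRITE`/`REDACT` record tuples are hypothetical, not the actual send stream format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    object_id: int  # which file (object) the block belongs to
    offset: int
    data: bytes


def redacted_send(blocks, redacted_objects):
    """Yield hypothetical stream records, replacing blocks of redacted files with placeholders."""
    for b in blocks:
        if b.object_id in redacted_objects:
            # the receiver learns the block exists but never sees its contents
            yield ("REDACT", b.object_id, b.offset, len(b.data))
        else:
            yield ("WRITE", b.object_id, b.offset, b.data)


blocks = [Block(7, 0, b"public"), Block(9, 0, b"secret")]
records = list(redacted_send(blocks, {9}))
print(records[1])  # ('REDACT', 9, 0, 6)
```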
SPA import and pool recovery by Pavel Zakharov: This, in my humble opinion, is the big one. Currently, OpenZFS has a very strict policy on data corruption that locks you out of what it considers corrupt data. This all but guarantees data loss if you lack a backup, or requires a visit to an extremely expensive data recovery service. Pavel arrived at Delphix to a difficult data recovery task and took the time to automate his work. The result is the ability to import pools that are missing entire vdevs and to perform rollbacks beyond the current limits. You may not get all of your data back, but this is the most promising option yet. With ‘copies=2’ enabled, you may be able to retrieve all of your data from what previously would have been an inaccessible pool.
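Two of the ideas above lend themselves to a short Python sketch: reading a block when ‘copies=2’ keeps more than one copy, and rewinding to an older transaction group (txg) when the newest pool state is damaged. Both functions, the `Uberblock` fields and the `max_rewind` window are simplified, hypothetical stand-ins; real ZFS verifies far more than a single checksum or flag.

```python
import hashlib
from dataclasses import dataclass
from typing import List, Optional


def read_with_ditto(copies: List[bytes], expected_sha256: str) -> Optional[bytes]:
    """copies=2-style read: return the first stored copy whose checksum verifies."""
    for data in copies:
        if hashlib.sha256(data).hexdigest() == expected_sha256:
            return data
    return None  # every copy is damaged


@dataclass
class Uberblock:
    txg: int       # transaction group this pool state was committed in
    root_ok: bool  # hypothetical stand-in for "the tree this uberblock points to verifies"


def pick_uberblock(uberblocks: List[Uberblock], max_rewind: int) -> Optional[Uberblock]:
    """Rewind import: try the newest state first, then fall back to older txgs."""
    newest = max(u.txg for u in uberblocks)
    for u in sorted(uberblocks, key=lambda u: u.txg, reverse=True):
        if newest - u.txg > max_rewind:
            break  # anything older is beyond the allowed rewind window
        if u.root_ok:
            return u
    return None


good = b"important data"
digest = hashlib.sha256(good).hexdigest()
print(read_with_ditto([b"imp0rtant data", good], digest) == good)  # True

ubs = [Uberblock(100, False), Uberblock(99, False), Uberblock(98, True)]
print(pick_uberblock(ubs, max_rewind=1))      # None
print(pick_uberblock(ubs, max_rewind=5).txg)  # 98
```

Widening `max_rewind` in this toy model corresponds to the “rollbacks beyond the current limits” described above: an older but intact pool state becomes reachable.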