Visualizing ZFS Performance

Many tools exist to understand ZFS performance challenges and opportunities, but a single table by renowned performance engineer Brendan Gregg will teach you to visualize the relationship between each tier of storage devices when architecting your TrueNAS or FreeNAS system.

Brendan Gregg worked closely with the ZFS Team at Sun Microsystems and later wrote the definitive book on Unix systems performance, Systems Performance. In the book, Brendan examines dozens of powerful performance analysis tools, from top(1) to DTrace, and plots his results with flame graphs to help establish baseline performance and pinpoint anomalies. I can’t recommend the book enough, and I want to talk about a single chart in it that you might overlook. In the “Example Time Scale of System Latencies” on page 20, Brendan maps the latency of one CPU cycle to one second of time and continues this mapping down through 14 more example elements of the computing stack. The resulting relative time scale ranges from one second for a CPU cycle to 32 millennia for a server to reboot. The four essential points in Brendan’s scale for ZFS administrators are one CPU cycle, main RAM access, SSD storage access, and rotational disk access.

This deceptively simple chart provides the majority of what you need to understand ZFS performance challenges and opportunities. Newer flash-based storage devices, like the NVDIMM and NVMe devices found in the new TrueNAS M-Series, bridge the gap between SSDs and system RAM, but the distinct performance tiers remain the same. Let’s break them down in turn.
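To get a feel for the scaling trick itself, here is a minimal Python sketch that stretches one CPU cycle to one second and applies the same factor to the other three tiers. The latencies in `LATENCIES` are assumed, order-of-magnitude values chosen for illustration; they are not copied from the book, so consult the original table for Brendan’s exact figures. The function names are hypothetical.

```python
# Illustrative latencies in seconds. These are assumed, order-of-magnitude
# values for the four tiers discussed here, not figures taken from the book.
LATENCIES = {
    "one CPU cycle": 0.3e-9,
    "main RAM access": 120e-9,
    "SSD storage access": 100e-6,
    "rotational disk access": 5e-3,
}


def scaled(latencies: dict[str, float], reference: str = "one CPU cycle") -> dict[str, float]:
    """Rescale every latency so that the reference entry lasts one second."""
    factor = 1.0 / latencies[reference]
    return {name: value * factor for name, value in latencies.items()}


def humanize(seconds: float) -> str:
    """Render a duration in the largest unit that keeps the value at or above 1."""
    units = [("years", 365 * 86400), ("days", 86400), ("hours", 3600), ("minutes", 60)]
    for name, size in units:
        if seconds >= size:
            return f"{seconds / size:.1f} {name}"
    return f"{seconds:.1f} seconds"


if __name__ == "__main__":
    for name, seconds in scaled(LATENCIES).items():
        print(f"{name:24s} {humanize(seconds)}")
```

With these assumed inputs, a RAM access stretches to minutes, an SSD access to days, and a disk access to months, which is the gap the rest of this article is about.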

One CPU Cycle

A CPU cycle is the one fixed point of reference for the performance of any given system, and most TrueNAS and FreeNAS systems maintain a surplus of CPU power. The operating system and services are the obvious primary consumers of this resource, but a ZFS-based storage system makes effective use of CPU resources in less obvious ways: checksumming, compressing, decompressing, and encrypting data. The data integrity guarantee made by ZFS is only possible thanks to a modern CPU’s ability to calculate and validate data block checksums on the fly, a luxury not available on previous generations of systems. The CPU is also used for continuously compressing and decompressing data, which reduces the amount of data the storage devices must read and write and yields a performance gain.
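The checksum-and-compress idea can be sketched in a few lines of Python. This is only an illustration of the principle, not how ZFS is implemented: `zlib` and SHA-256 from the standard library stand in for whatever compression and checksum algorithms a pool is configured with, and `write_block` and `read_block` are hypothetical names.

```python
import hashlib
import zlib


def write_block(data: bytes) -> tuple[bytes, bytes]:
    """Compress a block and checksum the bytes that would be stored."""
    stored = zlib.compress(data)
    checksum = hashlib.sha256(stored).digest()
    return stored, checksum


def read_block(stored: bytes, checksum: bytes) -> bytes:
    """Validate the stored bytes against their checksum, then decompress."""
    if hashlib.sha256(stored).digest() != checksum:
        raise ValueError("checksum mismatch: block is corrupt")
    return zlib.decompress(stored)


if __name__ == "__main__":
    block = b"ZFS " * 4096  # highly repetitive data compresses well
    stored, checksum = write_block(block)
    print(f"{len(block)} bytes in, {len(stored)} bytes stored")
    assert read_block(stored, checksum) == block
```

Every read pays for a hash and a decompression on the CPU, but the storage device moves fewer bytes, and silent corruption is caught before the data reaches the application.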

Encryption performed by the CPU typically takes the form of SSH for network transfers or on-disk data block encryption. Faster SSH encryption improves network performance during replication transfers, while data encryption can place an equal, if not greater, burden on the storage system than compression. In all cases, CPU-based acceleration of compression, decompression, and encryption allows storage devices to perform at their best thanks to the optimization of the data provided to them.

Main RAM Access

Like the CPU, computer memory is used by the operating system and services, but it also provides a volatile form of storage that plays a key role in ZFS performance. Computer RAM is considered volatile because its contents are lost when the computer is switched off. While RAM is dramatically slower than the CPU, it is also dramatically faster than all forms of persistent storage. ZFS uses RAM for its Adaptive Replacement Cache (ARC), which is essentially an intelligent read cache. Any data residing in the ARC, and thus in RAM, is available faster than any persistent storage device can provide it, at any cost. While ZFS is famous for aggressively using RAM, it does so for a very good reason. Adding RAM can be the best investment you can make for read performance.
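A back-of-the-envelope model shows why the cache hit ratio matters so much. The sketch below computes the average read latency as a weighted mix of RAM hits and disk misses. The two latency constants are assumed, illustrative values, and the model deliberately ignores everything else (queueing, prefetch, an L2ARC), so treat it as intuition rather than a prediction.

```python
RAM_LATENCY_S = 120e-9   # assumed latency of a read served from the ARC
DISK_LATENCY_S = 5e-3    # assumed latency of a read served from a spinning disk


def effective_read_latency(hit_ratio: float,
                           hit_latency: float = RAM_LATENCY_S,
                           miss_latency: float = DISK_LATENCY_S) -> float:
    """Average latency when hit_ratio of reads are served from the cache."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * hit_latency + (1.0 - hit_ratio) * miss_latency


if __name__ == "__main__":
    for ratio in (0.50, 0.90, 0.99):
        latency_us = effective_read_latency(ratio) * 1e6
        print(f"hit ratio {ratio:.0%}: about {latency_us:.1f} microseconds per read")
```

Because a miss is so much slower than a hit, the misses dominate the average: under these assumptions, moving from a 90% to a 99% hit ratio cuts the average read latency by roughly a factor of ten.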

SSD Storage Access

Sitting squarely between RAM and spinning disks in terms of performance are SSDs, now joined by the yet-faster NVMe cards and memory-class devices like NVDIMMs. Flash-based devices introduce persistent storage but generally pale in comparison to RAM for raw speed. With these stark differences in performance come stark differences in capacity and price, which is why a high-performance yet cost-competitive storage stack is a compromise made of several types of storage devices. This has been termed “hybrid” storage by the industry. SSDs are the only practical foundation for an “all-flash array” for the majority of users and, like the ARC, they can also supplement slower storage devices. An SSD or NVMe card is often used for a ZFS separate log device, or SLOG, to boost the performance of synchronous writes, such as those made over NFS or by a database. The result is “all-flash” write performance, and the data is quickly offloaded to spinning disks to take advantage of their capacity. Because this offloading takes place every five seconds by default, a little bit of SLOG storage goes a long way.
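The “little bit of SLOG goes a long way” claim follows from simple arithmetic: the log device only has to hold the synchronous writes that arrive between offloads. The sketch below estimates that amount. The five-second interval comes from the text above; the `intervals_in_flight` safety margin and the 10 GbE ingest rate in the example are assumptions made for illustration, and the function name is hypothetical.

```python
def slog_bytes_needed(ingest_bytes_per_s: float,
                      flush_interval_s: float = 5.0,
                      intervals_in_flight: int = 3) -> float:
    """Rough upper bound on the data a SLOG must hold at any moment.

    intervals_in_flight is an assumed safety margin covering data that is
    still being written out while new writes keep arriving.
    """
    return ingest_bytes_per_s * flush_interval_s * intervals_in_flight


if __name__ == "__main__":
    ten_gbe = 10e9 / 8  # a saturated 10 GbE link, in bytes per second
    needed_gb = slog_bytes_needed(ten_gbe) / 1e9
    print(f"about {needed_gb:.2f} GB of SLOG for a saturated 10 GbE link")
```

Even with a generous margin, the estimate lands in the tens of gigabytes, so for a SLOG device low write latency and endurance matter far more than capacity.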

On the read side, a level two ARC, or L2ARC, is typically an SSD or NVMe-based read cache that can easily be larger than computer memory of the same price. Serving data from a flash device will clearly be faster than from a spinning disk, but slower than from RAM. Note that adding an L2ARC does not mean you can cut back dramatically on computer memory, because the L2ARC index and various ZFS metadata are still kept in RAM.
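The RAM cost of that index can be estimated with a rough calculation: one in-memory header for every block cached on the L2ARC device. In the sketch below, `HEADER_BYTES` is a placeholder value, since the real per-block overhead depends on the ZFS version; the function name and the example sizes are likewise assumptions for illustration.

```python
HEADER_BYTES = 100  # placeholder: actual per-block header size varies by ZFS version


def l2arc_index_ram(l2arc_bytes: int, avg_block_bytes: int,
                    header_bytes: int = HEADER_BYTES) -> int:
    """Estimate the RAM consumed by the headers that index an L2ARC device."""
    blocks = l2arc_bytes // avg_block_bytes
    return blocks * header_bytes


if __name__ == "__main__":
    one_tib = 1 << 40
    for block_kib in (128, 8):
        ram = l2arc_index_ram(one_tib, block_kib * 1024)
        print(f"1 TiB L2ARC, {block_kib} KiB blocks: about {ram / 2**20:.0f} MiB of RAM")
```

The smaller the average block, the more headers a given L2ARC needs, so a large cache device full of small blocks can consume a meaningful share of the RAM that would otherwise hold ARC data.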

Rotational Disk Access

Finally, we reach the spinning disk. While high in capacity, disks are astonishingly slow when compared to flash and RAM. It is tempting to scoff at the relative performance of hard disks, but their low cost per terabyte guarantees their role as the heavy lifters of the storage industry for the foreseeable future. Stanley Kubrick’s HAL 9000 computer in the movie 2001 correctly predicted that the future of storage is a bunch of adjacent chips, but we are a long way from that era. Understanding the relative performance of RAM, flash, and rotating disks will help you choose the right storage components for your ZFS storage array. The highly knowledgeable sales team at iXsystems is here to help you quickly turn all of this theory into a budget for the storage system you need.

 

Michael Dexter
Senior Analyst
