Intro to ZFS: What is ZFS?


September 30, 2013

“The Z file system, originally developed by Sun™, is designed to use a pooled storage method in that space is only used as it is needed for data storage. It is also designed for maximum data integrity, supporting data snapshots, multiple copies, and data checksums. It uses a software data replication model, known as RAID-Z. RAID-Z provides redundancy similar to hardware RAID, but is designed to prevent data write corruption and to overcome some of the limitations of hardware RAID.”


ZFS is a modern 128-bit file system based on the copy-on-write model. It originates from the OpenSolaris project and first appeared in FreeBSD in 2008. ZFS has many innovative features, including an integrated volume manager with mirroring and RAID capabilities, data checksumming and compression, writable snapshots that can be transferred between systems, and many more. FreeBSD’s ZFS file system has been updated by merging improvements from the illumos project.
The current FreeBSD implementation of ZFS is ZFS pool version 28. Here is the history of ZFS versions across FreeBSD releases:
• 7.0+ – original ZFS import, ZFS v6; requires significant tuning for stable operation (no longer supported).
• 7.2 – still ZFS v6, improved memory handling, amd64 may need no memory tuning (no longer supported).
• 7.3+ – backport of new ZFS v13 code, similar to the 8.0 code
• 8.0 – new ZFS v13 code, lots of bug fixes – recommended over all past versions (no longer supported).
• 8.1+ – ZFS v14
• 8.2+ – ZFS v15
• 8.3+ – ZFS v28
• 9.0+ – ZFS v28
ZFS features:
• pooled storage (integrated volume manager)
• transactional semantics (copy-on-write)
• checksums and self-healing (scrub, resilver)
• scalability
• instant snapshots and clones
• dataset compression (lzjb)
• simplified delegable administration

Basic ZFS concepts

The ZFS file system uses two main objects: Pools and Datasets. A ZFS pool is a storage object consisting of virtual devices. These ‘vdevs’ can be:
• disk (partition, GEOM object, …)
• file (experimental purposes)
• mirror (groups two or more vdevs)
• raidz, raidz2, raidz3 (single to triple parity RAID-Z)
• spare (pseudo-vdev for hot spares)
• log (separate ZIL device, may not be raidz)
• cache (L2 cache, may not be mirror or raidz)
Each ZFS pool contains ZFS datasets. A ZFS dataset is a generic name for any of the following (a short sketch of each appears after this list):
• file system (POSIX layer)
• volume (virtual block device)
• snapshot (read-only copy of file system or volume)
• clone (file system with initial contents of a snapshot)
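To make these dataset types concrete, here is a minimal sketch that creates one of each; the pool name tank and the dataset names are placeholders used only for illustration:

# create a file system dataset inside the (placeholder) pool "tank"
zfs create tank/data
# create a 100 MB volume, exposed as a block device under /dev/zvol
zfs create -V 100m tank/vol0
# take a read-only snapshot of the file system
zfs snapshot tank/data@backup
# create a writable clone with the initial contents of that snapshot
zfs clone tank/data@backup tank/data-clone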
For more information about this, you can always check the handbook (http://www.freebsd.org/doc/en_US.ISO8859-1/books/handbook/filesystems-zfs.html).
Requirements for this tutorial:
• FreeBSD production release (9.1)
• Around 512 MB of disk space (for simulating disks)
• At least 1 GB of RAM
• Root account

Purpose of this tutorial

The purpose of this tutorial is to explore some ZFS features in a safe way to grasp the power and flexibility of this file system. We will take a look at these basic functionalities:
• Create a ZFS pool.
• Create a ZFS mirror.
• Simulate a failure on a mirrored disk.
• Replace a disk.
• Add disks to a mirrored zpool.
• Check I/O on ZFS pools.

Creating Disks and Pools

To try some ZFS features, first we need to create pools. We will use files to simulate real disks so we can test things safely. I will use the mkfile(8) utility to create some files and use those as disks. mkfile creates one or more files that are suitable for use as NFS-mounted swap areas, or as local swap areas; the file is padded with zeros by default. The size is given in bytes by default, but it can be flagged as exabytes, petabytes, terabytes, gigabytes, megabytes, kilobytes, or blocks with the e, p, t, g, m, k, or b suffixes, respectively. Now let’s create some disks! NOTE: If you don’t have the mkfile utility, you just need to follow Listing 1. Here, I’m creating 4 disks of 128 MB each, as you can see in the ls output.
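Listing 1 itself is not reproduced here, but based on the description above it would look roughly like the following sketch (the /array directory is assumed to exist already). If mkfile is not available, dd(1) or truncate(1) from the FreeBSD base system can produce equivalent backing files:

# create four 128 MB files to act as disks
mkfile 128m /array/disk00 /array/disk01 /array/disk02 /array/disk03
ls -lh /array
# alternative without mkfile: a zero-filled file via dd(1) ...
dd if=/dev/zero of=/array/disk00 bs=1m count=128
# ... or a sparse file via truncate(1)
truncate -s 128m /array/disk00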

ZPools

All ZFS file systems live in a pool, so first we need to create a zpool. We can check pools with the zpool(8) command. Before creating new zpools, you should check for existing zpools to avoid confusing them with your tutorial pools. You can check what zpools exist with zpool list:
root@apollo:/array # zpool list
no pools available

Now let’s create a zpool with zpool create:
root@apollo:/array # zpool create tutorial /array/disk00

List the current pools:
root@apollo:/array # zpool list
NAME       SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
tutorial   123M    77K   123M     0%  1.00x  ONLINE  -
root@apollo:/array #

Now let’s use the file system. Create a new file on the pool we just created (Listing 2). Here I have created a 1MB file on the newly created zpool.
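The original Listing 2 is not reproduced here; a minimal sketch of this step could be:

# write a 1 MB file into the new pool, which is mounted at /tutorial by default
mkfile 1m /tutorial/file
ls -lh /tutorial
# the space shows up as allocated in the pool
zpool list tutorial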

Creating a ZFS mirror

A pool with only one disk doesn’t offer any redundancy. Let’s create a new zpool called “example2” using a couple of disks. We will use the keyword “mirror”. As the name states, it will make a mirror using this pair of disks when we create the zpool (Listing 3).
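Listing 3 is not shown here; assuming the mirror is built from disk01 and disk02 (disk00 is already in use by the tutorial pool), the create command would look like this:

# create a mirrored pool from two of the file-backed disks
zpool create example2 mirror /array/disk01 /array/disk02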
We can check the status of our pools with the zpool status command (Listing 4).
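Since Listing 4 is not reproduced here, a minimal sketch of the check is:

# show the health and layout of every imported pool
zpool status
# or restrict the output to a single pool
zpool status example2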
Let’s create a file again and check the status after that (I’ll create a 32MB file):
mkfile 32m /example2/file
zpool list example2
NAME       SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
example2   123M  32.8M  90.2M    26%  1.00x  ONLINE  -

So now we have our data stored redundantly over the two disks.

Simulating a disk failure

Not everything is nice and calm. Sometimes bad things happen to good people, like a disk going bad at 3 a.m. Let’s simulate a disk failure. For that I’ll overwrite the first disk label with random data:
root@apollo:/ # dd if=/dev/random of=/array/disk01 bs=1024 count=1
1+0 records in
1+0 records out
1024 bytes transferred in 0.029959 secs (34180 bytes/sec)

In case you don’t know about the dd(1) command, here is what it does:
“The dd utility copies the standard input to the standard output. Input data is read and written in 512-byte blocks. If input reads are short, input from multiple reads are aggregated to form the output block. When finished, dd displays the number of complete and partial input and output blocks and truncated input records to the standard error output.”
So I wrote a single block of 1024 bytes from /dev/random to our disk01.
ZFS automatically checks for errors when it reads/writes files; we can force a check with the scrub command (Listing 5). We messed up the disk, so it shows as UNAVAIL, but no errors are reported for the pool as a whole:
“Sufficient replicas exist for the pool to continue functioning in a degraded state.”
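A sketch of forcing the check and inspecting the result (in place of Listing 5, which is not reproduced here) would be:

# walk all data in the pool and verify every checksum
zpool scrub example2
# the damaged device now shows up as UNAVAIL and the pool as DEGRADED
zpool status example2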
We still can read and write to the pool:
root@apollo:/ # ls -lrt /example2/
total 32779
-rw-------  1 root  wheel  33554432 Jul 29 22:41 file


Replacing a disk

Let’s take out the bad disk from the pool using the detach command:
root@apollo:/ # zpool detach example2 /array/disk01
Now let’s delete the damaged disk file and create a new one to simulate a new disk:
root@apollo:/ # rm /array/disk01
root@apollo:/ # mkfile 128m /array/disk01

To attach another device, we specify an existing device in the mirror to attach the new disk to with zpool attach (Listing 6); a sketch of this step is shown below. If you run zpool status quickly enough after attaching the new disk, you will see the resilver (remirroring) in progress. Once the resilver is complete, the pool is healthy again (you can also use ls to check that the files are still there):
root@apollo:/ # ls /example2/
file file2
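Listing 6 is not included here; assuming the surviving member of the mirror is /array/disk02, the attach step would look like this:

# attach the replacement file to the existing mirror member;
# ZFS starts resilvering onto the new device immediately
zpool attach example2 /array/disk02 /array/disk01
# watch the resilver progress
zpool status example2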

Adding a disk to a Mirrored ZPool

You can add disks to a zpool without taking it offline (Listing 7).
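Listing 7 is not reproduced here; assuming two additional 128 MB backing files have been created for the new pair (disk04 and disk05 below are placeholder names), the command would be along these lines:

# extend the pool with a second mirrored pair of file-backed disks
# (disk04 and disk05 are hypothetical names for the new files)
zpool add example2 mirror /array/disk04 /array/disk05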
This happens almost instantly. Now zpool status shows that the pool is composed of two mirrors (Listing 8).
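Listing 8 is not included here; the new layout can be confirmed with zpool status, which will now list two mirror vdevs inside the pool:

# the pool should now contain two mirror vdevs
zpool status example2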

Checking I/O on ZPools

If we need to check I/O on our pool, we can use zpool iostat -v (Listing 9).
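Listing 9 is not reproduced here; the command, which breaks the statistics down per vdev, is:

# per-vdev statistics: capacity (alloc/free), operations (read/write)
# and bandwidth (read/write)
zpool iostat -v example2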

All the data is currently written on the first mirror pair, as the second pair did not exist at the time the data was written.
That is all for this tutorial. Much more information on ZFS can be found at the following links:
• http://docs.oracle.com/cd/E19253-01/819-5461/
• http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide
• https://wiki.freebsd.org/ZFSTuningGuide
• http://wiki.illumos.org/display/illumos/illumos+Home
• http://manned.org/

Author:
CARLOS ANTONIO NEIRA BUSTOS
This article was re-published with the permission of BSD Magazine. To learn more about iXsystems’ commitment to open source, check us out here: https://www.ixsystems.com/about-ix/
