tl;dr
(but please read, I suspect this is a ZFS bug that FreeNAS 9.2.1 is working around.)
- Will file-based iSCSI extents have unpredictable disk usage? Will any more disk space be used beyond the size of the extent?
- Why does a zvol report "usedbydataset" at all? Isn't that only for datasets?
- Is this a ZFS bug in the zvol code, or is this blocksize-dependent usage creep intended by the ZFS designers?
I have been trying to figure out where all the space used by my 250G zvol is going. The zvol is used with iSCSI, and while the initiator has formatted it for 250G, zfs reports 405G are actually in use with only 164G logically used.
Build Specs
Code:
Build    FreeNAS-9.2.0-RELEASE-x64 (ab098f4)
Platform Intel(R) Xeon(R) CPU E3-1230 V2 @ 3.30GHz
Memory   16336MB
- pool1 is a mirror of two 1TB disks (4k, ashift=12).
- pool1/iscsi is a dataset with nothing in it. (Maybe file-based extents later.)
- pool1/iscsi/hdd0 is a zvol created with the default 8k volblocksize (a default I only recently learned about). A XenServer LVM-based Storage Repository lives here.
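In case anyone wants to reproduce what I'm seeing, this is roughly how I double-check the ashift and the zvol settings (a minimal sketch; on FreeNAS the zdb command may need -U /data/zfs/zpool.cache to find the pool config):
Code:
# Confirm the vdev ashift (12 = 4k sectors)
zdb -C pool1 | grep ashift
# Confirm the zvol block size, volume size, and reservation
zfs get volblocksize,volsize,refreservation pool1/iscsi/hdd0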
Code:
# zfs get all pool1/iscsi/hdd0
NAME              PROPERTY              VALUE                  SOURCE
pool1/iscsi/hdd0  type                  volume                 -
pool1/iscsi/hdd0  creation              Thu Feb 6 17:32 2014   -
pool1/iscsi/hdd0  used                  405G                   -
pool1/iscsi/hdd0  available             766G                   -
pool1/iscsi/hdd0  referenced            133G                   -
pool1/iscsi/hdd0  compressratio         1.12x                  -
pool1/iscsi/hdd0  reservation           none                   default
pool1/iscsi/hdd0  volsize               250G                   local
pool1/iscsi/hdd0  volblocksize          8K                     -
pool1/iscsi/hdd0  checksum              on                     default
pool1/iscsi/hdd0  compression           lz4                    inherited from pool1-san/iscsi
pool1/iscsi/hdd0  readonly              off                    default
pool1/iscsi/hdd0  copies                1                      default
pool1/iscsi/hdd0  refreservation        258G                   local
pool1/iscsi/hdd0  primarycache          all                    default
pool1/iscsi/hdd0  secondarycache        all                    default
pool1/iscsi/hdd0  usedbysnapshots       14.7G                  -
pool1/iscsi/hdd0  usedbydataset         133G                   -
pool1/iscsi/hdd0  usedbychildren        0                      -
pool1/iscsi/hdd0  usedbyrefreservation  257G                   -
pool1/iscsi/hdd0  logbias               latency                default
pool1/iscsi/hdd0  dedup                 off                    default
pool1/iscsi/hdd0  mlslabel              -
pool1/iscsi/hdd0  sync                  standard               default
pool1/iscsi/hdd0  refcompressratio      1.10x                  -
pool1/iscsi/hdd0  written               549M                   -
pool1/iscsi/hdd0  logicalused           164G                   -
pool1/iscsi/hdd0  logicalreferenced     146G                   -
The space usage is as follows: 257G usedbyrefreservation (250G reserved + 7G for metadata), 133G usedbydataset, and 14.7G usedbysnapshots = 405G used. It does add up. But why do I have any "usedbydataset" when this is not a dataset?
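(For anyone who wants to verify that accounting, "used" should be the sum of the usedby* properties; the same check with exact byte values looks like this, same dataset name assumed:)
Code:
# used should equal usedbydataset + usedbysnapshots + usedbyrefreservation + usedbychildren
zfs get -p used,usedbydataset,usedbysnapshots,usedbyrefreservation,usedbychildren pool1/iscsi/hdd0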
I have found other forum topics (here and on ServerFault) and a resolved bug report discussing the problem occurs when using smaller block sizes and disks with 4k sectors, leading to poor block utilitization and therefore more blocks needed to store the data. The only explanation I have found ([OpenIndiana-discuss] Inefficient zvol space usage on 4k drives) doesn't convince me this is expected behavior. Or if it is by design, it hardly feels sane/predictable. For example, that thread discusses how many raidz2 writes are actually being made, counting it toward zvol usage, but isn't the zdev supposed to logically present a single device that handles all of those details internally, leaving the zvol oblivious to the physical details?
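(For what it's worth, the arithmetic from that thread, as I understand it, goes roughly like the sketch below. Note my pool is a mirror, so this raidz2 math is the scenario discussed there, not my exact setup.)
Code:
# Rough raidz2 allocation math for ashift=12 (4k sectors), per block:
#   allocation = data sectors + 2 parity sectors, rounded up to a multiple of 3
#   8K volblocksize:   2 data + 2 parity = 4  -> padded to 6 sectors  = 24K on disk for 8K of data  (~3.0x)
#   128K volblocksize: 32 data + 2 parity = 34 -> padded to 36 sectors = 144K on disk for 128K of data (~1.1x)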
With everything I just outlined above, I am really not sure how to work with a zvol and get a predictable result. I cannot properly manage a SAN where I allocate 250G to XenServer via iSCSI but the reality is I have 405G allocated (and counting) out of 1TB. Even my zpool vs zfs allocations do not add up.
Code:
# zpool list
NAME    SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
pool1   928G   147G   781G    15%  1.00x  ONLINE  /mnt
# zfs list
NAME               USED  AVAIL  REFER  MOUNTPOINT
pool1              405G   508G   152K  /mnt/pool1
pool1/iscsi        405G   508G   144K  /mnt/pool1/iscsi
pool1/iscsi/hdd0   405G   766G   133G  -
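(As far as I can tell, zpool list ALLOC only counts blocks actually written to disk, while zfs list USED also includes the unwritten refreservation; a per-dataset breakdown that makes this more visible, same pool name assumed:)
Code:
# AVAIL / USED / USEDSNAP / USEDDS / USEDREFRESERV / USEDCHILD per dataset
zfs list -o space -r pool1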
(I would expect the zfs filesystem to report 257G plus snapshots, uncompressed, but that still leaves the 133G usedbydataset unaccounted for.)
I would really appreciate some advice about how to properly optimize zvol usage so that if a zvol has 257G reserved, it will not use more than that (snapshots are separate, and should be). The advice circulating, which I think was used as the resolution for bug report #2383, is to minimize the problem with a larger block size. It should not be acceptable to have 257G reserved but 275G of "real" usage (just guessing) by using a 32k block size. This is like having sparse zvol behavior on a non-sparse volume--you cannot accurately predict when it's time to add new disks* without frequent monitoring.
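(If I do end up trying the larger-block-size workaround: volblocksize can only be set at creation time, so as far as I know the data has to be copied onto a new zvol. A rough sketch, with a hypothetical name hdd0_32k, and the XenServer SR would have to be moved over afterwards:)
Code:
# volblocksize is immutable after creation, so create a new zvol and migrate
zfs create -V 250G -o volblocksize=32K pool1/iscsi/hdd0_32k
# Block-level copy from the old zvol device to the new one
dd if=/dev/zvol/pool1/iscsi/hdd0 of=/dev/zvol/pool1/iscsi/hdd0_32k bs=1M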
Unless someone can help me understand zvols better and show that this isn't really a ZFS bug that I am reporting in the wrong place (at this point, all of my testing and analysis suggests it is a bug), I am planning to use file-based extents. Those should not have unpredictable total disk usage, since the "blocksize" issue is a zvol issue, right?
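(For the file-based route, my understanding is that the extent is just a regular file in the dataset, created at a fixed size and then selected as a File extent in the FreeNAS iSCSI configuration; a sketch with a hypothetical file name:)
Code:
# Create a 250G file to back the iSCSI extent (truncate makes it sparse;
# dd from /dev/zero would pre-allocate it instead)
truncate -s 250G /mnt/pool1/iscsi/extent0.bin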
(* I have read that it's best practice to leave 20% or more of a pool free for best performance. To be clear, I'm not trying to fill up the pool to the last byte. I'm trying to know that if I allocate 80% today, it will stay allocated at 80% and not creep up to 100% over time, proper snapshot management notwithstanding.)
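(In the meantime I'm just keeping an eye on pool capacity with something like the one-liner below, which could go in a cron job:)
Code:
# Print the current pool capacity (percentage used), suitable for periodic checks
zpool list -H -o name,cap pool1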