itsScientific (Cadet) - Joined: Feb 11, 2016 - Messages: 2
I'm trying to understand my [planned] zpool expansion options. Part of that comes down to how ZFS handles its "dynamic striping" across vdevs. The Solaris documentation leaves that pretty vague, to the effect of "it does good." I think my question boils down to: which matters more in deciding how much data gets striped to each vdev - the percentage free or the raw amount of free space on that vdev?
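To make my question concrete, here's a quick Python toy model I put together (purely illustrative - this is not ZFS's actual metaslab allocator, and the distribute() helper is just my own sketch) that spreads new writes across vdevs weighted either by percent free or by raw free space:

# Toy model of write distribution across vdevs - NOT ZFS's real allocator,
# just a way to make "% free vs. raw free" concrete. Sizes and usage in TB.
def distribute(vdevs, write_tb, policy, step=0.01):
    """Spread write_tb of new data across vdevs in small slices,
    re-weighting after every slice."""
    written = 0.0
    while written < write_tb:
        if policy == "percent_free":
            weights = [(v["size"] - v["used"]) / v["size"] for v in vdevs]
        else:  # "raw_free": weight by absolute free space in TB
            weights = [v["size"] - v["used"] for v in vdevs]
        total = sum(weights)
        for v, w in zip(vdevs, weights):
            v["used"] += step * w / total
        written += step
    return ["%.0f%% full, %.1f TB free" % (100 * v["used"] / v["size"],
                                           v["size"] - v["used"]) for v in vdevs]

# A half-full 4 TB vdev next to an empty 1 TB vdev: the two policies disagree.
# Raw free (2.0 TB vs 1.0 TB) favors the big vdev; percent free (50% vs 100%)
# favors the small empty one.
print(distribute([{"size": 4, "used": 2.0}, {"size": 1, "used": 0.0}], 1.0, "raw_free"))
print(distribute([{"size": 4, "used": 2.0}, {"size": 1, "used": 0.0}], 1.0, "percent_free"))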
The plan starts simply - 2-way mirror vdevs. I'm optimizing for speed, ease of management, and ideally, incremental expansion capabilities. One of the arguments for mirror vdevs is that you can easily expand in two ways - resilver with larger drives (quick with 2 per vdev), and/or add more mirror vdevs over time.
This got me thinking: surely it would be bad if you took an 80% full zpool and merely added one vdev - ZFS would hit that new, empty vdev hard because the rest of the vdevs are quite full. Perhaps even until the new vdev hit 80% (?). In that case, you'd lose most of the benefit of striping across the large number of vdevs that a 2-way mirror layout gives you - not a slick means of expansion. It's incremental, but it nullifies a key advantage of a mirror-based pool.
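Reusing the distribute() sketch from above (same caveat: a toy model of my own, not real ZFS behavior), that scenario looks like this - four 2 TB mirror vdevs at 80% plus one freshly added empty one, then another 1 TB of writes:

# Four 2 TB vdevs at 80% full plus one new empty 2 TB vdev, then 1 TB of writes.
# With equal-size vdevs, percent free and raw free rank the vdevs the same way,
# so either policy funnels most of the new data to the new vdev at first.
pool = [{"size": 2, "used": 1.6} for _ in range(4)] + [{"size": 2, "used": 0.0}]
print(distribute(pool, 1.0, "raw_free"))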
Then I figured, when I do add new capacity, the new drives will likely be larger than the originals. Perhaps a resilver-based capacity increase, plus adding the old drives back as new vdevs, would be optimal?
Say I have two vdevs of 2 TB each (2x2TB mirrors) = 4 TB zpool. At the point I need more capacity, I'd primarily get it by individually resilvering the drives onto 4 TB replacements. If the drives used to be 80% full, they're now down to 40%. Great! Now I have an 8 TB pool. But I still only have 2 vdevs, so the added capacity brought no striping speed increase. But hey, why let the old 2 TB drives go to waste? If I add them back as fresh vdevs, I'd now have 4 vdevs to stripe over, 12 TB of storage, and roughly equal free space (about 2 TB) per vdev, though not an equal % free. If ZFS prioritizes % full, I'd lose some benefit, as data would be distributed more heavily to the empty 2 TB vdevs; if it prioritizes raw available capacity, the vdevs will quickly fall in step, with roughly equal striping across all 4.
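Running that layout through the same toy model (same caveat: my sketch, not ZFS internals) shows how much the policy would matter - two 4 TB vdevs at 40% full (2.4 TB free each) next to two empty 2 TB vdevs (2.0 TB free each):

# Two 4 TB vdevs at 40% full (2.4 TB free) plus two empty 2 TB vdevs (2.0 TB free),
# then 2 TB of new writes. Raw-free weighting keeps the stripe nearly even
# (2.4 : 2.4 : 2.0 : 2.0); percent-free weighting favors the empty 2 TB vdevs.
pool = [{"size": 4, "used": 1.6}, {"size": 4, "used": 1.6},
        {"size": 2, "used": 0.0}, {"size": 2, "used": 0.0}]
print(distribute([dict(v) for v in pool], 2.0, "raw_free"))
print(distribute([dict(v) for v in pool], 2.0, "percent_free"))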
…I’m considering all this because ZFS does not provide a way to re-balance data on vdevs. I'm ultimately stuck with "rebuild the zpool if you want to optimize striping in a new layout." I'd much prefer a "re-balance because I changed the zpool layout" feature, but I get it - that's damn complicated. So what's the best middle ground?
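The only candidate middle ground I can think of is rewriting the existing data in place (copying datasets around, or send/receive within the pool), since copy-on-write means rewritten blocks get allocated across whatever the current layout is. Here's the same toy model applied to that idea, with the obvious simplification that everything gets freed and rewritten in one go:

# Crude rebalance-by-rewrite: pretend all 3.2 TB of existing data is freed and
# rewritten through the allocator, so it lands proportionally across all 4 vdevs.
pool = [{"size": 4, "used": 1.6}, {"size": 4, "used": 1.6},
        {"size": 2, "used": 0.0}, {"size": 2, "used": 0.0}]
existing = sum(v["used"] for v in pool)        # 3.2 TB of old data
for v in pool:
    v["used"] = 0.0                            # "rewrite" means: free it, then...
print(distribute(pool, existing, "raw_free"))  # ...re-allocate it across all vdevs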
Secondly, without my digging into the code, are there any guidelines that explain what ZFS attempts to optimize in its dynamic striping - percentage full, speed of a particular vdev, error conditions, etc.?
(Good link on performance basics: http://constantin.glez.de/blog/2010/06/closer-look-zfs-vdevs-and-performance)