jgreco
Resident Grinch
That seems like a small sample size, only a few thousand disk commands.
The 44ms read latency is about the only thing that I think might be worth worrying about. ZFS is a CoW filesystem, and writes to the VM virtual disk tend to cause fragmentation. A VM backup strategy that pulls a linear read of the entire vmdk will end up doing a lot of seeking on a well-established VM (one that's had lots of random writes everywhere). Some backup software even tries that in parallel.
In ZFS we try to mitigate the resulting fragmentation (and the extra seek/read operations) by adding ARC and then L2ARC. Having multiple pools complicates this in your case, but in general the way we "fix" fragmentation is through caching, so that's probably part of the answer, especially since the main problem I see is read latency. So if and when you decide you have a problem, you can consider trying to fix this through caching by adding a 256GB SSD for L2ARC, or 64GB more RAM, or both.
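If you want to see how fragmented the pool already is, and what adding an L2ARC device would look like, something like this works (the pool name "tank" and the device name "ada3" are placeholders; substitute your own):

```shell
# Show per-pool free-space fragmentation (FRAG) and capacity used (CAP).
zpool list -o name,size,alloc,free,frag,cap

# Attach a dedicated SSD as an L2ARC read cache on an existing pool.
# "tank" and "ada3" are placeholders -- use your pool and device names.
zpool add tank cache ada3

# Confirm the cache device now appears under the pool.
zpool status tank
```

Note that L2ARC only helps once it has warmed up with your working set, and it costs some RAM to index, which is why more ARC (RAM) is usually the first thing to try.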
More disks may also help, if the pool is being driven to unreasonably busy levels during backups. The problem is that your typical hard disk just isn't really capable of a ton of IOPS, so adding more of something that might be way too slow means you've got a vaguely less-slow thing, which isn't necessarily a fix. This requires some analysis and contemplation on your part.
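A quick way to do that analysis is to watch per-vdev latency while a backup is actually running; sketch below, again with "tank" as a placeholder pool name:

```shell
# Per-vdev bandwidth, IOPS, and (-l) average latency, sampled every 5 seconds.
# Sustained read waits in the tens of milliseconds on the data vdevs during
# the backup window mean the spindles themselves are the bottleneck.
zpool iostat -vl tank 5
```

If the disks sit near their IOPS ceiling for the whole backup, more spindles (or a wider vdev layout) helps; if they're mostly idle and latency is still high, caching is the better lever.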