TL;DR: You have several systems. Install FreeNAS on one, Linux/mdadm+XFS on another, maybe a third wildcard system, and then
THUNDERDOME!
I already have the servers on hand, so I don't want to purchase "one beefy monster box". Also, it's better to have a few smaller servers than one large server (upgrades/maintenance, unexpected outages, etc.).
Gotcha. Not having a single point of failure is good, separating workloads is also good, but having too many "islands of storage" isn't. Are you thinking "two or three" or "six or seven"?
For RAM: is there an option to upgrade beyond 32GB, or are these something like early E3 Xeons that max out at that amount? Even with an all-flash back-end, RAM is still a whole lot faster than SSD, and I'd say shoot for 64GB or more.
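If you're not sure what those boards top out at, dmidecode will tell you - a quick check, assuming Linux hosts (the type-16 record is the memory array's ceiling):

```
# Run as root. DMI type 16 (Physical Memory Array) reports the board's max capacity:
dmidecode -t 16 | grep -i 'maximum capacity'

# DMI type 17 lists each DIMM slot: size, speed, and whether it's populated:
dmidecode -t 17 | grep -iE 'size|locator'
```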
Here are a few I have been reading:
https://www.reddit.com/r/linuxadmin/comments/35aomm/kvm_with_zfs
https://www.reddit.com/r/zfs/comments/5n2hrq/zfs_cow_ok_for_databasesvms/
To be fair, these articles are a little dated and generally refer to running VMs/databases on copy-on-write filesystems like BTRFS and ZFS. Maybe things have changed recently - especially on an all-SSD system, where fragmentation isn't much of an issue?
I was hoping more for "case study" than "Reddit post" ... those are indeed old news, but even in them I see people reporting that their KVM-on-ZFS setups work fine, which lines up with my personal experience (as does VMFS on ZFS) - you just need to make sure you set up your pools/vdevs/datasets and their tunables correctly. (See the bottom of this post.) Fragmentation on all-SSD is less of an issue because "seek time" is effectively 0ms.
Also, while I think ZFS is cool, my past experience using ZFS on Linux (back in the 0.6.5 days) was painful (16-bay Supermicro server with 16x 2TB Seagate enterprise drives, 32GB RAM):
* Required too much RAM to operate properly
* Read/write performance tended to degrade over time (from hundreds of MB/sec down to just a few MB/sec) for no apparent reason
* Took too much time to properly debug performance issues
After spending way too much time trying to tune the box, I reformatted the system with mdadm/XFS and moved on. No more performance issues.
At this point, I am (re)evaluating ZFS to see if it will perform as well as XFS on these high-spec'd systems.
ZoL has improved quite a bit since then; I can't recall the specifics, but I believe there were quite a few bugs in the code back then that could cause weird performance issues. In regards to overall performance, FreeNAS does a good job with its default tunables for most scenarios, but assuming you go ahead with this one, you'd probably want to change a few things (there's a rough command-line sketch after the list):
1. I'm reasonably sure that all Samsung 3D NAND drives use an 8KB internal page size. You'll want to ensure that your vdevs and pool are created with ashift=13. Otherwise, your drives will all be doing a read-modify-write for every 4K block.
2. Set recordsize=16K on the NFS export datasets. Otherwise, when you build a VM there, it will create the vdisk with big chunky 128K records. 8K would match the ashift size directly, but your sequential throughput will suffer.
3. Set atime=off - you don't need to update the access time every time you read from a .VMDK.
4. Set the tunable vfs.zfs.metaslab.lba_weighting_enabled=0 - by default, the ZFS metaslab allocator treats all drives like spinning platters, where the "outer edge" is moving faster and has better performance. Since you have all-flash, you can turn off the LBA weighting.
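Putting those together, here's a rough shell sketch - the pool name (tank), dataset name (tank/vmstore), and disk devices (da0-da3) are all hypothetical placeholders, and on FreeNAS you'd normally create the pool through the GUI and persist the sysctls under System -> Tunables:

```
# FreeBSD/FreeNAS: force ashift=13 (8K sectors) on newly created vdevs
sysctl vfs.zfs.min_auto_ashift=13

# Hypothetical pool layout: two mirrored pairs striped together
zpool create tank mirror da0 da1 mirror da2 da3

# Sanity-check that the vdevs really came up with ashift=13
zdb -C tank | grep ashift

# Dataset backing the NFS export: 16K records, no access-time updates
zfs create -o recordsize=16K -o atime=off tank/vmstore

# All-flash pool, so disable rotational LBA weighting in the allocator
sysctl vfs.zfs.metaslab.lba_weighting_enabled=0
```

One note on ordering: recordsize only applies to blocks written after it's set, so get the dataset properties in place before you copy any VMs over.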
Again, back to my TL;DR at the top, though: you have several systems, and presumably no one breathing down your neck to "implement this right now!"
Have yourself a Storage War!