All-flash FreeNAS build w/ 10GbE

5mall5nail5

Dabbler
Joined
Apr 29, 2020
Messages
14
Hey all - not new to FreeNAS or ZFS as a whole, but it's been a while. I have 3 large-ish ESXi hosts in my lab, each with (2) E5-2680v2 CPUs, 384GB of RAM, and various spindles and SSDs hanging off local hardware RAID controllers. My RAID controllers are no longer on the vSphere 7 HCL, so I am looking to move to a dedicated storage server plus 2 ESXi hosts in my lab.

The FreeNAS server will be a 36-bay (3.5") chassis with (2) E5-2680v2 CPUs and 384GB of RAM. I have an LSI 9207-8i controller ready to go in. I plan on the following vdevs/pools:
  • (16) SATA Samsung PM863 960GB SSDs - mirrors
  • (10) SATA Intel S3710 400GB SSDs - mirrors
  • (10) SATA 8TB 5400 RPM drives - some configuration of striped RAIDZ1, I think, with an NVMe SLOG device. I just need capacity for large data (media, logs, etc.), mostly read IO here. I do some DVR to a VM, so it would be good to have SOME write performance, but nothing like the flash tier. If I can add a SLOG to help with that, even better. (Rough CLI sketch of the layout I have in mind just below this list.)
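
Roughly what I have in mind, written out as CLI just for clarity (pool names and the da* device names are placeholders, and the spinner layout is just one possible arrangement):

    # Flash tier: (16) PM863 960GB as 8 x 2-way mirrors
    zpool create flash \
      mirror da0 da1 mirror da2 da3 mirror da4 da5 mirror da6 da7 \
      mirror da8 da9 mirror da10 da11 mirror da12 da13 mirror da14 da15

    # Fast tier: (10) S3710 400GB as 5 x 2-way mirrors
    zpool create fast \
      mirror da16 da17 mirror da18 da19 mirror da20 da21 \
      mirror da22 da23 mirror da24 da25

    # Capacity tier: (10) 8TB spinners as two striped RAIDZ1 vdevs (one possible layout)
    zpool create bulk \
      raidz1 da26 da27 da28 da29 da30 \
      raidz1 da31 da32 da33 da34 da35
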
This storage is going to serve primarily as lab VM storage. I have a BUNCH of VMs/containers that run out of vSphere, so I will connect this to my Arista 10GbE switch via (2) 10GbE interfaces for iSCSI multipath. I have (2) dual-port Broadcom BCM57810 NICs (SFP+) planned for this.
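
On the ESXi side I expect to set the iSCSI LUNs to round robin across both paths; if I remember the CLI right, it's something like this (the naa ID below is just a placeholder):

    # Use round robin path selection for the FreeNAS LUN (placeholder device ID)
    esxcli storage nmp device set --device naa.XXXXXXXXXXXXXXXX --psp VMW_PSP_RR

    # Optionally switch paths every IO instead of the default 1000
    esxcli storage nmp psp roundrobin deviceconfig set --device naa.XXXXXXXXXXXXXXXX --type iops --iops 1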

I haven't created an all-SSD pool yet, so I am not sure if there are any special options I should set during pool creation, such as ashift. And I haven't done much with zvols/iSCSI in ZFS/FreeNAS, so are there any special considerations around block size or anything else for VM storage (VMFS 6)? I am coming from hardware RAID with FastPath IO (or whatever it's called) to shared storage, so I just want to make sure I see the same or similar performance on the flash tier. I am not opposed to putting the spinning disks and SSDs on separate 9207-8i controllers if need be - they are going to be on separate backplanes anyway.
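
For reference, I assume the CLI equivalent of what I'd be doing is roughly the below; I just don't know what volblocksize makes sense for VMFS 6 (pool/zvol names and size are placeholders):

    # Create a sparse zvol to export over iSCSI (size and volblocksize TBD)
    zfs create -s -V 4T -o volblocksize=16K flash/vmfs01

    # Double-check what it ended up with
    zfs get volblocksize,volsize,compression flash/vmfs01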

I have built a ~832TB ZFS on Linux storage platform before but I was doing NFS datastores to vSphere and had Intel P3700 NVMe SSDs for SLOG, etc. I have a blog post on that build here: https://www.jonkensy.com/832-tb-zfs-on-linux-project-cheap-and-deep-part-1/

Do you think it's worth it to get an Optane-based SLOG device for my SATA spinning pool? If so, which one? Any general pointers on config? I have a FreeNAS 9.10 setup still running for another lab, but it's all 7,200 RPM enterprise disks and I don't touch it; it just runs (obviously, at that patch level).

Thanks for any advice all!
 

Yorick

Wizard
Joined
Nov 4, 2018
Messages
1,912
Are you chill with losing your HDD pool on resilver? If that'd be a hassle, I'd consider raidz2.

How is the HDD pool accessed? Will these be sync writes? If not, leave SLOG off.

How about the iSCSI targets? Will those be sync writes? If so, you want SLOG for those.

ashift=12 is standard by now I think, but verify.
No thoughts on block size, I'll leave that to people with actual experience.
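
If you want to verify the ashift after creating the pool, I believe something like this works on FreeNAS (pool name is whatever you call yours; the cache file lives in /data/zfs, if I remember right):

    # Floor for ashift on newly added vdevs; recent FreeNAS should already default this to 12
    sysctl vfs.zfs.min_auto_ashift=12

    # Confirm what each vdev actually got
    zdb -U /data/zfs/zpool.cache -C yourpool | grep ashift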

Broadcom: Maybe? Do you have evidence these work well in FreeNAS? They're not the "tried and true" go-to choice. See https://www.ixsystems.com/community/resources/10-gig-networking-primer.42/.

As for choice of SLOG device, start here: https://www.servethehome.com/buyers...as-servers/top-picks-freenas-zil-slog-drives/
 

5mall5nail5

Dabbler
Joined
Apr 29, 2020
Messages
14
I have Veeam backups, so it's not HORRIBLE if I lose a disk during a resilver. Two striped RAIDZ1 vdevs have almost the same statistical failure rate as a single RAIDZ2, without the double-parity write penalty. I am talking about striping two RAIDZ1 vdevs, so 4-disk vdev x 2, vs. a single 8-disk RAIDZ2. The SSDs all have PLP, but obviously the SATA spinning pool does not.

I am not really sold on sync writes in this case, as there are so many other failure points in a lab. Would I want a SLOG for sync writes to the iSCSI pool on the SSDs? That seems unnecessary to me, but I am not sure.

Yes, the BCM57810s work well with FreeBSD/FreeNAS in my experience. They're nice NICs. I also have some Chelsio and Intel X520s, but the 57810s are good IMHO. I have a ton of lab equipment so not super concerned about those NICs specifically.
 

Yorick

Wizard
Joined
Nov 4, 2018
Messages
1,912
Would I want a SLOG for sync writes to the iSCSI pool on the SSDs? That seems unnecessary to me, but I am not sure

Depends on the speed you are looking for. There is a world of difference between a SATA SSD and Optane (or even "just" NVMe) when it comes to SLOG duty; the IOPS don't even begin to compare. You can always start without a SLOG, see whether your write speeds are sufficient for your use case, and see how they differ with sync on and sync off. "Sync off" is, as you know, the best-case performance you might get close to with "sync on" and a SLOG.
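
If you want to compare, I believe the quick way is to flip the sync property on the test zvol/dataset and rerun the same workload (dataset name is a placeholder):

    # Force everything synchronous - worst case, and what a SLOG would help with
    zfs set sync=always pool/yourzvol

    # Ignore sync requests - best-case ceiling, not something to leave on for real VM data
    zfs set sync=disabled pool/yourzvol

    # Back to honoring whatever the initiator asks for (the default)
    zfs set sync=standard pool/yourzvol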

Edit: I should add I am theory-crafting. I have no hands-on experience.
 

Yorick

Wizard
Joined
Nov 4, 2018
Messages
1,912
almost the same statistical failure rate as a single RAIDZ2, without the double-parity write penalty

I am wondering about that. If you view disk failure rate as evenly distributed, that's true. But is it? It's the resilver itself that increases the risk of disk failure in the vdev that's being resilvered, because of the additional stress on the disk.
That said, hey, you may be okay "forever", and you have backups.
 

5mall5nail5

Dabbler
Joined
Apr 29, 2020
Messages
14
I am wondering about that. If you view disk failure rate as evenly distributed, that's true. But is it? It's the resilver itself that increases the risk of disk failure in the vdev that's being resilvered, because of the additional stress on the disk.
That said, hey, you may be okay "forever", and you have backups.

Yes, but I think about it this way - since ZFS only rebuilds used data (unlike hardware RAID), take a pool with 36TB used out of 48TB (rounding loosely). On an 8-drive RAIDZ2, a resilver has to reconstruct about 36TB / 6 data disks = 6TB onto the replacement while reading from all 7 surviving drives. With two striped 4-disk RAIDZ1 vdevs, only that vdev's share is involved - roughly 36TB / 2 = 18TB - so the resilver rebuilds about 18TB / 3 = 6TB while reading from just 3 other drives, and the second vdev isn't touched at all. About the same amount lands on the new disk, but far less total data gets read and the resilver stress stays inside one vdev. I haven't done deeper research, but it seems better in some situations. I am not talking about doing a single 8-disk RAIDZ1.
 

StoreMore

Dabbler
Joined
Dec 13, 2018
Messages
39
Some food for thought on your SLOG for SSDs. I was talking to iXsystems the other day for some official hardware quotes. We were discussing an all-SSD array of at least 8 disks when the SLOG topic came up. Their response was that if you are running all SSDs, a separate SLOG device generally isn't necessary. And if it does make sense for a specific situation, the SLOG will generally need to be faster than the pool of SSDs to see any performance benefit.

For spinning disks, as @Yorick pointed out above, it only makes sense in specific cases. General file copies won't likely benefit. If you're running VMs with sync writes on spinning disks, you will see a benefit from the SLOG, but again it's workload-specific.
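
If you do end up testing it on the spinning pool, adding and removing a log device is non-destructive, so it's easy to A/B (pool and device names are placeholders):

    # Attach an NVMe/Optane device as the SLOG
    zpool add tank log nvd0

    # Pull it back out later if it doesn't earn its keep - pool data is unaffected
    zpool remove tank nvd0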
 

5mall5nail5

Dabbler
Joined
Apr 29, 2020
Messages
14
Some food for thought on your SLOG for SSDs. I was talking to iXsystems the other day for some official hardware quotes. We were discussing an all-SSD array of at least 8 disks when the SLOG topic came up. Their response was that if you are running all SSDs, a separate SLOG device generally isn't necessary. And if it does make sense for a specific situation, the SLOG will generally need to be faster than the pool of SSDs to see any performance benefit.

For spinning disks, as @Yorick pointed out above, it only makes sense in specific cases. General file copies won't likely benefit. If you're running VMs with sync writes on spinning disks, you will see a benefit from the SLOG, but again it's workload-specific.

Totally agree - sound logic. I wouldn't put a SLOG on the SSD pool, only on the SATA spinner pool. I just tore down this bare-metal ESXi host, swapped the RAID controller for the HBA, and jammed some disks and SSDs in it to do some testing.
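
For the actual testing I'll probably just run fio against a dataset on each tier and compare sync vs. async writes, roughly along these lines (paths and sizes are placeholders, and assuming fio is available on the box):

    # Random 16K sync writes - roughly what sync VM traffic looks like
    fio --name=syncwrite --directory=/mnt/flash/test --rw=randwrite --bs=16k \
        --ioengine=posixaio --sync=1 --size=4g --numjobs=4 \
        --time_based --runtime=60 --group_reporting

    # Same workload without forced sync, as the best-case ceiling
    fio --name=asyncwrite --directory=/mnt/flash/test --rw=randwrite --bs=16k \
        --ioengine=posixaio --size=4g --numjobs=4 \
        --time_based --runtime=60 --group_reporting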
 