Recommended build for compute clusters and backups

Chedda7

Cadet
Joined
Apr 24, 2023
Messages
3
I’m looking to use TrueNAS Scale as a storage server in a rack alongside other servers that host VMs for Kubernetes clusters and web apps. The k8s clusters run a mixture of compute (Argo Workflows) and app hosting. I’m coming from a stronger background in AWS, so I was hoping to get some info on the hardware needed to do this properly.

1. Speed: I’m used to using SSD EBS volumes on EC2 nodes in AWS; is it generally recommended to stay with flash for an on-prem deployment? I would expect ~50 VMs total accessing this storage server via NFS or iSCSI. We don’t need bleeding-edge speeds here, but I would like good performance, and I expect this system to be deployed for 5+ years. We have 25Gbit networking and would be willing to consider DAC between physical nodes as well.
2. Size: Looking to stay under 100TB usable. The VMs don’t need massive storage, and this will be the target for backups as well. It seems striped mirrors are recommended for flash systems and something like 2-3x RAIDZ2 vdevs for spinning disks. Is that accurate?
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
~50 VMs total accessing this storage server via NFS or iSCSI
looking to use TrueNAS Scale
Those 2 things aren't a good combination right now (but might be down the line somewhere).

Use CORE for now if you want iSCSI.

It seems striped mirrors are recommended for flash systems and something like 2-3x RAIDZ2 vdevs for spinning disks. Is that accurate?
Not really.

Striped mirrors are recommended for block storage (i.e. VMs): lots of spinning disks with a SLOG on Optane, or fewer SSDs with an optional Optane SLOG.

File storage would be served well by spinning disks in RAIDZ2.
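To make the two layouts concrete, here's a rough sketch of what each looks like as `zpool create` commands. This is illustrative only, not a recommendation to run as-is: the pool names and device names (`da0`–`da11`, `nvd0`) are placeholders you'd replace with your actual disks, and these commands destroy any data on the named devices.

```shell
# Block storage (VM zvols/NFS datasets): striped mirrors -- three 2-way mirror vdevs.
# ZFS stripes writes across the mirror vdevs automatically.
zpool create vmpool \
  mirror da0 da1 \
  mirror da2 da3 \
  mirror da4 da5

# Optional: add an Optane SLOG device afterwards for sync-write performance.
# zpool add vmpool log nvd0

# File/backup storage: one RAIDZ2 vdev of six spinning disks
# (any two disks in the vdev can fail without data loss).
zpool create filepool raidz2 da6 da7 da8 da9 da10 da11
```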
 

Chedda7

Cadet
Joined
Apr 24, 2023
Messages
3
Use CORE for now if you want iSCSI.
Roger that. Do you have a recommendation of NFS or iSCSI? I've used NFS in the past, so that is straightforward for me; iSCSI is new.

Striped mirrors are recommended for block storage (i.e. VMs). (lots of spinning disks with SLOG on optane or fewer SSDs with optional optane SLOG)
I think I am tracking here. I've found that flash and spinning are similarly priced for our use case when you consider that you need a lot of disks to get the performance up. The fact that we don't need a lot of usable space has me leaning towards flash.

I've read that flash can be CPU-intensive; would a system with 2x EPYC 7313 (16-core, 3.00GHz, 128MB cache) be sufficient here? Drive count would be around 12.
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
I've read that flash can be CPU-intensive; would a system with 2x EPYC 7313 (16-core, 3.00GHz, 128MB cache) be sufficient here? Drive count would be around 12.
It's more about PCIe lanes if you're using NVMe flash, so more CPUs means more lanes.

It's not that bad if you're on SAS/SATA.
Do you have a recommendation of NFS or iSCSI? I've used NFS in the past, so that is straightforward for me; iSCSI is new.
NFS is fine if it means sticking with what you know. It may also mean you can consider SCALE.
 

Chedda7

Cadet
Joined
Apr 24, 2023
Messages
3
Thanks for the info @sretalla. I read through the article you posted about block storage again and dug into the additional readings there as well. It’s helped me determine that we definitely want to go with flash storage here.

I did some poking around; it seems NVMe at larger densities is the best bang for the buck that I can find.

Thinking of getting 12x of these as 6 mirrored vdevs: https://www.cdw.com/product/micron-9300-pro-ssd-7.68-tb-u.2-pcie-3.0-x4-nvme/7333343

This will help to keep utilization down and performance up as suggested.
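For anyone checking the math on that layout, here's a quick back-of-the-envelope capacity calculation. It assumes 12x 7.68TB drives in 6 two-way mirrors and the common guidance of keeping block-storage pools at or below roughly 50% full; it works in hundredths of a TB so plain shell integer arithmetic stays exact.

```shell
# Capacity sketch: 12x 7.68TB NVMe drives as 6 two-way mirror vdevs.
drives=12
drive_ctb=768              # 7.68 TB per drive, in hundredths of a TB
mirror_width=2

vdevs=$((drives / mirror_width))       # 6 mirror vdevs
raw_ctb=$((drives * drive_ctb))        # 9216 -> 92.16 TB raw
usable_ctb=$((vdevs * drive_ctb))      # 4608 -> 46.08 TB usable (mirrors halve raw)
comfortable_ctb=$((usable_ctb / 2))    # 2304 -> ~23 TB at ~50% utilization

echo "vdevs=$vdevs usable_ctb=$usable_ctb comfortable_ctb=$comfortable_ctb"
```

So ~46TB usable, or ~23TB if you stay near 50% utilization, comfortably inside the "under 100TB usable" target.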

A SLOG seems optional, but good for safety. Is that accurate? If so, I will source an Optane drive.

For read cache, how important will it be when using NVMe? Also, will 512GB or 1TB of RAM suffice for the cache?
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
A SLOG seems optional, but good for safety. Is that accurate? If so, I will source an Optane drive.
SLOG is something that you do in order to "mitigate" the performance penalty of doing sync writes.

Sync writes are safe in their nature, so SLOG doesn't really help with safety of the data, it just makes the speed of sync writes suck less.
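This behavior maps onto the ZFS `sync` dataset property, which controls whether write requests are honored as the application asked, forced synchronous, or treated as async. A quick sketch (the dataset name `tank/vmstore` is a placeholder):

```shell
# Show the current sync policy for a dataset or zvol.
zfs get sync tank/vmstore
# "standard" (the default) honors whatever the application requests.

# Force every write through the ZIL/SLOG, even ones requested as async:
# zfs set sync=always tank/vmstore

# Treat all writes as async -- fast, but risks losing in-flight data on power loss:
# zfs set sync=disabled tank/vmstore
```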

For read cache, how important will it be when using NVMe? Also, will 512GB or 1TB of RAM suffice for the cache?
As much RAM as you can give it will allow the ARC to be as good as it can be at holding the needed data ready for serving up. If your pool is already NVMe, I wouldn't waste money on L2ARC with that much RAM.
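Once the system is running, you can check whether the ARC is actually earning its keep before deciding anything about L2ARC. These are standard read-only commands on a SCALE (Linux) box with ZFS loaded; they change nothing:

```shell
# Summarized ARC statistics (size, hit ratio, MRU/MFU breakdown).
arc_summary

# Or pull a few raw counters straight from the kernel's kstat interface:
grep -E '^(size|hits|misses) ' /proc/spl/kstat/zfs/arcstats
```

A consistently high ARC hit rate with RAM to spare is the usual sign that L2ARC would add little.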
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
NFS is fine if it means sticking with what you know. It may also mean you can consider SCALE.

Storage-focused users should probably remain on CORE unless they have a specific reason to use SCALE. CORE is an extremely stable platform, and there are a lot of benefits to that stability compared to the relative mayhem that is SCALE. Applications such as file service for clusters and backups are better served by long uptimes and minimal change deltas.
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
Let me draw a small analogy to the loading of bags on a plane at an airport to clarify an application asking for an async write vs. a sync write (with and without a SLOG).

Scenario 1, Async write:

Traveler (the application) arrives at the check-in desk and says, "I'm here to get on flight X (the pool), here's my bag (the data)". Drops the bag and continues on to have a coffee or whatever before the flight. Maybe the bag gets lost before making it to the plane, maybe it doesn't. The traveler will only find out at the baggage carousel at the destination airport (the analogy for the next attempt to read the data).

Scenario 2, Sync write (No SLOG):

Traveler arrives at the check-in desk and says, "I'm here to get on flight X, here's my bag, I don't want it to go missing, I'll wait for you to tell me it's on the plane". Hands over the bag and stands there waiting. A lot of time passes and the traveler is then informed that the bag is on the plane and can continue on their way to board (but they will need to run now, since the flight is nearly ready to depart). (missing the flight in this analogy would be an "application timeout")

Scenario 3, Sync write (With SLOG):

Traveler arrives at the check-in desk and says, "I'm here to get on flight X, here's my bag, I don't want it to go missing, I want the white-glove baggage service, I'll wait for them to tell me it's on the plane". Hands over the bag to the "white-glove service attendant" (the SLOG in the analogy), who quickly makes their way directly to the plane with the bag, using all shortcuts, tricks and special access in the airport to get back to inform the traveler the bag is on the plane. Some time passes and the traveler is informed that the bag is on the plane and can continue on their way to board.

For scenarios 2 and 3, the bag is there on the carousel at the other end.

Clearly, the tradeoff for knowing that your bag is on the plane is painful, but can be worth it if you can't afford to be missing your bag.

I'm radically oversimplifying the story regarding the write cache in RAM, ZIL and SLOG, but the high-level process is fairly well covered from the perspective of the application or system requesting the write.
 