Pure flash configuration - spot anything bad?

Another Guy

Cadet
Joined
Jan 10, 2023
Messages
4
Hey guys,

We are currently looking at replacing two of our current storage setups with TrueNAS Core deployments running on Supermicro servers, and I've ended up with the following config. If anyone here can spot any problems in this setup it would be awesome. (I will compensate with a few drinks if we ever meet in real life).

For some background, we had a look at many, MANY other options including Lenovo, Dell, Promise, vSAN and TrueNAS Enterprise solutions, but we are in a situation where our storage requirements are beyond our budget even with the amazing prices the TrueNAS team have presented us. I know a lot of you guys will point back at those options, but with the budget we have they're simply not feasible.

main storage:
  • Supermicro 2114S-WN24RT running AMD EPYC 7313P (16 core, 3GHz) with 256GB RAM
    • Networking: Intel XL710-QDA 10/40GbE - 2 x QSFP+ and 2 x 10GbE on-board
    • OS Volume: 2 x 800GB Micron 7450 MAX M.2 NVMe
    • Storage Volume: 10 x 6.4TB Micron 7450 MAX NVMe drives
    • Extra Bits: Broadcom HBA 9500-8e (this is purely for redundancy - see second unit below for detail)
Notes:
Since it's a pure NVMe storage server we are not sure if dedicated caching drives hold much value?

We also need to replace our current NAS, which is primarily used as a backup target and for cold storage.


NAS:
  • Supermicro 2114S-WN24RT running AMD EPYC 7313P (16 core, 3GHz) with 256GB RAM
    • Networking: Intel XL710-QDA 10/40GbE - 2 x QSFP+ and 2 x 10GbE on-board
    • OS Volume: 2 x 800GB Micron 7450 MAX M.2 NVMe
    • Extra Bits: Broadcom HBA 9500-8e (will handle JBOD)
  • Supermicro SuperChassis 847E1C-R1K23JBOD (44 x 3.5" SAS bays)
    • Storage Volume: 18 x 14TB UltraStar DC HC530 SAS drives
    • SLOG/ZIL: 2 x 3.8TB Micron 5400 MAX SATA
    • Read Cache: 2 x 7.68TB Micron 5400 PRO SATA
About Extra bits:
The idea is for the two systems to provide some redundancy for each other.

If the "NAS/JBOD" unit's server fails - we can migrate all the drives by moving the JBOD to the "main" unit and importing the disks.

The other way around (if the "main" storage fails) is a lot riskier since it involves moving the 10 NVMe drives into the "NAS" server and importing the volumes. We plan to do some testing with this before we move to production. We've written these risks in big red bold letters to the business.
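To make the "import the disks" step concrete, this is roughly what the manual failover amounts to at the ZFS level once the disks are attached to the surviving head. Normally you would do this through the TrueNAS UI; the pool name "tank" below is just a placeholder, and the Python wrapper is only there to sketch the sequence of commands.

```python
# Hypothetical sketch of the manual failover steps after physically moving
# the disks/JBOD to the surviving head. Pool name "tank" is an assumption.
import subprocess

def run(cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

# 1. Scan for pools visible on the newly attached disks (makes no changes).
run(["zpool", "import"])

# 2. Import the pool by name. "-f" is needed because the pool was last
#    active on the failed host and was never cleanly exported.
run(["zpool", "import", "-f", "tank"])

# 3. Sanity-check pool health before pointing clients back at the shares.
run(["zpool", "status", "tank"])
```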

I'm happy to go into more details if anyone wants, but felt this was probably a big enough favour to ask from a community I haven't actively contributed to yet. (we do plan to give feedback if anyone is interested)
 

ChrisRJ

Wizard
Joined
Oct 23, 2020
Messages
1,919
Without information on the workload and the pool/vdev layout it is not possible to make any reasonable comments.

Manual fail-over scenarios, like the one you indicated, are indeed a tremendous risk. Depending on industry regulations/compliance and your local legislation, you may even be personally liable for damage. That is something I would check out.
 

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222
  • OS Volume: 2 x 800GB Micron 7450 MAX M.2 NVMe
Those are really big drives for OS.
Also, please read the following resource if you want HA.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Hello @Another Guy

I'd love some more details with regards to the workload you're planning to put onto the main/head unit NVMe pool. Ten NVMe Micron MAXs is going to deliver a prodigious level of IOPS and throughput - will you have multiple client machines hitting it concurrently to be able to saturate it?

Since it's a pure NVMe storage server we are not sure if dedicated caching drives hold much value?

I wouldn't recommend them here - the L2ARC generally needs to be significantly faster than the pool vdevs to have value, and you're selecting some very speedy devices for pool members.

I also work for iX, so if you'd like to share anything privately, feel free to send me a DM.

Cheers!
 

Another Guy

Cadet
Joined
Jan 10, 2023
Messages
4
Hello @Another Guy

I'd love some more details with regards to the workload you're planning to put onto the main/head unit NVMe pool. Ten NVMe Micron MAXs is going to deliver a prodigious level of IOPS and throughput - will you have multiple client machines hitting it concurrently to be able to saturate it?

Workload is mainly from 6 ESXi hosts, hosting roughly 150 VMs. There is the potential to migrate another 3-host ESXi cluster that runs our logging (a medium-sized Elasticsearch cluster handling an average of about 400 logs per second). That cluster eats IOPS at an astonishing rate and currently runs isolated away from everything. We're cautious about migrating its workload onto our main production storage. We also need to do backups of some VMs on an hourly basis without stunning the production environment. Overnight backups are less of a concern.

When considering our options, these things eventually pushed us to the NVMe storage layout:
  • We need enough IOPS capacity for the next 3-5 years.
  • After running on SSD-fronted storage for the past few years, we know we need "all-flash".
  • The cost per GB of high-endurance NVMe drives is lower than equivalent SAS or SATA SSDs. (probably the biggest motivation)
Even if we moved everything we have onto that unit, I am sure there would still be significant headroom.

I will send you a DM with some more details.
 

curruscanis

Dabbler
Joined
Feb 15, 2018
Messages
17
Another Guy, I would like to know if you continued to get any feedback on your system build. We are researching a similar build using the Supermicro 2114S-WN24RT and an all-NVMe disk array providing VMware storage to an ESXi cluster.

Given the similarities, it may be possible to share insight.

Our goal is iSCSI block storage with as much total space as possible, using mirrored vdevs as recommended by most for traditional drive arrays (and, per the responses to my inquiries, even for an NVMe-based array). With a Supermicro 2U chassis and its 24 bays, 24 x 7.68TB NVMe drives give approximately 184TB of raw storage - roughly 92TB after mirroring, or about 46TB usable if the pool is kept around the 50% occupancy usually recommended for block storage - which for us is enough VMware storage.
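To show the working on those numbers (treating the 50% occupancy target for block storage as a guideline rather than a hard rule):

```python
# Rough capacity math for 24 x 7.68TB NVMe in 2-way mirrored vdevs used for iSCSI.
# The 50% occupancy target for block storage is the commonly cited guideline,
# not a hard limit; treat these figures as estimates.
drives = 24
drive_tb = 7.68

raw_tb = drives * drive_tb            # ~184 TB raw
mirrored_tb = raw_tb / 2              # ~92 TB after 2-way mirrors
usable_tb = mirrored_tb * 0.5         # ~46 TB if the pool is kept <= 50% full

print(f"raw: {raw_tb:.0f} TB, mirrored: {mirrored_tb:.0f} TB, "
      f"usable at 50% occupancy: {usable_tb:.0f} TB")
```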

We were considering the same CPU, the AMD EPYC 7313P, with minor differences: a couple of 480GB NVMe drives for the OS, 512GB of RAM, and an Intel X710-DA4 for SFP+ 10G connectivity...

Your workload does differ from ours; we are only using 3 ESXi hosts and only about 20 VMs. Our current FreeNAS system does the job on spinning disks, but we are looking to improve our speed and leave headroom for the future. So if you have input on this I would appreciate it.

Thank you for any input you may have or sharing of information.
 

Another Guy

Cadet
Joined
Jan 10, 2023
Messages
4
Hey Curruscanis - what you have there is well spec'd out :) It's possibly way too much for 20 VMs :) but maybe your VMs are guzzlers?

It all comes down to the workload, and it's best for you to work with actual numbers. You can look at the write IOPS in vCenter performance and use a generous time-frame (1 year) to see your averaged write IOPS (I think this metric is best viewed at the controller level, not the datastore level). We used Veeam ONE reporting, and some other tools in the past as well, so we know what our peak IOPS and average IOPS are, as well as our throughput metrics. (Don't leave this up to "gut feel" - work with your numbers.)
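If you can get the samples out (even just a CSV export of write IOPS over time), the number crunching itself is trivial - something like the sketch below, where the file and column names are just placeholders for whatever your monitoring tool exports:

```python
# Hypothetical example: compute average, 95th percentile and peak write IOPS
# from an exported CSV of performance samples. The file name and the
# "write_iops" column are assumptions - adjust to your own export format.
import csv

samples = []
with open("controller_write_iops.csv", newline="") as f:
    for row in csv.DictReader(f):
        samples.append(float(row["write_iops"]))

avg_iops = sum(samples) / len(samples)
peak_iops = max(samples)
# The 95th percentile is often more useful for sizing than the absolute peak.
p95_iops = sorted(samples)[int(0.95 * (len(samples) - 1))]

print(f"average: {avg_iops:.0f}  p95: {p95_iops:.0f}  peak: {peak_iops:.0f}")
```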

The 7.68TB drives are usually lower endurance than the 6.4TB drives, and at 2,000 write IOPS they won't last 4 years. To the best of my knowledge, if you don't have a dedicated SLOG device in the config, the ZIL will live on the pool and this will cause your writes to double (once to the in-pool ZIL and then once more when the data is committed to the pool). This means the < 4 years of life suddenly drops to < 2 years. So I'd recommend you dedicate at least two of the drives as SLOG devices, and consider using higher-endurance drives for them. Also look at overprovisioning the SLOG devices to increase their lifespan.
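For what it's worth, the kind of back-of-the-envelope endurance math I'm doing looks roughly like the sketch below. Every figure in it is a placeholder - plug in the rated TBW (or DWPD x capacity x warranty years) from the datasheet and your own measured write profile - and the 2x factor is just my assumption for the in-pool ZIL case.

```python
# Back-of-the-envelope endurance estimate. Every number here is a placeholder;
# use the drive's rated TBW from the datasheet and your own measured write rate.
drive_tbw = 14_000           # rated terabytes written over the drive's life (assumed)
write_iops = 2_000           # sustained write IOPS hitting the drive (assumed)
avg_io_kb = 32               # average write size in KiB (assumed)
zil_in_pool_factor = 2       # assume data lands twice if the ZIL lives on the data vdevs

# Rough TB written per year (mixing KiB and decimal TB - fine for an estimate).
tb_per_year = write_iops * avg_io_kb * 1024 * 3600 * 24 * 365 / 1e12
tb_per_year *= zil_in_pool_factor

print(f"~{tb_per_year:.0f} TB written per year -> "
      f"~{drive_tbw / tb_per_year:.1f} years to rated TBW")
```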

Beyond that I can't really add too much - we seem to be at the same point in our journey. We are also currently considering mirrored vs parity disk layouts.

I am working with TrueNAS to see if we can find a supported Enterprise system, but at this point it looks like we might not be able to get what we need with the budget that we have. The guys have been great though, and I highly recommend you engage with them. (Again, knowing your peak and averaged I/O requirements is key - both IOPS and MB/s throughput.)

The value of having a dual-controller system is immense - if we go with this Supermicro option we have to engineer our own fail-over ... using storage replication, manual tasks, or Veeam replication to a second TrueNAS unit (or something else, if anyone has any ideas).
 

curruscanis

Dabbler
Joined
Feb 15, 2018
Messages
17
Thank you for the point about looking at my numbers, I will indeed do that as best I can. Sadly, with the limited "Essentials" version of VMware, the monitoring of such statistics is extremely limited by the license; I am attempting to find a way to pull those numbers over time.

I also understand about the SLOG... I was considering your solution of pulling a couple of the drive bays for that as well.

I will review the write life on the drives... my goal with the 7.68TB drives is that the total capacity fits my needs, even at the high price point. I will see if I can squeak by with the 6.4TBs... even in a 22-drive layout.

As for redundancy, we already use Veeam Replication from our current FreeNAS to an older Dell SAN. I intend to continue this design from the new TrueNAS to the old FreeNAS... but I think I will upgrade the old FreeNAS to TrueNAS once everything is off and stable. We also do full backups to another array off site daily.

Thank you again for your time.
 

Another Guy

Cadet
Joined
Jan 10, 2023
Messages
4
I just realised my endurance/life expectancy calculations are flawed. Please disregard my estimated life expectancy for those drives.

I suddenly have an urge for a slice of humble pie.
 

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222
Joined
Jul 3, 2015
Messages
926
The other way around (if the "main" storage fails) is a lot riskier since it involves moving the 10 NVMe drives into the "NAS" server and importing the volumes. We plan to do some testing with this before we move to production. We've written these risks in big red bold letters to the business.
I wouldn't fancy the idea of moving the 10 NVMe drives, but that's just because I'm lazy. I would instead use replication to send the data to your other NAS, and then you have a failover ready and waiting if ever needed. This is how I build all my systems. I have two DCs geographically apart; the primary replicates to the secondary, and in the event of a disaster we just fail over (this could be via DNS, communicating the new address to customers, or better still, like me, if using SMB, via Microsoft's DFS namespaces).

If the failover is temporary (i.e. you think you can fix the primary server within a suitable timeframe - for us that's about a working day), then simply give users read-only access to their data so no new data is written, and fail-back will be trivial. If, however, the disaster is deemed unrecoverable on the primary (or at least not within your time window), then give them read/write. You then need to rebuild your primary and plan a migration back again using replication.
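For context, the replication itself is just incremental ZFS snapshots being sent across. TrueNAS has replication tasks in the UI for this, but under the hood it amounts to something like the sketch below - the dataset names, snapshot names and remote host are all placeholders.

```python
# Minimal sketch of snapshot-based replication from primary to secondary.
# Dataset names, snapshot names and the remote host are placeholders - in
# practice a TrueNAS replication task does this for you on a schedule.
import subprocess

SRC = "tank/vmware"
DST = "backup/vmware"
REMOTE = "truenas-secondary"

def replicate(prev_snap, new_snap):
    # Take the new snapshot on the source dataset.
    subprocess.run(["zfs", "snapshot", f"{SRC}@{new_snap}"], check=True)
    # Send only the delta between the previous and new snapshots.
    send = subprocess.Popen(
        ["zfs", "send", "-i", f"{SRC}@{prev_snap}", f"{SRC}@{new_snap}"],
        stdout=subprocess.PIPE,
    )
    # Receive it on the secondary over ssh, rolling back any local drift (-F).
    subprocess.run(["ssh", REMOTE, "zfs", "recv", "-F", DST],
                   stdin=send.stdout, check=True)
    send.stdout.close()
    if send.wait() != 0:
        raise RuntimeError("zfs send failed")

replicate("hourly-0900", "hourly-1000")
```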
 