New iSCSI server for VMware vSAN and general storage. RAM usage and utilization advice needed!

arigas1982

Cadet
Joined
Jul 6, 2022
Messages
2
Hi team.

I am Alex, a newbie on the forum. I need some assistance with the initial config and RAM usage.

I am currently building a SAN to use as a file server, but also as a storage provider for a Citrix CVAD, VMware Horizon and vSAN certification lab.

I will primarily be using iSCSI to present the storage to ESXi and Hyper-V (2019 Std).

My rig is not here yet, but this is its composition:

An old Supermicro X9DR-LNF4+ MB (4x PCIe 3.0 x16 + 2x PCIe 3.0 x8)
2x E5-2430L v2 Xeons
512GB DDR3 LRDIMM RAM, PC3-10600 (8x 64GB)
1x HP dual-port 10Gb SFP+ Ethernet NIC (NC550SFP)
1x LSI 9271-8 6Gb/s SAS controller
4x 16TB WD rust drives to create an L3 28.8 TB pool, the ZFS equivalent to RAID 10 if such a thing exists
4x 2TB QVO SSDs to create an L2 3.2 TB pool, the ZFS equivalent to RAID 10 if such a thing exists
3x 1TB NVMe on PCIe adapters to either use as cache storage or to create a single-loss-redundant 1.8 TB L1 pool (I hope this config is viable / exists in ZFS)

I am also thinking of a 480GB Optane 900P as an alternative for the cache.

I know the system is old, and I do not expect a lot of IO or RDMA and all the new goodies. I just need decent 10G performance for deduplication, and stability for the vSAN.

My questions are the following:

1) I do not know how best to utilize the memory, and/or whether I need to get more, since LRDIMM memory is quite affordable and available (I can go up to 1.5 TB). If so, I would prefer to buy more RAM than Optane.
2) Do I need to use the NVMe and/or the Optane drives as cache, or can I use the RAM, like a RAM drive, to do so?
3) Are there any suggestions on RAM drive usage that could improve performance and stability?

PS: Even though this is a lab storage solution, data integrity is the most important thing, not performance. So if you have the time to get back to me, please treat that as the most important aspect of the storage.

Thanks for the help, team. I hope my rig is not badly designed; I would appreciate any input you can send.

Alexandros.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Hello Alex,

Before we dive into the specifications, and a very long post, I'd like to ask for some details on your use case:
VMware Horizon and vSAN certification lab.
While Horizon will work quite well on TrueNAS (speaking from experience) - re: vSAN, this is a hyperconverged storage technology from VMware, designed to leverage storage located in each physical host rather than a centralized location. You could use nested virtualization to set up drives for use in a "virtual vSAN" configuration, but this would likely have some performance ramifications and is a little beyond the scope of this forum.

An old Supermicro X9DR-LNF4+ MB (4x PCIe 3.0 x16 + 2x PCIe 3.0 x8)
2x E5-2430L v2 Xeons
512GB DDR3 LRDIMM RAM, PC3-10600 (8x 64GB)
1x HP dual-port 10Gb SFP+ Ethernet NIC (NC550SFP)
1x LSI 9271-8 6Gb/s SAS controller
4x 16TB WD rust drives to create an L3 28.8 TB pool, the ZFS equivalent to RAID 10 if such a thing exists
4x 2TB QVO SSDs to create an L2 3.2 TB pool, the ZFS equivalent to RAID 10 if such a thing exists
3x 1TB NVMe on PCIe adapters to either use as cache storage or to create a single-loss-redundant 1.8 TB L1 pool (I hope this config is viable / exists in ZFS)
Your CPUs aren't physically compatible with your motherboard. The X9DR series takes the E5-2600 series, not the E5-2400 - make sure you have planned for the former. The reduced-TDP L-series also loses out on sustained clock speeds, so the non-L equivalents are fine to use.

512GB is more than enough RAM - more than I've seen used in some production environments - and the use of large LRDIMMs leaves you room to upgrade as well.

Your 10Gb NIC isn't well supported, though, as it uses an Emulex chipset. Try to obtain a Chelsio T420/T520 or Mellanox ConnectX-3 instead. Have you identified a switch, or will you be direct-connecting the VMware hosts?

Your LSI is a RAID controller, and must be replaced with a SAS HBA. See resource:


Regarding your L2/L3 nomenclature - TrueNAS isn't capable of "tiering" data, if that's what you're referencing; multiple independent pools can be created instead. The ZFS equivalent to RAID10 is "mirror vdevs", and it is definitely the preferred layout for VMware storage. Note, though, that the QVO drives have very low write endurance and might be better replaced with EVO or even Pro drives.
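
Purely as an illustration of what "mirror vdevs" means at the command line - the device names (da0-da3, ada0-ada3) and pool names are examples only, and on TrueNAS you would normally build this through the web UI rather than the shell:

    # Two mirror vdevs striped together - the ZFS equivalent of RAID10.
    # Device names and the pool name "tank-hdd" are assumptions for the sketch.
    zpool create tank-hdd mirror da0 da1 mirror da2 da3

    # The QVO SSDs would be a second, independent pool rather than a "tier":
    zpool create tank-ssd mirror ada0 ada1 mirror ada2 ada3

    # Verify the layout
    zpool status tank-hdd tank-ssd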

The 3x 1TB NVMe on PCIe can be made into a RAIDZ1 (single parity), but note that you'll also want to consider the drive model, similar to the comments on the QVOs above. Choose your SSDs according to the write workload you will be putting on your storage.
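
Again only as a hedged sketch (the nvd* device names and pool name are assumptions, and the UI is the supported path on TrueNAS):

    # Single-parity RAIDZ1 across the three NVMe drives - survives one drive loss;
    # usable space is roughly 2TB minus metadata overhead.
    zpool create tank-nvme raidz1 nvd0 nvd1 nvd2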

I am also thinking of a 480GB Optane 900P as an alternative for the cache.

Where you'll need something like this is as a write log device, or SLOG - and as a matter of data integrity you will need one. Please see the resource here for the details:


But the short version is that without a SLOG and "sync=always" enabled on your VMware iSCSI extents, your data is not considered "safe."
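
As a rough illustration of what that looks like from the shell (the zvol path "tank-hdd/vmware-iscsi" is an assumed name - in practice you set this on each zvol that backs an iSCSI extent):

    # Force every write to be committed to stable storage (the SLOG)
    # before it is acknowledged back to the ESXi initiator.
    zfs set sync=always tank-hdd/vmware-iscsi

    # Confirm the setting
    zfs get sync tank-hdd/vmware-iscsi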

There is a long thread here with comparisons of ideal SLOG devices, but you've already arrived at the "best generally-available device" which is an Optane 900p, 905p, or DC P4800/4801:


I know the system is old, and I do not expect a lot of IO or RDMA and all the new goodies. I just need decent 10G performance for deduplication, and stability for the vSAN.
Again - can you elaborate on the "vSAN" point? If you plan to do nested virtualization with drives provisioned to virtual ESXi instances from TrueNAS, you'll likely be seeing some congestion issues and write amplification from the fact that both vSAN and ZFS will be providing redundancy, as well as some inherent latency - while 10GbE and 40GbE are both fast, they're not going to be the same as each device being on a local SAS/NVMe link.

Deduplication brings similar words of caution. vSAN dedupe, I assume?

1) I do not know how best to utilize the memory, and/or whether I need to get more, since LRDIMM memory is quite affordable and available (I can go up to 1.5 TB). If so, I would prefer to buy more RAM than Optane.
2) Do I need to use the NVMe and/or the Optane drives as cache, or can I use the RAM, like a RAM drive, to do so?
3) Are there any suggestions on RAM drive usage that could improve performance and stability?

1) You have enough RAM for now, and they are large DIMMs. The Optane, on the other hand, you will need for data safety - buy that first. But if you have money left over, you won't be hurt by having a terabyte and a half. ZFS will use all available memory as (primarily) read cache.

2) You need the Optane as a persistent write log for safety. Your RAM will automatically be leveraged as a very large read cache.

3) The inherent behavior of ZFS, using your RAM as the ARC (Adaptive Replacement Cache), is the best solution here. You can't use a RAM disk as a write log because it isn't persistent (special NVRAM and NVDIMM devices notwithstanding, of course).
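
If you want to watch the ARC doing its job once the box is under load, something like the following works on TrueNAS CORE (the exact output fields vary a bit between versions):

    # Summarize ARC size, hit ratio and MRU/MFU breakdown
    arc_summary

    # Or pull the raw counters straight from the kernel
    sysctl kstat.zfs.misc.arcstats.size kstat.zfs.misc.arcstats.hits kstat.zfs.misc.arcstats.misses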

Cheers!
 

arigas1982

Cadet
Joined
Jul 6, 2022
Messages
2

Hi HoneyBadger, and thanks for the quick reply.
  • For the CPUs
Indeed, the E5-2430L v2 will not work with the X9 - my bad. I have two pairs of E5-2630L v2 and E5-2630 v2 that I can use. The 2430s go to the museum now!
  • For the HBA
I can use this one: Fujitsu D3307-A12 / D3307-A100 CP400i 12Gb RAID/HBA card (LSI 9300-8i, low profile). It's lying around unused, and I believe it will be just fine; I will just retire the RAID controller.
  • For the SSD and NVMe drives:
I know that endurance on the NVMe and the SSDs will be an issue, but as I said, this is a lab environment for studies and I do not expect any serious IO.
My usage will be:
  • FSLogix profiles (around 120 GB on 4-5 VHDXs) -> on the SSDs
  • CVAD infrastructure (2x VMs (Delivery Controller + StoreFront + FAS) + one WEM VM + one ELM appliance VM) -> on the NVMe, currently ~800 GB; the crucial files (MS SQL .trn + .bak) are less than 45 GB
  • VMware infrastructure (2x VCSA in HA + the witness for vSAN; not sure how this is going to go yet) -> on the NVMe, currently just 62 GB, including the VUM files
  • The ELM is a big one, around 800 GB, but these are mostly static OS, App and Platform layers; not a lot of writes here after the initial config
I believe my daily writes are (give or take) around 10 to 15 GB per day, or 5-6 PB per year. I do not expect any serious wear.

For the Optane:
  • I have a 280 GB 900P Optane drive to use for the SLOG. Will it be enough, or will I need a bigger drive?
For the vSAN part:
  • I have 3 small X10 motherboards with 4x 1 Gbps Intel networking. I also have 4 old Mellanox ConnectX-2 InfiniBand 40/10 cards, which I can run in 10Gb Ethernet mode with no problems; on both ESXi 6.5 and 7 U3 they work fine with the modified MLX4 driver.
  • My original idea was to use a DAC cable to connect the 10Gb port on the SAN directly to my Mikrotik CRS (24-port 1Gb + 2x 10Gb SFP+) and then use LACP on the X10s (4x 1 Gbps) in a distributed vSwitch (vSAN, vMotion and replication). I understand that with LACP the performance will be very poor, but again, this is only for my studies, not a production environment, so my goal is just to learn how to configure and troubleshoot.
  • If this does not work, I will go for a Mikrotik-based 10Gb SFP+ switch running SwitchOS and see if I can make it work with the ConnectX-2 series. If not, I will check for TrueNAS-compatible ones.
For the deduplication, I would want it both for vSAN deduplication, as you mentioned, and for Citrix MCS image-based VDAs (LTSR 2203).

Check the above when you find the time, and as always, thanks for the support!

Thanks for all the info - you have indeed saved me a lot of time and frustration. Very happy to see people taking an interest in newbies like me in such a detailed and helpful manner.

stay safe,

Alex.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
No problem. I have some follow-up comments and a main question at the bottom re: hardware and the vSAN design.

CPUs
Use the 2630 v2s - they'll both idle at roughly the same power/thermals, and there's no point in giving yourself an artificial speed limit when you do want that full turbo speed.

HBA
A 9300-8i is perfect - make sure you've updated the firmware and that it's running in pure IT mode. I'm not sure whether the Fujitsu ones can or need to be "crossflashed" to become official LSI cards, or if they ship with a true HBA/IT-mode firmware.
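
One quick way to check, as a hedged sketch - sas3flash is Broadcom/LSI's flash utility for SAS3-generation cards, and the exact output wording varies by firmware:

    # List the adapter(s) - the firmware product / NVDATA lines should indicate
    # an IT (initiator-target) firmware rather than IR/MegaRAID.
    sas3flash -listall
    sas3flash -list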

SSD and NVMe usage
I think your units are off - 10-15GB a day gives you 5-6 TB (terabytes) per year, not PB (petabytes) - but other than that it seems like you'll be well within your tolerance for even the QVOs. They do have a rather steep performance cliff if you manage to hit them with enough sustained writes to outstrip their internal pseudo-SLC caching, but it doesn't seem like you'll be doing that in a lab setting, except perhaps during initial provisioning. I'd suggest the QVOs be used for the ELM layers and as a separate replica datastore for Horizon, as that will direct the big majority of reads there. Actual delta disks, FSLogix profiles, and the SQL would be better off on some 3D TLC NVMe.

Optane
280GB is fine here; however, you will likely want/need to cut several small partitions, as each pool would require its own SLOG (in this case, a partition on the 900p). Given your large amount of RAM, you could also increase the amount of data allowed to be in flight in the write log from the default 4GB (writes are held in both RAM and your SLOG device) in order to absorb a bit more at a time.
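
A minimal sketch of what that partitioning could look like on TrueNAS CORE - the nvd0 device name, partition sizes, and pool names are all assumptions, and hand-partitioning a SLOG is a manual, at-your-own-risk step outside the UI:

    # Carve three small SLOG partitions out of the 280GB Optane 900p
    gpart create -s gpt nvd0
    gpart add -t freebsd-zfs -s 16G -l slog-hdd nvd0
    gpart add -t freebsd-zfs -s 16G -l slog-ssd nvd0
    gpart add -t freebsd-zfs -s 16G -l slog-nvme nvd0

    # Attach one partition to each pool as its log (SLOG) vdev
    zpool add tank-hdd log gpt/slog-hdd
    zpool add tank-ssd log gpt/slog-ssd
    zpool add tank-nvme log gpt/slog-nvme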

vSAN
I'm still a little unclear on the hardware layout. Do you have separate physical disks to put in each of these 3 X10-based hosts for use as cache/capacity? Or are you planning on making a virtual ESXi install as a VM on each X10 host (nested virtualization) and presenting storage from the TrueNAS machine (over the 10GbE backbone) to use as virtual cache/capacity disks? If it's the latter, then the use of 1GbE interconnect for vSAN will likely become a bottleneck before anything on the 10GbE "back end storage network" and you'll have fewer concerns there.

(Side note re: mixing LACP and vSAN - here's a YouTube video for research, but LACP often introduces more complexity than you might want in a learning scenario. I'd stick with regular active/standby failover if you're just dipping into vSAN for the first time.)


Deduplication
vSAN deduplication is a different beast from ZFS's, occurring during cache write-back to the capacity tier. It's not available on hybrid clusters, so you'd have to ensure all of your disks are detected as flash - which, if you're doing the virtual vSAN using physical spinners on TrueNAS for capacity, is going to give you "performance aberrations" to say the least, since all-flash vSAN doesn't use read caching.

For the Citrix and Horizon workloads, though, you'll get a lot of data reduction through the MCS (Citrix) and Instant Clone (Horizon) workflows, because the only full/thick clone will be your master snapshot/replica machine (which can live on the QVOs), while your per-machine deltas can be redirected to the more write-tolerant disks. If you do leverage deduplication at the ZFS level, I would stick to only the "Gold Master Source" and "Replica Copy" LUNs - that way you don't have the storage footprint of your base images showing up twice, and the clone process will hopefully have inherently aligned I/O that lets it eliminate the back-end writes.

For the rest of the datastores, see what kind of space savings you get from compression alone (LZ4 or ZSTD).
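
If you do experiment with that, a hedged sketch of scoping it per dataset (the dataset/zvol names here are made up for the example - ZFS dedup is set per dataset, but the dedup table cost is paid pool-wide):

    # Enable dedup only on the gold-master/replica zvols
    zfs set dedup=on tank-nvme/gold-master
    zfs set dedup=on tank-ssd/replica-copy

    # Everything else: compression only, then check what it actually saves
    zfs set compression=zstd tank-nvme/fslogix
    zfs get compressratio,used,logicalused tank-nvme/fslogix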

Hardware/vSAN
Can you just clarify the hypervisor host hardware here? Virtual-vSAN is going to introduce all kinds of "Interesting Situations" as far as the interactions with putting a SAN on a SAN.
 