Suggestions for vdev structure

Miguel Nunes

Explorer
Joined
May 6, 2016
Messages
52
Hi everyone. First of all, thanks for taking the time to read this post.

I am working on a build for a hybrid system using ZFS. The chassis has 24 3.5" bays for HDDs, and there will be 2 cards connected to two x8 slots holding up to 8 NVMe devices. We don't have the cards yet.

The planned HDDs are divided into 3 different pools.

Pool 1 is RAID-Z2 with 8 drives of 1TB each. It stores less frequently changed data.

Pool 2 is for home folders of both Linux and Windows machines. It will also host a Time Machine instance. The plan is to use RAID-Z2 to improve reliability at the expense of performance.

Pool 3 is for databases, and we haven't bought the HDDs yet. There is a chance the databases might end up on the NVMe devices instead, with the extra 8 drives used as a backup volume. That would mean moving Time Machine to this pool.

The NVMe pools are planned for block storage and/or database storage.

Hardware: Dual Xeon E5-2643 v2 with 128GB of ECC RAM evenly spread between the CPUs. The CPUs are 3.5GHz Xeons with 6 cores and 12 threads each, which seem to work well with pools that have compression set to gzip-9.
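For reference, creating a pool like Pool 1 as RAID-Z2 with gzip-9 compression would look roughly like this (a minimal sketch; the pool name and disk identifiers are placeholders):

zpool create pool1 raidz2 da0 da1 da2 da3 da4 da5 da6 da7   # 8-disk RAID-Z2 vdev
zfs set compression=gzip-9 pool1                            # heavy compression for mostly static data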

Dual 10Gbit cards for the SAN.

The motherboard is from the Supermicro X9DRD family, with 6x PCIe x8 slots and an embedded HBA in IT mode with 8 links. There is also a card with a second Supermicro HBA with 8 links.

To fill the entire chassis we might have to use an expander or buy another HBA; we aren't sure yet. We have an Intel expander, but so far we haven't tested whether it works with the onboard HBA connected with 8 links. We have had problems in the past with 8-link setups on expanders.

My question is simple but wide. Do you think this system is viable? Would you propose changes?

Thank you in advance for your help.

Best Regards,

MAN
 
Joined
Oct 18, 2018
Messages
969
My question is simple but wide. Do you think this system is viable? Would you propose changes?
It is hard to say. For one, some important hardware information is missing. For example, exactly what model of motherboard do you have? Would you mind providing it, to save folks from having to match your description to the info on Supermicro's site? I also assume your chassis has a backplane; what model is it? And what are the model numbers of the cards you're planning to plug those NVMe devices into? Depending on the card, you may need to make sure your PCIe ports support bifurcation.

Also, what version of FreeNAS do you plan on running?

It will also host a Time Machine instance.
Make sure you properly set quotas etc so that TimeMachine doesn't consume the entire pool. Search around the forums to find folks discussing this topic.
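For example, giving Time Machine its own dataset with a quota might look roughly like this (the dataset name and size are placeholders):

zfs create pool2/timemachine           # dedicated dataset for Time Machine
zfs set quota=1T pool2/timemachine     # hard cap so backups can't fill the pool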
 

Miguel Nunes

Explorer
Joined
May 6, 2016
Messages
52
It is hard to say. For one, some important hardware information is missing. For example, exactly what model of motherboard do you have? Would you mind providing it, to save folks from having to match your description to the info on Supermicro's site? I also assume your chassis has a backplane; what model is it? And what are the model numbers of the cards you're planning to plug those NVMe devices into? Depending on the card, you may need to make sure your PCIe ports support bifurcation.


Thanks for your prompt reply. I was travelling home and just jotted down what I had in mind.

Board:
https://www.supermicro.com/products/motherboard/Xeon/C600/X9DRD-7LN4F-JBOD.cfm

CPUs:
2x https://ark.intel.com/content/www/u...-processor-e5-2643-v2-25m-cache-3-50-ghz.html
According to what I have read, the high clock rates are good for compression, encryption, checksumming and parity calculations.

RAM:
ECC DDR3 128GB

Projected NVMe Cards:
2x https://www.amazon.com/ADWITS-Adapt...keywords=nvme+pcie+x16&qid=1568315776&sr=8-23

They support bifurcation, although I lose the full x16 link.

or

2x (but fewer sticks, and cheaper)

https://www.amazon.com/Supermicro-A...G6YEXJHPAGH&psc=1&refRID=9Z09Y2RYDG6YEXJHPAGH


HBAs:

1x internal

1x https://www.amazon.com/SAS9211-8I-8...hbo&qid=1568316105&s=electronics&sr=1-1-fkmr0

Expander:
1x https://www.amazon.com/Intel-RES2SV...ords=intel+sas+expander&qid=1568316283&sr=8-2

If I am not mistaken, this means 4 slots used.
There are 2 remaining, which could accommodate a third NVMe card and:

Infiniband Card:
ConnectX-2 with 2 ports. (I have had terrible experiences with InfiniBand and FreeBSD.)


Also, what version of FreeNAS do you plan on running?

This is the tricky question... So far we have a Linux box running ZFS. That was one of the reasons my question was vague. To have proper InfiniBand support we would have to drop FreeNAS and set things up manually. That's not a problem, because we have skilled personnel at the company.


Make sure you properly set quotas etc so that TimeMachine doesn't consume the entire pool. Search around the forums to find folks discussing this topic.

Glad you reminded me of that. Thanks :)


I forgot to mention the projected backup:

1x HP MicroServer G7 with 4x 3TB WD Red (not acquired yet), maybe in RAID-Z2 because the failure rate could mean losing another disk during a 1-disk rebuild, and 16GB ECC RAM. It runs the latest FreeNAS version. It shows a problem when rebooting: the ConnectX driver locks up just before the system is expected to reboot. We accept this because we have the IPMI card and can reset the machine remotely.


1x https://www.asrockrack.com/general/productdetail.asp?Model=C2750D4I#Specifications with 32GB ECC RAM (16GB so far) and 8x 3TB WD Red (not acquired yet).
An SSD runs a proxy. This machine accommodates all outgoing infrastructure. It will run Linux, because even with jails we felt limited when we tried FreeNAS, so we use a Linux distro with ZFS on Linux. It has a dual-port ConnectX-2.


The idea is to have the pools laid out like this:

Pool 1: (HDDs) Almost static data, with compression set to gzip-9.

Pool 2: (HDDs) Home folders for Linux and Windows.

Pool 3: (HDDs) Database land. We need to run Neo4j, so one option is to host it on Linux as well.

Pool 4: (NVMe) Block storage and/or database storage. The final choice will depend on our testing with the workloads we need for development; so far it is uncertain.


I don't think I've missed anything.

Thanks in advance.
 
Joined
Oct 18, 2018
Messages
969
Projected NVMe Cards:
2x https://www.amazon.com/ADWITS-Adapt...keywords=nvme+pcie+x16&qid=1568315776&sr=8-23

They support bifurcation, although I lose the full x16 link.
So, these cards are PCIe x16 cards. Depending on the kind of card, they MAY work in an x8 slot with all four ports configured, but even if they do you will see performance penalties. It is typical for a single NVMe drive to want 4x PCIe 3.0 lanes. Additionally, depending on the card, your board may ALSO need to support bifurcation. Here is a post where someone was trying to help me with a very similar topic. You'll want to be absolutely certain that this card will work as expected with your machine. Supermicro support may offer some help in this regard.

This is the same card I use. It requires that your motherboard support port bifurcation. Check your board's manual; if you see no information there, try going through your BIOS, and if you still see nothing, contact Supermicro. These cards have the advantage that they take only 2 NVMe drives each and fit an x8 slot, so you won't be leaving any bandwidth on the table. Buy 4 of these and you could have 8 NVMe drives, each with 4x lanes of PCIe 3.0.
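Once the adapters are installed, assuming you end up on FreeNAS/FreeBSD, you can sanity-check that every NVMe drive shows up, and at the expected link width, with something like:

nvmecontrol devlist    # list the NVMe controllers and namespaces FreeBSD detected
pciconf -lvc           # inspect PCIe capabilities, including negotiated link width, per device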

Dual 10Gbit cards for San.
Depending on the cards, you could go with a 4-port x16 card in an x8 slot to save a slot. Consider that an x8 PCIe 3.0 slot is ~8GB/s, so 64Gb/s full duplex. Take off 20% for overhead and you're left with ~51Gb/s, still plenty for 4x 10Gb ports. I did something similar with a dual-port 10Gb card; I reached out to Chelsio to confirm that their card could still push 10G through both ports in the slower slot.

So, if you go with the Supermicro NVMe adapters and a 4-port 10Gb NIC, that leaves 1 slot for whatever else. If you really need InfiniBand, you could use it for that.

It only provides power and has 6x 8087 connectors.
Do you have the exact model number? I bet it is an expander backplane similar to the SAS846EL2, in which case you can plug the backplane directly into the two ports provided by your onboard Broadcom 2308 and you're done. It has the added advantage that you can use that backplane to run other chassis with additional drives without using an additional PCIe slot for another HBA. If you're only running spinning rust off that backplane, you'll have plenty of bandwidth on those two SAS2 ports, at a combined total of ~48Gb/s.

This is the tricky question... So far we have a linux running ZFS. That was one of the reasons my question was vague. To have extended infiniband support we have to drop the FreeNAS and setup manualy. There's no problem cause we have skilled personal at the company.
Be aware that many of the common suggestions on these forums relate directly to FreeNAS, on account of how it manages and uses ZFS under the hood.
 

Miguel Nunes

Explorer
Joined
May 6, 2016
Messages
52
I did forget about the backplane. It only provides power and has 6x 8087 connectors.

Your suggestions are really insightful. I am sure the board provides bifurcation; I checked that in the BIOS setup.

About the backplane: it is really passive. It just has some capacitors to filter the power, plus the 8087 connectors. It is like a Norco case, but it was sold from the U.K.

Meanwhile, I remember someone in the forums recommending keeping as little silicon as possible between the HBA and the disks. I guess a second HBA would fulfil that goal better.

I understand the info here is FreeNAS-specific. Basically, what we want is to learn about ZFS. If there were no problems with the InfiniBand cards we would be ready to use FreeNAS; because of past difficulties we are trying a mixed solution.

But thank you for pointing that out. I will now take some time to read the posts you recommended.

MAN




 

Miguel Nunes

Explorer
Joined
May 6, 2016
Messages
52

Based on your input and some references to InfiniBand on the forums, I am going to try something slightly different.

Instead of running the server software on Linux, I will try with VMs and bhyve.

If the system passes the tests, I will convert all NASes to FreeNAS.

Wish me luck. I will post the results.

MAN
 

Miguel Nunes

Explorer
Joined
May 6, 2016
Messages
52
First Report:

- ZFS on Linux generates pools that are incompatible with FreeNAS.
I haven't been able to import the pool, even read-only.

- An old HP expander works with a dual 8087 connection, so all 24 drives are accessible and I only waste one PCIe slot (a quick check is sketched below). Past problems were probably due to firmware issues on a PERC H700. We are moving away from hardware RAID and only use it on workstations to cope with larger pools under Windows.

- A pool created on FreeNAS is importable on Linux. I am currently copying some files to the FreeNAS-created pool on Linux to figure out whether I can transfer data quickly.
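As a quick sanity check that everything really is visible behind the expander (assuming a FreeNAS/FreeBSD shell):

camcontrol devlist     # every disk exposed through the HBA/expander chain
zpool status           # confirm the pools see all of their member disks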

So I have 4 free slots that I can use for NVMe. Instead of the bigger x16 card, I can go with 4x dual-NVMe cards, because the motherboard supports bifurcation on all slots. That means 8 NVMe drives that could, for example, be configured in a RAID-Z2 pool. (Any suggestions regarding this?)

I will be working on data migration now and trying to figure out whether bhyve works for me.

MAN
 

Miguel Nunes

Explorer
Joined
May 6, 2016
Messages
52
Managed to import the ZFS on Linux pool in read-only mode now that all disks are behind the expander.
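For anyone following along, the read-only import is just (the pool name is a placeholder):

zpool import -o readonly=on tank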
 
Last edited:

Miguel Nunes

Explorer
Joined
May 6, 2016
Messages
52
More Results:

I have been able to import the ZFS on Linux pool on FreeNAS read-only.

bhyve seems right for what I want to do. I am looking into how to pass through a card.

The InfiniBand driver is working using tunables. The IPoIB module is also loadable, but it locks up the shutdown process. Working on a solution for that.
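For the record, the driver is loaded at boot through loader tunables along these lines; the exact module names depend on the FreeBSD/FreeNAS release, so treat them as an assumption to verify against your version:

# /boot/loader.conf (or FreeNAS "loader" type tunables)
mlx4ib_load="YES"      # Mellanox ConnectX InfiniBand driver (assumed module name)
ipoib_load="YES"       # IP over InfiniBand (assumed module name)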



Now doing a burn-in test with the following chain of events:

Decompress from read-only pool -> Compress to gzip-9 -> Calculate checksum -> Calculate parity data -> Encrypt

CPU usage is high, which validates the choice of CPUs.


For those who didn't know, Midnight Commander is now part of FreeNAS. Give it a try by executing the mc command.


MAN
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
- ZFS on Linux generates pools that are incompatible with FreeNAS.
You need to manually choose which feature flags you want when you create the pool, if you're looking for compatibility. For the FreeNAS side of things, check this out: https://www.ixsystems.com/community/resources/zfs-feature-flags-in-freenas.95/

For the Linux side of things, the general concepts are the same, but you'll have to use man zpool-features on your system to figure out what exactly you have.
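For example, a compatibility-focused pool can be created with every feature flag disabled and only the ones you actually want enabled explicitly (a sketch; the pool layout and chosen features are placeholders):

zpool create -d \
    -o feature@lz4_compress=enabled \
    -o feature@async_destroy=enabled \
    tank raidz2 da0 da1 da2 da3 da4 da5 da6 da7   # -d disables all feature flags first
zpool get all tank | grep feature@                # review what ended up enabled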
 

Miguel Nunes

Explorer
Joined
May 6, 2016
Messages
52
You need to manually choose which feature flags you want when you create the pool, if you're looking for compatibility. For the FreeNAS side of things, check this out: https://www.ixsystems.com/community/resources/zfs-feature-flags-in-freenas.95/

For the Linux side of things, the general concepts are the same, but you'll have to use man zpool-features on your system to figure out what exactly you have.

I am going to create the pool on FreeNAS and keep using FreeNAS. It uses fewer feature flags and is readable without using the -f option. I am going to transfer the read-only pool to a read-write one, then destroy it and create a new pool on the source drives.
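A rough sketch of the transfer (pool and dataset names are placeholders): since the source pool is imported read-only, new snapshots can't be created on it, so this relies on a snapshot that already exists, or falls back to a plain copy:

# if a suitable snapshot already exists on the read-only source:
zfs send -R oldpool@existing_snap | zfs receive -Fu newpool/migrated
# otherwise, a straight file-level copy:
rsync -aHX /mnt/oldpool/ /mnt/newpool/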

Don't do this kind of stuff!!! It is crazy and dangerous.
 

Miguel Nunes

Explorer
Joined
May 6, 2016
Messages
52
Data move was successful.

Already Implemented:

Pool 1 (HDD RAID-Z2, 8 drives of 1TB each): Less frequently changed data. (You can see this as a static file server.)

Pool 2 (HDD RAID-Z2, 8 drives of 1TB each): Home folders for OS accounts (Windows, Linux distros and macOS).


Being planned:

Pool 3 (HDD RAID-Z2, 8 drives of 3TB each): Non-speed-sensitive large common data.

Pool 4 (NVMe RAID-Z2, 8 drives of 500GB each): Speed-sensitive data for a defined moment in time.


Questions:


HDDs for Pool 3 have not been acquired yet. Would you recommend RAID-Z2 or RAID-Z3 for 3TB drives? What would be the turning point for RAID-Z3?

NVMe drives for Pool 4 haven't been acquired yet. Same question here about RAID-Z2 versus RAID-Z3.


Thanks in advance,
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
What would be the turning point for RAID-Z3?
There is no single point. It depends on your requirements - how reliable you need it to be, how long you'd have to wait to replace disks, etc.
 