Specific NAS requirements (24x7 writes at 100-200 MB/s, occasional light reads at 1-50 MB/s)

Status
Not open for further replies.

IcePlanet

Cadet
Joined
Aug 9, 2017
Messages
7
Hello

I would like to ask for help with building a very specific NAS. Until now my experience has been limited to standard RAID 0, 1, 5 and 6, and to SATA drives only (no experience with SAS/10Gb/InfiniBand), so I would appreciate any help, and personal experience even more.

The stored data comes from 3 black boxes that are Linux based but very limited in configurability: I can only set the mount point they write to, and they always need an OK response (regardless of whether the write operation actually succeeded). They might read some data, but only in very limited amounts (at most 1 MB/s for up to 3 hours, plus standard filesystem access).

The requirements for the NAS are the following:
  1. Data is written 24x7 at transfer rates of 100 to 200 MB/s (by 3 black boxes, each with a 1 Gb Ethernet interface); approximately 50 to 100 files are written in parallel (1-2 MB/s per file), and file sizes are 1 to 5 GB (file_size_threshold = 5 GB) - a rough arithmetic sanity check of this load follows after this list
  2. Data is read at most 12 hours/day at read rates of 1 to 50 MB/s (by 2 to 5 clients connected via 1 Gb Ethernet), so most data will be deleted by the purge without ever being read :)
  3. One file is recorded on at MOST 1 physical HDD (the exception is files larger than the set threshold, where the write continues on another drive); if the free space on an HDD is less than file_size_threshold, no new write starts on that drive
  4. In case of an HDD failure, loss of the data written on that drive is accepted, but there should be no outage of the whole NAS
  5. If an HDD fails during a write to that particular drive, the write operation should continue and not return any error to the black box (the black boxes are very sensitive, die easily and cannot be fixed); the file itself is lost
  6. No hot swap/hot plug is necessary; however, it must be possible to identify a faulty drive
  7. If one particular HDD is removed and connected to a standalone PC, the files recorded on that drive must be readable (any filesystem that can be read by Windows or Linux is accepted)
  8. For power/cooling dimensioning it would be great if delayed spin-up could be used (I can also do this at the HW level)
  9. Initial capacity will be ~20 TB; it will grow to ~100 TB by DEC 2017
  10. All files are to be stored in one directory (all 3 black boxes need to see one and the same mount point)
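To make requirement 1 concrete, here is a rough back-of-the-envelope check in Python. The per-file rates, file counts and number of 1 GbE links come from the requirements above; the assumed per-disk sequential write speed is my own placeholder, not a measured figure.

```python
# Rough sanity check of the write load in requirement 1.
# Only PER_DISK_SEQ_MB_S is an assumption; the rest comes from the requirements.

GBE_LINKS = 3                 # three black boxes, 1 GbE each
GBE_PAYLOAD_MB_S = 117        # practical payload of one 1 GbE link, roughly
FILES_PARALLEL = (50, 100)    # concurrent files being written
PER_FILE_MB_S = (1, 2)        # write rate per file
PER_DISK_SEQ_MB_S = 150       # assumed sustained write speed of one SATA HDD

low = FILES_PARALLEL[0] * PER_FILE_MB_S[0]     # 50 MB/s
high = FILES_PARALLEL[1] * PER_FILE_MB_S[1]    # 200 MB/s
ceiling = GBE_LINKS * GBE_PAYLOAD_MB_S         # ~351 MB/s network ceiling

print(f"aggregate write load: {low}-{high} MB/s (network ceiling ~{ceiling} MB/s)")
# The 200 MB/s peak fits inside 3 x 1 GbE but exceeds what one HDD can
# sustain, so incoming files have to be spread over several drives:
print(f"drives busy at peak: at least {-(-high // PER_DISK_SEQ_MB_S)}")
```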

My intentions and questions are the following:
  • SW: Does any "RAID" level (or non-RAID filesystem) exist that can work in this configuration? I have not come up with any idea how to realize this (writing my own filesystem is not feasible)
  • HW: I have no experience with InfiniBand, but I was thinking of putting 4x dual InfiniBand cards into the PC running FreeNAS (these cards could serve 32 HDDs), plus one quad-Gbit card to connect the black boxes and a Gbit switch for the read operations

Here is the HW:

InfiniBand card: 4x, HP or Mellanox based
InfiniBand to SATA: 8x split cables, each from 1 InfiniBand port to 4 SATA ports
Quad Gbit card: 1x, Intel version

+ standard PC components like Intel i7, boot SSD, RAM etc...

Can you please advise me on what I'm doing wrong, or whether it will work at all? Any ideas for a better setup? On the SW side I have not managed to find any working solution so far. I am also in doubt whether it is sufficient to plug the cable into the SATA HDD on one side and the InfiniBand card on the other, or whether I need something in between. There is very limited information on the internet about the InfiniBand split to SATA.

Thanks Peter
 

Thomas102

Explorer
Joined
Jun 21, 2017
Messages
83
I think you should explain more about requirements 3, 7 and 10 (and 4, and 5...).
It looks like they are linked to "limitations" or "optimisations" of your current architecture.
There might be other ways to cover your functional use cases with FreeNAS.
 

IcePlanet

Cadet
Joined
Aug 9, 2017
Messages
7
Please see the explanations and considerations so far that have led to the questioned requirements:
  • Priority is space, not reliability
  • Storage size will be ~100 TB (or maybe even more), so respecting the previous point, data loss should be limited to the contents of the lost HDD(s) (for example, RAID 0 could provide almost everything needed, but the loss of one HDD would lead to the loss of all data); this directly implies req 3. Of course you can say that the number of HDDs could be doubled, but even setting aside the problems with slots etc., the priority is storage, so if the number of HDDs were doubled they would not be used for reliability but for size, i.e. to have 200 TB instead of 100
  • Sometimes a bigger block of files (e.g. 1 TB) will need to be copied to local storage; to not burden the NAS with this and not create a potential problem for the writing boxes (throughput etc.), I would simply disconnect the needed HDD(s) from the NAS, connect them to my local PC, copy the needed files and then connect them back - this implies req 7
  • Based on my previous experiences where a RAID has 'died' (for whatever reason not related to the HDDs), everything was lost because the SW used had been upgraded in the meantime, it could not read from an incomplete RAID, there was different FW on the mainboard, etc. So the requirement is to be able to read at least part of the data in case the NAS system dies - this also implies req 7
  • The boxes can be configured with only 1 mount point, and each box also needs to access the data of the other boxes so it knows what the others are doing - this implies req 10 (the boxes really are black boxes and there is no way to set anything other than the mount point)
  • Following the previous logic, if one HDD is lost the data on that HDD are lost, this is clear, but it should not result in the whole NAS going down, as that would impact the recording boxes - this implies reqs 4 and 5 (in other words: whatever happens, the boxes must continue to write and not receive any error)
  • By a simple reliability calculation (100000/32/24), one HDD should need replacing roughly every ~100 days, so a loss of ~5 TB (100/32) per ~100 days is acceptable; this is just to show the magnitude of the 'accepted data loss' if the NAS is built according to the listed requirements (using the same calculation, if RAID 0 were used the magnitude of the data loss would be 100 TB per ~100 days = 20x more data lost)
The "implies" on above text always means that I was not able to find any better solution.
The used values in calculation are MTBF = 100000hours; Number of HDDs = 32; 24hours per day; total storage size = 100 TB (the 5TB per drive is rounded up significantly), I'm fully aware that the calculation is not really exact, but presents good enough aproximation and can be used to compare various solutions (in my opinion)
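Here is that arithmetic written out (same inputs as above; note that 100000/32/24 actually comes out closer to ~130 days, and 100/32 to ~3 TB, which the figures above round):

```python
# Back-of-the-envelope reliability estimate using the values stated above.
# Crude model: constant failure rate, independent disks, no rebuild effects.

MTBF_HOURS = 100_000      # assumed MTBF per drive
N_DRIVES = 32
TOTAL_TB = 100

days_between_failures = MTBF_HOURS / N_DRIVES / 24   # ~130 days for the array
tb_lost_per_failure = TOTAL_TB / N_DRIVES            # ~3.1 TB with independent disks

print(f"one drive replacement roughly every {days_between_failures:.0f} days")
print(f"data lost per failure, independent disks: ~{tb_lost_per_failure:.1f} TB")
print(f"data lost per failure, one big RAID 0:     {TOTAL_TB} TB")
```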

The only other idea considered was RAID 6, but in the end the performance drawbacks and the rebuild process after the loss of an HDD led to the exclusion of this solution.
Other RAID levels were excluded for the following reasons:
  • RAID 0: the loss of 1 HDD takes down the whole storage
  • RAID 1: not enough slots available; the HDD count would have to be doubled without significant benefit (as redundancy is not the priority)
  • RAID 5: the rebuild process
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
One file is recorded on at MOST 1 physical HDD (the exception is files larger than the set threshold, where the write continues on another drive); if the free space on an HDD is less than file_size_threshold, no new write starts on that drive
I think this requirement is going to be the killer, at least as far as FreeNAS is concerned--once disks are added to a pool, there's no way to direct writes/files to a specific disk, as the storage is pooled.
 

IcePlanet

Cadet
Joined
Aug 9, 2017
Messages
7
The mentioned 'killer' requirement also has a background function, as there must be some form of load balancing (for example, if after the first boot the system decided to write all incoming streams to the first HDD, it would not work due to the write performance of a single rotational HDD), so this requirement also helps to 'load balance' the writes.
 

IcePlanet

Cadet
Joined
Aug 9, 2017
Messages
7
I have also discussed this with an expert I know, and his proposal was to write a proprietary filesystem driver that takes all incoming requests, finds the HDD that has the lowest number of active write operations and more than 5 GB of free space, and diverts the write request there. All HDDs would have their own filesystem, and the proprietary FS would keep a copy of all the file locations on the main boot drive (here I expect to have a RAID 1 of 2 SSD drives). This would allow standard operations like listing files to be answered without spinning up all the drives, and would also allow the delayed spin-up functions to be controlled. But my programming skills are very limited and I cannot write a proprietary FS driver.
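For illustration only, the selection policy he described could look something like the sketch below (plain Python rather than a filesystem driver; the mount paths, in-flight counters and the 5 GB threshold are hypothetical placeholders, not an existing implementation):

```python
# Illustrative sketch of the "least-busy drive with enough free space" policy.
import shutil

FILE_SIZE_THRESHOLD = 5 * 1024**3          # 5 GB, from requirement 3

# One mount point per data disk, each carrying its own filesystem, with a
# counter of writes currently in flight on that disk (hypothetical paths).
active_writes = {
    "/mnt/disk01": 0,
    "/mnt/disk02": 0,
    "/mnt/disk03": 0,
}

def pick_drive_for_new_file():
    """Return the drive with the fewest active writes that still has more
    than FILE_SIZE_THRESHOLD bytes free, or None if no drive qualifies."""
    candidates = [
        (busy, path)
        for path, busy in active_writes.items()
        if shutil.disk_usage(path).free > FILE_SIZE_THRESHOLD
    ]
    if not candidates:
        return None
    _, path = min(candidates)      # least-busy qualifying drive
    return path

# A real driver would also persist the file -> drive index on the mirrored
# boot SSDs and keep going silently if a drive disappears mid-write.
```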
 

Thomas102

Explorer
Joined
Jun 21, 2017
Messages
83
Please see the explanations and considerations so far that have led to the questioned requirements:
  • Priority is space, not reliability
Based on what you said, you do need reliability: you don't want the whole NAS to go down.
  • Storage size will be ~100 TB (or maybe even more), so respecting the previous point, data loss should be limited to the contents of the lost HDD(s) (for example, RAID 0 could provide almost everything needed, but the loss of one HDD would lead to the loss of all data); this directly implies req 3. Of course you can say that the number of HDDs could be doubled, but even setting aside the problems with slots etc., the priority is storage, so if the number of HDDs were doubled they would not be used for reliability but for size, i.e. to have 200 TB instead of 100
Again, you want to implement your own reliability by keeping the disks independent.
The issue is that this is not how ZFS/NAS works.
I think if you want to go with a NAS you have to follow NAS rules.
  • Sometimes a bigger block of files (e.g. 1 TB) will need to be copied to local storage; to not burden the NAS with this and not create a potential problem for the writing boxes (throughput etc.), I would simply disconnect the needed HDD(s) from the NAS, connect them to my local PC, copy the needed files and then connect them back - this implies req 7
To avoid throughput problems you should first plan the NAS carefully. The approach you describe can do more harm than the problem it is supposed to solve.
  • Based on my previous experiences where a RAID has 'died' (for whatever reason not related to the HDDs), everything was lost because the SW used had been upgraded in the meantime, it could not read from an incomplete RAID, there was different FW on the mainboard, etc. So the requirement is to be able to read at least part of the data in case the NAS system dies - this also implies req 7
There are different levels of RAID that should avoid that.
  • The boxes can be configured with only 1 mount point, and each box also needs to access the data of the other boxes so it knows what the others are doing - this implies req 10 (the boxes really are black boxes and there is no way to set anything other than the mount point)
The mount point for the shares does not have to match the disks' mount points.
  • Following the previous logic, if one HDD is lost the data on that HDD are lost, this is clear, but it should not result in the whole NAS going down, as that would impact the recording boxes - this implies reqs 4 and 5 (in other words: whatever happens, the boxes must continue to write and not receive any error)
This is clearly not how ZFS/NAS works.
  • By a simple reliability calculation (100000/32/24), one HDD should need replacing roughly every ~100 days, so a loss of ~5 TB (100/32) per ~100 days is acceptable; this is just to show the magnitude of the 'accepted data loss' if the NAS is built according to the listed requirements (using the same calculation, if RAID 0 were used the magnitude of the data loss would be 100 TB per ~100 days = 20x more data lost)


The "implies" on above text always means that I was not able to find any better solution.
The used values in calculation are MTBF = 100000hours; Number of HDDs = 32; 24hours per day; total storage size = 100 TB (the 5TB per drive is rounded up significantly), I'm fully aware that the calculation is not really exact, but presents good enough aproximation and can be used to compare various solutions (in my opinion)

The only other idea considered was RAID 6, but in the end the performance drawbacks and the rebuild process after the loss of an HDD led to the exclusion of this solution.
Other RAID levels were excluded for the following reasons:
  • RAID 0: the loss of 1 HDD takes down the whole storage
  • RAID 1: not enough slots available; the HDD count would have to be doubled without significant benefit (as redundancy is not the priority)
  • RAID 5: the rebuild process
ZFS does the RAID on FreeNAS. You have alternatives like striped mirrors (RAID 10) or RAID 50/60-style layouts that may fit your requirements.
I think doubling the disks is nothing compared to the cost of creating and testing your own filesystem; 100 TB of HDDs is around $5,000.
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
Your design is really messed up. I have no clue what you are trying to do but it's not going to work and you should rethink your requirements. For example you want individual disks and to control the writes to them but at the same time you want a single namespace. These 2 things are direct opposites of each other.
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
From what you are describing, you are not going to get that unless you literally design the operating system yourself or hire someone to do it. You might be able to use an 'off the shelf' operating system, but you will definitely need some software to handle what goes where with regard to incoming file transfers, and that is where you run into the big problem.
It also sounds like you have no concept of how file transfer works when sending data to a storage server. You are not bound by the speed of any single disk because the data is striped across multiple physical disks.
I manage a server at work that runs RedHat Linux and uses ZFS for the storage pool. It has two 10 Gb links to the network and can almost fully saturate them with many users reading and writing to it simultaneously.
I don't think you are asking the right questions or thinking about your problem in a way that is going to fit a solution that already exists.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
I have also discussed this with an expert I know, and his proposal was to write a proprietary filesystem driver
The answer to your problem is never, ever, "write a new filesystem", unless you have huge resources. I recommend that you ignore the advice of this "expert", because they clearly have no idea what they're talking about.
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
The answer to your problem is never, ever, "write a new filesystem", unless you have huge resources. I recommend that you ignore the advice of this "expert", because they clearly have no idea what they're talking about.

The expert's answer is correct given the requirements.

The requirements are wrong.

I suggest redefining your requirements.

For a start, what protocol is the mount point? SMB, iSCSI or NFS?
 

IcePlanet

Cadet
Joined
Aug 9, 2017
Messages
7
So now you have explained how it is not possible, great :-D
What about making it possible? A few justifications for what was written:
Reliability importance: it IS important to have the NAS (as a service) available, but if about 10 TB per 100 days of data is lost, that is OK; loss of the whole NAS is NOT OK.
The current mount looks like CIFS (theoretically NFS should also be possible, but nobody has ever tried it and I prefer not to be the first one).
What would you then recommend for this use case?
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
@IcePlanet
I have to agree with everyone else. FreeNAS and ZFS simply won't do what you want it to do. In my opinion, you need a different solution.
There are just too many odd things in the requirements that preclude a normal NAS. Any normal NAS.

I don't think you understand what InfiniBand is. It won't control disks; it is a network/cluster interconnect. The same connector was used by some SAS (and a few SATA) disk controllers, which is why cables splitting one such port into 4 x SATA connectors exist. What you really want is a SAS controller.

Last, 100 TB is a sh*t load of data, and trying to support 200 megabytes per second of writes to a single namespace is problematic.

Sorry, but in my opinion there are no changes to FreeNAS that could make it work for your use case. And other than custom SMB (previously known as CIFS) code, I don't think it's possible with anything.

The closest I can think of is a union filesystem on the back end, with a single partition/filesystem per disk. This would let multiple disks appear as one, yet remain totally independent in case of failure (though I don't know exactly what happens on failure). It might also allow a disk to be removed and read in a client machine. Of course, this does not solve pushing data to the union filesystem with the requirement that each file be written to a single disk; that would require custom code (perhaps in Samba/CIFS).
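Purely as an illustration of that union idea, a minimal Python sketch of the read side, assuming one filesystem per disk mounted under its own directory (the branch paths are placeholders; a real setup would use an actual union/overlay layer):

```python
# Illustrative only: resolving files and merging listings across per-disk
# filesystems, the way a union-style front end would present them.
import os

BRANCHES = ["/mnt/disk01", "/mnt/disk02", "/mnt/disk03"]   # one FS per disk

def resolve(name):
    """Return the full path on whichever disk holds the file, or None if
    no surviving disk has it (e.g. that disk failed or was pulled)."""
    for branch in BRANCHES:
        candidate = os.path.join(branch, name)
        if os.path.exists(candidate):
            return candidate
    return None

def merged_listing():
    """The single directory the black boxes would see: the union of all
    per-disk directories, duplicates collapsed."""
    names = set()
    for branch in BRANCHES:
        if os.path.isdir(branch):
            names.update(os.listdir(branch))
    return sorted(names)
```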

Good luck.
 

Thomas102

Explorer
Joined
Jun 21, 2017
Messages
83
So now you have explained how it is not possible, great :-D
Actually, what is not possible is the way you derive your specifications (3, 4, 5, 6, 7, 8) from your requirements.
And it looks like your first requirement is to save the maximum amount of money on hardware.
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
Personally, I'd suggest building a 100TB NAS with sufficient capability and redundancy to meet the requirements and then a backup script to fill disks.
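As a sketch of what such a "fill disks" script could look like (paths and the free-space reserve are assumptions, not anything discussed in this thread): the pool keeps ZFS redundancy, while each offload disk carries a plain filesystem and stays independently readable, which is what req 7 was after.

```python
# Hypothetical offload script: copy finished recordings from the redundant
# pool onto one standalone disk until it is nearly full, then switch disks.
import os
import shutil

SOURCE = "/mnt/pool/recordings"     # dataset the black boxes write into (assumed)
TARGET = "/mnt/offload01"           # one standalone disk with its own filesystem
RESERVE = 5 * 1024**3               # stop when less than ~5 GB would remain free

def fill_disk():
    for name in sorted(os.listdir(SOURCE)):
        src = os.path.join(SOURCE, name)
        if not os.path.isfile(src):
            continue
        if shutil.disk_usage(TARGET).free - os.path.getsize(src) < RESERVE:
            break                    # this disk is full enough; move to the next
        shutil.copy2(src, os.path.join(TARGET, name))

if __name__ == "__main__":
    fill_disk()
```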
 

IcePlanet

Cadet
Joined
Aug 9, 2017
Messages
7
So OK, I understand that the way I have chosen is a dead end. Got it, but which way is then correct?

The hard facts are that there are these three 'sensitive' black boxes (technical details in the 1st post) and that the complete storage is planned to grow to 100 TB (any reserve would be appreciated).
The hint "building a 100TB NAS with sufficient capability and redundancy" is not really helpful for me.
Saving money on HW is not the point, but that is different from wasting money on HW (yes, I can buy a ready-made NAS for a lot of money with a lot of features I will never use, and that I consider wasted money). So rather than saving on HW, I would say the goal is the most effective spend for my use case AND easy maintenance and service, so I strongly prefer off-the-shelf components that can be easily replaced in case of an issue and do not bind me to one supplier/manufacturer.
As I plan a longer lifetime for this NAS, it would be great if adding drives were the only work needed...

Also, several experiences I have collected during my personal and professional work:
  • Most RAIDs I have seen fail have failed because of an issue other than an HDD (e.g. once it was different FW in the controller, which was not compatible with the HDDs used as replacements; another time the rebuild process killed further HDDs and got out of control; the last failure was caused by a HW issue in the power supply that killed all HDDs with AC; and many more examples)
  • What I have built have been very small RAIDs of at most 5 TB, using either RAID 10, 5 or 6 (all based on SCSI or SATA and using mostly Adaptec or Mylex controllers). I have no experience with other RAIDs (so, from today's point of view, quite ancient technology)
  • When benchmarking these RAIDs, all my builds start to die at around 100 MB/s
  • Also, a significant drawback of all the RAIDs I have built was missing extensibility: if I wanted to add storage I was bound to use the same size drives as before (even when larger drives were available or more fitting for my purpose), as the RAID did not accept bigger drives (to be more precise, it accepts bigger drives but can only use capacity equal to the smallest drive in the RAID)
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
Btw, I think unraid works like you want.
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
Also, a significant drawback of all the RAIDs I have built was missing extensibility: if I wanted to add storage I was bound to use the same size drives as before (even when larger drives were available or more fitting for my purpose), as the RAID did not accept bigger drives (to be more precise, it accepts bigger drives but can only use capacity equal to the smallest drive in the RAID)
This is not exactly the case with ZFS; you can expand a volume by adding additional RAID sets, or vdevs as they're called (in any configuration, with any drive capacities). However, within a vdev, the capacity will be based on the capacity of the smallest drive, as you're saying. So, for a few examples (a short sketch of the capacity arithmetic follows the list):
  • You have 2 x 2 TB, 2 x 4 TB, and 2 x 6 TB disks. You configure them in striped mirrors (similar to RAID 10). The 2 x 2 TB mirror will have a capacity of 2 TB, the 2 x 4 TB mirror will have a capacity of 4 TB, and the 2 x 6 TB mirror will have a capacity of 6 TB. Together, they'll have a capacity of 12 TB in a single pool (volume).
  • You later acquire 2 x 8 TB drives and want to expand the above pool. You add them as another mirror, and the pool capacity increases to 20 TB. Repeat as needed or desired.
  • You have 6 x 4 TB and 6 x 6 TB disks. You configure them in RAIDZ2 (similar to RAID6, two disks' worth of redundancy), into two vdevs. One contains the 4 TB disks and has a capacity of 16 TB, the other contains the 6 TB disks and has a capacity of 24 TB. Combined into a single pool, the pool will have a capacity of 40 TB.
  • One of the above 4 TB disks fails. You don't have a 4 TB disk on hand to replace it with, but you do have a 6 TB disk, so you replace the 4 TB disk with the 6 TB disk. This works just fine, but the capacity of your pool doesn't change.
  • A few more of the 4 TB disks fail, and you replace them as needed with 6 TB disks. The replacements run without problems, but again the pool capacity doesn't change.
  • You're left with one 4 TB disk in the pool. It doesn't fail, but you need more capacity. You replace it with a 6 TB disk. At that point, the capacity of that vdev increases from 16 TB to 24 TB, increasing the pool's capacity to 48 TB.
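A short sketch of the arithmetic behind these examples (usable capacity only; ZFS metadata overhead and TB vs TiB are ignored):

```python
# Reproduces the capacity figures from the examples above.

def mirror(*drives_tb):
    """Usable capacity of a mirror vdev = size of its smallest drive."""
    return min(drives_tb)

def raidz2(*drives_tb):
    """Usable capacity of a RAIDZ2 vdev = (n - 2) * smallest drive."""
    return (len(drives_tb) - 2) * min(drives_tb)

pool = [mirror(2, 2), mirror(4, 4), mirror(6, 6)]
print(sum(pool))                               # 12 TB of striped mirrors

pool.append(mirror(8, 8))                      # add a 2 x 8 TB mirror
print(sum(pool))                               # 20 TB

print(raidz2(*[4] * 6) + raidz2(*[6] * 6))     # 16 + 24 = 40 TB
print(raidz2(*[6] * 6) + raidz2(*[6] * 6))     # 48 TB once every 4 TB disk is 6 TB
```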
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Btw, I think unraid works like you want.
"Pretends to function long enough to fool people into believing it's a vaguely safe solution" is what I'd call it, FWIW.
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
Is the system you want to build going to be in a fixed facility or is it going to be moving, like on a ship?
 