New system build, looking for pool recommendations ssd drives

Richard Kellogg

Dabbler
Joined
Jul 30, 2015
Messages
27
I currently have 3x 4 TB Exos drives in a 3-way mirror running FreeNAS 11.3 on a Supermicro mini-ITX motherboard with 16 GB ECC RAM and an Intel i3 CPU (that supports ECC). I’m building a new system with TrueNAS using the following hardware.

Motherboard: used Supermicro X10SRL-F
CPU: used XEON E5-2630L v4
Ram: used 64 gb ECC
Boot Drives: new old stock 2- Intel Enterprise 4510 SSD 240 gb sata
Data drives: new old stock 6- Intel Enterprise 4510 SSD 4 TB sata
Drives connected to motherboard Sata ports
1 gb ethernet via onboard controller to network switch
Truenas core 13.0

I’ve seen many discussions about the danger of only having 1 backup drive in a pool, and in my old system, at a cost of 2/3 raw storage capacity, I went for a 3-way mirror for best data integrity.

With enterprise SSDs, I’m questioning that philosophy. These drives have an unrecoverable read error rate specification of 1 sector per 10^17 bits read. That is 100x lower than enterprise HDDs. They also have a write endurance of around 4000 write cycles (e.g. about 1.1 petabytes written for the 240 GB drive).

So if I’m doing the math right, given only one parity drive in a pool, and a drive fails, the probability of an unrecoverable error while resilvering (assuming 4 TB drives) is less than 8 x 4 x 10^12 / 10^17 = 3.2 x 10^-4, or about 1 in 3125. Not zero, but pretty unlikely.
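For anyone who wants to sanity-check that arithmetic, here is a throwaway one-liner (the only inputs are the assumptions above: 4 TB of data re-read and a spec rate of 1 unrecoverable error per 10^17 bits):

awk 'BEGIN { p = 8 * 4e12 * 1e-17; printf "P(URE during resilver) ~ %.2e (about 1 in %.0f)\n", p, 1/p }'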

I would be more worried that in changing out the bad disk, I make a blunder in the GUI and end up destroying the pool. Especially since I rarely interact with the FreeNAS GUI after setting it up, I’m always rusty on what I’m doing.

I really like mirrors, as they are so simple and easier to recover from a failure. So I’m wondering, with 6 drives, what would be the best configuration?

I’m tempted to set up 5 separate 1-drive pools, with no redundancy, and 1 drive as an unpowered spare. Then set up each dataset to be on 2 pools. If a drive fails, it has limited data on it (typically 1/5 of the total, if I manage it correctly), and there is a backup on another pool (on a separate drive). (I’m keeping one of the drives as a spare because they are an old design and likely to be hard to find in a few years. With M.2 and U.2 NVMe, SATA SSDs are headed for extinction.)


No resilvering necessary. And if I did my math right, only a 1/3000 chance of an unrecoverable error while copying the lost datasets over to the replacement disk. But even if that happens, you don’t lose your entire pool, you only lose a file in the dataset.

But I would like to hear from the experts.

Thanks
 

chuck32

Guru
Joined
Jan 14, 2023
Messages
623
1 gb ethernet via onboard controller to network switch
Even with RAIDZ2 HDDs I was limited by 1GbE; I'd say reconsider 10G if your data pool is SSDs.

I’ve seen many discussions about the danger of only having 1 backup drive in a pool, and in my old system, at a cost of 2/3 raw storage capacity, I went for a 3-way mirror for best data integrity
RAID is not a backup. The only thing you need to decide is how badly you want to avoid restoring from backup and whether you trust the backup's data integrity.

If I'm right, with a two-way mirror you can lose individual files due to corruption, but TrueNAS will tell you which files, so you can restore them.
With RAIDZ2 or higher you won't have that problem.
I’m tempted to set up 5 separate 1-drive pools, with no redundancy, and 1 drive as an unpowered spare. Then set up each dataset to be on 2 pools.
I'm afraid that's not possible; as far as I know a dataset can only exist in one pool.
But with that you can go 2-way mirrors and achieve the same. Or go ahead and do a striped mirror pool, but then you risk pool failure if one vdev fails. For striping:
As was recommended to me, go either 4 drives in striped mirrors + 1 hot spare, or 6 + 1.
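For illustration, the first of those layouts would look roughly like this at the zpool level (on TrueNAS you would normally build it through the GUI; "tank" and da0..da4 are placeholder names):

zpool create tank mirror da0 da1 mirror da2 da3 spare da4
zpool status tank   # two mirror vdevs striped together, plus a hot spare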
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
I am currently in this position with replacing my HDDs with NVMe drives. Six 4TB drives. My real issue now is the cost of the upgrade, my pool is going to be RAIDZ2. My recommendation is using the six 4TB drives to create a RAIDZ2 pool. Two drive protection so you can lose 2 drives and your data is still there. Of course if you run frequent SMART Tests, you should be able to avoid dropping 2 drives at the same time. Some people ignore the warning messages and must say to themselves that they have lots of redundancy, until that last drive fails. Don't be one of those people if you value your data.

Don't worry about statistical failures. Something will eventually fail, but with an SSD it could be 3 months or 7 years. You are dealing with too few drives to really use that kind of math; if you don't believe that, look at how those numbers were generated. Replacing a drive in a RAIDZ2 is simple. Follow the instructions in the user guide. You could also build the system, put some data on it, and pull one drive to see what happens; you could then sanitize that drive (make it look new) and introduce it as a replacement drive to resilver into the pool. If you do that, I recommend you create a step-by-step list of instructions for yourself. This will ease your mind about what needs to be done when a drive does fail.
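As a rough command-line sketch of those replacement steps (the GUI does the same thing; "tank", ada3 and ada6 are placeholder names):

zpool status tank              # identify the failed or offlined disk
zpool offline tank ada3        # take the failed disk out of service if it is still attached
zpool replace tank ada3 ada6   # resilver onto the replacement disk
zpool status tank              # watch the resilver until it completes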

Additionally, I would recommend trying out SCALE instead. CORE will not see a version 14 as far as I'm aware; SCALE will supersede it. But whichever you choose, either should run fine. If you start with SCALE, your learning curve will be easier in the long run.

There are a lot of forum entries discussing this kind of stuff. Read more if you need to.
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
@joeschmuck - you keep repeating that Core is being deprecated and that Scale will be the only version of TrueNAS available

Can you please state where you are getting this information from - aka an official source - or please stop. I believe that you are wrong, very wrong
 

Etorix

Wizard
Joined
Dec 30, 2020
Messages
2,134
I’ve seen many discussions about the danger of only having 1 backup drive in a pool, and in my old system, at a cost of 2/3 raw storage capacity, I went for a 3-way mirror for best data integrity.

With enterprise SSDs, I’m questioning that philosophy.
The reasoning is entirely correct. With SSDs there is no cause for concern about UREs when resilvering without redundancy; the main risk is a further drive failure during resilver, and raidz1 or 2-way mirrors may still be considered adequate—depending on one's degree of paranoia.

I’m tempted to set up 5 separate 1-drive pools, with no redundancy, and 1 drive as an unpowered spare. Then set up each dataset to be on 2 pools. If a drive fails, it has limited data on it (typically 1/5 of the total, if I manage it correctly),
Here it goes downhill. A dataset cannot span pools. Single-drive vdevs are strongly advised against if the data is of any value. And if you mean to stripe five drives with no redundancy, losing any drive would mean losing ALL data.

The natural setup is a stripe of three 2-way mirrors. Get a seventh drive as cold spare.
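At the command line that layout is roughly the following (the GUI is the normal way on TrueNAS; da0..da5 are placeholder device names):

zpool create tank mirror da0 da1 mirror da2 da3 mirror da4 da5
zpool status tank   # three 2-way mirror vdevs striped together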
 

Richard Kellogg

Dabbler
Joined
Jul 30, 2015
Messages
27
Even with RAIDZ2 HDDs I was limited by 1GbE; I'd say reconsider 10G if your data pool is SSDs.

All of my end-user devices are connected via Google Wi-Fi. The TrueNAS box will be hard-wired to the main Google Wi-Fi unit via a switch. While the Google Wi-Fi can connect at greater than 1 Gbps over wireless, most of my devices are at least one room away and typically connect at 75 Mbps. So the bottleneck is not the 1 GbE NAS port. My use case is not one where the NAS is the center of the universe; rather, the NAS is used as a backup for all of my PCs. The backups happen daily and typically finish in a few minutes. I would like higher throughput to/from the NAS, and should mesh Wi-Fi get better, I will certainly go that way.

I'm afraid that's not possible; as far as I know a dataset can only exist in one pool.

I will be using rsync tasks. The 2 datasets won’t have identical names. There will be 2 rsync tasks, each with the same source, but different destination datasets.
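At the command level the two tasks boil down to something like this (TrueNAS rsync tasks wrap commands of this shape; user@pc1 and the paths are made-up examples):

rsync -a user@pc1:/Users/me/ /mnt/pool1/pc1-backup/
rsync -a user@pc1:/Users/me/ /mnt/pool2/pc1-backup/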

But with that you can go 2-way mirrors and achieve the same.

Agreed. But people have criticized that too, suggesting you need to have 2 redundant drives per pool. What I’m questioning is whether 2 redundant drives per pool are needed with enterprise SSDs that have a 1-in-10^17 uncorrectable error rate and 4000 write cycles of endurance.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
@joeschmuck - you keep repeating that Core is being deprecated and that Scale will be the only version of TrueNAS available
I am pretty sure one of the iXsystems people stated that there is no expectation of anything beyond CORE 13.

Well crap, maybe I misunderstood. Maybe nothing beyond version 13.x? But clearly 13.1 is expected to come out. Now I'm going to look for that posting or it will drive me crazy. I must have misunderstood it, or maybe it was poorly written. Either way, I am clearly in the wrong.

Thanks for calling me on that and allowing me to prove to myself that I was wrong. I won't forget the lesson I just learned.
 

Richard Kellogg

Dabbler
Joined
Jul 30, 2015
Messages
27
I am currently in this position with replacing my HDDs with NVMe drives. Six 4TB drives. My real issue now is the cost of the upgrade, my pool is going to be RAIDZ2. My recommendation is using the six 4TB drives to create a RAIDZ2 pool. Two drive protection so you can lose 2 drives and your data is still there. Of course if you run frequent SMART Tests, you should be able to avoid dropping 2 drives at the same time. Some people ignore the warning messages and must say to themselves that they have lots of redundancy, until that last drive fails. Don't be one of those people if you value your data.

Don't worry about statistical failures. Something will eventually fail, but with an SSD it could be 3 months or 7 years. You are dealing with too few drives to really use that kind of math; if you don't believe that, look at how those numbers were generated. Replacing a drive in a RAIDZ2 is simple. Follow the instructions in the user guide. You could also build the system, put some data on it, and pull one drive to see what happens; you could then sanitize that drive (make it look new) and introduce it as a replacement drive to resilver into the pool. If you do that, I recommend you create a step-by-step list of instructions for yourself. This will ease your mind about what needs to be done when a drive does fail.

Additionally, I would recommend trying out SCALE instead. CORE will not see a version 14 as far as I'm aware; SCALE will supersede it. But whichever you choose, either should run fine. If you start with SCALE, your learning curve will be easier in the long run.

There are a lot of forum entries discussing this kind of stuff. Read more if you need to.
Thanks for the suggestions.

Yes, I know about statistics, and I often criticize them too: with numbers that low, what is likely to get you is something not even considered (e.g. the drive got dropped when you installed it and something came loose, leading to failure). And I agree, at some point a drive will fail, and you need to be ready for that.

By the way, I was going to use M.2 drives in Supermicro dual M.2 NVMe-to-PCIe cards. The motherboard I selected has 5 PCIe x8 slots, so it could hold 10 M.2 drives (or 20 if you use the QNAP quad card). What stopped me was the cost of eBay enterprise M.2 4 TB drives, about 2x that of equivalent-size SATA drives. Of course the SATA drives are slower.

As for SCALE, I shied away from it because it hasn’t had the years of beating on it by users that CORE has. It seemed like a beta release to me. Maybe that is not correct. I don’t think I ever had freenas crash on me. The hardware (motherboard) has failed, but never the software. Of course my use case is pretty simple. A few daily rsyncs and snapshots, and once a month scrubs.
 

sfatula

Guru
Joined
Jul 5, 2022
Messages
608
I've never had Scale crash on me in a year. Forget the whole math thing, it's wrong anyway.

I see nothing wrong with mirrors, as long as you have a backup (ideally multiple). Performance will be superior. It's up to YOU how much parity you want, not anyone else's opinion.

Why rsync between every other drive? A mirror will be better than that, and it doesn't need syncing. Yes, if a drive fails you will replace it, but with mirrors that resilver is much faster than RAIDZ. Performance will be better with mirrors than with separate single drives, you won't have slowdowns due to rsync, and the data will always be in sync every second of every day. Also, the data will be checked every scrub. And of course you still need a backup with single drives even if they're rsynced.

For 6 drives, I see nothing wrong with 3 vdevs of mirrors if you need the performance. Again, you should always have backups. If you don't need the performance, a raidz2 will give you more space if money is a consideration. Of course, you still need backups. Up to you. But I most definitely would never go the single drive route myself.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
The motherboard I selected has 5 PCIe x8 slots, so it could hold 10 M.2 drives (or 20 if you use the QNAP quad card).
Download the user manual for the MB and make sure you have enough PCIe lanes to each slot. This has been a limiting factor for me, and the cost, and I need a new CPU that can support it all.

Each M.2 drive would need 4 PCIe lanes to take advantage of the speed, so do the math here: an x16 slot with all its PCIe lanes will support 4 M.2 drives at full speed, and a full x8 slot would handle two. And you have to be careful, because a MB that has multiple x16 slots may only provide 16 PCIe lanes to one of those slots and x8 or x4 to the others. If your add-on card uses only 4 PCIe lanes and time-shares them across four M.2 modules, then you are getting less than 1/4 of the throughput per module. 1 PCIe lane is basically the same as SATA speed from what I've read; I don't know if that is true, but based on the PCIe speed it sounded plausible.
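Rough numbers for orientation (PCIe 3.0, approximate): one lane carries on the order of 1 GB/s, so a single lane is already faster than SATA III (about 0.6 GB/s); an x4 link (about 4 GB/s) feeds one NVMe M.2 drive at full speed, an x8 slot two of them, and an x16 slot four.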

Be careful when buying that stuff, make sure it will all work together as you expect. If you do build this, I'd love to know exactly what parts you used and any bottlenecks you may have found.

I fully understand the thought about SCALE. I'm sticking with CORE for a while longer myself.
 

Richard Kellogg

Dabbler
Joined
Jul 30, 2015
Messages
27

Here it goes downhill. A dataset cannot span pools. Single-drive vdevs are strongly advised against if the data is of any value. And if you mean to stripe five drives with no redundancy, losing any drive would mean losing ALL data.

The natural setup is a stripe of three 2-way mirrors. Get a seventh drive as cold spare.
Apparently I was not clear on this. What I was suggesting was not a dataset spanning multiple pools, but rather 2 datasets, each in a separate pool, with both datasets having the same contents (e.g. 2 rsync tasks, each with the same source but different destination datasets). So all data is stored in 2 places. As was pointed out to me, this is essentially the same as a 2-way mirror, but without the automatic checking and repair of errors.

As for a 3-way stripe of 2-way mirrors, I am considering this, but I don’t see the advantage of striping over just 3 separate 2-way mirrored pools. You store files in datasets, and likely have quite a few of them, so it should be pretty easy to assign a dataset to a specific pool. But I see one huge disadvantage of a pool striped across three 2-way mirrors: lose one of the mirror vdevs, and you lose the entire pool.

Single drive pools not advised …

As long as you have 2 copies of your data on separate single drive pools, I don’t see the problem. If there is a read error, you will be informed, and you have a copy on the other drive. But of course, why not just mirror 2 drives, and let zfs correct the error.
 

sfatula

Guru
Joined
Jul 5, 2022
Messages
608
Apparently I was not clear on this. What I was suggesting was not a dataset spanning multiple pools, but rather 2 datasets, each in a separate pool, with both datasets having the same contents (e.g. 2 rsync tasks, each with the same source but different destination datasets). So all data is stored in 2 places. As was pointed out to me, this is essentially the same as a 2-way mirror, but without the automatic checking and repair of errors.

As for a 3-way stripe of 2-way mirrors, I am considering this, but I don’t see the advantage of striping over just 3 separate 2-way mirrored pools. You store files in datasets, and likely have quite a few of them, so it should be pretty easy to assign a dataset to a specific pool. But I see one huge disadvantage of a pool striped across three 2-way mirrors: lose one of the mirror vdevs, and you lose the entire pool.

Single drive pools not advised …

As long as you have 2 copies of your data on separate single drive pools, I don’t see the problem. If there is a read error, you will be informed, and you have a copy on the other drive. But of course, why not just mirror 2 drives, and let zfs correct the error.
Exactly, just mirror them instead of separate drives and save all that I/O trying to keep them in sync manually. Yes, there is a >0 chance of losing the pool, but the odds of 2 drives in the same vdev going at about the same time are pretty slim, especially if you keep up with SMART tests and scrubs. I've been using RAID and ZFS since about when mdraid came out and have never lost an array or pool, but I could of course, which is why I have backups.

The main advantage of a 3-way stripe of mirrors over 3 separate mirror pools is speed: you get better performance. Another possible slight advantage is being able to have larger datasets. One more, actually: you can have a hot spare, and it applies to all 3 vdevs.
 

Richard Kellogg

Dabbler
Joined
Jul 30, 2015
Messages
27
Download the user manual for the MB and make sure you have enough PCIe lanes to each slot. This has been a limiting factor for me, and the cost, and I need a new CPU that can support it all.

Each M.2 drive would need 4 PCIe lanes to take advantage of the speed, so do the math here: an x16 slot with all its PCIe lanes will support 4 M.2 drives at full speed, and a full x8 slot would handle two. And you have to be careful, because a MB that has multiple x16 slots may only provide 16 PCIe lanes to one of those slots and x8 or x4 to the others. If your add-on card uses only 4 PCIe lanes and time-shares them across four M.2 modules, then you are getting less than 1/4 of the throughput per module. 1 PCIe lane is basically the same as SATA speed from what I've read; I don't know if that is true, but based on the PCIe speed it sounded plausible.

Be careful when buying that stuff, make sure it will all work together as you expect. If you do build this, I'd love to know exactly what parts you used and any bottlenecks you may have found.

I fully understand the thought about SCALE. I'm sticking with CORE for a while longer myself.
Yes, getting enough PCIe lanes is the issue. The E5-2630L Xeon has 40 PCIe lanes, and on this motherboard you can arrange 5 slots with x8. There is also one slot with PCIe x4 (in an x8 physical slot) from the southbridge. To use the $55 Supermicro dual M.2 NVMe-to-PCIe x8 card, you need to bifurcate the lanes to x4/x4. Bifurcation was not originally available on my Supermicro motherboard; however, it was added with a BIOS upgrade.

Now the QNAP QM2-4P-384 quad M.2 NVMe-to-PCIe card is an x8 card that doesn’t need bifurcation. It will also work in an x4 (electrical) / x8 (physical) slot. And as you say, the speed will be 1/2 (or 1/4 if using an x4 slot) of an M.2 drive in its own x4 slot.

I have the QNAP card and it’s in my older 4x 4 TB drive FreeNAS box, with 4x 2 TB M.2 NVMe drives. The card is in the single x16 slot of the mini-ITX Supermicro motherboard. It’s experimental at the moment, with 2 drives in a 2-way mirror and a replication task copying a few of my datasets.

It appears to work very well. But that card is $180, so pretty expensive. Also kind of a pain to keep track of which drive is which, as the heat sink covers them.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
QNAP QM2-4P-384 quad M.2 NVMe-to-PCIe card
Nice. A little pricey, but you do not have to worry about all the PCIe lanes and bifurcation. The only thing I don't like about it is the fan. Is it really going to dissipate much more heat if it's spinning? I'd just unplug it and test it out. PCIe 3 drives do not heat up like PCIe 4 drives do, and I would be purchasing PCIe 3 myself; heat is the reason, to be honest. I could buy PCIe 4 if the price were cheaper and still run at PCIe 3 speeds. It's just the way I think. I prefer a fanless system. The CPU fan and one exhaust fan is all I would try to get away with if I could.

It’s experimental at the moment
Then I would recommend you recreate the pool in a few ways. You have RAIDZ2, mirrors, and whatever else you want to try. Test them out. You will find one you like. You know my favorite is RAIDZ2, but I do not have any demanding I/O tasks. I store archival-type data and some videos, and I only have a 1Gbit network. Upgrading to 10Gbit would not be financially worth it for me. So a backup of a computer changes from 20 minutes to 5 minutes; it is not a big deal to me. If this were a business then the time savings could warrant it if I'm paying someone to do this. For a home system, nah, I have time. Plus if I automate it, I couldn't care less if it took 2 hours. Just giving you my perspective. Other people will do things differently, and I suspect you have an idea in your mind of what you want the system to do.

Experiment, I did for a few weeks and then I finalized my first FreeNAS system. When I had to replace my drives, I changed my configuration (after I backed all my data up).

Best of luck to you.
 

Richard Kellogg

Dabbler
Joined
Jul 30, 2015
Messages
27
It’s been a while, so as a reminder, here is my system:

Motherboard: used Supermicro X10SRL-F
CPU: used XEON E5-2630L v4
Ram: used 64 gb ECC
Boot Drives: new old stock 2- Intel Enterprise 4510 SSD 240 gb sata
Data drives: new old stock 6- Intel Enterprise 4510 SSD 4 TB sata
Drives connected to motherboard Sata ports
1 gb ethernet via onboard controller to network switch
Truenas core 13.0

I finally got around to creating a RAIDZ2 pool using 5 of the 6 4 TB SSDs. I then created a dataset and rsync tasks to back up 3 PCs. This went well, with all 3 PCs backed up (only backing up the main user under the Users directory). A day later, one of the disks was declared bad and removed from the pool by TrueNAS.

Since these are new disks that were installed in Nov 2023 but not put into a pool until yesterday, I found it hard to believe that a disk had failed. What was the failure? I tried to replace the failed disk (following the manual) with the spare I had already installed, but the drop-down list of available drives was empty.

So instead, I destroyed the pool and created just a 2-disk mirror (for more testing), then rewrote the rsync tasks and backed up the PCs to this new pool. I can’t see how the drive could have been declared bad. If there were a write error, shouldn’t the disk automatically remap the offending sector? It just seemed fishy to me.

After destroying the RAIDZ2 pool, the bad drive was not listed as an available drive. So I removed it from the TrueNAS box, put it into a Windows box, reformatted it, and it appeared OK. I was able to write files to it.

I then put the Windows-formatted drive back into the TrueNAS box and included it in a 2-disk mirror. I then set up a replication task that copied my first dataset (the one with the 3 PC backups) to this new pool (containing the "bad" disk). The replication task was successful.

To me, this casts doubt on why TrueNAS was so quick to declare a new, and apparently good, drive as bad. Time will tell.

Keep in mind, I only use the NAS as a backup device. It is not used as primary storage; the primary storage is the PC that I’m backing up. I already have a FreeNAS device that backs up these PCs, and I am in the process of adding this TrueNAS box as a secondary backup.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
A day later, one of the disks was declared bad and removed from the pool by TrueNAS.
Do you recall the exact error? I suspect it was a ZFS error, not an actual drive error. Basically, either the drive got booted out of the pool because it was reacting slowly, or there can be an occasional glitch if you are using SCALE, while CORE is pretty darn stable. I'm not saying SCALE is bad, but it's not as mature as CORE. Knowing the exact error message makes a difference, and the recovery is completely different depending on the error message.

I too agree that it's unlikely the drive is bad. BUT, post the output of smartctl -a /dev/??? where ??? is the drive ident (da0, ada0, sda, etc.), and if the drive is showing errors, we should see it there. Sector remapping does happen if there is a physical drive failure. Let's see what the SMART data shows.
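If you want to eyeball it yourself first, something like the following narrows it down (attribute names vary by vendor, so treat the grep terms as examples):

smartctl -a /dev/ada3 | grep -Ei 'overall-health|Reallocated|Pending|Wearout|CRC'

Non-zero reallocated or pending sector counts, or a failed overall-health result, point at the drive itself; a clean report where only the CRC error count grows usually points at cabling instead.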

It sounds like you have had some time to explore and even make things work, good deal.
 

ChrisRJ

Wizard
Joined
Oct 23, 2020
Messages
1,919
Yes, getting enough PCIe lanes is the issue. The E5-2630L Xeon has 40 PCIe lanes, and on this motherboard you can arrange 5 slots with x8.
Is this stated as such in the manual? Because if not I would caution that you probably need lanes for connecting some of the on-board stuff.
 

ChrisRJ

Wizard
Joined
Oct 23, 2020
Messages
1,919
To me, this casts doubt on why TrueNAS was so quick to declare a new, and apparently good, drive as bad.
A successful format is certainly better than the opposite ;-). But it also does not mean that the drive is without issues, just like a successful extended SMART test. Even with the latter, your disk can still start to exhibit problems the next minute.

Let's see how things work out. Good luck!
 

nabsltd

Contributor
Joined
Jul 1, 2022
Messages
133
Is this stated as such in the manual? Because if not I would caution that you probably need lanes for connecting some of the on-board stuff.
Yes, it is.

The motherboard has 32 total lanes connected to 4x slots of electrical x8, and 8 lanes connected to a PEX that feeds 2x slots of electrical x8. These last two slots can be configured as x8/x0 or x4/x4.

Everything else is connected to the PCH.
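That accounts for the CPU's 40 lanes: 4 slots x 8 = 32 direct, plus 8 to the PEX switch shared by the remaining two slots.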
 

Richard Kellogg

Dabbler
Joined
Jul 30, 2015
Messages
27
Do you recall the exact error? I suspect it was a ZFS error, not an actual drive error. Basically, either the drive got booted out of the pool because it was reacting slowly, or there can be an occasional glitch if you are using SCALE, while CORE is pretty darn stable. I'm not saying SCALE is bad, but it's not as mature as CORE. Knowing the exact error message makes a difference, and the recovery is completely different depending on the error message.

I too agree that it's unlikely the drive is bad. BUT, post the output of smartctl -a /dev/??? where ??? is the drive ident (da0, ada0, sda, etc.), and if the drive is showing errors, we should see it there. Sector remapping does happen if there is a physical drive failure. Let's see what the SMART data shows.

It sounds like you have had some time to explore and even make things work, good deal.
I'm not sure where to find the error logs; I just saw that TrueNAS said the drive was bad. The attached file is the smartctl output for the "bad" drive. I don't know how to interpret the smartctl output.
 

Attachments

  • ada3-startctl.txt
    5.7 KB · Views: 25