I am so green...

Status
Not open for further replies.

dakotta

Dabbler
Joined
Oct 12, 2018
Messages
42
Hello,

I have been using a Verbatim 4-disk RAID 5 storage array (1.3 TB) for about 8 years and I'm becoming increasingly anxious about the stability of the array and my ability to rebuild it if it should fail. (I have spare disks, but if the circuit board fails I don't think I'll be able to find another.)

I'd like to build a FreeNAS server and migrate my data to it.

I've never done anything like this, so there will be a steep learning curve and many hours of breaking/fixing things before I'm happy.

Since I've never done this, and I'm feeling my way about, my ideas are part fantasy and part reality: I need a reality check.

My goals are:
1. rock-solid backup strategy, with versioning and off-site backups
2. the ability to stream music and movies from the NAS to anywhere on my local (home) network
3. the ability to run virtual machines for Windows and Linux (one machine at a time)
4. the ability to stream music/access data from the NAS to a remote machine on the Internet through a VPN

My immediate goals are #1 and #2.
If #3 and #4 are practical, I'd like to design my current system with future expansion in mind.

I have some ideas on hardware, but it seems like the first step is to figure out my required storage capacity.

Right now I have around 4 TB of data that I want to back up. If I double that in 3 years and take a 20% hit for overhead, I will need 10 TB of useful storage capacity.

Using this calculator:
If I set up a RAID-Z2 with 4 TB disks, then 5 disks would give me 12 TB of usable storage, and 6 disks 16 TB? Is that correct?
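A rough sanity check on that arithmetic (this only accounts for the two parity disks; it ignores ZFS metadata overhead, TB-vs-TiB conversion, and the recommended free-space reserve):

```python
def raidz2_usable_tb(disks: int, disk_tb: float) -> float:
    """Rough RAIDZ2 usable capacity: two disks' worth of space goes to parity."""
    return (disks - 2) * disk_tb

for disks in (5, 6):
    print(f"{disks} x 4 TB in RAIDZ2 -> ~{raidz2_usable_tb(disks, 4):.0f} TB usable")
# 5 x 4 TB in RAIDZ2 -> ~12 TB usable
# 6 x 4 TB in RAIDZ2 -> ~16 TB usable
```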

From what I've read so far, versioning (by way of snapshots) will not increase my storage needs considerably (for data that has not changed). Is that correct?

====================

As a starting point, I'm considering the following:

Chassis: Fractal Design, Node 804, micro ATX cube
https://www.newegg.com/Product/Product.aspx?Item=N82E16811352047

Motherboard: Supermicro, MBD-X11SSM-F-O micro ATX, LGA 1151 Intel C236
https://www.newegg.com/Product/Product.aspx?Item=N82E16813183013

CPU: Intel, Xeon E3-1230 V5 3.4 GHz LGA 1151 80W
https://www.newegg.com/Product/Product.aspx?Item=N82E16819117613

RAM: Samsung, 16GB 288-Pin DDR4 2400 (PC4 19200) Unbuffered Server Memory (32GB total)
https://www.newegg.com/Product/Product.aspx?Item=9SIA7S67Y99082

HDD: Western Digital, Red 4TB NAS Hard Disk Drive - 5400 RPM 3.5 Inch (5 drives)
https://www.newegg.com/Product/Product.aspx?Item=N82E16822236599

Boot Disks: SanDisk, 2.5" 32GB SATA III Internal Solid State Drive
https://www.newegg.com/Product/Product.aspx?Item=9SIA6AH22V6992

Power Supply: Seasonic, FOCUS Plus Series SSR-650PX 650W 80+ Platinum 120mm FDB Fan
https://www.newegg.com/Product/Product.aspx?Item=N82E16817151192

Is this large enough? Too large?

UPS: APC, SMC1000 Smart-UPS 1000VA 120-Volt
https://www.newegg.com/Product/Product.aspx?Item=9SIA17P65C5082

Is this large enough? Too large?

Cheers,
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
You might call yourself "green" but it certainly looks like you've done your homework here.

Unfortunately Newegg seems to be down at the moment so I can't check compatibility, but the list of components looks sound, as does the plan for RAIDZ2 and your capacity calculations.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Hello,

I have been using a Verbatim 4-disk RAID 5 storage array (1.3 TB) for about 8 years and I'm becoming increasingly anxious about the stability of the array and my ability to rebuild it if it should fail. (I have spare disks, but if the circuit board fails I don't think I'll be able to find another.)

I'd like to build a FreeNAS server and migrate my data to it.

I've never done anything like this, so there will be a steep learning curve and many hours of breaking/fixing things before I'm happy.

Since I've never done this, and I'm feeling my way about, my ideas are part fantasy and part reality: I need a reality check.

It looks to me like you've got a reasonable plan and reasonable expectations, along with an understanding of the threat posed by old hardware. You appear to have done homework, and appear to be prepared to do experimentation and other testing. This has all the hallmarks of the things that go into a successful deployment and a happy user.

I'd say you're on the right track and keep doing what you're doing.
 

JohnK

Patron
Joined
Nov 7, 2013
Messages
256
Hello, you will want to check your RAM compatibility. That motherboard requires unbuffered ECC UDIMMs. What I see on Newegg is all registered memory.

I have a similar motherboard in my main server and use:
Kingston ValueRAM 16GB (1 x 16GB) DDR4 2400 RAM (Server Memory) ECC DIMM (288-Pin) KVR24E17D8/16
https://www.newegg.com/Product/Product.aspx?Item=N82E16820242273

Also, for my first two servers (main and backup) I used Fractal Design cases. (Note that my outdated sig still shows that.) Those are great, until you have to start replacing drives. I ended up changing both of my servers to 3U and 4U chassis and placing them in a rack down in my basement...
 

dakotta

Dabbler
Joined
Oct 12, 2018
Messages
42
Thanks HoneyBadger and jgreco! "There is no kill like overkill" -- hehehe
Hello, you will want to check your RAM compatibility. That motherboard requires unbuffered ECC UDIMMs. What I see on Newegg is all registered memory.

I have a similar motherboard in my main server and use:
Kingston ValueRAM 16GB (1 x 16GB) DDR4 2400 RAM (Server Memory) ECC DIMM (288-Pin) KVR24E17D8/16
https://www.newegg.com/Product/Product.aspx?Item=N82E16820242273
Thanks!

For some reason, the item # got stripped from my above link for RAM. I added it back in. The correct description is: Supermicro MEM-DR416L-SL01-EU24 16GB (1x16GB) DDR4 2400 (PC4 19200) ECC Unbuffered Memory RAM. I also double-checked the Supermicro website for compatibility: https://www.supermicro.com/support/resources/mem.cfm

This seems okay... unless I got the board wrong. I was checking for the X11SSM-F, which is not quite the same as MBD-X11SSM-F-O. I saw a guide here on Supermicro part numbers, but I can't find it now, so I'm not sure that these boards require the same RAM.

Also, for my first two servers (main and backup) I used Fractal Design cases. (Note that my outdated sig still shows that.) Those are great, until you have to start replacing drives. I ended up changing both of my servers to 3U and 4U chassis and placing them in a rack down in my basement...
Er... I thought about a rack-mount, but I couldn't figure out what the thing is called that the server mounts into. Amazingly, searching for 'server rack' returns tons of hits. ;) I'll check this out as an option.

Cheers,
 

JohnK

Patron
Joined
Nov 7, 2013
Messages
256
Er... I thought about a rack-mount, but I couldn't figure out what the thing is called that the server mounts into. Amazingly, searching for 'server rack' returns tons of hits. ;) I'll check this out as an option.

Cheers,
I bought mine from Craigslist. Just do a search for server rack and you should be able to get a nice one for about $100.
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
I bought mine from Craigslist. Just do a search for server rack and you should be able to get a nice one for about $100.
Depends very much on location.
If I set up a RAID-Z2 with 4 TB disks, then 5 disks would give me 12 TB of usable storage, and 6 disks 16 TB? Is that correct?
Pretty close. Don't forget that you should keep the last 20% of the pool free, because the file system is copy-on-write. Is that what you are calling overhead? They are two different things, really.
From what I've read so far, versioning (by way of snapshots) will not increase my storage needs considerably (for data that has not changed). Is that correct?
Yes; however, you need to keep an eye on how much space snapshots are consuming, because it can add up faster than you expect.
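If it helps to picture why unchanged data is essentially free to snapshot while rewritten data is not, here is a deliberately over-simplified toy model (reference-counted blocks; purely illustrative, not how ZFS tracks blocks internally):

```python
# Toy copy-on-write model: a "pool" of blocks with reference counts.
# A snapshot just adds a reference to every block the dataset currently uses;
# extra space is only consumed when live data is rewritten while a snapshot
# still pins the old blocks.
pool = {}        # block_id -> reference count
dataset = {}     # file offset -> block_id (the "live" view)
snapshots = []   # each snapshot is a frozen copy of the live block map

def write(offset, block_id):
    old = dataset.get(offset)
    dataset[offset] = block_id
    pool[block_id] = pool.get(block_id, 0) + 1
    if old is not None:
        pool[old] -= 1
        if pool[old] == 0:
            del pool[old]          # old block freed only when nothing references it

def snapshot():
    snap = dict(dataset)
    for block in snap.values():
        pool[block] += 1           # a snapshot pins every block it references
    snapshots.append(snap)

for i in range(3):                 # three blocks of live data
    write(i, f"blk{i}")
snapshot()
print(len(pool))                   # 3 -> snapshotting unchanged data costs nothing extra

write(0, "blk0-v2")                # rewrite one block; the snapshot pins the old one
print(len(pool))                   # 4 -> rewritten data is where snapshot space adds up
```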

Lots of great reading under the "Useful Links" button in my signature.
 

JohnK

Patron
Joined
Nov 7, 2013
Messages
256
Hello,

3. the ability to run virtual machines for Windows and Linux (one machine at a time)
I know that FreeNAS can be used to run virtual machines, and I tried it for a day or two before virtualizing FreeNAS itself on ESXi instead. I just find it much easier. I also moved Plex and Nextcloud out of FreeNAS onto their own virtualized Linux servers.

Depends very much on location.
Absolutely, but trust me, if you can pick it up locally, you will save a ton on shipping. When I bought my Eaton and the guy brought it out with a forklift, I realized how heavy it was. I had to take that thing apart piece by piece just to get it out of my van... I will probably sell my house with it in it...
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
if you can pick it up locally, you will save a ton on shipping. When I bought my Eaton and the guy brought it out with a forklift, I realized how heavy it was. I had to take that thing apart piece by piece just to get it out of my van... I will probably sell my house with it in it...
If you can find one locally. That is the hard part where I live. I deal with racks at work and we just got an old one out that was slightly taller than the rest. It was too tall to fit through the door standing up. It took two strong guys and a moving dolly to get that out, even empty. I wouldn't want that one though.
I have an old house and just getting a rack inside would be a significant challenge. Narrow doors and narrow hallways, it wouldn't pass modern code. I have an eye on the utility room off the garage, if I get the time and money to do it, I will convert it into a small server room and buy a real rack to put in there.
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
so it doesn't seem very different to me...
The 'overhead' in ZFS is the amount of data the file system stores on disk, like checksum data, that is required by ZFS but is not your actual data.
The amount of space that you are not supposed to use, the 20%, exists because the file system writes changes to existing files, or new data, into free space first, so you need some free space available to make the write at all. After the write succeeds, the space where the old version of the file was stored is marked as free, unless a snapshot is holding that block and keeping it from being freed.

When you fill the pool beyond 80%, performance is reduced because the system has to spend more time hunting for free space. When you fill it beyond 90%, performance falls off a cliff because the system switches to a different allocation algorithm to optimize its use of the remaining free space. As you approach 100%, the system will grind to a halt.

The recommendation to keep 20% free for the copy-on-write nature of the file system has nothing to do with the actual 'overhead' of the checksum data that ZFS creates; it has to do with being able to use the file system at all.
This has been explained many times on the forum. I am paraphrasing what I read on this forum years ago when I first started investigating using ZFS.
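Putting rough numbers on that 20% guideline for the 4 TB drives discussed above (just arithmetic; real usable space is also trimmed a bit further by metadata, padding, and the TB-vs-TiB conversion):

```python
def practical_raidz2_tb(disks: int, disk_tb: float, reserve: float = 0.20) -> float:
    """RAIDZ2 usable space with the suggested free-space reserve held back."""
    raw_usable = (disks - 2) * disk_tb      # two disks' worth lost to parity
    return raw_usable * (1 - reserve)       # keep ~20% of the pool free

for disks in (5, 6):
    print(f"{disks} x 4 TB RAIDZ2, 20% kept free -> ~{practical_raidz2_tb(disks, 4):.1f} TB")
# 5 x 4 TB RAIDZ2, 20% kept free -> ~9.6 TB
# 6 x 4 TB RAIDZ2, 20% kept free -> ~12.8 TB
```

Measured against the roughly 10 TB target from the first post, five drives is already cutting it close once the reserve is respected, while six leaves some headroom.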
 

pro lamer

Guru
Joined
Feb 16, 2018
Messages
626
Soon after posting I realized that even the dictionary I had quoted was clear enough.

I deleted the post hoping no one had read it yet :)

Anyway it seems simple:
  • Overhead = extra cost = parity data actually written, metadata, etc.
  • The 20% free space is not actually written, so it might not be called overhead, but it doesn't seem to have a name of its own, at least not in this calculator nor in the one used by the OP
Edit: I guess the lack of a special name for the 20% is what confused me. Possibly others as well...

Sent from my mobile phone
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Edit: I guess the lack of a special name for the 20% is what confused me. Possibly others as well...

Because the 20% isn't 20%. It could be 1% or it could be 80%, depending on many factors. If you have a write-once archival pool, you can probably get it down to 1%. If you need extremely good VM performance with high fragmentation issues, you need 80%(+!) free.

You might call it "ZFS domain knowledge." :smile:
 

Ender117

Patron
Joined
Aug 20, 2018
Messages
219
If you need extremely good VM performance with high fragmentation issues, you need 80%(+!) free. :)

At that number, an all-flash array might start to be more cost-effective, though.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
At that number, an all-flash array might start to be more cost-effective, though.

SSD can potentially be slower than HDD if you go about things the wrong way. HDD has a consistent (and fast) write rate for sequential access. SSD can slow down dramatically for writes once its free page pool is exhausted, and can actually be *slower* than sequential HDD.

More importantly, ZFS is generally optimized for that aspect of HDD performance. ZFS likes to allocate out of longer contiguous blocks of space, and because of this, it can be competitive with SSD. However, you quite possibly need to pair it with some L2ARC SSD, because read seeks are going to be the thing that kills HDD compared to SSD.

It's interesting. At some point you will be correct, but right now you won't find that to be the case. You still need significant free space on SSD to be competitive.
 

Ender117

Patron
Joined
Aug 20, 2018
Messages
219
SSD can potentially be slower than HDD if you go about things the wrong way. HDD has a consistent (and fast) write rate for sequential access. SSD can slow down dramatically for writes once its free page pool is exhausted, and can actually be *slower* than sequential HDD.

More importantly, ZFS is generally optimized for that aspect of HDD performance. ZFS likes to allocate out of longer contiguous blocks of space, and because of this, it can be competitive with SSD. However, you quite possibly need to pair it with some L2ARC SSD, because read seeks are going to be the thing that kills HDD compared to SSD.

It's interesting. At some point you will be correct, but right now you won't find that to be the case. You still need significant free space on SSD to be competitive.
That's interesting indeed. And yes, if garbage collection cannot keep up, SSD performance will tank. However, you are describing a case of heavy sustained writes where the pool fills quite fast, in which case fragmentation will hurt an HDD pool's performance badly too, maybe even faster than you would deplete a decent SSD's free page pool. Unless, of course, you are also getting rid of old data at roughly the same speed. This sounds like a very limited use case.

I have always felt ZFS is kind of behind in the SSD age. Take indexing the L2ARC in RAM: if the index were kept in the L2ARC itself, speed would of course be impacted, but you could use a much larger flash cache. I once discussed this with someone who was serving media content from ZFS to a dozen people over several 10 Gbps links. We soon found out that to cache the hottest working set (the newest episode, the most popular movie, etc.) in L2ARC, you need an excessive amount of RAM. The answer was to swap in 10K RPM HDDs. If this behavior could be changed, L2ARC could work as a write-through cache layer, which would give you maximum (read) performance while the backing HDD pool provides redundancy.
 

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
If I remember correctly, two new features are coming to ZFS for the L2ARC:
  • Compressed data will be preserved on the SSD
  • Metadata in RAM will either be compressed, or its format shrunk
The first indirectly improves the R/W speed of the SSD, while the second reduces the memory impact (or lets more entries exist).
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Well, your argument was that an "all-flash array" might be more cost-effective. Generally speaking, this probably isn't (yet) true. Currently, a low-end SSD (WD Blue) is about $160 for 1TB, whereas on the low-end HDD side you can shuck an 8TB drive (WD Red) for $159.
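For a sense of scale, the per-terabyte gap at those two price points (a quick calculation only; both prices are obviously perishable):

```python
# $/TB at the street prices quoted above; both numbers will drift over time.
ssd_price, ssd_tb = 160.0, 1    # low-end 1 TB SATA SSD (WD Blue class)
hdd_price, hdd_tb = 159.0, 8    # shuckable 8 TB HDD (WD Red class)

ssd_per_tb = ssd_price / ssd_tb
hdd_per_tb = hdd_price / hdd_tb
print(f"SSD: ~${ssd_per_tb:.0f}/TB, HDD: ~${hdd_per_tb:.0f}/TB")
print(f"Flash premium: ~{ssd_per_tb / hdd_per_tb:.0f}x per TB")
# SSD: ~$160/TB, HDD: ~$20/TB
# Flash premium: ~8x per TB
```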

The typical use case for all-flash arrays is higher-demand workloads, and when you look under the sheets, ZFS is really highly optimized towards some of the quirks of hard disk drives, and also optimized for certain types of tasks. ZFS will do long runs of sequential data very well when you're using RAIDZ, as an example, but performance drops precipitously if you try to do random access workloads or moderate-to-high levels of concurrent access on RAIDZ. ZFS has a built-in understanding that sequential LBA access is faster for HDDs, which is why it looks for larger runs of free space and writes things that way. L2ARC is designed as mitigation for the read-speed impact of fragmentation of frequently-read data, but that's mostly only meaningful for HDD.

By comparison, ZFS doesn't really play to any of the strengths of SSD, at least not deliberately, because it was really designed in the pre-flash era. Newer commercial systems designed for all-flash organize their storage differently and use a combination of strategies for compression, deduplication, etc., many things ZFS adopted and integrated, but ZFS is a massive heavyweight elephant, with certain design assumptions that storage operated in the milliseconds-to-tens-of-milliseconds speed range, and it has had trouble adapting to keep pace with commercial offerings. It isn't optimized for it, whereas some of the commercial offerings started at the ground floor writing to the strengths of SSD.

When you consider other technologies such as hybrid arrays, ZFS has claimed "storage tiering" as a happy accident/side-effect of L2ARC, but it isn't a true "designed-it-that-way" technology (think something like macOS Fusion Drive).

If you were going to go with an all-flash array, you might be better off with different underlying filesystem technology. You probably won't find that for free though. It will also be a small array.

By way of comparison, you can get a much larger ZFS array, use only a small part of it, and wait out the next year or so while flash prices crash to an even better price. Then you buy your all-flash array and are left with a massive HDD array that you can use for backups or something like that. :smile:
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
If I remember correctly, two new features are coming to ZFS for the L2ARC:
  • Compressed data will be preserved on the SSD
  • Metadata in RAM will either be compressed, or its format shrunk
The first indirectly improves the R/W speed of the SSD, while the second reduces the memory impact (or lets more entries exist).

For #1, pL2ARC exists in Oracle ZFS, but it's still in progress for OpenZFS. Really hoping that one lands soon.

For #2, the L2ARC headers are now way tinier in RAM than they used to be (80 bytes IIRC) and compressed ARC should also apply to metadata, and headers compress well. I did a quick check with some data on a test-box and ended up with around 40 bytes per record.
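To put rough numbers on what those header sizes mean for RAM, here is a quick estimate (a sketch only: it uses the approximate per-record figures above, assumes the L2ARC is completely full, and real workloads have mixed record sizes):

```python
def l2arc_header_ram_gib(l2arc_gib: float, record_kib: float, header_bytes: int) -> float:
    """Approximate RAM consumed by in-core headers for a fully populated L2ARC."""
    records = (l2arc_gib * 1024 ** 3) / (record_kib * 1024)
    return records * header_bytes / 1024 ** 3

# 1 TiB of L2ARC holding 128 KiB records:
print(l2arc_header_ram_gib(1024, 128, 80))   # 0.625 GiB at ~80 B per header
print(l2arc_header_ram_gib(1024, 128, 40))   # 0.3125 GiB at ~40 B per header

# Small records are what really hurt, e.g. 16 KiB:
print(l2arc_header_ram_gib(1024, 16, 80))    # 5.0 GiB at ~80 B per header
```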
 

Ender117

Patron
Joined
Aug 20, 2018
Messages
219
Well, your argument was that an "all-flash array" might be more cost-effective. Generally speaking, this probably isn't (yet) true. Currently, a low-end SSD (WD Blue) is about $160 for 1TB, whereas on the low-end HDD side you can shuck an 8TB drive (WD Red) for $159.

Well, if you have certain read requirements, the money spent on RAM and L2ARC might be better put towards an all-flash array. That's what I was saying.

The typical use case for all-flash arrays is higher-demand workloads, and when you look under the sheets, ZFS is really highly optimized towards some of the quirks of hard disk drives, and also optimized for certain types of tasks. ZFS will do long runs of sequential data very well when you're using RAIDZ, as an example, but performance drops precipitously if you try to do random access workloads or moderate-to-high levels of concurrent access on RAIDZ. ZFS has a built-in understanding that sequential LBA access is faster for HDDs, which is why it looks for larger runs of free space and writes things that way. L2ARC is designed as mitigation for the read-speed impact of fragmentation of frequently-read data, but that's mostly only meaningful for HDD.

By comparison, ZFS doesn't really play to any of the strengths of SSD, at least not deliberately, because it was really designed in the pre-flash era. Newer commercial systems designed for all-flash organize their storage differently and use a combination of strategies for compression, deduplication, etc., many things ZFS adopted and integrated, but ZFS is a massive heavyweight elephant, with certain design assumptions that storage operated in the milliseconds-to-tens-of-milliseconds speed range, and it has had trouble adapting to keep pace with commercial offerings. It isn't optimized for it, whereas some of the commercial offerings started at the ground floor writing to the strengths of SSD.

When you consider other technologies such as hybrid arrays, ZFS has claimed "storage tiering" as a happy accident/side-effect of L2ARC, but it isn't a true "designed-it-that-way" technology (think something like macOS Fusion Drive).

If you were going to go with an all-flash array, you might be better off with different underlying filesystem technology. You probably won't find that for free though. It will also be a small array.

Exactly, ZFS is a bit outdated in this regard. OTOH, I feel ZFS is the most mature of the "next-gen" file systems. ReFS + Storage Spaces sounds interesting, but write performance on a parity pool is just painful. Btrfs had major bugs at the RAID 5/6 level that chewed up user data not long ago. I have been living under a rock, so there might be other tech I don't know about, but right now ZFS is what I would trust my data with.

By way of comparison, you can get a much larger ZFS array, use only a small part of it, and wait out the next year or so while flash prices crash to an even better price. Then you buy your all-flash array and are left with a massive HDD array that you can use for backups or something like that. :)

Absolutely hope this is true, though I am not very optimistic after seeing RAM prices these past few years.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Well, if you have certain read requirements, the money spent on RAM and L2ARC might be better put towards an all-flash array. That's what I was saying.

Not right now, not yet, at least. You can probably create specific examples where that's the better option, but that assumes we're disregarding the cost of the storage system itself and just looking at the cost of raw storage. There are certainly filesystems that perform much better on flash, but they're generally proprietary stuff created by startups, so you really need to look at the total solution cost, plus also factor in things like "will they be around next year." I'm thinking of Tintri, but that's just a recent example.

Exactly, ZFS is a bit outdated in this regard. OTOH, I feel ZFS is the most mature of the "next-gen" file systems. ReFS + Storage Spaces sounds interesting, but write performance on a parity pool is just painful. Btrfs had major bugs at the RAID 5/6 level that chewed up user data not long ago. I have been living under a rock, so there might be other tech I don't know about, but right now ZFS is what I would trust my data with.

Well, ZFS is "a bit outdated" in the same way my pickup truck is "not really designed" to pull a semi trailer. ZFS was designed around hard disks. It's genius at that. It just isn't that good at flash. It isn't *terrible*, but that's largely due to flash meeting it at least halfway by emulating disk. There's stuff out there written from the ground up for flash and it's amazing. That's not ZFS and it isn't likely that it will ever be, absent some massive rewrite.

The problem here is that designs for important CS stuff used to be underwritten by universities or companies and opened to the world. What we see now is a lot of less-thorough development that aims for the release of a proprietary product. ZFS was originally funded by Sun when it was a UNIX powerhouse, and as that slid, they opened it up for development and now there are lots of parties working on it. Many of these parties, including Nexenta and iXsystems, do profit from this, but they understand the underlying value.

Unfortunately, serious development takes lots of money, and so we're not likely to see a "generation two ZFS" with true storage tiering, the ability to add and remove vdevs and component disks easily, etc., even though these things and more are definitely possible to design for. Most of the current ZFS development is relatively small scale. The companies that are betting on being the next HDS or EMC, and have the massive teams of developers, aren't working on ZFS -- they're working on proprietary products. Companies like Nexenta and iXsystems cannot compete at that level, as much as I'd love to see them do so. So they hack on ZFS and target other segments of the storage market.

Now, the funny part about all of this, and the hard part to predict, is that there probably isn't a need for all storage to be super-mega-turbo-hyper-fast. At some point, flash is expected to eat the HDD market, but there will still be a lot of need for slower, high reliability storage options, even if they happen to be backed by flash, and ZFS seems well-suited to fitting into that niche. Along the way, CPU and RAM will cheapen out, and ZFS will continue to be optimized, so it may eventually end up as one of the major future filesystems.

Absolutely hope this is true, though I am not very optimistic after seeing RAM prices these past few years.

Well, this stuff is complicated. We've seen DRAM price fixing in the past, and are seeing allegations of it again, and these guys are aware that they have to be careful not to create a total glut, so fabs come online slowly, and mysterious fires happen, and other random incidents gum up the works. But companies like to make a profit, so they ramped up volume to satisfy demand for smartphones, and now that demand is falling off a bit, which matches up with long-term expectations, projected for a while now, that we'd see another significant price crash for flash and RAM.

One of the things that is frequently overlooked is that there is a point where things get "good enough" or "big enough" for the average user. For example, we saw 10Mbps ethernet as standard in 1993, 100 in 1996, 1G in 1999, and 10G in 2003... but while 1G consumer products were available at a reasonable price in the mid 2000's, we still don't see that for 10G. The reality is that once something gets good enough, improvements slow. The average residential user, even with video and all that, doesn't really have a significant need for network speeds past 1Gbps, so we've seen a bit of a cap there. Smartphones have reached a point of being "good enough" for many people, and the two year refresh cycle isn't happening as much. This may be catching the flash and DRAM vendors off-guard a bit, even though great strides are being made in terms of flash density. Similarly, your typical PC doesn't need a 12TB HDD, and even things like the 8TB drives are a bit of a hard sell, so we've seen a bit of a slowdown in the progress of HDD storage improvements. Those things will come to a crossroads soon. It'll be interesting.
 