Diaries of a n00b
I have no idea where to post this or if it is even appreciated, but I guess it won't hurt anyone.
So this is my little story about how I thought I could quickly replace my Windows file server with a FreeNAS box.
Or rather (because this was my first idea) complement my Windows file server with a FreeNAS box: the Windows box would simply replicate to the FreeNAS box, so I could use snapshots to restore files that got deleted by users and have a backup by means of a second off-site box and server replication…..
I had no idea that this would get so complicated!!!
I mean.. how hard can it be? Get a box, put in some drives and turn it on right?
So after watching some YouTube videos (I know) and reading several best practice topics on the forums, I was happily testing on a virtual box. I had 6 virtual drives in raidz2 as my FreeNAS test machine. I was making snapshots and had Active Directory working! Life was good!
Until…..
I realized that in order to make a snapshot instantaneously you need to have the data on the FreeNAS box… That sounds logical and simple.. just copy your data over there. But I hit a small bottleneck. In order to copy over the data in an efficient manner, you first need to scan for changes. I’ve tried rsync and Syncthing (getting this working as a FreeNAS n00b is a topic on its own). The problem was that the scanning part of the process took HOURS and HOURS for just a subset of the data on my Windows machine (about 100 gigs out of 1 TB of total data). I would be better off making a daily backup!!
That’s not what I was going for, so I then thought I’ll just copy everything over to the FreeNAS box and let everyone work on that instead of the Windows box. Problem solved!
So, being quite proud of myself, I posted my intended hardware with the note that I intended to actually use it as the file server that everyone would work on.
I got slaughtered….
I posted an X11 system with an i3 6700 and 32 gigs of RAM. This because I read that clock speed is more important than core count and that, even if 1 gig of RAM per TB of storage is recommended, you can never have enough RAM. I intended to use raidz2 with 6 drives (a power of 2 for the data drives, with redundancy on top of that. I did my homework, even if I now see that this rule goes out of the window if you use compression. But I digress).
Note that this system would have about 10 TB of usable space, whereas we accumulated about 1 TB of data in the last few years. The amount of space is overkill!
Little did I know that the performance of raidz2 is apparently suboptimal for IOPS. The general advice was: use mirrors!!!
Use mirrors.. What the h3ll does that mean? Back to the forums and a lot of other resources on the internet, and after a lot of reading I concluded (I am a n00b, so I will not say I found out, since I’m not 100% sure that what I jot down here is correct) that a raidz2 vdev generally has the random I/O performance of a single drive, and that if you use several mirrors in one pool, you start to get the benefits of striping, as in RAID.
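To put rough numbers on that conclusion, here is a minimal sketch of the rule of thumb as it is usually quoted: random IOPS scale with the number of vdevs in the pool, not with the number of drives. The 100 IOPS per spinning drive is an assumed ballpark, and the function name is made up for this illustration. It ignores caching and the fact that mirrors can serve reads from both sides, so treat it as an estimate only.

```python
def estimate_pool_iops(vdev_count: int, per_drive_iops: float) -> float:
    """Rule-of-thumb estimate: each vdev adds about one drive's worth of random IOPS."""
    return vdev_count * per_drive_iops


PER_DRIVE_IOPS = 100  # assumed ballpark for one spinning disk

# Six drives as a single raidz2 vdev versus the same six drives as three mirrors
raidz2_6_drives = estimate_pool_iops(vdev_count=1, per_drive_iops=PER_DRIVE_IOPS)
three_mirrors = estimate_pool_iops(vdev_count=3, per_drive_iops=PER_DRIVE_IOPS)

print(f"1 x 6-drive raidz2 vdev:  ~{raidz2_6_drives:.0f} random IOPS")
print(f"3 x 2-drive mirror vdevs: ~{three_mirrors:.0f} random IOPS")
```

Same six drives, but under this rule of thumb the mirror layout gets about three times the random IOPS, at the cost of less usable space.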
That made me think.
What’s the actual situation on my network now? (I’m a manager, not a sysadmin.) So I found out that we have two kinds of use cases.
1) engineers working on AutoCAD files that are between 2 and 15 MB per file (ballpark)
2) engineers that work in an engineering platform that had the great idea to write thousands and thousands of files with extremely small file sizes. I knew this, but I never really checked how small these files are.
Use case 1 I’m not worried about. Even with 20 MB/s transfer speeds over a 1 gig LAN this is no problem whatsoever. Less than a second to open a file is acceptable for me, as they will work for many minutes before opening the next file. It’s not a workflow bottleneck.
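As a quick sanity check on the "less than a second" claim, this sketch just divides file size by transfer speed for the smallest and largest file sizes mentioned. It only counts raw transfer time; protocol overhead and application load time are not included.

```python
def transfer_seconds(file_mb: float, speed_mb_per_s: float) -> float:
    """Time to move one file at a steady transfer speed, ignoring overhead."""
    return file_mb / speed_mb_per_s


SPEED_MB_PER_S = 20  # the pessimistic transfer speed from the post

for size_mb in (2, 15):
    seconds = transfer_seconds(size_mb, SPEED_MB_PER_S)
    print(f"{size_mb} MB at {SPEED_MB_PER_S} MB/s: {seconds:.2f} s")
```

Even the 15 MB file comes out at 0.75 s.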
Use case 2 needed some investigation.
How small are the files? How many IOPS does this actually amount to?
So I took a look at one typical directory on which an engineer would work. I found a 100 MB folder. Nice and small… no big deal. Only 76000 files… wait what!!!!
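For anyone who wants to check their own folders the same way, here is a small sketch that counts files and adds up their sizes. The function names are made up for this example, and it reports file sizes as stored, not the space they take on disk.

```python
import os


def folder_stats(root: str) -> tuple[int, int]:
    """Return (file_count, total_bytes) for everything under root."""
    file_count = 0
    total_bytes = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                total_bytes += os.path.getsize(path)
                file_count += 1
            except OSError:
                # Skip files that vanish or cannot be read while scanning
                continue
    return file_count, total_bytes


def describe(root: str) -> str:
    """Summarize a folder as file count, total size and average file size."""
    count, total = folder_stats(root)
    average = total / count if count else 0
    return (f"{count} files, {total / 1e6:.1f} MB total, "
            f"{average:.0f} bytes per file on average")
```

Calling `describe()` on a project folder gives the summary in one line. For the folder above, 100 MB spread over 76000 files works out to roughly 1.3 KB per file.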
Then I found that a spindle can handle about 100 IOPS. Time to open a beer…..
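Why the beer? A back-of-the-envelope sketch, assuming (and this is only an assumption) that touching each file costs exactly one I/O operation on a single spindle. Real access patterns add metadata lookups on top and caching takes some away, so this is only a feel for the order of magnitude.

```python
FILES = 76_000       # files in the example folder
SPINDLE_IOPS = 100   # ballpark for one spinning disk

# Assumption: one I/O operation per file, nothing served from cache
seconds = FILES / SPINDLE_IOPS
print(f"{seconds:.0f} s, or about {seconds / 60:.1f} minutes to touch every file once")
```

That is 760 seconds, so more than 12 minutes for a folder of only 100 MB.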
After some checking I found out that not all files are read by the software for every action, but it was more than clear that IOPS and not throughput is the problem here. IOPS is probably also the time-consuming part for backups. Keep in mind that this 100 MB of 76000 files is only a subset of several tens of gigabytes of data.
So I decided to make a RAM drive on my workstation and copy the folder there. It took about 5 minutes from my Windows box. Copying it to a FreeNAS box with a raidz of 3 drives took about 2 minutes. And copying it back took about 3 minutes. So I’m getting about 300 to 600 KB/s throughput. However, that is many more IOPS than expected.
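The copy times can be turned into rough rates like this. It is a sketch that takes the folder as exactly 100 MB and the durations as exact minutes, which they are not, so the results are ballpark figures only. Files per second is used here as a crude stand-in for IOPS.

```python
FOLDER_BYTES = 100 * 1000 * 1000  # "100 MB" taken as a round figure
FILE_COUNT = 76_000

# Copy durations from the post, in seconds (rounded to whole minutes there)
copy_seconds = {
    "from the Windows box": 5 * 60,
    "to the FreeNAS raidz": 2 * 60,
    "back again": 3 * 60,
}


def copy_rates(seconds: float) -> tuple[float, float]:
    """Return (KB per second, files per second) for one copy run."""
    return FOLDER_BYTES / 1000 / seconds, FILE_COUNT / seconds


for label, seconds in copy_seconds.items():
    kb_per_s, files_per_s = copy_rates(seconds)
    print(f"{label}: ~{kb_per_s:.0f} KB/s, ~{files_per_s:.0f} files/s")
```

With these round numbers the three runs come out at about 333, 833 and 556 KB/s, so the 2-minute copy lands a bit above the quoted range. In files per second that is roughly 253, 633 and 422, all well above the 100 IOPS figure for a single spindle.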
Now I’m in the here and now of my “simple replacement of my Windows box”.
I’m now considering the following implementation:
2 zpools. One with just a single mirror of SSD drives (500 gig each, and hoping for about 10000 IOPS) for the IOPS workload.
The second zpool will be a raidz2 pool of 6 drives. The idea is to make a mount point for the rest of the workloads with the bigger files and a backup mount point for the SSD zpool. That way the immediate reads and writes for the engineers that need high IOPS would use the SSD drives, and in the background this would replicate to the slower zpool.
The other workload would be on this second zpool directly, and then the entire slow zpool would be replicated to a second (off-site) box.
I thought about adding multiple SSD mirrors to increase IOPS even further, but I don’t believe it will help since it’s going over a LAN. Ethernet also adds latency. Speaking of that, I’ve learned on this forum that Realtek NICs use the CPU a lot and Intel NICs don’t. So I’m replacing all NICs in our workstations. Any call to the CPU adds latency, so if this has to be done for the transfer of each file, Realtek NICs are just not acceptable….
I hope you guys had a good laugh at my investigation, all the mistakes I made and all the misconceptions I still have. I’m trying to learn and the forums here are giving me great pointers. I thought the least I could do is to write down my adventure here for entertainment :)
Signing off for now.
Huib.
p.s. Depending on the reactions to this post I might post test results on my actual implementation later. After probably much more reading…..