raidz2 = half disk space available???

Status
Not open for further replies.

dniq

Explorer
Joined
Aug 29, 2012
Messages
74
Hello all!

I have a problem that I hope someone can help me resolve.

I've recently installed FreeNAS 9.1.1 on a server with 8x3TB disks and set them up as RAIDZ2. However, when the pool was created, the available space was shown as 10TB, so it looks like less than half of the total disk space.

[root@nas1-ny] ~# df -h /mnt/export
Filesystem    Size    Used   Avail Capacity  Mounted on
export         10T    256k     10T     0%    /mnt/export


I was under the impression that with RAIDZ2 you only lose the capacity of 2 disks, so the capacity should be 24TB - 6TB =~ 18TB. Where did the rest of the space go???

At the same time, when any files are created on the file system, it seems that zpool reports that the amount of space used is double that of the file size.

Anybody know what's up with that?

[root@nas1-ny] ~# zpool list
NAME     SIZE  ALLOC   FREE    CAP  DEDUP  HEALTH  ALTROOT
export  21.8T   821G  20.9T     3%  1.60x  ONLINE  /mnt

Oh, I also used two additional 2.5-inch disks that were in the server (and that I didn't know what else to do with) as a mirrored ZIL.

[root@nas1-ny] ~# zpool status
  pool: export
 state: ONLINE
  scan: resilvered 4K in 0h0m with 0 errors on Fri Sep 6 13:08:42 2013
config:

        NAME                                            STATE     READ WRITE CKSUM
        export                                          ONLINE       0     0     0
          raidz2-0                                      ONLINE       0     0     0
            gptid/cd86ccd8-170e-11e3-a4dc-b8ca3a61e4dc  ONLINE       0     0     0
            gptid/ce39c161-170e-11e3-a4dc-b8ca3a61e4dc  ONLINE       0     0     0
            gptid/ceeae5a6-170e-11e3-a4dc-b8ca3a61e4dc  ONLINE       0     0     0
            gptid/cfa088f6-170e-11e3-a4dc-b8ca3a61e4dc  ONLINE       0     0     0
          raidz2-1                                      ONLINE       0     0     0
            mfisyspd4p2                                 ONLINE       0     0     0
            gptid/d1155239-170e-11e3-a4dc-b8ca3a61e4dc  ONLINE       0     0     0
            gptid/d1ce137a-170e-11e3-a4dc-b8ca3a61e4dc  ONLINE       0     0     0
            gptid/d2842dde-170e-11e3-a4dc-b8ca3a61e4dc  ONLINE       0     0     0
        logs
          mirror-2                                      ONLINE       0     0     0
            gptid/d3232c51-170e-11e3-a4dc-b8ca3a61e4dc  ONLINE       0     0     0
            gptid/d3b35f7b-170e-11e3-a4dc-b8ca3a61e4dc  ONLINE       0     0     0

errors: No known data errors
 

Dusan

Guru
Joined
Jan 29, 2013
Messages
1,165
The zpool status output shows that your pool configuration is not an 8 drive RAIDZ2, but a stripe of two 4 drive RAIDZ2s. An 8 drive RAIDZ2 would indeed have the capacity of 6 drives (8-2), but your config has the final capacity of only 4 drives: (4-2) + (4-2)
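The difference is easy to check with a little arithmetic. A rough sketch (real usable space will come in a bit lower once ZFS metadata, reservations, and TB-vs-TiB rounding are accounted for, which is why df shows ~10T rather than 12T):

```python
def raidz_usable(disks, parity, disk_tb):
    """Rough usable capacity of one RAIDZ vdev: data disks times disk size."""
    return (disks - parity) * disk_tb

# What the GUI actually built: two 4-disk RAIDZ2 vdevs striped together.
striped = raidz_usable(4, 2, 3) + raidz_usable(4, 2, 3)
print(striped)  # 12 (TB of data space)

# What a single 8-disk RAIDZ2 would have given.
single = raidz_usable(8, 2, 3)
print(single)   # 18 (TB of data space)
```

Note that zpool list reports the raw 21.8T because it counts parity space too; only df/zfs list show space net of parity.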
 

dniq

Explorer
Joined
Aug 29, 2012
Messages
74
The zpool status output shows that your pool configuration is not an 8 drive RAIDZ2, but a stripe of two 4 drive RAIDZ2s.
An 8 drive RAIDZ2 would indeed have the capacity of 6 drives (8-2), but your config has the final capacity of only 4 drives: (4-2) + (4-2)


Well, I was kinda wondering what the hell was going on there... I simply used the new FreeNAS 9.1.1 GUI to create it. Apparently it messed it up...
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
And "df" is not the proper way to monitor free space in ZFS. zpool list is. If you want to use df you'll be another poster in here complaining about having more or less free space than the GUI claims you have.
 

dniq

Explorer
Joined
Aug 29, 2012
Messages
74
And "df" is not the proper way to monitor free space in ZFS. zpool list is. If you want to use df you'll be another poster in here complaining about having more or less free space than the GUI claims you have.


No, not really :) I realize the df output is going to "float" depending on how successful the compression and/or dedup (if used) is.
 

dniq

Explorer
Joined
Aug 29, 2012
Messages
74
And "df" is not the proper way to monitor free space in ZFS. zpool list is. If you want to use df you'll be another poster in here complaining about having more or less free space than the GUI claims you have.


The one mistake I made is I rushed moving my VMWare VMs to the new NFS server without checking first... So now I have to migrate them back to the old server and re-create the pool from scratch :)

Still can't decide whether to use dedup for the VMs or not... Technically, most VMs are clones of the same template, so dedup should do a very good job here. But not sure if 32G of RAM will suffice...
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
No, not really :) I realize the df is going to "float" depending on how successful the compression and/or dedup (if used) is.

Actually, it's far messier than that. It gets confused with some dataset settings, snapshots, etc.
 

dniq

Explorer
Joined
Aug 29, 2012
Messages
74
I'm curious, though, why the GUI decided to create a stripe of two RAIDZ2s... I basically just selected RAIDZ2 and selected all disks - it said "Optimal!" - voila! Never bothered to check what exactly it created... :(
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Still can't decide whether to use dedup for the VMs or not... Technically, most VMs are clones of the same template, so dedup should do a very good job here. But not sure if 32G of RAM will suffice...

Don't expect high VM savings. Bigger blocks make it far less likely to find a match, even when the VMs have the exact same files just in a different location on the VM's disk. And while smaller blocks increase the chance of dedup working, the RAM requirements become so incredibly expensive you could have saved yourself a boatload of cash (think five figures) and gone with more/bigger hard drives.

Of course, all of this is if you ignore the potential significant RAM needs for dedup.

I'll give you the disclaimer, be absolutely sure you know what you are getting yourself into with dedup. FreeNAS 8.3 had this warning when dedup was added to FreeNAS:

ZFS v28 includes deduplication, which can be enabled at the dataset level. The more data you write to a deduplicated volume the more memory it requires, and there is no upper bound on this. When the system starts storing the dedup tables on disk because they no longer fit in RAM, performance craters. There is no way to undedup data once it is deduplicated; simply switching dedup off has NO EFFECT on the existing data. Furthermore, importing an unclean pool can require between 3-5GB of RAM per TB of deduped data, and if the system doesn't have the needed RAM it will panic, with the only solution being adding more RAM or recreating the pool. Think carefully before enabling dedup! Then after thinking about it use compression instead.

There have been people locked out of their data forever. So be 100% sure that you are doing what you are wanting to do.
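Using the 3-5GB-per-TB figure from the warning above, a back-of-the-envelope estimate for this thread's numbers (a rough illustration of the scaling, not a sizing guarantee):

```python
def dedup_import_ram_gb(deduped_tb, gb_per_tb=5):
    """Worst-case RAM needed to import an unclean pool with dedup,
    per the FreeNAS 8.3 warning (3-5 GB per TB of deduped data)."""
    return deduped_tb * gb_per_tb

# ~200 GB of VM images, as mentioned later in the thread:
print(dedup_import_ram_gb(0.2))   # 1.0 GB worst case - fine on 32 GB

# But the same pool filled with 18 TB of deduped data:
print(dedup_import_ram_gb(18))    # 90 GB - far beyond 32 GB of RAM
```

The danger is that the requirement grows with the data, not with the hardware: a pool that imports fine today can become un-importable on the same machine once enough deduped data accumulates.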
 

dniq

Explorer
Joined
Aug 29, 2012
Messages
74


Thanks for the articles!

The VMs don't take even 1TB; about 200GB total.

The reason I'm thinking about dedup is to avoid going to the hard disks frequently: since these are VMs that work in a cluster, they do the same work, and there's a very good chance that each of them will need to read deduped blocks that have already been cached. Less distraction for the actual hard disks means more time for them to do what can't be cached or deduped. But that's just my guess - I don't know if that's actually going to be true (still need to do some comprehensive testing).
 

dniq

Explorer
Joined
Aug 29, 2012
Messages
74
Don't expect high VM savings. Bigger blocks make it far less likely to find a match despite them having the exact same files just in a different location on the VM's disk.

No, no, these VMs are exact clones of the same initial image, so if I have 10 VMs, all cloned from the same initial template image - the deduplication ratio will be 10x (well, 11x, including template). It will then start to go down a bit, since each VM will start writing its own logs inside those images, but that amount is minimal (several megabytes per VM). The bulk (8-10GB per VM) will never really change.

My main goal here is not to reduce the amount of disk space they're using, but to decrease the amount of times the NAS needs to physically access the disks.
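That expected ratio, and how it degrades as each clone writes its own data, can be sketched as follows (a simplified model that assumes clone writes are perfectly block-aligned; as noted above, real block-level matching is less forgiving, so actual ratios come in lower):

```python
def dedup_ratio(n_clones, image_gb, unique_gb_per_vm):
    """Logical data stored vs. physical blocks kept, assuming all clones
    share the base image and only their own writes are unique."""
    logical = (n_clones + 1) * image_gb + n_clones * unique_gb_per_vm
    physical = image_gb + n_clones * unique_gb_per_vm
    return logical / physical

# 10 clones + template, 10 GB image, nothing unique written yet:
print(round(dedup_ratio(10, 10, 0), 1))    # 11.0

# After each VM writes ~0.5 GB of its own logs:
print(round(dedup_ratio(10, 10, 0.5), 1))  # 7.7
```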
 

Dusan

Guru
Joined
Jan 29, 2013
Messages
1,165
I'm curious, though, why the GUI decided to create a stripe of two RAIDZ2s... I basically just selected RAIDZ2 and selected all disks - it said "Optimal!" - voila! Never bothered to check what exactly it created... :(
I checked the algorithm and the ZFS Volume Manager doesn't consider an 8-drive RAIDZ2 optimal. It only likes a configuration where the number of data disks in a RAIDZx is a power of 2 (2, 4, 8, ...). So, it will never recommend a setup with 6 data drives (a 7-drive RAIDZ1, an 8-drive RAIDZ2, or a 9-drive RAIDZ3).
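That heuristic is simple enough to state in a few lines. This is my reconstruction of what the check amounts to, not the actual FreeNAS Volume Manager code:

```python
def is_power_of_two(n):
    """True when n is a positive power of two (1, 2, 4, 8, ...)."""
    return n > 0 and (n & (n - 1)) == 0

def volume_manager_optimal(total_disks, parity):
    """The 'Optimal!' rule as described above: the number of
    data disks (total minus parity) must be a power of two."""
    return is_power_of_two(total_disks - parity)

print(volume_manager_optimal(8, 2))  # False: 6 data disks
print(volume_manager_optimal(4, 2))  # True:  2 data disks
print(volume_manager_optimal(6, 2))  # True:  4 data disks
```

That explains why selecting all 8 disks produced two 4-disk RAIDZ2 vdevs: each vdev individually satisfies the power-of-two rule.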
 

dniq

Explorer
Joined
Aug 29, 2012
Messages
74
So, is there a way to convert this to a single RAIDZ2 live, without losing the data? Or should I just migrate the VMs to the old NFS server and re-create the pool?
 

Dusan

Guru
Joined
Jan 29, 2013
Messages
1,165
So, is there a way to convert this to a single RAIDZ2 live, without losing the data? Or should I just migrate the VMs to the old NFS server and re-create the pool?
Unfortunately no. You need to destroy the pool and create it again.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
No, no, these VMs are exact clones of the same initial image, so if I have 10 VMs, all cloned from the same initial template image - the deduplication ratio will be 10x (well, 11x, including template). It will then start to go down a bit, since each VM will start writing its own logs inside those images, but that amount is minimal (several megabytes per VM). The bulk (8-10GB per VM) will never really change.

My main goal here is not to reduce the amount of disk space they're using, but to decrease the amount of times the NAS needs to physically access the disks.

Guess I don't follow here... You want to decrease the amount of times the NAS needs to physically access the disks? Even if you used deduplication, the NAS would still access the hard drive data just as often, just in one location.

If you are truly trying to limit hard drive access then the only option I see is to add an L2ARC using a single SSD (you choose the capacity) and you configure your VMs to only reference the one main VM image.

In VMware Workstation I keep a single VM image, create several VMs from it, and save their differences separately. There is no need for 10 copies of that main image, although I do retain a backup copy on DVD. Does this make sense? You might be handling VMs differently.

And like everyone has said, you need to destroy your pool in order to create it properly. It does suck; we've all run into wanting to make a change to a pool, and it tends to consume a lot of time moving data off and restoring it later.
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
Sorry, I meant to add some more.

I'd get rid of the ZIL as well; why would you need it? You could use one of those drives for the L2ARC if you don't have an SSD lying around, but you should really find an SSD large enough to hold your VM images.

Also, you can create an 8-drive RAIDZ2. When entering Volume Manager, select your 8 drives and ensure you reselect RAIDZ2. It will say it's not optimal, but apply it and you should have the proper pool size for what you selected. As for what is optimal: well, you have 3TB hard drives, which are not fast, so I doubt you are losing much performance.
 

survive

Behold the Wumpus
Moderator
Joined
May 28, 2011
Messages
875
Thanks for the articles!

The VMs don't really take even 1TB total - about 200GB total.

If you only have ~200GB of VMs, why not just put them on SSDs? I would think four 256GB SSDs in a stripe of mirrors would be silly fast with that amount of data.

-Will
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Well, this is just plain silly; a shame when you consider basic RAID boxes like Thecus can do this but FreeNAS can't.

It's not silly when you consider the design decisions made when ZFS was created. ZFS was intended for enterprise-class solutions, not home use, so it has some limitations that can drive home users crazy. But if you know about the limitations going in, it's not a big problem. It tests your ability to plan ahead.

That's why I created my presentation in the noobie section to discuss those limitations. I got tired of the complaints, and of people who didn't figure this stuff out until they did things they shouldn't do (like adding single disks) and then got even more upset because they lost their pool.
 