SOLVED: RaidZ2 available size when using 4TB hard drives

Status: Not open for further replies.

Fei

Explorer
Joined
Jan 13, 2014
Messages
80
Hi

I use FreeNAS 9.3.1 and another ZFS system (Nexenta), and I found a BIG problem. Their available space is different when I use 4TB hard drives; the difference is about 2TB. Why? :eek: o_O

<FreeNAS>
zfs version:5
upload_2016-3-29_23-13-56.png

<other system>
zfs version:5
upload_2016-3-29_23-15-31.png


<FreeNAS>
zpool list (raw capacity) : 36.2T
zfs list (Available space) : 26.8T
upload_2016-3-29_23-16-55.png


<other system>
zpool list (raw capacity) : 36.2T
zfs list (Available space) : 28.4T
upload_2016-3-29_23-21-8.png


<FreeNAS>
created a RAIDZ2 group from ten 4TB hard drives
upload_2016-3-29_23-22-12.png


<other system>
created a RAIDZ2 group from ten 4TB hard drives
upload_2016-3-29_23-23-41.png
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
Why is the mount point on your second system not the standard FreeNAS mount point?

Lots of things affect size and I wouldn't worry about it. You can look into record size and what happens when you only partially fill a block.
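
For example, assuming the dataset in question is the pool root "vol" (the name that shows up later in the thread; adjust to your own dataset), checking the record size is just:

Code:
# Show the dataset's recordsize (128K is the default)
zfs get recordsize vol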
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Plus smaller writes take up more space for parity.
 

depasseg

FreeNAS Replicant
Joined
Sep 16, 2014
Messages
2,874
Is the hardware identical between the FreeNAS and the "other" system? In FreeNAS, those are the raw storage values, not the usable space.
 

Fei

Explorer
Joined
Jan 13, 2014
Messages
80
Why is the mount point on your second system not the standard FreeNAS mount point?

Lots of things affect size and I wouldn't worry about it. You can look into record size and what happens when you only partially fill a block.

The second system is Nexenta, so the mount points are different.


I will check the record size, but I remember both are 128K (the default).
 

Fei

Explorer
Joined
Jan 13, 2014
Messages
80
Is the hardware identical between the FreeNAS and the "other" system?

The two hardware configs are the same. (MB: X9SCL-F, CPU: E3-1230v2, RAM: 32GB, data hard drives: HGST 4TB × 10)

In FreeNAS, those are the raw storage values, not the usable space.

I checked the "zfs list" total space (used + available):
FreeNAS = 10.5T + 16.3T = 26.8T
Other system (Nexenta) = 28.4T

The difference is about 2TB.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
The two hardware configs are the same. (MB: X9SCL-F, CPU: E3-1230v2, RAM: 32GB, data hard drives: HGST 4TB × 10)

I checked the "zfs list" total space (used + available):
FreeNAS = 10.5T + 16.3T = 26.8T
Other system (Nexenta) = 28.4T

The difference is about 2TB.

What did the FreeNAS system report for free space when it was empty?
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Okay, then, what happens if you put all the stuff on the FreeNAS onto the Nexenta box?
 

Fei

Explorer
Joined
Jan 13, 2014
Messages
80
What did the FreeNAS system report for free space when it was empty?
When the dataset had no data, the available space was 26.7TB.
upload_2016-3-30_18-1-35.png


When I imported this FreeNAS pool on Nexenta, I got some errors:
Code:
root@nxs168:/volumes# zpool import vol
cannot import 'vol': pool may be in use from other system, it was last accessed by freenas.local (hostid: 0xc7fa733b) on Wed Mar 30 18:03:16 2016
use '-f' to import anyway

root@nxs168:/volumes# zpool import -f vol
This pool uses the following feature(s) not supported by this system:
        com.delphix:embedded_data
cannot import 'vol': unsupported version or feature

upload_2016-3-30_18-16-47.png


I tried what you asked, but there were some errors during the transfer. I don't think those errors account for 2TB of lost space, though.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
No, I didn't want you to import the FreeNAS pool on Nexenta. I'm trying to establish whether maybe we're just seeing something like space lost to RAIDZ2 parity. RAIDZ2 uses a variable amount of space for parity which is dependent upon the size of the block being written.
 

maglin

Patron
Joined
Jun 20, 2015
Messages
299
Kind of on topic: if you have a media pool where 99% of the files are large files, will you gain more available free space by using a very large block size, given how parity data is provisioned?

I ask because I see about 2TiB lost on my 3x8TB RAIDZ1. I have about 14.1 TiB available and was expecting something closer to 15. If so, I need to rebuild my pool.

Also, my RAIDZ1 is temporary. I plan to get one disk a month until I have nine, run a hot spare, and rebuild as RAIDZ2 once I have at least six disks. I.e., I know it's not very redundant at the moment.


 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Kind of on topic: if you have a media pool where 99% of the files are large files, will you gain more available free space by using a very large block size, given how parity data is provisioned?

That's not quite it. This is hard to wrap your head around because it is a multivariable problem.

Look at this:

RAIDZ.png


This is RAIDZ1. Now there's a bunch of interesting stuff here. First, parity is - as you would expect - at least partially related to the number of devices in the vdev. If we look at LBA0 where a 32KB block of data is stored with two parity sectors, this is what people are "expecting" with what they think of as RAID5. On a RAID5 array, it'd even be true.

But here's the thing. ZFS doesn't pre-assign and pre-calculate where the parity goes. It just lays it down contiguously. This is great in some ways (avoids the RAID5 "write hole") but less great in some other ways. So the question then becomes, what are the other cases? Well, consider the case of a single 4KB block - so look at LBA3, the brown. For that, you end up writing a data and a parity block, meaning that you occupy 8KB to write 4KB. But it gets even a bit worse. In order to avoid permanently stranding a sector during a free and reallocate cycle, ZFS pads out writes. Look at LBA7 in the magenta. A 16KB block of data results in a parity block, as you might expect, but also a padding block, which you might not expect. That means that if that block is later freed, another 16KB data block will fit there, or a 12KB data block and a 4KB data block, or three 4KB data blocks.

So this leads us to some perverse situations. If you're laying down 4KB or 8KB blocks, you waste an equivalent amount of space to parity and (for 8KB) padding. A 12KB block is actually optimal, a 16KB block has 8K overhead for parity/padding, and so on. Certain block sizes, like 32KB, fit perfectly.

As the block size increases, there's less opportunity for waste. This is one of the reasons RAIDZ is so excellent at sequential data storage. But if you look at something like RAIDZ3, it has totally awful characteristics for things like block storage, because when you're writing a 4K block to a RAIDZ3, you're actually consuming 16KB of the pool's space - 4KB data and 12KB parity.

So there are two things that work to minimize this. Store large sequential files, and, make wider RAIDZn vdevs. Also, compression helps, because often it will reduce the size of blocks so that the parity "fits" better.
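
If you want to play with the numbers, here's a rough back-of-the-envelope sketch (my own arithmetic, not output from any ZFS tool) of how many sectors a single logical block ends up consuming. It assumes a 5-disk RAIDZ1 with ashift=12, which is consistent with the examples above (a 32K block needs two parity sectors, a 16K block needs one parity plus one pad):

Code:
# N = disks in the vdev, P = parity level, SECTOR = 2^ashift bytes
N=5; P=1; SECTOR=4096
for BLOCK in 4096 8192 12288 16384 32768 131072; do
    D=$(( (BLOCK + SECTOR - 1) / SECTOR ))             # data sectors
    PAR=$(( ((D + N - P - 1) / (N - P)) * P ))         # parity sectors
    TOT=$(( D + PAR ))
    PAD=$(( ((TOT + P) / (P + 1)) * (P + 1) - TOT ))   # pad up to a multiple of P+1
    echo "$((BLOCK / 1024))K block -> $D data + $PAR parity + $PAD pad sectors"
done

The 4K case allocates two sectors to store one sector of data, the 8K case allocates four for two, and the 12K and 32K cases come out with no padding at all, which is the pattern described above.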
 

Fei

Explorer
Joined
Jan 13, 2014
Messages
80
No, I didn't want you to import the FreeNAS pool on Nexenta. I'm trying to establish whether maybe we're just seeing something like space lost to RAIDZ2 parity. RAIDZ2 uses a variable amount of space for parity which is dependent upon the size of the block being written.

Sorry, I misunderstood what you meant.

Can you tell me why a new pool on the two systems (FreeNAS & Nexenta) has different AVAILABLE space? What parameters or settings should I check?
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Sorry, I misunderstood what you meant.

Can you tell me why a new pool on the two systems (FreeNAS & Nexenta) has different AVAILABLE space? What parameters or settings should I check?

I don't know. Can you please try that on both platforms as "zpool list -p", which will give a more exact accounting? I'm wondering if perhaps it is some strange issue like TB vs TiB. I'd expect 28.4TB to downconvert to about 25.8TiB so that's probably not it. It's also possible that there's some different estimation factor used, because as we see from the discussion above, the amount of data you can actually store on a RAIDZ2 pool is not a constant, but I don't actually recall how they arrive at the free space estimate.

The key factor here would be that the space available isn't a guarantee. If I sit there for a few days writing out 4KB blocks to your pool, I can run your pool out of space with much less data written than what it is reporting free.
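
For reference, the TB-to-TiB conversion I mean is just this (a quick bc check, nothing ZFS-specific):

Code:
# 28.4 decimal TB expressed in binary TiB
echo "scale=1; 28.4 * 10^12 / 2^40" | bc
# 25.8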
 

Fei

Explorer
Joined
Jan 13, 2014
Messages
80
I don't know. Can you please try that on both platforms as "zpool list -p", which will give a more exact accounting?
Running "zpool list -p" on both platforms , their size very similar.
but running "zfs list -p" , they differ 2TiB :oops:

FreeNAS = 29411297806800 = 26.74941952755034 TiB
Nexenta = 31265012812104 = 28.43536350347131 TiB

<Freenas>
Code:
[root@freenas] ~# zpool list -p vol
NAME   SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
vol   39857296506880  1179648  39857295327232         -     0%      0  1.00x  ONLINE  /mnt

[root@freenas] ~# zfs list -p vol
NAME    USED           AVAIL   REFER  MOUNTPOINT
vol   702000  29411297806800  224640  /mnt/vol


<Nexenta>
Code:
root@nxs168:/volumes# zpool list -p vol
NAME   SIZE  ALLOC   FREE  EXPANDSZ    CAP  DEDUP  HEALTH  ALTROOT
vol   39857296506880  440832  39857296066048         -      0  1.00x  ONLINE  -

root@nxs168:/volumes# zfs list -p vol
NAME    USED           AVAIL  REFER  MOUNTPOINT
vol   292536  31265012812104  68544  /volumes/vol
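
(For anyone checking the math, the TiB figures above are just the AVAIL byte counts divided by 1024^4:)

Code:
echo "scale=2; 29411297806800 / 1024^4" | bc   # FreeNAS -> 26.74
echo "scale=2; 31265012812104 / 1024^4" | bc   # Nexenta -> 28.43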
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Can you check what the ashift values are for each pool?

Anyways, it appears that "zfs list" reaches its estimate of pool free space based on imagining that the pool is filled with 128K blocks, which is maybe useless. It's possible Nexenta uses something different to compute it.
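
Something along these lines should show it. On FreeNAS 9.x the pool cache file normally lives at /data/zfs/zpool.cache, so zdb may need to be pointed at it (adjust the path if yours differs):

Code:
# On Nexenta
zdb | grep ashift
# On FreeNAS (point zdb at the non-default cache file)
zdb -U /data/zfs/zpool.cache | grep ashift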
 

Fei

Explorer
Joined
Jan 13, 2014
Messages
80
Can you check what the ashift values are for each pool?

Anyways, it appears that "zfs list" reaches its estimate of pool free space based on imagining that the pool is filled with 128K blocks, which is maybe useless. It's possible Nexenta uses something different to compute it.

I found Nexenta ashift=9 and FreeNAS ashift=12. Is that the root cause?
Code:
root@nxs169:/volumes# zdb |grep ashift
            ashift: 9
            ashift: 9
 


jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
I found Nexenta ashift=9 and FreeNAS ashift=12. Is that the root cause?
Code:
root@nxs169:/volumes# zdb |grep ashift
            ashift: 9
            ashift: 9

Yeah, probably that. Try creating the pool on Nexenta with ashift=12 and see what the result is. It will still be a little different because FreeNAS reserves 2GB on each disk for swap.
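
As a very rough idea of the swap part (my own back-of-the-envelope math, not from the outputs above): ten disks giving up 2GiB each is about 20GiB of raw space, of which roughly 8/10 would have been usable data space on a 10-wide RAIDZ2, so it's tiny next to the ~1.7TiB gap seen earlier:

Code:
echo "scale=1; 10 * 2 * 8 / 10" | bc   # ~16 GiB of usable space given up to swap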
 