RAIDZ1 vs. RAID 5 && UREs

Status
Not open for further replies.

bestboy

Contributor
Joined
Jun 8, 2014
Messages
198
It took a long time to finish, but zdb -bb came up with what I'd interpret as 4.45% of the used space being taken up by metadata.
Code:
Blocks  LSIZE   PSIZE   ASIZE     avg    comp   %Total  Type
 48.6M  5.68T   5.58T   5.59T    118K    1.02    95.53  ZFS plain file
  874K  1.56G    627M   6.91G   8.09K    2.55     0.12  ZFS directory
   ...
 32.5M   285G    249G    260G   7.99K    1.14     4.33  zvol object
   ...
 82.1M  5.97T   5.83T   5.86T   73.0K    1.02   100.00  Total


If we now consider that global metadata is stored three times and dataset metadata twice, then I could indeed end up with about 2% net metadata.

Edit: After reconsidering, I believe that "ZFS plain file" and "zvol object" are actual user data. Metadata is probably everything else (roughly 30 entry types, each taking up ~0.00%).
Here's the entire output: http://pastebin.com/d06DcRTW

But if 99.86% is taken up by user data alone, then the metadata share must be incredibly low. Maybe due to compression?
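To put a number on it, here's a quick back-of-the-envelope check (a sketch of my own, not zdb output), using the %Total figures from the table above:
Code:
# Rough check using the ASIZE %Total column from the zdb -bb table above.
# Assumption: "ZFS plain file" and "zvol object" are the only user data.

plain_file_pct = 95.53          # ZFS plain file
zvol_pct = 4.33                 # zvol object

metadata_pct = 100.00 - (plain_file_pct + zvol_pct)
print(f"gross metadata share: {metadata_pct:.2f}%")       # ~0.14%

# If global metadata is written 3x and dataset metadata 2x, the net
# (single-copy) figure is smaller still; assuming an average of ~2.5
# stored copies (my assumption, not something zdb reports):
print(f"net metadata share: {metadata_pct / 2.5:.3f}%")   # ~0.06%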


 

titan_rw

Guru
Joined
Sep 1, 2012
Messages
586
Here's my pool. Compression on the biggest dataset (75% of the data or so) is disabled.

Code:
root@nas ~ # zdb -U /data/zfs/zpool.cache -bb nas1pool

Traversing all blocks to verify nothing leaked ...

loading space map for vdev 2 of 3, metaslab 91 of 130 ...
29.9T completed (2613MB/s) estimated time remaining: 0hr 00min 03sec
  No leaks (block sum matches space maps exactly)

  bp count:       168995019
  ganged count:   0
  bp logical:     21965422945280   avg: 129976
  bp physical:    21905872216576   avg: 129624   compression: 1.00
  bp allocated:   32921662377984   avg: 194808   compression: 0.67
  bp deduped:     0   ref>1: 0   deduplication: 1.00
  SPA allocated:  32921662377984   used: 61.42%

  additional, non-pointer bps of type 0:  62097

Blocks  LSIZE   PSIZE   ASIZE     avg    comp   %Total  Type
     -      -       -       -       -       -        -  unallocated
     2    32K      8K   72.0K   36.0K    4.00     0.00  object directory
     6  3.00K   3.00K    216K   36.0K    1.00     0.00  object array
     1    16K      2K   36.0K   36.0K    8.00     0.00  packed nvlist
     -      -       -       -       -       -        -  packed nvlist size
    13  1.41M   21.5K    468K   36.0K   66.98     0.00  bpobj
     -      -       -       -       -       -        -  bpobj header
     -      -       -       -       -       -        -  SPA space map header
 1.34K  28.9M   13.5M    100M   74.6K    2.14     0.00  SPA space map
     7   688K    688K   1.02M    149K    1.00     0.00  ZIL intent log
 9.06K   145M   27.8M    237M   26.1K    5.22     0.00  DMU dnode
    16    32K   15.5K    396K   24.8K    2.06     0.00  DMU objset
     -      -       -       -       -       -        -  DSL directory
    11  6.00K      1K   36.0K   3.27K    6.00     0.00  DSL directory child map
     9  5.50K      1K   72.0K      8K    5.50     0.00  DSL dataset snap map
    12  37.0K      4K   72.0K   6.00K    9.25     0.00  DSL props
     -      -       -       -       -       -        -  DSL dataset
     -      -       -       -       -       -        -  ZFS znode
     -      -       -       -       -       -        -  ZFS V0 ACL
  161M  20.0T   19.9T   29.9T    190K    1.00   100.00  ZFS plain file
 67.1K   251M   26.4M    453M   6.75K    9.50     0.00  ZFS directory
     8     8K   4.50K    192K   24.0K    1.78     0.00  ZFS master node
     -      -       -       -       -       -        -  ZFS delete queue
     -      -       -       -       -       -        -  zvol object
     -      -       -       -       -       -        -  zvol prop
     -      -       -       -       -       -        -  other uint8[]
     -      -       -       -       -       -        -  other uint64[]
     -      -       -       -       -       -        -  other ZAP
     -      -       -       -       -       -        -  persistent error log
     6   656K   48.0K    396K   66.0K   13.67     0.00  SPA history
     -      -       -       -       -       -        -  SPA history offsets
     -      -       -       -       -       -        -  Pool properties
     -      -       -       -       -       -        -  DSL permissions
     -      -       -       -       -       -        -  ZFS ACL
     -      -       -       -       -       -        -  ZFS SYSACL
     -      -       -       -       -       -        -  FUID table
     -      -       -       -       -       -        -  FUID table size
     1     1K     512   36.0K   36.0K    2.00     0.00  DSL dataset next clones
     -      -       -       -       -       -        -  scan work queue
     -      -       -       -       -       -        -  ZFS user/group used
     -      -       -       -       -       -        -  ZFS user/group quota
     -      -       -       -       -       -        -  snapshot refcount tags
     -      -       -       -       -       -        -  DDT ZAP algorithm
     -      -       -       -       -       -        -  DDT statistics
     -      -       -       -       -       -        -  System attributes
     -      -       -       -       -       -        -  SA master node
     8  12.0K   5.00K    192K   24.0K    2.40     0.00  SA attr registration
    16   256K   36.0K    384K   24.0K    7.11     0.00  SA attr layouts
     -      -       -       -       -       -        -  scan translations
     -      -       -       -       -       -        -  deduplicated block
    25  20.5K   3.50K    252K   10.1K    5.86     0.00  DSL deadlist map
     -      -       -       -       -       -        -  DSL deadlist map hdr
     2  1.50K      1K   36.0K   18.0K    1.50     0.00  DSL dir clones
     -      -       -       -       -       -        -  bpobj subobj
    26   188K   57.0K    936K   36.0K    3.30     0.00  deferred free
     -      -       -       -       -       -        -  dedup ditto
     5  33.5K   6.00K    180K   36.0K    5.58     0.00  other
  161M  20.0T   19.9T   29.9T    190K    1.00   100.00  Total

space map refcount mismatch: expected 347 != actual 294


No zvols on here, just files. Most of the files are relatively large, as it's mostly media.
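One thing worth noting in that output: the "compression: 0.67" on the bp allocated line isn't file compression at all, it's logical size divided by allocated size, i.e. the raidz allocation overhead. A quick sanity check (my own sketch, using the byte counts from the output above):
Code:
# "compression: 0.67" on the "bp allocated" line is bp logical divided
# by bp allocated, not file compression. Checking with the numbers above:

bp_logical = 21965422945280     # bytes as seen by the filesystem
bp_allocated = 32921662377984   # bytes actually allocated on disk

print(f"logical/allocated: {bp_logical / bp_allocated:.2f}")     # ~0.67
print(f"allocation overhead: {bp_allocated / bp_logical:.2f}x")  # ~1.50x

# ~1.5x is what you'd expect from parity alone, e.g. a 3-disk raidz1
# or a 6-disk raidz2 stores roughly 1.5 bytes per byte of user data.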

I guess the metadata overhead is just smaller than 0.01%, since "ZFS plain file" alone rounds to 100.00% of the total.
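You can put a rough number on it by summing the ASIZE of every non-file row in the table. A quick sketch (values copied by hand from the output above, so treat it as an estimate):
Code:
# Rough check of the "< 0.01%" guess: sum the ASIZE of every metadata row
# in the zdb -bb table above and compare against the pool total.

UNITS = {"K": 2**10, "M": 2**20, "G": 2**30, "T": 2**40}

def to_bytes(s):
    """Convert a zdb size string like '29.9T' or '453M' to bytes."""
    return float(s[:-1]) * UNITS[s[-1]] if s[-1] in UNITS else float(s)

# ASIZE column of every row that is neither "-" nor the file/Total rows
metadata_asize = ["72.0K", "216K", "36.0K", "468K", "100M", "1.02M",
                  "237M", "396K", "36.0K", "72.0K", "72.0K", "453M",
                  "192K", "396K", "36.0K", "192K", "384K", "252K",
                  "36.0K", "936K", "180K"]

total = to_bytes("29.9T")
meta = sum(to_bytes(v) for v in metadata_asize)
print(f"metadata: {meta / 2**20:.0f} MiB of {total / 2**40:.1f} TiB")
print(f"overhead: {100 * meta / total:.4f}%")   # ~0.0025%, well below 0.01%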
 

bestboy

Contributor
Joined
Jun 8, 2014
Messages
198
Well, it seems that the DMU dnode (Data Management Unit) makes up the biggest portion of the metadata in both pools (mine first, titan_rw's second), but we can see high compression ratios (> 5) for that entry:
Code:
Blocks  LSIZE   PSIZE   ASIZE     avg    comp   %Total  Type
  152K  2.37G    391M   1.21G   8.18K    6.21     0.02  DMU dnode
 9.06K   145M   27.8M    237M   26.1K    5.22     0.00  DMU dnode


LSIZE: logical size. The size of the data without compression, raidz, or gang overhead.
PSIZE: physical size. The size of the block on disk after compression.
ASIZE: allocated size. The total size of all blocks allocated to hold this data, including any gang headers or raidz parity information.
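Plugging titan_rw's DMU dnode row into those definitions makes the relationship concrete (an illustration of my own):
Code:
# The "comp" column is LSIZE/PSIZE; comparing ASIZE with PSIZE shows what
# raidz parity, padding and the extra metadata copies cost on top of the
# compressed data. Values taken from the "DMU dnode" row above.

lsize = 145 * 2**20     # 145M  logical (uncompressed) size
psize = 27.8 * 2**20    # 27.8M physical size after compression
asize = 237 * 2**20     # 237M  total allocated on disk

print(f"compression (LSIZE/PSIZE): {lsize / psize:.2f}")    # ~5.22
print(f"allocation (ASIZE/PSIZE): {asize / psize:.2f}x")    # ~8.5x

The large ASIZE/PSIZE gap is expected for metadata: small blocks pad badly on raidz and are stored in multiple copies.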


The OpenZFS developer wiki also lists an interesting talk from 2008 by Max Bruning that explains the data layout. It also mentions that basically all metadata is compressed (even if the dataset data is not):
If you are a little interested in the technical background, then I can recommend the first 10 minutes of this talk.
If you are more than just a little interested in the technical background, then feel free to watch it to the end and prepare to be in awe of what a beast ZFS really is :). There's also a formal specification document for the ZFS on-disk format (PDF, ~500 KiB) that you might want to take a look at.
 