Analyzing my deduped SSD-mirror > leaked space


lopr
Hello, I am a bit lost about what's wrong with my pool, a mirror consisting of two 250GB SSDs with replication on.
This pool started as a single 120GB SSD for testing purposes. I then attached a 250GB disk to make it a mirror with the help of this post by Duran, checked that autoexpand=on, and replaced the old disk with another 250GB SSD to make the pool bigger (roughly the steps sketched below). Deduplication was on for this pool long before I expanded it. So far so good.
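To retrace it, this is more or less the CLI equivalent of what I did; the device names are just placeholders, since I actually followed the linked post and mostly clicked through the GUI.
Code:
# device names below are placeholders for the real gptid entries
zpool set autoexpand=on SSD-jails                          # allow the pool to grow onto bigger disks
zpool attach SSD-jails gptid/OLD-120G gptid/NEW-250G-A     # single disk -> mirror, wait for the resilver
zpool replace SSD-jails gptid/OLD-120G gptid/NEW-250G-B    # swap the 120GB disk for the second 250GB one
zpool list SSD-jails                                       # SIZE should now show the larger capacity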

In the past I never experienced any problems with dedup and was happy about the extra space it gained me.
Now I wanted to see how the pool is doing in terms of deduplication and ran into some problems.

This is how I usually check the deduplication ratio and how much data is stored there:
Code:
~# zfs list SSD-jails
NAME        USED  AVAIL  REFER  MOUNTPOINT
SSD-jails   129G   124G  21.3M  /mnt/SSD-jails

~# zpool list SSD-jails
NAME        SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
SSD-jails   236G  98.6G   137G         -    57%    41%  1.46x  ONLINE  /mnt


Now I wanted to have a closer look at the dedup table:
Code:
~# zpool status -D SSD-jails
  pool: SSD-jails
state: ONLINE
  scan: scrub repaired 0 in 0h14m with 0 errors on Fri Aug 11 14:33:25 2017
config:

        NAME                                            STATE     READ WRITE CKSUM
        SSD-jails                                       ONLINE       0     0     0
          mirror-0                                      ONLINE       0     0     0
            gptid/22e08b06-7d1d-11e7-a6cf-d050995176b8  ONLINE       0     0     0
            gptid/acb41f25-7d12-11e7-96fb-d050995176b8  ONLINE       0     0     0

errors: No known data errors

dedup: DDT entries 3788677, size 536 on disk, 173 in core

bucket			  allocated					   referenced
______   ______________________________   ______________________________
refcnt   blocks   LSIZE   PSIZE   DSIZE   blocks   LSIZE   PSIZE   DSIZE
------   ------   -----   -----   -----   ------   -----   -----   -----
	 1	3.03M	143G   43.4G   48.2G	3.03M	143G   43.4G   48.2G
	 2	 301K   7.52G   4.30G   4.88G	 662K   16.1G   9.18G   10.5G
	 4	 120K   2.21G   1.06G   1.33G	 632K   11.0G   5.23G   6.69G
	 8	 131K	981M	471M	805M	1.33M   9.75G   4.66G   8.11G
	16	47.0K	132M   59.1M	216M	 807K   2.45G   1.02G   3.64G
	32	1.41K   51.2M   15.4M   18.7M	56.8K   2.18G	687M	813M
	64	  300   17.9M   5.17M   5.68M	25.8K   1.59G	378M	421M
   128	   51   2.70M	302K	396K	8.52K	447M   42.2M   58.3M
   256	   37	915K   46.5K	148K	12.0K	382M   17.4M   47.9M
   512	   19   1.63M	 56K	 76K	11.5K	982M   33.3M   46.0M
	1K		3	130K	  5K	 12K	4.43K	244M   8.83M   17.7M
	2K		6	259K   90.5K	104K	17.9K	774M	270M	311M
	4K		1	 512	 512	  4K	6.46K   3.23M   3.23M   25.9M
	8K		3   1.50K   1.50K	 12K	34.9K   17.4M   17.4M	140M
   16K		2	  1K	  1K	  8K	47.2K   23.6M   23.6M	189M
   32K		3   1.50K   1.50K	 12K	 152K   76.2M   76.2M	609M
   64K		4	  2K	  2K	 16K	 409K	205M	205M   1.60G
Total	3.61M	154G   49.3G   55.4G	7.18M	189G   65.2G   81.3G

Well, I have no idea what exactly LSIZE, PSIZE and DSIZE are (LSIZE = size after decompression, PSIZE = physical size on disk, DSIZE = ?), but I expected one column of the Total line to correspond to the allocated space ALLOC from zpool list and another to the used space USED from zfs list. That is not the case. If I calculate Total referenced DSIZE / Total allocated DSIZE, I get ~1.46, voilà, the dedup ratio that zpool list reports. Maybe I'm just not reading it right, so I checked with zdb:
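That back-of-the-envelope calculation, with the numbers from the Total line above:
Code:
# referenced DSIZE / allocated DSIZE from the DDT histogram's Total line
~# echo "scale=2; 81.3 / 55.4" | bc
1.46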
Code:
~#  zdb -U /data/zfs/zpool.cache -D SSD-jails
DDT-sha256-zap-duplicate: 615627 entries, size 579 on disk, 187 in core
DDT-sha256-zap-unique: 3172603 entries, size 528 on disk, 170 in core

dedup = 1.47, compress = 2.90, copies = 1.25, dedup * compress / copies = 3.41

which gives me the same values (the copies > 1 is another riddle).
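I don't remember ever raising the copies property on any dataset, which can be ruled out quickly; my guess, and it really is only a guess, is that the factor comes from small blocks being rounded up to whole sectors when allocated (compare PSIZE vs. DSIZE in the small buckets of the histogram above) rather than from anything I set.
Code:
# double-check that no dataset has copies set above the default of 1
~# zfs get -r copies SSD-jails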

While reading man zdb I thought I might try zdb -b, which displays statistics about the number, size (logical, physical and allocated) and deduplication of blocks:
Code:
~# zdb -U /data/zfs/zpool.cache -b SSD-jails

Traversing all blocks to verify nothing leaked ...

loading space map for vdev 0 of 1, metaslab 228 of 236 ...
leaked space: vdev 0, offset 0x3906662000, size 16384
leaked space: vdev 0, offset 0x3906395000, size 4096
[... many more of these ... ]
leaked space: vdev 0, offset 0x3905f00000, size 8192
leaked space: vdev 0, offset 0x3905978000, size 4096
leaked space: vdev 0, offset 0x3903f63000, size 4096
block traversal size 18446744069241384960 != alloc 105870340096 (leaked 110338506752)

		bp count:		 2339875
		ganged count:		   0
		bp logical:	40618324480	  avg:  17359
		bp physical:   14964867072	  avg:   6395	 compression:   2.71
		bp allocated:  23318990848	  avg:   9965	 compression:   1.74
		bp deduped:	27787157504	ref>1: 615625   deduplication:   2.19
		SPA allocated: 105870340096	 used: 41.78%

		additional, non-pointer bps of type 0:	 221297
		Dittoed blocks on same vdev: 558221

That is not comforting... but what does it mean?
The compression and deduplication values are quite different here; should they be the same as with the -D option?
I read that leaked space is normal on active pools, so I also tried zdb -c to verify the checksums of all metadata blocks:
Code:
~# zdb -U /data/zfs/zpool.cache -c SSD-jails

Traversing all blocks to verify metadata checksums and verify nothing leaked ...

loading space map for vdev 0 of 1, metaslab 228 of 236 ...
57.7M completed (  57MB/s) estimated time remaining: 0hr 29min 17sec		zdb_blkptr_cb: Got error 122 reading <0, 0, 0, 2f>  -- skipping
zdb_blkptr_cb: Got error 122 reading <0, 1534, 0, 1>  -- skipping
zdb_blkptr_cb: Got error 122 reading <0, 1535, 3, 0>  -- skipping
zdb_blkptr_cb: Got error 122 reading <0, 1535, 2, 0>  -- skipping
zdb_blkptr_cb: Got error 122 reading <0, 1535, 1, 0>  -- skipping
zdb_blkptr_cb: Got error 122 reading <0, 1535, 0, 0>  -- skipping
101M completed (  25MB/s) estimated time remaining: 1hr 06min 33sec		zdb_blkptr_cb: Got error 122 reading <0, 1535, 1, 81>  -- skipping
131M completed (  21MB/s) estimated time remaining: 1hr 18min 26sec		zdb_blkptr_cb: Got error 122 reading <0, 1535, 1, be>  -- skipping
zdb_blkptr_cb: Got error 122 reading <0, 1535, 0, 17d0>  -- skipping
145M completed (  20MB/s) estimated time remaining: 1hr 22min 17sec		zdb_blkptr_cb: Got error 122 reading <0, 1535, 1, e7>  -- skipping
zdb_blkptr_cb: Got error 122 reading <0, 1535, 1, e8>  -- skipping
^C

I did a scrub > no errors.
Data-wise the pool seems OK, all jails are running without any hiccups, but I am a tad uneasy.
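For completeness, the scrub was just the usual:
Code:
~# zpool scrub SSD-jails
~# zpool status -v SSD-jails    # "scrub repaired 0 ... with 0 errors", as shown above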

So my questions:
1. Can I zfs send the pool to my data pool, destroy and rebuild the SSD-jails pool, and send it back (roughly the sketch below), or will that copy the corrupt metadata along?
2. Is the DDT table correct and am I just not reading it correctly, or is it borked because of this leaked... stuff?
3. Any other suggestions to fix this?
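To be concrete about question 1, this is roughly what I have in mind; the snapshot name and the target dataset on the data pool are placeholders. As far as I understand, send/receive serializes the logical data and writes fresh metadata on the receiving side, so on-disk metadata problems should not travel with it, but that is exactly what I would like confirmed.
Code:
~# zfs snapshot -r SSD-jails@migrate
~# zfs send -R SSD-jails@migrate | zfs recv -u data/SSD-jails-copy
# ... destroy and recreate the SSD-jails pool ...
~# zfs send -R data/SSD-jails-copy@migrate | zfs recv -uF SSD-jails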

Right now I am running a SMART test on both disks and will then export the pool to see if there's any difference.

edit: you can see my configuration in my signature
 

lopr
The SMART long test showed no errors.
I exported the pool and ran the block test again:
Code:
 ~# zdb -e -b SSD-jails
Traversing all blocks to verify nothing leaked ...

loading space map for vdev 0 of 1, metaslab 230 of 236 ...
89.7G completed ( 165MB/s) estimated time remaining: 4294628115hr 4294967287min 4294967246sec

The calculation of the remaining time is a bit strange.
It now seems to have stopped at 106G completed; it got there in 10 minutes, but there has been no progress for 30 minutes.
The CPU is still busy, but there is no more I/O on the disks.
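I was watching the disk activity roughly like this (nothing moving on either disk):
Code:
~# zpool iostat -v SSD-jails 5    # per-vdev read/write bandwidth, refreshed every 5 seconds
~# gstat -p                       # busy% of the physical providers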

edit: I aborted it and ran zdb -e -c instead; that seems to have completed after 1:45h, however zdb is still running at 50% CPU.
Code:
~# zdb -e -c SSD-jails

Traversing all blocks to verify metadata checksums and verify nothing leaked ...

loading space map for vdev 0 of 1, metaslab 230 of 236 ...
109G completed (  30MB/s) estimated time remaining: 290509hr 51min 40sec		294967261sec
		No leaks (block sum matches space maps exactly)

		bp count:		12850609
		ganged count:		6218
		bp logical:	241480446464	  avg:  18791
		bp physical:   75944750080	  avg:   5909	 compression:   3.18
		bp allocated:  117228675072	  avg:   9122	 compression:   2.06
		bp deduped:	27502944256	ref>1: 607039   deduplication:   1.23
		SPA allocated: 89725730816	 used: 35.41%

		additional, non-pointer bps of type 0:	 608543
		Dittoed blocks on same vdev: 4726146

So yay, it seems my pool is OK?
But it seems zdb itself is corrupt? I will verify the integrity of my FreeNAS 9.10-U6 install.
Re-importing the pool did not work via the GUI, but zpool import SSD-jails worked without delay.
All jails disappeared in the GUI -> rebooted to see if they come back ->
no: zfs get mounted and mountpoint say everything is mounted in the right place, but it is not there; I only see the .warden* datasets and none of the others.
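This is more or less how I checked; the zfs mount -a at the end is just the next thing I would try.
Code:
~# zfs get -r -o name,property,value mounted,mountpoint SSD-jails   # claims everything is mounted
~# ls /mnt/SSD-jails                                                # but only the .warden* datasets show up
~# zfs mount -a                                                     # worth a try before the next reboot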

any ideas?

edit2:
I rebooted again, and now all datasets are correctly mounted and all jails have started.
zdb -U /data/zfs/zpool.cache -b SSD-jails still throws out a lot of leaked space, but I guess that's because the pool is not exported.

Should I be concerned, or is everything normal?
 

lopr
No, I still get the "leaked space" messages when running zdb -U /data/zfs/zpool.cache -b SSD-jails on the imported pool, and a locked-up zdb after it finishes (without errors) on the exported pool. However, all jails and processes run normally without any obvious data corruption.

And I still don't understand why the ALLOC value of zpool list SSD-jails does not match any value in the DDT histogram.
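In case someone wants to line the numbers up exactly rather than from the rounded columns, this is what I compare (assuming I am reading the right properties):
Code:
~# zpool get -p size,allocated,free,dedupratio SSD-jails   # -p should give exact bytes instead of 236G / 98.6G
~# zpool status -D SSD-jails                               # the DDT histogram with its Total line, quoted above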

At least I double-checked my backup routines.
I will revisit this issue when I upgrade from 9.10.2 to 11.1.
 