L2ARC, Shared NVMe, etc

Joined Dec 29, 2014 · Messages 1,135
After some research reading older posts in the forums, it seems obvious that using all of my Optane 900P 280G for an SLOG is total overkill. I have posted some of my basic results on the speed of the Optane here https://forums.freenas.org/index.php?posts/485730/ and my initial steps for partitioning the Optane here https://forums.freenas.org/index.php?posts/484003/. I have redone the split of my Optane to use some of it for an L2ARC. No particular reason, really, just because I can.

Code:
gpart destroy -F nvd0 # just making sure

gpart create -s GPT nvd0
gpart add -t freebsd-zfs -a 1m -l sloga -s 16G nvd0
gpart add -t freebsd-zfs -a 1m -l slogb -s 16G nvd0
gpart add -t freebsd-zfs -a 1m -l slogc -s 16G nvd0
gpart add -t freebsd-zfs -a 1m -l l2arca nvd0 # takes the rest

zpool add RAIDZ2-I log nvd0p1
zpool add RAIDZ2-I cache nvd0p4
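# A couple of quick sanity checks afterwards (my additions -- plain gpart/zpool, nothing FreeNAS-specific):
gpart show -l nvd0       # the four labels should show up: sloga, slogb, slogc, l2arca
zpool status RAIDZ2-I    # the new log and cache vdevs should be listed at the bottom of the pool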


I was trying to think of what I could do to see if it was really using that L2ARC. My zpool iostat -v would seem to indicate so.

Code:
root@freenas2:/mnt/RAIDZ2-I/VMWare # zpool iostat -v RAIDZ2-I
                                                  capacity     operations    bandwidth
pool                                            alloc   free   read  write   read  write
----------------------------------------------  -----  -----  -----  -----  -----  -----
RAIDZ2-I                                        5.75T  8.75T     43     53  4.21M  1.11M
  raidz2                                        5.52T  1.73T     41     18  4.07M   393K
    gptid/bd041ac6-9e63-11e7-a091-e4c722848f30      -      -      2      5   824K  84.5K
    gptid/bdef2899-9e63-11e7-a091-e4c722848f30      -      -      4      6   821K  84.5K
    gptid/bed51d90-9e63-11e7-a091-e4c722848f30      -      -      3      6   821K  84.5K
    gptid/bfb76075-9e63-11e7-a091-e4c722848f30      -      -      4      6   821K  84.5K
    gptid/c09c704a-9e63-11e7-a091-e4c722848f30      -      -      3      5   822K  84.5K
    gptid/c1922b7c-9e63-11e7-a091-e4c722848f30      -      -      3      6   821K  84.5K
    gptid/c276eb75-9e63-11e7-a091-e4c722848f30      -      -      3      6   821K  84.5K
    gptid/c3724eeb-9e63-11e7-a091-e4c722848f30      -      -      4      5   820K  84.5K
  raidz2                                         233G  7.02T      1     34   144K   728K
    gptid/a1b7ef4b-3c2a-11e8-978a-e4c722848f30      -      -      0      7  24.5K   155K
    gptid/a2eb419f-3c2a-11e8-978a-e4c722848f30      -      -      0      7  24.4K   155K
    gptid/a41758d7-3c2a-11e8-978a-e4c722848f30      -      -      0      7  24.5K   155K
    gptid/a5444dfb-3c2a-11e8-978a-e4c722848f30      -      -      0      7  24.5K   155K
    gptid/a6dcd16f-3c2a-11e8-978a-e4c722848f30      -      -      0      7  24.3K   155K
    gptid/a80cd73c-3c2a-11e8-978a-e4c722848f30      -      -      0      7  24.4K   155K
    gptid/a94711a5-3c2a-11e8-978a-e4c722848f30      -      -      0      7  24.5K   155K
    gptid/aaa6631d-3c2a-11e8-978a-e4c722848f30      -      -      0      7  24.5K   155K
logs                                                -      -      -      -      -      -
  nvd0p1                                        8.11M  15.9G      0     89      8  6.27M
cache                                               -      -      -      -      -      -
  nvd0p4                                        9.86G   203G      0      5    514  3.70M
----------------------------------------------  -----  -----  -----  -----  -----  -----


I can't tell if the ZFS reporting stats are showing that or not.

[Screenshot: ZFS reporting graphs from the FreeNAS GUI]


What I was trying to do is test from a FreeBSD VM running on one of my ESXi hosts. ESXi mounts the datastore on FreeNAS via NFS over a 10Gb link (details are all in my sig).

Code:
# dd if=/dev/random of=/tmp/write.foo bs=1m count=16384
16384+0 records in
16384+0 records out
17179869184 bytes transferred in 293.751313 secs (58484400 bytes/sec)
# dd of=/dev/null if=/tmp/write.foo bs=1m
16384+0 records in
16384+0 records out
17179869184 bytes transferred in 47.265901 secs (363472795 bytes/sec)
# dd of=/dev/null if=/tmp/write.foo bs=1m
16384+0 records in
16384+0 records out
17179869184 bytes transferred in 49.936403 secs (344034975 bytes/sec)
# dd of=/dev/null if=/tmp/write.foo bs=1m
16384+0 records in
16384+0 records out
17179869184 bytes transferred in 47.397839 secs (362461022 bytes/sec)


I guess my question is: does that indicate that I am using some of the L2ARC, as I suspect? Since my test file is 16G, it could certainly all fit into the regular ARC. I can't see how I would be hurting anything, but is there a better way to see if this is helping?
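I'm guessing the most direct place to look is the raw ARC counters. These are the standard FreeBSD ZFS sysctls (assuming FreeNAS exposes them unchanged):

Code:
sysctl kstat.zfs.misc.arcstats.l2_size       # bytes currently held in the L2ARC
sysctl kstat.zfs.misc.arcstats.l2_hdr_size   # ARC memory consumed by L2ARC headers
sysctl kstat.zfs.misc.arcstats.l2_hits kstat.zfs.misc.arcstats.l2_misses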

Edit: I forgot to post the zpool status.
Code:
root@freenas2:/mnt/RAIDZ2-I/VMWare # zpool status -v RAIDZ2-I
  pool: RAIDZ2-I
 state: ONLINE
  scan: scrub repaired 0 in 0 days 02:47:24 with 0 errors on Wed Oct  3 23:54:43 2018
config:

        NAME                                            STATE     READ WRITE CKSUM
        RAIDZ2-I                                        ONLINE       0     0     0
          raidz2-0                                      ONLINE       0     0     0
            gptid/bd041ac6-9e63-11e7-a091-e4c722848f30  ONLINE       0     0     0
            gptid/bdef2899-9e63-11e7-a091-e4c722848f30  ONLINE       0     0     0
            gptid/bed51d90-9e63-11e7-a091-e4c722848f30  ONLINE       0     0     0
            gptid/bfb76075-9e63-11e7-a091-e4c722848f30  ONLINE       0     0     0
            gptid/c09c704a-9e63-11e7-a091-e4c722848f30  ONLINE       0     0     0
            gptid/c1922b7c-9e63-11e7-a091-e4c722848f30  ONLINE       0     0     0
            gptid/c276eb75-9e63-11e7-a091-e4c722848f30  ONLINE       0     0     0
            gptid/c3724eeb-9e63-11e7-a091-e4c722848f30  ONLINE       0     0     0
          raidz2-1                                      ONLINE       0     0     0
            gptid/a1b7ef4b-3c2a-11e8-978a-e4c722848f30  ONLINE       0     0     0
            gptid/a2eb419f-3c2a-11e8-978a-e4c722848f30  ONLINE       0     0     0
            gptid/a41758d7-3c2a-11e8-978a-e4c722848f30  ONLINE       0     0     0
            gptid/a5444dfb-3c2a-11e8-978a-e4c722848f30  ONLINE       0     0     0
            gptid/a6dcd16f-3c2a-11e8-978a-e4c722848f30  ONLINE       0     0     0
            gptid/a80cd73c-3c2a-11e8-978a-e4c722848f30  ONLINE       0     0     0
            gptid/a94711a5-3c2a-11e8-978a-e4c722848f30  ONLINE       0     0     0
            gptid/aaa6631d-3c2a-11e8-978a-e4c722848f30  ONLINE       0     0     0
        logs
          nvd0p1                                        ONLINE       0     0     0
        cache
          nvd0p4                                        ONLINE       0     0     0
        spares
          gptid/4abff125-23a2-11e8-a466-e4c722848f30    AVAIL

errors: No known data errors
 

kdragon75 · Wizard · Joined Aug 7, 2016 · Messages 2,457
What's your ZFS record size on your NFS dataset?
 

kdragon75 · Wizard · Joined Aug 7, 2016 · Messages 2,457
gpart create -s GPT nvd0
gpart add -t freebsd-zfs -a 1m -l sloga -s 16G nvd0
gpart add -t freebsd-zfs -a 1m -l slogb -s 16G nvd0
gpart add -t freebsd-zfs -a 1m -l slogc -s 16G nvd0
gpart add -t freebsd-zfs -a 1m -l l2arca nvd0 # takes the rest
I also see you're not saving any space for wear leveling. I guess if it's not a large environment it's not as big of a concern, but it's still nice to have.
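For example (just a sketch, assuming the 280GB 900P and that you size the last partition explicitly instead of letting it take the remainder), something like this would leave roughly 20% of the device unpartitioned:

Code:
gpart add -t freebsd-zfs -a 1m -l l2arca -s 160G nvd0   # ~48G of SLOG partitions + 160G L2ARC is about 80% of a 280GB device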
 
Joined Dec 29, 2014 · Messages 1,135
What's your ZFS record size on your NFS dataset?

The dataset is set to inherit, and I don't see any explicit setting at the dataset or pool level. What is the default? Whatever that is, that's where I am.
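For anyone wanting to check from the CLI, zfs get shows where the value comes from (RAIDZ2-I/VMWare is the dataset from my prompt above; SOURCE will read default, inherited, or local):

Code:
zfs get recordsize RAIDZ2-I/VMWare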

I also see you're not saving any space for wear leveling. I guess if it's not a large environment it's not as big of a concern, but it's still nice to have.

No, definitely not a large environment. I did think about that, but I haven't done anything about it.
 

kdragon75 · Wizard · Joined Aug 7, 2016 · Messages 2,457
The dataset is set to inherit, and I don't see any explicit setting at the dataset or pool level. What is the default? Whatever that is, that's where I am.
The default is, from what I understand, dynamic-ish and relatively small. This can make the L2ARC extremely inefficient, as each record/block in the L2ARC has a pointer stored in the ARC. The smaller the record, the more pointers per GB of L2ARC. I understand each pointer to be about 70 bytes. From that we can calculate the maximum ARC used given an L2ARC size and record/block size. Or, inversely, given a working set size, we can calculate how much L2ARC we need to supplement the ARC. It's all simple math.
 
Joined Dec 29, 2014 · Messages 1,135
The default is, from what I understand, dynamic-ish and relatively small. This can make the L2ARC extremely inefficient, as each record/block in the L2ARC has a pointer stored in the ARC. The smaller the record, the more pointers per GB of L2ARC. I understand each pointer to be about 70 bytes. From that we can calculate the maximum ARC used given an L2ARC size and record/block size. Or, inversely, given a working set size, we can calculate how much L2ARC we need to supplement the ARC. It's all simple math.

I am certainly open to changing things. I have a whole bunch of lab VMs, but they aren't on all the time. The only things that run all the time are my FreeBSD mail server, vCenter appliance, and PowerChute appliance. I might have a Cisco wireless LAN controller going soon if I get off my duff. Do you have suggestions for what size to use, or how to calculate an optimal value? Here are the 3 machines in question.

[Screenshots: resource usage for the three VMs mentioned above (mail server, vCenter, PowerChute)]
 

kdragon75 · Wizard · Joined Aug 7, 2016 · Messages 2,457
The issue is more the record size. If we assume an 8K record size and a 100G L2ARC, 100,000,000K / 8K = 12,500,000 pointers x 70B = 875,000,000B = 875M. But if we change that to 16K blocks, it gets cut in half to 437.5M, or at the recommended 64K for VM storage (at least for iSCSI), 109M. Now, estimating the L2ARC size for a desired cache amount is a bit trickier, as the L2ARC takes some of the ARC. I'm not that good at algebra, but if you are, I'm sure it's not too tricky...
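To save the mental math, here is that calculation as a rough shell helper (Bourne-style shell; the ~70 bytes per header is an approximation and varies a bit between ZFS versions):

Code:
# estimate the ARC memory consumed by L2ARC headers, assuming ~70 B per cached record
# usage: l2arc_overhead <l2arc_size_in_GB> <record_size_in_KB>
l2arc_overhead() {
	awk -v gb="$1" -v kb="$2" 'BEGIN {
		recs = (gb * 1000 * 1000) / kb
		printf "%.0f records -> ~%.0f MB of ARC for headers\n", recs, recs * 70 / 1000 / 1000
	}'
}
l2arc_overhead 100 8     # ~875 MB
l2arc_overhead 100 64    # ~109 MB


And for what it's worth, changing the record size (e.g. zfs set recordsize=64K on the dataset) only affects newly written blocks; existing data keeps its old block size until it gets rewritten.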
 

Stux · MVP · Joined Jun 2, 2016 · Messages 4,419
It takes a while for a non-busy server to begin to populate the L2ARC. I'm not certain, but I believe it functions as an eviction cache from the ARC, and I believe there is another process which proactively populates it from the ARC with a write throttle.
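For reference, I believe the throttle is controlled by these tunables (standard FreeBSD ZFS sysctls; the defaults quoted are from memory, so double-check on your own box):

Code:
sysctl vfs.zfs.l2arc_write_max     # max bytes fed into the L2ARC per interval (8MB by default, if I remember right)
sysctl vfs.zfs.l2arc_write_boost   # extra allowance while the ARC is still warming up
sysctl vfs.zfs.l2arc_feed_secs     # seconds between feed passes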

My experience is that if your SLOG is not running at high utilization, and you have the RAM, then there is no reason not to use some of your fast NVMe for L2ARC.

Wear leveling is apparently not an issue with Optane.

Certainly seems to me that you have 11-ish GB of L2ARC in use. You'd expect your L2ARC hit ratio to be lower than your ARC hit ratio.
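If you want hard numbers rather than the GUI graphs, a rough sketch using the stock arcstats counters (assuming the standard FreeBSD kstat names) would be:

Code:
# compare ARC vs. L2ARC hit ratios from the standard kstat counters
sysctl -n kstat.zfs.misc.arcstats.hits kstat.zfs.misc.arcstats.misses \
	kstat.zfs.misc.arcstats.l2_hits kstat.zfs.misc.arcstats.l2_misses | \
awk 'NR==1{h=$1} NR==2{m=$1} NR==3{lh=$1} NR==4{lm=$1} END {
	if (h+m > 0)   printf "ARC hit ratio:   %.1f%%\n", 100*h/(h+m)
	if (lh+lm > 0) printf "L2ARC hit ratio: %.1f%%\n", 100*lh/(lh+lm)
}'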
 

Joined Dec 29, 2014 · Messages 1,135
Wear leveling is apparently not an issue with Optane.

That is nice to know. I was thinking I might have to cycle between the partitions every now and again. I prefer to be a lazy sysadmin if I can. ;-)
 


wreedps · Patron · Joined Jul 22, 2015 · Messages 225
Cool!
 