Is a ZIL recommended for this configuration?


vicmarto

Explorer
Joined
Jan 25, 2018
Messages
61
Hello.

My FreeNAS server (v11.1, with 6 consumer-grade 4TB HDDs and 16GB of ECC RAM, the maximum supported by the motherboard) is configured as:


Code:
  NAME		  STATE	 READ WRITE CKSUM
	zpool		 ONLINE	   0	 0	 0
	  mirror-0	ONLINE	   0	 0	 0
		ada0	  ONLINE	   0	 0	 0
		ada1	  ONLINE	   0	 0	 0
	  mirror-1	ONLINE	   0	 0	 0
		ada2	  ONLINE	   0	 0	 0
		ada3	  ONLINE	   0	 0	 0
	  mirror-2	ONLINE	   0	 0	 0
		ada4	  ONLINE	   0	 0	 0
		ada5	  ONLINE	   0	 0	 0


This configuration was chosen for maximum performance when working with files over NFS (the primary use of this server). First question: is this the recommended configuration for this use?


The client is a single computer (macOS) connected over a 1Gbps Ethernet link. Of course, the connection is saturated when copying between the server and the local SSD. But...

1) When duplicating big files (30GB+) over NFS, I'm getting only around 55 MB/s

2) When duplicating folders with a lot of small files (28,000+) over NFS, I'm getting only around 18 MB/s


So, I'm thinking of using the (only) PCIe slot in the server to connect a pair of M.2 SSDs (via a PCIe adapter) and configure them as a (mirrored) ZIL. Do you think this is a good idea? Will I see an improvement?
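
For reference, what I have in mind is roughly this (just a sketch; the device names below are placeholders for whatever the M.2 drives show up as, and the pool name matches mine):

Code:
# Attach the two SSDs to the existing pool as a mirrored log device.
# nvd0 / nvd1 are placeholder device names -- check "nvmecontrol devlist" first.
zpool add zpool log mirror nvd0 nvd1

# Afterwards, "zpool status" should show the mirror under a separate "logs" section.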

If yes, which M.2 SSD disks are recommended?

1) The SLC ones? Maybe something like this?: https://www.mouser.com/ProductDetail/Swissbit/SFSA008GM1AA1TO-I-DB-216-STD?qs=sGAEpiMZZMsSk9nu1rLV0I7HeT0NGf3f6Y/MYJn0q6sAXL1HM9Olng== 2 Million Hours Reliability (MTBF)
I can't find many more SLC drives in the M.2 form factor. Is this type of SSD almost dead?


2) Or maybe a fancy new 3D MLC one, like the Samsung or the Intel ones?:
https://www.samsung.com/us/computin...ate-drives/ssd-850-evo-m-2-250gb-mz-n5e250bw/ 1.5 Million Hours Reliability (MTBF)

https://www.intel.com/content/www/u...series/dc-s3520-150gb-m-2-80mm-6gbps-3d1.html 2 Million Hours Reliability (MTBF)
https://www.intel.com/content/www/u...ries/pro-7600p-series-128gb-m-2-80mm-3d2.html 1.6 Million Hours Reliability (MTBF)



--
SMB (Samba) was rejected in favor of NFS for performance reasons
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Terminology is important. You always have a ZIL. You can offload the ZIL to a dedicated device, in which case you have an SLOG (separate log).

https://www.mouser.com/ProductDetai...Sk9nu1rLV0I7HeT0NGf3f6Y/MYJn0q6sAXL1HM9Olng== 2 Million Hours Reliability (MTBF)
I can't find many more SLC drives in the M.2 form factor. Is this type of SSD almost dead?
That crummy little thing is going to:
  • Slow you down
  • Not actually protect your data when needed
Or maybe a fancy new 3D MLC one, like the Samsung or the Intel ones?:
https://www.samsung.com/us/computin...ate-drives/ssd-850-evo-m-2-250gb-mz-n5e250bw/ 1.5 Million Hours Reliability (MTBF)
No, definitely not a consumer SSD.

Do yourself a favor and read the hardware recommendations guide, where this matter is discussed.

SMB (Samba) was rejected in favor of NFS for performance reasons
Performance? No, that doesn't make any sense. Any halfway-decent server can saturate GbE with SMB. No sync writes and thus no SLOG required.
 

vicmarto

Explorer
Joined
Jan 25, 2018
Messages
61
Terminology is important. You always have a ZIL. You can offload the ZIL to a dedicated device, in which case you have an SLOG (separate log).


Thank you very much for your answer, Ericloewe.

That crummy little thing is going to:

  • Slow you down
  • Not actually protect your data when needed

Ok, understood, now it's clearer to me what I need.



No, definitely not a consumer SSD.


Do yourself a favor and read the hardware recommendations guide, where this matter is discussed.


Ok, I see now. The SSD recommended in the guide is the P3700. Its most important technical specifications are:
  • Capacity: 400 GB ----> Much much more than I need for a SLOG
  • Endurance Rating (Lifetime Writes): 7.3 PBW (JEDEC Workload), 10 DWPD ----> Is that 7300 TBW? WOW, just WOW!
  • Mean Time Between Failures (MTBF): 2,000,000 Hrs
  • Uncorrectable Bit Error Rate (UBER): less than 1 sector per 1e17 bits read
  • Price: About $500? ----> I'm really surprised by its technical specifications, but, sorry, I simply can't justify this price for a home NAS
And features:
  • Enhanced Power Loss Data Protection: Yes
  • High Endurance Technology (HET): Yes
  • End-to-End Data Protection: Yes



Because of the price, I'm looking at the Intel S3520:
  • Capacity: 150 GB
  • Endurance Rating (Lifetime Writes): 412 TBW ----> Instead of "7.3 PBW (JEDEC Workload), 10 DWPD". Is that 7.3 PBW equivalent to 7300 TBW? (see the quick calculation after the feature list below)
  • Mean Time Between Failures (MTBF): 2,000,000 Hrs ----> Same
  • Uncorrectable Bit Error Rate (UBER): 1 sector per 10^17 bits read
  • Price: About $115
And features:
  • Enhanced Power Loss Data Protection: Yes ----> Same
  • High Endurance Technology (HET): No ----> Another difference (does it really matter? I honestly don't know)
  • End-to-End Data Protection: Yes ----> Same
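
As a quick sanity check on those endurance figures (just arithmetic, and assuming the usual 5-year warranty period behind the DWPD rating):

Code:
# 10 DWPD (drive writes per day) on a 400 GB drive, over an assumed 5-year warranty:
# 10 x 400 GB x 365 days x 5 years = 7,300,000 GB = 7,300 TB = 7.3 PB
# So yes, 7.3 PBW is the same as 7300 TBW -- versus 412 TBW for the S3520.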

So... Is this SSD a viable alternative to the P3700 for a home NAS?
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Here's probably a better idea: Use SMB and you won't have any sync writes to worry about.

In a more academic discussion, the absolute best SSD for SLOG out there is the P4800X. The P3700 is one category down, but still crazy fast. Then you have Intel's other NVMe SSDs with power loss protection. The next step down is an Intel S3710, which is SATA and thus handicapped in comparison to the NVMe drives, but it's the best SATA SLOG device out there.
 

vicmarto

Explorer
Joined
Jan 25, 2018
Messages
61
Thanks Ericloewe for this detailed answer. Now I know where to begin searching.

And you are right that using SMB instead of NFS would be the best solution, I'm aware of that. The problem, of course, is macOS (the only client of the ZFS server) and its disastrous SMB support...
 

KrisBee

Wizard
Joined
Mar 20, 2017
Messages
1,288
@vicmarto You might be interested in this test:

https://forums.freenas.org/index.php?threads/testing-the-benefits-of-slog-using-a-ram-disk.56561/

Although written with virtual machines in mind, it equally applies to NFS file transfers that use sync writes. No need to alter vfs.zfs.dirty_data_sync, and of course take care with the zpool add and remove commands.
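
For reference, the test in that thread boils down to something like this (a rough sketch only; the size and device number are placeholders, and a RAM disk is strictly for testing, never for a production SLOG):

Code:
# Create a RAM-backed memory disk on the FreeNAS box (size is a placeholder):
mdconfig -a -t swap -s 8g -u 1        # creates /dev/md1

# Temporarily attach it to the pool as a log device:
zpool add zpool log md1

# ... run the NFS / sync-write tests ...

# Remove it again when finished:
zpool remove zpool md1
mdconfig -d -u 1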

This data's a bit old, but still relevant:

https://b3n.org/ssd-zfs-zil-slog-be...500-seagate-600-pro-crucial-mx100-comparison/

and a couple more references:

https://www.servethehome.com/buyers...as-servers/top-picks-freenas-zil-slog-drives/
https://www.servethehome.com/what-is-the-zfs-zil-slog-and-what-makes-a-good-one/
 

vicmarto

Explorer
Joined
Jan 25, 2018
Messages
61
Thank you very much KrisBee, this has been very useful information. I have followed the guide from Stux, and it seems that, with my current configuration, a SLOG is not really useful:

Code:
sync=disabled ----------> 49 MB/s
sync=standard ----------> 47 MB/s
sync=always ------------> 2 MB/s


And with a ramdisk SLOG:

Code:
sync=disabled ----------> 48 MB/s
sync=standard ----------> 51 MB/s
sync=always ------------> 46 MB/s


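(In case it helps anyone else: I switched between the three cases above with the dataset sync property, more or less like this; the dataset name is just my test dataset.)

Code:
# Toggle the ZFS sync property on the shared dataset between runs:
zfs set sync=disabled zpool/tmp
zfs set sync=standard zpool/tmp
zfs set sync=always zpool/tmp

# And check the current value:
zfs get sync zpool/tmp
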
BUT today, while searching about RAIDZ, MIRROR and their differences in performance... This is my current configuration:

Code:
NAME		  STATE	 READ WRITE CKSUM
   zpool		 ONLINE	   0	 0	 0
	 mirror-0	ONLINE	   0	 0	 0
	   ada0	  ONLINE	   0	 0	 0
	   ada1	  ONLINE	   0	 0	 0
	 mirror-1	ONLINE	   0	 0	 0
	   ada2	  ONLINE	   0	 0	 0
	   ada3	  ONLINE	   0	 0	 0
	 mirror-2	ONLINE	   0	 0	 0
	   ada4	  ONLINE	   0	 0	 0
	   ada5	  ONLINE	   0	 0	 0


Maybe now I'm understanding? It seems this 47 MB/s write speed is normal in a MIRROR configuration... because the write speed of the entire zpool is the same as an individual HDD!!!! Am I correct?

Maybe with a double RAIDZ1 config I would get much better write performance, and slightly worse read performance??
Something like this:

Code:
NAME		  STATE	 READ WRITE CKSUM
   zpool		 ONLINE	   0	 0	 0
	 raidz1-0	ONLINE	   0	 0	 0
	   ada0	  ONLINE	   0	 0	 0
	   ada1	  ONLINE	   0	 0	 0
	   ada2	  ONLINE	   0	 0	 0
	 raidz1-1	ONLINE	   0	 0	 0
	   ada3	  ONLINE	   0	 0	 0
	   ada4	  ONLINE	   0	 0	 0
	   ada5	  ONLINE	   0	 0	 0


Am I correct?
 

Zredwire

Explorer
Joined
Nov 7, 2017
Messages
85
You will get worse write performance with RaidZ1 or RaidZ2 than with mirrors. This is because of the overhead of writing parity information. Read rates will be close to the same for Mirror, RaidZ1 and RaidZ2 (this all assumes an equal number of drives for each configuration). It would help for you to post your hardware specs, as something looks wrong with your numbers. As a side note, you definitely do not want to use RAIDZ1 with 4TB drives.
 

KrisBee

Wizard
Joined
Mar 20, 2017
Messages
1,288
Transfer speeds do look a little low even for NFS, and I assume you quoted averages. You have a gigabit connection, but have you tested the network speed between server and client with iperf?

Have you set up the FreeNAS server as NFSv4 with sys security, or something else? Are you able to check the mount params on the macOS client? Did you add your client to the hostname list under the network global config tab?
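
Something along these lines would do (iperf3 shown as an example; it isn't bundled with macOS, so it would need to be installed there, e.g. via Homebrew, and if only the older iperf is available on FreeNAS the -s / -c usage is the same; the server IP is just taken from your posts):

Code:
# On the FreeNAS box (server side):
iperf3 -s

# On the mac (client side), pointing at the server's IP:
iperf3 -c 192.168.1.250 -t 30

# A healthy gigabit link should report roughly 930-940 Mbit/s.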
 

Mirfster

Doesn't know what he's talking about
Joined
Oct 2, 2015
Messages
3,215
Maybe now I'm understanding? It seems this 47 MB/s write speed is normal in a MIRROR configuration... because the write speed of the entire zpool is the same as an individual HDD!!!! Am I correct?

Sorry, that is incorrect; at least in regards to the Volume/Pool itself. What you are thinking of is along the lines of a single Volume/Pool consisting of a single VDev...

For example, a VDev that is RaidZ2 with 6 disks will basically function at the speed of one drive.

However, if I have a Volume/Pool that consists of 2 RaidZ2 VDevs, then it will roughly function at the speed of 2 drives (one per VDev).

So with mirroring, you should definitely see higher speeds since you have 3 VDevs in your Volume/Pool. I would suspect that there is something else bottle-necking.
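
One quick way to rule the pool itself in or out is a local write test on the FreeNAS box, bypassing NFS and the network entirely (a rough sketch; the path and size are placeholders, and note that zeros compress away to almost nothing if compression is enabled on the dataset, so test on a dataset with compression off or use random data):

Code:
# Write ~8 GiB straight to the pool and watch the reported throughput:
dd if=/dev/zero of=/mnt/zpool/ddtest bs=1M count=8192

# Clean up afterwards:
rm /mnt/zpool/ddtest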

FWIW, you may want to take some time to look at the links in my sig which help explain things better.
 

vicmarto

Explorer
Joined
Jan 25, 2018
Messages
61
I see now, you are right Mirfster, thanks very much to all.

It seems something is not right in my macOS. I'm now investigating what it can be.

What I don't understand now is: if I mount the NFS share with the async parameter, why is the transfer speed different with sync=disabled and sync=standard?

My NFS mount parameters:
async,nodev,nosuid,automounted,noatime,vers=3,tcp,port=2049,nomntudp,hard,intr,noresvport,negnamecache,callumnt,locallocks,quota,rsize=65536,wsize=65536,readahead=128,dsize=65536,rdirplus,nodumbtimr,timeo=10,maxgroups=16,acregmin=5,acregmax=60,acdirmin=5,acdirmax=60,nomutejukebox,nonfc,sec=sys

My transfer speeds:
  • Duplicating a 28GB single file with sync=standard: around 30MB/s
  • Duplicating the same 28GB single file with sync=disabled: around 65MB/s
  • Duplicating the same 28GB single file with sync=always: around 2MB/s
Why?! The mount parameters are async!!

Tomorrow I will try to repeat the test with SMB. Is there some other test I can do to find where the problem is? (sometimes macOS can be really weird...)
 

KrisBee

Wizard
Joined
Mar 20, 2017
Messages
1,288
Your mac client is making a version 3 connection. You've not said how your FreeNAS NFS server and exports are configured. How do you know the FreeNAS NFS server is honouring the client's async param? Have you compared transfer speeds when the client mounts async versus sync?
 

KrisBee

Wizard
Joined
Mar 20, 2017
Messages
1,288
@vicmarto You didn't return to this thread, so I wondered if you answered your own question:

What I don't understand now is: if I mount the NFS share with the async parameter, why is the transfer speed different with sync=disabled and sync=standard?

Did you assume that an async NFS client mount meant the NFS server was operating in async mode? Using zilstat at the CLI would show if & when your zil/SLOG was in use.
 

vicmarto

Explorer
Joined
Jan 25, 2018
Messages
61
Thanks KrisBee for your patience, zilstat has been very helpful, and please excuse the delay in replying.

I have been doing a lot of testing with my Mac (latest OS version) and the FreeNAS server (latest OS version) to debug this issue. I have discovered some things, like:
  • When I reboot the Mac, the share is mounted as async:
Code:
$ nfsstat -m
/zpool/tmp/Users/vicmarto from 192.168.1.250:/mnt/zpool/tmp/Users/vicmarto

  -- Original mount options:

	General mount flags: 0x10500058 async,nodev,nosuid,automounted,noatime,nobrowse

	NFS parameters: intr,locallocks,rsize=65536,wsize=65536,readahead=128,dsize=65536,rdirplus

	File system locations:

	  /mnt/zpool/tmp/Users/vicmarto @ 192.168.1.250 (192.168.1.250)

  -- Current mount parameters:

	General mount flags: 0x14500058 async,nodev,nosuid,automounted,noatime,nobrowse multilabel

	NFS parameters: vers=3,tcp,port=2049,nomntudp,hard,intr,noresvport,negnamecache,callumnt,locallocks,quota,rsize=65536,wsize=65536,readahead=128,dsize=65536,rdirplus,nodumbtimr,timeo=10,maxgroups=16,acregmin=5,acregmax=60,acdirmin=5,acdirmax=60,nomutejukebox,nonfc,sec=sys

	File system locations:

	  /mnt/zpool/tmp/Users/vicmarto @ 192.168.1.250 (192.168.1.250)

	Status flags: 0x0



And zilstat confirms the server is honoring it:
Code:
# zilstat
   N-Bytes  N-Bytes/s N-Max-Rate	B-Bytes  B-Bytes/s B-Max-Rate	ops  <=4kB 4-32kB >=32kB
		 0		  0		  0		  0		  0		  0	  0	  0	  0	  0
		 0		  0		  0		  0		  0		  0	  0	  0	  0	  0
		 0		  0		  0		  0		  0		  0	  0	  0	  0	  0
		 0		  0		  0		  0		  0		  0	  0	  0	  0	  0
		 0		  0		  0		  0		  0		  0	  0	  0	  0	  0
		 0		  0		  0		  0		  0		  0	  0	  0	  0	  0
		 0		  0		  0		  0		  0		  0	  0	  0	  0	  0
		 0		  0		  0		  0		  0		  0	  0	  0	  0	  0
		 0		  0		  0		  0		  0		  0	  0	  0	  0	  0
		 0		  0		  0		  0		  0		  0	  0	  0	  0	  0
		 0		  0		  0		  0		  0		  0	  0	  0	  0	  0
		 0		  0		  0		  0		  0		  0	  0	  0	  0	  0
		 0		  0		  0		  0		  0		  0	  0	  0	  0	  0
		 0		  0		  0		  0		  0		  0	  0	  0	  0	  0



  • BUT, after some use, the mount changes by itself to sync!! I don't know when or how!!!
Code:
$ nfsstat -m
/zpool/tmp/Users/vicmarto from 192.168.1.250:/mnt/zpool/tmp/Users/vicmarto

  -- Original mount options:

	General mount flags: 0x10500058 async,nodev,nosuid,automounted,noatime,nobrowse

	NFS parameters: intr,locallocks,rsize=65536,wsize=65536,readahead=128,dsize=65536,rdirplus

	File system locations:

	  /mnt/zpool/tmp/Users/vicmarto @ 192.168.1.250 (192.168.1.250)

  -- Current mount parameters:

	General mount flags: 0x14500018 nodev,nosuid,automounted,noatime,nobrowse multilabel

	NFS parameters: vers=3,tcp,port=2049,nomntudp,hard,intr,noresvport,negnamecache,callumnt,locallocks,quota,rsize=65536,wsize=65536,readahead=128,dsize=65536,rdirplus,nodumbtimr,timeo=10,maxgroups=16,acregmin=5,acregmax=60,acdirmin=5,acdirmax=60,nomutejukebox,nonfc,sec=sys

	File system locations:

	  /mnt/zpool/tmp/Users/vicmarto @ 192.168.1.250 (192.168.1.250)

	Status flags: 0x0


And zilstat confirms this:
Code:
# zilstat
   N-Bytes  N-Bytes/s N-Max-Rate	B-Bytes  B-Bytes/s B-Max-Rate	ops  <=4kB 4-32kB >=32kB
  28822304   28822304   28822304   33030144   33030144   33030144	252	  0	  0	252
   6508096	6508096	6508096	7340032	7340032	7340032	 56	  0	  0	 56
   8965760	8965760	8965760   10485760   10485760   10485760	 80	  0	  0	 80
	332440	 332440	 332440	 655360	 655360	 655360	  5	  0	  0	  5
  24637976   24637976   24637976   27918336   27918336   27918336	213	  0	  0	213
  19126832   19126832   19126832   22282240   22282240   22282240	170	  0	  0	170
  17465736   17465736   17465736   19791872   19791872   19791872	151	  0	  0	151
  17187904   17187904   17187904   19529728   19529728   19529728	149	  0	  0	149
   3270472	3270472	3270472	4194304	4194304	4194304	 32	  0	  0	 32
  23315296   23315296   23315296   27262976   27262976   27262976	208	  0	  0	208
  12152952   12152952   12152952   13762560   13762560   13762560	105	  0	  0	105
	863608	 863608	 863608	1179648	1179648	1179648	  9	  0	  0	  9
  19043984   19043984   19043984   22020096   22020096   22020096	168	  0	  0	168
  16934384   16934384   16934384   19136512   19136512   19136512	146	  0	  0	146
  28157608   28157608   28157608   31850496   31850496   31850496	243	  0	  0	243
   1262368	1262368	1262368	1703936	1703936	1703936	 13	  0	  0	 13
  15407552   15407552   15407552   17825792   17825792   17825792	136	  0	  0	136
  19210464   19210464   19210464   22806528   22806528   22806528	174	  0	  0	174
  18661048   18661048   18661048   21102592   21102592   21102592	161	  0	  0	161
   8036160	8036160	8036160	9306112	9306112	9306112	 71	  0	  0	 71
  18747280   18747280   18747280   21889024   21889024   21889024	167	  0	  0	167
  17930800   17930800   17930800   20316160   20316160   20316160	155	  0	  0	155
  14742856   14742856   14742856   16646144   16646144   16646144	127	  0	  0	127
   1528120	1528120	1528120	2228224	2228224	2228224	 17	  0	  0	 17
  16934952   16934952   16934952   19398656   19398656   19398656	148	  0	  0	148
  18595128   18595128   18595128   21233664   21233664   21233664	162	  0	  0	162
  11204352   11204352   11204352   13107200   13107200   13107200	100	  0	  0	100
  24240296   24240296   24240296   27693056   27693056   27693056	212	  0	  0	212
   7570800	7570800	7570800	8650752	8650752	8650752	 66	  0	  0	 66
  19259624   19259624   19259624   22413312   22413312   22413312	171	  0	  0	171
   4284336	4284336	4284336	5505024	5505024	5505024	 42	  0	  0	 42
  17166816   17166816   17166816   19398656   19398656   19398656	148	  0	  0	148
  17731320   17731320   17731320   20054016   20054016   20054016	153	  0	  0	153
  18212944   18212944   18212944   20709376   20709376   20709376	158	  0	  0	158
   7280432	7280432	7280432	8519680	8519680	8519680	 65	  0	  0	 65
   2591256	2591256	2591256	3276800	3276800	3276800	 25	  0	  0	 25
  18500752   18500752   18500752   20951040   20951040   20951040	162	  0	  0	162
  17264840   17264840   17264840   19677184   19677184   19677184	153	  0	  0	153
  11356608   11356608   11356608   13930496   13930496   13930496	107	  0	  0	107
  14477272   14477272   14477272   16384000   16384000   16384000	125	  0	  0	125
   3055872	3055872	3055872	4194304	4194304	4194304	 32	  0	  0	 32
  33555160   33555160   33555160   38039552   38039552   38039552	294	  0	  1	293
	824216	 824216	 824216	1433600	1433600	1433600	 14	  0	  1	 13
  18134824   18134824   18134824   21495808   21495808   21495808	164	  0	  0	164
  17333664   17333664   17333664   19791872   19791872   19791872	151	  0	  0	151
  12620472   12620472   12620472   14680064   14680064   14680064	112	  0	  0	112
  19199144   19199144   19199144   23068672   23068672   23068672	176	  0	  0	176
  12486768   12486768   12486768   14417920   14417920   14417920	110	  0	  0	110
  19540072   19540072   19540072   23724032   23724032   23724032	181	  0	  0	181
  16669672   16669672   16669672   19005440   19005440   19005440	145	  0	  0	145
  13814080   13814080   13814080   15728640   15728640   15728640	120	  0	  0	120
   1203768	1203768	1203768	3014656	3014656	3014656	 23	  0	  0	 23
  35065232   35065232   35065232   39845888   39845888   39845888	304	  0	  0	304
  13018272   13018272   13018272   14811136   14811136   14811136	113	  0	  0	113


  • Now, if I reboot the Mac, the share is mounted once again as async!


The share is configured with default settings, nothing exotic:
(Screenshots attached: 1.jpeg and 2.jpeg, showing the NFS share settings at their defaults.)



Why this strange behavior? I don't know! Do you have any idea why? Or some test I can do? THANKS!
 

KrisBee

Wizard
Joined
Mar 20, 2017
Messages
1,288
Taken at face value, this is odd behaviour, but is it the correct interpretation? What were the exact conditions when the zilstat command showed zero activity for 14 seconds? If no data was being transferred on the NFS share, that’s normal. If data was being transferred, it could be explained if the shared dataset had been left with sync=disabled. Otherwise, I wonder.

I think you need to backtrack to something I already pointed out (your macOS client is making a vers 3 connection) and asked about (what is your NFS server config?).

You’ll notice that for a vers 3 connection the rsize/wsize defaults to 64k (65536). This in itself can lead to reduced file transfer speed. Setting the NFS server to version 4 should raise the rsize/wsize to 128k.

To use NFSV4 check the box in the NFS service settings. On the share advanced settings, I’d check the “All directories” box, and set “maproot user” to “root” and “mapgroup group” to “wheel”, and select “sys” for “security”. Save share settings and restart the nfs server.

For your mac os client to connect to the share you’ll have to mount with a “nfsvers=4” option. As authentication is different to NFSv3, you’ll need to ensure both your mac box and freenas box are in the same domain, e.g. local.com, and the same user/group account exists on both boxes. (see: https://blather.michaelwlucas.com/archives/796 ). For good measure add your mac box details to the FreeNAS box known hosts under “Network ‣ Global Configuration”
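
A mount along these lines should do it (the mount point, rsize/wsize values and export path here are just examples; check the macOS mount_nfs man page for the exact option spellings):

Code:
# On the mac, mount the share as NFSv4 with larger read/write sizes:
sudo mkdir -p /Volumes/zpool-tmp
sudo mount -t nfs -o nfsvers=4,rsize=131072,wsize=131072 192.168.1.250:/mnt/zpool/tmp /Volumes/zpool-tmp

# Or set it once in /etc/nfs.conf so automounts pick it up:
#   nfs.client.mount.options = nfsvers=4,rsize=131072,wsize=131072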

Once you have the client making a NFSv4 connection hopefully you’ll see better transfer speeds.

To SLOG, or not to SLOG, is a separate question. You need to understand that mounting the NFS client async does NOT mean the NFS server operates entirely in async mode, or that no sync writes take place on the ZFS filesystem. By design/default, the NFS server works in sync mode.

Some reading:

https://constantin.glez.de/2010/07/20/solaris-zfs-synchronous-writes-and-zil-explained/
http://dtrace.org/blogs/brendan/2009/06/26/slog-screenshots/

You can track the client/server NFS exchange using tcpdump & wireshark

https://theagileadmin.com/2017/05/26/tcpdump-and-wireshark-on-osx/
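
For example, something like this on the mac, then open the capture in Wireshark (the interface name is a placeholder; check ifconfig for the right one):

Code:
# Capture NFS traffic (port 2049) on the mac's ethernet interface:
sudo tcpdump -i en0 -s 0 -w nfs-capture.pcap port 2049

# Stop with Ctrl-C and open nfs-capture.pcap in Wireshark.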

An example analysis which shows how a simple untar command on a NFS share can lead to many sync writes:

http://www.c0t0d0s0.org/archives/7762-tar-x-and-NFS-or-The-devil-in-the-details.html
 