Testing the benefits of SLOG using a RAM disk!

Stux

Say you were considering getting a stupid fast SLOG... because your VMs are too slow...

Screen Shot 2017-07-29 at 1.39.42 AM.png


You could create a SLOG on a RAM disk. This is an incredibly stupid idea, but it does demonstrate the maximum performance gains to be had from using the absolutely fastest SLOG you could possibly get...

You could do this:

# create a 6GB memory disk at /dev/md1
mdconfig -a -t swap -s 6g -u 1

# add the ramdisk md1 as slog to pool tank
zpool add tank log md1
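If you want to confirm the RAM disk really landed as a log vdev, zpool status will list it under a separate "logs" section:

# confirm md1 shows up under the "logs" section of the vdev tree
zpool status tank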

Screen Shot 2017-07-29 at 1.43.21 AM.png


So, it works quite well for 256MB... what about 16GB?

You might even be interested in using gstat to watch the data pour into your SLOG and then start pouring onto your disks...
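If you'd rather watch from a shell than the reporting graphs, gstat can be filtered down to just the devices you care about. A minimal invocation, assuming the RAM disk is md1 and the pool disks are ada/da devices:

# refresh twice a second, showing only the RAM-disk SLOG and the pool disks
gstat -I 500ms -f 'md1|ada[0-9]|da[0-9]'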

Screen Shot 2017-07-29 at 1.46.04 AM.png


If you were to do that, you might notice that there is a lot of activity on the Disk Write graph... that's the dirty sync data being flushed in 64MB lumps.

You could increase your dirty_data_sync parameter to 640MB, and see what that does...

sysctl vfs.zfs.dirty_data_sync=671088640

Screen Shot 2017-07-29 at 1.49.38 AM.png


Well, isn't that interesting... it doesn't make a difference, so let's put dirty_data_sync back to its default...

sysctl vfs.zfs.dirty_data_sync=67108864
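If you'd rather query the current values than trust the arithmetic, the sysctls can simply be read back:

# current sync-flush threshold and overall dirty data cap, in bytes
sysctl vfs.zfs.dirty_data_sync vfs.zfs.dirty_data_max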

Now that we've finished messing around... let's get rid of the memory disk.

# remove ram slog from pool
zpool remove tank md1

# destroy the RAM disk md1
mdconfig -d -u 1
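And if you're paranoid, you can check that the memory disk is really gone:

# list any remaining memory disks (no output means none are configured)
mdconfig -l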

PLEASE NOTE: USING A NON-BATTERY-BACKED RAM DISK AS A SLOG IS VERY STUPID.
 

Stux

Now let's try with an actual SLOG...

Screen Shot 2017-07-29 at 1.58.20 AM.png


Screen Shot 2017-07-29 at 1.59.26 AM.png

Screen Shot 2017-07-29 at 2.02.40 AM.png


Not bad, I think. And remember, without the SLOG it's 5MB/s.
 
Joined Apr 9, 2015 · Messages 1,258
Very interesting. I am assuming that the VMs were either locally hosted on the same hardware as the FreeNAS box, or connected over 10GbE?

One other quick observation: you state creating an md device at md0 but then mount md1; assuming that's a typo, maybe?
 

Stux

Now, many people suggest simulating a SLOG by simply disabling sync on the dataset...

First, let's remove the SLOG:
1) click Storage
2) click on your pool
3) click on Volume Status at the bottom
4) click on the SLOG volume...
5) click Remove at the bottom...

Screen Shot 2017-07-29 at 2.04.48 AM.png


# now, lets disable sync on the vmware nfs dataset...
zfs set sync=disabled tank/vmware_nfs
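You can verify the property took with zfs get:

# confirm sync is now disabled on the NFS dataset
zfs get sync tank/vmware_nfs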

Screen Shot 2017-07-29 at 2.08.00 AM.png


Not bad... but also twice the speed of the RAM SLOG and 3x the speed of the actual SLOG. So, not actually a realistic comparison. Basically my system only seems capable of writing about 1GB/s, and when sync is enabled, it has to write half a gig/s to the SLOG AND half a gig/s to the pool. Worth remembering.

The other thing to remember is the fuzzy red lines... that's the 64MB of dirty sync data being flushed to disk just as the rest of the data is sprayed onto your pool... even though sync is disabled...

If I lower the size of the benchmark so that no sync is written during the benchmark...

Screen Shot 2017-07-29 at 2.12.40 AM.png


Anyway, sync data always has to be written eventually... and it will get sprayed out to disk in 64MB chunks... so that's fairly efficient, but it adds latency.

# now lets restore sync to the dataset
zfs set sync=standard tank/vmware_nfs

NOTE: running a VMware NFS dataset with sync disabled is stupid, but nowhere near as stupid as using a RAM-based SLOG.

;)
 

Stux

Very interesting. I am assuming that the VMs were either locally hosted on the same hardware as the FreeNAS box, or connected over 10GbE?

Locally hosted, but after some tuning I'm reliably hitting 10Gbps. And there is no hardware offload at the moment. I expect it should be possible to get similar results with a single well-tuned 10GbE connection.

One other quick observation: you state creating an md device at md0 but then mount md1; assuming that's a typo, maybe?

Typo. Thanks. Fixed.
 

Stux

BTW, my tests are showing that NFS works pretty damn well.
 

Arwen

One clarifying detail: ZeusRAM drives ARE suitable as ZFS SLOG devices.

Using internal, non-battery-backed RAM is NOT suitable for a ZFS SLOG.
 

Stux

One clarifying detail: ZeusRAM drives ARE suitable as ZFS SLOG devices.

Using internal, non-battery-backed RAM is NOT suitable for a ZFS SLOG.

Yes.

Does make me wish that little Intel 32GB Optane thing had PLP and performed better.
 
Deleted47050

Just out of curiosity, what is the actual SLOG you used in the second test?


 

Stux

It's actually a Samsung 960 Evo 250GB, which does not have PLP.

Currently investigating PCIe SLOG devices with PLP.
 

Stux

So... somebody bought a stupid fast SLOG...

Intel P3700 400GB. Good for 10 DWPD and can sustain about 1GB/s write, with a crazy 450,000 IOPS and full PLP. I couldn't justify the expense of the P4800X.

Been doing some testing. One of the first things I did was switch it to 4K sectors.

Anyway, making a pool on the device... no SLOG, but sync=standard (i.e. enabled for NFS) provides:

Screen Shot 2017-08-03 at 4.01.53 PM.png


Now, 400MB/s makes sense, since it's essentially writing twice: once for the ZIL and again when it flushes the TXG.
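One way to see that double write is to watch pool-level throughput while the benchmark runs; since the ZIL blocks live in the same pool here, the reported write bandwidth should sit at roughly twice what the client sees (substitute your actual pool name):

# report pool write bandwidth every second
zpool iostat tank 1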

sync=disabled means it's only writing once...

Screen Shot 2017-08-03 at 4.04.11 PM.png


And then you get roughly double the speed... but even though the SSD has PLP, this is not safe, as anything in the open TXG will be lost on a crash. I suspect faster performance would be had by using the P3700 as SLOG, and a bog-standard NVMe as the pool...
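If you wanted to lay that out from the shell rather than the GUI, it would look roughly like this; the device names are assumptions, so check nvmecontrol devlist for yours:

# hypothetical layout: 960 Evo namespace (nvd1) as the data vdev, P3700 (nvd0) as SLOG
zpool create fastpool nvd1 log nvd0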

I think there is actually a ZFS setting to tell it to write straight through...
 
Last edited:

Stux

Now, this is using the entirety of the P3700 as a SLOG for a 40GB pool, which is actually a slice of a Samsung 960 Evo that has been shared into the FreeNAS VM via ESXi.

Screen Shot 2017-08-03 at 5.56.07 PM.png


For those keeping score at home... that's just 10% below my speeds using a RAM disk.

And again with sync=disabled

Screen Shot 2017-08-03 at 5.59.06 PM.png


No real difference from when I was using the P3700 directly as a pool.

I think this shows a fundamental 1GB/s bottleneck on my system. It's 8 vCPUs @ 2.2GHz with only dual-channel memory.

B3n's done some tests, which I think show this too...

https://b3n.org/benchmarking-guest-on-freenas-zfs-bhyve-esxi/
sysbench_memory_mbps.png


Namely, no matter which hypervisor he's using, with either 1 or 2 cores, he still gets just 2-point-something GB/s.

And if there's only 2GB/s to go around, then writing to ARC and then to disk etc, is just going to end up dividing that.

These tests have been on my current 'test' system, in preparation for setting up something similar on my primary system, which has 6 cores/12 threads at 4+GHz and 128GB of quad-channel DDR4-2400. When I commission that I'll be able to dig in further and determine whether the bottleneck is clock-speed or memory-bandwidth related.
 

Stux

So, in order to set the drive to use 4096/4096 byte logical/physical sectors, you can follow this page:

https://www.intel.com/content/www/us/en/support/memory-and-storage/data-center-ssds/000016238.html

Note: the picture is wrong; an actual command to do this is

isdct.exe start -intelssd 0 -nvmeformat LBAFormat=3 SecureEraseSetting=2 ProtectionInformation=0 MetadataSettings=0

with an added bonus SecureErase, which should return the drive to fresh-out-of-box performance.
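You can then confirm the drive took the new LBA format using the same tool (exact property names vary a little between isdct versions):

# dump all drive properties, including the sector size
isdct.exe show -a -intelssd 0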

After doing that, you can quite simply Over Provision the drive using the same tool.

To OP IntelSSD #0 to 20GB (i.e. 380GB locked away):
isdct.exe set -intelssd 0 MaximumLBA=20GB

to 50% (i.e. 100% would be no OP):
isdct.exe set -intelssd 0 MaximumLBA=50%

And to restore to native,
isdct.exe set -intelssd 0 MaximumLBA=native

And then I should just be able to use FreeNAS to add the NVMe to my pool... and it should just magically work as a 20GB over-provisioned SLOG.
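From the shell, that's the same zpool add used for the RAM disk earlier, assuming the P3700 shows up as nvd0 and the pool is tank:

# add the over-provisioned P3700 as the pool's log device
zpool add tank log nvd0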

...

And yes... it appears as exactly a 20.0GB drive.

Screen Shot 2017-08-03 at 6.39.27 PM.png
 

Stux

And with the OP we now have beautifully consistent performance...

Screen Shot 2017-08-03 at 7.01.59 PM.png


Again, very close to the RAM disk's performance.
 

Stux

And now for iSCSI.

P3700, PCI passthrough, over-provisioned to 20GB, as SLOG.
6x IronWolf 8TB SATA, PCI passthrough, RAIDZ2.
VMware ESXi 6.5 AIO config. VMXNET3 for storage.
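For reference, each of the runs below is just the sync property being toggled on the zvol backing the iSCSI extent; the dataset name here is an assumption:

# toggled between runs -- only one setting applies at a time
zfs set sync=always tank/iscsi_vm
zfs set sync=standard tank/iscsi_vm
zfs set sync=disabled tank/iscsi_vm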

sync=always
Screen Shot 2017-08-03 at 8.12.00 PM.png


sync=standard
Screen Shot 2017-08-03 at 8.13.04 PM.png



sync=disabled
Screen Shot 2017-08-03 at 8.14.06 PM.png
 

Stux

I go into some detail on actually using the P3700 as SLOG/swap and L2ARC in my all-in-one build here.
 

Joined Dec 1, 2016 · Messages 12
Interesting thread, thank you for all your information.
Can you tell me what makes spinning disks so slow compared to SSDs in FreeNAS?
I have 6 Seagate 2.4TB SAS 10K drives in striped mirrors, and I only get about 25-30MB/s from the volume over NFS.
Each drive by itself can perform at 200+ MB/s, so what makes them so slow in FreeNAS?
If I put them on a HW RAID controller in RAID 10, I am sure the speeds would be much better.
 

Stux

Interesting thread, thank you for all your information.
Can you tell me what makes spinning disks so slow compared to SSDs in FreeNAS?
I have 6 Seagate 2.4TB SAS 10K drives in striped mirrors, and I only get about 25-30MB/s from the volume over NFS.
Each drive by itself can perform at 200+ MB/s, so what makes them so slow in FreeNAS?
If I put them on a HW RAID controller in RAID 10, I am sure the speeds would be much better.

ESXi forces all NFS writes to be synchronous writes.

I.e., uncached.

HDs are slow when writing synchronously. SSDs, not so much. It's because the HD has to physically rotate and position the head, write the data, then do that again to update the metadata before returning control back to the requester.

With async writes, the writes are buffered and executed when the HD can. Control is returned to the requester as soon as ZFS buffers the write. The issue is that the requester is not sure the write is committed yet.

ESXi insists on knowing when the writes are committed for VM consistency reasons, so it forces sync writes.

If sync writes are slow, use a SLOG; then the sync writes will be written to the SLOG synchronously and to the pool asynchronously.

HW RAID controllers need a BBU in order to safely cache sync writes. The SLOG serves the same purpose for FreeNAS.
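If you want to feel the difference for yourself without touching the VMware datastore, you can do it on a scratch dataset: force every write through the ZIL with sync=always, then compare against sync=disabled. The dataset and file names below are just examples:

# scratch dataset for testing
zfs create tank/synctest

# every write is committed through the ZIL before returning (worst case without a SLOG)
zfs set sync=always tank/synctest
dd if=/dev/zero of=/mnt/tank/synctest/testfile bs=1m count=4096

# same write, fully async (fast, but unsafe for VM data)
zfs set sync=disabled tank/synctest
dd if=/dev/zero of=/mnt/tank/synctest/testfile bs=1m count=4096
# (zeros compress away if compression is on; the point is the sync-vs-async behaviour, not absolute numbers)

# clean up
zfs destroy tank/synctest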
 