Troubles Assigning a ZIL (SLOG) Device to a ZFS Volume


Windows7ge

Contributor
Joined
Sep 26, 2017
Messages
124
I just upgraded the boot drive in my computer, so I had a spare 400GB Intel 750 Series PCIe SSD. My server, which runs 8x WD Gold 7200RPM drives in a raidz2 config, averages around 500~600MB/s, so it's pushing about the max the HBA can process (SATA III). I had heard of people using SSDs as a form of read/write cache. As I learned on my own, under the Volume Manager the cache option is for reads and ZIL is for writes. Some research into ZIL showed that it's not REALLY intended as a way of boosting performance but of preventing the loss of in-flight data as it's being flushed to the array (or something to that degree, please don't quote me on any of this). However, a little more research showed that, depending on how it's configured, it will increase write performance.

So, I did some baseline testing. As I said earlier, around 500~600MB/s. I installed the SSD, went into Volume Manager, and added the SSD to the volume as a ZIL device and... no performance change whatsoever. I took a look at the Reporting tab in FreeNAS under Disk, located the PCIe SSD, and according to the report no data was being written to the SSD at all, while all 8 disks showed a spike in writes at the time of the copy.

A little more research brought me to synchronous vs. asynchronous writes (I looked up the difference between the two but couldn't quite follow it). An article mentioned that by default only synchronous writes go through the ZIL... my understanding starts to fall apart here. I found there's a ZFS property called sync; by default sync=standard. The article explained that setting sync=always forces asynchronous writes to be treated as synchronous and written to the SLOG device, which means writes operate at the speed of the SLOG device. Well, the PCIe SSD is rated for 900MB/s writes. Setting sync=always made transfer speeds go from 500~600MB/s to a steady 300MB/s.

Going back into the Reporting tab under Disk, this time it reported that data WAS being written to the SSD. So it did what it was supposed to do... just at a third of the rated speed.

I know the issue isn't the files or the network card, because when data is already cached in the server's RAM I can saturate the 10Gbit link reading it back.

The server's primary application is backup so writes are more important than reads.

Shell commands I was using were:
zfs get sync [Volume Name]
zfs set sync=[standard | always | disabled] [Volume Name]
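For reference, checking the property looks roughly like this (tank is just a stand-in for the actual volume name):

zfs get sync tank

which prints something along the lines of:

NAME  PROPERTY  VALUE     SOURCE
tank  sync      standard  default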
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080

Windows7ge

Contributor
Joined
Sep 26, 2017
Messages
124
It is really a waste of the SSD you have to use it in this manner. You didn't read enough about it first, and we have plenty of information on this forum regarding it.
Take a look at this writeup from @Stux about a proper use of an SSD: https://forums.freenas.org/index.ph...n4f-esxi-freenas-aio.57116/page-4#post-403374
I see there's a lot of configuring that I skipped: 4K sectors, over-provisioning (which I did see in my research), and calculating how much space is actually necessary, of which less than 25GB is needed even for a 20Gbit link, so 400GB is ludicrous, I know. Then there's leaving unpartitioned space for wear leveling, which that post says my SSD is terrible at, so it will die very quickly... great...

As far as I'm reading it seems to speak about using this for VMs. I'm looking to use it as write cache for a file server. Does that mean this is a waste of my time?
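From what I gathered, the partitioning part would look something like this on FreeBSD; the device name nvd0, the 16GB size, and the pool name tank are only placeholders, not something I've actually tested:

# wipe the SSD and give it a GPT partition table
gpart destroy -F nvd0
gpart create -s gpt nvd0

# carve out a small 4K-aligned slice for the SLOG and leave the rest unpartitioned for wear leveling
gpart add -t freebsd-zfs -a 4k -s 16G -l slog0 nvd0

# attach that slice to the pool as a log device
zpool add tank log gpt/slog0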
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
As far as I'm reading it seems to speak about using this for VMs. I'm looking to use it as write cache for a file server. Does that mean this is a waste of my time?
Yes. Fast random IO is what an SSD is for. If you are doing backup, that is sequential IO and no SSD is called for; it will just wear out the SSD and not make the system faster.
You said that you have a 10Gb network connection and you are trying to maximize the throughput. As I understand it you are only getting around 500MB/s, around half the potential of the network interface.
How is your zpool provisioned? It is likely that some reconfiguration would get you the speed you are seeking. Likely, it will take adding more drives. Is that something you would be interested in exploring?
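(If you post the output of something like the following, it will show the layout:)

zpool status -v
zpool list -v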
 

Windows7ge

Contributor
Joined
Sep 26, 2017
Messages
124
You said that you have a 10Gb network connection and you are trying to maximize the throughput. As I understand it you are only getting around 500MB/s, around half the potential of the network interface.
I've hit the throughput limit of the array, if not very close to the limit of SATA III (around 500~600MB/s depending on the file type).
How is your zpool provisioned? It is likely that some reconfiguration would get you the speed you are seeking. Likely, it will take adding more drives. Is that something you would be interested in exploring?
If I'm understanding correctly that provisioning essentially describes how a drive or array is laid out, then it's an 8x 2TB drive, 16TB raidz2 array with 12TB usable. Just one big volume, no zvols or separate datasets. I do have four more drive bays free, but if I really need higher capacity, increasing density sounds like a more intelligent move (like 8 or 10TB disks) as opposed to increasing the size of the array, because that would set me up for a capacity cap where I run out of SATA ports or drive bays and have to replace every single drive at once to increase the total capacity.

In terms of VMs, I do actually have a few running on the server, with a few more I may introduce in the future. The current VMs are assigned to a 3x 3TB raidz1 array with a 64GB SSD as read cache. Running DiskMark, it DOES show roughly SSD-level read speeds in the VM... so now I wonder. Since a write cache for the file server is a bust (and to be honest I'm just a hobbyist, so it's not a serious problem that it won't work), I wonder if I could use it as read cache for the VMs.

I can see this being problematic, though. The same wear-leveling issue still applies; it could kill the SSD quickly, the post suggests maybe a year for my specific SSD (and it's already a few years old, so probably not even that), and the motherboard does not support booting from NVMe devices. I have no idea if using it as read cache for the VM array would even work.
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
I do have four more drive bays free, but if I really need higher capacity, increasing density sounds like a more intelligent move (like 8 or 10TB disks) as opposed to increasing the size of the array, because that would set me up for a capacity cap where I run out of SATA ports or drive bays and have to replace every single drive at once to increase the total capacity.
The idea is to have more disks because, if they are laid out in the proper configuration, it will increase the speed at which you can write, and read, data to the pool. If you had 24 x 1TB drives laid out in 4 vdevs (6 drives each) in RAIDZ1, you would have about 13TB usable storage with an estimated IO performance of 1152 MB/s ... more than enough to handle the 10Gb network. However, with the same number of drives, if you switch to RAIDZ2, the theoretical performance drops to 822.86 MB/s ... So, multiple factors can affect the speed at which you can put data on disk. One is the number of drives, because each drive has a limit on how fast it can write data. Another is the layout of the array, and that includes things like the number of vdevs (array groups) inside the pool. In ZFS, each vdev provides about the write performance of an individual disk that makes up the vdev. There is a lot more involved in developing a performant array than just throwing disks in a box.
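To make the layout concrete, a pool built as four 6-disk RAIDZ1 vdevs would be created with something along these lines (tank and da0 through da23 are placeholder names):

zpool create tank \
  raidz1 da0  da1  da2  da3  da4  da5  \
  raidz1 da6  da7  da8  da9  da10 da11 \
  raidz1 da12 da13 da14 da15 da16 da17 \
  raidz1 da18 da19 da20 da21 da22 da23

Each raidz1 keyword starts a new vdev, and ZFS stripes writes across all four of them, which is where the extra throughput comes from.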
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
it will increase write performance.
Only for sync writes. Unless you can tell me which application of yours is issuing sync writes, you don't benefit at all from an SLOG.

Sync writes will always be slower or equal to async writes, because async writes are cached in RAM.
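If you want to see how big that gap is on your own pool, a throwaway dataset makes the comparison easy; tank/synctest is just a placeholder name:

# force every write on the test dataset through the ZIL
zfs create tank/synctest
zfs set sync=always tank/synctest
# ...run a copy or dd test here and note the speed...

# now let the same writes be buffered in RAM as async writes
zfs set sync=disabled tank/synctest
# ...repeat the test; the difference is the most an SLOG could ever win back...

# clean up
zfs destroy tank/synctest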
 
Joined
Apr 9, 2015
Messages
1,258
So your spinning drives in a RAIDZ2 should have a write throughput of around 600MB/s on their own. If that is the case and you are doing sequential IO, the SSD is just adding complexity and possibly even slowing things down.

https://calomel.org/zfs_raid_speed_capacity.html will give you an estimate of what you can expect for speeds based on a given setup. An 8-drive RAIDZ2 is not listed, but a 6-drive RAIDZ2 and an 11-drive RAIDZ3 are, so splitting the difference between those two you should be around 600MB/s, possibly slower since the site is using 4TB drives, but I doubt it's too far off.

But let's say you need to know your exact performance for reading and writing. You need to create a second dataset and turn compression off on it; if compression is on, the test is meaningless because the all-zero test file compresses away to nearly nothing. You can delete the dataset later, since you will only be using it for the test.

Now navigate to the dataset you created.

Run the following command to test Write speed
# dd if=/dev/zero of=testfile bs=1024 count=50000

Run the following command to test Read speed
# dd if=testfile of=/dev/zero bs=1024 count=50000

Do this with the ZIL in place and without and see your results. The bad thing is that your SSD will slow down after it saturates its write buffer. Once it hits that point it is no better than a spinning drive or worse IMHO.
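If you want the numbers to look more like a large sequential transfer, a bigger block size and a much larger test file are more representative; the sizes below are only a suggestion, and ideally the file should be bigger than the system's RAM so the ARC can't hide the reads:

# write test: ~50GB of zeros in 1MB blocks (only meaningful with compression off)
dd if=/dev/zero of=testfile bs=1m count=50000

# read test: send the file to /dev/null so nothing is written back to the pool
dd if=testfile of=/dev/null bs=1m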
 

Windows7ge

Contributor
Joined
Sep 26, 2017
Messages
124
The idea is to have more disks because, if they are laid out in the proper configuration, it will increase the speed at which you can write, and read, data to the pool. If you had 24 x 1TB drives laid out in 4 vdevs (6 drives each) in RAIDZ1, you would have about 13TB usable storage with an estimated IO performance of 1152 MB/s ... more than enough to handle the 10Gb network. However, with the same number of drives, if you switch to RAIDZ2, the theoretical performance drops to 822.86 MB/s ... So, multiple factors can affect the speed at which you can put data on disk. One is the number of drives, because each drive has a limit on how fast it can write data. Another is the layout of the array, and that includes things like the number of vdevs (array groups) inside the pool. In ZFS, each vdev provides about the write performance of an individual disk that makes up the vdev. There is a lot more involved in developing a performant array than just throwing disks in a box.
I have an understanding of the different forms of RAID, how they work, parity disks/parity bits, and how the size of the array and the RAID level being used will affect performance, so I knew where you were going with your question. I deliberately opted for array integrity using raidz2 instead of raidz1, knowing it would hurt R/W performance, and compensated for the loss by using 8 disks instead of a lesser number, also opting for 7200RPM drives instead of 5400RPM knowing this would also affect performance. Honestly, if super high performance is my goal I'll wait for 2TB SSD's to drop in price and rebuild the array using those...unless you have reasons to say that's a terrible idea (besides being really expensive).
 

Windows7ge

Contributor
Joined
Sep 26, 2017
Messages
124
Only for sync writes. Unless you can tell me which application of yours is issuing sync writes, you don't benefit at all from an SLOG.

Sync writes will always be slower or equal to async writes, because async writes are cached in RAM.
This makes me question why people try to use SSDs as cache for their mechanical array. Is it a misconception to use an SSD to speed up access to the array? Or is it really dependent on whether you want fast writes, fast reads, and how much system memory you have? I find the more RAM you have the more FreeNAS will load frequently accessed files into it and use that source instead of the array. This makes me believe that an SSD for read cache for a file server is useless especially if it forces the array to no longer utilize the RAM but the SSD for all cached files. However, if the array is being used for VMs it does seem to have its benefits, as I have tested.
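Out of curiosity I also found that the ARC's actual size and hit counters can be checked from the shell; if I'm reading things right, something like this works on FreeNAS:

sysctl kstat.zfs.misc.arcstats.size
sysctl kstat.zfs.misc.arcstats.hits kstat.zfs.misc.arcstats.misses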

Now, for write access, my research said that a ZIL device is meant more for protecting data (or the array) in the event of a power failure during a file transfer. This sounds to me like a RAID card that has battery-backed RAM installed on it. When power is lost, the chunk of information in the RAID card's RAM is held until power comes back; when it does, the RAID card writes out the chunk of data it was holding, and this prevents problems.

In short, I don't have an application that uses sync writes, and as Chris Moore explained, what I thought I could do I actually can't.
 

Windows7ge

Contributor
Joined
Sep 26, 2017
Messages
124
An 8-drive RAIDZ2 is not listed, but a 6-drive RAIDZ2 and an 11-drive RAIDZ3 are, so splitting the difference between those two you should be around 600MB/s, possibly slower since the site is using 4TB drives, but I doubt it's too far off.
You're right in the ballpark of what I'm getting. 500~600MB/s depending on what kind of file I'm transferring.
Run the following command to test Write speed
# dd if=/dev/zero of=testfile bs=1024 count=50000

Run the following command to test Read speed
# dd if=testfile of=/dev/zero bs=1024 count=50000

Do this with the ZIL in place and without and see your results. The bad thing is that your SSD will slow down after it saturates its write buffer. Once it hits that point it is no better than a spinning drive or worse IMHO.
When I first initialized the array I ran those tests (no compression, but not in a separate dataset) and the R/W numbers were way off. I can't fully remember, but it was either giving me results around 300MB/s or around 1GB/s, neither of which was even close to accurate once I started my actual workflow. As I have learned, I cannot use the SSD as I was hoping, and even worse, even if I could, it appears I could only write a handful of TB to it before it stops working due to wear-leveling issues.

It appears that if I want to use a PCIe SSD for anything in the server, it'll have to be for VMs, and I'd have to buy one that is designed for that environment so it doesn't die prematurely.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Is it a misconception to use an SSD to speed up access to the array?
Yes it is.
I find the more RAM you have the more FreeNAS will load frequently accessed files into it and use that source instead of the array.
Well, obviously, all OSes do that.
This makes me believe that an SSD for read cache for a file server is useless especially if it forces the array to no longer utilize the RAM but the SSD for all cached files.
No. Yes, there's ARC, but caching things in L2ARC is useful in many scenarios. The catch is that L2ARC needs metadata stored in ARC, so there's a balancing act involved and you can't just throw L2ARC at the problem.
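(For reference, if you do end up trying it on the VM pool, attaching an L2ARC device and watching how much of it actually gets used is roughly this, with vmpool and nvd0p2 as placeholder names:)

# add the SSD, or a partition on it, as a cache (L2ARC) device
zpool add vmpool cache nvd0p2

# watch how much of the cache device is actually being filled and hit
zpool iostat -v vmpool 5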

Now, for write access, my research said that a ZIL device is meant more for protecting data (or the array) in the event of a power failure during a file transfer. This sounds to me like a RAID card that has battery-backed RAM installed on it. When power is lost, the chunk of information in the RAID card's RAM is held until power comes back; when it does, the RAID card writes out the chunk of data it was holding, and this prevents problems.
Sync writes must be acknowledged as having been written to non-volatile storage. Therefore, every ZFS pool has a ZIL to temporarily store these writes before flushing them to the pool proper. This is massively slow, so there's the option of offloading the ZIL to an SLOG device. This device must be fast, low-latency and have power loss protection.
The ZIL is never read from unless things have already gone wrong. Everything in the ZIL is also in RAM, but it must be committed to the ZIL before it's acknowledged.
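(And for completeness: a log vdev shows up as its own section in zpool status, and unlike a data vdev it can be removed again at any time without harming the pool, roughly like so, with placeholder names:)

zpool status tank
zpool remove tank gpt/slog0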
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
I'm looking to use it as write cache for a file server. Does that mean this is a waste of my time?
Yes. The SLOG is not a write cache. ZFS doesn't use a write cache device. This misconception seems to be popping up a lot lately.
 

Windows7ge

Contributor
Joined
Sep 26, 2017
Messages
124
Sync writes must be acknowledged as having been written to non-volatile storage. Therefore, every ZFS pool has a ZIL to temporarily store these writes before flushing them to the pool proper. This is massively slow, so there's the option of offloading the ZIL to an SLOG device. This device must be fast, low-latency and have power loss protection.
The ZIL is never read from unless things have already gone wrong. Everything in the ZIL is also in RAM, but it must be committed to the ZIL before it's acknowledged.
So I take it the benefits of this are more for applications with large amounts of data coming from a large number of clients simultaneously?
 

Windows7ge

Contributor
Joined
Sep 26, 2017
Messages
124
Yes. The SLOG is not a write cache. ZFS doesn't use a write cache device. This misconception seems to be popping up a lot lately.
Well, I suppose that's fine. This does make me want to test the drive as a read cache for my VMs, though, as I did test a SATA SSD assigned as cache for a 3-mechanical-drive VM array and read performance was much higher, in the range of a typical SSD.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
So I take it the benefits of this are more for applications with large amounts of data coming from a large number of clients simultaneously?
No, it's for applications that use sync writes in general. Typically, databases and VMs.
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
I'll wait for 2TB SSD's to drop in price and rebuild the array using those...unless you have reasons to say that's a terrible idea (besides being really expensive).
That would be so much more performance that it wouldn't make any sense in a home environment. Our main SAN where I work has tiered storage with part of it on SSD, part on 15k SAS drives, part on a robotic tape library, and software that automatically moves data around based on demand, but we have multiple petabytes with hundreds of simultaneous requests.
You already have a quantity of data stored on the system, so I am sure I am wasting my time trying to talk to you about the configuration of your storage, but you could get significantly better performance than you are getting now with a small reconfiguration and adding a few disks. It could not be done with the data in place; the pool would need to be reconfigured.
Can you give a rundown on your hardware? Are you actually interested in changing anything to make for better performance?
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
You can test the benefits of a SLOG using a RAM disk. It will put your data at severe risk... but it will verify whether a high-end SLOG would have any effect, more reliably than just disabling sync writes.

https://forums.freebsd.org/threads/58539/
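The gist of the test is something along these lines; the size and the md unit number are whatever your system gives you, and again, only do this on data you can afford to lose:

# create an 8GB swap-backed RAM disk (mdconfig prints the device it created, e.g. md0)
mdconfig -a -t swap -s 8g

# attach it to the pool as a log device
zpool add tank log /dev/md0

# ...run the sync-write tests...

# remove it again before rebooting, then destroy the RAM disk
zpool remove tank /dev/md0
mdconfig -d -u 0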
 

Windows7ge

Contributor
Joined
Sep 26, 2017
Messages
124
That would be so much more performance that it wouldn't make any sense in a home environment. Our main SAN where I work has tiered storage with part of it on SSD, part on 15k SAS drives, part on a robotic tape library, and software that automatically moves data around based on demand, but we have multiple petabytes with hundreds of simultaneous requests.
You already have a quantity of data stored on the system, so I am sure I am wasting my time trying to talk to you about the configuration of your storage, but you could get significantly better performance than you are getting now with a small reconfiguration and adding a few disks. It could not be done with the data in place; the pool would need to be reconfigured.
Can you give a rundown on your hardware? Are you actually interested in changing anything to make for better performance?
A commercial setup like that I'd love to see.

The server's primary application is backup, so 99% of the data on it is also located on a RAID0 array on my desktop. Dumping the data to reconfigure the array is something I have no issue doing; I'll just want to create a 3rd copy on the 2nd raidz array before we do it.

Adding more disks is something I do not want to do for a few reasons (as previously mentioned). Besides, aren't I limited by SATA III? All the drives are plugged into the same controller, so I don't know how much more performance can really be had with the existing equipment. If we can push it right up to the throughput cap of the controller with the 8 drives, I'm open to your configuration guidance. Future upgrades I have in mind include using more than one controller (theoretically this would allow me throughput higher than SATA III when drives are in RAID across the controllers).

H/W:
Motherboard: ASRock Rack EP2C602-4L/D16
RAM: 16x8GB Kingston DDR3 ECC Unbuffered 1600MHz
CPU: 2x Intel E5-2670 8 core 16 thread @ 2.6GHz
SATA controller: Dell PERC H310 (Firmware flashed to an HBA controller for easy software RAID)
10Gbit NIC: HP 10GbE Mellanox ConnectX-2 MNPA19-XTR (with SFP-10GSR-85 850nm 300 meter transceiver)
raidz2: 8x WD2005FBYZ
raidz1: 3x WD30EFRX
850W Power Supply
 

Windows7ge

Contributor
Joined
Sep 26, 2017
Messages
124
You can test the benefits of a SLOG using a RAM disk. It will put your data at severe risk... but it will verify whether a high-end SLOG would have any effect, more reliably than just disabling sync writes.
You mean create a RAM disk on my desktop to test maximum throughput to the array? Not really worth it, since my boot drive is capable of 2GB/s (~20Gbit) reads/writes.
 