SOLVED - SLOG affecting VM CPU?

selbs

Cadet
Joined
Sep 20, 2019
Messages
9
Hi,

Weird problem I noticed while benchmarking my FreeNAS rig. I had a Windows Server 2016 VM doing Windows updates, and I noticed that when the SLOG was online, the vCPU in that 2016 VM spent most of its time maxed out. When I took the SLOG offline, the vCPU usage in that VM dropped by an average of 50%.

Background on my setup: FreeNAS runs in a VM on a separate SSD datastore, with 64GB of RAM dedicated to it and a 16GB Optane passed through. The pool is 6x 4TB disks in RAIDZ2, with 2 datasets: 500GB for the VM datastore and the balance for file-server stuff. Sharing is over NFS, with sync set to "standard" (since I believe ESX forces "always"). Apart from that, it's a pretty vanilla installation.
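To make the "standard" vs "always" point concrete, here is a small Python sketch of how the ZFS `sync` property decides whether a write has to go through the ZIL (and therefore the SLOG, when one is attached) before it is acknowledged. The three property values are real ZFS settings; the function name is hypothetical, and the assumption that ESXi requests sync on every NFS write is taken from the belief stated above, not verified here.

```python
# Illustrative sketch: how the ZFS "sync" dataset property maps a write
# onto the ZIL. The property values are real; the function is hypothetical.

def write_is_sync(sync_property: str, client_requested_sync: bool) -> bool:
    """Return True if the write must be logged to the ZIL before it is acked."""
    if sync_property == "always":
        return True  # every write is treated as synchronous
    if sync_property == "disabled":
        return False  # sync requests are ignored; data waits in RAM for the txg
    if sync_property == "standard":
        return client_requested_sync  # honor whatever the client asked for
    raise ValueError(f"unknown sync value: {sync_property!r}")


# Assumption: the NFS client (ESXi here) asks for sync on every write,
# so "standard" ends up behaving like "always" for the VM datastore.
esxi_requests_sync = True
for prop in ("standard", "always", "disabled"):
    print(prop, write_is_sync(prop, esxi_requests_sync))
```

Under that assumption, "standard" and "always" both send every VM write through the log, which is why the SLOG matters at all for this datastore.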

Anybody able to venture a guess as to why I am seeing this behavior? In a nutshell: vCPUs seem to be affected by the presence of a SLOG. Weird.


- SuperStorage 6047R-E1R24N: 24-bay storage chassis with SAS3 backplane
- Super X9DRi-LN4F+ with 2x Xeon E5-2630L (60 W TDP each)
- 128GB ECC RAM (8x 16GB)
- LSI 9211-8i (IT mode)
- 2x IBM I340-T4 quad-port gigabit NICs
- 6x Constellation 4TB SAS
- 2x 128GB ADATA SSD
- 16GB Optane (SLOG)

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
So when you take the SLOG offline... do you think the vCPU in the 2016 VM gets to work more, because write speeds are higher, or less, because write speeds are lower?

selbs
That is a very good question - this is going to be an interesting 'scooby-doo mystery' to solve :)

My next steps are to get a load simulator onto the 2016 VM so I can keep it at a set 'load', and then continue testing from there. I have no solid data to work with yet, except for the disk benchmarks and the fact that the CPU spiked (and that was reproducible).

I'm going to post my benchmarks here, since this is also going to be a "Dear Diary" for me as I work through this.
[Image: disk benchmark results]

jgreco
Well, the answer was actually embedded in my question, but you did the homework anyways and hopefully you look carefully at that. Basically if your VM can write faster, then the processes that drive that writing (in this case the Microsoft patcher) will run harder and consume more resources, but also run a lot more quickly.
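The reasoning can be put as back-of-the-envelope arithmetic. The Python sketch below assumes (hypothetically) that the patcher needs a fixed amount of CPU time plus a fixed number of sync writes, and that CPU work and I/O waits don't overlap; the function name and all numbers are made up for illustration, not taken from the benchmarks.

```python
# Hypothetical model: a job needs a fixed amount of CPU time and must also
# wait for a fixed number of synchronous writes. Faster sync writes shrink
# the waiting, so the same CPU work is packed into less wall-clock time and
# the vCPU looks "busier".

def run_profile(cpu_seconds: float, sync_writes: int, write_latency_ms: float):
    """Return (elapsed seconds, fraction of elapsed time the vCPU is busy)."""
    io_wait = sync_writes * write_latency_ms / 1000.0  # seconds blocked on storage
    elapsed = cpu_seconds + io_wait  # assumes CPU work and I/O never overlap
    utilization = cpu_seconds / elapsed
    return elapsed, utilization


# Made-up inputs: 60 s of CPU work and 20,000 sync writes
for label, latency_ms in (("with SLOG", 0.5), ("without SLOG", 6.0)):
    elapsed, util = run_profile(60.0, 20_000, latency_ms)
    print(f"{label}: {elapsed:.0f} s elapsed, {util:.0%} vCPU busy")
```

With these invented inputs it prints 70 s at 86% busy with the SLOG versus 180 s at 33% busy without it: the same amount of CPU work, just spread over more wall-clock time when writes are slow.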

Also suggested reading:

https://www.ixsystems.com/community/threads/some-insights-into-slog-zil-with-zfs-on-freenas.13633/

https://www.ixsystems.com/community...d-why-we-use-mirrors-for-block-storage.44068/

selbs
Thinking out loud... I wonder if my SLOG is too big? If I take the write speed of each drive (160 MB/s) x 6 drives, that's about 1 GB/s. Now, if the SLOG dumps every 5 seconds, that is about 5 GB the drives can accept. Sooo, if I am using a 16GB SLOG, it is going to dump +/-15GB every 5 seconds while my drives can only take 5GB, so they cause a bottleneck which cascades down to affect the CPU activities.... maybe? Or am I misunderstanding the science of the SLOG, and 'bigger' doesn't hurt?

selbs
Thank you! I've been reading, but I clearly haven't absorbed it all. There are still some blanks I am trying to fill in :) Thanks for the links, I'm going to check them out right away. I'm really pleased with FreeNAS and I am moving towards the 'tuning' phase of things!

Thanks again!

jgreco
No, please read the "some insights" article. You seem confused as to what a log device is. "SLOG dumps" is a say-nothing. The SLOG is a log. 99.99999+% of its life is spent writing. The only time it is read is when a pool is imported (i.e. at boot time), and then only to make sure that everything that had been written in the last transactions actually made it into the pool. After the pool is imported, the SLOG is 0.0_% read and 100.0_% write. It is emphatically NOT a cache of any sort. The only real sizing rule for a SLOG is that you really want it to be larger than two transaction groups' worth of writes, which isn't very large. "Too large" isn't really a thing, except that it could be stupidly wasting money.
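The "two transaction groups' worth of writes" rule can be turned into a rough number. This Python sketch assumes a transaction group is committed about every 5 seconds (the default `zfs_txg_timeout`, matching the 5-second figure mentioned earlier in the thread) and that sync writes arrive no faster than an ingest rate you choose, for example a network link's line rate; the function name and the example rates are hypothetical.

```python
# Rough SLOG sizing sketch: the log only has to hold about two transaction
# groups' worth of sync writes. Assumptions (labelled, not from the thread):
#   - a txg is committed roughly every 5 seconds (default zfs_txg_timeout)
#   - sync writes arrive no faster than `ingest_gbit_per_s`

def min_slog_bytes(ingest_gbit_per_s: float, txg_seconds: float = 5.0, txgs: int = 2) -> float:
    """Approximate log capacity needed to hold `txgs` transaction groups."""
    bytes_per_second = ingest_gbit_per_s * 1e9 / 8  # line rate, no protocol overhead
    return bytes_per_second * txg_seconds * txgs


# Hypothetical ingest rates: one gigabit port, four gigabit ports, one 10GbE port
for rate in (1, 4, 10):
    print(f"{rate} Gbit/s -> {min_slog_bytes(rate) / 1e9:.2f} GB")
```

Even at 10 Gbit/s the estimate comes to 12.50 GB, so a 16GB device already covers it; any capacity beyond that simply goes unused, which fits the point that "too large" isn't really a thing.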

No worries. We usually get most people straightened out pretty well. You immediately went to doing some benchmarking that clearly displays what I was talking about, so make sure you see that. Notice that sync off with or without SLOG is the same, and always faster than sync writes.
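A toy latency model shows why the benchmark comes out that way. It is a Python sketch with invented latency constants (not measurements from this rig); only the ordering matters. Async writes are acknowledged from RAM and never touch the log, while sync writes must also be logged to the ZIL, which is fast on a SLOG and slow on the pool disks.

```python
# Toy model of when ZFS can acknowledge a write. The latency constants are
# made-up placeholders for illustration, not measurements.
RAM_MS = 0.01      # hypothetical: buffer the write in RAM for the next txg
SLOG_MS = 0.05     # hypothetical: log record written to a fast SLOG (e.g. Optane)
POOL_ZIL_MS = 8.0  # hypothetical: log record written to the in-pool ZIL on HDDs


def ack_latency_ms(sync: bool, has_slog: bool) -> float:
    """Time until the client gets its acknowledgement, in milliseconds."""
    if not sync:
        # Async write: acked from RAM; the ZIL (and so the SLOG) is never used.
        return RAM_MS
    # Sync write: must also be logged to the ZIL before it can be acked.
    return RAM_MS + (SLOG_MS if has_slog else POOL_ZIL_MS)


for sync in (False, True):
    for has_slog in (True, False):
        print(f"sync={sync}, slog={has_slog}: {ack_latency_ms(sync, has_slog):.2f} ms")
```

With sync off, the SLOG drops out of the picture entirely, so both rows match; with sync on, the SLOG only narrows the gap to async writes without closing it.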
 

selbs
Roger that! I am reading the 'some insights' now. Thanks, you guys are more than patient :)