Is there an option for "real" write caching?

thex

Dabbler
Joined
Apr 3, 2023
Messages
14
Short question to the experts.
Is my understanding correct that there is no way to add a "real" write cache to TrueNAS (ZFS)? By "real" I mean a cache that holds the actual data before writing it out to the slow spinning disks (not "just" indexes/metadata).

Background: I was dumb some years ago and have some SMR drives which I use with ZFS. Performance really tanks, as expected, when the CMR cache is filled (123 MB/s down to 20-30 MB/s). As I also recently upgraded to 2.5G networking, my idea was that I could maybe add a cache SSD instead of buying new (still relatively slow) CMR disks. I would also hope it helps with CMR disks, but at 2.5 Gbps I guess the benefit would be negligible.

My current understanding is that my best bet to achieve something like my idea is to have a ZIL on an SSD and switch over to ASYNC writes, but that will only "save" one transaction (so about 5 seconds) and is basically not caching, just making sure the data is not lost on power failure despite the ASYNC write mode. So what I would need is that it can queue multiple transactions. (Maybe my understanding regarding transactions is not complete here.)

The use case is transferring a large 50-100 GB file (RAM will not be sufficient for caching).

Thanks
 

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222
You can increase the transaction group timeout.
Increasing your ARC and L2ARC might be useful.
If you want some PLP (power loss protection) in your situation, you need a UPS.
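
For reference, a minimal sketch of where that timeout lives (tunable names are the stock OpenZFS ones; raising it is generally discouraged, as comes up below):

# TrueNAS SCALE (Linux): transaction group timeout, default 5 seconds
cat /sys/module/zfs/parameters/zfs_txg_timeout
# TrueNAS CORE (FreeBSD): the same tunable via sysctl
sysctl vfs.zfs.txg.timeout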

Anyway, there isn't a short answer to your question.
 
Last edited:

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
Is my understanding correct that there is no way to add a "real" write cache to TrueNAS (ZFS)? By "real" I mean a cache that holds the actual data before writing it out to the slow spinning disks (not "just" indexes/metadata).
ZFS uses RAM as write cache, so what you're talking about is already there.

If we're only talking about async writes, that's exactly how it works.

If you throw sync writes in the mix, there's a slightly different story (where SLOG... sometimes incorrectly referred to as write cache... can help to mitigate the lost performance from ensuring the write makes it to disk before confirmation of the write is sent).

my idea is to have a ZIL on an SSD and switch over to ASYNC writes
Don't do both of those; they are effectively mutually exclusive... SLOG won't help with async writes.
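
To make the distinction concrete, a rough sketch (pool, dataset and device names are placeholders): sync is a per-dataset property, and a SLOG is just a log vdev, which only ever sees sync writes.

# check / set the sync property on a dataset
zfs get sync tank/data
zfs set sync=standard tank/data    # default: sync writes go through the ZIL
zfs set sync=disabled tank/data    # everything treated as async; a SLOG would sit idle
# attach an SSD as a SLOG (log vdev) for sync-heavy workloads
zpool add tank log /dev/ada3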

The use case is transferring a large 50-100 GB file (RAM will not be sufficient for caching).
You don't need 100 GB of RAM to transfer a 100 GB file... although the copy will eventually slow to pool disk speed once your available RAM is all consumed and already cached transactions can only be committed at pool disk speed to free RAM for caching more.

If you're talking only about huge single-file copy speed, the only way to make it go faster without increasing the size of RAM is to have a better performing pool... perhaps you could consider a separate (SSD) pool to accept the initial copy (although that workflow will quickly burn out a cheap SSD).
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
My current understanding is that my best bet to achieve something like my idea is to have a ZIL on an SSD and switch over to ASYNC writes, but that will only "save" one transaction (so about 5 seconds) and is basically not caching, just making sure the data is not lost on power failure despite the ASYNC write mode. So what I would need is that it can queue multiple transactions. (Maybe my understanding regarding transactions is not complete here.)
Short answer is no to all.

ZFS already caches writes in DRAM, if possible. It doesn't get faster than that. Of course, sync writes cannot be merely cached in memory to comply with the specified semantics, and that's what the ZIL is for. SLOG devices can accelerate the ZIL by offloading it from the main pool, but this only affects sync writes.

Async is async and the application is expected to cope [however it wants to - readbacks or shrugs] with loss of data if the server goes down.

So what I would need is that it can queue multiple transactions.
No can do. Your pool must be able to absorb the data quickly enough and 2.5 Gb/s should be doable for most reasonable pools - of course, SMR disks ruin that part.

have some SMR drives
As I also recently upgraded to 2.5G networking
You should really research before buying. That's two boondoggles that could have been avoided.
 

thex

Dabbler
Joined
Apr 3, 2023
Messages
14
Thanks for all the comments and explanations.

I just tested again and the behavior does not match up with what I thought should be the case after reading the above.

I copied a 50G file to my local SSD, which worked fine at a basically constant 113 MB/s (the PC had only 1 Gb/s LAN). Then I copied the same file back, but after almost exactly 10% the write speed dropped from 113 MB/s to 10-20 MB/s.

Before the experiment I set this pool to sync disabled. The machine has 48 GB RAM, so it should be able to cache more than 5 GB, but it looked basically identical to what I saw before with the pool's sync set to "standard".
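
For anyone who wants to watch this from the NAS side while the copy runs, something like this should do it (pool/dataset names are placeholders for mine):

# per-vdev write throughput, refreshed every 5 seconds
zpool iostat -v tank 5
# confirm the sync setting really applied to the dataset being written to
zfs get sync tank/data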

Guess I need to read some more articles to really understand it.
 

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222
Please don't recommend that. It's a bad idea all around.
I just presented the option to do so :tongue:
Edit: I get how it might be interpreted as a suggestion. To clarify: it's not.
 
Last edited:

thex

Dabbler
Joined
Apr 3, 2023
Messages
14
I just presented the option to do so :tongue:
Edit: I get how it might be interpreted as a suggestion. To clarify: it's not.
No worries, I'm not going to try that.

However, I still want to understand the behavior above. With that much RAM I would expect it to cache more than 5 GB.
 

Etorix

Wizard
Joined
Dec 30, 2020
Messages
2,134
However, I still want to understand the behavior above. With that much RAM I would expect it to cache more than 5 GB.
You gave the explanation in your first post:
Background: I was dumb some years ago and have some SMR drives which I use with ZFS. Performance really tanks, as expected, when the CMR cache is filled (123 MB/s down to 20-30 MB/s).
ZFS caches in RAM up to two transaction groups (10 seconds). If the pool cannot take writes fast enough, file transfer will stall until the RAM cache is committed to disk. So, irrespective of network speed, writes to the NAS are capped by the sustained write performance of the SMR drives—which is not good.

Get rid of these SMR drives—also for data safety!
And then, if you can, add more vdevs to increase write throughput.
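
If it helps, adding a vdev looks roughly like this (pool name and disk names are placeholders; the new vdev should match the layout of the existing ones):

# add a second mirror vdev so writes are striped across both mirrors
zpool add tank mirror /dev/ada4 /dev/ada5
# verify the resulting layout
zpool status tank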
 
Joined
Jun 15, 2022
Messages
674
Well, @Etorix is a bit aggressive on that...you might want to add a new pool with non-SMR drives, transfer the data, then remove the SMR pool and use the SMR drives elsewhere. ZFS and SMR don't mix well, but many other file systems are fine.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
With that much RAM I would expect it to cache more than 5 GB

You can expect whatever you like, but piling even 1TB of RAM into your server isn't going to make it use all of that as write cache. ZFS has a very specific algorithm to manage the write cache, and it is NOT designed to support sustained write speeds in excess of what your pool can support. If you want fast writes, you really need a fast pool PLUS sufficient memory to hold several active transaction groups. That is it. There is no more. It works that way and it is a good design choice. Nothing will make it go faster than that. If that's too slow, increase the speed of your pool (possibly by adding vdevs) and make sure there's sufficient memory to hold more than two transaction groups.
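
For the curious, the limits that algorithm works within are exposed as tunables (names as in current OpenZFS; look at them, don't blindly tune them):

# Linux (SCALE): maximum amount of dirty, not-yet-written data held in RAM
cat /sys/module/zfs/parameters/zfs_dirty_data_max
# percentage of that limit at which ZFS starts delaying new writes
cat /sys/module/zfs/parameters/zfs_delay_min_dirty_percent
# FreeBSD (CORE): the same dirty data cap
sysctl vfs.zfs.dirty_data_max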
 

thex

Dabbler
Joined
Apr 3, 2023
Messages
14
You gave the explanation in your first post:

ZFS caches in RAM up to two transaction groups (10 seconds). If the pool cannot take writes fast enough, file transfer will stall until the RAM cache is committed to disk. So, irrespective of network speed, writes to the NAS are capped by the sustained write performance of the SMR drives—which is not good.

Get rid of these SMR drives—also for data safety!
And then, if you can, add more vdevs to increase write throughput.
Oh, so it never caches more than two transaction groups even when sync is off; now I get it. It would be cool, though, if it could cache more transaction groups (for unimportant data, to maximize use of the RAM).

Yes, I will get new CMR drives. I was just curious and wanted to understand the details.
 
Joined
Oct 22, 2019
Messages
3,641
What about a dedicated pool comprised of only one (stripe) or two (mirror) fast NVMe drives? You say it's "unimportant data". So when dealing with temporary, fleeting, and unimportant data, you can use this super fast pool as a playground of sorts.

Not only will there be no bottlenecks from the drive(s) themselves, but it'll declutter your main pool.

For the record, this is sort of what I do myself.
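
For completeness, creating such a pool is a one-liner at the shell (pool and device names are placeholders; on TrueNAS you would normally do this through the UI):

# single fast NVMe drive as its own scratch pool (no redundancy)
zpool create scratch /dev/nvme0n1
# or two NVMe drives as a mirror
zpool create scratch mirror /dev/nvme0n1 /dev/nvme1n1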
 

thex

Dabbler
Joined
Apr 3, 2023
Messages
14
What about a dedicated pool comprised of only one (stripe) or two (mirror) fast NVMe drives? You say it's "unimportant data". So when dealing with temporary, fleeting, and unimportant data, you can use this super fast pool as a playground of sorts.

Not only will there be no bottlenecks from the drive(s) themselves, but it'll declutter your main pool.

For the record, this is sort of what I do myself.
Yes, I thought about that too; I could do a cron job to move the data over to the slow SMR drives at night and use them a bit longer (to annoy some people here).

But it could actually be a valid plan. Buy NVMe SSDs now and play with really fast storage. When the first SMR drive dies (or a really good offer pops up), add a CMR drive for resilvering and thus have super fast storage and get rid of the SMR drives gradually.
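
The nightly move could be as simple as something like this (paths are made up, and rsync would need the usual care around files that are still being written to):

# cron entry (e.g. /etc/cron.d/tiering or a TrueNAS cron task): shift finished files to the SMR pool at 03:00
0 3 * * * root rsync -a --remove-source-files /mnt/scratch/inbox/ /mnt/tank/archive/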
 
Joined
Jun 15, 2022
Messages
674
Yes, I thought about that too; I could do a cron job to move the data over to the slow SMR drives at night and use them a bit longer (to annoy some people here).

But it could actually be a valid plan. Buy NVMe SSDs now and play with really fast storage. When the first SMR drive dies (or a really good offer pops up), add a CMR drive for resilvering and thus have super fast storage and get rid of the SMR drives gradually.

Hierarchical storage management (also known as tiered storage): a data storage and data management technique that automatically moves data between high-cost and low-cost storage media.
It's not that people here are snobdoggins, it's that they've often seen what's down the path and offer a better option for your own benefit, should you choose to investigate their commentary. Few, if any, people here talk for the sake of hearing themselves talk.
 

thex

Dabbler
Joined
Apr 3, 2023
Messages
14
Sure, didn’t want to offend anyone. Especially as I agree with most of the stuff.

I would never buy SMR again now that I know more. Just wanted to understand the background and get some feedback on what is possible as I have the drivers here.

I just ordered a PCIe x16 to 4x4x4x4 NVMe adapter and will watch out for some nice NVMe deals; prices seem to be dropping.
 

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,776
I just ordered a PCIe x16 to 4x4x4x4 NVMe adapter and will watch out for some nice NVMe deals; prices seem to be dropping.
Make sure your mainboard supports bifurcation. If it does, these cards are insanely great.
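
Once the card and drives are in, a quick sanity check that bifurcation is actually working is to confirm all four drives show up (command depends on the platform; nvme-cli assumed on SCALE):

# TrueNAS SCALE (Linux)
nvme list
# TrueNAS CORE (FreeBSD)
nvmecontrol devlist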
 

thex

Dabbler
Joined
Apr 3, 2023
Messages
14
Make sure your mainboard supports bifurcation. If it does, these cards are insanely great.
It does. I did my research.

Now I'm looking into which SSD properties are most relevant for the use case. Not going to go enterprise, but at least TLC. Currently checking whether I want/need a DRAM cache and whether I want PCIe 4.0; most likely both are overkill for the use case (the current CPU only has PCIe 3.0 and I currently only have 2.5 Gb/s Ethernet).
So I guess I should rather go for more capacity than speed, but this is getting a bit off topic.
 

thex

Dabbler
Joined
Apr 3, 2023
Messages
14
FYI new SSD pool up and running.
And as I'm encountering a weird issue, I have also pulled the trigger on some new Exos hard drives to replace the SMR drives.
 