RAM write caching for Docker apps on NVMe pool

cmplieger

Dabbler
Joined
Aug 6, 2023
Messages
11
Hello,

I am planning to deploy a TrueNAS SCALE system with 128GB of RAM.
The system will have 2 pools:
  • 1 NVMe mirror pool to run containers
  • 1 HDD pool in raidz-2
I understand RAM write dirty-caching is on by default and should use 50% of the available RAM (64GB). I've seen demos of this on YouTube, mainly with SMB or NFS.
How does this work for Docker applications installed on pool 1 (SSD) storing on pool 2 (HDD)? Will it also write to RAM if there is capacity? Does this need to be turned on? Do I need a separate NVMe cache to enable this internal functionality?

Thanks for your help.
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
I understand RAM write dirty-caching is on by default and should use 50% of the available RAM (64GB).
I don't think whoever is guiding you understands how it works.

ARC (the ZFS "read cache"... don't quote me on that, but close enough for this exercise) can take up to half of RAM on Linux (maybe that's where the confusion is coming from). It may be that the OpenZFS Linux implementation also limits write caching into that same half of RAM.
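If you want to see the actual ARC ceiling on your own box rather than guessing, something like this will read it out. This is just a sketch assuming the standard OpenZFS kstat interface that SCALE exposes on Linux:

```python
# Sketch: inspect the ARC size ceiling on Linux OpenZFS (e.g. TrueNAS SCALE).
# Assumes the standard /proc/spl/kstat/zfs/arcstats file is present.

def read_arcstats(path="/proc/spl/kstat/zfs/arcstats"):
    """Parse the arcstats kstat file into a dict of name -> int value."""
    stats = {}
    with open(path) as f:
        for line in f.readlines()[2:]:  # skip the two kstat header lines
            parts = line.split()
            if len(parts) == 3:
                name, _type, value = parts
                stats[name] = int(value)
    return stats

stats = read_arcstats()
gib = 1024 ** 3
print(f"ARC current size: {stats['size'] / gib:.1f} GiB")
print(f"ARC max (c_max):  {stats['c_max'] / gib:.1f} GiB")  # ~50% of RAM by default on Linux
```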

ZFS write caching is only going to store at most 2 transaction groups (plus one new one in the open state) in memory (which may or may not be further limited to 1/8th of your RAM). https://www.delphix.com/blog/zfs-fundamentals-transaction-groups
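Rather than take my 1/8th hedge at face value, you can read the actual limits off your system: on current OpenZFS the in-flight dirty data is bounded by the zfs_dirty_data_max tunable (derived from a percentage of RAM). A quick sketch, assuming the usual /sys/module/zfs/parameters interface on Linux:

```python
# Sketch: read the OpenZFS tunables that bound the ZFS "write cache" on Linux.
# Assumes the standard /sys/module/zfs/parameters interface.

from pathlib import Path

PARAMS = Path("/sys/module/zfs/parameters")

def param(name):
    return (PARAMS / name).read_text().strip()

# Upper bound on dirty (not-yet-synced) data held in RAM, in bytes.
dirty_max = int(param("zfs_dirty_data_max"))
# The percentage of RAM that dirty_max is derived from (default 10).
dirty_pct = param("zfs_dirty_data_max_percent")
# How often a new transaction group is forced out, in seconds (default 5).
txg_timeout = param("zfs_txg_timeout")

print(f"zfs_dirty_data_max:         {dirty_max / 1024**3:.1f} GiB")
print(f"zfs_dirty_data_max_percent: {dirty_pct}%")
print(f"zfs_txg_timeout:            {txg_timeout}s")
```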

As long as your pool is capable of eating whatever is fed to it from RAM in time, you can continue taking in new data for transaction groups to RAM, but as soon as your pool can't finish in time, you stop accepting new data and wait for the pool to catch up.

I have seen a lot of conjecture about how big a transaction group can be and what factors it depends upon, but I've never seen reference to all transaction groups being able to reach more than something like 30GB.

Basically, you can't just get a RAM-speed pool if the backing storage for it isn't RAM, but you may initially (in a large copy) or sporadically (for occasional writes) get things to go at RAM speed for a bit. http://dtrace.org/blogs/ahl/2013/12/27/zfs-fundamentals-the-write-throttle/
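You can watch that burst-then-throttle behavior happen. A rough sketch: the target path is an assumption (point it at a dataset on your HDD pool), and you need to write more than zfs_dirty_data_max to see the drop:

```python
# Sketch: watch writes start near RAM speed, then fall to pool speed once
# the dirty-data limit fills and the write throttle kicks in.
import os
import time

TARGET = "/mnt/hdd-pool/throttle-test.bin"  # assumption: adjust to your pool
CHUNK = os.urandom(256 * 1024 * 1024)       # 256 MiB of random (incompressible) data
N_CHUNKS = 128                              # ~32 GiB total; exceeds zfs_dirty_data_max

with open(TARGET, "wb") as f:
    for i in range(N_CHUNKS):
        t0 = time.monotonic()
        f.write(CHUNK)
        dt = time.monotonic() - t0
        print(f"chunk {i:3d}: {len(CHUNK) / dt / 1024**2:,.0f} MiB/s")

os.remove(TARGET)
```

The early chunks land in RAM as dirty data and report absurd throughput; once the pool has to keep up, the numbers settle to roughly what the HDDs can actually sustain.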

How does this work for Docker applications installed on pool 1 (SSD) storing on pool 2 (HDD)? Will it also write to RAM if there is capacity? Does this need to be turned on? Do I need a separate NVMe cache to enable this internal functionality?
Wherever the storage is mapped (to which pool) will determine what ZFS will do with data written there.

The container's own system (its root filesystem) will usually be on the pool you assigned to host your apps.

Data you map in as persistent storage will be handled as you instruct (it can be NFS, so it can even involve the network stack before ZFS gets to it).

The same system of handling transaction groups will apply to data destined for your HDD pool, but it will get throttled much sooner, as soon as ZFS sees that the pool isn't eating the writes fast enough to justify grabbing more data into RAM.
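If you want to sanity-check which pool is really absorbing a container's writes, you can time a synced write to each mapped path from the host. A rough sketch; the /mnt paths are placeholders for whatever you actually bind-mount into your containers:

```python
# Sketch: confirm which pool absorbs a container's writes by timing a
# synchronous write to each mapped host path. Paths are assumptions.
import os
import time

PATHS = {
    "apps pool (NVMe)": "/mnt/nvme-pool/apps/probe.bin",
    "data pool (HDD)":  "/mnt/hdd-pool/data/probe.bin",
}

payload = os.urandom(8 * 1024 * 1024)  # 8 MiB, random so compression can't hide it

for label, path in PATHS.items():
    t0 = time.monotonic()
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
    os.write(fd, payload)
    os.fsync(fd)  # force the write to stable storage before returning
    os.close(fd)
    os.remove(path)
    print(f"{label}: {(time.monotonic() - t0) * 1000:.0f} ms")
```

The HDD-backed path should come back noticeably slower on the synced write, which tells you that pool, not RAM, is the real bottleneck for that mapping.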
 

cmplieger

Dabbler
Joined
Aug 6, 2023
Messages
11
OK, thanks for your explanations, that makes a lot more sense. It's not the silver bullet I thought it would be. I'd just have to put IO-heavy storage on an SSD and occasionally move it to the HDD pool as it cools off. That can work for me as well.
 