Looking for a little guidance from the community


kitt001

Dabbler
Joined
Jun 2, 2017
Messages
29
I didn't honestly expect there to be, but the way my luck goes, I'd be all gung-ho, and then find out there's some obscure conflict with the Samsung NVMe that puts it on the no-fly list for FreeNAS.

Thank you again for all your assistance, I think this system is far better with your input and explanations. I'm sure I'll have more questions, but I'm more confident about the direction I'm headed to start with now.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
No, there's no compatibility issue with NVMe.

It can be platform-dependent, though; update the 960's firmware to be safe. Ref:

https://tinkertry.com/supermicro-su...n-cause-960-pro-and-evo-to-hide-heres-the-fix

I prefer two L2ARC devices over one because if you have a failure on one massive device, you're probably going to be really hurting, instead of just sort-of hurting.

If my L2 assumption is incorrect ... how do I size it? You mention 1TBish, and then state two 500's ...

Sizing of L2ARC is a little bit of black magic.

First, you have to know your working set size. That's the set of blocks you access repeatedly. But is that repeatedly every minute? Hour? Day? ... Week?

So just for giggles, let's say you had a Windows VM that's using 20GB of a 100GB vmdk. You have a virus scanner which scans this every day. If that's the only thing running I/O to your filer when it scans that 20GB, you can definitely afford to pull that data from the pool, so L2ARC or even ARC is not too useful. But if you have 50 of those all doing it, and a bunch of them are likely to be doing it simultaneously, that's going to kill your pool.

L2ARC is a way to artificially increase the IOPS of your pool. That can help if you've got heavy fragmentation, or if you've got lots of simultaneous I/O, or just an overall heavy IOPS load.
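
Just to put rough numbers on that working-set idea, here's a quick back-of-the-envelope sketch in Python. The VM numbers come from the example above; the ARC and L2ARC sizes are placeholder assumptions you'd swap for your own hardware.

```python
# Rough working-set / L2ARC sizing sketch.
# VM numbers follow the virus-scanner example above; the ARC/L2ARC sizes
# are placeholder assumptions, not recommendations.

GiB = 1024**3

vms = 50                      # VMs hitting the filer
hot_data_per_vm = 20 * GiB    # blocks each VM touches repeatedly (e.g. daily scan)
working_set = vms * hot_data_per_vm

arc_bytes = 96 * GiB          # rough usable ARC on a 128GB box (assumption)
l2arc_bytes = 2 * 500 * GiB   # e.g. two 500GB cache devices

print(f"working set : {working_set / GiB:.0f} GiB")
print(f"ARC         : {arc_bytes / GiB:.0f} GiB")
print(f"ARC + L2ARC : {(arc_bytes + l2arc_bytes) / GiB:.0f} GiB")

if working_set <= arc_bytes:
    print("working set fits in ARC; L2ARC buys little")
elif working_set <= arc_bytes + l2arc_bytes:
    print("working set fits in ARC+L2ARC; most repeat reads avoid the pool")
else:
    print("working set spills past the cache; the pool still eats the IOPS")
```

With a working set around 1TB, a pair of 500GB cache devices is roughly where the repeat reads stop hitting the pool, which is the sort of sizing being talked about in this thread.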
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Sizing of L2ARC is a little bit of black magic.
Have you had the chance to play with compressed ARC/L2ARC? I expect VM performance to improve noticeably on the read side, but I'd love to hear some real-world numbers.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
No, sorry. I'm still finding it very difficult to get something that's justifiable. I was hoping the price of flash would drop but it went up instead.

There are a few hundred VMs in inventory here, and it'd be really nice to be able to run them all off a pair of FreeNAS systems. We can't really stop production operations for maintenance or updates, so any solution has to allow for the migration of VMs. This means a minimum of two servers, and maybe 8-10TB required. I was having pretty good luck with the 24-bay system (7 three-way vdevs of 2TB drives each), which was about 7TB on a single unit, but the power utilization was about 200 watts and the cost was maybe $7-8K. Two of them would have been ~400 watts and ~$15K. Speeds were good, much better than plain RAID1 HDD but not as good as SSD. Once I got out to ~1TB of L2ARC (that's fully 1/7th of the usable pool), HDD reads per second had dropped to almost zero, so that's very cool.

I expect that compression might have allowed me to get to the almost-zero-read point more quickly. 1TB of NVMe flash is pricey.

But the thing is, I was able to get a quartet of Synology DS416slims with 2 x 1TB SSD and 2 x 2TB HDD; each unit was around $1000 loaded. The things are only good for about 50MBytes/sec, but there are four of them, and with the SSD the performance is acceptable. I can offline a unit with less impact by migrating its contents to the remaining three, and the power utilization is only ~15-20 watts per unit. The LSI 2208 RAID controllers have been in the ~$150 range on eBay lately, so it's a LOT more power-efficient and cost-effective to move the things that need high-performance storage to direct-attached disk, and to use the cheap little NAS units for shared iSCSI.

4TB of true RAID1 SSD, 8TB of RAID1 HDD, 60 watts, ~200MBytes/sec, at $4000. I mean, yeah, two FreeNAS systems would be able to push out fifty times that, but it was just more than I could justify: cost, cooling, etc.

So I've been leaning away from the big gear and pursuing lower power and less expensive devices.
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
If your read ops were nearly zero with 1TB of L2ARC, then does it still make sense to use 3-way mirrors instead of some RaidZ levels?
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
If your read ops were nearly zero with 1TB of L2ARC, then does it still make sense to use 3-way mirrors instead of some RaidZ levels?
Don't forget the writes. Especially on a somewhat fragmented pool, they're still going to need a lot of IOPS.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
If your read ops were nearly zero with 1TB of L2ARC, then does it still make sense to use 3-way mirrors instead of some RaidZ levels?

Yes. RAIDZ is inefficient in how it allocates space for block storage. RAIDZ, especially at higher levels like Z2 or Z3, consumes a variable amount of space for parity, and it simply isn't good at the sort of small-block updates that mirrors handle pretty well.

The 3-way mirrors in the above system would give you read speeds on the order of up to ~21x that of an individual disk and writes up to ~7x, whereas a Z2 setup would probably be only three or four vdevs, so only ~3x.

https://forums.freenas.org/index.ph...d-why-we-use-mirrors-for-block-storage.44068/

While I would absolutely use Z2 or Z3 for large contiguous files, because of the great space efficiency, for block storage the inefficient space consumption and low performance of RAIDZ probably aren't worth the savings. We live in an era of crazy-cheap HDD. Thirty years ago, 20MB of nonredundant HDD would have cost around $400 to add to a system. Today, you can add three 2TB drives to a system for that price: totally redundant, a million times the storage, a hundred times the speed, etc.
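
If you want to see where the ~21x / ~7x / ~3x figures come from, here's the rule-of-thumb math spelled out as a tiny Python sketch. It assumes write IOPS scale with the number of vdevs and read IOPS with the number of mirror members that can service reads independently; that's a simplification, and real numbers vary.

```python
# Rule-of-thumb vdev scaling for the 21-disk example above.
# Assumption: write IOPS scale with the number of vdevs, read IOPS with
# the number of disks that can independently service reads.

disks = 21

# 3-way mirrors: 7 vdevs of 3 disks each
mirror_vdevs = disks // 3
mirror_read = mirror_vdevs * 3   # every mirror member can serve reads -> ~21x
mirror_write = mirror_vdevs      # roughly one write stream per vdev   -> ~7x

# RAIDZ2: three or four wide vdevs, e.g. 3 x 7-disk Z2
raidz_vdevs = 3
raidz_read = raidz_vdevs         # ~1 disk's worth of IOPS per vdev    -> ~3x
raidz_write = raidz_vdevs        # same story on the write side        -> ~3x

print(f"3-way mirrors  : ~{mirror_read}x read IOPS, ~{mirror_write}x write IOPS")
print(f"RAIDZ2, 3 vdevs: ~{raidz_read}x read IOPS, ~{raidz_write}x write IOPS")
```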
 

kitt001

Dabbler
Joined
Jun 2, 2017
Messages
29
Don't forget the writes. Especially on a somewhat fragmented pool, they're still going to need a lot of IOPS.

Isn't the point of the SLOG to buffer the writes? ... Is the issue simply the risk that if it falls too far behind, it may never catch up and just end up full?
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Isn't the point of the SLOG to buffer the writes?
Yeah, you still need to empty the ZIL in a reasonable amount of time. The same goes for non-sync writes: TXGs are limited to a few seconds and you can't have more than one pending, IIRC. So your disk subsystem needs to be sized for worst-case performance requirements.
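
As a crude illustration of "sized for worst-case", here's a toy check (illustrative numbers of my own, not anyone's actual pool) of whether a pool can flush a transaction group's worth of incoming writes before the next one piles up:

```python
# Toy worst-case sizing check: can the pool flush a transaction group's
# worth of incoming writes before the next one is ready?
# All numbers here are illustrative assumptions.

txg_seconds = 5            # default TXG interval, roughly
ingest_mb_per_s = 1000     # e.g. a saturated 10GbE link, ~1GB/s
pool_write_mb_per_s = 700  # sustained pool write throughput (assumption)

txg_size_mb = ingest_mb_per_s * txg_seconds
flush_seconds = txg_size_mb / pool_write_mb_per_s

print(f"TXG holds ~{txg_size_mb} MB; pool needs ~{flush_seconds:.1f}s to flush it")
if flush_seconds > txg_seconds:
    print("pool can't keep up: writers will eventually stall waiting on TXGs")
else:
    print("pool keeps up with the worst-case ingest rate")
```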
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
Isn't the point of the SLOG to buffer the writes? ... Is the issue simply the risk that if it falls too far behind, it may never catch up and just end up full?

SLOG devices are never read from except during recovery, so it's not really a write buffer; rather, it's a journal which is played back after a crash.

The journal allows ZFS to report a sync write as committed when it has only been written to the SLOG. The data is still in memory; once it's later written out to the pool, the copy on the SLOG is no longer needed.

So, to the extent that it's a write buffer at all, it's only a sync-write buffer.

ZFS's async write buffer is RAM.
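
If it helps, here's a deliberately dumbed-down toy model of that flow in Python. The names are made up and it's nothing like real ZFS internals; it just shows the shape: sync writes are acknowledged once they're in RAM and on the SLOG, the pool only gets written at TXG time, and the SLOG is only ever read back after a crash.

```python
# Toy model of the sync-write path described above. Hypothetical names;
# grossly simplified compared to real ZFS internals.

class ToyZfs:
    def __init__(self):
        self.ram_dirty = []   # async write buffer: just RAM
        self.slog = []        # journal: only read back after a crash
        self.pool = []        # stable storage

    def sync_write(self, data):
        self.slog.append(data)       # land it on the SLOG first...
        self.ram_dirty.append(data)  # ...but the live copy stays in RAM
        return "acknowledged"        # caller is told the write is safe

    def async_write(self, data):
        self.ram_dirty.append(data)  # no SLOG involvement at all
        return "acknowledged"

    def txg_commit(self):
        self.pool.extend(self.ram_dirty)  # dirty data goes to the pool from RAM
        self.ram_dirty.clear()
        self.slog.clear()                 # journal entries are now redundant

    def crash_recovery(self):
        self.pool.extend(self.slog)       # the only time the SLOG is read
        self.slog.clear()

zfs = ToyZfs()
zfs.sync_write("NFS sync write")
zfs.async_write("bulk copy")
zfs.txg_commit()
print(zfs.pool)   # both writes on stable storage; SLOG empty again
```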
 

kitt001

Dabbler
Joined
Jun 2, 2017
Messages
29
The journal allows ZFS to report a sync write as committed when it has only been written to the SLOG. The data is still in memory; once it's later written out to the pool, the copy on the SLOG is no longer needed.

So, in an ideal world, you'd need as much available RAM as you have SLOG space? If I'm following this correctly, once the 128GB of my system RAM is full, the surplus space in the 400GB SLOG is just doing nothing? Not that the extra capacity isn't good for wear leveling of the flash, etc. ... but the usable SLOG size is basically limited by the amount of system memory, from the sound of it. Am I following that right?
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
Worse than that, the SLOG only stores 2 or 3 transaction groups (I forget which).

A transaction group is by default up to 5 seconds (IIRC), so that's 15 seconds of transfers at most.

At 1Gbit speeds that's about 1.5GB.

At 10Gbit, about 15GB.

Thus anything above, say, 40GB is well and truly wasted.

If the transaction groups can't be flushed to disk fast enough, ZFS will block until one is.
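
Spelling the arithmetic out (using the same rounded line rates as above):

```python
# SLOG sizing arithmetic from the post above, spelled out.

txg_seconds = 5        # default TXG interval, roughly
txgs_held = 3          # SLOG holds only a couple of outstanding TXGs
window_s = txg_seconds * txgs_held   # ~15 seconds of sync writes

gbe_1_mb_per_s = 100    # ~1GbE in practice
gbe_10_mb_per_s = 1000  # ~10GbE in practice

print(f"1GbE : ~{gbe_1_mb_per_s * window_s / 1000:.1f} GB of SLOG ever in use")
print(f"10GbE: ~{gbe_10_mb_per_s * window_s / 1000:.0f} GB of SLOG ever in use")
# -> ~1.5GB and ~15GB; anything much beyond that (say 40GB+) just sits idle
```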
 

kitt001

Dabbler
Joined
Jun 2, 2017
Messages
29
Interesting ... does L1ARC suffer from the same kinds of limits? Or is it able to use all remaining available memory?
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
This has nothing to do with ARC. ARC is for reads only.
 

kitt001

Dabbler
Joined
Jun 2, 2017
Messages
29
I realize that ARC is the read side of things ... sorry if I was unclear in my question. I was just trying to determine: if one were to have 256GB or more of memory in a system, and the SLOG is limited to 40-45ish GB as Stux pointed out, can the ARC leverage the rest no matter how much there is? Or is there a point where additional memory just doesn't provide any benefit because nothing will end up using it?
 

Stux

MVP
Joined
Jun 2, 2016
Messages
4,419
This has nothing to do with ARC. ARC is for reads only.

Since we're discussing low-level mechanics: I believe ZFS actually does cache recent writes in ARC, in the hope that they will be read again. If they are, they get promoted within the ARC's recently-used/frequently-used lists, and the cache adapts.
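
Very roughly, the behaviour I mean looks like this toy two-list cache (the real ARC also keeps ghost lists and adapts the split between the two sides, which this ignores entirely):

```python
# Toy illustration of the behaviour described above: a freshly written
# (or freshly read) block sits on a "recent" list; touch it again and it
# is promoted to a "frequent" list. Nothing like the real ARC internals.

recent, frequent = {}, {}

def access(block, data=None):
    if block in frequent:                 # already promoted: stays frequent
        return frequent[block]
    if block in recent:                   # second touch: promote it
        frequent[block] = recent.pop(block)
        return frequent[block]
    recent[block] = data                  # first touch (e.g. a cached write)
    return data

access("blk0", "freshly written data")    # cached on the recent side
access("blk0")                            # read again -> promoted
print("recent:", list(recent), "frequent:", list(frequent))
```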
 