288 TB raw = 50 TB usable?

Status
Not open for further replies.

KevinM

Contributor
Joined
Apr 23, 2013
Messages
106
I just ordered a couple of 6048R-E1CR36L 36-drive Supermicro boxes with 8 TB HGST nearline SAS drives, Chelsio T520 NICs, 256 GB memory, and Intel P3700 NVMe cards for ZIL and L2ARC. I have a couple of other boxes I'll be using as replication targets, so I want these new systems to be reasonably fast. The use case will be NFS for VMware datastores and NFS mounts for various Linux servers, direct-attach iSCSI for Veeam and various other servers, and other misc stuff.

I've been playing with the ZFS space calculator at https://jsfiddle.net/Biduleohm/paq5u7z5/1/embedded/result/, and if I'm understanding the results correctly, it's saying I have 103 TiB (113 TB) usable, not including the roughly 26 TiB (28 TB) that is the minimum recommended free space? If that's the case, and I want to follow best practice and not use more than 50%, I'm realistically looking at ~50 TB per box. Does 50 TB usable sound like a reasonable expectation if starting with 288 TB raw?

Code:
Drive size: 8 TB
Number of drives: 36
RAID type: Mirror (requires at least 2 drives)
MTTPR: 72 h
Rebuild speed: 100 MiB/s
                                      TiB       TB       %
Drive size                          7.276        8     N/A
Total parity space                    131      144      50
Total data space                      131      144      50
Total RAID space                    261.9      288     100
Metadata overhead                   2.095    2.304     1.6
Blocks overhead                         0        0       0
Total overhead                      2.095    2.304     1.6
Minimum recommended free space      25.77    28.34   19.68
Usable data space                   103.1    113.4   78.72
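
The figures in the table above can be reproduced with a few lines of arithmetic. The following is a minimal sketch, not the calculator's actual code: the 1.6% metadata overhead and the 20% free-space reserve (taken from what remains after metadata) are assumptions inferred from the numbers in the table, and the function name `mirror_capacity_tib` is made up for illustration.

```python
TB = 10**12   # decimal terabyte, as used on drive labels
TIB = 2**40   # binary tebibyte, as reported by the calculator


def mirror_capacity_tib(drives, drive_tb, metadata_frac=0.016, free_frac=0.20):
    """Estimate space for a pool of two-way mirrors, in TiB.

    metadata_frac and free_frac are assumptions inferred from the
    calculator output, not values documented by the tool.
    """
    raw = drives * drive_tb * TB / TIB      # total RAID space
    data = raw / 2                          # two-way mirrors keep half
    metadata = data * metadata_frac         # metadata overhead
    after_metadata = data - metadata
    reserve = after_metadata * free_frac    # "minimum recommended free space"
    usable = after_metadata - reserve
    return {
        "raw_tib": raw,
        "data_tib": data,
        "metadata_tib": metadata,
        "reserve_tib": reserve,
        "usable_tib": usable,
    }


result = mirror_capacity_tib(drives=36, drive_tb=8)
for name, value in result.items():
    print(f"{name:>13}: {value:8.2f}")

# Applying a further 50% cap to the calculator's "usable" figure:
print(f"50% of usable: {result['usable_tib'] / 2:.1f} TiB")
```

Under these assumptions the 50% cap lands at roughly 51.5 TiB, which is where the "~50" figure comes from.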
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
No, the free space recommendation there is crap for VM usage (sorry Bidule0hm). That's just an 80% recommendation.

If you're doing simple mirroring, you'll wind up with around 130TiB of pool space, of which you should use ~25-50%, so 32-64TB of space, depending on how much performance you want for writes. You're already looking at 256GB of memory, so "that's good." Hopefully you've seen any of the various posts where I talk about pool occupancy and fragmentation so that you understand the tradeoffs there. With such a large pool, that implies a fairly large working set. If so, you might want to consider picking up a pair of the Intel 750 1.2TB's instead of the much pricier P3700, installing both of them, and then run just one to see how your ARC/L2ARC ratios work out. This means you have a spare ready-to-go if you burn through one unit too quickly, or if your ARC stats are such that you could actually make good use of more L2ARC, you can add the other, and be caching ~2.5TB.
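
As a rough illustration of the occupancy arithmetic above (a sketch only; the helper name `occupancy_budget_tib` is hypothetical, and the 25-50% band is simply the range quoted in this post):

```python
def occupancy_budget_tib(pool_tib, low=0.25, high=0.50):
    """Return the (low, high) amount of data to store for a target occupancy band."""
    return pool_tib * low, pool_tib * high


# ~130 TiB is the mirrored pool size quoted above.
low, high = occupancy_budget_tib(130)
print(f"Target occupancy: {low:.1f}-{high:.1f} TiB of a 130 TiB pool")
```

This gives 32.5 to 65 TiB, in line with the 32-64 range quoted above. The lower end favors write performance and the upper end favors capacity.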
 

Bidule0hm

Server Electronics Sorcerer
Joined
Aug 5, 2013
Messages
3,710
Yeah, I want to add an input field for the free space recommended (so you can put 50 % or whatever you want) but I just don't have time to work on the app right now...
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
But you have time to post replies! ;-)
 

Bidule0hm

Server Electronics Sorcerer
Joined
Aug 5, 2013
Messages
3,710
Yep, because I think it's more important than the app :)
 

KevinM

Contributor
Joined
Apr 23, 2013
Messages
106
I like the two 750s idea, but PCIe availability is a little tight with this build. In addition to the HBA, each box has two Chelsio T520s, an Intel X520-LR card for DR snapshot replication, and 3 P3700s, two for ZIL and 2 for L2ARC.

There's a degree of guesswork with the sizing, and I did consider getting 512 GB memory and more L2ARC; however, the bucket of money was finite and it would have meant skimping in other areas. Also, at least a third of the storage is going to be audio/visual stuff that will need to be archived for legal reasons and mostly never looked at again. The rest will be divided between Veeam backups and virtual machines.

I'll do some testing and post the results here once I get up and running, but I'm hoping the caching will be enough to hold the working set.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Be aware that ZFS no longer needs mirrored SLOG devices. A failure of the SLOG is not fatal and in a crisis it is pretty easy just to disable sync writes.
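
For illustration, the two recovery steps mentioned above map onto standard `zfs` and `zpool` commands. The sketch below only builds the command lines and does not run them; the dataset, pool, and device names (`tank/vmware`, `tank`, `nvd0`) are hypothetical placeholders. Note that with `sync=disabled`, writes that were acknowledged but not yet committed can be lost on a crash or power failure, so this is an emergency measure rather than a normal setting.

```python
import shlex


def disable_sync_command(dataset):
    """Command to turn off synchronous write semantics for one dataset."""
    return ["zfs", "set", "sync=disabled", dataset]


def remove_slog_command(pool, device):
    """Command to detach a failed log (SLOG) device from a pool."""
    return ["zpool", "remove", pool, device]


# Hypothetical names; substitute the real dataset, pool, and device.
for cmd in (
    disable_sync_command("tank/vmware"),
    remove_slog_command("tank", "nvd0"),
):
    print(shlex.join(cmd))
```

To restore normal behavior once a replacement log device is in place, the dataset's `sync` property can be set back to `standard`.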
 

KevinM

Contributor
Joined
Apr 23, 2013
Messages
106
I knew that, but I thought best practice was to mirror if you're using a separate SLOG device. At any rate, the order has been placed, so I can't do anything about it now.

A lot of the workload will be Veeam backups and audio-visual stuff that will be stored and rarely if ever looked at, so the sizing might still be OK. But the truth is I have no idea what the size of the working set will be. If it turns out to be way off, we might be able to do a mid-cycle refresh with more memory and L2ARC.
 