Greetings FreeNAS Admins - Test Box - 32 Cores and 10GbE


Babbage

Dabbler
Joined
Nov 11, 2015
Messages
13
Hello all, there seem to be quite a few smart people on this site. Thanks for all of your posts.

I've used FreeBSD for web servers and MTAs (Postfix) for years, mostly with 3ware RAID mirrors. I love being able to take a server down, pull a drive, power on, go into the BIOS, mark the new drive as a spare, let the RAID rebuild, then swap the other drive a few days later. Now I have two full usable backups, with two new drives in the server.

My Test Box:


KGPE-D16 - 32 Opteron cores - 64GB non-ECC - Antec Titan 550
Seasonic Snow Silent 750
LSI 9266-8i set up as JBOD from the CLI (not sure which firmware I have or whether the cache is being used)
Chelsio TR540 - 10GbE - with 10Gb iSCSI links to ESXi. Thanks for the tip; I'd never heard of Chelsio before.
(5) 2TB WD enterprise SATA 6Gb/s drives

Speed seems great. I was able to get 3-5Gbps of traffic from the ESXi hosts writing to my FreeNAS server via iSCSI, as seen on the FreeNAS cxl0 interface, and this was with all 5 drives in a single RAIDZ1. It actually got the CPU up to about 10 percent (all 32 cores). (I won't be using Z1 for production.)


I also ran CrystalDiskMark 3.0.3 on my VM with local DAS, then on the same VM via iSCSI on FreeNAS, and the speed was better than an SSD. (Small 32GB Windows 7 box.) Probably running out of RAM (i.e., cached in ARC).

I went with 2TB WD SATA enterprise drives since the seek time seemed to be a lot better than on the 4TB drives, but 2TB of capacity per drive isn't so great. Anyone have thoughts on whether this matters?


On deck: 2U Supermicro storage server - LSI SAS3008 + SAS3 backplane with (12) 3.5" hot-swap bays - 32GB DDR4 ECC - redundant PSUs - Xeon (LGA 2011-v3), 1.6GHz, 6-core [hope that's enough CPU]



 

Babbage

Dabbler
Joined
Nov 11, 2015
Messages
13
If possible, you should replace the LSI 9266-8i with a proper HBA like a 9211-8i. RAIDZ and iSCSI aren't a great combination.
Thanks. Can you explain why iSCSI and RAIDZ (1, 2, 3) aren't a good combo, or provide a link? Sorry if this has been beaten to death already.
 

anodos

Sambassador
iXsystems
Joined
Mar 6, 2014
Messages
9,554
Thanks. Can you explain why iSCSI and RAIDZ (1, 2, 3) aren't a good combo, or provide a link? Sorry if this has been beaten to death already.
Each RAIDZ vdev has the random IOPS performance of a single disk. The general rules for good iSCSI performance are:

1) Lots of RAM
2) Lots of vdevs (pretty much the only way to do this in a 12-bay chassis is to go with mirrors)
3) Not filling the pool more than 50%

I think @jgreco has written fairly extensively on the matter of iSCSI.
For example: https://forums.freenas.org/index.ph...res-more-resources-for-the-same-result.28178/
 

Babbage

Dabbler
Joined
Nov 11, 2015
Messages
13
Each RAIDZ vdev has the random IOPS performance of a single disk. The general rules for good iSCSI performance are:

1) Lots of RAM
2) Lots of vdevs (pretty much the only way to do this in a 12-bay chassis is to go with mirrors)
3) Not filling the pool more than 50%

I think @jgreco has written fairly extensively on the matter of iSCSI.
For example: https://forums.freenas.org/index.ph...res-more-resources-for-the-same-result.28178/

Good thread, thanks for the reply. So does rule #3 still apply with a RAID 10 setup?
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Good thread, thanks for the reply. So does rule #3 still apply with a RAID 10 setup?

Still applies. The idea is that you avoid fragmentation by having enough contiguous free space to copy-on-write every existing block. Going over 50% doesn't compromise data integrity; it only reduces performance.

Speaking of writes and performance, an important note here: ESXi does not do sync writes over iSCSI by default. This makes it fast as hell, but potentially puts data at risk in case of a power outage or component failure. To mitigate this, you need to force sync writes (zfs set sync=always zvolname), but then your write performance craters. You fix that by adding a fast SLOG (separate log) device.

See this thread by jgreco about sync writes, and why iSCSI seems so much faster than NFS:

https://forums.freenas.org/index.ph...xi-nfs-so-slow-and-why-is-iscsi-faster.12506/
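
A minimal sketch of that mitigation (the zvol and SLOG device names are placeholders, not from this thread):

Code:
# treat every write to the zvol backing the iSCSI extent as synchronous
zfs set sync=always tank/iscsi-extent

# add a fast, power-loss-protected SSD as a dedicated SLOG to absorb the sync writes
zpool add tank log gpt/slog0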
 

Babbage

Dabbler
Joined
Nov 11, 2015
Messages
13
Still applies. The idea is that you avoid fragmentation by having enough contiguous free space to copy-on-write every existing block. Going over 50% doesn't compromise data integrity; it only reduces performance.

Speaking of writes and performance, an important note here: ESXi does not do sync writes over iSCSI by default. This makes it fast as hell, but potentially puts data at risk in case of a power outage or component failure. To mitigate this, you need to force sync writes (zfs set sync=always zvolname), but then your write performance craters. You fix that by adding a fast SLOG (separate log) device.

See this thread by jgreco about sync writes, and why iSCSI seems so much faster than NFS:
https://forums.freenas.org/index.ph...xi-nfs-so-slow-and-why-is-iscsi-faster.12506/

Ran a quick test on the test box today. It now has (8) 2TB WD SATA 6Gb/s drives in it. Still playing with RAID settings. A ballpark of 50% slower seems to apply here.

A 300GB clone from ESXi, as seen from FreeNAS, yielded:
(zfs set sync=disabled) ~ 2Gbps
(zfs set sync=always) ~ 1Gbps

By adding an Intel S3700 100GB SSD as a SLOG ($200 isn't bad), can I "more safely" leave sync=disabled, or is the whole point to actually sync the writes at better speed?
Do I need 2 of them in a RAID 0 to get 5Gbps+? (I realize this depends on compression.)

Thanks for the advice - I need to read some more on this.

 

depasseg

FreeNAS Replicant
Joined
Sep 16, 2014
Messages
2,874
Ran a quick test on the test box today. It now has (8) 2TB WD SATA 6Gb/s drives in it. Still playing with RAID settings. A ballpark of 50% slower seems to apply here.

A 300GB clone from ESXi, as seen from FreeNAS, yielded:
(zfs set sync=disabled) ~ 2Gbps
(zfs set sync=always) ~ 1Gbps

By adding an Intel S3700 100GB SSD as a SLOG ($200 isn't bad), can I "more safely" leave sync=disabled, or is the whole point to actually sync the writes at better speed?
Do I need 2 of them in a RAID 0 to get 5Gbps+? (I realize this depends on compression.)

Thanks for the advice - I need to read some more on this.
If you set sync=disabled, then the SLOG isn't used at all. It's only useful where sync writes actually happen, i.e. with sync=standard or sync=always.
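
As a quick reference (the dataset name here is just a placeholder):

Code:
# sync=disabled - sync requests are ignored; the SLOG is never touched
# sync=standard - sync requests from clients are honored; the SLOG absorbs those
# sync=always   - every write is treated as sync; the SLOG absorbs all writes
zfs get sync tank/iscsi-extent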

 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Isn't it that smaller disks "always" have better seek times?

Sure, maybe 10% faster, but then you start looking at the performance penalty for fragmentation. Look at the Delphix steady state graph for a pool that's 50% full, and then note that the same pool at 25% full is about three times faster.

[Graph: Delphix steady-state ZFS write performance vs. pool capacity used]

Is that larger hard drive suddenly seeming a little faster?
 

Babbage

Dabbler
Joined
Nov 11, 2015
Messages
13
Sure, maybe 10% faster, but then you start looking at the performance penalty for fragmentation. Look at the Delphix steady state graph for a pool that's 50% full, and then note that the same pool at 25% full is about three times faster.

[Graph: Delphix steady-state ZFS write performance vs. pool capacity used]

Is that larger hard drive suddenly seeming a little faster?

I guess so. No defrag utility for ZFS? Nice mess! So VMDK files end up fragmented even though the writes are issued in order (sync=always)? Why doesn't the file system stay "neat"? I'm sure the underlying Windows MFTs are a mess too.

Where's the graph for NFS? As I understand it, NFS doesn't have the same issue. (Thanks, Joe.)
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Ran a quick test on the test box today. It now has (8) 2TB WD SATA 6Gb/s drives in it. Still playing with RAID settings. A ballpark of 50% slower seems to apply here.

A 300GB clone from ESXi, as seen from FreeNAS, yielded:
(zfs set sync=disabled) ~ 2Gbps
(zfs set sync=always) ~ 1Gbps

By adding an Intel S3700 100GB SSD as a SLOG ($200 isn't bad), can I "more safely" leave sync=disabled, or is the whole point to actually sync the writes at better speed?
Do I need 2 of them in a RAID 0 to get 5Gbps+? (I realize this depends on compression.)

Thanks for the advice - I need to read some more on this.

@depasseg already noted that the SLOG is what enables you to keep sync=always without harming performance. A 100GB S3700 might not improve those numbers much; I believe I've been able to squeeze ~85-90MB/s (680-720Mbps) out of those models. The 200GB model should improve that to around 140-160MB/s.

Clones will also use the VAAI XCOPY primitive, so you're copying largely sequentially at the ZFS level. Try running Iometer or another random-write benchmark against a VMFS datastore and I imagine you'll see significantly worse performance from sync=always, as your disks will need to do a lot more seeking around. Those sync=always numbers still seem high without an SLOG, though. I suspect your LSI card is doing some manner of write-caching. I believe the command to enable true JBOD passthrough is

MegaCli -AdpSetProp -EnableJBOD 1 -aall

I'm not entirely sure, though; The Google pointed me there.

Regarding higher performance: if you're only able to get 2Gbps with sync=disabled, that's your "as fast as possible, safety be damned" mark. With 8x 2TB in mirror vdevs you should be able to push more than that, though. I wonder if your LSI's write-caching is hurting performance here.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
I guess so. No defrag utility for ZFS? Nice mess! So VMDK files end up fragmented even though the writes are issued in order (sync=always)? Why doesn't the file system stay "neat"? I'm sure the underlying Windows MFTs are a mess too.

Where's the graph for NFS? As I understand it, NFS doesn't have the same issue. (Thanks, Joe.)

It's a natural side effect of any copy-on-write filesystem. An update to a block does not overwrite the existing block; the new data first gets committed elsewhere on disk, then the metadata gets updated to point at it.

This is the sort of thing that allows ZFS to do snapshotting and other nifty tricks. It works REALLY WELL for file-based storage, where you're likely to be rewriting an entire file or updating a large section of an existing file, so fragmentation doesn't really become a massive issue. It does, however, become a major issue for any access model where you're updating tiny little noncontiguous chunks: sectors in a VM disk file, databases, etc.

A defrag utility is nearly an impossibility because of the sheer complexity of the data structures that would need updating. Windows defrag on a DOS filesystem is pretty simple: it's a small disk; there's the directory, there's the list of blocks; move them into order and rewrite. ZFS can potentially have hundreds or thousands of different metadata references to a single block, and the only way to find them all would be a traversal of the whole pool - and the pool is likely terabytes or even petabytes in size, so figuring out how to "defrag" it would be a massive problem.
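
You can't undo the fragmentation short of rewriting the data, but you can at least watch it accumulate (the pool name here is a placeholder):

Code:
# report free-space fragmentation and fill level for the pool
zpool list -o name,capacity,fragmentation tank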
 

depasseg

FreeNAS Replicant
Joined
Sep 16, 2014
Messages
2,874
I've never seen the addition of a SLOG and sync=always to be "without harming performance." It always comes with a penalty.
Agree. But the penalty is considerably less with a decent SLOG.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
I've never seen the addition of a SLOG and sync=always to be "without harming performance." It always comes with a penalty.

Okay, "without brutally beating performance to the point where even Mike Tyson would suggest you ease up a little bit" then.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Code:
# zfs set sync=standard storage3; dd if=/dev/zero of=file bs=1048576 count=1048576; rm file; zfs set sync=always storage3; dd if=/dev/zero of=file bs=1048576 count=1048576; rm file; zfs set sync=standard storage3
1048576+0 records in
1048576+0 records out
1099511627776 bytes transferred in 3189.541550 secs (344724033 bytes/sec)
1048576+0 records in
1048576+0 records out
1099511627776 bytes transferred in 13362.986037 secs (82280384 bytes/sec)


That's an Intel 750...
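
(That works out to 1TiB written at ~345MB/s with sync=standard versus ~82MB/s with sync=always: roughly a 4x penalty even with an Intel 750 handling the log.)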
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Right, now run that same test without the 750 and you'll get ~10% of that performance I'd wager. ;)

Less, actually, but the point I was making was

I've never seen the addition of a SLOG and sync=always to be "without harming performance." It always comes with a penalty.

which I think my demonstration adequately addresses.
 

Babbage

Dabbler
Joined
Nov 11, 2015
Messages
13
@depasseg already noted that the SLOG is what enables you to keep sync=always without harming performance. A 100GB S3700 might not improve those numbers much; I believe I've been able to squeeze ~85-90MB/s (680-720Mbps) out of those models. The 200GB model should improve that to around 140-160MB/s.

Clones will also use the VAAI XCOPY primitive, so you're copying largely sequentially at the ZFS level. Try running Iometer or another random-write benchmark against a VMFS datastore and I imagine you'll see significantly worse performance from sync=always, as your disks will need to do a lot more seeking around. Those sync=always numbers still seem high without an SLOG, though. I suspect your LSI card is doing some manner of write-caching. I believe the command to enable true JBOD passthrough is

MegaCli -AdpSetProp -EnableJBOD 1 -aall

I'm not entirely sure, though; The Google pointed me there.

Regarding higher performance: if you're only able to get 2Gbps with sync=disabled, that's your "as fast as possible, safety be damned" mark. With 8x 2TB in mirror vdevs you should be able to push more than that, though. I wonder if your LSI's write-caching is hurting performance here.

With sync=disabled I was able to get 5Gbps+ while copying multiple files to FreeNAS. The 300GB test copy wasn't enough to max it out, but it gave me an idea of how much it hurts.

After reading some posts by the ZFS code authors, sync=standard may be OK for me: brand-new equipment, redundancy, A/C, UPS, plus a second FreeNAS box to replicate to. As for FreeNAS crashing, I haven't been able to crash it yet. My ESXi hosts are 600+ days up, and I've had many FreeBSD servers with uptimes measured in years...

Got my new box and it's faster than the test box (which only did 9Gbps locally):

Code:
local 127.0.0.1 port 24920 connected with 127.0.0.1 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  23.1 GBytes  19.8 Gbits/sec
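
Presumably that's iperf2 loopback output, from something like this (a sketch; assumes an iperf server already listening locally via iperf -s):

Code:
# 10-second TCP throughput test against a local iperf server
iperf -c 127.0.0.1 -t 10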



Still testing things. Thanks for the tips, guys.
 