iSCSI Performance Problems


HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Ditto, mine is also 234 bytes. I'll delete and re-copy/paste the file from scratch, but this time I won't touch the permissions or anything else. We'll see what happens.

Update: It did exactly the same thing. Claims there's a syntax error on line 3.

I have no explanation other than "your system is a jerk"

[attached screenshot of the error]
 

Mlovelace

Guru
Joined
Aug 19, 2014
Messages
1,111
The dtrace script will help narrow it down, but I'm still at a loss as to why it won't run. That's the exact script I have, and it runs on my machine. Is it exactly 234 bytes in an ls? If it's not, there may be stray characters in there.
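If the sizes do match, a couple of quick checks will expose invisible characters anyway; the usual culprit for a mystery dtrace syntax error is Windows-style CRLF line endings picked up during copy/paste. A minimal sketch using stock FreeBSD tools:
Code:
root@freenas:~ # ls -l dirty.d    # byte count should match the known-good copy
root@freenas:~ # file dirty.d     # reports "with CRLF line terminators" if the line endings are wrong
root@freenas:~ # cat -ve dirty.d  # -v shows non-printing characters, -e marks line ends with $; CRLF shows as ^M$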

Edit: I'm updating a spare machine from 11.1-U5 to 11.1-U6; once it's done, I'll make sure the dtrace script still runs there.
Your script works fine; I just ran it on one of my 11.1-U6 systems without a hitch.
Code:
root@CLNAS02:~ # dtrace -s dirty.d tank
dtrace: script 'dirty.d' matched 2 probes
CPU     ID                   FUNCTION:NAME
  6  63083             none:txg-syncing      0MB of 3269MB used
  6  63083             none:txg-syncing     56MB of 3269MB used
  6  63083             none:txg-syncing     64MB of 3269MB used
  7  63083             none:txg-syncing     64MB of 3269MB used
  3  63083             none:txg-syncing     64MB of 3269MB used
  3  63083             none:txg-syncing     64MB of 3269MB used
  2  63083             none:txg-syncing     64MB of 3269MB used
  4  63083             none:txg-syncing     64MB of 3269MB used
  6  63083             none:txg-syncing     64MB of 3269MB used
  1  63083             none:txg-syncing     64MB of 3269MB used
  4  63083             none:txg-syncing     64MB of 3269MB used
  4  63083             none:txg-syncing     64MB of 3269MB used
  7  63083             none:txg-syncing     64MB of 3269MB used
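For reference, the output format above matches the dirty-data script from Adam Leventhal's OpenZFS write-throttle blog post, so dirty.d is presumably along these lines (a sketch, not necessarily the exact copy in use here):
Code:
/* dirty.d: print dirty data vs. zfs_dirty_data_max each time a txg syncs */
txg-syncing
{
        this->dp = (dsl_pool_t *)arg0;
}

txg-syncing
/this->dp->dp_spa->spa_name == $$1/    /* $$1 is the pool name, e.g. "tank" */
{
        printf("%4dMB of %4dMB used", this->dp->dp_dirty_total / 1024 / 1024,
            `zfs_dirty_data_max / 1024 / 1024);
}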

[attached image: filecopy.jpg]
 
Joined
May 10, 2017
Messages
838
I'm wondering if the CPU clock has anything to do with the bottleneck.

Possibly, for SMB. You mentioned they peak at 700MB/s, but I didn't catch whether that's just at the start before they drop, or whether they stay constant. If they stay constant, it might be the CPU: SMB is single-threaded, so clock speed beats more cores. But that can't explain a transfer that starts fast and then drops off; a drop-off is usually the pool limiting the speed after the RAM cache fills up.
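One way to tell those cases apart, not yet tried in this thread, is to take the disks out of the picture with a raw network test; if that also tops out around 700MB/s, the limit is the network or CPU path rather than the pool. A sketch, assuming iperf3 is available on both ends and using a placeholder address:
Code:
root@freenas:~ # iperf3 -s                  # listen on the FreeNAS box
C:\> iperf3 -c 192.168.10.5 -P 4 -t 30      # from the Windows client: 4 parallel streams for 30 seconds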
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
The SSDs are an encrypted 12-drive RAIDZ2
The HDDs are an encrypted 8-drive RAIDZ2

I wouldn't expect either of these to work too well, and the HDD Z2 pool is likely to be a performance catastrophe.

https://forums.freenas.org/index.ph...d-why-we-use-mirrors-for-block-storage.44068/

You can ditch the iSCSI in favor of something saner and it should get somewhat better. Depending on how full and fragmented your pool already is, of course.

Also:

I set the MTU on the client NICs to 9014 and on the servers to 9014 to take load off the CPU, as it was causing a bottleneck on the server.

What made you think *that*?! This is extremely unlikely. Any semi-modern kit with hardware offload on a vaguely modern CPU should have no trouble at all with 1500-byte packets. All you're likely to be doing is creating other problems for yourself.
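An aside on the numbers, in case jumbo frames get tested anyway: in Windows NIC drivers the "Jumbo Packet" value of 9014 typically includes the 14-byte Ethernet header, so the matching MTU on the FreeBSD side is 9000, not 9014, and a mismatch causes its own problems. A quick sketch (ix0 and the address are placeholders):
Code:
root@freenas:~ # ifconfig ix0 mtu 9000           # jumbo frames on a hypothetical ix0 interface
root@freenas:~ # ping -D -s 8972 192.168.10.20   # -D forbids fragmentation; 8972 + 28 bytes of headers = 9000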
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
I wouldn't expect either of these to work too well, and the HDD Z2 pool is likely to be a performance catastrophe.

While I fully agree with RAIDZ2 on spinning disks being abysmal, I'd have expected better than ~220MB/s from the SSDs, even with the overhead of a block protocol in there.

Any thoughts on his box being unwilling to play ball with dtrace?
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
While I fully agree with RAIDZ2 on spinning disks being abysmal, I'd have expected better than ~220MB/s from the SSDs, even with the overhead of a block protocol in there.

SSD speeds can tank rapidly. SSD write speeds depend on the size of the device's free page pool. I've talked about this before, especially with respect to the SLOG; see

https://forums.freenas.org/index.php?threads/slog-underprovisioning.38374/

etc. The SLOG has an optimal (sequential) usage pattern for writes, and if it's sized appropriately you ought to be able to get write speeds approaching the speed at which the SSD can block-erase. This is not likely to be true within a pool, especially one where you have RAIDZ2 and block size issues to consider. Think about what this workload forces the SSDs to sustain: lots of random writes, which means lots of flash page updates, which means the free page pool exhausts rapidly. You get a burst of performance while ZFS and the SSD can keep pace, consuming ZIL and free flash pages at a furious rate; then you run out of one, then the other, and your filer faceplants.

You can look back to the first message in this thread to see a classic faceplant.

So there's stuff you need to do. Part of it is to use mirrors (read the previously provided link) and to optimize your block sizes, both of which will contribute to significantly better performance within ZFS (applicable to BOTH SSD and HDD). Then maybe underprovision your SSDs to increase the free page pool, and make sure your pool isn't too full (remembering that beyond 50% can carry a heavy tax). For SSDs, make sure the drive supports TRIM and that TRIM is functioning, which gives the drive a head start on keeping the free page pool larger. When you do all the right things, you minimize unnecessary writes to the SSDs, which gives you the biggest portion of the performance boost... or at least that's my opinion. These may not be the ONLY factors involved, but my guess is that they're playing a significant role here.
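On the TRIM point, FreeNAS 11.1 exposes the relevant counters through sysctl, and a common way to underprovision is simply to hand ZFS a partition smaller than the disk (this only helps if the leftover space has never been written, or the drive was secure-erased first). A sketch with hypothetical device names:
Code:
root@freenas:~ # sysctl vfs.zfs.trim.enabled          # 1 means ZFS TRIM is enabled
root@freenas:~ # sysctl kstat.zfs.misc.zio_trim       # success vs. unsupported counters; unsupported > 0 means TRIM isn't reaching the drives
root@freenas:~ # gpart create -s gpt da0              # hypothetical SSD
root@freenas:~ # gpart add -t freebsd-zfs -s 300G da0 # e.g. use 300G of a 480G drive, leaving the remainder as free pages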

Remember that you can get insane write speeds out of ZFS on a HDD pool at 10% occupancy. This is all a game of optimizing your resources so that ZFS can do its magic. The flip side of the coin is that it's easy to throw wrenches into the mechanism and screw it all up.

Any thoughts on his box being unwilling to play ball with dtrace?

No, sorry.
 

Windows7ge

Contributor
Joined
Sep 26, 2017
Messages
124
I have no explanation other than "your system is a jerk"
Wouldn't surprise me. Whenever I want to configure something new, one thing or another doesn't want to cooperate.

Possibly, for SMB. You mentioned they peak at 700MB/s, but I didn't catch whether that's just at the start before they drop, or whether they stay constant. If they stay constant, it might be the CPU: SMB is single-threaded, so clock speed beats more cores. But that can't explain a transfer that starts fast and then drops off; a drop-off is usually the pool limiting the speed after the RAM cache fills up.
Yeah, it's not constant. It usually drops off and floats around 500~600MB/s.

You can ditch the iSCSI in favor of something saner and it should get somewhat better. Depending on how full and fragmented your pool already is, of course.
Unfortunately my experience has been the opposite. SMB has been the worst performer @ 10Gbit, I've failed to get NFS working on Windows at all, and the only other option has been iSCSI, which does work @ 10Gbit, just not over a full transfer.

What made you think *that*?! This is extremely unlikely. Any semi-modern kit with hardware offload on a vaguely modern CPU should have no trouble at all with 1500-byte packets. All you're likely to be doing is creating other problems for yourself.
Possibly, but in my experience, using SMB and NOT enabling jumbo packets on the NIC has caused about a 200MB/s loss. I can't say I've tested this with the latest FreeNAS release, though. Perhaps something has changed.

etc. The SLOG has an optimal (sequential) usage pattern for writes, and if it's sized appropriately you ought to be able to get write speeds approaching the speed at which the SSD can block-erase. This is not likely to be true within a pool, especially one where you have RAIDZ2 and block size issues to consider. Think about what this workload forces the SSDs to sustain: lots of random writes, which means lots of flash page updates, which means the free page pool exhausts rapidly. You get a burst of performance while ZFS and the SSD can keep pace, consuming ZIL and free flash pages at a furious rate; then you run out of one, then the other, and your filer faceplants.

You can look back to the first message in this thread to see a classic faceplant.

So there's stuff you need to do. Part of it is to use mirrors (read the previously provided link) and to optimize your block sizes, both of which will contribute to significantly better performance within ZFS (applicable to BOTH SSD and HDD). Then maybe underprovision your SSDs to increase the free page pool, and make sure your pool isn't too full (remembering that beyond 50% can carry a heavy tax). For SSDs, make sure the drive supports TRIM and that TRIM is functioning, which gives the drive a head start on keeping the free page pool larger. When you do all the right things, you minimize unnecessary writes to the SSDs, which gives you the biggest portion of the performance boost... or at least that's my opinion. These may not be the ONLY factors involved, but my guess is that they're playing a significant role here.

Remember that you can get insane write speeds out of ZFS on a HDD pool at 10% occupancy. This is all a game of optimizing your resources so that ZFS can do its magic. The flip side of the coin is that it's easy to throw wrenches into the mechanism and screw it all up.

We know TRIM is working on the SSDs. The SSDs are Intel DC S4500s, so we believe under-provisioning shouldn't be necessary, as they already have a dedicated section of flash set aside to handle wear.

The only reason I don't like the idea of mirrors is that it means I can only use 50% of the total capacity. However, as I've been told by others, it drastically increases I/O performance over RAID5/6.

I should state again that this performance issue, be it iSCSI or SMB, is identical on both the HDD and SSD arrays. Network performance does not exceed that 700MB/s peak using SMB regardless of array, and iSCSI does the exact same thing on both arrays, which makes me think the issue lies elsewhere.

HOWEVER, this late in the game, if mirrors are supposed to be substantially better, I'm willing to at least give them a try and see if they really improve the situation. I'll even ditch iSCSI for SMB if we see any improvement there.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Based simply on blasting "clean slate" drives with 128K records:

Eight spinning drives in RAIDZ2 gives me write speeds of 250MB/s.
The same eight drives in mirrors gives me 550MB/s.

So yes, mirrors should be substantially better.
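The test itself isn't shown; a typical clean-slate 128K sequential write test looks something like this (hypothetical pool/dataset names; compression has to be off, or the zeros compress away and the numbers are meaningless):
Code:
root@freenas:~ # zfs create -o compression=off tank/speedtest
root@freenas:~ # dd if=/dev/zero of=/mnt/tank/speedtest/bigfile bs=128k count=81920
# ~10GiB of 128K writes; dd prints bytes/sec when it finishes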
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
This is really a matter of dotting all your i's and crossing all your t's, in addition to speaking in complete sentences with proper sentence structure and spelling and all that. It will only work well when you get it *all* right, and then, it will work astonishingly well. Otherwise you will experience varying degrees of suck. :-/
 
Joined
May 10, 2017
Messages
838
Possibly, but in my experience, using SMB and NOT enabling jumbo packets on the NIC has caused about a 200MB/s loss. I can't say I've tested this with the latest FreeNAS release, though. Perhaps something has changed.

I found the same, though my NIC is quite old and likely lacking the CPU offload features present on newer ones; I can only get around 800MB/s with MTU at 1500 vs. 1.1GB/s with jumbo frames enabled.
 

Windows7ge

Contributor
Joined
Sep 26, 2017
Messages
124
This is really a matter of dotting all your i's and crossing all your t's, in addition to speaking in complete sentences with proper sentence structure and spelling and all that. It will only work well when you get it *all* right, and then, it will work astonishingly well. Otherwise you will experience varying degrees of suck. :-/

Based simply on blasting "clean slate" drives with 128K records:

Eight spinning drives in RAIDZ2 gives me write speeds of 250MB/s.
The same eight drives in mirrors gives me 550MB/s.

So yes, mirrors should be substantially better.

Unfortunately I'm not all that savvy with Linux/Unix as a whole, so based on what you're saying, chances are I'm aiming for the least degree of suck. I work with files varying from 1KB to 15GB, so to my understanding there's no setup that would yield the best performance across such a broad spectrum. Of course, I'm more interested in higher transfer rates for the bigger files.

I can start by blasting the current HDD RAIDZ2, wiping the drives clean (filling them with zeros), setting up RAIDZ2 again, and testing an SMB share, then testing again with mirrors. If the results are different, the RAID layout is at fault. If they're identical, something else is to blame.
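For what it's worth, the wipe step would be something like this (hypothetical pool and device names; this destroys the pool and everything on it):
Code:
root@freenas:~ # zpool destroy hddpool               # hypothetical pool name, after anything worth keeping is copied off
root@freenas:~ # dd if=/dev/zero of=/dev/da0 bs=1m   # zero the first disk; repeat for the remaining seven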
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Unfortunately I'm not all that savvy with Linux/Unix as a whole, so based on what you're saying, chances are I'm aiming for the least degree of suck. I work with files varying from 1KB to 15GB, so to my understanding there's no setup that would yield the best performance across such a broad spectrum. Of course, I'm more interested in higher transfer rates for the bigger files.

I can start by blasting the current HDD RAIDZ2, wiping the drives clean (filling them with zeros), setting up RAIDZ2 again, and testing an SMB share, then testing again with mirrors. If the results are different, the RAID layout is at fault. If they're identical, something else is to blame.

In general, iSCSI is going to suck because you lose most of the benefits of a NAS, chief among them the ability to share files; with iSCSI you can only do that with a cluster-aware filesystem. When you attach an iSCSI drive to your PC, it's not much better than a big external USB disk, except that you also get data protection. It's usually a mistake to use iSCSI unless you have a very specific reason: the resource overhead is very high, the cost is very high, and the benefits are modest.

RAIDZ is heavily optimized toward working well with large sequential files. Block storage, even if it's a representation of large sequential files, is problematic for RAIDZ. RAIDZ can actually be faster than mirrors with the appropriate use case, but for block storage that requires a bunch of optimization and knowledge about the system, and it isn't likely to be a good fit even if you're a ZFS God.
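If iSCSI stays in the picture anyway, the block size knob being referred to is the zvol's volblocksize, which is fixed at creation time. A hypothetical sketch (pool and zvol names are placeholders, and the right value depends on the workload and pool layout):
Code:
root@freenas:~ # zfs create -s -V 1T -o volblocksize=64K tank/iscsivol
# -s makes it sparse; volblocksize cannot be changed after creation,
# so it has to be chosen up front to match the client's typical I/O size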
 

Joined
May 10, 2017
Messages
838
Definitely possible to get 1GB/s+ with SMB, at least while FreeNAS is caching to RAM:

[attached screenshot of the transfer]


And if the OP wants to try (besides enabling jumbo frames), these are the only tunables I'm using (found somewhere on this forum):

[attached screenshot of the tunables]
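For illustration only, since the attachment text isn't quoted here: these values are hypothetical and not necessarily the ones in the screenshot, but the 10GbE tunables that circulate on this forum are usually along these lines:
Code:
# loader tunable (System -> Tunables, type "loader"): load the H-TCP congestion control module
cc_htcp_load="YES"
# sysctl tunables (type "sysctl"): H-TCP plus larger socket buffers for 10GbE
net.inet.tcp.cc.algorithm=htcp
kern.ipc.maxsockbuf=16777216
net.inet.tcp.recvbuf_max=16777216
net.inet.tcp.sendbuf_max=16777216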
 
