You're getting 60MBytes/sec on iSCSI on RAIDZ2 with two vdevs? That seems like it could be very reasonable, especially once fragmentation takes hold.
https://www.truenas.com/community/threads/the-path-to-success-for-block-storage.81165/
I get about 400-500 megabits per second on a 65% full pool (which is where some of the speed loss comes in) with a ZFS-reported 18% fragmentation, across three mirror vdevs. So my setup is a little different, but given the small amount of available I/O, I think what you're seeing on your hardware is not necessarily bad.
You may very well be hitting a disk I/O bottleneck, as @jgreco suggests.
But have you added any network tunables on these systems?
Here are the ones I use with FreeNAS 11.2-U8 (ignore the hw.sfxge.* settings as they're hardware-specific):
[Attachment 50131: screenshot of the network tunables]
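In case the attachment doesn't come through, tunables of that general sort look something like this on FreeBSD-based FreeNAS (illustrative example values only, not the ones from my screenshot; tune for your own hardware):
```
# Illustrative FreeBSD network tunables for 10GbE (example values only,
# not the ones from the attachment -- adjust for your own NICs/workload):
sysctl kern.ipc.maxsockbuf=16777216       # raise the socket buffer ceiling
sysctl net.inet.tcp.sendbuf_max=16777216  # max TCP send buffer (window)
sysctl net.inet.tcp.recvbuf_max=16777216  # max TCP receive buffer (window)
```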
OK, the pool is 9% fragmented. I feel like I should at least be able to get close to two hard drives' worth of I/O.
Okay, so, I'm not necessarily saying you CAN'T get more, and I'm not discouraging you from trying, but I do want to point out:
Two hard drives are maybe 250 IOPS each. That's 500 IOPS for the pool. I'm really glad you said this, in fact, because it means that you are that rare bird here on the forums, someone who might be REASONABLE. ;-)
So let me point out that an IOPS, from the angle that matters, is a seek. If you are seeking for every 4K block you get, which is clearly a really bad case, 4096 * 500 is only about 2MBytes/sec. You're beating 2MBytes/sec, so you're definitely not hitting that hugely pessimistic case.
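Back-of-envelope, in shell terms:
```
# Worst case: one seek per 4KiB block, ~500 seeks/sec for the whole pool.
echo $((4096 * 500))   # = 2048000 bytes/sec, i.e. roughly 2 MBytes/sec
```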
Tuning may bring out more speed, and you've just been handed some pointers there by someone else. Absolutely do feel free to try that stuff. But to get back to what I was saying, when that runs out of steam:
Mirrors provide much better read performance on iSCSI pools, and you might be able to make your performance substantially better using mirrors. Additionally, if you read the "path to success for block storage" article, one other factor that will decrease your fragmentation is more free space (which should help somewhat even with RAIDZ2). If your application writes large sequential chunks, having much more free space on the pool is likely to result in the data being written to the pool sequentially, which will in turn result in much better read performance on the way out.
On the flip side, if your client restores are "single-threaded", there is going to be a cap on how fast a single restore can go, even with mirrors. But ZFS shines at parallelism, especially with mirrors, and you might find that while a single restore isn't as fast as you'd like, multiple parallel restores make much better use of the available resources.
Just some random thoughts for you to explore.
60MB/s is only 0.48Gbit/s -- and it just seems to me that you ought to be able to beat that with your equipment....snip...
When you store a couple hundred terabytes of backups and need to recover a VM, getting it out at 30-60MBytes/sec isn't going to meet RTOs.
...snip...
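To put rough numbers on the RTO point above (the 2TB image size is purely an assumption for illustration):
```
# Hours to restore a hypothetical 2 TB (2000000 MB) VM image:
echo $((2000000 / 60 / 3600))   # at 60 MBytes/sec: ~9 hours
echo $((2000000 / 30 / 3600))   # at 30 MBytes/sec: ~18 hours
```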
I get `iperf` results near 10Gb/s using the tunables I posted earlier. I also use jumbo frames, for what that's worth, but even without jumbos you should be getting better than 9Gb/s with `iperf`, or something's just not right.
I agree with @Spearfoot that before you analyze any other problems, `iperf` must show you full link bandwidth of 10Gb/s, not 5-6-whatever. Lower numbers may mean packet losses or other problems. And it should preferably be a single TCP stream, because each iSCSI connection is a single TCP stream. In some cases single-connection bandwidth can be limited by single CPU core performance, in which case it is not that bad, but I'd expect that to happen closer to 5-6GB/s, not 5-6Gb/s.
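A minimal single-stream check, assuming iperf3 is installed and using a placeholder address:
```
# On the TrueNAS box:
iperf3 -s
# On the initiator (10.0.0.5 is a placeholder for the TrueNAS IP);
# -P 1 forces a single TCP stream, matching one iSCSI connection:
iperf3 -c 10.0.0.5 -P 1 -t 30
```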
On top of that, the SCSI target in TrueNAS is set to process up to 32 requests per LUN at a time, so aside from the network bottlenecks mentioned, a single initiator is not a single thread, as @jgreco said, unless it really does not provide enough parallel requests. Previously the SCSI target could serialize logically sequential reads to help the ZFS speculative prefetcher be more efficient, but earlier this year I improved the prefetcher and was able to significantly relax that constraint. So make sure you are running the latest TrueNAS version (12.0-U6 now).
Are you using dedupe?
No, just default compression.
I think the network is fine.
Perhaps so, but have you tried adding tunables? At least for TCP window size? And jumbo frames should definitely help w/ block storage.
I'm just a dog typing on my master's computer, but @mav@ is with iXsystems, and knows whereof he speaks. We both agree that it just makes sense to maximize network throughput.
I would be glad to try the tunables. Should I try TCP window size first and then just test the various other ones you posted?
Also, 9K jumbos are configured and tested end-to-end.
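For anyone following along, here's one way to verify that end-to-end from a FreeBSD host (10.0.0.5 is a placeholder address):
```
# 8972 = 9000-byte MTU minus 20 (IP header) and 8 (ICMP header);
# -D sets the don't-fragment bit, so this fails if any hop lacks jumbos:
ping -D -s 8972 10.0.0.5
```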
Thanks everyone!