iSCSI issue since upgrading to 8.2 final (from beta 3)

Status
Not open for further replies.

atakacs

Explorer
Joined
Apr 23, 2012
Messages
92
Hello

Since upgrading from 8.2 b3 to final I see a whole lot of iSCSI errors such as these:

Code:
Aug 21 01:28:07 freenas istgt[2249]: istgt_lu_disk.c:5209:istgt_lu_disk_queue_start: ***WARNING*** transfer abort CmdSN=42996
Aug 21 01:28:07 freenas istgt[2249]: istgt_lu.c:2865:luworker: ***WARNING*** LU1: lu_disk_queue_start() aborted
Aug 21 01:28:43 freenas istgt[2249]: istgt_iscsi.c: 777:istgt_iscsi_write_pdu_internal: ***ERROR*** iscsi_write() failed (errno=32)
Aug 21 01:28:43 freenas istgt[2249]: istgt_iscsi.c:3518:istgt_iscsi_task_response: ***ERROR*** iscsi_write_pdu() failed
Aug 21 01:28:43 freenas istgt[2249]: istgt_iscsi.c:4967:sender: ***ERROR*** iscsi_task_response() failed on iqn.2011-03.example.org.istgt:freenas,t,0x0001(iqn.1991-05.com.microsoft:myserver.domain.local,i,0x400001370001)
Aug 21 01:28:43 freenas istgt[2249]: istgt_iscsi.c: 777:istgt_iscsi_write_pdu_internal: ***ERROR*** iscsi_write() failed (errno=32)
Aug 21 01:28:43 freenas istgt[2249]: istgt_iscsi.c:4984:sender: ***ERROR*** iscsi_write_pdu() failed on iqn.2011-03.example.org.istgt:freenas,t,0x0001(iqn.1991-05.com.microsoft:myserver.domain.local,i,0x400001370001)
Aug 21 01:28:43 freenas istgt[2249]: Login from iqn.1991-05.com.microsoft:myserver.domain.local (172.16.100.5) on iqn.2011-03.example.org.istgt:freenas LU1 (172.16.100.10:3260,1), ISID=400001370001, TSIH=4, CID=1, HeaderDigest=off, DataDigest=off
Aug 21 01:34:53 freenas istgt[2249]: Configuration refresh requested from 172.16.100.10
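
For anyone triaging similar logs: errno=32 is EPIPE ("Broken pipe") on FreeBSD, i.e. istgt tried to write to a connection the initiator had already dropped, and the Login line right after shows the initiator reconnecting. Below is a minimal sketch for tallying such entries, assuming the syslog format shown above. The names `summarize` and `describe_errno` are made up for illustration, and errno numbers are platform specific, so decode them on the same OS family that produced the log.

```python
import errno
import os
import re
from collections import Counter

# Matches istgt syslog entries of the form:
#   "istgt[pid]: <file>.c:<line>:<function>: ***LEVEL*** message"
LINE_RE = re.compile(
    r"istgt\[\d+\]: (?P<file>\w+\.c):\s*(?P<line>\d+):(?P<func>\w+): "
    r"\*\*\*(?P<level>WARNING|ERROR)\*\*\* (?P<msg>.*)"
)
ERRNO_RE = re.compile(r"errno=(\d+)")


def summarize(lines):
    """Count WARNING/ERROR entries per (level, function) and tally errno values."""
    counts = Counter()
    errnos = Counter()
    for line in lines:
        m = LINE_RE.search(line)
        if not m:
            continue  # e.g. "Login from ..." lines carry no level marker
        counts[(m["level"], m["func"])] += 1
        e = ERRNO_RE.search(m["msg"])
        if e:
            errnos[int(e.group(1))] += 1
    return counts, errnos


def describe_errno(code):
    """Return (symbolic name, message) for an errno value on this platform."""
    return errno.errorcode.get(code, "?"), os.strerror(code)


sample = [
    "Aug 21 01:28:07 freenas istgt[2249]: istgt_lu.c:2865:luworker: "
    "***WARNING*** LU1: lu_disk_queue_start() aborted",
    "Aug 21 01:28:43 freenas istgt[2249]: istgt_iscsi.c: 777:"
    "istgt_iscsi_write_pdu_internal: ***ERROR*** iscsi_write() failed (errno=32)",
]
counts, errnos = summarize(sample)
for (level, func), n in counts.most_common():
    print(level, func, n)
for code, n in errnos.items():
    print(code, describe_errno(code), n)
```

Counting per function makes it easier to see whether the aborts (`luworker`) come before the write failures (`sender`), which would point at slow storage rather than at the network.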


On the client side (Windows 2008 server) I get:

Code:
Log Name:      System
Source:        iScsiPrt
Date:          2012-08-23 02:00:58
Event ID:      43
Task Category: None
Level:         Error
Keywords:      Classic
User:          N/A
Computer:     myserver.domain.local
Description:
Target failed to respond in time for a login request.
Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="iScsiPrt" />
    <EventID Qualifiers="49152">43</EventID>
    <Level>2</Level>
    <Task>0</Task>
    <Keywords>0x80000000000000</Keywords>
    <TimeCreated SystemTime="2012-08-23T00:00:58.886418900Z" />
    <EventRecordID>235257</EventRecordID>
    <Channel>System</Channel>
    <Computer>myserver.domain.local</Computer>
    <Security />
  </System>
  <EventData>
    <Data>\Device\RaidPort1</Data>
    <Binary>0000010001000000000000002B0000C000000000000000000000000000000000000000000000000001</Binary>
  </EventData>
</Event>


Obviously not good news. This used to work fine in beta 3 (although I did see such errors in a few isolated cases) and I have not changed any settings (except, obviously, the upgrade itself).

Any ideas or pointers are most welcome!

(FreeNAS 8.2, using an iSCSI file extent, Windows Server 2008 R2 client)
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
You're probably running more traffic and/or running with a fuller filesystem. ZFS and iSCSI have some performance issues in certain scenarios, there's a bug report that talks extensively about it, there are forum posts that talk extensively about it, look for things I've posted about iSCSI and performance and FreeNAS and you should find some mitigation steps.
 

atakacs

Explorer
Joined
Apr 23, 2012
Messages
92
Thanks - it does indeed seem to be performance related (i.e. under heavy load).

I will look into your posts, but it would be appreciated if you could link them here, just to make sure I don't miss the most relevant one :) Strangely, if I search all your posts I only get 3 hits!?

[Attachment: vrCFG.jpg]
 

atakacs

Explorer
Joined
Apr 23, 2012
Messages
92
Just to bump this thread as I am still stuck with those issues - I would much appreciate any concrete suggestion to improve / mitigate the problem.
 

atakacs

Explorer
Joined
Apr 23, 2012
Messages
92
Still digging into this one, but I can now report that iSCSI in the release version is definitely not as stable as in beta 3: I see lots of performance issues / errors with the exact same setup (except for the FreeNAS release, obviously). Among other problems, I very much suspect memory leaks... More to come!
 

praecorloth

Contributor
Joined
Jun 2, 2011
Messages
159
No concrete advice, but please keep posting. I am keen on hearing your progress and the things that you try.

Question: are you using a dedicated network link to FreeNAS for your iSCSI traffic? If so, can you experiment with larger (jumbo) frames? I've heard a lot of people speak highly of the use of larger frames on data networks.
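
To put rough numbers on the larger-frames idea, here is a small sketch comparing a standard 1500-byte MTU with a 9000-byte jumbo MTU. It assumes plain Ethernet without VLAN tags, IPv4 and TCP without options, and a gigabit link; none of that is stated in this thread, and the function names are illustrative only.

```python
# Per-frame overhead on Ethernet, in bytes (standard values, no VLAN tag):
# preamble + SFD (8), header (14), FCS (4), inter-frame gap (12).
ETH_OVERHEAD = 8 + 14 + 4 + 12
# IPv4 + TCP headers, ASSUMING no IP or TCP options.
IP_TCP_HEADERS = 20 + 20


def payload_efficiency(mtu):
    """Fraction of wire time that carries TCP payload for a full-size frame."""
    return (mtu - IP_TCP_HEADERS) / (mtu + ETH_OVERHEAD)


def frames_per_second(mtu, link_bits_per_s=1_000_000_000):
    """Full-size frames per second needed to saturate the link (gigabit assumed)."""
    return link_bits_per_s / 8 / (mtu + ETH_OVERHEAD)


for mtu in (1500, 9000):
    print(
        f"MTU {mtu}: {payload_efficiency(mtu):.1%} payload, "
        f"{frames_per_second(mtu):,.0f} frames/s at line rate"
    )
```

The bandwidth gain is only a few percent; the larger effect is the roughly sixfold drop in frames per second, which reduces per-packet work on both ends. Every device on the path has to be configured for the larger MTU for this to help.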
 

atakacs

Explorer
Joined
Apr 23, 2012
Messages
92
You might be onto something.

I am using FreeNAS in a virtualized environment (VMware). It is running within a virtual machine with the disks directly attached and is presenting iSCSI targets to two Windows 2008 server VMs. One is hosted on the same machine and the other is accessing it from a different server through a non-dedicated network. Interestingly enough, the problems are mostly (if not exclusively) apparent on the local config (i.e. the one running through the VMware virtual network), whereas the physically networked setup is humming along just fine (both machines have similar workloads).

I will investigate whether there is something specific about iSCSI over virtual networks (if anyone has ideas, just jump in :) and will also set up dedicated physical networking for the other link. Thanks for those comments.

In any case, I insist that this very setup was working just fine so far with 8.2b3.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Thanks - it does indeed seem to be performance related (i.e. under heavy load).

Try searching for posts referencing bug 1531. Also read bug 1531 here.

Basically, the way ZFS works may not be optimal for an iSCSI device without putting some effort into design and tuning. 1531 is kind of an umbrella issue. I had set up an E3-1230 with 32GB of RAM and some older 2005-vintage-"fast" 400GB drives in RAIDZ2. Performance was absolutely horrifying. The system would go catatonic for minutes at a time while writing out with dd. Not a hardware issue. It was clearly ZFS. Reducing memory reduced the size of the txg buffer that ZFS could build out towards the pool. Tuning would also reduce the size or timeframe for the txg buffer.
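
A back-of-the-envelope sketch of why less RAM can mean shorter stalls. It ASSUMES that the per-txg dirty data limit in ZFS of this vintage defaults to one eighth of physical memory (a right shift by 3); verify that against your own version. The 50 MiB/s pool write rate is a hypothetical figure, not a measurement from this thread.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def txg_write_limit(ram_bytes, shift=3):
    """Upper bound on dirty data per txg, ASSUMING limit = RAM >> shift (1/8 of RAM)."""
    return ram_bytes >> shift


def flush_seconds(limit_bytes, pool_write_bytes_per_s):
    """Time the pool needs to write out one full txg at a sustained rate."""
    return limit_bytes / pool_write_bytes_per_s


POOL_RATE = 50 * MIB  # hypothetical sustained write rate of a slow RAIDZ2 pool

for ram_gib in (32, 8, 4):
    limit = txg_write_limit(ram_gib * GIB)
    secs = flush_seconds(limit, POOL_RATE)
    print(f"{ram_gib:>2} GiB RAM: up to {limit // MIB} MiB per txg, ~{secs:.0f} s to flush")
```

Under these assumptions a 32 GiB box can queue well over a minute of writes for a slow pool, far longer than an iSCSI initiator is likely to wait before it times out and reconnects.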

Now, the thing is, for general UNIX on a timesharing platform, hanging for even a few seconds when doing disk I/O might not even be noticed in many cases, but with NFS and particularly with iSCSI, there's a lot of room for badness. iSCSI in particular... when your initiators time out and drop/reconnect a target due to "slowness".

So, other things to contemplate (things not in 1531, which you still really need to check out)

Run your pool at less than 80% capacity, possibly far less. Guaranteed performance hit once you pass 80%, but in a write-heavy environment, it seems that maybe 60% or 70% might be warranted. Avoidance of unnecessary fragmentation is seriously important.

Don't use vdev's. Empirical evidence is that at this time, vdev-backed iSCSI extents are written in sync mode, or maybe there's some other problem that causes severely low write performance.

RAIDZ2 appears to be unusually susceptible to performance suckage. It is not entirely clear why, but in many cases, use of RAIDZ or mirrored drives significantly improved performance.
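
The capacity guideline above can be turned into a simple check. This is only a sketch: the function name is made up, the byte counts would come from whatever reports your pool's used and total size, and the 60% figure is the conservative end of the "60% or 70%" suggested for write-heavy pools.

```python
def capacity_status(used_bytes, size_bytes, write_heavy=False):
    """Classify pool fullness against the thresholds suggested in the post.

    80% is the general ceiling; 60% is applied to write-heavy pools
    (the conservative end of the suggested 60-70% range).
    """
    if size_bytes <= 0:
        raise ValueError("size_bytes must be positive")
    ceiling = 0.60 if write_heavy else 0.80
    fraction = used_bytes / size_bytes
    return fraction, ("ok" if fraction < ceiling else "over")


TB = 10 ** 12
print(capacity_status(7 * TB, 10 * TB))                    # 70% full, general use
print(capacity_status(7 * TB, 10 * TB, write_heavy=True))  # 70% full, write-heavy
```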

Sorry for the slow reply, hadn't noticed the unanswered thread. I hope this helps, even if maybe not the answers we'd likely both prefer.
 

atakacs

Explorer
Joined
Apr 23, 2012
Messages
92
Well, I can certainly vouch for the fact that reducing the RAM available to FreeNAS makes a huge difference in terms of performance (from unusable to fairly decent).
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Yeah, if that's the case, you'll want to take a particular look at tuning vfs.zfs.txg.write_limit_override ...

Unfortunately, I'm not really certain that this is the best fix, but it's a workable one. There's supposedly some logic in ZFS to try to manage this, but I'm convinced that it's either stupid, or not been ported to FreeBSD, or something like that. I've been able to induce minutes-long catatonic states without even working hard at it.
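
One way to pick a starting value for the override is to size each txg to what the pool can actually flush within a few seconds. This is a sketch of that arithmetic, not an official recommendation: the measured write rate and the target flush time below are hypothetical inputs that you would replace with your own, and the helper names are made up.

```python
MIB = 1024 ** 2


def write_limit_override(pool_write_bytes_per_s, target_flush_seconds):
    """Bytes of dirty data per txg that the pool can flush within the target time."""
    return int(pool_write_bytes_per_s * target_flush_seconds)


def sysctl_line(value):
    """Format the setting using the tunable name mentioned in the post."""
    return f"vfs.zfs.txg.write_limit_override={value}"


# Hypothetical inputs: 50 MiB/s sustained pool writes, flush within 5 seconds.
value = write_limit_override(50 * MIB, 5)
print(sysctl_line(value))
```

The target flush time should sit comfortably below the initiator's timeout, since the point is to keep the target responsive while a txg is being written out. Check how your FreeNAS version expects the value to be applied (sysctl or tunable) before relying on it.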
 

paleoN

Wizard
Joined
Apr 22, 2012
Messages
1,403
After setting vfs.zfs.txg.write_limit_override you may want to look into vfs.zfs.vdev.max_pending to further reduce latency. On Nexenta with SATA disks, Richard Elling was recommending that vfs.zfs.vdev.max_pending be set somewhere between 4 and 2. In this post he seems to like a value of 2.

I don't use iSCSI myself, and I don't know how similarly FreeBSD behaves with this compared to Solaris-based OSes. Perhaps jgreco, or anyone who has tested it, can comment on it.
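
A rough illustration of why a smaller queue depth can lower latency. It assumes a disk that services queued I/Os one at a time at a fixed service time, which is a simplification (real drives reorder requests), and the 10 ms figure is a hypothetical value for random I/O on a SATA disk.

```python
def worst_case_wait_ms(max_pending, service_time_ms):
    """Time a newly arriving I/O may wait if the per-device queue is already full."""
    return max_pending * service_time_ms


SERVICE_TIME_MS = 10  # hypothetical per-I/O service time

for depth in (10, 4, 2):  # illustrative queue depths
    wait = worst_case_wait_ms(depth, SERVICE_TIME_MS)
    print(f"max_pending={depth}: up to ~{wait} ms behind queued I/O")
```

The trade-off is throughput: a deeper queue gives the drive more requests to reorder, so very low values can cost bandwidth on sequential workloads.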
 