FreeNAS 9.3 STABLE upgrading issue (from 9.2.1.9)

Status
Not open for further replies.

mav@

iXsystems
iXsystems
Joined
Sep 29, 2011
Messages
1,428
Abcslayer, could you please retry after setting sysctl kern.icl.coalesce=0?
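Roughly how to apply and persist the setting, in case it helps (the GUI path here is approximate):
Code:
# Apply immediately from the FreeNAS shell (not persistent across reboots):
sysctl kern.icl.coalesce=0

# Verify the current value:
sysctl kern.icl.coalesce

# To persist across reboots, add it in the web UI under
# System -> Tunables (Variable: kern.icl.coalesce, Value: 0, Type: Sysctl).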
 

abcslayer

Dabbler
Joined
Dec 9, 2014
Messages
42
Abcslayer, could you please retry after setting sysctl kern.icl.coalesce=0?
I had to set up another instance with 9.2.1.9, which I am using in my system now, as it is almost impossible to use 9.3 in its current state.
I will fire up 9.3 with your setting once again.
 

abcslayer

Dabbler
Joined
Dec 9, 2014
Messages
42
Update:
1. I turned my 9.3 instance back on and applied the new tunable; the system ran fine under light load (no connection drops).
2. I let it reboot after updating to the newest 9.3 patch; there was one drop message when a VM booted up (Windows Server 2012 R2).
I ran AS SSD Benchmark once; there was only one drop message, so everything seemed nearly OK.
I ran it a 2nd time; things were still OK.
On the 3rd run, writing was OK but reading was terrible: a lot of drop messages appeared, and the benchmark speed fell from 300-600 MB/s to under 0.5 MB/s.
I had to abort the test and retried many times; the result was always the same, dropping while reading.
3. I decided to test with MTU 9000 and TSO enabled (previously it was MTU 1500 with -tso4 -tso6); see the ifconfig sketch below. Everything suddenly became OK again: no more dropped connections. To confirm the result, I shut down the VM, restarted FreeNAS, and tested again; things are still OK. I will let it run to test further.
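For anyone who wants to reproduce the before/after states, a rough sketch of the equivalent ifconfig commands (em0 is a placeholder for the actual interface; jumbo frames also need switch/vSwitch support, and on FreeNAS these options would normally be persisted via the interface's Options field in the GUI):
Code:
# Old configuration: standard MTU, TSO disabled
ifconfig em0 mtu 1500 -tso4 -tso6

# New configuration: jumbo frames, TSO enabled
ifconfig em0 mtu 9000 tso4 tso6

# Check the resulting MTU and interface options
ifconfig em0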
Thank you.
 

abcslayer

Dabbler
Joined
Dec 9, 2014
Messages
42
Update: FreeNAS 9.3 is still up and running so far. I have tried disk checks and disk benchmarks, but no connection drops were detected. ESXi logged some events saying that I/O latency increased from thousands of microseconds to hundreds of thousands of microseconds (!!) and then decreased, but when I check the ESXi performance graph at the moment the event was logged, there is no such high latency. I will reboot the whole server to see if it happens again.
One more thing: under light load there is no "icl_conn_send_pdus: no space to send;" message, but under moderate or high I/O load the message appears.
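If anyone wants to correlate the message with load, a simple way to watch for it (this assumes the standard /var/log/messages location; adjust if yours differs):
Code:
# Follow the system log live while generating I/O load:
tail -f /var/log/messages | grep icl_conn_send_pdus

# Count how many times the warning has appeared so far:
grep -c icl_conn_send_pdus /var/log/messages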
 

abcslayer

Dabbler
Joined
Dec 9, 2014
Messages
42
Update: I have updated the FreeNAS VM to the newest online update (I noticed that some older updates include a bug fix related to kernel icl), but the I/O latency increases (and subsequent decreases) still appear in the ESXi event log, and they match the graph (though the graph does not show values as high as the event log; the event log numbers are extremely high):
Code:
Device t10.FreeBSD_iSCSI_Disk______000c29e873c1010_________________ performance has improved. I/O latency reduced from 53563 microseconds to 10519 microseconds.

Device t10.FreeBSD_iSCSI_Disk______000c29e873c1010_________________ performance has deteriorated. I/O latency increased from average value of 2464 microseconds to 53563 microseconds.

P.S.: When I tried setting sysctl kern.icl.coalesce=1 (to see if some recent FreeNAS update had fixed the issue), the connection drops still appeared, so I had to go back to sysctl kern.icl.coalesce=0. (Sorry, I don't have deep knowledge of kernel icl, so I can only try and test in this simple, blind way.)
 

Chewie71

Cadet
Joined
Sep 26, 2012
Messages
9
I am having a very similar problem with my FreeNAS host that we recently upgraded to 9.3. It's currently at FreeNAS-9.3-STABLE-201501241715.

We run Veeam backup jobs to the FreeNAS box over iSCSI. The iSCSI target is mounted directly on the Veeam node via the Microsoft initiator. When backups run, we start seeing a lot of NOP-Out warnings in the messages log. Eventually it gets bad enough that CTL or the client itself drops the connection completely, and the target disappears from the Veeam node.

I also have a second iSCSI LUN mapped to some ESXi 5.5 hosts. When the problem occurs, the datastore in VMware becomes greyed out, as VMware thinks it has been disconnected.

If I restart the iSCSI service on the FreeNAS server then connections from the Veeam node and from ESXI reconnect and start working again.

I set kern.cam.ctl.iscsi.ping_timeout=60. That may resolve the NOP-Out messages, but I don't know yet whether it will solve the disconnection problem.
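For anyone following along, roughly what that looks like from the shell (the default appears to be 5 seconds, judging by the log warnings; to persist it across reboots, add it as a Sysctl-type tunable in the GUI):
Code:
# Raise the NOP-Out ping timeout from the default (5 s per the warnings) to 60 s:
sysctl kern.cam.ctl.iscsi.ping_timeout=60

# Confirm the new value:
sysctl kern.cam.ctl.iscsi.ping_timeout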

Thanks,
Matt
 

fisherwei

Dabbler
Joined
Oct 28, 2013
Messages
47
I have the same issue:

Code:
Feb  9 00:55:21 nas WARNING: 192.168.253.5 (iqn.2015-01.de.wqfw.h.esxi:esxi): no ping reply (NOP-Out) after 5 seconds; dropping connection
Feb  9 00:56:16 nas WARNING: 192.168.253.5 (iqn.2015-01.de.wqfw.h.esxi:esxi): no ping reply (NOP-Out) after 5 seconds; dropping connection
Feb  9 00:57:36 nas WARNING: 192.168.253.5 (iqn.2015-01.de.wqfw.h.esxi:esxi): no ping reply (NOP-Out) after 5 seconds; dropping connection
Feb  9 00:58:00 nas WARNING: 192.168.253.5 (iqn.2015-01.de.wqfw.h.esxi:esxi): no ping reply (NOP-Out) after 5 seconds; dropping connection
Feb  9 00:58:42 nas WARNING: 192.168.253.5 (iqn.2015-01.de.wqfw.h.esxi:esxi): no ping reply (NOP-Out) after 5 seconds; dropping connection
Feb  9 01:35:45 nas WARNING: 192.168.253.5 (iqn.2015-01.de.wqfw.h.esxi:esxi): no ping reply (NOP-Out) after 5 seconds; dropping connection
Feb  9 01:35:58 nas WARNING: 192.168.253.5 (iqn.2015-01.de.wqfw.h.esxi:esxi): no ping reply (NOP-Out) after 5 seconds; dropping connection
Feb  9 01:36:08 nas WARNING: 192.168.253.5 (iqn.2015-01.de.wqfw.h.esxi:esxi): no ping reply (NOP-Out) after 5 seconds; dropping connection
Feb  9 01:37:05 nas WARNING: 192.168.253.5 (iqn.2015-01.de.wqfw.h.esxi:esxi): no ping reply (NOP-Out) after 5 seconds; dropping connection
Feb  9 01:37:30 nas WARNING: 192.168.253.5 (iqn.2015-01.de.wqfw.h.esxi:esxi): no ping reply (NOP-Out) after 5 seconds; dropping connection
Feb  9 01:37:57 nas WARNING: 192.168.253.5 (iqn.2015-01.de.wqfw.h.esxi:esxi): no ping reply (NOP-Out) after 5 seconds; dropping connection
Feb  9 01:38:28 nas WARNING: 192.168.253.5 (iqn.2015-01.de.wqfw.h.esxi:esxi): no ping reply (NOP-Out) after 5 seconds; dropping connection
Feb  9 01:38:54 nas WARNING: 192.168.253.5 (iqn.2015-01.de.wqfw.h.esxi:esxi): no ping reply (NOP-Out) after 5 seconds; dropping connection
Feb  9 01:39:33 nas WARNING: 192.168.253.5 (iqn.2015-01.de.wqfw.h.esxi:esxi): no ping reply (NOP-Out) after 5 seconds; dropping connection
Feb  9 01:40:10 nas WARNING: 192.168.253.5 (iqn.2015-01.de.wqfw.h.esxi:esxi): no ping reply (NOP-Out) after 5 seconds; dropping connection
Feb  9 01:40:42 nas WARNING: 192.168.253.5 (iqn.2015-01.de.wqfw.h.esxi:esxi): no ping reply (NOP-Out) after 5 seconds; dropping connection
 

Syris

Cadet
Joined
Dec 11, 2014
Messages
8
Having this same exact issue: daily emails with "no ping reply" and dropped connections. For me it doesn't seem to have anything to do with pool activity; looking at the ESXi event log, it often happens overnight when not much is happening in my VMs.
Has anyone found a solution or is this even being looked into? I'm willing to provide logs if someone will tell me what's needed.
 

abcslayer

Dabbler
Joined
Dec 9, 2014
Messages
42
Hi, I am still running 9.3 with the following setting: sysctl kern.icl.coalesce=0
Under moderate load no errors are detected, but the VMware event log still shows the high-latency issue (which I described in previous posts) when my VMs back up their databases.

BTW, if anyone uses more than one target in FreeNAS 9.3, please avoid the newest update (... STABLE-842051b-ffd50f5-c741590), as the extra target stops working (the VMware client cannot find the extra target, and FreeNAS logs an error about the target not being found). If I reboot into the older version (... STABLE-d18ea5b-153f322-d773d50), everything is normal again. Given my experience with this bug report, I did not want to open a new topic to cover the iSCSI target bug. Sorry.
 

mav@

iXsystems
iXsystems
Joined
Sep 29, 2011
Messages
1,428
I've tried to reproduce the situation: I updated FreeNAS, created a second target and two more extents, and scanned the portal with VMware 5.5, but found no problem. Could you give more information about your problem: the mentioned and any other error messages, a copy of your /etc/ctl.conf file, etc.?
 

abcslayer

Dabbler
Joined
Dec 9, 2014
Messages
42
Hi,

My system configuration (which worked fine until the newest update):
  • One main target on the main portal, bound to a local IP address, with one extent (a zvol).
  • One secondary target on a secondary portal bound to 0.0.0.0, with one extent (another zvol). This target is mainly for testing, so I did not bind its portal to a fixed interface.
  • The VMware initiator connects to FreeNAS through only one interface (the one the main portal is bound to). VMware could discover the 2nd target as well and connect to it just like the 1st.
In the newest update this configuration broke: VMware can only find the 1st target. The FreeNAS log shows messages that an initiator connected to the 2nd target but the target was not found, after which the ctld child process exited. My system is currently running; I will check whether I can reboot it and collect the logs/config later (a sketch of the configuration is below). Sorry that I did not mention my specific multiple-portal setup earlier.
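In the meantime, here is a sketch of what I believe the relevant part of /etc/ctl.conf looks like for this setup (the group names, addresses, and zvol paths are placeholders, not a copy of my actual file):
Code:
# /etc/ctl.conf (sketch; names, addresses, and paths are placeholders)

# Main portal, bound to a specific local IP
portal-group pg1 {
        discovery-auth-group no-authentication
        listen 192.168.10.1
}

# Secondary portal, bound to the wildcard address
portal-group pg2 {
        discovery-auth-group no-authentication
        listen 0.0.0.0
}

target iqn.2011-03.freenas:main {
        portal-group pg1
        lun 0 {
                path /dev/zvol/tank/mainvol
        }
}

target iqn.2011-03.freenas:ipxe {
        portal-group pg2
        lun 0 {
                path /dev/zvol/tank/testvol
        }
}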
 

mav@

iXsystems
iXsystems
Joined
Sep 29, 2011
Messages
1,428
During iSCSI discovery, each portal reports only the targets available through that portal. I am not sure how or why it worked before, but I would say the new behavior seems quite logical to me. When connecting to a specific address, you do not receive targets bound to the wildcard.

Generally I would say that binding target portals to 0.0.0.0 is not a very good idea (it complicates the discovery process, because the target can't properly announce its addresses), and it is especially bad in combination with binding to specific addresses.
 

abcslayer

Dabbler
Joined
Dec 9, 2014
Messages
42
Thank you for your comment.
Actually, I cannot remember whether the 2nd target (and portal) was discovered automatically or I added it manually in the ESXi iSCSI initiator, but either way, the trouble is that when the initiator tried to connect to the target (with the correct address), it failed and FreeNAS threw an error message about the target not being found. (I have checked the config for any changes made by the updater and everything is still the same; I even saved the config a few times to be sure it is as-is.)
 

abcslayer

Dabbler
Joined
Dec 9, 2014
Messages
42
The error messages are:
Code:
Mar 11 12:48:23 freenas ctld[3499]: 192.168.10.2 (iqn.1998-01.com.vmware:5436d812-1b9d-676a-6755): requested target "iqn.2011-03.freenas:ipxe" not found
Mar 11 12:48:23 freenas ctld[1836]: child process 3499 terminated with exit status 1

The issue happens on the newest update too (9.3-STABLE-3a01482-4663c83-3dbc203).
 

mav@

iXsystems
iXsystems
Joined
Sep 29, 2011
Messages
1,428
What issue actually happened? Was the reported target exposed to the portal where the connection was established? Or are you still experimenting with the wildcards that I discouraged?
 

abcslayer

Dabbler
Joined
Dec 9, 2014
Messages
42
What issue actually happened? Was the reported target exposed to the portal where the connection was established? Or are you still experimenting with the wildcards that I discouraged?
Yes, I am still using the wildcard (0.0.0.0 for the 2nd portal). Since FreeNAS 9.3's best feature is boot-volume snapshots, I can easily switch back to a known-working snapshot. As the setup works with older snapshots, and logically it should work, I consider the newer updates to have a bug (or bugs). I will keep the wildcard for a while, until I assign a fixed IP to the 2nd portal (I am preparing to expose it outside the ESXi vSwitch network).
Sorry to bother you, and thank you for your help and comments.
 