freenas ctld: read: connection lost

B166ER

Explorer
Joined
Aug 24, 2016
Messages
52
Hi there,

i am sitting on this one for days now and haven't found any solution to the problem yet. I've setup FreeNAS 9.10.1 to be an iSCSI-Target and the extent is recognized well by Xenserver 7. However, as soon as i want to establish a connection via XenCenter to install a VM on the Target, the connection gets dropped by FreeNAS and both interfaces (FreeNAS+Xenserver) go down. I am using Intel Pro 1000PT Dual NICs (lag/roundrobin) and i've read that there could be a bug in the em driver considering tso. I've disabled tso on both sides but the problem remains. I've also set "kern.cam.ctl.iscsi.ping_timeout" to zero because i got a No-OP timeout message (that now disappeared).

These are the last lines of /var/log/messages:
Code:
Aug 24 15:21:08 freenas ctld[11993]: 192.168.0.110: read: connection lost												
Aug 24 17:21:08 freenas ctld[1805]: child process 11993 terminated with exit status 1									
Aug 24 17:21:08 freenas ctld[11994]: 192.168.0.110: read: connection lost												
Aug 24 17:21:08 freenas ctld[1805]: child process 11994 terminated with exit status 1																																		  
Aug 24 17:23:11 freenas kernel: arp: 192.168.0.100 moved from 02:ff:a0:00:07:0a to 00:15:17:5c:49:2c on epair1b		  
Aug 24 17:23:11 freenas mDNSResponder: mDNSPlatformSendUDP got error 49 (Can't assign requested address) sending packet to FF02:0000
:0000:0000:0000:0000:0000:00FB on interface 192.168.0.100/lagg0/4														
Aug 24 17:23:12 freenas mDNSResponder: mDNSPlatformSendUDP got error 49 (Can't assign requested address) sending packet to FF02:0000
:0000:0000:0000:0000:0000:00FB on interface 192.168.0.100/lagg0/4														
Aug 24 17:23:12 freenas mDNSResponder: mDNSPlatformSendUDP got error 49 (Can't assign requested address) sending packet to FF02:0000
:0000:0000:0000:0000:0000:00FB on interface 192.168.0.100/lagg0/4														
Aug 24 17:23:12 freenas mDNSResponder: mDNSPlatformSendUDP got error 49 (Can't assign requested address) sending packet to FF02:0000
:0000:0000:0000:0000:0000:00FB on interface 192.168.0.100/lagg0/4

Xenserver says this in dmesg:
Code:
[  117.407106]  connection1:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4294918440, last ping 4294919692, now 4294920944
[  117.407130]  connection1:0: detected conn error (1011)

It would seem like an DNS issue but name resolving works and everything is setup correctly. Sometimes mdns says:
Code:
"freenas mDNSResponder: mDNSPlatformSendUDP got error 50 (Network is down) sending packet to FF02:0000:0000:0000:0000
:0000:0000:00FB on interface 192.168.0.100/lagg0/4"

I can partition the iscsi extent on Win 7 just fine and work with it. It might only occur under heavy load such as installing a vm over network...
Any help is greatly appreciated!


Hardware:
ASUS M5A99FX Pro R2.0
AMD Opteron 3350HE
32GB Kingston ECC DDR3-1600
Sandisk Extreme 16GB USB-Stick (OS)
6x 1TB WD raidz2
IBM M1115 in IT-Mode
Intel PRO/1000PT Dual NIC
Sapphire HD 6570 passive
 
Last edited by a moderator:

B166ER

Explorer
Joined
Aug 24, 2016
Messages
52
That's what I got from running 'ctld -d' in the CLI and running a (failed) VM install on the Xenserver at the same time:

Code:
ctld: obtaining previously configured CTL luns from the kernel                                                                
ctld: CTL port 0 "tpc" wasn't managed by ctld;                                                                                
ctld: CTL port 1 "camsim" wasn't managed by ctld;                                                                              
ctld: CTL port 2 "ioctl" wasn't managed by ctld;                                                                              
ctld: found CTL lun 0 "lun1"                                                                                                  
ctld: obtaining configuration from /etc/ctl.conf                                                                              
ctld: auth-group "default" not defined; going with defaults                                                                    
ctld: opening pidfile /var/run/ctld.pid                                                                                        
ctld: modifying lun "lun1", CTL lun 0                                                                                          
ctld: updating port "pg1-iqn.2005-10.org.freenas.ctl:target1"                                                                  
ctld: not listening on portal-group "default", not assigned to any target                                                      
ctld: listening on 192.168.0.100:3260, portal-group "pg1"                                                                      
ctld: incoming connection; not forking due to -d flag                                                                          
ctld: accepted connection from 192.168.0.110; portal group "pg1"                                                              
ctld: 192.168.0.110: setting session timeout to 60 seconds                                                                    
ctld: 192.168.0.110: Capsicum capability mode enabled                                                                          
ctld: 192.168.0.110: beginning Login Phase; waiting for Login PDU                                                              
ctld: 192.168.0.110: read: connection lost 


tcpdump reveals, that there is definitely dataflow, so there is a connection going. But then it gets dropped for whatever reason and there isn't much verbose error logging anywhere...very strange.

I've just run iperf on both servers to test for network stability under high load. Runs perfectly fine with full gigabit saturation, no connection losses at all. It must be related to ctld only, maybe kernel related (if i haven't messed up the config yet ...)
 
Last edited by a moderator:

B166ER

Explorer
Joined
Aug 24, 2016
Messages
52
I resolved the problem now by deinstalling XenServer. There were some weird bugs and it was nearly impossible to troubleshoot the problem. I still don't know what the issue was exactly but i guess XenServer didn't seem to like my hardware. I installed Proxmox and now iSCSI to the FreeNAS and setting up VMs is working, as it seems. What a hell of a week... -.-
 

fips

Dabbler
Joined
Apr 26, 2014
Messages
43
I have that "read: connection lost" error together with Proxmox -.-
Also with FreeNAS 9.10.1, and Proxmox 4.2 connected via iSCSI.
 

B166ER

Explorer
Joined
Aug 24, 2016
Messages
52
Interesting. What's your mtu settings for both interfaces? Do you have a lag setup? Anything else interesting in the logs of either Proxmox or FreeNAS? iSCSI can be a bit of a bitch to setup...
 
Joined
Mar 22, 2016
Messages
217
Interesting. What's your mtu settings for both interfaces? Do you have a lag setup? Anything else interesting in the logs of either Proxmox or FreeNAS? iSCSI can be a bit of a bunny to setup...

I had a similar issue with ESXI and iSCSI. I one day decided to move some connections on my switch around so that they made some logical sense.

Everything was working fine till it stopped. I spent 4 hours trying to figure it out. Rebooted the entire network, all the systems involved, ran iperf tests, etc. I decided to finally check in on the switch and see if packets were being dropped and thats when I noticed it. It was due to MTU issues where the cards had MTU's set at 9k while the switch was set at 1.5k. I had completely forgotten to set all the ports on the switch to 9216 and just set the ones I was using at the time. When I rearranged all the connections it went to hell really fast.

Check the MTU's!!!
 

B166ER

Explorer
Joined
Aug 24, 2016
Messages
52
Yeah, i've been through that as well. Unfortunately, MTU settings wasn't the issue with Xenserver and FreeNAS this time. Since i run commodity hardware (and a lot of people do), it is sometimes impossible to predict if it will actually work/behave the way it's supposed to. Troubleshooting can become a real time consuming pain that way...
 

B166ER

Explorer
Joined
Aug 24, 2016
Messages
52
Did some further testing and still run in the same error i originally posted. I wondered because my dmesg output suddenly was empty and 'dmesg -a' showed that my log was flooded with ctld connection errors. I also receive this error message:

Code:
ng_ether_ifnet_arrival_event: can't re-name node epair1b                
pid 1526 (syslog-ng), uid 0: exited on signal 6 (core dumped)


I think this has to do with my lag setup and the particular em0 driver (bug). In combination i get the above error. Also, the plugins section now couldn't be loaded. A reboot of FreeNAS stopped the log flooding but it is likely to occur again. I really hope this gets sorted out!
 
Last edited:

B166ER

Explorer
Joined
Aug 24, 2016
Messages
52
If someone knows a dual gigabit nic that doesn't have this issue, i would like to know. In the interfaces section of the lagg0 interface, i have set '-tso -vlantso' in the options field. But i think i have to delete the lagg0 setup and set it individually for em0 and em1. Or i might have to ditch lagg at all with that card. It's just not working reliably.
 
Last edited:
Joined
Mar 22, 2016
Messages
217
I had a lot of issues with a LAG set up on my FreeNAS when I first got it. I wanted to have a dual 10Gb connection for the most bandwidth possible. I got all sorts of weird and strange happenings, including loosing connection on a regular basis.

I just dumped it in favor of running two subnets. One for general storage traffic and another dedicated to iSCSI. It gave me the extra bandwidth I was looking for and resolved all the issues. There's a sticky at the top of this page about LAG/LACP.

I've used that same NIC in my FreeNAS before I moved to the 10Gb connection. It worked perfectly fine for me. I never tried it in LAG though so I can't tell you how that went.
 

B166ER

Explorer
Joined
Aug 24, 2016
Messages
52
Thx for the input! The idea with the subnets is not bad. I think i'll scrap the whole roundrobin setup and try some more configurations to see how it goes. But honestly, this behaviour shouldn't occur with such a common card. And from what i've read, this issue affects quite some people and is around at least since 9.3. The 'tso' thing doesn't do much here as it looks. But well, just have to spend even more time on this i guess...
 

B166ER

Explorer
Joined
Aug 24, 2016
Messages
52
This is a real problem now...I've disabled lagg everywhere. I physically removed one of two intel cards from the proxmox server. I configured two subnets on the freenas host. But as soon as i fire up the proxmox server, guess what..."freenas ctld: read: connection lost" all over the place. And i still get the 'ng_ether_ifnet_arrival_event: can't re-name node epair1b' error. This is REALLY frustrating, since a f***ing month i struggle with this. And i can't figure out what the issue is. I will have to run the proxmox standalone. There is no way i get this to work over iscsi reliably.
 

B166ER

Explorer
Joined
Aug 24, 2016
Messages
52
Is there any way to selectively suppress/disable ctld messages at the console? I keep getting spammed with ctld connection errors but afaik there are none. My vms running fine over iscsi and i don't see any performance issues. This is just annoying as hell. Doesn't feel right to keep it running like this....
 
Last edited:

B166ER

Explorer
Joined
Aug 24, 2016
Messages
52
I've killed syslogd for now. This is not cool but gives me time to think about how to deal with this.
 

Kei

Dabbler
Joined
May 26, 2016
Messages
45
Hello... Any chance someone can advise us on this bug? This is really annoying :(
 

ybcnyc

Dabbler
Joined
Feb 7, 2013
Messages
23
Apr 27 11:02:53 freenas ctld[12222]: 192.168.1.31: read: connection lost
Apr 27 11:02:53 freenas ctld[2419]: child process 12222 terminated with exit status 1
Apr 27 11:02:56 freenas ctld[12223]: 192.168.1.33: read: connection lost
Apr 27 11:02:56 freenas ctld[2419]: child process 12223 terminated with exit status 1
Apr 27 11:02:56 freenas ctld[12225]: 192.168.1.32: read: connection lost
Apr 27 11:02:56 freenas ctld[2419]: child process 12225 terminated with exit status 1
Apr 27 11:02:57 freenas ctld[12226]: 192.168.1.30: read: connection lost
Apr 27 11:02:57 freenas ctld[2419]: child process 12226 terminated with exit status 1

i'm on FreeNAS-9.3-STABLE-201604041648 with 4 proxmox hosts on separate network connecting to freenas through ISCSI

i'm getting flooded with these messages. did anyone got their issues resolved.
 

Kei

Dabbler
Joined
May 26, 2016
Messages
45
Not solved. But it doesnt seem to be a problem either for the service availability.
 

mav@

iXsystems
iXsystems
Joined
Sep 29, 2011
Messages
1,428
We've passed that several times now. It seems like a "feature" of proxmox. For some reason it regularly connects to iSCSI port and immediately drops the connection.
 

EsJ

Cadet
Joined
Feb 1, 2017
Messages
2
Hi there,

the feature is the check, if the storage is still alive. And multipe targets are leading to multiple log entries every 10 seconds.

Suppress the unwanted connection lost entries with the following. It's a workaround but better than disabling the complete syslog and works for me until next FreeNAS reboot with FreeNAS 9.10 and Proxmox 4.4. iSCSI is communicating over a separate vlan.
Unfortunately I haven't found the source where syslog-ng.conf is being renewed after every reboot. Perhaps someone from ixsystems can help.
Code:
Feb  2 22:53:02 freenas ctld[56870]: 192.168.160.10: read: connection lost
Feb  2 22:53:02 freenas ctld[1715]: child process 56870 terminated with exit status 1
Feb  2 22:53:03 freenas ctld[56871]: 192.168.160.10: read: connection lost
Feb  2 22:53:03 freenas ctld[1715]: child process 56871 terminated with exit status 1
Feb  2 22:53:03 freenas ctld[56872]: 192.168.160.10: read: connection lost
Feb  2 22:53:03 freenas ctld[1715]: child process 56872 terminated with exit status 1
Feb  2 22:53:03 freenas ctld[56873]: 192.168.160.10: read: connection lost
Feb  2 22:53:03 freenas ctld[1715]: child process 56873 terminated with exit status 1
Feb  2 22:53:03 freenas ctld[56874]: 192.168.160.10: read: connection lost
Feb  2 22:53:03 freenas ctld[1715]: child process 56874 terminated with exit status 1
Feb  2 22:53:03 freenas ctld[56875]: 192.168.160.10: read: connection lost
Feb  2 22:53:03 freenas ctld[1715]: child process 56875 terminated with exit status 1
Feb  2 22:53:03 freenas ctld[56876]: 192.168.160.10: read: connection lost
Feb  2 22:53:03 freenas ctld[1715]: child process 56876 terminated with exit status 1
Feb  2 22:53:03 freenas ctld[56877]: 192.168.160.10: read: connection lost
Feb  2 22:53:03 freenas ctld[1715]: child process 56877 terminated with exit status 1

/etc/local/syslog-ng.conf
Code:
# Needs freeNAS with syslog-ng (since freeNAS 9.3)
# The following filter cuts unwanted syslog-ng entries caused by an iSCSI connection to freeNAS, which appear every 10 seconds.
# Anyway, iSCSI connection is working correctly in my case.
#
# ==> messages <==
# Feb  2 22:53:02 freenas ctld[56870]: 192.168.160.10: read: connection lost
# Feb  2 22:53:02 freenas ctld[1715]: child process 56870 terminated with exit status 1
#
# Insert the following directly under existing sylog-ng filter rules, adjust your IP address and trigger "service syslog-ng restart"


filter f_cut_ctld01 {
  program("ctld") and
  message("192.168.160.10: read: connection lost"); };
filter f_cut_ctld02 {
  program("ctld") and
  message("child process") and
  message("terminated with exit status 1"); };

log { source(src); filter(f_cut_ctld01); flags(final); };
log { source(src); filter(f_cut_ctld02); flags(final); };
 

everyman

Dabbler
Joined
Jun 24, 2016
Messages
14
Suppress the unwanted connection lost entries with the following. It's a workaround but better than disabling the complete syslog and works for me until next FreeNAS reboot with FreeNAS 9.10 and Proxmox 4.4. iSCSI is communicating over a separate vlan.
Unfortunately I haven't found the source where syslog-ng.conf is being renewed after every reboot. Perhaps someone from ixsystems can help.

Thanks for this. I stopped using iSCSI between FreeNAS and Proxmox because these messages were drowning the console. Added filter, messages stopped. To make it permanent:
Code:
cd /conf/base/etc/local
mv syslog-ng.conf syslog-ng.conf.freenas-dist
cp /etc/local/syslog-ng.conf ./
 
Top