Windows box losing connectivity to FreeNAS server when transferring from one dataset to another

Status
Not open for further replies.

SG872

Cadet
Joined
Sep 20, 2017
Messages
4
When I am moving data from one SMB share \\freenas.local\Backups to \\freenas.local\Veeam Backups (those are on different arrays on the same box) through the 10Gbit NIC on my desktop (X520-DA1), my desktop loses its connection to the FreeNAS server's SMB shares, and FreeFileSync throws an error stating the "Specified network name is no longer available". When I attempt to browse to the SMB share after the transfer stops, it is indeed not accessible; however, all other traffic from my desktop is still flowing and the server is still replying to ICMP traffic.

I can access it from my second system without a problem when I get that error, in fact I can keep on transferring all the data without problems (second system is limited to 1Gbit).

If instead of directly from freenas share to freenas share, I copy the files to my desktop first then over to freenas it is not a problem.
Sadly, my secondary system with the other 10Gbit NIC is offline (hardware failure), so I cannot test with that. I was wondering if anybody has experienced this, has some suggestions, or knows where I should start troubleshooting.


FreeNAS hardware:
SuperMicro X9DRD-iF
2x Intel 2640v2
128GB (8x16GB) ECC Samsung M393B2G70DB0-CMA
Mellanox ConnectX-2 EN using a 2m Meraki DAC to a US-16-XG (9k MTU)
8x3TB WD Red RAID 10
4x8TB WD Red RAID 10
2x LSI-9207-8i
HX1000 PSU
Norco RPC-3216
FreeNAS 11 U3

Desktop (not sure if it is really relevant)
4770k Stock Clocks
32GB RAM
Various SSDs
Intel X520-DA1 (9k MTU) using an Intel-compatible SFP+ transceiver from FiberStore, over 3m of OM4, to a Ubiquiti-compatible SFP+ transceiver (also FiberStore) in the US-16-XG
Windows 7 SP1
 

Artion

Patron
Joined
Feb 12, 2016
Messages
331
from one SMB share \\freenas.local\Backups to \\freenas.local\Veeam Backups (those are on different Arrays on the same box)

Maybe it's not relevant to troubleshooting, but if you just want to have a periodic backup of a dataset you can use the zfs send | zfs receive commands to move the data from one dataset to the other without using the network. I assume the datasets are on different pools. Otherwise the use of snapshots is recommended, as those take relatively less space.
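
For illustration, a minimal sketch of that snapshot-and-send sequence. The pool, dataset, and snapshot names (tank1/Backups, tank2/Backups, migrate1) are hypothetical placeholders, not taken from your system. The function only prints the commands so they can be reviewed before being run as root on the FreeNAS box:

```shell
#!/bin/sh
# Print the commands for a one-off local replication of a dataset.
# All names passed in are hypothetical placeholders; substitute your own.
plan_replication() {
    src="$1"    # source dataset, e.g. tank1/Backups (assumed name)
    dst="$2"    # destination dataset, e.g. tank2/Backups (assumed name)
    snap="$3"   # snapshot name, e.g. migrate1 (assumed name)

    # 1. Take a snapshot of the source dataset.
    echo "zfs snapshot ${src}@${snap}"
    # 2. Stream that snapshot into the other pool, locally, with no network involved.
    echo "zfs send ${src}@${snap} | zfs receive ${dst}"
}

plan_replication tank1/Backups tank2/Backups migrate1
```

As far as I know, zfs receive creates the destination dataset from a full stream, so the destination should not already exist (overwriting an existing one needs -F).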

As for the network issue: are both of the desktop systems (the 10Gb and the 1Gb one) connected to the same switch? If so, and the 10Gb one loses its connection while the 1Gb one does not, I'd suggest that the problem may reside between the switch and the 10Gb box (maybe the switch, maybe the cable, maybe the network card of the box or its drivers).
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
If instead of directly from freenas share to freenas share, I copy the files to my desktop first then over to freenas it is not a problem.
From the hardware description, I am not sure if you are using a switch or if this is a direct connection situation. But it sounds to me like either a hardware fault in one of the network interfaces or possibly some sort of quirk because you are bringing data in and shoving it right back out. Fast bi-directional. All that traffic is going to make the 10Gig card get HOT. Not that the card isn't capable, but it might be getting too hot. As to why it works when you copy it to your system and then copy it back, it will not get as hot when you pull the data to your system (one way traffic) and then push it back again (another one way) because it is only doing one thing at a time instead of both at once as fast as it can. I would suspect that the card in your workstation is not getting enough airflow, but it could be either of them or both. You could test it by putting an extra fan there, directed at the card, just temporarily to run a test.

I will expand on what @Artion said too. You can copy data from one pool or directory to another on FreeNAS by connecting to the system over SSH and giving a command at the prompt. No network traffic is required and it goes at disk interface speed instead of network speed, which is usually much faster. I copy my whole pool to a backup pool every Sunday and it goes very fast. If you use rsync to do the copy, it can even make comparisons and only touch the files that have changed. You can even automate the process with the scheduled task feature in the GUI. There is really no need to do this from another computer; it can happen inside the NAS.
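
As a rough sketch of the rsync variant: the mount points below are hypothetical (FreeNAS mounts pools under /mnt, but the pool and dataset names are made up for the example). The function just prints the command, with rsync's own dry-run flag included, so nothing is copied until you have looked it over:

```shell
#!/bin/sh
# Print an rsync command for a local pool-to-pool copy.
# Paths are hypothetical placeholders; substitute your own mount points.
plan_rsync() {
    src="$1"   # e.g. /mnt/tank1/Backups/ (trailing slash: copy the contents)
    dst="$2"   # e.g. /mnt/tank2/Backups/
    # -a  archive mode (recursive, preserves permissions and timestamps)
    # -v  list files as they are processed
    # -n  dry run: report what would change without copying anything
    echo "rsync -avn ${src} ${dst}"
}

plan_rsync /mnt/tank1/Backups/ /mnt/tank2/Backups/
```

Dropping the -n performs the real copy. Run again later, the same command only transfers files that have changed.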
 
Joined
May 10, 2017
Messages
838
The same thing happens to me if I simultaneously send and receive data to and from my main FreeNAS server using my desktop with MTU=9000. It can also happen when only sending, during very large transfers. The problem goes away, or at least occurs much less often, with a lower MTU. I'm now using 1500; it's a little slower, but I don't get the constant timeouts.

I believe it's related to the Mellanox ConnectX-2 and/or the driver used in FreeNAS; it doesn't happen with any of my other servers using a Linux OS and the same NIC.
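
If you want to rule out a plain MTU mismatch somewhere on the path before blaming the NIC or driver, a don't-fragment ping at the largest payload that fits is a quick check. A small sketch of the arithmetic, assuming IPv4 with no header options (the function name is just for illustration):

```shell
#!/bin/sh
# Largest ICMP echo payload that fits in one unfragmented IPv4 packet:
# MTU minus 20 bytes of IPv4 header (no options) minus 8 bytes of ICMP header.
icmp_payload() {
    echo $(( $1 - 20 - 8 ))
}

icmp_payload 9000   # jumbo frames: prints 8972
icmp_payload 1500   # standard Ethernet: prints 1472
```

On Windows, ping -f -l 8972 followed by the server IP sends that payload with the don't-fragment flag set; on FreeBSD/FreeNAS the equivalent should be ping -D -s 8972. If 1472 gets through but 8972 does not, something on the path is not passing 9000-byte frames.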
 

SG872

Cadet
Joined
Sep 20, 2017
Messages
4
Maybe it's not relevant to troubleshooting, but if you just want to have a periodic backup of a dataset you can use the zfs send | zfs receive commands to move the data from one dataset to the other without using the network. I assume the datasets are on different pools. Otherwise the use of snapshots is recommended, as those take relatively less space.

As for the network issue: are both of the desktop systems (the 10Gb and the 1Gb one) connected to the same switch? If so, and the 10Gb one loses its connection while the 1Gb one does not, I'd suggest that the problem may reside between the switch and the 10Gb box (maybe the switch, maybe the cable, maybe the network card of the box or its drivers).
I was trying to move the backups from one RAID 10 array to another. Both 10Gb devices are on the same switch; the 1Gbit device is 3 hops away: through the 10Gb switch, through a 1Gb switch (also managed), and through a basic switch (this is only temporary).

From the hardware description, I am not sure if you are using a switch or if this is a direct connection situation. But it sounds to me like either a hardware fault in one of the network interfaces or possibly some sort of quirk because you are bringing data in and shoving it right back out. Fast bi-directional. All that traffic is going to make the 10Gig card get HOT. Not that the card isn't capable, but it might be getting too hot. As to why it works when you copy it to your system and then copy it back, it will not get as hot when you pull the data to your system (one way traffic) and then push it back again (another one way) because it is only doing one thing at a time instead of both at once as fast as it can. I would suspect that the card in your workstation is not getting enough airflow, but it could be either of them or both. You could test it by putting an extra fan there, directed at the card, just temporarily to run a test.

I will expand on what @Artion said too. You can copy data from one pool or directory to another on FreeNAS by connecting to the system over SSH and giving a command at the prompt. No network traffic is required and it goes at disk interface speed instead of network speed, which is usually much faster. I copy my whole pool to a backup pool every Sunday and it goes very fast. If you use rsync to do the copy, it can even make comparisons and only touch the files that have changed. You can even automate the process with the scheduled task feature in the GUI. There is really no need to do this from another computer; it can happen inside the NAS.

I'll check the temperatures. While there is a 140mm fan blowing towards the X520, the card may not be directly in its path. Definitely worth investigating, thanks for the suggestions! I will look into direct copy commands in the future if I need to do that, as going straight from disk to disk would save crap tons of time. Also, all of this is going through a switch; no direct connects.

The same thing happens to me if I simultaneously send and receive data to and from my main FreeNAS server using my desktop with MTU=9000. It can also happen when only sending, during very large transfers. The problem goes away, or at least occurs much less often, with a lower MTU. I'm now using 1500; it's a little slower, but I don't get the constant timeouts.

I believe it's related to the Mellanox ConnectX-2 and/or the driver used in FreeNAS; it doesn't happen with any of my other servers using a Linux OS and the same NIC.

Interesting that you are having that issue but on the FreeNAS box side, while mine is on the client side. If it is anything heat related you may want to check that out. My FreeNAS box has reasonable airflow with 4x 80mm fans and 2x 92mm fans on the processors, although I keep it relatively quiet since it is in my bedroom for now.

I will update after some testing. Hopefully I can get my UnRAID server (down with either a CPU/RAM or motherboard problem) up and running in a week or two, so I can also test with its 10GbE card, which is a ConnectX-2 as well.
 
Joined
May 10, 2017
Messages
838
If it is anything heat related you may want to check that out

It's not. I did suspect that at first because it mainly happened after some time during large transfers, but later I found that if I started a transfer in each direction it would happen immediately, ruling that out.
 

SG872

Cadet
Joined
Sep 20, 2017
Messages
4
It's not. I did suspect that at first because it mainly happened after some time during large transfers, but later I found that if I started a transfer in each direction it would happen immediately, ruling that out.

When that happens, if you have any other systems, are they still able to access the shares?

Also, just in case my ConnectX-2 is the issue, what are known working adapters for FreeNAS? (I tend to buy older used server hardware.)
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
Connect using IP, not network names. Try a direct connection with no switch. Try the default MTU, not jumbo frames.
 

SG872

Cadet
Joined
Sep 20, 2017
Messages
4
Connect using IP, not network names. Try a direct connection with no switch. Try the default MTU, not jumbo frames.
I'll try the direct connect. I am using IP addresses as opposed to hostnames; it was just easier to type the hostname in the post.
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
I'll try the direct connect. I am using IP addresses as opposed to hostnames; it was just easier to type the hostname in the post.
I meant use a mapped network drive. Don't use the Network section of Explorer; that uses NetBIOS, which is sketchy.
 
Joined
May 10, 2017
Messages
838
When that happens, if you have any other systems, are they still able to access the shares?

Yes, but using another NIC; the 10GbE NIC is connected directly to my desktop PC. When it errors out, I click retry and the copy operation resumes.
 