SMB Errors/Network seems to drop out during transfer - Error Network Error 0x8007003B

Reiject

Cadet
Joined
Mar 20, 2020
Messages
4
Hi,

I am struggling to figure out whether the error above, which I'm seeing from two different cabled Windows clients, is being caused by something I've done wrong when setting up my FreeNAS system:
  • HP Proliant ML10 Gen9
  • Intel Pentium G4400 CPU
  • 32GB ECC RAM (Crucial)
  • 4x WD Red 6TB drives in RAIDZ
  • Network card identified as Intel i219 (built-in NIC)
I have set this up with deduplication on some datasets, so I am expecting to need to add more RAM, but I've not been able to transfer more than a few hundred GB because the transfers stall too frequently to be usable.
If there's any other information I need to consider to figure this out, please let me know. Any help is greatly appreciated!
 

Attachments

  • 2020-03-20.png

pschatz100

Guru
Joined
Mar 30, 2014
Messages
1,184
Did you do anything special with your network setup? What happens when you disable dedup?

I don't have any experience with dedup, but I know that 32GB isn't nearly enough memory for even small datasets. For good measure, I would check CPU usage and memory usage just to see if anything strange is going on there.
 

Reiject

Cadet
Joined
Mar 20, 2020
Messages
4
Thanks for the response! Nothing that I would call "special" about the network setup. I have a Netgear ProSafe GS108E switch and an Asus Wi-Fi AC router. I'm not using jumbo frames, and no config has been done on the switch. CPU usage seems to spike to around 70% when transfers are able to maintain 100 MB/s, and the RAM graph seems to indicate most of it is being used by the ZFS cache.

I'm aware dedup is almost frowned upon because of its hardware requirements. I intended to increase the RAM as I populated the datasets using the feature, on the assumption that dedup's memory usage is determined by how much storage is in use.
 

Attachments

  • 2020-03-22.png

pschatz100

Guru
Joined
Mar 30, 2014
Messages
1,184
It would make sense that most of your memory is taken by cache. That is always the case with ZFS, but keep in mind that dedup tables and metadata are very, very large. I am going to guess that your system is not configured properly to run dedup efficiently. For people using dedup, I see references to massive amounts of memory.

- Do the copy jobs run to completion when dedup is not being used? I would do this as a test just to rule out other possible problems.
- Is the system using swap when copying with dedup on? If yes, that will be an indication that you don't have enough memory.
- Have you tested some of your data to see how much space you will actually save by using dedup? It might not be worth it. Not all data will benefit very much from dedup.

Note: Not specifically related to your question, but it is not wise to run RaidZ with drives as large as 6TB. RaidZ2 is much safer. Search online and in the forum for comments about redundancy versus disk size.
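
To put a rough number on the memory point above: a commonly cited rule of thumb is about 320 bytes of RAM per dedup table entry, with one entry per unique block. The sketch below only does that arithmetic. The 320-byte figure, the 64 KiB average block size, and the function name are assumptions for illustration, not values measured on this system. Running `zdb -S` against the pool is the usual way to simulate dedup on the real data instead of guessing.

```python
TIB = 1024 ** 4
GIB = 1024 ** 3


def estimate_ddt_ram_bytes(data_bytes, avg_block_bytes=64 * 1024, bytes_per_entry=320):
    """Rough in-RAM dedup table size, assuming every block is unique.

    avg_block_bytes and bytes_per_entry are rule-of-thumb assumptions;
    real pools vary with recordsize and how well the data dedups.
    """
    blocks = -(-data_bytes // avg_block_bytes)  # ceiling division
    return blocks * bytes_per_entry


# Print the estimate for a few example data sizes (hypothetical values).
for tib in (1, 6, 12):
    ram = estimate_ddt_ram_bytes(tib * TIB)
    print(f"{tib:>2} TiB of unique 64 KiB blocks -> ~{ram / GIB:.1f} GiB of dedup table")
```

Under these assumptions the table grows linearly with the amount of unique data, so smaller average block sizes (many small files) push the estimate up quickly.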
 

Reiject

Cadet
Joined
Mar 20, 2020
Messages
4
Thanks again for the response. Today I rebuilt my storage pool as RAIDZ2, adding a spare drive I had (originally removed because it reported pending sectors, though it subsequently passed the WD Data Lifeguard Diagnostic tests), and did not enable dedup. I'm copying 500GB across to it now, and while I've yet to see the error quoted above, I am getting regular drops in transfer speed from 100 MB/s to 0. It tends to hang for a couple of seconds before climbing back up and repeating. This is the same behaviour it was exhibiting before it gave me the error referenced earlier.

Is there anything else I can do to monitor what is going on with the server itself to figure out where this behaviour is coming from?
Something I considered was putting one of the NTFS drives holding my data into the server FreeNAS is installed on and importing the disk directly.
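
One way to capture the stalls described above as numbers rather than watching the Windows copy dialog is to time each chunk of a test write. This is a minimal sketch, not a benchmark tool: the function name, the 1 MiB chunk size, and the one-second stall threshold are all assumptions picked for illustration, and the demo writes to the local temp directory only so that it runs anywhere.

```python
import os
import tempfile
import time


def timed_write(path, total_bytes, chunk_bytes=1024 * 1024, stall_seconds=1.0):
    """Write total_bytes to path in chunks, timing each chunk.

    Returns (durations, stalls): durations holds seconds per chunk, and
    stalls holds (offset, seconds) for chunks slower than stall_seconds.
    """
    payload = os.urandom(chunk_bytes)  # random data, so compression cannot hide the cost
    durations, stalls = [], []
    written = 0
    with open(path, "wb") as f:
        while written < total_bytes:
            n = min(chunk_bytes, total_bytes - written)
            start = time.monotonic()
            f.write(payload[:n])
            f.flush()
            os.fsync(f.fileno())  # ask the OS to push the chunk out before timing stops
            elapsed = time.monotonic() - start
            durations.append(elapsed)
            if elapsed >= stall_seconds:
                stalls.append((written, elapsed))
            written += n
    return durations, stalls


# Demo against a local temp file; the file name is hypothetical.
demo_path = os.path.join(tempfile.gettempdir(), "stalltest.bin")
durations, stalls = timed_write(demo_path, 4 * 1024 * 1024)
print(f"chunks: {len(durations)}, slowest: {max(durations):.3f}s, stalls: {len(stalls)}")
os.remove(demo_path)
```

To test the NAS, point `path` at a file on the mapped share and use a much larger `total_bytes`. If the stalls show up at regular intervals here too, the Windows copy dialog is not the cause, and the timestamps can be compared against the disk and network graphs on the server.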
 

Attachments

  • 2020-03-22 (1).png
  • 2020-03-22 (2).png

pschatz100

Guru
Joined
Mar 30, 2014
Messages
1,184
Are you using IPv6? Maybe try disabling that to see if anything changes.

The only other thing I can think of is something related to hardware or your network setup: a bad cable, a bad switch, or something similar. I would suggest that you double-check all your network connections. I once had a difficult-to-find network problem that ended up being caused by an old CAT 5 cable that somehow got used even though I had upgraded all my cables when I started using gigabit networking.
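
To separate a network problem from a disk or SMB problem, it can help to push raw bytes over TCP with no storage involved. A dedicated tool such as iperf3 is the more usual choice. The sketch below is only a bare-bones stand-in to show the idea. All function names are made up for illustration, and the demo runs over loopback, so its number says nothing about the real link until the receiver is run on the NAS and the sender on a client.

```python
import socket
import threading
import time

CHUNK = 64 * 1024


def receive_all(server_sock, result):
    """Accept one connection and count bytes until the sender closes it."""
    conn, _ = server_sock.accept()
    total = 0
    with conn:
        while True:
            data = conn.recv(CHUNK)
            if not data:
                break
            total += len(data)
    result["bytes"] = total


def send_bytes(host, port, total_bytes):
    """Send total_bytes of zeros to host:port; return (bytes sent, seconds)."""
    payload = b"\0" * CHUNK
    sent = 0
    start = time.monotonic()
    with socket.create_connection((host, port)) as s:
        while sent < total_bytes:
            n = min(CHUNK, total_bytes - sent)
            s.sendall(payload[:n])
            sent += n
    return sent, time.monotonic() - start


def loopback_demo(total_bytes=8 * 1024 * 1024):
    """Run receiver and sender in one process over 127.0.0.1."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]
    result = {}
    receiver = threading.Thread(target=receive_all, args=(server, result))
    receiver.start()
    sent, seconds = send_bytes("127.0.0.1", port, total_bytes)
    receiver.join()
    server.close()
    return sent, result["bytes"], seconds


sent, received, seconds = loopback_demo()
rate = sent / max(seconds, 1e-9) / 1e6
print(f"sent {sent} bytes, received {received} bytes, ~{rate:.0f} MB/s")
```

If a raw transfer across the real link also drops to zero every few seconds, the cable, switch, or NIC is the place to look. If it holds a steady rate, the stalls are more likely on the storage side.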

Of course, you can mount an NTFS disk and copy the data off it.

Good luck.
 

Reiject

Cadet
Joined
Mar 20, 2020
Messages
4
Looking at the disk writes: when I start to observe the error, I notice the disk write activity takes on this strange pulsing pattern. Might this point to any particular problem? Or can I look elsewhere to see what's causing this?
 

Attachments

  • 2020-03-25 (1).png