slow replication task speed

chocamo

Dabbler
Joined
Sep 3, 2020
Messages
12
I have this 12.0 system, a production 11.3 FreeNAS box, and an offsite target at rsync.net. Between the FreeNAS boxes and to rsync.net I average between 20-40 MB/s. I have tried both the SSH and SSH+NETCAT task types.

iperf3 between the 2 FreeNAS boxes:

Code:
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  10.8 GBytes  9.30 Gbits/sec    0             sender
[  5]   0.00-10.02  sec  10.8 GBytes  9.28 Gbits/sec                  receiver


While the replication task is running on the 12.0 box:


Code:
root@freenas[/tmp]# zpool iostat -v
                                                  capacity     operations     bandwidth
pool                                            alloc   free   read  write   read  write
----------------------------------------------  -----  -----  -----  -----  -----  -----
BACKUPS                                         5.61T   125T    140     66  8.13M  8.42M
  raidz2                                         953G  20.9T     21     11  1.35M  1.40M
    gptid/27a1fa0e-ec58-11ea-8a08-0cc47a6a1836      -      -      3      1   231K   239K
    gptid/296b54aa-ec58-11ea-8a08-0cc47a6a1836      -      -      3      1   232K   239K
    gptid/293b7a0f-ec58-11ea-8a08-0cc47a6a1836      -      -      3      1   229K   239K
    gptid/2a884f25-ec58-11ea-8a08-0cc47a6a1836      -      -      3      1   230K   239K
    gptid/2b3f6c16-ec58-11ea-8a08-0cc47a6a1836      -      -      3      1   231K   239K
    gptid/2cb00c20-ec58-11ea-8a08-0cc47a6a1836      -      -      3      1   229K   239K
  raidz2                                         942G  20.9T     20     11  1.33M  1.39M
    gptid/2dbeaf7c-ec58-11ea-8a08-0cc47a6a1836      -      -      3      1   229K   237K
    gptid/2f0992b8-ec58-11ea-8a08-0cc47a6a1836      -      -      3      1   228K   237K
    gptid/2fbb3e59-ec58-11ea-8a08-0cc47a6a1836      -      -      3      1   226K   237K
    gptid/307045da-ec58-11ea-8a08-0cc47a6a1836      -      -      3      1   229K   237K
    gptid/31c9dff6-ec58-11ea-8a08-0cc47a6a1836      -      -      3      1   228K   237K
    gptid/329e647d-ec58-11ea-8a08-0cc47a6a1836      -      -      3      1   226K   237K
  raidz2                                         942G  20.9T     21     10  1.33M  1.39M
    gptid/33865090-ec58-11ea-8a08-0cc47a6a1836      -      -      3      1   223K   237K
    gptid/348c3757-ec58-11ea-8a08-0cc47a6a1836      -      -      3      1   228K   237K
    gptid/35a40808-ec58-11ea-8a08-0cc47a6a1836      -      -      3      1   229K   237K
    gptid/3657905a-ec58-11ea-8a08-0cc47a6a1836      -      -      3      1   223K   237K
    gptid/37c1860d-ec58-11ea-8a08-0cc47a6a1836      -      -      3      1   227K   237K
    gptid/3966f4ee-ec58-11ea-8a08-0cc47a6a1836      -      -      3      1   229K   237K
  raidz2                                         940G  20.9T     20     10  1.33M  1.38M
    gptid/387c039a-ec58-11ea-8a08-0cc47a6a1836      -      -      3      1   229K   236K
    gptid/3a24df68-ec58-11ea-8a08-0cc47a6a1836      -      -      3      1   227K   236K
    gptid/3b40abe3-ec58-11ea-8a08-0cc47a6a1836      -      -      3      1   224K   236K
    gptid/3be91500-ec58-11ea-8a08-0cc47a6a1836      -      -      3      1   230K   236K
    gptid/3cd99499-ec58-11ea-8a08-0cc47a6a1836      -      -      3      1   229K   236K
    gptid/3dce6b0d-ec58-11ea-8a08-0cc47a6a1836      -      -      3      1   225K   236K
  raidz2                                         983G  20.9T     28     11  1.40M  1.43M
    gptid/3ed0fd95-ec58-11ea-8a08-0cc47a6a1836      -      -      4      1   237K   244K
    gptid/3fafcabb-ec58-11ea-8a08-0cc47a6a1836      -      -      4      1   236K   244K
    gptid/408723c6-ec58-11ea-8a08-0cc47a6a1836      -      -      5      1   241K   244K
    gptid/413c7e46-ec58-11ea-8a08-0cc47a6a1836      -      -      4      1   237K   244K
    gptid/420bd820-ec58-11ea-8a08-0cc47a6a1836      -      -      4      1   237K   244K
    gptid/42e335a5-ec58-11ea-8a08-0cc47a6a1836      -      -      4      1   242K   244K
  raidz2                                         981G  20.9T     28     11  1.39M  1.42M
    gptid/43505487-ec58-11ea-8a08-0cc47a6a1836      -      -      4      1   236K   243K
    gptid/43f0cfe8-ec58-11ea-8a08-0cc47a6a1836      -      -      4      1   238K   243K
    gptid/445b66d5-ec58-11ea-8a08-0cc47a6a1836      -      -      4      1   240K   243K
    gptid/44c9e047-ec58-11ea-8a08-0cc47a6a1836      -      -      4      1   236K   243K
    gptid/4508a538-ec58-11ea-8a08-0cc47a6a1836      -      -      5      1   238K   243K
    gptid/453112d5-ec58-11ea-8a08-0cc47a6a1836      -      -      5      1   240K   243K
----------------------------------------------  -----  -----  -----  -----  -----  -----
freenas-boot                                    3.75G   440G      0      3  10.1K  49.6K
  mirror                                        3.75G   440G      0      3  10.1K  49.6K
    da0p2                                           -      -      0      1  5.05K  24.8K
    da1p2                                           -      -      0      1  5.01K  24.8K
----------------------------------------------  -----  -----  -----  -----  -----  -----
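
If it helps, I can also measure what the pool can feed on the sending side with the network taken out of the picture; a rough sketch (dataset and snapshot names are just placeholders):

Code:
# send the snapshot stream to /dev/null; on FreeBSD, Ctrl-T makes dd print the current rate
zfs send -w -L -c POOL/DATASET@auto-20200901.1029-1m | dd of=/dev/null bs=1M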



Any suggestions?
 

morganL

Captain Morgan
Administrator
Moderator
iXsystems
Joined
Mar 10, 2018
Messages
2,691
Are you running zfs replication or rsync replication?

Are CPU or disks busy?
 

chocamo

Dabbler
Joined
Sep 3, 2020
Messages
12
Built-in replication task, so ZFS.

CPU overall is not busy; top processes:

Code:
USER         PID   %CPU %MEM     VSZ     RSS TT  STAT STARTED         TIME COMMAND
root          11 4790.3  0.0       0     768  -  RNL   5Sep20 719829:35.45 [idle]
root       91731   30.6  0.0   21260   11048  -  S    12:34      108:02.57 /usr/local/bin/ssh -i /tmp/tmp9kd3cu9j -o UserKnownHostsFile=/tmp/tmpfdf1fjeb -o StrictHostKeyChecking=yes -o BatchMode=yes -o ConnectTimeout=10 -p22 TARGET_IP zfs recv -s -F POOL/DATASET
root           0    8.0  0.0       0   36384  -  DLs   5Sep20   1110:03.41 [kernel]
root       91732    1.8  0.0   19580    8312  -  D    12:34       11:04.84 zfs send -V -p -w -L -c POOL/DATASET@auto-20200901.1029-1m
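
Since the transfer goes through a single ssh process, I can also check what a bare ssh pipe manages on this link if that's useful, something along these lines (TARGET_IP as above):

Code:
# push zeros through ssh and discard them on the far end; dd reports the transfer rate when it finishes
dd if=/dev/zero bs=1M count=4096 | ssh TARGET_IP 'cat > /dev/null'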
 

chocamo

Dabbler
Joined
Sep 3, 2020
Messages
12
cpu_utilization.PNG
 

jenksdrummer

Patron
Joined
Jun 7, 2011
Messages
250
The best option I have found for replication tasks is to pull using netcat+SSH. Not sure if that helps or not, but that's all I've got - lol
 

morganL

Captain Morgan
Administrator
Moderator
iXsystems
Joined
Mar 10, 2018
Messages
2,691
Agreed on netcat + SSH. Are the charts above from the FreeNAS and the TrueNAS systems? CPU is not busy.

I'd check whether the drives are busy... how many drives and what VDEV layout?
 

chocamo

Dabbler
Joined
Sep 3, 2020
Messages
12
It was running between the FreeNAS systems when I made this post. I posted `zpool iostat -v` with the drive layout and speeds while the task was running. Is there a different command I should run instead to show the busy numbers? Here it is again in case it isn't showing up in the original post. This is on the backup system the data is being written to.

I did try with netcat, and the speeds do not change. Same numbers no matter which way I run it.


Code:
root@freenas[/tmp]# zpool iostat -v
                                                  capacity     operations     bandwidth
pool                                            alloc   free   read  write   read  write
----------------------------------------------  -----  -----  -----  -----  -----  -----
BACKUPS                                         5.61T   125T    140     66  8.13M  8.42M
  raidz2                                         953G  20.9T     21     11  1.35M  1.40M
    gptid/27a1fa0e-ec58-11ea-8a08-0cc47a6a1836      -      -      3      1   231K   239K
    gptid/296b54aa-ec58-11ea-8a08-0cc47a6a1836      -      -      3      1   232K   239K
    gptid/293b7a0f-ec58-11ea-8a08-0cc47a6a1836      -      -      3      1   229K   239K
    gptid/2a884f25-ec58-11ea-8a08-0cc47a6a1836      -      -      3      1   230K   239K
    gptid/2b3f6c16-ec58-11ea-8a08-0cc47a6a1836      -      -      3      1   231K   239K
    gptid/2cb00c20-ec58-11ea-8a08-0cc47a6a1836      -      -      3      1   229K   239K
  raidz2                                         942G  20.9T     20     11  1.33M  1.39M
    gptid/2dbeaf7c-ec58-11ea-8a08-0cc47a6a1836      -      -      3      1   229K   237K
    gptid/2f0992b8-ec58-11ea-8a08-0cc47a6a1836      -      -      3      1   228K   237K
    gptid/2fbb3e59-ec58-11ea-8a08-0cc47a6a1836      -      -      3      1   226K   237K
    gptid/307045da-ec58-11ea-8a08-0cc47a6a1836      -      -      3      1   229K   237K
    gptid/31c9dff6-ec58-11ea-8a08-0cc47a6a1836      -      -      3      1   228K   237K
    gptid/329e647d-ec58-11ea-8a08-0cc47a6a1836      -      -      3      1   226K   237K
  raidz2                                         942G  20.9T     21     10  1.33M  1.39M
    gptid/33865090-ec58-11ea-8a08-0cc47a6a1836      -      -      3      1   223K   237K
    gptid/348c3757-ec58-11ea-8a08-0cc47a6a1836      -      -      3      1   228K   237K
    gptid/35a40808-ec58-11ea-8a08-0cc47a6a1836      -      -      3      1   229K   237K
    gptid/3657905a-ec58-11ea-8a08-0cc47a6a1836      -      -      3      1   223K   237K
    gptid/37c1860d-ec58-11ea-8a08-0cc47a6a1836      -      -      3      1   227K   237K
    gptid/3966f4ee-ec58-11ea-8a08-0cc47a6a1836      -      -      3      1   229K   237K
  raidz2                                         940G  20.9T     20     10  1.33M  1.38M
    gptid/387c039a-ec58-11ea-8a08-0cc47a6a1836      -      -      3      1   229K   236K
    gptid/3a24df68-ec58-11ea-8a08-0cc47a6a1836      -      -      3      1   227K   236K
    gptid/3b40abe3-ec58-11ea-8a08-0cc47a6a1836      -      -      3      1   224K   236K
    gptid/3be91500-ec58-11ea-8a08-0cc47a6a1836      -      -      3      1   230K   236K
    gptid/3cd99499-ec58-11ea-8a08-0cc47a6a1836      -      -      3      1   229K   236K
    gptid/3dce6b0d-ec58-11ea-8a08-0cc47a6a1836      -      -      3      1   225K   236K
  raidz2                                         983G  20.9T     28     11  1.40M  1.43M
    gptid/3ed0fd95-ec58-11ea-8a08-0cc47a6a1836      -      -      4      1   237K   244K
    gptid/3fafcabb-ec58-11ea-8a08-0cc47a6a1836      -      -      4      1   236K   244K
    gptid/408723c6-ec58-11ea-8a08-0cc47a6a1836      -      -      5      1   241K   244K
    gptid/413c7e46-ec58-11ea-8a08-0cc47a6a1836      -      -      4      1   237K   244K
    gptid/420bd820-ec58-11ea-8a08-0cc47a6a1836      -      -      4      1   237K   244K
    gptid/42e335a5-ec58-11ea-8a08-0cc47a6a1836      -      -      4      1   242K   244K
  raidz2                                         981G  20.9T     28     11  1.39M  1.42M
    gptid/43505487-ec58-11ea-8a08-0cc47a6a1836      -      -      4      1   236K   243K
    gptid/43f0cfe8-ec58-11ea-8a08-0cc47a6a1836      -      -      4      1   238K   243K
    gptid/445b66d5-ec58-11ea-8a08-0cc47a6a1836      -      -      4      1   240K   243K
    gptid/44c9e047-ec58-11ea-8a08-0cc47a6a1836      -      -      4      1   236K   243K
    gptid/4508a538-ec58-11ea-8a08-0cc47a6a1836      -      -      5      1   238K   243K
    gptid/453112d5-ec58-11ea-8a08-0cc47a6a1836      -      -      5      1   240K   243K
----------------------------------------------  -----  -----  -----  -----  -----  -----
freenas-boot                                    3.75G   440G      0      3  10.1K  49.6K
  mirror                                        3.75G   440G      0      3  10.1K  49.6K
    da0p2                                           -      -      0      1  5.05K  24.8K
    da1p2                                           -      -      0      1  5.01K  24.8K
----------------------------------------------  -----  -----  -----  -----  -----  -----





Right now the offsite task to rsync.net is running. The CPU isn't busy there either, and the pool stats don't reveal a lot of information on their side, but it's consistent with what happens on the other FreeNAS system.


rsync.net stats

Code:
              capacity     operations     bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
data1       3.20T  4.31T      2     12  32.4K  2.71M
  vtbd1     3.20T  4.31T      2     12  32.4K  2.71M


Code:
USER   PID %CPU %MEM   VSZ   RSS TT  STAT STARTED        TIME COMMAND
root    11 80.0  0.0     0    32  -  RNL  26Aug20 55502:15.82 [idle]
root 82834 42.0  0.3 23908 13452  -  Ss   11:34     424:40.16 sshd: root@notty (sshd)
root    12 31.0  0.0     0   272  -  WL   26Aug20   910:28.11 [intr]
root  3656  8.0  0.2 17672  8900  -  D    08:45       0:01.01 zfs list -t snapshot -H -o name -s name -r POOL/DATASET
root 82838  3.0  0.2 17024  8236  -  S    11:34      46:42.92 zfs recv -s -F POOL/DATASET
 

chocamo

Dabbler
Joined
Sep 3, 2020
Messages
12
I looked at the historical disk-busy report in the web UI. All disks follow the same curve and %; no, definitely not too busy.
disk_busy.PNG
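
For reference, I believe the live per-disk view at the shell would be something like:

Code:
# per-disk %busy, physical devices only, refreshed every second
gstat -p -I 1s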
 

hescominsoon

Patron
Joined
Jul 27, 2016
Messages
449
OK, what is your outgoing WAN speed?
 

morganL

Captain Morgan
Administrator
Moderator
iXsystems
Joined
Mar 10, 2018
Messages
2,691
I guess you never tested performance between two 11.3 FreeNAS systems? If TrueNAS 12.0 made it slower, then it would be a bug. Can you do that test?

We haven't found any reason for it to be this slow. In any case, I'd suggest it's worth reporting.
 

chocamo

Dabbler
Joined
Sep 3, 2020
Messages
12
I retested with 11.3 straight to rsync.net and it's the same speed (25-40 MB/s). The 11.3 + 12.0 system is a Supermicro SSG-6048R-E1CR36H with an Intel E5-2670 v3; the RAIDZ2 stripes in the pool consist of 36 HGST HUS726040AL5210 drives behind an LSI 3108 in HBA passthrough mode. The network driver in dmesg is below:

Code:
ix0: <Intel(R) PRO/10GbE PCI-Express Network Driver, Version - 3.2.12-k> port 0xf020-0xf03f mem 0xfbb80000-0xfbbfffff,0xfbc04000-0xfbc07fff irq 56 at device 0.0 numa-domain 1 on pci12
ix1: <Intel(R) PRO/10GbE PCI-Express Network Driver, Version - 3.2.12-k> port 0xf000-0xf01f mem 0xfbb00000-0xfbb7ffff,0xfbc00000-0xfbc03fff irq 60 at device 0.1 numa-domain 1 on pci12
ix2: <Intel(R) PRO/10GbE PCI-Express Network Driver, Version - 3.2.12-k> port 0xe020-0xe03f mem 0xfb980000-0xfb9fffff,0xfba04000-0xfba07fff irq 58 at device 0.0 numa-domain 1 on pci13
ix3: <Intel(R) PRO/10GbE PCI-Express Network Driver, Version - 3.2.12-k> port 0xe000-0xe01f mem 0xfb900000-0xfb97ffff,0xfba00000-0xfba03fff irq 61 at device 0.1 numa-domain 1 on pci13

 

Apollo

Wizard
Joined
Jun 13, 2013
Messages
1,449
I retested with 11.3 straight to rsync.net and it's the same speed (25-40 MB/s). The 11.3 + 12.0 system is a Supermicro SSG-6048R-E1CR36H with an Intel E5-2670 v3; the RAIDZ2 stripes in the pool consist of 36 HGST HUS726040AL5210 drives behind an LSI 3108 in HBA passthrough mode. The network driver in dmesg is below:

Code:
ix0: <Intel(R) PRO/10GbE PCI-Express Network Driver, Version - 3.2.12-k> port 0xf020-0xf03f mem 0xfbb80000-0xfbbfffff,0xfbc04000-0xfbc07fff irq 56 at device 0.0 numa-domain 1 on pci12
ix1: <Intel(R) PRO/10GbE PCI-Express Network Driver, Version - 3.2.12-k> port 0xf000-0xf01f mem 0xfbb00000-0xfbb7ffff,0xfbc00000-0xfbc03fff irq 60 at device 0.1 numa-domain 1 on pci12
ix2: <Intel(R) PRO/10GbE PCI-Express Network Driver, Version - 3.2.12-k> port 0xe020-0xe03f mem 0xfb980000-0xfb9fffff,0xfba04000-0xfba07fff irq 58 at device 0.0 numa-domain 1 on pci13
ix3: <Intel(R) PRO/10GbE PCI-Express Network Driver, Version - 3.2.12-k> port 0xe000-0xe01f mem 0xfb900000-0xfb97ffff,0xfba00000-0xfba03fff irq 61 at device 0.1 numa-domain 1 on pci13

It seems to me rsync.net is located somewhere on the internet, so I believe the speed limit is between your LAN and the internet, or between the internet and rsync.net.
 

chocamo

Dabbler
Joined
Sep 3, 2020
Messages
12
It seems to me rsync.net is located somewhere on the internet, so I believe the speed limit is between your LAN and the internet, or between the internet and rsync.net.

I have a 1 Gbps symmetrical line and 10 Gbps on the LAN side; 1 Gbps works out to roughly 125 MB/s on the wire, so 25-40 MB/s isn't close to saturating either link. Regardless of where I send it, I'm in that 25-40 MB/s range.
 

chocamo

Dabbler
Joined
Sep 3, 2020
Messages
12
I had the settings wrong when I changed the 11.3 system. 12.0-RC1 solved my issues for everything else.
 

morganL

Captain Morgan
Administrator
Moderator
iXsystems
Joined
Mar 10, 2018
Messages
2,691
Does this mean the problem is solved? If so, can you write up the problem and how it was solved?
 

chocamo

Dabbler
Joined
Sep 3, 2020
Messages
12
The problem was slow replication speed, maxing out at 40 MiB/s on 12.0-BETA2. After updating to 12.0-RC1, speeds started hitting ~400 MiB/s. Unfortunately, I'm not sure what changed in the RC that improved things.
 