Backblaze upload speeds

elorimer

Contributor
Joined
Aug 26, 2019
Messages
194
With 11.2-U6 in place, I'm now exploring online backup to Backblaze. At the moment it is only using a third of my upload bandwidth, and I'm curious whether this can be increased.

I have a low-power FreeNAS server with 16 GB of RAM and three 4 TB drives in RAIDZ1 (call this Main). I am replicating all of the non-media datasets to another, lower-power FreeNAS server with 8 GB of RAM and one 4 TB drive, adequate for now (call this Backup, which is all it does). The media datasets I'm replicating to a local USB drive, as I am less concerned about those. That all seems to be working fine.

So Backblaze would be my third, physically separate, backup. My upload speed is 30 Mb/s; my speed tests are regularly in that range. I have Backblaze configured as a Cloud Sync task in sync mode on Main to back up one dataset on Main, encrypted. It has just finished uploading 128 GB at around 9 Mb/s, taking just on two days. Throughout, CPU usage was very low, there were gobs of memory, and internally data moves from Main to Backup at around 500 Mb/s, so I'm not seeing an obvious bottleneck.
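A quick sanity check on the numbers above, as a minimal Python sketch. It assumes decimal units (1 GB = 10^9 bytes, 1 Mb/s = 10^6 bits per second) and a constant transfer rate; the function names are made up for illustration.

```python
# Back-of-the-envelope upload arithmetic.
# Assumes decimal units (1 GB = 10**9 bytes, 1 Mb/s = 10**6 bits/s)
# and a steady rate for the whole transfer.

def upload_hours(size_gb: float, rate_mbps: float) -> float:
    """Hours needed to push size_gb gigabytes at rate_mbps megabits/second."""
    bits = size_gb * 1e9 * 8
    seconds = bits / (rate_mbps * 1e6)
    return seconds / 3600

def average_rate_mbps(size_gb: float, hours: float) -> float:
    """Average rate (Mb/s) implied by moving size_gb gigabytes in `hours`."""
    return size_gb * 1e9 * 8 / (hours * 3600) / 1e6

if __name__ == "__main__":
    print(f"128 GB at  9 Mb/s: {upload_hours(128, 9):.1f} h")
    print(f"128 GB at 30 Mb/s: {upload_hours(128, 30):.1f} h")
    print(f"share of the line in use: {9 / 30:.0%}")
    print(f"128 GB in 48 h averages {average_rate_mbps(128, 48):.1f} Mb/s")
```

At a steady 9 Mb/s, 128 GB works out to roughly 31.6 hours, versus about 9.5 hours at the full 30 Mb/s. If "just on two days" means close to 48 hours, the average was nearer 5.9 Mb/s, which would suggest the rate sagged below 9 Mb/s at times.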

Before I configure other datasets for upload, or figure out how often the Cloud Sync should run: is this speed normal? Is there something I can do to improve it?
 

Bhoot

Patron
Joined
Mar 28, 2015
Messages
241
Don’t call a single disk a backup. It’s just a disk, and if it’s running constantly, I’d be very wary. The problem is that, with no redundant data to repair from, you stand a chance of slow data corruption that can make many files unreadable or unrecoverable.
From your configuration, I am assuming you’re not using ECC DIMMs.
Also remember that if you write corrupted data to your backup, having any number of backups will just give you the same corrupted files across all of those file systems.
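To make the redundancy point concrete, here is a toy model in Python. It is an illustration under loud assumptions, not how ZFS or FreeNAS actually lay out data: each block carries a checksum, so a bad read can always be detected, but it can only be repaired when a second copy exists. `Pool`, `write`, `corrupt` and `read` are hypothetical names.

```python
import hashlib
from typing import Dict, List, Tuple

# Toy model, NOT the real ZFS on-disk design: every block stores a checksum,
# and each read verifies the stored copies against it.

def checksum(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

class Pool:
    def __init__(self, copies: int) -> None:
        self.copies = copies  # 1 = single disk, 2 = two-way mirror
        self.blocks: Dict[str, Tuple[str, List[bytearray]]] = {}

    def write(self, name: str, data: bytes) -> None:
        # The checksum is taken over whatever bytes arrive, good or bad.
        self.blocks[name] = (checksum(data),
                             [bytearray(data) for _ in range(self.copies)])

    def corrupt(self, name: str, copy: int = 0) -> None:
        # Simulate a bit flip in the first byte of one stored copy.
        self.blocks[name][1][copy][0] ^= 0x01

    def read(self, name: str) -> bytes:
        expected, copies = self.blocks[name]
        good = next((bytes(c) for c in copies if checksum(bytes(c)) == expected),
                    None)
        if good is None:
            # Corruption is detected, but there is nothing to repair from.
            raise IOError(f"{name}: checksum mismatch and no good copy left")
        for i, c in enumerate(copies):
            if checksum(bytes(c)) != expected:
                copies[i] = bytearray(good)  # heal the bad copy
        return good

if __name__ == "__main__":
    single = Pool(copies=1)
    single.write("photo.jpg", b"hello")
    single.corrupt("photo.jpg")
    try:
        single.read("photo.jpg")
    except IOError as err:
        print("single disk:", err)

    mirror = Pool(copies=2)
    mirror.write("photo.jpg", b"hello")
    mirror.corrupt("photo.jpg", copy=0)
    print("mirror:", mirror.read("photo.jpg"))  # healed from copy 1
```

The model's blind spot is the point about backups: if the data is already bad when it is written (for example, corrupted in RAM), the checksum is computed over the bad bytes, every copy verifies cleanly, and replication faithfully carries the damage to every backup.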
FreeNAS requires expensive hardware, but it has saved my neck more than once, including a complete corruption of the OS (the Windows equivalent of a BSOD), and I haven’t lost a single bit of data. I’d ask you to read the recommendations listed in the hardware section, because only you know how important that data is to you, and any loss will be yours alone.
Don’t try to run a Merc on Kia tyres.
 

elorimer

Yes, good advice.

My experience with data loss has always involved physical issues, thus the reason I am exploring a strategy including a cloud solution.
 

elorimer

I think this is not a FreeNAS/Cloud Sync problem. My upload bandwidth tests out at 30 Mb/s on speedtest.net, and 9 Mb/s on the Backblaze speed test page, using Chrome. 9 Mb/s is about what the Cloud Sync speed was. Backblaze says one thread should accommodate about 25-30 Mb/s, so it is likely something about how Backblaze accepts data from my LAN that is limiting things. Don't know why.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,681
Quoting elorimer: "so it is likely something about how Backblaze accepts data from my LAN that is limiting things. Don't know why."

That's not how the Internet works. Unless you are an ISP and have a direct peering with Backblaze, they do not "accept data from [your] LAN".

Speaking as someone in the service provider world: there are many places where this can go awry. The Internet is made of many "autonomous systems", each one belonging to a different company. As an example, let's say you are on a cable ISP in the USA and are trying to reach a data center in the USA. Your data follows a path over:

- A link from your home "router" (actually a NAT gateway) to the ISP
- If the ISP is not just a single-city regional ISP, then quite possibly a link within the ISP's network to a hub site
- A link from the ISP to an upstream major transit network, probably a "tier 2" backbone network
- In the very best case, that "tier 2" may have Backblaze as a customer, but different backbone networks tend to be used for "eyeball" networks than for "cheap server" networks (Cogent, HE, etc.).
- In the more likely case, you may have at least one more, maybe several more, "tier 1" and "tier 2" backbone providers involved
- Eventually the traffic reaches a backbone that has Backblaze as a customer
- And the traffic traverses a link to Backblaze

Now, each time the traffic makes a "hop", you are traversing a circuit that could be experiencing congestion or any of various other issues. Your traffic is also trying desperately hard to paddle upstream, because the predominant direction for eyeball network traffic is downstream (towards you). You will find in many cases that the networks have been optimized for this use case. For example, if you are on cable broadband, DOCSIS 3.1 allows for up to 10Gbps down and 1Gbps up. So if your ISP has a bunch of people trying to send things upstream, you can find yourself limited almost immediately at that point. But it can also happen at various intermediate points. And the path that packets take from Backblaze back to you is almost certainly very different, introducing even more challenges.
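One simplified way to think about the hop-by-hop path: end-to-end throughput can be no better than the hop with the least spare capacity in the direction you are sending. The Python sketch below models just that. Every hop name, capacity and utilization figure is invented for illustration, and real TCP behaviour (fair sharing, loss, queueing) is far messier.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Hop:
    name: str
    capacity_mbps: float  # link capacity in the direction of travel
    utilization: float    # fraction already used by other traffic, 0..1

    @property
    def spare_mbps(self) -> float:
        return self.capacity_mbps * (1.0 - self.utilization)

def bottleneck(path: List[Hop]) -> Hop:
    """The hop with the least spare capacity caps end-to-end throughput."""
    return min(path, key=lambda hop: hop.spare_mbps)

# Entirely hypothetical upstream path from a home NAS toward a cloud provider.
UPSTREAM_PATH = [
    Hop("home gateway -> ISP (cable upstream)", 30, 0.0),
    Hop("ISP node uplink shared with neighbours", 1_000, 0.99),
    Hop("ISP -> tier 2 transit", 100_000, 0.70),
    Hop("backbone -> cloud provider", 10_000, 0.50),
]

if __name__ == "__main__":
    worst = bottleneck(UPSTREAM_PATH)
    print(f"tightest hop: {worst.name}, about {worst.spare_mbps:.0f} Mb/s spare")
```

In this made-up path the home link offers 30 Mb/s, but the busy shared ISP uplink leaves only about 10 Mb/s, so that hop sets the pace even though a speed test to a nearby server might look fine.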

I've spent decades in this industry and it is a complex topic.

You can get some idea of the path from you to Backblaze by finding out where your data is being sent. During a Backblaze session, run "netstat -a" on your NAS, look for ESTABLISHED sessions, and see if you can determine which one (or ones) belongs to Backblaze. Then traceroute to that address.
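The netstat step could be scripted roughly as follows. This Python sketch assumes FreeBSD-style `netstat -an -p tcp` output, where addresses are printed as host.port; the sample output and its addresses (taken from documentation ranges) are made up, and treating the busiest connection to port 443 as the Backblaze one is only a heuristic, based on B2 traffic going over HTTPS.

```python
import subprocess
import sys
from typing import List, Tuple

# Made-up FreeBSD-style output for illustration (documentation addresses).
SAMPLE = """\
Active Internet connections (including servers)
Proto Recv-Q Send-Q Local Address          Foreign Address        (state)
tcp4       0  65700 192.168.1.20.41322     203.0.113.45.443       ESTABLISHED
tcp4       0      0 192.168.1.20.41400     198.51.100.7.443       ESTABLISHED
tcp4       0      0 192.168.1.20.22        192.168.1.5.53012      ESTABLISHED
tcp4       0      0 *.22                   *.*                    LISTEN
"""

def established_https_peers(netstat_text: str) -> List[Tuple[str, int]]:
    """Return (remote host, Send-Q) for ESTABLISHED TCP sessions to port 443."""
    peers = []
    for line in netstat_text.splitlines():
        fields = line.split()
        # Expected columns: Proto Recv-Q Send-Q Local Foreign State
        if len(fields) != 6 or not fields[0].startswith("tcp"):
            continue
        if fields[5] != "ESTABLISHED":
            continue
        # FreeBSD prints host.port, so split on the last dot.
        host, _, port = fields[4].rpartition(".")
        if port == "443":
            peers.append((host, int(fields[2])))
    # Heuristic: a large Send-Q means data is queued to go out (an upload).
    return sorted(peers, key=lambda p: p[1], reverse=True)

if __name__ == "__main__":
    if sys.platform.startswith("freebsd"):
        text = subprocess.check_output(["netstat", "-an", "-p", "tcp"],
                                       universal_newlines=True)
    else:
        text = SAMPLE  # not on FreeBSD: parse the made-up sample instead
    for host, send_q in established_https_peers(text):
        print(f"{host}  Send-Q={send_q}  ->  traceroute {host}")
```

Each printed host is a candidate to feed to `traceroute`; the one at the top of the list is the most likely upload session.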
 

elorimer

Interesting. I thought there might be some interaction going on with rclone, like the server would tell BB it was sending a block, and wait until BB said it didn't already have it, rather than a one-way asynchronous blast.
 

jgreco

Quoting elorimer: "Interesting. I thought there might be some interaction going on with rclone, like the server would tell BB it was sending a block, and wait until BB said it didn't already have it, rather than a one-way asynchronous blast."

In general, if you want high performance, you cannot wait for Internet round trips ("RTT"), because it isn't that uncommon for just the network portion of an interaction to be 50-100ms, which allows only 10-20 operations per second. Plus you need some extra processing time to actually do a lookup to see if the data block already exists, or whatever. So typically a protocol has a way to stream a series of requests to the remote end so that a bunch of things can be "in progress".
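The RTT arithmetic (a 50 ms round trip allows at most 1 / 0.05 = 20 stop-and-wait operations per second) can be seen in a small asyncio simulation. This is a Python sketch in which each request simply sleeps for one hypothetical RTT; it does not talk to Backblaze or model rclone's actual protocol, and the request count and concurrency level are arbitrary.

```python
import asyncio
import time

# Simulation only: each "request" just sleeps for one round trip (RTT).
RTT = 0.05      # hypothetical 50 ms round trip
REQUESTS = 20   # hypothetical number of block lookups/uploads

async def request(_: int) -> None:
    await asyncio.sleep(RTT)  # stand-in for "send, then wait for the reply"

async def stop_and_wait() -> float:
    start = time.perf_counter()
    for i in range(REQUESTS):
        await request(i)  # the next request goes out only after the reply
    return time.perf_counter() - start

async def pipelined(in_flight: int) -> float:
    start = time.perf_counter()
    limit = asyncio.Semaphore(in_flight)  # cap on outstanding requests

    async def bounded(i: int) -> None:
        async with limit:
            await request(i)

    await asyncio.gather(*(bounded(i) for i in range(REQUESTS)))
    return time.perf_counter() - start

if __name__ == "__main__":
    print(f"ceiling with one outstanding request: {1 / RTT:.0f} ops/s")
    print(f"stop-and-wait: {asyncio.run(stop_and_wait()):.2f} s")
    print(f"10 in flight:  {asyncio.run(pipelined(10)):.2f} s")
```

With these numbers, stop-and-wait needs about 1 s for 20 requests, while keeping 10 in flight finishes in roughly two round trips (about 0.1 s). The network is no faster; the client simply stops idling while replies are on their way.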
 