Advice Needed: Uploading 10TB to AWS Glacier Deep Archive from TrueNAS-SCALE-23.10.1

Himala

Cadet
Joined
Oct 18, 2023
Messages
2
Hello TrueNAS Community,

I'm a new user of TrueNAS, currently working with TrueNAS-SCALE-23.10.1. As part of a project, I need to archive a large dataset – about 10TB of raw images. My plan is to use AWS Glacier Deep Archive for its long-term storage benefits.

I'm looking for guidance on the best way to transfer this data from TrueNAS directly to an AWS S3 bucket, and subsequently to Glacier Deep Archive. Here are my specific questions:

  1. Does TrueNAS-SCALE-23.10.1 offer any native features or compatible third-party tools for directly uploading to AWS S3?
  2. What are the best practices within the TrueNAS framework for transferring such a significant amount of data to the cloud, particularly focusing on maintaining data integrity and optimizing transfer efficiency?
  3. If anyone has experience with a similar task, could you share your insights, especially any challenges you faced and how you overcame them?
Your expertise and experiences would be incredibly valuable to me as I undertake this task. Any advice, tips, or suggestions are highly appreciated.

Thank you all in advance for your help and support!
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Hey @Himala

Both TrueNAS SCALE and CORE can sync to cloud storage providers like Amazon S3 - see the Adding Cloud Sync Tasks section of the documentation. You can then set up a sync task on a dataset to send its files to the bucket as objects.

The cloud sync task itself runs through rclone under the hood - depending on your available bandwidth and the size and number of the files you want to upload, you may need to increase the number of parallel transfers.
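TrueNAS cloud sync tasks are built on rclone, so for a sense of what the task does, a roughly equivalent manual invocation might look like the sketch below. The remote name `s3remote`, the bucket, and the dataset path are all placeholders - substitute your own.

```shell
#!/bin/sh
# Sketch only: assumes an rclone remote named "s3remote" has already been
# set up via `rclone config` with your AWS credentials. The bucket name
# and dataset path below are placeholders.
SRC="/mnt/tank/raw-images"        # hypothetical dataset mountpoint
DST="s3remote:my-archive-bucket"  # hypothetical bucket

# --s3-storage-class DEEP_ARCHIVE uploads objects directly into Deep
# Archive, skipping the "standard S3 first, then transition" stage.
# --transfers raises the number of parallel uploads.
CMD="rclone sync $SRC $DST --s3-storage-class DEEP_ARCHIVE --transfers 8 --checksum"

# Print the command until you're ready; set CONFIRM=1 to actually run it.
if [ "${CONFIRM:-0}" = "1" ]; then
    $CMD
else
    echo "$CMD"
fi
```

Adding `--dry-run` to the command on the first pass will show what rclone would transfer without uploading anything.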

You may experience some challenges using Glacier with the automatic sync tasks, though - for example, TrueNAS won't delete files out of Glacier (in order to avoid the early-deletion penalties for objects that haven't met the 180-day minimum storage duration). There's also the potential billing impact of the two-stage "upload to standard S3, then migrate into Glacier" approach - you're likely to incur charges at both stages from Amazon. You may also want to do a file count to make sure you won't be making an excessive number of API calls (those requests are billable as well) - doing some "containerization" into uncompressed TARs or similar may help with this.
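The "containerization" idea can be sketched as below: bundle each top-level directory into one uncompressed TAR before upload, so millions of small image files become a handful of large objects and far fewer billable requests. The paths are placeholders, and the demo data is only there so the sketch runs anywhere; raw images rarely compress well, hence no gzip.

```shell
#!/bin/sh
set -e

SRC="/tmp/tar-demo/src"       # stand-in for the dataset to archive
OUT="/tmp/tar-demo/bundles"   # staging area for the TARs

# --- demo data so this sketch is self-contained; use your real dataset ---
mkdir -p "$SRC/shoot-001" "$SRC/shoot-002" "$OUT"
echo "fake raw image" > "$SRC/shoot-001/img0001.raw"
echo "fake raw image" > "$SRC/shoot-002/img0001.raw"

# One uncompressed TAR per top-level directory.
for dir in "$SRC"/*/; do
    name=$(basename "$dir")
    # -C keeps paths inside the archive relative to the source root.
    tar -cf "$OUT/$name.tar" -C "$SRC" "$name"
done

ls "$OUT"
```

The trade-off is restore granularity: pulling back one image means retrieving the whole TAR it lives in, so size the bundles with that in mind.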

And of course, the cost of a Glacier/Deep Archive restore can be a bit of a "sticker shock" moment if it ever comes to pass, which could completely negate the cost benefits.

@morganL offered some suggestions to another user who was looking for advice in this thread here:


One of the suggestions was investigating a storage service from an iXsystems partner, iX-Storj - while the raw per-TB storage cost is higher than Glacier's ($4/TB/mo for storage), it's cheaper than standard S3, and restores run $7/TB - cheaper than a Glacier retrieval. It's also easier to sync and manage, with no ingress fees.
 

nihil2041

Cadet
Joined
Jan 11, 2024
Messages
2
I have had a similar situation in the past and here's what I did as an alternative to HoneyBadger's excellent suggestion.

1. AWS has a device called a Snowball (https://aws.amazon.com/snowball/) that one can rent for a fee. You may want to check that out. I had ~35TB that I needed in Glacier, and my upload speeds were abysmal - it would have taken forever. I contacted AWS, they sent me the Snowball device, I loaded all of my data onto it, and then shipped it back so they could ingest it into my S3 account.

2. Going forward, I am using the native AWS CLI S3 tools on my TrueNAS SCALE box for cloud sync. The built-in TrueNAS cloud sync tasks don't work well for me because the options aren't granular enough for my use case - they insist on re-uploading all of my now ~50TB of data. I created a script that is kicked off by a systemd timer twice a week, and it only uploads the changes since the previous sync. The native AWS CLI tools also offer the option to delete files from Glacier.
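An incremental sync along these lines could be sketched as below (this is not nihil2041's actual script; the bucket and path are placeholders). `aws s3 sync` only uploads files that are new or changed since the last run, and `--storage-class DEEP_ARCHIVE` sends them straight to Deep Archive.

```shell
#!/bin/sh
# Sketch only: assumes the AWS CLI is installed and credentials are
# configured (aws configure). Bucket and dataset path are placeholders.
SRC="/mnt/tank/raw-images"
DST="s3://my-archive-bucket/raw-images"

# Add --delete to mirror local deletions to the bucket - but remember the
# 180-day minimum storage charge applies to early deletions in Deep Archive.
CMD="aws s3 sync $SRC $DST --storage-class DEEP_ARCHIVE"

# Print the command until you're ready; set CONFIRM=1 to actually run it.
if [ "${CONFIRM:-0}" = "1" ]; then
    $CMD
else
    echo "$CMD"
fi
```

A systemd timer with, for example, `OnCalendar=Mon,Thu 03:00` would give the twice-weekly cadence described above.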

Good luck!
 