Replication server and backup to AWS Glacier Deep Archive

nathanael

Cadet
Joined
Oct 8, 2020
Messages
5
Hi!

I'm a bit new to FreeNas and AWS. We have in our company a data server running on FreeNas and storing ~35TB.
Local snapshots are taken every hour and kept for 2 weeks.
There's a replication to a backup server, also running on FreeNas. If I understood correctly, every snapshot taken during the day is replicated at midnight to the backup server, where the snapshots are kept for 2 months.
My first question: is this the correct process to back up data? If something happens to our data server, will we be able to restore all our data from the backup server?

Now, we would like to back up remotely in addition to locally. We find AWS Glacier Deep Archive very attractive: we already have some other data backed up to S3 and we would like to use this service for our main data.
I saw some threads in this forum and on reddit, but they are a bit old and I wondered if things have changed since.
We don't want to sync to S3 and then use transitions to send data to Glacier Deep Archive: there would be costs to push the data to S3, and then costs again to move it to Glacier.

What I think would be the right solution is to sync snapshots from the backup server directly to Glacier.
First: is this the correct way? Same question as before: if both of our servers lose their data, will we be able to restore everything from the data stored in Glacier?
Second: is there a way to do that? I didn't find any answer to this one.

Thanks for reading!
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
It's FreeNAS.

Understand that snapshots on the primary server only protect you from accidental data deletion or malware that might encrypt your data. You can mount any of those snapshots as a read-only share to access a historical version of the files, or roll back to a previous snapshot. If a hardware fault takes the primary server down, you should be able to mount a snapshot from the backup server instead, but I recommend testing that to make sure it works the way you think it will. Also keep in mind that the backup might be as much as a day behind, assuming transfers to the backup system are actually happening; test that too. Mounting the snapshot on the backup server would let you get back in production. Alternatively, you could bring the primary server back up once the hardware fault was corrected; a data restoration would only be needed if the disk array itself was the cause of the failure.
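If you want to script that test, something like this would do it. It's just a sketch: the dataset name and FreeNAS's usual /mnt mount prefix are my guesses, so adjust them to your layout.

Code:
import subprocess

DATASET = "tank/data"        # hypothetical dataset name on the backup server
CLONE = "tank/restore-test"  # hypothetical name for the test clone

# List the snapshots that replication has actually delivered here.
snaps = subprocess.run(
    ["zfs", "list", "-H", "-t", "snapshot", "-o", "name", "-r", DATASET],
    capture_output=True, text=True, check=True,
).stdout.split()
print("most recent snapshots:", snaps[-3:])

# Clone the newest snapshot read-only and confirm the files are readable.
subprocess.run(["zfs", "clone", "-o", "readonly=on", snaps[-1], CLONE], check=True)
subprocess.run(["ls", "/mnt/" + CLONE], check=True)

# Remove the test clone when you are done.
subprocess.run(["zfs", "destroy", CLONE], check=True)

If the clone mounts and the files read back, you know the backup set is usable; if not, better to find out now than during an outage.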
My first question: is this the correct process to back up data? If something happens to our data server, will we be able to restore all our data from the backup server?
That is a policy question. It depends entirely on how badly you would be affected by downtime. In my shop, we have two servers running with replication between the primary and the backup, so if one fails we can continue operating from the other while the faulted system is brought back online. It isn't automatic failover, but downtime would be pretty minimal. We have a tape library system to make more long-term backups and to archive data that we don't want to discard but don't need on hot storage. You can get a tape library system from Quantum for about $30k that will let you back up, and restore, a massive amount of data, and you don't need to worry about all the costs involved in something like S3, where the monthly cost just keeps adding up and you no longer have control of your data.
We don't want to sync to S3 and then use transitions to send data to Glacier Deep Archive: there would be costs to push the data to S3, and then costs again to move it to Glacier.
We extrapolated all the costs of that remote storage, including that you need to pay again to get the data back, and decided an in-house tape library made more sense.
if both of our servers lose their data, will we be able to restore everything from the data stored in Glacier?
I wouldn't count on it, and it would be extremely slow, but restoring from tape is fairly slow also. Either case would be an extreme disaster. If you have two good servers, it is unlikely that they will both fail at once unless there is a natural disaster or the building burns down. Then you will need new hardware to bring the system back, or a cold site that you can fall back on. What is your disaster recovery plan?
 

nathanael

Cadet
Joined
Oct 8, 2020
Messages
5
It's FreeNAS.
My bad!

Thank you very much for all this advice!

snapshots on the primary server only protect you from accidental data deletion
This is exactly how we see their purpose.

This is how we designed the architecture:
  • Local snapshots every hour during business hours, to protect against accidental data deletion
  • Daily local replication of those snapshots to another, slower server (HDD instead of SSD), so that if the main server loses its data we can restore it reasonably fast (see the sketch after this list)
  • Remote backup in case both servers lose their data, mainly for a disaster such as the destruction of our building
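For the daily replication step, I assume it boils down to an incremental zfs send/receive like this (a sketch only; the dataset, snapshot, and host names are all made up):

Code:
import subprocess

# Made-up names: our real datasets, snapshot labels and hosts differ.
PREV = "tank/data@auto-2020-10-07_00-00"  # last snapshot already on the backup
CURR = "tank/data@auto-2020-10-08_00-00"  # newest local snapshot

# "zfs send -I" streams every intermediate snapshot between the two,
# so the backup server ends up with the full day's worth of snapshots.
send = subprocess.Popen(["zfs", "send", "-I", PREV, CURR], stdout=subprocess.PIPE)
subprocess.run(
    ["ssh", "backup-host", "zfs", "receive", "-F", "backuppool/data"],
    stdin=send.stdout, check=True,
)
send.wait()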
What I understand of your solution is:
  • No changes to local snapshots
  • Hourly replication to a backup server of the same performance, to keep downtime minimal
  • Long-term storage on a tape library like the Quantum, which I assume is local
Your solution doesn't seem to include the remote part. What do you think would be the best solution?

Downtime is not that critical for us, but the protection of our data is the most important point. We could afford a downtime of several hours if our main server is faulty and we need to restore from the backup, and we could afford a much longer downtime if both servers are down for good.

We thought of AWS Glacier Deep Archive because of the costs: the storage part costs only ~$1/TB per month. There are also the PUT/POST requests, whose exact cost for our use case I can't quite pin down (it's supposed to be $0.06 per 1,000 requests). Then of course there is the cost of retrieval, which we estimated at around $1,000.
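To make those numbers concrete, here is our back-of-the-envelope calculation (the object count is a pure guess, since it depends on how the data gets packaged before upload):

Code:
# Rough Glacier Deep Archive cost estimate for our ~35TB,
# using the prices quoted above.
storage_tb = 35
storage_usd_per_tb_month = 1.00   # ~1$/TB/month for the storage itself
request_usd_per_1000 = 0.06       # PUT/POST request pricing
objects = 500_000                 # hypothetical: depends on archive packaging

monthly_storage = storage_tb * storage_usd_per_tb_month
upload_requests = objects / 1000 * request_usd_per_1000

print(f"storage: ~${monthly_storage:.0f}/month")             # ~$35/month
print(f"one-time upload requests: ~${upload_requests:.0f}")  # ~$30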

In case of a disaster, we thought it would be quite a low price to ensure all data recovery.

What do you think of all that?
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
Your solution doesn't seem to include the remote part. What do you think would be the best solution?
We keep a set of tapes locally and send another set out to a storage bunker. We don't have a fail-over site, so if our building burns down or floods (neither is likely), a disaster would have us down until we could obtain replacement hardware to restore operations. The offsite storage facility is a bunker the military built that has been re-purposed for archiving information, so I think the data is safe enough there.
We thought of AWS Glacier Deep Archive because of the costs: the storage part costs only ~$1/TB per month. There are also the PUT/POST requests, whose exact cost for our use case I can't quite pin down (it's supposed to be $0.06 per 1,000 requests). Then of course there is the cost of retrieval, which we estimated at around $1,000.
These costs add up over time, but the math is different for us: we have over 250TB of data that must be archived. Depending on the amount of data, your cost for Glacier might be significantly less than the purchase cost of a tape library. Another factor for us was the concern over getting the data back. You just need to go with what lets you be comfortable. With good hardware, you may never need even the backup server, much less the off-site backup. In the past three or four years, I have only had to shift to the backup server once, and that was for a scheduled outage; I can't even remember what it was for because I have not had downtime in so long. Even when I replaced all the drives in the NAS for a capacity upgrade, I did it a couple of drives at a time and never needed to take the server down. I do software updates on weekends so there is no work stoppage. FreeNAS is rock solid when you have good platform hardware under it.
 

nathanael

Cadet
Joined
Oct 8, 2020
Messages
5
Well, we clearly don't have the same use case, but your point of view is really interesting; I would never have thought of a tape library.
I think we'll go ahead with our Glacier idea, but I'll keep your solution in mind.

So now, do you have any idea how to connect FreeNAS directly to Glacier's API?
 

bablos

Cadet
Joined
Oct 27, 2020
Messages
1
In my company we faced a similar problem: we wanted to perform backups not only locally but also remotely. We tried Amazon S3 Glacier; what really appeals about S3 Glacier is that it gives access to data within minutes and suits data that needs faster access. In practice I rarely needed to access my data, and I was told on cciedump.spoto.net that I could switch to Deep Archive, and it immediately became noticeably easier. After prolonged use, I can say that I am satisfied with the quality of the service.
 

nathanael

Cadet
Joined
Oct 8, 2020
Messages
5
I was told on cciedump.spoto.net that I could switch to Deep Archive, and it immediately became noticeably easier.
So you upload your backups directly to S3 Glacier from FreeNAS? How did you do that?
Do you upload only the snapshots? Is that enough?
 

Sawtaytoes

Patron
Joined
Jul 9, 2022
Messages
221
As far as I know, you can only upload to S3 and offload to Glacier Deep Archive (different from the 2013 version, but I'm not sure how).

I found this page, but I'm not certain if it's accurate:

[attached screenshot]


My assumption is that by creating a Glacier vault, you can do Glacier uploads automatically, but it's problematic because there are costs involved with transferring small files:

[attached screenshot of the pricing details]


From what I understand, they want you to tar or zip these small files before sending them to Glacier, so I guess it doesn't change anything either way.
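The batching step itself is simple, at least; something like this (paths invented) turns thousands of tiny files into one object, so you only pay for one upload request:

Code:
import tarfile

# Bundle a directory of small files into a single compressed archive,
# so Glacier charges for one object instead of thousands (paths made up).
with tarfile.open("/tmp/photos-2023.tar.gz", "w:gz") as tar:
    tar.add("/mnt/tank/photos", arcname="photos-2023")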

Asking their "ChatGPT" bot, I can't seem to find out why the separate Glacier console exists or how it's different from S3. If I create an S3 bucket that offloads to Glacier, I don't see it in this Glacier area.

---

Another way to do this is setting up an S3 bucket and transitioning everything in a bulk transaction after X days; that could potentially reduce the cost of transferring to Glacier.
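The lifecycle rule for that is simple enough; here's a boto3 sketch of what I mean (bucket name and prefix are invented):

Code:
import boto3

s3 = boto3.client("s3")

# Transition everything under backups/ to Deep Archive 30 days after
# upload. Bucket and prefix are hypothetical.
s3.put_bucket_lifecycle_configuration(
    Bucket="my-freenas-archive",
    LifecycleConfiguration={
        "Rules": [{
            "ID": "to-deep-archive",
            "Status": "Enabled",
            "Filter": {"Prefix": "backups/"},
            "Transitions": [{"Days": 30, "StorageClass": "DEEP_ARCHIVE"}],
        }]
    },
)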

That's my understanding from reading Amazon's docs, but honestly, I moved to Backblaze B2 because it's too freakin' complicated to use Glacier storage. Now my B2 costs are too high, so I'm back to looking at Glacier.

Please chime in if you know better because I feel so lost when working with anything AWS. It's like they've kept it intentionally obtuse.
 

Sawtaytoes

Patron
Joined
Jul 9, 2022
Messages
221
I was wrong in a few ways.

The Glacier Vault is for the defunct 2013 version. I think it's still there because people haven't moved off.

After creating a regular S3 bucket, this is all you need to do in TrueNAS when setting up the backup task:

[screenshot: TrueNAS cloud sync task settings]


Looks like S3 is a big thing with tiered storage now. Glacier isn't a second entity any longer; it's a first-class storage class and even has multiple tiers (such as Flexible Retrieval).
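Which also means you can skip the lifecycle transition entirely and upload straight into the Deep Archive storage class, e.g. with boto3 (bucket and key invented):

Code:
import boto3

s3 = boto3.client("s3")

# Upload directly into the Deep Archive storage class: no vault and
# no separate lifecycle transition step. Names here are made up.
s3.upload_file(
    "/tmp/dataset-2023-12.tar.gz",
    "my-freenas-archive",
    "backups/dataset-2023-12.tar.gz",
    ExtraArgs={"StorageClass": "DEEP_ARCHIVE"},
)

I assume TrueNAS's cloud sync task does the equivalent under the hood when you pick the storage class in that screen.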
 