Syncing to Glacier when a local file has become corrupted

hendry

Explorer
Joined
May 24, 2018
Messages
98
Hi, I plan to back up my FreeNAS to Glacier via a "Cloud Sync Task". I think this is the fastest way, considering how slow B2 etc. is from Singapore.
My concern is what actually happens in detail, regarding charges and backup integrity, whenever a local file changes and is synced up.

The Glacier storage class has been mentioned before at https://www.ixsystems.com/community/threads/aws-glacier-on-freenas.73890/post-516874

I've set up my S3 bucket to transition objects to Glacier after a day.

https://s.natalian.org/2019-07-09/glacier.mp4

The case I am especially concerned about: if a file becomes corrupted locally on FreeNAS, how do I catch that the sync is overwriting the good copy in my remote Glacier backup?
Use bucket versioning?
Use object locking?

Or just trust that ZFS will never corrupt a file? To be honest, corruption is more likely to happen via a mount and some accident in Finder.

My data is largely static, i.e. files do not change after a period of time: YYYY-MM-DD archives of the media I capture, so anything older than 90 days will probably not change at all.

It would be great to hear from anyone who uses Glacier about their experience, since I feel the devil is in the details.
 

hendry

Explorer
Joined
May 24, 2018
Messages
98
I am using the following S3 bucket configuration, defined in Terraform:

Code:
provider "aws" {
  profile = "mine"
  region  = "ap-southeast-1"
}
# https://www.terraform.io/docs/providers/aws/r/s3_bucket.html#noncurrent_version_expiration
resource "aws_s3_bucket" "b" {
  bucket = "red-freenas-backup"
  acl    = "private"
  versioning {
    enabled = true
  }
  lifecycle_rule {
    id      = "killoldversions"
    enabled = true
    noncurrent_version_expiration {
      days = 90
    }
    expiration {
      expired_object_delete_marker = true
    }
    abort_incomplete_multipart_upload_days = "3"
    transition {
      days          = 1
      storage_class = "GLACIER"
    }
  }
}


I plan in future to track multiple versions, probably via inventory listings. In my particular store of static files there really should not be different versions of a file, but I keep noncurrent versions around for 90 days so I can investigate if one does appear.
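For anyone wanting to do something similar without full inventory reports, a minimal sketch of spotting unexpected noncurrent versions with the AWS CLI (bucket name from the Terraform above; the `--query` filter is just an illustration):

Code:
# List only noncurrent (older) versions in the backup bucket.
# A static archive should normally show none; any hit is worth investigating.
aws s3api list-object-versions \
  --bucket red-freenas-backup \
  --query "Versions[?IsLatest==\`false\`].{Key:Key,LastModified:LastModified,VersionId:VersionId}" \
  --output table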

[Screenshot attachment: 1563699619_2560x1440.png]


A couple of comments on the current 11.2 stable Cloud Sync, which uses rclone under the hood:

  • The Error: 14 is, I think, due to it trying to list my buckets. There is no need for that.
  • Ideally I want to upload as STANDARD_IA, i.e. the lowest initial storage cost, for my static files.
  • I noticed I uploaded a `/mnt/red/redsamba/.recycle` directory, which I think is caused by the Samba share. To be honest I wish I could skip it, but there is no way to exclude files in the current UI above (a manual rclone invocation could, see the sketch after this list). Is there a bug open about that?
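For reference, a rough sketch of what a hand-rolled rclone invocation could look like to get both of those (the remote name `s3:` and the transfer settings are assumptions, not what the Cloud Sync Task actually generates):

Code:
# Sketch only: upload as STANDARD_IA and skip the Samba recycle bin.
rclone sync /mnt/red/redsamba s3:red-freenas-backup \
  --s3-storage-class STANDARD_IA \
  --exclude ".recycle/**" \
  --transfers 8 \
  --progress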
 


Evi Vanoost

Explorer
Joined
Aug 4, 2016
Messages
91
You can simply download rclone for FreeBSD and overwrite the version that comes with FreeNAS; it will only be replaced again when they update the OS. Also, you can manually specify GLACIER and DEEP_ARCHIVE even in older versions of rclone and it will work.
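For anyone following along, a rough sketch of that swap from a FreeNAS shell (the download URL is rclone's standard "current" build; check where your bundled binary actually lives before overwriting it):

Code:
which rclone                      # typically /usr/local/bin/rclone on FreeNAS
fetch https://downloads.rclone.org/rclone-current-freebsd-amd64.zip
unzip rclone-current-freebsd-amd64.zip
cp rclone-v*-freebsd-amd64/rclone /usr/local/bin/rclone
chmod 755 /usr/local/bin/rclone
rclone version                    # confirm the new binary is picked up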

On the other hand, per your original post, how do you know the remote hasn't been corrupted? Especially with Amazon Deep Glacier, there is no way of knowing whether or not your data is still 'good' without shelling out money to recover it. So you have to trust both sides not to be corrupt. ZFS data being corrupted without notice would be super-rare, as in a once-in-the-lifetime-of-the-Universe event. Or, in addition to trusting ZFS, you can create a PAR2 for every file and distribute portions of the file(s) and the PAR2 to Amazon, Azure and B2.

I simply stream my snapshots to Amazon Deep Glacier using rclone; it's a lot faster, safer and cheaper (per transaction) than checking and streaming billions of individual files.
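A minimal sketch of what that can look like, with dataset, snapshot and object names made up for illustration:

Code:
# Stream a raw ZFS snapshot straight into an S3 object stored as Deep Archive.
zfs send red/redsamba@auto-20190721 \
  | rclone rcat s3:red-freenas-backup/snapshots/redsamba-20190721.zfs \
      --s3-storage-class DEEP_ARCHIVE

# Restoring means initiating a Glacier restore on the object, downloading it,
# and piping it back into `zfs receive`.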
 

hendry

Explorer
Joined
May 24, 2018
Messages
98
Thanks Evi, so you don't use Cloud Sync Tasks and you build the rclone invocation yourself? I probably should do this, since I want the ability to ignore some file patterns.

If rclone did what `aws s3 sync` does (I tested this), I would know the file had been corrupted, because with versioning enabled there would be a new version of the file.

IIUC AWS S3 exposes a checksum (the ETag) that sync tools can compare against, so there is no need to retrieve the object to verify it.
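A minimal sketch of that comparison, assuming an rclone remote named `s3:` (rclone reads the MD5 from object metadata, so nothing has to be restored from Glacier; multipart uploads made by other tools may not expose a usable MD5):

Code:
# Verify that every local file exists remotely with a matching size and MD5.
rclone check /mnt/red/redsamba s3:red-freenas-backup --one-way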

I contemplated par2, but I don't think I need it. Ideally, though, a cron job would generate a par2 manifest for each old directory.
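Something like this per finished directory would do it; the path and redundancy level are just an example:

Code:
# Generate 10% recovery data for a directory that is no longer changing,
# then verify files against the manifest later (or after a restore).
cd /mnt/red/redsamba/2019-06-01 && par2 create -r10 2019-06-01.par2 *
par2 verify /mnt/red/redsamba/2019-06-01/2019-06-01.par2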

If you stream whole snapshots, then it's impossible to pull one file out, right? I don't think I had accounted for the transaction cost, but since the bulk of my 3TB is on S3 now, hopefully that is already a sunk cost and I needn't repeat it.

Nonetheless your approach sounds valid. I doubt I would ever pick out a single file. This S3 backup would probably only be used if my FreeNAS burst into flames.
 

albertredneck

Dabbler
Joined
Aug 27, 2019
Messages
19
Hi there, @hendry, I'm also interested in backing up all my data to S3 Glacier. Did you make it work? Did you use the Sync Tasks or a custom Cron Job?

I'm not an expert on rclone and I'm a bit concerned about how the S3 connection will work with Glacier, it being "cold storage": how to avoid superfluous traffic (which would incur additional costs) and only upload files when they've been modified, in an optimal way.
 

hendry

Explorer
Joined
May 24, 2018
Messages
98
I updated to the nightlies for the new Cloud Sync tasks interface and also updated rclone to make it all work. I did file a bug for the FreeNAS developers to update rclone: https://jira.ixsystems.com/browse/NAS-103087. In the meantime I manually updated my rclone binary, which is straightforward since it is essentially a static binary.

So now I am syncing directly to the Glacier Deep Archive storage class as a remote backup. Hopefully I will never need to retrieve a file from it, as retrieval is probably quite expensive.

To summarise my learnings (a sketch of the equivalent rclone invocation follows the list):
  • transitions are expensive, so just upload direct to the target storage class
  • avoid lots of small files (e.g. do NOT back up a .node_modules directory)
  • you need to enable rclone's fast list (the option is present in the new FreeNAS 12 interface), otherwise you will pay a dollar or two just to list and compare objects on every sync of a ~3TB archive
  • just be aware that actions on objects cost money, i.e. you can get into situations where it's more expensive to expire lots of files than to just keep them
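Roughly, the hand-rolled equivalent would be something like this (remote name and paths are placeholders; the new Cloud Sync UI builds the real command for you):

Code:
# Direct upload to Deep Archive, --fast-list to cut down on LIST requests,
# and the Samba recycle bin excluded.
rclone sync /mnt/red/redsamba s3:red-freenas-backup \
  --s3-storage-class DEEP_ARCHIVE \
  --fast-list \
  --exclude ".recycle/**"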
Btw I made a video here https://www.youtube.com/watch?v=lxWb8fFp5vY&t but it doesn't have the latest learnings.
 

albertredneck

Dabbler
Joined
Aug 27, 2019
Messages
19
That's awesome @hendry, thanks for sharing! I think I'm going to wait until 12 to change my setup then... Also, any special considerations when configuring the S3 side?

Any chance you can write a guide on the Resources section about all the process?
 

nathanael

Cadet
Joined
Oct 8, 2020
Messages
5
Hello @hendry! I'm looking for a solution to back up ~40TB to Glacier Deep Archive (I detailed it here).
I wondered if it worked out fine for you and if you could explain exactly how you did it (I'm a bit new to FreeNAS).

Thanks!
 