Offline Backups for big datasets

raigan21

Cadet
Joined
Aug 16, 2019
Messages
8
Hello, community,

I'm still learning the ways of TrueNAS and ZFS, and now I have an exciting task at hand: I need to find a way to create cold backups that will live offline most of the time, for large datasets of around 20 TB. Most of the data consists of very high-resolution images and videos, so if possible we'd like to avoid compression so the quality is not affected.

My idea is to use a small, separate TrueNAS box, like the TrueNAS Mini series, to create pools and copy the data from the main TrueNAS to these new pools, which will later go offline. I'm doing this mainly because the main NAS has no empty bays left.

On this small machine I was planning to create pools of 2 to 4 hard drives maximum, copy the data to them, and then send the drives to a fireproof cabinet off-site.

Now my questions are:

1. Is ZFS easy to restore from when needed?
2. Will ZFS protect my data from corruption in this setup?
3. I know that spinning hard drive backups require frequent scrubs to guarantee data integrity; the question is, how often should we scrub data that is offline?

I'd appreciate any suggestions to guide me on this journey.
 
Joined
Oct 22, 2019
Messages
3,641
Most of the data consists of very high-resolution images and videos, so if possible we'd like to avoid compression so the quality is not affected.
ZFS inline compression is lossless (just like when you "zip" something), and it operates only on the stored records. Whether the file is a PDF document, JPEG image, MP4 video, ASCII text file, etc., doesn't matter: records are records. It's best to leave compression set to "LZ4", as it uses "early abort", and any level of compression enabled will always compress the highly compressible metadata. (EDIT: This might not be true. ZFS may in fact always compress metadata by default, even if compression is disabled at the per-dataset level.)

I'd still leave compression enabled, since any type enabled will "zero out" the padding at the end of a record.
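As a minimal sketch of what that looks like in practice (the pool and dataset names here are placeholders, not anything from the thread):

```shell
# Hypothetical names for illustration: "coldpool/media" is the backup dataset.
# LZ4 is lossless, so image/video quality is never affected; already-compressed
# media simply won't shrink much, and LZ4's early abort keeps the CPU cost low.
zfs set compression=lz4 coldpool/media

# Verify the setting and, after data has been written, the achieved ratio:
zfs get compression,compressratio coldpool/media
```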


3. I know that spinning hard drive backups require frequent scrubs to guarantee data integrity; the question is, how often should we scrub data that is offline?
That's up to you. Maybe a couple of times a year will satisfy you.
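For an offline pool, a scrub session would look roughly like this (pool name is a placeholder); the drives only need to be attached and spun up for the duration of the scrub:

```shell
# Attach the drives, then import the backup pool by name.
zpool import coldpool

# Kick off a scrub and check on its progress / results.
zpool scrub coldpool
zpool status coldpool

# Once the scrub finishes cleanly, export before pulling the drives.
zpool export coldpool
```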


1. Is ZFS easy to restore from when needed?
You essentially do a replication (send/recv) in "reverse". From the backup pool to the new "main" pool.
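As a hedged sketch, with hypothetical pool, dataset, and snapshot names, the two directions look like this:

```shell
# Backup direction: snapshot the main dataset, then replicate main -> backup.
zfs snapshot mainpool/media@backup-2024-01
zfs send mainpool/media@backup-2024-01 | zfs recv coldpool/media

# Restore direction ("reverse"): replicate backup -> the new main pool.
zfs send coldpool/media@backup-2024-01 | zfs recv newmain/media
```

Subsequent backup runs can use `zfs send -i` with the previous snapshot to transfer only the changes instead of the full 20 TB each time.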


2. Will ZFS protect my data from corruption in this setup?
There's nothing different about ZFS on a main pool versus a backup pool. What determines data integrity is the level of redundancy, the frequency of scrubs, regular hardware checks, etc.
 
Last edited:

Arwen

MVP
Joined
May 17, 2014
Messages
3,611
To add, I'd suggest some:
  • Anti-static baggies to store the removed hard drives. Size depends on either bare drives or storing with drive sleds.
  • Enough Seahorse or other protective cases for the drive sets, (so they are not damaged during movement).
  • Clear and concise labeling of the drives, in which pool, what type of pool, date copied, and last date scrubbed.
  • A local web / wiki page with the info from the above line.
  • Clear testing & restore procedures, also documented.
There have been many cases, some here in the forums, of people making backups and not being able to recover the data when needed. An untested or unusable backup copy is, in some ways, worthless.


One other point. Many companies are now finding that they have really old archival data that may be quite hard to restore. Even if they are mandated by law to keep 7- or 10-year-old archival data, if the specific tape drives don't exist 7 or 10 years from now, good luck restoring the data.

Thus, in some cases, I'd recommend 2 (or more) rotating archive backups. Meaning you make one set today and send it off-site. Then, perhaps 3 or 6 months from now, make another set and send it off-site. You can then request the first set be brought back for a ZFS scrub and, if good, send it back off-site. Then erase the local copy.

This accomplishes two things. If your off-site sets have redundancy, you can "fix" a set by replacing a disk as needed. If not, you have a second off-site set that is hopefully still good.

If your off-site sets don't have redundancy, then simply create a third set and add it into the scrub rotation.
 