rsync and data corruption

Vortigern

Dabbler
Joined
Oct 14, 2022
Messages
45
Dear All,
I'm currently running TrueNAS CORE 13, and I made a dataset in my pool which is shared with my PCs via NFS. So far I've been using rsync to upload new files:
Code:
# Sources: /Data1 and /Data2 (local hard drives)
# Destination: /home/user/NFS (NFS share mounted locally)
$ time rsync -ah --no-owner --no-group --progress --inplace /Data1 /Data2 /home/user/NFS/

Then I started thinking about the consequences:
  1. What if a file gets corrupted on my local hard drive(s)? Rsync would update the remote copy, corrupting the data on the NAS as well, right?
  2. How can I avoid this?
So far the only idea I've had to avoid this is to reverse the process: first copy the data to the NAS, then rsync the NAS folder back to the local hard drives (basically swapping sources and destinations). Is there another option?
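The reversed direction would look something like this (a sketch only, reusing the paths from my command above and assuming the NAS copy is treated as the master):

```shell
# Hypothetical reversed direction: the NFS mount is now the source and
# the local disks are the destinations. Trailing slashes on the sources
# copy the directory *contents* into the existing local directories.
rsync -ah --no-owner --no-group --progress /home/user/NFS/Data1/ /Data1/
rsync -ah --no-owner --no-group --progress /home/user/NFS/Data2/ /Data2/
```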

Thanks in advance,
Vortigern
 

Patrick M. Hausen

Hall of Famer
Joined
Nov 25, 2013
Messages
7,776
First: if a file on your PC gets corrupted silently, its modification date will not be updated, so rsync will not overwrite the copy on your NAS in archive mode.

Second: set up a snapshot schedule on the NAS. If you notice corruption, you can get an older version back.
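To spell that first point out: rsync's default "quick check" compares only file size and modification time, so a silent bit flip on the source is skipped. A sketch with the paths from the original command:

```shell
# Default quick check: files whose size and mtime are unchanged are
# skipped, so a silently corrupted local file is NOT pushed to the NAS.
rsync -ah --no-owner --no-group --progress /Data1 /Data2 /home/user/NFS/

# --checksum forces full-content comparison instead. That WOULD copy the
# silently corrupted local file over the good NAS copy, so for this use
# case you specifically do NOT want it.
rsync -ah --checksum /Data1 /Data2 /home/user/NFS/
```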
 

ChrisRJ

Wizard
Joined
Oct 23, 2020
Messages
1,919
Second: set up a snapshot schedule on the NAS. If you notice corruption, you can get an older version back.
This is also a great protection against ransomware.
 

Vortigern

Dabbler
Joined
Oct 14, 2022
Messages
45
Hi @Patrick M. Hausen and @ChrisRJ,
thanks for the advice. Nonetheless, snapshots will double the space needed, right? I'd like to avoid using up all my space: 2 disks are already gone for parity, and if I use snapshots I would only be able to use half of the remaining capacity. Moreover, is there a way to recover a file via its checksum (assuming a single bit flip)? I run scrubs every week.

Cheers,
Vortigern
 
Joined
Oct 22, 2019
Messages
3,641
snapshots will double the space needed, right?
A snapshot only takes up space that is represented by exclusively unique records at the time when that snapshot was created. (A record is not "exclusively unique" if more than one snapshot points to it or your live filesystem points to it.)

In other words, if you never delete or modify any files in a dataset (you only create new files), then your snapshots will always consume zero space.

When you start to delete or modify files, your earlier snapshots will still "point to" the old records, and consume this additional space. (This is what lets you restore a snapshot, browse a snapshot, and even recover individual files from an older snapshot.)
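You can verify this yourself from the shell (the dataset name "tank/media" here is made up; substitute your own):

```shell
# Take a manual snapshot of a dataset.
zfs snapshot tank/media@before-rsync

# USEDSNAP is the space consumed exclusively by snapshots; immediately
# after snapshotting unmodified data it is (near) zero.
zfs list -o name,used,usedbysnapshots,usedbydataset tank/media

# Per-snapshot accounting: USED grows only as the live data diverges
# from what the snapshot points to.
zfs list -t snapshot -r tank/media
```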
 
Joined
Jun 15, 2022
Messages
674
Generally, Network Attached Storage exists as a solution to the problems associated with storing files on local machines. Therefore, store your files on the NAS.

If local file MyData.xls corrupts with bitrot, you modify it (edit the corrupt local Excel spreadsheet), and rsync it up to the NAS, yes, the NAS copy will be corrupt. Chances are you'll discover random errors in the sheet at some later point in time and have to create a new sheet.

If we're talking about a laptop and you need a modifiable local copy on the laptop, yes, you should use some sort of file versioning on the NAS (like snapshots).
 

Vortigern

Dabbler
Joined
Oct 14, 2022
Messages
45
Dear @WI_Hedgehog and @winnielinnie,
thanks for your answers and sorry for my late reply. I actually use my local copy of the data as a backup of what's on the NAS, so that I have 2 independent copies of the files. What I could do instead is rsync from the NAS to the PC. But the question is: can ZFS recover in case of a single bit flip?
@winnielinnie: assuming I save all my data to the NAS and then take a snapshot, how big will it be? As far as I know it's about the same size as the data stored, and the next ones will have zero delta if I do not modify anything on the NAS. To me this means I will eventually eat up half of the free space with snapshots, right?

Cheers,
Vortigern
 

awasb

Patron
Joined
Jan 11, 2021
Messages
415
ZFS will recover as long as there is redundancy.

Z1: single parity - any one drive can fail (or return bad blocks) and the data can still be reconstructed
Z2: double parity - the same for up to two drives
Z3: triple parity - the same for up to three drives

(On top of that, ZFS keeps extra copies of critical metadata by default.)

From Z2 on (and depending a bit on single-drive and vdev size) I'd say the problems with lunatic wetware typing rm -fr in the root dir, exploding PSUs taking down six drives at once, the NSA resetting ones to zeros, etc. outweigh the ZFS-inherent risks.
 

Vortigern

Dabbler
Joined
Oct 14, 2022
Messages
45
Dear @awasb,
thanks a lot for your quick reply. I'm currently using Z2 redundancy, so theoretically I could lose up to 2 drives. Then the question is: do I need to care about corrupted data on my NAS? Or will corrupted data be repaired during scrubbing?

Cheers,
Vortigern
 

awasb

Patron
Joined
Jan 11, 2021
Messages
415
As long as you use ECC RAM on your server and your client doesn't produce/introduce wrong/false bits … yes. Use snapshots and your storage is even more resilient.
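A scrub can also be started and checked by hand (the pool name "tank" is made up; TrueNAS normally runs scrubs on a schedule for you):

```shell
# A scrub reads every allocated block, verifies its checksum, and
# repairs bad blocks from RAIDZ parity where redundancy allows.
zpool scrub tank

# Shows scrub progress and any checksum errors found/repaired.
zpool status tank
```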
 

Vortigern

Dabbler
Joined
Oct 14, 2022
Messages
45
Dear @awasb,
I do use ECC RAM, but I do not use snapshots, since they would take up to half of the space. About the point of not introducing wrong/false bits: I can easily achieve that just by saving the files directly on the NAS and then rsyncing to my local disk. But basically it means that as long as I'm not touching the data on the NAS, RAIDZ2 plus scrubbing should automatically correct errors. By the way, this also requires checksums to be ON, right? I have never set checksum to anything other than "on" because I don't know the benefits of the various checksum algorithms.
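For reference, the checksum property can be inspected and changed per dataset (dataset name made up); "on" currently means the default algorithm, fletcher4, and scrubs rely on these checksums to detect bad blocks:

```shell
# Show the active checksum algorithm ("on" = the default, fletcher4).
zfs get checksum tank/media

# Stronger (but slower) algorithms such as sha256 can be set per
# dataset; the change applies to newly written blocks only.
zfs set checksum=sha256 tank/media
```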

Thanks,
Vortigern
 

awasb

Patron
Joined
Jan 11, 2021
Messages
415
Snapshots take up almost nothing if you don't rewrite your pool data on a daily basis.

Concerning "bit stability": do your clients use ECC RAM, too? Do you use a UPS? How are your networking cables laid out? How is the shielding? There is no need to answer all this. What I'm trying to say is: do not look at it in isolation, with a focus on the server alone. That is why I added the comment about probabilities of general failure above.

Security and availability are concepts, not technical features.
 
Joined
Oct 22, 2019
Messages
3,641
I do use ECC RAM, but I do not use snapshots, since they would take up to half of the space.
@winnielinnie: assuming I save all my data to the NAS and then take a snapshot, how big will it be? As far as I know it's about the same size as the data stored, and the next ones will have zero delta if I do not modify anything on the NAS. To me this means I will eventually eat up half of the free space with snapshots, right?

I already answered this in post #5.

I'm not sure why you keep insisting on the same thing over and over.
 