Network safety?

Status
Not open for further replies.

OJ2k

Dabbler
Joined
Aug 26, 2014
Messages
13
Context: A curious home user.

Two weeks ago I was ignorant of bitrot (although a victim of it).

One week ago I was ignorant of the concerns around non-ECC memory.

While trying to avoid paranoia, is there anything I should know about copying large amounts of data over a network? E.g., if I'm regularly rsyncing backups, should I consider checksum triggers? I'd guess there is already error correction in the network protocol, but I've been proven wrong about HDDs and RAM.
 

Yatti420

Wizard
Joined
Aug 12, 2012
Messages
1,437
ECC RAM would be the first step in preventing bitrot or corruption long term.. You might get away with a non-ECC build until you don't and lose it all..

I like rsync.. It has its own checksum, I believe, for backup/transfer.. Should be safe using it..

Without a hardware list it's impossible to tell what you are or were running.. Generally stick with the recommended stuff.. Read the manual a few times (refer to it often during setup) and you'll be flying..
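
rsync does verify each file it transfers against a whole-file checksum, and its --checksum option makes later runs compare files by checksum rather than by size and modification time. For data you really care about, an independent end-to-end check on top of that is cheap. A minimal Python sketch, with made-up paths and assuming both copies are reachable from the machine running it:

Code:
# A minimal sketch of an independent end-to-end check after a copy or rsync
# run. The paths below are hypothetical; point them at your own datasets.
import hashlib

def sha256_of(path, bufsize=1 << 20):
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(bufsize), b""):
            h.update(chunk)
    return h.hexdigest()

src = "/mnt/tank/data/important.dat"      # hypothetical source file
dst = "/mnt/backup/data/important.dat"    # hypothetical copy on the backup box

if sha256_of(src) == sha256_of(dst):
    print("copy verified")
else:
    print("MISMATCH - recopy this file")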
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
For stuff that's very valuable, independent checksums are a potentially helpful thing. The checksums used by the lower network levels are intended to be fast, not comprehensive. On the flip side, having a checksum that tells you the data's wrong but no longer having a valid copy of the data ... not so useful.

Since you are enlightening yourself, please feel free to refer to http://noahdavids.org/self_published/CRC_and_checksum.html or similar references on the Internet.
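
To make the fast-versus-comprehensive point concrete, the sketch below (paths invented) computes both a CRC-32, the small, fast kind of check the lower layers lean on, and a SHA-256, the kind of independent checksum worth keeping alongside a second copy of the data:

Code:
# Sketch: the difference in strength between a fast CRC and a cryptographic
# hash. zlib.crc32 is the small, fast check; hashlib.sha256 is the
# comprehensive, independent one.
import hashlib
import zlib

def file_digests(path, bufsize=1 << 20):
    crc = 0
    sha = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(bufsize), b""):
            crc = zlib.crc32(chunk, crc)
            sha.update(chunk)
    return format(crc & 0xFFFFFFFF, "08x"), sha.hexdigest()

crc32, sha256 = file_digests("/mnt/tank/data/archive.tar")  # hypothetical path
print("crc32 :", crc32)    # 32 bits: catches line noise, easy to fool
print("sha256:", sha256)   # 256 bits: what you want for "is my backup intact?"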
 

AlainD

Contributor
Joined
Apr 7, 2013
Messages
145
Context: A curious home user.

Two weeks ago I was ignorant of bitrot (although a victim of it).

One week ago I was ignorant of the concerns around non-ECC memory.

While trying to avoid paranoia, is there anything I should know about copying large amounts of data over a network? E.g., if I'm regularly rsyncing backups, should I consider checksum triggers? I'd guess there is already error correction in the network protocol, but I've been proven wrong about HDDs and RAM.

I regularly use hashdeep (with MD5 or even SHA-256 checksums) to check whether my important backups are still the same (and to generate the checksums on the original data in the first place).
I also always have an off-site backup.
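
For anyone curious what that looks like without hashdeep itself, here is a rough Python sketch of the same generate-then-audit workflow; the directory names and manifest file are invented for the example:

Code:
# Rough sketch of a hashdeep-style audit: write a manifest of SHA-256 sums
# for a tree, then re-run it later against the backup copy.
import hashlib
import os

def sha256_of(path, bufsize=1 << 20):
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(bufsize), b""):
            h.update(chunk)
    return h.hexdigest()

def write_manifest(root, manifest):
    with open(manifest, "w") as out:
        for dirpath, _, files in os.walk(root):
            for name in files:
                full = os.path.join(dirpath, name)
                rel = os.path.relpath(full, root)
                out.write(f"{sha256_of(full)}  {rel}\n")

def audit(root, manifest):
    bad = []
    with open(manifest) as lines:
        for line in lines:
            digest, rel = line.rstrip("\n").split("  ", 1)
            if sha256_of(os.path.join(root, rel)) != digest:
                bad.append(rel)
    return bad

write_manifest("/mnt/tank/photos", "photos.sha256")          # on the original
print(audit("/mnt/backup/photos", "photos.sha256") or "OK")  # later, on the copy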
 

panz

Guru
Joined
May 24, 2013
Messages
556
ECC RAM would be the first step in preventing bitrot or corruption long term.. You might get away with a non-ECC build until you don't and lose it all..

I like rsync.. It has its own checksum, I believe, for backup/transfer.. Should be safe using it..

Without a hardware list it's impossible to tell what you are or were running.. Generally stick with the recommended stuff.. Read the manual a few times (refer to it often during setup) and you'll be flying..

I'd like to love rsync, but it has a severe limitation: file path length. I'm not a programmer, so I can't figure out why they couldn't make it work on paths deeper than (250?) characters.
 

anodos

Sambassador
iXsystems
Joined
Mar 6, 2014
Messages
9,554
I'd like to love rsync, but it has a severe limitation: file path length. I'm not a programmer, so I can't figure out why they couldn't make it work on paths deeper than (250?) characters.
Some filesystems/OSes don't support a path length greater than 255 characters. For instance, I think the maximum UNC path on Windows is 255 characters. I have had backup jobs fail on Windows because users create stupidly long file paths on shares.
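
One way to catch these before a backup job dies halfway through is to scan for over-long names up front. A small Python sketch, with a made-up dataset path and the limits being discussed here (adjust them to whatever the target actually enforces):

Code:
# Sketch: flag paths likely to break a backup job before it runs.
import os

ROOT = "/mnt/tank/shares"   # hypothetical dataset to scan
MAX_COMPONENT = 255         # typical per-filename limit on Unix filesystems
MAX_FULL_PATH = 260         # classic Windows MAX_PATH-style limit

for dirpath, dirnames, filenames in os.walk(ROOT):
    for name in dirnames + filenames:
        full = os.path.join(dirpath, name)
        if len(name) > MAX_COMPONENT or len(full) > MAX_FULL_PATH:
            print(f"{len(full):5d}  {full}")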
 

panz

Guru
Joined
May 24, 2013
Messages
556
You don't have these limitations on 64-bit Windows or on Linux, but you still do on a modern Unix. This is nonsense... :)
 

solarisguy

Guru
Joined
Apr 4, 2014
Messages
1,125

OJ2k

Dabbler
Joined
Aug 26, 2014
Messages
13
Since you are enlightening yourself, please feel free to refer to http://noahdavids.org/self_published/CRC_and_checksum.html or similar references on the Internet.

This was excellent. From the last paragraph:

If you are transferring files via FTP or some other protocol you can zip the file before transferring. A corrupted file will not unzip correctly.

I'd always dismissed using rsync compression when the destination is on the same network. Of course, the CPU/HDD may take a beating, especially if combined with checksum triggers.
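
The reason "a corrupted file will not unzip correctly" works as a check is that gzip/zip store a CRC-32 of the original data and verify it on unpacking. A small Python sketch simulating one flipped byte in transit:

Code:
# Sketch: gzip stores a CRC-32 of the uncompressed data, so corruption in
# transit is (almost always) caught when the file is unpacked.
import gzip
import zlib

original = b"payload worth keeping" * 1000
packed = bytearray(gzip.compress(original))

packed[len(packed) // 2] ^= 0xFF   # simulate one corrupted byte in transit

try:
    gzip.decompress(bytes(packed))
    print("unpacked cleanly (vanishingly unlikely after corruption)")
except (OSError, zlib.error) as err:
    print("corruption detected:", err)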
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Don't dismiss it too quickly. Modern gear may be fast enough to compress data and get you faster-than-gigE speeds, especially if you use a low compression setting.
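
A rough way to sanity-check that on your own hardware: time a low compression setting against gigabit Ethernet's roughly 117 MB/s of usable payload (rsync exposes the setting as -z / --compress-level). The Python sketch below uses a made-up sample file, and the real numbers depend entirely on your CPU and how compressible the data is:

Code:
# Rough benchmark sketch: single-core zlib at level 1 versus gigabit Ethernet.
import time
import zlib

with open("/mnt/tank/data/sample.bin", "rb") as f:   # hypothetical sample file
    data = f.read(256 * 1024 * 1024)

start = time.time()
compressed = zlib.compress(data, 1)                  # level 1 = cheap and fast
elapsed = time.time() - start

mb = len(data) / (1024 * 1024)
print(f"{mb / elapsed:.0f} MB/s at level 1, "
      f"ratio {len(compressed) / len(data):.2f} "
      f"(gigE carries roughly 117 MB/s of payload)")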
 

anodos

Sambassador
iXsystems
Joined
Mar 6, 2014
Messages
9,554
A filesystem is a database for files. :)
I never thought of it that way. Maybe there was some forethought in the way some users create file names that have 100+ characters in them. When they inevitably lose the file, I can quickly use the 'find' command to retrieve them using a vague description of the contents of the file even if the file is a jpg or pdf. :)
 

Whattteva

Wizard
Joined
Mar 5, 2013
Messages
1,824
I'd like to love rsync, but it has a severe limitation: file path length. I'm not a programmer, so I can't figure out why they couldn't make it work on paths deeper than (250?) characters.
I'm a programmer, and as far as I can tell, there is no reason the buffer size can't be larger than 250.
They probably have some constant defined for maximum string sizes, and it happens to be set at 250 for whatever reason.
It could be tedious to fix if, instead of using a constant, they hard-coded 250 everywhere, though.

Now, I know Windows has some path length limitations, but rsync is a UNIX tool, so I don't see why they would code it with Windows in mind, but oh well.
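
For what it's worth, on the Unix side you can ask the OS what it actually enforces rather than guessing at whatever constant a tool was compiled with. A tiny Python sketch (the mount point is hypothetical, and this is Unix-only; Windows has its own MAX_PATH story):

Code:
# Sketch: query the limits the OS reports for a given filesystem (Unix-only).
import os

target = "/mnt/tank"   # hypothetical mount point to query

print("PC_NAME_MAX:", os.pathconf(target, "PC_NAME_MAX"))  # per filename component
print("PC_PATH_MAX:", os.pathconf(target, "PC_PATH_MAX"))  # per full path accepted by syscalls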
 

AlainD

Contributor
Joined
Apr 7, 2013
Messages
145
This was excellent. From the last paragraph:

If you are transferring files via FTP or some other protocol you can zip the file before transferring. A corrupted file will not unzip correctly.

I'd always dismissed using rsync compression when the destination is on the same network. Of course, the CPU/HDD may take a beating, especially if combined with checksum triggers.

Hi, LZ4 compression/decompression is probably faster than a 1 Gb/s network on most modern CPUs.
xxHash and xxHash64 are very fast hashes; xxHash64 can do more than 10 GB/s.
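
A rough Python sketch for checking those numbers on your own box, assuming the third-party lz4 and xxhash packages are installed (pip install lz4 xxhash); the sample file path is hypothetical and throughput depends heavily on the data:

Code:
# Rough benchmark sketch for LZ4 and xxHash64 throughput on one core.
import time

import lz4.frame
import xxhash

with open("/mnt/tank/data/sample.bin", "rb") as f:   # hypothetical sample file
    data = f.read(256 * 1024 * 1024)
mb = len(data) / (1024 * 1024)

start = time.time()
lz4.frame.compress(data)
print(f"LZ4 compress: {mb / (time.time() - start):.0f} MB/s")

start = time.time()
xxhash.xxh64(data).hexdigest()
print(f"xxHash64:     {mb / (time.time() - start):.0f} MB/s")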
 

panz

Guru
Joined
May 24, 2013
Messages
556
My gut reaction is that if you need more than 260 characters you are probably trying to turn your filesystem into a database. Set up a proper DB. :)

Did you ever try to back up the Plex directory with rsync? Did you ever try to back up the iTunes folder that holds the iPad or iPhone backups with rsync? They have directories like /data/iTunes/user/user data/something here with 50 characters/something here withcharacters/ajdjdjsjndndnjdj737873737jsndjdjjsjsjejekjxnncmlflpyouikukejwhahwjqjanwndkedjdkekddjjdkkejekkdke73847585969607849928283/ASKDMFKEKKEOIITITOTTOSOJWNZNJS/FUCKING BACKUP HERE

Am I still trying to turn my filesystem into a db? ;)
 

Whattteva

Wizard
Joined
Mar 5, 2013
Messages
1,824
Ah yeah, Apple does like to make stuff like that.
 

Whattteva

Wizard
Joined
Mar 5, 2013
Messages
1,824
A filesystem is a database for files. :)

I never thought of it that way. Maybe there was some forethought in the way some users create file names that have 100+ characters in them. When they inevitably lose the file, I can quickly use the 'find' command to retrieve them using a vague description of the contents of the file even if the file is a jpg or pdf. :)
What jgreco said. Linus Torvalds even says he basically created Git to model a filesystem rather than as traditional source control software, since he's a kernel guy.
 