Slow response and excessive reorganizing on 26TB system

Status
Not open for further replies.

JaimieV

Guru
Joined
Oct 12, 2012
Messages
742
Thank you for pointing out the difference on the USB enclosure. But am I not losing speed by dividing up the 4 disks on a single USB port?

Yes for sure, but we're not concentrating on speed - we're concentrating on persuading you that your setup is too risky to use. Fixing that will automatically fix the speed issues.
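
To put a rough number on the speed part: disks behind a single USB port share that port's bandwidth, so each one gets at best an even slice of it. Here is a minimal sketch, assuming an even split. The function name and the 35 MB/s figure are made up for illustration and are not measurements from this thread.

```python
def per_disk_throughput(port_mb_per_s: float, disks: int) -> float:
    """Best-case MB/s per disk when `disks` drives share one port evenly."""
    if disks < 1:
        raise ValueError("need at least one disk")
    return port_mb_per_s / disks


# Illustrative only: 35 MB/s is an assumed usable figure for the port,
# not a measurement from the system discussed here.
share = per_disk_throughput(35.0, 4)
print(f"{share:.2f} MB/s per disk")
```

Real numbers will differ with the USB version, the enclosure's bridge chip and the access pattern, so treat this as an upper bound on the per-disk share rather than a prediction.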

On the data integrity, I have lost connectivity to USB devices before without losing power. I do not know the reason. Rebooting brings the USB enclosure on-line without losing any data.

That's interesting! I would have expected the whole RAIDZ to have gone down at this point, permanently corrupted.
Hey, you more experienced guys, what's happening here? Does the RAIDZ set shut itself down so quickly that nothing is damaged? That's fantastic if so.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
That's interesting! I would have expected the whole RAIDZ to have gone down at this point, permanently corrupted.
Hey, you more experienced guys, what's happening here? Does the RAIDZ set shut itself down so quickly that nothing is damaged? That's fantastic if so.

From my own experiments with ZFS, deliberately disconnecting drives: as soon as you lose one more disk than your redundancy covers, the zpool goes offline immediately. If you reconnect the disks you CAN bring the zpool back, but there is no guarantee of anything. Unwritten data could be lost, for example. If you do a scrub you may find no corrupted data, but my understanding is that a write that hadn't completed could still leave corrupted data behind. Pretty much any recently written data should be considered suspect.
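
The rule of thumb above can be sketched as a toy model. This only illustrates the "redundancy + 1" threshold as described in this post. It is not how ZFS itself is implemented, and the function name and state labels are hypothetical, not actual `zpool status` output.

```python
def raidz_state(parity: int, missing: int) -> str:
    """Toy model of a single RAIDZ vdev.

    parity  -- 1 for RAIDZ1, 2 for RAIDZ2, 3 for RAIDZ3
    missing -- number of member disks currently disconnected
    """
    if parity not in (1, 2, 3):
        raise ValueError("parity must be 1, 2 or 3")
    if missing < 0:
        raise ValueError("missing cannot be negative")
    if missing == 0:
        return "healthy"
    if missing <= parity:
        # Still within redundancy: the pool keeps running.
        return "degraded"
    # One more disk gone than the redundancy covers: the pool goes offline.
    return "offline"


# A RAIDZ1 of 4 disks in one USB enclosure: losing the USB link
# drops all 4 disks at once, far past the single disk of redundancy.
print(raidz_state(parity=1, missing=4))
```

The point for a single USB enclosure is that a cable or hub glitch never removes just one disk. It removes all of them together, so the pool skips "degraded" entirely.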

This is where a program like fsck may be necessary. ZFS doesn't have protection from corrupted file system information. So you could end up with a screwed-up file system from which there is no recovery (unless you are awesome and can find and fix it yourself manually, which is not likely) except to destroy the zpool and recreate it from scratch. Basically you are taking an enormous gamble for which there is no 'undo'. If your file system is corrupted you are basically screwed. Because ZFS uses a top-down file system, you can actually have corruption in the 'root' directory of the zpool that cascades through the entire zpool, resulting in complete loss of data.

Pretty much the two nastiest things that can happen to a zpool, aside from blatantly deleting it, are corruption across multiple disks that leads to loss of the zpool, and losing more drives than you have redundancy for.

Even with a complete loss of power to the FreeNAS server, all you need is to be sure that the server wasn't actively reading or writing and you are generally okay. For instance, my FreeNAS server has a UPS, and when my house loses power the network switch that the server connects to loses power too, effectively killing any transactions in progress over the network. Even if my FreeNAS server didn't power down on its own, it would definitely have the 6 seconds needed to commit any data in RAM to disk before it lost power. Also, when power was restored I'd simply recopy whatever file I was working on, so any potential corruption in that file would be removed. What's important for protecting ZFS from file system corruption is that the file system writes be completed. I'd take one corrupted file over complete file system corruption any day.

So that's why using USB and other external cabling isn't just crappy because of performance, it's an absolutely terrible, terrible idea. I get nervous when people tell me they use an external SAS cable to connect their machine to an external storage array. I used to have a SAS controller connected to an external box with 16 drives attached. I put it in a corner of my basement so nobody would ever touch or even bump it. Additionally, the controller had a write cache with battery backup, so any unwritten data would be written when the disk was reattached (just to be safe).
 