Please help! Started up FreeNAS, suddenly volume storage (ZFS) status unknown?!


warri

Guru
Joined
Jun 6, 2011
Messages
1,193
You can use rsync as described in ProtoSD's last post. On the first run, this should at least recover files up to any broken ones, and in the log you can see which file(s) are causing a problem. On a second run, you can then skip the offending files. If you encounter more bad files, just repeat.

Update: Of course this would involve actually moving the files off the pool to a backup disk. Alternatively, you can try 'rsync -avn' with protoSD's options, which will perform a dry run. But the system might crash again, so I'd personally just grab an external disk big enough to hold the most important data and do the actual rsync.
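
Roughly like this, reusing the placeholders from protoSD's command (the --exclude path is just an example of skipping a file the log flagged):

Code:
# dry run (-n): shows what would be copied, writes nothing
rsync -avn --partial --log-file=/some-place-not-on-your-pool.log source-directory destination-directory

# real run, skipping a file the log flagged as a problem (path is an example)
rsync -av --partial --log-file=/some-place-not-on-your-pool.log --exclude='path/to/bad-file' source-directory destination-directory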
 

HHawk

Contributor
Joined
Jun 8, 2011
Messages
176
Okay, thanks. However, since I am a complete FreeBSD noob (though I am getting a little experience in ZFS and restoring it :(), I would need some help with that.

I quote the command from ProtoSD:

Code:
rsync -av --partial --log-file=/some-place-not-on-your-pool.log source-directory destination-directory


I am going to clean my external harddisk (eSATA) tonight and hook it up to the NAS. But to be completely honest, I do not know what's the best way to do this.

First, in regard to the command from ProtoSD: would it be alright to place the log file on the USB stick?

Code:
root@mfsbsd:/root # egrep 'ad[0-9]|da[0-9]|cd[0-9]|acd[0-9]' /var/run/dmesg.boot
em1: <Intel(R) PRO/1000 Network Connection 7.3.2> port 0xde00-0xde1f mem 0xfdda0000-0xfddbffff,0xfdd80000-0xfdd9ffff irq 18 at device 0.1 on pci5
da5 at mps0 bus 0 scbus0 target 5 lun 0
da5: <ATA WDC WD20EARX-008 AB51> Fixed Direct Access SCSI-6 device 
da5: 600.000MB/s transfers
da5: Command Queueing enabled
da5: 1907729MB (3907029168 512 byte sectors: 255H 63S/T 243201C)
da2 at mps0 bus 0 scbus0 target 2 lun 0
da2: <ATA WDC WD20EARX-00P AB51> Fixed Direct Access SCSI-6 device 
da2: 600.000MB/s transfers
da2: Command Queueing enabled
da2: 1907729MB (3907029168 512 byte sectors: 255H 63S/T 243201C)
da0 at mps0 bus 0 scbus0 target 0 lun 0
da0: <ATA WDC WD20EARX-00P AB51> Fixed Direct Access SCSI-6 device 
da0: 600.000MB/s transfers
da0: Command Queueing enabled
da0: 1907729MB (3907029168 512 byte sectors: 255H 63S/T 243201C)
da4 at mps0 bus 0 scbus0 target 4 lun 0
da4: <ATA WDC WD20EARX-00P AB51> Fixed Direct Access SCSI-6 device 
da4: 600.000MB/s transfers
da4: Command Queueing enabled
da4: 1907729MB (3907029168 512 byte sectors: 255H 63S/T 243201C)
da3 at mps0 bus 0 scbus0 target 3 lun 0
da3: <ATA WDC WD20EARX-00P AB51> Fixed Direct Access SCSI-6 device 
da3: 600.000MB/s transfers
da3: Command Queueing enabled
da3: 1907729MB (3907029168 512 byte sectors: 255H 63S/T 243201C)
da1 at mps0 bus 0 scbus0 target 1 lun 0
da1: <ATA WDC WD20EARX-00P AB51> Fixed Direct Access SCSI-6 device 
da1: 600.000MB/s transfers
da1: Command Queueing enabled
da1: 1907729MB (3907029168 512 byte sectors: 255H 63S/T 243201C)
da6 at umass-sim0 bus 0 scbus11 target 0 lun 0
da6: <SanDisk Extreme 0001> Removable Direct Access SCSI-6 device 
da6: 40.000MB/s transfers
da6: 15272MB (31277232 512 byte sectors: 255H 63S/T 1946C)
GEOM: da6: geometry does not match label (16h,63s != 255h,63s).
GEOM: da6: media size does not match label.


According to egrep, da6 is the USB stick. So what command should I enter to store the log file on the USB stick?

And what's the best command to copy files, including directories, to an eSATA drive from FreeBSD?
 

HHawk

Contributor
Joined
Jun 8, 2011
Messages
176
And is it possible to enable FTP on this FreeBSD version?
That way I can already start transferring some files to my work PC.

//edit

Never mind, I will use FileZilla and SFTP instead. :)
 

warri

Guru
Joined
Jun 6, 2011
Messages
1,193
After hooking up the new disk via eSATA, SATA or USB, you will probably need to format it to UFS and mount it. Unfortunately, I'm not sure about the exact commands needed either; we'll have to wait for paleoN or protoSD to comment.
When your new disk is finally available, you can put the log output on that disk; there is no need to store it on the USB drive.
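
Roughly, it should be something along these lines on FreeBSD (untested sketch; da7 is just a placeholder for whatever device name the new disk gets in dmesg), so double-check before running:

Code:
# create a GPT partition table with a single UFS partition (da7 = placeholder device name)
gpart create -s gpt da7
gpart add -t freebsd-ufs da7
newfs /dev/da7p1
# mount it somewhere outside the pool
mkdir -p /mnt/backup
mount /dev/da7p1 /mnt/backup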

In the meantime, I hope your backup via SFTP is working for the most important files!
 

HHawk

Contributor
Joined
Jun 8, 2011
Messages
176
After hooking up the new disk via eSATA, SATA or USB, you will probably need to format it to UFS and mount it. Unfortunately, I'm not sure about the exact commands needed either; we'll have to wait for paleoN or protoSD to comment.
When your new disk is finally available, you can put the log output on that disk; there is no need to store it on the USB drive.

In the meantime, I hope your backup via SFTP is working for the most important files!

Uhmz... Instead of changing the filesystem to UFS, I am going to do it differently. Not as fast, but still better than USB: I am going to hook the external HD up to my PC and transfer the files locally through FileZilla. I guess it will be limited by the network card then, right?
 

HHawk

Contributor
Joined
Jun 8, 2011
Messages
176
Okay, still copying some stuff; I've got a lot from the server so far (still a lot to go), but it's still rebooting sometimes / kernel panicking. :S
I really have no clue what is causing it. Sometimes I can copy for a full hour, the next moment I can only copy for 1 minute...

What could be the cause?

It's not the onboard controller, not the purchased controller, not the memory, not the USB stick, and it also cannot be the harddisks, because the smartctl checks (long/short) all finished without errors... :(

Code:

        NAME                                            STATE     READ WRITE CKSUM
        storage                                         ONLINE       0     0    59
          raidz2-0                                      ONLINE       0     0   234


I am now running a scrub, but that checksum count keeps increasing.

Between typing this post it increased even further:
Code:
        storage                                         ONLINE       0     0    76
          raidz2-0                                      ONLINE       0     0   302


And also this is still here:

Code:
        storage:<0x0>
        /rw/storage/Jail/plugins/var/log/messages
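
For reference, the commands involved are just the scrub plus the status check ('storage' being the pool name):

Code:
zpool scrub storage        # start the scrub
zpool status -v storage    # watch progress and the READ/WRITE/CKSUM counters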
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
I can't remember what all we switched out, but are you using the same motherboard and CPU? Perhaps they are having issues that manifest during periods of high workload?
 

Caesar

Contributor
Joined
Feb 22, 2013
Messages
114
Could be the power supply... Personally, I did not understand why you couldn't move these drives to your water-cooled PC. I mean, the drives didn't have to go into the case; just lay them out on the ground with cables running into your case. If you were able to do that, we would have a better idea as to what is wrong with that PC: mobo, chip, PSU, whatever. Perhaps you will be able to troubleshoot the root cause once you get your data recovered.

Nevertheless, I applaud your patience! I troubleshoot for a living and I would love it if my clients were as calm as you are in the face of losing so much data/time/money. I am also amazed at the help that you are getting from the forums. I feel good about my choice of FreeNAS knowing that the community can be so helpful in a time of need.

Good luck with the rest of your recovery.
 

ProtoSD

MVP
Joined
Jul 1, 2011
Messages
3,348
Sorry for the Off-Topic capacitor discussion. It has been moved HERE
Please post any follow-up comments there.
 

HHawk

Contributor
Joined
Jun 8, 2011
Messages
176
Could be the power supply... Personally, I did not understand why you couldn't move these drives to your water-cooled PC. I mean, the drives didn't have to go into the case; just lay them out on the ground with cables running into your case. If you were able to do that, we would have a better idea as to what is wrong with that PC: mobo, chip, PSU, whatever. Perhaps you will be able to troubleshoot the root cause once you get your data recovered.

//offtopic

I ALREADY mentioned that placing the harddisks into my watercooled PC was NOT an option. This would mean I had to remove all the tubing, remove waterblocks, buy an air cooler, mount the CPU bracket for the air cooler, etc... If it was an option I would have done this, so please do not repeat this stuff over and over. It would have been nice if my previous statements about this situation had actually been read. Also, if I had done this, I wouldn't now be able to transfer all my files, now would I! Now drop this subject. TY.



//ontopic
I checked the components when I ordered them. They aren't that old yet though.

PSU: Cooler Master M600 (28.07.2011)
Memory: Corsair CMX16GX3M4A1333C9 (28.07.2011)
Motherboard: GA-880GA-UD3H rev. 3.1 (24.07.2011)
Harddisks: 6x WD20EARX (28.07.2011)*

* One disk faulted and was RMA-ed and replaced by a new harddisk (resilvered) a few months ago.

The weird thing is, when running long smartctl checks on all harddisks, no errors are reported and all drives are reported as working fine.

I am also not convinced the kernel panics and reboots are random, because I am almost done copying all my stuff EXCEPT for the movies and series folders (downloads through SABnzbd etc.). These folders seem to be the most problematic. For example, I have the directory 'Grimm' (TV series), and as soon as I copy this folder it results in a kernel panic and reboot.

It doesn't mean it happens with all files in this folder. Currently I am transferring another series called 'The Killing'; this one is transferring fine at the moment, without kernel panics or reboots.

I will re-test the download of the folder "Grimm" and see if the kernel panic / reboot happens again. If that's the case, we can conclude the problems aren't related to the PSU and/or CPU. Maybe ZFS is just bogus. I am really starting to reconsider the use of ZFS for my future file system. After this experience I do not trust ZFS to be the safest solution for me.

What further tests can I run to see what is causing these problems (as soon as I have transferred the remaining stuff that can still be copied off my NAS)?
 

ProtoSD

MVP
Joined
Jul 1, 2011
Messages
3,348
I'm really glad to hear you have been making progress copying stuff off. I am curious if you have checked the quality of the files you have recovered? I've seen situations where it appears stuff is copying great, but when you actually view the files they are corrupt or damaged.

I think problems like this are possible with any filesystem; the big difference is that other filesystems have been in use longer and have better tools for recovery. I plan to keep working on my program when I can, because I know there will be other people who need it, but I'm glad it wasn't necessary right now :)

How did the conversation with your wife go? Burn those pics onto a DVD; you never know when something can go wrong, just like this time.

If you don't trust ZFS anymore, you can still stay with FreeNAS and try UFS instead.

I still think there is some hardware problem. Just because something is new doesn't mean it can't fail. Power failures can be very hard on electronic equipment.
 

HHawk

Contributor
Joined
Jun 8, 2011
Messages
176
I'm really glad to hear you have been making progress copying stuff off. I am curious if you have checked the quality of the files you have recovered? I've seen situations where it appears stuff is copying great, but when you actually view the files they are corrupt or damaged.

I think problems like this are possible with any filesystem; the big difference is that other filesystems have been in use longer and have better tools for recovery. I plan to keep working on my program when I can, because I know there will be other people who need it, but I'm glad it wasn't necessary right now :)

How did the conversation with your wife go? Burn those pics onto a DVD; you never know when something can go wrong, just like this time.

If you don't trust ZFS anymore, you can still stay with FreeNAS and try UFS instead.

I still think there is some hardware problem. Just because something is new doesn't mean it can't fail. Power failures can be very hard on electronic equipment.

Yeah, so far it's going well. Currently I am also backing up several things to my work PC (kinda slow on my internet connection and the upload speed from home), but I am getting there nevertheless...
I didn't try all the files (= a lot of work), but I am testing some files randomly and they all work so far! * knocks on wood *

Well, maybe I am overreacting a bit in regard to ZFS; however, I always thought ZFS was supposed to be the safest / best filesystem around. That's why I also chose raidz2, so I could lose 2 disks and still not have any kind of data loss. However, if a kernel panic, power outage, or hardware failure (other than the harddisks) can cause these kinds of problems... well, I am not too sure at the moment whether I should keep using this.

But considering it had been working before without any problems, I doubt I will switch to something else. And I will be considering a UPS (I still don't know if these problems were caused by a power outage though). On a side note: with a UPS, is it possible to set up FreeNAS so that whenever there is a power outage and it switches to the UPS, it automatically shuts down the server (after a few minutes), so nothing is lost and there is no data corruption? I have never used one before in combination with FreeNAS, so I am pretty clueless regarding this...
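
From what I gather, FreeNAS uses Network UPS Tools (NUT) for exactly this, and the kind of configuration involved looks roughly like the sketch below (the UPS name, driver and password are placeholder guesses, and normally you set all of this from the GUI rather than editing files):

Code:
# ups.conf -- describes the UPS (usbhid-ups covers most USB models)
[myups]
    driver = usbhid-ups
    port = auto

# upsmon.conf -- shut the server down cleanly when the UPS reports low battery
MONITOR myups@localhost 1 upsmon mypassword master
SHUTDOWNCMD "/sbin/shutdown -p now"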

In regard to the wife: actually I didn't tell her anything (yet)... Maybe it's bad, but why should I worry her with this?
However I will be making backups of the pictures from now on. Just to be safe...

But now the main question: as soon as I have downloaded the current series (which will take a while), I will retry transferring the "Grimm" series. Previously, as soon as I started transferring files from this series, it would cause a kernel panic and reboot the NAS. If it happens again now, it will definitely be related to:

a) one / several / all the harddisks
b) ZFS filesystem for some reason

However, if it does suddenly transfer the files now without any problems, the problem must lie elsewhere...

I completely understand that new hardware doesn't mean it's problem-free, HOWEVER, why was it working correctly for months and then suddenly stopped working properly? Harddisks I can understand (moving parts), the onboard controller (well, kinda hard to believe, but okay, possible), the USB stick (yes, the stick wears out with use, due to read/write cycles, same as an SSD), the memory (also hard to believe, but also possible). So the only 2 things left, imho, are the motherboard itself or the PSU. But those options I also find hard to believe. Even my previous gaming system, which was heavily overclocked and watercooled, never had any kind of problems after severe stress testing, and it ran without any issues for the past 2 years (I recently replaced it with a new gaming system).

And maybe the most important question remains unanswered... Will my system be safe to use and to rely on in the near future?!
Who says this won't happen again after reinstalling and redoing everything? And are there other ways to test hardware, for example the harddisks? I am pretty annoyed I still cannot find the problem, though I am guessing it's still related to ZFS, because if I check zpool status -v I am still getting this:

Code:
errors: Permanent errors have been detected in the following files:

        storage:<0x0>
        /rw/storage/Jail/plugins/var/log/messages


And as far as I can tell, the "storage:<0x0>"-error is a pretty serious problem, right?
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
I am also not convinced the kernel panics and reboots are random, because I am almost done copying all my stuff EXCEPT for the movies and series folders (downloads through SABnzbd etc.). These folders seem to be the most problematic. For example, I have the directory 'Grimm' (TV series), and as soon as I copy this folder it results in a kernel panic and reboot.

It doesn't mean it happens with all files in this folder. Currently I am transferring another series called 'The Killing'; this one is transferring fine at the moment, without kernel panics or reboots.

I will re-test the download of the folder "Grimm" and see if the kernel panic / reboot happens again. If that's the case, we can conclude the problems aren't related to the PSU and/or CPU. Maybe ZFS is just bogus. I am really starting to reconsider the use of ZFS for my future file system. After this experience I do not trust ZFS to be the safest solution for me.

I don't think your assessment proves the issue isn't related to the PSU and/or CPU. The damage that occurred has never been narrowed down to a "when", let alone a "cause". It may have happened on the day you couldn't mount the zpool anymore, or it could be that some component is occasionally going bad (for instance, a loose SATA connection) and has caused small patches of corruption that are finally coming to light. One of the reasons why ZFS development began back in 2005 was that Sun Microsystems had identified the need for a new file system that was far more resilient to corruption from hardware errors than what was available at the time. Just like with every file system out there, unless you are willing to have a tool such as chkdsk, e2fsck, or a zpool scrub run and traverse the entire file system, there is no way to prove that you don't have corruption somewhere right now that you don't know about. ZFS provides this protection by using replicas, checksums or backups of every aspect of the data stored on the drives.

Even with a hardware RAID6 you cannot recover from certain types of errors, and unless a hard drive actually reports that it couldn't read the sector, the parity data is NOT used to reconstruct the corrupt data. So a hard drive with bad firmware, a loose cable, etc. may destroy your entire zpool and you won't even be able to identify the problem without extensive testing. Add to that the fact that the logical disk manager layer and the file system layer couldn't interact, and that added to the potential issues involved with the "bigger and bigger" storage systems that will be used both now and in the future. ZFS is the only mature all-in-one file system and logical disk manager that exists. BTRFS has an opportunity to compete with ZFS for market share in the "very, very large file server with exceptional reliability" space, but BTRFS isn't mature yet. Because ZFS has gone "closed source", I expect that in 5 years the forum will be full of posts discussing which file system to use and why, BTRFS or ZFS.

I can fully appreciate your reasons for not being interested in ZFS, but I think they are very misguided. You've probably lost data to NTFS and other file systems in the past and don't even know it. Ever had a file disappear, have no idea where it went, and choose to just download it again, restore it from a CD, etc.? I discussed the stuff that Windows does behind the admin's back previously in this thread, so I won't discuss it again. But just like with any file system, certain hardware failures can be devastating to your data regardless of the file system or logical disk manager used.

The advice you should take away from this whole thing is to use backups religiously. There's a reason that the "almighty" ZFS provides very efficient methods for backing up data and even keeping "backups" of data on the same file system via snapshots. Even a very low-powered Atom system with 8GB of RAM and a couple of hard drives can be a life saver if things get very, very ugly. There has been, and will always be, a use case for backups that will never be satisfied any other way.
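
As a rough sketch of what that looks like in practice (the dataset names, snapshot labels and backup host below are just placeholders):

Code:
# take a snapshot, then replicate it to another pool/machine over ssh
zfs snapshot storage/data@backup-20130401
zfs send storage/data@backup-20130401 | ssh backupbox zfs receive backuppool/data

# later, send only what changed since the previous snapshot
zfs snapshot storage/data@backup-20130408
zfs send -i storage/data@backup-20130401 storage/data@backup-20130408 | ssh backupbox zfs receive backuppool/data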

Personally, I have a spare 3TB drive that I backed up my most important information to, and I keep it at a friend's house in a neighboring town. If my house burns down, that backup is something I will be very, very glad I made. I've known 2 people that lost everything in a house fire. Baby pictures, family photo albums, sometimes pets and family! One of my friends lost a family member in their house fire. They are so incredibly happy they took my advice back in 2010 and started keeping a hard drive full of their most important family files at another family member's house. They are convinced that when I talked them into keeping a backup of their digital home videos and family photo album I was doing "God's work". Without that hard drive they wouldn't have had anything to remember their son by except their memories and a tombstone. They have video and photos of him, and that is all. The house burned down to the cement foundation (the house was out in the country and there wasn't a fire station around for 25 miles).

Anyway, backups should be a must in all situations. The importance of backups cannot be and never will be overstated. It's obviously your choice, but I'm glad I have ZFS, and I trust it far more than the RAID6 it replaced. I have always seen very large NTFS RAIDs have occasional issues (2-6 months in between) that resulted in varying amounts of lost data, coming up randomly and without warning. So far I have never had an issue at all on any of the ZFS systems I've built for myself and friends.
 

Caesar

Contributor
Joined
Feb 22, 2013
Messages
114
It sounds like we are looking at the hardware as "everything is good until proven bad"; it may be better for us to look at it as "everything is bad until proven good". Also, PSUs are one of the most common causes of indeterminate failures.
 

HHawk

Contributor
Joined
Jun 8, 2011
Messages
176
Well, I just tested transferring the "Grimm" series folder again... And guess what... Kernel panic + reboot.
So this problem is really caused by ZFS.

Otherwise, if it was hardware related, it would be random (except if it were the harddisks).
I have now tried to transfer the "Grimm" series folder several times, and every single time the exact same result > kernel panic > reboot.

I will now have to test the remaining directories one at a time in order to see which ones are causing kernel panics and reboots. Maybe I should disconnect a different harddisk and see if the problems remain. I don't know if it makes a difference, but removing the first harddisk resulted in getting my pool (partially) online and making it possible for me to transfer data from it. The risk of losing any data at this stage is not that much of an issue, as I have already transferred the most important stuff.

- - - Updated - - -

@ paleoN :rolleyes: / ProtoSD

When I am done transferring files, or when it's not possible to transfer the remaining files (due to kernel panics / reboots), are there any other commands you want me to try (even if they are destructive) to see what could be causing the problems? I am willing to give it a few shots to see what happens...

Also, what is the best way to set everything up again? Do I reinstall FreeNAS completely as a fresh install, or can I use the old USB stick with the previous FreeNAS installation?
And what about my SABnzbd, CouchPotato and Sickbeard settings? I managed to download the 'Plugins' directory with all files in it (this will also include the configuration files). So can I import them again some way?
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Well, I just tested transferring the "Grimm" series folder again... And guess what... Kernel panic + reboot.
So this problem is really caused by ZFS.

Otherwise, if it was hardware related, it would be random (except if it were the harddisks).

Not true. ZFS causing a kernel panic is more than likely a symptom and not a cause. Corruption of any type is not a cause but a symptom of something wrong. The cause could be almost anything. The real question you should be asking yourself (and all of us senior guys are asking ourselves) is "What caused corruption so significant that it's crashing ZFS?" The answers range from user error to hardware error to software error to an unexpected loss of power. Right now nobody has a solid answer. My first instinct when I saw your very first post (before you told us about the potential loss of power) was bad RAM or loss of power / improper shutdown.

I'm really not buying that it is a software error, though. If it were, it would likely have been found long ago and fixed. ZFS isn't labeled "enterprise class" for nothing. Enterprises trust ZFS implicitly for its superior data integrity. Virtually every aspect of the file system is checksummed or otherwise has a backup internally (aside from just using a vdev that uses redundancy) so that ZFS can determine exactly what is corrupted and fix it. In your case it sounds like the file system is so corrupted that it can't figure out what is bad, let alone fix it. (Honestly... this screams of loss of power or hardware failure.)

To quote the wikipedia article on ZFS:
One major feature that distinguishes ZFS from other file systems is that ZFS is designed from the ground up with a focus on data integrity.

All of us senior guys are scratching our heads because there seems to be absolutely nothing wrong, but it's so much more likely to be a hardware issue or user issue than a software bug.

The real problem is that something is/was wrong. For your own sake you should be very interested in knowing exactly what the cause was and fixing it; otherwise you may find yourself getting screwed again later by the exact same issue. This is why paleoN and protoSD were saying you should get the new SATA controller, to rule out the onboard controller.

Anyway, I'm done trying to defend ZFS. It's your data and your choice (and your consequences... as you have unfortunately witnessed personally) how you want to store your data. It's your data to keep/lose/backup. :)

Here are two fun paragraphs on silent corruption...

The worst type of errors are those that go unnoticed, and are not even detected by the disk firmware or the host operating system. This is known as "silent corruption". A real life study of 1.5 million HDDs in the NetApp database found that on average 1 in 90 SATA drives will have silent corruption which is not caught by hardware RAID verification process; for a RAID-5 system that works out to one undetected error for every 67 TB of data read. However, there are many other external error sources other than the disk itself. For instance, the disk cable might be slightly loose, the power supply might be flaky, external vibrations such as a loud sound, the Fibre Channel switch might be faulty, cosmic radiation and many other types of soft errors, etc. In 39,000 storage systems that were analyzed, firmware bugs accounted for 5–10% of storage failures. All in all, the error rates as observed by a CERN study on silent corruption, are far higher than one in every 10^16 bits. Webshop Amazon.com confirms these high data corruption rates.

Silent data corruption has not been a serious concern while storage devices remained relatively small and slow. Hence, a user very rarely faced silent corruption, so it was not deemed to be a problem that required a solution. With the advent of larger drives and very fast RAID setups, a user is capable of transferring 10^16 bits in a reasonably short time. In particular, ZFS creator Jeff Bonwick stated that the fast database at Greenplum — a database software company located in San Mateo, California specializing in enterprise data cloud solutions for large-scale data warehousing and analytics – faces silent corruption every 15 minutes, which is one of the reasons that Greenplum now base their fast database solution on ZFS. These large and fast raid setups require new file systems that focus on data integrity. This is one of the design goals of ZFS, as explained by Jeff Bonwick.
 

SkyMonkey

Contributor
Joined
Mar 13, 2013
Messages
102
Wow, if anything, reading through this thread has given me more confidence in ZFS. The ability to recover data after extensive problems almost certainly caused by a hardware failure somewhere is pretty impressive.

HHawk, for your irreplaceable content (i.e. wedding photos, etc.), pay a few bucks a month for a cloud backup solution, and also keep at least one other local copy.

My personal strategy is a CrashPlan account which backs up several local PCs and pieces of my FreeNAS system. The wedding photos are backed up to CrashPlan as an .iso as well as the JPEGs. Personal photos and docs are duplicated locally as well, and the wedding photos are on at least two DVDs burned and stored.
 

paleoN

Wizard
Joined
Apr 22, 2012
Messages
1,403
Well, I just tested transferring the "Grimm" series folder again... And guess what... Kernel panic + reboot.
So this problem is really caused by ZFS.
Likely some of them at the least. Not that you bothered to mention what the panics are. The txg the pool was rolled back to also may not have been the most consistent one. That's water under the bridge at this point though.

When I am done transferring files, or when it's not possible to transfer the remaining files (due to kernel panics / reboots), are there any other commands you want me to try (even if they are destructive) to see what could be causing the problems? I am willing to give it a few shots to see what happens...
I would try importing the pool read-only as this may possibly allow you to copy other things you couldn't before. You could also try setting the failmode to continue and see if that behaves any better. That seems familiar. :rolleyes:
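
Something along these lines, assuming the pool is still named 'storage' (export it first if it is currently imported):

Code:
zpool export storage
zpool import -o readonly=on storage     # read-only import: nothing new gets written to the pool

# or, after a normal import, return errors instead of blocking/panicking on failed I/O
zpool set failmode=continue storage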

Also, what is the best way to set everything up again? Do I reinstall FreeNAS completely as a fresh install, or can I use the old USB stick with the previous FreeNAS installation?
And what about my SABnzbd, CouchPotato and Sickbeard settings? I managed to download the 'Plugins' directory with all files in it (this will also include the configuration files). So can I import them again some way?
You could use the old stick, but I would at the very least detach the old pool to remove it from the db before creating a new one. For the plugins, reinstall everything first, then copy the config files over with the jail shut down, or at least with the plugins shut down.
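
For the config copy itself, something along these lines once the plugins are reinstalled (both paths are placeholders for wherever you saved the old Plugins directory and wherever the new plugin jail ends up):

Code:
# with the jail (or at least the plugins) stopped, copy the saved configs back
rsync -av /mnt/backup/Plugins/ /mnt/storage/Jail/plugins/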

For the experimentation phase I would set up a 3 x disk raidz1 pool and a 3 x disk UFS RAID3 pool. Then use rsync to back up the zpool to the UFS array.
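
Roughly like this (device names are just examples; the UFS RAID3 volume is easiest to create from the FreeNAS GUI, then mount it and point rsync at it):

Code:
# 3-disk raidz1 pool for the ZFS side (device names are examples)
zpool create -m /mnt/newstorage newstorage raidz1 da0 da1 da2

# once the 3-disk UFS RAID3 volume is created and mounted (e.g. at /mnt/ufsbackup),
# back the zpool up onto it
rsync -av /mnt/newstorage/ /mnt/ufsbackup/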

It's not the onboard controller, not the purchased controller, not the memory, not the USB stick, and it also cannot be the harddisks, because the smartctl checks (long/short) all finished without errors... :(
SMART is hardly absolute. All the manufacturers made sure of that. I agree it does make it unlikely to physically be the disks though.

The ability to recover data after extensive problems almost certainly caused by a hardware failure somewhere is pretty impressive.
How did you reach that conclusion? An obscure ZFS bug (a very obscure one?) is still a possibility. Something seems to have caused garbage to be written to important parts of the pool, causing metadata corruption which doesn't show as metadata corruption during import. Of course that's just a supposition as well, and it doesn't answer what initially caused it.

HHawk, this has already been mentioned before, but I will repeat it: No form of RAID is a substitute for backups, ZFS being RAID-like. A real backup is also not physically connected to the same system. You don't have any data unless you have a second, unrelated, independent copy of it.
 