Pool Suddenly Crashed, Cannot Access

dasq4v

Cadet
Joined
Apr 26, 2019
Messages
8
In a bit of a panic mode, did a quick search but couldn't find anything.

My pool (6x8TB Z2) suddenly crashed. I received an email alert saying that a snapshot failed, followed by an email saying the pool state is UNAVAIL: one or more devices are faulted in response to IO failures.

**EDIT** I was running a scrub when this happened.

Rebooted, same thing. I actually had to do a hard reboot because it said "90 second watchdog timeout expired. Shutdown terminated."

On the web dashboard, my (encrypted) pool is only showing 3/6 disks. If I go to Storage > Disks I can see them all, though. Attempting to unlock the pool results in an "error unlocking" dialog.

Dashboard reports the following:
OS Version:
FreeNAS-11.2-U3
(Build Date: Mar 27, 2019 18:24)
Processor:
Intel(R) Xeon(R) CPU E5-2650L 0 @ 1.80GHz (16 cores)
Memory:
24 GiB
HostName:
freenas.local

The Memory graph is saying that only 1.54GiB are being used though. Saw another post saying that ECC RAM going bad can cause this?

Will provide more data as needed.

Steps I have taken so far were to put the 3 "missing" disks into other trays in case the backplanes had gone bad. This didn't help so I put them back to where they were.

Any help is greatly appreciated. And no, unfortunately I do not yet have a backup system in place as I have just recently put this system together and it contains some irreplaceable data.

Thanks.
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
"Will provide more data as needed."
Please post full hardware details. See the guidance below:

Forum Guidelines
https://www.ixsystems.com/community/threads/forum-guidelines.45124/
"Any help is greatly appreciated. And no, unfortunately I do not yet have a backup system in place as I have just recently put this system together and it contains some irreplaceable data."
Especially with an encrypted pool, you should never be without a backup, because encryption makes it much more likely that you will be locked out of your data.

The most likely situation is that the unavailable drives need to be unlocked. They are probably not showing up because the encryption didn't automagically unlock them. You might be able to decrypt them manually and regain access to your data.
 

dasq4v

Thank you Chris, appreciate your response. Hardware config is as follows:
  • Case: Norco 4224 4U
  • Mobo: Gigabyte 7-PESH2
  • CPU: Xeon E5-2650L 0 @ 1.80GHz (16 cores)
  • RAM: 24GB ECC (6x4 Samsung 2Rx4 PC3L-10600R DDR3 ECC REG)
  • Storage Pool = 6x HDDs in Z2 config (encrypted)
    • 2x 8TB WD80EMAZ-00WJTA0
    • 8TB WD80EZZX-11CSGA0
    • 8TB WD80EZAZ-11TDBA0
    • 8TB WD80EFZX-68UW8N0
    • 10TB WD100EMAZ-00WJTA0 (which just finished resilvering ~12 hours ago from an 8TB WD80EFZX-68UW8N0 and has successfully rebooted after resilvering - I don’t know if the old data would still be on here but I haven’t used the disk since)
  • Boot Pool = 2x mirrored 32GB USB Cruzer Blades (SDCZ50-032G)
  • HP 487738-001 468405-001 24-Bay 3Gb SAS Expander Card
  • FreeNAS 11.2 U3
Forgot to mention that I have the disks arranged vertically (the 24-tray case can hold 6 vertically). I spread them out so that each one is on its own separate backplane. Some older reviews for the Norco 4224 mentioned the backplanes sometimes frying drives, so this was for risk mitigation. The "missing" disks are in the lower 3 trays, which is why I had moved them to the upper backplanes, but this didn't work.
 

Chris Moore

When you replace a disk in an encrypted pool, you need to rekey the pool. The disks that are not showing up could not be unlocked.
Just search the forum for "geli unlock"; there are many threads on this subject. You would need to unlock the individual disks from the command line, then you should be able to import the pool from the GUI. Once imported, you need to rekey. It should work fine after that.
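
For anyone landing here from a search, the per-disk unlock can be sketched roughly as below. This is only a dry run that prints the commands rather than running them. The key file path is a placeholder, the provider names are assumed to be the daXp2 partitions that show up later in this thread's zpool status output, and print_geli_attach is a made-up helper name. geli attach -k asks for the passphrase when actually run; if the key has no passphrase, -p would be needed as well.

```shell
#!/bin/sh
# Sketch only: print (do not run) one `geli attach` command per pool member.
# ASSUMPTIONS: the key file path and the provider names passed in are
# placeholders; substitute the values from your own system.

# print_geli_attach KEYFILE PROVIDER...
# Emits "geli attach -k KEYFILE /dev/PROVIDER" for each provider given.
print_geli_attach() {
    keyfile=$1
    shift
    for prov in "$@"; do
        printf 'geli attach -k %s /dev/%s\n' "$keyfile" "$prov"
    done
}

# Example (hypothetical path):
#   print_geli_attach /path/to/pool.key da0p2 da1p2 da2p2
```

Review the printed lines first and only then run them by hand, one disk at a time.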
 

dasq4v

Well thanks, unlocking each disk worked! I followed the instructions found here: https://www.openattic.org/posts/unlock-geli-ecrypted-zfs-volume-freenas/ and had to manually change back the mount point under Users (which isn't mentioned in the link).

I had actually done this immediately after replacing the drive, so I'm not sure why it suddenly became an issue:
  1. Highlight the pool that contains the recently replaced disk and click the Encryption Re-key button in the GUI. Entry of the root password will be required.
  2. Highlight the pool that contains the disk you just replaced and click Create Passphrase and enter the new passphrase. The old passphrase can be reused if desired.
  3. Highlight the pool that contains the recently replaced disk and click the Download Key button to save the new encryption key. Since the old key will no longer function, any old keys can be safely discarded.
  4. Highlight the pool that contains the disk that was just replaced and click the Add Recovery Key button to save the new recovery key. The old recovery key will no longer function, so it can be safely discarded.
Anyway, it looks like I'll be expediting the backup process and getting everything copied over ASAP. What's strange though is that not all my files are showing up immediately - they're slowly being populated in the root folder after refreshing every few minutes. At this rate it'll take a week or so.

zpool status shows the following, which from what I can remember looks very different than what it used to (eg I don't remember seeing da#p2.eli before):

Code:
[root@freenas ~]# zpool status
  pool: mypool
 state: ONLINE
  scan: resilvered 244K in 0 days 00:00:01 with 0 errors on Fri Apr 26 23:59:14 2019
config:

        NAME           STATE     READ WRITE CKSUM
        mypool         ONLINE       0     0     0
          raidz2-0     ONLINE       0     0     0
            da0p2.eli  ONLINE       0     0     0
            da2p2.eli  ONLINE       0     0     0
            da1p2.eli  ONLINE       0     0     0
            da4p2.eli  ONLINE       0     0     0
            da3p2.eli  ONLINE       0     0     0
            da5p2.eli  ONLINE       0     0     0

errors: No known data errors


Anyway thanks a ton for your help!
 

dasq4v

Update: It's not just refreshing the directory structure, it's copying everything over again. When I woke up the pool size had increased from 69% => 86%. I'm copying off the most critical files as quickly as I can hoping it won't exceed 95%.

Not sure why all this is happening. Anyway I can't complain as at least I have access to the data. Will be wiping everything once I get everything copied over and starting again from scratch.
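
A rough way to keep an eye on that 95% mark from the shell, as a sketch: it assumes the pool is named mypool (as in the zpool status output above), 95 is just the limit picked in this post, and capacity_state is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: classify a pool capacity figure such as "92%" against a limit.
# ASSUMPTION: the default limit of 95 is simply the number used in this post.

# capacity_state PERCENT [LIMIT]
# Prints "over" if PERCENT is at or above LIMIT, otherwise "ok".
capacity_state() {
    pct=${1%\%}        # strip a trailing % sign if present
    limit=${2:-95}
    if [ "$pct" -ge "$limit" ]; then
        echo "over"
    else
        echo "ok"
    fi
}

# Example, on the NAS itself:
#   capacity_state "$(zpool list -H -o capacity mypool)"
```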
 

Chris Moore

"zpool status shows the following, which from what I can remember looks very different than what it used to (eg I don't remember seeing da#p2.eli before):"
That isn't right. Once you unlocked the drives, did you import the pool from the GUI?
 

dasq4v

I've copied/deleted a sizable amount which should have affected the usage %, but it hasn't recovered any space. Currently using 92%. I hate to reboot, but I'm going to have to in about half an hour to see if it will reallocate some space.
 

dasq4v

Alright, I was able to reboot and eventually get the pool unlocked again. System does not appear to be resuming the copy that it was doing, so that's good. I was able to delete some of the stuff I had backed up and regained some space.

The problem now is that my directory tree under Sharing > SMB > Edit looks like this:

  • /mnt
    • folder1
    • folder2
    • iocage
    • mypool
      • folder1 **copy it made last round**
    • folder3

I currently have the SMB share set to /mnt/mypool; however, it's only showing the folder1 copy. I'm pretty sure the path was previously /mnt/mypool/(folders), but I don't understand why that changed.

Anyway, what would be the best way to regain access to everything? I also seem to have all my snapshots - would rolling back to one of those be a viable option?
 

dasq4v

Sorry, figured out I could just do this with:

mv "/mnt/folder name" /mnt/mypool

Just posting in case someone has the same issue.
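
A slightly more cautious version of the same move, as a sketch. move_into is a hypothetical helper name. It refuses to proceed if the destination already holds an entry with the same name, which matters here because /mnt/mypool already contained a folder1 copy.

```shell
#!/bin/sh
# Sketch only: move a stray folder into the pool's mount point without
# clobbering anything that is already there.

# move_into SRC DEST_DIR
# Moves SRC into DEST_DIR; fails if DEST_DIR is not a directory or if it
# already contains an entry with the same name as SRC.
move_into() {
    src=$1
    dest_dir=$2
    name=$(basename "$src")
    if [ ! -d "$dest_dir" ]; then
        echo "destination $dest_dir is not a directory" >&2
        return 1
    fi
    if [ -e "$dest_dir/$name" ]; then
        echo "refusing to overwrite $dest_dir/$name" >&2
        return 1
    fi
    mv -- "$src" "$dest_dir/"
}

# Example:
#   move_into "/mnt/folder name" /mnt/mypool
```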
 