Can I mount a DEGRADED pool so that files with permanent errors may be fixed/removed?

Niel Archer

Dabbler
Joined
Jun 7, 2014
Messages
28
I have a raidz1 pool that has become degraded after LSI MegaRAID problems. I have reconnected the drives directly to my motherboard, but now have "permanent errors" showing:

Code:
  pool: pool01
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
        corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
        entire pool from backup.
   see: http://illumos.org/msg/ZFS-8000-8A
  scan: resilvered 429G in 0 days 00:57:24 with 70 errors on Sat Dec 29 21:01:24 2018
config:

        NAME                                            STATE     READ WRITE CKSUM
        pool01                                          DEGRADED     0     0    72
          raidz1-0                                      DEGRADED     0     0   152
            gptid/13802218-f5d3-11e8-b394-6805ca3c6a75  DEGRADED     0     0     0  too many errors
            gptid/142d2799-f5d3-11e8-b394-6805ca3c6a75  DEGRADED     0     0     0  too many errors
            gptid/14d968d0-f5d3-11e8-b394-6805ca3c6a75  ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        <metadata>:<0x42>
        <metadata>:<0x89>
        pool01/.system/cores:/syslog-ng.core
        pool01/.system/samba4:<0x0>
        pool01/.system/syslog-a341adb2ddce4de3b1f8d0281e2d5601:/log/utx.log
        pool01/.system/syslog-a341adb2ddce4de3b1f8d0281e2d5601:/log/samba4/log.winbindd
        pool01/.system/syslog-a341adb2ddce4de3b1f8d0281e2d5601:/log/samba4/log.winbindd-idmap
        pool01/.system/syslog-a341adb2ddce4de3b1f8d0281e2d5601:/log/messages

Is it possible to remove/replace these files, as they appear to be only logs? How do I make the pool visible to the system?
 

svtkobra7

Patron
Joined
Jan 12, 2017
Messages
202
Is it possible to remove/replace these files, as they appear to be only logs? How do I make the pool visible to the system?

I'd try zpool import -F -n Tank first (zpool import -f Tank may work as well). If you are able to import, what has always worked for me is zpool export Tank, reboot, then import from the GUI. I've never dealt with this exact situation, though; whenever I've seen "too many errors" I was never able to get the pool mounted again, and I had to copy the data off the unmounted pool, destroy it, and recreate.
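Roughly the sequence I have in mind, substituting your pool name for mine (the -n makes the recovery-mode import a dry run that only reports what it would do):

Code:
zpool import -F -n pool01   # dry run: report whether a recovery-mode import would succeed
zpool import -F pool01      # actual recovery-mode import, if the dry run looks sane
zpool export pool01         # then reboot and try the import from the GUI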

These are the guys you need to be worried about (it isn't just 8 files, i.e. "too many errors"):
<metadata>:<0x42>
<metadata>:<0x89>

It's possible a scrub + zpool clear may remedy it, but the metadata is concerning. I think your only shot at reimporting via the GUI is to see if you can fix the metadata (I've attempted that before and failed). This is a friendly attempt to assist, but let a pro chime in so you don't do something irreversible. And I would tread very lightly with RAIDZ1, given its slim fault tolerance.
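If you do try that route, a minimal sketch (again substituting your pool name; no guarantee it does anything for the metadata errors):

Code:
zpool clear pool01       # reset the error counters on the pool
zpool scrub pool01       # re-read and verify every block
zpool status -v pool01   # see whether the list of permanent errors shrinks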
 

Jailer

Not strong, but bad
Joined
Sep 12, 2014
Messages
4,977
With corrupt metadata, your chance of success is minimal.
 

Niel Archer

Dabbler
Joined
Jun 7, 2014
Messages
28
I'd try zpool import -F -n Tank first (zpool import -f Tank may work as well).
I had already imported with -f, so that gives me an error "cannot import 'pool01': a pool with that name is already created/imported, and no additional pools with that name were found."
It still doesn't show up in the GUI though.
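(From the shell it does appear to be imported already, which is presumably why the second import fails; commands along these lines confirm it:)

Code:
zpool list pool01        # lists the pool if it is already imported
zpool status -v pool01   # shows its state and the files with permanent errors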

These are the guys you need to be worried about (it isn't just 8 files, i.e. "too many errors"):
<metadata>:<0x42>
<metadata>:<0x89>
Yeah, they worry me also. If I can get the pool available to the system I can start moving any recoverable files to another pool.

It's possible a scrub + zpool clear may remedy it, but the metadata is concerning. I think your only shot at reimporting via the GUI is to see if you can fix the metadata (I've attempted that before and failed). This is a friendly attempt to assist, but let a pro chime in so you don't do something irreversible.

How does one go about fixing the metadata?

And I would tread very lightly with RAIDZ1, given its slim fault tolerance.
I've been reading a lot of comments the last couple of days about that. *sigh* Wish I had known at the start of the month when I bought the drives, then I would have invested in an extra one.
 

svtkobra7

Patron
Joined
Jan 12, 2017
Messages
202
Can we take a step back for a moment, please? Can you expand on the "LSI MegaRAID problems" so I understand how we got here? I don't ask to shame you, just to get some situational awareness about what we are up against. And most importantly: are you backed up, and if not, how important is the data you stand to lose?

I had already imported with -f, so that gives me an error "cannot import 'pool01': a pool with that name is already created/imported, and no additional pools with that name were found."
It still doesn't show up in the GUI though.
  • I replied quickly, trying to get to dinner on time, as I know what it is like to be in your shoes; in doing so I missed a few key bits in your post.
  • AFAIK (and I've been in a similar situation a number of times thanks to using an encrypted pool, though I back up the key, recovery key, and even the geli metadata, plus keep on-site and off-site backups), whenever I've been unable to import via the GUI, I've always had to import the pool via the CLI; then some combination of zpool export / reboot / import via GUI got the job done.
  • You aren't getting your pool imported via the GUI and mounted until you solve the metadata issue, which you likely won't ...
Yeah, they worry me also. If I can get the pool available to the system I can start moving any recoverable files to another pool.
  • ... I'll be less diplomatic than previously: I 100% agree with @Jailer. I wouldn't give up yet, but it is a near certainty your pool is borked.
  • But it's not all bad news; I suspect it's just more of a PITA to deal with. When I had a similar issue with metadata, I could still access my data, just not via the GUI, and the pool wasn't mounted in the usual place, i.e. it was at /pool01 rather than /mnt/pool01 (to use your pool name). So I copied it off to another system via rclone (rough sketch after this list) and started fresh.
  • Further, I see that you only have a few posts, but this community is great and experts abound who have the knowledge I don't and who will take the time to assist.
  • Chin up! ;)
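If it does come to copying data off with rclone, this is roughly what I mean (the remote name "backupbox" and the paths are placeholders for illustration, not anything from your setup; the remote is configured beforehand with rclone config):

Code:
rclone copy /pool01 backupbox:pool01-rescue --progress   # "backupbox" is a hypothetical rclone remote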
How does one go about fixing the metadata?
  • That is way above my pay grade ... I just know enough to break stuff from time to time.
I've been reading a lot of comments the last couple of days about that. *sigh* Wish I had known at the start of the month when I bought the drives, then I would have invested in an extra one.
  • It's OK to make sub-optimal decisions; heck, make as many as you want, just learn from each one.
  • But you have a bigger problem to solve than creating a more fault tolerant pool at the moment.
 

svtkobra7

Patron
Joined
Jan 12, 2017
Messages
202
I just purposely prevented myself from being able to import from the GUI, to demonstrate that all is likely not lost even though it looks bad, eh (see the output below)?

Unable to import from the GUI, so I try via the CLI: geli attach with the key (i.e. decrypt), and I manage to import the pool. It isn't visible in the GUI, nor mounted in the usual place, but if I navigate to /Tank1 (which would normally be /mnt/Tank1 when the mountpoint is set properly via a GUI import), my data is still there. I suspect your case is the same. Now you just need to copy it off (and if you need help, let me know).

Of course, the geli attach is completely irrelevant in your case; I'm just demonstrating that if your pool can be imported, your data is likely intact.

Code:
root@FreeNAS-02[~]# zpool import Tank1
cannot import 'Tank1': I/O error
        Destroy and re-create the pool from
        a backup source.
root@FreeNAS-02[~]# cd /data/geli

root@FreeNAS-02[/data/geli]# geli attach -k geli.key da3p1
Enter passphrase:
root@FreeNAS-02[/data/geli]# geli attach -k geli.key da4p1
Enter passphrase:
root@FreeNAS-02[/data/geli]# geli attach -k geli.key da5p1
Enter passphrase:
root@FreeNAS-02[/data/geli]# geli attach -k geli.key da6p1
Enter passphrase:
root@FreeNAS-02[/data/geli]# geli attach -k geli.key da7p1
Enter passphrase:
root@FreeNAS-02[/data/geli]# geli attach -k geli.key da8p1
Enter passphrase:
root@FreeNAS-02[/data/geli]# geli attach -k geli.key da9p1
Enter passphrase:
root@FreeNAS-02[/data/geli]# geli attach -k geli.key da10p1
Enter passphrase:
root@FreeNAS-02[/data/geli]# geli attach -k geli.key da11p1
Enter passphrase:
root@FreeNAS-02[/data/geli]# geli attach -k geli.key da12p1
Enter passphrase:
root@FreeNAS-02[/data/geli]# geli attach -k geli.key da13p1
Enter passphrase:
root@FreeNAS-02[/data/geli]# geli attach -k geli.key da14p1
Enter passphrase:

root@FreeNAS-02[/data/geli]# zpool import Tank1
root@FreeNAS-02[/data/geli]# zpool status Tank1
  pool: Tank1
 state: ONLINE
  scan: scrub repaired 0 in 0 days 00:09:15 with 0 errors on Thu Dec 27 08:39:47 2018
config:

        NAME                 STATE     READ WRITE CKSUM
        Tank1                ONLINE       0     0     0
          raidz2-0           ONLINE       0     0     0
            da3p1.eli        ONLINE       0     0     0
            da4p1.eli        ONLINE       0     0     0
            da5p1.eli        ONLINE       0     0     0
            da6p1.eli        ONLINE       0     0     0
            da7p1.eli        ONLINE       0     0     0
            da8p1.eli        ONLINE       0     0     0
            da9p1.eli        ONLINE       0     0     0
            da10p1.eli       ONLINE       0     0     0
            da11p1.eli       ONLINE       0     0     0
            da12p1.eli       ONLINE       0     0     0
            da13p1.eli       ONLINE       0     0     0
            da14p1.eli       ONLINE       0     0     0
        logs
          gpt/Opt-02_Log-01  ONLINE       0     0     0
          gpt/Opt-02_Log-02  ONLINE       0     0     0

errors: No known data errors

Circling back to the pool architecture: I don't feel completely comfortable with a 12-wide raidz2 pool; however, I need the space, as my other pool is 12 × 10 TB disks and this one is 12 × 6 TB disks (and this is the backup, so I wanted to keep my S/E % as high as possible, with the higher fault tolerance on the pool with 120 TB raw). My preference would have been to go with raidz3 on this host.

[Attached screenshot: pool.png]
 

Niel Archer

Dabbler
Joined
Jun 7, 2014
Messages
28
Can we take a step back for a moment, please? Can you expand on the "LSI MegaRAID problems" so I understand how we got here? I don't ask to shame you, just to get some situational awareness about what we are up against. And most importantly: are you backed up, and if not, how important is the data you stand to lose?
At the beginning of the month my main pool (6 TB × 3, raidz1) was running out of space when I saw a particularly good deal (about £70 each below the normal price) on some IronWolf Pro 10TB drives, so I bought three.

My motherboard didn't have enough SATA ports for the extra drives; "fortunately" I had a couple of LSI MegaRAID 9240-8i cards available. I hooked the drives up, created the pool, created a couple of datasets, and moved the largest dataset from the older pool to the newer one, all without any problems.
Around this time FreeNAS 11.2 reached release status, so I upgraded.

The hardware for my FreeNAS box is mostly older, retired stuff from desktop use (not the drives, obviously); I don't have the cash to buy server gear yet. I decided to transfer to a relatively newer motherboard and CPU and quadruple the RAM. This is when it all went pear-shaped ;-) The LSI card really didn't like the newer motherboard, I think; I've heard they can be temperamental with desktop motherboards, particularly older ones. Anyhow, FreeNAS couldn't complete booting without commands to the MegaRAID card timing out after a minute or two. After a few tries to determine what was wrong, I gave up. Fortunately the newer board had more SATA ports, so with some juggling I was able to get all the drives hooked up without the MegaRAID card. This is when I discovered the problem with the pool.

P.S. I'm in the process of moving the data off of pool01 to the previous pool it came from. I expect that to take some time :-(
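For anyone who finds this later, a copy like that can be done with plain rsync, along these lines (the dataset paths here are placeholders for illustration, not my actual layout; the old pool is mounted under /mnt as usual):

Code:
rsync -avhP /pool01/somedataset/ /mnt/oldpool/somedataset/   # re-runnable if it stalls partway through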
 

svtkobra7

Patron
Joined
Jan 12, 2017
Messages
202
P.S. I'm in the process of moving the data off of pool01 to the previous pool it came from. I expect that to take some time :-(
  • I'm glad to hear your data is intact, at least.
  • I know it is a PITA and I'd wager that nearly all forum members have borked a pool at some point or another.
  • Grab a few beers and let her copy. To cheer your spirits: at least you aren't looking at the 30-40 TiB I had to copy the last time I borked a pool. That was pre-10G, so I was looking at 125 MB/s ... and I said screw it ... temporarily converted the other server to a homebrew JBOD, i.e. cascaded its backplane off the first, and enjoyed much, much faster speeds. It was done way before I could finish the keg I had purchased to wait out what I thought would be quite a long time. (Bad sense of humour, sorry.)
 

Niel Archer

Dabbler
Joined
Jun 7, 2014
Messages
28
Just a quick update. I managed to get all but one file off the pool without any major problems. That allowed me to destroy and recreate the pool.
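For completeness, the destroy-and-recreate step amounts to something like this (the gptid values are placeholders; on FreeNAS the recreation is normally done from the GUI so partitioning and swap are handled for you):

Code:
zpool destroy pool01                                          # irreversible: wipes the pool
# recreate via the FreeNAS GUI, or roughly from the CLI:
zpool create pool01 raidz1 gptid/aaaa gptid/bbbb gptid/cccc   # placeholder disk identifiers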
 