Has my pool gone to pool heaven?

Status
Not open for further replies.

Tyrannus

Dabbler
Joined
Aug 20, 2016
Messages
16
Hi all

So I've been having some problems lately where I was getting total system lockups and having to hard reboot. There were a few circumstances in which this would happen:

1) When trying to delete or copy hundreds of GB of data
2) At some point during the day or night, after no activity, when the network watchdog would time out (that was the last message on screen anyway)

I was about to try to address these, as well as continue fixing up my datasets for my shares, when it all crashed on me overnight, with the watchdog timeout being the last message on screen at around 1am.

I rebooted as normal and it's hanging on importing the single zpool I have.

I've done Memtest and everything is OK there.
When I boot into Single User Mode, I can import the pool using -f -R to mount it, but just -R doesn't work - it needs to be forced.
I reinstalled the OS for the sake of it, and it's still hanging on importing the pool.
I have a feeling my pool has become kaput :(
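
For anyone following along, the forced import described above looks roughly like the sketch below. The pool name `tank` and the `/mnt` alternate root are placeholders, not values taken from this thread, and the function names are only for illustration.

```shell
# Sketch only: pool name and altroot are placeholders, not values from this thread.

# List pools that are visible for import, without importing anything.
list_importable_pools() {
    zpool import
}

# Force-import a pool under an alternate root, as done in Single User Mode.
# $1 = pool name, $2 = alternate root (defaults to /mnt)
force_import_pool() {
    zpool import -f -R "${2:-/mnt}" "$1"
}
```

Something like `force_import_pool tank` would then correspond to the `-f -R` import mentioned above, once the real pool name has been confirmed with `list_importable_pools`.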

What are my next steps in terms of resolution? I'd really like to not lose the data if I can avoid it.

Thanks
 

Tyrannus

Dabbler
Joined
Aug 20, 2016
Messages
16
I can't do it from SSH because it won't boot that far. It literally hangs on importing the pool. On one boot attempt I got an error that UID 0 had been terminated as well.

I just ran it from SUM, and my pool was absent so I am about to import it.

System specs are:
Intel Core i5-6400
16 GB of DDR4 RAM (non-ECC and yes I knew the risks but RAM has checked OK)
Gigabyte Z170-HD3 mainboard
Realtek onboard NIC
2x Samsung 750 SSDs
4x Seagate 3 TB 3.5" disks
1x Samsung 32 GB USB flash disk for the OS.

Last post for now, as I'm off to work, but I did import the pool in SUM.

It gave a lot of errors about mountpoints but when I did a zpool status - it's showing up all OK with everything online and no errors detected.
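
A minimal sketch of that status check, assuming a hypothetical pool name such as `tank` passed as the first argument. Note that a clean result only reflects errors ZFS has detected so far.

```shell
# Sketch only: the pool name is supplied by the caller and is hypothetical here.

# Print the one-word health state of a pool (for example ONLINE or DEGRADED).
pool_health() {
    zpool list -H -o health "$1"
}

# Show detailed status; -v also lists any files with known errors.
pool_status_verbose() {
    zpool status -v "$1"
}
```

For example, `pool_health tank` prints a single word that is easy to check by eye or in a script, while `pool_status_verbose tank` gives the full per-disk view.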
 

Robert Trevellyan

Pony Wrangler
Joined
May 16, 2014
Messages
3,778
"when I did a zpool status - it's showing up all OK with everything online and no errors detected"
Then you can copy your important data elsewhere before destroying and rebuilding the pool.
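
One way that copy might be done is sketched below, assuming `rsync` is available and the destination is already mounted. Both paths in the usage example are hypothetical, and the function name is only for illustration.

```shell
# Sketch only: source and destination paths are supplied by the caller.

# Copy one directory tree to a backup location, preserving permissions,
# ownership and timestamps (-a).
# $1 = source directory, $2 = destination directory
copy_tree() {
    # A trailing slash on the source copies its contents rather than the
    # directory itself; normalise both arguments to end in exactly one slash.
    rsync -a "${1%/}/" "${2%/}/"
}
```

For example, `copy_tree /mnt/tank/media /backup/media` would copy the contents of one dataset's directory into the backup directory.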
"non-ECC and yes I knew the risks but RAM has checked OK"
Running memtest on non-ECC RAM is missing the point entirely. Random bit flips occur in RAM due to various natural causes, including cosmic radiation. ECC RAM will at least detect, and usually correct, bit flips. Non-ECC RAM, no matter how many times you memtest it, will not detect those bit flips, and can thus silently corrupt your data.
 

Tyrannus

Dabbler
Joined
Aug 20, 2016
Messages
16
Hi Robert

Thanks. I feel a bit of a relief now knowing that.

Also thanks for your extra details on memtest.

With regards to saving my data, I do have somewhere I can put it, but I'm curious: if the pool is OK, why can't the OS import it on a normal boot?

Is there anything I can look for?
 

Tyrannus

Dabbler
Joined
Aug 20, 2016
Messages
16
Just on this, if I leave it for long enough, I get the following message:

"uid 0, was killed: out of swap space."

Could this indicate a failure with the USB drive that I am using for the OS, and is this why the pool won't import?

More potentially relevant information:

The mountpoint errors are specific to my datasets. So basically when I import the pool in SUM it gives me an error that /mnt/pool/dataset cannot be mounted for all of my datasets.

Another possibly relevant item: if I do not use -f, it claims the pool is in use by another system, but from what I can tell, not only does my host ID match, the pool certainly hasn't been used by another system. Any ideas?
 

Robert Trevellyan

Pony Wrangler
Joined
May 16, 2014
Messages
3,778
I don't think "the pool is OK", I think it's probably corrupted. However, since you're able to mount it read-only, you have an opportunity to recover some data before rebuilding it. You might consider doing the recovery on a system that's better suited to running FreeNAS, to minimize potential data loss.

It's possible your USB boot drive has issues too, so you may as well replace it.

The mount errors are because the mount-points don't exist in single user mode. Run zfs get mountpoint <datasetname> for one of the datasets and check to see which folders are missing. You might only have to create the top level folder(s), I'm not sure how much of the work ZFS does on its own.
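
As a rough sketch of that check, the functions below print a dataset's mountpoint and list the mountpoints under a pool that have no directory yet. The pool and dataset names used with them are hypothetical, and whether only the top-level folder needs creating is still untested here.

```shell
# Sketch only: pool and dataset names are supplied by the caller.

# Print the mountpoint property of one dataset.
dataset_mountpoint() {
    zfs get -H -o value mountpoint "$1"
}

# List mountpoints under a pool whose directories do not exist yet.
missing_mountpoints() {
    zfs list -H -o mountpoint -r "$1" | while read -r mp; do
        case "$mp" in
            # Only absolute paths are real mountpoints; this skips values
            # such as "none", "legacy" and "-".
            /*) [ -d "$mp" ] || echo "$mp" ;;
        esac
    done
}
```

Each path printed by `missing_mountpoints tank` is a candidate for `mkdir -p` before retrying the import or mount.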

The "other system" is in fact your system, which still 'owns' the pool. When you force it to mount in single user mode, it doesn't see itself as the rightful owner.
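
If you want to see this for yourself, one thing that may be worth comparing is the host ID the running system reports against the one recorded in the pool's on-disk label. The sketch below assumes a FreeBSD-based system where `sysctl kern.hostid` is available; the device path in the usage example is hypothetical.

```shell
# Sketch only: the device path is supplied by the caller.

# Host ID the running kernel currently reports (FreeBSD sysctl).
current_hostid() {
    sysctl -n kern.hostid
}

# Host ID recorded in the ZFS label of a pool member device.
# $1 = device path of one disk or partition in the pool
label_hostid() {
    zdb -l "$1" | awk '$1 == "hostid:" { print $2; exit }'
}
```

If `current_hostid` and `label_hostid /dev/ada0p2` print different values in single user mode, that would be consistent with the import needing -f there.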
 

Tyrannus

Dabbler
Joined
Aug 20, 2016
Messages
16
Thanks for the tips.

I've somehow managed to get it to boot correctly. I'm going to go through the process of pulling data off it and rebuilding the pool from scratch.

Thanks for your guidance.
 