GUI Upgrade Failed, Can’t Import Volume

Status
Not open for further replies.

Kamikaze321

Dabbler
Joined
Nov 30, 2015
Messages
11
My System:
CPU: i7-3770
MoBo: ASRock B75M
RAM: 2x8GB DDR3 1333 (non-ECC)

My ZFS pool is six 4TB drives in RAID-Z2.


I’ve been running FreeNAS for a year and a half now without any major issues, until today. My issues started when I used the GUI to upgrade to the newest stable release that came out a few days ago. I was upgrading from the previous stable release that came out in December.

After the update finished, the GUI never came back up, so I plugged in a monitor and saw the following error on screen:

Code:
KDB: enter: panic
[ thread pid 767 tid 100590 ]

Followed by a db> prompt that doesn’t seem to do anything
http://imgur.com/EA2aWeq (screenshot of error)

After a power cycle it got stuck on this error again. I also get it when I try one of the older revisions in the GRUB loader (I think that's what it's called?).

My first assumption was that the USB drive might have failed, so I installed a fresh copy of FreeNAS on a different USB drive and booted up. FreeNAS will boot, but any step that involves importing the volume causes the GUI to hang or FreeNAS to reboot. Even clicking the wizard button causes the system to reboot instantly.

The zpool import and zpool import <VolumeName> commands both caused the system to reboot in the exact same way.
This is a video I made of the crash / reboot - https://vid.me/ANLh

I also tried these troubleshooting commands to get additional information:
Code:
[root@freenas ~]# zpool status
  pool: freenas-boot
 state: ONLINE
  scan: none requested
config:

        NAME          STATE     READ WRITE CKSUM
        freenas-boot  ONLINE       0     0     0
          da0p2       ONLINE       0     0     0

errors: No known data errors
[root@freenas ~]#


gpart show
http://pastebin.com/KCiYpPeu

gpart list
http://pastebin.com/PsxpHPpP

So the disks appear to be recognized, they just won’t import.

I’m hoping someone can suggest additional steps I can try to resolve this. The only similar issue I could find on the forums was from years ago, and it was determined that his issue was caused by an insufficient amount of RAM. Because I don’t have ECC RAM, I also spent the last few hours running Memtest86 4.2, and it has found no errors so far.
 

DrKK

FreeNAS Generalissimo
Joined
Oct 15, 2013
Messages
3,630
I think what has probably happened here is there is some hardware problem that was pre-existing, but wasn't a problem until you rebooted (i.e., has nothing to do with the upgrade). Your troubleshooting step of trying a fresh install on a fresh USB drive was very sensible. Too bad it was not successful.

I think you probably know this is an awful motherboard/chipset to use for a FreeNAS...

Be that as it may, what I'd propose doing at this point is troubleshooting to see if the hard drive(s) are the problem. I would take them (all) out, put in a single random hard drive you have lying around, and see if you can now boot and create a pool on that. If you can, then the problem would appear to be with the hard drives and/or the pool itself.
 

Kamikaze321

Dabbler
Joined
Nov 30, 2015
Messages
11
Good idea, I tried that. It worked without issue. I was able to create a pool on the single disk, add some data, detach it, and re-import it.

So I guess that's not good news for my pool..

Is it a good idea to start disconnecting 1 HDD at a time and booting up to see if the crash goes away?

I ran smartctl -x on all the drives and I don't see anything obviously wrong with any of them. The lifetime max temperature is a bit high on a few of them, but they all pass.

ada0
http://pastebin.com/TZG226t1

ada1
http://pastebin.com/LCEGNnuY

ada2
http://pastebin.com/ycggvr8y

ada3
http://pastebin.com/0jihaVaH

ada4
http://pastebin.com/Yre1rSJD

ada5
http://pastebin.com/uk6QrH6V

Thanks for your help
 

DrKK

FreeNAS Generalissimo
Joined
Oct 15, 2013
Messages
3,630
No, I don't see anything terrible here. Some of these are dodgy. High Fly Writes are bad, and only God knows what attribute #1 (raw read errors) is supposed to mean in a Seagate drive.

So what we know, right now, is that you have a problem that manifests when you plug in the drives for your main pool, but not when you plug in a single random drive. Could be a number of things. Power supply rail not holding its voltage at spin-up under the load. Something seriously wrong with one of the drives. Something seriously wrong with many of the drives. Something wrong with the pool itself. Something wrong with another piece of hardware, etc. Maybe the next thing I'd try is plugging in some random sample of 4 of the 6 drives, to try to isolate a bad drive or something. Or, perhaps, swapping in a new PSU to rule that out.

Maybe someone has a better idea?
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
Well here are some suggestions to try:

1) Run Memtest86+ for a day to ensure there are no RAM issues. A failure in multiple areas may indicate a power or motherboard issue.
2) Run a CPU Stress Test (I prefer to not run longer than a few hours but that is my opinion) and see if that passes. Failures could indicate a CPU issue or a power supply or motherboard issue.
3) If those fail to show a failure, plug in all your hard drives (power cords only) and see if you can boot up your system.
4) If you can boot up, ensure you reset your boot device to factory settings just because I don't want the pool to try to mount. Then power down, connect one SATA port to a single hard drive, power up and see if it boots. Rinse and Repeat through all your drives until they are all working. If you find that one fails to work, try a different SATA cable.

You see the process here, take it slow and rule out the components. Post your results of all the testing and make sure it's clear what you did so we can give further advice. I don't think your pool is gone. Of course you should be able to move your drives into another computer and get to your data. If the problem follows then you likely have a single bad drive.
 

Kamikaze321

Dabbler
Joined
Nov 30, 2015
Messages
11
Thank you all for the suggestions. Just to summarize, this is what I have done so far:

1.) I ran Memtest86+ for about 16 hours the other day; six passes finished with zero errors.
2.) I unplugged all six 4TB drives, plugged in a single spare drive, and was able to create a pool without any crashes.
(Just to make this clear: the crashing is not happening on boot, but happens the second I try to import the volume in the GUI or run any zpool import commands.)
3.) I swapped the power supply (both are high-quality gold power supplies). This did not change anything; the exact same crashing behavior persisted.
4.) I unplugged 2 of the 6 drives at a time, booted up, and attempted to import the volume with the GUI. The same crash occurred no matter which 4 drives were plugged in (repeated 3 times).
5.) I just booted into stresslinux and am running the "burnP6" "stress --cpu 4" test right now. (Let me know if you have a preferred stress test software I should try.)

If the stress test does not cause an issue, next I will try booting up with a single drive at a time like you suggest. But since my crash is not occurring at boot, I need to import a volume to trigger it. I don't see what trying to import a single drive (or <4 drives) at a time will accomplish, since of course the import will not work either way, but I will try.

4) If you can boot up, ensure you reset your boot device to factory settings just because I don't want the pool to try to mount.

My understanding is my pool is crashing every time when it tries to mount.... :(
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
That is odd that importing your pool is causing the issue.

So when you try to import the pool, the pool is apparently listed in the drop down menu so you can select it to be imported, right?

I don't see where you tried the version of FreeNAS you were running in December on the boot device. If that doesn't work, try a FreeNAS build from 6 months back or so. I'm curious if there is an issue with FreeNAS. Also, were you always upgrading your pool to the latest ZFS Feature Flags? (The alert would flash if you didn't uncheck the warning or upgrade.)
 

Kamikaze321

Dabbler
Joined
Nov 30, 2015
Messages
11
When I go to import the pool I can make it to this step (http://imgur.com/3tXTXNg). The second I hit OK, the crash occurs.

The stress test ran for a few hours. CPU temps got up to around 60°C, and no crashes or anything unusual happened.

Yes, I was running the most up-to-date feature flags.

I have now tested with fresh installs of 3 different versions of FreeNAS on 3 different USB drives. Every version I tried had the same crashing behavior when I tried to import the pool:
FreeNAS-9.3-STABLE-201506292332
FreeNAS-9.3-STABLE-201509022158
FreeNAS-9.3-STABLE-201601181840

I also just happened to have a 6-pack of extra SATA cables lying around, so I swapped all the original cables out. No change.

I even moved all the drives into a different machine (which had 8GB of RAM) and booted up. The exact same crash occurred when I imported.

So I think I can confidently say that this is not hardware related.

I'm about ready to throw in the towel on this one... I have a backup from the beginning of December, so it's not the end of the world. I just wish I could come up with a clear reason why my pool became corrupted. Besides the fact that I have a non-server-grade board..

Edit:
I have also found after a bit more testing that the crash will not occur if I have fewer than 3 (of 6) drives connected, but of course the pool is not found so I can't import... I guess that just proves that I need to have an intact pool of 4 or more drives to trigger the crash.
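
That is broadly consistent with how RAID-Z2 redundancy works: a RAID-Z2 vdev can have at most two member drives missing and still be assembled, so a 6-drive vdev needs at least 4 drives present before an import can go ahead. Here is a minimal Python sketch of that arithmetic. The function name `raidz_can_assemble` is made up for illustration, and the check is a simplification that only counts drives; it ignores everything else ZFS looks at (labels, drive health, pool metadata).

```python
def raidz_can_assemble(total_drives: int, parity: int, present_drives: int) -> bool:
    """Return True if a single RAID-Z vdev has enough members present.

    Simplified assumption: only the number of missing drives matters.
    Real ZFS also checks labels, checksums, and pool metadata.
    """
    if not 0 <= present_drives <= total_drives:
        raise ValueError("present_drives must be between 0 and total_drives")
    missing = total_drives - present_drives
    return missing <= parity


if __name__ == "__main__":
    # The pool in this thread: 6 drives in RAID-Z2 (parity = 2).
    for present in range(6, 0, -1):
        enough = raidz_can_assemble(6, 2, present)
        status = "enough drives" if enough else "too few drives"
        print(f"{present} of 6 present: {status}")
```

Passing this count check only means an import can be attempted; it says nothing about whether the import will succeed, which is exactly the situation described above.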
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
I'm unfortunately out of options. It would be nice to know why this happened, as you said. I would still report this as a bug, give all the details you can, and point to this thread. It sounds like your hardware is fine, and if I were in your shoes I'd just rebuild the pool and restore the backup. You have spent a lot of time on it already. I'd mark the USB flash drive you were using when the failure occurred as possibly bad, so you don't use it in FreeNAS again.
 

Robert Trevellyan

Pony Wrangler
Joined
May 16, 2014
Messages
3,778
the crash will not occur if I have less than 3 (of 6) drives connected
As a last-ditch effort, have you tried all possible combinations of 4 drives? A RAIDZ2 pool with 2 drives missing would have no redundancy, but should give you an opportunity to update your backup.
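
For reference, a 6-drive pool has 15 distinct 4-drive subsets, so three attempts cover only some of them. A small Python sketch to print a checklist of all of them is below. The device names are the ones from the smartctl listings earlier in the thread, and `drive_subsets` is just an illustrative helper name. One assumption to flag: the adaN numbering can shift when drives are unplugged, so in practice it is safer to track the physical drives by serial number.

```python
from itertools import combinations
from math import comb

# Device names as listed in the smartctl output earlier in the thread.
# Assumption: each label stands for one fixed physical drive; adaN numbering
# can shift when drives are unplugged, so serial numbers are safer in practice.
DRIVES = ["ada0", "ada1", "ada2", "ada3", "ada4", "ada5"]


def drive_subsets(drives, keep):
    """Return every way to keep `keep` drives connected, as a list of tuples."""
    return list(combinations(drives, keep))


if __name__ == "__main__":
    subsets = drive_subsets(DRIVES, 4)
    # Sanity check: the list length must equal "6 choose 4".
    assert len(subsets) == comb(len(DRIVES), 4)
    for number, connected in enumerate(subsets, start=1):
        unplugged = [d for d in DRIVES if d not in connected]
        print(f"{number:2d}. connect {', '.join(connected)}; unplug {', '.join(unplugged)}")
```

Ticking off each line as it is tested makes it easy to confirm that no combination was skipped.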
 

Kamikaze321

Dabbler
Joined
Nov 30, 2015
Messages
11
As a last-ditch effort, have you tried all possible combinations of 4 drives? A RAIDZ2

Ya I've tried every combination of 4 drives, no luck. I'm just going to create a new pool and start the restore later tonight.

Thanks all for your help and suggestions
 