Unable to boot to GUI to Offline disk

Status
Not open for further replies.

TeamDom

Dabbler
Joined
Mar 16, 2016
Messages
11
Hi Guys,

So whilst I was out of town and unable to connect to my server, one of the disks (4x3TB RAIDz1 - yes CyberJock I know, just read the whole thing before you start typing your 'look at my signature, dumbass' reply :D ) started failing. I got an email reporting a few bad sectors. All I could do was shut it down in the hope of minimizing damage until I got home.

Now that I'm home, however, I cannot boot to the GUI to start the disk replacement, which leaves me stuck with regard to my pool. I think the disk has failed fairly seriously and pretty quickly.

At boot I get stuck in a loop of:

Code:
CAM status: ATA Status Error
ATA Status: 51 (DRDY SERV ERR), error: 40 (UNC )
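
For anyone reading along later: the two hex values in that message are bit fields. Below is a minimal Python sketch of decoding them, assuming the standard ATA status/error register layout and the abbreviations FreeBSD prints. The table and function names are my own, not anything from FreeNAS.

```python
# Bit names for the ATA status and error registers, using the
# abbreviations that appear in the boot messages above.
STATUS_BITS = {
    0x80: "BSY", 0x40: "DRDY", 0x20: "DF", 0x10: "SERV",
    0x08: "DRQ", 0x04: "CORR", 0x02: "IDX", 0x01: "ERR",
}
ERROR_BITS = {
    0x80: "ICRC", 0x40: "UNC", 0x20: "MC", 0x10: "IDNF",
    0x08: "MCR", 0x04: "ABRT", 0x02: "NM", 0x01: "ILI",
}


def decode(hex_value, table):
    """Return the names of the bits set in a hex register value."""
    value = int(hex_value, 16)
    return [name for mask, name in table.items() if value & mask]


print(decode("51", STATUS_BITS))  # ['DRDY', 'SERV', 'ERR']
print(decode("40", ERROR_BITS))   # ['UNC']
```

UNC is the uncorrectable-data bit, i.e. the drive could not read a sector, which fits the bad-sector email.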


Now, if I remove the troubled disk I can get to the GUI, although obviously in that case my pool is gone.

Has anyone ever had this/seen this? I've got a lingering hope that if I can boot to GUI I can offline the disk and replace it, but I'm pretty sure in reality I've lost my data.

If that's the case so be it I suppose. I keep a cold spare, a keen eye on performance and am fine with the risk but I was caught out at a time I couldn't respond. These things happen.

I am using a Gen8 Microserver with 16GB RAM. FreeNAS 9.10, up to date (updates applied within the last fortnight), boots from a USB stick.

Anything I can try when I get home tonight would be appreciated - thanks for reading.
 

Artion

Patron
Joined
Feb 12, 2016
Messages
331
one of the disks (4x3TB RAIDz1 -... started failing.
if I remove the troubled disk I can get to the GUI, although obviously in that case my pool is gone.
Well, AFAIK, pulling a disk out of a RAIDZ1 pool isn't so catastrophic (depending on how big your disks are and the amount of data on them). The pool will keep working in a degraded state, as a RAIDZ1 can sustain losing 1 drive. So you can attach a new drive where the bad one was and replace it from the GUI to start the resilvering process.
 

Inxsible

Guru
Joined
Aug 14, 2017
Messages
1,123
Have you tried booting back to an older version (prior to the update) instead of using the updated version?

Some have complained about updating to 11.0-U3 and not being able to get into the WebUI. Maybe you are in a similar situation with 9.10.
 

TeamDom

Dabbler
Joined
Mar 16, 2016
Messages
11
Well, AFAIK, pulling a disk out of a RAIDZ1 pool isn't so catastrophic (depending on how big your disks are and the amount of data on them). The pool will keep working in a degraded state, as a RAIDZ1 can sustain losing 1 drive. So you can attach a new drive where the bad one was and replace it from the GUI to start the resilvering process.

When I pull the disk and thus can get into the GUI my volume shows as UNKNOWN.

Have you tried booting back to an older version (prior to the update) instead of using the updated version?

Some have complained about updating to 11.0-U3 and not being able to get into the WebUI. Maybe you are in a similar situation with 9.10.

The updates were applied a couple of weeks ago and everything had been working just fine. I guess I probably hadn't done a reboot in that time, but I'm 99% sure this is due to a pretty catastrophic HD failure. That being said, I am going to try a new boot device; it's worth a shot and takes little time.
 

Artion

Patron
Joined
Feb 12, 2016
Messages
331
When I pull the disk and thus can get into the GUI my volume shows as UNKNOWN.
Have you tried "Import Disk" or "Import Volume" from the GUI?

Can you please post the output of zpool status command?
 

TeamDom

Dabbler
Joined
Mar 16, 2016
Messages
11
I've not tried to Import the new disk. I was wary of trying this without offlining the faulty disk.

zpool status only shows:

Code:

[root@domsnas] ~# zpool status
  pool: freenas-boot
 state: ONLINE
  scan: scrub repaired 0 in 0h1m with 0 errors on Fri Aug 25 03:46:34 2017
config:

        NAME                                          STATE     READ WRITE CKSUM
        freenas-boot                                  ONLINE       0     0     0
          gptid/90eedf7a-c602-11e5-b973-d0bf9c460324  ONLINE       0     0     0

errors: No known data errors



Which is obviously not my actual pool.
 

TeamDom

Dabbler
Joined
Mar 16, 2016
Messages
11
Managed to get booted up with all disks present...

Code:

[root@domsnas] ~# zpool status
  pool: Data
 state: ONLINE
  scan: scrub repaired 0 in 6h31m with 0 errors on Tue Sep  5 23:22:55 2017
config:

        NAME                                          STATE     READ WRITE CKSUM
        Data                                          ONLINE       0     0     0
          gptid/3b599aaf-c87c-11e6-87bf-d0bf9c460324  ONLINE       0     0     0
          gptid/3c202095-c87c-11e6-87bf-d0bf9c460324  ONLINE       0     0     0
          gptid/3d4c5eac-c87c-11e6-87bf-d0bf9c460324  ONLINE       0     0     0

errors: No known data errors

  pool: freenas-boot
 state: ONLINE
  scan: scrub repaired 0 in 0h1m with 0 errors on Fri Aug 25 03:46:34 2017
config:

        NAME                                          STATE     READ WRITE CKSUM
        freenas-boot                                  ONLINE       0     0     0
          gptid/90eedf7a-c602-11e5-b973-d0bf9c460324  ONLINE       0     0     0

errors: No known data errors



Looks like the failed disk is missing from the pool?
 

Inxsible

Guru
Joined
Aug 14, 2017
Messages
1,123
Looks like the failed disk is missing from the pool?
Then why is your pool not in a DEGRADED state? Did you only have 1 pool named Data? Did you have any other pools that the disk might have belonged to?

You sure you had 4 disks in your pool and not just 3 with 1 being a spare or something ;)
 

TeamDom

Dabbler
Joined
Mar 16, 2016
Messages
11
One pool, named Data. Set up RAIDz1 and keep a cold spare in case anything starts going wrong. Or so I thought.

It actually looks to me like you are right: it's not set up how I thought it was, which is an absolute nightmare. I can't offline the drive as there are no valid replicas.
 

Inxsible

Guru
Joined
Aug 14, 2017
Messages
1,123
One pool, named Data. Set up RAIDz1 and keep a cold spare in case anything starts going wrong. Or so I thought.

It actually looks to me like you are right: it's not set up how I thought it was, which is an absolute nightmare. I can't offline the drive as there are no valid replicas.
Well the good thing is that everything is ONLINE now. And you have backups. Get the data back from there. You do have backups right? If not, fer shame !!!!!:p

Back up all the data NOW!

Then maybe recreate the pool if you are so inclined. Maybe this time, listen to @cyberjock and don't use RAIDZ1 ;)
 

TeamDom

Dabbler
Joined
Mar 16, 2016
Messages
11
I'm trying, believe me! Running around trying to find any old hard drive I can!

Turns out I didn't use z1, god knows what I did use.
 

Inxsible

Guru
Joined
Aug 14, 2017
Messages
1,123
I'm trying, believe me! Running around trying to find any old hard drive I can!

Turns out I didn't use z1, god knows what I did use.
Gahhh !!!:confused: It pains me to not know !!:mad:

j/k
 

TeamDom

Dabbler
Joined
Mar 16, 2016
Messages
11
Thanks for your help. It looks like it's used some sort of stripe with three disks and one spare?

I was sure I set it to z1 - I remember thinking "If this ever goes wrong I can't ask on the forum" :D
 

Inxsible

Guru
Joined
Aug 14, 2017
Messages
1,123
Thanks for your help. It looks like it's used some sort of stripe with three disks and one spare?

I was sure I set it to z1 - I remember thinking "If this ever goes wrong I can't ask on the forum" :D
Well you did and you lived through it.

Now don't use RAIDZ1 :mad:, g0ddamnit !!!!
 

Inxsible

Guru
Joined
Aug 14, 2017
Messages
1,123
I was also wondering why your username does not have a title underneath. Based on your post count, there should be a Newbie underneath your username. Is there a setting in the profile to disable this?
 

droeders

Contributor
Joined
Mar 21, 2016
Messages
179
It looks like your Data pool consists of a 3 disk stripe (i.e. RAID0). If one disk fails, you lose your entire pool. This is far more dangerous than RAIDz1.

I would recommend a full backup, then re-create your pool - as RAIDz1, RAIDz2, or striped mirrors (i.e. something with redundancy).
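
In case it helps to see how the stripe shows up in the output: in `zpool status`, redundant pools list a `raidz1-0`, `raidz2-0` or `mirror-0` line directly under the pool name, with the disks indented below that. Here the gptid entries sit directly under `Data`. A rough Python sketch of that check follows; the function names are mine, and it only understands the plain layout shown in this thread.

```python
# zpool status text for the Data pool, as posted earlier in the thread.
STATUS = """\
  pool: Data
 state: ONLINE
config:

        NAME                                          STATE     READ WRITE CKSUM
        Data                                          ONLINE       0     0     0
          gptid/3b599aaf-c87c-11e6-87bf-d0bf9c460324  ONLINE       0     0     0
          gptid/3c202095-c87c-11e6-87bf-d0bf9c460324  ONLINE       0     0     0
          gptid/3d4c5eac-c87c-11e6-87bf-d0bf9c460324  ONLINE       0     0     0

errors: No known data errors
"""


def top_level_vdevs(status_text, pool):
    """Return the names listed one indent level (2 chars) below the pool."""
    vdevs = []
    pool_indent = None
    for line in status_text.splitlines():
        stripped = line.strip()
        if not stripped:
            if pool_indent is not None:
                break  # a blank line ends the config table
            continue
        name = stripped.split()[0]
        indent = len(line) - len(line.lstrip())
        if pool_indent is None:
            if name == pool:
                pool_indent = indent
        elif indent <= pool_indent:
            break  # reached another section such as spares or logs
        elif indent == pool_indent + 2:
            vdevs.append(name)
    return vdevs


def unprotected_disks(vdevs):
    """Top-level entries that are plain disks, not raidz or mirror vdevs."""
    return [v for v in vdevs if not v.startswith(("raidz", "mirror"))]


vdevs = top_level_vdevs(STATUS, "Data")
print(len(unprotected_disks(vdevs)), "disk(s) with no redundancy")
```

Any name that comes back from `unprotected_disks` is a single point of failure for the whole pool.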
 

TeamDom

Dabbler
Joined
Mar 16, 2016
Messages
11
I was also wondering why your username does not have a title underneath. Based on your post count, there should be a Newbie underneath your username. Is there a setting in the profile to disable this?

No idea. I signed up a long time ago - maybe it's a time-based thing?

It looks like your Data pool consists of a 3 disk stripe (i.e. RAID0). If one disk fails, you lose your entire pool. This is far more dangerous than RAIDz1.

I would recommend a full backup, then re-create your pool - as RAIDz1, RAIDz2, or striped mirrors (i.e. something with redundancy).

Yeah I definitely don't want to be RAID0.

Since I have one disk "spare" as it were, can I replace the failing drive with this one before I do my backing up? I'm wary of this basically being a ticking time bomb! (It's not so bad, I've got 75% of the stuff elsewhere but it would set my mind at ease a bit.)
 

Inxsible

Guru
Joined
Aug 14, 2017
Messages
1,123
Yeah I definitely don't want to be RAID0.

Since I have one disk "spare" as it were, can I replace the failing drive with this one before I do my backing up? I'm wary of this basically being a ticking time bomb! (It's not so bad, I've got 75% of the stuff elsewhere but it would set my mind at ease a bit.)
But all your drives are ONLINE, so why would you replace any? Just back up the remaining 25% of the data and recreate the pool using 2-way mirrors or RAIDZ2.
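
To put rough numbers on those options, here is a small Python sketch of the raw capacity arithmetic. It assumes equal-size disks and ignores ZFS metadata, padding and free-space headroom, so treat the figures as upper bounds. The function name is made up for illustration.

```python
def layout_options(disks, size_tb):
    """Raw usable capacity (TB) and guaranteed tolerated disk failures."""
    options = {
        "stripe": (disks * size_tb, 0),
        "raidz1": ((disks - 1) * size_tb, 1),
        "raidz2": ((disks - 2) * size_tb, 2),
    }
    if disks % 2 == 0:
        # Striped 2-way mirrors: half the disks hold copies. Any single
        # failure is survivable; a second one only if it hits another pair.
        options["striped 2-way mirrors"] = (disks // 2 * size_tb, 1)
    return options


for name, (usable, failures) in layout_options(4, 3).items():
    print(f"{name}: {usable} TB usable, survives {failures} failed disk(s)")
```

With four 3TB disks, RAIDZ2 and striped mirrors both come out at 6 TB raw, while the current 3-disk stripe is `layout_options(3, 3)["stripe"]`, i.e. 9 TB with no failures tolerated.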
 

TeamDom

Dabbler
Joined
Mar 16, 2016
Messages
11
But all your drives are ONLINE, so why would you replace any? Just back up the remaining 25% of the data and recreate the pool using 2-way mirrors or RAIDZ2.

Just because one is reporting 1,976 unreadable sectors.
 

Inxsible

Guru
Joined
Aug 14, 2017
Messages
1,123
Just because one is reporting 1,976 unreadable sectors.
Where does it say that? Did you run a SMART test? If so, did you run it on the spare or on the drives that are in your zpool? Are you sure it's not your spare which has the disk errors? Double, no triple check the gptid of the disk that is showing failures before removing that drive. Because you have RAID0, if you remove the wrong disk, your entire pool will be lost. Since your system is online now, I would first back up all the important stuff to another machine and then worry about replacing drives.
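
For the gptid check, `glabel status` on FreeBSD lists each gptid label next to the partition it lives on. Below is a hedged Python sketch of turning that listing into a lookup table. The sample text is made up: the gptid values are the ones from the Data pool above, but the `ada*` partition names are hypothetical and will differ on the real box.

```python
import re

# Made-up `glabel status` output. The gptid values come from this thread;
# the ada* partition names are hypothetical placeholders.
SAMPLE = """\
                                      Name  Status  Components
gptid/3b599aaf-c87c-11e6-87bf-d0bf9c460324     N/A  ada0p2
gptid/3c202095-c87c-11e6-87bf-d0bf9c460324     N/A  ada1p2
gptid/3d4c5eac-c87c-11e6-87bf-d0bf9c460324     N/A  ada2p2
"""


def gptid_to_partition(glabel_text):
    """Map each gptid label to the partition named in the last column."""
    mapping = {}
    for line in glabel_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[0].startswith("gptid/"):
            mapping[fields[0]] = fields[-1]
    return mapping


def whole_disk(partition):
    """Strip a trailing GPT partition suffix, e.g. ada1p2 -> ada1."""
    return re.sub(r"p\d+$", "", partition)


labels = gptid_to_partition(SAMPLE)
for gptid, part in labels.items():
    print(f"{gptid} -> {part} (disk {whole_disk(part)})")
```

Once the device name is known, `smartctl -a /dev/<device>` shows that drive's SMART data and serial number, which can be matched against the label on the physical disk before anything is pulled.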
 