Volume state is UNKNOWN

Status
Not open for further replies.

mellman

Dabbler
Joined
Sep 16, 2016
Messages
38
Hi All,

I have what appears to be some data corruption, and I'm trying to make sense of it. Server setup first, then some log data.

Server: Dell R720xd with 2x CPU & 384 GB ECC RAM
Boot: from USB
2 pools -
jails - a single pair of mirrored drives
tank - 2 raidz3 vdevs with 12 drives each
vdev0 - 12x 3 TB disks locally attached to an HBA inside the server (NOT a RAID controller; these are the disks that are showing up)
vdev1 - 12x 2 TB disks attached to a NetApp DS4246 via an X2065A (4-port QSFP HBA) [NOT SHOWING UP]


Yesterday something (power outage? HBA fault?) caused FreeNAS to reboot. During boot, the server paused on one of the HBAs, saying the HBA's configuration had changed. I did a cold boot and it came back up, still showing the same message, but this time it got past it.

Upon booting into FreeNAS, two things occurred:
1) The GUI was corrupted: the main home page displayed HTML errors, and several other pages failed to load at all.
2) Both pools were in state "UNKNOWN".

After another reboot (I swapped the SAS cables to alternate ports on my HBA), the pool "jails" came back online.

tank is still offline. Running zpool import gives the output below. What sticks out to me is that one of my vdevs is not showing up: the disks appear in "View Disks", but when I run geom disk list I only see the first half of the pool's disks.

Code:
root@freenas:/ # zpool import
   pool: tank
	 id: 907423887126384953
  state: FAULTED
 status: The pool metadata is corrupted.
 action: The pool cannot be imported due to damaged devices or data.
		The pool may be active on another system, but can be imported using
		the '-f' flag.
config:

		tank											FAULTED  corrupted data
		  raidz3-0									  ONLINE
			gptid/e6f85169-5136-11e8-b376-bc305bf48148  ONLINE
			gptid/f2bd171b-5136-11e8-b376-bc305bf48148  ONLINE
			gptid/181f4f1a-5137-11e8-b376-bc305bf48148  ONLINE
			gptid/24da255f-5137-11e8-b376-bc305bf48148  ONLINE
			gptid/325a8e49-5137-11e8-b376-bc305bf48148  ONLINE
			gptid/4a4df2c1-5137-11e8-b376-bc305bf48148  ONLINE
			gptid/6a52ee8d-5137-11e8-b376-bc305bf48148  ONLINE
			gptid/7bc6a747-5137-11e8-b376-bc305bf48148  ONLINE
			gptid/8d306d6b-5137-11e8-b376-bc305bf48148  ONLINE
			gptid/a82f0ce2-5137-11e8-b376-bc305bf48148  ONLINE
			gptid/bb5df1ae-5137-11e8-b376-bc305bf48148  ONLINE
			gptid/db7bb774-5137-11e8-b376-bc305bf48148  ONLINE



Questions

1) Why do the disks show up in "View Disks" but not on the command line?
2) If I trust the command line, then something is preventing my second vdev from showing up, correct?
3) Assuming it's just a failed HBA, cable, or IOM on my shelf, I should be able to power down, swap the HBA / cable / IOM, bring it back up, and be happy?
4) Given the errors in my GUI, is it likely that something in my install is also corrupted?
5) Where in the logs can I get better clarity on what might have occurred? I think I neglected proper logging setup, so can you point me to what needs setting up, so that when I recover or rebuild I'll have some sanity?
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
tank - 2 raidz3 vdevs with 12 drives each
That list doesn't show 2 vdevs of 12 drives. It shows one vdev. Why is the list truncated?
Yesterday something (power outage? HBA fault?) caused FreeNAS to reboot.
What version?
1) Why do the disks show up in "View Disks" but not on the command line?
I would postulate that the list in the GUI is the drives that the system thinks should be present, and the list at the command line are the disks that are actually addressable.
vdev1 - 12x 2 TB disks attached to a NetApp DS4246 via an X2065A (4-port QSFP HBA) [NOT SHOWING UP]
This external enclosure may have a hardware fault or the SAS controller that runs it may have failed. I had a SAS controller fail in one of my systems and all the drives on that controller just didn't show up. When the controller faulted (while the system was running) that caused it to reboot. It has been a few years and I can't recall if they were visible in the GUI as you describe, but the pool could not be imported with only half the drives showing up.
Upon booting into FreeNAS, two things occurred.
1) The GUI was corrupted: the main home page displayed HTML errors, and several other pages failed to load at all.
That is strange. If you have a backup of your config, you might want to do a fresh install on a new boot media. The USB drives do fail sometimes and they have some really odd and unpredictable behavior when they do. That is why I don't use them.
3) Assuming it's just a failed HBA, cable, or IOM on my shelf, I should be able to power down, swap the HBA / cable / IOM, bring it back up, and be happy?
That is what I did. I replaced the HBA, booted back into FreeNAS and everything worked again. Do you have spares?
 

mellman

Dabbler
Joined
Sep 16, 2016
Messages
38
That list doesn't show 2 vdevs of 12 drives. It shows one vdev. Why is the list truncated?
Yeah, that's why I was thinking this must be something related to the shelf and not an all-out failure.
What version?
An important detail, sorry for leaving it out: 11.1-U5.

I would postulate that the list in the GUI is the drives that the system thinks should be present, and the list at the command line are the disks that are actually addressable.
Not an expert, but this appears to be the case, although it's different from the behavior I've seen when adding new drives (or popping them in and doing a camcontrol rescan all to get serial numbers mapped to specific slots).

This external enclosure may have a hardware fault or the SAS controller that runs it may have failed. I had a SAS controller fail in one of my systems and all the drives on that controller just didn't show up. When the controller faulted (while the system was running) that caused it to reboot. It has been a few years and I can't recall if they were visible in the GUI as you describe, but the pool could not be imported with only half the drives showing up.
It's unclear whether it's a hardware issue with the card or with the shelf's SAS controller; I swapped to the other IOM and swapped in a different SAS card. Fortunately, it appears my pool is still alive. The boot threw lots of errors (including one saying it could not mount the pool), but after finally booting it's OK now.
That is strange. If you have a backup of your config, you might want to do a fresh install on a new boot media. The USB drives do fail sometimes and they have some really odd and unpredictable behavior when they do. That is why I don't use them.
I've been using USB boot for ESXi and FreeNAS for some time now. This is the first problem I've had, but it's definitely frustrating, since it seems to be combined with an HBA failure/issue. There is definitely corruption on the boot volume: my AD config won't load because DNS appears not to be working, and I can't get to some of the config pages in the GUI. But my data is there. I'm grabbing a final backup of some critical files and will then reload onto a new USB drive.

That is what I did. I replaced the HBA, booted back into FreeNAS and everything worked again. Do you have spares?
For others reading: I was able to effectively swap the HBA (actually, I just connected this shelf to another existing HBA in the system), powered on my box, and it all came back to life.

A few additional questions:

Can I just copy the /data/FreeNAS-v1.db file as my backup and place it on the new USB drive?
Would doing this put my data at any additional risk if there's something wonky in the config?
Any input on log data? I'm considering standing up a syslog server to capture information; I'm curious whether there were any early warning signs that the controller was having issues. I run a weekly scrub on Sundays, and it showed 0 errors.
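On the syslog idea: FreeNAS can forward logs to a remote collector via the "Syslog server" field under System → Advanced (at least on 11.x), and the local log lives in /var/log/messages. For reference, the equivalent hand-written FreeBSD syslog.conf entry is sketched below; 192.0.2.10 is a placeholder collector address, and on FreeNAS the file is regenerated, so the GUI setting is the right place to configure it:

```conf
# Forward all facilities/levels to a remote syslog collector.
# 192.0.2.10 is a placeholder; replace with the collector's address.
*.*    @192.0.2.10
```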

Again, Chris, thanks for the answers.
 

mellman

Dabbler
Joined
Sep 16, 2016
Messages
38
Update...

Installed FreeNAS to a new USB stick, restored the config, and everything seems happy.
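For anyone repeating this: if you copy the config database file mentioned earlier by hand (the GUI's System → General "Save Config" exports essentially the same file), it's worth verifying the copy byte-for-byte before wiping the old stick. A minimal sketch, using a scratch file as a stand-in for the real database path:

```shell
#!/bin/sh
# Scratch file standing in for the config database; on a live system,
# set DB to the real path instead.
DB="$(mktemp)"
printf 'demo config contents' > "$DB"

BACKUP="${DB}.bak"
cp "$DB" "$BACKUP"

# Only call it a backup if the copy matches byte-for-byte.
if cmp -s "$DB" "$BACKUP"; then
    RESULT="backup verified"
else
    RESULT="backup MISMATCH"
fi
echo "$RESULT"   # prints "backup verified"
```

The same cmp check applies in reverse when copying the file back onto the new stick.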
 