SOLVED Host Not Found after replacing a drive

Status
Not open for further replies.

COTVJosh

Dabbler
Joined
Sep 21, 2015
Messages
24
Hello,

First off, I'm relatively new to this. I've been reading the manual and searching the forum, but I cannot find a straight answer for this.

I have a 40-bay Aberdeen storage server (1TB drives) that I inherited from the previous engineer at my facility, and it's running the latest version of FreeNAS (I know because I saw him upgrade it). The system had been working fine, but on Tuesday a hardware alarm alerted me to a failed drive. I logged into the FreeNAS GUI and saw that it was degraded but operational, so I started reading through the manual to see how to replace the drive. Per the instructions, I OFFLINED the bad one, shut down the server, and swapped out the drives. After starting it back up, I could not access the GUI, nor could I ping other machines from inside the SHELL. Because the server is not talking to the network, I also cannot access the files on it.

Error I see when I restart the machine: Storage_Server_1 ntpd-initres[4907]: Host Name Not Found: 1. FreeBSD.pool.ntp.org

I've restarted multiple times.
I know the switch is working because all of my other machines are talking (including another FreeNAS server).
I've configured another NIC with the same static IP address and moved the cable, and now it's not doing anything (it won't even boot to the FreeNAS console).
It has a raidz2-0 config.

Will I destroy the current setup if I try changing the IP address that is currently assigned to it?
Is there any idea as to why I would lose connectivity to the GUI after swapping out a failed drive?
Am I just totally hosed on this?

Thanks
 

pirateghost

Unintelligible Geek
Joined
Feb 29, 2012
Messages
4,219
Hostname not found just indicates that it can't talk to a name server to resolve the NTP server's hostname.

You will not break your FreeNAS by changing the IP.
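
To illustrate the distinction above, here is a minimal Python sketch (not a FreeNAS tool) that separates "names do not resolve" from "nothing is reachable at all". The function names `check_name_resolution` and `classify` are hypothetical, and the `ip_reachable` flag is assumed to come from a manual test such as pinging a known machine by its IP address.

```python
import socket


def check_name_resolution(hostname, resolver=socket.gethostbyname):
    """Try to resolve a hostname; return (ok, detail).

    `resolver` is injectable so the logic can be exercised without a network.
    """
    try:
        return True, resolver(hostname)
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        return False, str(exc)


def classify(ip_reachable, name_resolves):
    """Combine the two observations into a rough diagnosis (illustrative only)."""
    if not ip_reachable:
        # Pinging by IP address fails: DNS errors are only a symptom here.
        return "network problem: cannot reach hosts even by IP address"
    if not name_resolves:
        # IP traffic works, so the link is fine and only name lookups fail.
        return "DNS problem: IP traffic works but names do not resolve"
    return "ok"
```

If pinging by IP address fails, as it does later in this thread, the NTP "Host Name Not Found" message is most likely a side effect of the missing network connection rather than a separate fault.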
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
Replacing a failed drive shouldn't affect the network connection at all, but no, you shouldn't be "just totally hosed". What is the current state of the server? Will it boot at all, or does the boot process die somewhere? If the latter, what error message does it give?
 

COTVJosh

Dabbler
Joined
Sep 21, 2015
Messages
24
I had the same thought, danb35, but it's not loading the GUI at all.

The machine boots up and I get the console view when I switch to it on my KVM, but when I enter the shell to test the network connectivity, it can't ping anything. It also keeps throwing an error stating that it cannot access the WORKGROUP.
 

COTVJosh

Dabbler
Joined
Sep 21, 2015
Messages
24
When I try to ping it from my Mac, I get a "HOST IS DOWN" message and complete packet loss.
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
Well, if it can't ping anything on your network (I assume you've tried pinging by IP, not only by hostname), the GUI will be a non-starter. Your network connection is the problem. The obvious things to check would be cable connections (make sure they're tight, make sure you have a link light on both ends, maybe even try replacing the patch cable), that the NIC is well-seated in its slot, perhaps try a different port on your switch, open up the server and see that everything is still connected properly.

If nothing else, do you have a spare NIC you can install? It's always possible that your NIC died.
 

COTVJosh

Dabbler
Joined
Sep 21, 2015
Messages
24
I've swapped out the patch cable and that did not work either. As for the NIC, the lights are flashing and it appears to be talking. There are 4 NICs in total on the chassis, and I configured the 2nd NIC with the same IP address, netmask, etc. Then I moved the cable over to that one and rebooted, only to have the same problem.
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
Before you started this process (i.e., before the drive failed), was this server configured with a static IP, or via DHCP (with or without an IP reservation on the DHCP server)?

Could you try doing a clean installation onto a fresh USB stick, and see how that does?
 

COTVJosh

Dabbler
Joined
Sep 21, 2015
Messages
24
I have an assigned IP address for it. My 2nd, more stable unit is using DHCP... maybe I should change it.

If I do a fresh install on a new USB stick, won't that wipe out my setup?
 

COTVJosh

Dabbler
Joined
Sep 21, 2015
Messages
24
So it just randomly came back up (after I checked the network configs for the third time and restarted for the 7th or 8th time), BUT 1 of my 2 volumes on it is corrupted and reports: error getting available space.
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
Yes, try changing to DHCP and see what that does--if it picks up an IP, that pretty well proves it can talk to the network.

A fresh install would require you to reconfigure it. My reason for suggesting that right now was just to narrow down the problem--if it works from there, it's either a config problem, or a boot device problem.
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
Back up is good. Not knowing why isn't so good. Problems with one of your pools are less good. SSH to the server and post the output of 'zpool status', inside code tags, so we can see what's going on.
 

COTVJosh

Dabbler
Joined
Sep 21, 2015
Messages
24
When I run zpool status, I see 1 of 2 volumes, and its drives are ONLINE and HEALTHY, but the 2nd is non-existent. Can that be restored from a snapshot?
 

COTVJosh

Dabbler
Joined
Sep 21, 2015
Messages
24
[root@Storage_Server_1 ~]# zpool status
  pool: Storage_Server_1-1
 state: ONLINE
  scan: scrub repaired 0 in 3h27m with 0 errors on Sun Nov 1 02:28:03 2015
config:

        NAME                                            STATE     READ WRITE CKSUM
        Storage_Server_1-1                              ONLINE       0     0     0
          raidz2-0                                      ONLINE       0     0     0
            gptid/b08aab0e-63ab-11e5-822f-002215883973  ONLINE       0     0     0
            gptid/b0ed39f0-63ab-11e5-822f-002215883973  ONLINE       0     0     0
            gptid/b149fd09-63ab-11e5-822f-002215883973  ONLINE       0     0     0
            gptid/b1e13f2b-63ab-11e5-822f-002215883973  ONLINE       0     0     0
            gptid/b23a9766-63ab-11e5-822f-002215883973  ONLINE       0     0     0
            gptid/b2d45d87-63ab-11e5-822f-002215883973  ONLINE       0     0     0
            gptid/b337be77-63ab-11e5-822f-002215883973  ONLINE       0     0     0
            gptid/b3937c41-63ab-11e5-822f-002215883973  ONLINE       0     0     0
            gptid/b3f9c2b1-63ab-11e5-822f-002215883973  ONLINE       0     0     0
            gptid/b45a52ed-63ab-11e5-822f-002215883973  ONLINE       0     0     0
            gptid/b4bf5577-63ab-11e5-822f-002215883973  ONLINE       0     0     0
            gptid/b51d3dc9-63ab-11e5-822f-002215883973  ONLINE       0     0     0
            gptid/b589a036-63ab-11e5-822f-002215883973  ONLINE       0     0     0
            gptid/b5f16df6-63ab-11e5-822f-002215883973  ONLINE       0     0     0
            gptid/b64c3335-63ab-11e5-822f-002215883973  ONLINE       0     0     0

errors: No known data errors

  pool: freenas-boot
 state: ONLINE
  scan: scrub repaired 0 in 0h0m with 0 errors on Tue Dec 1 03:45:57 2015
config:

        NAME                                          STATE     READ WRITE CKSUM
        freenas-boot                                  ONLINE       0     0     0
          gptid/3c4fbe24-4114-11e5-b2f1-002215883973  ONLINE       0     0     0

errors: No known data errors
 

COTVJosh

Dabbler
Joined
Sep 21, 2015
Messages
24
I just emailed my network engineer as well, and he said they had a DNS server failure this morning and just repaired it. Maybe that's why, but I doubt it because all of my other systems were working.

Thank you btw for your help!
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
Unless you replicated the snapshot of the second pool somewhere else, no, you can't restore from it. What does 'zpool import' say?
 

COTVJosh

Dabbler
Joined
Sep 21, 2015
Messages
24
[root@Storage_Server_1 ~]# zpool import
   pool: Storage_Server_1-2
     id: 4126844871692108960
  state: UNAVAIL
 status: One or more devices are missing from the system.
 action: The pool cannot be imported. Attach the missing
         devices and try again.
    see: http://illumos.org/msg/ZFS-8000-3C
 config:

        Storage_Server_1-2                              UNAVAIL  insufficient replicas
          raidz2-0                                      UNAVAIL  insufficient replicas
            881286988809705582                          UNAVAIL  cannot open
            16951986136311385172                        UNAVAIL  cannot open
            13880951671858679305                        UNAVAIL  cannot open
            761683466651723552                          OFFLINE
            15406529968983763880                        UNAVAIL  cannot open
            15309855938180331212                        UNAVAIL  cannot open
            15334817703217746897                        UNAVAIL  cannot open
            gptid/fa63a359-63b0-11e5-beda-002215883973  ONLINE
            gptid/fb3da0c2-63b0-11e5-beda-002215883973  ONLINE
            gptid/fc0da00a-63b0-11e5-beda-002215883973  ONLINE
            gptid/fce69c17-63b0-11e5-beda-002215883973  ONLINE
            gptid/fdbdc471-63b0-11e5-beda-002215883973  ONLINE
            gptid/fe93157c-63b0-11e5-beda-002215883973  ONLINE
            gptid/ff774f1f-63b0-11e5-beda-002215883973  ONLINE
            gptid/00715a4c-63b1-11e5-beda-002215883973  ONLINE
            gptid/014fd0bd-63b1-11e5-beda-002215883973  ONLINE
            gptid/02251362-63b1-11e5-beda-002215883973  ONLINE
            gptid/02fca670-63b1-11e5-beda-002215883973  ONLINE
            gptid/03e2324b-63b1-11e5-beda-002215883973  ONLINE
            gptid/04c51dc0-63b1-11e5-beda-002215883973  ONLINE
            gptid/05ab343f-63b1-11e5-beda-002215883973  ONLINE
            gptid/069f5a83-63b1-11e5-beda-002215883973  ONLINE
            gptid/07884eb6-63b1-11e5-beda-002215883973  ONLINE
[root@Storage_Server_1 ~]#
 

COTVJosh

Dabbler
Joined
Sep 21, 2015
Messages
24
Interesting, so that is the missing volume, or appears to be, but with 7 dead drives? The hardware shows that all drives are operating without error.
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
Yes, you've offlined one disk, and FreeNAS can't see six of the others. It's pretty obvious that this will prevent a RAIDZ2 pool from mounting. So, what do these six drives have in common?
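
As a rough illustration of the point above, the following Python sketch counts the members of a RAIDZ vdev that are not ONLINE and compares that count with the vdev's parity level (2 for RAIDZ2). The function names are hypothetical, the parsing assumes a single RAIDZ vdev in the whitespace-separated layout shown in this thread, and the sample is abbreviated from the `zpool import` output above (only two of the sixteen ONLINE members are included).

```python
# Abbreviated from the `zpool import` output earlier in this thread.
SAMPLE = """\
Storage_Server_1-2 UNAVAIL insufficient replicas
raidz2-0 UNAVAIL insufficient replicas
881286988809705582 UNAVAIL cannot open
16951986136311385172 UNAVAIL cannot open
13880951671858679305 UNAVAIL cannot open
761683466651723552 OFFLINE
15406529968983763880 UNAVAIL cannot open
15309855938180331212 UNAVAIL cannot open
15334817703217746897 UNAVAIL cannot open
gptid/fa63a359-63b0-11e5-beda-002215883973 ONLINE
gptid/fb3da0c2-63b0-11e5-beda-002215883973 ONLINE
"""


def summarize_raidz(config_text):
    """Return (parity, members, missing) for a single RAIDZ vdev.

    Assumes the vdev line is named like "raidz2-0" and that every
    following non-empty line is one member of that vdev.
    """
    parity = None
    members = 0
    missing = 0
    for line in config_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        name, state = fields[0], fields[1]
        if name.startswith("raidz"):
            parity = int(name[len("raidz")])  # "raidz2-0" -> 2
            continue
        if parity is None:
            continue  # pool-level line that precedes the vdev line
        members += 1
        if state != "ONLINE":
            missing += 1
    return parity, members, missing


def can_import(parity, missing):
    """A RAIDZ vdev survives at most `parity` missing members."""
    return missing <= parity


parity, members, missing = summarize_raidz(SAMPLE)
print(f"raidz{parity}: {missing} member(s) not ONLINE, importable: {can_import(parity, missing)}")
```

With one disk OFFLINE and six UNAVAIL, the vdev is far beyond what two parity disks can cover, which matches the "insufficient replicas" state ZFS reports. The sketch says nothing about why the six disks are not visible.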
 

COTVJosh

Dabbler
Joined
Sep 21, 2015
Messages
24
Hmmm, is that a rhetorical question? Hahaha. I wonder what would cause it to not see the 6 others.
 