SERIOUS HELP- iSCSI Target Not Connecting

Status
Not open for further replies.
Joined
Feb 12, 2016
Messages
12
Hi everyone. We have a serious outage that is affecting over 30 nursing homes and various businesses. Without getting into detail, several virtual machines are responsible for their door access and credentials (offsite), and without them there is no management. If we have to pay someone to get this system back up, we will.

History: In March of 2016 we fired up a Dell R710 and installed FreeNAS 9.3 STABLE (on a SanDisk 16GB thumb drive) as a storage/LUN unit using ZFS (recommended by threads and boards). We have 6 x 1TB enterprise WD HDDs. Four virtual machines run on this system: 1 x domain controller, 1 x file server, 1 x tech console, and 1 x access control database manager. The host: a Dell R410 with 32GB RAM and 2 x Xeon E-class processors.

The system above has been running for over two years WITHOUT ANY ISSUE WHATSOEVER, the most stable environment yet. We never had to worry. Power outages? If an outage lasted longer than the batteries could accommodate, there was nothing a simple reboot couldn't fix, and a simple rescan on the ESXi host would always reconnect the target and the VMs would start right up.

Yesterday (12-17-2018) a new rack arrived, solely to accommodate the system above. We shut down all virtual machines safely, shut down the ESXi host, then logged into the FreeNAS box and performed a shutdown. We moved all the equipment into the rack (switches, power, etc.), and when we powered everything back on, the ESXi host could not locate the iSCSI target. We performed 3-7 reboots of FreeNAS, about 5 reboots of the host, rescanned for disks/LUNs, and even went as far as removing the iSCSI adapter from the host and reconnecting it. It won't connect.

When we log into FreeNAS, we do see Block (iSCSI) and our configuration for data1, target1, and extent1; the structure is there exactly as we set it up. Is this a mounting issue? Is this a bug? Can someone assist? This is a site-wide disaster.

We just can't determine whether our data and machines are still there. After the 8th reboot of the ESXi host, we right-clicked on the host and performed a Rescan for Datastores, then clicked on "click here to create a datastore", and now the FreeBSD target is available but it wants us to rename and format it. Someone please help. This is serious; we have been troubleshooting since last night and won't take extreme measures because we don't want to wipe any data that may still be on those disks.
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080

Is the FreeNAS system reporting any health issue?
 
Joined
Feb 12, 2016
Messages
12
Hi Chris

These are the only two errors we see at sign-in:


WARNING: smartd is not running. (<< this error has always been here since the beginning)
CRITICAL: The volume data1 (ZFS) state is UNKNOWN: (<< This error is NEW as of yesterday)
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
WARNING: smartd is not running. (<< this error has always been here since the beginning)
That should not have been ignored. If you had smartd running, it might have been able to tell you that a drive was beginning to fail before it completely died. Which brings me to,
CRITICAL: The volume data1 (ZFS) state is UNKNOWN: (<< This error is NEW as of yesterday)
This probably indicates that some number of drives are offline, taking the storage pool offline with them.
What is the layout of your storage pool and the details of your hardware configuration?
In March of 2016 we fired up a Dell R710 and installed FreeNAS 9.3 STABLE
That is not enough detail. To troubleshoot this, we need to know exactly how the pool was configured.
 
Joined
Feb 12, 2016
Messages
12
I agree with you. But there have been no issues since day 1. Also, in the Services area, iSCSI is turned on and S.M.A.R.T. is in fact turned on, unless it's something else it's warning me about.
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
I agree with you. But there have been no issues since day 1.
No. You are under a misunderstanding that you could have cleared up by asking on the forum. You have had a fault since day one that you never bothered to investigate or attempt to clear. Just because it is yellow instead of red and the system appears to work does not mean it is fully functional. The system can still work, under some circumstances, when it has a red "Critical" fault. Would you have ignored that too?
in the Services area, iSCSI is turned on
I didn't ask about that. It isn't relevant as long as your storage pool is in this state:
CRITICAL: The volume data1 (ZFS) state is UNKNOWN: (<< This error is NEW as of yesterday)
Or did you forget about that? Your iSCSI data is stored somewhere, I will let you guess where.
We have 6 x 1TB enterprise WD HDD's.
Those drives you mentioned, well they should have (theoretically) been mapped by ZFS (the file system of FreeNAS) into a storage pool that you appear to have named "data1". That storage pool is offline. Which brings us back to:
and S.M.A.R.T. is in fact turned on, unless it's something else it's warning me about
The smartd daemon (the one that isn't running) is the thing that actually checks S.M.A.R.T. status, so you have not had any S.M.A.R.T. checks running the entire time the server has been up. It is very likely that in that time there have been hard drive faults you were never made aware of. Thus, it is likely that enough drives have failed that your storage pool is completely lost.
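Even with smartd stopped, each drive's S.M.A.R.T. data can still be read on demand with smartctl (part of smartmontools, which FreeNAS ships). A minimal sketch; the device names below are assumptions, not from the thread, and these commands only run on the NAS itself:

```shell
# da0..da5 are placeholder device names; check your own with `camcontrol devlist`.
#
#   smartctl -H /dev/da0    # one-line overall health verdict
#   smartctl -a /dev/da0    # full report: reallocated sectors,
#                           # pending sectors, drive error log, etc.
#
# Repeating this for each of the six drives would show whether any of
# them are reporting faults, which smartd would normally have flagged.
```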
Do you have a backup?

Are you aware of how to open an SSH session to your FreeNAS? Or, is this even configured?
If you can, run the command zpool status -v at the command prompt and share the results with us; it might shed some light on the situation.
It should (one would hope) look something like this:

Code:
[root@freenas ~]# zpool status -v
  pool: NAS
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://illumos.org/msg/ZFS-8000-9P
  scan: scrub repaired 788K in 1h25m with 0 errors on Sat Jul  9 01:37:34 2016
config:

        NAME                                            STATE     READ WRITE CKSUM
        NAS                                             DEGRADED     0     0     0
          raidz2-0                                      DEGRADED     0     0     0
            gptid/da4b033c-0ce0-11e6-b23c-d050997eebf5  ONLINE       0     0     0
            gptid/db5b2629-0ce0-11e6-b23c-d050997eebf5  ONLINE       0     0     0
            gptid/dc7f5445-0ce0-11e6-b23c-d050997eebf5  DEGRADED     0     0   196  too many errors
            gptid/dd99642d-0ce0-11e6-b23c-d050997eebf5  ONLINE       0     0     0
            gptid/deb99316-0ce0-11e6-b23c-d050997eebf5  ONLINE       0     0     0
            gptid/dfe5d181-0ce0-11e6-b23c-d050997eebf5  ONLINE       0     0     0

errors: No known data errors

  pool: freenas-boot
 state: ONLINE
  scan: scrub repaired 0 in 0h1m with 0 errors on Sat Jul  2 03:46:27 2016
config:

        NAME          STATE     READ WRITE CKSUM
        freenas-boot  ONLINE       0     0     0
          da0p2       ONLINE       0     0     0

errors: No known data errors
 
Joined
Feb 12, 2016
Messages
12
I'm not trying to muscle you around; I'm just asking questions because we are very unfamiliar with troubleshooting FreeNAS and we desperately need help. Yes, we use SSH to interface with ESXi, but we never have with FreeNAS. When we log in, we use the shell - that's all we know. Should I run the zpool status -v command in SSH or the shell? If I use SSH, what port do I use?
 
Joined
Feb 12, 2016
Messages
12
I used the FreeNAS Shell

[root@freenas ~]# zpool status -v
  pool: freenas-boot
 state: ONLINE
  scan: scrub repaired 0 in 0h0m with 0 errors on Wed Nov 28 03:45:38 2018
config:

        NAME          STATE     READ WRITE CKSUM
        freenas-boot  ONLINE       0     0     0
          da0p2       ONLINE       0     0     0

errors: No known data errors
[root@freenas ~]#
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
I used the FreeNAS Shell
The shell in the GUI is incredibly limited because you can't scroll back to see what went off the screen. If you had a pool, which you don't, you would not have been able to capture the data through the shell window.
It would be a good idea for you to set up the SSH service so you can use something like Cygwin or PuTTY to SSH in to a terminal window.
https://www.ixsystems.com/documentation/freenas/9.3/freenas_services.html#ssh

What this is showing is very bad. Either the disk controller that runs the drives has failed and none of the drives are showing up, or you used a hardware RAID controller to configure the drives and the hardware RAID has failed because of too many failed drives.
Not looking good.
Let's look at the GUI.
If you go to "View Volumes" in the GUI, here is the manual page if you need it:
https://www.ixsystems.com/documentation/freenas/9.3/freenas_storage.html#view-volumes
and click on the button at the bottom:

1545155913990.png


This should show you a list of the disks that make the volume. Please share that with us.
It should look something like this, but I am guessing that it does not:

1545156184869.png
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
I'm not trying to muscle you around; I'm just asking questions because we are very unfamiliar with troubleshooting FreeNAS and we desperately need help. Yes, we use SSH to interface with ESXi, but we never have with FreeNAS. When we log in, we use the shell - that's all we know. Should I run the zpool status -v command in SSH or the shell? If I use SSH, what port do I use?
It is a pretty standard setup. Almost as easy as just turning the service on.
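To answer the port question directly: FreeNAS's SSH service listens on the standard TCP port 22 unless you changed it in the service settings. A minimal sketch; the IP address below is a placeholder, not from the thread:

```shell
# From any machine with an SSH client (PuTTY on Windows works too):
#
#   ssh root@192.168.1.50    # port 22 is the default, so no -p needed
#
# Then, at the FreeNAS prompt:
#
#   zpool status -v
#
# Unlike the GUI shell, a real terminal lets you scroll back and copy
# the full output.
```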
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
Did you get this resolved successfully?
 
Joined
Feb 12, 2016
Messages
12
No, I didn't get this resolved. I'm now reaching out to other IT companies/friends of mine to see if they know FreeNAS, because even though we ARE IT, we don't work with FreeNAS too often. This was a one-time setup for us; we built it, it worked, and we got hit with so much work that we never took the time to embrace FreeNAS and fully understand it. Three years later, we didn't think that a simple restart and shutdown would cause such a malfunction.

I don't know how all my settings are gone.

Yes, this was set up as a pool: 6 disks, ZFS. I set up the data pool as data1, the target as target1, and the extent as extent1. Trust me, it was set up reasonably enough to work as long as it did. We're in serious trouble because sensitive machines are on this.

I would not mind you jumping on my system remotely to take a look. I just need to know if my data is there. Is there a way to extract or look at the VM files?
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
Yes, this was set up as a pool: 6 disks, ZFS. I set up the data pool as data1, the target as target1, and the extent as extent1.
Did you look where I asked you to look? I would like to see what the results are.
There is a possibility that the SAS controller failed. I have had that happen before and all the disks on the controller simultaneously disappeared. Replacing the controller brought all the disks back along with the data. We were working on finding answers and you stopped answering.
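A quick way to test the controller theory from the shell is to ask the operating system which disks it can actually see. A sketch only; these commands run on the NAS itself, and the expected device count is based on the hardware described in the thread:

```shell
# If the SAS controller (or its cabling) has failed, the disks simply
# will not appear to the OS at all. On FreeBSD/FreeNAS:
#
#   camcontrol devlist    # every CAM-attached device; the 6 data disks
#                         # plus the boot USB stick should be listed
#   geom disk list        # per-disk details (size, serial number)
#
# If fewer disks show up here than are physically installed, the
# problem is below ZFS: controller, backplane, cabling, or dead drives.
```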
 
Joined
Feb 12, 2016
Messages
12
Yeah, I'm sorry. This has made me very tired, and we also have a huge customer base calling in at the same time, so I had to walk away to attend to other service requests. I will look over your notes and repost. Thank you.
 
Joined
Feb 12, 2016
Messages
12
I ran zpool status -v within the shell of FreeNAS, and the attached image is what it returned. Then I ran zpool import; all my disks show as online, but when I looked up the error message, it indicated that a device is missing. In the case of FreeNAS, what is a "device"? I can think of 1,000 different things a device could be.
 

Attachments

  • zpool-import.PNG (32.1 KB)
  • zpool-status-v.PNG (21.1 KB)

Mlovelace

Guru
Joined
Aug 19, 2014
Messages
1,111
Well, based on your zpool-import picture, your data1 pool is raidz1 with two unavailable drives. Raidz1 can survive the loss of only one drive, so if both drives are in fact dead, your data1 pool is gone. Hopefully you have backups of that data.
 
Joined
Feb 12, 2016
Messages
12
So that's interesting, Mlovelace: all my hard drives are running with green lights on the physical storage unit. There are only 6, and I count 6 "ONLINE" in the shell - how can you declare that 2 are dead, by this:


13508958357321515966 UNAVAIL cannot open
3082284579865464576 UNAVAIL cannot open
 

Mlovelace

Guru
Joined
Aug 19, 2014
Messages
1,111
The lights are on, but no one is home. The drives are unavailable to the pool, and I didn't say the drives were dead; I said if both drives are in fact dead. You need to see why the drives are unavailable.
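To pin down which physical units those two numbers refer to: when ZFS cannot open a member disk, zpool import prints the member's internal GUID where a device name would normally appear. A sketch, not from the thread; the saved output below is a reconstruction based on the attached screenshot (only the two GUIDs are taken from the post), used here just to show how to pick the missing members out:

```shell
# Reconstructed/abridged `zpool import` output (an assumption; layout
# and wording are illustrative, the GUIDs are the ones from the post):
cat > /tmp/zpool-import.txt <<'EOF'
   pool: data1
  state: UNAVAIL
 config:
        data1                     UNAVAIL  insufficient replicas
          raidz1-0                UNAVAIL  insufficient replicas
            13508958357321515966  UNAVAIL  cannot open
            3082284579865464576   UNAVAIL  cannot open
EOF

# A missing member shows a numeric GUID instead of a device name;
# list just those GUIDs:
awk '$2 == "UNAVAIL" && $1 ~ /^[0-9]+$/ {print $1}' /tmp/zpool-import.txt
# prints:
# 13508958357321515966
# 3082284579865464576

# Next step on the real box (not runnable here): compare against the
# disks the OS sees (camcontrol devlist) and check the health of each
# one that is present, e.g. smartctl -a /dev/da2
```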
 