SOLVED Disks Unavailable, Pool Won't Import

Status
Not open for further replies.

nihil

Cadet
Joined
Jan 7, 2014
Messages
6
For reasons below, we might also add "user has brain problems", but hopefully you will forgive my panicked mistake/not make too much fun of me. Also, I have seen that 4GB is too little RAM for ZFS, so I shall be upgrading that as soon as I am able.

The issue is very similar to this thread: Pool won't import, disks are now missing - except that I have no multipath issues reported and insofar as I can tell multipath has not been enabled.

Also, it's pretty long and I tend not to explain things very well, so thank you to anyone who reads all the way through.

The issues are:
  • The pool won't import
  • Two disks are visible to the system at the device level (camcontrol, dmesg) but not at the partition level (gpart, glabel)
System Setup:
  • Core i3
  • 4GB RAM
  • FreeNAS 9.1 on a USB Stick
  • 6 2TB HDDs
  • RAIDZ with a single disk of redundancy
Storage Configuration:
All drives are SAMSUNG. By elimination, one of the two drives not showing up is attached to each controller (ada1 to the Adaptec controller, ada4 to the ASRock mainboard).
  • 2 Disks attached to the ASRock Motherboard, in AHCI mode
  • 4 Disks attached to an Adaptec 1430SA
The Cause:

I am about 90% certain that two things preceded this:
  1. A too short network cable that caused the box to jolt when it was being slightly moved (a silly idea when it's turned on, I know).
  2. Heat - it was very hot, and the CPU fan had been blocked by a cable end that had fallen into its housing. The CPU was thus cooling ambiently/passively and indirectly heating the HDDs, idling at around 90C!
What I See in BIOS/POST:
Initially I saw the following:
The 2 drives attached to the mainboard didn't really show up. However, the Adaptec POST would show its 4 drives as JBOD disks, configured in Legacy mode, except for the drive attached to port 1 (ada1), which showed up as non-configured.
Here's where I did something foolish. In the Adaptec configuration tool I highlighted and hit configure on ada1 - so it now shows up as a Simple Volume and has its status shown as OPTIMAL rather than Legacy or non-configured. I am guessing that this drive is basically dead and gone in terms of its data, so all of my hope rests on recovering ada4 and then forcing a replacement on ada1 and resilvering. I am also fairly sure that this might not be possible.
Diagnostics/Commands/Status
Code:
# uname -a
FreeBSD carbon.dis 9.1-STABLE FreeBSD 9.1-STABLE #0 r+16f6355: Tue Aug 27 00:38:40 PDT 2013    root@build.ixsystems.com:/tank/home/jkh/src/freenas/os-base/amd64/tank/home/jkh/src/freenas/FreeBSD/src/sys/FREENAS.amd64  amd64

Code:
# zpool import
  pool: Omega
    id: 15715021659324379333
  state: UNAVAIL
status: One or more devices are missing from the system.
action: The pool cannot be imported. Attach the missing
        devices and try again.
  see: http://illumos.org/msg/ZFS-8000-3C
config:
 
        Omega                    UNAVAIL  insufficient replicas
          raidz1-0              UNAVAIL  insufficient replicas
            gpt/ada0            ONLINE
            9434423201809139189  UNAVAIL  cannot open
            gpt/ada2            ONLINE
            gpt/ada3            ONLINE
            2943002579415553192  UNAVAIL  cannot open
            gpt/ada5            ONLINE
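
To spell out why the listing above says "insufficient replicas": a raidz1 vdev can lose one member, and here two are UNAVAIL. A minimal Python sketch, assuming the config section has been pasted into a string and that the pool has a simple layout where every line after a raidz vdev is one of its members. The names `unavailable_members` and `RAIDZ_PARITY` are made up for this illustration and are not part of any ZFS tool.

```python
import re

# Config section copied from the `zpool import` output above.
ZPOOL_CONFIG = """\
        Omega                    UNAVAIL  insufficient replicas
          raidz1-0              UNAVAIL  insufficient replicas
            gpt/ada0            ONLINE
            9434423201809139189  UNAVAIL  cannot open
            gpt/ada2            ONLINE
            gpt/ada3            ONLINE
            2943002579415553192  UNAVAIL  cannot open
            gpt/ada5            ONLINE
"""

# Number of member disks each raidz level can lose.
RAIDZ_PARITY = {"raidz1": 1, "raidz2": 2, "raidz3": 3}


def unavailable_members(config: str) -> dict:
    """Map each raidz vdev name to (missing members, total members).

    Assumption: lines following a raidz vdev are its members until the
    next raidz vdev appears (no nested or mixed layouts).
    """
    result = {}
    current = None
    for line in config.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        name, state = fields[0], fields[1]
        if re.match(r"raidz[123]-\d+$", name):
            current = name
            result[current] = (0, 0)
        elif current is not None:
            missing, total = result[current]
            result[current] = (missing + (state == "UNAVAIL"), total + 1)
    return result


for vdev, (missing, total) in unavailable_members(ZPOOL_CONFIG).items():
    parity = RAIDZ_PARITY[vdev.split("-")[0]]
    verdict = "within parity" if missing <= parity else "beyond parity"
    print(f"{vdev}: {missing} of {total} members missing, parity {parity}: {verdict}")
```

With two of six members missing, the single disk of redundancy is exceeded, which is why at least one of ada1 or ada4 has to come back before the pool can be imported.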

Code:
# camcontrol devlist
<SAMSUNG HD204UI 1AQ10001>        at scbus0 target 0 lun 0 (ada0,pass0)
<SAMSUNG HD204UI 1AQ10001>        at scbus1 target 0 lun 0 (ada1,pass1)
<SAMSUNG HD204UI 1AQ10001>        at scbus2 target 0 lun 0 (ada2,pass2)
<SAMSUNG HD204UI 1AQ10001>        at scbus3 target 0 lun 0 (ada3,pass3)
<SAMSUNG HD204UI 1AQ10001>        at scbus7 target 0 lun 0 (ada4,pass4)
<SAMSUNG HD204UI 1AQ10001>        at scbus8 target 0 lun 0 (ada5,pass5)
<Imation Classic PMAP>            at scbus9 target 0 lun 0 (pass6,da0)

Code:
# gpart show
=>        34  3907029101  ada0  GPT  (1.8T)
          34          94        - free -  (47k)
        128    4194304    1  freebsd-swap  (2.0G)
    4194432  3902834703    2  freebsd-zfs  (1.8T)
 
=>        34  3907029101  ada2  GPT  (1.8T)
          34          94        - free -  (47k)
        128    4194304    1  freebsd-swap  (2.0G)
    4194432  3902834703    2  freebsd-zfs  (1.8T)
 
=>        34  3907029101  ada3  GPT  (1.8T)
          34          94        - free -  (47k)
        128    4194304    1  freebsd-swap  (2.0G)
    4194432  3902834703    2  freebsd-zfs  (1.8T)
 
=>        34  3907029101  ada5  GPT  (1.8T)
          34          94        - free -  (47k)
        128    4194304    1  freebsd-swap  (2.0G)
    4194432  3902834703    2  freebsd-zfs  (1.8T)
 
=>      63  15116673  da0  MBR  (7.2G)
        63  1930257    1  freebsd  [active]  (942M)
  1930320        63      - free -  (31k)
  1930383  1930257    2  freebsd  (942M)
  3860640      3024    3  freebsd  (1.5M)
  3863664    41328    4  freebsd  (20M)
  3904992  11211744      - free -  (5.4G)
 
=>      0  1930257  da0s1  BSD  (942M)
        0      16        - free -  (8.0k)
      16  1930241      1  !0  (942M)


Code:
# glabel status
          Name  Status  Components
      gpt/ada0    N/A  ada0p2
      gpt/ada2    N/A  ada2p2
      gpt/ada3    N/A  ada3p2
      gpt/ada5    N/A  ada5p2
ufs/FreeNASs3    N/A  da0s3
ufs/FreeNASs4    N/A  da0s4
ufs/FreeNASs1a    N/A  da0s1a


Code:
# dmesg | grep ada1
ada1 at mvsch1 bus 0 scbus1 target 0 lun 0
ada1: <SAMSUNG HD204UI 1AQ10001> ATA-8 SATA 2.x device
ada1: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada1: Command Queueing enabled
ada1: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C)
ada1: quirks=0x1<4K>
ada1: Previously was known as ad6
GEOM_RAID: DDF-LE: Disk ada1 state changed from NONE to ACTIVE.
GEOM_RAID: DDF-LE: Subdisk SimpleVol:0-ada1 state changed from NONE to ACTIVE.


Code:
# dmesg | grep ada4
ada4 at ahcich4 bus 0 scbus7 target 0 lun 0
ada4: <SAMSUNG HD204UI 1AQ10001> ATA-8 SATA 2.x device
ada4: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada4: Command Queueing enabled
ada4: 1907729MB (3907029168 512 byte sectors: 16H 63S/T 16383C)
ada4: quirks=0x1<4K>
ada4: Previously was known as ad18
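
Taken together, the outputs above amount to a set difference: the disks the kernel detects, minus the disks that gpart shows a partition table for. A rough Python sketch of that comparison, assuming the two outputs have been pasted in as strings (the gpart output is trimmed to its "=>" header lines). The function names are illustrative only.

```python
import re

# Trimmed copies of the outputs above (gpart: header lines only).
CAMCONTROL_DEVLIST = """\
<SAMSUNG HD204UI 1AQ10001>        at scbus0 target 0 lun 0 (ada0,pass0)
<SAMSUNG HD204UI 1AQ10001>        at scbus1 target 0 lun 0 (ada1,pass1)
<SAMSUNG HD204UI 1AQ10001>        at scbus2 target 0 lun 0 (ada2,pass2)
<SAMSUNG HD204UI 1AQ10001>        at scbus3 target 0 lun 0 (ada3,pass3)
<SAMSUNG HD204UI 1AQ10001>        at scbus7 target 0 lun 0 (ada4,pass4)
<SAMSUNG HD204UI 1AQ10001>        at scbus8 target 0 lun 0 (ada5,pass5)
<Imation Classic PMAP>            at scbus9 target 0 lun 0 (pass6,da0)
"""

GPART_SHOW_HEADERS = """\
=>        34  3907029101  ada0  GPT  (1.8T)
=>        34  3907029101  ada2  GPT  (1.8T)
=>        34  3907029101  ada3  GPT  (1.8T)
=>        34  3907029101  ada5  GPT  (1.8T)
=>      63  15116673  da0  MBR  (7.2G)
=>      0  1930257  da0s1  BSD  (942M)
"""


def cam_disks(devlist: str) -> set:
    """Disk device names from the parenthesised part of each devlist line."""
    pattern = r"\((?:pass\d+,)?((?:ada|da)\d+)(?:,pass\d+)?\)"
    return set(re.findall(pattern, devlist))


def partitioned_providers(gpart_show: str) -> set:
    """Provider names from the '=>' header lines of `gpart show`."""
    pattern = r"^=>\s+\d+\s+\d+\s+(\S+)"
    return set(re.findall(pattern, gpart_show, flags=re.MULTILINE))


missing_tables = sorted(
    cam_disks(CAMCONTROL_DEVLIST) - partitioned_providers(GPART_SHOW_HEADERS)
)
print("Detected but without a readable partition table:", missing_tables)
```

For the outputs in this post the difference is ada1 and ada4, the same two disks the pool reports as UNAVAIL.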


Here are a few things suggested in the similar thread I linked above:

Code:
# gpart recover /dev/ada1
gpart: arg0 'ada1': Invalid argument
 
# gpart recover /dev/ada4
gpart: arg0 'ada4': Invalid argument


Code:
# gmultipath list
#


Pastebin:

Full dmesg output
smartctl -a /dev/ada1
smartctl -a /dev/ada4

Hopefully that gives the various gurus whose responses I've been reading somewhere to start.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Few things in no particular order:

1. Your smart data for ada1 and ada4 are STILL over the 40C recommendation. In essence, your hard drives are overheating even in their current condition and completely idle. This isn't good.
2. Without getting at least one of those disks on the pool, you have no chance of getting at your data. I think you know this.
3. What exactly happened to cause this? You're a little short on details. Was the server working fine and then one day you rebooted? Are you trying to go from some other product to FreeNAS, tell the story...
4. Short tests every hour on ada1 and ada4.. WHOA! That's a bit frequent!
5. No long test on either drives in the log.. not a good sign.
6. Your drives appear to be in good shape even despite any overheating. I will say I've had drives that overheated and they start writing trash to the drives. As soon as you cool them down they'll start working properly again, but any trash written is... well....trashed.
 

Yatti420

Wizard
Joined
Aug 12, 2012
Messages
1,437
Also, what kind of flash media? I've been going through a lot of my old flash media lately and have had big issues when I try to use them for FreeNAS. I try to stick to something high quality/brand name.
 

nihil

Cadet
Joined
Jan 7, 2014
Messages
6
Easiest first: Yatti420, it's an 8GB Imation USB flash drive. The particular model is pretty good in my experience and it is a relatively fresh one - I had a super small form factor SanDisk drive, but it died. The machine kept on going though - I just couldn't perform any administration. I upgraded to 9.1 and replaced the SanDisk USB drive with the current one a few months ago. As far as I can tell it's the RAID array, not the OS flash disk, which is causing the issues.
 

nihil

Cadet
Joined
Jan 7, 2014
Messages
6
Replying to cyberjock's numbered points above:

1. Unfortunately, this can't be helped at present - temperatures are very high, my apartment is west-facing and my aircon is a little non-existent. Ambient temperatures are sitting at around 30 degrees as things are. I'll see about this though. The case is pretty tight (a Fractal Design mini), so that's perhaps a poor decision from the get-go, or it could be faulty and I hadn't realised.
2. Yes, and if it weren't for my messing with the Adaptec tool I think I would have had two chances - now ada4 is my great hope. I just don't know how to diagnose what is wrong with those drives (ada1 has a fresh issue, of course) - things look relatively healthy, they just aren't showing up in gpart et al - my next step eludes me.
3. Haha, I promise that I wasn't leaving anything out - I can identify two things which preceded this. Option 1 is the heat and the non-functional cooling: it was exceptionally hot in the two days before I noticed the issue (it could have happened any time from the eve of the first day and I wouldn't have noticed, as our new Xbox One rather distracted me). The only other thing which might have caused it is that on the morning of the second day I moved the machine in order to disconnect the monitor I had attached. The network cable was slightly shorter than anticipated, which meant that there was a slight jolt when I moved it, and slight percussion, which given the disks were spinning could be bad.
4. I shall fix that.
5. And that.
6. That is lucky. Now, if only they would give up their sweet data.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
I was hoping for a story along the lines of:

I've had FreeNAS and ZFS for 2 months. Yesterday I got angry when my girlfriend left me for a jerk and I stupidly kicked my server. I then decided in a fit of rage to grab it and throw it out the window Hulk Hogan style. Then I found out I wasn't as strong as Hulk Hogan so I dropped the damn thing on the ground. Then I installed Windows on one of the drives. Then I removed the Windows drive. Powered it up and this is what I have.

You say that you did something with the adaptec tool. What did you do? Cause pretty much I'm not getting details to identify what you did to break things, so I can't figure out how to unbreak things.

Right now I'm a little dismayed because your drives in the pool are labeled as gpt/ada0p2. This isn't how FreeNAS has done things, so someone has been doing things they weren't supposed to do. This also makes it less likely to help since I can't make any assumptions about how the server was made.
 

nihil

Cadet
Joined
Jan 7, 2014
Messages
6
Hahaha, we can pretend that is what I did, if you like.

The history of the machine is thus:

FreeNAS 8.0 was installed
OS corrupted
New install of 9.1 on a new USB device
Imported ZFS pool from before.

No one has played silly buggers with the drives or their labels as only I have access and I haven't done anything unusual - it's been all FreeNAS, using the WebUI until yesterday. I 100% promise, the labels have not been tampered with by human hands.

ada1 and the Adaptec Tool

Firstly, I am fairly certain that this drive can be considered a lost cause due to what I did with the Adaptec tool. However ada4 is at present untouched and in the same state as it has been since the failure - so my path forward as I understand it is bound up in understanding what ada4's status is at present.

As for what I did, I briefly described this in the OP under "What I See in BIOS/POST"

Upon reboot I saw each of the drives listed in a format similar to the below:

Code:
Loading Configuration...
 
Port#00 SAMSUNG HD204UI 1AQ10001 1.81TB Healthy 3.0 Gb/s
Port#01 SAMSUNG HD204UI 1AQ10001 1.81TB Healthy 3.0 Gb/s
Port#02 SAMSUNG HD204UI 1AQ10001 1.81TB Healthy 3.0 Gb/s
Port#03 SAMSUNG HD204UI 1AQ10001 1.81TB Healthy 3.0 Gb/s
 
SATA JBOD- PORT-0 SAMSUNG HD204UI 1.81TB Legacy
SATA JBOD- PORT-1 SAMSUNG HD204UI 1.81TB Non-Configured
SATA JBOD- PORT-2 SAMSUNG HD204UI 1.81TB Legacy
SATA JBOD- PORT-3 SAMSUNG HD204UI 1.81TB Legacy


Both ada1 and ada4 were appearing and not appearing in the same way within the OS (as per all of the outputs I showed above). Not thinking straight, I rebooted again and hit Ctrl-A to enter the configuration mode for the controller.

Whereupon I selected "Configure Drives" and selected the drive attached to port 1 and then went forward with the configuration process.

Now I see the following

Code:
Loading Configuration...
 
Port#00 SAMSUNG HD204UI 1AQ10001 1.81TB Healthy 3.0 Gb/s
Port#01 SAMSUNG HD204UI 1AQ10001 1.81TB Healthy 3.0 Gb/s
Port#02 SAMSUNG HD204UI 1AQ10001 1.81TB Healthy 3.0 Gb/s
Port#03 SAMSUNG HD204UI 1AQ10001 1.81TB Healthy 3.0 Gb/s
 
SATA JBOD- PORT-0 SAMSUNG HD204UI 1.81TB Legacy
SATA JBOD- PORT-2 SAMSUNG HD204UI 1.81TB Legacy
SATA JBOD- PORT-3 SAMSUNG HD204UI 1.81TB Legacy
Array#0 - Simple Volume          1.81TB Optimal


So, now ada1 has slightly different messages upon load in the dmesg - in that it is recognised that it's been configured (I'm pretty sure it's had its table changed as a result of my error in judgement and is now a lost cause, whereas ada4 is only PROBABLY a lost cause).

So, unless you think that it wasn't quite as destructive as I fear, I suspect we'll get the most progress out of ignoring ada1 for now and trying to diagnose what ada4 is doing.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
WOW. Not looking good.

Can we do a Teamviewer? I *might* be able to fix this. If so, get on IRC. :P
 

nihil

Cadet
Joined
Jan 7, 2014
Messages
6
We can! I would greatly appreciate that. One moment, I will find the IRC channel and server.
 

nihil

Cadet
Joined
Jan 7, 2014
Messages
6
Wow!

cyberjock you are a great man!

For those who are curious - the bootsector was backed up from a good drive (ada0) and restored to ada1 and ada4. Everything seems to be in order, though other issues were unveiled.
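
The thread does not record the exact commands used in the remote session. If "bootsector" here means the GPT partition table, one mechanism FreeBSD provides for copying a table between identically sized disks is `gpart backup` piped or redirected into `gpart restore`. The Python sketch below only builds and prints such command lines and does not run them, since writing a partition table to the wrong disk is destructive. The helper name `clone_table_commands` and the backup file path are hypothetical, chosen for this example.

```python
import shlex


def clone_table_commands(source, targets, backup_file="/tmp/good-table.backup"):
    """Build (but do not run) shell commands that copy a partition table.

    `backup_file` is a hypothetical path chosen for this example.
    """
    # Dump the table of the known-good disk to a file.
    backup = f"gpart backup {shlex.quote(source)} > {shlex.quote(backup_file)}"
    # Write that table to each target disk; only the table is restored,
    # the contents of the partitions are not touched by gpart itself.
    restore = f"{shlex.join(['gpart', 'restore', *targets])} < {shlex.quote(backup_file)}"
    return [backup, restore]


for command in clone_table_commands("ada0", ["ada1", "ada4"]):
    print(command)
```

Printing first and running by hand keeps a human in the loop; anyone in a similar situation should confirm the device names against `camcontrol devlist` before executing anything.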
 

HolyK

Ninja Turtle
Moderator
Joined
May 26, 2011
Messages
654
Yea, our mighty Cyberjock ...

@ cyber: Bro, quit your job (if you have any o_O ) and start offering your PRO services for $$$ :D

BTW: Topic marked as "solved" now :]
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
No job.. but I'm seriously considering consultant work. If it helps pay the bills I don't see a downside to this idea.
 