Offlined a disk to test RAID, expected "Online" button but there is none.

Status
Not open for further replies.

freshfeesh

Explorer
Joined
Oct 10, 2011
Messages
72
I took a disk out of my array just to convince myself that I had set up everything right and that it was going to work as I expected. The array performed just fine in the degraded state with one disk down.

Now, I don't see a way to just flip a switch and have the disk resume its place in the array. For starters, I figured that the OFFLINE button would just change its label to ONLINE. Instead, the OFFLINE button disappeared, and only EDIT and REPLACE remain. The offlined drive is in the Replace list, but selecting it results in a failure message to the effect of "this drive is part of the array Z1", which happens to be the array that I'm trying to add it back into. Time-wise, I'd rather not have to wipe the drive and resilver the whole thing. Is there a set of CLI commands that someone could walk me through to get this disk inserted back into the array? Thanks.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
I hope you didn't just unplug the device with the server running if you have important data on the zpool. The manual states that's a very bad idea if you value your data.

All you have to do (if your system supports hotswap) is to plug the drive in and after the system re-acknowledges the drive it will automatically add it to the array again. Then you just need to do a scrub to fix any issues with data inconsistencies between the zpool and the drive you removed.

Edit: AFAIK if the buttons edit/replace appear that means that the zpool has a disk for that zpool stripe. In effect, the disk is "online".
 

freshfeesh

Explorer
Joined
Oct 10, 2011
Messages
72
The data on there is invaluable, but thoroughly backed up. I did this test in advance of deleting one of the backups. I didn't just unplug the drive; I hit the OFFLINE button in the GUI, then powered down, unplugged the drive, then powered back up to do a quick test on the degraded array. That went well, so I powered down, plugged the drive back in, then powered up again. The drive's status in the array shows as "offline", the array is "degraded", and there are EDIT and REPLACE buttons available, but no ONLINE. Hitting REPLACE results in the error I described initially, and doesn't work. I think there has to be a seamless way to bring the drive online again within the pool; I'm just not seeing that it's possible through the GUI.
 

peterh

Patron
Joined
Oct 19, 2011
Messages
315
It's the correct behaviour.

Think about this: you unplugged a disk and continued to use the filesystem. Whenever you use the filesystem, it changes. The unplugged disk will get out of phase and cannot be reconnected. What you need to do is "replace", which will cause ZFS to resilver its array.

You might have to clear the start and end of the replugged disk to erase the ZFS labels (man dd), but the GUI might take care of that for you.
 

paleoN

Wizard
Joined
Apr 22, 2012
Messages
1,403
Assuming the disk was "untouched", I would expect that either rebooting or clicking the offline/replace button would have worked. Time to see what your zpool actually looks like.

From a SSH session as root paste the output of:
Code:
zpool status -v

camcontrol devlist

gpart show

glabel status
Throw the output inside of some [code][/code] tags as it will preserve the formatting and keep my eyes from crossing.
 

freshfeesh

Explorer
Joined
Oct 10, 2011
Messages
72
peterh said: "Think about this: you unplugged a disk and continued to use the filesystem. Whenever you use the filesystem, it changes. The unplugged disk will get out of phase and cannot be reconnected."
I did write some files to the array with one disk off. I get parity checking, and I can understand compression and some other stuff, but if slipping the disk back into the array just worked, it would be no less mysterious to me than RAID working in the first place. I think I get what you're saying: that the math that makes RAID work requires writing to all disks in a pool at the same time, and that with one disk down the math can no longer occur. Still, the OFFLINE command in the manual and the suggested action in the output of 'zpool status' make me think that there's simply a way to bring the disk back online, and for resilvering to fill in only the missing pieces.

Code:
~> zpool status -v
  pool: Z1
 state: DEGRADED
status: One or more devices has been taken offline by the administrator.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Online the device using 'zpool online' or replace the device with
        'zpool replace'.
 scrub: none requested
config:

        NAME                                            STATE     READ WRITE CKSUM
        Z1                                              DEGRADED     0     0     0
          raidz1                                        DEGRADED     0     0     0
            gptid/dcfd7167-e056-11e1-9ccc-002191211e07  ONLINE       0     0     0
            gptid/9884486d-fc92-11e1-9aa1-002191211e07  OFFLINE      0     0     0
            gptid/ddcfcb44-e056-11e1-9ccc-002191211e07  ONLINE       0     0     0

errors: No known data errors

  pool: ZWD10
 state: ONLINE
 scrub: none requested
config:

        NAME                                          STATE     READ WRITE CKSUM
        ZWD10                                         ONLINE       0     0     0
          gptid/a5606357-e057-11e1-9ccc-002191211e07  ONLINE       0     0     0

errors: No known data errors

Code:
~# camcontrol devlist
<ST3250820A 3.AAE>                 at scbus0 target 0 lun 0 (pass0,ada0)
<ST3300831A 3.03>                  at scbus0 target 1 lun 0 (pass1,ada1)
<SanDisk SDCFH2-004G HDX 4.32>     at scbus1 target 0 lun 0 (pass2,ada2)
<WDC WD10EADS-00L5B1 01.01A01>     at scbus2 target 0 lun 0 (pass3,ada3)
<WDC WD30EFRX-68AX9N0 80.00A80>    at scbus3 target 0 lun 0 (pass4,ada4)
<WDC WD30EFRX-68AX9N0 80.00A80>    at scbus4 target 0 lun 0 (pass5,ada5)
<WDC WD30EFRX-68AX9N0 80.00A80>    at scbus5 target 0 lun 0 (pass6,ada6)

Code:
~# gpart show
=>       34  488397101  ada0  GPT  (233G)
         34         94        - free -  (47K)
        128    4194304     1  freebsd-swap  (2.0G)
    4194432  484202703     2  freebsd-ufs  (231G)

=>       34  586072301  ada1  GPT  (279G)
         34         94        - free -  (47K)
        128    4194304     1  freebsd-swap  (2.0G)
    4194432  581877903     2  freebsd-ufs  (277G)

=>     63  8027649  ada2  MBR  (3.8G)
       63  1930257     1  freebsd  [active]  (943M)
  1930320       63        - free -  (32K)
  1930383  1930257     2  freebsd  (943M)
  3860640     3024     3  freebsd  (1.5M)
  3863664    41328     4  freebsd  (20M)
  3904992  4122720        - free -  (2.0G)

=>        34  1953525101  ada3  GPT  (932G)
          34          94        - free -  (47K)
         128     4194304     1  freebsd-swap  (2.0G)
     4194432  1949330703     2  freebsd-zfs  (930G)

=>        34  5860533101  ada4  GPT  (2.7T)
          34          94        - free -  (47K)
         128     4194304     1  freebsd-swap  (2.0G)
     4194432  5856338703     2  freebsd-zfs  (2.7T)

=>        34  5860533101  ada5  GPT  (2.7T)
          34          94        - free -  (47K)
         128     4194304     1  freebsd-swap  (2.0G)
     4194432  5856338703     2  freebsd-zfs  (2.7T)

=>        34  5860533101  ada6  GPT  (2.7T)
          34          94        - free -  (47K)
         128     4194304     1  freebsd-swap  (2.0G)
     4194432  5856338703     2  freebsd-zfs  (2.7T)

=>      0  1930257  ada2s1  BSD  (943M)
        0       16          - free -  (8.0K)
       16  1930241       1  !0  (943M)

Code:
~# glabel status
                                      Name  Status  Components
                    ufsid/4eb2375204e46ca7     N/A  ada0p2
                                    ufs/s2     N/A  ada0p2
gptid/b1a5e7a5-05e6-11e1-8b4d-002191211e07     N/A  ada0p2
                    ufsid/4ea7983ca987f2a7     N/A  ada1p2
                                    ufs/s3     N/A  ada1p2
gptid/fe209d59-ff91-11e0-98e5-002191211e07     N/A  ada1p2
                             ufs/FreeNASs3     N/A  ada2s3
                             ufs/FreeNASs4     N/A  ada2s4
gptid/a5606357-e057-11e1-9ccc-002191211e07     N/A  ada3p2
gptid/dcfd7167-e056-11e1-9ccc-002191211e07     N/A  ada4p2
gptid/9884486d-fc92-11e1-9aa1-002191211e07     N/A  ada5p2
gptid/ddb9c18d-e056-11e1-9ccc-002191211e07     N/A  ada6p1
gptid/ddcfcb44-e056-11e1-9ccc-002191211e07     N/A  ada6p2
                            ufs/FreeNASs1a     N/A  ada2s1a


Other than the zpool status, I have no idea what this means. I am encouraged by the line "action: Online the device using 'zpool online' or replace the device with 'zpool replace'." Looks like something I need to run from a command line though. Would I have to type in the gptid?
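On the gptid question: a minimal sketch, assuming the `zpool status` layout shown above, of pulling the OFFLINE device name out of the output rather than retyping it by hand. `offline_devices` is a hypothetical helper name, not part of ZFS; it only reads text on stdin and changes nothing.

```shell
# Hypothetical helper: print the name of every device that
# `zpool status` lists with state OFFLINE.
# Reads the status text on stdin, so it also works on saved output.
offline_devices() {
    # Device rows look like: NAME STATE READ WRITE CKSUM
    # Skip the "state:" header line so a pool-level state is never matched.
    awk '$1 != "state:" && $2 == "OFFLINE" { print $1 }'
}

# Usage on a live system (as root):
#   zpool status Z1 | offline_devices
```

The printed name is what `zpool online` expects as its device argument, together with the pool name.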

Thanks.
 

freshfeesh

Explorer
Joined
Oct 10, 2011
Messages
72
Check this out. This may be my very first successful use of the "man page" and a command line command:
Code:
~# zpool online Z1 gptid/9884486d-fc92-11e1-9aa1-002191211e07
~# zpool status
  pool: Z1
 state: ONLINE
 scrub: none requested
config:

        NAME                                            STATE     READ WRITE CKSUM
        Z1                                              ONLINE       0     0     0
          raidz1                                        ONLINE       0     0     0
            gptid/dcfd7167-e056-11e1-9ccc-002191211e07  ONLINE       0     0     0
            gptid/9884486d-fc92-11e1-9aa1-002191211e07  ONLINE       0     0     0
            gptid/ddcfcb44-e056-11e1-9ccc-002191211e07  ONLINE       0     0     0

errors: No known data errors

  pool: ZWD10
 state: ONLINE
 scrub: none requested
config:

        NAME                                          STATE     READ WRITE CKSUM
        ZWD10                                         ONLINE       0     0     0
          gptid/a5606357-e057-11e1-9ccc-002191211e07  ONLINE       0     0     0

errors: No known data errors


The zpool status output seems good now, but I'm still getting the yellow flasher in the GUI, which tells me that the status is unknown.
 

freshfeesh

Explorer
Joined
Oct 10, 2011
Messages
72
Another couple commands:
Code:
~# zpool scrub Z1
~# zpool status
  pool: Z1
 state: ONLINE
 scrub: resilver completed after 0h0m with 0 errors on Thu Sep 13 21:24:59 2012
config:

        NAME                                            STATE     READ WRITE CKSUM
        Z1                                              ONLINE       0     0     0
          raidz1                                        ONLINE       0     0     0
            gptid/dcfd7167-e056-11e1-9ccc-002191211e07  ONLINE       0     0     0
            gptid/9884486d-fc92-11e1-9aa1-002191211e07  ONLINE       0     0     0  112M resilvered
            gptid/ddcfcb44-e056-11e1-9ccc-002191211e07  ONLINE       0     0     0

errors: No known data errors

  pool: ZWD10
 state: ONLINE
 scrub: none requested
config:

        NAME                                          STATE     READ WRITE CKSUM
        ZWD10                                         ONLINE       0     0     0
          gptid/a5606357-e057-11e1-9ccc-002191211e07  ONLINE       0     0     0

errors: No known data errors
~#


The 112M resilver amount seems about right for the files that I added as a test to the degraded array, a handful of PDFs. Green light now showing in the GUI.

An ONLINE button would have been nice.
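For anyone landing here later, the commands from the last two posts can be sketched as a small dry-run helper. `reonline_plan` is a hypothetical name; it only prints the commands for a given pool and device so they can be reviewed before anything is run, and it assumes the disk was taken offline administratively, as it was here.

```shell
# Hypothetical dry-run helper: print the commands used in this thread
# to bring an administratively offlined disk back into its pool.
# $1 = pool name, $2 = device name as shown by `zpool status`.
reonline_plan() {
    printf 'zpool online %s %s\n' "$1" "$2"   # rejoin the pool
    printf 'zpool scrub %s\n' "$1"            # verify the pool afterwards
    printf 'zpool status %s\n' "$1"           # confirm state and resilver result
}

# Usage: review the output, then run the lines as root.
#   reonline_plan Z1 gptid/9884486d-fc92-11e1-9aa1-002191211e07
```

Printing instead of executing is deliberate: the device name is long and easy to get wrong, so it helps to read the plan first.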
 

paleoN

Wizard
Joined
Apr 22, 2012
Messages
1,403
paleoN said: "Assuming the disk was "untouched", I would expect that either rebooting or clicking the offline/replace button would have worked."
Just wanted to correct myself here. The drive was administratively taken offline and should not come back online after a reboot, which it doesn't. If the drive was UNAVAIL then I would expect it to "just work" after rebooting.
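The OFFLINE versus UNAVAIL distinction can be sketched as a tiny lookup, assuming only the states that appear in this thread. `suggest_action` is a hypothetical helper and the suggestions are rules of thumb drawn from the posts here, not an authoritative ZFS decision table.

```shell
# Hypothetical helper: map a device state from `zpool status`
# to the next step suggested in this thread.
suggest_action() {
    case "$1" in
        OFFLINE) echo "zpool online <pool> <device>" ;;   # taken offline by the admin
        UNAVAIL) echo "reattach the disk or reboot, then re-check zpool status" ;;
        ONLINE)  echo "nothing to do" ;;
        *)       echo "unrecognized state" ;;
    esac
}

# Usage:
#   suggest_action OFFLINE
```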

freshfeesh said: "Check this out. This may be my very first successful use of the "man page" and a command line command:"
First of many. :) It's not so bad, is it?
 

StephenFry

Contributor
Joined
Apr 9, 2012
Messages
171

freshfeesh

Explorer
Joined
Oct 10, 2011
Messages
72
It's looking like this project will pull me out of my GUI comfort zone in spite of myself. I'm starting to appreciate the directness and speed of the CLI. Yeah, I take back everything I said before the fix. Not so bad. And it definitely makes the server "mine". I think I'm going to have to give it a better name.
 

my95z34

Explorer
Joined
Oct 25, 2014
Messages
51
So, I apologize for bringing this thread back from the dead, but since this is the exact same issue I'm having I figured there was no point in making a new thread.

So, my new server has status LEDs on the hard drive bays, so I wanted to see if those reacted to offlining the disk or not. So, I picked one and marked it as offline. But now, I'm unable to get it back online again. -_-

Code:
[root@freenas] /# zpool status
  pool: freenas-boot
state: ONLINE
  scan: scrub repaired 0 in 0h1m with 0 errors on Thu Feb 19 03:46:48 2015
config:

    NAME                                            STATE     READ WRITE CKSUM
    freenas-boot                                    ONLINE       0     0     0
     mirror-0                                      ONLINE       0     0     0
       da0p2                                       ONLINE       0     0     0
       gptid/9a6c1dc7-ab4e-11e4-9d6e-0022641e60cd  ONLINE       0     0     0

errors: No known data errors

  pool: volume1
state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist for
    the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://illumos.org/msg/ZFS-8000-2Q
  scan: none requested
config:

    NAME                                            STATE     READ WRITE CKSUM
    volume1                                         DEGRADED     0     0     0
     raidz1-0                                      DEGRADED     0     0     0
       gptid/ebad5afa-c34f-11e4-b9d2-6c0b8406413a  ONLINE       0     0     0
       17652295458507152850                        UNAVAIL      0     0     0  was /dev/gptid/ec96a306-c34f-11e4-b9d2-6c0b8406413a
       gptid/ed6c732f-c34f-11e4-b9d2-6c0b8406413a  ONLINE       0     0     0
       gptid/ee4651d3-c34f-11e4-b9d2-6c0b8406413a  ONLINE       0     0     0

errors: No known data errors


When I try to online it, I get an error and the zpool status stays the same.

Code:
[root@freenas] /# zpool online volume1 gptid/ec96a306-c34f-11e4-b9d2-6c0b8406413a
warning: device 'gptid/ec96a306-c34f-11e4-b9d2-6c0b8406413a' onlined, but remains in faulted state
use 'zpool replace' to replace devices that are no longer present



Code:
[root@freenas] /# gpart show
=>      34  31266749  da0  GPT  (14G)
        34      1024    1  bios-boot  (512k)
      1058         6       - free -  (3.0k)
      1064  31265712    2  freebsd-zfs  (14G)
  31266776         7       - free -  (3.5k)

=>      34  31266749  da1  GPT  (14G)
        34      1024    1  bios-boot  (512k)
      1058         6       - free -  (3.0k)
      1064  31265712    2  freebsd-zfs  (14G)
  31266776         7       - free -  (3.5k)

=>        34  5860533101  ada1  GPT  (2.7T)
          34          94        - free -  (47k)
         128     4194304     1  freebsd-swap  (2.0G)
     4194432  5856338696     2  freebsd-zfs  (2.7T)
  5860533128           7        - free -  (3.5k)

=>        34  5860533101  ada2  GPT  (2.7T)
          34          94        - free -  (47k)
         128     4194304     1  freebsd-swap  (2.0G)
     4194432  5856338696     2  freebsd-zfs  (2.7T)
  5860533128           7        - free -  (3.5k)

=>        34  5860533101  ada3  GPT  (2.7T)
          34          94        - free -  (47k)
         128     4194304     1  freebsd-swap  (2.0G)
     4194432  5856338696     2  freebsd-zfs  (2.7T)
  5860533128           7        - free -  (3.5k)

=>        34  5860533101  ada0  GPT  (2.7T)
          34          94        - free -  (47k)
         128     4194304     1  freebsd-swap  (2.0G)
     4194432  5856338696     2  freebsd-zfs  (2.7T)
  5860533128           7        - free -  (3.5k)



Any ideas what I'm doing wrong? I guess this is what I get for playing around. I don't want to wipe it and replace. =[ Thanks in advance!
 

my95z34

Explorer
Joined
Oct 25, 2014
Messages
51
Hmm.... I guess you can disregard this. I shut the server down and swapped the 'offline' disk around with an online one, and it's back up and running. Although I've got a warning in the GUI.

Code:
CRITICAL: The volume volume1 (ZFS) state is ONLINE: One or more devices has experienced an unrecoverable error. An attempt was made to correct the error. Applications are unaffected.


I assume that'll clear up when it scrubs. The volume status shows the resilver completed successfully, so hopefully all is well now, lol
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
All you actually needed to do was reboot and do a scrub. :P
Technically you could have done a "zpool online <pool> <gpt/id>" and not rebooted at all.
 

my95z34

Explorer
Joined
Oct 25, 2014
Messages
51
cyberjock said: "All you actually needed to do was reboot and do a scrub. :P Technically you could have done a "zpool online <pool> <gpt/id>" and not rebooted at all."
I did the zpool online command but that didn't work. Which is why I posted here, lol. But it's all good now.

 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
my95z34 said: "I did the zpool online command but that didn't work. Which is why I posted here, lol. But it's all good now."

Ah, I see it was unavailable... Yeah, had to do the reboot.
 