Confusion about proper drive removal?

Status
Not open for further replies.
Joined
Nov 11, 2014
Messages
1,174
I've read the FreeNAS manual, and other guides as well, about the proper way to replace a drive.

It seems you can't just pull a drive while the system is running and put it back, like you would with a Drobo. You have to properly offline the disk first to let the system know you are swapping a failed drive, to prevent issues. But in a real-world situation, when a drive suddenly dies without notice while the system is running, what happens then? How will that be OK?

I am a little confused here; if somebody can shed some light on this, it will be much appreciated.


P.S. We are talking about a redundant array situation, let's say RAIDZ with 3 drives.
 

dlavigne

Guest
In a real-world situation you follow the steps in the Guide... If you have redundancy, the pool will continue to work in a degraded state until the resilvering completes, assuming no more disks fail than the redundancy level allows.
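As an illustration (hypothetical pool name tank and device labels), a RAIDZ1 pool running degraded after a disk death looks roughly like this in zpool status:
Code:
# zpool status tank
  pool: tank
 state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist
        for the pool to continue functioning in a degraded state.
config:

        NAME                    STATE     READ WRITE CKSUM
        tank                    DEGRADED     0     0     0
          raidz1-0              DEGRADED     0     0     0
            gptid/aaaa-1111     ONLINE       0     0     0
            gptid/bbbb-2222     ONLINE       0     0     0
            1234567890123456    UNAVAIL      0     0     0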
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
If the disk fails, two things may happen:

Either it hasn't been offlined automatically, in which case you'll be able to offline it yourself,

or

it has been offlined automatically and doesn't need to be offlined manually.

Follow the manual to the letter and it'll be fine.
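If you'd rather do that step from the CLI than the GUI, the offline command looks like this (the pool name tank and the device label are hypothetical):
Code:
# Administratively offline the failing member so ZFS stops issuing I/O to it
zpool offline tank gptid/cccc-dddd-eeee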
 
Joined
Nov 11, 2014
Messages
1,174
So if I understand this right, I can still pull the drive from the bay while it's working (to simulate an instant drive failure) and then go to the GUI and offline it?
 

Starpulkka

Contributor
Joined
Apr 9, 2013
Messages
179
No =) I'm glad that you're clearly practising this before you have any valuable data on it. Note that RAIDZ1 is not recommended at all, and RAIDZ2 requires at least 4 HDDs.
Code:
Well, here it goes. I might miss something even myself, which is why you always check what the manual says. In FreeNAS 8 this would have to be done quite differently.
0. Tell users that the server is off-limits at the moment (at the company where I work, we do this by knocking door to door and talking to the people in each room). (And if someone says the server is down, say "it's never down, it's just off-limits at the moment".)
1. Shut down the NAS.
2. Pull that HDD out. (And I really do hope you checked beforehand that the serial matches the HDD you have to take out.)
3. Start the NAS and see what zpool status says; the GUI might not even have noticed it yet. Offline the bad HDD.
4. Take notes or pictures, or whatever helps you remember what the NAS currently says.
5. Shut down the NAS.
6. Build another NAS, or use this one with the other HDDs unplugged; plug the "failed" HDD into that machine and boot FreeNAS.
7. Type zpool labelclear /dev/whatever (don't do this if other valuable ZFS pool HDDs are plugged in). You have to clear the "failed" HDD so ZFS doesn't get nervous when you hand the old HDD back to it. (A consolidated sketch of steps 7-8 follows below.)
8. If the system doesn't allow this, type sysctl kern.geom.debugflags=0x10 and do step 7 again.
9. That HDD should now be cleared of the pool it belonged to.
10. Pull the USB stick, put the original one back in place, plug all the HDDs back in, and press the start button.
11. If the NAS shows the old HDD as offline, continue with what the manual says.
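A minimal sketch of steps 7-8 above, run on the scratch machine with only the "failed" disk attached (the device name /dev/ada1 is just an example; double-check yours first):
Code:
# Let GEOM accept writes to the disk's metadata areas (normally protected)
sysctl kern.geom.debugflags=0x10
# Wipe the ZFS label so the disk no longer identifies itself as a pool member
zpool labelclear -f /dev/ada1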


Of course, if you plan to actually replace the HDD with a brand-new one, you don't need labelclear to clean the old HDD.

And if someone else reads this: this is not the proper replacement method. Normally you should offline a disk first, before pulling it out, as the manual suggests.
 
Last edited:
Joined
Nov 11, 2014
Messages
1,174
I guess I have to try pulling the HDD while FreeNAS is running. This is in order to create a drive-failure situation, because when an HDD dies with the "click of death" it won't care whether the system is on or off in a real-life situation.

I appreciate everybody's help on this. I guess I couldn't have explained what I'm confused about any better.
 

Starpulkka

Contributor
Joined
Apr 9, 2013
Messages
179
If you pull an HDD out and put it back, ZFS accepts it back into the pool before you can even say "pool".
In older RAID systems, if an HDD was lost for a few seconds and came back, the RAID would start rebuilding the whole array, and that takes a long time, just because the HDD had a little hiccup. ZFS handles hiccups better: when the HDD comes back, it just resilvers the latest data, not the whole disk.
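When the disk rejoins after such a hiccup, zpool status shows a resilver that only touches the data written while the disk was away, roughly like this (hypothetical numbers):
Code:
  scan: resilver in progress since Mon Nov 17 21:14:32 2014
        1.05G scanned out of 410G at 95.3M/s, 512M resilvered, 0.26% done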

Code:
zfs has many states:

ONLINE
    The device or virtual device is in normal working order. Although some transient errors might still occur, the device is otherwise in working order.

DEGRADED
    The virtual device has experienced a failure but can still function. This state is most common when a mirror or RAID-Z device has lost one or more constituent devices. The fault tolerance of the pool might be compromised, as a subsequent fault in another device might be unrecoverable.

FAULTED
    The device or virtual device is completely inaccessible. This status typically indicates total failure of the device, such that ZFS is incapable of sending data to it or receiving data from it. If a top-level virtual device is in this state, then the pool is completely inaccessible.

OFFLINE
    The device has been explicitly taken offline by the administrator.

UNAVAIL
    The device or virtual device cannot be opened. In some cases, pools with UNAVAIL devices appear in DEGRADED mode. If a top-level virtual device is UNAVAIL, then nothing in the pool can be accessed.

REMOVED
    The device was physically removed while the system was running. Device removal detection is hardware-dependent and might not be supported on all platforms.

As you can see, a ZFS device can end up in many states, depending on the hardware and the situation.
If you pull a drive while the system is on, it might go to REMOVED, UNAVAIL, OFFLINE, or FAULTED.

The catch:
If you're planning to just press the offline button, pull the HDD, and put a new one in while the system is on, it might not work, because your system might not be hot-swappable, and maybe not even hot-pluggable (hot-plug and hot-swap are different things).
That's why nobody can say for sure whether this online replacement will work.
As for pulling the HDD from a running machine, you can try that if you want, no problem. But for inserting the new HDD while online, from what I've read it's safer to shut down, because some hardware might halt.
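Whether the OS even notices the pull and the reinsert is controller-dependent; on FreeBSD/FreeNAS you can at least check what the CAM layer currently sees (standard FreeBSD commands, not FreeNAS-specific):
Code:
# List the disks the controllers currently present to the OS
camcontrol devlist
# Ask all buses to re-scan if a reinserted disk has not shown up
camcontrol rescan all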

What if I rip it out, format it on a different machine, and then put it back, so FreeNAS won't recognize it as a disk from the pool? I assume it will treat it as a new one. Can I then add it to rebuild the pool, even though I didn't properly offline the disk when I ripped it out?

(Yes, you can take that HDD offline by ripping it out, by the offline button, or whatever pleases you most, but the replace requires a reboot of the NAS anyway (a reboot is recommended; later in this topic someone makes a solid argument for it). Then you put the new HDD in, start the NAS, and check whether it shows the old HDD (or a long-numbered placeholder) in the OFFLINE state; after that you manually press the "replace this HDD" button.) Sorry for my bad English.

Edit: Now I see the point you're after: the manual says you have to offline the disk before putting a new one into the NAS machine, and you're wondering what happens if the HDD actually just goes bad while the system is on.

Just try it and see, or take it offline, and then you get the Replace button (the Replace button seems to change from time to time across different FreeNAS versions).


One time I couldn't put a failed HDD offline, and I don't have a clue why. But after rebooting the NAS a few times with that old HDD in place, it then allowed me to offline it, even though it didn't see it anymore. After that I shut down, put the new HDD in, and continued from the manual.

Edit 2: Wow, I have a much clearer head now; I had to come back and moderate my own text from last night.
 
Last edited:
Joined
Nov 11, 2014
Messages
1,174
What if I rip it out, format it on a different machine, and then put it back, so FreeNAS won't recognize it as a disk from the pool? I assume it will treat it as a new one. Can I then add it to rebuild the pool, even though I didn't properly offline the disk when I ripped it out?
 
Joined
Nov 11, 2014
Messages
1,174
Starpulkka said:
If you pull an HDD out and put it back, ZFS accepts it back into the pool before you can even say "pool". [...]

When you had a drive fail, why didn't you shut down the NAS, remove the failed drive, put a new one in, and then start the NAS and see if it would give you the option to tell the GUI "take this HDD offline and replace it with this one"?

I am not sure, but I think what FreeNAS means by "offline" is not actually what we expect it to be. Perhaps putting it in "offline" mode is a way to tell the RAID array that the disk is gone and to work without it, not like taking a USB drive "offline" with "safe removal" or "dismount".
I don't know, but it's confusing. Not because it's complicated, but because it should be explained better how it was meant to work. Now, when the drive is there but dead, do you need to remove it, or is it already gone when you look at the GUI?!
 

Starpulkka

Contributor
Joined
Apr 9, 2013
Messages
179
When you had a drive fail, why didn't you shut down the NAS, remove the failed drive, put a new one in, and then start the NAS and see if it would give you the option to tell the GUI "take this HDD offline and replace it with this one"?
I really don't remember clearly anymore; either it wasn't able to detach the old HDD, or I missed the page where the offline button actually was.
Just out of interest I tested something myself: I put massive Peltor ear covers on (glad I'm alone in the house tonight), fired up the Supermicro SuperServer with its SATA backplane, and tried this.

Edit: machine specs
Superchassis CSE-836TQ-R800B, 800W redundant PSU
Supermicro X7DBE, Xeon E5430
24GB ECC FB-DIMM RAM
2x retail IBM ServeRAID M1015 cards in IT mode (the cards came with huge manuals and CD-ROM stuff)
4x Supermicro multilane SAS iPass (SFF-8087) cables
6x Seagate 500GB SATA HDDs
1x 2-port IPMI card
FreeNAS version 9.2.1.9b
APC Smart-UPS xxxxVA

SuperServer on the whole time, no reboots or re-logins
- RAIDZ1 pool
1. Pull a drive out of its slot.
2. Volume status shows 23232332423 REMOVED.
3. Put a new drive in the same slot (heard some beeping, but I don't think it's because I pulled SATA drives).
4. Select the drive, press Replace, and Replace again; the replace worked and it started resilvering.
5. Resilver done, and all is online and green again. Works.

SuperServer on the whole time, no reboots or re-logins
- RAIDZ1 pool
1. Press Offline.
2. Now it shows 234232323 OFFLINE.
3. Pull the correct HDD out of its slot.
4. Put a new drive in the same slot (heard some beeping, but I don't think it's because I pulled SATA drives).
5. Select the drive, press Replace, and Replace again; the replace worked and it started resilvering.
6. Resilver done, and all is online and green again.
7. If you edit the old HDD entry it errors; press X and try again, and it then shows the correct HDD serial.
Okay, now it's your turn to try the same =) Do you get the same results? =)
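For reference, the GUI's Replace button corresponds roughly to ZFS's replace operation on the CLI; a sketch with hypothetical names (pool tank, the long-number placeholder from above, and the new disk's gptid):
Code:
# Swap the REMOVED/OFFLINE member for the freshly inserted disk;
# resilvering starts automatically once the replace is accepted
zpool replace tank 23232332423 gptid/ffff-0000-1111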

Edit 2:
Well, this is just me being stupid; even I get it.
I'm almost certain that if I try this on my AMD FX machine it won't work as smoothly as that.


As the manual says, always OFFLINE the HDD. This step is needed to properly remove the device from the ZFS pool and to prevent swap issues.
But I'm not going to argue against the team who makes FreeNAS; I'm pretty sure they know better =)
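On those "swap issues": FreeNAS puts a swap partition on each data disk, which is part of why the manual wants the disk offlined properly first. You can list the per-disk swap devices with a standard FreeBSD command (just a quick check, nothing more):
Code:
# Show active swap devices in human-readable sizes;
# on FreeNAS each data disk usually contributes one
swapinfo -h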
 
Last edited:
Joined
Nov 11, 2014
Messages
1,174
In the first experiment, when you said "1. Pull a drive out of its slot", was the machine on and running at that time?
 

Starpulkka

Contributor
Joined
Apr 9, 2013
Messages
179
Yes, the machine was running loud and clear the whole time, but it's real server hardware, the kind that lives in real server rooms.
And now I'm going to sleep; maybe tomorrow I'll try what ZFS states the AMD chipset gives.. zzZZ
 
Last edited:
Joined
Nov 11, 2014
Messages
1,174
Yes, mine too. Supermicro 836TQ.

That is very good to know. My controllers are LSI 9211s, and I know they are able to hot-swap. But my concern was not whether you have to reboot to put new drives in, but what happens if a drive is suddenly gone. It could be a failed drive, or even a bad cable producing the same result; it's good to confirm the NAS was able to recover from that situation.

Thank you Starpulkka for all your help. I guess this gives me an answer to what happens when a drive is ripped from the array. Right now I am doing some tests setting up LACP. I got confused by that too. :smile:
 

mjws00

Guru
Joined
Jul 25, 2014
Messages
798
The best thing to do is get to know your own hardware while you are burning in and messing around. Pull a drive and see what's up. Plug it back in... see what happens. The thing is, smart hardware can tell you pulled the plug, so that doesn't simulate a 'failure'; it will just show the disk as REMOVED. There is no 'zpool online' command in the GUI, but ZFS will let you put that drive back in service from the CLI. Or just reboot with the device back in place.

Shut down, pull the drive, then reboot. You will get a different result, closer to a failure the controller and ZFS don't know about. With a TQ chassis you can unplug a single drive, or the SAS cable, whatever suits you. Do it so you've seen it with your own eyes.

If the drive physically fails, it will show up as FAULTED or UNAVAIL in the drive list, but the pool and box will continue happily, assuming redundancy remains. At that point you can offline and then replace at your leisure. Best practice is shutdown, replace, reboot, mostly because of how easy it is to pull the wrong drive on a live pool with only one disk of redundancy left and cause a hard pool crash.

Good luck. Don't take someone's word for it. Know your own recovery scenarios inside out and backwards.
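A minimal sketch of that CLI step, assuming a pool named tank and a hypothetical device label for the reinserted disk:
Code:
# Tell ZFS the reinserted device is usable again;
# only the data written while it was gone gets resilvered
zpool online tank gptid/cccc-dddd-eeee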
 
Joined
Nov 11, 2014
Messages
1,174
mjws00 said:
The best thing to do is get to know your own hardware while you are burning in and messing around. [...]

Thanks. And I agree with you. :smile:
 
Joined
Nov 11, 2014
Messages
1,174
mjws00 said:
The best thing to do is get to know your own hardware while you are burning in and messing around. [...]


But sometimes you have to take someone's word for it. I am thinking of buying a 10Gb card for my FreeNAS and at least one more NIC for my workstation, without a switch for the first time. Before I invest a thousand-plus dollars in NICs and a switch, it would be good to know what speeds I can get with CIFS sharing, because it might be... not worth it if I can't utilize at least 50% of the network.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
If the CIFS protocol is the bottleneck, expect 200-350 MB/sec in Windows and 350-500 MB/sec in Linux. If your pool is slower than that, then whatever the pool's speed is.
 
Joined
Nov 11, 2014
Messages
1,174
So if my pool can do 900 MB/sec, I still can't utilize even 500 MB/sec on a 10G NIC with SMB 2.0?
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Not likely. At least, I've never gotten those speeds, despite trying desperately and using things like RAM drives and such. Call up Microsoft and yell at them for Windows being so crappy. ;)

Switch to Linux. It has much faster throughput.

You should be able to get better speeds with Windows 8+ and SMB3, but SMB3 support isn't always 100% complete in Samba, so it may or may not work.

But yeah, don't expect to do 300 MB/sec+ in Windows 7 with CIFS.
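If you want to experiment on the FreeNAS side, Samba's negotiated dialect ceiling is controlled by the smb.conf parameter below (whether a given FreeNAS build exposes it as an auxiliary parameter, and how well its Samba handles SMB3, is exactly the uncertainty mentioned above):
Code:
# smb.conf: allow clients to negotiate up to the SMB3 dialect
server max protocol = SMB3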
 
Joined
Nov 11, 2014
Messages
1,174
Well, Linux has its purposes, but not as a workstation OS.
If I had Windows 8 (which I don't, because I don't like it) and SMB 3.0 (Windows Server 2012, let's say), I believe I could get 300 MB/s on a 1G network (4x 1G NICs) using the new SMB 3.0 Multichannel feature, which can combine four 1G ports into one single connection, and I wouldn't even need a 10G network.
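A back-of-the-envelope check on that claim (my arithmetic, not a benchmark):
Code:
4 links x 1 Gbit/s           = 4 Gbit/s
4 Gbit/s / 8 bits per byte   = 500 MB/s raw ceiling
minus ~10-15% Ethernet/TCP/SMB overhead -> roughly 425-450 MB/s usable
so ~300 MB/s is plausible, but only if SMB3 Multichannel really
spreads one session across all four links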
 