Having issues replacing ZFS drive on FreeNAS-8.0.2-RELEASE-amd64 (8288)

Status
Not open for further replies.

Polzy

Dabbler
Joined
Dec 16, 2011
Messages
13
Hi All,

Appreciate any support/assistance in getting out of this one.

History: My FreeNAS box had been running extremely slowly (10 minutes to map to the server, files impossible to open, etc.), and scrubs constantly reported checksum errors on one of the drives. I had ruled out any network-related issues, so I decided to remove the drive that was producing errors and try formatting it. After a reboot the pool became DEGRADED, as expected, and performance was back to normal (with extra load on the CPU) while I worked on getting the removed drive back online. I placed the faulty drive in another machine, reformatted it to NTFS and ran chkdsk /f, which found no errors. I then shut down FreeNAS and inserted the drive back into it so the data could be rebuilt onto this drive.

Issue: The same (formatted) drive has been put back into the server, but I cannot follow the instructions on the forums for replacing a failed disk, because the disk name does not appear correctly: it shows as "unknown" instead of ada3. I wanted to see whether it was possible to add the same disk back into the pool and let it rebuild, rather than buying a new drive, if it was not a hardware fault. How can I add the drive back into the storage pool? I have tried replacing it through the GUI, but it just displays "Replace Disk: None" with ...

Storage > View Disks:

[Screenshot: view disks.JPG]

GUI ZPOOL Status:

[Screenshot: zpool status.JPG]

CLI freenas debug:

[Screenshot: debug.JPG]

Specs:
Freenas Build: FreeNAS-8.0.2-RELEASE-amd64 (8288)
OS Version: FreeBSD 8.2-RELEASE-p3
Box: HP Proliant N36L Microserver
Drives: 5x WD (WD20EARX)
RAM: 2GB ECC

Look forward to your responses :)
 

ProtoSD

MVP
Joined
Jul 1, 2011
Messages
3,348
Polzy wrote: "My freenas had been running extremely slow (10 minutes to map to server, impossible to open files etc. and scrubs had identified chksum errors on one of the drives constantly."

Probably because you don't have enough RAM; you listed "RAM: 2GB ECC".

Something about that CLI screenshot looks funny, did you type the header "ZFS POOLS STATUS" yourself?

OK, I'd say you're at about step #5 below, where it says 'Unavailable'. So instead of typing "zpool replace Media /dev/ada3", try typing "zpool replace Media /dev/gptid/f6e..." (with the rest of that big long string, including the dashes).

OR, if that doesn't work, try using that OTHER big long number starting with 6648 but without the /dev before it.

One of those should work.


  1. Determine which disk needs to be replaced (for example ada7) in a raidz1/z2 pool (for example tank)
  2. Identify the physical disk
  3. Shut down the system
  4. Pull ada7 out of the system and replace it with a new disk (same size or larger) in the same port
  5. Power on the system (tank will be in a DEGRADED state and /dev/ada7 Unavailable)
  6. From the command line, type zpool replace tank ada7
  7. The pool will begin resilvering, which can take a long time; let it finish
  8. You can check the status while this is happening with zpool status -v
  9. After it finishes, run zpool status -v again; it will still say DEGRADED
  10. Look for the device name /dev/ada7/old
  11. Type the command zpool detach tank /dev/ada7/old
  12. Check the status again: DEGRADED and /dev/ada7/old should be gone, and the pool state should be ONLINE
  13. Type the command zpool export tank
  14. Do an Auto Import from the GUI
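For reference, the command-line part of that procedure can be written down as a small shell function. This is a sketch, not an official FreeNAS script: `replace_steps` is a hypothetical name, and it only prints the commands, because each one has to wait for the previous step (the resilver in particular) to finish. The `/dev/ada7/old` detach target is copied from the list; check the exact name `zpool status` reports before using it.

```sh
#!/bin/sh
# Print (never run) the command sequence from the procedure above, in order.
# Nothing is executed: the resilver started by the first command must finish,
# as shown by zpool status -v, before the detach and export lines apply.
# Usage: replace_steps POOL DISK   (the list's example names are tank and ada7)
replace_steps() {
    pool=$1
    disk=$2
    echo "zpool replace $pool $disk"          # start resilvering onto the new disk
    echo "zpool status -v"                    # repeat until the resilver has finished
    echo "zpool detach $pool /dev/$disk/old"  # old-device name as written in the list
    echo "zpool status -v"                    # DEGRADED and the old device should be gone
    echo "zpool export $pool"                 # then Auto Import from the GUI
}

replace_steps tank ada7
```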
 

Polzy

ProtoSD wrote: "Something about that CLI screenshot looks funny, did you type the header "ZFS POOLS STATUS" yourself?"

I have SSH set up and connected to the console, and ran /usr/local/bin/freenas-debug -z (I only took a screenshot of the relevant section).

ProtoSD wrote: "try typing "zpool replace Media /dev/gptid/f6e... (and the rest of that big long string including the dashes)"

I have tried the commands with the old gptid that I can see listed in "View Disks", but they didn't work. Output below:

[root@freenas] ~# zpool replace Media /dev/gptid/f6e68c56-e82d-11e0-ae11-3cd92b0cfc7d
cannot open '/dev/gptid/f6e68c56-e82d-11e0-ae11-3cd92b0cfc7d': No such file or directory


ProtoSD wrote: "if that doesn't work, try using that OTHER big long number starting with 6648 but without the /dev before it."

Also didn't have any luck using the 6648 number, output below

[root@freenas] ~# zpool replace Media 6648684304901970743
cannot open '6648684304901970743': no such GEOM provider
must be a full path or shorthand device name
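Both errors are consistent with how `zpool replace` parses its arguments: zpool(8) gives the form `zpool replace pool old_device [new_device]`, and when `new_device` is omitted it defaults to `old_device`. With a single GUID argument, zpool therefore also tries to open the GUID as the new device, which fails with "no such GEOM provider". The sketch below uses a hypothetical helper, `replace_cmd`, that only prints the command and refuses the single-argument form when the old device is an all-digit GUID.

```sh
# replace_cmd POOL OLD [NEW]: print the zpool replace command to run (never runs it).
#   OLD: device name, or the numeric GUID zpool status shows for a missing vdev
#   NEW: real device node; if omitted, zpool uses OLD as the new device too
replace_cmd() {
    pool=$1 old=$2 new=${3:-}
    if [ -z "$new" ]; then
        case $old in
            *[!0-9]*) ;;   # looks like a device name: the single-argument form is fine
            *) echo "error: $old is a GUID; give the new device as a third argument" >&2
               return 1 ;;
        esac
        echo "zpool replace $pool $old"
    else
        echo "zpool replace $pool $old $new"
    fi
}

replace_cmd Media 6648684304901970743 /dev/ada3p2
```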
 

Polzy

Glabel Status:

[root@freenas] ~# glabel status
Name Status Components
ufs/FreeNASs3 N/A da0s3
ufs/FreeNASs4 N/A da0s4
ufs/FreeNASs1a N/A da0s1a
gptid/f345e6e9-e82d-11e0-ae11-3cd92b0cfc7d N/A ada0p2
gptid/f48181dd-e82d-11e0-ae11-3cd92b0cfc7d N/A ada1p2
gptid/f5b3d7bd-e82d-11e0-ae11-3cd92b0cfc7d N/A ada2p2
gptid/41d96622-2863-11e1-872c-3cd92b0cfc7d N/A ada3p1
gptid/96782f11-2863-11e1-872c-3cd92b0cfc7d N/A ada3p2
gptid/f818db2a-e82d-11e0-ae11-3cd92b0cfc7d N/A ada4p2

Gpart Show:


[root@freenas] ~# gpart show
=> 63 30695301 da0 MBR (15G)
63 1930257 1 freebsd [active] (943M)
1930320 63 - free - (32K)
1930383 1930257 2 freebsd (943M)
3860640 3024 3 freebsd (1.5M)
3863664 41328 4 freebsd (20M)
3904992 26790372 - free - (13G)

=> 0 1930257 da0s1 BSD (943M)
0 16 - free - (8.0K)
16 1930241 1 !0 (943M)

=> 34 3907029101 ada0 GPT (1.8T)
34 94 - free - (47K)
128 4194304 1 freebsd-swap (2.0G)
4194432 3902834703 2 freebsd-zfs (1.8T)

=> 34 3907029101 ada1 GPT (1.8T)
34 94 - free - (47K)
128 4194304 1 freebsd-swap (2.0G)
4194432 3902834703 2 freebsd-zfs (1.8T)

=> 34 3907029101 ada2 GPT (1.8T)
34 94 - free - (47K)
128 4194304 1 freebsd-swap (2.0G)
4194432 3902834703 2 freebsd-zfs (1.8T)

=> 34 3907029101 ada3 GPT (1.8T)
34 94 - free - (47K)
128 4194304 1 freebsd-swap (2.0G)
4194432 3902834703 2 freebsd-zfs (1.8T)

=> 34 3907029101 ada4 GPT (1.8T)
34 94 - free - (47K)
128 4194304 1 freebsd-swap (2.0G)
4194432 3902834703 2 freebsd-zfs (1.8T)
 

ProtoSD

Ok try this:

zpool replace Media /dev/gptid/96782f11-2863-11e1-872c-3cd92b0cfc7d

When you took your disk out and formatted it, the label changed. If you compare this list of gptids with the list from your original output, all of the other IDs are the same. It's strange that none of the others show an ID for the swap partition, or that this one does.
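That comparison can be scripted. A minimal sketch, assuming the three-column `glabel status` layout shown above (name, status, components); `gptid_map` is a hypothetical helper that lists each gptid label with the partition it belongs to, so a label that `zpool status` expects but `glabel` no longer shows stands out.

```sh
# gptid_map: read `glabel status` output on stdin and print "label partition"
# for every gptid/ label, skipping the header and the ufs/ labels of the boot device.
gptid_map() {
    awk '$1 ~ /^gptid\// { print $1, $3 }'
}

# Hypothetical usage on the FreeNAS shell:
#   glabel status | gptid_map
# Every gptid path zpool status lists for the pool should appear in the first column.
```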
 

Polzy

Interesting indeed...

Results:

[root@freenas] ~# zpool replace Media /dev/gptid/96782f11-2863-11e1-872c-3cd92b0cfc7d
cannot replace /dev/gptid/96782f11-2863-11e1-872c-3cd92b0cfc7d with /dev/gptid/96782f11-2863-11e1-872c-3cd92b0cfc7d: no such device in pool
 

ProtoSD

Looking back at the CLI screenshot, did you put the formatted disk back in and then run a scrub?

If that's what happened, it looks like it left the formatted disk in an in-between state. *IF* so, try the steps below (probably worth trying anyway), but let me know about the scrub first.


Let's try this (copy and paste one command at a time, keeping all the quote characters, including the backquotes):

Code:
dd if=/dev/zero of=/dev/ada3 bs=1m count=1

dd if=/dev/zero of=/dev/ada3 bs=1m oseek=`diskinfo ada0 | awk '{print int($3 / (1024*1024)) - 4;}'`


Followed by the 'glabel status' and 'gpart show' again.

Oh, and a zpool status -v


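The dd commands will wipe the partition information on that disk: the first zeroes the first 1 MiB, where the primary GPT lives, and the second zeroes the last few MiB, where GPT keeps its backup copy.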
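The `oseek` arithmetic in the second command can be checked by hand. `diskinfo` prints the media size in bytes as its third field; the awk expression turns that into whole MiB and subtracts 4, and with `bs=1m` dd skips that many 1 MiB blocks before writing zeros until the end of the device. The sketch below, with hypothetical helper names, only prints the commands. It takes the size of the disk being wiped, whereas the command above reads it from ada0, which works here only because all five drives are the same model. The example size of 2000398934016 bytes is an assumption, but it agrees with the dd result posted later: a 1907725 MiB seek plus the 4284416 bytes written ends exactly at 2000398934016.

```sh
# tail_seek_mib MEDIASIZE_BYTES: the oseek value (in 1 MiB blocks) used by the
# second dd command, i.e. the whole-MiB size of the device minus 4.
tail_seek_mib() {
    awk -v bytes="$1" 'BEGIN { print int(bytes / (1024 * 1024)) - 4 }'
}

# wipe_cmds DISK MEDIASIZE_BYTES: print (not run) the two dd commands that zero
# the first 1 MiB and the last few MiB of DISK.
wipe_cmds() {
    echo "dd if=/dev/zero of=/dev/$1 bs=1m count=1"
    echo "dd if=/dev/zero of=/dev/$1 bs=1m oseek=$(tail_seek_mib "$2")"
}

# On FreeBSD, feed it the size of the same disk that will be wiped:
#   wipe_cmds ada3 "$(diskinfo ada3 | awk '{print $3}')"
wipe_cmds ada3 2000398934016
```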
 

Polzy

Will try the suggestions shortly and let you know the result, but yes, it appears that a scrub task had been kicked off (the last time it started, I stopped it from the command line, as I didn't know whether running a scrub without the drive present would cause any issues).
 

Polzy

Results for dd if=/dev/zero of=/dev/ada3 bs=1m count=1:

[root@freenas] ~# dd if=/dev/zero of=/dev/ada3 bs=1m count=1
1+0 records in
1+0 records out
1048576 bytes transferred in 10.458542 secs (100260 bytes/sec)

Results for dd if=/dev/zero of=/dev/ada3 bs=1m oseek=`diskinfo ada0 | awk '{print int($3 / (1024*1024)) - 4;}'`:

[root@freenas] ~# dd if=/dev/zero of=/dev/ada3 bs=1m oseek=`diskinfo ada0 | awk '{print int($3 / (1024*1024)) - 4;}'`
dd: /dev/ada3: short write on character device
dd: /dev/ada3: end of device
5+0 records in
4+1 records out
4284416 bytes transferred in 45.651567 secs (93850 bytes/sec)

Glabel Status:

[root@freenas] ~# glabel status
Name Status Components
ufs/FreeNASs3 N/A da0s3
ufs/FreeNASs4 N/A da0s4
ufs/FreeNASs1a N/A da0s1a
gptid/f345e6e9-e82d-11e0-ae11-3cd92b0cfc7d N/A ada0p2
gptid/f48181dd-e82d-11e0-ae11-3cd92b0cfc7d N/A ada1p2
gptid/f5b3d7bd-e82d-11e0-ae11-3cd92b0cfc7d N/A ada2p2
gptid/f818db2a-e82d-11e0-ae11-3cd92b0cfc7d N/A ada4p2

[root@freenas] ~# gpart show
=> 63 30695301 da0 MBR (15G)
63 1930257 1 freebsd [active] (943M)
1930320 63 - free - (32K)
1930383 1930257 2 freebsd (943M)
3860640 3024 3 freebsd (1.5M)
3863664 41328 4 freebsd (20M)
3904992 26790372 - free - (13G)

=> 0 1930257 da0s1 BSD (943M)
0 16 - free - (8.0K)
16 1930241 1 !0 (943M)

=> 34 3907029101 ada0 GPT (1.8T)
34 94 - free - (47K)
128 4194304 1 freebsd-swap (2.0G)
4194432 3902834703 2 freebsd-zfs (1.8T)

=> 34 3907029101 ada1 GPT (1.8T)
34 94 - free - (47K)
128 4194304 1 freebsd-swap (2.0G)
4194432 3902834703 2 freebsd-zfs (1.8T)

=> 34 3907029101 ada2 GPT (1.8T)
34 94 - free - (47K)
128 4194304 1 freebsd-swap (2.0G)
4194432 3902834703 2 freebsd-zfs (1.8T)

=> 34 3907029101 ada4 GPT (1.8T)
34 94 - free - (47K)
128 4194304 1 freebsd-swap (2.0G)
4194432 3902834703 2 freebsd-zfs (1.8T)

Zpool Status -v:

[root@freenas] ~# zpool status -v
pool: Media
state: DEGRADED
status: One or more devices could not be opened. Sufficient replicas exist for
the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
see: http://www.sun.com/msg/ZFS-8000-2Q
scrub: scrub in progress for 2h43m, 29.58% done, 6h29m to go
config:

NAME                                            STATE     READ WRITE CKSUM
Media                                           DEGRADED     0     0     0
  raidz1                                        DEGRADED     0     0     0
    gptid/f345e6e9-e82d-11e0-ae11-3cd92b0cfc7d  ONLINE       0     0     0
    gptid/f48181dd-e82d-11e0-ae11-3cd92b0cfc7d  ONLINE       0     0     0
    gptid/f5b3d7bd-e82d-11e0-ae11-3cd92b0cfc7d  ONLINE       0     0     0
    6648684304901970743                         UNAVAIL      0     0     0  was /dev/gptid/f6e68c56-e82d-11e0-ae11-3cd92b0cfc7d
    gptid/f818db2a-e82d-11e0-ae11-3cd92b0cfc7d  ONLINE       0     0     0

errors: No known data errors
 

ProtoSD

Give this a try again:

zpool replace Media /dev/gptid/f6e68c56-e82d-11e0-ae11-3cd92b0cfc7d

If that doesn't work I'm out of ideas for the moment.

I guess you could try an ls /dev/gptid and post the output just for grins, since earlier the output of this command said it didn't exist:

[root@freenas] ~# zpool replace Media /dev/gptid/f6e68c56-e82d-11e0-ae11-3cd92b0cfc7d
cannot open '/dev/gptid/f6e68c56-e82d-11e0-ae11-3cd92b0cfc7d': No such file or directory
 

Polzy

Unfortunately, no luck with the zpool replace command :( Thanks for all your help to date though!! Muchly appreciated. Output is below:

[root@freenas] ~# zpool replace Media /dev/gptid/f6e68c56-e82d-11e0-ae11-3cd92b0cfc7d
cannot open '/dev/gptid/f6e68c56-e82d-11e0-ae11-3cd92b0cfc7d': No such file or directory
[root@freenas] ~# ls /dev/gptid
./
../
f345e6e9-e82d-11e0-ae11-3cd92b0cfc7d
f5b3d7bd-e82d-11e0-ae11-3cd92b0cfc7d
f48181dd-e82d-11e0-ae11-3cd92b0cfc7d
f818db2a-e82d-11e0-ae11-3cd92b0cfc7d
 

William Grzybowski

Wizard
iXsystems
Joined
May 27, 2011
Messages
1,754
That replace command that ProtoSD is giving you does not make sense.

You were right at the beginning, but then went astray by using "dd" over ada3.

Do the following:
Go to the GUI: Storage -> View Volumes -> View Disks.
On the disk that shows neither a device name nor a serial number, click Replace, then select the disk (ada3) and click OK.

That should work.

If it doesn't, hopefully it will at least create the GPT on the disk again. Then issue the command:
zpool replace Media 6648684304901970743 /dev/ada3p2

Let me know how that goes...
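The long number in that command is the GUID that `zpool status` prints in place of the missing vdev's name. A minimal sketch for pulling it out of the status output, assuming the column layout shown earlier in this thread (NAME, STATE, READ, WRITE, CKSUM); `unavail_guid` is a hypothetical helper name.

```sh
# unavail_guid: read `zpool status` output on stdin and print the NAME column of
# every config row whose STATE is UNAVAIL; for a missing disk that is its GUID.
# NF >= 5 skips header lines such as "state: UNAVAIL", which have fewer fields.
unavail_guid() {
    awk 'NF >= 5 && $2 == "UNAVAIL" { print $1 }'
}

# Hypothetical usage, printing the replace command for review before running it:
#   guid=$(zpool status Media | unavail_guid)
#   echo zpool replace Media "$guid" /dev/ada3p2
```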

EDIT:

If that still doesn't work (or you still could not replace using the GUI), the disk hasn't been labeled with GPT, so run:
gpart create -s gpt /dev/ada3 && gpart add -b 128 -t freebsd-swap -s 2097152 ada3 && gpart add -t freebsd-zfs ada3

Then again: zpool replace Media 6648684304901970743 /dev/ada3p2
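The sizes in that gpart command are counted in 512-byte sectors, the same units as the gpart show output earlier in the thread. A small conversion helper (hypothetical, for illustration) makes them easy to compare: the existing pool members have a 4194304-sector swap partition, which is 2.0 GiB, while `-s 2097152` creates a 1.0 GiB one, so the freebsd-zfs partition that fills the rest of ada3 comes out slightly larger than on the other disks. That should not block `zpool replace`, which only requires the new device to be at least as large as the one it replaces.

```sh
# sectors_to_gib SECTORS [SECTOR_SIZE]: convert a gpart sector count to GiB.
# SECTOR_SIZE defaults to 512 bytes, the unit gpart uses on these drives.
sectors_to_gib() {
    awk -v s="$1" -v b="${2:-512}" 'BEGIN { printf "%.1f\n", s * b / 1073741824 }'
}

sectors_to_gib 4194304      # prints 2.0: swap partition on ada0, ada1, ada2 and ada4
sectors_to_gib 2097152      # prints 1.0: swap size requested by -s 2097152 above
sectors_to_gib 3902834703   # prints 1861.0: freebsd-zfs partition on the existing members
```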
 

Polzy

Morning - thanks for your responses - latest outputs are below.

William Grzybowski wrote:
"Go to the GUI, Storage -> View Volumes -> View Disks
On the Disk that does not show the device name nor serial click Replace, then select the disk, ada3, click OK."

The GUI paused for about 10 minutes, then went back to the View Disks Screen. Refreshed and still showing as "unknown" and unavailable.

William Grzybowski wrote:
"if it doesn't, hopefully it will at least create the GPT table in the disk once again. Then issue the command:
zpool replace Media 6648684304901970743 /dev/ada3p2"

Seems to be getting warmer... The CLI paused for about 5 minutes, the HDD light on the server came on solid, and then it errored out as shown below.

[root@freenas] ~# zpool replace Media 6648684304901970743 /dev/ada3p2
cannot replace 6648684304901970743 with /dev/ada3p2: /dev/ada3p2 is busy

William Grzybowski wrote:
"If that still doesn't work (or you still could not replace using GUI), the disks hasn't been labeled with GPT, then run:
gpart create -s gpt /dev/ada3 && gpart add -b 128 -t freebsd-swap -s 2097152 ada3 && gpart add -t freebsd-zfs ada3
Then again: zpool replace Media 6648684304901970743 /dev/ada3p2"

Tried this as well - results below.

[root@freenas] ~# gpart create -s gpt /dev/ada3 && gpart add -b 128 -t freebsd-swap -s 2097152 ada3 && gpart add -t freebsd-zfs ada3
gpart: geom 'ada3': File exists
 

ProtoSD

William Grzybowski wrote:
"That replace command that protosd is giving you does not make sense.
You were right with the beginning but them missed it using "dd" over ada3."

My thought was that, since this disk had already been wiped and the device zpool was showing for it didn't match, resetting it to 'wiped' and letting zpool replace recreate the GPT might work. Also, glabel status was showing two IDs for that same disk, and those didn't work either. dd'ing over ada3 shouldn't cause any harm, since it had already been wiped; removing the GPT with dd just reset it to clean again. Doing a replace from the GUI shouldn't be any different from the command line, except that you might get lucky with a valid device name.

Anyway, something is out of sync and this shouldn't be such a problem.

Any other suggestions William?
 

Polzy

ZPool Status:

[root@freenas] ~# zpool status
pool: Media
state: DEGRADED
status: One or more devices could not be opened. Sufficient replicas exist for
the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
see: http://www.sun.com/msg/ZFS-8000-2Q
scrub: none requested
config:

NAME                                            STATE     READ WRITE CKSUM
Media                                           DEGRADED     0     0     0
  raidz1                                        DEGRADED     0     0     0
    gptid/f345e6e9-e82d-11e0-ae11-3cd92b0cfc7d  ONLINE       0     0     0
    gptid/f48181dd-e82d-11e0-ae11-3cd92b0cfc7d  ONLINE       0     0     0
    gptid/f5b3d7bd-e82d-11e0-ae11-3cd92b0cfc7d  ONLINE       0     0     0
    6648684304901970743                         UNAVAIL      0     0     0  was /dev/gptid/f6e68c56-e82d-11e0-ae11-3cd92b0cfc7d
    gptid/f818db2a-e82d-11e0-ae11-3cd92b0cfc7d  ONLINE       0     0     0

errors: No known data errors

Glabel Status:

[root@freenas] ~# glabel status
Name Status Components
ufs/FreeNASs3 N/A da0s3
ufs/FreeNASs4 N/A da0s4
ufs/FreeNASs1a N/A da0s1a
gptid/f345e6e9-e82d-11e0-ae11-3cd92b0cfc7d N/A ada0p2
gptid/f48181dd-e82d-11e0-ae11-3cd92b0cfc7d N/A ada1p2
gptid/f5b3d7bd-e82d-11e0-ae11-3cd92b0cfc7d N/A ada2p2
gptid/a8f78c1e-28ae-11e1-8b7f-3cd92b0cfc7d N/A ada3p1
gptid/0553eb6c-28af-11e1-8b7f-3cd92b0cfc7d N/A ada3p2
gptid/f818db2a-e82d-11e0-ae11-3cd92b0cfc7d N/A ada4p2
 

William Grzybowski

I don't see a reason why "zpool replace Media 6648684304901970743 /dev/ada3p2" wouldn't work.

It _might_ be because they are the same disk...

If that doesn't help, make sure your HD is OK (check SMART or something similar).

Otherwise, it might be a good idea to dd over the beginning and the end of ada3p2:

dd if=/dev/zero of=/dev/ada3p2 bs=1m count=4
dd if=/dev/zero of=/dev/ada3p2 bs=1m oseek=`diskinfo ada3p2 | awk '{print int($3 / (1024*1024)) - 4;}'`

Then run zpool replace again (note: it really is ada3p2 above, the partition, not just ada3).
 

Polzy

Tonight I came home and ran the two commands as suggested - interesting results below.

[root@freenas] ~# dd if=/dev/zero of=/dev/ada3p2 bs=1m count=4
4+0 records in
4+0 records out
4194304 bytes transferred in 116.353832 secs (36048 bytes/sec)

[root@freenas] ~# dd if=/dev/zero of=/dev/ada3p2 bs=1m oseek=`diskinfo ada3p2 | awk '{print int($3 / (1024*1024)) - 4;}'`
dd: /dev/ada3p2: short write on character device
dd: /dev/ada3p2: end of device
5+0 records in
4+1 records out
4201984 bytes transferred in 46.967432 secs (89466 bytes/sec)

Not having compared this with the other drives, would I be correct in assuming that the read/write speed of this single drive is a massive issue? That may explain the slowness I experienced, since removing the drive and running in a degraded state actually brought performance back to what it was before there were any issues.
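One way to put numbers on this is to convert the dd summaries in this thread into throughput. A minimal sketch, assuming the FreeBSD dd summary format shown above ("N bytes transferred in T secs (R bytes/sec)"); `dd_rate` is a hypothetical helper. Both writes to ada3 in this thread came out well under 100 KiB/s. For a comparison that does not touch pool data, a read-only test such as `dd if=/dev/ada0 of=/dev/null bs=1m count=100` on a healthy member prints a summary line in the same format.

```sh
# dd_rate: read dd's summary line on stdin and print the average rate in KiB/s,
# computed from the byte count (field 1) and the elapsed seconds (field 5).
dd_rate() {
    awk '/bytes transferred in/ { printf "%.1f KiB/s\n", $1 / $5 / 1024 }'
}

# The first dd on ada3p2 from the post above:
echo "4194304 bytes transferred in 116.353832 secs (36048 bytes/sec)" | dd_rate   # prints 35.2 KiB/s
```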

I ran zpool replace again, followed by zpool status, and it appears to be resilvering (albeit at a snail's pace, if I wanted to wait roughly 1,680 days to write back about 1.2TB of data).

[root@freenas] ~# zpool status -v
pool: Media
state: DEGRADED
status: One or more devices is currently being resilvered. The pool will
continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
scrub: resilver in progress for 0h6m, 0.00% done, 40316h42m to go
config:

NAME                                            STATE     READ WRITE CKSUM
Media                                           DEGRADED     0     0     0
  raidz1                                        DEGRADED     0     0     0
    gptid/f345e6e9-e82d-11e0-ae11-3cd92b0cfc7d  ONLINE       0     0     0
    gptid/f48181dd-e82d-11e0-ae11-3cd92b0cfc7d  ONLINE       0     0     0
    gptid/f5b3d7bd-e82d-11e0-ae11-3cd92b0cfc7d  ONLINE       0     0     0
    replacing                                   DEGRADED     0     0     0
      6648684304901970743                       UNAVAIL      0     0     0  was /dev/gptid/f6e68c56-e82d-11e0-ae11-3cd92b0cfc7d
      ada3p2                                    ONLINE       0     0     0  4.25M resilvered
    gptid/f818db2a-e82d-11e0-ae11-3cd92b0cfc7d  ONLINE       0     0     0
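The time estimate in that status output is easier to judge in days. A minimal sketch, assuming the `NNNhMMm` format zpool prints on the scrub/resilver line; `eta_days` is a hypothetical helper.

```sh
# eta_days: convert a zpool status time estimate such as "40316h42m" to days.
# Splitting on "h" and "m" puts the hours in $1 and the minutes in $2.
eta_days() {
    echo "$1" | awk -F'[hm]' '{ printf "%.1f\n", ($1 + $2 / 60) / 24 }'
}

eta_days 40316h42m   # prints 1679.9, i.e. about 4.6 years
```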
 

William Grzybowski

Yes, this is extremely slow. It's possible the drive has some bad sectors and is trying to recover...

Check smartctl -a /dev/ada3
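When reading that output, the attributes most directly tied to bad sectors are usually Reallocated_Sector_Ct, Current_Pending_Sector and Offline_Uncorrectable. A minimal sketch that pulls their raw values out of the attribute table, assuming smartctl's usual ten-column layout with RAW_VALUE last; `bad_sector_attrs` is a hypothetical helper, and which attributes a given drive reports can vary.

```sh
# bad_sector_attrs: read smartctl's attribute table on stdin and print the
# attribute name and RAW_VALUE (field 10) for the sector-health attributes.
bad_sector_attrs() {
    awk '$2 ~ /^(Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable)$/ { print $2, $10 }'
}

# Hypothetical usage on the FreeNAS shell:
#   smartctl -A /dev/ada3 | bad_sector_attrs
# Non-zero raw values on these lines point to sectors the drive could not read or had to remap.
```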
 