2nd resilver much slower than the 1st

Status
Not open for further replies.

andyl

Explorer
Joined
Apr 20, 2012
Messages
76
Hi all,

I'm in the middle of upgrading my raidz array. The array was configured as:

1.5 TB Seagate
1.5 TB Seagate
1.5 TB Seagate
2.0 TB Hitachi

And I'm upgrading it to:

2.0 TB Seagate
2.0 TB Seagate
2.0 TB Seagate
2.0 TB Hitachi

I've replaced the first drive as per the following procedure:
http://doc.freenas.org/index.php/Volumes#Replacing_a_Failed_Drive_or_Zil_Device

It resilvered in about 7 hours. I've followed the same procedure for the second drive, but now it's running much slower:

Code:
[user@host] ~# zpool status
  pool: Vol1
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
	continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress for 18h39m, 49.01% done, 19h24m to go
config:

	NAME                                              STATE     READ WRITE CKSUM
	Vol1                                              DEGRADED     0     0     0
	  raidz1                                          DEGRADED     0     0     0
	    gptid/7d194783-3124-11e2-8549-e8393520a421    ONLINE       0     0     0
	    replacing                                     DEGRADED     0     0     0
	      ada1p2                                      OFFLINE      0     0     0
	      gptid/836d0449-3162-11e2-a5d4-e8393520a421  ONLINE       0     0     0  637G resilvered
	    ada2p2                                        ONLINE       0     0     0
	    ada3p2                                        ONLINE       0     0     0

errors: No known data errors


Is this expected behaviour, or should I be worried?

Thanks!
Andy.
 

bollar

Patron
Joined
Oct 28, 2012
Messages
411
Estimates tend to be quite inaccurate in the early stages of resilvering. Let it run and check on it later.
 

andyl

Resilvering of the second disk completed after ~40 hours. When I attempted to detach the old disk through the GUI, resilvering restarted on the newly installed disk and took another ~40 hours. Did I misclick, or did something else happen?

Current status:
Code:
zpool status
  pool: Vol1
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
	continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress for 42h45m, 100.00% done, 0h0m to go
config:

	NAME                                            STATE     READ WRITE CKSUM
	Vol1                                            ONLINE       0     0     0
	  raidz1                                        ONLINE       0     0     0
	    gptid/7d194783-3124-11e2-8549-e8393520a421  ONLINE       0     0     0
	    gptid/836d0449-3162-11e2-a5d4-e8393520a421  ONLINE       0     0     0  1.41T resilvered
	    ada2p2                                      ONLINE       0     0     0
	    ada3p2                                      ONLINE       0     0     0


I note that the volume state has returned to online, but the status indicates resilvering has not completed. Should I install the third disk yet?
 

andyl

Curiouser and curiouser. It's still resilvering, even though it's 100% complete.

Code:
zpool status
  pool: Vol1
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress for 47h42m, 100.00% done, 0h0m to go
config:

        NAME                                            STATE     READ WRITE CKSUM
        Vol1                                            ONLINE       0     0     0
          raidz1                                        ONLINE       0     0     0
            gptid/7d194783-3124-11e2-8549-e8393520a421  ONLINE       0     0     0
            gptid/836d0449-3162-11e2-a5d4-e8393520a421  ONLINE       0     0     0  1.57T resilvered
            ada2p2                                      ONLINE       0     0     0
            ada3p2                                      ONLINE       0     0     0

errors: No known data errors
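
To compare status outputs like the ones above without reading the whole table each time, the three interesting values (elapsed time, percent done, amount resilvered) can be pulled out with awk. This is only a sketch: `resilver_summary` is a made-up helper name, and it assumes the old-style `scrub:` line format shown in this thread, which newer ZFS versions do not use.

```shell
#!/bin/sh
# Summarise resilver progress from old-style `zpool status` text on stdin.
# resilver_summary is a hypothetical helper name, not a ZFS or FreeNAS tool.
resilver_summary() {
    awk '
        /resilver in progress/ {
            for (i = 1; i <= NF; i++) {
                if ($i == "for")   elapsed = $(i + 1)   # e.g. "47h42m,"
                if ($i == "done,") pct = $(i - 1)       # e.g. "100.00%"
            }
            sub(/,$/, "", elapsed)                      # drop trailing comma
        }
        / resilvered$/ { amount = $(NF - 1) }           # e.g. "1.57T"
        END { print elapsed, pct, amount }
    '
}

# Sample lines copied from the status output above.
resilver_summary <<'EOF'
 scrub: resilver in progress for 47h42m, 100.00% done, 0h0m to go
            gptid/836d0449-3162-11e2-a5d4-e8393520a421  ONLINE       0     0     0  1.57T resilvered
EOF
```

On a live box the input would come from `zpool status Vol1 | resilver_summary`, run under sh rather than the default csh.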
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Well, I'd give it a few more hours. Here's what I did when I was testing FreeNAS on my test bed: I did the replace. After the resilvering finished, I shut down the server and removed the drive that was being replaced. I booted FreeNAS back up and detached the missing (considered failed) drive, then started on the next disk.

I always do a shutdown rather than replacing a drive with the power on, because I've seen two controllers that didn't seem to support hot-swap correctly. Besides, a one-minute power-down won't be the end of the world.

Edit: You may have a drive that is failing. Scrubs are very hard on hard drives. Just let it finish whatever it's doing. You may have misclicked something. It shouldn't have started another scrub.
 

andyl

Thanks for the response. I'll let it run until the weekend and then see what's going on.
 

Stephens

Patron
Joined
Jun 19, 2012
Messages
496
Code:
action: Wait for the resilver to complete.
 scrub: resilver in progress for 47h42m, 100.00% done, 0h0m to go


Certainly seems a bit confusing.

Anyway, in 5 hours (between posts), he went from 1.41T resilvered to 1.57T resilvered. It's a 2TB drive so that gives some estimate as to how long this should run. He can periodically check to make sure it's still progressing. And yes, I'd let it finish.
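
As a rough worked version of that estimate, the two status outputs give 1.41T at 42h45m and 1.57T at 47h42m, which is 297 minutes apart. The sketch below turns that into a rate. It assumes zpool's "T" means 1024G, and `resilver_rate` is a made-up helper name, not anything shipped with ZFS or FreeNAS.

```shell
#!/bin/sh
# Rough resilver rate between two `zpool status` samples.
# Arguments: earlier amount (T), later amount (T), minutes between samples.
# resilver_rate is a hypothetical helper, not part of ZFS or FreeNAS.
resilver_rate() {
    awk -v a="$1" -v b="$2" -v m="$3" \
        'BEGIN { printf "%.1f\n", (b - a) * 1024 * 60 / m }'
}

# 1.41T at 42h45m and 1.57T at 47h42m: 297 minutes apart.
resilver_rate 1.41 1.57 297   # prints 33.1 (G per hour)
```

That works out to roughly 33G an hour, which is only a guide to whether the counter is still moving, not a promise of when it will stop.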
 

andyl

I like the theory, but:

Code:
 zpool status
  pool: Vol1
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress for 65h49m, 100.00% done, 0h0m to go
config:

        NAME                                            STATE     READ WRITE CKSUM
        Vol1                                            ONLINE       0     0     0
          raidz1                                        ONLINE       0     0     0
            gptid/7d194783-3124-11e2-8549-e8393520a421  ONLINE       0     0     0
            gptid/836d0449-3162-11e2-a5d4-e8393520a421  ONLINE       0     0     0  2.16T resilvered
            ada2p2                                      ONLINE       0     0     0
            ada3p2                                      ONLINE       0     0     0

errors: No known data errors


Now it's gone past 2TB. Anybody got any ideas what it's doing, and when it might be finished?

Thanks for the help!
 

bollar

If it were me, I'd consider a reboot. That's not supposed to be dangerous during a resilver and maybe that will help you understand what's going on.
 

cyberjock

I'd do a reboot and see what a 'zpool status' says.
 

andyl

cyberjock said: "I'd do a reboot and see what a 'zpool status' says."

It appears to have started resilvering again!

Code:
 zpool status
  pool: Vol1
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress for 0h2m, 0.06% done, 56h43m to go
config:

        NAME                                            STATE     READ WRITE CKSUM
        Vol1                                            ONLINE       0     0     0
          raidz1                                        ONLINE       0     0     0
            gptid/7d194783-3124-11e2-8549-e8393520a421  ONLINE       0     0     0
            gptid/836d0449-3162-11e2-a5d4-e8393520a421  ONLINE       0     0     0  785M resilvered
            ada2p2                                      ONLINE       0     0     0
            ada3p2                                      ONLINE       0     0     0

errors: No known data errors
 

cyberjock

Well, guess you are stuck waiting for it to finish :( that's crappy.
 

andyl

Seems to be doing the same thing again....

Code:
zpool status
  pool: Vol1
 state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress for 41h15m, 100.00% done, 0h0m to go
config:

        NAME                                            STATE     READ WRITE CKSUM
        Vol1                                            ONLINE       0     0     0
          raidz1                                        ONLINE       0     0     0
            gptid/7d194783-3124-11e2-8549-e8393520a421  ONLINE       0     0     0
            gptid/836d0449-3162-11e2-a5d4-e8393520a421  ONLINE       0     0     0  1.34T resilvered
            ada2p2                                      ONLINE       0     0     0
            ada3p2                                      ONLINE       0     0     0

errors: No known data errors


I have a good backup, so I could destroy the volume and re-create it, but that doesn't seem like the best option to me.
Can anybody offer any help?

Cheers!
 

paleoN

Wizard
Joined
Apr 22, 2012
Messages
1,403
What version are you on, 8.2 or earlier?

A few people with similar problems on earlier versions fixed them by upgrading to 8.3.

Also, please post the output of the following for each drive:
Code:
smartctl -q noserial -a /dev/adaX
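
To run that for every drive in one go, a small loop can generate the commands. This is a sketch under assumptions: the disks are taken to be ada0 through ada3, the output file names are arbitrary, and `smart_commands` is a made-up helper that only prints the commands so they can be checked first.

```shell
#!/bin/sh
# Print one smartctl command per whole-disk device (dry run).
# smart_commands is a hypothetical helper; the adaN names and the
# adaN.txt output files are assumptions about this particular box.
smart_commands() {
    for disk in "$@"; do
        printf 'smartctl -q noserial -a /dev/%s > %s.txt\n' "$disk" "$disk"
    done
}

smart_commands ada0 ada1 ada2 ada3
# Review the printed commands, then run them with:
#   smart_commands ada0 ada1 ada2 ada3 | sh
```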
 

cyberjock

Well, I'd say you have 2 options:

1. Let it finish (this is what I'd do unless you have a backup). It will eventually HAVE to finish, right?
2. You can stop the scrub with "zpool scrub -s Vol1"

Before you stop the scrub, think about this: what are you going to do once you stop it? It's pretty evident that something is wrong.

Sent ya a PM.

Edit: I'm slow. paleoN beat me to posting. I walked away from the computer for 45 minutes and didn't see his post until after I posted. The smartctl output may give a clue as to what is wrong. I'm wondering if a SATA cable is failing or something, so the scrub writes data, verifies it, finds errors, and rewrites it. I'm not sure whether scrubs verify writes, though. If I were designing it, I'd want it to verify any writes performed during a scrub, since you are already fixing something known to be broken and you want to be sure it is truly fixed.
 

andyl

smartctl fails when run as specified:

Code:
 smartctl -q noserial -a /dev/ada2p2
smartctl 5.42 2011-10-20 r3458 [FreeBSD 8.2-RELEASE-p9 amd64] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

/dev/ada2p2: Unable to detect device type
Smartctl: please specify device type with the -d option.

Use smartctl -h to get a usage summary


So I tried it with -d ata on the end:

Code:
 smartctl -q noserial -a /dev/ada2p2 -d ata
smartctl 5.42 2011-10-20 r3458 [FreeBSD 8.2-RELEASE-p9 amd64] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

Smartctl: Device Read Identity Failed: Inappropriate ioctl for device

A mandatory SMART command failed: exiting. To continue, add one or more '-T permissive' options.


This is an HP MicroServer N40L. Is there possibly some problem with the way I have the drives configured?
 

cyberjock

You should be pointing smartctl at /dev/ada2, not /dev/ada2p2. The adaXpY devices are partitions of a disk, not the disk itself.
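
If the device names are being copied out of zpool status, the partition suffix can be stripped mechanically. A minimal sketch, assuming adaNpM-style names only; `whole_disk` is a made-up helper, and gptid/... labels are passed through untouched because they cannot be mapped this way.

```shell
#!/bin/sh
# Map a partition node such as /dev/ada2p2 to its whole disk, /dev/ada2.
# whole_disk is a hypothetical helper; it only handles adaNpM-style names
# and leaves anything else (for example gptid/... labels) unchanged.
whole_disk() {
    printf '%s\n' "$1" | sed 's/\(ada[0-9][0-9]*\)p[0-9][0-9]*$/\1/'
}

whole_disk /dev/ada2p2    # prints /dev/ada2
```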
 

andyl

Gotcha - uploaded as attachments.

Edit: Forgot to post the version: FreeNAS-8.2.0-RELEASE-p1-x64 (r11950). I was upgrading the disks before upgrading FreeNAS, as I only have about 10 GB available on the volume.
 

Attachments

  • ada0.txt (7.1 KB)
  • ada1.txt (5.5 KB)
  • ada2.txt (7.2 KB)
  • ada3.txt (6.7 KB)

andyl

Gave up in the end. The resilver was still running after about 6 TB, so I zapped the whole box and am currently restoring from backup.
Thanks for all the assistance!
 