SOLVED Resilver running slowly with WD Blue (23 DAYS!)

scholztec

Dabbler
Joined
Mar 29, 2014
Messages
14
I have a drive currently resilvering that is taking a LONG time.

The TL;DR version:

The pool is an 8x6TB RAIDZ2, and is 81% full. I am trying to replace a drive in the array with a shucked WD Blue (WD60EZAZ). The resilver has been running for 2 days already. The write speed is dropping (29MB/s and falling), and the time to complete is increasing. The rate of % complete has stayed steady, at 0.03% per 10 minutes. At this rate, the resilver will take about 23 days to complete. I recently completed 2 separate resilvers on this array (see below for details) using WD 8TB Reds (WD80EFAX); each took about 20 hours (avg speed was 530MB/s). I’ve never had a resilver run this slowly (even on a Blue). What am I missing?
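For anyone checking the 23-day figure, here is a minimal sketch of the arithmetic, assuming the progress rate stays constant at the observed 0.03% per 10 minutes. The function name `days_to_finish` is just illustrative.

```python
def days_to_finish(percent_per_interval, interval_minutes, percent_done=0.0):
    """Estimate days remaining from a steady progress rate.

    percent_per_interval: percentage points gained per sampling interval
    interval_minutes:     length of the sampling interval in minutes
    percent_done:         percentage already completed (0 for a full run)
    """
    intervals_per_day = 24 * 60 / interval_minutes
    percent_per_day = percent_per_interval * intervals_per_day
    return (100.0 - percent_done) / percent_per_day


if __name__ == "__main__":
    # 0.03% every 10 minutes is 4.32% per day, so a full pass takes ~23 days.
    print(round(days_to_finish(0.03, 10), 1))
```

Passing the current completion as `percent_done` gives the time left rather than the total run time.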

The long version:

At the beginning of this saga, the pool consisted of:
4x 6TB WD Red WD60EFRX
2x 6TB WD Blue WD60EZRZ
1x 6TB Seagate ST6000DX000 (7200rpm)
1x 6TB Mediamax WL6000GSA6457 (WD OEM drive, “white label” rebrand)
I wanted to replace the Seagate, as it runs hot (obviously). I recently picked up a 6TB WD Elements (Blue WD60EZAZ) to shuck and replace the Seagate. After starting, I tried to cancel the replace (to change location of the 2 drives), but I screwed it up, and was left with a degraded array. No problem; this is what RAIDZ2 is for. I set up the new drive in the system, and set about resilvering. After a couple of hours, one of the other drives in the array (the Mediamax) started throwing errors, and ultimately died. Now I’m scared: I have no redundancy left. After running the resilver with the Blue for a couple of days, the speed is insanely slow (sub 30MB/s), gstat shows the Busy% at 100, and I’m terrified of losing another drive. I have another system in development with a bunch of WD 8TB drives (Red WD80EFAX), and figured that they might be faster. They were; a lot. I ran the 2 resilvers sequentially (not simultaneously); each took about 20 hours at 530MB/s (gstat Busy% was around 40).

Now that the pool is healthy, I need my 8TB reds back for the dev machine, and I go to replace one with the Blue I bought. I figured it would be faster with the array healthy. Nope. Same as before. As stated above, the replacement is on track to take over 23 DAYS to complete.

I know that Blue drives are crappier than Reds, and expected it to be slower, but not by this much. I did this exact replacement a couple years ago (minus my screw-ups), replacing a Seagate with the Blue EZRZ. I don’t have exact times, but it didn’t take more than a day or two. WD’s spec sheet shows the only difference between the EZRZ and EZAZ to be the cache; 64MB vs 256MB. It almost seems like this is an SMR drive, but I’m not aware of WD having any SMR drives.

Does anybody have any insight or suggestions? I’ve tried to include everything I can think of, and will attach output from zpool status and gstat in the next post. Please let me know if there’s any other info that would be helpful.

I greatly appreciate your help.


System specs:
FreeNAS-11.1-U7
Supermicro A1SAi-2750F
4x Kingston 8GB ECC DDR3-1600
IBM M1015 flashed to IT mode
(boot) Supermicro SSD-DM016-PHI SATA DOM
(jail & system, mirror) 2x Sandisk SDSSDP-128G-G25
(aux, mirror) 2x 4TB Seagate ST4000LM016
(main pool, RAIDZ2)
4x 6TB WD Red WD60EFRX
2x 6TB WD Blue WD60EZRZ
2x 8TB WD Red WD80EFAX (temporary replacements)
Silverstone DS380B w/ 3x Noctua NF-F12 iPPC-3000
Silverstone ST45SF-G 450W PSU
APC SUA1500 UPS
 

scholztec

Dabbler
Joined
Mar 29, 2014
Messages
14
As promised, output from zpool status and gstat

Code:
root@freenas:~ # zpool status paul
  pool: paul
state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Thu Aug  8 16:45:30 2019
        5.80T scanned at 37.0M/s, 4.50T issued at 28.7M/s, 35.4T total
        563G resilvered, 12.73% done, 13 days 01:14:41 to go
config:

        NAME                                                  STATE     READ WRITE CKSUM
        paul                                                  ONLINE       0     0     0
          raidz2-0                                            ONLINE       0     0     0
            gptid/f1edfd73-0683-11e6-a058-0cc47a33a4ec.eli    ONLINE       0     0     0
            gptid/f2affb17-0683-11e6-a058-0cc47a33a4ec.eli    ONLINE       0     0     0
            gptid/f37a9e81-0683-11e6-a058-0cc47a33a4ec.eli    ONLINE       0     0     0
            gptid/f43bcc5a-0683-11e6-a058-0cc47a33a4ec.eli    ONLINE       0     0     0
            gptid/f4ff27fc-0683-11e6-a058-0cc47a33a4ec.eli    ONLINE       0     0     0
            gptid/48be6173-b3cc-11e9-b9d7-0cc47a33a4ec.eli    ONLINE       0     0     0
            replacing-6                                       ONLINE       0     0     0
              gptid/0d713e90-b980-11e9-b9d7-0cc47a33a4ec.eli  ONLINE       0     0     0
              gptid/93ce3fde-ba36-11e9-b300-0cc47a33a4ec.eli  ONLINE       0     0     0  (resilvering)
            gptid/b7963b3a-4f22-11e7-8c45-0cc47a33a4ec.eli    ONLINE       0     0     0

errors: No known data errors
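
The "13 days to go" in the status output above looks consistent with dividing the data still to be issued by the average issue rate so far. The sketch below reproduces that number under two assumptions that are mine, not something stated in the thread: that the K/M/G/T suffixes are binary units, and that the estimate is simply remaining data over average rate. The helper names are illustrative.

```python
# Convert zpool-style size strings to MiB (assumption: binary units).
UNIT_TO_MIB = {"M": 1.0, "G": 1024.0, "T": 1024.0 ** 2}


def to_mib(value):
    """Turn a string such as '35.4T' or '28.7M' into MiB."""
    return float(value[:-1]) * UNIT_TO_MIB[value[-1]]


def eta_days(total, issued, rate_per_s):
    """Days left if the average issue rate so far were to hold."""
    remaining_mib = to_mib(total) - to_mib(issued)
    return remaining_mib / to_mib(rate_per_s) / 86400


if __name__ == "__main__":
    # Figures taken from the scan lines above.
    print(round(eta_days("35.4T", "4.50T", "28.7M"), 2))
```

Because this uses the average rate since the resilver started, it comes out lower than an estimate based on the most recent progress when the drive is slowing down, which would explain the gap between 13 days here and the roughly 23 days estimated from the recent percent-complete rate.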


Code:
root@freenas:~ # gstat -p -I 30s
dT: 30.004s  w: 30.000s
 L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
    0     54     54   2439    1.2      0      0    0.0    1.6| da0
    0     54     54   2434    1.4      0      0    0.0    1.7| da1
    0     54     54   2439    1.1      0      0    0.0    1.6| da2
    0     55     55   2441    1.1      0      0    0.0    1.5| da3
    0     53     53   2435    1.2      0      0    0.0    1.6| da4
    0     54     54   2438    1.2      0      0    0.0    1.6| da5
    0     54     54   2427    0.6      0      0    0.0    1.4| da6
    0     54     54   2430    0.7      0      0    0.0    1.4| da7
    0     35      0      0    0.0     35   2365   33.6   97.8| da8
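
To make the bottleneck in the gstat output above explicit, here is a small sketch that parses rows in that layout and picks the busiest device. The column order is taken from the header line shown above; the function names and dictionary keys are my own.

```python
GSTAT = """\
 L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
    0     54     54   2439    1.2      0      0    0.0    1.6| da0
    0     54     54   2434    1.4      0      0    0.0    1.7| da1
    0     54     54   2439    1.1      0      0    0.0    1.6| da2
    0     55     55   2441    1.1      0      0    0.0    1.5| da3
    0     53     53   2435    1.2      0      0    0.0    1.6| da4
    0     54     54   2438    1.2      0      0    0.0    1.6| da5
    0     54     54   2427    0.6      0      0    0.0    1.4| da6
    0     54     54   2430    0.7      0      0    0.0    1.4| da7
    0     35      0      0    0.0     35   2365   33.6   97.8| da8
"""


def parse_gstat(text):
    """Return {device: stats} for every data row (rows contain a '|')."""
    devices = {}
    for line in text.splitlines():
        if "|" not in line:
            continue  # skip the header line
        stats, name = line.split("|")
        f = [float(x) for x in stats.split()]
        devices[name.strip()] = {
            "reads_per_s": f[2], "read_kBps": f[3], "ms_per_read": f[4],
            "writes_per_s": f[5], "write_kBps": f[6], "ms_per_write": f[7],
            "busy_pct": f[8],
        }
    return devices


def busiest(devices):
    """Name of the device with the highest %busy."""
    return max(devices, key=lambda name: devices[name]["busy_pct"])


if __name__ == "__main__":
    devs = parse_gstat(GSTAT)
    name = busiest(devs)
    d = devs[name]
    print(name, d["busy_pct"], d["write_kBps"] / d["writes_per_s"])
```

On the sample above this singles out da8: it is nearly 100% busy while writing only about 2.4 MB/s at roughly 68 kB per write, and the eight source disks sit under 2% busy, so they are waiting on the new drive.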
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,110

scholztec

Dabbler
Joined
Mar 29, 2014
Messages
14
... this and several other WD drives are using SMR platters.

Thanks for the info. That explains the horrible performance. Although WD has done some shady stuff over the years (all the head-parking nonsense), they’ve been pretty faithful on their spec sheet accuracy. It appears that they’re now competing with Seagate for first place in the “Misleading Specs” race. I’ll need to be more careful when replacing my 6TB drives.

I’m going to try to return the drive to NewEgg based on the misleading specs.
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
The resilver has been running for 2 days already.
Just to give you a point of comparison, I recently did a resilver, here is the zpool status:

Code:
  pool: Emily
 state: ONLINE
status: Some supported features are not enabled on the pool. The pool can
        still be used, but some features are unavailable.
action: Enable all features using 'zpool upgrade'. Once this is done,
        the pool may no longer be accessible by software that does not support
        the features. See zpool-features(7) for details.
  scan: resilvered 2.76T in 0 days 03:49:08 with 0 errors on Fri Aug  9 10:03:46 2019
config:

        NAME                                            STATE     READ WRITE CKSUM
        Emily                                           ONLINE       0     0     0
          raidz2-0                                      ONLINE       0     0     0
            gptid/af7c42c6-bf05-11e8-b5f3-0cc47a9cd5a4  ONLINE       0     0     0
            gptid/b07bc723-bf05-11e8-b5f3-0cc47a9cd5a4  ONLINE       0     0     0
            gptid/b1893397-bf05-11e8-b5f3-0cc47a9cd5a4  ONLINE       0     0     0
            gptid/b2bfc678-bf05-11e8-b5f3-0cc47a9cd5a4  ONLINE       0     0     0
            gptid/b3c1849e-bf05-11e8-b5f3-0cc47a9cd5a4  ONLINE       0     0     0
            gptid/b4d16ad2-bf05-11e8-b5f3-0cc47a9cd5a4  ONLINE       0     0     0
          raidz2-1                                      ONLINE       0     0     0
            gptid/acb0b918-ba5d-11e9-b6dd-00074306773b  ONLINE       0     0     0
            gptid/670fa647-ba90-11e9-b6dd-00074306773b  ONLINE       0     0     0
            gptid/d1ea0d87-ba96-11e9-b6dd-00074306773b  ONLINE       0     0     0
            gptid/b9de3232-bf05-11e8-b5f3-0cc47a9cd5a4  ONLINE       0     0     0
            gptid/baf4aba8-bf05-11e8-b5f3-0cc47a9cd5a4  ONLINE       0     0     0
            gptid/bbf26621-bf05-11e8-b5f3-0cc47a9cd5a4  ONLINE       0     0     0
        logs
          gptid/ae487c50-bec3-11e8-b1c8-0cc47a9cd5a4    ONLINE       0     0     0
        cache
          gptid/ae52d59d-bec3-11e8-b1c8-0cc47a9cd5a4    ONLINE       0     0     0

errors: No known data errors

It took three hours and 49 minutes. I have a server at work with over 250TB of data that can resilver a drive in about 12 hours...
There is no possible way it should take multiple days to resilver. There is a serious problem here.
The pool is a 8x6TB RAIDZ2, and is 81% full.
That is too full but still, it is taking far longer than it should.
I’ve never had a resilver run this slowly (even on a Blue). What am I missing?
The new drive is either defective or it is an SMR (Shingled Magnetic Recording) drive. That type of drive is to be avoided like the plague.
WD’s spec sheet shows the only difference between the EZRZ and EZAZ to be the cache; 64 vs 256.
They just are not telling the whole story. You would be much better served buying a non-SMR drive. The drives with huge cache like that are almost always SMR drives and they perform terribly with ZFS because of how ZFS works. You would be better off pulling that drive back out and replacing it with something else. No, you do not need to wait for the resilver to finish.
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080

scholztec

Dabbler
Joined
Mar 29, 2014
Messages
14
Just stay away from the Blue drives across the board and they even have a couple models of the Red drives that are SMR drives. You can always come here and ask before you buy. This is a fairly good reference also:
https://www.ixsystems.com/community/resources/disk-price-performance-analysis-buying-information.62/

Thanks for the info and the link. This drive was a shucked external. Since it’s now impossible to guarantee that any external WD 6TB drive isn’t SMR (Red or Blue), I’m just going to buy more 8TB WD externals. It’s cheaper to get external 8TB and 10TB drives than internal 6TB non-SMR.

I’m not mad that WD (or Seagate for that matter) *have* SMR drives; I’m sure there are use cases for them (just not in my servers!). I’m upset that they’re being cagey and deceptive with spec sheets. Kingston did the same thing with their RAM and got bit hard when people realized what they were doing. Unfortunately, WD and Seagate make up the majority of spinning rust sold; if they’re both screwing people over, there’s not really anyone else to buy from.
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
Unfortunately, WD and Seagate make up the majority of spinning rust sold; if they’re both screwing people over, there’s not really anyone else to buy from.
Toshiba, but they have a small share of the market. If they are honest about what they are selling, they might be gaining a bigger share.
 
Joined
May 10, 2017
Messages
838
If you can, try resilvering the disk again after upgrading to FreeNAS 11.2. Another user with a similar disk was getting much better resilver performance on the latest FreeNAS, possibly because resilvering is more sequential and might not degrade so much on SMR disks.
 

scholztec

Dabbler
Joined
Mar 29, 2014
Messages
14
If you can, try resilvering the disk again after upgrading to FreeNAS 11.2. Another user with a similar disk was getting much better resilver performance on the latest FreeNAS, possibly because resilvering is more sequential and might not degrade so much on SMR disks.
That’s good to know. I read about that while going through this mess, but didn’t make the connection that it might help my problem. The disk in question is packed for return to NewEgg, but if I get desperate and need a replacement (or get a *really* great deal), I’ll keep that in mind.

I had held off upgrading to 11.2 due to the issues with drives not spinning down (yes, I know mentioning spindown is a great way to summon Cyberjock and start a holy war; I have good reasons). I know it’s been fixed (my dev server is running it and spins down fine); I just haven’t gotten around to upgrading this one yet.
 

scholztec

Dabbler
Joined
Mar 29, 2014
Messages
14
If you can try resilvering the disk again after upgrading to FreeNAS 11.2...

So I wrote my reply, and I’m sitting watching TV, and I look at the boxed-up drive. It's taunting me. I figure, “What the heck, I’ll try it.” Upgraded to 11.2, rebooted, and hooked up the drive...

The rebuild stats look radically different than the last time. If I hadn't swapped the disks myself, I wouldn't believe this was the same drive.

Code:
Mon Aug 12 01:11:28 MST 2019
  pool: paul
state: ONLINE
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Mon Aug 12 00:40:25 2019
        2.50T scanned at 1.38G/s, 389G issued at 214M/s, 35.4T total
        47.2G resilvered, 1.07% done, 1 days 23:42:37 to go


Code:
root@freenas:~ # gstat -p -I 15s
dT: 15.003s  w: 15.000s
L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
    0    641    631  29557    0.7     10     67    0.4   21.2| da0
    0    652    642  29532    0.6     10     70    0.4   20.6| da1
    0    648    640  29551    0.6      8     73    0.9   21.1| da2
    0    649    639  29542    0.6     10     73    0.7   20.8| da3
    0    643    633  29541    0.6      9     68    0.3   18.6| da4
    0    652    644  29534    0.6      8     69    0.5   18.4| da5
    0    686    675  29501    0.4     10     72    0.3   13.9| da6
    0    689    679  29496    0.4     10     71    0.4   14.0| da7
    0    389      0      0    0.0    386  29043    3.3   78.3| da8



GRANTED... this is only 30 minutes in... but the performance had tanked by this point on previous attempts. 2 days is about what I would have expected for a low-performance Blue drive rebuilding thru USB on a pool over 80% full. This is perfectly acceptable.

@Johnnie Black I can't thank you enough for mentioning that. I'll post back once the resilver is complete (or if the performance craps out). Fingers crossed!

@Chris Moore, This may make SMR drives useable(-ish!). It'll be interesting to see how this performs as a member of the pool, but initial results look good!
 
Joined
May 10, 2017
Messages
838
GRANTED... this is only 30 minutes in... but the performance had tanked by this point on previous attempts. 2 days is about what I would have expected for a low-performance Blue drive rebuilding thru USB on a pool over 80% full. This is perfectly acceptable.

Looks much better. Please keep us updated once it's complete, or if it slows down after some time.
 

scholztec

Dabbler
Joined
Mar 29, 2014
Messages
14
south-park-and-its-gone.jpg

Just in case you need a reference for the above

Well... I probably should have waited before being excited. Right back to where it was before.

Code:
root@freenas:~ # gstat -p -I 15s
dT: 15.020s  w: 15.000s
L(q)  ops/s    r/s   kBps   ms/r    w/s   kBps   ms/w   %busy Name
    0     65     57   2414    0.8      8     60    0.2    2.9| da0
    0     65     57   2415    0.9      8     59    0.3    3.1| da1
    0     64     58   2415    0.9      7     59    0.6    3.3| da2
    0     65     57   2413    1.1      8     60    0.4    3.1| da3
    0     63     57   2421    1.8      6     61    0.5    2.5| da4
    0     65     57   2417    0.9      8     60    0.2    2.3| da5
    0     65     58   2415    0.9      8     60    0.4    1.9| da6
    0     64     56   2418    0.9      8     61    0.4    2.1| da7
    1     47      0      0    0.0     46   2818   29.3   95.1| da8


I have a script running to log the status of this pool. Notice the dramatic decrease in speed (and increase in rebuild time):

Code:
Mon Aug 12 02:21:28 MST 2019
  scan: resilver in progress since Mon Aug 12 00:40:25 2019
        2.39T scanned at 2.06G/s, 111G issued at 95.8M/s, 35.4T total
        13.3G resilvered, 0.31% done, 4 days 11:13:07 to go


Mon Aug 12 04:21:29 MST 2019
  scan: resilver in progress since Mon Aug 12 00:40:25 2019
        2.91T scanned at 363M/s, 743G issued at 90.6M/s, 35.4T total
        90.4G resilvered, 2.05% done, 4 days 15:21:20 to go


Mon Aug 12 06:21:29 MST 2019
  scan: resilver in progress since Mon Aug 12 00:40:25 2019
        2.91T scanned at 196M/s, 948G issued at 62.3M/s, 35.4T total
        116G resilvered, 2.62% done, 6 days 17:04:09 to go


Mon Aug 12 08:21:30 MST 2019
  scan: resilver in progress since Mon Aug 12 00:40:25 2019
        2.91T scanned at 134M/s, 1.11T issued at 51.0M/s, 35.4T total
        138G resilvered, 3.14% done, 8 days 03:31:38 to go


Mon Aug 12 10:21:30 MST 2019
  scan: resilver in progress since Mon Aug 12 00:40:25 2019
        2.91T scanned at 102M/s, 1.30T issued at 45.3M/s, 35.4T total
        162G resilvered, 3.67% done, 9 days 02:57:11 to go


Mon Aug 12 12:21:31 MST 2019
  scan: resilver in progress since Mon Aug 12 00:40:25 2019
        2.91T scanned at 82.0M/s, 1.45T issued at 41.0M/s, 35.4T total
        182G resilvered, 4.11% done, 10 days 01:01:47 to go
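
The rates printed in the log above are averages since the start of the scan, so they lag behind what the new disk is doing right now. A rough sketch for getting the per-interval write rate to the new disk from the "resilvered" figures follows. It assumes the size suffixes are binary units and ignores the time zone token in the timestamps; the function names are illustrative, and the log is abridged to the timestamp and "resilvered" lines.

```python
import re
from datetime import datetime

LOG = """\
Mon Aug 12 02:21:28 MST 2019
        13.3G resilvered, 0.31% done, 4 days 11:13:07 to go
Mon Aug 12 04:21:29 MST 2019
        90.4G resilvered, 2.05% done, 4 days 15:21:20 to go
Mon Aug 12 06:21:29 MST 2019
        116G resilvered, 2.62% done, 6 days 17:04:09 to go
Mon Aug 12 08:21:30 MST 2019
        138G resilvered, 3.14% done, 8 days 03:31:38 to go
Mon Aug 12 10:21:30 MST 2019
        162G resilvered, 3.67% done, 9 days 02:57:11 to go
Mon Aug 12 12:21:31 MST 2019
        182G resilvered, 4.11% done, 10 days 01:01:47 to go
"""

TO_GIB = {"M": 1 / 1024, "G": 1.0, "T": 1024.0}  # assumption: binary units
STAMP = re.compile(r"^\w{3} (\w{3}) +(\d+) (\d\d:\d\d:\d\d) \w+ (\d{4})$")
RESILVERED = re.compile(r"^([\d.]+)([MGT]) resilvered")


def parse_snapshots(text):
    """Return [(timestamp, GiB resilvered)] in log order."""
    snapshots, stamp = [], None
    for raw in text.splitlines():
        line = raw.strip()
        m = STAMP.match(line)
        if m:
            # Rebuild the date without the weekday and time zone tokens.
            stamp = datetime.strptime(" ".join(m.groups()), "%b %d %H:%M:%S %Y")
            continue
        m = RESILVERED.match(line)
        if m and stamp is not None:
            snapshots.append((stamp, float(m.group(1)) * TO_GIB[m.group(2)]))
    return snapshots


def interval_rates(snapshots):
    """MiB/s written to the new disk between consecutive snapshots."""
    rates = []
    for (t0, g0), (t1, g1) in zip(snapshots, snapshots[1:]):
        rates.append((g1 - g0) * 1024 / (t1 - t0).total_seconds())
    return rates


if __name__ == "__main__":
    for rate in interval_rates(parse_snapshots(LOG)):
        print(f"{rate:.1f} MiB/s")
```

With these numbers the first two-hour interval works out to about 11 MiB/s and every later interval to roughly 3 MiB/s, which lines up with the ~2.8 MB/s of writes gstat shows on da8.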




I appreciate all the help and suggestions from everyone here. Looks like I'm back to searching for 8TB external drives, as that appears to be the only way to avoid the curse of SMR.
 
Joined
May 10, 2017
Messages
838
That's a shame. It's possible there's a small CMR zone, say 100GB or so, and once that's full you hit the SMR wall.
 