Scrub slow: problem or byproduct of having a massive pool?


fullspeed

Contributor
Joined
Mar 6, 2015
Messages
147
My pool has been scrubbing for a month and is still only 69% done, with at least a couple of weeks to go.

[screenshot: scrub progress]


This is also with aggressive settings:

vfs.zfs.scrub_delay=0
vfs.zfs.top_maxinflight=128
vfs.zfs.resilver_min_time_ms=5000
vfs.zfs.resilver_delay=0
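
For reference, these can be pushed live with sysctl as well (just a sketch; on FreeNAS they would normally be added as Tunables so they persist across reboots):

Code:
# Apply the same scrub/resilver tunables at runtime (FreeBSD sysctls):
sysctl vfs.zfs.scrub_delay=0
sysctl vfs.zfs.top_maxinflight=128
sysctl vfs.zfs.resilver_min_time_ms=5000
sysctl vfs.zfs.resilver_delay=0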

I have 150 disks (12 are idle) and only two have errors; none are failed or rebuilding.
" /dev/da29 [SAT], 1 Currently unreadable (pending) sectors"
" /dev/da11 [SAT], Failed SMART usage Attribute: 184 End-to-End_Error."

That said, both of those disks are online and doing the same I/O as every other disk; I was hoping to finish the scrub before replacing them. Could this just be a byproduct of having a huge pool?

I'm also getting this error message, which I have a pending fix for (I doubled hw.mps.max_chains to 4096), but I need to reboot for the change to take effect.

Jan 18 14:07:17 fs05 mps1: Out of chain frames, consider increasing hw.mps.max_chains.
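
For reference, the pending change is just a loader tunable (a sketch of what goes in; on FreeNAS it would be added as a "loader"-type Tunable rather than edited by hand, and it only takes effect after a reboot):

Code:
# /boot/loader.conf equivalent of the pending fix described above:
hw.mps.max_chains="4096"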

Any thoughts?
 

tvsjr

Guru
Joined
Aug 29, 2015
Messages
959
Scrubbing a parity-based array (RAIDZ), like resilvering, requires traversing all of the allocated data, and it will also get progressively slower as fragmentation increases (more seeking going on). Also, if you have anything actively accessing the pool, that'll slow things down substantially.
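
If you want to see how much competing I/O the pool is handling while the scrub runs, a quick check (nothing fancy, just watching the numbers):

Code:
# Per-vdev I/O stats refreshed every 5 seconds; lots of non-scrub traffic = slower scrub
zpool iostat -v 5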

The gurus may be able to suggest some tunables to make things faster... but it's never going to be fast...
 

DrKK

FreeNAS Generalissimo
Joined
Oct 15, 2013
Messages
3,630
573TB is how much data is in the pool? Jesus christ. Just doing some math here, factoring for the size of the pool, and various things, I would expect a scrub of 573TB on a pool experiencing normal usage conditions, with SATA drives, to take approximately 36 days. So the numbers you have, actually, seem reasonable.

Interesting that your pool is so large that it takes longer to do one scrub than the normal interval we'd recommend between scrubs. lol.
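
Rough version of that arithmetic, for the curious (my own ballpark, treating the 573TB as TiB and ignoring all overhead):

Code:
# 573 TiB over ~36 days averages out to roughly 190 MiB/s pool-wide:
echo "scale=1; 573 * 1024 * 1024 / (36 * 86400)" | bc    # ~193 MiB/s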
 

tvsjr

Guru
Joined
Aug 29, 2015
Messages
959
I'll admit to being curious... why is that 573T in one big pool? Do you really need to present *that* much storage as one big lump?
 

gpsguy

Active Member
Joined
Jan 22, 2012
Messages
4,472
One of the things the OP said last week was that the pool is composed of 12 x 12-disk RAIDZ3 vdevs.

And, about FreeNAS: "It's telling me to upgrade my firmware from version 5 to 9 but I'm not touching it unless I need to"
 

tvsjr

Guru
Joined
Aug 29, 2015
Messages
959
One of the things the OP said last week was that the pool is composed of 12 x 12-disk RAIDZ3 vdevs.

And, about FreeNAS: "It's telling me to upgrade my firmware from version 5 to 9 but I'm not touching it unless I need to"

Do you know how long it takes to download and curate 573TB of porn^H^H^H^Hvaluable data? Even at 50Mbps, assuming perfect utilization, that's more than 3 years of work! :D
 

depasseg

FreeNAS Replicant
Joined
Sep 16, 2014
Messages
2,874
I think your scrub is slow. Here are my scrub speeds from two of my pools. One is doing 762M/s and the other 615M/s. Maybe it's because my pools aren't heavily used right now, though.

Code:
[root@freenas1] ~# zpool status
  pool: backup-tank
state: ONLINE
  scan: scrub in progress since Tue Jan 19 09:24:03 2016
        2.81T scanned out of 25.1T at 762M/s, 8h30m to go
        0 repaired, 11.19% done
config:

        NAME                                            STATE     READ WRITE CKSUM
        backup-tank                                     ONLINE       0     0     0
          raidz2-0                                      ONLINE       0     0     0
            gptid/47f4175e-816a-11e5-a89a-002590fbfb40  ONLINE       0     0     0
            gptid/48d8d60b-816a-11e5-a89a-002590fbfb40  ONLINE       0     0     0
            gptid/49bd46ca-816a-11e5-a89a-002590fbfb40  ONLINE       0     0     0
            gptid/4ab3e647-816a-11e5-a89a-002590fbfb40  ONLINE       0     0     0
            gptid/4b9e6364-816a-11e5-a89a-002590fbfb40  ONLINE       0     0     0
            gptid/4c862ac8-816a-11e5-a89a-002590fbfb40  ONLINE       0     0     0
            gptid/4d72ebb5-816a-11e5-a89a-002590fbfb40  ONLINE       0     0     0
            gptid/4e5c9527-816a-11e5-a89a-002590fbfb40  ONLINE       0     0     0
            gptid/4f4a31ea-816a-11e5-a89a-002590fbfb40  ONLINE       0     0     0
            gptid/5031936b-816a-11e5-a89a-002590fbfb40  ONLINE       0     0     0
          raidz2-1                                      ONLINE       0     0     0
            gptid/512c0226-816a-11e5-a89a-002590fbfb40  ONLINE       0     0     0
            gptid/521d5ff4-816a-11e5-a89a-002590fbfb40  ONLINE       0     0     0
            gptid/5303de29-816a-11e5-a89a-002590fbfb40  ONLINE       0     0     0
            gptid/53f91f2c-816a-11e5-a89a-002590fbfb40  ONLINE       0     0     0
            gptid/54daed66-816a-11e5-a89a-002590fbfb40  ONLINE       0     0     0
            gptid/55ded62d-816a-11e5-a89a-002590fbfb40  ONLINE       0     0     0
            gptid/56e46d23-816a-11e5-a89a-002590fbfb40  ONLINE       0     0     0
            gptid/57d64d4c-816a-11e5-a89a-002590fbfb40  ONLINE       0     0     0
            gptid/58cb455a-816a-11e5-a89a-002590fbfb40  ONLINE       0     0     0
            gptid/59aa77d7-816a-11e5-a89a-002590fbfb40  ONLINE       0     0     0
          raidz2-2                                      ONLINE       0     0     0
            gptid/5aa26b53-816a-11e5-a89a-002590fbfb40  ONLINE       0     0     0
            gptid/5b7a49d6-816a-11e5-a89a-002590fbfb40  ONLINE       0     0     0
            gptid/5cc27257-816a-11e5-a89a-002590fbfb40  ONLINE       0     0     0
            gptid/5dfda90e-816a-11e5-a89a-002590fbfb40  ONLINE       0     0     0
            gptid/5f4e8c42-816a-11e5-a89a-002590fbfb40  ONLINE       0     0     0
            gptid/60301043-816a-11e5-a89a-002590fbfb40  ONLINE       0     0     0
        spares
          gptid/854c08bc-8a11-11e5-969d-002590fbfb40    AVAIL  

errors: No known data errors

  pool: freenas-boot
state: ONLINE
  scan: scrub repaired 0 in 1h7m with 0 errors on Sun Jan 10 04:52:14 2016
config:

        NAME                                            STATE     READ WRITE CKSUM
        freenas-boot                                    ONLINE       0     0     0
          mirror-0                                      ONLINE       0     0     0
            gptid/70d43788-80be-11e4-b68d-002590fbfb40  ONLINE       0     0     0
            gptid/04fb5c1a-9b79-11e4-a597-002590fbfb40  ONLINE       0     0     0

errors: No known data errors

  pool: tank
state: ONLINE
  scan: scrub in progress since Tue Jan 19 09:19:47 2016
        2.42T scanned out of 24.4T at 615M/s, 10h25m to go
        0 repaired, 9.90% done
config:

        NAME                                            STATE     READ WRITE CKSUM
        tank                                            ONLINE       0     0     0
          raidz2-0                                      ONLINE       0     0     0
            gptid/f0f1f829-7f5c-11e5-8d67-002590fbfb40  ONLINE       0     0     0
            gptid/f44b5e06-7f5c-11e5-8d67-002590fbfb40  ONLINE       0     0     0
            gptid/f7b6be5b-7f5c-11e5-8d67-002590fbfb40  ONLINE       0     0     0
            gptid/fb1109fc-7f5c-11e5-8d67-002590fbfb40  ONLINE       0     0     0
            gptid/fe7a0d91-7f5c-11e5-8d67-002590fbfb40  ONLINE       0     0     0
            gptid/01cf6ea8-7f5d-11e5-8d67-002590fbfb40  ONLINE       0     0     0
            gptid/05303815-7f5d-11e5-8d67-002590fbfb40  ONLINE       0     0     0
            gptid/0889c7ae-7f5d-11e5-8d67-002590fbfb40  ONLINE       0     0     0
            gptid/0beb5218-7f5d-11e5-8d67-002590fbfb40  ONLINE       0     0     0
            gptid/0f60d4db-7f5d-11e5-8d67-002590fbfb40  ONLINE       0     0     0
            gptid/12bbe0d6-7f5d-11e5-8d67-002590fbfb40  ONLINE       0     0     0
        logs
          gptid/13491c6f-7f5d-11e5-8d67-002590fbfb40    ONLINE       0     0     0
        cache
          gptid/da71277a-832f-11e5-b2db-002590fbfb40    ONLINE       0     0     0

errors: No known data errors
[root@freenas1] ~#
 

diehard

Contributor
Joined
Mar 21, 2013
Messages
162
Do a zpool list; what's your fragmentation? I assume it's fairly high.
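
FRAG and CAP are the columns to look at; the output below is just an invented example, not your pool:

Code:
# Invented example output -- only FRAG and CAP matter for this question:
zpool list tank
NAME   SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
tank   100T    81T    19T         -    25%    80%  1.00x  ONLINE  -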
 

fullspeed

Contributor
Joined
Mar 6, 2015
Messages
147
573TB is how much data is in the pool? Jesus christ. Just doing some math here, factoring for the size of the pool, and various things, I would expect a scrub of 573TB on a pool experiencing normal usage conditions, with SATA drives, to take approximately 36 days. So the numbers you have, actually, seem reasonable.

[screenshot: pool usage]


One of the things the OP said last week was that the pool is composed of 12 x 12-disk RAIDZ3 vdevs.

And, about FreeNAS: "It's telling me to upgrade my firmware from version 5 to 9 but I'm not touching it unless I need to"

That is actually a different server; it's the one I'm replicating to. And while I understand why you would point that out, try to understand that I have a very unique environment and a lot of things blocking me. The most notable is that replication literally never ends: right now, if I turn off that server I'll lose 50TB of replicated data, because it's in the middle of a massive transfer, and once that's done it starts another massive transfer. FreeNAS 10 will apparently support multiple replication streams, which will be a godsend for me, but that's all for another thread I suppose.

I'll admit to being curious... why is that 573T in one big pool? Do you really need to present *that* much storage as one big lump?

SQL database backups. I'm in the process of splitting out each of the larger/more important servers into its own dataset.

Do a zpool list; what's your fragmentation? I assume it's fairly high.

[screenshot: zpool list output, 13% FRAG at 81% CAP]
 

tvsjr

Guru
Joined
Aug 29, 2015
Messages
959
I've built and operated some large storage subsystems, but never in FreeNAS, so I'll admit to some ignorance. However, since you're already cutting the pool up into separate mounts for each DB, wouldn't it make sense to create a separate pool for each DB? This should (I think) also allow you to do parallel replication, would it not? It would be a little less space-efficient, since free space couldn't be shared... however, if the database sizes are relatively constant, it wouldn't matter so much.
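
Something like this is the shape of what I mean (raw commands with invented disk names, purely illustrative; in FreeNAS you'd build the pools through the GUI):

Code:
# One small pool per database instead of one giant pool (invented device names):
zpool create sql01-pool raidz2 da0 da1 da2 da3 da4 da5
zpool create sql02-pool raidz2 da6 da7 da8 da9 da10 da11
zfs create sql01-pool/backups
zfs create sql02-pool/backups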


You say the replication tasks are basically running continuously... the more you use the array, the longer a scrub (or resilver) will take. If you have replication tasks running, systems conducting backups, etc., your scrub time will be dramatically longer. No different than any other large storage array out there.
 

Bhoot

Patron
Joined
Mar 28, 2015
Messages
241
Well, I'm not a pro here, but scrubs for me proceed at 150-350 MB/s. I do realize it slows down towards the end (maybe because of the way data is written on the HDD), but my 8 x 4TB RAIDZ2 takes somewhere between 3-5 days for a scrub.

To the OP: at your capacities I would rather build separate systems (maybe with 1 or 2 vdevs each) for a much more reasonable scrub time. I'm sure you thought long and hard before creating the zpool, but a scrub that runs for a month is really not the most optimal environment for the server. The system is also under constant load while scrubbing, so your system performance takes a hit as well.
 

diehard

Contributor
Joined
Mar 21, 2013
Messages
162
I'm actually shocked you're only at 13% fragmentation with 81% capacity. Your data must not have changed a whole ton.
 

Bidule0hm

Server Electronics Sorcerer
Joined
Aug 5, 2013
Messages
3,710
I'm actually shocked you're only at 13% fragmentation with 81% capacity. Your data must not have changed a whole ton.

They are backups so yeah, pretty much write once and never touch it again :)
 

fullspeed

Contributor
Joined
Mar 6, 2015
Messages
147
I've built and operated some large storage subsystems, but never in FreeNAS, so I'll admit to some ignorance. However, since you're already cutting the pool up into separate mounts for each DB, wouldn't it make sense to create a separate pool for each DB? This should (I think) also allow you to do parallel replication, would it not? It would be a little less space-efficient, since free space couldn't be shared... however, if the database sizes are relatively constant, it wouldn't matter so much.

You say the replication tasks are basically running continuously... the more you use the array, the longer a scrub (or resilver) will take. If you have replication tasks running, systems conducting backups, etc., your scrub time will be dramatically longer. No different than any other large storage array out there.

From my understanding there is no way to do parallel replication out of the box, even if you have separate mounts. Splitting up the datasets gives me the ability to prioritize them: for instance, two of my servers are extremely critical, so I split those out first and fired up replication, and as they finish I will do the rest. If I don't care about certain servers, I'll leave them in the original dataset, which is snapshotted but not replicated.
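
(You can hand-roll parallel sends outside the GUI, of course; this is just a rough sketch with made-up dataset and host names, not something the built-in replicator will do for you.)

Code:
# One incremental send per critical dataset, run in parallel (names are invented):
zfs send -i tank/sql01@prev tank/sql01@now | ssh backup-host zfs recv -F backup/sql01 &
zfs send -i tank/sql02@prev tank/sql02@now | ssh backup-host zfs recv -F backup/sql02 &
wait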

The reason I have one giant pool is scope creep: initially I was told we only needed 1/10th of this storage, but they kept changing the retention policies and filling it up. I had to react quickly and grow it or risk losing data. Also, they had no idea what the size of each database would be, so I couldn't hard-set any separate pools even if I wanted to.

Well, I'm not a pro here, but scrubs for me proceed at 150-350 MB/s. I do realize it slows down towards the end (maybe because of the way data is written on the HDD), but my 8 x 4TB RAIDZ2 takes somewhere between 3-5 days for a scrub.

To the OP: at your capacities I would rather build separate systems (maybe with 1 or 2 vdevs each) for a much more reasonable scrub time. I'm sure you thought long and hard before creating the zpool, but a scrub that runs for a month is really not the most optimal environment for the server. The system is also under constant load while scrubbing, so your system performance takes a hit as well.

I was aware as I grew this beast that there would be issues with scale, but it's handled it quite well, all things considered. If I had known what the retention of the databases would be before I created these servers, I would've definitely split them up into more physical boxes, for many reasons but mostly replication time. It's definitely been the biggest struggle.
 

gpsguy

Active Member
Joined
Jan 22, 2012
Messages
4,472
OK, so it's a different server, but according to last week's message: "The main server has 150 actually (including spares), replicated to an identical server".

The point I was driving at was how your pool is laid out. You didn't provide that information in your opening message. Some might look at the message and wonder whether the pool is made up of mirrors or some form of RAIDZ, and what the width of the individual vdevs is.

I didn't have time to write a longer reply before work this morning, but I thought I'd share what you posted last week, so others might have a better idea about your configuration. I read almost all of the messages that are posted in English.

That is actually a different server; it's the one I'm replicating to. And while I understand why you would point that out ...
 

fullspeed

Contributor
Joined
Mar 6, 2015
Messages
147
OK, so it's a different server, but according to last week's message: "The main server has 150 actually (including spares), replicated to an identical server".

The point I was driving at was how your pool is laid out. You didn't provide that information in your opening message. Some might look at the message and wonder whether the pool is made up of mirrors or some form of RAIDZ, and what the width of the individual vdevs is.

I didn't have time to write a longer reply before work this morning, but I thought I'd share what you posted last week, so others might have a better idea about your configuration. I read almost all of the messages that are posted in English.

Yep I can appreciate that, thank you.
 

Mlovelace

Guru
Joined
Aug 19, 2014
Messages
1,111
Doing Veeam forever forward incrementals we are at 28% frag with 39% capacity. :(
Same here, Veeam is fragmenting my pool. It's now at 23% frag with 25% capacity. The RPOs haven't been affected yet, but it's only a matter of time.
 

fullspeed

Contributor
Joined
Mar 6, 2015
Messages
147
The scrub said 325 hrs to go last night; I came in this morning and it was finished.

A little unsettling but at least I can reboot now.

[screenshot: completed scrub status]
 