SOLVED Scrub suddenly taking far longer

Status
Not open for further replies.

Blues Guy

Explorer
Joined
Dec 1, 2014
Messages
69
Hi, guys!

I've had my FreeNAS setup (specs in the footer) for a while now, and it works like a charm. I run scrubs weekly (during the night from Sunday to Monday), when I'm sure nobody is accessing the server.

Usually, the scrub (for 4.4TB of data) takes around 8 hours, which I think is reasonable. Since last week, it has been taking around 11-12 hours. So I took a look at the disk activity on the Reporting page (screenshot attached; all four drives look exactly alike).

Why does the scrub read only around 2-3MB/s per drive for the first 3 hours before jumping to the usual 60-80MB/s per drive? (screenshot 1)

Also, the performance drops occasionally (screenshot 2). During that time I was in the car, so I'm sure no one was accessing the server.

The scrub shows no errors, and neither does the (also weekly) SMART short self-test. The SMART data is clean on all drives.

Any ideas?
 

Attachments

  • scrub1.JPG (31.9 KB)
  • scrub2.JPG (34.9 KB)

Blues Guy

Explorer
Joined
Dec 1, 2014
Messages
69
To give you a bit more detail:
I use one vdev with four 2TB Samsung Spinpoint drives in a RAIDZ1. Compression is the standard lz4 on the pool (inherited on all datasets), and I use deduplication on only one dataset (183GB), because that's where my wife and I store our iTunes libraries and we have a lot of the same MP3s.
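For reference, a quick read-only way to double-check which datasets actually have dedup enabled, and what the pool-wide dedup ratio is (the pool name below is a placeholder for your own), is:

Code:
# List the dedup setting of every dataset in the pool (read-only, safe to run).
# "poolname" is a placeholder.
zfs get -r -t filesystem dedup poolname

# Pool-wide dedup ratio.
zpool list -o name,dedupratio poolname


The first command shows exactly where dedup is switched on; the second only tells you whether deduplicated blocks exist on the pool at all.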
 

enemy85

Guru
Joined
Jun 10, 2011
Messages
757
I would check whether your disks are okay... generally, a much lower speed indicates some HDD problem.
 

Blues Guy

Explorer
Joined
Dec 1, 2014
Messages
69
That's what I thought, too. But the self-test is okay on all drives and the SMART data also shows nothing. What else can I check to make sure the drives are okay?
 

enemy85

Guru
Joined
Jun 10, 2011
Messages
757
I can't see your hardware specs from mobile, but if your hardware is okay and you can be sure your HDDs are okay too, my only other guess is that something was going on on the server, maybe with the dedup. You could check the next scrub to see how it goes.
 

Blues Guy

Explorer
Joined
Dec 1, 2014
Messages
69
No problem. My hardware specs are:
Latest FreeNAS 9.3-RELEASE
HP Z210 CMT
Intel Xeon E3-1270 (3.4GHz, 4C, HT)
16GB ECC RAM (1333MHz, 4x4GB, dual channel)
4x 2TB Samsung 7200rpm SATA drives
Single RAIDZ1 pool
Mirrored 16GB USB 2.0 sticks for FreeNAS

The interesting thing is that the deduplication setup and the amount of data have been the same since the server was built. Then, last week, the scrub time suddenly increased from 7 hours to 10 hours:
Until February 16th: 7h
February 23rd: 10h
March 2nd: 10h

No changes in configuration or amount of data (just a few gigabytes of new movies, which is far from explaining this behavior).

I also attached the SMART data for all four drives. I see no irregularities. Do you?
 

Attachments

  • ada0.txt (16.5 KB)
  • ada1.txt (16.3 KB)
  • ada2.txt (16.1 KB)
  • ada3.txt (16.3 KB)

Blues Guy

Explorer
Joined
Dec 1, 2014
Messages
69
I did a "gstat -f /" during the "slow" times and attached the screenshot. All the drives show roughly the same activity level and data rate. Shouldn't one drive have far higher activity than the others if it were damaged? Also, the "zpool status" shows 100.01% completion. ;-)

After 10h16m, the scrub completed without any errors.
 

Attachments

  • gstat.JPG (43.9 KB)
  • zpool_status.JPG (53.8 KB)
  • zpool_status_fertig.JPG (51.1 KB)

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
ZFS is a complex beast. Your scrub results are interesting, but not entirely unexpected. ZFS uses lots of resources, and when you run out, it is often like going off a cliff.

My own server went from running at saturation speeds over my Gb LAN to being unable to stream a single movie, overnight (and nothing significant took place overnight). It's like a balloon: everything is fine until that one last breath... then it bursts.

Dedup is a terrible idea with just 16GB of RAM, especially on that system. Not only is the DDT (dedup table) stored in RAM, but even with just 183GB of data the DDT can, in theory, grow much larger than your entire system's RAM. On top of that, working through the DDT requires good throughput between the CPU and RAM, something you do not have in abundance on a system with an FSB.

Sorry, but I'd wager the problem is that you are expecting a bit much from hardware that is more than 7 years old. You could try un-deduplicating your data, but that's going to take some work, and the RAM you save may or may not be enough.

Good luck.
 

enemy85

Guru
Joined
Jun 10, 2011
Messages
757
Well, there are some reported errors for SMART attributes 1, 181 and 200, especially for ada3, but I don't know whether that is the problem or not. Wait for someone else's suggestion.
 

Blues Guy

Explorer
Joined
Dec 1, 2014
Messages
69
Thank you for your answer.

I have 6TB of data (maximum), which translates to about 6GB of RAM, including the 180GB of deduped data (which translates to a bit over 1GB of RAM), so I think I'm on the safe side with 16GB of RAM.

But the RAM idea is a good one. I included a screenshot. FreeNAS dropped its ARC when I cancelled the transfer of a big file yesterday and has been building it up again ever since (see screenshot).
My point is:
1.) When the scrub started very, very slowly, the system still had 8GB of RAM free.
2.) Disk activity was above 80% with a queue depth of 5 to 6 and 200 commands/second. If memory were indeed the issue, shouldn't disk activity be far lower?

If you really think deduplication of those few MP3s is too much for the system, I can easily back the data up, delete the dataset, recreate it without deduplication, and restore the data.

By the way, the hardware is not 7 years old. The E3-1270 was introduced in Q3/2011: http://ark.intel.com/de/products/52276/Intel-Xeon-Processor-E3-1270-8M-Cache-3_40-GHz
 

Attachments

  • memory.JPG (38.8 KB)

Blues Guy

Explorer
Joined
Dec 1, 2014
Messages
69
Well, there are some reported errors for SMART attributes 1, 181 and 200, especially for ada3, but I don't know whether that is the problem or not. Wait for someone else's suggestion.
That could be it, I agree. But on the other hand, the disk activity, queue depth and commands/sec are the same for all four drives. If one drive were damaged, shouldn't its values differ from the others?
 

DrKK

FreeNAS Generalissimo
Joined
Oct 15, 2013
Messages
3,630
The logs don't show any LONG tests.

Let's do long tests.

smartctl -t long /dev/ada3 and so on.
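
For anyone following along, a minimal sketch of kicking those off for all four drives (device names ada0-ada3 are assumed from the attachments above; adjust as needed):

Code:
# Start a long SMART self-test on each drive (assumed names ada0-ada3).
for d in ada0 ada1 ada2 ada3; do
    smartctl -t long /dev/$d
done

# The tests run on the drives themselves in the background and can take a few
# hours on 2TB disks. Check the results later in the self-test log, e.g.:
smartctl -l selftest /dev/ada0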
 

Apollo

Wizard
Joined
Jun 13, 2013
Messages
1,458
I think scrub performance is highly dependent on dataset content and fragmentation level.
When I do scrubs, I can see the disk report fluctuate as the scrub progresses, and the behavior seems consistent from one scrub to the next.
Throughput ranges from a maximum of roughly 100MB/s down to 15MB/s per disk.

On your gstat report, you can see the "%busy" column is close to 100%. With FreeBSD, the sampling is not precise and the information displayed is not always correct, so I suspect the readings fluctuate above 100% as well.
I noticed your ops/s are around 200 on the read side. For a scrub I wouldn't expect writes to happen, but they do, and even though the write ops/s are low, they affect your pool performance.
I don't believe your pool or hardware is failing; I suspect the number and size of the MP3 files is what's causing the scrub to be slow. I also suspect deduplication is not able to keep up due to the insufficient amount of RAM, causing swap to be written to in order to alleviate this limitation, hence the writes.
It is also possible the latest update is doing things differently (e.g. a new ZFS feature), affecting overall system behavior.
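
If you want to test the swap theory directly, a rough check with standard FreeBSD/ZFS tools (the pool name below is a placeholder) would be:

Code:
# Is any swap actually in use? (FreeBSD)
swapinfo

# Watch read/write ops and bandwidth per vdev while the scrub runs,
# refreshing every 5 seconds; substitute your own pool name.
zpool iostat -v poolname 5


If swapinfo stays at 0% used while the writes keep appearing, they are more likely DDT/metadata updates than swap traffic.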
 

Blues Guy

Explorer
Joined
Dec 1, 2014
Messages
69
Of course.

sysctl output:
Code:
vfs.zfs.arc_meta_limit: 3877044224
vfs.zfs.arc_meta_used: 977644608
vfs.zfs.arc_shrink_shift: 5
vfs.zfs.arc_average_blocksize: 8192
vfs.zfs.arc_min: 1938522112
vfs.zfs.arc_max: 15508176896


And the zpool status:
Code:
  pool: brain
state: ONLINE
  scan: scrub repaired 0 in 10h16m with 0 errors on Mon Mar  2 11:16:24 2015
config:

        NAME                                            STATE     READ WRITE CKSUM
        brain                                           ONLINE       0     0     0
          raidz1-0                                      ONLINE       0     0     0
            gptid/ccfbb34a-7dff-11e4-8be5-2c4138a81e5e  ONLINE       0     0     0
            gptid/cd89f054-7dff-11e4-8be5-2c4138a81e5e  ONLINE       0     0     0
            gptid/ce0f6c16-7dff-11e4-8be5-2c4138a81e5e  ONLINE       0     0     0
            gptid/ceb6462b-7dff-11e4-8be5-2c4138a81e5e  ONLINE       0     0     0

errors: No known data errors

dedup: DDT entries 3790850, size 1005 on disk, 223 in core

bucket              allocated                       referenced
______   ______________________________   ______________________________
refcnt   blocks   LSIZE   PSIZE   DSIZE   blocks   LSIZE   PSIZE   DSIZE
------   ------   -----   -----   -----   ------   -----   -----   -----
     1    1.47M    185G    162G    163G    1.47M    185G    162G    163G
     2    1.95M    249G    225G    226G    3.93M    503G    455G    455G
     4     199K   24.9G   22.8G   22.9G     816K    102G   93.5G   93.7G
     8    2.44K    313M    248M    250M    21.8K   2.72G   2.09G   2.10G
    16      333   41.6M   18.2M   18.9M    6.95K    890M    398M    413M
    32      120     15M   4.36M   4.65M    4.72K    604M    161M    174M
    64       42   5.25M     42K    244K    3.54K    453M   3.54M   20.6M
   128       28   3.50M   2.01M   2.07M    4.06K    520M    278M    287M
   256        1    128K      1K   5.81K      307   38.4M    307K   1.74M
   512        1    128K      1K   5.81K      536     67M    536K   3.04M
    1K        1    128K      1K   5.81K    1.28K    163M   1.28M   7.42M
    2K        1    128K      1K   5.81K    2.38K    305M   2.38M   13.8M
Total    3.62M    459G    411G    412G    6.24M    796G    713G    715G


Today I also copied all the deduplicated data off the dataset, turned deduplication off, deleted all the files, and copied the backup back in. So only the snapshots (hourly, for the last two weeks) should still reference deduplicated data. I plan to delete them all, but I haven't gotten around to it yet. Is there an easy way to delete all snapshots from one dataset?

Thanks!

P.S.: After I have deleted all the snapshots, I plan to run the scrub again to see if anything has changed. Is there anything else I can do before that?
 

Blues Guy

Explorer
Joined
Dec 1, 2014
Messages
69
I think scrub performance is highly dependent on dataset content and fragmentation level.
When I do scrubs, I can see the disk report fluctuate as the scrub progresses, and the behavior seems consistent from one scrub to the next.
Throughput ranges from a maximum of roughly 100MB/s down to 15MB/s per disk.

On your gstat report, you can see the "%busy" column is close to 100%. With FreeBSD, the sampling is not precise and the information displayed is not always correct, so I suspect the readings fluctuate above 100% as well.
I noticed your ops/s are around 200 on the read side. For a scrub I wouldn't expect writes to happen, but they do, and even though the write ops/s are low, they affect your pool performance.
I don't believe your pool or hardware is failing; I suspect the number and size of the MP3 files is what's causing the scrub to be slow. I also suspect deduplication is not able to keep up due to the insufficient amount of RAM, causing swap to be written to in order to alleviate this limitation, hence the writes.
It is also possible the latest update is doing things differently (e.g. a new ZFS feature), affecting overall system behavior.

Thank you, that sounds logical. I deleted all my MP3s, turned off deduplication and moved the data back onto the dataset, so if deduplication is indeed the issue, we will know tomorrow. :smile:

Also, @DrKK suggested a long SMART self-test of all disks; I will try that as well. And let's see what @HoneyBadger can read from the data I provided.

First and foremost: thanks, guys, for your great support!
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
Of course.

sysctl output:
Code:
vfs.zfs.arc_meta_limit: 3877044224
vfs.zfs.arc_meta_used: 977644608


And the zpool status:
Code:
 dedup: DDT entries 3790850, size 1005 on disk, 223 in core

bucket              allocated                       referenced
______   ______________________________   ______________________________
refcnt   blocks   LSIZE   PSIZE   DSIZE   blocks   LSIZE   PSIZE   DSIZE
------   ------   -----   -----   -----   ------   -----   -----   -----
Total    3.62M    459G    411G    412G    6.24M    796G    713G    715G


Today I also copied all the deduplicated data off the dataset, turned deduplication off, deleted all the files, and copied the backup back in. So only the snapshots (hourly, for the last two weeks) should still reference deduplicated data. I plan to delete them all, but I haven't gotten around to it yet. Is there an easy way to delete all snapshots from one dataset?

Thanks!

P.S.: After I have deleted all the snapshots, I plan to run the scrub again to see if anything has changed. Is there anything else I can do before that?

So here's the scoop on your dedup:

Your dedup table is currently consuming (3790850 x 223) = 845359550 bytes, or about 800MB of RAM, and you're pulling a ratio of (715:412) = ~1.75:1 for a net gain of around 300GB. The metadata limit in ARC is ~3700MB, so just over 20% of the entire available metadata space in RAM is being consumed by your DDT. While it's not at risk of swapping to disk (and summarily killing pool performance) it isn't exactly delivering huge gains for you.

Regarding deleting snapshots, if you haven't already - you should be able to go to Storage -> Snapshots and define a filter for only that dataset and destroy them all. Be aware that this can cause a fair bit of I/O as it updates the DDT on disk.
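
If you prefer the shell over the GUI for that, here is a hedged sketch (the dataset name is a placeholder; always review the list before destroying anything):

Code:
# Preview the snapshots of just that dataset (read-only; -d 1 stops zfs list
# from descending into child datasets). "poolname/dataset" is a placeholder.
zfs list -H -o name -t snapshot -d 1 poolname/dataset

# If the list looks right, destroy them one by one.
zfs list -H -o name -t snapshot -d 1 poolname/dataset | xargs -n 1 zfs destroy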

Once you've done that, prior to scrub you can run the following line a bunch of times and compare the values:

Code:
zpool status -D yourpoolname | grep dedup


You should see the number of DDT entries drop as ZFS cleans them up; this is done asynchronously, so it won't be immediate. Once it's complete, it will show "no DDT entries".
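
To watch it drain without retyping the command, a simple loop works (pool name is a placeholder, the 30-second interval is arbitrary, Ctrl-C to stop):

Code:
# Print a timestamp and the DDT summary line every 30 seconds.
while true; do
    date
    zpool status -D poolname | grep -i dedup
    sleep 30
done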
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
I have 6TB of data (maximum) which translates into 6GB of ram including 180GB deduped data (which translates into 1,something GB of RAM), so I think, I'm on the safe side with 16GB of ram.

You do know that every deduplicated ZFS record takes 300 bytes or so of ARC space? So, in theory, if you had 180GB of files stored as 512-byte records, you'd need almost 150GB of RAM to dedup that 180GB of data. I'm not even joking. Dedup is something that can get out of control rapidly, which is why the harsh warnings are included and why everyone will add you to an ignore list the second you mention you use dedup.
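
For rough numbers, here is that worst case spelled out as a quick shell calculation (assumptions: ~300 bytes of ARC per DDT entry and every record being 512 bytes; with larger per-entry overheads the result climbs further toward the figure above):

Code:
# Worst-case DDT RAM estimate for 180GiB of data stored as 512-byte records,
# at roughly 300 bytes of ARC per entry. Purely illustrative numbers.
DATA_BYTES=$((180 * 1024 * 1024 * 1024))
RECORDS=$((DATA_BYTES / 512))
echo "records:             $RECORDS"
echo "approx DDT RAM, GiB: $((RECORDS * 300 / 1024 / 1024 / 1024))"


In practice most records are far larger than 512 bytes (the DDT dump above averages roughly 128K per block), which is why the actual table in this thread is only around 800MB rather than hundreds of gigabytes.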

But the RAM idea is a good one. I included a screenshot. FreeNAS dropped its ARC when I cancelled the transfer of a big file yesterday and has been building it up again ever since (see screenshot).
My point is:
1.) When the scrub started very, very slowly, the system still had 8GB of RAM free.
2.) Disk activity was above 80% with a queue depth of 5 to 6 and 200 commands/second. If memory were indeed the issue, shouldn't disk activity be far lower?

Nope. In fact, from what I saw on my system, ZFS couldn't cache anything properly, so it ended up reading data that would later be discarded, only to read it a second time because the ARC couldn't be used effectively. So my drives would go busy as all hell for 20 seconds or so just to produce something as simple as a directory listing of 10 files.

By the way, the hardware is not 7 years old. The E3-1270 was introduced in Q3/2011: http://ark.intel.com/de/products/52276/Intel-Xeon-Processor-E3-1270-8M-Cache-3_40-GHz

Not sure what I read last night, but I could swear I saw a different processor listed. /shrug

But looking at the meta_used and meta_limit, you need more ARC, period. The fact you are using dedup is not helping your cause in the slightest either.
 

Blues Guy

Explorer
Joined
Dec 1, 2014
Messages
69
So... as @DrKK suggested, I ran a long SMART self-test last night, and it completed without errors on all disks.

@HoneyBadger: I was a bit puzzled last night. The 715GB referenced can't be right; I only have about 138GB of MP3s. So I looked at all the datasets and... voilà: another dataset (used for backups) had dedup enabled, too. I never intended to set that; obviously, I misclicked. But since it's just a backup, I simply deleted all the data and all the snapshots. Now zpool status shows "no DDT entries". This could also explain why I got really bad behavior on my backup dataset. Maybe this solved another problem as well.

I just started another scrub and will update this thread when it completes.
 
