Scrubbing a RAIDZ1 - over 100%

Status
Not open for further replies.

luisvale

Dabbler
Joined
Mar 1, 2012
Messages
32
Hi all

Following the success and reliability of two boxes that have run FreeNAS for a few years (4 or 5) on Atom-based PCs with 2 GB RAM, I decided to build another box on another small form factor machine: an HP N54L with a 2.2 GHz Turion CPU, 16 GB RAM and 5 x 3 TB Seagate Barracuda 7200 drives.

I assembled the machine about a month ago, installed FreeNAS 9.2.1.7, and created a single pool as a RAIDZ1, accepting all the defaults.

I then began transferring some large ISOs (DVDs) from one of the other machines using rsync, and the machine now has about 3 TB used.

Last week the machine became sluggish and hung for extended periods (minutes), causing extremely slow transfer rates.

I decided to shut down and investigate.

The first thing I did was run a scrub from the command line. I was surprised: once it began scrubbing, the ETA to complete (zpool status POOL) was more than 190 hours...

It kept giving that estimate until about 10% progress, then it picked up speed and went from 10% to 95% completion in less than 24 h. At that point it slowed down dramatically again, and the current status is pictured below.
[Attached photo: 20141007_120734_Rua+Teófilo+Braga.jpg]


As you can see, it doesn't even try to estimate the time to completion (scan is slow, no estimated time) and the percentage done is already above 100%!

Any ideas about what might be happening here? I'll let it run until the scrub stops, but I'd like some clues in case I did something wrong that I can avoid next time.

TIA
 

enemy85

Guru
Joined
Jun 10, 2011
Messages
757
Have you checked your disks' SMART attributes? My guess is that you have some kind of problem with a disk.
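One way to act on this suggestion is to dump each drive's attributes with smartctl -a and look at the raw values of the counters that usually point to a failing disk or cable. The Python sketch below is only an illustration: the watched attribute names are the common ATA ones, the sample text is invented (it is not output from the machine in this thread), and flag_smart_attributes is a hypothetical helper.

```python
# Hypothetical helper: flag watched SMART counters whose raw value is non-zero.
WATCHED = {
    "Reallocated_Sector_Ct",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
    "UDMA_CRC_Error_Count",
}


def flag_smart_attributes(smartctl_text):
    """Return {attribute_name: raw_value} for watched attributes with raw value > 0."""
    flagged = {}
    for line in smartctl_text.splitlines():
        fields = line.split()
        # Attribute rows have at least 10 columns and start with a numeric ID.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        name, raw = fields[1], fields[9]
        if name in WATCHED and raw.isdigit() and int(raw) > 0:
            flagged[name] = int(raw)
    return flagged


# Invented sample in the layout of the smartctl attribute table.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       8
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
"""

print(flag_smart_attributes(SAMPLE))
```

An empty result for every drive would be consistent with the disks coming back "clean"; a non-zero pending or reallocated sector count would point at a specific disk.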
 

luisvale

Dabbler
Joined
Mar 1, 2012
Messages
32
Thanks,

no problems in that area. They all came back "clean",

and the scrubbing continues "happily", now at 107% with no estimate to complete :)
 

rs225

Guru
Joined
Jun 28, 2014
Messages
878
Are any other apps running on the box when these hangs happen? (You mention rsync, so that could be one.)

I don't know if it is available, but you could try "iostat -t da -x 1" on the command line and see if any particular drive is showing numbers far higher than the others. If so, that could be your bad drive/cable.
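The comparison described here can also be done mechanically. The sketch below is a rough Python illustration, assuming that each device row of the iostat -x output starts with the device name and ends with the busy percentage (%b); the column layout varies between FreeBSD releases, the sample numbers are invented, and busy_by_device and outliers are hypothetical helpers.

```python
from statistics import median


def busy_by_device(iostat_text):
    """Map device name -> %b, assuming %b is the last column of each device row."""
    busy = {}
    for line in iostat_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        try:
            busy[fields[0]] = float(fields[-1])
        except ValueError:
            # Header rows end in text such as "%b", so they are skipped here.
            continue
    return busy


def outliers(busy, factor=2.0):
    """Return devices whose %b is more than `factor` times the median of all devices."""
    if not busy:
        return []
    mid = median(busy.values())
    return sorted(dev for dev, b in busy.items() if mid > 0 and b > factor * mid)


# Invented sample: one drive is far busier than the others.
SAMPLE = """\
                        extended device statistics
device     r/s   w/s    kr/s    kw/s qlen svc_t  %b
da0       95.0   0.0  6080.0     0.0    1   4.1  22
da1       94.0   0.0  6016.0     0.0    1   4.3  23
da2       96.0   0.0  6144.0     0.0    1   4.0  21
da3       20.0   0.0  1280.0     0.0    9 180.5  98
da4       95.0   0.0  6080.0     0.0    1   4.2  22
"""

print(outliers(busy_by_device(SAMPLE)))
```

The median is used rather than the mean so that a single misbehaving drive does not pull the baseline up with it. An empty list means the drives look alike, which by itself does not rule out a problem that affects all of them.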
 

rs225

Guru
Joined
Jun 28, 2014
Messages
878
You say you moved over 3TB, but scrub shows 8.4TB. How many snapshots do you have and how frequent?
 

luisvale

Dabbler
Joined
Mar 1, 2012
Messages
32
Thanks.

Yes, it's available, and the numbers are very similar across all the drives.


 

luisvale

Dabbler
Joined
Mar 1, 2012
Messages
32
I hadn't enabled snapshots (yet).

When I built the RAIDZ1 from the 5 disks (3 trillion bytes each), the wizard gave a capacity that I don't recall exactly but which, converted to base 1024 instead of base 1000, might be around 10TB. So when I looked at the 8.4TB figure it didn't seem odd to me, because I assumed it included free space (compression was on by default)??

 

luisvale

Dabbler
Joined
Mar 1, 2012
Messages
32
ok

something weird.

while the scrub continues happily at 108%....

I ran zpool get all

and some weird results:

DATA dedupditto 0 default
DATA dedupratio 1.30x

Here is what I get from this: dedup is not enabled (first line), yet dedup is working and has a 1.3 ratio (second line)???

what's wrong with this assumption ?
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
What does zfs list show you? I think your pool is either full or getting close.
 

luisvale

Dabbler
Joined
Mar 1, 2012
Messages
32
A weird thing I just noticed when running zpool get all:

DATA size 13.6T
DATA capacity 61%

DATA free 5.24T
DATA allocated 8.39T


Just rechecked the original directory on the old FreeNAS machine that was copied to this machine and it reports 2.85T.

I'm lost :)
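The pool-level figures above can be sanity-checked with a little arithmetic. The Python sketch below assumes five 3 TB drives (decimal terabytes, as in the first post) and that the "T" in the zpool output is a binary unit (TiB). The RAIDZ1 usable estimate simply subtracts one drive for parity and ignores metadata and padding overhead, so treat it as rough. The function names are made up for illustration.

```python
TB = 10**12   # decimal terabyte, as printed on drive labels
TIB = 2**40   # binary tebibyte, assumed to be the "T" in zpool output


def raw_pool_tib(disks, disk_tb):
    """Raw capacity of all disks together, in TiB."""
    return disks * disk_tb * TB / TIB


def raidz1_usable_tib(disks, disk_tb):
    """Rough usable capacity: one disk's worth of space goes to parity."""
    return (disks - 1) * disk_tb * TB / TIB


raw = raw_pool_tib(5, 3)
usable = raidz1_usable_tib(5, 3)
capacity_pct = 8.39 / 13.6 * 100   # allocated / size, both taken from the post

print(f"raw size     : {raw:.2f} TiB")
print(f"usable (est.): {usable:.2f} TiB")
print(f"capacity     : {capacity_pct:.1f} %")
```

Under these assumptions the raw size comes out at about 13.6T, which matches the reported size, and 8.39T allocated out of 13.6T is about 61%, which matches the reported capacity. The rough usable estimate of about 10.9T is in line with the "around 10TB" figure from the wizard. For raidz, zpool counts raw space including parity, so its numbers are not directly comparable with file sizes on the source machine. None of this explains where the allocated space comes from.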





 

luisvale

Dabbler
Joined
Mar 1, 2012
Messages
32
It reports numbers similar to zpool's, and those are strange, because I'm sure I didn't copy that much data to this machine.

It's now reporting 8.46T used and 3.99T available.

 

rs225

Guru
Joined
Jun 28, 2014
Messages
878
Is the memory factory installed? Could you be at borderline on the power supply rating?
 

luisvale

Dabbler
Joined
Mar 1, 2012
Messages
32
No, this is the usual memory upgrade people do on this HP N54L, with Kingston memory: http://www.amazon.co.uk/gp/product/B0064R7LH8/ref=oh_aui_detailpage_o00_s00?ie=UTF8&psc=1


Before installing FreeNAS I played around with the machine using other OSes, and none of them complained, and neither did FreeNAS :)

About power: from a quick calculation there's still some headroom, and remember the machine has been working for over a month with no issues and nothing hardware related has changed (apart from a lot of bytes on the disks). So I would rule that out.

 

rs225

Guru
Joined
Jun 28, 2014
Messages
878
If dedup is on, that could be your problem. Once the tables required for dedup start to take over your RAM, things go bad quickly.

zfs get dedup DATA

Would also be good to find out what the extra data is.
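One way to look for the extra data is to ask ZFS for a per-dataset breakdown, for example with zfs get -r -Hp -o name,property,value usedbydataset,usedbysnapshots,usedbyrefreservation DATA, which prints tab-separated lines with exact byte counts. The Python sketch below parses output of that shape and picks the largest entry. The sample lines are invented (including the dataset name DATA/other), and space_breakdown and largest_consumer are hypothetical helpers.

```python
def space_breakdown(zfs_get_text):
    """Parse tab-separated name/property/value lines into {dataset: {property: bytes}}."""
    usage = {}
    for line in zfs_get_text.splitlines():
        parts = line.split("\t")
        # Skip malformed lines and non-numeric values such as "-".
        if len(parts) != 3 or not parts[2].isdigit():
            continue
        name, prop, value = parts
        usage.setdefault(name, {})[prop] = int(value)
    return usage


def largest_consumer(usage):
    """Return (dataset, property, bytes) for the biggest entry, or None if empty."""
    entries = (
        (name, prop, value)
        for name, props in usage.items()
        for prop, value in props.items()
    )
    return max(entries, key=lambda entry: entry[2], default=None)


# Invented sample in the shape of `zfs get -Hp -o name,property,value ...` output.
SAMPLE = "\n".join([
    "DATA\tusedbydataset\t204800",
    "DATA\tusedbysnapshots\t0",
    "DATA/Dataset\tusedbydataset\t3188000000000",
    "DATA/Dataset\tusedbysnapshots\t0",
    "DATA/other\tusedbydataset\t6000000000000",
    "DATA/other\tusedbysnapshots\t-",
])

print(largest_consumer(space_breakdown(SAMPLE)))
```

usedbychildren is left out of the query on purpose, since a parent's usedbychildren already includes its children and would be counted twice.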
 

luisvale

Dabbler
Joined
Mar 1, 2012
Messages
32
I was sure I hadn't turned dedup on, and indeed:

NAME  PROPERTY  VALUE  SOURCE
DATA  dedup     off    local




 

luisvale

Dabbler
Joined
Mar 1, 2012
Messages
32
Now it's getting ridiculous, as the scrub has reached 110%.

I'm going to stop the scrub, destroy the dataset, and run the scrub again, to check for some mysterious influence of cosmic rays. If nothing weird happens, I'll just reinstall everything from scratch, recreate the pools, copy the data back, and hope for the best...


:)


BTW: any idea how long a zfs destroy -Rv DATA/Dataset could take if the dataset is 2.9T?
 

rs225

Guru
Joined
Jun 28, 2014
Messages
878
I wouldn't do anything destructive yet. You do need to find the other source of data. If it isn't in that dataset (2.9T), then it is still going to be there after you destroy the data you actually want.

Can you run zfs list -r DATA?
 

luisvale

Dabbler
Joined
Mar 1, 2012
Messages
32
It displays as expected: inside the dataset are the directories for each DVD ISO.

Now waiting for the scrub to stop, as I ran zpool scrub -s DATA... I sure hope it doesn't take as long as the scrub itself...

 