Is this a bad sign: smartd: 1 Currently unreadable (pending) sectors....?

Status
Not open for further replies.

Stephens

Patron
Joined
Jun 19, 2012
Messages
496
Low level formatting never referred to wiping out logical sectors on a drive. Low level formatting always referred to reinitializing whole tracks. In the old days, software took care of this during formatting. As drive technology changed, the drives came with tracks already preformatted. In fact, a "long" format under Windows really just became a process of reading every sector on the pre-formatted disk to make sure it was readable and marking it bad if it couldn't be (underlying drive electronics might do other things such as remapping, though). Drive manufacturers got tired of trying to fit things into the track/head/sector model (well, cylinder/head/sector if you want to get technical) for addressing absolute sectors and went to logical blocks. And remappings. And all kinds of hokery. The bottom line on that is each drive needed special tools to do a real low level format (rewriting whole tracks, not just logical sector data).

What you refer to as writing zeroes to a whole drive has always been called wiping the drive as far as I know.

I'm not into nitpicking people to death over mostly harmless semantic mistakes, but there really is a difference between a low level format and a wipe. The low level format is completely independent of operating system and file system (NTFS, FAT, UFS, ZFS, etc.). Your wipe process (writing zeroes) will just erase all the sector level data from the drive, not the low level information. Master Boot Records, partition tables, directories: all of these are logical constructs laid over the lower level tracks.

The wipe does at least two things:
1) removes all identifying information from the logical sectors (but not things stored in the controller or track/sector headers).
2) refreshes the magnetic signal of all sectors.

A low level format adds the benefit of refreshing all the track and sector header information that normally isn't visible, but can also fade (it's all magnetic). I don't have the time to look it up right now, but let's just say a track looks like this logically:

Code:
[TRACK1_HEADER]
 [SECTOR1_HEADER]
  [SECTOR1_DATA (512 bytes)]
 [SECTOR2_HEADER]
  [SECTOR2_DATA (512 bytes)]
  :
 [SECTORn_HEADER]
  [SECTORn_DATA (512 bytes)]


As I said, that's a quick summary from memory (it's been many years since I dealt with this at a low level). It's not exact, but should make the point. NOTE: "Advanced Format" drives literally created fewer sectors on each track by making each sector 4096 bytes instead of 512 bytes, which obviously required less header information, giving more space to data. Anyway, a wipe ONLY accesses the SECTOR*_DATA area.
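
To put a rough number on the Advanced Format note, the sketch below counts how many sectors (and therefore sector headers) are needed to hold the same amount of data with 512-byte and 4096-byte sectors. The function name and the 1 MiB data size are made up for the example; real track capacities and header sizes vary by drive and are not given in this thread.

```python
def sector_headers_needed(data_bytes: int, sector_size: int) -> int:
    """Number of sectors (and so sector headers) needed to hold data_bytes."""
    # Round up: a partly filled sector still needs its own header.
    return -(-data_bytes // sector_size)

# Hypothetical amount of user data, chosen only to make the comparison easy.
track_data = 1024 * 1024

legacy = sector_headers_needed(track_data, 512)     # 512-byte sectors
advanced = sector_headers_needed(track_data, 4096)  # "Advanced Format" sectors

# Same data, one eighth as many headers with 4096-byte sectors.
print(legacy, advanced, legacy // advanced)
```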
 

paleoN

Wizard
Joined
Apr 22, 2012
Messages
1,403
> Just to make sure, I never ran or did the following:
>
> - Created ZFS Dataset
> - Created a Snapshot
>
> I don't know if it's important to do these before taking the drive offline, as mentioned?
I don't know what that means. Regardless, it's not relevant for offlining the drive.

> I think (guessing here) I can select 'Online' or some similar option then, right?
>
> And this 'Resilvering', will it happen by itself or do I need to do anything else?
Actually, if you are zeroing out the whole drive you will need to replace the "old" drive with the "new", now blank drive. Step 5 becomes replace vs online. The resilvering happens automatically from that point either way.
 

HHawk

Contributor
Joined
Jun 8, 2011
Messages
176
> I don't know what that means. Regardless, it's not relevant for offlining the drive.

Okay, just making sure, so I don't lose any information and such! Thanks for confirming.

> Actually, if you are zeroing out the whole drive you will need to replace the "old" drive with the "new", now blank drive. Step 5 becomes replace vs online. The resilvering happens automatically from that point either way.

Okay, thanks for the explanation. As mentioned above, I was guessing this. Replace does indeed make more sense. I will be doing this tonight.
 

HHawk

Contributor
Joined
Jun 8, 2011
Messages
176
Oops... I forgot to ask a question... Can I still use my NAS while I am wiping the removed drive, or is it 'dangerous'? By using, I mean streaming a movie or something.
 

paleoN

Wizard
Joined
Apr 22, 2012
Messages
1,403
You still have an additional parity drive. Feel free to stream away.
 

HHawk

Contributor
Joined
Jun 8, 2011
Messages
176
Aaaaargh.... I did the wipe / erase thing with the Western Digital tool. It took a bit over 6 hours, and now I have re-added the drive to my FreeNAS and chosen Replace.

At first I was getting this:

Nov 29 07:54:44 freenas notifier: 1+0 records out
Nov 29 07:54:44 freenas notifier: 1048576 bytes transferred in 0.003688 secs (284313563 bytes/sec)
Nov 29 07:54:44 freenas notifier: dd: /dev/ada2: short write on character device
Nov 29 07:54:44 freenas notifier: dd: /dev/ada2: end of device
Nov 29 07:54:44 freenas notifier: 5+0 records in
Nov 29 07:54:44 freenas notifier: 4+1 records out
Nov 29 07:54:44 freenas notifier: 4284416 bytes transferred in 0.080224 secs (53405639 bytes/sec)

Followed by this "nice" message:

Nov 29 07:54:45 freenas manage.py: [middleware.exceptions:38] [MiddlewareError: Disk replacement failed: "cannot replace 6201553240551106299 with gptid/a89a045a-39f1-11e2-9242-00151736994a: device is too small, "]

What should I do now? :(
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Is it the same size drive? I.e., is it a 2TB if your other drives are 2TB?

I have seen cases where some drives are something like 100MB smaller just because they have different firmware from the others.

I'm not sure how to solve your problem. I'm sure someone else will have some trick to try from the command line.

You did reboot after completing the wipe, right? Normally I wouldn't suggest a reboot, but when I have any kind of odd behavior I always try a reboot. It can save me a lot of anger and an "I look like a fool for not trying a reboot" in the forum :P
 

HHawk

Contributor
Joined
Jun 8, 2011
Messages
176
Yes, it is the same disk (as mentioned in the first post), but it was having problems, therefore I wiped the harddisk. But now it says it's not big enough.

//edit 1

I also noticed I am still getting the following errors:

Device: /dev/ada2, 65535 Currently unreadable (pending) sectors

I am done with this crappy drive. I am gonna order a new one and RMA this one. Wasted 6 hours on this. Sigh.

//edit 2

Just to make sure it's the harddisk (because WD-tool said the harddisk is in good condition) I will exchange SATA cables as well.
 

paleoN

Wizard
Joined
Apr 22, 2012
Messages
1,403
> I also noticed I am still getting the following errors:
>
> Device: /dev/ada2, 65535 Currently unreadable (pending) sectors
>
> I am done with this crappy drive. I am gonna order a new one and RMA this one. Wasted 6 hours on this. Sigh.
This is a sign of a failing drive. That is a very large jump from 1 and then only 2 unreadable sectors. Definitely RMA. Ideally you would have a cold spare on hand, but you will now after this. The six hours confirmed you should RMA the drive.

> //edit 2
>
> Just to make sure it's the harddisk (because WD-tool said the harddisk is in good condition) I will exchange SATA cables as well.
It's not the SATA cable. The drive is internally reporting the pending sectors.

You are better off paying attention to the SMART attributes than overall SMART health or even the WD-tool. Vendors set very permissive SMART thresholds to not deal with a lot of RMAs.
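
To make "pay attention to the SMART attributes" a bit more concrete, here is a minimal sketch that pulls the raw counts for a few sector-related attributes out of text laid out like `smartctl -A` output. The sample rows are made up for illustration (only the pending count of 65535 comes from this thread), and the helper name `raw_values` and the list of watched attributes are assumptions, not something FreeNAS or smartd provides.

```python
import re

# Illustrative sample in the column layout of `smartctl -A`; the rows are
# made up for this sketch except the pending count of 65535 from the thread.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       65535
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
"""

# Attributes chosen for this sketch; adjust to whatever your drive reports.
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable")

def raw_values(text: str) -> dict[str, int]:
    """Return {attribute name: raw value} for the watched attributes."""
    found = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(fields) >= 10 and fields[0].isdigit() and fields[1] in WATCHED:
            # Some raw values carry extra text; keep only the leading integer.
            match = re.match(r"\d+", fields[9])
            if match:
                found[fields[1]] = int(match.group())
    return found

counts = raw_values(SAMPLE)
print(counts)
```

In practice you would feed it the real output saved from the drive and compare the numbers over time, rather than relying on the overall health verdict.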
 

HHawk

Contributor
Joined
Jun 8, 2011
Messages
176
Okay, thanks paleoN!

I highly appreciate your answers. I also thought it would be best to RMA this drive. I already started the procedure with WD. Luckily their RMA process is fast and works well, so I am hoping I will receive my new drive soon.

Should I test the new drive also by writing 010101's with the WD software?
 

uutzinger

Dabbler
Joined
Nov 27, 2011
Messages
43
> This is a sign of a failing drive. That is a very large jump from 1 and then only 2 unreadable sectors. Definitely RMA. Ideally you would have a cold spare on hand, but you will now after this. The six hours confirmed you should RMA the drive.
>
> It's not the SATA cable. The drive is internally reporting the pending sectors.
>
> You are better off paying attention to the SMART attributes than overall SMART health or even the WD-tool. Vendors set very permissive SMART thresholds to not deal with a lot of RMAs.


I read through all posts in this thread and it's very reassuring that the FreeNAS community is on top of problems.
I have exactly the same issue with one drive, and indeed I cannot read the sector from a drive that reports a read failure after a long self test (dd if=/dev/xx... sector=...).

What confuses me are the SMART Data Structure values, which on my disk look almost the same as the ones posted earlier. I have value 0 for read_error_rate, reallocated_sector_ct, and reallocated_event_count, and a 1 in pending_sector_count.

I understand those values depend on the manufacturer, but what I am curious about is why the drive, when identifying a read error with a self test, does not attempt to relocate the sector, and why the user would need to attempt writing to the sector to initiate such a process.

From what I am hearing (problems persisted), I gather the best approach is to replace the drive if pending sector errors occur. Or is there really a way to repair the drive by forcing relocation, assuming this was an isolated incident not reflecting an overall problem with the drive?

Urs
 

paleoN

Wizard
Joined
Apr 22, 2012
Messages
1,403
> Should I test the new drive also by writing 010101's with the WD software?
It won't tell you as much as you would like, but it's certainly better than nothing. Compare all SMART attributes beforehand and afterwards.

> I understand those values depend on the manufacturer, but what I am curious about is why the drive, when identifying a read error with a self test, does not attempt to relocate the sector, and why the user would need to attempt writing to the sector to initiate such a process.
Because it can't read the sector. There is a chance, though unlikely, that the drive will be able to successfully read it in the future. When writing, it knows what it's writing and can spare the sector out if needed.

> From what I am hearing (problems persisted), I gather the best approach is to replace the drive if pending sector errors occur. Or is there really a way to repair the drive by forcing relocation, assuming this was an isolated incident not reflecting an overall problem with the drive?
Change it to "repair" and the answer is yes. A small, non-increasing number of bad sectors is considered relatively "normal" nowadays. Drives come with a number of spare sectors for this. You want to watch out for steadily increasing or significant jumps in the sector error counts. Either is often a sign of impending drive failure.
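
A rough sketch of that rule of thumb, assuming you keep a list of pending-sector readings over time. The function name and the `jump_threshold` of 10 are arbitrary choices for illustration, not values from any vendor or from smartd.

```python
def pending_sector_verdict(history: list[int], jump_threshold: int = 10) -> str:
    """Classify a series of pending-sector readings, oldest first."""
    if len(history) < 2:
        return "not enough data"
    # Change between each pair of consecutive readings.
    deltas = [later - earlier for earlier, later in zip(history, history[1:])]
    if max(deltas) >= jump_threshold:
        return "significant jump"
    if all(d > 0 for d in deltas):
        return "steadily increasing"
    return "stable"

# The counts reported for the drive in this thread: 1, then 2, then 65535.
print(pending_sector_verdict([1, 2, 65535]))
# A single bad sector that never grows.
print(pending_sector_verdict([1, 1, 1]))
```

With the readings from this thread the first call reports a significant jump, while a count that stays at 1 comes back as stable.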
 

HHawk

Contributor
Joined
Jun 8, 2011
Messages
176
I just received my replacement drive; however, after adding the drive, it says it's too small?

Here is the message:

Dec 3 17:02:15 freenas manage.py: [middleware.exceptions:38] [MiddlewareError: Disk replacement failed: "cannot replace 6201553240551106299 with gptid/ce4ea5df-3d62-11e2-b01e-00151736994a: device is too small, "]

Now what can I do...? :(
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Have you rebooted the server? If you have and it still won't work I don't know what to say.

I know years ago when we had 18GB 10k RPM hard drives in a RAID5, we RMA'd a bad drive, and the replacement was the same model but with different firmware. We couldn't use the replacement because it was 13MB smaller than the other drives! We contacted Seagate and they told us that the drive shrank slightly due to new firmware blah blah blah and that we should purchase one of their larger 10k RPM drives... how nice. I noticed something was odd when the RAID controller came up and listed all of the drives with the same size except one. One was slightly smaller.
 

bollar

Patron
Joined
Oct 28, 2012
Messages
411
If the reboot doesn't work (and I don't think it will), then you'll need to find another drive to use as a replacement. I have seen some recent comments on Google from people with WD drives having the exact same problem that you have, for the reason cyberjock mentions. You can either find a drive with the exact same model number as the drives currently working in the array, or one that's larger.
 

HHawk

Contributor
Joined
Jun 8, 2011
Messages
176
Thanks guys for the answers.

Yes, I did reboot the NAS; twice already. No luck. It keeps complaining that the drive is too small.
On a side note, it said the same about the original drive after I wiped it. It also said the drive was too small.

Is there no other solution? And how can I check the drives in FreeNAS (or use FreeBSD commands) to see the exact size of each drive (so I can compare them)?

And is it possible to make the other drives smaller (without losing data)? I have enough space available, so losing a few GBs is not a problem for me if that can solve the problem.

Also, as mentioned above, I had the same problem with the 'bad' drive. It also said it was too small after I tried to add it back to the ZFS pool. And just to be sure: I put back the drive and select 'Replace' in 'View Volume' > 'Volume Status', right?

And as a last resort, how do I replace a drive from the command line? Maybe FreeNAS is a bit bugged (I am using the latest version though).
 

bollar

Patron
Joined
Oct 28, 2012
Messages
411
> Is there no other solution? And how can I check the drives in FreeNAS (or use FreeBSD commands) to see the exact size of each drive (so I can compare them)?
See if smartctl -a [path to the drive] gives you the answer. "User Capacity" lists the bytes available.

> And is it possible to make the other drives smaller (without losing data)? I have enough space available, so losing a few GBs is not a problem for me if that can solve the problem.
From what I can tell, no, that's not possible once the pool has been created. Oracle/Sun mentions this could be an issue on one of their blogs and their solution is to use a larger drive.

I also wouldn't expect a command line replace to give different results. The other reports I found on Google weren't from FreeNAS users, but from WD users.
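
If you save the `smartctl -a` output for each drive, a small script can pull out the byte counts from the "User Capacity" line for comparison. This is only a sketch: the function name is made up, and it assumes the line uses commas as thousands separators, as in English-locale output.

```python
import re

def user_capacity_bytes(smartctl_output: str) -> int:
    """Extract the byte count from the 'User Capacity' line of `smartctl -a`."""
    match = re.search(r"^User Capacity:\s*([\d,]+) bytes", smartctl_output, re.MULTILINE)
    if match is None:
        raise ValueError("no 'User Capacity' line found")
    # Drop the thousands separators before converting.
    return int(match.group(1).replace(",", ""))

# Example line in the format smartctl prints for a 2 TB drive.
sample = "User Capacity:    2,000,398,934,016 bytes [2.00 TB]"
print(user_capacity_bytes(sample))
```

Run it on the saved output of each disk and compare the integers; any difference, even a few MB, would show up immediately.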
 

HHawk

Contributor
Joined
Jun 8, 2011
Messages
176
Okay I just checked and compared both drives:

New drive:
Model Family: Western Digital Caviar Green (Adv. Format)
Device Model: WDC WD20EARX-008FB0
Serial Number: WD-WCAZAJ407237
LU WWN Device Id: 5 0014ee 2079f3b54
Firmware Version: 51.0AB51
User Capacity: 2,000,398,934,016 bytes [2.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Device is: In smartctl database [for details use: -P show]
ATA Version is: 8
ATA Standard is: Exact ATA specification draft version not indicated
Local Time is: Mon Dec 3 19:20:45 2012 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

And 'old' drive:
Model Family: Western Digital Caviar Green (Adv. Format)
Device Model: WDC WD20EARX-00PASB0
Serial Number: WD-WMAZA5248450
LU WWN Device Id: 5 0014ee 6569b0b81
Firmware Version: 51.0AB51
User Capacity: 2,000,398,934,016 bytes [2.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Device is: In smartctl database [for details use: -P show]
ATA Version is: 8
ATA Standard is: Exact ATA specification draft version not indicated
Local Time is: Mon Dec 3 19:22:07 2012 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

It's completely identical except for the serial numbers, LU WWN Device IDs and the last part of the device model (00PASB0 vs 008FB0); however, I looked that up before and the platters and such are similar. :(


//update

I don't know if it's important, but I will post it anyways. When I select 'Replace' I get a new popup saying the following:

Replacing disk None
Member disk ada2 (2.0 TB)

Replace Disk | Cancel

Shouldn't it say something else rather than 'None'....?
If it doesn't know what to replace, it's obvious that the system says that the disk is too small...?
 

bollar

Patron
Joined
Oct 28, 2012
Messages
411
Try this:
Code:
On 09/05/2012 05:06 AM, Yaverot wrote:
> "What is the smallest sized drive I may use to replace this dead
drive?"
> 
> That information has to be someplace because ZFS will say that drive Q is
too small.  Is there an easy way to query that information?

I use fdisk to find this out. For instance say your drive you want to
find the size of is c2t4d0, then do:

# fdisk /dev/rdsk/c2t4d0p0

Near the top fdisk will print this kind of drive info:

             Total disk size is 9345 cylinders
             Cylinder size is 12544 (512 byte) blocks

Simply multiply the numbers and you get the result:

# echo '9345 * 12544 * 512' | bc
60018524160

And that's the size of the drive in 8-bit bytes.

Cheers,
--
Saso

zfs-discuss-finding smallest drive that can be used to replace
 

HHawk

Contributor
Joined
Jun 8, 2011
Messages
176
Okay, I did that fdisk command. Below are the results:


******* Working on device /dev/ada2 *******
parameters extracted from in-core disklabel are:
cylinders=3876021 heads=16 sectors/track=63 (1008 blks/cyl)

Figures below won't work with BIOS for partitions not in cyl 1
parameters to be used for BIOS calculations are:
cylinders=3876021 heads=16 sectors/track=63 (1008 blks/cyl)

Media sector size is 512
Warning: BIOS sector numbering starts with sector 1
Information from DOS bootblock is:
The data for partition 1 is:
sysid 238 (0xee),(EFI GPT)
start 1, size 3907029167 (1907729 Meg), flag 80 (active)
beg: cyl 0/ head 1/ sector 1;
end: cyl 1023/ head 255/ sector 63
The data for partition 2 is:
<UNUSED>
The data for partition 3 is:
<UNUSED>
The data for partition 4 is:
<UNUSED>

******* Working on device /dev/ada5 *******
parameters extracted from in-core disklabel are:
cylinders=3876021 heads=16 sectors/track=63 (1008 blks/cyl)

Figures below won't work with BIOS for partitions not in cyl 1
parameters to be used for BIOS calculations are:
cylinders=3876021 heads=16 sectors/track=63 (1008 blks/cyl)

Media sector size is 512
Warning: BIOS sector numbering starts with sector 1
Information from DOS bootblock is:
The data for partition 1 is:
sysid 238 (0xee),(EFI GPT)
start 1, size 3907029167 (1907729 Meg), flag 80 (active)
beg: cyl 0/ head 1/ sector 1;
end: cyl 1023/ head 255/ sector 63
The data for partition 2 is:
<UNUSED>
The data for partition 3 is:
<UNUSED>
The data for partition 4 is:
<UNUSED>

I don't think I need to echo and calculate anything because this shows they are completely identical?!
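
For what it's worth, running the multiplication from the quoted mailing-list post on these figures gives the same total for both disks, since ada2 and ada5 report the same geometry. A small sketch of that arithmetic follows; the function and variable names are just for illustration, and the only inputs are numbers from the fdisk output above.

```python
SECTOR_SIZE = 512  # "Media sector size is 512" in the fdisk output

def chs_total_sectors(cylinders: int, heads: int, sectors_per_track: int) -> int:
    """Total sectors implied by a cylinders/heads/sectors geometry."""
    return cylinders * heads * sectors_per_track

# Geometry reported identically for ada2 and ada5.
total_sectors = chs_total_sectors(3876021, 16, 63)
total_bytes = total_sectors * SECTOR_SIZE

# Size of partition 1 (the 0xee GPT protective entry), which starts at sector 1.
protective_sectors = 3907029167

print(total_sectors)                       # 3907029168 sectors on the whole disk
print(total_bytes)                         # 2000398934016 bytes on the whole disk
print(total_sectors - protective_sectors)  # 1 sector before the protective partition
```

The byte total matches the User Capacity of 2,000,398,934,016 bytes that smartctl reported for both drives, so at the whole-disk level the two are the same size.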
 