I replaced the disk and I am going through resilvering.
It does not seem to stop. I use FreeNAS-8.2.0-RELEASE-p1-x64.
zpool status
  pool: Vol1
 state: ONLINE
status: One or more devices is currently being resilvered. The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress for 47h42m, 100.00% done, 0h0m to go
I think the sysctl kern.geom.debugflags keeps a user from screwing things up by not letting them write directly to a disk in a raid. You can change it back afterwards; just google how to check the current sysctl value and set it back once everything is working. It will also reset if you ever reboot, as we're only changing it temporarily.
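For the "check the current value and set it back" part, something like the sketch below might do. The function names are made up for illustration; the only real commands assumed are `sysctl -n` (print just the value) and `sysctl name=value`, and it has to run as root.

```shell
#!/bin/sh
# Sketch only: remember the current kern.geom.debugflags value, allow raw
# writes while one command runs, then restore the old value.
# All function names here are hypothetical helpers, not FreeNAS tools.

# Print the current value (sysctl -n prints the value without the name).
get_debugflags() {
    sysctl -n kern.geom.debugflags
}

# Set a new value.
set_debugflags() {
    sysctl kern.geom.debugflags="$1"
}

# Run the given command with debugflags=16, then put the old value back
# and return the command's exit status.
with_raw_writes() {
    old=$(get_debugflags) || return 1
    set_debugflags 16 || return 1
    "$@"
    rc=$?
    set_debugflags "$old"
    return "$rc"
}
```

Usage would be along the lines of `with_raw_writes dd if=/dev/zero of=/dev/ada2 ...`, so the flag is restored even if the dd fails.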
I'd run a long selftest
Code:smartctl -t long /dev/ada2
check the smart information for the unreadable sector, let's call it 'X'
Code:smartctl -A /dev/ada2
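change the sysctl and try writing to the sector. Change the 'X' below. Note that dd counts seek in blocks of bs, so with bs=4096 'X' is the LBA from smartctl divided by 8 (assuming the drive reports 512-byte logical sectors).
Code:sysctl kern.geom.debugflags=16
dd if=/dev/zero of=/dev/ada2 bs=4096 count=1 seek=X conv=noerror,sync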
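One thing to watch with the dd line: seek= is counted in blocks of bs, while smartctl reports the failing LBA in logical sectors. A rough helper for the conversion is sketched below. `lba_to_seek` is a made-up name, and the 512-byte sector default is an assumption, so check what your drive actually reports before trusting it.

```shell
#!/bin/sh
# Sketch only: convert an LBA (counted in logical sectors) into a dd seek=
# value (counted in blocks of bs bytes). lba_to_seek is a hypothetical helper.
# usage: lba_to_seek LBA [SECTOR_SIZE] [BS]
lba_to_seek() {
    lba=$1
    sector=${2:-512}   # assumed logical sector size in bytes
    bs=${3:-4096}      # block size passed to dd
    # Integer division rounds down to the bs-sized block containing the LBA.
    echo $(( lba * sector / bs ))
}
```

For example, `lba_to_seek 1000005` gives 125000 for bs=4096. Keep in mind that a single 4096-byte write zeroes eight 512-byte sectors, not just the bad one; with bs=512 the seek value is simply the LBA.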
check the smart information to see if 'Current_Pending_Sector' went to 0. You may need to repeat some of the steps multiple times if there are multiple unreadable sectors.
Code:smartctl -A /dev/ada2
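If you end up repeating this a few times, it may be easier to pull just the raw value out of the attribute table. A minimal sketch, assuming the usual 10-column `smartctl -A` layout with RAW_VALUE in the last column (`smart_raw` is a made-up name):

```shell
#!/bin/sh
# Sketch only: read `smartctl -A` output on stdin and print the RAW_VALUE
# of the named attribute. Assumes the 10-column attribute table layout
# (ID#, name, flag, value, worst, thresh, type, updated, when_failed, raw).
# smart_raw is a hypothetical helper name.
smart_raw() {
    awk -v name="$1" '$2 == name { print $10 }'
}
```

Something like `smartctl -A /dev/ada2 | smart_raw Current_Pending_Sector` should then print just the count.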
Now run another smart test and hopefully it can complete without error.
Code:smartctl -t long /dev/ada2
smartctl -A /dev/ada2 #check status to see if it completed
Now run a scrub (either from the gui or with 'zpool scrub poolname').
Check the scrub's status and hopefully it fixes some errors.
Code:zpool status -v poolname
[Dylan@freenas] /mnt/ToxServMain# smartctl -A /dev/ada5
smartctl 6.1 2013-03-16 r3800 [FreeBSD 9.1-STABLE amd64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       426
  3 Spin_Up_Time            0x0027   181   178   021    Pre-fail  Always       -       5950
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       734
  5 Reallocated_Sector_Ct   0x0033   198   198   140    Pre-fail  Always       -       79
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   092   092   000    Old_age   Always       -       6023
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       40
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       12
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       721
194 Temperature_Celsius     0x0022   119   103   000    Old_age   Always       -       31
196 Reallocated_Event_Count 0x0032   196   196   000    Old_age   Always       -       4
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       2
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       8
[Dylan@freenas] /mnt/ToxServMain# smartctl -l selftest /dev/ada5
smartctl 6.1 2013-03-16 r3800 [FreeBSD 9.1-STABLE amd64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%      6005         -
# 2  Extended offline    Completed without error       00%      5909         -
# 3  Extended offline    Completed without error       00%      5506         -
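To pick the failing LBA out of a self-test log like the one above, a filter along these lines might help. It is only a sketch: `first_error_lba` is a made-up name, and it assumes each log entry starts with `#` and ends with the LBA_of_first_error column, which shows `-` when the test passed.

```shell
#!/bin/sh
# Sketch only: read `smartctl -l selftest` output on stdin and print the
# LBA_of_first_error from the first (most recent) log entry that has one.
# Assumes log entries start with "#" and the LBA is the last field,
# with "-" meaning no error. first_error_lba is a hypothetical helper name.
first_error_lba() {
    awk '/^ *#/ && $NF != "-" { print $NF; exit }'
}
```

With `smartctl -l selftest /dev/ada5 | first_error_lba`, a log where every test completed without error prints nothing, so there is no sector to rewrite from the self-test alone.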
You know we just had this discussion twice in the last week? LOL. I'll forgive you though as the search feature of the forums is fairly broken.
What I'd do is pull that drive out and run badblocks on it. It'll be time consuming and destructive to the drive's data, but that's what redundancy is for. But something like badblocks -svw -b 4096 -t 0xFF -t 0x00 -t 0xFF /dev/adaX should determine if you have any bad sectors.
If I were in your case I'd just replace the disk. Of course there's no justification for an RMA at this time since it passes all tests so things get a little complex and disk replacement is something you'll have to decide on for yourself.
I'd imagine if you choose to wait the disk will probably have problems that will warrant RMA in the next month or two at the most.
This test is primarily for testing new drives and is a read-write test. As the pattern is written to every accessible block, the device effectively gets wiped. Default is an extensive test with four passes using four different patterns: 0xaa (10101010), 0x55 (01010101), 0xff (11111111) and 0x00 (00000000). For some devices this will take a couple of days to complete.
Source
Or did you mean the data in your pool? ;)
Yes, theoretically you can just run it from a live CD. Just may take a while.
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   198   136   051    Pre-fail  Always       -       869
  3 Spin_Up_Time            0x0027   253   170   021    Pre-fail  Always       -       975
  4 Start_Stop_Count        0x0032   099   099   000    Old_age   Always       -       1058
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   089   089   000    Old_age   Always       -       8386
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       882
192 Power-Off_Retract_Count 0x0032   199   199   000    Old_age   Always       -       788
193 Load_Cycle_Count        0x0032   001   001   000    Old_age   Always       -       633490
194 Temperature_Celsius     0x0022   118   106   000    Old_age   Always       -       32
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   001   001   000    Old_age   Always       -       65535
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       25
I got the message "Device: /dev/ada0, 65535 Currently unreadable (pending) sectors". Does this mean 65535 sectors are bad?
Actually, it means more than that. That log only holds 65535 entries. The drive da0 is clearly crap.
Thanks cyberjock. So should I replace this disk immediately? It is a raidz1 system with 5 disks. Is it possible to convert it to raidz2 directly, without copying the data out and later copying it back?
You don't need to replace the disk if you still have extra sectors.
I was able to fix this on my freenas system about a week ago.
You basically need to:
1) run a smart test and find where the test fails
2) write directly to that sector with dd (this forces the drive to reallocate the sector to one of your extra sectors)
3) run a scrub, and as long as you have a clean redundant copy everything should be back to normal.
In step two you need to change a sysctl to allow writing directly to a drive.
http://daemon-notes.com/articles/other/smartmontools/current-pending
Good luck