I am running FreeNAS 9.1.0 on a system with 6 × 3 TB hard drives in RAIDZ2. The system was built a little over a month ago with all new components and hard drives.
When I log into the GUI front end in my web browser the main log (a preview is shown at the bottom of the screen) is reporting the following:
Code:
Oct 6 06:32:01 freenas smartd[2337]: Device: /dev/ada1, FAILED SMART self-check. BACK UP DATA NOW!
Oct 12 10:32:01 freenas smartd[2337]: Device: /dev/ada0, FAILED SMART self-check. BACK UP DATA NOW!
Oct 13 08:02:02 freenas smartd[2337]: Device: /dev/ada2, FAILED SMART self-check. BACK UP DATA NOW!
Oct 13 08:02:02 freenas smartd[2337]: Device: /dev/ada3, FAILED SMART self-check. BACK UP DATA NOW!
Very clear, but upsetting, since that's 4 out of 6 hard drives. I have spent today doing extra backups of my NAS, for obvious reasons. Wanting some more details, I opened the console and ran the command:
smartctl -a /dev/ada0 | more
the interesting bits were:
Code:
=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Red (AF)
Device Model:     WDC WD30EFRX-68AX9N0
Serial Number:    WD-WCC1T1490741
LU WWN Device Id: 5 0014ee 2b38719e7
Firmware Version: 80.00A80
User Capacity:    3,000,592,982,016 bytes [3.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2 (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Sun Oct 13 18:53:39 2013 PDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
...
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   179   175   021    Pre-fail  Always       -       6008
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       18
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   099   099   000    Old_age   Always       -       1067
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       18
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       8
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       9
194 Temperature_Celsius     0x0022   122   115   000    Old_age   Always       -       28
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       17
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%             1038  -
# 2  Extended offline    Completed without error       00%              870  -
Unless I am reading this totally wrong, this is saying there are zero errors, and everything is looking good. I am getting the same results for the other 5 drives in the system. I can see no useful difference between the "failing" drives and the healthy drives.
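For what it's worth, here is roughly how the check can be repeated across all six drives in one go (a sketch; the device names ada0 through ada5 are assumed to match my system):

```shell
#!/bin/sh
# Sketch: print the SMART overall-health verdict for each drive.
# Device names ada0..ada5 are assumed; adjust to match the output
# of "camcontrol devlist" on your own box.
for d in ada0 ada1 ada2 ada3 ada4 ada5; do
    echo "=== /dev/${d} ==="
    smartctl -H "/dev/${d}" | grep -i 'overall-health' \
        || echo "  (no result: drive missing or smartctl unavailable)"
done
```

Every drive reports PASSED this way, which is what makes the smartd log messages so confusing.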
I then ran
zpool status Main-Storage
which returned:
Code:
  pool: Main-Storage
 state: ONLINE
  scan: scrub repaired 0 in 4h50m with 0 errors on Sat Oct 5 04:50:21 2013
config:

	NAME                                            STATE     READ WRITE CKSUM
	Main-Storage                                    ONLINE       0     0     0
	  raidz2-0                                      ONLINE       0     0     0
	    gptid/ef2bd5c3-1188-11e3-8467-c83a35d12cd7  ONLINE       0     0     0
	    gptid/efa5f830-1188-11e3-8467-c83a35d12cd7  ONLINE       0     0     0
	    gptid/f0222234-1188-11e3-8467-c83a35d12cd7  ONLINE       0     0     0
	    gptid/f09a8dc3-1188-11e3-8467-c83a35d12cd7  ONLINE       0     0     0
	    gptid/f1108fc6-1188-11e3-8467-c83a35d12cd7  ONLINE       0     0     0
	    gptid/f186fb53-1188-11e3-8467-c83a35d12cd7  ONLINE       0     0     0

errors: No known data errors
Can anyone offer any thoughts or suggestions? Am I just misunderstanding this? The log messages are very clear, but seem to contradict the SMART test results.
In theory a scrub will be run every Saturday, as recommended here: http://doc.freenas.org/index.php/ZFS_Scrubs
but my initial settings were not quite right, so it seems only one scrub has run so far.
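In the meantime, a scrub can be started by hand so there is a fresh result to look at (a sketch, run as root; the pool name is the one shown above):

```shell
# Start a manual scrub of the pool, then check on its progress;
# the "scan:" line of zpool status shows how far along it is.
zpool scrub Main-Storage
zpool status Main-Storage
```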
If I have not provided enough details, which is likely, what information will help?