ZFS - Drives go offline/removed - Can't seem to get SMART reports

BigOBX

Cadet
Joined
Jan 12, 2015
Messages
8
I'm willing to post more detailed info if needed, but I'm running:

FreeNAS-9.3-STABLE-201501090144

I have three ZFS pools. Two are working fine on the onboard controllers, but my third, larger pool is not working well. When I first got the system up and running about 3 weeks ago, the pool worked fine for a few days; I moved a bunch of data to the array and then moved it off again. A few days later I noticed that some updates were available, so I applied them via the GUI, and maybe 3-4 days after that I started having issues with the pool.

Here is the current status of the array:

[BigO@FREENAS] ~% zpool status LSI-11TB
  pool: LSI-11TB
 state: UNAVAIL
status: One or more devices could not be opened. There are insufficient
        replicas for the pool to continue functioning.
action: Attach the missing device and online it using 'zpool online'.
   see: http://illumos.org/msg/ZFS-8000-3C
  scan: scrub repaired 4K in 0h0m with 0 errors on Mon Jan 12 11:38:36 2015
config:

NAME                                            STATE     READ WRITE CKSUM
LSI-11TB                                        UNAVAIL      0     0     0
  raidz1-0                                      ONLINE       0     0     0
    gptid/2430592d-999d-11e4-9d68-001b212dea45  ONLINE       0     0     0
    gptid/24cbca20-999d-11e4-9d68-001b212dea45  ONLINE       0     0     0
    gptid/256fcc61-999d-11e4-9d68-001b212dea45  ONLINE       0     0     0
    gptid/260e424d-999d-11e4-9d68-001b212dea45  ONLINE       0     0     0
  raidz1-1                                      UNAVAIL      0     0     0
    1084237407502822741                         REMOVED      0     0     0  was /dev/gptid/26bf5793-999d-11e4-9d68-001b212dea45
    gptid/276a88f2-999d-11e4-9d68-001b212dea45  ONLINE       0     0     0
    14279668410874385647                        UNAVAIL      0     0     0  was /dev/gptid/280fa588-999d-11e4-9d68-001b212dea45
    8516515671396866123                         UNAVAIL      0     0     0  was /dev/gptid/28afd0f9-999d-11e4-9d68-001b212dea45

errors: No known data errors
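
For readers following along, here is a minimal Python sketch of how the config section above can be summarized per vdev. It is only an illustration: the function names `unhealthy_by_vdev` and `vdev_survives` are made up for this example, the parser handles only raidz vdevs, and the rule it applies (a raidzN vdev tolerates at most N missing members) is the standard RAIDZ parity rule rather than anything the pool reports directly.

```python
import re

# Device lines copied from the `zpool status LSI-11TB` output above.
STATUS = """\
NAME STATE READ WRITE CKSUM
LSI-11TB UNAVAIL 0 0 0
raidz1-0 ONLINE 0 0 0
gptid/2430592d-999d-11e4-9d68-001b212dea45 ONLINE 0 0 0
gptid/24cbca20-999d-11e4-9d68-001b212dea45 ONLINE 0 0 0
gptid/256fcc61-999d-11e4-9d68-001b212dea45 ONLINE 0 0 0
gptid/260e424d-999d-11e4-9d68-001b212dea45 ONLINE 0 0 0
raidz1-1 UNAVAIL 0 0 0
1084237407502822741 REMOVED 0 0 0 was /dev/gptid/26bf5793-999d-11e4-9d68-001b212dea45
gptid/276a88f2-999d-11e4-9d68-001b212dea45 ONLINE 0 0 0
14279668410874385647 UNAVAIL 0 0 0 was /dev/gptid/280fa588-999d-11e4-9d68-001b212dea45
8516515671396866123 UNAVAIL 0 0 0 was /dev/gptid/28afd0f9-999d-11e4-9d68-001b212dea45
"""

STATES = {"ONLINE", "DEGRADED", "FAULTED", "OFFLINE", "REMOVED", "UNAVAIL"}
# Matches raidz vdev names such as "raidz1-0"; the digit after "raidz" is the parity level.
VDEV_RE = re.compile(r"^raidz([123])?-\d+$")


def unhealthy_by_vdev(text):
    """Map each raidz vdev to the (device, state) pairs of members that are not ONLINE."""
    result = {}
    current = None
    for line in text.splitlines():
        parts = line.split()
        # Skip the header and anything that is not a "name state read write cksum" row.
        if len(parts) < 5 or parts[1] not in STATES:
            continue
        name, state = parts[0], parts[1]
        if VDEV_RE.match(name):
            current = name
            result[current] = []
        elif current is not None and state != "ONLINE":
            result[current].append((name, state))
    return result


def vdev_survives(vdev_name, missing_count):
    """A raidzN vdev stays usable while no more than N members are missing."""
    parity = int(VDEV_RE.match(vdev_name).group(1) or 1)
    return missing_count <= parity


if __name__ == "__main__":
    for vdev, bad in unhealthy_by_vdev(STATUS).items():
        print(vdev, len(bad), "missing, usable:", vdev_survives(vdev, len(bad)))
```

With three of four members missing, raidz1-1 is well past what single parity can cover, which is consistent with the pool showing UNAVAIL rather than DEGRADED.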

If I try to "online" the drives, the status doesn't change:
[BigO@FREENAS] ~% sudo zpool online LSI-11TB 1084237407502822741
warning: device '1084237407502822741' onlined, but remains in faulted state
use 'zpool replace' to replace devices that are no longer present
[BigO@FREENAS] ~% sudo zpool online LSI-11TB 14279668410874385647
warning: device '14279668410874385647' onlined, but remains in faulted state
use 'zpool replace' to replace devices that are no longer present
[BigO@FREENAS] ~% sudo zpool online LSI-11TB 8516515671396866123
warning: device '8516515671396866123' onlined, but remains in faulted state
use 'zpool replace' to replace devices that are no longer present

I tried to set up a SMART report to run every hour on these drives, but I'm not getting any emails about it. I am getting other emails from the NAS, so I'm at a loss there as well.

Please help me out. The array is new and there is no data on it, so I'm not worried about data loss, but I would like to get the array working soon, as my Synology is running low on space.
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
Hardware specs would help--motherboard, CPU, RAM (type and quantity), disk controller, firmware type and version on the disk controller, disks. RAIDZ1 isn't recommended; your data would be safer with an 8-disk RAIDZ2 configuration, but I don't think that's your problem right now.

What's the output of camcontrol devlist?

SMART only sends emails if there are errors, so you shouldn't expect SMART emails on a routine basis.
 

BigOBX

Cadet
Joined
Jan 12, 2015
Messages
8
Motherboard - ASUS A8N-SLI
CPU - Opteron 165
Memory - 4 GB Kingston ValueRAM
Controller - Intel sa8ci flashed with LSI IT firmware
Disks - WD20EARS (WD Green)

[BigO@FREENAS] ~% sudo camcontrol devlist
<WDC WD1001FALS-00J7B0 05.00K05> at scbus2 target 0 lun 0 (pass0,ada0)
<WDC WD1001FALS-00J7B0 05.00K05> at scbus3 target 0 lun 0 (pass1,ada1)
<WDC WD1001FALS-00J7B0 05.00K05> at scbus4 target 0 lun 0 (pass2,ada2)
<WDC WD1001FALS-00J7B0 05.00K05> at scbus5 target 0 lun 0 (pass3,ada3)
<WDC WD10EADS-00M2B0 01.00A01> at scbus6 target 0 lun 0 (pass4,ada4)
<WDC WD10EADS-00M2B0 01.00A01> at scbus7 target 0 lun 0 (pass5,ada5)
<WDC WD10EADS-00M2B0 01.00A01> at scbus8 target 0 lun 0 (pass6,ada6)
<WDC WD10EADS-19M2B0 01.00A01> at scbus9 target 0 lun 0 (pass7,ada7)
<ATA WDC WD20EARS-22M AB51> at scbus10 target 0 lun 0 (pass8,da0)
<ATA WDC WD20EARS-22M AB51> at scbus10 target 1 lun 0 (pass9,da1)
<ATA WDC WD20EARS-22M AB51> at scbus10 target 2 lun 0 (pass10,da2)
<ATA WDC WD20EARS-22M AB51> at scbus10 target 3 lun 0 (pass11,da3)
<ATA WDC WD20EARS-22M AB51> at scbus10 target 4 lun 0 (pass14,da4)
<ATA WDC WD20EARS-22M AB51> at scbus10 target 5 lun 0 (da5,pass15)
<ATA WDC WD20EARS-22M AB51> at scbus10 target 6 lun 0 (pass13,da6)
<ATA WDC WD20EARS-00M AB50> at scbus10 target 7 lun 0 (pass12,da7)
< 8.07> at scbus12 target 0 lun 0 (pass16,da8)
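
To see at a glance which disks hang off which bus, the listing above can be grouped by `scbus` number. The following is a rough Python sketch, not a FreeNAS tool: `disks_by_bus` is a name invented for this example, and the regular expression assumes the line layout shown in this particular output.

```python
import re
from collections import defaultdict

# Copied from the `camcontrol devlist` output above.
DEVLIST = """\
<WDC WD1001FALS-00J7B0 05.00K05> at scbus2 target 0 lun 0 (pass0,ada0)
<WDC WD1001FALS-00J7B0 05.00K05> at scbus3 target 0 lun 0 (pass1,ada1)
<WDC WD1001FALS-00J7B0 05.00K05> at scbus4 target 0 lun 0 (pass2,ada2)
<WDC WD1001FALS-00J7B0 05.00K05> at scbus5 target 0 lun 0 (pass3,ada3)
<WDC WD10EADS-00M2B0 01.00A01> at scbus6 target 0 lun 0 (pass4,ada4)
<WDC WD10EADS-00M2B0 01.00A01> at scbus7 target 0 lun 0 (pass5,ada5)
<WDC WD10EADS-00M2B0 01.00A01> at scbus8 target 0 lun 0 (pass6,ada6)
<WDC WD10EADS-19M2B0 01.00A01> at scbus9 target 0 lun 0 (pass7,ada7)
<ATA WDC WD20EARS-22M AB51> at scbus10 target 0 lun 0 (pass8,da0)
<ATA WDC WD20EARS-22M AB51> at scbus10 target 1 lun 0 (pass9,da1)
<ATA WDC WD20EARS-22M AB51> at scbus10 target 2 lun 0 (pass10,da2)
<ATA WDC WD20EARS-22M AB51> at scbus10 target 3 lun 0 (pass11,da3)
<ATA WDC WD20EARS-22M AB51> at scbus10 target 4 lun 0 (pass14,da4)
<ATA WDC WD20EARS-22M AB51> at scbus10 target 5 lun 0 (da5,pass15)
<ATA WDC WD20EARS-22M AB51> at scbus10 target 6 lun 0 (pass13,da6)
<ATA WDC WD20EARS-00M AB50> at scbus10 target 7 lun 0 (pass12,da7)
< 8.07> at scbus12 target 0 lun 0 (pass16,da8)
"""

LINE_RE = re.compile(
    r"<(?P<ident>[^>]*)>\s+at scbus(?P<bus>\d+) target (?P<target>\d+) "
    r"lun (?P<lun>\d+) \((?P<names>[^)]*)\)"
)


def disks_by_bus(text):
    """Return {bus_number: [(device_name, identity_string), ...]} in listing order."""
    by_bus = defaultdict(list)
    for line in text.splitlines():
        match = LINE_RE.search(line)
        if not match:
            continue
        # The peripheral list may read "pass9,da1" or "da5,pass15"; keep the non-pass name.
        names = [n for n in match.group("names").split(",") if not n.startswith("pass")]
        if names:
            by_bus[int(match.group("bus"))].append((names[0], match.group("ident").strip()))
    return dict(by_bus)


if __name__ == "__main__":
    for bus, disks in sorted(disks_by_bus(DEVLIST).items()):
        print("scbus%d:" % bus, ", ".join(name for name, _ in disks))
```

In this output all eight WD20EARS disks (da0 through da7) share scbus10, while each ada disk sits on its own bus.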
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
Well, your hardware is suboptimal in at least a few ways: (1) AMD processors aren't generally recommended with FreeBSD, (2) desktop-grade motherboard, (3) non-ECC RAM, and (4) only half the minimum amount of RAM. I don't really expect that any of these is what's causing your problem, but none of them is good, and we've seen lots of pools lost when users don't have enough RAM. What version of the IT firmware are you using?

Tell us a bit more about your disk arrangement. Your zpool status indicates 8 disks, in two four-disk RAIDZ1 vdevs. Your camcontrol devlist lists 16 disks. Which 8 are in use in your pool, and what are you doing with the other 8? How are all of them connected?

What's the output (in code tags) of "smartctl -a /dev/ada0"? What about "smartctl -a /dev/da0"?
 

BigOBX

Cadet
Joined
Jan 12, 2015
Messages
8
Yeah, the hardware isn't the best, but it ran the old version of FreeNAS before the fork pretty well. I know this version is not the same, but I don't really have the funds for a whole new build, so I had to work with what I had lying around.

I don't recall the firmware version right now, but I can get that info a bit later; I'm going to be stepping out soon and need to bounce the box to get it. It could be out of date, as I flashed it about 2 years ago and it's been that way since.

The WD20EARS disks are the ones that are part of the array in question; the other 8 disks are for my other two arrays (4 disks each) and are working correctly.

Code:
=== START OF INFORMATION SECTION ===

Model Family:     Western Digital Caviar Black

Device Model:     WDC WD1001FALS-00J7B0

Serial Number:    WD-WMATV0725302

LU WWN Device Id: 5 0014ee 0abbf14cd

Firmware Version: 05.00K05

User Capacity:    1,000,204,886,016 bytes [1.00 TB]

Sector Size:      512 bytes logical/physical

Device is:        In smartctl database [for details use: -P show]

ATA Version is:   ATA8-ACS (minor revision not indicated)

SATA Version is:  SATA 2.5, 3.0 Gb/s

Local Time is:    Mon Jan 12 12:51:19 2015 EST

SMART support is: Available - device has SMART capability.

SMART support is: Enabled


=== START OF READ SMART DATA SECTION ===

SMART overall-health self-assessment test result: PASSED


General SMART Values:

Offline data collection status:  (0x84)Offline data collection activity

was suspended by an interrupting command from host.

Auto Offline Data Collection: Enabled.

Self-test execution status:      (   0)The previous self-test routine completed

without error or no self-test has ever

been run.

Total time to complete Offline

data collection: (19800) seconds.

Offline data collection

capabilities:  (0x7b) SMART execute Offline immediate.

Auto Offline data collection on/off support.

Suspend Offline collection upon new

command.

Offline surface scan supported.

Self-test supported.

Conveyance Self-test supported.

Selective Self-test supported.

SMART capabilities:            (0x0003)Saves SMART data before entering

power-saving mode.

Supports SMART auto save timer.

Error logging capability:        (0x01)Error logging supported.

General Purpose Logging supported.

Short self-test routine

recommended polling time:  (   2) minutes.

Extended self-test routine

recommended polling time:  ( 228) minutes.

Conveyance self-test routine

recommended polling time:  (   5) minutes.

SCT capabilities:        (0x303f)SCT Status supported.

SCT Error Recovery Control supported.

SCT Feature Control supported.

SCT Data Table supported.


SMART Attributes Data Structure revision number: 16

Vendor Specific SMART Attributes with Thresholds:

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE

  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0

  3 Spin_Up_Time            0x0027   230   217   021    Pre-fail  Always       -       8483

  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       545

  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0

  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0

  9 Power_On_Hours          0x0032   057   057   000    Old_age   Always       -       31637

 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0

 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0

 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       543

192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       540

193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       545

194 Temperature_Celsius     0x0022   118   079   000    Old_age   Always       -       32

196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0

197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0

198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0

199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0

200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0


SMART Error Log Version: 1

Warning: ATA error count 80 inconsistent with error log pointer 1


ATA Error Count: 80 (device log contains only the most recent five errors)

CR = Command Register [HEX]

FR = Features Register [HEX]

SC = Sector Count Register [HEX]

SN = Sector Number Register [HEX]

CL = Cylinder Low Register [HEX]

CH = Cylinder High Register [HEX]

DH = Device/Head Register [HEX]

DC = Device Command Register [HEX]

ER = Error register [HEX]

ST = Status register [HEX]

Powered_Up_Time is measured from power on, and printed as

DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,

SS=sec, and sss=millisec. It "wraps" after 49.710 days.


Error 80 occurred at disk power-on lifetime: 31091 hours (1295 days + 11 hours)

  When the command that caused the error occurred, the device was active or idle.


  After command completion occurred, registers were:

  ER ST SC SN CL CH DH

  -- -- -- -- -- -- --

  40 51 00 0b 02 00 e0  Error: UNC at LBA = 0x0000020b = 523


  Commands leading to the command that caused the error were:

  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name

  -- -- -- -- -- -- -- --  ----------------  --------------------

  c8 00 10 00 02 00 00 02   1d+01:14:59.415  READ DMA

  c8 00 10 00 02 00 00 02   1d+01:14:57.076  READ DMA

  c8 00 10 00 02 00 00 02   1d+01:14:54.736  READ DMA

  c8 00 10 00 02 00 00 02   1d+01:14:52.243  READ DMA

  c8 00 10 00 02 00 00 02   1d+01:14:50.071  READ DMA


Error 79 occurred at disk power-on lifetime: 31091 hours (1295 days + 11 hours)

  When the command that caused the error occurred, the device was active or idle.


  After command completion occurred, registers were:

  ER ST SC SN CL CH DH

  -- -- -- -- -- -- --

  40 51 00 0b 02 00 e0  Error: UNC at LBA = 0x0000020b = 523


  Commands leading to the command that caused the error were:

  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name

  -- -- -- -- -- -- -- --  ----------------  --------------------

  c8 00 10 00 02 00 00 02   1d+01:14:57.076  READ DMA

  c8 00 10 00 02 00 00 02   1d+01:14:54.736  READ DMA

  c8 00 10 00 02 00 00 02   1d+01:14:52.243  READ DMA

  c8 00 10 00 02 00 00 02   1d+01:14:50.071  READ DMA

  c8 00 10 00 00 00 00 02   1d+01:14:50.070  READ DMA


Error 78 occurred at disk power-on lifetime: 31091 hours (1295 days + 11 hours)

  When the command that caused the error occurred, the device was active or idle.


  After command completion occurred, registers were:

  ER ST SC SN CL CH DH

  -- -- -- -- -- -- --

  40 51 00 0b 02 00 e0  Error: UNC at LBA = 0x0000020b = 523


  Commands leading to the command that caused the error were:

  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name

  -- -- -- -- -- -- -- --  ----------------  --------------------

  c8 00 10 00 02 00 00 02   1d+01:14:54.736  READ DMA

  c8 00 10 00 02 00 00 02   1d+01:14:52.243  READ DMA

  c8 00 10 00 02 00 00 02   1d+01:14:50.071  READ DMA

  c8 00 10 00 00 00 00 02   1d+01:14:50.070  READ DMA

  c8 00 10 10 00 00 00 02   1d+01:14:50.070  READ DMA


Error 77 occurred at disk power-on lifetime: 31091 hours (1295 days + 11 hours)

  When the command that caused the error occurred, the device was active or idle.


  After command completion occurred, registers were:

  ER ST SC SN CL CH DH

  -- -- -- -- -- -- --

  40 51 00 0b 02 00 e0  Error: UNC at LBA = 0x0000020b = 523


  Commands leading to the command that caused the error were:

  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name

  -- -- -- -- -- -- -- --  ----------------  --------------------

  c8 00 10 00 02 00 00 02   1d+01:14:52.243  READ DMA

  c8 00 10 00 02 00 00 02   1d+01:14:50.071  READ DMA

  c8 00 10 00 00 00 00 02   1d+01:14:50.070  READ DMA

  c8 00 10 10 00 00 00 02   1d+01:14:50.070  READ DMA

  c8 00 10 80 00 00 00 02   1d+01:14:50.070  READ DMA


Error 76 occurred at disk power-on lifetime: 31091 hours (1295 days + 11 hours)

  When the command that caused the error occurred, the device was active or idle.


  After command completion occurred, registers were:

  ER ST SC SN CL CH DH

  -- -- -- -- -- -- --

  40 51 00 0b 02 00 e0  Error: UNC at LBA = 0x0000020b = 523


  Commands leading to the command that caused the error were:

  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name

  -- -- -- -- -- -- -- --  ----------------  --------------------

  c8 00 10 00 02 00 00 02   1d+01:14:50.071  READ DMA

  c8 00 10 00 00 00 00 02   1d+01:14:50.070  READ DMA

  c8 00 10 10 00 00 00 02   1d+01:14:50.070  READ DMA

  c8 00 10 80 00 00 00 02   1d+01:14:50.070  READ DMA

  c8 00 10 00 02 00 00 02   1d+01:14:47.898  READ DMA


SMART Self-test log structure revision number 1

No self-tests have been logged.  [To run self-tests, use: smartctl -t]


SMART Selective self-test log data structure revision number 1

SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS

    1        0        0  Not_testing

    2        0        0  Not_testing

    3        0        0  Not_testing

    4        0        0  Not_testing

    5        0        0  Not_testing

Selective self-test flags (0x0):

  After scanning selected spans, do NOT read-scan remainder of disk.

If Selective self-test is pending on power-up, resume after 0 minute delay.
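
As a worked example of reading the error log entries above: the "LBA = 0x0000020b = 523" figure can be recomputed from the register dump. For a 28-bit command such as READ DMA (0xc8), the LBA is assembled from the low nibble of DH, then CH, CL and SN. The Python sketch below assumes that layout and uses an invented helper name, `decode_error_registers`; it decodes only the two flag bits it names.

```python
def decode_error_registers(line):
    """Decode an 'ER ST SC SN CL CH DH' register line from a smartctl error log entry."""
    fields = line.split()
    er, st, sc, sn, cl, ch, dh = (int(f, 16) for f in fields[:7])
    # 28-bit LBA: bits 27-24 come from the low nibble of DH, then CH, CL, SN.
    lba = ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn
    return {
        "lba": lba,
        "unc": bool(er & 0x40),  # error register bit 6: uncorrectable data
        "err": bool(st & 0x01),  # status register bit 0: an error occurred
    }


if __name__ == "__main__":
    # Register line copied from "Error 80" above.
    sample = "40 51 00 0b 02 00 e0  Error: UNC at LBA = 0x0000020b = 523"
    print(decode_error_registers(sample))
```

All five logged errors carry the same register values, so they point at the same sector (LBA 523), at 31091 power-on hours, roughly 546 hours before this report was taken at 31637.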


Code:
=== START OF INFORMATION SECTION ===

Model Family:     Western Digital Caviar Green (AF)

Device Model:     WDC WD20EARS-22MVWB0

Serial Number:    WD-WCAZA3952563

LU WWN Device Id: 5 0014ee 25ad36a48

Firmware Version: 51.0AB51

User Capacity:    2,000,398,934,016 bytes [2.00 TB]

Sector Sizes:     512 bytes logical, 4096 bytes physical

Device is:        In smartctl database [for details use: -P show]

ATA Version is:   ATA8-ACS (minor revision not indicated)

SATA Version is:  SATA 2.6, 3.0 Gb/s

Local Time is:    Mon Jan 12 12:53:55 2015 EST

SMART support is: Available - device has SMART capability.

SMART support is: Enabled


=== START OF READ SMART DATA SECTION ===

SMART overall-health self-assessment test result: PASSED


General SMART Values:

Offline data collection status:  (0x85)Offline data collection activity

was aborted by an interrupting command from host.

Auto Offline Data Collection: Enabled.

Self-test execution status:      ( 245)Self-test routine in progress...

50% of test remaining.

Total time to complete Offline 

data collection: (35460) seconds.

Offline data collection

capabilities:  (0x7b) SMART execute Offline immediate.

Auto Offline data collection on/off support.

Suspend Offline collection upon new

command.

Offline surface scan supported.

Self-test supported.

Conveyance Self-test supported.

Selective Self-test supported.

SMART capabilities:            (0x0003)Saves SMART data before entering

power-saving mode.

Supports SMART auto save timer.

Error logging capability:        (0x01)Error logging supported.

General Purpose Logging supported.

Short self-test routine 

recommended polling time:  (   2) minutes.

Extended self-test routine

recommended polling time:  ( 342) minutes.

Conveyance self-test routine

recommended polling time:  (   5) minutes.

SCT capabilities:        (0x3035)SCT Status supported.

SCT Feature Control supported.

SCT Data Table supported.


SMART Attributes Data Structure revision number: 16

Vendor Specific SMART Attributes with Thresholds:

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE

  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0

  3 Spin_Up_Time            0x0027   253   253   021    Pre-fail  Always       -       1233

  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       152

  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0

  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0

  9 Power_On_Hours          0x0032   079   079   000    Old_age   Always       -       15973

 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0

 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0

 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       150

192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       149

193 Load_Cycle_Count        0x0032   133   133   000    Old_age   Always       -       201230

194 Temperature_Celsius     0x0022   120   113   000    Old_age   Always       -       30

196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0

197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0

198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0

199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0

200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0


SMART Error Log Version: 1

No Errors Logged


SMART Self-test log structure revision number 1

Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error

# 1  Extended offline    Completed without error       00%     15970         -

# 2  Extended offline    Completed without error       00%     15964         -

# 3  Extended offline    Completed without error       00%     15958         -

# 4  Extended offline    Completed without error       00%     15952         -

# 5  Extended offline    Interrupted (host reset)      20%     15945         -

# 6  Extended offline    Completed without error       00%     15941         -

# 7  Extended offline    Interrupted (host reset)      90%     15934         -

# 8  Extended offline    Completed without error       00%     15930         -

# 9  Extended offline    Completed without error       00%     15906         -

#10  Extended offline    Completed without error       00%     15882         -

#11  Extended offline    Completed without error       00%     15858         -

#12  Extended offline    Completed without error       00%     15835         -

#13  Extended offline    Completed without error       00%     15811         -

#14  Extended offline    Completed without error       00%     15787         -

#15  Extended offline    Completed without error       00%     15763         -

#16  Extended offline    Interrupted (host reset)      60%     15735         -

#17  Extended offline    Completed without error       00%     15715         -

#18  Extended offline    Completed without error       00%     15691         -


SMART Selective self-test log data structure revision number 1

 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS

    1        0        0  Not_testing

    2        0        0  Not_testing

    3        0        0  Not_testing

    4        0        0  Not_testing

    5        0        0  Not_testing

Selective self-test flags (0x0):

  After scanning selected spans, do NOT read-scan remainder of disk.

If Selective self-test is pending on power-up, resume after 0 minute delay.
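
One way to compare the two attribute tables above is to normalize a raw counter by Power_On_Hours. The Python sketch below does that for Load_Cycle_Count; the helper names `raw_values` and `load_cycles_per_hour` are invented for this example, and only the two relevant rows from each table are included. Whether the resulting rate has anything to do with the drives dropping out is not established here.

```python
# Rows copied from the /dev/ada0 (Caviar Black) attribute table above.
ADA0_ROWS = """\
  9 Power_On_Hours          0x0032   057   057   000    Old_age   Always       -       31637
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       545
"""

# Rows copied from the /dev/da0 (Caviar Green) attribute table above.
DA0_ROWS = """\
  9 Power_On_Hours          0x0032   079   079   000    Old_age   Always       -       15973
193 Load_Cycle_Count        0x0032   133   133   000    Old_age   Always       -       201230
"""


def raw_values(table):
    """Return {attribute_name: raw_value} for rows shaped like smartctl -A output."""
    values = {}
    for line in table.splitlines():
        parts = line.split()
        # Expect: ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW
        if len(parts) >= 10 and parts[0].isdigit() and parts[2].startswith("0x"):
            values[parts[1]] = int(parts[9])
    return values


def load_cycles_per_hour(table):
    """Average head load/unload cycles per power-on hour."""
    values = raw_values(table)
    return values["Load_Cycle_Count"] / values["Power_On_Hours"]


if __name__ == "__main__":
    print("ada0: %.3f cycles/hour" % load_cycles_per_hour(ADA0_ROWS))
    print("da0:  %.3f cycles/hour" % load_cycles_per_hour(DA0_ROWS))
```

On these numbers the Green drive averages about 12.6 load cycles per power-on hour, against well under 0.02 for the Black drive.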
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
Code:
=== START OF INFORMATION SECTION ===

Model Family:     Western Digital Caviar Green (AF)

Device Model:     WDC WD20EARS-22MVWB0

Serial Number:    WD-WCAZA3952563

LU WWN Device Id: 5 0014ee 25ad36a48

Firmware Version: 51.0AB51

User Capacity:    2,000,398,934,016 bytes [2.00 TB]

Sector Sizes:     512 bytes logical, 4096 bytes physical

Device is:        In smartctl database [for details use: -P show]

ATA Version is:   ATA8-ACS (minor revision not indicated)

SATA Version is:  SATA 2.6, 3.0 Gb/s

Local Time is:    Mon Jan 12 12:53:55 2015 EST

SMART support is: Available - device has SMART capability.

SMART support is: Enabled


=== START OF READ SMART DATA SECTION ===

SMART overall-health self-assessment test result: PASSED


General SMART Values:

Offline data collection status:  (0x85)Offline data collection activity

was aborted by an interrupting command from host.

Auto Offline Data Collection: Enabled.

Self-test execution status:      ( 245)Self-test routine in progress...

50% of test remaining.

Total time to complete Offline

data collection: (35460) seconds.

Offline data collection

capabilities:  (0x7b) SMART execute Offline immediate.

Auto Offline data collection on/off support.

Suspend Offline collection upon new

command.

Offline surface scan supported.

Self-test supported.

Conveyance Self-test supported.

Selective Self-test supported.

SMART capabilities:            (0x0003)Saves SMART data before entering

power-saving mode.

Supports SMART auto save timer.

Error logging capability:        (0x01)Error logging supported.

General Purpose Logging supported.

Short self-test routine

recommended polling time:  (   2) minutes.

Extended self-test routine

recommended polling time:  ( 342) minutes.

Conveyance self-test routine

recommended polling time:  (   5) minutes.

SCT capabilities:        (0x3035)SCT Status supported.

SCT Feature Control supported.

SCT Data Table supported.


SMART Attributes Data Structure revision number: 16

Vendor Specific SMART Attributes with Thresholds:

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE

  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0

  3 Spin_Up_Time            0x0027   253   253   021    Pre-fail  Always       -       1233

  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       152

  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0

  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0

  9 Power_On_Hours          0x0032   079   079   000    Old_age   Always       -       15973

 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0

 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0

 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       150

192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       149

193 Load_Cycle_Count        0x0032   133   133   000    Old_age   Always       -       201230

194 Temperature_Celsius     0x0022   120   113   000    Old_age   Always       -       30

196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0

197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0

198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0

199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0

200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0


SMART Error Log Version: 1

No Errors Logged


SMART Self-test log structure revision number 1

Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error

# 1  Extended offline    Completed without error       00%     15970         -

# 2  Extended offline    Completed without error       00%     15964         -

# 3  Extended offline    Completed without error       00%     15958         -

# 4  Extended offline    Completed without error       00%     15952         -

# 5  Extended offline    Interrupted (host reset)      20%     15945         -

# 6  Extended offline    Completed without error       00%     15941         -

# 7  Extended offline    Interrupted (host reset)      90%     15934         -

# 8  Extended offline    Completed without error       00%     15930         -

# 9  Extended offline    Completed without error       00%     15906         -

#10  Extended offline    Completed without error       00%     15882         -

#11  Extended offline    Completed without error       00%     15858         -

#12  Extended offline    Completed without error       00%     15835         -

#13  Extended offline    Completed without error       00%     15811         -

#14  Extended offline    Completed without error       00%     15787         -

#15  Extended offline    Completed without error       00%     15763         -

#16  Extended offline    Interrupted (host reset)      60%     15735         -

#17  Extended offline    Completed without error       00%     15715         -

#18  Extended offline    Completed without error       00%     15691         -


SMART Selective self-test log data structure revision number 1

SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS

    1        0        0  Not_testing

    2        0        0  Not_testing

    3        0        0  Not_testing

    4        0        0  Not_testing

    5        0        0  Not_testing

Selective self-test flags (0x0):

  After scanning selected spans, do NOT read-scan remainder of disk.

If Selective self-test is pending on power-up, resume after 0 minute delay.
lol you don't seem to get it. Your hardware doesn't even come close to meeting the minimum specifications. Because of this the people who could possibly help you probably will not. Since this appears to be something you are testing out take this experience as a successful test. You now know that you should have read the hardware stickies and followed the guidelines.
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
The SMART test for da0 looks fine, except that your load cycle count is very high--look into WDIDLE3.exe to fix your drive so this doesn't increase so quickly. The LSI firmware should be at P16, which is a few versions behind the current. You're probably OK on this; 9.3 gives you a warning in the web GUI if the version doesn't match.
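To put a number on "very high", here is a minimal sketch that divides the Load_Cycle_Count raw value by Power_On_Hours, using the two attribute rows from the smartctl output posted above. The helper names (`parse_raw_values`, `load_cycles_per_hour`) are made up for this illustration, and the parser assumes the standard ten-column attribute layout shown in that output.

```python
# Two attribute rows copied from the smartctl output earlier in the thread.
SMART_ROWS = """\
  9 Power_On_Hours          0x0032   079   079   000    Old_age   Always       -       15973
193 Load_Cycle_Count        0x0032   133   133   000    Old_age   Always       -       201230
"""


def parse_raw_values(text):
    """Map attribute name -> integer RAW_VALUE for each attribute row."""
    values = {}
    for line in text.splitlines():
        fields = line.split()
        # An attribute row has at least 10 columns and starts with a numeric ID.
        if len(fields) >= 10 and fields[0].isdigit():
            values[fields[1]] = int(fields[9])
    return values


def load_cycles_per_hour(values):
    """Average head load/unload cycles per hour the drive has been powered on."""
    return values["Load_Cycle_Count"] / values["Power_On_Hours"]


values = parse_raw_values(SMART_ROWS)
rate = load_cycles_per_hour(values)
print(f"{rate:.1f} load cycles per power-on hour")  # prints "12.6 load cycles per power-on hour"
```

That works out to roughly one head park every five minutes averaged over the drive's whole life, which is the behavior the idle-timer change is meant to reduce.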

You've set up SMART to run tests way more often than necessary--a short test daily, and long test every week or two (or even every month) is plenty. Again, I don't think this is what's causing your problem, but it isn't doing any good, either.
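As a rough check on how often the long tests have been running, the sketch below takes the LifeTime(hours) column from the self-test log posted above and computes the gap between consecutive entries. The function name `intervals` is just for this illustration, and it assumes the log is listed newest first, as smartctl normally prints it.

```python
# LifeTime(hours) column from the self-test log above, entries #1 to #18 (newest first).
LIFETIME_HOURS = [
    15970, 15964, 15958, 15952, 15945, 15941, 15934, 15930, 15906,
    15882, 15858, 15835, 15811, 15787, 15763, 15735, 15715, 15691,
]


def intervals(hours_newest_first):
    """Power-on hours elapsed between each logged test and the one before it."""
    return [
        newer - older
        for newer, older in zip(hours_newest_first, hours_newest_first[1:])
    ]


gaps = intervals(LIFETIME_HOURS)
recent, older = gaps[:7], gaps[7:]
print("recent gaps (hours):", recent)
print("older gaps (hours): ", older)
```

The seven most recent gaps are all between 4 and 7 hours, while the older ones sit around 24. Given the drive's recommended polling time of 342 minutes (about 5.7 hours) for an extended test, the recent schedule appears to keep the drive in a long test almost continuously.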

I don't have any other input into what's causing your problem at the moment, but 9.3 requires 8 GB RAM, minimum. You have half that amount. You shouldn't be surprised if things aren't working as expected.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
lol you don't seem to get it. Your hardware doesn't even come close to meeting the minimum specifications. Because of this the people who could possibly help you probably will not. Since this appears to be something you are testing out take this experience as a successful test. You now know that you should have read the hardware stickies and followed the guidelines.

+1
 

BigOBX

Cadet
Joined
Jan 12, 2015
Messages
8
lol you don't seem to get it. Your hardware doesn't even come close to meeting the minimum specifications. Because of this the people who could possibly help you probably will not. Since this appears to be something you are testing out take this experience as a successful test. You now know that you should have read the hardware stickies and followed the guidelines.

I have to admit that I didn't read the minimum specs like I normally would, but I didn't think I was that far off, since NAS software is normally light on requirements. Like I said, I'm currently low on funds, and building out a new machine is not something I can swing right now.

Once I get back to work, I will look into building something to spec, but for now I could use any help I can get with the hardware I have in place. Any suggestions would be greatly appreciated.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Honestly, at this point, it could easily be a corrupted pool due to non-ECC RAM, insufficient RAM or some wacky interaction between the HBA and the motherboard.
 

Glorious1

Guru
Joined
Nov 23, 2014
Messages
1,211
I just had the same problem where two drives disappeared from one RAIDZ1 pool at the same time. I got about 300 emails about failed replication until I got home (does it have to keep trying every minute and send emails every time?). Disk status said they had been removed. I shut it down, pulled and reinserted the drives in that pool, and all was fine when it restarted.

Since I've also been getting occasional (every few days) system crashes and watchdog resets, I vaguely suspect it may be some problem with the HBA (the disappeared drives were on that LSI card) as Ericloewe suggested. But I have no idea how to diagnose it.

Correction - the drives that bugged out were on the motherboard SATA ports.
 

BigOBX

Cadet
Joined
Jan 12, 2015
Messages
8
So I went ahead and read cyberjock's PowerPoint (PDF version), and I didn't realize how far from spec I really am. At this point I think it's probably best that I move back to the nas4free fork this hardware was running at one point, since it ran well under that OS and is closer to the supported spec.

Hopefully when I get back to work I'll be able to get the hardware needed for a real FreeNAS rig. The old version was good, but this one really seems to be awesome, so I will likely come back to it when I can build the proper machine for it.

Other than nas4free, are there any other options for my type of hardware?
 