Frequent cksum errors on three drives

Status
Not open for further replies.

sbolte

Cadet
Joined
Feb 6, 2016
Messages
1
I regularly get checksum errors on three separate drives. Over the last year, when it happens it usually hits all three drives at once. I would suspect the motherboard, but for the two years before that it was a single drive, which was eventually replaced. In any case, I'm puzzled why I don't see read or write errors too. I'd like help interpreting my smartctl reports, and suggestions on how to pin down the guilty hardware.

The system is nearly three years old. From the start, it had two Seagate Desktop drives and one Hitachi Deskstar; all 4TB in raidz1. For the first two years one of the Seagate drives kept failing checksum tests. I swapped SATA cables and power cables, disabled power-down and acoustic management, and tried everything else I could think of. Nothing made a difference. Eventually Seagate simply replaced it.

Since that replacement last February, all three drives regularly fail checksum tests. Of the 19 times that has happened, over half the time all three drives faulted together. There has never been data loss. I've always been able to run (zpool clear).

There is also a WD 6TB drive and a 60GB SSD that were added just two months ago.

FYI, I have smartctl running long tests daily on the three original drives and don't recall ever being notified of a problem. I just added a long test task for the new drive too.

Below you'll find my system summary, today's daily report, and smartctl results. Let me know if there is anything else I can provide.

Thank you,

Scott

System summary:
Code:
    FreeNAS-9.3-STABLE-201602020212
    Intel Core i3-2120T CPU @ 2.60GHz (2594.17-MHz K8-class CPU)
    ASRock H77 Pro4-M P1.60 motherboard
    2 x 8GB memory
    Fractal Design Define R4 tower
    raidz1 (ada0, ada1, ada2)
        one 4TB Seagate Desktop, model ST4000DM000-1F2168 CC51
        one 4TB Seagate Desktop, model ST4000DM000-1F2168 CC54
        one 4TB Hitachi Deskstar 5K4000, model HDS5C4040ALE630
      log (ada4)
        one KINGSTON SSDNow V300 Series 2.5" 60GB SATA III, model SV300S37A60G 603ABBF0

    stripe (ada3)
        one 6TB WDC WD60EZRX-00MVLB1


This morning's failure email:
Code:
Checking status of zfs pools:
NAME  SIZE  ALLOC  FREE  EXPANDSZ  FRAG  CAP  DEDUP  HEALTH  ALTROOT
freenas-boot  14.9G  10.4G  4.43G  -  -  70%  1.00x  ONLINE  -
kithrup  10.9T  8.31T  2.57T  -  18%  76%  1.00x  ONLINE  /mnt
media  5.44T  1.74T  3.70T  -  9%  31%  1.00x  ONLINE  /mnt

 pool: kithrup
state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
    attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
    using 'zpool clear' or replace the device with 'zpool replace'.
 see: http://illumos.org/msg/ZFS-8000-9P
 scan: scrub repaired 576K in 22h57m with 0 errors on Sat Feb  6 22:57:27 2016
config:

    NAME  STATE  READ WRITE CKSUM
    kithrup  ONLINE  0  0  0
     raidz1-0  ONLINE  0  0  0
     gptid/05753838-9f8a-11e2-b6a3-bc5ff4655d56  ONLINE  0  0  4
     gptid/0610902b-9f8a-11e2-b6a3-bc5ff4655d56  ONLINE  0  0  3
     gptid/96ff8d43-ae60-11e4-b757-bc5ff4655d56  ONLINE  0  0  2
    logs
     gptid/fc762188-b09a-11e5-9b99-bc5ff4655d56  ONLINE  0  0  0

errors: No known data errors

-- End of daily output --
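
The per-device counters in the status output above can be tallied with a short script. The following is a minimal sketch, not part of the original post: it assumes the counters are plain integers (zpool may abbreviate large counts, which this does not handle), and the function names are made up for illustration.

```python
import re

# Device rows copied from the 'zpool status' output above.
ZPOOL_STATUS = """\
    NAME  STATE  READ WRITE CKSUM
    kithrup  ONLINE  0  0  0
     raidz1-0  ONLINE  0  0  0
     gptid/05753838-9f8a-11e2-b6a3-bc5ff4655d56  ONLINE  0  0  4
     gptid/0610902b-9f8a-11e2-b6a3-bc5ff4655d56  ONLINE  0  0  3
     gptid/96ff8d43-ae60-11e4-b757-bc5ff4655d56  ONLINE  0  0  2
    logs
     gptid/fc762188-b09a-11e5-9b99-bc5ff4655d56  ONLINE  0  0  0
"""

# name, state, then three integer counters (READ, WRITE, CKSUM).
ROW = re.compile(r"^\s*(\S+)\s+(\S+)\s+(\d+)\s+(\d+)\s+(\d+)\s*$")


def error_counts(status_text):
    """Return {device: (read, write, cksum)} for every row that has counters."""
    counts = {}
    for line in status_text.splitlines():
        match = ROW.match(line)
        if match:
            name, _state, read, write, cksum = match.groups()
            counts[name] = (int(read), int(write), int(cksum))
    return counts


def devices_with_cksum_errors(status_text):
    """Return {device: cksum} for devices whose CKSUM counter is nonzero."""
    return {
        name: counters[2]
        for name, counters in error_counts(status_text).items()
        if counters[2] > 0
    }


for name, cksum in devices_with_cksum_errors(ZPOOL_STATUS).items():
    print(name, cksum)
```

Logging this after each daily report would show whether the three raidz1 members keep faulting together, while the READ and WRITE columns stay at zero.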


(smartctl -a /dev/ada0) -- this is one of the original drives
Code:
smartctl 6.3 2014-07-26 r3976 [FreeBSD 9.3-RELEASE-p31 amd64] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Desktop HDD.15
Device Model:     ST4000DM000-1F2168
Serial Number:    Z300ALSE
LU WWN Device Id: 5 000c50 05037b7e1
Firmware Version: CC51
User Capacity:    4,000,787,030,016 bytes [4.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5900 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Sun Feb  7 09:53:02 2016 CST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82)    Offline data collection activity
                    was completed without error.
                    Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0)    The previous self-test routine completed
                    without error or no self-test has ever
                    been run.
Total time to complete Offline
data collection:         (  633) seconds.
Offline data collection
capabilities:             (0x7b) SMART execute Offline immediate.
                    Auto Offline data collection on/off support.
                    Suspend Offline collection upon new
                    command.
                    Offline surface scan supported.
                    Self-test supported.
                    Conveyance Self-test supported.
                    Selective Self-test supported.
SMART capabilities:            (0x0003)    Saves SMART data before entering
                    power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:        (0x01)    Error logging supported.
                    General Purpose Logging supported.
Short self-test routine
recommended polling time:     (   1) minutes.
Extended self-test routine
recommended polling time:     ( 541) minutes.
Conveyance self-test routine
recommended polling time:     (   2) minutes.
SCT capabilities:           (0x1085)    SCT Status supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   118   099   006    Pre-fail  Always       -       188262288
  3 Spin_Up_Time            0x0003   091   091   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   037   037   020    Old_age   Always       -       65535
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   088   060   030    Pre-fail  Always       -       770163158
  9 Power_On_Hours          0x0032   072   072   000    Old_age   Always       -       24861
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       126
183 Runtime_Bad_Block       0x0032   099   099   000    Old_age   Always       -       1
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0 0 1
189 High_Fly_Writes         0x003a   097   097   000    Old_age   Always       -       3
190 Airflow_Temperature_Cel 0x0022   071   050   045    Old_age   Always       -       29 (2 203 31 25 0)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       47
193 Load_Cycle_Count        0x0032   001   001   000    Old_age   Always       -       253798
194 Temperature_Celsius     0x0022   029   050   000    Old_age   Always       -       29 (128 0 0 0 0)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       17361h+23m+36.620s
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       25532024339
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       484487878126

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%     23138         -
# 2  Extended offline    Completed without error       00%     23116         -
# 3  Extended offline    Completed without error       00%     23090         -
# 4  Extended offline    Completed without error       00%     23065         -
# 5  Extended offline    Completed without error       00%     23042         -
# 6  Extended offline    Completed without error       00%     23024         -
# 7  Extended offline    Completed without error       00%     22972         -
# 8  Extended offline    Completed without error       00%     22949         -
# 9  Extended offline    Completed without error       00%     22926         -
#10  Extended offline    Completed without error       00%     22904         -
#11  Extended offline    Completed without error       00%     22877         -
#12  Extended offline    Completed without error       00%     22862         -
#13  Extended offline    Completed without error       00%     22810         -
#14  Extended offline    Completed without error       00%     22781         -
#15  Extended offline    Completed without error       00%     22762         -
#16  Extended offline    Completed without error       00%     22737         -
#17  Extended offline    Completed without error       00%     22712         -
#18  Extended offline    Completed without error       00%     22692         -
#19  Extended offline    Interrupted (host reset)      90%     22651         -
#20  Extended offline    Completed without error       00%     22639         -
#21  Extended offline    Completed without error       00%     22615         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.


(smartctl -a /dev/ada1) -- this is the replacement drive from ~1 year ago
Code:
smartctl 6.3 2014-07-26 r3976 [FreeBSD 9.3-RELEASE-p31 amd64] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Desktop HDD.15
Device Model:     ST4000DM000-1F2168
Serial Number:    S300JF2R
LU WWN Device Id: 5 000c50 0745ad7b9
Firmware Version: CC54
User Capacity:    4,000,787,030,016 bytes [4.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5900 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2, ACS-3 T13/2161-D revision 3b
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Sun Feb  7 09:54:17 2016 CST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82)    Offline data collection activity
                    was completed without error.
                    Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0)    The previous self-test routine completed
                    without error or no self-test has ever
                    been run.
Total time to complete Offline
data collection:         (  117) seconds.
Offline data collection
capabilities:             (0x7b) SMART execute Offline immediate.
                    Auto Offline data collection on/off support.
                    Suspend Offline collection upon new
                    command.
                    Offline surface scan supported.
                    Self-test supported.
                    Conveyance Self-test supported.
                    Selective Self-test supported.
SMART capabilities:            (0x0003)    Saves SMART data before entering
                    power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:        (0x01)    Error logging supported.
                    General Purpose Logging supported.
Short self-test routine
recommended polling time:     (   1) minutes.
Extended self-test routine
recommended polling time:     ( 522) minutes.
Conveyance self-test routine
recommended polling time:     (   2) minutes.
SCT capabilities:           (0x1085)    SCT Status supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   115   099   006    Pre-fail  Always       -       88695792
  3 Spin_Up_Time            0x0003   092   091   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       38
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   086   060   030    Pre-fail  Always       -       438824041
  9 Power_On_Hours          0x0032   091   091   000    Old_age   Always       -       8722
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       38
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0 0 1
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   069   048   045    Old_age   Always       -       31 (Min/Max 27/34)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       6
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       75
194 Temperature_Celsius     0x0022   031   052   000    Old_age   Always       -       31 (0 13 0 0 0)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       8724h+03m+25.869s
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       16534011188
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       13416651596510

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.


(smartctl -a /dev/ada2) -- this is the other original drive
Code:
smartctl 6.3 2014-07-26 r3976 [FreeBSD 9.3-RELEASE-p31 amd64] (local build)
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Hitachi Deskstar 5K4000
Device Model:     Hitachi HDS5C4040ALE630
Serial Number:    PL1321LAG8LBXH
LU WWN Device Id: 5 000cca 22ec3e7fb
Firmware Version: MPAOA3B0
User Capacity:    4,000,787,030,016 bytes [4.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5700 rpm
Form Factor:      3.5 inches
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Sun Feb  7 09:55:03 2016 CST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x84)    Offline data collection activity
                    was suspended by an interrupting command from host.
                    Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0)    The previous self-test routine completed
                    without error or no self-test has ever
                    been run.
Total time to complete Offline
data collection:         (44034) seconds.
Offline data collection
capabilities:             (0x5b) SMART execute Offline immediate.
                    Auto Offline data collection on/off support.
                    Suspend Offline collection upon new
                    command.
                    Offline surface scan supported.
                    Self-test supported.
                    No Conveyance Self-test supported.
                    Selective Self-test supported.
SMART capabilities:            (0x0003)    Saves SMART data before entering
                    power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:        (0x01)    Error logging supported.
                    General Purpose Logging supported.
Short self-test routine
recommended polling time:     (   1) minutes.
Extended self-test routine
recommended polling time:     ( 734) minutes.
SCT capabilities:           (0x003d)    SCT Status supported.
                    SCT Error Recovery Control supported.
                    SCT Feature Control supported.
                    SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0005   135   135   054    Pre-fail  Offline      -       99
  3 Spin_Up_Time            0x0007   128   128   024    Pre-fail  Always       -       555 (Average 538)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       188
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   113   113   020    Pre-fail  Offline      -       42
  9 Power_On_Hours          0x0012   097   097   000    Old_age   Always       -       24866
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       126
192 Power-Off_Retract_Count 0x0032   080   080   000    Old_age   Always       -       24697
193 Load_Cycle_Count        0x0012   080   080   000    Old_age   Always       -       24697
194 Temperature_Celsius     0x0002   181   181   000    Old_age   Always       -       33 (Min/Max 15/53)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%     11779         -
# 2  Extended offline    Completed without error       00%     11748         -
# 3  Extended offline    Completed without error       00%     11734         -
# 4  Extended offline    Completed without error       00%     11709         -
# 5  Extended offline    Completed without error       00%     11676         -
# 6  Extended offline    Completed without error       00%     11663         -
# 7  Extended offline    Completed without error       00%     11635         -
# 8  Extended offline    Completed without error       00%     11612         -
# 9  Extended offline    Completed without error       00%     11585         -
#10  Extended offline    Completed without error       00%     11563         -
#11  Extended offline    Completed without error       00%     11533         -
#12  Extended offline    Completed without error       00%     11511         -
#13  Extended offline    Completed without error       00%     11496         -
#14  Extended offline    Completed without error       00%     11459         -
#15  Extended offline    Completed without error       00%     11443         -
#16  Extended offline    Completed without error       00%     11418         -
#17  Extended offline    Completed without error       00%     11388         -
#18  Extended offline    Completed without error       00%      5077         -
#19  Extended offline    Completed without error       00%      5050         -
#20  Extended offline    Completed without error       00%      5025         -
#21  Extended offline    Completed without error       00%      5002         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
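
The self-test logs above record the power-on hour at which each long test ran, so they can be compared against attribute 9 (Power_On_Hours) to see how old the newest logged test is. This is a minimal sketch using values copied from the reports above; the function name is hypothetical and the parsing assumes the column layout shown in these logs.

```python
def hours_since_last_long_test(power_on_hours, selftest_rows):
    """Return power-on hours elapsed since the newest logged extended test,
    or None if the log holds no extended tests."""
    lifetimes = []
    for row in selftest_rows:
        fields = row.split()
        if row.startswith("#") and "Extended offline" in row and len(fields) >= 2:
            # Row layout: ... <remaining %> <lifetime hours> <LBA or '-'>
            lifetimes.append(int(fields[-2]))
    if not lifetimes:
        return None
    return power_on_hours - max(lifetimes)


# Newest rows from each drive's self-test log, copied from the reports above.
ADA0_ROWS = [
    "# 1  Extended offline    Completed without error       00%     23138         -",
    "# 2  Extended offline    Completed without error       00%     23116         -",
]
ADA2_ROWS = [
    "# 1  Extended offline    Completed without error       00%     11779         -",
]

print(hours_since_last_long_test(24861, ADA0_ROWS))  # ada0
print(hours_since_last_long_test(8722, []))          # ada1: no self-tests logged
print(hours_since_last_long_test(24866, ADA2_ROWS))  # ada2
```

On these numbers the newest logged long test is 1,723 power-on hours old on ada0 and 13,087 on ada2, and ada1 has none logged. The sketch only measures the gap; it does not explain it.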


(dmesg) -- trimmed to fit in forum 30,000 character posting limit
Code:
Copyright (c) 1992-2014 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
  The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 9.3-RELEASE-p31 #0 r288272+33bb475: Mon Feb  1 18:20:18 PST 2016
    root@build3.ixsystems.com:/tank/home/stable-builds/FN/objs/os-base/amd64/tank/home/stable-builds/FN/FreeBSD/src/sys/FREENAS.amd64 amd64
gcc version 4.2.1 20070831 patched [FreeBSD]
CPU: Intel(R) Core(TM) i3-2120T CPU @ 2.60GHz (2594.17-MHz K8-class CPU)
  Origin = "GenuineIntel"  Id = 0x206a7  Family = 0x6  Model = 0x2a  Stepping = 7
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Features2=0x1d9ae3bf<SSE3,PCLMULQDQ,DTES64,MON,DS_CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,POPCNT,TSCDLT,XSAVE,OSXSAVE,AVX>
  AMD Features=0x28100800<SYSCALL,NX,RDTSCP,LM>
  AMD Features2=0x1<LAHF>
  TSC: P-state invariant, performance statistics
real memory  = 17706254336 (16886 MB)
avail memory = 16203726848 (15453 MB)
Event timer "LAPIC" quality 600
ACPI APIC Table: <ALASKA A M I>
FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
FreeBSD/SMP: 1 package(s) x 2 core(s) x 2 SMT threads
 cpu0 (BSP): APIC ID:  0
 cpu1 (AP): APIC ID:  1
 cpu2 (AP): APIC ID:  2
 cpu3 (AP): APIC ID:  3
...
acpi0: <ALASKA A M I> on motherboard
acpi0: Power Button (fixed)
acpi0: reservation of 67, 1 (4) failed
...
agp0: <SandyBridge desktop GT1 IG> on vgapci0
agp0: aperture size is 256M, detected 262140k stolen memory
...
ahci0: <ASMedia ASM1061 AHCI SATA controller> port 0xd050-0xd057,0xd040-0xd043,0xd030-0xd037,0xd020-0xd023,0xd000-0xd01f mem 0xf7c00000-0xf7c001ff irq 19 at device 0.0 on pci3
ahci0: AHCI v1.20 with 2 6Gbps ports, Port Multiplier supported
ahcich0: <AHCI channel> at channel 0 on ahci0
ahcich1: <AHCI channel> at channel 1 on ahci0
...
ahci1: <Intel Panther Point AHCI SATA controller> port 0xf0b0-0xf0b7,0xf0a0-0xf0a3,0xf090-0xf097,0xf080-0xf083,0xf060-0xf07f mem 0xf7d16000-0xf7d167ff irq 19 at device 31.2 on pci0
ahci1: AHCI v1.30 with 6 6Gbps ports, Port Multiplier not supported
ahcich2: <AHCI channel> at channel 0 on ahci1
ahcich3: <AHCI channel> at channel 1 on ahci1
ahcich4: <AHCI channel> at channel 2 on ahci1
ahcich5: <AHCI channel> at channel 3 on ahci1
ahcich6: <AHCI channel> at channel 4 on ahci1
ahcich7: <AHCI channel> at channel 5 on ahci1
...
ZFS filesystem version: 5
ZFS storage pool version: features support (5000)
...
ada0 at ahcich0 bus 0 scbus0 target 0 lun 0
da0 at umass-sim0 bus 0 scbus9 target 0 lun 0
da0: <PNY USB 3.0 FD 1100> Removable Direct Access SCSI-6 device
da0: Serial Number AA530A2240000015
da0: 400.000MB/s transfers
da0: 15259MB (31250432 512 byte sectors: 255H 63S/T 1945C)
da0: quirks=0x13<NO_SYNC_CACHE,NO_6_BYTE,NO_RC16>
ada0: <ST4000DM000-1F2168 CC51> ATA-8 SATA 3.x device
ada0: Serial Number Z300ALSE
ada0: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
ada0: Command Queueing enabled
ada0: 3815447MB (7814037168 512 byte sectors: 16H 63S/T 16383C)
ada0: quirks=0x1<4K>
cd0 at ahcich4 bus 0 scbus4 target 0 lun 0
cd0: <HL-DT-ST DVDRAM GH24NSB0 LJ00> Removable CD-ROM SCSI-0 device
cd0: Serial Number K85F4R94624
cd0: 150.000MB/s transfers (SATA 1.x, UDMA6, ATAPI 12bytes, PIO 8192bytes)
cd0: cd present [1 x 0 byte records]
ada0: Previously was known as ad4
ada1 at ahcich1 bus 0 scbus1 target 0 lun 0
ada1: <ST4000DM000-1F2168 CC54> ATA-9 SATA 3.x device
ada1: Serial Number S300JF2R
ada1: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
ada1: Command Queueing enabled
ada1: 3815447MB (7814037168 512 byte sectors: 16H 63S/T 16383C)
ada1: quirks=0x1<4K>
ada1: Previously was known as ad6
ada2 at ahcich2 bus 0 scbus2 target 0 lun 0
ada2: <Hitachi HDS5C4040ALE630 MPAOA3B0> ATA-8 SATA 3.x device
ada2: Serial Number PL1321LAG8LBXH
ada2: 600.000MB/s transfers (SATA 3.x, UDMA6, PIO 8192bytes)
ada2: Command Queueing enabled
ada2: 3815447MB (7814037168 512 byte sectors: 16H 63S/T 16383C)
ada2: Previously was known as ad8
ada3 at ahcich5 bus 0 scbus5 target 0 lun 0
ada3: <WDC WD60EZRX-00MVLB1 80.00A80> ATA-9 SATA 3.x device
ada3: Serial Number WD-WX21DC42E5JC
ada3: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
ada3: Command Queueing enabled
ada3: 5723166MB (11721045168 512 byte sectors: 16H 63S/T 16383C)
ada3: quirks=0x1<4K>
ada3: Previously was known as ad14
ada4 at ahcich6 bus 0 scbus6 target 0 lun 0
ada4: <KINGSTON SV300S37A60G 603ABBF0> ATA-8 SATA 3.x device
ada4: Serial Number 50026B725A08123F
ada4: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 512bytes)
ada4: Command Queueing enabled
ada4: 57241MB (117231408 512 byte sectors: 16H 63S/T 16383C)
ada4: Previously was known as ad16
SMP: AP CPU #1 Launched!
SMP: AP CPU #3 Launched!
SMP: AP CPU #2 Launched!
Timecounter "TSC-low" frequency 1297084560 Hz quality 1000
Trying to mount root from zfs:freenas-boot/ROOT/FreeNAS-9.3-STABLE-201602020212 []...
GEOM_RAID5: Module loaded, version 1.3.20140711.62 (rev f91e28e40bf7)
wbwd0: DevID 0xc3 DevRev 0x33, will not attach, please report this.
 

DrKK

FreeNAS Generalissimo
Joined
Oct 15, 2013
Messages
3,630
"I have smartctl running long tests daily"
Don't. Do. That. That's WAY too often, wayyyy too hard on the drives. Long tests, at most, twice per month.
 

DrKK

FreeNAS Generalissimo
Joined
Oct 15, 2013
Messages
3,630
Nothing looks terribly out of the ordinary on the drives you provided smart reports for.

But none of this is recommended hardware, and those are certainly not drives we'd recommend. Maybe someone will have more constructive ideas for you. All I can say is that nothing stands out glaringly in the smartctl output, and that even under the most ideal circumstances this would be a very poor FreeNAS rig. (For a fuller analysis, run smartctl -x /dev/adaN on each drive, pastebin each result, and post the links to the pastebins.)
 

rs225

Guru
Joined
Jun 28, 2014
Messages
878
I would run a memtest (just to be sure) and consider the power supply situation. My guess is rare, random corruption in the I/O components (although probably not the hard drives). Any common cables?
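
One way to look for shared I/O components is to map each disk to its SATA controller using the dmesg from the first post. This is a minimal sketch with the relevant lines abridged from that dmesg; the function name is hypothetical, and the regular expressions assume the FreeBSD 9.3 message format shown there.

```python
import re

# Lines abridged from the dmesg in the first post (long port/mem lists cut).
DMESG = """\
ahci0: <ASMedia ASM1061 AHCI SATA controller> port 0xd050-0xd057
ahcich0: <AHCI channel> at channel 0 on ahci0
ahcich1: <AHCI channel> at channel 1 on ahci0
ahci1: <Intel Panther Point AHCI SATA controller> port 0xf0b0-0xf0b7
ahcich2: <AHCI channel> at channel 0 on ahci1
ahcich5: <AHCI channel> at channel 3 on ahci1
ahcich6: <AHCI channel> at channel 4 on ahci1
ada0 at ahcich0 bus 0 scbus0 target 0 lun 0
ada1 at ahcich1 bus 0 scbus1 target 0 lun 0
ada2 at ahcich2 bus 0 scbus2 target 0 lun 0
ada3 at ahcich5 bus 0 scbus5 target 0 lun 0
ada4 at ahcich6 bus 0 scbus6 target 0 lun 0
"""

CONTROLLER = re.compile(r"^(ahci\d+): <([^>]+)>", re.M)
CHANNEL = re.compile(r"^(ahcich\d+): <AHCI channel> at channel \d+ on (ahci\d+)", re.M)
DISK = re.compile(r"^(ada\d+) at (ahcich\d+) ", re.M)


def disk_controllers(dmesg):
    """Return {disk: controller description} by following disk -> channel -> controller."""
    controllers = dict(CONTROLLER.findall(dmesg))
    channels = dict(CHANNEL.findall(dmesg))
    return {disk: controllers[channels[chan]] for disk, chan in DISK.findall(dmesg)}


for disk, controller in sorted(disk_controllers(DMESG).items()):
    print(disk, "->", controller)
```

On the posted dmesg this puts ada0 and ada1 on the ASMedia ASM1061 and ada2 on the Intel Panther Point controller, so the three raidz1 members are split across two controllers.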
 

Bidule0hm

Server Electronics Sorcerer
Joined
Aug 5, 2013
Messages
3,710
"Nothing looks terribly out of the ordinary on the drives you provided smart reports for."

The 53 °C for ada2 is a bit of a concern to me :)
 

DrKK

FreeNAS Generalissimo
Joined
Oct 15, 2013
Messages
3,630
I have heard of some drives throwing cksum errors and not showing anything else out of order when they are in an overheat condition. Certainly, over 50C would be of concern for NAS drives. Over 40C would be of concern, to be honest.
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
I'd say 50 degrees C is "turn up the cooling now" urgent.
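
The 53 °C figure comes from the raw value of attribute 194 on ada2, which reads 33 (Min/Max 15/53). Below is a minimal sketch for pulling the recorded maximum out of that raw value. The 40 and 50 °C thresholds are the ones mentioned in this thread, not vendor limits, and the function name is made up. It only understands the "N (Min/Max a/b)" form, so a raw value like ada0's "29 (128 0 0 0 0)" returns None.

```python
import re

# Matches raw values such as "33 (Min/Max 15/53)".
TEMP_RAW = re.compile(r"^(\d+) \(Min/Max (\d+)/(\d+)\)$")


def temperature_status(raw_value, concern_c=40, urgent_c=50):
    """Return (current, highest, label) or None if the raw value is not understood."""
    match = TEMP_RAW.match(raw_value.strip())
    if match is None:
        return None
    current, _lowest, highest = (int(group) for group in match.groups())
    if highest >= urgent_c:
        label = "urgent"
    elif highest > concern_c:
        label = "concern"
    else:
        label = "ok"
    return current, highest, label


print(temperature_status("33 (Min/Max 15/53)"))  # ada2, attribute 194
print(temperature_status("31 (Min/Max 27/34)"))  # ada1, attribute 190
print(temperature_status("29 (128 0 0 0 0)"))    # ada0, format not handled
```

The label is based on the recorded maximum, not the current reading, since the concern raised here is about the peak the drive has seen.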
 