Grrr.
Well, here's the output I have for those other details:
CAM messages? Yes:
Code:
Dec 19 09:23:56 nas kernel: (ada2:ata3:0:0:0): READ_DMA48. ACB: 25 00 80 cf 7e 40 8c 00 00 00 20 00
Dec 19 09:23:56 nas kernel: (ada2:ata3:0:0:0): CAM status: ATA Status Error
Dec 19 09:23:56 nas kernel: (ada2:ata3:0:0:0): ATA status: 51 (DRDY SERV ERR), error: 40 (UNC )
Dec 19 09:23:56 nas kernel: (ada2:ata3:0:0:0): RES: 51 40 8b cf 7e 8c 8c 00 00 0f 00
Dec 19 09:23:56 nas kernel: (ada2:ata3:0:0:0): Retrying command
Dec 19 09:24:00 nas kernel: (ada2:ata3:0:0:0): READ_DMA48. ACB: 25 00 80 cf 7e 40 8c 00 00 00 20 00
Dec 19 09:24:00 nas kernel: (ada2:ata3:0:0:0): CAM status: ATA Status Error
Dec 19 09:24:00 nas kernel: (ada2:ata3:0:0:0): ATA status: 51 (DRDY SERV ERR), error: 40 (UNC )
Dec 19 09:24:00 nas kernel: (ada2:ata3:0:0:0): RES: 51 40 81 cf 7e 8c 8c 00 00 0f 00
Dec 19 09:24:00 nas kernel: (ada2:ata3:0:0:0): Retrying command
Dec 19 09:24:04 nas kernel: (ada2:ata3:0:0:0): READ_DMA48. ACB: 25 00 80 cf 7e 40 8c 00 00 00 20 00
Dec 19 09:24:04 nas kernel: (ada2:ata3:0:0:0): CAM status: ATA Status Error
Dec 19 09:24:04 nas kernel: (ada2:ata3:0:0:0): ATA status: 51 (DRDY SERV ERR), error: 40 (UNC )
Dec 19 09:24:04 nas kernel: (ada2:ata3:0:0:0): RES: 51 40 81 cf 7e 8c 8c 00 00 0f 00
Dec 19 09:24:05 nas kernel: (ada2:ata3:0:0:0): Retrying command
Dec 19 09:24:30 nas kernel: (ada2:ata3:0:0:0): READ_DMA48. ACB: 25 00 a0 cf 7e 40 8c 00 00 00 20 00
Dec 19 09:24:30 nas kernel: (ada2:ata3:0:0:0): CAM status: ATA Status Error
Dec 19 09:24:30 nas kernel: (ada2:ata3:0:0:0): ATA status: 51 (DRDY SERV ERR), error: 40 (UNC )
Dec 19 09:24:30 nas kernel: (ada2:ata3:0:0:0): RES: 51 40 b0 cf 7e 8c 8c 00 00 00 00
Dec 19 09:24:30 nas kernel: (ada2:ata3:0:0:0): Retrying command
Dec 19 09:24:34 nas kernel: (ada2:ata3:0:0:0): READ_DMA48. ACB: 25 00 a0 cf 7e 40 8c 00 00 00 20 00
Dec 19 09:24:34 nas kernel: (ada2:ata3:0:0:0): CAM status: ATA Status Error
Dec 19 09:24:34 nas kernel: (ada2:ata3:0:0:0): ATA status: 51 (DRDY SERV ERR), error: 40 (UNC )
Dec 19 09:24:34 nas kernel: (ada2:ata3:0:0:0): RES: 51 40 af cf 7e 8c 8c 00 00 0f 00
Dec 19 09:24:34 nas kernel: (ada2:ata3:0:0:0): Retrying command
Dec 19 09:24:41 nas kernel: (ada2:ata3:0:0:0): READ_DMA48. ACB: 25 00 28 d0 7e 40 8c 00 00 00 30 00
Dec 19 09:24:41 nas kernel: (ada2:ata3:0:0:0): CAM status: ATA Status Error
Dec 19 09:24:41 nas kernel: (ada2:ata3:0:0:0): ATA status: 51 (DRDY SERV ERR), error: 40 (UNC )
Dec 19 09:24:41 nas kernel: (ada2:ata3:0:0:0): RES: 51 40 28 d0 7e 8c 8c 00 00 1f 00
Dec 19 09:24:41 nas kernel: (ada2:ata3:0:0:0): Retrying command
Dec 19 09:24:45 nas kernel: (ada2:ata3:0:0:0): READ_DMA48. ACB: 25 00 28 d0 7e 40 8c 00 00 00 30 00
Dec 19 09:24:45 nas kernel: (ada2:ata3:0:0:0): CAM status: ATA Status Error
Dec 19 09:24:45 nas kernel: (ada2:ata3:0:0:0): ATA status: 51 (DRDY SERV ERR), error: 40 (UNC )
Dec 19 09:24:45 nas kernel: (ada2:ata3:0:0:0): RES: 51 40 28 d0 7e 8c 8c 00 00 1f 00
Dec 19 09:24:45 nas kernel: (ada2:ata3:0:0:0): Retrying command
Dec 19 09:24:49 nas kernel: (ada2:ata3:0:0:0): READ_DMA48. ACB: 25 00 28 d0 7e 40 8c 00 00 00 30 00
Dec 19 09:24:49 nas kernel: (ada2:ata3:0:0:0): CAM status: ATA Status Error
Dec 19 09:24:49 nas kernel: (ada2:ata3:0:0:0): ATA status: 51 (DRDY SERV ERR), error: 40 (UNC )
Dec 19 09:24:49 nas kernel: (ada2:ata3:0:0:0): RES: 51 40 28 d0 7e 8c 8c 00 00 1f 00
Dec 19 09:24:49 nas kernel: (ada2:ata3:0:0:0): Retrying command
...
etc.
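For reference, the sector addresses in the CAM messages above can be pulled out of the ACB and RES byte dumps. This is only a sketch: it assumes the bytes are printed in the order command/status, features/error, lba_low, lba_mid, lba_high, device, lba_low_exp, lba_mid_exp, lba_high_exp, which is my understanding of how the FreeBSD CAM layer formats them, not something the log itself states. The function name `decode_lba48` is made up for the example.

```python
def decode_lba48(hex_bytes, first_lba_index=2):
    """Combine the six LBA bytes of a CAM ACB or RES dump into one 48-bit LBA.

    Assumed byte order (not confirmed by the log itself):
    lba_low, lba_mid, lba_high, device, lba_low_exp, lba_mid_exp, lba_high_exp
    """
    b = [int(x, 16) for x in hex_bytes.split()]
    i = first_lba_index
    low = b[i] | (b[i + 1] << 8) | (b[i + 2] << 16)        # bits 0..23
    high = b[i + 4] | (b[i + 5] << 8) | (b[i + 6] << 16)   # bits 24..47, skipping the device byte
    return (high << 24) | low


# First READ_DMA48 and its result, copied from the log above.
acb = "25 00 80 cf 7e 40 8c 00 00 00 20 00"
res = "51 40 8b cf 7e 8c 8c 00 00 0f 00"

start = decode_lba48(acb)
count = int(acb.split()[10], 16)   # sector_count byte, under the same layout assumption
failed = decode_lba48(res)

print(start, count, failed)        # 2357120896 32 2357120907
```

Under that assumption, the first request reads 32 sectors starting at LBA 2357120896, and the drive reports the uncorrectable sector at LBA 2357120907, which falls inside the requested range.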
How can I tell if anything else is accessing the devices?
Below is the smartctl info for the two drives in question. The other two showed no errors.
This basically means that two drives started failing at around the same time. Too bad.
Well, at least it gives me an opportunity to expand and upgrade the server. Proper server hardware here we come!
Thanks again for your time with this. It's all a little lower level than my specialty.
Code:
# smartctl -x /dev/ada2
smartctl 6.1 2013-03-16 r3800 [FreeBSD 9.1-STABLE amd64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Western Digital Caviar Green (AF, SATA 6Gb/s)
Device Model: WDC WD20EARX-42R6B0
Serial Number: [No Information Found]
LU WWN Device Id: 5 0014ee 2b07638eb
Firmware Version: 02.00A02
User Capacity: 2,000,398,934,016 bytes [2.00 TB]
Sector Size: 512 bytes logical/physical
Device is: In smartctl database [for details use: -P show]
ATA Version is: ATA8-ACS (minor revision not indicated)
SATA Version is: SATA 2.6, 1.5 Gb/s
Local Time is: Sun Dec 22 23:27:34 2013 CST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is: Disabled
APM feature is: Unavailable
Rd look-ahead is: Enabled
Write cache is: Enabled
ATA Security is: Disabled, NOT FROZEN [SEC1]
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x84)Offline data collection activity
was suspended by an interrupting command from host.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 121)The previous self-test completed having
the read element of the test failed.
Total time to complete Offline
data collection:(40800) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003)Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01)Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 464) minutes.
Conveyance self-test routine
recommended polling time: ( 5) minutes.
SCT capabilities: (0x303f)SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAGS VALUE WORST THRESH FAIL RAW_VALUE
1 Raw_Read_Error_Rate POSR-K 200 200 051 - 0
3 Spin_Up_Time POS--K 165 145 021 - 8741
4 Start_Stop_Count -O--CK 100 100 000 - 76
5 Reallocated_Sector_Ct PO--CK 200 200 140 - 0
7 Seek_Error_Rate -OSR-K 200 200 000 - 0
9 Power_On_Hours -O--CK 079 079 000 - 16034
10 Spin_Retry_Count -O--CK 100 253 000 - 0
11 Calibration_Retry_Count -O--CK 100 253 000 - 0
12 Power_Cycle_Count -O--CK 100 100 000 - 73
192 Power-Off_Retract_Count -O--CK 200 200 000 - 39
193 Load_Cycle_Count -O--CK 150 150 000 - 152108
194 Temperature_Celsius -O---K 115 100 000 - 37
195 Hardware_ECC_Recovered -OS-CK 200 200 000 - 0
196 Reallocated_Event_Count -O--CK 200 200 000 - 0
197 Current_Pending_Sector -O--CK 200 200 000 - 176
198 Offline_Uncorrectable ----CK 200 200 000 - 130
199 UDMA_CRC_Error_Count -O--CK 200 200 000 - 0
200 Multi_Zone_Error_Rate ---R-- 071 071 000 - 25858
||||||_ K auto-keep
|||||__ C event count
||||___ R error rate
|||____ S speed/performance
||_____ O updated online
|______ P prefailure warning
ATA_READ_LOG_EXT (addr=0x00:0x00, page=0, n=1) failed: 48-bit ATA commands not implemented for legacy controllers
Read GP Log Directory failed
SMART Log Directory Version 1 [multi-sector log support]
Address Access R/W Size Description
0x00 SL R/O 1 Log Directory
0x01 SL R/O 1 Summary SMART error log
0x02 SL R/O 5 Comprehensive SMART error log
0x06 SL R/O 1 SMART self-test log
0x09 SL R/W 1 Selective self-test log
0x80-0x9f SL R/W 16 Host vendor specific log
0xa0-0xa7 SL VS 16 Device vendor specific log
0xa8-0xb7 SL VS 1 Device vendor specific log
0xc0 SL VS 1 Device vendor specific log
0xe0 SL R/W 1 SCT Command/Status
0xe1 SL R/W 1 SCT Data Transfer
SMART Extended Comprehensive Error Log (GP Log 0x03) not supported
SMART Error Log Version: 1
No Errors Logged
SMART Extended Self-test Log (GP Log 0x07) not supported
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed: read failure 90% 15912 1817587469
# 2 Short offline Completed: read failure 90% 15856 1817587469
# 3 Short offline Completed: read failure 90% 15659 1817587469
# 4 Short offline Completed without error 00% 15535 -
# 5 Short offline Completed without error 00% 15370 -
# 6 Short offline Completed without error 00% 15205 -
# 7 Extended offline Completed without error 00% 15169 -
# 8 Short offline Completed without error 00% 15066 -
# 9 Short offline Completed without error 00% 14901 -
#10 Short offline Completed without error 00% 14736 -
#11 Short offline Completed without error 00% 14570 -
#12 Extended offline Completed without error 00% 14463 -
#13 Short offline Completed without error 00% 14405 -
#14 Short offline Completed without error 00% 14240 -
#15 Short offline Completed without error 00% 14075 -
#16 Short offline Completed without error 00% 13909 -
#17 Extended offline Completed without error 00% 13755 -
#18 Short offline Completed without error 00% 13744 -
#19 Short offline Completed without error 00% 13579 -
#20 Short offline Completed without error 00% 13416 -
#21 Short offline Completed without error 00% 13251 -
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
SCT Status Version: 2
SCT Version (vendor specific): 258 (0x0102)
SCT Support Level: 1
Device State: Unknown (8)
Current Temperature: 37 Celsius
Power Cycle Min/Max Temperature: 34/42 Celsius
Lifetime Min/Max Temperature: 37/52 Celsius
Under/Over Temperature Limit Count: 0/0
SCT Temperature History Version: 2
Temperature Sampling Period: 1 minute
Temperature Logging Interval: 1 minute
Min/Max recommended Temperature: 0/60 Celsius
Min/Max Temperature Limit: -41/85 Celsius
Temperature History Size (Index): 478 (453)
Index Estimated Time Temperature Celsius
454 2013-12-22 15:30 38 *******************
... ..(157 skipped). .. *******************
134 2013-12-22 18:08 38 *******************
135 2013-12-22 18:09 37 ******************
... ..( 61 skipped). .. ******************
197 2013-12-22 19:11 37 ******************
198 2013-12-22 19:12 38 *******************
... ..( 18 skipped). .. *******************
217 2013-12-22 19:31 38 *******************
218 2013-12-22 19:32 39 ********************
... ..( 87 skipped). .. ********************
306 2013-12-22 21:00 39 ********************
307 2013-12-22 21:01 38 *******************
... ..(145 skipped). .. *******************
453 2013-12-22 23:27 38 *******************
SCT Error Recovery Control:
Read: Disabled
Write: Disabled
Device Statistics (GP Log 0x04) not supported
ATA_READ_LOG_EXT (addr=0x11:0x00, page=0, n=1) failed: 48-bit ATA commands not implemented for legacy controllers
Read SATA Phy Event Counters failed
Code:
# smartctl -x /dev/ada3
smartctl 6.1 2013-03-16 r3800 [FreeBSD 9.1-STABLE amd64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Western Digital Caviar Green (AF, SATA 6Gb/s)
Device Model: WDC WD20EARX-42R6B0
Serial Number: [No Information Found]
LU WWN Device Id: 5 0014ee 2b05d57ef
Firmware Version: 02.00A02
User Capacity: 2,000,398,934,016 bytes [2.00 TB]
Sector Size: 512 bytes logical/physical
Device is: In smartctl database [for details use: -P show]
ATA Version is: ATA8-ACS (minor revision not indicated)
SATA Version is: SATA 2.6, 1.5 Gb/s
Local Time is: Sun Dec 22 23:29:53 2013 CST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
AAM feature is: Disabled
APM feature is: Unavailable
Rd look-ahead is: Enabled
Write cache is: Enabled
ATA Security is: Disabled, NOT FROZEN [SEC1]
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x82)Offline data collection activity
was completed without error.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0)The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection:(40800) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003)Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01)Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 464) minutes.
Conveyance self-test routine
recommended polling time: ( 5) minutes.
SCT capabilities: (0x303f)SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAGS VALUE WORST THRESH FAIL RAW_VALUE
1 Raw_Read_Error_Rate POSR-K 200 200 051 - 0
3 Spin_Up_Time POS--K 165 145 021 - 8725
4 Start_Stop_Count -O--CK 100 100 000 - 81
5 Reallocated_Sector_Ct PO--CK 200 200 140 - 0
7 Seek_Error_Rate -OSR-K 100 253 000 - 0
9 Power_On_Hours -O--CK 079 079 000 - 16014
10 Spin_Retry_Count -O--CK 100 253 000 - 0
11 Calibration_Retry_Count -O--CK 100 253 000 - 0
12 Power_Cycle_Count -O--CK 100 100 000 - 77
192 Power-Off_Retract_Count -O--CK 200 200 000 - 42
193 Load_Cycle_Count -O--CK 149 149 000 - 153014
194 Temperature_Celsius -O---K 119 101 000 - 33
195 Hardware_ECC_Recovered -OS-CK 200 200 000 - 0
196 Reallocated_Event_Count -O--CK 200 200 000 - 0
197 Current_Pending_Sector -O--CK 200 200 000 - 1
198 Offline_Uncorrectable ----CK 200 200 000 - 0
199 UDMA_CRC_Error_Count -O--CK 200 200 000 - 0
200 Multi_Zone_Error_Rate ---R-- 198 192 000 - 482
||||||_ K auto-keep
|||||__ C event count
||||___ R error rate
|||____ S speed/performance
||_____ O updated online
|______ P prefailure warning
ATA_READ_LOG_EXT (addr=0x00:0x00, page=0, n=1) failed: 48-bit ATA commands not implemented for legacy controllers
Read GP Log Directory failed
SMART Log Directory Version 1 [multi-sector log support]
Address Access R/W Size Description
0x00 SL R/O 1 Log Directory
0x01 SL R/O 1 Summary SMART error log
0x02 SL R/O 5 Comprehensive SMART error log
0x06 SL R/O 1 SMART self-test log
0x09 SL R/W 1 Selective self-test log
0x80-0x9f SL R/W 16 Host vendor specific log
0xa0-0xa7 SL VS 16 Device vendor specific log
0xa8-0xb7 SL VS 1 Device vendor specific log
0xc0 SL VS 1 Device vendor specific log
0xe0 SL R/W 1 SCT Command/Status
0xe1 SL R/W 1 SCT Data Transfer
SMART Extended Comprehensive Error Log (GP Log 0x03) not supported
SMART Error Log Version: 1
No Errors Logged
SMART Extended Self-test Log (GP Log 0x07) not supported
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed without error 00% 15977 -
# 2 Extended offline Completed without error 00% 15903 -
# 3 Short offline Completed without error 00% 15840 -
# 4 Short offline Completed without error 00% 15838 -
# 5 Extended offline Completed without error 00% 15774 -
# 6 Short offline Completed without error 00% 15646 -
# 7 Short offline Completed without error 00% 15644 -
# 8 Short offline Completed without error 00% 15522 -
# 9 Short offline Completed without error 00% 15520 -
#10 Short offline Completed without error 00% 15357 -
#11 Short offline Completed without error 00% 15355 -
#12 Short offline Completed without error 00% 15192 -
#13 Short offline Completed without error 00% 15190 -
#14 Extended offline Completed without error 00% 15179 -
#15 Extended offline Completed without error 00% 15108 -
#16 Short offline Completed without error 00% 15053 -
#17 Short offline Completed without error 00% 15051 -
#18 Short offline Completed without error 00% 14888 -
#19 Short offline Completed without error 00% 14886 -
#20 Short offline Completed without error 00% 14723 -
#21 Short offline Completed without error 00% 14721 -
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
SCT Status Version: 2
SCT Version (vendor specific): 258 (0x0102)
SCT Support Level: 1
Device State: Active (0)
Current Temperature: 33 Celsius
Power Cycle Min/Max Temperature: 31/41 Celsius
Lifetime Min/Max Temperature: 32/51 Celsius
Under/Over Temperature Limit Count: 0/0
SCT Temperature History Version: 2
Temperature Sampling Period: 1 minute
Temperature Logging Interval: 1 minute
Min/Max recommended Temperature: 0/60 Celsius
Min/Max Temperature Limit: -41/85 Celsius
Temperature History Size (Index): 478 (101)
Index Estimated Time Temperature Celsius
102 2013-12-22 15:32 33 **************
... ..(183 skipped). .. **************
286 2013-12-22 18:36 33 **************
287 2013-12-22 18:37 32 *************
... ..( 29 skipped). .. *************
317 2013-12-22 19:07 32 *************
318 2013-12-22 19:08 33 **************
319 2013-12-22 19:09 32 *************
320 2013-12-22 19:10 33 **************
... ..( 2 skipped). .. **************
323 2013-12-22 19:13 33 **************
324 2013-12-22 19:14 38 *******************
... ..( 54 skipped). .. *******************
379 2013-12-22 20:09 38 *******************
380 2013-12-22 20:10 37 ******************
381 2013-12-22 20:11 37 ******************
382 2013-12-22 20:12 37 ******************
383 2013-12-22 20:13 36 *****************
... ..( 2 skipped). .. *****************
386 2013-12-22 20:16 36 *****************
387 2013-12-22 20:17 35 ****************
... ..( 13 skipped). .. ****************
401 2013-12-22 20:31 35 ****************
402 2013-12-22 20:32 34 ***************
... ..(142 skipped). .. ***************
67 2013-12-22 22:55 34 ***************
68 2013-12-22 22:56 33 **************
... ..( 32 skipped). .. **************
101 2013-12-22 23:29 33 **************
SCT Error Recovery Control:
Read: Disabled
Write: Disabled
Device Statistics (GP Log 0x04) not supported
ATA_READ_LOG_EXT (addr=0x11:0x00, page=0, n=1) failed: 48-bit ATA commands not implemented for legacy controllers
Read SATA Phy Event Counters failed
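To make the difference between the two drives easier to see, here is a small sketch that scans smartctl attribute rows and reports the sector-health counters whose raw value is non-zero. The choice of which attributes to watch (`WATCHED`) and the function name `nonzero_watched` are my own assumptions for illustration, and the sample rows are copied from the two attribute tables above.

```python
# Attributes treated as sector-health indicators here (an assumption, not a smartctl rule).
WATCHED = {"Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable"}


def nonzero_watched(smartctl_text):
    """Return {attribute_name: raw_value} for watched attributes with a non-zero raw value."""
    found = {}
    for line in smartctl_text.splitlines():
        fields = line.split()
        # Attribute rows look like: ID NAME FLAGS VALUE WORST THRESH FAIL RAW_VALUE
        if len(fields) >= 8 and fields[0].isdigit() and fields[1] in WATCHED:
            raw = int(fields[7])
            if raw:
                found[fields[1]] = raw
    return found


ada2 = """\
5 Reallocated_Sector_Ct PO--CK 200 200 140 - 0
197 Current_Pending_Sector -O--CK 200 200 000 - 176
198 Offline_Uncorrectable ----CK 200 200 000 - 130
"""

ada3 = """\
5 Reallocated_Sector_Ct PO--CK 200 200 140 - 0
197 Current_Pending_Sector -O--CK 200 200 000 - 1
198 Offline_Uncorrectable ----CK 200 200 000 - 0
"""

print(nonzero_watched(ada2))  # {'Current_Pending_Sector': 176, 'Offline_Uncorrectable': 130}
print(nonzero_watched(ada3))  # {'Current_Pending_Sector': 1}
```

So ada2 shows 176 pending and 130 offline-uncorrectable sectors, while ada3 shows a single pending sector, even though both report an overall health result of PASSED.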