Failing disk(s)?

Chip28 · May 5, 2014

--edit, looks like I forgot to add the question tag... just imagine it's there please...

Afternoon All!
Apologies in advance for the long post. (Mods, if I'm in the wrong place, please let me know.) I have both questions and a problem with my build. I've lurked on the forums for a while; there's lots of awesome help here, and I'm hoping someone can help me out a bit as well. I fully expect to get a bit of grief for how it's set up, but I did the best I could at the time:

Setup:

Dell Precision R5400

2 x Intel(R) Xeon(R) CPU E5440 @ 2.83GHz

12(ish)GB of ECC RAM

SanDisk Cruzer Fit 16GB

2 x Seagate ST3500418AS - Jails/Transcoding etc

Monoprice 103581 PCI Express Serial ATA II

Sans Digital TowerRAID TR8M+B

6 x ST3000DM 3T

2 x WD RED 3T

The 3T drives are in a 2 x RaidZ2 (as recommended by the "creation" wizard): --edit, added below and changed to raidz2

Code:

 NAME                                            STATE     READ WRITE CKSUM
        mediastore                                      ONLINE       0     0     0
          raidz2-0                                      ONLINE       0     0     0
            gptid/1cfdfa5d-9de6-11e3-a6e9-00219b58171e  ONLINE       0     0     0
            gptid/1dd5426a-9de6-11e3-a6e9-00219b58171e  ONLINE       0     0     0
            gptid/1e381275-9de6-11e3-a6e9-00219b58171e  ONLINE       0     0     0
            gptid/1ea087d9-9de6-11e3-a6e9-00219b58171e  ONLINE       0     0     0
          raidz2-1                                      ONLINE       0     0     0
            gptid/1f01fd79-9de6-11e3-a6e9-00219b58171e  ONLINE       0     0     0
            gptid/1f6b3224-9de6-11e3-a6e9-00219b58171e  ONLINE       0     0     0
            gptid/2049a02f-9de6-11e3-a6e9-00219b58171e  ONLINE       0     0     0
            gptid/21453905-9de6-11e3-a6e9-00219b58171e  ONLINE       0     0     0

The problems:
Before I get started, I would like to say I DO have SMART checks and scrubs set up AND the emails come through just fine. I do get one every boot >.> (part of the problem, mentioned below).

When writing files, it will stall out, slowing to a crawl, and eventually pick back up (sometimes). The error log shows things like:

Code:

May  5 17:09:15 eve kernel: (ada4:siisch1:0:0:0): CAM status: ATA Status Error
May  5 17:09:15 eve kernel: (ada4:siisch1:0:0:0): ATA status: 41 (DRDY ERR), error: 84 (ICRC ABRT )
May  5 17:09:15 eve kernel: (ada4:siisch1:0:0:0): RES: 41 84 80 f3 9f 00 4a 00 00 00 01
May  5 17:09:15 eve kernel: (ada4:siisch1:0:0:0): Retrying command
May  5 17:09:16 eve kernel: (ada5:siisch1:0:1:0): WRITE_FPDMA_QUEUED. ACB: 61 88 80 24 a0 40 4a 00 00 00 00 00
May  5 17:09:16 eve kernel: (ada5:siisch1:0:1:0): CAM status: ATA Status Error
May  5 17:09:16 eve kernel: (ada5:siisch1:0:1:0): ATA status: 41 (DRDY ERR), error: 84 (ICRC ABRT )
May  5 17:09:16 eve kernel: (ada5:siisch1:0:1:0): RES: 41 84 18 f7 9f 00 4a 00 00 88 00
May  5 17:09:16 eve kernel: (ada5:siisch1:0:1:0): Retrying command
May  5 17:09:16 eve kernel: siisch0: Error while READ LOG EXT
May  5 17:09:17 eve kernel: (ada3:siisch0:0:3:0): WRITE_FPDMA_QUEUED. ACB: 61 88 58 0e a0 40 4a 00 00 00 00 00
May  5 17:09:17 eve kernel: (ada3:siisch0:0:3:0): CAM status: ATA Status Error
May  5 17:09:17 eve kernel: (ada3:siisch0:0:3:0): ATA status: 00 ()
May  5 17:09:17 eve kernel: (ada3:siisch0:0:3:0): RES: 00 00 00 00 00 00 00 00 00 00 00
May  5 17:09:17 eve kernel: (ada3:siisch0:0:3:0): Retrying command
May  5 17:09:17 eve kernel: (ada3:siisch0:0:3:0): WRITE_FPDMA_QUEUED. ACB: 61 00 68 0f a0 40 4a 00 00 01 00 00
May  5 17:09:17 eve kernel: (ada3:siisch0:0:3:0): CAM status: ATA Status Error
May  5 17:09:17 eve kernel: (ada3:siisch0:0:3:0): ATA status: 00 ()
May  5 17:09:17 eve kernel: (ada3:siisch0:0:3:0): RES: 00 00 00 00 00 00 00 00 00 00 00
May  5 17:09:17 eve kernel: (ada3:siisch0:0:3:0): Retrying command
May  5 17:09:17 eve kernel: (ada3:siisch0:0:3:0): WRITE_FPDMA_QUEUED. ACB: 61 88 e0 0e a0 40 4a 00 00 00 00 00
May  5 17:09:17 eve kernel: (ada3:siisch0:0:3:0): CAM status: ATA Status Error
May  5 17:09:17 eve kernel: (ada3:siisch0:0:3:0): ATA status: 00 ()
May  5 17:09:17 eve kernel: (ada3:siisch0:0:3:0): RES: 00 00 00 00 00 00 00 00 00 00 00
May  5 17:09:17 eve kernel: (ada3:siisch0:0:3:0): Retrying command
May  5 17:09:17 eve kernel: (ada2:siisch0:0:2:0): WRITE_FPDMA_QUEUED. ACB: 61 88 40 1b a0 40 4a 00 00 00 00 00
May  5 17:09:17 eve kernel: (ada2:siisch0:0:2:0): CAM status: ATA Status Error
May  5 17:09:17 eve kernel: (ada2:siisch0:0:2:0): ATA status: 41 (DRDY ERR), error: 84 (ICRC ABRT )
May  5 17:09:17 eve kernel: (ada2:siisch0:0:2:0): RES: 41 84 6f 1b a0 40 4a 00 00 00 00
May  5 17:09:17 eve kernel: (ada2:siisch0:0:2:0): Retrying command
May  5 17:09:18 eve kernel: (ada4:siisch1:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 88 78 32 a0 40 4a 00 00 00 00 00
May  5 17:09:18 eve kernel: (ada4:siisch1:0:0:0): CAM status: ATA Status Error
May  5 17:09:18 eve kernel: (ada4:siisch1:0:0:0): ATA status: 41 (DRDY ERR), error: 84 (ICRC ABRT )
May  5 17:09:18 eve kernel: (ada4:siisch1:0:0:0): RES: 41 84 78 32 a0 00 4a 00 00 88 00
May  5 17:09:18 eve kernel: (ada4:siisch1:0:0:0): Retrying command
May  5 17:09:18 eve kernel: (ada5:siisch1:0:1:0): WRITE_FPDMA_QUEUED. ACB: 61 88 70 41 a0 40 4a 00 00 00 00 00
May  5 17:09:18 eve kernel: (ada5:siisch1:0:1:0): CAM status: ATA Status Error
May  5 17:09:18 eve kernel: (ada5:siisch1:0:1:0): ATA status: 41 (DRDY ERR), error: 84 (ICRC ABRT )
May  5 17:09:18 eve kernel: (ada5:siisch1:0:1:0): RES: 41 84 70 41 a0 00 4a 00 00 88 00

If you notice, it's typically a set of drives, not one in particular. Drives 0-3 and 4-7 are each on their own eSATA cable. I replaced the eSATA cables with well-recommended ones from Amazon thinking it would solve the problem, but it didn't.
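To check the "set of drives, not one in particular" observation against the log, the errors can be tallied per device and per controller channel. The following is a minimal sketch, assuming the syslog lines look like the excerpt above; the function name `tally_icrc` and the regular expression are illustrative, not part of any FreeNAS tool. ICRC in the ATA error field is generally read as an interface CRC error, which tends to point at the link (cable, controller, multiplier) rather than the platters, though that interpretation is a rule of thumb.

```python
import re
from collections import Counter

# Matches FreeBSD CAM lines such as:
# "(ada4:siisch1:0:0:0): ATA status: 41 (DRDY ERR), error: 84 (ICRC ABRT )"
# The pattern is an assumption based on the log excerpt in this thread.
LINE_RE = re.compile(r"\((ada\d+):(siisch\d+):[\d:]+\): ATA status: .*\bICRC\b")


def tally_icrc(lines):
    """Count ICRC errors per device and per controller channel."""
    per_device = Counter()
    per_channel = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            device, channel = match.groups()
            per_device[device] += 1
            per_channel[channel] += 1
    return per_device, per_channel


# A few lines copied from the excerpt above.
sample = [
    "May  5 17:09:15 eve kernel: (ada4:siisch1:0:0:0): ATA status: 41 (DRDY ERR), error: 84 (ICRC ABRT )",
    "May  5 17:09:16 eve kernel: (ada5:siisch1:0:1:0): ATA status: 41 (DRDY ERR), error: 84 (ICRC ABRT )",
    "May  5 17:09:17 eve kernel: (ada3:siisch0:0:3:0): ATA status: 00 ()",
    "May  5 17:09:17 eve kernel: (ada2:siisch0:0:2:0): ATA status: 41 (DRDY ERR), error: 84 (ICRC ABRT )",
]
per_device, per_channel = tally_icrc(sample)
print(dict(per_device))
print(dict(per_channel))
```

If the counts spread across every drive behind the same channel instead of piling up on one disk, that would be consistent with a shared-path problem rather than a single bad drive.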

The email mentioned above states:

Code:

Device: /dev/ada2, 41 Currently unreadable (pending) sectors
 
Device info:
WDC WD30EFRX-68EUZN0, S/N:WD-WCCXXXXXXXXX, WWN:5-0014ee-XXXXXXXXX, FW:80.00A80, 3.00 TB

The WD drive tool verifies that this one particular drive is failing. Seagate SeaTools BSODs both of my Windows computers when trying to scan one of the Seagate drives.

SMART Data
Drive0 (seagate)

Code:

ID# ATTRIBUTE_NAME          FLAG    VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate    0x000f  114  099  006    Pre-fail  Always      -      61103656
  3 Spin_Up_Time            0x0003  092  092  000    Pre-fail  Always      -      0
  4 Start_Stop_Count        0x0032  100  100  020    Old_age  Always      -      79
  5 Reallocated_Sector_Ct  0x0033  100  100  036    Pre-fail  Always      -      0
  7 Seek_Error_Rate        0x000f  072  060  030    Pre-fail  Always      -      19881043
  9 Power_On_Hours          0x0032  088  088  000    Old_age  Always      -      11248
10 Spin_Retry_Count        0x0013  100  100  097    Pre-fail  Always      -      0
12 Power_Cycle_Count      0x0032  100  100  020    Old_age  Always      -      85
183 Runtime_Bad_Block      0x0032  001  001  000    Old_age  Always      -      104
184 End-to-End_Error        0x0032  100  100  099    Old_age  Always      -      0
187 Reported_Uncorrect      0x0032  094  094  000    Old_age  Always      -      6
188 Command_Timeout        0x0032  100  003  000    Old_age  Always      -      39 82 2765
189 High_Fly_Writes        0x003a  100  100  000    Old_age  Always      -      0
190 Airflow_Temperature_Cel 0x0022  058  051  045    Old_age  Always      -      42 (Min/Max 37/44)
191 G-Sense_Error_Rate      0x0032  100  100  000    Old_age  Always      -      0
192 Power-Off_Retract_Count 0x0032  100  100  000    Old_age  Always      -      62
193 Load_Cycle_Count        0x0032  100  100  000    Old_age  Always      -      1754
194 Temperature_Celsius    0x0022  042  049  000    Old_age  Always      -      42 (0 15 0 0 0)
197 Current_Pending_Sector  0x0012  100  100  000    Old_age  Always      -      0
198 Offline_Uncorrectable  0x0010  100  100  000    Old_age  Offline      -      0
199 UDMA_CRC_Error_Count    0x003e  200  092  000    Old_age  Always      -      23326
240 Head_Flying_Hours      0x0000  100  253  000    Old_age  Offline      -      9058h+56m+29.965s
241 Total_LBAs_Written      0x0000  100  253  000    Old_age  Offline      -      64461077885046
242 Total_LBAs_Read        0x0000  100  253  000    Old_age  Offline      -      139361752603292
 
...
 
Error 4 occurred at disk power-on lifetime: 11063 hours (460 days + 23 hours)
  When the command that caused the error occurred, the device was active or idle.
 
  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455
 
  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC  Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 00 ff ff ff 4f 00  2d+09:33:19.740  READ FPDMA QUEUED
  60 00 00 ff ff ff 4f 00  2d+09:33:19.740  READ FPDMA QUEUED
  2f 00 01 10 00 00 00 00  2d+09:33:19.249  READ LOG EXT
  60 00 00 ff ff ff 4f 00  2d+09:33:16.501  READ FPDMA QUEUED
  60 00 00 ff ff ff 4f 00  2d+09:33:16.501  READ FPDMA QUEUED

Drive1 (seagate)

Code:

ID# ATTRIBUTE_NAME          FLAG    VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate    0x000f  114  099  006    Pre-fail  Always      -      61103656
  3 Spin_Up_Time            0x0003  092  092  000    Pre-fail  Always      -      0
  4 Start_Stop_Count        0x0032  100  100  020    Old_age  Always      -      79
  5 Reallocated_Sector_Ct  0x0033  100  100  036    Pre-fail  Always      -      0
  7 Seek_Error_Rate        0x000f  072  060  030    Pre-fail  Always      -      19881043
  9 Power_On_Hours          0x0032  088  088  000    Old_age  Always      -      11248
10 Spin_Retry_Count        0x0013  100  100  097    Pre-fail  Always      -      0
12 Power_Cycle_Count      0x0032  100  100  020    Old_age  Always      -      85
183 Runtime_Bad_Block      0x0032  001  001  000    Old_age  Always      -      104
184 End-to-End_Error        0x0032  100  100  099    Old_age  Always      -      0
187 Reported_Uncorrect      0x0032  094  094  000    Old_age  Always      -      6
188 Command_Timeout        0x0032  100  003  000    Old_age  Always      -      39 82 2765
189 High_Fly_Writes        0x003a  100  100  000    Old_age  Always      -      0
190 Airflow_Temperature_Cel 0x0022  058  051  045    Old_age  Always      -      42 (Min/Max 37/44)
191 G-Sense_Error_Rate      0x0032  100  100  000    Old_age  Always      -      0
192 Power-Off_Retract_Count 0x0032  100  100  000    Old_age  Always      -      62
193 Load_Cycle_Count        0x0032  100  100  000    Old_age  Always      -      1754
194 Temperature_Celsius    0x0022  042  049  000    Old_age  Always      -      42 (0 15 0 0 0)
197 Current_Pending_Sector  0x0012  100  100  000    Old_age  Always      -      0
198 Offline_Uncorrectable  0x0010  100  100  000    Old_age  Offline      -      0
199 UDMA_CRC_Error_Count    0x003e  200  092  000    Old_age  Always      -      23326
240 Head_Flying_Hours      0x0000  100  253  000    Old_age  Offline      -      9059h+00m+43.746s
241 Total_LBAs_Written      0x0000  100  253  000    Old_age  Offline      -      64461077885046
242 Total_LBAs_Read        0x0000  100  253  000    Old_age  Offline      -      139361752603292
 
...
 
Error 4 occurred at disk power-on lifetime: 11063 hours (460 days + 23 hours)
  When the command that caused the error occurred, the device was active or idle.
 
  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455
 
  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC  Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 00 ff ff ff 4f 00  2d+09:33:19.740  READ FPDMA QUEUED
  60 00 00 ff ff ff 4f 00  2d+09:33:19.740  READ FPDMA QUEUED
  2f 00 01 10 00 00 00 00  2d+09:33:19.249  READ LOG EXT
  60 00 00 ff ff ff 4f 00  2d+09:33:16.501  READ FPDMA QUEUED
  60 00 00 ff ff ff 4f 00  2d+09:33:16.501  READ FPDMA QUEUED
 
Error 3 occurred at disk power-on lifetime: 11063 hours (460 days + 23 hours)
  When the command that caused the error occurred, the device was active or idle.
 
  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455
 
  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC  Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 00 ff ff ff 4f 00  2d+09:33:16.501  READ FPDMA QUEUED
  60 00 00 ff ff ff 4f 00  2d+09:33:16.501  READ FPDMA QUEUED
  2f 00 01 10 00 00 00 00  2d+09:33:16.107  READ LOG EXT
  60 00 00 ff ff ff 4f 00  2d+09:33:13.379  READ FPDMA QUEUED
  60 00 00 ff ff ff 4f 00  2d+09:33:13.379  READ FPDMA QUEUED

Drive2 (WD) (verified failing)

Code:

ID# ATTRIBUTE_NAME          FLAG    VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate    0x002f  199  109  051    Pre-fail  Always      -      6320
  3 Spin_Up_Time            0x0027  173  172  021    Pre-fail  Always      -      6316
  4 Start_Stop_Count        0x0032  100  100  000    Old_age  Always      -      47
  5 Reallocated_Sector_Ct  0x0033  175  175  140    Pre-fail  Always      -      758
  7 Seek_Error_Rate        0x002e  100  253  000    Old_age  Always      -      0
  9 Power_On_Hours          0x0032  097  097  000    Old_age  Always      -      2847
10 Spin_Retry_Count        0x0032  100  253  000    Old_age  Always      -      0
11 Calibration_Retry_Count 0x0032  100  253  000    Old_age  Always      -      0
12 Power_Cycle_Count      0x0032  100  100  000    Old_age  Always      -      45
192 Power-Off_Retract_Count 0x0032  200  200  000    Old_age  Always      -      34
193 Load_Cycle_Count        0x0032  185  185  000    Old_age  Always      -      46395
194 Temperature_Celsius    0x0022  111  104  000    Old_age  Always      -      39
196 Reallocated_Event_Count 0x0032  199  199  000    Old_age  Always      -      1
197 Current_Pending_Sector  0x0032  200  200  000    Old_age  Always      -      41
198 Offline_Uncorrectable  0x0030  100  253  000    Old_age  Offline      -      0
199 UDMA_CRC_Error_Count    0x0032  200  001  000    Old_age  Always      -      24372
200 Multi_Zone_Error_Rate  0x0008  200  200  000    Old_age  Offline      -      58
 
 
...
 
Error 1598 occurred at disk power-on lifetime: 1287 hours (53 days + 15 hours)
  When the command that caused the error occurred, the device was active or idle.
 
  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 08 d0 8f 1d e4  Error: UNC 8 sectors at LBA = 0x041d8fd0 = 69046224
 
  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC  Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 08 d0 8f 1d e4 00  2d+04:03:30.791  READ DMA
  ec 00 00 00 00 00 a0 00  2d+04:03:30.790  IDENTIFY DEVICE
  ef 03 44 00 00 00 a0 00  2d+04:03:30.790  SET FEATURES [Set transfer mode]
 
...
 
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Conveyance offline  Completed: read failure      90%      2697        55757464
# 2  Extended offline    Completed: read failure      70%      1421        58783072
# 3  Extended offline    Completed: read failure      90%      1415        55764744
# 4  Extended offline    Completed: read failure      70%      1387        55757464
# 5  Extended offline    Completed: read failure      90%      1383        58477104
# 6  Extended offline    Completed: read failure      90%      1368        58470232
# 7  Extended offline    Completed: read failure      60%      1349        61200168
# 8  Extended offline    Completed: read failure      70%      1346        63843408
# 9  Extended offline    Completed: read failure      90%      1148        55768184

Drive3 (WD) - No logged SMART Errors --edit, changed from seagate to WD

Code:

ID# ATTRIBUTE_NAME          FLAG    VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate    0x002f  200  200  051    Pre-fail  Always      -      0
  3 Spin_Up_Time            0x0027  180  179  021    Pre-fail  Always      -      5958
  4 Start_Stop_Count        0x0032  100  100  000    Old_age  Always      -      81
  5 Reallocated_Sector_Ct  0x0033  200  200  140    Pre-fail  Always      -      0
  7 Seek_Error_Rate        0x002e  200  200  000    Old_age  Always      -      0
  9 Power_On_Hours          0x0032  092  092  000    Old_age  Always      -      6024
10 Spin_Retry_Count        0x0032  100  253  000    Old_age  Always      -      0
11 Calibration_Retry_Count 0x0032  100  253  000    Old_age  Always      -      0
12 Power_Cycle_Count      0x0032  100  100  000    Old_age  Always      -      79
192 Power-Off_Retract_Count 0x0032  200  200  000    Old_age  Always      -      55
193 Load_Cycle_Count        0x0032  200  200  000    Old_age  Always      -      25
194 Temperature_Celsius    0x0022  110  105  000    Old_age  Always      -      40
196 Reallocated_Event_Count 0x0032  200  200  000    Old_age  Always      -      0
197 Current_Pending_Sector  0x0032  200  200  000    Old_age  Always      -      0
198 Offline_Uncorrectable  0x0030  100  253  000    Old_age  Offline      -      0
199 UDMA_CRC_Error_Count    0x0032  200  001  000    Old_age  Always      -      26238
200 Multi_Zone_Error_Rate  0x0008  200  200  000    Old_age  Offline      -      0

The rest of the drives don't have anything interesting in the SMART logs, AFAIK, yet they STILL get mentioned in the syslog.
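One way to compare the tables above is to pull out the raw values of a few attributes and split them into "media" and "link" groups. This is a rough sketch, assuming the rows are in the `smartctl -A` column layout shown in this post; the grouping of IDs 5/197/198 as media indicators and 199 as a link indicator is a common rule of thumb, not an official classification, and the helper names are made up for illustration.

```python
# Attribute IDs often watched for media trouble vs. link trouble.
# This grouping is an assumption (rule of thumb), not a standard.
MEDIA_IDS = {5, 197, 198}  # reallocated, pending, offline-uncorrectable
LINK_IDS = {199}           # UDMA CRC errors (interface, cabling, multiplier)


def parse_smart_attributes(text):
    """Return {id: (name, raw_value)} from smartctl -A style rows."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # A data row starts with a numeric ID and has at least 10 columns;
        # the first token of RAW_VALUE is the tenth column.
        if len(fields) >= 10 and fields[0].isdigit():
            attrs[int(fields[0])] = (fields[1], fields[9])
    return attrs


def summarize(text):
    """Collect non-zero raw values for the watched attributes."""
    report = {"media": {}, "link": {}}
    for attr_id, (name, raw) in parse_smart_attributes(text).items():
        if not raw.isdigit() or int(raw) == 0:
            continue
        if attr_id in MEDIA_IDS:
            report["media"][name] = int(raw)
        elif attr_id in LINK_IDS:
            report["link"][name] = int(raw)
    return report


# Rows copied from the Drive2 (WD) table above.
drive2 = """\
ID# ATTRIBUTE_NAME          FLAG    VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct  0x0033  175  175  140    Pre-fail  Always      -      758
197 Current_Pending_Sector  0x0032  200  200  000    Old_age  Always      -      41
198 Offline_Uncorrectable  0x0030  100  253  000    Old_age  Offline      -      0
199 UDMA_CRC_Error_Count    0x0032  200  001  000    Old_age  Always      -      24372
"""
report = summarize(drive2)
print(report)
```

Run against Drive3's rows, the same sketch would report nothing under "media" but a large CRC count under "link", which matches the picture of a drive that looks healthy yet still shows up in the syslog.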

Questions:
1) Are all my drives throwing errors because the WD (and possibly the 2 Seagates) is failing?
2) Are WD Reds the best way to go? Anyone want to comment on what drives you're using?
3) I've had a Linux server for media and such for the past 3ish years, typically running Seagate drives (because they're cheap) and on 24x7. I've RMA'd 3 of them so far out of this machine alone, and a couple of others from other machines. The computer rarely physically moves. Am I just having awful luck or am I doing something completely wrong?
4) Is my ZFS setup OK? Is there a better config that I could/should use that wouldn't burn so many of my drives (1/2) as redundancy?

Last question, Promise
5) Any suggestions for a relatively inexpensive rackmount JBOD enclosure? (I'm a college student doing this more or less as a hobby, so I can't be super extravagant.)

As a pseudo-question, any suggestions for resources on understanding the SMART data?

Feel free to bash away. I have read MOST of the manual.. but it's long, and it's finals week ;) ...

Should I attach/post the smart results for all 8 drives? Nothing like a monster wall of text to discourage help :(

cyberjock · May 5, 2014

So what does this mean...

The 3T drives are in a 2 x ZFS2 (As recommended by the "creation" wizard)

There is no such thing as ZFS2...

Literally 4 of us are in Mumble and none of us have a heck of a clue what configuration you are running...

DrKK · May 5, 2014

I feel like what's happening here,

He's got a Sans hardware raid box, that he likes because it port multiplies his eSATA, but now he's got FreeNAS going to a hardware-RAID solution. If that's right, you're in big doo doo, sir.

Plus, I am told the Sans site says he cannot run this much space in this box in any case.

So I am not sure what the hell is going on.

Chip28 · May 5, 2014

cyberjock said:
So what does this mean...

The 3T drives are in a 2 x ZFS2 (As recommended by the "creation" wizard)

There is no such thing as ZFS2...

Literally 4 of us are in Mumble and none of us have a heck of a clue what configuration you are running...

I edited the post. I meant a striped (I assume) RaidZ2. Apologies.

Chip28 · May 5, 2014

DrKK said:
I feel like what's happening here,

He's got a Sans hardware raid box, that he likes because it port multiplies his eSATA, but now he's got FreeNAS going to a hardware-RAID solution. If that's right, you're in big doo doo, sir.

Plus, I am told the Sans site says he cannot run this much space in this box in any case.

So I am not sure what the hell is going on.

Sort of. The box itself is simply a port multiplier. The manufacturer-provided SATA controller does do RAID. I have replaced the controller with a different one based on suggestions from others. The current card shows them all as JBOD.

Lemming · May 5, 2014

Chip28 said:
Sort of. The box itself is simply a port multiplier. The manufacturer-provided SATA controller does do RAID. I have replaced the controller with a different one based on suggestions from others. The current card shows them all as JBOD.

The box may still be mangling the SMART data which is part of the problem you are seeing.

Also, not every SATA controller supports talking to eSATA multipliers. It may *work* but that doesn't mean it's going to keep working or be reliable.

Chip28 · May 5, 2014

Lemming said:
The box may still be mangling the SMART data which is part of the problem you are seeing.

Fair enough. Do you have any suggestions for a rackmount solution?

Lemming said:
Also, not every SATA controller supports talking to eSATA multipliers. It may *work* but that doesn't mean it's going to keep working or be reliable.

Yeah, I did notice that. For the manufacturer-provided card, they only released hack-and-slash kernel-mode drivers for Linux that were sketchy at best, so I opted for a different solution. The card I got was recommended by another Linux user who said it works for them just fine.

cyberjock · May 5, 2014

Wow.. you are in deep "doo-doo". SATA multipliers are the bane of "reliability" and "performance".

Your problem with multiple "good" drives dropping is almost certainly directly attributable to the multiplier. That's one of the primary reasons why we tell people not to use them. Frankly, you are lucky your pool still works. Seen quite a few people lose their pools when their first disk fails with a multiplier.

If you want a rackmount solution the best one is to go buy a used supermicro server, gut the electronics and reuse the case. Poof, instant $1000 case with a few scratches for $300 or less. ;)

Chip28 · May 5, 2014

Thanks for the advice, cyberjock. I was aware of and OK with the performance kill with the card. The reliability I didn't put a ton of thought into (obviously). I was aware that if, say, one cable came loose, 4 of the drives would drop. I wasn't aware of the multiplier killing the pool. Honestly, it's more of a project, so it isn't an 'end all' if the pool dies, just a monster inconvenience for a week. The main factor behind the JBOD via the multiplier was the cost. I'm a full-time student, so I couldn't throw all the resources at it that so many people here can afford to.

When summer comes around and I start my job and can do those upgrades I'll probably do just that.

Is there a better way to lay out the 8 disks? Would it simply be better to do a 6-disk pool and add another 6-disk vdev once I get a few more drives?
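For the layout question, the trade-off between redundancy and space can be put into rough numbers. This is a back-of-the-envelope sketch, assuming 3 TB disks and ignoring ZFS metadata, padding, and TB/TiB differences; the function name `usable_summary` is made up for illustration. It only counts parity disks per vdev (two for raidz2) and says nothing about the port multiplier issue.

```python
def usable_summary(vdevs, disk_tb=3):
    """Summarize a pool of RAID-Z vdevs.

    vdevs is a list of (disks, parity) tuples, where parity is 1, 2, or 3
    for raidz1/2/3. Returns (total_disks, data_disks, approx_usable_tb).
    The TB figure is raw data capacity before any filesystem overhead.
    """
    total = sum(disks for disks, _ in vdevs)
    data = sum(disks - parity for disks, parity in vdevs)
    return total, data, data * disk_tb


layouts = {
    "2 x 4-disk raidz2 (current)": [(4, 2), (4, 2)],
    "1 x 8-disk raidz2": [(8, 2)],
    "1 x 6-disk raidz2": [(6, 2)],
    "2 x 6-disk raidz2": [(6, 2), (6, 2)],
}
for name, vdevs in layouts.items():
    total, data, tb = usable_summary(vdevs)
    print(f"{name}: {data}/{total} disks hold data, ~{tb} TB before overhead")
```

Under these assumptions, the current layout spends half of the disks on parity, while a single 8-disk raidz2 would spend a quarter and a 6-disk raidz2 a third.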

cyberjock · May 6, 2014

The better way is anything that involves ditching the port multiplier. Those things are a nightmare. We had one guy that lost a pool that was over 100TB because of port multipliers. ;)
