Just moved across the country - can't import volume

caddy013

Dabbler
Joined
Sep 6, 2018
Messages
11
Hey everyone,

As the title says, I just moved across the country. I'm finally getting my lab and computer stuff up and running, but tonight when I tried to get my FreeNAS box going, I got a bunch of error text, and when I booted into the GUI, my volume was showing status UNKNOWN. I ran zpool import and this is what I got back:

Code:
[root@guiltySpark ~]# zpool import
   pool: spradlin                                                               
     id: 14948627749909235127                                                   
  state: DEGRADED                                                               
 status: One or more devices are missing from the system.                       
 action: The pool can be imported despite missing or damaged devices.  The     
        fault tolerance of the pool may be compromised if imported.             
   see: http://illumos.org/msg/ZFS-8000-2Q                                     
 config:                                                                       
                                                                                
        spradlin                                        DEGRADED               
          mirror-0                                      DEGRADED               
            7965133589885255340                         UNAVAIL  cannot open   
            gptid/26159776-92c0-11e4-a186-bc5ff4e7a55f  ONLINE                 
          mirror-1                                      DEGRADED               
            gptid/137f3c38-dd06-11e5-990e-bc5ff4e7a55f  ONLINE                 
            13696599547927428096                        UNAVAIL  cannot open   
          mirror-2                                      ONLINE                 
            gptid/1a92ec0a-dd06-11e5-990e-bc5ff4e7a55f  ONLINE                 
            gptid/967e579f-b001-11e8-8f71-bc5ff4e7a55f  ONLINE                 
[root@guiltySpark ~]#                            


At first glance, it looks like I may have two bad disks, but the data should still be salvageable if I replace or fix them? However, since the pool isn't imported, the GUI didn't seem to offer a way to detach a disk (I might have the term wrong, but I remember having to do something similar when I replaced a drive 6 or so months ago).

Searching for solutions, I've read several posts that seem to all point to an error with the disks, so here's what I'm thinking my next steps are:

1: power down, unplug, and make sure cables are good to go
2: try replacing just one of the disks?
3: try to import and resilver the pool after that
4: .....

Does that all sound right?

Here's the system:
FreeNAS 11.1-U7
Intel i3 4130T
16GB ECC RAM
No hardware RAID controller
6x WD RED disks (4x 3TB, 2x 4TB (slowly replacing them)) in three mirrors.

Anything else y'all need to know? Just looking to get feedback before I start blindly doing anything.

Thanks in advance!
 
Joined
Dec 29, 2014
Messages
1,135
Luckily for you, it looks like the failed disks are in different mirrored vdevs, so your data should be OK. The first thing you should do is back it all up. Then try checking cabling and power connections. If all that is good, then you need to replace disks.
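
To spell out the "different mirrored vdevs" reasoning: a pool of mirrors survives as long as every mirror keeps at least one ONLINE member. Here is a minimal sketch that checks this against the config section of the zpool import output posted above. The function names are made up for illustration, and the parsing assumes the simple two-column layout shown in that output.

```python
import re

# Config section copied from the `zpool import` output earlier in the thread.
CONFIG = """\
        spradlin                                        DEGRADED
          mirror-0                                      DEGRADED
            7965133589885255340                         UNAVAIL  cannot open
            gptid/26159776-92c0-11e4-a186-bc5ff4e7a55f  ONLINE
          mirror-1                                      DEGRADED
            gptid/137f3c38-dd06-11e5-990e-bc5ff4e7a55f  ONLINE
            13696599547927428096                        UNAVAIL  cannot open
          mirror-2                                      ONLINE
            gptid/1a92ec0a-dd06-11e5-990e-bc5ff4e7a55f  ONLINE
            gptid/967e579f-b001-11e8-8f71-bc5ff4e7a55f  ONLINE
"""


def members_by_mirror(config_text):
    """Group (device, state) pairs under the mirror vdev they are listed beneath."""
    mirrors = {}
    current = None
    for line in config_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        name, state = fields[0], fields[1]
        if re.fullmatch(r"mirror-\d+", name):
            current = name
            mirrors[current] = []
        elif current is not None:
            # Lines before the first mirror (the pool name itself) are skipped.
            mirrors[current].append((name, state))
    return mirrors


def online_members(mirrors):
    """Count ONLINE members per mirror; a mirror with zero would mean lost data."""
    return {
        vdev: sum(1 for _, state in members if state == "ONLINE")
        for vdev, members in mirrors.items()
    }


online = online_members(members_by_mirror(CONFIG))
for vdev, count in online.items():
    print(vdev, "online members:", count)
print("every mirror still has a copy:", all(n >= 1 for n in online.values()))
```

mirror-0 and mirror-1 are each down to a single copy, which is why the backup comes first: one more failure in either of those mirrors would take the whole pool with it.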
 

blueether

Patron
Joined
Aug 6, 2018
Messages
259
I would look at SATA and power cables as well, then if that fails, carry on and resilver the mirrors.
 

caddy013

Dabbler
Joined
Sep 6, 2018
Messages
11
So, the problem with trying to back it all up is that I can't access it from Windows Explorer. I'm guessing this is due to the pool not importing correctly. Is there a way to force import of the pool, or maybe even connect a USB drive to the FreeNAS box and get it off that way somehow?

I originally started this box with two drives, which is why I think I went with a RAID1-type setup. I'd like to switch all of this to RAIDZ2 when all this is fixed, so I'll probably snag an external, back it all up, then destroy the thing and start over...
 

caddy013

Dabbler
Joined
Sep 6, 2018
Messages
11
Update: just tried making sure everything is plugged in well enough. ...BIOS doesn't even recognize the two missing drives are there. I suppose it could be a problem with the ports on the motherboard? I'm thinking a bad drive is maybe more likely though *shrug*
 
Joined
Dec 29, 2014
Messages
1,135
You can add the "-f" flag to the zpool import. You can escalate to "-F" if the first attempt fails.
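
For reference, the two flags do different things: -f forces the import when the pool looks like it is still in use by another system, while -F is recovery mode, which tries to make the pool importable by discarding the last few transactions. Combining -F with -n only reports whether recovery would work without actually doing it. A rough sketch of that escalation order is below. It only prints the command lines and runs nothing, and the helper name is hypothetical.

```python
POOL = "spradlin"  # pool name from the zpool import output in this thread


def import_attempts(pool):
    """Return zpool import command lines, ordered from least to most aggressive."""
    return [
        ["zpool", "import", pool],              # plain import
        ["zpool", "import", "-f", pool],        # force: pool appears active elsewhere
        ["zpool", "import", "-F", "-n", pool],  # recovery dry run: report only
        ["zpool", "import", "-F", pool],        # recovery: may discard recent writes
    ]


for argv in import_attempts(POOL):
    print(" ".join(argv))
```

Stop at the first one that works. The last step can throw away recent writes, so it is worth running the dry run before it.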
 

caddy013

Dabbler
Joined
Sep 6, 2018
Messages
11
Well, I have good news and confusing news.

After trying to do another zpool import, I got an error and an entire mirror had disappeared...

So that got me to thinking (I had already messed with the cables once)...

I shut everything down, took it all apart, took all the drives out and disconnected all the power and sata cables, then plugged drives in two at a time to see if they got picked up in BIOS. All of them did. So now I'm thinking either I have some bad SATA cables (less likely since I just recently replaced them), bad power supply/cables/connectors, bad ports on the mobo, OR, stuff is just squeezed in there so tight that a couple of the drives just didn't have a good connection.

So the troubleshooting continues, but for now it at least appears as though my drives might not be toast.
 

blueether

Patron
Joined
Aug 6, 2018
Messages
259
Good luck in finding the bad cable/power/port
 

caddy013

Dabbler
Joined
Sep 6, 2018
Messages
11
Alrighty, so here's where I'm at now...

All of my drives are physically out of my system. I plugged each of them in one by one and did a reboot in between each time to see if they showed up in BIOS or not. One by one, they all showed up. Success! Sort of. ***may or may not be important to note that I'm pretty sure none of the drives were attached to their original SATA port on the motherboard when I plugged it all back in. I seem to remember something a while back saying that wasn't an issue though.

Next, I let it boot into FreeNAS. Everything was going better than previously when I started seeing messages along the lines of .. doing scan sync txg... with some other numbers and things. Some searches led me to believe that may indicate resilvering. Sure enough, it started resilvering one of the drives last night (this computer has been packed up for over three months and I can't remember what its exact state was when we were getting things ready to move).

Anyway, ran some errands this morning (it was still resilvering). While I was out, received an e-mail for a critical alert "The volume spradlin state is ONLINE: One or more devices has experienced an unrecoverable error. An attempt was made to correct the error. Applications are unaffected."

Just got home a bit ago, and this is what I'm seeing:

zpool status:
Code:
  pool: freenas-boot
 state: ONLINE
  scan: scrub repaired 0 in 0 days 00:04:46 with 0 errors on Sat Jan  4 03:49:46 2020
config:

    NAME                                          STATE     READ WRITE CKSUM
    freenas-boot                                  ONLINE       0     0     0
      gptid/6da54db8-b264-11e4-8c0d-bc5ff4e7a55f  ONLINE       0     0     0

errors: No known data errors

  pool: spradlin
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
    attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
    using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://illumos.org/msg/ZFS-8000-9P
  scan: resilvered 2.42T in 0 days 08:04:28 with 0 errors on Sat Jan  4 09:32:21 2020
config:

    NAME                                            STATE     READ WRITE CKSUM
    spradlin                                        ONLINE       0     0     0
      mirror-0                                      ONLINE       0     0     0
        gptid/20958994-9461-11e9-be2d-bc5ff4e7a55f  ONLINE       0     0   108
        gptid/26159776-92c0-11e4-a186-bc5ff4e7a55f  ONLINE       0     0     0
      mirror-1                                      ONLINE       0     0     0
        gptid/137f3c38-dd06-11e5-990e-bc5ff4e7a55f  ONLINE       0     0     0
        gptid/842ac0d2-ae7d-11e8-b1c8-bc5ff4e7a55f  ONLINE       0     0     0
      mirror-2                                      ONLINE       0     0     0
        gptid/1a92ec0a-dd06-11e5-990e-bc5ff4e7a55f  ONLINE       0     0     0
        gptid/967e579f-b001-11e8-8f71-bc5ff4e7a55f  ONLINE       0     0     0

errors: No known data errors


The only thing I notice in this is a crap ton of checksum errors (108 on one disk in mirror-0), and FreeNAS is recommending either zpool clear or zpool replace.
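
A quick way to pick the affected device out of a status listing like the one above is to scan the config table for any row whose READ, WRITE, or CKSUM column is not zero. This is just a sketch: the function name is made up, and it assumes the five-column layout shown in the output above.

```python
# Config table for pool spradlin, copied from the zpool status output above.
STATUS = """\
    NAME                                            STATE     READ WRITE CKSUM
    spradlin                                        ONLINE       0     0     0
      mirror-0                                      ONLINE       0     0     0
        gptid/20958994-9461-11e9-be2d-bc5ff4e7a55f  ONLINE       0     0   108
        gptid/26159776-92c0-11e4-a186-bc5ff4e7a55f  ONLINE       0     0     0
      mirror-1                                      ONLINE       0     0     0
        gptid/137f3c38-dd06-11e5-990e-bc5ff4e7a55f  ONLINE       0     0     0
        gptid/842ac0d2-ae7d-11e8-b1c8-bc5ff4e7a55f  ONLINE       0     0     0
      mirror-2                                      ONLINE       0     0     0
        gptid/1a92ec0a-dd06-11e5-990e-bc5ff4e7a55f  ONLINE       0     0     0
        gptid/967e579f-b001-11e8-8f71-bc5ff4e7a55f  ONLINE       0     0     0
"""


def devices_with_errors(status_text):
    """Return {name: counters} for every row with a non-zero error counter."""
    flagged = {}
    for line in status_text.splitlines():
        fields = line.split()
        if len(fields) != 5 or fields[0] == "NAME":
            continue
        counters = dict(zip(("read", "write", "cksum"), fields[2:]))
        # Compare as strings so abbreviated counts such as "1.2K" are still flagged.
        if any(value != "0" for value in counters.values()):
            flagged[fields[0]] = counters
    return flagged


for name, counters in devices_with_errors(STATUS).items():
    print(name, counters)
```

Only gptid/20958994-9461-11e9-be2d-bc5ff4e7a55f shows up, with 108 checksum errors and no read or write errors.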

Here's the smartctl output, formatted for reading:
Code:
########## SMART status report summary for all drives ##########

+------+---------------+----+-----+-----+-----+-------+-------+--------+------+------+------+-------+----+
|Device|Serial         |Temp|Power|Start|Spin |ReAlloc|Current|Offline |UDMA  |Seek  |High  |Command|Last|
|      |               |    |On   |Stop |Retry|Sectors|Pending|Uncorrec|CRC   |Errors|Fly   |Timeout|Test|
|      |               |    |Hours|Count|Count|       |Sectors|Sectors |Errors|      |Writes|Count  |Age |
+------+---------------+----+-----+-----+-----+-------+-------+--------+------+------+------+-------+----+
|ada0 ?|WD-WMC4N0MAL5D1| 28 |28349|   46|    0|      0|      0|       0|     0|   N/A|   N/A|    N/A|  13|
|ada1 ?|WD-WCC7K0KJXKJ3| 28 | 6941|   23|    0|      0|      0|       0|     0|   N/A|   N/A|    N/A|  13|
|ada2 ?|WD-WCC4N7FSXRPT| 29 |28349|   46|    0|      0|      0|       0|     5|   N/A|   N/A|    N/A| 276|
|ada3 ?|WD-WCC4N7FSXEU2| 29 |28348|   45|    0|      0|      0|       0|     0|   N/A|   N/A|    N/A|  13|
|ada4 ?|WD-WMC4N1069088| 26 |45591|  215|    0|      0|      0|       0|     0|   N/A|   N/A|    N/A|  58|
|ada5 ?|WD-WCC7K7LJL4TR| 26 |  170|   13|    0|      0|      0|       0|     0|   N/A|   N/A|    N/A|   7|
+------+---------------+----+-----+-----+-----+-------+-------+--------+------+------+------+-------+----+



########## SMART status report for ada0 drive (Western Digital Red: WD-WMC4N0MAL5D1) ##########
smartctl 6.6 2017-11-05 r4594 [FreeBSD 11.1-STABLE amd64] (local build)

SMART overall-health self-assessment test result: PASSED

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   182   182   021    Pre-fail  Always       -       5858
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       46
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   062   062   000    Old_age   Always       -       28349
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       45
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       14
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       922
194 Temperature_Celsius     0x0022   122   097   000    Old_age   Always       -       28
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0

No Errors Logged

Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
Extended offline    Completed without error       00%     28037         -



########## SMART status report for ada1 drive (Western Digital Red: WD-WCC7K0KJXKJ3) ##########
smartctl 6.6 2017-11-05 r4594 [FreeBSD 11.1-STABLE amd64] (local build)

SMART overall-health self-assessment test result: PASSED

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   169   169   021    Pre-fail  Always       -       6541
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       23
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   091   091   000    Old_age   Always       -       6941
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       23
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       10
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       96
194 Temperature_Celsius     0x0022   122   104   000    Old_age   Always       -       28
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0

No Errors Logged

Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
Extended offline    Completed without error       00%      6638         -



########## SMART status report for ada2 drive (Western Digital Red: WD-WCC4N7FSXRPT) ##########
smartctl 6.6 2017-11-05 r4594 [FreeBSD 11.1-STABLE amd64] (local build)

SMART overall-health self-assessment test result: PASSED

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   176   175   021    Pre-fail  Always       -       6183
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       46
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   062   062   000    Old_age   Always       -       28349
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       45
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       14
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       1948
194 Temperature_Celsius     0x0022   121   101   000    Old_age   Always       -       29
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   199   000    Old_age   Always       -       5
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0

No Errors Logged

Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
Extended offline    Completed without error       00%     21734         -



########## SMART status report for ada3 drive (Western Digital Red: WD-WCC4N7FSXEU2) ##########
smartctl 6.6 2017-11-05 r4594 [FreeBSD 11.1-STABLE amd64] (local build)

SMART overall-health self-assessment test result: PASSED

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       35
  3 Spin_Up_Time            0x0027   176   176   021    Pre-fail  Always       -       6183
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       45
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   062   062   000    Old_age   Always       -       28348
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       44
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       13
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       1078
194 Temperature_Celsius     0x0022   121   095   000    Old_age   Always       -       29
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0

No Errors Logged

Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
Extended offline    Completed without error       00%     28036         -



########## SMART status report for ada4 drive (Western Digital Red: WD-WMC4N1069088) ##########
smartctl 6.6 2017-11-05 r4594 [FreeBSD 11.1-STABLE amd64] (local build)

SMART overall-health self-assessment test result: PASSED

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   211   175   021    Pre-fail  Always       -       4450
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       215
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   100   253   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   038   038   000    Old_age   Always       -       45591
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       206
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       40
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       789
194 Temperature_Celsius     0x0022   124   090   000    Old_age   Always       -       26
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0

ATA Error Count: 46 (device log contains only the most recent five errors)
    CR = Command Register [HEX]
    FR = Features Register [HEX]
    SC = Sector Count Register [HEX]
    SN = Sector Number Register [HEX]
    CL = Cylinder Low Register [HEX]
    CH = Cylinder High Register [HEX]
    DH = Device/Head Register [HEX]
    DC = Device Command Register [HEX]
    ER = Error register [HEX]
    ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 46 occurred at disk power-on lifetime: 45100 hours (1879 days + 4 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 08 50 01 47 43  Error: UNC 8 sectors at LBA = 0x03470150 = 54985040

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 08 50 01 47 43 08      08:36:42.681  READ DMA
  c8 00 08 50 01 47 43 08      08:36:35.682  READ DMA
  c8 00 08 50 01 47 43 08      08:36:28.683  READ DMA
  c8 00 08 50 01 47 43 08      08:36:21.684  READ DMA
  c8 00 08 50 01 47 43 08      08:36:14.685  READ DMA

Error 45 occurred at disk power-on lifetime: 45100 hours (1879 days + 4 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 08 50 01 47 43  Error: UNC 8 sectors at LBA = 0x03470150 = 54985040

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 08 50 01 47 43 08      08:36:35.682  READ DMA
  c8 00 08 50 01 47 43 08      08:36:28.683  READ DMA
  c8 00 08 50 01 47 43 08      08:36:21.684  READ DMA
  c8 00 08 50 01 47 43 08      08:36:14.685  READ DMA

Error 44 occurred at disk power-on lifetime: 45100 hours (1879 days + 4 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 08 50 01 47 43  Error: UNC 8 sectors at LBA = 0x03470150 = 54985040

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 08 50 01 47 43 08      08:36:28.683  READ DMA
  c8 00 08 50 01 47 43 08      08:36:21.684  READ DMA
  c8 00 08 50 01 47 43 08      08:36:14.685  READ DMA

Error 43 occurred at disk power-on lifetime: 45100 hours (1879 days + 4 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 08 50 01 47 43  Error: UNC 8 sectors at LBA = 0x03470150 = 54985040

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 08 50 01 47 43 08      08:36:21.684  READ DMA
  c8 00 08 50 01 47 43 08      08:36:14.685  READ DMA

Error 42 occurred at disk power-on lifetime: 45100 hours (1879 days + 4 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 08 50 01 47 43  Error: UNC 8 sectors at LBA = 0x03470150 = 54985040

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 08 50 01 47 43 08      08:36:14.685  READ DMA

Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
Extended offline    Interrupted (host reset)      90%     44198         -



########## SMART status report for ada5 drive (Western Digital Red: WD-WCC7K7LJL4TR) ##########
smartctl 6.6 2017-11-05 r4594 [FreeBSD 11.1-STABLE amd64] (local build)

SMART overall-health self-assessment test result: PASSED

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   186   181   021    Pre-fail  Always       -       5658
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       13
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   199   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       170
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       13
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       9
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       4
194 Temperature_Celsius     0x0022   124   107   000    Old_age   Always       -       26
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   100   253   000    Old_age   Offline      -       0

No Errors Logged


That should include everything, but I have the raw output attached.

There's a lot going on here. Biggest problem I see with this is that ada4 has 46 logged ATA errors, and the load cycle counts for some of the drives seem to be relatively high.
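
To narrow the report down a bit, here is a rough sketch that scans smartctl attribute rows for non-zero raw values on a short watch list. The watch list (IDs 1, 5, 197, 198, 199) is my own assumption about which attributes are worth a look, not an official one, and the rows are excerpts copied from the report above.

```python
# Assumption: an informal watch list, not something smartctl defines.
WATCHED = {1, 5, 197, 198, 199}

# Excerpts of the attribute tables above (ID, name, ..., raw value last).
REPORTS = {
    "ada2": """\
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
199 UDMA_CRC_Error_Count    0x0032   200   199   000    Old_age   Always       -       5
""",
    "ada3": """\
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       35
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
""",
    "ada4": """\
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
""",
}


def nonzero_watched(report):
    """Return {attribute_name: raw_value} for watched attributes with raw != 0."""
    hits = {}
    for line in report.splitlines():
        fields = line.split()
        # Attribute rows have 10 columns and start with a numeric ID.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        attr_id, name, raw = int(fields[0]), fields[1], fields[9]
        if attr_id in WATCHED and raw.isdigit() and int(raw) != 0:
            hits[name] = int(raw)
    return hits


for drive, report in REPORTS.items():
    print(drive, nonzero_watched(report))
```

This flags the 5 UDMA CRC errors on ada2 (usually a cabling symptom) and the raw read error rate of 35 on ada3, but it comes back empty for ada4. ada4's problems only show up in its ATA error log, so the attribute table alone isn't enough.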

I'm not sure how many of my problems were caused by my cable situation earlier. I was getting a pretty consistent error on one of them before we moved and I just hadn't had time to look into it, but looking back, again, it may have been cable related.

Like I mentioned above, my primary goal is to get everything healthy enough to work until I back it all up to an external drive or something so that I can start over from scratch with a RAIDZ2 setup (I realize this could have been solved had I been doing backups all along).

Any thoughts/suggestions on where I should go from here? Moving everything to a bigger case will definitely happen sooner rather than later. Should I worry about doing a zpool clear? Are things good enough for me to start backing up now and just worry about everything after I start over?

Is there anything else I should check? Do y'all need any more info?

Thanks!
 

Attachments

  • smartctlOutput.txt
    38.6 KB · Views: 239
  • dmesg.txt
    12.6 KB · Views: 243
Joined
Dec 29, 2014
Messages
1,135
Based on your SMART test results, ada4 looks to be on its last legs. I would suggest replacing that as soon as possible. Then you can move on to other changes.
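
For anyone reading the ada4 error log above, the register dump can be decoded by hand. A small sketch, assuming the 28-bit LBA layout used by READ DMA (low nibble of DH, then CH, CL, SN), with the register values and hour counts taken from the posted report:

```python
def lba28(sn, cl, ch, dh):
    """Assemble a 28-bit LBA from the ATA task-file registers."""
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn


# "ER ST SC SN CL CH DH" row from Error 46 in the ada4 log.
er, st, sc, sn, cl, ch, dh = (int(x, 16) for x in "40 51 08 50 01 47 43".split())

lba = lba28(sn, cl, ch, dh)
uncorrectable = bool(er & 0x40)  # bit 6 of the error register is UNC

print(f"LBA = 0x{lba:08x} = {lba}")
print("uncorrectable read:", uncorrectable, "sectors requested:", sc)

# Power-on hours: current value from attribute 9 vs. the hour stamped on the errors.
hours_since_errors = 45591 - 45100
print("power-on hours since the logged errors:", hours_since_errors)
```

This reproduces the LBA = 0x03470150 = 54985040 line from the log. All five logged errors are uncorrectable reads of that same 8-sector range, retried about 7 seconds apart, and they were recorded 491 power-on hours before this report was taken.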
 