Porting RAIDZ1 pool to new hardware

Status
Not open for further replies.

Bernard Mentink

Contributor
Joined
Apr 2, 2016
Messages
193
My best guess is a bad contact on one of the ends of the SATA cable of this drive. Try to reseat both ends of the cable of the problematic drive ;)
Hi,

The cable is a Mini SAS to 4 x SATA that I bought on eBay, so it may be of dubious quality. I don't know where to get a good one, though. The SAS end is a bit loose.

It looks like this: http://www.ebay.com/itm/381352379322
 

Bidule0hm

Server Electronics Sorcerer
Joined
Aug 5, 2013
Messages
3,710
I've heard that Monoprice sells good cables at low prices, but I have never ordered from them, so you may want to wait for another answer.

For now you can try to reseat it (with the server powered off). You don't have anything to lose, I guess :)
 

Bernard Mentink

Contributor
Joined
Apr 2, 2016
Messages
193
I notice that with all the drives in the old system again, I am getting the following warning from FreeNAS.

  • WARNING: The volume myvolume (ZFS) status is ONLINE: One or more devices has experienced an unrecoverable error. An attempt was made to correct the error. Applications are unaffected. Determine if the device needs to be replaced, and clear the errors using 'zpool clear' or replace the device with 'zpool replace'.

How do I fix this? I don't know how to "Determine if the device needs to be replaced"

EDIT:

"zpool status" shows:
        NAME                                            STATE     READ WRITE CKSUM
        myvolume                                        ONLINE       0     0     0
          raidz1-0                                      ONLINE       0     0     0
            gptid/40bcdf65-04e9-11e6-b34c-ac162d0caf04  ONLINE       0     0     5
            gptid/4128469f-04e9-11e6-b34c-ac162d0caf04  ONLINE       0     0     0
            gptid/418ee1b6-04e9-11e6-b34c-ac162d0caf04  ONLINE       0     0     0

errors: No known data errors
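
For anyone reading along, here is a minimal sketch of how the counters in that table could be checked from a script. It assumes the usual five-column layout (NAME STATE READ WRITE CKSUM); the function name `zpool_error_devices` is made up for illustration. It only reads `zpool status` text on stdin and prints the rows that have a non-zero counter.

```shell
#!/bin/sh
# Hypothetical helper: read `zpool status` text on stdin and print every
# pool/vdev/device row whose READ, WRITE, or CKSUM counter is non-zero.
zpool_error_devices() {
    awk '
        # Device rows have at least five fields, and fields 3-5 start with
        # a digit. That check skips the header row and prose lines such as
        # "errors: No known data errors".
        NF >= 5 && $3 ~ /^[0-9]/ && $4 ~ /^[0-9]/ && $5 ~ /^[0-9]/ &&
        ($3 != "0" || $4 != "0" || $5 != "0") {
            printf "%s read=%s write=%s cksum=%s\n", $1, $3, $4, $5
        }
    '
}
```

With the output posted above, `zpool status myvolume | zpool_error_devices` should print only the first gptid row, with cksum=5.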

It seems my brand new 2TB RED drive is having issues .... how do I run the SMART test on this drive? ... or, does the pool just need a scrub?

EDIT2:

glabel status shows the "faulty" device as /dev/ada1
smartctl -a /dev/ada1 (the drive showing the errors ) does not show any errors on the drive ....
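
The gptid-to-device lookup described here can be scripted as well. This is a rough sketch, assuming `glabel status` prints its usual three columns (Name, Status, Components). The function names are hypothetical, and the `ada1p2` partition name in the usage note is an assumption, since the actual glabel output was not posted.

```shell
#!/bin/sh
# Hypothetical helpers; both only transform text and change nothing on disk.

# Read `glabel status` text on stdin and print the component (partition)
# that backs the label passed as $1.
label_to_component() {
    awk -v label="$1" '$1 == label { print $3 }'
}

# Strip a trailing GPT partition suffix, e.g. ada1p2 -> ada1.
component_to_disk() {
    sed 's/p[0-9][0-9]*$//'
}
```

Usage would look like `glabel status | label_to_component gptid/40bcdf65-04e9-11e6-b34c-ac162d0caf04 | component_to_disk`. If the label sits on a partition such as `ada1p2`, that prints `ada1`, which is the name to pass to smartctl as `/dev/ada1`.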

Does this mean I just scrub and "zpool clear" the errors? ..
 

Bidule0hm

Server Electronics Sorcerer
Joined
Aug 5, 2013
Messages
3,710
Please post the output of smartctl -a /dev/adaX for each drive (between code tags or via pastebin, for readability) :)
 

Bernard Mentink

Contributor
Joined
Apr 2, 2016
Messages
193
Here you go:
/dev/ada0
Code:
[bmentink@freenas ~]$ sudo smartctl -a /dev/ada0
Password:
smartctl 6.2 2013-07-26 r3841 [FreeBSD 9.2-RELEASE-p15 amd64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     Hitachi HDS721010CLA632
Serial Number:    JP2940N01BAX0L
LU WWN Device Id: 5 000cca 39ad34084
Firmware Version: JP4OA41A
User Capacity:    1,000,204,886,016 bytes [1.00 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    7200 rpm
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Fri Jun 10 09:55:24 2016 NZST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x84)    Offline data collection activity
                    was suspended by an interrupting command from host.
                    Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0)    The previous self-test routine completed
                    without error or no self-test has ever
                    been run.
Total time to complete Offline
data collection:         ( 9396) seconds.
Offline data collection
capabilities:              (0x5b) SMART execute Offline immediate.
                    Auto Offline data collection on/off support.
                    Suspend Offline collection upon new
                    command.
                    Offline surface scan supported.
                    Self-test supported.
                    No Conveyance Self-test supported.
                    Selective Self-test supported.
SMART capabilities:            (0x0003)    Saves SMART data before entering
                    power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:        (0x01)    Error logging supported.
                    General Purpose Logging supported.
Short self-test routine
recommended polling time:      (   2) minutes.
Extended self-test routine
recommended polling time:      ( 157) minutes.
SCT capabilities:            (0x003d)    SCT Status supported.
                    SCT Error Recovery Control supported.
                    SCT Feature Control supported.
                    SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   097   094   016    Pre-fail  Always       -       6
  2 Throughput_Performance  0x0027   137   100   054    Pre-fail  Always       -       90
  3 Spin_Up_Time            0x0023   125   100   024    Pre-fail  Always       -       294 (Average 312)
  4 Start_Stop_Count        0x0022   100   100   000    Old_age   Always       -       2143
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002f   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0025   138   100   020    Pre-fail  Offline      -       31
  9 Power_On_Hours          0x0032   099   099   000    Old_age   Always       -       8473
 10 Spin_Retry_Count        0x0033   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       2142
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0033   100   100   097    Pre-fail  Always       -       0
185 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       65535
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   065   000    Old_age   Always       -       32702648
189 High_Fly_Writes         0x0032   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   074   058   000    Old_age   Always       -       26 (Min/Max 25/27)
192 Power-Off_Retract_Count 0x0032   099   099   000    Old_age   Always       -       2146
193 Load_Cycle_Count        0x0032   099   099   000    Old_age   Always       -       2146
194 Temperature_Celsius     0x0002   230   142   000    Old_age   Always       -       26 (Min/Max 5/42)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0

SMART Error Log Version: 0
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Aborted by host               90%      6915         -
# 2  Short offline       Completed without error       00%      6915         -
# 3  Extended offline    Interrupted (host reset)      90%         7         -
# 4  Extended offline    Interrupted (host reset)      90%         1         -

SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.


/dev/ada1
Code:
[bmentink@freenas ~]$ sudo smartctl -a /dev/ada1
smartctl 6.2 2013-07-26 r3841 [FreeBSD 9.2-RELEASE-p15 amd64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Red (AF)
Device Model:     WDC WD20EFRX-68EUZN0
Serial Number:    WD-WCC4M6AL1L62
LU WWN Device Id: 5 0014ee 2b7f85842
Firmware Version: 82.00A82
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    5400 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-2 (minor revision not indicated)
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Fri Jun 10 09:57:02 2016 NZST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00)    Offline data collection activity
                    was never started.
                    Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0)    The previous self-test routine completed
                    without error or no self-test has ever
                    been run.
Total time to complete Offline
data collection:         (27240) seconds.
Offline data collection
capabilities:              (0x7b) SMART execute Offline immediate.
                    Auto Offline data collection on/off support.
                    Suspend Offline collection upon new
                    command.
                    Offline surface scan supported.
                    Self-test supported.
                    Conveyance Self-test supported.
                    Selective Self-test supported.
SMART capabilities:            (0x0003)    Saves SMART data before entering
                    power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:        (0x01)    Error logging supported.
                    General Purpose Logging supported.
Short self-test routine
recommended polling time:      (   2) minutes.
Extended self-test routine
recommended polling time:      ( 275) minutes.
Conveyance self-test routine
recommended polling time:      (   5) minutes.
SCT capabilities:            (0x703d)    SCT Status supported.
                    SCT Error Recovery Control supported.
                    SCT Feature Control supported.
                    SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   173   173   021    Pre-fail  Always       -       4308
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       10
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   099   099   000    Old_age   Always       -       1269
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       10
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       5
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       22
194 Temperature_Celsius     0x0022   119   114   000    Old_age   Always       -       28
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   100   253   000    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]


SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.


/dev/ada2
Code:
[bmentink@freenas ~]$ sudo smartctl -a /dev/ada2
smartctl 6.2 2013-07-26 r3841 [FreeBSD 9.2-RELEASE-p15 amd64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital AV-GP
Device Model:     WDC WD10EVDS-63N5B1
Serial Number:    WD-WCAU4C363255
LU WWN Device Id: 5 0014ee 25864e406
Firmware Version: 01.01A01
User Capacity:    1,000,204,886,016 bytes [1.00 TB]
Sector Size:      512 bytes logical/physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS (minor revision not indicated)
SATA Version is:  SATA 2.5, 3.0 Gb/s
Local Time is:    Fri Jun 10 09:57:46 2016 NZST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00)    Offline data collection activity
                    was never started.
                    Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0)    The previous self-test routine completed
                    without error or no self-test has ever
                    been run.
Total time to complete Offline
data collection:         (23400) seconds.
Offline data collection
capabilities:              (0x7b) SMART execute Offline immediate.
                    Auto Offline data collection on/off support.
                    Suspend Offline collection upon new
                    command.
                    Offline surface scan supported.
                    Self-test supported.
                    Conveyance Self-test supported.
                    Selective Self-test supported.
SMART capabilities:            (0x0003)    Saves SMART data before entering
                    power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:        (0x01)    Error logging supported.
                    General Purpose Logging supported.
Short self-test routine
recommended polling time:      (   2) minutes.
Extended self-test routine
recommended polling time:      ( 268) minutes.
Conveyance self-test routine
recommended polling time:      (   5) minutes.
SCT capabilities:            (0x303f)    SCT Status supported.
                    SCT Error Recovery Control supported.
                    SCT Feature Control supported.
                    SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   170   163   021    Pre-fail  Always       -       6500
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       367
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   100   253   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   026   026   000    Old_age   Always       -       54617
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       334
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       178
193 Load_Cycle_Count        0x0032   200   200   000    Old_age   Always       -       367
194 Temperature_Celsius     0x0022   119   099   000    Old_age   Always       -       31
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   100   253   000    Old_age   Offline      -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]


SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
 

Bernard Mentink

Contributor
Joined
Apr 2, 2016
Messages
193
Right, so a FreeNAS issue then ... will clearing the error and doing a scrub help?
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
None of those drives has ever seen a SMART self-test (in the case of one of them, in over 6 years of continuous operation). That's bad. You should be running long and short SMART tests regularly on your drives. With that said, there are no apparent errors. A scrub (no need to clear first) would show if this is just a transient issue, or something persistent.
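
A minimal sketch of the commands involved, wrapped in small helpers so they can be previewed before anything runs. `smartctl -t short`, `smartctl -t long`, `smartctl -l selftest`, `zpool scrub` and `zpool status -v` are standard commands; the helper names and the DRY_RUN switch are assumptions made up for illustration, and the real commands need root.

```shell
#!/bin/sh
# Hypothetical wrappers. With DRY_RUN=1 set, each helper prints the command
# it would run instead of running it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Start a SMART self-test. $1 is "short" or "long", $2 is the device.
start_smart_test() {
    run smartctl -t "$1" "$2"
}

# Show the self-test log for device $1 once the test has had time to finish.
show_smart_test_log() {
    run smartctl -l selftest "$1"
}

# Start a scrub of pool $1.
scrub_pool() {
    run zpool scrub "$1"
}

# Show scrub progress and per-device error counters for pool $1.
pool_status() {
    run zpool status -v "$1"
}
```

For the drive in question that would be `start_smart_test long /dev/ada1`, then `show_smart_test_log /dev/ada1` after the polling time smartctl reports (275 minutes for the extended test on that drive), followed by `scrub_pool myvolume` and `pool_status myvolume`.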
 

Bernard Mentink

Contributor
Joined
Apr 2, 2016
Messages
193
danb35 said:
None of those drives has ever seen a SMART self-test (in the case of one of them, in over 6 years of continuous operation). That's bad. You should be running long and short SMART tests regularly on your drives. With that said, there are no apparent errors. A scrub (no need to clear first) would show if this is just a transient issue, or something persistent.
The drives have never seen a SMART test because they were pulled from desktop machines. The drive that is having CRC errors is a brand new drive.
Now that they are in my server, I will do a SMART test and a scrub ..
 

Bernard Mentink

Contributor
Joined
Apr 2, 2016
Messages
193
More likely some other hardware issue. Of course, a disk can be bad without any evidence showing up in SMART data.
So how do I prove it is a hardware issue without SMART evidence that there was an error (for warranty purposes)? The fact that I get CRC issues on this drive in different hardware machines means it is the drive or FreeNAS ...

Is it possible FreeNAS did something to the drive going from 9.10 back to 9.2.1?
 

Bidule0hm

Server Electronics Sorcerer
Joined
Aug 5, 2013
Messages
3,710
Bernard Mentink said:
The fact that I get CRC issues on this drive in different hardware machines means it is the drive or FreeNAS ...

For the third time: try to reseat the SATA cable at both ends and if it doesn't work then try to change the cable. CRC errors are usually because of bad connections or bad cables.
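
To see what the drives themselves report, here is a sketch that pulls a few attributes out of `smartctl -A` (or `-a`) output. Attribute 199 (UDMA_CRC_Error_Count) is the drive's own count of CRC errors on the SATA link, while 5, 197 and 198 cover reallocated, pending and offline-uncorrectable sectors. The function name is hypothetical and the choice of attributes is only a common starting point, not a complete health check.

```shell
#!/bin/sh
# Hypothetical helper: read smartctl attribute output on stdin and print
# the raw values of attributes 5, 197, 198 and 199.
smart_key_attributes() {
    awk '
        # Attribute rows have at least ten columns. That check skips the
        # selective self-test table, whose SPAN column can also be 5.
        NF >= 10 && ($1 == 5 || $1 == 197 || $1 == 198 || $1 == 199) {
            printf "%s %s raw=%s\n", $1, $2, $10
        }
    '
}
```

Usage would be `smartctl -A /dev/ada1 | smart_key_attributes`. In the three outputs posted earlier in the thread, all four raw values are 0 on every drive.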

Bernard Mentink said:
Is it possible FreeNAS did something to the drive going from 9.10 back to 9.2.1?

No.
 

Bernard Mentink

Contributor
Joined
Apr 2, 2016
Messages
193
Bidule0hm said:
For the third time: try to reseat the SATA cable at both ends and if it doesn't work then try to change the cable. CRC errors are usually because of bad connections or bad cables.
No.

Duh, don't you think I have tried that already? The problem drive has been in two machines with different SATA cables, all drive cables have been reseated. I even changed it over to different SATA positions .... the problem follows the drive .....
 

Bidule0hm

Server Electronics Sorcerer
Joined
Aug 5, 2013
Messages
3,710
OK, then why didn't you say you'd tried it?

If the problem follows the drive with all those tests then it's the drive ;)
 

Bidule0hm

Server Electronics Sorcerer
Joined
Aug 5, 2013
Messages
3,710
So they are crap?
 

Bidule0hm

Server Electronics Sorcerer
Joined
Aug 5, 2013
Messages
3,710
Ok, it was the first and last time I recommend them then ;)
 

gpsguy

Active Member
Joined
Jan 22, 2012
Messages
4,472
I buy cables and stuff from Monoprice and will continue to do so. My experience with Monoprice has been just fine. If I did have a problem, I'd just replace it with another and move on.

I would never buy a Monster cable. I've always considered their prices a ripoff, and most articles on the internet agree: search for "Monster ripoff" or "Monster versus Monoprice". If the Black Ninja sees value in them, that's his prerogative.

Ok, it was the first and last time I recommend them then
 