GEOM_RAID3: Cannot add disk ada1 to raid3 (error=22).

Status
Not open for further replies.

ThePowerTool

Dabbler
Joined
Aug 31, 2012
Messages
31
I recently had a drive fail in my RAID3 array. I replaced it. For some reason FreeNAS didn't automatically insert the new drive into the RAID3 array. I have seen it do this in the past on an older version of FreeNAS.

Code:
System Information
Hostname           freenas.local
Build              FreeNAS-9.1.1-RELEASE-x86 (a752d35)
Platform           Intel(R) Xeon(TM) CPU 3.60GHz
Memory             3054MB
System Time        Tue Jun 10 15:37:07 PDT 2014
Uptime             3:37PM up 10 days, 1:29, 1 user
Load Average       0.13, 0.07, 0.04
Connected through  192.168.1.250


I manually inserted the new drive. Details - http://forums.freenas.org/index.php?threads/degraded.19454/

The drive will not stay in the array. I've run smartctl tests, both long and short. Below are my second attempt to reinsert the drive, /var/log/messages for today (if you need the entire messages log or any other details I'm happy to provide them), and the smartctl status.

Could someone please take a look and give me some pointers or help me troubleshoot why the new drive won't stay inserted?

There is a firmware upgrade for these drives (all drives are on the same firmware) but I have not installed it and wanted to check here before trying something like that.

Thank you, in advance!

Details, as promised:
Code:
[root@freenas] ~# graid3 status
       Name    Status  Components
raid3/raid3  DEGRADED  ada2 (ACTIVE)
                       ada0 (ACTIVE)
[root@freenas] ~# graid3 insert raid3 ada1
[root@freenas] ~# graid3 status
       Name    Status  Components
raid3/raid3  DEGRADED  ada2 (ACTIVE)
                       ada0 (ACTIVE)
                       ada1 (SYNCHRONIZING, 0%)
[root@freenas] ~# graid3 status
       Name    Status  Components
raid3/raid3  DEGRADED  ada2 (ACTIVE)
                       ada0 (ACTIVE)
                       ada1 (SYNCHRONIZING, 3%)
[root@freenas] ~# graid3 status
       Name    Status  Components
raid3/raid3  DEGRADED  ada2 (ACTIVE)
                       ada0 (ACTIVE)
                       ada1 (SYNCHRONIZING, 71%)
[root@freenas] ~# graid3 status
       Name    Status  Components
raid3/raid3  DEGRADED  ada2 (ACTIVE)
                       ada0 (ACTIVE)

************************************************************
/var/log/messages (just for June 10th)
************************************************************
Jun 10 08:25:44 freenas kernel: (ada1:siisch2:0:0:0): lost device
Jun 10 08:25:44 freenas kernel: GEOM_RAID3: Synchronization request failed (error=6). ada1[WRITE(offset=2207810846720, length=65536)]
Jun 10 08:25:44 freenas kernel: GEOM_RAID3: Device raid3: provider ada1 disconnected.
Jun 10 08:25:44 freenas kernel: GEOM_RAID3: Device raid3: rebuilding provider ada1 stopped.
Jun 10 08:25:44 freenas kernel: (ada1:siisch2:0:0:0): removing device entry
Jun 10 08:25:49 freenas kernel: ada1 at siisch2 bus 0 scbus4 target 0 lun 0
Jun 10 08:25:49 freenas kernel: ada1: <ST3000DM001-1CH166 CC24> ATA-8 SATA 3.x device
Jun 10 08:25:49 freenas kernel: ada1: 300.000MB/s transfers (SATA 2.x, UDMA6, PIO 8192bytes)
Jun 10 08:25:49 freenas kernel: ada1: Command Queueing enabled
Jun 10 08:25:49 freenas kernel: ada1: 2861588MB (5860533168 512 byte sectors: 16H 63S/T 16383C)
Jun 10 08:25:49 freenas kernel: ada1: quirks=0x1<4K>
Jun 10 08:25:49 freenas kernel: ada1: Previously was known as ad8
Jun 10 08:25:49 freenas kernel: GEOM_RAID3: Component ada1 (device raid3) broken, skipping.
Jun 10 08:25:49 freenas kernel: GEOM_RAID3: Cannot add disk ada1 to raid3 (error=22).

************************************************************
smartctl status after short and long scan
************************************************************
[root@freenas] ~# smartctl -a /dev/ada1
smartctl 6.1 2013-03-16 r3800 [FreeBSD 9.1-STABLE i386] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda 7200.14 (AF)
Device Model:     ST3000DM001-1CH166
LU WWN Device Id: 5 000c50 04f4db186
Firmware Version: CC24
User Capacity:    3,000,592,982,016 bytes [3.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Rotation Rate:    7200 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Mon Jun  9 16:16:45 2014 PDT

==> WARNING: A firmware update for this drive may be available,
see the following Seagate web pages:
http://knowledge.seagate.com/articles/en_US/FAQ/207931en
http://knowledge.seagate.com/articles/en_US/FAQ/223651en

SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00)    Offline data collection activity
                    was never started.
                    Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0)    The previous self-test routine completed
                    without error or no self-test has ever 
                    been run.
Total time to complete Offline 
data collection:         (  584) seconds.
Offline data collection
capabilities:              (0x73) SMART execute Offline immediate.
                    Auto Offline data collection on/off support.
                    Suspend Offline collection upon new
                    command.
                    No Offline surface scan supported.
                    Self-test supported.
                    Conveyance Self-test supported.
                    Selective Self-test supported.
SMART capabilities:            (0x0003)    Saves SMART data before entering
                    power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:        (0x01)    Error logging supported.
                    General Purpose Logging supported.
Short self-test routine 
recommended polling time:      (   1) minutes.
Extended self-test routine
recommended polling time:      ( 328) minutes.
Conveyance self-test routine
recommended polling time:      (   2) minutes.
SCT capabilities:            (0x3085)    SCT Status supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   117   099   006    Pre-fail  Always       -       145303056
  3 Spin_Up_Time            0x0003   098   093   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       127
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   078   060   030    Pre-fail  Always       -       4364014472
  9 Power_On_Hours          0x0032   090   090   000    Old_age   Always       -       9352
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       105
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0 0 0
189 High_Fly_Writes         0x003a   045   045   000    Old_age   Always       -       55
190 Airflow_Temperature_Cel 0x0022   061   050   045    Old_age   Always       -       39 (Min/Max 38/45)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       120
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       581
194 Temperature_Celsius     0x0022   039   050   000    Old_age   Always       -       39 (0 16 0 0 0)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       9343h+40m+13.681s
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       20665572418
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       63889735084

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%      9349         -
# 2  Short offline       Completed without error       00%      9344         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
I've never heard of FreeNAS ever adding disks back automatically. If that's a UFS thing, that might be why, though. UFS is dead on FreeNAS, and about to be removed permanently. ;)
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Sorry, I don't know. I've never used UFS as UFS is inferior to ZFS for my intended use cases. :/

Maybe someone else can chime in with the answer.
 

solarisguy

Guru
Joined
Apr 4, 2014
Messages
1,125
Could you please post the output of gpart show

I am shooting in the dark, but there is nothing to lose...
 

ThePowerTool

Dabbler
Joined
Aug 31, 2012
Messages
31
The following shows that ada1 is missing, along with the requested output of gpart show:

Code:
[root@freenas] ~# graid3 status
       Name    Status  Components
raid3/raid3  DEGRADED  ada2 (ACTIVE)
                       ada0 (ACTIVE)
[root@freenas] ~# gpart show
=>       63  312581745  ada3  MBR  (149G)
         63    1930257     1  freebsd  [active]  (942M)
    1930320         63        - free -  (31k)
    1930383    1930257     2  freebsd  (942M)
    3860640       3024     3  freebsd  (1.5M)
    3863664      41328     4  freebsd  (20M)
    3904992  308676816        - free -  (147G)

=>      0  1930257  ada3s1  BSD  (942M)
        0       16          - free -  (8.0k)
       16  1930241       1  !0  (942M)
 

solarisguy

Guru
Joined
Apr 4, 2014
Messages
1,125
Only now have I looked up your old post and noticed that in March you had replaced ad0, I guess successfully. And now you are unable to replace ada1? Could you please confirm or deny each of the two statements?

Since your disks ada0, ada1, and ada2 do not show in the output of gpart, can you post the output of
Code:
fdisk -s /dev/ada0
fdisk -s /dev/ada1
fdisk -s /dev/ada2
Also, please reinsert ada1 and, while the synchronization process is running, execute
Code:
graid3 list


I would like to verify that the sizes of the slices/partitions are identical. The current FreeNAS versions could have done it differently, but only starting with 9.2.0.

P.S. gpart show only listed partitions of your boot device
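
In case it helps with catching the moment the rebuild stops, here is a minimal sketch in plain /bin/sh (not the csh root shell). The name sync_state is a hypothetical helper, not part of graid3. It reads graid3 status output on stdin and prints the state of one component, or "missing" when that component is not listed. It assumes the column layout of the status output posted earlier in this thread.

```shell
#!/bin/sh
# sync_state DISK: read `graid3 status` output on stdin and print the text
# inside the parentheses for DISK, e.g. "ACTIVE" or "SYNCHRONIZING, 71%".
# Prints "missing" when DISK is not listed as a component.
# Hypothetical helper; assumes the status layout posted in this thread.
sync_state() {
    awk -v disk="$1" '
        {
            for (i = 1; i < NF; i++) {
                if ($i == disk && $(i + 1) ~ /^\(/) {
                    # Rebuild the parenthesised state from the remaining fields.
                    state = $(i + 1)
                    for (j = i + 2; j <= NF; j++) state = state " " $j
                    gsub(/[()]/, "", state)
                    print state
                    found = 1
                    exit
                }
            }
        }
        END { if (!found) print "missing" }
    '
}
```

On the box itself this would be used as graid3 status | sync_state ada1, for example once a minute in a loop, so that the switch from SYNCHRONIZING to missing can be matched against the timestamps in /var/log/messages.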
 

ThePowerTool

Dabbler
Joined
Aug 31, 2012
Messages
31
OK, this is scary:

Code:
[root@freenas] ~# fdisk -s /dev/ada0
fdisk: invalid fdisk partition table found
fdisk: read_s0: No such file or directory
[root@freenas] ~# fdisk -s /dev/ada1
fdisk: unable to get correct path for /dev/ada1: No such file or directory
[root@freenas] ~# fdisk -s /dev/ada2
fdisk: invalid fdisk partition table found
fdisk: read_s0: No such file or directory
[root@freenas] ~# graid3 list
Geom name: raid3
State: DEGRADED
Components: 3
Flags: NONE
GenID: 9
SyncID: 8
ID: 433511840
Zone64kFailed: 94671
Zone64kRequested: 227926415
Zone16kFailed: 3713
Zone16kRequested: 3156220
Zone4kFailed: 0
Zone4kRequested: 178122
Providers:
1. Name: raid3/raid3
   Mediasize: 6001185963008 (5.5T)
   Sectorsize: 1024


Above is the fdisk -s for each, as requested. Since it came back bad I went ahead and immediately did a graid3 list. Below is the next part you requested:

Code:
[root@freenas] ~# graid3 status
       Name    Status  Components
raid3/raid3  DEGRADED  ada2 (ACTIVE)
                       ada0 (ACTIVE)
[root@freenas] ~# graid3 insert raid3 ada1
graid3: Invalid provider.
[root@freenas] ~# 


What the heck is going on??? That's new!

FYI & FWIW I'm desperately working on getting my end-of-week incremental backups off of it which appears to be running successfully atm.
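
On the question of sizes, the numbers already posted in this thread can be cross-checked with a little arithmetic. The sketch below is plain /bin/sh and needs a shell with 64-bit arithmetic. It assumes that graid3 reserves the last 512-byte sector of each component for its metadata and that one of the three components holds parity. Those two assumptions are mine, but the result matches the Mediasize reported by graid3 list, which would be consistent with the two remaining components being the same size as ada1.

```shell
#!/bin/sh
# Numbers copied from this thread; the "one metadata sector per component"
# rule is an assumption that happens to fit them.
sectors=5860533168      # ada1 size in 512-byte sectors (kernel log)
sector_size=512
components=3            # "Components: 3" from graid3 list

disk_bytes=$((sectors * sector_size))
usable_bytes=$((disk_bytes - sector_size))          # minus the last sector
array_bytes=$((usable_bytes * (components - 1)))    # one component holds parity

echo "disk_bytes=$disk_bytes"
echo "array_bytes=$array_bytes"
```

The first value equals the User Capacity that smartctl reported for ada1 (3,000,592,982,016 bytes), and the second equals the raid3/raid3 Mediasize (6001185963008).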
 

ThePowerTool

Dabbler
Joined
Aug 31, 2012
Messages
31
FYI: once the backup is complete, reformatting and starting over is an option if you need me to. Regarding my previous post referenced above: 1) yes, you are correct, I thought ada0 inserted successfully, and the array appeared to run well for some time; and 2) I don't think this system is quite up to running ZFS, given its CPU and memory (but if you think differently I'm willing to give it a try).
 

solarisguy

Guru
Joined
Apr 4, 2014
Messages
1,125
Your ada1 disk is not seen by the system. Some likely causes:
  • total disk failure (If you touch it, do you feel it spinning?)
  • bad or loose cable (Check cabling, both power and SATA. Do you use locking SATA cables?)
  • SATA port failure (Do you have an unused port you can use?)
P.S. I am not really a hardware guy...
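
The kernel messages posted earlier in the thread seem to point the same way. On FreeBSD, errno 6 is ENXIO (Device not configured), which fits a disk dropping off the bus in the middle of a write, and errno 22 is EINVAL (Invalid argument), logged when graid3 skipped the re-attached disk as broken. A minimal /bin/sh sketch for pulling those codes out of a log follows. The name geom_errors is a hypothetical helper, and only the few codes relevant here are named.

```shell
#!/bin/sh
# geom_errors: read kernel log lines on stdin and print each distinct
# "(error=N)" code found on GEOM_RAID3 lines, with its FreeBSD errno name.
# Hypothetical helper; only the codes relevant to this thread are named.
geom_errors() {
    sed -n 's/.*GEOM_RAID3:.*(error=\([0-9][0-9]*\)).*/\1/p' | sort -n -u |
    while read -r code; do
        case "$code" in
            5)  echo "$code EIO (Input/output error)" ;;
            6)  echo "$code ENXIO (Device not configured)" ;;
            22) echo "$code EINVAL (Invalid argument)" ;;
            *)  echo "$code (see errno(2))" ;;
        esac
    done
}
```

It would be run as geom_errors < /var/log/messages. For the June 10 excerpt above it should list codes 6 and 22.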
 

ThePowerTool

Dabbler
Joined
Aug 31, 2012
Messages
31
So if you look at my earlier posts in this thread, ada1 was passing the smartctl short and long tests.

Now I'm getting this:

Code:
[root@freenas] ~# smartctl -a /dev/ada1
smartctl 6.1 2013-03-16 r3800 [FreeBSD 9.1-STABLE i386] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

/dev/ada1: Unable to detect device type
Please specify device type with the -d option.

Use smartctl -h to get a usage summary


You are correct. I can handle it from here. I have a spare drive which I'll pop in.

I just don't understand why it would pass the smartctl tests like that and then suddenly fail.

Thank you very much for walking through this with me!!!
 

solarisguy

Guru
Joined
Apr 4, 2014
Messages
1,125
A dead disk is only one possibility. A failing SATA port (or even power) has to be considered. This is the second disk at the same port in a very short time.
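
One way to test the port theory, if older logs are still around, is to count the "lost device" events per controller channel. The /bin/sh sketch below does that. The name lost_by_port is a hypothetical helper, and it assumes the CAM message format seen in this thread, where the second field of the path (siisch2 in the June 10 log) identifies the channel.

```shell
#!/bin/sh
# lost_by_port: read kernel log lines on stdin and count "lost device"
# events per controller channel (the second field of the CAM path, e.g.
# "siisch2" in "(ada1:siisch2:0:0:0): lost device").
# Hypothetical helper; assumes the message format shown in this thread.
lost_by_port() {
    sed -n 's/.*(\([a-z0-9]*\):\([a-z0-9]*\):[0-9:]*): lost device.*/\2/p' |
    sort | uniq -c | awk '{ print $2, $1 }'
}
```

Run as lost_by_port < /var/log/messages, it prints one line per channel followed by the number of drops. If the drops from different disks all land on the same channel, the port, cable or power feed becomes the more likely suspect than the drives.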
 