Lost ZFS pool after upgrading to 9.1.1


Wind_freak

Dabbler
Joined
Nov 28, 2011
Messages
20
I decided to upgrade to 9.1.1 today, and when I did, my ZFS pool didn't come back up. I believe it gave an error about an unrecognized pool. So I deleted the pool so I could just auto-import it again. Yes, I made sure to leave the checkboxes unchecked.

It doesn't auto-import, and when I reboot I see errors about da1 being offline, along with Seagate references to a KB article (the site was down, of course).

So I know I will need to buy a new drive, but shouldn't it still be able to auto-import the pool and just run in a degraded state? And since I can't see the volume/pool, how will I be able to replace the drive and run the commands to resilver if the system doesn't even see the pool?
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Post the following outputs my fellow Arizonan!

gpart status
zpool import
zpool status
smartctl -a /dev/da1

Please put it in CODE or attach it as a file. The formatting of the text is important.
 

Wind_freak

Dabbler
Joined
Nov 28, 2011
Messages
20
Code:
    FreeNAS (c) 2009-2013, The FreeNAS Development Team
    All rights reserved.
    FreeNAS is released under the modified BSD license.
 
    For more information, documentation, help or support, go here:
    http://freenas.org
Welcome to FreeNAS
[root@freenas] ~# gpart status
  Name  Status  Components
da0s1      OK  da0
da0s2      OK  da0
da0s3      OK  da0
da0s4      OK  da0
da0s1a      OK  da0s1
da0s2a      OK  da0s2
[root@freenas] ~# zpool import
[root@freenas] ~# zpool status
no pools available
[root@freenas] ~# smartctl -a /dev/da1
smartctl 6.1 2013-03-16 r3800 [FreeBSD 9.1-STABLE amd64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
 
=== START OF INFORMATION SECTION ===
Model Family:    Seagate Barracuda LP
Device Model:    ST32000542AS
Serial Number:    5XW2HGK3
LU WWN Device Id: 5 000c50 02f6a4eab
Firmware Version: CC34
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    5900 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:  ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 2.6, 3.0 Gb/s
Local Time is:    Wed Aug 28 21:17:28 2013 MST
 
==> WARNING: A firmware update for this drive may be available,
see the following Seagate web pages:
http://knowledge.seagate.com/articles/en_US/FAQ/207931en
http://knowledge.seagate.com/articles/en_US/FAQ/213915en
 
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
 
=== START OF READ SMART DATA SECTION ===
SMART STATUS RETURN: incomplete response, ATA output registers missing
SMART overall-health self-assessment test result: PASSED
Warning: This result is based on an Attribute check.
See vendor-specific Attribute list for marginal Attributes.
 
General SMART Values:
Offline data collection status:  (0x00)    Offline data collection activity
                    was never started.
                    Auto Offline Data Collection: Disabled.
Self-test execution status:      (  0)    The previous self-test routine completed
                    without error or no self-test has ever
                    been run.
Total time to complete Offline
data collection:        (  663) seconds.
Offline data collection
capabilities:              (0x73) SMART execute Offline immediate.
                    Auto Offline data collection on/off support.
                    Suspend Offline collection upon new
                    command.
                    No Offline surface scan supported.
                    Self-test supported.
                    Conveyance Self-test supported.
                    Selective Self-test supported.
SMART capabilities:            (0x0003)    Saves SMART data before entering
                    power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:        (0x01)    Error logging supported.
                    General Purpose Logging supported.
Short self-test routine
recommended polling time:      (  1) minutes.
Extended self-test routine
recommended polling time:      ( 454) minutes.
Conveyance self-test routine
recommended polling time:      (  2) minutes.
SCT capabilities:            (0x103f)    SCT Status supported.
                    SCT Error Recovery Control supported.
                    SCT Feature Control supported.
                    SCT Data Table supported.
 
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG    VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate    0x000f  088  083  006    Pre-fail  Always      -      6409621
  3 Spin_Up_Time            0x0003  100  100  000    Pre-fail  Always      -      0
  4 Start_Stop_Count        0x0032  100  100  020    Old_age  Always      -      43
  5 Reallocated_Sector_Ct  0x0033  100  100  036    Pre-fail  Always      -      0
  7 Seek_Error_Rate        0x000f  081  060  030    Pre-fail  Always      -      158457215
  9 Power_On_Hours          0x0032  076  076  000    Old_age  Always      -      21231
10 Spin_Retry_Count        0x0013  100  100  097    Pre-fail  Always      -      0
12 Power_Cycle_Count      0x0032  100  100  020    Old_age  Always      -      43
183 Runtime_Bad_Block      0x0032  100  100  000    Old_age  Always      -      0
184 End-to-End_Error        0x0032  100  100  099    Old_age  Always      -      0
187 Reported_Uncorrect      0x0032  001  001  000    Old_age  Always      -      121
188 Command_Timeout        0x0032  100  099  000    Old_age  Always      -      1
189 High_Fly_Writes        0x003a  031  031  000    Old_age  Always      -      69
190 Airflow_Temperature_Cel 0x0022  063  044  045    Old_age  Always  In_the_past 37 (0 56 41 29 0)
194 Temperature_Celsius    0x0022  037  056  000    Old_age  Always      -      37 (0 20 0 0 0)
195 Hardware_ECC_Recovered  0x001a  044  028  000    Old_age  Always      -      6409621
197 Current_Pending_Sector  0x0012  100  100  000    Old_age  Always      -      22
198 Offline_Uncorrectable  0x0010  100  100  000    Old_age  Offline      -      22
199 UDMA_CRC_Error_Count    0x003e  200  200  000    Old_age  Always      -      0
240 Head_Flying_Hours      0x0000  100  253  000    Old_age  Offline      -      169062797693539
241 Total_LBAs_Written      0x0000  100  253  000    Old_age  Offline      -      3046485885
242 Total_LBAs_Read        0x0000  100  253  000    Old_age  Offline      -      3868897209
 
SMART Error Log Version: 1
ATA Error Count: 121 (device log contains only the most recent five errors)
    CR = Command Register [HEX]
    FR = Features Register [HEX]
    SC = Sector Count Register [HEX]
    SN = Sector Number Register [HEX]
    CL = Cylinder Low Register [HEX]
    CH = Cylinder High Register [HEX]
    DH = Device/Head Register [HEX]
    DC = Device Command Register [HEX]
    ER = Error register [HEX]
    ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.
 
Error 121 occurred at disk power-on lifetime: 21229 hours (884 days + 13 hours)
  When the command that caused the error occurred, the device was active or idle.
 
  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455
 
  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC  Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 40 ff ff ff 4f 00  1d+07:37:01.058  READ FPDMA QUEUED
  27 00 00 00 00 00 e0 00  1d+07:37:01.023  READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]
  ec 00 00 00 00 00 a0 00  1d+07:37:01.022  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 00  1d+07:37:01.022  SET FEATURES [Set transfer mode]
  27 00 00 00 00 00 e0 00  1d+07:37:01.021  READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]
 
Error 120 occurred at disk power-on lifetime: 21229 hours (884 days + 13 hours)
  When the command that caused the error occurred, the device was active or idle.
 
  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455
 
  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC  Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 40 ff ff ff 4f 00  1d+07:36:57.229  READ FPDMA QUEUED
  27 00 00 00 00 00 e0 00  1d+07:36:57.198  READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]
  ec 00 00 00 00 00 a0 00  1d+07:36:57.197  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 00  1d+07:36:57.197  SET FEATURES [Set transfer mode]
  27 00 00 00 00 00 e0 00  1d+07:36:57.196  READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]
 
Error 119 occurred at disk power-on lifetime: 21229 hours (884 days + 13 hours)
  When the command that caused the error occurred, the device was active or idle.
 
  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455
 
  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC  Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 40 ff ff ff 4f 00  1d+07:36:53.357  READ FPDMA QUEUED
  27 00 00 00 00 00 e0 00  1d+07:36:53.322  READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]
  ec 00 00 00 00 00 a0 00  1d+07:36:53.321  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 00  1d+07:36:53.321  SET FEATURES [Set transfer mode]
  27 00 00 00 00 00 e0 00  1d+07:36:53.321  READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]
 
Error 118 occurred at disk power-on lifetime: 21229 hours (884 days + 13 hours)
  When the command that caused the error occurred, the device was active or idle.
 
  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455
 
  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC  Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 40 ff ff ff 4f 00  1d+07:36:49.502  READ FPDMA QUEUED
  27 00 00 00 00 00 e0 00  1d+07:36:49.477  READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]
  ec 00 00 00 00 00 a0 00  1d+07:36:49.476  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 00  1d+07:36:49.475  SET FEATURES [Set transfer mode]
  27 00 00 00 00 00 e0 00  1d+07:36:49.475  READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]
 
Error 117 occurred at disk power-on lifetime: 21229 hours (884 days + 13 hours)
  When the command that caused the error occurred, the device was active or idle.
 
  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455
 
  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC  Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 40 ff ff ff 4f 00  1d+07:36:45.693  READ FPDMA QUEUED
  60 00 40 ff ff ff 4f 00  1d+07:36:45.691  READ FPDMA QUEUED
  ea 00 00 00 00 00 a0 00  1d+07:36:45.682  FLUSH CACHE EXT
  61 00 02 ff ff ff 4f 00  1d+07:36:45.681  WRITE FPDMA QUEUED
  60 00 2a ff ff ff 4f 00  1d+07:36:45.671  READ FPDMA QUEUED
 
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline      Completed without error      00%    21222        -
# 2  Short offline      Completed without error      00%    21210        -
# 3  Short offline      Completed without error      00%    21198        -
# 4  Short offline      Completed without error      00%    21186        -
# 5  Short offline      Completed without error      00%    21174        -
# 6  Short offline      Completed without error      00%    21161        -
# 7  Short offline      Completed without error      00%    21149        -
# 8  Short offline      Completed without error      00%    21137        -
# 9  Short offline      Completed without error      00%    21125        -
#10  Short offline      Completed without error      00%    21113        -
#11  Short offline      Completed without error      00%    21101        -
#12  Short offline      Completed without error      00%    21089        -
#13  Short offline      Completed without error      00%    21077        -
#14  Short offline      Completed without error      00%    21065        -
#15  Short offline      Completed without error      00%    21053        -
#16  Short offline      Completed without error      00%    21041        -
#17  Short offline      Completed without error      00%    21029        -
#18  Short offline      Completed without error      00%    21017        -
#19  Short offline      Completed without error      00%    21005        -
#20  Short offline      Completed without error      00%    20993        -
#21  Short offline      Completed without error      00%    20981        -
 
SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
 
[root@freenas] ~#
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Yeah, you have serious problems. Your partition table is gone, which is why zpool import didn't list any importable pools. I'm not sure how you're going to recover your partition table. If you could manage to recreate the exact table you had, you could possibly get your data back.
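
You can read this straight off the gpart status output above: every line belongs to da0 (presumably the boot device), and none of the data disks appear. A minimal Python sketch, purely illustrative and not a FreeNAS tool, that lists which disks show up with partitions. The expected member names da1 through da5 are an assumption about how the pool disks are named on this box.

```python
import re

# gpart status output copied from the post above.
GPART_STATUS = """\
  Name  Status  Components
da0s1      OK  da0
da0s2      OK  da0
da0s3      OK  da0
da0s4      OK  da0
da0s1a      OK  da0s1
da0s2a      OK  da0s2
"""

# ASSUMPTION: the pool members are da1..da5; adjust to the real device names.
EXPECTED_POOL_DISKS = ["da1", "da2", "da3", "da4", "da5"]


def disks_with_partitions(text):
    """Return the sorted base disk names that gpart reports partitions on."""
    disks = set()
    for line in text.splitlines()[1:]:          # skip the header row
        fields = line.split()
        if len(fields) != 3:
            continue
        # The component may itself be a slice (da0s1); reduce it to the disk name.
        match = re.match(r"[a-z]+\d+", fields[2])
        if match:
            disks.add(match.group(0))
    return sorted(disks)


found = disks_with_partitions(GPART_STATUS)
missing = [d for d in EXPECTED_POOL_DISKS if d not in found]
print("disks with a partition table:", found)
print("expected pool disks without one:", missing)
```

With the output posted here, only da0 is found and all five assumed pool disks come back as missing.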

Also, your hard drive is bad. It's not like "ZOMG this thing is trash", but it's definitely on its way out. I'd say you have a very good chance of recovering your data if you can restore your partition table.
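
The attributes behind that call can be picked out mechanically. Here is a rough Python sketch, assuming the usual smartctl -a attribute table layout. The watch list is my own choice of attributes whose non-zero raw value tends to mean media trouble, not an official rule, and the sample rows are copied from the Seagate output above.

```python
import re

# ASSUMPTION: illustrative watch list, not a vendor threshold table.
WATCHED = {
    "Reallocated_Sector_Ct",
    "Reported_Uncorrect",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
}

# A few attribute rows copied from the smartctl output posted above.
SAMPLE = """\
  5 Reallocated_Sector_Ct  0x0033  100  100  036    Pre-fail  Always      -      0
187 Reported_Uncorrect      0x0032  001  001  000    Old_age  Always      -      121
197 Current_Pending_Sector  0x0012  100  100  000    Old_age  Always      -      22
198 Offline_Uncorrectable  0x0010  100  100  000    Old_age  Offline      -      22
199 UDMA_CRC_Error_Count    0x003e  200  200  000    Old_age  Always      -      0
"""

# ID, name, flag, value, worst, thresh, type, updated, when_failed, raw value
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-fA-F]{4}\s+\d+\s+\d+\s+\d+"
    r"\s+\S+\s+\S+\s+\S+\s+(\d+)"
)


def suspicious_attributes(text):
    """Return {attribute_name: raw_value} for watched attributes with raw > 0."""
    found = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match and match.group(2) in WATCHED and int(match.group(3)) > 0:
            found[match.group(2)] = int(match.group(3))
    return found


for name, raw in suspicious_attributes(SAMPLE).items():
    print(f"{name}: raw value {raw}")
```

On the rows above it reports Reported_Uncorrect (121), Current_Pending_Sector (22) and Offline_Uncorrectable (22), which are the numbers that make this drive look like it is failing.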
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Normally, I'd say you did check that box that says "mark the disks as new" as that is supposed to destroy your partition table, but you are saying you didn't check it.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
It depends on how important this data is. If it's worth the cost of 5 more drives, I'd definitely go that route so you have a backup copy. If you don't, and you make a mistake, things can become unrecoverable.

There is a forum post here somewhere that discusses how to recover your partition table in FreeNAS if you did click "mark disks as new". I just have to see if I can find it tomorrow. I'm dead tired and it's just a few minutes until midnight.

I know this is little consolation, but I'm betting if we can reproduce your partition table then your data will be available again. We just have to be slow and steady with it and not start doing things out of haste because we are in a panic.

I'm wondering if there is a bug with 9.1.1 where if you don't check the box it will trash your partition table. Might be something to test tomorrow in a VM just to verify it.
 

Wind_freak

Dabbler
Joined
Nov 28, 2011
Messages
20
Well I think the table was trashed before I deleted the pool.

Do they have to be the same drives, or can I take this chance to upgrade to NAS-designed drives?
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
You could upgrade to NAS drives if you wanted, but definitely make sure they are the same size or bigger. If you are 1 sector too small, that could spell problems. :P
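
To put a number on "1 sector too small", here is a tiny Python sketch that compares capacities in whole sectors. The 2,000,398,934,016-byte figure is the "User Capacity" smartctl reports for the drives in this thread. The two candidate drives are made-up values for illustration.

```python
SECTOR_BYTES = 512  # logical sector size reported by smartctl in this thread


def sectors(capacity_bytes):
    """Number of whole sectors in a capacity given in bytes."""
    return capacity_bytes // SECTOR_BYTES


def can_replace(old_bytes, new_bytes):
    """True when the new disk has at least as many sectors as the old one."""
    return sectors(new_bytes) >= sectors(old_bytes)


OLD = 2_000_398_934_016                     # existing 2 TB drives
HYPOTHETICAL_SHORT = OLD - SECTOR_BYTES     # made-up drive, one sector short
HYPOTHETICAL_LARGER = 3_000_000_000_000     # made-up larger drive

print(can_replace(OLD, OLD))                  # same size
print(can_replace(OLD, HYPOTHETICAL_SHORT))   # one sector too small
print(can_replace(OLD, HYPOTHETICAL_LARGER))  # bigger
```

Compare the byte counts from smartctl, not the "2 TB" on the label, since two drives sold at the same nominal size can differ by a few sectors.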
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Damn. I can't seem to find the post where a recovery of the GPT table was done. I'll try searching some more. I'm sure someone here knows the precise commands to recreate the GPT table. Pretty much a 2GB swap is created, then the rest is ZFS. You just have to reproduce the sizes exactly.
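
To make "reproduce the sizes exactly" concrete, here is a small Python sketch of the arithmetic. It rests on assumptions that should be checked against a known-good FreeNAS disk with gpart show before anything is written: that the swap partition is exactly 2 GiB, and that the first partition starts at sector 128. The gpart lines are only printed, never executed, and da1 is just an example device name.

```python
SECTOR_BYTES = 512                # logical sector size reported by smartctl
DISK_BYTES = 2_000_398_934_016    # "User Capacity" reported by smartctl
SWAP_BYTES = 2 * 1024**3          # ASSUMPTION: swap is exactly 2 GiB
FIRST_SECTOR = 128                # ASSUMPTION: first partition starts here


def layout(disk_bytes=DISK_BYTES):
    """Work out the sector positions of the swap and ZFS partitions."""
    swap_sectors = SWAP_BYTES // SECTOR_BYTES
    return {
        "total_sectors": disk_bytes // SECTOR_BYTES,
        "swap_start": FIRST_SECTOR,
        "swap_sectors": swap_sectors,
        "zfs_start": FIRST_SECTOR + swap_sectors,
    }


def gpart_commands(disk):
    """Build (but do not run) the gpart commands for the assumed layout."""
    info = layout()
    return [
        f"gpart create -s gpt {disk}",
        f"gpart add -b {info['swap_start']} -t freebsd-swap "
        f"-s {info['swap_sectors']} {disk}",
        # No -s here: gpart gives the ZFS partition the remaining space.
        f"gpart add -t freebsd-zfs {disk}",
    ]


print(layout())
for command in gpart_commands("da1"):
    print(command)
```

Under those assumptions the swap partition covers 4,194,304 sectors starting at sector 128, so the ZFS partition would begin at sector 4,194,432 on a disk of 3,907,029,168 sectors. If the real layout used a different offset or swap size, the ZFS labels will not line up and zpool import will still find nothing, so try it on a copy of a disk first.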
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
I doubt it. If the table had been trashed the system wouldn't have been able to mount the pool nor would it have let you "detach" the pool. It would have thrown an error and said it couldn't export and the pool would still be listed in the GUI.
 

Wind_freak

Dabbler
Joined
Nov 28, 2011
Messages
20
Hmmm
I wish I had written down the error. Essentially it was that it couldn't find the pool. So now I'm wondering about the RAID aspect. Shouldn't I be able to just remove the bad drive and import the volume from the remaining drives? Should I be running smartctl -a on each of da1 through da5?

I suppose I'm simply dreaming if I think the firmware update would automagically fix this as well.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
You are dreaming if you think a firmware update will fix this.

It sure appears that you may have trashed your pool by detaching it (and checking the "mark the disks as new" box). It's totally recoverable, but I'm not 100% sure how to do it. I was really hoping someone would pop in with the answer. I'm really busy until Monday of next week, so if you don't get this straightened out by then, make sure to PM me with this thread so I can make some time to help.

Of course, if your data is important to you then I'd definitely wait on doing anything until you duplicate your disks.
 

Wind_freak

Dabbler
Joined
Nov 28, 2011
Messages
20
Cyberjock,
I was looking through some other threads trying to find what you were talking about, and it looks like you are very active and helping everyone out. I'm sure others think the same thing: you are a great resource. The digital angel.

Thank you!
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
I appreciate the sentiment, but make a RAIDZ1 of 10 disks and you won't be calling me angel... maybe hell's angel. LOL
 

Wind_freak

Dabbler
Joined
Nov 28, 2011
Messages
20
OK, I powered off the bad drive (actually removed it in my VM) and ran smartctl on the rest of the drives.

I guess I'll scour around for the GPT commands some more.

Code:
[root@freenas] ~# smartctl -a /dev/da1
smartctl 6.1 2013-03-16 r3800 [FreeBSD 9.1-STABLE amd64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
 
=== START OF INFORMATION SECTION ===
Model Family:    Hitachi Deskstar 7K2000
Device Model:    Hitachi HDS722020ALA330
Serial Number:    JK1130YAGLAX5T
LU WWN Device Id: 5 000cca 221c857a1
Firmware Version: JKAOA20N
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    7200 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:  ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 2.6, 3.0 Gb/s
Local Time is:    Thu Aug 29 18:20:56 2013 MST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
 
=== START OF READ SMART DATA SECTION ===
SMART STATUS RETURN: incomplete response, ATA output registers missing
SMART overall-health self-assessment test result: PASSED
Warning: This result is based on an Attribute check.
 
General SMART Values:
Offline data collection status:  (0x85)    Offline data collection activity
                    was aborted by an interrupting command from host.
                    Auto Offline Data Collection: Enabled.
Self-test execution status:      (  0)    The previous self-test routine completed
                    without error or no self-test has ever
                    been run.
Total time to complete Offline
data collection:        (22036) seconds.
Offline data collection
capabilities:              (0x5b) SMART execute Offline immediate.
                    Auto Offline data collection on/off support.
                    Suspend Offline collection upon new
                    command.
                    Offline surface scan supported.
                    Self-test supported.
                    No Conveyance Self-test supported.
                    Selective Self-test supported.
SMART capabilities:            (0x0003)    Saves SMART data before entering
                    power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:        (0x01)    Error logging supported.
                    General Purpose Logging supported.
Short self-test routine
recommended polling time:      (  1) minutes.
Extended self-test routine
recommended polling time:      ( 367) minutes.
SCT capabilities:            (0x003d)    SCT Status supported.
                    SCT Error Recovery Control supported.
                    SCT Feature Control supported.
                    SCT Data Table supported.
 
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG    VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate    0x000b  086  086  016    Pre-fail  Always      -      154
  2 Throughput_Performance  0x0005  133  133  054    Pre-fail  Offline      -      101
  3 Spin_Up_Time            0x0007  156  156  024    Pre-fail  Always      -      505 (Average 422)
  4 Start_Stop_Count        0x0012  100  100  000    Old_age  Always      -      158
  5 Reallocated_Sector_Ct  0x0033  100  100  005    Pre-fail  Always      -      3
  7 Seek_Error_Rate        0x000b  100  100  067    Pre-fail  Always      -      0
  8 Seek_Time_Performance  0x0005  112  112  020    Pre-fail  Offline      -      39
  9 Power_On_Hours          0x0012  096  096  000    Old_age  Always      -      32952
10 Spin_Retry_Count        0x0013  100  100  060    Pre-fail  Always      -      0
12 Power_Cycle_Count      0x0032  100  100  000    Old_age  Always      -      145
192 Power-Off_Retract_Count 0x0032  100  100  000    Old_age  Always      -      1001
193 Load_Cycle_Count        0x0012  100  100  000    Old_age  Always      -      1001
194 Temperature_Celsius    0x0002  146  146  000    Old_age  Always      -      41 (Min/Max 21/59)
196 Reallocated_Event_Count 0x0032  100  100  000    Old_age  Always      -      3
197 Current_Pending_Sector  0x0022  100  100  000    Old_age  Always      -      0
198 Offline_Uncorrectable  0x0008  100  100  000    Old_age  Offline      -      0
199 UDMA_CRC_Error_Count    0x000a  200  200  000    Old_age  Always      -      2
 
SMART Error Log Version: 1
ATA Error Count: 2
    CR = Command Register [HEX]
    FR = Features Register [HEX]
    SC = Sector Count Register [HEX]
    SN = Sector Number Register [HEX]
    CL = Cylinder Low Register [HEX]
    CH = Cylinder High Register [HEX]
    DH = Device/Head Register [HEX]
    DC = Device Command Register [HEX]
    ER = Error register [HEX]
    ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.
 
Error 2 occurred at disk power-on lifetime: 5217 hours (217 days + 9 hours)
  When the command that caused the error occurred, the device was active or idle.
 
  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 56 aa 33 4d 00  Error: ICRC, ABRT at LBA = 0x004d33aa = 5059498
 
  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC  Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  61 00 98 00 33 4d 40 00  34d+10:46:03.585  WRITE FPDMA QUEUED
  60 00 90 00 32 4d 40 00  34d+10:46:03.457  READ FPDMA QUEUED
  60 00 88 00 31 4d 40 00  34d+10:46:03.447  READ FPDMA QUEUED
  61 00 80 00 30 4d 40 00  34d+10:46:03.443  WRITE FPDMA QUEUED
  60 00 78 00 2f 4d 40 00  34d+10:46:03.425  READ FPDMA QUEUED
 
Error 1 occurred at disk power-on lifetime: 2646 hours (110 days + 6 hours)
  When the command that caused the error occurred, the device was active or idle.
 
  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 2b 15 94 9b 00  Error: ICRC, ABRT 43 sectors at LBA = 0x009b9415 = 10195989
 
  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC  Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  35 00 40 00 94 9b e3 00  26d+19:37:29.751  WRITE DMA EXT
  35 00 40 c0 93 9b e3 00  26d+19:37:29.749  WRITE DMA EXT
  35 00 40 80 93 9b e3 00  26d+19:37:29.747  WRITE DMA EXT
  35 00 40 40 93 9b e3 00  26d+19:37:29.745  WRITE DMA EXT
  35 00 40 00 93 9b e3 00  26d+19:37:29.743  WRITE DMA EXT
 
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline      Completed without error      00%    32946        -
# 2  Short offline      Completed without error      00%    32934        -
# 3  Short offline      Completed without error      00%    32922        -
# 4  Short offline      Completed without error      00%    32910        -
# 5  Short offline      Completed without error      00%    32898        -
# 6  Short offline      Completed without error      00%    32886        -
# 7  Short offline      Completed without error      00%    32874        -
# 8  Short offline      Completed without error      00%    32862        -
# 9  Short offline      Completed without error      00%    32850        -
#10  Short offline      Completed without error      00%    32838        -
#11  Short offline      Completed without error      00%    32826        -
#12  Short offline      Completed without error      00%    32814        -
#13  Short offline      Completed without error      00%    32802        -
#14  Short offline      Completed without error      00%    32790        -
#15  Short offline      Completed without error      00%    32778        -
#16  Short offline      Completed without error      00%    32766        -
#17  Short offline      Completed without error      00%    32754        -
#18  Short offline      Completed without error      00%    32742        -
#19  Short offline      Completed without error      00%    32730        -
#20  Short offline      Completed without error      00%    32718        -
#21  Short offline      Completed without error      00%    32706        -
 
SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
 
[root@freenas] ~# smartctl -a /dev/da2
smartctl 6.1 2013-03-16 r3800 [FreeBSD 9.1-STABLE amd64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
 
=== START OF INFORMATION SECTION ===
Model Family:    Hitachi Deskstar 7K2000
Device Model:    Hitachi HDS722020ALA330
Serial Number:    JK1130YAH4KWVT
LU WWN Device Id: 5 000cca 221d02c2d
Firmware Version: JKAOA28A
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    7200 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:  ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 2.6, 3.0 Gb/s
Local Time is:    Thu Aug 29 18:22:18 2013 MST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
 
=== START OF READ SMART DATA SECTION ===
SMART STATUS RETURN: incomplete response, ATA output registers missing
SMART overall-health self-assessment test result: PASSED
Warning: This result is based on an Attribute check.
 
General SMART Values:
Offline data collection status:  (0x85)    Offline data collection activity
                    was aborted by an interrupting command from host.
                    Auto Offline Data Collection: Enabled.
Self-test execution status:      (  0)    The previous self-test routine completed
                    without error or no self-test has ever
                    been run.
Total time to complete Offline
data collection:        (22918) seconds.
Offline data collection
capabilities:              (0x5b) SMART execute Offline immediate.
                    Auto Offline data collection on/off support.
                    Suspend Offline collection upon new
                    command.
                    Offline surface scan supported.
                    Self-test supported.
                    No Conveyance Self-test supported.
                    Selective Self-test supported.
SMART capabilities:            (0x0003)    Saves SMART data before entering
                    power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:        (0x01)    Error logging supported.
                    General Purpose Logging supported.
Short self-test routine
recommended polling time:      (  1) minutes.
Extended self-test routine
recommended polling time:      ( 382) minutes.
SCT capabilities:            (0x003d)    SCT Status supported.
                    SCT Error Recovery Control supported.
                    SCT Feature Control supported.
                    SCT Data Table supported.
 
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG    VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate    0x000b  093  093  016    Pre-fail  Always      -      131107
  2 Throughput_Performance  0x0005  132  132  054    Pre-fail  Offline      -      103
  3 Spin_Up_Time            0x0007  154  154  024    Pre-fail  Always      -      407 (Average 532)
  4 Start_Stop_Count        0x0012  100  100  000    Old_age  Always      -      167
  5 Reallocated_Sector_Ct  0x0033  100  100  005    Pre-fail  Always      -      2
  7 Seek_Error_Rate        0x000b  100  100  067    Pre-fail  Always      -      0
  8 Seek_Time_Performance  0x0005  110  110  020    Pre-fail  Offline      -      40
  9 Power_On_Hours          0x0012  096  096  000    Old_age  Always      -      30897
10 Spin_Retry_Count        0x0013  100  100  060    Pre-fail  Always      -      0
12 Power_Cycle_Count      0x0032  100  100  000    Old_age  Always      -      137
192 Power-Off_Retract_Count 0x0032  100  100  000    Old_age  Always      -      1003
193 Load_Cycle_Count        0x0012  100  100  000    Old_age  Always      -      1003
194 Temperature_Celsius    0x0002  150  150  000    Old_age  Always      -      40 (Min/Max 21/60)
196 Reallocated_Event_Count 0x0032  100  100  000    Old_age  Always      -      2
197 Current_Pending_Sector  0x0022  100  100  000    Old_age  Always      -      0
198 Offline_Uncorrectable  0x0008  100  100  000    Old_age  Offline      -      0
199 UDMA_CRC_Error_Count    0x000a  200  200  000    Old_age  Always      -      0
 
SMART Error Log Version: 0
No Errors Logged
 
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline      Completed without error      00%    30891        -
# 2  Short offline      Completed without error      00%    30879        -
# 3  Short offline      Completed without error      00%    30867        -
# 4  Short offline      Completed without error      00%    30855        -
# 5  Short offline      Completed without error      00%    30843        -
# 6  Short offline      Completed without error      00%    30831        -
# 7  Short offline      Completed without error      00%    30819        -
# 8  Short offline      Completed without error      00%    30807        -
# 9  Short offline      Completed without error      00%    30795        -
#10  Short offline      Completed without error      00%    30783        -
#11  Short offline      Completed without error      00%    30771        -
#12  Short offline      Completed without error      00%    30759        -
#13  Short offline      Completed without error      00%    30747        -
#14  Short offline      Completed without error      00%    30735        -
#15  Short offline      Completed without error      00%    30723        -
#16  Short offline      Completed without error      00%    30711        -
#17  Short offline      Completed without error      00%    30699        -
#18  Short offline      Completed without error      00%    30687        -
#19  Short offline      Completed without error      00%    30675        -
#20  Short offline      Completed without error      00%    30663        -
#21  Short offline      Completed without error      00%    30651        -
 
SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
 
[root@freenas] ~# smartctl -a /dev/da3
smartctl 6.1 2013-03-16 r3800 [FreeBSD 9.1-STABLE amd64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
 
=== START OF INFORMATION SECTION ===
Model Family:    Hitachi Deskstar 7K2000
Device Model:    Hitachi HDS722020ALA330
Serial Number:    JK1131YAHKZ0EV
LU WWN Device Id: 5 000cca 221d6427b
Firmware Version: JKAOA28A
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    7200 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:  ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 2.6, 3.0 Gb/s
Local Time is:    Thu Aug 29 18:22:25 2013 MST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
 
=== START OF READ SMART DATA SECTION ===
SMART STATUS RETURN: incomplete response, ATA output registers missing
SMART overall-health self-assessment test result: PASSED
Warning: This result is based on an Attribute check.
 
General SMART Values:
Offline data collection status:  (0x85)    Offline data collection activity
                    was aborted by an interrupting command from host.
                    Auto Offline Data Collection: Enabled.
Self-test execution status:      (  0)    The previous self-test routine completed
                    without error or no self-test has ever
                    been run.
Total time to complete Offline
data collection:        (21742) seconds.
Offline data collection
capabilities:              (0x5b) SMART execute Offline immediate.
                    Auto Offline data collection on/off support.
                    Suspend Offline collection upon new
                    command.
                    Offline surface scan supported.
                    Self-test supported.
                    No Conveyance Self-test supported.
                    Selective Self-test supported.
SMART capabilities:            (0x0003)    Saves SMART data before entering
                    power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:        (0x01)    Error logging supported.
                    General Purpose Logging supported.
Short self-test routine
recommended polling time:      (  1) minutes.
Extended self-test routine
recommended polling time:      ( 362) minutes.
SCT capabilities:            (0x003d)    SCT Status supported.
                    SCT Error Recovery Control supported.
                    SCT Feature Control supported.
                    SCT Data Table supported.
 
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG    VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate    0x000b  095  095  016    Pre-fail  Always      -      17
  2 Throughput_Performance  0x0005  132  132  054    Pre-fail  Offline      -      103
  3 Spin_Up_Time            0x0007  158  158  024    Pre-fail  Always      -      496 (Average 416)
  4 Start_Stop_Count        0x0012  100  100  000    Old_age  Always      -      109
  5 Reallocated_Sector_Ct  0x0033  100  100  005    Pre-fail  Always      -      0
  7 Seek_Error_Rate        0x000b  100  100  067    Pre-fail  Always      -      0
  8 Seek_Time_Performance  0x0005  112  112  020    Pre-fail  Offline      -      39
  9 Power_On_Hours          0x0012  096  096  000    Old_age  Always      -      30282
10 Spin_Retry_Count        0x0013  100  100  060    Pre-fail  Always      -      0
12 Power_Cycle_Count      0x0032  100  100  000    Old_age  Always      -      101
192 Power-Off_Retract_Count 0x0032  100  100  000    Old_age  Always      -      745
193 Load_Cycle_Count        0x0012  100  100  000    Old_age  Always      -      745
194 Temperature_Celsius    0x0002  153  153  000    Old_age  Always      -      39 (Min/Max 23/63)
196 Reallocated_Event_Count 0x0032  100  100  000    Old_age  Always      -      0
197 Current_Pending_Sector  0x0022  100  100  000    Old_age  Always      -      0
198 Offline_Uncorrectable  0x0008  100  100  000    Old_age  Offline      -      0
199 UDMA_CRC_Error_Count    0x000a  200  200  000    Old_age  Always      -      0
 
SMART Error Log Version: 0
No Errors Logged
 
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline      Completed without error      00%    30276        -
# 2  Short offline      Completed without error      00%    30264        -
# 3  Short offline      Completed without error      00%    30252        -
# 4  Short offline      Completed without error      00%    30240        -
# 5  Short offline      Completed without error      00%    30228        -
# 6  Short offline      Completed without error      00%    30216        -
# 7  Short offline      Completed without error      00%    30204        -
# 8  Short offline      Completed without error      00%    30192        -
# 9  Short offline      Completed without error      00%    30180        -
#10  Short offline      Completed without error      00%    30168        -
#11  Short offline      Completed without error      00%    30156        -
#12  Short offline      Completed without error      00%    30144        -
#13  Short offline      Completed without error      00%    30132        -
#14  Short offline      Completed without error      00%    30120        -
#15  Short offline      Completed without error      00%    30108        -
#16  Short offline      Completed without error      00%    30096        -
#17  Short offline      Completed without error      00%    30084        -
#18  Short offline      Completed without error      00%    30072        -
#19  Short offline      Completed without error      00%    30060        -
#20  Short offline      Completed without error      00%    30048        -
#21  Short offline      Completed without error      00%    30036        -
 
SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
 
[root@freenas] ~# smartctl -a /dev/da4
smartctl 6.1 2013-03-16 r3800 [FreeBSD 9.1-STABLE amd64] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
 
=== START OF INFORMATION SECTION ===
Model Family:    Hitachi Deskstar 7K2000
Device Model:    Hitachi HDS722020ALA330
Serial Number:    JK1131YAHL3LAV
LU WWN Device Id: 5 000cca 221d653aa
Firmware Version: JKAOA28A
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    7200 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:  ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 2.6, 3.0 Gb/s
Local Time is:    Thu Aug 29 18:22:31 2013 MST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
 
=== START OF READ SMART DATA SECTION ===
SMART STATUS RETURN: incomplete response, ATA output registers missing
SMART overall-health self-assessment test result: PASSED
Warning: This result is based on an Attribute check.
 
General SMART Values:
Offline data collection status:  (0x85)    Offline data collection activity
                    was aborted by an interrupting command from host.
                    Auto Offline Data Collection: Enabled.
Self-test execution status:      (  0)    The previous self-test routine completed
                    without error or no self-test has ever
                    been run.
Total time to complete Offline
data collection:        (23506) seconds.
Offline data collection
capabilities:              (0x5b) SMART execute Offline immediate.
                    Auto Offline data collection on/off support.
                    Suspend Offline collection upon new
                    command.
                    Offline surface scan supported.
                    Self-test supported.
                    No Conveyance Self-test supported.
                    Selective Self-test supported.
SMART capabilities:            (0x0003)    Saves SMART data before entering
                    power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:        (0x01)    Error logging supported.
                    General Purpose Logging supported.
Short self-test routine
recommended polling time:      (  1) minutes.
Extended self-test routine
recommended polling time:      ( 392) minutes.
SCT capabilities:            (0x003d)    SCT Status supported.
                    SCT Error Recovery Control supported.
                    SCT Feature Control supported.
                    SCT Data Table supported.
 
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG    VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate    0x000b  092  092  016    Pre-fail  Always      -      65576
  2 Throughput_Performance  0x0005  132  132  054    Pre-fail  Offline      -      106
  3 Spin_Up_Time            0x0007  156  156  024    Pre-fail  Always      -      505 (Average 424)
  4 Start_Stop_Count        0x0012  100  100  000    Old_age  Always      -      129
  5 Reallocated_Sector_Ct  0x0033  100  100  005    Pre-fail  Always      -      0
  7 Seek_Error_Rate        0x000b  100  100  067    Pre-fail  Always      -      0
  8 Seek_Time_Performance  0x0005  112  112  020    Pre-fail  Offline      -      39
  9 Power_On_Hours          0x0012  096  096  000    Old_age  Always      -      30320
10 Spin_Retry_Count        0x0013  100  100  060    Pre-fail  Always      -      0
12 Power_Cycle_Count      0x0032  100  100  000    Old_age  Always      -      119
192 Power-Off_Retract_Count 0x0032  100  100  000    Old_age  Always      -      918
193 Load_Cycle_Count        0x0012  100  100  000    Old_age  Always      -      918
194 Temperature_Celsius    0x0002  150  150  000    Old_age  Always      -      40 (Min/Max 22/57)
196 Reallocated_Event_Count 0x0032  100  100  000    Old_age  Always      -      0
197 Current_Pending_Sector  0x0022  100  100  000    Old_age  Always      -      0
198 Offline_Uncorrectable  0x0008  100  100  000    Old_age  Offline      -      0
199 UDMA_CRC_Error_Count    0x000a  200  200  000    Old_age  Always      -      0
 
SMART Error Log Version: 0
No Errors Logged
 
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline      Completed without error      00%    30314        -
# 2  Short offline      Completed without error      00%    30302        -
# 3  Short offline      Completed without error      00%    30290        -
# 4  Short offline      Completed without error      00%    30278        -
# 5  Short offline      Completed without error      00%    30266        -
# 6  Short offline      Completed without error      00%    30254        -
# 7  Short offline      Completed without error      00%    30242        -
# 8  Short offline      Completed without error      00%    30230        -
# 9  Short offline      Completed without error      00%    30218        -
#10  Short offline      Completed without error      00%    30206        -
#11  Short offline      Completed without error      00%    30194        -
#12  Short offline      Completed without error      00%    30182        -
#13  Short offline      Completed without error      00%    30170        -
#14  Short offline      Completed without error      00%    30158        -
#15  Short offline      Completed without error      00%    30146        -
#16  Short offline      Completed without error      00%    30134        -
#17  Short offline      Completed without error      00%    30122        -
#18  Short offline      Completed without error      00%    30110        -
#19  Short offline      Completed without error      00%    30098        -
#20  Short offline      Completed without error      00%    30086        -
#21  Short offline      Completed without error      00%    30074        -
 
SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
 
[root@freenas] ~#
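
For anyone comparing the dumps above by eye, here is a minimal sketch (not part of the original post) that pulls the error-type counters out of a pasted `smartctl -a` attribute table and reports the ones with a nonzero raw value. The `WATCHED` list and the function name `nonzero_watched` are assumptions made for illustration; the sample rows are copied from the dump just before the da3 output.

```python
import re

# Attributes this sketch treats as worth a second look. The selection is an
# assumption for illustration, not something stated in the thread.
WATCHED = {
    "Reallocated_Sector_Ct",
    "Reallocated_Event_Count",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
    "UDMA_CRC_Error_Count",
}

# Columns: ID, name, flag, value, worst, thresh, type, updated, when_failed, raw.
# Only the leading integer of RAW_VALUE is captured, so trailing text such as
# "(Average 532)" is ignored.
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+(0x[0-9a-fA-F]+)\s+(\d+)\s+(\d+)\s+(\d+)\s+"
    r"(\S+)\s+(\S+)\s+(\S+)\s+(\d+)"
)


def nonzero_watched(smart_text):
    """Return {attribute_name: raw_value} for watched attributes with raw > 0."""
    hits = {}
    for line in smart_text.splitlines():
        m = ROW.match(line)
        if not m:
            continue  # not an attribute row (headers, self-test log, etc.)
        name, raw = m.group(2), int(m.group(10))
        if name in WATCHED and raw > 0:
            hits[name] = raw
    return hits


# Rows copied verbatim from the attribute table preceding the da3 output.
sample = """\
  5 Reallocated_Sector_Ct  0x0033  100  100  005    Pre-fail  Always      -      2
  9 Power_On_Hours          0x0012  096  096  000    Old_age  Always      -      30897
196 Reallocated_Event_Count 0x0032  100  100  000    Old_age  Always      -      2
197 Current_Pending_Sector  0x0022  100  100  000    Old_age  Always      -      0
"""

print(nonzero_watched(sample))
# -> {'Reallocated_Sector_Ct': 2, 'Reallocated_Event_Count': 2}
```

Run against the da3 and da4 tables, the same function returns an empty dict, since all of their watched counters are 0. A nonzero count here is only a flag for closer inspection; it says nothing about why the pool fails to import.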
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Whoa, wait a minute.. you are doing this in a VM?

At this point you need to explain a lot more about how this is configured. I typically walk away at this point with a "good luck" comment, because the complexity you just added to the recovery process is something I'm not interested in dealing with (and not something I can expect to be accurate, if past data loss and user comments are any indication). VMs take just about everything that is real and abstract it, making many of the assumptions about recovery wrong.

How are these disks presented to the VM? RDM? VT-d? If VT-d, then what controller are you using for the passthrough?
 

Wind_freak

Dabbler
Joined
Nov 28, 2011
Messages
20
Yes, ESXi 5.1.
I added the disks so that the VM has direct access to them - mapped LUN, I think.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
So yeah.. you did RDM. RDM is a major no-no. And if you do any searching you'll find stuff like:

http://forums.freenas.org/threads/freenas-esx-rdm-issue.14639/

http://forums.freenas.org/threads/p...duction-as-a-virtual-machine.12484/#post58364

http://forums.freenas.org/threads/a...ide-to-not-completely-losing-your-data.12714/

http://forums.freenas.org/threads/pool-gone-after-reset-import-does-not-help.14435/

and a heated discussion: http://forums.freenas.org/threads/disks-not-configured-in-freenas-9-1-release.14287/

The bottom line: I can't help you. One of the forum users (FlynnVT) recovered some data for another user, but he can explain what he did better than I can. And unfortunately the commands they used are only appropriate for that one system. I would recommend you PM him, but for some reason you can't.

I posted in that last thread. Maybe FlynnVT will help you out.
 