Possible Hard Drive Replacement

Status
Not open for further replies.

9C1 Newbee

Patron
Joined
Oct 9, 2012
Messages
485
I have had great luck the last few weeks with my new build.

SUPERMICRO MBD-X9SCM-F-O LGA
Xeon E3-1230V2
32GB ECC RAM
Flashed M1015
6 x 3TB WD Green in RAIDZ2
1 x 2TB WD Green for jail duty

About two weeks ago, I enabled the autotune function. Other than that, I have not touched the machine up to this point.

Two days ago I got an email from the system saying that a drive could not be opened. The drive did not show up in the GUI. I rebooted the system. The drive came back online and seemed to be working.
A different drive dropped out of the array an hour later. I rebooted and things seemed fine again.
I started a scrub of the pool.

Code:
  pool: BigPool
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
  scan: scrub repaired 3.85M in 6h23m with 0 errors on Sun May 19 05:26:12 2013
config:

        NAME                                            STATE     READ WRITE CKSUM
        BigPool                                         ONLINE       0     0     0
          raidz2-0                                      ONLINE       0     0     0
            gptid/f849925f-a41b-11e2-8c93-002590a969a1  ONLINE       0     0     0
            gptid/f9328a49-a41b-11e2-8c93-002590a969a1  ONLINE       0     0     0
            gptid/fa072ee7-a41b-11e2-8c93-002590a969a1  ONLINE       0     0     0
            gptid/fae46be7-a41b-11e2-8c93-002590a969a1  ONLINE       0     0     0
            gptid/fbba45b8-a41b-11e2-8c93-002590a969a1  ONLINE       0     0   985
            gptid/fc945db6-a41b-11e2-8c93-002590a969a1  ONLINE       0     0     0

errors: No known data errors

                                                                                
  pool: Jacuzzi                                                                 
 state: ONLINE                                                                  
  scan: scrub repaired 0 in 2h26m with 0 errors on Sun May 19 02:26:28 2013     
config:                                                                         
                                                                                
        NAME                                          STATE     READ WRITE CKSUM
        Jacuzzi                                       ONLINE       0     0     0
          gptid/9a2697ea-a4b9-11e2-8b46-002590a969a1  ONLINE       0     0     0
                                                                                
errors: No known data errors                                                    


I have never seen the scrub repair anything before. Something is up.
I have searched, but I just don't know enough about shell commands to do anything. What would you do next?
I also have a brand new 3TB Green here ready to roll if needed.
 

gpsguy

Active Member
Joined
Jan 22, 2012
Messages
4,472
The second-to-last drive in the list has 985 checksum errors.

It would have been nice to have a "zpool status" report from the first episode, to see if it was a smaller number and is now increasing.
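For comparing counts between scrubs, a small filter over the "zpool status" output can help. This is only a sketch: zpool_errors is a made-up helper name, and it assumes the usual five-column NAME/STATE/READ/WRITE/CKSUM layout shown in the report above.

```shell
# Hypothetical helper: reads `zpool status` output on stdin and prints
# every pool/vdev/disk line whose READ, WRITE, or CKSUM counter is non-zero.
zpool_errors() {
    awk '
        # Device lines look like: NAME STATE READ WRITE CKSUM [note]
        # Require the three counters to start with a digit so that the
        # header and the status/action text are skipped.
        NF >= 5 && $3 ~ /^[0-9]/ && $4 ~ /^[0-9]/ && $5 ~ /^[0-9]/ &&
        ($3 != "0" || $4 != "0" || $5 != "0") {
            print $1, "read=" $3, "write=" $4, "cksum=" $5
        }
    '
}

# Usage on the server (assumes zpool is in PATH):
#   zpool status BigPool | zpool_errors
```

Saving that output with a date after each scrub would show whether the number is growing.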

Although you have RAIDZ2, since you have a spare, I'd replace it now.

Afterwards, run smartctl long tests on the drive, to see what it finds.
 

9C1 Newbee

Patron
Joined
Oct 9, 2012
Messages
485
"zpool status" showed all 0's before the scrub.

How can I find out the serial number, or even the name, of the drive? I don't understand what the gptid thing is.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
If you run gpart list, it will show the UUID (rawuuid) of each partition. zpool status shows those same UUIDs as its gptid entries. Match the UUID from zpool status to the UUID in the gpart list output and you'll know the exact device. Then take that device and either look up its serial number in the disk list in the FreeNAS GUI, or run smartctl -a /dev/(yourdevice) to find the serial number. If you don't know which device is which on your hardware, shut down the FreeNAS server and check the serial number printed on each disk.
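The matching step can be sketched as a small filter over the gpart list output. gptid_to_dev is a hypothetical helper name, and the sketch assumes the "N. Name:" and "rawuuid:" lines that gpart list prints for each partition.

```shell
# Hypothetical helper: reads `gpart list` output on stdin and prints the
# partition whose rawuuid matches the gptid given as the first argument.
gptid_to_dev() {
    # Accept the UUID with or without the leading "gptid/".
    awk -v want="${1#gptid/}" '
        $2 == "Name:" { name = $3 }      # e.g. "2. Name: da4p2"
        $1 == "rawuuid:" && $2 == want { print name; exit }
    '
}

# Usage on the server (assumes gpart is in PATH):
#   gpart list | gptid_to_dev gptid/fbba45b8-a41b-11e2-8c93-002590a969a1
```

The result is a partition name such as da4p2; drop the trailing p2 to get the disk to pass to smartctl.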
 

gpsguy

Active Member
Joined
Jan 22, 2012
Messages
4,472
Ahh, cyberjock gave you the info, while I was composing a message.

While you're working on this, I recommend taking some time and documenting all your hard disks. Consider labelling them with their device names, like "ada0", etc. Should you have a problem in the future, it'll make it easier to find a drive.
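One way to gather that documentation is to pull the serial number out of smartctl -i for each disk. A rough sketch: serial_of is a made-up helper, and the device names in the usage comment (da0 through da5) are assumptions to adjust for your own system.

```shell
# Hypothetical helper: reads `smartctl -i` output on stdin and prints the
# value of the "Serial Number:" line ("Serial number:" on some drives).
serial_of() {
    awk -F': *' '/^Serial [Nn]umber:/ { print $2; exit }'
}

# Usage on the server (assumes smartctl is in PATH and six disks da0..da5):
#   for d in da0 da1 da2 da3 da4 da5; do
#       printf '%s  %s\n' "$d" "$(smartctl -i /dev/$d | serial_of)"
#   done
```

The resulting list of device names and serials can be printed and kept with the machine.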
 

9C1 Newbee

Patron
Joined
Oct 9, 2012
Messages
485
Now we are cooking with gas! You guys are great. Turns out to be "da4". Just so happens to be one of the drives that dropped out of the array. Should I just replace it? Or should I throw some tests at it? So far, it has been ok since the scrub that ended this morning.
 

gpsguy

Active Member
Joined
Jan 22, 2012
Messages
4,472
You could run short and long SMART tests on it.

smartctl -t short (or long) /dev/(device name)

The short test will probably take only a couple of minutes, whereas a long test will probably take a few hours.
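As a sketch of the whole sequence, the commands could be wrapped like this. The function names and the DRYRUN switch are made up for illustration; the smartctl options used (-t long, -l selftest, -A) are standard ones, and /dev/da4 is just the device from this thread.

```shell
# With DRYRUN set, print each command instead of running it.
run() {
    if [ -n "${DRYRUN:-}" ]; then printf '%s\n' "$*"; else "$@"; fi
}

# Start the long (extended) self-test. It runs inside the drive in the
# background, so the command returns straight away.
smart_start_long() {
    run smartctl -t long "$1"
}

# Once the test has had time to finish, show the self-test log and the
# vendor attribute table (reallocated and pending sector counts live there).
smart_show_results() {
    run smartctl -l selftest "$1"
    run smartctl -A "$1"
}

# Usage on the server:
#   smart_start_long /dev/da4
#   ...wait a few hours...
#   smart_show_results /dev/da4
```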

Are you sure it's not ada4?
 

9C1 Newbee

Patron
Joined
Oct 9, 2012
Messages
485
I will throw a long test at it. The emails and the GUI both say da4.

gpart list
Code:
Geom name: da4                                                                  
modified: false                                                                 
state: OK                                                                       
fwheads: 255                                                                    
fwsectors: 63                                                                   
last: 5860533134                                                                
first: 34                                                                       
entries: 128                                                                    
scheme: GPT                                                                     
Providers:                                                                      
1. Name: da4p1                                                                  
   Mediasize: 2147483648 (2.0G)                                                 
   Sectorsize: 512                                                              
   Stripesize: 4096                                                             
   Stripeoffset: 0                                                              
   Mode: r1w1e1                                                                 
   rawuuid: fb9890cc-a41b-11e2-8c93-002590a969a1                                
   rawtype: 516e7cb5-6ecf-11d6-8ff8-00022d09712b                                
   label: (null)                                                                
   length: 2147483648                                                           
   offset: 65536                                                                
   type: freebsd-swap                                                           
   index: 1
   end: 4194431                                                                 
   start: 128                                                                   
2. Name: da4p2                                                                  
   Mediasize: 2998445412352 (2.7T)                                              
   Sectorsize: 512                                                              
   Stripesize: 4096                                                             
   Stripeoffset: 0                                                              
   Mode: r1w1e2                                                                 
   rawuuid: fbba45b8-a41b-11e2-8c93-002590a969a1                                
   rawtype: 516e7cba-6ecf-11d6-8ff8-00022d09712b                                
   label: (null)                                                                
   length: 2998445412352                                                        
   offset: 2147549184                                                           
   type: freebsd-zfs                                                            
   index: 2                                                                     
   end: 5860533127                                                              
   start: 4194432                                                               
Consumers:                                                                      
1. Name: da4                                                                    
   Mediasize: 3000592982016 (2.7T)                                              
   Sectorsize: 512                                                              
   Stripesize: 4096                                                             
   Stripeoffset: 0
   Mode: r2w2e5                                                                 


It threw me off when I was searching for solutions as well. Why it decided to call all the drives "da" I have no idea.
 

titan_rw

Guru
Joined
Sep 1, 2012
Messages
586
cyberjock said:
If you run gpart list, it will show the UUID (rawuuid) of each partition. zpool status shows those same UUIDs as its gptid entries. Match the UUID from zpool status to the UUID in the gpart list output and you'll know the exact device. Then take that device and either look up its serial number in the disk list in the FreeNAS GUI, or run smartctl -a /dev/(yourdevice) to find the serial number. If you don't know which device is which on your hardware, shut down the FreeNAS server and check the serial number printed on each disk.

Not that I don't like the command line, I do, but you can use the web gui for this too.

Under "View Volumes", the rightmost button on the root pool entry is "volume status". This gives a list similar to "zpool status", but it shows /dev/[a]daX names instead of the gptids. Even though I like working in the command line, I find this easier for matching read / write / cksum errors to devices.

Then "view disks" will let you match the /dev entry to a disk's serial number (except in 8.3.1-release, of course).
 

9C1 Newbee

Patron
Joined
Oct 9, 2012
Messages
485
Just a little update. Since running the scrub that corrected the data, I have run several more scrubs. We have been using the server quite a bit as well. I have had no further issues. I am chalking this up to a hiccup in the hardware. FreeNAS rocks! As always, thank you for all the help, guys.
 