Stress-testing RAIDZ fails (FreeNAS 8.0.1-BETA2-amd64)


neubert

Dabbler
Joined
Jun 24, 2011
Messages
26
Hello,

I built a dedicated FreeNAS 8.0.1-BETA2-amd64 box with three 2 TB SATA-III hard drives and a ZFS RAIDZ volume on top. Before I dare copy the data from my current file server to it, I wanted to run a stress test: I unplugged the third of the three drives to simulate a drive failure.

Expected outcome:
  1. Test data on RAIDZ volume still accessible and correct.
  2. Receive e-mail notification from NAS on failure.
  3. After replugging the drive, the RAIDZ volume can be repaired, either automatically or manually.
  4. GUI shows state of volume (degraded / functional).

Please correct me if these expectations do not match what FreeNAS is intended to do.

Observed outcome:
  1. Test data on RAIDZ volume still accessible and correct.
  2. No e-mail notification received.
  3. Unable to automatically or manually repair the volume.
  4. No clues in the GUI what is going on.

ad 2.: according to what I read in another thread, this is probably due to the missing "E-Mail To:" setting for the admin user in the GUI and will be fixed in BETA3.
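Until that fix arrives, a crude interim check could be a cron job that mails the pool health summary; a sketch (the address is a placeholder):

Code:
zpool status -x | mail -s "zpool status on nas" admin@example.com

Note this mails unconditionally, even when the output is just "all pools are healthy".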

ad 3.: after unplugging the drive, I get the message

Code:
Jun 25 11:40:16 nas kernel: (ada2:ahcich2:0:0:0): lost device


The GUI (Storage | volume1 | View Disk) still shows three entries, but only ada0 and ada1 are named; the name of the third disk (previously ada2) is blank.

Logging into the NAS and running zpool status gives

Code:
  pool: volume1
 state: ONLINE
 scrub: none requested
config:

        NAME           STATE     READ WRITE CKSUM
        volume1        ONLINE       0     0     0
          raidz1       ONLINE       0     0     0
            gpt/disk0  ONLINE       0     0     0
            gpt/disk1  ONLINE       0     0     0
            gpt/disk2  ONLINE       0     0     0

errors: No known data errors


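Apparently ZFS does not notice the missing disk until it actually has to touch it. In hindsight, starting a scrub (which reads all allocated data on every device) should also force detection:

Code:
zpool scrub volume1
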
What I actually did was write some data to the volume, which provoked an updated status.

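For example (the path assumes the volume is mounted at /mnt/volume1; the file name is arbitrary):

Code:
dd if=/dev/urandom of=/mnt/volume1/testfile bs=1M count=100


Afterwards, zpool status reported: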
Code:
  pool: volume1
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: none requested
config:

        NAME           STATE     READ WRITE CKSUM
        volume1        ONLINE       0     0     0
          raidz1       ONLINE       0     0     0
            gpt/disk0  ONLINE       0     0     0
            gpt/disk1  ONLINE       0     0     0
            gpt/disk2  ONLINE   7.69K 38.9K     0

errors: No known data errors

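The 'zpool clear' suggested by the status output would simply reset the error counters once the device is healthy again:

Code:
zpool clear volume1
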

I then replugged the drive and used the GUI (Storage | volume1 | View Disk | Replace) to replace the disk. A popup informed me: "Sorry, an error occured."

On the command line, ls /dev showed that there was no /dev/ada2, so the hot-plug was apparently not detected. camcontrol rescan all did not help either.
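For reference, the checks from the shell (camcontrol devlist is just another way to list what the CAM layer currently sees):

Code:
ls /dev | grep ada       # only ada0 and ada1 remain
camcontrol devlist       # list devices known to CAM
camcontrol rescan all    # did not bring ada2 back
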

After rebooting the NAS and playing around with zpool on the command line, I was able to get as far as

Code:
  pool: volume1
 state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas exist for
        the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://www.sun.com/msg/ZFS-8000-2Q
 scrub: scrub in progress for 0h0m, 63.24% done, 0h0m to go
config:

        NAME           STATE     READ WRITE CKSUM
        volume1        DEGRADED     0     0     0
          raidz1       DEGRADED     0     0     0
            gpt/disk0  ONLINE       0     0     0
            gpt/disk1  ONLINE       0     0     0
            gpt/disk2  UNAVAIL      0     0     3  cannot open


but I was not able to place /dev/ada2 back into the pool.
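What I tried were variations of the commands suggested by the status output, roughly:

Code:
zpool online volume1 gpt/disk2         # bring the labeled device back online
zpool replace volume1 gpt/disk2 ada2   # or swap in the raw device
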

Deleting the entire volume and starting over works.

My questions:
  • How do I repair the RAIDZ volume?
  • Is there a HOWTO for this? (I could not find one on the web.)
  • How can the status of the RAIDZ volume be monitored, and how can the volume be repaired, through the GUI? (For the CLI, see the note below.)
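The closest thing to monitoring I have found so far is the command line: zpool status -x prints only unhealthy pools (or "all pools are healthy"):

Code:
zpool status -x
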

Thank you very much in advance for any help or hints.

Kind regards
Boris
 