I just received the following email alert. What do I do next?

ITgator

Dabbler
Joined
Jun 15, 2016
Messages
13
The volume xxxxxxxxx (ZFS) state is DEGRADED: One or more devices has been removed by the administrator. Sufficient replicas exist for the pool to continue functioning in a degraded state.
Device: /dev/ada0, unable to open device

I printed out section 8.1.10, Replacing a Failed Drive, and I'm stuck at step 1: "Before ... go to Storage > Volumes > View Volumes." My display is empty, with no Volume Status icon.

What do I do next?

1557518515939.png
 

dlavigne

Guest
What's the full output of zpool status (within code tags)?
 

ITgator

Dabbler
Joined
Jun 15, 2016
Messages
13
Code:
[root@GATORMAIN ~]# zpool status                                                                                                   
  pool: freenas-boot                                                                                                               
 state: ONLINE                                                                                                                     
  scan: scrub repaired 0 in 0h1m with 0 errors on Sun Apr 14 03:46:22 2019                                                         
config:                                                                                                                             
                                                                                                                                    
        NAME        STATE     READ WRITE CKSUM                                                                                     
        freenas-boot  ONLINE       0     0     0                                                                                   
          da0p2     ONLINE       0     0     0                                                                                     
                                                                                                                                    
errors: No known data errors                                                                                                       
                                                                                                                                    
  pool: gatormain                                                                                                                   
 state: DEGRADED                                                                                                                   
status: One or more devices has been removed by the administrator.                                                                 
        Sufficient replicas exist for the pool to continue functioning in a                                                         
        degraded state.                                                                                                             
action: Online the device using 'zpool online' or replace the device with                                                           
        'zpool replace'.                                                                                                           
  scan: scrub repaired 0 in 0h25m with 0 errors on Sun Mar 31 00:25:44 2019                                                         
config:                                                                                                                             
                                                                                                                                    
        NAME                                            STATE     READ WRITE CKSUM                                                 
        gatormain                                       DEGRADED     0     0     0                                                 
          raidz1-0                                      DEGRADED     0     0     0                                                 
            14110738617747807040                        REMOVED      0     0     0  was /dev/gptid/f526bd63-2cd2-11e6-a622-001cc04de239
            gptid/f711e301-2cd2-11e6-a622-001cc04de239  ONLINE       0     0     0                                                 
            gptid/f88248c1-2cd2-11e6-a622-001cc04de239  ONLINE       0     0     0                                                 
                                                                                                                                    
errors: No known data errors                                                                                                       
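As an aside, the `REMOVED` line above already contains the value a CLI replacement would need: the numeric vdev GUID that ZFS accepts in place of the missing device name. A sketch of pulling it out with awk (the sample line below is copied from the output above; on the NAS you would pipe live `zpool status gatormain` output instead):

```shell
# Sketch: extract the numeric GUID from the REMOVED line of `zpool status`.
# The line here is sample data copied from the output above; on the NAS you
# would pipe the live output of `zpool status gatormain` into awk instead.
guid=$(printf '%s\n' \
  '            14110738617747807040                        REMOVED      0     0     0  was /dev/gptid/f526bd63-2cd2-11e6-a622-001cc04de239' \
  | awk '$2 == "REMOVED" {print $1}')
echo "$guid"   # the GUID ZFS accepts in place of the missing device name
```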
 

diedrichg

Wizard
Joined
Dec 4, 2012
Messages
1,319
1. Back up your data. If another drive fails or goes offline, you will lose your data.

Please post your system specs and how your drives are connected to your system. Also, your FreeNAS version.

One of your three drives is missing from the pool. I suspect either a bad power or data cable to that drive, or the drive failed outright. You need to identify which drive it is and replace it.

https://www.ixsystems.com/documentation/freenas/11.2-U4.1/storage.html#replacing-a-failed-disk
 

ITgator

Dabbler
Joined
Jun 15, 2016
Messages
13
1557769507231.png

The drives are connected via SATA.

Data is backed up regularly. My biggest concern is that the FreeNAS user interface seems to lack the volume data and the active control buttons to initiate a replacement. I probably need to do this from the shell command line, since that seems to be working properly.

Physically identifying the failed drive requires powering down the NAS, moving it to a workbench, and repowering it. I don't think I risk anything by moving the system out of the network closet. I then need to purchase a replacement drive. I assume it doesn't need to be an exact model match, but the speed and size should match. Do you concur?

What are the shell commands I need to execute to perform the replacement? Your linked reference only covers the version 11 user interface. I have the version 9 documentation printed out, but it also only covers the FreeNAS user interface.

The error message indicates /dev/ada0. Is this just a logical reference, or do you think it can be related to a physical SATA cable connector on the motherboard?
 

diedrichg

Wizard
Joined
Dec 4, 2012
Messages
1,319
https://www.ixsystems.com/documentation/freenas/9.10/storage.html#replacing-a-failed-drive
That's the correct page for your version.

1. Is your drive encrypted? If so, follow the directions for replacing a failed encrypted drive.
2. If you disconnect the wrong drive, you will lose the whole pool.


I wouldn't do anything from the CLI; just follow the directions in the docs. Do you have an HBA with blinking hard-drive lights? If not, then do this:

[ada0 is just a reference that can change with each reboot]
Go to this link to use Bidule0hm's Display drives identification infos (Device, GPTID, Serial) script to get the serial number for your failed drive.
https://www.ixsystems.com/community/threads/scripts-to-report-smart-zpool-and-ups-status-hdd-cpu-t°-hdd-identification-and-backup-the-config.27365/
Unfortunately, you will need to move the server to a bench to get the correct serial number; again, this is to ensure you have the correct failed drive, as per the docs' instructions.

Once you have identified the drive by its serial number, you can:
1. Turn off the server, replace the SATA cable, turn the machine back on, and see if the drive is recognized and your pool is healthy.
2. If it wasn't the cable, then I'd go down the road of replacing the drive. The replacement just needs to be the same size or larger (if larger, it will be treated as the same size as the other drives).
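If the GUI route stays broken for you, a rough sketch of the CLI equivalent, using the pool name and vdev GUID from the zpool status output earlier in the thread, would look like the following. This is a sketch only: the GUI also partitions the new disk and sets up swap, which a raw zpool replace skips, so the documented GUI procedure remains the safer path.

```shell
# Sketch only: CLI equivalent of the documented GUI replacement.
# Pool name and GUID are taken from the zpool status output in this thread;
# confirm the physical disk by serial number before touching anything.

glabel status                           # map gptid/... labels to adaX devices
smartctl -i /dev/ada0 | grep -i serial  # serial number of the suspect disk

# After the new disk is installed (assumed here to appear as ada0):
zpool replace gatormain 14110738617747807040 /dev/ada0
zpool status gatormain                  # watch the resilver progress
```

These commands act on a live pool and specific hardware, so treat them as an illustration of the shape of the procedure, not something to paste in blindly.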
 

pro lamer

Guru
Joined
Feb 16, 2018
Messages
626
1. Is your drive encrypted?
@ITgator the zpool status results suggest it's not encrypted.


a physical SATA cable connector on the motherboard?
Or the SATA power cable/connection.

We have a drive troubleshooting guide/resource in these forums.

I then need to purchase a replacement drive
It'd be good to burn in the new drive before using it as the replacement. We have guides for this as well ;)

There has been discussion about whether to leave a degraded pool at risk while waiting for the new drive to burn in. If that concerns you, you could, for example, keep the NAS powered off, or temporarily disconnect the good drives to protect them...

Sent from my phone
 

ITgator

Dabbler
Joined
Jun 15, 2016
Messages
13
This is interesting! It took me a little while to verify that my backup was good and to clear a lab bench for this operation. I started the shutdown process, waited for complete confirmation of shutdown, and then removed power completely. I moved the server to the lab bench, re-established the network connection and peripherals, and then powered up. I ran zpool status from the shell, and the status had changed considerably. Here it is:

Code:
[root@GATORMAIN ~]# zpool status                                                                                                   
  pool: freenas-boot                                                                                                               
 state: ONLINE                                                                                                                     
  scan: scrub repaired 0 in 0h1m with 0 errors on Sun Apr 14 03:46:22 2019                                                         
config:                                                                                                                             
                                                                                                                                    
        NAME        STATE     READ WRITE CKSUM                                                                                     
        freenas-boot  ONLINE       0     0     0                                                                                   
          da0p2     ONLINE       0     0     0                                                                                     
                                                                                                                                    
errors: No known data errors                                                                                                       
                                                                                                                                    
  pool: gatormain                                                                                                                   
 state: ONLINE                                                                                                                     
status: One or more devices has experienced an unrecoverable error.  An                                                             
        attempt was made to correct the error.  Applications are unaffected.                                                       
action: Determine if the device needs to be replaced, and clear the errors                                                         
        using 'zpool clear' or replace the device with 'zpool replace'.                                                             
   see: http://illumos.org/msg/ZFS-8000-9P                                                                                         
  scan: resilvered 1.08G in 0h1m with 0 errors on Wed May 15 11:40:57 2019                                                         
config:                                                                                                                             
                                                                                                                                    
        NAME                                            STATE     READ WRITE CKSUM                                                 
        gatormain                                       ONLINE       0     0     0                                                 
          raidz1-0                                      ONLINE       0     0     0                                                 
            gptid/f526bd63-2cd2-11e6-a622-001cc04de239  ONLINE       0     0     3                                                 
            gptid/f711e301-2cd2-11e6-a622-001cc04de239  ONLINE       0     0     0                                                 
            gptid/f88248c1-2cd2-11e6-a622-001cc04de239  ONLINE       0     0     0                                                 
                                                                                                                                    
errors: No known data errors 


I was just about to run the Display drives identification infos script to determine the dead drive, but now they all look to be spinning. The system alert seems to indicate there is still a problem.

1557946556592.png


I then initiated a zpool clear from the shell.

The system alert changed to a green light, and an alert message that says OK pops up when the light is clicked.

I haven't touched a thing except to power down and back up. Does the scan: resilvered 1.08G message from zpool status mean that the failed drive is back online? Is there still a problem that I can diagnose?
 

ITgator

Dabbler
Joined
Jun 15, 2016
Messages
13
The status changed again to critical. One drive has a checksum error.

Code:
[root@GATORMAIN ~]# zpool status                                                                                                   
  pool: freenas-boot                                                                                                               
 state: ONLINE                                                                                                                     
  scan: scrub repaired 0 in 0h1m with 0 errors on Sun Apr 14 03:46:22 2019                                                         
config:                                                                                                                             
                                                                                                                                    
        NAME        STATE     READ WRITE CKSUM                                                                                     
        freenas-boot  ONLINE       0     0     0                                                                                   
          da0p2     ONLINE       0     0     0                                                                                     
                                                                                                                                    
errors: No known data errors                                                                                                       
                                                                                                                                    
  pool: gatormain                                                                                                                   
 state: ONLINE                                                                                                                     
status: One or more devices has experienced an unrecoverable error.  An                                                             
        attempt was made to correct the error.  Applications are unaffected.                                                       
action: Determine if the device needs to be replaced, and clear the errors                                                         
        using 'zpool clear' or replace the device with 'zpool replace'.                                                             
   see: http://illumos.org/msg/ZFS-8000-9P                                                                                         
  scan: resilvered 1.08G in 0h1m with 0 errors on Wed May 15 11:40:57 2019                                                         
config:                                                                                                                             
                                                                                                                                    
        NAME                                            STATE     READ WRITE CKSUM                                                 
        gatormain                                       ONLINE       0     0     0                                                 
          raidz1-0                                      ONLINE       0     0     0                                                 
            gptid/f526bd63-2cd2-11e6-a622-001cc04de239  ONLINE       0     0     1                                                 
            gptid/f711e301-2cd2-11e6-a622-001cc04de239  ONLINE       0     0     0                                                 
            gptid/f88248c1-2cd2-11e6-a622-001cc04de239  ONLINE       0     0     0                                                 
                                                                                                                                    
errors: No known data errors
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
Did you get any failed SMART test emails?
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
FreeNAS does not send an email when a test is successful, so what are you talking about? If a SMART test fails, the drive needs to be replaced. Very straightforward.
 

ITgator

Dabbler
Joined
Jun 15, 2016
Messages
13
FreeNAS does not send an email when a test is successful, so what are you talking about? If a SMART test fails, the drive needs to be replaced. Very straightforward.

I guess I'm confused about the terminology. I get emails on scrubs and security runs, usually with no errors indicated. I'm not sure what operation is sending the message, but within the past week I have received emails for failures. The system is basically set up with defaults. Whatever was enabled from a standard install is still enabled and unchanged. The only thing I did, three years ago, was to set up the volumes, make sure all of the clients could see the shares, and point to the outgoing email server for notifications. Aside from an initial memory failure (a bad SIMM) when the system was set up, the system has run flawlessly until this current round of failures. As an aside, I ran 'zpool clear' last night before leaving the office, and this morning the status is still indicating perfect operation. What is a SMART test?

BTW, I was extremely impressed with the ease in which I got FreeNAS to install and operate. I was expecting days of agony and all I encountered was bliss, happiness, and three years of perfect operation.

Have a little grace, IT is not my profession.
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
All disks should be running SMART tests, and you will get emails when those fail. This tells you a disk is dying. You need to set this up; it sounds like it might be too late.

Run smartctl -a /dev/da# (where # is the disk number) for all disks to get their current status, then run smartctl -t long /dev/da# to start a long test on each disk. Then set up automatic SMART tests in the FreeNAS GUI.
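Spelled out as a sketch (assuming the three data disks show up as ada0 through ada2, as the alert's /dev/ada0 reference suggests; substitute your actual device names, e.g. from `camcontrol devlist`):

```shell
# Sketch: check current SMART status, then kick off long self-tests.
# Assumes the three data disks are ada0..ada2; adjust to your device names.
for d in ada0 ada1 ada2; do
  smartctl -a "/dev/$d"          # full SMART report: health, attributes, logs
done

for d in ada0 ada1 ada2; do
  smartctl -t long "/dev/$d"     # start a long self-test (runs inside the drive)
done
# Check the results later with: smartctl -l selftest /dev/ada0  (and so on)
```

The long test runs in the drive's firmware in the background, so the second loop returns immediately; the drives stay usable while it runs.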
 

ITgator

Dabbler
Joined
Jun 15, 2016
Messages
13
Thanks, I will run this as you have described. I do believe SMART is enabled and has been since the first install. Here is the GUI configuration screen. Do you see anything else that should be enabled?
1558394889765.png
 

SweetAndLow

Sweet'NASty
Joined
Nov 6, 2013
Messages
6,421
Yeah, tasks need to be created to actually run the tests.
 
