view volumes/disks does not show volumes/disks. Can I replace a bad drive without the GUI?

Dan543

Cadet
Joined
Dec 19, 2021
Messages
4
Hi All, I have what is hopefully a straightforward issue. I have a degrading HDD in a RAIDZ2 array and need to replace it. This system has been running with no issues since 2015, so the drives are likely close to EOL. Access has been via Windows shares for media and music storage, and the GUI shows the shares [Volumes > 'volume name' > 'share name']. However, View Volumes in the GUI returns no volumes and View Disks returns no disks, so although it has worked well with no issues, it is likely that I didn't set this up correctly initially. I need to identify the bad drive, replace it and initiate a resilvering process. Ideally, I'd like to get the GUI to show the volumes/disks as well.

Alert on Dashboard identifies ada1 as having 8 bad sectors
<zpool status> returns the 8 pool drives
<camcontrol identify ada1> returns the drive serial number

Since I can't see it in the GUI, I cannot take it offline and resilver from the GUI. Can I shut the NAS box down, replace the drive (identified by serial number) and expect it to resilver?

Appreciate help with both of these issues.

FreeNAS Build: FreeNAS-9.3-STABLE-201509022158
Platform: Intel(R) Core(TM) i3-4130T CPU @ 2.90GHz
Motherboard: ASRock E3C224D21
Memory: 16319MB, Crucial CT102472BD160B.118FED ECC
Drives: 8x 4TB HGST DeskStar 0S03664
 

Dan543

Cadet
Joined
Dec 19, 2021
Messages
4
Since the GUI is not working for me, the plan is conceptually: identify the bad drive by serial number, physically remove it and put in a good drive, then use zpool replace to resilver.

zpool structure

Code:
[root@freenas ~]# zpool list                                                                                                       
NAME             SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT                                                 
CopperMountain    29T  24.2T  4.75T         -    22%    83%  1.00x  ONLINE  /mnt                                                   
freenas-boot    14.5G   522M  14.0G         -      -     3%  1.00x  ONLINE  -       


Correlate gptid with dev
Code:
[root@freenas ~]# glabel status                                                                                                     
                                      Name  Status  Components                                                                      
gptid/959f3ca6-a1ce-11e5-bd16-d05099c01bdb     N/A  da0p1                                                                          
gptid/6dc81678-a46e-11e5-bc55-d05099c01bdb     N/A  ada0p2                                                                          
gptid/6d3a0467-a46e-11e5-bc55-d05099c01bdb     N/A  ada1p2                                                                          
                      ntfs/System Reserved     N/A  ada2s1                                                                          
gptid/6ca9f247-a46e-11e5-bc55-d05099c01bdb     N/A  ada3p2                                                                          
gptid/700b9382-a46e-11e5-bc55-d05099c01bdb     N/A  ada4p2                                                                          
gptid/6f7d0f50-a46e-11e5-bc55-d05099c01bdb     N/A  ada5p2                                                                          
gptid/6ee7970c-a46e-11e5-bc55-d05099c01bdb     N/A  ada6p2                                                                          
gptid/6e5a2806-a46e-11e5-bc55-d05099c01bdb     N/A  ada7p2                                                                          
gptid/7094cb62-a46e-11e5-bc55-d05099c01bdb     N/A  ada8p2  
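
A minimal sketch of that lookup, assuming a POSIX shell and awk (both are in the FreeBSD base system). The function name `gptid_for_disk` is made up for illustration, and the sample rows are copied from the glabel output above so the snippet runs anywhere; on the NAS itself the real command output would be piped in instead.

```shell
# Print the gptid label(s) whose component is a partition of the given disk.
# Reads `glabel status` text on stdin. usage: gptid_for_disk ada1
gptid_for_disk() {
    awk -v disk="$1" '$1 ~ /^gptid\// && $3 ~ ("^" disk "p[0-9]+$") { print $1 }'
}

# Sample rows copied from the glabel output above. On the NAS itself:
#   glabel status | gptid_for_disk ada1
sample='gptid/6dc81678-a46e-11e5-bc55-d05099c01bdb     N/A  ada0p2
gptid/6d3a0467-a46e-11e5-bc55-d05099c01bdb     N/A  ada1p2
                      ntfs/System Reserved     N/A  ada2s1'

label=$(printf '%s\n' "$sample" | gptid_for_disk ada1)
echo "$label"
```

FreeNAS normally adds pool members by gptid rather than by adaX name, so the printed label is the name to look for in zpool status.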


Verify with smartctl that ada1 has issues, and determine the drive serial number

Code:
smartctl -a /dev/ada1 | more
smartctl 6.3 2014-07-26 r3976 [FreeBSD 9.3-RELEASE-p25 amd64] (local build)                                                         
Copyright (C) 2002-14, Bruce Allen, Christian Franke, www.smartmontools.org                                                         
                                                                                                                                    
=== START OF INFORMATION SECTION ===                                                                                               
Model Family:     HGST Deskstar NAS                                                                                                 
Device Model:     HGST HDN724040ALE640                                                                                             
Serial Number:    PK2338P4GXXJZC                                                                                                   
LU WWN Device Id: 5 000cca 249cd258e                                                                                               
Firmware Version: MJAOA5E0                                                                                                         
User Capacity:    4,000,787,030,016 bytes [4.00 TB]                                                                                 
Sector Sizes:     512 bytes logical, 4096 bytes physical                                                                           
Rotation Rate:    7200 rpm                                                                                                         
Form Factor:      3.5 inches                                                                                                       
Device is:        In smartctl database [for details use: -P show]                                                                   
ATA Version is:   ATA8-ACS T13/1699-D revision 4                                                                                   
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)                                                                           
Local Time is:    Wed Dec 22 15:15:37 2021 MST                                                                                     
SMART support is: Available - device has SMART capability.                                                                         
SMART support is: Enabled                                                                                                           
                                                                                                                                    
=== START OF READ SMART DATA SECTION ===                                                                                           
SMART overall-health self-assessment test result: PASSED                                                                           
                                                                                                                                    
General SMART Values:                                                                                                               
Offline data collection status:  (0x84) Offline data collection activity                                                           
                                        was suspended by an interrupting command from host.                                         
                                        Auto Offline Data Collection: Enabled.                                                     
Self-test execution status:      ( 116) The previous self-test completed having                                                     
                                        the read element of the test failed.                                                       
Total time to complete Offline                                                                                                     
data collection:                (   24) seconds.                                                                                   
Offline data collection                                                                                                             
capabilities:                    (0x5b) SMART execute Offline immediate.                                                           
                                        Auto Offline data collection on/off support.                                               
                                        Suspend Offline collection upon new                                                         
                                        command.                                                                                   
                                        Offline surface scan supported.                                                             
                                        Self-test supported.                                                                       
                                        No Conveyance Self-test supported.                                                         
                                        Selective Self-test supported.                                                             
SMART capabilities:            (0x0003) Saves SMART data before entering                                                           
                                        power-saving mode.                                                                         
                                        Supports SMART auto save timer.                                                             
Error logging capability:        (0x01) Error logging supported.                                                                   
                                        General Purpose Logging supported.                                                         
Short self-test routine                                                                                                             
recommended polling time:        (   1) minutes.                                                                                   
Extended self-test routine                                                                                                         
recommended polling time:        ( 567) minutes.         
SCT capabilities:              (0x003d) SCT Status supported.                                                                       
                                        SCT Error Recovery Control supported.                                                       
                                        SCT Feature Control supported.                                                             
                                        SCT Data Table supported.                                                                   
                                                                                                                                    
SMART Attributes Data Structure revision number: 16                                                                                 
Vendor Specific SMART Attributes with Thresholds:                                                                                   
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE                                   
  1 Raw_Read_Error_Rate     0x000b   099   099   016    Pre-fail  Always       -       131072                                       
  2 Throughput_Performance  0x0005   137   137   054    Pre-fail  Offline      -       79                                           
  3 Spin_Up_Time            0x0007   127   127   024    Pre-fail  Always       -       610 (Average 610)                           
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       59                                           
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0                                           
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0                                           
  8 Seek_Time_Performance   0x0005   119   119   020    Pre-fail  Offline      -       35                                           
  9 Power_On_Hours          0x0012   093   093   000    Old_age   Always       -       52896                                       
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0                                           
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       47                                           
192 Power-Off_Retract_Count 0x0032   099   099   000    Old_age   Always       -       2196                                         
193 Load_Cycle_Count        0x0012   099   099   000    Old_age   Always       -       2196                                         
194 Temperature_Celsius     0x0002   146   146   000    Old_age   Always       -       41 (Min/Max 22/49)                           
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0                                           
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       8                                           
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0                                           
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0                                           
                                                                                                                                    
SMART Error Log Version: 1                                                                                                         
No Errors Logged                                                                                                                   
                                                                                                                                    
SMART Self-test log structure revision number 1                                                                                     
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error                                     
# 1  Short offline       Completed: read failure       40%     52832         -                                                     
# 2  Short offline       Completed: read failure       40%     52831         -                                                     
# 3  Short offline       Completed: read failure       40%     52830         -                                                     
# 4  Short offline       Completed: read failure       40%     52829         -                                                     
# 5  Short offline       Completed: read failure       40%     52828         -                                                     
# 6  Short offline       Completed: read failure       40%     52827         -                                                     
# 7  Short offline       Completed: read failure       40%     52826         -                                                     
# 8  Short offline       Completed: read failure       40%     52825         -                                                     
# 9  Short offline       Completed: read failure       40%     52824         -                                                     
#10  Short offline       Completed: read failure       40%     52823         -                                                     
#11  Short offline       Completed: read failure       40%     52822         -                                                     
#12  Short offline       Completed: read failure       40%     52821         -
#13  Short offline       Completed: read failure       40%     52820         -                                                     
#14  Short offline       Completed: read failure       40%     52819         -                                                     
#15  Short offline       Completed: read failure       40%     52818         -                                                     
#16  Short offline       Completed: read failure       40%     52817         -                                                     
#17  Short offline       Completed: read failure       40%     52816         -                                                     
#18  Short offline       Completed: read failure       40%     52815         -                                                     
#19  Short offline       Completed: read failure       40%     52814         -                                                     
#20  Short offline       Completed: read failure       40%     52813         -                                                     
#21  Short offline       Completed: read failure       40%     52812         -                                                     
                                                                                                                                    
SMART Selective self-test log data structure revision number 1                                                                     
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS                                                                                       
    1        0        0  Not_testing                                                                                               
    2        0        0  Not_testing                                                                                               
    3        0        0  Not_testing                                                                                               
    4        0        0  Not_testing                                                                                               
    5        0        0  Not_testing                                                                                               
Selective self-test flags (0x0):                                                                                                   

SMART Attributes Data Structure revision number: 16                                                                                 
Vendor Specific SMART Attributes with Thresholds:                                                                                  
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE                                    
  1 Raw_Read_Error_Rate     0x000b   099   099   016    Pre-fail  Always       -       131072                                      
  2 Throughput_Performance  0x0005   137   137   054    Pre-fail  Offline      -       79                                          
  3 Spin_Up_Time            0x0007   127   127   024    Pre-fail  Always       -       610 (Average 610)                            
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       59                                          
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0                                            
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0                                            
  8 Seek_Time_Performance   0x0005   119   119   020    Pre-fail  Offline      -       35                                          
  9 Power_On_Hours          0x0012   093   093   000    Old_age   Always       -       52922                                        
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0                                            
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       47                                          
192 Power-Off_Retract_Count 0x0032   099   099   000    Old_age   Always       -       2197                                        
193 Load_Cycle_Count        0x0012   099   099   000    Old_age   Always       -       2197                                        
194 Temperature_Celsius     0x0002   157   157   000    Old_age   Always       -       38 (Min/Max 22/49)                          
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0                                            
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       8                                            
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0                                            
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0                                                                                                                                                                        
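
The two facts that matter here, the serial number and the pending-sector count, can be pulled out of that output with a little awk. A rough sketch, assuming the standard smartctl column layout shown above; `smart_serial` and `smart_raw` are hypothetical helper names, and the sample lines are copied from the output above.

```shell
# Print the serial number from `smartctl -i` (or -a) text on stdin.
smart_serial() {
    awk -F': *' '/^Serial Number:/ { sub(/[ \t]+$/, "", $2); print $2; exit }'
}

# Print the RAW_VALUE (10th column) of the named SMART attribute from
# `smartctl -A` (or -a) text on stdin. Only the first match is printed.
smart_raw() {
    awk -v name="$1" '$2 == name { print $10; exit }'
}

# Sample lines copied from the output above. On the NAS itself:
#   smartctl -a /dev/ada1 | smart_raw Current_Pending_Sector
sample='Serial Number:    PK2338P4GXXJZC
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       8'

serial=$(printf '%s\n' "$sample" | smart_serial)
pending=$(printf '%s\n' "$sample" | smart_raw Current_Pending_Sector)
echo "serial=$serial pending=$pending"

if [ "$pending" -gt 0 ]; then
    echo "drive $serial has $pending sector(s) pending reallocation"
fi
```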



So my thinking is take ada1 offline with
Code:
 zpool offline [-ft] pool device...
e.g.
Code:
 zpool offline CopperMountain ada1 

Question: should I use the -t flag, e.g.
Code:
 zpool offline -t CopperMountain ada1 
so it reverts on reboot?
Question: do I need to identify the drive by gptid, or is ada1 sufficient?

Physically pull drive PK2338P4GXXJZC
Physically replace with a new drive
Then use zpool replace [-fsw] [-o property=value] pool device [new-device] e.g.
Code:
 zpool replace CopperMountain ada1

which should initiate a resilver.

Question: Do I need to pull the old drive and add the new one before I do the replace?
What am I missing?
If anyone has done this before, I'd appreciate the input. I'm new to Linux/FreeBSD.
Thanks
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
Show us the zpool status output in code tags or a screenshot. Also, replacing a drive via the CLI is not super simple; it requires some thought and planning. It looks like you also have a swap partition on that drive, which would need to be taken into account. Search Google for something like "truenas replace drive cli" and you will find some good information. Read those instructions well before making any moves. It's possible, but if your GUI isn't working properly, you have an issue that needs to be resolved.
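
For reference, the sequence usually described in those CLI guides can be sketched as a dry run. This only prints commands and changes nothing. Assumptions not taken from this thread: the 2 GiB swap size (the usual FreeNAS default), the helper name `plan_replace`, the new disk showing up under the same adaX name, and the `<new-rawuuid>` placeholder, which has to be read from `gpart list` once the new disk is partitioned. Check every line against the documentation for your version before running any of it.

```shell
# Print (do not run) a CLI drive-replacement plan.
# usage: plan_replace POOL OLD_MEMBER NEW_DISK
plan_replace() {
    pool=$1 old=$2 disk=$3
    cat <<EOF
zpool offline $pool $old
# power down, swap the physical drive (match the serial number), boot
gpart create -s gpt $disk
gpart add -a 4k -s 2g -t freebsd-swap $disk
gpart add -a 4k -t freebsd-zfs $disk
gpart list $disk    # note the rawuuid of partition 2
zpool replace $pool $old gptid/<new-rawuuid>
zpool status $pool  # the resilver should now be running
EOF
}

plan_replace CopperMountain gptid/6d3a0467-a46e-11e5-bc55-d05099c01bdb ada1
```

Partitioning the new disk first is what accounts for the swap partition; handing the whole bare disk to zpool replace would skip it.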

Also, you could load a NEW USB flash drive with, say, FreeNAS 11.3-U5, boot that as a clean install, and try to import your pool. If it works, then maybe you can replace the drive via the GUI. BTW, FreeNAS 9.3 is fine; I used it for a long time and it's stable, so you can use that again as well if you prefer.

The best option is to back up the data you need to retain to an external device (could be a Windows computer with a few large hard drives) and, once done, install new drives in the NAS and rebuild from scratch. Six years on a hard drive is a long time; it's time to replace them all. For myself, I only have about 2TB of important data (photos, finances, paid software); losing the rest of my stuff would not make me happy, but I'd easily get over it.

Good Luck, I feel you may need it.
 

Dan543

Cadet
Joined
Dec 19, 2021
Messages
4
Will look up the CLI procedures; appreciate the input. It is likely that I did not set things up correctly initially and have other issues to resolve. However, it's been working well for 5-6 years, though I don't do much besides media storage/playback and photo storage. I was hoping to get this box to limp along until I can either get another built or replace the drives with larger ones, as it is at 83% capacity. I just backed up to 3x 10TB Windows drives. I had considered loading a new copy of FreeNAS 9.3 (or 11.3-U5) as a clean install, but was hoping a CLI drive replacement would be straightforward. A clean install and an imported pool might be the simplest path forward. Appreciate the suggestion; having never done this before, I don't have the experience to weigh which path is most straightforward.

Code:
[root@freenas ~]# zpool status |more                                                                                               
  pool: CopperMountain                                                                                                             
 state: ONLINE                                                                                                                     
  scan: scrub repaired 0 in 23h32m with 0 errors on Sun Dec 12 23:32:23 2021                                                       
config:                                                                                                                             
                                                                                                                                    
        NAME                                            STATE     READ WRITE CKSUM                                                 
        CopperMountain                                  ONLINE       0     0     0                                                 
          raidz2-0                                      ONLINE       0     0     0                                                 
            gptid/6ca9f247-a46e-11e5-bc55-d05099c01bdb  ONLINE       0     0     0                                                 
            gptid/6d3a0467-a46e-11e5-bc55-d05099c01bdb  ONLINE       0     0     0                                                 
            gptid/6dc81678-a46e-11e5-bc55-d05099c01bdb  ONLINE       0     0     0                                                 
            gptid/6e5a2806-a46e-11e5-bc55-d05099c01bdb  ONLINE       0     0     0                                                 
            gptid/6ee7970c-a46e-11e5-bc55-d05099c01bdb  ONLINE       0     0     0                                                 
            gptid/6f7d0f50-a46e-11e5-bc55-d05099c01bdb  ONLINE       0     0     0                                                 
            gptid/700b9382-a46e-11e5-bc55-d05099c01bdb  ONLINE       0     0     0                                                 
            gptid/7094cb62-a46e-11e5-bc55-d05099c01bdb  ONLINE       0     0     0                                                 
                                                                                                                                    
errors: No known data errors                                                                                                       
                                                                                                                                    
  pool: freenas-boot                                                                                                               
 state: ONLINE                                                                                                                     
  scan: scrub repaired 0 in 0h0m with 0 errors on Sat Dec 11 03:45:24 2021                                                         
config:                                                                                                                             
                                                                                                                                    
        NAME        STATE     READ WRITE CKSUM                                                                                     
        freenas-boot  ONLINE       0     0     0                                                                                   
          da0p2     ONLINE       0     0     0                                                                                     
                                                                                                                                    
errors: No known data errors                 
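
Note that the pool still reports every member ONLINE with zero errors, even though SMART flags ada1. A small sketch for spotting non-ONLINE rows in zpool status text, assuming the column layout shown above; `not_online` is a hypothetical helper name, and the second sample is an edited copy of the real output with one member marked OFFLINE purely for illustration.

```shell
# Print "name state" for every pool, vdev or disk row whose STATE is not ONLINE.
# Reads `zpool status` text on stdin. Rows are recognised by their three
# numeric READ/WRITE/CKSUM columns, which skips headers and "scan:" lines.
not_online() {
    awk 'NF >= 5 && $3 ~ /^[0-9]/ && $4 ~ /^[0-9]/ && $5 ~ /^[0-9]/ && $2 != "ONLINE" { print $1, $2 }'
}

# Rows copied from the output above.
healthy='        NAME                                            STATE     READ WRITE CKSUM
        CopperMountain                                  ONLINE       0     0     0
          raidz2-0                                      ONLINE       0     0     0
            gptid/6d3a0467-a46e-11e5-bc55-d05099c01bdb  ONLINE       0     0     0'

# Hypothetical: the same rows as they might look after an offline.
degraded='        CopperMountain                                  DEGRADED     0     0     0
          raidz2-0                                      DEGRADED     0     0     0
            gptid/6d3a0467-a46e-11e5-bc55-d05099c01bdb  OFFLINE      0     0     0'

printf '%s\n' "$healthy"  | not_online   # prints nothing
printf '%s\n' "$degraded" | not_online   # prints the three non-ONLINE rows
```

On a live system, `zpool status -x` gives a similar quick answer without any parsing, since it only reports pools that have a problem.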
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
So at face value your pool looks okay. I'd just replace the boot device with a clean install of FreeNAS, do a minimal configuration of your network settings, just enough to access the GUI, then import the pool. If you can access the pool without issue, then you should be able to replace the drive using the proper procedure in the GUI. By all means, read the user guide on how to replace a hard drive; do not wing it.

If you have all your data backed up, you could just replace all your hard drives (yup, costly) with new ones, replace your boot device with a new one (assuming you are using a USB flash drive) and then rebuild your FreeNAS. The computer itself should be fine. Once the system is rebuilt, you would copy your data back to the NAS.

One note: do not upgrade your pool, or you will not be able to roll back to FreeNAS 9.3. My pool is from the FreeNAS 9.x days; I just haven't needed to upgrade it for new features that I would never use.
 

Dan543

Cadet
Joined
Dec 19, 2021
Messages
4
Did as you suggested: loaded 11.3-U5 and imported the 9.3.1 volume/datasets without issue. The GUI on the reloaded OS could see the imported volume and drives. Offlined ada1 in the GUI, shut the system down, physically replaced the failing drive, did the replace in the GUI, and it resilvered without issue. Had to re-establish the Windows shares, but that was all. Not clear what I did wrong the first time; I'm surprised it worked as well as it did for so long. Now to upgrade the storage pool, but I will likely do this on a clean build. Thanks for your help, much appreciated.
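
For anyone following the same path, resilver progress can also be checked from the shell. A sketch, assuming the `scan:` line wording used by ZFS releases of this era; `scan_state` is a hypothetical helper, the first sample line is from the zpool status output earlier in the thread, and the other two are illustrative, not captured from this system.

```shell
# Classify the "scan:" line of `zpool status` text read on stdin.
scan_state() {
    awk '$1 == "scan:" {
        if ($2 == "resilver" && $3 == "in") print "resilver running"
        else if ($2 == "resilvered") print "resilver finished"
        else print "no resilver"
        exit
    }'
}

real='  scan: scrub repaired 0 in 23h32m with 0 errors on Sun Dec 12 23:32:23 2021'
running='  scan: resilver in progress since Thu Dec 30 10:00:00 2021'                     # illustrative
done_line='  scan: resilvered 3.02T in 9h41m with 0 errors on Thu Dec 30 19:41:00 2021'   # illustrative

for line in "$real" "$running" "$done_line"; do
    printf '%s\n' "$line" | scan_state
done
```

On the NAS itself this would be fed from the live command, e.g. `zpool status CopperMountain | scan_state`.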
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
Glad it worked out. No one likes to see anyone lose data, and thankfully you can import your ZFS pool into a clean install of FreeNAS/TrueNAS. I would not recommend that you upgrade your pool when asked; it will take away the option of rolling back to an older version of the software should you desire it. I personally like FreeNAS 11.3 quite a bit, although I am running TrueNAS 12.0-U7 now and can roll back if I find something that doesn't work for me. I'm testing TrueNAS Scale on a separate machine used for testing purposes only, but that version is still for folks who like to test the latest and greatest, not for those who expect a perfectly clean system. That will come in time.

Happy New Year!
 