Wiping a disk from a volume and resilvering it again

Status
Not open for further replies.

doman18

Dabbler
Joined
Oct 13, 2017
Messages
26
I. Introduction
I have an MSI A88XM-E35 with an SSD plus 4x2TB disks (Seagate 2TB 64MB SkyHawk) in RAID10. Yesterday I had problems with my disks being detected. At random, some of them showed up in UEFI and others did not; sometimes a different set was missing, and sometimes none showed up at all. I had this issue before, but I thought it was gone. After some fiddling around, my computer stopped loading UEFI altogether. I replaced one of the RAM modules and finally everything started to work. I have seen all my disks on every reboot since, and FreeNAS stopped complaining about a degraded volume.

II. Problem
After a while FreeNAS reported an "unrecoverable error". I am fairly sure this was caused by yesterday's problems, so I ran a short test on the failed disk and it passed. The disks are fairly new (less than 5 months since unboxing).
What I want to do is detach this disk, wipe it, add it back to the volume, and resilver. Can I do that from the FreeNAS GUI?

Code:
root@freenas:~ # smartctl -l selftest /dev/ada1
smartctl 6.6 2017-11-05 r4594 [FreeBSD 11.1-STABLE amd64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description	Status				  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline	   Completed without error	   00%	   562		 -

root@freenas:~ # smartctl -a /dev/ada1
smartctl 6.6 2017-11-05 r4594 [FreeBSD 11.1-STABLE amd64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:	 ST2000VX008-2E3164
Serial Number:	Z526NJSP
LU WWN Device Id: 5 000c50 0b2d2196f
Firmware Version: CV12
User Capacity:	2,000,398,934,016 bytes [2.00 TB]
Sector Sizes:	 512 bytes logical, 4096 bytes physical
Rotation Rate:	5900 rpm
Form Factor:	  3.5 inches
Device is:		Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-2, ACS-3 T13/2161-D revision 3b
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:	Tue Nov 20 08:26:17 2018 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82)	Offline data collection activity
					was completed without error.
					Auto Offline Data Collection: Enabled.
Self-test execution status:	  (   0)	The previous self-test routine completed
					without error or no self-test has ever
					been run.
Total time to complete Offline
data collection:		 (   97) seconds.
Offline data collection
capabilities:			 (0x7b) SMART execute Offline immediate.
					Auto Offline data collection on/off support.
					Suspend Offline collection upon new
					command.
					Offline surface scan supported.
					Self-test supported.
					Conveyance Self-test supported.
					Selective Self-test supported.
SMART capabilities:			(0x0003)	Saves SMART data before entering
					power-saving mode.
					Supports SMART auto save timer.
Error logging capability:		(0x01)	Error logging supported.
					General Purpose Logging supported.
Short self-test routine
recommended polling time:	 (   1) minutes.
Extended self-test routine
recommended polling time:	 ( 257) minutes.
Conveyance self-test routine
recommended polling time:	 (   2) minutes.
SCT capabilities:			(0x10b9)	SCT Status supported.
					SCT Error Recovery Control supported.
					SCT Feature Control supported.
					SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   113   099   006    Pre-fail  Always       -       56248736
  3 Spin_Up_Time            0x0003   096   095   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       23
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   100   253   030    Pre-fail  Always       -       327889
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       563
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       23
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   097   097   000    Old_age   Always       -       3
190 Airflow_Temperature_Cel 0x0022   072   070   045    Old_age   Always       -       28 (Min/Max 25/28)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       23
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       24
194 Temperature_Celsius     0x0022   028   040   000    Old_age   Always       -       28 (0 22 0 0 0)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description	Status				  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline	   Completed without error	   00%	   562		 -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
	1		0		0  Not_testing
	2		0		0  Not_testing
	3		0		0  Not_testing
	4		0		0  Not_testing
	5		0		0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
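
As a reading aid for the long listing above, the attributes most commonly watched for failing media or cabling (IDs 5, 187, 197, 198 and 199) can be filtered out of the smartctl output. This is only a sketch: the function name `smart_watchlist` and the choice of IDs are assumptions made for illustration, not something smartctl itself defines.

```shell
# Read `smartctl -A` (or `smartctl -a`) output on stdin and print
# "ID NAME RAW_VALUE" for a hypothetical watchlist of attributes.
# The second condition skips unrelated lines that merely start with
# one of these numbers (e.g. the selective self-test span table).
smart_watchlist() {
  awk '$1 ~ /^(5|187|197|198|199)$/ && $2 ~ /^[A-Za-z]/ { print $1, $2, $10 }'
}
```

It could be run as `smartctl -A /dev/ada1 | smart_watchlist`. In the listing above, all five of these attributes have a raw value of 0.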

 

Attachments

  • frenasproblem.jpeg

garm

Wizard
Joined
Aug 19, 2017
Messages
1,556
Start by running a scrub to try to clear up the checksum errors. You have sufficient redundancy in the mirror.
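
From a shell, the scrub suggested here could be run roughly as follows. This is a sketch under assumptions, not a FreeNAS-endorsed procedure (scrubs can also be started from the GUI): the default pool name `raid10` is taken from the `zpool status` output posted further down in this thread, and the function names are made up for illustration.

```shell
# Start a scrub on the pool and show its status.
# `zpool scrub` returns immediately; the scrub runs in the background.
scrub_and_report() {
  pool="${1:-raid10}"
  zpool scrub "$pool" || return 1
  zpool status -v "$pool"
}

# Reset the READ/WRITE/CKSUM counters. Only sensible once the scrub
# has finished and reported 0 errors.
clear_errors() {
  zpool clear "${1:-raid10}"
}
```

Because the scrub is asynchronous, `zpool status` has to be re-run until the `scan:` line reports completion before the counters are cleared.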
 

doman18

Dabbler
Joined
Oct 13, 2017
Messages
26
Before you answered, I found information about detaching disks. I followed that approach and detached the failed drive. I wiped it with zeros, but now I can't attach it again.

Code:
root@freenas:~ # zpool status raid10
  pool: raid10
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error.  An
	attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
	using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://illumos.org/msg/ZFS-8000-9P
  scan: resilvered 160M in 0 days 00:00:04 with 0 errors on Mon Nov 19 09:25:54 2018
config:

	NAME                                            STATE     READ WRITE CKSUM
	raid10                                          DEGRADED     0     0     0
	  mirror-0                                      DEGRADED     0     0     0
	    gptid/6f34e208-d842-11e8-9345-d8cb8a565c07  DEGRADED     0     0    93  too many errors
	    gptid/6ff2c639-d842-11e8-9345-d8cb8a565c07  ONLINE       0     0     0
	  mirror-1                                      ONLINE       0     0     0
	    gptid/70b25097-d842-11e8-9345-d8cb8a565c07  ONLINE       0     0     0
	    gptid/717b3dae-d842-11e8-9345-d8cb8a565c07  ONLINE       0     0     0

errors: No known data errors
root@freenas:~ # zpool detach raid10 gptid/6f34e208-d842-11e8-9345-d8cb8a565c07
root@freenas:~ # zpool status raid10
  pool: raid10
 state: ONLINE
  scan: resilvered 160M in 0 days 00:00:04 with 0 errors on Mon Nov 19 09:25:54 2018
config:

	NAME                                            STATE     READ WRITE CKSUM
	raid10                                          ONLINE       0     0     0
	  gptid/6ff2c639-d842-11e8-9345-d8cb8a565c07    ONLINE       0     0     0
	  mirror-1                                      ONLINE       0     0     0
	    gptid/70b25097-d842-11e8-9345-d8cb8a565c07  ONLINE       0     0     0
	    gptid/717b3dae-d842-11e8-9345-d8cb8a565c07  ONLINE       0     0     0

errors: No known data errors
## here I wiped the disk and tried to attach it again
root@freenas:~ # zpool attach raid10 gptid/6ff2c639-d842-11e8-9345-d8cb8a565c07 gptid/6f34e208-d842-11e8-9345-d8cb8a565c07
cannot open 'gptid/6f34e208-d842-11e8-9345-d8cb8a565c07': no such GEOM provider
must be a full path or shorthand device name
root@freenas:~ # glabel status
                                      Name  Status  Components
                              label/efibsd     N/A  ada0p1
gptid/23d02115-d83f-11e8-8143-d8cb8a565c07     N/A  ada0p1
gptid/6ff2c639-d842-11e8-9345-d8cb8a565c07     N/A  ada2p2
gptid/70b25097-d842-11e8-9345-d8cb8a565c07     N/A  ada3p2
gptid/717b3dae-d842-11e8-9345-d8cb8a565c07     N/A  ada4p2
gptid/6fdf87b1-d842-11e8-9345-d8cb8a565c07     N/A  ada2p1
root@freenas:~ # zpool attach raid10 gptid/6ff2c639-d842-11e8-9345-d8cb8a565c07 gptid/6fdf87b1-d842-11e8-9345-d8cb8a565c07
cannot attach gptid/6fdf87b1-d842-11e8-9345-d8cb8a565c07 to gptid/6ff2c639-d842-11e8-9345-d8cb8a565c07: device is too small
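
One possible reading of the two errors above: `gptid/6f34e208-...` no longer exists because zeroing the disk (ada1, the one tested with smartctl) also destroyed its partition table, and `gptid/6fdf87b1-...` is not on that disk at all. According to the `glabel status` listing it is ada2p1, the first partition of the surviving mirror member, which on a default FreeNAS layout is typically the small swap partition; that would explain "device is too small". A minimal filter to see which partition sits behind each label, with hypothetical function names:

```shell
# Print "label partition" for every gptid label found in
# `glabel status` output read on stdin.
gptid_map() {
  awk '$1 ~ /^gptid\// { print $1, $3 }'
}

# Print the partition behind one specific gptid label.
gptid_partition() {
  gptid_map | awk -v want="$1" '$1 == want { print $2 }'
}
```

For the listing above, `glabel status | gptid_partition gptid/6fdf87b1-d842-11e8-9345-d8cb8a565c07` prints `ada2p1`.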
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
doman18 said: "Before you answered, I found information about detaching disks."
Because you are not supposed to "detach" a drive from a mirror. It makes it not a mirror. Now that vdev doesn't have any redundancy. If a drive has failed, it has failed; you replace it, you don't try to fix it.
There is no way to add a mirror to a single-disk vdev through the GUI, and the GUI is what you should have been using all along. There are very few tasks that should be done at the command line.
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
None of this should have been done from the command line. Why are you not using the GUI?
 

doman18

Dabbler
Joined
Oct 13, 2017
Messages
26
Because I used to run other servers that use ZFS (e.g. Proxmox), where all ZFS work was done in the CLI. Also, OpenMediaVault, which I used before FreeNAS, had a ZFS plugin that didn't expose all functionality in the GUI (building RAID10, for example), so using the CLI was normal practice.

Chris Moore said: "If a drive is failed, it is failed, you replace it, you don't try to fix it."
Well, I just didn't believe the drive had failed after 2-3 months of work. I thought that if SMART is OK, it's just a matter of refreshing the data on it from the second (good) drive in the mirror.

The manual says that it is possible to attach and detach devices in existing pools, so I don't know why it should be a problem in this case.
https://docs.oracle.com/cd/E19253-01/819-5461/gcfhe/index.html
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
doman18 said: "Because I used to run other servers that use ZFS (e.g. Proxmox), where all ZFS work was done in the CLI. Also, OpenMediaVault, which I used before FreeNAS, had a ZFS plugin that didn't expose all functionality in the GUI (building RAID10, for example), so using the CLI was normal practice."
Well, you are using FreeNAS (not freenas) now and most things need to be done from the GUI if you want it to work properly.
doman18 said: "Well, I just didn't believe the drive had failed after 2-3 months of work."
Still, you should not have done a detach, because now you have to create a new mirror and that can only be completed from the command line. You could have simply shut the NAS down, removed the drive, formatted it in another computer, deleted the partition table, returned it to the NAS, and used the GUI to replace the 'missing' disk, because it would have been seen as a different disk after being wiped in another system.
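
For reference, the command-line route mentioned here could look roughly like the sketch below. It only prints a plan and executes nothing, since the commands are destructive. Several things are assumptions rather than facts from this thread: the function name `plan_reattach`, the 2 GiB swap partition and 4k alignment (chosen to resemble the layout FreeNAS usually creates through the GUI), and ada1 being the wiped disk (it is the disk tested with smartctl and the only one missing from the `glabel status` listing).

```shell
# PRINT (do not run) the commands for turning the single-disk vdev
# back into a mirror. Arguments: pool, surviving member, blank disk.
# Defaults are the names that appear in this thread.
plan_reattach() {
  pool="${1:-raid10}"
  keep="${2:-gptid/6ff2c639-d842-11e8-9345-d8cb8a565c07}"
  disk="${3:-ada1}"
  # 1) new GPT, 2) swap partition, 3) ZFS partition,
  # 4) look up the new partition's rawuuid, 5) attach it as a mirror.
  cat <<EOF
gpart create -s gpt $disk
gpart add -a 4k -s 2g -t freebsd-swap $disk
gpart add -a 4k -t freebsd-zfs $disk
gpart list $disk | grep rawuuid
zpool attach $pool $keep gptid/<rawuuid of ${disk}p2>
EOF
}
```

Calling `plan_reattach` with no arguments prints the plan for the pool in this thread. The placeholder in the last line has to be filled in by hand from the `gpart list` output; once `zpool attach` succeeds, ZFS starts resilvering the new member on its own.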
 

doman18

Dabbler
Joined
Oct 13, 2017
Messages
26
OK, now I know. I just took a 1TB drive; I will copy all the data to it and rebuild the volume. Unfortunately, after adding the new drive I am getting an error:

Code:
This is a freenas data disk and can not boot system. System halted.


Even if I disconnect the new drive, I get this message. Oh man... a day of failures... I will have to rebuild everything from scratch tomorrow.

Thanks for the help.
 

toadman

Guru
Joined
Jun 4, 2013
Messages
619
It looks like the hard drive boot order changed in the BIOS. That message is telling you the system cannot boot from a data drive (which is true). You should simply have to change the boot order in the BIOS and put your boot device back to #1.
 

doman18

Dabbler
Joined
Oct 13, 2017
Messages
26
If the boot order weren't correct, I think I would get a message from BIOS/UEFI, not from FreeNAS itself. Of course I checked the order in UEFI anyway and it was OK (the SATA controller was first priority and SATA0 with the SSD was also first). I even tried to boot the SSD from the UEFI boot menu. As I expected, that didn't help either. So I reinstalled FreeNAS with the options to wipe data and keep the config. This was a godsend, as it saved me some time. After FreeNAS booted, I copied my data to the backup drive, detached the volume, wiped the disks (QUICK), and rebuilt the volume.

Thanks all for the help.
 