SOLVED Bad cable or disk?

Status
Not open for further replies.

dak180

Patron
Joined
Nov 22, 2017
Messages
310
This is the first time anything like this has happened to me, so please let me know if any additional data would be useful.

Code:
########## ZPool status report for jails ##########

  pool: jails
 state: DEGRADED
status: One or more devices are faulted in response to persistent errors.
   Sufficient replicas exist for the pool to continue functioning in a
   degraded state.
action: Replace the faulted device, or use 'zpool clear' to mark the device
   repaired.
  scan: scrub repaired 1M in 0 days 00:10:38 with 0 errors on Mon Jul 30 18:35:59 2018
config:

   NAME                                            STATE     READ WRITE CKSUM
   jails                                           DEGRADED     0     0     0
     mirror-0                                      DEGRADED     0     0    18
       gptid/63e81d14-0a8b-11e8-99ee-d05099c13d03  ONLINE       0     0    18
       gptid/641b960c-0a8b-11e8-99ee-d05099c13d03  FAULTED     44   571   348  too many errors

errors: No known data errors


Code:
########## SMART status report for ada2 drive (Crucial/Micron MX1/2/300, M5/600,: 174319680F70) ##########
smartctl 6.6 2017-11-05 r4594 [FreeBSD 11.1-STABLE amd64] (local build)

SMART overall-health self-assessment test result: PASSED

ID# ATTRIBUTE_NAME		  FLAG	 VALUE WORST THRESH TYPE	  UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate	 0x002f   100   100   000	Pre-fail  Always	   -	   0
  5 Reallocate_NAND_Blk_Cnt 0x0032   100   100   010	Old_age   Always	   -	   0
  9 Power_On_Hours		  0x0032   100   100   000	Old_age   Always	   -	   4405
 12 Power_Cycle_Count	   0x0032   100   100   000	Old_age   Always	   -	   54
171 Program_Fail_Count	  0x0032   100   100   000	Old_age   Always	   -	   0
172 Erase_Fail_Count		0x0032   100   100   000	Old_age   Always	   -	   0
173 Ave_Block-Erase_Count   0x0032   099   099   000	Old_age   Always	   -	   27
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000	Old_age   Always	   -	   47
183 SATA_Interfac_Downshift 0x0032   100   100   000	Old_age   Always	   -	   0
184 Error_Correction_Count  0x0032   097   097   000	Old_age   Always	   -	   3
187 Reported_Uncorrect	  0x0032   100   100   000	Old_age   Always	   -	   0
194 Temperature_Celsius	 0x0022   073   061   000	Old_age   Always	   -	   27 (Min/Max 15/39)
196 Reallocated_Event_Count 0x0032   100   100   000	Old_age   Always	   -	   0
197 Current_Pending_Sector  0x0032   100   100   000	Old_age   Always	   -	   0
198 Offline_Uncorrectable   0x0030   100   100   000	Old_age   Offline	  -	   0
199 UDMA_CRC_Error_Count	0x0032   100   100   000	Old_age   Always	   -	   33
202 Percent_Lifetime_Used   0x0030   099   099   001	Old_age   Offline	  -	   1
206 Write_Error_Rate		0x000e   100   100   000	Old_age   Always	   -	   0
246 Total_Host_Sector_Write 0x0032   100   100   000	Old_age   Always	   -	   6248436116
247 Host_Program_Page_Count 0x0032   100   100   000	Old_age   Always	   -	   195263633
248 Bckgnd_Program_Page_Cnt 0x0032   100   100   000	Old_age   Always	   -	   94229631
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000	Pre-fail  Always	   -	   1255
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000	Old_age   Always	   -	   0

No Errors Logged

Test_Description	Status				  Remaining  LifeTime(hours)  LBA_of_first_error
Extended offline	Completed without error	   00%	  4337		 -
Short offline	   Completed without error	   00%	  4405		 -


Code:
########## SMART status report for ada3 drive (Crucial/Micron MX1/2/300, M5/600,: 17431967E36D) ##########
smartctl 6.6 2017-11-05 r4594 [FreeBSD 11.1-STABLE amd64] (local build)

SMART overall-health self-assessment test result: PASSED

ID# ATTRIBUTE_NAME		  FLAG	 VALUE WORST THRESH TYPE	  UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate	 0x002f   100   100   000	Pre-fail  Always	   -	   0
  5 Reallocate_NAND_Blk_Cnt 0x0032   100   100   010	Old_age   Always	   -	   0
  9 Power_On_Hours		  0x0032   100   100   000	Old_age   Always	   -	   4404
 12 Power_Cycle_Count	   0x0032   100   100   000	Old_age   Always	   -	   54
171 Program_Fail_Count	  0x0032   100   100   000	Old_age   Always	   -	   0
172 Erase_Fail_Count		0x0032   100   100   000	Old_age   Always	   -	   0
173 Ave_Block-Erase_Count   0x0032   099   099   000	Old_age   Always	   -	   27
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000	Old_age   Always	   -	   48
183 SATA_Interfac_Downshift 0x0032   100   100   000	Old_age   Always	   -	   0
184 Error_Correction_Count  0x0032   099   099   000	Old_age   Always	   -	   1
187 Reported_Uncorrect	  0x0032   100   100   000	Old_age   Always	   -	   0
194 Temperature_Celsius	 0x0022   071   061   000	Old_age   Always	   -	   29 (Min/Max 16/39)
196 Reallocated_Event_Count 0x0032   100   100   000	Old_age   Always	   -	   0
197 Current_Pending_Sector  0x0032   100   100   000	Old_age   Always	   -	   0
198 Offline_Uncorrectable   0x0030   100   100   000	Old_age   Offline	  -	   0
199 UDMA_CRC_Error_Count	0x0032   100   100   000	Old_age   Always	   -	   2
202 Percent_Lifetime_Used   0x0030   099   099   001	Old_age   Offline	  -	   1
206 Write_Error_Rate		0x000e   100   100   000	Old_age   Always	   -	   0
246 Total_Host_Sector_Write 0x0032   100   100   000	Old_age   Always	   -	   6253246412
247 Host_Program_Page_Count 0x0032   100   100   000	Old_age   Always	   -	   195413954
248 Bckgnd_Program_Page_Cnt 0x0032   100   100   000	Old_age   Always	   -	   92737450
180 Unused_Reserve_NAND_Blk 0x0033   000   000   000	Pre-fail  Always	   -	   1260
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000	Old_age   Always	   -	   0

No Errors Logged

Test_Description	Status				  Remaining  LifeTime(hours)  LBA_of_first_error
Extended offline	Completed without error	   00%	  4336		 -
Short offline	   Completed without error	   00%	  3978		 -
 

kdragon75

Wizard
Joined
Aug 7, 2016
Messages
2,457
It's hard to say based on that. You could remove the faulted drive, wipe it, re-add it to the pool (the same drive), and swap the cables. If the resilver runs without errors, I would say it was the cable. If the same drive faults again, it's a bad drive.
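
One hint already present in the SMART reports above is attribute 199, UDMA_CRC_Error_Count, which generally counts transfer errors on the SATA link rather than problems in the flash itself. Its raw value is 33 on ada2 and 2 on ada3, while the media-related counters (5, 187, 196, 197, 198) are 0 on both drives. That leans toward a cable or port problem, though it is not proof. A minimal Python sketch of that comparison follows, assuming the attribute rows are pasted in as text; the function names and the grouping of attributes into "link" and "media" are illustrative assumptions, not anything provided by smartctl.

```python
# Hypothetical helper: split smartctl attribute rows into link vs. media counters.
LINK_IDS = {199}                      # UDMA_CRC_Error_Count (SATA link errors)
MEDIA_IDS = {5, 187, 196, 197, 198}   # reallocation / uncorrectable counters


def parse_attributes(text):
    """Return {attribute_id: raw_value} from smartctl attribute-table rows."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows start with a numeric ID and carry the raw value in column 10.
        if len(fields) >= 10 and fields[0].isdigit() and fields[9].isdigit():
            attrs[int(fields[0])] = int(fields[9])
    return attrs


def summarize(attrs):
    """Return (link_errors, media_errors) as sums of the raw values."""
    link = sum(attrs.get(i, 0) for i in LINK_IDS)
    media = sum(attrs.get(i, 0) for i in MEDIA_IDS)
    return link, media


# A few rows copied from the ada2 report in this thread.
ADA2 = """
  5 Reallocate_NAND_Blk_Cnt 0x0032   100   100   010  Old_age   Always   -   0
187 Reported_Uncorrect      0x0032   100   100   000  Old_age   Always   -   0
197 Current_Pending_Sector  0x0032   100   100   000  Old_age   Always   -   0
199 UDMA_CRC_Error_Count    0x0032   100   100   000  Old_age   Always   -   33
"""

link_errors, media_errors = summarize(parse_attributes(ADA2))
print(link_errors, media_errors)
```

A nonzero link count alongside zero media counts is only a rule of thumb; the cable swap and resilver described in this post remain the real test.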
 

dak180

I swapped out the SATA cables and switched the ports; when I booted the machine back up, it started resilvering by itself and, once finished, showed no more errors.
 

kdragon75

Keep a close eye on it.
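
SMART raw counters such as UDMA_CRC_Error_Count are typically cumulative, so the 33 already logged on ada2 should not be expected to drop after a cable swap; what matters is whether the value keeps rising. A minimal sketch of that check, assuming raw values are recorded at two points in time (the function name and the dict layout are hypothetical, and the "later" reading is a placeholder rather than real data):

```python
def counters_that_grew(before, after):
    """Return {attribute_name: increase} for counters whose raw value rose."""
    return {
        name: after[name] - before[name]
        for name in before
        if name in after and after[name] > before[name]
    }


# Baseline taken from the ada2 report in this thread.
baseline = {"UDMA_CRC_Error_Count": 33, "Reported_Uncorrect": 0}
# Placeholder for a later reading; replace with values from a fresh report.
later = {"UDMA_CRC_Error_Count": 33, "Reported_Uncorrect": 0}

print(counters_that_grew(baseline, later))
```

An empty result means nothing has grown since the baseline; any entry that shows up is worth a closer look alongside a fresh `zpool status`.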
 

dak180

Yep, I also have that pool's data replicated to my main pool.
 