Resilver issues

joriz

Cadet
Joined
Jun 28, 2019
Messages
6
Hi All,

I'm using 3 file servers running FreeNAS. Some time ago one of the servers showed a degraded pool, so I replaced the faulty disk and started the resilver process.
During the resilver, the status of the new disk changed to FAULTED and the pool still showed as degraded. The pool status showed 55 (errors?) in the Write column for the disk; Read and Checksum were both 0.

Screenshot 2020-04-08 at 15.17.00.png


Things I've tested:

  1. Connected the new disk to a Windows machine and ran a surface scan --> all OK
  2. Tried to resilver it again after the surface scan --> failed
  3. Inspected the disk port and the port of the tray --> looks OK
I have now inserted the disk in a different tray of the server and want to try a resilver again, but I don't know which line I have to pick to start the replace/resilver.
Because I have tried to resilver it several times, the pool status is showing multiple lines.

Screenshot 2020-04-08 at 16.01.01.png


System specs:
FreeNAS-11.2-U7
2 x SAS2008 HBA
8 x 6TB WD RED (2 x Raidz2 pools)
Chassis with 16 port backplane
Intel(R) Xeon(R) CPU E5-1620 v3 @ 3.50GHz (8 cores)
64GB memory
Supermicro mainboard.

The faulty device name is da7.

Errors I see in the logs during the resilver process:

Code:
Apr  6 20:53:50  ZFS: vdev state changed, pool_guid=1085976887892895734 vdev_guid=16393062259941390275
Apr  6 20:54:04  GEOM_MIRROR: Cannot open consumer da6p1 (error=1).
Apr  6 20:54:04  GEOM_MIRROR: Device swap0 destroyed.
Apr  6 22:51:01  (da7:mps1:0:17:0): WRITE(16). CDB: 8a 00 00 00 00 02 50 52 da 50 00 00 00 20 00 00
Apr  6 22:51:01  (da7:mps1:0:17:0): CAM status: SCSI Status Error
Apr  6 22:51:01  (da7:mps1:0:17:0): SCSI status: Check Condition
Apr  6 22:51:01  (da7:mps1:0:17:0): SCSI sense: ILLEGAL REQUEST asc:21,0 (Logical block address out of range)
Apr  6 22:51:01  (da7:mps1:0:17:0): Info: 0x25052da50
Apr  6 22:51:01  (da7:mps1:0:17:0): Error 22, Unretryable error
Apr  6 22:51:10      (da7:mps1:0:17:0): WRITE(16). CDB: 8a 00 00 00 00 02 ba a0 f0 90 00 00 00 10 00 00 length 8192 SMID 1038 terminated ioc 804b loginfo 31080000 scsi 0 state 0 xfer 0
Apr  6 22:51:10      (da7:mps1:0:17:0): WRITE(16). CDB: 8a 00 00 00 00 02 ba a0 f2 90 00 00 00 10 00 00 length 8192 SMID 186 terminated ioc 804b l(da7:mps1:0:17:0): WRITE(16). CDB: 8a 00 00 00 00 02 ba a0 f0 90 00 00 00 10 00 00
Apr  6 22:51:10  oginfo 31080000 scsi 0 state 0 xfer 0
Apr  6 22:51:10  (da7:mps1:0:17:0): CAM status: CCB request completed with an error
Apr  6 22:51:10  (da7:mps1:0:17:0): Retrying command
Apr  6 22:51:10  (da7:mps1:0:17:0): WRITE(16). CDB: 8a 00 00 00 00 02 ba a0 f2 90 00 00 00 10 00 00
Apr  6 22:51:10  (da7:mps1:0:17:0): CAM status: CCB request completed with an error
Apr  6 22:51:10  (da7:mps1:0:17:0): Retrying command
Apr  6 22:51:10  (da7:mps1:0:17:0): WRITE(10). CDB: 2a 00 00 40 02 90 00 00 10 00
Apr  6 22:51:10  (da7:mps1:0:17:0): CAM status: SCSI Status Error
Apr  6 22:51:10  (da7:mps1:0:17:0): SCSI status: Check Condition
Apr  6 22:51:10  (da7:mps1:0:17:0): SCSI sense: ILLEGAL REQUEST asc:21,0 (Logical block address out of range)
Apr  6 22:51:10  (da7:mps1:0:17:0): Info: 0x400290
Apr  6 22:51:10  (da7:mps1:0:17:0): Error 22, Unretryable error
Apr  6 22:51:18  (da7:mps1:0:17:0): WRITE(16). CDB: 8a 00 00 00 00 02 ba a0 f2 90 00 00 00 10 00 00
Apr  6 22:51:18  (da7:mps1:0:17:0): CAM status: SCSI Status Error
Apr  6 22:51:18  (da7:mps1:0:17:0): SCSI status: Check Condition
Apr  6 22:51:18  (da7:mps1:0:17:0): SCSI sense: ILLEGAL REQUEST asc:21,0 (Logical block address out of range)
Apr  6 22:51:18  (da7:mps1:0:17:0): Info: 0x2baa0f290
Apr  6 22:51:18  (da7:mps1:0:17:0): Error 22, Unretryable error
Apr  6 22:51:26  (da7:mps1:0:17:0): WRITE(16). CDB: 8a 00 00 00 00 01 34 43 d8 c8 00 00 00 58 00 00
Apr  6 22:51:26  (da7:mps1:0:17:0): CAM status: SCSI Status Error
Apr  6 22:51:26  (da7:mps1:0:17:0): SCSI status: Check Condition
Apr  6 22:51:26  (da7:mps1:0:17:0): SCSI sense: ILLEGAL REQUEST asc:21,0 (Logical block address out of range)
Apr  6 22:51:26  (da7:mps1:0:17:0): Info: 0x13443d8c8
Apr  6 22:51:26  (da7:mps1:0:17:0): Error 22, Unretryable error
Apr  6 22:51:29  ZFS: vdev state changed, pool_guid=1085976887892895734 vdev_guid=16393062259941390275
Apr  6 23:05:46  (da7:mps1:0:17:0): WRITE(16). CDB: 8a 00 00 00 00 02 45 df 61 c8 00 00 00 50 00 00
Apr  6 23:05:46  (da7:mps1:0:17:0): CAM status: SCSI Status Error
Apr  6 23:05:46  (da7:mps1:0:17:0): SCSI status: Check Condition
Apr  6 23:05:46  (da7:mps1:0:17:0): SCSI sense: ILLEGAL REQUEST asc:21,0 (Logical block address out of range)
Apr  6 23:05:46  (da7:mps1:0:17:0): Info: 0x245df61c8
Apr  6 23:05:46  (da7:mps1:0:17:0): Error 22, Unretryable error
Apr  6 23:05:55      (da7:mps1:0:17:0): WRITE(10). CDB: 2a 00 00 40 02 90 00 00 10 00 length 8192 SMID 340 terminated ioc 804b loginfo 31080000 scsi 0 state 0 xfer 0
Apr  6 23:05:55      (da7:mps1:0:17:0): WRITE(16). CDB: 8a 00 00 00 00 02 ba a0 f0 90 00 00 00 10 00 00 length 8192 SMID 887 terminated ioc 804b l(da7:mps1:0:17:0): WRITE(10). CDB: 2a 00 00 40 02 90 00 00 10 00
Apr  6 23:05:55  oginfo 31080000 scsi 0 state 0 xfer 0
Apr  6 23:05:55  (da7:mps1:0:17:0): CAM status: CCB request completed with an error
Apr  6 23:05:55  (da7:mps1:0:17:0): Retrying command
Apr  6 23:05:55      (da7:mps1:0:17:0): WRITE(16). CDB: 8a 00 00 00 00 02 ba a0 f2 90 00 00 00 10 00 00 length 8192 SMID 668 terminated ioc 804b loginfo 31080000 scsi 0 state 0 xfer 0
Apr  6 23:05:55      (da7:mps1:0:17:0): WRITE(16). CDB: 8a 00 00 00 00 02 45 df 63 20 00 00 00 f8 00 00 length 126976 SMID 514 terminated ioc 804b loginfo 31080000 scsi 0 state 0 xfer 0
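
A quick way to sanity-check the "Logical block address out of range" messages above is to decode the CDB bytes from the log and compare the target LBA with the size of the disk. The sketch below is only an illustration: `decode_write_cdb` and `in_range` are hypothetical helper names, and `ASSUMED_SECTORS_6TB` is an assumption (the usual sector count for a 6 TB drive with 512-byte logical sectors), not a value taken from this system. Use the sector count your own drive reports instead.

```python
from typing import Tuple

# ASSUMPTION: typical sector count for a 6 TB drive with 512-byte logical
# sectors. Replace with the value your drive actually reports.
ASSUMED_SECTORS_6TB = 11_721_045_168


def decode_write_cdb(cdb_hex: str) -> Tuple[int, int]:
    """Return (starting LBA, block count) for a WRITE(10) or WRITE(16) CDB."""
    cdb = bytes.fromhex(cdb_hex)
    opcode = cdb[0]
    if opcode == 0x8A and len(cdb) == 16:
        # WRITE(16): bytes 2-9 hold the LBA, bytes 10-13 the transfer length
        lba = int.from_bytes(cdb[2:10], "big")
        blocks = int.from_bytes(cdb[10:14], "big")
    elif opcode == 0x2A and len(cdb) == 10:
        # WRITE(10): bytes 2-5 hold the LBA, bytes 7-8 the transfer length
        lba = int.from_bytes(cdb[2:6], "big")
        blocks = int.from_bytes(cdb[7:9], "big")
    else:
        raise ValueError("not a WRITE(10) or WRITE(16) CDB")
    return lba, blocks


def in_range(lba: int, blocks: int, total_sectors: int) -> bool:
    """True if the whole write fits inside a disk of total_sectors sectors."""
    return lba + blocks <= total_sectors


if __name__ == "__main__":
    # CDBs copied from the log lines above
    for cdb in (
        "8a 00 00 00 00 02 50 52 da 50 00 00 00 20 00 00",
        "8a 00 00 00 00 02 ba a0 f2 90 00 00 00 10 00 00",
        "2a 00 00 40 02 90 00 00 10 00",
    ):
        lba, blocks = decode_write_cdb(cdb)
        ok = in_range(lba, blocks, ASSUMED_SECTORS_6TB)
        print(f"LBA {lba:#x}, {blocks} blocks, within assumed capacity: {ok}")
```

The decoded LBAs match the `Info:` values in the log. If the assumed capacity is right for this drive, all of these writes land inside the disk, which would mean the drive is rejecting addresses it should accept. That points at the drive rather than at the tray, cable or HBA.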
 

Ddog 800

Cadet
Joined
Jul 13, 2015
Messages
6
OK, so it's not just me! I also just had a drive failure and also just replaced it. During resilver, I keep getting this exact same error and the resilver restarts. Over and over again. Each time, it gets to around 30-35% before failing. I'm quite at a loss. I actually ordered two drives and they both do the exact same thing. I've tried swapping cables, etc. Still the same thing.

I won't even post my dmesg output or anything because it's the exact same thing you've got right here, as well as the write errors in the GUI/zpool status.

I'm also running a SAS2008 HBA (LSI 9211-8i in my case) and 8x 3TB WD Red drives.

Not sure if it matters, but all of the old drives are WD30EFRX-68EUZN0 while both of the new drives are WD30EFAX-68JH4N0. I'm really starting to run out of ideas here.
 

Ddog 800

Cadet
Joined
Jul 13, 2015
Messages
6
Just to add some extra detail, here are my full specs:

FreeNAS 11.3 U1
1x LSI 9211-8i (SAS2008 chipset)
6x WD Red 3TB
2x Cable Matters SFF-8087 -> 4x SATA breakout cable
Intel Core i7-4790 @ 3.60GHz
Asus Z97-C
32GB memory
 

radovan

Cadet
Joined
Apr 13, 2020
Messages
5
Hello,
you've just run into the crappy hard drive problem. Welcome to the club. I have 5 WD RED 4TB EFAX drives with the same problem.

More documented here:

It would perhaps be a good idea if FreeNAS stopped recommending WD Reds on their website (https://www.freenas.org/hardware-requirements/)
 

nokia88

Cadet
Joined
Jul 18, 2018
Messages
4
Thank you radovan.
Just checked the part number with smartctl: WD60EFAX
Shame on you Western Digital for selling a NAS drive like this...
 

JaimieV

Guru
Joined
Oct 12, 2012
Messages
742
I've had more WD Reds fail than any other flavour of drive, over the last five years. I don't know what sort of NAS/longevity they were aiming for, but they missed.
 

Ddog 800

Cadet
Joined
Jul 13, 2015
Messages
6
OK, thanks for the confirmation! I found a couple of similar threads last night after I posted this. Definitely looking like these newer WD Red drives are crap. Looks like the WD Gold drives are pretty close in price to the Red Pro drives, so I'm going to order a couple of those.

I'll post my results to confirm success once they arrive and I get one installed and resilvered.
 

Ddog 800

Cadet
Joined
Jul 13, 2015
Messages
6
So just to provide a confirmation to my situation, I received a replacement WD Gold drive and installed it. The drive was resilvered in about 4 hours with no errors or issues at all.

As if we need further confirmation at this point now that this story has been making the rounds in the news. :D

If anyone does need to stick with regular WD Reds, you're better off ensuring you purchase one of the EFRX models, if possible, and avoid EFAX like the plague! There's probably still some old stock out there. I imagine WD will do what they can to fix these compatibility problems with the various RAID platforms that the EFAX drives are currently breaking, but it seems like they (and other manufacturers) are still going to try and push forward with the much slower SMR technology. I guess we'll have to wait and see how it goes.
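
For anyone checking a batch of drives, here is a minimal sketch of the EFRX/EFAX check. It only flags the models people in this thread reported trouble with. The helper names are made up for illustration, `WD40EFAX` is inferred from the "4TB EFAX" report above, and the EFAX suffix alone should not be treated as conclusive for other capacities, so this is not a complete list of affected drives.

```python
import re
from typing import Optional

# ASSUMPTION: limited to the capacities reported as failing in this thread
# (3 TB, 4 TB and 6 TB EFAX). Not an authoritative list.
REPORTED_PROBLEM_MODELS = {"WD30EFAX", "WD40EFAX", "WD60EFAX"}


def base_model(model: str) -> Optional[str]:
    """Pull the base part number (e.g. WD30EFAX) out of a model string."""
    match = re.search(r"WD\d+[A-Z]{4}", model.upper())
    return match.group(0) if match else None


def reported_in_thread(model: str) -> bool:
    """True if the model matches one reported as failing resilvers here."""
    return base_model(model) in REPORTED_PROBLEM_MODELS


if __name__ == "__main__":
    # Model strings quoted earlier in the thread
    for model in ("WD30EFRX-68EUZN0", "WD30EFAX-68JH4N0", "WD60EFAX"):
        flag = "reported problems" if reported_in_thread(model) else "not reported"
        print(f"{model}: {flag}")
```

Feed it the model string that smartctl shows for each disk before you start a resilver onto it.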
 

radovan

Cadet
Joined
Apr 13, 2020
Messages
5
I've opened a support case with WD; they offered an RMA and exchange for the EFRX version. So that's probably the way to go.
 