Multiple drives with errors in RAIDZ2 pool

timpj5

Cadet
Joined
Sep 18, 2015
Messages
9
Hey guys, I have a Dell R510 with 12 3TB drives connected to an M1015 HBA card. I recently rebuilt this server on this R510 from a white box and simply moved the drives and rebuilt the ZFS pool. Some of these drives are a little long in the tooth but had been pretty good up until now.

I started noticing performance issues a few days ago and everything was running super slow. I looked at the smartctl output and saw that /dev/da0 and /dev/da8 both had pending sector errors. Both also have high power_on_hours, so they've lived a good life. I removed /dev/da0, replaced it with another drive, and started a resilver, only to see it take forever. When I looked into it, I found that the server keeps panicking and rebooting. To compound the issue, I now have 2 more drives throwing pending sector errors, and it appears that access to /dev/da2 is what caused at least the last panic.

None of the data on this server is irreplaceable even though it's taken a while to accumulate. But that said, I'd rather not lose it all. Should I consider removing /dev/da2 from the equation and running 2 resilvers at the same time knowing I could lose all my data? I'm just not sure the resilver is ever going to complete if I leave /dev/da2 in the mix...

Thanks in advance
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
"Should I consider removing /dev/da2 from the equation and running 2 resilvers at the same time knowing I could lose all my data?"
If you have a good local backup that makes recovery relatively easy, you might. I have resilvered two drives at once in the same vdev. It can be done, but if another drive fails during the resilver, you can suffer a total loss. Did you do any burn-in testing on these disks before creating the pool?

There are also some monitoring scripts you might want to plan on running:

GitHub repository for FreeNAS scripts, including disk burnin
https://www.ixsystems.com/community...for-freenas-scripts-including-disk-burnin.28/

Building, Burn-In, and Testing your FreeNAS system
https://www.ixsystems.com/community/resources/building-burn-in-and-testing-your-freenas-system.38/
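
As a rough sketch of the kind of check a monitoring script might perform (this is not code from the linked resources), the Python below pulls the raw Current_Pending_Sector count out of `smartctl -A` output. It assumes the usual ten-column ATA attribute table. The function names and the sample text are hypothetical, and the sample is not output from the drives in this thread.

```python
import subprocess
from typing import Optional


def read_attributes(device: str) -> str:
    """Return the text of `smartctl -A <device>` (needs smartmontools and root)."""
    # smartctl uses a bitmask exit status, so a nonzero code is not treated as fatal.
    result = subprocess.run(
        ["smartctl", "-A", device], capture_output=True, text=True, check=False
    )
    return result.stdout


def pending_sectors(smartctl_output: str) -> Optional[int]:
    """Return the raw Current_Pending_Sector value, or None if the row is absent."""
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Columns: ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        if len(fields) >= 10 and fields[1] == "Current_Pending_Sector":
            return int(fields[9])
    return None


# Made-up sample in the usual attribute-table layout, for illustration only.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  9 Power_On_Hours          0x0032   040   040   000    Old_age   Always       -       43800
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       8
"""

if __name__ == "__main__":
    print(pending_sectors(SAMPLE))
```

On a live system you would pass `read_attributes("/dev/da0")` to `pending_sectors` for each disk and flag any drive where the result is greater than zero.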
 

Chris Moore

Hall of Famer
Joined
May 2, 2015
Messages
10,080
PS. If you have bad drives that are in different vdevs, you can replace one drive in each vdev with no difficulty at all. I have a server at work with 124 drives divided among 18 vdevs and I had five resilvers running at the same time in that system last month. It does slow the system down a little but the failure domain is the vdev.
If you made the whole pool one vdev, you only have one failure domain, but if you made two vdevs of six drives each, then up to four drives (two in each vdev) could fail without losing the pool.
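
The failure-domain arithmetic above can be sketched in a few lines of Python. This is only an illustration: `pool_survives` is a hypothetical helper, and it assumes every vdev in the pool has the same parity level (2 for RAIDZ2). It says nothing about how the pool in this thread is actually laid out.

```python
from typing import List


def pool_survives(parity: int, failed_per_vdev: List[int]) -> bool:
    """True if no vdev has lost more drives than its parity level allows.

    parity: drives each vdev can lose (2 for RAIDZ2).
    failed_per_vdev: number of failed drives in each vdev of the pool.
    """
    # The pool is lost as soon as any single vdev exceeds its parity.
    return all(failed <= parity for failed in failed_per_vdev)


if __name__ == "__main__":
    # Two six-drive RAIDZ2 vdevs, two failures in each: four failed drives, pool intact.
    print(pool_survives(2, [2, 2]))
    # One twelve-drive RAIDZ2 vdev with three failures: pool lost.
    print(pool_survives(2, [3]))
    # Two vdevs, but three failures land in the same one: pool lost.
    print(pool_survives(2, [3, 0]))
```

The last case is the reason "up to four" is a best case: the extra tolerance only applies when the failures are spread across vdevs.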
 