turgidfoamymaggot
Cadet
- Joined: Jan 20, 2015
- Messages: 6
I'm very green with ZFS as this is my first experience with it, so please bear with me.
Some background: I've inherited a NAS that I'm already fairly certain has serious hardware issues, and it cannot be replaced yet. I did not build this NAS, but it appears to be a series of RAID-1 mirrors created on the LSI RAID card, which are then added to zpools.
This weekend there was an unexpected failure in which several drives dropped offline and entire RAID-1 mirrors were lost. The LSI controller believes the drives have failed, though I suspect either the backplane or the RAID card. Regardless, I have one impacted pool:
Code:
[root@ redacted-hostname] ~# zpool status nfs-vol2
  pool: nfs-vol2
 state: DEGRADED
status: One or more devices are faulted in response to persistent errors.
        Sufficient replicas exist for the pool to continue functioning in a
        degraded state.
action: Replace the faulted device, or use 'zpool clear' to mark the device
        repaired.
  scan: scrub repaired 0 in 5h57m with 0 errors on Sun Dec 28 05:59:17 2014
config:

        NAME                                            STATE     READ WRITE CKSUM
        nfs-vol2                                        DEGRADED     0     0     0
          raidz1-0                                      DEGRADED     0     0     0
            gptid/c2369674-6ec3-11e4-a59f-0025901d2102  ONLINE       0     0     0
            gptid/c25f0288-6ec3-11e4-a59f-0025901d2102  ONLINE       0     0     0
            gptid/c2870e79-6ec3-11e4-a59f-0025901d2102  FAULTED      0    84     0  too many errors
            gptid/c2ae317d-6ec3-11e4-a59f-0025901d2102  ONLINE       0     0     0
          raidz1-1                                      DEGRADED     0     0     0
            gptid/c2d7ce90-6ec3-11e4-a59f-0025901d2102  FAULTED      0    92     0  too many errors
            gptid/c2ff73b2-6ec3-11e4-a59f-0025901d2102  ONLINE       0     0     0
            gptid/c3268d97-6ec3-11e4-a59f-0025901d2102  ONLINE       0     0     0
            gptid/c34a69ae-6ec3-11e4-a59f-0025901d2102  ONLINE       0     0     0

errors: No known data errors
[root@ redacted-hostname] ~#
Neither of the faulted drives exist in /dev:
Code:
[root@ redacted-hostname] ~# ls -l /dev/gptid/c2870e79-6ec3-11e4-a59f-0025901d2102
ls: /dev/gptid/c2870e79-6ec3-11e4-a59f-0025901d2102: No such file or directory
[root@ redacted-hostname] ~# ls -l /dev/gptid/c2d7ce90-6ec3-11e4-a59f-0025901d2102
ls: /dev/gptid/c2d7ce90-6ec3-11e4-a59f-0025901d2102: No such file or directory
[root@ redacted-hostname] ~#
I tried to replace the faulted disks through the GUI, but I get this:
Code:
Error: Disk replacement failed: "cannot replace gptid/c2d7ce90-6ec3-11e4-a59f-0025901d2102 with gptid/e10c1ba4-a107-11e4-b02f-0025901d2102: no such device in pool"
Note that while other disks are displayed as mfid[0-16], the two faulted disks only show the GPTID.
If I try to remove one of the faulted devices from a shell, I get this:
Code:
[root@ redacted-hostname] ~# zpool remove nfs-vol2 gptid/c2870e79-6ec3-11e4-a59f-0025901d2102
cannot remove gptid/c2870e79-6ec3-11e4-a59f-0025901d2102: no such device in pool
Likewise, gpart list does not find the device:
Code:
[root@ redacted-hostname] ~# gpart list | grep "c2870e79-6ec3-11e4-a59f-0025901d2102"
[root@ redacted-hostname] ~#
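From what I've read since, zpool remove only applies to hot spares, cache, and log devices, so a raidz member can't be removed at all; it has to be replaced. This is what I understand the shell version of the replacement is supposed to look like (I have not run this yet, and /dev/mfid17 is just a placeholder for whatever the new disk enumerates as):

```shell
# Take the faulted member offline first. It is already FAULTED, so this
# may be a no-op, but it should not hurt:
zpool offline nfs-vol2 gptid/c2870e79-6ec3-11e4-a59f-0025901d2102

# Replace the old member, referenced by its gptid, with the new disk.
# /dev/mfid17 is a placeholder -- I don't know what the replacement disk
# will show up as on this controller.
zpool replace nfs-vol2 gptid/c2870e79-6ec3-11e4-a59f-0025901d2102 /dev/mfid17

# Watch the resilver progress:
zpool status nfs-vol2
```

Am I on the right track here, or does the replace also fail with "no such device in pool" because the gptid device node is gone?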
I'm sure this is stupidly simple, but I cannot figure out how to replace the device: I can't remove it, because as far as the system is concerned it doesn't exist.
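One idea I haven't tried yet: since the /dev/gptid node is gone, maybe I need to refer to the vdev by its numeric guid instead of its name. My understanding is that zdb dumps the pool configuration including each child vdev's guid, and that zpool replace will accept that guid in place of a device name. A sketch, with both the guid and the new disk name as made-up placeholders:

```shell
# Dump the cached pool config and look for the guid belonging to the
# missing gptid (the guid line appears near the path line in the output):
zdb -C nfs-vol2 | grep -B 3 "c2870e79-6ec3-11e4-a59f-0025901d2102"

# Then replace by guid. 1234567890123456789 is a placeholder for the
# real guid, and /dev/mfid17 for the new disk:
zpool replace nfs-vol2 1234567890123456789 /dev/mfid17
```

Is that the accepted way to do this when the underlying device node has vanished?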
I gather that RAIDZ-1 is not recommended, yet here we are. Blowing away the storage pool is not an option at this time.