Drive Removed Issue...

Status: Not open for further replies.

centex99

Dabbler | Joined: Jul 29, 2012 | Messages: 45
So, I've copied over the problem I had originally posted in the storage section...

I had a disk change its status to "Removed", so I decided to shut down the server, check the cables, and turn it back on. It now seems to be "ok", except it shows this warning: "WARNING: The volume NAS_VOL (ZFS) status is ONLINE: One or more devices has experienced an unrecoverable error. An attempt was made to correct the error. Applications are unaffected. Determine if the device needs to be replaced, and clear the errors using 'zpool clear' or replace the device with 'zpool replace'."

zpool status -v reveals:
  pool: NAS_VOL
 state: ONLINE
status: One or more devices has experienced an unrecoverable error. An
        attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://illumos.org/msg/ZFS-8000-9P
  scan: resilvered 988K in 0h0m with 0 errors on Fri Jan 31 13:48:50 2014
config:

        NAME                                            STATE     READ WRITE CKSUM
        NAS_VOL                                         ONLINE       0     0     0
          raidz1-0                                      ONLINE       0     0     0
            gptid/8138d265-d902-11e1-b300-f46d0471c67b  ONLINE       0     0     0
            gptid/6b5683d1-ef98-11e1-992b-f46d0471c67b  ONLINE       0     0     0
            gptid/826c766e-d902-11e1-b300-f46d0471c67b  ONLINE       0     0     0
            gptid/8308e26e-d902-11e1-b300-f46d0471c67b  ONLINE       0     0     0
            gptid/83a636ea-d902-11e1-b300-f46d0471c67b  ONLINE       0     0     1

errors: No known data errors
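In the config section above, every counter is 0 except a single checksum error on the last gptid, which is the disk that had dropped out. On a pool with more disks, a minimal Python sketch can pick out the rows with non-zero counters. It assumes the column layout shown here and plain integer counters (zpool can abbreviate large counts, e.g. "1.2K", which this sketch does not parse); the function name is hypothetical, and the sample is the output above trimmed to two disks.

```python
import re
from typing import List, Tuple

# One config row: name, state, then the READ / WRITE / CKSUM counters.
ROW = re.compile(r"^\s*(\S+)\s+([A-Z]+)\s+(\d+)\s+(\d+)\s+(\d+)\s*$")


def devices_with_errors(status_text: str) -> List[Tuple[str, int, int, int]]:
    """Return (name, read, write, cksum) for every config row (pool, vdev or disk) with a non-zero counter."""
    hits = []
    in_config = False
    for line in status_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("NAME"):
            in_config = True  # header row: the device table starts on the next line
            continue
        if stripped.startswith("errors:"):
            break  # end of the device table
        if not in_config:
            continue
        m = ROW.match(line)
        if m is None:
            continue  # blank or non-table line
        name, _state, read, write, cksum = m.groups()
        counts = (int(read), int(write), int(cksum))
        if any(counts):
            hits.append((name, *counts))
    return hits


SAMPLE = """\
config:

        NAME                                            STATE     READ WRITE CKSUM
        NAS_VOL                                         ONLINE       0     0     0
          raidz1-0                                      ONLINE       0     0     0
            gptid/8138d265-d902-11e1-b300-f46d0471c67b  ONLINE       0     0     0
            gptid/83a636ea-d902-11e1-b300-f46d0471c67b  ONLINE       0     0     1

errors: No known data errors
"""

if __name__ == "__main__":
    print(devices_with_errors(SAMPLE))
    # [('gptid/83a636ea-d902-11e1-b300-f46d0471c67b', 0, 0, 1)]
```

A flagged disk is the one the alert has in mind when it says to clear the errors using 'zpool clear'. Once you've decided the disk is healthy, that command takes the pool name and, optionally, the device, e.g. zpool clear NAS_VOL gptid/83a636ea-d902-11e1-b300-f46d0471c67b.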



The last disk is the one that showed "Removed" earlier; it's ada4.
Also, my ada0 disk has been having a few issues: it reports "Device: /dev/ada0, 8 Offline uncorrectable sectors".
I've been keeping an eye on this, and the number has not increased from 8. Do I need to replace this drive? What about the ada4 drive?
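Watching whether that count grows can be automated. The following is a minimal sketch, assuming the alerts arrive as text lines in the wording quoted above; the function names and the second device's readings are hypothetical.

```python
import re
from typing import Dict, Iterable, List, Optional, Tuple

# Matches the alert wording quoted in the post, e.g.
# "Device: /dev/ada0, 8 Offline uncorrectable sectors"
ALERT = re.compile(r"Device: (\S+), (\d+) Offline uncorrectable sectors?")


def parse_alert(msg: str) -> Optional[Tuple[str, int]]:
    """Return (device, count) from one alert line, or None if it doesn't match."""
    m = ALERT.search(msg)
    if m is None:
        return None
    return m.group(1), int(m.group(2))


def growing_devices(alerts: Iterable[str]) -> Dict[str, List[int]]:
    """Collect counts per device in arrival order; keep only devices whose count ever went up."""
    history: Dict[str, List[int]] = {}
    for msg in alerts:
        parsed = parse_alert(msg)
        if parsed is not None:
            device, count = parsed
            history.setdefault(device, []).append(count)
    return {
        device: counts
        for device, counts in history.items()
        if any(later > earlier for earlier, later in zip(counts, counts[1:]))
    }


if __name__ == "__main__":
    alerts = [
        "Device: /dev/ada0, 8 Offline uncorrectable sectors",
        "Device: /dev/ada0, 8 Offline uncorrectable sectors",
        # Hypothetical second disk whose count is climbing.
        "Device: /dev/ada1, 2 Offline uncorrectable sectors",
        "Device: /dev/ada1, 5 Offline uncorrectable sectors",
    ]
    print(growing_devices(alerts))  # {'/dev/ada1': [2, 5]}
```

A flat count like ada0's is usually read as less alarming than a rising one, but those sectors are still unreadable. Whether to replace the disk is the judgment call this thread is debating, and the sketch doesn't settle it.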


And the drive just went "Removed" again (after about 36 hours). Is this more likely due to the controller or the drive? This drive hasn't shown any other signs of failure...
Are there any particular ways to figure out which one is causing the issue?
 

centex99

Does anyone have recommendations on how to figure out what's causing the problem? I have the server off right now to reduce the risk of data loss. Since the drive will likely show up online again once I turn the server back on, I was thinking I could swap the ada3 and ada4 cables and see whether the problem happens again, and whether it follows the disk or not...
 

centex99

Well... I swapped the cables; it quickly resilvered and is back in the warning condition. Now I'll wait and see if (and which) drive goes offline...
 

cyberjock

Inactive Account | Joined: Mar 25, 2012 | Messages: 19,526
ada0 should be replaced if it's showing as removed at random intervals. 8 bad sectors can cause a drive to get disconnected from the system due to deep-level error recovery.

Also, you are running RAIDZ1, which is playing with fire. Hope you don't get burned...
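A back-of-the-envelope estimate helps show why RAIDZ1 is risky. While one disk of a RAIDZ1 vdev is missing or resilvering, no parity is left, so an unreadable sector on any surviving disk (ada0 already reports 8) cannot be rebuilt from parity. The sketch below uses the simplistic model in which every bit read fails independently at the datasheet unrecoverable-read-error rate. The 1-in-10^14 rate is the figure commonly quoted for consumer drives, and the 2 TB disk size is hypothetical, since the drive sizes are never stated in this post.

```python
import math


def p_clean_resilver(surviving_disks: int, disk_bytes: float, ure_per_bit: float) -> float:
    """Probability that reading every surviving disk end to end hits no unrecoverable
    read error, assuming each bit fails independently with probability ure_per_bit."""
    bits_read = surviving_disks * disk_bytes * 8
    # exp(n * log1p(-p)) equals (1 - p) ** n, computed without losing precision for tiny p.
    return math.exp(bits_read * math.log1p(-ure_per_bit))


if __name__ == "__main__":
    # Hypothetical: 5-disk RAIDZ1 of 2 TB drives, so 4 survivors are read during a resilver,
    # at the commonly quoted consumer-drive rate of 1 error per 1e14 bits.
    print(f"{p_clean_resilver(4, 2e12, 1e-14):.2f}")  # 0.53
```

ZFS resilvers only allocated blocks, and real-world error rates are often argued to be well below the datasheet figure. The result is therefore a pessimistic illustration of the trend rather than a prediction for this pool.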
 

centex99

ada0 isn't getting removed at random intervals; ada4 is. ada4 got removed, I inspected everything, and then it got removed a second time. I've since disconnected and reconnected everything, swapping the ada3/ada4 cables, and it has been 8+ hours now with still no disconnect.
As for ada0, it has seemed fine apart from the 8 bad sectors. I can replace it if you think that's best; otherwise I'll leave it. I'd like to figure out the "removed" issue before replacing it, though, because I don't want a disk to get removed in the middle of the resilver that replaces ada0.
 

centex99

So, back to the "disk removed" problem... I didn't encounter the error for over a week, and then today I decided to put the case side back on... and voilà, the disk went "removed" again, and it followed the disk. That tells me it wasn't the SATA cable or the SATA port, but perhaps the disk or its power cable (I didn't swap those). Will a disk that has overheated get changed to "removed" status? It's weird, because I had this up and running for over two years without this problem occurring.
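The cable swap is an elimination test: swap the data cables of two disks, then check whether the fault stays with the port or moves with the disk. A minimal sketch of that reasoning, assuming you record which disk (e.g. by serial number) dropped to REMOVED and which port it was on at the time; the Observation type and the example labels are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Observation:
    failing_disk: str  # which physical disk dropped to REMOVED (e.g. its serial number)
    failing_port: str  # which SATA port / data cable it was on at the time


def suspects(before: Observation, after: Observation) -> List[str]:
    """Compare where the fault appeared before and after swapping two disks' data cables."""
    if before.failing_disk == after.failing_disk and before.failing_port != after.failing_port:
        # The fault moved with the disk, so the data cable and port it left behind are cleared.
        return ["disk", "anything that moved with the disk (e.g. its power cable)"]
    if before.failing_port == after.failing_port and before.failing_disk != after.failing_disk:
        # The fault stayed on the port even though a different disk is attached now.
        return ["SATA data cable", "controller port"]
    return ["inconclusive: the swap did not separate disk from port"]


if __name__ == "__main__":
    # Hypothetical labels: "disk B" was on ada4 and, after the swap, shows up as ada3.
    print(suspects(Observation("disk B", "ada4"), Observation("disk B", "ada3")))
```

In this thread the fault moved with the disk, which is the first branch. Because the power cables stayed with their disks, swapping those as well would be the natural next step to separate the last two suspects.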
 

Rikketyrik

Cadet | Joined: Jul 22, 2015 | Messages: 1
I am running into this EXACT issue... ada4 randomly getting removed, and 8 Offline uncorrectable sectors on ada0. Did you ever find a solution? No such thing as coincidences, eh? My config has 5 drives (three 2TB and two 1.5TB drives). ada0, ada1 and ada2 are in a RAIDZ (RAID 5); ada3 and ada4 are in a mirror.

SMART tests on both drives come back as "Completed without error". I wipe ada4 with zeros, it resilvers, and then a few days or weeks later it goes into a removed state again. I've run other third-party tools on ada4, and all come back negative. I can't find anything wrong with ada4.

ada0 passes all tests as well.
 