SOLVED Error replacing failed drive (self-inflicted)

Status
Not open for further replies.

linux203

Cadet
Joined
Apr 26, 2012
Messages
5
I had a drive fail in a raidz1 array. The drive shows as UNAVAIL in the GUI. I've inserted the new drive and attempted to replace it using the GUI dialog. The GUI reports no error, but the drive is not replaced. (There is an error in /var/log/messages.)

I used the GUI to wipe the drive and retried the replace: same error.
I used dd to wipe the drive and retried the replace: same error.

Code:
Jul  7 20:31:16 hostname notifier: 1+0 records in
Jul  7 20:31:16 hostname notifier: 1+0 records out
Jul  7 20:31:16 hostname notifier: 1048576 bytes transferred in 0.036323 secs (28868044 bytes/sec)
Jul  7 20:31:16 hostname notifier: dd: /dev/ada4: short write on character device
Jul  7 20:31:16 hostname notifier: dd: /dev/ada4: end of device
Jul  7 20:31:16 hostname notifier: 5+0 records in
Jul  7 20:31:16 hostname notifier: 4+1 records out
Jul  7 20:31:16 hostname notifier: 4677632 bytes transferred in 0.016457 secs (284232182 bytes/sec)
Jul  7 20:31:16 hostname notifier: swapoff: /dev/ada4p1.eli: No such file or directory
Jul  7 20:31:16 hostname notifier: geli: No such device: /dev/ada4p1.
Jul  7 20:31:16 hostname manage.py: [middleware.exceptions:38] [MiddlewareError: Disk replacement failed: "cannot open 'gptid/a50b2125-2508-11e5-948b-6805ca026820': no such GEOM provider, must be a full path or shorthand device name, "]


Code:
[root@hostname] ~# gpart show ada4
=>        34  5860533101  ada4  GPT  (2.7T)
          34          94        - free -  (47k)
         128     4194304     1  freebsd-swap  (2.0G)
     4194432  5856338696     2  freebsd-zfs  (2.7T)
  5860533128           7        - free -  (3.5k)


FreeNAS-9.3-STABLE-201506292130
Gigabyte F2A88XM-D3H
AMD A6-6400K APU
32GB Crucial DDR3 1600
4x Seagate Constellation ES 2TB (ST32000644NS) as raidz1
4x Seagate Barracuda Green ST2000DL003 as raidz1
(one Barracuda Green died, I'm replacing all four with Seagate NAS HDD 3TB ST3000VN000)

What can I do to get the replacement drive online and resilvering?
 

linux203

Cadet
Joined
Apr 26, 2012
Messages
5
I used zpool replace -f disk3 14335425896406145796 ada4p2 to get the drive to resilver.

The resilver completed successfully, so I decided to OFFLINE another drive for replacement. I removed the offlined drive and installed another (brand new) Seagate NAS HDD drive. I selected the offlined drive and clicked Replace, then selected ada6 from the dialog and clicked Replace Drive. The following error appeared in /var/log/messages.

Code:
Jul  8 18:33:18 hostname notifier: 1+0 records in
Jul  8 18:33:18 hostname notifier: 1+0 records out
Jul  8 18:33:18 hostname notifier: 1048576 bytes transferred in 0.034483 secs (30408320 bytes/sec)
Jul  8 18:33:18 hostname notifier: dd: /dev/ada6: short write on character device
Jul  8 18:33:18 hostname notifier: dd: /dev/ada6: end of device
Jul  8 18:33:18 hostname notifier: 5+0 records in
Jul  8 18:33:18 hostname notifier: 4+1 records out
Jul  8 18:33:18 hostname notifier: 4677632 bytes transferred in 0.015089 secs (310002064 bytes/sec)
Jul  8 18:33:19 hostname notifier: swapoff: /dev/ada6p1.eli: No such file or directory
Jul  8 18:33:19 hostname notifier: geli: No such device: /dev/ada6p1.
Jul  8 18:33:19 hostname manage.py: [middleware.exceptions:38] [MiddlewareError: Disk replacement failed: "cannot open 'gptid/551bd6c2-25c1-11e5-8dfc-6805ca026820': no such GEOM provider, must be a full path or shorthand device name, "]


Again, I used zpool replace -f disk3 5099684438736902907 ada6p2 and the drive is resilvering.

At this point, I think there is probably a problem with either that pool or my FreeNAS installation.
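Both replacements above pass a raw partition name (ada4p2, ada6p2) to zpool replace, whereas the middleware error refers to a gptid label. As a rough sketch only, the Python below looks up the gptid label for a partition in text captured from `glabel status` and builds a replace command that uses the label instead. The sample text, the helper names, and the pairing of that gptid with ada4p2 are assumptions for illustration, not output from this system; the -f flag used above is left out.

```python
# Hypothetical sample of `glabel status` output. The first gptid is the one
# from the log above; pairing it with ada4p2 is an assumption. The second
# line is a made-up placeholder.
GLABEL_SAMPLE = """\
                                      Name  Status  Components
gptid/a50b2125-2508-11e5-948b-6805ca026820     N/A  ada4p2
gptid/00000000-0000-0000-0000-000000000001     N/A  ada4p1
"""


def gptid_for(partition, glabel_output):
    """Return the gptid/... label whose component is `partition`, or None."""
    for line in glabel_output.splitlines():
        fields = line.split()
        # Data lines have three columns: Name, Status, Components.
        if (len(fields) == 3
                and fields[0].startswith("gptid/")
                and fields[2] == partition):
            return fields[0]
    return None


def replace_command(pool, old_guid, partition, glabel_output):
    """Build a `zpool replace` argument list that targets the gptid label."""
    label = gptid_for(partition, glabel_output)
    if label is None:
        raise LookupError("no gptid label found for " + partition)
    return ["zpool", "replace", pool, old_guid, label]


if __name__ == "__main__":
    cmd = replace_command("disk3", "14335425896406145796", "ada4p2", GLABEL_SAMPLE)
    print(" ".join(cmd))
```

If the lookup raises, no gptid label is visible for that partition, which would be consistent with the "no such GEOM provider" message in the log.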
 

DrKK

FreeNAS Generalissimo
Joined
Oct 15, 2013
Messages
3,630
I'll bring this to the devs' attention.

This could be a bug or something.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Considering that you're replacing a gptid (which is what FreeNAS is designed to use) with adaXp2, I'm betting you've done other stuff from the CLI that has manifested in this error. We do not recommend doing anything from the CLI you can't do from the WebGUI. Yes, I know you're going to say "but I couldn't replace it from the WebGUI", but the fact you were so quick to jump to the CLI without asking makes me think you've done other stuff from the CLI. Probably stuff you felt wasn't overly critical to be 100% correct, and now it's backfiring.

But I seriously doubt this is a bug. If it were a problem we'd have lots of complaints. Things like disk replacement bugs are going to be hit fast and by many users simultaneously. You are but a single user with this problem.
 

linux203

Cadet
Joined
Apr 26, 2012
Messages
5
Found the source of the issue: System Tunables set through the GUI ;)

I had the following parameters set.
kern.geom.label.disk_ident.enable=0
kern.geom.label.gpt.enable=1
kern.geom.label.gptid.enable=1

I had enabled them out of laziness over a year ago. SMART errors report device names, while my pools were showing GUIDs, and I did not want to work out which pool each failing drive was in.
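For anyone facing the same mismatch between device names in SMART errors and labels in the pool listing, the lookup can be done without touching any tunables. The sketch below is only an illustration: it joins text captured from `glabel status` and `zpool status` to report which pool each partition belongs to. Both samples and the function names are hypothetical, and the pairing of the gptid from the earlier log with ada4p2 is an assumption.

```python
# Hypothetical excerpts; real text would come from `glabel status` and
# `zpool status` on the affected system.
GLABEL = """\
                                      Name  Status  Components
gptid/a50b2125-2508-11e5-948b-6805ca026820     N/A  ada4p2
"""

ZPOOL = """\
  pool: disk3
 state: ONLINE
config:

    NAME                                            STATE     READ WRITE CKSUM
    disk3                                           ONLINE       0     0     0
      raidz1-0                                      ONLINE       0     0     0
        gptid/a50b2125-2508-11e5-948b-6805ca026820  ONLINE       0     0     0
"""


def labels_to_partitions(glabel_output):
    """Map each gptid/... label to the partition it sits on."""
    mapping = {}
    for line in glabel_output.splitlines():
        fields = line.split()
        if len(fields) == 3 and fields[0].startswith("gptid/"):
            mapping[fields[0]] = fields[2]
    return mapping


def pool_for_each_partition(zpool_output, glabel_output):
    """Return {partition: pool} for every labelled member found in the pools."""
    labels = labels_to_partitions(glabel_output)
    pool = None
    result = {}
    for line in zpool_output.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "pool:" and len(fields) > 1:
            pool = fields[1]  # remember which pool the next members belong to
        elif fields[0] in labels and pool is not None:
            result[labels[fields[0]]] = pool
    return result


if __name__ == "__main__":
    for partition, pool in pool_for_each_partition(ZPOOL, GLABEL).items():
        print(partition, "is a member of", pool)
```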
 

Bidule0hm

Server Electronics Sorcerer
Joined
Aug 5, 2013
Messages
3,710
Next time you can use my script to map drive identification info. You can find it via the "Useful Scripts" link in my signature if you want it ;)
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
So my philosophy of "don't tune crap.. the defaults work just fine for 99.99999% of users" is still proving very much relevant. ;)
 