FreeNAS 9.3: Attempting to OFFLINE a disk causes system to crash

Status
Not open for further replies.

scott2500uk

Dabbler
Joined
Nov 17, 2014
Messages
37
Hi all,

I have the following disk setup:
Screen Shot 2018-10-08 at 15.11.13.png


Disk da9 has been throwing errors:

Code:
(da9:mps0:0:17:0): Info: 0x1c99caf7

(da9:mps0:0:17:0): READ(10). CDB: 28 00 1c 99 cc 10 00 01 00 00 length 131072 SMID 313 terminated ioc 804b scsi 0 state 0 xfer 0
(da9:mps0:0:17:0): READ(10). CDB: 28 00 00 40 02 90 00 00 10 00 length 8192 SMID 333 terminated ioc 804b scsi 0 state 0 xfer 0
(da9:mps0:0:17:0): READ(16). CDB: 88 00 00 00 00 01 d1 c0 ba 90 00 00 00 10 00 00 length 8192 SMID 795 terminated ioc 804b scsi 0 state 0 xfer 0
(da9:mps0:0:17:0): READ(16). CDB: 88 00 00 00 00 01 d1 c0 bc 90 00 00 00 10 00 00 length 8192 SMID 62 terminated ioc 804b scsi 0 state 0 xfer 0
(da9:mps0:0:17:0): READ(10). CDB: 28 00 1c 99 cb 10 00 01 00 00 (da9:mps0:0:17:0): Info: 0x1c99cb77

(da9:mps0:0:17:0): WRITE(10). CDB: 2a 00 1c 99 cb 10 00 00 c0 00 length 98304 SMID 417 terminated ioc 804b scsi 0 state 0 xfer 0
(da9:mps0:0:17:0): READ(10). CDB: 28 00 1c 99 d6 10 00 01 00 00 length 131072 SMID 660 terminated ioc 804b scsi 0 state 0 xfer 0
(da9:mps0:0:17:0): READ(10). CDB: 28 00 1c 99 d5 10 00 01 00 00 (da9:mps0:0:17:0): Info: 0x1c99d5b7

(da9:mps0:0:17:0): READ(10). CDB: 28 00 1c 99 d7 10 00 01 00 00 length 131072 SMID 716 terminated ioc 804b scsi 0 state 0 xfer 0
(da9:mps0:0:17:0): READ(10). CDB: 28 00 1c 99 d6 10 00 01 00 00 (da9:mps0:0:17:0): Info: 0x1c99d6b7

(da9:mps0:0:17:0): READ(10). CDB: 28 00 1c 99 e0 90 00 01 00 00 length 131072 SMID 730 terminated ioc 804b scsi 0 state 0 xfer 0
(da9:mps0:0:17:0): READ(10). CDB: 28 00 1c 99 df 90 00 01 00 00 (da9:mps0:0:17:0): Info: 0x1c99e070

(da9:mps0:0:17:0): READ(10). CDB: 28 00 1c 99 e1 90 00 01 00 00 length 131072 SMID 307 terminated ioc 804b scsi 0 state 0 xfer 0
(da9:mps0:0:17:0): READ(10). CDB: 28 00 1c 99 e0 90 00 01 00 00 (da9:mps0:0:17:0): Info: 0x1c99e177

(da9:mps0:0:17:0): READ(10). CDB: 28 00 1c 97 71 10 00 01 00 00 length 131072 SMID 343 terminated ioc 804b scsi 0 state 0 xfer 0
(da9:mps0:0:17:0): READ(10). CDB: 28 00 1c 97 70 10 00 01 00 00 (da9:mps0:0:17:0): Info: 0x1c9770b7

(da9:mps0:0:17:0): READ(10). CDB: 28 00 1c 97 72 10 00 01 00 00 length 131072 SMID 430 terminated ioc 804b scsi 0 state 0 xfer 0
(da9:mps0:0:17:0): WRITE(10). CDB: 2a 00 d3 0a 02 88 00 00 40 00 length 32768 SMID 1010 terminated ioc 804b scsi 0 state 0 xfer 0
(da9:mps0:0:17:0): WRITE(10). CDB: 2a 00 d3 0a 02 c8 00 00 40 00 length 32768 SMID 187 terminated ioc 804b scsi 0 state 0 xfer 0
(da9:mps0:0:17:0): WRITE(10). CDB: 2a 00 d3 0a 03 08 00 00 40 00 length 32768 SMID 82 terminated ioc 804b scsi 0 state 0 xfer 0
(da9:mps0:0:17:0): WRITE(10). CDB: 2a 00 d3 0a 03 48 00 00 40 00 length 32768 SMID 83 terminated ioc 804b scsi 0 state 0 xfer 0
(da9:mps0:0:17:0): READ(16). CDB: 88 00 00 00 00 01 d1 c0 ba 90 00 00 00 10 00 00 length 8192 SMID 237 terminated ioc 804b scsi 0 state 0 xfer 0
(da9:mps0:0:17:0): WRITE(10). CDB: 2a 00 d3 0a 03 88 00 00 40 00 length 32768 SMID 1017 terminated ioc 804b scsi 0 state 0 xfer 0
(da9:mps0:0:17:0): WRITE(10). CDB: 2a 00 d3 0a 03 c8 00 00 40 00 length 32768 SMID 178 terminated ioc 804b scsi 0 state 0 xfer 0
(da9:mps0:0:17:0): READ(16). CDB: 88 00 00 00 00 01 d1 c0 bc 90 00 00 00 10 00 00 length 8192 SMID 792 terminated ioc 804b scsi 0 state 0 xfer 0
(da9:mps0:0:17:0): WRITE(10). CDB: 2a 00 d3 0a 01 88 00 00 40 00 length 32768 SMID 613 terminated ioc 804b scsi 0 state 0 xfer 0
(da9:mps0:0:17:0): WRITE(10). CDB: 2a 00 d3 0a 01 c8 00 00 40 00 length 32768 SMID 463 terminated ioc 804b scsi 0 state 0 xfer 0
(da9:mps0:0:17:0): WRITE(10). CDB: 2a 00 d3 0a 02 08 00 00 40 00 length 32768 SMID 663 terminated ioc 804b scsi 0 state 0 xfer 0
(da9:mps0:0:17:0): WRITE(10). CDB: 2a 00 d3 0a 02 48 00 00 40 00 length 32768 SMID 529 terminated ioc 804b scsi 0 state 0 xfer 0
(da9:mps0:0:17:0): READ(10). CDB: 28 00 1c 97 71 10 00 01 00 00 (da9:mps0:0:17:0): Info: 0x1c977167


Despite all the errors, FreeNAS is keeping the drive ONLINE.
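For anyone reading a log like the one above, here is a rough Python sketch that decodes the CDB bytes into a starting LBA and a block count. It assumes the standard SCSI READ/WRITE(10) and READ/WRITE(16) layouts and 512-byte logical sectors. The sector size is an assumption, but it agrees with the `length` values in the log. The function name `decode_line` and the regex are illustrative only and are not part of any FreeNAS tool.

```python
import re

# Matches the command part of an mps(4)-style line, e.g.
# (da9:mps0:0:17:0): READ(10). CDB: 28 00 1c 99 cc 10 00 01 00 00 length ...
CMD_RE = re.compile(
    r"\((?P<dev>\w+):\w+:[\d:]+\): (?P<op>READ|WRITE)\((?P<size>10|16)\)\. "
    r"CDB: (?P<cdb>[0-9a-f]{2}(?: [0-9a-f]{2}\b)+)"
)

SECTOR_BYTES = 512  # assumption: 512-byte logical sectors


def decode_line(line):
    """Return a dict describing the SCSI command in one log line, or None."""
    m = CMD_RE.search(line)
    if m is None:
        return None
    cdb = bytes.fromhex(m["cdb"].replace(" ", ""))
    if m["size"] == "10":
        # 10-byte CDB: LBA in bytes 2-5, transfer length in bytes 7-8
        lba = int.from_bytes(cdb[2:6], "big")
        blocks = int.from_bytes(cdb[7:9], "big")
    else:
        # 16-byte CDB: LBA in bytes 2-9, transfer length in bytes 10-13
        lba = int.from_bytes(cdb[2:10], "big")
        blocks = int.from_bytes(cdb[10:14], "big")
    return {
        "dev": m["dev"],
        "op": m["op"],
        "lba": lba,
        "blocks": blocks,
        "bytes": blocks * SECTOR_BYTES,
        "terminated": "terminated ioc" in line,
    }


# Two lines copied from the log above.
SAMPLE = [
    "(da9:mps0:0:17:0): READ(10). CDB: 28 00 1c 99 cc 10 00 01 00 00 "
    "length 131072 SMID 313 terminated ioc 804b scsi 0 state 0 xfer 0",
    "(da9:mps0:0:17:0): READ(16). CDB: 88 00 00 00 00 01 d1 c0 ba 90 "
    "00 00 00 10 00 00 length 8192 SMID 795 terminated ioc 804b scsi 0 "
    "state 0 xfer 0",
]

if __name__ == "__main__":
    for entry in map(decode_line, SAMPLE):
        print(entry)
```

Grouping the decoded LBAs can show whether the failures cluster in one region of the disk or are spread across it.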

I have inserted a suitable replacement drive into the system on a spare connection, and it is labelled da18.

Following the docs here, https://doc.freenas.org/9.3/freenas_storage.html#replacing-a-failed-drive, I attempt to offline da9, but when I do, the system hangs and reboots shortly after. When the system comes back, da9 still shows as online.

How do I proceed when the system crashes when offlining a disk via the GUI?

Thanks

Scott
 
Joined
Jul 3, 2015
Messages
926
Might be worth powering down and then pulling the disk. Fire it back up again and do your replacement.
 

danb35

Hall of Famer
Joined
Aug 16, 2011
Messages
15,504
How do I proceed when the system crashes when offlining a disk via the GUI?
Well, you could consider upgrading to a version less than four years old, as there may well be relevant bugs that have been fixed. But aside from that, you could try initiating the replacement without first taking da9 offline. I don't know if that will work or not, but it seems worth trying.
 

scott2500uk

Dabbler
Joined
Nov 17, 2014
Messages
37
Hi All,

Thanks for all the suggestions. In the end, I decided to go with powering down the machine, pulling the faulty drive, booting, then using Replace via the GUI. The faulty drive has been sent off for RMA as it's in year 3 of its 5-year warranty.

Resilvering of the new drive took about 8hrs at ~650MB/s once the morning rush of staff trying to get to their files died down.
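As a quick sanity check on those figures, the arithmetic below works out how much data an 8-hour run at ~650MB/s would cover. It assumes decimal megabytes and a constant rate for the whole run. A single drive is unlikely to sustain 650MB/s, so the number is probably best read as an aggregate, pool-wide scan rate. That reading is an assumption and is not stated in the post.

```python
HOURS = 8          # resilver duration from the post
RATE_MB_S = 650    # approximate rate from the post, decimal MB/s

# bytes covered = rate (bytes/s) * duration (s)
total_bytes = RATE_MB_S * 1_000_000 * HOURS * 3600
total_tb = total_bytes / 1e12

print(f"about {total_tb:.2f} TB scanned")
```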

@danb35, we are still on 9.3 because it's in production. This server has so many jails and VirtualBox VMs on it that migrating to a newer version would be a big pain in the butt.

We have recently bought a new dedicated server for running VMs and are currently in the process of migrating the jails and VMs to it, so that our file server can go back to being just a file server. That should make upgrading to a new version of FreeNAS that little bit easier.
 

sretalla

Powered by Neutrality
Moderator
Joined
Jan 1, 2016
Messages
9,703
If I remember right, this would be related to offlining a disk that is taking part in the swap pool, which isn't handled too well in some older versions, but works better now in recent versions.

The powerdown option you took is probably the only way out of it in that version, as at boot the system only assigns the disks that are present and have a swap partition to the swap pool, so it works around the missing disk.
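To check whether a given disk is part of swap before touching it, something like the following Python sketch could be run against the output of FreeBSD's `swapinfo` command. The helper name `swap_devices_on_disk` and the sample output are hypothetical. The sketch also assumes swap devices are named like `/dev/da9p1` or `/dev/da9p1.eli`, which may not match every install.

```python
import re


def swap_devices_on_disk(swapinfo_text, disk):
    """Return the swap device paths in `swapinfo` output that sit on `disk`."""
    # Assumed naming: /dev/<disk>p<N>, optionally with a .eli suffix
    pattern = re.compile(r"^/dev/" + re.escape(disk) + r"p\d+(\.eli)?$")
    hits = []
    for line in swapinfo_text.splitlines():
        fields = line.split()
        if fields and pattern.match(fields[0]):
            hits.append(fields[0])
    return hits


# Hypothetical sample output, for illustration only (not from the thread).
SAMPLE_SWAPINFO = """\
Device          1K-blocks     Used    Avail Capacity
/dev/da9p1.eli    2097152        0  2097152     0%
/dev/da10p1.eli   2097152        0  2097152     0%
"""

if __name__ == "__main__":
    print(swap_devices_on_disk(SAMPLE_SWAPINFO, "da9"))
```

On a live system the text could come from `subprocess.run(["swapinfo"], capture_output=True, text=True).stdout`. If the disk shows up in the list, that would be consistent with the swap explanation above.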
 