Relocating Disks - Errors Result

ZooKeeper

Cadet
Joined
Dec 12, 2016
Messages
6
I've recently been optimizing and reconfiguring the storage pools in my home "lab". As part of this process, I destroyed a RAIDZ pool that contained 3x 8TB white label Western Digital disks shucked from Easystores. The pool was destroyed through the GUI's Export/Disconnect action, with all of the option and confirmation boxes checked. No errors were reported when the task completed. My intent is to reuse 2 of these disks to expand another existing pool that is already configured as RAID10 (multiple 2-way mirror VDEVs).

While these 2 disks lived their previous life in the RAIDZ pool, they were physically located in the NAS box itself (specs below!). I've now relocated them into a NetApp shelf attached to the same physical server. When inserting the disks into the shelf, I receive the errors below on the console.

Code:
Dec 17 12:20:17 NASBOX ses0: da50,pass50 in Device Slot 12, SAS Slot: 1 phys at slot 11
Dec 17 12:20:17 NASBOX ses0:  phy 0: SAS device type 1 phy 1 Target ( SSP )
Dec 17 12:20:17 NASBOX ses0:  phy 0: parent 500a098001265eff addr 50000c90000cee26
Dec 17 12:20:17 NASBOX ses1: da21,pass21 in Device Slot 12, SAS Slot: 1 phys at slot 11
Dec 17 12:20:17 NASBOX ses1:  phy 0: SAS device type 1 phy 0 Target ( SSP )
Dec 17 12:20:17 NASBOX ses1:  phy 0: parent 500a0980017939ff addr 50000c90000cee25
Dec 17 12:20:19 NASBOX (da21:mps1:0:20:0): SERVICE ACTION IN(16). CDB: 9e 10 00 00 00 00 00 00 00 00 00 00 00 20 00 00
Dec 17 12:20:19 NASBOX (da21:mps1:0:20:0): SCSI sense: ABORTED COMMAND asc:44,0 (Internal target failure)
Dec 17 12:20:19 NASBOX (da21:mps1:0:20:0): Command Specific Info: 0x9e100000
Dec 17 12:20:19 NASBOX (da21:mps1:0:20:0): fatal error, failed to attach to device
Dec 17 12:20:19 NASBOX (da50:mps1:0:8:0): SERVICE ACTION IN(16). CDB: 9e 10 00 00 00 00 00 00 00 00 00 00 00 20 00 00
Dec 17 12:20:19 NASBOX (da50:mps1:0:8:0): SCSI sense: ABORTED COMMAND asc:44,0 (Internal target failure)
Dec 17 12:20:19 NASBOX (da50:mps1:0:8:0): Command Specific Info: 0x9e100000
Dec 17 12:20:19 NASBOX (da50:mps1:0:8:0): fatal error, failed to attach to device
Dec 17 12:20:19 NASBOX g_access(961): provider da21 has error 6 set
Dec 17 12:20:22 NASBOX g_access(961)[960]: Last message 'provider da21 has er' repeated 6 times, suppressed by syslog-ng on NASBOX.mydomain.net
Dec 17 12:20:22 NASBOX (da55:mps1:0:23:0): SERVICE ACTION IN(16). CDB: 9e 10 00 00 00 00 00 00 00 00 00 00 00 20 00 00
Dec 17 12:20:22 NASBOX (da55:mps1:0:23:0): SCSI sense: ABORTED COMMAND asc:44,0 (Internal target failure)
Dec 17 12:20:22 NASBOX (da55:mps1:0:23:0): Command Specific Info: 0x9e100000
Dec 17 12:20:22 NASBOX (da55:mps1:0:23:0): fatal error, failed to attach to device
Dec 17 12:20:22 NASBOX (da56:mps1:0:60:0): SERVICE ACTION IN(16). CDB: 9e 10 00 00 00 00 00 00 00 00 00 00 00 20 00 00
Dec 17 12:20:22 NASBOX (da56:mps1:0:60:0): SCSI sense: ABORTED COMMAND asc:44,0 (Internal target failure)
Dec 17 12:20:22 NASBOX (da56:mps1:0:60:0): Command Specific Info: 0x9e100000
Dec 17 12:20:22 NASBOX (da56:mps1:0:60:0): fatal error, failed to attach to device
Dec 17 12:20:22 NASBOX g_access(961): provider da55 has error 6 set
Dec 17 12:21:17 NASBOX g_access(961)[960]: Last message 'provider da55 has er' repeated 6 times, suppressed by syslog-ng on NASBOX.mydomain.net
Dec 17 12:21:17 NASBOX ses0: pass59 in Device Slot 16, SAS Slot: 1 phys at slot 15
Dec 17 12:21:17 NASBOX ses1: pass58 in Device Slot 16, SAS Slot: 1 phys at slot 15
Dec 17 12:21:17 NASBOX ses0:  phy 0: SAS device type 1 phy 1 Target ( SSP )
Dec 17 12:21:17 NASBOX ses1:  phy 0: SAS device type 1 phy 0 Target ( SSP )
Dec 17 12:21:17 NASBOX ses0:  phy 0: parent 500a098001265eff addr 50000c90000d4442
Dec 17 12:21:17 NASBOX ses1:  phy 0: parent 500a0980017939ff addr 50000c90000d4441
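
As a reading aid for the log above, the CDB that repeats in every failure can be decoded by hand. The sketch below is not from the original post. It assumes the standard SCSI layout of a 16-byte SERVICE ACTION IN(16) CDB: opcode in byte 0, service action in the low 5 bits of byte 1, allocation length in bytes 10-13. The function name decode_cdb is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: decode the 16 hex bytes of a SERVICE ACTION IN(16) CDB.
decode_cdb() {
    if [ "$#" -ne 16 ]; then
        echo "usage: decode_cdb <16 hex bytes>" >&2
        return 1
    fi
    opcode=$((0x$1))
    action=$((0x$2 & 0x1f))          # service action: low 5 bits of byte 1
    shift 10                         # $1..$4 are now CDB bytes 10..13
    alloc_len=$(( (0x$1 << 24) | (0x$2 << 16) | (0x$3 << 8) | 0x$4 ))
    name="not decoded by this sketch"
    # Opcode 0x9e with service action 0x10 is READ CAPACITY(16).
    if [ "$opcode" -eq $((0x9e)) ] && [ "$action" -eq $((0x10)) ]; then
        name="READ CAPACITY(16)"
    fi
    printf 'opcode=0x%02x service_action=0x%02x alloc_len=%d name=%s\n' \
        "$opcode" "$action" "$alloc_len" "$name"
}

# The CDB exactly as it appears in the console messages.
decode_cdb 9e 10 00 00 00 00 00 00 00 00 00 00 00 20 00 00
```

Read this way, the command the disks abort is the capacity query, with a 32-byte allocation length, which would be consistent with the "failed to attach to device" lines that follow it. The "error 6" in the g_access messages is ENXIO ("Device not configured") on FreeBSD. None of this says why the target aborts the command.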


The NetApp shelf is a recent addition to my TrueNAS setup, added within the last month. However, the 20 other disks I've populated the shelf with are happily storing and retrieving data in their own respective pools. This includes two other similar 8TB white label Western Digital disks that I pulled from cold storage.

In my attempts to troubleshoot I've done the basics - reboot, reseat disks, and try the disks in another system. For advanced troubleshooting, I turned to searching the forum and using the Googles. I've reached the end of what I can do without asking for help.

In researching my issue, I've found that the multipathing features of the NetApp/ZFS can have a funny effect on disks, especially when they are physically moved around. To combat this, I learned that you can wipe the MBR of the disk in order to start fresh. I moved the disks to a retired gaming rig that I now use solely for disk burn-in testing and that also has TrueNAS-12.0-U7 installed. On the retired box, I wiped the MBR of the two disks using the following command:

Code:
dd if=/dev/zero of=/dev/da# bs=512 count=1
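
The command above zeroes only the first 512-byte sector. ZFS keeps copies of its vdev labels at both the start and the end of a device, and GPT keeps a backup table at the end, so a wider wipe is sometimes tried. The following is a dry-run sketch only, not a command from this thread: wipe_plan is a hypothetical helper, the 4 MiB default is an arbitrary assumption, and the media size in bytes is assumed to come from a tool such as diskinfo. It prints the dd commands rather than running them.

```shell
#!/bin/sh
# Hypothetical dry-run helper: print dd commands that would zero the first
# and last few MiB of a disk. Nothing is written by this function.
wipe_plan() {
    dev=$1                 # device node, e.g. /dev/da# (placeholder)
    size_bytes=$2          # media size in bytes (assumed supplied by the caller)
    wipe_mib=${3:-4}       # MiB to zero at each end (arbitrary default)
    mib=1048576
    total_mib=$((size_bytes / mib))

    # Small device: the two regions would overlap, so plan a full wipe.
    if [ "$total_mib" -le $((wipe_mib * 2)) ]; then
        echo "dd if=/dev/zero of=$dev bs=$mib"
        return 0
    fi

    # Head region: the first wipe_mib MiB.
    echo "dd if=/dev/zero of=$dev bs=$mib count=$wipe_mib"
    # Tail region: seek to wipe_mib MiB before the last whole MiB and let
    # dd run until it reaches the end of the device.
    echo "dd if=/dev/zero of=$dev bs=$mib seek=$((total_mib - wipe_mib))"
}
```

Calling `wipe_plan /dev/da# "$size_bytes"` only prints the plan, so it can be reviewed before anything destructive is run. The second command has no count on purpose, so dd is expected to stop with an end-of-device message.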


The disks were then returned to the NetApp shelf and the same errors appeared as provided above. (Note - I did have success with this for other disks I also moved around!) I removed the disks from the NetApp and placed them back into the retired box. I then used the wipe utility within TrueNAS to write zeros to the entirety of both disks overnight. This morning the wipe was complete, so I placed the disks back into the NetApp, and the same errors appeared.

While the disks are inserted into the NetApp, running the command "camcontrol devlist" shows the 2 disks with their 2 paths each. However, they show none of the additional, unique information that is listed for the other disks:

Code:
root@NASBOX[~]# camcontrol devlist
<ATA SA 4321>                      at scbus1 target 8 lun 0 (pass50)
<ATA SA 4321>                      at scbus1 target 20 lun 0 (pass21)
<ATA SA 4321>                      at scbus1 target 23 lun 0 (pass58)
<ATA SA 4321>                      at scbus1 target 60 lun 0 (pass59)
<WDC WD80EMAZ-00WJTSM 4321>        at scbus1 target 27 lun 0 (pass22,da22)
<WDC WD80EMAZ-00WJTSM 4321>        at scbus1 target 31 lun 0 (pass23,da23)
<ADDITIONAL DISKS REDACTED>


If I physically remove the misbehaving disks from the NetApp and use the command "camcontrol devlist" again, the 4 lines for the 2 disks disappear.

The camcontrol command has been the only way I've been able to "see" the disks inserted into the NetApp shelf aside from the console messages.
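
To pick such entries out of a long device list, one option is to filter for lines whose device list contains only a pass device. This is a minimal sketch that assumes the output format shown above, where an unattached path ends in "(passNN)" and an attached disk lists a da device as well. The function name unattached is hypothetical.

```shell
#!/bin/sh
# Hypothetical filter: keep only "camcontrol devlist" lines whose trailing
# device list holds a lone pass device, i.e. no da device attached.
unattached() {
    grep -E '\(pass[0-9]+\)[[:space:]]*$'
}

# Intended use on the NAS itself:
#   camcontrol devlist | unattached
```

Against the listing above, this would keep the four "ATA SA 4321" lines and drop the two WD80EMAZ entries.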

At this point, I'm stuck. I'd like some help with additional troubleshooting steps and/or thoughts and opinions about what may be causing problems with these 2 disks. As mentioned, there was a third disk in the original pool I destroyed. Aside from being partially pulled from the server so that it is no longer powered, it remains untouched and can be used for testing if necessary.

Any help is greatly appreciated!


TrueNAS Server configuration:
Dell R510II
- TrueNAS-12.0-U7
- 2x Xeon L5609
- 12x 3.5" Hot Swap Disk Bays
- 128GB ECC RAM
- Dell H200 HBA Flashed to IT mode (FW Ver 20.00.07.00, NVDATA 14.01.00.08)
- HP H221 HBA Flashed to IT mode (FW Ver 20.00.07.00, NVDATA 14.01.00.06)
- Boot device - Two Mirrored External USB 2.5" Mobile Disks
- Mellanox ConnectX 10GbE Adapter
Netapp DS4243
- Dual IOM6 Controllers (Unknown FW Versions)
- Quad Power Supplies
 

ZooKeeper

Cadet
Joined
Dec 12, 2016
Messages
6
Increasing the count did not resolve the issue. When inserting the disk into the NetApp, I still receive the same errors stating "fatal error, failed to attach to device" and "provider da## has error 6 set".

In my original post, I had mentioned that I also used the TrueNAS disk wipe utility, writing zeros to the entire disk. Wouldn't this have also done the same (and more) as increasing the number of blocks dd would write zeros to? Does the disk wipe utility do something entirely different from the dd command?
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
"In my original post, I had mentioned that I also used the TrueNAS disk wipe utility, writing zeros to the entire disk."

I actually read dozens-to-hundreds of posts here daily, and don't always keep precise state on all the things that people've said, sorry.

"Does the disk wipe utility do something entirely different from the dd command?"

I have no idea what it does or what the problem is that you're running into, so it makes sense to me to "go back to basics." My best guess is that it's probably a competent wipe of some sort. My best guesses are occasionally wrong though.

"Wouldn't this have also done the same (and more) as increasing the number of blocks dd would write zeros to?"

Well, I *do* know what the dd command does, and I use it in my own system building tools for the intended purpose, so it seemed like an obvious correction. After that, we start to get into uglier stuff like alternate sector sizes, weird firmware issues, and other much-harder-to-debug stuff.
 