Volume degraded: SCSI sense: HARDWARE FAILURE asc:44,0 (Internal target failure) -- Is the HD bad?

billhickok · Jan 3, 2015

As I was transferring data to my newly built 10-disk RAIDZ2 array, I received an e-mail stating that my volume is degraded. It says "One or more devices are faulted in response to persistent errors. Sufficient replicas exist for the pool to continue functioning in a degraded state." Here is the output of the system log:

Jan 3 20:27:43 freenas (da7:mps0:0:7:0): READ(10). CDB: 28 00 00 40 00 81 00 00 01 00
Jan 3 20:27:43 freenas (da7:mps0:0:7:0): CAM status: SCSI Status Error
Jan 3 20:27:43 freenas (da7:mps0:0:7:0): SCSI status: Check Condition
Jan 3 20:27:43 freenas (da7:mps0:0:7:0): SCSI sense: HARDWARE FAILURE asc:44,0 (Internal target failure)
Jan 3 20:27:43 freenas (da7:mps0:0:7:0): Retrying command (per sense data)
Jan 3 20:27:43 freenas (da7:mps0:0:7:0): READ(10). CDB: 28 00 00 40 00 81 00 00 01 00
Jan 3 20:27:43 freenas (da7:mps0:0:7:0): CAM status: SCSI Status Error
Jan 3 20:27:43 freenas (da7:mps0:0:7:0): SCSI status: Check Condition
Jan 3 20:27:43 freenas (da7:mps0:0:7:0): SCSI sense: HARDWARE FAILURE asc:44,0 (Internal target failure)
Jan 3 20:27:43 freenas (da7:mps0:0:7:0): Retrying command (per sense data)
Jan 3 20:27:43 freenas (da7:mps0:0:7:0): READ(10). CDB: 28 00 00 40 00 81 00 00 01 00
Jan 3 20:27:43 freenas (da7:mps0:0:7:0): CAM status: SCSI Status Error
Jan 3 20:27:43 freenas (da7:mps0:0:7:0): SCSI status: Check Condition
Jan 3 20:27:43 freenas (da7:mps0:0:7:0): SCSI sense: HARDWARE FAILURE asc:44,0 (Internal target failure)
Jan 3 20:27:43 freenas (da7:mps0:0:7:0): Retrying command (per sense data)
Jan 3 20:27:43 freenas (da7:mps0:0:7:0): READ(10). CDB: 28 00 00 40 00 81 00 00 01 00
Jan 3 20:27:43 freenas (da7:mps0:0:7:0): CAM status: SCSI Status Error
Jan 3 20:27:43 freenas (da7:mps0:0:7:0): SCSI status: Check Condition
Jan 3 20:27:43 freenas (da7:mps0:0:7:0): SCSI sense: HARDWARE FAILURE asc:44,0 (Internal target failure)
Jan 3 20:27:43 freenas (da7:mps0:0:7:0): Retrying command (per sense data)
Jan 3 20:27:43 freenas (da7:mps0:0:7:0): READ(10). CDB: 28 00 00 40 00 81 00 00 01 00
Jan 3 20:27:43 freenas (da7:mps0:0:7:0): CAM status: SCSI Status Error
Jan 3 20:27:43 freenas (da7:mps0:0:7:0): SCSI status: Check Condition
Jan 3 20:27:43 freenas (da7:mps0:0:7:0): SCSI sense: HARDWARE FAILURE asc:44,0 (Internal target failure)
Jan 3 20:27:43 freenas (da7:mps0:0:7:0): Error 5, Retries exhausted
Jan 3 20:53:30 freenas (pass7:mps0:0:7:0): ATA COMMAND PASS THROUGH(16). CDB: 85 08 0e 00 d5 00 01 00 06 00 4f 00 c2 00 b0 00 length 512 SMID 841 terminated ioc 804b scsi 0 state c xfer 0

Looks like this particular disk (da7) is the one that faulted.

What would be the best way to test whether this specific disk is actually bad, or whether it's some other hardware issue (like the hot-swap backplane or the SAS cable)? The log doesn't make much sense to me, but perhaps someone more knowledgeable can chime in?
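The CDB lines in the log are the raw bytes of the commands that failed. As a minimal sketch, assuming the standard SCSI READ(10) layout and the SAT ATA PASS-THROUGH(16) layout (the function names below are hypothetical, not part of FreeNAS or CAM), they can be decoded in Python. Decoding only shows which command failed: the repeated READ(10) is a one-block read, and the pass-through appears to be an ATA SMART READ LOG request (command 0xB0, feature 0xD5, log address 0x06, the SMART self-test log). It does not by itself say whether the drive, the cable or the backplane is at fault.

```python
"""Decode the two CDB types seen in the FreeNAS log.

Sketch only: field offsets follow the SCSI READ(10) layout and the
SAT ATA PASS-THROUGH(16) layout; other opcodes are rejected.
"""


def parse_cdb(text: str) -> bytes:
    """Turn a space-separated hex string like '28 00 ...' into bytes."""
    return bytes.fromhex(text)


def decode_read10(cdb: bytes) -> dict:
    """READ(10): opcode 0x28, 32-bit LBA in bytes 2-5, 16-bit length in bytes 7-8."""
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a READ(10) CDB")
    return {
        "lba": int.from_bytes(cdb[2:6], "big"),
        "blocks": int.from_bytes(cdb[7:9], "big"),
    }


def decode_ata_pt16(cdb: bytes) -> dict:
    """ATA PASS-THROUGH(16): opcode 0x85, ATA registers packed into bytes 3-14."""
    if len(cdb) != 16 or cdb[0] != 0x85:
        raise ValueError("not an ATA PASS-THROUGH(16) CDB")
    return {
        # Features and sector count: combine the (15:8) and (7:0) bytes.
        "features": (cdb[3] << 8) | cdb[4],
        "sector_count": (cdb[5] << 8) | cdb[6],
        # LBA fields: only the (7:0) byte of each, enough for a 28-bit command.
        "lba_low": cdb[8],
        "lba_mid": cdb[10],
        "lba_high": cdb[12],
        "command": cdb[14],
    }


if __name__ == "__main__":
    read = decode_read10(parse_cdb("28 00 00 40 00 81 00 00 01 00"))
    print("READ(10):", read)  # {'lba': 4194433, 'blocks': 1}

    ata = decode_ata_pt16(
        parse_cdb("85 08 0e 00 d5 00 01 00 06 00 4f 00 c2 00 b0 00"))
    print("ATA PT(16):", {k: hex(v) for k, v in ata.items()})
```

In the ATA decode, 0x4F/0xC2 in the LBA mid/high registers is the signature ATA requires for SMART commands, which is a useful sanity check that the layout was read correctly.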

billhickok · Jan 4, 2015

Okay, so I powered off the machine, took the disk out, reinserted it and rebooted. All disks now appear to be online and the alert went away. I've scheduled a SMART long self-test on the disk to start in 30 minutes; let's see what happens...

billhickok · Jan 4, 2015

I received another alert just now, "The volume RAIDZ2Pool (ZFS) state is ONLINE: One or more devices has experienced an unrecoverable error. An attempt was made to correct the error. Applications are unaffected."

In the volume status, the resilver shows as complete (with 0 errors?), and disk da7p2 shows a checksum count of 52. That's the disk that was originally reported as faulted before the reboot. What does this mean? Do I need to replace this disk?
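The checksum figure in the volume status corresponds to the CKSUM column of `zpool status -v`, which counts checksum errors ZFS has seen on that device since its counters were last reset (they reset on reboot or `zpool clear`). A minimal Python sketch for spotting non-zero READ/WRITE/CKSUM counters, assuming the usual five-column config table; the `SAMPLE` text, its device names and the `nonzero_counters` helper are hypothetical and illustrative, not real output from this pool.

```python
"""Flag devices whose READ/WRITE/CKSUM counters in `zpool status` are non-zero.

Sketch: assumes the usual config table (NAME STATE READ WRITE CKSUM).
SAMPLE is illustrative only, not captured from a real pool.
"""

SAMPLE = """\
  pool: RAIDZ2Pool
 state: ONLINE
config:

        NAME        STATE     READ WRITE CKSUM
        RAIDZ2Pool  ONLINE       0     0     0
          raidz2-0  ONLINE       0     0     0
            da6p2   ONLINE       0     0     0
            da7p2   ONLINE       0     0    52

errors: No known data errors
"""


def nonzero_counters(status: str) -> list[tuple[str, int, int, int]]:
    """Return (name, read, write, cksum) for rows with any non-zero counter."""
    flagged = []
    in_config = False
    for line in status.splitlines():
        fields = line.split()
        if fields[:5] == ["NAME", "STATE", "READ", "WRITE", "CKSUM"]:
            in_config = True  # header row: device rows follow
            continue
        if not in_config:
            continue
        if not fields or fields[0].endswith(":"):
            break  # a blank line or "errors:" ends the table
        if len(fields) < 5 or not all(f.isdigit() for f in fields[2:5]):
            continue  # skip rows without plain integer counters
        read, write, cksum = (int(f) for f in fields[2:5])
        if read or write or cksum:
            flagged.append((fields[0], read, write, cksum))
    return flagged


if __name__ == "__main__":
    for name, r, w, c in nonzero_counters(SAMPLE):
        print(f"{name}: read={r} write={w} cksum={c}")
```

Fed real output (for example captured from `zpool status -v RAIDZ2Pool`), it would list only the devices worth a closer look; because the counters live in memory, they need to be captured before a reboot clears them.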

cyberjock · Jan 4, 2015

That's because shutting down the system resets the error counters. When you powered the box back on, the disk that had previously been kicked had zero errors (since the machine had just booted up), so ZFS happily put the bad disk back in the pool. Once an error occurs (which you can expect given the conditions), you'll get that alert. I suspect the drive will rack up a bunch more errors and eventually be kicked from the pool... again.

Of course, if you reboot the server it will do this all over again.

billhickok · Jan 22, 2015

I'm not sure what's causing this error, but it has now happened a few times and only seems to occur when transferring large amounts of data to the NAS. I've done 30+ batches of 300GB transfers and it's happened 4 times so far. The data transfers fine, but of course FreeNAS reports the pool as degraded and drive da7p2 as faulted/offline. I shut off the machine and reboot; it recognizes the drive as online, then resilvers successfully. A scrub afterwards doesn't report any errors or checksum counts >0. It seems I only got errors the very first time it happened.

Obviously I don't want to keep going through this cycle, since resilvering/scrubbing takes too much time and puts stress on the drives... so I'm going to try swapping out the miniSAS-to-SATA cable to see if that helps.

Question... I have 2 spare SATA connectors hanging around in my tower because I'm only using 10 drives with 3 miniSAS-to-4xSATA breakout cables. The thing is, the cable currently connected to the drive that keeps faulting (da7p2) goes to a different controller than those 2 loose connectors. Can I simply move this disk to another cable/controller and reboot? Would FreeNAS be aware of and adapt to such a change, or would it leave my pool degraded?

Hope that makes sense...

Ericloewe · Jan 22, 2015

You can move disks around without a care in the world, as long as the controllers are supported.
