Disk problems after upgrading to 9.3

Status
Not open for further replies.

Delivereath

Dabbler
Joined
Mar 5, 2014
Messages
36
Hi,

I was running my NAS with FreeNAS 9.1 until a week ago. I upgraded it to 9.3, and right after that it started to have issues. My setup is 2 RAIDZ1 pools, one with 4x 1TB and one with 4x 2TB. Both are attached to an M1015 controller, and FreeNAS runs as a virtual machine in ESXi. FreeNAS has direct access to the controller using passthrough.

First I got a few warnings like:
- Firmware version 15 does not match driver version 16 for /dev/mps0
- zpool version not up to date

I updated my zpool version without any issue and this cleared the corresponding warning.

I also had some issues when reading video files from my NAS. At some point, the video would just freeze and I had to kill the task. At that exact moment, FreeNAS reported the following:

Feb 21 04:20:06 freenas (da1:mps0:0:0:0): READ(10). CDB: 28 00 0d c0 8a e8 00 00 08 00
Feb 21 04:20:06 freenas (da1:mps0:0:0:0): CAM status: SCSI Status Error
Feb 21 04:20:06 freenas (da1:mps0:0:0:0): SCSI status: Check Condition
Feb 21 04:20:06 freenas (da1:mps0:0:0:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read error)
Feb 21 04:20:06 freenas (da1:mps0:0:0:0): Info: 0xdc08ae8
Feb 21 04:20:06 freenas (da1:mps0:0:0:0): Error 5, Unretryable error


This happens continuously on da1.
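
As a side note on reading these log lines: in a SCSI READ(10) command, bytes 2 to 5 of the CDB hold the starting logical block address (big-endian) and bytes 7 to 8 hold the number of blocks requested. The sketch below decodes the CDB from the log above; the function name `decode_read10` is made up for illustration and is not part of any FreeNAS tool.

```python
def decode_read10(cdb_hex):
    """Decode a READ(10) CDB given as space-separated hex bytes.

    Returns (starting LBA, number of blocks requested).
    """
    cdb = bytes.fromhex(cdb_hex.replace(" ", ""))
    # A READ(10) CDB is exactly 10 bytes long and starts with opcode 0x28.
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a READ(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")     # bytes 2..5: logical block address
    blocks = int.from_bytes(cdb[7:9], "big")  # bytes 7..8: transfer length
    return lba, blocks


lba, blocks = decode_read10("28 00 0d c0 8a e8 00 00 08 00")
print(hex(lba), blocks)  # → 0xdc08ae8 8
```

The decoded LBA matches the `Info: 0xdc08ae8` line, so the read that failed started at that block and asked for 8 blocks.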

Next I ran a long smartctl test and got 2 disks (da1 and da4) which failed due to read errors.

I've also seen the following message in the logs:

Feb 21 04:42:33 freenas smartd[2479]: Device: /dev/da1 [SAT], 16 Currently unreadable (pending) sectors

I ran a manual scrub. It repaired 50 MB of data, but I now have the following alert in the GUI:

CRITICAL: The volume PoolA (ZFS) state is ONLINE: One or more devices has experienced an unrecoverable error. An attempt was made to correct the error. Applications are unaffected.

Two open points:
- Regarding the firmware/driver version mismatch message, could this lead to disk errors? Should I plan a firmware update?

- About the disk, do you think that my da1 drive is dying? It's quite odd that this happened right after my update to 9.3. Is there anything I could do or test before replacing the disk?

Thanks
 

Bidule0hm

Server Electronics Sorcerer
Joined
Aug 5, 2013
Messages
3,710
You should back up the data NOW if you haven't already. da1 is dying; replace it as soon as you can after doing the backup. The replacement has a good chance of failing, since it's a RAID-Z1 and you say da4 is also starting to show errors. This is why you should back up your data before doing anything.

I helped a member a few days ago with pretty much the exact same problem (but a bit worse: his pool wouldn't mount after a reboot), and yesterday I found that the two failing drives were too badly damaged to get his data back.

You should also update the controller firmware to the P16 version so that it matches the driver. That avoids some problems, but the mismatch is not what's causing the errors right now.
 

Delivereath

Dabbler
Joined
Mar 5, 2014
Messages
36
OK, I will replace the disk. I already have a backup, but I'm copying the data to my other pool anyway.

Is it enough to use the command cp -R -a to copy the files correctly? Is there an easy way to compare checksums of all the files after the copy is done?
 

Bidule0hm

Server Electronics Sorcerer
Joined
Aug 5, 2013
Messages
3,710
Ok, good ;)

Yes. Note that -R is redundant with -a, as -a is the same as -RPp.

I don't know about the checksums, but I imagine there is a solution; just search the web.
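
One possible approach to the checksum question, as a minimal sketch rather than a definitive tool: hash every regular file under the source and the copy with SHA-256 and compare the results by relative path. The function names below are invented for this example, and the sketch assumes both trees are mounted locally and that symlinks can be skipped.

```python
import hashlib
import os


def file_sha256(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def tree_checksums(root):
    """Map each regular file's path (relative to root) to its SHA-256."""
    sums = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.islink(full):  # assumption: symlinks are not compared
                continue
            sums[os.path.relpath(full, root)] = file_sha256(full)
    return sums


def compare_trees(src, dst):
    """Return (missing in dst, extra in dst, differing content) as sorted lists."""
    a, b = tree_checksums(src), tree_checksums(dst)
    missing = sorted(set(a) - set(b))
    extra = sorted(set(b) - set(a))
    differing = sorted(p for p in set(a) & set(b) if a[p] != b[p])
    return missing, extra, differing
```

Calling compare_trees() with the source dataset path and the copy's path returns three lists; all three being empty means every file was found on both sides with identical content. Note that reading the source again puts more load on the failing disk, so it may be better to run this once, after the copy has finished.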
 

DrKK

FreeNAS Generalissimo
Joined
Oct 15, 2013
Messages
3,630
Just for the record, it would appear to be a complete coincidence that the disk failure occurred around the same time as the upgrade to 9.3.

I agree with the other guy: you are in the imminent-failure zone here.
 

Delivereath

Dabbler
Joined
Mar 5, 2014
Messages
36
I've ordered the new disk.

Regarding the reconstruction of my RAIDZ1, what would be the best way to do it? Since I have all the data backed up, could I just delete the RAIDZ, create a new one, and copy the data again?
 

gpsguy

Active Member
Joined
Jan 22, 2012
Messages
4,472
Did you order 2 disks?

Your original message said "Next I ran a long smartctl test and got 2 disks (da1 and da4) which failed due to read errors."

 

Delivereath

Dabbler
Joined
Mar 5, 2014
Messages
36
I had no errors when using da4. However, it failed the smartctl test due to a read error.

It may be that it just has a few bad sectors. Any way to test that and to disable any bad sector ?
 

Bidule0hm

Server Electronics Sorcerer
Joined
Aug 5, 2013
Messages
3,710
You can write to that sector with the dd command to force a remap, but be extremely careful with dd: you can wipe your data if you don't know what you're doing. There are some threads on the forum with details on how to do this ;)

But when a drive starts to have some bad sectors, it generally gets worse, and a few weeks (or even days) later you have many thousands of bad sectors and the drive is dead at that point... But you can be lucky and only have a few bad sectors for months without any problem, so just keep an eye on that value :)
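
To make the dd idea a bit more concrete, here is a hedged sketch that only builds the command strings and never runs them. It assumes 512-byte logical sectors (check what the drive actually reports), and the helper names are invented for this example. The LBA used is the one from the da1 log line `Info: 0xdc08ae8`, purely as an illustration; the failing LBA on another drive has to be taken from that drive's own self-test or error log.

```python
import shlex


def dd_read_check(device, lba, sector_size=512):
    """Build (but do not run) a dd command that reads one sector to /dev/null."""
    return "dd if={} of=/dev/null bs={} skip={} count=1".format(
        shlex.quote(device), sector_size, lba
    )


def dd_zero_sector(device, lba, sector_size=512):
    """Build (but do not run) a dd command that overwrites one sector with zeros.

    WARNING: running the resulting command destroys the contents of that sector.
    """
    return "dd if=/dev/zero of={} bs={} seek={} count=1".format(
        shlex.quote(device), sector_size, lba
    )


lba = 0xDC08AE8  # illustrative: taken from the da1 "Info:" log line
print(dd_read_check("/dev/da1", lba))
print(dd_zero_sector("/dev/da1", lba))
```

The read-only command is harmless and shows whether the sector is still unreadable. The write command is the risky one: a wrong device name or seek value overwrites good data, so double-check both before running anything like it.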
 

Ericloewe

Server Wrangler
Moderator
Joined
Feb 15, 2014
Messages
20,194
Delivereath said: "I had no errors when using da4. However, it failed the smartctl test due to a read error. It may be that it just has a few bad sectors. Any way to test that and to disable any bad sector ?"

Wrong thought process. By the time you know the sectors are bad, the drive knows it as well. ZFS should fill in any gaps that might exist (of course, RAIDZ1 severely limits this capability...).

You might tolerate one or two bad sectors, but have a replacement drive burned in and ready to go anyway. If you don't want to tolerate the bad sectors, replace the drive now. In both cases, make sure you have a spare ready to go at the end of this little exercise.
 