ZFS warning, but all disks are healthy

Status
Not open for further replies.

vlky

Dabbler
Joined
Oct 16, 2013
Messages
15
Dear All,

I found that performance degraded a lot when I was copying large files onto the volume.

I have a warning from the system:
<quote>
The volume XXXXXX (ZFS) status is UNKNOWN: One or more devices has experienced an unrecoverable error. An attempt was made to correct the error. Applications are unaffected. Determine if the device needs to be replaced, and clear the errors using 'zpool clear' or replace the device with 'zpool replace'.
</quote>
Screen Shot 2014-05-24 at 6.09.53 pm.png


Screen Shot 2014-05-22 at 1.00.00 am.png


System info:
Screen Shot 2014-05-24 at 6.08.21 pm.png

BUT when I checked the status, all the disks and the ZFS volume show as "healthy".
Screen Shot 2014-05-24 at 6.11.46 pm.png


What should I do?? (*0*)


Thanks,
Victor LEE
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
Did you get an email message about it? If yes, what did it say? Next I'd check your drives' SMART data.

How about listing your hardware as well? It may help.
 

solarisguy

Guru
Joined
Apr 4, 2014
Messages
1,125
In the shell
zpool status
might say something.

zpool history
might offer more clues. However, you just rebooted your system, so you might need to go back a little in the history (events are dated). Try
zpool history | more
SPACE advances by screen, ENTER advances by line.
 

vlky

Dabbler
Joined
Oct 16, 2013
Messages
15
Did you get an email message about it? If yes, what did it say? Next I'd check your drives' SMART data.

How about listing your hardware as well? It may help.

joeschmuck, I haven't set up the email alerts yet. (So far I have also failed to connect the server to the internet.)

How do I list my hardware, and from where??? :)
 

vlky

Dabbler
Joined
Oct 16, 2013
Messages
15
Screen Shot 2014-05-25 at 8.26.15 am.png



In the shell
zpool status
Screen Shot 2014-05-25 at 8.27.12 am.png


Oops!!!! There is an error, but without any information...

zpool history | more
SPACE advances by screen, ENTER advances by line.
Screen Shot 2014-05-25 at 8.30.37 am.png


The problem has been occurring since "2014-05-18", but there does not seem to be much information. (T__T)


The system was upgraded to "FreeNAS-9.2.1.5-RELEASE-x64 (80c1d35)".
 

ser_rhaegar

Patron
Joined
Feb 2, 2014
Messages
358
Please post the output of
Code:
cat /var/log/messages

Use PuTTY so you can get all of the text via copy/paste.
 

vlky

Dabbler
Joined
Oct 16, 2013
Messages
15
After I upgraded to 9.2.1, the system rewrote the log... The entries from before 2014-05-24 are lost...

I attached the log.
 

Attachments

  • messages.txt
    84.7 KB · Views: 284

solarisguy

Guru
Joined
Apr 4, 2014
Messages
1,125
Just list your hardware, something like:
* Intel Pentium Processor G630 (3M Cache, 2.70 GHz)
* 2x 4GB Kingston non-ECC RAM
* motherboard brand and model (with xxx SATA ports)
* additional SATA controller (brand and model) if any
* 5x WD Green 2TB
* 750W Antec power supply
* Kingston DataTraveler SE9 8GB

The last two, along with information about your case and cooling solutions, would likely be relevant only if overheating were an issue.

P.S.
Your messages.txt indicates that your motherboard only has a RealTek chip for your Gigabit Ethernet. That is not good...
 

solarisguy

Guru
Joined
Apr 4, 2014
Messages
1,125
Thank you for your screenshots! Please notice that it is the middle disk (the third one) that experienced the errors.
  1. zpool clear volume-rack-001 gptid/the_rest_from_the_third_disk
  2. gpart list ada0 | grep rawuuid
  3. gpart list ada1 | grep rawuuid
  4. gpart list ada2 | grep rawuuid
  5. gpart list ada3 | grep rawuuid
  6. gpart list ada4 | grep rawuuid
  7. One of these will be a match for the disk that had errors. Let's say it was disk adaN; then execute smartctl -A /dev/adaN and share the output with us.
Then read http://illumos.org/msg/ZFS-8000-9P for some explanation.
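
The five gpart lookups above could also be run as one loop. This is only a rough sketch, assuming a POSIX sh (if your login shell is csh, start sh first); find_disk_by_gptid is a made-up helper name, not a FreeNAS command, and the UUID in the usage comment is a placeholder for whatever follows gptid/ in your zpool status output.

```shell
# Hypothetical helper (not part of FreeNAS): print every disk whose
# `gpart list` output contains the given rawuuid.
find_disk_by_gptid() {
    target="$1"
    shift
    for disk in "$@"; do
        # -F matches the UUID as a fixed string, -q suppresses grep's output
        if gpart list "$disk" 2>/dev/null | grep -qF "rawuuid: $target"; then
            echo "$disk"
        fi
    done
}

# Usage (placeholder UUID):
#   find_disk_by_gptid <uuid-from-zpool-status> ada0 ada1 ada2 ada3 ada4
```

Like the individual gpart commands, this only reads partition information and changes nothing on the disks.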
 

vlky

Dabbler
Joined
Oct 16, 2013
Messages
15
If I execute all of these commands, will any data be destroyed??


- Because I want to save the SATA ports for my RAID hard drives, a "Kingston DataTraveler SE9 8GB" is the FreeNAS system disk...
- I built the computer myself, and I think it has enough fans.
If the problem is caused by overheating, which part is the biggest concern?? CPU? Hard disks? All??
- How can I monitor the temperature from the FreeNAS system? Or do I need to buy a separate monitoring system (hardware/software)?
- Although I am a software engineer, I don't have much knowledge of networking. Are you suggesting I need two Ethernet ports to separate the Tx and Rx???

Thanks x 999999 for your suggestions on hardware, I don't know that much about it :)



Before executing any of the commands, I ran "zpool status" again:
Screen Shot 2014-05-25 at 12.47.07 pm.png
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
First I'd like to address the comment about the RealTek NIC: I have never seen one be the cause of a system failure. They work well provided you have a good CPU. It is true that they are not as fast as an Intel NIC, though sometimes the difference is only slight. If you have a slow CPU and a RealTek NIC, you can expect slow network performance.

I think solarisguy is giving you sound advice to identify your problem. I don't want to distract from this but here is what I'd like to see...

In the shell, type the following and record, for each drive, the values of IDs 5 and 194 through 198, including the title of each value (for instance, 194 should be temperature). Do this for all drives, ada0 through ada4:
smartctl -a /dev/ada0 | more

The format I'd like to see is:
ada0
5 Reallocated Sector Count = 0
etc...
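
If copying the values by hand is tedious, something along these lines might produce that format. It is only a sketch, assuming the usual ten-column smartctl -A attribute table with the raw value in the last column; smart_summary is a made-up helper name, and it needs sh rather than csh.

```shell
# Hypothetical helper: read `smartctl -A` output on stdin and print
# attributes 5 and 194-198 as "ID NAME = RAW_VALUE".
smart_summary() {
    awk '$1 ~ /^[0-9]+$/ && ($1 == 5 || ($1 >= 194 && $1 <= 198)) {
        print $1, $2, "=", $10
    }'
}

# Usage, one drive at a time:
#   echo ada0; smartctl -A /dev/ada0 | smart_summary
```

Only the first word of the raw value is printed, so a temperature reported with extra min/max text would show just the leading number.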

To retain the formatting of the text, use code brackets (these are the braces you see {}# when you are entering a message, just below the Font Size selection).

You should set up your FreeNAS network and email address info so you can receive email messages when things go wrong with your NAS. Those emails could save your data!

And I did a foolish thing, I assumed something. I assumed you had been running this for a while because you are running FreeNAS 9.1.1. That was probably foolish of me.

Also, none of this will harm your data, you are only typing in commands to gather information so we can assist you.
 

vlky

Dabbler
Joined
Oct 16, 2013
Messages
15
Code:
smartctl -a /dev/ada0 | more

Screen Shot 2014-05-25 at 11.12.43 pm.png


Code:
smartctl -a /dev/ada1 | more

Screen Shot 2014-05-25 at 11.14.42 pm.png


Code:
smartctl -a /dev/ada2 | more

Screen Shot 2014-05-25 at 11.16.20 pm.png


Code:
smartctl -a /dev/ada3 | more

te.png


Code:
smartctl -a /dev/ada4 | more

Screen Shot 2014-05-25 at 11.17.56 pm.png




I will set up the email and internet connection on the server. Thanks for the great suggestion :)
 

vlky

Dabbler
Joined
Oct 16, 2013
Messages
15
I executed all the commands, and here is the output:
Screen Shot 2014-05-26 at 8.03.57 pm.png



I found that if I clear the error or reboot the system, the CKSUM count for disk ada2 resets to 0, but it turns into a nonzero number (0-99) again as soon as I write something to the ZFS volume.

Is the best way to solve the problem to buy a new disk and replace this one?
Can any S.M.A.R.T. test gather more information??
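
One way to watch whether the CKSUM counter climbs again after a clear is to filter the zpool status output. A minimal sketch, assuming the usual NAME STATE READ WRITE CKSUM columns; cksum_errors is a made-up helper name, it needs sh rather than csh, and counts that zpool abbreviates (for example with a K suffix) are not handled.

```shell
# Hypothetical helper: read `zpool status` output on stdin and print the
# name and CKSUM count of every row whose CKSUM column is a nonzero number.
cksum_errors() {
    awk 'NF >= 5 && $5 ~ /^[0-9]+$/ && $5 > 0 { print $1, $5 }'
}

# Usage:
#   zpool status volume-rack-001 | cksum_errors
```

Running it before and after a large copy would show which device rows pick up new checksum errors.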
 

solarisguy

Guru
Joined
Apr 4, 2014
Messages
1,125
I cannot see anything wrong with your ada2. I do not have enough hardware experience to say whether a SATA port or a cable could be bad. However, swapping two SATA ports on your motherboard would quickly determine whether the error follows the disk/cable. Another test would be to replace the SATA cable; try buying a locking variety for a change (maybe at least the disk or the motherboard supports locking).

P.S. Try to copy and paste text from your screen. Images can be very difficult to read, while text can be set to any size in a browser (use joeschmuck's advice about using code).
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
I do not see anything wrong with the SMART data you posted; all the drives look fine. The ID 193 value is high, but over the lifetime of the drive I doubt it will cause you any ill effects. If you want to look into that more, search for WDIDLE3.exe, but this is not your current problem. I wouldn't do this until after you have figured out your current problem.

I agree with SolarisGuy: turn off your NAS and swap SATA data cables as he indicated, or you could just move them all in a rotation (ada0 to ada1, ada1 to ada2, etc... ada4 to ada0). This will move all the data cables. You can do it at the drive or the MB connection, your choice.
 