Please help! Started up FreeNAS, suddenly volume storage (ZFS) status unknown?!

Status
Not open for further replies.

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
On a side note: with a UPS, is it possible to set up FreeNAS so that whenever there is a power outage and it switches to the UPS, it automatically shuts down the server (after a few minutes) so nothing is lost and there is no data corruption?

Nobody answered your question.

The UPS service is designed to do just that! Because an improper shutdown is the problem when you go on the UPS, if you have set up the service, the server should shut itself down automatically without your intervention on a loss of power. Any OS that can't handle that simple function with a UPS isn't worth jack squat.
 

HHawk

Contributor
Joined
Jun 8, 2011
Messages
176
Likely some of them at the least. Not that you bothered to mention what the panics are. The txg the pool was rolled back to also may not have been the most consistent one. That's water under the bridge at this point though.

Ehr... Okay, next time I have a kernel panic, I will stop working, get into my car, drive 35 minutes home and take a picture of the kernel panic on the screen. Big chance I will be too late, since it reboots automatically after 15 seconds. So please don't say that I didn't bother to mention it; it's kinda impossible to achieve this when I am at work and doing everything remotely. And furthermore, at that time I was more interested in getting my most important stuff transferred to my computer (= priority number one)!!



I would try importing the pool read-only as this may possibly allow you to copy other things you couldn't before. You could also try setting the failmode to continue and see if that behaves any better. That seems familiar. :rolleyes:

Since you didn't mention the correct command to do this, I am guessing it is the code below:
Code:
zpool import -f -R /mnt -o rdonly=on storage




You could use the old stick, but I would at the least detach the old pool to remove it from the db before creating a new one. For the plugins, reinstall everything first then copy the config files over with the jail shutdown or the plugins shutdown at least.

Okay, so if I am correct: boot up with the FreeNAS USB stick > detach pool from GUI (or shell) > create new pool.
In regards to the plugins, are the config.ini files all that I need?

Side note: if I saved my FreeNAS settings, did a completely new / clean install of FreeNAS to this USB stick and then imported the settings / config file, wouldn't that be better?



For the experimentation phase I would setup a 3 x disk raidz1 pool and a 3 x disk UFS raid3 pool. Then use rsync to backup the zpool to the UFS array.

Okay I think I understand what you mean. I guess (with my limited knowledge) I am going to test this with doing some big writes to the raidz1 pool right?
And I guess I would have to use "dd"-command for this, correct? I will be needing the correct command to do a big write on the raidz1 (or I could copy over some big files, if it's better).
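For the big-write test being asked about here, a minimal sketch could look like the following. The helper name `write_test_file` and the example path in the comments are assumptions for illustration, not something posted earlier in the thread. One caveat: `/dev/zero` produces data that compresses to almost nothing, so if compression is enabled on the dataset this will barely touch the disks, and copying real large files is then the better test.

```sh
# write_test_file TARGET SIZE_MB  (hypothetical helper)
# Writes SIZE_MB mebibytes of zeros to TARGET using dd.
write_test_file() {
    target=$1
    size_mb=$2
    # The block size is spelled out in bytes because the size-suffix
    # spelling is not the same across all dd implementations.
    dd if=/dev/zero of="$target" bs=1048576 count="$size_mb" 2>/dev/null
}

# Example (path is an assumption; adjust to where the pool is mounted):
#   write_test_file /mnt/test/ddtest.bin 10240   # roughly 10 GiB
```

After a run like that, `zpool status` on the pool shows whether any read, write, or checksum errors were counted.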



SMART is hardly absolute. All the manufacturers made sure of that. I agree it does make it unlikely to physically be the disks though.

Ah okay... Well I guess doing the setup (experimental phase, as you described above) could be a good thing to (re)test everything hardware-related (including harddisks). And if something's wrong, I guess we shall see it.



HHawk, this has already been mentioned before, but I will repeat it: No form of RAID is a substitute for backups. ZFS being RAID-like. A real backup is also not physically connected to the same system. You don't have any data unless you have a second unrelated, independent copy of it.

Well, I am not arguing this, but I never had this kind of data loss before with any RAID setup I have used over the past years. This even includes a RAID 0 array with four 80 GB hard disks (yes, that's several years ago). Lucky? Most probably.
Anyways, I think it will become very expensive to get several more harddisks of sufficient size to backup my stuff on the NAS. On the other side, I have learned my lessons and I will backup the "most" important stuff differently; like burning pictures to DVD and similar.



Nobody answered your question.

The UPS service is designed to do just that! Because an improper shutdown is the problem when you go on the UPS, if you have set up the service, the server should shut itself down automatically without your intervention on a loss of power. Any OS that can't handle that simple function with a UPS isn't worth jack squat.

Thanks for the answer, cyberjock. I already found an affordable solution for my UPS needs: the PowerWalker VI 1200.
It should work with the "blazer_usb" driver. I only have to look into how to set it up correctly once I have ordered and received it, of course...
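As a rough sketch of what the UPS service acts on: NUT (the software behind the FreeNAS UPS service, and where the "blazer_usb" driver comes from) reports the UPS state as a string of flags in `ups.status`, where `OL` means on line power, `OB` means on battery and `LB` means low battery. The helper names below are made up for illustration, and the UPS identifier `ups@localhost` in the usage comment is an assumption that depends on how the service is configured.

```sh
# ups_on_battery STATUS  (hypothetical helper)
# Succeeds when a NUT ups.status string contains the OB (on battery) flag.
ups_on_battery() {
    case " $1 " in
        *" OB "*) return 0 ;;
        *)        return 1 ;;
    esac
}

# ups_battery_low STATUS  (hypothetical helper)
# Succeeds when the status string contains the LB (low battery) flag.
ups_battery_low() {
    case " $1 " in
        *" LB "*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Example, assuming the UPS is named "ups" in the service settings:
#   status=$(upsc ups@localhost ups.status)
#   ups_on_battery "$status" && echo "running on battery"
```

This is only for checking the state by hand, for instance after pulling the plug as a test. The shutdown itself is handled by the UPS service once it is configured.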
 

purduephotog

Explorer
Joined
Jan 14, 2013
Messages
73
For UPSes, I've had good luck buying used APC units and refurbishing them. Most of the time they just need new batteries. The larger units become quite affordable that way (around $100 used versus $400 new).
 

HHawk

Contributor
Joined
Jun 8, 2011
Messages
176
Just mounted the zpool as read-only, but it still kernel panics and reboots after trying to transfer the "Grimm" folder, for example.
Well since replies are coming slowly the past few days (no offence meant), I will setup a new raidz1 + 3x disk UFS raid3 pool and test it some more.

I have most of the stuff transferred, or at least the stuff I considered the most important.

If someone still can answer the remaining questions I have, I would be very grateful!
 

HHawk

Contributor
Joined
Jun 8, 2011
Messages
176
Small update...

I have created, as suggested by paleoN, 2 arrays:

- zfs raidz1 (3 disks) pool name: test
- raid3 UFS (3 disks) pool name: raid3

However, I am having a hard time setting up rsync properly for testing (so that it backs up everything from the pool "test" to "raid3").
I checked the docs and also some tutorials, but I cannot understand it. I have never used rsync before, so a little help setting this up correctly through the GUI would be appreciated. And I do want to use a schedule.

This is purely for testing to see if I can find out what is causing problems, IF they are still there of course.

Well it's been a busy and long day for me, so I am calling it a day for now. Hopefully someone has an answer by the time I wake up.

I will run a scrub and a long disk check with smartctl in the meantime.

//update

I just noticed something when doing smartctl -a /dev/da2

(look at the Multi_Zone_Error_Rate line, ID 200, at the bottom of the attribute table)

Code:
[root@freenas] ~# smartctl -a /dev/da2
smartctl 5.43 2012-06-30 r3573 [FreeBSD 8.3-RELEASE-p4 amd64] (local build)
Copyright (C) 2002-12 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Caviar Green (Adv. Format)
Device Model:     WDC WD20EARX-00PASB0
Serial Number:    WD-WMAZA5235148
LU WWN Device Id: 5 0014ee 002ca24ef
Firmware Version: 51.0AB51
User Capacity:    2,000,398,934,016 bytes [2.00 TB]
Sector Sizes:     512 bytes logical, 4096 bytes physical
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   8
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Wed Mar 27 23:29:31 2013 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x82) Offline data collection activity
                                        was completed without error.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      ( 249) Self-test routine in progress...
                                        90% of test remaining.
Total time to complete Offline 
data collection:                (36780) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine 
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 355) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.
SCT capabilities:              (0x3035) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0027   184   164   021    Pre-fail  Always       -       5791
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       460
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   100   253   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   094   094   000    Old_age   Always       -       4833
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       458
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       115
193 Load_Cycle_Count        0x0032   199   199   000    Old_age   Always       -       4169
194 Temperature_Celsius     0x0022   124   115   000    Old_age   Always       -       26
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       2

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%      4712         -
# 2  Short offline       Completed without error       00%      4689         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.


Multi_Zone_Error_Rate?!

Is that bad? And could that be the cause of all my problems (probably not)?
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
There's only speculation as to what that error rate means: whether it's linear, logarithmic, exponential, etc., whether it's time-based or cumulative, or even how "bad" that is.

I will tell you that I have 16 of your drives in my system and they are all zero.

You had a RAIDZ2, so you'd need 2 disks plus 1 more (3 in total) to fail to cause data loss. I find it incredibly unlikely that a single disk is the problem. Whatever the "fault" was, I feel pretty confident in saying that it involved no fewer than 3 disks and could have come from all of them.

ZFS is designed to handle corruption and correct for it as long as you have enough redundancy. While that error could have caused some localised (or even widespread) corruption, there's no way to know what the extent of the damage is, because there's no clear answer as to what the Multi_Zone_Error_Rate means. Besides that, even if it caused crap data to be read that wasn't identified as crap data, ZFS should have been capable of correcting for it. The fact that it didn't pretty much tells me you had some kind of catastrophic failure mode. What the actual failure was has, so far, been a complete mystery.

If I were in your shoes, I wouldn't be trusting that particular machine with storing your data until you are doing backups. There's no telling what the failure was before, or when/if it will ever happen again.

For all we know it could have been something like a stick of RAM that wasn't 100% in the slot, and when you opened the case you instinctively pushed on them a little and they seated fully and you fixed the issue without knowing it. People have unknowingly fixed loose SATA cable connectors, power connectors, etc. That's why, when you have a problem, it's important that you be calm and be very, very observant of even the smallest things. After all, the smallest things can cause the biggest problems.
 

titan_rw

Guru
Joined
Sep 1, 2012
Messages
586
I just checked all my drives for multi zone error rate.

Two of them have it. One I know is a bad drive, the other works fine.

Here's the one that I know is bad:

Code:
200 Multi_Zone_Error_Rate   0x0008   173   001   000    Old_age   Offline      -       5565


This drive also has ~1200 reallocated sectors.

This is the other drive with multi zone error rate:

Code:
200 Multi_Zone_Error_Rate   0x000a   099   097   000    Old_age   Always       -       2152


This drive is working fine. Only 2 reallocated sectors.

I'd say a low non-zero value is OK. If the value was really high, or if there were other SMART attributes in question, I'd be more worried.
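For anyone who wants to run the same comparison across their own drives, here is a small sketch that pulls the raw value of one attribute out of the smartctl attribute table. It relies on the column layout shown in the output earlier in the thread (attribute name in the second column, raw value in the tenth). The function name `smart_raw` is made up, and the device names in the usage comment are assumptions apart from `/dev/da2`.

```sh
# smart_raw ATTRIBUTE_NAME  (hypothetical helper)
# Reads a smartctl attribute table on stdin and prints the RAW_VALUE
# column for the named attribute. Prints nothing if it is not listed.
smart_raw() {
    awk -v attr="$1" '$2 == attr { print $10 }'
}

# Example (adjust the device list to your system):
#   for disk in /dev/da0 /dev/da1 /dev/da2; do
#       printf '%s: ' "$disk"
#       smartctl -A "$disk" | smart_raw Multi_Zone_Error_Rate
#   done
```

Drives that do not report the attribute at all simply produce an empty value.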
 

HHawk

Contributor
Joined
Jun 8, 2011
Messages
176
Well, no answer on how to set up rsync properly, so I did it differently: I set up a cron job with some commands to do the same. I found these commands somewhere else on the forums. So hopefully this will work.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Rsync is pretty versatile, and is something you really have to set up yourself with the settings you want. Your settings don't equal my settings, and it's important you understand the theory and then properly apply it in practice.

With questions like that, you're better off posting the exact error message you are getting, along with the details. That's why you got no response. I can't help someone who says "my car won't get me to work.. please help". Will it not start? Does it not stay running? What have you looked at? Does it just not move?
 

HHawk

Contributor
Joined
Jun 8, 2011
Messages
176
What errors are you talking about?

I think you missed a few posts, because I am done transferring files and am now testing AS MENTIONED with raidz1 and raid3 with rsync, AS MENTIONED!
And AS MENTIONED I already got it working, I quote:

Code:
I setup a cron job with some commands to do the same. I found these commands from somewhere else on the forums.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
What errors are you talking about?

I think you missed a few posts, because I am done transferring files and am now testing AS MENTIONED with raidz1 and raid3 with rsync, AS MENTIONED!
And AS MENTIONED I already got it working, I quote:

Code:
I setup a cron job with some commands to do the same. I found these commands from somewhere else on the forums.

Nope. I read all of your posts. But a cronjob isn't the same as using rsync via the GUI. It may perform the same function though. I could easily use a cronjob to run cp or rsync or whatever I want to do too. I was responding to your request for help setting up rsync. I'd argue you didn't get "it" (rsync) working if you had to set up a cronjob. The GUI has that feature built in! Setting it up via cronjob may work, but you'd learn how to use rsync properly if you used it through the GUI. But eh, your loss.

Good luck!
 

HHawk

Contributor
Joined
Jun 8, 2011
Messages
176
Well it isn't much easier than setting the following cron job:

Code:
rsync -av --log-file=/mnt/test/rsync.log /mnt/test /mnt/raid3


This was less than a minute's work, whereas trying to understand rsync took me half a day and it still didn't work.
So that's why I searched for other solutions and I found the above code (or at least something similar), and after adjusting it, it works flawlessly.

Can't beat a minute's work, especially if you do not understand rsync and no help is provided how to do this correctly.
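One detail about the cron command above that is easy to miss: because `/mnt/test` is given without a trailing slash, rsync creates a `test` directory inside the destination, so the data ends up in `/mnt/raid3/test`. Adding a trailing slash to the source copies the contents of the directory instead. The log file also lives inside the source, so it gets copied along with everything else. A small sketch of a wrapper that always copies contents to contents (the function name `mirror_dir` and the log path in the usage comment are assumptions):

```sh
# mirror_dir SRC DST LOGFILE  (hypothetical helper)
# Copies the contents of SRC into DST with rsync in archive mode and
# writes rsync's log to LOGFILE. Any trailing slash on SRC or DST is
# normalised so that exactly one is passed to rsync.
mirror_dir() {
    src=$1
    dst=$2
    log=$3
    rsync -a --log-file="$log" "${src%/}/" "${dst%/}/"
}

# Example (the log location is an assumption, chosen to be outside the source):
#   mirror_dir /mnt/test /mnt/raid3 /var/log/rsync-test.log
```

Whether the extra `test` directory matters is a matter of taste; the point is only to know where the files will land before comparing the two arrays.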

Anyway, the hard disks and the rest of the hardware seem to work normally. So maybe it was a power outage that caused problems for ZFS. In any case, the hardware seems fine.
Will do some more testing and after that transfer all contents back to the NAS.

So no loss at all, apart from the time I spent trying to get rsync to work properly from within the GUI. That lost time cannot be gotten back. ;)
 