My FreeNAS box gets rebooted automatically from time to time

Status
Not open for further replies.

i_indrajith

Cadet
Joined
Jan 17, 2016
Messages
8
Hi All,
I'm using a FreeNAS box purchased from iXsystems, and the OS build version is FreeNAS-9.3-STABLE-201511040813.
For the last three months the NAS has been restarting automatically from time to time, sometimes two or three times per month, and several of my corporate VM servers crash as a result. I'm using the NAS storage pool to host my corporate servers, which run in a VMware environment on ESXi 5.5.
My knowledge of Linux is not good, so troubleshooting on the NAS side is difficult for me. Please help me identify the reason for these automatic reboots.
 

Mirfster

Doesn't know what he's talking about
Joined
Oct 2, 2015
Messages
3,215

i_indrajith

Cadet
Joined
Jan 17, 2016
Messages
8
Hi Mirfster,
Thanks for your reply. I have attached the debug file.
We use a centralized UPS to power all the servers in the server room, so this server is connected to the same PDU as the other servers (not connected directly).
 

Attachments

  • debug-freenas-20160118175858..tgz
    673.6 KB

gpsguy

Active Member
Joined
Jan 22, 2012
Messages
4,472
Are you using NFS or iSCSI with ESXi?

While *I* didn't see a smoking gun regarding your issue, I did notice a couple of things.

First, you have 7 disks in RAIDZ3. For block storage, the recommendation is to use striped mirrors for best performance. Also, your pool is 78% full, with 49% fragmentation. For NFS/iSCSI, it's recommended not to fill your pool past 50%.
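
For reference, the capacity and fragmentation figures quoted above can be read straight off the pool from the FreeNAS shell (pool name tank1 taken from the debug file):

Code:
zpool list tank1                         # shows SIZE, ALLOC, FREE, CAP (and FRAG on recent versions)
zpool get capacity,fragmentation tank1   # the same two figures as pool properties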
 

i_indrajith

Cadet
Joined
Jan 17, 2016
Messages
8
I'm using iSCSI to bind the NAS ZFS pool to ESXi. Also, in the ESXi event viewer I notice this message most of the time: "Performance has deteriorated. I/O increased from average value of 5253 millisecond to 263110".
So the FreeNAS reboots may be due to hard disk performance, since the pool is more than 75% utilized. Can you advise whether there is any arrangement of the available hard disks I can make to avoid these reboots?
 

gpsguy

Active Member
Joined
Jan 22, 2012
Messages
4,472
Here are some of your specs, so others don't have to dig through the debug file.

CPU: Intel(R) Xeon(R) CPU E5-2609 v2 @ 2.50GHz
RAM: 127.94 GiB

Code:
pool: tank1
state: ONLINE
  scan: resilvered 14.2G in 5h47m with 0 errors on Wed Jan  6 22:40:18 2016
config:

    NAME                                            STATE     READ WRITE CKSUM
    tank1                                           ONLINE       0     0     0
      raidz3-0                                      ONLINE       0     0     0
        gptid/53652340-a668-11e4-ac56-0cc47a1732e8  ONLINE       0     0     0
        gptid/540afe51-a668-11e4-ac56-0cc47a1732e8  ONLINE       0     0     0
        gptid/54af6522-a668-11e4-ac56-0cc47a1732e8  ONLINE       0     0     0
        gptid/55573e2d-a668-11e4-ac56-0cc47a1732e8  ONLINE       0     0     0
        gptid/55fedc27-a668-11e4-ac56-0cc47a1732e8  ONLINE       0     0     0
        gptid/56a54d75-a668-11e4-ac56-0cc47a1732e8  ONLINE       0     0     0
        gptid/574d35ea-a668-11e4-ac56-0cc47a1732e8  ONLINE       0     0     0
    logs
      mirror-1                                      ONLINE       0     0     0
        gptid/30a7b29e-a669-11e4-ac56-0cc47a1732e8  ONLINE       0     0     0
        gptid/30c43c1a-a669-11e4-ac56-0cc47a1732e8  ONLINE       0     0     0



Shown below is a mockup of what your pool might look like, using mirrors instead. You'd need more vdevs and/or larger disks to meet your storage requirements.

If you wanted more redundancy, you could add an additional disk to each vdev, to create 3-way mirrors.

@jgreco has a similar system. I *think* he has 24 x 2TB drives, laid out in 7 vdevs of 3-way mirrors. Of the ~14TB of storage, the maximum he would use would be ~7TB.


Code:
pool: tank1
state: ONLINE
  config:

    NAME                                            STATE     READ WRITE CKSUM
    tank1                                           ONLINE       0     0     0
      mirror-0                                      ONLINE       0     0     0
        gptid/53652340-a668-11e4-ac56-0cc47a1732e8  ONLINE       0     0     0
        gptid/540afe51-a668-11e4-ac56-0cc47a1732e8  ONLINE       0     0     0
      mirror-1                                      ONLINE       0     0     0
        gptid/54af6522-a668-11e4-ac56-0cc47a1732e8  ONLINE       0     0     0
        gptid/55573e2d-a668-11e4-ac56-0cc47a1732e8  ONLINE       0     0     0
      mirror-2                                      ONLINE       0     0     0
        gptid/55fedc27-a668-11e4-ac56-0cc47a1732e8  ONLINE       0     0     0
        gptid/56a54d75-a668-11e4-ac56-0cc47a1732e8  ONLINE       0     0     0
    logs
      mirror-3                                      ONLINE       0     0     0
        gptid/30a7b29e-a669-11e4-ac56-0cc47a1732e8  ONLINE       0     0     0
        gptid/30c43c1a-a669-11e4-ac56-0cc47a1732e8  ONLINE       0     0     0
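
On FreeNAS a pool like this would normally be built through the GUI Volume Manager, but as a rough command-line sketch the same mirrored layout would be created like this (the daN device names are placeholders, not the real gptid labels):

Code:
zpool create tank1 \
  mirror da0 da1 \
  mirror da2 da3 \
  mirror da4 da5 \
  log mirror da6 da7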
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
gpsguy said:
Shown below is a mockup of what your pool might look like, using mirrors instead. You'd need more vdevs and/or larger disks to meet your storage requirements.

If you wanted more redundancy, you could add an additional disk to each vdev, to create 3-way mirrors.

@jgreco has a similar system. I *think* he has 24 x 2TB drives, laid out in 7 vdevs of 3-way mirrors. Of the ~14TB of storage, the maximum he would use would be ~7TB.

Yeah, the RAIDZ3 shown is going to be terrible for performance. For every block of data updated on your iSCSI or NFS, ZFS has to write to four separate disks.

My recollection is that the version of FreeNAS listed suffered from a bug that's discussed in https://forums.freenas.org/index.php?threads/spontaneous-system-crashes-with-iscsi-luns.39433/page-2 etc.; the suggested fix is to upgrade to the current code.

Upgrading will not fix the performance problems. The only thing likely to fix that is a better pool design.

The use of mirrors is strongly recommended. You currently have RAIDZ3; this allows the total failure of any three drives without pool loss (but loss of redundancy). This is hard to duplicate with mirroring, but I suggest that a three-way mirror is sufficient if your redundancy rule can be simplified to "a drive failure shall not result in loss of redundancy."

On the VM filer here, which is a SC216BE26 chassis with the extra trays in back, my original plan was to load it up with 24 2TB drives and then run seven sets of 3-way mirror vdevs, using the three remaining as warm spares, and use the two rear trays for an SSD pool. I've since changed that plan to go for 26 2TB drives, 8 sets, and two spares in the back, since I came up with a better gameplan for the SSD pool. @gpsguy has obviously been paying attention to what I've been saying: I would like to remain at half a dozen TB used on the pool in order to keep performance pleasant.

It's important to note that ZFS write performance will plummet as fragmentation increases, and the best thing you can do for that is to throw lots of space at the problem, and to eliminate any unnecessary writes that don't need to be made to the pool.

Read performance can be enhanced for the pool through use of L2ARC. I have 768GB of L2ARC on the system but it's stabilized around 520GB used after 35 days of uptime. ("I think I've discovered the working set size, Ma!") Unfortunately it is only shouldering a portion of the total VM load it was intended to handle...
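
As a rough sketch, an L2ARC device is attached and monitored from the shell like this (da8 is a placeholder device name; FreeNAS users would normally add it through the GUI):

Code:
zpool add tank1 cache da8                 # attach an SSD as an L2ARC read cache
zpool iostat -v tank1 5                   # per-vdev activity, including the cache device
sysctl kstat.zfs.misc.arcstats.l2_size    # bytes currently held in L2ARC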

If you want to tell me that it's depressing to deploy 52TB of HDD in order to get ~6-8TB of usable storage, well, fine, buy me a beer and tell me. It's just the way it is.
 

i_indrajith

Cadet
Joined
Jan 17, 2016
Messages
8
Hi jgreco,

Sorry for the delayed reply, and thank you very much for the information you have provided.
So with the existing RAIDZ configuration, one possibility is to increase the pool by plugging in more hard disks; I would also like to know of any other way I can resolve this reboot issue.
Today my system rebooted again, the same as the last automatic reboot two weeks ago. I have attached the console messages that appeared after this boot.
I'm planning to increase the pool size: three slots are left in this box, so I can plug in three 1TB disks and expand the pool. Or can I get any other recommendation, such as releasing some mirror disks and updating the pool, or a configuration change, anything that will help me resolve this auto reboot?
Today I also noted another warning; could this be the reason for the current issue? (WARNING: Firmware version 16 does not match driver version 20 for /dev/mps0. Please flash controller to P20 IT firmware.)
Sorry, my knowledge of Linux and NAS configuration is very poor, so troubleshooting by myself would turn this system upside down.
 

Attachments

  • Console msg nas rebbot.txt
    36.4 KB

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
The biggest bits of advice are to update to the latest FreeNAS release and to use mirrors instead of RAIDZ.
 

gpsguy

Active Member
Joined
Jan 22, 2012
Messages
4,472
Please reflash your HBA with the phase 20 firmware.
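
A rough sketch of the reflash with sas2flash; the file names below are examples for an LSI SAS2008-based card, so substitute the P20 IT package for your exact HBA, and note that some boards need the DOS or EFI build of sas2flash instead:

Code:
sas2flash -listall                         # confirm the controller and its current firmware phase
sas2flash -o -f 2118it.bin -b mptsas2.rom  # flash P20 IT firmware plus boot ROM (example file names)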


Sent from my phone
 

i_indrajith

Cadet
Joined
Jan 17, 2016
Messages
8
Hi All,

Thanks for the guidance provided so far; it has really helped me understand the system and what troubleshooting I need to think about. But I feel I haven't received any hint that identifies my existing issue. The reason I say this is that today my FreeNAS box rebooted again, and from monitoring I was unable to find any performance problem such as high disk, CPU, or memory usage; the ESXi servers weren't even utilizing the zpool at the time.
Can you suggest any command or method to identify and troubleshoot whether there is a hardware issue in this server?
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
Yes, there's a sticky over in the hardware forum describing how to do burn-in on your system. Normally hardware problems get shaken out during a proper burn-in and testing phase, which is about four to six weeks of intensive testing of the hardware. The current guidance is maybe a little vague and I keep meaning to revisit it, but it's especially meant to catch the sorts of hardware problems you might be experiencing, the kind that only strike now and then.
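
As a starting point while reading that sticky, SMART long self-tests and the system log are worth checking after each unexplained reboot (da0 is a placeholder; repeat for each disk):

Code:
smartctl -t long /dev/da0        # start a long self-test on one disk
smartctl -a /dev/da0             # later, review the test result and the error counters
grep -i panic /var/log/messages  # look for a kernel panic logged around the last crash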
 

i_indrajith

Cadet
Joined
Jan 17, 2016
Messages
8
Hi All

Thank you all for the instructions provided.
I finally managed to identify the reason for this reboot issue: it is due to hard disk performance, probably the RAIDZ3 configuration. To verify this I moved the VMware VMDK disks one by one from the zpool to other storage (the local disk of the ESXi server). After moving half of the VMDKs I noticed good performance on the NAS, and since moving all of the VMDKs the FreeNAS box has not restarted automatically.
So I have decided to rebuild the RAIDZ configuration and create a new zpool, and for that I need instructions from all of you. The following are the disk drives I have attached to the NAS:
7 disks of 1TB capacity, and 2 SSDs (500GB and 200GB). I need a pool capacity of at least 4.5TB to store the existing VMDKs.

Please advise me.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
You have insufficient space.

For any type of block storage, such as vmdk storage, where you want good performance, you need two things: mirrors plus LOTS of free space. You do not want to fill a vmdk datastore more than about 50% full. Over time, even at 50% full, performance won't be that great. Check out this graph:

[Attached image: delphix-small.png, a graph of ZFS write performance versus pool fullness]


Now, here's the interesting thing. The random steady state throughput of a raw single disk drive without ZFS is about 150 IOPS, or around 600KB/sec. If you look at this graph of performance of ZFS, you'll notice that as the disk fills, the graph trends downwards and approaches that 600KB/sec rate when the pool is around 90% full. The problem is, with ZFS, "random" and "sequential" writes are handled in roughly the same way, so at 90% full, your ZFS pool will be writing fairly slowly, because it is having to seek all over the place to find little teeny blocks of space that are available on the pool, even for things that you might think are "sequential" writes. On the flip side, leaving lots of free space (let's say by using only 10% of the space) means that ZFS is capable of writing both random and sequential data out at 6000KB/sec, or ten times the speed that the underlying raw device would be capable of. That's because ZFS is reordering all of that into contiguous ranges and writing it without tons of seeks.

This is key to understanding ZFS performance with block protocols, since block protocols are very stressy on the random writes.

If you need 5TB of space to store vmdk's, you want 10-15TB of pool space. That requires 20-30TB of raw space, because your pool should be mirrors.

You can take your seven 1TB disks and make three 1TB mirror vdevs and a warm spare. It'll give you a 3TB pool of which you shouldn't use more than about 1-1.5TB.
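
For illustration, that layout (three two-way mirrors plus a spare) translates to roughly the following; device names are placeholders, and ZFS's spare keyword makes it a hot spare, whereas a warm spare could simply be left unassigned in the chassis:

Code:
zpool create tank1 \
  mirror da0 da1 \
  mirror da2 da3 \
  mirror da4 da5 \
  spare da6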

To get where you say you need to be, I would suggest taking 4 6TB drives and creating two 6TB mirror vdevs. That'll give you a 12TB pool which should do a reasonable job of storing your 4.5TB, but if it seems slow or if you need more space, you can always add another pair of 6TB drives to get additional speed/space.

Feel free to complain that this seems like a lot of resources to commit to providing storage. The thing is, you give ZFS what it needs, and ZFS will treat you well. You are trading one thing for another with ZFS. Our VM filer here uses 48TB of raw disk to provide a 7TB datastore.
 

i_indrajith

Cadet
Joined
Jan 17, 2016
Messages
8
jgreco, the technical explanation you have given is fully understood and clear as a clean sky, and I agree the datastore should not be utilized more than 50%; that matches the issue I experienced.
The problem is that I'm out of resources, and it's very hard for me to get funds released from management to purchase new disks. When this NAS idea was presented to management (not by me), it was pictured as a free solution that could satisfy our entire system requirement with minimal resources, and explanations like yours were ignored.

What to do; I have to live with what I've got.
OK, my total VMDK capacity is 2.8TB, so with the existing disks I must manage to reconfigure the data pool, and later I'll expand it with three new disks; I barely managed to get approval to purchase 3 x 1TB disks next quarter.
I have decided to build the data pool using 6 x 1TB disks with RAIDZ (giving 4.54 TiB) or RAIDZ2 (giving 3.63 TiB) and use the extra disk as a spare (I can expand the pool later with the 3 x 1TB). Then I would add the other 2 SSDs as cache for better write speed, so data gets stored there before being written to the pool. Will this be OK? I know I have to ignore your explanation in order to survive.
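
One note on the SSD part of that plan, as a sketch with placeholder device names: in ZFS a log device (SLOG) only holds the intent log for synchronous writes, while a cache device (L2ARC) only accelerates reads, so the two SSDs would be added in different roles depending on which effect is wanted:

Code:
zpool add tank1 log ada0     # SSD as SLOG: holds the ZFS intent log for synchronous writes (iSCSI/NFS sync)
zpool add tank1 cache ada1   # SSD as L2ARC: read cache only; it does not speed up writes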
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
What I don't understand after reading this thread is why the computer rebooted, even if performance was very poor, and after the system was updated to the most current version of FreeNAS. This sounds like a hardware issue (motherboard/RAM/CPU/add-on card/power supply) or a bug in the software.
 