8.3.0-RC1 - System freezes/dies

Status
Not open for further replies.

audix

Dabbler
Joined
Jun 11, 2011
Messages
36
The system runs fine and then suddenly it's just dead. No error messages, no console output.
At first I thought it was related to "heavy" access from my laptop (Linux), but it seems to die even when there is no NFS access at all. It usually takes somewhere between 0.5 and 1 hour (sometimes it can stay up for as long as 12 hours).

8.3.0-RC1, x64 (r12617), (NFS, CIFS, rsync, smart, ssh enabled)
AMD Athlon II X3 415e, 8 GB.
ASUS motherboard.
Promise SATA300 TX4.
6 disks in raid-z2. 2 datasets (Samsung F4)
3 disks, separate volumes. (various)
3 clients (usually nfs).

The only message in the GUI is the alert saying the ZFS pool has not been upgraded.


The system was created with FreeNAS v8 (about 1.5 years ago) and has been updated to 8.0.1, 8.0.2, 8.0.3 and 8.0.4. Everything worked fine the whole time.
Then I updated to 8.2 and got kernel panics. After reading about many others having this problem, I upgraded to 8.3.0-beta2, which worked fine for some weeks.
When it started to halt I was going to update to beta3, but RC1 had just been released (not yet announced). No change from beta2.

Strange halts could be hardware faults. I ran memory tests for hours, no problems found.
I have checked all cables, made sure the cards are fully inserted, etc.
Updated BIOS.
I booted on RC1-CD and reinstalled/upgraded again (in case RC1 was updated when it was actually announced).


I am going to buy a new usb flash drive just to test. I have not upgraded the zfs pools because I am thinking of going back to 8.0.4 again.

Any thoughts? What else can I test? Can I look for errors anywhere?
Should I add an official bug? If so, what else do I need to include?

Thanks,
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
I recommend you do not convert your pool until 8.3.0-Release comes out and you have given it a week or so of testing. Using a different USB drive may help: I know there is a FreeBSD issue with some USB drives that didn't exist in the 8.0.4 build. Right now all I know is that large-capacity HP flash drives may fail to mount properly and can hang the system.

If the problem you are having didn't exist prior to upgrading to 8.3.0-RC1 and you feel your hardware isn't the issue, then yes, you should post a trouble ticket. Include as much detail about the problem as you can. I mean everything, no matter how small, because it could be the key to figuring out the problem. Basic hardware configuration and the model number of the flash drive help. If you can, capture the console output, especially any error message or a last message that is always the same, since that could be a clue.
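
Editor's sketch of one way to gather the kind of detail suggested above into a single file for a ticket. The function name collect_report and the list of commands are assumptions, not an official FreeNAS procedure. Because the hang leaves nothing on the console, a report taken after the reboot may miss the interesting part, so treat it as a starting point only.

```shell
#!/bin/sh
# Hypothetical helper: write the output of a few standard diagnostic
# commands into one text file. The command list is an assumption;
# adjust it to whatever your system provides.
collect_report() {
    out="$1"
    : > "$out"    # create or truncate the report file
    for cmd in "uname -a" "dmesg" "zpool status" "tail -n 200 /var/log/messages"; do
        printf '== %s ==\n' "$cmd" >> "$out"
        # Run the command only if its binary exists, and record a
        # failure in the report instead of aborting.
        if command -v "${cmd%% *}" >/dev/null 2>&1; then
            $cmd >> "$out" 2>&1 || printf '(command failed)\n' >> "$out"
        else
            printf '(not available on this system)\n' >> "$out"
        fi
    done
}
```

Example use: collect_report /tmp/report.txt, then attach the file to the ticket.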
 

audix

Dabbler
Joined
Jun 11, 2011
Messages
36
Thanks joeschmuck. I was considering testing upgrading the pools, but I guess then I cannot downgrade to 8.0.4 if needed.

Today I bought two new usb memory sticks (different brands). Will reinstall and test.
BTW, should I import the current settings, or use auto-import etc? Just in case there is something "wrong" with the config-db? Is that even possible?
 

audix

Dabbler
Joined
Jun 11, 2011
Messages
36
Reinstalled on one of the new memory sticks. So far so good... (uptime > 1h, not much usage though).

The old USB-memory stick is a SANDISK Cruzer Blade 4GB. The new one is an ADATA Classic C802 4GB.
 

Letni

Explorer
Joined
Jan 22, 2012
Messages
63
I had the same issue yesterday: the USB stick (a generic Microcenter 4 GB stick) in my N36L finally went belly up on my 8.0.4-p3 install. I knew it was having issues, because for weeks the CLI and web interface had been VERY UNRESPONSIVE. Eventually I couldn't manage the device through either interface. I rebooted and, sure enough, got a segmentation fault while loading the kernel (not even past the first stage of booting). I took the drive out and tried to zero out the entire USB stick. I then put a fresh image of 8.3.0-RC1 on it, and it segfaulted halfway through the boot process.

I ended up using a different 4 GB USB stick I had lying around. I imaged 8.3.0-RC1 onto it and it booted right up. I had to do a little hacking to mount my zpool and get my backup config off it, but I eventually got the config and re-applied it. I'm now up and running on a new USB stick with 8.3.0-RC1 and no problems. I will hold off on upgrading to ZFS v28, however, until 8.3 is out of beta.
 

audix

Dabbler
Joined
Jun 11, 2011
Messages
36
grr.... after a couple of hours uptime I copied some files to the server. It has now died again... no error messages or anything... :mad:
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
I recommend you take one of those flash drives and put 8.0.4 back on it and leave it that way. You can pop that in if you like providing you don't upgrade the pool. You can then test your system on the other flash drive with 8.3.0-RC1 or whatever the next flavor is you want to test.

If it is a PS like William suggests then it will happen with 8.0.4 running.

One suggestion is to place the boot flash into a different port and try it. It made a difference on my MB, it's a shot in the dark but it might help.
 

cyberjock

Inactive Account
Joined
Mar 25, 2012
Messages
19,526
Joe's recommendation is exactly how I do business on my FreeNAS servers. I have 2 thumbdrives for each server. One is my production, and the other for betas/RCs/etc. If I want to experiment I switch them out. They're both identical drives with stickers on them to identify them.

If my production USB drive fails I use my beta/RC thumbdrive as the new production(after an OS install) and I order a replacement for betas/RCs/etc.

It does sound very likely to be a power supply issue. I'd check that out and see if the fan is actually working for your power supply. If it starts overheating it can cause very odd things to happen. Do not fool yourself by hooking up a power supply tester and call it good if it says it is. Those devices don't load the PSU significantly and often have a very wide tolerance for what is "good" voltage. My recommendation would be to use another power supply that you know works and put it in the server temporarily.
 

fpiazza

Cadet
Joined
Oct 12, 2012
Messages
3
I'd just like to add that I'm having the same problem.

Have tested the memory and HW extensively, even running Linux to check if the HW is really Ok. Not a single problem with Linux, while stressing the 5 disks at the same time for hours.

Running FreeNAS 8.3.0-RC1 on an ASUS P8H77-m Pro mobo, 16 GB RAM, i3 CPU, and 5 WD 2TB disks.

It seemed, at first, that the freeze happened under heavy disk access, or when I used the Web GUI while the disks were working hard. It made no sense, because under Linux that doesn't happen at all.

I managed to "replicate" the problem by opening a shell and running rsync. It would freeze after about 10 minutes, just when you would start to think you passed the first few minutes, and everything is Ok.

The system never uses more than about 80% of memory, so I don't think it is ZFS memory starvation.

I haven't applied any tuning, because with 10 TB of disks and 16 GB of RAM, it should run without any tweaking.

I then tested the same system with NAS4Free (9.1.0.1, I guess), and had basically the same problem. Is that a ZFS instability? I tried to exclude a ZFS 28 instability by testing with FreeNAS 8.2 as well. Same problem...
 

hostit1

Cadet
Joined
Oct 24, 2012
Messages
1
First, I wanted to thank FreeNAS and the community for the great software.

I am writing to report that I am having kernel panics from time to time. I was able to reproduce a kernel panic 3 times in the same day, with the console message below:
"Fatal trap 12: page fault while in kernel mode"

The third time, I ran top on FreeNAS while transferring/unzipping files to the cpanel2 NFS share that I created. At the time of the last panic, the server showed 96.6% idle CPU with load averages of 0.70 0.27 0.21.

Also I was pinging the system and saw it at 0.050 ms average, then it spiked to over 500 ms (local net), then about 5 minutes later, no response.
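
Editor's sketch for catching a latency spike like the one described above as it happens. It filters ordinary ping output and prints only the replies slower than a threshold. The function name flag_slow_pings and the 100 ms threshold in the example are assumptions chosen for illustration.

```shell
#!/bin/sh
# Hypothetical filter: read ping output on stdin and print only the
# replies whose round-trip time exceeds the given threshold (in ms).
flag_slow_pings() {
    threshold_ms="$1"
    awk -v limit="$threshold_ms" '
        # Reply lines carry a field such as "time=0.050 ms".
        match($0, /time=[0-9.]+/) {
            ms = substr($0, RSTART + 5, RLENGTH - 5) + 0
            if (ms > limit) print "SLOW " ms " ms: " $0
        }'
}
```

Example use: ping 10.175.110.253 | flag_slow_pings 100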


Below is a little information regarding my test setup, speed, etc. Hopefully it may provide a little information as to whether it is an issue with my setup or possibly an issue with the version of FreeNAS that I am running.


I have a system with 2 x Six-Core AMD Opteron(tm) Processor 2419 EE, 32 GB of RAM and 4 x 2TB disks.
I have 2 x 1 Gbit NICs bonded on the NAS server.

I have created a single ZFS volume with a RAID5 configuration, giving me 5.2 TB of storage. BTW, these are SATA II drives.
I am using FreeNAS mainly to back up a few Proxmox VMs (staggered daily) for about 2 hours per day. Since the load is light, I decided to give NFS a try with a cPanel server that I have (2 small accounts).

I created an NFS share for the cPanel system and mounted /home via fstab:
10.175.110.253:/mnt/storage/cpanel2 /home nfs rw 0 0

I can read and write, but as soon as I try to perform a cPanel copy from one server to this server (an account of about 50 MB), the FreeNAS server dies a hard death.

I believe that my speed is great locally on the nas server
[root@freenas] /mnt/storage# dd if=/dev/zero of=test.dat bs=1048576 count=2048
2048+0 records in
2048+0 records out
2147483648 bytes transferred in 1.900634 secs (1129877499 bytes/sec)

NFS
10.175.110.253:/mnt/storage/nfs
5.2T 509G 4.7T 10% /mnt/pve/storage

From a server connected to a single gig interface for storage, using NFS, I get good speed (I believe)
root@jordan:/mnt/pve/nfs# dd if=/dev/zero of=test.dat bs=1048576 count=2048
2048+0 records in
2048+0 records out
2147483648 bytes (2.1 GB) copied, 23.095 s, 93.0 MB/s

NFS Info on server jordan
10.175.110.253:/mnt/storage/nfs
5.2T 509G 4.7T 10% /mnt/pve/nfs
root@jordan:~# iperf -c 10.175.110.253 -p 65000
------------------------------------------------------------
Client connecting to 10.175.110.253, TCP port 65000
TCP window size: 16.0 KByte (default)
------------------------------------------------------------
[ 3] local 10.175.110.47 port 53417 connected with 10.175.110.253 port 65000
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.0 sec 1.09 GBytes 937 Mbits/sec

Now on the cpanel server
10.175.110.253:/mnt/storage/cpanel2
5.2T 509G 4.7T 10% /home

root@cpanel2 [/home]# dd if=/dev/zero of=test.dat bs=1048576 count=2048
2048+0 records in
2048+0 records out
2147483648 bytes (2.1 GB) copied, 133.175 s, 16.1 MB/s
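
Editor's sketch for putting the three dd results on the same scale. It converts a byte count and elapsed time into decimal MB/s (10^6 bytes per second, the unit the Linux dd output above uses). The function name mb_per_sec is an assumption. Note that a 2 GB write of zeros on a machine with 32 GB of RAM may be flattered by caching, so the local figure is likely an upper bound rather than sustained disk speed.

```shell
#!/bin/sh
# Hypothetical helper: bytes and seconds in, decimal MB/s out (one decimal place).
mb_per_sec() {
    awk -v bytes="$1" -v secs="$2" 'BEGIN { printf "%.1f\n", bytes / secs / 1000000 }'
}

mb_per_sec 2147483648 1.900634   # local write on the NAS
mb_per_sec 2147483648 23.095     # NFS from jordan
mb_per_sec 2147483648 133.175    # NFS from cpanel2
```

On these figures the cpanel2 client writes at roughly a sixth of the rate jordan gets over the same kind of NFS mount.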

Cheers

Tim R.
 

audix

Dabbler
Joined
Jun 11, 2011
Messages
36
There is more info in the ticket but here is a short summary of the current status.

- No difference with 8.3.0-Rel
- Changed PSU, no change.
- Works fine running Linux and Windows
- Hardware diagnostics and burn-in software show no errors
- Hangs even though all services are off, takes about 1h
- Works fine if booted on FreeNas-CD (kept it in "shell")

When it hangs, it still answers ping, but nothing else responds.

My guess is that it either is some strange hardware error, or some bug in the zfs-part.

I welcome all ideas. :confused:
 