Stability issues new system

MichaelR

Cadet
Joined
Nov 14, 2020
Messages
5
Hi

I just built my own NAS with TrueNAS 12.0, but I'm having severe stability issues. I'm hoping someone can help me diagnose them.

My brand new setup:
CPU: AMD Ryzen 5 3400G
RAM: Samsung DDR4 PC2666 16GB
Motherboard: Asrock X570M PRO4
Install disk: 1x Kingston Data Center DC1000B M.2
Cache disk: 1x SSD M.2 1TB Samsung 970 PRO M.2
Data disk: 2x 8TB WD WD82PURZ Purple Surveillance

During the night I had 7 unscheduled reboots of the system. It's headless, and nothing comes up on the screen when I plug one in, so I cannot see whether there is an error on the console. This morning I could not reach the web interface and had to force a shutdown and boot the system again.

Besides the stability issues, I have battled jails that refuse to install. But that is only Plex, so I'm guessing it is a separate problem.

I have checked the /data/crash folder, but I'm not sure how to read the data. Is "Dump header from device" the device that is having issues? And is the panic string a generic hardware fault, or does it point back to the device in question?

root@truenas[~]# cat /data/crash/info.last
Dump header from device: /dev/ada1p1
Architecture: amd64
Architecture Version: 4
Dump Length: 251392
Blocksize: 512
Compression: none
Dumptime: Sat Nov 14 06:09:01 2020
Hostname: truenas.local
Magic: FreeBSD Text Dump
Version String: FreeBSD 12.2-RC3 7c4ec6ff02c(HEAD) TRUENAS
Panic String: page fault
Dump Parity: 323915

root@truenas[~]# cat /data/crash/info.1
Dump header from device: /dev/ada1p1
Architecture: amd64
Architecture Version: 4
Dump Length: 737792
Blocksize: 512
Compression: none
Dumptime: Sat Nov 14 06:07:07 2020
Hostname: truenas.local
Magic: FreeBSD Text Dump
Version String: FreeBSD 12.2-RC3 7c4ec6ff02c(HEAD) TRUENAS
Panic String: general protection fault
Dump Parity: 3743347290
Bounds: 1

root@truenas[~]# cat /data/crash/info.4
Dump header from device: /dev/ada1p1
Architecture: amd64
Architecture Version: 4
Dump Length: 781824
Blocksize: 512
Compression: none
Dumptime: Sat Nov 14 05:43:12 2020
Hostname: truenas.local
Magic: FreeBSD Text Dump
Version String: FreeBSD 12.2-RC3 7c4ec6ff02c(HEAD) TRUENAS
Panic String: double fault
Dump Parity: 2121761597
Bounds: 4
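
On reading these files: each info.N file is a list of plain `Key: value` lines, so the panics can be put on a timeline with a few lines of Python. The following is a minimal sketch, not a TrueNAS tool; `parse_info` and `timeline` are hypothetical helper names, and the sample text is trimmed from the three files above. As far as the FreeBSD crash dump tooling goes, "Dump header from device" names the dump device the crash dump was written to and recovered from, not the component that failed, and "page fault", "general protection fault" and "double fault" are generic CPU exceptions rather than pointers to a specific device.

```python
from collections import Counter
from datetime import datetime

# Trimmed copies of the three info files quoted in this thread.
SAMPLES = [
    """Dump header from device: /dev/ada1p1
Dumptime: Sat Nov 14 06:09:01 2020
Panic String: page fault""",
    """Dump header from device: /dev/ada1p1
Dumptime: Sat Nov 14 06:07:07 2020
Panic String: general protection fault""",
    """Dump header from device: /dev/ada1p1
Dumptime: Sat Nov 14 05:43:12 2020
Panic String: double fault""",
]


def parse_info(text):
    """Split the 'Key: value' lines of one info file into a dict."""
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only, so times like 06:09:01 stay intact.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def timeline(texts):
    """Return (dump time, panic string) pairs, oldest first."""
    rows = []
    for text in texts:
        info = parse_info(text)
        when = datetime.strptime(info["Dumptime"], "%a %b %d %H:%M:%S %Y")
        rows.append((when, info["Panic String"]))
    return sorted(rows)


for when, panic in timeline(SAMPLES):
    print(when.isoformat(sep=" "), panic)

print(Counter(panic for _, panic in timeline(SAMPLES)))
```

To use it on a real system, pass the contents of each /data/crash/info.N file instead of `SAMPLES`. Three different generic exceptions within about half an hour tends to point at hardware or firmware settings rather than one software bug, which fits the idea of running a memory test next.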

I have not done any form of burn-in of the CPU or memory. But if the panic string is generic, I'm guessing I need to do a memory test.

Any advice on where I should look, how to read the data at hand, or where else to look for the issue?

Thanks

Michael
 

LarsR

Guru
Joined
Oct 23, 2020
Messages
719
With Ryzen there are some things in the BIOS that need to be changed.
You have to disable some power-saving settings in the BIOS, otherwise the system will go into some kind of "sleep" mode and cut power to devices.
I myself use a Ryzen 5 1600X and had problems with random lockups after 3 days of uptime.

I can't remember exactly what I changed, especially because I only know the German setting names, but when I'm back home from work I can go into the BIOS and look up the settings that I changed.

Hope this is of some help.

Greetings

Lars
 

MichaelR

Cadet
Joined
Nov 14, 2020
Messages
5
That would be great, Lars. I have already disabled C-states, so I'm not sure if there is anything else that can be disabled. But I would love to see which settings you changed anyway.
 

LarsR

Guru
Joined
Oct 23, 2020
Messages
719
So I disabled 3 settings in the BIOS:

C-States, AMD Cool & Quiet and ErP Ready.

C-States and Cool & Quiet were under CPU Features.
ErP Ready was under Advanced and Power Management Setup.

After disabling these three settings I had no more random hard lockups, and the longest uptime I had before I added more hard drives was 34 days without any hiccups.

Hope this helps you stabilize your system.
 

MichaelR

Cadet
Joined
Nov 14, 2020
Messages
5
As I wrote, C-states were already disabled (and Cool & Quiet is the same as C-states these days). I didn't find ErP Ready in my BIOS.

In the end I disabled SMT (multithreading), and that seems to fix the issue. So I'm guessing BSD hasn't come around to supporting AMD's Ryzen CPUs yet. At least the page fault can be explained if it is caused by misreads from the CPU cache due to the different CPU architecture AMD uses compared to Intel.

I have done extensive dd testing with no reboots while doing it. On a side note, this CPU has a built-in GPU, and by default it reserved 2 GB of memory for the GPU. I changed that in the BIOS to 128 MB, which is most likely still overkill for this server's needs.

I will have to wait and see until I get time to test Plex. But right now ZFS just got almost 2 GB more memory :)
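
For anyone checking the "almost 2 GB" figure, the arithmetic is simple enough to sketch. This assumes the BIOS values are binary units (2 GiB and 128 MiB), which the post does not state; `freed_mib` is a hypothetical helper name.

```python
MIB_PER_GIB = 1024


def freed_mib(old_reserved_mib, new_reserved_mib):
    """Memory handed back to the OS when the GPU reservation shrinks."""
    return old_reserved_mib - new_reserved_mib


# Assumed values: 2 GiB default reservation, 128 MiB after the BIOS change.
freed = freed_mib(2 * MIB_PER_GIB, 128)
print(f"{freed} MiB = {freed / MIB_PER_GIB:.3f} GiB")
```

Under those assumptions the change frees 1920 MiB, or 1.875 GiB, of the 16 GB installed.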
 

LarsR

Guru
Joined
Oct 23, 2020
Messages
719
Nice to hear you got it stable, even though I couldn't help much...

Just out of curiosity, are your CPU temps being reported correctly?
On my system the CPU temp has a +20 offset compared to my BIOS readings.
 

MichaelR

Cadet
Joined
Nov 14, 2020
Messages
5
I'm not sure about the offset, but 20 degrees Celsius seems about right. Right now TrueNAS is reporting a minimum CPU temp of -8, a mean of 30 and a max of 40. A 20-degree bump to the temps would fit better with what I have seen in the BIOS.
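
As a quick illustration of that bump, here is a small sketch that applies a constant offset to the reported values. The +20 figure is the offset Lars sees on his own system and is only an assumption for this board; `apply_offset` is a hypothetical helper name.

```python
OFFSET_C = 20  # assumption: constant offset, taken from Lars's observation

# Values TrueNAS reported in the post above, in degrees Celsius.
reported = {"min": -8, "mean": 30, "max": 40}


def apply_offset(readings, offset=OFFSET_C):
    """Return a copy of the readings with a constant offset added."""
    return {name: value + offset for name, value in readings.items()}


corrected = apply_offset(reported)
print(corrected)
```

With that offset the readings become 12, 50 and 60, which at least removes the physically implausible sub-zero minimum.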

And thanks for the help, Lars. I had almost given up on running TrueNAS; you sent me back down the right path.
 

MichaelR

Cadet
Joined
Nov 14, 2020
Messages
5
So, the problem was still there: unstable at times, nightly reboots, etc. I tried to test with some memory I had from a server, but since that was ECC memory the system wouldn't boot. (This also reset the BIOS settings.) I put my old memory back in, but this time in slot A2 after looking at the motherboard manual; I originally had the memory in slot A1.

Now my system is rock solid and has been up for over 48 hours, and with default BIOS settings (C-state is disabled by default).
 

G8One2

Patron
Joined
Jan 2, 2017
Messages
248
 

SKova

Dabbler
Joined
Dec 12, 2019
Messages
12
I have been having the same issue(s), at least at face value. I was hoping that what you found would help, although within 60 seconds of logging back in, it rebooted again. I did not have this issue until I updated to the U6 version. Still searching ...
 

SKova

Dabbler
Joined
Dec 12, 2019
Messages
12
Interestingly, since that last reboot, it has not had an issue. I will continue to monitor since making the changes and update in case others have a similar issue.
 