Ryzen Stability on 11.0-U4

Status
Not open for further replies.

diedrichg

Wizard
Joined
Dec 4, 2012
Messages
1,319
Isn't that RAM usage normal? ZFS will use all available memory. My Intel machine uses all 16GB.
 

wackymole

Explorer
Joined
Aug 21, 2017
Messages
59
Isn't that RAM usage normal? ZFS will use all available memory. My Intel machine uses all 16GB.

I think it is normal, but you would know more than me. I am just pointing out that my system seems to crash when it maxes out the memory and not before (at least I haven't observed it to). Perhaps I should word my statements more carefully than "when I use all my ram". -WM
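To check whether the freezes really line up with memory filling up, it may help to log how large the ZFS ARC is relative to physical RAM. Below is a minimal sketch, not anything from the FreeNAS tooling: the function names `arc_percent` and `arc_log_line` are made up for illustration, and the byte counts are passed in as arguments so the helpers themselves do not depend on any particular system.

```shell
#!/bin/sh
# Print ARC size as a whole-number percentage of physical memory.
# Arguments: ARC size in bytes, physical memory in bytes.
# (arc_percent is a hypothetical helper name, not a FreeNAS command.)
arc_percent() {
    arc_bytes=$1
    phys_bytes=$2
    # Integer arithmetic only; POSIX sh has no floating point.
    echo $(( arc_bytes * 100 / phys_bytes ))
}

# Build one timestamped log line from the two byte counts.
arc_log_line() {
    printf '%s arc_bytes=%s phys_bytes=%s arc_pct=%s\n' \
        "$(date -u '+%Y-%m-%dT%H:%M:%SZ')" "$1" "$2" "$(arc_percent "$1" "$2")"
}

# On FreeBSD/FreeNAS the inputs would come from sysctl, for example:
#   arc_log_line "$(sysctl -n kstat.zfs.misc.arcstats.size)" "$(sysctl -n hw.physmem)"
```

Run from cron every minute or so and appended to a file, the last few lines before a freeze would show whether the ARC was actually near the limit at the time.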
 

ykhodo

Explorer
Joined
Oct 19, 2017
Messages
52
So, the devs are still investigating, but have no leads right now. So I will continue to search for symptoms and potential causes.

1) I have strong reason to believe that it is memory (RAM) related. (My system seems to die when I use all my RAM.) (The memory leak bug is not helping; it should be patched in the next update.)
2) Power management or motherboard power saving could also be related. (Cool'n'Quiet and power optimizations)

I have three primary questions:
1) Is everyone running ECC ram in a compatible motherboard?
2) Is anyone getting crashes running non-ECC ram?
3) What motherboard are you using? -- (mine- ASRock AB350 PRO4)

Have a great week!-- Cheers

1) Yes, I am running the asrock taichi https://www.asrock.com/mb/AMD/X370 Taichi/
2) using corsair DDR4 ECC RAM (CT16G4WFD824A)
3) See #1
 
Joined
Apr 9, 2015
Messages
1,258

ezra

Contributor
Joined
Jan 15, 2015
Messages
124
Having the black screen super often, like every 2 hours. Can't say if it's because I'm doing any particular thing on the server...

1) Is everyone running ECC ram in a compatible motherboard? Yes
2) Is anyone getting crashes running non-ECC ram? Can't say not tested.
3) What motherboard are you using? -- (mine- ASRock AB350 PRO4) ASRock X370 Taichi

Using a UPS? tried to disable it maybe something is going wrong there...

Edit: It just failed while removing a lot of snapshots.

Running a Linux VM inside FreeNAS all the time, SMB shares, Plex jail, daily and weekly snapshots, B2 backup, UPS.
 

ykhodo

Explorer
Joined
Oct 19, 2017
Messages
52
Seems Ryzen still has issues with FreeBSD https://www.phoronix.com/scan.php?page=news_item&px=Ryzen-BSD-Lock-Ups-2018

May want to start sending issues upstream to the FreeBSD forums if you are still having issues as they will be the ones to implement the fixes.
I read through the mailing list linked in the blog post and saw a mention of disabling SMT in the BIOS. I tried that today and what consistently would lock my machine (replication to an external) succeeded. I am going to see how long it can stay up, but this may be a temporary solution!
 

wackymole

Explorer
Joined
Aug 21, 2017
Messages
59
I am currently in another testing phase (only 2 days in), and am currently treating it as a HW/power management or memory related issue. I have disabled C6 state, DIMM power down, Cool'n'Quiet, and another power saving feature. I have also disabled DF error freeze (I have no idea about this one).

Ezra, I think your issue is something completely different; most AMD systems running FreeNAS are stable for days, not hours.
 

ezra

Contributor
Joined
Jan 15, 2015
Messages
124
Hmm, it has run without errors since my last post... totally at random. Can I help in your testing?
 

cchr82

Dabbler
Joined
Dec 6, 2017
Messages
18
I think I might be suffering from your same situation but have yet to solve the issue. However I do not think it was a RAM issue in the end. Here is my current status in trying to solve the consistent crashing:

[HW]
Mobo: ASRock X370 Taichi
RAM: Corsair ECC 2x16GB
CPU: tried both 1500 and 1500X

[BIOS]
* Everything except SMT has been disabled (Cool'n'Quiet, C6, kvm)

[OS]
0) Crashed consistently within 24 hours on FreeNAS 11.1

https://forums.freenas.org/index.php?threads/freenas-11-crash-during-write.59578/#post-424111

1) Ran prime95 for ~12 hours from a USB stick, no stability problems
2) Changed to FreeBSD and ran a long iozone read/write test; it crashed every time. The following iozone command was used:

Code:
iozone -M -e -+u -T -t 12 -r 128k -s 50G -i 0 -i 1 -i 2 -i 8 -+p 70 -C 


3) Changed to Debian kernel 9.4 and 9.13 - it would no longer crash when running iozone, but would still crash within a day after installing zfs-on-linux, docker, plex, handbrake. (I have no idea if any of these caused the crash, but I remember that when I first booted up the OS it was OK for the first 24 hours before I installed docker/plex, although I doubt that would be an issue.)
This would still crash even though memory usage was very, very low.

4) currently trying debian 9.14 without anything loaded to see how long it stays alive
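For anyone who wants to repeat the iozone run from step 2 on a smaller pool, here is a rough sketch that rebuilds the same command with the thread count and file size as parameters. The function name `iozone_cmd` is made up for illustration, and the flag descriptions in the comments are my reading of the iozone documentation, so double-check them against `iozone -h` on your build.

```shell
#!/bin/sh
# Echo an iozone command line modelled on the one from step 2.
# Arguments: thread count, file size per thread (e.g. 1G).
# (iozone_cmd is a hypothetical helper name.)
iozone_cmd() {
    threads=$1
    size=$2
    # -M          include uname output in the report
    # -e          include flush (fsync) in the timing
    # -+u         report CPU utilization
    # -T -t N     throughput mode using N POSIX threads
    # -r 128k     record size
    # -s SIZE     file size
    # -i 0/1/2/8  write, read, random read/write, random mix
    # -+p 70      percentage of threads doing random reads in the mix test
    # -C          show bytes transferred by each child
    echo "iozone -M -e -+u -T -t $threads -r 128k -s $size -i 0 -i 1 -i 2 -i 8 -+p 70 -C"
}
```

`iozone_cmd 12 50G` prints the original command. As far as I understand throughput mode, each thread gets its own file, so that run would need roughly 12 x 50G = 600G of free space; something like `iozone_cmd 4 8G` is a gentler starting point.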

I am strongly considering scrapping this and returning the mobo/CPU and starting fresh. I have a few more days to decide.

 

ykhodo

Explorer
Joined
Oct 19, 2017
Messages
52
With SMT disabled it is no longer crashing, even with Cool'n'Quiet and C6 enabled (which would crash right away in 11.1-U1 for me).
 

Heasy

Cadet
Joined
Jan 24, 2018
Messages
1
I am having the same problems: the system runs brilliantly, but it will freeze randomly and the keyboard becomes unresponsive. I did not have remote logging set up yet, but I do now. The system has been running for a month now and isn't doing anything strenuous, just hosting some network data and VM stores. Over the past 5 weeks it has frozen 4-5 times, both overnight when not in use and during the day with people using data. I've done a bit of reading, but I was under the impression the Threadrippers were not having this problem, going by what I've read so far.

Hardware:
AMD Ryzen Threadripper 1950X
GigaByte Aorus Gaming 7 X399 Mainboard
32GB DDR4 Corsair Vengeance LED (2x16GB) 3000MHz Kit
 

wackymole

Explorer
Joined
Aug 21, 2017
Messages
59
Heasy, you are the first FreeNAS Threadripper user I have seen, so... congrats. Non-ECC memory as well, so it's probably an issue with FreeBSD or the actual AMD HW implementation. Threadripper should not have any SMT errors, so you should be good there.

The remote syslogs have been pretty useless so far, but feel free to post them if you get anything useful. The FreeNAS devs have temporarily given up on the issue, as there really is not enough information for a diagnosis.

We know that Windows and Ubuntu are stable long term, so I have to assume it's a bug in software somewhere. -- Cheers
 

cchr82

Dabbler
Joined
Dec 6, 2017
Messages
18
With SMT disabled it is no longer crashing, even with Cool'n'Quiet and C6 enabled (which would crash right away in 11.1-U1 for me).

Disabling SMT also seems to have fixed the immediate crashing (within 24 hours) on the Debian version. In that instance it was a combination of docker being installed and SMT. In other words, if I installed docker then I had to disable SMT.

Do you think this is specific to the X370 Taichi boards, or something where we need to be patient and wait for a fix on the OS side?
 

wackymole

Explorer
Joined
Aug 21, 2017
Messages
59
Disabling SMT also seems to have fixed the immediate crashing (within 24 hours) on the Debian version. In that instance it was a combination of docker being installed and SMT. In other words, if I installed docker then I had to disable SMT.

Do you think this is specific to the X370 Taichi boards, or something where we need to be patient and wait for a fix on the OS side?

It's not the motherboards; it is happening on a wide variety of motherboards, on Threadripper, and on both ECC and non-ECC setups. Disabling SMT just increases the time between crashes, the same as disabling Cool'n'Quiet. We need more debug info, but it is pretty hard to come by with the computers freezing. I am currently testing a new setup. Back on 11.0-U4 I was able to go 20 days without crashing, so I will only report once I get to 25. We should have a new patch, 11.1-U2, in 30ish days. --
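Since a hard freeze leaves nothing in the logs, one cheap way to at least pin down when each freeze happened is a heartbeat file: append a timestamp every minute and, after the reboot, read the last line. This is only a sketch of that idea; the function names and the log path in the comment are made up, and nothing here is specific to FreeNAS.

```shell
#!/bin/sh
# Append one timestamped line to the given file and flush it to disk.
# (heartbeat_once and last_heartbeat are hypothetical helper names.)
heartbeat_once() {
    file=$1
    printf '%s alive\n' "$(date -u '+%Y-%m-%dT%H:%M:%SZ')" >> "$file"
    # Ask the kernel to write buffered data out so the line survives a freeze.
    sync
}

# Print the last recorded heartbeat, i.e. the latest time the box was known alive.
last_heartbeat() {
    tail -n 1 "$1"
}

# Loop form, left as a comment so that sourcing this file has no side effects.
# The path is only an example:
#   while :; do heartbeat_once /mnt/tank/heartbeat.log; sleep 60; done
```

After a crash, `last_heartbeat` on that file gives the freeze time to within the sleep interval, which can then be lined up with snapshot, replication, or scrub schedules.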
 

cchr82

Dabbler
Joined
Dec 6, 2017
Messages
18
It's not the motherboards; it is happening on a wide variety of motherboards, on Threadripper, and on both ECC and non-ECC setups. Disabling SMT just increases the time between crashes, the same as disabling Cool'n'Quiet. We need more debug info, but it is pretty hard to come by with the computers freezing. I am currently testing a new setup. Back on 11.0-U4 I was able to go 20 days without crashing, so I will only report once I get to 25. We should have a new patch, 11.1-U2, in 30ish days. --

Thanks for your response. Do you think it's worth keeping these mobo/CPU builds under the assumption that they will eventually be stable? Or is that unlikely, and should I strongly consider returning them?

It's interesting to me that running docker specifically was causing crashes on a Debian build; I have no idea if that helps diagnose these problems, although that is a completely different OS.
 

wackymole

Explorer
Joined
Aug 21, 2017
Messages
59
Thanks for your response. Do you think it's worth keeping these mobo/CPU builds under the assumption that they will eventually be stable? Or is that unlikely, and should I strongly consider returning them?

It's interesting to me that running docker specifically was causing crashes on a Debian build; I have no idea if that helps diagnose these problems, although that is a completely different OS.
It could be months before the problem is found. Unless you can monitor it and be a guinea pig, go for a stable build. AMD's thread/CPU count is very nice, but FreeBSD is not liking Zen right now.
 

ezra

Contributor
Joined
Jan 15, 2015
Messages
124
Wow, disabling SMT has kept it up for over 3 days now... at the cost of a slight performance hit, I guess... What did you use as a remote syslog server? I have a spare Raspberry Pi around... I need to find some guides, then I can help figure this out. I do want to contribute but don't know how...
 

wackymole

Explorer
Joined
Aug 21, 2017
Messages
59
11.1-U1 is useless for rsyslog because of the "freenas health" bug. My test just crashed after 13 days. A lot of devices have syslog built in and, with a little tweaking, can be used as an rsyslog server. Ryzen is still unstable on FreeBSD/FreeNAS. Firmware 4.5, AB350 Pro4

Edit 1: FreeNAS on top of ESXi probably runs stable.
 

ykhodo

Explorer
Joined
Oct 19, 2017
Messages
52
This was with SMT disabled?
 