SOLVED TrueNAS keeps restarting every 6-10 min

Etorix

Wizard
Joined
Dec 30, 2020
Messages
2,134
Thanks for the details.
So, no bifurcation: two Optanes plugged into the PSU and two powered by their PCIe slot. Individually, each component should have a suitable power source. Maybe it's a global issue, not enough power overall or an overloaded rail, when the five 9305-24i HBAs plus the two 905p drives all demand full power from their slots at the same time? I notice that this board has no extra power connector that could feed the PCIe slots.
 

NugentS

MVP
Joined
Apr 16, 2020
Messages
2,947
Can I suggest going back to basics?
A single drive (HDD or SSD), nothing else connected to anything. Create a pool and fill it with rubbish.
If that works, add a few more and test.
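Something along these lines, as a rough sketch only - the device and pool names are placeholders, and on TrueNAS you'd normally create and destroy the pool through the UI rather than the shell:

Code:
# Hypothetical single-drive sanity test (replace /dev/sdX with the one drive under test)
zpool create -f -m /mnt/testpool testpool /dev/sdX
dd if=/dev/urandom of=/mnt/testpool/rubbish bs=1M count=50000 status=progress
zpool scrub testpool        # read everything back; wait for the scrub to finish
zpool status -v testpool    # any errors (or a reset) point at that drive or its path
zpool destroy testpool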
 

Sawtaytoes

Patron
Joined
Jul 9, 2022
Messages
221
Can I suggest going back to basics?
A single drive (HDD or SSD), nothing else connected to anything. Create a pool and fill it with rubbish.
If that works, add a few more and test.
I was already doing this with the extra SSDs that weren't in a zpool. No issues writing to them. I even moved the miniSAS HD cables around to different cards, removed cards, and added SAS expanders. I couldn't get a reboot until I imported Bunnies and wrote to it.

To check whether TrueNAS itself has an issue, you can boot another Linux distro that uses the same or a newer version of ZFS, import the Bunnies pool, and test writes.
If the resets continue, TrueNAS is not the problem, but it could still be a software issue - ZFS or something else.
Or you can try TrueNAS Core, if Core's ZFS can import a pool from Scale, to check whether anything different shows up in the logs.
This was a great idea. Sadly, I already nixed the whole zpool.
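For posterity, that test would've looked roughly like this - the pool name is the one from this thread, everything else is a placeholder, and it assumes the pool imports at all under the other distro and lands under /mnt/Bunnies like it does on TrueNAS:

Code:
# From a live environment of another ZFS-capable distro (sketch only)
zpool import -o readonly=on -R /mnt Bunnies   # does a read-only import survive?
zpool export Bunnies
zpool import -R /mnt Bunnies                  # now read-write...
fio --ioengine=libaio --filename=/mnt/Bunnies/fio-test --rw=write --bs=1M \
    --numjobs=4 --iodepth=8 --size=50G --runtime=600 --time_based --name=reset-hunt
# ...then watch whether the box still resets within 6-10 minutes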

Since that pool was the issue, I created and destroyed a bunch of zpools, running tons of benchmarks, until I was satisfied the machine wouldn't reset again.

After creating my new dRAID zpool with these SSDs, I went ahead and started copying data. It's been writing at ~2.0-2.5GB/s consistently with ZFS send/recv through TrueNAS from my HDD array. I'd say those are pretty good numbers considering the source drives.
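The copy itself is nothing exotic - roughly this shape, with placeholder dataset names (TrueNAS would normally drive this through a replication task rather than the shell):

Code:
# Sketch: snapshot the source on the HDD pool, then stream it to the new SSD pool
zfs snapshot -r HDDPool/Media@migrate
zfs send -R HDDPool/Media@migrate | zfs recv -F SSDPool/Media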

I'm assuming something was botched with that zpool. Was it a TrueNAS issue or a ZFS issue? I can't tell anymore. It would've been good to know for posterity, but this already ate 3 days of my time, and I just want everything back to normal.

SSD zpool benchmarks

I did a ton of benchmarks and landed on these two configs:

Code:
## 40 x mirrors (4TB drives only) @ 1M recordsize
`readwrite`
   READ: bw=3892MiB/s (4081MB/s), 3892MiB/s-3892MiB/s (4081MB/s-4081MB/s), io=38.0GiB (40.8GB), run=10001-10001msec
  WRITE: bw=4059MiB/s (4256MB/s), 4059MiB/s-4059MiB/s (4256MB/s-4256MB/s), io=39.6GiB (42.6GB), run=10001-10001msec

## 7 x draid2:5d:16c:1s & 2 x special mirror @ 1M recordsize
`readwrite`
   READ: bw=3251MiB/s (3408MB/s), 3251MiB/s-3251MiB/s (3408MB/s-3408MB/s), io=31.8GiB (34.1GB), run=10002-10002msec
  WRITE: bw=3436MiB/s (3603MB/s), 3436MiB/s-3436MiB/s (3603MB/s-3603MB/s), io=33.6GiB (36.0GB), run=10002-10002msec

## 57 x mirrors (17 x 2TB & 40 x 4TB) @ 1M recordsize & 16 jobs
`readwrite`
   READ: bw=9514MiB/s (9976MB/s), 311MiB/s-1308MiB/s (326MB/s-1372MB/s), io=93.3GiB (100GB), run=10001-10042msec
  WRITE: bw=9727MiB/s (10.2GB/s), 278MiB/s-1243MiB/s (292MB/s-1303MB/s), io=95.4GiB (102GB), run=10001-10042msec
`read`
   READ: bw=22.0GiB/s (23.7GB/s), 339MiB/s-5145MiB/s (355MB/s-5395MB/s), io=221GiB (237GB), run=10001-10029msec
`write`
  WRITE: bw=23.3GiB/s (25.0GB/s), 1396MiB/s-1592MiB/s (1464MB/s-1670MB/s), io=233GiB (250GB), run=10004-10014msec
`randread`
   READ: bw=6460MiB/s (6774MB/s), 401MiB/s-409MiB/s (421MB/s-429MB/s), io=63.3GiB (67.9GB), run=10001-10028msec
`randwrite`
  WRITE: bw=7584MiB/s (7953MB/s), 379MiB/s-630MiB/s (397MB/s-660MB/s), io=74.2GiB (79.7GB), run=10002-10023msec

## 7 x draid2:5d:16c:1s & 2 x special mirror @ 1M recordsize & 16 jobs
`readwrite`
   READ: bw=11.6GiB/s (12.4GB/s), 390MiB/s-1320MiB/s (409MB/s-1384MB/s), io=116GiB (124GB), run=10002-10016msec
  WRITE: bw=11.7GiB/s (12.5GB/s), 393MiB/s-1323MiB/s (413MB/s-1388MB/s), io=117GiB (126GB), run=10002-10016msec
`read`
   READ: bw=17.7GiB/s (19.0GB/s), 551MiB/s-2811MiB/s (578MB/s-2947MB/s), io=177GiB (190GB), run=10001-10026msec
`write`
  WRITE: bw=15.9GiB/s (17.0GB/s), 838MiB/s-1295MiB/s (878MB/s-1358MB/s), io=159GiB (171GB), run=10001-10016msec
`randread`
   READ: bw=12.5GiB/s (13.5GB/s), 786MiB/s-819MiB/s (824MB/s-859MB/s), io=126GiB (135GB), run=10001-10015msec
`randwrite`
  WRITE: bw=6626MiB/s (6948MB/s), 332MiB/s-633MiB/s (348MB/s-664MB/s), io=72.0GiB (77.3GB), run=10001-11130msec

## Benchmark script
# Disable the ARC on the test dataset so fio measures the pool, not RAM
zfs set primarycache=none Temp
fio --ioengine=libaio --filename=/mnt/Temp/performanceTest --direct=1 --sync=0 --rw=readwrite --bs=16M --numjobs=16 --iodepth=1 --runtime=10 --size=50G --time_based --name=fio
fio --ioengine=libaio --filename=/mnt/Temp/performanceTest --direct=1 --sync=0 --rw=read --bs=16M --numjobs=16 --iodepth=1 --runtime=10 --size=50G --time_based --name=fio
fio --ioengine=libaio --filename=/mnt/Temp/performanceTest --direct=1 --sync=0 --rw=write --bs=16M --numjobs=16 --iodepth=1 --runtime=10 --size=50G --time_based --name=fio
fio --ioengine=libaio --filename=/mnt/Temp/performanceTest --direct=1 --sync=0 --rw=randread --bs=16M --numjobs=16 --iodepth=1 --runtime=10 --size=50G --time_based --name=fio
fio --ioengine=libaio --filename=/mnt/Temp/performanceTest --direct=1 --sync=0 --rw=randwrite --bs=16M --numjobs=16 --iodepth=1 --runtime=10 --size=50G --time_based --name=fio
# Clean up the test file and restore normal caching
rm /mnt/Temp/performanceTest
zfs set primarycache=all Temp

I'm satisfied with getting an extra 100 TiB out of dRAID. The speeds already max out the SMB Multichannel setup I can support, and my NVMe drives in Windows are Gen4, so each of them individually should be able to saturate this link. I think I'm good!
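For anyone decoding the vdev spec above: draid2:5d:16c:1s means double parity, 5 data disks per redundancy group, 16 child disks in the vdev, and 1 distributed spare. At the CLI the layout looks roughly like this - placeholder names, only one of the seven dRAID groups shown, and TrueNAS builds this from the UI anyway:

Code:
# Sketch of the dRAID + special metadata mirror layout (not the exact commands used here)
zpool create tank \
  draid2:5d:16c:1s sda sdb sdc sdd sde sdf sdg sdh sdi sdj sdk sdl sdm sdn sdo sdp \
  special mirror nvme0n1 nvme1n1 \
  special mirror nvme2n1 nvme3n1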

One thing I'm surprised about is the relatively slow speeds. Each drive is addressed directly by the hardware, and PCIe 3.0 x8 is plenty of bandwidth for these SATA SSDs, so I wish I could get more speed out of them just for kicks. Oh well.
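Back-of-envelope on that, assuming roughly 985 MB/s of usable bandwidth per PCIe 3.0 lane and ~530 MB/s sequential per SATA SSD (both assumptions, not measurements):

Code:
awk 'BEGIN {
  hba_uplink = 8 * 0.985               # PCIe 3.0 x8 per 9305-24i, ~7.9 GB/s
  per_ssd    = 0.53                    # realistic sequential ceiling per SATA SSD, in GB/s
  printf "SSDs that saturate one HBA uplink: ~%.0f of its 24 ports\n", hba_uplink / per_ssd
}'

So a fully busy card tops out at its x8 uplink rather than the sum of its 24 SSDs, which is one more layer between the raw drive numbers and what fio reports.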
 

Sawtaytoes

Patron
Joined
Jul 9, 2022
Messages
221
A great talk about ZFS on NVMe from BSDconf 2022:
Scaling ZFS for NVMe - Allan Jude
There is some ZFS internal logic that is optimized for spinning rust and should be changed for flash storage.
I actually watched that video last week, but there's nothing for me to do. My takeaway is that NVMe is tough on ZFS and the devs will fix it eventually, though the presenter has already done some preliminary work and shared his findings.

That's beyond my use case though. I have 4 x NVMe drives as metadata drives only, and they're writing at something like 14MB/s at the highest end. My SSDs are holding them back, but my SSDs also have significantly more storage at a more affordable cost, so it's moot.
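That per-vdev number is easy to watch live; something like the following breaks throughput out per vdev, special mirrors included (substitute the actual pool name for tank):

Code:
zpool iostat -v tank 5    # per-vdev read/write bandwidth, refreshed every 5 seconds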
 