TrueNAS Scale - System freeze

friendlyguy

Dabbler
Joined
Nov 10, 2022
Messages
31
Good Morning!
I am a new TrueNAS user and installed my file server just two days ago. It's not a new system; I just decided to repurpose it.
Previously it ran on a big RAID controller, with two 8-disk RAID-6 arrays striped together in a Storage Spaces pool.
It worked fine for several years, but this configuration had several downsides, so I decided to move to a ZFS-based NAS.

Some System Specs:
- Supermicro X9SCA-F
- Xeon E3-1265L V2
- 4x 8GB of Kingston ECC RAM
- Microsemi SmartHBA 2100-24i
- Brocade 1020 CNA
- 8x 10TB WD Gold
- 4x 240GB Samsung Pro SSD
- 2x Samsung 128GB Evo SSD
- 19" Supermicro Server Chassis, similar to this one: https://www.pc-pitstop.com/supermicro-4u-24-bay-server-superchassis-846be1c-r1k23b
- "Gaming Grade" Power-Supply: Enermax 800Watts

2 SSDs are in a RAID1 for the OS (the HBA supports mixed mode).
I have several more disks and am playing with different configurations at the moment. Currently it's one pool:

- 8 disks in raid-z2
- 2 SSDs as L2ARC
- 2 SSDs (Mirror) as ZIL
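
As a rough sanity check on this layout, the usable space of a RAID-Z vdev is approximately the number of data disks times the disk size. The Python sketch below is only a back-of-the-envelope estimate: `raidz_usable_tb` is a made-up helper name, and it ignores metadata, padding and the free-space reserve, so the real figure will be lower.

```python
def raidz_usable_tb(disks: int, parity: int, disk_tb: float) -> float:
    """Rough usable capacity of one RAID-Z vdev in TB.

    Ignores metadata, allocation padding and free-space reserve,
    so treat the result as an upper bound.
    """
    if disks <= parity:
        raise ValueError("a RAID-Z vdev needs more disks than parity devices")
    return (disks - parity) * disk_tb


# 8 x 10 TB in RAID-Z2 (two parity disks), as in the pool listed above
print(raidz_usable_tb(disks=8, parity=2, disk_tb=10.0))  # 60.0
```

The L2ARC and log devices do not add to this number; they only cache reads and absorb synchronous writes.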

Performance isn't bad at all. I was able to achieve 500MB/s sustained throughput to the disks while copying from another file server. BUT:
After 1 or 2 TB of data the entire system just froze. No error messages, no log messages: frozen.
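
For anyone trying to line the freeze up against temperature or load graphs, those two numbers translate into a time window. A small Python sketch, assuming decimal units (1 TB = 1,000,000 MB) and that the 500MB/s rate really was sustained for the whole copy; `seconds_to_copy` is a hypothetical helper:

```python
def seconds_to_copy(terabytes: float, mb_per_s: float) -> float:
    """Seconds needed to move `terabytes` (decimal TB) at `mb_per_s` (decimal MB/s)."""
    return terabytes * 1_000_000 / mb_per_s


for tb in (1, 2):
    minutes = seconds_to_copy(tb, 500) / 60
    print(f"{tb} TB at 500 MB/s: about {minutes:.0f} minutes")
```

So the freeze would show up roughly 33 to 67 minutes into a sustained copy, which is the window worth watching in any sensor history.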

I was able to log in via IPMI; there were no problems in the event log either.
So I decided to run memtest to verify that the memory is still good: no errors found, and ~6 hours of testing resulted in a pass.

I am now leaning towards the idea that the power supply may be the culprit and have ordered a new one.
Any other ideas?
-> I doubt it's a cooling problem: 19-20 degrees Celsius room temperature and 5 very powerful fans in the chassis.

I also decided to set up a centralized logging system; maybe that would have shown the last messages before the freeze.

Kind regards
 

joeschmuck

Old Man
Moderator
Joined
May 28, 2011
Messages
10,994
A few questions: What version of TrueNAS are you running?
How are you copying the files? Directly from the CLI, from drive to drive? Using an external computer to copy the files? If so, with what application?
(hba supports mixed mode)
This statement concerns me. Exactly how is your HBA programmed? It should be flashed into IT mode. ZFS does not like RAID controllers acting as RAID controllers.

Is the problem repeatable?

I'm not the expert on HBAs since I'm not using one, but if you can provide some more information, maybe a sound answer will pop up.
 

friendlyguy

Dabbler
Joined
Nov 10, 2022
Messages
31
Sorry, I forgot that: it's the latest stable:
TrueNAS-SCALE-22.02.4

From Microsemi's product website:
"Offers basic RAID functionality for up to 32 devices supporting RAID levels 0, 1, 10, and 5 in conjunction with full HBA functionality for up to 238 devices in the same solution"

This is the hba i use:

I don't think any reflashing to IT mode is required since it's an HBA.
I enabled two of the 24 ports as RAID ports, put them into RAID1 and installed TrueNAS onto them. All other ports are in HBA mode.

About the repeatability: I am almost done configuring the central logging. Once that's done I will throw data at the system again and see if it happens again.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
from microsemis product website:
"Offers basic RAID functionality for up to 32 devices supporting RAID levels 0, 1, 10, and 5 in conjunction with full HBA functionality for up to 238 devices in the same solution"

You have some sort of crappy RAID controller. Please swap this out for an LSI HBA. Please see the article


and note that this absolutely is targeted at you and your current RAID card. Despite what you read on the product web site, you do not have "HBA functionality"; it's still a RAID card. See the quoted article especially points #2, #3, #4, and #5, all of which apply here.
 

Davvo

MVP
Joined
Jul 12, 2022
Messages
3,222
- 2 SSDs (Mirror) as ZIL
It's most likely not related to the issue you are having but what are you using for this?
 

friendlyguy

Dabbler
Joined
Nov 10, 2022
Messages
31
@jgreco: Thanks for your response. If that's the case, that would be very, very sad. I just bought this "HBA" for almost a grand. I cannot undo that, and if this means goodbye TrueNAS/ZFS, it is what it is.

About ZIL: at the moment it's just a playground and I am testing what can be done and how it behaves. I'm using 2 of the Samsung SSDs for this.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
- 19" Supermicro Server Chassis, similar to this one: https://www.pc-pitstop.com/supermicro-4u-24-bay-server-superchassis-846be1c-r1k23b

I doubt its a cooling problem. 19-20 Degree Celsius Room Temp and 5 very powerful fans in the chassis.

It's possible you have a "hotspot" cooling issue around the PCIe slot area, especially if the fans are ramping their speed based on CPU temperatures rather than a sensor closer to the problem spot. An overheating HBA or component would certainly manifest itself as a fully frozen, non-responsive system, and yours specifically has a cooling spec of 200 LFM (linear feet per minute) of airflow, which ironically may be harder to come by in a more "open" style 4U vs a "cramped" 2U where there's no other exhaust path save for "over the cards."
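
For readers more used to metric units, the 200 LFM figure converts as follows. A short Python sketch of the unit conversion only; `lfm_to_m_per_s` is a made-up name, and the result says nothing about how much air actually reaches the card in this chassis:

```python
FEET_TO_METRES = 0.3048  # exact by definition of the international foot


def lfm_to_m_per_s(lfm: float) -> float:
    """Convert linear feet per minute to metres per second."""
    return lfm * FEET_TO_METRES / 60.0


print(f"{lfm_to_m_per_s(200):.2f} m/s")  # 1.02 m/s
```

That is the air speed required across the card itself, not at the fan wall.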

Can you be more exact with the Supermicro model number, and is there any ducting around the expansion slots?

If thats the case that would be very very sad. I just bought this "HBA" for almost a grand. I cannot undo that and if this means goodbye truenas/ zfs it is what it is.

Would you be able to resell it? Secondary market on that looks fairly similar to that listed price - and purchasing a simple LSI HBA just for testing purposes (even the low-end SAS2008) would be fairly inexpensive - although if you need to add an expander, that may complicate things.
 

friendlyguy

Dabbler
Joined
Nov 10, 2022
Messages
31
There are 2 fans directed through an "air tunnel" over the CPU and RAM, and 1 fan is directed at the PCI cards. I left one slot free between the NIC and the HBA, and also left the slot bracket out to get better airflow between those cards.
I can put in another fan to test whether it changes anything. In the meantime I set up Graylog and started to throw data at the system again until it froze.
Here are the logs:
Code:
truenas.domain.local systemd[1]: Started System Logger Daemon.
truenas.domain.local systemd[1]: Stopped System Logger Daemon.
truenas.domain.local systemd[1]: Starting System Logger Daemon...
truenas.domain.local systemd-journald[272709]: Runtime Journal (/run/log/journal/17c49567eea046168d9b8121de0f0439) is 8.0M, max 320.8M, 312.8M free.
truenas.domain.local systemd[1]: syslog-ng.service: Succeeded.
truenas.domain.local systemd-journald[272709]: Runtime Journal (/run/log/journal/17c49567eea046168d9b8121de0f0439) is 8.0M, max 320.8M, 312.8M free.
truenas.domain.local systemd[1]: Finished Flush Journal to Persistent Storage.
truenas.domain.local systemd[1]: Stopping System Logger Daemon...
truenas.domain.local systemd[1]: Starting Flush Journal to Persistent Storage...
truenas.domain.local syslog-ng[272716]: Syslog connection established; fd='12', server='AF_INET(192.168.123.26:5140)', local='AF_INET(0.0.0.0:514)'
truenas.domain.local systemd: Starting Journal Service...
truenas.domain.local systemd: Stopped Journal Service.
truenas.domain.local systemd: systemd-journald.service: Succeeded.
truenas.domain.local systemd: Started Journal Service.
truenas.domain.local systemd: Stopping Journal Service...
truenas.domain.local systemd-journald[272709]: Journal started
truenas.domain.local systemd-journald: Received SIGTERM from PID 1 (systemd).
truenas.domain.local syslog-ng[272716]: syslog-ng starting up; version='3.28.1'
truenas.domain.local systemd[1]: Stopping Flush Journal to Persistent Storage...
truenas.domain.local syslog-ng[272702]: syslog-ng shutting down; version='3.28.1'
truenas.domain.local systemd-journald[272672]: Journal stopped
truenas.domain.local systemd[1]: systemd-journal-flush.service: Succeeded.
truenas.domain.local systemd[1]: Stopped Flush Journal to Persistent Storage.
truenas.domain.local systemd: systemd-journald.service: Succeeded.
truenas.domain.local systemd[1]: Started System Logger Daemon.
truenas.domain.local systemd[1]: Starting System Logger Daemon...
truenas.domain.local systemd[1]: Stopped System Logger Daemon.
truenas.domain.local systemd[1]: Stopping System Logger Daemon...
truenas.domain.local systemd[1]: syslog-ng.service: Succeeded.
truenas.domain.local systemd[1]: Finished Flush Journal to Persistent Storage.
truenas.domain.local systemd-journald[272672]: Runtime Journal (/run/log/journal/17c49567eea046168d9b8121de0f0439) is 8.0M, max 320.8M, 312.8M free.
truenas.domain.local systemd[1]: Starting Flush Journal to Persistent Storage...
truenas.domain.local syslog-ng[272702]: syslog-ng starting up; version='3.28.1'
truenas.domain.local systemd: Started Journal Service.
truenas.domain.local systemd-journald[272672]: Runtime Journal (/run/log/journal/17c49567eea046168d9b8121de0f0439) is 8.0M, max 320.8M, 312.8M free.
truenas.domain.local systemd-journald[272672]: Journal started
truenas.domain.local systemd: Starting Journal Service...
truenas.domain.local systemd: Stopped Journal Service.
truenas.domain.local syslog-ng[272702]: Syslog connection established; fd='12', server='AF_INET(192.168.123.26:5140)', local='AF_INET(0.0.0.0:514)'
truenas.domain.local systemd: Stopping Journal Service...
truenas.domain.local systemd-journald: Received SIGTERM from PID 1 (systemd).
truenas.domain.local systemd[1]: var-lib-systemd-coredump.mount: Succeeded.
truenas.domain.local systemd[1]: Finished system activity accounting tool.
truenas.domain.local systemd[1]: sysstat-collect.service: Succeeded.
truenas.domain.local systemd[1]: Starting system activity accounting tool...
truenas.domain.local systemd[1]: Starting User Manager for UID 0...
truenas.domain.local systemd-logind[8453]: New session 11 of user root.
truenas.domain.local systemd[1]: Finished User Runtime Directory /run/user/0.
truenas.domain.local systemd[273327]: pam_unix(systemd-user:session): session opened for user root(uid=0) by (uid=0)
truenas.domain.local systemd[273332]: gpgconf: error running '/usr/lib/gnupg/scdaemon': probably not installed
truenas.domain.local systemd[1]: Starting User Runtime Directory /run/user/0...
truenas.domain.local systemd[1]: Created slice User Slice of UID 0.
truenas.domain.local login[273319]: pam_unix(login:session): session opened for user root(uid=0) by (uid=0)
truenas.domain.local login[273350]: ROOT LOGIN  on '/dev/pts/2'
truenas.domain.local systemd[1]: Started Session 11 of user root.
truenas.domain.local systemd[1]: Started User Manager for UID 0.
truenas.domain.local systemd[273327]: Startup finished in 271ms.
truenas.domain.local systemd[273327]: Reached target Sockets.
truenas.domain.local systemd[273327]: Listening on GnuPG cryptographic agent and passphrase cache.
truenas.domain.local systemd[273327]: Reached target Basic System.
truenas.domain.local systemd[273327]: Listening on GnuPG cryptographic agent (ssh-agent emulation).
truenas.domain.local systemd[273327]: Reached target Main User Target.
truenas.domain.local systemd[273327]: Listening on GnuPG cryptographic agent and passphrase cache (restricted).
truenas.domain.local systemd[273327]: Listening on GnuPG cryptographic agent and passphrase cache (access for web browsers).
truenas.domain.local systemd[273327]: Listening on D-Bus User Message Bus Socket.
truenas.domain.local systemd[273327]: Listening on GnuPG network certificate management daemon.
truenas.domain.local systemd[273327]: Starting D-Bus User Message Bus Socket.
truenas.domain.local systemd-xdg-autostart-generator[273345]: Exec binary '/usr/libexec/at-spi-bus-launcher' does not exist: No such file or directory
truenas.domain.local systemd[273327]: Queued start job for default target Main User Target.
truenas.domain.local systemd[273327]: Reached target Timers.
truenas.domain.local systemd[273327]: Reached target Paths.
truenas.domain.local systemd[273327]: Created slice User Application Slice.
truenas.domain.local systemd-xdg-autostart-generator[273345]: Not generating service for XDG autostart app-at\x2dspi\x2ddbus\x2dbus-autostart.service, error parsing Exec= line: No such file or directory
truenas.domain.local systemd-logind[8453]: Removed session 11.
truenas.domain.local systemd[1]: session-11.scope: Succeeded.
truenas.domain.local systemd-logind[8453]: Session 11 logged out. Waiting for processes to exit.
truenas.domain.local systemd[1]: user-runtime-dir@0.service: Succeeded.
truenas.domain.local systemd[1]: Stopped User Runtime Directory /run/user/0.
truenas.domain.local systemd[1]: Removed slice User Slice of UID 0.
truenas.domain.local systemd[273327]: Closed GnuPG cryptographic agent and passphrase cache.
truenas.domain.local systemd[273327]: Closed GnuPG cryptographic agent and passphrase cache (restricted).
truenas.domain.local systemd[1]: Stopped User Manager for UID 0.
truenas.domain.local systemd[1]: run-user-0.mount: Succeeded.
truenas.domain.local systemd[1]: Stopping User Runtime Directory /run/user/0...
truenas.domain.local systemd[1]: user@0.service: Succeeded.
truenas.domain.local systemd[273327]: Reached target Exit the Session.
truenas.domain.local systemd[273327]: systemd-exit.service: Succeeded.
truenas.domain.local systemd[273327]: Finished Exit the Session.
truenas.domain.local systemd[273327]: Reached target Shutdown.
truenas.domain.local systemd[273327]: Closed GnuPG cryptographic agent (ssh-agent emulation).
truenas.domain.local systemd[273327]: Removed slice User Application Slice.
truenas.domain.local systemd[273327]: gpg-agent.socket: Succeeded.
truenas.domain.local systemd[273327]: gpg-agent-browser.socket: Succeeded.
truenas.domain.local systemd[273327]: gpg-agent-ssh.socket: Succeeded.
truenas.domain.local systemd[273327]: Closed GnuPG cryptographic agent and passphrase cache (access for web browsers).
truenas.domain.local systemd[273327]: gpg-agent-extra.socket: Succeeded.
truenas.domain.local systemd[273327]: dirmngr.socket: Succeeded.
truenas.domain.local systemd[273327]: Closed D-Bus User Message Bus Socket.
truenas.domain.local systemd[273327]: dbus.socket: Succeeded.
truenas.domain.local systemd[273327]: Closed GnuPG network certificate management daemon.
truenas.domain.local systemd[273327]: Stopped target Timers.
truenas.domain.local systemd[273327]: Stopped target Main User Target.
truenas.domain.local systemd[273327]: Stopped target Sockets.
truenas.domain.local systemd[273327]: Stopped target Paths.
truenas.domain.local systemd[273327]: Stopped target Basic System.
truenas.domain.local systemd[1]: Stopping User Manager for UID 0...
truenas.domain.local dhclient[6159]: DHCPDISCOVER on enp3s0 to 255.255.255.255 port 67 interval 6
truenas.domain.local dhclient[6159]: DHCPDISCOVER on enp3s0 to 255.255.255.255 port 67 interval 16
truenas.domain.local dhclient[6159]: DHCPDISCOVER on enp3s0 to 255.255.255.255 port 67 interval 16
truenas.domain.local dhclient[6159]: DHCPDISCOVER on enp3s0 to 255.255.255.255 port 67 interval 10
truenas.domain.local dhclient[6159]: DHCPDISCOVER on enp3s0 to 255.255.255.255 port 67 interval 13
truenas.domain.local dhclient[6159]: No working leases in persistent database - sleeping.
truenas.domain.local dhclient[6159]: No DHCPOFFERS received.
truenas.domain.local dhclient[6159]: DHCPDISCOVER on enp3s0 to 255.255.255.255 port 67 interval 7
truenas.domain.local dhclient[6159]: DHCPDISCOVER on enp3s0 to 255.255.255.255 port 67 interval 13
truenas.domain.local dhclient[6159]: DHCPDISCOVER on enp3s0 to 255.255.255.255 port 67 interval 7
truenas.domain.local dhclient[6159]: DHCPDISCOVER on enp3s0 to 255.255.255.255 port 67 interval 11
truenas.domain.local dhclient[6159]: DHCPDISCOVER on enp3s0 to 255.255.255.255 port 67 interval 16
truenas.domain.local dhclient[6159]: DHCPDISCOVER on enp3s0 to 255.255.255.255 port 67 interval 7
truenas.domain.local dhclient[6159]: No working leases in persistent database - sleeping.
truenas.domain.local dhclient[6159]: No DHCPOFFERS received.
truenas.domain.local systemd[1]: sysstat-collect.service: Succeeded.
truenas.domain.local systemd[1]: Finished system activity accounting tool.
truenas.domain.local systemd[1]: Starting system activity accounting tool...
truenas.domain.local smartd[8442]: Device: /dev/sdf [SAT], 3 Currently unreadable (pending) sectors
truenas.domain.local smartd[8442]: Device: /dev/sdo [SAT], 8 Currently unreadable (pending) sectors
truenas.domain.local smartd[8442]: Device: /dev/sdv [SAT], SMART Usage Attribute: 190 Airflow_Temperature_Cel changed from 70 to 69
truenas.domain.local smartd[8442]: Device: /dev/sdl [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 139 to 136
truenas.domain.local smartd[8442]: Device: /dev/sdh [SAT], 8 Currently unreadable (pending) sectors
truenas.domain.local smartd[8442]: Device: /dev/sdg [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 109 to 108
truenas.domain.local nginx[8790]: 2022/11/10 15:12:26 [error] 8790#8790: *26 directory index of "/usr/share/truenas/webui/assets/images/" is forbidden, client: 192.168.251.1, server: localhost, request: "GET /ui/assets/images/ HTTP/1.1", host: "192.168.251.240", referrer: "http://192.168.251.240/"
truenas.domain.local systemd[282105]: Listening on D-Bus User Message Bus Socket.
truenas.domain.local systemd[1]: Started Session 13 of user root.
truenas.domain.local login[282129]: ROOT LOGIN  on '/dev/pts/3'
truenas.domain.local systemd[282105]: Reached target Sockets.
truenas.domain.local systemd[282105]: Startup finished in 274ms.
truenas.domain.local systemd[282105]: Reached target Main User Target.
truenas.domain.local systemd[1]: Started User Manager for UID 0.
truenas.domain.local systemd[282105]: Listening on GnuPG cryptographic agent (ssh-agent emulation).
truenas.domain.local systemd[282105]: Listening on GnuPG cryptographic agent and passphrase cache.
truenas.domain.local systemd[282105]: Reached target Basic System.
truenas.domain.local systemd[282105]: Listening on GnuPG cryptographic agent and passphrase cache (restricted).
truenas.domain.local systemd[282105]: Listening on GnuPG cryptographic agent and passphrase cache (access for web browsers).
truenas.domain.local systemd[282105]: Listening on GnuPG network certificate management daemon.
truenas.domain.local systemd[282105]: Starting D-Bus User Message Bus Socket.
truenas.domain.local systemd[282105]: Queued start job for default target Main User Target.
truenas.domain.local systemd[282105]: Reached target Timers.
truenas.domain.local systemd[282105]: Reached target Paths.
truenas.domain.local systemd[282105]: Created slice User Application Slice.
truenas.domain.local systemd-xdg-autostart-generator[282123]: Not generating service for XDG autostart app-at\x2dspi\x2ddbus\x2dbus-autostart.service, error parsing Exec= line: No such file or directory
truenas.domain.local systemd-xdg-autostart-generator[282123]: Exec binary '/usr/libexec/at-spi-bus-launcher' does not exist: No such file or directory
truenas.domain.local systemd[282110]: gpgconf: error running '/usr/lib/gnupg/scdaemon': probably not installed
truenas.domain.local systemd[282105]: pam_unix(systemd-user:session): session opened for user root(uid=0) by (uid=0)
truenas.domain.local systemd[1]: Starting User Runtime Directory /run/user/0...
truenas.domain.local systemd[1]: Created slice User Slice of UID 0.
truenas.domain.local systemd[1]: Finished User Runtime Directory /run/user/0.
truenas.domain.local systemd[1]: Starting User Manager for UID 0...
truenas.domain.local systemd-logind[8453]: New session 13 of user root.
truenas.domain.local login[282097]: pam_unix(login:session): session opened for user root(uid=0) by (uid=0)
truenas.domain.local systemd-logind[8453]: Session 13 logged out. Waiting for processes to exit.
truenas.domain.local systemd[1]: session-13.scope: Succeeded.
truenas.domain.local systemd-logind[8453]: Removed session 13.
truenas.domain.local systemd[1]: Removed slice User Slice of UID 0.
truenas.domain.local systemd[1]: user-runtime-dir@0.service: Succeeded.
truenas.domain.local systemd[1]: Stopped User Runtime Directory /run/user/0.
truenas.domain.local systemd[282105]: Closed GnuPG network certificate management daemon.
truenas.domain.local systemd[282105]: Closed GnuPG cryptographic agent (ssh-agent emulation).
truenas.domain.local systemd[1]: Stopping User Runtime Directory /run/user/0...
truenas.domain.local systemd[1]: user@0.service: Succeeded.
truenas.domain.local systemd[1]: run-user-0.mount: Succeeded.
truenas.domain.local systemd[1]: Stopped User Manager for UID 0.
truenas.domain.local systemd[282105]: systemd-exit.service: Succeeded.
truenas.domain.local systemd[282105]: Finished Exit the Session.
truenas.domain.local systemd[282105]: Reached target Exit the Session.
truenas.domain.local systemd[282105]: Reached target Shutdown.
truenas.domain.local systemd[282105]: Removed slice User Application Slice.
truenas.domain.local systemd[282105]: Closed GnuPG cryptographic agent and passphrase cache.
truenas.domain.local systemd[282105]: Closed GnuPG cryptographic agent and passphrase cache (restricted).
truenas.domain.local systemd[282105]: gpg-agent-ssh.socket: Succeeded.
truenas.domain.local systemd[282105]: gpg-agent.socket: Succeeded.
truenas.domain.local systemd[282105]: gpg-agent-browser.socket: Succeeded.
truenas.domain.local systemd[282105]: gpg-agent-extra.socket: Succeeded.
truenas.domain.local systemd[282105]: Closed GnuPG cryptographic agent and passphrase cache (access for web browsers).
truenas.domain.local systemd[282105]: Stopped target Sockets.
truenas.domain.local systemd[282105]: Stopped target Paths.
truenas.domain.local systemd[282105]: Stopped target Basic System.
truenas.domain.local systemd[282105]: Stopped target Main User Target.
truenas.domain.local systemd[282105]: dirmngr.socket: Succeeded.
truenas.domain.local systemd[282105]: Closed D-Bus User Message Bus Socket.
truenas.domain.local systemd[282105]: dbus.socket: Succeeded.
truenas.domain.local systemd[282105]: Stopped target Timers.
truenas.domain.local systemd[1]: Stopping User Manager for UID 0...
truenas.domain.local CRON[284039]: pam_unix(cron:session): session opened for user root(uid=0) by (uid=0)
truenas.domain.local CRON[284040]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
truenas.domain.local CRON[284039]: pam_unix(cron:session): session closed for user root
truenas.domain.local dhclient[6159]: DHCPDISCOVER on enp3s0 to 255.255.255.255 port 67 interval 4
truenas.domain.local dhclient[6159]: DHCPDISCOVER on enp3s0 to 255.255.255.255 port 67 interval 9
truenas.domain.local dhclient[6159]: DHCPDISCOVER on enp3s0 to 255.255.255.255 port 67 interval 13
truenas.domain.local dhclient[6159]: DHCPDISCOVER on enp3s0 to 255.255.255.255 port 67 interval 11
truenas.domain.local dhclient[6159]: DHCPDISCOVER on enp3s0 to 255.255.255.255 port 67 interval 15
truenas.domain.local dhclient[6159]: DHCPDISCOVER on enp3s0 to 255.255.255.255 port 67 interval 9
truenas.domain.local dhclient[6159]: No DHCPOFFERS received.
truenas.domain.local dhclient[6159]: No working leases in persistent database - sleeping.
truenas.domain.local nginx[8790]: 2022/11/10 15:18:33 [error] 8790#8790: *32 directory index of "/usr/share/truenas/webui/assets/images/" is forbidden, client: 192.168.251.1, server: localhost, request: "GET /ui/assets/images/ HTTP/1.1", host: "192.168.251.240", referrer: "http://192.168.251.240/"
truenas.domain.local systemd[1]: Finished system activity accounting tool.
truenas.domain.local systemd[1]: Starting system activity accounting tool...
truenas.domain.local systemd[1]: sysstat-collect.service: Succeeded.
truenas.domain.local nginx[8790]: 2022/11/10 15:22:06 [error] 8790#8790: *33 directory index of "/usr/share/truenas/webui/assets/images/" is forbidden, client: 192.168.251.1, server: localhost, request: "GET /ui/assets/images/ HTTP/1.1", host: "192.168.251.240", referrer: "http://192.168.251.240/"
truenas.domain.local dhclient[6159]: DHCPDISCOVER on enp3s0 to 255.255.255.255 port 67 interval 6
truenas.domain.local dhclient[6159]: DHCPDISCOVER on enp3s0 to 255.255.255.255 port 67 interval 9
truenas.domain.local dhclient[6159]: DHCPDISCOVER on enp3s0 to 255.255.255.255 port 67 interval 7
truenas.domain.local dhclient[6159]: DHCPDISCOVER on enp3s0 to 255.255.255.255 port 67 interval 14
truenas.domain.local dhclient[6159]: DHCPDISCOVER on enp3s0 to 255.255.255.255 port 67 interval 16
truenas.domain.local dhclient[6159]: DHCPDISCOVER on enp3s0 to 255.255.255.255 port 67 interval 9
truenas.domain.local dhclient[6159]: No working leases in persistent database - sleeping.
truenas.domain.local dhclient[6159]: No DHCPOFFERS received.
truenas.domain.local systemd[1]: Finished system activity accounting tool.
truenas.domain.local systemd[1]: sysstat-collect.service: Succeeded.
truenas.domain.local systemd[1]: Starting system activity accounting tool...
truenas.domain.local dhclient[6159]: DHCPDISCOVER on enp3s0 to 255.255.255.255 port 67 interval 6
truenas.domain.local dhclient[6159]: DHCPDISCOVER on enp3s0 to 255.255.255.255 port 67 interval 14
truenas.domain.local dhclient[6159]: DHCPDISCOVER on enp3s0 to 255.255.255.255 port 67 interval 7
truenas.domain.local dhclient[6159]: DHCPDISCOVER on enp3s0 to 255.255.255.255 port 67 interval 12
truenas.domain.local dhclient[6159]: DHCPDISCOVER on enp3s0 to 255.255.255.255 port 67 interval 14
truenas.domain.local dhclient[6159]: DHCPDISCOVER on enp3s0 to 255.255.255.255 port 67 interval 8
truenas.domain.local dhclient[6159]: No working leases in persistent database - sleeping.
truenas.domain.local dhclient[6159]: No DHCPOFFERS received.
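
Most of the dump above is routine systemd session and dhclient chatter. One way to see which programs dominate, and to spot the rarer ones such as smartd, is to count lines per program tag. A minimal Python sketch, assuming every line has the form `host program[pid]: message` (or `host program: message`) as in the dump; `LINE_RE` and `count_by_program` are made-up names, not part of any TrueNAS or Graylog tooling:

```python
import re
from collections import Counter

# host, program tag, optional [pid], then the message text
LINE_RE = re.compile(
    r"^(?P<host>\S+)\s+(?P<program>[^\s:\[]+)(?:\[(?P<pid>\d+)\])?:\s*(?P<message>.*)$"
)


def count_by_program(lines):
    """Return a Counter of log lines per program tag; unparsable lines are skipped."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            counts[match.group("program")] += 1
    return counts


# A few lines copied from the dump above
sample = [
    "truenas.domain.local systemd[1]: Started System Logger Daemon.",
    "truenas.domain.local systemd-journald: Received SIGTERM from PID 1 (systemd).",
    "truenas.domain.local dhclient[6159]: No DHCPOFFERS received.",
    "truenas.domain.local smartd[8442]: Device: /dev/sdf [SAT], 3 Currently unreadable (pending) sectors",
    "truenas.domain.local systemd[273327]: Reached target Timers.",
]

for program, n in count_by_program(sample).most_common():
    print(f"{n:4d}  {program}")
```

Fed the full export instead of `sample`, the same loop gives a quick picture of what is noise and what deserves a closer look.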
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
If thats the case that would be very very sad. I just bought this "HBA" for almost a grand. I cannot undo that and if this means goodbye truenas/ zfs it is what it is.

It would seem to be an extremely expensive lesson that one does not just buy random parts and then wish for them to work. It is much better to start with the recommended hardware guidance and buy parts known to be compatible.

That said, the typical LSI HBA is available on the used market in the $30-$50 range.
 

friendlyguy

Dabbler
Joined
Nov 10, 2022
Messages
31
I need a card with 24 SATA ports, and only 2 slots are still free on my mainboard. If you can point me at an LSI card for 24 disks for 30-50 bucks, I'll gladly buy it.
FYI: I didn't intend to build a TrueNAS system in the first place; otherwise I would have looked specifically for those parts on your HCL.
I wasn't satisfied with what Storage Spaces had to offer (dual parity with encryption is unbearably slow), so I decided to give TrueNAS a go.
 

friendlyguy

Dabbler
Joined
Nov 10, 2022
Messages
31
I'm asking the reseller whether they can exchange it for an LSI SAS 9305-24i.
That's fully supported, right?
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
I need a card with 24-sata slots, only 2 slots free left on my mainboard. if you can point me at a lsi card for 24 disks for 30-50 bucks: ill gladly buy it.

Once you get past 8 or 16 drives, most SAS chassis designs move on towards the use of an SAS expander and stop trying to talk to all the drives directly. If you don't know what this is, please have a glance at the SAS primer here on the forums.


What you would do is buy something along the lines of an LSI 9211-8i (probably a PERC crossflashed, commonly $30-$50) and an IBM RES2SV240 (~$50 used) for a 6Gbps solution, or an LSI 9300-8i and one of the various 12Gbps SAS expanders going on eBay for about $70. The expander cards can often be powered independently and mounted someplace else in the chassis (not burning up a slot) because they are not PCIe bus devices (even if they have the connector).

Talking to the reseller if they can exchange it for a LSI SAS 9305-24i.
Thats fully supported, right?

As long as it can be crossflashed to IT mode. However, the high port count LSI cards are typically in the $500-$1000 range. They have the upside of being easier to work with from a cabling point of view, but most of us around here are cheapskates and would prefer to find a 12Gbps HBA and SAS expander pair for maybe $150? Seems like money better spent.

Note: I received a rather terse note that some users at STH have had problems with the IBM RES2SV240 controller. We've been recommending it here in these forums for a decade, and I don't recall ever having heard of issues. However, buyer beware. It may be best just to jump right to 12Gbps SAS gear anyways.
 

HoneyBadger

actually does care
Administrator
Moderator
iXsystems
Joined
Feb 6, 2014
Messages
5,112
there are 2 fans directed through a "air tunnel" over cpu and ram and 1 fan is directed at the pci cards. I left one slot free between nic and hba, also left the slot bracket out to have a better airflow in between those cards.
I can put in another fan to test if it changes anything. I setup graylog in the meantime, started to throw data at it again until it froze.

I've trimmed out the log pieces but here's what I'm just seeing in there:

Code:
truenas.domain.local systemd-journald: Received SIGTERM from PID 1 (systemd).
truenas.domain.local systemd-journald: Received SIGTERM from PID 1 (systemd).


Were these controlled/expected shutdowns? SIGTERM is expected during a normal shutdown, but if this was prior to a system hang then you might be seeing a quiet kernel panic.

Code:
truenas.domain.local smartd[8442]: Device: /dev/sdf [SAT], 3 Currently unreadable (pending) sectors
truenas.domain.local smartd[8442]: Device: /dev/sdo [SAT], 8 Currently unreadable (pending) sectors
truenas.domain.local smartd[8442]: Device: /dev/sdv [SAT], SMART Usage Attribute: 190 Airflow_Temperature_Cel changed from 70 to 69
truenas.domain.local smartd[8442]: Device: /dev/sdl [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 139 to 136
truenas.domain.local smartd[8442]: Device: /dev/sdh [SAT], 8 Currently unreadable (pending) sectors
truenas.domain.local smartd[8442]: Device: /dev/sdg [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 109 to 108


This tells me you have three drives starting to fail out (sdf, sdo, sdh).

I don't necessarily trust the SMART 194 attributes as I've seen OEM firmware use that for a different value (I doubt your drives are hot enough to boil water at 136C and 108C) but SMART 190 Airflow_Temperature_Cel I haven't seen lie as often. How hot are your disks?
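
To pull just these smartd entries out of a larger log, something like the following Python sketch would do. It assumes the exact message wording shown in the lines quoted above; `PENDING_RE`, `ATTR_RE` and `parse_smartd` are made-up names. As far as I know, smartd reports the normalized attribute value by default rather than the raw one, so the "changed from X to Y" numbers may not be degrees Celsius at all.

```python
import re

PENDING_RE = re.compile(
    r"Device: (?P<dev>/dev/\S+) \[SAT\], (?P<count>\d+) Currently unreadable \(pending\) sectors"
)
ATTR_RE = re.compile(
    r"Device: (?P<dev>/dev/\S+) \[SAT\], SMART Usage Attribute: "
    r"(?P<id>\d+) (?P<name>\S+) changed from (?P<old>\d+) to (?P<new>\d+)"
)


def parse_smartd(lines):
    """Split smartd log lines into pending-sector counts and attribute changes."""
    pending = {}           # device -> number of pending sectors
    attribute_changes = [] # (device, attribute id, name, old value, new value)
    for line in lines:
        m = PENDING_RE.search(line)
        if m:
            pending[m.group("dev")] = int(m.group("count"))
            continue
        m = ATTR_RE.search(line)
        if m:
            # Values are whatever smartd logged (probably normalized, see note above)
            attribute_changes.append(
                (m.group("dev"), int(m.group("id")), m.group("name"),
                 int(m.group("old")), int(m.group("new")))
            )
    return pending, attribute_changes


SAMPLE = [
    "truenas.domain.local smartd[8442]: Device: /dev/sdf [SAT], 3 Currently unreadable (pending) sectors",
    "truenas.domain.local smartd[8442]: Device: /dev/sdo [SAT], 8 Currently unreadable (pending) sectors",
    "truenas.domain.local smartd[8442]: Device: /dev/sdv [SAT], SMART Usage Attribute: 190 Airflow_Temperature_Cel changed from 70 to 69",
    "truenas.domain.local smartd[8442]: Device: /dev/sdl [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 139 to 136",
    "truenas.domain.local smartd[8442]: Device: /dev/sdh [SAT], 8 Currently unreadable (pending) sectors",
    "truenas.domain.local smartd[8442]: Device: /dev/sdg [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 109 to 108",
]

pending, attribute_changes = parse_smartd(SAMPLE)
print(pending)
for change in attribute_changes:
    print(change)
```

The raw temperature values are better read directly from the drive's SMART attribute table than inferred from these change messages.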
 

friendlyguy

Dabbler
Joined
Nov 10, 2022
Messages
31
@jgreco Thanks for sharing your knowledge.
I did further testing, and so far it looks like the HBA really is the problem:
- Updated the firmware of the HBA: same behavior.
- Swapped the Brocade NIC for an Intel X520: same.
- Changed the power supply: same.
- Installed additional fans: same.

@HoneyBadger Thanks a ton for helping me. I appreciate this! The hottest disk I found is sdk with a max of 48°C. (reportsdashboard/disk)

The reseller I got my current Adaptec HBA from is willing to exchange it for another one. But they don't have a big selection:
- Supermicro LSI SAS 3008 IT/HBA MODE 12GB (390 bucks)
- LSI SAS 9305-24i (760 bucks)

And the selection of expanders is even smaller. They list only one:
- 48 Port Adaptec Expander (500 bucks)
So buying the 3008 with the expander is more expensive than getting the 24-port HBA.
BUT: I couldn't verify whether IT-mode firmware is available for it, as I cannot find firmware or the device on the LSI website.
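
Spelled out, the price comparison from the reseller's list looks like this. A trivial Python sketch using only the prices quoted in this post; the variable names are hypothetical, and any cables that might be needed are not included:

```python
# Prices as quoted from the reseller, in "bucks"
sas3008_hba = 390        # Supermicro LSI SAS 3008, 8 ports
adaptec_expander = 500   # 48-port Adaptec expander
lsi_9305_24i = 760       # 24-port HBA, no expander needed

hba_plus_expander = sas3008_hba + adaptec_expander
difference = hba_plus_expander - lsi_9305_24i

print(f"3008 + expander: {hba_plus_expander}")  # 890
print(f"9305-24i:        {lsi_9305_24i}")       # 760
print(f"difference:      {difference}")         # 130
```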

The syslog component is somewhat weird: I don't get to see the messages one by one. I get no messages for a long time and then a TON at once, like it's puking them all out:
1668178180401.png
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
- Supermicro LSI SAS 3008 IT/HBA MODE 12GB (390 bucks)
- LSI SAS 9305-24i (760 bucks)

My opinion is "too much" for both of those. Well, maybe not for the 9305-24i, but I'd be suspicious that the unit might be sourced from China. See an eBay search. The Supermicro card part number is AOC-S3008L-L8i and there is one on eBay right now for about $90. If you are patient and cherry-pick, you can get these with both brackets for about $125 IME -- my companies source lots of used gear of this sort from eBay. Prices seem a bit high right now for some reason.

As for the 9305-24i firmware, you have to know where to look. LSI became Avago became Broadcom, and it is out there ... https://www.broadcom.com/site-search?q=9305-24i but please don't feel bad, I do this all day every day, I know at least some of the tricks. However, this does appear to be 16.00.11.00 firmware, not the 16.00.12.00 firmware that has a bugfix or two that's relevant for TrueNAS. This may or may not be a problem, I'm sorry, I really don't know. Check Jira or the 16.00.12.00 firmware released in the Resources section to see if you can reach a conclusion of some sort.
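
Dotted firmware versions like these are easy to misjudge by eye or by plain string comparison. A small Python sketch of a numeric comparison, assuming the version strings consist only of dot-separated integers as in the two quoted above; `parse_version` is a made-up helper, not a Broadcom tool:

```python
from __future__ import annotations


def parse_version(text: str) -> tuple[int, ...]:
    """Turn '16.00.11.00' into (16, 0, 11, 0) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


available = parse_version("16.00.11.00")  # what the vendor site appears to offer
wanted = parse_version("16.00.12.00")     # the release mentioned for TrueNAS

print(available < wanted)  # True
```

The same comparison can be applied to whatever version string the card reports once it is installed.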

- 48 Port Adaptec Expander (500 bucks)

Too much. HP 36 port expander, $100. https://www.ebay.com/itm/195448885769

So if your vendor will allow you, your BEST choice is probably just to return the Adaptec card for a refund. Second best is probably to see if they will let you downshift to the Supermicro 8i card (even at the ridiculous price) and then go about arranging for an expander and cabling on your own. Of course I have no idea what they'll be willing to compromise with you on. They may well be happy to be rid of an expensive and weird part and not want to refund you even a portion of the grand you spent. All I can do is toss out as much relevant information as possible and hope that you can put together a reasonable deal of some sort.
 

friendlyguy

Dabbler
Joined
Nov 10, 2022
Messages
31
Good morning!
I got it figured out. They won't refund me the money in cash; I can only choose from their portfolio.
So I went ahead and asked them to order the LSI 9305-24i for me. It will take 10 days or more to get it.
Thanks for sending me the link to the LSI IT-mode firmware; I'll flash it as soon as the product arrives.
Fingers crossed that this solves the issues. The only other idea I have: the board only has PCIe 2.0, but those cards are PCIe 3.0.
In theory that should work nevertheless but... you know how it is.
Thanks a ton for your help on this! I'll report as soon as I know more.
 

jgreco

Resident Grinch
Joined
May 29, 2011
Messages
18,680
board only has pci-e 2.0, but those cards are pci-e 3.0.
In theory that should work nevertheless but... you know how it is.

That should be fine.
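
A rough bandwidth check supports that. The Python sketch below uses the standard per-lane rates (PCIe 2.0: 5 GT/s with 8b/10b encoding, PCIe 3.0: 8 GT/s with 128b/130b encoding). The x8 link width and the 250 MB/s per-disk figure are assumptions for illustration, not values taken from this thread, and protocol overhead is ignored.

```python
# generation -> (transfer rate in GT/s, line-encoding efficiency)
PCIE = {2: (5.0, 8 / 10), 3: (8.0, 128 / 130)}


def pcie_mb_per_s(gen: int, lanes: int) -> float:
    """Theoretical one-direction bandwidth in MB/s, before protocol overhead."""
    gt_per_s, efficiency = PCIE[gen]
    # One transfer moves one bit per lane; divide by 8 to get bytes
    return gt_per_s * 1000 * efficiency / 8 * lanes


LANES = 8        # assumption: the card negotiates an x8 link
DISKS = 8        # the eight HDDs in the pool
HDD_MB_S = 250   # assumption: generous sequential rate per disk

print(f"PCIe 2.0 x{LANES}: {pcie_mb_per_s(2, LANES):.0f} MB/s")
print(f"PCIe 3.0 x{LANES}: {pcie_mb_per_s(3, LANES):.0f} MB/s")
print(f"{DISKS} disks flat out: {DISKS * HDD_MB_S} MB/s")
```

Under those assumptions a PCIe 2.0 x8 link offers about 4000 MB/s, roughly twice what eight spinning disks can deliver together, so the older slot should not be the bottleneck.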

Thanks a ton for your help on this!

I'm sorry this bit you in this manner. One of my goals here on the forums is to write stuff to make the complex topics more accessible to newcomers. Unfortunately, I also understand that there's a dismaying amount of stuff newcomers need to know. I'm always a bit disappointed when I don't get the needed information into your hands soon enough. But good luck and please let us all know how it goes.
 

friendlyguy

Dabbler
Joined
Nov 10, 2022
Messages
31
Yay, the HBA arrived way faster than expected. I just picked it up and installed it.
Now I'm looking at the firmware update.
Does somebody know what the correct procedure looks like?
-> Do I update only the firmware?
-> If I need to flash the BIOS, is it mptsas3.rom (sasbios_rel) or mpt3x64.rom (uefibsd_rel\Signed)?
 

friendlyguy

Dabbler
Joined
Nov 10, 2022
Messages
31
Now that's exciting:
I am able to access the controller BIOS from within the mainboard BIOS.
The picture below shows the current firmware version. If p16 == version 16, an update is probably a good idea :)

1668771757252.png
 