Truenas crash and freeze - 12.0-U1.1

Gianni_Pitto

Cadet
Joined
Mar 5, 2021
Messages
7
Hello ,
I have this kind of big, fresh, truenas installation, on a hp dl380 server, xeon 4208 @2.10 ghz, 128 gb ram
.
I have the o.s. installed on a raid1 (hw), then 10 disks on raid 6 (no raid hw, raid managed by truens) for an available space of 40,25tb

So, the problem here is that this truenas server worked well for 10 days circa, then suddently stopped working. We could ping it, but nothing else, samba shares , gui or ssh where unresponsive.
So the strange behaviour is that when I reboot it, it will work for 5 / 10 minutes circa, then it will do the same. I have another truenas on the same site (same hw & config, but never ever used) and now at most i can give simple command via remote ssh, ,like 'time' 'date', but no 'top' or 'reboot' because in this way the console will be freezed.

In the 5-minutes-time-frame when the server was availabe I was able to donwload logs, but I can't figure out what is happening from log side...
for example every time the system freeze (you can notice a big time gaps in the logs) I can see only messages like those you can see at the end of the code, just before the time gap 16.20-10.49

I have the entire var/logs, I can post others log if needed.

Please help



Code:
Mar 15 16:19:44 nas /etc/rc: WARNING: failed to start rpcbind
Mar 15 16:19:44 nas syslog-ng[3015]: Error processing log message: <(0x103c) EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus1
Mar 15 16:19:44 nas ugen0.1: <0x8086 XHCI root HUB> at usbus0
Mar 15 16:19:44 nas uhub1: <0x8086 XHCI root HUB, class 9/0, rev 3.00/1.00, addr 1> on usbus0
Mar 15 16:19:44 nas da1 at smartpqi0 bus 0 scbus2 target 66 lun 0
Mar 15 16:19:44 nas da1: <ATA MB006000GWBXQ HPG8> Fixed Direct Access SPC-4 SCSI device
Mar 15 16:19:44 nas da1: Serial Number 78I8K0NXF4FE       
Mar 15 16:19:44 nas da1: 1200.000MB/s transfers
Mar 15 16:19:44 nas da1: Command Queueing enabled
Mar 15 16:19:44 nas da1: 5723166MB (11721045168 512 byte sectors)
Mar 15 16:19:44 nas da2 at smartpqi0 bus 0 scbus2 target 67 lun 0
Mar 15 16:19:44 nas da2: <ATA MB006000GWBXQ HPG8> Fixed Direct Access SPC-4 SCSI device
Mar 15 16:19:44 nas da2: Serial Number 78I7K0W6F4FE       
Mar 15 16:19:44 nas da2: 1200.000MB/s transfers
Mar 15 16:19:44 nas da2: Command Queueing enabled
Mar 15 16:19:44 nas da2: 5723166MB (11721045168 512 byte sectors)
Mar 15 16:19:44 nas da3 at smartpqi0 bus 0 scbus2 target 68 lun 0
Mar 15 16:19:44 nas da3: <ATA MB006000GWBXQ HPG8> Fixed Direct Access SPC-4 SCSI device
Mar 15 16:19:44 nas da3: Serial Number 78I8K0NZF4FE       
Mar 15 16:19:44 nas da3: 1200.000MB/s transfers
Mar 15 16:19:44 nas da3: Command Queueing enabled
Mar 15 16:19:44 nas da3: 5723166MB (11721045168 512 byte sectors)
Mar 15 16:19:44 nas da4 at smartpqi0 bus 0 scbus2 target 69 lun 0
Mar 15 16:19:44 nas da4: <ATA MB006000GWBXQ HPG8> Fixed Direct Access SPC-4 SCSI device
Mar 15 16:19:44 nas da4: Serial Number 78I8K0NVF4FE       
Mar 15 16:19:44 nas da4: 1200.000MB/s transfers
Mar 15 16:19:44 nas da4: Command Queueing enabled
Mar 15 16:19:44 nas da4: 5723166MB (11721045168 512 byte sectors)
Mar 15 16:19:44 nas da6 at smartpqi0 bus 0 scbus2 target 71 lun 0
Mar 15 16:19:44 nas da6: <ATA MB006000GWBXQ HPG8> Fixed Direct Access SPC-4 SCSI device
Mar 15 16:19:44 nas da6: Serial Number 78I8K0NYF4FE       
Mar 15 16:19:44 nas da6: 1200.000MB/s transfers
Mar 15 16:19:44 nas da6: Command Queueing enabled
Mar 15 16:19:44 nas da6: 5723166MB (11721045168 512 byte sectors)
Mar 15 16:19:44 nas da8 at smartpqi0 bus 0 scbus2 target 73 lun 0
Mar 15 16:19:44 nas da8: <ATA MB006000GWBXQ HPG8> Fixed Direct Access SPC-4 SCSI device
Mar 15 16:19:44 nas da8: Serial Number 78I8K0NPF4FE       
Mar 15 16:19:44 nas da8: 1200.000MB/s transfers
Mar 15 16:19:44 nas da8: Command Queueing enabled
Mar 15 16:19:44 nas da8: 5723166MB (11721045168 512 byte sectors)
Mar 15 16:19:44 nas da7 at smartpqi0 bus 0 scbus2 target 72 lun 0
Mar 15 16:19:44 nas da7: <ATA MB006000GWBXQ HPG8> Fixed Direct Access SPC-4 SCSI device
Mar 15 16:19:44 nas da7: Serial Number 78I8K0NUF4FE       
Mar 15 16:19:44 nas da7: 1200.000MB/s transfers
Mar 15 16:19:44 nas da7: Command Queueing enabled
Mar 15 16:19:44 nas da7: 5723166MB (11721045168 512 byte sectors)
Mar 15 16:19:44 nas da10 at smartpqi0 bus 0 sc
Mar 15 16:19:44 nas K0NUF4FE       
Mar 15 16:19:44 nas da7: 1200.000MB/s transfers
Mar 15 16:19:44 nas da7: Command Queueing enabled
Mar 15 16:19:44 nas da7: 5723166MB (11721045168 512 byte sectors)
Mar 15 16:19:44 nas da10 at smartpqi0 bus 0 sc
Mar 15 16:19:44 nas Updating CPU Microcode...
Mar 15 16:19:45 nas Done.
Mar 15 16:19:45 nas Starting watchdogd.
Mar 15 16:19:45 nas Starting mountd.
Mar 15 16:19:45 nas mountd: mountd already running, pid: 2786.
Mar 15 16:19:45 nas /etc/rc: WARNING: failed to start mountd
Mar 15 16:19:46 nas NFSv4 is disabled
Mar 15 16:19:46 nas Starting nfsd.
Mar 15 16:19:45 nas 1 2021-03-15T16:19:45.902805+01:00 nas.domain.local nfsd 3234 - - Can't read stable storage file: Operation not permitted
Mar 15 16:19:46 nas Starting statd.
Mar 15 16:19:46 nas Starting lockd.
Mar 15 16:19:46 nas Updating motd:.
Mar 15 16:19:46 nas Mounting late filesystems:.
Mar 15 16:19:46 nas Starting ntpd.
Mar 15 16:19:46 nas kernel: NLM: can't start server - it appears to be running already
Mar 15 16:19:46 nas kernel[3015]: Last message 'NLM: can't start ser' repeated 1 times, suppressed by syslog-ng on nas.domain.local
Mar 15 16:19:46 nas 1 2021-03-15T16:19:46.109431+01:00 nas.domain.local su 3283 - - pam_winbind(su): PAM_ESTABLISH_CRED not implemented
Mar 15 16:19:46 nas Segmentation fault
Mar 15 16:19:46 nas /etc/rc: WARNING: failed to start ntpd
Mar 15 16:19:46 nas 1 2021-03-15T16:19:46.157300+01:00 nas.domain.local su 3283 - - pam_winbind(su): request wbcLogoffUser failed: WBC_ERR_WINBIND_NOT_AVAILABLE, PAM error: PAM_AUTHINFO_UNAVAIL (12)!
Mar 15 16:19:46 nas 1[3015]: Last message '2021-03-15T16:19:46.' repeated 1 times, suppressed by syslog-ng on nas.domain.local
Mar 15 16:19:46 nas 1 2021-03-15T16:19:46.157313+01:00 nas.domain.local su 3283 - - pam_winbind(su): internal module error (retval = PAM_AUTHINFO_UNAVAIL(12), user = 'ntpd')
Mar 15 16:19:46 nas 1[3015]: Last message '2021-03-15T16:19:46.' repeated 1 times, suppressed by syslog-ng on nas.domain.local
Mar 15 16:19:46 nas 1 2021-03-15T16:19:46.157340+01:00 nas.domain.local su 3283 - - pam_winbind(su): request wbcLogoffUser failed: WBC_ERR_WINBIND_NOT_AVAILABLE, PAM error: PAM_AUTHINFO_UNAVAIL (12)!
Mar 15 16:19:46 nas 1[3015]: Last message '2021-03-15T16:19:46.' repeated 1 times, suppressed by syslog-ng on nas.domain.local
Mar 15 16:19:46 nas 1 2021-03-15T16:19:46.157351+01:00 nas.domain.local su 3283 - - pam_winbind(su): internal module error (retval = PAM_AUTHINFO_UNAVAIL(12), user = 'ntpd')
Mar 15 16:19:46 nas Updating /var/run/os-release done.
Mar 15 16:19:46 nas Cannot 'start' nmbd. Set nmbd_enable to YES in /etc/rc.conf or use 'onestart' instead of 'start'.
Mar 15 16:19:46 nas Starting smbd.
Mar 15 16:19:46 nas Starting winbindd.
Mar 15 16:19:46 nas Starting[3015]: Last message 'winbindd.' repeated 1 times, suppressed by syslog-ng on nas.domain.local
Mar 15 16:19:46 nas Starting smbd.
Mar 15 16:19:47 nas starting wsdd
Mar 15 16:19:47 nas Starting snmpd.
Mar 15 16:19:47 nas Starting snmp_agent.
Mar 15 16:19:47 nas Starting smartd_daemon.
Mar 15 16:19:47 nas Starting rrdcached.
Mar 15 16:19:47 nas Performing sanity check on openssh configuration.
Mar 15 16:19:47 nas Starting openssh.
Mar 15 16:19:47 nas Performing sanity check on nginx configuration:
Mar 15 16:19:47 nas nginx: the configuration file /usr/local/etc/nginx/nginx.conf syntax is ok
Mar 15 16:19:47 nas nginx: configuration file /usr/local/etc/nginx/nginx.conf test is successful
Mar 15 16:19:47 nas Starting nginx.
Mar 15 16:19:47 nas Starting dbus.
Mar 15 16:19:47 nas Starting collectd_daemon.
Mar 15 16:19:47 nas Starting avahi-daemon.
Mar 15 16:19:48 nas Syncing multipaths...
Mar 15 16:19:54 nas [EFAULT] kinit for domain [domain.local] with principal [nas$@domain.local] failed: kinit: krb5_get_init_creds: unable to reach any KDC in realm domain.local
Mar 15 16:19:54 nas
Mar 15 16:19:54 nas Traceback (most recent call last):
Mar 15 16:19:54 nas File "/usr/local/lib/python3.8/site-packages/middlewared/main.py", line 137, in call_method
Mar 15 16:19:54 nas result = await self.middleware._call(message['method'], serviceobj, methodobj, params, app=self,
Mar 15 16:19:54 nas File "/usr/local/lib/python3.8/site-packages/middlewared/main.py", line 1195, in _call
Mar 15 16:19:54 nas return await methodobj(*prepared_call.args)
Mar 15 16:19:54 nas File "/usr/local/lib/python3.8/site-packages/middlewared/plugins/kerberos.py", line 577, in start
Mar 15 16:19:54 nas await asyncio.wait_for(self._kinit(), timeout=kinit_timeout)
Mar 15 16:19:54 nas File "/usr/local/lib/python3.8/asyncio/tasks.py", line 483, in wait_for
Mar 15 16:19:54 nas return fut.result()
Mar 15 16:19:54 nas File "/usr/local/lib/python3.8/site-packages/middlewared/plugins/kerberos.py", line 318, in _kinit
Mar 15 16:19:54 nas await self.do_kinit(ad)
Mar 15 16:19:54 nas File "/usr/local/lib/python3.8/site-packages/middlewared/plugins/kerberos.py", line 269, in do_kinit
Mar 15 16:19:54 nas raise CallError(f"kinit for domain [{data['domainname']}] "
Mar 15 16:19:54 nas middlewared.service_exception.CallError: [EFAULT] kinit for domain [domain.local] with principal [nas$@domain.local] failed: kinit: krb5_get_init_creds: unable to reach any KDC in realm domain.local
Mar 15 16:19:54 nas
Mar 15 16:19:54 nas
Mar 15 16:19:54 nas Configuring vt: blanktime.
Mar 15 16:19:54 nas Starting cron.
Mar 15 16:19:55 nas net.inet.carp.allow: 0 -> 1
Mar 15 16:19:55 nas
Mar 15 16:19:55 nas Mon Mar 15 16:19:55 CET 2021
Mar 15 16:20:50 nas 1 2021-03-15T16:20:50.857972+01:00 nas.domain.local login 3659 - - pam_winbind(login): PAM_REINITIALIZE_CRED not implemented
Mar 15 16:21:13 nas 1 2021-03-15T16:21:13.778960+01:00 nas.domain.local login 3659 - - pam_winbind(login): user 'root' OK
Mar 15 16:22:30 nas 1 2021-03-15T16:22:30.731482+01:00 nas.domain.local login 3827 - - pam_winbind(login): PAM_REINITIALIZE_CRED not implemented
Mar 15 16:23:36 nas 1 2021-03-15T16:23:36.383317+01:00 nas.domain.local login 3870 - - pam_winbind(login): PAM_REINITIALIZE_CRED not implemented
Mar 15 16:23:52 nas 1 2021-03-15T16:23:52.668306+01:00 nas.domain.local login 3870 - - pam_winbind(login): user 'root' OK
Mar 15 16:27:05 nas 1 2021-03-15T16:27:05.773788+01:00 nas.domain.local login 4002 - - pam_winbind(login): PAM_REINITIALIZE_CRED not implemented
Mar 15 16:27:18 nas 1 2021-03-15T16:27:18.298800+01:00 nas.domain.local login 4002 - - pam_winbind(login): user 'root' OK
Mar 15 16:27:18 nas 1 2021-03-15T16:27:18.444081+01:00 nas.domain.local login 4011 - - pam_winbind(login): PAM_REINITIALIZE_CRED not implemented
Mar 15 16:29:58 nas 1 2021-03-15T16:29:58.173970+01:00 nas.domain.local login 4094 - - pam_winbind(login): PAM_REINITIALIZE_CRED not implemented
Mar 16 10:49:45 nas Starting devd.
Mar 16 10:49:45 nas Starting ums0 moused.
Mar 16 10:49:45 nas Starting zfsd.
 

ThreeDee

Guru
Joined
Jun 13, 2013
Messages
700
let TrueNAS handle the boot disk mirroring and see if you still have issues
 
Top