FreeNAS 11.2 U5 Samba Locks Up Randomly

Chris Tobey

Contributor
Joined
Feb 11, 2014
Messages
114
Hi guys,

I am hoping someone has some insight into an issue I have been seeing for the last week since I updated to 11.2 U5.

Hardware:
Chassis: SuperMicro 6048R-E1CR24N
Motherboard: SuperMicro X10DRi-T4+
CPU: 2 x Intel E5-2620v3 (6 core/12 thread @ 2.4GHz per)
RAM: 12 x SuperMicro DR416L-SL01-ER21 (Samsung, 16GB DDR4 2133 ECC Registered)
HBA: SuperMicro AOC-S3108L-H8IR-O-P (LSI MegaRAID 8-port SAS 12Gb/s)
Network: Chelsio T420-LL-CR (2 x SFP+ for 10G Fibre)
Boot: 2 x Samsung 840 Pro 128GB
Pool: 3 x RAIDZ2 vdevs of 6x8TB Seagate HDDs.

Configuration:
FreeNAS 11.2 U5 connected to Active Directory.
Three datasets in zpool shared via SMB and NFS.

As to my issue: I upgraded from 11.1 U6 to 11.2 U5 one week ago, and in that time half of the system has locked up on four separate occasions. SMB access and performance are good until the shares suddenly become inaccessible. I cannot SSH to the system, as it keeps reporting a broken pipe. The dashboard shows some details, but things like the Network Info do not display anything. I can access most of the UI through the web page, including netdata, which does not appear to show anything out of the ordinary (<5% CPU usage, normal IO).
I can fix the issue by going to Directory Services > Active Directory, unchecking Enable, saving, then re-checking Enable and saving again. Similarly, if I am able to get a shell working, I can run the following:
Code:
root@fileserver:~ # /etc/directoryservice/ActiveDirectory/ctl start
False
True
Join is OK
False
True

The syslog shows nothing when this issue happens, and shows the following when I run the above:
Code:
Aug 13 20:12:41 fileserver ActiveDirectory: /usr/local/bin/python /usr/local/bin/midclt call notifier.stop cifs
Aug 13 20:12:43 fileserver ActiveDirectory: /usr/sbin/service ix-hostname quietstart
Aug 13 20:12:43 fileserver ActiveDirectory: /usr/sbin/service ix-kerberos quietstart default LOCATION.DOMAIN.COM
Aug 13 20:12:43 fileserver ActiveDirectory: /usr/sbin/service ix-nsswitch quietstart
Aug 13 20:12:43 fileserver ActiveDirectory: /usr/sbin/service ix-ldap quietstart
Aug 13 20:12:43 fileserver ActiveDirectory: /usr/sbin/service ix-kinit quietstart
Aug 13 20:12:45 fileserver ActiveDirectory: /usr/sbin/service ix-kinit status
Aug 13 20:12:45 fileserver ActiveDirectory: /usr/local/bin/python /usr/local/bin/midclt call notifier.start cifs
Aug 13 20:12:50 fileserver ActiveDirectory: /usr/sbin/service ix-activedirectory quietstart
Aug 13 20:12:53 fileserver ActiveDirectory: /usr/sbin/service ix-activedirectory status
Aug 13 20:12:55 fileserver ActiveDirectory: /usr/local/bin/python /usr/local/bin/midclt call notifier.stop cifs
Aug 13 20:12:56 fileserver kernel: Failed to fully fault in a core file segment at VA 0x819849000 with size 0x11000 to be written at offset 0x63f2000 for process smbd
Aug 13 20:12:56 fileserver kernel: Failed to fully fault in a core file segment at VA 0x819849000 with size 0x11000 to be written at offset 0x63f2000 for process smbd
Aug 13 20:12:56 fileserver kernel: pid 10232 (smbd), uid 0: exited on signal 6 (core dumped)
Aug 13 20:12:56 fileserver ActiveDirectory: /usr/local/bin/python /usr/local/bin/midclt call notifier.start cifs
Aug 13 20:13:01 fileserver ActiveDirectory: /usr/sbin/service ix-pam quietstart
Aug 13 20:13:01 fileserver ActiveDirectory: /usr/local/bin/python /usr/local/bin/midclt call notifier.cachetool fill

Once the AD is restarted everything appears to return to normal. I can access the SMB shares, the dashboard updates, and life is good.

This is a live system in use by 100+ servers 24/7 via NFS and 100+ users during work hours, so reboots are... difficult, but restarting Samba or modifying some share properties is doable.
 

anodos

Sambassador
iXsystems
Joined
Mar 6, 2014
Messages
9,545
PM me a debug file (System -> Advanced -> Save Debug). The next time it happens, check output of service samba_server status. If smbd is stopped, run service samba_server start. Otherwise, try service samba_server restart. This should be less disruptive than restarting AD.
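The start-or-restart decision described above could be scripted roughly as follows. This is only a sketch: `choose_action` is a hypothetical helper name, and it assumes the status output follows the usual FreeBSD rc wording ("smbd is running as pid N." / "smbd is not running."), which is worth confirming on the box first.

```shell
#!/bin/sh
# Sketch only; choose_action is a hypothetical helper, not a FreeNAS command.
# It takes the text printed by `service samba_server status` and prints the
# subcommand suggested above: "start" if smbd is stopped, otherwise "restart".
# Assumption: a running smbd is reported as "smbd is running as pid N."
choose_action() {
    case "$1" in
        *"smbd is running"*) echo restart ;;
        *)                   echo start ;;
    esac
}

# On the NAS this could be used as:
#   service samba_server "$(choose_action "$(service samba_server status 2>&1)")"
```

Keeping the decision in a small function means it can be checked against sample status text before anything is run on the live system.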
 

anodos

Sambassador
iXsystems
Joined
Mar 6, 2014
Messages
9,545
Chris Tobey said:
Hi @anodos, is there a recommended place to upload the debug files? This one is 150MB and exceeds the PM-able size limits.

Also, I remembered that I had turned on "Enable AD Monitoring" when I did the upgrade, so I have turned that off for now, just in case this is related to https://www.ixsystems.com/community...ifs-shares-periodically-stop-and-start.67236/.

Yes, AD monitoring should be turned off completely. If you PM me, we can work out something with Google Drive.
 

Chris Tobey

Contributor
Joined
Feb 11, 2014
Messages
114
Sounds good. Is there a thread or resource that has more details on the AD monitoring and why it should be disabled?
 

anodos

Sambassador
iXsystems
Joined
Mar 6, 2014
Messages
9,545
Chris Tobey said:
Sounds good. Is there a thread or resource that has more details on the AD monitoring and why it should be disabled?

Not really, this is just my opinion on the matter. It's re-implemented in 11.3.

In 11.3 the configuration fields are removed entirely from the UI and will be replaced with the following health checks and alerts if AD is enabled:

1) Periodically (once every 10 minutes) issue a no-effect command to the DC Samba is currently speaking to. This checks whether the secure channel is still alive.
2) Once per day, verify that the time on the NAS doesn't differ from the DCs' by more than 3 minutes.
3) Immediately alert if winbindd internally switches domain status to 'OFFLINE'.

(1) and (2) also generate alerts. In 11.3, no action beyond alerting will be performed. In 11.2, a false positive can trigger service restarts and lead to service disruption.
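For anyone on 11.2 who wants to approximate these checks by hand, a rough sketch follows. It is not how 11.3 implements them. The function names are hypothetical. `wbinfo --ping-dc` and `wbinfo --online-status` are standard Samba options, but how the DC's clock is read is left as an assumption (the timestamps are simply passed in as epoch seconds).

```shell
#!/bin/sh
# Sketch only; helper names are hypothetical and not part of FreeNAS.

# skew_exceeds LOCAL_EPOCH DC_EPOCH [LIMIT_SECONDS]
# Exit status 0 when the two clocks differ by more than the limit
# (default 180 seconds, i.e. the 3 minutes mentioned above).
skew_exceeds() {
    limit=${3:-180}
    diff=$(( $1 - $2 ))
    if [ "$diff" -lt 0 ]; then
        diff=$(( 0 - diff ))
    fi
    [ "$diff" -gt "$limit" ]
}

# check_secure_channel: ping the current DC over the secure channel, then
# print winbindd's own view of whether each domain is online.
check_secure_channel() {
    if wbinfo --ping-dc >/dev/null 2>&1; then
        echo "secure channel: OK"
    else
        echo "secure channel: FAILED"
    fi
    wbinfo --online-status
}
```

Both helpers only report. In line with the point above about false positives, nothing here restarts a service.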
 

Chris Tobey

Contributor
Joined
Feb 11, 2014
Messages
114
Sadly, access was broken again this morning. Disabling and re-enabling Active Directory restored it.
 

Chris Tobey

Contributor
Joined
Feb 11, 2014
Messages
114
Happened again tonight. Restarting Samba does fix it:
Code:
# service samba_server restart
Performing sanity check on Samba configuration: OK
Stopping winbindd.
Waiting for PIDS: 44037.
Stopping smbd.
Waiting for PIDS: 44031, 44031.
Stopping nmbd.
Waiting for PIDS: 44027.
Performing sanity check on Samba configuration: OK
Starting nmbd.
Starting smbd.
Starting winbindd.
 

Chris Tobey

Contributor
Joined
Feb 11, 2014
Messages
114
While debugging with @anodos, we noticed that winbind seems to be having issues:
Code:
[2019/08/15 09:11:28.832452,  0] ../source3/auth/auth_winbind.c:122(check_winbind_security)
  check_winbind_security: winbindd not running - but required as domain member: NT_STATUS_NO_LOGON_SERVERS
[2019/08/15 09:11:28.832714,  0] ../source3/auth/auth_winbind.c:122(check_winbind_security)
  check_winbind_security: winbindd not running - but required as domain member: NT_STATUS_NO_LOGON_SERVERS
[2019/08/15 09:11:28.834196,  1] ../lib/param/loadparm.c:1822(lpcfg_do_global_parameter)
  WARNING: The "acl check permissions" option is deprecated


We modified the SMB Auxiliary Parameters from:
Code:
ea support = no
store dos attributes = no
acl check permissions = no

to:
Code:
store dos attributes = no
winbind max domain connections = 10
winbind offline logon = no


Sadly, this issue is still present. Frequency varies from twice a day to once every three days. Not sure what the trigger is yet.
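One way to hunt for a trigger might be to bucket the winbind failures in the smbd log by hour and compare the busy hours against backups, scrubs, or DC maintenance windows. The sketch below assumes the log format shown earlier in this post. `count_by_hour` is a hypothetical helper, and the log path in the usage comment is an assumption to check on your own system.

```shell
#!/bin/sh
# Sketch only; count_by_hour is a hypothetical helper.
# Reads Samba log text (files given as arguments, or stdin) and prints one
# line per hour: "YYYY/MM/DD HH:00 <number of NT_STATUS_NO_LOGON_SERVERS lines>".
count_by_hour() {
    awk '
        # Header lines start with "[date time, level]"; remember date + hour.
        /^\[[0-9]/ { stamp = substr($1, 2) " " substr($2, 1, 2) }
        # Message lines follow their header; count the failures per bucket.
        /NT_STATUS_NO_LOGON_SERVERS/ { count[stamp]++ }
        END { for (s in count) print s ":00", count[s] }
    ' "$@" | sort
}

# Possible usage on the NAS (path is an assumption):
#   count_by_hour /var/log/samba4/log.smbd
```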
 

Chris Tobey

Contributor
Joined
Feb 11, 2014
Messages
114
Disabled Domain Trusts (as our configuration no longer required them anyway), but I am still seeing Samba stop responding.
 

godhanuman

Cadet
Joined
May 21, 2021
Messages
3
Were you able to solve this? I am having the same problem, but I can't work out where it could be coming from.
 