Samba stops working after 24 hours.

Cinjin

Dabbler
Joined
Jun 12, 2020
Messages
16
Hi All,

Let me start by saying I am fairly new to FreeNAS. We are serving SMB shares from FreeNAS 11.3-U3.1 to Windows 10 Pro computers, where the shares are mapped as network drives through an AD policy. After 24 hours, the mapped drives show as disconnected and disappear from the Windows machines. To get the drives to show up again on the Windows computers, I restart Samba from the GUI in FreeNAS.

Any ideas?

With the help of another sysadmin we created a debug cron job for Samba, and it seems that winbindd starts having issues after 24 hours. Here is an excerpt from our debug logs (some details changed for posting):

Thu Jun 11 19:57:41 PDT 2020
+ echo ---------------------------------------
---------------------------------------
+ wbinfo -u
(this shows all users)
+ wbinfo -g
(this shows all groups)
+ wbinfo -t
checking the trust secret for domain domain.com via RPC calls succeeded
+ wbinfo -P
checking the NETLOGON for domain[domain.com] dc connection to "ads2.domain.com" succeeded
+ net ads info
LDAP server: 10.30.*.*
LDAP server name: ads2.domain.com
Realm: domain.com
Bind Path: dc=domain,dc=com
LDAP port: 389
Server time: Thu, 11 Jun 2020 19:57:42 PDT
KDC server: 10.30.*.*
Server time offset: 0
Last machine account password change: Mon, 08 Jun 2020 22:45:53 PDT
+ net ads testjoin
Join is OK
+ klist
Credentials cache: FILE:/tmp/krb5cc_0
Principal: ST1$@domain.com

  Issued                Expires               Principal
Jun 11 10:30:00 2020  Jun 11 20:30:00 2020  krbtgt/domain.com@domain.com
+ wbinfo -i 'domain\random.person'
failed to call wbcGetpwnam: WBC_ERR_WINBIND_NOT_AVAILABLE
Could not get info for user domain\random.person
+ id random.person
id: random.person: no such user
+ id 10090
id: 10090: no such user

+ tail -30 /var/log/samba4/log.wb-domain
[2020/06/11 17:00:01.489011, 1] ../../source3/libads/ldap_utils.c:111(ads_do_search_retry_internal)
ads_search_retry: failed to reconnect (No logon servers are currently available to service the logon request.)
[2020/06/11 17:00:01.489079, 1] ../../source3/winbindd/winbindd_ads.c:342(query_user_list)
query_user_list ads_search: No logon servers are currently available to service the logon request.
[2020/06/11 17:45:42.789145, 2] ../../source3/winbindd/winbindd_pam.c:2246(winbindd_dual_pam_auth)
Plain-text authentication for user domain\root returned NT_STATUS_NO_SUCH_USER (PAM: 13)
[2020/06/11 17:46:12.734175, 1] ../../source3/libads/authdata.c:177(kerberos_return_pac)
kinit failed for 'random.person@domain.com' with: Preauthentication failed (-1765328360)
[2020/06/11 17:46:12.734304, 2] ../../source3/winbindd/winbindd_pam.c:2246(winbindd_dual_pam_auth)
Plain-text authentication for user domain\random.person returned NT_STATUS_LOGON_FAILURE (PAM: 9)
[2020/06/11 17:46:21.107451, 1] ../../source3/libads/authdata.c:177(kerberos_return_pac)
kinit failed for 'random.person@domain.com' with: Preauthentication failed (-1765328360)
[2020/06/11 17:46:21.107579, 2] ../../source3/winbindd/winbindd_pam.c:2246(winbindd_dual_pam_auth)
Plain-text authentication for user domain\random.person returned NT_STATUS_LOGON_FAILURE (PAM: 9)
[2020/06/11 17:46:26.668116, 2] ../../source3/winbindd/winbindd_pam.c:2246(winbindd_dual_pam_auth)
Plain-text authentication for user domain\root returned NT_STATUS_NO_SUCH_USER (PAM: 13)
[2020/06/11 19:57:41.375897, 1] ../../source3/libads/ldap_utils.c:93(ads_do_search_retry_internal)
Reducing LDAP page size from 1000 to 500 due to IO_TIMEOUT
[2020/06/11 19:57:41.378166, 1] ../../source3/libads/ldap_utils.c:111(ads_do_search_retry_internal)
ads_search_retry: failed to reconnect (No logon servers are currently available to service the logon request.)
[2020/06/11 19:58:43.550344, 0] ../../source3/winbindd/winbindd.c:243(winbindd_sig_term_handler)
[2020/06/11 19:58:43.550361, 0] ../../source3/winbindd/winbindd.c:243(winbindd_sig_term_handler)
[2020/06/11 19:58:43.550349, 0] ../../source3/winbindd/winbindd.c:243(winbindd_sig_term_handler)
Got sig[15] terminate (is_parent=0)
Got sig[15] terminate (is_parent=0)
Got sig[15] terminate (is_parent=0)
 

Cinjin

Dabbler
Joined
Jun 12, 2020
Messages
16
A few more things to add: we have two AD servers, one Windows Server 2012 (ads1) and one Windows Server 2016 (ads2).

Under the auxiliary parameters for Samba in FreeNAS we have this set:

idmap config DOMAIN: unix_primary_group = yes
idmap config DOMAIN: unix_nss_info = yes
idmap_ldb use:rfc2307 = Yes
winbind refresh tickets = yes
password server = ads2, ads1
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
+ klist
Credentials cache: FILE:/tmp/krb5cc_0
Principal: ST1$@domain.com

  Issued                Expires               Principal
Jun 11 10:30:00 2020  Jun 11 20:30:00 2020  krbtgt/domain.com@domain.com

Your TGT is behaving as if the ticket expired before its time. Is the FreeNAS clock in sync with the rest of the infrastructure?
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
Make sure your FreeNAS server is actually in sync, and not drifting. You may need to set a sysctl tunable kern.timecounter.hardware, based on which timecounter listed in sysctl kern.timecounter.choice has the highest quality. Note, timecounters may lie (on my system, TSC has quality 1000, but drifts uncontrollably), so you may have to try the 2nd or 3rd best.
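
To make the timecounter step concrete, here is a minimal sketch that ranks the entries from kern.timecounter.choice by quality. It assumes the usual FreeBSD output shape of space-separated name(quality) pairs; the helper name rank_timecounters and the HPET example are only illustrations, not something from this thread.

```shell
# Rank timecounters by quality, best first.
# Input: the value of kern.timecounter.choice, assumed to look like
#   "TSC-low(1000) ACPI-fast(900) i8254(0) dummy(-1000000)"
rank_timecounters() {
    printf '%s\n' "$1" \
        | tr ' ' '\n' \
        | sed -n 's/^\(.*\)(\(-\{0,1\}[0-9][0-9]*\))$/\2 \1/p' \
        | sort -rn \
        | awk '{ print $2 }'
}

# On the FreeNAS host (FreeBSD only), roughly:
#   rank_timecounters "$(sysctl -n kern.timecounter.choice)"
#   sysctl kern.timecounter.hardware         # show the counter in use
#   sysctl kern.timecounter.hardware=HPET    # try another candidate (HPET is just an example)
```

Changing the value at runtime lets you watch for drift before committing to it as a persistent sysctl tunable.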
 

Cinjin

Dabbler
Joined
Jun 12, 2020
Messages
16
Here is the output from FreeNAS:

+ ntpq -p
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
-213.251.53.11   195.66.241.2     2 u   31   64  377  166.516  -20.316   0.185
*t1.time.gq1.yah 208.71.46.33     2 u   53   64  377    7.625   -2.583   0.200
+69.10.161.7     195.205.216.85   3 u   46   64  377   41.741   -3.597   0.244
+ads1            129.128.12.20    3 u   52   64  377    0.175  -18.287   6.459
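
For reading this output: the offset column of ntpq -p is in milliseconds, and Kerberos in a default AD setup tolerates up to 5 minutes (300000 ms) of clock skew. A rough sketch for pulling out the largest absolute offset follows; the function name max_offset_ms is made up for illustration, and it assumes the standard ten-column layout with two header lines.

```shell
# Print the largest absolute peer offset (ms) from `ntpq -p` output on stdin.
# Assumes two header lines and that offset is the 9th whitespace-separated field.
max_offset_ms() {
    awk 'NR > 2 && NF >= 10 {
             off = $9 + 0
             if (off < 0) off = -off      # absolute value
             if (off > max) max = off
         }
         END { print max + 0 }'
}

# Typical use on the NAS:
#   ntpq -p | max_offset_ms
```

A result in the tens of milliseconds is far inside the Kerberos tolerance, so a snapshot like this mainly rules out gross skew; it does not show drift over time.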
 

Samuel Tai

Never underestimate your own stupidity
Moderator
Joined
Apr 24, 2020
Messages
5,399
Check the event log on the KDC that issued the TGT, since we've verified the time is synced everywhere.
 

Cinjin

Dabbler
Joined
Jun 12, 2020
Messages
16
A little more detail: to fix the issue we don't need to restart all of Samba; restarting just winbindd gets it working again.
 

anodos

Sambassador
iXsystems
Joined
Mar 6, 2014
Messages
9,546
The next time it goes down, run ps aux | grep winbindd to get the PID of the winbindd child for your domain, then increase its log level with smbcontrol <pid> debug 10. Do the same for the idmap child, and then repeat your "id" test. Once you have done this, PM me a debug file.
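
The steps above can be sketched roughly as follows. The helper name winbindd_pids is hypothetical, and it assumes the usual 11-column ps aux layout with the PID in column 2. Note that the commented loop raises the log level on every winbindd process rather than picking out only the domain and idmap children, which is a simplification of the advice above.

```shell
# Print the PID of every winbindd process found in `ps aux` output on stdin.
# Assumes column 2 is the PID and the command starts at column 11.
winbindd_pids() {
    awk 'NR > 1 && $11 ~ /winbindd/ { print $2 }'
}

# On the NAS, once the problem has reappeared:
#   for pid in $(ps aux | winbindd_pids); do
#       smbcontrol "$pid" debug 10      # raise the log level of this process
#       smbcontrol "$pid" debuglevel    # confirm the new level
#   done
#   id random.person                    # repeat the failing lookup
```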
 