Active Directory State: FAULTED

BabaJega

Dabbler
Joined
Sep 2, 2017
Messages
15
Hello everyone,

I am a bit upset about how Active Directory is handled in FreeNAS.
This is from the perspective of someone that is not too familiar with FreeBSD.

If it works: Everything is fine.
If it does not: You are screwed.

There you stand now and have nothing to go on except the status "FAULTED". That's it.

I can ping my domain and have set up everything exactly how it's supposed to be set up.
Why does it not work from one day to another? I have no clue.
Do I get events or any log entries to start digging? Maybe, but they are not in my reach.
I have searched everywhere in the web interface and could not find anything useful on how to go on.

Please help me find out why it does not work.
Or even better: add something like an event viewer.

Thanks in advance!
 

anodos

Sambassador
iXsystems
Joined
Mar 6, 2014
Messages
9,554
The transition from HEALTHY to FAULTED should be accompanied by an alert in the GUI (top-right corner). It is typically caused by a scheduled event, such as a failed periodic check of connectivity to the netlogon share of the DC we're currently communicating with. In other cases, it may be due to winbindd transitioning from "online" to "offline" status. It can also happen when one of our periodic domain health checks fails (mostly the check that clock skew is under 3 minutes).

We don't disable the AD service in the case of a failed health check (only generate alerts), and so if the server becomes unavailable then there may be a legitimate problem in your environment. Log files to look at are /var/log/middlewared.log, /var/log/samba4/log.winbindd, and /var/log/samba4/log.wb-<DOMAIN>.
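
A minimal sketch for skimming those log files from a shell, assuming they exist at the paths named above. The helper name `scan_ad_logs` and the keyword pattern are illustrative assumptions, not something FreeNAS ships.

```shell
#!/bin/sh
# Hypothetical helper: for each log file given as an argument, print the
# last 20 lines that mention a common failure keyword. The keyword list
# is a guess; adjust it to whatever your logs actually contain.
scan_ad_logs() {
    for log in "$@"; do
        if [ -r "$log" ]; then
            echo "== $log"
            grep -i -E 'error|fail|offline|skew' "$log" | tail -n 20
        else
            echo "== $log (not readable, skipped)"
        fi
    done
}

# Example call (add log.wb-<DOMAIN> with your own domain name filled in):
# scan_ad_logs /var/log/middlewared.log /var/log/samba4/log.winbindd
```

Unreadable or missing files are reported and skipped, so the same call can be used on systems where one of the logs has not been created yet.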

There are quite a few fixes / improvements scheduled for 11.3-U2 regarding AD, but nothing that points me one way or another to a resolution for your case.

You can PM me a debug file (System->Advanced->Save Debug) and I can check whether there is something to fix on our side.

Here are some perhaps helpful diagnostic commands to see what's going on in your environment:
Code:
root@freenas[/mnt/dozer/SMB]# midclt call activedirectory.domain_info | jq
{
  "LDAP server": "192.168.1.108",
  "LDAP server name": "DC01.homedom.fun",
  "Realm": "HOMEDOM.FUN",
  "Bind Path": "dc=HOMEDOM,dc=FUN",
  "LDAP port": 389,
  "Server time": 1585849579,
  "KDC server": "192.168.1.108",
  "Server time offset": 0,
  "Last machine account password change": 1584969043
}

The command midclt call activedirectory.started will perform a no-op on the netlogon share of the current DC to check connectivity.
The command midclt call activedirectory.check_clockskew will check our time against the server in the environment that holds the PDC emulator FSMO role.
The command midclt call activedirectory.validate_credentials will perform a test bind to the LDAP server in the AD environment (using our current kerberos ticket).
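
The three checks above can be run in one pass. This is only a sketch: the wrapper name `ad_health_checks` and the `MIDCLT` override variable are assumptions added for illustration, and only the three method names come from the commands listed above.

```shell
#!/bin/sh
# Sketch: call the three middleware checks in order and label each result.
# MIDCLT is a hypothetical override so the client can be swapped out;
# by default it is the real midclt command.
MIDCLT="${MIDCLT:-midclt}"

ad_health_checks() {
    for method in activedirectory.started \
                  activedirectory.check_clockskew \
                  activedirectory.validate_credentials; do
        printf '%s: ' "$method"
        # Report a failed call and keep going so the later checks still run.
        if ! "$MIDCLT" call "$method"; then
            echo "call failed"
        fi
    done
}

# Example call:
# ad_health_checks
```

Each line of output starts with the method name, which makes it easier to paste the result into a forum post or a bug report.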
 

BabaJega

Dabbler
Joined
Sep 2, 2017
Messages
15

Thank you for your response! There is no need for you to look into it.
I will try to learn by myself how to read and interpret those log files you mentioned.
The hint with the log files was already enough.

Maybe there is a way to add an info box right next to "Status: FAULTED" stating why it failed.
That would help a lot in my opinion.
 

anodos

Sambassador
iXsystems
Joined
Mar 6, 2014
Messages
9,554
Try setting "winbind status fifo = Yes" under Services->SMB. I just realized that this should have been set. It won't give more diagnostic information at this time, but it should generate an alert when the winbindd connection manager detects that the domain has gone offline.
 

BabaJega

Dabbler
Joined
Sep 2, 2017
Messages
15
All right, I will change that parameter and observe the system.

It was a time synchronisation error by the way (7 seconds make a huge difference it seems). Should have checked that earlier but it was hard to find out without some kind of error report.

Maybe I will see one next time because of that parameter change. Again: Thanks for your support!
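
For anyone who wants to check the skew by hand: a rough sketch that compares a skew value against the 3-minute figure mentioned earlier in the thread. It assumes the value is a non-negative string in the H:MM:SS.ffffff form that check_clockskew prints in its "clockskew" field; the function name `skew_within_limit` and the default limit of 180 seconds are illustrative assumptions.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) if a skew given as H:MM:SS.ffffff
# is below the limit in seconds (default 180, i.e. 3 minutes).
# Assumes a non-negative value; other formats are not handled.
skew_within_limit() {
    skew="$1"
    limit="${2:-180}"
    echo "$skew" | awk -F: -v limit="$limit" '{
        secs = $1 * 3600 + $2 * 60 + $3
        exit !(secs < limit)
    }'
}

# Example call:
# skew_within_limit "0:00:00.179044" && echo "skew OK"
```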
 

Jedi940

Dabbler
Joined
Feb 25, 2020
Messages
20
I'm not trying to thread jack but I'm having the same issue after an update from 11.3 U1 to 11.3 U2. It is not a clock skew issue though. Not sure what happened. I can start a new thread if that would be preferred. File permissions still seem to be working just fine on my shares though. I have tried re-entering the domain account password and restarting SMB. No obvious errors in the log files listed.

Here is the output to the commands listed above:
Code:
root@ppuprfreenas[~]# midclt call activedirectory.started
True
root@ppuprfreenas[~]# midclt call activedirectory.check_clockskew
{"pdc": "vpavddc01.domain.com", "timestamp": "2020-04-21 14:51:04.466265", "clockskew": "0:00:00.179044"}
root@ppuprfreenas[~]# midclt call activedirectory.validate_credentials
null
root@ppuprfreenas[~]# 


Does "Null" in the validate credentials command mean it isn't finding the account?
 

anodos

Sambassador
iXsystems
Joined
Mar 6, 2014
Messages
9,554
There's a bug that can result in the directory service state being stuck in "FAULTED" (too aggressive caching). Try the following:
Code:
midclt call cache.pop DS_STATE
midclt call directoryservices.get_state
 

Jedi940

Dabbler
Joined
Feb 25, 2020
Messages
20
That appears to have fixed the issue. Thanks! Guess I should have looked over the outstanding bugs too rather than just reviewing the fixed ones.
 