NIS won't start. [EFAULT] NIS cache already exists. Refusing to generate cache.

sandbender

Cadet
Joined
Dec 29, 2022
Messages
5
I recently changed my NIS server from CentOS to Ubuntu. I did this by simply creating a new server with new information. I'm using the same NIS domain name and IP address for the new server. My two TrueNAS servers are having trouble with this, however. I get the following error when I try to start NIS:
[2022/12/29 08:00:46] (DEBUG) NISService.start():136 - NIS service successfully started. Setting state to HEALTHY.
[2022/12/29 08:00:46] (ERROR) middlewared.job.run():367 - Job <bound method NISService.fill_cache of <middlewared.plugins.nis.NISService object at 0x81b13eeb0>> failed
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/middlewared/job.py", line 355, in run
    await self.future
  File "/usr/local/lib/python3.9/site-packages/middlewared/job.py", line 393, in __run_body
    rv = await self.middleware.run_in_thread(self.method, *([self] + args))
  File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 1154, in run_in_thread
    return await self.run_in_executor(self.thread_pool_executor, method, *args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/middlewared/main.py", line 1151, in run_in_executor
    return await loop.run_in_executor(pool, functools.partial(method, *args, **kwargs))
  File "/usr/local/lib/python3.9/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/local/lib/python3.9/site-packages/middlewared/plugins/nis.py", line 234, in fill_cache
    raise CallError('NIS cache already exists. Refusing to generate cache.')
middlewared.service_exception.CallError: [EFAULT] NIS cache already exists. Refusing to generate cache.
Looking at the nis.py code, I see the following:
def fill_cache(self, job, force=False):
    user_next_index = group_next_index = 200000000
    if self.middleware.call_sync('cache.has_key', 'NIS_cache') and not force:
        raise CallError('NIS cache already exists. Refusing to generate cache.')
Clearly the old NIS cache is in the database and the service is basically refusing to overwrite it. Is there some way for me to remove the cache from the system database? Or to get the force flag set to True?
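To make the behaviour of that guard concrete, here is a minimal, self-contained sketch. It does not touch the real middleware: `FakeMiddleware` is a hypothetical in-memory stand-in, and the `cache.put` and `cache.pop` method names are assumptions made for illustration only (only `cache.has_key` appears in the excerpt above). It shows that the job is refused while the `NIS_cache` key exists, and that either passing `force=True` or removing the key lets it run again.

```python
class CallError(Exception):
    """Stand-in for middlewared.service_exception.CallError."""


class FakeMiddleware:
    """Hypothetical in-memory stand-in for the middleware cache calls."""

    def __init__(self):
        self._cache = {}

    def call_sync(self, method, *args):
        if method == 'cache.has_key':
            return args[0] in self._cache
        if method == 'cache.put':  # assumed name, for illustration only
            key, value = args
            self._cache[key] = value
            return None
        if method == 'cache.pop':  # assumed name, for illustration only
            return self._cache.pop(args[0], None)
        raise ValueError(f'unknown method: {method}')


def fill_cache(middleware, force=False):
    # Same guard as the nis.py excerpt: refuse unless forced.
    if middleware.call_sync('cache.has_key', 'NIS_cache') and not force:
        raise CallError('NIS cache already exists. Refusing to generate cache.')
    # Placeholder payload; the real job would enumerate NIS users and groups.
    middleware.call_sync('cache.put', 'NIS_cache', {'users': [], 'groups': []})
    return 'filled'


mw = FakeMiddleware()
fill_cache(mw)                      # first run: no cache yet, so it fills
try:
    fill_cache(mw)                  # second run: the existing key blocks the job
except CallError as exc:
    print(exc)
fill_cache(mw, force=True)          # forcing bypasses the guard
mw.call_sync('cache.pop', 'NIS_cache')
fill_cache(mw)                      # removing the key also clears the way
```

Whether the real middleware lets you pass `force` or delete the key from the command line is not something the excerpt confirms, so treat this only as a model of the logic.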

I am running TrueNAS-13.0-U2.

Thanks
 

sandbender

Cadet
Joined
Dec 29, 2022
Messages
5
Okay. I'll give that a shot. Thanks for responding. I did get past the initial problem by modifying the nis.py file to set the default value of force to True and then restarting middlewared. But then it ran into other problems.

It is really unfortunate that there is nothing that really replaces NIS. LDAP is way more complicated and total overkill for small systems. AD also has problems and is really Windows-centric.
 

sandbender

Cadet
Joined
Dec 29, 2022
Messages
5
I did the upgrade and the reported problem seems to be gone. But I am still not connecting to the NIS server. Here is what I get in the middlewared.log file when I try to rebuild the service cache:

DSCache.refresh():216 - Unable to refresh [nis] cache, state is: FAULTED

I'm not sure if this means the cache itself is faulted or if the refresh fails because the filer is not connected to the server. Other clients are accessing the server with no problem. But the two TrueNAS boxes are not.

I am getting lots of messages on the NIS server when clients connect like the following:

Dec 29 19:55:02 service ypserv[102109]: refused connect from 10.1.1.1:37700 to procedure ypproc_match (nis.beyond,shadow.byname;-1)

But the clients are still able to read the NIS maps.
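To see which clients, procedures, and maps are being refused, the ypserv messages can be tallied with a short script. This is only a sketch: the regular expression is derived from the single sample line above, so other ypserv versions may format the message differently, and `summarize_refusals` is a made-up helper name.

```python
import re
from collections import Counter

# Pattern based on the one sample line in this thread (an assumption about
# the general format of ypserv "refused connect" messages).
REFUSED = re.compile(
    r'ypserv\[\d+\]: refused connect from (?P<ip>[\d.]+):(?P<port>\d+) '
    r'to procedure (?P<proc>\w+) \((?P<domain>[^,]+),(?P<map>[^;]+);'
)


def summarize_refusals(lines):
    """Count refusals per (client IP, procedure, map)."""
    counts = Counter()
    for line in lines:
        m = REFUSED.search(line)
        if m:
            counts[(m['ip'], m['proc'], m['map'])] += 1
    return counts


sample = [
    'Dec 29 19:55:02 service ypserv[102109]: refused connect from '
    '10.1.1.1:37700 to procedure ypproc_match (nis.beyond,shadow.byname;-1)',
]
for (ip, proc, nis_map), n in summarize_refusals(sample).items():
    print(ip, proc, nis_map, n)
```

If every refusal turns out to involve the same map, that points at a per-map restriction on the server rather than a general connectivity problem.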

Thanks for your help.
 

anodos

Sambassador
iXsystems
Joined
Mar 6, 2014
Messages
9,554
Not all implementations of NIS are identical. We change state to FAULTED when we're unable to ypbind or ypwhich. The reason why should be in the middleware logs.

We try to unconfigure NIS pretty quickly if things are rejected because NSS can end up getting blocked in some situations which leads to a bad day all-around.
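The kind of check described above can be approximated from a shell on the filer. The sketch below is not the middleware's actual code; it simply runs `ypwhich` and maps the result to a HEALTHY or FAULTED label. `probe_nis` is a hypothetical helper, and the 10-second timeout is an arbitrary choice.

```python
import subprocess


def probe_nis(run=subprocess.run):
    """Return ('HEALTHY', server) or ('FAULTED', reason) based on ypwhich.

    `run` is injectable so the function can be exercised without a real
    NIS client installed.
    """
    try:
        proc = run(['ypwhich'], capture_output=True, text=True, timeout=10)
    except (OSError, subprocess.TimeoutExpired) as exc:
        # ypwhich missing, not executable, or hung waiting on RPC.
        return 'FAULTED', str(exc)
    if proc.returncode != 0:
        return 'FAULTED', (proc.stderr or proc.stdout).strip()
    return 'HEALTHY', proc.stdout.strip()


if __name__ == '__main__':
    state, detail = probe_nis()
    print(state, detail)
```

On a working client the second value should be the name of the bound NIS server; on a failing one it carries whatever error `ypwhich` printed.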
 

sandbender

Cadet
Joined
Dec 29, 2022
Messages
5
Clearly there is something about the server that TrueNAS doesn't like. When I disable ypbind I get this in the log:
[2022/12/29 16:01:33] (DEBUG) NISService.stop_impl():235 - NIS service successfully stopped. Setting state to DISABLED.
But I don't get any type of ENABLED message when I enable it.

I'm looking at both the middlewared.log file and the messages file.

Is there any way to increase the verbosity of the logging on the TrueNAS side? I think there is something wrong on the NIS server too. Not bad enough to prevent the other Linux clients from working. But the TrueNAS clients seem to be more picky about it.
 

anodos

Sambassador
iXsystems
Joined
Mar 6, 2014
Messages
9,554
TrueNAS Core isn't Linux. If the service state is FAULTED, then the NIS domain and other configuration details should still be in place. You can try manually running ypwhich to see whether you're hitting RPC errors. If the NIS domain is on a separate subnet, you'll need to enable manycast in our config. Generally speaking, there are reasons why most businesses / institutions have switched away from NIS over the years.
 

sandbender

Cadet
Joined
Dec 29, 2022
Messages
5
It looks to me like the TrueNAS server is not even trying to contact the NIS server. They are on the same subnet. And I can ping the server from the filer.

root@filer1:~ # ypwhich
ypwhich: can't yp_bind: reason: RPC failure

rpcinfo output looks fine with no issues. Running rpcinfo with the NIS server IP from the filer looks fine. It's the same output as when run from a functioning NIS client.

If I run ypserv manually from the NIS server with the -d option I see the other clients connect. But I never see a connection from the filer.

What is really strange is that if I start up the old NIS server at a different IP and don't do anything else, I see messages like this in the daemon.log file. 10.0.0.5 is the new temporary IP address of the old NIS server.

Dec 29 17:51:35 filer1 1 2022-12-29T17:51:35.506326-08:00 filer1.mydomain.com /usr/sbin/ypbind 10661 - - NIS server at 10.0.0.5 not in restricted mode access list -- rejecting.
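The "restricted mode access list" wording suggests that ypbind on the filer is locked to a fixed set of servers and rejects answers from any other address. FreeBSD's ypbind takes such a list through its `-S domainname,server1,server2,...` option. The sketch below models that check under simplifying assumptions: servers are given as IP addresses only (the real daemon also accepts host names), and `example.nis` and `10.0.0.2` are hypothetical values, not taken from this system.

```python
import ipaddress


def parse_restricted(spec):
    """Split a 'domain,server1,server2,...' string into (domain, servers)."""
    domain, *servers = [part.strip() for part in spec.split(',') if part.strip()]
    return domain, {ipaddress.ip_address(s) for s in servers}


def would_accept(spec, server_ip):
    """True if server_ip is one of the servers listed in the spec."""
    _, allowed = parse_restricted(spec)
    return ipaddress.ip_address(server_ip) in allowed


# Hypothetical spec: 10.0.0.2 stands in for the configured NIS server address.
spec = 'example.nis,10.0.0.2'
print(would_accept(spec, '10.0.0.2'))  # True
print(would_accept(spec, '10.0.0.5'))  # False: an unlisted server is rejected
```

Comparing the arguments ypbind is actually running with (for example in the `ps` output) against the address of the new server may show whether the list matches what the filer should be binding to.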
I truly understand why NIS is problematic. And if there were an alternative that was simple to use I would do so in a heartbeat. But my only other options are LDAP (absurdly complicated), Active Directory (Microsoft), or Kerberos (even more complicated than LDAP).

I have a small number of users (around a dozen) and even fewer machines which need the directory service. NIS is just really simple. Do you have recommendations for alternatives?

Thanks for all your help by the way. I feel the problem is something simple but I just can't figure out what that may be.
 