Time not syncing with timeserver

duskwither · Mar 20, 2024

Two separate TrueNAS SCALE machines, both bare metal, are being used as SMB file servers within a Windows AD environment. The timeserver is the Windows domain controller, a VM that gets its time from the ESXi host it runs on. Domain controller time is correct, and Windows clients in the domain have no time issues either. Both are in the UTC timezone, or at least configured to be.

My TrueNAS machines have time drift issues which I can't explain.

Massive offset (I changed the polling interval to 16 seconds to get faster results):

Code:

root@hostname[~]# ntpq -npcrv
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 10.151.128.6    .LOCL.           1 u    -   16  377    0.397  +12795.   1.154

associd=0 status=c016 leap_alarm, sync_unspec, 1 event, restart,
version="ntpd 4.2.8p15@1.3728-o Wed Sep 23 11:46:38 UTC 2020 (1)",
processor="x86_64", system="Linux/5.10.142+truenas", leap=11, stratum=16,
precision=-23, rootdelay=0.000, rootdisp=0.000, refid=.,
reftime=(no time),
clock=e9a5783c.85ef52fa  Wed, Mar 20 2024 14:59:08.523, peer=0, tc=3,
mintc=3, offset=+0.000000, frequency=+0.000, sys_jitter=0.000000,
clk_jitter=0.000, clk_wander=0.000

Timedatectl output:

Code:

root@hostname[~]# timedatectl
               Local time: Wed 2024-03-20 14:58:21 UTC
           Universal time: Wed 2024-03-20 14:58:21 UTC
                 RTC time: Wed 2024-03-20 14:58:22
                Time zone: UTC (UTC, +0000)
System clock synchronized: no
              NTP service: n/a
          RTC in local TZ: no

ntp.conf

Code:

root@hostname[~]# cat /etc/ntp.conf
server 10.151.128.6 iburst maxpoll 10 minpoll 4
restrict default ignore
restrict -6 default ignore
restrict 127.0.0.1
restrict -6 ::1
restrict 127.127.1.0
restrict 10.151.128.6 nomodify notrap nopeer noquery

I checked the BIOS on both machines. I don't think I can set a timezone there, but the time is not too far off real time.
Both machines have been upgraded from TrueNAS Core, which had no time sync issues once I ran the ntpdate command in a cron job. The GUI settings in TrueNAS Core were always problematic as well.
I checked with tcpdump and can see UDP packets going back and forth between the timeserver and my TrueNAS machines, but nothing seems to change.

I'm slowly getting grey hair trying to troubleshoot this problem, any help will be greatly appreciated.
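
For readers following along, the `status=c016` word in the ntpq output above can be decoded by hand. The sketch below assumes the bit layout given in the ntpq documentation (leap indicator, sync source, event counter, event code); the name tables are deliberately partial, and the function names are made up for this example. It also shows why `minpoll 4` corresponds to the 16 second poll interval.

```python
# Decode the system status word that `ntpq -c rv` prints as status=c016.
# Bit layout assumed from the ntpq documentation; the name tables are
# partial and only cover codes relevant to this thread.

LEAP = {0: "leap_none", 1: "leap_add_sec", 2: "leap_del_sec", 3: "leap_alarm"}
SOURCE = {0: "sync_unspec", 6: "sync_ntp"}
EVENT = {5: "clock_sync", 6: "restart"}


def decode_status(status_hex: str) -> dict:
    word = int(status_hex, 16)
    leap = (word >> 14) & 0x3      # bits 14-15: leap indicator
    source = (word >> 8) & 0x3F    # bits 8-13: synchronisation source
    count = (word >> 4) & 0xF      # bits 4-7: event counter
    event = word & 0xF             # bits 0-3: most recent event code
    return {
        "leap": LEAP.get(leap, f"code {leap}"),
        "source": SOURCE.get(source, f"code {source}"),
        "event_count": count,
        "event": EVENT.get(event, f"code {event}"),
    }


def poll_seconds(exponent: int) -> int:
    # minpoll/maxpoll in ntp.conf are exponents of two, in seconds.
    return 2 ** exponent


if __name__ == "__main__":
    print(decode_status("c016"))
    print(poll_seconds(4), poll_seconds(10))
```

Decoding `c016` this way reproduces the text ntpq printed (leap_alarm, sync_unspec, 1 event, restart), which is consistent with the daemon not having selected the server as a sync source, even though the peer line shows a reach of 377.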

Samuel Tai · Mar 20, 2024

On Core, TSC may be improperly flagged with an erroneously high quality. Switching to ACPI or HPET as the timecounter fixes this. See https://www.truenas.com/community/threads/system-is-going-back-in-time-wrong-date.85686/.

There may be equivalent tunables for Scale. For Ubuntu, which is a Debian derivative like Scale, see https://manpages.ubuntu.com/manpages/trusty/en/man4/timecounters.4freebsd.html.

duskwither · Mar 25, 2024

Samuel Tai said:
On Core, TSC may be improperly flagged with an erroneously high quality. Switching to ACPI or HPET as the timecounter fixes this. See https://www.truenas.com/community/threads/system-is-going-back-in-time-wrong-date.85686/.

There may be equivalent tunables for Scale. For Ubuntu, which is a Debian derivative like Scale, see https://manpages.ubuntu.com/manpages/trusty/en/man4/timecounters.4freebsd.html.

Thanks! I've been looking but can't seem to find any timecounter choices in Scale:

With sysctl I can't find kern.timecounter. There is no kern tree to begin with; sysctl kernel gives me a list, but without timecounter in it.

Code:

root@hostname[~]# sysctl kernel
snipped
kernel.tainted = 12289
kernel.threads-max = 1026842
kernel.timer_migration = 1
kernel.traceoff_on_warning = 0
kernel.tracepoint_printk = 0
/snipped

Dmesg doesn't give me any direct hints:

Code:

root@hostname[~]# dmesg | grep -i time
[    0.009983] ACPI: PM-Timer IO Port: 0x808
[    0.065739] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
[    0.083298] Calibrating delay loop (skipped), value calculated using timer frequency.. 6399.82 BogoMIPS (lpj=12799656)
[    1.202330] workingset: timestamp_bits=36 max_order=25 bucket_order=0
[   88.747537] systemd-journald[1763]: Received client request to flush runtime journal.
[   89.036563] sp5100_tco: SP5100/SB800 TCO WatchDog Timer Driver
[   94.685450] RAPL PMU: API unit is 2^-32 Joules, 1 fixed counters, 163840 ms ovfl timer
[  148.900513] systemd-journald[7722]: Received client request to flush runtime journal.
[  150.896067] systemd-journald[7860]: Received client request to flush runtime journal.

Any tips or suggestions?

Samuel Tai · Mar 25, 2024

Try sysctl -a | grep timecounter.

duskwither · Mar 26, 2024

Samuel Tai said:
Try sysctl -a | grep timecounter.

Nothing found, unfortunately.
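
A note on why the search comes up empty: `kern.timecounter` is a FreeBSD sysctl, so it does not exist on a Linux-based system such as SCALE. The Linux kernel calls the equivalent mechanism a clocksource and exposes it through sysfs rather than sysctl. The sketch below reads those sysfs files; the function names are invented for this example, and it has not been run on a TrueNAS box.

```python
from pathlib import Path

# Standard Linux sysfs location for the clocksource selection.
SYSFS_DIR = Path("/sys/devices/system/clocksource/clocksource0")


def parse_available(text: str) -> list:
    # available_clocksource is a single space-separated line, e.g. "tsc hpet acpi_pm"
    return text.split()


def read_clocksources(base: Path = SYSFS_DIR):
    """Return (current, available), or None if the sysfs files are absent."""
    try:
        current = (base / "current_clocksource").read_text().strip()
        available = parse_available((base / "available_clocksource").read_text())
    except OSError:
        return None
    return current, available


if __name__ == "__main__":
    print(read_clocksources())
```

Grepping dmesg for "clocksource" instead of "time" should likewise show which source the kernel selected at boot.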

anodos · Mar 26, 2024

What version of SCALE is this? It looks like angelfish (which is very much EOL). Since angelfish we've had several time-related fixes (including ones in newer kernels).

duskwither · Mar 26, 2024

Oof, I was convinced I installed a supported version but you're right, this is Angelfish. I'll update to a supported version and report back.

anodos · Mar 26, 2024

Cobia is the current stable supported version of SCALE.

duskwither · Mar 28, 2024

Updated to Cobia and still having similar issues:

Code:

root@hostname[~]# chronyc sources -v

  .-- Source mode  '^' = server, '=' = peer, '#' = local clock.
 / .- Source state '*' = current best, '+' = combined, '-' = not combined,
| /             'x' = may be in error, '~' = too variable, '?' = unusable.
||                                                 .- xxxx [ yyyy ] +/- zzzz
||      Reachability register (octal) -.           |  xxxx = adjusted offset,
||      Log2(Polling interval) --.      |          |  yyyy = measured offset,
||                                \     |          |  zzzz = estimated error.
||                                 |    |           \
MS Name/IP address         Stratum Poll Reach LastRx Last sample
===============================================================================
^? timeserver.domain.xxxxxx>     1   4   377     1  -3615ms[-3615ms] +/-  10.8s

The ? above means the timeserver (Windows DC) is unusable. Need to do some more troubleshooting, unless someone has a magic tip on why the timeserver seems to be unusable.
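
To make the columns in that chronyc line easier to read, here is a small parsing sketch based only on the legend chronyc prints (poll is a log2 value, reach is an octal register). The regular expression and the field names are assumptions made for this example, not part of chrony.

```python
import re

# Unit suffixes chronyc uses in the "Last sample" column.
UNIT = {"ns": 1e-9, "us": 1e-6, "ms": 1e-3, "s": 1.0}

LINE = re.compile(
    r"^(?P<mode>[\^=#])(?P<state>[*+\-x~?])\s+(?P<name>\S+)\s+"
    r"(?P<stratum>\d+)\s+(?P<poll>-?\d+)\s+(?P<reach>[0-7]+)\s+(?P<lastrx>\S+)\s+"
    r"(?P<adj>[+-]?\d+(?:\.\d+)?)(?P<adj_u>ns|us|ms|s)"
    r"\[\s*(?P<meas>[+-]?\d+(?:\.\d+)?)(?P<meas_u>ns|us|ms|s)\]\s*\+/-\s*"
    r"(?P<err>\d+(?:\.\d+)?)(?P<err_u>ns|us|ms|s)\s*$"
)


def parse_source_line(line: str):
    """Parse one data row of `chronyc sources`; return None if it does not match."""
    m = LINE.match(line.strip())
    if m is None:
        return None
    reach = int(m["reach"], 8)  # octal register of the last 8 polls
    return {
        "mode": m["mode"],
        "state": m["state"],
        "name": m["name"],
        "stratum": int(m["stratum"]),
        "poll_s": 2 ** int(m["poll"]),               # log2 -> seconds
        "replies_of_last_8": bin(reach).count("1"),  # set bits = answered polls
        "measured_offset_s": float(m["meas"]) * UNIT[m["meas_u"]],
        "error_s": float(m["err"]) * UNIT[m["err_u"]],
    }


if __name__ == "__main__":
    row = "^? timeserver.domain.xxxxxx>     1   4   377     1  -3615ms[-3615ms] +/-  10.8s"
    print(parse_source_line(row))
```

Read this way, the server answered all of the last 8 polls at a 16 second interval, so reachability is not the problem. What stands out is the estimated error of 10.8 s, which is larger than the measured offset itself. chrony has a `maxdistance` directive that rejects sources whose root distance is too large; whether that is what marks this source unusable is not confirmed anywhere in this thread.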

Samuel Tai · Mar 28, 2024

Your DCs are actually not running in UTC, given the large offset. Try setting reg add HKLM\System\CurrentControlSet\Control\TimeZoneInformation /v RealTimeIsUniversal /t REG_DWORD /d 0x1 on the DCs to actually put them on UTC, and adjust their BIOS clocks accordingly.

danb35 · Mar 28, 2024

Samuel Tai said:
given the large offset.

Is it that large? It's reporting 3615ms, which is only 3.6 seconds. Larger than it ought to be, yes, but not indicative of a timezone error.
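
The arithmetic behind that point can be sketched as follows. A timezone misconfiguration would show up as an offset near a whole multiple of 15 minutes (all UTC offsets in current use are such multiples), whereas the values reported in this thread are a few seconds. The function names and the 60 second tolerance are assumptions chosen for illustration.

```python
QUARTER_HOUR_S = 15 * 60  # UTC offsets in current use are multiples of 15 minutes


def ms_to_seconds(ms: float) -> float:
    return ms / 1000.0


def looks_like_timezone_error(offset_s: float, tolerance_s: float = 60.0) -> bool:
    """True if the offset sits near a non-zero multiple of 15 minutes."""
    steps = round(abs(offset_s) / QUARTER_HOUR_S)
    if steps == 0:
        return False
    return abs(abs(offset_s) - steps * QUARTER_HOUR_S) <= tolerance_s


if __name__ == "__main__":
    for label, ms in (("chronyc", -3615), ("ntpq", 12795)):
        offset = ms_to_seconds(ms)
        print(label, offset, looks_like_timezone_error(offset))
```

Both the chronyc offset (3615 ms) and the earlier ntpq offset (12795 ms, since ntpq reports offsets in milliseconds) fall well short of 15 minutes, so neither looks like a timezone mix-up by this test.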
