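Preface
In production environments, private clouds in particular, network problems often leave server clocks inconsistent, and some drift far from the time source. When a machine cannot simply be taken offline, its clock has to be corrected online by slewing (in most cases a time step is not allowed). This article examines how ntpd and chrony behave while slewing the clock back into sync.
1. Environment
Two virtual machines were created locally with vmstation: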
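Node | IP | Spec | OS | Notes |
---|---|---|---|---|
host-1 | 192.168.92.128 | 1 vCPU, 1 GB RAM, 20 GB disk | CentOS7.4 | Internal time source; its time matches the external public-cloud time source |
host-2 | 192.168.92.129 | 1 vCPU, 1 GB RAM, 20 GB disk | CentOS7.4 | Syncs from 192.168.92.128; its local time is adjusted manually to simulate a lagging clock |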
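2. ntpd mode
2.1 Version
This article uses ntpd 4.2.6p5, installed with yum.
2.2 Configuring the ntpd time source
Start the ntpd service on 192.168.92.128 and configure that node as the internal time source.
2.2.1 Configure and start ntpd
The main ntpd configuration files are: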
cat /etc/sysconfig/ntpd
cat /etc/ntp.conf
Key settings in /etc/ntp.conf:
[root@host-1 ~]# cat /etc/ntp.conf |grep -v '#'
driftfile /var/lib/ntp/drift
restrict default nomodify notrap nopeer noquery
restrict 127.0.0.1
restrict ::1
includefile /etc/ntp/crypto/pw
keys /etc/ntp/keys
server time.cloud.tencent.com iburst
server time1.cloud.tencent.com iburst
server time2.cloud.tencent.com iburst
server time3.cloud.tencent.com iburst
server time4.cloud.tencent.com iburst
server time5.cloud.tencent.com iburst
Key settings in /etc/sysconfig/ntpd:
[root@host-1 ~]# cat /etc/sysconfig/ntpd
OPTIONS="-g"
[root@host-1 ~]#
Check that the service started:
[root@host-1 ~]# systemctl status ntpd
● ntpd.service - Network Time Service
Loaded: loaded (/usr/lib/systemd/system/ntpd.service; disabled; vendor preset: disabled)
Active: active (running) since Sun 2022-08-07 06:25:06 PDT; 7min ago
Process: 39906 ExecStart=/usr/sbin/ntpd -u ntp:ntp $OPTIONS (code=exited, status=0/SUCCESS)
Main PID: 39907 (ntpd)
Tasks: 1
Memory: 1.5M
CGroup: /system.slice/ntpd.service
└─39907 /usr/sbin/ntpd -u ntp:ntp -g
Aug 07 06:25:06 host-1 ntpd[39907]: Listen normally on 4 docker0 172.17.0.1 UDP 123
Aug 07 06:25:06 host-1 ntpd[39907]: Listen normally on 5 virbr0 192.168.122.1 UDP 123
Aug 07 06:25:06 host-1 ntpd[39907]: Listen normally on 6 lo ::1 UDP 123
Aug 07 06:25:06 host-1 ntpd[39907]: Listen normally on 7 vetha5a8b14 fe80::9448:baff:fe4e:caf3 UDP 123
Aug 07 06:25:06 host-1 ntpd[39907]: Listen normally on 8 ens33 fe80::20c:29ff:feb9:8849 UDP 123
Aug 07 06:25:06 host-1 ntpd[39907]: Listen normally on 9 docker0 fe80::42:88ff:fee9:b582 UDP 123
Aug 07 06:25:06 host-1 ntpd[39907]: Listening on routing socket on fd #26 for interface updates
Aug 07 06:25:07 host-1 ntpd[39907]: 0.0.0.0 c016 06 restart
Aug 07 06:25:07 host-1 ntpd[39907]: 0.0.0.0 c012 02 freq_set kernel 8.020 PPM
Aug 07 06:25:13 host-1 ntpd[39907]: 0.0.0.0 c615 05 clock_sync
[root@host-1 ~]#
2.2.2 Check synchronization status
Check how the time source itself is synchronizing:
[root@host-1 ~]# ntpstat
synchronised to NTP server (111.230.189.174) at stratum 3
time correct to within 45 ms
polling server every 64 s
[root@host-1 ~]# ntpq -p
remote refid st t when poll reach delay offset jitter
==============================================================================
-139.199.215.251 100.122.36.196 2 u 15 64 375 36.717 2.310 3.776
*111.230.189.174 100.122.36.196 2 u 81 64 276 34.307 -0.278 1.757
+139.199.214.202 100.122.36.196 2 u 87 64 356 36.572 -1.035 2.475
+134.175.254.134 100.122.36.196 2 u 20 64 377 33.943 -0.948 3.200
-134.175.253.104 9.20.184.92 2 u 17 64 177 32.041 2.798 1.885
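This shows that the clock is synchronized.
Note that the offsets reported by ntpq -p are cached values and are not fully real-time; ntpdate -q ${ip} can be used for a live query instead.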
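Since the later tests watch the offset through ntpdate -q, a small helper for pulling the number out of its output can be handy. This is only a sketch: the function name ntpdate_offset is made up for illustration, and it assumes the "server ..., offset ..., delay ..." line format shown in this article.

```shell
# Extract the offset (in seconds) from each "server ..." line of ntpdate -q output.
# Reads the ntpdate -q output on stdin and prints one offset per server line.
ntpdate_offset() {
    awk '/^server / {
        for (i = 1; i < NF; i++)
            if ($i == "offset") {
                v = $(i + 1)
                sub(/,$/, "", v)   # drop the trailing comma
                print v
            }
    }'
}

# Example with a line in the format used in this article:
printf 'server 192.168.92.128, stratum 3, offset 383.287000, delay 0.02589\n' | ntpdate_offset
# Against a live server it would be used as:
#   ntpdate -q 192.168.92.128 | ntpdate_offset
```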
2.3 Configuring the ntpd client
Start the ntpd service on 192.168.92.129 as a client, set 192.168.92.128 as its time source, and let it synchronize.
2.3.1 Configure and start ntpd
The configuration is as follows:
[root@host-2 chrony]# cat /etc/ntp.conf |grep -v '#'
driftfile /var/lib/ntp/drift
restrict default nomodify notrap nopeer noquery
restrict 127.0.0.1
restrict ::1
server 192.168.92.128 iburst
includefile /etc/ntp/crypto/pw
keys /etc/ntp/keys
disable monitor
[root@host-2 chrony]# cat /etc/sysconfig/ntpd
# Command line options for ntpd
#SYNC_HWCLOCK=yes
OPTIONS="-g -x"
[root@host-2 chrony]#
Key options
man ntpd documents the service options. A few of them deserve to be singled out; the official text is quoted below, followed by some comments.
-g
Normally, ntpd exits with a message to the system log if the offset exceeds the panic threshold,
which is 1000 s by default.
This option allows the time to be set to any value without restriction; however, this can happen only once.
If the threshold is exceeded after that, ntpd will exit with a message to the system log.
This option can be used with the -q and -x options.
See the tinker command for other options.
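The key point: this option permits one step of any size. Even if this node is more than 1000 s away from the time source, it is allowed to jump once to the correct time. If the offset is still beyond 1000 s after that one step and stays there for 10 minutes, ntpd exits on its own. Put simply, -g gives ntpd exactly one free pass.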
-x
Normally, the time is slewed if the offset is less than the step threshold,
which is 128 ms by default, and stepped if above the threshold.
This option sets the threshold to 600 s, which is well within the accuracy window to set the clock manually.
Note: Since the slew rate of typical Unix kernels is limited to 0.5 ms/s, each second of adjustment requires an amortization interval of 2000 s.
Thus, an adjustment as much as 600 s will take almost 14 days to complete.
This option can be used with the -g and -q options. See the tinker command for other options.
Note: The kernel time discipline is disabled with this option and the step threshold is applied also to leap second corrections.
This option lets the node slew gradually toward the time source as long as the offset is within 600 s; beyond 600 s the clock is still stepped.
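The thresholds quoted above can be condensed into a small decision helper. This is a simplified reading of the man page text only (step threshold 128 ms, raised to 600 s by -x; panic threshold 1000 s; -g allows one step of any size). It ignores all timing behavior, and the function name ntpd_action and its output labels are invented for illustration.

```shell
# ntpd_action OFFSET_SECONDS [OPTIONS]
# Prints what the man page text suggests ntpd does for a given offset:
#   slew       offset is at or below the step threshold
#   step       offset is above the step threshold but within the panic threshold
#   step-once  offset is above the panic threshold and -g is set (allowed only once)
#   exit       offset is above the panic threshold and -g is not set
ntpd_action() {
    awk -v off="$1" -v opts="${2:-}" 'BEGIN {
        if (off < 0) off = -off
        step = (opts ~ /x/) ? 600 : 0.128   # -x raises the step threshold to 600 s
        if (off > 1000)
            r = (opts ~ /g/) ? "step-once" : "exit"
        else if (off > step)
            r = "step"
        else
            r = "slew"
        print r
    }'
}

ntpd_action 383 "-g -x"      # slew
ntpd_action 733 "-g -x"      # step
ntpd_action 518415 "-g -x"   # step-once
ntpd_action 518415 "-x"      # exit
```

The three offsets are the ones used in the scenarios of section 2.4, so the labels can be compared with what was actually observed there.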
One setting that was not enabled in this test also deserves a note:
tinker panic 0
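This setting lifts the 1000 s limit: the ntpd process keeps running instead of exiting when the offset between the node and the time source exceeds 1000 s, and the node's time synchronization is otherwise unaffected.
Check that the service started: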
[root@host-2 chrony]# systemctl status ntpd
● ntpd.service - Network Time Service
Loaded: loaded (/usr/lib/systemd/system/ntpd.service; disabled; vendor preset: disabled)
Active: active (running) since Thu 2022-08-04 20:06:56 PDT; 2 days ago
Process: 106375 ExecStart=/usr/sbin/ntpd -u ntp:ntp $OPTIONS (code=exited, status=0/SUCCESS)
Main PID: 106377 (ntpd)
Tasks: 1
Memory: 1.2M
CGroup: /system.slice/ntpd.service
└─106377 /usr/sbin/ntpd -u ntp:ntp -g -x
Aug 04 20:06:56 host-2 ntpd[106377]: 0.0.0.0 c016 06 restart
Aug 04 20:06:56 host-2 ntpd[106377]: 0.0.0.0 c012 02 freq_set ntpd 0.867 PPM
Aug 04 20:07:03 host-2 ntpd[106377]: 0.0.0.0 c61c 0c clock_step +210615.841292 s
Aug 07 06:37:19 host-2 ntpd[106377]: 0.0.0.0 c614 04 freq_mode
Aug 07 06:37:20 host-2 ntpd[106377]: 0.0.0.0 c618 08 no_sys_peer
Aug 07 06:42:09 host-2 ntpd[106377]: Deleting interface #3 ens33, 192.168.92.129#123, interface stats: received=12, sent=12, dropped=0, active_time=297 secs
Aug 07 06:42:09 host-2 ntpd[106377]: 192.168.92.128 interface 192.168.92.129 -> (none)
Aug 07 06:42:12 host-2 ntpd[106377]: Listen normally on 8 ens33 192.168.92.129 UDP 123
Aug 07 06:42:12 host-2 ntpd[106377]: new interface(s) found: waking up resolver
Aug 07 06:42:13 host-2 ntpd[106377]: 0.0.0.0 c628 08 no_sys_peer
[root@host-2 chrony]#
2.3.2 Check synchronization status
Check how the client is synchronizing with its time source:
[root@host-2 chrony]# ntpstat
synchronised to NTP server (192.168.92.128) at stratum 4
time correct to within 1007 ms
polling server every 64 s
[root@host-2 chrony]# ntpstat
synchronised to NTP server (192.168.92.128) at stratum 4
time correct to within 1007 ms
polling server every 64 s
[root@host-2 chrony]# ntpq -p
remote refid st t when poll reach delay offset jitter
==============================================================================
*192.168.92.128 111.230.189.174 3 u 24 64 1 0.275 1.557 0.049
[root@host-2 chrony]# ntpdate -q 192.168.92.128
server 192.168.92.128, stratum 3, offset 0.001456, delay 0.02597
7 Aug 06:53:35 ntpdate[109915]: adjust time server 192.168.92.128 offset 0.001456 sec
[root@host-2 chrony]#
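This shows that the clock is synchronized.
2.4 Key scenarios
The following questions are the ones that matter most in production.
2.4.1 Scenario 1: can slewing continue reliably for a long time, until the clock has caught up?
Testing showed that this splits into three sub-scenarios.
2.4.1.1 Scenario 1.1: the offset between this node and the time source is within 600 s. Does it catch up steadily?
The node's time was set manually so that the offset was below 600 s (383 s), and the node was observed for 20 minutes to check that it kept catching up. At a slew rate of 0.5 ms per second (2000 s to correct 1 s), catching up takes about 383*2000/3600 ≈ 212.8 h.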
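The same estimate is reused in each sub-scenario, so it may help to have it as a helper. This is a minimal sketch: catchup_hours is a hypothetical name, and the default of 500 ppm is simply the 0.5 ms/s rate from the man page text quoted earlier.

```shell
# catchup_hours OFFSET_SECONDS [SLEW_PPM]
# Hours needed to slew away OFFSET_SECONDS at SLEW_PPM (default 500 ppm = 0.5 ms/s).
catchup_hours() {
    awk -v off="$1" -v ppm="${2:-500}" 'BEGIN {
        printf "%.1f\n", off / (ppm / 1e6) / 3600
    }'
}

catchup_hours 383      # 212.8
catchup_hours 733      # 407.2
catchup_hours 518415   # 288008.3
```

The three calls use the offsets of scenarios 1.1, 1.2 and 1.3. The estimate assumes the clock is slewed at the full rate for the whole period.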
[root@host-2 chrony]# date -s 'Sun Aug 7 06:50:36 PDT 2022'
Sun Aug 7 06:50:36 PDT 2022
[root@host-2 ~]# while [[ 1 ]];do ntpdate -q 192.168.92.128;sleep 1;done
7 Aug 06:56:49 ntpdate[110416]: adjust time server 192.168.92.128 offset 0.001041 sec
server 192.168.92.128, stratum 3, offset 0.001061, delay 0.02582
7 Aug 06:56:56 ntpdate[110418]: adjust time server 192.168.92.128 offset 0.001061 sec
server 192.168.92.128, stratum 3, offset 383.287000, delay 0.02589
7 Aug 06:50:40 ntpdate[110420]: step time server 192.168.92.128 offset 383.287000 sec
server 192.168.92.128, stratum 3, offset 383.287006, delay 0.02591
7 Aug 06:50:47 ntpdate[110443]: step time server 192.168.92.128 offset 383.287006 sec
server 192.168.92.128, stratum 3, offset 383.287000, delay 0.02586
7 Aug 06:50:54 ntpdate[110445]: step time server 192.168.92.128 offset 383.287000 sec
server 192.168.92.128, stratum 3, offset 383.286995, delay 0.02583
7 Aug 06:51:01 ntpdate[110455]: step time server 192.168.92.128 offset 383.286995 sec
server 192.168.92.128, stratum 3, offset 383.286954, delay 0.02589
7 Aug 06:51:08 ntpdate[110457]: step time server 192.168.92.128 offset 383.286954 sec
The offset is 383 s, and the output shows it shrinking steadily, so the clock is being slewed. Observation continued for 20 minutes.
At the start:
server 192.168.92.128, stratum 3, offset 383.287000, delay 0.02589
7 Aug 06:50:40 ntpdate[110420]: step time server 192.168.92.128 offset 383.287000 sec
After a little over 20 minutes:
server 192.168.92.128, stratum 3, offset 383.065839, delay 0.02583
7 Aug 07:14:10 ntpdate[130815]: step time server 192.168.92.128 offset 383.065839 sec
Sun Aug 7 07:14:11 PDT 2022
Behavior was stable. In this scenario ntpd should be able to keep slewing the clock steadily.
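The two samples above also give a rough figure for the slew rate actually achieved. The helper below is a sketch; slew_ppm is a hypothetical name, and the elapsed time is read from the client's own clock in the output, so the result is approximate.

```shell
# slew_ppm OFFSET_START OFFSET_END ELAPSED_SECONDS
# Average slew rate in ppm between two offset samples.
slew_ppm() {
    awk -v a="$1" -v b="$2" -v t="$3" 'BEGIN {
        printf "%.1f\n", (a - b) / t * 1e6
    }'
}

# 06:50:40 -> 07:14:10 is 1410 s by the client clock.
slew_ppm 383.287000 383.065839 1410   # 156.9
```

For comparison, the nominal rate of 0.5 ms/s corresponds to 500 ppm.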
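2.4.1.2 Scenario 1.2: the offset between this node and the time source is between 600 s and 1000 s. Does it catch up steadily?
The node's time was set manually so that the offset was between 600 s and 1000 s (733 s), and the node was observed for 20 minutes to check that it kept catching up. At a slew rate of 0.5 ms per second (2000 s to correct 1 s), catching up takes about 733*2000/3600 ≈ 407.2 h.
The output confirms that the clock was being slewed; observation ran for 20 minutes.
At the start: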
server 192.168.92.128, stratum 3, offset 733.321759, delay 0.02589
7 Aug 07:12:44 ntpdate[6681]: step time server 192.168.92.128 offset 733.321759 sec
Sun Aug 7 07:12:45 PDT 2022
At the time of the step:
server 192.168.92.128, stratum 3, offset 732.909001, delay 0.02591
7 Aug 07:26:36 ntpdate[7262]: step time server 192.168.92.128 offset 732.909001 sec
Sun Aug 7 07:26:37 PDT 2022
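At Sun Aug 7 07:26:37 PDT 2022, that is, after only about 15 minutes of slewing, the clock stepped straight to the time source's time.
2.4.1.3 Scenario 1.3: the offset between this node and the time source is above 1000 s. Does it catch up steadily?
The node's time was set manually so that the offset was above 1000 s (518415 s), and the node was observed for 20 minutes to check that it kept catching up. At a slew rate of 0.5 ms per second (2000 s to correct 1 s), catching up takes about 518415*2000/3600 ≈ 288008 h.
The output confirms that the clock was being slewed; observation ran for 20 minutes.
Setting the time: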
[root@localhost etc]# date -s 'Sun Aug 1 07:31:48 PDT 2022'
Mon Aug 1 07:31:48 PDT 2022
First samples received:
server 192.168.92.128, stratum 3, offset 733.321759, delay 0.02589
7 Aug 07:12:44 ntpdate[6681]: step time server 192.168.92.128 offset 733.321759 sec
Sun Aug 7 07:12:45 PDT 2022
First synchronization event in the log:
cat /var/log/messages
Aug 1 07:40:17 localhost ntpd[34356]: 0.0.0.0 0613 03 spike_detect +518415.746602 s
At the time of the step:
server 192.168.92.128, stratum 3, offset 518415.741053, delay 0.02594
7 Aug 07:54:10 ntpdate[35608]: step time server 192.168.92.128 offset 518415.741053 sec
Sun Aug 7 07:54:11 PDT 2022
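By Sun Aug 7 07:54:11 PDT 2022, that is, after only about 15 minutes of slewing, the clock had stepped straight to the time source's time.
This test ran with -g -x. Because this was the first step during the lifetime of the ntpd process, ntpd did not exit. If the offset again exceeds 1000 s for more than 10 minutes while the same process is running, ntpd exits on its own.
With -g -x:
First step: ntpd keeps running.
Second time the offset exceeds 1000 s for 10 minutes: ntpd exits.
Testing also showed that with -x alone, an offset above 1000 s that lasts 10 minutes makes ntpd exit directly.
2.4.2 Scenario 2: if ntpd is restarted while the clock is being synchronized, does the time step?
With the options set to -g -x, restarting ntpd triggers a step straight to the time source's time.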
[root@host-2 ~]# cat /etc/sysconfig/ntpd
OPTIONS="-g -x"
[root@host-2 ~]#
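With -x alone, restarting ntpd also triggers a step straight to the time source's time. If the offset is above 1000 s, however, ntpd fails to start, and the time has to be set manually before it can be started.
/var/log/messages shows that the offset must be within 1000 s for ntpd to start; lifting that requirement is what -g is for.
To summarize, in ntpd mode a restart of the ntpd service affects the node's time as follows:
Mode | Behavior |
---|---|
-g | The time steps straight to the time source's time; ntpd starts normally |
-x | ntpd cannot start; the time must be set manually to within 1000 s before it will start |
-g -x | The time steps straight to the time source's time; ntpd starts normally |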
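3. chronyd mode
3.1 Version
chronyd is an enhanced alternative to ntpd. This article does not dwell on the differences between the two; it verifies how chronyd behaves while slewing the clock. The version used is 3.4 (+READLINE +SECHASH +IPV6 +DEBUG), installed with yum.
[root@host-2 chrony]# chronyc -v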
chronyc (chrony) version 3.4 (+READLINE +SECHASH +IPV6 +DEBUG)
3.2 Configuring the chronyd time source
Start the chronyd service on 192.168.92.128 and configure that node as the internal time source.
3.2.1 Configure and start chronyd
The configuration is as follows:
[root@host-1 ~]# cat /etc/chrony.conf
# Use public servers from the pool.ntp.org project.
# Please consider joining the pool (http://www.pool.ntp.org/join.html).
server time.cloud.tencent.com iburst
server time1.cloud.tencent.com iburst
server time2.cloud.tencent.com iburst
server time3.cloud.tencent.com iburst
server time4.cloud.tencent.com iburst
server time5.cloud.tencent.com iburst
# Record the rate at which the system clock gains/losses time.
driftfile /var/lib/chrony/drift
# Allow the system clock to be stepped in the first three updates
# if its offset is larger than 1 second.
makestep 1.0 3
# Enable kernel synchronization of the real-time clock (RTC).
rtcsync
# Enable hardware timestamping on all interfaces that support it.
#hwtimestamp *
# Increase the minimum number of selectable sources required to adjust
# the system clock.
#minsources 2
# Allow NTP client access from local network.
allow 192.168.0.0/16
# Serve time even if not synchronized to a time source.
local stratum 10
# Specify file containing keys for NTP authentication.
#keyfile /etc/chrony.keys
# Specify directory for log files.
logdir /var/log/chrony
# Select which information is logged.
#log measurements statistics tracking
The key settings that make this node the internal time source are:
# Allow NTP client access from local network.
allow 192.168.0.0/16
# Serve time even if not synchronized to a time source.
local stratum 10
Check that the service started:
[root@host-1 ~]# systemctl status chronyd
● chronyd.service - NTP client/server
Loaded: loaded (/usr/lib/systemd/system/chronyd.service; enabled; vendor preset: enabled)
Active: active (running) since Sun 2022-08-07 02:17:22 PDT; 1h 57min ago
Docs: man:chronyd(8)
man:chrony.conf(5)
Process: 36016 ExecStartPost=/usr/libexec/chrony-helper update-daemon (code=exited, status=0/SUCCESS)
Process: 36011 ExecStart=/usr/sbin/chronyd $OPTIONS (code=exited, status=0/SUCCESS)
Main PID: 36014 (chronyd)
Tasks: 1
Memory: 440.0K
CGroup: /system.slice/chronyd.service
└─36014 /usr/sbin/chronyd
Aug 07 02:17:22 host-1 systemd[1]: Stopping NTP client/server...
Aug 07 02:17:22 host-1 systemd[1]: Stopped NTP client/server.
Aug 07 02:17:22 host-1 systemd[1]: Starting NTP client/server...
Aug 07 02:17:22 host-1 chronyd[36014]: chronyd version 3.4 starting (+CMDMON +NTP +REFCLOCK +RTC +PRIVDROP +SCFILTER +SIGND +ASYNCDNS +SECHASH +IPV6 +DEBUG)
Aug 07 02:17:22 host-1 chronyd[36014]: Frequency -7.495 +/- 6.949 ppm read from /var/lib/chrony/drift
Aug 07 02:17:22 host-1 systemd[1]: Started NTP client/server.
Aug 07 02:17:28 host-1 chronyd[36014]: Selected source 134.175.254.134
Aug 07 02:18:33 host-1 chronyd[36014]: Selected source 134.175.253.104
[root@host-1 ~]#
3.2.2 Check synchronization status
Check how the time source itself is synchronizing:
[root@host-1 ~]# chronyc sources
210 Number of sources = 5
MS Name/IP address Stratum Poll Reach LastRx Last sample
===============================================================================
^+ 139.199.215.251 2 10 277 113 -234us[ -234us] +/- 47ms
^+ 111.230.189.174 2 10 333 91 -900us[ -900us] +/- 50ms
^+ 139.199.214.202 2 10 377 22m -392us[ -577us] +/- 65ms
^+ 134.175.254.134 2 10 377 560 -1356us[-1356us] +/- 48ms
^* 134.175.253.104 2 10 277 21m -1578us[-1765us] +/- 35ms
Check the clock tracking status:
[root@host-1 ~]# chronyc tracking
Reference ID : 86AFFD68 (134.175.253.104)
Stratum : 3
Ref time (UTC) : Sun Aug 07 10:51:11 2022
System time : 0.000320631 seconds slow of NTP time
Last offset : -0.000187386 seconds
RMS offset : 0.000809337 seconds
Frequency : 8.419 ppm slow
Residual freq : -0.008 ppm
Skew : 0.413 ppm
Root delay : 0.035052553 seconds
Root dispersion : 0.010563156 seconds
Update interval : 1026.9 seconds
Leap status : Normal
[root@host-1 ~]#
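The System time field shows the difference between this node's clock and the time source. Here it is only 0.000320631 seconds, so synchronization is complete. The internal time source is now configured and can serve time to the other nodes in the system.
3.3 Configuring the chronyd client
Start the chronyd service on 192.168.92.129 as a client, set 192.168.92.128 as its time source, and let it synchronize.
3.3.1 Configure and start chronyd
The configuration is as follows: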
[root@host-2 chrony]# cat /etc/chrony.conf
# Use public servers from the pool.ntp.org project.
# Please consider joining the pool (http://www.pool.ntp.org/join.html).
server 192.168.92.128 iburst
# Record the rate at which the system clock gains/losses time.
driftfile /var/lib/chrony/drift
# Allow the system clock to be stepped in the first three updates
# if its offset is larger than 1 second.
makestep 99999999999 3
# Enable kernel synchronization of the real-time clock (RTC).
rtcsync
# Enable hardware timestamping on all interfaces that support it.
#hwtimestamp *
# Increase the minimum number of selectable sources required to adjust
# the system clock.
#minsources 2
# Allow NTP client access from local network.
#allow 192.168.0.0/16
# Serve time even if not synchronized to a time source.
#local stratum 10
# Specify file containing keys for NTP authentication.
#keyfile /etc/chrony.keys
# Specify directory for log files.
logdir /var/log/chrony
# Select which information is logged.
#log measurements statistics tracking
stratumweight 0
driftfile /var/lib/chrony/drift
rtcsync
bindcmdaddress 127.0.0.1
bindcmdaddress ::1
keyfile /etc/chrony.keys
commandkey 1
generatecommandkey
noclientlog
logchange 0.5
logdir /var/log/chrony
maxupdateskew 500
maxslewrate 500
[root@host-2 chrony]#
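Key settings:
server 192.168.92.128 iburst: sets the time source. This node synchronizes against 192.168.92.128.
makestep 99999999999 3: this setting is critical. It defines the offset threshold for stepping the clock and the number of clock updates during which stepping is allowed. After chronyd starts, it steps the clock only if the offset to the time source exceeds the threshold (here 99999999999 s), and only within the first 3 clock updates. A step is a forced synchronization, which a production system normally should not allow, so the threshold is set so high that it is effectively never reached.
maxupdateskew 500: the largest estimated skew, in ppm, at which a frequency estimate is still considered usable.
maxslewrate 500: sets the rate of slewing, capping it at 500 ppm (0.5 ms per second).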
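The makestep rule can be written down as a tiny decision helper. This is a sketch of the documented rule only (step when the offset exceeds the threshold and the update count is still within the limit; a negative limit disables the limit). The name makestep_decision and its argument order are invented for illustration.

```shell
# makestep_decision THRESHOLD LIMIT OFFSET UPDATE_NO
# THRESHOLD and LIMIT are the two makestep arguments, OFFSET is the current
# offset in seconds, UPDATE_NO counts clock updates since chronyd started.
makestep_decision() {
    awk -v thr="$1" -v lim="$2" -v off="$3" -v n="$4" 'BEGIN {
        if (off < 0) off = -off
        if (off > thr && (lim < 0 || n <= lim))
            print "step"
        else
            print "slew"
    }'
}

makestep_decision 99999999999 3 209367.5 1   # slew: threshold never reached
makestep_decision 1 3 209367.5 1             # step: within the first 3 updates
makestep_decision 1 3 209367.5 4             # slew: limit already used up
```

The offset 209367.5 s is the one that appears in the chronyd scenarios of section 3.4.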
Check that the service started:
[root@host-1 ~]# systemctl status chronyd
● chronyd.service - NTP client/server
Loaded: loaded (/usr/lib/systemd/system/chronyd.service; enabled; vendor preset: enabled)
Active: active (running) since Sun 2022-08-07 02:17:22 PDT; 1h 57min ago
Docs: man:chronyd(8)
man:chrony.conf(5)
Process: 36016 ExecStartPost=/usr/libexec/chrony-helper update-daemon (code=exited, status=0/SUCCESS)
Process: 36011 ExecStart=/usr/sbin/chronyd $OPTIONS (code=exited, status=0/SUCCESS)
Main PID: 36014 (chronyd)
Tasks: 1
Memory: 440.0K
CGroup: /system.slice/chronyd.service
└─36014 /usr/sbin/chronyd
Aug 07 02:17:22 host-1 systemd[1]: Stopping NTP client/server...
Aug 07 02:17:22 host-1 systemd[1]: Stopped NTP client/server.
Aug 07 02:17:22 host-1 systemd[1]: Starting NTP client/server...
Aug 07 02:17:22 host-1 chronyd[36014]: chronyd version 3.4 starting (+CMDMON +NTP +REFCLOCK +RTC +PRIVDROP +SCFILTER +SIGND +ASYNCDNS +SECHASH +IPV6 +DEBUG)
Aug 07 02:17:22 host-1 chronyd[36014]: Frequency -7.495 +/- 6.949 ppm read from /var/lib/chrony/drift
Aug 07 02:17:22 host-1 systemd[1]: Started NTP client/server.
Aug 07 02:17:28 host-1 chronyd[36014]: Selected source 134.175.254.134
Aug 07 02:18:33 host-1 chronyd[36014]: Selected source 134.175.253.104
[root@host-1 ~]#
3.3.2 Check synchronization status
Check how the client is synchronizing with its time source:
[root@host-2 chrony]# chronyc sources
210 Number of sources = 1
MS Name/IP address Stratum Poll Reach LastRx Last sample
===============================================================================
^* 192.168.92.128 3 6 17 9 +8390ns[ +31us] +/- 25ms
[root@host-2 chrony]#
Check the clock tracking status:
[root@host-2 chrony]# chronyc tracking
Reference ID : C0A85C80 (192.168.92.128)
Stratum : 4
Ref time (UTC) : Sun Aug 07 11:33:14 2022
System time : 0.000000021 seconds slow of NTP time
Last offset : +0.000022861 seconds
RMS offset : 0.000022861 seconds
Frequency : 7.997 ppm slow
Residual freq : +3.821 ppm
Skew : 0.423 ppm
Root delay : 0.039650649 seconds
Root dispersion : 0.005133089 seconds
Update interval : 2.0 seconds
Leap status : Normal
[root@host-2 chrony]#
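The System time field shows the difference between this node's clock and the time source. Here it is only 0.000000021 seconds, so synchronization is complete.
3.4 Key scenarios
The following questions are the ones that matter most in production.
3.4.1 Scenario 1: can slewing continue reliably for a long time, until the clock has caught up?
After starting chronyd on 192.168.92.129, the node's time was set manually so that it lagged behind the time source: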
[root@host-2 chrony]# date -s 'Fri Aug 4 18:30:08 PDT 2022'
Thu Aug 4 18:30:08 PDT 2022
[root@host-2 chrony]# date
Thu Aug 4 18:30:27 PDT 2022
[root@host-2 chrony]#
Check the synchronization status:
[root@host-2 chrony]# chronyc sources
210 Number of sources = 1
MS Name/IP address Stratum Poll Reach LastRx Last sample
===============================================================================
^? 192.168.92.128 3 6 377 8 -3489m[ -3489m] +/- 25ms
[root@host-2 chrony]#
[root@host-2 chrony]# chronyc tracking
Reference ID : C0A85C80 (192.168.92.128)
Stratum : 4
Ref time (UTC) : Fri Aug 05 01:28:06 2022
System time : 0.000000000 seconds slow of NTP time
Last offset : +0.000000808 seconds
RMS offset : 0.000033040 seconds
Frequency : 8.521 ppm slow
Residual freq : -0.002 ppm
Skew : 0.365 ppm
Root delay : 0.039602775 seconds
Root dispersion : 0.005342626 seconds
Update interval : 64.4 seconds
Leap status : Normal
[root@host-2 chrony]#
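Synchronization does not happen immediately; it takes a little while.
After about 1 minute the node locked onto the time source and began slewing the clock:
[root@host-2 ~]# while [[ 1 ]];do chronyc tracking|grep 'System time';sleep 1;done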
System time : 209367.500000000 seconds slow of NTP time
System time : 209367.500000000 seconds slow of NTP time
System time : 209367.500000000 seconds slow of NTP time
System time : 209367.500000000 seconds slow of NTP time
System time : 209367.500000000 seconds slow of NTP time
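The offset was shrinking, which shows that the clock was indeed being slewed.
In actual testing, slewing ran for 2 hours (longer runs have not been verified yet) without any step. Slewing was stable and behaved as expected.
3.4.2 Scenario 2: if chronyd is restarted while the clock is being synchronized, does the time step?
The key setting here is makestep 99999999999 3. The offset between this node and the time source is 209367.5 s, which does not exceed 99999999999 s, so the step logic is not triggered. This deserves particular attention.
With an offset of 209367.5 s and makestep 99999999999 3 configured, restarting chronyd does not cause a time step.
For comparison, changing makestep 99999999999 3 to makestep 1 3 and restarting chronyd does cause the node's time to step.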
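Before restarting chronyd on a node that is still catching up, the current offset can be compared with the configured makestep threshold. The helpers below are a sketch under a few assumptions: the function names are made up, the System time line has the format shown in this article, and the threshold is taken from the last uncommented makestep line of the configuration file.

```shell
# Print the System time offset (seconds) from `chronyc tracking` output on stdin.
tracking_offset() {
    awk '/^System time/ { print $4 }'
}

# Print the makestep threshold from a chrony.conf on stdin
# (last uncommented makestep line wins; prints nothing if there is none).
makestep_threshold() {
    awk '$1 == "makestep" { thr = $2 } END { if (thr != "") print thr }'
}

# restart_is_safe OFFSET THRESHOLD
# Succeeds (exit status 0) when the offset does not exceed the threshold,
# i.e. when a restart should not trigger a step.
restart_is_safe() {
    awk -v off="$1" -v thr="$2" 'BEGIN { exit !(off <= thr) }'
}

# Intended use on the client node:
#   off=$(chronyc tracking | tracking_offset)
#   thr=$(makestep_threshold < /etc/chrony.conf)
#   restart_is_safe "$off" "$thr" && systemctl restart chronyd
```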
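4. Summary
4.1 ntpd mode
4.1.1 Slewing behavior
How ntpd behaves while slewing, by offset range:
Offset range | Behavior |
---|---|
0-600s | Catches up steadily; the time stays smooth |
600s-1000s | Slews steadily for about 15 minutes, then steps to the time source's time |
Above 1000s | Slews steadily for about 15 minutes, then steps to the time source's time. If the offset later exceeds 1000 s again for 10 minutes, the ntpd process exits |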
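4.1.2 Behavior on service restart
Restarting ntpd in each of the three modes when the offset between the node and the time source exceeds 1000 s:
Mode | Behavior |
---|---|
-g | The time steps straight to the time source's time; ntpd starts normally |
-x | ntpd cannot start; the time must be set manually to within 1000 s before it will start |
-g -x | The time steps straight to the time source's time; ntpd starts normally |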
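4.2 chronyd mode
4.2.1 Slewing behavior
Offset range | Behavior |
---|---|
Any offset | Catches up steadily; the time stays smooth |
4.2.2 Behavior on service restart
If chronyd is restarted while the clock is catching up:
Pay attention to the makestep setting. As long as the threshold is higher than the offset, no step is triggered; otherwise a step is triggered.
4.3 Recommendation
In day-to-day production it is fairly common for a node's clock to be days away from the time source. For catching up in that situation, chrony is recommended, as it is more stable and reliable. ntpd is designed around a requirement on the offset between the node and the time source: by design the two should be within 1000 s, and anything beyond that is treated as unfit for production and in need of manual repair.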