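Background:
On a CDH 6.3.1 cluster, a Kudu Tablet Server instance failed to start. The error message pointed to a clock synchronization problem: Check failed: _s.ok() Bad status: Service unavailable: Cannot initialize clock: Timed out waiting for clock sync: Error reading clock. Clock considered unsynchronized. The NTP service appeared to be running normally, and restarting the instance produced no new log output. The original error log was as follows: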
/sbin/ntpdc -n -c timeout 1000 -c peers -c sysinfo -c sysstats -c version
------------------------------------------
stdout:
remote local st poll reach delay offset disp
=======================================================================
*xx.xx.xx.48 xx.xx.xx.36 11 64 77 0.00003 -0.000299 0.43404
system peer: xx.xx.xx.48
system peer mode: client
leap indicator: 11
stratum: 12
precision: -24
root distance: 0.00003 s
root dispersion: 0.20044 s
reference ID: [xx.xx.xx.48]
reference time: e3a13758.6d96d3f7 Thu, Jan 7 2021 15:31:36.428
system flags: auth ntp kernel stats
jitter: 0.000000 s
stability: 0.000 ppm
broadcastdelay: 0.000000 s
authdelay: 0.000000 s
time since restart: 577
time since reset: 577
packets received: 44
packets processed: 10
current version: 10
previous version: 0
declined: 0
access denied: 0
bad length or format: 0
bad authentication: 0
rate exceeded: 0
ntpdc 4.2.6p5@1.2349-o Mon Oct 9 16:33:13 UTC 2017 (1)
3:32:10.251 PM WARN cc:97
could not find executable: chronyc
3:32:10.253 PM WARN cc:97
could not find executable: chronyc
3:32:10.253 PM FATAL cc:85
Check failed: _s.ok() Bad status: Service unavailable: Cannot initialize clock: Timed out waiting for clock sync: Error reading clock. Clock considered unsynchronized
3:50:16.776 PM INFO cc:2578
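One detail in the ntpdc output above is worth checking before anything else: the line `leap indicator: 11`. In NTP, a leap indicator of binary 11 is the alarm condition, meaning the daemon does not yet consider the local clock synchronized, even though a system peer has been selected. A minimal shell sketch for extracting that field from `ntpdc -c sysinfo` style output follows. The function names `leap_indicator` and `ntp_reports_synced` are hypothetical helpers introduced here for illustration; they are not part of ntp or Kudu.

```shell
#!/bin/sh
# Hypothetical helpers: parse "ntpdc -c sysinfo" style text read from stdin.

# Print the value of the "leap indicator:" field, e.g. "11" or "00".
leap_indicator() {
    awk -F': *' '/^leap indicator:/ {
        gsub(/[ \t\r]+$/, "", $2)   # strip trailing whitespace
        print $2
        exit
    }'
}

# Succeed (exit 0) only when a leap indicator is present
# and is not the alarm value 11.
ntp_reports_synced() {
    li=$(leap_indicator)
    [ -n "$li" ] && [ "$li" != "11" ]
}

# Usage on a live host (assumes ntpdc lives at this path, as in the log above):
#   /sbin/ntpdc -n -c sysinfo | ntp_reports_synced && echo "synced" || echo "not synced"
```

Run against the output captured above, this would report "not synced", which is consistent with the Tablet Server refusing to start even though ntpd itself was up.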
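After several attempts, the following procedure resolved the problem:
1. Reboot the server hosting the instance. (Changing the parameter without rebooting did not help: the instance still failed to start and produced no logs. Rebooting without changing the parameter did not help either: the instance failed with the same error.)
2. After the reboot, check that the NTP service has recovered, and wait for the NTP alert in the CM UI to clear.
3. In the instance's configuration, add the following to the gflagfile parameter: --max_clock_sync_error_usec=20000000 (that is, 20 seconds)
4. Restart the Tablet Server service.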
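Step 3 can be sketched as an idempotent shell helper for cases where the gflagfile is edited by hand. This is only an illustration under stated assumptions: `ensure_gflag` is a hypothetical function name and the file path in the usage comment is a placeholder. On a cluster managed by Cloudera Manager the flag is normally entered in the instance's gflagfile configuration field in the CM UI, as described above, because CM typically regenerates the file when the role starts.

```shell
#!/bin/sh
# Hypothetical helper: append a flag line to a gflagfile unless that exact
# line is already present, so running it twice does not duplicate the flag.
# Usage: ensure_gflag FILE FLAG
ensure_gflag() {
    file=$1
    flag=$2
    # -q quiet, -x whole-line match, -F fixed string;
    # -e keeps a pattern starting with "--" from being read as an option.
    if ! grep -qxF -e "$flag" "$file" 2>/dev/null; then
        printf '%s\n' "$flag" >> "$file"
    fi
}

# Example (the path is a placeholder, not a real CDH location):
#   ensure_gflag /path/to/tserver.gflagfile "--max_clock_sync_error_usec=20000000"
```

The helper only matches the exact line, so an existing entry for the same flag with a different value would be left in place and should be reviewed manually.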