Analyzing and Resolving the CRS-2409 Warning Reported by CRS
ORACLE 11.2.0.3
1. Error message
The CRS alert log on node 1 contains the following abnormal entries:
2013-0X-XX 19:27:17.609
[ctssd(18809056)]CRS-2409:The clock on host XXXdb1 is not synchronous with the mean cluster time. No action has been taken as the Cluster Time Synchronization Service is running in observer mode.
2013-0X-XX 19:59:42.312
[ctssd(18809056)]CRS-2409:The clock on host XXXdb1 is not synchronous with the mean cluster time. No action has been taken as the Cluster Time Synchronization Service is running in observer mode.
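The message means that Oracle's CTSSD service has detected that this host's clock differs from the mean cluster time, but has taken no corrective action because it is running in observer mode.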
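To judge how often the warning recurs, the alert log can be searched for the message code. The following is a minimal sketch; the function name `count_crs2409` is made up for illustration, and the log path is whatever your own installation uses.

```shell
#!/bin/sh
# count_crs2409 FILE
# Print the number of lines in FILE that contain the CRS-2409 message code.
# (Hypothetical helper name; FILE is the path of your CRS alert log.)
count_crs2409() {
    grep -c 'CRS-2409:' "$1"
}
```

Called as `count_crs2409 <path to alert log>`, it prints a single number; note that `grep -c` exits non-zero when the count is 0.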
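2. Analysis of the error message
2.1 Expected behavior
When the operating system's NTPD service is running, CTSSD does not synchronize the clock itself; as long as the CTSSD service is started, it should normally stay in observer mode.
2.2 Current state
The OS NTPD service is running, yet CTSSD, while in observer mode, keeps reporting that the clock is not synchronous with the cluster.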
3. Troubleshooting
3.1 Check whether the clocks on the two nodes differ
XXXdb1:/u01/app/11.2.0.3/grid/log/XXXdb1$ssh XXXdb2 date
Mon Jul 15 20:30:17 GMT+08:00 2013
XXXdb1:/u01/app/11.2.0.3/grid/log/XXXdb1$ date
Mon Jul 15 20:30:18 GMT+08:00 2013
The check shows no meaningful time difference between the two nodes.
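Comparing two `date` outputs by eye works for a one-off check. For repeated checks, the difference can be computed from epoch seconds, as in the sketch below. The helper name `clock_skew` is hypothetical, and the usage line assumes that `date +%s` is supported on both nodes, which is not guaranteed on every platform.

```shell
#!/bin/sh
# clock_skew A B
# Print the absolute difference, in seconds, between two epoch timestamps.
# (Hypothetical helper name, for illustration only.)
clock_skew() {
    diff=$(( $1 - $2 ))
    if [ "$diff" -lt 0 ]; then
        diff=$(( 0 - diff ))
    fi
    echo "$diff"
}

# Example usage (assumption: `date +%s` works locally and on XXXdb2):
#   clock_skew "$(date +%s)" "$(ssh XXXdb2 date +%s)"
```

The ssh round trip itself takes time, so a result of a second or two does not necessarily indicate real drift.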
3.2 Check the OS NTPD service
XXXdb1:/# lssrc -ls xntpd
Program name: /usr/sbin/xntpd
Version: 3
Leap indicator: 00 (No leap second today.)
Sys peer: 10.XXX.XXX.71
Sys stratum: 2
Sys precision: -18
Debug/Tracing: DISABLED
Root distance: 0.000397
Root dispersion: 0.013458
Reference ID: 10.XXX.XXX.71
Reference time: d58e6e41.d0fca000 Mon, Jul 15 2013 20:49:05.816
Broadcast delay: 0.003906 (sec)
Auth delay: 0.000122 (sec)
System flags: bclient auth pll monitor filegen
System uptime: 30149695 (sec)
Clock stability: 0.047607 (sec)
Clock frequency: 0.000000 (sec)
Peer: 10.XXX.XXX.71
flags: (configured)(sys peer)
stratum: 1, version: 3
our mode: client, his mode: server
Subsystem    Group    PID        Status
xntpd        tcpip    4128900    active
On both nodes, the OS-level NTPD is running and is able to synchronize time.
3.3 Check the state of ctssd
XXXdb1:/#su - grid
XXXdb1:/home/grid$ crsctl stat res ora.ctssd -init
NAME=ora.ctssd
TYPE=ora.ctss.type
TARGET=ONLINE
STATE=ONLINE on XXXdb1
On both nodes, the CTSSD service is started.
3.4 Use the CRS cluvfy tool to diagnose the cause of the CTSS error
XXXdb1:/home/grid$cluvfy comp clocksync -n all -verbose
Verifying Clock Synchronization across the cluster nodes
Checking if Clusterware is installed on all nodes...
Check of Clusterware install passed
Checking if CTSS Resource is running on all nodes...
Check: CTSS Resource running on all nodes
  Node Name                             Status
  ------------------------------------  ------------------------
  XXXdb2                                passed
  XXXdb1                                passed
Result: CTSS resource check passed
Querying CTSS for time offset on all nodes...
Result: Query of CTSS for time offset passed
Check CTSS state started...
Check: CTSS state
  Node Name                             State
  ------------------------------------  ------------------------
  XXXdb2                                Observer
  XXXdb1                                Observer
CTSS is in Observer state. Switching over to clock synchronization checks using NTP
Starting Clock synchronization checks using Network Time Protocol(NTP)...
NTP Configuration file check started...
The NTP configuration file "/etc/ntp.conf" is available on all nodes
NTP Configuration file check passed
……
Checking NTP daemon command line for slewing option "-x"
Check: NTP daemon command line
  Node Name                             Slewing Option Set?
  ------------------------------------  ------------------------
  XXXdb2                                no
  XXXdb1                                no
Result:
NTP daemon slewing option check failed on some nodes
PRVF-5436 : The NTP daemon running on one or more nodes lacks the slewing option "-x"
Result: Clock synchronization check using Network Time Protocol(NTP) failed
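As the end of the output above shows, the NTP slewing option check failed on both nodes (PRVF-5436). The reason is that the NTP daemon is not running with the "-x" option.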
3.5 Check the OS-level NTPD configuration
(1) Check /etc/ntp.conf
server 10.XXX.XXX.71
broadcastclient
driftfile /etc/ntp.drift
tracefile /etc/ntp.trace
(2) Check the configuration in /etc/rc.tcpip
It contains the following:
# Start up Network Time Protocol (NTP) daemon
start /usr/sbin/xntpd "$src_running" -a "-x"
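The configuration looks correct, yet the running daemon is not in "-x" mode. Most likely NTPD was restarted manually at some point without the "-x" argument.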
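Whether the running daemon was actually started with "-x" can be confirmed from its command line. Below is a minimal sketch; the helper name `has_slew_flag` is hypothetical, and the usage line assumes that `ps -eo args` shows the daemon's arguments on your system.

```shell
#!/bin/sh
# has_slew_flag CMDLINE
# Succeed (exit 0) if CMDLINE contains "-x" as a separate argument.
# (Hypothetical helper name, for illustration only.)
has_slew_flag() {
    case " $1 " in
        *" -x "*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Example usage (assumption: ps prints the full argument list of xntpd):
#   has_slew_flag "$(ps -eo args | grep '[x]ntpd')" && echo "slewing option set"
```

Matching on a space-delimited " -x " avoids false positives from longer options that merely begin with "-x".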
4. Fix (perform on both nodes)
4.1 Stop CTSSD (run as the grid user)
$ crsctl stop resource ora.ctssd -init
4.2 Stop NTPD (run as root)
# stopsrc -s xntpd
4.3 Start NTPD with the "-x" option (run as root; startup takes a few minutes)
# startsrc -s xntpd -a "-x"
4.4 Start CTSSD (run as the grid user)
$ crsctl start resource ora.ctssd -init
On Linux, edit /etc/sysconfig/ntpd (for example with vi) and add the -x option:
OPTIONS="-x -u ntp:ntp -p /var/run/ntpd.pid -g"
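On Linux, the persistent setting can be checked before restarting the daemon. The sketch below only inspects the file; the function name `ntpd_options_have_slew` is hypothetical, and it assumes the options are written on a single uncommented `OPTIONS=` line, space-separated and optionally enclosed in double quotes, as in the example above.

```shell
#!/bin/sh
# ntpd_options_have_slew FILE
# Succeed (exit 0) if an uncommented OPTIONS= line in FILE includes "-x"
# as a separate option. (Hypothetical helper name, for illustration only.)
ntpd_options_have_slew() {
    grep -q -E '^[[:space:]]*OPTIONS=.*[" ]-x([" ]|$)' "$1"
}

# Example usage:
#   ntpd_options_have_slew /etc/sysconfig/ntpd || echo "-x is missing"
```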
5. Verifying the fix
5.1 CTSSD log entries after CTSSD is started
2013-XX-XX 19:18:00.427
[ctssd(6357170)]CRS-2403:The Cluster Time Synchronization Service on host XXXdb1 is in observer mode.
2013-XX-XX 19:18:00.888
[ctssd(6357170)]CRS-2407:The new Cluster Time Synchronization Service reference node is host XXXdb2.
2013-08-08 19:18:00.890
[ctssd(6357170)]CRS-2401:The Cluster Time Synchronization Service started on host XXXdb1.
5.2 Use the CRS cluvfy tool to verify that the CTSS environment is fully healthy
XXXdb1:/#su - grid
XXXdb1:/home/grid$ cluvfy comp clocksync -n all -verbose
…… (checks that passed as before are omitted to save space)
Check for NTP daemon or service using UDP port 123
  Node Name                             Port Open?
  ------------------------------------  ------------------------
  XXXdb2                                yes
  XXXdb1                                yes
Result: Clock synchronization check using Network Time Protocol(NTP) passed
Oracle Cluster Time Synchronization Services check passed
Verification of Clock Synchronization across the cluster nodes was successful.
The output above shows that the items that previously failed are now all normal and the check passes.
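The same verification can be scripted by scanning the cluvfy output for the final success line and for PRVF-5436. This is a minimal sketch keyed to the exact message texts quoted in this article; the function name `clocksync_ok` is hypothetical, and other cluvfy versions may word their messages differently.

```shell
#!/bin/sh
# clocksync_ok
# Read cluvfy output on stdin. Succeed (exit 0) only if the overall success
# line is present and PRVF-5436 is not reported.
# (Hypothetical helper name, for illustration only.)
clocksync_ok() {
    out=$(cat)
    case "$out" in
        *"PRVF-5436"*) return 1 ;;
    esac
    case "$out" in
        *"Verification of Clock Synchronization across the cluster nodes was successful."*) return 0 ;;
    esac
    return 1
}

# Example usage (run as the grid user):
#   cluvfy comp clocksync -n all -verbose | clocksync_ok && echo "clock sync OK"
```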