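The HP engineers were very experienced, which for me personally was rather unfortunate. From what the engineers told me, the preparation was thorough and the installation went very smoothly, and the whole job took about 3 hours... so I did not get to learn how to handle a single related fault!
Project topology:
The Oracle database is installed on the cluster as non-RAC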
DB1 info
lan0:192.168.1.1 HB ; up
lan1:172.24.1.11 Master ; up
lan2:Null Standby ; down
Floating address 172.24.1.10
DB2 info
lan0:192.168.1.2 HB ; up
lan1:172.24.1.12 Master ; up
lan2:Null Standby ; down
Floating address 172.24.1.10
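Oracle database startup script: oraStart.sh
#!/usr/bin/ksh
# Run sqlplus as oracle via "su -c"; a bare "su - oracle" opens a separate
# shell, so the sqlplus line after it would not run as the oracle user.
su - oracle -c 'sqlplus "/ as sysdba"' <<EOF
startup;
exit
EOF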
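Oracle database shutdown script: oraStop.sh
#!/usr/bin/ksh
# Run sqlplus as oracle via "su -c"; a bare "su - oracle" opens a separate
# shell, so the sqlplus line after it would not run as the oracle user.
su - oracle -c 'sqlplus "/ as sysdba"' <<EOF
shutdown immediate;
exit
EOF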
Oracle listener startup script: lisStart.sh
#!/usr/bin/ksh
# The interpreter line must come first; lsnrctl runs as oracle via "su -c".
su - oracle -c "lsnrctl start"
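Oracle listener shutdown script: lisStop.sh
#!/usr/bin/ksh
# The interpreter line must come first; lsnrctl runs as oracle via "su -c".
su - oracle -c "lsnrctl stop"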
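MC S/G implementation summary:
Pre-checks:
Make sure Oracle is installed
Configure the instance correctly
Write and test the database start/stop scripts and the listener start/stop scripts
Confirm that the physical links are sound and that every host and network component works properly
Install...
Configure... ...
Test...
Test verification:
By default, MC S/G only judges whether the heartbeat is healthy. In the default configuration it therefore cannot monitor system services or application services: when a system service, a process, or an application service stops, MC S/G does not switch the package. MC S/G is highly extensible, though, and with hand-written scripts it can monitor both system services and application processes. The tests below only cover the default MC S/G configuration...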
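As an illustration of the hand-written monitoring mentioned above, here is a minimal sketch of a process check. It is not taken from this project: the function names, the polling interval, and the choice of the PMON background process as the health signal are all assumptions, and how such a script would be attached to the package is left out.

```shell
#!/bin/sh
# Hypothetical helper (not part of ServiceGuard): reads `ps -ef` text on stdin
# and succeeds only if a PMON background process for the given SID is listed.
pmon_running() {
    awk -v want="ora_pmon_$1" '
        $NF == want { found = 1 }
        END { exit (found ? 0 : 1) }
    '
}

# Hypothetical monitor loop: returns non-zero once PMON disappears, so whatever
# launches it can treat the exit as a failure. SID and interval are assumptions.
monitor_oracle() {
    sid="$1"
    interval="${2:-30}"
    while ps -ef | pmon_running "$sid"; do
        sleep "$interval"
    done
    echo "ora_pmon_$sid is no longer running" >&2
    return 1
}
```

For the instance in this write-up the call would look like `monitor_oracle m8 30`. Matching on the last `ps` column keeps the `grep oracle` line itself from counting as a hit.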
A. Disconnect the DB1 heartbeat cable
DB1 lan0 disconnected
DB2 lan0 disconnected
·Check the status of each system's network interfaces
# cmviewcl -v
Output:
CLUSTER STATUS
cluster1 up
NODE STATUS STATE
db1 up running
Network_Parameters:
INTERFACE STATUS PATH NAME
PRIMARY down 0/1/2/0 lan0
PRIMARY up 0/2/1/0/6/0 lan1
STANDBY up 0/3/1/0/6/0 lan2
NODE STATUS STATE
db2 up running
Network_Parameters:
INTERFACE STATUS PATH NAME
PRIMARY down 0/1/2/0 lan0
PRIMARY up 0/2/1/0/6/0 lan1
STANDBY up 0/3/1/0/6/0 lan2
PACKAGE STATUS STATE AUTO_RUN NODE
oracle up running enabled db1
Policy_Parameters:
POLICY_NAME CONFIGURED_VALUE
Failover configured_node
Failback manual
Script_Parameters:
ITEM STATUS MAX_RESTARTS RESTARTS NAME
Subnet up 172.24.1.0
Node_Switching_Parameters:
NODE_TYPE STATUS SWITCHING NAME
Primary up enabled db1 (current)
Alternate up enabled db2
·Check the status of the Oracle processes on db1
# ps -ef |grep oracle
Output:
root 27911 27706 0 10:47:27 pts/ta 0:00 grep oracle
oracle 2566 1 0 Nov 22 ? 0:09 ora_pmon_m8
oracle 2568 1 0 Nov 22 ? 0:08 ora_dbw0_m8
oracle 2570 1 0 Nov 22 ? 0:12 ora_lgwr_m8
oracle 2572 1 0 Nov 22 ? 0:28 ora_ckpt_m8
oracle 2574 1 0 Nov 22 ? 0:29 ora_smon_m8
oracle 2576 1 0 Nov 22 ? 0:00 ora_reco_m8
oracle 2578 1 0 Nov 22 ? 0:17 ora_cjq0_m8
oracle 2580 1 0 Nov 22 ? 1:22 ora_qmn0_m8
oracle 2582 1 0 Nov 22 ? 0:00 ora_s000_m8
oracle 2584 1 0 Nov 22 ? 0:00 ora_d000_m8
oracle 2931 1 0 Nov 22 ? 0:00 /oracle/product/9.2.0.1/bin/tnslsnr LISTENER -inherit
·Check the package migration status
# cmviewcl
Output:
CLUSTER STATUS
cluster1 up
NODE STATUS STATE
db1 up running
db2 up running
PACKAGE STATUS STATE AUTO_RUN NODE
oracle up running enabled db1
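The Oracle application processes on DB1 are normal and the package is healthy.
Heartbeat traffic has now moved to the public link: lan1 on each host carries the heartbeat while continuing to carry the business data.
No package switch is needed.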
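Reading the interface table by eye gets tedious across repeated tests. The sketch below pulls the down interfaces out of `cmviewcl -v` text. The function name `down_interfaces` is made up for illustration, and the parsing assumes the column layout shown in the output above.

```shell
#!/bin/sh
# Hypothetical helper: reads `cmviewcl -v` text on stdin and prints
# "<node> <interface>" for every interface whose STATUS column is "down".
down_interfaces() {
    awk '
        # The header "NODE STATUS STATE" announces that the next row names a node.
        $1 == "NODE" && $2 == "STATUS" { want_node = 1; next }
        want_node { node = $1; want_node = 0; next }
        # Interface rows look like: PRIMARY down 0/1/2/0 lan0
        ($1 == "PRIMARY" || $1 == "STANDBY") && $2 == "down" { print node, $4 }
    '
}
```

On a cluster node this would be used as `cmviewcl -v | down_interfaces`.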
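B. Keep the heartbeat cable disconnected and unplug the lan1 cable on DB1
DB1 lan0 disconnected
DB2 lan0 disconnected
DB1 lan1 disconnected
·Check the status of each system's network interfaces
# cmviewcl -v
Output: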
CLUSTER STATUS
cluster1 up
NODE STATUS STATE
db1 up running
Network_Parameters:
INTERFACE STATUS PATH NAME
PRIMARY down 0/1/2/0 lan0
PRIMARY down 0/2/1/0/6/0 lan1
STANDBY up 0/3/1/0/6/0 lan2
NODE STATUS STATE
db2 up running
Network_Parameters:
INTERFACE STATUS PATH NAME
PRIMARY down 0/1/2/0 lan0
PRIMARY up 0/2/1/0/6/0 lan1
STANDBY up 0/3/1/0/6/0 lan2
PACKAGE STATUS STATE AUTO_RUN NODE
oracle up running enabled db1
Policy_Parameters:
POLICY_NAME CONFIGURED_VALUE
Failover configured_node
Failback manual
Script_Parameters:
ITEM STATUS MAX_RESTARTS RESTARTS NAME
Subnet up 172.24.1.0
Node_Switching_Parameters:
NODE_TYPE STATUS SWITCHING NAME
Primary up enabled db1 (current)
Alternate up enabled db2
·Check the status of the Oracle processes on db1
# ps -ef |grep oracle
Output:
root 27911 27706 0 10:47:27 pts/ta 0:00 grep oracle
oracle 2566 1 0 Nov 22 ? 0:09 ora_pmon_m8
oracle 2568 1 0 Nov 22 ? 0:08 ora_dbw0_m8
oracle 2570 1 0 Nov 22 ? 0:12 ora_lgwr_m8
oracle 2572 1 0 Nov 22 ? 0:28 ora_ckpt_m8
oracle 2574 1 0 Nov 22 ? 0:29 ora_smon_m8
oracle 2576 1 0 Nov 22 ? 0:00 ora_reco_m8
oracle 2578 1 0 Nov 22 ? 0:17 ora_cjq0_m8
oracle 2580 1 0 Nov 22 ? 1:22 ora_qmn0_m8
oracle 2582 1 0 Nov 22 ? 0:00 ora_s000_m8
oracle 2584 1 0 Nov 22 ? 0:00 ora_d000_m8
oracle 2931 1 0 Nov 22 ? 0:00 /oracle/product/9.2.0.1/bin/tnslsnr LISTENER -inherit
·Check the package migration status
# cmviewcl
Output:
CLUSTER STATUS
cluster1 up
NODE STATUS STATE
db1 up running
db2 up running
PACKAGE STATUS STATE AUTO_RUN NODE
oracle up running enabled db1
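The Oracle application processes are normal and the package is healthy.
The standby interface lan2 on DB1 has now taken over everything lan1 was doing, so both the heartbeat and the business data go through lan2.
No package switch is needed.
C. Disconnecting lan0, lan1, and lan2 on DB1 at the same time was not tested. Because of the MC protection mechanism, a host restarts automatically once all of its network connections are lost, so lan2 was left connected to protect the system.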
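Since losing the last link restarts the host, a small guard before pulling another cable can help. The following is only a sketch under that assumption: `up_interface_count` and `safe_to_unplug` are hypothetical names, and they rely on the `cmviewcl -v` layout shown in the earlier steps.

```shell
#!/bin/sh
# Hypothetical helper: reads `cmviewcl -v` text on stdin and prints how many
# interfaces of the given node are reported "up".
up_interface_count() {
    awk -v target="$1" '
        $1 == "NODE" && $2 == "STATUS" { want_node = 1; next }
        want_node { node = $1; want_node = 0; next }
        node == target && ($1 == "PRIMARY" || $1 == "STANDBY") && $2 == "up" { n++ }
        END { print n + 0 }
    '
}

# Succeeds only if the node would still have a link left after losing one more.
safe_to_unplug() {
    [ "$(up_interface_count "$1")" -gt 1 ]
}
```

For example, `cmviewcl -v | safe_to_unplug db1` would fail in the state reached in step B, where lan2 is the only interface still up on db1.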
D. Reconnect every interface that was unplugged on the original host DB1, both lan0 and lan1
E. Manually switch the package from DB1 to DB2
·Package switch procedure
# cmhaltnode -f -v db1
Disabling package switching to all nodes being halted.
Disabling all packages from running on db1.
Warning: Do not modify or enable packages until the halt operation is completed
Halting Package oracle.
Halting cluster services on node db1.
Successfully halted all nodes specified.
Halt operation complete.
# tail -f /etc/cmcluster/oracle/*.log
Deactivated volume group in Exclusive Mode.
Volume group "vgdata1" has been successfully changed.
Nov 27 11:14:23 - Node "db1": Deactivating volume group vgdata2
Deactivated volume group in Exclusive Mode.
Volume group "vgdata2" has been successfully changed.
Nov 27 11:14:23 - Node "db1": Deactivating volume group vgdata3
Deactivated volume group in Exclusive Mode.
Volume group "vgdata3" has been successfully changed.
########### Node "db1": Package halt completed at Mon Nov 27 11:14:23 EAT 2006 ###########
·Log in to Oracle on db2 and check the result
# su oracle
password:
$ sqlplus "/ as sysbda"
SQL*Plus: Release 9.2.0.1.0 - Production on Mon Nov 27 11:44:08 2006
Copyright (c) 1982, 2002, Oracle Corporation. All rights reserved.
SP2-0306: Invalid option.
Usage: CONN[ECT] [logon] [AS {SYSDBA|SYSOPER}]
where <logon> ::= <username>[/<password>][@<connect_string>] | /
Enter user-name:oraUser
Enter password:
Connected to:
Oracle9i Enterprise Edition Release 9.2.0.1.0 - 64bit Production
With the Partitioning, OLAP and Oracle Data Mining options
JServer Release 9.2.0.1.0 - Production
SQL> exit
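The database login works, which shows that the package switch succeeded and that the Oracle instance and its related resources are now associated with host db2. (The SP2-0306 error above came from the mistyped "sysbda"; the login then went through with a user name and password.)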
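The same login check can be scripted so that a typo or an Oracle error does not slip past unnoticed. This is a sketch only: `login_ok` is a hypothetical name, and it simply inspects a captured SQL*Plus session for the "Connected to:" banner and for ORA- or SP2- error codes.

```shell
#!/bin/sh
# Hypothetical helper: reads a captured SQL*Plus session on stdin and succeeds
# only if it shows a connection banner and no ORA-/SP2- error line.
login_ok() {
    awk '
        /^(ORA|SP2)-[0-9]+/ { err = 1 }
        /^Connected to:/    { ok = 1 }
        END { exit ((ok && !err) ? 0 : 1) }
    '
}
```

A session captured to a file could then be checked with `login_ok < session.log` (the file name is illustrative). Note that this check is stricter than the manual one: it would flag the session above because of the SP2-0306 line, even though the second login attempt succeeded.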
·Check the status of the Oracle processes on db1
# ps -ef |grep oracle
Output:
root 28248 28231 0 11:53:00 pts/ta 0:00 grep oracle
·Check the status of the Oracle processes on db2
# ps -ef |grep oracle
Output:
... ...
oracle 5769 1 0 11:12:22 ? 0:00 ora_cjq0_xinem8
oracle 5759 1 0 11:12:22 ? 0:00 ora_dbw0_xinem8
oracle 5773 1 0 11:12:22 ? 0:00 ora_s000_xinem8
oracle 5757 1 0 11:12:22 ? 0:00 ora_pmon_xinem8
... ...
·Check and confirm the package status
# cmviewcl -v
Output:
CLUSTER STATUS
cluster1 up
NODE STATUS STATE
db2 up running
Network_Parameters:
INTERFACE STATUS PATH NAME
PRIMARY up 0/1/2/0 lan0
PRIMARY up 0/2/1/0/6/0 lan1
STANDBY up 0/3/1/0/6/0 lan2
PACKAGE STATUS STATE AUTO_RUN NODE
oracle up running enabled db2
Policy_Parameters:
POLICY_NAME CONFIGURED_VALUE
Failover configured_node
Failback manual
Script_Parameters:
ITEM STATUS MAX_RESTARTS RESTARTS NAME
Subnet up 172.24.22.0
Node_Switching_Parameters:
NODE_TYPE STATUS SWITCHING NAME
Primary down db1
Alternate up enabled db2 (current)
NODE STATUS STATE
db1 down halted
Network_Parameters:
INTERFACE STATUS PATH NAME
PRIMARY unknown 0/1/2/0 lan0
PRIMARY unknown 0/2/1/0/6/0 lan1
STANDBY unknown 0/3/1/0/6/0 lan2
The package switch succeeded!
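To confirm a switch without scanning the whole listing, the node that currently owns the package can be read from the PACKAGE row. A minimal sketch follows, with `package_node` as a hypothetical helper name and the column layout assumed to match the output above.

```shell
#!/bin/sh
# Hypothetical helper: reads `cmviewcl` or `cmviewcl -v` text on stdin and
# prints the NODE column of the row for the given package.
package_node() {
    awk -v pkg="$1" '
        # Package rows follow the header "PACKAGE STATUS STATE AUTO_RUN NODE".
        $1 == "PACKAGE" && $2 == "STATUS" { in_pkg = 1; next }
        # A "Something:" section title or a NODE header ends the package rows.
        /:$/ || $1 == "NODE" { in_pkg = 0 }
        in_pkg && $1 == pkg && !printed { print $NF; printed = 1 }
    '
}
```

After the manual switch in this step, `cmviewcl | package_node oracle` would be expected to print `db2`.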
F. Disconnect the DB2 heartbeat cable
DB1 lan0 disconnected
DB2 lan0 disconnected
·Check the status of each system's network interfaces
# cmviewcl -v
Output:
CLUSTER STATUS
cluster1 up
NODE STATUS STATE
db2 up running
Network_Parameters:
INTERFACE STATUS PATH NAME
PRIMARY down 0/1/2/0 lan0
PRIMARY up 0/2/1/0/6/0 lan1
STANDBY up 0/3/1/0/6/0 lan2
PACKAGE STATUS STATE AUTO_RUN NODE
oracle up running enabled db2
Policy_Parameters:
POLICY_NAME CONFIGURED_VALUE
Failover configured_node
Failback manual
Script_Parameters:
ITEM STATUS MAX_RESTARTS RESTARTS NAME
Subnet up 172.24.22.0
Node_Switching_Parameters:
NODE_TYPE STATUS SWITCHING NAME
Primary down db1
Alternate up enabled db2 (current)
NODE STATUS STATE
db1 down halted
Network_Parameters:
INTERFACE STATUS PATH NAME
PRIMARY unknown 0/1/2/0 lan0
PRIMARY unknown 0/2/1/0/6/0 lan1
STANDBY unknown 0/3/1/0/6/0 lan2
·Check the status of the Oracle processes
# ps -ef |grep oracle
Output:
... ...
oracle 5769 1 0 11:12:22 ? 0:00 ora_cjq0_xinem8
oracle 5759 1 0 11:12:22 ? 0:00 ora_dbw0_xinem8
oracle 5773 1 0 11:12:22 ? 0:00 ora_s000_xinem8
oracle 5757 1 0 11:12:22 ? 0:00 ora_pmon_xinem8
... ...
·Check the package migration status
# cmviewcl
Output:
CLUSTER STATUS
cluster1 up
NODE STATUS STATE
db2 up running
PACKAGE STATUS STATE AUTO_RUN NODE
oracle up running enabled db2
NODE STATUS STATE
db1 down halted
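The Oracle application processes are normal and the package is healthy.
Heartbeat traffic has now moved to the public link: lan1 on each host carries the heartbeat while continuing to carry the business data.
G. Keep the heartbeat cable disconnected and unplug the lan1 cable on DB2
DB1 lan0 disconnected
DB2 lan0 disconnected
DB2 lan1 disconnected
·Check the status of each system's network interfaces
# cmviewcl -v
Output: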
CLUSTER STATUS
cluster1 up
NODE STATUS STATE
db2 up running
Network_Parameters:
INTERFACE STATUS PATH NAME
PRIMARY down 0/1/2/0 lan0
PRIMARY down 0/2/1/0/6/0 lan1
STANDBY up 0/3/1/0/6/0 lan2
PACKAGE STATUS STATE AUTO_RUN NODE
oracle up running enabled db2
Policy_Parameters:
POLICY_NAME CONFIGURED_VALUE
Failover configured_node
Failback manual
Script_Parameters:
ITEM STATUS MAX_RESTARTS RESTARTS NAME
Subnet up 172.24.22.0
Node_Switching_Parameters:
NODE_TYPE STATUS SWITCHING NAME
Primary down db1
Alternate up enabled db2 (current)
NODE STATUS STATE
db1 down halted
Network_Parameters:
INTERFACE STATUS PATH NAME
PRIMARY unknown 0/1/2/0 lan0
PRIMARY unknown 0/2/1/0/6/0 lan1
STANDBY unknown 0/3/1/0/6/0 lan2
·Check the status of the Oracle processes
# ps -ef |grep oracle
Output:
... ...
oracle 5769 1 0 11:12:22 ? 0:00 ora_cjq0_xinem8
oracle 5759 1 0 11:12:22 ? 0:00 ora_dbw0_xinem8
oracle 5773 1 0 11:12:22 ? 0:00 ora_s000_xinem8
oracle 5757 1 0 11:12:22 ? 0:00 ora_pmon_xinem8
... ...
·Check the package migration status
# cmviewcl
Output:
CLUSTER STATUS
cluster1 up
NODE STATUS STATE
db2 up running
PACKAGE STATUS STATE AUTO_RUN NODE
oracle up running enabled db2
NODE STATUS STATE
db1 down halted
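The Oracle application processes are normal and the package is healthy.
The standby interface lan2 on DB2 has now taken over everything lan1 was doing, so both the heartbeat and the business data go through lan2.
H. Disconnecting lan0, lan1, and lan2 on DB2 at the same time was not tested. Because of the MC protection mechanism, a host restarts automatically once all of its network connections are lost, so lan2 was left connected to protect the system.
I. Reconnect every interface that was unplugged on host DB2, both lan0 and lan1
J. Manually switch the package back from DB2 to DB1
·Package switch-back procedure
# cmrunnode -v db1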
cmrunnode : Validating network configuration...
Gathering configuration information ..
Gathering Network Configuration ...... Done
cmrunnode : Network validation complete
cmrunnode : Waiting for cluster to form.....
cmrunnode : Cluster successfully formed.
cmrunnode : Check the syslog files on all nodes in the cluster
cmrunnode : to verify that no warnings occurred during startup.
# cmhaltnode -f -v db2
Disabling package switching to all nodes being halted.
Disabling all packages from running on db2.
Warning: Do not modify or enable packages until the halt operation is completed.
Halting Package oracle.
Halting cluster services on node db2.
..
Successfully halted all nodes specified.
Halt operation complete.
# tail -f /etc/cmcluster/oracle/*.log
Deactivated volume group in Exclusive Mode.
Volume group "vgdata1" has been successfully changed.
Nov 28 09:43:53 - Node "db2": Deactivating volume group vgdata2
Deactivated volume group in Exclusive Mode.
Volume group "vgdata2" has been successfully changed.
Nov 28 09:43:53 - Node "db2": Deactivating volume group vgdata3
Deactivated volume group in Exclusive Mode.
Volume group "vgdata3" has been successfully changed.
########### Node "db2": Package halt completed at Tue Nov 28 09:43:53 EAT 2006 ###########
·Log in to Oracle on db1 and check the result
# su - oracle
$ sqlplus "/ as sysdba"
SQL*Plus: Release 9.2.0.1.0 - Production on Tue Nov 28 09:55:48 2006
Copyright (c) 1982, 2002, Oracle Corporation. All rights reserved.
Connected to:
Oracle9i Enterprise Edition Release 9.2.0.1.0 - 64bit Production
With the Partitioning, OLAP and Oracle Data Mining options
JServer Release 9.2.0.1.0 - Production
SQL> exit
·Check the status of the Oracle processes on db1
# ps -ef |grep oracle
Output:
... ...
oracle 5769 1 0 11:12:22 ? 0:00 ora_cjq0_xinem8
oracle 5759 1 0 11:12:22 ? 0:00 ora_dbw0_xinem8
oracle 5773 1 0 11:12:22 ? 0:00 ora_s000_xinem8
oracle 5757 1 0 11:12:22 ? 0:00 ora_pmon_xinem8
... ...
·Check the status of the Oracle processes on db2
# ps -ef |grep oracle
Output:
root 28248 28231 0 11:53:00 pts/ta 0:00 grep oracle
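·Check and confirm the package status
# cmviewcl -v
Output:
How to analyze and test this result is still to be confirmed with the HP engineers
K. To test how the database scripts behave, we also forced a shutdown of DB1
How to analyze and test this result is still to be confirmed with the HP engineers
HP MC ServiceGuard operation commands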
CLUSTER STATUS
cluster1 up
NODE STATUS STATE
db2 down halted
Network_Parameters:
INTERFACE STATUS PATH NAME
PRIMARY unknown 0/1/2/0 lan0
PRIMARY unknown 0/2/1/0/6/0 lan1
STANDBY unknown 0/3/1/0/6/0 lan2
NODE STATUS STATE
db1 up running
Network_Parameters:
INTERFACE STATUS PATH NAME
PRIMARY up 0/1/2/0 lan0
PRIMARY up 0/2/1/0/6/0 lan1
STANDBY up 0/3/1/0/6/0 lan2
PACKAGE STATUS STATE AUTO_RUN NODE
oracle up running enabled db1
Policy_Parameters:
POLICY_NAME CONFIGURED_VALUE
Failover configured_node
Failback manual
Script_Parameters:
ITEM STATUS MAX_RESTARTS RESTARTS NAME
Subnet up 172.24.22.0
Node_Switching_Parameters:
NODE_TYPE STATUS SWITCHING NAME
Primary up enabled db1 (current)
Alternate down db2
Addendum:
The tests above are complete, but one issue remains. Look at the status of db2 in the current cmviewcl output ... ...
NODE STATUS STATE
db2 down halted
Node_Switching_Parameters:
NODE_TYPE STATUS SWITCHING NAME
Primary up enabled db1 (current)
Alternate down db2
db2 is still down, so after the package has been switched back manually, node db2 has to be started again:
# cmrunnode -v db2
OK! Now run cmviewcl once more and check the status of db2 in the output ... ...
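That final check can also be polled instead of repeated by hand. The sketch below is hedged accordingly: `node_status` and `wait_node_up` are hypothetical names, and the retry count and 5-second pause are arbitrary assumptions.

```shell
#!/bin/sh
# Hypothetical helper: reads `cmviewcl` or `cmviewcl -v` text on stdin and
# prints the STATUS column ("up", "down", ...) of the given node.
node_status() {
    awk -v target="$1" '
        $1 == "NODE" && $2 == "STATUS" { in_nodes = 1; next }
        # A PACKAGE header or a "Something:" section title ends the node rows.
        $1 == "PACKAGE" || /:$/ { in_nodes = 0 }
        in_nodes && $1 == target { print $2 }
    '
}

# Poll cmviewcl until the node reports "up" or the attempts run out.
wait_node_up() {
    node="$1"
    tries="${2:-12}"
    while [ "$tries" -gt 0 ]; do
        [ "$(cmviewcl | node_status "$node")" = "up" ] && return 0
        tries=$((tries - 1))
        sleep 5
    done
    return 1
}
```

After `cmrunnode -v db2`, a call such as `wait_node_up db2` would return 0 once db2 shows as up.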
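Item | Description
Cluster start/stop test | The cluster can be started: #[/]cmruncl -v
 | The cluster can be stopped: #[/]cmhaltcl -v -f
Package halt and switch test | With the cluster started, one package is running: #[/]cmviewcl
 | Halt system 1; every package running on system 1 switches automatically to system 2:
 |   #[/]cmhaltnode -v -f [system 1 hostname]
 |   #[/]cmviewcl
 | Restore the system 1 node:
 |   #[/]cmrunnode [system 1 hostname]
 |   #[/]cmmodpkg -e pkgname
 |   #[/]cmmodpkg -n [system 1 hostname] -e pkgname
 | Halt system 2; every package running on system 2 switches automatically to system 1:
 |   #[/]cmhaltnode -v -f [system 2 hostname]
 |   #[/]cmviewcl
 | Restore the system 2 node:
 |   #[/]cmrunnode [system 2 hostname]
 |   #[/]cmmodpkg -e pkgname
 |   #[/]cmmodpkg -n [system 2 hostname] -e pkgname
Remarks |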
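The halt-and-restore sequence in the table can be wrapped in a small script for repeated drills. This is a sketch, not a tested procedure: `run` and `failover_drill` are hypothetical names, and by default the script only prints the commands (DRY_RUN=1) so that it can be reviewed before anything touches the cluster.

```shell
#!/bin/sh
# Print the command when DRY_RUN is 1 (the default here); otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical drill for one node: halt it, view the cluster, bring the node
# back, then re-enable package switching globally and on that node.
failover_drill() {
    node="$1"
    pkg="$2"
    run cmhaltnode -v -f "$node"
    run cmviewcl
    run cmrunnode "$node"
    run cmmodpkg -e "$pkg"
    run cmmodpkg -n "$node" -e "$pkg"
}
```

Calling `failover_drill db1 oracle` lists the five steps for system 1. Setting `DRY_RUN=0` would run them for real, which should only be done in a test window.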
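When the following situation occurs, a manual setting is required...
Situation:
...
Manual setting:
xine1[#/etc/cmcluster/oracle]cmmodpkg -v -e oracle    # enable automatic failover (package switching) for package oracle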
Enabling switching for package oracle.
cmmodpkg : Successfully enabled package oracle.
cmmodpkg : Completed successfully on all packages specified.
xine1[#/etc/cmcluster/oracle]tail /var/adm/syslog/syslog.log    # check that the setting took effect
May 31 15:13:32 xine1 CM-CMD[8635]: cmmodpkg -v -e oracle
May 31 15:13:32 xine1 CM-CMD[8635]: Request from root on node xine1 to modify package switching
May 31 15:13:32 xine1 cmcld: Request from node xine1 to enable global switching for package oracle.
May 31 15:13:32 xine1 cmcld: Enabled switching for package oracle.
xine1[#/etc/cmcluster/oracle]cmviewcl -v    # output after the setting took effect
CLUSTER STATUS
cluster1 up
NODE STATUS STATE
xine1 up running
Network_Parameters:
INTERFACE STATUS PATH NAME
PRIMARY up 0/1/2/0 lan0
PRIMARY up 0/2/1/0/6/0 lan1
STANDBY up 0/3/1/0/6/0 lan2
PACKAGE STATUS STATE AUTO_RUN NODE
oracle starting starting enabled xine1
Policy_Parameters:
POLICY_NAME CONFIGURED_VALUE
Failover configured_node
Failback manual
Script_Parameters:
ITEM STATUS MAX_RESTARTS RESTARTS NAME
Subnet up 172.24.22.0
Node_Switching_Parameters:
NODE_TYPE STATUS SWITCHING NAME
Primary down xine2
Alternate up enabled xine1 (current)
NODE STATUS STATE
xine2 down halted
Network_Parameters:
INTERFACE STATUS PATH NAME
PRIMARY unknown 0/1/2/0 lan0
PRIMARY unknown 0/2/1/0/6/0 lan1
STANDBY unknown 0/3/1/0/6/0 lan2