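Reposted from: http://www.db2china.net//home/space.php?uid=28836&do=blog&id=31085
Issue 10 of the IBM Information Management monthly carries several HADR articles by well-known experts, and the team I just took over happens to run TSA + HADR. I had seen TSA (that is, SA MP) before but never worked with it. It ships with the V9.7 install image. Over about three days I tried CentOS 6.4 (64-bit), RHEL 6.3 (64-bit and 32-bit), and SUSE 11 patch level 1, and finally got TSA installed.
1. Installing TSA
Platforms supported by TSA:
SUSE SLES 10 (32-bit/64-bit), SUSE SLES 11 (32-bit/64-bit), Red Hat RHEL 5 (32-bit/64-bit), AIX 5.3 x, AIX 6.1 x, Solaris 10 (64-bit)
If TSA was not installed together with DB2 V9.7, install it first. The installer is under server/db2/(platform)/tsamp in the unpacked image; in my case that is server/db2/linuxamd64/tsamp. Check the prerequisites before installing: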
# ./prereqSAM
prereqSAM: All prerequisites for the ITSAMP installation are met on operating system:
SUSE Linux Enterprise Server 11 (x86_64)
VERSION = 11
PATCHLEVEL = 1
If the check passes, run the installer:
# ./installSAM
My environment
Operating system: SUSE Linux 11 patch level 1
DB2: DB2 UDB Enterprise Server Edition (ESE) Version 9.7, Fix Pack 8
TSA: TSA 3.1
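2. Configuring rsh
SUSE Linux does not ship the rsh-server service by default, so first download the rsh-server RPM from www.rpmfind.net and install it:
# rpm -ivh rsh-server-0.17-715.1.x86_64.rpm
I used two virtual machines running SUSE Linux. To make the simulation realistic, each machine has two network cards:
Machine 1: n4shost1: eth0: 192.168.18.101, eth1: 192.168.1.101
Machine 2: n4shost2: eth0: 192.168.18.102, eth1: 192.168.1.102
Configure /etc/hosts, /etc/hosts.equiv, and /root/.rhosts, and make sure these three files are identical on every machine in the cluster.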
Add the following to /etc/hosts.equiv:
# echo "+n4shost1 db2inst1" >> /etc/hosts.equiv
# echo "+n4shost2 db2inst1" >> /etc/hosts.equiv
/root/.rhosts
n4shost1 root
n4shost2 root
I am no rsh expert, so I also added the equivalent entries to /home/db2inst1/.rhosts:
n4shost1 db2inst1
n4shost2 db2inst1
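Since the same entries have to exist on both machines, it can help to generate them rather than type them twice. The following is only a minimal sketch: the helper name trust_entries and the scratch directory are my own assumptions, and the files are written to a temporary location so they can be reviewed before being copied to /root/.rhosts or /home/db2inst1/.rhosts.

```shell
#!/bin/sh
# Hypothetical helper: print one "host user" line per cluster host.
# Usage: trust_entries USER HOST...
trust_entries() {
    user="$1"; shift
    for host in "$@"; do
        printf '%s %s\n' "$host" "$user"
    done
}

# Write to a scratch directory first so the result can be reviewed
# before it is copied into place on each node.
out_dir="${TMPDIR:-/tmp}/rsh-trust.$$"
mkdir -p "$out_dir"
trust_entries root     n4shost1 n4shost2 > "$out_dir/rhosts.root"
trust_entries db2inst1 n4shost1 n4shost2 > "$out_dir/rhosts.db2inst1"

# .rhosts files that are writable by anyone but the owner are rejected.
chmod 600 "$out_dir"/rhosts.*

cat "$out_dir/rhosts.db2inst1"
```

Copy the generated files into place on both nodes, keeping the owner and the 600 permissions.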
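Edit /etc/securetty and add the three services rsh, rexec, and rlogin.
Edit the three service files rexec, rlogin, and rsh under /etc/xinetd.d/ and change "disable = yes" to "disable = no" (removing the line or commenting it out also works).
Restart the service:
# service xinetd restart
Test:
$ rsh n4shost1 ls
$ rsh n4shost2 ls
If the configuration is correct, each command lists the files in the corresponding $HOME.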
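The rsh test is worth repeating for every node and for both root and db2inst1, so a small loop saves typing. This is a sketch only; the function name check_rsh and the RSH variable are assumptions of mine, not part of any TSA or DB2 tooling.

```shell
#!/bin/sh
# RSH is an assumption of this sketch: it lets you substitute another
# remote shell command if rsh lives elsewhere on your system.
RSH="${RSH:-rsh}"

# check_rsh HOST...: run a harmless remote command on each host and
# report whether it succeeded.
check_rsh() {
    for host in "$@"; do
        # stdin is redirected so the remote shell cannot swallow input
        if "$RSH" "$host" ls </dev/null >/dev/null 2>&1; then
            echo "$host ok"
        else
            echo "$host FAILED"
        fi
    done
}

# Usage on the cluster nodes (run as root, then again as db2inst1):
#   check_rsh n4shost1 n4shost2
```

Any FAILED line means the trust files or the xinetd services on that host need another look before moving on.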
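Note: on RHEL, the Kerberos versions of the r-commands reportedly need to be renamed when configuring rsh; otherwise a conflict error seems to appear. I have not tried this myself (if it turns out to be wrong, blame my former colleague, not me).
# cd /usr/kerberos/bin
# mv rsh rsh.bak
# mv rcp rcp.bak
# mv rlogin rlogin.bak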
3. Configuring HADR
On both VM servers, append the following entry to /etc/services:
# echo "DB2_stab_1 60011/tcp" >> /etc/services
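Appending with echo adds a duplicate line every time the step is repeated. A slightly more careful variant checks for the service name first. This is a sketch under my own assumptions: add_service is a hypothetical helper, and the demo runs against a scratch file rather than the real /etc/services.

```shell
#!/bin/sh
# add_service FILE NAME PORT/PROTO
# Append "NAME PORT/PROTO" to FILE unless NAME is already defined there.
add_service() {
    file="$1"; name="$2"; port="$3"
    if grep -q "^${name}[[:space:]]" "$file" 2>/dev/null; then
        echo "$name already present in $file"
    else
        printf '%s\t%s\n' "$name" "$port" >> "$file"
        echo "added $name $port to $file"
    fi
}

# Demo against a scratch file; on the real hosts FILE would be
# /etc/services and the call would be made as root.
services_file="${TMPDIR:-/tmp}/services.demo.$$"
: > "$services_file"
add_service "$services_file" DB2_stab_1 60011/tcp
add_service "$services_file" DB2_stab_1 60011/tcp   # second call changes nothing
```

The service name and port are the ones used in this post; HADR_LOCAL_SVC and HADR_REMOTE_SVC refer to this name later.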
To verify HADR, I enabled reads on standby.
db2set DB2_HADR_ROS=ON
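db2set DB2_STANDBY_ISO=UR (if this variable is not set, SELECT statements on the standby must explicitly add WITH UR)
db2set DB2_HADR_PEER_WAIT_LIMIT=10 (consider setting the DB2_HADR_PEER_WAIT_LIMIT registry variable, which keeps logging on the primary database from being blocked by a slow or blocked standby database)
The primary and standby machines should ideally have the same configuration. If the standby is slower and the primary is very busy, log replay on the standby may not keep up with the primary.
On the primary server: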
$ db2 "create db stab using codeset utf-8 territory us"
$ db2 "update db cfg for stab using autorestart off LOGARCHMETH1 DISK:/db2/log_archive/stab NEWLOGPATH /db2/log_online/stab trackmod on logindexbuild on indexrec restart HADR_LOCAL_HOST 192.168.18.101 HADR_LOCAL_SVC DB2_stab_1 HADR_REMOTE_HOST 192.168.18.102 HADR_REMOTE_SVC DB2_stab_1 HADR_REMOTE_INST db2inst1 HADR_TIMEOUT 60 HADR_SYNCMODE SYNC HADR_PEER_WINDOW 180"
$ db2 "backup db stab"
After the backup completes, copy the image to the standby server.
On the standby server:
$ db2 "restore db stab replace history file without prompting"
$ db2 "update db cfg for stab using HADR_LOCAL_HOST 192.168.18.102 HADR_REMOTE_HOST 192.168.18.101"
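The restored database carries the primary's settings, so the only HADR parameters that must change on the standby are HADR_LOCAL_HOST and HADR_REMOTE_HOST, which are simply swapped. The sketch below prints the command for each side so the swap is hard to get wrong; hadr_cfg_cmd is a hypothetical helper of mine, and it only prints the command, it does not run db2.

```shell
#!/bin/sh
# hadr_cfg_cmd DB LOCAL_IP REMOTE_IP
# Print the db2 command that sets the host-specific HADR parameters.
hadr_cfg_cmd() {
    db="$1"; local_ip="$2"; remote_ip="$3"
    printf 'db2 "update db cfg for %s using HADR_LOCAL_HOST %s HADR_REMOTE_HOST %s"\n' \
        "$db" "$local_ip" "$remote_ip"
}

# Addresses of the eth0 interfaces used in this post.
primary_ip=192.168.18.101
standby_ip=192.168.18.102

echo "# run on the primary:"
hadr_cfg_cmd stab "$primary_ip" "$standby_ip"
echo "# run on the standby:"
hadr_cfg_cmd stab "$standby_ip" "$primary_ip"
```

Review the printed commands, then paste each one into a db2inst1 session on the matching server.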
Start the standby:
$ db2 "start hadr on db stab as standby"
Start the primary:
$ db2 "start hadr on db stab as primary"
Check the HADR status:
$ db2pd -d stab -hadr
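4. Configuring TSA
HADR is useful, but it cannot monitor failures on the primary database server, such as network problems, and a takeover has to be run manually by the database administrator. This is where Tivoli SA MP comes in: it monitors the state of the primary database server and switches roles automatically. Together, HADR and Tivoli SA MP form a high-availability system with automatic failover.
4.1 Initializing the RSCT cluster nodes
Run the following as root on n4shost1 and n4shost2.
4.2 Clearing the cluster configuration
Run the following as db2inst1 on n4shost1 and n4shost2.
4.3 HADR standby configuration
Run the following as db2inst1 on n4shost2.
$ db2haicu
Enter 1 and press Enter, then enter the domain name hadr_domain: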
Create a domain and continue? [1]
1. Yes
2. No
1
Create a unique name for the new domain:
hadr_domain
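At the prompt for the number of nodes in the domain, enter 2 and press Enter. At the host name prompts, enter n4shost1 and n4shost2 in turn. At the prompt to create the domain, press Enter to create it: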
How many cluster nodes will the domain hadr_domain contain?
2
Enter the host name of a machine to add to the domain :
n4shost1
Enter the host name of a machine to add to the domain :
n4shost2
db2haicu can now create a new domain containing the 2 machines that you specified. If you choose not to create a domain now, db2haicu will exit.
Create the domain now? [1]
1. Yes
2. No
1
Creating domain hadr_domain in the cluster …
Creating domain hadr_domain in the cluster was successful
At the quorum device prompt, press Enter twice. When asked for the quorum address, enter 192.168.1.1 and press Enter:
Configure a quorum device for the domain called hadr_domain? [1]
1. Yes
2. No
The following is a list of supported quorum device types:
1. Network Quorum
Enter the number corresponding to the quorum device type to be used: [1]
Specify the network address of the quorum device:
192.168.1.1
Configuring quorum device for domain hadr_domain…
Configuring quorum device for domain hadr_domain was successful.
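At the network configuration prompt, press Enter to configure the networks:
Create networks for these network interface cards? [1]
1. Yes
2. No
Since eth0 is reserved for the private network and eth1 for the public network, answer the per-interface prompts as follows: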
Enter the name of the network for the network interface card: eth0 on cluster node: n4shost1.sapdemo.com
1. Create a new public network for this network interface card.
2. Create a new private network for this network interface card.
Enter selection:
2
Are you sure you want to add the network interface card eth0 on cluster node n4shost1.sapdemo.com to the network db2_private_network_0? [1]
1. Yes
2. No
1
Adding network interface card eth0 on cluster node n4shost1.sapdemo.com to the network db2_private_network_0 ...
Adding network interface card eth0 on cluster node n4shost1.sapdemo.com to the network db2_private_network_0 was successful.
Enter the name of the network for the network interface card: eth0 on cluster node: n4shost2.sapdemo.com
1. db2_private_network_0
2. Create a new public network for this network interface card.
3. Create a new private network for this network interface card.
Enter selection:
1
Are you sure you want to add the network interface card eth0 on cluster node n4shost2.sapdemo.com to the network db2_private_network_0? [1]
1. Yes
2. No
1
Adding network interface card eth0 on cluster node n4shost2.sapdemo.com to the network db2_private_network_0 ...
Adding network interface card eth0 on cluster node n4shost2.sapdemo.com to the network db2_private_network_0 was successful.
Enter the name of the network for the network interface card: eth1 on cluster node: n4shost1.sapdemo.com
1. db2_private_network_0
2. Create a new public network for this network interface card.
3. Create a new private network for this network interface card.
Enter selection:
2
Are you sure you want to add the network interface card eth1 on cluster node n4shost1.sapdemo.com to the network db2_public_network_0? [1]
1. Yes
2. No
1
Adding network interface card eth1 on cluster node n4shost1.sapdemo.com to the network db2_public_network_0 ...
Adding network interface card eth1 on cluster node n4shost1.sapdemo.com to the network db2_public_network_0 was successful.
Enter the name of the network for the network interface card: eth1 on cluster node: n4shost2.sapdemo.com
1. db2_public_network_0
2. db2_private_network_0
3. Create a new public network for this network interface card.
4. Create a new private network for this network interface card.
Enter selection:
1
Are you sure you want to add the network interface card eth1 on cluster node n4shost2.sapdemo.com to the network db2_public_network_0? [1]
1. Yes
2. No
1
Adding network interface card eth1 on cluster node n4shost2.sapdemo.com to the network db2_public_network_0 ...
Adding network interface card eth1 on cluster node n4shost2.sapdemo.com to the network db2_public_network_0 was successful.
Retrieving high availability configuration parameter for instance db2inst1 ...
The cluster manager name configuration parameter (high availability configuration parameter) is not set. For more information, see the topic "cluster_mgr - Cluster manager name configuration parameter" in the DB2 Information Center. Do you want to set the high availability configuration parameter?
The instance-level TSA prompt appears; the answer sets the instance's cluster manager parameter to TSA:
The following are valid settings for the high availability configuration parameter:
1.TSA
2.Vendor
Enter a value for the high availability configuration parameter: [1]
1
Setting a high availability configuration parameter for instance db2inst1 to TSA.
Adding DB2 database partition 0 to the cluster ...
Adding DB2 database partition 0 to the cluster was successful.
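At the prompt to validate the HADR configuration of database STAB, press Enter, then enter n4shost1 and n4shost2 as the node names on the private network: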
Do you want to validate and automate HADR failover for the HADR database STAB? [1]
1. Yes
2. No
1
Adding HADR database STAB to the domain ...
The cluster node 192.168.18.101 was not found in the domain. Please re-enter the host name.
n4shost1
The cluster node 192.168.18.102 was not found in the domain. Please re-enter the host name.
n4shost2
Adding HADR database STAB to the domain ...
The HADR database STAB has been determined to be valid for high availability. However, the database cannot be added to the cluster from this node because db2haicu detected this node is the standby for the HADR database STAB. Run db2haicu on the primary for the HADR database STAB to configure the database for automated failover.
All cluster configurations have been completed successfully. db2haicu exiting ..
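4.4 HADR primary configuration
Run the following as db2inst1 on n4shost1.
$ db2haicu
The instance-level TSA prompt appears; press Enter to set the instance's cluster manager parameter to TSA: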
Retrieving high availability configuration parameter for instance db2inst1 ...
The cluster manager name configuration parameter (high availability configuration parameter) is not set. For more information, see the topic "cluster_mgr - Cluster manager name configuration parameter" in the DB2 Information Center. Do you want to set the high availability configuration parameter?
The following are valid settings for the high availability configuration parameter:
1.TSA
2.Vendor
Enter a value for the high availability configuration parameter: [1]
1
Setting a high availability configuration parameter for instance db2inst1 to TSA.
Adding DB2 database partition 0 to the cluster ...
Adding DB2 database partition 0 to the cluster was successful.
At the prompt to validate the HADR configuration of database STAB, press Enter, then enter n4shost2 and n4shost1 as the node names:
Do you want to validate and automate HADR failover for the HADR database STAB? [1]
1. Yes
2. No
Adding HADR database STAB to the domain ...
The cluster node 192.168.18.102 was not found in the domain. Please re-enter the host name.
n4shost2
The cluster node 192.168.18.101 was not found in the domain. Please re-enter the host name.
n4shost1
Adding HADR database STAB to the domain ...
Adding HADR database STAB to the domain was successful.
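At the virtual IP prompt for database STAB, enter 1, enter 192.168.1.188 as the virtual IP, enter its subnet mask and press Enter, then enter 1 to add the virtual IP to the public network: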
Do you want to configure a virtual IP address for the HADR database STAB? [1]
1. Yes
2. No
1
Enter the virtual IP address:
192.168.1.188
Enter the subnet mask for the virtual IP address 192.168.1.188: [255.255.255.0]
255.255.255.0
Select the network for the virtual IP 192.168.1.188:
1. db2_public_network_0
2. db2_private_network_0
Enter selection:
1
Adding virtual IP address 192.168.1.188 to the domain ...
Adding virtual IP address 192.168.1.188 to the domain was successful.
All cluster configurations have been completed successfully. db2haicu exiting ...
4.5 Verifying TSA
Run the following as db2inst1 on n4shost1 and n4shost2.
$ lssam
Online IBM.ResourceGroup:db2_db2inst1_db2inst1_STAB-rg Nominal=Online
        |- Online IBM.Application:db2_db2inst1_db2inst1_STAB-rs
                |- Online IBM.Application:db2_db2inst1_db2inst1_STAB-rs:n4shost1
                '- Offline IBM.Application:db2_db2inst1_db2inst1_STAB-rs:n4shost2
        '- Online IBM.ServiceIP:db2ip_192_168_1_188-rs
                |- Online IBM.ServiceIP:db2ip_192_168_1_188-rs:n4shost1
                '- Offline IBM.ServiceIP:db2ip_192_168_1_188-rs:n4shost2
Online IBM.ResourceGroup:db2_db2inst1_n4shost1_0-rg Nominal=Online
        '- Online IBM.Application:db2_db2inst1_n4shost1_0-rs
                '- Online IBM.Application:db2_db2inst1_n4shost1_0-rs:n4shost1
Online IBM.ResourceGroup:db2_db2inst1_n4shost2_0-rg Nominal=Online
        '- Online IBM.Application:db2_db2inst1_n4shost2_0-rs
                '- Online IBM.Application:db2_db2inst1_n4shost2_0-rs:n4shost2
Online IBM.Equivalency:db2_db2inst1_db2inst1_STAB-rg_group-equ
        |- Online IBM.PeerNode:n4shost1:n4shost1
        '- Online IBM.PeerNode:n4shost2:n4shost2
Online IBM.Equivalency:db2_db2inst1_n4shost1_0-rg_group-equ
        '- Online IBM.PeerNode:n4shost1:n4shost1
Online IBM.Equivalency:db2_db2inst1_n4shost2_0-rg_group-equ
        '- Online IBM.PeerNode:n4shost2:n4shost2
Online IBM.Equivalency:db2_private_network_0
        |- Online IBM.NetworkInterface:eth0:n4shost1
        '- Online IBM.NetworkInterface:eth0:n4shost2
Online IBM.Equivalency:db2_public_network_0
        |- Online IBM.NetworkInterface:eth1:n4shost1
        '- Online IBM.NetworkInterface:eth1:n4shost2
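In this output the HADR resource and the virtual IP are both Online on n4shost1 and Offline on n4shost2, which is what you expect while n4shost1 is the primary. To pull that fact out without reading the whole tree, something like the following can help. It is a rough sketch: online_node is a hypothetical helper of mine, the sample lines are copied from the output above, and real lssam output may need extra handling that I have not tested.

```shell
#!/bin/sh
# online_node PATTERN
# Read lssam-style text on stdin and print the node on which a member
# resource whose name matches PATTERN is reported Online.
online_node() {
    awk -v pat="$1" '
        { sub(/[ \t]+$/, "") }              # drop trailing whitespace
        # Member lines have the form "Online IBM.<Class>:<resource>:<node>";
        # group lines lack the trailing ":<node>" and are skipped.
        $0 ~ pat && $0 ~ /Online IBM\.[A-Za-z]+:[^:]+:[^: ]+$/ {
            n = split($0, parts, ":")
            print parts[n]
        }'
}

# Sample taken from the lssam output shown in this post.
sample_lssam() {
cat <<'EOF'
|- Online IBM.Application:db2_db2inst1_db2inst1_STAB-rs
|- Online IBM.Application:db2_db2inst1_db2inst1_STAB-rs:n4shost1
'- Offline IBM.Application:db2_db2inst1_db2inst1_STAB-rs:n4shost2
'- Online IBM.ServiceIP:db2ip_192_168_1_188-rs
|- Online IBM.ServiceIP:db2ip_192_168_1_188-rs:n4shost1
'- Offline IBM.ServiceIP:db2ip_192_168_1_188-rs:n4shost2
EOF
}

sample_lssam | online_node 'STAB-rs'   # node hosting the HADR primary resource
sample_lssam | online_node 'db2ip_'    # node holding the virtual IP
```

On a live cluster the input would come from lssam itself, for example lssam | online_node 'STAB-rs'. After a failover both calls should report the other node.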
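4.6 Simulating network failures and crashes
If the TSA verification succeeds and the virtual IP works, it is time to simulate failures on the primary: unplug the network cable, run db2_kill, kill -9 the DB2 processes, reboot, cut the power, whatever you like.
5. Problems encountered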
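“2632-044 The domain cannot be created due to the following errors that were detected while harvesting information from the target nodes:
node1: 2632-068 This node has the same internal identifier as node2 and cannot be included in the domain definition.”
I was lazy and cloned the SUSE Linux virtual machine, which caused this error. The fix I found on developerWorks: on the node named in the error message, run /usr/sbin/rsct/install/bin/recfgct as root to reset the node ID, then continue the setup starting from the preprpnode command. You may also get the following error:
“2632-044 The domain cannot be created due to the following errors that were detected while harvesting information from the target nodes:
node1: 2610-418 Permission is denied to access the resources or resource class specified in this command.”
I found the developerWorks advice on this one not quite accurate; I fixed it by setting up /home/db2inst1/.rhosts.
With DB2 V10.1 the configuration worked the first time, but later attempts failed no matter what I did; I will retry when I have time. A friend got it working on RHEL 5.8. On RHEL 6.3 the TSA prerequisite check would not pass, so I gave up there. TSA 3.1 certainly supports RHEL 6, though: my production environment runs RHEL 6 and was installed by people from the IBM lab. I do not know how they got past the installation check. The install script does appear to have a way to skip the OS check, but my own attempt at it did not succeed. Thanks to Xiao Tian, my former colleague at 品恩 and a real Linux expert; his write-up on configuring HADR with TSA on RHEL 5.8 is far more detailed than this one.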
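References: Implementing DB2 high availability disaster recovery in a Tivoli System Automation cluster domain, a step-by-step walkthrough (Chinese edition) http://www.ibm.com/developerworks/cn/data/library/techarticles/dm-0704sundaram/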
Implement DB2 high availability disaster recovery in a Tivoli System Automation cluster domain
http://www.dbatodba.com/db2/how-to-do/hadr-and-tsa
DB2 HADR setup with TSA using db2haicu and Virtual IP.
http://www-01.ibm.com/support/docview.wss?uid=swg21439218
High Availability and Disaster Recovery Options for DB2 on Linux, UNIX, and Windows (Chap 8: DB2 with TSA)
https://www.e-techservices.com/redbooks/HA+DRforDB2.pdf
Postscript: I eventually got it working on RHEL 6.3 with TSA 3.22 installed; the installation package was not fully verified.