Lately, work has kept me buried in pre-go-live audits and SATs (site acceptance tests) for every system, plus failover drills of all kinds (power, network, cluster, storage, and so on).
Testing of the cluster package therefore slipped by two days; tonight I finally found time to finish configuring and testing it. Updates below:
- Started testing the package
- Add a new shared VG, vgdb1:
Add a LUN on the NetApp:
NetApp# lun show
/vol/vol1/iscsi1/lun0 10g (10737418240) (r/w, online, unmap)
NetApp# lun map /vol/vol1/iscsi1/lun0 ig0
lun map: auto-assigned ig0=1
db1# ioscan -H 255
db1# insf -H 255
db1# ioscan -funC disk
The new disk shows up:
disk 16 255/0/5.0.0.1 sdisk CLAIMED DEVICE NETAPP LUN
/dev/dsk/c8t0d1 /dev/rdsk/c8t0d1
# mkdir /dev/vgdb1
# mknod /dev/vgdb1/group c 64 0x020000
# vgcreate /dev/vgdb1 /dev/dsk/c8t0d1
Increased the number of physical extents per physical volume to 2559.
Volume group "/dev/vgdb1" has been successfully created.
Volume Group configuration for /dev/vgdb1 has been saved in /etc/lvmconf/vgdb1.conf
# lvcreate -n lv_db1 /dev/vgdb1
Logical volume "/dev/vgdb1/lv_db1" has been successfully created with
character device "/dev/vgdb1/rlv_db1".
Volume Group configuration for /dev/vgdb1 has been saved in /etc/lvmconf/vgdb1.conf
# lvextend -l 2559 /dev/vgdb1/lv_db1
# newfs -F vxfs /dev/vgdb1/rlv_db1
# mkdir /db1
# mount /dev/vgdb1/lv_db1 /db1
# umount /db1
# vgchange -a n /dev/vgdb1
Do the same sort of steps on the second node (scan for the devices only; do not create the VG), as sketched below.
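A minimal sketch of that scan on db2 (same commands as on db1; the resulting cXtYdZ device names may differ):

db2# ioscan -H 255
db2# insf -H 255
db2# ioscan -funC disk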
Export the VG information for the second node:
# vgexport -v -s -p -m /tmp/vgdb1.map vgdb1
# vgexport -v -s -p -f /tmp/vgdb1pv vgdb1
Beginning the export process on Volume Group "vgdb1".
/dev/dsk/c8t0d1
# rcp /tmp/vgdb1.map /tmp/vgdb1pv db2:/tmp
On the second node:
# mkdir /dev/vgdb1
# mknod /dev/vgdb1/group c 64 0x020000
# vgimport -v -f /tmp/vgdb1pv -m /tmp/vgdb1.map vgdb1
Beginning the import process on Volume Group "vgdb1".
vgimport: Warning: Volume Group belongs to different CPU ID.
Can not determine if Volume Group is in use on another system. Continuing.
Logical volume "/dev/vgdb1/lv_db1" has been successfully created
with lv number 1.
Volume group "/dev/vgdb1" has been successfully created.
Warning: A backup of this volume group may not exist on this machine.
Please remember to take a backup using the vgcfgbackup command after activating the volume group.
Activated the VG and checked its attributes: OK (sketch below).
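A short sketch of that check on db2, plain LVM commands, including the vgcfgbackup that the import warning asks for; the VG is deactivated again afterwards so the cluster can manage it:

db2# vgchange -a y /dev/vgdb1
db2# vgcfgbackup /dev/vgdb1
db2# vgdisplay -v /dev/vgdb1
db2# vgchange -a n /dev/vgdb1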
- Modify the cluster configuration file:
- Add to mcdb.conf:
USER_NAME oracle
USER_HOST CLUSTER_MEMBER_NODE
USER_ROLE MONITOR
VOLUME_GROUP /dev/vgdb1
# cmcheckconf -v -C /etc/cmcluster/mcdb.conf
# cmapplyconf -v -C /etc/cmcluster/mcdb.conf
- Install Oracle
- Per the Oracle docs. The current environment already has a database, which is to be migrated into the cluster.
Parameters that need changing:
controlfiles: /u01/oradata/ora10/control01.ctl, control02.ctl
db_recovery_file_dest: /u01/flash_recovery_area
spfile: /opt/oracle/products/ora10/dbs/spfileora10.ora
Then migrate the database files wholesale with RMAN (sketch below).
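A hedged sketch of the RMAN move under the layout above; this is the 10g image-copy-and-switch technique, my assumption rather than a command log from the original notes, and the online redo logs still need renaming separately:

$ rman target /
RMAN> startup mount
RMAN> backup as copy database format '/db1/oradata/ora10/%U';
RMAN> switch database to copy;
RMAN> alter database open;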
Put the password file and the parameter file on shared storage: recreate the password file; for the parameter file, point a local pfile at the shared spfile (see below). Migrate the flash_recovery_area too.
- Grow /opt on node2 to 20G:
/opt (lvdisplay excerpt):
LV Name /dev/vg00/lvol6
LV Status available/syncd
LV Size (Mbytes) 20480
Current LE 640
Allocated PE 640
Used PV 1
node2# lvextend -L 20480 /dev/vg00/lvol6
# extendfs -F vxfs /dev/vg00/lvol6   ---> error:
vxfs extendfs: /dev/vg00/lvol6 is mounted, cannot extend.
# fsadm -F vxfs -b 20480M /opt
(fsadm can grow a mounted VxFS online, which is why it succeeds where extendfs failed)
- Set up the database by hand:
Only the first node has the database software installed, and a database was created there locally; the second node has neither the software nor a database. Can I cut a corner, copy the software straight to the second machine, and then move the database onto shared storage? Yes:
First copy the Oracle directory trees over wholesale and see whether they work: /var/, /etc/, /opt/ (sketch below).
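A hedged sketch of the copy, run as root on db1; the exact file list depends on the install, and these paths (/opt/oracle, /var/opt/oracle, /etc/oratab) are assumptions based on a typical HP-UX layout:

# rcp -r /opt/oracle db2:/opt/
# rcp -r /var/opt/oracle db2:/var/opt/
# rcp /etc/oratab db2:/etc/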
SQL> alter system set db_recovery_file_dest='/db1/flash_recovery_area';
SQL> alter system set control_files='/db1/oradata/ora10/control01.ctl,/db1/oradata/ora10/control02.ctl' scope=spfile;
Copy the spfile to the shared directory, then create a local pfile that points at it: spfile=/db1/dbs/spfileora10.ora
The control_files setting above was wrong (both paths inside one quoted string); corrected:
alter system set control_files='/db1/oradata/ora10/control01.ctl','/db1/oradata/ora10/control02.ctl' scope=spfile;
Rename all datafiles and redo log files to the new location (sketch below).
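A sketch of the rename step, with hypothetical file names; the real list comes from v$datafile and v$logfile, and the database must be mounted, not open:

SQL> alter database rename file '/u01/oradata/ora10/system01.dbf' to '/db1/oradata/ora10/system01.dbf';
SQL> alter database rename file '/u01/oradata/ora10/redo01.log' to '/db1/oradata/ora10/redo01.log';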
SQL> alter database open;
Copy the initora10 file to node2.
Mount the shared storage on node2 by hand.
The password file can also go onto shared storage (sketch below).
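A hedged sketch of recreating the password file with orapwd; the shared path follows the spfile location above, and the password is a placeholder:

$ orapwd file=/db1/dbs/orapwora10 password=change_me entries=5
$ ln -s /db1/dbs/orapwora10 ${ORACLE_HOME}/dbs/orapwora10    # on both nodes, so each instance finds it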
OK! The DB now lives on shared storage.
- Create the cluster package:
- cd /etc/cmcluster
- mkdir db1
- cd db1
# cmmakepkg -p db1.conf
Package template is created.
This file must be edited before it can be used.
# cmmakepkg -s db1.cntl
Package control script is created.
This file must be edited before it can be used.
Edit the two files.
db1.cntl (defaults are not repeated; only the changed parts are listed):
VG[0]=vgdb1
LV[0]="/dev/vgdb1/lv_db1"; FS[0]="/db1"; FS_MOUNT_OPT[0]=""
IP[0]="10.68.14.225"
SUBNET[0]="10.68.14.0"
SERVICE_NAME[0]="DB_RESOURCE"
SERVICE_CMD[0]="/etc/cmcluster/db1/oracle.sh monitor"
SERVICE_RESTART[0]="-r 2"
#SERVICE_NAME[1]="LSNR_RESOURCE"
#SERVICE_CMD[1]="/etc/cmcluster/db1/oracle.sh listener_monitor"
#SERVICE_RESTART[1]="-r 2"
function customer_defined_run_cmds
{
/etc/cmcluster/db1/oracle.sh startup
test_return 51
}
function customer_defined_halt_cmds
{
/etc/cmcluster/db1/oracle.sh shutdown
test_return 52
}
db1.conf:
PACKAGE_NAME db1
NODE_NAME db1
NODE_NAME db2
RUN_SCRIPT /etc/cmcluster/db1/db1.cntl
RUN_SCRIPT_TIMEOUT NO_TIMEOUT
HALT_SCRIPT /etc/cmcluster/db1/db1.cntl
HALT_SCRIPT_TIMEOUT NO_TIMEOUT
SERVICE_NAME DB_RESOURCE
SERVICE_FAIL_FAST_ENABLED NO
SERVICE_HALT_TIMEOUT 30
#SERVICE_NAME LSNR_RESOURCE
#SERVICE_FAIL_FAST_ENABLED NO
#SERVICE_HALT_TIMEOUT 30
SUBNET 10.68.14.0
Besides that, I wrote a standalone oracle.sh:
#!/usr/bin/sh
ORACLE_HOME=/opt/oracle/products/ora10
SID_NAME=ora10
export ORACLE_HOME
export ORACLE_SID=${SID_NAME}
#monitor interval
MONITOR_INTERVAL=10
#monitor process
set -A DBA_MONITOR_PROCESSES ora_smon_${SID_NAME} ora_pmon_${SID_NAME} ora_lgwr_${SID_NAME} ora_dbw0_${SID_NAME} ora_ckpt_${SID_NAME} LISTENER
function db_shutdown
{
print "Begin listener shutdown at `date`!"
su - oracle -c ${ORACLE_HOME}/bin/lsnrctl <<EOF
stop
exit
print "End listener shutdown at `date`!"
EOF
print "begin instance shutdown at `date`"
su - oracle -c ${ORACLE_HOME}/bin/sqlplus /nolog <<EOF
connect / as sysdba
shutdown immediate
EOF
print "end instance shutdown at `date`"
}
function db_startup
{
print "Begin instance startup at `date`!"
su - oracle -c "${ORACLE_HOME}/bin/sqlplus /nolog" <<EOF
connect / as sysdba
startup
EOF
print "Begin listener startup at `date`!"
su - oracle -c "${ORACLE_HOME}/bin/lsnrctl" <<EOF
start
exit
EOF
print "End oracle startup at `date`."
}
function db_monitor
{
sleep ${MONITOR_INTERVAL}
typeset -i n=0
for i in ${DBA_MONITOR_PROCESSES[@]}
do
DBA_MONITOR_PROCESSES_PID[$n]=`ps -fu oracle | awk '/'${i}'/ { print $2 }'`
if [[ ${DBA_MONITOR_PROCESSES_PID[$n]} = "" ]]
then
print "/n"
print "/n *** ${i} has failed at startup time, ABORTING Oracle! ***"
exit
fi
(( n = n + 1 ))
done
sleep ${MONITOR_INTERVAL}
set -A MONITOR_PROCESSES_PID ${DBA_MONITOR_PROCESSES_PID[@]}
while true
do
for i in ${MONITOR_PROCESSES_PID[@]}
do
kill -s 0 ${i} > /dev/null 2>&1
if [[ $? != 0 ]]
then
print "\n"
print "\n *** pid ${i} has died at run time, ABORTING Oracle! ***"
exit
fi
done
#check_ext_proc_lsnr
sleep ${MONITOR_INTERVAL}
done
}
if [[ $# != 1 ]]; then
print "/n *** ${0} called with an incorrect number of arguments ***/n"
print "Usage: ${0} [ shutdown | startup | monitor ]"
print "$#: $@"
exit
fi
print "/n *** $0 called with $1 argument! ***/n"
case $1 in
startup)
db_startup
;;
shutdown)
db_shutdown
;;
monitor)
db_monitor
;;
*)
print "Usage: ${0} [ shutdown | startup | monitor ]"
;;
esac
rcp the three files (db1.conf, db1.cntl, oracle.sh) to node2.
cmcheckconf -v -C /etc/cmcluster/mcdb.conf -P /etc/cmcluster/db1/db1.conf
cmapplyconf -v -C /etc/cmcluster/mcdb.conf -P /etc/cmcluster/db1/db1.conf
When applying, only the cluster/package configuration itself is distributed automatically; user-defined files, such as scripts and the .cntl control script, are not, and have to be distributed by hand (sketch below).
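A sketch of the manual distribution, paths as configured above:

# remsh db2 mkdir -p /etc/cmcluster/db1
# rcp /etc/cmcluster/db1/db1.cntl /etc/cmcluster/db1/oracle.sh db2:/etc/cmcluster/db1/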
The auto-switch issue:
Ran cmmodpkg against both nodes to fix package auto-switching (sketch below).
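A hedged sketch of the cmmodpkg calls; -e enables switching, the package and node names are from this setup, and the flags follow the ServiceGuard manpage:

# cmmodpkg -e db1                  # enable package switching (AUTO_RUN)
# cmmodpkg -e -n db1 -n db2 db1    # enable node switching on both nodes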
My script was pretty rough: part cribbed from references, part improvised. At first Oracle would not start at all; once it finally did, the package halted the moment the start script finished. The cause: the monitor service was an empty function, so the service exited immediately and took the package down with it.
I then adapted a monitor routine, in essence an infinite loop that polls the process status at a fixed interval.
Later, after I commented the service out, it could surprisingly still come up; the reason turned out to be that the service had not been removed from the conf file:
# cmviewcl -v
CLUSTER STATUS
mcdb up
NODE STATUS STATE
db1 up running
Network_Parameters:
INTERFACE STATUS PATH NAME
PRIMARY up 0/1/2/0 lan0
PRIMARY up 0/1/2/1 lan1
STANDBY up 0/3/1/0 lan2
NODE STATUS STATE
db2 up running
Network_Parameters:
INTERFACE STATUS PATH NAME
PRIMARY up 0/1/2/0 lan0
PRIMARY up 0/1/2/1 lan1
STANDBY up 0/3/1/0 lan2
PACKAGE STATUS STATE AUTO_RUN NODE
db1 up running enabled db2
Policy_Parameters:
POLICY_NAME CONFIGURED_VALUE
Failover configured_node
Failback manual
Script_Parameters:
ITEM STATUS MAX_RESTARTS RESTARTS NAME
Service uninitia 0 0 DB_RESOURCE
Service uninitia 0 0 LSNR_RESOURCE
Subnet up 10.68.14.0
Node_Switching_Parameters:
NODE_TYPE STATUS SWITCHING NAME
Primary up enabled db1
Alternate up enabled db2 (current)
Why the listener failed to start:
1. listener.ora was copied from node 2 to node 1, and the IP inside it was never changed.
2. On both nodes the listener IP should be the VIP; /etc/hosts and ... (see the sketch below).
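A sketch of the matching listener.ora address, assuming the package IP from db1.cntl; the port is an assumption:

LISTENER =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = 10.68.14.225)(PORT = 1521))
  )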
Debug the script by hand first:
./oracle.sh startup
./oracle.sh monitor
./oracle.sh shutdown
A few issues remain:
1) After shutting the DB down by hand, the package status still shows up.
2) A debug or maint mode is needed; see the sketch below.
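For issue 2, one common pattern (my sketch, not part of the original setup): let db_monitor idle while a flag file exists, so the DB can be stopped by hand without tripping a failover:

# inside db_monitor's while-true loop, before the process checks:
while [[ -f /etc/cmcluster/db1/db1.debug ]]
do
    # maintenance mode: hold off monitoring until the flag file is removed
    sleep ${MONITOR_INTERVAL}
done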