My test environment:
Host system: Windows XP on a ThinkPad T400
Virtualization software: VMware GSX 3.2.1
Guest system: as5u1
Oracle DB: Oracle 10.2.1
As for the cause of the accident: two days ago a colleague, while plugging in a new power supply, unexpectedly cut the power to my test RAC environment. The virtual disk holding the voting disk (votingdisk) was damaged and the cluster could not come up. What to do? My thinking was that, until the very last moment, I was determined not to reinstall; the recovery was also a good chance to get familiar with the process.
Because of the votingdisk failure the guest system could not start either, complaining that the votingdisk disk was not recognized. To let the system boot I removed and re-added the votingdisk disk in VMware, and only then would the system restart. After it came up, node rac1 ran into a GNOME bug, but fortunately that did not affect normal RAC operation.
Because my votedisk is damaged and there is no backup, the only option is to rebuild it.
A damaged votedisk can be recovered in only two ways:
1. If the votedisk was backed up, restore it from the backup, e.g.:
Back up the votedisk: dd if=/dev/raw/raw2 of=/tmp/votedisk.bak
Restore the votedisk: dd if=/tmp/votedisk.bak of=/dev/raw/raw2
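If you are not sure which raw device actually holds the votedisk (or the OCR) before backing it up, two quick checks on a healthy 10.2 cluster are crsctl and ocrcheck from $CRS_HOME/bin (my addition, not part of the original procedure):
[oracle@rac1 ~]$ crsctl query css votedisk
[root@rac1 ~]# ocrcheck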
2. If there is no votedisk backup, the only option is to reinstall Clusterware (the votedisk information is written when Clusterware is installed and when nodes are added or deleted, and it is maintained by the ocssd process).
Given the failure described above, the only way to restore the cluster is to reinstall Clusterware and re-initialize the votedisk.
This is also the fastest way.
Steps:
1. Stop the Clusterware stack on every node
[root@rac2 ~]# su - oracle
[oracle@rac2 ~]$ su
Password:
[root@rac2 oracle]# crsctl stop crs
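Before going further it is worth confirming on each node that the stack really is down; crsctl can report the daemon state (an extra check, not part of the original steps). Once CRS is stopped, none of the CSS/CRS/EVM daemons should report as healthy:
[root@rac2 oracle]# crsctl check crs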
2. Back up the Clusterware home on each node
[root@rac1 oracle]# mv /u01/crs /u01/crsbak
3. To avoid conflicts between the new configuration and the old one, it is best to clear out the old Clusterware configuration completely.
Clean up the old configuration information (run this on every node).
3.1 Remove the CRS autostart entries
Edit /etc/inittab and remove the following three lines:
h1:2:respawn:/etc/init.d/init.evmd run >/dev/null 2>&1 </dev/null
h2:2:respawn:/etc/init.d/init.cssd fatal >/dev/null 2>&1 </dev/null
h3:2:respawn:/etc/init.d/init.crsd run >/dev/null 2>&1 </dev/null
On Linux:
rm /etc/oracle/*
rm -f /etc/init.d/init.cssd
rm -f /etc/init.d/init.crs
rm -f /etc/init.d/init.crsd
rm -f /etc/init.d/init.evmd
rm -f /etc/rc2.d/K96init.crs
rm -f /etc/rc2.d/S96init.crs
rm -f /etc/rc3.d/K96init.crs
rm -f /etc/rc3.d/S96init.crs
rm -f /etc/rc5.d/K96init.crs
rm -f /etc/rc5.d/S96init.crs
rm -Rf /etc/oracle/scls_scr
rm -f /etc/inittab.crs
cp /etc/inittab.orig /etc/inittab
ps -ef | grep init.d
If any of the following are still running, kill them:
/etc/init.d/init.crsd
/etc/init.d/init.evmd
/etc/init.d/init.cssd
3.2 Clean up the old configuration information
rm -rf /etc/oracle/*    (this removes ocr.loc)
rm -rf /var/tmp/.oracle    (or /tmp/.oracle, whichever exists)
3.3 Use dd to clear the voting disk and the OCR (raw devices)
dd if=/dev/zero of=/dev/votedisk_device bs=8192 count=2560
dd if=/dev/zero of=/dev/ocr_device bs=8192 count=12800
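If you are not sure which raw devices those placeholder names stand for, listing the current raw bindings first is a sensible precaution (my addition, not in the original write-up):
[root@rac1 ~]# raw -qa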
Reference:
How to clear 10g RAC CRS
http://6month.itpub.net/post/37672/470422
4. Run $CRS_HOME/install/rootdelete.sh on each node as root (an example invocation is shown below).
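For example, assuming $CRS_HOME points at a Clusterware home that still contains the install scripts:
[root@rac1 ~]# $CRS_HOME/install/rootdelete.sh
[root@rac2 ~]# $CRS_HOME/install/rootdelete.sh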
5. On any one node, run the script $CRS_HOME/install/rootdeinstall.sh; it must be run on one node only (example below).
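Under the same $CRS_HOME assumption as in step 4:
[root@rac1 ~]# $CRS_HOME/install/rootdeinstall.sh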
With the old CRS configuration deleted, the next task is to reinstall CRS. Before continuing, execute the following commands; if they return nothing, it is safe to proceed with the installation:
ps -e | grep -i 'ocs[s]d'
ps -e | grep -i 'cr[s]d.bin'
ps -e | grep -i 'ev[m]d.bin'
6. On the same node used in step 5, run the script $CRS_HOME/root.sh (example below).
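A minimal invocation, again assuming $CRS_HOME points at the Clusterware home:
[root@rac1 ~]# $CRS_HOME/root.sh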
7. Run $CRS_HOME/root.sh on the other nodes, pay close attention to the output on the last node, and finally run vipca (see the sketch below).
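A sketch of that last part: vipca is a graphical tool under $CRS_HOME/bin and needs a working X display, so export DISPLAY first (the :0.0 value is only an example) and run it as root:
[root@rac2 ~]# export DISPLAY=:0.0
[root@rac2 ~]# $CRS_HOME/bin/vipca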
8. Reconfigure the listener with netca and confirm that it is registered in the OCR.
At this point crs_stat -t -v should show the listener, ONS, GSD and VIP resources registered in the OCR; ASM and the database still need to be registered in the OCR as well.
9. Register ASM in the OCR
srvctl add asm -n rac1 -i +ASM1 -o $ORACLE_HOME
srvctl add asm -n rac2 -i +ASM2 -o $ORACLE_HOME
10. Start ASM
srvctl start asm -n rac1
srvctl start asm -n rac2
Starting the last ASM instance usually raises an error, generally because RAC cannot determine which network to use as the private interconnect.
To fix this, add the following two parameters to the pfile of both ASM instances (on node 2 the file is /u01/app/oracle/product/10.2.0/db_1/dbs/init+ASM2.ora):
+ASM1.cluster_interconnects='192.168.0.31'
+ASM2.cluster_interconnects='192.168.0.22'
The IP addresses must be correct or there will be further problems. Restart the ASM instances and the problem is solved.
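Once an instance is up, one optional way to confirm which interconnect it actually selected (my addition, not in the original post) is to query v$cluster_interconnects in SQL*Plus:
SQL> select name, ip_address from v$cluster_interconnects;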
11. Manually register the database object (the database name must match db_name, including case)
srvctl add database -d RAC -o $ORACLE_HOME
12. Manually register the two instance objects (the instance names must match instance_name, including case)
srvctl add instance -d RAC -i rac1 -n rac1
srvctl add instance -d RAC -i rac2 -n rac2
13. Set the mapping between the database instances and the ASM instances
srvctl modify instance -d RAC -i RAC1 -s +ASM1
srvctl modify instance -d RAC -i RAC2 -s +ASM2
14. Start the database (this usually errors out too, so specify the private interconnect for each instance by hand, just as for ASM)
SQL> alter system set cluster_interconnects='192.168.0.31' scope=spfile sid='RAC1';
SQL> alter system set cluster_interconnects='192.168.0.22' scope=spfile sid='RAC2';
Then start the database:
srvctl start database -d RAC
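A quick status check with srvctl afterwards (not spelled out in the original) confirms that the database and both ASM instances are up:
[oracle@rac1 ~]$ srvctl status database -d RAC
[oracle@rac1 ~]$ srvctl status asm -n rac1
[oracle@rac1 ~]$ srvctl status asm -n rac2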
15. Check that ONS is working properly
[root@rac2 oracle]# ps -ef | grep ons
root 4008 10066 0 08:39 pts/0 00:00:00 grep ons
root 5660 1 0 06:42 ? 00:00:00 sendmail: accepting connections
oracle 16107 1 0 07:21 ? 00:00:00 /u01/crs/oracle/product/10.2.0/crs/opmn/bin/ons -d
oracle 16108 16107 0 07:21 ? 00:00:00 /u01/crs/oracle/product/10.2.0/crs/opmn/bin/ons -d
[root@rac2 oracle]# onsctl ping
Number of onsconfiguration retrieved, numcfg = 0
ons is not running ...
[root@rac2 oracle]# onsctl start
Number of onsconfiguration retrieved, numcfg = 0
Number of onsconfiguration retrieved, numcfg = 0
onsctl: ons started
[root@rac2 oracle]# onsctl ping
Number of onsconfiguration retrieved, numcfg = 0
ons is running ...
[root@rac2 oracle]#
If ONS cannot start, you can use the following commands, or edit the file $CRS_HOME/opmn/conf/ons.config directly:
racgons add_config hostname1:port hostname2:port
racgons remove_config hostname1:port hostname2:port
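For reference, a 10.2 ons.config usually looks roughly like the sketch below; the port numbers shown are common defaults, and they must match what racgons has registered in the OCR for your cluster:
localport=6113
remoteport=6200
loglevel=3
useocr=on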
16. Check with oifcfg whether the network configuration is correct
e.g. to add the interface configuration:
oifcfg setif -global eth0/192.168.2.0:public
oifcfg setif -global eth1/192.168.0.0:cluster_interconnect
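oifcfg can also list what is already configured, which is handy before or after setif (an extra check, not in the original text):
oifcfg getif
oifcfg iflist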
17. Verify the correctness of the configuration with the following two commands
1. [oracle@rac1 ~]$ /tmp/10201_clusterware_linux32/clusterware/cluvfy/runcluvfy.sh stage -post crsinst -n rac1,rac2
2. crs_stat -t -v
With that, the votingdisk has been successfully reconfigured. Quite a few problems came up along the way; a summary is below. The third one cost me a lot of detours and nearly half a day, while the others were resolved quickly.
1. Because the votingdisk had failed, the removed disk had to be re-added in VMware
2. CRS startup problems (time synchronization)
3. The ASM2 instance would not start (a dot had been typed as a comma)
4. The database would not start
5. onsctl startup problems
6. Problems caused by changes to the /dev/raw devices
7. oifcfg network attribution problems
For the specific problems and their solutions, please refer to another article: http://blog.csdn.net/wyzxg/archive/2010/05/09/5572418.aspx
Reference documents:
Rebuilding the votedisk / OCR
http://www.laoxiong.net/10g_rebuild_crs_rac.html
http://www.dbasoul.com/2010/700.html
------ End ------