Some discussion and thoughts on Oracle high availability (repost)

congnen9588

Source: http://skyhorse.blogbus.com/logs/2004/03/106569.html

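RAC work log:
December 16 to 23: ran the RAC experiments. On December 24 the servers were handed over to QYC for the DataGuard experiments.
After QYC finished the DataGuard experiments, I resumed the RAC experiments on January 4.

The original request was a two-node hot standby for XX Group. I have been using Oracle for only a short time and do not know it well, so over this period I collected
related information and material for everyone's reference.
Having looked at XX Group's application, I believe it does not need to run 24x7. It only needs access to be restored promptly, and the data volume is not large.
I had also already asked XX to set up NAT, so from here we can remotely control individual servers on XX Group's internal intranet.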

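From what I could find online, high availability solutions fall into four kinds:
One is the standby method provided by Oracle, Standby (called DataGuard in 9i)
One is AR (Advanced Replication, called snapshot in earlier versions)
One is Oracle Parallel Server, OPS in 8i (RAC, Real Application Cluster, in 9i)
One is a third-party HA solution (such as Rose HA, where failover takes a few minutes)

A book written by experts at Oracle likewise treats
these four methods as the building blocks of a high availability solution.

All of these solutions are easy to understand in principle, but in practice they involve a great many details and problems.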

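There is one more option that most people are less familiar with: Oracle Fail Safe.
Fail Safe uses a SHARE NOTHING architecture: several servers form a cluster and are all attached to one shared disk system,
but at any given moment only one server can access the shared disk and provide service. This is essentially the same concept as the third-party HA solutions.
However, Fail Safe is limited to Windows platforms (WinNT, Win2k, ...) and must work together with MSCS (Microsoft Cluster Server).

The ready-made hot standby document I found online describes how to set up a standby database on Oracle 8i. It guarantees that a standby
database is always available that can, through manual intervention, restore normal access within a short time while keeping the data consistent. This is the solution to consider when 24x7 operation is not required.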

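We could only experiment with the first three solutions, and because we are short of people we tested just two: 9i DataGuard and RAC.
Advanced Replication was reportedly tried by lwd a long time ago. When I phoned Oracle, the support person said AR has too great an impact on database performance.
Advanced Replication also comes in two modes:
1. Active/passive: node1 is active and its database is read-write; node2 is passive and its database is read-only.
2. Active/active: node1 and node2 are both active and both databases are read-write. This mode has a particularly large impact on database performance.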

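Before describing the DataGuard and RAC solutions, I will add a note on how an Oracle client can reach
two Oracle databases without its local configuration having to be changed at failover time.
The answer is to prepare the client's tnsnames.ora in advance.
A typical tnsnames.ora entry looks like this: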
RACDB =
  (DESCRIPTION =
    (LOAD_BALANCE = off)
    (failover = on)
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = 211.68.29.61)(PORT = 1521))
      (ADDRESS = (PROTOCOL = TCP)(HOST = 211.68.29.62)(PORT = 1521))
    )
    (CONNECT_DATA =
      (SERVICE_NAME = racdb)
    )
  )
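ADDRESS_LIST contains two addresses. When the client connects through Oracle Net and cannot reach the first IP, it tries the second.
This feature has existed for a long time, and so has load_balance. Both are meaningless, however, when the contents of the two databases are not consistent.
Note also that the official Oracle9i PDF documentation does not recommend using the load_balance feature.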
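
The try-the-next-address behaviour can be pictured with a small shell sketch. It only illustrates walking a list in order and is not how Oracle Net is implemented; first_reachable and probe_listener are hypothetical helper names, and the probe assumes an nc build that accepts -z and -w.

```shell
#!/bin/sh
# Sketch of connect-time failover: walk the address list in order and
# print the first host for which the probe command succeeds.
# usage: first_reachable PROBE_CMD host1 host2 ...
first_reachable() {
    probe=$1
    shift
    for host in "$@"; do
        if "$probe" "$host"; then
            echo "$host"
            return 0
        fi
    done
    return 1   # no address answered
}

# Hypothetical probe: succeeds when a TCP connection to port 1521 opens.
# Assumes nc supports -z (scan only) and -w (timeout in seconds).
probe_listener() {
    nc -z -w 3 "$1" 1521 >/dev/null 2>&1
}
```

It would be called as first_reachable probe_listener 211.68.29.61 211.68.29.62. A listener that answers on port 1521 does not prove that the instance behind it is usable, so a probe like this covers only the host-down and listener-down cases.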

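I got the RAC experiment working yesterday, although I ran into a number of bugs and some instability along the way.
The environment is Oracle 9.2.0.1.0, two machines running Red Hat Advanced Server 2.1, and one disk array, using raw devices.
RAC is a share-everything model: two database instances share the same set of data files, control files and log files.
Clients can access both servers at the same time and see consistent data. Its focus is high performance and scalability, but its reliability is lower than DataGuard's.
First, the nodes are physically attached to the same storage, so RAC offers no disaster tolerance.
Second, if instance1 dies, it may affect instance2.
(Oracle's phone support said so, and online forums have related cases in which one instance going down dragged the other into not working properly.
During my own RAC experiments I also occasionally saw a restart of node1 cause node2 to restart.)
Of course, compared with a single-instance Oracle, availability is certainly higher.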

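In addition, every successful RAC deployment I could find online (carried out by forum Oracle moderators and the like) used server hardware and software certified by Oracle,
for example HP and DELL PowerEdge servers, DELL/EMC fiber-channel storage arrays, and so on.
Also, because we had no spare switch, the two of the four network cards used for the internal interconnect are cabled directly to each other
(the Oracle support staff at 新聚思 said this is unstable, but gave no reason).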

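Some issues with shared file systems:
With raw devices, routine management is not possible, and neither is file-system-level backup.
On my first attempt, on Mandrake 8.1, I partitioned the array, but fdisk on Linux allows only 16 partitions per disk, so I had to use
LVM (Logical Volume Manager, which supports 256) to manage the raw devices. Later dbca failed in the final stage of database creation, and I had to give up.
On the second attempt, with Red Hat AS 2.1, Oracle's website had just released OCFS. I installed all the OCFS rpm packages updated on 2003-1-3 (which only work
on AS 2.1) but found that the ocfs module would not load. After a long investigation, my guess is that this is related to our 世纪曙光 hardware:
Red Hat AS 2.1 does not correctly recognize the dual AMD Athlon MP 1800+ and the associated mainboard hardware, so the ocfs modules cannot
load either, because they are tied to the kernel. Switching to dual Intel CPUs, or to a single CPU, and reinstalling the system might solve it.
Because the rhAS 2.1 kernel does not support LVM unless it is recompiled, I ended up splitting the disk array into 2 drives and partitioning each
separately, which got around the fdisk partition limit and gave Oracle enough raw partitions.
The Veritas cold backup software bought when the proposal was drawn up (about 100,000 yuan) can only back up to tape by copying files over SMB while Oracle is shut down,
and raw devices cannot be copied that way.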

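Once address_list is configured in the client's tnsnames.ora,
the client can reach nodeB without any configuration change when nodeA is down.
But there are many different failure cases:
nodeA down,
listenerA down,
InstanceA down,
InstanceA in indeterminate state,
session die, and so on.
Not every case results in an automatic switch to node2.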

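Third-party HA software relies on its own agent detection modules, which force a switchover according to their own failure criteria. The first server will definitely no longer be accessed,
and within a few minutes all requests go to the database that has just been started on the second server.
For Oracle to provide the same function as third-party HA software, it has to work together with Microsoft Cluster Server on the Windows platform
to implement failover.
Apart from that, Oracle's own High Availability solutions do not provide comparable automatic failover:
RAC provides parallelism;
standby/DataGuard provides a hot standby server (switchover requires manual work);
AR can keep two databases consistent in near real time, but database performance suffers. I also do not know whether the client can switch automatically
to the second database in every situation (for example, with the listener running but the instance down, it cannot switch to the second one).
When the primary database suffers a disaster and cannot be reached at all, switching should work, while in some other cases it is enough to edit
tnsnames.ora or stop the listener on node1.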

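Someone previously built an integrated RoseHA + Oracle 8.1.7 + Turbolinux solution at 职成网, and reportedly the results were very poor. I saw people from our side go to 职成网
for maintenance N times (N being very large). So if an integrated solution includes an Oracle database, be prepared to have someone maintain it over the long term. If the primary database
ever suffers a disaster, the availability goal is met as long as a hot standby database can be put back into service within a fairly short time (within one day of the phone call),
so that a damaged primary does not cost weeks of reinstallation and recovery.
Fully automatic failover with no human involvement is an idealized state. It cannot be achieved on the Unix platform. I am not sure about Oracle failover on the Windows platform,
but it should be able to realize this idea.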

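The standby database feature was first offered in Oracle 7.x; read only mode arrived only in Oracle 8i,
and only 9i automated log application and the like. That automation is not automated failover, though; it consists of additions that round out
the hot standby database feature. In the end, after all this development time, Oracle has not finished these high availability solutions; they are still being refined.
As for RAC serving in parallel, what I heard from some Oracle support staff is that it is best to use one node for reads and writes and dedicate the other to read-only queries,
otherwise performance still suffers. Not many people use it for a failover application like ours.
Principles that are only slightly complex and easy to understand still take a great deal of time to apply in practice, and the many details involved, such as incremental logs, are troublesome.
Even the OUI (Oracle Universal Installer) of Oracle 9.0.0.1 on Linux is full of bugs when run on the Linux distributions it is certified for.
Its JRE is faulty, which is why database creation on Mandrake 8.1 ran into problems and could not continue.
"A specific environment, a specific problem; many of them have no explanation." That is a direct quote from a DBA online.
There is also a case online of a failure after upgrading Oracle from 81700 to 81740.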

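DataGuard (standby) cannot switch over automatically on failure because, according to people at Oracle, there is no way to decide what kind of failure should trigger the transfer;
that is beyond the scope of the Oracle software itself. It may be possible to write our own program that detects failures and transfers service according to our own criteria.
DataGuard does, however, keep a database that is always consistent with the primary. Combined with the address list in the client's tnsnames.ora, this can
provide partial failover to some extent.
The standby database can normally only be in read only mode or managed recovery mode.
In read only mode it cannot apply the redo logs sent from the primary; in managed recovery mode it can recover data but cannot be accessed by clients.
A standby database that is constantly in recovery cannot be used by end users, which is a waste from a management point of view (hence the read only mode introduced in 8i).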

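My ideas are:
1. When the primary database suffers a disaster and is forced to shut down, XX phones us and we manually activate the standby database remotely. This takes only a few SQL commands,
which can be put into a script so that anyone can run it.

2. The standby database stays in read only mode during the day, available for queries from the webserver (that is, the client). From midnight to 1 a.m., cron puts it into managed recovery mode
to apply the primary's daytime changes to the standby.

3. Cron puts the standby database into primary mode during the day, readable and writable, and a script switches it back to standby mode at night to apply the primary's updates.
Then when the primary goes down, clients connect to the second database immediately and can also read and write. The data diverges by at most one day, and switchover needs
no human intervention.

Of these three approaches, the first is the best.
The second is feasible and officially sanctioned by Oracle, but has data divergence and the read-only limitation.
The third has data divergence and details, large or small, that I have not thought through; it is only a tentative idea of mine.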
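
Idea 1 says the activation can be put into a script. A minimal sketch follows, assuming a 9i standby that was left in managed recovery mode and an oracle OS account for which sqlplus "/ as sysdba" works. The function names are hypothetical, and the statements should be checked against the standby procedure for the installed release before any real use.

```shell
#!/bin/sh
# Print the SQL that turns the standby into a primary.
# The first statement stops managed recovery; it simply reports an error
# if the standby was not in managed recovery mode.
activation_sql() {
    cat <<'EOF'
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;
ALTER DATABASE ACTIVATE STANDBY DATABASE;
SHUTDOWN IMMEDIATE
STARTUP
EOF
}

# Feed the SQL to SQL*Plus. Call this only on the standby host, and only
# after a person has confirmed that the primary is really gone.
activate_standby() {
    activation_sql | sqlplus -S "/ as sysdba"
}
```

Keeping the SQL in its own function lets the person on the phone review exactly what will run (activation_sql) before running it (activate_standby). Activation is one-way: the old primary cannot simply rejoin afterwards, which is why the decision stays with a human.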

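Of the two solutions, RAC and DataGuard:
RAC places high demands on hardware and the operating system, and maintenance is very complex. The Veritas backup software we bought cannot take cold backups of its files either.
It also demands highly skilled staff.
To give one example, if Red Hat AS 2.1 does not recognize the SCSI driver, nothing can be done, because Oracle 9.2 can only use this operating system.
(This is the same reason webmail uses Mandrake 8.2 rather than Mandrake 8.1.)
There are too many uncertainties.
When designing the system integration proposal and buying hardware, everything must be considered carefully: which servers, array and network cards to buy, how many switches, whether Linux AS 2.1 can be installed, and so on.
It is not enough to write "two-node hot standby" and buy two servers and a switch.

This solution could, however, be used in our own machine room to provide a high-performance Oracle database service (though it needs a fair amount of time for preparation and tuning).
At present all I can do is get Oracle 9.2 installed; day-to-day administration will have to be handled by colleagues with database experience.
I have put the installation document in the appendix.
If it were applied to the XX Group project, staff travel, hardware and the time consumed would all need to be considered, and I am not confident I could install it successfully a second time.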

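DataGuard's requirements are much lower: any two identical databases will do, with no particular demands on the operating system or hardware, and the network only needs to be connected.
When the primary has a problem, service can be restored quickly on the second machine. This is what was commonly called two-node hot standby in Oracle 8i. (In fact there can be more than one standby: one primary database
can have several standby databases.)
For the deployment at XX, we only need to build everything here, pack up the two databases, transfer them to XX over ftp, and unpack them there to run, without reinstalling the operating system or
changing hardware.
The network is fast; I have transferred files of more than 1 GB before, and it took only a few hours.

To reach the more ideal state in which the client automatically avoids the failed database and uses the healthy one,
the closest solution is the active/active mode of AR (Advanced Replication), but Oracle's technical support does not recommend it.
We can find someone here to try it. lwd has used databases for a long time and has done this before.

In practice, if the primary database goes down, someone has to intervene remotely anyway, and the only Oracle clients are 2 webservers, so it hardly matters whether their local tnsnames.ora has to be edited or not;
normal access can be achieved either way. The time needed to edit tnsnames.ora on the webservers is negligible, and the application needs no changes.
Also, the TAF (Transparent Application Failover) that Oracle provides is of little value in practical use.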

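Below are some materials I collected; some come from Oracle forum moderators or DBAs.

Appendix 1: Forum remarks
Appendix 2: Differences between Oracle Fail Safe and RAC (OPS)
Appendix 3: Oracle two-node hot standby method (Oracle 8i standby)
Appendix 4: ROSE HA & ESCORT DA two-node hot standby architecture
Appendix 5: oracle9ir2 setup ( lijun )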

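Appendix 1: Forum remarks
----------
I once asked, on Yahoo Messenger, a person at Oracle in the US who teaches the RAC course, and he said it was a bug.

Sometimes the cause is hard to pin down. We had a similar experience. The customer, in Hong Kong, thought they were very good with Oracle and, without our consent (we were their system integrator, providing service and coordination), upgraded the database from 81700 to 81740 on their own. After that our application (a server written in Pro*C) could not connect to the database at all. We then used a test program containing nothing but a connect db statement, and every run hung there; even CTRL+C and CTRL+Z could not get out of it. In other words the whole session was dead, and we had to close the telnet window and open a new one. After nearly two days of wrestling with it we were told it was a bug. An Oracle support engineer gave us a small patch (for the library that uses aio), and applying it fixed the problem.
Also, on some versions of AIX, Oracle 817 with cursor_sharing=force works fine, but on Tru64 UNIX it causes fatal problems.
......

My understanding of this kind of thing: a specific environment, a specific problem, and many of them have no explanation. Being able to solve it is enough; or, after enough experience, being able to sense something by intuition is already very good, haha.

Friend, that is how things really are. Sometimes a DBA has no way out. Still, a DBA must always push hard for archiving. One of our customers is in the mobile telecom sector, with the business databases of a whole province centralized, about 900 GB of data (not my responsibility). When I went to do an emergency capacity expansion, I found there was no data protection at all: no archiving, no logical backup, not even duplexed log files. Damn, if that database had a problem and lost data, I would not think it excessive for the decision makers to be taken out and shot. Just think, that is the data of an entire province.
BTW: one more thing. A lot of people at Oracle are rubbish; some have no real skill at all. I could name a whole list of Oracle employees whose level is no higher than anyone's here. They get by at Oracle on slightly better English, and this really affects my view of the company. To meet the people at Oracle who truly know their stuff, you have to pay a lot of money. Damn it, you cannot just put a clueless person out front to handle things. You know who staffs the Oracle 800 phone line, right? People who went to Oracle for training. I have two of them around me. :)) Sigh :(( better not to talk about it, it just makes me angry. Last year a customer's database (one I was responsible for) broke. The customer trusted Oracle too much and did not take my advice (nothing to be done; if the customer wants to pay, that is their business). As a result, a database that could have been recovered was rebuilt by the person who called himself emergency technical support from Oracle's Guangzhou office, causing 2 days of downtime. Damn.

It is not just Oracle; all companies are the same, HP and IBM too. The people who deal directly with customers at the front are as green as they come, and the real experts are all in the back. Unless it is a big project, they rarely deal with customers :)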

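Appendix 2: Differences between Oracle Fail Safe and RAC (OPS)

title: Differences between Oracle Fail Safe and RAC (OPS)
created: 2002-10-30
------------------------------

Oracle Fail Safe and RAC are both high availability (HA) solutions provided by Oracle. There are, however, major differences between the two:

1. Operating system: Fail Safe is limited to the Windows platform and must work with MSCS (Microsoft Cluster Server), whereas RAC was first released on UNIX and has since been extended to Linux and Windows, interacting with the system through the OSD (operating system dependent) layer. For high-end RAC applications, UNIX remains the platform of choice.

2. System architecture: Fail Safe uses a SHARE NOTHING architecture: several servers form a cluster and are all connected to one shared disk system, but at any moment only one server can access the shared disk and provide service. Only when that server fails does another take over the shared disk. RAC uses SHARE EVERYTHING: every server in the cluster can access the shared disk and provide service. In other words, Fail Safe can use the resources of only one server, whereas RAC can use the resources of several servers in parallel.

3. Operating mechanism: each server in a Fail Safe cluster has its own IP, the cluster as a whole has an IP, and a separate IP is also assigned to the Fail Safe group (the latter two are virtual IPs; a client only needs to know the cluster IP to access the database transparently). During operation only one server (preferred, owner or manager) provides service while the others (operator) stand by. When the former fails, another server takes over from it, including the Fail Safe group IP and the cluster IP, and Fail Safe starts the database service, listener and other services on it. Clients only need to reconnect; no changes are required. In a RAC cluster each server has its own IP, instance and so on and can provide service independently, but they all operate on the same database located on the shared disk. When one server fails, users only need to change their network configuration, such as tnsnames.ora, to reconnect to a server that is still running. When combined with TAF, even the network side can be configured to be transparent.

4. Cluster capacity: the former is usually two nodes; the latter can be extended to 8 nodes on some platforms.

5. Partitioning: the disks holding a Fail Safe database must be formatted as NTFS. RAC is more flexible: it usually requires RAW, but several operating systems have introduced CLUSTER file systems that RAC can use directly.

From the analysis above, Fail Safe suits a system with very high reliability requirements, a relatively small application and relatively modest performance demands, while RAC suits larger applications with relatively high demands on reliability, scalability and performance. It should also be pointed out that users do not have to pay separately for Fail Safe as they do for RAC; it is FREE......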

-------------
by Rudolf Lu
welcome to www.cnoug.com

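Appendix 3: Oracle two-node hot standby method (Oracle 8i standby)

Standby Database

I. Requirements for creating a Standby Database
1. The Oracle server and operating system versions on the Primary and standby hosts must be identical, with the same patches applied;
2. The primary database must be in archive mode; the Standby Database must also be in archive mode;

II. Creating the standby database
1. First check the archiving mode of the Primary Database (svrmgr>archive log list). If it is in no-archive mode, switch the database to automatic archiving as follows:
Shut down the Primary Database first:
svrmgr>shutdown immediate
Then start the Primary Database in mount mode:
svrmgr>connect internal/password
svrmgr>startup mount
svrmgr>alter database archivelog
svrmgr>shutdown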
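2. Edit the parameters in init%oracle_sid%.ora and add:
log_archive_start = true # if you want automatic archiving
log_archive_dest=%ORACLE_HOME%\database\archive
log_archive_format = %ORACLE_SID%%S.%T
Here log_archive_dest is the location where archived log files are stored and can be set to suit your environment. For example, if drive E has more free space, it can be set to:
log_archive_dest=e:\oracle\database\archive
3. Copy the Primary Database's init%oracle_sid%.ora and the corresponding password file to the directory %oracle_home%\database on the Standby Database; then adjust this parameter in init%oracle_sid%.ora on the Standby Database to suit its environment:
log_archive_dest;
4. Create an instance on the Standby Database host with the same instance name as the Primary Database, for example:
Oradim80 -new -sid SID_NAME -intpwd oracle -startmode auto -pfile=c:\orant\database\init%oracle_sid%.ora;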
5. On the Primary Database, create a controlfile for the Standby and copy it to the designated location on the Standby Database:
svrmgr>alter database create standby controlfile as ;
6. Archive the online logs on the Primary Database:
svrmgr>alter system archive log current;
7. Shut down the Primary Database:
svrmgr>shutdown immediate
8. Copy all data files, log files and archived log files (but not the control files) from the Primary Database to the designated location on the Standby Database;
9. Start the Standby Database in nomount mode:
svrmgr>startup nomount
svrmgr>alter database mount standby database [exclusive|parallel]

10. Synchronize the Standby Database:
svrmgr>recover standby database;
11. Bring the Primary Database back up.

III. Maintaining the standby database
1. Periodically transfer the archive log files produced by the primary database to the designated location on the standby database, and run recovery manually to keep the standby database in step with the primary.
svrmgr> set autorecovery on;
svrmgr> connect internal/password;
svrmgr> startup nomount pfile=;
svrmgr> alter database mount standby database;
svrmgr> recover standby database;
svrmgr> shutdown
2. When the primary database fails, activate the standby database. Cancel the recovery, then do the following:
svrmgr> connect internal/password;
svrmgr> startup nomount pfile=;
svrmgr> alter database mount standby database;
svrmgr>alter database activate standby database;
Shut down the standby database:
svrmgr>shutdown immediate
Restart the Standby Database:
svrmgr>startup

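Appendix 4: ROSE HA & ESCORT DA two-node hot standby architecture

Below is a two-node hot standby proposal for reference. It does not mean these products must be used; the aim is only to give everyone an understanding of two-node setups.
ROSE HA & ESCORT DA two-node hot standby architecture

I. Foreword
For a modern enterprise, using computer systems to provide timely and reliable information and services is indispensable. On the other hand, computer hardware and software inevitably fail, and such failures can cause an enterprise great losses, even a complete halt of service and paralysis of the network. For certain enterprises, therefore, high availability of the system is all the more important, and suitable measures are needed to ensure that the computer system provides uninterrupted service and so maintains availability.

The availability of an information system is usually affected in two situations: abnormal failures caused by system crashes or by operating and administrative errors, and normal shutdowns for maintenance and upgrades that require installing new hardware or software. High availability software must provide uninterrupted system service in both situations.

This proposal is the result of a comprehensive review of software and hardware products and an in-depth analysis of various two-node hot standby architectures. We aim to provide you with a complete, intelligent and easily managed two-node hot standby architecture, and so make a modest contribution to the high availability of your system.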

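II. Two-node hot standby architecture
1. Network topology diagram (diagram not included)

2. Software and hardware used and their features
- Disk array equipment
Uses the ESCORT DA-3500 P2D, with multiple redundant networks, dual high-performance servers, disk array (RAID 0, 1, 3, 5) and other means to achieve redundancy and high reliability.
- Two-node software
ROSE HA
3. System features
- Shared storage
A disk array is used as the shared storage device to ensure data reliability and recoverability. It holds the software and data needed to provide service, avoiding or reducing losses caused by disk faults or errors.
- Openness
Supports the popular database software (such as Oracle, Sybase, Informix, SQL Server) as well as other mainstream applications.
- Fast reaction
Typical fault detection time is 5 seconds; service transfer usually takes between 10 and 120 seconds.
- Automatic handling
Fault detection and service transfer are handled entirely by the ROSE HA software, with no intervention from the system administrator.
- Graphical user interface
The ROSE HA software is delivered as a Java Applet. The system administrator configures HA through an interactive interface, which also shows the status of the hosts and services in real time.
- Flexibility
The user can assign the role of each server (active or standby), specify which services and hardware components to monitor, and define what further action to take after a given service fails (for example, whether to restart the service, and the maximum startup time allowed).
- Extensibility
The user can add services to further improve system availability.
- Rich additional functions
Provides different Agent programs for specific applications, making service monitoring more realistic and more effective.
- Provides an application programming interface (API) for developing Agent programs, so users can write Agents that perform status diagnosis and error recovery for specific services.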

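III. How two-node hot standby works
After the system starts, ROSE HA first starts the HA MANAGER program, initializes according to the configuration of the high availability system, and then starts the necessary services and agent programs to monitor and manage system services. HA agent programs monitor, detect, diagnose and manage hardware and software services.
When an agent detects that a service is active, HA MANAGER considers that service active, and it periodically notifies the HA MANAGER on the backup server that each of its services is normal.
When an agent detects that a service has failed, it notifies the HA MANAGER program. The HA software first tries to restart the service several times (the number is user-configurable); if the restart does not succeed, HA transfers the service to the backup server.
HA periodically checks the state of the system hardware. If hardware fails, HA transfers the services associated with that hardware to the backup server.
When a service is transferred, HA first stops it on the running server, and then the HA on the backup server starts it there. Because stopping and starting the service both take some time, the service is briefly interrupted during the switchover and resumes normal operation automatically once the switchover is complete.
For database systems (such as Oracle, Sybase, Informix) and other application software (such as Domino Server, WWW Server), ROSE HA provides a series of Agent software modules on top of the HA management module (HA Manager). An Agent is a software monitoring module that watches a database service or other application service. When the running server fails and the Agent detects it, the Agent asks the HA master software to take the appropriate action.

The ROSE HA software greatly reduces human involvement and improves the reliability and security of the system, so that services run with high reliability.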

附件5：oracle9ir2 setup ( lijun )

oracle 9i rac step by step

by lijun at 2003-1-5 create.

environment:
oracle9i database 9.2.0.1.0 enterprise
redhat advance server 2.1
rac1 and rac2 server
rac1: 211.68.29.61 1.1.1.1
rac2: 211.68.29.62 1.1.1.2

IFT3102 disk array 90G RAID5 + 140G RAID0

0. Get all the software and extract or install it.
Downloaded on rac1:

rpm -ivh cpp-2.96-108.1.i386.rpm \
    glibc-devel-2.2.4-26.i386.rpm \
    kernel-headers-2.4.9-e.3.i386.rpm \
    gcc-2.96-108.1.i386.rpm \
    binutils-2.11.90.0.8-12.i386.rpm

{ #####
( I did not install a JDK; IBM JDK and JRE were already present )
Install JDK
Download JDK 1.3.1 or Blackdown 1.1.8_v3: (I always use Blackdown)
http://www.blackdown.org
http://java.sun.com
According to JDK documentation, install JDK under /usr/local .
Then create a symbolic link to the JDK under /usr/local/java :
As root:
bzip2 -dc jdk118_v3-glibc-2.1.3.tar.bz2 | tar xf - -C /usr/local
ln -s /usr/local/jdk118_v3 /usr/local/java
( can get from ftp://192.168.100.44/pub/oracle/oracle9i.linux/ )
} #####

On both rac1 and rac2, as root, unpack the installation archives:

zcat lnx_920_disk1.cpio.gz | cpio -idmv
zcat lnx_920_disk2.cpio.gz | cpio -idmv
zcat lnx_920_disk3.cpio.gz | cpio -idmv

1. install the shared disk array .
rpm -ivh pdksh-5.2.14-13.i386.rpm
rpm -ivh ncurses4-5.0-5.i386.rpm (cdrom 2 )

2. for partition
It seems that without LVM, Linux supports only
16 partitions per disk (3 primary partitions and
1 extended partition, which yields only 12 logical
partitions). That makes it very difficult to
install a large RAC system. Is there really such
a limit on Linux?

/dev/sdc5-16
/dev/sdd5-16

mkdir /dev/raw
mknod /dev/raw/rawctl c 162 0

/bin/chmod 600 /dev/raw/raw1
/bin/chmod 600 /dev/raw/raw2
/bin/chmod 600 /dev/raw/raw3
/bin/chmod 600 /dev/raw/raw4
/bin/chmod 600 /dev/raw/raw5
/bin/chmod 600 /dev/raw/raw6
/bin/chmod 600 /dev/raw/raw7
/bin/chmod 600 /dev/raw/raw8
/bin/chmod 600 /dev/raw/raw9
/bin/chmod 600 /dev/raw/raw10
/bin/chmod 600 /dev/raw/raw11
/bin/chmod 600 /dev/raw/raw12
/bin/chmod 600 /dev/raw/raw13
/bin/chmod 600 /dev/raw/raw14
/bin/chmod 600 /dev/raw/raw15
/bin/chmod 600 /dev/raw/raw16

/bin/chmod 600 /dev/raw/raw17

as root run at rac2:
mknod /dev/raw/raw.. ....
/bin/chmod 600 ...
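
The chmod lines above all follow one pattern, so they can be generated rather than typed. A small sketch, assuming the same raw1 to raw17 naming; raw_cmds is a hypothetical helper that only prints commands, so nothing changes until its output is reviewed and run as root.

```shell
#!/bin/sh
# Print one command per raw device, raw1 .. rawN.
# usage: raw_cmds "COMMAND PREFIX" COUNT
raw_cmds() {
    prefix=$1
    count=$2
    i=1
    while [ "$i" -le "$count" ]; do
        echo "$prefix /dev/raw/raw$i"
        i=$((i + 1))
    done
}

# Dry run: shows the 17 chmod commands without executing them.
raw_cmds "/bin/chmod 600" 17
```

Once the output looks right it can be piped to sh as root, for example raw_cmds "/bin/chmod 600" 17 | sh. The same helper with the prefix "/bin/chown oracle.dba" produces the ownership commands used in step 4.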

3. Installing Cluster Interconnect and Public Network Hardware
cat rac1 /etc/hosts
127.0.0.1 rac1 localhost.localdomain localhost
211.68.29.61 rac1
211.68.29.62 rac2
1.1.1.1 int-rac1
1.1.1.2 int-rac2

cat rac2 /etc/hosts
127.0.0.1 rac2 localhost.localdomain localhost
211.68.29.61 rac1
211.68.29.62 rac2
1.1.1.1 int-rac1
1.1.1.2 int-rac2

ping all ip ok.

4. create users and group for oracle and chown attrib.

Create an oracle account on each node so that the account:
Is a member of the osdba group
Is used only to install and update Oracle software
I used /home/httpd/ only because it has enough disk space.
A typical command would look like the following:
As root at rac1:
groupadd dba
groupadd oinstall
useradd -G dba,oinstall -u 101 -m -d /home/oracle -s /bin/bash oracle
passwd oracle
mkdir /var/oracle9i
chown -R oracle.oinstall /var/oracle9i
chmod -R ug=rwx,o=rx /var/oracle9i
ln -s /var/oracle9i /oracle

As root at rac2:
groupadd dba
groupadd oinstall
useradd -G dba,oinstall -u 101 -m -d /home/oracle -s /bin/bash oracle
passwd oracle
mkdir /var/oracle9i
chown -R oracle.oinstall /var/oracle9i
chmod -R ug=rwx,o=rx /var/oracle9i
ln -s /var/oracle9i /oracle

As root at rac1 and rac2:
/bin/chown -R oracle.dba /dev/raw
/bin/chown oracle.dba /dev/raw/rawctl
/bin/chown oracle.dba /dev/raw/raw1
/bin/chown oracle.dba /dev/raw/raw2
/bin/chown oracle.dba /dev/raw/raw3
/bin/chown oracle.dba /dev/raw/raw4
/bin/chown oracle.dba /dev/raw/raw5
/bin/chown oracle.dba /dev/raw/raw6
/bin/chown oracle.dba /dev/raw/raw7
/bin/chown oracle.dba /dev/raw/raw8
/bin/chown oracle.dba /dev/raw/raw9
/bin/chown oracle.dba /dev/raw/raw10
/bin/chown oracle.dba /dev/raw/raw11
/bin/chown oracle.dba /dev/raw/raw12
/bin/chown oracle.dba /dev/raw/raw13
/bin/chown oracle.dba /dev/raw/raw14
/bin/chown oracle.dba /dev/raw/raw15
/bin/chown oracle.dba /dev/raw/raw16

/bin/chown oracle.dba /dev/raw/raw17

at rac1 and rac2:
mkdir /var/opt/oracle
chown oracle.oinstall /var/opt/oracle
Make sure you unset LANG, JRE_HOME and JAVA_HOME in your .profile(.bashrc).

5. cluster software install and create dbora
as root at rac1:
mknod /dev/watchdog c 10 130
chmod 600 /dev/watchdog
chown oracle /dev/watchdog
/sbin/insmod softdog soft_margin=60 ( also added to startoracle_root.sh )

The following runs as root at system startup from a file called dbora.
I put dbora in /etc/init.d/
and added the line /etc/rc.d/dbora
to /etc/init.d/rc.local:
chmod 755 /etc/init.d/dbora
ln -s /etc/init.d/dbora /etc/rc5.d/S78dbora
ln -s /etc/init.d/dbora /etc/rc3.d/S78dbora

as root at rac2:
mknod /dev/watchdog c 10 130
chmod 600 /dev/watchdog
chown oracle /dev/watchdog
/sbin/insmod softdog soft_margin=60 ( also added to startoracle_root.sh )

The following runs as root at system startup from a file called dbora.
I put dbora in /etc/init.d/
and added the line /etc/rc.d/dbora
to /etc/init.d/rc.local:
chmod 755 /etc/init.d/dbora
ln -s /etc/init.d/dbora /etc/rc5.d/S78dbora
ln -s /etc/init.d/dbora /etc/rc3.d/S78dbora

add below lines to dbora ( same at rac1 and rac2 )
/usr/bin/raw /dev/raw/raw1 /dev/sdc5
/usr/bin/raw /dev/raw/raw2 /dev/sdc6
/usr/bin/raw /dev/raw/raw3 /dev/sdc7
/usr/bin/raw /dev/raw/raw4 /dev/sdc8
/usr/bin/raw /dev/raw/raw5 /dev/sdc9
/usr/bin/raw /dev/raw/raw6 /dev/sdc10
/usr/bin/raw /dev/raw/raw7 /dev/sdc11
/usr/bin/raw /dev/raw/raw8 /dev/sdc12
/usr/bin/raw /dev/raw/raw9 /dev/sdc13
/usr/bin/raw /dev/raw/raw10 /dev/sdc14
/usr/bin/raw /dev/raw/raw11 /dev/sdc15
/usr/bin/raw /dev/raw/raw12 /dev/sdd5
/usr/bin/raw /dev/raw/raw13 /dev/sdd6
/usr/bin/raw /dev/raw/raw14 /dev/sdd7
/usr/bin/raw /dev/raw/raw15 /dev/sdd8
/usr/bin/raw /dev/raw/raw16 /dev/sdd9

/usr/bin/raw /dev/raw/raw17 /dev/sdd10
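
The seventeen bindings above map raw1 to raw11 onto /dev/sdc5 to /dev/sdc15 and raw12 to raw17 onto /dev/sdd5 to /dev/sdd10. A sketch that generates the same lines from that mapping, so the list in dbora cannot drift out of step; raw_bindings is a hypothetical name and the function only prints.

```shell
#!/bin/sh
# Print the raw(8) binding commands for the layout used in this setup:
# raw1..raw11 -> /dev/sdc5..sdc15, raw12..raw17 -> /dev/sdd5..sdd10.
raw_bindings() {
    n=1
    for part in 5 6 7 8 9 10 11 12 13 14 15; do
        echo "/usr/bin/raw /dev/raw/raw$n /dev/sdc$part"
        n=$((n + 1))
    done
    for part in 5 6 7 8 9 10; do
        echo "/usr/bin/raw /dev/raw/raw$n /dev/sdd$part"
        n=$((n + 1))
    done
}

raw_bindings
```

In dbora the call could be written as raw_bindings | sh, so that each printed line is executed at boot.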

6. Set up rlogin beforehand (on both rac1 and rac2).
Because rsh-server and rsh are already installed, you can run:
ntsysv
and mark rexec, rawdevices, rsh and rlogin with *,
or run: chkconfig --add rsh; chkconfig --add rexec; chkconfig --add rlogin

You must edit these files:
/etc/hosts.equiv
$ORACLEHOME/.rhosts (/home/httpd/oracle/.rhosts; su to oracle to touch this file)
These files list the hosts and trusted hosts (for rlogin, rcmd, rcp, etc.).

cat hosts.equiv
rac1
rac2
int-rac1
int-rac2

then
service xinetd restart

Note: If you are prompted for a password, you have not given the oracle account the same attributes on all nodes. You must correct this because the Oracle Universal Installer cannot use the rcp command to copy Oracle products to the remote node's directories without user equivalence.
On AS 2.1 you cannot rlogin as root.

7. setting networking and system kernel parameters ( add to rac1 and rac2 /etc/init.d/rhas_ossetup.sh )
chmod 755 /etc/init.d/rhas_ossetup.sh
ln -s /etc/init.d/rhas_ossetup.sh /etc/rc5.d/S77rhas_ossetup
ln -s /etc/init.d/rhas_ossetup.sh /etc/rc3.d/S77rhas_ossetup

echo 65535 > /proc/sys/net/core/rmem_default
export SEMMSL=250
export SEMMNS=32000
export SEMOPM=100
export SEMMNI=128
echo $SEMMSL $SEMMNS $SEMOPM $SEMMNI > /proc/sys/kernel/sem
#echo 250 32000 100 128 > /proc/sys/kernel/sem
export SHMMAX=2147483648
echo $SHMMAX > /proc/sys/kernel/shmmax
# Set file handles and process limits
echo 65536 > /proc/sys/fs/file-max
ulimit -n 65536
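
Because these values are written at every boot, it helps to confirm afterwards that the kernel actually holds them. A small sketch, assuming the four fields in /proc/sys/kernel/sem appear in the order SEMMSL SEMMNS SEMOPM SEMMNI; sem_matches is a hypothetical helper name.

```shell
#!/bin/sh
# Compare the four semaphore limits stored in FILE with the expected values.
# usage: sem_matches FILE SEMMSL SEMMNS SEMOPM SEMMNI
sem_matches() {
    # The kernel separates the fields with tabs; read splits on any whitespace.
    read -r msl mns opm mni < "$1" || return 1
    [ "$msl" = "$2" ] && [ "$mns" = "$3" ] && [ "$opm" = "$4" ] && [ "$mni" = "$5" ]
}
```

After boot it could be run as sem_matches /proc/sys/kernel/sem 250 32000 100 128 && echo "sem ok". Taking the file as an argument keeps the check usable on a saved copy as well.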

8. establish system environment variables

# Oracle Environment
su - oracle;
cat .bashrc
export ORACLE_BASE=/oracle
export ORACLE_HOME=/oracle/product/9.2.0
#export ORACLE_SID=coredata
export ORACLE_TERM=xterm
export NLS_LANG=AMERICAN;
#export NLS_LANG=american_america.ZHS16GBK
export ORA_NLS33=$ORACLE_HOME/ocommon/nls/admin/data
export LD_LIBRARY_PATH=$ORACLE_HOME/lib:$ORACLE_HOME/oracm/lib:/lib:/usr/lib:/usr/local/lib
export PATH=$PATH:$ORACLE_HOME/bin
CLASSPATH=$ORACLE_HOME/JRE:$ORACLE_HOME/jlib:$ORACLE_HOME/rdbms/jlib
CLASSPATH=$CLASSPATH:$ORACLE_HOME/network/jlib
export CLASSPATH
unset LC_MESSAGES
unset LC_TIME
unset LC_NUMERIC
unset LC_CTYPE
unset LC_MONETARY
unset LC_COLLATE
unset LANG

As user oracle do this on rac1 and rac2 ( racdb is db_name )
mkdir /oracle/oradata
mkdir /oracle/oradata/racdb
ln -s /dev/raw/raw1 /oracle/oradata/racdb/db_name_raw_system
ln -s /dev/raw/raw2 /oracle/oradata/racdb/db_name_raw_users
ln -s /dev/raw/raw3 /oracle/oradata/racdb/db_name_raw_temp
ln -s /dev/raw/raw4 /oracle/oradata/racdb/db_name_raw_undotbs
ln -s /dev/raw/raw5 /oracle/oradata/racdb/db_name_raw_undotbs2
ln -s /dev/raw/raw6 /oracle/oradata/racdb/db_name_raw_indx
ln -s /dev/raw/raw7 /oracle/oradata/racdb/db_name_raw_controlfile1
ln -s /dev/raw/raw8 /oracle/oradata/racdb/db_name_raw_controlfile2
ln -s /dev/raw/raw9 /oracle/oradata/racdb/db_name_raw_rdo_1_1
ln -s /dev/raw/raw10 /oracle/oradata/racdb/db_name_raw_rdo_1_2
ln -s /dev/raw/raw11 /oracle/oradata/racdb/db_name_raw_rdo_2_1
ln -s /dev/raw/raw12 /oracle/oradata/racdb/db_name_raw_rdo_2_2
ln -s /dev/raw/raw13 /oracle/oradata/racdb/db_name_raw_spfile
ln -s /dev/raw/raw16 /oracle/oradata/racdb/db_name_raw_tools

ln -s /dev/raw/raw17 /oracle/oradata/racdb/dawndata.dbf

On the node from which you run the Oracle Universal Installer, create an ASCII file identifying the raw volume objects as shown above. The DBCA requires that these objects exist during installation and database creation. When creating the ASCII file content for the objects, name them using the format:
database_object=raw_device_file_path or database_object=soft_link
When you create the ASCII file, separate the database objects from the paths with equals (=) signs as shown in the example below:-
cat /home/oracle/raw_config
system=/oracle/oradata/racdb/db_name_raw_system
users=/oracle/oradata/racdb/db_name_raw_users
temp=/oracle/oradata/racdb/db_name_raw_temp
undotbs1=/oracle/oradata/racdb/db_name_raw_undotbs
undotbs2=/oracle/oradata/racdb/db_name_raw_undotbs2
control1=/oracle/oradata/racdb/db_name_raw_controlfile1
control2=/oracle/oradata/racdb/db_name_raw_controlfile2
redo1_1=/oracle/oradata/racdb/db_name_raw_rdo_1_1
redo1_2=/oracle/oradata/racdb/db_name_raw_rdo_1_2
redo2_1=/oracle/oradata/racdb/db_name_raw_rdo_2_1
redo2_2=/oracle/oradata/racdb/db_name_raw_rdo_2_2
spfile=/oracle/oradata/racdb/db_name_raw_spfile
indx=/oracle/oradata/racdb/db_name_raw_indx
tools=/oracle/oradata/racdb/db_name_raw_tools

dawndata=/oracle/oradata/racdb/dawndata.dbf

You must also tell Oracle to use this file to determine the raw device volume names by setting the following environment variable, where filename is the name of the ASCII file containing the entries shown in the example above:
setenv DBCA_RAW_CONFIG filename
or
export DBCA_RAW_CONFIG=/home/oracle/raw_config
Add this to .bashrc.
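
Since DBCA requires every object named in this file to exist, a quick check before starting the installer can save a failed run. A minimal sketch, assuming the name=path format shown above; check_raw_config is a hypothetical helper name.

```shell
#!/bin/sh
# Report every entry in a name=path mapping file whose path does not exist.
# A symlink whose target is missing also counts as missing.
# usage: check_raw_config FILE   (returns 0 only when nothing is missing)
check_raw_config() {
    cfg=$1
    missing=0
    while IFS='=' read -r name path; do
        [ -z "$name" ] && continue          # skip blank lines
        if [ ! -e "$path" ]; then
            echo "missing: $name -> $path"
            missing=$((missing + 1))
        fi
    done < "$cfg"
    [ "$missing" -eq 0 ]
}
```

It could be run as check_raw_config "$DBCA_RAW_CONFIG" on each node, after the raw bindings and symbolic links are in place.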

9. run oracle9i universal installer as oracle at rac1

next,
ok, input oinstall, next,
An instruction to run /tmp/orainstRoot.sh appears. Run this as root and click Continue.
select oracle cluster manager 9.2.0.1.0 , select production language add simplified chinese, next,
input public node1,2 : rac1 ,rac2 , press next.
input private node1,2 : int-rac1,int-rac2, press next.
see 60000, press next.
enter the quorum disk information , input /dev/raw/raw14 (equal /dev/oracle/nm_raw) ,press next.
press install and next.
click exit and confirm by clicking yes.

on rac2 as oracle:
mkdir $ORACLE_HOME/oracm/log
su root
$ORACLE_HOME/oracm/bin/ocmstart.sh

on rac1 as oracle:
su root
$ORACLE_HOME/oracm/bin/ocmstart.sh

if start ocmstart.sh failed then
rm -f $ORACLE_HOME/oracm/log/*

Run the installer again at rac1,
Cluster Node Selection Screen. Select nodes (rac1 and rac2). Press Next.
At the file location screen, press Next.
Select the Products to install. In this example, select the Oracle9i Database, select production language, add Simplified Chinese, then click Next.
Select the installation type. Choose the Custom option. The selection on this screen refers to the installation operation, not the database configuration. Click Next.
Select database components, and make sure you include the Real Application Clusters option (I also installed partitioning and the net listener).
(Comment: I deselected data mining, oracle management server 9.2.0.1.0, enterprise manager web site 9.2.0.1.0, oracle http server 9.2.0.1.0 and legato networker single server 6.1.0.0.0.) Click Next.
Component Locations, accept defaults and press Next.
You are prompted for the pathname of the shared configuration file: enter the pathname of the raw device you used for the srvconf file (ex: /dev/raw/raw15) and click Next.
Privileged operating system groups. Click Next.
Create database, check no. Click Next.
Summary screen. Click Install.
Install screen shows the progress.
Insert (umount + mount) CD 2 and CD 3 when asked. Press OK. (Not needed if you extracted the lnx_920 cpio archives into the same directory.)
You will get a popup indicating to run $ORACLE_HOME/root.sh as root.
On all nodes, su to root, run it and answer the questions (just press Return). Then press OK.
Configuration tools window appears and starts cluster configuration assistant - net configuration assistant
Net Configuration assistant - Do not configure this now.Click Cancel and confirm by clicking Yes.
Accept the error indicating that one or more tools have failed. This is because we cancelled net configuration. Click Ok. Click Next.
End of installation screen. Press Exit and confirm by clicking Yes.

Make sure the directory $ORACLE_HOME/rdbms/audit exists on all nodes, create it if not.
Make sure the directory $ORACLE_HOME/rdbms/log exists on all nodes, create it if not.
Make sure the directory $ORACLE_HOME/network/log exists on all nodes, create it if not.

10. Configure the Cluster Manager, Node Monitor .
The installation has created the configuration file for you in $ORACLE_HOME/oracm/admin/cmcfg.ora. The file should look like:
cat $ORACLE_HOME/oracm/admin/cmcfg.ora
HeartBeat=15000
ClusterName=Oracle Cluster Manager, version 9i
PollInterval=1000
MissCount=20
PrivateNodeNames=int-rac1 int-rac2
PublicNodeNames=rac1 rac2
ServicePort=9998
WatchdogSafetyMargin=5000
WatchdogTimerMargin=60000
CmDiskFile=/dev/raw/raw14
HostName=int-rac1

put this command in a startup script (/etc/rc.d/startoracle_root.sh=/etc/init.d/dbora ) run as root at all nodes.
ORACLE_HOME=/oracle/product/9.2.0
$ORACLE_HOME/bin/gsdctl start

11. configure the listeners.
Before creating our database, we will configure the listeners, this avoids errors during the database creation.
make sure the LANG environment variable is not set
$unset LANG
start the network configuration assistant as oracle user:
$netca
On the welcome screen, select cluster configuration and click Next.
On the nodes screen, select all nodes and click Next.
On this screen, select listener configuration and click Next.
You get the listener configuration screen. Select add and click Next.
Accept the default name listener by clicking Next.
This is the protocol selection screen. TCPIP should already be selected. Add IPC (click IPC, then >) and click Next.
Accept the default port number (1521) on the screen by clicking Next.
On the IPC configuration screen, enter your database name as key and click Next.
Do not configure anything else and exit netca unless you want to use Oracle intelligent agent. You will have to statically add the service names to allow automatic detection of your database by the agent. The instances will register themselves with the listener.

12. Create a RAC Database using the Oracle Database Configuration Assistant
The Oracle Database Configuration Assistant (DBCA) will create a database for you (for an example of manual database creation see Database Creation in Oracle9i RAC). The DBCA creates your database using the optimal flexible architecture (OFA). This means the DBCA creates your database files, including the default server parameter file, using standard file naming and file placement practices. The primary phases of DBCA processing are:-
I suggest running dbca to create the database rather than creating it manually.

dbca
Start DBCA by executing the command dbca. The RAC Welcome Page displays.
Choose Oracle Cluster Database option and select Next.
The Operations page is displayed. Choose the option Create a Database and click Next.
The Node Selection page appears. Select the nodes that you want to configure as part of the RAC database and click Next. If nodes are missing from the Node Selection then perform clusterware diagnostics by executing the $ORACLE_HOME/bin/lsnodes -v command and analyzing its output. Resolve the problem and then restart the DBCA.
The Database Templates page is displayed. The templates other than New Database include datafiles. Choose New Database and then click Next.
DBCA now displays the Database Identification page. Enter the Global Database Name and Oracle System Identifier (SID). The Global Database Name is typically of the form name.domain, for example mydb.us.oracle.com while the SID is used to uniquely identify an instance (DBCA should insert a suggested SID, equivalent to name1 where name was entered in the Database Name field). In the RAC case the SID specified will be used as a prefix for the instance number. For example, MYDB, would become MYDB1, MYDB2 for instance 1 and 2 respectively.
input racdb.coredata as name and input racdb as sid prefix.

The Database Options page is displayed. Deselect all options (also click additional / standard database configuration) and accept the tablespaces to be dropped as well, then choose Next. Note: If you did not choose New Database from the Database Template page, you will not see this screen.

Select the dedicated server mode option from the Database Connection Options page. Note: If you did not choose New Database from the Database Template page, you will not see this screen. Click Next.

DBCA now displays the Initialization Parameters page. This page comprises a number of Tab fields. Modify the Memory settings if desired and then select the File Locations tab. The option Create persistent initialization parameter file is selected by default. The file name should point to the correct database link. Otherwise you didn't set the DBCA_RAW_CONFIG correctly or the file contains errors. The button All Initialization Parameters... displays the Initialization Parameters dialog box. This box presents values for all initialization parameters and indicates whether they are to be included in the spfile to be created through the check box, included (Y/N). Instance specific parameters have an instance value in the instance column. Complete entries in the All Initialization Parameters (especially check/correct the remote_listener entry, should read LISTENER) page and select Close. Note: There are a few exceptions to what can be altered via this screen. Ensure all entries in the Initialization Parameters page are complete. You might want to select an 8 bit character set in the db Sizing section and select Next.
Values used in this setup:
shared pool: 128M
large pool: 16M
java pool: 16M
character set: ZHS16GBK

DBCA now displays the Database Storage Window. This page allows you to enter file names for each tablespace in your database. They should be pointing to the correct raw device or database link. Make sure to correct the datafile sizes to what you have planned and remember that the datafile should be somewhat smaller than the raw device. Click Next.
The Database Creation Options page is displayed. Ensure that the option Create Database is checked and click Finish.
The DBCA Summary window is displayed. Review this information and then click OK.
Once the Summary screen is closed using the OK option, DBCA begins to create the database according to the values specified.

You will get the password management window, complete as desired and click Exit.
A new database now exists. It can be accessed via Oracle SQL*PLUS or other applications designed to work with an Oracle RAC database. You should make sure to execute the startoracle_root.sh script at system startup as root and the startoracle.sh script as user oracle.

sys password: sys123 (or 12345? or 123456?)
system password: system123

For Oracle RAC V9.2.0.1:
Please install Patch 2417903, available via MetaLink. It resolves an issue with slow startup of your instances.

# Pre Patch Installation tasks:
# -----------------------------
# Currently running CM processes i.e oracm and watchdogd
# have to be stopped(kill -9 ). This should be done
# on all the nodes on which CM for Real Application Clusters
# is running.
unzip 2417903.zip
cd 2417903
OPatch/opatch apply
# Patch Special Instructions:
# ---------------------------
# Make sure that all instances running under the ORACLE_HOME being
# patched are cleanly shutdown before installing this patch. Also
# ensure that the tool used to terminate the instance(s) has been
# exited cleanly.

# Post Patch Installation Tasks:
# ------------------------------
# Once the patch is installed on all the nodes, you need to start
# the Cluster Manager processes, i.e oracm and watchdogd. The
# script $ORACLE_HOME/oracm/admin/ocmstart.sh does the job. This
# should be done on all the nodes that need to be part of the cluster.
#

As user oracle (su - oracle), set ORACLE_SID on each node.
On rac1:
export ORACLE_SID=racdb1
On rac2:
export ORACLE_SID=racdb2

These exports can be added to the oracle user's .bashrc.
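
If the same .bashrc is shared or copied between nodes, the SID can be derived from the hostname instead of being hard-coded. The following is only a sketch: the function name sid_for_host is hypothetical, and it assumes the nodes are really named rac1 and rac2 (optionally with a domain suffix), as in the notes above.

```shell
# Map a node hostname to its instance SID (rac1 -> racdb1, rac2 -> racdb2).
# Assumption: hostnames follow the pattern rac<N>, optionally with a domain.
sid_for_host() {
    host=${1%%.*}                      # drop any domain suffix
    case "$host" in
        rac[0-9]*) echo "racdb${host#rac}" ;;
        *) return 1 ;;                 # unknown host: leave ORACLE_SID alone
    esac
}

# Fragment for the oracle user's ~/.bashrc:
if sid=$(sid_for_host "$(uname -n)"); then
    export ORACLE_SID="$sid"
fi
```

On a host whose name does not match the pattern, ORACLE_SID is left untouched.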

su - oracle
srvctl config database -d racdb

srvctl start database -d racdb
srvctl status database -d racdb

srvctl stop database -d racdb

srvctl start instance -d racdb -i racinst1

srvctl status instance -d racdb -i racinst1

srvctl stop instance -d racdb -i racinst1

SQL> select machine,failover_type,failover_method,failed_over,count(*) from v$session group by machine,failover_type,failover_method,failed_over;

# The following does not work:
# on rac1 and rac2:
# echo > listener.ora
# (the instances would then register with the listener automatically)

# The following does not work either:
# run sqlplus '/ as sysdba' on rac1 and rac2
# alter system register

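To start the software automatically at boot, add the following commands to /etc/init.d/dbora or /etc/rc.d/rc.local.
The environment settings from the oracle user's .bashrc must be copied into that script before the start commands: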
/oracle/product/9.2.0/oracm/bin/ocmstart.sh
/oracle/product/9.2.0/bin/gsdctl start
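
A boot-time fragment along these lines might look as follows. This is a sketch under assumptions, not a tested init script: the function name start_cluster_stack is hypothetical, the paths are the ones from the notes above, and running gsdctl as user oracle through su is an assumption.

```shell
# Sketch of a fragment for /etc/rc.d/rc.local, executed as root.
# Environment normally taken from the oracle user's .bashrc:
ORACLE_HOME=/oracle/product/9.2.0
export ORACLE_HOME

start_cluster_stack() {
    home=$1
    if [ ! -x "$home/oracm/bin/ocmstart.sh" ]; then
        echo "ocmstart.sh not found under $home/oracm/bin" >&2
        return 1
    fi
    # Start the Cluster Manager first, then the Global Services Daemon.
    # Assumption: gsdctl is run as user oracle.
    "$home/oracm/bin/ocmstart.sh" &&
        su - oracle -c "$home/bin/gsdctl start"
}

# Call from the boot script:
# start_cluster_stack "$ORACLE_HOME"
```

The function refuses to do anything when ocmstart.sh is missing, so a wrong ORACLE_HOME shows up as a clear message rather than a half-started stack.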

On the client (running sqlplus), connect-time failover can be used: with the RACDB alias below, when the instance and listener on rac1 are shut down, the client connects to rac2 instead.
cat tnsnames.ora
RACDB1 =
(DESCRIPTION =
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP)(HOST = 211.68.29.61)(PORT = 1521))
)
(CONNECT_DATA =
(SERVICE_NAME = RACDB )
)
)

RACDB2 =
(DESCRIPTION =
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP)(HOST = 211.68.29.62)(PORT = 1521))
)
(CONNECT_DATA =
(SERVICE_NAME = RACDB )
)
)
RACDB =
(DESCRIPTION =
(LOAD_BALANCE = off)
(failover = on)
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP)(HOST = 211.68.29.61)(PORT = 1521))
(ADDRESS = (PROTOCOL = TCP)(HOST = 211.68.29.62)(PORT = 1521))
)
(CONNECT_DATA =
(SERVICE_NAME = racdb)
)
)
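
The RACDB entry above simply lists both nodes in one ADDRESS_LIST with failover switched on. As a rough sketch, such an entry could be generated for any number of hosts. The function name tns_failover_entry is hypothetical; port 1521 and the parameter layout are copied from the entries above.

```shell
# Print a connect-time failover entry for tnsnames.ora.
# Usage: tns_failover_entry NET_SERVICE_NAME SERVICE_NAME HOST [HOST...]
tns_failover_entry() {
    name=$1
    service=$2
    shift 2
    printf '%s =\n' "$name"
    printf '(DESCRIPTION =\n(LOAD_BALANCE = off)\n(FAILOVER = on)\n(ADDRESS_LIST =\n'
    # One ADDRESS line per host; they are tried in the order given.
    for host in "$@"; do
        printf '(ADDRESS = (PROTOCOL = TCP)(HOST = %s)(PORT = 1521))\n' "$host"
    done
    printf ')\n(CONNECT_DATA =\n(SERVICE_NAME = %s)\n)\n)\n' "$service"
}
```

For example, tns_failover_entry RACDB racdb 211.68.29.61 211.68.29.62 prints an entry equivalent to the RACDB one above, which can then be appended to the client's tnsnames.ora.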

Another mode is TAF (Transparent Application Failover).
TAF enables the user to continue working over the new connection as if the original connection had never failed (that is, the session does not appear to break).
However, the application has to be customized to take advantage of it, and nobody runs sqlplus as their application, so TAF is rarely used in practice.

cat tnsnames.ora

rac1=
(DESCRIPTION =
(failover = on)
(load_balance = off)
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP)(HOST = 211.68.29.61)(PORT = 1521))
# (ADDRESS = (PROTOCOL = TCP)(HOST = 211.68.29.62)(PORT = 1521))
)
(CONNECT_DATA =
(service_name = racdb)
(failover_mode = (type = select) (method = basic)
(backup=rac2)(retries=20)(delay=15))
)
)
rac2=
(DESCRIPTION =
(failover = on)
(load_balance = on)
(ADDRESS_LIST =
# (ADDRESS = (PROTOCOL = TCP)(HOST = 211.68.29.61)(PORT = 1521))
(ADDRESS = (PROTOCOL = TCP)(HOST = 211.68.29.62)(PORT = 1521))
)
(CONNECT_DATA =
(service_name = racdb)
(failover_mode = (type = select) (method = basic)
(backup=rac1)(retries=20)(delay=15))
)
)

To enable database startup at boot:
ln -s /oracle/admin/racdb/pfile/init.ora /oracle/product/9.2.0/dbs/initracdb.ora
Then edit /etc/oratab (vi /etc/oratab) and change the last field of the database entry from N to Y.
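
The oratab edit can also be scripted. The sketch below assumes the usual oratab line format SID:ORACLE_HOME:N and that the entry is named racdb; the function name oratab_enable is hypothetical. It prints the modified content instead of editing the file in place, so the result can be reviewed first.

```shell
# Print the oratab file with the autostart flag of one entry changed N -> Y.
# Usage: oratab_enable SID FILE
oratab_enable() {
    # Only the line starting with "SID:" and ending in ":N" is rewritten.
    sed "s|^\($1:[^:]*\):N\$|\1:Y|" "$2"
}
```

A possible use, as root: oratab_enable racdb /etc/oratab > /tmp/oratab.new, check the result, then copy /tmp/oratab.new over /etc/oratab.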

Comments:
A.
Please verify that your Linux version and your Oracle RDBMS version are listed below as certified together:
OS VERSION      ORACLE VERSION    Certified
----------      --------------    ---------
RedHat 7.0      9.0.1.x           Yes
RedHat 7.1      9.0.1.x           Yes
RedHat AS 2.1   9.0.1.x, 9.2.x    Yes
SuSE 7.1        9.0.1.x           Yes
SuSE 7.2        9.0.1.x           Yes
SuSE SLES7      9.0.1.x, 9.2.x    Yes
Note: RedHat 7.2, 7.3 or 8.x is not certified with any Oracle version!!

B.
http://download-west.oracle.com/ ... pf_ocm.htm#CHDFGHAI

Watchdog Daemon
The Watchdog daemon (watchdogd) uses the standard Linux Watchdog timer to monitor selected system resources to prevent database corruption.

The Watchdog daemon monitors the Node Monitor and the Cluster Manager and passes notifications to the Watchdog timer at defined intervals. The behavior of the Watchdog timer is partially controlled by the CONFIG_WATCHDOG_NOWAYOUT kernel configuration parameter.

Oracle9i Real Application Clusters requires that you set the value of the CONFIG_WATCHDOG_NOWAYOUT configuration parameter to Y (disable watchdog shutdown on close). When the Watchdog timer detects an Oracle instance or service failure, it resets the server to avoid possible corruption of the database. If the value of the CONFIG_WATCHDOG_NOWAYOUT parameter is N and a failure is detected, the Watchdog timer does not reset the server.

Oracle9i Real Application Clusters uses the software implementation of the Watchdog timer provided by the Linux kernel.
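
Whether the running kernel was built with this option can be checked from its config file. This is a minimal sketch: the function name nowayout_enabled is hypothetical, and the location /boot/config-$(uname -r) is an assumption that holds on many distributions but not all.

```shell
# Succeeds if the given kernel config file has CONFIG_WATCHDOG_NOWAYOUT=y.
nowayout_enabled() {
    grep -q '^CONFIG_WATCHDOG_NOWAYOUT=y$' "$1"
}

# Assumed location of the running kernel's config file.
cfg="/boot/config-$(uname -r)"
if [ -r "$cfg" ]; then
    if nowayout_enabled "$cfg"; then
        echo "CONFIG_WATCHDOG_NOWAYOUT=y"
    else
        echo "CONFIG_WATCHDOG_NOWAYOUT is not set to y in $cfg"
    fi
fi
```

If the config file is not readable, the fragment prints nothing and the option has to be checked some other way.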

C.
If for any reason the Oracle installation didn't finish successfully, you might want to clean up the following files and directories before you start over again:
rm -rf /etc/oraInst.loc /etc/oratab /tmp/OraInstall
/tmp/
$ORACLE_BASE/*

D.
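The kernel parameter shmmax has to be set, and it must be set to half of the physical memory. Otherwise errors occur during the link (compile) step, and the database cannot be created properly. The fix is to set the kernel parameter correctly, run the relink all command as user oracle, clear the contents of the temp directory, and then create the database.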
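
The half-of-memory rule can be computed rather than worked out by hand. The sketch below reads MemTotal from /proc/meminfo; the function names half_ram_bytes and suggested_shmmax are hypothetical.

```shell
# Convert a memory size in kB into half of that amount in bytes.
half_ram_bytes() {
    echo $(( $1 * 1024 / 2 ))
}

# Read MemTotal (reported in kB) and print the suggested shmmax in bytes.
suggested_shmmax() {
    half_ram_bytes "$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)"
}
```

As root, the value could then be applied with sysctl -w kernel.shmmax=$(suggested_shmmax), or written to /proc/sys/kernel/shmmax. To make it survive a reboot it would also need to go into /etc/sysctl.conf.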

E. Fix for an OUI bug related to the Java VM:
Install of 9.2.0.2 Patchset on RAC and Linux Doesn't Finish

cd /oracle/oui/bin/linux
ln -s libclntsh.so.9.0 libclntsh.so

F. TAF test
from client :
sqlplus thtf/thtf@rac1
select instance_name from v$instance;
racdb1

from rac1 server:
select username,sid,serial# from GV$SESSION where username='THTF';
alter system disconnect session '19,49' post_transaction;

from client :
select instance_name from v$instance;
racdb2

from rac2 server:
select username,sid,serial# from GV$SESSION where username='THTF';
alter system disconnect session '10,39' post_transaction;

from client :
select instance_name from v$instance;
racdb1

Source: ITPUB blog, http://blog.itpub.net/756652/viewspace-242070/ (this post is reposted from there). If you repost it, please credit the source; otherwise legal liability will be pursued.
