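This was an unusual reimage, so some of the steps below were not run. (An ordinary reimage is far simpler, and most of the problems described here would probably never come up in one.)
This reimage, done early this year, is still fresh in my memory: we hit well over eight or ten problems. “ResecureMachine” had been run during the previous installation, and this time we reimaged only the db nodes, not the cell nodes. A lot is hard-coded in onecommand, such as the password “welcome1”, whereas passwords on a system with strong security enforced are extremely complex, so many of the commands that onecommand runs through dcli -g simply could not execute. Even after configuring SSH by hand there were still plenty of problems. I have almost forgotten how we got through that night.
A colleague and I slogged through that night together. Looking back, I got a lot out of it. Thanks to him for sticking with me the whole way; I learned a great deal!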
Each version of onecommand may differ somewhat (so far the differences look minor), but the overall steps are much the same.
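Because a special reimage like this one leaves some steps out, it can help to write the plan down before running anything. The sketch below only prints the commands it would run and executes nothing. It assumes the deploy script accepts `-i -s <step>` to run a single step, which should be checked against the usage output of your own onecommand version. The helper name `plan_steps` and the skip list are hypothetical.

```shell
#!/bin/sh
# Dry-run sketch: print one deploy command per step, skipping chosen steps.
# Assumption: "-i -s <step>" runs a single step; verify this for your version.
# Usage: plan_steps FIRST LAST "SKIP LIST"   (hypothetical helper, prints only)
plan_steps() {
    first=$1
    last=$2
    skip=$3
    step=$first
    while [ "$step" -le "$last" ]; do
        case " $skip " in
            *" $step "*) echo "# skipping step $step" ;;
            *)           echo "./deploy11203.sh -i -s $step" ;;
        esac
        step=$((step + 1))
    done
}

# Example: steps 8 to 10, leaving out step 9 (RunCalibrate) as in this reimage.
plan_steps 8 10 "9"
```

Printing the plan first makes it easy to review which steps touch the cell nodes before anything is actually run.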
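The annotations below are based on my analysis of the scripts executed during the installation. Where an annotation says where a step runs, it only indicates whether that script affects the db nodes or the cell nodes.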
[root@dm01db01 onecommand]# ./deploy11203.sh -l
INFO: Logging all actions in /opt/oracle.SupportTools/onecommand/tmp/dm01db01-20120331164744.log and traces in /opt/oracle.SupportTools/onecommand/tmp/dm01db01-20120331164744.trc
INFO: Loading configuration file /opt/oracle.SupportTools/onecommand/onecommand.params...
The steps in order are...
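Step 0 = ValidateEnv --runs on both the DB and CELL nodes; verifies that the whole installation environment is ready; must be checked before running the reimage
Step 1 = CreateWorkDir --creates the required directories; DB nodes only
Step 2 = UnzipFiles --unzips the files
Step 3 = setupSSHroot --configures SSH and sudo
Step 4 = UpdateEtcHosts --updates the hosts' /etc/hosts file (I suspect DNS and similar settings may also be configured here; unconfirmed)
Step 5 = CreateCellipinitora --on the db_nodes, creates the /etc/oracle/cell/network-config directory and creates cellip.ora and cellinit.ora
Step 6 = ValidateIB --runs on the db_nodes; InfiniBand checks (infinicheck : verify-topology -t quarterrack)
Step 7 = ValidateCell
Step 8 = PingRdsCheck --rdsping: all_group, all_ib_group
Step 9 = RunCalibrate --!!! runs an IOPS test on the cell_group nodes; skipped this time for fear that it would write to the celldisks
Step 10 = CreateUsers --creates groups on the db nodes: racoper asmadmin asmdba asmoper oinstall dba
--creates users on the db nodes: oracle grid
--changes the permissions of /opt/oracle.SupportTools/onecommand/tmp/ocmd-checkusers.pl
Step 11 = SetupSSHusers --configures SSH for grid and oracle
Step 12 = CreateGridDisks --first restarts several cell services, then drops the old celldisks and creates new ones
Step 13 = GridSwInstall --installs the grid software (apparently it only copies files; it seems to correspond to everything before the relink in a normal GUI install)
Step 14 = PatchGridHome --patches grid
Step 15 = RelinkRDSGI --relinks the grid software; many of the tools under bin, such as crsctl, exist only after the relink
Step 16 = GridRootScripts --runs the root scripts
Step 17 = DbSwInstall --installs the db software
Step 18 = PatchDBHomes --patches the db software
Step 19 = CreateASMDiskgroups --creates the ASM disk groups (seems to correspond to asmca in a GUI install)
Step 20 = DbcaDB --creates a test database with dbca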
Step 21 = DoUnlock /opt/oracle.SupportTools/onecommand/tmp/updsysctl.sh
/etc/sysctl.conf
unlocking /u01/app/11.2.0.3/grid for RDS relink
Step 22 = RelinkRDSDb /opt/oracle.SupportTools/onecommand/tmp/relinkrds.sh
Step 23 = LockUpGI
INFO: Locking up Grid Infrastructure home, and this will start the stack...
INFO: Running /u01/app/11.2.0.3/grid/crs/install/rootcrs.pl -patch using /opt/oracle.SupportTools/onecommand/tmp/rootcrspatch-20120330121453.sh...
SUCCESS: Locking /u01/app/11.2.0.3/grid completed
Step 24 = ApplySecurityFixes
=======================================================================
INFO: Dropping users not in sys,system,sysman,outlm,dbsnmp,wmsys and oracle_ocm...
INFO: Running on the local node: /bin/su oracle -c /opt/oracle.SupportTools/onecommand/tmp/dropunsecureusers.sh
INFO: Securing old executables..
SUCCESS: Ran /usr/bin/ssh -l root dm01db01 "/bin/chown root:root /opt/oracle.SupportTools/onecommand/tmp/chmodcrshome.sh; /bin/chmod 0766 /opt/oracle.SupportTools/onecommand/tmp/chmodcrshome.sh" and it returned: RC=0
INFO: Running as root: /usr/bin/ssh -l root dm01db02 /opt/oracle.SupportTools/onecommand/tmp/chmodcrshome.sh
INFO: Securing /u01/app/11.2.0.3/grid..
INFO: Going to run /bin/chmod -R 755 /opt/oracle.SupportTools/onecommand on nodes in /opt/oracle.SupportTools/onecommand/tmp/db_nodes as user root... #Step 24#
INFO: Running as root: /usr/bin/ssh -l root dm01db02 /opt/oracle.SupportTools/onecommand/tmp/DoAllcmds-20120330121852.sh
INFO: Removing temporary files from cells...
SUCCESS: Ran /usr/local/bin/dcli -g /opt/oracle.SupportTools/onecommand/tmp/db_nodes -l root "find /u01/app/grid -perm -2 -type f -print -exec chmod o-w {} \;" and it returned: RC=0
SUCCESS: Ran /usr/local/bin/dcli -g /opt/oracle.SupportTools/onecommand/tmp/db_nodes -l root "find /u01/app/oracle -perm -2 -type f -print -exec chmod o-w {} \;" and it returned: RC=0
INFO: Setting up snapshot controlfile for database wmrz...
INFO: Running on the local node: /bin/su oracle -c /opt/oracle.SupportTools/onecommand/tmp/configuresnapshotcontrolfile-wmrz.sh
INFO: Validating OSWatcher on cells ...
INFO: OSWatcher does not need updating in this image version ...
SUCCESS: Ran /usr/bin/ssh -l root dm01db01 "/bin/chown root:root /opt/oracle.SupportTools/onecommand/tmp/updsyslog.sh; /bin/chmod 0766 /opt/oracle.SupportTools/onecommand/tmp/updsyslog.sh" and it returned: RC=0
SUCCESS: Ran /usr/local/bin/dcli -g /opt/oracle.SupportTools/onecommand/tmp/db_nodes -l root "/sbin/service syslog restart" and it returned: RC=0
INFO: Going to stop and start clusterware..
INFO: Running /u01/app/11.2.0.3/grid/bin/crsctl start cluster -all to start clusterware on all nodes...
INFO: Mounting diskgroups across all nodes...
INFO: Running to mount diskgroup...
INFO: Diskgroup DATA_DM01 already running...
INFO: Running /bin/su grid -c "/u01/app/11.2.0.3/grid/bin/srvctl start diskgroup -g DATA_DM01" to mount diskgroup...
INFO: Diskgroup RECO_DM01 already running...
INFO: Time spent in step 24 ApplySecurityFixes is 248 seconds.
SUCCESS: Diskgroups should be mounted, going to check status of clusterware in 30 seconds...
INFO: Time spent in step 24 ApplySecurityFixes is 278 seconds.
INFO: Going to see if everything is back up by running crsctl stat res -t ...
SUCCESS: Successfully restarted Grid Infrastructure on all nodes in /opt/oracle.SupportTools/onecommand/tmp/db_nodes...
=======================================================================
Step 25 = SetupCellEmailAlerts --sets up email alerting for the cells
Step 26 = ResecureMachine --applies the hardened security configuration
[root@dm01db01 onecommand]#
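
The step 24 output above contains two line shapes that are easy to check mechanically: "Time spent in step N Name is S seconds" and "... returned: RC=N". As a rough sketch, the two filters below read a onecommand log on standard input and pull those lines out. The function names are hypothetical, and the patterns are inferred only from the lines shown in this listing, so other versions may log differently.

```shell
#!/bin/sh
# step_times: print "step name seconds" for every "Time spent in step" line.
# The pattern is inferred from the log excerpt in this post (an assumption).
step_times() {
    sed -n 's/.*Time spent in step \([0-9][0-9]*\) \([A-Za-z0-9_]*\) is \([0-9][0-9]*\) seconds.*/\1 \2 \3/p'
}

# failed_commands: print lines that report a non-zero return code.
# "|| true" keeps the exit status 0 when no failing line is found.
failed_commands() {
    grep 'returned: RC=[1-9]' || true
}

# Example (log path as printed by the deploy script at startup):
#   step_times < /opt/oracle.SupportTools/onecommand/tmp/dm01db01-20120331164744.log
#   failed_commands < /opt/oracle.SupportTools/onecommand/tmp/dm01db01-20120331164744.log
```

An empty result from `failed_commands` only means that no logged command reported a non-zero RC; it does not replace reading the log when a step misbehaves.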