Oracle Exadata Machine X4-2 Implementation Notes

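    I wrote this article after my first successful deployment of an Oracle Exadata Machine X4-2.
    It records the problems I ran into during that first implementation and how I resolved them.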

1. Error when running onecommand step 1 (Validate Configuration File).

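   This problem occurred because I had manually changed the operating system hostname with the vi editor, from the original management-network hostname (dm01dbadm01) to the client-network hostname (dm01db01). The error output was: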
[root@dm01db01 linux-x64]# ./install.sh -cf /opt/oracle.SupportTools/onecommand/linux-x64/tequ-dm01.xml -s 1

 Executing Validate Configuration File..............java.util.concurrent.ExecutionException: java.lang.reflect.InvocationTargetException
        at java.util.concurrent.FutureTask$Sync.innerGet(Unknown Source)
        at java.util.concurrent.FutureTask.get(Unknown Source)
        at oracle.onecommand.commandexec.utils.Parallelizer.getOEDAResults(Parallelizer.java:367)
        at oracle.onecommand.deploy.validation.DnsValidation.validateClusterDns(DnsValidation.java:137)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
        at java.lang.reflect.Method.invoke(Unknown Source)
        at oracle.onecommand.commandexec.utils.Parallelizer$ParallelCallable.run(Parallelizer.java:520)
        at oracle.onecommand.commandexec.utils.Parallelizer$ParallelCallable.call(Parallelizer.java:538)
        at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
        at java.util.concurrent.FutureTask.run(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)
Caused by: java.lang.reflect.InvocationTargetException
        ... 11 more
Caused by: oracle.onecommand.escommon.common.OcmdException: Unable to locate file dm01db01.tequ.com--etc-hosts
        at oracle.onecommand.escommon.common.EsCommonUtils.getFileInputStream(EsCommonUtils.java:834)
        at oracle.onecommand.escommon.common.FileUtils.readFile(FileUtils.java:103)
        at oracle.onecommand.deploy.validation.DnsValidation.getMapfromHostsFile(DnsValidation.java:322)
        at oracle.onecommand.deploy.validation.DnsValidation.validateDnsOnMachine(DnsValidation.java:216)
        ... 11 more
java.util.concurrent.ExecutionException: java.lang.reflect.InvocationTargetException
        at java.util.concurrent.FutureTask$Sync.innerGet(Unknown Source)
        at java.util.concurrent.FutureTask.get(Unknown Source)
        at oracle.onecommand.commandexec.utils.Parallelizer.getOEDAResults(Parallelizer.java:367)
        at oracle.onecommand.deploy.validation.DnsValidation.validateClusterDns(DnsValidation.java:137)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
        at java.lang.reflect.Method.invoke(Unknown Source)
        at oracle.onecommand.commandexec.utils.Parallelizer$ParallelCallable.run(Parallelizer.java:520)
        at oracle.onecommand.commandexec.utils.Parallelizer$ParallelCallable.call(Parallelizer.java:538)
        at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
        at java.util.concurrent.FutureTask.run(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)

......

Caused by: oracle.onecommand.escommon.common.OcmdException: Unable to locate file dm01db02.tequ.com--etc-hosts
        at oracle.onecommand.escommon.common.EsCommonUtils.getFileInputStream(EsCommonUtils.java:834)
        at oracle.onecommand.escommon.common.FileUtils.readFile(FileUtils.java:103)
        at oracle.onecommand.deploy.validation.DnsValidation.getMapfromHostsFile(DnsValidation.java:322)
        at oracle.onecommand.deploy.validation.DnsValidation.validateDnsOnMachine(DnsValidation.java:216)
        ... 11 more
java.util.concurrent.ExecutionException: java.lang.reflect.InvocationTargetException
        at java.util.concurrent.FutureTask$Sync.innerGet(Unknown Source)
        at java.util.concurrent.FutureTask.get(Unknown Source)
        at oracle.onecommand.commandexec.utils.Parallelizer.getOEDAResults(Parallelizer.java:367)
        at oracle.onecommand.deploy.validation.DnsValidation.validateClusterDns(DnsValidation.java:137)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
        at java.lang.reflect.Method.invoke(Unknown Source)
        at oracle.onecommand.commandexec.utils.Parallelizer$ParallelCallable.run(Parallelizer.java:520)
        at oracle.onecommand.commandexec.utils.Parallelizer$ParallelCallable.call(Parallelizer.java:538)
        at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
        at java.util.concurrent.FutureTask.run(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)
Caused by: java.lang.reflect.InvocationTargetException
        ... 11 more
Caused by: oracle.onecommand.escommon.common.OcmdException: Unable to locate file dm01db02.tequ.com--etc-hosts
        at oracle.onecommand.escommon.common.EsCommonUtils.getFileInputStream(EsCommonUtils.java:834)
        at oracle.onecommand.escommon.common.FileUtils.readFile(FileUtils.java:103)
        at oracle.onecommand.deploy.validation.DnsValidation.getMapfromHostsFile(DnsValidation.java:322)
        at oracle.onecommand.deploy.validation.DnsValidation.validateDnsOnMachine(DnsValidation.java:216)
        ... 11 more
java.util.concurrent.ExecutionException: java.lang.reflect.InvocationTargetException
        at java.util.concurrent.FutureTask$Sync.innerGet(Unknown Source)
        at java.util.concurrent.FutureTask.get(Unknown Source)
        at oracle.onecommand.commandexec.utils.Parallelizer.getOEDAResults(Parallelizer.java:367)
        at oracle.onecommand.deploy.validation.DnsValidation.validateClusterDns(DnsValidation.java:137)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
        at java.lang.reflect.Method.invoke(Unknown Source)
        at oracle.onecommand.commandexec.utils.Parallelizer$ParallelCallable.run(Parallelizer.java:520)
        at oracle.onecommand.commandexec.utils.Parallelizer$ParallelCallable.call(Parallelizer.java:538)
        at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
        at java.util.concurrent.FutureTask.run(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)
Caused by: java.lang.reflect.InvocationTargetException
        ... 11 more
Caused by: oracle.onecommand.escommon.common.OcmdException: Unable to locate file dm01db02.tequ.com--etc-hosts
        at oracle.onecommand.escommon.common.EsCommonUtils.getFileInputStream(EsCommonUtils.java:834)
        at oracle.onecommand.escommon.common.FileUtils.readFile(FileUtils.java:103)
        at oracle.onecommand.deploy.validation.DnsValidation.getMapfromHostsFile(DnsValidation.java:322)
        at oracle.onecommand.deploy.validation.DnsValidation.validateDnsOnMachine(DnsValidation.java:216)
        ... 11 more
.....
 Validating cluster: cluster-clu1
  Locating machines...
  Verifying operating systems...
  Validating cluster networks......
  Validating network connectivity............
  Validating NTP setup..........
  Validating physical disks on storage cells........................................................
 Completed validation...
 
 SUCCESS: Validated NTP server 10.0.8.114
 SUCCESS: Found Operating system LinuxPhysical and configuration file expects LinuxPhysical on machine dm01db01.tequ.com, machine type: compute
 SUCCESS: Found Operating system LinuxPhysical and configuration file expects LinuxPhysical on machine dm01db02.tequ.com, machine type: compute
 SUCCESS: 
 SUCCESS: Required file /opt/oracle.SupportTools/onecommand/linux-x64/WorkDir/p13390677_112040_Linux-x86-64_3of7.zip exists...
 SUCCESS: Required file /opt/oracle.SupportTools/onecommand/linux-x64/WorkDir/p18371656_112040_Linux-x86-64.zip exists...
 SUCCESS: Required file /opt/oracle.SupportTools/onecommand/linux-x64/WorkDir/p13390677_112040_Linux-x86-64_2of7.zip exists...
 SUCCESS: Required file /opt/oracle.SupportTools/onecommand/linux-x64/WorkDir/p13390677_112040_Linux-x86-64_1of7.zip exists...
 SUCCESS: Required file /opt/oracle.SupportTools/onecommand/linux-x64/WorkDir/p6880880_112000_Linux-x86-64.zip exists...
 ......

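    At first I assumed this error was a hosts configuration problem. After calming down and thinking it over, I noticed that the error was about two files that could not be found: dm01db01.tequ.com--etc-hosts and dm01db02.tequ.com--etc-hosts. Could these be two separate files on disk? I searched the onecommand directory: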
[root@dm01db01 linux-x64]# find . -name *host*
./WorkDir/dm01dbadm02.tequ.com--etc-hosts
./WorkDir/dm01dbadm01.tequ.com--etc-hosts

   Sure enough, the WorkDir directory contained two similar files named after the original hostnames.

[root@dm01db01 linux-x64]# cd WorkDir
[root@dm01db01 WorkDir]# ls
dm01celadm01.tequ.com-cpuInfo.txt  dm01db02.tequ.com-cpuInfo.txt     dm01dbadm02.tequ.com-memInfo.txt
dm01celadm01.tequ.com-memInfo.txt  dm01db02.tequ.com-memInfo.txt     dm01dbadm02.tequ.com-ntpConf.txt
dm01celadm02.tequ.com-cpuInfo.txt  dm01db02.tequ.com-ntpConf.txt     p13390677_112040_Linux-x86-64_1of7.zip
dm01celadm02.tequ.com-memInfo.txt  dm01dbadm01.tequ.com-cpuInfo.txt  p13390677_112040_Linux-x86-64_2of7.zip
dm01celadm03.tequ.com-cpuInfo.txt  dm01dbadm01.tequ.com--etc-hosts   p13390677_112040_Linux-x86-64_3of7.zip
dm01celadm03.tequ.com-memInfo.txt  dm01dbadm01.tequ.com-memInfo.txt  p18371656_112040_Linux-x86-64.zip
dm01db01.tequ.com-cpuInfo.txt      dm01dbadm01.tequ.com-ntpConf.txt  p6880880_112000_Linux-x86-64.zip
dm01db01.tequ.com-memInfo.txt      dm01dbadm02.tequ.com-cpuInfo.txt
dm01db01.tequ.com-ntpConf.txt      dm01dbadm02.tequ.com--etc-hosts
[root@dm01db01 WorkDir]# ls *host*
dm01dbadm01.tequ.com--etc-hosts  dm01dbadm02.tequ.com--etc-hosts
[root@dm01db01 WorkDir]# cat dm01dbadm01.tequ.com--etc-hosts 
#### BEGIN Generated by Exadata. DO NOT MODIFY ####
127.0.0.1       localhost.localdomain   localhost


192.168.10.1    dm01db01-priv1.tequ.com dm01db01-priv1
192.168.10.2    dm01db01-priv2.tequ.com dm01db01-priv2
10.0.3.10       dm01db01.tequ.com       dm01db01
10.255.255.10   dm01dbadm01.tequ.com    dm01dbadm01


192.168.10.3    dm01db02-priv1.tequ.com dm01db02-priv1
192.168.10.4    dm01db02-priv2.tequ.com dm01db02-priv2
10.0.3.11       dm01db02.tequ.com       dm01db02
10.255.255.11   dm01dbadm02.tequ.com    dm01dbadm02


10.0.3.13       dm01db02-vip.tequ.com   dm01db02-vip
10.0.3.12       dm01db01-vip.tequ.com   dm01db01-vip
#### END Generated by Exadata ####

   The content of this file is identical to /etc/hosts, which shows that onecommand reads the *--etc-hosts files under WorkDir rather than reading /etc/hosts directly.

I simply copied the two files under the new hostnames:
[root@dm01db01 WorkDir]# cp dm01dbadm01.tequ.com--etc-hosts dm01db01.tequ.com--etc-hosts
[root@dm01db01 WorkDir]# cp dm01dbadm02.tequ.com--etc-hosts dm01db02.tequ.com--etc-hosts
[root@dm01db01 WorkDir]# 
[root@dm01db01 WorkDir]# cd ..
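
The two cp commands above can be wrapped in a small helper. This is only a sketch: the function name clone_cached_hosts is hypothetical, and it assumes the cached files follow the FQDN--etc-hosts naming seen in WorkDir. It refuses to overwrite a file that already exists under the new name.

```shell
# Hypothetical helper (not part of onecommand): copy the cached
# "<old-fqdn>--etc-hosts" file in a WorkDir to "<new-fqdn>--etc-hosts".
# Arguments: WorkDir path, old FQDN, new FQDN.
clone_cached_hosts() {
    cch_src="$1/${2}--etc-hosts"
    cch_dst="$1/${3}--etc-hosts"

    if [ ! -f "$cch_src" ]; then
        echo "missing source: $cch_src" >&2
        return 1
    fi
    if [ -e "$cch_dst" ]; then
        # Leave an existing file alone instead of overwriting it.
        echo "already present: $cch_dst" >&2
        return 0
    fi
    # -p keeps the original mode and timestamps.
    cp -p "$cch_src" "$cch_dst"
}
```

In this deployment it would be called once per database server, for example: clone_cached_hosts WorkDir dm01dbadm01.tequ.com dm01db01.tequ.com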

Then I ran the validation command again:
[root@dm01db01 linux-x64]# ./install.sh -cf /opt/oracle.SupportTools/onecommand/linux-x64/tequ-dm01.xml -s 1

 Executing Validate Configuration File...................
 Validating cluster: cluster-clu1
  Locating machines...
  Verifying operating systems...
  Validating cluster networks......
  Validating network connectivity............
  Validating NTP setup..........
  Validating physical disks on storage cells........................................................
 Completed validation...
 
 SUCCESS: 10.255.255.10 configured correctly on machine dm01db01.tequ.com
 SUCCESS: 10.0.3.10 configured correctly on machine dm01db01.tequ.com
 SUCCESS: 10.255.255.11 configured correctly on machine dm01db01.tequ.com
 SUCCESS: 10.0.3.11 configured correctly on machine dm01db01.tequ.com
 SUCCESS: 10.0.3.13 configured correctly on machine dm01db01.tequ.com
 SUCCESS: 10.0.3.12 configured correctly on machine dm01db01.tequ.com
 SUCCESS: 10.255.255.10 configured correctly on machine dm01db02.tequ.com
 SUCCESS: 10.0.3.10 configured correctly on machine dm01db02.tequ.com
 SUCCESS: 10.255.255.11 configured correctly on machine dm01db02.tequ.com
 SUCCESS: 10.0.3.11 configured correctly on machine dm01db02.tequ.com
 SUCCESS: 10.0.3.13 configured correctly on machine dm01db02.tequ.com
 SUCCESS: 10.0.3.12 configured correctly on machine dm01db02.tequ.com
 SUCCESS: Validated NTP server 10.0.8.114
 SUCCESS: Found Operating system LinuxPhysical and configuration file expects LinuxPhysical on machine dm01db02.tequ.com, machine type: compute
 SUCCESS: Found Operating system LinuxPhysical and configuration file expects LinuxPhysical on machine dm01db01.tequ.com, machine type: compute
 SUCCESS: 
 SUCCESS: NTP servers on machine dm01db02.tequ.com verified successfully
 SUCCESS: NTP servers on machine dm01db01.tequ.com verified successfully
 SUCCESS: Required file /opt/oracle.SupportTools/onecommand/linux-x64/WorkDir/p13390677_112040_Linux-x86-64_3of7.zip exists...
 SUCCESS: Required file /opt/oracle.SupportTools/onecommand/linux-x64/WorkDir/p18371656_112040_Linux-x86-64.zip exists...
 SUCCESS: Required file /opt/oracle.SupportTools/onecommand/linux-x64/WorkDir/p13390677_112040_Linux-x86-64_2of7.zip exists...
 SUCCESS: Required file /opt/oracle.SupportTools/onecommand/linux-x64/WorkDir/p13390677_112040_Linux-x86-64_1of7.zip exists...
 SUCCESS: Required file /opt/oracle.SupportTools/onecommand/linux-x64/WorkDir/p6880880_112000_Linux-x86-64.zip exists...
 
 Following errors were found...
 ERROR: 10.0.3.16 with hostname dm01-scan is not configured in DNS and in /etc/hosts on dm01db01.tequ.com
 ERROR: 10.0.3.14 with hostname dm01-scan is not configured in DNS and in /etc/hosts on dm01db01.tequ.com
 ERROR: 10.0.3.15 with hostname dm01-scan is not configured in DNS and in /etc/hosts on dm01db01.tequ.com
 ERROR: 10.0.3.16 with hostname dm01-scan is not configured in DNS and in /etc/hosts on dm01db02.tequ.com
 ERROR: 10.0.3.14 with hostname dm01-scan is not configured in DNS and in /etc/hosts on dm01db02.tequ.com
 ERROR: 10.0.3.15 with hostname dm01-scan is not configured in DNS and in /etc/hosts on dm01db02.tequ.com
 ERROR: Encountered error while checking NTP server. Error getting time from NTP server: 10.0.5.114
 
 Errors occured...

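    The earlier Java errors no longer appeared. The output also reported that dm01-scan could not be resolved, so I checked at the operating system level: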
[root@dm01db01 bin]# nslookup dm01-scan
Server:         10.0.8.114
Address:        10.0.8.114#53

Name:   dm01-scan.tequ.com
Address: 10.0.3.15
Name:   dm01-scan.tequ.com
Address: 10.0.3.16
Name:   dm01-scan.tequ.com
Address: 10.0.3.14
   Both dm01-scan and dm01-scan.domain resolved without any problem, so I ignored this error.
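
To compare the SCAN addresses returned by DNS with the three addresses named in the validation errors, the nslookup output can be reduced to a plain list. The sketch below assumes the output layout shown above (a Name: line followed by an Address: line); scan_addresses is a hypothetical function name.

```shell
# Hypothetical helper: read nslookup output on stdin and print only the
# addresses that follow a "Name:" line. The DNS server's own address
# (the "Address: ...#53" line) comes before any "Name:" line and is skipped.
scan_addresses() {
    awk '
        /^Name:/            { seen = 1; next }
        seen && /^Address:/ { print $2; seen = 0 }
    '
}
```

Typical use would be: nslookup dm01-scan | scan_addresses | sort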

   More than one NTP server can be configured. As long as at least one of them is reachable for now, the NTP error above can be ignored.

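   One more point worth stressing: the log directory under the onecommand directory records every onecommand run in detail, so other errors can be tracked down in the corresponding log file.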
[root@dm01db01 oracle.SupportTools]# cd /opt/oracle.SupportTools/onecommand/log
[root@dm01db01 log]# ll
total 33328
-rw-r--r-- 1 root root   61426 Jun 11 23:41 log.out
-rw-r--r-- 1 root root   57376 Jun 11 14:15 Step10_Initialize_Cluster_Software_140611_140220.out
-rw-r--r-- 1 root root 4369159 Jun 11 14:33 Step11_Install_Database_Software_140611_142500.out
-rw-r--r-- 1 root root    2136 Jun 11 14:38 Step12_Relink_Database_with_RDS_140611_143823.out
-rw-r--r-- 1 root root   84212 Jun 11 14:39 Step13_Create_ASM_Diskgroups_140611_143851.out
-rw-r--r-- 1 root root  217597 Jun 11 14:41 Step14_Create_Databases_140611_144043.out
-rw-r--r-- 1 root root  235300 Jun 11 19:52 Step14_Create_Databases_140611_195204.out
-rw-r--r-- 1 root root  236827 Jun 11 20:35 Step14_Create_Databases_140611_203506.out
-rw-r--r-- 1 root root  235302 Jun 11 20:43 Step14_Create_Databases_140611_204302.out
-rw-r--r-- 1 root root  225397 Jun 11 21:07 Step14_Create_Databases_140611_204615.out
-rw-r--r-- 1 root root  402325 Jun 11 21:14 Step15_Apply_Security_Fixes_140611_210805.out
-rw-r--r-- 1 root root  655789 Jun 11 21:23 Step16_Create_Installation_Summary_140611_212222.out
-rw-r--r-- 1 root root  741416 Jun 11 21:35 Step17_Resecure_Machine_140611_212329.out
-rw-r--r-- 1 root root   28409 Jun 11 23:32 Step17_Resecure_Machine_140611_233217.out
-rw-r--r-- 1 root root  256551 Jun 11 01:39 Step1_Validate_Configuration_File_140609_141934.out
-rw-r--r-- 1 root root  492178 Jun 11 01:39 Step1_Validate_Configuration_File_140609_142913.out
-rw-r--r-- 1 root root  492398 Jun 11 01:39 Step1_Validate_Configuration_File_140609_165947.out
-rw-r--r-- 1 root root  494475 Jun 11 01:39 Step1_Validate_Configuration_File_140609_171926.out
-rw-r--r-- 1 root root  495927 Jun 11 01:39 Step1_Validate_Configuration_File_140610_095109.out
-rw-r--r-- 1 root root  491049 Jun 11 01:39 Step1_Validate_Configuration_File_140610_104933.out
-rw-r--r-- 1 root root  227377 Jun 11 01:39 Step1_Validate_Configuration_File_140610_111043.out
-rw-r--r-- 1 root root  581537 Jun 11 01:39 Step1_Validate_Configuration_File_140610_111825.out
-rw-r--r-- 1 root root  580946 Jun 11 01:39 Step1_Validate_Configuration_File_140610_112825.out
-rw-r--r-- 1 root root  580689 Jun 11 01:39 Step1_Validate_Configuration_File_140610_114206.out
-rw-r--r-- 1 root root  583397 Jun 11 01:39 Step1_Validate_Configuration_File_140610_141909.out
-rw-r--r-- 1 root root  487083 Jun 11 01:39 Step1_Validate_Configuration_File_140610_143318.out
-rw-r--r-- 1 root root  534536 Jun 11 01:39 Step1_Validate_Configuration_File_140610_143842.out
-rw-r--r-- 1 root root  531754 Jun 11 01:39 Step1_Validate_Configuration_File_140610_145105.out
-rw-r--r-- 1 root root  585919 Jun 11 01:39 Step1_Validate_Configuration_File_140610_235714.out
-rw-r--r-- 1 root root  490349 Jun 11 01:39 Step1_Validate_Configuration_File_140611_005853.out
-rw-r--r-- 1 root root  489191 Jun 11 01:39 Step1_Validate_Configuration_File_140611_010318.out
-rw-r--r-- 1 root root  489267 Jun 11 01:39 Step1_Validate_Configuration_File_140611_010452.out
-rw-r--r-- 1 root root  489311 Jun 11 01:39 Step1_Validate_Configuration_File_140611_010935.out
-rw-r--r-- 1 root root  490987 Jun 11 01:39 Step1_Validate_Configuration_File_140611_011338.out
-rw-r--r-- 1 root root  491003 Jun 11 01:42 Step1_Validate_Configuration_File_140611_014106.out
-rw-r--r-- 1 root root 2468856 Jun 11 01:39 Step2_Setup_Required_Files_140610_113549.out
-rw-r--r-- 1 root root 2458010 Jun 11 01:39 Step2_Setup_Required_Files_140611_011643.out
-rw-r--r-- 1 root root   48406 Jun 11 01:46 Step2_Setup_Required_Files_140611_014537.out
-rw-r--r-- 1 root root 2468732 Jun 11 01:50 Step2_Setup_Required_Files_140611_014853.out
-rw-r--r-- 1 root root 2861511 Jun 11 10:10 Step2_Setup_Required_Files_140611_100656.out
-rw-r--r-- 1 root root  663710 Jun 11 01:39 Step3_Update_Nodes_for_Eighth_Rack_140610_113946.out
-rw-r--r-- 1 root root  697616 Jun 11 10:13 Step3_Update_Nodes_for_Eighth_Rack_140611_101020.out
-rw-r--r-- 1 root root  666778 Jun 11 10:27 Step3_Update_Nodes_for_Eighth_Rack_140611_102543.out
-rw-r--r-- 1 root root  386914 Jun 11 10:40 Step4_Create_Users_140611_103752.out
-rw-r--r-- 1 root root   17602 Jun 11 10:40 Step5_Setup_Cell_Connectivity_140611_104041.out
-rw-r--r-- 1 root root  701735 Jun 11 10:45 Step6_Verify_Infiniband_and_Calibrate_Cells_140611_104104.out
-rw-r--r-- 1 root root  697829 Jun 11 11:06 Step6_Verify_Infiniband_and_Calibrate_Cells_140611_110244.out
-rw-r--r-- 1 root root  771081 Jun 11 11:28 Step6_Verify_Infiniband_and_Calibrate_Cells_140611_112140.out
-rw-r--r-- 1 root root   23701 Jun 11 11:33 Step7_Create_Cell_Disks_140611_113025.out
-rw-r--r-- 1 root root  590542 Jun 11 11:34 Step8_Create_Grid_Disks_140611_113429.out
-rw-r--r-- 1 root root   70332 Jun 11 11:38 Step9_Install_Cluster_Software_140611_113621.out
-rw-r--r-- 1 root root   70570 Jun 11 12:00 Step9_Install_Cluster_Software_140611_115755.out
-rw-r--r-- 1 root root  190715 Jun 11 14:01 Step9_Install_Cluster_Software_140611_135339.out
-rw-r--r-- 1 root root   13982 Jun 11 19:48 UndoStep14_Create_Databases_140611_194842.out
-rw-r--r-- 1 root root   14250 Jun 11 19:51 UndoStep14_Create_Databases_140611_195103.out
-rw-r--r-- 1 root root   14004 Jun 11 19:51 UndoStep14_Create_Databases_140611_195135.out
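
With this many log files, it helps to jump straight to the newest log of a given step and its first error lines. The following is a minimal sketch, assuming the StepN_Name_YYMMDD_HHMMSS.out naming shown in the listing (so that sorting by name also sorts by time within one step). Both function names are hypothetical.

```shell
# Hypothetical helper: print the newest log file of a given step number.
# Relies on the timestamp suffix in the file name; the shell expands the
# glob in sorted order, so the last match is the most recent run.
latest_step_log() {
    lsl_latest=
    for lsl_f in "$1"/Step"$2"_*.out; do
        [ -e "$lsl_f" ] || continue
        lsl_latest=$lsl_f
    done
    [ -n "$lsl_latest" ] && printf '%s\n' "$lsl_latest"
}

# Hypothetical helper: show the first few lines of a log that mention an
# OCMD error code or an OcmdException, prefixed with their line numbers.
first_errors() {
    grep -n -E 'OCMD-[0-9]+|OcmdException' "$1" | head -n 5
}
```

For example: first_errors "$(latest_step_log /opt/oracle.SupportTools/onecommand/log 2)"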

2. Error when running onecommand step 2 (Setup Required Files).

Below is the error from running onecommand step 2:

[root@dm01db01 onecommand]# ./install.sh -cf /opt/oracle.SupportTools/onecommand/tequ-dm01.xml -s 2

 Executing Setup Required Files..
 Copying and extracting required files...
 Required files are:
 /opt/oracle.SupportTools/onecommand/WorkDir/p13390677_112040_Linux-x86-64_1of7.zip
 /opt/oracle.SupportTools/onecommand/WorkDir/p13390677_112040_Linux-x86-64_2of7.zip
 /opt/oracle.SupportTools/onecommand/WorkDir/p13390677_112040_Linux-x86-64_3of7.zip
 /opt/oracle.SupportTools/onecommand/WorkDir/p18371656_112040_Linux-x86-64.zip
 /opt/oracle.SupportTools/onecommand/WorkDir/p6880880_112000_Linux-x86-64.zip
 Copying required files...
 Checking status of remote files..........
 Getting status of local files............
 Creating symbolic link for file /opt/oracle.SupportTools/onecommand/WorkDir/p13390677_112040_Linux-x86-64_1of7.zip at /opt/oracle.SupportTools/onecommand/p13390677_112040_Linux-x86-64_1of7.zip.
 Creating symbolic link for file /opt/oracle.SupportTools/onecommand/WorkDir/p13390677_112040_Linux-x86-64_2of7.zip at /opt/oracle.SupportTools/onecommand/p13390677_112040_Linux-x86-64_2of7.zip.
 Creating symbolic link for file /opt/oracle.SupportTools/onecommand/WorkDir/p13390677_112040_Linux-x86-64_3of7.zip at /opt/oracle.SupportTools/onecommand/p13390677_112040_Linux-x86-64_3of7.zip.
 Creating symbolic link for file /opt/oracle.SupportTools/onecommand/WorkDir/p18371656_112040_Linux-x86-64.zip at /opt/oracle.SupportTools/onecommand/p18371656_112040_Linux-x86-64.zip.
 Creating symbolic link for file /opt/oracle.SupportTools/onecommand/WorkDir/p6880880_112000_Linux-x86-64.zip at /opt/oracle.SupportTools/onecommand/Software/patches/p6880880_112000_Linux-x86-64.zip.
 Copying file: p18371656_112040_Linux-x86-64.zip to node dm01db02.tequ.com.
 Copying file: p6880880_112000_Linux-x86-64.zip to node dm01db02.tequ.com............
 Completed copying files.....
 Extracting required files............................
 Copying resourcecontrol and other required files........................
 Execution Exception in future get
 OCMD-02624: Error while executing command {0}.java.lang.reflect.InvocationTargetException
 Error running Setup Required Files error message Error running oracle.onecommand.deploy.software.SoftwareUtils method setupRequiredFiles

    This error output gives no indication of the cause.

I checked /opt/oracle.SupportTools/onecommand/log/Step2_Setup_Required_Files_140611_014853.out and located the first error. The relevant log output is:
......
2014-06-11 01:50:26,449 [FINE  ][MDThread][        KommandOutput:95] ======
2014-06-11 01:50:26,449 [FINE  ][MDThread][        KommandOutput:64] # of kommand outputs 1
2014-06-11 01:50:26,449 [FINE  ][MDThread][          RunCommand:170] Ran commands, elapsed time = 17006 mS
2014-06-11 01:50:26,455 [FINE  ][MDThread][        KommandOutput:79] ======
2014-06-11 01:50:26,455 [FINE  ][MDThread][        KommandOutput:80] Output
2014-06-11 01:50:26,455 [FINE  ][MDThread][        KommandOutput:81] ======
2014-06-11 01:50:26,455 [FINE  ][MDThread][        KommandOutput:84] Ret code = <52>
2014-06-11 01:50:26,455 [FINE  ][MDThread][        KommandOutput:86] From node dm01celadm01.tequ.com
2014-06-11 01:50:26,455 [FINE  ][MDThread][        KommandOutput:89] ## Output Start
2014-06-11 01:50:26,455 [FINE  ][MDThread][       EsCommonUtils:596] OCMD-00052: Node dm01celadm01.tequ.com appears to be down.
2014-06-11 01:50:26,456 [FINE  ][MDThread][        KommandOutput:91] ## Output End
2014-06-11 01:50:26,456 [FINE  ][MDThread][        KommandOutput:95] ======
2014-06-11 01:50:26,456 [FINE  ][MDThread][        OcmdException:62] Throwing OcmdException... message:Command [mkdir -p /opt/oracle.SupportTools] run on node 10.255.255.12 as user root did not execute successfully...
2014-06-11 01:50:26,456 [FINE  ][MDThread][        OcmdException:98] Stack trace...
2014-06-11 01:50:26,457 [FINE  ][MDThread][       OcmdException:135] OcmdException from node dm01db01.tequ.com return code = 2 output string: Command [mkdir -p /opt/oracle.SupportTools] run on node 10.255.255.12 as user root did not execute successfully... stack trace = java.lang.Throwable
        at oracle.onecommand.escommon.common.OcmdException.ocmdException(OcmdException.java:95)
        at oracle.onecommand.escommon.common.OcmdException.(OcmdException.java:64)
        at oracle.onecommand.commandexec.utils.CommonUtils.checkKommandOutput(CommonUtils.java:1369)
        at oracle.onecommand.commandexec.utils.RemoteFileUtils.sftpPutFile(RemoteFileUtils.java:1491)
        at oracle.onecommand.commandexec.utils.RemoteFileUtils.sftpPutFile(RemoteFileUtils.java:1378)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
        at java.lang.reflect.Method.invoke(Unknown Source)
        at oracle.onecommand.commandexec.utils.CallableReflectionMethod.runMethodInParallel(CallableReflectionMethod.java:121)
        at oracle.onecommand.commandexec.utils.CallableReflectionMethod.call(CallableReflectionMethod.java:70)
        at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
        at java.util.concurrent.FutureTask.run(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)

2014-06-11 01:50:26,457 [INFO  ][    main][          RunCommand:714] Execution Exception in future get
2014-06-11 01:50:26,458 [INFO  ][    main][          RunCommand:721] OCMD-02624: Error while executing command {0}.java.lang.reflect.InvocationTargetException

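   On the surface, this says that the mkdir command failed on dm01celadm01 (a storage server), yet running the same command manually on the storage server worked fine. The likely cause was that the name dm01celadm01.tequ.com could not be resolved on the database server nodes. Checking /etc/hosts on the database servers confirmed that there were no entries for the three storage servers, so I updated the hosts file on both database servers to the following (the storage server entries are the last three):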
#### BEGIN Generated by Exadata. DO NOT MODIFY ####
127.0.0.1 localhost.localdomain localhost

192.168.10.1 dm01db01-priv1.tequ.com dm01db01-priv1
192.168.10.2 dm01db01-priv2.tequ.com dm01db01-priv2
10.0.3.10 dm01db01.tequ.com dm01db01
10.255.255.10 dm01dbadm01.tequ.com dm01dbadm01

192.168.10.3    dm01db02-priv1.tequ.com dm01db02-priv1
192.168.10.4    dm01db02-priv2.tequ.com dm01db02-priv2
10.0.3.11       dm01db02.tequ.com       dm01db02
10.255.255.11   dm01dbadm02.tequ.com    dm01dbadm02

10.0.3.13       dm01db02-vip.tequ.com   dm01db02-vip
10.0.3.12       dm01db01-vip.tequ.com   dm01db01-vip

10.255.255.12   dm01celadm01.tequ.com   dm01celadm01
10.255.255.13   dm01celadm02.tequ.com   dm01celadm02
10.255.255.14   dm01celadm03.tequ.com   dm01celadm03
#### END Generated by Exadata ####
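
Before rerunning the step, a quick check can confirm that every node name appears in the hosts file. This is a sketch only: missing_hosts is a hypothetical function, and it matches names literally against the fields after the IP address, skipping lines that start with #.

```shell
# Hypothetical helper: print each given name that has no entry in a
# hosts-format file. Arguments: hosts file path, then one or more names.
missing_hosts() {
    mh_file=$1
    shift
    for mh_name in "$@"; do
        awk -v n="$mh_name" '
            /^[[:space:]]*#/ { next }                          # skip comment lines
            { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
            END { exit found ? 0 : 1 }
        ' "$mh_file" || printf '%s\n' "$mh_name"
    done
}
```

For example: missing_hosts /etc/hosts dm01celadm01 dm01celadm02 dm01celadm03 prints nothing when all three storage servers are present.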

Then I ran onecommand again:
[root@dm01db01 onecommand]# ./install.sh -cf /opt/oracle.SupportTools/onecommand/tequ-dm01.xml -s 2

 Executing Setup Required Files.
 Copying and extracting required files...
 Required files are:
 /opt/oracle.SupportTools/onecommand/WorkDir/p13390677_112040_Linux-x86-64_1of7.zip
 /opt/oracle.SupportTools/onecommand/WorkDir/p13390677_112040_Linux-x86-64_2of7.zip
 /opt/oracle.SupportTools/onecommand/WorkDir/p13390677_112040_Linux-x86-64_3of7.zip
 /opt/oracle.SupportTools/onecommand/WorkDir/p18371656_112040_Linux-x86-64.zip
 /opt/oracle.SupportTools/onecommand/WorkDir/p6880880_112000_Linux-x86-64.zip
 Copying required files...
 Checking status of remote files..........
 Checking status of existing files on remote nodes....
 Getting status of local files.................
 Creating symbolic link for file /opt/oracle.SupportTools/onecommand/WorkDir/p13390677_112040_Linux-x86-64_1of7.zip at /opt/oracle.SupportTools/onecommand/p13390677_112040_Linux-x86-64_1of7.zip.
 Creating symbolic link for file /opt/oracle.SupportTools/onecommand/WorkDir/p13390677_112040_Linux-x86-64_2of7.zip at /opt/oracle.SupportTools/onecommand/p13390677_112040_Linux-x86-64_2of7.zip.
 Creating symbolic link for file /opt/oracle.SupportTools/onecommand/WorkDir/p13390677_112040_Linux-x86-64_3of7.zip at /opt/oracle.SupportTools/onecommand/p13390677_112040_Linux-x86-64_3of7.zip.
 Creating symbolic link for file /opt/oracle.SupportTools/onecommand/WorkDir/p18371656_112040_Linux-x86-64.zip at /opt/oracle.SupportTools/onecommand/p18371656_112040_Linux-x86-64.zip.
 Creating symbolic link for file /opt/oracle.SupportTools/onecommand/WorkDir/p6880880_112000_Linux-x86-64.zip at /opt/oracle.SupportTools/onecommand/Software/patches/p6880880_112000_Linux-x86-64.zip..
 Extracting required files........................
 Copying resourcecontrol and other required files.............................................................................................................................
 Creating databasemachine.xml for EM discovery
 Done Creating databasemachine.xml for EM discovery.
 Successfully completed execution of step Setup Required Files [elapsed Time [Elapsed = 185647 mS [3.0 minutes] Wed Jun 11 10:10:02 CST 2014]]

Step 2 completed successfully.

3. Running onecommand step 6 to verify the InfiniBand configuration.

Below is the error from running step 6:

[root@dm01db01 onecommand]# ./install.sh -cf /opt/oracle.SupportTools/onecommand/tequ-dm01.xml -s 6

 Executing Verify Infiniband and Calibrate Cells
 Running rds ping tests on cluster nodes...........................................................................................................
 Validating infiniband network with rds-ping.....
 No ping errors while pinging infiniband fabric.......................................................................................................................................
 dm01celadm02.tequ.com
 ssh: dm01celadm03: Temporary failure in name resolution
 ssh: dm01celadm01: Temporary failure in name resolution
 ssh: dm01db02: Temporary failure in name resolution
 ssh: dm01db01: Temporary failure in name resolution
 Error running Verify Infiniband and Calibrate Cells error message Error running oracle.onecommand.deploy.validation.ValidationUtils method validateInfiniband
 Error running oracle.onecommand.deploy.validation.ValidationUtils method validateInfiniband

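At the operating system level, rds-ping can be used to verify RDS packet transmission.
After the manual test passed, I ran step 6 again and it succeeded.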
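
The manual rds-ping test can be repeated over several InfiniBand addresses with a short loop. This is a sketch under assumptions: rds_ping_all and the RDS_PING override variable are hypothetical, and the -c option (number of packets) is taken from the rds-tools manual page, so check it against the version installed on the node.

```shell
# Hypothetical helper: run rds-ping against each address and report the
# result. Returns non-zero if any address fails. The command can be
# overridden through RDS_PING (an assumption made for illustration).
rds_ping_all() {
    rpa_failed=0
    for rpa_addr in "$@"; do
        # -c 3: send three packets, then stop (per the rds-ping man page).
        if "${RDS_PING:-rds-ping}" -c 3 "$rpa_addr" >/dev/null 2>&1; then
            echo "OK   $rpa_addr"
        else
            echo "FAIL $rpa_addr"
            rpa_failed=1
        fi
    done
    return "$rpa_failed"
}
```

From dm01db01, for example, the private addresses of the second database server from the hosts file could be tested with: rds_ping_all 192.168.10.3 192.168.10.4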

4. Error when running onecommand step 9 (Install Cluster Software).

Step 9 failed with the following error:

[root@dm01db01 onecommand]# ./install.sh -cf /opt/oracle.SupportTools/onecommand/tequ-dm01.xml -s 9

 Executing Install Cluster Software
 Installing cluster cluster-clu1.
 Getting grid disks using utility in /opt/oracle.SupportTools/onecommand/Software/11.2.0.4/grid...................
 Running Oracle installer.................................................................................................................java.util.concurrent.ExecutionException: java.lang.reflect.InvocationTargetException
        at java.util.concurrent.FutureTask$Sync.innerGet(Unknown Source)
        at java.util.concurrent.FutureTask.get(Unknown Source)
        at oracle.onecommand.commandexec.utils.Parallelizer.getOEDAResults(Parallelizer.java:367)
        at oracle.onecommand.deploy.software.SoftwareUtils.getKommandOutputsFromParallelizer(SoftwareUtils.java:1304)
        at oracle.onecommand.deploy.software.SoftwareUtils.doInstallClusterware(SoftwareUtils.java:1327)
        at oracle.onecommand.deploy.software.SoftwareUtils.installClusterWare(SoftwareUtils.java:1292)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
        at java.lang.reflect.Method.invoke(Unknown Source)
        at oracle.onecommand.deploy.cliXml.InstalSoftware.executeStep(InstalSoftware.java:526)
        at oracle.onecommand.deploy.cliXml.InstalSoftware.executeForwardAction(InstalSoftware.java:462)
        at oracle.onecommand.deploy.cliXml.InstalSoftware.parseCmdLine(InstalSoftware.java:368)
        at oracle.onecommand.deploy.cliXml.InstalSoftware.main(InstalSoftware.java:265)
Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
        at java.lang.reflect.Method.invoke(Unknown Source)
        at oracle.onecommand.commandexec.utils.Parallelizer$ParallelCallable.run(Parallelizer.java:520)
        at oracle.onecommand.commandexec.utils.Parallelizer$ParallelCallable.call(Parallelizer.java:538)
        at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
        at java.util.concurrent.FutureTask.run(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)
Caused by: oracle.onecommand.escommon.common.OcmdException: Installation did not complete successfully, please check logs in /u01/app/oraInventory
        at oracle.onecommand.deploy.software.ClusterZipInstall112040.install(ClusterZipInstall112040.java:358)
        at oracle.onecommand.deploy.software.SoftwareUtils.installClusterwareByCluster(SoftwareUtils.java:1342)
        ... 11 more

 java.util.concurrent.ExecutionException: java.lang.reflect.InvocationTargetException
        at java.util.concurrent.FutureTask$Sync.innerGet(Unknown Source)
        at java.util.concurrent.FutureTask.get(Unknown Source)
        at oracle.onecommand.commandexec.utils.Parallelizer.getOEDAResults(Parallelizer.java:367)
        at oracle.onecommand.deploy.software.SoftwareUtils.getKommandOutputsFromParallelizer(SoftwareUtils.java:1304)
        at oracle.onecommand.deploy.software.SoftwareUtils.doInstallClusterware(SoftwareUtils.java:1327)
        at oracle.onecommand.deploy.software.SoftwareUtils.installClusterWare(SoftwareUtils.java:1292)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
        at java.lang.reflect.Method.invoke(Unknown Source)
        at oracle.onecommand.deploy.cliXml.InstalSoftware.executeStep(InstalSoftware.java:526)
        at oracle.onecommand.deploy.cliXml.InstalSoftware.executeForwardAction(InstalSoftware.java:462)
        at oracle.onecommand.deploy.cliXml.InstalSoftware.parseCmdLine(InstalSoftware.java:368)
        at oracle.onecommand.deploy.cliXml.InstalSoftware.main(InstalSoftware.java:265)
Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
        at java.lang.reflect.Method.invoke(Unknown Source)
        at oracle.onecommand.commandexec.utils.Parallelizer$ParallelCallable.run(Parallelizer.java:520)
        at oracle.onecommand.commandexec.utils.Parallelizer$ParallelCallable.call(Parallelizer.java:538)
        at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
        at java.util.concurrent.FutureTask.run(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)
Caused by: oracle.onecommand.escommon.common.OcmdException: Installation did not complete successfully, please check logs in /u01/app/oraInventory
        at oracle.onecommand.deploy.software.ClusterZipInstall112040.install(ClusterZipInstall112040.java:358)
        at oracle.onecommand.deploy.software.SoftwareUtils.installClusterwareByCluster(SoftwareUtils.java:1342)
        ... 11 more


 Errors occured...

The file log/Step9_Install_Cluster_Software_140611_115755.out shows the following errors:
2014-06-11 12:00:07,602 [FINE  ][thread-1][       EsCommonUtils:596] Preparing to launch Oracle Universal Installer from /tmp/OraInstall2014-06-11_11-58-17AM. Please wait ...[FATAL] [INS-41112] Specified network interface doesnt maintain connectivity across cluster nodes.

2014-06-11 12:00:07,603 [FINE  ][thread-1][       EsCommonUtils:596]    CAUSE: Installer has detected that network interface ib0 does not maintain connectivity on all cluster nodes.

2014-06-11 12:00:07,603 [FINE  ][thread-1][       EsCommonUtils:596]    ACTION: Ensure that the chosen interface has been configured across all cluster nodes.

2014-06-11 12:00:07,603 [FINE  ][thread-1][       EsCommonUtils:596] [FATAL] [INS-41112] Specified network interface doesnt maintain connectivity across cluster nodes.

2014-06-11 12:00:07,603 [FINE  ][thread-1][       EsCommonUtils:596]    CAUSE: Installer has detected that network interface ib1 does not maintain connectivity on all cluster nodes.

2014-06-11 12:00:07,603 [FINE  ][thread-1][       EsCommonUtils:596]    ACTION: Ensure that the chosen interface has been configured across all cluster nodes.
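
The step log is long, so pulling out only the interfaces named in the INS-41112 CAUSE lines shows quickly where to look. The sketch below is an illustration under assumptions: `failing_interfaces` is a hypothetical helper, and the pattern is derived solely from the CAUSE wording in the log excerpt above.

```python
import re

# Matches the CAUSE text, e.g.
# "... network interface ib0 does not maintain connectivity on all cluster nodes."
_INS_41112_CAUSE = re.compile(
    r"network interface\s+(\S+)\s+does not maintain connectivity"
)


def failing_interfaces(log_text):
    """Return interface names reported by INS-41112 CAUSE lines, without duplicates."""
    interfaces = []
    for name in _INS_41112_CAUSE.findall(log_text):
        if name not in interfaces:
            interfaces.append(name)
    return interfaces


# Two CAUSE lines copied from the log excerpt above.
sample_step9_log = (
    "2014-06-11 12:00:07,603 [FINE  ][thread-1][       EsCommonUtils:596]    "
    "CAUSE: Installer has detected that network interface ib0 does not maintain "
    "connectivity on all cluster nodes.\n"
    "2014-06-11 12:00:07,603 [FINE  ][thread-1][       EsCommonUtils:596]    "
    "CAUSE: Installer has detected that network interface ib1 does not maintain "
    "connectivity on all cluster nodes.\n"
)

if __name__ == "__main__":
    print(failing_interfaces(sample_step9_log))
```

In this deployment both InfiniBand interfaces are reported, which points at the private interconnect rather than at a single port.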

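   This kind of error also comes up in ordinary Grid Infrastructure installations. A search on MOS turned up the following note:

[INS-41112] Specified network interface doesnt maintain connectivity across cluster nodes. (Doc ID 1427202.1)
Last modified: 2013-7-8  Type: REFERENCE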

In this Document

Purpose
Details
References


APPLIES TO:

Oracle Database - Enterprise Edition - Version 11.2.0.1 and later
Information in this document applies to any platform.

PURPOSE

The note lists problems, solutions or workarounds that's related to the following 11gR2 GI OUI error:

[FATAL] [ INS-41112 ] Specified network interface doesnt maintain connectivity across cluster nodes.
CAUSE: Installer has detected that network interface  eth1  does not maintain connectivity on all cluster nodes.
ACTION: Ensure that the chosen interface has been configured across all cluster nodes.


DETAILS


[INS-41112] is a high level error number, the workarounds/solutions depend on the error code from lower layer, however, [INS-41112] does tell which interface is having the issue:

CAUSE: Installer has detected that network interface  eth1  does not maintain connectivity on all cluster nodes.

## >> in this case, it's eth1 that's having connectivity issue



To find out lower layer error code, execute the following as grid user:

runcluvfy.sh comp nodecon -i <interface_list> -n <node1>,<node2>,<node3> -verbose



Refer to the following once CVU reports real error code:

  • PRVF-7617
Refer to  note 1335136.1  for details.

 

  • PRVF-6020 : Different MTU values used across network interfaces in subnet "10.10.10.0"
Refer to  note 1429104.1  for details.

Run runcluvfy.sh manually to verify the installation environment; if the check passes, try the step again.
Running the step a second time succeeded.
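
The manual check can be scripted so that the lower-level PRVF codes the MOS note refers to are easy to spot. This is only a sketch under stated assumptions: `build_nodecon_command` and `prvf_codes` are hypothetical helper names, and the argument order follows the usual CVU form `comp nodecon -i <interfaces> -n <nodes> -verbose`. The interface and node names in the usage lines are taken from this deployment's logs.

```python
import re


def build_nodecon_command(interface, nodes, script="runcluvfy.sh"):
    """Build the argument list for a CVU node connectivity check.

    Assumption: 'script' is the path to runcluvfy.sh in the unpacked grid
    software; adjust it to wherever the installer media actually lives.
    """
    return [script, "comp", "nodecon", "-i", interface,
            "-n", ",".join(nodes), "-verbose"]


def prvf_codes(cvu_output):
    """Return the distinct PRVF-nnnn codes found in CVU output, in first-seen order."""
    codes = []
    for code in re.findall(r"PRVF-\d+", cvu_output):
        if code not in codes:
            codes.append(code)
    return codes


if __name__ == "__main__":
    # Print the command to be run as the grid user.
    print(" ".join(build_nodecon_command("ib0", ["dm01db01", "dm01db02"])))
    # Example text taken from the MOS note quoted above.
    print(prvf_codes('PRVF-6020 : Different MTU values used across network '
                     'interfaces in subnet "10.10.10.0"'))
```

The command is built as a list so it can be passed to `subprocess.run` without shell quoting problems; the captured output can then go through `prvf_codes`.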

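5. Running onecommand step 14: creating the database with DBCA.
   Before running step 14, make sure the environment variables of the grid and oracle operating system users are configured correctly on every database server.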
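
A quick way to make that check repeatable is a small script run as each user on each database server. The sketch below is an assumption-laden illustration: the article does not say which variables matter, so the list in `REQUIRED_VARS` (ORACLE_BASE, ORACLE_HOME, ORACLE_SID) is my own guess at a sensible minimum, and the helper names are hypothetical.

```python
import os

# Assumption: these are the variables worth checking; the article does not list them.
REQUIRED_VARS = ("ORACLE_BASE", "ORACLE_HOME", "ORACLE_SID")


def missing_env_vars(env, required=REQUIRED_VARS):
    """Return the required variable names that are unset or empty in env."""
    return [name for name in required if not env.get(name)]


def home_bin_on_path(env):
    """Return True if $ORACLE_HOME/bin is one of the PATH entries in env."""
    home = env.get("ORACLE_HOME", "")
    if not home:
        return False
    return home.rstrip("/") + "/bin" in env.get("PATH", "").split(":")


if __name__ == "__main__":
    # Run once as grid and once as oracle on every database server.
    missing = missing_env_vars(os.environ)
    if missing:
        print("Missing variables: " + ", ".join(missing))
    if not home_bin_on_path(os.environ):
        print("$ORACLE_HOME/bin is not on PATH")
```

Both functions take the environment as a plain mapping, so they can be checked against sample values before being pointed at `os.environ`.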

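    This was my first Exadata deployment, and I learned a great deal from it. Four points deserve attention throughout a deployment:
1). An Exadata deployment has no graphical or character-based menu tools; almost everything is done with scripts and commands.
2). Plan thoroughly before the deployment, especially the IP plan, so that the plan does not have to be overturned midway.
3). Avoid manually editing configuration files, including those for IP addresses and host names, wherever possible.
4). Stay calm when an error appears. Reading the logs carefully and analyzing the error text is the quickest way to find the cause.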

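While manually changing the host IP address I also ran into the following problem:
   See the article 《ssh连接Linux收到The remote system refused the connection报错》 (ssh to Linux fails with "The remote system refused the connection"): http://blog.itpub.net/23135684/viewspace-1181160/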

--end--

From the ITPUB blog, link: http://blog.itpub.net/23135684/viewspace-1181296/. Please credit the source when reposting; otherwise legal action may be taken.

Reposted from: http://blog.itpub.net/23135684/viewspace-1181296/
