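Last week I handled several strange ODA problems in a row. They looked like ODA problems but weren't: every one of them was caused by a human change. Back to the topic, let's talk about this DCS-10001 issue: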
The customer environment is an ODA X9-2-HA (enhanced configuration) running imageinfo 19.16, used by a university.
Creating a cluster database through the ODA UI failed with DCS-10001:Internal error encountered: Failed to create the database. At first I suspected a DCS fault, so I went through the DCS and zk logs, but found nothing.
Items checked included:
dcs-agent.log, zk.log, etc.
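When combing through logs like these, it can help to pull out just the error codes. The sketch below counts DCS-, ORA-, PRKC- and PRKN-style codes in any iterable of log lines; the set of prefixes is an assumption based on the codes that appear in this case, and the function name is hypothetical.

```python
import re
from collections import Counter

# Prefixes seen in this case (an assumption; extend as needed):
# DCS (ODA agent), ORA (database), PRKC/PRKN (cluster/installer utilities).
# (?<![A-Z]) still matches run-together text such as "nodesPRKC-1027".
ERROR_CODE = re.compile(r"(?<![A-Z])(DCS|ORA|PRKC|PRKN)-(\d{4,5})(?!\d)")


def scan_error_codes(lines):
    """Count error codes such as ORA-01034 or DCS-10001 in log lines."""
    counts = Counter()
    for line in lines:
        for prefix, number in ERROR_CODE.findall(line):
            counts[f"{prefix}-{number}"] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "[WARNING] ORA-27154: post/wait create failed",
        "ORA-27302: failure occurred at: sskgpbitsper",
        "[FATAL] ORA-01034: ORACLE not available",
        "DCS-10001:Internal error encountered: Failed to create the database",
    ]
    for code, n in scan_error_codes(sample).most_common():
        print(code, n)
```

In practice you would pass the lines of dcs-agent.log or a DBCA log (e.g. `open(path)` as the iterable) instead of the sample list.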
Next I searched the support knowledge base:
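1. The first suggestion was to check the kernel parameter kernel.sem. These parameters had not been modified and were still as set when the appliance was reimaged.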
2023-07-17 18:49:30,346 DEBUG [Thread-10374] [] c.o.d.c.u.CommonsUtils: Output :
Look at the log file "/u01/app/oracle/cfgtoollogs/dbca/dirxnatp/dirxnatp1.log" for further details.
2023-07-17 18:49:28,936 DEBUG [Thread-10374] [] c.o.d.c.u.CommonsUtils: Output :
[WARNING] ORA-27154: post/wait create failed
2023-07-17 18:49:28,936 DEBUG [Thread-10374] [] c.o.d.c.u.CommonsUtils: Output :
ORA-27302: failure occurred at: sskgpbitsper
2023-07-17 18:49:28,936 DEBUG [Thread-10374] [] c.o.d.c.u.CommonsUtils: Output : EMPTY CONTENT
2023-07-17 18:49:28,937 DEBUG [Thread-10374] [] c.o.d.c.u.CommonsUtils: Output :
[FATAL] ORA-01034: ORACLE not available ◄▬▬
/u01/app/oracle/cfgtoollogs/dbca/dirxnatp/dirxnatp.log
/u01/app/oracle/cfgtoollogs/dbca/dirxnatp/dirxnatp1.log
DCS-10001:Internal error encountered: Failed to create the database
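If you want to verify kernel.sem yourself rather than trust that it is untouched, the sketch below parses the four values (SEMMSL SEMMNS SEMOPM SEMMNI) and flags any that fall below a floor. The floor values are the commonly cited minimums from Oracle's Linux installation guidance and are an assumption here; ODA sets its own values at reimage time, so treat them only as a sanity check.

```python
from pathlib import Path

# Assumed floor (commonly cited Oracle Linux install minimums); verify
# against the documentation for your release.
MIN_SEM = {"semmsl": 250, "semmns": 32000, "semopm": 100, "semmni": 128}
SEM_FIELDS = ("semmsl", "semmns", "semopm", "semmni")


def parse_kernel_sem(text):
    """Parse the four whitespace-separated kernel.sem values into a dict."""
    values = [int(v) for v in text.split()]
    if len(values) != 4:
        raise ValueError(f"expected 4 values, got {len(values)}: {text!r}")
    return dict(zip(SEM_FIELDS, values))


def below_minimum(sem):
    """Return the names of fields lower than the assumed floor."""
    return [name for name in SEM_FIELDS if sem[name] < MIN_SEM[name]]


if __name__ == "__main__":
    proc = Path("/proc/sys/kernel/sem")  # Linux only
    if proc.exists():
        sem = parse_kernel_sem(proc.read_text())
        print(sem, "below floor:", below_minimum(sem) or "none")
```

Reading /proc/sys/kernel/sem shows the live value, which is the same thing `sysctl kernel.sem` reports.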
2. A bug?
Another knowledge-base hit was 'Bug 33783815' : LNX64-1914-CMT:TDE RAC DATABASE CREATION FAILED IN MIGRATED DCS ENV DUE TO MISSING AUDIT_FILE_DEST LOCATION IN NODE2. Its workaround is to create missing audit directory with correct permission and start the 2nd instance (the note's example refers to the $ORACLE_BASE/admin/DGC folder on ODA node 1, 'njodap3b', along with its subfolders).
This bug is already fixed in the installed imageinfo version, so a bug was ruled out.
3. Check the DBCA logs under /u01/app/oracle/cfgtoollogs/dbca
They contained the following errors:
[main] [ 2023-08-02 15:03:42.478 CST ] [CommandBuffer.checkCommandStatus:83] CommandBuffer:checkCommandStatus returning false, command number= 1 thread name= Thread[main,5,main]
[main] [ 2023-08-02 15:03:42.478 CST ] [ClusterConfig.submit:590] status=false
[main] [ 2023-08-02 15:03:42.478 CST ] [ClusterException.setErrorMessage:145] ClusterException.setErrorMessage: commands[0]=oracle.ops.mgmt.command.file.PathExistCommand@74971ed9
[main] [ 2023-08-02 15:03:42.478 CST ] [ClusterException.setErrorMessage:156] status : true
[main] [ 2023-08-02 15:03:42.478 CST ] [ClusterException.setErrorMessage:145] ClusterException.setErrorMessage: commands[1]=oracle.ops.mgmt.command.file.PathExistCommand@131fcb6f
[main] [ 2023-08-02 15:03:42.478 CST ] [ClusterException.setErrorMessage:156] status : false
[main] [ 2023-08-02 15:03:42.478 CST ] [ClusterException.setErrorMessage:184] ClusterException.setErrorMessage: errString is 'PRKC-1191 : Remote command execution setup check for node oda2 using shell /usr/bin/ssh failed.
FIPS mode initializedssh: connect to host oda2 port 22: Connection refused'
[main] [ 2023-08-02 15:03:42.478 CST ] [ClusterConfig.destroy:468] destroying resources for client thread Thread[main,5,main]
[main] [ 2023-08-02 15:03:42.478 CST ] [Utils.getString:278] ==========Str is oda2
[main] [ 2023-08-02 15:03:42.479 CST ] [OracleHome.getNodeNames:446] exception checking oracle home existence on nodesPRKC-1027 : Error checking existence of file /u01/app/odaorahome/oracle/product/19.0.0.0/dbhome_1 on oda2
[main] [ 2023-08-02 15:03:42.479 CST ] [OracleHome.getNodeNames:474] no of nodes returned1
INFO: Aug 02, 2023 3:03:42 PM oracle.install.commons.net.support.DefaultSSHSupportManager <init>
INFO: Preparing Default SSH support manager
PRKN-1038 : The command "/usr/bin/ssh -o FallBackToRsh=no -o PasswordAuthentication=no -o StrictHostKeyChecking=yes -o NumberOfPasswordPrompts=0 oda2 -n /bin/true" run on node "oda1" gave an unexpected output: "FIPS mode initializedssh: connect to host oda2 port 22: Connection refused"
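The key line is "connect to host oda2 port 22: Connection refused". A quick way to confirm that symptom without involving ssh keys is a plain TCP connect to port 22. The sketch below does just that; the function name is hypothetical, and it only tests reachability, not the passwordless ssh setup that DBCA also needs.

```python
import socket
import sys


def tcp_port_open(host, port=22, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds.

    A refused connection (nothing listening, as in "Connection refused"),
    a timeout, or a name-resolution failure all return False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Pass peer node names on the command line, e.g. "oda2" from the log.
    if len(sys.argv) < 2:
        print(f"usage: {sys.argv[0]} HOST [HOST ...]")
    for host in sys.argv[1:]:
        print(host, "port 22 reachable:", tcp_port_open(host))
```

For the full check, running the exact command from the PRKN-1038 message (`/usr/bin/ssh ... oda2 -n /bin/true`) as the oracle user on oda1 reproduces what DBCA does.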
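The root cause turned out to be that port 22 was unreachable: DBCA's remote command check from oda1 to oda2 over /usr/bin/ssh was refused. After confirming with the customer, we found that the SSH port had been changed to a non-default value. Once it was changed back to the default port 22, creating the database through the ODA GUI worked normally.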
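To catch this kind of change before it breaks a deployment, you could scan each node's sshd_config for a non-default Port directive. The sketch below is a simplified parser under stated assumptions: it reads only the main file, ignores Include and Match blocks and ListenAddress host:port forms, and treats the absence of a Port line as port 22 (sshd's default). The function names are hypothetical.

```python
from pathlib import Path

DEFAULT_SSH_PORT = 22


def sshd_ports(config_text):
    """Return the ports declared by Port directives in sshd_config text.

    Keywords are case-insensitive; "Port 2222" and "Port=2222" are both
    accepted. Lines starting with '#' are comments. With no Port line,
    sshd listens on 22.
    """
    ports = []
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.replace("=", " ", 1).split()
        if len(parts) >= 2 and parts[0].lower() == "port":
            ports.append(int(parts[1]))
    return ports or [DEFAULT_SSH_PORT]


def non_default(ports):
    """Return any configured ports other than 22."""
    return [p for p in ports if p != DEFAULT_SSH_PORT]


if __name__ == "__main__":
    cfg = Path("/etc/ssh/sshd_config")
    try:
        ports = sshd_ports(cfg.read_text())
    except OSError as exc:  # missing file or not readable without root
        print(f"cannot read {cfg}: {exc}")
    else:
        print("ports:", ports, "non-default:", non_default(ports) or "none")
```

Because this ignores Include files, running `sshd -T` as root and looking at its `port` line is the more reliable way to see the effective value.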
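Summary: some of ODA's default configuration must not be changed; in this case, the SSH port 22 that DBCA relies on for node-to-node remote command execution.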