Symptom
When creating a table or an index in Trafodion, the following error is sometimes reported:
*** ERROR[8448] Unable to access Hbase interface. Call to ExpHbaseInterface::create() returned error HBASE_CREATE_ERROR(701). Cause: java.io.IOException: createTable exception. Unable to create table TRAFODION.PKSAAS.IDX_MAC Reason: java.io.IOException: createTable call error
org.trafodion.dtm.HBaseTxClient.callCreateTable(HBaseTxClient.java:2070) Caused by
java.io.IOException: java.util.concurrent.ExecutionException: java.io.IOException: java.io.IOException: pushOnlineEpoch -- Error: current onlineEpoch 1499063472720 is less than new onlineEpoch 1499063474350, transId: 3096233333801809 in region: TRAFODION.PKSAAS.IDX_MAC,\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01\x01\x01\x00\x00\x00\x00\x00\x00\x00,1499063474876.95cbea5810b97773deced4a97aa60348.
Analysis
The onlineEpoch error is related to clock synchronization. Trafodion is a distributed database, so the clocks on all of its nodes must be kept in sync.
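The two onlineEpoch values in the error message appear to be Unix timestamps in milliseconds, since they decode to the day of the incident. Assuming that reading, this small bash sketch uses GNU `date` to decode them and compute the gap between them. The function name `ms_to_utc` is illustrative only.

```bash
#!/usr/bin/env bash
# Decode a millisecond Unix timestamp (the assumed format of the onlineEpoch
# values in the error) into a UTC date string. Requires GNU date (-d @N).
ms_to_utc() {
  local ms=$1
  local secs=$((ms / 1000)) frac=$((ms % 1000))
  printf '%s.%03d UTC\n' "$(date -u -d "@$secs" '+%F %T')" "$frac"
}

current=1499063472720   # "current onlineEpoch" from the error
new=1499063474350       # "new onlineEpoch" from the error
ms_to_utc "$current"    # 2017-07-03 06:31:12.720 UTC
ms_to_utc "$new"        # 2017-07-03 06:31:14.350 UTC
echo "gap: $((new - current)) ms"   # gap: 1630 ms
```

Under that assumption, the two epochs fall at 14:31:12.720 and 14:31:14.350 CST, about 1.6 seconds apart. How Trafodion derives these values is not covered here, so treat the decoding as an aid to reading the log, not as a description of the check itself.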
Solution
1 Check that the ntpd service is running on every node
[trafodion@n11 logs]$ sudo pdsh $MY_NODES service ntpd status
n12: ntpd (pid 42175) is running...
n11: ntpd (pid 7277) is running...
n13: ntpd (pid 52856) is running...
n14: ntpd (pid 33192) is running...
2 Check the time on every node
[trafodion@n11 logs]$ pdsh $MY_NODES date
n12: Mon Jul 3 14:39:29 CST 2017
n13: Mon Jul 3 14:39:32 CST 2017
n14: Mon Jul 3 14:39:32 CST 2017
n11: Mon Jul 3 14:39:32 CST 2017
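Reading the `date` output by eye works for four nodes. As a sketch, the comparison can be scripted: print epoch seconds with `date +%s` through pdsh and flag every node whose clock differs from a chosen reference node by more than a threshold. The function name `find_skewed_nodes`, the choice of reference node and the threshold are illustrative, not part of Trafodion or pdsh. Because pdsh reaches the nodes at slightly different moments and `date +%s` has one-second resolution, a threshold below 1 second is not meaningful.

```bash
#!/usr/bin/env bash
# Hypothetical helper: reads pdsh-style lines "node: epoch_seconds" on stdin
# and prints "node offset_seconds" for every node whose clock differs from
# the reference node by more than MAX_SKEW seconds (sorted by node name).
# Usage: pdsh $MY_NODES date +%s | find_skewed_nodes REFERENCE_NODE MAX_SKEW
find_skewed_nodes() {
  local ref=$1 max=$2
  awk -v ref="$ref" -v max="$max" '
    BEGIN { max += 0 }                       # force a numeric threshold
    # pdsh prefixes each line with "node:"; strip the colon from the name
    { sub(/:$/, "", $1); t[$1] = $2 }
    END {
      if (!(ref in t)) {
        print "reference node not found: " ref > "/dev/stderr"
        exit 2
      }
      for (n in t) {
        d = t[n] - t[ref]                    # negative = node is behind
        if (d > max || -d > max) print n, d
      }
    }' | sort
}
```

With the times shown in step 2 converted to epoch seconds and n11 as the reference, `find_skewed_nodes n11 1` would report `n12 -3`, matching the manual reading.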
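3 The output above shows that the clock on n12 is about 3 seconds behind the other nodes, so it has to be synchronized manually: (1) stop the ntpd service, (2) run ntpdate against another node (n11 here), (3) start the ntpd service again.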
[root@n12 trafodion]# service ntpd stop
Shutting down ntpd: [ OK ]
[root@n12 trafodion]# ntpdate n11
3 Jul 14:42:25 ntpdate[33285]: step time server 10.10.11.11 offset 2.854010 sec
[trafodion@n12 ~]$ pdsh $MY_NODES date
n12: Mon Jul 3 14:42:49 CST 2017
n13: Mon Jul 3 14:42:49 CST 2017
n14: Mon Jul 3 14:42:49 CST 2017
n11: Mon Jul 3 14:42:49 CST 2017
[root@n12 trafodion]# service ntpd start
Starting ntpd: [ OK ]
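The three manual steps can be wrapped in a small function to run on the skewed node. This is a sketch, not a Trafodion tool: the name `resync_clock` and the `RUN=echo` dry-run switch are hypothetical, and it assumes the same SysV `service` commands and `ntpdate` shown in the transcript. It restarts ntpd even if `ntpdate` fails, so the node is not left without its time daemon.

```bash
#!/usr/bin/env bash
# Hypothetical helper for step 3: stop ntpd, step the clock once against a
# reference node with ntpdate, then start ntpd again. Run as root on the
# skewed node. Set RUN=echo to print the commands instead of running them.
resync_clock() {
  if [ $# -ne 1 ]; then
    echo "usage: resync_clock REFERENCE_NODE" >&2
    return 2
  fi
  local ref=$1 run=${RUN:-} rc=0
  # ntpdate refuses to run while ntpd holds the NTP port, so stop ntpd first
  $run service ntpd stop || return 1
  # step the clock; remember a failure but still restart ntpd below
  $run ntpdate "$ref" || rc=$?
  $run service ntpd start || rc=$?
  return "$rc"
}
```

For example, `RUN=echo resync_clock n11` only prints the three commands, while `resync_clock n11` run as root on n12 performs them. Afterwards, rerun `pdsh $MY_NODES date` to confirm that the clocks agree.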