After an Elasticsearch node went down, the log reported the following error:
[index_20190830][[index_20190830][2]] IndexShardRecoveryException[failed to recovery from gateway]; nested: EngineCreationFailureException[failed to create engine]; nested: EOFException;
at org.elasticsearch.index.shard.StoreRecoveryService.recoverFromStore(StoreRecoveryService.java:250)
at org.elasticsearch.index.shard.StoreRecoveryService.access$100(StoreRecoveryService.java:56)
at org.elasticsearch.index.shard.StoreRecoveryService$1.run(StoreRecoveryService.java:129)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: [index_20190830][[index_20190830][2]] EngineCreationFailureException[failed to create engine]; nested: EOFException;
at org.elasticsearch.index.engine.InternalEngine.<init>(InternalEngine.java:155)
at org.elasticsearch.index.engine.InternalEngineFactory.newReadWriteEngine(InternalEngineFactory.java:25)
at org.elasticsearch.index.shard.IndexShard.newEngine(IndexShard.java:1509)
at org.elasticsearch.index.shard.IndexShard.createNewEngine(IndexShard.java:1493)
at org.elasticsearch.index.shard.IndexShard.internalPerformTranslogRecovery(IndexShard.java:966)
at org.elasticsearch.index.shard.IndexShard.performTranslogRecovery(IndexShard.java:938)
at org.elasticsearch.index.shard.StoreRecoveryService.recoverFromStore(StoreRecoveryService.java:241)
... 5 more
Caused by: java.io.EOFException
at org.apache.lucene.store.InputStreamDataInput.readByte(InputStreamDataInput.java:37)
at org.apache.lucene.store.DataInput.readInt(DataInput.java:101)
at org.apache.lucene.store.DataInput.readLong(DataInput.java:157)
at org.elasticsearch.index.translog.Checkpoint.<init>(Checkpoint.java:54)
at org.elasticsearch.index.translog.Checkpoint.read(Checkpoint.java:83)
at org.elasticsearch.index.translog.Translog.recoverFromFiles(Translog.java:337)
at org.elasticsearch.index.translog.Translog.<init>(Translog.java:179)
at org.elasticsearch.index.engine.InternalEngine.openTranslog(InternalEngine.java:208)
at org.elasticsearch.index.engine.InternalEngine.<init>(InternalEngine.java:151)
... 11 more
Check the shard's Lucene index:
java -cp lib/lucene-core-5.5.0.jar -ea:org.apache.lucene... org.apache.lucene.index.CheckIndex elasticsearch-data/es_cluster/nodes/0/indices/index_20190830/2/index/
Repair it (-exorcise drops any segments that cannot be read, so the documents in them are lost):
java -cp lib/lucene-core-5.5.0.jar -ea:org.apache.lucene... org.apache.lucene.index.CheckIndex elasticsearch-data/es_cluster/nodes/0/indices/index_20190830/2/index/ -exorcise
Or, if the Lucene jar is an older version (here 3.6.1), where the option is named -fix:
java -cp lib/lucene-core-3.6.1.jar -ea:org.apache.lucene... org.apache.lucene.index.CheckIndex elasticsearch-data/es_cluster/nodes/0/indices/index_20190830/2/index/ -fix
References:
https://www.jianshu.com/p/2b4a6846d699
https://www.iteye.com/blog/aoyouzi-2126576
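The check reported no errors in the index data, so the data itself was not the problem.
Continuing the investigation:
Restarting the node, closing and reopening the index, increasing the number of replicas, force-allocating the shard, and moving the shard all failed to help.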
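For reference, the operations listed above map onto a handful of REST calls. The following is only a sketch under assumptions: it uses the Elasticsearch 2.x request shapes (the Lucene 5.5.0 jar suggests a 2.x cluster), the node names are hypothetical, and the function just builds the requests without sending them. Note that in 2.x an allocate command with allow_primary set to true can discard data, so it is a last resort.

```python
import json


def recovery_attempts(index, shard, node, other_node, replicas=1):
    """Build (method, path, body) tuples for the operations that were tried.

    Assumes Elasticsearch 2.x request formats; `node` and `other_node`
    are hypothetical node names supplied by the caller.
    """
    return [
        # Close and reopen the index.
        ("POST", f"/{index}/_close", None),
        ("POST", f"/{index}/_open", None),
        # Change the number of replicas.
        ("PUT", f"/{index}/_settings",
         {"index": {"number_of_replicas": replicas}}),
        # Force-allocate the unassigned shard (2.x syntax, may lose data).
        ("POST", "/_cluster/reroute",
         {"commands": [{"allocate": {"index": index, "shard": shard,
                                     "node": node, "allow_primary": True}}]}),
        # Move the shard to another node.
        ("POST", "/_cluster/reroute",
         {"commands": [{"move": {"index": index, "shard": shard,
                                 "from_node": node, "to_node": other_node}}]}),
    ]


if __name__ == "__main__":
    for method, path, body in recovery_attempts("index_20190830", 2,
                                                "node-a", "node-b"):
        print(method, path, json.dumps(body) if body else "")
```

Each tuple can be sent with curl or any HTTP client. In this incident none of them helped, because the failure happens while the shard opens its translog, before allocation settings matter.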
I then looked at the post at https://elasticsearch.cn/question/6735
In that post the failing frame is
org.elasticsearch.common.io.stream.InputStreamStreamInput.readByte(InputStreamStreamInput.java:43)
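so the failure there is in that method while reading data, whereas in my trace the EOFException is raised while reading a translog checkpoint (Checkpoint.read called from Translog.recoverFromFiles).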
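The comparison above comes down to reading the last "Caused by:" section of the trace and finding the first Elasticsearch frame in it. A small sketch of that step follows; the function names are my own and it assumes the trace has one frame per line, as in the log at the top.

```python
def root_cause(trace):
    """Return (exception, frames) for the last 'Caused by:' section.

    If the trace has no 'Caused by:' line, the first line is treated
    as the exception.
    """
    lines = [ln.strip() for ln in trace.splitlines() if ln.strip()]
    if not lines:
        return "", []
    start = 0
    for i, ln in enumerate(lines):
        if ln.startswith("Caused by:"):
            start = i
    exception = lines[start]
    if exception.startswith("Caused by:"):
        exception = exception[len("Caused by:"):].strip()
    # Stack frames start with "at "; lines such as "... 11 more" are skipped.
    frames = [ln[3:] for ln in lines[start + 1:] if ln.startswith("at ")]
    return exception, frames


def first_frame_in(frames, package_prefix):
    """Return the first frame whose class lives under package_prefix."""
    return next((f for f in frames if f.startswith(package_prefix)), None)
```

Applied to the trace in this note with the prefix "org.elasticsearch.", this points at the translog Checkpoint class rather than at InputStreamStreamInput, which is why the linked post did not apply directly.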
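Looking in the translog directory of this index, I found that
translog-117.ckp was empty (zero bytes); the translog directories of the other indices contained no empty files.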
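The same check can be run across a whole data directory instead of inspecting each shard by hand. This is a minimal sketch: it assumes checkpoint files end in .ckp and live in directories named translog, as in the layout seen here, and the function name is my own.

```python
import os


def find_empty_checkpoints(data_root):
    """Return sorted paths of zero-byte *.ckp files under translog directories."""
    empty = []
    for dirpath, _dirnames, filenames in os.walk(data_root):
        # Only look inside directories literally named "translog".
        if os.path.basename(dirpath) != "translog":
            continue
        for name in filenames:
            path = os.path.join(dirpath, name)
            if name.endswith(".ckp") and os.path.getsize(path) == 0:
                empty.append(path)
    return sorted(empty)


if __name__ == "__main__":
    # Path taken from the commands earlier in this note.
    for path in find_empty_checkpoints("elasticsearch-data/es_cluster/nodes/0/indices"):
        print(path)
```

An empty checkpoint file is consistent with the EOFException above, since the very first read in Checkpoint.<init> has nothing to consume.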
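The translog files are not human-readable (they display as garbled characters). I tried copying translog.ckp to translog-117.ckp, replacing the empty file, and that solved the problem.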
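The copy step can be scripted so that the damaged file is kept as a backup first. This is a sketch under assumptions, not a general repair tool: the function and parameter names are hypothetical, it assumes the node is stopped while files are changed, and it stores the backup outside the translog directory so that Elasticsearch does not encounter an unexpected file there.

```python
import os
import shutil


def replace_checkpoint(translog_dir, generation, backup_dir,
                       source_name="translog.ckp"):
    """Copy translog.ckp over translog-<generation>.ckp, backing up the old file.

    Returns the backup path, or None if there was no file to back up.
    """
    src = os.path.join(translog_dir, source_name)
    dst = os.path.join(translog_dir, f"translog-{generation}.ckp")
    if os.path.getsize(src) == 0:
        # Copying an empty checkpoint would reproduce the same failure.
        raise ValueError(f"{src} is empty, refusing to copy it")
    backup = None
    if os.path.exists(dst):
        os.makedirs(backup_dir, exist_ok=True)
        backup = os.path.join(backup_dir, os.path.basename(dst))
        shutil.copy2(dst, backup)
    shutil.copy2(src, dst)
    return backup
```

For the case in this note the call would be replace_checkpoint(translog_dir, 117, backup_dir), with both directories chosen by the operator. Whether the copied checkpoint matches generation 117 exactly is not verified here, so operations written after that checkpoint may not be replayed.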