private static void output(SearchHit[] hits) {
    BulkRequestBuilder prepareBulk = writeClient.prepareBulk();
    for (SearchHit sh : hits) {
        prepareBulk.add(writeClient
                .prepareIndex(writeIndex, sh.getType(), sh.getId())
                .setSource(sh.getSource()));
    }
    if (prepareBulk.numberOfActions() >= 1) {
        // Note: the BulkResponse is discarded, so rejected items go unnoticed.
        prepareBulk.get();
    }
}
The code above uses the prepareBulk() and prepareIndex() methods. When processing millions of documents, a varying amount of data was always lost. The problem was resolved by switching to the following API and, at the same time, tuning the bulk configuration:
private static void output1(SearchHit[] hits) {
    BulkRequest bulkRequest = new BulkRequest();
    for (SearchHit sh : hits) {
        IndexRequest indexRequest = new IndexRequest(writeIndex, sh.getType(), sh.getId())
                .source(sh.getSource());
        bulkRequest.add(indexRequest);
    }
    if (bulkRequest.numberOfActions() >= 1) {
        writeClient.bulk(bulkRequest);
    }
}
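When the bulk queue fills up, requests fail with EsRejectedExecutionException, and simply firing bulk requests without checking the response drops those documents silently. A minimal sketch of retry-with-backoff is shown below; it is independent of the Elasticsearch client, and the `BulkCall`-style `Callable<Boolean>` stand-in (true = accepted, false = rejected) plus the class and method names are all illustrative, not part of any Elasticsearch API:

```java
import java.util.concurrent.Callable;

public class BulkRetry {
    // Retries `call` up to maxRetries times, doubling the wait after each
    // rejection. Returning false from `call` simulates a queue rejection
    // (the EsRejectedExecutionException case).
    public static boolean bulkWithBackoff(Callable<Boolean> call,
                                          int maxRetries,
                                          long initialBackoffMs) throws Exception {
        long backoff = initialBackoffMs;
        for (int attempt = 0; attempt < maxRetries; attempt++) {
            if (call.call()) {
                return true;            // bulk accepted
            }
            Thread.sleep(backoff);      // queue full: wait before retrying
            backoff *= 2;               // exponential backoff
        }
        return false;                   // exhausted retries; caller must handle, not drop
    }

    public static void main(String[] args) throws Exception {
        // Simulated queue that rejects the first two attempts, then accepts.
        int[] rejections = {2};
        boolean ok = bulkWithBackoff(() -> rejections[0]-- <= 0, 5, 1);
        System.out.println(ok ? "indexed" : "dropped"); // prints "indexed"
    }
}
```

The key design point is that a rejection is a recoverable backpressure signal, not a fatal error: waiting and retrying lets the queue drain instead of losing the batch.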
References for tuning bulk operations:
https://blog.csdn.net/liyantianmin/article/details/77935636
https://blog.csdn.net/aa5305123/article/details/86542105
Set the number of replicas to 0 for the initial indexing run
threadpool.index.queue_size: 1000
indices.memory.index_buffer_size: 20%
index.translog.durability: async
index.translog.flush_threshold_size: 600MB
index.translog.flush_threshold_ops: 500000
threadpool.bulk.queue_size: 1000
When bulk index operations put too much pressure on the cluster, querying http://10.69.46.34:9200/_nodes/stats?pretty shows a large rejected count under bulk, and logging the BulkResponse returned by each bulk call likewise reveals many EsRejectedExecutionException errors. This was the main cause of my data loss.
"bulk" : {
"threads" : 4,
"queue" : 0,
"active" : 0,
"rejected" : 1337077,
"largest" : 4,
"completed" : 22935863
}
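For a sense of scale, the counters above imply that roughly 5.5% of all bulk tasks on this node were rejected. A small helper (class and method names are illustrative) makes the arithmetic explicit:

```java
public class RejectRate {
    // Fraction of bulk tasks rejected: rejected / (rejected + completed).
    public static double rejectRate(long rejected, long completed) {
        return (double) rejected / (rejected + completed);
    }

    public static void main(String[] args) {
        // Figures taken from the _nodes/stats output above.
        double rate = rejectRate(1_337_077L, 22_935_863L);
        System.out.printf("rejected: %.2f%%%n", 100 * rate); // ~5.51%
    }
}
```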
After applying the bulk configuration optimizations above, the rejected count dropped to 0, and bulk writes no longer lose data:
"bulk" : {
"threads" : 4,
"queue" : 0,
"active" : 0,
"rejected" : 0,
"largest" : 4,
"completed" : 149784
},