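Elasticsearch removed the delete by query API in 2.0. According to the official documentation, delete by query forces a refresh, which can cause an OutOfMemoryError during concurrent indexing and can also leave primary and replica shards inconsistent. The documentation recommends using the scroll/scan API to find the IDs of the matching documents and then deleting them in bulk by ID.
Following that recommendation, I wrote a delete by query plugin.
The main code is as follows: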
// Open a scroll over every document that matches the query.
SearchResponse scrollResp = client.prepareSearch(index).setTypes(type)
        .setScroll(new TimeValue(60000)).setSize(defaultBatchSize).setQuery(query)
        .execute().actionGet();
long total = scrollResp.getHits().getTotalHits();
// Stop once a page comes back empty; this also avoids sending an empty bulk request.
while (scrollResp.getHits().getHits().length > 0) {
    // Delete the current page of hits by ID in a single bulk request.
    BulkRequestBuilder requestBuilder = client.prepareBulk().setRefresh(true);
    for (SearchHit hit : scrollResp.getHits().getHits()) {
        requestBuilder.add(new DeleteRequest(index, type, hit.getId()));
    }
    BulkResponse response = requestBuilder.execute().actionGet();
    // Log failed items and keep them out of the removed count.
    int failed = 0;
    if (response.hasFailures()) {
        for (BulkItemResponse item : response) {
            if (item.isFailed()) {
                failed++;
                LOGGER.warn(item.getFailureMessage());
            }
        }
    }
    int removed = response.getItems().length - failed;
    total = total - removed;
    LOGGER.info("has removed " + removed + " rows, remain " + total + " rows ...");
    // Fetch the next page, keeping the scroll context alive for another 60 seconds.
    scrollResp = client.prepareSearchScroll(scrollResp.getScrollId())
            .setScroll(new TimeValue(60000)).execute().actionGet();
}
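To try the control flow without a cluster, here is a minimal sketch of the same idea: collect the matching IDs first, then delete them batch by batch. Everything in it is hypothetical and written only for illustration: `FakeIndex` is an in-memory stand-in for an index, and `deleteByQuery`, `matching`, and `bulkDelete` are not Elasticsearch APIs.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

public class ScrollDeleteSketch {

    /** Hypothetical in-memory stand-in for an index: just a set of document IDs. */
    static final class FakeIndex {
        private final Set<String> ids = new LinkedHashSet<>();

        void add(String id) { ids.add(id); }

        int size() { return ids.size(); }

        /** Snapshots the IDs that match the query, loosely like opening a scroll. */
        List<String> matching(Predicate<String> query) {
            List<String> result = new ArrayList<>();
            for (String id : ids) {
                if (query.test(id)) result.add(id);
            }
            return result;
        }

        /** Deletes one batch by ID and returns how many IDs were actually removed. */
        int bulkDelete(List<String> batch) {
            int removed = 0;
            for (String id : batch) {
                if (ids.remove(id)) removed++;
            }
            return removed;
        }
    }

    /** Pages through the matching IDs in batches and deletes each page by ID. */
    static int deleteByQuery(FakeIndex index, Predicate<String> query, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<String> hits = index.matching(query);
        int deleted = 0;
        for (int from = 0; from < hits.size(); from += batchSize) {
            int to = Math.min(from + batchSize, hits.size());
            deleted += index.bulkDelete(hits.subList(from, to));
        }
        return deleted;
    }

    public static void main(String[] args) {
        FakeIndex index = new FakeIndex();
        for (int i = 0; i < 25; i++) index.add("doc-" + i);
        // Matches doc-1 and doc-10 .. doc-19, i.e. 11 documents.
        int deleted = deleteByQuery(index, id -> id.startsWith("doc-1"), 4);
        System.out.println("deleted " + deleted + ", remaining " + index.size());
        // prints "deleted 11, remaining 14"
    }
}
```

Unlike a real scroll, the sketch holds all matching IDs in memory at once; the plugin code above pulls them page by page, so memory use stays bounded by the batch size.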
The plugin is available at https://github.com/weiyuc/delete_by_query
It currently targets ES 2.4.1. For other versions, a few small modifications should be enough.