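Preface
Deleting documents is an essential part of working with ES. According to the official API documentation, there are two ways to delete: the first deletes a single document directly by index, type, and id; the second deletes by query, the so-called Delete By Query API.
Because the id uniquely identifies a document, the first approach deletes exactly the specified document, provided it exists.
The second approach behaves as if it first searched for the documents matching the condition and then deleted them one by one by document ID. The query condition therefore has to be checked carefully and the scope of its results confirmed; otherwise many documents may be deleted by mistake.
With RestHighLevelClient, the first API works without problems. For the second, the client provides DeleteByQueryRequest but, in the version used here, no method that executes such a request. (If one exists, please let me know!) The only option is to do it in two steps: search, then delete. Two requests issued from the client are certainly slower than a single Delete By Query, but for now this workaround is the only way with this client.
Another option is to use RestClient, assemble the JSON body freely, and send the HTTP request directly.
Source: https://discuss.elastic.co/t/delete-by-query-with-new-java-rest-api/107578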
Main text
Preparing the data
/PUT http://{{host}}:{{port}}/delete_demo
{
"mappings":{
"demo":{
"properties":{
"content":{
"type":"text",
"fields":{ "keyword":{ "type":"keyword" } } }
}
}
}
}
/POST http://{{host}}:{{port}}/_bulk
{"index":{"_index":"delete_demo","_type":"demo"}}
{"content":"test1"}
{"index":{"_index":"delete_demo","_type":"demo"}}
{"content":"test1"}
{"index":{"_index":"delete_demo","_type":"demo"}}
{"content":"test1 add"}
{"index":{"_index":"delete_demo","_type":"demo"}}
{"content":"test2"}
Note: in a bulk request, every line must end with a newline, including the last one, so the body ends with a trailing newline!
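The newline rule above is easy to get wrong when the body is assembled by hand. As a minimal sketch, the body can be built so that every action line and every document line ends with a newline, including the last. The class and method names (`BulkBodySketch`, `buildIndexBody`) are hypothetical and not part of any Elasticsearch client, and the sketch assumes the index and type names need no JSON escaping.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; not part of any Elasticsearch client library.
final class BulkBodySketch {

    // Builds a newline-delimited body for _bulk: one action line plus one
    // source line per document, each terminated by '\n' (including the last).
    // Assumption: index and type contain no characters that need JSON escaping.
    static String buildIndexBody(String index, String type, List<String> jsonDocs) {
        StringBuilder body = new StringBuilder();
        for (String doc : jsonDocs) {
            body.append("{\"index\":{\"_index\":\"").append(index)
                .append("\",\"_type\":\"").append(type).append("\"}}\n");
            body.append(doc).append('\n');
        }
        return body.toString();
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList(
                "{\"content\":\"test1\"}",
                "{\"content\":\"test2\"}");
        // Prints four lines; the output ends with a newline.
        System.out.print(buildIndexBody("delete_demo", "demo", docs));
    }
}
```

The resulting string could be sent as the body of the `_bulk` request shown above.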
{
"took": 7,
"errors": false,
"items": [
{
"index": {
"_index": "delete_demo",
"_type": "demo",
"_id": "AWExGSdW00f4t28WAPen",
"_version": 1,
"result": "created",
"_shards": {
"total": 2,
"successful": 1,
"failed": 0
},
"created": true,
"status": 201
}
},
{
"index": {
"_index": "delete_demo",
"_type": "demo",
"_id": "AWExGSdW00f4t28WAPeo",
"_version": 1,
"result": "created",
"_shards": {
"total": 2,
"successful": 1,
"failed": 0
},
"created": true,
"status": 201
}
},
{
"index": {
"_index": "delete_demo",
"_type": "demo",
"_id": "AWExGSdW00f4t28WAPep",
"_version": 1,
"result": "created",
"_shards": {
"total": 2,
"successful": 1,
"failed": 0
},
"created": true,
"status": 201
}
},
{
"index": {
"_index": "delete_demo",
"_type": "demo",
"_id": "AWExGSdW00f4t28WAPeq",
"_version": 1,
"result": "created",
"_shards": {
"total": 2,
"successful": 1,
"failed": 0
},
"created": true,
"status": 201
}
}
]
}
Deleting by ID
API format
/DELETE http://{{host}}:{{port}}/delete_demo/demo/AWExGSdW00f4t28WAPen
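Java client
import java.io.IOException;

import org.elasticsearch.action.delete.DeleteRequest;
import org.elasticsearch.action.delete.DeleteResponse;
import org.elasticsearch.client.RestHighLevelClient;
import org.junit.Before;
import org.junit.Test;
import org.springframework.beans.factory.annotation.Autowired;

public class ElkDaoTest extends BaseTest {
    @Autowired
    private RestHighLevelClient rhlClient;
    private String index;
    private String type;
    private String id;

    @Before
    public void prepare() {
        index = "delete_demo";
        type = "demo";
        id = "AWExGSdW00f4t28WAPeo";
    }

    @Test
    public void delete() {
        // Delete a single document identified by index, type, and id.
        DeleteRequest deleteRequest = new DeleteRequest(index, type, id);
        DeleteResponse response = null;
        try {
            response = rhlClient.delete(deleteRequest);
        } catch (IOException e) {
            e.printStackTrace();
        }
        System.out.println(response);
    }
}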
The deletion succeeds here as well.
For how to use rhlClient, see the earlier post "ElasticSearch Rest High Level Client Tutorial (1): General Operations".
Delete By Query
API approach
First, restore the data to the original four documents.
/POST http://{{host}}:{{port}}/delete_demo/demo/_delete_by_query
{
"query":{
"match":{
"content":"test1"
}
}
}
{
"took": 14,
"timed_out": false,
"total": 3,
"deleted": 3,
"batches": 1,
"version_conflicts": 0,
"noops": 0,
"retries": {
"bulk": 0,
"search": 0
},
"throttled_millis": 0,
"requests_per_second": -1,
"throttled_until_millis": 0,
"failures": []
}
/GET http://{{host}}:{{port}}/delete_demo/demo/_search
{
"took": 0,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 1,
"hits": [
{
"_index": "delete_demo",
"_type": "demo",
"_id": "AWExKDse00f4t28WAafF",
"_score": 1,
"_source": {
"content": "test2"
}
}
]
}
}
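The result shows that three documents were deleted, namely test1, test1, and test1 add, leaving only test2. Evidently everything the query matched was deleted.
With term, deletion likewise follows whatever the query matches.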
/POST http://{{host}}:{{port}}/delete_demo/demo/_delete_by_query
{
"query":{
"term":{
"content.keyword":"test1"
}
}
}
{
"took": 6,
"timed_out": false,
"total": 2,
"deleted": 2,
"batches": 1,
"version_conflicts": 0,
"noops": 0,
"retries": {
"bulk": 0,
"search": 0
},
"throttled_millis": 0,
"requests_per_second": -1,
"throttled_until_millis": 0,
"failures": []
}
This confirms that Delete By Query is a search followed by a delete.
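Why `match` removed three documents while `term` on `content.keyword` removed only two can be illustrated without a cluster. The following is a simplified in-memory sketch, not how Elasticsearch is implemented: it assumes the `text` field is lowercased and split on whitespace (a rough stand-in for the default analysis) and that the `keyword` subfield stores the whole string unchanged. All class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Simplified model for illustration only; names are hypothetical.
final class MatchVsTermSketch {

    // Rough stand-in for an analyzed text field: lowercase, split on whitespace.
    static List<String> analyze(String text) {
        return Arrays.asList(text.toLowerCase(Locale.ROOT).trim().split("\\s+"));
    }

    // "match"-like behaviour: a document is a hit if it shares any token with the query.
    static List<String> matchQuery(List<String> docs, String query) {
        List<String> queryTokens = analyze(query);
        List<String> hits = new ArrayList<>();
        for (String doc : docs) {
            for (String token : analyze(doc)) {
                if (queryTokens.contains(token)) {
                    hits.add(doc);
                    break;
                }
            }
        }
        return hits;
    }

    // "term on keyword"-like behaviour: the whole stored string must be equal.
    static List<String> termOnKeyword(List<String> docs, String value) {
        List<String> hits = new ArrayList<>();
        for (String doc : docs) {
            if (doc.equals(value)) {
                hits.add(doc);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("test1", "test1", "test1 add", "test2");
        System.out.println(matchQuery(docs, "test1").size());    // 3
        System.out.println(termOnKeyword(docs, "test1").size()); // 2
    }
}
```

The counts line up with the `deleted` values in the two responses above, which is a reminder to check what a query matches before using it for deletion.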
Java client
Using RestHighLevelClient
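import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.delete.DeleteRequest;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.unit.TimeValue;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.index.query.TermQueryBuilder;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.SearchHits;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.junit.Before;
import org.junit.Test;
import org.springframework.beans.factory.annotation.Autowired;

public class ElkDaoTest extends BaseTest {
    @Autowired
    private RestHighLevelClient rhlClient;
    private String index;
    private String type;
    private String deleteText;

    @Before
    public void prepare() {
        index = "delete_demo";
        type = "demo";
        deleteText = "test1";
    }

    @Test
    public void delete() {
        try {
            // Step 1: search for the documents to delete.
            // Note: a search returns at most 10 hits by default, so larger
            // result sets need an explicit size or scrolling.
            SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
            sourceBuilder.timeout(new TimeValue(2, TimeUnit.SECONDS));
            TermQueryBuilder termQueryBuilder1 = QueryBuilders.termQuery("content.keyword", deleteText);
            sourceBuilder.query(termQueryBuilder1);
            SearchRequest searchRequest = new SearchRequest(index);
            searchRequest.types(type);
            searchRequest.source(sourceBuilder);
            SearchResponse response = rhlClient.search(searchRequest);
            SearchHits hits = response.getHits();
            List<String> docIds = new ArrayList<>(hits.getHits().length);
            for (SearchHit hit : hits) {
                docIds.add(hit.getId());
            }
            // Nothing matched: skip the bulk call, since an empty bulk request is rejected.
            if (docIds.isEmpty()) {
                return;
            }
            // Step 2: delete the matched documents by ID in one bulk request.
            BulkRequest bulkRequest = new BulkRequest();
            for (String id : docIds) {
                DeleteRequest deleteRequest = new DeleteRequest(index, type, id);
                bulkRequest.add(deleteRequest);
            }
            rhlClient.bulk(bulkRequest);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}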
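After restoring the data and running the code above, a search shows only the two documents test1 add and test2 remaining, so the search-then-delete approach works. The search itself is not reproduced here.
Using RestClient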
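As mentioned in earlier posts of this series, rhlClient is a wrapper around RestClient, and some of its features are still being completed and are not yet implemented in Java. In that case, simply call the ES service over HTTP through restClient.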
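import java.io.IOException;
import java.util.Collections;

import org.apache.http.HttpEntity;
import org.apache.http.entity.ContentType;
import org.apache.http.nio.entity.NStringEntity;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.common.xcontent.XContentBuilder;
import org.elasticsearch.common.xcontent.json.JsonXContent;
import org.junit.Before;
import org.junit.Test;
import org.springframework.beans.factory.annotation.Autowired;

public class ElkDaoTest extends BaseTest {
    @Autowired
    private RestClient restClient;
    private String index;
    private String type;
    private String deleteText;

    @Before
    public void prepare() {
        index = "delete_demo";
        type = "demo";
        deleteText = "test1";
    }

    @Test
    public void delete() {
        // Call the Delete By Query endpoint directly over HTTP.
        String endPoint = "/" + index + "/" + type + "/_delete_by_query";
        String source = generateQueryString();
        HttpEntity entity = new NStringEntity(source, ContentType.APPLICATION_JSON);
        try {
            restClient.performRequest("POST", endPoint, Collections.<String, String> emptyMap(), entity);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    // Builds {"query":{"term":{"content.keyword":<deleteText>}}} as a JSON string.
    // IndexRequest is used here only to turn the builder into a string.
    public String generateQueryString() {
        IndexRequest indexRequest = new IndexRequest();
        XContentBuilder builder;
        try {
            builder = JsonXContent.contentBuilder()
                    .startObject()
                        .startObject("query")
                            .startObject("term")
                                .field("content.keyword", deleteText)
                            .endObject()
                        .endObject()
                    .endObject();
            indexRequest.source(builder);
        } catch (IOException e) {
            e.printStackTrace();
        }
        String source = indexRequest.source().utf8ToString();
        return source;
    }
}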
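After running it, the two test1 documents are deleted as well, so the feature works. The advantage is that only one HTTP request is needed instead of two, which saves time.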
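The request body in the RestClient example is plain JSON, so it can also be assembled without going through `IndexRequest`. Below is a minimal sketch using only the Java standard library; it assumes the field name and value need nothing beyond basic JSON string escaping. `DeleteByQueryBodySketch`, `escape`, `termQueryBody`, and `endpoint` are hypothetical names for illustration, not client APIs.

```java
// Hypothetical helper for illustration; not part of any Elasticsearch client library.
final class DeleteByQueryBodySketch {

    // Escapes quotes, backslashes, and control characters for use inside a JSON string.
    static String escape(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '"' || c == '\\') {
                out.append('\\').append(c);
            } else if (c < 0x20) {
                out.append(String.format("\\u%04x", (int) c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    // Builds the body of a term query on a single field with a string value.
    static String termQueryBody(String field, String value) {
        return "{\"query\":{\"term\":{\"" + escape(field) + "\":\"" + escape(value) + "\"}}}";
    }

    // Builds the endpoint path in the same way as the example above.
    static String endpoint(String index, String type) {
        return "/" + index + "/" + type + "/_delete_by_query";
    }

    public static void main(String[] args) {
        System.out.println(endpoint("delete_demo", "demo"));
        System.out.println(termQueryBody("content.keyword", "test1"));
    }
}
```

The two strings could then be passed to `restClient.performRequest` in place of `endPoint` and `source`. Building JSON by concatenation is only reasonable for small fixed shapes like this one; for anything larger a JSON builder is the safer choice.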
Summary
As far as deletion goes, RestHighLevelClient is not yet complete, so it has to be combined with the flexibility of RestClient to get the functionality we want.