Elasticsearch常用查询及JAVA实现

介绍

Elasticsearch简称(ES) 是一个分布式的 RESTful 风格的搜索和数据分析引擎, 是一个高度可伸缩的开源全文搜索和分析引擎。它允许您快速和接近实时地存储、搜索和分析大量数据。
本文不讨论ES的搭建和实现原理,只介绍ES的常用查询和SQL的对应,以及部分JAVA代码的实现。

前提背景

假设你已经有一个搭建好的ES环境,主要工作是查询分析ES中的数据。
本文的示例是基于联通智慧客服对话记录的,联通热线客服和在线客服已经实现了智能语音人机对话,所有对话记录数据存储在ES中,为了更好的节约人工成本、提升服务质量,本文主要介绍如何查询和分析对话记录。

辅助工具

Postman,本文示例使用Postman调用,当然可以通过其他方式调用 REST API

常用查询

  • 查询所有索引

请求:

 http://127.0.0.1:9200/_cat/indices?v

响应:

health status index     uuid   pri rep docs.count docs.deleted store.size pri.store.size
green  open   index_0   xxxx   5   0   10855755          813      8.7gb          8.7gb
  • 查询数据结构

请求:

 http://127.0.0.1:9200/index_0/_mapping

响应:(为了方便演示只列出了部分结构)

{
    "index_0": {
        "mappings": {
            "dialogue_history": {
                "_all": {
                    "enabled": false
                },
                "dynamic_templates": [
                    {
                        "nested_records": {
                            "match": "records",
                            "mapping": {
                                "type": "nested"
                            }
                        }
                    }
                ],
                "properties": {
                    "create_time": {
                        "type": "date",
                        "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd HH:mm:ss.SSS||yyyy-MM-dd||epoch_millis"
                    },
                    "id": {
                        "type": "keyword"
                    },
                    "records": {
                        "type": "nested",
                        "properties": {
                            "intent_name": {
                                "type": "keyword"
                            },
                            "prev_intent_name": {
                                "type": "keyword"
                            },
                            "query_text": {
                                "type": "text"
                            },
                            "slot_fill_fail_num": {
                                "type": "long"
                            },
                            "slot_fill_num": {
                                "type": "long"
                            },
                            "slot_fill_success_num": {
                                "type": "long"
                            },
                            "global_variable_num": {
                                "type": "long"
                            },
                            "global_variable_success_num": {
                                "type": "long"
                            }
                        }
                    }
                }
            }
        }
    }
}

关系型数据库表示:
只用于和ES查询对比,没有实际意义

主表:dialogue_history:对话历史记录,一通电话一条记录,会有多条交互记录

描述
id主键
create_time创建时间

子表:records:交互记录,一句话一条记录

描述
id主键
parent_id父表id
intent_name意图名称
prev_intent_name前一意图名称
query_text用户话述
slot_fill_num槽位数量
slot_fill_success_num槽位成功数量
slot_fill_fail_num槽位失败数量
global_variable_num全局变量数量
global_variable_success_num全局变量成功数量

在这里插入图片描述

  • 查询所有数据(分页+排序)
    说明:查询所有数据,按创建时间排序,取 0 到 10 条记录。

请求:(后续查询都使用该地址)

 http://127.0.0.1:9200/index_0/_search

DSL语句:

{
    "query": {
        "match_all": {}
    },
    "from": 0,
    "size": 10,
    "sort": { "create_time": { "order": "desc" } }
}

SQL对比:

 SELECT * FROM dialogue_history ORDER BY create_time DESC LIMIT 0,10  
  • 一级索引查询,单字段
    说明:按主键id查询对话记录

DSL语句:

{
    "query": {
        "match": {"id":"2-1612108799-5900994"}
    }
}

SQL对比:

 SELECT * FROM dialogue_history where id = '2-1612108799-5900994'

java代码(请求拼接):

SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
BoolQueryBuilder boolBuilder = QueryBuilders.boolQuery();
boolBuilder.must(QueryBuilders.termsQuery("id", "2-1612108799-5900994"));
searchSourceBuilder.query(boolBuilder);
// RestHighLevelClient ES配置信息
RestHighLevelClient client;
SearchRequest searchRequest = new SearchRequest();
searchRequest.indices("index_0");
searchRequest.source(searchSourceBuilder);
SearchResponse searchHits= client.search(searchRequest);
  • 一级索引查询,按范围
    说明:按“创建时间”范围查询对话记录

DSL语句:

{
    "query": {
        "range": {
            "create_time": {
                "from": "2021-01-21 00:00:00",
                "to": "2021-01-22 00:00:00"
            }
        }
    }
}

SQL对比:

 SELECT * FROM dialogue_history 
 where create_time BETWEEN '2021-01-21 00:00:00' AND '2021-01-22 00:00:00'

java代码(请求拼接):

SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
BoolQueryBuilder boolBuilder = QueryBuilders.boolQuery();
boolBuilder.must(QueryBuilders.rangeQuery("create_time").from(
                dateForm.format("2021-01-21 00:00:00")).to(dateForm.format("2021-01-22 00:00:00")));
searchSourceBuilder.query(boolBuilder);
  • 二级索引查询
    说明:按“用户话述”模糊查询

DSL语句:

{
    "query": {
        "bool": {
            "must": [
                {
                    "nested": {
                        "path": "records",
                        "query": {
                            "bool": {
                                "must": [
                                    {
                                        "match": {
                                            "records.query_text": "欢迎语通用播报"
                                        }
                                    }
                                ]
                            }
                        }
                    }
                }
            ]
        }
    }
} 

SQL对比:

 SELECT * FROM dialogue_history d 
 left outer join records r on d.id = records.parent_id 
 where records.query_text like '%欢迎语通用播报%'

java代码(请求拼接):

SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
BoolQueryBuilder boolBuilder = QueryBuilders.boolQuery();
boolBuilder.must(QueryBuilders.nestedQuery("records",
                        QueryBuilders.boolQuery().must(QueryBuilders.matchPhraseQuery("records.query_text",
                                "欢迎语通用播报")), ScoreMode.Total));
searchSourceBuilder.query(boolBuilder);
  • 二级索引聚合查询
    说明:按“意图名称”聚合,对“槽位数量”、“槽位成功数量”、“槽位失败数量”求和。

DSL语句:

{
    "query": {
        "match_all": {}
    },
    "size": 0,
    "aggs": {
        "intent_group": {
            "nested": {
                "path": "records"
            },
            "aggs": {
                "intent_name": {
                    "terms": {
                        "field": "records.intent_name"
                    },
                    "aggs": {
                        "slot_fill_num": {
                            "sum": {
                                "field": "records.slot_fill_num"
                            }
                        },
                        "slot_fill_success_num": {
                            "sum": {
                                "field": "records.slot_fill_success_num"
                            }
                        },
                        "slot_fill_fail_num": {
                            "sum": {
                                "field": "records.slot_fill_fail_num"
                            }
                        }
                    }
                }
            }
        }
    }
}

SQL对比:

 SELECT records.intent_name intent_name,
	 count(records.intent_name) count,
	 sum(records.slot_fill_num) slot_fill_num,
	 sum(records.slot_fill_success_num) slot_fill_success_num,
	 sum(records.slot_fill_fail_num) slot_fill_fail_num
 FROM dialogue_history d 
 left outer join records  on d.id = records.parent_id 
 group by records.intent_name

java代码(请求拼接):

 // 按意图统计
TermsAggregationBuilder intentAggr
        = AggregationBuilders.terms("count").field("records.intent_name");
// 按槽位总数求和
SumAggregationBuilder slotFillNumAggr
        = AggregationBuilders.sum("slot_fill_num").field("records.slot_fill_num");
// 按槽位成功数求和
SumAggregationBuilder slotFillSuccessNumAggr
        = AggregationBuilders.sum("slot_fill_success_num").field("records.slot_fill_success_num");
// 按槽位失败数求和
SumAggregationBuilder slotFillFailNumAggr
        = AggregationBuilders.sum("slot_fill_fail_num").field("records.slot_fill_fail_num");

intentAggr.subAggregation(slotFillNumAggr)
        .subAggregation(slotFillSuccessNumAggr)
        .subAggregation(slotFillFailNumAggr)
        .size(Integer.MAX_VALUE);

NestedAggregationBuilder aggregationBuilder = AggregationBuilders
        .nested("records", "records").subAggregation(intentAggr);
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
searchSourceBuilder.size(0);
searchSourceBuilder.aggregation(aggregationBuilder);
// RestHighLevelClient ES配置信息
RestHighLevelClient client;
SearchRequest searchRequest = new SearchRequest();
searchRequest.indices("index_0");
searchRequest.source(searchSourceBuilder);
SearchResponse searchHits= client.search(searchRequest);

java代码(返回解析):

Map<String, Aggregation> detailsMap = searchHits.getAggregations().asMap();
try {
   ParsedNested details = (ParsedNested) detailsMap.get("records");
   Map<String, Aggregation> countMap = details.getAggregations().asMap();
   ParsedStringTerms count = (ParsedStringTerms) countMap.get("count");

   count.getBuckets().forEach(it -> {
       String intentName = (String) it.getKey();
       Long intentSum = it.getDocCount();
       Map<String, Aggregation> sumMap = it.getAggregations().asMap();
       ParsedSum slotFillNum = (ParsedSum) sumMap.get("slot_fill_num");
       ParsedSum slotFillSuccessNum = (ParsedSum) sumMap.get("slot_fill_success_num");
       ParsedSum slotFillFailNum = (ParsedSum) sumMap.get("slot_fill_fail_num");
       result.add(new StatisticIntentSlot(
               intentName,
               intentSum,
               slotFillNum.getValue(),
               slotFillSuccessNum.getValue(),
               slotFillFailNum.getValue()));
   });
} catch (Exception e) {
   LoggerHelper.err(getClass(), e.getMessage(), e);
}
  • 二级索引聚合查询+多列聚合+过滤
    说明:按“意图名称”、“前一意图名称”聚合,过滤掉没有“全局变量数量”的记录,对“全局变量数量”、“全局变量成功数量”求和。。

DSL语句:

{
    "query": {
        "match_all": {}
    },
    "size": 0,
    "aggregations": {
        "records": {
            "nested": {
                "path": "records"
            },
            "aggregations": {
                "global_variable": {
                    "filter": {
                        "exists": {
                            "field": "records.global_variable_num"
                        }
                    },
                    "aggregations": {
                        "prev_intent_name": {
                            "terms": {
                                "field": "records.prev_intent_name",
                                 "size": 10
                            },
                            "aggregations": {
                                "intent_name": {
                                    "terms": {
                                        "field": "records.intent_name",
                                        "size": 2147483647,
                                    },
                                    "aggregations": {
                                        "global_variable_num": {
                                            "sum": {
                                                "field": "records.global_variable_num"
                                            }
                                        },
                                        "global_variable_success_num": {
                                            "sum": {
                                                "field": "records.global_variable_success_num"
                                            }
                                        }
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    }
}

SQL对比:

 SELECT records.prev_intent_name,
 	 records.intent_name,
	 count(records.prev_intent_name) count,
	 sum(records.global_variable_num) global_variable_num,
	 sum(records.global_variable_success_num) global_variable_success_num
 FROM dialogue_history d 
 left outer join records  on d.id = records.parent_id 
 group by
	records.prev_intent_name,
	records.intent_name
 HAVING records.global_variable_num is not null 

java代码(请求拼接):

 // 过滤聚合数据
FilterAggregationBuilder filterAggr
        = AggregationBuilders.filter("global_variable", QueryBuilders.existsQuery("records.global_variable_num"));
// 按上一意图统计
TermsAggregationBuilder prevIntentAggr
        = AggregationBuilders.terms("prev_intent_name").field("records.prev_intent_name");
// 按意图统计
TermsAggregationBuilder intentAggr
        = AggregationBuilders.terms("intent_name").field("records.intent_name");
// 按变量总数求和
SumAggregationBuilder slotFillNumAggr
        = AggregationBuilders.sum("global_variable_num").field("records.global_variable_num");
// 按变量成功数求和
SumAggregationBuilder slotFillSuccessNumAggr
        = AggregationBuilders.sum("global_variable_success_num").
        field("records.global_variable_success_num");

filterAggr.subAggregation(
        prevIntentAggr.subAggregation(
                intentAggr.subAggregation(slotFillNumAggr).subAggregation(slotFillSuccessNumAggr)
                        .size(Integer.MAX_VALUE)));

NestedAggregationBuilder aggregationBuilder = AggregationBuilders
        .nested("records", "records").subAggregation(filterAggr);

SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
searchSourceBuilder.size(0);
searchSourceBuilder.aggregation(aggregationBuilder);
// RestHighLevelClient ES配置信息
RestHighLevelClient client;
SearchRequest searchRequest = new SearchRequest();
searchRequest.indices("index_0");
searchRequest.source(searchSourceBuilder);
SearchResponse searchHits= client.search(searchRequest);

java代码(返回解析):

Map<String, Aggregation> detailsMap = searchHits.getAggregations().asMap();
try {
    ParsedNested records = (ParsedNested) detailsMap.get("records");
    Map<String, Aggregation> globalVariableMap = records.getAggregations().asMap();
    ParsedFilter globalVariableFilter = (ParsedFilter) globalVariableMap.get("global_variable");
    Map<String, Aggregation> prevIntentNameMap = globalVariableFilter.getAggregations().asMap();
    ParsedStringTerms prevIntentNameCount = (ParsedStringTerms) prevIntentNameMap.get("prev_intent_name");
    prevIntentNameCount.getBuckets().forEach(prevIntentNameCountIt -> {
        String prevIntentName = (String) prevIntentNameCountIt.getKey();
        Map<String, Aggregation> intentNameMap = prevIntentNameCountIt.getAggregations().asMap();
        ParsedStringTerms intentNameCount = (ParsedStringTerms) intentNameMap.get("intent_name");
        intentNameCount.getBuckets().forEach(it -> {
            String intentName = (String) it.getKey();
            Long intentSum = it.getDocCount();
            Map<String, Aggregation> sumMap = it.getAggregations().asMap();
            ParsedSum globalVariableNum = (ParsedSum) sumMap.get("global_variable_num");
            ParsedSum globalVariableSuccessNum = (ParsedSum) sumMap.get("global_variable_success_num");
            result.add(new StatisticGlobalIntentSlot(
                    intentName,
                    prevIntentName,
                    intentSum,
                    globalVariableNum.getValue(),
                    globalVariableSuccessNum.getValue()));
        });

    });
} catch (Exception e) {
    LoggerHelper.err(getClass(), e.getMessage(), e);
}

总结

  1. SQL能查的结构ES都能查,SQL查不了试试ES能不能查。
  2. 注意字段类型,节省索引空间。
  3. 一级索引、二级索引查询效率没问题,三级索引性能会下降。

后记

  1. 本文是基于业务提炼的ES查询示例,为用惯了SQL的同学提供参考,也为自己备忘。随着业务发展还会继续更新。
  2. 本文只摘录了部分查询DSL语句和JAVA代码,直接运行可能会有问题,有什么问题欢迎大家留言指正。
  • 2
    点赞
  • 8
    收藏
    觉得还不错? 一键收藏
  • 打赏
    打赏
  • 0
    评论
Java实现ES(Elasticsearch)的left join关联查询,可以通过以下步骤进行: 1. 首先,你需要使用JavaElasticsearch客户端库来连接和操作Elasticsearch常用Java Elasticsearch客户端库有Elasticsearch Java High Level REST Client和Elasticsearch Java Transport Client。 2. 创建一个Elasticsearch查询请求,使用`SearchRequest`对象来定义查询的索引、类型和查询条件等信息。 3. 使用`SearchSourceBuilder`对象来构建查询的DSL(Domain Specific Language),包括设置查询条件、过滤条件、排序等。 4. 对于left join关联查询,你可以使用Elasticsearch的`Nested`或`Parent-Child`关系来实现。如果你的数据模型存在嵌套字段,可以使用`Nested`关系来进行关联查询。如果你的数据模型存在父子关系,可以使用`Parent-Child`关系来进行关联查询。 5. 在查询DSL,使用`Nested Query`或`Has Parent Query`来进行关联查询。你可以通过设置`path`参数指定嵌套字段的路径或者指定父文档的类型。 6. 执行查询请求,获取查询结果。使用Elasticsearch客户端库提供的方法来发送查询请求并解析返回的结果。 下面是一个示例代码片段,演示了如何使用Java Elasticsearch High Level REST Client实现ES的left join关联查询: ```java // 创建Elasticsearch客户端 RestHighLevelClient client = new RestHighLevelClient( RestClient.builder(new HttpHost("localhost", 9200, "http"))); // 创建查询请求 SearchRequest searchRequest = new SearchRequest("index_name"); searchRequest.types("type_name"); // 构建查询DSL SearchSourceBuilder sourceBuilder = new SearchSourceBuilder(); sourceBuilder.query(QueryBuilders.nestedQuery("nested_field", QueryBuilders.matchQuery("nested_field.field_name", "query_value"), ScoreMode.None)); // 设置查询DSL searchRequest.source(sourceBuilder); // 执行查询请求 SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT); // 解析查询结果 SearchHits hits = searchResponse.getHits(); for (SearchHit hit : hits) { // 处理每个文档的结果 Map<String, Object> sourceAsMap = hit.getSourceAsMap(); // ... } // 关闭Elasticsearch客户端 client.close(); ```

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包

打赏作者

liutao5459

你的鼓励将是我创作的最大动力

¥1 ¥2 ¥4 ¥6 ¥10 ¥20
扫码支付:¥1
获取中
扫码支付

您的余额不足,请更换扫码支付或充值

打赏作者

实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值