elasticsearch group by java_Elasticsearch如何实现 SQL语句中 Group By 和 Limit 的功能

有 SQL 背景的同学在学习 Elasticsearch 时,面对一个查询需求,不由自主地会先思考如何用 SQL 来实现,然后再去想 Elasticsearch 的 Query DSL 如何实现。那么本篇就给大家讲一条常见的 SQL 语句如何用 Elasticsearch 的查询语言实现。

1. SQL语句

假设我们有一个汽车的数据集,每个汽车都有车型、颜色等字段,我希望获取颜色种类大于1个的前2车型。假设汽车的数据模型如下:

{

"model":"modelA",

"color":"red"

}

假设我们有一个 cars 表,通过如下语句创建测试数据。

INSERT INTO cars (model,color) VALUES ('A','red');

INSERT INTO cars (model,color) VALUES ('A','white');

INSERT INTO cars (model,color) VALUES ('A','black');

INSERT INTO cars (model,color) VALUES ('A','yellow');

INSERT INTO cars (model,color) VALUES ('B','red');

INSERT INTO cars (model,color) VALUES ('B','white');

INSERT INTO cars (model,color) VALUES ('C','black');

INSERT INTO cars (model,color) VALUES ('C','red');

INSERT INTO cars (model,color) VALUES ('C','white');

INSERT INTO cars (model,color) VALUES ('C','yellow');

INSERT INTO cars (model,color) VALUES ('C','blue');

INSERT INTO cars (model,color) VALUES ('D','red');

INSERT INTO cars (model,color) VALUES ('A','red');

那么实现我们需求的 SQL 语句也比较简单,实现如下:

SELECT model,COUNT(DISTINCT color) color_count FROM cars GROUP BY model HAVING color_count > 1 ORDER BY color_count desc LIMIT 2;

这条查询语句中 Group By 是按照 model 做分组, Having color_count>1 限定了车型颜色种类大于1,ORDER BY color_count desc 限定结果按照颜色种类倒序排列,而 LIMIT 2 限定只返回前3条数据。

那么在 Elasticsearch 中如何实现这个需求呢?

2. 在 Elasticsearch 模拟测试数据

首先我们需要先在 elasticsearch 中插入测试的数据,这里我们使用 bulk 接口 ,如下所示:

POST _bulk

{"index":{"_index":"cars","_type":"doc","_id":"1"}}

{"model":"A","color":"red"}

{"index":{"_index":"cars","_type":"doc","_id":"2"}}

{"model":"A","color":"white"}

{"index":{"_index":"cars","_type":"doc","_id":"3"}}

{"model":"A","color":"black"}

{"index":{"_index":"cars","_type":"doc","_id":"4"}}

{"model":"A","color":"yellow"}

{"index":{"_index":"cars","_type":"doc","_id":"5"}}

{"model":"B","color":"red"}

{"index":{"_index":"cars","_type":"doc","_id":"6"}}

{"model":"B","color":"white"}

{"index":{"_index":"cars","_type":"doc","_id":"7"}}

{"model":"C","color":"black"}

{"index":{"_index":"cars","_type":"doc","_id":"8"}}

{"model":"C","color":"red"}

{"index":{"_index":"cars","_type":"doc","_id":"9"}}

{"model":"C","color":"white"}

{"index":{"_index":"cars","_type":"doc","_id":"10"}}

{"model":"C","color":"yellow"}

{"index":{"_index":"cars","_type":"doc","_id":"11"}}

{"model":"C","color":"blue"}

{"index":{"_index":"cars","_type":"doc","_id":"12"}}

{"model":"D","color":"red"}

{"index":{"_index":"cars","_type":"doc","_id":"13"}}

{"model":"A","color":"red"}

其中 index 为 cars,type 为 doc,所有数据与mysql 数据保持一致。大家可以在 Kibana 的 Dev Tools 中执行上面的命令,然后执行下面的查询语句验证数据是否已经成功存入。

GET cars/_search

3. Group By VS Terms/Metric Aggregation

SQL 中 Group By 语句在 Elasticsearch 中对应的是 Terms Aggregation,即分桶聚合,对应 Group By color 的语句如下所示:

GET cars/_search

{

"size":0,

"aggs":{

"models":{

"terms":{

"field":"model.keyword"

}

}

}

}

结果如下:

{

"took": 161,

"timed_out": false,

"_shards": {

"total": 5,

"successful": 5,

"skipped": 0,

"failed": 0

},

"hits": {

"total": 13,

"max_score": 0,

"hits": []

},

"aggregations": {

"models": {

"doc_count_error_upper_bound": 0,

"sum_other_doc_count": 0,

"buckets": [

{

"key": "A",

"doc_count": 5

},

{

"key": "C",

"doc_count": 5

},

{

"key": "B",

"doc_count": 2

},

{

"key": "D",

"doc_count": 1

}

]

}

}

}

我们看 aggregations 这个 key 下面的即为返回结果。

SQL 语句中还有一项是 COUNT(DISTINCT color) color_count 用于计算每个 model 的颜色数,在 Elasticsearch 中我们需要使用一个指标类聚合 Cardinality ,进行不同值计数。语句如下:

GET cars/_search

{

"size": 0,

"aggs": {

"models": {

"terms": {

"field": "model.keyword"

},

"aggs": {

"color_count": {

"cardinality": {

"field": "color.keyword"

}

}

}

}

}

}

其返回结果如下:

{

"took": 74,

"timed_out": false,

"_shards": {

"total": 5,

"successful": 5,

"skipped": 0,

"failed": 0

},

"hits": {

"total": 13,

"max_score": 0,

"hits": []

},

"aggregations": {

"models": {

"doc_count_error_upper_bound": 0,

"sum_other_doc_count": 0,

"buckets": [

{

"key": "A",

"doc_count": 5,

"color_count": {

"value": 4

}

},

{

"key": "C",

"doc_count": 5,

"color_count": {

"value": 5

}

},

{

"key": "B",

"doc_count": 2,

"color_count": {

"value": 2

}

},

{

"key": "D",

"doc_count": 1,

"color_count": {

"value": 1

}

}

]

}

}

}

结果中 color_count 即为每个 model 的颜色数,但这里所有的模型都返回了,我们只想要颜色数大于1的模型,因此这里还要加一个过滤条件。

4. Having Condition VS Bucket Filter Aggregation

Having color_count > 1 在 Elasticsearch 中对应的是 Bucket Filter 聚合,语句如下所示:

GET cars/_search

{

"size": 0,

"aggs": {

"models": {

"terms": {

"field": "model.keyword"

},

"aggs": {

"color_count": {

"cardinality": {

"field": "color.keyword"

}

},

"color_count_filter": {

"bucket_selector": {

"buckets_path": {

"colorCount": "color_count"

},

"script": "params.colorCount>1"

}

}

}

}

}

}

返回结果如下:

{

"took": 39,

"timed_out": false,

"_shards": {

"total": 5,

"successful": 5,

"skipped": 0,

"failed": 0

},

"hits": {

"total": 13,

"max_score": 0,

"hits": []

},

"aggregations": {

"models": {

"doc_count_error_upper_bound": 0,

"sum_other_doc_count": 0,

"buckets": [

{

"key": "A",

"doc_count": 5,

"color_count": {

"value": 4

}

},

{

"key": "C",

"doc_count": 5,

"color_count": {

"value": 5

}

},

{

"key": "B",

"doc_count": 2,

"color_count": {

"value": 2

}

}

]

}

}

}

此时返回结果只包含颜色数大于1的模型,但大家会发现颜色数多的 C 不是在第一个位置,我们还需要做排序处理。

5. Order By Limit VS Bucket Sort Aggregation

ORDER BY color_count desc LIMIT 3 在 Elasticsearch 中可以使用 Bucket Sort 聚合实现,语句如下所示:

GET cars/_search

{

"size": 0,

"aggs": {

"models": {

"terms": {

"field": "model.keyword"

},

"aggs": {

"color_count": {

"cardinality": {

"field": "color.keyword"

}

},

"color_count_filter": {

"bucket_selector": {

"buckets_path": {

"colorCount": "color_count"

},

"script": "params.colorCount>1"

}

},

"color_count_sort": {

"bucket_sort": {

"sort": {

"color_count": "desc"

},

"size": 2

}

}

}

}

}

}

返回结果如下:

{

"took": 32,

"timed_out": false,

"_shards": {

"total": 5,

"successful": 5,

"skipped": 0,

"failed": 0

},

"hits": {

"total": 13,

"max_score": 0,

"hits": []

},

"aggregations": {

"models": {

"doc_count_error_upper_bound": 0,

"sum_other_doc_count": 0,

"buckets": [

{

"key": "C",

"doc_count": 5,

"color_count": {

"value": 5

}

},

{

"key": "A",

"doc_count": 5,

"color_count": {

"value": 4

}

}

]

}

}

}

至此我们便将 SQL 语句实现的功能用 Elasticsearch 查询语句实现了。对比 SQL 语句与 Elasticsearch 的查询语句,大家会发现后者复杂了很多,但并非无章可循,随着大家对常见语法越来越熟悉,相信一定会越写越得心应手!

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值