I. Basic queries
{
"query": {
"bool": {
"must": [
{
"wildcard": {
"interface_name": "*CacheRequestBodyFilter*"
}
},
{
"match_phrase_prefix": {
"message": "接口访问"
}
},
{
"range": {
"@timestamp": {
"gte": "2021-10-25 07:14:05",
"lte": "now",
"format": "yyyy-MM-dd HH:mm:ss",
"time_zone": "+8"
}
}
}
]
}
},
"size": 100
}
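For fields whose mapping type is text, ES analyzes (tokenizes) the field value.
Focus on:
- term: exact match, typically used for prices, IDs, usernames and similar values; comparable to = in SQL
- terms: exact match against multiple values; comparable to IN in SQL
- match: full-text match; the query text is analyzed first, then the resulting tokens are matched
- match_phrase: phrase match; unlike match, the tokens must appear in the same order as in the query
- match_phrase_prefix: phrase match; like match_phrase, except that the last token is matched as a prefix
- multi_match: runs the same match query against multiple fields
- wildcard: wildcard query
- fuzzy: fuzzy (edit-distance tolerant) query; for example, searching for JVA may return JAVA
- range: range query
- regexp: regular-expression query
- exists: returns documents in which the field has a non-null value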
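The difference between term, match, match_phrase and match_phrase_prefix can be sketched with a toy model. This is a simplified illustration, not how Elasticsearch is implemented: the analyzer here only lowercases and splits on whitespace, term is modelled as it behaves on an unanalyzed (keyword-like) value, and all function names are hypothetical.

```python
def analyze(text):
    """Toy analyzer: lowercase and split on whitespace (real analyzers do much more)."""
    return text.lower().split()


def term_match(stored_value, query):
    """term: the query is not analyzed and must equal the stored value exactly."""
    return stored_value == query


def match(field_text, query):
    """match: analyze the query, then hit if any query token occurs in the field."""
    doc_tokens = set(analyze(field_text))
    return any(token in doc_tokens for token in analyze(query))


def match_phrase(field_text, query):
    """match_phrase: the query tokens must occur adjacent and in the same order."""
    doc, q = analyze(field_text), analyze(query)
    if not q:
        return False
    return any(doc[i:i + len(q)] == q for i in range(len(doc) - len(q) + 1))


def match_phrase_prefix(field_text, query):
    """match_phrase_prefix: like match_phrase, but the last token is a prefix."""
    doc, q = analyze(field_text), analyze(query)
    if not q:
        return False
    for i in range(len(doc) - len(q) + 1):
        window = doc[i:i + len(q)]
        if window[:-1] == q[:-1] and window[-1].startswith(q[-1]):
            return True
    return False


text = "Quick brown fox jumps"
print(match(text, "lazy fox"))                    # one shared token is enough
print(match_phrase(text, "brown quick"))          # wrong order, no hit
print(match_phrase_prefix(text, "brown fox ju"))  # "ju" is a prefix of "jumps"
```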
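2. runtime_mappings
Runtime fields can be defined either in the index mapping or in the body of a DSL search request, and their values are computed at query time. The focus here is on runtime_mappings in a DSL query.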
{
"runtime_mappings": {
"day_of_week": {
"type": "keyword",
"script": {
"source": "emit(doc['@timestamp'].value.dayOfWeekEnum.getDisplayName(TextStyle.FULL, Locale.ROOT))"
}
}
},
"aggs": {
"day_of_week": {
"terms": {
"field": "day_of_week"
}
}
}
}
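A runtime field can be built on top of another runtime field. Runtime fields in a mapping can be added after the mapping has been created, and they override a field of the same name defined under properties.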
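II. Aggregation queries
2.1. Metric aggregations, comparable to aggregate functions in SQL
Focus on:
- avg: mean
- max: maximum
- min: minimum
- value_count: count of values
- cardinality: distinct count (approximate)
- sum: sum
- stats: returns min, max, sum, count and avg together
Example:
Format: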
{
"aggs":{
"your_aggregation_name":{
"metric_aggregation_type":{
"field":"field_to_aggregate"
}
}
}
}
Request:
{
"aggs": {
"grades_stats": { "stats": { "field": "grade" } }
}
}
Response:
{
...
"aggregations": {
"grades_stats": {
"count": 2,
"min": 50.0,
"max": 100.0,
"avg": 75.0,
"sum": 150.0
}
}
}
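What stats returns can be reproduced locally with a minimal sketch. It assumes the two indexed grade values are 50 and 100, which is consistent with the response above but not stated in it; the function name is hypothetical.

```python
def stats(values):
    """Compute the same five numbers that the stats aggregation reports."""
    count = len(values)
    if count == 0:
        # With no values there is nothing to take a min, max or average of.
        return {"count": 0, "min": None, "max": None, "avg": None, "sum": 0.0}
    total = float(sum(values))
    return {
        "count": count,
        "min": float(min(values)),
        "max": float(max(values)),
        "avg": total / count,
        "sum": total,
    }


print(stats([50, 100]))
```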
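2.2. Bucket aggregations
Focus on (parameters of the terms aggregation):
size: number of buckets to return (default 10). If you need more than about 1000 buckets, use the composite bucket aggregation instead.
order: sorts the returned buckets; the default is "_count":"desc"
min_doc_count: only returns buckets whose document count is at least the configured value
include: either a regular expression such as ".*sport.*" or an array of exact values such as [ "mazda", "honda" ]; only terms that match are counted. include can also partition the field values. For example, to expire a batch of user caches by last login time when there is too much data for one request, first use cardinality to estimate the number of users (say 100) and set num_partitions to 10, so that each partition holds roughly 10 users; with size set to 5, each request returns the top 5 users of the partition it targets
exclude: either a regular expression or an array of exact values; terms that match are not counted
missing: defines how documents that lack a value are handled. By default they are ignored, but they can also be treated as if they had the given value
Example: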
{
"size": 0,
"aggs": {
"expired_sessions": {
"terms": {
"field": "account_id",
"include": {
"partition": 0,
"num_partitions": 20
},
"exclude": [ "xxx", "aaa" ],
"size": 10000,
"order": {
"last_access": "asc"
},
"missing": "0",
"value_type": "keyword"
},
"aggs": {
"last_access": {
"max": {
"field": "access_date"
}
}
}
}
}
}
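Partitioned include means sending one request per partition number. The sketch below only builds the request bodies; sending them to a cluster is left out. The helper names and the aggregation name by_partition are hypothetical, and the sizing rule simply follows the 100 users / 10 partitions reasoning above.

```python
def partitions_needed(estimated_cardinality, terms_per_request):
    """Round up so that each partition holds about terms_per_request terms."""
    return -(-estimated_cardinality // terms_per_request)


def partition_requests(field, num_partitions, size):
    """Yield one terms-aggregation request body per partition."""
    for partition in range(num_partitions):
        yield {
            "size": 0,
            "aggs": {
                "by_partition": {
                    "terms": {
                        "field": field,
                        "include": {
                            "partition": partition,
                            "num_partitions": num_partitions,
                        },
                        "size": size,
                    }
                }
            },
        }


# 100 users (estimated via cardinality), about 10 users per request.
n = partitions_needed(100, 10)
for body in partition_requests("account_id", n, size=10):
    print(body["aggs"]["by_partition"]["terms"]["include"])
```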
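- range: counts documents per numeric range; from is inclusive and to is exclusive. For example, with prices:
keyed: when true, buckets are returned as an object keyed by name rather than as an array; use it together with the key of each entry in ranges to give the buckets custom names
Request: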
{
"aggs": {
"price_ranges": {
"range": {
"field": "price",
"keyed": true,
"ranges": [
{ "key": "cheap", "to": 100 },
{ "key": "average", "from": 100, "to": 200 },
{ "key": "expensive", "from": 200 }
]
}
}
}
}
Response:
{
...
"aggregations": {
"price_ranges": {
"buckets": {
"cheap": {
"to": 100.0,
"doc_count": 2
},
"average": {
"from": 100.0,
"to": 200.0,
"doc_count": 2
},
"expensive": {
"from": 200.0,
"doc_count": 3
}
}
}
}
}
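The "from inclusive, to exclusive" rule is easiest to see at the boundaries. The sketch below is a local model of the bucketing, not an Elasticsearch call; the price list is made up so that the counts match the response above.

```python
def range_buckets(values, ranges):
    """Count values per range; `from` is inclusive, `to` is exclusive."""
    counts = {r["key"]: 0 for r in ranges}
    for value in values:
        for r in ranges:
            above_from = "from" not in r or value >= r["from"]
            below_to = "to" not in r or value < r["to"]
            if above_from and below_to:
                counts[r["key"]] += 1
    return counts


ranges = [
    {"key": "cheap", "to": 100},
    {"key": "average", "from": 100, "to": 200},
    {"key": "expensive", "from": 200},
]
# Hypothetical prices: 100 lands in "average" and 200 in "expensive".
prices = [10, 50, 100, 150, 200, 250, 300]
print(range_buckets(prices, ranges))
```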
range can also bucket a dynamic field defined in runtime_mappings:
{
"runtime_mappings": {
"price.euros": {
"type": "double",
"script": {
"source": """
emit(doc['price'].value * params.conversion_rate)
""",
"params": {
"conversion_rate": 0.835526591
}
}
}
},
"aggs": {
"price_ranges": {
"range": {
"field": "price.euros",
"ranges": [
{ "to": 100 },
{ "from": 100, "to": 200 },
{ "from": 200 }
]
}
}
}
}
range can also be applied to histogram fields
- date_range: range aggregation for dates
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-daterange-aggregation.html
See the official documentation for the supported format patterns and units when needed
Request:
{
"aggs": {
"range": {
"date_range": {
"field": "date",
"format": "MM-yyy",
"time_zone": "UTC", // usually omitted
"ranges": [
{ "from": "01-2015", "to": "03-2015", "key": "quarter_01" },
{ "from": "03-2015", "to": "06-2015", "key": "quarter_02" }
],
"keyed": true
}
}
}
}
Response:
{
...
"aggregations": {
"range": {
"buckets": {
"quarter_01": {
"from": 1.4200704E12,
"from_as_string": "01-2015",
"to": 1.425168E12,
"to_as_string": "03-2015",
"doc_count": 5
},
"quarter_02": {
"from": 1.425168E12,
"from_as_string": "03-2015",
"to": 1.4331168E12,
"to_as_string": "06-2015",
"doc_count": 2
}
}
}
}
}
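The from and to numbers in the response are epoch milliseconds of the range bounds. As a quick check, the sketch below converts the month-year strings from the request into epoch milliseconds, assuming UTC as in the request; the helper name is hypothetical.

```python
from datetime import datetime, timezone


def month_start_millis(mm_yyyy):
    """Epoch milliseconds of the first instant (UTC) of a 'MM-yyyy' month."""
    month, year = mm_yyyy.split("-")
    start = datetime(int(year), int(month), 1, tzinfo=timezone.utc)
    return int(start.timestamp() * 1000)


for bound in ("01-2015", "03-2015", "06-2015"):
    print(bound, month_start_millis(bound))
```

The three values are the same numbers that the response prints in scientific notation (1.4200704E12, 1.425168E12, 1.4331168E12).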
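- filters: puts the documents that match each filter into that filter's bucket and counts them
other_bucket_key: collects every document that matches none of the filters into a bucket with this name; if the property is omitted, the response only counts documents that match a filter
Request: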
{
"size": 0,
"aggs" : {
"messages" : {
"filters" : {
"other_bucket_key": "other_messages",
"filters" : {
"errors" : { "match" : { "body" : "error" }},
"warnings" : { "match" : { "body" : "warning" }}
}
}
}
}
}
Response:
{
"took": 3,
"timed_out": false,
"_shards": ...,
"hits": ...,
"aggregations": {
"messages": {
"buckets": {
"errors": {
"doc_count": 1
},
"warnings": {
"doc_count": 2
},
"other_messages": {
"doc_count": 1
}
}
}
}
}
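The behaviour of filters together with other_bucket_key can be modelled locally. This is a sketch with made-up documents chosen to reproduce the counts above; the predicates are a crude stand-in for the match queries, and the function name is hypothetical.

```python
def filters_agg(docs, filters, other_bucket_key=None):
    """Count docs per named filter; unmatched docs go to the optional other bucket."""
    buckets = {name: 0 for name in filters}
    other = 0
    for doc in docs:
        matched = False
        for name, predicate in filters.items():
            if predicate(doc):
                buckets[name] += 1  # a doc may fall into several buckets
                matched = True
        if not matched:
            other += 1
    if other_bucket_key is not None:
        buckets[other_bucket_key] = other
    return buckets


# Hypothetical documents.
docs = [
    {"body": "disk error"},
    {"body": "low memory warning"},
    {"body": "cpu warning"},
    {"body": "service started"},
]
filters = {
    "errors": lambda d: "error" in d["body"].split(),
    "warnings": lambda d: "warning" in d["body"].split(),
}
print(filters_agg(docs, filters, other_bucket_key="other_messages"))
```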
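- multi_terms: equivalent to a terms aggregation over a combination of several fields. The official recommendation is to use multi_terms when you need to sort on the combined key or need only the top N buckets; otherwise consider adding a combined field at index time, defining a dynamic field with runtime_mappings (so that a plain terms aggregation works), or using the composite aggregation
Request: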
{
"aggs": {
"genres_and_products": {
"multi_terms": {
"terms": [{
"field": "genre"
}, {
"field": "product",
"missing": "Product Z"
}]
}
}
}
}
Response:
{
...
"aggregations" : {
"genres_and_products" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : [
"rock",
"Product A"
],
"key_as_string" : "rock|Product A",
"doc_count" : 2
},
{
"key" : [
"electronic",
"Product B"
],
"key_as_string" : "electronic|Product B",
"doc_count" : 1
},
{
"key" : [
"jazz",
"Product B"
],
"key_as_string" : "jazz|Product B",
"doc_count" : 1
},
{
"key" : [
"rock",
"Product B"
],
"key_as_string" : "rock|Product B",
"doc_count" : 1
}
]
}
}
}
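- composite: the composite aggregation is very expensive, so test it carefully before using it in production. Compared with multi_terms, composite can be thought of as the paginated version of multi_terms.
sources: every grouping condition of a composite aggregation goes under sources, and the order of the array determines the order of the keys in the response. Only four source types are allowed: Terms, Histogram, Date histogram, GeoTile grid.
missing_bucket: by default, documents without a value for the field are ignored; if this property is true, those documents are included in the response as well.
To make composite faster, configure index sorting in the index settings and have the query follow a leftmost-prefix rule similar to SQL composite indexes, for example: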
{
"settings": {
"index": {
"sort.field": [ "username", "timestamp" ],
"sort.order": [ "asc", "desc" ]
}
},
"mappings": {
"properties": {
"username": {
"type": "keyword",
"doc_values": true
},
"timestamp": {
"type": "date"
}
}
}
}
An example request:
{
"size": 0,
"track_total_hits": false,
"aggs": {
"my_buckets": {
"composite": {
"sources": [
{ "user_name": { "terms": { "field": "user_name", "order": "asc", "missing_bucket": true} } },
{ "date": { "date_histogram": { "field": "timestamp", "calendar_interval": "1d", "order": "desc" } } }
]
}
}
}
}
To paginate, add the size parameter and note the after_key value in the response
{
"size": 0,
"track_total_hits": false,
"aggs": {
"my_buckets": {
"composite": {
"size": 2,
"sources": [
{ "user_name": { "terms": { "field": "user_name", "order": "asc", "missing_bucket": true} } },
{ "date": { "date_histogram": { "field": "timestamp", "calendar_interval": "1d", "order": "desc" } } }
]
}
}
}
}
Response:
{
...
"aggregations": {
"my_buckets": {
"after_key": {
"user_name": "mad max",
"date": 1494288000000
},
"buckets": [
{
"key": {
"user_name": "rocky",
"date": 1494201600000
},
"doc_count": 1
},
{
"key": {
"user_name": "mad max",
"date": 1494288000000
},
"doc_count": 2
}
]
}
}
}
The next request then carries the after_key value
{
"size": 0,
"track_total_hits": false,
"aggs": {
"my_buckets": {
"composite": {
"size": 2,
"sources": [
{ "user_name": { "terms": { "field": "user_name", "order": "asc", "missing_bucket": true} } },
{ "date": { "date_histogram": { "field": "timestamp", "calendar_interval": "1d", "order": "desc" } } }
],
"after": { "user_name": "mad max","date": 1494288000000}
}
}
}
}
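The request, read after_key, request again cycle can be wrapped in a loop. The sketch below is client-agnostic: search is assumed to be any callable that takes a request body and returns the parsed response, and fake_search is a hypothetical in-memory stand-in (with an invented third bucket) so that the loop can be run without a cluster.

```python
def iter_composite_buckets(search, sources, page_size):
    """Yield every composite bucket, following after_key from page to page."""
    after = None
    while True:
        composite = {"size": page_size, "sources": sources}
        if after is not None:
            composite["after"] = after
        body = {
            "size": 0,
            "track_total_hits": False,
            "aggs": {"my_buckets": {"composite": composite}},
        }
        agg = search(body)["aggregations"]["my_buckets"]
        yield from agg["buckets"]
        after = agg.get("after_key")
        # Stop once a page comes back empty or carries no after_key.
        if after is None or not agg["buckets"]:
            break


# Hypothetical in-memory stand-in for a real search call.
ALL_BUCKETS = [
    {"key": {"user_name": "rocky", "date": 1494201600000}, "doc_count": 1},
    {"key": {"user_name": "mad max", "date": 1494288000000}, "doc_count": 2},
    {"key": {"user_name": "apollo", "date": 1494374400000}, "doc_count": 1},
]


def fake_search(body):
    composite = body["aggs"]["my_buckets"]["composite"]
    start = 0
    if "after" in composite:
        keys = [b["key"] for b in ALL_BUCKETS]
        start = keys.index(composite["after"]) + 1
    page = ALL_BUCKETS[start:start + composite["size"]]
    result = {"buckets": page}
    if page:
        result["after_key"] = page[-1]["key"]
    return {"aggregations": {"my_buckets": result}}


sources = [
    {"user_name": {"terms": {"field": "user_name", "order": "asc", "missing_bucket": True}}},
    {"date": {"date_histogram": {"field": "timestamp", "calendar_interval": "1d", "order": "desc"}}},
]
for bucket in iter_composite_buckets(fake_search, sources, page_size=2):
    print(bucket["key"], bucket["doc_count"])
```

The sources list has to stay identical between pages; only after changes from one request to the next.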