Learning Elasticsearch: DSL Queries

I. Basic queries

{
  "query": {
    "bool": {
      "must": [
        {
          "wildcard": {
            "interface_name": "*CacheRequestBodyFilter*"
          }
        },
        {
          "match_phrase_prefix": {
            "message": "接口访问"
          }
        },
        {
          "range": {
            "@timestamp": {
              "gte": "2021-10-25 07:14:05",
              "lte": "now",
              "format": "yyyy-MM-dd HH:mm:ss",
              "time_zone": "+8"
            }
          }
        }
      ]
    }
  },
  "size": 100
}

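For fields whose type is text, ES analyzes the field data, that is, splits it into tokens.

Key points:

  • term: exact match, typically used for prices, IDs, usernames and the like; similar to = in SQL
  • terms: exact match against multiple values; similar to IN in SQL
  • match: full-text match; the search text is analyzed into tokens first, and the tokens are then used for matching
  • match_phrase: phrase match; unlike match, the tokens must appear in exactly the analyzed order
  • match_phrase_prefix: phrase match; unlike match_phrase, the last token is additionally matched as a prefix
  • multi_match: runs the same match query against several fields
  • wildcard: wildcard query
  • fuzzy: fuzzy query that tolerates small spelling differences; for example, searching for JVA may return JAVA
  • range: range query
  • regexp: regular expression query
  • exists: returns documents in which the field is not null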
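
As a rough illustration of how several of the query types above combine, here is a minimal Python sketch that only assembles a request body as a dict. The field names (user_id, status, message) and the helper build_bool_query are hypothetical and not taken from any real index; sending the body to a cluster is left out.

```python
import json


def build_bool_query(user_id, statuses, text, since):
    """Combine several leaf queries under bool.must (every clause must match)."""
    return {
        "query": {
            "bool": {
                "must": [
                    # term: exact match on a single value, like "=" in SQL
                    {"term": {"user_id": user_id}},
                    # terms: exact match on any of several values, like "IN"
                    {"terms": {"status": statuses}},
                    # match: the text is analyzed before matching
                    {"match": {"message": text}},
                    # range: lower and upper bound on a date field
                    {"range": {"@timestamp": {"gte": since, "lte": "now"}}},
                ]
            }
        },
        "size": 100,
    }


body = build_bool_query("u-42", ["200", "404"], "interface access", "now-1d")
print(json.dumps(body, indent=2))
```

The printed JSON can be pasted into any client that accepts a search request body.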

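2. runtime_mappings

Runtime fields can be defined either in the mapping or, through runtime_mappings, in the body of a DSL query, and their values are computed at query time. Here we focus on runtime_mappings in a DSL query.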

{
  "runtime_mappings": {
    "day_of_week": {
      "type": "keyword",
      "script": {
        "source": "emit(doc['@timestamp'].value.dayOfWeekEnum.getDisplayName(TextStyle.FULL, Locale.ROOT))"
      }
    }
  },
  "aggs": {
    "day_of_week": {
      "terms": {
        "field": "day_of_week"
      }
    }
  }
}

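A runtime field can be built on top of another runtime field. Runtime fields in the mapping (declared under the runtime section) can be added after the mapping has been created, and they override fields of the same name under properties.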
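
To make the day_of_week example more concrete, the following Python sketch imitates what the script emits for each document and what the terms aggregation then counts. It is only a local simulation under assumed sample timestamps, not Elasticsearch code; DAY_NAMES and day_of_week are hypothetical helpers.

```python
from collections import Counter
from datetime import datetime, timezone

# English day names, indexed by datetime.weekday() (Monday == 0)
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def day_of_week(epoch_millis):
    """Mimic the runtime field: derive the weekday name from @timestamp."""
    moment = datetime.fromtimestamp(epoch_millis / 1000, tz=timezone.utc)
    return DAY_NAMES[moment.weekday()]


def to_millis(moment):
    """Convert an aware datetime to epoch milliseconds."""
    return int(moment.timestamp() * 1000)


# Assumed sample documents: two on 2021-10-25, one on 2021-10-26
timestamps = [
    to_millis(datetime(2021, 10, 25, 7, 14, 5, tzinfo=timezone.utc)),
    to_millis(datetime(2021, 10, 25, 9, 0, 0, tzinfo=timezone.utc)),
    to_millis(datetime(2021, 10, 26, 8, 30, 0, tzinfo=timezone.utc)),
]

# The terms aggregation counts documents per emitted value
day_counts = Counter(day_of_week(ts) for ts in timestamps)
print(day_counts)
```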

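II. Aggregation queries

2.1. Metric aggregations, which can be thought of as the aggregate functions in SQL

Key points:

  • avg: average
  • max: maximum
  • min: minimum
  • value_count: count of values
  • cardinality: distinct count; the result is approximate
  • sum: sum
  • stats: returns min, max, sum, count and avg together

Example:

Format: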
{
    "aggs":{
        "your_agg_name":{
            "metric_agg_type":{
                "field":"field_to_aggregate"
            }
        }
    }
}

Query:
{
  "aggs": {
    "grades_stats": { "stats": { "field": "grade" } }
  }
}
Response:
{
  ...

  "aggregations": {
    "grades_stats": {
      "count": 2,
      "min": 50.0,
      "max": 100.0,
      "avg": 75.0,
      "sum": 150.0
    }
  }
}
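
For intuition, the stats result above can be reproduced locally. This Python sketch assumes the index holds exactly two documents with grades 50 and 100, which is consistent with the response shown; the function stats is a hypothetical helper, not an Elasticsearch API.

```python
def stats(values):
    """Compute the same five numbers the stats aggregation reports."""
    count = len(values)
    if count == 0:
        # No values: only count and sum are meaningful
        return {"count": 0, "min": None, "max": None, "avg": None, "sum": 0.0}
    total = sum(values)
    return {
        "count": count,
        "min": min(values),
        "max": max(values),
        "avg": total / count,
        "sum": total,
    }


grades = [50.0, 100.0]  # assumed field values of the two documents
grades_stats = stats(grades)
print(grades_stats)
```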

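2.2. Bucket aggregations

Key points:

  • terms: groups documents into buckets by the values of a field. Its main parameters:
size: number of buckets to return (default 10). If you need more than 1000, use the composite bucket aggregation instead.
order: sorts the returned buckets; the default is "_count":"desc"
min_doc_count: only returns buckets whose document count is at least the configured value
include: either a regular expression such as ".*sport.*" or an array of exact values such as [ "mazda", "honda" ]; only terms that match include are counted. include can also split the field values into partitions. For example, to expire a batch of user caches by last login time when the data set is too large for one request, first use cardinality to estimate the number of users (say 100) and set num_partitions to 10, so that each partition holds roughly 10 users. Each request then processes one partition, and with size set to 5 it returns the top 5 users of that partition.
exclude: either a regular expression or an array of exact values; terms that match exclude are not counted
missing: defines how documents that lack a value are handled. By default they are ignored, but they can also be treated as if they had a given value.

Example: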
{
   "size": 0,
   "aggs": {
      "expired_sessions": {
         "terms": {
            "field": "account_id",
            "include": {
               "partition": 0,
               "num_partitions": 20
            },
            "exclude": [ "xxx", "aaa" ],
            "size": 10000,
            "order": {
               "last_access": "asc"
            },
            "missing": "0",
            "value_type": "keyword"
         },
         "aggs": {
            "last_access": {
               "max": {
                  "field": "access_date"
               }
            }
         }
      }
   }
}
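
The partitioning idea can be sketched in Python as follows. This is only a local simulation: each term is assigned to a partition by hashing it, so that iterating over all partitions visits every term exactly once. The use of zlib.crc32 is an assumption made for the sketch and is not the hash function Elasticsearch uses; partition_of and terms_in_partition are hypothetical helpers.

```python
import zlib


def partition_of(term, num_partitions):
    """Assign a term to a partition by hashing it (stand-in hash function)."""
    return zlib.crc32(term.encode("utf-8")) % num_partitions


def terms_in_partition(terms, partition, num_partitions):
    """Return the distinct terms that fall into the requested partition."""
    return sorted(
        t for t in set(terms) if partition_of(t, num_partitions) == partition
    )


# Assumed data: 100 distinct account ids, processed in 10 requests
account_ids = [f"account-{i}" for i in range(100)]
num_partitions = 10

seen = []
for partition in range(num_partitions):
    # One request per partition; together they cover every account once
    seen.extend(terms_in_partition(account_ids, partition, num_partitions))

print(len(seen), "accounts processed in", num_partitions, "requests")
```

Partitions are only roughly equal in size, which is why size should be chosen with some headroom above the expected number of terms per partition.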
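  • range: groups numeric values into ranges and counts each one; every range includes its from value and excludes its to value. For example, for prices:
keyed: used together with the key of each entry in ranges, so that the buckets are returned under your own key names

Request: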
{
  "aggs": {
    "price_ranges": {
      "range": {
        "field": "price",
        "keyed": true,
        "ranges": [
          { "key": "cheap", "to": 100 },
          { "key": "average", "from": 100, "to": 200 },
          { "key": "expensive", "from": 200 }
        ]
      }
    }
  }
}

Response:
{
  ...
  "aggregations": {
    "price_ranges": {
      "buckets": {
        "cheap": {
          "to": 100.0,
          "doc_count": 2
        },
        "average": {
          "from": 100.0,
          "to": 200.0,
          "doc_count": 2
        },
        "expensive": {
          "from": 200.0,
          "doc_count": 3
        }
      }
    }
  }
}
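
The "from inclusive, to exclusive" rule is easy to get wrong at the boundaries, so here is a small Python simulation of the bucketing. The price list is assumed sample data chosen to reproduce the counts in the response above, and range_buckets is a hypothetical helper.

```python
def range_buckets(values, ranges):
    """Count values per range; 'from' is inclusive and 'to' is exclusive."""
    buckets = {}
    for spec in ranges:
        lower = spec.get("from", float("-inf"))  # open-ended when absent
        upper = spec.get("to", float("inf"))
        buckets[spec["key"]] = sum(1 for v in values if lower <= v < upper)
    return buckets


ranges = [
    {"key": "cheap", "to": 100},
    {"key": "average", "from": 100, "to": 200},
    {"key": "expensive", "from": 200},
]

prices = [10, 99.9, 100, 150, 200, 250, 300]  # assumed sample prices
price_ranges = range_buckets(prices, ranges)
print(price_ranges)
```

Note that a price of exactly 100 lands in "average" and a price of exactly 200 lands in "expensive".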

The aggregation can also be run on a runtime field defined in runtime_mappings:
{
  "runtime_mappings": {
    "price.euros": {
      "type": "double",
      "script": {
        "source": """
          emit(doc['price'].value * params.conversion_rate)
        """,
        "params": {
          "conversion_rate": 0.835526591
        }
      }
    }
  },
  "aggs": {
    "price_ranges": {
      "range": {
        "field": "price.euros",
        "ranges": [
          { "to": 100 },
          { "from": 100, "to": 200 },
          { "from": 200 }
        ]
      }
    }
  }
}

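The range aggregation can also be run on histogram field data.
  • date_range: a range aggregation for dates
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-daterange-aggregation.html
If needed, look up the format patterns and date units in the official documentation. The time_zone parameter is usually omitted.

Request:
{
  "aggs": {
    "range": {
      "date_range": {
        "field": "date",
        "format": "MM-yyy",
        "time_zone": "UTC",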
        "ranges": [
          { "from": "01-2015", "to": "03-2015", "key": "quarter_01" },
          { "from": "03-2015", "to": "06-2015", "key": "quarter_02" }
        ],
        "keyed": true
      }
    }
  }
}

Response:
{
  ...
  "aggregations": {
    "range": {
      "buckets": {
        "quarter_01": {
          "from": 1.4200704E12,
          "from_as_string": "01-2015",
          "to": 1.425168E12,
          "to_as_string": "03-2015",
          "doc_count": 5
        },
        "quarter_02": {
          "from": 1.425168E12,
          "from_as_string": "03-2015",
          "to": 1.4331168E12,
          "to_as_string": "06-2015",
          "doc_count": 2
        }
      }
    }
  }
}

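  • filters: puts the documents that satisfy each filter into a bucket and counts them
other_bucket_key: documents that match none of the filters are placed in a bucket with this name. The parameter is optional; without it, the response only counts documents that match a filter.
Request: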
{
  "size": 0,
  "aggs" : {
    "messages" : {
      "filters" : {
        "other_bucket_key": "other_messages",
        "filters" : {
          "errors" :   { "match" : { "body" : "error"   }},
          "warnings" : { "match" : { "body" : "warning" }}
        }
      }
    }
  }
}
Response:
{
  "took": 3,
  "timed_out": false,
  "_shards": ...,
  "hits": ...,
  "aggregations": {
    "messages": {
      "buckets": {
        "errors": {
          "doc_count": 1
        },
        "warnings": {
          "doc_count": 2
        },
        "other_messages": {
          "doc_count": 1
        }
      }
    }
  }
}
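
The behaviour of filters together with other_bucket_key can be imitated locally. In this Python sketch the match query is approximated by a simple lowercase token check, which is an assumption of the sketch, and the four documents are invented so that the counts line up with the response above. filters_buckets and body_contains are hypothetical helpers.

```python
def body_contains(word):
    """Very rough stand-in for a match query on the body field."""
    return lambda doc: word in doc["body"].lower().split()


def filters_buckets(docs, filters, other_bucket_key=None):
    """Count documents per named filter, plus an optional 'other' bucket."""
    counts = {name: 0 for name in filters}
    other = 0
    for doc in docs:
        matched = False
        for name, predicate in filters.items():
            if predicate(doc):
                counts[name] += 1  # a document may match several filters
                matched = True
        if not matched:
            other += 1
    if other_bucket_key is not None:
        counts[other_bucket_key] = other
    return counts


docs = [
    {"body": "error while connecting"},
    {"body": "warning disk almost full"},
    {"body": "warning slow response"},
    {"body": "info service started"},
]
filters = {"errors": body_contains("error"), "warnings": body_contains("warning")}

messages = filters_buckets(docs, filters, other_bucket_key="other_messages")
print(messages)
```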
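  • multi_terms: roughly equivalent to a terms aggregation over a combination of several fields. The official recommendation is to use multi_terms when you need to sort by the combined key or need only the top N buckets. Otherwise, consider adding a combined field at index time or defining a runtime field with runtime_mappings (so that a plain terms aggregation can be used), or use the composite aggregation.
Request: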
{
  "aggs": {
    "genres_and_products": {
      "multi_terms": {
        "terms": [{
          "field": "genre" 
        }, {
          "field": "product",
          "missing": "Product Z"
        }]
      }
    }
  }
}
Response:
{
  ...
  "aggregations" : {
    "genres_and_products" : {
      "doc_count_error_upper_bound" : 0,  
      "sum_other_doc_count" : 0,          
      "buckets" : [                       
        {
          "key" : [                       
            "rock",
            "Product A"
          ],
          "key_as_string" : "rock|Product A",
          "doc_count" : 2
        },
        {
          "key" : [
            "electronic",
            "Product B"
          ],
          "key_as_string" : "electronic|Product B",
          "doc_count" : 1
        },
        {
          "key" : [
            "jazz",
            "Product B"
          ],
          "key_as_string" : "jazz|Product B",
          "doc_count" : 1
        },
        {
          "key" : [
            "rock",
            "Product B"
          ],
          "key_as_string" : "rock|Product B",
          "doc_count" : 1
        }
      ]
    }
  }
}
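
As a local illustration of how multi_terms forms its combined keys, including the missing default, consider the following Python sketch. The documents are invented sample data and multi_terms is a hypothetical helper that merely counts tuples; it does not call Elasticsearch.

```python
from collections import Counter


def multi_terms(docs, sources):
    """Count documents per combination of field values."""
    counts = Counter()
    for doc in docs:
        key = []
        for source in sources:
            # Fall back to the 'missing' default when the field is absent
            value = doc.get(source["field"], source.get("missing"))
            if value is None:
                break  # no value and no default: the document is skipped
            key.append(value)
        else:
            counts[tuple(key)] += 1
    return counts


sources = [{"field": "genre"}, {"field": "product", "missing": "Product Z"}]
docs = [
    {"genre": "rock", "product": "Product A"},
    {"genre": "rock", "product": "Product A"},
    {"genre": "rock", "product": "Product B"},
    {"genre": "jazz", "product": "Product B"},
    {"genre": "electronic"},   # no product: counted under "Product Z"
    {"product": "Product A"},  # no genre and no default: skipped
]

genres_and_products = multi_terms(docs, sources)
for key, doc_count in genres_and_products.most_common():
    print("|".join(key), doc_count)
```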
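  • composite: the composite aggregation is very expensive, so test it carefully before using it in production. Compared with multi_terms, it can be thought of as the paginated version of the multi_terms aggregation.
sources: all grouping criteria of a composite aggregation go under sources, and the order of the array determines the order of the keys in the response. A source can only be one of the following 4 types: Terms, Histogram, Date histogram, GeoTile grid.
missing_bucket: by default, documents without a value for the field are ignored. If this parameter is true, those documents are included in the response as well.
To make composite aggregations faster, you can configure index sorting in the index settings; queries should then follow a leftmost-prefix rule similar to the one for SQL indexes. For example: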
{
  "settings": {
    "index": {
      "sort.field": [ "username", "timestamp" ],   
      "sort.order": [ "asc", "desc" ]              
    }
  },
  "mappings": {
    "properties": {
      "username": {
        "type": "keyword",
        "doc_values": true
      },
      "timestamp": {
        "type": "date"
      }
    }
  }
}

An example request:
{
  "size": 0,
  "track_total_hits": false,
  "aggs": {
    "my_buckets": {
      "composite": {
        "sources": [
          { "user_name": { "terms": { "field": "user_name", "order": "asc", "missing_bucket": true} } },
          { "date": { "date_histogram": { "field": "timestamp", "calendar_interval": "1d", "order": "desc" } } }
        ]
      }
    }
  }
}
To paginate, add the size parameter and note the after_key value in the response:
{
  "size": 0,
  "track_total_hits": false,
  "aggs": {
    "my_buckets": {
      "composite": {
        "size": 2,
        "sources": [
          { "user_name": { "terms": { "field": "user_name", "order": "asc", "missing_bucket": true} } },
          { "date": { "date_histogram": { "field": "timestamp", "calendar_interval": "1d", "order": "desc" } } }
        ]
      }
    }
  }
}
Response:
{
  ...
  "aggregations": {
    "my_buckets": {
      "after_key": {
        "user_name": "mad max",
        "date": 1494288000000
      },
      "buckets": [
        {
          "key": {
            "user_name": "rocky",
            "date": 1494201600000
          },
          "doc_count": 1
        },
        {
          "key": {
            "user_name": "mad max",
            "date": 1494288000000
          },
          "doc_count": 2
        }
      ]
    }
  }
}
The next request then passes the after_key value in the after parameter:
{
  "size": 0,
  "track_total_hits": false,
  "aggs": {
    "my_buckets": {
      "composite": {
        "size": 2,
        "sources": [
          { "user_name": { "terms": { "field": "user_name", "order": "asc", "missing_bucket": true} } },
          { "date": { "date_histogram": { "field": "timestamp", "calendar_interval": "1d", "order": "desc" } } }
        ],
        "after": { "user_name": "mad max","date": 1494288000000}
      }
    }
  }
}
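
The paging loop above can be summarized in a short Python simulation. It does not talk to Elasticsearch: composite_page is a hypothetical stand-in for one request, the bucket list is invented sample data, and the ordering (user_name ascending, date descending) mirrors the sources in the requests above.

```python
def sort_key(key):
    """Order composite keys: user_name ascending, then date descending."""
    return (key["user_name"], -key["date"])


def composite_page(buckets, size, after=None):
    """Simulate one composite request: return one page and its after_key."""
    ordered = sorted(buckets, key=lambda b: sort_key(b["key"]))
    if after is not None:
        # Keep only buckets that sort strictly after the previous after_key
        ordered = [b for b in ordered if sort_key(b["key"]) > sort_key(after)]
    page = ordered[:size]
    after_key = page[-1]["key"] if page else None
    return page, after_key


# Assumed sample buckets (5 distinct user/day combinations)
all_buckets = [
    {"key": {"user_name": "rocky", "date": 1494201600000}, "doc_count": 1},
    {"key": {"user_name": "mad max", "date": 1494288000000}, "doc_count": 2},
    {"key": {"user_name": "mad max", "date": 1494201600000}, "doc_count": 1},
    {"key": {"user_name": "alice", "date": 1494201600000}, "doc_count": 3},
    {"key": {"user_name": "rocky", "date": 1494374400000}, "doc_count": 1},
]

pages = []
after = None
while True:
    page, after_key = composite_page(all_buckets, size=2, after=after)
    if not page:
        break  # an empty page means every bucket has been read
    pages.append(page)
    after = after_key  # feed after_key into the next request

print([len(p) for p in pages])
```

A real client would follow the same pattern: repeat the request with the latest after_key until the response contains no buckets.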
