ES Reference: Query DSL

Original post, 2017-01-03 16:42:08

https://www.elastic.co/guide/en/elasticsearch/reference/2.1/query-dsl.html

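Elasticsearch (ES) was built for search: its main job is to process queries and return results.

ES provides a JSON-based query language, the Query DSL, which has two kinds of clauses:
1. Leaf query clauses. These look for a particular value in a particular field, for example the match, term, and range queries.
2. Compound query clauses. These wrap other leaf or compound queries, either to combine multiple queries in a logical fashion (such as the bool or dis_max query) or to alter their behaviour (such as the constant_score query).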

Query clauses behave differently depending on whether they are used in query context or filter context.

Query and filter context

Query context
Answers the question “How well does this document match this query clause?”
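The clause contributes to a relevance score (_score).

Query context is in effect whenever a query clause is passed to a query parameter, such as the query parameter in the search API.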

Filter context
Answers the question “Does this document match this query clause?”
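No score is computed; the answer is simply yes or no. Filter context is mostly used for filtering structured data, for example checking whether a timestamp falls within a given range.

To improve performance, ES automatically caches frequently used filters.

Filter context is in effect whenever a query clause is passed to a filter parameter, such as: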

  1. filter or must_not parameters in the bool query,
  2. the filter parameter in the constant_score query
  3. the filter aggregation.
POST _search
{
  "query": {
    "bool" : {
      "filter": { // case 1: the filter parameter of bool
        "term" : { "tag" : "tech" }
      },
      "must_not" : { // case 1: the must_not parameter of bool
        "range" : {
          "age" : { "from" : 10, "to" : 20 }
        }
      }
    }
  }
}
GET /_search
{
    "query": {
        "constant_score" : { // case 2: the filter parameter of constant_score
            "filter" : {
                "term" : { "user" : "kimchy"}
            },
            "boost" : 1.2
        }
    }
}
{
    "aggs" : {
        "red_products" : {
            "filter" : { "term": { "color": "red" } }, // case 3: the filter aggregation
            "aggs" : {
                "avg_price" : { "avg" : { "field" : "price" } }
            }
        }
    }
}

GET /_search
{
  "query": { // query context
    "bool": { 
      "must": [ // bool and the two match clauses in must: query context
        { "match": { "title":   "Search"        }}, 
        { "match": { "content": "Elasticsearch" }}  
      ],
      "filter": [ // filter context: the term and range clauses run in filter context
        { "term":  { "status": "published" }}, 
        { "range": { "publish_date": { "gte": "2015-01-01" }}} 
      ]
    }
  }
}

Note: clauses in query context affect the score, while clauses in filter context do not.

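Search types

query_then_fetch:
The query first runs on all shards to collect just the information needed to sort and rank the documents. The actual document content is then fetched only from the relevant shards. At most size results are returned. This is the default search type.
query_and_fetch:
The query runs on all shards in parallel, and every shard returns size results, so up to size multiplied by the number of shards documents can come back.
dfs_query_and_fetch:
Like query_and_fetch, but an initial phase computes distributed term frequencies so that documents are scored more accurately and the results are more relevant.
dfs_query_then_fetch:
Like query_then_fetch, but an initial phase computes distributed term frequencies so that documents are scored more accurately and the results are more relevant.
count (replaced by size=0):
A special search that returns only the number of documents matching the query.
scan (replaced by scroll):
The response to the first request carries a scroll identifier, similar to a cursor in a database.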
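
The difference between the then_fetch and and_fetch variants can be sketched in a few lines of Python. This is a toy model rather than how ES is implemented: shards are plain lists of hypothetical (score, doc_id) pairs, and the function names are made up for illustration.

```python
import heapq

def shard_top_hits(shard_docs, size):
    """Return the `size` best (score, doc_id) pairs held by one shard."""
    return heapq.nlargest(size, shard_docs)

def query_and_fetch(shards, size):
    """Every shard returns its own top `size` hits; nothing is trimmed afterwards."""
    hits = []
    for shard_docs in shards:
        hits.extend(shard_top_hits(shard_docs, size))
    return sorted(hits, reverse=True)

def query_then_fetch(shards, size):
    """Shards report their top `size` candidates; only the global top `size` are fetched."""
    candidates = query_and_fetch(shards, size)
    return candidates[:size]

# Hypothetical data: three shards, each holding (score, doc_id) pairs.
shards = [
    [(0.9, "a1"), (0.4, "a2"), (0.1, "a3")],
    [(0.8, "b1"), (0.7, "b2")],
    [(0.3, "c1"), (0.2, "c2"), (0.6, "c3")],
]

merged = query_then_fetch(shards, size=2)
unmerged = query_and_fetch(shards, size=2)
```

With three shards and size=2, the query_then_fetch model yields 2 hits, while the query_and_fetch model yields 6 (size multiplied by the number of shards).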

Match All Query

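The simplest query: it matches all documents, giving each a _score of 1.0.

GET /_search
{
    "query": {
        "match_all": {} // the inverse, which matches no documents, is match_none
    }
}
// To customize the score, set a boost: "match_all": { "boost" : 1.2 }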

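The response contains:

  • took: how many milliseconds the search took
  • timed_out: whether the search timed out; if it did, you get partial results or none at all
  • _shards: shard status
    • total: the total number of shards the search ran against
    • successful: the number of shards on which the search succeeded
    • failed: the number of shards on which the search failed because of an error or exception
  • hits: the search results

    • total: the total number of documents matching the query
    • max_score: the score of the best-matching document; when no match scoring was computed it is usually 1
    • hits: the list of result documents

    The most common fields of each result document:

    • _index: the index the document is in
    • _type: the type the document is in
    • _id: the document id
    • _source: the document source; returned by default, and can be disabled
    • _score: the document's score for the query
    • sort: the values that are used to sort, if the docs are sorted.
    • highlight: the highlighted segments, if highlighting was requested.
    • fields: some fields can be retrieved without the need to fetch all the source objects.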
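
As a rough illustration of how these fields are typically read on the client side, the sketch below parses a hand-written response. The JSON is a made-up sample trimmed to the fields listed above (the values are not real output), and summarize is a hypothetical helper.

```python
import json

# Hypothetical response, trimmed to the fields listed above; values are made up.
raw = """
{
  "took": 3,
  "timed_out": false,
  "_shards": {"total": 5, "successful": 5, "failed": 0},
  "hits": {
    "total": 2,
    "max_score": 1.0,
    "hits": [
      {"_index": "myindex", "_type": "mytype", "_id": "1", "_score": 1.0, "_source": {"no": 2}},
      {"_index": "myindex", "_type": "mytype", "_id": "2", "_score": 1.0, "_source": {"no": 6}}
    ]
  }
}
"""

def summarize(response):
    """Pull the commonly used fields out of a search response dict."""
    shards = response["_shards"]
    hits = response["hits"]["hits"]
    return {
        # Results are complete only if nothing timed out and no shard failed.
        "complete": not response["timed_out"] and shards["failed"] == 0,
        "total": response["hits"]["total"],
        "ids": [hit["_id"] for hit in hits],
        # _source can be disabled, so fall back to an empty dict.
        "sources": [hit.get("_source", {}) for hit in hits],
    }

summary = summarize(json.loads(raw))
```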

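Three ways to address a search, and the query parameters

http://<server>/_search
Searches all indices and all types.
http://<server>/<index_name(s)>/_search
Searches one or more indices, separated by commas.
http://<server>/<index_name(s)>/<type_name(s)>/_search
Searches one or more indices and one or more types, each separated by commas.

In all of the above, an alias can be used wherever an index name is expected.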

The core query is usually contained in the body of the GET/POST call, but a lot of options can
also be expressed as URI query parameters, as follows:

  • q: performs simple string queries
    …_search?q=field_name:field_value
  • df: the default field
    …_search?df=field_name&q=field_value
  • from: defaults to 0. The start index of the hits.
  • size: defaults to 10. The number of hits to be returned in total, not per shard.
    from + size must not exceed index.max_result_window (10,000 by default).

    With ?size=10&from=10000, every shard would have to return its top from + size = 10010 hits, from which the coordinating node keeps only 10. This exceeds the default index.max_result_window, so the request is rejected unless the limit is raised.

  • analyzer: the default analyzer to be used.

  • default_operator: defaults to OR
  • explain: returns information on how the score is calculated
  • fields: defines fields that must be returned. A field mapped with store=true is read directly; otherwise it is extracted from _source.
  • sort: defaults to descending _score
  • timeout: not active by default. If a timeout is fired, all the hits accumulated so far are returned.
  • search_type
  • track_scores: defaults to false. When sorting on a field, scores are not computed. By setting track_scores to true, scores will still be computed and tracked.

    When sorting, the relevant sorted field values are loaded into memory. This means that per shard, there should be enough memory to contain them.

  • pretty: if true, the results will be pretty printed.
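
The from + size arithmetic in the list above can be sketched as follows. This is a back-of-the-envelope model only; paging_cost is a hypothetical helper, and the "merged" figure is an upper bound because a shard may hold fewer documents than the window.

```python
MAX_RESULT_WINDOW = 10_000  # default value of index.max_result_window

def paging_cost(from_, size, num_shards, max_result_window=MAX_RESULT_WINDOW):
    """Estimate how many hits are collected to serve one from/size page."""
    window = from_ + size
    if window > max_result_window:
        raise ValueError(
            f"from + size must be <= {max_result_window} but was {window}"
        )
    return {
        "per_shard": window,            # each shard returns its top from+size hits
        "merged": window * num_shards,  # upper bound on hits the coordinating node sorts
        "returned": size,               # hits actually sent back to the client
    }

cost = paging_cost(from_=9_990, size=10, num_shards=5)
```

Even for a page of 10 hits, the model collects up to 50,000 candidates across 5 shards, which is why from + size is meant for shallow pagination and scroll is preferred for deep traversal.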

Request body parameters:

  • query: the query clause
  • from + size: control shallow pagination
  • sort
  • post_filter: filters the query results without affecting the aggregation (formerly facet) counts.

    It is applied to the search hits at the very end of a search request, after aggs have already been calculated.

  • _source: controls the returned source. It can be disabled (false), limited to some fields (obj.*), or given multiple exclude/include patterns.

  • fielddata_fields: renamed docvalue_fields in ES 5.0
  • fields: renamed stored_fields in ES 5.0, where it only returns stored fields and no longer extracts values from _source.
  • facets: deprecated in 1.0 and removed in 2.0; use aggs instead
  • aggs
  • indices_boost: per-index boost value.
  • highlight
  • version: defaults to false; if true, each hit includes the document version
  • rescore

    Reorders only the top documents using a secondary (usually more costly) algorithm, instead of applying the costly algorithm to all documents in the index.

    A rescore request is executed on each shard before it returns its results…

  • min_score: if set, documents with a score lower than this value are not returned

  • explain: shows how the TF/IDF score was computed.
  • script_fields
  • suggest
  • search_type
  • scroll
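
To show how several of these body parameters fit together, here is a minimal sketch that assembles a request body as a Python dict and serializes it. build_search_body is a hypothetical helper; the field names (title, status, publish_date) are borrowed from the earlier bool example, and the chosen values are arbitrary.

```python
import json

def build_search_body(text, status, page=0, page_size=10):
    """Assemble a search request body from a few of the parameters listed above."""
    return {
        "query": {"match": {"title": text}},
        "from": page * page_size,                      # shallow paging only
        "size": page_size,
        "sort": [{"publish_date": "desc"}, "_score"],
        "post_filter": {"term": {"status": status}},   # applied after aggs are computed
        "_source": ["title", "publish_date"],          # return only these source fields
        "min_score": 0.5,                              # drop hits scoring below 0.5
        "version": True,                               # include the doc version in each hit
    }

body = build_search_body("Search", "published", page=2)
payload = json.dumps(body, indent=2)
```

The resulting payload string is what would be sent as the body of a GET or POST to one of the _search endpoints.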

Full text queries

Full-text search.
They understand how the field being queried is analyzed and will apply each field’s analyzer (or search_analyzer) to the query string before executing.

1 Match

Supports fuzzy, phrase, and proximity matching.

accepts text/numerics/dates, analyzes them, and constructs a query.

{
    "match" : {
        "message" : "this is a test"
    } // message is a field name; any field can be used, including _all
}

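There are three types of match query: boolean (the default), phrase, and phrase_prefix.

1 boolean match query
Parameters:

  • operator: or (the default) or and. With or, matching any one term is enough; with and, all terms must match.
  • analyzer: the analyzer used to analyze the query text. Defaults to the one in the field's mapping definition, or the default search analyzer.
  • fuzziness: enables fuzzy matching. For string fields it is the maximum edit distance (0, 1, 2, or AUTO).
  • prefix_length: controls fuzzy matching; the number of leading characters that must match exactly and are not fuzzified. Defaults to 0.
  • max_expansions: controls fuzzy matching; the maximum number of terms the query term may expand to. Defaults to 50.

    fuzziness, prefix_length, and max_expansions work together.

  • lenient: defaults to false. Set it to true to ignore exceptions caused by data-type mismatches, such as querying a numeric field with a text query string.

  • zero_terms_query: none (the default) or all. If the analyzer used removes all tokens in a query, as a stop filter does, the default behavior is to match no documents at all; all makes the query behave like match_all instead.

  • cutoff_frequency: relative to the number of documents if in the range [0..1), or absolute if ≥ 1; evaluated per shard. It allows handling stopwords dynamically at runtime.
  • minimum_should_match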

TODO

prefix_length max_expansions

fuzziness
Enables fuzzy matching.

numeric, date and IPv4 fields

fuzziness is interpreted as a +/- margin.
value - fuzziness <= field value <= value + fuzziness
numeric: 2 or 2.0
date: milliseconds, or a time string such as "2h"
ip: long or another IPv4 address (which will be converted into a long).

string fields

fuzziness is interpreted as a Levenshtein Edit Distance: the number of one character changes that need to be made to one string to make it the same as another string.
Values 0, 1, 2: the maximum number of edits allowed.
Value AUTO: chosen from the length of the term.
0..2 characters: must match exactly;
3..5 characters: at most one edit allowed;
more than 5 characters: at most two edits allowed
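
The AUTO thresholds and the edit-distance idea can be sketched as below. This is a simplified model, assuming plain Levenshtein distance (so swapping two adjacent characters counts as two edits here); the function names are made up for illustration.

```python
def auto_fuzziness(term):
    """Maximum edits allowed for `term` under the AUTO thresholds listed above."""
    length = len(term)
    if length <= 2:
        return 0
    if length <= 5:
        return 1
    return 2

def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute ca -> cb (or keep it)
            ))
        previous = current
    return previous[-1]

def fuzzy_match(query_term, indexed_term):
    """True when `indexed_term` is within the AUTO edit budget of `query_term`."""
    return levenshtein(query_term, indexed_term) <= auto_fuzziness(query_term)
```

For example, fuzzy_match("test", "tent") holds because a four-letter term is allowed one edit, while fuzzy_match("ki", "ka") does not because terms of one or two characters must match exactly.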
{
    "match" : {
        "message" : { // message is the field name
            "query" : "this is a test",
            "operator" : "and"
        }
    }
}

2 phrase

{
    "match_phrase" : {
        "message" : "this is a test"
    }
}

A phrase query is only a type of match query, so it can also be written in the following manner:

{
    "match" : {
        "message" : {
            "query" : "this is a test",
            "type" : "phrase"
        }
    }
}

Parameters that can be set:

  • slop: how many positions apart the terms are allowed to be.
  • analyzer

3 match_phrase_prefix
The same as match_phrase, except that it allows for prefix matches on the last term in the text.

{
    "match_phrase_prefix" : {
        "message" : "this is a test"
    }
}
This is equivalent to:
{
    "match" : {
        "message" : {
            "query" : "this is a test",
            "type" : "phrase_prefix"
        }
    }
}

It accepts the same parameters as the phrase type. In addition, it also accepts a max_expansions parameter that can control to how many prefixes the last term will be expanded.

{
    "match_phrase_prefix" : {
        "message" : {
            "query" : "this is a test",
            "max_expansions" : 10
        }
    }
}
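
The effect of max_expansions on the last term can be illustrated with a small model. It is a simplification: the term dictionary is a hypothetical Python list, terms are visited in sorted order, and expand_last_term is a made-up name.

```python
def expand_last_term(prefix, term_dictionary, max_expansions=50):
    """Return at most `max_expansions` indexed terms that start with `prefix`.

    Terms are visited in sorted order, which approximates walking a term
    dictionary; the real query is more involved than this.
    """
    matches = (term for term in sorted(term_dictionary) if term.startswith(prefix))
    # zip stops as soon as the expansion budget is used up.
    return [term for _, term in zip(range(max_expansions), matches)]

# Hypothetical indexed terms.
terms = ["test", "tester", "testing", "tea", "team", "text"]
expanded = expand_last_term("te", terms, max_expansions=3)
```

With a budget of 3, only "tea", "team", and "test" are considered, so a document containing "testing" would be missed. A small max_expansions keeps the query cheap at the cost of possibly skipping valid completions.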

Comparison to query_string / field
The match family of queries does not go through a “query parsing” process. It does not support field name prefixes, wildcard characters, or other “advanced” features. For this reason, chances of it failing are very small / non existent, and it provides an excellent behavior when it comes to just analyze and run that text as a query behavior (which is usually what a text search box does). Also, the phrase_prefix type can provide a great “as you type” behavior to automatically load search results.

2 Match Phrase

Like match, but for matching exact phrases or word proximity matches.

3 Match Phrase Prefix

Like match_phrase, but does a wildcard search on the final word.

4 Multi Match

The multi-field version of the match query.

5 Common Terms

6 Query String

7 Simple Query String

Term level queries

1 term && terms

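Note: for string fields, the value passed to term must be either a single token as produced by the analyzer, or the exact value of a field that is not analyzed.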
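
Why this matters can be seen in a toy model. The sketch assumes a very crude analyzer (lowercase, split on non-alphanumerics) as a stand-in for the standard analyzer, and both helper names are made up.

```python
import re

def crude_analyze(text):
    """Lowercase and split on non-alphanumerics; a rough stand-in for the standard analyzer."""
    return [token for token in re.split(r"[^0-9a-z]+", text.lower()) if token]

def term_matches(indexed_terms, value):
    """A term query compares `value` as-is against the indexed terms, with no analysis."""
    return value in indexed_terms

analyzed = crude_analyze("Quick Foxes!")  # what an analyzed string field would index
not_analyzed = ["Quick Foxes!"]           # a not_analyzed field keeps the whole value
```

Against the analyzed field, a term query for "quick" matches but one for "Quick Foxes!" does not; against the not_analyzed field it is the other way round.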

POST myindex
{
    "mappings": {
        "mytype":{
            "properties": {

                "no":{
                    "type": "integer"
                }
            }

        }
    }
}
POST myindex/mytype
{
    "no": 2 // index three documents, with no set to 8, 2, and 6 in turn
}
GET myindex/mytype/_search

GET myindex/mytype/_search
{
   "query": {
      "bool": {
         "should": [
            {
               "term": {
                  "no": {
                     "value": "2"
                  }
               }
            },
            {
               "term": {
                  "no": {
                     "value": "6"
                  }
               }
            }
         ]
      }
   }
}
GET myindex/mytype/_search
{
   "query": {
      "bool": {
         "should": [
            {
               "terms": {
                  "no": [2,6] // match any of several values of the field
               }
            }
         ]
      }
   }
}

Terms lookup: fetch the list of values for the field from a document in another index:

PUT /users/user/2
{
    "followers" : ["1", "3"]
}

PUT /tweets/tweet/1
{
    "user" : "1"
}

GET /tweets/_search
{
    "query" : {
        "terms" : {
            "user" : { // the values come from the followers field of the document with id 2, type user, in the users index
                "index" : "users",
                "type" : "user",
                "id" : "2",
                "path" : "followers"
            }
        }
    }
}
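
The lookup can be pictured as two steps: resolve the term list from the referenced document, then run an ordinary terms match. The sketch below uses hypothetical in-memory dicts in place of the two indices (the second tweet is added purely for contrast), and the function names are made up.

```python
# Hypothetical in-memory stand-ins for the two indices above, keyed by (type, id).
users_index = {("user", "2"): {"followers": ["1", "3"]}}
tweets_index = {
    ("tweet", "1"): {"user": "1"},
    ("tweet", "2"): {"user": "7"},  # extra document, not in the original example
}

def lookup_terms(index, doc_type, doc_id, path):
    """Fetch the list of terms stored at `path` in the referenced document."""
    doc = index.get((doc_type, doc_id), {})
    value = doc.get(path, [])
    return value if isinstance(value, list) else [value]

def terms_query(index, field, terms):
    """Return ids of documents whose `field` equals any of `terms`."""
    wanted = set(terms)
    return [doc_id for (_, doc_id), doc in index.items() if doc.get(field) in wanted]

followers = lookup_terms(users_index, "user", "2", "followers")
matching = terms_query(tweets_index, "user", followers)
```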

3 Range

"range" : { // parameters include gte, lte, and boost
    "date" : {"gte" : "now-1d/d"}
}

"range" : { // range on a date field
    "born" : {
        "gte": "01/01/2012",
        "lte": "2013",
        "format": "dd/MM/yyyy||yyyy"  // two accepted date formats, separated by ||
    }
}

Taking the time zone into account in a range query:

"range" : {
    "timestamp" : {
        "gte": "2015-01-01 00:00:00", // given in UTC+1, so it is actually "2014-12-31T23:00:00 UTC"
        "lte": "now", // now is not affected by time_zone; dates must be stored as UTC
        "time_zone": "+01:00" // UTC+1
    }
}
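
The time_zone conversion in the last example can be checked with Python's datetime module. This is only a sketch of the arithmetic; to_utc is a hypothetical helper and handles just the "+HH:MM" / "-HH:MM" offset form.

```python
from datetime import datetime, timedelta, timezone

def to_utc(local_text, offset_text, fmt="%Y-%m-%d %H:%M:%S"):
    """Interpret `local_text` in the zone given by an offset like "+01:00" and convert to UTC."""
    sign = -1 if offset_text.startswith("-") else 1
    hours, minutes = offset_text.lstrip("+-").split(":")
    offset = timezone(sign * timedelta(hours=int(hours), minutes=int(minutes)))
    local = datetime.strptime(local_text, fmt).replace(tzinfo=offset)
    return local.astimezone(timezone.utc)

gte_utc = to_utc("2015-01-01 00:00:00", "+01:00")
```

Here gte_utc is 23:00 on 2014-12-31 in UTC, matching the comment in the request above.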

4 Exists

    "query": {
        "exists" : { "field" : "user" }
    }

Cases that count as non-null (the exists query matches):

{ "user": "" }   an empty string is not null
{ "user": "-" }   no tokens after analysis, but still not null: even though the standard analyzer would emit zero tokens, the original field is non-null.
{ "user": ["jane", null ] }  at least one element is non-null, which is all that is required.

Cases that count as null (the exists query does not match):

{ "user": null } 
{ "user": [] }     no values
{ "user": [null] }  no non-null value; at least one is required.
{ "foo":  "bar" }  the user field does not exist at all

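null_value mapping:

"user": {
"type": "keyword",
"null_value": "_null_"  // value indexed in place of an explicit null
}

With this mapping, explicit null values are indexed as the string _null_, so the following documents, which declare null explicitly, match the exists query:

{ "user": null }
{ "user": [null] }

The remaining cases, which are not explicit nulls, are not replaced and still do not match the exists query:

{ "user": [] }
{ "foo": "bar" }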
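
The rules above boil down to "the field has at least one non-null value, after any null_value substitution". A minimal model of that rule, with made-up helper names and Python's None standing in for JSON null:

```python
def field_values(doc, field, null_value=None):
    """Values indexed for `field`; explicit nulls become `null_value` if one is mapped."""
    if field not in doc:
        return []
    raw = doc[field]
    items = raw if isinstance(raw, list) else [raw]
    if null_value is not None:
        items = [null_value if item is None else item for item in items]
    return [item for item in items if item is not None]

def exists(doc, field, null_value=None):
    """True when the field has at least one non-null value."""
    return len(field_values(doc, field, null_value)) > 0
```

Without a null_value, exists({"user": [None]}, "user") is False; with null_value="_null_" it becomes True, while an empty array or a missing field stays False either way.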

missing query

The missing query can be expressed as an exists query wrapped in the must_not clause of a bool query.

5 Prefix Query

fields contain terms with a specified prefix (not analyzed).

"query": {
    "prefix" : { "user" : "ki" }
}

Optional: "boost" : 2.0

6 Wildcard

Matches documents whose fields match a wildcard expression (not analyzed).
Supported wildcards:

  • *: matches any character sequence (including the empty one)
  • ?: matches any single character

Note: this query can be slow, as it needs to iterate over many terms. To avoid extremely slow queries, a wildcard term should not start with one of the wildcards * or ?.
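
A small model of the wildcard semantics, translating the pattern into a regular expression and scanning a hypothetical term list. The function names are made up; the full scan in wildcard_terms hints at why a leading wildcard, which cannot narrow the scan by prefix, is costly.

```python
import re

def wildcard_to_regex(pattern):
    """Translate a wildcard pattern (* and ?) into a compiled regular expression."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")          # any character sequence, including empty
        elif ch == "?":
            parts.append(".")           # exactly one character
        else:
            parts.append(re.escape(ch))  # everything else is literal
    return re.compile("".join(parts), re.DOTALL)

def wildcard_terms(pattern, term_dictionary):
    """Check every indexed term against the pattern (whole-term match)."""
    regex = wildcard_to_regex(pattern)
    return [term for term in term_dictionary if regex.fullmatch(term)]

star_hits = wildcard_terms("ki*y", ["kimchy", "kitty", "kim", "kiy"])
mark_hits = wildcard_terms("k?m", ["kim", "kam", "keem"])
```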

7 Regexp

https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-regexp-query.html
Performance depends heavily on the complexity of the regular expression.

8 Fuzzy

Warning: deprecated in 5.0 and scheduled for removal in 6.0; use match queries with fuzziness instead.

Simple form:

"query": {
   "fuzzy" : { "user" : "ki" }
}

Advanced form:

"query": {
    "fuzzy" : {
        "user" : {
            "value" :         "ki",
            "boost" :         1.0,
            "fuzziness" :     2, //1. 
            "prefix_length" : 0, //2.
            "max_expansions": 100 //3.
        }
    }
}
  • fuzziness: the maximum edit distance.
    • For text or keyword fields, fuzziness is the number of one character changes that need to be made to one string to make it the same as another string. It can be 0, 1, or 2.
    • The default is AUTO, which is derived from the length of the term:
      • length 0..2: must match exactly
      • length 3..5: one edit allowed
      • length >5: two edits allowed
  • prefix_length: the number of initial characters which will not be “fuzzified”. Defaults to 0.
  • max_expansions: the maximum number of terms that the fuzzy query will expand to. Defaults to 50.

Note: setting prefix_length to 0 together with a very high max_expansions can cause serious performance problems.
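
How prefix_length and max_expansions narrow the work can be sketched with a toy expansion step. This is a simplified model under stated assumptions: plain Levenshtein distance, a hypothetical in-memory term list, and candidates ranked by distance and then alphabetically; fuzzy_expand is a made-up name.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def edit_distance(a, b):
    """Recursive Levenshtein distance, memoized."""
    if not a or not b:
        return len(a) + len(b)
    if a[0] == b[0]:
        return edit_distance(a[1:], b[1:])
    return 1 + min(
        edit_distance(a[1:], b),      # delete from a
        edit_distance(a, b[1:]),      # insert into a
        edit_distance(a[1:], b[1:]),  # substitute
    )

def fuzzy_expand(value, term_dictionary, fuzziness=2, prefix_length=0, max_expansions=50):
    """Indexed terms the fuzzy query would expand to, in this simplified model."""
    prefix = value[:prefix_length]
    scored = [
        (edit_distance(value, term), term)
        for term in term_dictionary
        if term.startswith(prefix)  # the first prefix_length characters must match exactly
    ]
    close = sorted(pair for pair in scored if pair[0] <= fuzziness)
    return [term for _, term in close[:max_expansions]]

# Hypothetical indexed terms.
terms = ["ki", "ka", "kim", "kimchy", "bi", "k"]
```

A non-zero prefix_length prunes candidates before any distance is computed, and max_expansions caps how many terms the query is rewritten to, which is why the combination of 0 and a very large cap is the expensive case.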

9 Type (rarely needed)

GET /_search
{
    "query": {
        "type" : {
            "value" : "my_type"
        }
    }
}
 GET _all/my_type/_search

10 Ids (_uid)

Filters documents by id. This query uses the _uid field, which has the form {type}#{id}.

GET /_search
{
    "query": {
        "ids" : {
            "type" : "my_type", // optional; a single value or an array
            "values" : ["1", "4", "100"]
        }
    }
}

5 Compound queries

Compound queries wrap other compound or leaf queries, either to combine their results and scores, to change their behaviour, or to switch from query to filter context.

1 constant_score (filter context, constant score)

Wraps another query, but executes it in filter context. All matching documents are given the same “constant” _score.

2 bool

Combines multiple leaf or compound query clauses.
The must and should clauses have their scores combined — the more matching clauses, the better — while the must_not and filter clauses are executed in filter context.

3 dis_max (best match)

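A bool query combines the scores of all matching clauses, whereas a dis_max query accepts documents that match any of its clauses and scores each document by its single best-matching clause.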
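
The contrast can be made concrete with per-clause scores. The sketch below is a simplified model: None marks a clause that did not match, the bool variant just adds scores (real bool scoring involves further factors), and both function names are made up. The tie_breaker parameter of dis_max defaults to 0.0.

```python
def dis_max_score(clause_scores, tie_breaker=0.0):
    """Best clause score, plus tie_breaker times the scores of the other matching clauses."""
    matching = [score for score in clause_scores if score is not None]
    if not matching:
        return None  # no clause matched, so the document is not a hit
    best = max(matching)
    return best + tie_breaker * (sum(matching) - best)

def bool_should_score(clause_scores):
    """Simplified bool behaviour: the scores of matching clauses are added together."""
    matching = [score for score in clause_scores if score is not None]
    return sum(matching) if matching else None

# Hypothetical per-clause scores for one document; the third clause did not match.
scores = [0.5, 0.25, None]
```

For these scores the simplified bool model gives 0.75, dis_max gives 0.5, and dis_max with tie_breaker=0.5 gives 0.625.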

4 function_score

Modify the scores returned by the main query with functions to take into account factors like popularity, recency, distance, or custom algorithms implemented with scripting.

5 boosting

6 indices query

6 Joining queries

1 Nested

2 Has Child

3 Has Parent

4 Parent Id

7 Geo queries

1 GeoShape Query

2 Geo Bounding Box

3 Geo Distance

4 Geo Distance Range

5 Geo Polygon

8 Specialized queries

1 More Like This

2 Template

3 Script

4 Percolate

9 Span queries

1 Span Term

2 Span Multi Term

3 Span First

4 Span Near

5 Span Or

6 Span Not

7 Span Containing

8 Span Within

9 Span Field Masking

10 Minimum Should Match

11 Multi Term Query Rewrite

Elasticsearch scroll
http://www.jianshu.com/p/14aa8b09c789
