Elasticsearch查询

最新推荐文章于 2024-08-11 17:35:01 发布

weixin_34348174

最新推荐文章于 2024-08-11 17:35:01 发布

阅读量257

点赞数

文章标签：大数据 c/c++ java

Query DSL

Elasticsearch提供了一个基于JSON的完整的查询DSL（领域特定语言）。它定义的查询语言由两种类型的子句组成：“叶子查询子句”和“组合查询子句”。

叶子查询子句

叶子查询子句查找特定字段中的特定值，例如 match、term 或 range 查询。

复合查询子句

复合查询子句包装其他叶子或复合查询，并用于以逻辑方式组合多个查询（如 bool 或 dis_max 查询），或更改其行为（如 constant_score 查询）。

1. Query and filter context

查询子句的行为取决于它是用在查询上下文（query context）还是用在过滤器上下文（filter context）：

1.1. Query context

在查询上下文中的查询子句回答了“这个文档与这个查询子句的匹配程度是怎样的？”问题。除了决定文档是否匹配以外，查询子句还会计算一个“_score”，它表示文档与其他文档的相关程度。

1.2. Filter context

在过滤器上下文中，一个查询子句回答了“这个文档与查询子句匹配吗？”的问题。这个答案是简单的Yes或者No，也不会计算分数。过滤上下文主要用于过滤结构化数据，例如：

这个timestamp在2015年到2016年的范围内吗？
这个status字段的值是“published”吗？

（

PS：Query VS Filter

查询反应的是文档与查询子句的匹配程度，而过滤反应的是文档是否匹配查询子句
一个是筛选是否满足条件，情况无非两种：是或不是；一个是看满足条件的记录与查询条件的匹配程度
哪些满足条件，这是过滤；满足条件的这些记录与条件的匹配程度，这是查询
过滤不会计算评分，查询会计算评分

）

频繁使用的过滤器将被Elasticsearch自动缓存，以提高性能。

当查询子句中被传递了一个filter参数时过滤器上下文就生效了。例如，bool查询中的filter参数或者must_not参数。

下面是一个查询子句的例子，这个查询将匹配满足以下所有条件的文档：

title 字段包含单词“search”
content 字段包含单词“elasticsearch”
status 字段包含明确的单词“published”
publish_date 字段的包含的日期大于或等于2015-01-01

curl -X GET "localhost:9200/_search" -H 'Content-Type: application/json' -d'
{
    "query": { 
        "bool": { 
            "must": [
                { "match": { "title":   "Search"        }}, 
                { "match": { "content": "Elasticsearch" }}  
            ],
            "filter": [ 
                { "term":  { "status": "published" }}, 
                { "range": { "publish_date": { "gte": "2015-01-01" }}} 
            ]
        }
    }
}
'

关于上面的查询子句作如下说明：

quary 参数表示这是一个查询上下文
bool 和两个match子句用在查询上下文中，表明它们参与每条文档的打分
filter 参数表明这是过滤器上下文
term 和 range 子句用在过滤器上下文中，它们会过滤掉不匹配的文档，而且不会影响匹配文档的分数

（PS：类比SQL的话，match相当于模糊查询，term相当于精确查询，range相当于范围查询）

2. Match All Query

最简单的查询，匹配所有文档，使它们的_score为1.0

curl -X GET "localhost:9200/_search" -H 'Content-Type: application/json' -d'
{
    "query": {
        "match_all": {}
    }
}
'

_score可以被改变，通过用boost参数

curl -X GET "localhost:9200/_search" -H 'Content-Type: application/json' -d'
{
    "query": {
        "match_all": { "boost" : 1.2 }
    }
}
'

与match_all相反的是match_none，它不匹配任何文档

curl -X GET "localhost:9200/_search" -H 'Content-Type: application/json' -d'
{
    "query": {
        "match_none": {}
    }
}
'

3. Full text queries

3.1. Match Query

match查询接受文本/数值/日期类型的数据，分析它们，并构造一个查询。

match是一种布尔类型的查询。这意味着它对提供的文本进行分析，并在分析的过程中为提供的文本构造一个布尔查询。operator 选项可以设置为 or 或者 and 以此来控制布尔子句（默认是 or ）。例如：

curl -X GET "localhost:9200/_search" -H 'Content-Type: application/json' -d'
{
    "query": {
        "match" : {
            "message" : "this is a test"
        }
    }
}
'

注意，查询语句都是以“query”开头的，这里“message”是字段名

你也可以加一些参数，比如：

curl -X GET "localhost:9200/_search" -H 'Content-Type: application/json' -d'
{
    "query": {
        "match" : {
            "message" : {
                "query" : "this is a test",
                "operator" : "and"
            }
        }
    }
}
'

（PS：match是模糊查询）

3.2. Match Phrase Query

match_phrase 查询与 match类似，但是它是用于精确匹配或单词接近匹配的。例如：

curl -X GET "localhost:9200/_search" -H 'Content-Type: application/json' -d'
{
    "query": {
        "match_phrase" : {
            "message" : "this is a test"
        }
    }
}
'

当然，你也可以加参数

curl -X GET "localhost:9200/_search" -H 'Content-Type: application/json' -d'
{
    "query": {
        "match_phrase" : {
            "message" : {
                "query" : "this is a test",
                "analyzer" : "my_analyzer"
            }
        }
    }
}
'

这里“analyzer”是用来设置用那个分析器来分析文本

3.3. Match Phrase Prefix Query

类似于match_phrase查询，但是对最后一个单词进行通配符搜索。

match_phrase_prefix允许文本的最后一个单词进行前缀匹配

curl -X GET "localhost:9200/_search" -H 'Content-Type: application/json' -d'
{
    "query": {
        "match_phrase_prefix" : {
            "message" : "quick brown f"
        }
    }
}
'

除了match_phrase允许的那些参数外，match_phrase_prefix还可以接受一个max_expansions参数，它是用来控制最后一个单词可以扩展多少后缀（默认50）。

curl -X GET "localhost:9200/_search" -H 'Content-Type: application/json' -d'
{
    "query": {
        "match_phrase_prefix" : {
            "message" : {
                "query" : "quick brown f",
                "max_expansions" : 10
            }
        }
    }
}
'

3.4. Multi Match Query

multi_match 相当于 match 的多字段版本

顾名思义，multi_match可以指定多个字段，而match只能针对一个字段

curl -X GET "localhost:9200/_search" -H 'Content-Type: application/json' -d'
{
  "query": {
    "multi_match" : {
      "query":    "this is a test", 
      "fields": [ "subject", "message" ] 
    }
  }
}
'

另外，字段可以用通配符，例如下面的例子中可以查询 title ， first_name ， last_name 等字段：

curl -X GET "localhost:9200/_search" -H 'Content-Type: application/json' -d'
{
  "query": {
    "multi_match" : {
      "query":    "Will Smith",
      "fields": [ "title", "*_name" ] 
    }
  }
}
'

单个字段可以被提升，例如：

curl -X GET "localhost:9200/_search" -H 'Content-Type: application/json' -d'
{
  "query": {
    "multi_match" : {
      "query" : "this is a test",
      "fields" : [ "subject^3", "message" ] 
    }
  }
}
'

上面的例子，subject字段的重要性是message字段的三倍

3.5. Query String Query

支持Lucene查询字符串语法，允许指定 AND | OR | NOT ，并且在单个查询字符串中进行多字段查询

curl -X GET "localhost:9200/_search" -H 'Content-Type: application/json' -d'
{
    "query": {
        "query_string" : {
            "default_field" : "content",
            "query" : "this AND that OR thus"
        }
    }
}
'

query_string查询解析输入并围绕操作符拆分文本，每个文本部分都是独立分析的，例如：

curl -X GET "localhost:9200/_search" -H 'Content-Type: application/json' -d'
{
    "query": {
        "query_string" : {
            "default_field" : "content",
            "query" : "(new york city) OR (big apple)"
        }
    }
}
'

上面的例子中，将被拆分成 “new york city” 和 “big apple” 两部分，并且每一部分都被分析器独立分析

注意，按操作符拆分

query_string的参数包括：

query　　实例被解析的查询文本

default_field　　如果没有指定前缀字段的话，这是默认的查询字段。（默认查询所有字段）

default_operator　　如果没有明确指定操作符的话，那么这是默认的操作符。例如，如果默认操作符是OR的话，那么“my name is jack”将被翻译成“my OR name OR is OR jack”，同理，如果是AND，则被翻译成“my AND name AND is AND jack”

analyzer　　用来解析查询字符串的解析器的名字

allow_leading_wildcard　　如果设置了，那么 * 或 ? 允许作为第一个字符。默认是true

lenient　　如果设置为true，则格式失败将被忽略

在query_string中，多字段查询应该这样写：

curl -X GET "localhost:9200/_search" -H 'Content-Type: application/json' -d'
{
    "query": {
        "query_string" : {
            "fields" : ["content", "name"],
            "query" : "this AND that"
        }
    }
}
'

等价于

curl -X GET "localhost:9200/_search" -H 'Content-Type: application/json' -d'
{
    "query": {
        "query_string": {
            "query": "(content:this OR name:this) AND (content:that OR name:that)"
        }
    }
}
'

上面两个是等价的

3.6. Simple Query String Query

simple_query_string 是query_string的一个更简单、更健壮、更适合面向用户的版本

使用SimpleQueryParser解析上下文的查询。与常规的query_string查询不同，simple_query_string查询永远不会抛出异常，并丢弃查询的无效部分。

curl -X GET "localhost:9200/_search" -H 'Content-Type: application/json' -d'
{
  "query": {
    "simple_query_string" : {
        "query": "\"fried eggs\" +(eggplant | potato) -frittata",
        "fields": ["title^5", "body"],
        "default_operator": "and"
    }
  }
}
'

3.7. 实例练习

准备数据

//    删除索引
curl -X DELETE "192.168.1.134:9200/book"

//    创建索引
curl -X PUT "192.168.1.134:9200/book" -H 'Content-Type: application/json' -d'
{
    "settings" : {
        "number_of_shards" : 1
    },
    "mappings" : {
        "_doc" : {
            "properties" : {
                "title":        { "type": "text"  }, 
                "author":         { "type": "text"  }, 
                "introduction": { "type": "text"  },
                "publish_date": { 
                    "type": "date",
                    "format": "yyyy-MM-dd"
                }
            }
        }
    }
}
'

//    查看索引
curl -X GET "192.168.1.134:9200/book?pretty"

//    插入文档
curl -X PUT "192.168.1.134:9200/book/_doc/1" -H 'Content-Type: application/json' -d'
{
    "title" : "Hello Java",
    "author": "zhangsan",
    "publish_date" : "2008-11-15",
    "introduction" : "This is a book for novice."
}
'

//    查看文档
curl -X GET "192.168.1.134:9200/book/_search?pretty" -H 'Content-Type: application/json' -d'
{
    "query": {
        "match_all": {}
    }
}
'

match查询（注意，match查询只能是针对单个字段）

这个例子中，我们用“Java”查询到2条，接下来用“Java入门”将查到5条

这是因为解析器会将“Java入门”拆分为“Java”和“入门”两个单词，而且默认的操作符是or

也就是说，查询的结果是title中包含“Java”或者“入门”的记录

现在变成查询title中同时包含“Java”和“入门”的记录，因此只有1条

multi_match多字段查询

match_phrase查询

对比不难发现，同样的关键字“Java从”，用match查出5条，用match_phrase只查出1条

query_string查询

4. Term level queries（单词级别查询）

全文本查询会在执行之前对查询字符串进行分析，而单词级别查询会对存储在反向索引中的精确的term进行操作。

这些查询通常用于结构化的数据，比如：numbers ， dates ，enums 等，而不是对全文本字段。

（PS：也就是说，全文本查询之前要先对文本内容进行分词，而单词级别的查询直接在相应字段的反向索引中精确查找，单词级别的查询一般用于数值、日期等类型的字段上）

4.1. Term Query

在指定的字段中查找包含指定的精确的term的文档

term查询将在反向索引（或者叫倒排索引）中查找包含特定的精确的term的文档。例如：

curl -X POST "localhost:9200/_search" -H 'Content-Type: application/json' -d'
{
  "query": {
    "term" : { "user" : "Kimchy" } 
  }
}
'

上面的例子，在user字段的反向索引中查找包含精确的Kimchy的文档

还可以指定一个boost参数，使这个term查询比另一个查询具有更高的相关性得分。例如：

curl -X GET "localhost:9200/_search" -H 'Content-Type: application/json' -d'
{
    "query": {
        "bool": {
            "should": [
            {
                "term": {
                    "status": {
                        "value": "urgent",
                        "boost": 2.0 
                    }
                }
            },
            {
                "term": {
                    "status": "normal" 
                }
            }
          ]
        }
    }
}
'

这个例子中，urgent查询子句有一个boost参数值为2.0，这就意味着它的重要程度是后面的normal查询子句的两倍，normal子句默认的boost是1.0

4.2. Terms Query

查找包含指定字段中指定的任何确切term的文档

筛选出与所提供的terms中任何一个匹配的文档

curl -X GET "localhost:9200/_search" -H 'Content-Type: application/json' -d'
{
    "query": {
        "terms" : { "user" : ["kimchy", "elasticsearch"]}
    }
}
'

4.3. Range Query

查找指定字段在指定范围内包含值（日期、数字或字符串）的文档。

下面的例子返回age字段的值在10到20之间的文档：

curl -X GET "localhost:9200/_search" -H 'Content-Type: application/json' -d'
{
    "query": {
        "range" : {
            "age" : {
                "gte" : 10,
                "lte" : 20,
                "boost" : 2.0
            }
        }
    }
}
'

range查询可以接受下列参数：

gte　　大于或等于

gt　　大于

lte　　小于或等于

lt　　小于

boost　　设置boost值，默认是1.0

4.3.1. Range on date fields

当range查询用于date类型的字段时，范围可以用Date Math表示：

curl -X GET "localhost:9200/_search" -H 'Content-Type: application/json' -d'
{
    "query": {
        "range" : {
            "date" : {
                "gte" : "now-1d/d",
                "lt" :  "now/d"
            }
        }
    }
}
'

当使用Date Math将日期四舍五入到最近的日期、月份、小时等时，四舍五入日期取决于范围的两端是包含的还是排除的。

例如：

rounded up 向上舍入

rounded down 向下舍入

gt 大于2014-11-18||/M 变成 2014-11-30T23:59:59.999

gte 大于或等于2014-11-18||/M 变成 2014-11-01

lt 小于2014-11-18||/M 变成 2014-11-01

lte 小于或等于2014-11-18||/M 变成2014-11-30T23:59:59.999

这个其实很好理解，

大于2014-11-18||/M相当于是大于2014年11月，因此大于2014-11-18||/M等价于大于2014-11-30 23:59:59

也就是说，大于11月，相当于是大于11月的最后一天，即11-30 23:59:59

同理，大于或等于2014-11-18||/M，相当于大于或等于11月，自然是11月的第一天，即2014-11-01

同理，小于2014-11-18||/M，相当于小于11月，自然是小于11月1日，故而小于2014-11-18||/M等价于小于2014-11-01

同理，小于或等于2014-11-18||/M，等于11月自然是包含11月的，意味着小于11月30日，故而小于或等于2014-11-18||/M等价于小于或等于2014-11-30 23:59:59

4.3.2. Date format in range query

在日期范围查询的时候，我们可以指定日期格式。例如：

curl -X GET "localhost:9200/_search" -H 'Content-Type: application/json' -d'
{
    "query": {
        "range" : {
            "born" : {
                "gte": "01/01/2012",
                "lte": "2013",
                "format": "dd/MM/yyyy||yyyy"
            }
        }
    }
}
'

这个例子是查询在2012-01-01到2013-12-31之间出生的人

下面看时间范围查询

curl -X GET "localhost:9200/_search" -H 'Content-Type: application/json' -d'
{
    "query": {
        "range" : {
            "timestamp" : {
                "gte": "2015-01-01 00:00:00", 
                "lte": "now", 
                "time_zone": "+01:00"
            }
        }
    }
}
'

4.4. Exsit Query

在特定的字段中查找非空值的文档

curl -X GET "localhost:9200/_search" -H 'Content-Type: application/json' -d'
{
    "query": {
        "exists" : { "field" : "user" }
    }
}
'

4.5. Prefix Query

查找包含带有指定前缀的term的文档

curl -X GET "localhost:9200/_search" -H 'Content-Type: application/json' -d'
{ "query": {
    "prefix" : { "user" : "ki" }
  }
}
'

可以关联boost

curl -X GET "localhost:9200/_search" -H 'Content-Type: application/json' -d'
{ "query": {
    "prefix" : { "user" :  { "value" : "ki", "boost" : 2.0 } }
  }
}
'

4.6. Wildcard Query

支持通配符查询，*表示任意字符，?表示任意单个字符

curl -X GET "localhost:9200/_search" -H 'Content-Type: application/json' -d'
{
    "query": {
        "wildcard" : { "user" : "ki*y" }
    }
}
'

可以加boost参数

curl -X GET "localhost:9200/_search" -H 'Content-Type: application/json' -d'
{
    "query": {
        "wildcard" : { "user" : { "value" : "ki*y", "boost" : 2.0 } }
    }
}
'

4.7. Regexp Query

正则表达式查询

curl -X GET "localhost:9200/_search" -H 'Content-Type: application/json' -d'
{
    "query": {
        "regexp":{
            "name.first": "s.*y"
        }
    }
}
'

4.8. Ids Query

用_uid字段查询

curl -X GET "localhost:9200/_search" -H 'Content-Type: application/json' -d'
{
    "query": {
        "ids" : {
            "type" : "_doc",
            "values" : ["1", "4", "100"]
        }
    }
}
'

4.9. 实例练习

5. 复合查询

复合查询包装其他复合查询或叶子查询，以组合它们的结果和得分，更改它们的行为，或从查询切换到筛选上下文。

5.1. 固定分数查询

curl -X GET "localhost:9200/_search" -H 'Content-Type: application/json' -d'
{
    "query": {
        "constant_score" : {
            "filter" : {
                "term" : { "user" : "kimchy"}
            },
            "boost" : 1.2
        }
    }
}
'

5.2. 布尔查询

关于should子句，特别要注意：

如果这个布尔查询位于查询上下文，并且有must或者filter子句，那么即使should子句没有匹配任何文档，也没关系
如果是位于过滤器上下文，或者既没有must也没有filter，那么至少有一个should查询必须匹配文档。
这个行为可以通过设置minimum_should_match参数来显式地控制。

举个例子：

curl -X POST "localhost:9200/_search" -H 'Content-Type: application/json' -d'
{
    "query": {
        "bool" : {
            "must" : {
                "term" : { "user" : "kimchy" }
            },
            "filter": {
                "term" : { "tag" : "tech" }
            },
            "must_not" : {
                "range" : {
                    "age" : { "gte" : 10, "lte" : 20 }
                }
            },
            "should" : [
                { "term" : { "tag" : "wow" } },
                { "term" : { "tag" : "elasticsearch" } }
            ],
            "minimum_should_match" : 1,
            "boost" : 1.0
        }
    }
}
'

查询user为“kimchy”，并且tag为“tech”，并且age不在10~20之间，并且tag为wow或elasticsearch的文档

filter查询分数默认是0

curl -X GET "localhost:9200/_search" -H 'Content-Type: application/json' -d'
{
  "query": {
    "bool": {
      "filter": {
        "term": {
          "status": "active"
        }
      }
    }
  }
}
'