Elasticsearch Query DSL (Part 6)

666呀

Article link: https://blog.csdn.net/Suubyy/article/details/118603857

Match All query

The simplest query. It matches all documents and gives each of them a _score of 1.0.

curl -X GET "localhost:9200/_search?pretty" -H 'Content-Type: application/json' -d'
{
    "query": {
        "match_all": {}
    }
}
'

The _score can be changed with the boost parameter:

curl -X GET "localhost:9200/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "match_all": { "boost" : 1.2 }
  }
}
'

Match None query

The match_none query is the inverse of the match_all query: it matches no documents.

curl -X GET "localhost:9200/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "match_none": {}
  }
}
'

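Term-level queries

You can use term-level queries to find documents based on precise values in structured data. Examples of structured data include date ranges, IP addresses, prices, and product IDs.

Unlike full-text queries, term-level queries do not analyze the search term. Instead, they match the exact terms stored in a field.

Types of term-level queries:

exists query: Returns documents that contain any indexed value for a field.
fuzzy query: Returns documents that contain terms similar to the search term.
ids query: Returns documents based on their document IDs.
prefix query: Returns documents that contain a specific prefix in a provided field.
range query: Returns documents that contain terms within a provided range.
regexp query: Returns documents that contain terms matching a regular expression.
term query: Returns documents that contain an exact term in a provided field.
terms query: Returns documents that contain one or more exact terms in a provided field.
terms_set query: Returns documents that contain a minimum number of exact terms in a provided field. You can define the minimum number of matching terms using a field or a script.
type query: Returns documents of the specified type.
wildcard query: Returns documents that contain terms matching a wildcard pattern.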

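exists query

Returns documents that contain an indexed value for a field. An indexed value may not exist for a document's field for several reasons:

The field in the source JSON is null or []
The field has index: false set in the index mapping
The length of the field value exceeds the ignore_above setting in the mapping
The field value is malformed and ignore_malformed is defined in the mapping

Example request: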

curl -X GET "localhost:9200/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "exists": {
      "field": "user"
    }
  }
}
'

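Top-level parameters for exists

field: (Required, string) Name of the field you wish to search.

A field is considered non-existent if its value in the source JSON is null or []. The following values indicate that the field does exist:
- Empty strings, such as "" or "-"
- Arrays containing null and another value, such as [null, "bar"]
- A custom null_value defined in the field mapping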
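
As a rough illustration of the rules above, the following Python sketch classifies a raw source value the way the list describes. It is a simplified model, not Elasticsearch's implementation: the helper name has_indexed_value is hypothetical, treating an array that holds only null like an empty array is an assumption, and mapping settings such as null_value, index, ignore_above, and ignore_malformed are not modeled.

```python
def has_indexed_value(value):
    """Return True if a raw source value would count as existing (simplified)."""
    if value is None:
        # A JSON null has nothing to index.
        return False
    if isinstance(value, list):
        # [] is empty; null entries inside an array are skipped (assumption).
        return any(item is not None for item in value)
    # Any other scalar, including "" and "-", counts as a value.
    return True


for sample in (None, [], "", "-", [None, "bar"]):
    print(repr(sample), has_indexed_value(sample))
```

Under these assumptions, only None and [] are reported as missing; the empty string, "-", and [None, "bar"] all count as existing.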

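Find documents missing indexed values

To find documents that are missing an indexed value for a field, use the exists query inside the must_not clause of a bool query.

The following search returns documents that are missing an indexed value for the user.id field: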

curl -X GET "localhost:9200/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "bool": {
      "must_not": {
        "exists": {
          "field": "user.id"
        }
      }
    }
  }
}
'

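fuzzy query

Returns documents that contain terms similar to the search term, as measured by edit distance.

An edit distance is the number of one-character changes needed to turn one term into another. These changes can include:

Changing a character (box → fox)
Removing a character (black → lack)
Inserting a character (sic → sick)
Transposing two adjacent characters (act → cat)

To find similar terms, the fuzzy query creates a set of all possible variations, or expansions, of the search term within a specified edit distance. The query then returns exact matches for each expansion.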
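
To make the notion of edit distance concrete, here is a small Python sketch that counts the four kinds of change listed above. It uses the optimal string alignment variant of Damerau-Levenshtein distance, chosen here purely for illustration; the function name osa_distance is hypothetical, and the sketch only shows the distance measure, not how Elasticsearch builds or evaluates expansions.

```python
def osa_distance(a: str, b: str) -> int:
    """Edit distance counting changes, deletions, insertions, and
    transpositions of two adjacent characters (optimal string alignment)."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete every character of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert every character of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # change (or exact match)
            )
            if (i > 1 and j > 1
                    and a[i - 1] == b[j - 2]
                    and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[-1][-1]


for left, right in [("box", "fox"), ("black", "lack"),
                    ("sic", "sick"), ("act", "cat")]:
    print(left, right, osa_distance(left, right))  # each pair has distance 1
```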

Example request

curl -X GET "localhost:9200/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "fuzzy": {
      "user.id": {
        "value": "ki"
      }
    }
  }
}
'

Example using advanced parameters

curl -X GET "localhost:9200/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "fuzzy": {
      "user.id": {
        "value": "ki",
        "fuzziness": "AUTO",
        "max_expansions": 50,
        "prefix_length": 0,
        "transpositions": true,
        "rewrite": "constant_score"
      }
    }
  }
}
'

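Top-level parameters for fuzzy

<field>: (Required, object) Field you wish to search.

Parameters for `<field>`

value: (Required, string) Term you wish to find in the provided <field>.
fuzziness: (Optional, string) Maximum edit distance allowed for matching. See Fuzziness for valid values and more information.
max_expansions: (Optional, integer) Maximum number of variations created. Defaults to 50.
prefix_length: (Optional, integer) Number of beginning characters left unchanged when creating expansions. Defaults to 0.
transpositions: (Optional, Boolean) Indicates whether edits include transpositions of two adjacent characters. Defaults to true.
rewrite: (Optional, string) Method used to rewrite the query. For valid values and more information, see the rewrite parameter.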
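
The advanced example passes "AUTO" for fuzziness. Assuming the default thresholds given in the Elasticsearch Fuzziness documentation (terms of 0 to 2 characters must match exactly, terms of 3 to 5 characters allow one edit, longer terms allow two), the rule can be sketched as below. The function name auto_fuzziness and its low and high parameters are hypothetical names used only for this illustration.

```python
def auto_fuzziness(term: str, low: int = 3, high: int = 6) -> int:
    """Maximum edit distance that "AUTO" would allow for a term,
    assuming the default length thresholds of 3 and 6."""
    length = len(term)
    if length < low:
        return 0  # very short terms must match exactly
    if length < high:
        return 1  # medium-length terms allow one edit
    return 2      # longer terms allow two edits


print(auto_fuzziness("ki"))      # 0
print(auto_fuzziness("kimch"))   # 1
print(auto_fuzziness("kimchy"))  # 2
```

Under this assumption, a two-character value such as ki gets a maximum edit distance of 0 with AUTO, so only exact matches would be returned.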

ids query

Returns documents based on their IDs. This query uses the document IDs stored in the _id field.

curl -X GET "localhost:9200/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "ids" : {
      "values" : ["1", "4", "100"]
    }
  }
}
'

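prefix query

Returns documents that contain a specific prefix in a provided field.

Example request

The following search returns documents where the user.id field contains a term that begins with ki.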

curl -X GET "localhost:9200/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "prefix": {
      "user.id": {
        "value": "ki"
      }
    }
  }
}
'

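Top-level parameters for prefix

<field>: (Required, object) Field you wish to search.

Parameters for `<field>`

value: (Required, string) Beginning characters of the terms you wish to find in the provided <field>.

Speed up prefix queries

You can speed up prefix queries using the index_prefixes mapping parameter. If enabled, Elasticsearch indexes prefixes between 2 and 5 characters long in a separate field. This lets Elasticsearch run prefix queries more efficiently, at the cost of a larger index.

If search.allow_expensive_queries is set to false, prefix queries are not executed. However, if index_prefixes is enabled, an optimized query is built that is not considered slow and is executed in spite of this setting.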
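
To picture what indexing prefixes of 2 to 5 characters means, the sketch below lists the prefixes that would be generated for one term. It is a conceptual illustration only: the function name indexed_prefixes is hypothetical, and the min_chars and max_chars arguments simply mirror the 2 and 5 character bounds mentioned above.

```python
def indexed_prefixes(term, min_chars=2, max_chars=5):
    """List the prefixes of a term between min_chars and max_chars long."""
    longest = min(len(term), max_chars)
    return [term[:n] for n in range(min_chars, longest + 1)]


print(indexed_prefixes("kimchy"))  # ['ki', 'kim', 'kimc', 'kimch']
print(indexed_prefixes("k"))       # []
```

A prefix query for ki can then be answered by an exact lookup of ki among these stored prefixes instead of scanning every term that starts with ki.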

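range query

Returns documents that contain terms within a provided range.

Example request:

The following search returns documents where the age field contains a term between 10 and 20.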

curl -X GET "localhost:9200/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "range": {
      "age": {
        "gte": 10,
        "lte": 20,
        "boost": 2.0
      }
    }
  }
}
'

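Top-level parameters for range

<field>: (Required, object) Field you wish to search.

Parameters for `<field>`

gt: (Optional) Greater than.
gte: (Optional) Greater than or equal to.
lt: (Optional) Less than.
lte: (Optional) Less than or equal to.
format: (Optional, string) Date format used to convert date values in the query.
relation: (Optional, string) Indicates how the range query matches values of range fields. Valid values are:
- INTERSECTS (Default): Matches documents with a range field value that intersects the query's range.
- CONTAINS: Matches documents with a range field value that entirely contains the query's range.
- WITHIN: Matches documents with a range field value that lies entirely within the query's range.
boost: (Optional, float) Floating point number used to decrease or increase the relevance scores of the query. Defaults to 1.0.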
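
The three relation values are easy to mix up, so here is a minimal Python sketch of the comparisons they describe. It assumes closed (inclusive) numeric intervals given as (low, high) tuples; the function name range_matches is hypothetical and the code is not taken from Elasticsearch.

```python
def range_matches(field_range, query_range, relation="INTERSECTS"):
    """Compare a stored range field value with a query range.
    Both ranges are (low, high) tuples with inclusive bounds."""
    f_lo, f_hi = field_range
    q_lo, q_hi = query_range
    if relation == "INTERSECTS":
        # The two ranges share at least one point.
        return f_lo <= q_hi and q_lo <= f_hi
    if relation == "CONTAINS":
        # The field value entirely contains the query range.
        return f_lo <= q_lo and q_hi <= f_hi
    if relation == "WITHIN":
        # The field value lies entirely within the query range.
        return q_lo <= f_lo and f_hi <= q_hi
    raise ValueError("unknown relation: " + relation)


print(range_matches((10, 20), (15, 25)))              # True
print(range_matches((10, 20), (15, 25), "CONTAINS"))  # False
print(range_matches((10, 20), (12, 18), "CONTAINS"))  # True
print(range_matches((12, 18), (10, 20), "WITHIN"))    # True
```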

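Using the `range` query with `text` and `keyword` fields

Range queries on text or keyword fields are not executed if search.allow_expensive_queries is set to false.

Using the `range` query with `date` fields

When the <field> parameter is a date field data type, you can use date math with the following parameters: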

gt
gte
lt
lte

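For example, the following search returns documents where the timestamp field contains a date from yesterday. The lower bound now-1d/d rounds down to the start of yesterday, and the upper bound now/d, used with lt, excludes everything from the start of today onward.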

curl -X GET "localhost:9200/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "range": {
      "timestamp": {
        "gte": "now-1d/d",
        "lt": "now/d"
      }
    }
  }
}
'
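
To see which instants the two date math expressions in this request resolve to, the following Python sketch computes them for a fixed clock value. It assumes UTC and no time_zone parameter; the function name yesterday_window is hypothetical.

```python
from datetime import datetime, timedelta


def yesterday_window(now):
    """Resolve "now-1d/d" (inclusive lower bound) and "now/d"
    (exclusive upper bound) relative to the given instant."""
    start_of_today = now.replace(hour=0, minute=0, second=0, microsecond=0)
    start_of_yesterday = start_of_today - timedelta(days=1)
    return start_of_yesterday, start_of_today


def in_window(timestamp, now):
    """True if timestamp satisfies gte now-1d/d and lt now/d."""
    gte, lt = yesterday_window(now)
    return gte <= timestamp < lt


now = datetime(2021, 7, 9, 15, 30)
print(yesterday_window(now))
print(in_window(datetime(2021, 7, 8, 23, 59), now))  # True
print(in_window(datetime(2021, 7, 9, 0, 0), now))    # False
```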

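regexp query

Returns documents that contain terms matching a regular expression.

Example request

The following search returns documents where the user.id field contains any term that begins with k and ends with y.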

curl -X GET "localhost:9200/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "regexp": {
      "user.id": {
        "value": "k.*y",
        "flags": "ALL",
        "case_insensitive": true,
        "max_determinized_states": 10000,
        "rewrite": "constant_score"
      }
    }
  }
}
'
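
One point worth illustrating is that the pattern has to match the entire term, not just part of it. The sketch below imitates that with Python's re.fullmatch. It is only an analogy: Python's regular expression syntax differs from the Lucene syntax that Elasticsearch uses, and the function name regexp_matches is hypothetical.

```python
import re


def regexp_matches(term, pattern, case_insensitive=False):
    """True if the pattern matches the whole term (anchored at both ends)."""
    flags = re.IGNORECASE if case_insensitive else 0
    return re.fullmatch(pattern, term, flags) is not None


print(regexp_matches("kimchy", "k.*y"))        # True
print(regexp_matches("kimchy2", "k.*y"))       # False
print(regexp_matches("Kimchy", "k.*y"))        # False
print(regexp_matches("Kimchy", "k.*y", True))  # True
```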

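Top-level parameters for regexp

<field>: (Required, object) Field you wish to search.

Parameters for `<field>`

value: (Required, string) Regular expression for the terms you wish to find in the provided <field>.

By default, regular expressions are limited to 1,000 characters. You can change this limit using the index.max_regex_length setting.
flags: (Optional, string) Enables optional operators for the regular expression.

Note: If search.allow_expensive_queries is set to false, regexp queries are not executed.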

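term query

Returns documents that contain an exact term in a provided field. You can use the term query to find documents based on a precise value such as a price, a product ID, or a username.

Tip: Avoid using the term query on text fields.

Example request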

curl -X GET "localhost:9200/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "term": {
      "user.id": {
        "value": "kimchy",
        "boost": 1.0
      }
    }
  }
}
'

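Top-level parameters for term

<field>: (Required, object) Field you wish to search.

Parameters for `<field>`

value: (Required, string) Term you wish to find in the provided <field>. To return a document, the term must exactly match the field value, including whitespace and capitalization.
boost: (Optional, float) Floating point number used to decrease or increase the relevance scores of the query. Defaults to 1.0.

Avoid using the `term` query for `text` fields

By default, Elasticsearch changes the values of text fields during analysis. For example, the default standard analyzer changes text field values as follows:

Removes most punctuation
Divides the remaining content into individual words, called tokens
Lowercases the tokens

To better search text fields, the match query also analyzes your provided search term before performing a search. This means the match query can search text fields for analyzed tokens rather than an exact term.

The term query does not analyze the search term. It only searches for the exact term you provide. This means the term query may return poor or no results when searching text fields.

To see the difference in search results, try the following example.

Create an index with a text field called full_text.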

curl -X PUT "localhost:9200/my-index-000001?pretty" -H 'Content-Type: application/json' -d'
{
  "mappings": {
    "properties": {
      "full_text": { "type": "text" }
    }
  }
}
'

Index a document with a value of Quick Brown Foxes! in the full_text field.
```
curl -X PUT "localhost:9200/my-index-000001/_doc/1?pretty" -H 'Content-Type: application/json' -d'
{
  "full_text":   "Quick Brown Foxes!"
}
'
```
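Because full_text is a text field, Elasticsearch changes Quick Brown Foxes! to [quick, brown, foxes] during analysis. (The standard analyzer lowercases and splits the text but does not stem it.)
Use the term query to search for Quick Brown Foxes! in the full_text field. Include the pretty parameter so the response is more readable.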
```
curl -X GET "localhost:9200/my-index-000001/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "term": {
      "full_text": "Quick Brown Foxes!"
    }
  }
}
'
```
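Because the full_text field no longer contains the exact term Quick Brown Foxes!, the term query returns no results.

You can use the match query to search for Quick Brown Foxes! instead. Because the match query analyzes the search term before searching, it finds the document.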

curl -X GET "localhost:9200/my-index-000001/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "match": {
      "full_text": "Quick Brown Foxes!"
    }
  }
}
'
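
The difference between the two searches in this walk-through can be mimicked in a few lines of Python. This is a toy model under stated assumptions: analyze only lowercases the text and splits it on non-word characters (no stemming and none of the standard analyzer's Unicode rules), and the names analyze, term_query, and match_query are hypothetical helpers, not Elasticsearch APIs.

```python
import re


def analyze(text):
    """Toy analyzer: lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def term_query(indexed_tokens, term):
    """The search term is used as is, without analysis."""
    return term in indexed_tokens


def match_query(indexed_tokens, text):
    """The search text is analyzed first; any shared token is a match."""
    return any(token in indexed_tokens for token in analyze(text))


indexed = analyze("Quick Brown Foxes!")
print(indexed)                                     # ['quick', 'brown', 'foxes']
print(term_query(indexed, "Quick Brown Foxes!"))   # False
print(match_query(indexed, "Quick Brown Foxes!"))  # True
```

The term search fails because the literal string Quick Brown Foxes! is not one of the stored tokens, while the analyzed search text shares tokens with the indexed value.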

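terms query

Returns documents that contain one or more exact terms in a provided field.

The terms query is the same as the term query, except that you can search for multiple values.

Example request

The following search returns documents where the user.id field contains kimchy or elkbee.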

curl -X GET "localhost:9200/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "terms": {
      "user.id": [ "kimchy", "elkbee" ],
      "boost": 1.0
    }
  }
}
'

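Top-level parameters for terms

<field>: (Required, object) Field you wish to search. The value of this parameter is an array of terms you wish to find in the provided field. To return a document, one or more terms must exactly match a field value, including whitespace and capitalization. By default, the array is limited to a maximum of 65,536 terms. You can change this limit using the index.max_terms_count setting.
boost: (Optional, float) Floating point number used to decrease or increase the relevance scores of the query.

Terms lookup example

To see how terms lookup works, try the following example.

Create an index with a keyword field named color.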

curl -X PUT "localhost:9200/my-index-000001?pretty" -H 'Content-Type: application/json' -d'
{
  "mappings": {
    "properties": {
      "color": { "type": "keyword" }
    }
  }
}
'

Index a document with an ID of 1 and values of ["blue", "green"] in the color field.

curl -X PUT "localhost:9200/my-index-000001/_doc/1?pretty" -H 'Content-Type: application/json' -d'
{
  "color":   ["blue", "green"]
}
'

Index another document with an ID of 2 and a value of blue in the color field.

curl -X PUT "localhost:9200/my-index-000001/_doc/2?pretty" -H 'Content-Type: application/json' -d'
{
  "color":   "blue"
}
'

Use the terms query with terms lookup parameters to find documents that contain one or more of the same terms as document 2.

curl -X GET "localhost:9200/my-index-000001/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "terms": {
        "color" : {
            "index" : "my-index-000001",
            "id" : "2",
            "path" : "color"
        }
    }
  }
}
'

Because document 2 and document 1 both contain blue as a value in the color field, Elasticsearch returns both documents:

{
  "took" : 17,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "my-index-000001",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "color" : [
            "blue",
            "green"
          ]
        }
      },
      {
        "_index" : "my-index-000001",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 1.0,
        "_source" : {
          "color" : "blue"
        }
      }
    ]
  }
}
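
The lookup in this example can be pictured as two steps: read the terms from the referenced document, then run an ordinary terms match with them. The Python sketch below replays that against an in-memory dictionary holding the two documents from the example. The function names are hypothetical, and path is assumed to be a top-level field name.

```python
docs = {
    "1": {"color": ["blue", "green"]},
    "2": {"color": "blue"},
}


def as_list(value):
    """Treat a single value and an array of values uniformly."""
    return value if isinstance(value, list) else [value]


def terms_lookup(docs, lookup_id, path):
    """Fetch terms from docs[lookup_id][path], then return the IDs of all
    documents whose field at the same path shares at least one term."""
    terms = set(as_list(docs[lookup_id][path]))
    return sorted(
        doc_id
        for doc_id, source in docs.items()
        if terms & set(as_list(source.get(path, [])))
    )


print(terms_lookup(docs, "2", "color"))  # ['1', '2']
```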

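terms_set query

Returns documents that contain a minimum number of exact terms in a provided field.

The terms_set query is the same as the terms query, except that you can define the number of matching terms required to return a document. For example:

A programming_languages field contains a list of known programming languages, such as java, php, or c++. You can use the terms_set query to return documents that match at least two of these languages.
A permissions field contains a list of permissions for an application. You can use the terms_set query to return documents that match a subset of these permissions.

Example request

In most cases, you need to include a numeric field in your index mapping. This numeric field contains the number of matching terms required to return a document.

To see how you can set up an index for the terms_set query, try the following example.

Create an index named job-candidates with the following fields:
- name, a keyword field. This field contains the name of the job candidate.
- programming_languages, a keyword field. This field contains the programming languages known by the job candidate.
- required_matches, a numeric long field. This field contains the number of matching terms required to return a document.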
```
curl -X PUT "localhost:9200/job-candidates?pretty" -H 'Content-Type: application/json' -d'
{
  "mappings": {
    "properties": {
      "name": {
        "type": "keyword"
      },
      "programming_languages": {
        "type": "keyword"
      },
      "required_matches": {
        "type": "long"
      }
    }
  }
}
'
```

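Index a document with an ID of 1 and the following values:

name field: Jane Smith
programming_languages field: ["c++", "java"]
required_matches field: 2

Include the ?refresh parameter so the document is immediately available for search.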

curl -X PUT "localhost:9200/job-candidates/_doc/1?refresh&pretty" -H 'Content-Type: application/json' -d'
{
  "name": "Jane Smith",
  "programming_languages": [ "c++", "java" ],
  "required_matches": 2
}
'

Index another document with an ID of 2 and the following values:

name field: Jason Response
programming_languages field: ["java", "php"]
required_matches field: 2

curl -X PUT "localhost:9200/job-candidates/_doc/2?refresh&pretty" -H 'Content-Type: application/json' -d'
{
  "name": "Jason Response",
  "programming_languages": [ "java", "php" ],
  "required_matches": 2
}
'

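You can now use the required_matches field value as the number of matching terms required to return a document in the terms_set query.

The following search returns documents where the programming_languages field contains at least two of the following terms:

c++
java
php

The minimum_should_match_field is required_matches. This means the number of matching terms required is 2, the value of the required_matches field.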

curl -X GET "localhost:9200/job-candidates/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "terms_set": {
      "programming_languages": {
        "terms": [ "c++", "java", "php" ],
        "minimum_should_match_field": "required_matches"
      }
    }
  }
}
'
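
The matching rule of this search can be replayed in plain Python against the two candidate documents indexed above. This is a minimal sketch of the counting logic only; the function name terms_set is a hypothetical helper, and scoring is ignored.

```python
candidates = {
    "1": {"name": "Jane Smith",
          "programming_languages": ["c++", "java"],
          "required_matches": 2},
    "2": {"name": "Jason Response",
          "programming_languages": ["java", "php"],
          "required_matches": 2},
}


def terms_set(docs, field, terms, minimum_should_match_field):
    """Return IDs of documents whose field shares at least as many terms
    with the query as the document's own minimum field requires."""
    wanted = set(terms)
    hits = []
    for doc_id, source in docs.items():
        matched = len(wanted & set(source[field]))
        if matched >= source[minimum_should_match_field]:
            hits.append(doc_id)
    return hits


print(terms_set(candidates, "programming_languages",
                ["c++", "java", "php"], "required_matches"))  # ['1', '2']
```

Both candidates match here because each knows two of the three requested languages and each document requires two matches. With the query terms reduced to c++ and java, only document 1 would still reach its required count.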

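wildcard query

Returns documents that contain terms matching a wildcard pattern.

The following search returns documents where the user.id field contains a term that begins with ki and ends with y.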

curl -X GET "localhost:9200/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": {
    "wildcard": {
      "user.id": {
        "value": "ki*y",
        "boost": 1.0,
        "rewrite": "constant_score"
      }
    }
  }
}
'
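
To show what a pattern such as ki*y accepts, the sketch below translates a wildcard pattern into a Python regular expression, treating * as zero or more characters and ? as exactly one character, and requiring the whole term to match. The function names are hypothetical, and this is an illustration of the pattern semantics rather than of how Elasticsearch evaluates the query.

```python
import re


def wildcard_to_regex(pattern):
    """Translate * (zero or more characters) and ? (exactly one character)
    into a compiled regular expression; everything else is literal."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def wildcard_matches(term, pattern):
    """True if the whole term matches the wildcard pattern."""
    return wildcard_to_regex(pattern).fullmatch(term) is not None


print(wildcard_matches("kimchy", "ki*y"))  # True
print(wildcard_matches("kiy", "ki*y"))     # True
print(wildcard_matches("kimchi", "ki*y"))  # False
print(wildcard_matches("kimchy", "ki?y"))  # False
```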