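Elasticsearch 7.17 Study Notes

Preface

These notes are based mainly on reading the official Elasticsearch 7.17 documentation and on hands-on practice. Official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/7.17/index.html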

Contents

I. How ES stores data

II. Usage

2.1 Adding documents to ES

2.2 Searching

2.3 Getting specific fields

2.4 Range queries

2.5 Extracting fields from unstructured content

2.6 Combining queries

2.7 Aggregating data

2.8 A request, illustrated

2.9 Field data types

2.10 Structured, unstructured, and semi-structured data

2.11 The difference between term and match

III. Query DSL

3.1 dis_max (disjunction max)

3.2 boosting query

3.3 constant_score (constant score)

3.4 function_score query (user-defined scoring)

3.5 intervals query

3.6 match query

3.7 combined_fields (multiple fields)

3.8 multi_match

3.9 query_string

3.10 Joining queries

3.11 percolate query

3.12 rank_feature

3.13 pinned query

3.14 fuzzy query

3.15 exists

3.16 wildcard query

Conclusion


I. How ES stores data

1. Elasticsearch stores complex data structures that have been serialized as JSON documents.

2. When a document is stored, it is indexed.

3. An inverted index lists every unique word that appears in any document and identifies all of the documents each word occurs in.

4. An index can be thought of as an optimized collection of documents. Each document is a collection of fields, which are the key-value pairs that contain your data.

5. A few recurring words: term (a term, clause, or item), match (plural: matches), extract (to pull out or retrieve), shard (a piece of an index).

6. The Elasticsearch REST APIs support structured queries, full-text queries, and complex queries that combine the two. Structured queries are similar to the types of queries you can construct in SQL. For example, you could search the gender and age fields in your employee index and sort the matches by the hire_date field. Full-text queries find all documents that match the query string and return them sorted by relevance (how good a match they are for your search terms).
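Item 3 above describes the inverted index. The Python sketch below is a toy model, not how Lucene actually stores postings. It assumes a naive analyzer that lowercases text and splits on whitespace, and it maps each unique term to the IDs of the documents that contain it. `build_inverted_index` is a hypothetical name.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each unique term to the sorted IDs of the documents it occurs in.

    docs: dict of document ID -> text. Tokenization is a toy assumption:
    lowercase the text, then split on whitespace.
    """
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

docs = {
    1: "The quick brown fox",
    2: "The lazy brown dog",
}
index = build_inverted_index(docs)
print(index["brown"])  # [1, 2]
print(index["fox"])    # [1]
```

Answering "which documents contain brown?" becomes one dictionary lookup instead of a scan over every document.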

II. Usage

2.1 Adding documents to ES

Add a single document

Send this request to the ES server:

POST logs-my_app-default/_doc
{
  "@timestamp": "2099-05-06T16:21:15.000Z",
  "event": {
    "original": "192.0.2.42 - - [06/May/2099:16:21:15 +0000] \"GET /images/bg.jpg HTTP/1.0\" 200 24736"
  }
}

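The response is shown below, where:

_index is the (backing) index that stores the document

_id is the document's unique ID within the index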

{
  "_index": ".ds-logs-my_app-default-2099-05-06-000001",
  "_type": "_doc",
  "_id": "gl5MJXMBMk1dGnErnBW8",
  "_version": 1,
  "result": "created",
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  },
  "_seq_no": 0,
  "_primary_term": 1
}

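Add multiple documents in one request

Append _bulk to the request path. The body is newline-delimited JSON: each action line (here { "create": { } }) is followed by its document on the next line, every line is a complete JSON object, and the body must end with a newline.

Example: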

PUT logs-my_app-default/_bulk
{ "create": { } }
{ "@timestamp": "2099-05-07T16:24:32.000Z", "event": { "original": "192.0.2.242 - - [07/May/2020:16:24:32 -0500] \"GET /images/hm_nbg.jpg HTTP/1.0\" 304 0" } }
{ "create": { } }
{ "@timestamp": "2099-05-08T16:25:42.000Z", "event": { "original": "192.0.2.255 - - [08/May/2099:16:25:42 +0000] \"GET /favicon.ico HTTP/1.0\" 200 3638" } }
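When a bulk body is generated by a program, the line structure described above must be reproduced exactly. The Python sketch below shows one way to do that with the standard json module. `build_bulk_body` is a hypothetical helper. It always uses the create action because data streams accept only create operations. Sending the body (for example with Content-Type: application/x-ndjson) is left to whatever HTTP client you use.

```python
import json

def build_bulk_body(docs):
    """Build an NDJSON body for the _bulk API: an action line, then the document line."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"create": {}}))  # action line; data streams require "create"
        lines.append(json.dumps(doc))             # document on a single line (no indent)
    return "\n".join(lines) + "\n"                # the body must end with a newline

body = build_bulk_body([
    {"@timestamp": "2099-05-08T16:25:42.000Z",
     "event": {"original": '192.0.2.255 - - [08/May/2099:16:25:42 +0000] "GET /favicon.ico HTTP/1.0" 200 3638'}},
])
print(body, end="")
```

json.dumps without indent never emits a raw newline: newlines inside strings are escaped as \n, so each document stays on one line.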

2.2 Searching

The following request matches all log entries in logs-my_app-default and sorts them by @timestamp in descending order:

GET logs-my_app-default/_search
{
  "query": {
    "match_all": { }
  },
  "sort": [
    {
      "@timestamp": "desc"
    }
  ]
}

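The response is shown below. By default, the hits section includes up to the first 10 documents that match the search. The _source of each hit contains the original JSON that was submitted for indexing.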

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 3,
      "relation": "eq"
    },
    "max_score": null,
    "hits": [
      {
        "_index": ".ds-logs-my_app-default-2099-05-06-000001",
        "_type": "_doc",
        "_id": "PdjWongB9KPnaVm2IyaL",
        "_score": null,
        "_source": {
          "@timestamp": "2099-05-08T16:25:42.000Z",
          "event": {
            "original": "192.0.2.255 - - [08/May/2099:16:25:42 +0000] \"GET /favicon.ico HTTP/1.0\" 200 3638"
          }
        },
        "sort": [
          4081940742000
        ]
      },
      ...
    ]
  }
}

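2.3 Getting specific fields

Parsing the entire _source is unwieldy for large documents. To exclude it from the response, set the _source parameter to false, and use the fields parameter to retrieve only the fields you want instead.

Example: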

GET logs-my_app-default/_search
{
  "query": {
    "match_all": { }
  },
  "fields": [
    "@timestamp"
  ],
  "_source": false,
  "sort": [
    {
      "@timestamp": "desc"
    }
  ]
}

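The response contains each hit's field values as a flat array. Compared with the previous search, each hit now carries a fields object instead of _source. _score is still null because the results are sorted by @timestamp rather than by relevance.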

{
  ...
  "hits": {
    ...
    "hits": [
      {
        "_index": ".ds-logs-my_app-default-2099-05-06-000001",
        "_type": "_doc",
        "_id": "PdjWongB9KPnaVm2IyaL",
        "_score": null,
        "fields": {
          "@timestamp": [
            "2099-05-08T16:25:42.000Z"
          ]
        },
        "sort": [
          4081940742000
        ]
      },
      ...
    ]
  }
}

2.4 Range queries

Use the range keyword inside query:

gte: greater than or equal to

lt: less than

GET logs-my_app-default/_search
{
  "query": {
    "range": {
      "@timestamp": {
        "gte": "2099-05-05",
        "lt": "2099-05-08"
      }
    }
  },
  "fields": [
    "@timestamp"
  ],
  "_source": false,
  "sort": [
    {
      "@timestamp": "desc"
    }
  ]
}

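You can use date math to define relative time ranges. The following query searches for data from the past day: from the start of yesterday (now-1d/d) up to, but not including, the start of today (now/d). Because the sample timestamps are in the year 2099, it won't match any log entries in logs-my_app-default. Note the date math used in the gte and lt values.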

GET logs-my_app-default/_search
{
  "query": {
    "range": {
      "@timestamp": {
        "gte": "now-1d/d",
        "lt": "now/d"
      }
    }
  },
  "fields": [
    "@timestamp"
  ],
  "_source": false,
  "sort": [
    {
      "@timestamp": "desc"
    }
  ]
}
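To see what the two date-math expressions resolve to, here is a small Python sketch. It assumes UTC (the default time zone for date-math rounding) and a fixed now so the result is reproducible. `day_window` is a hypothetical helper that covers only these two expressions, not date math in general.

```python
from datetime import datetime, timedelta, timezone

def day_window(now):
    """Resolve gte "now-1d/d" and lt "now/d" for a given `now` (UTC).

    "/d" rounds down to the start of the day, so the window is the whole
    previous calendar day: [start of yesterday, start of today).
    """
    midnight = dict(hour=0, minute=0, second=0, microsecond=0)
    start_of_today = now.replace(**midnight)                            # now/d
    start_of_yesterday = (now - timedelta(days=1)).replace(**midnight)  # now-1d/d
    return start_of_yesterday, start_of_today

now = datetime(2099, 5, 8, 13, 45, tzinfo=timezone.utc)
gte, lt = day_window(now)
print(gte.isoformat(), lt.isoformat())
# 2099-05-07T00:00:00+00:00 2099-05-08T00:00:00+00:00
```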

2.5 Extracting fields from unstructured content

This kind of search relies on mapping, so before covering field extraction, here is a brief introduction to mapping.

Mapping is the process of defining how a document, and the fields it contains, are stored and indexed.

Each document is a collection of fields, which each have their own data type. When mapping your data, you create a mapping definition, which contains a list of fields that are pertinent to the document. A mapping definition also includes metadata fields, like the _source field, which customize how a document's associated metadata is handled.

Use dynamic mapping and explicit mapping to define your data. Each method provides different benefits based on where you are in your data journey. For example, explicitly map fields where you don't want to use the defaults, or to gain greater control over which fields are created. You can then allow Elasticsearch to add other fields dynamically.

Experiment with mapping options

Define runtime fields in a search request to experiment with different mapping options, and also fix mistakes in your index mapping values by overriding values in the mapping during the search request.

Example search:

In the request below, runtime_mappings defines a runtime field named source.ip. Its Painless script uses a grok pattern to extract the client IP from event.original. Because source.ip is also listed under fields, its value is returned in each hit's fields.

GET logs-my_app-default/_search
{
  "runtime_mappings": {
    "source.ip": {
      "type": "ip",
      "script": """
        String sourceip=grok('%{IPORHOST:sourceip} .*').extract(doc[ "event.original" ].value)?.sourceip;
        if (sourceip != null) emit(sourceip);
      """
    }
  },
  "query": {
    "range": {
      "@timestamp": {
        "gte": "2099-05-05",
        "lt": "2099-05-08"
      }
    }
  },
  "fields": [
    "@timestamp",
    "source.ip"
  ],
  "_source": false,
  "sort": [
    {
      "@timestamp": "desc"
    }
  ]
}
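The script above keeps only the leading IP or host of each log line. The Python sketch below is a rough stand-in for it. The grok pattern %{IPORHOST:sourceip} .* is replaced by a simplified regular expression. This is an assumption: any run of non-space characters at the start of the line, which is looser than the real IPORHOST pattern. Like the script, the sketch produces nothing when there is no match. `extract_source_ip` is a hypothetical name.

```python
import re

# Simplified stand-in for grok's %{IPORHOST:sourceip} .* (assumption: the
# leading run of non-space characters, followed by a space).
SOURCE_IP = re.compile(r"^(?P<sourceip>\S+) ")

def extract_source_ip(original):
    """Return the leading IP/host of a log line, or None (the script then emits nothing)."""
    m = SOURCE_IP.match(original)
    return m.group("sourceip") if m else None

line = '192.0.2.42 - - [06/May/2099:16:21:15 +0000] "GET /images/bg.jpg HTTP/1.0" 200 24736'
print(extract_source_ip(line))  # 192.0.2.42
```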

Response:

{
  "took" : 4,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [
      {
        "_index" : ".ds-logs-my_app-default-2022.10.15-000001",
        "_type" : "_doc",
        "_id" : "J1zs312345B1SeF7g53S",
        "_score" : null,
        "fields" : {
          "@timestamp" : [
            "2099-05-07T16:24:32.000Z"
          ],
          "source.ip" : [
            "192.0.2.242"
          ]
        },
        "sort" : [
          4081854272000
        ]
      },
      {
        "_index" : ".ds-logs-my_app-default-2022.10.15-000001",
        "_type" : "_doc",
        "_id" : "Ilzr3IMBd9B43217032T",
        "_score" : null,
        "fields" : {
          "@timestamp" : [
            "2099-05-06T16:21:15.000Z"
          ],
          "source.ip" : [
            "192.0.2.42"
          ]
        },
        "sort" : [
          4081767675000
        ]
      },
      {
        "_index" : ".ds-logs-my_app-default-2022.10.15-000001",
        "_type" : "_doc",
        "_id" : "Jlz567Bd9B1SeF89XPX",
        "_score" : null,
        "fields" : {
          "@timestamp" : [
            "2099-05-06T16:21:15.000Z"
          ],
          "source.ip" : [
            "192.0.2.42"
          ]
        },
        "sort" : [
          4081767675000
        ]
      }
    ]
  }
}

2.6 Combining queries

Use the bool query.

The following search combines two range queries: one on @timestamp and one on the source.ip runtime field.

Example:

GET logs-my_app-default/_search
{
  "runtime_mappings": {
    "source.ip": {
      "type": "ip",
      "script": """
        String sourceip=grok('%{IPORHOST:sourceip} .*').extract(doc[ "event.original" ].value)?.sourceip;
        if (sourceip != null) emit(sourceip);
      """
    }
  },
  "query": {
    "bool": {
      "filter": [
        {
          "range": {
            "@timestamp": {
              "gte": "2099-05-05",
              "lt": "2099-05-08"
            }
          }
        },
        {
          "range": {
            "source.ip": {
              "gte": "192.0.2.0",
              "lte": "192.0.2.240"
            }
          }
        }
      ]
    }
  },
  "fields": [
    "@timestamp",
    "source.ip"
  ],
  "_source": false,
  "sort": [
    {
      "@timestamp": "desc"
    }
  ]
}

Response:

{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [
      {
        "_index" : ".ds-logs-my_app-default-2022.10.15-000001",
        "_type" : "_doc",
        "_id" : "Ilzr3IMBd9B1SeF703MT",
        "_score" : null,
        "fields" : {
          "@timestamp" : [
            "2099-05-06T16:21:15.000Z"
          ],
          "source.ip" : [
            "192.0.2.42"
          ]
        },
        "sort" : [
          4081767675000
        ]
      },
      {
        "_index" : ".ds-logs-my_app-default-2022.10.15-000001",
        "_type" : "_doc",
        "_id" : "Jlzs3IMBd9B1SeF7IXPX",
        "_score" : null,
        "fields" : {
          "@timestamp" : [
            "2099-05-06T16:21:15.000Z"
          ],
          "source.ip" : [
            "192.0.2.42"
          ]
        },
        "sort" : [
          4081767675000
        ]
      }
    ]
  }
}

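In a compound bool query, the following request returns documents that match the must, filter, and should clauses and do not match the must_not clause. must and should clauses contribute to the relevance score; filter and must_not clauses do not.

minimum_should_match sets how many of the should clauses (a number or a percentage) a document must match to be returned. Here at least one must match.

boost sets the weight of this query: it multiplies the query's relevance score (default 1.0).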

POST _search
{
  "query": {
    "bool" : {
      "must" : {
        "term" : { "user.id" : "kimchy" }
      },
      "filter": {
        "term" : { "tags" : "production" }
      },
      "must_not" : {
        "range" : {
          "age" : { "gte" : 10, "lte" : 20 }
        }
      },
      "should" : [
        { "term" : { "tags" : "env1" } },
        { "term" : { "tags" : "deployed" } }
      ],
      "minimum_should_match" : 1,
      "boost" : 1.0
    }
  }
}
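To make the clause semantics concrete, here is a toy Python evaluator. It is not how ES executes a bool query. Documents are plain dicts, each clause is modeled as a Python predicate, and the "score" simply counts matching must and should clauses. Real ES uses BM25, but it likewise ignores filter and must_not for scoring. The default for minimum_should_match follows the ES rule: 1 when there are should clauses but no must or filter clauses, otherwise 0. `bool_query` is a hypothetical name.

```python
def bool_query(doc, must=(), filters=(), must_not=(), should=(), minimum_should_match=None):
    """Toy bool query over one document. Each clause is a predicate doc -> bool.

    Returns (matched, score); the score counts matching must/should clauses,
    because filter and must_not never contribute to scoring.
    """
    if minimum_should_match is None:
        # ES default: 1 if there are should clauses but no must/filter clauses, else 0
        minimum_should_match = 1 if should and not must and not filters else 0
    if not all(q(doc) for q in must) or not all(q(doc) for q in filters):
        return False, 0
    if any(q(doc) for q in must_not):
        return False, 0
    should_hits = sum(1 for q in should if q(doc))
    if should_hits < minimum_should_match:
        return False, 0
    return True, len(must) + should_hits

doc = {"user.id": "kimchy", "tags": ["production", "env1"], "age": 25}
matched, score = bool_query(
    doc,
    must=[lambda d: d["user.id"] == "kimchy"],
    filters=[lambda d: "production" in d["tags"]],
    must_not=[lambda d: 10 <= d["age"] <= 20],
    should=[lambda d: "env1" in d["tags"], lambda d: "deployed" in d["tags"]],
    minimum_should_match=1,
)
print(matched, score)  # True 2
```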

2.7 Aggregating data

Use aggregations to summarize data as metrics, statistics, or other analytics.

The following search uses an aggregation to calculate the average_response_size using the http.response.body.bytes runtime field. The aggregation only runs on documents that match the query.

Example request. The runtime_mappings section extracts http.response.body.bytes from each log line. The aggs keyword declares an aggregation named average_response_size, which uses avg to compute the mean http.response.body.bytes over the matching documents. Because http.response.body.bytes is also listed under fields, each hit returns its own value as well.

GET logs-my_app-default/_search
{
  "runtime_mappings": {
    "http.response.body.bytes": {
      "type": "long",
      "script": """
        String bytes=grok('%{COMMONAPACHELOG}').extract(doc[ "event.original" ].value)?.bytes;
        if (bytes != null) emit(Integer.parseInt(bytes));
      """
    }
  },
  "aggs": {
    "average_response_size":{
      "avg": {
        "field": "http.response.body.bytes"
      }
    }
  },
  "query": {
    "bool": {
      "filter": [
        {
          "range": {
            "@timestamp": {
              "gte": "2099-05-05",
              "lt": "2099-05-08"
            }
          }
        }
      ]
    }
  },
  "fields": [
    "@timestamp",
    "http.response.body.bytes"
  ],
  "_source": false,
  "sort": [
    {
      "@timestamp": "desc"
    }
  ]
}

Response. The aggregations object contains the computed result:

{
  "took" : 112,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [
      {
        "_index" : ".ds-logs-my_app-default-2022.10.15-000001",
        "_type" : "_doc",
        "_id" : "J1zs3gewsd9B1SeF7gnPS",
        "_score" : null,
        "fields" : {
          "@timestamp" : [
            "2099-05-07T16:24:32.000Z"
          ],
          "http.response.body.bytes" : [
            0
          ]
        },
        "sort" : [
          4081854272000
        ]
      },
      {
        "_index" : ".ds-logs-my_app-default-2022.10.15-000001",
        "_type" : "_doc",
        "_id" : "Ilzr3IMBd9B1S321033T",
        "_score" : null,
        "fields" : {
          "@timestamp" : [
            "2099-05-06T16:21:15.000Z"
          ],
          "http.response.body.bytes" : [
            24736
          ]
        },
        "sort" : [
          4081767675000
        ]
      },
      {
        "_index" : ".ds-logs-my_app-default-2022.10.15-000001",
        "_type" : "_doc",
        "_id" : "Jlzs3IMBd9B1S12345PX",
        "_score" : null,
        "fields" : {
          "@timestamp" : [
            "2099-05-06T16:21:15.000Z"
          ],
          "http.response.body.bytes" : [
            24736
          ]
        },
        "sort" : [
          4081767675000
        ]
      }
    ]
  },
  "aggregations" : {
    "average_response_size" : {
      "value" : 16490.666666666668
    }
  }
}
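As a cross-check of the aggregation result, the sketch below recomputes the average in Python. It simplifies %{COMMONAPACHELOG} by assuming the response size is the last space-separated token of the line. A "-" means no value, in which case the script emits nothing. The input is the three log lines that matched the query (the response above shows the 2099-05-06 line twice). Both helper names are hypothetical.

```python
def response_bytes(original):
    """Return the response size of a common-log line, or None for "-" (simplified parsing)."""
    last = original.rsplit(" ", 1)[-1]
    return int(last) if last.isdigit() else None

def average(values):
    """Like the avg aggregation: skip missing values, average the rest (None if nothing left)."""
    present = [v for v in values if v is not None]
    return sum(present) / len(present) if present else None

lines = [
    '192.0.2.242 - - [07/May/2020:16:24:32 -0500] "GET /images/hm_nbg.jpg HTTP/1.0" 304 0',
    '192.0.2.42 - - [06/May/2099:16:21:15 +0000] "GET /images/bg.jpg HTTP/1.0" 200 24736',
    '192.0.2.42 - - [06/May/2099:16:21:15 +0000] "GET /images/bg.jpg HTTP/1.0" 200 24736',
]
print(average([response_bytes(l) for l in lines]))  # about 16490.67
```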

2.8 A request, illustrated

[Figure: an annotated request and its response]

2.9 Field data types

Each field has its own field data type, for example text, keyword, boolean, date, range (long_range, double_range, date_range), ip, and so on.

The keyword type is often used for sorting, aggregations, and term-level queries such as term. Avoid using keyword fields for full-text search; use the text field type instead.

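2.10 Structured, unstructured, and semi-structured data

Structured data is data that can be represented and stored in a relational database in two-dimensional (tabular) form. Typically, data is organized in rows: each row describes one entity, and every row has the same attributes.

Unstructured data has no fixed structure. Documents of all kinds, images, video, and audio are unstructured data. Such data is usually stored as a whole, typically in a binary format.

Semi-structured data is a form of structured data that does not conform to the data model of relational databases or other data tables. It does, however, contain tags or markers that separate semantic elements and organize records and fields into hierarchies, which is why it is also called a self-describing structure. In semi-structured data, entities of the same class can have different attributes, and even when they are grouped together, the order of those attributes does not matter. Common semi-structured formats are XML and JSON.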

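2.11 The difference between term and match

term is an exact search. It does not analyze (tokenize) the query text; it looks up the entire query value as-is in the index.

match is a full-text search. It analyzes (tokenizes) the query text, so a document is returned even if only one token matches, and every hit comes with a relevance score, _score. The official ES documentation calls this tokenization analysis, and the fields that go through it are analyzed text fields.

Example: first add three documents.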

POST /test/_doc
{
  "name": "张三",
  "age": 25
}

POST /test/_doc
{
  "name": "张无忌",
  "age": 50
}

POST /test/_doc
{
  "name": "李四",
  "age": 30
}

A term search for '张' (Zhang) returns 张三 (Zhang San) and 张无忌 (Zhang Wuji):

# Request
GET test/_search
{
  "query": {
    "term": { "name": "张" }
  }
}

# Response
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 0.7549127,
    "hits" : [
      {
        "_index" : "test",
        "_type" : "_doc",
        "_id" : "13q_34MBwb0XtVQgyvdY",
        "_score" : 0.7549127,
        "_source" : {
          "name" : "张三",
          "age" : 25
        }
      },
      {
        "_index" : "test",
        "_type" : "_doc",
        "_id" : "2HrA34MBwb0XtVQgbfd6",
        "_score" : 0.6407243,
        "_source" : {
          "name" : "张无忌",
          "age" : 50
        }
      }
    ]
  }
}

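A term search for '张三' returns nothing, even though 张三 was stored earlier.

With dynamic mapping, ES maps string fields as text by default, and the default (standard) analyzer splits the content into tokens before writing them to the inverted index. So although we stored '张三', a term query cannot find it: in the index, 张三 has already been split into the tokens '张' and '三'.

# Request
GET test/_search
{
  "query": {
    "term": { "name": "张三" }
  }
}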


# Response
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 0,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  }
}

A match query: even though the query is 张三, 张无忌 also appears in the results, and 张三 has a higher _score than 张无忌.

# Request
GET test/_search
{
  "query": {
    "match": { "name": "张三" }
  }
}

# Response
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 2.0661702,
    "hits" : [
      {
        "_index" : "test",
        "_type" : "_doc",
        "_id" : "13q_34MBwb0XtVQgyvdY",
        "_score" : 2.0661702,
        "_source" : {
          "name" : "张三",
          "age" : 25
        }
      },
      {
        "_index" : "test",
        "_type" : "_doc",
        "_id" : "2HrA34MBwb0XtVQgbfd6",
        "_score" : 0.6407243,
        "_source" : {
          "name" : "张无忌",
          "age" : 50
        }
      }
    ]
  }
}
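The three results above can be reproduced with a toy model. The sketch assumes that the standard analyzer emits each Chinese character as its own token. The scoring is a deliberate simplification: it counts how many query tokens a document contains, which is not BM25, but it produces the same ordering here. All names are hypothetical.

```python
from collections import defaultdict

def analyze(text):
    """Toy analyzer: every non-space character is a token (how CJK ideographs are split)."""
    return [ch for ch in text if not ch.isspace()]

docs = {1: "张三", 2: "张无忌", 3: "李四"}
index = defaultdict(set)
for doc_id, name in docs.items():
    for token in analyze(name):
        index[token].add(doc_id)

def term_query(value):
    """term: look up the raw value as one token, with no analysis."""
    return sorted(index.get(value, set()))

def match_query(text):
    """match: analyze the query, then score documents by how many query tokens they contain."""
    scores = defaultdict(int)
    for token in analyze(text):
        for doc_id in index.get(token, set()):
            scores[doc_id] += 1
    return sorted(scores.items(), key=lambda kv: -kv[1])

print(term_query("张"))     # [1, 2]
print(term_query("张三"))   # []
print(match_query("张三"))  # [(1, 2), (2, 1)]
```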

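III. Query DSL

3.1 dis_max (disjunction max)

The disjunction max query returns documents that match any of its subqueries, but uses only the score of the best-matching subquery as the document's score. To take the other matching subqueries into account, add the tie_breaker parameter. The score then becomes the highest subquery score plus tie_breaker multiplied by the scores of the other matching subqueries.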
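The scoring rule just described fits in a few lines of Python. This is a sketch with a hypothetical function name. Each entry is a subquery's score for one document, with None meaning that subquery did not match.

```python
def dis_max_score(clause_scores, tie_breaker=0.0):
    """Best subquery score + tie_breaker * (sum of the other matching subquery scores)."""
    matching = [s for s in clause_scores if s is not None]  # None = subquery did not match
    if not matching:
        return None  # the document is not returned
    best = max(matching)
    return best + tie_breaker * (sum(matching) - best)

print(dis_max_score([1.2, 0.8]))                   # 1.2 (only the best score counts)
print(dis_max_score([1.2, 0.8], tie_breaker=0.3))  # about 1.44 = 1.2 + 0.3 * 0.8
```

With tie_breaker 0 this is a pure maximum; with tie_breaker 1 it becomes the plain sum of all matching scores.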

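3.2 boosting query

It takes positive and negative parameters.

positive: the query that documents must match.

negative: documents that also match this query are still returned, but their relevance score is multiplied by the negative_boost coefficient (a value between 0 and 1.0), which demotes them.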

GET /_search
{
  "query": {
    "boosting": {
      "positive": {
        "term": {
          "text": "apple"
        }
      },
      "negative": {
        "match": {
          "text": "pie tart fruit crumble tree"
        }
      },
      "negative_boost": 0.5
    }
  }
}
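A minimal sketch of the boosting rule in Python. The function name is hypothetical, and the two inputs stand for the positive query's score and whether the negative query matched.

```python
def boosting_score(positive_score, matches_negative, negative_boost):
    """Score under a boosting query, or None if the positive query does not match."""
    if positive_score is None:
        return None  # documents must match the positive query to be returned
    if not 0 <= negative_boost <= 1.0:
        raise ValueError("negative_boost must be between 0 and 1.0")
    # Matching the negative query demotes the document instead of excluding it.
    return positive_score * negative_boost if matches_negative else positive_score

print(boosting_score(2.0, matches_negative=True, negative_boost=0.5))   # 1.0
print(boosting_score(2.0, matches_negative=False, negative_boost=0.5))  # 2.0
```

Unlike must_not in a bool query, a document that matches the negative query is still returned, just lower in the ranking.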

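3.3 constant_score (constant score)

Wraps a filter query and returns every matching document with a relevance score equal to the boost value.

Required parameter under constant_score: filter.

Optional parameter: boost, default 1.0. Every document matched by the filter gets this value as its score. For example, with boost set to 1.2, each matching document scores 1.2.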
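The rule above, sketched in Python with a hypothetical helper. Documents are dicts and the filter is a predicate. Every document that passes the filter gets exactly boost as its score, however well it matches.

```python
def constant_score(docs, filter_fn, boost=1.0):
    """Every document passing the filter gets exactly `boost` as its score."""
    return [(doc, boost) for doc in docs if filter_fn(doc)]

docs = [{"user.id": "kimchy"}, {"user.id": "other"}]
print(constant_score(docs, lambda d: d["user.id"] == "kimchy", boost=1.2))
# [({'user.id': 'kimchy'}, 1.2)]
```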

3.4 function_score query (user-defined scoring)

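3.5 intervals query

Returns documents based on the order and proximity of matching terms: match rules and gap parameters define interval rules, and ES retrieves the documents that satisfy them.

In the following query, the three words my favorite food must be adjacent because max_gaps is 0. Because the outer ordered is true, they must be followed by either hot water or cold porridge.

This search matches my favorite food is cold porridge, but not when it's cold my favorite food is porridge.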

POST _search
{
  "query": {
    "intervals" : {
      "my_text" : {
        "all_of" : {
          "ordered" : true,
          "intervals" : [
            {
              "match" : {
                "query" : "my favorite food",
                "max_gaps" : 0,
                "ordered" : true
              }
            },
            {
              "any_of" : {
                "intervals" : [
                  { "match" : { "query" : "hot water" } },
                  { "match" : { "query" : "cold porridge" } }
                ]
              }
            }
          ]
        }
      }
    }
  }
}
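The Python sketch below checks the two example sentences against that query. It is a simplified model of interval matching, not Lucene's implementation. It assumes a whitespace analyzer and enumerates every interval (start, end) that covers each term of a match rule. max_gaps=-1 (unlimited) and ordered=False are the defaults for the inner rules. An ordered all_of is modeled as "some first interval ends before some second interval starts". All names are hypothetical.

```python
from itertools import product

def match_intervals(tokens, query, max_gaps=-1, ordered=False):
    """Intervals (start, end) covering every term of `query` (max_gaps=-1 means unlimited)."""
    words = query.split()
    positions = [[i for i, t in enumerate(tokens) if t == w] for w in words]
    spans = set()
    for combo in product(*positions):
        if len(set(combo)) < len(combo):
            continue  # each term needs its own position
        if ordered and list(combo) != sorted(combo):
            continue  # terms must appear in query order
        start, end = min(combo), max(combo)
        gaps = (end - start + 1) - len(combo)
        if 0 <= max_gaps < gaps:
            continue  # too many gaps between the terms
        spans.add((start, end))
    return spans

def all_of_ordered(first, second):
    """all_of with ordered=true: some `first` interval ends before some `second` interval starts."""
    return any(f_end < s_start for _, f_end in first for s_start, _ in second)

def my_text_matches(text):
    tokens = text.lower().split()  # toy analyzer (assumption)
    first = match_intervals(tokens, "my favorite food", max_gaps=0, ordered=True)
    # any_of: the union of the intervals of both rules
    second = match_intervals(tokens, "hot water") | match_intervals(tokens, "cold porridge")
    return all_of_ordered(first, second)

print(my_text_matches("my favorite food is cold porridge"))            # True
print(my_text_matches("when it's cold my favorite food is porridge"))  # False
```

In the second sentence, cold ... porridge still forms an interval (gaps are unlimited by default), but that interval starts before my favorite food, so the ordered all_of fails.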

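3.6 match query

match_bool_prefix analyzes its input and builds an equivalent bool query: every term except the last becomes a term query, and the last term becomes a prefix query.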

GET /_search
{
  "query": {
    "match_bool_prefix" : {
      "message" : "quick brown f"
    }
  }
}

# Equivalent to the following

GET /_search
{
  "query": {
    "bool" : {
      "should": [
        { "term": { "message": "quick" }},
        { "term": { "message": "brown" }},
        { "prefix": { "message": "f"}}
      ]
    }
  }
}
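The rewrite shown above can be sketched in Python as a function that builds the equivalent query body as a dict. It assumes a toy analyzer (lowercase plus whitespace split) in place of the field's real analyzer. `match_bool_prefix` here is a hypothetical helper, not a client-library call.

```python
def match_bool_prefix(field, text):
    """Build the equivalent bool query: term clauses for all tokens but the last, prefix for the last."""
    tokens = text.lower().split()  # toy analyzer (assumption)
    clauses = [{"term": {field: t}} for t in tokens[:-1]]
    if tokens:
        clauses.append({"prefix": {field: tokens[-1]}})
    return {"bool": {"should": clauses}}

print(match_bool_prefix("message", "quick brown f"))
```

The printed dict has the same shape as the bool query above.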

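match_phrase (phrase query)

Searches for the terms of the match_phrase text in the same order.

In the following example, returned documents must contain this is a test in exactly that order.

The query text is analyzed here as well. Internally, the terms are matched by position: relative to the first term, the following terms must sit at positions +1, +2, and so on. A term query, by contrast, does not analyze its input at all.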

GET /_search
{
  "query": {
    "match_phrase": {
      "message": "this is a test"
    }
  }
}
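The position check just described, as a toy Python sketch. It assumes a lowercase-and-split analyzer and slop 0. `phrase_match` is a hypothetical name. It builds a per-document positional index and requires term i of the phrase to sit at position start + i.

```python
def phrase_match(text, phrase):
    """True if the phrase terms occur at consecutive positions (slop 0)."""
    tokens = text.lower().split()  # toy analyzer (assumption)
    terms = phrase.lower().split()
    if not terms:
        return False
    positions = {}
    for pos, tok in enumerate(tokens):
        positions.setdefault(tok, set()).add(pos)
    # Each position of the first term is a candidate start; term i must sit at start + i.
    for start in positions.get(terms[0], ()):
        if all(start + i in positions.get(t, ()) for i, t in enumerate(terms)):
            return True
    return False

print(phrase_match("This is a test message", "this is a test"))  # True
print(phrase_match("is this a test", "this is a test"))          # False
```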

match_phrase_prefix

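Similar to match_phrase, except that the last term of the phrase is treated as a prefix and expanded, wildcard-style, against the terms in the inverted index. Important parameter: max_expansions, the maximum number of terms the last term can expand to; default 50, minimum 1.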
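The expansion step can be sketched in Python. The sketch assumes the term dictionary is a sorted list and that expansion walks it in sorted order from the prefix, stopping after max_expansions terms. The expanded terms would then serve as alternatives for the last position of the phrase. `expand_prefix` is a hypothetical name.

```python
import bisect

def expand_prefix(prefix, sorted_terms, max_expansions=50):
    """Return up to max_expansions terms from a sorted term dictionary that start with prefix."""
    i = bisect.bisect_left(sorted_terms, prefix)  # first term >= prefix
    out = []
    while i < len(sorted_terms) and sorted_terms[i].startswith(prefix) and len(out) < max_expansions:
        out.append(sorted_terms[i])
        i += 1
    return out

terms = sorted(["fox", "fast", "quick", "fun", "brown"])
print(expand_prefix("f", terms, max_expansions=2))  # ['fast', 'fox']
```

A small max_expansions keeps the query cheap, but a document containing only fun would be missed here because fun is not among the first two expansions.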

3.7 combined_fields (multiple fields)

The combined_fields query supports searching multiple text fields as if their contents had been indexed into one combined field. The query takes a term-centric view of the input string: first it analyzes the query string into individual terms, then looks for each term in any of the fields. This query is particularly useful when a match could span multiple text fields, for example the title, abstract, and body of an article.

Example:

Search the title, abstract, and body fields for database and systems:

operator can also be or, which is the default.

GET /_search
{
  "query": {
    "combined_fields" : {
      "query":      "database systems",
      "fields":     [ "title", "abstract", "body"],
      "operator":   "and"
    }
  }
}
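The term-centric matching rule can be shown with a toy Python check. It assumes one shared whitespace analyzer for all fields and models only whether a document matches, not the scoring. `combined_fields_match` is a hypothetical helper. With operator and, every query term must appear in at least one of the fields, not necessarily the same field.

```python
def combined_fields_match(doc, fields, query, operator="or"):
    """Term-centric: a query term counts as present if it appears in ANY of the fields."""
    terms = query.lower().split()  # toy analyzer (assumption), shared by all fields
    combined = set()
    for f in fields:
        combined.update(doc.get(f, "").lower().split())
    hits = [t in combined for t in terms]
    return all(hits) if operator == "and" else any(hits)

doc = {"title": "Database internals", "abstract": "", "body": "Distributed systems"}
print(combined_fields_match(doc, ["title", "abstract", "body"], "database systems", "and"))  # True
```

Here database appears only in title and systems only in body, yet the and query matches, because the fields are treated as one combined field.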

3.8 multi_match

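The multi_match query allows the fields in the mapping to use different analyzers, whereas the combined_fields query requires all fields to share the same analyzer.

Example:

type defaults to best_fields. It can also be most_fields (which combines the scores of all matching fields, similar to a bool should), phrase, phrase_prefix, cross_fields, or bool_prefix.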

GET /_search
{
  "query": {
    "multi_match" : {
      "query":      "brown fox",
      "type":       "best_fields",
      "fields":     [ "subject", "message" ],
      "tie_breaker": 0.3
    }
  }
}

# Equivalent to
GET /_search
{
  "query": {
    "dis_max": {
      "queries": [
        { "match": { "subject": "brown fox" }},
        { "match": { "message": "brown fox" }}
      ],
      "tie_breaker": 0.3
    }
  }
}

3.9 query_string

3.9.1 Querying a single field

GET /_search
{
  "query": {
    "query_string": {
      "query": "(new york city) OR (big apple)",
      "default_field": "content"
    }
  }
}

3.9.2 Querying multiple fields

GET /_search
{
  "query": {
    "query_string": {
      "fields": [ "content", "name" ],
      "query": "this AND that"
    }
  }
}

3.9.3 simple_query_string

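The simple_query_string query has a more limited syntax than query_string, but it does not return errors for invalid syntax. Instead, it ignores any invalid parts of the query string.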

3.10 Joining queries

These include nested, has_child, and has_parent.

nested (nested objects)

First define the field type as nested in the mapping, then use a nested query when searching:

PUT /my-index-000001
{
  "mappings": {
    "properties": {
      "obj1": {
        "type": "nested"
      }
    }
  }
}
GET /my-index-000001/_search
{
  "query": {
    "nested": {
      "path": "obj1",
      "query": {
        "bool": {
          "must": [
            { "match": { "obj1.name": "blue" } },
            { "range": { "obj1.count": { "gt": 5 } } }
          ]
        }
      },
      "score_mode": "avg"
    }
  }
}
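Why declare the nested type at all? The Python sketch below models the difference. A plain object array is flattened into one array of values per inner field, so the two conditions can be satisfied by different objects. A nested field keeps each object separate, so both conditions must hold inside the same object. The sample data and the equality check standing in for match are assumptions for illustration. The helper names are hypothetical.

```python
def flatten(objects):
    """How a plain object array is indexed: one array of values per inner field."""
    flat = {}
    for obj in objects:
        for key, value in obj.items():
            flat.setdefault(key, []).append(value)
    return flat

def object_match(objects):
    # Conditions are checked independently against the flattened arrays.
    flat = flatten(objects)
    return "blue" in flat.get("name", []) and any(c > 5 for c in flat.get("count", []))

def nested_match(objects):
    # With type "nested", both conditions must hold inside the SAME object.
    return any(o.get("name") == "blue" and o.get("count", 0) > 5 for o in objects)

obj1 = [{"name": "blue", "count": 4}, {"name": "green", "count": 6}]
print(object_match(obj1), nested_match(obj1))  # True False
```

Without nested, this document would wrongly match "blue with count > 5", because blue and 6 come from different objects.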

3.11 percolate query

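A regular ES query looks for documents that match some conditions. The percolator works the other way around: you first register queries, then ask which of them a given document matches. The percolator feature is well suited to data classification, data routing, event monitoring, and alerting.

First define a percolator field in the mapping: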

PUT /my-index-00001
{
  "mappings": {
    "properties": {
      "message": {
        "type": "text"
      },
      "query": {
        "type": "percolator"
      }
    }
  }
}

Then, at search time, use the percolate query and pass in the document to check.
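The inverted direction of percolation can be shown with a toy Python model. The registered queries are stored as (field, text) pairs of a match query with the default or operator, and a whitespace analyzer is assumed. `stored_queries`, `percolate`, and the sample data are hypothetical. The sketch returns the IDs of the stored queries that the document matches.

```python
def analyze(text):
    return text.lower().split()  # toy analyzer (assumption)

# Registered queries: id -> (field, text) of a match query (operator "or").
stored_queries = {
    "q1": ("message", "bonsai tree"),
    "q2": ("message", "cactus"),
}

def percolate(document):
    """Return the IDs of stored queries that the given document matches."""
    hits = []
    for qid, (field, text) in stored_queries.items():
        doc_terms = set(analyze(document.get(field, "")))
        if any(t in doc_terms for t in analyze(text)):  # match with operator "or"
            hits.append(qid)
    return hits

print(percolate({"message": "A new bonsai tree in the office"}))  # ['q1']
```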

3.12 rank_feature

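It is common to score documents dynamically based on context. For example, to rank documents within a certain category higher, the classic approach is to boost documents based on some value, such as page rank, clicks, or category. Elasticsearch provides two newer ways to boost scores based on a value: the rank_feature field, and its extension that uses a vector of values (rank_features). The rank_feature query boosts a document's relevance score based on the numeric value of a rank_feature or rank_features field. It is typically used in the should clause of a bool query, so that its relevance score is added to the other scores of the bool query.
(This section draws on the article "Elasticsearch: Rank feature query" from the Elastic China Community official blog on CSDN.)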
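The ES documentation lists several functions the rank_feature query can apply to a feature value S. Two of them are sketched below in Python. saturation computes S / (S + pivot). If pivot is omitted, ES uses an approximate geometric mean of the field's values. sigmoid computes S^exp / (S^exp + pivot^exp). The function names mirror the documented options; the code is only an illustration of the formulas, not ES's implementation.

```python
def saturation(s, pivot):
    """Grows with s toward 1; equals 0.5 exactly when s == pivot."""
    return s / (s + pivot)

def sigmoid(s, pivot, exponent):
    """Like saturation, with an extra exponent controlling the curve's steepness."""
    return s ** exponent / (s ** exponent + pivot ** exponent)

print(saturation(8, pivot=8))             # 0.5
print(saturation(24, pivot=8))            # 0.75
print(sigmoid(2, pivot=2, exponent=1.0))  # 0.5
```

Both functions stay below 1, so a huge page rank or click count cannot completely overwhelm the text-relevance part of the bool query's score.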

3.13 pinned query

Promotes selected documents to rank higher than those matching a given query. This feature is typically used to guide searchers to curated documents that are promoted over and above any "organic" matches for a search. The promoted or "pinned" documents are identified using the document IDs stored in the _id field.

For example, in the following request, the documents with IDs 1, 4, and 100 are returned at the top, ahead of the organic results of the match query.

GET /_search
{
  "query": {
    "pinned": {
      "ids": [ "1", "4", "100" ],
      "organic": {
        "match": {
          "description": "iphone"
        }
      }
    }
  }
}
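The resulting order can be sketched in Python. The sketch assumes that every pinned ID exists (ES simply skips pinned IDs that are not found) and that organic_hits is already sorted by score. The pinned documents come first in the listed order, followed by the organic hits without duplicates. `pinned` is a hypothetical helper.

```python
def pinned(ids, organic_hits):
    """Pinned IDs first, in the given order; then the organic hits that were not pinned."""
    pinned_set = set(ids)
    return list(ids) + [h for h in organic_hits if h not in pinned_set]

print(pinned(["1", "4", "100"], ["7", "4", "2"]))  # ['1', '4', '100', '7', '2']
```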

3.14 fuzzy query

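Returns documents containing terms similar to the fuzzy query's value, as measured by edit distance. This is a kind of automatic correction of typos in the input.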

GET /_search
{
  "query": {
    "fuzzy": {
      "user.id": {
        "value": "ki"
      }
    }
  }
}
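The similarity measure can be sketched in Python. ES measures similarity with Levenshtein edit distance, where an adjacent transposition counts as one edit by default (fuzzy_transpositions is true). The default fuzziness AUTO allows 0 edits for terms of 1 to 2 characters, 1 edit for 3 to 5, and 2 edits for longer terms. Under AUTO, the "ki" example above therefore only matches the exact term ki. The sketch ignores prefix_length and max_expansions. The function names are hypothetical.

```python
def osa_distance(a, b):
    """Edit distance counting insertions, deletions, substitutions, and adjacent transpositions."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # swap of two adjacent characters
    return d[len(a)][len(b)]

def auto_fuzziness(term):
    """fuzziness AUTO: 0 edits for 1-2 chars, 1 for 3-5 chars, 2 for longer terms."""
    n = len(term)
    return 0 if n < 3 else 1 if n < 6 else 2

def fuzzy_match(value, indexed_term):
    return osa_distance(value, indexed_term) <= auto_fuzziness(value)

print(fuzzy_match("kimhcy", "kimchy"))  # True (one transposition)
print(fuzzy_match("ki", "kimchy"))      # False (AUTO allows no edits for 2 characters)
```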

3.15 exists

The exists query returns documents that contain an indexed value for a field. An empty string '' counts as a value, but null does not.
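A Python sketch of which source values count. It follows the ES documentation: "" and arrays containing at least one non-null value count as existing, while null, [], and [null] do not. It deliberately ignores mapping settings such as index: false or ignore_above, which can also leave a value unindexed. `exists` is a hypothetical helper.

```python
def exists(doc, field):
    """True if the field holds at least one non-null value ("" counts; None, [] and [None] do not)."""
    value = doc.get(field)
    if isinstance(value, list):
        return any(v is not None for v in value)
    return value is not None

print(exists({"user": ""}, "user"))    # True
print(exists({"user": None}, "user"))  # False
```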

3.16 wildcard query

Returns documents that contain terms matching a wildcard pattern. A wildcard operator is a placeholder that matches one or more characters. For example, the * wildcard operator matches zero or more characters, and ? matches any single character. You can combine wildcard operators with other characters to create a wildcard pattern.
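The pattern semantics can be sketched in Python by translating a wildcard pattern into a regular expression. * becomes .*, ? becomes ., and every other character is matched literally. The whole term must match. ES's backslash escaping is not modeled, and the helper names are hypothetical.

```python
import re

def wildcard_to_regex(pattern):
    """Translate wildcard syntax: * = zero or more chars, ? = exactly one char; all else literal."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)

def wildcard_match(pattern, term):
    return wildcard_to_regex(pattern).fullmatch(term) is not None

print(wildcard_match("ki*y", "kimchy"))  # True
print(wildcard_match("ki?y", "kimchy"))  # False
```

A leading * forces a scan over the whole term dictionary, which is why patterns that begin with a wildcard are expensive in practice.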

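Conclusion

I started working with ES because my job required it, and spent two days reading the English documentation; I wrote these notes as a record. I will keep learning, and anyone reading this or also studying ES is welcome to discuss it with me in the comments.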
