ES7 Basics 08: Search (match, filter, bool, highlight, sort)

Author: Alan0517

Last modified: 2023-05-31 10:16:17

Category: ES Basics. Tags: elasticsearch

First published: 2021-04-01 23:22:39

Article link: https://blog.csdn.net/Hmj050117/article/details/115387083

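1. Basic concepts

1.1 Score

Elasticsearch sorts search results by relevance score, which it computes while executing the search. The score expresses how well a document matches the search input. It is a floating-point number returned in the _score field of each hit; the higher the score, the better the match and the earlier the document appears in the results.

Search clauses in ES run in one of two modes: one computes a score, the other does not.

1.2 Query context

In query context, a clause answers the question "how well does this document match?". Besides deciding whether the document matches at all, it also computes the document's relevance score. This is how everyday searching works: we search for a keyword, the keyword is analyzed into terms, and the documents that match those terms form the result set. Which of them rank first and which rank last depends on how well each one matches, that is, on the computed score.

1.3 Filter context

In filter context, a clause answers a plain yes-or-no question: does this document match the condition? No score is computed. Frequently used filters are also cached by ES, which improves performance.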

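2. `Basic queries`

Basic syntax

GET /<index_name>/_search
{
    "query":{
        "<query_type>":{
            "<query_condition>":"<condition_value>"
        }
    }
}

Here query is the query object, which can hold different kinds of query clauses.

Query type:
- for example match_all, match, term, range, and so on
Query condition: its syntax depends on the query type and is explained in detail below.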
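As a rough illustration of this template, the request body can be assembled programmatically. The Python sketch below does not use any Elasticsearch client; build_search_body is a hypothetical helper name, and the code only builds the JSON body that would be sent to the _search endpoint.

```python
import json


def build_search_body(query_type, condition):
    """Wrap a single query clause in the top-level "query" object."""
    return {"query": {query_type: condition}}


# match_all takes an empty condition object
print(json.dumps(build_search_body("match_all", {})))

# a match query on the userAdress field used later in this article
body = build_search_body("match", {"userAdress": "九江市上高县"})
print(json.dumps(body, ensure_ascii=False, indent=2))
```

The printed JSON can be pasted under a GET /<index_name>/_search line in Kibana Console.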

2.1 Match all (`match_all`)

Example:

GET wql/_search
{
  "query": {
    "match_all": {}
  }
}

query: the query object
match_all: matches every document

Result:

{
  "took" : 5,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 4,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "wql",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "userName" : "zhansan",
          "userPhone" : "15727538286",
          "userAdress" : "江西省宜春市上高县泗溪镇"
        }
      },
      {
        "_index" : "wql",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 1.0,
        "_source" : {
          "userName" : "lisi",
          "userPhone" : "17067888006",
          "userAdress" : "江西省高安市上高县泗溪镇"
        }
      },
      {
        "_index" : "wql",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 1.0,
        "_source" : {
          "userName" : "wangwu",
          "userPhone" : "15797721570",
          "userAdress" : "江西省南昌市上高县泗溪镇"
        }
      },
      {
        "_index" : "wql",
        "_type" : "_doc",
        "_id" : "4",
        "_score" : 1.0,
        "_source" : {
          "userName" : "zhaoliu",
          "userPhone" : "15727538286",
          "userAdress" : "江西省九江市上高县泗溪镇"
        }
      }
    ]
  }
}

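took: time the query took, in milliseconds
timed_out: whether the query timed out
_shards: shard information
hits: the search result summary object
- total: the number of matching documents (value) and whether that count is exact (relation)
- max_score: the highest score among all matching documents
- hits: array of result documents; each element describes one matching document
  - _index: the index
  - _type: the document type
  - _id: the document id
  - _score: the document's score
  - _source: the document's source data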
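To make the response structure concrete, here is a small Python sketch that walks these fields. summarize_hits is a hypothetical helper, and the response dict is a trimmed-down copy of the match_all result shown earlier (only two of the hits are kept), not output from a live cluster.

```python
def summarize_hits(response):
    """Return (total count, whether the count is exact, list of (_id, _score))."""
    hits = response["hits"]
    total = hits["total"]
    if isinstance(total, dict):
        # ES 7 format: {"value": 4, "relation": "eq"}
        count, exact = total["value"], total["relation"] == "eq"
    else:
        # older format: a plain number
        count, exact = total, True
    return count, exact, [(h["_id"], h["_score"]) for h in hits["hits"]]


# trimmed-down copy of the match_all response
response = {
    "took": 5,
    "timed_out": False,
    "hits": {
        "total": {"value": 4, "relation": "eq"},
        "max_score": 1.0,
        "hits": [
            {"_index": "wql", "_id": "1", "_score": 1.0, "_source": {"userName": "zhansan"}},
            {"_index": "wql", "_id": "2", "_score": 1.0, "_source": {"userName": "lisi"}},
        ],
    },
}
print(summarize_hits(response))
```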

2.2 Match query (`match`)

A match query analyzes the query text into terms and then searches for them. By default, the terms are combined with OR.

GET wql/_search
{
  "query": {
    "match": {
      "userAdress": "九江市上高县"
    }
  }
}

Result:

{
  "took" : 7,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 5,
      "relation" : "eq"
    },
    "max_score" : 4.387768,
    "hits" : [
      {
        "_index" : "wql",
        "_type" : "_doc",
        "_id" : "4",
        "_score" : 4.387768,
        "_source" : {
          "userName" : "zhaoliu",
          "userPhone" : "15727538286",
          "userAdress" : "江西省九江市上高县泗溪镇"
        }
      },
      {
        "_index" : "wql",
        "_type" : "_doc",
        "_id" : "5",
        "_score" : 3.6486926,
        "_source" : {
          "userName" : "zhaoliu",
          "userPhone" : "15727538286",
          "userAdress" : "江西省九江市黄塘村"
        }
      },
      {
        "_index" : "wql",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.1584977,
        "_source" : {
          "userName" : "zhansan",
          "userPhone" : "15727538286",
          "userAdress" : "江西省宜春市上高县泗溪镇"
        }
      },
      {
        "_index" : "wql",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 1.1584977,
        "_source" : {
          "userName" : "lisi",
          "userPhone" : "17067888006",
          "userAdress" : "江西省高安市上高县泗溪镇"
        }
      },
      {
        "_index" : "wql",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 1.1584977,
        "_source" : {
          "userName" : "wangwu",
          "userPhone" : "15797721570",
          "userAdress" : "江西省南昌市上高县泗溪镇"
        }
      }
    ]
  }
}

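The result shows that the terms are combined with OR.

AND relationship

In some cases we want a more precise search and need the terms combined with AND, which can be done like this: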

GET wql/_search
{
  "query": {
    "match": {
      "userAdress": {
        "query": "九江市上高县",
        "operator": "and"
      }
    }
  }
}

Result:

{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 4.387768,
    "hits" : [
      {
        "_index" : "wql",
        "_type" : "_doc",
        "_id" : "4",
        "_score" : 4.387768,
        "_source" : {
          "userName" : "zhaoliu",
          "userPhone" : "15727538286",
          "userAdress" : "江西省九江市上高县泗溪镇"
        }
      }
    ]
  }
}
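The difference between the two operators can be sketched in a few lines of Python. This is only a model of the matching rule, not of scoring, and the token sets below are assumptions: the real tokens depend on the analyzer configured for the userAdress field.

```python
def term_match(doc_terms, query_terms, operator="or"):
    """Return True if the document satisfies the query terms under the operator."""
    present = [term in doc_terms for term in query_terms]
    return all(present) if operator == "and" else any(present)


# Hypothetical tokens: the real ones depend on the analyzer configured for the field.
query_terms = ["九江市", "上高县"]
docs = {
    "4": {"江西省", "九江市", "上高县", "泗溪镇"},
    "5": {"江西省", "九江市", "黄塘村"},
    "1": {"江西省", "宜春市", "上高县", "泗溪镇"},
}

for op in ("or", "and"):
    matched = [doc_id for doc_id, terms in docs.items() if term_match(terms, query_terms, op)]
    print(op, matched)
```

Under these assumed tokens, or keeps all three documents while and keeps only document 4, which is consistent with the two results above.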

2.3 Multi-field query (`multi_match`)

multi_match is similar to match, except that it can search across multiple fields.

Here I add one extra document for testing:

POST /wql/_doc/6
{
  "userName": "九江市",
  "userPhone": "15727538286",
  "userAdress": "黄塘村"
}

GET wql/_search
{
  "query": {
    "multi_match": {
      "query": "九江市",
      "fields": [
        "userAdress",
        "userName"
      ]
    }
  }
}

In this example, the search runs against both userAdress and userName.

Result:

{
  "took" : 266,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : 5.587492,
    "hits" : [
      {
        "_index" : "wql",
        "_type" : "_doc",
        "_id" : "5",
        "_score" : 5.587492,
        "_source" : {
          "userName" : "zhaoliu",
          "userPhone" : "15727538286",
          "userAdress" : "江西省九江市黄塘村"
        }
      },
      {
        "_index" : "wql",
        "_type" : "_doc",
        "_id" : "6",
        "_score" : 4.236225,
        "_source" : {
          "userName" : "九江市",
          "userPhone" : "15727538286",
          "userAdress" : "黄塘村"
        }
      },
      {
        "_index" : "wql",
        "_type" : "_doc",
        "_id" : "4",
        "_score" : 3.651648,
        "_source" : {
          "userName" : "zhaoliu",
          "userPhone" : "15727538286",
          "userAdress" : "江西省九江市上高县泗溪镇"
        }
      }
    ]
  }
}

2.4 Term query (`term`)

A term query is used for exact-value matching. The exact values can be numbers, dates, booleans, or strings that are not analyzed.

GET wql/_search
{
  "query": {
    "term": {
      "userPhone": {
        "value": "17067888006"
      }
    }
  }
}

Result:

{
  "took" : 27,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.540445,
    "hits" : [
      {
        "_index" : "wql",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 1.540445,
        "_source" : {
          "userName" : "lisi",
          "userPhone" : "17067888006",
          "userAdress" : "江西省高安市上高县泗溪镇"
        }
      }
    ]
  }
}

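2.5 Multi-term exact match (`terms`)

A terms query works like a term query, but lets you specify several values to match. A document matches if the field contains any one of the specified values: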

GET wql/_search
{
  "query": {
    "terms": {
      "userPhone": [
        "17067888006",
        "17067888007"
      ]
    }
  }
}

Result:

{
  "took" : 147,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "wql",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 1.0,
        "_source" : {
          "userName" : "lisi",
          "userPhone" : "17067888006",
          "userAdress" : "江西省高安市上高县泗溪镇"
        }
      },
      {
        "_index" : "wql",
        "_type" : "_doc",
        "_id" : "7",
        "_score" : 1.0,
        "_source" : {
          "userName" : "九江市",
          "userPhone" : "17067888007",
          "userAdress" : "黄塘村"
        }
      }
    ]
  }
}

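3. `Filtering`

3.1 `_source filtering`

By default, Elasticsearch returns every field stored in _source for each hit in the search results.

If we only want some of the fields, we can add a _source filter.

3.1.1 Specifying fields directly

Example: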

GET wql/_search
{
  "_source": ["userPhone"], 
  "query": {
    "terms": {
      "userPhone": [
        "17067888006",
        "17067888007"
      ]
    }
  }
}

Returned result:

{
  "took" : 9,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "wql",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 1.0,
        "_source" : {
          "userPhone" : "17067888006"
        }
      },
      {
        "_index" : "wql",
        "_type" : "_doc",
        "_id" : "7",
        "_score" : 1.0,
        "_source" : {
          "userPhone" : "17067888007"
        }
      }
    ]
  }
}

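3.1.2 Specifying `includes` and `excludes`

We can also use:

includes: to specify the fields we want returned
excludes: to specify the fields we do not want returned

Both are optional.

Note: when both are present, excludes takes precedence over includes.

Example: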

GET wql/_search
{
  "_source": {
    "includes": [
      "userPhone",
      "userName"
    ],
    "excludes": [
      "userAdress",
      "userName"
    ]
  },
  "query": {
    "terms": {
      "userPhone": [
        "17067888006",
        "17067888007"
      ]
    }
  }
}

The result is as follows:

{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "wql",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 1.0,
        "_source" : {
          "userPhone" : "17067888006"
        }
      },
      {
        "_index" : "wql",
        "_type" : "_doc",
        "_id" : "7",
        "_score" : 1.0,
        "_source" : {
          "userPhone" : "17067888007"
        }
      }
    ]
  }
}
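The precedence rule can be modelled with a short Python sketch. filter_source is a hypothetical helper that only handles exact field names on a flat document; it is an assumption-laden simplification, since the real _source filter also accepts wildcard patterns.

```python
def filter_source(source, includes=None, excludes=None):
    """Mimic _source filtering on a flat document, using exact field names only."""
    kept = {}
    for field, value in source.items():
        if includes and field not in includes:
            continue  # not listed in includes
        if excludes and field in excludes:
            continue  # excludes wins even when the field is also included
        kept[field] = value
    return kept


doc = {
    "userName": "lisi",
    "userPhone": "17067888006",
    "userAdress": "江西省高安市上高县泗溪镇",
}
print(filter_source(doc,
                    includes=["userPhone", "userName"],
                    excludes=["userAdress", "userName"]))
```

userName appears in both lists, so it is dropped and only userPhone survives, as in the result above.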

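3.2 `filter`

Filtering within a conditional query

Every clause that runs in query context affects document scoring and ranking. If we need to filter the query results and do not want the filter condition to affect the score, the filter condition should not be written as a query condition. Use filter instead: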

GET wql/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "userAdress": "江西"
          }
        }
      ],
      "filter": [
        {
          "range": {
            "userPhone": {
              "gte": 17067888006
            }
          }
        }
      ]
    }
  }
}

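Filtering only, with no query condition

If a search consists only of a filter, with no query condition, and no scoring is wanted, we can use constant_score in place of a bool query that contains only a filter clause. Performance is exactly the same, but the query becomes noticeably simpler and clearer.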

GET /wql/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "range": {
          "userPhone": {
            "gte": 17067888006
          }
        }
      }
    }
  }
}

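4. `Advanced queries`

4.1 Boolean combination (`bool`)

Keyword	Description
must	Conditions that must match; contributes to the score
filter	Conditions that must match; no score is computed
should	Conditions that may match; contributes to the score
must_not	Conditions that must not match; no score is computed

bool combines other queries through must (AND), must_not (NOT), and should (OR).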

GET /wang/_search
{
  "query": {
    "bool": {
      "must":     { "match": { "title": "大米" }},
      "must_not": { "match": { "title": "电视" }},
      "should":   { "match": { "title": "手机" }}
    }
  }
}

Result (this sample output is in the pre-7.0 response format, where hits.total is a plain number):

{
  "took": 10,
  "timed_out": false,
  "_shards": {
    "total": 3,
    "successful": 3,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.5753642,
    "hits": [
      {
        "_index": "wang",
        "_type": "goods",
        "_id": "2",
        "_score": 0.5753642,
        "_source": {
          "title": "大米手机",
          "images": "http://image.leyou.com/12479122.jpg",
          "price": 2899
        }
      }
    ]
  }
}
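The way the three clauses interact can be sketched in Python. This is a deliberately simplified model: substring checks stand in for analyzed term matching, the count of should hits stands in for the score contribution, and the titles are hypothetical documents rather than the contents of the wang index.

```python
def bool_match(title, must=(), must_not=(), should=()):
    """Return (matches, number of should terms hit) for one title.

    Substring checks stand in for analyzed term matching.
    """
    if not all(term in title for term in must):
        return False, 0
    if any(term in title for term in must_not):
        return False, 0
    # should is optional when must is present; it only raises the ranking
    return True, sum(term in title for term in should)


titles = ["大米手机", "大米电视", "小米手机", "大米"]  # hypothetical documents
for title in titles:
    print(title, bool_match(title, must=["大米"], must_not=["电视"], should=["手机"]))
```

In this model, "大米" still matches without the should term, but "大米手机" would rank above it because it also satisfies should.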

4.2 Range query (`range`)

A range query finds documents whose numbers or dates fall within a specified interval.

GET /wang/_search
{
  "query": {
    "range": {
      "price": {
        "gte": 1000.0,
        "lt": 2800.00
      }
    }
  }
}

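The range query accepts the following operators:

Operator	Description
gt	greater than
gte	greater than or equal to
lt	less than
lte	less than or equal to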
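The four operators map directly onto ordinary comparisons. The Python sketch below, with the hypothetical helper in_range, checks a numeric value against the same bounds as the price query above; it models numeric fields only and ignores dates.

```python
import operator

# each range operator and the comparison it stands for
RANGE_OPS = {
    "gt": operator.gt,
    "gte": operator.ge,
    "lt": operator.lt,
    "lte": operator.le,
}


def in_range(value, bounds):
    """Return True if value satisfies every bound, e.g. {"gte": 1000, "lt": 2800}."""
    return all(RANGE_OPS[op](value, limit) for op, limit in bounds.items())


bounds = {"gte": 1000.0, "lt": 2800.00}
for price in (999, 1000, 2699, 2800, 2899):
    print(price, in_range(price, bounds))
```

Note that 1000 is inside the interval because of gte, while 2800 is outside because lt is strict.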

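4.3 Fuzzy query (`fuzzy`)

A fuzzy query is the fuzzy counterpart of the term query. It tolerates spelling differences between the search term and the indexed term, as long as the edit distance between them does not exceed 2: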

GET /wang/_search
{
  "query": {
    "fuzzy": {
      "title": "appla"
    }
  }
}

The query above still matches apple手机 (the apple phone).

We can use fuzziness to set the allowed edit distance:

GET /wang/_search
{
  "query": {
    "fuzzy": {
      "title": {
        "value": "appla",
        "fuzziness": 1
      }
    }
  }
}
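To see why "appla" is within reach of "apple", the edit distance can be computed by hand. The Python sketch below implements plain Levenshtein distance (insertions, deletions, substitutions). It is an illustration rather than what Elasticsearch runs internally: by default the fuzzy query also counts a swap of two adjacent characters as a single edit, which this sketch does not.

```python
def edit_distance(a, b):
    """Plain Levenshtein distance between two strings."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            cur.append(min(prev[j] + 1,          # delete ca
                           cur[j - 1] + 1,       # insert cb
                           prev[j - 1] + cost))  # substitute or keep
        prev = cur
    return prev[-1]


print(edit_distance("appla", "apple"))
```

The two words differ by a single substitution, so the distance is 1 and the query still matches with fuzziness set to 1.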

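4.4 `Boosting Query`

This query is an interesting one. It has two key clauses, positive and negative:

positive: every document that matches the positive condition is returned;
negative: documents that match the negative condition are not filtered out; their score is reduced instead.
negative_boost is the score multiplier, a value between 0 and 1. The score of a document that matches the negative condition is multiplied by it. For example, with a multiplier of 0.5, a document that originally scored 100 and also matches the negative condition ends up with 100 × 0.5 = 50.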
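A minimal Python sketch of this query follows. boosting_body builds a request body in the shape the boosting query expects (positive, negative, negative_boost), and boosted_score reproduces the arithmetic described above. Both helper names, and the choice of title clauses, are illustrative assumptions.

```python
import json


def boosting_body(positive, negative, negative_boost):
    """Build a search body for a boosting query."""
    if not 0 <= negative_boost <= 1:
        raise ValueError("negative_boost must be between 0 and 1")
    return {"query": {"boosting": {
        "positive": positive,
        "negative": negative,
        "negative_boost": negative_boost,
    }}}


def boosted_score(score, matches_negative, negative_boost):
    """Documents matching the negative clause keep score * negative_boost."""
    return score * negative_boost if matches_negative else score


# hypothetical clauses: find phones, but demote titles that mention 大米
body = boosting_body({"match": {"title": "手机"}},
                     {"match": {"title": "大米"}},
                     0.5)
print(json.dumps(body, ensure_ascii=False, indent=2))
print(boosted_score(100, True, 0.5), boosted_score(100, False, 0.5))
```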

5. `Sorting`

5.1 `Sorting on a single field`

sort lets us sort on different fields, with order specifying the sort direction.

GET /wang/_search
{
  "query": {
    "match": {
      "title": "小米手机"
    }
  },
  "sort": [
    {
      "price": {
        "order": "desc"
      }
    }
  ]
}

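5.2 `Sorting on multiple fields`

Suppose we want to query using both price and _score (the relevance score), with the matching results sorted first by price and then by relevance score: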

GET /goods/_search
{
  "query": {
    "bool": {
      "must": { "match": { "title": "小米手机" }},
      "filter": {
        "range": { "price": { "gt": 200000, "lt": 300000 }}
      }
    }
  },
  "sort": [
    { "price": { "order": "desc" }},
    { "_score": { "order": "desc" }}
  ]
}
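The effect of listing two sort keys can be sketched in Python: the second key only decides the order among hits that tie on the first. The hits below are made-up values for illustration, not results from the goods index.

```python
def sort_hits(hits):
    """Sort by price descending, then by _score descending."""
    return sorted(hits, key=lambda h: (-h["price"], -h["_score"]))


# made-up hits: "a" and "c" tie on price, so _score decides between them
hits = [
    {"_id": "a", "price": 2699, "_score": 0.20},
    {"_id": "b", "price": 2899, "_score": 0.10},
    {"_id": "c", "price": 2699, "_score": 0.35},
]
print([h["_id"] for h in sort_hits(hits)])
```

Here "b" comes first on price alone, and "c" precedes "a" only because of its higher _score.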

6. `Highlighting`

The syntax for highlighting in Elasticsearch is fairly simple:

GET /wang/_search
{
  "query": {
    "match": {
      "title": "手机"
    }
  },
  "highlight": {
    "pre_tags": "<em>",
    "post_tags": "</em>", 
    "fields": {
      "title": {}
    }
  }
}

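Alongside the match query, add a highlight property:

pre_tags: the opening tag
post_tags: the closing tag
fields: the fields to highlight
- title: declares that the title field should be highlighted; field-specific settings can be given in its object, or the object can be left empty

Result (this sample output is in the pre-7.0 response format, where hits.total is a plain number):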

{
  "took": 8,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 0.2876821,
    "hits": [
      {
        "_index": "wang",
        "_type": "goods",
        "_id": "2",
        "_score": 0.2876821,
        "_source": {
          "title": "大米手机",
          "images": "http://image.leyou.com/12479122.jpg",
          "price": 2899
        },
        "highlight": {
          "title": [
            "大米<em>手机</em>"
          ]
        }
      },
      {
        "_index": "wang",
        "_type": "goods",
        "_id": "JP6xa2kBtq36Pzvxpjaf",
        "_score": 0.19856805,
        "_source": {
          "title": "小米手机",
          "images": "http://image.leyou.com/12479122.jpg",
          "price": 2699
        },
        "highlight": {
          "title": [
            "小米<em>手机</em>"
          ]
        }
      },
      {
        "_index": "wang",
        "_type": "goods",
        "_id": "3",
        "_score": 0.16853254,
        "_source": {
          "title": "超大米手机",
          "images": "http://image.leyou.com/12479122.jpg",
          "price": 3299,
          "stock": 200,
          "saleable": true,
          "subTitle": "哈哈"
        },
        "highlight": {
          "title": [
            "超大米<em>手机</em>"
          ]
        }
      }
    ]
  }
}
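On the client side, the highlighted fragments sit next to _source in each hit. The Python sketch below shows one way to read them, falling back to the raw field when a hit has no highlight entry. highlighted_titles is a hypothetical helper, and the response dict is a shortened stand-in for the result above.

```python
def highlighted_titles(response, field="title"):
    """Return the first highlight fragment per hit, or the raw field value."""
    titles = []
    for hit in response["hits"]["hits"]:
        fragments = hit.get("highlight", {}).get(field)
        titles.append(fragments[0] if fragments else hit["_source"][field])
    return titles


# shortened stand-in for a highlighted search response
response = {"hits": {"hits": [
    {"_id": "2", "_source": {"title": "大米手机"},
     "highlight": {"title": ["大米<em>手机</em>"]}},
    {"_id": "9", "_source": {"title": "小米电视"}},  # made-up hit without highlight
]}}
print(highlighted_titles(response))
```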

7. `Pagination`

from and size specify the starting offset and the page size.

Syntax:

GET /wang/_search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "price": {
        "order": "asc"
      }
    }
  ],
  "from": 10000,
  "size": 2
}

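However, this is logical pagination: every request still has to collect and sort the first from + size hits. To avoid deep-pagination problems, ES by default only allows paging up to the 10,000th hit (from + size may not exceed index.max_result_window, which defaults to 10000), so the request above, with from 10000 and size 2, is rejected.

To read past the first 10,000 hits, there are two options:

scroll (scrolling query)
search_after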
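The second option can be sketched as follows. Each hit of a sorted search carries its sort values in a sort array, and passing the last hit's values as search_after asks for the hits that come after it. The helper names below are hypothetical, and doc_id is an assumed field holding a unique value per document, used as a tiebreaker so that the sort order is unambiguous.

```python
MAX_RESULT_WINDOW = 10000  # default value of index.max_result_window


def fits_result_window(from_, size, limit=MAX_RESULT_WINDOW):
    """from/size paging is only accepted while from + size <= limit."""
    return from_ + size <= limit


def search_after_body(sort, size, last_hit=None):
    """Build the body for the next page; last_hit is the final hit of the previous page."""
    body = {"query": {"match_all": {}}, "sort": sort, "size": size}
    if last_hit is not None:
        # each hit of a sorted search carries its sort values in "sort"
        body["search_after"] = last_hit["sort"]
    return body


# "doc_id" is a hypothetical unique field used as a tiebreaker
sort = [{"price": "asc"}, {"doc_id": "asc"}]
first_page = search_after_body(sort, size=2)
last_hit = {"_id": "3", "sort": [3299, "3"]}  # made-up last hit of page one
print(fits_result_window(10000, 2))
print(search_after_body(sort, size=2, last_hit=last_hit))
```

Because search_after never uses from, the result window limit does not apply to it, however deep the paging goes.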
