ES7 Basics 08: Search (Match, Filter, Bool, Highlight, Sort)

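1. Basic Concepts

1.1 Score

Elasticsearch sorts search results by relevance score, which it computes while executing the search. The score indicates how well a document matches the search input. It is a floating-point number returned in the _score field of each hit; the higher the score, the better the match and the higher the document ranks.

A search clause in Elasticsearch runs in one of two contexts: one computes a score and the other does not.

1.2 Query (query context)

A clause in query context answers "how well does this document match?". Besides deciding whether the document matches, it also computes the document's relevance score. This is how everyday search works: a keyword is analyzed into terms, the documents containing those terms are returned, and the computed score decides which of them rank first.

1.3 Filter (filter context)

A clause in filter context answers a plain yes-or-no question: does this document match the condition? No score is computed. Elasticsearch also caches frequently used filters to improve performance.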

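2. Basic Queries

Basic syntax

GET /<index_name>/_search
{
    "query":{
        "<query_type>":{
            "<query_condition>":"<condition_value>"
        }
    }
}

Here query is the query object, which can hold different kinds of query clauses.

  • Query type:
    • for example match_all, match, term, range, and so on
  • Query condition: its form depends on the query type; details follow below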
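The request shape above can also be assembled programmatically. The following is a minimal Python sketch that uses only the standard library; the helper name build_search_body is hypothetical and is not part of any Elasticsearch client.

```python
import json


def build_search_body(query_type, condition):
    """Wrap one query clause in the {"query": {type: condition}} envelope."""
    return {"query": {query_type: condition}}


# match_all takes an empty condition object; match takes field -> text.
match_all_body = build_search_body("match_all", {})
match_body = build_search_body("match", {"userAdress": "九江市上高县"})

# ensure_ascii=False keeps the Chinese query text readable in the output.
print(json.dumps(match_all_body, ensure_ascii=False))
print(json.dumps(match_body, ensure_ascii=False))
```

The printed JSON is what would go in the request body after the GET line.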

2.1 Match all (match_all)

Example:

GET wql/_search
{
  "query": {
    "match_all": {}
  }
}

  • query: the query object
  • match_all: matches every document

Result:

{
  "took" : 5,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 4,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "wql",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "userName" : "zhansan",
          "userPhone" : "15727538286",
          "userAdress" : "江西省宜春市上高县泗溪镇"
        }
      },
      {
        "_index" : "wql",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 1.0,
        "_source" : {
          "userName" : "lisi",
          "userPhone" : "17067888006",
          "userAdress" : "江西省高安市上高县泗溪镇"
        }
      },
      {
        "_index" : "wql",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 1.0,
        "_source" : {
          "userName" : "wangwu",
          "userPhone" : "15797721570",
          "userAdress" : "江西省南昌市上高县泗溪镇"
        }
      },
      {
        "_index" : "wql",
        "_type" : "_doc",
        "_id" : "4",
        "_score" : 1.0,
        "_source" : {
          "userName" : "zhaoliu",
          "userPhone" : "15727538286",
          "userAdress" : "江西省九江市上高县泗溪镇"
        }
      }
    ]
  }
}

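  • took: time spent on the query, in milliseconds
  • timed_out: whether the query timed out
  • _shards: shard information
  • hits: the object summarizing the search results
    • total: total number of matching documents
    • max_score: the highest score among all results
    • hits: array of result documents; each element describes one matching document
      • _index: the index
      • _type: the document type
      • _id: the document id
      • _score: the document's score
      • _source: the document's source data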
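As an illustration of how these fields are typically read by client code, here is a small Python sketch over a trimmed copy of the response above. It assumes the ES 7 shape in which hits.total is an object with value and relation; the function name summarize is hypothetical.

```python
# A trimmed copy of the match_all response shown above (two hits kept).
response = {
    "took": 5,
    "timed_out": False,
    "hits": {
        "total": {"value": 4, "relation": "eq"},
        "max_score": 1.0,
        "hits": [
            {"_index": "wql", "_id": "1", "_score": 1.0,
             "_source": {"userName": "zhansan"}},
            {"_index": "wql", "_id": "2", "_score": 1.0,
             "_source": {"userName": "lisi"}},
        ],
    },
}


def summarize(resp):
    """Return (total hit count, list of (_id, _score, _source)) from a response."""
    hits = resp["hits"]
    rows = [(h["_id"], h["_score"], h["_source"]) for h in hits["hits"]]
    return hits["total"]["value"], rows


total, rows = summarize(response)
print("total:", total)
for doc_id, score, source in rows:
    print(doc_id, score, source["userName"])
```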

2.2 Match query (match)

A match query analyzes the query text into terms and then searches for them. By default the terms are combined with OR.

GET wql/_search
{
  "query": {
    "match": {
      "userAdress": "九江市上高县"
    }
  }
}

Result:

{
  "took" : 7,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 5,
      "relation" : "eq"
    },
    "max_score" : 4.387768,
    "hits" : [
      {
        "_index" : "wql",
        "_type" : "_doc",
        "_id" : "4",
        "_score" : 4.387768,
        "_source" : {
          "userName" : "zhaoliu",
          "userPhone" : "15727538286",
          "userAdress" : "江西省九江市上高县泗溪镇"
        }
      },
      {
        "_index" : "wql",
        "_type" : "_doc",
        "_id" : "5",
        "_score" : 3.6486926,
        "_source" : {
          "userName" : "zhaoliu",
          "userPhone" : "15727538286",
          "userAdress" : "江西省九江市黄塘村"
        }
      },
      {
        "_index" : "wql",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.1584977,
        "_source" : {
          "userName" : "zhansan",
          "userPhone" : "15727538286",
          "userAdress" : "江西省宜春市上高县泗溪镇"
        }
      },
      {
        "_index" : "wql",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 1.1584977,
        "_source" : {
          "userName" : "lisi",
          "userPhone" : "17067888006",
          "userAdress" : "江西省高安市上高县泗溪镇"
        }
      },
      {
        "_index" : "wql",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 1.1584977,
        "_source" : {
          "userName" : "wangwu",
          "userPhone" : "15797721570",
          "userAdress" : "江西省南昌市上高县泗溪镇"
        }
      }
    ]
  }
}


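The results show that the terms are combined with OR: a document matches if it contains any one of them.

  • AND relation

When a more precise search is needed, the operator can be changed to and, so that every term must match: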

GET wql/_search
{
  "query": {
    "match": {
      "userAdress": {
        "query": "九江市上高县",
        "operator": "and"
      }
    }
  }
}

Result:

{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 4.387768,
    "hits" : [
      {
        "_index" : "wql",
        "_type" : "_doc",
        "_id" : "4",
        "_score" : 4.387768,
        "_source" : {
          "userName" : "zhaoliu",
          "userPhone" : "15727538286",
          "userAdress" : "江西省九江市上高县泗溪镇"
        }
      }
    ]
  }
}
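The difference between the two operators can be modeled in a few lines of Python. This is only a toy model: the term lists below are a hypothetical tokenization (the real terms depend on the analyzer configured for the field), and no scoring is done.

```python
def matches(query_terms, doc_terms, operator="or"):
    """Toy model of the match operator: 'or' needs any term, 'and' needs all."""
    present = [term in doc_terms for term in query_terms]
    return all(present) if operator == "and" else any(present)


# Hypothetical tokenization; the real terms depend on the field's analyzer.
query_terms = ["九江市", "上高县"]
docs = {
    "4": {"江西省", "九江市", "上高县", "泗溪镇"},
    "5": {"江西省", "九江市", "黄塘村"},
    "1": {"江西省", "宜春市", "上高县", "泗溪镇"},
}

or_ids = sorted(d for d, terms in docs.items() if matches(query_terms, terms, "or"))
and_ids = sorted(d for d, terms in docs.items() if matches(query_terms, terms, "and"))
print("or :", or_ids)
print("and:", and_ids)
```

Under these assumptions, or keeps all three documents while and keeps only document 4, which mirrors the two result sets above.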


2.3 Multi-field query (multi_match)

multi_match is similar to match, except that it can search several fields at once.

One extra document is added here for testing:

POST /wql/_doc/6
{
  "userName": "九江市",
  "userPhone": "15727538286",
  "userAdress": "黄塘村"
}

GET wql/_search
{
  "query": {
    "multi_match": {
      "query": "九江市",
      "fields": [
        "userAdress",
        "userName"
      ]
    }
  }
}

In this example, the search covers both the userAdress and userName fields.

Result:

{
  "took" : 266,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : 5.587492,
    "hits" : [
      {
        "_index" : "wql",
        "_type" : "_doc",
        "_id" : "5",
        "_score" : 5.587492,
        "_source" : {
          "userName" : "zhaoliu",
          "userPhone" : "15727538286",
          "userAdress" : "江西省九江市黄塘村"
        }
      },
      {
        "_index" : "wql",
        "_type" : "_doc",
        "_id" : "6",
        "_score" : 4.236225,
        "_source" : {
          "userName" : "九江市",
          "userPhone" : "15727538286",
          "userAdress" : "黄塘村"
        }
      },
      {
        "_index" : "wql",
        "_type" : "_doc",
        "_id" : "4",
        "_score" : 3.651648,
        "_source" : {
          "userName" : "zhaoliu",
          "userPhone" : "15727538286",
          "userAdress" : "江西省九江市上高县泗溪镇"
        }
      }
    ]
  }
}

2.4 Term query (term)

A term query matches exact values: numbers, dates, booleans, or strings that are not analyzed.

GET wql/_search
{
  "query": {
    "term": {
      "userPhone": {
        "value": "17067888006"
      }
    }
  }
}

Result:

{
  "took" : 27,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.540445,
    "hits" : [
      {
        "_index" : "wql",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 1.540445,
        "_source" : {
          "userName" : "lisi",
          "userPhone" : "17067888006",
          "userAdress" : "江西省高安市上高县泗溪镇"
        }
      }
    ]
  }
}

2.5 Multi-term exact match (terms)

A terms query works like term, but accepts several values. A document matches if the field contains any one of the given values:

GET wql/_search
{
  "query": {
    "terms": {
      "userPhone": [
        "17067888006",
        "17067888007"
      ]
    }
  }
}

Result:

{
  "took" : 147,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "wql",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 1.0,
        "_source" : {
          "userName" : "lisi",
          "userPhone" : "17067888006",
          "userAdress" : "江西省高安市上高县泗溪镇"
        }
      },
      {
        "_index" : "wql",
        "_type" : "_doc",
        "_id" : "7",
        "_score" : 1.0,
        "_source" : {
          "userName" : "九江市",
          "userPhone" : "17067888007",
          "userAdress" : "黄塘村"
        }
      }
    ]
  }
}
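The "any one of the given values" rule can be sketched as a simple membership test. This is a toy model in Python, not how Elasticsearch evaluates the query internally; the function name terms_match is hypothetical.

```python
def terms_match(field_value, wanted_values):
    """Toy model of terms: the document matches if its value is any wanted value."""
    return field_value in set(wanted_values)


# Phone numbers taken from the example documents above.
phones = {"1": "15727538286", "2": "17067888006", "7": "17067888007"}
wanted = ["17067888006", "17067888007"]

matched_ids = [doc_id for doc_id, phone in phones.items() if terms_match(phone, wanted)]
print(matched_ids)
```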

3. Filtering

3.1 _source filtering

By default, Elasticsearch returns every field stored in _source for each hit.

To get only some of the fields, add a _source filter to the request.

3.1.1 Listing fields directly

Example:

GET wql/_search
{
  "_source": ["userPhone"], 
  "query": {
    "terms": {
      "userPhone": [
        "17067888006",
        "17067888007"
      ]
    }
  }
}

Result:

{
  "took" : 9,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "wql",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 1.0,
        "_source" : {
          "userPhone" : "17067888006"
        }
      },
      {
        "_index" : "wql",
        "_type" : "_doc",
        "_id" : "7",
        "_score" : 1.0,
        "_source" : {
          "userPhone" : "17067888007"
        }
      }
    ]
  }
}

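3.1.2 Specifying includes and excludes

Alternatively, use:

  • includes: the fields to return
  • excludes: the fields to leave out

Both are optional.

Note: when both are given, excludes takes precedence over includes.

Example: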

GET wql/_search
{
  "_source": {
    "includes": [
      "userPhone",
      "userName"
    ],
    "excludes": [
      "userAdress",
      "userName"
    ]
  },
  "query": {
    "terms": {
      "userPhone": [
        "17067888006",
        "17067888007"
      ]
    }
  }
}

Result:

{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "wql",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 1.0,
        "_source" : {
          "userPhone" : "17067888006"
        }
      },
      {
        "_index" : "wql",
        "_type" : "_doc",
        "_id" : "7",
        "_score" : 1.0,
        "_source" : {
          "userPhone" : "17067888007"
        }
      }
    ]
  }
}
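The precedence rule explains why userName is missing from the result even though it is listed in includes. A toy Python model of the rule, assuming exact field names only (no wildcard patterns) and a hypothetical helper name filter_source:

```python
def filter_source(source, includes=None, excludes=None):
    """Toy model of _source filtering with exact field names (no wildcards)."""
    # With no includes given, every field is a candidate.
    kept = {k: v for k, v in source.items() if not includes or k in includes}
    # Excludes are applied last, so they win over includes.
    return {k: v for k, v in kept.items() if not excludes or k not in excludes}


doc = {
    "userName": "lisi",
    "userPhone": "17067888006",
    "userAdress": "江西省高安市上高县泗溪镇",
}
result = filter_source(
    doc,
    includes=["userPhone", "userName"],
    excludes=["userAdress", "userName"],
)
print(result)
```

Only userPhone survives, matching the response above.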

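3.2 Filtering with filter

Filtering within a scored query

Every clause in query context affects document scores and ranking. To narrow the results without the condition influencing the score, do not write it as a query clause; put it in filter instead: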

GET wql/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "userAdress": "江西"
          }
        }
      ],
      "filter": [
        {
          "range": {
            "userPhone": {
              "gte": 17067888006
            }
          }
        }
      ]
    }
  }
}

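Filtering without a query

When a search consists only of filters, with no scored query and no need for scoring, constant_score can replace a bool query that contains only a filter clause. Performance is identical, but the query is shorter and clearer.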

GET /wql/_search
{
  "query": {
    "constant_score": {
      "filter": {
        "range": {
          "userPhone": {
            "gte": 17067888006
          }
        }
      }
    }
  }
}

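4. Advanced Queries

4.1 Boolean combination (bool)

Keyword     Description
must        The clause must match; it contributes to the score
filter      The clause must match; no score is computed
should      The clause may match; it contributes to the score
must_not    The clause must not match; no score is computed

A bool query combines other queries through must (AND), must_not (NOT), and should (OR):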

GET /wang/_search
{
    "query":{
        "bool":{
        	"must":     { "match": { "title": "大米" }},
        	"must_not": { "match": { "title":  "电视" }},
        	"should":   { "match": { "title": "手机" }}
        }
    }
}

Result:

{
  "took": 10,
  "timed_out": false,
  "_shards": {
    "total": 3,
    "successful": 3,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.5753642,
    "hits": [
      {
        "_index": "wang",
        "_type": "goods",
        "_id": "2",
        "_score": 0.5753642,
        "_source": {
          "title": "大米手机",
          "images": "http://image.leyou.com/12479122.jpg",
          "price": 2899
        }
      }
    ]
  }
}
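How the three clauses interact can be sketched with a toy evaluator in Python. Everything here is a simplification: clauses are plain substring tests, the titles other than 大米手机 are hypothetical, and the score is a stand-in (one point per matching must or should clause) rather than real relevance scoring.

```python
def bool_query(title, must=(), must_not=(), should=()):
    """Toy bool evaluation on one title; returns a score, or None if no match.

    Each clause is a keyword tested by substring. Scoring is a stand-in:
    one point per matching must or should clause, not real relevance scoring.
    """
    if not all(word in title for word in must):
        return None  # a must clause failed
    if any(word in title for word in must_not):
        return None  # a must_not clause matched
    should_hits = sum(word in title for word in should)
    return len(must) + should_hits


titles = ["大米手机", "大米电视", "大米", "小米手机"]
for title in titles:
    print(title, bool_query(title, must=["大米"], must_not=["电视"], should=["手机"]))
```

In this model a title that satisfies must but not should still matches, only with a lower score, which is the role should plays when a must clause is present.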

4.2 Range query (range)

A range query finds documents whose numeric or date value falls within the given interval:

GET /wang/_search
{
    "query":{
        "range": {
            "price": {
                "gte":  1000.0,
                "lt":   2800.00
            }
    	}
    }
}

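The range query accepts the following operators:

Operator    Meaning
gt          greater than
gte         greater than or equal to
lt          less than
lte         less than or equal to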
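To make the boundary behavior of the four operators concrete, here is a small Python sketch that applies the same bounds as the request above to a few sample prices. The sample prices and the helper name in_range are chosen for illustration.

```python
import operator

# Map each range operator to the comparison it stands for.
RANGE_OPS = {
    "gt": operator.gt,
    "gte": operator.ge,
    "lt": operator.lt,
    "lte": operator.le,
}


def in_range(value, bounds):
    """Return True if value satisfies every bound, e.g. {"gte": 1000, "lt": 2800}."""
    return all(RANGE_OPS[name](value, limit) for name, limit in bounds.items())


bounds = {"gte": 1000.0, "lt": 2800.00}
for price in (999, 1000, 2699, 2800, 2899):
    print(price, in_range(price, bounds))
```

With gte the lower bound 1000 is included, while with lt the upper bound 2800 is excluded.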

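4.3 Fuzzy query (fuzzy)

A fuzzy query is the fuzzy counterpart of term. It tolerates spelling differences between the search term and the indexed term, as long as the edit distance is at most 2: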

GET /wang/_search
{
  "query": {
    "fuzzy": {
      "title": "appla"
    }
  }
}

The query above still finds the apple phone.

The fuzziness parameter sets the allowed edit distance:

GET /wang/_search
{
  "query": {
    "fuzzy": {
        "title": {
            "value":"appla",
            "fuzziness":1
        }
    }
  }
}
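Edit distance is the number of single-character changes needed to turn one string into another. The sketch below implements plain Levenshtein distance in Python to show why appla is within distance 1 of apple. Note the assumption: Elasticsearch by default also counts a swap of two adjacent characters as a single edit, which this plain version does not.

```python
def edit_distance(a, b):
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, or substitutions that turn a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


print(edit_distance("appla", "apple"))
print(edit_distance("aplpe", "apple"))
```

The first pair differs by one substitution; the second is a swap of two adjacent letters, which plain Levenshtein counts as two edits.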

4.4 Boosting Query

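The boosting query has two clauses, positive and negative, plus a negative_boost factor:

  • positive: every document that matches the positive clause is returned.
  • negative: documents that also match the negative clause are not filtered out; their score is reduced instead.
  • negative_boost: the score multiplier, a number between 0 and 1. The score of a document that matches the negative clause is multiplied by it. With a factor of 0.5, for example, a document that would otherwise score 100 ends up with 50.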
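A request body for this query might look like the one built below. This is a hedged Python sketch: the choice of the title field and the keywords 手机 and 电视 is hypothetical, and boosted_score only reproduces the arithmetic described above.

```python
import json

# Hypothetical example: prefer phones, demote (but keep) titles mentioning TVs.
boosting_body = {
    "query": {
        "boosting": {
            "positive": {"match": {"title": "手机"}},
            "negative": {"match": {"title": "电视"}},
            "negative_boost": 0.5,
        }
    }
}


def boosted_score(positive_score, matches_negative, negative_boost):
    """Score after boosting: multiplied by negative_boost only when the
    document also matches the negative clause."""
    return positive_score * negative_boost if matches_negative else positive_score


print(json.dumps(boosting_body, ensure_ascii=False, indent=2))
print(boosted_score(100.0, True, 0.5))
print(boosted_score(100.0, False, 0.5))
```

Unlike must_not in a bool query, the negative clause never removes a document; it only pushes it down the ranking.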

5. Sorting

5.1 Sorting by a single field

sort orders the results by a chosen field, and order sets the direction:

GET /wang/_search
{
  "query": {
    "match": {
      "title": "小米手机"
    }
  },
  "sort": [
    {
      "price": {
        "order": "desc"
      }
    }
  ]
}

5.2 Sorting by multiple fields

Suppose the results should be sorted by both price and _score (relevance): first by price, then by relevance score:

GET /goods/_search
{
    "query":{
        "bool":{
        	"must":{ "match": { "title": "小米手机" }},
        	"filter":{
                "range":{"price":{"gt":200000,"lt":300000}}
        	}
        }
    },
    "sort": [
      { "price": { "order": "desc" }},
      { "_score": { "order": "desc" }}
    ]
}
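The second sort key only matters when documents tie on the first one. A short Python sketch of the same ordering over hypothetical hits (ids, prices, and scores are made up for illustration):

```python
# Hypothetical hits: (id, price, _score).
hits = [
    ("a", 2699, 0.20),
    ("b", 2899, 0.29),
    ("c", 2699, 0.35),
    ("d", 3299, 0.17),
]

# Sort by price descending; ties on price fall back to _score descending.
# Negating both numeric keys turns Python's ascending sort into descending.
ordered = sorted(hits, key=lambda hit: (-hit[1], -hit[2]))
for doc_id, price, score in ordered:
    print(doc_id, price, score)
```

Documents a and c share a price, so their relative order is decided by the score.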

6. Highlighting

The highlight syntax in Elasticsearch is fairly simple:

GET /wang/_search
{
  "query": {
    "match": {
      "title": "手机"
    }
  },
  "highlight": {
    "pre_tags": "<em>",
    "post_tags": "</em>", 
    "fields": {
      "title": {}
    }
  }
}

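Alongside the match query, the request carries a highlight property:

  • pre_tags: the opening tag
  • post_tags: the closing tag
  • fields: the fields to highlight
    • title: declares that the title field is highlighted; per-field settings can go in the object, or it can be left empty

Result: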

{
  "took": 8,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 0.2876821,
    "hits": [
      {
        "_index": "wang",
        "_type": "goods",
        "_id": "2",
        "_score": 0.2876821,
        "_source": {
          "title": "大米手机",
          "images": "http://image.leyou.com/12479122.jpg",
          "price": 2899
        },
        "highlight": {
          "title": [
            "大米<em>手机</em>"
          ]
        }
      },
      {
        "_index": "wang",
        "_type": "goods",
        "_id": "JP6xa2kBtq36Pzvxpjaf",
        "_score": 0.19856805,
        "_source": {
          "title": "小米手机",
          "images": "http://image.leyou.com/12479122.jpg",
          "price": 2699
        },
        "highlight": {
          "title": [
            "小米<em>手机</em>"
          ]
        }
      },
      {
        "_index": "wang",
        "_type": "goods",
        "_id": "3",
        "_score": 0.16853254,
        "_source": {
          "title": "超大米手机",
          "images": "http://image.leyou.com/12479122.jpg",
          "price": 3299,
          "stock": 200,
          "saleable": true,
          "subTitle": "哈哈"
        },
        "highlight": {
          "title": [
            "超大米<em>手机</em>"
          ]
        }
      }
    ]
  }
}

7. Pagination

from and size set the start offset and the page size.

Syntax:

GET /wang/_search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "price": {
        "order": "asc"
      }
    }
  ],
  "from": 10000,
  "size": 2
}

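This is logical paging, however: each request still has to collect and sort the first from + size hits. To guard against deep paging, Elasticsearch by default rejects any request where from + size exceeds 10,000 (the index.max_result_window setting), which includes the request above with from 10000 and size 2.

To read past the first 10,000 results, there are two options:

  • scroll query
  • search_after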
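With search_after, each request repeats the same sort and passes the sort values of the last hit from the previous page instead of a from offset. The Python sketch below builds such request bodies. It is an illustration under assumptions: sku is a hypothetical unique keyword field used as a tiebreaker, the returned hits are made up, and next_page_body is not part of any client library.

```python
def next_page_body(previous_hits, sort, size):
    """Build the search_after request for the page after previous_hits.

    previous_hits is the hits.hits list of the last response; every hit
    carries a "sort" array holding its values for the requested sort.
    """
    body = {"query": {"match_all": {}}, "sort": sort, "size": size}
    if previous_hits:
        # Resume right after the last hit of the previous page.
        body["search_after"] = previous_hits[-1]["sort"]
    return body


# "sku" is a hypothetical unique keyword field used as a tiebreaker.
sort = [{"price": {"order": "asc"}}, {"sku": {"order": "asc"}}]

first_page = next_page_body([], sort, size=2)
# Hypothetical hits returned for the first page.
returned = [
    {"_id": "JP6x", "sort": [2699, "sku-001"]},
    {"_id": "2", "sort": [2899, "sku-002"]},
]
second_page = next_page_body(returned, sort, size=2)
print(first_page)
print(second_page)
```

Because no from offset is involved, the 10,000 window limit does not apply to how deep this kind of paging can go.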