ES Query: match vs. match_phrase


赶路人儿


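Let's start with an example. We have stored some basic information about students in the student type, and we query it first with match and then with match_phrase.

First, search with match, using the keywords "He is":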

GET /test/student/_search
{
  "query": {
    "match": {
      "description": "He is"
    }
  }
}

Running this query returns the following result:

{
   "took": 3,
   "timed_out": false,
   "_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
   },
   "hits": {
      "total": 4,
      "max_score": 0.2169777,
      "hits": [
         {
            "_index": "test",
            "_type": "student",
            "_id": "2",
            "_score": 0.2169777,
            "_source": {
               "name": "februus",
               "sex": "male",
               "age": 24,
               "description": "He is passionate.",
               "interests": "reading, programing"
            }
         },
         {
            "_index": "test",
            "_type": "student",
            "_id": "1",
            "_score": 0.16273327,
            "_source": {
               "name": "leotse",
               "sex": "male",
               "age": 25,
               "description": "He is a big data engineer.",
               "interests": "reading, swiming, hiking"
            }
         },
         {
            "_index": "test",
            "_type": "student",
            "_id": "4",
            "_score": 0.01989093,
            "_source": {
               "name": "pascal",
               "sex": "male",
               "age": 25,
               "description": "He works very hard because he wanna go to Canada.",
               "interests": "programing, reading"
            }
         },
         {
            "_index": "test",
            "_type": "student",
            "_id": "3",
            "_score": 0.016878016,
            "_source": {
               "name": "yolovon",
               "sex": "female",
               "age": 24,
               "description": "She is so charming and beautiful.",
               "interests": "reading, shopping"
            }
         }
      ]
   }
}
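The same console requests can also be sent from a program. Below is a minimal sketch using only Python's standard library. It assumes an unsecured cluster reachable at http://localhost:9200 (a hypothetical address), and the helper names build_search_request, run_search and summarize_hits are illustrative, not part of any Elasticsearch client. The typed path /test/student/_search mirrors the example above; Elasticsearch 7.x deprecated mapping types and 8.x removed them, so on those versions the path would be /test/_search.

```python
import json
import urllib.request

# Assumption: a local, unsecured cluster at this (hypothetical) address.
ES_URL = "http://localhost:9200"


def build_search_request(query_type: str, field: str, text: str,
                         index: str = "test",
                         doc_type: str = "student") -> urllib.request.Request:
    """Build the console request GET /<index>/<doc_type>/_search with a one-field query."""
    body = {"query": {query_type: {field: text}}}
    return urllib.request.Request(
        f"{ES_URL}/{index}/{doc_type}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="GET",  # Elasticsearch accepts a JSON body on GET for _search
    )


def run_search(req: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response (requires a running cluster)."""
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def summarize_hits(response: dict) -> list:
    """Return (_id, _score, description) for each hit, in the order Elasticsearch returned them."""
    return [(hit["_id"], hit["_score"], hit["_source"]["description"])
            for hit in response["hits"]["hits"]]
```

With a cluster running, summarize_hits(run_search(build_search_request("match", "description", "He is"))) returns one tuple per hit; passing "match_phrase" as the first argument builds the second query instead.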

Now run the same search with match_phrase:

GET /test/student/_search
{
  "query": {
    "match_phrase": {
      "description": "He is"
    }
  }
}

The result is:

{
   "took": 3,
   "timed_out": false,
   "_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
   },
   "hits": {
      "total": 2,
      "max_score": 0.30685282,
      "hits": [
         {
            "_index": "test",
            "_type": "student",
            "_id": "2",
            "_score": 0.30685282,
            "_source": {
               "name": "februus",
               "sex": "male",
               "age": 24,
               "description": "He is passionate.",
               "interests": "reading, programing"
            }
         },
         {
            "_index": "test",
            "_type": "student",
            "_id": "1",
            "_score": 0.23013961,
            "_source": {
               "name": "leotse",
               "sex": "male",
               "age": 25,
               "description": "He is a big data engineer.",
               "interests": "reading, swiming, hiking"
            }
         }
      ]
   }
}

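The output takes up quite a bit of space, but it is worth it if it makes the difference between the two queries clear.

Let's look at how the two results differ: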

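1. Most obviously, on the same data set the two queries return a different number of hits (4 for match, 2 for match_phrase);
2. In the match results, the description field of a hit may contain "He is", only "He", or only "is";
3. In the match_phrase results, the description field must contain the phrase "He is";
4. Every hit has a _score field, which is the document's relevance score for the current query, and the hits are sorted by this score from highest to lowest.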
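Observations 1 and 4 can be checked mechanically. This small Python sketch copies the _id and _score values from the two responses above and confirms the hit counts and the descending order; sorted_by_score is a hypothetical helper written only for illustration.

```python
# (_id, _score) pairs copied from the match and match_phrase responses above.
MATCH_HITS = [("2", 0.2169777), ("1", 0.16273327), ("4", 0.01989093), ("3", 0.016878016)]
PHRASE_HITS = [("2", 0.30685282), ("1", 0.23013961)]


def sorted_by_score(hits):
    """True if _score never increases from one hit to the next."""
    scores = [score for _, score in hits]
    return all(a >= b for a, b in zip(scores, scores[1:]))


# Observation 1: same data, different number of hits.
print(len(MATCH_HITS), len(PHRASE_HITS))  # 4 2
# Observation 4: both result lists are ordered by _score, highest first.
print(sorted_by_score(MATCH_HITS), sorted_by_score(PHRASE_HITS))  # True True
# Every match_phrase hit also appears among the match hits.
print({i for i, _ in PHRASE_HITS} <= {i for i, _ in MATCH_HITS})  # True
```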
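To understand the difference between match and match_phrase, we need to go back to what each one is for. match is a full-text query: the query string is first analyzed into individual terms (with the default standard analyzer, "He is" becomes the lowercase terms he and is), and by default these terms are combined with OR, so every document whose description contains at least one of them is considered related to the query and ends up in the result set. ES sorts that result set by relevance score, which is the _score field we saw. Overall, documents whose description contains "He is" score higher than documents that contain only "He" or only "is". (How each document is scored will be covered in a later post.)
Relevance is a central concept in Elasticsearch, and it has no counterpart in a traditional relational database, where a query condition either matches a record or does not.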
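To make the match behaviour concrete, here is a toy, in-memory simulation in Python over the four description values above. It is a sketch under simplifying assumptions: analyze only approximates the standard analyzer (lowercase, split on non-word characters), toy_match is a hypothetical name, and ranking by the number of matched terms is a crude stand-in that does not reproduce Elasticsearch's _score (it even orders the documents differently).

```python
import re

# description values of the four sample documents, keyed by _id.
DOCS = {
    "1": "He is a big data engineer.",
    "2": "He is passionate.",
    "3": "She is so charming and beautiful.",
    "4": "He works very hard because he wanna go to Canada.",
}


def analyze(text: str) -> list:
    """Rough stand-in for the standard analyzer: lowercase, split on non-word characters."""
    return [t for t in re.split(r"\W+", text.lower()) if t]


def toy_match(query: str, docs: dict) -> list:
    """Hypothetical match with the default OR operator: a doc hits if it shares ANY query term.

    Hits are ranked by how many distinct query terms they contain. This is only an
    illustration, not Elasticsearch's relevance scoring.
    """
    terms = set(analyze(query))
    hits = []
    for doc_id, text in docs.items():
        matched = terms & set(analyze(text))
        if matched:
            hits.append((doc_id, len(matched)))
    # Stable sort: descending by matched-term count, ties keep insertion order.
    return sorted(hits, key=lambda hit: -hit[1])


print(toy_match("He is", DOCS))  # [('1', 2), ('2', 2), ('3', 1), ('4', 1)]
```

Document 3 is a hit only because of is, and document 4 only because of he ("she" is a different term), which is exactly observation 2 above.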

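So what if we don't want the query to be split into separate terms? That is what match_phrase is for:
match_phrase is a phrase query: it treats the given phrase as one complete query condition. When you search with match_phrase, every document in the result set must contain the phrase you specified, here "He is"; the analyzed terms must appear in the same order and next to each other. This looks somewhat like a LIKE '%He is%' query in a relational database, except that the comparison is made on analyzed terms rather than raw characters.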
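The phrase condition can be sketched the same way. Real match_phrase uses the term positions stored in the index (plus a slop parameter, which defaults to 0); this toy version, with the hypothetical toy_match_phrase, instead slides a window over the analyzed tokens and requires the query terms to appear consecutively and in order. The last lines contrast it with a raw, case-insensitive substring test, roughly what LIKE '%he is%' would do under a case-insensitive collation.

```python
import re

# description values of the four sample documents, keyed by _id.
DOCS = {
    "1": "He is a big data engineer.",
    "2": "He is passionate.",
    "3": "She is so charming and beautiful.",
    "4": "He works very hard because he wanna go to Canada.",
}


def analyze(text: str) -> list:
    """Rough stand-in for the standard analyzer: lowercase, split on non-word characters."""
    return [t for t in re.split(r"\W+", text.lower()) if t]


def toy_match_phrase(query: str, docs: dict) -> list:
    """Hypothetical phrase matcher: all query terms must occur consecutively, in order (slop 0)."""
    terms = analyze(query)
    if not terms:
        return []
    n = len(terms)
    hits = []
    for doc_id, text in docs.items():
        tokens = analyze(text)
        # Compare every window of n consecutive tokens against the query terms.
        if any(tokens[i:i + n] == terms for i in range(len(tokens) - n + 1)):
            hits.append(doc_id)
    return hits


print(toy_match_phrase("He is", DOCS))  # ['1', '2']
print(toy_match_phrase("is He", DOCS))  # [] (same terms, wrong order)

# Contrast: a raw substring test on lowercased text, roughly LIKE '%he is%'
# under a case-insensitive collation.
like_hits = [doc_id for doc_id, text in DOCS.items() if "he is" in text.lower()]
print(like_hits)  # ['1', '2', '3']
```

The substring test also returns document 3, because "she is" contains the characters "he is", whereas the phrase query compares whole terms and "she" is not "he".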