Learning Elasticsearch, Part 7: The bool Filter

水晶果冻1125

Category: Elasticsearch. Tags: Elasticsearch, minimum_should_match, bool, compound queries

Article link: https://blog.csdn.net/m0_37617778/article/details/102610097

`bool` filtering

A bool filter combines the results of several filter clauses using Boolean logic. It is made up of three parts:

{
    "bool" : {
        "must" : [],
        "should" : [],
        "must_not" : []
    }
}

It has the following operators:

must: every clause must match; equivalent to AND.
must_not: no clause may match; equivalent to NOT.
should: at least one clause should match; equivalent to OR.

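Notes:

When must or must_not contains several clauses, the clauses are combined with AND (for must_not: a document is excluded if it matches any of them); several should clauses are combined with OR.
When a query contains both must and should, a document does not have to satisfy any should clause: once a must (or filter) clause is present, minimum_should_match defaults to 0, so must effectively takes precedence. A document that also satisfies a should clause gets a higher relevance score.
The minimum_should_match parameter controls how many should clauses a document must satisfy, as a count or a percentage. It is normally used together with should.
must, must_not and should each accept an array of clauses. Conditions that should not contribute to the relevance score can be written in the filter clause of the bool query.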
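
The notes above can be sketched as a small Python model. This is only an illustration of the matching rules, not how Elasticsearch is implemented: the function `bool_matches`, its parameter names, and the use of plain Python predicates as clauses are all assumptions made for the example, and scoring is ignored.

```python
def bool_matches(doc, must=(), filters=(), should=(), must_not=(),
                 minimum_should_match=None):
    """Return True if doc satisfies a simplified bool query.

    Every clause is a plain predicate (doc -> bool). Hypothetical helper,
    written only to illustrate the matching rules; scores are not modelled.
    """
    if minimum_should_match is None:
        # Default: should is optional once a must or filter clause is present,
        # otherwise at least one should clause has to match.
        minimum_should_match = 1 if should and not (must or filters) else 0
    if not all(clause(doc) for clause in must):
        return False
    if not all(clause(doc) for clause in filters):
        return False
    if any(clause(doc) for clause in must_not):
        return False
    matched = sum(1 for clause in should if clause(doc))
    return matched >= minimum_should_match


def older_than_30(doc):
    return doc["age"] > 30


def is_smith(doc):
    return doc["last_name"] == "Smith"


li = {"first_name": "Li", "last_name": "Haijing", "age": 35}

# filter + should: the should clause is optional, so Li still matches.
optional_should = bool_matches(li, filters=[older_than_30], should=[is_smith])
# With minimum_should_match=1 the should clause becomes mandatory.
required_should = bool_matches(li, filters=[older_than_30], should=[is_smith],
                               minimum_should_match=1)
# A bool with only should clauses needs at least one of them by default.
only_should = bool_matches(li, should=[is_smith])

print(optional_should, required_should, only_should)
```

The three calls differ only in which clauses accompany should, which is exactly what decides whether should is optional.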

Query example

The test data:

_index	_type	_id	_score	first_name	last_name	age	about
megacorp	employee	5	1	李	国庆	38	I like to shopping foods
megacorp	employee	8	1	Li	Haijing	35	I like to shopping foods1
megacorp	employee	2	1	Jane	Smith	32	I like to collect rock albums
megacorp	employee	4	1	Li	Haijing	35	I like to shopping foods
megacorp	employee	6	1	张	张国庆	28	I like to shopping foods
megacorp	employee	1	1	John	Smith	25	I love to go rock climbing
megacorp	employee	3	1	Douglas	Fir	35	I like to build cabinets

The mapping:

{
  "mapping": {
    "employee": {
      "properties": {
        "about": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "age": {
          "type": "long"
        },
        "first_name": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "interests": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "last_name": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        }
      }
    }
  }
}

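The text fields are analyzed with Elasticsearch's default standard analyzer. It splits text on word boundaries as defined by the Unicode Consortium, removes most punctuation, and finally lowercases every term. For example, Smith is indexed as the lowercase term smith.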
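
To make the effect of analysis concrete, here is a rough Python stand-in. It is an assumption-laden approximation: the function name `simple_standard_analyze` is made up for this example, and a regular expression over word characters only mimics the real Unicode word-boundary rules for simple English text like the test data.

```python
import re


def simple_standard_analyze(text):
    """Approximate the standard analyzer: split into word runs, then lowercase.

    Hypothetical helper; the real analyzer follows Unicode segmentation rules
    that this regular expression does not fully reproduce.
    """
    return [token.lower() for token in re.findall(r"\w+", text)]


about_tokens = simple_standard_analyze("I like to collect rock albums")
name_tokens = simple_standard_analyze("Smith")
print(about_tokens)
print(name_tokens)
```

The point to take away is that the index holds smith, not Smith, for the text field last_name.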

Query requirement

Find everyone who is older than 30 but not 38, and whose first_name is Douglas or whose last_name is Smith. This is equivalent to the following SQL:

select * from employee where age > 30 and age <> 38 and (first_name = 'Douglas' or last_name = 'Smith')
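
As a quick check of which rows this condition should return, the same predicate can be applied to the test data in plain Python. The list `employees` simply restates the table above (only the columns the condition uses), and the key name `id` is chosen for this sketch.

```python
# The test data from the table above, reduced to the columns the query uses.
employees = [
    {"id": 5, "first_name": "李", "last_name": "国庆", "age": 38},
    {"id": 8, "first_name": "Li", "last_name": "Haijing", "age": 35},
    {"id": 2, "first_name": "Jane", "last_name": "Smith", "age": 32},
    {"id": 4, "first_name": "Li", "last_name": "Haijing", "age": 35},
    {"id": 6, "first_name": "张", "last_name": "张国庆", "age": 28},
    {"id": 1, "first_name": "John", "last_name": "Smith", "age": 25},
    {"id": 3, "first_name": "Douglas", "last_name": "Fir", "age": 35},
]

# Same condition as the SQL WHERE clause.
expected_ids = [
    e["id"]
    for e in employees
    if e["age"] > 30
    and e["age"] != 38
    and (e["first_name"] == "Douglas" or e["last_name"] == "Smith")
]
print(expected_ids)  # [2, 3]
```

Only Jane Smith and Douglas Fir qualify; John Smith is excluded by the age condition.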

The bool filter query in Elasticsearch

GET /megacorp/employee/_search
{
    "query" : {
        "bool" : {
            "filter" : {
                "range" : {
                    "age" : { "gt" : 30 }
                }
            },
            "must_not": {
              "term":{"age":38}
            },
            "should": [
                {"term":{"last_name":"Smith"}},
                {"term":{"first_name":"Douglas"}}
            ]
        }
    }
}

Result

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 4,
    "max_score" : 0.0,
    "hits" : [
      {
        "_index" : "megacorp",
        "_type" : "employee",
        "_id" : "8",
        "_score" : 0.0,
        "_source" : {
          "first_name" : "Li",
          "last_name" : "Haijing",
          "age" : 35,
          "about" : "I like to shopping foods1",
          "interests" : [
            "music1"
          ]
        }
      },
      {
        "_index" : "megacorp",
        "_type" : "employee",
        "_id" : "2",
        "_score" : 0.0,
        "_source" : {
          "first_name" : "Jane",
          "last_name" : "Smith",
          "age" : 32,
          "about" : "I like to collect rock albums",
          "interests" : [
            "music"
          ]
        }
      },
      {
        "_index" : "megacorp",
        "_type" : "employee",
        "_id" : "4",
        "_score" : 0.0,
        "_source" : {
          "first_name" : "Li",
          "last_name" : "Haijing",
          "age" : "35",
          "about" : "I like to shopping foods",
          "interests" : [
            "forestry"
          ]
        }
      },
      {
        "_index" : "megacorp",
        "_type" : "employee",
        "_id" : "3",
        "_score" : 0.0,
        "_source" : {
          "first_name" : "Douglas",
          "last_name" : "Fir",
          "age" : 35,
          "about" : "I like to build cabinets",
          "interests" : [
            "forestry"
          ]
        }
      }
    ]
  }
}

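Four documents are hit, whereas I had originally expected only two: Jane Smith (_id 2) and Douglas Fir (_id 3).

In practice the two Li Haijing documents (_id 8 and _id 4) are returned as well. The reason is as follows: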

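When a bool query contains must or filter clauses together with should, a document does not have to satisfy any should clause, because minimum_should_match defaults to 0 in that case. Satisfying a should clause only raises the relevance score.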

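As the example shows, the should clauses here are optional: they do not change which documents are hit, only how they are scored. But what if we want at least one of the should clauses to be mandatory?

There are two ways to do this: set the minimum_should_match parameter, or wrap the should clauses in a must.

Option 1: minimum_should_match sets the minimum number of should clauses that must match. With minimum_should_match set to 1, at least one should clause has to be satisfied. The query becomes: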

GET /megacorp/employee/_search
{
    "query" : {
        "bool" : {
            "filter" : {
                "range" : {
                    "age" : { "gt" : 30 }
                }
            },
            "must_not": {
              "term":{"age":38}
            },
            "should": [
                {"term":{"last_name":"Smith"}},
                {"term":{"first_name":"Douglas"}}
            ],
            "minimum_should_match":1
        }
    }
}

The response is now:

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 0,
    "max_score" : null,
    "hits" : [ ]
  }
}

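Nothing is hit. The should clauses use term queries, which do not analyze the search term: the input is matched against the index exactly as typed. The text fields were indexed with the standard analyzer, so the indexed terms are lowercase (smith, douglas), and the capitalized Smith and Douglas match nothing. In the first query this went unnoticed because should was optional, which is also why every hit there had a score of 0.0.

To match exactly, we can query each field's keyword sub-field instead: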
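
The difference between the analyzed text field and its keyword sub-field can be sketched in a few lines of Python. This is a simplified model, not Elasticsearch code: `analyze`, `term_matches_text` and `term_matches_keyword` are hypothetical helpers, and the analyzer is approximated by a regular expression plus lowercasing.

```python
import re


def analyze(text):
    """Rough approximation of the standard analyzer (hypothetical helper)."""
    return [token.lower() for token in re.findall(r"\w+", text)]


stored_value = "Smith"
text_terms = set(analyze(stored_value))  # terms indexed for last_name (text)
keyword_term = stored_value              # term indexed for last_name.keyword


def term_matches_text(term):
    # A term query compares the search term as-is with the indexed terms.
    return term in text_terms


def term_matches_keyword(term):
    # The keyword sub-field is not analyzed, so the original casing is kept.
    return term == keyword_term


results = {
    "text, 'Smith'": term_matches_text("Smith"),
    "text, 'smith'": term_matches_text("smith"),
    "keyword, 'Smith'": term_matches_keyword("Smith"),
    "keyword, 'smith'": term_matches_keyword("smith"),
}
for case, matched in results.items():
    print(case, matched)
```

In this model the capitalized search term only matches the keyword sub-field, while the lowercase one only matches the analyzed text field.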

GET /megacorp/employee/_search
{
    "query" : {
        "bool" : {
            "filter" : {
                "range" : {
                    "age" : { "gt" : 30 }
                }
            },
            "must_not": {
              "term":{"age":38}
            },
            "should": [
                {"term":{"last_name.keyword":"Smith"}},
                {"term":{"first_name.keyword":"Douglas"}}
            ],
            "minimum_should_match":1
        }
    }
}

Result

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 0.9808292,
    "hits" : [
      {
        "_index" : "megacorp",
        "_type" : "employee",
        "_id" : "2",
        "_score" : 0.9808292,
        "_source" : {
          "first_name" : "Jane",
          "last_name" : "Smith",
          "age" : 32,
          "about" : "I like to collect rock albums",
          "interests" : [
            "music"
          ]
        }
      },
      {
        "_index" : "megacorp",
        "_type" : "employee",
        "_id" : "3",
        "_score" : 0.2876821,
        "_source" : {
          "first_name" : "Douglas",
          "last_name" : "Fir",
          "age" : 35,
          "about" : "I like to build cabinets",
          "interests" : [
            "forestry"
          ]
        }
      }
    ]
  }
}

Option 2

Wrap the should clauses in a must:

GET /megacorp/employee/_search
{
    "query" : {
        "bool" : {
            "filter" : {
                "range" : {
                    "age" : { "gt" : 30 }
                }
            },
            "must_not": {
              "term":{"age":38}
            },
            "must": [
                {
                    "bool": {
                        "should": [
                            {"term":{"last_name.keyword":"Smith"}},
                            {"term":{"first_name.keyword":"Douglas"}}
                        ]
                    }
                }
            ]
        }
    }
}
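
Why wrapping works: the inner bool contains only should clauses, so at least one of them must match for the inner bool to match, and the outer must then makes the inner bool mandatory. The sketch below models this with a tiny evaluator for a subset of the query DSL. Everything in it is an assumption for illustration: the function `matches` is hypothetical, it supports only term, range with gt, and bool with clauses given as lists, it ignores scoring, and it treats a `.keyword` field as an exact comparison on the stored value.

```python
def matches(query, doc):
    """Evaluate a tiny subset of the query DSL against one document.

    Hypothetical helper: supports term, range (gt only) and bool; no scoring.
    """
    (kind, body), = query.items()
    if kind == "term":
        (field, value), = body.items()
        # "name.keyword" is modelled as an exact match on the stored value.
        return doc.get(field.split(".")[0]) == value
    if kind == "range":
        (field, cond), = body.items()
        return doc[field] > cond["gt"]
    if kind == "bool":
        required = body.get("must", []) + body.get("filter", [])
        should = body.get("should", [])
        # should is optional only when a must or filter clause is present.
        default_msm = 0 if required or not should else 1
        msm = body.get("minimum_should_match", default_msm)
        return (all(matches(q, doc) for q in required)
                and not any(matches(q, doc) for q in body.get("must_not", []))
                and sum(matches(q, doc) for q in should) >= msm)
    raise ValueError(f"unsupported query type: {kind}")


rows = [(5, "李", "国庆", 38), (8, "Li", "Haijing", 35), (2, "Jane", "Smith", 32),
        (4, "Li", "Haijing", 35), (6, "张", "张国庆", 28), (1, "John", "Smith", 25),
        (3, "Douglas", "Fir", 35)]
employees = [dict(zip(("id", "first_name", "last_name", "age"), r)) for r in rows]

shared = {"filter": [{"range": {"age": {"gt": 30}}}],
          "must_not": [{"term": {"age": 38}}]}
names = [{"term": {"last_name.keyword": "Smith"}},
         {"term": {"first_name.keyword": "Douglas"}}]

optional = {"bool": {**shared, "should": names}}
option_1 = {"bool": {**shared, "should": names, "minimum_should_match": 1}}
option_2 = {"bool": {**shared, "must": [{"bool": {"should": names}}]}}

for label, query in [("optional should", optional),
                     ("option 1", option_1), ("option 2", option_2)]:
    print(label, [d["id"] for d in employees if matches(query, d)])
```

Under these assumptions both options select the same two documents, while leaving should optional selects all four that pass the age conditions.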

The result:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 0.9808292,
    "hits" : [
      {
        "_index" : "megacorp",
        "_type" : "employee",
        "_id" : "2",
        "_score" : 0.9808292,
        "_source" : {
          "first_name" : "Jane",
          "last_name" : "Smith",
          "age" : 32,
          "about" : "I like to collect rock albums",
          "interests" : [
            "music"
          ]
        }
      },
      {
        "_index" : "megacorp",
        "_type" : "employee",
        "_id" : "3",
        "_score" : 0.2876821,
        "_source" : {
          "first_name" : "Douglas",
          "last_name" : "Fir",
          "age" : 35,
          "about" : "I like to build cabinets",
          "interests" : [
            "forestry"
          ]
        }
      }
    ]
  }
}