Learning Elasticsearch, Part 7: The bool Filter

The bool filter

The bool filter combines multiple filter conditions with Boolean logic. It is made up of three parts:

{
    "bool" : {
        "must" : [],
        "should" : [],
        "must_not" : []
    }
}

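It supports the following operators:

  • must: every clause must match; equivalent to AND
  • must_not: no clause may match; equivalent to NOT
  • should: at least one clause should match; equivalent to OR

Notes:

  1. When must or must_not contains several conditions, they are combined with AND (every must condition has to match, and every must_not condition has to fail), whereas several conditions in should are combined with OR.
  2. When a query contains both must and should, a document does not have to satisfy the should conditions: next to a must (or filter) clause, should is optional by default. A document that does satisfy them gets a higher relevance score.
  3. The minimum_should_match parameter controls how many should conditions must be satisfied, as a count or a percentage. It is normally used together with should.
  4. must, must_not, and should each accept an array of clauses. Conditions that should not take part in relevance scoring can be written in the filter clause of the bool query.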
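The notes above can be modelled in a few lines of Python. This is only a sketch of the matching rules, not Elasticsearch code: the function names resolve_min_should_match and bool_matches are made up for illustration, clauses are plain Python predicates, and scoring is ignored. The default for minimum_should_match (0 when a must or filter clause is present, otherwise 1) and the truncation of percentages follow the documented behaviour as I understand it.

```python
def resolve_min_should_match(spec, n_should, has_must_or_filter):
    """Return how many should clauses have to match.

    spec is None (use the default), an integer such as 1 or -1,
    or a percentage string such as "75%" or "-25%".
    """
    if spec is None:
        # Default: should is optional next to must/filter, otherwise one is required.
        if n_should == 0 or has_must_or_filter:
            return 0
        return 1
    if isinstance(spec, str) and spec.endswith("%"):
        raw = n_should * int(spec[:-1]) / 100
    else:
        raw = int(spec)
    # Negative values mean "this many clauses may be missing".
    # Fractions are truncated toward zero.
    needed = n_should + int(raw) if raw < 0 else int(raw)
    return max(needed, 0)


def bool_matches(doc, must=(), filter_=(), should=(), must_not=(),
                 minimum_should_match=None):
    """Decide whether one document matches a bool query (no scoring)."""
    if not all(clause(doc) for clause in list(must) + list(filter_)):
        return False
    if any(clause(doc) for clause in must_not):
        return False
    needed = resolve_min_should_match(
        minimum_should_match, len(should), bool(must) or bool(filter_))
    matched = sum(1 for clause in should if clause(doc))
    return matched >= needed


# Hypothetical document and predicates, used only to exercise the rules.
doc = {"first_name": "Li", "last_name": "Haijing", "age": 35}
older_than_30 = lambda d: d["age"] > 30
is_38 = lambda d: d["age"] == 38
is_smith = lambda d: d["last_name"] == "Smith"
is_douglas = lambda d: d["first_name"] == "Douglas"

optional = bool_matches(doc, filter_=[older_than_30], must_not=[is_38],
                        should=[is_smith, is_douglas])
required = bool_matches(doc, filter_=[older_than_30], must_not=[is_38],
                        should=[is_smith, is_douglas], minimum_should_match=1)
only_should = bool_matches(doc, should=[is_smith, is_douglas])
print(optional, required, only_should)  # True False False
```

With a filter present, the document matches even though neither should predicate holds. Once minimum_should_match is 1, or once should is the only clause, it no longer matches.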

Query example

The test data is as follows:

 

_index   | _type    | _id | _score | first_name | last_name | age | about
megacorp | employee | 5   | 1      | 国庆       |           | 38  | I like to shopping foods
megacorp | employee | 8   | 1      | Li         | Haijing   | 35  | I like to shopping foods1
megacorp | employee | 2   | 1      | Jane       | Smith     | 32  | I like to collect rock albums
megacorp | employee | 4   | 1      | Li         | Haijing   | 35  | I like to shopping foods
megacorp | employee | 6   | 1      | 张国庆     |           | 28  | I like to shopping foods
megacorp | employee | 1   | 1      | John       | Smith     | 25  | I love to go rock climbing
megacorp | employee | 3   | 1      | Douglas    | Fir       | 35  | I like to build cabinets

The mapping is as follows:

{
  "mapping": {
    "employee": {
      "properties": {
        "about": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "age": {
          "type": "long"
        },
        "first_name": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "interests": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        },
        "last_name": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword",
              "ignore_above": 256
            }
          }
        }
      }
    }
  }
}

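The text fields are analyzed with Elasticsearch's default standard analyzer. It splits text on word boundaries as defined by the Unicode Consortium, removes most punctuation, and finally lowercases every term. For example, the term indexed for Smith is the lowercase smith.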
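To make the effect of analysis concrete, here is a rough Python approximation. It is not the real standard analyzer: simple_standard_analyze is a made-up helper that splits on non-word characters and lowercases, which is close enough for the English values in the test data but does not follow the full Unicode segmentation rules (Chinese text, for instance, is handled differently by Elasticsearch).

```python
import re


def simple_standard_analyze(text):
    """Split on non-word characters, then lowercase every token."""
    return [token.lower() for token in re.findall(r"\w+", text)]


print(simple_standard_analyze("Smith"))
# ['smith']
print(simple_standard_analyze("I like to collect rock albums"))
# ['i', 'like', 'to', 'collect', 'rock', 'albums']
```

The index therefore holds smith, not Smith, for the last_name text field.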

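  • Query requirement

      Find everyone who is older than 30 but not 38, and whose first_name is Douglas or whose last_name is Smith. This is equivalent to the following SQL:

select * from employee where age > 30 and age <> 38 and (first_name = 'Douglas' or last_name = 'Smith')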
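As a reference point for the queries that follow, the same predicate can be applied to the test data in plain Python. This is just a sketch: the rows are copied from the test data table, and for the two rows with a single name value the placement in first_name is an assumption (it does not affect the outcome, because both rows are excluded by age).

```python
employees = [
    # For _id 5 and 6 only one name value is known; putting it in
    # first_name is an assumption.
    {"_id": 5, "first_name": "国庆", "last_name": "", "age": 38},
    {"_id": 8, "first_name": "Li", "last_name": "Haijing", "age": 35},
    {"_id": 2, "first_name": "Jane", "last_name": "Smith", "age": 32},
    {"_id": 4, "first_name": "Li", "last_name": "Haijing", "age": 35},
    {"_id": 6, "first_name": "张国庆", "last_name": "", "age": 28},
    {"_id": 1, "first_name": "John", "last_name": "Smith", "age": 25},
    {"_id": 3, "first_name": "Douglas", "last_name": "Fir", "age": 35},
]


def matches_requirement(employee):
    """age > 30 and age <> 38 and (first_name = Douglas or last_name = Smith)"""
    return (
        employee["age"] > 30
        and employee["age"] != 38
        and (employee["first_name"] == "Douglas"
             or employee["last_name"] == "Smith")
    )


expected_ids = [e["_id"] for e in employees if matches_requirement(e)]
print(expected_ids)  # [2, 3]
```

So the requirement should select exactly two documents, _id 2 and _id 3.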

  • The Elasticsearch bool filter query
GET /megacorp/employee/_search
{
    "query" : {
        "bool" : {
            "filter" : {
                "range" : {
                    "age" : { "gt" : 30 }
                }
            },
            "must_not": {
              "term":{"age":38}
            },
            "should": [
                {"term":{"last_name":"Smith"}},
                {"term":{"first_name":"Douglas"}}
              
            ]
        }
    }
}

Result

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 4,
    "max_score" : 0.0,
    "hits" : [
      {
        "_index" : "megacorp",
        "_type" : "employee",
        "_id" : "8",
        "_score" : 0.0,
        "_source" : {
          "first_name" : "Li",
          "last_name" : "Haijing",
          "age" : 35,
          "about" : "I like to shopping foods1",
          "interests" : [
            "music1"
          ]
        }
      },
      {
        "_index" : "megacorp",
        "_type" : "employee",
        "_id" : "2",
        "_score" : 0.0,
        "_source" : {
          "first_name" : "Jane",
          "last_name" : "Smith",
          "age" : 32,
          "about" : "I like to collect rock albums",
          "interests" : [
            "music"
          ]
        }
      },
      {
        "_index" : "megacorp",
        "_type" : "employee",
        "_id" : "4",
        "_score" : 0.0,
        "_source" : {
          "first_name" : "Li",
          "last_name" : "Haijing",
          "age" : "35",
          "about" : "I like to shopping foods",
          "interests" : [
            "forestry"
          ]
        }
      },
      {
        "_index" : "megacorp",
        "_type" : "employee",
        "_id" : "3",
        "_score" : 0.0,
        "_source" : {
          "first_name" : "Douglas",
          "last_name" : "Fir",
          "age" : 35,
          "about" : "I like to build cabinets",
          "interests" : [
            "forestry"
          ]
        }
      }
    ]
  }
}

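The result contains 4 hits, whereas I initially expected only two documents, the ones marked with a red box in the original screenshot (_id 2 and _id 3).

In fact, the documents in the blue box (_id 8 and _id 4) were returned as well. The reason is:

When a query contains must or filter clauses together with should, a document does not have to satisfy the should conditions, because should is optional by default in that case. A document that does satisfy them gets a higher relevance score.

As the example shows, the should conditions are optional here: whether or not should is present does not change which documents are hit, only their scores may differ. But what if we want at least one of the should conditions to be satisfied?

There are two ways to do this: set the minimum_should_match parameter, or wrap the should clause in a must clause.

  • Option 1: minimum_should_match sets the minimum number of should conditions that must match. With minimum_should_match set to 1, at least one condition in the should clause has to be satisfied. The query is: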
GET /megacorp/employee/_search
{
    "query" : {
        "bool" : {
            "filter" : {
                "range" : {
                    "age" : { "gt" : 30 }
                }
            },
            "must_not": {
              "term":{"age":38}
            },
            "should": [
                {"term":{"last_name":"Smith"}},
                {"term":{"first_name":"Douglas"}}
              
            ],
            "minimum_should_match":1
        }
    }
}

The result returned now is:

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 0,
    "max_score" : null,
    "hits" : [ ]
  }
}

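There are no hits at all. The should clause uses term queries, which do not analyze the search keyword: the input is matched exactly as typed. The text fields in the index, however, were analyzed with the standard analyzer, so the indexed terms are lowercase. Smith and Douglas therefore match nothing.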
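The mismatch can be reproduced with a toy inverted index in Python. This is a simplified illustration, not how Elasticsearch stores data: analyze, text_index, keyword_index, and term_lookup are made-up names, and only three last_name values from the test data are indexed. The keyword index stores each value unchanged, as the keyword sub-field in the mapping does.

```python
import re


def analyze(text):
    """Rough stand-in for the standard analyzer: split and lowercase."""
    return [token.lower() for token in re.findall(r"\w+", text)]


last_names = {"2": "Smith", "3": "Fir", "8": "Haijing"}

text_index = {}     # analyzed term -> ids (like the last_name text field)
keyword_index = {}  # exact value -> ids (like last_name.keyword)
for doc_id, value in last_names.items():
    for token in analyze(value):
        text_index.setdefault(token, set()).add(doc_id)
    keyword_index.setdefault(value, set()).add(doc_id)


def term_lookup(index, value):
    """A term query looks the value up as-is, without analyzing it."""
    return sorted(index.get(value, set()))


print(term_lookup(text_index, "Smith"))     # []
print(term_lookup(text_index, "smith"))     # ['2']
print(term_lookup(keyword_index, "Smith"))  # ['2']
```

The capitalized value only finds a match where the stored value kept its original case.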

In this case we can match exactly against the keyword sub-field of each field:

GET /megacorp/employee/_search
{
    "query" : {
        "bool" : {
            "filter" : {
                "range" : {
                    "age" : { "gt" : 30 }
                }
            },
            "must_not": {
              "term":{"age":38}
            },
            "should": [
                {"term":{"last_name.keyword":"Smith"}},
                {"term":{"first_name.keyword":"Douglas"}}
              
            ],
            "minimum_should_match":1
        }
    }
}

Result

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 0.9808292,
    "hits" : [
      {
        "_index" : "megacorp",
        "_type" : "employee",
        "_id" : "2",
        "_score" : 0.9808292,
        "_source" : {
          "first_name" : "Jane",
          "last_name" : "Smith",
          "age" : 32,
          "about" : "I like to collect rock albums",
          "interests" : [
            "music"
          ]
        }
      },
      {
        "_index" : "megacorp",
        "_type" : "employee",
        "_id" : "3",
        "_score" : 0.2876821,
        "_source" : {
          "first_name" : "Douglas",
          "last_name" : "Fir",
          "age" : 35,
          "about" : "I like to build cabinets",
          "interests" : [
            "forestry"
          ]
        }
      }
    ]
  }
}
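  • Option 2

Wrap the should clause in must. A bool query that contains only should clauses has to match at least one of them, so the nested bool behaves like an OR that the outer must makes mandatory: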

GET /megacorp/employee/_search
{
    "query" : {
        "bool" : {
            "filter" : {
                "range" : {
                    "age" : { "gt" : 30 }
                }
            },
            "must_not": {
              "term":{"age":38}
            },
            "must": [
                {
                    "bool": {
                        "should": [
                            {"term":{"last_name.keyword":"Smith"}},
                            {"term":{"first_name.keyword":"Douglas"}}
                        ]
                    }
                }
            ]
        }
    }
}

The result is as follows:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 0.9808292,
    "hits" : [
      {
        "_index" : "megacorp",
        "_type" : "employee",
        "_id" : "2",
        "_score" : 0.9808292,
        "_source" : {
          "first_name" : "Jane",
          "last_name" : "Smith",
          "age" : 32,
          "about" : "I like to collect rock albums",
          "interests" : [
            "music"
          ]
        }
      },
      {
        "_index" : "megacorp",
        "_type" : "employee",
        "_id" : "3",
        "_score" : 0.2876821,
        "_source" : {
          "first_name" : "Douglas",
          "last_name" : "Fir",
          "age" : 35,
          "about" : "I like to build cabinets",
          "interests" : [
            "forestry"
          ]
        }
      }
    ]
  }
}

 
