The bool filter
The bool filter combines the results of several filter conditions using Boolean logic. A bool filter consists of three parts:
{
"bool" : {
"must" : [],
"should" : [],
"must_not" : []
}
}
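It contains the following operators:
- must: every clause must match, equivalent to AND.
- must_not: no clause may match, equivalent to NOT.
- should: at least one clause should match, equivalent to OR.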
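Notes:
- When must or must_not contains several clauses, every clause applies (the conditions are combined with AND); several clauses inside should are combined with OR.
- When a query contains both must and should, a document can match without satisfying any should clause: once a must or filter clause is present, minimum_should_match defaults to 0. A document that also satisfies a should clause gets a higher relevance score.
- The minimum_should_match parameter sets how many should clauses have to match, as a count or a percentage; it is normally used together with should.
- must, must_not, and should all accept arrays. When a condition inside a bool compound query should not contribute to the relevance score, put it in the filter clause instead.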
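The notes on minimum_should_match can be made concrete with a small Python sketch. The helper name `required_should_count` is hypothetical, and the sketch only covers a non-negative integer or a positive percentage such as "75%"; negative values and combined expressions, which Elasticsearch also accepts, are left out.

```python
from typing import Optional, Union


def required_should_count(
    num_should: int,
    has_must_or_filter: bool,
    minimum_should_match: Optional[Union[int, str]] = None,
) -> int:
    """Return how many should clauses a document has to satisfy (simplified model)."""
    if minimum_should_match is None:
        # Default: should is optional when a must or filter clause is present,
        # otherwise at least one should clause has to match.
        if num_should == 0 or has_must_or_filter:
            return 0
        return 1
    if isinstance(minimum_should_match, str) and minimum_should_match.endswith("%"):
        # A positive percentage of the should clauses, rounded down.
        percent = int(minimum_should_match[:-1])
        return (num_should * percent) // 100
    return int(minimum_should_match)


print(required_should_count(2, has_must_or_filter=True))      # 0
print(required_should_count(2, has_must_or_filter=False))     # 1
print(required_should_count(2, True, minimum_should_match=1))  # 1
print(required_should_count(4, True, "75%"))                   # 3
```

The first two calls show why the same should array behaves differently depending on whether a must or filter clause sits next to it.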
Query example
The test data is as follows:
_index | _type | _id | _score | first_name | last_name | age | about |
---|---|---|---|---|---|---|---|
megacorp | employee | 5 | 1 | 李 | 国庆 | 38 | I like to shopping foods |
megacorp | employee | 8 | 1 | Li | Haijing | 35 | I like to shopping foods1 |
megacorp | employee | 2 | 1 | Jane | Smith | 32 | I like to collect rock albums |
megacorp | employee | 4 | 1 | Li | Haijing | 35 | I like to shopping foods |
megacorp | employee | 6 | 1 | 张 | 张国庆 | 28 | I like to shopping foods |
megacorp | employee | 1 | 1 | John | Smith | 25 | I love to go rock climbing |
megacorp | employee | 3 | 1 | Douglas | Fir | 35 | I like to build cabinets |
The mapping is as follows:
{
"mapping": {
"employee": {
"properties": {
"about": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"age": {
"type": "long"
},
"first_name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"interests": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"last_name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}
}
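The text fields are analyzed with the default standard analyzer of es. It splits text on the word boundaries defined by the Unicode Consortium, removes most punctuation, and finally lowercases every term. For example, the term indexed for Smith is the lowercase smith.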
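As a rough illustration of what ends up in the index, the following Python sketch imitates the lowercase-and-split behaviour for simple ASCII text. The function `approx_standard_tokens` is a hypothetical stand-in, not the real analyzer: the real one follows the Unicode segmentation rules and treats apostrophes, CJK text, and other cases differently.

```python
import re
from typing import List


def approx_standard_tokens(text: str) -> List[str]:
    """Lowercase the text and keep runs of ASCII letters and digits as terms.

    Assumption: this only approximates the standard analyzer for plain
    English words and numbers.
    """
    return re.findall(r"[a-z0-9]+", text.lower())


print(approx_standard_tokens("Smith"))
print(approx_standard_tokens("I like to collect rock albums"))
```

For the last_name value Smith this yields the single lowercase term smith, which is the form a query has to match against the text field.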
- Query requirement
Find everyone who is older than 30 but not 38, and whose first_name is Douglas or whose last_name is Smith. This is equivalent to the following SQL:
select * from employee where age>30 and age<>38 and (first_name="Douglas" or last_name="Smith")
- The bool filter query in es
GET /megacorp/employee/_search
{
"query" : {
"bool" : {
"filter" : {
"range" : {
"age" : { "gt" : 30 }
}
},
"must_not": {
"term":{"age":38}
},
"should": [
{"term":{"last_name":"Smith"}},
{"term":{"first_name":"Douglas"}}
]
}
}
}
Result
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 4,
"max_score" : 0.0,
"hits" : [
{
"_index" : "megacorp",
"_type" : "employee",
"_id" : "8",
"_score" : 0.0,
"_source" : {
"first_name" : "Li",
"last_name" : "Haijing",
"age" : 35,
"about" : "I like to shopping foods1",
"interests" : [
"music1"
]
}
},
{
"_index" : "megacorp",
"_type" : "employee",
"_id" : "2",
"_score" : 0.0,
"_source" : {
"first_name" : "Jane",
"last_name" : "Smith",
"age" : 32,
"about" : "I like to collect rock albums",
"interests" : [
"music"
]
}
},
{
"_index" : "megacorp",
"_type" : "employee",
"_id" : "4",
"_score" : 0.0,
"_source" : {
"first_name" : "Li",
"last_name" : "Haijing",
"age" : "35",
"about" : "I like to shopping foods",
"interests" : [
"forestry"
]
}
},
{
"_index" : "megacorp",
"_type" : "employee",
"_id" : "3",
"_score" : 0.0,
"_source" : {
"first_name" : "Douglas",
"last_name" : "Fir",
"age" : 35,
"about" : "I like to build cabinets",
"interests" : [
"forestry"
]
}
}
]
}
}
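Four documents were hit, whereas I initially expected only two: the documents marked with a red box in the figure (_id 2 and 3).
In practice, the documents in the blue box (_id 8 and 4) were returned as well. The reason is:
When a query contains must-type clauses (here filter and must_not) together with should, the should clauses do not have to match, because a filter or must clause makes minimum_should_match default to 0. A document that also satisfies a should clause only gets a higher relevance score.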
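The gap between the expectation and the actual result can be reproduced outside es with a few lines of Python. This is only a sketch over the test data from the table: the names `EMPLOYEES` and `hits` are hypothetical, and the should conditions are modelled as plain string comparisons, so the sketch shows the intended logic rather than how es evaluates term queries.

```python
# Test data from the table above: (_id, first_name, last_name, age)
EMPLOYEES = [
    (5, "李", "国庆", 38),
    (8, "Li", "Haijing", 35),
    (2, "Jane", "Smith", 32),
    (4, "Li", "Haijing", 35),
    (6, "张", "张国庆", 28),
    (1, "John", "Smith", 25),
    (3, "Douglas", "Fir", 35),
]


def hits(require_should: bool) -> list:
    """Return the _id values that pass the filter, must_not and (optionally) should."""
    ids = []
    for doc_id, first_name, last_name, age in EMPLOYEES:
        # filter: age > 30, must_not: age == 38
        if not (age > 30 and age != 38):
            continue
        # should: last_name is Smith OR first_name is Douglas
        if require_should and not (last_name == "Smith" or first_name == "Douglas"):
            continue
        ids.append(doc_id)
    return ids


print(hits(require_should=False))  # [8, 2, 4, 3]  should is optional: four hits
print(hits(require_should=True))   # [2, 3]        the SQL semantics: two hits
```

With should treated as optional, the filter and must_not conditions alone decide the hits, which is exactly the set of four documents returned above.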
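The example above shows that the should clauses are optional here: with or without should, the same documents are hit, and only the scores may differ. But what if at least one of the should clauses has to match?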
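There are two ways to solve this: set the minimum_should_match parameter, or wrap the should clauses in a must.
- Option 1: minimum_should_match is the minimum number of should clauses that have to match. With minimum_should_match set to 1, at least one should clause has to match. The query is: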
GET /megacorp/employee/_search
{
"query" : {
"bool" : {
"filter" : {
"range" : {
"age" : { "gt" : 30 }
}
},
"must_not": {
"term":{"age":38}
},
"should": [
{"term":{"last_name":"Smith"}},
{"term":{"first_name":"Douglas"}}
],
"minimum_should_match":1
}
}
}
This time the result is:
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 0,
"max_score" : null,
"hits" : [ ]
}
}
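There are no hits at all. The reason is that the should clauses use term queries, which do not analyze the search keyword: the input is matched exactly as typed. The text fields in es, however, were indexed with the standard analyzer, so the indexed terms are lowercase, and Smith and Douglas therefore match no document.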
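The mismatch can be sketched in Python as follows. Both helpers are hypothetical and simplified: `term_matches_text` assumes that lowercasing and splitting on whitespace is close enough to the standard analyzer for simple names, and `term_matches_keyword` assumes that a keyword sub-field stores the original value unchanged.

```python
def term_matches_text(field_value: str, query_term: str) -> bool:
    """term query against an analyzed text field: compare with the indexed terms."""
    # Assumption: lowercase + whitespace split stands in for the standard analyzer.
    indexed_terms = set(field_value.lower().split())
    return query_term in indexed_terms


def term_matches_keyword(field_value: str, query_term: str) -> bool:
    """term query against a keyword sub-field: compare with the unmodified value."""
    return field_value == query_term


print(term_matches_text("Smith", "Smith"))     # False: the index holds "smith"
print(term_matches_text("Smith", "smith"))     # True
print(term_matches_keyword("Smith", "Smith"))  # True
```

The query term is never lowercased, so it only finds a match where the stored form has kept its original case.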
To match exactly, we can query the keyword sub-field of each field instead:
GET /megacorp/employee/_search
{
"query" : {
"bool" : {
"filter" : {
"range" : {
"age" : { "gt" : 30 }
}
},
"must_not": {
"term":{"age":38}
},
"should": [
{"term":{"last_name.keyword":"Smith"}},
{"term":{"first_name.keyword":"Douglas"}}
],
"minimum_should_match":1
}
}
}
Result
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 2,
"max_score" : 0.9808292,
"hits" : [
{
"_index" : "megacorp",
"_type" : "employee",
"_id" : "2",
"_score" : 0.9808292,
"_source" : {
"first_name" : "Jane",
"last_name" : "Smith",
"age" : 32,
"about" : "I like to collect rock albums",
"interests" : [
"music"
]
}
},
{
"_index" : "megacorp",
"_type" : "employee",
"_id" : "3",
"_score" : 0.2876821,
"_source" : {
"first_name" : "Douglas",
"last_name" : "Fir",
"age" : 35,
"about" : "I like to build cabinets",
"interests" : [
"forestry"
]
}
}
]
}
}
- Option 2
Wrap the should clauses in a must:
GET /megacorp/employee/_search
{
"query" : {
"bool" : {
"filter" : {
"range" : {
"age" : { "gt" : 30 }
}
},
"must_not": {
"term":{"age":38}
},
"must":[
{
"bool":{
"should": [
{"term":{"last_name.keyword":"Smith"}},
{"term":{"first_name.keyword":"Douglas"}}
]
}
}
]
}
}
}
The result is:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 2,
"max_score" : 0.9808292,
"hits" : [
{
"_index" : "megacorp",
"_type" : "employee",
"_id" : "2",
"_score" : 0.9808292,
"_source" : {
"first_name" : "Jane",
"last_name" : "Smith",
"age" : 32,
"about" : "I like to collect rock albums",
"interests" : [
"music"
]
}
},
{
"_index" : "megacorp",
"_type" : "employee",
"_id" : "3",
"_score" : 0.2876821,
"_source" : {
"first_name" : "Douglas",
"last_name" : "Fir",
"age" : 35,
"about" : "I like to build cabinets",
"interests" : [
"forestry"
]
}
}
]
}
}
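Both options return the same two documents. One way to see why is a tiny Python interpreter for the subset of the query DSL used in this article. It is only a sketch under simplifying assumptions: `matches` and `search` are hypothetical names, term is modelled as exact equality on the stored value, range only supports gt, and scoring is ignored. The key point is that the inner bool in option 2 has should but no must or filter, so its minimum_should_match defaults to 1.

```python
from typing import Any, Dict, List

# Test data, with the keyword sub-fields holding the unmodified values.
DOCS = [
    {"_id": 5, "first_name.keyword": "李", "last_name.keyword": "国庆", "age": 38},
    {"_id": 8, "first_name.keyword": "Li", "last_name.keyword": "Haijing", "age": 35},
    {"_id": 2, "first_name.keyword": "Jane", "last_name.keyword": "Smith", "age": 32},
    {"_id": 4, "first_name.keyword": "Li", "last_name.keyword": "Haijing", "age": 35},
    {"_id": 6, "first_name.keyword": "张", "last_name.keyword": "张国庆", "age": 28},
    {"_id": 1, "first_name.keyword": "John", "last_name.keyword": "Smith", "age": 25},
    {"_id": 3, "first_name.keyword": "Douglas", "last_name.keyword": "Fir", "age": 35},
]


def as_list(clauses: Any) -> List[Dict[str, Any]]:
    """A bool section may hold a single clause or an array of clauses."""
    if clauses is None:
        return []
    return clauses if isinstance(clauses, list) else [clauses]


def matches(query: Dict[str, Any], doc: Dict[str, Any]) -> bool:
    kind, body = next(iter(query.items()))
    if kind == "term":
        field, value = next(iter(body.items()))
        return doc.get(field) == value
    if kind == "range":
        field, bounds = next(iter(body.items()))
        return doc[field] > bounds["gt"]
    if kind == "bool":
        required = as_list(body.get("must")) + as_list(body.get("filter"))
        should = as_list(body.get("should"))
        must_not = as_list(body.get("must_not"))
        # Default: 1 when there are only should clauses, otherwise 0.
        default_minimum = 1 if should and not required else 0
        minimum = body.get("minimum_should_match", default_minimum)
        matched_should = sum(matches(q, doc) for q in should)
        return (
            all(matches(q, doc) for q in required)
            and not any(matches(q, doc) for q in must_not)
            and matched_should >= minimum
        )
    raise ValueError("unsupported query type: " + kind)


def search(query: Dict[str, Any]) -> List[int]:
    return [doc["_id"] for doc in DOCS if matches(query, doc)]


SHOULD = [
    {"term": {"last_name.keyword": "Smith"}},
    {"term": {"first_name.keyword": "Douglas"}},
]
COMMON = {
    "filter": {"range": {"age": {"gt": 30}}},
    "must_not": {"term": {"age": 38}},
}
OPTION_1 = {"bool": {**COMMON, "should": SHOULD, "minimum_should_match": 1}}
OPTION_2 = {"bool": {**COMMON, "must": [{"bool": {"should": SHOULD}}]}}

print(search(OPTION_1))  # [2, 3]
print(search(OPTION_2))  # [2, 3]
```

Which form to use is largely a matter of taste: minimum_should_match keeps the query flat, while the nested bool keeps the OR group self-contained when more clauses are added around it.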