Learning Elasticsearch 13: Compound queries, the Bool query

Cape_sir

Category: Learning Elasticsearch. Tags: big data, es, elasticsearch

Article link: https://blog.csdn.net/weixin_42652596/article/details/110380458

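1. The two contexts in ES

1) Query context

A query clause in query context answers the question "How well does this document match this query clause?" Besides deciding whether the document matches, the clause also calculates a relevance score, returned in the _score metadata field.

2) Filter context

A query clause in filter context answers the question "Does this document match this query clause?" The answer is a simple yes or no, and no score is calculated. Filter context is mainly used for filtering structured data.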
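As a rough illustration of the two contexts, the sketch below assembles a search body as a plain Python dictionary. Clauses placed under `must` run in query context and contribute to `_score`, while clauses placed under `filter` run in filter context and only include or exclude documents. The helper name `build_search_body` is hypothetical (it is not part of any Elasticsearch client), and the field names `title` and `author` are borrowed from the index defined later in this article.

```python
import json


def build_search_body(scoring_clauses, filtering_clauses):
    """Assemble a bool query body (hypothetical helper, not an Elasticsearch API).

    scoring_clauses   -> placed under "must"   (query context, affects _score)
    filtering_clauses -> placed under "filter" (filter context, yes/no only)
    """
    return {
        "query": {
            "bool": {
                "must": list(scoring_clauses),
                "filter": list(filtering_clauses),
            }
        }
    }


body = build_search_body(
    scoring_clauses=[{"match": {"title": "es"}}],
    filtering_clauses=[{"term": {"author": "大衣哥"}}],
)
# The JSON text is what would be sent as the body of POST /<index>/_search.
print(json.dumps(body, ensure_ascii=False, indent=2))
```

Keeping the two lists separate makes it easy to see which conditions are meant to rank documents and which are meant only to narrow the result set.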

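2. Introduction to the Bool query

The bool query maps to Lucene's BooleanQuery. It is built from one or more boolean clauses, each with a typed occurrence. There are four clause types:

1) filter: the clause (query) must appear in matching documents. It is executed in filter context, so scoring is ignored and the clause is considered for caching.

2) must: the clause (query) must appear in matching documents and contributes to the score.

3) must_not: the clause (query) must not appear in matching documents. It is executed in filter context, so scoring is ignored and the clause is considered for caching.

4) should: the clause (query) should appear in matching documents. [Pay attention to the minimum number of should clauses that must match.]

Notes on the Bool query:

1) The bool query supports only the four clause types above.
2) Each clause can hold full-text queries, term-level queries, and compound queries such as another bool query.
3) Only must and should clauses contribute to the relevance score; filter and must_not clauses are executed in filter context, so scoring is ignored and the clauses are considered for caching.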
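The way the four clause types combine can be sketched with a toy model for a single document. This is an illustration under simplifying assumptions, not Elasticsearch code: each clause is reduced to a hypothetical `(matched, score)` pair, the score is taken as a plain sum, `minimum_should_match` is passed explicitly (defaulting to 0 here), and the constant score that a query made only of must_not clauses receives is not modeled.

```python
def evaluate_bool(must=(), filter_=(), must_not=(), should=(), minimum_should_match=0):
    """Toy model: decide whether ONE document matches a bool query and score it.

    Every clause is a (matched, score) pair assumed to come from a sub-query.
    Returns (matches, score).
    """
    # must and filter clauses are all required.
    required = list(must) + list(filter_)
    if not all(matched for matched, _ in required):
        return False, 0.0
    # Any matching must_not clause excludes the document.
    if any(matched for matched, _ in must_not):
        return False, 0.0
    # Enough should clauses have to match when a minimum is requested.
    should_hits = [score for matched, score in should if matched]
    if len(should_hits) < minimum_should_match:
        return False, 0.0
    # Only must and matching should clauses add to the score;
    # filter and must_not scores are ignored.
    score = float(sum(score for _, score in must) + sum(should_hits))
    return True, score


print(evaluate_bool(must=[(True, 1.5)], filter_=[(True, 9.0)],
                    should=[(True, 0.5), (False, 2.0)]))
```

In the call above the filter clause matches but its score of 9.0 never reaches the total, which is the point of note 3.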

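3. The Bool query clauses in detail

First create the index and load some sample data. The mapping below declares a _doc mapping type, as in Elasticsearch 6.x, and uses the ik_smart and ik_max_word analyzers from the IK analysis plugin.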

PUT /blogs_index
{
  "settings": {
    "index": {
      "number_of_shards": 1,
      "number_of_replicas": 1
    }
  },
  "mappings": {
    "_doc": {
      "dynamic": false,
      "properties": {
        "id": {
          "type": "integer"
        },
        "author": {
          "type": "keyword"
        },
        "title": {
          "type": "text",
          "analyzer": "ik_smart"
        },
        "content": {
          "type": "text",
          "analyzer": "ik_max_word",
          "search_analyzer": "ik_smart"
        },
        "tag": {
          "type": "keyword"
        },
        "influence": {
          "type": "integer_range"
        },
        "createAt": {
          "type": "date",
          "format": "yyyy-MM-dd HH:mm"
        }
      }
    }
  }
}

POST _bulk
{"index":{"_index":"blogs_index","_type":"_doc","_id":"1"}}
{"id":1,"author":"大衣哥","title":"关注我，系统学习es","content":"这是关于es的文章","tag":[1,2,3],"influence":{"gte":10,"lte":12},"createAt":"2020-05-24 10:56"}
{"index":{"_index":"blogs_index","_type":"_doc","_id":"2"}}
{"id":2,"author":"大衣哥","title":"系统学习编程","content":"这是关于编程的文章","tag":[2,3,4],"influence":{"gte":12,"lte":15},"createAt":"2020-05-23 10:56"}
{"index":{"_index":"blogs_index","_type":"_doc","_id":"3"}}
{"id":3,"author":"大衣哥","title":"关注我，必看文章","content":"这是关于关于es和编程的必看文章","tag":[2,3,4],"influence":{"gte":12,"lte":15},"createAt":"2020-05-22 10:56"}
{"index":{"_index":"blogs_index","_type":"_doc","_id":"4"}}
{"id":4,"author":"大衣","title":"关注我，系统学习es","content":"这是关于es的文章","tag":[1,2,3],"influence":{"gte":10,"lte":15},"createAt":"2020-05-24 10:56"}
{"index":{"_index":"blogs_index","_type":"_doc","_id":"5"}}
{"id":5,"author":"大衣","title":"系统学习编程","content":"这是关于编程的文章","tag":[2,3,4],"influence":{"gte":12,"lte":18},"createAt":"2020-05-25 10:56"}
{"index":{"_index":"blogs_index","_type":"_doc","_id":"6"}}
{"id":6,"author":"大衣","title":"关注我，必看文章","content":"这是关于关于es和编程的必看文章","tag":[2,3,4],"influence":{"gte":15,"lte":18},"createAt":"2020-05-20 10:56"}

1) Using filter

[Statement 1]: a filter clause containing several full-text and term-level queries

POST /blogs_index/_search
{
  "query": {
    "bool": {
      "filter": [
        {
          "term": {
            "author": "大衣哥"
          }
        },
        {
          "term": {
            "tag": "2"
          }
        },
        {
          "match": {
            "title": "es"
          }
        }
      ]
    }
  }
}

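Result: the query returns document 1. The logic is: author = "大衣哥" AND tag contains 2 AND the postings list of title contains "es".

[Statement 2]: a filter clause can also contain a bool query to express more complex logic.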

POST /blogs_index/_search
{
  "query": {
    "bool": {
      "filter": {
        "bool": {
          "must": [
            {
              "term": {
                "author": "大衣哥"
              }
            },
            {
              "term": {
                "tag": "2"
              }
            }
          ],
          "should": [
            {
              "match": {
                "title": "es"
              }
            },
            {
              "match": {
                "content": "es"
              }
            }
          ]
        }
      }
    }
  }
}

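Result: the query returns documents 1 and 3. The logic is: author = "大衣哥" AND tag contains 2 AND (the postings list of title contains "es" OR the postings list of content contains "es").

Note: a query that uses only filter does not calculate relevance scores, so _score is 0.

2) Using must

[Statement 3]: [Statement 1] with filter replaced by must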

POST /blogs_index/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "author": "大衣哥"
          }
        },
        {
          "term": {
            "tag": "2"
          }
        },
        {
          "match": {
            "title": "es"
          }
        }
      ]
    }
  }
}

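Result: the query returns document 1. The matching logic is identical to [Statement 1]; the only difference is that must calculates a relevance score.

3) Using must_not

[Statement 4]: [Statement 3] with must replaced by must_not and the tag condition removed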

POST /blogs_index/_search
{
  "query": {
    "bool": {
      "must_not": [
        {
          "term": {
            "author": "大衣哥"
          }
        },
        {
          "match": {
            "title": "es"
          }
        }
      ]
    }
  }
}

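Result: the query returns documents 5 and 6. The logic is: author != "大衣哥" AND the postings list of title does not contain "es".

In addition, a bool query made up only of must_not clauses gives every hit a constant relevance score of 1.

4) Using should

[Statement 5]: [Statement 4] with must_not replaced by should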

POST /blogs_index/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "term": {
            "author": "大衣哥"
          }
        },
        {
          "match": {
            "title": "es"
          }
        }
      ]
    }
  }
}

Result: the query returns documents 1, 2, 3, and 4. The logic is: author = "大衣哥" OR the postings list of title contains "es".
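The results of Statements 1, 4, and 5 can be checked with a small in-memory sketch over the six sample documents. It rests on loud assumptions: `term`, `match`, and `search` are hypothetical helpers, a substring test stands in for the real IK analysis of the title field, tag values are stored as strings as they would be in a keyword field, and scoring is ignored. Because filter and must select the same documents, the clauses of Statement 1 are passed as `must`.

```python
# The six sample documents, reduced to the fields used by Statements 1, 4 and 5.
DOCS = {
    1: {"author": "大衣哥", "tag": ["1", "2", "3"], "title": "关注我，系统学习es"},
    2: {"author": "大衣哥", "tag": ["2", "3", "4"], "title": "系统学习编程"},
    3: {"author": "大衣哥", "tag": ["2", "3", "4"], "title": "关注我，必看文章"},
    4: {"author": "大衣", "tag": ["1", "2", "3"], "title": "关注我，系统学习es"},
    5: {"author": "大衣", "tag": ["2", "3", "4"], "title": "系统学习编程"},
    6: {"author": "大衣", "tag": ["2", "3", "4"], "title": "关注我，必看文章"},
}


def term(field, value):
    """Exact match on a keyword field (single value or list of values)."""
    def check(doc):
        stored = doc[field]
        return value in stored if isinstance(stored, list) else stored == value
    return check


def match(field, text):
    """Stand-in for a match query: substring test instead of real analysis."""
    return lambda doc: text in doc[field]


def search(must=(), must_not=(), should=()):
    """Return the ids of documents selected by the given clauses."""
    hits = []
    for doc_id, doc in sorted(DOCS.items()):
        if not all(clause(doc) for clause in must):
            continue
        if any(clause(doc) for clause in must_not):
            continue
        # With no must clause, at least one should clause has to match.
        if should and not must and not any(clause(doc) for clause in should):
            continue
        hits.append(doc_id)
    return hits


statement_1 = search(must=[term("author", "大衣哥"), term("tag", "2"), match("title", "es")])
statement_4 = search(must_not=[term("author", "大衣哥"), match("title", "es")])
statement_5 = search(should=[term("author", "大衣哥"), match("title", "es")])
print(statement_1, statement_4, statement_5)
```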

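4. The should clause in detail

1) When should only affects the score

If a bool query (the inner one) sits under the must clause of another bool query (the outer one), it runs in query context. Because the inner bool also has must clauses of its own, a document matches even when none of its should clauses match. In that case the should clauses only influence the score.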

POST /blogs_index/_search
{
  "query": {
    "bool": {
      "must": {
        "bool": {
          "must": [
            {
              "term": {
                "author": "大衣哥"
              }
            },
            {
              "term": {
                "tag": "2"
              }
            }
          ],
          "should": [
            {
              "match": {
                "title": "es"
              }
            },
            {
              "match": {
                "content": "es"
              }
            }
          ]
        }
      }
    }
  }
}

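Result: the query returns documents 1, 2, and 3. If should meant "at least one condition must hold", returning documents 1 and 3 would be expected (as in [Statement 2]). Document 2 is returned as well, even though neither its title nor its content contains "es", which shows that here the should clauses only influence the score.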

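2) When at least one should clause must match

If the bool query is in filter context, or has neither a must nor a filter clause, a document must match at least one of the should queries.

For a bool query in filter context, see [Statement 2].
For a bool query with neither must nor filter, see [Statement 5] above.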
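The two cases described so far amount to an implicit default for minimum_should_match, which the following sketch writes out as a function. It encodes only the rule as stated in this article; the function name `default_minimum_should_match` and its boolean parameters are hypothetical. Elasticsearch versions differ in how they treat the filter-context case, so the bool query documentation for the version in use should be checked before relying on it.

```python
def default_minimum_should_match(has_should, has_must, has_filter, in_filter_context):
    """Implicit minimum_should_match, following the rule stated in this article.

    Returns 1 when at least one should clause is required, 0 when the
    should clauses only influence the score.
    """
    if not has_should:
        return 0  # nothing to require
    if in_filter_context:
        return 1  # bool query used as a filter: one should clause must match
    if not has_must and not has_filter:
        return 1  # should clauses are the only positive conditions
    return 0      # query context with must or filter: should is optional


# Inner bool of Statement 2 (has must, runs under filter).
print(default_minimum_should_match(True, True, False, True))
# Inner bool of the example in 1) above (has must, runs under must).
print(default_minimum_should_match(True, True, False, False))
# Statement 5 (only should clauses).
print(default_minimum_should_match(True, False, False, False))
```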

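3) The minimum_should_match parameter

What if, regardless of the case, we want the should clauses to be required, or want several of them to match? This is controlled with the minimum_should_match parameter.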

POST /blogs_index/_search
{
  "query": {
    "bool": {
      "must": {
        "bool": {
          "must": [
            {
              "term": {
                "author": "大衣哥"
              }
            }
          ],
          "should": [
            {
              "match": {
                "title": "es"
              }
            },
            {
              "match": {
                "content": "es"
              }
            },
            {
              "match": {
                "content": "编程"
              }
            }
          ],
          "minimum_should_match":2
        }
      }
    }
  }
}

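Result: the query returns documents 1 and 3. The logic is: author = "大衣哥" AND at least 2 of these 3 conditions hold: the postings list of title contains "es"; the postings list of content contains "es"; the postings list of content contains "编程".

minimum_should_match accepts the following forms:

Notation: N is the number of should clauses that must match, S is the total number of should clauses, and X is the value supplied by the user.
(1) Positive integer, e.g. 2
N = 2: two clauses must match.
(2) Negative integer, e.g. -2
N = S - |X|, i.e. N = S - 2. With 5 clauses, 3 must match.
(3) Positive percentage, e.g. 25%
N = floor(S * X). With 5 clauses, N = floor(1.25) = 1.
(4) Negative percentage, e.g. -25%
N = S - floor(|S * X|). With 5 clauses, N = 5 - 1 = 4.
(5) Combination, e.g. 3<90%
If S <= 3, all clauses must match; if S > 3, only 90% are required, computed as in (3).
(6) Multiple combinations, e.g. 2<-25% 9<3
Several conditional specifications can be separated by spaces. Each one applies only when S is greater than the number before its "<".
In this example: if S <= 2, all clauses must match; if S is 3 to 9, the -25% rule applies (computed as a negative percentage); if S > 9, 3 clauses must match.

Note:
If the value of minimum_should_match is greater than the number of should clauses, the query returns no hits.
If minimum_should_match is 0, no should clause is required to match.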
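The six forms above can be turned into a small calculator. The sketch below is a model of the rules as listed in this article, not the Elasticsearch implementation; the function name `min_should_match` is hypothetical, the specification is passed as a string, and integer arithmetic is used so that percentages round down as in (3) and (4).

```python
def min_should_match(total, spec):
    """Compute N, the number of should clauses that must match.

    total -- S, the total number of should clauses
    spec  -- the minimum_should_match value as a string, e.g. "2", "-25%", "3<90%"
    """
    spec = spec.strip()
    if "<" in spec:
        # Conditional form(s): "bound<inner", separated by spaces.
        result = total  # at or below the first bound, every clause is required
        for condition in spec.split():
            bound, inner = condition.split("<", 1)
            if total <= int(bound):
                return result
            result = min_should_match(total, inner)
        return result
    if spec.endswith("%"):
        percent = int(spec[:-1])
        part = abs(total * percent) // 100  # round down
        result = total - part if percent < 0 else part
    else:
        value = int(spec)
        result = total + value if value < 0 else value
    return max(result, 0)


for s, x in [(5, "2"), (5, "-2"), (5, "25%"), (5, "-25%"), (10, "3<90%"), (12, "2<-25% 9<3")]:
    print(s, x, min_should_match(s, x))
```

Working through the last case by hand: with S = 12 the first condition gives 12 - 3 = 9, and because 12 is also greater than 9 the second condition replaces that with 3.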

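5. A Bool query exercise

Find articles whose influence falls within the range 12 to 20; whose tag contains 3 or 4 but not 1; whose createAt is within the last week; and whose title or content contains at least 2 of the 3 terms "es", "编程", and "必看", with relevance scoring.

A reference answer: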

POST /blogs_index/_search
{
  "query": {
    "bool": {
      "filter": {
        "bool": {
          "must": [
            {
              "range": {
                "influence": {
                  "gte": 12,
                  "lte": 20,
                  "relation": "WITHIN"
                }
              }
            },
            {
              "range": {
                "createAt": {
                  "gte": "now-1w/d"
                }
              }
            },
            {
              "terms": {
                "tag": [
                  3,
                  4
                ]
              }
            }
          ],
          "must_not": [
            {
              "term": {
                "tag": 1
              }
            }
          ]
        }
      },
      "should": [
        {
          "multi_match": {
            "query": "es",
            "fields": [
              "title",
              "content"
            ]
          }
        },
        {
          "multi_match": {
            "query": "编程",
            "fields": [
              "title",
              "content"
            ]
          }
        },
        {
          "multi_match": {
            "query": "必看",
            "fields": [
              "title",
              "content"
            ]
          }
        }
      ],
      "minimum_should_match": 2
    }
  }
}
