Basics
A bool query is a combination of one or more query clauses. There are four clause types in total: two of them (must and should) contribute to the relevance score, and two (filter and must_not) do not.
Positive matching
Clause order does not matter, but if there is no must clause, then at least one should clause has to match.
Here is an example. First, insert some data:
POST /products/_bulk
{ "index": { "_id": 1 }}
{ "price" : 10,"avaliable":true,"date":"2018-01-01", "productID" : "XHDK-A-1293-#fJ3" }
{ "index": { "_id": 2 }}
{ "price" : 20,"avaliable":true,"date":"2019-01-01", "productID" : "KDKE-B-9947-#kL5" }
{ "index": { "_id": 3 }}
{ "price" : 30,"avaliable":true, "productID" : "JODL-X-1937-#pV7" }
{ "index": { "_id": 4 }}
{ "price" : 30,"avaliable":false, "productID" : "QQPX-R-3956-#aD8" }
Then run a bool query:
POST /products/_search
{
  "query": {
    "bool": {
      "must": {
        "term": { "price": "30" }
      },
      "filter": {
        "term": { "avaliable": "true" }
      },
      "must_not": {
        "range": {
          "price": { "lte": 10 }
        }
      },
      "should": [
        { "term": { "productID.keyword": "JODL-X-1937-#pV7" } },
        { "term": { "productID.keyword": "XHDK-A-1293-#fJ3" } }
      ],
      "minimum_should_match": 1
    }
  }
}
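As a rough mental model of the query above (an illustrative sketch only, not Elasticsearch's actual Lucene scoring code; `bool_score` is a hypothetical helper), the scoring clauses add up while the filter-context clauses only include or exclude the document:

```python
def bool_score(must_scores, should_scores, filter_ok, must_not_hit):
    """Toy model of bool query scoring.

    must_scores / should_scores: scores of the matching scoring clauses.
    filter_ok: True if all filter clauses match (filter context, no score).
    must_not_hit: True if any must_not clause matches.
    """
    if not filter_ok or must_not_hit:
        return None  # document is excluded entirely
    # Scoring clauses (must, should) simply sum.
    return sum(must_scores) + sum(should_scores)

# A document matching the must clause (1.2) and one should clause (0.8):
print(bool_score([1.2], [0.8], filter_ok=True, must_not_hit=False))  # 2.0
```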
Negative matching
You can also demote documents with a boosting query: the negative clause names the condition to demote, and negative_boost is the factor that the score of matching documents is multiplied by.
Here is an example. First, insert some data:
POST /news/_bulk
{ "index": { "_id": 1 }}
{ "content":"Apple Mac" }
{ "index": { "_id": 2 }}
{ "content":"Apple iPad" }
{ "index": { "_id": 3 }}
{ "content":"Apple employee like Apple Pie and Apple Juice" }
Then run a boosting query with a negative clause:
POST news/_search
{
  "query": {
    "boosting": {
      "positive": {
        "match": {
          "content": "apple"
        }
      },
      "negative": {
        "match": {
          "content": "pie"
        }
      },
      "negative_boost": 0.5
    }
  }
}
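The effect on a single document can be sketched like this (a simplified model of the documented boosting-query behaviour, not the real implementation; `boosting_score` is a hypothetical helper):

```python
def boosting_score(positive_score, matches_negative, negative_boost):
    """Toy model of the boosting query: a document that also matches the
    negative clause keeps its positive score multiplied by negative_boost;
    other documents keep their score unchanged."""
    return positive_score * negative_boost if matches_negative else positive_score

# Document 3 mentions "pie", so its score for "apple" is halved:
print(boosting_score(2.0, True, 0.5))   # 1.0
print(boosting_score(2.0, False, 0.5))  # 2.0
```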
Nested bool queries
Query clauses at the same level carry the same weight, so we can change the relative weight of clauses by nesting them:
POST /animals/_search
{
  "query": {
    "bool": {
      "should": [
        { "term": { "text": "quick" }},
        { "term": { "text": "dog" }},
        {
          "bool": {
            "should": [
              { "term": { "text": "brown" }},
              { "term": { "text": "brown" }}
            ]
          }
        }
      ]
    }
  }
}
Single-string queries
dis_max (disjunction max) runs all of its clauses, but scores each document by its single best-matching clause rather than their sum, so the document with the strongest individual clause match comes first.
PUT /blogs/_doc/1
{
  "title": "Quick brown rabbits",
  "body": "Brown rabbits are commonly seen."
}
PUT /blogs/_doc/2
{
  "title": "Keeping pets healthy",
  "body": "My quick brown fox eats rabbits on a regular basis."
}
should query
With the following should query, the two documents receive almost the same score:
POST /blogs/_search
{
  "query": {
    "bool": {
      "should": [
        { "match": { "title": "Brown fox" }},
        { "match": { "body": "Brown fox" }}
      ]
    }
  }
}
But replacing should with dis_max changes that.
dis_max query
POST blogs/_search
{
  "query": {
    "dis_max": {
      "queries": [
        { "match": { "title": "Brown fox" }},
        { "match": { "body": "Brown fox" }}
      ]
    }
  }
}
Since document 2 matches better (its body contains the exact phrase "brown fox"), it now ranks first.
For the following dis_max, however, neither document matches either clause in full, so the two scores come out the same. Note that dis_max, like should, treats a document as a hit as long as any single clause matches.
POST blogs/_search
{
  "query": {
    "dis_max": {
      "queries": [
        { "match": { "title": "Quick pets" }},
        { "match": { "body": "Quick pets" }}
      ]
    }
  }
}
But document 2 clearly matches more terms overall ("pets" in its title and "quick" in its body), so we add the tie_breaker parameter, which folds the non-best clauses back into the score and thereby demotes document 1 relative to document 2:
POST blogs/_search
{
  "query": {
    "dis_max": {
      "queries": [
        { "match": { "title": "Quick pets" }},
        { "match": { "body": "Quick pets" }}
      ],
      "tie_breaker": 0.2
    }
  }
}
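Per the dis_max documentation, the final score is the best clause's score plus tie_breaker times the score of each remaining matching clause; with tie_breaker at its default of 0 this reduces to a plain maximum. A small sketch (`dis_max_score` is a hypothetical helper; the clause scores are made-up numbers):

```python
def dis_max_score(clause_scores, tie_breaker=0.0):
    """Score = best clause + tie_breaker * (sum of the other clauses)."""
    best = max(clause_scores)
    others = sum(clause_scores) - best
    return best + tie_breaker * others

# Document 2 matches both clauses (say 1.0 on body, 0.5 on title),
# document 1 matches only one clause (say 1.0 on title):
print(dis_max_score([1.0, 0.5], tie_breaker=0.2))  # 1.1 -> doc 2 wins
print(dis_max_score([1.0], tie_breaker=0.2))       # 1.0
```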
Single-string, multi-field queries
There are three scenarios: best fields, most fields, and cross fields.
Best fields
When the fields compete with each other but are also related, the score comes from the best-matching field (best_fields is essentially multi_match's shorthand for a dis_max over per-field queries):
POST blogs/_search
{
  "query": {
    "multi_match": {
      "type": "best_fields",
      "query": "quick pets",
      "fields": ["title", "body"],
      "tie_breaker": 0.2
    }
  }
}
Most fields
The more fields that match, the better. A common pattern: the main field uses the English analyzer, which stems terms (and can add synonyms) to match more documents, while a subfield over the same text uses the Standard analyzer to provide more precise matching.
First, insert some data:
PUT test_mul/_doc/1
{
  "k1": "a",
  "k2": "a b",
  "k3": "a c d",
  "k4": "e f c"
}
PUT test_mul/_doc/2
{
  "k1": "a",
  "k2": "b",
  "k3": "a c d",
  "k4": "e f c"
}
Then run a most_fields query:
POST test_mul/_search
{
  "query": {
    "multi_match": {
      "query": "a",
      "type": "most_fields",
      "fields": ["k1", "k2", "k3", "k4"]
    }
  }
}
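most_fields sums the per-field scores, so the more fields a document matches, the higher it ranks. With the sample data above, document 1 contains the term a in three fields while document 2 contains it in only two, which is why document 1 ranks first. A quick way to see this (naive whitespace tokenization stands in for the analyzer; `matching_fields` is a hypothetical helper):

```python
docs = {
    1: {"k1": "a", "k2": "a b", "k3": "a c d", "k4": "e f c"},
    2: {"k1": "a", "k2": "b",   "k3": "a c d", "k4": "e f c"},
}

def matching_fields(doc, term):
    # Naive whitespace tokenization instead of a real analyzer.
    return [field for field, text in doc.items() if term in text.split()]

for doc_id, doc in docs.items():
    print(doc_id, matching_fields(doc, "a"))
# 1 ['k1', 'k2', 'k3']
# 2 ['k1', 'k3']
```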
Cross fields
When the information you are looking for is spread across several fields, you want to find as many of the query terms as possible across those fields.
For example, insert this document:
PUT address/_doc/1
{
  "street": "5 Poland Street",
  "city": "London",
  "country": "United Kingdom",
  "postcode": "W1V 3DG"
}
A cross_fields query looks like this:
POST address/_search
{
  "query": {
    "multi_match": {
      "query": "Poland Kingdom W1V",
      "type": "cross_fields",
      "fields": ["street", "country", "postcode"]
    }
  }
}
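cross_fields is term-centric: it treats the listed fields as one combined field and looks for each query term in any of them (so with operator set to and, every term must appear in at least one field). A toy illustration of that check, again with naive whitespace tokenization (`all_terms_somewhere` is a hypothetical helper):

```python
address = {
    "street": "5 Poland Street",
    "country": "United Kingdom",
    "postcode": "W1V 3DG",
}

def all_terms_somewhere(doc, query):
    """Term-centric check: every query term occurs in at least one field."""
    tokens_per_field = [set(value.lower().split()) for value in doc.values()]
    return all(any(term in tokens for tokens in tokens_per_field)
               for term in query.lower().split())

print(all_terms_somewhere(address, "Poland Kingdom W1V"))  # True
print(all_terms_somewhere(address, "Poland Berlin"))       # False
```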
Chinese and multilingual analysis
Chinese analysis
Chinese word segmentation can be added to Elasticsearch by installing analysis plugins,
for example the IK analyzer and the HanLP analyzer:
# install the IK plugin
./elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v7.1.0/elasticsearch-analysis-ik-7.1.0.zip
# install the HanLP plugin
./elasticsearch-plugin install https://github.com/KennFalcon/elasticsearch-analysis-hanlp/releases/download/v7.1.0/elasticsearch-analysis-hanlp-7.1.0.zip
Pinyin analyzer
The pinyin token filter used below comes from yet another plugin, elasticsearch-analysis-pinyin (installed the same way as above). With it we can define an analyzer that tokenizes names into pinyin:
#Pinyin
PUT /artists/
{
  "settings": {
    "analysis": {
      "analyzer": {
        "user_name_analyzer": {
          "tokenizer": "whitespace",
          "filter": "pinyin_first_letter_and_full_pinyin_filter"
        }
      },
      "filter": {
        "pinyin_first_letter_and_full_pinyin_filter": {
          "type": "pinyin",
          "keep_first_letter": true,
          "keep_full_pinyin": false,
          "keep_none_chinese": true,
          "keep_original": false,
          "limit_first_letter_length": 16,
          "lowercase": true,
          "trim_whitespace": true,
          "keep_none_chinese_in_first_letter": true
        }
      }
    }
  }
}
GET /artists/_analyze
{
  "text": ["刘德华 张学友 郭富城 黎明 四大天王"],
  "analyzer": "user_name_analyzer"
}
HanLP analysis
Tokenize with the HanLP standard analyzer:
POST _analyze
{
  "analyzer": "hanlp_standard",
  "text": ["剑桥分析公司多位高管对卧底记者说,他们确保了唐纳德·特朗普在总统大选中获胜"]
}
Precise English matching
For precise English matching, you can index the same field in two ways by giving it a subfield with a different analyzer. In the example below, title is analyzed with the standard analyzer, and its subfield title.tag_e is analyzed with the english analyzer:
PUT /my_index
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "fields": {
          "tag_e": {
            "type": "text",
            "analyzer": "english"
          }
        }
      }
    }
  }
}
Test it:
PUT /my_index/_doc/1
{ "title": "I'm happy for this fox" }
PUT /my_index/_doc/2
{ "title": "I'm not happy about my fox problem" }
GET /my_index/_search
{
  "query": {
    "multi_match": {
      "type": "most_fields",
      "query": "not happy foxes",
      "fields": [ "title", "title.tag_e" ]
    }
  }
}
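The english subfield matters here because the english analyzer stems tokens, so the query term foxes can match documents that only contain fox, while the standard-analyzed title field only matches exact token forms. A toy illustration of the idea (`naive_stem` is a crude suffix-stripping stand-in, nothing like the real stemmer):

```python
def naive_stem(token):
    # Crude stand-in for the english analyzer's stemmer:
    # strip a plural "-es"/"-s" suffix from long-enough tokens.
    for suffix in ("es", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

# Exact matching (standard analyzer) misses; stemmed matching hits:
print("foxes" == "fox")                          # False
print(naive_stem("foxes") == naive_stem("fox"))  # True
```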