Learning Elasticsearch 6: Full text queries, the match query

Cape_sir

Article link: https://blog.csdn.net/weixin_42652596/article/details/110168994


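First, create an index. The requests in this article use the typed request syntax (a _doc mapping type in the mapping and in the URLs) accepted by Elasticsearch 6.x, and the ik_max_word and ik_smart analyzers require the IK analysis plugin.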

PUT /test_001
{
  "settings": {
    "index": {
      "number_of_shards": 1,
      "number_of_replicas": 1
    }
  },
  "mappings": {
    "_doc": {
      "dynamic": false,
      "properties": {
        "id": {
          "type": "integer"
        },
        "content": {
          "type": "keyword",
          "fields": {
            "field1": {
              "type": "text",
              "analyzer": "ik_max_word",
              "search_analyzer": "ik_max_word"
            },
            "field2": {
              "type": "text",
              "analyzer": "ik_smart"
            }
          }
        },
        "createAt": {
          "type": "date"
        }
      }
    }
  }
}

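A brief explanation of the content field mapping, which configures several analyzers for a single field through multi-fields:

"content": {
  "type": "keyword", # the content field itself is of type keyword
  "fields": {
    "field1": { # sub-field of content, named field1
      "type": "text",
      "analyzer": "ik_max_word", # index-time analyzer for field1 (builds the inverted index)
      "search_analyzer": "ik_max_word" # analyzer applied to the search keywords
    },
    "field2": {  # sub-field of content, named field2
      "type": "text",
      "analyzer": "ik_smart" # index-time analyzer for field2 (builds the inverted index)
         # the search analyzer for field2 defaults to the same analyzer, ik_smart
    }
  }
}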

Now add some data to this index:

POST _bulk
{ "index" : { "_index" : "test_001", "_type" : "_doc", "_id" : "1" } }
{ "id" : 1,"content":"关注我，系统学编程" }
{ "index" : { "_index" : "test_001", "_type" : "_doc", "_id" : "2" } }
{ "id" : 2,"content":"系统学编程,就关注我" }
{ "index" : { "_index" : "test_001", "_type" : "_doc", "_id" : "3" } }
{ "id" : 3,"content":"系统编程，求关注" }

1) Searching on the content field itself [keyword]

# 1. Returns no results
POST /test_001/_doc/_search
{
  "query":{
    "match":{
      "content":"系统学"
    }
  }
}

# 2. Returns the document with id = 1
POST /test_001/_doc/_search
{
  "query":{
    "match":{
      "content":"关注我，系统学编程"
    }
  }
}

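Because content is of type keyword, its value is not analyzed when it is indexed: the whole string is stored as a single term. A match query against it only finds a document when the query equals that whole string (as in query 2 above).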
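The exact-match behaviour described above can be sketched in a few lines of Python. This is only a toy model, not how Elasticsearch is implemented: the helper name match_keyword is made up for illustration, and the documents are the three content values indexed earlier.

```python
# Toy model of a keyword field: each value is indexed as one unanalyzed term.
docs = {
    1: "关注我，系统学编程",
    2: "系统学编程,就关注我",
    3: "系统编程，求关注",
}

def match_keyword(query):
    """Return the ids whose stored value equals the query string exactly."""
    return [doc_id for doc_id, value in docs.items() if value == query]

print(match_keyword("系统学"))              # [] : a fragment never equals the whole value
print(match_keyword("关注我，系统学编程"))    # [1] : the whole value matches
```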

2) Searching on content.field1 [ik_max_word]

# 1. Returns all three documents
POST /test_001/_doc/_search
{
  "query":{
    "match":{
      "content.field1":"系统学"
    }
  }
}
# 2. Overriding the search analyzer with ik_smart returns only documents 1 and 2
POST /test_001/_doc/_search
{
  "query": {
    "match": {
      "content.field1": {
        "query": "系统学",
        "analyzer": "ik_smart"
      }
    }
  }
}

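When the index was created, field1 was given ik_max_word as both its index analyzer and its search analyzer. At search time the query [系统学] is therefore split into three tokens, [系统学, 系统, 学], and any stored document that matches at least one of them is returned.

When the search analyzer for field1 is overridden with ik_smart in the query, [系统学] yields a single token, [系统学], so only stored documents that contain the term [系统学] are returned.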
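To make the effect of the search analyzer concrete, here is a rough Python sketch. The search-time tokens are the ones stated above. The index terms in INDEX_TERMS are hand-written assumptions chosen to be consistent with the results reported in this article, not real IK output. The _analyze API can be used to see the actual tokens.

```python
# Search-time tokens for the query "系统学", as described in the article.
SEARCH_TOKENS = {
    "ik_max_word": ["系统学", "系统", "学"],
    "ik_smart": ["系统学"],
}

# ASSUMPTION: illustrative index terms for content.field1, not real IK output.
INDEX_TERMS = {
    1: {"关注", "我", "系统学", "系统", "学", "编程"},
    2: {"系统学", "系统", "学", "编程", "就", "关注", "我"},
    3: {"系统", "编程", "求", "关注"},
}

def search(search_analyzer):
    """Return ids of documents sharing at least one term with the query tokens."""
    tokens = SEARCH_TOKENS[search_analyzer]
    return [doc_id for doc_id, terms in INDEX_TERMS.items()
            if any(token in terms for token in tokens)]

print(search("ik_max_word"))  # [1, 2, 3]
print(search("ik_smart"))     # [1, 2]
```

Document 3 only matches through the finer-grained token 系统, which is why it disappears once the query is analyzed with ik_smart.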


3) Core match parameter: operator, which controls the logical relationship between tokens (or/and)

# 1. Not set, so the default or is used; returns documents 1 and 2
POST /test_001/_doc/_search
{
  "query": {
    "match": {
      "content.field2": {
        "query": "系统学es"
      }
    }
  }
}
# 2. and: returns no results
POST /test_001/_doc/_search
{
  "query": {
    "match": {
      "content.field2": {
        "query": "系统学es",
        "operator":"and"
      }
    }
  }
}

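field2 uses ik_smart as its search analyzer by default, so the query "系统学es" is split into two tokens, [系统学, es]. In [query 1] operator defaults to or, so a document only needs to contain one of the tokens, and documents 1 and 2 are returned. In [query 2] operator is and, which requires a document to contain both tokens, so nothing is returned.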
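The or/and semantics can be sketched in Python as "any token matches" versus "all tokens match". As before, this is a toy model: the INDEX_TERMS sets are hand-written assumptions for content.field2 rather than real ik_smart output, and the match function is a made-up helper.

```python
# ASSUMPTION: illustrative ik_smart index terms for content.field2.
INDEX_TERMS = {
    1: {"关注", "我", "系统学", "编程"},
    2: {"系统学", "编程", "就", "关注", "我"},
    3: {"系统", "编程", "求", "关注"},
}

def match(query_tokens, operator="or"):
    """or: at least one token must be present; and: every token must be present."""
    combine = all if operator == "and" else any
    return [doc_id for doc_id, terms in INDEX_TERMS.items()
            if combine(token in terms for token in query_tokens)]

tokens = ["系统学", "es"]      # tokens of the query "系统学es", per the article
print(match(tokens))          # [1, 2]
print(match(tokens, "and"))   # [] : no document contains "es"
```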

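4) Core match parameter: zero_terms_query, for queries made up entirely of stop words

First create an index whose message field uses the stop analyzer. The stop analyzer removes stop words from the tokens it produces.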

PUT /test_002
{
  "mappings": {
    "_doc": {
      "properties": {
        "message": {
          "type": "text",
          "analyzer": "stop"
        }
      }
    }
  }
}

Add a document:

PUT /test_002/_doc/1
{
  "message": "to be or not to be"
}

The following search returns no results:

POST /test_002/_doc/_search
{
  "query": {
    "match": {
      "message": {
        "query": "to be or not to be",
        "zero_terms_query": "none"
      }
    }
  }
}

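Why: every token produced from 'to be or not to be' is a stop word, so the stop analyzer removes all of them and the query is left with no terms. With zero_terms_query set to none (the default), such a query matches no documents. With all, it behaves like match_all, so the document is returned.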
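The following Python sketch mimics that behaviour. It is a simplified model under stated assumptions: stop_analyze only approximates the stop analyzer (lowercase, split on non-letters, drop Lucene's default English stop words), and match is a hypothetical helper, not an Elasticsearch API.

```python
import re

# Lucene's default English stop word set, which the stop analyzer uses by default.
ENGLISH_STOP_WORDS = {
    "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if",
    "in", "into", "is", "it", "no", "not", "of", "on", "or", "such", "that",
    "the", "their", "then", "there", "these", "they", "this", "to", "was",
    "will", "with",
}

def stop_analyze(text):
    """Rough model of the stop analyzer for ASCII text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [token for token in tokens if token not in ENGLISH_STOP_WORDS]

def match(docs, query, zero_terms_query="none"):
    """Match with default or semantics, handling a query with zero terms."""
    terms = set(stop_analyze(query))
    if not terms:
        # Nothing left to search for: none matches nothing, all matches everything.
        return sorted(docs) if zero_terms_query == "all" else []
    return sorted(doc_id for doc_id, text in docs.items()
                  if terms & set(stop_analyze(text)))

docs = {1: "to be or not to be"}
print(stop_analyze("to be or not to be"))          # []
print(match(docs, "to be or not to be"))           # []
print(match(docs, "to be or not to be", "all"))    # [1]
```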

5) Core match parameter: lenient, which ignores data type conversion errors

# 1. id is an integer field, so this fails with an error
POST /test_001/_doc/_search
{
  "query": {
    "match": {
      "id": {
        "query": "系统学"
      }
    }
  }
}
# 2. With lenient set, no error is raised and the query runs normally
POST /test_001/_doc/_search
{
  "query": {
    "match": {
      "id": {
        "query": "系统学",
        "lenient": "true"
      }
    }
  }
}
# 3. A string that can be converted to a number also runs without error
POST /test_001/_doc/_search
{
  "query": {
    "match": {
      "id": {
        "query": "2"
      }
    }
  }
}

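Why: when test_001 was created, the id field was mapped as integer.
[Query 1] searches for documents whose id is '系统学'. Because id is an integer field and the query string cannot be converted to a number, the request fails with an error.
[Query 2] searches for the same value, but the lenient parameter tells Elasticsearch to ignore the type conversion error, so no error is raised. No document has an id of '系统学', so the result is empty.
[Query 3] searches for documents whose id is '2'. The '2' here is a string rather than an integer, but it can be converted to one, so no error is raised and the document with id 2 is returned.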
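The three cases can be sketched in Python as a conversion that either raises or is swallowed. This is an illustrative model only: match_integer_field is a made-up helper, and Python's ValueError stands in for the error Elasticsearch returns.

```python
def match_integer_field(doc_ids, query, lenient=False):
    """Match a string query against integer ids, optionally ignoring bad input."""
    try:
        wanted = int(query)
    except ValueError:
        if lenient:
            return []   # conversion error ignored: the query simply matches nothing
        raise           # without lenient, the failed conversion is reported as an error
    return [doc_id for doc_id in doc_ids if doc_id == wanted]

ids = [1, 2, 3]
print(match_integer_field(ids, "2"))                      # [2]
print(match_integer_field(ids, "系统学", lenient=True))    # []
try:
    match_integer_field(ids, "系统学")
except ValueError:
    print("conversion error")                             # reached when lenient is off
```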
