ElasticSearch Pitfall Notes

FollowYourHeart2015

Original link: https://blog.csdn.net/MuErHuoXu/article/details/104449271

Like tying knots to keep records: only by writing things down and reflecting do we grow~

Environment: ElasticSearch 7.X

1. Querying an analyzed `text` field with `term` to get a partial match returns no results.

For example, I wanted to find the record 我是中国人 ("I am Chinese") by searching for 中国 ("China"), but nothing was found.

"query": {
	"term": {
		"title": "中国"
	}
}

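Cause: no analyzer was specified when the mapping was created. A text field is analyzed before it is stored in ES, and the resulting terms make up the inverted index. But if the mapping only sets the field's type to text, the default analyzer splits the Chinese sentence into single characters: 我, 是, 中, 国, 人 (see the output in section 2.1). A term query does not analyze its input, so it looks for the exact term 中国, which does not exist in the index. The fix is to specify an analyzer such as ik_max_word (provided by the IK analysis plugin, which must be installed).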
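To make the failure concrete, here is a minimal Python simulation, not Elasticsearch itself. It assumes the default analyzer emits every CJK ideograph as its own token (which is what the section 2.1 output shows for this sentence) and models the inverted index as a plain dict from term to document IDs. The helper names `standard_like_tokens`, `build_index` and `term_query` are hypothetical.

```python
from collections import defaultdict


def standard_like_tokens(text):
    """Rough stand-in for the default analyzer on Chinese text:
    every CJK ideograph becomes its own token (a simplification,
    not Lucene's actual tokenizer)."""
    return [ch for ch in text if "\u4e00" <= ch <= "\u9fff"]


def build_index(docs, analyze):
    """Build an inverted index: term -> set of document IDs."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in analyze(text):
            index[token].add(doc_id)
    return index


def term_query(index, term):
    """A term query does NOT analyze its input: it looks up the exact term."""
    return sorted(index.get(term, set()))


docs = {"111": "我是中国人"}
index = build_index(docs, standard_like_tokens)
print(term_query(index, "中国"))  # → []
print(term_query(index, "中"))    # → ['111']
```

Only single characters were indexed, so the exact term 中国 has no postings, while a single-character term does.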

# 1 Create the mapping
PUT diary
{
  "settings": {
    "number_of_shards": "4",
    "number_of_replicas": "1"
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        # Specify the analyzer!!! Otherwise the text is split into single characters
        "analyzer": "ik_max_word"
      }
    }
  }
}

# 2 Index a document
POST diary/_doc/111
{
  "title": "我是中国人"
}
# 3 term query
POST diary/_search
{
  "query": {
    "bool": {
      "must": {
        "term": {
          "title": "中国"
        }
      }
    }
  }
}
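With `ik_max_word`, the indexed terms for 我是中国人 are the five tokens listed in the 2.2 output of section 2 (我, 是, 中国人, 中国, 国人), so 中国 now exists as a term. The Python sketch below hard-codes that token list (it does not implement IK segmentation) and shows the same exact-term lookup now finding document 111. The names `IK_MAX_WORD_TOKENS`, `build_index` and `term_query` are hypothetical.

```python
from collections import defaultdict

# Tokens copied from the ik_max_word _analyze output for "我是中国人";
# this is a lookup table, not an implementation of the IK segmenter.
IK_MAX_WORD_TOKENS = {"我是中国人": ["我", "是", "中国人", "中国", "国人"]}


def build_index(docs):
    """Inverted index (term -> doc IDs) built from the pre-computed IK tokens."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in IK_MAX_WORD_TOKENS[text]:
            index[token].add(doc_id)
    return index


def term_query(index, term):
    """Exact-term lookup, as a term query does (no analysis of the input)."""
    return sorted(index.get(term, set()))


index = build_index({"111": "我是中国人"})
print(term_query(index, "中国"))  # → ['111']
print(term_query(index, "中"))    # → []
```

Note the flip side: 中 alone is not among the ik_max_word tokens for this sentence, so a term query for 中 would now miss this document.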

2. How to inspect analysis results

# 2.1 Without specifying an analyzer
POST diary/_analyze
{
  "text": "我是中国人"
}
# Output:
{
  "tokens": [
    {
      "token": "我",
      "start_offset": 0,
      "end_offset": 1,
      "type": "<IDEOGRAPHIC>",
      "position": 0
    },
    {
      "token": "是",
      "start_offset": 1,
      "end_offset": 2,
      "type": "<IDEOGRAPHIC>",
      "position": 1
    },
    {
      "token": "中",
      "start_offset": 2,
      "end_offset": 3,
      "type": "<IDEOGRAPHIC>",
      "position": 2
    },
    {
      "token": "国",
      "start_offset": 3,
      "end_offset": 4,
      "type": "<IDEOGRAPHIC>",
      "position": 3
    },
    {
      "token": "人",
      "start_offset": 4,
      "end_offset": 5,
      "type": "<IDEOGRAPHIC>",
      "position": 4
    }
  ]
}
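When the response is long, it can help to reduce it to a plain token list. The Python sketch below parses a response copied from the output above with the standard `json` module and checks that each token equals `text[start_offset:end_offset]`. Assumptions: offsets are compared as Python string indices, which works here because each of these Chinese characters is a single UTF-16 unit (the unit Lucene counts in); and the check only holds for analyzers that do not alter characters (a lowercasing filter, for instance, would make it fail). The function name `summarize_tokens` is hypothetical.

```python
import json

# Response of POST diary/_analyze for "我是中国人" (default analyzer), copied from above.
RESPONSE = """
{"tokens": [
  {"token": "我", "start_offset": 0, "end_offset": 1, "type": "<IDEOGRAPHIC>", "position": 0},
  {"token": "是", "start_offset": 1, "end_offset": 2, "type": "<IDEOGRAPHIC>", "position": 1},
  {"token": "中", "start_offset": 2, "end_offset": 3, "type": "<IDEOGRAPHIC>", "position": 2},
  {"token": "国", "start_offset": 3, "end_offset": 4, "type": "<IDEOGRAPHIC>", "position": 3},
  {"token": "人", "start_offset": 4, "end_offset": 5, "type": "<IDEOGRAPHIC>", "position": 4}
]}
"""


def summarize_tokens(text, response_json):
    """Return the token strings in position order, validating each offset pair."""
    tokens = json.loads(response_json)["tokens"]
    for t in tokens:
        # The offsets should select exactly the token's characters in the input.
        if text[t["start_offset"]:t["end_offset"]] != t["token"]:
            raise ValueError(f"offset mismatch for token {t['token']!r}")
    return [t["token"] for t in sorted(tokens, key=lambda t: t["position"])]


print(summarize_tokens("我是中国人", RESPONSE))  # → ['我', '是', '中', '国', '人']
```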

# 2.2 With an analyzer specified
POST diary/_analyze
{
  "text": "我是中国人",
  "analyzer": "ik_max_word"
}
# Output:
{
  "tokens": [
    {
      "token": "我",
      "start_offset": 0,
      "end_offset": 1,
      "type": "CN_CHAR",
      "position": 0
    },
    {
      "token": "是",
      "start_offset": 1,
      "end_offset": 2,
      "type": "CN_CHAR",
      "position": 1
    },
    {
      "token": "中国人",
      "start_offset": 2,
      "end_offset": 5,
      "type": "CN_WORD",
      "position": 2
    },
    {
      "token": "中国",
      "start_offset": 2,
      "end_offset": 4,
      "type": "CN_WORD",
      "position": 3
    },
    {
      "token": "国人",
      "start_offset": 3,
      "end_offset": 5,
      "type": "CN_WORD",
      "position": 4
    }
  ]
}
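The two `_analyze` requests above differ only in the `analyzer` key. Below is a small Python sketch that builds either body with the standard `json` module. It also supports `field`, a real `_analyze` option that analyzes the text with whatever analyzer the mapping assigns to that field (for `diary`, `title` maps to `ik_max_word`), which is a direct way to check what a mapping actually does. The helper name `analyze_body` is hypothetical, rejecting `analyzer` and `field` together is this helper's own choice, and sending the request (HTTP client, host) is left out.

```python
import json


def analyze_body(text, analyzer=None, field=None):
    """Build the JSON body for POST <index>/_analyze.

    analyzer: analyze with a named analyzer, e.g. "ik_max_word" (as in 2.2).
    field:    analyze with the analyzer the mapping assigns to that field.
    Neither:  the index's default analyzer is used (as in 2.1).
    """
    if analyzer is not None and field is not None:
        # Helper's own rule, to keep the intent of each request unambiguous.
        raise ValueError("pass either analyzer or field, not both")
    body = {"text": text}
    if analyzer is not None:
        body["analyzer"] = analyzer
    if field is not None:
        body["field"] = field
    return json.dumps(body, ensure_ascii=False)


print(analyze_body("我是中国人"))                          # body for 2.1
print(analyze_body("我是中国人", analyzer="ik_max_word"))  # body for 2.2
print(analyze_body("我是中国人", field="title"))           # uses the title mapping
```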

3. Difference between ik_smart and ik_max_word

ik_smart: coarse-grained segmentation. When one term contains another, only the longest term is kept.

POST diary/_analyze
{
  "text": "我是中国人",
  "analyzer": "ik_smart"
}
# Output:
{
  "tokens": [
    {
      "token": "我",
      "start_offset": 0,
      "end_offset": 1,
      "type": "CN_CHAR",
      "position": 0
    },
    {
      "token": "是",
      "start_offset": 1,
      "end_offset": 2,
      "type": "CN_CHAR",
      "position": 1
    },
    {
      "token": "中国人",
      "start_offset": 2,
      "end_offset": 5,
      "type": "CN_WORD",
      "position": 2
    }
  ]
}

ik_max_word: fine-grained segmentation. Every term found is emitted, whether or not it is contained in another term.
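The ik_smart rule described above (keep only the longest of terms that contain one another) can be checked against the two outputs. The Python sketch below applies exactly that rule to the ik_max_word tokens from section 2.2 and reproduces the ik_smart result for this sentence. It only illustrates the stated rule; it is not IK's actual smart-mode algorithm, which also resolves ambiguous segmentations. The names `IK_MAX_WORD` and `keep_longest` are hypothetical.

```python
# (token, start_offset, end_offset) from the ik_max_word output in section 2.2
IK_MAX_WORD = [
    ("我", 0, 1),
    ("是", 1, 2),
    ("中国人", 2, 5),
    ("中国", 2, 4),
    ("国人", 3, 5),
]


def keep_longest(tokens):
    """Drop every token whose span lies inside a strictly longer token's span."""
    kept = []
    for tok, start, end in tokens:
        contained = any(
            s <= start and end <= e and (e - s) > (end - start)
            for _, s, e in tokens
        )
        if not contained:
            kept.append(tok)
    return kept


print(keep_longest(IK_MAX_WORD))  # → ['我', '是', '中国人']
```

Both 中国 (offsets 2-4) and 国人 (offsets 3-5) fall inside 中国人 (offsets 2-5), so only 中国人 survives, matching the ik_smart output.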
