Installing the IK Analyzer for Elasticsearch

蒜蓉粉丝蒸扇贝

Article link: https://blog.csdn.net/smithallenyu/article/details/51259835

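Elasticsearch's built-in analyzers can process Chinese text, but they essentially split it into individual characters, so search quality is poor.

The IK analyzer by medcl is a dedicated Chinese analyzer. For more information, see https://github.com/medcl/elasticsearch-analysis-ik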

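1. Install the IK plugin

Download the package that matches your ES version from https://github.com/medcl/elasticsearch-analysis-ik/releases

Then place it in the plugins directory and unzip it. Restart Elasticsearch so that the plugin is loaded.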
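
The unzip step can be scripted. The following is a minimal sketch, assuming the release archive has already been downloaded. The function name install_plugin, the plugins/ik target folder, and the paths in the usage note are illustrative assumptions, not something prescribed by the plugin.

```python
import zipfile
from pathlib import Path


def install_plugin(zip_path, es_home, plugin_name="ik"):
    """Extract a plugin archive into <es_home>/plugins/<plugin_name>.

    The folder layout is an assumption for illustration; check the
    plugin README for the layout your ES version expects.
    """
    target = Path(es_home) / "plugins" / plugin_name
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
    return target
```

For example, install_plugin("elasticsearch-analysis-ik.zip", "/opt/elasticsearch") would unpack the archive under /opt/elasticsearch/plugins/ik (both paths are hypothetical).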

2. When creating the index, supply a mapping that sets the analyzer of the relevant field to an IK analyzer

e.g. curl -XPOST http://localhost:9200/index/fulltext/_mapping -d'
{
    "fulltext": {
        "_all": {
            "analyzer": "ik_max_word",
            "search_analyzer": "ik_max_word",
            "term_vector": "no",
            "store": "false"
        },
        "properties": {
            "content": {
                "type": "string",
                "store": "no",
                "term_vector": "with_positions_offsets",
                "analyzer": "ik_max_word",
                "search_analyzer": "ik_max_word",
                "include_in_all": "true",
                "boost": 8
            }
        }
    }
}'

For concrete examples, see https://github.com/medcl/elasticsearch-analysis-ik
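
The same mapping request can be prepared from a script. This is a minimal sketch using only the Python standard library. The helper names build_ik_mapping and mapping_request are made up for illustration, and the body keeps the legacy "string" field type used in this article, which newer Elasticsearch versions have replaced.

```python
import json
import urllib.request


def build_ik_mapping(type_name, field, analyzer="ik_max_word"):
    """Return a mapping body that analyzes `field` with the given analyzer."""
    return {
        type_name: {
            "properties": {
                field: {
                    "type": "string",  # legacy field type, as in the article
                    "analyzer": analyzer,
                    "search_analyzer": analyzer,
                }
            }
        }
    }


def mapping_request(base_url, index, type_name, body):
    """Prepare (but do not send) the POST request for the _mapping endpoint."""
    url = f"{base_url}/{index}/{type_name}/_mapping"
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


body = build_ik_mapping("fulltext", "content")
req = mapping_request("http://localhost:9200", "index", "fulltext", body)
# urllib.request.urlopen(req) would send it to a running node.
```

Nothing is sent until urlopen is called, so the request can be inspected first (for example req.full_url and req.data).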

3. Use _analyze to check how text is tokenized

e.g. curl -XGET http://localhost:9200/_analyze -d '
{
  "analyzer": "ik_max_word",
  "text": "助手P5 5.14.9003"
}'
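
The _analyze response is a JSON object with a "tokens" array, where each entry carries the token text, its offsets, type, and position. A minimal sketch for building the request body and pulling out just the token strings follows. The function names are hypothetical, and the sample response is made up to show the shape; it is not real IK output.

```python
import json


def analyze_body(text, analyzer="ik_max_word"):
    """Serialize the request body for the _analyze endpoint."""
    return json.dumps({"analyzer": analyzer, "text": text}, ensure_ascii=False)


def extract_tokens(response_text):
    """Return the token strings from an _analyze response, in order."""
    payload = json.loads(response_text)
    return [entry["token"] for entry in payload.get("tokens", [])]


# Made-up response for illustration only; NOT actual IK output.
sample_response = json.dumps({
    "tokens": [
        {"token": "alpha", "start_offset": 0, "end_offset": 5,
         "type": "word", "position": 0},
        {"token": "beta", "start_offset": 6, "end_offset": 10,
         "type": "word", "position": 1},
    ]
})

tokens = extract_tokens(sample_response)
```

Comparing the token lists returned for different analyzers on the same text is a quick way to see how they differ.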

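4. To apply different analyzers to the same field, see the example at http://keenwon.com/1404.html: define several sub-fields on a field, and use a different analyzer for the field and for each sub-field.

e.g. in the mapping below, the "fields" block under title defines its two sub-fields: cn and en

title itself uses the standard analyzer, title.cn uses the IK analyzer, and title.en uses the built-in English analyzer.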

PUT http://192.168.159.159:9200/index1
{
  "settings": {
    "refresh_interval": "5s",
    "number_of_shards": 1, // one primary shard
    "number_of_replicas": 0 // no replicas; they can be added later
  },
  "mappings": {
    "_default_": {
      "_all": { "enabled": false } // disable the _all field, since we only search the title field
    },
    "resource": {
      "dynamic": false, // disable dynamic mapping
      "properties": {
        "title": {
          "type": "string",
          "index": "analyzed",
          "fields": {
            "cn": {
              "type": "string",
              "analyzer": "ik"
            },
            "en": {
              "type": "string",
              "analyzer": "english"
            }
          }
        }
      }
    }
  }
}
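
The // annotations in the listing above are not part of standard JSON, which has no comment syntax. One way to keep the explanations while still producing a clean body is to build it from a dictionary. This is a minimal sketch; multi_field_property is a hypothetical helper name, and the analyzer names and legacy "string" type are taken from the listing.

```python
import json


def multi_field_property(sub_analyzers, field_type="string"):
    """Build a property whose sub-fields each use a different analyzer.

    The parent field sets no analyzer, so it falls back to the default.
    """
    return {
        "type": field_type,
        "fields": {
            name: {"type": field_type, "analyzer": analyzer}
            for name, analyzer in sub_analyzers.items()
        },
    }


title = multi_field_property({"cn": "ik", "en": "english"})

index_body = {
    "settings": {
        "number_of_shards": 1,    # one primary shard
        "number_of_replicas": 0,  # no replicas; can be raised later
    },
    "mappings": {
        "resource": {
            "dynamic": False,     # disable dynamic mapping
            "properties": {"title": title},
        }
    },
}

serialized = json.dumps(index_body, indent=2)
```

The serialized string contains no comments and can be used as the body of the PUT request.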

When searching, match the field and its sub-fields in the same query.

POST http://192.168.159.159:9200/index1/resource/_search

{
  "query": {
    "multi_match": {
      "type":     "most_fields", 
      "query":    "最新",
      "fields": [ "title", "title.cn", "title.en" ]
    }
  }
}
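
With the most_fields type, multi_match runs the query against every listed field and combines the scores of the fields that match, so a document matching under several analyzers ranks higher. The query body can be generated from the parent field name and its sub-field names. This is a minimal sketch; most_fields_query is a hypothetical helper, not part of any client library.

```python
import json


def most_fields_query(text, field, sub_fields):
    """Build a multi_match query over a field and its sub-fields."""
    fields = [field] + [f"{field}.{sub}" for sub in sub_fields]
    return {
        "query": {
            "multi_match": {
                "type": "most_fields",
                "query": text,
                "fields": fields,
            }
        }
    }


query = most_fields_query("最新", "title", ["cn", "en"])
query_json = json.dumps(query, ensure_ascii=False)
```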
