Reposted from: https://blog.csdn.net/chengyuqiang/article/details/78991570 (Elasticsearch 6.3.0).
Plugin Installation
Offline installation
Download the release package from https://github.com/medcl/elasticsearch-analysis-ik/releases.
Go to the F:\elkStudy\elasticsearch\elasticsearch-6.3.0\plugins directory and create an ik directory.
Unzip the downloaded package into F:\elkStudy\elasticsearch\elasticsearch-6.3.0\plugins\ik and restart Elasticsearch.
Online installation
Go to the F:\elkStudy\elasticsearch\elasticsearch-6.3.0\bin\ directory.
In a command prompt window, run:
elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v6.3.0/elasticsearch-analysis-ik-6.3.0.zip
Note: with the online installation, the IK plugin's configuration files live under the F:\elkStudy\elasticsearch\elasticsearch-6.3.0\config directory.
Testing the IK Chinese Analyzer
(1) ik_smart
GET _analyze?pretty
{
  "analyzer": "ik_smart",
  "text": "安徽省长江流域"
}
Response:
{
  "tokens": [
    {
      "token": "安徽省",
      "start_offset": 0,
      "end_offset": 3,
      "type": "CN_WORD",
      "position": 0
    },
    {
      "token": "长江流域",
      "start_offset": 3,
      "end_offset": 7,
      "type": "CN_WORD",
      "position": 1
    }
  ]
}
(2) ik_max_word
GET _analyze?pretty
{
  "analyzer": "ik_max_word",
  "text": "安徽省长江流域"
}
Response:
{
  "tokens": [
    {
      "token": "安徽省",
      "start_offset": 0,
      "end_offset": 3,
      "type": "CN_WORD",
      "position": 0
    },
    {
      "token": "安徽",
      "start_offset": 0,
      "end_offset": 2,
      "type": "CN_WORD",
      "position": 1
    },
    {
      "token": "省长",
      "start_offset": 2,
      "end_offset": 4,
      "type": "CN_WORD",
      "position": 2
    },
    {
      "token": "长江流域",
      "start_offset": 3,
      "end_offset": 7,
      "type": "CN_WORD",
      "position": 3
    },
    {
      "token": "长江",
      "start_offset": 3,
      "end_offset": 5,
      "type": "CN_WORD",
      "position": 4
    },
    {
      "token": "江流",
      "start_offset": 4,
      "end_offset": 6,
      "type": "CN_WORD",
      "position": 5
    },
    {
      "token": "流域",
      "start_offset": 5,
      "end_offset": 7,
      "type": "CN_WORD",
      "position": 6
    }
  ]
}
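The two granularities above can be sketched with a toy dictionary. This is a simplified illustration, not IK's real algorithm (IK uses dictionary tries plus disambiguation rules), and the small DICT set below is an assumption for the demo: ik_max_word-style output emits every dictionary word found anywhere in the text, while ik_smart-style output behaves like a greedy longest-match pass.

```python
# Toy model of the two IK granularities (illustration only, not IK's code).
# DICT is a made-up mini dictionary covering the examples in this article.
DICT = {"安徽省", "安徽", "省长", "长江流域", "长江", "江流", "流域", "王者", "荣耀"}

def max_word_style(text):
    """ik_max_word-like: emit every dictionary word found in the text."""
    hits = []
    for i in range(len(text)):
        for j in range(i + 1, len(text) + 1):
            if text[i:j] in DICT:
                hits.append((text[i:j], i, j))  # (token, start_offset, end_offset)
    return hits

def smart_style(text):
    """ik_smart-like: greedy longest match from left to right."""
    tokens, i = [], 0
    while i < len(text):
        for j in range(len(text), i, -1):  # try the longest candidate first
            if text[i:j] in DICT:
                tokens.append((text[i:j], i, j))
                i = j
                break
        else:
            i += 1  # no dictionary word starts here; skip one character
    return tokens

print(smart_style("安徽省长江流域"))     # the two coarse-grained tokens
print(max_word_style("安徽省长江流域"))  # all seven dictionary hits
```

The offsets line up with the start_offset/end_offset fields in the responses above; only the ordering of the fine-grained hits differs (IK happens to list a longer word before its sub-words at the same start).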
(3) Segmentation of a new word
GET _analyze?pretty
{
  "analyzer": "ik_smart",
  "text": "王者荣耀"
}
Response:
{
  "tokens": [
    {
      "token": "王者",
      "start_offset": 0,
      "end_offset": 2,
      "type": "CN_WORD",
      "position": 0
    },
    {
      "token": "荣耀",
      "start_offset": 2,
      "end_offset": 4,
      "type": "CN_WORD",
      "position": 1
    }
  ]
}
Extending the Dictionary
step1. Go to the F:\elkStudy\elasticsearch\elasticsearch-6.3.0\plugins\ik\config directory and create a custom folder.
step2. In F:\elkStudy\elasticsearch\elasticsearch-6.3.0\plugins\ik\config\custom, create a file named my_word.dic with the content below. The file encoding must be UTF-8 without a BOM; the author was stuck on this for quite a while.
王者荣耀
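Creating the file from a script sidesteps the encoding trap: Python's "utf-8" codec writes no BOM, while "utf-8-sig" would add one. A minimal sketch; the write_dic helper and the relative path are made up for the demo:

```python
# Write a custom IK dictionary as UTF-8 *without* a BOM, one word per line.
# A BOM (bytes EF BB BF) at the start of the file is the encoding trap
# described above, which is why the article insists on UTF-8 without BOM.
def write_dic(path, words):
    with open(path, "w", encoding="utf-8", newline="\n") as f:  # not "utf-8-sig"
        f.write("\n".join(words) + "\n")

write_dic("my_word.dic", ["王者荣耀"])

# Verify: the file must not start with a UTF-8 BOM.
raw = open("my_word.dic", "rb").read()
assert not raw.startswith(b"\xef\xbb\xbf")
```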
step3. Edit the F:\elkStudy\elasticsearch\elasticsearch-6.3.0\plugins\ik\config\IKAnalyzer.cfg.xml file:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
  <comment>IK Analyzer extension configuration</comment>
  <!-- Your own extension dictionaries go here -->
  <entry key="ext_dict">custom/my_word.dic</entry>
  <!-- Your own extension stopword dictionaries go here -->
  <entry key="ext_stopwords"></entry>
  <!-- Remote extension dictionaries go here -->
  <!-- <entry key="remote_ext_dict">words_location</entry> -->
  <!-- Remote extension stopword dictionaries go here -->
  <!-- <entry key="remote_ext_stopwords">words_location</entry> -->
</properties>
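As a sanity check on the edit, the ext_dict entry can be read back with a standard XML parser (IK resolves the relative value against the directory containing IKAnalyzer.cfg.xml). A small sketch using an inline copy of the file, comments trimmed:

```python
import xml.etree.ElementTree as ET

# Inline copy of the IKAnalyzer.cfg.xml from step3 (comments trimmed).
CFG = """<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
  <comment>IK Analyzer extension configuration</comment>
  <entry key="ext_dict">custom/my_word.dic</entry>
  <entry key="ext_stopwords"></entry>
</properties>"""

root = ET.fromstring(CFG)
# Collect every <entry> into a dict keyed by its "key" attribute.
entries = {e.get("key"): (e.text or "") for e in root.iter("entry")}
print(entries["ext_dict"])  # custom/my_word.dic
```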
step4. Restart Elasticsearch (and Kibana).
If the startup log prints the custom dictionary being loaded, the extension took effect.
step5. Test the segmentation:
GET _analyze?pretty
{
  "analyzer": "ik_smart",
  "text": "王者荣耀"
}
Response:
{
  "tokens": [
    {
      "token": "王者荣耀",
      "start_offset": 0,
      "end_offset": 4,
      "type": "CN_WORD",
      "position": 0
    }
  ]
}