Installing the IK Analyzer (Windows, Linux)

Latest recommended article published on 2024-06-24 23:00:07

米斯特尔.w

Category: 初识java (Getting Started with Java). Tags: elasticsearch, java

Article link: https://blog.csdn.net/qq_34614766/article/details/105443281

I. Introduction

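IKAnalyzer is an open-source, lightweight Chinese word segmentation toolkit written in Java. Since version 1.0 was released in December 2006, IKAnalyzer has gone through three major versions. It started out as a Chinese segmentation component built around the open-source Lucene project, combining dictionary-based segmentation with grammar analysis. From version 3.0 onward, IKAnalyzer became a general-purpose segmentation component for Java that is independent of Lucene, while still shipping a default optimized implementation for Lucene.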

------------Windows installation------------

1. Download the release that matches your Elasticsearch version (5.6.8 in this article):
https://github.com/medcl/elasticsearch-analysis-ik/releases

2. Unzip the archive, copy the extracted IK folder into elasticsearch-5.6.8\plugins, and rename the folder to analysis-ik (note: the name must be analysis-ik).

3. Restart Elasticsearch (run elasticsearch.bat in the bin directory as administrator). The IK analyzer is loaded on startup.
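The unzip, copy, and rename in step 2 can also be scripted. The following is a minimal sketch, not the author's method: the function name `install_ik` and the paths in the usage note are hypothetical. It assumes the release zip is either flat or wrapped in a single top-level folder, and handles both layouts.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path


def install_ik(zip_path, es_home):
    """Unpack an IK release zip into <es_home>/plugins/analysis-ik.

    install_ik is a hypothetical helper for illustration only.
    """
    target = Path(es_home) / "plugins" / "analysis-ik"
    if target.exists():
        raise FileExistsError(f"{target} already exists; remove it first")

    with tempfile.TemporaryDirectory() as tmp:
        with zipfile.ZipFile(zip_path) as archive:
            archive.extractall(tmp)

        # Assumption: the archive is either flat or has exactly one
        # top-level folder that holds the plugin files.
        entries = list(Path(tmp).iterdir())
        if len(entries) == 1 and entries[0].is_dir():
            source = entries[0]
        else:
            source = Path(tmp)

        # copytree creates the target and any missing parent directories.
        shutil.copytree(source, target)
    return target
```

A call might look like `install_ik(r"C:\downloads\elasticsearch-analysis-ik-5.6.8.zip", r"C:\elasticsearch-5.6.8")`, where both paths are placeholders for your own locations. Elasticsearch still has to be restarted afterwards, as in step 3.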

You can check that it works with Postman:
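The request itself is not shown in the post. Here is a hedged sketch of an equivalent call to the Elasticsearch `_analyze` endpoint, using only the Python standard library. Three things are assumptions rather than facts from the post: the address `http://localhost:9200` (the Elasticsearch default), the `ik_max_word` analyzer (suggested by the overlapping tokens in the result), and the sample text (inferred from the token offsets in the result).

```python
import json
import urllib.request


def build_analyze_request(text, analyzer="ik_max_word",
                          base_url="http://localhost:9200"):
    """Build a POST request for the _analyze endpoint.

    base_url and the default analyzer are assumptions; adjust them
    to match your own node and the analyzer you want to test.
    """
    body = json.dumps({"analyzer": analyzer, "text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/_analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def analyze(text, **kwargs):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(build_analyze_request(text, **kwargs)) as resp:
        return json.load(resp)
```

With a running node, `analyze("我也想过过过过过过的生活")` should return a dictionary with a `tokens` list. Passing `analyzer="ik_smart"` selects IK's coarser-grained mode instead.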

Result:

{
    "tokens": [
        {
            "token": "我",
            "start_offset": 0,
            "end_offset": 1,
            "type": "CN_CHAR",
            "position": 0
        },
        {
            "token": "也",
            "start_offset": 1,
            "end_offset": 2,
            "type": "CN_CHAR",
            "position": 1
        },
        {
            "token": "想过",
            "start_offset": 2,
            "end_offset": 4,
            "type": "CN_WORD",
            "position": 2
        },
        {
            "token": "过过",
            "start_offset": 3,
            "end_offset": 5,
            "type": "CN_WORD",
            "position": 3
        },
        {
            "token": "过过",
            "start_offset": 4,
            "end_offset": 6,
            "type": "CN_WORD",
            "position": 4
        },
        {
            "token": "过过",
            "start_offset": 5,
            "end_offset": 7,
            "type": "CN_WORD",
            "position": 5
        },
        {
            "token": "过过",
            "start_offset": 6,
            "end_offset": 8,
            "type": "CN_WORD",
            "position": 6
        },
        {
            "token": "过过",
            "start_offset": 7,
            "end_offset": 9,
            "type": "CN_WORD",
            "position": 7
        },
        {
            "token": "的",
            "start_offset": 9,
            "end_offset": 10,
            "type": "CN_CHAR",
            "position": 8
        },
        {
            "token": "生活",
            "start_offset": 10,
            "end_offset": 12,
            "type": "CN_WORD",
            "position": 9
        }
    ]
}
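A response like the one above is easier to scan once each token is reduced to one line. The helper below is a small illustrative sketch; `summarize_tokens` is a hypothetical name, and the sample data is an excerpt of two tokens taken from the result above.

```python
def summarize_tokens(response):
    """Return one 'token[start:end] TYPE' string per token, ordered by position."""
    tokens = sorted(response["tokens"], key=lambda t: t["position"])
    return [
        f'{t["token"]}[{t["start_offset"]}:{t["end_offset"]}] {t["type"]}'
        for t in tokens
    ]


# Excerpt of the response shown above (first and third tokens only).
SAMPLE = {
    "tokens": [
        {"token": "想过", "start_offset": 2, "end_offset": 4,
         "type": "CN_WORD", "position": 2},
        {"token": "我", "start_offset": 0, "end_offset": 1,
         "type": "CN_CHAR", "position": 0},
    ]
}

for line in summarize_tokens(SAMPLE):
    print(line)
```

The offsets are character positions in the input text, so overlapping ranges such as 2-4 and 3-5 in the full result show that the analyzer emitted overlapping candidate words rather than a single segmentation.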