Installing the ik Chinese Word Segmentation Plugin for Elasticsearch

Licensed under CC 4.0 BY-SA

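This article explains how to install the ik Chinese word segmentation plugin in Elasticsearch, for cases where the default analyzer does not segment text the way you need. The steps are: download the source code from GitHub, unpack it, copy the configuration files into the ES installation directory, and add the configuration to elasticsearch.yml. It also covers why the plugin is installed from source rather than with the usual plugin command, and gives a reference link for custom dictionaries.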

Elasticsearch's default analyzer splits Chinese text into individual characters instead of the keyword-level terms we actually want. For example:

curl -XPOST  "http://localhost:9200/userinfo/_analyze?analyzer=standard&pretty=true&text=我是中国人"

This returns:

{
  "tokens": [
    {
      "token": "text",
      "start_offset": 2,
      "end_offset": 6,
      "type": "<ALPHANUM>",
      "position": 1
    },
    {
      "token": "我",
      "start_offset": 9,
      "end_offset": 10,
      "type": "<IDEOGRAPHIC>",
      "position": 2
    },
    {
      "token": "是",
      "start_offset": 10,
      "end_offset": 11,
      "type": "<IDEOGRAPHIC>",
      "position": 3
    },
    {
      "token": "中",
      "start_offset": 11,
      "end_offset": 12,
      "type": "<IDEOGRAPHIC>",
      "position": 4
    },
    {
      "token": "国",
      "start_offset": 12,
      "end_offset": 13,
      "type": "<IDEOGRAPHIC>",
      "position": 5
    },
    {
      "token": "人",
      "start_offset": 13,
      "end_offset": 14,
      "type": "<IDEOGRAPHIC>",
      "position": 6
    }
  ]
}
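The curl command above passes the analyzer and the text as query-string parameters, which matches older Elasticsearch releases. As far as we know, later releases expect them in a JSON request body instead, so the same call may need to be adapted. The following is a minimal Python sketch of that body-based form, not a definitive client: the helper name build_analyze_request is our own, the host and the userinfo index are taken from the example above, and the request is only built, not sent.

```python
import json
import urllib.request


def build_analyze_request(text, analyzer="standard",
                          base_url="http://localhost:9200", index="userinfo"):
    """Build (but do not send) a POST request for the _analyze endpoint.

    Hypothetical helper: host and index mirror the curl example;
    adjust them to your own cluster.
    """
    url = f"{base_url}/{index}/_analyze?pretty=true"
    # The analyzer and text travel in the JSON body rather than the URL.
    body = json.dumps({"analyzer": analyzer, "text": text}, ensure_ascii=False)
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_analyze_request("我是中国人")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # To actually send it (requires a running Elasticsearch node):
    # with urllib.request.urlopen(req) as resp:
    #     print(resp.read().decode("utf-8"))
```

Building the request separately from sending it makes it easy to inspect the URL and body before pointing it at a real node.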

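Per-character tokens like these are usually not what we want. We would rather get tokens such as "中国人" (Chinese person), "中国" (China), and "我" (I). For that we need to install a Chinese word segmentation plugin, and ik is one that provides this.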
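To make the difference concrete, here is a toy Python sketch that contrasts per-character splitting with a simple dictionary lookup. It is only an illustration under our own assumptions: the function names and the two-word dictionary are hypothetical, and this is not ik's actual algorithm or word list.

```python
def split_per_character(text):
    """Mimic per-character splitting: every character becomes a token."""
    return list(text)


def dictionary_tokens(text, dictionary):
    """Toy dictionary-based segmentation (NOT ik's real algorithm).

    At each position, emit every dictionary word of two or more
    characters that starts there, longest first. A character that is
    not covered by any matched word is emitted on its own.
    """
    max_len = max((len(word) for word in dictionary), default=1)
    tokens = []
    covered_until = 0  # index just past the furthest dictionary match so far
    for start in range(len(text)):
        matched = False
        longest = min(max_len, len(text) - start)
        for length in range(longest, 1, -1):
            candidate = text[start:start + length]
            if candidate in dictionary:
                tokens.append(candidate)
                covered_until = max(covered_until, start + length)
                matched = True
        if not matched and start >= covered_until:
            tokens.append(text[start])
    return tokens


if __name__ == "__main__":
    sample = "我是中国人"
    toy_dictionary = {"中国人", "中国"}  # hypothetical word list
    print(split_per_character(sample))
    # ['我', '是', '中', '国', '人']
    print(dictionary_tokens(sample, toy_dictionary))
    # ['我', '是', '中国人', '中国']
```

The second result depends entirely on the dictionary, which is why a segmentation plugin ships with a word list and usually lets you extend it.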