Installing the ik Chinese Word Segmentation Plugin for Elasticsearch

yunshouhu

http://blog.csdn.net/liuzhenfeng/article/details/39404435

Elasticsearch's default analyzer splits Chinese text into single characters instead of splitting it into the keywords we want. For example:

curl -XPOST "http://localhost:9200/userinfo/_analyze?analyzer=standard&pretty=true&text=我是中国人"

We get the following result:

{
  "tokens": [
    { "token": "text", "start_offset": 2, "end_offset": 6, "type": "<ALPHANUM>", "position": 1 },
    { "token": "我", "start_offset": 9, "end_offset": 10, "type": "<IDEOGRAPHIC>", "position": 2 },
    { "token": "是", "start_offset": 10, "end_offset": 11, "type": "<IDEOGRAPHIC>", "position": 3 },
    { "token": "中", "start_offset": 11, "end_offset": 12, "type": "<IDEOGRAPHIC>", "position": 4 },
    { "token": "国", "start_offset": 12, "end_offset": 13, "type": "<IDEOGRAPHIC>", "position": 5 },
    { "token": "人", "start_offset": 13, "end_offset": 14, "type": "<IDEOGRAPHIC>", "position": 6 }
  ]
}


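This is usually not what we want. We would rather get tokens such as "中国人", "中国", and "我", which requires a Chinese word segmentation plugin; ik provides exactly that. (The leading text token and the offsets starting at 9 suggest that the whole request body, something like {"text":"我是中国人"}, was analyzed as raw text; that token is not part of the sentence.)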

elasticsearch-analysis-ik is a Chinese word segmentation plugin that supports custom dictionaries.

Installation steps:

1. Download the source code from GitHub: https://github.com/medcl/elasticsearch-analysis-ik

Click the "Download ZIP" button at the lower right of the page to download the source archive elasticsearch-analysis-ik-master.zip.

2. Go to the download directory and unpack elasticsearch-analysis-ik-master.zip:

unzip elasticsearch-analysis-ik-master.zip

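3. Copy the config/ik folder from the unpacked directory into the config folder of the ES installation directory.

4. Because this is source code, it has to be packaged with Maven. In the unpacked folder, run:

mvn clean package

5. Copy the resulting jar file, elasticsearch-analysis-ik-1.2.8-sources.jar, into the lib directory of the ES installation directory.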

6. Add the ik configuration to the end of the ES configuration file config/elasticsearch.yml:

index:
  analysis:
    analyzer:
      ik:
        alias: [ik_analyzer]
        type: org.elasticsearch.index.analysis.IkAnalyzerProvider
      ik_max_word:
        type: ik
        use_smart: false
      ik_smart:
        type: ik
        use_smart: true

or

index.analysis.analyzer.ik.type : "ik"

7. Restart the Elasticsearch service to complete the configuration, then enter the command:

curl -XPOST "http://localhost:9200/userinfo/_analyze?analyzer=ik&pretty=true&text=我是中国人"

The test result is as follows:

{
  "tokens": [
    { "token": "text", "start_offset": 2, "end_offset": 6, "type": "ENGLISH", "position": 1 },
    { "token": "我", "start_offset": 9, "end_offset": 10, "type": "CN_CHAR", "position": 2 },
    { "token": "中国人", "start_offset": 11, "end_offset": 14, "type": "CN_WORD", "position": 3 },
    { "token": "中国", "start_offset": 11, "end_offset": 13, "type": "CN_WORD", "position": 4 },
    { "token": "国人", "start_offset": 12, "end_offset": 14, "type": "CN_WORD", "position": 5 }
  ]
}

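The analyze request from step 7 puts raw Chinese characters into the URL, so how they reach the server depends on how the client encodes them. A minimal sketch of one way to avoid that is to let curl percent-encode the text with -G and --data-urlencode. The function name ik_analyze and the ES_URL and CURL variables are made up for illustration, and the sketch assumes the _analyze endpoint accepts GET requests with query parameters, as the -XGET call in the appendix does.

```shell
#!/bin/sh
# Hypothetical helper: analyze TEXT with the ik analyzer on INDEX.
# ES_URL and CURL are illustrative overrides, not Elasticsearch settings.
ik_analyze() {
  index=$1
  text=$2
  # -G turns the -d/--data-urlencode pairs into a query string;
  # --data-urlencode percent-encodes the value after "text=".
  ${CURL:-curl} -s -G "${ES_URL:-http://localhost:9200}/$index/_analyze" \
    -d "analyzer=ik" \
    -d "pretty=true" \
    --data-urlencode "text=$text"
}
```

With a node running locally, calling ik_analyze userinfo "我是中国人" should send the same request as the command in step 7.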

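Notes:

1. ES plugins are normally installed with the plugin command, but installing ik that way kept failing on my machine, so I built and installed it from source instead.

2. For custom dictionaries, see https://github.com/medcl/elasticsearch-analysis-ik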

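Appendix:

2. Install the IK plugin (there is no prebuilt jar for 1.2.9, so you have to build it yourself):

The plugin version must match the ES version: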

IK version | ES version
-----------|----------------
master     | 1.4.0 → master
1.2.9      | 1.4.0
1.2.8      | 1.3.2
1.2.7      | 1.2.1
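The pairing above can be written down as a small lookup so a build script can pick the right tag. This is a minimal sketch that encodes only the three tagged releases from the table; the function name ik_version_for_es is made up for illustration, and any ES version not listed is reported as unknown instead of being guessed.

```shell
#!/bin/sh
# Hypothetical helper: print the IK plugin version for a given ES version,
# using only the pairs listed in the table above.
ik_version_for_es() {
  case "$1" in
    1.4.0) echo "1.2.9" ;;
    1.3.2) echo "1.2.8" ;;
    1.2.1) echo "1.2.7" ;;
    *)
      # Not in the table: do not guess.
      echo "unknown"
      return 1
      ;;
  esac
}
```

For example, ik_version_for_es 1.3.2 prints 1.2.8.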

Download the elasticsearch-analysis-ik source code from https://github.com/medcl/elasticsearch-analysis-ik

cd elasticsearch-analysis-ik
mvn package
cd target/releases/

There you will find elasticsearch-analysis-ik-1.2.9.zip.

Unpack elasticsearch-analysis-ik-1.2.9.zip into ES/plugins/analysis-ik/

 
root@blog-mreald-com:/usr/share/elasticsearch/plugins/analysis-ik# ls
commons-codec-1.6.jar      elasticsearch-analysis-ik-1.2.9.jar  httpclient-4.3.5.jar
commons-logging-1.1.3.jar  elasticsearch-analysis-ik-1.2.9.zip  httpcore-4.3.2.jar

Copy the ik configuration and dictionaries into ES_HOME/config:

sudo cp -R ik /etc/elasticsearch

Add the following line to elasticsearch.yml, then restart ES:

index.analysis.analyzer.ik.type : 'ik'

3. After installing IK:

root@blog-mreald-com:/usr/share/elasticsearch# curl -XGET 'http://localhost:9200/posts/_analyze?analyzer=ik&pretty=true&text=Mreald致力于全栈工程师'

Check the result:

 
{
  "tokens" : [ {
    "token" : "mreald",
    "start_offset" : 1,
    "end_offset" : 7,
    "type" : "ENGLISH",
    "position" : 1
  }, {
    "token" : "致力于",
    "start_offset" : 8,
    "end_offset" : 11,
    "type" : "CN_WORD",
    "position" : 2
  }, {
    "token" : "致力",
    "start_offset" : 8,
    "end_offset" : 10,
    "type" : "CN_WORD",
    "position" : 3
  }, {
    "token" : "致",
    "start_offset" : 8,
    "end_offset" : 9,
    "type" : "CN_WORD",
    "position" : 4
  }, {
  ...................
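To compare analyzers it is often enough to see only the token strings. A minimal sketch, assuming the response is pretty-printed with one "token" field per line as in the output above; the function name extract_tokens is made up for illustration, and a real JSON parser would be more robust than this sed pattern.

```shell
#!/bin/sh
# Hypothetical helper: read a pretty-printed _analyze response on stdin
# and print one token string per line.
extract_tokens() {
  # Matches lines such as:    "token" : "mreald",
  # The "tokens" array key does not match, because of the extra "s".
  sed -n 's/.*"token"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}
```

The function can be placed at the end of a pipeline, for example after the curl command shown above.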

2. Error:

{
  "error" : "NoShardAvailableActionException[[_na][_na] No shard available for
  [org.elasticsearch.action.admin.indices.analyze.AnalyzeRequest@7c9c15]]",
  "status" : 503
}

The file to copy over is: target/releases/elasticsearch-analysis-ik-1.2.9.zip

not: target/elasticsearch-analysis-ik-1.2.9.jar
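A quick check of the plugin directory can catch this mistake before restarting ES. This is a minimal sketch that assumes the release zip unpacks to the plugin jar plus the four dependency jars shown in the ls listing earlier; the function name ik_plugin_dir_complete is made up for illustration, and other plugin versions may bundle a different set of jars.

```shell
#!/bin/sh
# Hypothetical helper: return 0 when DIR contains the IK jar together with
# the dependency jars seen in the 1.2.9 listing, and 1 otherwise.
ik_plugin_dir_complete() {
  dir=$1
  for pattern in 'elasticsearch-analysis-ik-*.jar' 'httpclient-*.jar' \
                 'httpcore-*.jar' 'commons-codec-*.jar' 'commons-logging-*.jar'; do
    # $pattern is left unquoted on purpose so the shell expands the glob;
    # ls fails when nothing matches.
    if ! ls "$dir"/$pattern >/dev/null 2>&1; then
      echo "missing: $pattern" >&2
      return 1
    fi
  done
  return 0
}
```

Running ik_plugin_dir_complete /usr/share/elasticsearch/plugins/analysis-ik reports the first missing jar pattern on standard error.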

http://blog.mreald.com/160