https://github.com/medcl/elasticsearch-analysis-ik/releases
https://github.com/medcl/elasticsearch-analysis-pinyin
The two plugins above are maintained by Medcl and track Elasticsearch releases closely, so updates arrive promptly.
https://github.com/NLPchina/elasticsearch-analysis-ansj/releases
https://github.com/hankcs/HanLP/releases
https://github.com/KennFalcon/elasticsearch-analysis-hanlp
HanLP is written in Java, and its Elasticsearch plugin is maintained by a third party. Some plugin releases do not keep pace with Elasticsearch releases.
jieba:
https://github.com/fxsjy/jieba
jieba is best supported in Python. Its Elasticsearch plugin is not updated promptly, so you may need to compile it yourself.
https://github.com/sing1ee/elasticsearch-jieba-plugin/releases
THULAC, the Chinese lexical analyzer from Tsinghua University:
https://github.com/thunlp/THULAC
Its Elasticsearch plugin supports Elasticsearch 6.1.0 through 6.4.1; for other versions you have to build it yourself:
https://github.com/microbun/elasticsearch-thulac-plugin
Other plugins:
https://github.com/NLPchina/elasticsearch-sql/releases
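Check the installed Elasticsearch version; a plugin package must be built for exactly this version:
# /usr/share/elasticsearch/bin/elasticsearch -V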
Version: 6.8.4, Build: default/rpm/bca0c8d/2019-10-16T06:19:49.319352Z, JVM: 1.8.0_221
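Since the plugin version has to equal this version string, it can help to extract it and build the matching download URL in a script. A minimal Python sketch; `parse_es_version` and `ik_release_url` are hypothetical helpers, and the URL template simply assumes every IK release follows the `v6.8.4/elasticsearch-analysis-ik-6.8.4.zip` naming used in the network-install example (step 2).

```python
import re

# `elasticsearch -V` prints e.g. "Version: 6.8.4, Build: ..., JVM: ...".
_VERSION_RE = re.compile(r"Version:\s*(\d+\.\d+\.\d+)")

# Assumption: IK releases keep the tag/file naming seen for v6.8.4.
_IK_URL = ("https://github.com/medcl/elasticsearch-analysis-ik/releases/"
           "download/v{v}/elasticsearch-analysis-ik-{v}.zip")


def parse_es_version(output: str) -> str:
    """Extract the X.Y.Z version from `elasticsearch -V` output."""
    match = _VERSION_RE.search(output)
    if match is None:
        raise ValueError("no 'Version: X.Y.Z' found in output")
    return match.group(1)


def ik_release_url(es_version: str) -> str:
    """Build the IK plugin zip URL for a given Elasticsearch version."""
    return _IK_URL.format(v=es_version)


if __name__ == "__main__":
    out = ("Version: 6.8.4, Build: default/rpm/bca0c8d/"
           "2019-10-16T06:19:49.319352Z, JVM: 1.8.0_221")
    version = parse_es_version(out)
    print(version)
    print(ik_release_url(version))
```

Fed the version line above, this yields `6.8.4` and exactly the IK URL used in step 2.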
Installation examples:
1. Install an official Elasticsearch plugin by name:
# /usr/share/elasticsearch/bin/elasticsearch-plugin install analysis-smartcn
2. Install a third-party plugin package from a URL:
# /usr/share/elasticsearch/bin/elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v6.8.4/elasticsearch-analysis-ik-6.8.4.zip
3. Install from a local plugin package:
# /usr/share/elasticsearch/bin/elasticsearch-plugin install file:///root/elasticsearch-analysis-ik-6.8.4.zip
-> Downloading file:///root/elasticsearch-analysis-ik-6.8.4.zip
[=================================================] 100%
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@ WARNING: plugin requires additional permissions @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
* java.net.SocketPermission * connect,resolve
See http://docs.oracle.com/javase/8/docs/technotes/guides/security/permissions.html
for descriptions of what these permissions allow and the associated risks.
Continue with installation? [y/N]y
-> Installed analysis-ik
4. Manually unzip the plugin package into its own directory under plugins/:
# mkdir -p /usr/share/elasticsearch/plugins/analysis-ik
# unzip elasticsearch-analysis-ik-6.8.4.zip -d /usr/share/elasticsearch/plugins/analysis-ik/
The resulting directory:
# tree /usr/share/elasticsearch/plugins/analysis-ik/
/usr/share/elasticsearch/plugins/analysis-ik/
├── commons-codec-1.9.jar
├── commons-logging-1.2.jar
├── config
│ ├── extra_main.dic
│ ├── extra_single_word.dic
│ ├── extra_single_word_full.dic
│ ├── extra_single_word_low_freq.dic
│ ├── extra_stopword.dic
│ ├── IKAnalyzer.cfg.xml
│ ├── main.dic
│ ├── preposition.dic
│ ├── quantifier.dic
│ ├── stopword.dic
│ ├── suffix.dic
│ └── surname.dic
├── elasticsearch-analysis-ik-6.8.4.jar
├── httpclient-4.5.2.jar
├── httpcore-4.4.4.jar
├── plugin-descriptor.properties
└── plugin-security.policy
1 directory, 19 files
Install the pinyin plugin the same way:
# mkdir -p /usr/share/elasticsearch/plugins/analysis-pinyin
# unzip elasticsearch-analysis-pinyin-6.8.4.zip -d /usr/share/elasticsearch/plugins/analysis-pinyin/
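Before copying a zip into place by hand (step 4), it is worth confirming it was built for the running version: manual extraction skips the checks `elasticsearch-plugin` performs, so a mismatch typically only surfaces when the node fails to start. A minimal Python sketch, assuming (as with the IK zip above) that plugin-descriptor.properties sits at the root of the archive; `read_descriptor`, `check_compatible` and `extract_plugin` are hypothetical helpers. If you install with `elasticsearch-plugin` instead, its `--batch` option accepts the permission prompt shown in step 3 automatically.

```python
import zipfile
from pathlib import Path

DESCRIPTOR = "plugin-descriptor.properties"


def read_descriptor(zip_path) -> dict:
    """Parse plugin-descriptor.properties from the root of a plugin zip.

    Simplified .properties parsing: only `key=value` lines are read;
    comments (# or !) and blank lines are skipped.
    """
    with zipfile.ZipFile(zip_path) as zf:
        text = zf.read(DESCRIPTOR).decode("utf-8")
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!":
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def check_compatible(zip_path, es_version: str) -> dict:
    """Raise RuntimeError unless the zip targets exactly `es_version`."""
    props = read_descriptor(zip_path)
    built_for = props.get("elasticsearch.version")
    if built_for != es_version:
        raise RuntimeError(
            f"{props.get('name', zip_path)} was built for Elasticsearch "
            f"{built_for}, but the node runs {es_version}")
    return props


def extract_plugin(zip_path, plugins_dir, name: str) -> Path:
    """Roughly `mkdir -p <plugins_dir>/<name>` followed by `unzip -d`."""
    target = Path(plugins_dir) / name
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(target)
    return target

# Example with the paths used above (run on each node):
#   zip_file = "/root/elasticsearch-analysis-ik-6.8.4.zip"
#   check_compatible(zip_file, "6.8.4")
#   extract_plugin(zip_file, "/usr/share/elasticsearch/plugins", "analysis-ik")
```

Unlike `unzip`, `extractall` does not restore Unix file permissions, so check that the elasticsearch user can still read the extracted files.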
Restart the Elasticsearch service on every node:
# systemctl restart elasticsearch
List the installed plugins:
# curl http://192.168.8.102:9200/_cat/plugins
iopwxwk analysis-ik 6.8.4
iopwxwk analysis-pinyin 6.8.4
iopwxwk analysis-smartcn 6.8.4
DtC9wPm analysis-ik 6.8.4
DtC9wPm analysis-pinyin 6.8.4
DtC9wPm analysis-smartcn 6.8.4
eW8Ldat analysis-ik 6.8.4
eW8Ldat analysis-pinyin 6.8.4
eW8Ldat analysis-smartcn 6.8.4
Every Elasticsearch node now has all three Chinese analysis plugins installed.
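With more nodes, checking this list by eye gets error-prone. A small Python sketch, assuming the default three-column output shown above (node name, plugin component, version): it groups rows by node and reports any node that lacks a plugin some other node has. `fetch_cat_plugins`, `plugins_by_node` and `missing_plugins` are hypothetical helpers.

```python
from collections import defaultdict
from urllib.request import urlopen


def fetch_cat_plugins(base_url: str) -> str:
    """Fetch plain-text _cat/plugins output, e.g. from http://192.168.8.102:9200."""
    with urlopen(f"{base_url}/_cat/plugins") as resp:
        return resp.read().decode("utf-8")


def plugins_by_node(cat_output: str) -> dict:
    """Map node name -> {plugin component: version}."""
    nodes = defaultdict(dict)
    for line in cat_output.splitlines():
        fields = line.split()
        # Default columns: name component version. Skip blanks and the ?v header.
        if len(fields) < 3 or fields[:3] == ["name", "component", "version"]:
            continue
        node, component, version = fields[:3]
        nodes[node][component] = version
    return dict(nodes)


def missing_plugins(cat_output: str) -> dict:
    """Return {node: [plugins other nodes have but this node lacks]}."""
    nodes = plugins_by_node(cat_output)
    every_plugin = set()
    for plugins in nodes.values():
        every_plugin.update(plugins)
    report = {}
    for node, plugins in nodes.items():
        absent = sorted(every_plugin - set(plugins))
        if absent:
            report[node] = absent
    return report
```

An empty result from `missing_plugins` means the nodes agree. A node with no plugins at all produces no rows, so it is also worth comparing the node count against `_cat/nodes`.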
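Compare the analyzers on the same text. Without an "analyzer" field, _analyze uses the default standard analyzer, which splits Chinese text into single characters: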
GET _analyze
{
"text": "中华人民共和国"
}
GET _analyze
{
"analyzer": "ik_smart",
"text": "中华人民共和国"
}
GET _analyze
{
"analyzer": "smartcn",
"text": "中华人民共和国"
}
GET _analyze
{
"analyzer": "ik_max_word",
"text": "中华人民共和国"
}
GET _analyze
{
"analyzer": "pinyin",
"text": "中华人民共和国"
}
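The five requests above can be scripted so the token lists are easy to compare side by side. A minimal sketch using only the Python standard library; `analyze_request`, `token_list` and `compare_analyzers` are hypothetical helpers, the node address is the one from the `_cat/plugins` example, the first request's default analyzer is sent explicitly as `standard`, and POST is used because _analyze accepts it and it is the usual way to send a body outside Kibana.

```python
import json
from urllib.request import Request, urlopen

# Analyzers from the examples above; "standard" is the default used when
# no analyzer is given.
ANALYZERS = ["standard", "ik_smart", "smartcn", "ik_max_word", "pinyin"]


def analyze_request(base_url: str, analyzer: str, text: str) -> Request:
    """Build a POST /_analyze request with a JSON body."""
    body = json.dumps({"analyzer": analyzer, "text": text}).encode("utf-8")
    return Request(f"{base_url}/_analyze", data=body, method="POST",
                   headers={"Content-Type": "application/json"})


def token_list(response: dict) -> list:
    """Pull the token strings out of an _analyze response."""
    return [t["token"] for t in response.get("tokens", [])]


def compare_analyzers(base_url: str, text: str) -> dict:
    """Run every analyzer on the same text; returns {analyzer: [tokens]}."""
    results = {}
    for name in ANALYZERS:
        with urlopen(analyze_request(base_url, name, text)) as resp:
            results[name] = token_list(json.load(resp))
    return results

# Example (needs a reachable node with the plugins installed):
#   for name, toks in compare_analyzers("http://192.168.8.102:9200",
#                                       "中华人民共和国").items():
#       print(name, toks)
```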