Installing Chinese analysis plugins on Elasticsearch 6.8.4, with analyzer examples

https://github.com/medcl/elasticsearch-analysis-ik/releases
https://github.com/medcl/elasticsearch-analysis-pinyin
The two plugins above (IK and pinyin) are maintained by Medcl and track Elasticsearch releases closely, so new versions appear promptly.

https://github.com/NLPchina/elasticsearch-analysis-ansj/releases

https://github.com/hankcs/HanLP/releases
https://github.com/KennFalcon/elasticsearch-analysis-hanlp
HanLP is written in Java. Its Elasticsearch plugin comes from a third party, and some releases lag behind Elasticsearch.

jieba
https://github.com/fxsjy/jieba
Best supported in Python. The Elasticsearch plugin is not updated promptly, so you may have to build it yourself.
https://github.com/sing1ee/elasticsearch-jieba-plugin/releases
THULAC, the Chinese lexical analyzer from Tsinghua University:
https://github.com/thunlp/THULAC
Supported Elasticsearch versions: 6.1.0 to 6.4.1. For other versions you have to build the plugin yourself.
https://github.com/microbun/elasticsearch-thulac-plugin
Other plugins:
https://github.com/NLPchina/elasticsearch-sql/releases


Elasticsearch version used in this article:
# /usr/share/elasticsearch/bin/elasticsearch -V
Version: 6.8.4, Build: default/rpm/bca0c8d/2019-10-16T06:19:49.319352Z, JVM: 1.8.0_221
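Elasticsearch 6.x only loads plugins built for the exact running version, which is why every package below carries 6.8.4. The following is a minimal sketch, assuming that medcl keeps the vX.Y.Z/elasticsearch-analysis-ik-X.Y.Z.zip naming used in the install example. It derives the matching IK download URL from the `elasticsearch -V` line. The function name ik_url_from_version_line is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: derive the IK plugin download URL from the `elasticsearch -V` output.
# Assumption: the release URL pattern is the same for other versions.

ik_url_from_version_line() {
  # Expects a line such as "Version: 6.8.4, Build: default/rpm/..., JVM: ..."
  local version
  version=$(printf '%s\n' "$1" | sed -n 's/^Version: \([0-9][0-9.]*\),.*/\1/p')
  if [ -z "$version" ]; then
    echo "cannot parse a version from: $1" >&2
    return 1
  fi
  printf 'https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v%s/elasticsearch-analysis-ik-%s.zip\n' \
    "$version" "$version"
}

# Usage:
#   line=$(/usr/share/elasticsearch/bin/elasticsearch -V)
#   /usr/share/elasticsearch/bin/elasticsearch-plugin install "$(ik_url_from_version_line "$line")"
```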


Installation examples:
1. Install an official Elastic plugin (smartcn):

# /usr/share/elasticsearch/bin/elasticsearch-plugin install analysis-smartcn

2. Install a third-party plugin directly from a URL:
# /usr/share/elasticsearch/bin/elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v6.8.4/elasticsearch-analysis-ik-6.8.4.zip


3. Install from a local plugin package:
# /usr/share/elasticsearch/bin/elasticsearch-plugin install file:///root/elasticsearch-analysis-ik-6.8.4.zip
-> Downloading file:///root/elasticsearch-analysis-ik-6.8.4.zip
[=================================================] 100%   
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@     WARNING: plugin requires additional permissions     @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
* java.net.SocketPermission * connect,resolve
See http://docs.oracle.com/javase/8/docs/technotes/guides/security/permissions.html
for descriptions of what these permissions allow and the associated risks.

Continue with installation? [y/N]y
-> Installed analysis-ik

4. Alternatively, unzip the plugin package into the plugins directory manually:
# mkdir -p /usr/share/elasticsearch/plugins/analysis-ik
# unzip elasticsearch-analysis-ik-6.8.4.zip -d /usr/share/elasticsearch/plugins/analysis-ik/

Resulting directory layout:
# tree /usr/share/elasticsearch/plugins/analysis-ik/
/usr/share/elasticsearch/plugins/analysis-ik/
├── commons-codec-1.9.jar
├── commons-logging-1.2.jar
├── config
│   ├── extra_main.dic
│   ├── extra_single_word.dic
│   ├── extra_single_word_full.dic
│   ├── extra_single_word_low_freq.dic
│   ├── extra_stopword.dic
│   ├── IKAnalyzer.cfg.xml
│   ├── main.dic
│   ├── preposition.dic
│   ├── quantifier.dic
│   ├── stopword.dic
│   ├── suffix.dic
│   └── surname.dic
├── elasticsearch-analysis-ik-6.8.4.jar
├── httpclient-4.5.2.jar
├── httpcore-4.4.4.jar
├── plugin-descriptor.properties
└── plugin-security.policy

1 directory, 19 files

Do the same for the pinyin plugin:
# mkdir -p /usr/share/elasticsearch/plugins/analysis-pinyin
# unzip elasticsearch-analysis-pinyin-6.8.4.zip -d /usr/share/elasticsearch/plugins/analysis-pinyin/
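The interactive [y/N] prompt from method 3 gets in the way when the same plugins have to go onto every node. elasticsearch-plugin accepts --batch (-b), which confirms the additional-permissions prompt automatically. The following is a minimal sketch, assuming the zip files sit in /root under the names used above. The function install_local_plugins and the ES_HOME, PKG_DIR and ES_VERSION overrides are hypothetical. It skips plugins that `elasticsearch-plugin list` already reports, so it does not clash with a manual unzip.

```shell
#!/usr/bin/env bash
# Sketch: install the IK and pinyin plugins non-interactively from local zip files.
# ES_HOME, PKG_DIR and ES_VERSION defaults are assumptions; override them as needed.

install_local_plugins() {
  local es_home="${ES_HOME:-/usr/share/elasticsearch}"
  local pkg_dir="${PKG_DIR:-/root}"
  local version="${ES_VERSION:-6.8.4}"
  local installed name plugin

  # `list` prints one installed plugin name per line, e.g. "analysis-ik".
  installed=$("$es_home/bin/elasticsearch-plugin" list) || return 1

  for name in ik pinyin; do
    plugin="analysis-${name}"
    if grep -qx "$plugin" <<<"$installed"; then
      echo "$plugin already installed, skipping"
      continue
    fi
    # --batch answers the additional-permissions prompt automatically.
    "$es_home/bin/elasticsearch-plugin" install --batch \
      "file://${pkg_dir}/elasticsearch-analysis-${name}-${version}.zip" || return 1
  done
}

# Usage, as root on every node: install_local_plugins
```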


Restart the Elasticsearch service on every node so the new plugins are loaded:
# systemctl restart elasticsearch
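Restarting all nodes at the same moment leaves the cluster unavailable until they are all back. A gentler option is to restart one node at a time and wait for the cluster to turn green before you move on. The following is a minimal sketch, to be run on each node in turn. ES_URL, the 60 x 5 s retry budget and the function names are assumptions. A cluster that cannot reach green, such as a single node with replicas configured, will hit the timeout.

```shell
#!/usr/bin/env bash
# Sketch: restart Elasticsearch on the current node, then poll cluster health
# until it is green again. ES_URL and the retry budget are assumptions.
ES_URL=${ES_URL:-http://192.168.8.102:9200}

# Print the "status" field of a compact _cluster/health JSON response.
health_status() {
  sed -n 's/.*"status":"\([a-z]*\)".*/\1/p'
}

restart_and_wait() {
  local status=""
  systemctl restart elasticsearch || return 1
  for _ in $(seq 1 60); do
    # Empty while the node is still starting up; that just means "retry".
    status=$(curl -s "$ES_URL/_cluster/health?wait_for_status=green&timeout=5s" | health_status)
    if [ "$status" = green ]; then
      return 0
    fi
    sleep 5
  done
  echo "cluster not green after restart (last status: ${status:-no response})" >&2
  return 1
}

# Usage, as root on one node at a time: restart_and_wait
```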

List the installed plugins:
# curl http://192.168.8.102:9200/_cat/plugins
iopwxwk analysis-ik      6.8.4
iopwxwk analysis-pinyin  6.8.4
iopwxwk analysis-smartcn 6.8.4
DtC9wPm analysis-ik      6.8.4
DtC9wPm analysis-pinyin  6.8.4
DtC9wPm analysis-smartcn 6.8.4
eW8Ldat analysis-ik      6.8.4
eW8Ldat analysis-pinyin  6.8.4
eW8Ldat analysis-smartcn 6.8.4

Each of the three Elasticsearch nodes now has all three Chinese analysis plugins installed.
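Checking the _cat/plugins table by eye becomes error-prone as the cluster grows. The following is a minimal sketch that reads the table on stdin and prints every node/plugin pair that is missing. missing_plugins is a hypothetical helper, and it can only check nodes that appear in the table.

```shell
#!/usr/bin/env bash
# Sketch: list every node that lacks one of the expected analysis plugins.
# Input: the plain-text `_cat/plugins` table (node name, component, version) on stdin.
# Only nodes that appear in the table are checked.

missing_plugins() {
  awk -v want="analysis-ik analysis-pinyin analysis-smartcn" '
    BEGIN { n = split(want, expected, " ") }
    NF >= 2 { has[$1, $2] = 1; nodes[$1] = 1 }
    END {
      for (node in nodes)
        for (i = 1; i <= n; i++)
          if (!((node, expected[i]) in has))
            print node, expected[i]
    }'
}

# Usage (no output means every listed node has all three plugins):
#   curl -s http://192.168.8.102:9200/_cat/plugins | missing_plugins
```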


Default analyzer (standard), which splits Chinese text into individual characters:
GET _analyze
{  
  "text": "中华人民共和国"
}



IK analyzer, ik_smart mode (coarsest-grained segmentation):
GET _analyze
{  
  "analyzer": "ik_smart",
  "text": "中华人民共和国"
}


smartcn analyzer (official Elastic plugin):
GET _analyze
{  
  "analyzer": "smartcn",
  "text": "中华人民共和国"
}



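IK analyzer, ik_max_word mode (finest-grained segmentation, emitting every word combination it finds):
GET _analyze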
{  
  "analyzer": "ik_max_word",
  "text": "中华人民共和国"
}

pinyin analyzer (converts the Chinese characters to pinyin):
GET _analyze
{  
  "analyzer": "pinyin",
  "text": "中华人民共和国"
}
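The requests above use the Kibana Dev Tools console syntax. Outside Kibana, you can run the same comparison with curl. The following is a minimal sketch, assuming ES_URL points at any node. analyze_body and compare_analyzers are hypothetical helpers. The standard analyzer is named explicitly because it is the one _analyze falls back to when no analyzer is given. The filter_path parameter trims each response down to the token strings.

```shell
#!/usr/bin/env bash
# Sketch: send the same _analyze request to each analyzer from the shell.
# ES_URL is an assumption; any node of the cluster will do.
ES_URL=${ES_URL:-http://192.168.8.102:9200}

# Build the request body. Assumes the text has no double quotes or backslashes,
# since it is pasted into the JSON without escaping.
analyze_body() {
  printf '{"analyzer": "%s", "text": "%s"}' "$1" "$2"
}

compare_analyzers() {
  local text="$1" analyzer
  # "standard" is what _analyze uses when no analyzer is specified.
  for analyzer in standard ik_smart smartcn ik_max_word pinyin; do
    echo "== $analyzer"
    # Elasticsearch 6.x rejects request bodies without an explicit Content-Type.
    # filter_path keeps only the token strings in the response.
    curl -s -H 'Content-Type: application/json' \
      "$ES_URL/_analyze?filter_path=tokens.token" \
      -d "$(analyze_body "$analyzer" "$text")"
    echo
  done
}

# Usage: compare_analyzers "中华人民共和国"
```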

 
