1. Comparison
Using the built-in analyzer
The built-in analyzer splits a Chinese sentence into individual characters, which is of little use for search.
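For example, running the default standard analyzer over a phrase (the phrase here is just an illustration) returns one token per Chinese character:

GET _analyze
{
  "analyzer": "standard",
  "text": "中华人民共和国国歌"
}

Each character comes back as its own token, so a whole word like 国歌 can never be matched as a unit.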
Using the IK analyzer
IK, by contrast, segments text into meaningful Chinese words, which makes it the usual choice for Chinese full-text search.
2. Installing the IK Analyzer
Resources
https://github.com/medcl/elasticsearch-analysis-ik/releases
Link: https://pan.baidu.com/s/1dTzBN6fr1ieks25qDqA26A
Extraction code: 0cc3
Create an ik directory under the plugins directory of the ES installation:
mkdir /usr/local/es/elasticsearch-7.2.0/plugins/ik
Install the unzip command:
yum -y install unzip
Unzip the plugin archive into that directory:
unzip elasticsearch-analysis-ik-7.2.0.zip -d /usr/local/es/elasticsearch-7.2.0/plugins/ik
Restart ES to load the plugin.
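To check that the plugin was picked up, you can list the installed plugins (the path assumes the install directory used above); the output should include the ik plugin:

/usr/local/es/elasticsearch-7.2.0/bin/elasticsearch-plugin list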
3. Usage
IK provides two analyzers:
ik_max_word: the finest-grained segmentation; it extracts as many words as possible from the text
ik_smart: the coarsest-grained segmentation; text already split into one word is not reused by another word
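The token list below is the result of running ik_max_word over 中华人民共和国国歌; the request that produces it looks roughly like this (no index is needed, since the plugin registers the analyzers globally):

GET _analyze
{
  "analyzer": "ik_max_word",
  "text": "中华人民共和国国歌"
}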
{
  "tokens": [
    {
      "token": "中华人民共和国",
      "start_offset": 0,
      "end_offset": 7,
      "type": "CN_WORD",
      "position": 0
    },
    {
      "token": "中华人民",
      "start_offset": 0,
      "end_offset": 4,
      "type": "CN_WORD",
      "position": 1
    },
    {
      "token": "中华",
      "start_offset": 0,
      "end_offset": 2,
      "type": "CN_WORD",
      "position": 2
    },
    {
      "token": "华人",
      "start_offset": 1,
      "end_offset": 3,
      "type": "CN_WORD",
      "position": 3
    },
    {
      "token": "人民共和国",
      "start_offset": 2,
      "end_offset": 7,
      "type": "CN_WORD",
      "position": 4
    },
    {
      "token": "人民",
      "start_offset": 2,
      "end_offset": 4,
      "type": "CN_WORD",
      "position": 5
    },
    {
      "token": "共和国",
      "start_offset": 4,
      "end_offset": 7,
      "type": "CN_WORD",
      "position": 6
    },
    {
      "token": "共和",
      "start_offset": 4,
      "end_offset": 6,
      "type": "CN_WORD",
      "position": 7
    },
    {
      "token": "国",
      "start_offset": 6,
      "end_offset": 7,
      "type": "CN_CHAR",
      "position": 8
    },
    {
      "token": "国歌",
      "start_offset": 7,
      "end_offset": 9,
      "type": "CN_WORD",
      "position": 9
    }
  ]
}
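For comparison, the same request with "analyzer": "ik_smart" does the coarsest split; on this phrase it typically yields just 中华人民共和国 and 国歌:

GET _analyze
{
  "analyzer": "ik_smart",
  "text": "中华人民共和国国歌"
}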
4. Creating an Index with the IK Analyzer
Create the index
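The body below defines an analyzer named ik based on ik_max_word and applies ik_max_word to the username field. In the Kibana console it is sent as an index-creation request; the index name ik_index is only an assumption:

PUT /ik_index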
{
  "settings": {
    "analysis": {
      "analyzer": {
        "ik": {
          "tokenizer": "ik_max_word"
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "username": { "type": "text", "analyzer": "ik_max_word" }
    }
  }
}
Add data
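For example, a document can be indexed like this (index name, id, and field value are assumptions for illustration):

PUT /ik_index/_doc/1
{
  "username": "朴国昌"
}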
Query
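A match query on the username field is analyzed with the same IK analyzer (again, names and values are assumptions):

GET /ik_index/_search
{
  "query": {
    "match": {
      "username": "朴国昌"
    }
  }
}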
5. Custom Dictionary
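Without a custom dictionary, words that are not in IK's built-in dictionary, such as the name 朴国昌, are broken into single characters. The output below comes from a request along these lines (the exact sentence is inferred from the offsets):

GET _analyze
{
  "analyzer": "ik_max_word",
  "text": "你好,我朴国昌"
}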
{
  "tokens": [
    {
      "token": "你好",
      "start_offset": 0,
      "end_offset": 2,
      "type": "CN_WORD",
      "position": 0
    },
    {
      "token": "我",
      "start_offset": 3,
      "end_offset": 4,
      "type": "CN_CHAR",
      "position": 1
    },
    {
      "token": "朴",
      "start_offset": 4,
      "end_offset": 5,
      "type": "CN_CHAR",
      "position": 2
    },
    {
      "token": "国",
      "start_offset": 5,
      "end_offset": 6,
      "type": "CN_CHAR",
      "position": 3
    },
    {
      "token": "昌",
      "start_offset": 6,
      "end_offset": 7,
      "type": "CN_CHAR",
      "position": 4
    }
  ]
}
Create a custom directory in the IK plugin's config directory (where IKAnalyzer.cfg.xml lives):
mkdir custom
custom/myext.dic: extension dictionary for custom words
custom/myext_stopword.dic: extension stopword dictionary
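Each .dic file holds one word per line. Judging from the output after the restart, myext.dic would contain the custom words to recognize, for example:

custom/myext.dic:
史珍香
朴国昌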
vim IKAnalyzer.cfg.xml
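In IKAnalyzer.cfg.xml, point the ext_dict and ext_stopwords entries at the new files; a minimal sketch, with only these two entries changed:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
    <comment>IK Analyzer extension configuration</comment>
    <entry key="ext_dict">custom/myext.dic</entry>
    <entry key="ext_stopwords">custom/myext_stopword.dic</entry>
</properties>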
Restart ES
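After the restart, the same kind of analyze request now recognizes the custom words (the sentence is again inferred from the offsets below):

GET _analyze
{
  "analyzer": "ik_max_word",
  "text": "你好史珍香,我朴国昌"
}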
{
  "tokens": [
    {
      "token": "你好",
      "start_offset": 0,
      "end_offset": 2,
      "type": "CN_WORD",
      "position": 0
    },
    {
      "token": "史珍香",
      "start_offset": 2,
      "end_offset": 5,
      "type": "CN_WORD",
      "position": 1
    },
    {
      "token": "我",
      "start_offset": 6,
      "end_offset": 7,
      "type": "CN_CHAR",
      "position": 2
    },
    {
      "token": "朴国昌",
      "start_offset": 7,
      "end_offset": 10,
      "type": "CN_WORD",
      "position": 3
    }
  ]
}