There happens to be a pinyin analysis plugin for Elasticsearch on GitHub:
https://github.com/medcl/elasticsearch-analysis-pinyin
An analyzer in Elasticsearch is composed of three parts:
character filters: process the text before the tokenizer, e.g. deleting or replacing characters
tokenizer: splits the text into terms according to certain rules, e.g. keyword (no splitting at all), or ik_smart
token filters: further process the terms produced by the tokenizer, e.g. lowercasing, synonym handling, pinyin conversion
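With the pinyin plugin installed, these three parts can be combined into a custom analyzer in the index settings. The sketch below is only illustrative: the analyzer and filter names (`my_analyzer`, `py`) are made up, the `ik_max_word` tokenizer assumes the IK plugin is also installed, and the pinyin filter options shown are a subset; check the plugin's README for the full list.

```json
PUT /test
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "ik_max_word",
          "filter": "py"
        }
      },
      "filter": {
        "py": {
          "type": "pinyin",
          "keep_full_pinyin": false,
          "keep_joined_full_pinyin": true,
          "keep_original": true,
          "remove_duplicated_term": true
        }
      }
    }
  }
}
```

Here `keep_original` keeps the original Chinese terms alongside the pinyin, and `keep_joined_full_pinyin` emits the whole word's pinyin joined together, which is what prefix-style completion typically needs.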
For example, create an index like this:
PUT test2
{
"mappings": {
"properties": {
"title":{
"type": "completion"
}
}
}
}
POST test2/_doc/1
{
"title": ["Sony", "performance"]
}
POST test2/_doc/2
{
"title": ["SK-II", "PITERA"]
}
POST test2/_doc/3
{
"title": ["Nintendo", "switch"]
}
POST test2/_doc/4
{
"title": ["Sony", "Nintendo"]
}
The query DSL is as follows:
GET /test2/_search
{
"suggest": {
"title_suggest": {
"text": "n",
"completion":{
"field":"title",
"size":10,
"skip_duplicates":true
}
}
}
}
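Against the sample documents above, the prefix "n" matches only "Nintendo", and skip_duplicates collapses the hits from documents 3 and 4 into a single option. The response looks roughly like this (abridged):

```json
{
  "suggest": {
    "title_suggest": [
      {
        "text": "n",
        "offset": 0,
        "length": 1,
        "options": [
          { "text": "Nintendo", "_index": "test2" }
        ]
      }
    ]
  }
}
```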
Adjust the mapping and the Java bean yourself; below is just the Java API usage.
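For reference, the `suggestion` field queried by the test below needs a completion mapping on the hotel index. A minimal sketch (field and index names taken from the test; in a real pinyin setup you would also point `analyzer` at a custom pinyin analyzer):

```json
PUT /hotel
{
  "mappings": {
    "properties": {
      "suggestion": {
        "type": "completion"
      }
    }
  }
}
```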
// Auto-completion
@Test
public void test01() throws Exception {
    // 1. Create the request
    SearchRequest request = new SearchRequest("hotel");
    // 2. Prepare the DSL
    request.source().suggest(new SuggestBuilder().addSuggestion("hotelSuggestion",
            SuggestBuilders.completionSuggestion("suggestion")
                    .prefix("shang")
                    .skipDuplicates(true)
                    .size(10)));
    // 3. Send the request
    SearchResponse response = client.search(request, RequestOptions.DEFAULT);
    // 4. Parse the result
    Suggest suggest = response.getSuggest();
    // 4-1 Get the suggestion result set by name
    CompletionSuggestion hotelSuggestion = suggest.getSuggestion("hotelSuggestion");
    // 4-2 Take out the options
    List<CompletionSuggestion.Entry.Option> optionList = hotelSuggestion.getOptions();
    // 4-3 Iterate over them
    for (CompletionSuggestion.Entry.Option option : optionList) {
        // The text of each option is a completed term
        String text = option.getText().string();
        System.out.println(text);
    }
}
Change the parameters to suit your own use case.