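ES Autocomplete: Pinyin Analyzer

Pinyin analysis

To complete suggestions from the letters a user types, documents must be tokenized by pinyin.

A pinyin analysis plugin for elasticsearch is available on GitHub:

https://github.com/medcl/elasticsearch-analysis-pinyin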

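Custom analyzer

An analyzer in elasticsearch is made up of three parts:

  • character filters: process the text before the tokenizer, for example by removing or replacing characters

  • tokenizer: splits the text into terms according to some rule. For example, keyword does not split the text at all; ik_smart is another tokenizer

  • token filters: further process the terms emitted by the tokenizer, for example case conversion, synonym handling, or pinyin conversion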
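The three stages can be pictured as a chain of plain functions. The following is only a sketch in ordinary Java, not elasticsearch's own classes: the class name AnalyzerPipelineSketch, the "&" replacement rule, the whitespace tokenizer, and the lowercase filter are all assumptions chosen to keep the example small.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Illustrative only: mimics the three analyzer stages with plain Java,
// it does not use elasticsearch's actual analysis classes.
class AnalyzerPipelineSketch {

    // Stage 1, character filter: rewrite the raw text before tokenizing.
    static String charFilter(String text) {
        return text.replace("&", " and ");
    }

    // Stage 2, tokenizer: cut the text into terms (here: on whitespace).
    static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String part : text.trim().split("\\s+")) {
            if (!part.isEmpty()) {
                terms.add(part);
            }
        }
        return terms;
    }

    // Stage 3, token filter: post-process each term (here: lowercase it).
    static List<String> tokenFilter(List<String> terms) {
        List<String> out = new ArrayList<>();
        for (String term : terms) {
            out.add(term.toLowerCase(Locale.ROOT));
        }
        return out;
    }

    // The analyzer is the three stages applied in order.
    static List<String> analyze(String text) {
        return tokenFilter(tokenize(charFilter(text)));
    }

    public static void main(String[] args) {
        System.out.println(analyze("Tom & Jerry")); // prints [tom, and, jerry]
    }
}
```

In a real analyzer each stage is chosen by name in the index settings; the point here is only the order in which the stages run.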

When creating an index, we can configure a custom analyzer through settings:

PUT /test
{
  "settings": {
    "analysis": {
      "analyzer": { 
        "my_analyzer": { 
          "tokenizer": "ik_max_word",
          "filter": "py"
        }
      },
      "filter": {
        "py": { 
          "type": "pinyin",
          "keep_full_pinyin": false,
          "keep_joined_full_pinyin": true,
          "keep_original": true,
          "limit_first_letter_length": 16,
          "remove_duplicated_term": true,
          "none_chinese_pinyin_tokenize": false
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "name":{
        "type": "text",
        "analyzer": "my_analyzer",
        "search_analyzer": "ik_smart"
      }
    }
  }
}
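To get a feel for what the py filter settings do to a single term, here is a rough plain-Java imitation. It is a sketch under stated assumptions, not the plugin's code: the class PinyinFilterSketch and its two-entry lookup table are hypothetical, the sample term is the hotel name 如家 (written with Unicode escapes so the snippet compiles under any source encoding), and first-letter output is assumed to be on because the settings above do not disable it. The order of the emitted terms is not meant to match the plugin.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Illustrative only: a tiny stand-in for a pinyin token filter.
class PinyinFilterSketch {

    // Hypothetical two-entry lookup table; the real plugin ships a full dictionary.
    static final Map<Character, String> PINYIN = new HashMap<>();
    static {
        PINYIN.put('\u5982', "ru");  // first character of the sample term
        PINYIN.put('\u5bb6', "jia"); // second character of the sample term
    }

    static Set<String> expand(String term, boolean keepFullPinyin,
                              boolean keepJoinedFullPinyin, boolean keepOriginal) {
        Set<String> out = new LinkedHashSet<>(); // a set drops duplicated terms
        StringBuilder joined = new StringBuilder();
        StringBuilder firstLetters = new StringBuilder();
        for (char c : term.toCharArray()) {
            String py = PINYIN.get(c);
            if (py == null) {
                continue; // characters missing from the table are skipped in this sketch
            }
            if (keepFullPinyin) {
                out.add(py); // one term per syllable
            }
            joined.append(py);
            firstLetters.append(py.charAt(0));
        }
        if (keepJoinedFullPinyin && joined.length() > 0) {
            out.add(joined.toString()); // all syllables joined into one term
        }
        if (firstLetters.length() > 0) {
            out.add(firstLetters.toString()); // first letters, assumed enabled
        }
        if (keepOriginal) {
            out.add(term); // the untouched Chinese term
        }
        return out;
    }

    public static void main(String[] args) {
        // Same flags as the py filter: no per-syllable terms, joined pinyin, original kept.
        System.out.println(expand("\u5982\u5bb6", false, true, true));
    }
}
```

With the flags used in the settings, the sketch yields the joined pinyin rujia, the first letters rj, and the original term, which are the kinds of terms a letter-based completion needs to match against.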

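The pinyin analyzer is suited to building the inverted index but should not be used at search time, because a Chinese query would then also be converted to pinyin and match homophones. This is why the mapping above sets analyzer to my_analyzer and search_analyzer to ik_smart.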
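The homophone problem is easy to reproduce with a toy inverted index. The sketch below is plain Java and purely illustrative: the class SearchAnalyzerSketch, its two-word pinyin lookup, and the helper names are assumptions, and the two sample words (the Chinese words for lion and louse, both pronounced "shizi") are written with Unicode escapes so the snippet compiles under any source encoding.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Illustrative only: shows why a pinyin analyzer at search time matches homophones.
class SearchAnalyzerSketch {
    static final String LION = "\u72ee\u5b50";  // pronounced "shizi" (lion)
    static final String LOUSE = "\u8671\u5b50"; // also pronounced "shizi" (louse)

    // Hypothetical lookup standing in for the pinyin plugin's dictionary.
    static final Map<String, String> PINYIN = new HashMap<>();
    static {
        PINYIN.put(LION, "shizi");
        PINYIN.put(LOUSE, "shizi");
    }

    // Analyzer with the pinyin filter: the original term plus its joined pinyin.
    static Set<String> withPinyin(String term) {
        Set<String> terms = new LinkedHashSet<>();
        terms.add(term);
        String py = PINYIN.get(term);
        if (py != null) {
            terms.add(py);
        }
        return terms;
    }

    // Analyzer without the pinyin filter: the term only.
    static Set<String> plain(String term) {
        Set<String> terms = new LinkedHashSet<>();
        terms.add(term);
        return terms;
    }

    // Inverted index built with the pinyin analyzer: term -> ids of matching documents.
    static Map<String, Set<Integer>> buildIndex(List<String> docs) {
        Map<String, Set<Integer>> index = new HashMap<>();
        for (int id = 0; id < docs.size(); id++) {
            for (String term : withPinyin(docs.get(id))) {
                index.computeIfAbsent(term, k -> new TreeSet<>()).add(id);
            }
        }
        return index;
    }

    // A document is a hit if it contains any of the query terms.
    static Set<Integer> search(Map<String, Set<Integer>> index, Set<String> queryTerms) {
        Set<Integer> hits = new TreeSet<>();
        for (String term : queryTerms) {
            hits.addAll(index.getOrDefault(term, new TreeSet<>()));
        }
        return hits;
    }

    public static void main(String[] args) {
        Map<String, Set<Integer>> index = buildIndex(Arrays.asList(LION, LOUSE));
        System.out.println(search(index, withPinyin(LION))); // prints [0, 1]
        System.out.println(search(index, plain(LION)));      // prints [0]
    }
}
```

Searching for the lion with the pinyin analyzer also returns the louse document, because both share the term shizi. Analyzing the query without pinyin returns only the lion, while a user who types shizi still matches both, which is what autocomplete wants.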

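Autocomplete query

elasticsearch provides the Completion Suggester for autocomplete. It matches terms that begin with the user's input and returns them.

To keep completion queries efficient, it places some constraints on the field types in a document:

  • The field used for completion must be of type completion.

  • The field's content is usually an array of the terms to complete from.

The index therefore has to be set up accordingly, for example: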

PUT /test2
{
  "mappings": {
    "properties": {
      "title":{
        "type": "completion"
      }
    }
  }
}
POST test2/_doc/1
{
  "title": ["Sony", "performance"]
}
POST test2/_doc/2
{
  "title": ["SK-II", "PITERA"]
}
POST test2/_doc/3
{
  "title": ["Nintendo", "switch"]
}
POST test2/_doc/4
{
  "title": ["Sony", "Nintendo"]
}

The query looks like this:

GET /test2/_search
{ 
  "suggest": {
    "title_suggest": {
      "text": "s",
      "completion": {
        "field": "title",
        "skip_duplicates":true,
        "size":10
      }
    }
  }
}
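To make the expected outcome of this query concrete, here is an in-memory imitation over the four sample documents. It is a sketch only: elasticsearch does not scan documents like this, the class CompletionSketch is hypothetical, matching is assumed to be case-insensitive, and results come back in document order here, whereas elasticsearch applies its own ranking.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Illustrative only: prefix matching over the sample "title" arrays.
class CompletionSketch {

    // The "title" arrays of the four sample documents.
    static final List<List<String>> DOCS = Arrays.asList(
        Arrays.asList("Sony", "performance"),
        Arrays.asList("SK-II", "PITERA"),
        Arrays.asList("Nintendo", "switch"),
        Arrays.asList("Sony", "Nintendo"));

    static List<String> suggest(String prefix, boolean skipDuplicates, int size) {
        String wanted = prefix.toLowerCase(Locale.ROOT);
        List<String> hits = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        for (List<String> inputs : DOCS) {
            for (String input : inputs) {
                if (!input.toLowerCase(Locale.ROOT).startsWith(wanted)) {
                    continue; // not a prefix match
                }
                if (skipDuplicates && !seen.add(input)) {
                    continue; // same text already suggested
                }
                hits.add(input);
            }
        }
        // Keep at most `size` suggestions.
        return hits.size() > size ? new ArrayList<>(hits.subList(0, size)) : hits;
    }

    public static void main(String[] args) {
        System.out.println(suggest("s", true, 10)); // prints [Sony, SK-II, switch]
    }
}
```

Without skip_duplicates, "Sony" would appear twice, once for document 1 and once for document 4.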

Implementing search box autocomplete

Java API

The test below sends the suggest request and then parses the result:

// Autocomplete
@Test
public void testSuggest() throws Exception {
    // 1. Create the request
    SearchRequest request = new SearchRequest("hotel");
    // 2. Prepare the DSL parameters
    request.source().suggest(
        new SuggestBuilder().addSuggestion("hotelSuggestion",
                                           SuggestBuilders.completionSuggestion("suggestion")
                                           .prefix("r")
                                           .skipDuplicates(true)
                                           .size(10))
    );
    // 3. Send the request
    SearchResponse response = client.search(request, RequestOptions.DEFAULT);
    // 4. Parse the response
    Suggest suggest = response.getSuggest();
    CompletionSuggestion suggestion = suggest.getSuggestion("hotelSuggestion");
    for (CompletionSuggestion.Entry.Option option : suggestion.getOptions()) {
        String text = option.getText().string();
        System.out.println(text);
    }
}

IHotelService

List<String> getSuggestions(String prefix) throws IOException;

HotelController

@GetMapping("/suggestion")
public List<String> getSuggestions(@RequestParam("key") String prefix) throws IOException {
    return hotelService.getSuggestions(prefix);
}

HotelService

@Override
public List<String> getSuggestions(String prefix) throws IOException {
    // 1. Create the request
    SearchRequest request = new SearchRequest("hotel");
    // 2. Prepare the DSL parameters
    request.source().suggest(
        new SuggestBuilder().addSuggestion("hotelSuggestion",
                                           SuggestBuilders.completionSuggestion("suggestion")
                                           .prefix(prefix)
                                           .skipDuplicates(true)
                                           .size(10))
    );
    // 3. Send the request
    SearchResponse response = client.search(request, RequestOptions.DEFAULT);
    // 4. Parse the response and collect the suggested texts
    Suggest suggest = response.getSuggest();
    CompletionSuggestion suggestion = suggest.getSuggestion("hotelSuggestion");
    List<String> list = new ArrayList<>();
    for (CompletionSuggestion.Entry.Option option : suggestion.getOptions()) {
        String text = option.getText().string();
        list.add(text);
    }
    return list;
}
