ES7: shorthand-pinyin (first-letter) search with the ik and pinyin analyzers



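Preface

I have been reviewing ES recently and remembered that I once set up ik + pinyin analysis on ES6, so I decided to do the same thing again on ES7.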

I. The result first

Calling the search endpoint with just the two letters pg matches the three apple-related (苹果, pingguo) documents stored in ES.

II. Implementation steps

1. Prepare the environment

On Windows 10, Docker Desktop is the recommended way to run ES.

docker pull elasticsearch:7.6.2
docker run --name elasticsearch  -d -e ES_JAVA_OPTS="-Xms512m -Xmx512m" -e "discovery.type=single-node" -p 9200:9200 -p 9300:9300 elasticsearch:7.6.2
In elasticsearch.yml, under the config directory inside the ES container, add the following settings to allow cross-origin access
http.cors.enabled: true
http.cors.allow-origin: "*"
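Download the analyzer plugins; they are available from the Elastic Chinese community: https://elasticsearch.cn/download/
Copy the ik and pinyin plugin archives into the container's plugins directory with docker cp, then unzip each one there (run the unzip commands inside the container, in the plugins directory)
docker cp ./elasticsearch-analysis-ik-7.6.2.zip elasticsearch:/usr/share/elasticsearch/plugins
docker cp ./elasticsearch-analysis-pinyin-7.6.2.zip elasticsearch:/usr/share/elasticsearch/plugins
unzip elasticsearch-analysis-ik-7.6.2.zip -d elasticsearch-analysis-ik-7.6.2
unzip elasticsearch-analysis-pinyin-7.6.2.zip -d elasticsearch-analysis-pinyin-7.6.2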
Restart ES
docker restart elasticsearch
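
After the restart it is worth confirming that both plugins actually loaded. A GET request to localhost:9200/_cat/plugins lists one plugin per line (node name, plugin name, version). The following is a minimal Java sketch, not taken from the original project, that checks such output for the two plugins. The class name PluginCheck, the sample text, and the plugin names analysis-ik and analysis-pinyin are assumptions, so compare them with what your own node reports.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for illustration; not part of the original project.
final class PluginCheck {

    // Returns the required plugin names that do NOT appear in the text
    // returned by GET /_cat/plugins. Each line is expected to hold
    // whitespace-separated columns: node name, plugin name, version.
    static List<String> missingPlugins(String catPluginsOutput, List<String> required) {
        Set<String> installed = new HashSet<>();
        for (String line : catPluginsOutput.split("\\R")) {
            String[] columns = line.trim().split("\\s+");
            if (columns.length >= 2) {
                installed.add(columns[1]); // second column is the plugin name
            }
        }
        List<String> missing = new ArrayList<>();
        for (String name : required) {
            if (!installed.contains(name)) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Assumed sample output; paste the real response from your node here.
        String sample = "node-1 analysis-ik 7.6.2\nnode-1 analysis-pinyin 7.6.2\n";
        List<String> required = Arrays.asList("analysis-ik", "analysis-pinyin");
        System.out.println("missing plugins: " + missingPlugins(sample, required));
    }
}
```

An empty list means both plugins were found; anything else names the plugin that still needs to be copied and unzipped.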

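2. Create the index

Send a PUT request from Postman to localhost:9200/product to create an index named product. The request also configures an analyzer that combines ik with pinyin, named ik_pinyin_analyzer. At this point the index has no mapping yet.
The request body is as follows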

{
  "settings": {
    "index": {
      "number_of_replicas": "0",
      "number_of_shards": "1",
      "analysis": {
        "analyzer": {
          "ik_pinyin_analyzer": {
            "tokenizer": "my_ik_pinyin",
            "filter": "pinyin_first_letter_and_full_pinyin_filter"
          },
          "pinyin_analyzer": {
            "tokenizer": "my_pinyin"
          }
        },
        "tokenizer": {
          "my_ik_pinyin": {
            "type": "ik_max_word"
          },
          "my_pinyin": {
            "type": "pinyin",
            "keep_first_letter": true,
            "keep_separate_first_letter": false,
            "keep_full_pinyin": false,
            "keep_joined_full_pinyin": true,
            "keep_none_chinese": true,
            "none_chinese_pinyin_tokenize": false,
            "keep_none_chinese_in_joined_full_pinyin": true,
            "keep_original": false,
            "limit_first_letter_length": 16,
            "lowercase": true,
            "trim_whitespace": true,
            "remove_duplicated_term": true
          }
        },
        "filter": {
          "pinyin_first_letter_and_full_pinyin_filter": {
            "type": "pinyin",
            "keep_first_letter": true,
            "keep_separate_first_letter": false,
            "keep_full_pinyin": false,
            "keep_joined_full_pinyin": true,
            "keep_none_chinese": true,
            "none_chinese_pinyin_tokenize": false,
            "keep_none_chinese_in_joined_full_pinyin": true,
            "keep_original": false,
            "limit_first_letter_length": 16,
            "lowercase": true,
            "trim_whitespace": true,
            "remove_duplicated_term": true
          }
        }
      }
    }
  }
}
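
To see why the input pg can match the word for apple: with keep_first_letter and keep_joined_full_pinyin enabled and keep_full_pinyin disabled, the pinyin filter is expected to turn each term into its joined full pinyin (pingguo) plus its first-letter abbreviation (pg). The toy Java model below only illustrates that idea and is not the plugin's implementation. The class PinyinTokenSketch and its two-entry reading table are made up for the example, and options such as limit_first_letter_length are ignored.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Toy model of the two token kinds kept by the filter settings above.
// Hypothetical code: the real plugin uses a full pinyin dictionary.
final class PinyinTokenSketch {

    // Hand-written readings for the demo only.
    private static final Map<Character, String> READINGS = new HashMap<>();
    static {
        READINGS.put('\u82f9', "ping"); // first character of "apple"
        READINGS.put('\u679c', "guo");  // second character of "apple"
    }

    // Returns the joined full pinyin and the first-letter abbreviation of a term.
    static Set<String> tokens(String term) {
        StringBuilder joined = new StringBuilder();
        StringBuilder initials = new StringBuilder();
        for (char c : term.toCharArray()) {
            String reading = READINGS.get(c);
            if (reading == null) {
                // Simplification: characters without a known reading are kept
                // as-is, lowercased, in both tokens.
                char kept = Character.toLowerCase(c);
                joined.append(kept);
                initials.append(kept);
            } else {
                joined.append(reading);
                initials.append(reading.charAt(0));
            }
        }
        // A set drops duplicates, loosely mirroring remove_duplicated_term.
        Set<String> out = new LinkedHashSet<>();
        if (joined.length() > 0) out.add(joined.toString());
        if (initials.length() > 0) out.add(initials.toString());
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokens("\u82f9\u679c")); // indexed term
        System.out.println(tokens("pg"));           // search input
    }
}
```

In this model the indexed term and the search input pg share the token pg, which is the overlap a match query on the ik_pinyin sub-field relies on.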

The entity class is as follows

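import lombok.AllArgsConstructor;
import lombok.Data;
import lombok.NoArgsConstructor;
import org.springframework.data.elasticsearch.annotations.Document;
import org.springframework.data.elasticsearch.annotations.Field;
import org.springframework.data.elasticsearch.annotations.FieldType;
import org.springframework.data.elasticsearch.annotations.InnerField;
import org.springframework.data.elasticsearch.annotations.MultiField;

@Data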
@AllArgsConstructor
@NoArgsConstructor
@Document(indexName = "product")
public class Product {

    @Field(store = true,type = FieldType.Keyword)
    private Long id;

    @MultiField(
            mainField = @Field(type = FieldType.Keyword),
            otherFields = {
                    @InnerField(type = FieldType.Text, suffix = "ik", analyzer = "ik_max_word", searchAnalyzer = "ik_max_word"),
                    @InnerField(type = FieldType.Text, suffix = "ik_pinyin", analyzer = "ik_pinyin_analyzer", searchAnalyzer = "ik_pinyin_analyzer"),
                    @InnerField(type = FieldType.Text, suffix = "pinyin", analyzer = "pinyin_analyzer", searchAnalyzer = "pinyin_analyzer")
            }
    )
    private String name;

    @MultiField(
            mainField = @Field(type = FieldType.Keyword),
            otherFields = {
                    @InnerField(type = FieldType.Text, suffix = "ik", analyzer = "ik_max_word", searchAnalyzer = "ik_max_word"),
                    @InnerField(type = FieldType.Text, suffix = "ik_pinyin", analyzer = "ik_pinyin_analyzer", searchAnalyzer = "ik_pinyin_analyzer"),
                    @InnerField(type = FieldType.Text, suffix = "pinyin", analyzer = "pinyin_analyzer", searchAnalyzer = "pinyin_analyzer")
            }
    )
    private String category;

    @MultiField(
            mainField = @Field(type = FieldType.Keyword),
            otherFields = {
                    @InnerField(type = FieldType.Text, suffix = "ik", analyzer = "ik_max_word", searchAnalyzer = "ik_max_word"),
                    @InnerField(type = FieldType.Text, suffix = "ik_pinyin", analyzer = "ik_pinyin_analyzer", searchAnalyzer = "ik_pinyin_analyzer"),
                    @InnerField(type = FieldType.Text, suffix = "pinyin", analyzer = "pinyin_analyzer", searchAnalyzer = "pinyin_analyzer")
            }
    )
    private String tag;

    @Field(store = true, type = FieldType.Double)
    private Double price;


}

In code, the mapping for the Product entity class is created through the ElasticsearchRestTemplate that Spring Boot's Elasticsearch support provides. See GitHub for the full code.

    
    public JSONResult create() {
        Document b = esTemplate.indexOps(Product.class).createMapping(Product.class);
        boolean putMapping = esTemplate.indexOps(Product.class).putMapping(b);
        return JSONResult.ok(putMapping);
    }

You can now use the es-head plugin to check that the index mappings line up with the Product entity class.


3. Add data and test the search endpoint

The search endpoint code is as follows
@Override
public JSONResult select(String keyword) {
    Pageable pageable = PageRequest.of(0, 10);

    SortBuilder sortBuilder = new FieldSortBuilder("price")
            .order(SortOrder.ASC);
    QueryBuilder queryBuilder = QueryBuilders.boolQuery()
            // Shorthand pinyin input (e.g. pg) is matched through the ik + pinyin analyzer
            .should(QueryBuilders.matchQuery("name.ik_pinyin",keyword).boost(2))
            // Chinese text input is matched through the ik analyzer
            .should(QueryBuilders.matchQuery("name.ik",keyword).boost(2))
            // The category and tag fields may also match
            .should(QueryBuilders.matchQuery("category.ik",keyword).boost(1))
            .should(QueryBuilders.matchQuery("tag.ik",keyword).boost(1));
    NativeSearchQuery query = new NativeSearchQueryBuilder()
            .withQuery(queryBuilder)
            .withPageable(pageable)
            .withSort(sortBuilder)
            .build();
    SearchHits<Product> searchHits = esTemplate.search(query, Product.class);
    List<SearchHit<Product>> data = searchHits.getSearchHits();
    return JSONResult.ok(data);
}
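
For testing directly from Postman, the query built above corresponds roughly to a bool query with four should clauses, sorted by price ascending, returning the first page of 10 hits. The sketch below assembles that request body as a plain string. SearchDslSketch and its methods are hypothetical helpers, not part of the project, and the escaping only covers quotes and backslashes, so treat it as an illustration rather than a JSON library replacement.

```java
// Hypothetical helper that writes out the query DSL equivalent of the
// bool/should query used in the search method; for illustration only.
final class SearchDslSketch {

    // Minimal escaping: backslashes and double quotes only.
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // One match clause on a field with a boost.
    static String matchClause(String field, String keyword, int boost) {
        return String.format(
                "{\"match\":{\"%s\":{\"query\":\"%s\",\"boost\":%d}}}",
                field, escape(keyword), boost);
    }

    // Full search body: first page of 10, four should clauses, price ascending.
    static String build(String keyword) {
        String should = String.join(",",
                matchClause("name.ik_pinyin", keyword, 2),
                matchClause("name.ik", keyword, 2),
                matchClause("category.ik", keyword, 1),
                matchClause("tag.ik", keyword, 1));
        return "{\"from\":0,\"size\":10,"
                + "\"query\":{\"bool\":{\"should\":[" + should + "]}},"
                + "\"sort\":[{\"price\":{\"order\":\"asc\"}}]}";
    }

    public static void main(String[] args) {
        System.out.println(build("pg"));
    }
}
```

The printed body can be sent as a POST to localhost:9200/product/_search to compare the hits with what the Java endpoint returns.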

After that, data can be added through the add endpoint

    public JSONResult add(ProductBO product) {
        Product p = new Product(product.getId(), product.getName(), product.getCategory(),product.getTag(), product.getPrice());
        Product save = esTemplate.save(p);
        return JSONResult.ok(save);
    }

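That completes shorthand-pinyin search on ES7 with the ik and pinyin analyzers. It looks fancy, but I suspect few real-world scenarios actually need it.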
