Integrating ES 7.6.2 with Spring Boot and Analyzing Fields with Chinese and Pinyin Analyzers


满小超的代码世界


Original link: https://blog.csdn.net/qq_48329942/article/details/120717444


Contents

Preface
I. Why not use Spring's spring-data-es?
II. Two ways to integrate ES with Spring Boot
III. Operating on ES with the high-level client
Special notes
Installing the es-head plugin
Summary

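Preface

In a recent streaming-media project I needed to integrate the ES search engine; the latest ES release line is 7.x. In past projects I always used Spring's spring-data-es, defining a custom interface that extends ElasticsearchRepository to get CRUD operations. Spring's ES support tends to lag behind ES releases, though, and switching to the rest-client recommended on the ES website gives a noticeable efficiency gain.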


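I. Why not use Spring's spring-data-es?

Another blogger has written a fairly detailed explanation of this.

In short:

1. Older spring-data-es versions are built on TransportClient, which is being phased out and is already deprecated in ES 7.x.
2. Spring releases lag behind ES, although Spring Boot 2.3.0 and later support ES 7.
3. The high-level client makes CRUD code more verbose, but its API closely mirrors the query DSL, so you can write code by following the DSL.
4. The REST client comes in two flavors: the low-level client works with any ES version, while the high-level client should match the server version, so use the high-level client when your versions line up and the low-level one when they don't.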

II. Two ways to integrate ES with Spring Boot

Although I use the client that ES recommends, here is the spring-data-es approach as well.

1. Using Elasticsearch through spring-data-es

		<dependency>
			<groupId>org.springframework.boot</groupId>
			<artifactId>spring-boot-starter-data-elasticsearch</artifactId>
		</dependency>

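Note:
Spring Boot only manages ES 7.x from 2.3.0 onward. On Spring Boot 2.2.x you must set the ES version manually, because 2.2.x ships with ES 6.x. ES 7.x is not recommended with Spring Boot 2.1.x, since some classes may be missing.

I use Spring Boot 2.2.5 and set the ES version in the pom properties: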

    <properties>
        <java.version>1.8</java.version>
        <elasticsearch.version>7.6.2</elasticsearch.version>
    </properties>

2. Annotations on the document class

Example code:

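import io.swagger.annotations.ApiModel;
import io.swagger.annotations.ApiModelProperty;
import lombok.AllArgsConstructor;
import lombok.Data;
import lombok.NoArgsConstructor;
import org.springframework.data.annotation.Id;
import org.springframework.data.elasticsearch.annotations.Document;
import org.springframework.data.elasticsearch.annotations.Field;
import org.springframework.data.elasticsearch.annotations.FieldType;

import java.util.Date;

@Data
@ApiModel("ES file storage document")
@NoArgsConstructor
@AllArgsConstructor
@Document(indexName = "upload-file") // no type attribute: ES 7.x indices have the single type _doc
public class UploadFileDoc {

    @Id
    private String id;

    @ApiModelProperty("Original file name")
    // Chinese full-text field, segmented by the IK analyzer at index and search time
    @Field(type = FieldType.Text, analyzer = "ik_max_word", searchAnalyzer = "ik_max_word")
    private String fileName;

    @ApiModelProperty("File type")
    @Field(type = FieldType.Keyword)
    private String type;

    @ApiModelProperty("Thumbnail URL")
    private String picUrl;

    @ApiModelProperty(value = "Created by")
    @Field(type = FieldType.Keyword)
    private String createBy;

    @Field(type = FieldType.Date)
    private Date createTime;

    @Field(type = FieldType.Date)
    private Date updateTime;
}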

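In the @Field annotation the analyzer is set to ik_max_word (finest-grained segmentation); ik_smart (coarsest-grained) works as well. A plain @Field gives the field only one analyzer. To use the pinyin analyzer instead, replace ik_max_word with analyzer = "pinyin_analyzer", provided pinyin_analyzer is defined in the index settings (as in the setting file in section III).

In ES 7.x you no longer specify a type: every index has the single type _doc, and types are removed entirely in ES 8. The reason is that documents of different types are stored in the same index, so fields that exist in only one type are sparse across the whole index, which hurts Lucene's compression.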
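The practical effect of dropping types shows up in the REST paths. This hypothetical helper (the class, its methods and the example type name `file` are made up for illustration) builds the path for indexing one document in both styles: 6.x puts a caller-chosen type name in the path, while 7.x uses the fixed `_doc` endpoint.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/** Hypothetical helper: builds the REST path used to index one document. */
final class DocPaths {

    private DocPaths() {}

    /** Percent-encodes one path segment (URLEncoder targets form data, so fix up spaces). */
    static String segment(String raw) {
        try {
            return URLEncoder.encode(raw, "UTF-8").replace("+", "%20");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    /** ES 6.x style: the caller-chosen type name is part of the path. */
    static String es6Path(String index, String type, String id) {
        return "/" + segment(index) + "/" + segment(type) + "/" + segment(id);
    }

    /** ES 7.x style: types are gone and documents go through the fixed _doc endpoint. */
    static String es7Path(String index, String id) {
        return "/" + segment(index) + "/_doc/" + segment(id);
    }

    public static void main(String[] args) {
        System.out.println(es6Path("upload-file", "file", "42")); // /upload-file/file/42
        System.out.println(es7Path("upload-file", "42"));         // /upload-file/_doc/42
    }
}
```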

3. Repository interface

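import org.springframework.data.elasticsearch.repository.ElasticsearchRepository;
import org.springframework.stereotype.Repository;

@Repository
public interface UploadFileDocRepository extends ElasticsearchRepository<UploadFileDoc, String> {
}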

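When the application starts, Spring Data scans this repository, reads the annotations on its document class, and creates the index together with its fields and field types.

From here on, inject UploadFileDocRepository into a controller or any other layer and call it for CRUD operations.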

III. Operating on ES with the high-level client

1. Document class

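import io.swagger.annotations.ApiModel;
import io.swagger.annotations.ApiModelProperty;
import lombok.AllArgsConstructor;
import lombok.Builder;
import lombok.Data;
import lombok.NoArgsConstructor;
import org.springframework.data.annotation.Id;
import org.springframework.data.elasticsearch.annotations.Document;
import org.springframework.data.elasticsearch.annotations.Mapping;
import org.springframework.data.elasticsearch.annotations.Setting;

import java.util.Date;

@Builder
@Data
@ApiModel("ES file storage document")
@NoArgsConstructor
@AllArgsConstructor
@Document(indexName = "upload-file")
// Index settings (shards, analyzers) and field mappings are read from JSON files on the classpath,
// which is why the per-field @Field annotations below are commented out.
@Setting(settingPath = "uploadfile_setting.json")
@Mapping(mappingPath = "uploadfile_mapping.json")
public class UploadFileDoc {

    @Id
    private String id;

    @ApiModelProperty("Original file name")
    //@Field(type = FieldType.Text,analyzer = "ik_max_word",searchAnalyzer = "ik_max_word")
    private String fileName;

    @ApiModelProperty("File type")
    //@Field(type = FieldType.Keyword)
    private String type;

    @ApiModelProperty("Thumbnail URL")
    private String picUrl;

    @ApiModelProperty(value = "Created by")
    //@Field(type = FieldType.Keyword)
    private String createBy;

    //@Field(type = FieldType.Date)
    private Date createTime;

    //@Field(type = FieldType.Date)
    private Date updateTime;

}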

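2. Using the Chinese and pinyin analyzers together

Spring Boot offers several ways to map fields. The first approach above uses @Field to set each field's type and analyzer, but that does not cover some scenarios, such as analyzing one field with two different analyzers. Here the @Setting and @Mapping annotations point to JSON files instead.

Straight to the code: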

uploadfile_setting.json

{
  "index": {
    "number_of_shards" : "3",
    "number_of_replicas" : "1",
    "analysis": {
        "analyzer": {
          "default" : {
            "tokenizer" : "ik_max_word"
          },
          "pinyin_analyzer": {
            "tokenizer": "my_pinyin"
          }
        },
        "tokenizer": {
          "my_pinyin": {
            "type": "pinyin",
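            // true: keep the first letter of each character, e.g. 刘德华 -> ldh
            "keep_first_letter": true,
            // true: also emit each first letter as a separate token (刘德华 -> l, d, h),
            // which can make matches too fuzzy; false keeps the first letters joined
            "keep_separate_first_letter": false,
            // true: keep the full pinyin of each character, e.g. 刘德华 -> [liu, de, hua]
            "keep_full_pinyin": true,
            // true: also keep the original input as a token
            "keep_original": true,
            // maximum length of the first-letter token
            "limit_first_letter_length": 16,
            "lowercase": true,
            // remove duplicated terms, e.g. 德的 -> de
            "remove_duplicated_term": true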
          }
        }
      }
    }
}

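The setting file sets the number of shards and replicas (if omitted, ES 7.x defaults to 1 shard and 1 replica; before 7.0 the defaults were 5 and 1), makes the IK tokenizer the default analyzer, defines pinyin_analyzer on top of the custom tokenizer my_pinyin, and configures my_pinyin's options. You can find files like this online, but I recommend reading the actual index settings in es-head instead.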
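Once the index exists, the Analyze API is a quick way to see what pinyin_analyzer (or ik_max_word versus ik_smart) actually produces before relying on it in queries. `POST /{index}/_analyze` with an `analyzer` and a `text` is the standard ES request; the index has to be in the path because pinyin_analyzer is a custom analyzer defined in that index's settings. The sketch below is dependency-free Java 8 and deliberately uses HttpURLConnection instead of the high-level client to stay self-contained; the class name, the localhost:9200 address and the sample text are assumptions.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: asks ES how an analyzer tokenizes a piece of text. */
final class AnalyzeCheck {

    private AnalyzeCheck() {}

    /** Minimal JSON string escaping: quotes, backslashes and control characters. */
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }

    /** Request body for POST /{index}/_analyze. */
    static String analyzeBody(String analyzer, String text) {
        return "{\"analyzer\":" + quote(analyzer) + ",\"text\":" + quote(text) + "}";
    }

    /** Sends the _analyze request and returns the raw JSON response (no auth, timeouts or retries). */
    static String analyze(String baseUrl, String index, String analyzer, String text) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(baseUrl + "/" + index + "/_analyze").openConnection();
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Content-Type", "application/json; charset=UTF-8");
        conn.setDoOutput(true);
        try (OutputStream out = conn.getOutputStream()) {
            out.write(analyzeBody(analyzer, text).getBytes(StandardCharsets.UTF_8));
        }
        InputStream in = conn.getResponseCode() < 400 ? conn.getInputStream() : conn.getErrorStream();
        if (in == null) {
            return "";
        }
        try (InputStream body = in) {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = body.read(chunk)) != -1) {
                buf.write(chunk, 0, n);
            }
            return new String(buf.toByteArray(), StandardCharsets.UTF_8);
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumes a local node on localhost:9200 with the upload-file index already created.
        System.out.println(analyze("http://localhost:9200", "upload-file", "pinyin_analyzer", "刘德华"));
    }
}
```

Run main against a local node and inspect the tokens array in the response. Expected tokens are not listed here, since they depend on the plugin version and the options in the setting file.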

uploadfile_mapping.json

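Many versions of this file found online are wrong, possibly because of version differences: some wrap it in an extra "mappings": {} object, others add the index and type names, and the application then fails at startup with a format error. That is why I recommend copying the mapping from es-head. If you cannot produce one that way, create the index with the first (@Field) approach and then inspect its mapping.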

{
  "properties": {
    "createBy": {
      "type": "keyword"
    },
    "createTime": {
      "type": "date"
    },
    "type": {
      "type": "keyword"
    },
    "fileName": {
      "type": "text",
      "analyzer": "ik_max_word",
      "search_analyzer": "ik_max_word",
      "fields": {
        "pinyin": {
          "type": "text",
          "term_vector": "with_positions_offsets",
          "analyzer": "pinyin_analyzer",
          "boost": 10.0
        }
      }
    }
  }
}
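To see why the mapping file must start directly at "properties": in ES 7.x the body of `PUT /{index}/_mapping` is just `{"properties": ...}`, and when settings and mappings are sent together in a single create-index request (`PUT /{index}`) they sit under top-level "settings" and "mappings" keys. The hypothetical sketch below builds that combined body from the two files and rejects a mapping file that already carries a "mappings" wrapper. It only illustrates the request shape with the plain JDK; it is not how Spring Data builds its requests internally, and the file locations are assumed.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

/** Hypothetical sketch: combines the setting and mapping files into the JSON body of PUT /{index}. */
final class CreateIndexBody {

    private CreateIndexBody() {}

    static String build(String settingsJson, String mappingJson) {
        String mapping = mappingJson.trim();
        // Guard against the wrapped variant found online: {"mappings": {...}}.
        if (mapping.replaceAll("\\s+", "").startsWith("{\"mappings\"")) {
            throw new IllegalArgumentException("mapping file must start with \"properties\", not \"mappings\"");
        }
        return "{\"settings\":" + settingsJson.trim() + ",\"mappings\":" + mapping + "}";
    }

    public static void main(String[] args) throws IOException {
        // Assumes both files are in the working directory (in the project they live on the classpath).
        String settings = new String(Files.readAllBytes(Paths.get("uploadfile_setting.json")), StandardCharsets.UTF_8);
        String mapping = new String(Files.readAllBytes(Paths.get("uploadfile_mapping.json")), StandardCharsets.UTF_8);
        System.out.println(build(settings, mapping));
    }
}
```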

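3. CRUD operations with the client

This blog post covers the topic in detail and compares each call with the equivalent DSL.

In short, you build a request object and pass it to a client method.

For example, delete: build a DeleteRequest, set its parameters, then call restHighLevelClient.delete(request, RequestOptions.DEFAULT);
Update: build an UpdateRequest, set its parameters, then call restHighLevelClient.update(updateRequest, RequestOptions.DEFAULT);
For other, more advanced queries you need to know the DSL in order to build the request objects.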
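Because the high-level client's request objects mirror the DSL, it can help to see the plain ES 7.x REST calls behind them. The following is a hypothetical, dependency-free sketch (class and method names are made up) that builds the equivalents of a DeleteRequest, an UpdateRequest with a partial document, and a multi_match search over fileName and the fileName.pinyin sub-field from uploadfile_mapping.json. It only builds method, path and body; it sends nothing and is not the high-level client API.

```java
/**
 * Hypothetical sketch: the ES 7.x REST calls that a few high-level client requests
 * correspond to. Index names and ids are assumed to be URL-safe (no encoding here).
 */
final class RestEquivalents {

    /** One HTTP call: method, path and optional JSON body. */
    static final class Call {
        final String method;
        final String path;
        final String body; // null when the request has no body

        Call(String method, String path, String body) {
            this.method = method;
            this.path = path;
            this.body = body;
        }

        @Override
        public String toString() {
            return body == null ? method + " " + path : method + " " + path + "\n" + body;
        }
    }

    private RestEquivalents() {}

    /** Minimal JSON string escaping: quotes, backslashes and control characters. */
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            if (c == '"' || c == '\\') {
                sb.append('\\').append(c);
            } else if (c < 0x20) {
                sb.append(String.format("\\u%04x", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    /** new DeleteRequest(index, id) corresponds to DELETE /{index}/_doc/{id}. */
    static Call delete(String index, String id) {
        return new Call("DELETE", "/" + index + "/_doc/" + id, null);
    }

    /** new UpdateRequest(index, id).doc(field, value) corresponds to POST /{index}/_update/{id}. */
    static Call updateField(String index, String id, String field, String value) {
        String body = "{\"doc\":{" + quote(field) + ":" + quote(value) + "}}";
        return new Call("POST", "/" + index + "/_update/" + id, body);
    }

    /** A multi_match query over fileName (IK) and its pinyin sub-field. */
    static Call searchFileName(String index, String text) {
        String body = "{\"query\":{\"multi_match\":{\"query\":" + quote(text)
                + ",\"fields\":[\"fileName\",\"fileName.pinyin\"]}}}";
        return new Call("POST", "/" + index + "/_search", body);
    }

    public static void main(String[] args) {
        System.out.println(delete("upload-file", "42"));
        System.out.println(updateField("upload-file", "42", "fileName", "季度报告.pdf"));
        System.out.println(searchFileName("upload-file", "baogao"));
    }
}
```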

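Special notes

ES is strict about version consistency, so if you use the 7.x client, run the matching version on the ES server as well.
1. Analyzer download link: https://github.com/medcl/elasticsearch-analysis-ik/releases
2. The IK and pinyin analyzers are written by the same developer, so you will find them in the same place.
3. If the pinyin analyzer has no 7.6.2 release, edit the plugin's config file (the elasticsearch.version entry in plugin-descriptor.properties) to force it to the matching version; otherwise ES will certainly report an error.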
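The version rule above can be checked before installing plugins: `GET /` on a node returns JSON whose `version.number` field is the server version. The hypothetical sketch below (class and method names are made up) extracts that number with a regular expression instead of a JSON library, which is enough for this one field but is not a general JSON parser.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical check that the ES server version matches the version the project targets. */
final class VersionCheck {

    // Matches "number" : "7.6.2" inside the "version" object of the GET / response.
    private static final Pattern NUMBER = Pattern.compile("\"number\"\\s*:\\s*\"([^\"]+)\"");

    private VersionCheck() {}

    /** Extracts version.number from the raw JSON returned by GET /. */
    static String serverVersion(String rootResponse) {
        Matcher m = NUMBER.matcher(rootResponse);
        if (!m.find()) {
            throw new IllegalArgumentException("no \"number\" field found in response");
        }
        return m.group(1);
    }

    /** Plugins such as IK and pinyin are only loaded when built for this exact version. */
    static boolean matches(String rootResponse, String expected) {
        return serverVersion(rootResponse).equals(expected);
    }

    public static void main(String[] args) {
        String sample = "{\"name\":\"node-1\",\"version\":{\"number\":\"7.6.2\"},\"tagline\":\"You Know, for Search\"}";
        System.out.println(serverVersion(sample));    // 7.6.2
        System.out.println(matches(sample, "7.6.2")); // true
    }
}
```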

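Installing the es-head plugin

1. Chrome has an es-head extension you can use directly.
2. To install it yourself, download it here: https://codechina.csdn.net/mirrors/mobz/elasticsearch-head?utm_source=csdn_github_accelerator
3. Installing it yourself requires a Node.js environment, because the download seems to be a front-end web app. Ugh.
4. Node.js download link: https://nodejs.org/en/download/
5. Install grunt: open cmd in the Node.js directory and run npm install -g grunt-cli
6. In the head root directory, run npm install in cmd to install its dependencies.
7. In the same root directory, run grunt server in cmd to start it.
8. Without grunt, npm run start starts it as well.
9. To run it in the background on Linux: grunt server & or npm run start &
10. es-head connects to localhost:9200 by default; to point it elsewhere, replace localhost in _site/app.js with the target address.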

Summary

Keep studying: ES runs deep, and I haven't mastered it yet.
