Postagesql + JPA 实现中文全文搜索
1.需求
原本平常常用的数据库是用的MySQL和influxdb,但是了解到Postagsql数据库可以进行全文搜索而且比sql语句的like更优后,想用Postagesql做探测信息的Log的查询,这里用这篇文章记录和总结下。
2. 实现
tsvector类型
这个类型是进行数据库保存分词的类型,我们可以把用储存我们进行的分词语句存放到这个数据库类型里面。这里我把search来保存文本分词后的结构。(当然这里仅仅是对于中文分词,英文分词的话不需要进行数据库存放分词数据,只需要在查询的时候用函数操作分词就好。)
create TABLE asset_log (
id SERIAL PRIMARY KEY,
biz_id varchar(50),`在这里插入代码片`
title varchar(500),
status varchar(50),
description varchar(2000),
search tsvector,
asset_bom_biz_id varchar(50),
asset_bom_node_biz_id varchar(50),
asset_log_template_id int,
content jsonb DEFAULT NULL,
start_time timestamp NULL,
end_time timestamp NULL,
CONSTRAINT fk_iot_asset_log_asset_log_template_id FOREIGN KEY ( asset_log_template_id ) REFERENCES asset_log_template ( id )
);
中文分词函数
中文分词用的是jieba的库,利用jieba直接进行分词。
首先引入依赖
compile group: 'com.huaban',name: 'jieba-analysis',version: '1.0.2'
然后写工具类方法 ,注意【JiebaSegmenter.SegMode】的模式有两种,一种是index,一种是seach,seach分词更粗,index更细点,搜索的话,建议选择seach。
public List<String> segmente(String text, JiebaSegmenter.SegMode mode) {
JiebaSegmenter segmenter = new JiebaSegmenter();
List<SegToken> tokens = segmenter.process(text, mode);
List<String> result = tokens.stream()
.map(tk -> tk.word)
.filter(t -> t != null && !t.isBlank())
.collect(Collectors.toCollection(LinkedList::new));
return result;
}
实体类:
实体类的映射是String,本来我以为是PostgreSQLTSVectorType类型,但是反射出错了,改成Sting就好了。
public class AssetLog {
@Builder.Default
private String bizId = UUID.randomUUID().toString();
@Id
@GeneratedValue(strategy = GenerationType.IDENTITY)
private Long id;
private String title;
private String status;
private String description;
private String search ;
private String assetBomBizId;
private String assetBomNodeBizId;
}
Repository
可以看到里面就是用原生sql写对数据库进行更新seach分词的操作,首先用postagsql的to_tsvector函数进行对录入的数据进行分词,然后用setweight函数进行权重的分配,这里我的log主要是两种信息,一种是title,一种是description。把title设为权重A,description设为权重B。|| 符号是用来做拼接的。
public interface AssetLogRepository extends JpaRepository<AssetLog, Long>, JpaSpecificationExecutor<AssetLog> {
@Modifying
@Query(value = "UPDATE asset_log\n" +
"SET search = setweight(to_tsvector('simple', :title), 'A') || setweight(to_tsvector('simple', :description), 'B') WHERE id = :id", nativeQuery = true)
void segment(@Param("title") String title,@Param("description") String description,@Param("id")Long id) ;
}
Service
- 插入数据,在插入数据的时候,我们就对录入的数据进行分词处理和权重处理,并且进行保存到seach里面。
@Transactional
public AssetLogDTO createAssetLog(String accountBizId, UpsertAssetLogDTO request) {
if (StringUtils.isBlank(request.getAssetLogTemplateBizId())) {
throw new BadRequest(ResponseCode.BAD_REQUEST, "AssetLogTemplateId is not null");
}
var assetLog = AssetLog.builder()
.assetLogTemplate(this.getAssetLogTemplateEntity(request.getAssetLogTemplateBizId(), accountBizId))
.assetBomBizId(request.getAssetBomBizId())
.assetBomNodeBizId(request.getAssetBomNodeBizId())
.description(request.getDescription())
.status(request.getStatus())
.title(request.getTitle())
.build();
if (request.getFormContent() != null) {
var newJsonForm = new ObjectMapper().createObjectNode();
if (assetLog.getContent() != null) {
assetLog.getContent().fields()
.forEachRemaining(kv -> newJsonForm.set(kv.getKey(), kv.getValue()));
request.getFormContent().fields()
.forEachRemaining(kv -> newJsonForm.set(kv.getKey(), kv.getValue()));
}
}
/**
if (request.getAssetLogIotVariables() != null) {
var assetLogVariableList = this.assetLogIotVariableRepository.findByBizIdIn(request.getAssetLogIotVariables());
assetLog.setAssetLogIotVariables(assetLogVariableList);
}
**/
assetLog = this.assetLogRepository.save(assetLog);
var id = assetLog.getId() ;
Segment segment= new Segment() ;
LOGGER.info(id.toString()) ;
var title = segment.segmente(request.getTitle(), JiebaSegmenter.SegMode.INDEX);
String titleString = String.join(" ",title) ;
var description = segment.segmente(request.getDescription(), JiebaSegmenter.SegMode.SEARCH) ;
String descriptionString = String.join(" ",description) ;
LOGGER.info( titleString ) ;
LOGGER.info(descriptionString) ;
this.assetLogRepository.segment(titleString,descriptionString, id ) ;
var dataList = this.toAssetLogDtoList(accountBizId, Arrays.asList(assetLog));
return dataList.size() > 0 ? dataList.get(0) : null;
}
- 查询操作
查询操作的时候,我们就进行想查询的数据放到toTsquery函数参数里面对seach字段查询,然后用to_rank函数进行对匹配的评分,当匹配的分数 > 0 的时候就把数据筛选出来。
public PaginationDTO<List<AssetLogDTO>> getAssetLogList(String accountBizId,
String templateBizId, String templateName,
String bizIds,
LocalDateTime from,
LocalDateTime to,
String sort,
String status,
String search,
Integer page, Integer size) {
var data = this.assetLogRepository.findAll(((Specification<AssetLog>) (root, query, builder) -> {
List<Predicate> predicates = Lists.newArrayList();
if (StringUtils.isNotBlank(accountBizId)) {
predicates.add(builder.equal(root.join("assetLogTemplate").get("accountBizId"), accountBizId));
}
if (StringUtils.isNotBlank(templateBizId)) {
predicates.add(builder.equal(root.join("assetLogTemplate").get("bizId"), templateBizId));
}
if (StringUtils.isNotBlank(templateName)) {
predicates.add(builder.equal(root.join("assetLogTemplate").get("bizName"), templateName));
}
if (StringUtils.isNotBlank(bizIds)) {
var bizIdArray = bizIds.split(",");
CriteriaBuilder.In<String> in = builder.in(root.get("bizId"));
for (String bizId : bizIdArray) {
in.value(bizId);
}
predicates.add(in);
}
if (StringUtils.isNotBlank(status)) {
predicates.add(builder.or(Arrays.stream(status.split(","))
.map(str -> builder.equal(root.get("status"), str)).toArray(Predicate[]::new)));
}
if (from != null) {
predicates.add(builder.greaterThan(root.get("startTime"), from));
}
if (to != null) {
predicates.add(builder.lessThan(root.get("startTime"), to));
}
if (StringUtils.isNotBlank(search)) {
LOGGER.info("here");
List<Predicate> predicatess = Lists.newArrayList();
//在程序端分词完毕后使用 ‘|’ 拼接传入
Expression toTsquery = builder.function("to_tsquery",String.class,builder.literal("simple"),builder.literal(search)) ;
Expression toRank = builder.function("ts_rank", PostgreSQLTSVectorType.class,root.get("search"),toTsquery);
if ("asc".equalsIgnoreCase(sort)) {
query.orderBy(builder.asc(toRank));
} else {
query.orderBy(builder.desc(toRank));
}
predicates.add(builder.or(builder.greaterThan(toRank,"0")));
}
return builder.and(predicates.toArray(new Predicate[0]));
}), PageRequest.of(page, size));
return PaginationDTO.<List<AssetLogDTO>>builder().total(data.getTotalElements())
.data(this.toAssetLogDtoList(accountBizId, data.getContent())).build();
}
参考
https://www.skypyb.com/2020/12/jishu/1705/
Over!!!!!!