Summary of Key Concepts in spring-data-elasticsearch

夏洛克牛顿

Tags: elasticsearch spring java

Article link: https://blog.csdn.net/baidugreat/article/details/131384509

The spring-data-elasticsearch version covered in this article is 5.1.1.

The Document annotation

org.springframework.data.elasticsearch.annotations.Document

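Annotating a class with Document marks it as an object stored in Elasticsearch. The settings on the annotation should match the corresponding options in the index mappings.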

indexName the name of the Elasticsearch index
createIndex whether to create the index when the repository is bootstrapped at startup
dynamic controls how Elasticsearch dynamically adds new fields to the document mapping
storeIdInSource whether the id property is also stored in the _source field. The _source field contains the original JSON document body that was passed at index time. It is not indexed itself (and thus is not searchable), but it is stored so that it can be returned by fetch requests such as get or search.
versionType the version-management setting
writeTypeHint whether type hints are written into the stored document

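Some of the possible values of dynamic:

TRUE new fields are added to the mapping directly
INHERIT the dynamic setting is inherited from the parent object or from the mappings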
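
To make the attributes above concrete, here is a minimal sketch of an annotated entity. The class name ArticleDocument and the index name "article" are made up for illustration, and the attribute values are just one possible combination, not a recommendation. It needs spring-data-elasticsearch on the classpath.

```java
import org.springframework.data.annotation.Id;
import org.springframework.data.elasticsearch.annotations.Document;
import org.springframework.data.elasticsearch.annotations.Dynamic;

// Hypothetical entity; the class and index names are illustrative only.
@Document(
        indexName = "article",                        // index the entity is stored in
        createIndex = true,                           // create the index on repository bootstrapping
        dynamic = Dynamic.TRUE,                       // new fields are added to the mapping directly
        storeIdInSource = true,                       // keep the id inside _source as well
        versionType = Document.VersionType.EXTERNAL)  // version numbers are managed by the application
public class ArticleDocument {

    @Id
    private String id;

    private String title;

    public String getId() { return id; }
    public void setId(String id) { this.id = id; }
    public String getTitle() { return title; }
    public void setTitle(String title) { this.title = title; }
}
```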

What version is for

Elasticsearch versions are numbers assigned to documents to track their changes and handle concurrency issues. A document gets a version 1 when it is indexed for the first time, and the version is incremented by 1 with every index, update, or delete operation. Elasticsearch uses an optimistic locking concept with the _version parameter to prevent multiple users from editing the same document at the same time. However, Elasticsearch does not keep a history of the document changes, and it does not provide functionality to access previous versions.

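Version-based concurrency control:

1. Concurrency control with Elasticsearch's own optimistic locking
Elasticsearch uses the version field of the document for concurrency control; a newly created document has version 1.
Every update must carry the version that was read. The read-then-update step may have to be retried several times, especially in a multi-threaded environment. (Recent Elasticsearch releases perform this internal check with if_seq_no and if_primary_term rather than with a version passed by the client.)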

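2. Concurrency control with an external version
Elasticsearch also supports external versioning (version_type=external): instead of the version it maintains itself, writes can be controlled by a version number that you maintain yourself.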

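Two points deserve attention here:
1. With the internal version, a write succeeds only when the supplied version is exactly equal to the version stored in Elasticsearch.
2. With an external version, a write succeeds only when the supplied version is greater than the version stored in Elasticsearch.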
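
The two rules can be illustrated with a small in-memory simulation. This is only a sketch of the comparison logic, not Elasticsearch code: the class VersionCheckSketch and its methods are hypothetical, and a plain map stands in for the index.

```java
import java.util.HashMap;
import java.util.Map;

class VersionCheckSketch {
    // In-memory stand-in for an index: document id -> current version.
    private final Map<String, Long> versions = new HashMap<>();

    /** Creates a document; as described above, the first version is 1. */
    void create(String id) {
        versions.put(id, 1L);
    }

    long currentVersion(String id) {
        return versions.get(id);
    }

    /** Internal rule: succeed only if the supplied version equals the stored one. */
    boolean writeWithInternalVersion(String id, long expectedVersion) {
        Long current = versions.get(id);
        if (current == null || current != expectedVersion) {
            return false; // version conflict, the caller has to re-read and retry
        }
        versions.put(id, current + 1);
        return true;
    }

    /** External rule: succeed only if the supplied version is greater than the stored one. */
    boolean writeWithExternalVersion(String id, long externalVersion) {
        Long current = versions.get(id);
        if (current != null && externalVersion <= current) {
            return false; // stale or duplicate external version
        }
        versions.put(id, externalVersion);
        return true;
    }
}
```

With a document at version 2, an internal write carrying version 1 is rejected, an external write carrying version 2 is rejected as well, and an external write carrying version 10 succeeds and leaves the stored version at 10.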

The Field annotation

org.springframework.data.elasticsearch.annotations.Field

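index whether to build an inverted index for this field; a field that is never used in queries does not need one
searchAnalyzer the analyzer applied to the query text when this field is searched
analyzer the analyzer used when the inverted index for this field is built
type the field type, which can be detected automatically; if it is keyword, the field value is not tokenized when it is indexed
format the format of date-type fields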

Analyzers

The IK analyzer has two tokenization modes: ik_max_word and ik_smart.

ik_max_word
Splits the text at the finest granularity. For example, "中华人民共和国人民大会堂" is split into terms such as 中华人民共和国, 中华人民, 中华, 华人, 人民大会堂, 人民, 共和国, 大会堂, 大会 and 会堂.

ik_smart
Splits the text at the coarsest granularity. For example, "中华人民共和国人民大会堂" is split into 中华人民共和国 and 人民大会堂.
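
The difference between the two modes can be approximated with a toy dictionary matcher. This is not the IK algorithm, which uses its own large dictionary and disambiguation rules. It is a sketch that assumes a dictionary holding only the ten words from the example above: fine-grained mode emits every dictionary word found at every position, and coarse mode greedily takes the longest word and moves on. The class IkModeSketch is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class IkModeSketch {
    // Toy dictionary taken from the example in the text; the real IK dictionary is far larger.
    static final List<String> DICTIONARY = List.of(
            "中华人民共和国", "中华人民", "中华", "华人", "人民大会堂",
            "人民", "共和国", "大会堂", "大会", "会堂");

    /** Dictionary words that start at position pos of text, longest first. */
    static List<String> matchesAt(String text, int pos) {
        List<String> found = new ArrayList<>();
        for (String word : DICTIONARY) {
            if (text.startsWith(word, pos)) {
                found.add(word);
            }
        }
        found.sort((a, b) -> Integer.compare(b.length(), a.length()));
        return found;
    }

    /** Fine-grained mode: emit every dictionary word found at every position. */
    static List<String> fineGrained(String text) {
        List<String> tokens = new ArrayList<>();
        for (int pos = 0; pos < text.length(); pos++) {
            tokens.addAll(matchesAt(text, pos));
        }
        return tokens;
    }

    /** Coarse mode: greedily take the longest word, then continue right after it. */
    static List<String> coarse(String text) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < text.length()) {
            List<String> found = matchesAt(text, pos);
            if (found.isEmpty()) {
                pos++; // no dictionary word starts here, skip the character
            } else {
                tokens.add(found.get(0));
                pos += found.get(0).length();
            }
        }
        return tokens;
    }
}
```

For the sample sentence, coarse mode returns only 中华人民共和国 and 人民大会堂, while fine-grained mode also returns the shorter overlapping words, which is why it covers more query terms at index time.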

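At index time, the ik_max_word analyzer is usually used so that the finest-grained tokens are indexed and the index covers as many query terms as possible. At search time, the ik_smart analyzer is used so that coarse-grained tokens improve the precision of the results.
The field mapping is configured as follows: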

{
     "author": {
            "type": "text",
            "analyzer": "ik_max_word",
            "search_analyzer": "ik_smart"
     }
}
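
The same kind of mapping can be declared on an entity with the Field annotation. The sketch below assumes the IK plugin is installed on the cluster. The class name and every field other than author are hypothetical, and it needs spring-data-elasticsearch on the classpath.

```java
import java.time.Instant;

import org.springframework.data.elasticsearch.annotations.DateFormat;
import org.springframework.data.elasticsearch.annotations.Field;
import org.springframework.data.elasticsearch.annotations.FieldType;

// Hypothetical class showing only the field declarations.
public class ArticleFields {

    // Indexed with fine-grained tokens, searched with coarse-grained tokens.
    @Field(type = FieldType.Text, analyzer = "ik_max_word", searchAnalyzer = "ik_smart")
    private String author;

    // keyword fields are indexed as a single, untokenized term.
    @Field(type = FieldType.Keyword)
    private String platform;

    // Date field stored as epoch milliseconds.
    @Field(type = FieldType.Date, format = DateFormat.epoch_millis)
    private Instant publishTime;

    // Never queried, so no inverted index is built for it.
    @Field(type = FieldType.Text, index = false)
    private String rawPayload;
}
```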

Built-in Elasticsearch analyzers

Standard The standard analyzer divides text into terms on word boundaries, as defined by the Unicode Text Segmentation algorithm. It removes most punctuation, lowercases terms, and supports removing stop words. It is the default analyzer.
Simple The simple analyzer divides text into terms whenever it encounters a character which is not a letter. It lowercases all terms.
Whitespace The whitespace analyzer divides text into terms whenever it encounters any whitespace character. It does not lowercase terms.
Stop The stop analyzer is like the simple analyzer, but also supports removal of stop words.
Keyword The keyword analyzer is a “noop” analyzer that accepts whatever text it is given and outputs the exact same text as a single term.
Pattern The pattern analyzer uses a regular expression to split the text into terms. It supports lower-casing and stop words.
Language Elasticsearch provides many language-specific analyzers like english or french.
Fingerprint The fingerprint analyzer is a specialist analyzer which creates a fingerprint which can be used for duplicate detection.
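
To see how three of these analyzers differ on the same input, the sketch below re-implements their documented behaviour in plain Java. It only approximates the descriptions above and does not use Elasticsearch code; the class BuiltInAnalyzerSketch is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class BuiltInAnalyzerSketch {

    /** Like the whitespace analyzer: split on whitespace, keep the original case. */
    static List<String> whitespace(String text) {
        List<String> terms = new ArrayList<>();
        for (String part : text.trim().split("\\s+")) {
            if (!part.isEmpty()) {
                terms.add(part);
            }
        }
        return terms;
    }

    /** Like the simple analyzer: split on anything that is not a letter, then lowercase. */
    static List<String> simple(String text) {
        List<String> terms = new ArrayList<>();
        for (String part : text.split("[^\\p{L}]+")) {
            if (!part.isEmpty()) {
                terms.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return terms;
    }

    /** Like the keyword analyzer: the whole input becomes a single term. */
    static List<String> keyword(String text) {
        return List.of(text);
    }
}
```

For the input "The 2 QUICK Brown-Foxes", the whitespace variant keeps "Brown-Foxes" and the digit as they are, the simple variant drops the digit and yields lowercase the, quick, brown and foxes, and the keyword variant returns the input unchanged as one term.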

Built-in analyzer reference | Elasticsearch Guide [master] | Elastic

The different QueryBuilder types

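BoolQueryBuilder combines other queries; clauses are joined with must and should, which correspond roughly to AND and OR in MySQL
MatchQueryBuilder analyzes (tokenizes) the query text first and then searches with the resulting terms
TermQueryBuilder looks up the query term exactly as given, without analyzing it. Because analyzers such as standard lowercase terms at index time, the query term has to be lowercased before it is used against an analyzed text field
RangeQueryBuilder queries by range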
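
Why a term query needs a lowercased query word while a match query does not can be shown with a toy inverted index. The sketch assumes an analyzer that lowercases and splits on non-alphanumeric characters, and it models a match query as the union of the term lookups for the analyzed words. The class TermVsMatchSketch is hypothetical and is not how Elasticsearch is implemented.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

class TermVsMatchSketch {
    // term -> ids of the documents that contain it
    private final Map<String, Set<Integer>> invertedIndex = new HashMap<>();

    /** Toy analyzer: lowercase, then split on anything that is not a letter or digit. */
    static String[] analyze(String text) {
        return text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+");
    }

    /** Indexing stores the analyzed (lowercased) terms only. */
    void index(int docId, String text) {
        for (String term : analyze(text)) {
            if (!term.isEmpty()) {
                invertedIndex.computeIfAbsent(term, k -> new HashSet<>()).add(docId);
            }
        }
    }

    /** Term query: the query word is looked up exactly as given. */
    Set<Integer> termQuery(String term) {
        return invertedIndex.getOrDefault(term, Set.of());
    }

    /** Match query: the query text is analyzed first, then each term is looked up. */
    Set<Integer> matchQuery(String text) {
        Set<Integer> hits = new HashSet<>();
        for (String term : analyze(text)) {
            hits.addAll(termQuery(term));
        }
        return hits;
    }
}
```

After indexing "Android crash report", a term query for "Android" finds nothing because only "android" is in the index, whereas a match query for "Android" is analyzed to "android" and finds the document.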

Steps for running a query with spring-data-elasticsearch

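Build a SearchRequest object and specify the target indices.
Build a SearchSourceBuilder object; paging and sorting parameters, if any, are also set on the SearchSourceBuilder.
Build a QueryBuilder object that specifies the query type and the query conditions.
Set the QueryBuilder on the SearchSourceBuilder.
Set the SearchSourceBuilder on the SearchRequest.
Call the search method to query the data.
Parse the returned result.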

Sample code (it calls the Elasticsearch RestHighLevelClient directly):

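// Fragment of a method body. The Elasticsearch types come from the RestHighLevelClient API
// (org.elasticsearch.* packages); StringUtils, ArrayUtils, Lists and JSON look like
// commons-lang3, Guava and fastjson helpers.
// Step 1: create the SearchRequest and select the target indices (wildcards are allowed).
SearchRequest rq = new SearchRequest();
rq.indices("report_text_*");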

// Step 2: build the query conditions. All clauses are combined with must (AND).
BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery();
// report_time is compared as epoch milliseconds; parse() throws ParseException on bad input.
RangeQueryBuilder reportTimeRange = QueryBuilders.rangeQuery("report_time");
SimpleDateFormat dateFormat = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
reportTimeRange.lte(dateFormat.parse(queryParam.getEndDate()).getTime());
reportTimeRange.gte(dateFormat.parse(queryParam.getBeginDate()).getTime());

if (StringUtils.isNotBlank(queryParam.getQueryWord())) {
    MatchPhraseQueryBuilder textMatchPhraseQuery = QueryBuilders.matchPhraseQuery("text", queryParam.getQueryWord());
    boolQueryBuilder.must(textMatchPhraseQuery);
}

if (queryParam.getPlatform() == 1 || queryParam.getPlatform() == 2) {
    TermQueryBuilder platformTermQuery = QueryBuilders.termQuery("platform", PlatformType.getPlatformTypeByCode(queryParam.getPlatform()).name().toLowerCase());
    boolQueryBuilder.must(platformTermQuery);
}
boolQueryBuilder.must(reportTimeRange);

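// Step 3: put the query, page size, sort order and highlighting on the SearchSourceBuilder.
SearchSourceBuilder sourceBuilder = new SearchSourceBuilder();
sourceBuilder.query(boolQueryBuilder);
// Without paging parameters, fall back to returning at most 200 hits.
if (queryParam.getCurrent() == 0 && queryParam.getSize() == 0) {
    sourceBuilder.size(200);
} else {
    sourceBuilder.size(queryParam.getSize());
}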

sourceBuilder.sort(new FieldSortBuilder("report_time").order(SortOrder.DESC));
if (highlightText) {
    HighlightBuilder highlightBuilder = new HighlightBuilder();
    highlightBuilder.preTags(PRE_TAGS);
    highlightBuilder.postTags(END_TAGS);
    highlightBuilder.fragmentSize(2000);
    highlightBuilder.field("text");
    sourceBuilder.highlighter(highlightBuilder);
}
if (queryParam.getCurrent() >= 1) {
    int from = (queryParam.getCurrent() - 1) * queryParam.getSize();
    sourceBuilder.from(from);
    sourceBuilder.size(queryParam.getSize());
}
rq.source(sourceBuilder);

SearchResponse search = restHighLevelClient.search(rq, RequestOptions.DEFAULT);
SearchHit[] searchHits = search.getHits().getHits();
ReportTextQueryResultVo vo = new ReportTextQueryResultVo();
List<ReportTextVo> reportTextVos = Lists.newArrayList();
int wordHitCount = 0;
for (SearchHit hit : searchHits) {
    ReportTextRecord record = JSON.parseObject(hit.getSourceAsString(), ReportTextRecord.class);
    wordHitCount += StringUtils.countMatches(record.getText(), queryParam.getQueryWord());
    if (highlightText) {
        Map<String, HighlightField> highlightFields = hit.getHighlightFields();
        if (highlightFields != null) {
            HighlightField highlightField = highlightFields.get("text");
            if (highlightField != null && ArrayUtils.isNotEmpty(highlightField.getFragments())) {
                StringBuilder sb = new StringBuilder();
                for (Text text : highlightField.getFragments()) {
                    sb.append(text.string());
                }
                // The result must keep the highlight tags, so take all highlighted fragments and join them
                record.setText(sb.toString());
            }
        }
    }
    ReportTextVo reportTextVo = ReportTextVo.convertFromRecord(record);
    reportTextVos.add(reportTextVo);
    // Log the document id (not the whole record) together with the relevance score.
    log.info("bingo test,id={},score={}", hit.getId(), hit.getScore());
}

// Count all documents matching the same query, independent of the page size.
CountRequest countRequest = new CountRequest("report_text_*");
countRequest.query(boolQueryBuilder);
CountResponse countResp = restHighLevelClient.count(countRequest, RequestOptions.DEFAULT);

vo.setReportTexts(reportTextVos);
vo.setTextHitCount(countResp.getCount());
vo.setWordHitCount(wordHitCount);
vo.setCurrent(queryParam.getCurrent());
vo.setSize(queryParam.getSize());
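
The date conversion and the paging arithmetic in the sample can be pulled out into small helpers. The sketch below is one possible way to do that and is not part of the original code: the class QueryParamSketch and its method names are hypothetical, and it uses java.time instead of SimpleDateFormat because DateTimeFormatter is thread-safe, with the time zone passed in explicitly.

```java
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

class QueryParamSketch {
    // Same pattern as the sample code; DateTimeFormatter instances can be shared between threads.
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    // Default number of hits used by the sample when no paging parameters are given.
    static final int DEFAULT_SIZE = 200;

    /** Converts a "yyyy-MM-dd HH:mm:ss" string to epoch milliseconds in the given zone. */
    static long toEpochMillis(String dateTime, ZoneId zone) {
        return LocalDateTime.parse(dateTime, FORMAT).atZone(zone).toInstant().toEpochMilli();
    }

    /** Page size: DEFAULT_SIZE when neither page number nor size is set, else the given size. */
    static int pageSize(int current, int size) {
        return (current == 0 && size == 0) ? DEFAULT_SIZE : size;
    }

    /** Offset of the first hit for a 1-based page number; 0 when no page is requested. */
    static int fromOffset(int current, int size) {
        return current >= 1 ? (current - 1) * size : 0;
    }
}
```

For example, page 3 with a page size of 20 starts at offset 40, which is the value the sample passes to sourceBuilder.from.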