Contents
12. nested aggregation (aggregate on an inner field, then on an outer field)
13. ES/Elasticsearch aggregation query error: too_many_buckets_exception
Official documentation
https://www.elastic.co/guide/cn/elasticsearch/guide/current/full-body-search.html
pom file
<!-- elasticsearch -->
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-data-elasticsearch</artifactId>
</dependency>
elasticsearchTemplate
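BoolQueryBuilder: combines multiple query conditions; its should and must clauses correspond roughly to OR and AND in MySQL.
TermQueryBuilder: the constructor takes the field name as its first argument and the value as its second; it finds documents whose field holds exactly that value.
MatchQueryBuilder: the constructor takes the field name as its first argument and the user's search text as its second; the text is analyzed (tokenized) before the query runs.
1. termQuery: exact-match queries on strings
Build a query with termQuery that finds exactly the records whose type = "bird":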
QueryBuilders.termQuery("type", "bird");
Equivalent SQL:
select * from biological where type = 'bird';
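2. boolQuery
Create a boolQuery object and add the logical conditions to it.
A boolQuery can nest the following clause types:
(1) must: the condition must match, equivalent to AND
(2) should: the condition may or may not match, equivalent to OR
(3) must_not: the condition must not match, equivalent to NOT
BoolQueryBuilder boolQuery = QueryBuilders.boolQuery();
boolQuery.should(QueryBuilders.termQuery("type", "bird"));
boolQuery.should(QueryBuilders.termQuery("type", "plant"));
boolQuery.must(QueryBuilders.termQuery("name", "demo"));
// When a must clause is present, should clauses are optional by default;
// require at least one of them so the query matches the SQL below.
boolQuery.minimumShouldMatch(1);
Equivalent SQL:
select * from biological where (type = 'bird' OR type = 'plant') AND (name = 'demo');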
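To make the must / should / must_not semantics concrete, here is a minimal plain-Java sketch that models them with predicates. It does not use the Elasticsearch client; the class name BoolSketch and the map-based document are assumptions made only for illustration. The sketch requires at least one should clause to match (the behaviour of minimum_should_match = 1), which is what the SQL above expresses.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

final class BoolSketch {
    // Hypothetical stand-in for a term query: exact match on one field.
    static Predicate<Map<String, String>> term(String field, String value) {
        return doc -> value.equals(doc.get(field));
    }

    // must: all match; should: at least one matches (if any are given); mustNot: none match.
    static boolean matches(Map<String, String> doc,
                           List<Predicate<Map<String, String>>> must,
                           List<Predicate<Map<String, String>>> should,
                           List<Predicate<Map<String, String>>> mustNot) {
        boolean allMust = must.stream().allMatch(p -> p.test(doc));
        boolean anyShould = should.isEmpty() || should.stream().anyMatch(p -> p.test(doc));
        boolean noneMustNot = mustNot.stream().noneMatch(p -> p.test(doc));
        return allMust && anyShould && noneMustNot;
    }

    public static void main(String[] args) {
        Map<String, String> doc = Map.of("type", "bird", "name", "demo");
        boolean hit = matches(doc,
                List.of(term("name", "demo")),
                List.of(term("type", "bird"), term("type", "plant")),
                List.of());
        System.out.println(hit);
    }
}
```

A document with type = 'fish' and name = 'demo' would be rejected here, just as it would by the SQL WHERE clause.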
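wildcardQuery is the counterpart of LIKE
boolQuery.must(QueryBuilders.wildcardQuery("scientificname",searchMessage+"*"));
PS: the query fails when the data contains spaces or symbols. On an analyzed text field the wildcard is compared against individual terms rather than the whole value, so querying a keyword field is the usual workaround.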
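The reason spaces break the wildcard can be sketched in plain Java. This is only an approximation: the analyze method below is a hypothetical stand-in for an analyzer (lowercase, split on anything that is not a letter or digit), not the real Elasticsearch analysis chain.

```java
import java.util.Arrays;
import java.util.List;

final class WildcardSketch {
    // Rough approximation of an analyzer: lowercase, split on non-letter/digit characters.
    static List<String> analyze(String text) {
        return Arrays.asList(text.toLowerCase().split("[^\\p{L}\\p{N}]+"));
    }

    // "prefix*" against an analyzed text field: each term is tested on its own.
    static boolean prefixMatchesText(String fieldValue, String prefix) {
        return analyze(fieldValue).stream().anyMatch(term -> term.startsWith(prefix));
    }

    // "prefix*" against a keyword field: the whole value is a single term.
    static boolean prefixMatchesKeyword(String fieldValue, String prefix) {
        return fieldValue.startsWith(prefix);
    }

    public static void main(String[] args) {
        String value = "Alauda arvensis";
        System.out.println(prefixMatchesText(value, "alauda"));       // single word: matches a term
        System.out.println(prefixMatchesText(value, "alauda a"));     // contains a space: no term starts with it
        System.out.println(prefixMatchesKeyword(value, "Alauda a"));  // whole value compared: matches
    }
}
```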
3. Nested queries
SQL:
select * from biological where (type = 'bird' AND name = 'test') OR (name = 'demo');
BoolQueryBuilder boolQuery = QueryBuilders.boolQuery();
// Inner bool query: type = 'bird' AND name = 'test'
boolQuery.should(QueryBuilders.boolQuery()
        .must(QueryBuilders.termQuery("type", "bird"))
        .must(QueryBuilders.termQuery("name", "test")));
boolQuery.should(QueryBuilders.termQuery("name", "demo"));
4. matchQuery: searching text fields
matchQuery splits the search text into terms using the rules of the standard analyzer, then searches for matches of each term.
public Page<NameDataList> NameDataList(String typeId, String searchMessage, int offset, HttpServletRequest request) {
TaxondataQuery query = new TaxondataQuery();
query.setPage(offset/10);
query.setQueryString(searchMessage);
// Compound query
BoolQueryBuilder boolQuery = QueryBuilders.boolQuery();
boolQuery.must(QueryBuilders.termQuery("type", typeId));
//boolQuery.filter(QueryBuilders.rangeQuery("pageSize"));
// The search conditions follow; they are combined with a must clause
MultiMatchQueryBuilder matchQuery = QueryBuilders.multiMatchQuery(query.getQueryString(), "scientificname",
"chinesename");
boolQuery.must(matchQuery);
PageRequest pageRequest = PageRequest.of(query.getPage(), query.getSize());
NativeSearchQuery searchQuery = new NativeSearchQueryBuilder().withQuery(boolQuery)
.withHighlightFields(
new HighlightBuilder.Field("scientificname"),
new HighlightBuilder.Field("chinesename"))
.withPageable(pageRequest).build();
Page<NameDataList> NameDataLists = elasticsearchTemplate.queryForPage(searchQuery, NameDataList.class, extResultMapper);
return NameDataLists;
}
5. query and filter
A bool query has four clause types: must, filter, should, mustNot
BoolQueryBuilder boolQuery = QueryBuilders.boolQuery();
boolQuery.must(QueryBuilders.termQuery("chinesename", "云雀"));
SearchQuery searchQuery = new NativeSearchQueryBuilder()
.withQuery(boolQuery)
.build();
List<NameDataList> NameDataLists = elasticsearchTemplate.queryForList(searchQuery, NameDataList.class);
System.out.println(NameDataLists.toString());
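filter is faster than query:
A query first evaluates the search conditions, then computes a relevance score, and finally returns the matching documents.
A filter only checks whether each document satisfies the condition. If it does not, that outcome is cached (the document is recorded as not matching);
if it does, the result is cached directly.
In summary, filter is faster in two respects:
1. It caches results
2. It avoids computing scores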
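The two points above can be illustrated with a small plain-Java sketch: a filter yields one yes/no bit per document, with no score, and the bit set is reused on the next call. FilterCacheSketch and its cache key are hypothetical; the real Elasticsearch filter cache is more selective about what it keeps.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

final class FilterCacheSketch {
    private final List<String> docs;
    private final Map<String, BitSet> cache = new HashMap<>();
    int evaluations = 0; // how many times a condition was actually run against a document

    FilterCacheSketch(List<String> docs) {
        this.docs = docs;
    }

    // Returns one bit per document: set if the document passes the filter. No score is computed.
    BitSet filter(String cacheKey, Predicate<String> condition) {
        return cache.computeIfAbsent(cacheKey, key -> {
            BitSet bits = new BitSet(docs.size());
            for (int i = 0; i < docs.size(); i++) {
                evaluations++;
                if (condition.test(docs.get(i))) {
                    bits.set(i);
                }
            }
            return bits;
        });
    }

    public static void main(String[] args) {
        FilterCacheSketch index = new FilterCacheSketch(List.of("bird", "plant", "bird"));
        BitSet first = index.filter("type:bird", "bird"::equals);
        BitSet second = index.filter("type:bird", "bird"::equals); // served from the cache
        System.out.println(first + " cached=" + (first == second) + " evaluations=" + index.evaluations);
    }
}
```

The second call does not touch any document, which is the saving a cached filter gives over a scored query.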
Related reading: "The differences between Elasticsearch filter and query, explained in depth", by 铭毅天下 (CSDN blog; the author's WeChat public account has the same name)
6. Partial document update in ES (Kibana)
POST index_name/_doc/doc_id/_update
{
"doc":{
"source_id" : 1369907879588814852
}
}
7. Bulk insert
PUT index_name/_bulk?refresh
{"index":{"_id": "1"}}
{"it":1627959130532,"larea" : ["其它"]}
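The _bulk request body is newline-delimited JSON: one action line, then one source line, with a newline after every line including the last. Here is a minimal sketch of assembling such a body by hand. BulkBodySketch is a hypothetical helper; it assumes the documents are already serialized to single-line JSON and that the ids contain no characters that need JSON escaping.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class BulkBodySketch {
    // Builds an "index" action line followed by the document source line for each entry.
    static String build(Map<String, String> jsonById) {
        StringBuilder body = new StringBuilder();
        for (Map.Entry<String, String> entry : jsonById.entrySet()) {
            body.append("{\"index\":{\"_id\":\"").append(entry.getKey()).append("\"}}\n");
            body.append(entry.getValue()).append('\n'); // every line, including the last, ends with \n
        }
        return body.toString();
    }

    public static void main(String[] args) {
        Map<String, String> docs = new LinkedHashMap<>();
        docs.put("1", "{\"it\":1627959130532}");
        docs.put("2", "{\"it\":1627959130533}");
        System.out.print(build(docs));
    }
}
```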
public class EsTest {

    public void saveAll() {
        BulkRequest bulkRequest = new BulkRequest();
        IndexRequest indexRequest = new IndexRequest("INDEX_NAME").id("custom ID")
                .source(GSON.toJson("doc"), XContentType.JSON);
        bulkRequest.add(indexRequest);
        try {
            if (bulkRequest.numberOfActions() > 0) {
                log.info(key + ": number of documents saved to ES: {}", bulkRequest.numberOfActions());
                // Adjust the refresh policy to your needs
                bulkRequest.setRefreshPolicy(WriteRequest.RefreshPolicy.IMMEDIATE);
                BulkResponse response = restHighLevelClient.bulk(bulkRequest, RequestOptions.DEFAULT);
                if (response.hasFailures()) {
                    log.error(key + " ES save failed: {}", response.buildFailureMessage());
                }
            }
        } catch (Exception e) {
            // Pass the exception as the last argument so the stack trace is logged
            log.error("ES save failed with exception", e);
        }
    }

    public void saveAsync() {
        BulkRequest bulkRequest = new BulkRequest();
        IndexRequest indexRequest = new IndexRequest("INDEX_NAME").id("custom ID")
                .source(GSON.toJson("doc"), XContentType.JSON);
        bulkRequest.add(indexRequest);
        // Adjust the refresh policy to your needs
        bulkRequest.setRefreshPolicy(WriteRequest.RefreshPolicy.IMMEDIATE);
        restHighLevelClient.bulkAsync(bulkRequest, RequestOptions.DEFAULT, new ActionListener<BulkResponse>() {
            @Override
            public void onResponse(BulkResponse bulkItemResponses) {
                for (BulkItemResponse bulkItemResponse : bulkItemResponses) {
                    if (bulkItemResponse.isFailed()) { // check whether this operation failed
                        // Get the failure object; from here it can be handled however you like
                        BulkItemResponse.Failure failure = bulkItemResponse.getFailure();
                        log.error("[dblog] failed to insert data into elasticsearch:", failure.getCause());
                        if (failure.getStatus() == RestStatus.BAD_REQUEST) {
                            log.error("[dblog] id=" + bulkItemResponse.getId() + " is an invalid request!");
                            continue;
                        }
                    }
                }
                log.info("INDEX ELASTICSEARCH SUCCESS BULK SIZE: {}", bulkItemResponses.getItems().length);
            }

            @Override
            public void onFailure(Exception e) {
                log.error("INDEX ELASTICSEARCH ERROR", e);
            }
        });
    }
}
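8. Bulk update
Before ES 5.5 the script body goes in the inline field
From ES 6.5 on the script body goes in the source field
Scripts:
Language | Sandboxed | Required plugin |
---|---|---|
painless | yes | built-in |
groovy | no | built-in |
javascript | no | lang-javascript |
python | no | lang-python |
The "script" keyword marks the request as modifying documents via a script.
"lang": the scripting language to use. "painless" is the scripting language built into ES; ES has also supported other languages such as Python and JavaScript.
"inline": the script body (named "source" in newer versions, as in the request below). "ctx" is the ES context and _source is the document.
POST /index_name/_update_by_query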
{
"query": {
"match": {
"cont": "Hong Kong"
}
},
"script": {
"lang": "painless",
"source": "ctx._source.msg_id = params.msg_id",
"params": {
"msg_id": "fe01ce2a7fbac8fafaed7c982a04e229"
}
}
}
Related link: https://blog.csdn.net/qq330983778/article/details/103539418
9. Delete part of an index's data
POST index_name/_delete_by_query
{
"query": {
"range": {
"insert_time": {
"lte": "now-15d"
}
}
}
}
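The date-math expression now-15d in the range query means "the current time minus 15 days", and lte keeps everything at or before that instant. A plain-Java sketch of the same cutoff, using a Clock so it can be tested with a fixed time, might look like this (RetentionSketch is a hypothetical name, not part of any Elasticsearch API):

```java
import java.time.Clock;
import java.time.Duration;
import java.time.Instant;

final class RetentionSketch {
    // Equivalent of the date-math expression "now-<days>d" for the given clock.
    static Instant cutoff(Clock clock, int days) {
        return Instant.now(clock).minus(Duration.ofDays(days));
    }

    // "lte": a document is selected for deletion when insert_time <= cutoff.
    static boolean shouldDelete(Instant insertTime, Instant cutoff) {
        return !insertTime.isAfter(cutoff);
    }

    public static void main(String[] args) {
        Instant cutoff = cutoff(Clock.systemUTC(), 15);
        System.out.println("Documents inserted at or before " + cutoff + " would be deleted");
    }
}
```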
Delete all data in an index
POST index_name/_delete_by_query
{
"query": {
"match_all": {
}
}
}
10. Add fields to an existing index
PUT /index_name/_mapping/
{
"properties": {
"media_id": {
"type": "keyword",
"ignore_above": 256
},
"media_name": {
"type": "keyword",
"ignore_above": 256
}
}
}
11. Find documents where a field is longer than a given length
Use a filter
GET /index_name/_search
{
"_source": "cont",
"size": 100,
"query": {
"bool": {
"must": [
{
"term": {
"_ch": {
"value": "24"
}
}
},
{
"term": {
"lang": {
"value": "en"
}
}
}
],
"filter": {
"regexp": {
"cont": {
"value": ".{100,}"
}
}
}
}
}
}
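The pattern .{100,} means "any character, at least 100 times", and a regexp query must match the entire term. The same check can be reproduced with java.util.regex, which treats this particular pattern the same way even though Lucene's regexp syntax differs from Java's in general. Note the assumption: the query tests individual indexed terms, so it measures the whole field value only if cont is indexed as a single term (for example a keyword field).

```java
import java.util.regex.Pattern;

final class LengthRegexSketch {
    private static final Pattern AT_LEAST_100 = Pattern.compile(".{100,}");

    // matches() requires the whole input to match, like a regexp query against one term.
    static boolean isLongEnough(String term) {
        return AT_LEAST_100.matcher(term).matches();
    }

    public static void main(String[] args) {
        System.out.println(isLongEnough("x".repeat(99)));
        System.out.println(isLongEnough("x".repeat(100)));
    }
}
```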
12. nested aggregation (aggregate on an inner field, then on an outer field)
{
"size": 0,
"aggregations": {
"aggByNest": {
"nested": {
"path": "nested_name"
},
"aggregations": {
"termsAgg": {
"terms": {
"field": "nested_name.uid",
"size": 10
},
"aggregations": {
"reverse_path": {
"reverse_nested": {},
"aggregations": {
"cardinalityAgg": {
"cardinality": {
"field": "title.keyword"
}
}
}
}
}
}
}
}
}
}
Java code:
NestedAggregationBuilder aggByNest = AggregationBuilders
.nested(CommonConstant.AGG_NEST, CommonConstant.ES_NEST)
.subAggregation(AggregationBuilders.terms(CommonConstant.AGG_TERMS)
.size(10).field(CommonConstant.ES_NEST_NAME)
.subAggregation(AggregationBuilders.reverseNested(CommonConstant.AGG_REVERSE_PATH)
.subAggregation(cardinalityAggregationBuilder(CommonConstant.AGG_CARDINALITY,CommonConstant.ES_TITLE))));
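What the nested, terms, reverse_nested and cardinality chain computes can be sketched in plain Java: group by the nested uid, then step back to the parent document and count its distinct titles. This is an illustration only. ReverseNestedSketch is a hypothetical name, the count here is exact whereas the cardinality aggregation is approximate, and the terms size limit of 10 is ignored.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

final class ReverseNestedSketch {
    // Each entry is one parent document: key = title, value = the uids of its nested objects.
    static Map<String, Integer> distinctTitlesPerUid(List<Map.Entry<String, List<String>>> parents) {
        Map<String, Set<String>> titlesByUid = new TreeMap<>();
        for (Map.Entry<String, List<String>> parent : parents) {
            for (String uid : parent.getValue()) {              // "nested": step into the nested objects
                titlesByUid.computeIfAbsent(uid, k -> new HashSet<>())
                        .add(parent.getKey());                  // "reverse_nested": back to the parent's title
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        titlesByUid.forEach((uid, titles) -> result.put(uid, titles.size())); // "cardinality"
        return result;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, List<String>>> parents = List.of(
                Map.entry("A", List.of("u1", "u2")),
                Map.entry("B", List.of("u1")),
                Map.entry("A", List.of("u1")));
        System.out.println(distinctTitlesPerUid(parents));
    }
}
```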
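13. ES/Elasticsearch aggregation query error: too_many_buckets_exception
Cause: during an aggregation query, very high data cardinality or a poorly designed query puts heavy load on ES and can even trigger the too_many_buckets_exception error. Sometimes the query really is needed as written, and in that case the max_buckets threshold has to be raised.
Solutions
1. Set search.max_buckets in the cluster settings; make it just large enough for your needs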
PUT /_cluster/settings
{"persistent": {"search.max_buckets": 200000}}
2. Or add query conditions so that less data is aggregated (for example, a start and end time)
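Before raising the limit, it can help to estimate how many buckets a query could produce. A rough worst-case sketch for nested bucket aggregations, assuming every bucket at one level contains the full number of buckets at the next level (BucketEstimateSketch is a hypothetical helper, and the real count is usually lower):

```java
final class BucketEstimateSketch {
    // Sums the buckets at every level; each level multiplies the number of buckets above it.
    static long worstCaseBuckets(int... sizesPerLevel) {
        long total = 0;
        long bucketsAtLevel = 1;
        for (int size : sizesPerLevel) {
            bucketsAtLevel *= size;
            total += bucketsAtLevel;
        }
        return total;
    }

    public static void main(String[] args) {
        // Example: 365 daily buckets, each with a terms aggregation of size 1000
        long estimate = worstCaseBuckets(365, 1000);
        long maxBuckets = 200_000;
        System.out.println(estimate + (estimate > maxBuckets ? " exceeds " : " fits within ") + maxBuckets);
    }
}
```

Narrowing the time range in this example to 30 days brings the estimate down to 30,030, which is why adding query conditions is often enough.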
14. settings
"settings" : {
"index" : {
"max_result_window" : "100000",
"refresh_interval" : "10s",
"number_of_shards" : "5",
"translog" : {
"flush_threshold_size" : "1024mb",
"sync_interval" : "30s",
"durability" : "async"
},
"number_of_replicas" : "1"
}
}
15. Adding or removing aliases
ES has no operation that modifies an alias in place; remove it first, then add it again
1. Add an alias
POST _aliases
{
"actions": [
{
"add": {
"index": "index_name",
"alias": "index_read"
}
}
]
}
2. Remove an alias
POST _aliases
{
"actions": [
{
"remove": {
"index": "index_name",
"alias": "index_read"
}
}
]
}
3. Set is_write_index to false
POST _aliases
{
"actions": [
{
"add": {
"index": "index_name",
"alias": "index_write",
"is_write_index": false
}
}
]
}
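Because an alias cannot be modified in place, moving it from one index to another is a remove followed by an add. Both actions can be sent in a single _aliases request, which Elasticsearch applies atomically, so readers never see the alias missing. A minimal sketch of building that body as a string; AliasSwapSketch and the index names index_v1 and index_v2 are hypothetical, and the names are assumed to need no JSON escaping:

```java
final class AliasSwapSketch {
    // Builds one _aliases body containing a remove action followed by an add action.
    static String swapBody(String alias, String oldIndex, String newIndex) {
        return "{\"actions\":["
                + "{\"remove\":{\"index\":\"" + oldIndex + "\",\"alias\":\"" + alias + "\"}},"
                + "{\"add\":{\"index\":\"" + newIndex + "\",\"alias\":\"" + alias + "\"}}"
                + "]}";
    }

    public static void main(String[] args) {
        // The printed JSON would be sent as the body of POST _aliases
        System.out.println(swapBody("index_read", "index_v1", "index_v2"));
    }
}
```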
16. Highlighted search in ES
public class EsTest {
    private final static String htmlMarkFirst = "<span style=\"color:red;\">";
    private final static String htmlMarkLast = "</span>";

    private SearchSourceBuilder queryBuilderByKeyword(SearchCondition condition) {
        SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
        final BoolQueryBuilder queryMustBuilder = QueryBuilders.boolQuery();
        // Declared outside the if block so it is still in scope when the source builder is assembled
        HighlightBuilder highlightBuilder = new HighlightBuilder();
        String mcObjectCode = condition.getMcObjectCode();
        String keyword = condition.getKeyword();
        if (StrUtil.isNotBlank(mcObjectCode)) {
            // Turn "a b c" into "a OR b OR c" for the query_string query
            String mustKey = keyword.trim().replaceAll(" +", " OR ");
            Index index = Index.valueOf(mcObjectCode);
            queryMustBuilder.should(QueryBuilders.queryStringQuery(mustKey).field("author_name"));
            queryMustBuilder.should(QueryBuilders.queryStringQuery(mustKey).field("author_desc"));
            highlightBuilder.field("author_name").field("author_desc")
                    .preTags(htmlMarkFirst).postTags(htmlMarkLast);
        }
        searchSourceBuilder.query(queryMustBuilder).highlighter(highlightBuilder);
        // log.info("request builder: {}", searchSourceBuilder.toString());
        return searchSourceBuilder;
    }

    public SearchHit[] query(SearchCondition condition) {
        try {
            SearchRequest searchRequest = new SearchRequest(CommonConstant.INDEX_NAME).source(
                    queryBuilderByKeyword(condition));
            SearchResponse search = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
            return search.getHits().getHits();
        } catch (Exception e) {
            // The exception goes last, without a placeholder, so the stack trace is logged
            log.error("SEARCH CONDITION: {} ELASTICSEARCH ERROR", condition, e);
            return null;
        }
    }

    public static void main(String[] args) {
        // elasticsearchQuery and condition are assumed to be initialized elsewhere
        SearchHit[] searchHits = elasticsearchQuery.query(condition);
        for (SearchHit hit : searchHits) {
            // Replace the original field values with the highlighted fragments
            Map<String, HighlightField> highlightFields = hit.getHighlightFields();
            if (ObjectUtil.isNotEmpty(highlightFields.get("author_name"))) {
                hit.getSourceAsMap().put("author_name",
                        highlightFields.get("author_name").getFragments()[0].toString());
            }
            if (ObjectUtil.isNotEmpty(highlightFields.get("author_desc"))) {
                hit.getSourceAsMap().put("author_desc",
                        highlightFields.get("author_desc").getFragments()[0].toString());
            }
        }
    }
}
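The keyword handling in queryBuilderByKeyword can be tried in isolation. The sketch below reproduces only the trim and replaceAll step that turns a space-separated keyword into an OR expression. KeywordSketch is a hypothetical name, and characters reserved by the query_string syntax are not escaped here, so user input containing them would need extra handling.

```java
final class KeywordSketch {
    // Collapses each run of spaces between words into " OR ".
    static String toOrQuery(String keyword) {
        return keyword.trim().replaceAll(" +", " OR ");
    }

    public static void main(String[] args) {
        System.out.println(toOrQuery("  red   panda "));
    }
}
```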
17. Index templates
PUT %3Cdemo-%7Bnow%2Fd%7D-000001%3E
{
"aliases": {
"demo_write": {
"is_write_index": true
}
}
}
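The path in the first request above is the URL-encoded form of the date-math index name <demo-{now/d}-000001>. A short sketch of producing that encoding with the JDK; DateMathNameSketch is a hypothetical name, and URLEncoder performs form encoding, which is sufficient here only because the name contains no spaces.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class DateMathNameSketch {
    // Percent-encodes the characters <, >, {, } and / so the name can be used in a request path.
    static String encode(String dateMathName) {
        return URLEncoder.encode(dateMathName, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(encode("<demo-{now/d}-000001>")); // prints %3Cdemo-%7Bnow%2Fd%7D-000001%3E
    }
}
```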
PUT /_template/demo_template
{
"index_patterns": [
"demo-*"
],
"aliases": {
"demo_read": {}
},
"settings": {
"index": {
"max_result_window": "100000",
"refresh_interval": "5s",
"number_of_shards": "5",
"translog": {
"flush_threshold_size": "1024mb",
"sync_interval": "30s",
"durability": "async"
},
"number_of_replicas": "1"
}
},
"mappings": {
"properties": {
"demo_url": {
"type": "keyword"
},
"demo_id": {
"type": "keyword"
},
"demo_type": {
"type": "short"
},
"pt": {
"type": "date"
},
"labels": {
"type": "nested",
"properties": {
"user_id": {
"ignore_above": 256,
"type": "keyword"
},
"score": {
"index": false,
"store": false,
"type": "float",
"doc_values": false
}
}
}
}
}
}
18. routing
See "Mastering the Elasticsearch routing feature" (Elastic 中文社区)
19. top_hits aggregation (fetch document details after aggregating)
{
"from": 0,
"size": 0,
"query": {
"bool": {
"must": [
{
"match": {
"title": "十一"
}
}
]
}
},
"aggregations": {
"termsAgg": {
"terms": {
"field": "title.keyword",
"size": 50,
"min_doc_count": 1,
"shard_min_doc_count": 0,
"show_term_doc_count_error": false
},
"aggregations": {
"topHitsAgg": {
"top_hits": {
"from": 0,
"size": 1,
"version": false,
"seq_no_primary_term": false,
"explain": false,
"sort": [
{
"hot_value": {
"order": "desc"
}
}
]
}
}
}
}
}
}
Java code:
BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery()
        .must(QueryBuilders.matchQuery("title", condition.getTitle()));
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder().query(boolQueryBuilder);
TermsAggregationBuilder termsAggregationBuilder = termsAggregationBuilder("termsAgg", "title.keyword", 50)
        .subAggregation(topHitsAggregationBuilder("topHitsAgg", 1));
SearchSourceBuilder aggregation = searchSourceBuilder.aggregation(termsAggregationBuilder);
SearchRequest searchRequest = searchRequestBuilder(aggregation);
try {
    SearchResponse search = client.search(searchRequest, RequestOptions.DEFAULT);
    ParsedTerms parsedTermsPlatform = search.getAggregations().get("termsAgg");
    // For each terms bucket, take the single top hit and map its source to a response object
    List<TrendingResponse> collect = parsedTermsPlatform.getBuckets().stream().map(p -> {
        ParsedTopHits parsedTopHits = p.getAggregations().get("topHitsAgg");
        SearchHit[] hits = parsedTopHits.getHits().getHits();
        String id = hits[0].getId();
        Map<String, Object> sourceAsMap = hits[0].getSourceAsMap();
        sourceAsMap.put("id", id);
        return mapper.map(sourceAsMap, TrendingResponse.class);
    }).collect(Collectors.toList());
    responseList.addAll(collect);
    return responseList;
} catch (IOException e) {
    log.error("SEARCH IS ERROR: ", e);
    return Response.error("Query failed");
}

public TermsAggregationBuilder termsAggregationBuilder(String termsAggName, String fieldName, Integer size) {
    return AggregationBuilders.terms(termsAggName).field(fieldName).size(size).executionHint("map");
}

// Note: this helper sorts by "time", whereas the JSON request above sorts by "hot_value"
public TopHitsAggregationBuilder topHitsAggregationBuilder(String topHitsName, Integer size) {
    return AggregationBuilders.topHits(topHitsName).sort("time", SortOrder.DESC).size(size);
}
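Conceptually, a terms aggregation with a top_hits sub-aggregation of size 1 groups the documents by a field and keeps the best document in each group. The plain-Java sketch below shows that idea on in-memory maps, grouping by title and keeping the document with the highest hot_value as in the JSON request. TopHitsSketch and its doc helper are hypothetical and unrelated to the client classes used above.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class TopHitsSketch {
    // Hypothetical helper that builds a document with the two fields the sketch needs.
    static Map<String, Object> doc(String title, int hotValue) {
        Map<String, Object> d = new HashMap<>();
        d.put("title", title);
        d.put("hot_value", hotValue);
        return d;
    }

    static int hot(Map<String, Object> d) {
        return (Integer) d.get("hot_value");
    }

    // One entry per title ("terms" bucket), holding the hit with the highest hot_value ("top_hits", size 1).
    static Map<String, Map<String, Object>> topPerTitle(List<Map<String, Object>> docs) {
        Map<String, Map<String, Object>> best = new TreeMap<>();
        for (Map<String, Object> d : docs) {
            best.merge((String) d.get("title"), d, (a, b) -> hot(a) >= hot(b) ? a : b);
        }
        return best;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> docs = List.of(doc("x", 3), doc("x", 7), doc("y", 1));
        topPerTitle(docs).forEach((title, d) -> System.out.println(title + " -> " + hot(d)));
    }
}
```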
20. Migrating ES data with elasticsearch-dump
21. _reindex: copy data from an old index to a new index
POST _reindex
{
"source": {
"index": "test_data_1"
},
"dest": {
"index": "test_data_2"
}
}