ES: Storing and Querying Large Data Volumes in Date-Based Indices

苍煜

Last modified 2024-05-18 14:35:26


Column: Elasticsearch    Tags: elasticsearch, database, java

First published 2024-04-09 16:22:02

Article link: https://blog.csdn.net/qq_41694906/article/details/137543505


Preface

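ES is commonly used to store all kinds of log data or other bulk data, which is then aggregated for statistics.
For high-traffic systems, logs and other business data grow enormous, so indices should be split by date; this makes hot/cold data migration much easier to manage. Large volumes of business data should likewise be partitioned by date, so that a query over a time range only has to touch the indices for those days.
Suppose log data is stored in one index per day, with names such as system_log_20240408, system_log_20240409, and so on. At query time, search through the alias system_log or query several indices together.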
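Because each day lives in its own index, a query over a known date range can name just the indices for those days instead of searching the whole alias. A minimal sketch, assuming the system_log_yyyyMMdd naming above; the class DailyIndexNames and its methods forDay and forRange are hypothetical helpers, not part of any Elasticsearch client:

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that builds daily index names such as system_log_20240408.
final class DailyIndexNames {
    // Same yyyyMMdd pattern as the index names used in this article
    private static final DateTimeFormatter DAY = DateTimeFormatter.ofPattern("yyyyMMdd");

    private DailyIndexNames() {}

    /** Name of the index that holds the data for one day. */
    static String forDay(String prefix, LocalDate day) {
        return prefix + day.format(DAY);
    }

    /** Names of all daily indices from 'from' to 'to', both inclusive. */
    static List<String> forRange(String prefix, LocalDate from, LocalDate to) {
        if (to.isBefore(from)) {
            throw new IllegalArgumentException("'to' must not be before 'from'");
        }
        List<String> names = new ArrayList<>();
        for (LocalDate d = from; !d.isAfter(to); d = d.plusDays(1)) {
            names.add(forDay(prefix, d));
        }
        return names;
    }

    public static void main(String[] args) {
        List<String> names = forRange("system_log_",
                LocalDate.of(2024, 4, 8), LocalDate.of(2024, 4, 9));
        // Joined with commas this is a valid multi-index target for _search
        System.out.println(String.join(",", names)); // system_log_20240408,system_log_20240409
    }
}
```

The joined string can be used directly as the target of GET .../_search, or the list can be passed as separate arguments to the Java search request shown later.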

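Verifying with DSL statements

Create an index template

An index template defines the default mappings and settings for new indices, together with the name pattern that decides which indices it applies to, so each day's index can be added later without repeating the definition. Note that PUT _template is the legacy template API; since Elasticsearch 7.8 it is deprecated in favour of composable index templates (PUT _index_template).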

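PUT _template/system_log
{
  "order": 0,		// template priority: when several templates match, they are merged and higher values override lower ones
  "index_patterns": ["system_log_*"],	// applied by default to any new index whose name starts with system_log_
  "settings": {
    "number_of_replicas": "1",	// number of replicas
    "number_of_shards": "1"	// number of primary shards
  },
  "mappings":{
    "dynamic":"false",	// true (default): new fields are added to the mapping automatically. false: new fields are not added to the mapping (so not searchable) but are still kept in the document's _source. "strict": documents with unknown fields are rejected with an error. Inner objects can override this setting individually.
    "properties": {		// field mappings
        "test_keyword_File": {
            "type": "text",
            "fields": {
                "keyword": {
                    "type": "keyword",
                    "ignore_above": 256
                }
            }
        },
        "text": {
            "type": "text"
        },
        "keywordFile": {
            "type": "keyword"
        },
        "longFile": {
            "type": "long"
        },
        "date": {
            "type": "date",
            "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
        },
        "booleanFile": {
            "type": "boolean"
        }
    }
  },
  "aliases": {
    "system_log": {}	// alias attached to every index created from this template; this is the key part
  }
}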
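With legacy templates, every template whose index_patterns match a new index name is applied: lower order values are applied first and higher ones override them. The model below illustrates that rule for flat settings only, under stated simplifications (only * is treated as a wildcard; mappings and aliases, which Elasticsearch also merges, are left out). TemplateResolver, LegacyTemplate and resolve are hypothetical names, not Elasticsearch APIs, and the second template in main is an invented example:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Simplified model of legacy index-template resolution (settings only).
final class TemplateResolver {

    // Hypothetical stand-in for one legacy template.
    static final class LegacyTemplate {
        final String name;
        final int order;
        final List<String> indexPatterns;
        final Map<String, String> settings;

        LegacyTemplate(String name, int order, List<String> indexPatterns, Map<String, String> settings) {
            this.name = name;
            this.order = order;
            this.indexPatterns = indexPatterns;
            this.settings = settings;
        }
    }

    // Only '*' is a wildcard; every other character is matched literally.
    static boolean globMatches(String pattern, String indexName) {
        String[] parts = pattern.split("\\*", -1);
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                regex.append(".*");
            }
            regex.append(Pattern.quote(parts[i]));
        }
        return Pattern.matches(regex.toString(), indexName);
    }

    // Collects matching templates, applies them from lowest to highest order,
    // so a key set by a higher-order template overwrites the lower one.
    static Map<String, String> resolve(String indexName, List<LegacyTemplate> templates) {
        List<LegacyTemplate> matching = new ArrayList<>();
        for (LegacyTemplate t : templates) {
            for (String p : t.indexPatterns) {
                if (globMatches(p, indexName)) {
                    matching.add(t);
                    break;
                }
            }
        }
        matching.sort(Comparator.comparingInt((LegacyTemplate t) -> t.order));
        Map<String, String> merged = new LinkedHashMap<>();
        for (LegacyTemplate t : matching) {
            merged.putAll(t.settings);
        }
        return merged;
    }

    public static void main(String[] args) {
        LegacyTemplate systemLog = new LegacyTemplate("system_log", 0, List.of("system_log_*"),
                Map.of("number_of_shards", "1", "number_of_replicas", "1"));
        // Invented template with a higher order that matches every index
        LegacyTemplate allIndices = new LegacyTemplate("all_indices", 1, List.of("*"),
                Map.of("number_of_replicas", "0"));
        Map<String, String> s = resolve("system_log_20240408", List.of(systemLog, allIndices));
        System.out.println(s.get("number_of_shards") + " " + s.get("number_of_replicas")); // 1 0
    }
}
```

Composable templates (_index_template) behave differently: only the single matching template with the highest priority is used.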

View the template

GET _template/system_log


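Modify the template

Simply send the PUT again with the changed body; it overwrites the existing template. The change only affects indices created afterwards; existing indices keep their current mappings and settings.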

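PUT _template/system_log
{
  "order": 0,		// template priority: when several templates match, they are merged and higher values override lower ones
  "index_patterns": ["system_log_*"],	// applied by default to any new index whose name starts with system_log_
  "settings": {
    "number_of_replicas": "1",	// number of replicas
    "number_of_shards": "1"	// number of primary shards
  },
  "mappings":{
    "dynamic":"false",	// true (default): new fields are added to the mapping automatically. false: new fields are not added to the mapping (so not searchable) but are still kept in the document's _source. "strict": documents with unknown fields are rejected with an error. Inner objects can override this setting individually.
    "properties": {		// field mappings
        "test_keyword_File": {
            "type": "text",
            "fields": {
                "keyword": {
                    "type": "keyword",
                    "ignore_above": 256
                }
            }
        },
        "text": {
            "type": "text"
        },
        "keywordFile": {
            "type": "keyword"
        },
        "longFile": {
            "type": "long"
        },
        "date": {
            "type": "date",
            "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
        },
        "booleanFile": {
            "type": "boolean"
        }
    }
  },
  "aliases": {
    "system_log": {}	// alias attached to every index created from this template; this is the key part
  }
}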

Delete the template

DELETE _template/system_log	// delete the template

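Inserting data creates the index automatically

Here we write to the index system_log_20240408. If it does not exist yet, Elasticsearch creates it automatically (as long as automatic index creation, action.auto_create_index, is enabled, which is the default). Because the name starts with system_log_, the index is created from the template above, so system_log_20240408 carries the alias system_log.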

// Create the index system_log_20240408 and add one document
POST /system_log_20240408/_doc
{
   "test_keyword_File":"filename",
   "text":"text1",
   "keywordFile":"keywordFile",
   "date":"2024-04-08"
}

// Create the index system_log_20240409 and add one document
POST /system_log_20240409/_doc
{
   "test_keyword_File":"filename",
   "text":"text1",
   "keywordFile":"keywordFile",
   "date":"2024-04-09"
}
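Both documents above only use fields declared in the template. The "dynamic" comment in the template describes what happens to a field that is not declared. A toy model of that decision, assuming a flat document without nested objects; DynamicMappingModel, DynamicMode, Result and index are hypothetical names, and the field "extra" in main is invented for illustration:

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Toy model of the mapping "dynamic" setting for a flat document.
final class DynamicMappingModel {

    enum DynamicMode { TRUE, FALSE, STRICT }

    // Which fields end up indexed (searchable) and what the mapping looks like afterwards.
    static final class Result {
        final Set<String> indexedFields = new LinkedHashSet<>();
        final Set<String> mappedFields;

        Result(Set<String> mappedFields) {
            this.mappedFields = mappedFields;
        }
    }

    static Result index(Set<String> mapping, Map<String, Object> doc, DynamicMode mode) {
        Set<String> updatedMapping = new LinkedHashSet<>(mapping);
        Result result = new Result(updatedMapping);
        for (String field : doc.keySet()) {
            if (mapping.contains(field)) {
                result.indexedFields.add(field);
            } else if (mode == DynamicMode.TRUE) {
                updatedMapping.add(field);          // mapping grows automatically
                result.indexedFields.add(field);
            } else if (mode == DynamicMode.STRICT) {
                // Elasticsearch rejects the whole document in this case
                throw new IllegalArgumentException("unknown field [" + field + "] in strict mapping");
            }
            // FALSE: the value stays in _source but is neither mapped nor indexed
        }
        return result;
    }

    public static void main(String[] args) {
        Set<String> mapping = Set.of("text", "keywordFile", "date");
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("text", "text1");
        doc.put("extra", "value");                  // invented undeclared field
        System.out.println(index(mapping, doc, DynamicMode.FALSE).indexedFields); // [text]
        System.out.println(index(mapping, doc, DynamicMode.TRUE).indexedFields);  // [text, extra]
    }
}
```

With "dynamic": "false", as in the template, a search on such an undeclared field finds nothing even though the value is returned in _source.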

Check the aliases of system_log_20240408

GET system_log_20240408/_alias


Searching across multiple indices

Search through the alias

GET system_log/_search


Multiple indices, separated by commas

GET system_log_20240408,system_log_20240409/_search


Wildcard matching on index names

GET system_log*/_search
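All three requests end up searching the same two concrete indices. The simplified model below shows how such a target expression can be expanded: split on commas, expand * against both index and alias names, replace an alias by the indices it points to, and de-duplicate. It leaves out exclusions with -, date math and closed or hidden indices, and TargetResolver is a hypothetical class, not the Elasticsearch resolver:

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Pattern;

// Simplified model of how a search target like "a,b", "system_log*" or an alias is expanded.
final class TargetResolver {
    private final Set<String> indices;                 // concrete index names
    private final Map<String, Set<String>> aliases;    // alias -> indices it points to

    TargetResolver(Set<String> indices, Map<String, Set<String>> aliases) {
        this.indices = indices;
        this.aliases = aliases;
    }

    // Only '*' is a wildcard; everything else is literal.
    private static boolean matches(String pattern, String name) {
        String[] parts = pattern.split("\\*", -1);
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                regex.append(".*");
            }
            regex.append(Pattern.quote(parts[i]));
        }
        return Pattern.matches(regex.toString(), name);
    }

    // Returns a sorted, de-duplicated set of concrete indices.
    Set<String> resolve(String expression) {
        Set<String> result = new TreeSet<>();
        for (String part : expression.split(",")) {
            String name = part.trim();
            if (name.contains("*")) {
                for (String index : indices) {
                    if (matches(name, index)) result.add(index);
                }
                for (Map.Entry<String, Set<String>> alias : aliases.entrySet()) {
                    if (matches(name, alias.getKey())) result.addAll(alias.getValue());
                }
            } else if (aliases.containsKey(name)) {
                result.addAll(aliases.get(name));
            } else if (indices.contains(name)) {
                result.add(name);
            } else {
                // Elasticsearch answers index_not_found_exception here unless ignore_unavailable is set
                throw new IllegalArgumentException("no such index [" + name + "]");
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Set<String> indices = Set.of("system_log_20240408", "system_log_20240409");
        TargetResolver r = new TargetResolver(indices, Map.of("system_log", indices));
        // Each line prints [system_log_20240408, system_log_20240409]
        System.out.println(r.resolve("system_log"));
        System.out.println(r.resolve("system_log_20240408,system_log_20240409"));
        System.out.println(r.resolve("system_log*"));
    }
}
```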


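Verifying with Java

// Assumes Spring scheduling is enabled (@EnableScheduling) and restHighLevelClient is an injected
// RestHighLevelClient (Elasticsearch 7.x high-level REST client) field of the enclosing bean
import java.io.IOException;
import java.text.SimpleDateFormat;
import java.util.Date;

import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.indices.CreateIndexRequest;
import org.elasticsearch.client.indices.CreateIndexResponse;
import org.elasticsearch.client.indices.GetIndexRequest;
import org.springframework.scheduling.annotation.Scheduled;

    @Scheduled(cron = "0 0 0 * * *") // create the new day's index at 00:00 every day
    public void createDailyIndex() throws IOException {
        SimpleDateFormat sdf = new SimpleDateFormat("yyyyMMdd");
        String indexName = "system_log_" + sdf.format(new Date());

        // The index may already exist (e.g. auto-created by an earlier write); creating it again would fail
        if (restHighLevelClient.indices().exists(new GetIndexRequest(indexName), RequestOptions.DEFAULT)) {
            System.out.println("Index " + indexName + " already exists.");
            return;
        }

        // Create-index request; settings, mappings and the system_log alias come from the template
        CreateIndexRequest createIndexRequest = new CreateIndexRequest(indexName);

        // Send the request
        CreateIndexResponse createIndexResponse = restHighLevelClient.indices().create(createIndexRequest, RequestOptions.DEFAULT);

        boolean acknowledged = createIndexResponse.isAcknowledged();
        System.out.println("Index operation acknowledged: " + acknowledged);
        if (acknowledged) {
            System.out.println("Index " + indexName + " created successfully.");
        } else {
            System.out.println("Index " + indexName + " creation was not acknowledged within the timeout.");
        }
    }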
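The cron expression 0 0 0 * * * (second, minute, hour, day of month, month, day of week in Spring's six-field format) fires at midnight in the scheduler's time zone, and the index name is derived from the same clock, so both depend on the server's default zone unless you pin one (Spring's @Scheduled has a zone attribute for that). A plain-Java sketch of the arithmetic, assuming one explicit ZoneId; Asia/Shanghai is only an example, and DailyIndexClock with its methods is hypothetical. It uses DateTimeFormatter, which unlike SimpleDateFormat is thread-safe:

```java
import java.time.Duration;
import java.time.LocalDate;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper: index name and time to the next run, both in one explicit zone.
final class DailyIndexClock {
    private static final DateTimeFormatter DAY = DateTimeFormatter.ofPattern("yyyyMMdd");

    private DailyIndexClock() {}

    // Index name for the calendar day of 'now' in now's own zone.
    static String indexNameAt(String prefix, ZonedDateTime now) {
        return prefix + now.toLocalDate().format(DAY);
    }

    // Time left until the next local midnight, i.e. the next firing of "0 0 0 * * *".
    static Duration untilNextMidnight(ZonedDateTime now) {
        LocalDate tomorrow = now.toLocalDate().plusDays(1);
        ZonedDateTime nextMidnight = tomorrow.atStartOfDay(now.getZone());
        return Duration.between(now, nextMidnight);
    }

    public static void main(String[] args) {
        ZoneId zone = ZoneId.of("Asia/Shanghai");   // example zone only
        ZonedDateTime now = ZonedDateTime.of(2024, 4, 8, 23, 30, 0, 0, zone);
        System.out.println(indexNameAt("system_log_", now));   // system_log_20240408
        System.out.println(untilNextMidnight(now));            // PT30M
    }
}
```

The same instant can fall on different days in different zones (05:00 on 2024-04-09 in Shanghai is still 2024-04-08 in UTC), which is why the zone used for the cron trigger and for the index name should be the same.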

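Inserting and querying data

For details, see the article "Elasticsearch-03-JavaApi以及springboot中操作-RestHighLevelClient" (Elasticsearch 03: the Java API and using RestHighLevelClient in Spring Boot).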

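Querying multiple indices in Java: pass the index names as comma-separated arguments

// Assumes restHighLevelClient is an injected RestHighLevelClient (Elasticsearch 7.x high-level REST client)
import java.io.IOException;

import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.SearchHits;
import org.elasticsearch.search.builder.SearchSourceBuilder;

    public void searchMultipleIndices() throws IOException {
        // Create the search request; indices() takes varargs, and "system_log" or "system_log*" also work
        SearchRequest searchRequest = new SearchRequest();
        searchRequest.indices("system_log_20240408", "system_log_20240409");

        // Build the request body
        SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
        // Match all documents
        searchSourceBuilder.query(QueryBuilders.matchAllQuery());

        searchRequest.source(searchSourceBuilder);
        // Send the request
        SearchResponse search = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);

        // Process the results (by default only the first 10 hits are returned)
        SearchHits hits = search.getHits();
        for (SearchHit hit : hits) {
            System.out.println(hit.getSourceAsString());
        }
    }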
