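Scenario:
I needed to query Elasticsearch for the average CPU and memory usage of virtual machines and hosts, filtered by a date range, a time-of-day range, and other conditions.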
Constraints:
- The index stores time only in a timestamp field.
- Averages must be grouped and aggregated by virtual machine ID.
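Possible solutions:
- Query the data for the date range first, then parse the hour out of each returned record and filter on it.
- Add an hour field to the index, and use a Painless script to compute the hour from timestamp and write it into that field.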
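The first option can be sketched roughly as follows. This is only an illustration, not the code used in this post: the helper names `hour_of` and `filter_by_hour` and the sample documents are hypothetical, and the GMT+8 zone is assumed to match the one used in the Painless script later on.

```python
from datetime import datetime, timedelta, timezone

# Assumption: hours are interpreted in GMT+8, the zone used later in this post.
GMT8 = timezone(timedelta(hours=8))


def hour_of(epoch_millis: int) -> int:
    """Return the hour of day (0-23) in GMT+8 for an epoch-millisecond timestamp."""
    return datetime.fromtimestamp(epoch_millis / 1000, tz=GMT8).hour


def filter_by_hour(docs, start_hour, end_hour):
    """Keep documents whose timestamp hour lies in [start_hour, end_hour]."""
    return [d for d in docs if start_hour <= hour_of(d["timestamp"]) <= end_hour]


# Hypothetical documents shaped like the _source of a metric_server hit.
docs = [
    {"resourceId": "vm-1", "cpuUsage": 0.6, "timestamp": 1677080101589},
    {"resourceId": "vm-1", "cpuUsage": 0.9, "timestamp": 1677033301589},
]
print(filter_by_hour(docs, 22, 23))
```

Every document in the date range has to be pulled back to the client before it can be filtered, which is where the cost of this option comes from on large data sets.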
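Detailed walkthrough
Of the two, the first approach can slow queries down when the data volume is large, so I chose the second. The first step is to add an hour field of type integer to the index.
This is the original mapping of the metric_server index: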
{
"metric_server" : {
"mappings" : {
"properties" : {
"accountId" : {
"type" : "keyword"
},
"cpuUsage" : {
"type" : "double"
},
"diskIoKbps" : {
"type" : "double"
},
"diskProvisioned" : {
"type" : "double"
},
"diskUsage" : {
"type" : "double"
},
"memoryUsage" : {
"type" : "double"
},
"name" : {
"type" : "keyword"
},
"netIoKbps" : {
"type" : "double"
},
"region" : {
"type" : "keyword"
},
"resourceId" : {
"type" : "keyword"
},
"timestamp" : {
"type" : "date",
"format" : "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
},
"zone" : {
"type" : "keyword"
}
}
}
}
}
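Note!
Once an index is created, the mapping of an existing field cannot be changed, but new fields can still be added. Be careful with every operation on the index.
Add the hour field: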
PUT /metric_server/_mapping
{
"properties": {
"hour":{
"type": "integer"
}
}
}
The mapping of metric_server now looks like this:
GET metric_server/_mapping
{
"metric_server" : {
"mappings" : {
"properties" : {
"accountId" : {
"type" : "keyword"
},
"cpuUsage" : {
"type" : "double"
},
"diskIoKbps" : {
"type" : "double"
},
"diskProvisioned" : {
"type" : "double"
},
"diskUsage" : {
"type" : "double"
},
"hour" : {
"type" : "integer"
},
"memoryUsage" : {
"type" : "double"
},
"name" : {
"type" : "keyword"
},
"netIoKbps" : {
"type" : "double"
},
"region" : {
"type" : "keyword"
},
"resourceId" : {
"type" : "keyword"
},
"timestamp" : {
"type" : "date",
"format" : "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
},
"zone" : {
"type" : "keyword"
}
}
}
}
}
After the hour field is added to the mapping, a document that has no value for it still shows no hour field when fetched. For example:
GET /metric_server/_doc/_YzDeYYBydXk0jZNg3XC
{
"_index": "metric_server",
"_type": "_doc",
"_id": "_YzDeYYBydXk0jZNg3XC",
"_version": 9,
"_seq_no": 825268,
"_primary_term": 1,
"found": true,
"_source": {
"cpuUsage": 0.6,
"resourceId": "***********",
"memoryUsage": 36.79,
"diskProvisioned": 0,
"diskIoKbps": 0,
"accountId": "***********",
"zone": "***********",
"name": "***********",
"diskUsage": 5,
"region": "***********",
"netIoKbps": *******,
"timestamp": 1677080101589
}
}
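A detour
To show the hour field in action, I first set an hour value on the document with ID _YzDeYYBydXk0jZNg3XC. While doing so I mistyped hour as houe. Because dynamic mapping adds unknown fields automatically, this changed the mapping of the metric_server index.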
Viewing the mapping of metric_server now shows an extra houe property:
GET metric_server/_mapping
{
"metric_server" : {
"mappings" : {
"properties" : {
"accountId" : {
"type" : "keyword"
},
"cpuUsage" : {
"type" : "double"
},
"diskIoKbps" : {
"type" : "double"
},
"diskProvisioned" : {
"type" : "double"
},
"diskUsage" : {
"type" : "double"
},
"houe" : {
"type" : "long"
},
"hour" : {
"type" : "integer"
},
"memoryUsage" : {
"type" : "double"
},
"name" : {
"type" : "keyword"
},
"netIoKbps" : {
"type" : "double"
},
"region" : {
"type" : "keyword"
},
"resourceId" : {
"type" : "keyword"
},
"timestamp" : {
"type" : "date",
"format" : "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
},
"zone" : {
"type" : "keyword"
}
}
}
}
}
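A typo like this is only accepted because dynamic mapping is enabled by default. One way to guard against it is to set `dynamic` to `strict` when the index is created, so that documents containing unmapped fields are rejected. The sketch below only builds such a request body; the helper name `strict_mapping` is hypothetical, only a subset of the fields is shown, and this is not something the original setup used.

```python
import json


def strict_mapping(properties: dict) -> dict:
    """Wrap field definitions in a mapping that rejects unmapped fields."""
    return {"mappings": {"dynamic": "strict", "properties": properties}}


# Only a few of the metric_server fields are listed here, for brevity.
body = strict_mapping({
    "resourceId": {"type": "keyword"},
    "timestamp": {
        "type": "date",
        "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis",
    },
    "hour": {"type": "integer"},
})

# The printed JSON could be sent as the body of PUT /<index-name>.
print(json.dumps(body, indent=2))
```

With a mapping like this, an update that writes houe would fail with an error instead of silently adding a new field.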
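The houe field is now stuck in the mapping. Once a field exists in an Elasticsearch mapping, its type, analyzer, and similar properties cannot be changed in place, and the field cannot be removed. Documents that are already indexed were written according to the old mapping, so changing it would leave the existing data inconsistent with the new definition. To change or drop a field you have to create a new index with the correct mapping and reindex the data into it, for example with the Reindex API. This is why field types and analyzers deserve careful thought when an index is first created.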
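However,
the mapping can be fixed indirectly.
Method: create an index metric_server1 with the correct mapping, copy the data from metric_server into it with the Reindex API, and delete metric_server. Then recreate metric_server under its original name with the correct mapping, and copy the data back from metric_server1.
Step 1: create metric_server1 with the correct mapping.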
PUT /metric_server1
{
"mappings" : {
"properties" : {
"accountId" : {
"type" : "keyword"
},
"cpuUsage" : {
"type" : "double"
},
"diskIoKbps" : {
"type" : "double"
},
"diskProvisioned" : {
"type" : "double"
},
"diskUsage" : {
"type" : "double"
},
"memoryUsage" : {
"type" : "double"
},
"name" : {
"type" : "keyword"
},
"netIoKbps" : {
"type" : "double"
},
"region" : {
"type" : "keyword"
},
"resourceId" : {
"type" : "keyword"
},
"timestamp" : {
"type" : "date",
"format" : "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
},
"hour" : {
"type" : "integer"
},
"zone" : {
"type" : "keyword"
}
}
}
}
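Step 2: copy the data into metric_server1. Reindex copies each document's _source as is, so the script removes the mistyped houe field; otherwise dynamic mapping would add it to the new index again.
POST _reindex
{
"source": {
"index": "metric_server"
},
"dest": {
"index": "metric_server1"
},
"script": {
"lang": "painless",
"source": "ctx._source.remove('houe')"
}
}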
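Step 3: delete the original index, after confirming that metric_server1 contains all the documents.
DELETE /metric_server
Step 4: recreate metric_server with the correct mapping.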
PUT /metric_server
{
"mappings" : {
"properties" : {
"accountId" : {
"type" : "keyword"
},
"cpuUsage" : {
"type" : "double"
},
"diskIoKbps" : {
"type" : "double"
},
"diskProvisioned" : {
"type" : "double"
},
"diskUsage" : {
"type" : "double"
},
"memoryUsage" : {
"type" : "double"
},
"name" : {
"type" : "keyword"
},
"netIoKbps" : {
"type" : "double"
},
"region" : {
"type" : "keyword"
},
"resourceId" : {
"type" : "keyword"
},
"timestamp" : {
"type" : "date",
"format" : "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
},
"hour" : {
"type" : "integer"
},
"zone" : {
"type" : "keyword"
}
}
}
}
Step 5: copy the data back into metric_server.
POST _reindex
{
"source": {
"index": "metric_server1"
},
"dest": {
"index": "metric_server"
}
}
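Step 6: delete the temporary index.
DELETE /metric_server1
With the index recreated correctly, use a Painless script to compute the hour from timestamp and write it into the hour field. The script assumes timestamp is stored as epoch milliseconds, as in the sample document above: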
POST /metric_server/_update_by_query
{
"script": {
"source": "if(ctx._source.containsKey('timestamp')){ctx._source.hour = Instant.ofEpochMilli(ctx._source.timestamp).atZone(ZoneId.of('GMT+8')).getHour()}",
"lang": "painless"
},
"query": {
"match_all": {}
}
}
The match_all query here selects every document, so hour is rewritten whether or not it already has a value. Replace it with a filter that suits your own needs.
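As one example of such a filter, the update could be limited to documents that do not have an hour value yet, using a `bool` query with `must_not` and `exists`. The sketch below only builds the request body; the function name `backfill_body` is hypothetical, and the script is the same one shown above with whitespace added.

```python
import json

# The same Painless script as above, split across lines for readability.
PAINLESS = (
    "if (ctx._source.containsKey('timestamp')) {"
    " ctx._source.hour = Instant.ofEpochMilli(ctx._source.timestamp)"
    ".atZone(ZoneId.of('GMT+8')).getHour() }"
)


def backfill_body(only_missing: bool = True) -> dict:
    """Build an _update_by_query body; optionally skip docs that already have hour."""
    if only_missing:
        query = {"bool": {"must_not": {"exists": {"field": "hour"}}}}
    else:
        query = {"match_all": {}}
    return {"script": {"source": PAINLESS, "lang": "painless"}, "query": query}


# The printed JSON could be sent as the body of POST /metric_server/_update_by_query.
print(json.dumps(backfill_body(), indent=2))
```

Running the update this way touches fewer documents when most of them already carry an hour value.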
Fetch the document again:
{
"_index": "metric_server",
"_type": "_doc",
"_id": "_YzDeYYBydXk0jZNg3XC",
"_version": 9,
"_seq_no": 825268,
"_primary_term": 1,
"found": true,
"_source": {
"cpuUsage": 0.6,
"resourceId": "***********",
"memoryUsage": 36.79,
"diskProvisioned": 0,
"diskIoKbps": 0,
"accountId": "***********",
"hour": 23,
"zone": "***********",
"name": "***********",
"diskUsage": 5,
"region": "***********",
"netIoKbps": "********",
"timestamp": 1677080101589
}
}
The hour value is now populated on every document. From here on, new data only needs to include hour when it is written.
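With hour in place, the query from the scenario can be expressed as two range filters plus a terms aggregation with avg sub-aggregations. The sketch below builds such a search body under several assumptions: resourceId is taken to be the virtual machine ID, the aggregation names and the bucket size of 1000 are arbitrary, and the epoch-millisecond bounds are hypothetical.

```python
import json


def avg_usage_query(start_ms: int, end_ms: int, start_hour: int, end_hour: int) -> dict:
    """Build a search body that averages CPU and memory usage per resourceId."""
    return {
        "size": 0,  # only the aggregation results are needed, not the hits
        "query": {
            "bool": {
                "filter": [
                    {"range": {"timestamp": {"gte": start_ms, "lte": end_ms}}},
                    {"range": {"hour": {"gte": start_hour, "lte": end_hour}}},
                ]
            }
        },
        "aggs": {
            "by_resource": {
                # Assumption: resourceId identifies the virtual machine.
                "terms": {"field": "resourceId", "size": 1000},
                "aggs": {
                    "avg_cpu": {"avg": {"field": "cpuUsage"}},
                    "avg_memory": {"avg": {"field": "memoryUsage"}},
                },
            }
        },
    }


# Hypothetical bounds; the JSON could be sent as the body of GET /metric_server/_search.
body = avg_usage_query(1676822400000, 1677081600000, 9, 18)
print(json.dumps(body, indent=2))
```

Each bucket in the response would then carry one avg_cpu and one avg_memory value for a single resourceId.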
That is how I solved this problem. If you have any questions, send me a message or leave a comment.