How to use reindex inside an Elasticsearch cluster to migrate an index and split its shards

可乐大数据

Last modified: 2024-05-26 12:54:39

First published: 2024-05-21 19:00:00

Article link: https://blog.csdn.net/qq_43005694/article/details/139087502

1. Delete my test indices: old_index and new_index

curl -X DELETE "http://`hostname -i`:9200/old_index"
curl -X DELETE "http://`hostname -i`:9200/new_index"

2. Check the indices in the cluster

curl -X GET "http://`hostname -i`:9200/_cat/indices?v"
health status index            uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   .geoip_databases ib6tlhzjTf-MQBu-XGIVWg   1   0         33            0     31.1mb         31.1mb

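3. Create the test index: old_index

# Notes
# 1. I have only one node, so number_of_replicas is set to 0 to keep the test simple
# 2. The source index is given a single shard (number_of_shards: 1) for the before/after comparison later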
curl -X PUT "http://`hostname -i`:9200/old_index" -H 'Content-Type: application/json' -d'
{
  "mappings": {
    "properties": {
      "name": { "type": "text" },
      "description": { "type": "text" },
      "publish_date": { "type": "date" }
    }
  },
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  }
}'
# Response: the index was created successfully
{"acknowledged":true,"shards_acknowledged":true,"index":"old_index"}

4. Insert a few test documents into old_index

curl -X POST "http://`hostname -i`:9200/old_index/_bulk" -H 'Content-Type: application/x-ndjson' --data-binary '
{ "index": { "_index": "old_index", "_id": "1" } }
{ "name": "可乐", "description": "大数据SRE工程师", "publish_date": "1991-05-20" }
{ "index": { "_index": "old_index", "_id": "2" } }
{ "name": "炎长", "description": "DBA工程师", "publish_date": "1992-11-23" }
'

# Response
{
	"took": 6,
	"errors": false,
	"items": [{
		"index": {
			"_index": "old_index",
			"_type": "_doc",
			"_id": "1",
			"_version": 1,
			"result": "created",
			"_shards": {
				"total": 1,
				"successful": 1,
				"failed": 0
			},
			"_seq_no": 0,
			"_primary_term": 1,
			"status": 201
		}
	}, {
		"index": {
			"_index": "old_index",
			"_type": "_doc",
			"_id": "2",
			"_version": 1,
			"result": "created",
			"_shards": {
				"total": 1,
				"successful": 1,
				"failed": 0
			},
			"_seq_no": 1,
			"_primary_term": 1,
			"status": 201
		}
	}]
}
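
The `_bulk` body used in step 4 is newline-delimited JSON: each document takes an action line followed by a source line, and the last line must also end with a newline. As a minimal sketch, the hypothetical helper `bulk_pair` below (not part of Elasticsearch) prints such a pair for this index's three fields. It does no JSON escaping, so it assumes values without double quotes or backslashes.

```shell
# Hypothetical helper: print the action line and the source line for one document.
# Arguments: INDEX ID NAME DESCRIPTION PUBLISH_DATE
# Assumption: the values need no JSON escaping.
bulk_pair() {
  printf '{ "index": { "_index": "%s", "_id": "%s" } }\n' "$1" "$2"
  printf '{ "name": "%s", "description": "%s", "publish_date": "%s" }\n' "$3" "$4" "$5"
}

# Send whatever NDJSON arrives on stdin to the _bulk endpoint.
# --data-binary @- reads the body from stdin and keeps the newlines intact.
bulk_post() {
  curl -s -X POST "http://`hostname -i`:9200/_bulk" \
    -H 'Content-Type: application/x-ndjson' --data-binary @-
}
```

Against a reachable cluster, one more document could then be added with `bulk_pair old_index 3 "test" "test engineer" "1993-01-01" | bulk_post`; the id and field values here are made up for illustration.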

5. Query the data in old_index

curl -X GET "http://`hostname -i`:9200/old_index/_search"

# Query result
{
	"took": 7,
	"timed_out": false,
	"_shards": {
		"total": 1,
		"successful": 1,
		"skipped": 0,
		"failed": 0
	},
	"hits": {
		"total": {
			"value": 2,
			"relation": "eq"
		},
		"max_score": 1.0,
		"hits": [{
			"_index": "old_index",
			"_type": "_doc",
			"_id": "1",
			"_score": 1.0,
			"_source": {
				"name": "可乐",
				"description": "大数据SRE工程师",
				"publish_date": "1991-05-20"
			}
		}, {
			"_index": "old_index",
			"_type": "_doc",
			"_id": "2",
			"_score": 1.0,
			"_source": {
				"name": "炎长",
				"description": "DBA工程师",
				"publish_date": "1992-11-23"
			}
		}]
	}
}

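6. Create the target index: new_index

# Notes
# 1. The shard count is set to 2 this time, to simulate splitting shards with reindex
# 2. Setting the target index's replicas to 0 is recommended: with no replicas to write, indexing into the target is faster, so the reindex task finishes sooner than it would with replicas. Once the reindex is done, replicas can be set again as needed.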

curl -X PUT "http://`hostname -i`:9200/new_index" -H 'Content-Type: application/json' -d'
{
  "mappings": {
    "properties": {
      "name": { "type": "text" },
      "description": { "type": "text" },
      "publish_date": { "type": "date" }
    }
  },
  "settings": {
    "number_of_shards": 2,
    "number_of_replicas": 0
  }
}'

# Response
{"acknowledged":true,"shards_acknowledged":true,"index":"new_index"}
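
Once the reindex has finished, replicas can be switched back on through the index settings API, as the note in this step suggests. A minimal sketch follows; the helper names `replica_body` and `set_replicas` and the replica count of 1 are assumptions for illustration. On a single-node cluster like the one used here a replica cannot be allocated, so the index would turn yellow.

```shell
# Hypothetical helper: build the settings body for a given replica count.
replica_body() {
  printf '{"index":{"number_of_replicas":%d}}' "$1"
}

# Apply the replica count to an index (number_of_replicas is a dynamic setting,
# so the index does not need to be closed or recreated).
set_replicas() {  # usage: set_replicas INDEX COUNT
  curl -s -X PUT "http://`hostname -i`:9200/$1/_settings" \
    -H 'Content-Type: application/json' -d "$(replica_body "$2")"
}
```

For example, `set_replicas new_index 1` would be run after the migration completes.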

7. Check the state of both indices

curl -X GET "http://`hostname -i`:9200/_cat/indices?v"
health status index            uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   .geoip_databases ib6tlhzjTf-MQBu-XGIVWg   1   0         33            0     31.1mb         31.1mb
green  open   new_index        GrJiGswYRqCibszGIVjZhg   2   0          0            0       454b           454b
green  open   old_index        8k4beb7ETpu6Ki-LpOu_EQ   1   0          2            0        4kb            4kb

8. Use reindex to migrate the data from the source index old_index to the target index new_index

curl -X POST "http://`hostname -i`:9200/_reindex" -H 'Content-Type: application/json' -d'
{
  "source": {
    "index": "old_index"
  },
  "dest": {
    "index": "new_index"
  }
}
'

# Response: both documents were created
{"took":8,"timed_out":false,"total":2,"updated":0,"created":2,"deleted":0,"batches":1,"version_conflicts":0,"noops":0,"retries":{"bulk":0,"search":0},"throttled_millis":0,"requests_per_second":-1.0,"throttled_until_millis":0,"failures":[]}
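
When the migration is scripted, it helps to check the response rather than read it by eye. The sketch below treats a reindex as clean when `failures` is empty and `version_conflicts` is 0. The function name `reindex_clean` is hypothetical, and the `grep` matching assumes the compact single-line JSON shown above; a real JSON parser would be more robust.

```shell
# Hypothetical helper: succeed only if the reindex response reports
# no failures and no version conflicts.
# $1: the JSON response of POST _reindex, in compact form.
reindex_clean() {
  printf '%s' "$1" | grep -q '"failures":\[\]' &&
    printf '%s' "$1" | grep -q '"version_conflicts":0[,}]'
}
```

A script could capture the response in a variable, say `resp`, and run `reindex_clean "$resp" || echo "reindex reported problems"`.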

9. Check the progress of the migration

# With so little data the reindex finishes almost immediately, so the task may no longer show up here

curl -X GET "http://`hostname -i`:9200/_tasks?detailed=true&actions=*reindex&human=true"
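
For larger indices, one way to make the task observable is to start the reindex with `wait_for_completion=false`. Elasticsearch then returns a body of the form `{"task":"<node_id>:<task_number>"}` straight away, and that id can be looked up through the tasks API. The following is a sketch under those assumptions; the helper names `task_id`, `start_reindex`, and `task_status` are hypothetical.

```shell
# Hypothetical helper: pull the task id out of {"task":"<node_id>:<number>"}.
task_id() {
  printf '%s' "$1" | sed -n 's/.*"task" *: *"\([^"]*\)".*/\1/p'
}

# Start the reindex in the background; the call returns a task id at once.
start_reindex() {  # usage: start_reindex SOURCE DEST
  curl -s -X POST "http://`hostname -i`:9200/_reindex?wait_for_completion=false" \
    -H 'Content-Type: application/json' \
    -d "{\"source\":{\"index\":\"$1\"},\"dest\":{\"index\":\"$2\"}}"
}

# Look up one task by id; the response reports whether it has completed
# and, while it runs, its progress counters.
task_status() {  # usage: task_status TASK_ID
  curl -s -X GET "http://`hostname -i`:9200/_tasks/$1"
}
```

Against a live cluster this could be used as `id=$(task_id "$(start_reindex old_index new_index)")` followed by `task_status "$id"`.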

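10. Check both indices in the cluster again

# Note: the output below appears to come from a different run: the UUIDs differ from step 7 and new_index shows pri = 1.
# With new_index created as in step 6, the pri column should read 2.
curl -X GET "http://`hostname -i`:9200/_cat/indices?v"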
health status index            uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   .geoip_databases ib6tlhzjTf-MQBu-XGIVWg   1   0         33            0     31.1mb         31.1mb
green  open   new_index        aU3mztzXRXOSk9Q1oiP2RA   1   0          2            0      4.4kb          4.4kb
green  open   old_index        g24b-XDfQZ6BO5zdcIOM0A   1   0          2            0      4.4kb          4.4kb
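
`_cat/indices` only shows totals per index. To see how the documents actually landed on each shard of new_index, the cat shards API can list one row per shard. A minimal sketch, in which the helper names `shard_docs` and `sum_primary_docs` are hypothetical:

```shell
# One row per shard: shard number, primary/replica flag (p or r), document count.
shard_docs() {  # usage: shard_docs INDEX
  curl -s -X GET "http://`hostname -i`:9200/_cat/shards/$1?h=shard,prirep,docs"
}

# Hypothetical helper: add up the docs column over primary shards only,
# reading rows of the form "<shard> <p|r> <docs>" from stdin.
sum_primary_docs() {
  awk '$2 == "p" { n += $3 } END { print n + 0 }'
}
```

After the reindex and a refresh, `shard_docs new_index | sum_primary_docs` should match the docs.count of old_index. How the documents are spread across the two shards depends on the routing of each `_id`.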

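Summary

In real production scenarios, reindex is only suited to migrating an index between two clusters. Using it inside a single cluster to split an index's shards can cause serious performance problems, so in-cluster reindex is not recommended. Reindex first queries the source, which consumes read I/O on the nodes holding the old index, and then writes to the target, which consumes write I/O on the nodes holding the new index. If the reads and the writes both land on the same node, the load is concentrated there, and I/O, memory, or CPU can each become the cluster's bottleneck. With many or large reindex tasks, this can be a disaster for the cluster. The better approach is to migrate the index to a new Elasticsearch cluster: the source cluster then only serves the queries, so the impact on it is minimal, and a new cluster usually carries no business traffic at first, so the writes add little extra burden.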
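
If an in-cluster reindex cannot be avoided, its load can at least be capped. As a hedged sketch: `requests_per_second` is a `_reindex` URL parameter that throttles how many documents are processed per second (the value -1 seen in the step 8 response means unthrottled), and `_rethrottle` changes the rate of a task that is already running. The helper names and any concrete rate are assumptions, and a suitable rate depends on the hardware and the existing load.

```shell
# Hypothetical helper: build the _reindex path for a throttled background run.
reindex_path() {  # usage: reindex_path DOCS_PER_SECOND
  printf '/_reindex?wait_for_completion=false&requests_per_second=%s' "$1"
}

# Start a throttled reindex; the response contains the task id.
throttled_reindex() {  # usage: throttled_reindex SOURCE DEST DOCS_PER_SECOND
  curl -s -X POST "http://`hostname -i`:9200$(reindex_path "$3")" \
    -H 'Content-Type: application/json' \
    -d "{\"source\":{\"index\":\"$1\"},\"dest\":{\"index\":\"$2\"}}"
}

# Change the rate of a running reindex task, for example to slow it down
# during business hours. TASK_ID has the form <node_id>:<task_number>.
rethrottle() {  # usage: rethrottle TASK_ID DOCS_PER_SECOND
  curl -s -X POST "http://`hostname -i`:9200/_reindex/$1/_rethrottle?requests_per_second=$2"
}
```

For instance, `throttled_reindex old_index new_index 500` would start the copy at roughly 500 documents per second. Throttling only spreads the same read and write work over a longer period, so it limits the peak load rather than removing it.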

https://mp.weixin.qq.com/s?__biz=MzA5MjkyNjU5MQ==&mid=2247484835&idx=1&sn=84ca8ce4c2c41c63ec9cf57fc609fb91&chksm=9064e2b3a7136ba5aa62f98024fac5b613b80d510b500b323a53a4c3ed0eed0d2fda5589fe3d#rd
