How to use reindex to migrate an index within an Elasticsearch cluster and split its shards

1. Delete my test indices: old_index and new_index

curl -X DELETE "http://`hostname -i`:9200/old_index"
curl -X DELETE "http://`hostname -i`:9200/new_index"

2. Check the indices in the cluster

curl -X GET "http://`hostname -i`:9200/_cat/indices?v"
health status index            uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   .geoip_databases ib6tlhzjTf-MQBu-XGIVWg   1   0         33            0     31.1mb         31.1mb

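3. Create the test index: old_index

# Notes
# 1. I only have one node, so to keep testing simple number_of_replicas is set to 0
# 2. Assume the source index has a single shard: number_of_shards is set to 1 so the result can be compared later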
curl -X PUT "http://`hostname -i`:9200/old_index" -H 'Content-Type: application/json' -d'
{
  "mappings": {
    "properties": {
      "name": { "type": "text" },
      "description": { "type": "text" },
      "publish_date": { "type": "date" }
    }
  },
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  }
}'
# Response: the index was created successfully
{"acknowledged":true,"shards_acknowledged":true,"index":"old_index"}

4. Insert a few test documents into old_index

curl -X POST "http://`hostname -i`:9200/old_index/_bulk" -H 'Content-Type: application/x-ndjson' --data-binary '
{ "index": { "_index": "old_index", "_id": "1" } }
{ "name": "可乐", "description": "大数据SRE工程师", "publish_date": "1991-05-20" }
{ "index": { "_index": "old_index", "_id": "2" } }
{ "name": "炎长", "description": "DBA工程师", "publish_date": "1992-11-23" }
'

# Response
{
	"took": 6,
	"errors": false,
	"items": [{
		"index": {
			"_index": "old_index",
			"_type": "_doc",
			"_id": "1",
			"_version": 1,
			"result": "created",
			"_shards": {
				"total": 1,
				"successful": 1,
				"failed": 0
			},
			"_seq_no": 0,
			"_primary_term": 1,
			"status": 201
		}
	}, {
		"index": {
			"_index": "old_index",
			"_type": "_doc",
			"_id": "2",
			"_version": 1,
			"result": "created",
			"_shards": {
				"total": 1,
				"successful": 1,
				"failed": 0
			},
			"_seq_no": 1,
			"_primary_term": 1,
			"status": 201
		}
	}]
}
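
The bulk API answers with HTTP 200 even when individual items fail, so the top-level "errors" flag and each item's "status" are what need checking. Here is a minimal Python sketch of that check. The function name bulk_failures is my own, and the sample is a trimmed copy of the response above, not a full client.

```python
import json


def bulk_failures(response):
    """Return (_id, status, error) for every bulk item whose status is not 2xx."""
    failed = []
    for item in response.get("items", []):
        # Each item holds one key naming the action: index, create, update or delete.
        for result in item.values():
            if not 200 <= result.get("status", 0) < 300:
                failed.append((result.get("_id"), result.get("status"), result.get("error")))
    return failed


# Trimmed copy of the response shown above (only the fields the check reads).
sample = json.loads("""
{"took": 6, "errors": false, "items": [
  {"index": {"_index": "old_index", "_id": "1", "result": "created", "status": 201}},
  {"index": {"_index": "old_index", "_id": "2", "result": "created", "status": 201}}
]}
""")

print(sample["errors"], bulk_failures(sample))
```

An empty list together with "errors": false means every document was accepted.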

5. Query the data in old_index

curl -X GET "http://`hostname -i`:9200/old_index/_search"

# Query result
{
	"took": 7,
	"timed_out": false,
	"_shards": {
		"total": 1,
		"successful": 1,
		"skipped": 0,
		"failed": 0
	},
	"hits": {
		"total": {
			"value": 2,
			"relation": "eq"
		},
		"max_score": 1.0,
		"hits": [{
			"_index": "old_index",
			"_type": "_doc",
			"_id": "1",
			"_score": 1.0,
			"_source": {
				"name": "可乐",
				"description": "大数据SRE工程师",
				"publish_date": "1991-05-20"
			}
		}, {
			"_index": "old_index",
			"_type": "_doc",
			"_id": "2",
			"_score": 1.0,
			"_source": {
				"name": "炎长",
				"description": "DBA工程师",
				"publish_date": "1992-11-23"
			}
		}]
	}
}

6. Create the destination index: new_index

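# Notes
# 1. The shard count is set to 2 this time to simulate splitting shards with reindex
# 2. I recommend setting the replica count of the destination index to 0: with no replicas to write, indexing into the destination is faster, so the reindex task finishes sooner than it would with replicas. After the reindex completes, set the replica count back to whatever you need.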

curl -X PUT "http://`hostname -i`:9200/new_index" -H 'Content-Type: application/json' -d'
{
  "mappings": {
    "properties": {
      "name": { "type": "text" },
      "description": { "type": "text" },
      "publish_date": { "type": "date" }
    }
  },
  "settings": {
    "number_of_shards": 2,
    "number_of_replicas": 0
  }
}'

# Response
{"acknowledged":true,"shards_acknowledged":true,"index":"new_index"}
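
Note 2 above mentions resetting the replica count once the reindex is done. The sketch below only builds the request body for PUT /new_index/_settings. The helper name replica_settings and the value 1 are illustrative assumptions. On a single-node setup like mine a replica cannot be allocated, so the index would turn yellow.

```python
import json


def replica_settings(count):
    """Build the JSON body for PUT /<index>/_settings that changes the replica count."""
    if count < 0:
        raise ValueError("number_of_replicas cannot be negative")
    return json.dumps({"index": {"number_of_replicas": count}})


# Example value only; pick the replica count your cluster actually needs.
body = replica_settings(1)
print(body)
```

The printed body can be sent with curl -X PUT to /new_index/_settings, using the same Content-Type header as the other requests in this article.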

7. Check the state of both indices

curl -X GET "http://`hostname -i`:9200/_cat/indices?v"
health status index            uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   .geoip_databases ib6tlhzjTf-MQBu-XGIVWg   1   0         33            0     31.1mb         31.1mb
green  open   new_index        GrJiGswYRqCibszGIVjZhg   2   0          0            0       454b           454b
green  open   old_index        8k4beb7ETpu6Ki-LpOu_EQ   1   0          2            0        4kb            4kb

8. Run reindex to migrate the data from the source index old_index to the destination index new_index

curl -X POST "http://`hostname -i`:9200/_reindex" -H 'Content-Type: application/json' -d'
{
  "source": {
    "index": "old_index"
  },
  "dest": {
    "index": "new_index"
  }
}
'

# Response: the reindex succeeded and 2 documents were created in new_index
{"took":8,"timed_out":false,"total":2,"updated":0,"created":2,"deleted":0,"batches":1,"version_conflicts":0,"noops":0,"retries":{"bulk":0,"search":0},"throttled_millis":0,"requests_per_second":-1.0,"throttled_until_millis":0,"failures":[]}
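
Reading that one-line response by eye gets error-prone on larger jobs, so here is a small Python sketch that checks the fields that matter. The function name reindex_ok is hypothetical. The rule that created plus updated equals total is my assumption for a plain copy like this one, with no script and no query on the source.

```python
import json


def reindex_ok(resp):
    """True when a plain-copy reindex finished cleanly and wrote every document."""
    return (
        not resp["timed_out"]
        and not resp["failures"]
        and resp["version_conflicts"] == 0
        # Assumption: without a script, every source document is created or updated.
        and resp["created"] + resp["updated"] == resp["total"]
    )


# The response from step 8, copied as-is.
sample = json.loads(
    '{"took":8,"timed_out":false,"total":2,"updated":0,"created":2,"deleted":0,'
    '"batches":1,"version_conflicts":0,"noops":0,"retries":{"bulk":0,"search":0},'
    '"throttled_millis":0,"requests_per_second":-1.0,"throttled_until_millis":0,"failures":[]}'
)

print(reindex_ok(sample))
```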

9. Check the progress of the migration

# With so little data the reindex finishes almost immediately, so the task may not show up here

curl -X GET "http://`hostname -i`:9200/_tasks?detailed=true&actions=*reindex&human=true"
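
For a larger index, the reindex can be started with ?wait_for_completion=false. It then returns a task id, and GET _tasks/<task_id> reports a status block with total, created, updated and deleted counters. The sketch below estimates progress from such a response. The function name and the sample response are hypothetical and the numbers are invented for illustration, since my own task finished too fast to capture.

```python
def reindex_progress(task_response):
    """Fraction of documents handled so far, from a GET _tasks/<task_id> response."""
    status = task_response["task"]["status"]
    total = status["total"]
    if total == 0:
        # Nothing to copy: report done only if the task says it completed.
        return 1.0 if task_response.get("completed") else 0.0
    done = status["created"] + status["updated"] + status["deleted"]
    return done / total


# Hypothetical response, trimmed to the fields used here; all values are invented.
sample = {
    "completed": False,
    "task": {
        "action": "indices:data/write/reindex",
        "status": {"total": 1000, "updated": 0, "created": 250, "deleted": 0, "batches": 1},
    },
}

print(f"{reindex_progress(sample):.0%}")
```

The task is finished when the sum of the three counters reaches total.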

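10. Check the two indices in the cluster again

# Note: the listing below appears to have been captured from a different run: the uuid values differ from step 7 and new_index shows pri=1. With the settings from step 6, pri reads 2. The value to compare is docs.count, which is 2 for both indices.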

curl -X GET "http://`hostname -i`:9200/_cat/indices?v"
health status index            uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   .geoip_databases ib6tlhzjTf-MQBu-XGIVWg   1   0         33            0     31.1mb         31.1mb
green  open   new_index        aU3mztzXRXOSk9Q1oiP2RA   1   0          2            0      4.4kb          4.4kb
green  open   old_index        g24b-XDfQZ6BO5zdcIOM0A   1   0          2            0      4.4kb          4.4kb
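
Comparing docs.count by eye works for two documents. For real indices, a few lines of Python can parse the _cat/indices?v output and compare the counts. This is a sketch with hypothetical function names. It assumes every column has a value, which holds for the open indices listed here but not for closed ones.

```python
def parse_cat_indices(text):
    """Map index name -> row dict, using the header line of `_cat/indices?v` output."""
    lines = [ln for ln in text.strip().splitlines() if ln.strip()]
    header = lines[0].split()
    rows = (dict(zip(header, ln.split())) for ln in lines[1:])
    return {row["index"]: row for row in rows}


def docs_match(rows, source, dest):
    """True when both indices report the same docs.count."""
    return int(rows[source]["docs.count"]) == int(rows[dest]["docs.count"])


# The listing from step 10, copied as-is.
listing = """
health status index            uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   .geoip_databases ib6tlhzjTf-MQBu-XGIVWg   1   0         33            0     31.1mb         31.1mb
green  open   new_index        aU3mztzXRXOSk9Q1oiP2RA   1   0          2            0      4.4kb          4.4kb
green  open   old_index        g24b-XDfQZ6BO5zdcIOM0A   1   0          2            0      4.4kb          4.4kb
"""

rows = parse_cat_indices(listing)
print(docs_match(rows, "old_index", "new_index"))
```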

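Summary

Based on real production experience, reindex is best suited to migrating indices between two clusters. Using it inside a single cluster to split an index's shards can cause serious performance problems, so I do not recommend running reindex within a cluster. Reindex first queries the source, which consumes read I/O on the nodes holding the old index, and then writes to the destination, which consumes write I/O on the nodes holding the new index. If the reads and the writes land on the same node, the load is concentrated there, and I/O, memory, and CPU can each become a bottleneck for the cluster. With many or large reindex tasks, that can be a disaster for the ES cluster. The better approach is to migrate the index to a new ES cluster: the source cluster then only serves queries, so the impact on it is minimal, and a new cluster usually carries no business traffic at first, so the writes do not add much load.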
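
For the cross-cluster route recommended above, the reindex request is sent to the new cluster and names the old cluster under source.remote. Below is a sketch that builds such a body in Python. The host http://old-cluster.example:9200 and the helper name are made up. As far as I know, the new cluster also has to list the old cluster's host:port under reindex.remote.whitelist in elasticsearch.yml before it accepts the request.

```python
import json


def remote_reindex_body(remote_host, source_index, dest_index, batch_size=1000):
    """Build a _reindex body that pulls source_index from another cluster."""
    if not remote_host.startswith(("http://", "https://")):
        raise ValueError("remote host must include the scheme, e.g. http://host:9200")
    body = {
        # "size" is the number of documents fetched per batch.
        "source": {"remote": {"host": remote_host}, "index": source_index, "size": batch_size},
        "dest": {"index": dest_index},
    }
    return json.dumps(body, indent=2)


# Hypothetical host name; replace it with the address of the old cluster.
print(remote_reindex_body("http://old-cluster.example:9200", "old_index", "new_index"))
```

The printed JSON is posted to _reindex on the new cluster, in the same way as the request in step 8.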

https://mp.weixin.qq.com/s?__biz=MzA5MjkyNjU5MQ==&mid=2247484835&idx=1&sn=84ca8ce4c2c41c63ec9cf57fc609fb91&chksm=9064e2b3a7136ba5aa62f98024fac5b613b80d510b500b323a53a4c3ed0eed0d2fda5589fe3d#rd
