1. Delete my test indices: old_index and new_index
curl -X DELETE "http://`hostname -i`:9200/old_index"
curl -X DELETE "http://`hostname -i`:9200/new_index"
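Every command in this walkthrough rebuilds the same base URL from `hostname -i`. As a minimal sketch, the base URL could be produced by one helper instead. The function name `es_url` and the override variables `ES_HOST` and `ES_PORT` are assumptions made for this example, not Elasticsearch settings.

```shell
# Prints the base URL of the Elasticsearch node.
# ES_HOST and ES_PORT are hypothetical override variables for this sketch.
# When ES_HOST is unset, fall back to the first address printed by `hostname -i`.
es_url() {
  printf 'http://%s:%s' "${ES_HOST:-$(hostname -i | awk '{print $1}')}" "${ES_PORT:-9200}"
}

# Example (assumes a reachable node):
# curl -X GET "$(es_url)/_cat/indices?v"
```

Taking only the first address is a design choice for hosts where `hostname -i` prints several addresses; on such hosts the inline backtick form would produce an invalid URL.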
2. Check the indices in the cluster
curl -X GET "http://`hostname -i`:9200/_cat/indices?v"
health status index            uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   .geoip_databases ib6tlhzjTf-MQBu-XGIVWg   1   0         33            0     31.1mb         31.1mb
3. Create the test index: old_index
# Notes
# 1. I only have one node, so to keep testing simple number_of_replicas is set to 0
# 2. Assume the source index has 1 shard: number_of_shards is set to 1 for comparison later
curl -X PUT "http://`hostname -i`:9200/old_index" -H 'Content-Type: application/json' -d'
{
  "mappings": {
    "properties": {
      "name": { "type": "text" },
      "description": { "type": "text" },
      "publish_date": { "type": "date" }
    }
  },
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 0
  }
}'
# Response: the index was created successfully
{"acknowledged":true,"shards_acknowledged":true,"index":"old_index"}
4. Insert a few test documents into old_index
curl -X POST "http://`hostname -i`:9200/old_index/_bulk" -H 'Content-Type: application/x-ndjson' --data-binary '
{ "index": { "_index": "old_index", "_id": "1" } }
{ "name": "可乐", "description": "大数据SRE工程师", "publish_date": "1991-05-20" }
{ "index": { "_index": "old_index", "_id": "2" } }
{ "name": "炎长", "description": "DBA工程师", "publish_date": "1992-11-23" }
'
# Response
{
  "took": 6,
  "errors": false,
  "items": [{
    "index": {
      "_index": "old_index",
      "_type": "_doc",
      "_id": "1",
      "_version": 1,
      "result": "created",
      "_shards": {
        "total": 1,
        "successful": 1,
        "failed": 0
      },
      "_seq_no": 0,
      "_primary_term": 1,
      "status": 201
    }
  }, {
    "index": {
      "_index": "old_index",
      "_type": "_doc",
      "_id": "2",
      "_version": 1,
      "result": "created",
      "_shards": {
        "total": 1,
        "successful": 1,
        "failed": 0
      },
      "_seq_no": 1,
      "_primary_term": 1,
      "status": 201
    }
  }]
}
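A bulk request can return HTTP 200 even when individual items fail, so the top-level `errors` flag in the response is the field to check. The following is a minimal sketch of such a check; `bulk_ok` is a hypothetical helper name, and it uses a plain text match rather than a real JSON parser.

```shell
# Succeeds (exit status 0) only when the bulk response read from stdin
# reports "errors": false. bulk_ok is a hypothetical helper for this sketch.
bulk_ok() {
  grep -Eq '"errors"[[:space:]]*:[[:space:]]*false'
}

# Example (assumes a reachable node and a hypothetical file docs.ndjson):
# curl -s -X POST "http://`hostname -i`:9200/old_index/_bulk" \
#   -H 'Content-Type: application/x-ndjson' --data-binary @docs.ndjson \
#   | bulk_ok || echo "bulk request reported item-level errors" >&2
```

When the flag is true, the per-item `status` and `error` fields in `items` show which documents failed.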
5. Query the data in old_index
curl -X GET "http://`hostname -i`:9200/old_index/_search"
# Query result
{
  "took": 7,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 2,
      "relation": "eq"
    },
    "max_score": 1.0,
    "hits": [{
      "_index": "old_index",
      "_type": "_doc",
      "_id": "1",
      "_score": 1.0,
      "_source": {
        "name": "可乐",
        "description": "大数据SRE工程师",
        "publish_date": "1991-05-20"
      }
    }, {
      "_index": "old_index",
      "_type": "_doc",
      "_id": "2",
      "_score": 1.0,
      "_source": {
        "name": "炎长",
        "description": "DBA工程师",
        "publish_date": "1992-11-23"
      }
    }]
  }
}
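6. Create the target index: new_index
# Notes
# 1. The shard count is set to 2 this time, to simulate using reindex to split shards
# 2. Setting the target index's replicas to 0 is recommended: without replicas, writes to the target index are faster, so the reindex task finishes sooner than it would with replicas. After the reindex completes, the replica count can be set again as needed.
curl -X PUT "http://`hostname -i`:9200/new_index" -H 'Content-Type: application/json' -d'
{
  "mappings": {
    "properties": {
      "name": { "type": "text" },
      "description": { "type": "text" },
      "publish_date": { "type": "date" }
    }
  },
  "settings": {
    "number_of_shards": 2,
    "number_of_replicas": 0
  }
}'
# Response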
{"acknowledged":true,"shards_acknowledged":true,"index":"new_index"}
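The note above mentions setting the replica count again once the reindex has finished. A minimal sketch of that step uses the index `_settings` endpoint; `replica_settings_body` is a hypothetical helper name, and the replica count of 1 in the example is a placeholder, not a recommendation from the original article.

```shell
# Prints the settings body that changes the replica count of an index.
# replica_settings_body is a hypothetical helper; $1 is the desired replica count.
replica_settings_body() {
  printf '{"index":{"number_of_replicas":%d}}' "$1"
}

# Example, to be run after the reindex has finished (assumes a reachable node):
# curl -X PUT "http://`hostname -i`:9200/new_index/_settings" \
#   -H 'Content-Type: application/json' -d "$(replica_settings_body 1)"
```

On a single-node cluster like the one used here, a replica count above 0 leaves the replicas unassigned and the index health turns yellow, so this step mainly applies to multi-node clusters.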
7. Check the state of both indices
curl -X GET "http://`hostname -i`:9200/_cat/indices?v"
health status index            uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   .geoip_databases ib6tlhzjTf-MQBu-XGIVWg   1   0         33            0     31.1mb         31.1mb
green  open   new_index        GrJiGswYRqCibszGIVjZhg   2   0          0            0       454b           454b
green  open   old_index        8k4beb7ETpu6Ki-LpOu_EQ   1   0          2            0        4kb            4kb
8. Use reindex to migrate the data from the source index old_index to the target index new_index
curl -X POST "http://`hostname -i`:9200/_reindex" -H 'Content-Type: application/json' -d'
{
  "source": {
    "index": "old_index"
  },
  "dest": {
    "index": "new_index"
  }
}
'
# Response: both documents were created in new_index
{"took":8,"timed_out":false,"total":2,"updated":0,"created":2,"deleted":0,"batches":1,"version_conflicts":0,"noops":0,"retries":{"bulk":0,"search":0},"throttled_millis":0,"requests_per_second":-1.0,"throttled_until_millis":0,"failures":[]}
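The reindex response carries its own health indicators: `timed_out` and the `failures` array. As a rough sketch, a script could check both before moving on. `reindex_ok` is a hypothetical helper name, and the text match assumes the compact single-line response format shown above.

```shell
# Succeeds (exit status 0) only when the reindex response read from stdin
# has an empty "failures" array and "timed_out": false.
# reindex_ok is a hypothetical helper for this sketch.
reindex_ok() {
  # Read the whole response once so it can be matched twice.
  resp=$(cat)
  printf '%s\n' "$resp" | grep -Eq '"failures"[[:space:]]*:[[:space:]]*\[[[:space:]]*\]' &&
  printf '%s\n' "$resp" | grep -Eq '"timed_out"[[:space:]]*:[[:space:]]*false'
}

# Example (assumes a reachable node):
# curl -s -X POST "http://`hostname -i`:9200/_reindex" -H 'Content-Type: application/json' \
#   -d '{"source":{"index":"old_index"},"dest":{"index":"new_index"}}' \
#   | reindex_ok || echo "reindex reported failures or timed out" >&2
```

Comparing `total` with `created` plus `updated` in the same response is another quick sanity check.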
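9. Check the progress of the migration
# With so little data the task finishes almost immediately, so the reindex task may no longer appear in the task list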
curl -X GET "http://`hostname -i`:9200/_tasks?detailed=true&actions=*reindex&human=true"
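For larger indices, a reindex can be started in the background with `wait_for_completion=false`. The call then returns a task ID of the form `{"task":"<node_id>:<task_number>"}`, which can be polled through the tasks API. The sketch below extracts that ID; `task_id_from` is a hypothetical helper name.

```shell
# Prints the task ID from a response such as {"task":"<node_id>:<task_number>"}
# read from stdin. task_id_from is a hypothetical helper for this sketch.
task_id_from() {
  sed -n 's/.*"task"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Example (assumes a reachable node):
# task=$(curl -s -X POST "http://`hostname -i`:9200/_reindex?wait_for_completion=false" \
#   -H 'Content-Type: application/json' \
#   -d '{"source":{"index":"old_index"},"dest":{"index":"new_index"}}' | task_id_from)
# curl -X GET "http://`hostname -i`:9200/_tasks/$task?human=true"
```

Polling a specific task ID avoids the problem noted above, where a short-lived task has already left the list of running tasks.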
10. Check both indices in the cluster again
curl -X GET "http://`hostname -i`:9200/_cat/indices?v"
# Note: this output appears to come from a different run than step 7 (both UUIDs differ, and new_index is listed with 1 primary shard). With the settings from step 6, new_index should show pri = 2.
health status index            uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   .geoip_databases ib6tlhzjTf-MQBu-XGIVWg   1   0         33            0     31.1mb         31.1mb
green  open   new_index        aU3mztzXRXOSk9Q1oiP2RA   1   0          2            0      4.4kb          4.4kb
green  open   old_index        g24b-XDfQZ6BO5zdcIOM0A   1   0          2            0      4.4kb          4.4kb
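Besides reading docs.count from the table, the two indices can be compared directly with the `_count` API. This is a minimal sketch; `count_from` is a hypothetical helper name, and it assumes the usual `_count` response shape `{"count":N,"_shards":{...}}`.

```shell
# Prints the document count from a _count response read from stdin,
# for example {"count":2,"_shards":{...}}.
# count_from is a hypothetical helper for this sketch.
count_from() {
  sed -n 's/.*"count"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

# Example (assumes a reachable node):
# src=$(curl -s "http://`hostname -i`:9200/old_index/_count" | count_from)
# dst=$(curl -s "http://`hostname -i`:9200/new_index/_count" | count_from)
# if [ "$src" = "$dst" ]; then echo "counts match: $src"; else echo "mismatch: $src vs $dst" >&2; fi
```

Newly written documents only become visible to `_count` after the target index refreshes, so a count taken immediately after the reindex may briefly lag behind.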
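Summary
In real production scenarios, reindex is only really suited to migrating indices between two clusters. Using it inside one cluster to split an index's shards can cause serious performance problems, so reindex within a single cluster is not recommended. Reindex first queries the source index, which consumes read I/O on the nodes holding the old index, and then writes to the target index, which consumes write I/O on the nodes holding the new index. If the reads and writes land on the same node, the load is concentrated there, and I/O, memory, and CPU can each become the cluster's bottleneck. If there are many large reindex tasks, the result can be a disaster for the ES cluster. The better approach is to migrate the index to a new ES cluster: the source cluster then only serves queries, which keeps the impact small, and a new cluster usually carries no business traffic at first, so the writes add little burden.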
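For the cross-cluster approach recommended above, the reindex request is sent to the new cluster and names the old cluster under `source.remote`. The sketch below builds such a request body. `remote_reindex_body` is a hypothetical helper name and `old-es.example.com` is a placeholder host; neither comes from the original walkthrough.

```shell
# Prints a _reindex body that pulls from a remote cluster.
# Usage: remote_reindex_body REMOTE_URL SOURCE_INDEX DEST_INDEX
# remote_reindex_body is a hypothetical helper for this sketch.
remote_reindex_body() {
  printf '{"source":{"remote":{"host":"%s"},"index":"%s"},"dest":{"index":"%s"}}' "$1" "$2" "$3"
}

# Example, run against the NEW cluster (old-es.example.com is a placeholder):
# curl -X POST "http://`hostname -i`:9200/_reindex" -H 'Content-Type: application/json' \
#   -d "$(remote_reindex_body http://old-es.example.com:9200 old_index new_index)"
```

The new cluster must explicitly allow the remote host through the `reindex.remote.whitelist` setting in elasticsearch.yml (for example `reindex.remote.whitelist: "old-es.example.com:9200"`), and the `host` value needs a scheme and a port.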
https://mp.weixin.qq.com/s?__biz=MzA5MjkyNjU5MQ==&mid=2247484835&idx=1&sn=84ca8ce4c2c41c63ec9cf57fc609fb91&chksm=9064e2b3a7136ba5aa62f98024fac5b613b80d510b500b323a53a4c3ed0eed0d2fda5589fe3d#rd