Migrating Elasticsearch data with reindex: importing data from cluster A into cluster B

いNeil

Modified on 2023-04-26 14:14:48


First published on 2023-04-26 11:13:36

Original article: https://zhuanlan.zhihu.com/p/602244582



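Option 1: reindex

Goal: import the data from cluster A into cluster B.

reindex is an API provided by Elasticsearch. Run on the destination cluster, it pulls data from a source ES cluster into the current ES cluster, which makes it usable for data migration. Because of how Tencent Cloud ES is implemented, its current version does not support the reindex operation. The steps below briefly show how to use the reindex API.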

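Step 1: modify the local configuration (on cluster B, the cluster that receives the data)

Edit elasticsearch.yml; the service must be restarted after the change.

Add: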

reindex.remote.whitelist: ["up01:9200"]
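A minimal sketch for adding this setting idempotently from a shell, assuming the path to elasticsearch.yml is passed as the first argument (its location depends on how Elasticsearch was installed). The function name ensure_remote_whitelist is hypothetical; it only edits the file, and the restart is still a separate step.

```shell
#!/bin/bash
# Hypothetical helper: add reindex.remote.whitelist to elasticsearch.yml
# unless the key is already present. It only edits the file; the node
# still has to be restarted for the setting to take effect.
ensure_remote_whitelist() {
    local yml="$1"    # path to elasticsearch.yml (assumed location)
    local hosts="$2"  # quoted host:port list, e.g. '"up01:9200"'
    if grep -q '^reindex\.remote\.whitelist:' "$yml"; then
        echo "reindex.remote.whitelist already set in $yml, leaving it unchanged"
        return 0
    fi
    # Make sure the appended setting starts on its own line.
    if [ -s "$yml" ] && [ -n "$(tail -c 1 "$yml")" ]; then
        echo >> "$yml"
    fi
    printf 'reindex.remote.whitelist: [%s]\n' "$hosts" >> "$yml"
    echo "added reindex.remote.whitelist to $yml; restart the node to apply it"
}
```

For example, `ensure_remote_whitelist /etc/elasticsearch/elasticsearch.yml '"up01:9200"'` followed by `sudo systemctl restart elasticsearch`, assuming a systemd-managed package install. An existing whitelist line is deliberately left alone rather than merged, so a hand-edited host list is never overwritten.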

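Step 2: run the reindex

Call the reindex API. The request below reads the index tfec_tbl_goods from the source cluster at http://up01:9200 in scroll batches of 100 documents and writes it into the index of the same name on the current cluster (cluster B).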

POST _reindex
{
  "source": {
    "remote": {
      "host": "http://up01:9200/"
    },
    "index": "tfec_tbl_goods",
    "size": 100
  },
  "dest": {
    "index": "tfec_tbl_goods"
  }
}
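The same request can also be sent from a shell with curl instead of the Kibana console. A minimal sketch, assuming cluster B is reachable at http://localhost:9200 (overridable through a DEST_URL variable) without authentication; add `-u user:password` to the curl call if security is enabled. The helper names build_remote_reindex_body and run_remote_reindex are hypothetical.

```shell
#!/bin/bash
# Destination cluster (cluster B); assumed address, adjust as needed.
DEST_URL="${DEST_URL:-http://localhost:9200}"

# Print the _reindex body that pulls INDEX from the remote cluster at HOST
# in scroll batches of SIZE documents (default 100, as in the request above).
build_remote_reindex_body() {
    local host="$1" index="$2" size="${3:-100}"
    printf '{"source":{"remote":{"host":"%s"},"index":"%s","size":%d},"dest":{"index":"%s"}}\n' \
        "$host" "$index" "$size" "$index"
}

# POST the body to cluster B; the reindex runs on B and pulls from A.
run_remote_reindex() {
    build_remote_reindex_body "$@" |
        curl -s -X POST "$DEST_URL/_reindex" \
             -H 'Content-Type: application/json' --data-binary @-
}
```

Usage: `run_remote_reindex http://up01:9200 tfec_tbl_goods 100`. Building the body in a separate function keeps it easy to print and inspect before anything is sent.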

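Step 3: verify the data

Check that the document count (docs.count) matches the source index; run the same request on cluster A and compare: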

	GET _cat/indices/tfec_tbl_goods?v
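To compare the two clusters without reading the _cat table by eye, the `_count` API of each cluster can be queried and the numbers compared. Note that docs.count in `_cat/indices` is a Lucene-level figure that also includes nested documents, so `_count` is the more direct comparison. A minimal sketch, assuming both clusters are reachable without authentication at the hypothetical SRC_URL and DEST_URL addresses; the response is parsed with sed so that jq is not required.

```shell
#!/bin/bash
SRC_URL="${SRC_URL:-http://up01:9200}"          # cluster A (assumed)
DEST_URL="${DEST_URL:-http://localhost:9200}"   # cluster B (assumed)

# Extract the "count" field from a _count response such as
# {"count":42,"_shards":{...}} read on stdin.
parse_count() {
    sed -n 's/.*"count" *: *\([0-9][0-9]*\).*/\1/p'
}

# Print the document count of INDEX ($2) on the cluster at URL ($1).
doc_count() {
    curl -s "$1/$2/_count" | parse_count
}

# Compare the document count of INDEX on both clusters.
compare_index() {
    local index="$1" src dest
    src=$(doc_count "$SRC_URL" "$index")
    dest=$(doc_count "$DEST_URL" "$index")
    if [ -n "$src" ] && [ "$src" = "$dest" ]; then
        echo "$index: OK ($src docs)"
    else
        echo "$index: MISMATCH (source=${src:-?}, dest=${dest:-?})"
        return 1
    fi
}
```

Usage: `compare_index tfec_tbl_goods`. The non-zero return on a mismatch makes it easy to stop a larger migration script at the first index that does not line up.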

Supplement:

reindex usage examples collected from the official Elasticsearch documentation and from other experienced users:

POST _reindex
{
  "source": {
    "index": "old_index",
    "size": 5000
  },
  "dest": {
    "index": "new_index",
    "version_type": "internal",
    "routing": "=cat"
  }
}

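A fully annotated example (the // comments only explain each parameter; strip them before sending the body as plain JSON):

POST _reindex
{
  "conflicts": "proceed",                        // continue on version conflicts (the default is to abort)
  "size": 1000,                                  // maximum number of documents to reindex (renamed max_docs in 7.3+)
  "source": {
    "index": "twitter",                          // can also be a list, e.g. ["twitter", "blog"]
    "type": "tweet",                             // optional, or ["type1", "type2"]; restricts documents by mapping type (deprecated in 7.x, removed in 8.x)
    "query": { "term": { "user": "kimchy" } },   // a query restricts which documents are copied
    "sort": { "date": "desc" },                  // sort order
    "_source": ["user", "tweet"],                // copy only these fields
    "size": 100                                  // scroll batch size (default 1000)
  },
  "dest": {
    "index": "new_twitter",
    "op_type": "create",                         // only create documents missing from the destination; existing ones count as version conflicts
    "version_type": "external",                  // unset or internal: overwrite documents with the same id; external: keep the source version and update only documents whose destination version is older
    "routing": "=cat",                           // set the routing of every document to "cat"
    "pipeline": "some_ingest_pipeline"           // run documents through an ingest pipeline (ingest node feature)
  },
  "script": {                                    // script applied to each document
    "source": "if (ctx._source.foo == 'bar') {ctx._version++; ctx._source.remove('foo')}",
    "lang": "painless"
  }
}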
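With "conflicts": "proceed" and "op_type": "create", documents that already exist in the destination are not overwritten; they are counted in the version_conflicts field of the response instead of aborting the request. A minimal sketch for summarising a synchronous reindex response from the shell, assuming a compact (single-line) response with the top-level fields total, created, updated, version_conflicts and failures. The function names are hypothetical, and the sed-based parsing is a shortcut that only suits this flat response, not arbitrary JSON.

```shell
#!/bin/bash
# Print the value of a top-level numeric field (e.g. "created") from a
# single-line JSON reindex response read on stdin. Naive: assumes the
# field name appears only once in the response.
json_number() {
    sed -n "s/.*\"$1\" *: *\([0-9][0-9]*\).*/\1/p"
}

# Summarise a _reindex response: documents seen, created, updated and
# skipped because of version conflicts. Returns 1 if "failures" is non-empty.
summarize_reindex() {
    local resp="$1" f
    for f in total created updated version_conflicts; do
        printf '%s=%s ' "$f" "$(printf '%s\n' "$resp" | json_number "$f")"
    done
    if printf '%s\n' "$resp" | grep -Eq '"failures" *: *\[ *\]'; then
        echo "failures=none"
    else
        echo "failures=PRESENT"
        return 1
    fi
}
```

Usage: `summarize_reindex "$(curl -s -X POST ... /_reindex ...)"`. On a re-run of the same migration, a high version_conflicts value with zero created is the expected result, not an error.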

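Remote reindex with authentication and timeouts. The "routing": "=cat" line from the examples above is left out here: it would give every migrated document the routing value cat instead of keeping its original routing. "size" is the scroll batch size.

POST _reindex
{
  "conflicts": "proceed",
  "source": {
    "remote": {
      "host": "http://xxx.xxx.com:80",
      "username": "xxx",
      "password": "xxxxxxxxx",
      "socket_timeout": "1m",
      "connect_timeout": "30s"
    },
    "index": "case_cause",
    "size": 5000
  },
  "dest": {
    "index": "case_cause",
    "op_type": "create"
  }
}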
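For large remote indices the synchronous request can take longer than the HTTP client is willing to wait. The reindex API also accepts wait_for_completion=false, in which case it returns a task id right away ({"task":"<node_id>:<task_number>"}) that can be polled with GET _tasks/<task_id> until "completed" is true. A minimal sketch of that flow, assuming cluster B at the hypothetical DEST_URL without authentication and the request body saved in a file; the helper names are hypothetical.

```shell
#!/bin/bash
DEST_URL="${DEST_URL:-http://localhost:9200}"   # cluster B (assumed)

# Extract the task id from a response like {"task":"nodeId:12345"}.
extract_task_id() {
    sed -n 's/.*"task" *: *"\([^"]*\)".*/\1/p'
}

# Start the reindex as a background task; $1 is a file holding the JSON body.
# Prints the task id.
start_reindex_task() {
    curl -s -X POST "$DEST_URL/_reindex?wait_for_completion=false" \
         -H 'Content-Type: application/json' --data-binary @"$1" |
        extract_task_id
}

# Poll the task every 30 seconds until the _tasks API reports completed:true.
# Note: this loops forever if the cluster stays unreachable.
wait_for_task() {
    local task_id="$1"
    until curl -s "$DEST_URL/_tasks/$task_id" | grep -Eq '"completed" *: *true'; do
        sleep 30
    done
    echo "task $task_id completed"
}
```

Usage: `task=$(start_reindex_task case_cause.json) && wait_for_task "$task"`. The finished task's response section holds the same counters as a synchronous reindex response.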

A batch script I wrote myself

The script runs on the local ES cluster and pulls data from the remote ES cluster, one index at a time.

#!/bin/bash
#author wxd
#
PARAMETER=$1

ES_URL="http://172.16.248.21:9200"
ES_USER="xxx"
ES_PASSWORD="xxxxxxxxx"

# Indices to copy from the remote cluster (same name on both sides)
INDICES=(
    parse_task case_record parse_check_record adjustment code_category
    law_question red_book law_cloud_category department predict_department
    predict_judge cause_predict law_macroforecast case_num_data court_data
    indicator_data standard_value sub_indicator cause_weight cause_weight_backup
    events_weight weight_rate case_edit qid_cache law_case_info_business
    law_search_v3 spider_case code inner_case law_doc_template law law_firm
    institutions predict_judge_data_v2 predict_judge_data
)

usage() {
    echo -e "\033[46;31mUsage\033[0m: Please use \e[0;35m$0 foo \e[0m"
    exit 1
}

do_reindex() {
    for new_index in "${INDICES[@]}"; do
        echo -e "\n$(date '+%Y-%m-%d %H:%M:%S'), index \e[0;35m$new_index\e[0m: reindex started"
        # Runs on the local cluster (ES_URL) and pulls the index from the remote host.
        # Routing keeps its default ("keep"); "=cat" would give every document the routing value "cat".
        curl -X POST -H "Content-Type: application/json" -s -u "$ES_USER:$ES_PASSWORD" "$ES_URL/_reindex" -d '
        {
          "conflicts": "proceed",
          "source": {
            "remote": {
              "host": "http://xxx.xxx.com:80",
              "username": "xxxx",
              "password": "xxxxxxxxxxxxx",
              "socket_timeout": "1m",
              "connect_timeout": "30s"
            },
            "index": "'"$new_index"'",
            "size": 5000
          },
          "dest": {
            "index": "'"$new_index"'",
            "op_type": "create"
          }
        }'
        echo -e "\n$(date '+%Y-%m-%d %H:%M:%S'), index \e[0;35m$new_index\e[0m: reindex finished"
    done
}

if [ -n "$PARAMETER" ]; then
    case "$PARAMETER" in
        foo)
            echo -e "\033[46;31m Do reindex!!!\033[0m"
            do_reindex
            echo -e "-----------------------------------------------\033[46;31mAll reindex done!!!\033[0m-----------------------------------------------"
            ;;
        *)
            usage
            ;;
    esac
else
    echo -e "\033[46;31merror\033[0m: please input parameter"
    exit 1
fi
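Instead of maintaining the index list by hand, it can be read from the source cluster with GET _cat/indices?h=index, which prints one index name per line. A minimal sketch, assuming the source address and credentials come from hypothetical SRC_URL, SRC_USER and SRC_PASSWORD variables. Names starting with a dot (system and hidden indices such as .kibana) are skipped, on the assumption that only user indices need migrating.

```shell
#!/bin/bash
SRC_URL="${SRC_URL:-http://xxx.xxx.com:80}"   # source cluster (placeholder)
SRC_USER="${SRC_USER:-}"                      # optional basic-auth user
SRC_PASSWORD="${SRC_PASSWORD:-}"

# Keep only user index names from stdin: drop blank lines and names
# that start with a dot, then sort the result.
filter_user_indices() {
    awk 'NF && $1 !~ /^\./ { print $1 }' | sort
}

# List the user indices on the source cluster, one per line.
list_source_indices() {
    local auth=()
    if [ -n "$SRC_USER" ]; then
        auth=(-u "$SRC_USER:$SRC_PASSWORD")
    fi
    curl -s "${auth[@]}" "$SRC_URL/_cat/indices?h=index" | filter_user_indices
}
```

In the batch script, `mapfile -t INDICES < <(list_source_indices)` could then stand in for the hard-coded array. Reviewing the printed list once before the first run is still worthwhile, since it may include test or temporary indices.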

Copy 10 randomly selected documents from the source index into a new index:

POST _reindex
{
  "size": 10,
  "source": {
    "index": "twitter",
    "query": {
      "function_score" : {
        "query" : { "match_all": {} },
        "random_score" : {}
      }
    },
    "sort": "_score"    
  },
  "dest": {
    "index": "random_twitter"
  }
}