Two Ways to Migrate ES Data Across Clusters

一只行走鸟

Last modified 2023-05-22 14:58:50

Tags: elasticsearch

First published 2020-05-11 16:19:31

Article link: https://blog.csdn.net/weixin_40368523/article/details/106056735

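Background: an ES cluster is moving to the cloud, so its data has to be migrated from on-premises to the cloud. Two migration approaches follow, each suited to clusters of a different scale.
1. Elasticdump
elasticsearch-dump is an open-source ES data migration tool; GitHub: https://github.com/taskrabbit/elasticsearch-dump. It is suited to ES clusters that hold a small amount of data.
Install npm, then install elasticdump: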

npm install elasticdump -g

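The migration below runs in three steps. The first command migrates the index settings; migrating the mapping or data directly would lose the source cluster's index configuration, such as the number of shards and replicas. Alternatively, create the indices on the target cluster first and then sync only the mapping and data.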

# -*- coding: utf-8 -*-
import subprocess

from elasticsearch import Elasticsearch

SOURCE = "http://172.6.0.39:9200"  # cluster to migrate from
TARGET = "http://172.6.0.20:9200"  # cluster to migrate to

# Client used only to list the indices; it should point at the source cluster
es = Elasticsearch("http://192.16.14.2:9200", timeout=90)

# Iterate over every index in the cluster
for index in es.indices.get(index="*"):
    # Skip system indices (names starting with ".")
    if index.startswith("."):
        continue
    # Migrate the settings first, then the mapping, then the data
    for dump_type in ("settings", "mapping", "data"):
        subprocess.run(
            ["elasticdump",
             "--input=%s/%s" % (SOURCE, index),
             "--output=%s/%s" % (TARGET, index),
             "--type=%s" % dump_type],
            check=False,  # like os.system, do not abort on a non-zero exit code
        )
        print("%s: %s migration finished" % (index, dump_type))
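The alternative mentioned earlier, creating the index on the target cluster before syncing the mapping and data, could be sketched as follows. This is a minimal sketch, not part of the original script: `build_create_body` and `READ_ONLY_KEYS` are hypothetical names, and the sample response is made up to resemble the output of `GET /<index>/_settings`. It only builds the request body; sending it to the target cluster is left out.

```python
import json

# Index-level settings that Elasticsearch generates itself; they cannot be
# set explicitly, so they are dropped before the index is re-created.
READ_ONLY_KEYS = {"uuid", "creation_date", "provided_name", "version"}


def build_create_body(settings_response, index):
    """Turn a GET /<index>/_settings response into a create-index body."""
    index_settings = settings_response[index]["settings"]["index"]
    kept = {k: v for k, v in index_settings.items() if k not in READ_ONLY_KEYS}
    return {"settings": {"index": kept}}


if __name__ == "__main__":
    # Hypothetical response for an index called "my-index"
    sample = {
        "my-index": {
            "settings": {
                "index": {
                    "number_of_shards": "5",
                    "number_of_replicas": "1",
                    "uuid": "abc123",
                    "creation_date": "1589185171000",
                    "provided_name": "my-index",
                    "version": {"created": "7060299"},
                }
            }
        }
    }
    print(json.dumps(build_create_body(sample, "my-index"), indent=2))
```

The resulting body keeps the shard and replica counts, which is the configuration the settings step is meant to preserve.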

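2. reindex
At its core, reindex copies data across indices and across clusters; it is suited to migrating large volumes of cluster data.
Why it can be slow, and how to optimize it

The batch size may be too small. Tune it together with the heap size and thread pool sizes.
Under the hood reindex is built on scroll, so scroll's parallel (sliced) mode can raise throughput. Note that slicing is not supported when reindexing from a remote cluster.
The heart of a cross-index or cross-cluster migration is writing data, so write-side tuning also improves throughput.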
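As one possible reading of the write-side tuning point, a common approach is to disable refresh and replicas on the destination index while the data is loaded, then restore them afterwards. The article does not say which settings were tuned, so the values below are an assumption; `bulk_load_settings`, `restore_settings`, and `put_settings` are hypothetical helper names. `put_settings` needs a reachable cluster and is not called here.

```python
import json
import urllib.request


def bulk_load_settings():
    """Settings to apply to the destination index before a large reindex."""
    # "-1" disables periodic refresh; 0 replicas avoids writing every doc twice.
    return {"index": {"refresh_interval": "-1", "number_of_replicas": 0}}


def restore_settings(refresh_interval="1s", replicas=1):
    """Settings to put back once the reindex has finished (assumed values)."""
    return {"index": {"refresh_interval": refresh_interval,
                      "number_of_replicas": replicas}}


def put_settings(base_url, index, body):
    """Send PUT <base_url>/<index>/_settings with the given JSON body."""
    request = urllib.request.Request(
        "%s/%s/_settings" % (base_url, index),
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

The restore values should match whatever the index used in the source cluster rather than the defaults assumed in this sketch.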
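reindex migration parameters:

source: information about the remote Elasticsearch
host: address (IP and port) of the remote ES
socket_timeout: socket read timeout
connect_timeout: connection timeout
index: name of the source index
size: batch size; tuning it can improve performance
query: selects the documents to migrate; omit it to migrate everything
dest.index: name of the destination index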
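To show how the parameters above fit together, here is a minimal sketch that assembles a `_reindex` request body. `build_reindex_body` is a hypothetical helper, and the default timeouts and batch size are illustrative assumptions rather than values from the article.

```python
import json


def build_reindex_body(remote_host, index, dest_index=None, size=1000,
                       query=None, socket_timeout="1m", connect_timeout="10s"):
    """Assemble the JSON body for POST /_reindex from a remote cluster."""
    source = {
        "remote": {
            "host": remote_host,              # e.g. "http://192.168.1.4:9200"
            "socket_timeout": socket_timeout,
            "connect_timeout": connect_timeout,
        },
        "index": index,
        "size": size,                         # documents fetched per batch
    }
    if query is not None:
        source["query"] = query               # omitted means migrate everything
    return {"source": source, "dest": {"index": dest_index or index}}


if __name__ == "__main__":
    body = build_reindex_body("http://192.168.1.4:9200", "my-index", size=5000)
    print(json.dumps(body, indent=2))
```

The body is sent to the destination cluster, which must also list the remote host in its `reindex.remote.whitelist` setting before it will accept the request.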

Migration script

#!/bin/bash

## For the case where the local (source) cluster has no username/password
ES_HOST=(192.168.1.1 192.168.1.2 192.168.1.3)      ## ES nodes to write to
OLD_ES_HOST=(192.168.1.4 192.168.1.5 192.168.1.6)  ## ES nodes of the cluster being migrated
ES_USER="user"
ES_PASSWORD="pwd"
reindex_file="/root/remote_reindex.file"


### Get the list of ES indices
#index_list=`curl -u OLD_ES_USER:OLD_ES_PASSWORD -XGET "$OLD_ES_HOST/_cat/indices?v"|awk -F' ' '{if (NR>1){print $3}}' |egrep -v "^\."`
### List to iterate over, separated by spaces
#index_list=""

## Build the index list once, skipping closed and system (dot-prefixed) indices
if [ ! -f "$reindex_file" ] || [ ! -s "$reindex_file" ];then
    curl -sXGET "http://${OLD_ES_HOST[0]}:9200/_cat/indices" | grep -vw close | awk '{print $3}' | grep -Ev "^\." >> "$reindex_file"
fi



reindex() {
  local es_host="$1" old_es_host="$2"   # target node, source node
  for index in $(grep -v reindex "$reindex_file");do
      # Skip indices another worker has already claimed (renamed to <index>_reindex_old)
      if grep -Fxq "${index}_reindex_old" "$reindex_file";then
          continue
      fi
      # Skip indices whose document counts already match on both clusters
      new_count=$(curl -su "$ES_USER:$ES_PASSWORD" "http://$es_host:9200/$index/_count" | jq -r '.count // 0')
      old_count=$(curl -s "http://${old_es_host}:9200/$index/_count" | jq -r '.count // 0')
      if [ "$new_count" -eq "$old_count" ];then
          echo "Index: $index, doc counts match, new: $new_count, old: $old_count" >> count.log
          continue
      fi
      # Claim the index by renaming its exact line, so other index names are untouched
      sed -i "s/^${index}\$/${index}_reindex_old/" "$reindex_file"
      echo -e "$(date '+%Y-%m-%d %H:%M:%S'), index $index, reindex started" >> "reindex-${es_host}.log"
      # Note: "routing": "=cat" sets the routing value of every copied document to the
      # literal "cat"; remove that line to keep each document's original routing (the default).
      # "size" is the number of documents fetched from the source per batch.
      curl -su "$ES_USER:$ES_PASSWORD" -XPOST "http://$es_host:9200/_reindex" -H 'Content-Type: application/json' -d "{
             \"conflicts\": \"proceed\",
             \"source\": {
               \"remote\": {
                 \"host\": \"http://${old_es_host}:9200\"
               },
               \"index\": \"${index}\",
               \"size\": 12000
             },
             \"dest\": {
               \"index\": \"${index}\",
               \"op_type\": \"create\",
               \"routing\": \"=cat\"
             }
      }"
      echo -e "$(date '+%Y-%m-%d %H:%M:%S'), index $index, reindex finished" >> "reindex-${es_host}.log"
      # Progress = claimed indices / all indices in the list
      done_count=$(grep -c "reindex_old" "$reindex_file")
      all_count=$(wc -l < "$reindex_file")
      proportion=$(awk -v d="$done_count" -v a="$all_count" 'BEGIN {printf "%.2f", d / a * 100}')
      echo "Data migration progress: ${proportion}%" >> "reindex-${es_host}.log"
  done
}

cluster_migration() {
    array_length=${#ES_HOST[@]}
    # Start one background worker per target/source node pair
    for ((i=0; i<array_length; i++)); do
        es_host=${ES_HOST[$i]}
        old_es_host=${OLD_ES_HOST[$i]}
        sleep 1
        reindex "$es_host" "$old_es_host" >> nohup_reindex.log &
    done
}


main() {
     if [[ "$1" == "start" ]];then
         cluster_migration
     elif [[ "$1" == "stop" ]];then
         # Kill the running workers of this script, excluding this "stop" invocation itself
         ps -ef | grep "$0" | grep reindex | grep -v grep | awk -v self="$$" '$2 != self {print $2}' | xargs -r kill -9
     fi
}

main "$@"
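The script's two checks, comparing document counts and reporting progress, can also be expressed as a small sketch for verifying a finished migration. `find_mismatches` and `progress_percent` are hypothetical helpers, and the count dictionaries are assumed to have been collected beforehand from each cluster's `_count` endpoint.

```python
def find_mismatches(old_counts, new_counts):
    """Return {index: (old, new)} for indices whose doc counts differ.

    An index missing from new_counts is reported with new = None.
    """
    return {
        index: (old, new_counts.get(index))
        for index, old in old_counts.items()
        if new_counts.get(index) != old
    }


def progress_percent(done, total):
    """Share of indices already processed, as a percentage."""
    return round(done / total * 100, 2) if total else 0.0


if __name__ == "__main__":
    # Made-up counts, for illustration only
    old = {"orders": 1200, "users": 300}
    new = {"orders": 1200, "users": 280}
    print(find_mismatches(old, new))
    print(progress_percent(1, 2))
```

Any index reported by `find_mismatches` is a candidate for another reindex pass.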
