Elasticsearch operations: using reroute to clear lost shards from indices

tanruixing

Article link: https://blog.csdn.net/tanruixing/article/details/88316383

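Background: Consider a very large logging cluster of 25 machines, each running 2 ES instances, for 50 nodes in total. To keep data balanced across the cluster, and because replicas of this much log data would consume too many resources, number_of_shards is set to 50 and "number_of_replicas" : "0". Logs are written in large volumes on an hourly basis. When a machine dies and its data cannot be recovered, every index loses the shards that were stored on that machine, and the cluster goes red.

The method below brings the cluster back out of the red state. It has one limitation: it assumes every index lost the same shard numbers. A reader might object that with one week of retention there are 24*7=168 indices, and their shards need not all be placed the same way. But anyone who operates ES knows that it allocates the shards of a new index with the distribution of earlier indices in mind. If the earlier indices were spread evenly, a new index tends to be laid out the same way as the current ones, so it is quite likely that all 168 indices lost the same shards. The procedure takes advantage of this to keep the handling simple.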
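Before relying on the "same shards in every index" assumption, it may be worth checking it. The following is a minimal sketch, not part of the original post: `unassigned_shard_counts` is a hypothetical helper name, and it assumes the input comes from `_cat/shards?h=index,shard,prirep,state`, so the columns are index, shard number, primary/replica flag and state.

```shell
#!/bin/sh
# Hypothetical helper (not from the original post).
# Reads `_cat/shards?h=index,shard,prirep,state` output on stdin and prints,
# for each unassigned shard number, how many indices are missing it.
unassigned_shard_counts() {
  awk '$4 == "UNASSIGNED" { count[$2]++ }
       END { for (s in count) print s, count[s] }' | sort -n
}

# Possible usage (host and port are placeholders):
#   curl -s 'http://your_es_ip:your_es_port/_cat/shards?h=index,shard,prirep,state' \
#     | unassigned_shard_counts
```

If every shard number in the output has a count equal to the number of red indices, all indices lost the same shards and the simplified procedure applies. Otherwise the per-shard approach is the safer choice.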

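Solution:

First, add a new machine to the cluster, and temporarily configure the cluster not to relocate shards.
Then use the following command to iterate over every index whose status is red and allocate its lost shards as empty primaries on the new machine. The command calls reroute.sh, whose contents follow it. This example assumes the lost shards are 26 and 27.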

curl -XGET http://your_es_ip:your_es_port/_cat/indices | grep red | awk '{print $3}' | awk '{system("sh reroute.sh "$1)}'


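# reroute.sh: allocate empty primaries for the lost shards of one index
index=$1
# Elasticsearch 6.0 and later reject a request body sent without a Content-Type header
curl -XPOST -H 'Content-Type: application/json' 'http://your_es_ip:your_es_port/_cluster/reroute?pretty' -d '{
  "commands":[
    {
      "allocate_empty_primary":{
        "index": "'${index}'",
        "shard": 26,
        "node": "new_node_name1",
        "accept_data_loss": true
      }
    },
    {
      "allocate_empty_primary":{
        "index": "'${index}'",
        "shard": 27,
        "node": "new_node_name2",
        "accept_data_loss": true
      }
    }
  ]
}'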
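The post does not say which setting was used in the first step to hold off shard relocation while the new node joins. One option, assumed here rather than taken from the source, is to set `cluster.routing.rebalance.enable` to `none` through the cluster settings API and restore it to `all` afterwards. The function names and the `ES_URL` variable are hypothetical.

```shell
#!/bin/sh
# Sketch under assumptions: the original post does not name the setting it used.

# Prints the JSON body for a transient cluster-settings update.
# $1 is the rebalance mode: all, primaries, replicas or none.
rebalance_body() {
  printf '{"transient":{"cluster.routing.rebalance.enable":"%s"}}\n' "$1"
}

# Sends the update to the cluster. ES_URL is a placeholder for the cluster address.
set_rebalance() {
  curl -s -XPUT -H 'Content-Type: application/json' \
    "${ES_URL:-http://your_es_ip:your_es_port}/_cluster/settings" \
    -d "$(rebalance_body "$1")"
}

# Possible usage:
#   set_rebalance none   # before adding the node and running reroute
#   set_rebalance all    # once the cluster is green again
```

A transient setting is used so that the change does not outlive a full cluster restart. Whether rebalancing alone covers what the author meant by "relocate" is a judgment call.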

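If shards are not distributed uniformly across the indices, use _cat/shards to find the unassigned shards of each index. The following command and its version of reroute.sh can serve as a reference.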

 curl -XGET http://127.0.0.1:9200/_cat/shards | grep UNASSIGNED | awk '{print $1,$2}' | awk '{system("sh reroute.sh " $0)}'

index=$1
shard=$2

curl -XPOST -H 'Content-Type: application/json' 'http://127.0.0.1:9201/_cluster/reroute?pretty' -d '{
  "commands": [
    {
      "allocate_empty_primary": {
        "index": "'${index}'",
        "shard": '${shard}',
        "node": "new_node_name",
        "accept_data_loss": true
      }
    }
  ]
}'
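The per-shard loop above issues one reroute request per unassigned shard. As an alternative sketch, not the author's method, the commands can be collected into a single request body and tried first with the reroute API's `dry_run` parameter. `build_reroute_body` is a hypothetical helper, and it assumes the index names contain no characters that need JSON escaping.

```shell
#!/bin/sh
# Hypothetical helper (not from the original post).
# Reads "index shard" pairs on stdin and prints one reroute body that allocates
# an empty primary for each pair on the node given as $1.
build_reroute_body() {
  node=$1
  awk -v node="$node" '
    BEGIN { printf "{\"commands\":[" }
    NF >= 2 {
      if (n++) printf ","
      printf "{\"allocate_empty_primary\":{\"index\":\"%s\",\"shard\":%d,\"node\":\"%s\",\"accept_data_loss\":true}}", $1, $2, node
    }
    END { print "]}" }'
}

# Possible usage (addresses and node name are placeholders); dry_run=true only
# simulates the allocation, so drop it once the result looks right:
#   curl -s 'http://127.0.0.1:9200/_cat/shards?h=index,shard,prirep,state' \
#     | awk '$3 == "p" && $4 == "UNASSIGNED" { print $1, $2 }' \
#     | build_reroute_body new_node_name \
#     | curl -s -XPOST -H 'Content-Type: application/json' \
#         'http://127.0.0.1:9200/_cluster/reroute?dry_run=true&pretty' -d @-
```

Filtering on the `p` flag restricts the request to primaries, which is all `allocate_empty_primary` can act on. With "number_of_replicas" : "0" it makes no difference, but it keeps the sketch safe on clusters that do have replicas.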
