Elasticsearch yellow / unassigned_shards: recovering replicas after node recovery

Problem

In an Elasticsearch 5.1 cluster with three nodes, two of the nodes went down for some reason. After promptly restarting them, the cluster health went from red to yellow, and then stayed yellow.

Analysis

Call the cluster health API:

GET /_cluster/health
----------------------
{
  "cluster_name": "es",
  "status": "yellow",
  "timed_out": false,
  "number_of_nodes": 3,
  "number_of_data_nodes": 3,
  "active_primary_shards": 37,
  "active_shards": 71,
  "relocating_shards": 0,
  "initializing_shards": 0,
  "unassigned_shards": 3,
  "delayed_unassigned_shards": 0,
  "number_of_pending_tasks": 0,
  "number_of_in_flight_fetch": 0,
  "task_max_waiting_in_queue_millis": 0,
  "active_shards_percent_as_number": 95.94594594594594
}
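
As an aside, the same health API can break the status down per index, which makes it easy to see which index the unassigned shards belong to when a cluster hosts many indices. A minimal example:

GET /_cluster/health?level=indices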

The output shows three unassigned shards.
Call the _cat/shards API for shard-level detail:

GET /_cat/shards
my_index             4 p STARTED    7795827  10.7gb 192.168.0.205 node-205-1
my_index             4 r UNASSIGNED                               
my_index             3 p STARTED    7801305  12.4gb 192.168.0.205 node-205-1
my_index             3 r UNASSIGNED                               
my_index             2 r STARTED    7797142  10.6gb 192.168.0.149 node-149-1
my_index             2 p STARTED    7797211  10.6gb 192.168.0.173 node-173-1
my_index             1 p STARTED    7801554  11.4gb 192.168.0.205 node-205-1
my_index             1 r UNASSIGNED                               
my_index             0 r STARTED    7795061  10.8gb 192.168.0.149 node-149-1
my_index             0 p STARTED    7795107  10.8gb 192.168.0.173 node-173-1
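
Incidentally, _cat/shards can also print why a copy is unassigned via the unassigned.reason column, which can save a round of log digging for simple cases. A minimal sketch, with the column list trimmed down:

GET /_cat/shards/my_index?v&h=index,shard,prirep,state,unassigned.reason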

So the primary shards are all fine, but the replica shards of shards 1, 3, and 4 are unassigned.
In theory Elasticsearch allocates shards automatically, so first check whether that feature is enabled:

GET /_cluster/settings
-----------------------
{
    "persistent": {},
    "transient": {
        "cluster": {
            "routing": {
                "allocation": {
                    "enable": "all" }
            }
        }
    }
}
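
For reference, had this setting come back as "none" or "primaries", re-enabling automatic allocation would only take a transient settings update along these lines:

PUT /_cluster/settings
{
    "transient": {
        "cluster.routing.allocation.enable": "all"
    }
}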

Here it is already set to "all", so automatic allocation is enabled. Strange, then, why aren't these three replicas being assigned?
Logging on to the machine and checking the Elasticsearch logs turned up the following error:

[2017-11-21T15:43:54,799][WARN ][o.e.i.c.IndicesClusterStateService] [node-149-1] [[my_index][4]] marking and sending shard failed due to [failed recovery]
org.elasticsearch.indices.recovery.RecoveryFailedException: [my_index][4]: Recovery failed from {node-205-1}{WPC32CtxTtOiTPuCseqF8g}{AyjHnVtwSnik2Rcu_SQg8A}{192.168.0.205}{192.168.0.205:9300} into {node-149-1}{fa9ZVqyXSHKhYHvAhr8x6w}{ECrBVQS_QPOXtc9E0is9Tw}{192.168.0.149}{192.168.0.149:9300}
......
Caused by: org.elasticsearch.indices.recovery.RecoverFilesRecoveryException: Failed to transfer [0] files with total size of [0b]
......
Caused by: java.lang.IllegalStateException: try to recover [my_index][4] from primary shard with sync id but number of docs differ: 7728586 (node-205-1, primary) vs 7728583(node-149-1)
......
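
The same information can usually be surfaced without shell access: the cluster allocation explain API (available since Elasticsearch 5.0) reports why a given shard copy is unassigned, including details of the last failed recovery attempt. A minimal sketch for one of the stuck replicas:

GET /_cluster/allocation/explain
{
    "index": "my_index",
    "shard": 4,
    "primary": false
}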

So it's not that Elasticsearch never tried to recover the replicas; the recovery itself was failing, because the replica carries the same sync id as the primary but a different document count, so the transfer is aborted.
With no clear idea of what was going on, I searched around and tried various approaches, for example:

POST /_cluster/reroute
{
    "commands" : [ {
        "allocate_empty_primary" : {
            "index" : "my_index",
            "shard" : 1,
            "node" : "node-149-1",
            "accept_data_loss":true
        }
    }]
}
-----------------------------
{
    "error": {
        "root_cause": [
            {
                "type": "remote_transport_exception",
                "reason": "[node-205-1][192.168.0.205:9300][cluster:admin/reroute]"
            }
        ],
        "type": "illegal_argument_exception",
        "reason": "[allocate_empty_primary] primary [my_index][1] is already assigned"
    },
    "status": 400
}
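
For the record, the reroute command meant for an unassigned replica is allocate_replica rather than allocate_empty_primary. A sketch follows, though in this situation it would presumably just retrigger the same recovery and fail on the same sync-id/doc-count mismatch:

POST /_cluster/reroute
{
    "commands" : [ {
        "allocate_replica" : {
            "index" : "my_index",
            "shard" : 1,
            "node" : "node-149-1"
        }
    }]
}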

It complains that primary shard 1 of my_index is already assigned, so this approach doesn't work.

Solution

Since all the primary shards are healthy and only the replica shards are broken, I can simply drop the replica shards and let Elasticsearch rebuild them from the primaries.
First set the number of replicas of the problematic index to 0:

PUT /my_index/_settings
{
    "index" : {
        "number_of_replicas" : 0
    }
}
--------------------------
{
    "acknowledged": true
}

Now check the shards again:

GET /_cat/shards
my_index             4 p STARTED    7795827  10.7gb 192.168.0.205 node-205-1
my_index             3 p STARTED    7801305  12.4gb 192.168.0.205 node-205-1
my_index             2 p STARTED    7797211  10.6gb 192.168.0.173 node-173-1
my_index             1 p STARTED    7801554  11.4gb 192.168.0.205 node-205-1
my_index             0 p STARTED    7795107  10.8gb 192.168.0.173 node-173-1

No more replica shards.
Next, set the replica count back:

PUT /my_index/_settings
{
    "index" : {
        "number_of_replicas" : 1
    }
}
--------------------------
{
    "acknowledged": true
}
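
To block until the replicas have been reassigned instead of polling by hand, the health API's wait_for_status parameter can be used; a minimal sketch (the 60s timeout is an arbitrary choice):

GET /_cluster/health?wait_for_status=green&timeout=60s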

After the shards were automatically allocated, the cluster successfully returned to green!
