ES Replica Shards UNASSIGNED

Latest recommended article published on 2024-05-05 03:49:43

weixin_34247032

Original link: https://my.oschina.net/u/2247638/blog/1359028

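My ES cluster status had been yellow for a while. I ignored it at first, but it looked stranger the longer I watched: three nodes, 5 primary shards and 5 replica shards, yet node-3 never held a single shard. The primaries sat on node-1 and node-2, and the replicas sat on...

Ah. Every replica was UNASSIGNED, and I had only just noticed.

So, time for a reroute.

First, cat the shards to see where they are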

curl -XGET 'http://localhost:9200/_cat/shards'

my_index 4 p STARTED    104917 157.2mb 127.0.0.1 node-1
my_index 4 r UNASSIGNED
my_index 3 p STARTED    104892 156.7mb 127.0.0.1 node-1
my_index 3 r UNASSIGNED
my_index 2 p STARTED    104714 155.6mb 127.0.0.1 node-1
my_index 2 r UNASSIGNED
my_index 1 p STARTED    104874 156.5mb 127.0.0.1 node-2
my_index 1 r UNASSIGNED
my_index 0 p STARTED    105933 156.5mb 127.0.0.1 node-1
my_index 0 r UNASSIGNED

All 5 replica shards, 0 through 4, are down (to be fair, the head plugin already showed that...)
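If head is not at hand, the same picture can be filtered out of the `_cat/shards` output. A minimal sketch, assuming the column order shown above (index, shard, primary/replica, state); the helper name `list_unassigned` is made up for illustration:

```shell
#!/bin/sh
# Print "index shard type" for every UNASSIGNED shard.
# Reads _cat/shards output on stdin; assumes the state is the 4th column.
list_unassigned() {
    awk '$4 == "UNASSIGNED" { print $1, $2, $3 }'
}

# Usage against a live cluster (URL taken from the article):
# curl -s 'http://localhost:9200/_cat/shards' | list_unassigned
```

Matching on the state column rather than grepping the whole line avoids false hits when an index name happens to contain the word UNASSIGNED.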

Then do the reroute

curl -XPOST 'localhost:9200/_cluster/reroute' -d '{
        "commands" : [ {
              "allocate" : {
                  "index" : "my_index",
                  "shard" : <shard number>,
                  "node" : "<node name>",
                  "allow_primary" : true
              }
            }
        ]
    }'

To save some typing, wrap it in a script

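#!/bin/sh
# Reroute every UNASSIGNED shard onto node-3, one shard at a time.
for index in $(curl -s 'http://localhost:9200/_cat/shards' | grep UNASSIGNED | awk '{print $1}' | sort | uniq); do
    # Match the index name exactly so that similarly named indices are not mixed in.
    for shard in $(curl -s 'http://localhost:9200/_cat/shards' | grep UNASSIGNED | awk -v idx="$index" '$1 == idx {print $2}' | sort | uniq); do
        echo "$index $shard"
        # JSON needs double quotes, and the index name has to be a quoted string.
        curl -XPOST 'http://localhost:9200/_cluster/reroute' -d "{\"commands\":[{\"allocate\":{\"index\":\"$index\",\"shard\":$shard,\"node\":\"node-3\",\"allow_primary\":true}}]}"
        sleep 5
    done
done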
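Before pointing a loop like this at a real cluster, it can help to print the request body and eyeball it. A minimal sketch of a dry run; the function name `reroute_body` is hypothetical, and the body simply mirrors the `allocate` command used above. `allow_primary` is kept because the article uses it, though it is probably unnecessary for replicas whose primary is already active:

```shell
#!/bin/sh
# Build the JSON body for one "allocate" reroute command.
# $1 = index name, $2 = shard number, $3 = target node name
reroute_body() {
    printf '{"commands":[{"allocate":{"index":"%s","shard":%s,"node":"%s","allow_primary":true}}]}\n' "$1" "$2" "$3"
}

# Dry run: print the body instead of POSTing it.
reroute_body my_index 0 node-3
```

Building the body with `printf` keeps the quoting in one place, so the double quotes that JSON requires do not have to be escaped inside the `curl` call.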

OK, give it a try... and it promptly fails

"type": "illegal_argument_exception",
"reason": "[allocate] allocation of [my_index][0] on node {node-3}{rjG_j423SpejhzmAAUCcqA}{127.0.0.1}{127.0.0.1:9320} is not allowed, reason: [NO(more than allowed [85.0%] used disk on node, free: [6.234234497791846%])][YES(node passes include/exclude/require filters)][YES(allocation disabling is ignored)][YES(shard is not allocated to same node or host)][YES(target node version [2.4.4] is same or newer than source node version [2.4.4])][YES(shard not primary or relocation disabled)][YES(below shard recovery limit of [2])][YES(total shard limit disabled: [index: -1, cluster: -1] <= 0)][YES(primary is already active)][YES(allocation disabling is ignored)][YES(no allocation awareness enabled)]"

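This... makes my eyes glaze over. I went straight to looking up why a reroute can fail and spotted the following (from the blog http://blog.csdn.net/xiangcheng001/article/details/51133364)

Ah, I think I saw an 85 somewhere in that error. Looking back at the message, there is indeed a matching description.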
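The reason string is just a row of decider verdicts glued together as `[NO(...)][YES(...)]...`, so the one that blocks the allocation can be filtered out instead of read by eye. A rough sketch, assuming the format of the 2.4.4 error shown above; `no_deciders` is a made-up helper name:

```shell
#!/bin/sh
# Print only the verdicts that answered NO.
# Reads a reroute error "reason" string on stdin.
no_deciders() {
    awk '{
        # Verdicts are concatenated as [NO(...)][YES(...)], so split on "][".
        n = split($0, parts, /\]\[/)
        for (i = 1; i <= n; i++) {
            if (parts[i] ~ /NO\(/) {
                sub(/.*NO\(/, "NO(", parts[i])   # drop the text before the verdict
                sub(/\]"?$/, "", parts[i])        # drop a trailing bracket if it was the last one
                print parts[i]
            }
        }
    }'
}

# Usage: paste the "reason" value into a file and run
# no_deciders < reason.txt
```

For the error above this leaves only the disk usage verdict, which is the part that matters.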

Check with df -h

Wow, when did it get this full.
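The 85% in the error corresponds to the low disk watermark (`cluster.routing.allocation.disk.watermark.low`, 85% by default), so the check can be scripted. A minimal sketch that reads `df -P` output and prints the mount points above a limit; the helper name `over_threshold` is an assumption, and mount points containing spaces are not handled:

```shell
#!/bin/sh
# Print mount points whose usage exceeds a limit.
# Reads `df -P` output on stdin; $1 = limit in percent.
over_threshold() {
    awk -v limit="$1" 'NR > 1 {
        used = $5                # Capacity column, e.g. "94%"
        sub(/%/, "", used)
        if (used + 0 > limit + 0) print $6, used "%"
    }'
}

# Usage:
# df -P | over_threshold 85
```

`df -P` is used instead of `df -h` because the POSIX format keeps each filesystem on one line, which makes the columns safe to parse.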

Well, farewell, my beloved movies, the big ones and the small ones

After deleting them, reroute again,

{
"acknowledged": true,
"state": {
  "version": 17,
  "state_uuid": "WrYBhVr5T7aem4uReXaRCA",
  "master_node": "6gDDoI_OS32VjAxmCGTSsg",
  "blocks": { },
  "nodes": {
    "71GSL-osQeaLv-cDU9bygA": {
      "name": "node-2",
      "transport_address": "127.0.0.1:9310",
      "attributes": { }
    },
    "6gDDoI_OS32VjAxmCGTSsg": {
      "name": "node-1",
      "transport_address": "127.0.0.1:9300",
      "attributes": { }
    },
    "rjG_j423SpejhzmAAUCcqA": {
      "name": "node-3",
      "transport_address": "127.0.0.1:9320",
      "attributes": { }
    }
  },
  "routing_table": {
    "indices": {
      "enterprise_data_gov_20170324": {
        "shards": {
        "0": [
        {
        "state": "STARTED",
        "primary": true,
        "node": "6gDDoI_OS32VjAxmCGTSsg",
        "relocating_node": null,
        "shard": 0,