Resolving shard-synchronization errors when upgrading Elasticsearch 6.0.0 from a single node to a multi-node cluster

After starting multiple ES nodes, ES begins electing a master node and synchronizing shard data to the new nodes. At this point the Logstash log throws the following error:
logstash.outputs.elasticsearch] retrying failed action with response code: 403 ({"type"=>"cluster_block_exception", "reason"=>"blocked by: [FORBIDDEN/12/index read-only / allow delete (api)]

This happens because the data directory on the new ES node has run out of storage space, so receiving the shard data replicated from the master node fails. To protect the data, the ES cluster automatically marks the affected indices as read-only.
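To confirm that disk space is indeed the problem, the per-node disk usage and shard counts can be checked with the _cat/allocation API (a quick check, assuming the node address 10.0.7.220 used later in this post):

    curl http://10.0.7.220:9200/_cat/allocation?v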

Resolution steps:

1. Provide enough storage space for the data to be written. If you change the ES data directory in the configuration file, remember to restart ES; a sketch of the relevant setting is shown below.
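A minimal sketch of the relevant elasticsearch.yml setting (the path below is a placeholder; point it at a mount with sufficient free space and restart the node afterwards):

    # elasticsearch.yml
    # /data/es-large is a placeholder path on a disk with enough free space
    path.data: /data/es-large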

2. Remove the read-only setting on the indices. Execute the following in Kibana's Dev Tools (or issue the same PUT request from a server with curl; the same applies to the later requests in this post):

PUT _settings
{
  "index": {
    "blocks": {
      "read_only_allow_delete": "false"
    }
  }
}
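The equivalent curl request would look roughly like the following (using the node address from this post; note that ES 6.x rejects JSON bodies sent without an explicit Content-Type header):

    curl -XPUT -H 'Content-Type: application/json' http://10.0.7.220:9200/_settings -d '{"index": {"blocks": {"read_only_allow_delete": "false"}}}'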


Now check the ES cluster status: curl http://10.0.7.220:9200/_cluster/health?pretty

Notice that the value of "active_shards_percent_as_number" : 12.0 starts to change;
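To track recovery progress without re-running the command by hand, a simple loop like the following can be used (a sketch, assuming the same node address):

    # print the active-shards percentage every 10 seconds
    while true; do
        curl -s http://10.0.7.220:9200/_cluster/health?pretty | grep active_shards_percent
        sleep 10
    done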

You can also watch each node's ES log:

[2018-01-18T08:00:20,526][INFO ][o.e.m.j.JvmGcMonitorService] [es-2] [gc][765219] overhead, spent [265ms] collecting
[2018-01-18T08:05:44,583][INFO ][o.e.m.j.JvmGcMonitorService] [es-2] [gc][765841] overhead, spent [262ms] collecting
[2018-01-18T08:07:17,853][INFO ][o.e.m.j.JvmGcMonitorService] [es-2] [gc][766113] overhead, spent [444ms] collecting
[2018-01-18T08:10:54,285][INFO ][o.e.m.j.JvmGcMonitorService] [es-2] [gc][766568] overhead, spent [270ms] collecting
[2018-01-18T08:18:16,306][INFO ][o.e.m.j.JvmGcMonitorService] [es-2] [gc][766590] overhead, spent [375ms] collecting


which shows that the nodes have started synchronizing data normally.




After the nodes had been synchronizing data for a while, Logstash threw the following exception:

[2018-01-18T08:18:55,840][INFO ][logstash.outputs.elasticsearch] retrying failed action with response code: 503 ({"type"=>"unavailable_shards_exception", "reason"=>"[twitter_news][0] primary shard is not active Timeout: [1m], request: [BulkShardRequest to [twitter_news] containing [1] requests]"})
[2018-01-18T08:18:55,840][ERROR][logstash.outputs.elasticsearch] Retrying individual actions
[2018-01-18T08:18:55,841][ERROR][logstash.outputs.elasticsearch] Action

Check the ES cluster status again: curl http://10.0.7.220:9200/_cluster/health?pretty

{
  "cluster_name" : "Bond_ELK",
  "status" : "red",
  "timed_out" : false,
  "number_of_nodes" : 3,
  "number_of_data_nodes" : 3,
  "active_primary_shards" : 641,
  "active_shards" : 1282,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 45.0
}

Note that the cluster status is "status" : "red", an unhealthy state.

Also check the status of each index in the cluster: curl http://10.0.7.220:9200/_cat/indices

(10.0.7.220 is the IP address of one of the ES nodes, and 9200 is the ES service port)

green open logstash-2017.09.10 Lo4z4egNRMGu7qrKYWM35w 5 1    77152 0  218.8mb 109.4mb
green open logstash-2017.10.04 rABy9W2MQmaUmuGiT8QYnQ 5 1    63638 0     89mb  44.5mb
green open logstash-2017.12.20 As0qvTxcTHSW5enaZ9i9Gg 5 1   190670 0  214.8mb 107.4mb
green open logstash-2017.09.09 V0OO7JPJQPmdPeLDJS15Uw 5 1   123109 0  331.3mb 165.6mb
green open logstash-2017.11.18 jLh97NBWSyWYZ8E0UEIZpw 5 1  1646106 0    2.7gb   1.3gb
green open logstash-2017.11.11 BjA78HyzRuycpUQ701giqg 5 1  1401268 0    2.1gb     1gb
green open logstash-2017.12.24 U47kSs37Tw6Umt_ElE3mvg 5 1   463518 0  618.4mb 309.2mb
green open logstash-2017.11.09 R5nYBGDzSlKK2MxE855i8g 5 1   537955 0  872.2mb 436.1mb
green open logstash-2017.10.22 mSh5vwMMSBOA1XqxuEAsqw 5 1   328375 0  509.8mb 254.9mb
green open logstash-2017.09.13 CdOl9OasRtS1kZNbekrBdA 5 1   115972 0  163.4mb  81.7mb
green open logstash-2018.01.15 tGT6NEJpQTWqK9e86BiuRQ 5 1   148796 0  206.8mb 103.5mb
green open logstash-2017.11.01 8F4VFNhJRtSt0eQtQOmKmw 5 1   323805 0  452.8mb 226.4mb
green open logstash-2017.11.02 8c59nl75RPiXCnw2vmkDFQ 5 1   417596 0  685.5mb 342.7mb
green open logstash-2017.09.22 pGS8fBFLS0CervHlE9_lkA 5 1   372848 0  572.2mb 286.1mb
green open logstash-2017.12.02 VleMOwNUTGmjHFmAQXTSBA 5 1   628957 0    1.2gb 638.3mb
green open logstash-2017.10.31 9ke66J_2RpOMa-181TVYwg 5 1   152957 0  221.4mb 110.7mb
green open logstash-2017.10.19 U4vbt88oRMyWhSKcZM8K4Q 5 1   191099 0  280.4mb 140.2mb
red open logstash-2017.10.08 fz9MKG0qQ2OrQTbFaixLMg 5 1   203432 0  0mb 0mb
green open logstash-2017.11.10 7H0fs5DwTE-m8BMxKEYCtQ 5 1   767469 0    1.2gb 626.2mb
green open logstash-2017.11.22 006C00fJR7ynfy54gIL1Mw 5 1   345869 0  573.1mb 286.5mb
red open logstash-2017.10.13 2wgSB3yKSyi8rf58M5_ODA 5 1   340665 0  0mb 0mb
green open logstash-2017.12.29 C7gjv7ImQXWIlLfib3sr9A 5 1   503307 0  584.1mb   292mb
green open logstash-2017.09.26 UYxrsJiNT4uuI1ED5X_JvQ 5 1   121005 0  178.7mb  89.3mb
green open logstash-2017.11.17 DkfdybaxTNap75Z_ebhYGA 5 1   802889 0    1.2gb 651.9mb
green open logstash-2018.01.04 7plCP47OQYeX0PGogzgVwg 5 1   307646 0  357.2mb 178.6mb

The indices logstash-2017.10.08 and logstash-2017.10.13 are in the red (abnormal) state.
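Optionally, before deleting anything, ES can be asked why a primary shard of a red index is not allocated, via the cluster allocation explain API (a diagnostic sketch using one of the red indices above):

    curl -XGET -H 'Content-Type: application/json' http://10.0.7.220:9200/_cluster/allocation/explain -d '{"index": "logstash-2017.10.08", "shard": 0, "primary": true}'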

Solution:

Delete the abnormal indices so that the cluster can first return to normal operation. Be aware, however, that the data whose synchronization failed will be lost along with the deleted indices.

Again, execute the following in Kibana's Dev Tools:

DELETE /logstash-2017.10.08,logstash-2017.10.13
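The equivalent curl request against one of the nodes would be roughly:

    curl -XDELETE http://10.0.7.220:9200/logstash-2017.10.08,logstash-2017.10.13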

After that, the ES cluster status recovers from "red" to "green" or "yellow" (depending on the actual situation) and the cluster resumes normal data synchronization. When the value of "active_shards_percent_as_number" reaches 100, shard synchronization is complete:

{
  "cluster_name" : "Bond_ELK",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 3,
  "number_of_data_nodes" : 3,
  "active_primary_shards" : 641,
  "active_shards" : 1282,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 56.0
}

