Elasticsearch 2.x Cluster Health

飞奔的代码

Article link: https://blog.csdn.net/chennanymy/article/details/52551167

Original source: https://www.elastic.co/guide/en/elasticsearch/guide/current/_cluster_health.html

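Cluster Health

An ES cluster may consist of a single node with a single index, or it may have 100 data nodes, 3 master nodes, and several client nodes, with operations spread across hundreds or thousands of shards.

Whatever the size of the cluster, we usually want a quick way to check its state. The Cluster Health API provides this: it tells us whether the cluster is healthy, or alerts us that something is wrong. The cluster-health API is called like this: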

GET _cluster/health

Like other ES APIs, cluster-health returns a JSON response. It contains several key pieces of information about the cluster:

{
   "cluster_name": "elasticsearch_zach",
   "status": "green",
   "timed_out": false,
   "number_of_nodes": 1,
   "number_of_data_nodes": 1,
   "active_primary_shards": 10,
   "active_shards": 10,
   "relocating_shards": 0,
   "initializing_shards": 0,
   "unassigned_shards": 0
}
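The same request can be issued from a script. The following is a minimal Python sketch using only the standard library. The base URL `http://localhost:9200` and the helper names `health_url` and `fetch_health` are assumptions made for illustration; they do not come from the original article.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed address of a local node; adjust for your cluster.
BASE_URL = "http://localhost:9200"


def health_url(base_url=BASE_URL, **params):
    """Build the cluster-health URL, appending any query parameters."""
    url = base_url.rstrip("/") + "/_cluster/health"
    # Drop parameters that were not set so the URL stays minimal.
    query = {key: value for key, value in params.items() if value is not None}
    return url + ("?" + urlencode(query) if query else "")


def fetch_health(base_url=BASE_URL, **params):
    """Send GET _cluster/health and return the parsed JSON body as a dict."""
    with urlopen(health_url(base_url, **params)) as response:
        return json.loads(response.read().decode("utf-8"))
```

Against a running node, `fetch_health()["status"]` would then yield one of the status strings reported in the response.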

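The most important field here is status, which takes one of three values:

green: all primary and replica shards are allocated. The cluster is 100% operational.

yellow: all primary shards are allocated, but at least one replica is missing. No data has been lost and search results are still complete, but high availability is compromised to some degree: if more shards are lost, data may be lost as well.

red: at least one primary shard (and all of its replicas) is missing. This means the cluster is missing data: search results are no longer complete, and indexing into the missing shard raises an exception.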
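The three status values can be turned into something a monitoring script acts on. The sketch below maps a health response to an exit code; the 0/1/2 convention and the names `SEVERITY`, `exit_code`, and `describe` are assumptions for illustration, not something Elasticsearch defines.

```python
# Severity order of the three status values, from healthiest to worst.
SEVERITY = {"green": 0, "yellow": 1, "red": 2}

# Short descriptions paraphrasing the meaning of each status.
MESSAGES = {
    "green": "all primary and replica shards are allocated",
    "yellow": "all primaries allocated, at least one replica missing",
    "red": "at least one primary shard is missing",
}


def exit_code(health):
    """Map a cluster-health response to a monitoring-style exit code.

    0 = OK (green), 1 = warning (yellow), 2 = critical (red).
    This convention is an assumption borrowed from common monitoring tools.
    """
    return SEVERITY[health["status"]]


def describe(health):
    """Return a one-line, human-readable summary of the status."""
    status = health["status"]
    return "{}: {}".format(status, MESSAGES[status])
```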

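The green/yellow/red status is a good at-a-glance indicator of cluster health. The remaining fields give a general picture of the cluster's state:

number_of_nodes: the total number of nodes in the cluster.

number_of_data_nodes: the number of data nodes in the cluster.

active_primary_shards: the number of active primary shards across all indices in the cluster.

active_shards: the total number of active shards across all indices, including replica shards.

relocating_shards: the number of shards currently moving from one node to another. This is usually 0, but it rises while ES rebalances the cluster, for example after a node joins or shuts down.

initializing_shards: the number of shards that are being freshly created.

unassigned_shards: the number of shards that are not allocated to any node.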
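A few derived numbers follow directly from these fields. The sketch below computes the active replica count and the share of shards that are active; the function name `shard_summary` is hypothetical, and the total assumes that relocating shards are already counted in `active_shards`.

```python
def shard_summary(health):
    """Derive replica and total shard counts from a health response."""
    active = health["active_shards"]
    primaries = health["active_primary_shards"]
    # Assumption: relocating shards are already included in active_shards,
    # so they are not added again here.
    total = active + health["initializing_shards"] + health["unassigned_shards"]
    return {
        "active_replicas": active - primaries,
        "total_shards": total,
        "percent_active": 100.0 * active / total if total else 100.0,
    }


# Shard counts from the single-node response shown earlier.
green = {
    "active_primary_shards": 10,
    "active_shards": 10,
    "relocating_shards": 0,
    "initializing_shards": 0,
    "unassigned_shards": 0,
}
print(shard_summary(green))
# {'active_replicas': 0, 'total_shards': 10, 'percent_active': 100.0}
```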

Drilling Deeper: Finding Problematic Indices

Imagine that something goes wrong one day, and the cluster-health API returns the following:

{
   "cluster_name": "elasticsearch_zach",
   "status": "red",
   "timed_out": false,
   "number_of_nodes": 8,
   "number_of_data_nodes": 8,
   "active_primary_shards": 90,
   "active_shards": 180,
   "relocating_shards": 0,
   "initializing_shards": 0,
   "unassigned_shards": 20
}

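So what can we deduce from this health report? First, the cluster status is red, which means data is missing (primaries and their replicas). We know the cluster has 10 nodes, but only 8 are listed here, so 2 nodes have gone missing. We can also see 20 unassigned shards.

That is all the information we can gather from this response. It does not reveal which shards are missing, which indices they belong to, or whether they are primaries or replicas.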
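The reasoning above can be written down as a small check. This is only a sketch: `quick_findings` is a hypothetical helper, and `expected_nodes` must be supplied by the operator, because the health response itself does not say how many nodes the cluster should have.

```python
def quick_findings(health, expected_nodes):
    """List what the cluster-level numbers alone can tell us."""
    findings = []
    if health["status"] == "red":
        findings.append("at least one primary shard is missing")
    # expected_nodes is operator knowledge, not part of the response.
    missing = expected_nodes - health["number_of_nodes"]
    if missing > 0:
        findings.append("{} node(s) missing".format(missing))
    if health["unassigned_shards"] > 0:
        findings.append("{} shard(s) unassigned".format(health["unassigned_shards"]))
    return findings


# Values taken from the red response above.
red = {
    "status": "red",
    "number_of_nodes": 8,
    "unassigned_shards": 20,
}
for finding in quick_findings(red, expected_nodes=10):
    print(finding)
```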

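To answer these questions, we can call the cluster-health API with the level parameter:

GET _cluster/health?level=indices

With this parameter, the response includes a list of per-index details (status, number of shards, unassigned shards, and so on):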

{
   "cluster_name": "elasticsearch_zach",
   "status": "red",
   "timed_out": false,
   "number_of_nodes": 8,
   "number_of_data_nodes": 8,
   "active_primary_shards": 90,
   "active_shards": 180,
   "relocating_shards": 0,
   "initializing_shards": 0,
   "unassigned_shards": 20,
   "indices": {
      "v1": {
         "status": "green",
         "number_of_shards": 10,
         "number_of_replicas": 1,
         "active_primary_shards": 10,
         "active_shards": 20,
         "relocating_shards": 0,
         "initializing_shards": 0,
         "unassigned_shards": 0
      },
      "v2": {
         "status": "red", 
         "number_of_shards": 10,
         "number_of_replicas": 1,
         "active_primary_shards": 0,
         "active_shards": 0,
         "relocating_shards": 0,
         "initializing_shards": 0,
         "unassigned_shards": 20 
      },
      "v3": {
         "status": "green",
         "number_of_shards": 10,
         "number_of_replicas": 1,
         "active_primary_shards": 10,
         "active_shards": 20,
         "relocating_shards": 0,
         "initializing_shards": 0,
         "unassigned_shards": 0
      },
      ....
   }
}

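1. We can see that the v2 index has a status of red.

2. The v2 index has 20 unassigned shards.

With this breakdown the problem becomes much clearer: the v2 index has 10 primary shards and 1 replica of each, and all 20 shards are missing. We can reasonably infer that these 20 shards all lived on the 2 missing nodes.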
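Picking out the problem index by eye works for three indices, but not for hundreds. A minimal sketch of the same filtering in Python, assuming the parsed `level=indices` response is available as a dict (the function name `unhealthy_indices` is hypothetical):

```python
def unhealthy_indices(health):
    """Return {index_name: (status, unassigned_shards)} for non-green indices."""
    problems = {}
    for name, info in health.get("indices", {}).items():
        if info["status"] != "green":
            problems[name] = (info["status"], info["unassigned_shards"])
    return problems


# Trimmed copy of the per-index response above (only the fields used here).
sample = {
    "status": "red",
    "indices": {
        "v1": {"status": "green", "unassigned_shards": 0},
        "v2": {"status": "red", "unassigned_shards": 20},
        "v3": {"status": "green", "unassigned_shards": 0},
    },
}
print(unhealthy_indices(sample))  # {'v2': ('red', 20)}
```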

The level parameter accepts one more option:

GET _cluster/health?level=shards

The shards option produces far more verbose output: it lists the status of every shard in every index.
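Because the shard-level output is so long, it is usually easier to filter it in code. The article does not show this response, so the sketch below assumes a layout in which each index carries a `shards` object keyed by shard number, and each shard entry has a boolean `primary_active` field. Verify this against your own cluster's output before relying on it. The function name and sample data are hypothetical.

```python
def shards_without_primary(health):
    """Return sorted (index, shard_id) pairs whose primary is not active."""
    missing = []
    for index_name, index_info in health.get("indices", {}).items():
        # Assumed layout: "shards" maps shard number (as a string) to details.
        for shard_id, shard in index_info.get("shards", {}).items():
            if not shard.get("primary_active", False):
                missing.append((index_name, int(shard_id)))
    return sorted(missing)


# Hypothetical, heavily trimmed shard-level response.
sample_shards = {
    "indices": {
        "v1": {"shards": {"0": {"status": "green", "primary_active": True}}},
        "v2": {
            "shards": {
                "0": {"status": "red", "primary_active": False},
                "1": {"status": "red", "primary_active": False},
            }
        },
    }
}
print(shards_without_primary(sample_shards))  # [('v2', 0), ('v2', 1)]
```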

Blocking for Status Changes

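The cluster-health API has another useful trick, especially when building unit tests:

GET _cluster/health?wait_for_status=green

This call blocks until the cluster status becomes green, which is very useful in unit tests and automated scripts.

When you create a new index, ES must broadcast the change in cluster state to every node in the cluster. Those nodes must initialize the new shards and then report back to the master that the shards are "Started". This process is fast, but network latency between nodes may make it take 10-20 ms.

If you have a script that automatically (a) creates an index and then (b) immediately indexes a document, the operation may fail because the index has not finished initializing. The time between (a) and (b) may be less than 1 ms, far shorter than the network latency.

Rather than having the application (script) sleep, a better approach is to call the cluster-health API with the wait_for_status parameter. As soon as the index is fully created, the cluster status changes to green and the call returns; indexing the document at that point works without problems.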
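The create, wait, then index sequence might look like the following in a script. This is a sketch under stated assumptions: the node address, the helper names `urllib_http` and `create_then_index`, the `POST /{index}/{type}` indexing path, and the use of the `timeout` query parameter are all illustrative choices, not taken from the original article. The `http` argument is injectable so that the flow can be exercised without a live cluster.

```python
import json
from urllib.request import Request, urlopen


def urllib_http(method, path, body=None, base_url="http://localhost:9200"):
    """Minimal HTTP helper; base_url is an assumed local node address."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    request = Request(base_url + path, data=data, method=method)
    if data is not None:
        request.add_header("Content-Type", "application/json")
    with urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def create_then_index(index, doc_type, document, http=urllib_http, timeout="30s"):
    """Create an index, wait for green status, then index one document."""
    # (a) Create the index.
    http("PUT", "/" + index)
    # Block until the cluster reports green or the timeout expires.
    health = http(
        "GET",
        "/_cluster/health?wait_for_status=green&timeout=" + timeout,
    )
    # A timed-out wait may also surface as an HTTP error raised by the
    # helper, depending on the server; either way nothing is indexed.
    if health.get("timed_out"):
        raise RuntimeError("cluster did not reach green within " + timeout)
    # (b) Only now index the document.
    return http("POST", "/{}/{}".format(index, doc_type), document)
```

Bounding the wait with a timeout keeps a test suite from hanging forever if the cluster can never reach green, for instance when replicas cannot be allocated.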
