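Here's a fun way to put it: if you think of an ES cluster as a dynasty, its three states look like this:
Green: a golden age of peace; all is well across the realm.
Yellow: treacherous ministers hold power; the country is at risk.
Red: the emperor no longer holds court. If this can be tolerated, what cannot?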
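If it's green, everyone goes about their business and you can leave it alone. If it's yellow, well, what dynasty never had a treacherous minister? That can be tolerated too. Red, however, is serious, very serious, although in many cases (right after a restart, for example) the cluster recovers on its own after a short wait. That gives you an intuitive picture; so what is really going on?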
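Green: everything is normal
Yellow: one or more replica shards are missing (not active)
Red: one or more primary shards are missing (not active)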
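At this point it all clicks: is it really that simple? Got it. But hold on, don't close the blog just yet. As a coder, a coder with standards, are you going to be fobbed off that easily? You have to see the code; the code is the truth, and nothing is more real than the code. If the metaphor isn't enough for you, let's keep going. Now we get serious.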
Checking cluster health
curl 'http://localhost:9200/_cluster/health?pretty=true'
{
  "cluster_name" : "mycluster",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 4,
  "number_of_data_nodes" : 4,
  "active_primary_shards" : 778,
  "active_shards" : 1556,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0
}
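If you want to read these numbers from a program rather than by eye, here is a minimal sketch, assuming the flat JSON shape shown above. `HealthSummary` and its methods are hypothetical names made up for this example, and the regex extraction is only there to keep it dependency-free; a real client would use a proper JSON parser.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Elasticsearch.
final class HealthSummary {
    // The sample response from above, as a Java string.
    static final String SAMPLE =
        "{ \"cluster_name\" : \"mycluster\", \"status\" : \"green\", \"timed_out\" : false,"
        + " \"number_of_nodes\" : 4, \"number_of_data_nodes\" : 4,"
        + " \"active_primary_shards\" : 778, \"active_shards\" : 1556,"
        + " \"relocating_shards\" : 0, \"initializing_shards\" : 0, \"unassigned_shards\" : 0 }";

    // Extracts a quoted string value such as "status" from a flat JSON object.
    static String stringField(String json, String name) {
        Matcher m = Pattern.compile("\"" + Pattern.quote(name) + "\"\\s*:\\s*\"([^\"]*)\"").matcher(json);
        if (!m.find()) {
            throw new IllegalArgumentException("missing field: " + name);
        }
        return m.group(1);
    }

    // Extracts a non-negative integer value such as "active_shards".
    static int intField(String json, String name) {
        Matcher m = Pattern.compile("\"" + Pattern.quote(name) + "\"\\s*:\\s*(\\d+)").matcher(json);
        if (!m.find()) {
            throw new IllegalArgumentException("missing field: " + name);
        }
        return Integer.parseInt(m.group(1));
    }

    // active_shards counts primaries and replicas together, so the difference is the replica count.
    static int activeReplicaShards(String json) {
        return intField(json, "active_shards") - intField(json, "active_primary_shards");
    }

    public static void main(String[] args) {
        System.out.println(stringField(SAMPLE, "status"));  // prints green
        System.out.println(activeReplicaShards(SAMPLE));    // prints 778
    }
}
```

In the sample, 1556 active shards minus 778 active primaries leaves 778 active replicas, which is what you would expect if every index has one replica per primary (an assumption; the response itself does not say so).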
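The cluster health classes
What does all of this returned information mean? I opened the IDE and dug around for a while... and eventually found the following.
Cluster status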
public enum ClusterHealthStatus {
    GREEN((byte) 0),
    YELLOW((byte) 1),
    RED((byte) 2);
    //....
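Shard routing state
public enum ShardRoutingState {
    /**
     * The shard is not assigned to any node.
     */
    UNASSIGNED((byte) 1),
    /**
     * The shard is initializing (probably recovering from either a peer shard or the gateway).
     */
    INITIALIZING((byte) 2),
    /**
     * The shard is started.
     */
    STARTED((byte) 3),
    /**
     * The shard is in the process of being relocated.
     */
    RELOCATING((byte) 4);
    //....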
Shard routing
public class ImmutableShardRouting implements Streamable, Serializable, ShardRouting {
    //...
    @Override
    public boolean unassigned() {
        return state == ShardRoutingState.UNASSIGNED;
    }

    @Override
    public boolean initializing() {
        return state == ShardRoutingState.INITIALIZING;
    }

    @Override
    public boolean active() {
        return started() || relocating();
    }

    @Override
    public boolean started() {
        return state == ShardRoutingState.STARTED;
    }

    @Override
    public boolean relocating() {
        return state == ShardRoutingState.RELOCATING;
    }
    //...
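At this point things start to make sense: a shard that is started or relocating is an active shard, and if it also happens to be a primary, it is an active primary shard. So that's how it works. But now you will say these are only entity classes, small fry at best. Don't worry, let's keep reading.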
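To make the "active" and "active primary" counts concrete, here is a minimal sketch under simplifying assumptions. `ActiveShardCount` and `Copy` are hypothetical stand-ins invented for this example, not Elasticsearch classes; only the rule `active = started || relocating` is taken from the code above.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model, not Elasticsearch code.
final class ActiveShardCount {
    enum State { UNASSIGNED, INITIALIZING, STARTED, RELOCATING }

    // One shard copy (primary or replica) with its routing state.
    static final class Copy {
        final boolean primary;
        final State state;

        Copy(boolean primary, State state) {
            this.primary = primary;
            this.state = state;
        }

        // Same rule as ImmutableShardRouting.active(): started or relocating.
        boolean active() {
            return state == State.STARTED || state == State.RELOCATING;
        }
    }

    // Counts every active copy, primaries and replicas alike.
    static int activeShards(List<Copy> copies) {
        int count = 0;
        for (Copy copy : copies) {
            if (copy.active()) {
                count++;
            }
        }
        return count;
    }

    // Counts only the active copies that are primaries.
    static int activePrimaryShards(List<Copy> copies) {
        int count = 0;
        for (Copy copy : copies) {
            if (copy.active() && copy.primary) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<Copy> copies = Arrays.asList(
            new Copy(true, State.STARTED),
            new Copy(false, State.RELOCATING),
            new Copy(true, State.INITIALIZING),
            new Copy(false, State.UNASSIGNED));
        System.out.println(activeShards(copies));        // prints 2
        System.out.println(activePrimaryShards(copies)); // prints 1
    }
}
```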
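Computing cluster health
The clusterHealth() method of the TransportClusterHealthAction class is responsible for computing cluster health. It also inherits a fine tradition from its parent class: the operation is executed on the master node. If you did not send the request to the master node, no problem, it will be forwarded for you. Before this method runs there is some handling of wait conditions, which we will ignore for now and go straight to the point.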
private ClusterHealthResponse clusterHealth(ClusterHealthRequest request, ClusterState clusterState) {
    if (logger.isTraceEnabled()) {
        logger.trace("Calculating health based on state version [{}]", clusterState.version());
    }
    // First, validate: this mainly cross-checks the routingTable against the metaData.
    // For example, if an index was created with 5 shards but the routingTable only holds 4, validation fails.
    RoutingTableValidation validation = clusterState.routingTable().validate(clusterState.metaData());
    ClusterHealthResponse response = new ClusterHealthResponse(clusterName.value(), validation.failures());
    response.numberOfNodes = clusterState.nodes().size();
    response.numberOfDataNodes = clusterState.nodes().dataNodes().size();
    String[] concreteIndices;
    try {
        concreteIndices = clusterState.metaData().concreteIndicesIgnoreMissing(request.indices());
    } catch (IndexMissingException e) {
        return response;
    }
    // The evaluation runs at three levels (shard, index, cluster), applying the same logic at each.
    for (String index : concreteIndices) {
        IndexRoutingTable indexRoutingTable = clusterState.routingTable().index(index);
        IndexMetaData indexMetaData = clusterState.metaData().index(index);
        if (indexRoutingTable == null) {
            continue;
        }
        ClusterIndexHealth indexHealth = new ClusterIndexHealth(index, indexMetaData.numberOfShards(), indexMetaData.numberOfReplicas(), validation.indexFailures(indexMetaData.index()));
        for (IndexShardRoutingTable shardRoutingTable : indexRoutingTable) {
            ClusterShardHealth shardHealth = new ClusterShardHealth(shardRoutingTable.shardId().id());
            for (ShardRouting shardRouting : shardRoutingTable) {
                if (shardRouting.active()) { // the copy is active, i.e. started or relocating
                    shardHealth.activeShards++;
                    if (shardRouting.relocating()) {
                        // the shard is relocating, the one he is relocating to will be in initializing state, so we don't count it
                        shardHealth.relocatingShards++; // count relocating copies
                    }
                    if (shardRouting.primary()) {
                        shardHealth.primaryActive = true; // and it happens to be the primary
                    }
                } else if (shardRouting.initializing()) {
                    shardHealth.initializingShards++; // count initializing copies
                } else if (shardRouting.unassigned()) {
                    shardHealth.unassignedShards++; // count unassigned copies
                }
            }
            if (shardHealth.primaryActive) {
                if (shardHealth.activeShards == shardRoutingTable.size()) { // every copy of this shard is active
                    shardHealth.status = ClusterHealthStatus.GREEN;
                } else {
                    shardHealth.status = ClusterHealthStatus.YELLOW;
                }
            } else {
                // the primary is not active, so this shard is red, and so are its index and the whole cluster
                shardHealth.status = ClusterHealthStatus.RED;
            }
            indexHealth.shards.put(shardHealth.getId(), shardHealth);
        }
        for (ClusterShardHealth shardHealth : indexHealth) {
            if (shardHealth.isPrimaryActive()) {
                indexHealth.activePrimaryShards++;
            }
            indexHealth.activeShards += shardHealth.activeShards;
            indexHealth.relocatingShards += shardHealth.relocatingShards;
            indexHealth.initializingShards += shardHealth.initializingShards;
            indexHealth.unassignedShards += shardHealth.unassignedShards;
        }
        // assume the index is a healthy green to start with
        indexHealth.status = ClusterHealthStatus.GREEN;
        if (!indexHealth.getValidationFailures().isEmpty()) {
            indexHealth.status = ClusterHealthStatus.RED;
        } else if (indexHealth.getShards().isEmpty()) { // might be since none has been created yet (two phase index creation)
            indexHealth.status = ClusterHealthStatus.RED;
        } else {
            for (ClusterShardHealth shardHealth : indexHealth) {
                if (shardHealth.getStatus() == ClusterHealthStatus.RED) { // a single red shard makes the whole index red
                    indexHealth.status = ClusterHealthStatus.RED;
                    break;
                }
                if (shardHealth.getStatus() == ClusterHealthStatus.YELLOW) {
                    indexHealth.status = ClusterHealthStatus.YELLOW;
                }
            }
        }
        response.indices.put(indexHealth.getIndex(), indexHealth);
    }
    for (ClusterIndexHealth indexHealth : response) {
        response.activePrimaryShards += indexHealth.activePrimaryShards;
        response.activeShards += indexHealth.activeShards;
        response.relocatingShards += indexHealth.relocatingShards;
        response.initializingShards += indexHealth.initializingShards;
        response.unassignedShards += indexHealth.unassignedShards;
    }
    response.status = ClusterHealthStatus.GREEN;
    if (!response.getValidationFailures().isEmpty()) {
        response.status = ClusterHealthStatus.RED;
    } else if (clusterState.blocks().hasGlobalBlock(RestStatus.SERVICE_UNAVAILABLE)) { // a global SERVICE_UNAVAILABLE block is set (for example, no master is available)
        response.status = ClusterHealthStatus.RED;
    } else {
        // This loop implements the rule stated in the official documentation:
        // The cluster status is controlled by the worst index status.
        for (ClusterIndexHealth indexHealth : response) {
            if (indexHealth.getStatus() == ClusterHealthStatus.RED) {
                response.status = ClusterHealthStatus.RED;
                break;
            }
            if (indexHealth.getStatus() == ClusterHealthStatus.YELLOW) {
                response.status = ClusterHealthStatus.YELLOW;
            }
        }
    }
    return response;
}
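Stripped of the bookkeeping, the method above boils down to one per-shard rule plus a "worst status wins" roll-up, applied first across the shards of an index and then across the indices of the cluster. The following is a minimal sketch of that idea, not the Elasticsearch implementation: `HealthRollup` and its methods are hypothetical names, it relies on the enum declaration order instead of the two-branch loop used above, and it leaves out the validation-failure, empty-index and global-block checks, all of which force RED in the real code.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model, not Elasticsearch code.
final class HealthRollup {
    // Declared from best to worst so that compareTo() orders by severity.
    enum Status { GREEN, YELLOW, RED }

    // Shard level: RED without an active primary, GREEN only if every copy is active.
    static Status shardStatus(boolean primaryActive, int activeCopies, int totalCopies) {
        if (!primaryActive) {
            return Status.RED;
        }
        return activeCopies == totalCopies ? Status.GREEN : Status.YELLOW;
    }

    // Index and cluster level: start from GREEN and keep the worst status seen.
    // Note: an empty list yields GREEN here, whereas the real code marks an index with no shards RED.
    static Status worst(List<Status> statuses) {
        Status result = Status.GREEN;
        for (Status status : statuses) {
            if (status.compareTo(result) > 0) {
                result = status;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Index A: one fully active shard, one shard with an inactive replica.
        Status indexA = worst(Arrays.asList(shardStatus(true, 2, 2), shardStatus(true, 1, 2)));
        // Index B: a shard whose primary is not active.
        Status indexB = worst(Arrays.asList(shardStatus(false, 0, 2)));
        System.out.println(indexA);                                // prints YELLOW
        System.out.println(indexB);                                // prints RED
        System.out.println(worst(Arrays.asList(indexA, indexB)));  // prints RED
    }
}
```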