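Notes on a lost-shard problem in Elasticsearch.
The cluster runs ES 5.4.3: 3 nodes on three separate hosts, with indices configured as 5 primary shards and 1 replica. For a stress test I needed to raise each node's Java heap from 2G to 6G. Since this was only a test cluster, after editing jvm.options (in this setup only the copy under the ES installation directory took effect; the one under the config directory did not), I restarted the nodes one after another, only a few seconds apart. After that flurry of activity the cluster came back up, everyone was happy, and testing continued.
About 10 days later a developer came to me: a refresh operation was taking more than 10s, and the problem reproduced every time. Once I had the affected index, GET /_cluster/state showed that shards 0 and 3 were unassigned. I then dug further with the allocation explain API: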
GET /_cluster/allocation/explain
{
  "index": "dcvs_nonmotorvehicle",
  "shard": 3,
  "primary": true
}
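The two checks above (spotting unassigned shards in the cluster state, then asking the explain API about one of them) can also be scripted. What follows is only a minimal sketch in Python using the standard library. The address in `ES_URL` and the helper names `unassigned_shards`, `explain_body` and `es_request` are assumptions made for illustration and are not part of the original setup; the parsing assumes the `routing_table.indices.<index>.shards` layout that the cluster state API returns.

```python
import json
import urllib.request

# Assumption: address of any node in the cluster; adjust for the real environment.
ES_URL = "http://localhost:9200"


def unassigned_shards(cluster_state, index):
    """Return sorted (shard_number, is_primary) pairs for unassigned copies of `index`."""
    shards = cluster_state["routing_table"]["indices"][index]["shards"]
    found = []
    for copies in shards.values():
        for copy in copies:
            if copy["state"] == "UNASSIGNED":
                found.append((copy["shard"], copy["primary"]))
    return sorted(found)


def explain_body(index, shard, primary=True):
    """Build the request body for the allocation explain API."""
    return {"index": index, "shard": shard, "primary": primary}


def es_request(path, body=None):
    """Call the cluster and decode the JSON reply.

    urllib sends a GET when there is no body and a POST when there is one.
    """
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(
        ES_URL + path,
        data=data,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Against a live cluster, one would pass the result of `es_request("/_cluster/state/routing_table")` to `unassigned_shards(state, "dcvs_nonmotorvehicle")`, then call `es_request("/_cluster/allocation/explain", explain_body("dcvs_nonmotorvehicle", 3))` for each shard of interest. The parsing is kept separate from the HTTP call so it can be checked without a running cluster.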
The explain request returned:
{
  "index": "dcvs_nonmotorvehicle",
  "shard": 3,
  "primary": true,
  "current_state": "unassigned",
  "unassigned_info": {
    "reason": "CLUSTER_RECOVERED",
    "at": "2020-04-10T03:40:41.127Z",
    "last_allocation_status": "no_valid_shard_copy"
  },
  "can_allocate": "no_valid_shard_copy",
  "allocate_explanation": "cannot allocate because a previous copy of the primary shard existed but can no longer be found on the nodes in the cluster",
  "node_allocation_decisions": [
    {
      "node_id": "TklXzLKySf-czdu8zZ5hyQ",
      "node_name": "MYSQL2",
      "transport_address": "10.45.156.202:9300",
      "node_attribute