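An Elasticsearch cluster is made up of nodes, and each index is split into shards that are distributed across those nodes. Each shard has a primary copy and can have replica copies; keeping the copies on different nodes is what provides high availability.
I recently took over an Elasticsearch environment that was unusable from the moment I opened it.
1. Check the cluster health: the status is red and unassigned_shards is 4.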
http://localhost:9200/_cluster/health?pretty
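The same check can be scripted. This is a minimal sketch. It assumes curl and awk are installed and the node is at localhost:9200. The ES_URL variable and the function names are hypothetical. _cat/health takes an h parameter that selects columns. Here the columns are status and unassign, the number of unassigned shards.

```shell
# Hypothetical base URL; override with ES_URL=... if the node is elsewhere.
ES_URL="${ES_URL:-http://localhost:9200}"

# Print one line "<status> <unassigned shards>", e.g. "red 4".
cluster_health() {
  curl -s "$ES_URL/_cat/health?h=status,unassign"
}

# Return 0 when a "<status> <unassigned>" line reports a green cluster.
is_green() {
  [ "$(printf '%s\n' "$1" | awk '{print $1}')" = "green" ]
}

# Print the unassigned-shard count from a "<status> <unassigned>" line.
unassigned_count() {
  printf '%s\n' "$1" | awk '{print $2}'
}
```

Usage: line=$(cluster_health); is_green "$line" || echo "unassigned shards: $(unassigned_count "$line")"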
2. List the indices. My fallback plan was to delete them all, but there were hardly any indices, and the cluster was still unusable.
http://127.0.0.1:9200/_cat/indices?v
Delete an index with: curl -XDELETE 'http://127.0.0.1:9200/index-name'
Check whether any shards are unassigned; some were:
http://127.0.0.1:9200/_cat/shards
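To see which shard copies are unassigned and why, ask _cat/shards to add the unassigned.reason column. This is a sketch, and the function names are hypothetical. The two parsers only read text on stdin, so they also work on saved output. A red status means at least one primary shard is unassigned. Unassigned replicas alone only turn the cluster yellow.

```shell
ES_URL="${ES_URL:-http://localhost:9200}"

# List every shard copy as: index shard prirep state unassigned.reason
shards_with_reason() {
  curl -s "$ES_URL/_cat/shards?h=index,shard,prirep,state,unassigned.reason"
}

# Read lines in the column order above on stdin and print how many
# copies have state UNASSIGNED.
count_unassigned() {
  awk '$4 == "UNASSIGNED" { n++ } END { print n + 0 }'
}

# Read the same lines on stdin and print "index shard reason"
# for unassigned primaries only (prirep column "p").
unassigned_primaries() {
  awk '$3 == "p" && $4 == "UNASSIGNED" { print $1, $2, $5 }'
}
```

Usage: shards_with_reason | unassigned_primaries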
3. Try a reroute to get the shards allocated. It retried a few times and then gave up without success:
curl -XPOST 'localhost:9200/_cluster/reroute?retry_failed=true'
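By default, Elasticsearch stops trying to allocate a shard after index.allocation.max_retries failed attempts (5 by default). That would explain why it retried a few times and then gave up. retry_failed=true only triggers one more round of attempts, so the shards fail again while the underlying cause is still there.

The sketch below retries and then waits for the cluster to reach at least yellow. It uses the wait_for_status, timeout and filter_path parameters of _cluster/health. The function name and the 30s default are hypothetical.

```shell
ES_URL="${ES_URL:-http://localhost:9200}"

# Retry shards that hit the allocation retry limit, then block until the
# cluster is at least yellow or the wait time runs out.
# $1 = wait time (hypothetical default: 30s)
reroute_and_wait() {
  wait_time="${1:-30s}"
  curl -s -XPOST "$ES_URL/_cluster/reroute?retry_failed=true" >/dev/null
  # Prints e.g. {"status":"red","timed_out":true} if nothing improved.
  curl -s "$ES_URL/_cluster/health?wait_for_status=yellow&timeout=$wait_time&filter_path=status,timed_out"
}
```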
4. Inspect the settings of one of the indices; nothing looked wrong:
http://localhost:9200/index-name/_settings?pretty
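5. Ask the cluster why the shards cannot be allocated. The cause: the disk was nearly full. Usage had passed the disk watermark, so Elasticsearch stopped allocating shards to the node, and past the flood-stage watermark it also stops accepting writes: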
curl -XGET 'localhost:9200/_cluster/allocation/explain?pretty'
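Without a request body, allocation/explain reports on the first unassigned shard it finds. To ask about one specific shard copy, pass index, shard and primary in the body. In the response, a disk-space problem shows up as a "no" decision from the decider named disk_threshold. This is a sketch; the function names are hypothetical.

```shell
ES_URL="${ES_URL:-http://localhost:9200}"

# Build the request body that targets one shard copy.
# $1 = index name, $2 = shard number, $3 = true (primary) or false (replica)
explain_body() {
  printf '{"index":"%s","shard":%s,"primary":%s}' "$1" "$2" "$3"
}

# Explain why that specific shard copy is or is not assigned.
explain_shard() {
  curl -s -H 'Content-Type: application/json' \
    -XGET "$ES_URL/_cluster/allocation/explain?pretty" \
    -d "$(explain_body "$1" "$2" "$3")"
}
```

Usage: explain_shard index-name 0 true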
6. Raise the disk watermarks:
curl -H "Content-Type: application/json" -XPUT '127.0.0.1:9200/_cluster/settings' -d '{"transient": { "cluster.routing.allocation.disk.watermark.low": "95%", "cluster.routing.allocation.disk.watermark.high": "98%", "cluster.routing.allocation.disk.watermark.flood_stage": "99%", "cluster.info.update.interval": "1m"}}'
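_cat/allocation reports disk.percent for each node, which shows which nodes are over a given watermark. Once usage passes the flood-stage watermark, Elasticsearch also puts an index.blocks.read_only_allow_delete block on the indices on that node. From 7.4 on, the block is lifted automatically once usage falls below the high watermark. On older versions it must be removed by hand. Transient cluster settings, as used above, are deprecated from 7.16 in favour of persistent ones. The sketch below uses hypothetical function names.

```shell
ES_URL="${ES_URL:-http://localhost:9200}"

# Print "<node> <disk.percent>" for each node.
disk_usage() {
  curl -s "$ES_URL/_cat/allocation?h=node,disk.percent"
}

# Read "<node> <percent>" lines on stdin and print the nodes whose disk
# usage is at or above $1 percent. Rows without a percentage
# (e.g. the UNASSIGNED row) are skipped.
nodes_over() {
  awk -v limit="$1" '$2 != "" && $2 + 0 >= limit + 0 { print $1 }'
}

# Remove the flood-stage read-only block from all indices
# (needed by hand on versions before 7.4).
clear_read_only() {
  curl -s -H 'Content-Type: application/json' -XPUT "$ES_URL/_all/_settings" \
    -d '{"index.blocks.read_only_allow_delete": null}'
}
```

Usage: disk_usage | nodes_over 95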
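7. There is only one node. Elasticsearch never places a replica on the same node as its primary, so the number of nodes must be at least number_of_replicas + 1. With one node, set the replica count to 0, either for all indices (first command) or for a single index (second command):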
curl -XPUT 'http://localhost:9200/_settings' -H 'content-Type:application/json' -d'{"number_of_replicas": 0}'
curl -XPUT 'http://localhost:9200/index-name/_settings' -H 'content-Type:application/json' -d'
{"number_of_replicas": 0}'
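The replica rule is simple arithmetic. Elasticsearch never puts two copies of the same shard on one node, so fully assigning number_of_replicas = r needs at least r + 1 nodes. This is a sketch; the function names are hypothetical.

```shell
# Largest number_of_replicas that can be fully assigned on $1 nodes:
# each shard needs (replicas + 1) distinct nodes.
max_replicas() {
  if [ "$1" -lt 1 ]; then
    echo 0
  else
    echo $(( $1 - 1 ))
  fi
}

# Return 0 when $2 replicas per shard can all be assigned on $1 nodes.
replicas_fit() {
  [ "$1" -ge $(( $2 + 1 )) ]
}
```

Another option is to set index.auto_expand_replicas to "0-1". Elasticsearch then adjusts the replica count to the number of available nodes by itself.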
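8. The cluster goes from red to green.
9. Raise the maximum number of results a search can page through (index.max_result_window):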
curl -H "Content-Type: application/json" -XPUT http://127.0.0.1:9200/coupon-tweets/_settings -d '{ "index" : { "max_result_window" : 200000000}}'
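With from/size paging, Elasticsearch rejects a search whose from + size exceeds index.max_result_window, which defaults to 10000. Each shard has to collect from + size hits, so a window as large as the one above can be expensive for deep pages. The sketch below shows the check; the function name is hypothetical.

```shell
# Default value of index.max_result_window.
DEFAULT_MAX_RESULT_WINDOW=10000

# Return 0 when a from/size request stays inside the result window.
# $1 = from, $2 = size, $3 = max_result_window (defaults to 10000)
within_window() {
  window="${3:-$DEFAULT_MAX_RESULT_WINDOW}"
  [ $(( $1 + $2 )) -le "$window" ]
}
```

Usage: within_window 9990 20 || echo "raise max_result_window or page with search_after"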