ES cluster stuck in red: unstable after restarts, nodes keep going down


weixin_30621711


Tags: big data, runtime

Original link: http://www.cnblogs.com/bigben0123/p/11174673.html


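Cluster settings (PUT _cluster/settings) can be persistent (they survive a full cluster restart) or transient (they are lost after a full cluster restart).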

Check the cluster status and the status of each index. Any red index that is no longer needed can simply be deleted:

GET /_cluster/health?level=indices

DELETE /.monitoring-kibana-6-2019.07.11/
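When the health output is long, a few lines of Python can pick out the red indices. This is a minimal sketch that assumes you have already fetched and decoded the JSON returned by GET /_cluster/health?level=indices; the sample response is made up for illustration.

```python
def red_indices(health: dict) -> list:
    """Return the names of the red indices in a
    GET /_cluster/health?level=indices response, sorted."""
    indices = health.get("indices", {})
    return sorted(name for name, info in indices.items()
                  if info.get("status") == "red")


# Hypothetical, trimmed-down response for illustration only.
sample = {
    "cluster_name": "demo",
    "status": "red",
    "indices": {
        ".monitoring-kibana-6-2019.07.11": {"status": "red"},
        "log4j-emobilelog": {"status": "green"},
    },
}

for name in red_indices(sample):
    print(name)  # DELETE candidates, only after confirming they are not needed
```

The function only lists candidates; deciding that an index is really disposable is still a manual step.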

List all unassigned shards (in the end, shards should be spread evenly across the nodes):

curl -s 'localhost:9200/_cat/shards?h=index,shard,prirep,state,unassigned.reason' | grep UNASSIGNED
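The grep can also be done in Python if you want the results as data. A minimal sketch, assuming the output of _cat/shards with exactly the h= columns used above and no header row; the sample text is invented.

```python
def unassigned_shards(cat_output: str) -> list:
    """Parse the output of
    _cat/shards?h=index,shard,prirep,state,unassigned.reason
    (no header row) and return (index, shard, prirep, reason) tuples
    for shards in the UNASSIGNED state."""
    result = []
    for line in cat_output.splitlines():
        cols = line.split()
        # Assigned shards have an empty reason column, so they have 4 fields.
        if len(cols) >= 4 and cols[3] == "UNASSIGNED":
            reason = cols[4] if len(cols) > 4 else ""
            result.append((cols[0], cols[1], cols[2], reason))
    return result


# Made-up output for illustration.
sample = """\
log4j-emobilelog 0 p STARTED
log4j-emobilelog 0 r UNASSIGNED NODE_LEFT
.kibana          0 p UNASSIGNED CLUSTER_RECOVERED
"""
print(unassigned_shards(sample))
```

Splitting on whitespace works here because every column except the last is always filled; adding more columns to h= would need a sturdier parser (or format=json).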

Find out why a shard failed to allocate:

GET /_cluster/allocation/explain?pretty
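Without a body, the explain API picks an arbitrary unassigned shard; you can instead send a body naming the index, shard number and whether you mean the primary. The sketch below builds that body and condenses a response into one line. It assumes the field names of ES 5.x and later responses (unassigned_info.reason, can_allocate, allocate_explanation), and the sample response is hypothetical.

```python
def explain_body(index: str, shard: int, primary: bool) -> dict:
    """Request body asking _cluster/allocation/explain about one shard."""
    return {"index": index, "shard": shard, "primary": primary}


def summarize_explain(resp: dict) -> str:
    """Condense an allocation-explain response into one line.
    Missing keys are reported as 'n/a'."""
    role = "primary" if resp.get("primary") else "replica"
    reason = resp.get("unassigned_info", {}).get("reason", "n/a")
    decision = resp.get("can_allocate", "n/a")
    detail = resp.get("allocate_explanation", "")
    return (f'{resp.get("index")}[{resp.get("shard")}] {role}: '
            f"reason={reason}, can_allocate={decision}. {detail}").strip()


# Hypothetical response, trimmed to the fields used above.
sample = {
    "index": "eslog1",
    "shard": 4,
    "primary": True,
    "current_state": "unassigned",
    "unassigned_info": {"reason": "NODE_LEFT"},
    "can_allocate": "no_valid_shard_copy",
    "allocate_explanation": "no valid copy of this shard was found in the cluster",
}
print(summarize_explain(sample))
```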

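Restrict and delay shard reallocation, to reduce the load caused by an immediate rebalance when one node of the cluster is restarted. During a restart, reallocation is therefore usually switched off: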

PUT _cluster/settings
{
    "persistent": {
        "cluster.routing.allocation.enable": "primaries",
        "cluster.routing.rebalance.enable": "none"
    }
}

and delay reallocating the shards of a node that has left (15 minutes here):

PUT /_all/_settings
{
    "settings": {
        "index.unassigned.node_left.delayed_timeout": "15m"
    }
}
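If these bodies are sent from a script, it helps to generate both the "during restart" and the "after restart" versions from one place. A minimal sketch: in ES 5.0 and later, setting a cluster setting to null resets it to its default ("all" for both keys), which is how the after-restart body undoes the first one.

```python
import json


def allocation_settings(during_restart: bool) -> dict:
    """Body for PUT _cluster/settings.

    During a restart only primaries are allocated and rebalancing stops.
    Afterwards each key is set to None (JSON null), which resets it to
    its default ("all")."""
    value_alloc = "primaries" if during_restart else None
    value_rebal = "none" if during_restart else None
    return {"persistent": {
        "cluster.routing.allocation.enable": value_alloc,
        "cluster.routing.rebalance.enable": value_rebal,
    }}


def delayed_timeout_body(timeout: str = "15m") -> dict:
    """Body for PUT /_all/_settings delaying reallocation after a node leaves."""
    return {"settings": {"index.unassigned.node_left.delayed_timeout": timeout}}


print(json.dumps(allocation_settings(True)))
print(json.dumps(allocation_settings(False)))  # keys serialised as null
print(json.dumps(delayed_timeout_body()))
```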

# Dynamically change the number of replicas of an index
curl -XPUT 'http://168.7.1.67:9200/log4j-emobilelog/_settings' -H 'Content-Type: application/json' -d '{
"number_of_replicas" : 2
}'
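ES 6.0 and later reject request bodies sent without a Content-Type header, so every curl call with -d needs -H 'Content-Type: application/json'. If you drive these calls from Python, a standard-library sketch could look like this; it only builds the request (host and index taken from the example above), and urllib.request.urlopen(req) would actually send it.

```python
import json
import urllib.request


def es_request(base_url: str, method: str, path: str, body=None):
    """Build (but do not send) a JSON request for the ES REST API.
    The explicit Content-Type header is required by ES 6.0+ for bodies."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )


req = es_request("http://168.7.1.67:9200", "PUT", "/log4j-emobilelog/_settings",
                 {"index": {"number_of_replicas": 2}})
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req).read()
```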

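# Stop ES from allocating this index's shards automatically
# (the old "cluster.routing.allocation.disable_allocation" setting is obsolete;
#  use the per-index setting below, or "cluster.routing.allocation.enable": "none" cluster-wide)
curl -XPUT 'http://168.7.1.67:9200/log4j-emobilelog/_settings' -H 'Content-Type: application/json' -d '{
"index.routing.allocation.enable" : "none"
}'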

# Manually move a shard
curl -XPOST 'http://168.7.1.67:9200/_cluster/reroute' -H 'Content-Type: application/json' -d '{
"commands" : [{
"move" : {
"index" : "log4j-emobilelog",
"shard" : 0,
"from_node" : "es-0",
"to_node" : "es-3"
}
}]
}'
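Generating reroute bodies rather than typing them avoids quoting and comma mistakes. A minimal sketch for the move command, using the index and node names from the example; the check that the two nodes differ is an extra safeguard added here, not something ES is documented to require in this form. Sending the body to _cluster/reroute?dry_run first shows the resulting routing without applying it.

```python
import json


def move_command(index: str, shard: int, from_node: str, to_node: str) -> dict:
    """Body for POST _cluster/reroute that moves one started shard."""
    if from_node == to_node:
        raise ValueError("from_node and to_node must be different nodes")
    return {"commands": [{"move": {
        "index": index,
        "shard": shard,
        "from_node": from_node,
        "to_node": to_node,
    }}]}


print(json.dumps(move_command("log4j-emobilelog", 0, "es-0", "es-3"), indent=2))
```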

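# Manually allocate a shard (the "allocate" command was removed in ES 5.0;
# for a replica shard use "allocate_replica")
curl -XPOST 'http://168.7.1.67:9200/_cluster/reroute' -H 'Content-Type: application/json' -d '{
"commands" : [{
"allocate_replica" : {
"index" : ".kibana",
"shard" : 0,
"node" : "es-2"
}
}]
}'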
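Hand-written JSON bodies easily pick up a trailing comma after the last field, which ES rejects as a parse error. Building the body with json.dumps rules that out. A minimal sketch for allocate_replica (ES 5.0+), with the names from the example:

```python
import json


def allocate_replica_body(index: str, shard: int, node: str) -> str:
    """JSON text for POST _cluster/reroute using allocate_replica (ES 5.0+).
    json.dumps never produces trailing commas."""
    return json.dumps({"commands": [{"allocate_replica": {
        "index": index, "shard": shard, "node": node,
    }}]})


print(allocate_replica_body(".kibana", 0, "es-2"))
```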

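Set the recovery concurrency and the recovery bandwidth per second (node_concurrent_recoveries defaults to 2, so 100 is very aggressive):

PUT _cluster/settings
{
    "persistent": {
        "cluster.routing.allocation.node_concurrent_recoveries": 100,
        "indices.recovery.max_bytes_per_sec": "40mb"
    }
}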

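To see what indices.recovery.max_bytes_per_sec means in practice, divide the shard size by the throttle. A sketch, assuming ES's binary byte units (1mb = 1024 * 1024 bytes); the 40gb shard is hypothetical, and the result is only a lower bound because the throttle is applied per node and shared by the recoveries running on it.

```python
import re

# Byte-size suffixes as used by ES settings (binary multiples).
_UNITS = {"b": 1, "kb": 1024, "mb": 1024 ** 2, "gb": 1024 ** 3, "tb": 1024 ** 4}


def parse_bytes(value: str) -> int:
    """Parse sizes such as "40mb" or "1.5gb" into bytes."""
    m = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([kmgt]?b)\s*", value.lower())
    if m is None:
        raise ValueError(f"unrecognised size: {value!r}")
    return int(float(m.group(1)) * _UNITS[m.group(2)])


def min_recovery_seconds(shard_size: str, max_bytes_per_sec: str) -> float:
    """Lower bound on the time to copy one shard at the recovery throttle."""
    return parse_bytes(shard_size) / parse_bytes(max_bytes_per_sec)


print(min_recovery_seconds("40gb", "40mb"))  # 1024.0
```

About 17 minutes for a single 40gb shard, before any contention, is why the bandwidth setting matters as much as the concurrency setting.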
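For an all-out bulk write, you can disable refresh first (set it back, e.g. to "1s", when the load is finished):
curl -XPUT localhost:9200/my_index/_settings -H 'Content-Type: application/json' -d '{"index":{"refresh_interval":-1}}'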
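The risky part of disabling refresh is forgetting to turn it back on. A context manager guarantees the restore even if the load fails. This is a sketch: put_settings(index, body) is a hypothetical callback that you would implement to send PUT /<index>/_settings, and restore_to assumes the index was using ES's default interval of 1s.

```python
from contextlib import contextmanager


@contextmanager
def refresh_disabled(put_settings, index: str, restore_to: str = "1s"):
    """Turn refresh off for `index` while the with-block runs, then restore it.

    `put_settings(index, body)` is a hypothetical callback that sends
    PUT /<index>/_settings; "1s" is ES's default refresh interval."""
    put_settings(index, {"index": {"refresh_interval": "-1"}})
    try:
        yield
    finally:
        # Runs even if the bulk load raises, so refresh is never left off.
        put_settings(index, {"index": {"refresh_interval": restore_to}})


calls = []
with refresh_disabled(lambda idx, body: calls.append((idx, body)), "my_index"):
    pass  # the bulk indexing would go here
print(calls)
```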

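Temporarily disable replicas (set them back to the desired count once the load is finished):

curl -XPUT 'localhost:9200/my_index/_settings' -H 'Content-Type: application/json' -d '
{
    "index" : {
        "number_of_replicas" : 0
    }
}'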
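The reason dropping replicas speeds up a bulk load is simple arithmetic: every document is indexed on its primary and again on each replica. A small sketch (the 5-primary index is hypothetical):

```python
def writes_per_document(replicas: int) -> int:
    """Each document is indexed on its primary shard plus every replica."""
    return 1 + replicas


def total_shard_copies(primaries: int, replicas: int) -> int:
    """Number of shard copies the cluster has to store and keep in sync."""
    return primaries * (1 + replicas)


print(writes_per_document(1), writes_per_document(0))  # 2 1
print(total_shard_copies(5, 1))                        # 10
```

The trade-off: with 0 replicas, losing a node during the load makes its shards unavailable, so restore the replica count as soon as the load is done.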

View the thread pools and the current node information:
curl -XGET 'http://localhost:9200/_nodes/stats?pretty'

curl -XGET 'localhost:9200/_cat/nodes?h=name,ram.current,ram.percent,ram.max,fielddata.memory_size,query_cache.memory_size,request_cache.memory_size,percolate.memory_size,segments.memory,segments.index_writer_memory,segments.index_writer_max_memory,segments.version_map_memory,segments.fixed_bitset_memory,heap.current,heap.percent,heap.max&v'

Clear the caches:
curl -XPOST 'localhost:9200/_cache/clear'
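With ?v the _cat APIs print a header row, which makes the output easy to turn into dictionaries, for example to flag nodes with high heap usage. A sketch, assuming no empty cells (columns are split on whitespace); the sample output and the 75% threshold are made up. Adding format=json to the _cat URL avoids text parsing altogether.

```python
def parse_cat_table(text: str) -> list:
    """Turn `_cat/...?v` output (header row + rows) into dicts.
    Assumes no empty cells, because columns are split on whitespace."""
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return []
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]


# Made-up output of _cat/nodes?h=name,heap.percent,ram.percent&v
sample = """\
name heap.percent ram.percent
es-0           82          97
es-1           41          90
"""
rows = parse_cat_table(sample)
print([r["name"] for r in rows if int(r["heap.percent"]) > 75])  # ['es-0']
```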

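Points to watch when restarting an ES node:
1. Pause the programs that write data
(if conditions allow; production environments usually don't. In our setup, if a write to ES fails the data is persisted locally and re-written to ES later, so we can afford it! In practice we almost never need to restart the entire ES cluster.)
2. Disable cluster shard allocation
3. Run POST /_flush/synced manually (synced flush is deprecated in 7.6 and removed in 8.0; use POST /_flush there)
4. Restart the node
5. Re-enable cluster shard allocation
6. Wait for recovery to finish and the cluster health status to turn green
7. Resume the data-writing programs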
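The checklist above can be encoded as an ordered list of REST calls, which makes it harder to skip a step when scripting a rolling restart. This is a sketch: steps that happen outside ES use a made-up pseudo-method "MANUAL", the 30m health timeout is an arbitrary choice, and synced_flush=False switches to the plain flush needed on ES 8.x.

```python
def restart_plan(node: str, synced_flush: bool = True) -> list:
    """Ordered (method, path, body) steps for restarting one node.
    Steps that happen outside ES use the pseudo-method "MANUAL"."""
    flush_path = "/_flush/synced" if synced_flush else "/_flush"
    return [
        ("MANUAL", "pause the programs that write data", None),
        ("PUT", "/_cluster/settings",
         {"persistent": {"cluster.routing.allocation.enable": "primaries"}}),
        ("POST", flush_path, None),
        ("MANUAL", f"restart {node}", None),
        ("PUT", "/_cluster/settings",
         {"persistent": {"cluster.routing.allocation.enable": None}}),
        ("GET", "/_cluster/health?wait_for_status=green&timeout=30m", None),
        ("MANUAL", "resume the programs that write data", None),
    ]


for method, path, body in restart_plan("es-2"):
    print(method, path, body if body is not None else "")
```

Setting allocation.enable back to None (JSON null) restores the default "all", which is step 5.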

!!! Data without an index template, whose field types keep changing, can easily drag ES down.
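An index template pins the field types up front so that new indices do not guess them from the first documents. The sketch below builds a body for the composable PUT _index_template/<name> API (ES 7.8+); on ES 6.x, the version used in this article, the older PUT _template API applies and its body layout differs. The log4j-* pattern and field list are hypothetical.

```python
import json


def index_template(patterns, properties, dynamic="strict") -> dict:
    """Body for PUT _index_template/<name> (composable templates, ES 7.8+).

    dynamic="strict" rejects documents that carry unmapped fields;
    dynamic=False would accept them but leave the new fields unindexed."""
    return {
        "index_patterns": list(patterns),
        "template": {"mappings": {"dynamic": dynamic, "properties": properties}},
    }


# Hypothetical field list for a log index.
body = index_template(
    ["log4j-*"],
    {
        "@timestamp": {"type": "date"},
        "level": {"type": "keyword"},
        "message": {"type": "text"},
    },
)
print(json.dumps(body, indent=2))
```

Whether "strict" or False fits depends on whether a surprise field should fail the write or be silently kept only in _source.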

https://www.elastic.co/guide/en/elasticsearch/guide/current/indexing-performance.html#_using_and_sizing_bulk_requests
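When segment merging slows down writes, the log shows:
now throttling indexing
The default merge throttle is 20MB/s; on SSDs 100-200MB/s is recommended. (These indices.store.throttle.* settings exist only in ES 1.x; ES 2.0 and later throttle merges automatically.)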
PUT /_cluster/settings
{
    "persistent" : {
        "indices.store.throttle.max_bytes_per_sec" : "100mb"
    }
}
If you only ingest data and run no searches, you can even turn throttling off entirely (to turn it back on, set the type back to merge):
PUT /_cluster/settings
{
    "transient" : {
        "indices.store.throttle.type" : "none"
    }
}
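To know whether merges are actually holding back indexing, search the ES log for the message quoted above. A minimal sketch; the log lines are invented, and the real log file location depends on path.logs.

```python
def throttling_events(lines):
    """Yield the log lines that report indexing being throttled by merges."""
    for line in lines:
        if "now throttling indexing" in line:
            yield line.rstrip("\n")


# Hypothetical log excerpt; real lines carry more detail.
log = [
    "[2019-07-11T10:00:01][INFO ][o.e.i.e.Engine] [es-0] [eslog1][4] now throttling indexing\n",
    "[2019-07-11T10:00:05][INFO ][o.e.c.m.MetaDataMappingService] [es-0] update_mapping\n",
]
events = list(throttling_events(log))
print(len(events))  # 1
```

On a real node you would pass an open file object instead of the list, since iterating a file yields its lines.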

Reducing disk I/O pressure on spinning (mechanical) disks:
(This setting will allow max_thread_count + 2 threads to operate on the disk at one time, so a setting of 1 will allow three threads.)
For SSDs, you can ignore this setting. The default is Math.min(3, Runtime.getRuntime().availableProcessors() / 2), which works well for SSD.

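This goes in the elasticsearch.yml config file (in ES 5.0 and later, index-level settings can no longer be set in elasticsearch.yml; set it per index or in an index template instead):
index.merge.scheduler.max_thread_count: 1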
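The thread arithmetic in the quoted guide text can be checked directly. This sketch implements the default exactly as quoted (Java integer division); newer ES versions compute the default with a somewhat different formula, so treat it as the guide's description rather than current behaviour.

```python
def guide_default_max_thread_count(processors: int) -> int:
    """Default as quoted: Math.min(3, availableProcessors() / 2),
    with Java integer division."""
    return min(3, processors // 2)


def concurrent_disk_threads(max_thread_count: int) -> int:
    """'max_thread_count + 2 threads ... operate on the disk at one time'."""
    return max_thread_count + 2


print(concurrent_disk_threads(1))           # 3, the spinning-disk setting
print(guide_default_max_thread_count(8))    # 3
print(guide_default_max_thread_count(4))    # 2
```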

Finally, you can increase index.translog.flush_threshold_size from the default 512 MB to something larger, such as 1 GB.
!!! This reduces disk pressure but increases memory pressure.
This allows larger segments to accumulate in the translog before a flush occurs.
By letting larger segments build, you flush less often, and the larger segments merge less often.
All of this adds up to less disk I/O overhead and better indexing rates.
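A rough count shows the effect of raising the threshold. The sketch counts only flushes triggered by the translog reaching index.translog.flush_threshold_size (other flush triggers are ignored), per shard, for a hypothetical 100 GiB of indexed data; translog_body is the corresponding PUT /<index>/_settings body.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2

# Body for PUT /<index>/_settings raising the threshold to 1gb.
translog_body = {"index": {"translog.flush_threshold_size": "1gb"}}


def size_triggered_flushes(ingested_bytes: int, threshold_bytes: int) -> int:
    """Rough number of flushes caused by the translog reaching its size
    threshold; flushes triggered by other conditions are ignored."""
    return ingested_bytes // threshold_bytes


print(size_triggered_flushes(100 * GIB, 512 * MIB))  # 200
print(size_triggered_flushes(100 * GIB, 1 * GIB))    # 100
```

Halving the number of flushes is where the lower disk I/O comes from; the cost is a larger translog to replay if the node restarts before a flush.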

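Once you know which shard of which index is the problem, repair it manually by allocating it with reroute. (The allocate command with allow_primary is ES 1.x/2.x syntax; from ES 5.0 on, a primary is forced with allocate_stale_primary, or allocate_empty_primary as a last resort, and both require accept_data_loss.)

curl -XPOST '{ESIP}:9200/_cluster/reroute' -H 'Content-Type: application/json' -d '{
    "commands" : [ {
          "allocate_stale_primary" : {
              "index" : "eslog1",
              "shard" : 4,
              "node" : "es1",
              "accept_data_loss" : true
          }
        }
    ]
}'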
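Because forcing a stale primary can discard writes that only the lost copy had, it is worth making the data-loss acknowledgement explicit in any script too. A minimal sketch for ES 5.0 and later, reusing the names from the example; the helper refuses to build the body unless the caller opts in, mirroring ES's own requirement.

```python
import json


def allocate_stale_primary(index: str, shard: int, node: str,
                           accept_data_loss: bool = False) -> dict:
    """Body for POST _cluster/reroute forcing a stale copy on `node` to become
    the primary (ES 5.0+). ES refuses the command unless accept_data_loss
    is true, so this helper refuses it too."""
    if not accept_data_loss:
        raise ValueError("allocate_stale_primary needs accept_data_loss=True")
    return {"commands": [{"allocate_stale_primary": {
        "index": index,
        "shard": shard,
        "node": node,
        "accept_data_loss": True,
    }}]}


print(json.dumps(allocate_stale_primary("eslog1", 4, "es1", accept_data_loss=True)))
```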

https://www.cnblogs.com/seaspring/p/9322582.html

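Internal/external network configuration for ELK:
network.bind_host: one or more addresses to bind to, so the node can be reached from both the internal and the external network.
network.publish_host: the address used for communication between the nodes of the cluster. If the server has both internal and external addresses, set this to its internal address; shard replication will be faster.
network.host: 0.0.0.0 binds to the IPs of all network interfaces, for a server with several addresses (internal and external). (If the two options above are not set, they default to this value.)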
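Choosing the internal address for publish_host can be automated when provisioning nodes. A sketch using Python's ipaddress module to prefer the first private address; the two addresses are made up, and the rendered lines are just the two elasticsearch.yml settings described above (bind_host written as a YAML list).

```python
import ipaddress


def pick_publish_host(addresses):
    """Return the first private (internal) address, else the first one."""
    if not addresses:
        raise ValueError("no addresses given")
    for addr in addresses:
        if ipaddress.ip_address(addr).is_private:
            return addr
    return addresses[0]


def network_yml(bind_hosts, publish_host) -> str:
    """Render the bind_host and publish_host lines for elasticsearch.yml."""
    return ("network.bind_host: [" + ", ".join(bind_hosts) + "]\n"
            f"network.publish_host: {publish_host}")


addrs = ["1.2.3.4", "10.0.0.12"]  # hypothetical external and internal IPs
print(network_yml(addrs, pick_publish_host(addrs)))
```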

