A Post-Mortem of an Elasticsearch Incident

From the alerts, the business side was reporting interface timeouts, and at the same time the Elasticsearch error log was showing:

Caused by: org.elasticsearch.common.util.concurrent.EsRejectedExecutionException: rejected execution of org.elasticsearch.transport.TcpTransport$RequestHandler@6fbaf20b on EsThreadPoolExecutor[search, queue capacity = 1000, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@1f159058[Running, pool size = 49, active threads = 49, queued tasks = 1000, completed tasks = 23765969104]]
	at org.elasticsearch.common.util.concurrent.EsAbortPolicy.rejectedExecution(EsAbortPolicy.java:50) ~[elasticsearch-5.2.2.jar:5.2.2]
	at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:823) ~[?:1.8.0_71]
	at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1369) ~[?:1.8.0_71]
	at org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor.doExecute(EsThreadPoolExecutor.java:94) ~[elasticsearch-5.2.2.jar:5.2.2]
	at org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor.execute(EsThreadPoolExecutor.java:89) ~[elasticsearch-5.2.2.jar:5.2.2]
	at org.elasticsearch.transport.TcpTransport.handleRequest(TcpTransport.java:1445) [elasticsearch-5.2.2.jar:5.2.2]
	at org.elasticsearch.transport.TcpTransport.messageReceived(TcpTransport.java:1329) [elasticsearch-5.2.2.jar:5.2.2]
	at org.elasticsearch.transport.netty4.Netty4MessageChannelHandler.channelRead(Netty4MessageChannelHandler.java:74) [transport-netty4-5.2.2.jar:5.2.2]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363) [netty-transport-4.1.7.Final.jar:4.1.7.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:349) [netty-transport-4.1.7.Final.jar:4.1.7.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:341) [netty-transport-4.1.7.Final.jar:4.1.7.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:293) [netty-codec-4.1.7.Final.jar:4.1.7.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:280) [netty-codec-4.1.7.Final.jar:4.1.7.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:396) [netty-codec-4.1.7.Final.jar:4.1.7.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:248) [netty-codec-4.1.7.Final.jar:4.1.7.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363) ~[netty-transport-4.1.7.Final.jar:4.1.7.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:349) ~[netty-transport-4.1.7.Final.jar:4.1.7.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:341) ~[netty-transport-4.1.7.Final.jar:4.1.7.Final]
	at io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86) ~[netty-transport-4.1.7.Final.jar:4.1.7.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363) [netty-transport-4.1.7.Final.jar:4.1.7.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:349) [netty-transport-4.1.7.Final.jar:4.1.7.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:341) [netty-transport-4.1.7.Final.jar:4.1.7.Final]
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1334) [netty-transport-4.1.7.Final.jar:4.1.7.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:363) [netty-transport-4.1.7.Final.jar:4.1.7.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:349) [netty-transport-4.1.7.Final.jar:4.1.7.Final]
	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:926) [netty-transport-4.1.7.Final.jar:4.1.7.Final]
	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:129) [netty-transport-4.1.7.Final.jar:4.1.7.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:642) [netty-transport-4.1.7.Final.jar:4.1.7.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:527) [netty-transport-4.1.7.Final.jar:4.1.7.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:481) [netty-transport-4.1.7.Final.jar:4.1.7.Final]
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:441) [netty-transport-4.1.7.Final.jar:4.1.7.Final]
	at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858) [netty-common-4.1.7.Final.jar:4.1.7.Final]
	at java.lang.Thread.run(Thread.java:745) [?:1.8.0_71]
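
The exception itself tells most of the story: the search thread pool (49 threads, queue of 1000) was saturated and new requests were being rejected. A minimal sketch, assuming the existing ES 5.2 TransportClient (passed in here as a plain Client), of how to confirm this from the Java side by pulling the per-node thread pool counters from the nodes stats API:

```java
import org.elasticsearch.action.admin.cluster.node.stats.NodeStats;
import org.elasticsearch.action.admin.cluster.node.stats.NodesStatsResponse;
import org.elasticsearch.client.Client;
import org.elasticsearch.threadpool.ThreadPoolStats;

public class ThreadPoolRejectionCheck {
    // Print active/queued/rejected counters for the "search" pool on every node.
    public static void printSearchPoolStats(Client client) {
        NodesStatsResponse resp = client.admin().cluster()
                .prepareNodesStats()
                .setThreadPool(true)              // request thread pool statistics
                .get();
        for (NodeStats node : resp.getNodes()) {
            for (ThreadPoolStats.Stats pool : node.getThreadPool()) {
                if ("search".equals(pool.getName())) {
                    System.out.printf("%s search pool: active=%d queue=%d rejected=%d%n",
                            node.getNode().getName(),
                            pool.getActive(), pool.getQueue(), pool.getRejected());
                }
            }
        }
    }
}
```

A steadily climbing rejected counter on a node is the same condition that surfaces as EsRejectedExecutionException on the client.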

Routine response:

Restarted ES and the ES client services; before long the errors started again.

Monitoring showed the cluster load was high, around 70% (normally it stays below 10%), and the cluster status was red, meaning some primary shards were unavailable.
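
For reference, the red status can also be confirmed programmatically; a small sketch against the same client:

```java
import org.elasticsearch.action.admin.cluster.health.ClusterHealthResponse;
import org.elasticsearch.client.Client;
import org.elasticsearch.cluster.health.ClusterHealthStatus;

public class ClusterHealthCheck {
    // Red means at least one primary shard is unassigned.
    public static void printHealth(Client client) {
        ClusterHealthResponse health = client.admin().cluster().prepareHealth().get();
        ClusterHealthStatus status = health.getStatus();   // GREEN / YELLOW / RED
        System.out.printf("status=%s unassignedShards=%d activePrimaryShards=%d%n",
                status, health.getUnassignedShards(), health.getActivePrimaryShards());
    }
}
```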

As an emergency measure we degraded the service: after excluding the large shards holding several hundred GB of historical data, the cluster recovered. A possible implementation is sketched below.
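
The post does not show exactly how the exclusion was done; one plausible sketch, assuming the historical data lives in its own index (the index name below is hypothetical), is simply to close that index so it stops serving search traffic:

```java
import org.elasticsearch.client.Client;

public class DegradeHistoricalIndex {
    // Close the index that holds the huge historical shards so it no longer
    // participates in searches. "history-2016" is a made-up name for illustration.
    public static void closeHistoricalIndex(Client client) {
        client.admin().indices()
                .prepareClose("history-2016")
                .get();
    }
}
```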

The cluster runs ES 5.2 and the client is a TransportClient used roughly as a singleton, so client configuration was ruled out as the cause.
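
For context, a singleton-style TransportClient in 5.2 typically looks roughly like the sketch below; the cluster name and host are placeholders, not the real values from this incident:

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.transport.InetSocketTransportAddress;
import org.elasticsearch.transport.client.PreBuiltTransportClient;

public final class EsClientHolder {
    private static volatile TransportClient client;

    private EsClientHolder() {}

    // One shared client per JVM, created lazily and reused for every request.
    public static TransportClient get() throws UnknownHostException {
        if (client == null) {
            synchronized (EsClientHolder.class) {
                if (client == null) {
                    Settings settings = Settings.builder()
                            .put("cluster.name", "my-cluster")          // placeholder
                            .build();
                    client = new PreBuiltTransportClient(settings)
                            .addTransportAddress(new InetSocketTransportAddress(
                                    InetAddress.getByName("es-node-1"), 9300)); // placeholder
                }
            }
        }
        return client;
    }
}
```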

Reflection:

We still never pinned down which specific business index, or which specific queries, were driving the high load. Our grasp of the underlying system is not deep enough.
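
One way to get that visibility next time is the per-index search slow log, which records slow queries together with their source in the node logs. A hedged sketch, with a hypothetical index pattern and illustrative thresholds:

```java
import org.elasticsearch.client.Client;
import org.elasticsearch.common.settings.Settings;

public class EnableSearchSlowlog {
    // Turn on the search slow log for the suspect indices so that expensive
    // queries are written to the slow log with their full query source.
    public static void enable(Client client) {
        client.admin().indices()
                .prepareUpdateSettings("business-index-*")   // hypothetical index pattern
                .setSettings(Settings.builder()
                        .put("index.search.slowlog.threshold.query.warn", "2s")
                        .put("index.search.slowlog.threshold.fetch.warn", "1s")
                        .build())
                .get();
    }
}
```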

Supplementary notes:

1. Why you should not casually tune ES thread pool parameters

Under heavy concurrent query load, when traffic exceeds what a single Elasticsearch instance in the cluster can handle, the server triggers a protective mechanism and rejects requests. The thread pool sizes are derived from the number of CPU cores on the hardware; simply raising them to several hundred would most likely bring the node down anyway.
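
To make the connection with the rejection log concrete: using the default sizing formula for the search pool quoted from the docs below, a pool size of 49 corresponds to 32 allocated processors (an inference from the log, not a confirmed spec of the machine), and the 1000 queued tasks match the default queue_size:

```java
public class SearchPoolSizeExample {
    public static void main(String[] args) {
        int allocatedProcessors = 32;                              // assumed, inferred from "pool size = 49"
        int searchThreads = ((allocatedProcessors * 3) / 2) + 1;   // default search pool sizing formula
        System.out.println(searchThreads);                         // prints 49, matching the rejection above
    }
}
```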

The key pools to watch:

index: indexing and deleting individual documents
search: count, search, and suggest operations
bulk: bulk requests against indices
refresh: refresh operations (making recent writes searchable)

The official documentation (a newer release than the 5.2.2 involved here, but the model is the same) introduces the pools as follows:

A node uses several thread pools to manage memory consumption. Queues associated with many of the thread pools enable pending requests to be held instead of discarded.

There are several thread pools, but the important ones include:

generic

For generic operations (for example, background node discovery). Thread pool type is scaling.

search

For count/search/suggest operations. Thread pool type is fixed with a size of int((# of allocated processors * 3) / 2) + 1, and queue_size of 1000.

search_throttled

For count/search/suggest/get operations on search_throttled indices. Thread pool type is fixed with a size of 1, and queue_size of 100.

search_coordination

For lightweight search-related coordination operations. Thread pool type is fixed with a size of a max of min(5, (# of allocated processors) / 2), and queue_size of 1000.

get

For get operations. Thread pool type is fixed with a size of # of allocated processors, queue_size of 1000.

analyze

For analyze requests. Thread pool type is fixed with a size of 1, queue size of 16.

write

For single-document index/delete/update and bulk requests. Thread pool type is fixed with a size of # of allocated processors, queue_size of 10000. The maximum size for this pool is 1 + # of allocated processors.

snapshot

For snapshot/restore operations. Thread pool type is scaling with a keep-alive of 5m and a max of min(5, (# of allocated processors) / 2).

snapshot_meta

For snapshot repository metadata read operations. Thread pool type is scaling with a keep-alive of 5m and a max of min(50, (# of allocated processors * 3)).

warmer

For segment warm-up operations. Thread pool type is scaling with a keep-alive of 5m and a max of min(5, (# of allocated processors) / 2).

refresh

For refresh operations. Thread pool type is scaling with a keep-alive of 5m and a max of min(10, (# of allocated processors) / 2).

fetch_shard_started

For listing shard states. Thread pool type is scaling with keep-alive of 5m and a default maximum size of 2 * # of allocated processors.

fetch_shard_store

For listing shard stores. Thread pool type is scaling with keep-alive of 5m and a default maximum size of 2 * # of allocated processors.

flush

For flush and translog fsync operations. Thread pool type is scaling with a keep-alive of 5m and a default maximum size of min(5, (# of allocated processors) / 2).

force_merge

For force merge operations. Thread pool type is fixed with a size of 1 and an unbounded queue size.

management

For cluster management. Thread pool type is scaling with a keep-alive of 5m and a default maximum size of 5.

system_read

For read operations on system indices. Thread pool type is fixed with a default maximum size of min(5, (# of allocated processors) / 2).

system_write

For write operations on system indices. Thread pool type is fixed with a default maximum size of min(5, (# of allocated processors) / 2).

system_critical_read

For critical read operations on system indices. Thread pool type is fixed with a default maximum size of min(5, (# of allocated processors) / 2).

system_critical_write

For critical write operations on system indices. Thread pool type is fixed with a default maximum size of min(5, (# of allocated processors) / 2).

watcher

For watch executions. Thread pool type is fixed with a default maximum size of min(5 * (# of allocated processors), 50) and queue_size of 1000.

 
