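Symptom:
Yesterday, many production API calls suddenly started failing. Kibana showed a large number of exceptions with the following message: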
...into fallback. Rejected command because thread-pool queueSize is at rejection threshold.
Code that logged the exception:
@FeignClient(name = "api", fallbackFactory = LoadBalancingFallbackFactory.class)
public interface LoadBalancingFeignClient {

    @PostMapping(value = "/api/loadBalancing/server")
    Result currentServer();
}

@Slf4j
@Component
public class LoadBalancingFallbackFactory implements FallbackFactory<LoadBalancingFeignClient> {

    @Override
    public LoadBalancingFeignClient create(Throwable throwable) {
        // Capture the failure reason once so the fallback can log it.
        final String msg = throwable.getMessage();
        return () -> {
            log.error("loadBalancingFeignClient currentServer into fallback. {}", msg);
            return Result.error();
        };
    }
}
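Cause:
At this point the cause is fairly clear: the Hystrix thread pool could not keep up, its queue reached the rejection threshold, and new requests were rejected and sent straight to the fallback. The project's Apollo configuration: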
hystrix.command.default.execution.isolation.thread.timeoutInMilliseconds = 3500
hystrix.threadpool.default.maxQueueSize = 60
hystrix.threadpool.default.queueSizeRejectionThreshold = 40
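A brief look at the Hystrix parameters:
maxQueueSize: the maximum size of the thread pool's task queue, not of the pool itself. The default is -1, which creates a SynchronousQueue; a value greater than 0 creates a LinkedBlockingQueue of that capacity.
queueSizeRejectionThreshold: a dynamically adjustable upper bound on the queue. Once the queue length reaches this value, requests are rejected even if maxQueueSize has not been reached. The default is 5. With the configuration above, rejection therefore starts at 40 queued requests, not 60.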
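To make the interaction between the two parameters concrete, here is a minimal sketch in plain Java. It is not Hystrix code: the class `QueueParamsSketch` and its two helper methods are hypothetical names that only mirror the rules described above, using the values 60 and 40 from the Apollo configuration.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.SynchronousQueue;

public class QueueParamsSketch {

    // Queue selection as described above: a bounded queue only when maxQueueSize > 0,
    // otherwise a SynchronousQueue that hands tasks directly to a thread.
    static BlockingQueue<Runnable> newQueue(int maxQueueSize) {
        if (maxQueueSize <= 0) {
            return new SynchronousQueue<>();
        }
        return new LinkedBlockingQueue<>(maxQueueSize);
    }

    // The threshold is only consulted when a real queue exists.
    static boolean isQueueSpaceAvailable(BlockingQueue<Runnable> queue,
                                         int maxQueueSize,
                                         int rejectionThreshold) {
        if (maxQueueSize <= 0) {
            return true; // no queue to measure; the thread pool itself decides
        }
        return queue.size() < rejectionThreshold;
    }

    public static void main(String[] args) {
        int maxQueueSize = 60;       // hystrix.threadpool.default.maxQueueSize
        int rejectionThreshold = 40; // hystrix.threadpool.default.queueSizeRejectionThreshold
        BlockingQueue<Runnable> queue = newQueue(maxQueueSize);

        int accepted = 0;
        while (isQueueSpaceAvailable(queue, maxQueueSize, rejectionThreshold)) {
            queue.add(() -> { });
            accepted++;
        }
        System.out.println("queued before rejection: " + accepted);
        System.out.println("unused queue capacity: " + queue.remainingCapacity());
    }
}
```

Running `main` reports 40 queued tasks and 20 unused slots: the last 20 places in the 60-slot queue are never used while the threshold stays at 40.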
Relevant source code:
hystrix-core-1.5.12-sources.jar!/com/netflix/hystrix/strategy/concurrency/HystrixContextScheduler.java
private class HystrixContextSchedulerWorker extends Worker {

    private final Worker worker;

    private HystrixContextSchedulerWorker(Worker actualWorker) {
        this.worker = actualWorker;
    }

    @Override
    public void unsubscribe() {
        worker.unsubscribe();
    }

    @Override
    public boolean isUnsubscribed() {
        return worker.isUnsubscribed();
    }

    @Override
    public Subscription schedule(Action0 action, long delayTime, TimeUnit unit) {
        if (threadPool != null) {
            if (!threadPool.isQueueSpaceAvailable()) {
                throw new RejectedExecutionException("Rejected command because thread-pool queueSize is at rejection threshold.");
            }
        }
        return worker.schedule(new HystrixContexSchedulerAction(concurrencyStrategy, action), delayTime, unit);
    }

    @Override
    public Subscription schedule(Action0 action) {
        if (threadPool != null) {
            if (!threadPool.isQueueSpaceAvailable()) {
                throw new RejectedExecutionException("Rejected command because thread-pool queueSize is at rejection threshold.");
            }
        }
        return worker.schedule(new HystrixContexSchedulerAction(concurrencyStrategy, action));
    }
}
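The guard in `schedule()` can be reproduced with a plain `ThreadPoolExecutor` to see how a slow downstream turns into rejections. The sketch below is a simulation, not Hystrix itself: `RejectionSimulation` and `simulate` are hypothetical names, and the core size of 10 is an assumption, since the pool's core size is not part of the configuration quoted in this post.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class RejectionSimulation {

    // Returns {accepted, rejected} for a burst of requests while the downstream is stalled.
    static int[] simulate(int coreSize, int maxQueueSize, int rejectionThreshold, int requests) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                coreSize, coreSize, 0L, TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<>(maxQueueSize));
        // The latch stays closed for the whole burst, so every task blocks like a hung call.
        CountDownLatch downstream = new CountDownLatch(1);
        int accepted = 0;
        int rejected = 0;
        try {
            for (int i = 0; i < requests; i++) {
                try {
                    // Same idea as the check in schedule(): look at the queue before submitting.
                    if (pool.getQueue().size() >= rejectionThreshold) {
                        throw new RejectedExecutionException(
                                "Rejected command because thread-pool queueSize is at rejection threshold.");
                    }
                    pool.execute(() -> {
                        try {
                            downstream.await();
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                        }
                    });
                    accepted++;
                } catch (RejectedExecutionException e) {
                    rejected++; // in the real service, this is where the fallback runs
                }
            }
        } finally {
            downstream.countDown(); // release the blocked tasks so the pool can drain
            pool.shutdown();
        }
        return new int[] {accepted, rejected};
    }

    public static void main(String[] args) {
        int[] result = simulate(10, 60, 40, 100);
        System.out.println("accepted=" + result[0] + ", rejected=" + result[1]);
    }
}
```

With these inputs, 10 requests occupy the threads, 40 wait in the queue, and the remaining 50 are rejected, which is the pattern that showed up in Kibana as a flood of fallback logs.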
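Solutions:
- Increase the Hystrix thread pool queue parameters to a suitable size
- Scale the service out horizontally and dynamically
- Optimize the downstream service to reduce its response time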
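All three options change the same quantity: how many requests are in flight per instance at once. A rough way to reason about it is the rule of thumb from the Hystrix documentation, threads ≈ peak requests per second × 99th-percentile latency in seconds + some headroom. The sketch below applies that rule; `PoolSizingSketch` is a hypothetical helper, and the traffic and latency figures are illustrative assumptions, not measurements from this incident.

```java
public class PoolSizingSketch {

    // threads ≈ peak requests/second × p99 latency (in seconds) + headroom
    static int suggestedCoreSize(int peakRequestsPerSecond, int p99LatencyMillis, int headroom) {
        // Integer ceiling division avoids floating-point rounding surprises.
        int busyThreads = (peakRequestsPerSecond * p99LatencyMillis + 999) / 1000;
        return busyThreads + headroom;
    }

    public static void main(String[] args) {
        // Hypothetical figures for illustration only.
        System.out.println(suggestedCoreSize(30, 200, 4));  // healthy downstream: 10
        System.out.println(suggestedCoreSize(30, 1000, 4)); // downstream slows to 1 s: 34
        System.out.println(suggestedCoreSize(15, 1000, 4)); // same slowdown, traffic halved by scaling out: 19
    }
}
```

The second line shows why a slow downstream exhausts a pool that was adequate before, and the third shows how scaling out reduces the per-instance requirement. Enlarging the queue only buys time if the latency problem persists.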
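The [Production Pitfalls] series is mainly a brief record of real production problems encountered at work. For a deeper look at any of them, feel free to contact the author directly.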