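Problem:
After a Spring Cloud project starts, the first request made through a FeignClient often takes a long time, and with some probability it times out as a result.
This boils down to two problems:
- The first FeignClient request is slow;
- The first FeignClient request fails.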
Investigation
A close look at the log shows that the root cause is the time spent initializing the FeignClient.
2019-01-28 16:19:46.074 INFO 3740 --- [nio-9790-exec-2] s.c.a.AnnotationConfigApplicationContext : Refreshing SpringClientFactory-ms-dyh-manufacturer: startup date [Mon Jan 28 16:19:46 CST 2019]; parent: org.springframework.boot.web.servlet.context.AnnotationConfigServletWebServerApplicationContext@15b986cd
2019-01-28 16:19:46.411 INFO 3740 --- [nio-9790-exec-2] f.a.AutowiredAnnotationBeanPostProcessor : JSR-330 'javax.inject.Inject' annotation found and supported for autowiring
2019-01-28 16:19:46.671 INFO 3740 --- [nio-9790-exec-2] c.netflix.config.ChainedDynamicProperty : Flipping property: ms-dyh-manufacturer.ribbon.ActiveConnectionsLimit to use NEXT property: niws.loadbalancer.availabilityFilteringRule.activeConnectionsLimit = 2147483647
2019-01-28 16:19:46.715 INFO 3740 --- [nio-9790-exec-2] c.n.u.concurrent.ShutdownEnabledTimer : Shutdown hook installed for: NFLoadBalancer-PingTimer-ms-dyh-manufacturer
2019-01-28 16:19:46.776 INFO 3740 --- [nio-9790-exec-2] c.netflix.loadbalancer.BaseLoadBalancer : Client: ms-dyh-manufacturer instantiated a LoadBalancer: DynamicServerListLoadBalancer:{NFLoadBalancer:name=ms-dyh-manufacturer,current list of Servers=[],Load balancer stats=Zone stats: {},Server stats: []}ServerList:null
2019-01-28 16:19:46.783 INFO 3740 --- [nio-9790-exec-2] c.n.l.DynamicServerListLoadBalancer : Using serverListUpdater PollingServerListUpdater
2019-01-28 16:19:46.810 INFO 3740 --- [nio-9790-exec-2] c.netflix.config.ChainedDynamicProperty : Flipping property: ms-dyh-manufacturer.ribbon.ActiveConnectionsLimit to use NEXT property: niws.loadbalancer.availabilityFilteringRule.activeConnectionsLimit = 2147483647
2019-01-28 16:19:46.811 INFO 3740 --- [nio-9790-exec-2] c.n.l.DynamicServerListLoadBalancer : DynamicServerListLoadBalancer for client ms-dyh-manufacturer initialized: DynamicServerListLoadBalancer:{NFLoadBalancer:name=ms-dyh-manufacturer,current list of Servers=[192.168.0.114:8360],Load balancer stats=Zone stats: {unknown=[Zone:unknown; Instance count:1; Active connections count: 0; Circuit breaker tripped count: 0; Active connections per server: 0.0;]
},Server stats: [[Server:192.168.0.114:8360; Zone:UNKNOWN; Total Requests:0; Successive connection failure:0; Total blackout seconds:0; Last connection made:Thu Jan 01 08:00:00 CST 1970; First connection made: Thu Jan 01 08:00:00 CST 1970; Active Connections:0; total failure count in last (1000) msecs:0; average resp time:0.0; 90 percentile resp time:0.0; 95 percentile resp time:0.0; min resp time:0.0; max resp time:0.0; stddev resp time:0.0]
]}ServerList:ConsulServerList{serviceId='ms-dyh-manufacturer', tag=null}
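In fact, someone already reported this as an issue back in 2015, and that issue is still open today.
The reporter's diagnosis matches ours: although the Feign client is created at startup, its real initialization only happens the first time the client is used.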
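To make the lazy behaviour concrete, here is a minimal stand-alone Java sketch. It is a toy model, not Spring Cloud's code: the class `LazyClientFactoryDemo`, its methods, and the 200 ms sleep are all invented for illustration. The only thing it borrows from the log above is the pattern of building a per-client context on first use and reusing it afterwards.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Toy model (hypothetical names) of a factory that builds a per-client context on first use. */
class LazyClientFactoryDemo {

    /** Counts how many times the expensive initialization actually ran. */
    static final AtomicInteger INIT_COUNT = new AtomicInteger();

    /** One cached context per client name, created on first lookup. */
    private static final Map<String, String> CONTEXTS = new ConcurrentHashMap<>();

    /** Stand-in for the expensive work: context refresh, load balancer setup, server list fetch. */
    static String initContext(String clientName) {
        INIT_COUNT.incrementAndGet();
        try {
            Thread.sleep(200); // simulated cost, an assumption; real numbers depend on the environment
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "context-for-" + clientName;
    }

    /** Every request goes through here; only the first one per client pays for initContext. */
    static String call(String clientName) {
        String context = CONTEXTS.computeIfAbsent(clientName, LazyClientFactoryDemo::initContext);
        return "response via " + context;
    }

    /** Runs one call and returns its elapsed time in milliseconds. */
    static long timedCallMillis(String clientName) {
        long start = System.nanoTime();
        call(clientName);
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        System.out.println("first call:  " + timedCallMillis("ms-dyh-manufacturer") + " ms");
        System.out.println("second call: " + timedCallMillis("ms-dyh-manufacturer") + " ms");
    }
}
```

In this model the first call carries the whole initialization cost and the second one is nearly free, which is the same shape as the slow first FeignClient request. If that first call sits inside a timeout shorter than the initialization, it fails even though the target service is healthy.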
Solution
1. (Not recommended) Increase the Hystrix timeout
### Hystrix configuration
hystrix:
  command:
    default:
      execution:
        isolation:
          thread:
            timeoutInMilliseconds: 5000
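In theory this only treats the symptom: it solves the timeout, but not the long first request. And because the circuit breaker's timeout has to be set longer, it also narrows the range of situations in which the circuit breaker is useful.
So this method works, but I do not recommend it.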
2. (Foolish) Disable the Hystrix timeout
Only a fool would pick this one, so I won't describe it.
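3. Warm up with a simulated request
Basic idea: once the Spring container has been initialized, find every bean annotated with @FeignClient and proactively send some request through it. That request triggers the real initialization of the Feign client.
step1. Add a method to the Feign client interface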
@GetMapping("/actuator/health")
String heartbeat();
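step2. Add an ApplicationListener that, once the Spring context has finished loading, finds all the Feign clients and calls heartbeat once on each through reflection. This is a trick that triggers the Feign client's initialization.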
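// Imports assume Spring Cloud OpenFeign (Finchley or later); earlier releases
// keep these classes under org.springframework.cloud.netflix.feign.
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.util.Map;

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.cloud.openfeign.FeignClient;
import org.springframework.cloud.openfeign.ribbon.CachingSpringLoadBalancerFactory;
import org.springframework.context.ApplicationContext;
import org.springframework.context.ApplicationListener;
import org.springframework.context.event.ContextRefreshedEvent;
import org.springframework.stereotype.Component;

@Component
public class EarlyInitFeignClientOnContextRefresh implements
        ApplicationListener<ContextRefreshedEvent> {

    private static final Logger logger = LoggerFactory.getLogger(EarlyInitFeignClientOnContextRefresh.class);

    // Not called directly; kept from the original, presumably so that the Feign load balancer factory is wired first.
    @Autowired
    @Qualifier("cachingLBClientFactory")
    CachingSpringLoadBalancerFactory factory;

    @Override
    public void onApplicationEvent(ContextRefreshedEvent event) {
        ApplicationContext applicationContext = event.getApplicationContext();
        // Collect every bean whose type carries @FeignClient.
        Map<String, Object> beans = applicationContext.getBeansWithAnnotation(FeignClient.class);
        for (Map.Entry<String, Object> entry : beans.entrySet()) {
            Class<?> clazz = entry.getValue().getClass();
            try {
                // The first real call forces the client's lazy initialization.
                Method method = clazz.getMethod("heartbeat");
                method.invoke(entry.getValue());
                logger.info("init feign client: " + clazz.getName());
            } catch (NoSuchMethodException e) {
                logger.warn("init feign client fail: no method of heartbeat in " + clazz.getName());
            } catch (IllegalAccessException e) {
                logger.warn("init feign client fail: IllegalAccessException of " + clazz.getName());
            } catch (InvocationTargetException e) {
                // The request itself threw (for example, the target service is down); log the real cause.
                logger.warn("init feign client fail: InvocationTargetException of " + clazz.getName(), e.getCause());
            } catch (Exception e) {
                logger.error(e.getMessage(), e);
            }
        }
        logger.info("init feign client done!");
    }
}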
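The reflective call in the listener can be tried in isolation, without Spring or Feign on the classpath. The sketch below is only a model under stated assumptions: `ReflectiveWarmUpDemo`, `HeartbeatClient`, `PlainClient`, and `proxyOf` are hypothetical names, and a plain JDK dynamic proxy stands in for the proxy that Feign generates for each client interface. It shows the three outcomes the listener has to handle: the call succeeds, the interface has no `heartbeat` method, or the call throws because the target is unavailable.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Stand-alone model of the reflective warm-up; all names here are hypothetical. */
class ReflectiveWarmUpDemo {

    /** Client interface that exposes the heartbeat method from step 1. */
    interface HeartbeatClient {
        String heartbeat();
    }

    /** Client interface that was never given a heartbeat method. */
    interface PlainClient {
        String ping();
    }

    /** Builds a JDK dynamic proxy; when targetOnline is false every call throws. */
    static <T> T proxyOf(Class<T> type, boolean targetOnline) {
        Object proxy = Proxy.newProxyInstance(
                type.getClassLoader(),
                new Class<?>[] {type},
                (self, method, callArgs) -> {
                    if (!targetOnline) {
                        throw new IllegalStateException("target service unavailable");
                    }
                    return "UP";
                });
        return type.cast(proxy);
    }

    /** Calls heartbeat() on every bean and returns the names of those that answered. */
    static List<String> warmUp(Map<String, Object> beans) {
        List<String> warmed = new ArrayList<>();
        for (Map.Entry<String, Object> entry : beans.entrySet()) {
            try {
                Method heartbeat = entry.getValue().getClass().getMethod("heartbeat");
                heartbeat.invoke(entry.getValue());
                warmed.add(entry.getKey());
            } catch (NoSuchMethodException e) {
                System.out.println(entry.getKey() + ": no heartbeat method, skipped");
            } catch (InvocationTargetException e) {
                // The exception thrown by the call itself is wrapped; getCause() recovers it.
                System.out.println(entry.getKey() + ": call failed: " + e.getCause());
            } catch (IllegalAccessException e) {
                System.out.println(entry.getKey() + ": heartbeat not accessible");
            }
        }
        return warmed;
    }

    public static void main(String[] args) {
        Map<String, Object> beans = new LinkedHashMap<>();
        beans.put("onlineClient", proxyOf(HeartbeatClient.class, true));
        beans.put("offlineClient", proxyOf(HeartbeatClient.class, false));
        beans.put("plainClient", proxyOf(PlainClient.class, true));
        System.out.println("warmed up: " + warmUp(beans));
    }
}
```

A failing call surfaces as an `InvocationTargetException` whose cause is the real error, so a warm-up loop written this way keeps going when one target is down. Whether a failed warm-up request still leaves a real Feign client initialized is not something this model can answer.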
However, this approach may carry some risk: a Feign contributor has mentioned that they use lazy creation because doing otherwise causes problems in certain scenarios.
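Further thoughts
- The FeignClient initialization process;
- How long each stage of FeignClient initialization takes, and therefore which step is the slowest (is it service registration and discovery?);
- Whether method 3 still works when the service that the FeignClient calls is offline;
- Hooking into the real creation and initialization of the FeignClient, rather than simulating it by sending a request through the client.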
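For the question about per-stage cost, the timestamps in the log excerpt from the investigation already give a rough first cut. The sketch below simply parses those eight timestamps and prints the gap between consecutive lines. The class name `LogGapDemo` is made up for illustration, and the result is only as fine-grained as the INFO lines themselves, so treat it as a starting point rather than a profile.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

/** Computes the time between consecutive log lines (hypothetical helper, not part of any library). */
class LogGapDemo {

    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSS");

    /** Timestamps copied from the eight log entries in the investigation section. */
    static final String[] TIMESTAMPS = {
        "2019-01-28 16:19:46.074", // Refreshing SpringClientFactory-ms-dyh-manufacturer
        "2019-01-28 16:19:46.411", // AutowiredAnnotationBeanPostProcessor
        "2019-01-28 16:19:46.671", // ChainedDynamicProperty (first)
        "2019-01-28 16:19:46.715", // ShutdownEnabledTimer
        "2019-01-28 16:19:46.776", // BaseLoadBalancer instantiated
        "2019-01-28 16:19:46.783", // Using serverListUpdater
        "2019-01-28 16:19:46.810", // ChainedDynamicProperty (second)
        "2019-01-28 16:19:46.811"  // DynamicServerListLoadBalancer initialized
    };

    /** Returns the gap in milliseconds between each timestamp and the one before it. */
    static List<Long> gapsMillis(String[] timestamps) {
        List<Long> gaps = new ArrayList<>();
        for (int i = 1; i < timestamps.length; i++) {
            LocalDateTime previous = LocalDateTime.parse(timestamps[i - 1], FORMAT);
            LocalDateTime current = LocalDateTime.parse(timestamps[i], FORMAT);
            gaps.add(Duration.between(previous, current).toMillis());
        }
        return gaps;
    }

    public static void main(String[] args) {
        List<Long> gaps = gapsMillis(TIMESTAMPS);
        long total = 0;
        for (int i = 0; i < gaps.size(); i++) {
            System.out.println("line " + (i + 1) + " -> line " + (i + 2) + ": " + gaps.get(i) + " ms");
            total += gaps.get(i);
        }
        System.out.println("total: " + total + " ms");
    }
}
```

For this excerpt the gaps are 337, 260, 44, 61, 7, 27 and 1 ms, 737 ms in total. The two largest gaps sit between the context refresh line and the autowiring line, and between the autowiring line and the first ChainedDynamicProperty line. The excerpt does not cover anything before the first line or after the last one, so it cannot show what the request itself cost.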