一、场景复现:
1、生产环境在上线得时候,重启应用,出现如下报错
2、使用技术栈 springboot 1.5.7 dubbo 2.6.1 druid 1.1.0 …
二、问题分析
上面错误是说 拿到得数据库连接已经被关闭,这种情况一般是由于停机得时候应用里面还有请求在处理,而这些处理中得请求所需要得资源,比如依赖的某个bean已经被spring容器关闭。
三、错误复现
主要复现流程如下
1、启动zk
2、注册一个服务提供着到zk
3、注册一个服务消费者
4、通过jmeter,起N个线程发送请求,在任务处理还没完成的某个时间点,手动kill pid tomcat进程
5、观察日志输出
数据源DruidDataSource这个bean 已经被spring关闭,但我们的请求还没处理完,复现了生产报错
四、 问题定位
在springboot里面,如果使用外置容器,需要通过SpringBootServletInitializer来实现,
@Override
protected SpringApplicationBuilder configure(SpringApplicationBuilder builder) {
return builder.sources(DubboConsumerApplication.class);
}
通过跟踪源码,可以看到,这个实现默认会注册一个钩子,参见
private void refreshContext(ConfigurableApplicationContext context) {
refresh(context);
if (this.registerShutdownHook) {
try {
context.registerShutdownHook();
}
catch (AccessControlException ex) {
// Not allowed in some environments.
}
}
}
springboot里面得钩子注册
@Override
public void registerShutdownHook() {
if (this.shutdownHook == null) {
// No shutdown hook registered yet.
this.shutdownHook = new Thread() {
@Override
public void run() {
synchronized (startupShutdownMonitor) {
doClose();
}
}
};
Runtime.getRuntime().addShutdownHook(this.shutdownHook);
}
}
而dubbo也有自己的钩子实现,dubbo的钩子在AbstractConfig的静态代码块中,
static {
Runtime.getRuntime().addShutdownHook(new Thread(new Runnable() {
@Override
public void run() {
if (logger.isInfoEnabled()) {
logger.info("Run shutdown hook now.");
}
ProtocolConfig.destroyAll();
}
}, "DubboShutdownHook"));
}
JVM处理钩子类为java.lang.ApplicationShutdownHooks,当运行hook的时候,多个钩子并发执行,没有先后顺序,因为没有先后顺序,就有可能出现钩子之间得资源竞争,比如某个钩子所需要得资源被其它钩子给关闭了。
static void runHooks() {
Collection<Thread> threads;
synchronized(ApplicationShutdownHooks.class) {
threads = hooks.keySet();
hooks = null;
}
//多个钩子线程执行
for (Thread hook : threads) {
hook.start();
}
for (Thread hook : threads) {
while (true) {
try {
hook.join();
break;
} catch (InterruptedException ignored) {
}
}
}
}
目前解决这个问题的方案是:在spring关闭的时候,手动调用dubbo钩子线程执行的方法,再关闭spring管理的资源,比如bean等。
首先springboot的钩子线程被触发得时候,运行如下代码
protected void doClose() {
//入口处做了判断,采用CAS,即使多次执行该方法也ok
if (this.active.get() && this.closed.compareAndSet(false, true)) {
if (logger.isInfoEnabled()) {
logger.info("Closing " + this);
}
LiveBeansView.unregisterApplicationContext(this);
try {
// Publish shutdown event. spring event事件,这里发送容器关闭事件,我们可以在这里做切入
publishEvent(new ContextClosedEvent(this));
}
catch (Throwable ex) {
logger.warn("Exception thrown from ApplicationListener handling ContextClosedEvent", ex);
}
// Stop all Lifecycle beans, to avoid delays during individual destruction.
if (this.lifecycleProcessor != null) {
try {
this.lifecycleProcessor.onClose();
}
catch (Throwable ex) {
logger.warn("Exception thrown from LifecycleProcessor on context close", ex);
}
}
//执行bean得销毁操作
// Destroy all cached singletons in the context's BeanFactory.
destroyBeans();
// Close the state of this context itself.
closeBeanFactory();
// Let subclasses do some final clean-up if they wish...
onClose();
this.active.set(false);
}
}
首先实现一个Listener,用来监听spring关闭事件,然后手动sleep, 延迟容器得关闭时间
@Component
public class ContextClosedOrderHandler implements ApplicationListener, ApplicationContextAware {
private static final Logger logger = LoggerFactory.getLogger(ContextClosedOrderHandler.class);
private ApplicationContext context;
@Override
public void setApplicationContext(ApplicationContext applicationContext) throws BeansException {
this.context = applicationContext;
}
@Override
public void onApplicationEvent(ApplicationEvent event) {
if (event instanceof ContextClosedEvent) {
logger.info("开始关闭spring容器");
try {
ProtocolConfig.destroyAll();
logger.info("spring关闭事件,开始销毁bean");
//如果应用里面还有其它的自定义的线程池等,也可以在这里关闭
} catch (InterruptedException e) {
e.printStackTrace();
}
}
}
}
上面延缓了spring容器得关闭时间,先执行dubbo的优雅停机,通过测试,日志如下
34590 [http-nio-8080-exec-1] INFO org.apache.zookeeper.ZooKeeper - Initiating client connection, connectString=192.168.127.138:2181 sessionTimeout=60000 watcher=org.apache.curator.ConnectionState@18fb7e3
34669 [http-nio-8080-exec-1] INFO o.a.c.f.imps.CuratorFrameworkImpl - Default schema
34716 [http-nio-8080-exec-1-SendThread(192.168.127.138:2181)] INFO org.apache.zookeeper.ClientCnxn - Opening socket connection to server 192.168.127.138/192.168.127.138:2181. Will not attempt to authenticate using SASL (unknown error)
34730 [http-nio-8080-exec-1-SendThread(192.168.127.138:2181)] INFO org.apache.zookeeper.ClientCnxn - Socket connection established to 192.168.127.138/192.168.127.138:2181, initiating session
34767 [http-nio-8080-exec-1-SendThread(192.168.127.138:2181)] INFO org.apache.zookeeper.ClientCnxn - Session establishment complete on server 192.168.127.138/192.168.127.138:2181, sessionid = 0x16ad66caf160009, negotiated timeout = 40000
34794 [http-nio-8080-exec-1] INFO c.a.d.r.zookeeper.ZookeeperRegistry - [DUBBO] Register: consumer://192.168.127.138/com.alibaba.dubbo.rpc.service.GenericService?application=dubbo-consumer&category=consumers&check=false&dubbo=2.6.1&generic=true&group=dubbo&interface=com.dubbo.api.UserService&logger=slf4j&pid=10118&side=consumer×tamp=1558378077302&version=1.0.0, dubbo version: 2.6.1, current host: 192.168.127.138
34820 [http-nio-8080-exec-1-EventThread] INFO o.a.c.f.s.ConnectionStateManager - State change: CONNECTED
35029 [http-nio-8080-exec-1] INFO c.a.d.r.zookeeper.ZookeeperRegistry - [DUBBO] Subscribe: consumer://192.168.127.138/com.alibaba.dubbo.rpc.service.GenericService?application=dubbo-consumer&category=providers,configurators,routers&dubbo=2.6.1&generic=true&group=dubbo&interface=com.dubbo.api.UserService&logger=slf4j&pid=10118&side=consumer×tamp=1558378077302&version=1.0.0, dubbo version: 2.6.1, current host: 192.168.127.138
35086 [http-nio-8080-exec-1] INFO c.a.d.r.zookeeper.ZookeeperRegistry - [DUBBO] Notify urls for subscribe url consumer://192.168.127.138/com.alibaba.dubbo.rpc.service.GenericService?application=dubbo-consumer&category=providers,configurators,routers&dubbo=2.6.1&generic=true&group=dubbo&interface=com.dubbo.api.UserService&logger=slf4j&pid=10118&side=consumer×tamp=1558378077302&version=1.0.0, urls: [dubbo://192.168.10.17:20890/com.dubbo.api.UserService?anyhost=true&application=dubbo-provider&dubbo=2.6.1&generic=false&group=dubbo&interface=com.dubbo.api.UserService&methods=sayUser,sayHello,sayUserOutPutPOJO,sayHello2&pid=17664&revision=1.0.0&side=provider×tamp=1558323464597&version=1.0.0, empty://192.168.127.138/com.alibaba.dubbo.rpc.service.GenericService?application=dubbo-consumer&category=configurators&dubbo=2.6.1&generic=true&group=dubbo&interface=com.dubbo.api.UserService&logger=slf4j&pid=10118&side=consumer×tamp=1558378077302&version=1.0.0, empty://192.168.127.138/com.alibaba.dubbo.rpc.service.GenericService?application=dubbo-consumer&category=routers&dubbo=2.6.1&generic=true&group=dubbo&interface=com.dubbo.api.UserService&logger=slf4j&pid=10118&side=consumer×tamp=1558378077302&version=1.0.0], dubbo version: 2.6.1, current host: 192.168.127.138
35532 [http-nio-8080-exec-1] INFO c.a.d.r.transport.AbstractClient - [DUBBO] Successed connect to