- Current situation
Dubbo versions below 2.5.4 do not support graceful shutdown: when the server is shut down, the client actively closes the long-lived connection, so the server's response messages for in-flight requests never make it back.
Setting the parameter -Ddubbo.service.shutdown.wait=30000 (the shutdown wait time, in milliseconds) on both server and client does not help either; the process still stops immediately.
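The flag is a plain JVM system property (Dubbo resolves it later in this article via ConfigUtils.getProperty(Constants.SHUTDOWN_WAIT_KEY)). A minimal, self-contained check of what value the JVM actually sees, assuming only the property key shown above:

public class ShutdownWaitCheck {
    public static void main(String[] args) {
        // normally supplied on the command line as -Ddubbo.service.shutdown.wait=30000
        String value = System.getProperty("dubbo.service.shutdown.wait", "0");
        System.out.println("graceful shutdown wait = " + Integer.parseInt(value) + " ms");
    }
}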
- Expected result
- Messages that are currently being read or written should continue to be processed;
- The thread pool should only be shut down after all of its tasks have finished.
- Client behavior when the server shuts down
When the server shuts down, the client detects the disconnect event, immediately closes its connection to that provider, and then retries the connection indefinitely. If that was the only provider of the service, no further remote calls are made.
Client disconnect sequence diagram:
A node change event from ZooKeeper triggers notify; the client refreshes its provider list and removes the Invoker of the vanished provider (destroyUnusedInvokers). If it was the only provider, all Invokers are destroyed (destroyAllInvokers), as sketched below.
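A minimal, self-contained sketch of that decision (not Dubbo source: the map and types are simplified stand-ins; only the two method names come from the description above):

import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

public class RefreshSketch {
    // url -> invoker, with the invoker simplified to a String
    static Map<String, String> urlInvokerMap = new HashMap<String, String>();

    static void refreshOnNotify(List<String> providerUrls) {
        if (providerUrls.isEmpty()) {
            destroyAllInvokers();                 // last provider is gone: drop everything
        } else {
            destroyUnusedInvokers(providerUrls);  // drop only the providers that disappeared
        }
    }

    static void destroyAllInvokers() {
        urlInvokerMap.clear();
    }

    static void destroyUnusedInvokers(List<String> alive) {
        urlInvokerMap.keySet().retainAll(new HashSet<String>(alive));
    }

    public static void main(String[] args) {
        urlInvokerMap.put("dubbo://10.0.0.1:20880", "invoker-1");
        urlInvokerMap.put("dubbo://10.0.0.2:20880", "invoker-2");
        refreshOnNotify(Arrays.asList("dubbo://10.0.0.1:20880")); // provider 2 went down
        System.out.println(urlInvokerMap.keySet());               // only provider 1 remains
    }
}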
- Client-side code changes
When a DubboInvoker is destroyed, it calls client.close() directly. The original source:
public void destroy() {
    if (super.isDestroyed()) {
        return;
    } else {
        destroyLock.lock();
        try {
            if (super.isDestroyed()) {
                return;
            }
            super.destroy();
            if (invokers != null) {
                invokers.remove(this);
            }
            for (ExchangeClient client : clients) {
                try {
                    // close immediately
                    client.close();
                } catch (Throwable t) {
                    logger.warn(t.getMessage(), t);
                }
            }
        } finally {
            destroyLock.unlock();
        }
    }
}
Change client.close() to client.close(timeout). The modified code:
com.alibaba.dubbo.rpc.protocol.dubbo.DubboInvoker
public void destroy() {
    if (super.isDestroyed()) {
        return;
    } else {
        destroyLock.lock();
        try {
            if (super.isDestroyed()) {
                return;
            }
            super.destroy();
            if (invokers != null) {
                invokers.remove(this);
            }
            for (ExchangeClient client : clients) {
                try {
                    // changed: close gracefully
                    this.close(client);
                } catch (Throwable t) {
                    logger.warn(t.getMessage(), t);
                }
            }
        } finally {
            destroyLock.unlock();
        }
    }
}
/**
 * Close the client gracefully, waiting at most the configured shutdown timeout;
 * if neither dubbo.service.shutdown.wait nor dubbo.service.shutdown.wait.seconds
 * is set, close immediately.
 * @param client
 * @author 夏志强
 */
@SuppressWarnings("deprecation")
private void close(ExchangeClient client) {
    int timeout = 0;
    try {
        String value = ConfigUtils.getProperty(Constants.SHUTDOWN_WAIT_KEY);
        if (value != null && value.length() > 0) {
            timeout = Integer.parseInt(value);
        } else {
            // fall back to the deprecated key, which is given in seconds
            value = ConfigUtils.getProperty(Constants.SHUTDOWN_WAIT_SECONDS_KEY);
            if (value != null && value.length() > 0) {
                timeout = Integer.parseInt(value) * 1000;
            }
        }
    } catch (Exception e) {
        // ignore malformed values and close immediately
    }
    if (timeout > 0) {
        client.close(timeout);
    } else {
        client.close();
    }
}
After this change the client still closed the connection immediately. Debugging led to HeaderExchangeChannel.close(int timeout): it waits only while HeaderExchangeChannel.this still has pending futures in DefaultFuture and the graceful-shutdown timeout has not elapsed; once the timeout has passed it closes at once. The source:
public void close(int timeout) {
    if (closed) {
        return;
    }
    closed = true;
    if (timeout > 0) {
        long start = System.currentTimeMillis();
        // CHANNELS in DefaultFuture never contains a HeaderExchangeChannel,
        // so hasFuture() is always false here
        while (DefaultFuture.hasFuture(HeaderExchangeChannel.this)
                && System.currentTimeMillis() - start < timeout) {
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                logger.warn(e.getMessage(), e);
            }
        }
    }
    close();
}
Looking at HeaderExchangeChannel.request(), the pending future is created with
    DefaultFuture future = new DefaultFuture(channel, req, timeout);
and that channel is actually the NettyClient, so the key registered in DefaultFuture is never of type HeaderExchangeChannel.
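A minimal, self-contained sketch of the mismatch (CHANNELS and hasFuture are simplified stand-ins for DefaultFuture's internals, not the real Dubbo code):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class HasFutureSketch {
    static final Map<Object, Object> CHANNELS = new ConcurrentHashMap<Object, Object>();

    static boolean hasFuture(Object channel) {
        return CHANNELS.containsKey(channel);
    }

    public static void main(String[] args) {
        Object nettyClient = new Object();            // what request() actually hands to DefaultFuture
        Object headerExchangeChannel = new Object();  // the wrapper around it

        CHANNELS.put(nettyClient, "pending future");  // new DefaultFuture(channel, req, timeout)

        System.out.println(hasFuture(headerExchangeChannel)); // false -> the close(timeout) loop never waits
        System.out.println(hasFuture(nettyClient));           // true  -> this is what the fix checks
    }
}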
So the check should use channel instead. The modified code:
com.alibaba.dubbo.remoting.exchange.support.header.HeaderExchangeChannel
// graceful close
public void close(int timeout) {
    if (closed) {
        return;
    }
    closed = true;
    if (timeout > 0) {
        long start = System.currentTimeMillis();
        while (DefaultFuture.hasFuture(channel)
                && System.currentTimeMillis() - start < timeout) {
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                logger.warn(e.getMessage(), e);
            }
        }
    }
    close();
}
Running again, the result was still wrong: the client kept waiting to close while the server side had already shut down.
- Server-side shutdown sequence
The entry point analysed here is ProtocolConfig.destroyAll(), which does two things: first it removes the provider nodes from ZooKeeper and closes the ZooKeeper listeners; then it destroys the dubbo protocol and closes the connections.
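A minimal, self-contained sketch of that two-step order (illustrative only; the method bodies are placeholders, not Dubbo source):

public class ShutdownOrderSketch {
    public static void main(String[] args) {
        // step 1: remove the provider nodes from ZooKeeper and close the listeners,
        // so consumers stop routing new calls to this instance
        unregisterFromRegistry();
        // step 2: destroy the dubbo protocol, closing server and client connections
        destroyProtocols();
    }

    static void unregisterFromRegistry() {
        System.out.println("zk provider nodes removed, listeners closed");
    }

    static void destroyProtocols() {
        System.out.println("dubbo protocol destroyed, connections closed");
    }
}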
Dubbo protocol destroy sequence diagram:
The sequence diagram shows that the server-side graceful close happens in AbstractServer.close(timeout). The source:
public void close(int timeout) {
    ExecutorUtil.gracefulShutdown(executor, timeout);
    close();
}

// ExecutorUtil
public static void gracefulShutdown(Executor executor, int timeout) {
    if (!(executor instanceof ExecutorService) || isShutdown(executor)) {
        return;
    }
    final ExecutorService es = (ExecutorService) executor;
    try {
        es.shutdown(); // Disable new tasks from being submitted
    } catch (SecurityException ex2) {
        return;
    } catch (NullPointerException ex2) {
        return;
    }
    try {
        if (!es.awaitTermination(timeout, TimeUnit.MILLISECONDS)) {
            es.shutdownNow();
        }
    } catch (InterruptedException ex) {
        es.shutdownNow();
        Thread.currentThread().interrupt();
    }
    if (!isShutdown(es)) {
        newThreadToCloseExecutor(es);
    }
}
Here executor is the thread pool behind Dubbo's dispatch model; only when it is non-null does the server wait for the pool's tasks to finish before closing. One pitfall: es.shutdown() must be called before es.awaitTermination(), because a pool never reaches the terminated state unless shutdown() (or shutdownNow()) has been called; without it, awaitTermination() simply waits out the full timeout and returns false even if every task has already finished [see the JDK documentation]. A short illustration follows below. While debugging, executor turned out to be null; it is initialized in the AbstractServer constructor, whose source follows the illustration:
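A minimal, self-contained demonstration of that shutdown()/awaitTermination() contract (plain JDK code, not Dubbo):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class GracefulShutdownDemo {
    public static void main(String[] args) throws InterruptedException {
        ExecutorService es = Executors.newFixedThreadPool(2);
        es.submit(new Runnable() {
            public void run() {
                System.out.println("in-flight task done");
            }
        });

        es.shutdown();                                    // without this, the pool can never terminate
        if (!es.awaitTermination(5, TimeUnit.SECONDS)) {  // wait for in-flight tasks to finish
            es.shutdownNow();                             // timed out: interrupt whatever is left
        }
        System.out.println("terminated = " + es.isTerminated()); // true when everything completed in time
    }
}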
ExecutorService executor;

public AbstractServer(URL url, ChannelHandler handler) throws RemotingException {
    super(url, handler);
    localAddress = getUrl().toInetSocketAddress();
    String host = url.getParameter(Constants.ANYHOST_KEY, false)
            || NetUtils.isInvalidLocalHost(getUrl().getHost())
            ? NetUtils.ANYHOST : getUrl().getHost();
    bindAddress = new InetSocketAddress(host, getUrl().getPort());
    this.accepts = url.getParameter(Constants.ACCEPTS_KEY, Constants.DEFAULT_ACCEPTS);
    this.idleTimeout = url.getParameter(Constants.IDLE_TIMEOUT_KEY, Constants.DEFAULT_IDLE_TIMEOUT);
    try {
        doOpen();
        if (logger.isInfoEnabled()) {
            logger.info("Start " + getClass().getSimpleName() + " bind " + getBindAddress() + ", export " + getLocalAddress());
        }
    } catch (Throwable t) {
        throw new RemotingException(url.toInetSocketAddress(), null, "Failed to bind " + getClass().getSimpleName()
                + " on " + getLocalAddress() + ", cause: " + t.getMessage(), t);
    }
    // set executor
    if (handler instanceof WrappedChannelHandler) {
        executor = ((WrappedChannelHandler) handler).getExecutor();
    }
}
Dubbo's default transport is Netty, so look at NettyServer:
public NettyServer(URL url, ChannelHandler handler) throws RemotingException {
    super(url, ChannelHandlers.wrap(handler, ExecutorUtil.setThreadName(url, SERVER_THREAD_POOL_NAME)));
}

// ChannelHandlers
public static ChannelHandler wrap(ChannelHandler handler, URL url) {
    return ChannelHandlers.getInstance().wrapInternal(handler, url);
}

protected ChannelHandler wrapInternal(ChannelHandler handler, URL url) {
    return new MultiMessageHandler(new HeartbeatHandler(ExtensionLoader.getExtensionLoader(Dispatcher.class)
            .getAdaptiveExtension().dispatch(handler, url)));
}
The constructor wraps the handler, so by the time it reaches AbstractServer its type is MultiMessageHandler rather than WrappedChannelHandler. The handler inheritance hierarchy:
Dubbo's default thread model is AllChannelHandler (see the Dubbo documentation), whose superclass is WrappedChannelHandler, so the executor has to be dug out of the wrapper chain, here via reflection.
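A minimal, self-contained sketch of why the original instanceof check fails (the classes below are stand-ins, not Dubbo types; it assumes the default "all" dispatcher described above):

public class WrapSketch {
    interface Handler {}

    // stands in for WrappedChannelHandler / AllChannelHandler, which owns the thread pool
    static class PoolOwningHandler implements Handler {}

    // stands in for the delegate wrappers (MultiMessageHandler, HeartbeatHandler)
    static class Delegate implements Handler {
        final Handler handler;
        Delegate(Handler handler) { this.handler = handler; }
    }

    public static void main(String[] args) {
        // MultiMessageHandler(HeartbeatHandler(AllChannelHandler(...)))
        Handler wrapped = new Delegate(new Delegate(new PoolOwningHandler()));

        System.out.println(wrapped instanceof PoolOwningHandler); // false -> executor stays null
        Handler h = wrapped;
        while (h instanceof Delegate) {   // unwrap the chain, as the fix below does via reflection
            h = ((Delegate) h).handler;
        }
        System.out.println(h instanceof PoolOwningHandler);       // true -> the executor can be obtained
    }
}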
Modify the AbstractServer constructor so that executor gets assigned:
com.alibaba.dubbo.remoting.transport.AbstractServer
public AbstractServer(URL url, ChannelHandler handler) throws RemotingException {
    super(url, handler);
    localAddress = getUrl().toInetSocketAddress();
    String host = url.getParameter(Constants.ANYHOST_KEY, false)
            || NetUtils.isInvalidLocalHost(getUrl().getHost())
            ? NetUtils.ANYHOST : getUrl().getHost();
    bindAddress = new InetSocketAddress(host, getUrl().getPort());
    this.accepts = url.getParameter(Constants.ACCEPTS_KEY, Constants.DEFAULT_ACCEPTS);
    this.idleTimeout = url.getParameter(Constants.IDLE_TIMEOUT_KEY, Constants.DEFAULT_IDLE_TIMEOUT);
    try {
        doOpen();
        if (logger.isInfoEnabled()) {
            logger.info("Start " + getClass().getSimpleName() + " bind " + getBindAddress() + ", export " + getLocalAddress());
        }
    } catch (Throwable t) {
        throw new RemotingException(url.toInetSocketAddress(), null, "Failed to bind " + getClass().getSimpleName()
                + " on " + getLocalAddress() + ", cause: " + t.getMessage(), t);
    }
    // changed: resolve the executor through the wrapper chain
    this.setExecutor(handler);
}

/**
 * Set the executor by walking the handler delegate chain
 * until the WrappedChannelHandler inside it is reached.
 * @param handler
 * @author 夏志强
 */
private void setExecutor(ChannelHandler handler) {
    if (handler != null) {
        if (handler instanceof WrappedChannelHandler) {
            executor = ((WrappedChannelHandler) handler).getExecutor();
        } else if (handler instanceof AbstractChannelHandlerDelegate) {
            try {
                Field field = AbstractChannelHandlerDelegate.class.getDeclaredField("handler");
                field.setAccessible(true);
                setExecutor((ChannelHandler) field.get(handler));
            } catch (Exception e) {
                // ignore: leave executor null and fall back to an immediate close
            }
        }
    }
}
Running the test code again, the results now come back as expected.