一:问题描述
- 在一段代码执行前,会写日志,在代码的finally快,也会写日志。现在遇到的问题是finally的代码没有执行。
二:问题查找
同事让我看下项目中为什么没有执行结束的日志(该日志可以在管理员界面看到)。我登录系统看到确实没有结束日志,只有开始日志。于是开始查找问题。
2.1 jstack
- 登录系统使用ps -ef|grep java 找到项目的pid,然后jstack pid >> thread.log,然后把thread.log下载到本地。在执行栈中寻找我打印日志前后相关代码。部分日志如下
2.2 日志查询
"xxxxx_Worker-113" prio=10 tid=0x00007f12709e2000 nid=0x552 waiting on condition [0x00007f10e72f0000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x0000000770db96b8> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:867)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1197)
at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:214)
at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:290)
at com.alibaba.dubbo.registry.support.AbstractRegistryFactory.getRegistry(AbstractRegistryFactory.java:88)
at com.alibaba.dubbo.registry.RegistryFactory$Adpative.getRegistry(RegistryFactory$Adpative.java)
at com.alibaba.dubbo.registry.integration.RegistryProtocol.refer(RegistryProtocol.java:240)
at com.alibaba.dubbo.rpc.protocol.ProtocolFilterWrapper.refer(ProtocolFilterWrapper.java:60)
at com.alibaba.dubbo.rpc.protocol.ProtocolListenerWrapper.refer(ProtocolListenerWrapper.java:63)
at com.alibaba.dubbo.rpc.Protocol$Adpative.refer(Protocol$Adpative.java)
at com.alibaba.dubbo.config.ReferenceConfig.createProxy(ReferenceConfig.java:401)
at com.alibaba.dubbo.config.ReferenceConfig.init(ReferenceConfig.java:309)
at com.alibaba.dubbo.config.ReferenceConfig.get(ReferenceConfig.java:136)
- locked <==0x00000007aa0ace80==> (a com.alibaba.dubbo.config.ReferenceConfig)
at com.xxxx.scheduler.server.process.AbstractProcess.createDubboProxy(AbstractProcess.java:168)
at com.xxxx.scheduler.server.process.DubboProcess.process(DubboProcess.java:82)
at com.xxxx.scheduler.server.process.DubboProcess$$FastClassBySpringCGLIB$$be6a767d.invoke(<generated>)
如上日志显示出createDubboProxy在我的业务代码中创建代理类,如上有2个锁相关。locked <0x00000007aa0ace80> 和parking to wait for <0x0000000770db96b8> 对应的代码分别是。
public synchronized T get() {
if (destroyed){
throw new IllegalStateException("Already destroyed!");
}
if (ref == null) {
init();
}
return ref;
}
和
public Registry getRegistry(URL url) {
url = url.setPath(RegistryService.class.getName())
.addParameter(Constants.INTERFACE_KEY, RegistryService.class.getName())
.removeParameters(Constants.EXPORT_KEY, Constants.REFER_KEY);
String key = url.toServiceString();
// 锁定注册中心获取过程,保证注册中心单一实例
LOCK.lock();
try {
Registry registry = REGISTRIES.get(key);
if (registry != null) {
return registry;
}
registry = createRegistry(url);
if (registry == null) {
throw new IllegalStateException("Can not create registry " + url);
}
REGISTRIES.put(key, registry);
return registry;
} finally {
// 释放锁
LOCK.unlock();
}
}
//这里的lock定义是static的lock。
private static final ReentrantLock LOCK = new ReentrantLock();
对于第一个锁方法 public synchronized T get() ,锁的是this对象,因为当前类在外面的调用中是new的,每次调用都是全新的this,所以不存在锁争用,搜索0x00000007aa0ace80也能看到每个调用栈lock都是不同的对象
对于第二个存在锁的方法public Registry getRegistry(URL url) 可以看到静态的锁,全局只能有一个线程通过。于是我需要在栈信息中搜索获取到锁的调用链,在线程信息中搜索相关代码,于是我搜关键字ZookeeperRegistryFactory.createRegistry ,:ctrl + F 搜索 “ZookeeperRegistryFactory.createRegistry”,大多数线程调用栈都和上面的日志一样,到AbstractRegistryFactory.getRegistry后就进入ReentrantLock类了,只有如下线程调用栈不同,日志如下:
"quartzScheduler_Worker-70" prio=10 tid=0x00007f127098a800 nid=0x527 waiting on condition [0x00007f12a1edc000]
java.lang.Thread.State: TIMED_WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000007a92154b8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.parkUntil(LockSupport.java:267)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitUntil(AbstractQueuedSynchronizer.java:2130)
at org.I0Itec.zkclient.ZkClient.waitForKeeperState(ZkClient.java:651)
at org.I0Itec.zkclient.ZkClient.waitUntilConnected(ZkClient.java:634)
at org.I0Itec.zkclient.ZkClient.connect(ZkClient.java:894)
at org.I0Itec.zkclient.ZkClient.<init>(ZkClient.java:98)
at org.I0Itec.zkclient.ZkClient.<init>(ZkClient.java:92)
at org.I0Itec.zkclient.ZkClient.<init>(ZkClient.java:76)
at org.I0Itec.zkclient.ZkClient.<init>(ZkClient.java:72)
at
//调用栈中的关键点ZkclientZookeeperClient.java:25行。
com.alibaba.dubbo.remoting.zookeeper.zkclient.ZkclientZookeeperClient.<init>(ZkclientZookeeperClient.java:25)
at com.alibaba.dubbo.remoting.zookeeper.zkclient.ZkclientZookeeperTransporter.connect(ZkclientZookeeperTransporter.java:10)
at com.alibaba.dubbo.remoting.zookeeper.ZookeeperTransporter$Adpative.connect(ZookeeperTransporter$Adpative.java)
at com.alibaba.dubbo.registry.zookeeper.ZookeeperRegistry.<init>(ZookeeperRegistry.java:71)
at com.alibaba.dubbo.registry.zookeeper.ZookeeperRegistryFactory.createRegistry(ZookeeperRegistryFactory.java:37)
at com.alibaba.dubbo.registry.support.AbstractRegistryFactory.getRegistry(AbstractRegistryFactory.java:94)
at com.alibaba.dubbo.registry.RegistryFactory$Adpative.getRegistry(RegistryFactory$Adpative.java)
at com.alibaba.dubbo.registry.integration.RegistryProtocol.refer(RegistryProtocol.java:240)
at com.alibaba.dubbo.rpc.protocol.ProtocolFilterWrapper.refer(ProtocolFilterWrapper.java:60)
at com.alibaba.dubbo.rpc.protocol.ProtocolListenerWrapper.refer(ProtocolListenerWrapper.java:63)
at com.alibaba.dubbo.rpc.Protocol$Adpative.refer(Protocol$Adpative.java)
at com.alibaba.dubbo.config.ReferenceConfig.createProxy(ReferenceConfig.java:401)
at com.alibaba.dubbo.config.ReferenceConfig.init(ReferenceConfig.java:309)
at com.alibaba.dubbo.config.ReferenceConfig.get(ReferenceConfig.java:136)
根据这个调用找到代码如下:
public class ZkclientZookeeperClient extends AbstractZookeeperClient<IZkChildListener> {
private final ZkClient client;
private volatile KeeperState state = KeeperState.SyncConnected;
public ZkclientZookeeperClient(URL url) {
super(url);
//这里初始化ZkClient使用的是只有url地址的构造函数。
//内部实现是 this(serverstring, Integer.MAX_VALUE);
//也就是超时设置的是Integer.MAX_VALUE毫秒
client = new ZkClient(url.getBackupAddress());
client.subscribeStateChanges(new IZkStateListener() {
public void handleStateChanged(KeeperState state) throws Exception {
ZkclientZookeeperClient.this.state = state;
if (state == KeeperState.Disconnected) {
stateChanged(StateListener.DISCONNECTED);
} else if (state == KeeperState.SyncConnected) {
stateChanged(StateListener.CONNECTED);
}
}
public void handleNewSession() throws Exception {
stateChanged(StateListener.RECONNECTED);
}
});
}
三:解决办法。
升级dubbo是不可能的,于是我去看了dubbo的github代码,发现已经在 2017.10.31 修复了这个问题。
提交记录:
具体提交记录:
到此确认这个确实是dubbo的bug。于是需要解决这个问题。升级dubbo风险太大,修改单个dubbo的jar,又害怕因为重新编译出问题。最后只能是把ZkclientZookeeperClient复制一份到我业务的src目录下,然后做了如下修改:
//dubbo中ZkclientZookeeperClient.java第25行中的
client = new ZkClient(url.getBackupAddress());
//修改为如下代码,加入超时时间。
//为了确保dubbo使用的是我业务的ZkclientZookeeperClient类,而不是dubbo.jar中的ZkclientZookeeperClient类,于是加了日志。测试发现没有问题。
logger.info(" ignored this error, My_ZkclientZookeeperClient init | start new ZkClient,url is:{} ", url.getBackupAddress());
try {
client = new ZkClient(url.getBackupAddress(), 10000);
} catch (Exception e) {
logger.error(" new ZkClient is fail ,url is ", url.getBackupAddress(), e);
throw e;
}
- tomcat有自己的类加载器,会在lib和classes中优先加载classes的class文件。如果是java,则会是java -cp(等价于java -classpath)指定,平时我们用ide运行java程序,ide都会指定-cp。 *
到此,问题解决。