在一次哨兵集群升级过程中,写Redis客户访问测试接口的时候,接口使用的JFinal的RedisSentinelPlugin初始化好的Redis获取Cache,进而获取Jedis。使用Jedis就可以任意操作Redis了。代码如下:
Cache cache = Redis.use();
Jedis jedis = cache.getJedis();
String s = jedis.get(key);
...use s
结果访问多次之后接口就无响应了,表现为一直阻塞。然后重启应用,又可以访问多次,之后又无响应了。起初不知道原因,怀疑网络延迟,哨兵搭建配置等等,感觉像无头苍蝇。最后实在没办法了,本地调试一下,搭建了一样的哨兵集群,debug,8次之后发现程序阻塞在一行代码中。
为什么会阻塞到这里呢?往上追溯,这个方法是在GenericObjectPool中调用的。
public T borrowObject(long borrowMaxWaitMillis) throws Exception {
assertOpen();
AbandonedConfig ac = this.abandonedConfig;
if (ac != null && ac.getRemoveAbandonedOnBorrow() &&
(getNumIdle() < 2) &&
(getNumActive() > getMaxTotal() - 3) ) {
removeAbandoned(ac);
}
PooledObject<T> p = null;
// Get local copy of current config so it is consistent for entire
// method execution
boolean blockWhenExhausted = getBlockWhenExhausted();
boolean create;
long waitTime = System.currentTimeMillis();
while (p == null) {
create = false;
if (blockWhenExhausted) {
p = idleObjects.pollFirst();
if (p == null) {
p = create();
if (p != null) {
create = true;
}
}
if (p == null) {
if (borrowMaxWaitMillis < 0) {
p = idleObjects.takeFirst();
} else {
p = idleObjects.pollFirst(borrowMaxWaitMillis,
TimeUnit.MILLISECONDS);
}
}
if (p == null) {
throw new NoSuchElementException(
"Timeout waiting for idle object");
}
if (!p.allocate()) {
p = null;
}
} else {
p = idleObjects.pollFirst();
if (p == null) {
p = create();
if (p != null) {
create = true;
}
}
if (p == null) {
throw new NoSuchElementException("Pool exhausted");
}
if (!p.allocate()) {
p = null;
}
}
if (p != null) {
try {
factory.activateObject(p);
} catch (Exception e) {
try {
destroy(p);
} catch (Exception e1) {
// Ignore - activation failure is more important
}
p = null;
if (create) {
NoSuchElementException nsee = new NoSuchElementException(
"Unable to activate object");
nsee.initCause(e);
throw nsee;
}
}
if (p != null && (getTestOnBorrow() || create && getTestOnCreate())) {
boolean validate = false;
Throwable validationThrowable = null;
try {
validate = factory.validateObject(p);
} catch (Throwable t) {
PoolUtils.checkRethrow(t);
validationThrowable = t;
}
if (!validate) {
try {
destroy(p);
destroyedByBorrowValidationCount.incrementAndGet();
} catch (Exception e) {
// Ignore - validation failure is more important
}
p = null;
if (create) {
NoSuchElementException nsee = new NoSuchElementException(
"Unable to validate object");
nsee.initCause(validationThrowable);
throw nsee;
}
}
}
}
}
updateStatsBorrow(p, System.currentTimeMillis() - waitTime);
return p.getObject();
}
此方法大体逻辑是先获取池化对象,没有获取就从空闲队列去拿。
进入这个分支的原因是create方法返回null了。
private PooledObject<T> create() throws Exception {
int localMaxTotal = getMaxTotal();
long newCreateCount = createCount.incrementAndGet();
if (localMaxTotal > -1 && newCreateCount > localMaxTotal ||
newCreateCount > Integer.MAX_VALUE) {
createCount.decrementAndGet();
return null;
}
final PooledObject<T> p;
try {
p = factory.makeObject();
} catch (Exception e) {
createCount.decrementAndGet();
throw e;
}
AbandonedConfig ac = this.abandonedConfig;
if (ac != null && ac.getLogAbandoned()) {
p.setLogAbandoned(true);
}
createdCount.incrementAndGet();
allObjects.put(new IdentityWrapper<T>(p.getObject()), p);
return p;
}
debug进来,发现是由于newCreateCount > localMaxTotal这句话生效了,原因在于localMaxTotal=8,当前面调用8次之后,newCreateCount=9。而localMaxTotal=8是org.apache.commons.pool2.impl.GenericObjectPoolConfig#DEFAULT_MAX_TOTAL这个默认值。果然,在RedisSentinelPlugin的构造方法中默认传入的一个new GenericObjectPoolConfig()。
进一步追究newCreateCount由createCount这个原子变量+1而来,难道他会一直增长就没有减少的地方吗?有!
private void destroy(PooledObject<T> toDestory) throws Exception {
toDestory.invalidate();
idleObjects.remove(toDestory);
allObjects.remove(new IdentityWrapper<T>(toDestory.getObject()));
try {
factory.destroyObject(toDestory);
} finally {
destroyedCount.incrementAndGet();
createCount.decrementAndGet();
}
}
至此,突然灵光一闪,回想起谆谆告诫,池化资源都需要关闭!!原来才发现我们没有关闭的操作。
修改一下,加上关闭操作,世界一下清净了。
Cache cache = Redis.use();
Jedis jedis = cache.getJedis();
String s = jedis.get(key);
cache.close(jedis);
...use s
再去翻Cache源码,发现人家的方法都是这种模式,用完的资源在finally中关闭。
而我们是由于获取Jedis用完后没有关闭造成的这个问题。
一个通宵换来的,记之以飨读者。