在JBoss Remoting 2.2.2中存在这么一个bug,如果刚好客户端的timeout比服务器端处理时间短的话,就会出现客户端连接池中的连接被无故用掉一个的状况,而且是没法回收的,最终就会导致很快客户端的连接池被占满的现象,在分析JBoss Remoting 2.2.2的代码后发现了问题的所在,同时查看了下JBoss Remoting 2.4的代码,发现在2.4中此bug已被修复。
来看下JBoss Remoting 2.2.2中有问题的这段代码的片断:
{
if (pooled != null )
{
usedPooled ++ ;
if (trace) log.trace( this + " got a socket, usedPooled: " + usedPooled);
break ;
}
if (usedPooled < maxPoolSize)
{
// Try to get a socket.
if (trace) log.trace( this + " getting a socket, usedPooled: " + usedPooled);
usedPooled ++ ;
}
else
{
retry = true ;
if (trace) log.trace( this + " will try again to get a socket " );
}
}
Socket socket = null ;
long timestamp = System.currentTimeMillis();
try
{
if (trace) { log.trace( this + " creating socket " + (counter ++ ) + " , attempt " + (i + 1 )); }
socket = createSocket(address.address, address.port, timeRemaining);
if (trace) log.trace( this + " created socket: " + socket);
}
catch (Exception ex)
{
log.debug( this + " got Exception " + ex + " , creation attempt took " +
(System.currentTimeMillis() - timestamp) + " ms " );
synchronized (usedPoolLock)
{
usedPooled -- ;
}
if (i + 1 < numberOfRetries)
{
Thread.sleep( 1 );
continue ;
}
throw ex;
}
socket.setTcpNoDelay(address.enableTcpNoDelay);
Map metadata = getLocator().getParameters();
if (metadata == null )
{
metadata = new HashMap( 2 );
}
else
{
metadata = new HashMap(metadata);
}
metadata.put(SocketWrapper.MARSHALLER, marshaller);
metadata.put(SocketWrapper.UNMARSHALLER, unmarshaller);
if (timeAllowed > 0 )
{
timeRemaining = ( int ) (timeAllowed - (System.currentTimeMillis() - start));
if (timeRemaining <= 0 )
break ;
metadata.put(SocketWrapper.TEMP_TIMEOUT, new Integer(timeRemaining));
}
pooled = createClientSocket(socket, address.timeout, metadata);
这段代码的问题出在哪呢,就出在最后一行,或者说出在前面实现给usedPooled++上也可以。
在这里JBoss Remoting过于相信createClientSocket这行代码了,jboss remoting认为这行代码是不可能抛出异常的,但事实上其实这行是有可能会抛出异常的,可以想下,如果这行代码执行抛出异常的话,会造成的现象就是之前说的,客户端连接池中的连接被占用了一个,而且没有回收的地方。
所以最简单的方法自然是在pooled=createClientSocket(socket, address.timeout, metadata);这行代码上增加捕捉try...catch,如果有异常抛出的话,则将usedPooled--;就像之前createSocket那个地方一样。
在JBoss Remoting 2.4中,jboss不再采用usedPooled这个long型加上usedPoolLock这个对象锁的方式来控制连接池,而是改为了采用更简单好用的Semphore,不过用的还是EDG包的,而不是java 5的,来看看jboss remoting 2.4中的这段代码改成什么样了:
if (trace) log.trace( this + " obtained semaphore: " + semaphore.permits());
if (timedout)
{
throw new IllegalStateException( " Timeout waiting for a free socket " );
}
SocketWrapper pooled = null ;
if (tryPool)
{
synchronized (pool)
{
// if connection within pool, use it
if (pool.size() > 0 )
{
pooled = getPooledConnection();
if (trace) log.trace( this + " reusing pooled connection: " + pooled);
}
}
}
else
{
if (trace) log.trace( this + " avoiding connection pool, creating new socket " );
}
if (pooled == null )
{
// Need to create a new one
Socket socket = null ;
if (trace) { log.trace( this + " creating socket " ); }
// timeAllowed < 0 indicates no per invocation timeout has been set.
int timeRemaining = - 1 ;
if ( 0 <= timeAllowed)
{
timeRemaining = ( int ) (timeAllowed - (System.currentTimeMillis() - start));
}
socket = createSocket(address.address, address.port, timeRemaining);
if (trace) log.trace( this + " created socket: " + socket);
socket.setTcpNoDelay(address.enableTcpNoDelay);
Map metadata = getLocator().getParameters();
if (metadata == null )
{
metadata = new HashMap( 2 );
}
else
{
metadata = new HashMap(metadata);
}
metadata.put(SocketWrapper.MARSHALLER, marshaller);
metadata.put(SocketWrapper.UNMARSHALLER, unmarshaller);
if (timeAllowed > 0 )
{
timeRemaining = ( int ) (timeAllowed - (System.currentTimeMillis() - start));
if (timeRemaining <= 0 )
throw new IllegalStateException( " Timeout creating a new socket " );
metadata.put(SocketWrapper.TEMP_TIMEOUT, new Integer(timeRemaining));
}
pooled = createClientSocket(socket, address.timeout, metadata);
}
return pooled;
从以上代码可以看到,JBoss首先是通过semphore.attempt的方式来获取信号量锁,然后就在下面的所有代码中都不做异常的捕捉,jboss在这里改为了在外面统一捕捉这个方法的所有异常,并在有异常的情况下再调用semphore.release():
{
boolean tryPool = retryCount < (numberOfCallRetries - 1 )
|| maxPoolSize == 1
|| numberOfCallRetries == 1 ;
long l = System.currentTimeMillis();
socketWrapper = getConnection(marshaller, unmarshaller, tryPool, timeLeft);
long d = System.currentTimeMillis() - l;
if (trace) log.trace( " took " + d + " ms to get socket " + socketWrapper);
}
catch (Exception e)
{
// if (bailOut)
// return null;
semaphore.release();
if (trace) log.trace( this + " released semaphore: " + semaphore.permits(), e);
sockEx = new CannotConnectException(
" Can not get connection to server. Problem establishing " +
" socket connection for " + locator, e);
continue ;
}
这样自然是不会再出现2.2.2版本里的那个bug了。
:),由于2.4是重构为了采用semphore,不知道这个bug是刚好凑巧被这样修复了呢,还是知道了这个bug进行fixed,呵呵,不管如何,总之bug是被修订了。
这个bug对于使用jboss remoting的同学们而言还是要引起注意的,因为jboss remoting 2.2.2是jboss as 4.2.2中默认带的版本。
从上面的代码可以看到使用semphore这样的方式来控制并发的需要限制大小的数据结构是非常好的,简单易用,以前的那些long+Object Lock的方式实在是繁琐,另外这个bug也给大家提了醒,在并发的这些资源的控制上千万要注意锁以及释放的点,千万不要主观的认为某些代码是绝对不会出问题的。