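Background: the business system runs several scheduled tasks that call the bank side at fixed intervals, and the banks also send a large volume of requests to the business system every day. All of this traffic goes through MINA. In the test environment the number of open handles passed 10,000 after a day or two, and in production the system would fall over after about a week without a restart. At first it was unclear what was driving the growth: this is an old system that has been through many upgrades and has heavy concurrency and multithreading. The stopgap was a scheduled job that restarted the production system every week.
Note: the business system and the banks do not talk to each other directly. Messages in both directions are relayed by a conversion platform written in C. The flow is: business system (request) -> conversion platform (relay) -> bank (response) -> conversion platform (relay) -> business system receives the response. Requests from the bank to the business system travel the same way.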
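First, list the number of open handles per process, sorted from largest to smallest, one process per line:
lsof -n|awk '{print $2}'|sort|uniq -c|sort -nr|more
The first column of the output is the number of open handles and the second column is the process ID.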
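Then list every file opened by a single process (replace PID with the process ID):
lsof -p PID
That output is too long and messy to read in a terminal, so redirect it to a log file and inspect it there:
lsof -p PID > openfiles.log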
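To get a quick overview of a dump like openfiles.log, the rows can be grouped by lsof's TYPE column. The following is a minimal sketch, not part of the original system: the class name LsofTypeSummary is made up for illustration, and it assumes lsof's default column layout, in which TYPE is the fifth whitespace-separated field.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for illustration; not part of the original system.
final class LsofTypeSummary {

    // Counts lsof rows per TYPE value (assumed to be the 5th column).
    static Map<String, Integer> countByType(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            String[] cols = line.trim().split("\\s+");
            // Skip the header row and rows too short to carry a TYPE column.
            if (cols.length < 5 || "COMMAND".equals(cols[0])) {
                continue;
            }
            counts.merge(cols[4], 1, Integer::sum);
        }
        return counts;
    }

    // Usage: java LsofTypeSummary openfiles.log
    public static void main(String[] args) throws Exception {
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        countByType(lines).forEach((type, n) -> System.out.println(n + "\t" + type));
    }
}
```

If the summary is dominated by socket types such as IPv4, IPv6 or sock rather than REG or DIR, the leak is most likely in network code rather than in file handling.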
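The log showed a large number of socket connections that had never been released, which pointed at the communication between the business system and the banks. Analysis: the test environment is connected to several banks, and some banks only bring up their test environment while they are actively testing. The business system connects to the conversion platform, not to the bank, so as a client its connection to the platform always succeeds. When the platform cannot forward the request, no response ever comes back. The conversion platform has its own timeout, but the business system's client code did not set a read timeout, so every such request left the client waiting on an open connection, and the handle count climbed quickly.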
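The failure mode described above can be reproduced without MINA. The sketch below uses plain java.net sockets (not the system's actual transport) and a made-up class name, ReadTimeoutDemo: a server socket that completes connections but never answers stands in for the conversion platform when a bank is offline. With a read timeout the client gives up and closes its socket; without one, the read() call would block and the handle would stay open.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Illustration only: plain sockets instead of MINA, hypothetical class name.
final class ReadTimeoutDemo {

    // Connects, waits for one byte, and reports whether the read timed out.
    // try-with-resources closes the socket (and frees its handle) either way.
    static boolean readTimesOut(int port, int readTimeoutMillis) {
        try (Socket socket = new Socket(InetAddress.getLoopbackAddress(), port)) {
            socket.setSoTimeout(readTimeoutMillis); // the setting the old client lacked
            InputStream in = socket.getInputStream();
            try {
                in.read();
                return false; // data or end-of-stream arrived in time
            } catch (SocketTimeoutException e) {
                return true;  // no response within the timeout
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // The listening socket never calls accept() and never writes: the OS still
    // completes the TCP handshake, so the client connects but gets no response.
    static boolean demo() {
        try (ServerSocket silent = new ServerSocket(0, 50, InetAddress.getLoopbackAddress())) {
            return readTimesOut(silent.getLocalPort(), 300);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("read timed out: " + demo());
    }
}
```

The MINA client in this post applies the same idea through the timed variant of ReadFuture.awaitUninterruptibly.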
Below is the code for the business system acting as client and as server, with the modified parts marked.
Client code:
package com.fortunes.hmfms.network.client;
import java.net.InetSocketAddress;
import java.util.concurrent.TimeUnit;
import org.apache.mina.core.RuntimeIoException;
import org.apache.mina.core.future.ConnectFuture;
import org.apache.mina.core.future.ReadFuture;
import org.apache.mina.core.future.WriteFuture;
import org.apache.mina.core.service.IoHandlerAdapter;
import org.apache.mina.core.session.IoSession;
import org.apache.mina.filter.codec.ProtocolCodecFilter;
import org.apache.mina.filter.logging.LoggingFilter;
import org.apache.mina.transport.socket.SocketConnector;
import org.apache.mina.transport.socket.nio.NioSocketConnector;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import com.fortunes.Message;
import com.fortunes.hmfms.network.codec.MessageCodecFactory;
import com.fortunes.hmfms.network.model.XmlEntity;
public class Client extends IoHandlerAdapter{
final Logger logger = LoggerFactory.getLogger("ROOT");
public static final int CONNECT_TIMEOUT = 3000;
public static final String RETURN_VALUE = "returnValue";
private InetSocketAddress serverAddress;
private SocketConnector connector;
private IoSession session;
private MessageReceivedCallback messageReceivedCallback;
public Client() {
connector = new NioSocketConnector();
connector.getFilterChain().addLast("logger", new LoggingFilter());
connector.getFilterChain().addLast("codec",
new ProtocolCodecFilter(new MessageCodecFactory()));
//connector.setHandler(new Client());
connector.setHandler(this);
// Set the connect timeout (added by czp, 2018-12-07)
connector.setConnectTimeoutMillis(CONNECT_TIMEOUT*2);
}
public boolean isConnected() {
return (session != null && session.isConnected());
}
public void connect(InetSocketAddress serverAddress){
setServerAddress(serverAddress);
connect();
}
public void reConnectIfNecessary(){
if(!isConnected()){
logger.info("Connection lost, reconnecting");
connect();
}
}
public void connect() {
    ConnectFuture connectFuture = getConnector().connect(getServerAddress());
    connectFuture.awaitUninterruptibly(CONNECT_TIMEOUT);
    // added by czp, 2018-12-07: release the connector when the connection attempt fails
    if (connectFuture.isConnected()) {
        try {
            session = connectFuture.getSession();
            // Required so that sendRequest() can call session.read()
            session.getConfig().setUseReadOperation(true);
            logger.info("Connected to {}, local address: {}", session.getRemoteAddress(), session.getLocalAddress());
        } catch (RuntimeIoException e) {
            logger.info("Connection failed", e);
        }
    } else {
        // Not connected within CONNECT_TIMEOUT ms, or the attempt failed
        connectFuture.cancel();
        // Without dispose() the connector keeps its handles; after running for a while
        // the process fails with "Too many open files" and can no longer connect.
        // Note: a disposed connector cannot be reused for a later connect().
        getConnector().dispose();
        logger.info("Connection failed");
    }
}
public XmlEntity sendRequest(Message message,MessageReceivedCallback callback){
    /* Original version: waits for the response with no timeout
    setMessageReceivedCallback(callback);
    WriteFuture writeFuture = session.write(message);
    writeFuture.awaitUninterruptibly();
    ReadFuture readFuture = session.read();
    readFuture.awaitUninterruptibly();
    return callback.process(this, session, (Message)readFuture.getMessage());
    */
    // changed by czp, 2018-12-07: fixes the rapid handle growth when the bank side does not respond
    Message resp = null;
    try {
        setMessageReceivedCallback(callback);
        WriteFuture writeFuture = session.write(message);
        writeFuture.awaitUninterruptibly();
        ReadFuture readFuture = session.read();
        // Wait for the response, but for at most CONNECT_TIMEOUT*2 ms
        if (readFuture.awaitUninterruptibly(CONNECT_TIMEOUT*2, TimeUnit.MILLISECONDS)) {
            resp = (Message) readFuture.getMessage();
        } else {
            logger.info("Timed out reading the server response, server: " + readFuture.getSession().getServiceAddress());
            // Closing the session only closes the TCP connection. The connector obtained
            // its own file handles from the OS, so it must be disposed as well. Otherwise
            // handles leak, and once the total exceeds the limit (ulimit -n) the JVM throws
            // "java.io.IOException: Too many open files" and no new connection can be made.
            session.getService().dispose();
            // session.close(boolean) is asynchronous: true closes immediately,
            // false closes after all pending writes have been flushed.
            session.close(false);
            // Has no further effect if the connector field is the service disposed above
            connector.dispose();
            logger.info("Timed out reading the server response, client resources released");
        }
    } catch (Exception e) {
        // Pass the exception itself so that the stack trace is logged
        logger.error("Exception in Client.sendRequest", e);
    }
    return callback.process(this, session, resp);
}
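public void close(){
    if(isConnected()){
        // Closing the session only closes the TCP connection, so dispose the connector too
        session.getService().dispose(); // added by czp, 2018-12-07
        // session.close(boolean) is asynchronous: true closes immediately,
        // false closes after all pending writes have been flushed.
        session.close(false);
        connector.dispose();
        logger.info("Client closed the connection\n");
    }
}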
@Override
public void messageReceived(IoSession session, Object message)
        throws Exception {
    logger.info("Received message from {}:\n{} - local address: {}", session.getRemoteAddress(), message, session.getLocalAddress());
}
@Override
public void messageSent(IoSession session, Object message) throws Exception {
    logger.info("Message sent to {}:\n{}", session.getRemoteAddress(), message);
    logger.info("Message sent!");
}
@Override
public void sessionClosed(IoSession session) throws Exception {
    // Called once the session has closed: dispose the connector that owns it
    session.getService().dispose(); // added by czp, 2018-12-07
    session.close(false); // added by czp, 2018-12-07
    logger.info("Connection to {} closed - local address: {}", session.getRemoteAddress(), session.getLocalAddress());
}
@Override
public void exceptionCaught(IoSession session, Throwable cause)
        throws Exception {
    session.close(true);
    // A Throwable passed as the last argument is logged with its stack trace
    logger.info("Communication error", cause);
}
public void setMessageReceivedCallback(MessageReceivedCallback messageReceivedCallback) {
this.messageReceivedCallback = messageReceivedCallback;
}
public MessageReceivedCallback getMessageReceivedCallback() {
return messageReceivedCallback;
}
public void setConnector(SocketConnector connector) {
this.connector = connector;
}
public SocketConnector getConnector() {
return connector;
}
public void setServerAddress(InetSocketAddress serverAddress) {
this.serverAddress = serverAddress;
}
public InetSocketAddress getServerAddress() {
return serverAddress;
}
}
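The most important change is the session.getService().dispose() call added in sessionClosed, together with session.close(false). MINA invokes sessionClosed once the session has closed, so the close call there has little left to do; what matters is disposing the connector. Each Client creates its own NioSocketConnector, and the connector keeps its own resources (I/O threads and NIO selectors, which hold file handles) until dispose() is called. If the connector is never disposed, the handles it opened stay open even though the session itself has closed.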
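The effect can be observed with the JDK alone. The sketch below is an illustration under two assumptions: that MINA's NIO connector is built on java.nio.channels.Selector, and that the JVM exposes com.sun.management.UnixOperatingSystemMXBean (HotSpot on Linux or macOS). The class name SelectorHandleDemo is made up. It opens a number of selectors without closing them, reads the process's open descriptor count, then closes them and reads it again.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;
import java.nio.channels.Selector;
import java.util.ArrayList;
import java.util.List;

// Illustration only; hypothetical class name, no MINA involved.
final class SelectorHandleDemo {

    // Open file descriptors of this JVM, or -1 where the JVM does not expose the count.
    static long openFds() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            return ((com.sun.management.UnixOperatingSystemMXBean) os).getOpenFileDescriptorCount();
        }
        return -1;
    }

    // Returns {before, while the selectors are open, after closing them}.
    static long[] measure(int selectors) {
        List<Selector> held = new ArrayList<>();
        try {
            long before = openFds();
            for (int i = 0; i < selectors; i++) {
                held.add(Selector.open()); // each selector takes descriptors from the OS
            }
            long whileOpen = openFds();
            for (Selector s : held) {
                s.close();                 // the step a never-disposed connector skips
            }
            held.clear();
            long afterClose = openFds();
            return new long[] {before, whileOpen, afterClose};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            // Only non-empty if an exception interrupted the sequence above.
            for (Selector s : held) {
                try { s.close(); } catch (IOException ignored) { }
            }
        }
    }

    public static void main(String[] args) {
        long[] m = measure(50);
        System.out.println("before=" + m[0] + " open=" + m[1] + " closed=" + m[2]);
    }
}
```

The count rises while the selectors are held and drops once they are closed. A client that creates a connector per request and never disposes it stays in the "held" state for every request it has ever made.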
Below is MessageHandler, the code for the business system acting as server (the server code was not modified):
import org.apache.mina.core.service.IoHandlerAdapter;
import org.apache.mina.core.session.IoSession;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import com.fortunes.Message;
import com.fortunes.hmfms.network.model.XmlEntity;
public class MessageHandler extends IoHandlerAdapter {
final Logger logger = LoggerFactory.getLogger("ROOT");
@Override
public void messageReceived(IoSession session, Object o){
    logger.info("MINA socket: received request from remote client {}, bank code: {}",session.getRemoteAddress(),bankCode);
    Message requestMessage = (Message)o;
    XmlEntity responseXml = null;
    XmlEntity requestXml = XmlEntity.parse(requestMessage.getContents());
    if(requestXml == null){
        responseXml = XmlEntity.create().createDefaultRequest(XML_ERROR);
        responseXml.setResponseCodeAndMsg("000001", "Failed to parse the XML message! Please check the message format");
        // The response is written once, at the end of this method
    }else{
        try {
            // business logic
        } catch (NumberFormatException e) {
            logger.info("Message interface error", e);
            responseXml = XmlEntity.create().createDefaultResponse(requestXml);
            responseXml.setResponseCodeAndMsg("000001", "System error! Please check the input data, "+e.getMessage());
        } catch (Exception e) {
            logger.info("Message interface error", e);
            responseXml = XmlEntity.create().createDefaultResponse(requestXml);
            responseXml.setResponseCodeAndMsg("000001", "System error! Please check the input data and try again later");
        }
    }
    session.write(Message.createDefaultMessage(responseXml.buildAsBytes()));
}
@Override
public void exceptionCaught(IoSession session, Throwable cause)throws Exception {
    session.close(true);
    logger.info("Communication handler error", cause);
}
@Override
public void sessionClosed(IoSession session) throws Exception {
    session.close(false);
    //logger.info("Connection closed!", session.getRemoteAddress());
    logger.info("Connection closed!");
}
}
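With these changes deployed to the test environment, the rapid handle growth no longer occurred, and the handle count stayed at the level seen right after deployment.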