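1. Symptom
After running for some time, the file server fails a large number of file downloads.
2. Root cause analysis
The server log shows a large number of connection pool exceptions: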
org.apache.http.conn.ConnectionPoolTimeoutException: Timeout waiting for connection from pool
As a result, downloads from OBS fail.
Checking the server's network state shows a large number of connections in the CLOSE_WAIT state:
[root@host]# netstat -an|awk '/tcp/ {print $6}'|sort|uniq -c
800 CLOSE_WAIT
23 ESTABLISHED
25 LISTEN
2 TIME_WAIT
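The same tally can be produced in Java, for example to feed a monitoring check. The following is only a minimal sketch: the class name `TcpStateTally` and the sample lines are hypothetical, and it assumes the state is the sixth whitespace-separated field, as in the awk command above.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class TcpStateTally {
    // Counts the TCP state column (6th field) of netstat-style lines,
    // mirroring: netstat -an | awk '/tcp/ {print $6}' | sort | uniq -c
    static Map<String, Integer> tally(List<String> netstatLines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : netstatLines) {
            if (!line.contains("tcp")) {
                continue; // awk '/tcp/' keeps only lines that mention tcp
            }
            String[] fields = line.trim().split("\\s+");
            if (fields.length >= 6) {
                counts.merge(fields[5], 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical sample lines, not taken from the real server.
        List<String> sample = List.of(
            "tcp 0 0 10.0.0.1:41000 10.0.0.2:80 CLOSE_WAIT",
            "tcp 1 0 10.0.0.1:41001 10.0.0.2:80 CLOSE_WAIT",
            "tcp 0 0 10.0.0.1:22 10.0.0.9:50500 ESTABLISHED",
            "udp 0 0 0.0.0.0:68 0.0.0.0:*");
        System.out.println(tally(sample));
    }
}
```

A CLOSE_WAIT count that keeps growing and never drops is the signal to look for.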
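A large CLOSE_WAIT count usually means that passively closed connections are not being handled properly: the peer has closed its end, but the local application has not yet closed its own socket.
For example, server A requests a file from Apache on server B. Normally, when the request succeeds, server A initiates the close after fetching the resource. That is an active close, and the connection on A ends up in TIME_WAIT. But what if something goes wrong? Suppose the requested resource does not exist on server B, and server B is the one that initiates the close. Server A is then closed passively, and if A does not release the connection afterwards, the connection stays in CLOSE_WAIT.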
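The passive-close case can be observed with two plain sockets on the loopback interface. This is a rough sketch rather than the FileServer code: the class name `CloseWaitDemo` is hypothetical, and it only shows that the client sees end of stream once the peer closes, while its own socket stays open until the application closes it.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

class CloseWaitDemo {
    // Returns what the client's read() yields after the peer closes first.
    static int readAfterPeerClose() {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(server.getInetAddress(), server.getLocalPort())) {
            // The "server B" side closes first, which sends a FIN to the client.
            server.accept().close();
            // read() returns -1 at end of stream: the client has seen the FIN.
            // From this point until client.close() runs, the client-side
            // socket is in CLOSE_WAIT at the OS level.
            return client.getInputStream().read();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } // try-with-resources closes the client, which ends CLOSE_WAIT
    }

    public static void main(String[] args) {
        System.out.println("read() after peer close: " + readAfterPeerClose());
    }
}
```

If the client kept the socket open instead of closing it, `netstat` on the client side would keep listing the connection as CLOSE_WAIT.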
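Applying this to our case: FileServer acts as the client and requests files from OBS. If the file does not exist on OBS, OBS initiates the close, and if FileServer never closes the connection on its side, this is exactly the problem we would expect.
The logs confirm that a large number of requests were for files that do not exist: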
[root@host]# cat fileserver.log.2019-02-27* |grep 'get response from nsp is:404' |wc -l
881
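The connection pool configuration in the code sets the maximum number of HTTP connections to 800, and the number of CLOSE_WAIT connections has reached exactly 800, which supports the hypothesis above.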
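Why leaked connections end in a pool timeout can be modeled with a counting semaphore. The sketch below is an assumption-laden simplification, not how Apache HttpClient is implemented: `LeakyPoolSketch` and its methods are hypothetical names, and a failed `lease` stands in for the ConnectionPoolTimeoutException seen in the log.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

class LeakyPoolSketch {
    private final Semaphore permits;

    LeakyPoolSketch(int maxTotal) {
        permits = new Semaphore(maxTotal);
    }

    // Leases one connection slot, or returns false after waiting waitMillis.
    boolean lease(long waitMillis) {
        try {
            return permits.tryAcquire(waitMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    // Returns a slot to the pool; the buggy code path never gets here.
    void release() {
        permits.release();
    }

    // Leases 'requests' slots without releasing any and counts the successes.
    static int leakedLeases(int maxTotal, int requests) {
        LeakyPoolSketch pool = new LeakyPoolSketch(maxTotal);
        int succeeded = 0;
        for (int i = 0; i < requests; i++) {
            if (pool.lease(10)) {
                succeeded++; // slot taken and, as in the bug, never released
            }
        }
        return succeeded;
    }

    public static void main(String[] args) {
        System.out.println("successful leases: " + leakedLeases(3, 5));
    }
}
```

With a cap of 800 and 881 logged 404 responses, the same accounting would exhaust the pool before the last failed requests were served.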
Next, look at the code that downloads the file:
private static CloseableHttpResponse getNSPResponse(String url, Integer fileType, String nspUrl) throws CException
{
    HttpGet httpRequest = buildHttpGetUriRequest(url, fileType, nspUrl);
    CloseableHttpResponse responseGet = null;
    try
    {
        responseGet = HttpUtil.doGet(httpRequest);
    }
    catch (IOException e)
    {
        logDebugger.error("getNSPResponse.url = " + url + ",Send to nsp doGet is IOException:", e);
        throw new CException(ResultCodeConstants.SEND_NSP_IO_EXCEPTION, "Send to nsp doGet is IOException");
    }
    if (responseGet == null || responseGet.getStatusLine() == null)
    {
        logDebugger.error("getNSPResponse.url = " + url + ",Get response from nsp is null");
        throw new CException(ResultCodeConstants.GET_RESP_FROM_NSP_NUll, "Get response from nsp is null");
    }
    int statusCode = responseGet.getStatusLine().getStatusCode();
    logDebugger.info("getNSPResponse.url = " + url + ",get response from nsp is:{}",statusCode);
    if (HttpStatus.SC_OK != statusCode)
    {
        int errorCode = ResultCodeConstants.RESOPNE_FROM_NSP_PREFIX * 1000 + statusCode;
        logDebugger.error("getNSPResponse.url = " + url + ",get response from nsp is exception, ExceptionCode = " + errorCode);
        // responseGet is still open here and is never closed before the throw
        throw new CException(errorCode,
            "getNSPResponse.url = " + url + ",get response from nsp is exception, ExceptionCode = "+ errorCode);
    }
    return responseGet;
}
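The code does indeed contain the problem: when the requested file does not exist, it throws an exception directly without closing the response stream, so the connection is never released.
The offending part is the non-200 branch above, which throws CException while responseGet is still open.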
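The effect of that branch can be shown with a stand-in response type that counts open handles. This is an illustrative sketch only: `LeakOn404Sketch`, `FakeResponse` and `leakyGet` are hypothetical names, and `IllegalStateException` takes the place of CException.

```java
import java.io.Closeable;
import java.util.concurrent.atomic.AtomicInteger;

class LeakOn404Sketch {
    static final AtomicInteger OPEN = new AtomicInteger();

    // Hypothetical stand-in for CloseableHttpResponse: tracks open handles.
    static final class FakeResponse implements Closeable {
        final int status;

        FakeResponse(int status) {
            this.status = status;
            OPEN.incrementAndGet();
        }

        @Override
        public void close() {
            OPEN.decrementAndGet();
        }
    }

    // Same shape as the original getNSPResponse: throws on non-200 without closing.
    static FakeResponse leakyGet(int status) {
        FakeResponse response = new FakeResponse(status);
        if (status != 200) {
            throw new IllegalStateException("status " + status); // response never closed
        }
        return response;
    }

    // Issues n requests that all return 404 and reports how many responses stay open.
    static int openAfter404s(int n) {
        OPEN.set(0);
        for (int i = 0; i < n; i++) {
            try {
                leakyGet(404);
            } catch (IllegalStateException expected) {
                // The caller never received the response, so it cannot close it.
            }
        }
        return OPEN.get();
    }

    public static void main(String[] args) {
        System.out.println("responses left open: " + openAfter404s(5));
    }
}
```

Because the exception leaves the method before the response is returned, no caller can clean up afterwards; the close has to happen inside the method itself.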
3. Reproducing the problem
Download a file that does not exist on OBS to reproduce the problem in the development environment:
public static void main(String[] args) throws IOException {
    String url = "http://lfappdevfile01.hwcloudtest.cn:18085/FileServer/getFile/app/000/000/375/0900086000000000375.20190227145423.92492623218927803776262971924930:20190510161531:2500:9692682CC19232CA6DE605D340C269D7E11CBAFC1B67D7F3E476030E679D5EC4.jpg";
    // Request the same missing file 1000 times, more than the pool limit of 800
    for (int i = 0; i < 1000; i++)
    {
        HttpRequest.downLoadFromUrl(url, "test" + i + ".jpg", "D:\\test\\test\\");
    }
    System.out.println("Download complete");
}
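Before the test
After the test, the problem is indeed reproduced
4. Code fix
With the cause known, the code was changed and retested: the feature works normally again and CLOSE_WAIT connections no longer pile up.
After the environment ran for a while, the problem did not recur.
The fixed code is as follows: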
private static JSONObject getNSPResponse(String url, Integer fileType, String nspUrl) throws CException
{
    HttpGet httpRequest = buildHttpGetUriRequest(url, fileType, nspUrl);
    CloseableHttpResponse responseGet = null;
    try
    {
        responseGet = HttpUtil.doGet(httpRequest);
        if (responseGet == null || responseGet.getStatusLine() == null)
        {
            logDebugger.error("getNSPResponse.url = " + url + ",Get response from nsp is null");
            throw new CException(ResultCodeConstants.GET_RESP_FROM_NSP_NUll, "Get response from nsp is null");
        }
        int statusCode = responseGet.getStatusLine().getStatusCode();
        logDebugger.info("getNSPResponse.url = " + url + ",get response from nsp is:{}",statusCode);
        if (HttpStatus.SC_OK != statusCode)
        {
            int errorCode = ResultCodeConstants.RESOPNE_FROM_NSP_PREFIX * 1000 + statusCode;
            logDebugger.error("getNSPResponse.url = " + url + ",get response from nsp is exception, ExceptionCode = " + errorCode);
            throw new CException(errorCode,
                "getNSPResponse.url = " + url + ",get response from nsp is exception, ExceptionCode = "+ errorCode);
        }
        // Parse the body while the response is still open, then return only the parsed result
        JSONObject rspStruct = getResponse(responseGet);
        logDebugger.info(
            "NSPServiceClient.executeGet,rspStruct.getString(url)=" + rspStruct.getString("url") + ",status = "
                + responseGet.getStatusLine().getStatusCode() + " from nsp.");
        return rspStruct;
    }
    catch (IOException e)
    {
        logDebugger.error("getNSPResponse.url = " + url + ",Send to nsp doGet is IOException:", e);
        throw new CException(ResultCodeConstants.SEND_NSP_IO_EXCEPTION, "Send to nsp doGet is IOException");
    }
    finally
    {
        // Runs on every exit path, including the CException thrown for non-200 responses
        IOUtils.closeQuietly(responseGet);
    }
}
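Since CloseableHttpResponse implements Closeable, the same guarantee could also be expressed with try-with-resources instead of an explicit finally block. The sketch below is one possible shape, not the author's fix: `TryWithResourcesSketch`, `FakeResponse` and `fetch` are hypothetical stand-ins so that the example runs without HttpClient on the classpath.

```java
class TryWithResourcesSketch {
    // Hypothetical stand-in for CloseableHttpResponse.
    static final class FakeResponse implements AutoCloseable {
        final int status;
        boolean closed;

        FakeResponse(int status) {
            this.status = status;
        }

        @Override
        public void close() {
            closed = true;
        }
    }

    // Checks the status inside try-with-resources; close() runs on every exit path.
    static String fetch(FakeResponse source) {
        try (FakeResponse response = source) {
            if (response.status != 200) {
                throw new IllegalStateException("unexpected status " + response.status);
            }
            return "body";
        }
    }

    public static void main(String[] args) {
        FakeResponse missing = new FakeResponse(404);
        try {
            fetch(missing);
        } catch (IllegalStateException e) {
            System.out.println("closed after failure: " + missing.closed);
        }
    }
}
```

Either form works; the point is that the response is closed in the same method that obtained it, whether the request succeeded or not.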