功能上线后,运行平稳过一段时间(也许新功能上线后,进行了灰度,访问量不大)也就没引起注意。事故发生后,也立即做了补救措施,但部分现场也没有及时保留,从告警日志中发现,rt飙高的接口都指向了同一个接口。该接口是通过httpclient来调用微信的接口,因为微信为了方便用户对接其接口,对httpclient进行了封装,提供了一个sdk,代码如下:
// 加载商户私钥(privateKey:私钥字符串)
PrivateKey merchantPrivateKey = null;
try {
merchantPrivateKey = PemUtil
.loadPrivateKey(new ByteArrayInputStream(privateKey.getBytes("utf-8")));
// 加载平台证书(mchId:商户号,mchSerialNo:商户证书序列号,apiV3Key:V3密钥)
updateCertificatesVerifier = new AutoUpdateCertificatesVerifier(
new WechatPay2Credentials(mchId, new PrivateKeySigner(mchSerialNo, merchantPrivateKey)), apiV3Key.getBytes("utf-8"));
// 初始化httpClient
httpClient = WechatPayHttpClientBuilder.create()
.withMerchant(mchId, mchSerialNo, merchantPrivateKey)
.withValidator(new WechatPay2Validator(updateCertificatesVerifier))
.build();
} catch (UnsupportedEncodingException e) {
log.error("WeChatPayHttpClientUtil初始化失败");
}
经过排查发现次sdk只是对httpclient验签等相关功能进行了补充,对于一些httpclient核心如连接池并没有进行设置,也就是都是使用默认的配置,通过debug源码发现
ConnectionRequestTimeout,ConnectionTimeout,SocketTimeout默认值都为 -1
ConnectionRequestTimeout
,从连接池获取一个可用连接的等待超时ConnectionTimeout
建立socket连接的超时(三次握手阶段)SocketTimeout
socket读写超时(通信阶段)
而-1则代表永不过期,意味着假设连接池里面的线程不够,那么请求也会一直等下去而不是报什么像org.apache.http.conn.ConnectionPoolTimeoutException: Timeout waiting for connection from pool 的错误,这也是导致rt飙高的重要原因之一,用户请求一直等待一个连接,而等不到。
那什么导致连接池没有可用连接了呢?通过build()方法发现
WechatPayHttpClientBuilder.create()
.withMerchant(mchId, mchSerialNo, merchantPrivateKey)
.withValidator(new WechatPay2Validator(updateCertificatesVerifier))
.build();
PoolingHttpClientConnectionManager是HttpClientConnectionManager实现之一,也就是说通HttpClientBuilder.build()出来的连接池是采用PoolingHttpClientConnectionManager连接池,此连接池默认实现如下
其中有两个核心参数defaultMaxPerRoute默认值为2,maxTotal默认值为20。
maxTotal:是同时间正在使用的最多的连接数
defaultMaxPerRoute:是针对一个域名同时间正在使用的最多的连接数
也就是说一个域名下最多同时支持2个连接(对于我们系统来说完全不够的),再联系上面的连接池等待时间永不过期,rt飙高的原因也就找到了答案。
最后修改如下:
// 加载商户私钥(privateKey:私钥字符串)
PrivateKey merchantPrivateKey = null;
try {
merchantPrivateKey = PemUtil
.loadPrivateKey(new ByteArrayInputStream(privateKey.getBytes("utf-8")));
// 加载平台证书(mchId:商户号,mchSerialNo:商户证书序列号,apiV3Key:V3密钥)
updateCertificatesVerifier = new AutoUpdateCertificatesVerifier(
new WechatPay2Credentials(mchId, new PrivateKeySigner(mchSerialNo, merchantPrivateKey)), apiV3Key.getBytes("utf-8"));
// 初始化httpClient
RequestConfig requestConfig = RequestConfig.custom().setConnectTimeout(3000).setConnectionRequestTimeout(3000)
.setSocketTimeout(3000).build();
httpClient = WechatPayHttpClientBuilder.create()
.withMerchant(mchId, mchSerialNo, merchantPrivateKey)
.withValidator(new WechatPay2Validator(updateCertificatesVerifier))
.evictExpiredConnections() // 开启回收空闲线程
.evictIdleConnections(3l,TimeUnit.SECONDS) //默认3秒回收空闲连接
.setMaxConnPerRoute(300).setMaxConnTotal(300)
.setConnectionTimeToLive(3, TimeUnit.SECONDS).build();
} catch (UnsupportedEncodingException e) {
log.error("WeChatPayHttpClientUtil初始化失败");
}
总结:此问题在于自己没有对第三方提供的sdk去深入探究它如何实现,同时自己对httpclient理解的也不够深导致。吸取教训!!
针对上面这种情况,自己也分别在本地也对两种配置进行了测试,感兴趣的同学可以看下。
public static void main(String[] args) throws IOException {
//这三行日志主要是为了关闭httpClient的debug日志打印,否则控制台看着很难受
// 设置默认工厂类
System.setProperty("org.apache.commons.logging.LogFactory", "org.apache.commons.logging.impl.LogFactoryImpl");
// 设置日志打印类
LogFactory.getFactory().setAttribute("org.apache.commons.logging.Log", "org.apache.commons.logging.impl.SimpleLog");
//设置默认日志级别
LogFactory.getFactory().setAttribute("org.apache.commons.logging.simplelog.defaultlog", "error");
PoolingHttpClientConnectionManager manager = new PoolingHttpClientConnectionManager();
CloseableHttpClient httpClient = getHttpClient(manager);
// 监控线程 打印httpClient线程池信息
Thread monitor = new Thread(new MyThreadMonitor(manager));
monitor.start();
// 使用CountDownLatch模拟并发400个请求
int countDownThread = 400;
CountDownLatch countDownLatch = new CountDownLatch(countDownThread);
for (int i = 0; i < countDownThread; i++) {
Thread thread = new Thread(new RequestThread(httpClient, countDownLatch));
thread.start();
countDownLatch.countDown();
}
}
//使用当前配置
public static CloseableHttpClient getHttpClient(PoolingHttpClientConnectionManager manager) {
RequestConfig requestConfig = RequestConfig.custom()
.setConnectTimeout(3000) // 建立tcp握手的时间
.setConnectionRequestTimeout(3000) // 从连接池等待获取连接的时间
.setSocketTimeout(3000).build(); // 等待服务器返回数据的时间
manager.setDefaultMaxPerRoute(300); //设置当前路由也就是域名同时最大的连接数
manager.setMaxTotal(300); //设置最大的连接数
CloseableHttpClient httpClient = HttpClientBuilder.create().evictExpiredConnections().evictIdleConnections(3l,TimeUnit.SECONDS) //默认3秒回收空闲连接
.setConnectionManager(manager)
.setDefaultRequestConfig(requestConfig)
.setConnectionTimeToLive(1l, TimeUnit.SECONDS).build();
return httpClient;
}
//使用默认配置
public static CloseableHttpClient getDefaultHttpClient(PoolingHttpClientConnectionManager manager) {
return HttpClientBuilder.create().setConnectionManager(manager).build();
}
static class MyThreadMonitor implements Runnable {
private PoolingHttpClientConnectionManager manager;
public MyThreadMonitor(PoolingHttpClientConnectionManager manager) {
this.manager = manager;
}
@Override
public void run() {
int i = 0;
while (true) {
for (HttpRoute route : manager.getRoutes()) {
System.err.println("i:" + (i++) + route.getTargetHost().getHostName() + manager.getStats(route));
}
try {
Thread.sleep(1000);
} catch (InterruptedException e) {
e.printStackTrace();
}
}
}
}
static class RequestThread implements Runnable {
private CloseableHttpClient httpClient;
private CountDownLatch countDownLatch;
public RequestThread(CloseableHttpClient httpClient, CountDownLatch countDownLatch) {
this.httpClient = httpClient;
this.countDownLatch = countDownLatch;
}
@Override
public void run() {
try {
countDownLatch.await();
} catch (InterruptedException e) {
e.printStackTrace();
}
long sTime = System.currentTimeMillis();
HttpGet httpGet = new HttpGet("http://localhost:8080/test/testClient?threadId="+Thread.currentThread().getId());
httpGet.addHeader("Accept", "application/json");
CloseableHttpResponse response = null;
try {
response = httpClient.execute(httpGet);
if (response != null && response.getStatusLine().getStatusCode() == 200) {
//此句话是为了关闭流,否则会出现线程池leased一直被占用无法释放只增不减
EntityUtils.toString(response.getEntity(), "UTF-8");
}
long eTime = System.currentTimeMillis();
long time = eTime - sTime;
if (time > 1000) {
System.err.println("线程id:[" + Thread.currentThread().getId()+"]" + " 超时!!!:" + time);
} else {
System.out.println("线程id:[" + Thread.currentThread().getId() +"]"+ " 耗时:" + time + "毫秒");
}
} catch (IOException e) {
e.printStackTrace();
}
}
}
------------------Controller----------------------
@GetMapping("/testClient")
public BizResult<String> testClient(@RequestParam Long threadId) {
Random random = new Random();
int i = random.nextInt(10);
// 10分之1概率模拟出现网络问题导致的延时
if(i == 1){
System.out.println("线程id:["+threadId+"]"+"超时数量:["+atomicInteger.getAndIncrement()+"]");
try {
Thread.sleep(300);
} catch (InterruptedException e) {
e.printStackTrace();
}
}
return BizResult.create("success");
}
1.使用当前配置,400并发连接也能正常响应
2.使用默认配置,出现连接池排队等待情况,可以看到默认最大连接才为2