The problem:
While writing a crawler with HttpClient 4.5, I ran into the following exception:
java.net.SocketException: Socket closed
at java.net.SocketInputStream.socketRead0(Native Method) ~[?:1.8.0_11]
at java.net.SocketInputStream.read(SocketInputStream.java:150) ~[?:1.8.0_11]
at java.net.SocketInputStream.read(SocketInputStream.java:121) ~[?:1.8.0_11]
at org.apache.http.impl.io.SessionInputBufferImpl.streamRead(SessionInputBufferImpl.java:136) ~[httpcore-4.3.2.jar:4.3.2]
at org.apache.http.impl.io.SessionInputBufferImpl.read(SessionInputBufferImpl.java:195) ~[httpcore-4.3.2.jar:4.3.2]
at org.apache.http.impl.io.ContentLengthInputStream.read(ContentLengthInputStream.java:178) ~[httpcore-4.3.2.jar:4.3.2]
at org.apache.http.conn.EofSensorInputStream.read(EofSensorInputStream.java:137) ~[httpclient-4.3.2.jar:4.3.2]
at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284) ~[?:1.8.0_11]
at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326) ~[?:1.8.0_11]
at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178) ~[?:1.8.0_11]
at java.io.InputStreamReader.read(InputStreamReader.java:184) ~[?:1.8.0_11]
at java.io.Reader.read(Reader.java:140) ~[?:1.8.0_11]
at org.apache.http.util.EntityUtils.toString(EntityUtils.java:244) ~[httpcore-4.3.2.jar:4.3.2]
at org.apache.http.util.EntityUtils.toString(EntityUtils.java:288) ~[httpcore-4.3.2.jar:4.3.2]
at com.api.service.DxLnCallClientService.getCustInfo(DxLnCallClientService.java:311) [classes/:1.0.0]
at com.api.service.DxPcCallClientService.setSpiderCDRReqBean(DxPcCallClientService.java:458) [classes/:1.0.0]
at com.api.service.DxPcCallClientService.synchroDataToMongodb(DxPcCallClientService.java:535) [classes/:1.0.0]
at com.api.service.DxLnCallClientService.asynchroData(DxLnCallClientService.java:280) [classes/:1.0.0]
at com.api.service.DxLnCallClientService$$FastClassBySpringCGLIB$$75a373f7.invoke(<generated>) [classes/:1.0.0]
at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:204) [spring-core-4.2.3.RELEASE.jar:4.2.3.RELEASE]
at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.invokeJoinpoint(CglibAopProxy.java:718) [spring-aop-4.2.3.RELEASE.jar:4.2.3.RELEASE]
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:157) [spring-aop-4.2.3.RELEASE.jar:4.2.3.RELEASE]
at org.springframework.aop.interceptor.AsyncExecutionInterceptor$1.call(AsyncExecutionInterceptor.java:108) [spring-aop-4.2.3.RELEASE.jar:4.2.3.RELEASE]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_11]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_11]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_11]
at java.lang.Thread.run(Thread.java:745) [?:1.8.0_11]
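Everything worked in local testing and I never saw a problem, but things got awkward as soon as the code was deployed to the test and production environments. I then added logging step by step, caught the exception at every stage, and printed the response of every request. A simple pattern emerged: the exception showed up whenever the response was relatively long. That led to a first conclusion: the HTTP request was being closed before the response body had been read.
The code was as follows: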
/**
 * Sends a POST request.
 * @param client HttpClient
 * @param context HttpClientContext
 * @param url url
 * @param nvp List<NameValuePair>
 * @return HttpResponse
 */
public HttpResponse httpPostResp(HttpClient client, HttpClientContext context, String url, List<NameValuePair> nvp) {
    HttpPost httpPost = new HttpPost(url);
    // Browser-like User-Agent to get past anti-crawler checks
    httpPost.addHeader("User-Agent", "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.101 Safari/537.36");
    HttpResponse response;
    try {
        httpPost.setEntity(new UrlEncodedFormEntity(nvp, "UTF-8"));
        response = client.execute(httpPost, context);
    } catch (Exception e) {
        logger.error("POST request failed: " + url, e);
        throw new SimpleException(2, "HTTP response: " + e.getMessage());
    } finally {
        // Problem: the request is aborted here, before the caller has read the response body
        httpPost.abort();
    }
    return response;
}

// Call site: the response body is read only after httpPostResp has returned
HttpResponse response = this.simpleHttpClientUtil.httpPostResp(httpClient, httpClientContext, url, new ArrayList<>());
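The mistake is obvious: the finally block calls httpPost.abort() before the method returns, so the caller receives an HttpResponse whose connection has already been shut down, and reading its entity afterwards is risky. It does not fail on every call, though. That fits the pattern noted earlier: a short body has probably already been read into the client's buffer together with the headers, while a longer body still has to be read from the closed socket.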
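To make the "only long responses fail" behaviour concrete, here is a small simulation that uses nothing but the JDK. It is not HttpClient's actual buffering code: FakeSocketStream, readAfterAbort, outcome, and the 8-byte buffer are all hypothetical names and values chosen for illustration. The sketch assumes that reading the headers pulls the first part of the body into a buffer, and that the socket is closed before the caller reads the rest.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.SocketException;
import java.nio.charset.StandardCharsets;

final class AbortBeforeReadDemo {

    /** Hypothetical stand-in for a socket stream: any read after close() fails. */
    static final class FakeSocketStream extends InputStream {
        private final byte[] data;
        private int pos;
        private boolean closed;

        FakeSocketStream(byte[] data) {
            this.data = data;
        }

        @Override
        public int read() throws IOException {
            byte[] one = new byte[1];
            return read(one, 0, 1) < 0 ? -1 : one[0] & 0xff;
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            if (closed) {
                throw new SocketException("Socket closed");
            }
            if (pos >= data.length) {
                return -1;
            }
            int n = Math.min(len, data.length - pos);
            System.arraycopy(data, pos, b, off, n);
            pos += n;
            return n;
        }

        @Override
        public void close() {
            closed = true;
        }
    }

    /** Reads a body of known length after the "socket" was closed (bufferSize must be at least 1). */
    static String readAfterAbort(String body, int bufferSize) throws IOException {
        byte[] payload = body.getBytes(StandardCharsets.UTF_8);
        byte[] wire = new byte[payload.length + 1];
        wire[0] = 'H'; // one byte standing in for the response headers
        System.arraycopy(payload, 0, wire, 1, payload.length);

        FakeSocketStream socket = new FakeSocketStream(wire);
        byte[] buffer = new byte[bufferSize];
        int filled = socket.read(buffer, 0, buffer.length); // the header read also buffers the start of the body
        int pos = 1;                                        // skip the "headers"
        socket.close();                                     // plays the role of abort() in the finally block

        byte[] out = new byte[payload.length];
        int got = 0;
        while (got < out.length) {
            if (pos == filled) {
                // Buffer drained: the next read has to go to the closed socket and throws.
                filled = socket.read(buffer, 0, buffer.length);
                pos = 0;
            }
            int n = Math.min(filled - pos, out.length - got);
            System.arraycopy(buffer, pos, out, got, n);
            pos += n;
            got += n;
        }
        return new String(out, StandardCharsets.UTF_8);
    }

    /** Runs the simulation and reports the outcome as text instead of throwing. */
    static String outcome(String body, int bufferSize) {
        try {
            return "OK: " + readAfterAbort(body, bufferSize);
        } catch (IOException e) {
            return "FAILED: " + e;
        }
    }

    public static void main(String[] args) {
        System.out.println(outcome("ok", 8));
        System.out.println(outcome("a response body longer than the buffer", 8));
    }
}
```

Running main prints "OK: ok" for the short body and "FAILED: java.net.SocketException: Socket closed" for the long one, which mirrors what the logs showed: the failure depends on how much of the body was already buffered when the request was aborted.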
The solution:
/**
 * Sends a POST request.
 * @param client HttpClient
 * @param context HttpClientContext
 * @param url url
 * @param nvp List<NameValuePair>
 * @return the response body as text
 */
public String httpPostRespTxt(HttpClient client, HttpClientContext context, String url, List<NameValuePair> nvp) {
    String webTxt;
    HttpPost httpPost = new HttpPost(url);
    // Browser-like User-Agent to get past anti-crawler checks
    httpPost.addHeader("User-Agent", "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.101 Safari/537.36");
    HttpResponse response;
    try {
        httpPost.setEntity(new UrlEncodedFormEntity(nvp, "UTF-8"));
        response = client.execute(httpPost, context);
        // Read the whole body while the connection is still open
        webTxt = EntityUtils.toString(response.getEntity());
    } catch (Exception e) {
        logger.error("POST request failed: " + url, e);
        throw new SimpleException(2, "HTTP response: " + e.getMessage());
    } finally {
        // Runs only after the body has been read (or after a failure), so it no longer cuts the read short
        httpPost.abort();
    }
    return webTxt;
}
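The fix works because only a plain String leaves the method and the live response never does. The same idea can be written as a callback: the method that owns the stream hands it to a handler, returns the handler's result, and closes the stream afterwards. HttpClient 4.x has a callback interface of this kind, ResponseHandler, which execute accepts in some of its overloads. The sketch below is not that API. It is a JDK-only illustration of the pattern, and BodyHandler, StreamSource, execute, and readUtf8 are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class ConsumeBeforeReleaseDemo {

    /** Hypothetical callback type: turns a live response stream into a plain value. */
    interface BodyHandler<T> {
        T handle(InputStream body) throws IOException;
    }

    /** Hypothetical stand-in for "something that opens a response stream". */
    interface StreamSource {
        InputStream open() throws IOException;
    }

    /**
     * Opens the stream, lets the handler consume it, then closes it.
     * Only the handler's result escapes, never the stream itself.
     */
    static <T> T execute(StreamSource source, BodyHandler<T> handler) {
        try (InputStream in = source.open()) {
            return handler.handle(in);
        } catch (IOException e) {
            throw new UncheckedIOException("reading the response failed", e);
        }
    }

    /** Handler that reads the whole body as UTF-8 text. */
    static String readUtf8(InputStream in) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        for (int n; (n = in.read(chunk)) != -1; ) {
            buf.write(chunk, 0, n);
        }
        return new String(buf.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] wire = "a fairly long response body".getBytes(StandardCharsets.UTF_8);
        // The in-memory stream stands in for a network response.
        String text = execute(() -> new ByteArrayInputStream(wire), ConsumeBeforeReleaseDemo::readUtf8);
        System.out.println(text);
    }
}
```

With this shape the caller cannot read the body after the connection is gone, because it never holds the stream in the first place.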
Postscript: my fundamentals were still not solid enough, which is why I took a few detours.