Reposted from: http://blog.csdn.net/zstu_cc/article/details/39738117
If you repost this, please credit the source. Don't do what ThinkSAAS did.
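The National Day holiday is finally here, so I have some time to write and can get this update out.
The previous post covered the strengths and weaknesses of HtmlUnit for web scraping, small crawlers and similar jobs. This one looks at HttpClient for the same kind of work.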
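HttpClient started as a subproject of Apache Jakarta Commons (the 4.x line now lives under Apache HttpComponents). It provides an efficient, up-to-date, feature-rich client-side toolkit for the HTTP protocol. Underneath, it relies on the plain Socket programming everyone already knows, as you can see in the HttpClientConnectionOperator class: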
```java
public void connect(
        final ManagedHttpClientConnection conn,
        final HttpHost host,
        final InetSocketAddress localAddress,
        final int connectTimeout,
        final SocketConfig socketConfig,
        final HttpContext context) throws IOException {
    final Lookup<ConnectionSocketFactory> registry = getSocketFactoryRegistry(context);
    final ConnectionSocketFactory sf = registry.lookup(host.getSchemeName());
    if (sf == null) {
        throw new UnsupportedSchemeException(host.getSchemeName() +
                " protocol is not supported");
    }
    final InetAddress[] addresses = this.dnsResolver.resolve(host.getHostName());
    final int port = this.schemePortResolver.resolve(host);
    for (int i = 0; i < addresses.length; i++) {
        final InetAddress address = addresses[i];
        final boolean last = i == addresses.length - 1;

        Socket sock = sf.createSocket(context);
        sock.setReuseAddress(socketConfig.isSoReuseAddress());
        conn.bind(sock);

        final InetSocketAddress remoteAddress = new InetSocketAddress(address, port);
        if (this.log.isDebugEnabled()) {
            this.log.debug("Connecting to " + remoteAddress);
        }
        try {
            sock.setSoTimeout(socketConfig.getSoTimeout());
            sock = sf.connectSocket(
                    connectTimeout, sock, host, remoteAddress, localAddress, context);
            sock.setTcpNoDelay(socketConfig.isTcpNoDelay());
            sock.setKeepAlive(socketConfig.isSoKeepAlive());
            final int linger = socketConfig.getSoLinger();
            if (linger >= 0) {
                sock.setSoLinger(linger > 0, linger);
            }
            conn.bind(sock);
            if (this.log.isDebugEnabled()) {
                this.log.debug("Connection established " + conn);
            }
            return;
        } catch (final SocketTimeoutException ex) {
            if (last) {
                throw new ConnectTimeoutException(ex, host, addresses);
            }
        } catch (final ConnectException ex) {
            if (last) {
                final String msg = ex.getMessage();
                if ("Connection timed out".equals(msg)) {
                    throw new ConnectTimeoutException(ex, host, addresses);
                } else {
                    throw new HttpHostConnectException(ex, host, addresses);
                }
            }
        }
        if (this.log.isDebugEnabled()) {
            this.log.debug("Connect to " + remoteAddress + " timed out. " +
                    "Connection will be retried using another IP address");
        }
    }
}
```
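To make the Socket part concrete, here is a stripped-down sketch of the same steps using only java.net: resolve every address for the host, try each in turn with a connect timeout, set SO_TIMEOUT and TCP_NODELAY, then write an HTTP/1.1 request by hand and read the status line. RawHttpGet and its methods are hypothetical names; this is not HttpClient's code, and it leaves out TLS, connection pooling and real response parsing.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: a bare-bones illustration, not HttpClient's implementation
final class RawHttpGet {

    /** Tries each resolved address in turn, like the loop in connect() above. */
    static Socket connectAny(String host, int port, int timeoutMs) throws IOException {
        IOException lastFailure = null;
        for (InetAddress address : InetAddress.getAllByName(host)) {
            Socket sock = new Socket();
            try {
                sock.setSoTimeout(timeoutMs);        // read timeout (SO_TIMEOUT)
                sock.connect(new InetSocketAddress(address, port), timeoutMs); // connect timeout
                sock.setTcpNoDelay(true);            // cf. socketConfig.isTcpNoDelay()
                return sock;
            } catch (IOException ex) {
                sock.close();
                lastFailure = ex;                    // fall through to the next address
            }
        }
        throw lastFailure != null ? lastFailure : new IOException("No address for " + host);
    }

    /** Sends a bare HTTP/1.1 GET and returns the response status line. */
    static String fetchStatusLine(String host, int port, String path, int timeoutMs)
            throws IOException {
        try (Socket sock = connectAny(host, port, timeoutMs)) {
            String request = "GET " + path + " HTTP/1.1\r\n"
                    + "Host: " + host + ":" + port + "\r\n"
                    + "Connection: close\r\n"
                    + "\r\n";
            OutputStream out = sock.getOutputStream();
            out.write(request.getBytes(StandardCharsets.US_ASCII));
            out.flush();
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(sock.getInputStream(), StandardCharsets.ISO_8859_1));
            return in.readLine();                    // the status line, e.g. "HTTP/1.1 200 OK"
        }
    }
}
```

For example, RawHttpGet.fetchStatusLine("example.com", 80, "/", 5000) returns whatever status line the server sends, such as "HTTP/1.1 200 OK". Everything after that line (headers, chunked bodies, keep-alive, redirects) is the work HttpClient takes off your hands.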
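In other words, HttpClient, just like URLConnection, talks to the network through Socket programming. The JDK's built-in class naturally carries less overhead, so why choose HttpClient? Honestly, the main reason is laziness! HttpClient wraps Sockets and the HTTP protocol at just the right level: it is not as high-level as HtmlUnit, and not as tedious to use as URLConnection. It is simple and highly extensible at the same time, which is also why HtmlUnit itself is built on top of HttpClient.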
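To show what "tedious" means in practice, here is a plain GET written against the JDK's HttpURLConnection. You cast the connection, set the timeouts, choose getInputStream() or getErrorStream() depending on the status, copy the bytes by hand and decode them yourself. The class name and the "<status> <body>" return format are only for illustration, and the body is assumed to be UTF-8. With HttpClient 4.x, the body handling shrinks to roughly EntityUtils.toString(response.getEntity()).

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

// Hypothetical helper showing the boilerplate a plain URLConnection GET needs
final class UrlConnectionGet {

    /** Performs a GET and returns "<status> <body>". */
    static String get(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        try {
            conn.setRequestMethod("GET");
            conn.setConnectTimeout(5000);
            conn.setReadTimeout(5000);
            int code = conn.getResponseCode();
            // Error bodies (4xx/5xx) are only available through getErrorStream(), which may be null
            InputStream stream = code >= 400 ? conn.getErrorStream() : conn.getInputStream();
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            if (stream != null) {
                try (InputStream in = stream) {
                    byte[] chunk = new byte[4096];
                    int n;
                    while ((n = in.read(chunk)) != -1) {
                        buf.write(chunk, 0, n);
                    }
                }
            }
            // Assumes a UTF-8 body; a real client would honour the Content-Type charset
            return code + " " + new String(buf.toByteArray(), StandardCharsets.UTF_8);
        } finally {
            conn.disconnect();
        }
    }
}
```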
Advantages (going by the ones listed on Baidu Baike):
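1. Implements all HTTP methods (GET, POST, PUT, HEAD, etc.): it does, although I have only used GET, POST and PUT myself; the others come up less often. Java Servlet servers spend nearly all their time on GET and POST anyway, so I don't pay much attention to the other methods.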
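2. Supports automatic redirection: this item is worded rather vaguely. What I found in the code is that HttpClient automatically moves past any response whose status code is below 200 (the 1xx informational responses) and keeps reading until a final response arrives; see the doReceiveResponse method of the HttpRequestExecutor class: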
```java
protected HttpResponse doReceiveResponse(
        final HttpRequest request,
        final HttpClientConnection conn,
        final HttpContext context) throws HttpException, IOException {
    Args.notNull(request, "HTTP request");
    Args.notNull(conn, "Client connection");
    Args.notNull(context, "HTTP context");
    HttpResponse response = null;
    int statusCode = 0;

    while (response == null || statusCode < HttpStatus.SC_OK) {
        response = conn.receiveResponseHeader();
        if (canResponseHaveBody(request, response)) {
            conn.receiveResponseEntity(response);
        }
        statusCode = response.getStatusLine().getStatusCode();
    }
    return response;
}
```
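At first I assumed it meant automatic redirects, like the example in the previous post. HttpClient 4.x does handle those too, but in a separate layer: a RedirectStrategy (DefaultRedirectStrategy by default) decides whether to follow a 3xx response. By default it follows 301, 302 and 307 redirects for GET and HEAD requests (and 303 for any method), and LaxRedirectStrategy extends this to POST.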
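To see what the doReceiveResponse loop buys you, here is a simplified sketch of the same idea over a raw HTTP/1.1 stream: keep reading response heads while the status code is below 200, and return the first final one. InterimResponses and readFinalStatus are hypothetical names; this is not HttpClient's implementation, and it relies on the HTTP/1.1 rule that 1xx responses carry no body (which canResponseHaveBody takes care of in the real code).

```java
import java.io.BufferedReader;
import java.io.IOException;

// Hypothetical helper: skips informational (1xx) responses on a raw HTTP/1.1 stream
final class InterimResponses {

    /** Reads response heads, discarding 1xx ones, and returns the first final status code. */
    static int readFinalStatus(BufferedReader in) throws IOException {
        int statusCode = 0;
        while (statusCode < 200) {                   // mirrors statusCode < HttpStatus.SC_OK
            String statusLine = in.readLine();       // e.g. "HTTP/1.1 100 Continue"
            if (statusLine == null) {
                throw new IOException("Connection closed before a final response");
            }
            String[] parts = statusLine.split(" ", 3);
            if (parts.length < 2) {
                throw new IOException("Malformed status line: " + statusLine);
            }
            statusCode = Integer.parseInt(parts[1]);
            // Skip the header fields; a blank line ends the head (1xx responses have no body)
            String header;
            while ((header = in.readLine()) != null && !header.isEmpty()) {
                // a real client would parse each header here
            }
        }
        return statusCode;                           // the final response's body is left unread
    }
}
```

A typical case is a request sent with `Expect: 100-continue`: the server may answer with "HTTP/1.1 100 Continue" first, and the caller only ever sees the 200 (or 4xx/5xx) that follows.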
3. Supports HTTPS: HttpClient's SSL support is quite thorough. The simplest case is a client that trusts every certificate:
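```java
import java.security.cert.CertificateException;
import java.security.cert.X509Certificate;
import javax.net.ssl.SSLContext;
import org.apache.http.client.HttpClient;
import org.apache.http.conn.ssl.SSLConnectionSocketFactory;
import org.apache.http.conn.ssl.SSLContextBuilder; // HttpClient 4.3; moved to org.apache.http.ssl in 4.4
import org.apache.http.conn.ssl.TrustStrategy;
import org.apache.http.impl.client.HttpClients;

// Builds a client that accepts any server certificate: handy for testing, unsafe in production
private static HttpClient getSSLInsecureClient() throws Exception {
    SSLContext sslContext = new SSLContextBuilder().loadTrustMaterial(null, new TrustStrategy() {
        // Trust every certificate chain without checking it
        public boolean isTrusted(X509Certificate[] chain, String authType)
                throws CertificateException {
            return true;
        }
    }).build();
    SSLConnectionSocketFactory sslsf = new SSLConnectionSocketFactory(sslContext);
    return HttpClients.custom()
            .setSSLSocketFactory(sslsf)
            .build();
}
```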
HttpClient also supports many other certificate verification modes, as well as server authentication; this page covers them:
http://baike.baidu.com/view/2476238.htm?fr=aladdin
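4. Supports proxy servers: this one is genuinely useful. You can point HttpClient's proxy at Fiddler (a very capable web debugging proxy), so that every request HttpClient sends is captured and can be inspected in Fiddler. This is an excellent technique while testing your code.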
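With HttpClient 4.3+ the proxy is a single builder call, HttpClients.custom().setProxy(new HttpHost("127.0.0.1", 8888)).build(), where 8888 is Fiddler's default listening port. The same trick works with the JDK's HttpURLConnection, which keeps the sketch below free of external dependencies. ProxiedGet and getViaProxy are hypothetical names, and the sketch only covers plain HTTP (HTTPS through Fiddler additionally needs its HTTPS decryption turned on).

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URI;

// Hypothetical helper: routes a GET through an HTTP proxy such as Fiddler
final class ProxiedGet {

    /** Sends a GET for targetUrl via the given HTTP proxy and returns the status code. */
    static int getViaProxy(String targetUrl, String proxyHost, int proxyPort) throws IOException {
        Proxy proxy = new Proxy(Proxy.Type.HTTP, new InetSocketAddress(proxyHost, proxyPort));
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(targetUrl).toURL().openConnection(proxy);
        try {
            conn.setConnectTimeout(5000);
            conn.setReadTimeout(5000);
            int code = conn.getResponseCode();
            // Drain the body so the exchange completes cleanly (the stream may be null)
            try (InputStream in = code >= 400 ? conn.getErrorStream() : conn.getInputStream()) {
                if (in != null) {
                    while (in.read() != -1) {
                        // discard
                    }
                }
            }
            return code;
        } finally {
            conn.disconnect();
        }
    }
}
```

With Fiddler running, ProxiedGet.getViaProxy("http://example.com/", "127.0.0.1", 8888) should appear as a session in Fiddler. Through an HTTP proxy the request line carries the absolute URI (GET http://example.com/ HTTP/1.1), which is how the proxy knows where to forward the request.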
This isn't quite finished, but I'm publishing it anyway; I simply haven't had the time until now.