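HttpClient is an open-source Java library for downloading all kinds of files over HTTP, and it is highly configurable. For details, see http://hc.apache.org/httpclient-3.x/ .
The notes below come from practical experience with HttpClient 3.1: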
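1. Multithreading
// One shared client; MultiThreadedHttpConnectionManager makes it safe to use from several threads.
private static HttpClient hc = null;
hc = new HttpClient(new MultiThreadedHttpConnectionManager());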
2. Using a proxy
private static HttpClient hcproxy = null;
hcproxy = new HttpClient(new MultiThreadedHttpConnectionManager());
hcproxy.getHostConfiguration().setProxy("192.168.1.1", 8888);// proxy IP and port
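3. Handling URLs that contain Chinese characters
GetMethod get = null;
URI strURI = null;
String url = "http://60.190.222.233:5583/bst28/0911/满文军--爱唱给你听.wma";
try {
// First treat the URL as already escaped.
strURI = new URI(url, true, "GBK");
} catch (URIException e) {
// Otherwise treat it as raw text and let URI escape it using GBK.
strURI = new URI(url, false, "GBK");
}
get = new GetMethod();
get.setURI(strURI);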
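To make the effect of the GBK escaping more concrete, here is a minimal sketch that uses only the JDK. It is not HttpClient's own implementation: the class name `GbkUrlEscaper` and the method `escapeNonAscii` are hypothetical, and the sketch is simplified to percent-encode only non-ASCII characters, whereas a full URI escaper also handles ASCII characters that are not allowed in a URI (spaces, for example).

```java
import java.nio.charset.Charset;

public class GbkUrlEscaper {

    /**
     * Percent-encodes every non-ASCII character of the given URL using the
     * named charset. ASCII characters are copied through unchanged.
     * (Hypothetical helper for illustration; not part of HttpClient.)
     */
    public static String escapeNonAscii(String url, String charsetName) {
        Charset charset = Charset.forName(charsetName);
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < url.length()) {
            int codePoint = url.codePointAt(i);
            int len = Character.charCount(codePoint);
            if (codePoint < 0x80) {
                // Plain ASCII: keep as-is.
                out.append((char) codePoint);
            } else {
                // Non-ASCII: encode to bytes in the target charset, then %XX each byte.
                byte[] bytes = url.substring(i, i + len).getBytes(charset);
                for (byte b : bytes) {
                    out.append('%').append(String.format("%02X", b & 0xFF));
                }
            }
            i += len;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String url = "http://60.190.222.233:5583/bst28/0911/满文军--爱唱给你听.wma";
        System.out.println(escapeNonAscii(url, "GBK"));
    }
}
```

The charset matters because the server decodes the escaped bytes with its own encoding: the same character produces different byte sequences under GBK and UTF-8, so the charset passed here has to match what the server expects.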
4. Common HttpClient settings for crawling
get.setFollowRedirects(true);// follow redirects
get.getParams().setParameter(HttpMethodParams.RETRY_HANDLER,
new DefaultHttpMethodRetryHandler(5, false));// retry up to 5 times
get.getParams().setParameter(HttpMethodParams.SO_TIMEOUT,
Integer.valueOf(5000));// socket timeout in milliseconds
get.getParams().setParameter(HttpMethodParams.USER_AGENT,"Nokia");// set the User-Agent
get.getParams().setCookiePolicy(CookiePolicy.IGNORE_COOKIES);// turn off automatic cookie handling
get.setRequestHeader("Cookie","laystate=undefined;show160site=show160");// send the Cookie header manually
get.addRequestHeader("referer", "");// set the Referer
The User-Agent, Cookie, and Referer are the settings that need particular attention when crawling pages.