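I wanted to copy a simple crawler to play with, and the jar it relies on turned out to be HttpClient, Apache's open-source HTTP client project.
So I downloaded version 4.3 and wrote: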
HttpClient httpclient = new HttpClient();
and got stuck on that very line.
The compiler reports "Cannot instantiate the type HttpClient".
I googled it, and an answer on Stack Overflow says it should be written as
HttpClient httpclient = new DefaultHttpClient();
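instead, after first adding import org.apache.http.impl.client.DefaultHttpClient;
Apparently the usage changed starting with the 4.x releases, so the old form no longer works: in 4.x, org.apache.http.client.HttpClient is an interface rather than a concrete class, which is why it cannot be created with new. Note that DefaultHttpClient is itself deprecated as of 4.3 in favor of HttpClients.createDefault().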
In the end, all that is left is to save the fetched HTML to a file with an .html extension.
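Saving the page could look roughly like the sketch below. It uses only the JDK, and the names HtmlSaver, save, and baidu.html are hypothetical, chosen for illustration. It also assumes the body is written as UTF-8, which may not match the charset the server actually used.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of HttpClient: writes fetched HTML to disk.
class HtmlSaver {

    // Writes the HTML text to the target file as UTF-8 (an assumption),
    // creating missing parent directories and overwriting an existing file.
    public static Path save(String html, Path target) throws IOException {
        Path parent = target.toAbsolutePath().getParent();
        if (parent != null) {
            Files.createDirectories(parent);
        }
        return Files.write(target, html.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the response body returned by the crawler.
        String responseBody = "<html><body>hello</body></html>";
        Path saved = save(responseBody, Paths.get("baidu.html"));
        System.out.println("saved to " + saved.toAbsolutePath());
    }
}
```

In the crawler, the stand-in string would be replaced by the response body that HttpClient returns.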
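The code below is the more conventional-looking version, written against the older Commons HttpClient 3.x API (the org.apache.commons.httpclient package):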
import java.io.IOException;

import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.HttpException;
import org.apache.commons.httpclient.methods.GetMethod;

public class Spider {
    public static void main(String[] args) {
        // Roughly equivalent to opening a browser
        HttpClient httpClient = new HttpClient();
        // Roughly equivalent to typing a URL into the address bar
        GetMethod getMethod = new GetMethod("http://www.baidu.com");
        try {
            // executeMethod returns the HTTP status code
            int statusCode = httpClient.executeMethod(getMethod);
            System.out.println("status=" + statusCode);
            // The response body is the HTML of the page
            System.out.println("response=" + getMethod.getResponseBodyAsString());
        } catch (HttpException e) {
            // Protocol-level failure
            e.printStackTrace();
        } catch (IOException e) {
            // Transport-level failure (connection, read, and so on)
            e.printStackTrace();
        } finally {
            // Release the connection so resources are not wasted
            getMethod.releaseConnection();
        }
    }
}
The less conventional-looking version, written against the HttpClient 4.3 API:
import java.io.IOException;

import org.apache.http.HttpEntity;
import org.apache.http.HttpResponse;
import org.apache.http.client.ClientProtocolException;
import org.apache.http.client.ResponseHandler;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;

public class Spider {
    public static void main(String[] args) throws Exception {
        // In 4.3 the client comes from a factory method instead of a constructor
        CloseableHttpClient httpclient = HttpClients.createDefault();
        try {
            String url = "http://www.baidu.com";
            HttpGet httpGet = new HttpGet(url);
            System.out.println("executing request " + httpGet.getURI());
            // The handler turns the raw response into a String and rejects non-2xx statuses
            ResponseHandler<String> responseHandler = new ResponseHandler<String>() {
                @Override
                public String handleResponse(final HttpResponse response) throws ClientProtocolException, IOException {
                    int status = response.getStatusLine().getStatusCode();
                    if (status >= 200 && status < 300) {
                        HttpEntity entity = response.getEntity();
                        return entity != null ? EntityUtils.toString(entity) : null;
                    } else {
                        throw new ClientProtocolException("Unexpected response status: " + status);
                    }
                }
            };
            String responseBody = httpclient.execute(httpGet, responseHandler);
            System.out.println("-------------------------------------------");
            System.out.println(responseBody);
            System.out.println("-------------------------------------------");
        } finally {
            // Close the client to release its connections
            httpclient.close();
        }
    }
}