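How a proxy IP works: machine A sends its request to proxy server C, C then sends the request on to server B, and B's response travels back to A through C. From B's point of view the request comes from C, so B sees C's IP address rather than A's.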
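The relay described above can be sketched as a toy in-memory model with no real networking. The class name ProxyFlowSketch and the methods originServer and proxyServer are hypothetical names chosen for illustration; the two addresses are the ones that appear in the program output further down.

```java
// Toy in-memory model of the A -> C -> B relay; no real networking involved.
class ProxyFlowSketch {

    /** Server B: answers with the address of whoever connected to it. */
    static String originServer(String callerIp, String request) {
        return "Your IP is: [" + callerIp + "] for " + request;
    }

    /** Proxy C: forwards the request to B under its own address, then relays B's response. */
    static String proxyServer(String proxyIp, String clientIp, String request) {
        // clientIp is deliberately not passed on: in this model B only sees the proxy's address.
        return originServer(proxyIp, request);
    }

    public static void main(String[] args) {
        String clientIp = "211.157.164.1";  // machine A
        String proxyIp = "42.121.105.155";  // proxy C
        // Direct request: B reports A's address.
        System.out.println(originServer(clientIp, "GET /ic.asp"));
        // Request relayed by C: B reports C's address.
        System.out.println(proxyServer(proxyIp, clientIp, "GET /ic.asp"));
    }
}
```

The only point of the sketch is that B answers based on the connection it actually receives, which is why the page fetched through the proxy reports the proxy's IP.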
When writing a crawler with HttpClient, you usually need to send requests through a proxy:
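import java.io.IOException;

import org.apache.http.HttpHost;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.impl.conn.DefaultProxyRoutePlanner;
import org.apache.http.util.EntityUtils;
import org.junit.Test;

public class HttpClientProxyTest {

    @Test
    public void test1() throws IOException {
        String url = "http://1111.ip138.com/ic.asp";
        HttpGet httpGet = new HttpGet(url);
        // Pass false to getClient to send the request directly, without the proxy.
        try (CloseableHttpClient httpclient = getClient(true);
             CloseableHttpResponse res = httpclient.execute(httpGet)) {
            // The page declares charset=gb2312, so decode the body accordingly.
            String string = EntityUtils.toString(res.getEntity(), "gb2312");
            System.out.println(string);
        }
    }

    private CloseableHttpClient getClient(boolean proxy) {
        if (proxy) {
            // Proxy server hosted at Alibaba
            String hostStr = "42.121.105.155";
            HttpHost host = new HttpHost(hostStr, 8888);
            // Route every request made by this client through the proxy host.
            DefaultProxyRoutePlanner routePlanner = new DefaultProxyRoutePlanner(host);
            return HttpClients.custom().setRoutePlanner(routePlanner).build();
        } else {
            return HttpClients.custom().build();
        }
    }
}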
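Program output (the first page reports the proxy's address; the second, with a different address, is what the site returns when the request is made without the proxy):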
<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=gb2312">
<title> 您的IP地址 </title>
</head>
<body style="margin:0px"><center>您的IP是:[42.121.105.155] 来自:浙江省杭州市 阿里巴巴</center></body></html>
<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=gb2312">
<title> 您的IP地址 </title>
</head>
<body style="margin:0px"><center>您的IP是:[211.157.164.1] 来自:北京市 263网络通信</center></body></html>
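To check programmatically that the proxy took effect, the bracketed address can be pulled out of the returned page and compared with the proxy's IP. The following is a minimal sketch, not part of the original program: the class IpPageParser and its methods are hypothetical names, the regular expression assumes the page keeps the [x.x.x.x] format shown in the output above, and the sample string is a shortened ASCII-only stand-in for the page body.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: extracts the reported IP from the page and compares it with the proxy's IP.
class IpPageParser {

    // Matches a dotted IPv4-style address inside square brackets, e.g. [42.121.105.155].
    private static final Pattern BRACKETED_IP =
            Pattern.compile("\\[(\\d{1,3}(?:\\.\\d{1,3}){3})\\]");

    /** Returns the first bracketed address found in the page, if any. */
    static Optional<String> extractIp(String html) {
        Matcher m = BRACKETED_IP.matcher(html);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    /** True when the page reports exactly the proxy's address. */
    static boolean wentThroughProxy(String html, String proxyIp) {
        return extractIp(html).map(proxyIp::equals).orElse(false);
    }

    public static void main(String[] args) {
        // Shortened stand-in for the response body printed by the test.
        String html = "<body><center>[42.121.105.155]</center></body>";
        System.out.println(extractIp(html).orElse("not found"));
        System.out.println(wentThroughProxy(html, "42.121.105.155"));
    }
}
```

In the test method, the string returned by EntityUtils.toString could be passed to wentThroughProxy in place of the sample string.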