HttpClient is a library mainly used to send requests to third-party servers and fetch the returned pages, from which we can extract the data we need.
Here is a simple example to give you a feel for it.
Create a Maven project and add this dependency to pom.xml:
<dependency>
    <groupId>org.apache.httpcomponents</groupId>
    <artifactId>httpclient</artifactId>
    <version>4.5.3</version>
</dependency>
First approach:
package com.gcx.demo.HelloWorld2;

import java.io.IOException;

import org.apache.http.HttpEntity;
import org.apache.http.client.ClientProtocolException;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;

public class App2 {
    public static void main(String[] args) {
        // Create an HttpClient instance
        CloseableHttpClient httpClient = HttpClients.createDefault();
        // Create an HttpGet instance
        HttpGet httpGet = new HttpGet("https://www.baidu.com");
        CloseableHttpResponse response = null;
        try {
            // Execute the request
            response = httpClient.execute(httpGet);
            // Get the response entity
            HttpEntity entity = response.getEntity();
            System.out.println("Page content: " + EntityUtils.toString(entity, "utf-8"));
        } catch (ClientProtocolException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            try {
                // Guard against an NPE: response stays null if execute() threw
                if (response != null) {
                    response.close();
                }
                // Release the client's resources as well
                httpClient.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
}
Output:
This prints the source of the Baidu homepage. To extract specific data from it you would use Jsoup, which will be covered in detail in a later article.
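As a small preview of that, the sketch below shows how Jsoup parses HTML — note this assumes you have also added the org.jsoup:jsoup dependency to pom.xml, and it parses a hardcoded HTML string here instead of the fetched page:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class JsoupPreview {
    public static void main(String[] args) {
        // In a real crawler this string would come from
        // EntityUtils.toString(entity, "utf-8")
        String html = "<html><head><title>百度一下,你就知道</title></head><body></body></html>";
        // Parse the HTML into a Document
        Document doc = Jsoup.parse(html);
        // Extract the page title
        System.out.println(doc.title());
    }
}
```

The same Document object also supports CSS-style selectors (doc.select(...)) for pulling out specific elements.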
Second approach (simplified):
package com.gcx.demo.HelloWorld2;

import java.io.IOException;

import org.apache.http.HttpEntity;
import org.apache.http.client.ClientProtocolException;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;

/**
 * Hello world!
 */
public class App {
    public static void main(String[] args) throws ClientProtocolException, IOException {
        // Create an HttpClient instance
        CloseableHttpClient httpClient = HttpClients.createDefault();
        // Create an HttpGet instance
        HttpGet httpGet = new HttpGet("http://www.baidu.com");
        // Execute the request
        CloseableHttpResponse response = httpClient.execute(httpGet);
        // Get the response entity
        HttpEntity entity = response.getEntity();
        System.out.println("Page content: " + EntityUtils.toString(entity, "utf-8"));
        response.close();
    }
}
In real-world development, however, each caught exception usually needs some business-level handling in its catch block, so the first approach is the one to use going forward. If the crawling task is simple, the pages are easy to fetch, and the volume is small, the second approach is fine. It depends on the specific situation.
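Since both CloseableHttpClient and CloseableHttpResponse implement Closeable, a third option worth knowing is Java 7+ try-with-resources, which keeps the first approach's cleanup guarantees with close to the second approach's brevity. A minimal sketch (the URL and class name are just examples):

```java
import java.io.IOException;

import org.apache.http.HttpEntity;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;

public class App3 {
    public static void main(String[] args) {
        HttpGet httpGet = new HttpGet("https://www.baidu.com");
        // Both resources are closed automatically, in reverse order,
        // even if execute() or toString() throws
        try (CloseableHttpClient httpClient = HttpClients.createDefault();
             CloseableHttpResponse response = httpClient.execute(httpGet)) {
            HttpEntity entity = response.getEntity();
            System.out.println("Page content: " + EntityUtils.toString(entity, "utf-8"));
        } catch (IOException e) {
            // Business-level error handling goes here
            e.printStackTrace();
        }
    }
}
```

One catch clause is enough here because ClientProtocolException is a subclass of IOException.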