This post uses the Apache HttpClient library, an efficient, feature-rich client-side HTTP toolkit that tracks current versions of the HTTP standard and its recommendations. Compared with the JDK's built-in URLConnection, it is easier to use and more flexible. Its main job is to send requests to a server and retrieve the resources that come back. In practical Java web crawling, HttpClient is commonly used to fetch page content, with jsoup used to parse it.
This post uses HttpClient to demonstrate a simple fetch of the Kugou Music homepage (https://www.kugou.com):
First, add the dependency:
```xml
<!-- HttpClient: a tool for simulating browser HTTP requests -->
<dependency>
    <groupId>org.apache.httpcomponents</groupId>
    <artifactId>httpclient</artifactId>
    <version>4.5.3</version>
</dependency>
```
The full code is as follows:
```java
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;
import org.junit.Test;

/**
 * @Author 海龟
 * @Date 2020/10/7 14:59
 * @Desc Demonstrates a simple web crawler built on HttpClient
 */
public class HttpClientTest { // renamed to avoid confusion with the library's own HttpClient type

    @Test
    public void testGet() throws Exception {
        // 1. Create the HttpClient object
        CloseableHttpClient httpClient = HttpClients.createDefault();

        // 2. Create the HttpGet request and configure it
        HttpGet httpGet = new HttpGet("https://www.kugou.com/?username=java");
        httpGet.setHeader("User-Agent", "Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.121 Mobile Safari/537.36 Edg/85.0.564.68");

        // 3. Send the request
        CloseableHttpResponse response = httpClient.execute(httpGet);

        // 4. Check the status code and read the response body
        if (response.getStatusLine().getStatusCode() == 200) { // 200 means success
            String html = EntityUtils.toString(response.getEntity(), "UTF-8");
            System.out.println(html);
        }

        // 5. Release resources (close the response before the client)
        response.close();
        httpClient.close();
    }
}
```
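One caveat in the code above: it hard-codes UTF-8 when converting the response entity to a string. Many Chinese sites declare `gb2312` or `GBK` in their `Content-Type` header, and decoding those bytes as UTF-8 produces garbled text. As a minimal stand-alone sketch (the `CharsetSniffer` class and `charsetOf` method are my own illustration, not part of HttpClient), here is how a charset can be read out of a `Content-Type` header value, falling back to UTF-8 when none is declared:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetSniffer {

    // Scan the parameters of a Content-Type header value, e.g.
    // "text/html; charset=gb2312", and return the declared charset.
    // Falls back to UTF-8 when no (valid) charset parameter is present.
    static Charset charsetOf(String contentType) {
        if (contentType != null) {
            for (String part : contentType.split(";")) {
                String p = part.trim().toLowerCase();
                if (p.startsWith("charset=")) {
                    try {
                        return Charset.forName(p.substring("charset=".length()).replace("\"", ""));
                    } catch (Exception ignored) {
                        // unknown charset name: keep scanning / fall through
                    }
                }
            }
        }
        return StandardCharsets.UTF_8;
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("text/html; charset=gb2312")); // GB2312
        System.out.println(charsetOf("text/html"));                 // UTF-8
    }
}
```

In real code you would feed the value of `response.getEntity().getContentType()` into such a helper, or simply call `EntityUtils.toString(response.getEntity())` without an explicit charset and let HttpClient consult the entity's declared encoding itself.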
Running the test prints the HTML source of the Kugou homepage to the console. Now go and try it yourself!
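The introduction mentions jsoup as the usual companion for parsing the fetched page. As a dependency-free illustration of the idea only (a regex is not a real HTML parser; in a real project use `Jsoup.parse(html).title()` instead), here is a sketch of pulling the `<title>` out of the HTML string that HttpClient returns; the `TitleExtractor` class is my own example, not part of either library:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TitleExtractor {

    // Naive illustration only: find the first <title>…</title> pair and
    // return its trimmed text, or "" when the page has no title element.
    static String extractTitle(String html) {
        Matcher m = Pattern
                .compile("<title>(.*?)</title>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL)
                .matcher(html);
        return m.find() ? m.group(1).trim() : "";
    }

    public static void main(String[] args) {
        // In the crawler above, `html` would be the string from EntityUtils.toString(...)
        String html = "<html><head><title>酷狗音乐</title></head><body></body></html>";
        System.out.println(extractTitle(html)); // 酷狗音乐
    }
}
```

This is where jsoup earns its place: it tolerates malformed markup and offers CSS-style selectors (`doc.select("a[href]")` and the like), which a hand-rolled regex cannot do reliably.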