HttpClient: getting to grips with it

乐之者java

Published on 2020-05-14 22:09:07

Original link: https://blog.csdn.net/xiaozhuangyumaotao/article/details/106129641

1. Adding HttpClient to a project and sending an HttpGet request

HttpClient version and Maven dependency:

<dependency>
    <groupId>org.apache.httpcomponents</groupId>
    <artifactId>httpclient</artifactId>
    <version>4.5.2</version>
</dependency>

Once HttpClient has been added, we can start using it. Let's look at HttpGet first:

@Test
public void test0() throws Exception {
    String url = "http://www.roadjava.com/s/spjc/jshjquery/2018/02/bootstrapdtooltip.html";
    HttpGet httpGet = new HttpGet(url);
    // try-with-resources closes the response first, then the client
    try (CloseableHttpClient client = HttpClients.createDefault();
         CloseableHttpResponse response = client.execute(httpGet)) {
        // Process the response
        HttpEntity entity = response.getEntity();
        System.out.println("Content: " + EntityUtils.toString(entity, "UTF-8"));
        EntityUtils.consume(entity); // make sure the entity content is fully consumed
    }
}
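Since Java 7, try-with-resources can replace the finally block: resources declared in the try header are closed automatically, in reverse order of declaration. Declaring the client first and the response second therefore closes the response before the client, which is the right order, because the response's connection belongs to the client. The JDK-only sketch below shows that order with a hypothetical stand-in class, Resource, rather than the real HttpClient types:

```java
import java.util.ArrayList;
import java.util.List;

public class CloseOrderDemo {
    /** Hypothetical stand-in for CloseableHttpClient / CloseableHttpResponse. */
    static final class Resource implements AutoCloseable {
        private final String name;
        private final List<String> log;

        Resource(String name, List<String> log) {
            this.name = name;
            this.log = log;
        }

        /** Mimics client.execute(...): opens a child resource that shares the log. */
        Resource open(String childName) {
            return new Resource(childName, log);
        }

        @Override
        public void close() {
            log.add(name + " closed");
        }
    }

    static List<String> run() {
        List<String> log = new ArrayList<>();
        // Closed in reverse order of declaration: response first, then client
        try (Resource client = new Resource("client", log);
             Resource response = client.open("response")) {
            log.add("reading body");
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(run()); // [reading body, response closed, client closed]
    }
}
```

The body runs first, then the resources are closed from the last declared to the first, even if the body throws.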

A few points need explaining:

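First, the CloseableHttpClient class can be thought of as a browser. As the name says, it is a browser that can be closed, so never forget to close the CloseableHttpClient object (and the CloseableHttpResponse) when you are done, for example in a finally block.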

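Second, in EntityUtils.toString(entity, "UTF-8") the "UTF-8" argument is a fallback: it is the charset used to decode the entity (the body returned by the target URL) only when the response does not declare one. The declared charset HttpClient looks for is the charset parameter of the Content-Type response header, for example Content-Type: text/html;charset=utf-8.

If the server sends that charset, the body decodes correctly and you can simply write EntityUtils.toString(entity) without a second argument. Note that EntityUtils does not parse the <meta charset="..."> tag inside the HTML, so a page that declares its encoding only in the markup (common on older, non-HTML5 sites) may not decode correctly: with no declared or supplied charset, HttpClient 4.x typically falls back to ISO-8859-1 for text/html. To be safe, first find out which encoding the target page uses and pass it as the second argument: EntityUtils.toString(entity, pageCharset). Otherwise the content you get back will be garbled.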
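A minimal sketch of "find out the page's encoding first", using only the JDK. It assumes you already have the Content-Type header value and the raw body bytes (with HttpClient 4.x, from entity.getContentType() and EntityUtils.toByteArray(entity)). The class name CharsetSniffer and the 1024-byte sniffing window are my own choices, and the regex is a simplification for ASCII-compatible pages, not a full HTML charset-detection algorithm:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CharsetSniffer {
    // Matches charset=xxx in a Content-Type value, in <meta charset="xxx">,
    // or in <meta http-equiv="Content-Type" content="text/html; charset=xxx">
    private static final Pattern CHARSET =
            Pattern.compile("charset\\s*=\\s*[\"']?([A-Za-z0-9._:-]+)", Pattern.CASE_INSENSITIVE);

    /** Returns the supported charset named by the first "charset=" in text, or null. */
    static Charset findCharset(String text) {
        if (text == null) {
            return null;
        }
        Matcher m = CHARSET.matcher(text);
        if (!m.find()) {
            return null;
        }
        try {
            return Charset.forName(m.group(1));
        } catch (IllegalArgumentException e) { // illegal or unsupported charset name
            return null;
        }
    }

    /**
     * Picks a charset for an HTML response: the Content-Type header first,
     * then a <meta> declaration near the start of the body, then the fallback.
     */
    static Charset detect(String contentTypeHeader, byte[] body, Charset fallback) {
        Charset fromHeader = findCharset(contentTypeHeader);
        if (fromHeader != null) {
            return fromHeader;
        }
        // <meta> declarations are ASCII, so ISO-8859-1 is safe for sniffing the raw bytes
        int len = Math.min(body.length, 1024);
        Charset fromMeta = findCharset(new String(body, 0, len, StandardCharsets.ISO_8859_1));
        return fromMeta != null ? fromMeta : fallback;
    }

    public static void main(String[] args) {
        byte[] body = "<html><head><meta charset=\"UTF-8\"></head><body>\u5509</body></html>"
                .getBytes(StandardCharsets.UTF_8);
        Charset cs = detect("text/html", body, StandardCharsets.ISO_8859_1);
        System.out.println(cs); // UTF-8
    }
}
```

Once you have the charset, decode the bytes you already read with new String(body, charset).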

2. Making your HttpClient look more like a browser

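When you use HttpClient to access web resources, you may well run into this problem:

"The system has detected that you are not behaving like a real person. Due to limited system resources, we have to reject your request." Damn, that's harsh. What now? It happens because we are accessing the site from code, not from a real browser, so the question is: how do we make a request sent by HttpClient look more like one sent by a browser?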

The answer is to add an HTTP request header. You can see the headers your browser sends in Chrome's developer tools (press F12 and open the Network tab).

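How do we add it? Like this:

// With this header set, the request looks like it comes from a real browser, which gets past some sites' blocking
httpGet.addHeader("User-Agent","Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.101 Safari/537.36");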

The complete code:

@Test
public void test0() throws Exception {
    String url = "https://www.tuicool.com/";
    HttpGet httpGet = new HttpGet(url);
    // With this header set, the request looks like it comes from a real browser, which gets past some sites' blocking
    httpGet.addHeader("User-Agent",
            "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.101 Safari/537.36");
    // try-with-resources closes the response first, then the client
    try (CloseableHttpClient client = HttpClients.createDefault();
         CloseableHttpResponse response = client.execute(httpGet)) {
        // Process the response
        HttpEntity entity = response.getEntity();
        System.out.println("Content: " + EntityUtils.toString(entity));
        EntityUtils.consume(entity); // make sure the entity content is fully consumed
    }
}

3. Getting response headers and the Content-Type with HttpClient

@Test
public void test0() throws Exception {
    String url = "http://news.baidu.com/";
    HttpGet httpGet = new HttpGet(url);
    httpGet.addHeader("User-Agent",
            "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.101 Safari/537.36");
    try (CloseableHttpClient client = HttpClients.createDefault();
         CloseableHttpResponse response = client.execute(httpGet)) {
        // All response headers, in the order the server sent them
        for (Header h : response.getAllHeaders()) {
            System.out.println("Response header: " + h.getName() + ":" + h.getValue());
        }
        HttpEntity entity = response.getEntity();
        // getContentType() returns null if the server sent no Content-Type header
        System.out.println("Content type: " + entity.getContentType().getValue());
        System.out.println("Response body: " + EntityUtils.toString(entity));
        EntityUtils.consume(entity); // make sure the entity content is fully consumed
    }
}

Output:

Response header: Connection:keep-alive
Response header: Content-Type:text/html;charset=utf-8
Response header: Date:Sun, 04 Feb 2018 18:23:30 GMT
Response header: P3p:CP=" OTI DSP COR IVA OUR IND COM "
Response header: Server:Apache
Response header: Set-Cookie:BAIDUID=4BC308E279623810C6941EBF84748F56:FG=1; expires=Mon, 04-Feb-19 18:23:30 GMT; max-age=31536000; path=/; domain=.baidu.com; version=1
Response header: Tracecode:14104312990353441546020502
Response header: Tracecode:14103936990818883082020502
Response header: Vary:Accept-Encoding
Response header: Transfer-Encoding:chunked
Content type: text/html;charset=utf-8
Response body: <!doctype html>
<head>
........
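The output contains Tracecode twice: HTTP allows the same header name to appear more than once, so getAllHeaders() returns an array that can contain repeated names. With HttpClient you can get every value for one name with response.getHeaders("Tracecode"), or a single one with getFirstHeader/getLastHeader. If you copy headers into your own data structure, keep every value and compare names case-insensitively. A JDK-only sketch working on "Name:Value" strings like the printed lines; the class HeaderMultimap is hypothetical:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class HeaderMultimap {
    /** Groups "Name:Value" lines by header name, case-insensitively, keeping every value. */
    static Map<String, List<String>> group(List<String> headerLines) {
        // HTTP header names are case-insensitive, so use a case-insensitive key order
        Map<String, List<String>> headers = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        for (String line : headerLines) {
            int colon = line.indexOf(':');
            if (colon <= 0) {
                continue; // skip lines without a header name
            }
            String name = line.substring(0, colon).trim();
            String value = line.substring(colon + 1).trim(); // values may contain ':' (e.g. Date)
            headers.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
        }
        return headers;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "Content-Type:text/html;charset=utf-8",
                "Tracecode:14104312990353441546020502",
                "Tracecode:14103936990818883082020502");
        System.out.println(group(lines).get("tracecode").size()); // 2
    }
}
```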

4. Downloading an image with HttpClient and saving it locally

/*
 * Fetch non-text content such as images
 */
@Test
public void test2() throws Exception {
    HttpGet httpGet = new HttpGet("http://images.sohu.com/uiue/sohu_logo/beijing2008/2008sohu.gif");
    try (CloseableHttpClient client = HttpClients.createDefault();
         CloseableHttpResponse response = client.execute(httpGet)) {
        int statusCode = response.getStatusLine().getStatusCode();
        if (statusCode != 200) { // only 200 means the image was fetched successfully
            return;
        }
        HttpEntity entity = response.getEntity();
        if (entity == null || entity.getContentType() == null) {
            return;
        }
        String value = entity.getContentType().getValue().toLowerCase();
        if (!value.startsWith("image")) { // not an image
            EntityUtils.consume(entity);
            return;
        }
        String suffix = ".jpg"; // default when the image subtype is not recognized
        if (value.contains("jpg") || value.contains("jpeg")) {
            suffix = ".jpg";
        } else if (value.contains("bmp") || value.contains("bitmap")) {
            suffix = ".bmp";
        } else if (value.contains("png")) { // image/png, image/x-png
            suffix = ".png";
        } else if (value.contains("gif")) { // image/gif
            suffix = ".gif";
        }
        byte[] byteArray = EntityUtils.toByteArray(entity);
        // The stream is closed even if write() fails
        try (FileOutputStream fos = new FileOutputStream("d:/zhao/222" + suffix)) {
            fos.write(byteArray);
        }
    }
}

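Note: when the resource is an image, the Content-Type returned by entity.getContentType().getValue() has the form image/xxx. If you are not strict about it, you do not need to work out the suffix at all; saving everything as .jpg or any other format you like is fine. I check the image type here only to be strict, which is something web crawlers do often.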
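If you do want the suffix, a lookup table keyed on the MIME type (the part before any ';') is easier to extend than a chain of contains() checks. A JDK-only sketch; the class name ImageSuffix and the exact table entries are my own assumptions, and like the code above it falls back to .jpg for image types it does not recognize:

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class ImageSuffix {
    private static final Map<String, String> SUFFIXES = new HashMap<>();

    static {
        SUFFIXES.put("image/jpeg", ".jpg");
        SUFFIXES.put("image/jpg", ".jpg");   // non-standard, but seen in the wild
        SUFFIXES.put("image/pjpeg", ".jpg");
        SUFFIXES.put("image/png", ".png");
        SUFFIXES.put("image/x-png", ".png");
        SUFFIXES.put("image/gif", ".gif");
        SUFFIXES.put("image/bmp", ".bmp");
        SUFFIXES.put("image/x-ms-bmp", ".bmp");
    }

    /** Returns a file suffix for an image Content-Type, or null if it is not an image. */
    static String forContentType(String contentType) {
        if (contentType == null) {
            return null;
        }
        // Drop parameters such as "; charset=..." and normalize case
        String mime = contentType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
        if (!mime.startsWith("image/")) {
            return null;
        }
        return SUFFIXES.getOrDefault(mime, ".jpg"); // unknown image subtype: default to .jpg
    }

    public static void main(String[] args) {
        System.out.println(forContentType("image/gif")); // .gif
    }
}
```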

When opened, the saved image is intact.

5. Setting a proxy IP in HttpClient

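In a crawler, hitting a site frequently usually gets your IP identified and your program blocked by that site, which stops the crawler. A proxy IP can work around this. To set a proxy in HttpClient: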

HttpGet httpGet = new HttpGet("http://news.baidu.com/");
// Proxy IPs can be found on proxy-list sites such as Xici, Wuyou and 66ip
HttpHost proxy = new HttpHost("110.73.43.152", 8123);
RequestConfig config = RequestConfig.custom().setProxy(proxy).build();
// Apply the proxy to this request
httpGet.setConfig(config);

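For example, suppose the site target has blocked your crawler and sees your IP as 1.1.1.1. After you set the proxy, the IP your crawler exposes is no longer 1.1.1.1 but 110.73.43.152, which solves the blocked-IP problem for now. Reality is not that simple, though: free proxy IPs are usually unstable and painful to deal with. If your company has no budget, you will have to suffer and manage proxies yourself. You may well need to write another crawler that keeps refreshing your list of proxy IPs, both so you can keep switching the proxy you use and because these proxies tend to stop working after a short while. If your company does have money, buying more servers and more static IPs makes life much easier, although static IPs are not always easy to get.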
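The rotation described above can start as a simple round-robin pool that drops proxies once they fail. This is a JDK-only sketch with hypothetical class and method names; with HttpClient 4.x you would turn each chosen address into new HttpHost(host, port) and pass it to RequestConfig.custom().setProxy(...), as in the snippet above. It does not fetch or validate proxies itself:

```java
import java.util.ArrayList;
import java.util.List;

public class ProxyPool {
    /** A proxy address; with HttpClient 4.x it would become new HttpHost(host, port). */
    static final class ProxyAddress {
        final String host;
        final int port;

        ProxyAddress(String host, int port) {
            this.host = host;
            this.port = port;
        }

        @Override
        public String toString() {
            return host + ":" + port;
        }
    }

    private final List<ProxyAddress> proxies = new ArrayList<>();
    private int next = 0; // index of the proxy to hand out next

    synchronized void add(String host, int port) {
        proxies.add(new ProxyAddress(host, port));
    }

    /** Returns proxies in round-robin order, or null when the pool is empty. */
    synchronized ProxyAddress next() {
        if (proxies.isEmpty()) {
            return null;
        }
        if (next >= proxies.size()) {
            next = 0;
        }
        return proxies.get(next++);
    }

    /** Removes a proxy that failed (e.g. timed out) so it is not handed out again. */
    synchronized void markDead(ProxyAddress dead) {
        int i = proxies.indexOf(dead);
        if (i < 0) {
            return;
        }
        proxies.remove(i);
        if (i < next) {
            next--; // keep pointing at the same upcoming proxy after the list shifts
        }
    }

    synchronized int size() {
        return proxies.size();
    }

    public static void main(String[] args) {
        ProxyPool pool = new ProxyPool();
        pool.add("192.0.2.1", 8123); // documentation addresses, not real proxies
        pool.add("192.0.2.2", 8080);
        System.out.println(pool.next() + ", " + pool.next() + ", " + pool.next());
        // 192.0.2.1:8123, 192.0.2.2:8080, 192.0.2.1:8123
    }
}
```

A separate job that refreshes the list would call add() for new proxies, and the crawler would call markDead() when a request through a proxy times out.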

6. Setting connect and read timeouts in HttpClient

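HttpGet httpGet = new HttpGet("http://www.xicidaili.com/");
RequestConfig config = RequestConfig.custom()
        .setConnectTimeout(5000) // milliseconds allowed to establish the connection
        .setSocketTimeout(5000)  // milliseconds allowed to wait for data while reading
        .build();
httpGet.setConfig(config);
System.out.println(config.getConnectTimeout()); // 5000 (if never set, the default is -1)
System.out.println(config.getSocketTimeout());  // 5000 (if never set, the default is -1)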

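Note: if the target address is wrong, for example a misspelled host name in the proxy or in the URL, the request fails immediately with java.net.UnknownHostException because the DNS lookup fails, and a wrong port on a reachable host usually fails just as quickly with a "connection refused" error. These failures have nothing to do with the connect and read timeouts you set; the connect timeout only comes into play when the address is valid but the host does not answer, for example an unreachable proxy IP.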

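Connect timeout: the maximum time allowed for your machine to establish a connection to the target address. Traffic passes through many routers along the way, so connections to sites abroad, for example, time out quite often.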

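Read timeout (socket timeout): once the connection is established, the maximum time to wait for data while reading the response. It limits the gap between two consecutive packets, not the total time needed to read the whole page.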
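In HttpClient 4.x the connect timeout limits opening the TCP connection, and the socket timeout is applied to the socket as SO_TIMEOUT. The JDK-only sketch below reproduces the read-timeout behavior with a plain java.net.Socket against a local server that accepts the connection but never sends anything. The class and method names are hypothetical; this is plain socket code, not HttpClient:

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class TimeoutDemo {
    /**
     * Connects to a local server that accepts the TCP connection but never sends
     * a byte, and returns roughly how long read() blocked before timing out (ms).
     */
    static long readTimeoutMillis(int connectTimeoutMs, int readTimeoutMs) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 50, loopback); // any free port
             Socket client = new Socket()) {
            // Connect timeout: limit for establishing the TCP connection
            client.connect(new InetSocketAddress(loopback, server.getLocalPort()), connectTimeoutMs);
            // Read timeout (SO_TIMEOUT): limit for waiting on the next bytes
            client.setSoTimeout(readTimeoutMs);
            InputStream in = client.getInputStream();
            long start = System.nanoTime();
            try {
                in.read();
            } catch (SocketTimeoutException expected) {
                return (System.nanoTime() - start) / 1_000_000;
            }
            throw new IllegalStateException("read() returned instead of timing out");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("read() gave up after about " + readTimeoutMillis(5000, 500) + " ms");
    }
}
```

The connect succeeds at once because the operating system completes the handshake for the listening socket; it is read() that waits, and it gives up after about the read timeout.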

7. Passing parameters with HttpGet

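Requesting a URL over HTTP works just like visiting it in a browser: to pass parameters, append ?param1=xx&param2=xxxxx to the URL. If a value contains spaces, &, = or non-ASCII characters such as Chinese, it has to be URL-encoded first. Below is example code. I wrote it a long time ago against a project I can no longer find, so I cannot run it and show you the output, and I couldn't be bothered to dig the project up. Here is the request code; rest assured, it works: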

package cn.zhao.test.utils;

import java.io.IOException;

import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;

public class SmsGet {
    public static void main(String[] args) {
        // Request line: GET, with the parameters in the query string
        HttpGet httpGet = new HttpGet("http://localhost:8080/test/Channel?param1=111");
        // Request headers: none added here
        // Blank line and request body: a GET request has no body
        // Build the "browser" and handle the response; both are closed automatically
        try (CloseableHttpClient client = HttpClients.createDefault();
             CloseableHttpResponse response = client.execute(httpGet)) {
            // Status line, e.g. HTTP/1.1 200 OK
            System.out.println("Status line: " + response.getStatusLine());
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
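The URL above hard-codes param1=111. When values are dynamic they must be URL-encoded, otherwise characters such as &, =, spaces or Chinese break the query string. HttpClient 4.x provides org.apache.http.client.utils.URIBuilder for this; the JDK-only sketch below does the same job with URLEncoder (the Charset overload needs Java 10+). QueryUrl is a hypothetical helper. Note that URLEncoder writes a space as +, which servlet containers typically decode back to a space in query parameters:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class QueryUrl {
    /** Appends URL-encoded query parameters to a base URL that has no query string yet. */
    static String build(String base, Map<String, String> params) {
        if (params.isEmpty()) {
            return base;
        }
        StringJoiner url = new StringJoiner("&", base + "?", "");
        for (Map.Entry<String, String> e : params.entrySet()) {
            // Encode names and values as UTF-8 percent-escapes
            url.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8) + "="
                    + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return url.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>(); // keeps insertion order
        params.put("param1", "111");
        params.put("param2", "\u5509"); // the Chinese character 唉
        System.out.println(build("http://localhost:8080/test/Channel", params));
        // http://localhost:8080/test/Channel?param1=111&param2=%E5%94%89
    }
}
```

The resulting string can be passed straight to new HttpGet(...).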

8. Passing parameters with HttpPost

Passing parameters with HttpPost is a bit different from HttpGet and involves a few more HttpClient APIs. Here is code that passes parameters with HttpPost:

package cn.zhao.test.utils;

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

import org.apache.http.NameValuePair;
import org.apache.http.client.entity.UrlEncodedFormEntity;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.message.BasicNameValuePair;

public class SmsPost {
    public static void main(String[] args) {
        // Request line
        HttpPost httpPost = new HttpPost("http://localhost:8080/test/Channel");
        // Request header: UrlEncodedFormEntity already carries this Content-Type,
        // so setting it explicitly is optional
        httpPost.addHeader("Content-Type", "application/x-www-form-urlencoded;charset=utf-8");

        // Request body: form parameters
        List<NameValuePair> list = new ArrayList<>();
        list.add(new BasicNameValuePair("param2", "唉"));
        list.add(new BasicNameValuePair("param3", "没办法"));
        // Encode the parameters as UTF-8; the one-argument constructor would use ISO-8859-1
        httpPost.setEntity(new UrlEncodedFormEntity(list, StandardCharsets.UTF_8));

        // Build the "browser" and handle the response; both are closed automatically
        try (CloseableHttpClient client = HttpClients.createDefault();
             CloseableHttpResponse response = client.execute(httpPost)) {
            System.out.println("Status code: " + response.getStatusLine().getStatusCode());
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

Each BasicNameValuePair holds one parameter sent by the HttpPost; UrlEncodedFormEntity turns the list into the form-encoded request body.
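UrlEncodedFormEntity writes the list as an application/x-www-form-urlencoded body: name=value pairs joined with &, each percent-encoded in the charset you pass. The one-argument constructor UrlEncodedFormEntity(list) uses ISO-8859-1, which cannot represent Chinese. The JDK-only sketch below (FormBody is a hypothetical name; Java 10+) builds the same kind of body so you can see the difference:

```java
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class FormBody {
    /** Builds an application/x-www-form-urlencoded body, percent-encoding in the given charset. */
    static String encode(List<Map.Entry<String, String>> params, Charset charset) {
        return params.stream()
                .map(p -> URLEncoder.encode(p.getKey(), charset) + "=" + URLEncoder.encode(p.getValue(), charset))
                .collect(Collectors.joining("&"));
    }

    public static void main(String[] args) {
        List<Map.Entry<String, String>> params = List.of(
                Map.entry("param2", "\u5509"),              // 唉
                Map.entry("param3", "\u6CA1\u529E\u6CD5")); // 没办法
        System.out.println(encode(params, StandardCharsets.UTF_8));
        // ISO-8859-1 has no Chinese characters, so each one is replaced by '?' (%3F)
        System.out.println(encode(params, StandardCharsets.ISO_8859_1));
    }
}
```

With UTF-8 this prints param2=%E5%94%89&param3=%E6%B2%A1%E5%8A%9E%E6%B3%95; with ISO-8859-1 it prints param2=%3F&param3=%3F%3F%3F, and the server receives question marks.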

9. Passing parameters to HttpPost with org.apache.http.entity.StringEntity

Section 8 used UrlEncodedFormEntity together with BasicNameValuePair to pass HttpPost parameters.

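Another way to pass parameters is the one in this example, org.apache.http.entity.StringEntity. The two suit different Content-Type scenarios: UrlEncodedFormEntity produces an application/x-www-form-urlencoded body (name=value&...), while StringEntity sends any string you give it, such as a JSON document sent as application/json.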
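To make the "different Content-Type" point concrete, here are the same parameters serialized both ways using only the JDK (Java 10+). JsonBody and its methods are hypothetical, and the hand-written JSON encoder only handles flat maps of strings, numbers and booleans; real code should use a JSON library, as the article's own code does with JSONObject:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class JsonBody {
    /** application/x-www-form-urlencoded: name=value pairs joined with '&'. */
    static String formBody(Map<String, Object> params) {
        StringJoiner body = new StringJoiner("&");
        for (Map.Entry<String, Object> e : params.entrySet()) {
            body.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8) + "="
                    + URLEncoder.encode(String.valueOf(e.getValue()), StandardCharsets.UTF_8));
        }
        return body.toString();
    }

    /** application/json: a flat JSON object; numbers and booleans are written unquoted. */
    static String toJson(Map<String, Object> params) {
        StringJoiner obj = new StringJoiner(",", "{", "}");
        for (Map.Entry<String, Object> e : params.entrySet()) {
            Object v = e.getValue();
            String json = (v instanceof Number || v instanceof Boolean) ? String.valueOf(v) : quote(String.valueOf(v));
            obj.add(quote(e.getKey()) + ":" + json);
        }
        return obj.toString();
    }

    /** Wraps s in double quotes, escaping the characters JSON does not allow raw. */
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c)); // other control characters
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }

    public static void main(String[] args) {
        Map<String, Object> params = new LinkedHashMap<>();
        params.put("name", "Zhang \"San\"");
        params.put("price", 20);
        System.out.println("application/x-www-form-urlencoded: " + formBody(params));
        System.out.println("application/json: " + toJson(params));
    }
}
```

For these parameters the form body is name=Zhang+%22San%22&price=20 and the JSON body is {"name":"Zhang \"San\"","price":20}; the server must be told which one it is getting through the Content-Type.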

The code is below. The class contains a lot of unrelated code, so I only show the part that passes the parameters; everything else is the same as before:

HttpPost httpPost = new HttpPost(url);
// Build the parameters
JSONObject parmMap = new JSONObject();
parmMap.put("mobile", request.getParameter("phone"));
parmMap.put("idCardNo", request.getParameter("idcard"));
parmMap.put("name", request.getParameter("queryName"));
AntiUser antiUser = getSessionAntiUser(request);
parmMap.put("company_id", antiUser.getAnlianyunId());
parmMap.put("script_id", "5a741b4117b35aebd8cf1d85");
parmMap.put("price", 20); // query_unit_price
// The charset passed here is what turns the JSON string into request bytes
StringEntity s = new StringEntity(parmMap.toString(), "utf-8");
s.setContentType("application/json;charset=utf-8");
httpPost.setEntity(s);
// Execute the request and process the response
response = client.execute(httpPost);
int statusCode = response.getStatusLine().getStatusCode();
if (statusCode != 200) {
    logger.error("Request to " + url + " failed, status code: " + statusCode);
} else {
    HttpEntity entity = response.getEntity();
    // Decode the response body as UTF-8
    String retStr = EntityUtils.toString(entity, "utf-8");
    EntityUtils.consume(entity);
    return JSONObject.fromObject(retStr);
}

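One important point: the character encoding of the request body is decided by the charset passed to the StringEntity constructor, as in StringEntity s = new StringEntity(parmMap.toString(), "utf-8");. Without it, the one-argument StringEntity(String) encodes the text as ISO-8859-1 (text/plain), and Chinese characters reach the server as question marks. s.setContentEncoding("UTF-8") does not help here: it only sets the Content-Encoding header, which is meant for compression schemes such as gzip, not for character sets. Likewise, how the returned retStr is decoded is decided by EntityUtils.toString(entity, "utf-8").

StringEntity holds the parameters to be sent; with the charset given in its constructor, the Chinese data you send will not be garbled.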
