httpclient tutorial chapter1

Latest recommended article published on 2019-06-03 21:42:00

__ZhouWei

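HttpClient is designed to send and receive HTTP messages. It is not a browser: it will not attempt to process page content, execute JavaScript embedded in HTML pages, guess the content type when it is not explicitly set, reformat requests or rewrite redirect URIs, or perform any other function unrelated to HTTP transport.

Request execution

The most essential function of HttpClient is to execute HTTP methods. Executing an HTTP method involves one or more HTTP request / HTTP response exchanges.

The simplest form of request execution looks like this:

CloseableHttpClient httpclient = HttpClients.createDefault();
HttpGet httpget = new HttpGet("http://localhost/");
CloseableHttpResponse response = httpclient.execute(httpget);
try {
    <...>
} finally {
    response.close();
}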

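HTTP request

Every HTTP request has a request line consisting of a method name, a request URI, and an HTTP protocol version. HttpClient supports all methods defined in HTTP/1.1: GET, HEAD, POST, PUT, DELETE, TRACE, and OPTIONS. Each method has a corresponding class: HttpGet, HttpHead, HttpPost, HttpPut, HttpDelete, HttpTrace, and HttpOptions.

The request URI (Uniform Resource Identifier) identifies the resource to which the request applies. An HTTP request URI consists of a protocol scheme, a host name, an optional port, a resource path, an optional query, and an optional fragment.

HttpGet httpget = new HttpGet(
        "http://www.google.com/search?hl=en&q=httpclient&btnG=Google+Search&aq=f&oq=");

HttpClient provides the URIBuilder utility class for creating and modifying request URIs, for example: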

URI uri = new URIBuilder()
        .setScheme("http")
        .setHost("www.google.com")
        .setPath("/search")
        .setParameter("q", "httpclient")
        .setParameter("btnG", "Google Search")
        .setParameter("aq", "f")
        .setParameter("oq", "")
        .build();
HttpGet httpget = new HttpGet(uri);

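The two snippets build essentially the same request. The only difference is that the URIBuilder version does not set the hl=en parameter.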
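To make the URI components listed in this section concrete, here is a minimal sketch that takes the first request URI apart. It uses only the JDK's java.net.URI rather than HttpClient's URIBuilder, and the class name UriParts is a hypothetical helper introduced for illustration.

```java
import java.net.URI;

// Hypothetical helper, not part of HttpClient: splits a request URI into
// the components described above using only java.net.URI.
class UriParts {

    static String describe(String uriText) {
        URI uri = URI.create(uriText);
        return "scheme=" + uri.getScheme()
                + ", host=" + uri.getHost()
                + ", port=" + uri.getPort()   // -1 when the URI names no port
                + ", path=" + uri.getPath()
                + ", query=" + uri.getRawQuery();
    }

    public static void main(String[] args) {
        System.out.println(describe(
                "http://www.google.com/search?hl=en&q=httpclient&btnG=Google+Search&aq=f&oq="));
    }
}
```

The port is optional, which is why java.net.URI reports -1 for the Google URI above.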

HTTP response

An HTTP response is the message the server sends back to the client after it has received and interpreted a request. The first line of that message consists of the protocol version, followed by a numeric status code and its associated reason phrase.

HttpResponse response = new BasicHttpResponse(HttpVersion.HTTP_1_1,
        HttpStatus.SC_OK, "OK");
System.out.println(response.getProtocolVersion());
System.out.println(response.getStatusLine().getStatusCode());
System.out.println(response.getStatusLine().getReasonPhrase());
System.out.println(response.getStatusLine().toString());

Output:

HTTP/1.1
200
OK
HTTP/1.1 200 OK
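The structure of the status line can also be seen by splitting it by hand. The following is only a sketch using plain string handling. StatusLineParts is a hypothetical class, not the StatusLine type that HttpClient provides, and it assumes a well-formed status line.

```java
// Hypothetical helper, not an HttpClient class: splits a status line into
// protocol version, status code, and reason phrase.
class StatusLineParts {
    final String protocolVersion;
    final int statusCode;
    final String reasonPhrase;

    StatusLineParts(String protocolVersion, int statusCode, String reasonPhrase) {
        this.protocolVersion = protocolVersion;
        this.statusCode = statusCode;
        this.reasonPhrase = reasonPhrase;
    }

    static StatusLineParts parse(String line) {
        // A limit of 3 keeps spaces inside the reason phrase ("Not Found").
        String[] parts = line.trim().split(" ", 3);
        if (parts.length < 2) {
            throw new IllegalArgumentException("Malformed status line: " + line);
        }
        int code = Integer.parseInt(parts[1]);
        String reason = parts.length == 3 ? parts[2] : "";
        return new StatusLineParts(parts[0], code, reason);
    }

    public static void main(String[] args) {
        StatusLineParts ok = parse("HTTP/1.1 200 OK");
        System.out.println(ok.protocolVersion);
        System.out.println(ok.statusCode);
        System.out.println(ok.reasonPhrase);
    }
}
```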

Message headers

An HTTP message can carry a number of headers describing properties of the message, such as the content length and the content type. HttpClient provides methods to retrieve, add, remove, and enumerate headers.

HttpResponse response = new BasicHttpResponse(HttpVersion.HTTP_1_1,
        HttpStatus.SC_OK, "OK");
response.addHeader("Set-Cookie",
        "c1=a; path=/; domain=localhost");
response.addHeader("Set-Cookie",
        "c2=b; path=\"/\", c3=c; domain=\"localhost\"");
Header h1 = response.getFirstHeader("Set-Cookie");
System.out.println(h1);
Header h2 = response.getLastHeader("Set-Cookie");
System.out.println(h2);
Header[] hs = response.getHeaders("Set-Cookie");
System.out.println(hs.length);

Output:

Set-Cookie: c1=a; path=/; domain=localhost
Set-Cookie: c2=b; path="/", c3=c; domain="localhost"
2

Headers of a given type can also be traversed through the HeaderIterator interface:

HttpResponse response = new BasicHttpResponse(HttpVersion.HTTP_1_1,
        HttpStatus.SC_OK, "OK");
response.addHeader("Set-Cookie",
        "c1=a; path=/; domain=localhost");
response.addHeader("Set-Cookie",
        "c2=b; path=\"/\", c3=c; domain=\"localhost\"");
HeaderIterator it = response.headerIterator("Set-Cookie");
while (it.hasNext()) {
    System.out.println(it.next());
}

Output:

Set-Cookie: c1=a; path=/; domain=localhost
Set-Cookie: c2=b; path="/", c3=c; domain="localhost"

HttpClient also provides convenience classes for parsing headers into individual header elements:

HttpResponse response = new BasicHttpResponse(HttpVersion.HTTP_1_1,
        HttpStatus.SC_OK, "OK");
response.addHeader("Set-Cookie",
        "c1=a; path=/; domain=localhost");
response.addHeader("Set-Cookie",
        "c2=b; path=\"/\", c3=c; domain=\"localhost\"");
HeaderElementIterator it = new BasicHeaderElementIterator(
        response.headerIterator("Set-Cookie"));
while (it.hasNext()) {
    HeaderElement elem = it.nextElement();
    System.out.println(elem.getName() + " = " + elem.getValue());
    NameValuePair[] params = elem.getParameters();
    for (int i = 0; i < params.length; i++) {
        System.out.println(" " + params[i]);
    }
}

Output:

c1 = a
 path=/
 domain=localhost
c2 = b
 path=/
c3 = c
 domain=localhost

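HTTP entity

HTTP messages can carry a content entity associated with the request or the response. Entities are optional, so they appear in some requests and in some responses. Requests that carry an entity are called entity enclosing requests, and the HTTP specification defines two such methods: POST and PUT. Responses are usually expected to enclose a content entity. The exceptions are responses to the HEAD method and 204 No Content, 304 Not Modified, and 205 Reset Content responses.

HttpClient distinguishes three kinds of entities, depending on where their content originates:

streamed: The content is received from a stream or generated on the fly. In particular, this category includes entities received from HTTP responses. Streamed entities are generally not repeatable.

self-contained: The content is held in memory or obtained by means that are independent of a connection or another entity. Self-contained entities are generally repeatable. This type of entity is mostly used for entity enclosing HTTP requests.

wrapping: The content is obtained from another entity.

This distinction matters for connection management when content is streamed out of an HTTP response. For request entities that are created by the application and only sent using HttpClient, the difference between streamed and self-contained is of little importance. In that case, treat non-repeatable entities as streamed and repeatable ones as self-contained.

Repeatable entities

An entity is repeatable if its content can be read more than once. This is only possible with self-contained entities such as ByteArrayEntity or StringEntity.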
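The difference between repeatable and non-repeatable content can be illustrated without HttpClient at all. The sketch below uses plain JDK streams: an already opened stream stands in for a streamed entity, and a byte array stands in for a self-contained one. RepeatableDemo and drain are hypothetical names, and readAllBytes assumes Java 9 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: contrasts one-shot stream content with
// array-backed content that can be read again.
class RepeatableDemo {

    // Reads whatever is left in the stream and returns it as text.
    static String drain(InputStream in) {
        try {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] content = "important message".getBytes(StandardCharsets.UTF_8);

        // Stream-backed content: the second read finds nothing left.
        InputStream oneShot = new ByteArrayInputStream(content);
        System.out.println("[" + drain(oneShot) + "]");
        System.out.println("[" + drain(oneShot) + "]");

        // Array-backed content: every read opens a fresh stream.
        System.out.println("[" + drain(new ByteArrayInputStream(content)) + "]");
        System.out.println("[" + drain(new ByteArrayInputStream(content)) + "]");
    }
}
```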

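Using HTTP entities

Since an entity can represent both binary and character content, it supports character encodings.

An entity is created when a request with enclosed content is executed, or when a request succeeds and the response body is used to send the result back to the client.

To read the content of an entity, either retrieve the input stream via HttpEntity#getContent(), which returns a java.io.InputStream, or supply an output stream to HttpEntity#writeTo(OutputStream), which returns once all content has been written to the given stream.

When an entity has been received with an incoming message, HttpEntity#getContentType() and HttpEntity#getContentLength() can be used to read common metadata such as the Content-Type and Content-Length headers, if they are available. Since the Content-Type header can contain a character encoding for text MIME types such as text/plain or text/html, the HttpEntity#getContentEncoding() method is used to read this information. If the headers are not available, a length of -1 is returned, and NULL is returned for the content type. If the Content-Type header is available, a Header object is returned.

When creating an entity for an outgoing message, this metadata has to be supplied by the creator of the entity.

StringEntity myEntity = new StringEntity("important message",
        ContentType.create("text/plain", "UTF-8"));
System.out.println(myEntity.getContentType());
System.out.println(myEntity.getContentLength());
System.out.println(EntityUtils.toString(myEntity));
System.out.println(EntityUtils.toByteArray(myEntity).length);

Output: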

Content-Type: text/plain; charset=utf-8
17
important message
17

Ensuring release of low level resources

To ensure proper release of system resources, one must close either the content stream associated with the entity or the response itself.

CloseableHttpClient httpclient = HttpClients.createDefault();
HttpGet httpget = new HttpGet("http://localhost/");
CloseableHttpResponse response = httpclient.execute(httpget);
try {
    HttpEntity entity = response.getEntity();
    if (entity != null) {
        InputStream instream = entity.getContent();
        try {
            // do something useful
        } finally {
            instream.close();
        }
    }
} finally {
    response.close();
}

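Closing the content stream and closing the response are not equivalent. The former attempts to keep the underlying connection alive by consuming the remaining entity content, while the latter immediately shuts down and discards the connection.

Please note that the HttpEntity#writeTo(OutputStream) method is also required to ensure proper release of system resources once the entity has been fully written out. If this method obtains a java.io.InputStream by calling HttpEntity#getContent(), it is also expected to close the stream in a finally clause.

When working with streaming entities, one can use the EntityUtils#consume(HttpEntity) method to ensure that the entity content has been fully consumed and the underlying stream has been closed.

Consuming entity content

The recommended way to consume the content of an entity is through its HttpEntity#getContent() or HttpEntity#writeTo(OutputStream) methods. HttpClient also comes with the EntityUtils class, which exposes several static methods that make it easier to read the content or information from an entity. Instead of reading the java.io.InputStream directly, one can retrieve the whole content body as a string or byte array with the methods of this class. However, the use of EntityUtils is strongly discouraged unless the response entities originate from a trusted HTTP server and are known to be of limited length.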

CloseableHttpClient httpclient = HttpClients.createDefault();
HttpGet httpget = new HttpGet("http://localhost/");
CloseableHttpResponse response = httpclient.execute(httpget);
try {
    HttpEntity entity = response.getEntity();
    if (entity != null) {
        long len = entity.getContentLength();
        if (len != -1 && len < 2048) {
            System.out.println(EntityUtils.toString(entity));
        } else {
            // Stream content out
        }
    }
} finally {
    response.close();
}

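In some situations entity content needs to be read more than once. In that case the content must be buffered in some way, either in memory or on disk. The simplest way to do this is to wrap the original entity in the BufferedHttpEntity class, which reads the content of the original entity into an in-memory buffer.

CloseableHttpResponse response = <...>
HttpEntity entity = response.getEntity();
if (entity != null) {
    entity = new BufferedHttpEntity(entity);
}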

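Producing entity content

HttpClient provides several classes that can be used to stream content out efficiently through HTTP connections. Instances of those classes can be associated with entity enclosing requests such as POST and PUT in order to enclose entity content in outgoing HTTP requests. There is a class for each of the most common data containers, namely string, byte array, input stream, and file: StringEntity, ByteArrayEntity, InputStreamEntity, and FileEntity.

File file = new File("somefile.txt");
FileEntity entity = new FileEntity(file,
        ContentType.create("text/plain", "UTF-8"));
HttpPost httppost = new HttpPost("http://localhost/action.do");
httppost.setEntity(entity);

Please note that InputStreamEntity is not repeatable, because it can only read from the underlying data stream once.

HTML forms

Many applications need to simulate the process of submitting an HTML form, for instance to log in to a web application or to submit input data. HttpClient provides the entity class UrlEncodedFormEntity for this purpose.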

List<NameValuePair> formparams = new ArrayList<NameValuePair>();
formparams.add(new BasicNameValuePair("param1", "value1"));
formparams.add(new BasicNameValuePair("param2", "value2"));
UrlEncodedFormEntity entity = new UrlEncodedFormEntity(formparams, Consts.UTF_8);
HttpPost httppost = new HttpPost("http://localhost/handler.do");
httppost.setEntity(entity);

The UrlEncodedFormEntity instance uses URL encoding to encode the parameters and produces the following content:

param1=value1&param2=value2
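What URL encoding does to the parameters can be sketched with the JDK's own java.net.URLEncoder. This is an illustration of the encoding only, not of how UrlEncodedFormEntity is implemented. FormEncoder is a hypothetical class, and the Charset overload of URLEncoder.encode assumes Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper: builds an application/x-www-form-urlencoded body
// from name/value pairs using only the JDK.
class FormEncoder {

    static String encode(Map<String, String> params) {
        StringJoiner body = new StringJoiner("&");
        for (Map.Entry<String, String> p : params.entrySet()) {
            body.add(URLEncoder.encode(p.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(p.getValue(), StandardCharsets.UTF_8));
        }
        return body.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the parameters in insertion order.
        Map<String, String> form = new LinkedHashMap<>();
        form.put("param1", "value1");
        form.put("param2", "value2");
        System.out.println(encode(form));
    }
}
```

Characters that have a special meaning in a form body, such as spaces, `&`, and `=`, are escaped, so a value like `a b&c` becomes `a+b%26c`.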

Content chunking

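Generally it is recommended to let HttpClient choose the most appropriate transfer encoding based on the properties of the HTTP message being transferred. It is possible, however, to tell HttpClient that chunk coding is preferred by calling setChunked(true) on the entity. Please note that HttpClient uses this flag as a hint only. The value is ignored for HTTP protocol versions that do not support chunk coding, such as HTTP/1.0.

StringEntity entity = new StringEntity("important message",
        ContentType.create("text/plain", Consts.UTF_8));
entity.setChunked(true);
HttpPost httppost = new HttpPost("http://localhost/action.do");
httppost.setEntity(entity);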
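For readers who have not seen chunk coding on the wire, the sketch below shows the HTTP/1.1 chunked format itself: each chunk is prefixed with its size in hexadecimal, and a zero-length chunk ends the body. It is a simplified illustration, not HttpClient's implementation. ChunkedSketch is a hypothetical class, and it assumes ASCII content and no trailers.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: renders a body in HTTP/1.1 chunked transfer coding.
class ChunkedSketch {

    // Encodes the body as chunks of at most chunkSize bytes, followed by
    // the zero-length chunk that terminates the message body.
    static String encode(String body, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        byte[] bytes = body.getBytes(StandardCharsets.US_ASCII);
        StringBuilder out = new StringBuilder();
        for (int off = 0; off < bytes.length; off += chunkSize) {
            int len = Math.min(chunkSize, bytes.length - off);
            // Chunk size in hexadecimal, then the chunk data, each ended by CRLF.
            out.append(Integer.toHexString(len)).append("\r\n");
            out.append(new String(bytes, off, len, StandardCharsets.US_ASCII)).append("\r\n");
        }
        return out.append("0\r\n\r\n").toString();
    }

    public static void main(String[] args) {
        System.out.print(encode("important message", 8));
    }
}
```

Because each chunk announces its own size, the sender does not need to know the total content length in advance, which is what makes chunk coding suitable for streamed entities.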

Response handlers

The simplest and most convenient way to handle responses is the ResponseHandler interface, which declares the handleResponse(HttpResponse response) method. This approach relieves the user from having to worry about connection management. When a ResponseHandler is used, HttpClient automatically releases the connection back to the connection manager, regardless of whether the request execution succeeds or causes an exception.

CloseableHttpClient httpclient = HttpClients.createDefault();
HttpGet httpget = new HttpGet("http://localhost/json");
ResponseHandler<MyJsonObject> rh = new ResponseHandler<MyJsonObject>() {
    @Override
    public MyJsonObject handleResponse(
            final HttpResponse response) throws IOException {
        StatusLine statusLine = response.getStatusLine();
        HttpEntity entity = response.getEntity();
        if (statusLine.getStatusCode() >= 300) {
            throw new HttpResponseException(
                    statusLine.getStatusCode(),
                    statusLine.getReasonPhrase());
        }
        if (entity == null) {
            throw new ClientProtocolException("Response contains no content");
        }
        Gson gson = new GsonBuilder().create();
        ContentType contentType = ContentType.getOrDefault(entity);
        Charset charset = contentType.getCharset();
        if (charset == null) {
            // Assumption: fall back to UTF-8 when the server declares no charset.
            charset = StandardCharsets.UTF_8;
        }
        Reader reader = new InputStreamReader(entity.getContent(), charset);
        return gson.fromJson(reader, MyJsonObject.class);
    }
};
MyJsonObject myjson = httpclient.execute(httpget, rh);

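HttpClient interface

The HttpClient interface represents the most essential contract for HTTP request execution. It imposes no restrictions or particular details on the request execution process and leaves the specifics of connection management, state management, and authentication to individual implementations. This makes it easier to decorate the interface with additional functionality such as response content caching.

Generally, HttpClient implementations act as a facade to a number of special purpose handler or strategy interface implementations, each responsible for one aspect of the HTTP protocol, such as redirect handling, authentication handling, or decisions about connection persistence and keep-alive duration. This allows users to selectively replace the default implementation of those aspects with third-party or application specific ones.

ConnectionKeepAliveStrategy keepAliveStrat = new DefaultConnectionKeepAliveStrategy() {
    @Override
    public long getKeepAliveDuration(
            HttpResponse response,
            HttpContext context) {
        long keepAlive = super.getKeepAliveDuration(response, context);
        if (keepAlive == -1) {
            // Keep connections alive 5 seconds if a keep-alive value
            // has not been explicitly set by the server
            keepAlive = 5000;
        }
        return keepAlive;
    }
};
CloseableHttpClient httpclient = HttpClients.custom()
        .setKeepAliveStrategy(keepAliveStrat)
        .build();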
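The decision this strategy makes can be sketched in isolation. The helper below is a hypothetical, library-free approximation: it looks for a timeout parameter (in seconds) in a Keep-Alive header value and falls back to a caller-supplied default otherwise. The name KeepAliveSketch and the simplified parsing are assumptions made for illustration, not the actual logic of DefaultConnectionKeepAliveStrategy.

```java
// Hypothetical helper: derives a keep-alive duration in milliseconds from
// a Keep-Alive header value such as "timeout=15, max=100".
class KeepAliveSketch {

    static long keepAliveMillis(String keepAliveHeader, long fallbackMillis) {
        if (keepAliveHeader != null) {
            for (String element : keepAliveHeader.split(",")) {
                String[] nameValue = element.trim().split("=", 2);
                if (nameValue.length == 2 && nameValue[0].trim().equalsIgnoreCase("timeout")) {
                    try {
                        // The header carries seconds; callers expect milliseconds.
                        return Long.parseLong(nameValue[1].trim()) * 1000;
                    } catch (NumberFormatException ignore) {
                        // Unparseable value: fall through to the fallback.
                    }
                }
            }
        }
        return fallbackMillis;
    }

    public static void main(String[] args) {
        System.out.println(keepAliveMillis("timeout=15, max=100", 5000));
        System.out.println(keepAliveMillis(null, 5000));
    }
}
```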

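HttpClient thread safety

HttpClient implementations are expected to be thread safe. A single instance can be reused for multiple request executions, including concurrent ones.

HttpClient resource deallocation

When a CloseableHttpClient instance is no longer needed and is about to go out of scope, the connection manager associated with it must be shut down by calling the CloseableHttpClient#close() method.

CloseableHttpClient httpclient = HttpClients.createDefault();
try {
    <...>
} finally {
    httpclient.close();
}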

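HTTP execution context

HTTP was originally designed as a stateless, request/response oriented protocol. Real world applications, however, often need to persist state information across several logically related request/response exchanges. To let applications maintain a processing state, HttpClient allows HTTP requests to be executed within a particular execution context, referred to as the HTTP context. Multiple logically related requests can take part in a logical session if the same context is reused between consecutive requests.

The HTTP context works much like a java.util.Map<String, Object>: it is simply a collection of arbitrary named values. An application can populate context attributes before request execution or examine the context after the execution has completed. Because HttpContext can contain arbitrary objects, it may be unsafe to share between threads. It is recommended that each thread of execution maintain its own context.

In the course of HTTP request execution, HttpClient adds the following attributes to the execution context:

HttpConnection: the instance representing the actual connection to the target server.

HttpHost: the instance representing the connection target.

HttpRoute: the instance representing the complete connection route.

HttpRequest: the instance representing the actual HTTP request. The final HttpRequest object in the context always represents the state of the message exactly as it was sent to the target server. By default, HTTP/1.0 and HTTP/1.1 use relative request URIs. However, if the request is sent via a proxy in non-tunneling mode, the URI will be absolute.

HttpResponse: the instance representing the actual HTTP response.

java.lang.Boolean: a flag indicating whether the actual request has been fully transmitted to the connection target.

RequestConfig: the object representing the actual request configuration.

java.util.List<URI>: the collection of all redirect locations received in the process of request execution.

The HttpClientContext adaptor class can be used to simplify interaction with the context state.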

HttpContext context = <...>
HttpClientContext clientContext = HttpClientContext.adapt(context);
HttpHost target = clientContext.getTargetHost();
HttpRequest request = clientContext.getRequest();
HttpResponse response = clientContext.getResponse();
RequestConfig config = clientContext.getRequestConfig();

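Multiple request sequences that represent a logically related session should be executed with the same HttpContext instance, so that conversation context and state information propagate automatically between requests.

In the following example, the request configuration set by the initial request is kept in the execution context and propagated to the subsequent requests that share the same context.

CloseableHttpClient httpclient = HttpClients.createDefault();
// The shared context; the original snippet assumed it already existed.
HttpClientContext context = HttpClientContext.create();
RequestConfig requestConfig = RequestConfig.custom()
        .setSocketTimeout(1000)
        .setConnectTimeout(1000)
        .build();
HttpGet httpget1 = new HttpGet("http://localhost/1");
httpget1.setConfig(requestConfig);
CloseableHttpResponse response1 = httpclient.execute(httpget1, context);
try {
    HttpEntity entity1 = response1.getEntity();
} finally {
    response1.close();
}
HttpGet httpget2 = new HttpGet("http://localhost/2");
CloseableHttpResponse response2 = httpclient.execute(httpget2, context);
try {
    HttpEntity entity2 = response2.getEntity();
} finally {
    response2.close();
}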

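Exception handling

HttpClient can throw two types of exceptions: java.io.IOException in case of an I/O failure such as a socket timeout or a socket reset, and HttpException, which signals an HTTP failure such as a violation of the HTTP protocol. I/O errors are usually considered non-fatal and recoverable, whereas HTTP protocol errors are considered fatal and cannot be recovered from automatically.

HTTP transport safety

It is important to understand that the HTTP protocol is not well suited to all types of applications. HTTP is a simple request/response oriented protocol that was initially designed to support retrieval of static or dynamically generated content. It was never intended to support transactional operations. For instance, the HTTP server considers its part of the contract fulfilled once it has received and processed the request, generated a response, and sent a status code back to the client. The server makes no attempt to roll back the transaction if the client fails to receive the response in its entirety. If the client decides to retry the same request, the server will inevitably end up executing the same transaction more than once. In some cases this may lead to application data corruption or inconsistent application state.

Even though HTTP was never designed to support transactional processing, it can still be used as a transport protocol for mission critical applications, provided certain conditions are met. To ensure HTTP transport layer safety, the system must ensure the idempotency of HTTP methods on the application layer.

Idempotent methods

The HTTP/1.1 specification defines idempotent methods as follows: methods can have the property of "idempotence" in that the side effects of N > 0 identical requests are the same as for a single request.

In other words, the application must be prepared to deal with the consequences of the same method being executed more than once. This can be achieved, for instance, by providing a unique transaction id and by other means of avoiding repeated execution of the same logical operation.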
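One way to picture the unique transaction id approach is a server-side component that remembers which ids it has already applied. The sketch below is a hypothetical, in-memory illustration: DepositService, its methods, and the "tx-1" id are all invented for this example, and a real system would need durable storage for the ids.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical server-side component: applies each deposit at most once,
// keyed by a client-supplied transaction id.
class DepositService {
    private final Set<String> seenTransactionIds = ConcurrentHashMap.newKeySet();
    private final AtomicLong balance = new AtomicLong();

    // Returns true if the deposit was applied, false if the id was seen before.
    boolean deposit(String transactionId, long amount) {
        if (!seenTransactionIds.add(transactionId)) {
            return false; // retried request: same logical operation, skip it
        }
        balance.addAndGet(amount);
        return true;
    }

    long balance() {
        return balance.get();
    }

    public static void main(String[] args) {
        DepositService service = new DepositService();
        service.deposit("tx-1", 100);
        service.deposit("tx-1", 100); // client retry after a lost response
        System.out.println(service.balance());
    }
}
```

With this guard in place, a client that resends the same POST after a lost response does not cause the deposit to be booked twice.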

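Please note that this problem is not specific to HttpClient. Browser based applications face exactly the same issues with non-idempotent HTTP methods.

HttpClient assumes that non-entity enclosing methods such as GET and HEAD are idempotent, and that entity enclosing methods such as POST and PUT are not.

Automatic exception recovery

By default HttpClient attempts to recover from I/O exceptions automatically. The default auto-recovery mechanism is limited to a few exceptions that are known to be safe.

HttpClient makes no attempt to recover from logical or HTTP protocol errors (those derived from the HttpException class).
HttpClient automatically retries methods that are assumed to be idempotent.
HttpClient automatically retries methods that fail with a transport exception while the HTTP request is still being transmitted to the target server, that is, before the request has been fully sent.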
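The first two rules above can be sketched as a plain retry loop that is independent of HttpClient. This is a simplified illustration under stated assumptions: RetrySketch, IoCall, and simulate are hypothetical names, the idempotency flag is passed in by the caller, and the "request not yet fully transmitted" rule is not modeled.

```java
import java.io.IOException;

// Hypothetical sketch of retry-on-I/O-error for idempotent operations.
class RetrySketch {

    // An operation that may fail with an I/O error.
    interface IoCall<T> {
        T call() throws IOException;
    }

    // Runs the operation, retrying on IOException only when it is marked
    // idempotent and the attempt budget is not exhausted.
    static <T> T execute(IoCall<T> operation, boolean idempotent, int maxAttempts)
            throws IOException {
        for (int attempt = 1; ; attempt++) {
            try {
                return operation.call();
            } catch (IOException e) {
                if (!idempotent || attempt >= maxAttempts) {
                    throw e;
                }
                // Otherwise loop around and try again.
            }
        }
    }

    // Simulates an operation that fails `failures` times before succeeding.
    // Returns the number of attempts made, or -1 if the error was not recovered.
    static int simulate(int failures, boolean idempotent, int maxAttempts) {
        int[] calls = {0};
        try {
            execute(() -> {
                if (++calls[0] <= failures) {
                    throw new IOException("simulated connection reset");
                }
                return "ok";
            }, idempotent, maxAttempts);
            return calls[0];
        } catch (IOException e) {
            return -1;
        }
    }

    public static void main(String[] args) {
        System.out.println(simulate(2, true, 5));  // idempotent: recovers
        System.out.println(simulate(2, false, 5)); // non-idempotent: gives up
    }
}
```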

Request retry handler

To enable a custom exception recovery mechanism, provide an implementation of the HttpRequestRetryHandler interface.

HttpRequestRetryHandler myRetryHandler = new HttpRequestRetryHandler() {
    @Override
    public boolean retryRequest(
            IOException exception,
            int executionCount,
            HttpContext context) {
        if (executionCount >= 5) {
            // Do not retry if over max retry count
            return false;
        }
        if (exception instanceof InterruptedIOException) {
            // Timeout
            return false;
        }
        if (exception instanceof UnknownHostException) {
            // Unknown host
            return false;
        }
        if (exception instanceof ConnectTimeoutException) {
            // Connection timed out
            return false;
        }
        if (exception instanceof SSLException) {
            // SSL handshake exception
            return false;
        }
        HttpClientContext clientContext = HttpClientContext.adapt(context);
        HttpRequest request = clientContext.getRequest();
        boolean idempotent = !(request instanceof HttpEntityEnclosingRequest);
        if (idempotent) {
            // Retry if the request is considered idempotent
            return true;
        }
        return false;
    }
};
CloseableHttpClient httpclient = HttpClients.custom()
        .setRetryHandler(myRetryHandler)
        .build();

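Aborting requests

In some situations HTTP request execution fails to complete within the expected time frame because of high load on the target server or too many concurrent requests issued on the client side. In such cases it may be necessary to terminate the request prematurely and unblock the execution thread that is blocked in an I/O operation. HTTP requests being executed by HttpClient can be aborted at any stage of execution by invoking the HttpUriRequest#abort() method. This method is thread safe and can be called from any thread. When an HTTP request is aborted, its execution thread is guaranteed to unblock by throwing an InterruptedIOException, even if it is currently blocked in an I/O operation.

HTTP protocol interceptors

An HTTP protocol interceptor is a routine that implements a specific aspect of the HTTP protocol. Usually a protocol interceptor is expected to act upon one specific header or a group of related headers of the incoming message, or to populate the outgoing message with one specific header or a group of related headers. Protocol interceptors can also manipulate the content entities enclosed in messages, with transparent content compression and decompression being a good example. This is usually accomplished with the Decorator pattern, where a wrapper entity class decorates the original entity.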

CloseableHttpClient httpclient = HttpClients.custom()
        .addInterceptorLast(new HttpRequestInterceptor() {
            public void process(
                    final HttpRequest request,
                    final HttpContext context) throws HttpException, IOException {
                AtomicInteger count = (AtomicInteger) context.getAttribute("count");
                request.addHeader("Count", Integer.toString(count.getAndIncrement()));
            }
        })
        .build();
AtomicInteger count = new AtomicInteger(1);
HttpClientContext localContext = HttpClientContext.create();
localContext.setAttribute("count", count);
HttpGet httpget = new HttpGet("http://localhost/");
for (int i = 0; i < 10; i++) {
    CloseableHttpResponse response = httpclient.execute(httpget, localContext);
    try {
        HttpEntity entity = response.getEntity();
    } finally {
        response.close();
    }
}
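The pattern in the example above, an interceptor reading shared state from the context and stamping it onto each outgoing request, can be reproduced without the library. The following sketch is only an analogy: InterceptorSketch and RequestInterceptor are hypothetical stand-ins, with plain maps taking the place of HttpRequest headers and HttpContext.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical, library-free model of request interceptors sharing state
// through an execution context.
class InterceptorSketch {

    // Stand-in for HttpRequestInterceptor: may modify the request headers.
    interface RequestInterceptor {
        void process(Map<String, String> headers, Map<String, Object> context);
    }

    // Runs every interceptor against a fresh request and returns its headers.
    static Map<String, String> send(List<RequestInterceptor> chain,
                                    Map<String, Object> context) {
        Map<String, String> headers = new HashMap<>();
        for (RequestInterceptor interceptor : chain) {
            interceptor.process(headers, context);
        }
        return headers;
    }

    // Sends the given number of requests through one shared context and
    // collects the Count header that each request carried.
    static List<String> run(int requests) {
        List<RequestInterceptor> chain = new ArrayList<>();
        chain.add((headers, ctx) -> {
            AtomicInteger count = (AtomicInteger) ctx.get("count");
            headers.put("Count", Integer.toString(count.getAndIncrement()));
        });
        Map<String, Object> context = new HashMap<>();
        context.put("count", new AtomicInteger(1));
        List<String> seen = new ArrayList<>();
        for (int i = 0; i < requests; i++) {
            seen.add(send(chain, context).get("Count"));
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(run(10));
    }
}
```

Because the counter lives in the context rather than in the interceptor, the interceptor itself stays stateless and the same instance can serve many sessions.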

Redirect handling

HttpClient handles all types of redirects automatically, except those explicitly prohibited by the HTTP specification as requiring user intervention. A custom redirect strategy such as LaxRedirectStrategy can be used to relax these restrictions:

LaxRedirectStrategy redirectStrategy = new LaxRedirectStrategy();
CloseableHttpClient httpclient = HttpClients.custom()
        .setRedirectStrategy(redirectStrategy)
        .build();

The final absolute location reached after redirects can be reconstructed from the original request and the context with URIUtils#resolve:

CloseableHttpClient httpclient = HttpClients.createDefault();
HttpClientContext context = HttpClientContext.create();
HttpGet httpget = new HttpGet("http://localhost:8080/");
CloseableHttpResponse response = httpclient.execute(httpget, context);
try {
    HttpHost target = context.getTargetHost();
    List<URI> redirectLocations = context.getRedirectLocations();
    URI location = URIUtils.resolve(httpget.getURI(), target, redirectLocations);
    System.out.println("Final HTTP location: " + location.toASCIIString());
    // Expected to be an absolute URI
} finally {
    response.close();
}