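The URL class is mainly for reading data from a server or other target address, but it cannot inspect the metadata of a request. That is where URLConnection comes in. It can read the data and also the metadata, including the headers (headers matter a lot: in web development, the data we need is often in them). URLConnection can also send data to the server with the various HTTP methods (POST/GET/OPTIONS/PUT/DELETE). I will keep this chapter short.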
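I. Building and reading
The basic steps for using URLConnection are:
- Construct a URL object
- Call openConnection() on that URL to get the corresponding URLConnection object
- Configure the URLConnection
- Read the header fields
- Get the input stream and read data
- Get the output stream and write data
- Close the connection
A basic code snippet looks like this: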
try {
    URL u = new URL("http://www.baidu.com");
    URLConnection conn = u.openConnection();
    // read from the URL
} catch (MalformedURLException ex) {
    System.err.println(ex);
} catch (IOException ex) {
    System.err.println(ex);
}
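1. A few simple internals
- URLConnection is an abstract class with a single abstract method: public void connect() throws IOException
- Some common implementation classes:
  - sun.net.www.protocol.file.FileURLConnection: for files
  - sun.net.www.protocol.http.HttpURLConnection: for network (HTTP) access
- Creating a URLConnection does not call connect(). The connection is opened the first time data has to be exchanged, for example by getInputStream, getContent, or getHeaderField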
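The lazy connect described above can be observed without any network access by pointing a file: URL at a file that does not exist. This is only a sketch: the class name LazyConnectDemo and its helper method are made up for illustration, and it assumes the JDK's built-in file: protocol handler.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLConnection;
import java.nio.file.Path;
import java.nio.file.Paths;

final class LazyConnectDemo {

    /**
     * Returns true when openConnection() succeeds for a missing file
     * and the failure only shows up at the first real read.
     */
    static boolean failsOnlyOnFirstRead(Path missingFile) {
        try {
            URL url = missingFile.toUri().toURL();
            // No I/O happens here: the connection object is merely created.
            URLConnection conn = url.openConnection();
            try (InputStream in = conn.getInputStream()) {
                // Reaching this line means the file existed after all.
                return false;
            } catch (IOException expected) {
                // getInputStream() connected implicitly and the missing file failed there.
                return true;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical path that should not exist
        Path missing = Paths.get(System.getProperty("java.io.tmpdir"),
                "no-such-file-" + System.nanoTime() + ".txt");
        System.out.println(failsOnlyOnFirstRead(missing));
    }
}
```

The same ordering applies to HTTP connections: problems such as an unreachable host surface at the first call that needs data, not at openConnection().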
2. Reading data from the server
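import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;

public class NetworkMain {
    public static void main(String[] args) {
        try {
            URL url = new URL("https://www.baidu.com");
            URLConnection urlConnection = url.openConnection();
            // getInputStream() opens the connection on first use
            try (InputStream inputStream = urlConnection.getInputStream()) {
                InputStream buffer = new BufferedInputStream(inputStream);
                // No charset is given, so the platform default is used for decoding
                InputStreamReader reader = new InputStreamReader(buffer);
                int c;
                while ((c = reader.read()) != -1) {
                    System.out.print((char) c);
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}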
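Differences between URL and URLConnection:
- URLConnection gives access to the HTTP headers
- URLConnection lets you configure the request sent to the server
- Besides reading, URLConnection can also write data to the server
3. Headers and how to read them
Here are the headers returned for the Baidu home page: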
Accept-Ranges:[bytes]
null:[HTTP/1.1 200 OK]
Server:[bfe/1.0.8.18]
Etag:["58860402-98b"]
Cache-Control:[private, no-cache, no-store, proxy-revalidate, no-transform]
Connection:[Keep-Alive]
Set-Cookie:[BDORZ=27315; max-age=86400; domain=.baidu.com; path=/]
Pragma:[no-cache]
Last-Modified:[Mon, 23 Jan 2017 13:24:18 GMT]
Content-Length:[2443]
Date:[Thu, 13 Sep 2018 09:51:05 GMT]
Content-Type:[text/html]
The code that produced this output:
import java.io.IOException;
import java.net.URL;
import java.net.URLConnection;
import java.util.List;
import java.util.Map;

public class NetworkMain {
    public static void main(String[] args) {
        try {
            URL url = new URL("https://www.baidu.com");
            URLConnection urlConnection = url.openConnection();
            // The status line is stored under the null key
            Map<String, List<String>> headerFields = urlConnection.getHeaderFields();
            for (Map.Entry<String, List<String>> entry : headerFields.entrySet()) {
                System.out.println(entry.getKey() + ":" + entry.getValue().toString());
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
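a. Content-Type
getContentType() returns the MIME (Multipurpose Internet Mail Extensions) type of the response body, that is, the type of the content, optionally followed by its character set.
- If the header is missing, no exception is thrown; the method simply returns null
- For text/html without a charset, HTTP/1.1 historically (RFC 2616) defaulted to ISO-8859-1; RFC 7231 later removed that default, so prefer an explicit charset when one is given
- Other common types: text/plain, image/gif, application/xml, image/jpeg
- getContentEncoding() returns the Content-Encoding header (how the body was encoded for transfer, for example gzip), or null if it is absent. This is not the character set; the character set comes from the charset parameter of Content-Type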
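Since the character set lives inside the Content-Type value, it has to be extracted by hand. The sketch below shows one way to do it; the class ContentTypeCharset and the method charsetOf are hypothetical names, and falling back to ISO-8859-1 is an assumption that follows the historical default mentioned above.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

final class ContentTypeCharset {

    /** Extracts the charset parameter, falling back to ISO-8859-1 when absent or unusable. */
    static Charset charsetOf(String contentType) {
        if (contentType != null) {
            for (String part : contentType.split(";")) {
                String p = part.trim();
                if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                    // Strip optional quotes around the charset name
                    String name = p.substring("charset=".length()).trim().replace("\"", "");
                    try {
                        return Charset.forName(name);
                    } catch (IllegalArgumentException unknownOrIllegal) {
                        break; // fall through to the default
                    }
                }
            }
        }
        return StandardCharsets.ISO_8859_1;
    }

    public static void main(String[] args) {
        // In real code the argument would be urlConnection.getContentType()
        System.out.println(charsetOf("text/html; charset=UTF-8"));
        System.out.println(charsetOf("text/html"));
        System.out.println(charsetOf(null));
    }
}
```

The resulting Charset can be passed to new InputStreamReader(stream, charset) so that decoding does not depend on the platform default.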
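b. Content-Length
The total size of the content in bytes. If there is no Content-Length header, getContentLength() returns -1
- Java 7 added getContentLengthLong(), which returns a long so that sizes above the int maximum are not lost
- When downloading a binary file over HTTP, it is best to use getContentLength() to decide when to stop reading from the InputStream, rather than waiting for the stream to end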
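A minimal sketch of reading exactly as many bytes as Content-Length announces. FixedLengthRead and readBody are illustrative names, and a ByteArrayInputStream stands in for the stream and length that urlConnection.getInputStream() and getContentLength() would provide.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

final class FixedLengthRead {

    /** Reads exactly contentLength bytes, or fails if the stream ends early. */
    static byte[] readBody(InputStream in, int contentLength) {
        byte[] data = new byte[contentLength];
        int offset = 0;
        try {
            while (offset < contentLength) {
                // read() may deliver fewer bytes than requested, so keep looping
                int n = in.read(data, offset, contentLength - offset);
                if (n == -1) {
                    throw new EOFException(
                            "expected " + contentLength + " bytes, got " + offset);
                }
                offset += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return data;
    }

    public static void main(String[] args) {
        // Stand-in for a response stream that carries more than the declared length
        InputStream in = new ByteArrayInputStream(new byte[] {1, 2, 3, 4, 5});
        System.out.println(readBody(in, 3).length);
    }
}
```

When getContentLength() returns -1 this approach does not apply, and reading until the stream reports end of data is the remaining option.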
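c. Date
Indicates when the document was sent; read it with getDate()
d. Expires
Indicates when the document should be removed from the cache. If the header is missing, getExpiration() returns 0 (no expiry date known), which is treated here as never expiring
e. Last-Modified
The time the document was last modified. If the header is missing, getLastModified() returns 0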
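The three date headers share one text format, and the getters report a missing header as 0. The sketch below imitates that convention with java.time; HttpDates and toEpochMillis are hypothetical names, and it assumes the header uses the RFC 1123 form shown in the sample output earlier.

```java
import java.time.Instant;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

final class HttpDates {

    /** Parses an HTTP date header value; returns 0 when it is missing or malformed. */
    static long toEpochMillis(String headerValue) {
        if (headerValue == null) {
            return 0L;
        }
        try {
            return ZonedDateTime
                    .parse(headerValue.trim(), DateTimeFormatter.RFC_1123_DATE_TIME)
                    .toInstant()
                    .toEpochMilli();
        } catch (DateTimeParseException e) {
            return 0L;
        }
    }

    public static void main(String[] args) {
        // Value copied from the Last-Modified line of the sample headers
        long lastModified = toEpochMillis("Mon, 23 Jan 2017 13:24:18 GMT");
        System.out.println(Instant.ofEpochMilli(lastModified));
        System.out.println(toEpochMillis(null));
    }
}
```

Because 0 doubles as "no value", callers should test for it before comparing timestamps.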
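II. Caching
Caching is an eternal topic, and the web browser's cache is a powerful feature in its own right. This section covers how web caching is controlled and the classes Java provides for it
1. Setting headers so that a response can be cached
Generally, HTTP GET responses are cached, and should be, while POST responses should not be. All of this can be adjusted through headers:
- Expires (HTTP 1.0): the resource may be cached until the specified time
- Cache-Control (HTTP 1.1): fine-grained cache control. If both this header and Expires are present, Cache-Control takes precedence. Multiple Cache-Control directives are allowed:
  - max-age=[seconds]: the number of seconds from now until the cached entry expires
  - s-maxage=[seconds]: the number of seconds from now until the cached entry expires in a shared cache. Private caches may keep the entry longer
  - public: an authenticated response may be cached. Otherwise authenticated responses are not cached
  - private: only single-user caches may store the response; shared caches should not
  - no-cache: the entry may still be cached, but the client must revalidate it with an ETag or Last-Modified header on every access
  - no-store: never cache, no matter what
- Last-Modified: the date of the last modification. The client can send a HEAD request to check this date and perform the real GET only when the date of its cached copy is earlier than this value
- ETag: a unique identifier for the resource. A HEAD request fetches the server's current ETag; only when it differs from the local ETag is the cache stale, and only then is a GET issued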
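As an alternative to the separate HEAD round trip described above, the cached Last-Modified and ETag values can be attached to the GET itself as a conditional request. The sketch below only prepares such a request and never sends it; ConditionalRequest and prepare are hypothetical names, the URL is a placeholder, and the ETag and timestamp are taken from the sample headers shown earlier.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLConnection;

final class ConditionalRequest {

    /** Prepares (but does not send) a request that revalidates a cached copy. */
    static URLConnection prepare(String url, String cachedEtag, long cachedLastModified) {
        try {
            URLConnection conn = new URL(url).openConnection();
            // The HTTP handler turns this into an If-Modified-Since header
            conn.setIfModifiedSince(cachedLastModified);
            if (cachedEtag != null) {
                conn.setRequestProperty("If-None-Match", cachedEtag);
            }
            return conn;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // 1485177858000L is 2017-01-23T13:24:18Z in epoch milliseconds
        URLConnection conn =
                prepare("http://example.com/", "\"58860402-98b\"", 1485177858000L);
        System.out.println(conn.getRequestProperty("If-None-Match"));
        System.out.println(conn.getIfModifiedSince());
    }
}
```

When such a request is actually sent, a server that considers the cached copy current answers with status 304 and no body, which HttpURLConnection.getResponseCode() would report.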
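2. Web caching in Java
By default, Java does not cache anything when you request a resource through URL. To add caching for web requests, you have to implement a few classes yourself:
ResponseCache // the cache itself; installed as the default with ResponseCache.setDefault
CacheRequest // the channel through which a response is written into the cache
CacheResponse // a stored response handed back from the cache
Below is a simple implementation. It is a little long but not difficult, and it includes a class that parses the Cache-Control header, so it makes a good starting example: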
import java.io.*;
import java.net.*;
import java.util.*;
import java.util.concurrent.ConcurrentHashMap;
public class NetworkMain {
public static class CacheControl {
private Date maxAge = null;
private Date sMaxAge = null;
private boolean mustRevalidate = false;
private boolean noCache = false;
private boolean noStore = false;
private boolean proxyRevalidate = false;
private boolean publicCache = false;
private boolean privateCache = false;
public CacheControl(String s) {
if (s == null) {
return; // default policy
}
// getHeaderField("Cache-Control") returns only the value, so there is no "Cache-Control:" prefix to split off
String value = s.trim();
String[] components = value.split(",");
Date now = new Date();
for (String component : components) {
try {
component = component.trim().toLowerCase(Locale.US);
if (component.startsWith("max-age=")) {
int secondsInTheFuture = Integer.parseInt(component.substring(8));
maxAge = new Date(now.getTime() + 1000L * secondsInTheFuture);
} else if (component.startsWith("s-maxage=")) {
// "s-maxage=" is nine characters long
int secondsInTheFuture = Integer.parseInt(component.substring(9));
sMaxAge = new Date(now.getTime() + 1000L * secondsInTheFuture);
} else if (component.equals("must-revalidate")) {
mustRevalidate = true;
} else if (component.equals("proxy-revalidate")) {
proxyRevalidate = true;
} else if (component.equals("no-store")) {
noStore = true;
} else if (component.equals("no-cache")) {
noCache = true;
} else if (component.equals("public")) {
publicCache = true;
} else if (component.equals("private")) {
privateCache = true;
}
} catch (RuntimeException ex) {
continue;
}
}
}
public Date getMaxAge() {
return maxAge;
}
public Date getSharedMaxAge() {
return sMaxAge;
}
public boolean mustRevalidate() {
return mustRevalidate;
}
public boolean proxyRevalidate() {
return proxyRevalidate;
}
public boolean noStore() {
return noStore;
}
public boolean noCache() {
return noCache;
}
public boolean publicCache() {
return publicCache;
}
public boolean privateCache() {
return privateCache;
}
}
public static class SimpleCacheRequest extends CacheRequest {
private ByteArrayOutputStream out = new ByteArrayOutputStream();
@Override
public OutputStream getBody() throws IOException {
return out;
}
@Override
public void abort() {
out.reset();
}
public byte[] getData() {
if (out.size() == 0) {
return null;
} else {
return out.toByteArray();
}
}
}
public static class SimleCacheResponse extends CacheResponse {
private final Map<String, List<String>> headers;
private final SimpleCacheRequest request;
private final Date expires;
private final CacheControl control;
public SimleCacheResponse(SimpleCacheRequest request, URLConnection uc, CacheControl control) throws IOException {
this.request = request;
this.control = control;
this.expires = new Date(uc.getExpiration());
this.headers = Collections.unmodifiableMap(uc.getHeaderFields());
}
@Override
public InputStream getBody() {
return new ByteArrayInputStream(request.getData());
}
@Override
public Map<String, List<String>> getHeaders()
throws IOException {
return headers;
}
public CacheControl getControl() {
return control;
}
public boolean isExpired() {
Date now = new Date();
Date maxAge = control.getMaxAge();
if (maxAge != null) {
// Cache-Control max-age takes precedence over Expires
return maxAge.before(now);
}
// getExpiration() returned 0 if there was no Expires header
return expires.getTime() != 0 && expires.before(now);
}
}
public static class MemoryCache extends ResponseCache {
private final Map<URI, SimleCacheResponse> responses
= new ConcurrentHashMap<URI, SimleCacheResponse>();
private final int maxEntries;
public MemoryCache() {
this(100);
}
public MemoryCache(int maxEntries) {
this.maxEntries = maxEntries;
}
@Override
public CacheRequest put(URI uri, URLConnection conn)
throws IOException {
if (responses.size() >= maxEntries) return null;
CacheControl control = new CacheControl(conn.getHeaderField("Cache-Control"));
if (control.noStore()) {
return null;
} else if (!(conn instanceof HttpURLConnection) || !"GET".equals(((HttpURLConnection) conn).getRequestMethod())) {
// only cache GET
return null;
}
SimpleCacheRequest request = new SimpleCacheRequest();
SimleCacheResponse response = new SimleCacheResponse(request, conn, control);
responses.put(uri, response);
return request;
}
@Override
public CacheResponse get(URI uri, String requestMethod,
Map<String, List<String>> requestHeaders)
throws IOException {
if ("GET".equals(requestMethod)) {
SimleCacheResponse response = responses.get(uri);
// check expiration date
if (response != null && response.isExpired()) {
responses.remove(uri);
response = null;
}
return response;
} else {
return null;
}
}
}
public static void main(String[] args) {
ResponseCache.setDefault(new MemoryCache());
try {
URL url = new URL("https://www.baidu.com");
URLConnection urlConnection = url.openConnection();
Map<String, List<String>> headerFields = urlConnection.getHeaderFields();
for (Map.Entry<String, List<String>> entry : headerFields.entrySet()) {
System.out.println(entry.getKey() + ":" + entry.getValue().toString());
}
} catch (IOException e) {
e.printStackTrace();
}
}
}
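III. Some connection configuration options
URLConnection has seven protected fields that define how the client makes its request to the server. The JDK source documents them well and the English is easy to read, so I will not add much: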
/**
* The URL represents the remote object on the World Wide Web to
* which this connection is opened.
* <p>
* The value of this field can be accessed by the
* {@code getURL} method.
* <p>
* The default value of this variable is the value of the URL
* argument in the {@code URLConnection} constructor.
*
* @see java.net.URLConnection#getURL()
* @see java.net.URLConnection#url
*/
protected URL url;
/**
* This variable is set by the {@code setDoInput} method. Its
* value is returned by the {@code getDoInput} method.
* <p>
* A URL connection can be used for input and/or output. Setting the
* {@code doInput} flag to {@code true} indicates that
* the application intends to read data from the URL connection.
* <p>
* The default value of this field is {@code true}.
*
* @see java.net.URLConnection#getDoInput()
* @see java.net.URLConnection#setDoInput(boolean)
*/
protected boolean doInput = true;
/**
* This variable is set by the {@code setDoOutput} method. Its
* value is returned by the {@code getDoOutput} method.
* <p>
* A URL connection can be used for input and/or output. Setting the
* {@code doOutput} flag to {@code true} indicates
* that the application intends to write data to the URL connection.
* <p>
* The default value of this field is {@code false}.
*
* @see java.net.URLConnection#getDoOutput()
* @see java.net.URLConnection#setDoOutput(boolean)
*/
protected boolean doOutput = false;
/**
* If {@code true}, this {@code URL} is being examined in
* a context in which it makes sense to allow user interactions such
* as popping up an authentication dialog. If {@code false},
* then no user interaction is allowed.
* <p>
* The value of this field can be set by the
* {@code setAllowUserInteraction} method.
* Its value is returned by the
* {@code getAllowUserInteraction} method.
* Its default value is the value of the argument in the last invocation
* of the {@code setDefaultAllowUserInteraction} method.
*
* @see java.net.URLConnection#getAllowUserInteraction()
* @see java.net.URLConnection#setAllowUserInteraction(boolean)
* @see java.net.URLConnection#setDefaultAllowUserInteraction(boolean)
*/
protected boolean allowUserInteraction = defaultAllowUserInteraction;
/**
* If {@code true}, the protocol is allowed to use caching
* whenever it can. If {@code false}, the protocol must always
* try to get a fresh copy of the object.
* <p>
* This field is set by the {@code setUseCaches} method. Its
* value is returned by the {@code getUseCaches} method.
* <p>
* Its default value is the value given in the last invocation of the
* {@code setDefaultUseCaches} method.
*
* @see java.net.URLConnection#setUseCaches(boolean)
* @see java.net.URLConnection#getUseCaches()
* @see java.net.URLConnection#setDefaultUseCaches(boolean)
*/
protected boolean useCaches = defaultUseCaches;
/**
* Some protocols support skipping the fetching of the object unless
* the object has been modified more recently than a certain time.
* <p>
* A nonzero value gives a time as the number of milliseconds since
* January 1, 1970, GMT. The object is fetched only if it has been
* modified more recently than that time.
* <p>
* This variable is set by the {@code setIfModifiedSince}
* method. Its value is returned by the
* {@code getIfModifiedSince} method.
* <p>
* The default value of this field is {@code 0}, indicating
* that the fetching must always occur.
*
* @see java.net.URLConnection#getIfModifiedSince()
* @see java.net.URLConnection#setIfModifiedSince(long)
*/
protected long ifModifiedSince = 0;
/**
* If {@code false}, this connection object has not created a
* communications link to the specified URL. If {@code true},
* the communications link has been established.
*/
protected boolean connected = false;
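The class has matching set and get methods. The setters have to be called before the connection is actually opened: openConnection() alone does not connect, but once the connection is established (connected is true), calling a setter generally throws IllegalStateException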
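The rule can be checked locally with a file: URL. This is a sketch under the assumption that the JDK's file: handler is used; ConfigureBeforeConnect and rejectsLateConfiguration are hypothetical names.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URLConnection;
import java.nio.file.Files;
import java.nio.file.Path;

final class ConfigureBeforeConnect {

    /** Returns true if changing a setting after connect() is rejected. */
    static boolean rejectsLateConfiguration() {
        try {
            Path file = Files.createTempFile("urlconnection", ".txt");
            try {
                URLConnection conn = file.toUri().toURL().openConnection();
                conn.setUseCaches(false);   // fine: not connected yet
                conn.connect();             // the connection is now open
                try {
                    conn.setDoOutput(true); // too late
                    return false;
                } catch (IllegalStateException alreadyConnected) {
                    return true;
                } finally {
                    // Release the file handle that connect() opened
                    conn.getInputStream().close();
                }
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(rejectsLateConfiguration());
    }
}
```

In practice this means all configuration belongs between openConnection() and the first call that reads headers or streams.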
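IV. Writing data to the server
This part has two pieces: writing headers and writing the body
1. Setting request headers
Setting headers here is different from what came before. Earlier we read the headers of the response coming back from the server; here we add headers to the request that is about to be sent to the server. The main methods are:
public void setRequestProperty(String key, String value);// sets the value for a key, replacing any existing value; several values can be given as a comma-separated string
public void addRequestProperty(String key, String value);// adds another value for a key without replacing the existing ones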
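The difference between the two methods shows up when the same key is used more than once. The sketch below never connects, so nothing is sent; RequestHeaders and acceptValues are hypothetical names and the URL is a placeholder.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLConnection;
import java.util.List;

final class RequestHeaders {

    /** Builds request headers on an unconnected connection and returns the Accept values. */
    static List<String> acceptValues() {
        try {
            URLConnection conn = new URL("http://example.com/").openConnection();
            conn.setRequestProperty("Accept", "text/html");
            conn.setRequestProperty("Accept", "application/json"); // replaces text/html
            conn.addRequestProperty("Accept", "text/plain");       // added alongside
            // Must be read before connecting; afterwards it throws IllegalStateException
            return conn.getRequestProperties().get("Accept");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(acceptValues());
    }
}
```

The order of the values in the returned list is an implementation detail, so code should not depend on it.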
Interestingly, the source of setRequestProperty is not hard to follow. It is worth a look, if only to get more comfortable reading JDK source:
public abstract class URLConnection {
...
public void setRequestProperty(String key, String value) {
if (connected)
throw new IllegalStateException("Already connected");
if (key == null)
throw new NullPointerException ("key is null");
if (requests == null)
requests = new MessageHeader();
requests.set(key, value);
}
...
}
public class MessageHeader {
private String[] keys;
private String[] values;
private int nkeys;
public synchronized void set(String var1, String var2) {
int var3 = this.nkeys;
do {
--var3;
if (var3 < 0) {
this.add(var1, var2);
return;
}
} while(!var1.equalsIgnoreCase(this.keys[var3]));
this.values[var3] = var2;
}
public synchronized void add(String var1, String var2) {
this.grow();
this.keys[this.nkeys] = var1;
this.values[this.nkeys] = var2;
++this.nkeys;
}
private void grow() {
if (this.keys == null || this.nkeys >= this.keys.length) {
String[] var1 = new String[this.nkeys + 4];
String[] var2 = new String[this.nkeys + 4];
if (this.keys != null) {
// low-level JDK code uses System.arraycopy like this because it is fast
System.arraycopy(this.keys, 0, var1, 0, this.nkeys);
}
if (this.values != null) {
System.arraycopy(this.values, 0, var2, 0, this.nkeys);
}
this.keys = var1;
this.values = var2;
}
}
}
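2. Writing data with POST
Whether data goes out as GET or POST is something URLConnection more or less decides automatically:
- The default is GET
- If doOutput is set to true and data is written through the OutputStream, the request becomes a POST and the related header (such as Content-Type) is set automatically
- There are, of course, other methods that set the request method explicitly
Here is a small example that submits a POST request: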
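public static void main(String[] args) {
    try {
        URL url = new URL("https://www.baidu.com");
        URLConnection urlConnection = url.openConnection();
        // Enabling output is what turns the HTTP request into a POST
        urlConnection.setDoOutput(true);
        try (BufferedWriter bw = new BufferedWriter(
                new OutputStreamWriter(urlConnection.getOutputStream(), "8859_1"))) {
            bw.write("lalalalallala");
            bw.flush();
        }
        // With the default buffered mode the request is completed only when the response is requested
        try (InputStream in = urlConnection.getInputStream()) {
            while (in.read() != -1) {
                // drain the response
            }
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
}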
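To see the automatic switch to POST without depending on an external site, the request can be sent to a throwaway server on the loopback interface. This is a sketch only: LocalPostDemo and methodSeenByServer are hypothetical names, and it assumes the JDK's built-in com.sun.net.httpserver package is available and that local sockets are permitted.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

final class LocalPostDemo {

    /** Sends a body through a plain URLConnection and returns the method the server saw. */
    static String methodSeenByServer(String body) {
        HttpServer server = null;
        try {
            // Throwaway server on a free loopback port; it replies with the request method
            server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
            server.createContext("/", exchange -> {
                InputStream requestBody = exchange.getRequestBody();
                while (requestBody.read() != -1) {
                    // drain the request body
                }
                byte[] reply = exchange.getRequestMethod().getBytes(StandardCharsets.US_ASCII);
                exchange.sendResponseHeaders(200, reply.length);
                try (OutputStream out = exchange.getResponseBody()) {
                    out.write(reply);
                }
            });
            server.start();

            URL url = new URL("http://127.0.0.1:" + server.getAddress().getPort() + "/");
            URLConnection conn = url.openConnection(Proxy.NO_PROXY);
            conn.setDoOutput(true); // setRequestMethod is never called
            try (OutputStream out = conn.getOutputStream()) {
                out.write(body.getBytes(StandardCharsets.ISO_8859_1));
            }
            // Asking for the response is what completes the exchange
            ByteArrayOutputStream response = new ByteArrayOutputStream();
            try (InputStream in = conn.getInputStream()) {
                int b;
                while ((b = in.read()) != -1) {
                    response.write(b);
                }
            }
            return new String(response.toByteArray(), StandardCharsets.US_ASCII);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            if (server != null) {
                server.stop(0);
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(methodSeenByServer("lalalalallala"));
    }
}
```

The server should report POST even though the client code never names a method, which matches the behaviour described in the list above.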
V. HttpURLConnection
If the URL uses the http protocol, openConnection() returns an HttpURLConnection, which is an abstract subclass of URLConnection. Use public void setRequestMethod(String method) throws ProtocolException to choose the HTTP request method. The commonly used methods are:
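- GET
- POST
- HEAD: often used to fetch the last-modified time in order to evict stale cache entries
- PUT
- DELETE
- OPTIONS: used in cross-origin requests (important); asks the server which HTTP methods it supports
- TRACE: shows what the proxies between client and server changed in the request; useful, for example, when checking an nginx configuration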
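A small sketch of choosing the method explicitly. The request is only configured, never sent; RequestMethodDemo and accepts are hypothetical names and the URL is a placeholder. Method names are case-sensitive, and names outside the supported set are rejected with ProtocolException.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.ProtocolException;
import java.net.URL;

final class RequestMethodDemo {

    /** Returns true if HttpURLConnection accepts the given method name. */
    static boolean accepts(String method) {
        try {
            HttpURLConnection conn =
                    (HttpURLConnection) new URL("http://example.com/").openConnection();
            try {
                conn.setRequestMethod(method);
                return method.equals(conn.getRequestMethod());
            } catch (ProtocolException unsupported) {
                return false;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(accepts("HEAD"));
        System.out.println(accepts("OPTIONS"));
        System.out.println(accepts("PATCH"));
    }
}
```

The cast is safe here because an http URL yields an HttpURLConnection; for other protocols it would fail with ClassCastException.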