Back to Basics Series: Java Network Programming: Using the Network Connection Components (URL and URI)

Latest recommended article published on 2022-11-20 15:06:34

weixin_34342207


Original link: https://my.oschina.net/UBW/blog/2050374


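I. Definitions of URI and URL and How They Differ

URI: a Uniform Resource Identifier is a string that identifies a resource using a particular syntax.

URL: a Uniform Resource Locator is a URI that, besides identifying a resource, also gives a specific network location for it, which a client can use to retrieve the resource.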

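1. Structure of a URI

a. The basic format is:

scheme:scheme-specific-part

b. Some common schemes:

data: Base64-encoded data included directly in the link
file: a file on a local disk
ftp: an FTP server
http: a World Wide Web server that uses the Hypertext Transfer Protocol
mailto: an email address
magnet: a resource available for download over a peer-to-peer network (for example, BitTorrent)
telnet: a Telnet server
urn: a Uniform Resource Name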

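2. URI Encoding

Encoding is the centerpiece of this section. Encoding problems are one of the most common reasons a link misbehaves, so it deserves close attention.

a. Characters allowed in the scheme:

letters
digits
plus sign (+)
period (.)
hyphen (-)

b. Characters allowed in the scheme-specific part:

ASCII letters and digits
the punctuation characters -, _, ., ! and ~
delimiters: /, ?, & and =
every other character, including all non-ASCII characters, must be percent-escaped (% followed by the hexadecimal value of each byte)
/ and @ must also be percent-escaped when they are not being used as delimiters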
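The escaping rules above can be seen in action with java.net.URI. The following is a minimal sketch, not a complete encoder: it relies on the fact that the multi-argument URI constructors percent-escape characters that are illegal in each part. The class name UriEscapingDemo, the host www.example.com, and the sample path are assumptions made for illustration.

```java
import java.net.URI;
import java.net.URISyntaxException;

class UriEscapingDemo {

    // Builds a hierarchical URI from unescaped parts. The multi-argument
    // constructor escapes characters that are not legal in each part.
    static URI build(String authority, String path, String query) {
        try {
            return new URI("https", authority, path, query, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical host, path, and query used only for illustration.
        URI uri = build("www.example.com", "/my docs/read me.txt", "q=a b");
        System.out.println(uri.getRawPath());    // escaped form of the path
        System.out.println(uri.getPath());       // decoded form of the path
        System.out.println(uri.toASCIIString()); // whole URI, ASCII only
    }
}
```

The raw getters return the escaped text that would go on the wire, while the plain getters return the decoded text.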

3. Structure of a URL

protocol://userInfo@host:port/path?query#fragment

In order, the parts listed above are:

protocol
user info (userInfo)
host
port
path
query
fragment

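4. Understanding Relative URLs

There are two forms of relative URL to understand. In both examples, the page containing the link has the absolute URL https://my.oschina.net/UBW/blog/2046461.

a. The first form, a link relative to the current document:

<a href="javafaq.html">

This resolves to: https://my.oschina.net/UBW/blog/javafaq.html

b. The second form, a link relative to the site root:

<a href="/projects/ipv6/">

This resolves to: https://my.oschina.net/projects/ipv6/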
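The two resolutions above can be reproduced without a browser. This is a small sketch that uses URI.resolve, one of several ways to do it; the class name RelativeResolveDemo and the helper resolve are made up for the example.

```java
import java.net.URI;

class RelativeResolveDemo {

    // Resolves a relative reference against a base address, the way a
    // browser resolves an href against the address of the current page.
    static String resolve(String base, String relative) {
        return URI.create(base).resolve(relative).toString();
    }

    public static void main(String[] args) {
        String base = "https://my.oschina.net/UBW/blog/2046461";
        // Relative to the current document: the last path segment is replaced.
        System.out.println(resolve(base, "javafaq.html"));
        // Relative to the site root: the whole path is replaced.
        System.out.println(resolve(base, "/projects/ipv6/"));
    }
}
```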

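II. The URL Class

The java.net.URL class is an abstraction of a Uniform Resource Locator:

It is a final class that extends Object directly, and it is thread-safe
Internally it uses the strategy design pattern, with a different protocol handler (strategy) for each protocol
It can both identify a resource and retrieve it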

1. Creating a URL

public URL(String spec) throws MalformedURLException // ①
public URL(String protocol, String hostname, String file) throws MalformedURLException // ②
public URL(String protocol, String host, int port, String file) throws MalformedURLException // ③
public URL(URL base, String relative) throws MalformedURLException // ④

a. Points common to all four constructors:

If the JVM does not support the protocol, the constructor throws a MalformedURLException
Apart from confirming that it recognizes the protocol, Java does little to check that the URL it builds is correct or well-formed

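b. A code snippet that checks whether a protocol is supported:

// Requires: import java.net.MalformedURLException; import java.net.URL;
public static void testProtocol(String spec) {
    try {
        // The constructor throws if no handler exists for the protocol
        URL url = new URL(spec);
        System.out.println("supported");
    } catch (MalformedURLException e) {
        System.out.println("not supported");
    }
}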

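c. Constructors ② and ③

Constructor ② takes no port, so the port is set to -1, which means "use the default port for the protocol", for example 80 for http
The file argument holds the path, the file name, and an optional fragment identifier
file should be a full path beginning with /. Leaving out the leading slash is an easy mistake to make and a hard one to spot, because the constructor does not reliably reject such a path with a MalformedURLException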
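To make the port behaviour of constructors ② and ③ concrete, here is a short sketch. The class name UrlPortDemo, the host www.example.com, and the file path are placeholders. Note also that recent JDKs (20 and later) mark the URL constructors as deprecated in favour of URI.toURL(); they still work, but the compiler prints a warning.

```java
import java.net.MalformedURLException;
import java.net.URL;

class UrlPortDemo {

    // Constructor ②: no port argument, so getPort() reports -1.
    static URL withoutPort(String host, String file) {
        try {
            return new URL("http", host, file);
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Constructor ③: the port is given explicitly.
    static URL withPort(String host, int port, String file) {
        try {
            return new URL("http", host, port, file);
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // Placeholder host and path; the fragment is part of the file argument.
        URL a = withoutPort("www.example.com", "/docs/index.html#intro");
        URL b = withPort("www.example.com", 8080, "/docs/index.html#intro");
        System.out.println(a + " port=" + a.getPort() + " default=" + a.getDefaultPort());
        System.out.println(b + " port=" + b.getPort() + " default=" + b.getDefaultPort());
        System.out.println("fragment=" + a.getRef());
    }
}
```

getPort() reports only what was written into the URL, so code that needs a usable port number typically falls back to getDefaultPort() when getPort() returns -1.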

d. Using constructor ④

try {
    URL u1 = new URL("https://my.oschina.net/UBW/blog/2046461");
    // Resolve the relative URL against u1
    URL u2 = new URL(u1, "mailinglists.html");
    /**
    u2 now wraps the URL https://my.oschina.net/UBW/blog/mailinglists.html
    */
} catch (MalformedURLException e) {
    System.out.println(e);
}

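2. Retrieving Data from a URL

This subsection is the most important one. It covers the main methods for retrieving the data a URL points to, which are the foundation for building servers later on.

public final InputStream openStream() throws java.io.IOException // ①
public URLConnection openConnection() throws java.io.IOException // ②
public final Object getContent() throws java.io.IOException // ③
public final Object getContent(Class[] classes) throws java.io.IOException // ④

a. Method ①: simply retrieves the raw content, without any HTTP headers or other protocol information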

import java.io.IOException;
import java.io.InputStream;
import java.net.URL;

public class NetworkMain {

    public static void main(String[] args) {
        try {
            URL u = new URL("http://www.baidu.com");
            // try-with-resources closes the stream even if reading fails
            try (InputStream inputStream = u.openStream()) {
                int c;
                while ((c = inputStream.read()) != -1) {
                    // Casting a byte to char is only correct for single-byte text
                    System.out.print((char) c);
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

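This is the simplest approach and also the least useful. It returns only the raw bytes of the resource, and the loop above casts each byte to a char, which is only correct for single-byte text and garbles binary content or multi-byte encodings. The headers are not available either, which makes it hard to work out the encoding of the content.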
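When the encoding is known in advance, the stream from openStream() can be wrapped in a Reader so that multi-byte characters are decoded correctly. The sketch below assumes UTF-8 and, so that it runs without a network connection, uses a temporary local file behind a file: URL in place of a remote resource. The class and method names are invented for the example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class OpenStreamDemo {

    // Reads the whole resource as text, decoding the bytes as UTF-8.
    // The charset is an assumption: openStream() exposes no header to consult.
    static String readText(URL url) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (InputStream in = url.openStream();
             BufferedReader reader = new BufferedReader(
                     new InputStreamReader(in, StandardCharsets.UTF_8))) {
            int c;
            while ((c = reader.read()) != -1) {
                sb.append((char) c);
            }
        }
        return sb.toString();
    }

    // Writes the text to a temporary file and reads it back through a file: URL.
    static String roundTrip(String text) {
        try {
            Path tmp = Files.createTempFile("openstream-demo", ".txt");
            try {
                Files.write(tmp, text.getBytes(StandardCharsets.UTF_8));
                return readText(tmp.toUri().toURL());
            } finally {
                Files.delete(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // \u00e9 is a two-byte character in UTF-8; a byte-to-char cast would garble it.
        System.out.println(roundTrip("h\u00e9llo, URL"));
    }
}
```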

b. Method ②: in addition to the raw content, gives access to all the metadata and headers defined by the protocol

import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.util.List;
import java.util.Map;

public class NetworkMain {

    public static void main(String[] args) {
        try {
            URL u = new URL("http://www.baidu.com");
            URLConnection urlConnection = u.openConnection();
            Map<String, List<String>> headerFields = urlConnection.getHeaderFields();
            // This is how the headers are read
            for (Map.Entry<String, List<String>> entry : headerFields.entrySet()) {
                System.out.println(entry.getKey() + ":" + entry.getValue().toString());
            }
            // The input stream carries the content itself
            try (InputStream inputStream = urlConnection.getInputStream()) {
                int c;
                while ((c = inputStream.read()) != -1) {
                    System.out.print((char) c);
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
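One practical use of the headers is working out the character encoding before decoding the body. The following is a rough sketch of pulling the charset parameter out of a Content-Type value. The helper charsetOf and the fallback policy are assumptions for illustration; with a real connection the input string would come from urlConnection.getContentType().

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ContentTypeCharsetDemo {

    // Extracts the charset parameter from a Content-Type value such as
    // "text/html; charset=GBK". Returns the fallback when the header is
    // missing, has no charset parameter, or names an unknown charset.
    static Charset charsetOf(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            // Case-insensitive check for the "charset=" prefix
            if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                String name = p.substring(8).trim().replace("\"", "");
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Illegal or unsupported charset name
                    return fallback;
                }
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("text/html; charset=ISO-8859-1", StandardCharsets.UTF_8));
        System.out.println(charsetOf("image/png", StandardCharsets.UTF_8));
    }
}
```

The resulting Charset can then be passed to an InputStreamReader wrapped around urlConnection.getInputStream().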

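c. Method ③: returns an Object that can be downcast to different types depending on the resource

For an image, for example, java.awt.image.ImageProducer
For audio, for example, sun.applet.AppletAudioClip
Or the familiar java.io.InputStream
This method reads the Content-type header field to identify the type. If the type is not recognized, it returns a java.io.InputStream instead

d. Method ④: like ③, but it returns the content as the first of the requested classes that it can provide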

URL u = new URL("http://www.nwu.org");
Class<?>[] types = new Class[3];
types[0] = String.class;
types[1] = Reader.class;
types[2] = InputStream.class;
Object o = u.getContent(types);

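3. Getting the Parts of a URL

Here is a complete URL:

http://admin:123456@www.ibiblio.org:8080/java/books/jnp/index.html?isbn=123423423#toc

Scheme (http)
Authority: user info + host name + port (admin:123456@www.ibiblio.org:8080)
Path (/java/books/jnp/index.html)
Query string (isbn=123423423)
Fragment identifier (toc)

public String getProtocol(); // the protocol (for example: http, https)
public String getHost(); // the host name
public int getPort(); // the port, or -1 if none was specified
public int getDefaultPort(); // the protocol's default port, or -1 if the protocol has none; an explicit port in the URL does not change this value
public String getFile(); // the path plus the query string, if any (in the URL above: /java/books/jnp/index.html?isbn=123423423)
public String getPath(); // the path without the query string (in the URL above: /java/books/jnp/index.html)
public String getRef(); // the fragment (toc in the example above)
public String getQuery(); // the query string (isbn=123423423 in the example above)
public String getUserInfo(); // the user info (admin:123456 in the example above)
public String getAuthority(); // the authority (admin:123456@www.ibiblio.org:8080 in the example above)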
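The getters can be checked against the sample address, assuming its query string is introduced by ? in the usual way. This is only an illustrative sketch; the class name UrlPartsDemo and the helper parse are not part of the original article.

```java
import java.net.MalformedURLException;
import java.net.URL;

class UrlPartsDemo {

    // Wraps the checked exception so the helper is easy to call.
    static URL parse(String spec) {
        try {
            return new URL(spec);
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        URL u = parse("http://admin:123456@www.ibiblio.org:8080"
                + "/java/books/jnp/index.html?isbn=123423423#toc");
        System.out.println("protocol    = " + u.getProtocol());
        System.out.println("userInfo    = " + u.getUserInfo());
        System.out.println("host        = " + u.getHost());
        System.out.println("port        = " + u.getPort());
        System.out.println("defaultPort = " + u.getDefaultPort());
        System.out.println("authority   = " + u.getAuthority());
        System.out.println("path        = " + u.getPath());
        System.out.println("file        = " + u.getFile()); // path plus query
        System.out.println("query       = " + u.getQuery());
        System.out.println("ref         = " + u.getRef());
    }
}
```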

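4. Comparing URLs for Equality

equals() can block because it may perform a DNS lookup; two URLs whose hosts resolve to the same address are treated as having the same host
Two URLs are equal if and only if they refer to the same resource on the same host, port, and path, and also have the same fragment and query
URL does not implement the Comparable interface
http://www.url.com:80 and http://www.url.com differ only in the explicit default port; the sameFile() source below substitutes the protocol's default port for a missing one, so the ports themselves compare as equal
http://www.url.com/index.html is not equal to http://www.url.com
sameFile() does not take the fragment identifier into account
equals() compares the literal parts first (fragment, protocol, file, and port) and resolves the host names through DNS only when all of those match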

/***
 * Walkthrough of the equals source: nothing difficult, almost all of it is string comparison; only one place uses InetAddress
 */

//java.net.URLStreamHandler
protected boolean equals(URL u1, URL u2) {
    String ref1 = u1.getRef();
    String ref2 = u2.getRef();
    return (ref1 == ref2 || (ref1 != null && ref1.equals(ref2))) &&
        sameFile(u1, u2);
}

protected boolean sameFile(URL u1, URL u2) {
    // Compare the protocols.
    if (!((u1.getProtocol() == u2.getProtocol()) ||
          (u1.getProtocol() != null &&
           u1.getProtocol().equalsIgnoreCase(u2.getProtocol()))))
        return false;

    // Compare the files.
    if (!(u1.getFile() == u2.getFile() ||
          (u1.getFile() != null && u1.getFile().equals(u2.getFile()))))
        return false;

    // Compare the ports.
    int port1, port2;
    port1 = (u1.getPort() != -1) ? u1.getPort() : u1.handler.getDefaultPort();
    port2 = (u2.getPort() != -1) ? u2.getPort() : u2.handler.getDefaultPort();
    if (port1 != port2)
        return false;

    // Compare the hosts.
    // hostsEqual calls InetAddress.getByName() (through getHostAddress), which performs DNS resolution
    if (!hostsEqual(u1, u2))
        return false;

    return true;
}

protected boolean hostsEqual(URL u1, URL u2) {
    InetAddress a1 = getHostAddress(u1);
    InetAddress a2 = getHostAddress(u2);
    // if we have internet address for both, compare them
    if (a1 != null && a2 != null) {
        return a1.equals(a2);
        // else, if both have host names, compare them
    } else if (u1.getHost() != null && u2.getHost() != null)
        return u1.getHost().equalsIgnoreCase(u2.getHost());
    else
        return u1.getHost() == null && u2.getHost() == null;
}

protected synchronized InetAddress getHostAddress(URL u) {
    if (u.hostAddress != null)
        return u.hostAddress;

    String host = u.getHost();
    if (host == null || host.equals("")) {
        return null;
    } else {
        try {
            // The DNS lookup happens here
            u.hostAddress = InetAddress.getByName(host);
        } catch (UnknownHostException ex) {
            return null;
        } catch (SecurityException se) {
            return null;
        }
    }
    return u.hostAddress;
}
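The difference between equals() and sameFile() can be tried without touching the network. The sketch below uses file: URLs with an empty host, for which getHostAddress in the source above returns null before any lookup is attempted. The paths and the class name UrlEqualityDemo are invented for the example.

```java
import java.net.MalformedURLException;
import java.net.URL;

class UrlEqualityDemo {

    // Wraps the checked exception so the helper is easy to call.
    static URL url(String spec) {
        try {
            return new URL(spec);
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical local paths; the files do not need to exist.
        URL a = url("file:/tmp/report.html#summary");
        URL b = url("file:/tmp/report.html#details");
        // equals() compares the fragments first, and these differ
        System.out.println("equals:   " + a.equals(b));
        // sameFile() ignores the fragment and compares everything else
        System.out.println("sameFile: " + a.sameFile(b));
    }
}
```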

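III. The URI Class

Some key points:

The URI class is concerned purely with identifying resources and parsing URIs.
A URI object cannot retrieve the data at the address it identifies
For plain comparison of address strings, prefer URI, because URL's comparison methods may perform a DNS lookup and URI's never do
URI has a toURL() method, and URL likewise has a toURI() method

1. Creating a URI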

public URI(String uri) throws URISyntaxException; // ①
public URI(String scheme, String schemeSpecificPart, String fragment)
    throws URISyntaxException; // ②
public URI(String scheme, String host, String path, String fragment)
    throws URISyntaxException; // ③
public URI(String scheme, String authority, String path, String query, String fragment)
    throws URISyntaxException; // ④
public URI(String scheme, String userInfo, String host, int port, String path,
    String query, String fragment) throws URISyntaxException; // ⑤

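URI does not depend on an underlying protocol handler, so any syntactically correct URI can be created, including ones such as tel: and urn:

If the URI is syntactically wrong, for example because it begins with a colon, a URISyntaxException is thrown
The scheme (the protocol) must consist of ASCII letters, digits, and the three punctuation characters +, -, and ., and it must begin with a letter
The scheme argument may be null, in which case the result is a relative URI
In the constructors that take a path, if the scheme is not null and the path argument does not begin with /, a URISyntaxException is thrown
All of the constructors above validate their arguments and report problems through the checked URISyntaxException. If you are sure the string is a valid URI, you can create it directly with public static URI create(String str), which throws the unchecked IllegalArgumentException instead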
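The rules in this list can be exercised directly. The following is a small sketch; the helper names isValid and acceptsPath, the sample tel: and urn: values, and the host www.example.com are all assumptions chosen for illustration.

```java
import java.net.URI;
import java.net.URISyntaxException;

class UriCreationDemo {

    // Returns true if the single-argument constructor accepts the string.
    static boolean isValid(String s) {
        try {
            new URI(s);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    // Returns true if the (scheme, host, path, fragment) constructor
    // accepts the given scheme and path for a placeholder host.
    static boolean acceptsPath(String scheme, String path) {
        try {
            new URI(scheme, "www.example.com", path, null);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) throws URISyntaxException {
        System.out.println(isValid("tel:+1-800-555-0100"));   // no protocol handler needed
        System.out.println(isValid("urn:isbn:0451450523"));
        System.out.println(isValid(":no-scheme"));            // begins with a colon
        System.out.println(acceptsPath("http", "/index.html"));
        System.out.println(acceptsPath("http", "index.html")); // path lacks the leading /

        // A null scheme yields a relative URI.
        URI relative = new URI(null, null, "docs/index.html", null);
        System.out.println(relative + " absolute=" + relative.isAbsolute());
    }
}
```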

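2. The Parts of a URI

This is where the concepts of opaque and hierarchical URIs come in, and they take some effort to grasp

a. Hierarchical (transparent) versus opaque

Here is the JDK source: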

    /**
     * Tells whether or not this URI is opaque.
     *
     * <p> A URI is opaque if, and only if, it is absolute and its
     * scheme-specific part does not begin with a slash character ('/').
     * An opaque URI has a scheme, a scheme-specific part, and possibly
     * a fragment; all other components are undefined. </p>
     *
     * @return  {@code true} if, and only if, this URI is opaque
     */
    public boolean isOpaque() {
        return path == null;
    }

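To paraphrase the official Javadoc: a URI is opaque if and only if it is absolute and its scheme-specific part does not begin with /. An opaque URI has a scheme, a scheme-specific part, and possibly a fragment. All of its other components are undefined.

A URI that is not opaque is the hierarchical form we use most often; ordinary URLs such as http addresses are hierarchical in this sense
All other absolute URIs are opaque, and isOpaque() returns true for them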

public static void main(String[] args) {
    try {
        URI opaque = new URI("mail:jicheng@163.com");
        System.out.println(opaque.isOpaque());//true
        URI notOpaque = new URI("mail:/jicheng@163.com");
        System.out.println(notOpaque.isOpaque());//false
    } catch (URISyntaxException e) {
        e.printStackTrace();
    }
}

b. Getting the individual parts

// The methods below return decoded strings
public String getAuthority();
public String getFragment();
public String getHost();
public String getPath();
public int getPort();
public String getQuery();
public String getUserInfo();
// The methods below return raw strings that are not decoded, that is, with the % escapes still in place
public String getRawAuthority();
public String getRawFragment();
public String getRawPath();
public String getRawQuery();
public String getRawUserInfo();

For an opaque URI, only getScheme, getSchemeSpecificPart, and getFragment return meaningful values
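The difference between the decoded and raw getters, and the limited set of parts an opaque URI has, can be seen in a short sketch. The describe helper, the host www.example.com, and the mailto address are assumptions for illustration.

```java
import java.net.URI;

class UriPartsDemo {

    // Summarizes the parts that are meaningful for the kind of URI given.
    static String describe(URI uri) {
        if (uri.isOpaque()) {
            // Only scheme, scheme-specific part, and fragment are defined.
            return "opaque: scheme=" + uri.getScheme()
                    + " ssp=" + uri.getSchemeSpecificPart()
                    + " path=" + uri.getPath();
        }
        return "hierarchical: path=" + uri.getPath()
                + " rawPath=" + uri.getRawPath()
                + " query=" + uri.getQuery()
                + " rawQuery=" + uri.getRawQuery();
    }

    public static void main(String[] args) {
        // %20 is an escaped space and %26 an escaped ampersand.
        System.out.println(describe(URI.create("http://www.example.com/a%20b/c?q=x%26y")));
        System.out.println(describe(URI.create("mailto:someone@example.com")));
    }
}
```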

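c. Comparing URIs for Equality

A few points to note:

The scheme and the host are compared without regard to case; the remaining parts are case-sensitive
Percent-escaped characters are not decoded before comparison
hashCode is consistent with equals

The comparison proceeds through the following steps in order:

If the schemes differ, the schemes are compared, without regard to case
If the schemes are equal, a hierarchical (transparent) URI is considered less than an opaque URI with the same scheme
If both URIs are opaque, they are ordered by their scheme-specific parts
If both are opaque and their scheme-specific parts are equal, they are ordered by their fragments
If both URIs are hierarchical, they are ordered by their authority components, with hosts compared without regard to case
If the schemes and authorities are equal, the path is used for ordering
If the paths are also equal, the query strings are compared
If the query strings are also equal, the fragments are compared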

public static void main(String[] args) {
    try {
        URI uri1= new URI("maIL://jicheng:jicheng@wwW.JICHENg.com");
        URI uri2= new URI("mail://jicheng:jicheng@www.jicheng.com");
        System.out.println(uri1.equals(uri2));//true
    } catch (URISyntaxException e) {
        e.printStackTrace();
    }
}
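Because URI implements Comparable, the ordering rules above can be observed by sorting. This is a minimal sketch with made-up schemes (a: and x:) and paths; the helper names sorted and same are also invented.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class UriOrderingDemo {

    // Parses the strings, sorts them with URI.compareTo, and returns the result.
    static List<String> sorted(String... specs) {
        List<URI> uris = new ArrayList<>();
        for (String s : specs) {
            uris.add(URI.create(s));
        }
        Collections.sort(uris);
        List<String> out = new ArrayList<>();
        for (URI u : uris) {
            out.add(u.toString());
        }
        return out;
    }

    // Compares two URI strings with URI.equals; no DNS lookup is involved.
    static boolean same(String a, String b) {
        return URI.create(a).equals(URI.create(b));
    }

    public static void main(String[] args) {
        // Scheme first, then hierarchical before opaque, then path order.
        System.out.println(sorted("x:opaque", "x:/b", "x:/a", "a:zzz"));
        // Escapes are not decoded before comparison, so these two differ.
        System.out.println(same("http://www.example.com/%7Euser",
                                "http://www.example.com/~user"));
        // Scheme and host ignore case.
        System.out.println(same("HTTP://WWW.EXAMPLE.COM/a", "http://www.example.com/a"));
    }
}
```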

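IV. Encoding and Decoding URLs

Encoding and decoding URL strings is an extremely common operation in practice, and it needs particular care in everyday web development.

1. Characters Allowed in a URL

uppercase letters
lowercase letters
digits
the punctuation characters -, _, ., !, ~, *, ' and ,
the characters /, &, ?, @, #, ;, $, +, = and % must be encoded when they appear in a path or query string as data rather than as delimiters

2. URLEncoder

By default this class also encodes the separators /, &, = and :, which is not what we want for a whole URL. Encode only the individual names and values in the query string: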

public static void main(String[] args) {
    String url = "https://www.jicheng.com/search?";
    try {
        url+=URLEncoder.encode("h1","UTF-8");
        url+="=";
        url+=URLEncoder.encode("I/O","UTF-8");
        System.out.println(url);//https://www.jicheng.com/search?h1=I%2FO
    } catch (UnsupportedEncodingException e) {
        e.printStackTrace();
    }
}

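3. URLDecoder

URLDecoder changes only % escapes and plus signs (which become spaces), so an entire URL string can be passed to it without worrying about which parts are affected

public static void main(String[] args) {
    String url = "https://www.jicheng.com/search?h1=I%2FO";
    try {
        String decode = URLDecoder.decode(url, "UTF-8");
        System.out.println(decode); // https://www.jicheng.com/search?h1=I/O
    } catch (UnsupportedEncodingException e) {
        e.printStackTrace();
    }
}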
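Putting the encoder and decoder together, a query string is usually built by encoding each name and value separately and joining them with the literal separators. The sketch below shows one way to do that; the class name QueryCodecDemo, its helpers, and the sample parameters are assumptions for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

class QueryCodecDemo {

    static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to be available on every JVM
            throw new IllegalStateException(e);
        }
    }

    static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    // Builds "name=value&name=value", encoding each piece on its own so
    // that the = and & separators themselves stay unescaped.
    static String buildQuery(Map<String, String> params) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (sb.length() > 0) {
                sb.append('&');
            }
            sb.append(encode(e.getKey())).append('=').append(encode(e.getValue()));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "I/O & more");
        params.put("lang", "C++");
        String query = buildQuery(params);
        System.out.println(query);
        // Decode each value separately after splitting on & and =
        System.out.println(decode("I%2FO+%26+more"));
    }
}
```

Decoding piece by piece matters when a value contains an escaped & or =: decoding the whole query first would turn those back into separators and change how it splits.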

Reposted from: https://my.oschina.net/UBW/blog/2050374