Java Network Programming: Chapter 5, URLs and URIs

WYCPhoenix

Article link: https://blog.csdn.net/sagwsagw/article/details/71189453

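Uniform Resource Identifier (URI): a string that identifies a resource using a particular syntax. A generic URI says what the resource is, but not where it is located or how to retrieve it. Its general form is:

scheme:scheme-specific-part (the scheme-specific part typically contains an authority, a path, and a query)

Common schemes include data, file, ftp, http, mailto, magnet, telnet, and urn.

The scheme-specific part is made up of ASCII letters and digits plus the punctuation characters _, -, ., !, and ~. The characters /, ?, &, and = may appear unescaped when they act as delimiters. Every other character, including all non-ASCII characters, must be escaped: each byte of the character's UTF-8 encoding is written as % followed by two hexadecimal digits.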
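
The escaping rule above can be sketched in a few lines. This is only an illustration: the class name PercentEscapeSketch and the helper percentEscape are made up for this example, and the helper escapes every byte it is given rather than deciding which characters actually need escaping.

```java
import java.nio.charset.StandardCharsets;

public class PercentEscapeSketch {

    // Hypothetical helper: writes each UTF-8 byte of the input as %XX (uppercase hex).
    public static String percentEscape(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            sb.append('%').append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+00E9 (e with acute accent) takes two bytes in UTF-8
        System.out.println(percentEscape("\u00e9")); // %C3%A9
        // A space is a single byte
        System.out.println(percentEscape(" "));      // %20
    }
}
```

A character outside ASCII therefore produces one %XX group per UTF-8 byte, not one per character.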

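A Uniform Resource Locator (URL) is a kind of URI that, besides identifying a resource, also gives a specific network location for it. A typical URL has the following form:

protocol://userInfo@host:port/path?query#fragment

protocol: the scheme of the URI; it cannot be urn.

userInfo: login information for the server; optional.

host: the host that provides the resource, given either as a hostname or as an IP address.

port: optional; it can be omitted when the service runs on the protocol's default port.

userInfo, host, and port together make up the authority of the URI.

path: the path of the resource on the server. If the path maps to a file system, it is normally relative to the server's document root, not to the root of the file system.

query: additional arguments passed to the server, used mostly in http URLs.

fragment: refers to a particular part of the remote resource. For an HTML document, for example, it points to an anchor in the document, that is, the element marked with the matching id. Strictly speaking, a string that includes a fragment is a URL reference, not a URL.

Relative URLs: a URL can inherit the protocol, hostname, and path of its parent document (the document in which it appears). A URL that is incomplete because it inherits part of this information is called a relative URL. The missing parts are assumed to be the same as the corresponding parts of the containing document's URL, so the path is interpreted relative to the current document. Note that a relative URL that begins with / is relative to the document root instead.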

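Advantages of relative URLs

1. They save typing.

2. They allow a single document tree to be served over multiple protocols.

3. They allow an entire document tree to be moved from one site to another without breaking the internal links.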
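
A small sketch of how relative URLs are resolved against a base document. It uses java.net.URI.resolve() rather than the URL class, which is a choice made for this example; the hosts www.example.com and mirror.example.org and the class name RelativeUrlSketch are placeholders.

```java
import java.net.URI;

public class RelativeUrlSketch {

    // Resolves a relative reference against the URL of the document that contains it.
    public static String resolve(String base, String relative) {
        return URI.create(base).resolve(relative).toString();
    }

    public static void main(String[] args) {
        String base = "http://www.example.com/docs/guide/index.html";
        // Relative to the current document
        System.out.println(resolve(base, "faq.html"));         // http://www.example.com/docs/guide/faq.html
        // A leading / makes the reference relative to the document root
        System.out.println(resolve(base, "/images/logo.png")); // http://www.example.com/images/logo.png
        // .. climbs one directory
        System.out.println(resolve(base, "../api/"));          // http://www.example.com/docs/api/
        // The same link keeps working after the tree moves to another site
        String mirror = "http://mirror.example.org/docs/guide/index.html";
        System.out.println(resolve(mirror, "faq.html"));       // http://mirror.example.org/docs/guide/faq.html
    }
}
```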

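The java.net.URL class is an abstraction of a URL. It is a final class that relies on the strategy design pattern rather than inheritance, with the protocol handlers acting as the strategies. URL objects are immutable, which also makes them thread-safe. The URL class checks only the scheme of a URL and does not validate the scheme-specific part. Constructing a URL with an unsupported protocol or with malformed syntax throws MalformedURLException (a subclass of IOException).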
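
As a rough illustration of the point about unsupported protocols, the sketch below tries to construct a few URLs and reports which ones the running JVM accepts. The class name, the helper isSupported, and the sample URLs (including the made-up scheme nosuchscheme) are assumptions for this example; which schemes are accepted, other than the common ones such as http and file, depends on the JVM.

```java
import java.net.MalformedURLException;
import java.net.URL;

public class ProtocolCheckSketch {

    // Returns true if the URL constructor accepts the string, false if it throws.
    public static boolean isSupported(String url) {
        try {
            new URL(url);
            return true;
        } catch (MalformedURLException e) {
            // Thrown for an unknown protocol or for malformed syntax
            return false;
        }
    }

    public static void main(String[] args) {
        String[] samples = {
            "http://www.example.com/",
            "file:///etc/hosts",
            "mailto:someone@example.com",
            "nosuchscheme://www.example.com/"
        };
        for (String s : samples) {
            System.out.println(s + " -> " + (isSupported(s) ? "supported" : "not supported"));
        }
    }
}
```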

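The URL class should be used mainly for downloading content from a server. For other work, such as parsing the string, prefer URI.

Retrieving data from a URL (all of the following methods can throw IOException)

The basic openStream() returns an InputStream from which the data can be read. The stream delivers the raw, uninterpreted content.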
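
A minimal sketch of reading through openStream(). To keep it runnable without network access, it reads a temporary file through a file: URL; the class name, the helper readAll, and the assumption that the content is UTF-8 text are all choices made for this example.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class OpenStreamSketch {

    // Reads every byte the URL's stream delivers and decodes it as UTF-8 (an assumption).
    public static String readAll(URL url) throws IOException {
        try (InputStream in = url.openStream()) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("openstream", ".txt");
        try {
            Files.write(tmp, "hello, URL".getBytes(StandardCharsets.UTF_8));
            // A file: URL stands in for a remote resource
            System.out.println(readAll(tmp.toUri().toURL()));
        } finally {
            Files.delete(tmp);
        }
    }
}
```

The same readAll helper would work with an http or https URL; only the source of the bytes changes.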

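To control the download, use openConnection(), which returns a URLConnection from which an InputStream can then be obtained. This is the recommended method when you want to communicate with the server directly: through the URLConnection you can read everything the server sends and also access the metadata defined by the protocol.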
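
The sketch below shows the general shape of using openConnection(): read some metadata first, then read the body. As before, a temporary file and a file: URL stand in for a server so that the example runs offline; the class name and the helper describe are assumptions for this example, and the reported content type depends on the JVM's file-name mapping.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class OpenConnectionSketch {

    // Reports the metadata exposed by the connection, then counts the bytes of the body.
    public static String describe(URL url) throws IOException {
        URLConnection conn = url.openConnection();
        String type = conn.getContentType();         // metadata: content type, may be null
        long declared = conn.getContentLengthLong(); // metadata: length, -1 if unknown
        long read = 0;
        try (InputStream in = conn.getInputStream()) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                read += n;
            }
        }
        return "type=" + type + ", declared=" + declared + ", read=" + read;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("openconnection", ".txt");
        try {
            Files.write(tmp, "hello".getBytes(StandardCharsets.UTF_8));
            System.out.println(describe(tmp.toUri().toURL()));
        } finally {
            Files.delete(tmp);
        }
    }
}
```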

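getContent() looks at the Content-Type field of the data returned by the server. If the field is missing or the type is not recognized, it returns some kind of InputStream. Otherwise it can return a more complete object, for example one that represents an image or text.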

   import java.io.IOException;
   import java.net.URL;

   public class ContentGetter {

       public static void main(String[] args) {
           String str = "https://www.baidu.com";
           try {
               URL url = new URL(str);
               // getContent() chooses the returned object based on the Content-Type header
               Object obj = url.getContent();
               System.out.println("I got a " + obj.getClass().getName());
           } catch (IOException e) {
               System.err.println(e);
           }
       }
   }
 
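The biggest problem with getContent() is working out the type of the object it returns. The instanceof operator solves this, and the overload getContent(Class[] classes) lets the caller list the acceptable types in order of preference: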
 
   import java.io.IOException;
   import java.io.InputStream;
   import java.io.Reader;
   import java.net.URL;

   public class ContentGetter {

       public static void main(String[] args) {
           String str = "https://www.baidu.com/";
           // Acceptable return types, in order of preference
           Class<?>[] types = {String.class, Reader.class, InputStream.class};
           try {
               URL url = new URL(str);
               Object obj = url.getContent(types);
               if (obj instanceof String) {
                   System.out.println("String");
                   System.out.println(obj);
               } else if (obj instanceof Reader) {
                   System.out.println("Reader");
                   try (Reader r = (Reader) obj) {
                       int c;
                       while ((c = r.read()) != -1) {
                           System.out.print((char) c);
                       }
                   }
               } else if (obj instanceof InputStream) {
                   System.out.println("InputStream");
                   try (InputStream in = (InputStream) obj) {
                       int c;
                       while ((c = in.read()) != -1) {
                           System.out.write(c);
                       }
                   }
                   System.out.flush();
               } else {
                   // getContent(Class[]) returns null when none of the requested types is available
                   System.out.println("Error: unexpected type "
                           + (obj == null ? "null" : obj.getClass().getName()));
               }
           } catch (IOException e) {
               System.err.println(e);
           }
       }
   }

Splitting a URL into its parts

General URL format: protocol://userInfo@host:port/path?query#fragment

   import java.net.MalformedURLException;
   import java.net.URL;

   public class URLSplitter {

       public static void main(String[] args) {
           String[] str = {"https://www.baidu.com/"};
           split(str);
       }

       public static void split(String[] str) {
           for (int i = 0; i < str.length; i++) {
               try {
                   // protocol://userInfo@host:port/path?query#fragment
                   URL url = new URL(str[i]);
                   System.out.println("URL is " + url);
                   // protocol
                   System.out.println("Protocol is " + url.getProtocol());
                   // user info (null if absent)
                   System.out.println("UserInfo is " + url.getUserInfo());
                   // host; defensively strip any user info that precedes an @
                   String host = url.getHost();
                   if (host != null) {
                       int atSign = host.indexOf("@");
                       if (atSign != -1) {
                           host = host.substring(atSign + 1);
                       }
                       System.out.println("Host is " + host);
                   } else {
                       System.out.println("Host is null");
                   }
                   // port; getPort() returns -1 when the URL does not specify one
                   int port = url.getPort();
                   if (port != -1) {
                       System.out.println("Port is " + port);
                   } else {
                       int defaultPort = url.getDefaultPort();
                       if (defaultPort != -1) {
                           System.out.println("DefaultPort is " + defaultPort);
                       } else {
                           System.out.println("Unknown Port");
                       }
                   }
                   // path
                   System.out.println("Path is " + url.getPath());
                   // query
                   System.out.println("Query is " + url.getQuery());
                   // fragment
                   System.out.println("Fragment is " + url.getRef());
                   // file: everything from the first / after the host up to, but not including,
                   // the # of the fragment; Java does not separate the path from the file name
                   System.out.println("File is " + url.getFile());
                   // authority: userInfo@host:port
                   System.out.println("Authority is " + url.getAuthority());
               } catch (MalformedURLException e) {
                   System.out.println(e);
               }
           }
       }
   }

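Equality and comparison

Two URLs are equal if and only if they point to the same resource on the same host, port, and path, with the same query string and fragment. Note that the equals() method of URL tries to resolve the host with DNS to decide whether two hosts are the same, so equals() can be a blocking I/O operation. For that reason, avoid storing URLs in data structures that depend on equals(), such as HashMap; if you need such a structure, URI is the better choice. equals() does not actually compare the resources that the two URLs identify.

The URL class does not implement Comparable.

The sameFile() method of the URL class checks whether two URLs point to the same resource. It makes essentially the same comparison as equals(), except that it ignores the fragment.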
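
The difference between equals() and sameFile(), and the suggestion to key collections on URI instead, can be sketched as follows. The URLs use the literal address 127.0.0.1 so that no DNS lookup is involved; the class name, helper names, and sample URLs are assumptions for this example.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;
import java.util.HashSet;
import java.util.Set;

public class UrlEqualitySketch {

    public static boolean urlEquals(String a, String b) throws MalformedURLException {
        return new URL(a).equals(new URL(b));
    }

    public static boolean sameFile(String a, String b) throws MalformedURLException {
        return new URL(a).sameFile(new URL(b));
    }

    // Stores one URI in a hash-based set and probes it with another; no network access is needed.
    public static boolean uriSetContains(String stored, String probe) {
        Set<URI> seen = new HashSet<>();
        seen.add(URI.create(stored));
        return seen.contains(URI.create(probe));
    }

    public static void main(String[] args) throws MalformedURLException {
        String a = "http://127.0.0.1/index.html#intro";
        String b = "http://127.0.0.1:80/index.html#summary";
        System.out.println(urlEquals(a, b)); // false: the fragments differ
        System.out.println(sameFile(a, b));  // true: fragment ignored, 80 is the default http port
        // URI compares scheme and host without regard to case
        System.out.println(uriSetContains("http://www.example.com/index.html",
                                          "HTTP://WWW.EXAMPLE.COM/index.html")); // true
    }
}
```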

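URI: covers both URLs and URNs

The main differences between the URL class and the URI class:

1. The URI class is purely about identifying resources and parsing URIs; it provides no way to retrieve the resource that a URI identifies.

2. The URI class conforms more closely to the relevant specifications than the URL class does.

3. A URI object can represent a relative URI, whereas the URL class absolutizes every URI before storing it.

In short, URL is mainly for retrieving network resources, while URI is mainly for parsing and manipulating the string. Use URL when you need to download content and URI when you only need to identify a resource. When you need both, the two can be converted into each other as required.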
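
A short sketch of the conversion mentioned above: parse and normalize with URI, then convert to URL when a download is needed, and back again with toURI(). The sample address and the class and helper names are placeholders for this example; nothing is actually downloaded.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

public class UrlUriConversionSketch {

    // Normalizes the string as a URI, then converts it to a URL suitable for downloading.
    public static URL toNormalizedUrl(String s) throws URISyntaxException, MalformedURLException {
        URI uri = new URI(s).normalize(); // removes . and .. path segments
        return uri.toURL();               // requires an absolute URI with a supported scheme
    }

    public static void main(String[] args) throws Exception {
        URL url = toNormalizedUrl("http://www.example.com/docs/../index.html?lang=en#top");
        System.out.println(url);         // http://www.example.com/index.html?lang=en#top

        URI back = url.toURI();          // URL -> URI
        System.out.println(back.getPath()); // /index.html

        try {
            new URI("images/logo.png").toURL(); // a relative URI cannot become a URL
        } catch (IllegalArgumentException e) {
            System.out.println("relative URI rejected: " + e.getMessage());
        }
    }
}
```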

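Constructing a URI (URI objects in Java are immutable)

1. The string passed to a URI constructor is not limited to ASCII; it may also contain other Unicode characters.

2. The scheme is the protocol part of the URI. It may contain only ASCII letters, digits, and the three punctuation characters +, -, and ., and it must begin with a letter. Passing null as the scheme creates a relative URI.

3. In the scheme-specific part, the constructors that build a URI from separate components percent-encode invalid characters.

4. Any syntax error that cannot be fixed by escaping makes the constructor throw URISyntaxException.

5. Any argument of the multi-argument constructors may be null, which means that the component is omitted.

6. If you are sure the URI is well formed, you can use the static factory method URI.create(), which throws the unchecked IllegalArgumentException instead of URISyntaxException.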
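
The construction rules above can be tried out with a small sketch. The host www.example.com, the paths, and the class and helper names are invented for this example.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class UriConstructionSketch {

    // Builds an http URI from parts; the multi-argument constructor quotes illegal characters.
    public static String fromParts(String host, String path, String fragment) {
        try {
            return new URI("http", host, path, fragment).toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // URI.create() reports bad syntax with an unchecked IllegalArgumentException.
    public static boolean isWellFormed(String s) {
        try {
            URI.create(s);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) throws URISyntaxException {
        // Spaces in the path and fragment are escaped as %20
        System.out.println(fromParts("www.example.com", "/my docs/a b.html", "sec 1"));
        // http://www.example.com/my%20docs/a%20b.html#sec%201

        // A null scheme (and host) yields a relative URI
        URI relative = new URI(null, null, "docs/index.html", null);
        System.out.println(relative + " absolute? " + relative.isAbsolute()); // docs/index.html absolute? false

        // The single-string form does not escape a space, so this input is rejected
        System.out.println(isWellFormed("http://www.example.com/a b")); // false
    }
}
```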

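Parts of a URI

An opaque URI is an absolute URI whose scheme-specific part does not begin with /; its scheme-specific part cannot be parsed any further. A hierarchical URI, whose scheme-specific part begins with /, can be parsed further.

If the path of a hierarchical URI begins with /, the path is said to be absolute; otherwise it is relative.

scheme:scheme-specific-part#fragment

1. The URI class provides a set of methods that return the individual parts of a URI. getRawFoo() returns the part as it appears in the URI, still percent-encoded, while getFoo() returns the decoded form. If a part is absent, the getter returns null (or -1 for the port).

2. A URI without a scheme is a relative URI; isAbsolute() tells the two cases apart.

3. isOpaque() reports whether the scheme-specific part is opaque. For hierarchical URIs there are further getFoo() and getRawFoo() methods for the individual components.

4. Java's URI class does not percent-escape non-ASCII characters up front, so they still appear in the results of the getRawFoo() methods unless they were already encoded in the string used to construct the URI.

5. Java does not check the authority for syntax errors up front. As a consequence, the individual parts of the authority often cannot be returned. parseServerAuthority() forces the authority to be parsed and throws URISyntaxException if that is not possible.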

   import java.net.URI;
   import java.net.URISyntaxException;

   public class URISplitter {

       public static void main(String[] args) {
           String[] str = {"http://java.sun.com/j2se/1.3/docs/guide/collections/designfaq.html#28",
                           "mailto:java-net@java.sun.com"};
           split(str);
       }

       public static void split(String[] strIn) {
           for (String str : strIn) {
               try {
                   URI uri = new URI(str);
                   System.out.println("URI is " + uri);
                   if (uri.isOpaque()) {
                       // The scheme-specific part of an opaque URI cannot be parsed further
                       System.out.println("this is an opaque URI.");
                       System.out.println("the scheme is " + uri.getScheme());
                       System.out.println("the scheme specific part is " + uri.getSchemeSpecificPart());
                       System.out.println("the fragment ID is " + uri.getFragment());
                   } else {
                       // A hierarchical URI: the scheme-specific part can be parsed further
                       System.out.println("this is a hierarchical URI.");
                       System.out.println("the scheme is " + uri.getScheme());
                       try {
                           // Java does not validate the authority up front, so force it to be parsed
                           uri = uri.parseServerAuthority();
                           System.out.println("the userInfo is " + uri.getUserInfo());
                           System.out.println("the host is " + uri.getHost());
                           System.out.println("the port is " + uri.getPort());
                       } catch (URISyntaxException e) {
                           // Must be a registry-based authority
                           System.out.println("the authority is " + uri.getAuthority());
                       }
                       System.out.println("the path is " + uri.getPath());
                       System.out.println("the query is " + uri.getQuery());
                       System.out.println("the fragment ID is " + uri.getFragment());
                   }
               } catch (URISyntaxException e) {
                   System.out.println(str + " does not seem to be a URI");
               }
               System.out.println();
           }
       }
   }
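
To make the difference between the raw and the decoded getters concrete, here is a small sketch. The URI string and the class and helper names are made up for this example.

```java
import java.net.URI;

public class RawVersusDecodedSketch {

    // Returns {raw path, decoded path} for the given URI string.
    public static String[] pathForms(String s) {
        URI uri = URI.create(s);
        return new String[] {uri.getRawPath(), uri.getPath()};
    }

    public static void main(String[] args) {
        String s = "http://www.example.com/my%20docs/r%26d.html?q=a%3Db#sec%201";
        URI uri = URI.create(s);
        System.out.println(uri.getRawPath());     // /my%20docs/r%26d.html
        System.out.println(uri.getPath());        // /my docs/r&d.html
        System.out.println(uri.getRawQuery());    // q=a%3Db
        System.out.println(uri.getQuery());       // q=a=b
        System.out.println(uri.getRawFragment()); // sec%201
        System.out.println(uri.getFragment());    // sec 1
        // Absent parts come back as null, or -1 for the port
        System.out.println(uri.getUserInfo());    // null
        System.out.println(uri.getPort());        // -1
    }
}
```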

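Equality and comparison

As with URLs, two equal URIs must be either both hierarchical or both opaque. The scheme and the host in the authority are compared without regard to case. The remaining parts are case-sensitive, except for the hexadecimal digits used in percent escapes. Escapes are not decoded before the comparison.

URI implements the Comparable interface, so URIs can be sorted. A URI can only be compared with another URI; anything else throws ClassCastException. The ordering is as follows:

1. If the schemes differ, the schemes are compared, without regard to case.

2. If the schemes are the same, a hierarchical URI is less than an opaque URI.

3. If both are opaque, they are ordered by their scheme-specific parts.

4. If the scheme and the opaque scheme-specific part are the same, the fragments are compared.

5. If both are hierarchical, they are ordered by authority, which in turn is ordered by user info, host, and port; hostnames are compared without regard to case.

6. If the scheme and the authority are the same, the paths are compared.

7. If the paths are the same, the queries are compared.

8. If the queries are the same, the fragments are compared.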
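
The equality and ordering rules above can be checked with a short sketch. All URIs, the class name, and the helper sorted are invented for this example.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class UriOrderingSketch {

    // Parses the strings, sorts them with URI's natural ordering, and returns them as strings.
    public static List<String> sorted(String... uris) {
        List<URI> list = new ArrayList<>();
        for (String s : uris) {
            list.add(URI.create(s));
        }
        Collections.sort(list);
        List<String> out = new ArrayList<>();
        for (URI u : list) {
            out.add(u.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        // Scheme, host, and the hex digits of escapes are compared case-insensitively
        System.out.println(URI.create("HTTP://WWW.Example.com/a%2fb")
                .equals(URI.create("http://www.example.com/a%2Fb"))); // true
        // Escapes are not decoded: %41 is not treated as A
        System.out.println(URI.create("http://www.example.com/%41")
                .equals(URI.create("http://www.example.com/A")));     // false

        // Ordered by scheme first, then path, then query
        for (String s : sorted("mailto:java-net@example.com",
                               "http://www.example.com/b",
                               "http://www.example.com/a?x=1",
                               "ftp://ftp.example.com/",
                               "http://www.example.com/a")) {
            System.out.println(s);
        }
    }
}
```

With these inputs the ftp URI sorts first and the mailto URI last; among the http URIs, /a precedes /a?x=1 (an absent query sorts before a present one), which precedes /b.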

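String representations of a URI

toString() returns the unencoded string form of the URI, in which non-ASCII characters are not escaped; the result is therefore not guaranteed to be a syntactically valid URI.

toASCIIString() returns the encoded string form, in which non-ASCII characters are percent-escaped; the result is always a syntactically valid URI.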
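
A two-line comparison of the two methods, as a sketch. The URI is made up, and the non-ASCII character is written as the Unicode escape \u00e9 so that the source file stays ASCII-only.

```java
import java.net.URI;

public class AsciiStringSketch {

    // Returns {toString(), toASCIIString()} for the given URI string.
    public static String[] bothForms(String s) {
        URI uri = URI.create(s);
        return new String[] {uri.toString(), uri.toASCIIString()};
    }

    public static void main(String[] args) {
        // \u00e9 is e with an acute accent, a non-ASCII character
        String[] forms = bothForms("http://www.example.com/caf\u00e9");
        System.out.println(forms[0]); // the non-ASCII character is kept as-is
        System.out.println(forms[1]); // http://www.example.com/caf%C3%A9
    }
}
```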

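URL encoding

Apart from A-Z, a-z, 0-9, the punctuation characters - _ . ! ~ * ' and , and the characters / & @ # ; $ + = when they are used as delimiters, every character must be encoded as % followed by two hexadecimal digits per byte. A space can be encoded either as %20 or as +, and the plus sign itself is encoded as %2B. The characters / # = & ? must also be encoded when they appear in a file name rather than as delimiters.

The URL class does not encode or decode automatically; a URL object can be constructed from a string that contains illegal ASCII characters, non-ASCII characters, or % escapes.

UTF-8 is the best choice of character set for encoding because it is the most widely compatible.

Encoding is done with URLEncoder.encode(), which can throw UnsupportedEncodingException. Note that this method encodes every unsafe character it is given, including the delimiters :, /, & and =. The correct usage is therefore to encode each piece separately and then assemble the pieces into a valid URL.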
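
Before the full example, a minimal sketch of what URLEncoder.encode() does to individual strings. The class name and the helper enc are assumptions for this example; the helper only wraps the checked exception.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class EncoderRulesSketch {

    // Encodes with UTF-8 and converts the checked exception, which cannot occur for UTF-8.
    public static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(enc("This string has spaces")); // This+string+has+spaces
        System.out.println(enc("a+b"));                    // a%2Bb
        System.out.println(enc("I/O"));                    // I%2FO
        System.out.println(enc("caf\u00e9"));              // caf%C3%A9
        System.out.println(enc("a-b_c.d*e"));              // a-b_c.d*e
        System.out.println(enc("~"));                      // %7E
    }
}
```

URLEncoder is stricter than the character list above: it leaves only letters, digits, and - _ . * unescaped, so characters such as ~ are escaped as well. Over-encoding of this kind is harmless.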

   import java.io.UnsupportedEncodingException;
   import java.net.URLEncoder;

   public class EncodeURL {

       public static void main(String[] args) {
           // Wrong: encoding the whole URL also encodes the delimiters : / ? & =
           try {
               String str = "https://www.google.com/search?hl=en&as_q=Java&as_epq=I/O";
               String queryError = URLEncoder.encode(str, "UTF-8");
               System.out.println(queryError);
           } catch (UnsupportedEncodingException e) {
               System.out.println(e);
           }
           // Right: encode each name and value separately, then join them with the delimiters
           try {
               String url = "https://www.google.com/search?";
               url += URLEncoder.encode("hl", "UTF-8");
               url += "=";
               url += URLEncoder.encode("en", "UTF-8");
               url += "&";
               url += URLEncoder.encode("as_q", "UTF-8");
               url += "=";
               url += URLEncoder.encode("Java", "UTF-8");
               url += "&";
               url += URLEncoder.encode("as_epq", "UTF-8");
               url += "=";
               url += URLEncoder.encode("I/O", "UTF-8");
               System.out.println(url);
           } catch (UnsupportedEncodingException e) {
               System.out.println(e);
           }
       }
   }
 
A more convenient approach is a small helper class that builds the query string:
 
   import java.io.UnsupportedEncodingException;
   import java.net.URLEncoder;

   public class QueryString {

       private final StringBuilder query = new StringBuilder();

       public QueryString() {}

       public synchronized void add(String name, String value) {
           // Separate pairs with &, without a leading & before the first pair
           if (query.length() > 0) {
               query.append('&');
           }
           encode(name, value);
       }

       private synchronized void encode(String name, String value) {
           // The URL specification does not forbid over-encoding; browsers cope with it
           try {
               query.append(URLEncoder.encode(name, "UTF-8"));
               query.append('=');
               query.append(URLEncoder.encode(value, "UTF-8"));
           } catch (UnsupportedEncodingException e) {
               throw new RuntimeException("VM does not support UTF-8", e);
           }
       }

       public synchronized String getQuery() {
           return query.toString();
       }

       @Override
       public synchronized String toString() {
           return getQuery();
       }

       public static void main(String[] args) {
           QueryString query = new QueryString();
           query.add("hl", "en");
           query.add("as_q", "Java");
           query.add("as_epq", "I/O");
           String url = "https://www.google.com/search?" + query;
           System.out.println(url);
       }
   }

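URL decoding

URLDecoder.decode() reverses the encoding. When the character set is unknown, UTF-8 is the best guess. The method converts only escaped characters and leaves everything else untouched, so a complete URL can be passed in. Note that it may throw UnsupportedEncodingException, and that it throws IllegalArgumentException when the string contains an escape that cannot be decoded.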
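
A minimal sketch of decoding, including the failure case. The class name, the helper dec, and the sample strings are assumptions for this example.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class DecoderSketch {

    // Decodes with UTF-8 and converts the checked exception, which cannot occur for UTF-8.
    public static String dec(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        // A fully encoded URL comes back in readable form
        System.out.println(dec("https%3A%2F%2Fwww.google.com%2Fsearch%3Fhl%3Den%26as_epq%3DI%2FO"));
        // https://www.google.com/search?hl=en&as_epq=I/O

        // + becomes a space, %2B becomes +
        System.out.println(dec("a+b%2Bc")); // a b+c

        // An escape that is not valid hexadecimal is rejected
        try {
            dec("100%zz");
        } catch (IllegalArgumentException e) {
            System.out.println("invalid escape: " + e.getMessage());
        }
    }
}
```

Because + is decoded to a space, a string that contains a literal + that was never encoded will not survive a round trip through decode().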

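Communicating with a server-side program through GET

1. For GET, all you need to know is which name-value pairs the program expects to receive.

2. Build a query string that contains the necessary name-value pairs.

3. Construct the corresponding URL.

4. Send it to the server and read the response.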

   import java.io.BufferedInputStream;
   import java.io.IOException;
   import java.io.InputStream;
   import java.io.InputStreamReader;
   import java.io.Reader;
   import java.net.MalformedURLException;
   import java.net.URL;
   import java.nio.charset.StandardCharsets;

   public class BaiduQuery {

       public static void main(String[] args) {
           String[] search = {"alibaba"};
           // Join the search terms with spaces
           String target = String.join(" ", search).trim();
           // QueryString is the helper class shown earlier
           QueryString query = new QueryString();
           // wd is the name Baidu uses for the search-term parameter
           query.add("wd", target);
           try {
               URL u = new URL("https://www.baidu.com/s?" + query);
               try (InputStream in = new BufferedInputStream(u.openStream())) {
                   // Decode the response as UTF-8 text
                   Reader theHTML = new InputStreamReader(in, StandardCharsets.UTF_8);
                   int c;
                   while ((c = theHTML.read()) != -1) {
                       System.out.print((char) c);
                   }
               }
           } catch (MalformedURLException e) {
               System.out.println(e);
           } catch (IOException e) {
               System.out.println(e);
           }
       }
   }