An Introduction to URIs: Reading the Specification (RFC 3986)

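This article walks through the basic concepts of URIs: the generic syntax, character handling, and the detailed rules for each component (scheme, authority, path, query, and fragment). It also covers how URIs are used (relative references, absolute URIs, and same-document references), the reference resolution process, and normalization and comparison, with the aim of helping readers understand how URIs are used and resolved.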
Uniform Resource Identifier (URI): Generic Syntax

Updated by: 6874, 7320 (7320 has since been obsoleted by 8820)
Obsoletes: 2732, 2396, 1808

Related:
Registration of URL schemes: RFC 2717, Registration Procedures for URL Scheme Names

Before reading:
Prerequisite: ABNF (RFC 5234)
Source files for this series of RFC notes:
All of them are hosted on Gitee: https://gitee.com/testzyh/notes/tree/master/RFC

Date: January 2005

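Abstract:

In short, a URI is a compact sequence of characters that identifies an abstract or physical resource. The specification defines the generic URI syntax and a process for resolving references that may be in relative form. The RFC's own abstract reads: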
A Uniform Resource Identifier (URI) is a compact sequence of characters that identifies an abstract or physical resource. This specification defines the generic URI syntax and a process for resolving URI references that might be in relative form, along with guidelines and security considerations for the use of URIs on the Internet. The URI syntax defines a grammar that is a superset of all valid URIs, allowing an implementation to parse the common components of a URI reference without knowing the scheme-specific requirements of every possible identifier. This specification does not define a generative grammar for URIs; that task is performed by the individual specifications of each URI scheme.

opaque: not transparent; here, a part of the URI that generic URI processing does not interpret

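1. Introduction

A Uniform Resource Identifier (URI) provides a simple and extensible means of identifying a resource. The URI syntax and semantics in this specification derive from concepts introduced by the World Wide Web global information initiative, whose use of these identifiers dates from 1990 and is described in “Universal Resource Identifiers in WWW” [RFC1630]. The syntax is designed to meet the recommendations laid out in “Functional Recommendations for Internet Resource Locators” [RFC1736] and “Functional Requirements for Uniform Resource Names” [RFC1737].

This document obsoletes RFC 2396, which had merged URLs and relative URLs into a single, unified URI syntax:
This document obsoletes [RFC 2396], which merged “Uniform Resource Locators” [RFC 1738] and “Relative Uniform Resource Locators” [RFC 1808] in order to define a single, generic syntax for all URIs.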

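1.1 Overview of URIs

URIs have the following characteristics:

  • Uniform: Uniformity provides several benefits. (1) It allows different types of resource identifiers to be used in the same context, even when the mechanisms used to access those resources differ. (2) It allows uniform semantic interpretation of common syntactic conventions across different types of resource identifiers. (3) It allows new types of resource identifiers to be introduced without interfering with the way existing identifiers are used. (4) It allows identifiers to be reused in many different contexts, so that new applications or protocols can leverage a pre-existing, large, and widely used set of resource identifiers.
  • Resource: This specification does not limit the scope of what might be a resource; the term is used in a general sense for whatever might be identified by a URI. Familiar examples include an electronic document, an image, a source of information with a consistent purpose (e.g., the weather report for a given place), a service (e.g., an HTTP-to-SMS gateway), and a collection of other resources. A resource is not necessarily accessible via the Internet; human beings, corporations, and bound books in a library can also be resources. Likewise, abstract concepts can be resources, such as the operators and operands of a mathematical equation, the types of a relationship (e.g., "parent" or "employee"), or numeric values (e.g., zero, one, and infinity).
  • Identifier: An identifier embodies the information required to distinguish what is being identified from all other things within its scope of identification. Our use of the terms “identify” and “identifying” refer to this purpose of distinguishing one resource from all other resources, regardless of how that purpose is accomplished (e.g., by name, address, or context). These terms should not be mistaken as an assumption that an identifier defines or embodies the identity of what is referenced, though that may be the case for some identifiers. Nor should it be assumed that a system using URIs will access the resource identified: in many cases, URIs are used to denote resources without any intention that they be accessed. Likewise, the “one” resource identified might not be singular in nature (e.g., a resource might be a named set or a mapping that varies over time).

The URI format is defined in Section 3. It lets resources be identified uniformly through a separately defined, extensible set of naming schemes. How that identification is accomplished, assigned, or enabled is delegated to each scheme's specification.

URIs have a global scope and are interpreted consistently regardless of context, though the result of that interpretation may depend on the end user's context. For example, http://localhost/ has the same interpretation for every user of that reference, even though the network interface corresponding to "localhost" may be different for each end user.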

1.1.1 Generic Syntax

This specification defines those elements of the URI syntax that are required of all URI schemes or are common to many URI schemes. It thus defines the syntax and semantics needed to implement a scheme-independent parsing mechanism for URI references, by which the scheme-dependent handling of a URI can be postponed until the scheme-dependent semantics are needed.

Likewise, protocols and data formats that make use of URI references can refer to this specification as a definition for the range of syntax allowed for all URIs, including those schemes that have yet to be defined.

Decoupling: This decouples the evolution of identification schemes from the evolution of protocols, data formats, and implementations that make use of URIs.

Parser: A parser of the generic URI syntax can parse any URI reference into its major components. Once the scheme is determined, further scheme-specific parsing can be performed on the components.

The generic syntax is a superset of every scheme's syntax: In other words, the URI generic syntax is a superset of the syntax of all URI schemes.

1.1.2 Examples
ftp://ftp.is.co.za/rfc/rfc1808.txt
http://www.ietf.org/rfc/rfc2396.txt
ldap://[2001:db8::7]/c=GB?objectClass?one
mailto:John.Doe@example.com
news:comp.infosystems.www.servers.unix
tel:+1-816-555-1212
telnet://192.0.2.16:80/
urn:oasis:names:specification:docbook:dtd:xml:4.1.2

1.1.3 URI, URL, and URN

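A URI can be further classified as a locator, a name, or both. The term "Uniform Resource Locator" (URL) refers to the subset of URIs that, in addition to identifying a resource, provide a means of locating the resource by describing its primary access mechanism (e.g., its network location). The term "Uniform Resource Name" (URN) has been used historically to refer both to URIs under the "urn" scheme [RFC 2141], which are required to remain globally unique and persistent even when the resource ceases to exist or becomes unavailable, and to any other URI with the properties of a name.

An individual scheme does not have to classify itself as a locator or a name. Instances of URIs from any given scheme may have the characteristics of names, locators, or both, often depending on the persistence and care in the assignment of identifiers by the naming authority rather than on any quality of the scheme.

Future specifications and related documentation should use the general term "URI" rather than the more restrictive terms "URL" and "URN" [RFC3305].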

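1.2 Design Considerations

1.2.1 Transcription

The goal of transcribability can be described with a simple scenario. Imagine two colleagues, Sam and Kim, sitting in a pub at an international conference and exchanging research ideas. Sam asks Kim for a location to get more information, so Kim writes the URI for the research site on a napkin. Upon returning home, Sam takes out the napkin and types the URI into a computer, which then retrieves the information to which Kim referred.

What this implies for URIs: they often have to be remembered by people, so they should be meaningful and familiar.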
A URI often has to be remembered by people, and it is easier for people to remember a URI when it consists of meaningful or familiar components.

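1.2.3 Hierarchical Identifiers

The URI hierarchy: components are listed in order of decreasing significance from left to right. Depending on the scheme, everything after the scheme is either opaque or visible to generic parsing.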
The URI syntax is organized hierarchically, with components listed in order of decreasing significance from left to right. For some URI schemes, the visible hierarchy is limited to the scheme itself: everything after the scheme component delimiter (:) is considered opaque to URI processing. Other URI schemes make the hierarchy explicit and visible to generic parsing algorithms.

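Relative references make a document tree independent of its location and access scheme, so the tree can be moved as a whole without modifying the references.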
Relative referencing of URIs allows document trees to be partially independent of their location and access scheme. For instance, it is possible for a single set of hypertext documents to be simultaneously accessible and traversable via each of the “file”, “http”, and “ftp” schemes if the documents refer to each other with relative references. Furthermore, such document trees can be moved, as a whole, without changing any of the relative references.

2. Characters

2.1 Percent-Encoding

For example, "%20" represents the ASCII space character (SP), whose code is 0x20 = 32.

pct-encoded = "%" HEXDIG HEXDIG
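As a small illustration of the pct-encoded rule, the following Python sketch decodes and re-encodes a space with the standard library's urllib.parse. The sample string is made up for this example.

```python
from urllib.parse import quote, unquote

# "%20" is "%" followed by the two hex digits of the octet 0x20 (space).
encoded = "hello%20world"
decoded = unquote(encoded)

# quote() percent-encodes characters that may not appear literally.
round_trip = quote(decoded)

print(decoded)         # the space is restored
print(round_trip)      # the space is encoded again
print(int("20", 16))   # numeric value of the octet
```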

2.2 Reserved Characters

reserved = gen-delims / sub-delims
gen-delims = ":" / "/" / "?" / "#" / "[" / "]" / "@"
sub-delims = "!" / "$" / "&" / "'" / "(" / ")"
              / "*" / "+" / "," / ";" / "="

2.3 Unreserved Characters

unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
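To make the split between unreserved characters and everything else concrete, here is a minimal sketch of an encoder that leaves unreserved characters alone and percent-encodes every other octet. The names UNRESERVED and pct_encode are hypothetical, and the sketch assumes the data is encoded as UTF-8 and is meant to be placed inside a single component, so reserved characters are encoded as well.

```python
import string

# unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")

def pct_encode(text: str) -> str:
    """Percent-encode every UTF-8 octet that is not an unreserved character."""
    out = []
    for byte in text.encode("utf-8"):
        ch = chr(byte)
        if ch in UNRESERVED:
            out.append(ch)                 # safe to keep as-is
        else:
            out.append(f"%{byte:02X}")     # "%" HEXDIG HEXDIG
    return "".join(out)

print(pct_encode("a b~c/d"))
```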

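3. Syntax Components

URI syntax: URI = scheme ":" hier-part [ "?" query ] [ "#" fragment ]

  • The scheme and path components are required, though the path may be empty (no characters).
  • When authority is present, the path must either be empty or begin with a slash ("/") character.
  • When authority is not present, the path cannot begin with two slash characters ("//").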
hier-part   = "//" authority path-abempty
            / path-absolute
            / path-rootless
            / path-empty

Example:

         foo://example.com:8042/over/there?name=ferret#nose
         \_/   \______________/\_________/ \_________/ \__/
          |           |            |            |        |
       scheme     authority       path        query   fragment
          |   _____________________|__
         / \ /                        \
         urn:example:animal:ferret:nose
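The breakdown in the diagram can be reproduced with the regular expression given in Appendix B of RFC 3986. The sketch below wraps it in a hypothetical helper, split_uri; a component that is absent comes back as None, while a component that is present but empty comes back as an empty string.

```python
import re

# Regular expression from RFC 3986, Appendix B.
URI_RE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")

def split_uri(ref: str) -> dict:
    """Split a URI reference into its five generic components."""
    m = URI_RE.match(ref)  # every part is optional, so this always matches
    return {
        "scheme": m.group(2),
        "authority": m.group(4),
        "path": m.group(5),
        "query": m.group(7),
        "fragment": m.group(9),
    }

print(split_uri("foo://example.com:8042/over/there?name=ferret#nose"))
print(split_uri("urn:example:animal:ferret:nose"))
```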

3.1. Scheme

Schemes are registered, and each scheme defines its own rules.

Syntax: scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
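The scheme rule translates directly into a regular expression. A minimal sketch, with the hypothetical helper name is_valid_scheme:

```python
import re

# scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
SCHEME_RE = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*")

def is_valid_scheme(name: str) -> bool:
    """Return True when the whole string matches the scheme rule."""
    return SCHEME_RE.fullmatch(name) is not None

for candidate in ["http", "svn+ssh", "1http", ""]:
    print(repr(candidate), is_valid_scheme(candidate))
```

This checks syntax only; whether a scheme is actually registered is a separate question.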

3.2. Authority

Many URI schemes delegate governance of the name space to a naming authority:
Many URI schemes include a hierarchical element for a naming authority so that governance of the name space defined by the remainder of the URI is delegated to that authority (which may, in turn, delegate it further).

The authority is preceded by "//" and terminated by the next "/", "?", or "#", or by the end of the URI:
The authority component is preceded by a double slash ("//") and is terminated by the next slash ("/"), question mark ("?"), or number sign ("#") character, or by the end of the URI.

Syntax: authority = [ userinfo "@" ] host [ ":" port ]

3.2.1 User Information

userinfo = *( unreserved / pct-encoded / sub-delims / ":" )

3.2.2 Host

host = IP-literal / IPv4address / reg-name

IP-literal = "[" ( IPv6address / IPvFuture ) "]"

IPvFuture = "v" 1*HEXDIG "." 1*( unreserved / sub-delims / ":" )

IPv6address =                            6( h16 ":" ) ls32
            /                       "::" 5( h16 ":" ) ls32
            / [               h16 ] "::" 4( h16 ":" ) ls32
            / [ *1( h16 ":" ) h16 ] "::" 3( h16 ":" ) ls32
            / [ *2( h16 ":" ) h16 ] "::" 2( h16 ":" ) ls32
            / [ *3( h16 ":" ) h16 ] "::"    h16 ":"   ls32
            / [ *4( h16 ":" ) h16 ] "::"              ls32
            / [ *5( h16 ":" ) h16 ] "::"              h16
            / [ *6( h16 ":" ) h16 ] "::"

ls32        = ( h16 ":" h16 ) / IPv4address
            ; least-significant 32 bits of address

h16         = 1*4HEXDIG
            ; 16 bits of address represented in hexadecimal

IPv4address = dec-octet "." dec-octet "." dec-octet "." dec-octet

dec-octet   = DIGIT                 ; 0-9
            / %x31-39 DIGIT         ; 10-99
            / "1" 2DIGIT            ; 100-199
            / "2" %x30-34 DIGIT     ; 200-249
            / "25" %x30-35          ; 250-255

reg-name = *( unreserved / pct-encoded / sub-delims )
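The dec-octet alternatives above exist to limit each octet to 0-255 without leading zeros. One way to see this is to transcribe the rule into a regular expression. This is only a sketch, and the names DEC_OCTET, IPV4_RE, and is_ipv4address are hypothetical.

```python
import re

# dec-octet, longest alternatives first: 250-255, 200-249, 100-199, 10-99, 0-9
DEC_OCTET = r"(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[0-9])"

# IPv4address = dec-octet "." dec-octet "." dec-octet "." dec-octet
IPV4_RE = re.compile(rf"{DEC_OCTET}(?:\.{DEC_OCTET}){{3}}")

def is_ipv4address(host: str) -> bool:
    """Return True when host matches the IPv4address rule exactly."""
    return IPV4_RE.fullmatch(host) is not None

for host in ["192.0.2.16", "256.0.0.1", "01.2.3.4"]:
    print(host, is_ipv4address(host))
```

A host that fails this check is not necessarily invalid: a string such as "256.0.0.1" still matches reg-name.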

3.2.3 Port

port = *DIGIT
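Putting the three authority subcomponents together, the sketch below splits an authority string into userinfo, host, and port. The function name split_authority is hypothetical, and the code assumes the input is already a well-formed authority; it does not validate it. The one subtlety is that colons inside a bracketed IP-literal belong to the host, not to the port delimiter.

```python
def split_authority(authority: str):
    """Split authority into (userinfo, host, port); absent parts are None."""
    userinfo, at, hostport = authority.rpartition("@")
    if not at:
        userinfo = None                      # no "@", so no userinfo
    if hostport.startswith("["):
        # IP-literal: the host runs up to and including the closing "]".
        end = hostport.find("]")
        host, rest = hostport[:end + 1], hostport[end + 1:]
        port = rest[1:] if rest.startswith(":") else None
    else:
        host, colon, port = hostport.rpartition(":")
        if not colon:
            host, port = hostport, None      # no ":" means no port
    return userinfo, host, port

print(split_authority("example.com:8042"))
print(split_authority("[2001:db8::7]"))
print(split_authority("john@[::1]:80"))
```

Because port = *DIGIT allows zero digits, an authority such as "example.com:" yields an empty string for the port rather than None.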

3.3. Path

What terminates the path:
The path is terminated by the first question mark ("?") or number sign ("#") character, or by the end of the URI.

Syntax:

path          = path-abempty    ; begins with "/" or is empty
              / path-absolute   ; begins with "/" but not "//"
              / path-noscheme   ; begins with a non-colon segment
              / path-rootless   ; begins with a segment
              / path-empty      ; zero characters
              
path-abempty  = *( "/" segment )
path-absolute = "/" [ segment-nz *( "/" segment ) ]
path-noscheme = segment-nz-nc *( "/" segment )
path-rootless = segment-nz *( "/" segment )
path-empty    = 0<pchar>
segment       = *pchar
segment-nz    = 1*pchar
segment-nz-nc = 1*( unreserved / pct-encoded / sub-delims / "@" )
              ; non-zero-length segment without any colon ":"
              
pchar         = unreserved / pct-encoded / sub-delims / ":" / "@"

3.4 Query

The query component contains non-hierarchical data. It begins after the first "?" and is terminated by a "#" or by the end of the URI:
The query component contains non-hierarchical data that, along with data in the path component (Section 3.3), serves to identify a resource within the scope of the URI’s scheme and naming authority (if any). The query component is indicated by the first question mark ("?") character and terminated by a number sign ("#") character or by the end of the URI.

query = *( pchar / "/" / "?" )

3.5 Fragment

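The fragment identifier component identifies a secondary resource, such as a portion or subset of the primary resource, or carries additional identifying information. It begins at a "#" and is terminated by the end of the URI: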
The fragment identifier component of a URI allows indirect identification of a secondary resource by reference to a primary resource and additional identifying information. The identified secondary resource may be some portion or subset of the primary resource, some view on representations of the primary resource, or some other resource defined or described by those representations. A fragment identifier component is indicated by the presence of a number sign ("#") character and terminated by the end of the URI.

fragment = *( pchar / "/" / "?" )

4. Usage

4.1 URI Reference

URI-reference = URI / relative-ref

4.2 Relative Reference

relative-ref  = relative-part [ "?" query ] [ "#" fragment ]

relative-part = "//" authority path-abempty
              / path-absolute
              / path-noscheme
              / path-empty

4.3 Absolute URI

absolute-URI = scheme ":" hier-part [ "?" query ]

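4.4 Same-Document Reference

When a URI reference refers to a URI that is, aside from its fragment component (if any), identical to the base URI, that reference is called a "same-document" reference.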
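The definition can be checked mechanically: resolve the reference against the base, drop the fragment from both, and compare. The sketch below uses urljoin and urldefrag from Python's urllib.parse. The helper name is_same_document and the example base URI are assumptions made for illustration, and the comparison is a plain string comparison with no normalization.

```python
from urllib.parse import urldefrag, urljoin

def is_same_document(reference: str, base: str) -> bool:
    """True when reference resolves to base, ignoring any fragment."""
    target = urljoin(base, reference)
    return urldefrag(target).url == urldefrag(base).url

base = "http://example.com/doc?x=1"   # hypothetical base URI
for ref in ["#sec2", "", "doc?x=1#top", "other#sec2"]:
    print(repr(ref), is_same_document(ref, base))
```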

4.5 Suffix Reference

heuristics: Derived from a Greek word that means “to discover,” heuristic describes a rule or a method that comes from experience and helps you think through things, like the process of elimination, or the process of trial and error. You can think of a heuristic as a shortcut. Besides finding it in philosophy books, if you are interested in computing, you’ll find references to heuristic programming. You can use it as a noun or as an adjective.

Suffix references are common in practice. They consist of only the authority and path portions of the URI, such as
www.w3.org/Addressing/, or simply a DNS registered name on its own.

Such references are primarily intended for human interpretation rather than for machines, with the assumption that context-based heuristics are sufficient to complete the URI (for example, a name beginning with "www" is likely to have the prefix "http://").

5. Reference Resolution

This section defines the process of resolving a URI reference within a context that allows relative references so that the result is a string matching the <URI> syntax rule of Section 3.

5.1 Establishing a Base URI

The term "relative" implies that a base URI exists against which the relative reference is applied. A relative reference is usable only when a base URI is known.

  • The parser must establish the base URI: A base URI must be established by the parser prior to parsing URI references that might be relative.
  • Format: A base URI must conform to the <absolute-URI> syntax rule.
  • Any fragment must be stripped before the base URI is used: If the base URI is obtained from a URI reference, then that reference must be converted to absolute form and stripped of any fragment component prior to its use as a base URI.

There are four ways to establish a base URI:

  • The innermost definition has the highest precedence.
.----------------------------------------------------------.
|  .----------------------------------------------------.  |
|  |  .----------------------------------------------.  |  |
|  |  |  .----------------------------------------.  |  |  |
|  |  |  |  .----------------------------------.  |  |  |  |
|  |  |  |  |       <relative-reference>       |  |  |  |  |
|  |  |  |  `----------------------------------'  |  |  |  |
|  |  |  | (5.1.1) Base URI embedded in content   |  |  |  |
|  |  |  `----------------------------------------'  |  |  |
|  |  | (5.1.2) Base URI of the encapsulating entity |  |  |
|  |  |         (message, representation, or none)   |  |  |
|  |  `----------------------------------------------'  |  |
|  | (5.1.3) URI used to retrieve the entity            |  |
|  `----------------------------------------------------'  |
| (5.1.4) Default Base URI (application-dependent)         |
`----------------------------------------------------------'

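5.2 Relative Resolution

5.2.2 Transform References

Pseudocode for the resolution algorithm:

  • It transforms a URI reference (R) into its target URI (T), relative to the base URI (Base).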
-- The URI reference is parsed into the five URI components
--
(R.scheme, R.authority, R.path, R.query, R.fragment) = parse(R);

-- A non-strict parser may ignore a scheme in the reference
-- if it is identical to the base URI's scheme.
--
if ((not strict) and (R.scheme == Base.scheme)) then
   undefine(R.scheme);
endif;
if defined(R.scheme) then
   T.scheme    = R.scheme;
   T.authority = R.authority;
   T.path      = remove_dot_segments(R.path);
   T.query     = R.query;
else
   if defined(R.authority) then
      T.authority = R.authority;
      T.path      = remove_dot_segments(R.path);
      T.query     = R.query;
   else
      if (R.path == "") then
         T.path = Base.path;
         if defined(R.query) then
            T.query = R.query;
         else
            T.query = Base.query;
         endif;
      else
         if (R.path starts-with "/") then
            T.path = remove_dot_segments(R.path);
         else
            T.path = merge(Base.path, R.path);
            T.path = remove_dot_segments(T.path);
         endif;
         T.query = R.query;
      endif;
      T.authority = Base.authority;
   endif;
   T.scheme = Base.scheme;
endif;

T.fragment = R.fragment;
5.2.3 Merge Paths

The pseudocode above calls a merge routine. It works as follows:

  • If the base URI has a defined authority component and an empty path, then return a string consisting of “/” concatenated with the reference’s path; otherwise,
  • return a string consisting of the reference’s path component appended to all but the last segment of the base URI’s path (i.e., excluding any characters after the right-most “/” in the base URI path, or excluding the entire base URI path if it does not contain any “/” characters).
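The two merge rules fit in a few lines of Python. In this sketch the signature is an assumption: the base authority is passed as None when it is undefined, so that "undefined" and "empty" stay distinguishable.

```python
from typing import Optional

def merge(base_authority: Optional[str], base_path: str, ref_path: str) -> str:
    """Merge a relative-path reference with the base path."""
    if base_authority is not None and base_path == "":
        # Base has an authority and an empty path: anchor at the root.
        return "/" + ref_path
    # Keep everything up to and including the right-most "/" of the base
    # path; rfind() returns -1 when there is none, which keeps nothing.
    cut = base_path.rfind("/") + 1
    return base_path[:cut] + ref_path

print(merge("a", "/b/c/d;p", "g"))
print(merge("a", "", "g"))
```

The result may still contain "." and ".." segments; those are handled by the next step.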
5.2.4. Remove Dot Segments

This step removes the special "." and ".." segments.

Although there are many ways to accomplish this removal, a simple method using two string buffers is described here:

  • The input buffer is initialized with the now-appended path components, and the output buffer is initialized to the empty string.
  • While the input buffer is not empty, loop as follows:
    • A. If the input buffer begins with a prefix of “../” or “./”, then remove that prefix from the input buffer; otherwise,
    • B. if the input buffer begins with a prefix of “/./” or “/.”, where “.” is a complete path segment, then replace that prefix with “/” in the input buffer; otherwise,
    • C. if the input buffer begins with a prefix of “/../” or “/..”, where “..” is a complete path segment, then replace that prefix with “/” in the input buffer and remove the last segment and its preceding “/” (if any) from the output buffer; otherwise,
    • D. if the input buffer consists only of “.” or “..”, then remove that from the input buffer; otherwise,
    • E. move the first path segment in the input buffer to the end of the output buffer, including the initial “/” character (if any) and any subsequent characters up to, but not including, the next “/” character or the end of the input buffer.
  • Finally, the output buffer is returned as the result.
STEP   OUTPUT BUFFER         INPUT BUFFER

 1 :                         /a/b/c/./../../g
 2E:   /a                    /b/c/./../../g
 2E:   /a/b                  /c/./../../g
 2E:   /a/b/c                /./../../g
 2B:   /a/b/c                /../../g
 2C:   /a/b                  /../g
 2C:   /a                    /g
 2E:   /a/g

STEP   OUTPUT BUFFER         INPUT BUFFER

 1 :                         mid/content=5/../6
 2E:   mid                   /content=5/../6
 2E:   mid/content=5         /../6
 2C:   mid                   /6
 2E:   mid/6
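The two traces above can be reproduced with a direct transcription of steps A to E. This is a minimal sketch rather than a tuned implementation: the output buffer is kept as a list of segments (each with its leading "/", if any), so that "remove the last segment and its preceding /" becomes a single pop.

```python
def remove_dot_segments(path: str) -> str:
    """Remove "." and ".." segments following steps A-E of Section 5.2.4."""
    out = []                                   # output buffer, one item per segment
    while path:                                # path is the input buffer
        if path.startswith("../"):             # A
            path = path[3:]
        elif path.startswith("./"):            # A
            path = path[2:]
        elif path.startswith("/./"):           # B
            path = path[2:]
        elif path == "/.":                     # B, "." is the whole last segment
            path = "/"
        elif path.startswith("/../"):          # C
            path = path[3:]
            if out:
                out.pop()
        elif path == "/..":                    # C, ".." is the whole last segment
            path = "/"
            if out:
                out.pop()
        elif path in (".", ".."):              # D
            path = ""
        else:                                  # E
            nxt = path.find("/", 1)            # next "/" after the first character
            if nxt == -1:
                nxt = len(path)
            out.append(path[:nxt])
            path = path[nxt:]
    return "".join(out)

print(remove_dot_segments("/a/b/c/./../../g"))
print(remove_dot_segments("mid/content=5/../6"))
```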

5.3 Component Recomposition

result = ""

if defined(scheme) then
   append scheme to result;
   append ":" to result;
endif;

if defined(authority) then
   append "//" to result;
   append authority to result;
endif;

append path to result;

if defined(query) then
   append "?" to result;
   append query to result;
endif;

if defined(fragment) then
   append "#" to result;
   append fragment to result;
endif;

return result;
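The recomposition pseudocode maps almost line for line onto Python. In this sketch, an undefined component is represented by None (an assumption of the sketch), which keeps it distinct from a component that is defined but empty.

```python
from typing import Optional

def recompose(scheme: Optional[str], authority: Optional[str], path: str,
              query: Optional[str], fragment: Optional[str]) -> str:
    """Join the five components back into a URI reference string."""
    result = ""
    if scheme is not None:
        result += scheme + ":"
    if authority is not None:
        result += "//" + authority
    result += path                      # the path is always defined
    if query is not None:
        result += "?" + query
    if fragment is not None:
        result += "#" + fragment
    return result

print(recompose("http", "a", "/b/c/g", "y", "s"))
print(recompose("urn", None, "example:animal:ferret:nose", None, None))
```

Testing against None rather than truthiness matters: a query that is defined but empty still produces its "?" delimiter.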

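5.4 Reference Resolution Examples

The examples assume the base URI http://a/b/c/d;p?q

5.4.1 Normal Examples

The relative reference is on the left, and the target URI it resolves to is on the right: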

"g:h"           =  "g:h"
"g"             =  "http://a/b/c/g"
"./g"           =  "http://a/b/c/g"
"g/"            =  "http://a/b/c/g/"
"/g"            =  "http://a/g"
"//g"           =  "http://g"
"?y"            =  "http://a/b/c/d;p?y"
"g?y"           =  "http://a/b/c/g?y"
"#s"            =  "http://a/b/c/d;p?q#s"
"g#s"           =  "http://a/b/c/g#s"
"g?y#s"         =  "http://a/b/c/g?y#s"
";x"            =  "http://a/b/c/;x"
"g;x"           =  "http://a/b/c/g;x"
"g;x?y#s"       =  "http://a/b/c/g;x?y#s"
""              =  "http://a/b/c/d;p?q"
"."             =  "http://a/b/c/"
"./"            =  "http://a/b/c/"
".."            =  "http://a/b/"
"../"           =  "http://a/b/"
"../g"          =  "http://a/b/g"
"../.."         =  "http://a/"
"../../"        =  "http://a/"
"../../g"       =  "http://a/g"
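A quick way to experiment with these examples is urljoin from Python's urllib.parse, which resolves a reference against a base. The sketch below compares a handful of rows from the table with what urljoin returns. It is a spot check under the assumption of a recent Python 3, not a statement that urljoin matches the RFC in every corner case.

```python
from urllib.parse import urljoin

BASE = "http://a/b/c/d;p?q"

# Expected targets copied from the table above.
EXPECTED = {
    "g": "http://a/b/c/g",
    "./g": "http://a/b/c/g",
    "g/": "http://a/b/c/g/",
    "/g": "http://a/g",
    "../g": "http://a/b/g",
    "../..": "http://a/",
    "../../g": "http://a/g",
}

for ref, expected in EXPECTED.items():
    actual = urljoin(BASE, ref)
    status = "ok" if actual == expected else "DIFFERS"
    print(f"{ref!r:10} -> {actual}  [{status}]")
```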
5.4.2 Abnormal Examples

Base URI: http://a/b/c/d;p?q

1. ".." segments cannot climb above the root of the path, so they never change the authority

"../../../g"    =  "http://a/g"
"../../../../g" =  "http://a/g"

2. Parsers must remove "." and ".." only when they are complete path segments.

"/./g"          =  "http://a/g"
"/../g"         =  "http://a/g"
"g."            =  "http://a/b/c/g."
".g"            =  "http://a/b/c/.g"
"g.."           =  "http://a/b/c/g.."
"..g"           =  "http://a/b/c/..g"

3. Unnecessary or nonsensical uses of "." and ".."

"./../g"        =  "http://a/b/g"
"./g/."         =  "http://a/b/c/g/"
"g/./h"         =  "http://a/b/c/g/h"
"g/../h"        =  "http://a/b/c/h"
"g;x=1/./y"     =  "http://a/b/c/g;x=1/y"
"g;x=1/../y"    =  "http://a/b/c/y"

4. The query and fragment are not paths, so dot-segments inside them are left untouched

"g?y/./x"       =  "http://a/b/c/g?y/./x"
"g?y/../x"      =  "http://a/b/c/g?y/../x"
"g#s/./x"       =  "http://a/b/c/g#s/./x"
"g#s/../x"      =  "http://a/b/c/g#s/../x"

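5. Some parsers allow the scheme to appear in a relative reference when it matches the base URI's scheme. This is a loophole from earlier specifications; it may be tolerated for backward compatibility but should be avoided.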

  "http:g"        =  "http:g"         ; for strict parsers
                  /  "http://a/b/c/g" ; for backward compatibility

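6. Normalization and Comparison

Comparison is one of the most common operations on URIs. It happens, for example, every time a response cache is accessed or a browser checks its history to color a link: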
One of the most common operations on URIs is simple comparison: determining whether two URIs are equivalent without using the URIs to access their respective resource(s). A comparison is performed every time a response cache is accessed, a browser checks its history to color a link, or an XML parser processes tags within a namespace. Extensive normalization prior to comparison of URIs is often used by spiders and indexing engines to prune a search space or to reduce duplication of request actions and response storage.

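6.1 Equivalence

false negative: a miss; here, two equivalent URIs are reported as different.
false positive: a false alarm; here, two URIs that are not equivalent are reported as equivalent.

The same resource can be identified by different URIs: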
Even though it is possible to determine that two URIs are equivalent, URI comparison is not sufficient to determine whether two URIs identify different resources. For example, an owner of two different domain names could decide to serve the same resource from both, resulting in two different URIs. Therefore, comparison methods are designed to minimize false negatives while strictly avoiding false positives.

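6.2 Comparison Ladder

False negatives cannot be eliminated entirely; their likelihood can only be reduced.

Comparing URIs is like climbing a ladder: the higher the rung, the higher the cost and the lower the chance of a false negative.

6.2.1 Simple String Comparison

If the two strings are identical character by character, the URIs are equivalent.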

6.2.2 Syntax-Based Normalization

The following two URIs are equivalent:

example://a/b/c/%7Bfoo%7D
eXAMPLE://a/./b/../b/%63/%7bfoo%7d
6.2.2.1 Case Normalization

The hexadecimal digits of a percent-encoding are case-insensitive:
For all URIs, the hexadecimal digits within a percent-encoding triplet (e.g., “%3a” versus “%3A”) are case-insensitive

The scheme and host:
namely, that the scheme and host are case-insensitive and therefore should be normalized to lowercase. For example, the URI <HTTP://www.EXAMPLE.com/> is equivalent to <http://www.example.com/>.
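Case normalization alone can be sketched as follows: lowercase the scheme and the host, and uppercase the hexadecimal digits of every percent-encoding. The helper name normalize_case is hypothetical. The components are split with the regular expression from Appendix B of RFC 3986, any userinfo is left untouched, and dot-segment removal is deliberately not part of this sketch.

```python
import re

# Regular expression from RFC 3986, Appendix B.
URI_RE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")
PCT_RE = re.compile(r"%[0-9A-Fa-f]{2}")

def normalize_case(uri: str) -> str:
    """Lowercase scheme and host; uppercase hex digits in percent-encodings."""
    m = URI_RE.match(uri)
    scheme, authority, path = m.group(2), m.group(4), m.group(5)
    out = ""
    if scheme is not None:
        out += scheme.lower() + ":"
    if authority is not None:
        # Only the host (and port) part is lowercased, not the userinfo.
        userinfo, at, hostport = authority.rpartition("@")
        out += "//" + userinfo + at + hostport.lower()
    out += path + (m.group(6) or "") + (m.group(8) or "")
    # Done last so that lowercasing the host cannot undo it.
    return PCT_RE.sub(lambda t: t.group(0).upper(), out)

print(normalize_case("HTTP://www.EXAMPLE.com/"))
print(normalize_case("eXAMPLE://a/./b/../b/%63/%7bfoo%7d"))
```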

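6.2.2.3 Path Segment Normalization

The "." and ".." segments are intended only for relative references, and they must be removed during normalization.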

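6.2.3 Scheme-Based Normalization

The syntax and semantics of URIs vary from scheme to scheme, so the generic specification defines only what the schemes have in common.

For the "http" scheme, the default port is 80 and an empty path is equivalent to "/", so the following four URIs are equivalent: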

http://example.com
http://example.com/
http://example.com:/
http://example.com:80/
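The equivalence of these four URIs can be demonstrated with a small normalizer for the "http" scheme. The function name normalize_http is hypothetical, and the sketch only handles the two rules mentioned here (default or empty port, empty path); case normalization and other steps are left out.

```python
from urllib.parse import urlsplit, urlunsplit

def normalize_http(uri: str) -> str:
    """Drop an empty or default (80) port and replace an empty path with "/"."""
    parts = urlsplit(uri)
    netloc = parts.netloc
    if netloc.endswith(":80"):
        netloc = netloc[:-3]            # explicit default port
    elif netloc.endswith(":"):
        netloc = netloc[:-1]            # port delimiter with no digits
    path = parts.path or "/"
    return urlunsplit((parts.scheme, netloc, path, parts.query, parts.fragment))

uris = [
    "http://example.com",
    "http://example.com/",
    "http://example.com:/",
    "http://example.com:80/",
]
print({normalize_http(u) for u in uris})
```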

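6.2.4 Protocol-Based Normalization

Suppose a GET on http://example.com/data results in a redirect to a URI that differs only in the trailing slash,
http://example.com/data/. A client that observes this will likely regard the two URIs as equivalent in the future.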

7. Security Considerations
