HTTP, an Authoritative Analysis: URLs and Resources


1. Definitions

  • Uniform resource locators (URLs) are the standardized names for the Internet’s resources. URLs point to pieces of electronic information, telling you where they are located and how to interact with them.
  • URLs are the usual human access point to HTTP and other protocols: a person points a browser at a URL and, behind the scenes, the browser sends the appropriate protocol messages to get the resource that the person wants.
  • URLs actually are a subset of a more general class of resource identifier called a uniform resource identifier, or URI.
    • URIs are a general concept comprised of two main subsets, URLs and URNs.
      • URLs identify resources by describing where resources are located, whereas URNs identify resources by name, regardless of where they currently reside.
  • URLs provide a way to uniformly name resources. Most URLs have the same “scheme://server location/path” structure. So, for every resource out there and every way to get those resources, you have a single way to name each resource so that anyone can use that name to find it.
  • The HTTP specification uses the more general concept of URIs as its resource identifiers; in practice, however, HTTP applications deal only with the URL subset of URIs.
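A minimal sketch of the URL/URN distinction, using Python's standard `urllib.parse`. Both identifiers below are hypothetical examples chosen for illustration, not taken from the notes above. The point is only that a URL carries a location (a host), while a URN carries a name and nothing else.

```python
from urllib.parse import urlsplit

# Hypothetical identifiers, for illustration only.
url = urlsplit("http://www.example.com/specials/saw-blade.gif")
urn = urlsplit("urn:isbn:0451450523")

# A URL says where the resource lives: scheme, host, and path on that host.
print(url.scheme, url.netloc, url.path)

# A URN only names the resource: the host component is empty.
print(urn.scheme, repr(urn.netloc), urn.path)
```

Both strings are valid URIs, which is why the same parser accepts them; only the first one tells a client where to connect.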


2. URL Syntax

  • Most URL schemes base their URL syntax on this nine-part general format: <scheme>://<user>:<password>@<host>:<port>/<path>;<params>?<query>#<frag>
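  • Almost no URLs contain all these components. The three most important parts of a
    URL are the scheme, the host, and the path. In practice the scheme and host are the essential ones: the path can sometimes be omitted, in which case the client requests “/” and the server typically answers with a default document such as index.html.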
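As a rough illustration of how a URL breaks down into its components, here is a short sketch with Python's standard `urllib.parse.urlsplit`. The URL, user name, and password are hypothetical values made up for the example. Note that the parser does not apply HTTP defaults itself; the fallback to port 80 and path "/" shown at the end is something the client does.

```python
from urllib.parse import urlsplit

# Hypothetical URL that exercises most of the general format.
parts = urlsplit(
    "http://joe:secret@www.example.com:8080/tools/index.html?item=12731#drills"
)

print(parts.scheme)    # http
print(parts.username)  # joe
print(parts.password)  # secret
print(parts.hostname)  # www.example.com
print(parts.port)      # 8080
print(parts.path)      # /tools/index.html
print(parts.query)     # item=12731
print(parts.fragment)  # drills

# With no explicit port or path, the parser reports them as missing.
# Falling back to the HTTP defaults is left to the client.
bare = urlsplit("http://www.example.com")
port = bare.port or 80
path = bare.path or "/"
print(port, path)      # 80 /
```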



  • Hosts and Ports:
    • The host component identifies the host machine on the Internet that has access to the resource. The name can be provided as a hostname or as an IP address.
    • The port component identifies the network port on which the server is listening. For
      HTTP, which uses the underlying TCP protocol, the default port is 80.
  • Paths:
    • The path component of the URL specifies where on the server machine the resource lives. The path often resembles a hierarchical filesystem path.
    • The path component for HTTP URLs can be divided into path segments separated by “/” characters (again, as in a file path on a Unix filesystem). Each path segment can have its own params component.
  • Parameters:
    • As noted above, each path segment of an HTTP URL can carry its own params: name=value pairs that follow a “;” within the segment. For example: ;sale=false/index.html;graphics=true
  • Query Strings:
    • The query component of the URL is passed along to a gateway resource, with the
      path component of the URL identifying the gateway resource. Basically, gateways
      can be thought of as access points to other applications.
    • By convention, many gateways expect the query string to be formatted as a series of “name=value” pairs, separated by “&” characters.
    • The figure below shows one gateway scenario.
  • Fragments:
    • To allow referencing of parts or fragments of a resource, URLs support a frag component to identify pieces within a resource.
      • Some resource types, such as HTML, can be divided further than just the resource level.
    • A fragment dangles off the right-hand side of a URL, preceded by a # character.
    • Because HTTP servers generally deal only with entire objects, not with fragments of objects, clients don’t pass fragments along to servers. After your browser gets the entire resource from the server, it then uses the fragment to display the part of the resource in which you are interested.
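The params, query, and fragment rules above can be sketched in a few lines of Python with the standard `urllib.parse` module. The URL below is a hypothetical one assembled for illustration (host, segment names, and query values are assumptions). Splitting the per-segment params by hand is also just one possible approach, since `urlsplit` leaves them inside the path.

```python
from urllib.parse import parse_qsl, urldefrag, urlsplit

# Hypothetical URL with per-segment params, a query string, and a fragment.
url = ("http://www.example.com/hammers;sale=false/index.html;graphics=true"
       "?item=12731&color=blue#drills")

# The client keeps the fragment for itself and sends only the rest.
request_url, fragment = urldefrag(url)
print(request_url)
print(fragment)

parts = urlsplit(request_url)

# The query string decodes into the name=value pairs a gateway expects.
pairs = parse_qsl(parts.query)
print(pairs)

# Params belong to individual path segments, so split each segment on ";".
segments = []
for segment in parts.path.strip("/").split("/"):
    name, _, params = segment.partition(";")
    segments.append((name, params))
print(segments)
```

After the response arrives, a browser would use the saved `fragment` value to scroll to the matching part of the document; the server never sees it.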

3. A Sea of Schemes


4. Review

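  • URLs are a subset of URIs. Looking ahead, the emphasis is expected to shift toward URNs, which identify a resource by its name rather than by its location.
  • For HTTP, the most important parts of a URL are the scheme, host, port, path, and query string.
  • With a fragment, the server still sends the complete HTML file to the client; it is the client that picks out the fragment and presents it to the user.
  • The gateway behind a query string is an intermediary that obtains resources from other applications (such as a database); it may itself be a program or a script.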

This corrects the inaccurate statement in my earlier Tinyhttp post that the query string is "the parameters".
