- Uniform resource locators (URLs) are the standardized names for the Internet’s resources. URLs point to pieces of electronic information, telling you where they are located and how to interact with them.
- URLs are the usual human access point to HTTP and other protocols: a person points a browser at a URL and, behind the scenes, the browser sends the appropriate protocol messages to get the resource that the person wants.
- URLs are actually a subset of a more general class of resource identifier called a uniform resource identifier, or URI.
- URIs are a general concept made up of two main subsets, URLs and URNs.
- URLs identify resources by describing where resources are located, whereas URNs identify resources by name, regardless of where they currently reside.
- URLs provide a way to uniformly name resources. Most URLs have the same “scheme://server location/path” structure. So, for every resource out there and every way to get those resources, you have a single way to name each resource so that anyone can use that name to find it.
- The HTTP specification uses the more general concept of URIs as its resource identifiers; in practice, however, HTTP applications deal only with the URL subset of URIs.
2. URL Syntax
- Most URL schemes base their URL syntax on this nine-part general format: “scheme://user:password@host:port/path;params?query#frag”.
- Almost no URLs contain all these components. The three most important parts of a URL are the scheme, the host, and the path.
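To make the general format concrete, here is a minimal sketch that splits a URL into its components with Python's standard `urllib.parse` module. The URL, including the user name, password, port, and query values, is made up for illustration.

```python
from urllib.parse import urlparse

# Hypothetical URL that exercises every component of the general format.
url = ("http://joe:secret@www.joes-hardware.com:8080"
       "/tools/hammers;sale=false?item=12731&color=blue#specs")

parts = urlparse(url)

# Map the parsed result onto the component names used in these notes.
components = {
    "scheme": parts.scheme,
    "user": parts.username,
    "password": parts.password,
    "host": parts.hostname,
    "port": parts.port,
    "path": parts.path,
    "params": parts.params,
    "query": parts.query,
    "frag": parts.fragment,
}

for name, value in components.items():
    print(f"{name:8} = {value}")
```

Note that `urlparse` only separates params from the last path segment; params on earlier segments stay inside `path`.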
- Hosts and Ports:
- The host component identifies the host machine on the Internet that has access to the resource. The name can be provided as a hostname (for example, “www.joes-hardware.com”) or as an IP address.
- The port component identifies the network port on which the server is listening. For HTTP, which runs over TCP, the default port is 80.
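The default-port rule can be sketched as follows. The `DEFAULT_PORTS` table and the `host_and_port` helper are hypothetical names introduced for this example, and the IP address comes from a range reserved for documentation.

```python
from urllib.parse import urlsplit

# Illustrative table of default ports; an assumption for this sketch,
# not an exhaustive registry.
DEFAULT_PORTS = {"http": 80, "https": 443}

def host_and_port(url):
    """Return (host, port), falling back to the scheme's default port."""
    parts = urlsplit(url)
    port = parts.port if parts.port is not None else DEFAULT_PORTS[parts.scheme]
    return parts.hostname, port

# Host given as a hostname, port omitted: the HTTP default applies.
print(host_and_port("http://www.joes-hardware.com/index.html"))

# Host given as an IP address with an explicit port.
print(host_and_port("http://192.0.2.10:8080/index.html"))
```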
- The path component of the URL specifies where on the server machine the resource lives. The path often resembles a hierarchical filesystem path.
- The path component for HTTP URLs can be divided into path segments separated by “/” characters (again, as in a file path on a Unix filesystem). Each path segment can have its own params component.
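Per-segment params can be pulled apart with a few lines of string handling. This is a rough sketch: `split_segments` is a hypothetical helper, and the URL is an illustrative one whose two path segments each carry a param introduced by “;”.

```python
from urllib.parse import urlsplit

def split_segments(url):
    """Return a list of (segment, params) pairs for the URL's path."""
    path = urlsplit(url).path  # urlsplit leaves ";params" inside the path
    result = []
    # Skip the empty string that precedes the leading "/".
    for segment in path.split("/")[1:]:
        name, _, params = segment.partition(";")
        result.append((name, params))
    return result

# Illustrative URL: each path segment has its own params.
url = "http://www.joes-hardware.com/hammers;sale=false/index.html;graphics=true"

for name, params in split_segments(url):
    print(f"segment={name!r} params={params!r}")
```

`urlsplit` is used here rather than `urlparse` because it does not try to separate params itself, which leaves every segment intact for manual splitting.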
- Query Strings:
- The query component of the URL is passed along to a gateway resource, with the path component of the URL identifying the gateway resource. Basically, gateways can be thought of as access points to other applications.
- By convention, many gateways expect the query string to be formatted as a series of “name=value” pairs, separated by “&” characters.
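The “name=value” convention is what `urllib.parse.parse_qsl` decodes. A small sketch, assuming a made-up gateway path and made-up query values:

```python
from urllib.parse import urlsplit, parse_qsl

# Hypothetical URL: the path names the gateway, the query carries its input.
url = "http://www.joes-hardware.com/inventory-check.cgi?item=12731&color=blue"

parts = urlsplit(url)
gateway = parts.path            # the resource that receives the query
pairs = parse_qsl(parts.query)  # list of (name, value) tuples, in order

print("gateway:", gateway)
for name, value in pairs:
    print(f"  {name} = {value}")
```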
- To allow referencing of parts or fragments of a resource, URLs support a frag component to identify pieces within a resource.
- Some resource types, such as HTML, can be divided further than just the resource level.
- A fragment dangles off the right-hand side of a URL, preceded by a # character.
- Because HTTP servers generally deal only with entire objects, not with fragments of objects, clients don’t pass fragments along to servers. After your browser gets the entire resource from the server, it then uses the fragment to display the part of the resource in which you are interested.
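The client-side handling of fragments can be sketched with `urllib.parse.urldefrag`, which separates the part of the URL a client would request from the fragment it keeps for itself. The URL and fragment name are made up for illustration.

```python
from urllib.parse import urldefrag

# Hypothetical URL pointing at one section of an HTML page.
url = "http://www.joes-hardware.com/tools.html#drills"

# The client requests the whole resource and keeps the fragment locally.
request_url, fragment = urldefrag(url)

print("sent to server:", request_url)
print("kept by client:", fragment)
```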
3. A Sea of Schemes
scheme, host, port, path, query string