The HTTP protocol? Applications on top of HTTP?


 

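         This document is rather disorganized. For lack of time I have not tidied it up; I only pulled material from books to answer the two question marks in the title. I will gradually improve and complete it later.

         The starting points for this write-up

1 Walk through the development of HTTP and see how HTTP/1.1 solves the problems of earlier versions, to get a rough picture of HTTP.

2 Get a rough idea of the basic applications built on HTTP, and of the pros and cons of the two approaches to developing them.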

 

Created: 9/11/2006

        

 

 

References:

Web应用开发原理与技术 (Principles and Techniques of Web Application Development)

RFC 2616

 

What is HTTP?

 

 

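How does HTTP/1.1 handle the problems that HTTP/1.0 could not?

Persistent connections

HTTP/1.1 connections are persistent by default. Either side can close one explicitly by sending the general header "Connection: close".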
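
The rule above can be sketched in a few lines of Python. This is only an illustration: the function name `is_persistent` and the dict-based header representation are my own assumptions, and the HTTP/1.0 "Keep-Alive" branch reflects the common HTTP/1.0 extension rather than anything stated in this article.

```python
def is_persistent(http_version, headers):
    """Decide whether a connection stays open after this message.

    http_version: e.g. "HTTP/1.1" (hypothetical representation)
    headers: dict of header name -> value (names matched case-insensitively)
    """
    lowered = {name.lower(): value for name, value in headers.items()}
    # Connection is a comma-separated list of tokens.
    tokens = {
        token.strip().lower()
        for token in lowered.get("connection", "").split(",")
        if token.strip()
    }
    if "close" in tokens:
        # An explicit "Connection: close" always wins.
        return False
    if http_version == "HTTP/1.1":
        # HTTP/1.1 is persistent unless told otherwise.
        return True
    # Assumption: HTTP/1.0 peers opt in with the Keep-Alive extension.
    return "keep-alive" in tokens
```

For example, `is_persistent("HTTP/1.1", {})` is true, while adding `{"Connection": "close"}` makes it false.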

 

Content negotiation

HTTP uses header fields such as Accept and Vary to implement both server-driven and client-driven (agent-driven) content negotiation.
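
A minimal sketch of server-driven negotiation on the Accept header, assuming the server holds a fixed list of media types it can produce. The helper names (`parse_accept`, `quality_for`, `negotiate`) are hypothetical, and the parsing is simplified: it handles only the `q` parameter and the `type/*` and `*/*` wildcards.

```python
def parse_accept(header):
    """Parse an Accept header into a list of (media_range, q) pairs."""
    prefs = []
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_range = parts[0].lower()
        if not media_range:
            continue
        q = 1.0  # q defaults to 1 when absent
        for param in parts[1:]:
            if param.lower().startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0
        prefs.append((media_range, q))
    return prefs


def quality_for(media_type, prefs):
    """Return the q value of the most specific range matching media_type."""
    main_type = media_type.partition("/")[0]
    best_specificity, best_q = -1, 0.0
    for media_range, q in prefs:
        if media_range == media_type:
            specificity = 2
        elif media_range == main_type + "/*":
            specificity = 1
        elif media_range == "*/*":
            specificity = 0
        else:
            continue
        if specificity > best_specificity:
            best_specificity, best_q = specificity, q
    return best_q


def negotiate(accept_header, available):
    """Pick the available media type the client prefers, or None."""
    prefs = parse_accept(accept_header)
    best, best_q = None, 0.0
    for media_type in available:
        q = quality_for(media_type, prefs)
        if q > best_q:
            best, best_q = media_type, q
    return best
```

A server that selects a representation this way would normally also send "Vary: Accept" so that caches know the response depends on that request header.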

 

Access authentication

The schemes and details of access authentication are outside the scope of HTTP/1.1 itself; they are specified in RFC 2617.

 

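Caching

HTTP/1.1 includes mechanisms that let clients, servers, and intermediaries such as proxies and gateways use caches in their communication wherever possible. Header fields such as If-Modified-Since and If-Range help keep cached content valid and accurate.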
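
As a rough illustration of how If-Modified-Since keeps a cached copy accurate, here is a sketch of the server side of a conditional GET. The function name `conditional_get` and the dict-based headers are assumptions for the example; `last_modified` must be a UTC-aware datetime.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def conditional_get(request_headers, last_modified):
    """Return (status, response_headers) for a resource last changed at last_modified.

    304 tells the client its cached copy is still valid; 200 means the
    full entity has to be sent again.
    """
    # HTTP dates have one-second resolution, so drop microseconds.
    last_modified = last_modified.replace(microsecond=0)
    headers = {"Last-Modified": format_datetime(last_modified, usegmt=True)}

    since = request_headers.get("If-Modified-Since")
    if since:
        try:
            since_dt = parsedate_to_datetime(since)
        except (TypeError, ValueError):
            since_dt = None  # an unparseable date is ignored
        if since_dt is not None:
            if since_dt.tzinfo is None:
                since_dt = since_dt.replace(tzinfo=timezone.utc)
            if last_modified <= since_dt:
                return 304, headers
    return 200, headers
```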

 

2 New message headers added

 

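③ What is an entity?

An entity is the information transferred as the payload of a request or response message. For example, when a user wants to view a web page, the request method in the HTTP request message and the status code in the response message are not payload; they are extra information passed between client and server in order to achieve the final goal of downloading the file. The HTML file the user wants to view, together with its metainformation (file size, last modification time, and so on), is the payload.

Entity information comes in two kinds: metadata describing the resource, which is sent as entity headers (entity-header), and the resource content itself, which is sent as the entity body (entity-body).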
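
To make the split concrete, the sketch below separates the entity headers of a message from its other headers. The set of names follows the entity-header list in RFC 2616 section 7.1; the function name `split_entity` and the dict representation are assumptions for the example.

```python
# Entity-header field names from RFC 2616 section 7.1, lowercased.
ENTITY_HEADERS = {
    "allow", "content-encoding", "content-language", "content-length",
    "content-location", "content-md5", "content-range", "content-type",
    "expires", "last-modified",
}


def split_entity(headers, body):
    """Split a message into (entity_headers, other_headers, entity_body)."""
    entity_headers = {}
    other_headers = {}
    for name, value in headers.items():
        if name.lower() in ENTITY_HEADERS:
            entity_headers[name] = value  # metadata about the resource
        else:
            other_headers[name] = value   # general/request/response headers
    return entity_headers, other_headers, body
```

For a response carrying "Server", "Content-Type", and "Last-Modified", only the last two end up in the entity part, alongside the body.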

 

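④ What is a cache?

A cache is the local store used to hold server response messages temporarily, together with the subsystem that controls storing, retrieving, and deleting them. A cache stores cacheable response messages so that future equivalent client requests get a shorter response time and use less network bandwidth. HTTP introduces caching mechanisms and matching controls to keep communication both efficient and correct.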
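
The definition above (a store plus store/retrieve/delete control) can be sketched very simply. This is a toy illustration, not a real HTTP cache: the names `ResponseCache` and `cached_get` are hypothetical, entries are keyed by URL only, and freshness and cacheability rules are ignored.

```python
class ResponseCache:
    """A local store of responses with store/retrieve/delete operations."""

    def __init__(self):
        self._entries = {}

    def store(self, url, response):
        self._entries[url] = response

    def retrieve(self, url):
        # Returns None on a cache miss.
        return self._entries.get(url)

    def delete(self, url):
        self._entries.pop(url, None)


def cached_get(url, cache, fetch_from_origin):
    """Serve from the cache when possible; otherwise ask the origin once."""
    response = cache.retrieve(url)
    if response is None:
        response = fetch_from_origin(url)  # the slow, bandwidth-consuming path
        cache.store(url, response)
    return response
```

The second request for the same URL never reaches `fetch_from_origin`, which is exactly the saving in response time and bandwidth the definition describes.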

 

 

 

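HTTP covers a great deal of ground, from basic definitions such as request and response messages, methods, and status codes to advanced features such as persistent connections, caching, and security.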

 

Characteristics of HTTP

1 Client/server model

2 Simple and fast

3 Content negotiation

4 Extensibility

5 Non-persistent connections (the HTTP/1.0 behavior; HTTP/1.1 connections are persistent by default)

6 Statelessness

 

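History of HTTP

HTTP/0.9           Supports only GET; obsolete

HTTP/1.0           RFC 1945

HTTP/1.1           RFC 2068 (later obsoleted by RFC 2616)

(Compared with HTTP/1.0) adds caching, persistent connections, new methods, and new header fields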

 

 

Message types and formats

generic-message = start-line
                  *(message-header CRLF)
                  CRLF
                  [ message-body ]

Request message:

Request = Request-Line
          *(( general-header | request-header | entity-header ) CRLF)
          CRLF
          [ message-body ]

Response message:

Response = Status-Line
           *(( general-header | response-header | entity-header ) CRLF)
           CRLF
           [ message-body ]
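
The grammar above maps almost directly onto code. The following is a simplified sketch, with hypothetical names `parse_message` and `parse_start_line`, that splits raw bytes into start line, headers, and body. It assumes well-formed input and does not handle folded (multi-line) header values.

```python
def parse_message(raw):
    """Split a raw HTTP message into (start_line, headers, body).

    raw: bytes laid out as start-line, header lines, an empty line, body.
    """
    head, _, body = raw.partition(b"\r\n\r\n")  # the lone CRLF ends the headers
    lines = head.decode("iso-8859-1").split("\r\n")
    start_line = lines[0]
    headers = []
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers.append((name.strip(), value.strip()))
    return start_line, headers, body


def parse_start_line(start_line):
    """Classify a start line as a Request-Line or a Status-Line."""
    parts = start_line.split(" ", 2)
    if parts[0].startswith("HTTP/"):
        # Status-Line = HTTP-Version SP Status-Code SP Reason-Phrase
        reason = parts[2] if len(parts) > 2 else ""
        return {"kind": "response", "version": parts[0],
                "status": int(parts[1]), "reason": reason}
    # Request-Line = Method SP Request-URI SP HTTP-Version
    method, uri, version = parts
    return {"kind": "request", "method": method, "uri": uri, "version": version}
```

Feeding it `b"GET /index.html HTTP/1.1\r\nHost: example.com\r\n\r\n"` yields a request start line, one header pair, and an empty body.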

 

 

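Related specifications:

MIME stands for Multipurpose Internet Mail Extensions. It is the collective name for a set of standards defined by the IETF (RFC 2045 through RFC 2049, five specifications in all). It defines a series of existing message types and provides an extension mechanism for adding new types that appear in the future.

         URI (Uniform Resource Identifier) is a standard mechanism for naming resources on the Internet.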
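
To see what an http URI names, one can take it apart with Python's standard `urllib.parse`. The helper `describe_uri` and the example host are my own assumptions; the port default of 80 for the http scheme is the standard one.

```python
from urllib.parse import urlsplit


def describe_uri(uri):
    """Break an http URI into the parts a client needs to send a request."""
    parts = urlsplit(uri)
    default_port = 80 if parts.scheme == "http" else None
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        "port": parts.port or default_port,
        "path": parts.path or "/",  # an empty path means the root
        "query": parts.query,
    }
```

For instance, `describe_uri("http://www.example.com:8080/docs/index.html?lang=en")` gives host `www.example.com`, port `8080`, path `/docs/index.html`, and query `lang=en`.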

 

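Using the messages defined in HTTP, clients and servers can, besides sharing and exchanging resources, also achieve the following:

Persistent connections

Content negotiation

Access authentication

Caching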

 

 

 

HTTP

The Hypertext Transfer Protocol (HTTP) is an application-level protocol for distributed, collaborative, hypermedia information systems. It is a generic, stateless, protocol which can be used for many tasks beyond its use for hypertext, such as name servers and distributed object management systems, through extension of its request methods, error codes and headers. A feature of HTTP is the typing and negotiation of data representation, allowing systems to be built independently of the data being transferred.

 

 

Purpose

The Hypertext Transfer Protocol (HTTP) is an application-level protocol for distributed, collaborative, hypermedia information systems. HTTP has been in use by the World-Wide Web global information initiative since 1990. The first version of HTTP, referred to as HTTP/0.9, was a simple protocol for raw data transfer across the Internet. HTTP/1.0, as defined by RFC 1945, improved the protocol by allowing messages to be in the format of MIME-like messages, containing metainformation about the data transferred and modifiers on the request/response semantics. However, HTTP/1.0 does not sufficiently take into consideration the effects of hierarchical proxies, caching, the need for persistent connections, or virtual hosts. In addition, the proliferation of incompletely-implemented applications calling themselves “HTTP/1.0” has necessitated a protocol version change in order for two communicating applications to determine each other’s true capabilities.

This specification defines the protocol referred to as “HTTP/1.1”. This protocol includes more stringent requirements than HTTP/1.0 in order to ensure reliable implementation of its features. Practical information systems require more functionality than simple retrieval, including search, front-end update, and annotation. HTTP allows an open-ended set of methods and headers that indicate the purpose of a request [47]. It builds on the discipline of reference provided by the Uniform Resource Identifier (URI) [3], as a location (URL) [4] or name (URN) [20], for indicating the resource to which a method is to be applied. Messages are passed in a format similar to that used by Internet mail [9] as defined by the Multipurpose Internet Mail Extensions (MIME) [7].

HTTP is also used as a generic protocol for communication between user agents and proxies/gateways to other Internet systems, including those supported by the SMTP [16], NNTP [13], FTP [18], Gopher [2], and WAIS [10] protocols. In this way, HTTP allows basic hypermedia access to resources available from diverse applications.

 

 

 

Overall Operation

The HTTP protocol is a request/response protocol. A client sends a request to the server in the form of a request method, URI, and protocol version, followed by a MIME-like message containing request modifiers, client information, and possible body content over a connection with a server. The server responds with a status line, including the message’s protocol version and a success or error code, followed by a MIME-like message containing server information, entity metainformation, and possible entity-body content. The relationship between HTTP and MIME is described in appendix 19.4.

Most HTTP communication is initiated by a user agent and consists of a request to be applied to a resource on some origin server. In the simplest case, this may be accomplished via a single connection (v) between the user agent (UA) and the origin server (O).

request chain ------------------------>

UA -------------------v------------------- O

<----------------------- response chain

A more complicated situation occurs when one or more intermediaries are present in the request/response chain. There are three common forms of intermediary: proxy, gateway, and tunnel. A proxy is a forwarding agent, receiving requests for a URI in its absolute form, rewriting all or part of the message, and forwarding the reformatted request toward the server identified by the URI. A gateway is a receiving agent, acting as a layer above some other server(s) and, if necessary, translating the requests to the underlying server’s protocol. A tunnel acts as a relay point between two connections without changing the messages; tunnels are used when the communication needs to pass through an intermediary (such as a firewall) even when the intermediary cannot understand the contents of the messages.

request chain -------------------------------------->

UA -----v----- A -----v----- B -----v----- C -----v----- O

<------------------------------------- response chain

The figure above shows three intermediaries (A, B, and C) between the user agent and origin server. A request or response message that travels the whole chain will pass through four separate connections. This distinction is important because some HTTP communication options may apply only to the connection with the nearest, non-tunnel neighbor, only to the end-points of the chain, or to all connections along the chain. Although the diagram is linear, each participant may be engaged in multiple, simultaneous communications. For example, B may be receiving requests from many clients other than A, and/or forwarding requests to servers other than C, at the same time that it is handling A’s request. Any party to the communication which is not acting as a tunnel may employ an internal cache for handling requests. The effect of a cache is that the request/response chain is shortened if one of the participants along the chain has a cached response applicable to that request. The following illustrates the resulting chain if B has a cached copy of an earlier response from O (via C) for a request which has not been cached by UA or A.

request chain ---------->

UA -----v----- A -----v----- B - - - - - - C - - - - - - O

<--------- response chain

Not all responses are usefully cacheable, and some requests may contain modifiers which place special requirements on cache behavior. HTTP requirements for cache behavior and cacheable responses are defined in section 13.

In fact, there are a wide variety of architectures and configurations of caches and proxies currently being experimented with or deployed across the World Wide Web. These systems include national hierarchies of proxy caches to save transoceanic bandwidth, systems that broadcast or multicast cache entries, organizations that distribute subsets of cached data via CD-ROM, and so on. HTTP systems are used in corporate intranets over high-bandwidth links, and for access via PDAs with low-power radio links and intermittent connectivity. The goal of HTTP/1.1 is to support the wide diversity of configurations already deployed while introducing protocol constructs that meet the needs of those who build web applications that require high reliability and, failing that, at least reliable indications of failure.

HTTP communication usually takes place over TCP/IP connections. The default port is TCP 80 [19], but other ports can be used. This does not preclude HTTP from being implemented on top of any other protocol on the Internet, or on other networks. HTTP only presumes a reliable transport; any protocol that provides such guarantees can be used; the mapping of the HTTP/1.1 request and response structures onto the transport data units of the protocol in question is outside the scope of this specification.

In HTTP/1.0, most implementations used a new connection for each request/response exchange. In HTTP/1.1, a connection may be used for one or more request/response exchanges, although connections may be closed for a variety of reasons (see section 8.1).

 

 

 

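HTTP application development and examples

HTTP runs through application development at many levels of the Web.

HTTP applications fall into three types: HTTP client programs, HTTP server programs, and server-side applications.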

 

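Some typical HTTP client programs are:

Web browsers, whose main purpose is to let users browse HTML documents on web servers.

Web document downloaders, whose main purpose is to help users download resources from remote servers quickly and efficiently, using techniques such as resumable downloads and multithreaded downloading. Examples: NetAnts, Net Vampire, and so on.

Web robots, which traverse the Web for purposes such as information retrieval and resource discovery. That is, starting from one or more URLs, they follow the hyperlinks on the Web depth-first or breadth-first and download all the web pages they reach.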
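
The breadth-first traversal a web robot performs can be sketched as below. To keep the example self-contained, the actual download is passed in as a `fetch` function (an assumption of this sketch, as are the names `LinkExtractor` and `crawl`); it should return the page's HTML as a string, or None if the page cannot be fetched.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect the href values of all <a> tags in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_pages=100):
    """Visit pages breadth-first from start_url; return URLs in visit order."""
    seen = {start_url}
    queue = deque([start_url])  # FIFO queue gives breadth-first order
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        visited.append(url)
        extractor = LinkExtractor()
        extractor.feed(html)
        for href in extractor.links:
            absolute = urljoin(url, href)  # resolve relative links
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return visited
```

Swapping `queue.popleft()` for `queue.pop()` turns the same loop into a depth-first traversal. The `seen` set is what stops the robot from looping forever on pages that link back to each other.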

 

 

 

 

 

 

 

 

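HTTP application development approaches

We divide HTTP application development into two kinds:

One uses a lower-level network programming interface, the Socket API.

The other uses application-level programming interfaces, such as the WinInet function library and the Internet Transfer ActiveX control on Windows.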
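
To give a feel for the application-level style, here is a sketch using Python's standard `http.client` as a stand-in for WinInet or the Internet Transfer control (that substitution is my assumption; the article itself is about the Windows libraries). A tiny local server from `http.server` is started only so the example runs without Internet access; `HelloHandler`, `fetch`, and `demo` are hypothetical names.

```python
import http.client
import http.server
import threading


class HelloHandler(http.server.BaseHTTPRequestHandler):
    """A throwaway server so the client below has something to talk to."""

    def do_GET(self):
        body = b"hello"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def fetch(host, port, path):
    """The application-level client: no socket calls, no manual parsing."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("GET", path)
        response = conn.getresponse()
        return response.status, response.getheader("Content-Type"), response.read()
    finally:
        conn.close()


def demo():
    server = http.server.HTTPServer(("127.0.0.1", 0), HelloHandler)  # port 0: pick a free port
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        return fetch("127.0.0.1", server.server_address[1], "/")
    finally:
        server.shutdown()
        server.server_close()
```

The whole client is the four lines inside `fetch`: the library builds the request, manages the connection, and parses the status line and headers.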

 

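Socket is a network-level programming interface that sits below the application protocols, so it can be used to build all kinds of applications, including FTP clients/servers and mail clients/servers.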

 

 

 

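Developing an HTTP application on Socket differs greatly from developing one on the Internet Transfer control or the WinInet library. With Socket, the main difficulties fall into two areas:

(1)         Explicitly implementing TCP/IP client/server communication with Socket.

The developer has to spend a lot of effort on the details of network programming, including: using the Socket functions, choosing between an iterative and a concurrent design, converting to and from network byte order, reassembling received data packets, and so on.

(2)         Explicitly parsing and constructing HTTP messages, both syntactically and semantically.

The developer has to understand HTTP thoroughly and implement it in the program, including: constructing and parsing HTTP request or response messages correctly at the syntactic level; and constructing and parsing them correctly at the semantic level, then executing the corresponding processing logic. In addition, the developer is free to modify or extend HTTP to implement custom features.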
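
Both difficulties show up even in a very small socket-based client. The sketch below is a simplified illustration in Python, with hypothetical function names: it builds the request by hand, reassembles the response from however many chunks `recv` delivers, and parses the status line and headers itself. It assumes the server closes the connection after responding (hence "Connection: close") and does not handle chunked transfer encoding.

```python
import socket


def build_get_request(host, path="/"):
    """Construct the request message by hand (difficulty 2: syntax)."""
    lines = [
        "GET {} HTTP/1.1".format(path),
        "Host: {}".format(host),
        "Connection: close",  # ask the server to close so we know when to stop reading
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def read_until_closed(sock):
    """Reassemble the byte stream from many recv() calls (difficulty 1)."""
    chunks = []
    while True:
        data = sock.recv(4096)
        if not data:  # empty read: the peer closed the connection
            break
        chunks.append(data)
    return b"".join(chunks)


def parse_response(raw):
    """Parse status line and headers by hand (difficulty 2 again)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    version, status, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return int(status), reason, headers, body


def http_get(host, port=80, path="/"):
    """Tie the pieces together over a real TCP connection."""
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(build_get_request(host, path))
        return parse_response(read_until_closed(sock))
```

Compared with an application-level library, every step here is the developer's responsibility, but nothing stops the developer from sending non-standard methods or headers, which is the flexibility the paragraph above mentions.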

 

 

 

 

 

 

 

 

 

Terminology

This specification uses a number of terms to refer to the roles played by participants in, and objects of, the HTTP communication.

connection

A transport layer virtual circuit established between two programs for the purpose of communication.

message

The basic unit of HTTP communication, consisting of a structured sequence of octets matching the syntax defined in section 4 and transmitted via the connection.

request

An HTTP request message, as defined in section 5.

response

An HTTP response message, as defined in section 6.

resource

A network data object or service that can be identified by a URI, as defined in section 3.2. Resources may be available in multiple representations (e.g. multiple languages, data formats, size, and resolutions) or vary in other ways.

entity

The information transferred as the payload of a request or response. An entity consists of metainformation in the form of entity-header fields and content in the form of an entity-body, as described in section 7.

representation

An entity included with a response that is subject to content negotiation, as described in section 12. There may exist multiple representations associated with a particular response status.

content negotiation

The mechanism for selecting the appropriate representation when servicing a request, as described in section 12. The representation of entities in any response can be negotiated (including error responses).

variant

A resource may have one, or more than one, representation(s) associated with it at any given instant. Each of these representations is termed a ‘variant.’ Use of the term ‘variant’ does not necessarily imply that the resource is subject to content negotiation.

client

A program that establishes connections for the purpose of sending requests.

user agent

The client which initiates a request. These are often browsers, editors, spiders (web-traversing robots), or other end user tools.

server

An application program that accepts connections in order to service requests by sending back responses. Any given program may be capable of being both a client and a server; our use of these terms refers only to the role being performed by the program for a particular connection, rather than to the program’s capabilities in general. Likewise, any server may act as an origin server, proxy, gateway, or tunnel, switching behavior based on the nature of each request.

origin server

The server on which a given resource resides or is to be created.

proxy

An intermediary program which acts as both a server and a client for the purpose of making requests on behalf of other clients. Requests are serviced internally or by passing them on, with possible translation, to other servers. A proxy MUST implement both the client and server requirements of this specification. A “transparent proxy” is a proxy that does not modify the request or response beyond what is required for proxy authentication and identification. A “non-transparent proxy” is a proxy that modifies the request or response in order to provide some added service to the user agent, such as group annotation services, media type transformation, protocol reduction, or anonymity filtering. Except where either transparent or non-transparent behavior is explicitly stated, the HTTP proxy requirements apply to both types of proxies.

gateway

A server which acts as an intermediary for some other server. Unlike a proxy, a gateway receives requests as if it were the origin server for the requested resource; the requesting client may not be aware that it is communicating with a gateway.

tunnel

An intermediary program which is acting as a blind relay between two connections. Once active, a tunnel is not considered a party to the HTTP communication, though the tunnel may have been initiated by an HTTP request. The tunnel ceases to exist when both ends of the relayed connections are closed.

cache

A program’s local store of response messages and the subsystem that controls its message storage, retrieval, and deletion. A cache stores cacheable responses in order to reduce the response time and network bandwidth consumption on future, equivalent requests. Any client or server may include a cache, though a cache cannot be used by a server that is acting as a tunnel.

cacheable

A response is cacheable if a cache is allowed to store a copy of the response message for use in answering subsequent requests. The rules for determining the cacheability of HTTP responses are defined in section 13. Even if a resource is cacheable, there may be additional constraints on whether a cache can use the cached copy for a particular request.

first-hand

A response is first-hand if it comes directly and without unnecessary delay from the origin server, perhaps via one or more proxies. A response is also first-hand if its validity has just been checked directly with the origin server.

explicit expiration time

The time at which the origin server intends that an entity should no longer be returned by a cache without further validation.

heuristic expiration time

An expiration time assigned by a cache when no explicit expiration time is available.

age

The age of a response is the time since it was sent by, or successfully validated with, the origin server.

freshness lifetime

The length of time between the generation of a response and its expiration time.

fresh

A response is fresh if its age has not yet exceeded its freshness lifetime.

stale

A response is stale if its age has passed its freshness lifetime.
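
The terms age, freshness lifetime, fresh, and stale fit together as a small calculation. The sketch below assumes the explicit-expiration case only: the lifetime comes from "Cache-Control: max-age" if present, otherwise from Expires minus Date, and heuristic expiration is not modelled (the function then returns 0). The names `freshness_lifetime` and `is_fresh` are hypothetical, and the strict comparison follows the formula in RFC 2616 section 13.2.4.

```python
import re
from email.utils import parsedate_to_datetime


def freshness_lifetime(headers):
    """Return the freshness lifetime of a response in seconds."""
    cache_control = headers.get("Cache-Control", "")
    match = re.search(r"(?:^|,)\s*max-age\s*=\s*(\d+)", cache_control, re.IGNORECASE)
    if match:
        # max-age takes precedence over Expires.
        return int(match.group(1))
    if "Expires" in headers and "Date" in headers:
        try:
            delta = (parsedate_to_datetime(headers["Expires"])
                     - parsedate_to_datetime(headers["Date"]))
        except (TypeError, ValueError):
            return 0  # unparseable dates: treat as already expired
        return max(0, int(delta.total_seconds()))
    return 0  # no explicit expiration; heuristics are out of scope here


def is_fresh(age_seconds, headers):
    """A response is fresh while freshness_lifetime > age; otherwise stale."""
    return freshness_lifetime(headers) > age_seconds
```

With `{"Cache-Control": "max-age=60"}`, a response aged 30 seconds is fresh and one aged 90 seconds is stale.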

semantically transparent

A cache behaves in a “semantically transparent” manner, with respect to a particular response, when its use affects neither the requesting client nor the origin server, except to improve performance. When a cache is semantically transparent, the client receives exactly the same response (except for hop-by-hop headers) that it would have received had its request been handled directly by the origin server.

validator

A protocol element (e.g., an entity tag or a Last-Modified time) that is used to find out whether a cache entry is an equivalent copy of an entity.

upstream/downstream

Upstream and downstream describe the flow of a message: all messages flow from upstream to downstream.

inbound/outbound

Inbound and outbound refer to the request and response paths for messages: “inbound” means “traveling toward the origin server”, and “outbound” means “traveling toward the user agent”.

 

 