Understanding the HTTP Protocol

iiprogram

Original link: http://www.windowsnetworking.com/articles_tutorials/Understanding-HTTP-Protocol-Part1.html

HTTP the protocol

Computer networks have been around for quite a few decades now, but it was not until the early 1990s that the Internet, as we know it today, began to take off. Around this time the Internet slowly became better known, and its popularity with the public increased. I have heard many reasons given for why the Internet grew in leaps and bounds. I will go with the urban legend that it was not until the appearance of adult content websites that the Internet really grew popular.

What did these websites all have in common though? Well, they all used HTTP, the Hypertext Transfer Protocol, to conduct their business. While HTTP is carried by TCP, which in turn rides on IP, it is HTTP itself that allows you to interact with a web server. Your favorite web browser speaks HTTP to the web server, and you in turn get the web page that you requested.

There is much more to it than this, as a great deal of information needs to be exchanged between the web client (e.g. Internet Explorer) and the web server (e.g. IIS). It is this exchange of information that helps set things up between the browser and server. You could think of it as being like the TCP/IP handshake, as there are various details sent back and forth to facilitate the web transactions.

So, on that note, let's get ready to take a look at what the web browser sends to the web server after the TCP/IP handshake is over. Once our three-way TCP/IP handshake is done, the browser, Firefox in this case (as the User-Agent line in the packet shows), sends its information to the server. Right below this sentence is a packet showing such an example. I will comment on the information contained within it directly below it.

10:14:50.479387 IP (tos 0x0, ttl 128, id 11651, offset 0, flags [DF], proto: TCP (6), length: 529) 192.168.1.100.1722 > 72.14.207.99.80: P, cksum 0x899f (correct), 3141402438:3141402927(489) ack 3866955399 win 65535
0x0000: 4500 0211 2d83 4000 8006 f1e5 c0a8 0164 E....@........d
0x0010: 480e cf63 06ba 0050 bb3d ff46 e67d 0e87 H..c...P.=.F.}..
0x0020: 5018 ffff 899f 0000 4745 5420 2f20 4854 P.......GET./.HT
0x0030: 5450 2f31 2e31 0d0a 486f 7374 3a20 7777 TP/1.1..Host:.ww
0x0040: 772e 676f 6f67 6c65 2e63 610d 0a55 7365 w.google.ca..Use
0x0050: 722d 4167 656e 743a 204d 6f7a 696c 6c61 r-Agent:.Mozilla
0x0060: 2f35 2e30 2028 5769 6e64 6f77 733b 2055 /5.0.(Windows;.U
0x0070: 3b20 5769 6e64 6f77 7320 4e54 2035 2e31 ;.Windows.NT.5.1
0x0080: 3b20 656e 2d55 533b 2072 763a 312e 372e ;.en-US;.rv:1.7.
0x0090: 3130 2920 4765 636b 6f2f 3230 3035 3037 10).Gecko/200507
0x00a0: 3136 2046 6972 6566 6f78 2f31 2e30 2e36 16.Firefox/1.0.6
0x00b0: 0d0a 4163 6365 7074 3a20 7465 7874 2f78 ..Accept:.text/x
0x00c0: 6d6c 2c61 7070 6c69 6361 7469 6f6e 2f78 ml,application/x
0x00d0: 6d6c 2c61 7070 6c69 6361 7469 6f6e 2f78 ml,application/x
0x00e0: 6874 6d6c 2b78 6d6c 2c74 6578 742f 6874 html+xml,text/ht
0x00f0: 6d6c 3b71 3d30 2e39 2c74 6578 742f 706c ml;q=0.9,text/pl
0x0100: 6169 6e3b 713d 302e 382c 696d 6167 652f ain;q=0.8,image/
0x0110: 706e 672c 2a2f 2a3b 713d 302e 350d 0a41 png,*/*;q=0.5..A
0x0120: 6363 6570 742d 4c61 6e67 7561 6765 3a20 ccept-Language:.
0x0130: 656e 2d75 732c 656e 3b71 3d30 2e35 0d0a en-us,en;q=0.5..
0x0140: 2d2d 2d2d 2d2d 2d2d 2d2d 2d2d 2d2d 2d3a ---------------:
0x0150: 202d 2d2d 2d2d 2d2d 2d2d 2d2d 2d0d 0a41 .------------..A
0x0160: 6363 6570 742d 4368 6172 7365 743a 2049 ccept-Charset:.I
0x0170: 534f 2d38 3835 392d 312c 7574 662d 383b SO-8859-1,utf-8;
0x0180: 713d 302e 372c 2a3b 713d 302e 370d 0a4b q=0.7,*;q=0.7..K
0x0190: 6565 702d 416c 6976 653a 2033 3030 0d0a eep-Alive:.300..
0x01a0: 436f 6e6e 6563 7469 6f6e 3a20 6b65 6570 Connection:.keep
0x01b0: 2d61 6c69 7665 0d0a 436f 6f6b 6965 3a20 -alive..Cookie:.
0x01c0: 5052 4546 3d49 443d 3031 6130 3832 3234 PREF=ID=01a08224
0x01d0: 3534 6163 6232 3933 3a4c 443d 656e 3a54 54acb293:LD=en:T
0x01e0: 4d3d 3131 3231 3633 3830 3934 3a4c 4d3d M=1121638094:LM=
0x01f0: 3131 3231 3633 3830 3934 3a53 3d6a 2d30 1121638094:S=j-0
0x0200: 3970 3851 6870 5953 5f43 7253 500d 0a0d 9p8QhpYS_CrSP...
0x0210: 0a

If you remember from reading the articles on TCP/IP that I wrote, you will realize that the application layer data, in this case HTTP, begins after the TCP header. Here that is at offset 0x0028, where the bytes 4745 5420 appear as "GET " in the ASCII column. The entire remainder of the packet consists of HTTP data. We will now explain the various words that we see in the ASCII content of this packet.

GET / HTTP/1.1

This says that the web client is issuing a GET request to the web server, i.e. it wants something from it (here /, the site's root page), and that the web client understands HTTP 1.1. There is also HTTP 1.0, but that has largely been replaced by the newer HTTP 1.1. HTTP/2 has since been standardized as a successor, but everything in this capture is HTTP 1.1.

Host: www.google.ca

This names the website that the client wants to reach. HTTP 1.1 requires the Host header so that a single server, on a single IP address, can host several websites.
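To make the structure of the request concrete, here is a minimal sketch in Python that splits a request like the captured one into its request line and headers. The function name parse_request_head is my own invention for illustration, and the sample bytes are a shortened copy of what appears in the ASCII column of the dump. It is not how any particular web server does it.

```python
def parse_request_head(raw: bytes):
    """Split an HTTP/1.x request head into (method, target, version, headers)."""
    # The head ends at the first blank line (CRLF CRLF).
    head, _, _body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    # The request line has three parts: method, request target, protocol version.
    method, target, version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return method, target, version, headers


# Shortened copy of the captured request (only three of its lines).
raw_request = (
    b"GET / HTTP/1.1\r\n"
    b"Host: www.google.ca\r\n"
    b"Connection: keep-alive\r\n"
    b"\r\n"
)

method, target, version, headers = parse_request_head(raw_request)
print(method, target, version)
print(headers["host"])
```

Header names are lowercased here because HTTP treats them as case-insensitive.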

User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1;

This tells the web server that the web client identifies itself as Mozilla/5.0 running on Windows NT 5.1, or as it is more commonly known, Windows XP, which is indeed what my operating system is.

en-US; rv: 1.7.10)

This tells the web server that the browser build is localized for US English (en-US is a language and locale tag, not a character set), and that the revision of the Gecko rendering engine is 1.7.10.

Gecko/20050716 Firefox/1.0.6

This lists the exact web browser that the client is using, which as listed above is Firefox, version 1.0.6. The number after Gecko is the build date of the rendering engine.

Accept: text/xml, application/xml, application/xhtml+xml

The client here tells the web server which formats it can accept, for both text and application types, i.e. it will accept text in XML format and so on.

text/html; q=0.9, text/plain; q=0.8, image/png, */*;q=0.5

In the above the client is saying that it will accept text in both HTML and plain formats, images in PNG format, and finally any other format (*/*). The q=0.9, q=0.8 and so on are quality values, a weighting defined by HTTP itself that the browser uses to indicate its preference among MIME types. These values are decimals between 0 and 1, and a type listed without a q value, such as image/png here, carries the default weight of 1.

Accept-Charset: ISO-8859-1, utf-8; q=0.7, *;q=0.7

Listed here are the character sets that the web client will understand: ISO-8859-1 is preferred, while utf-8 and any other character set (*) are acceptable at a weight of 0.7.
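The weighting is easier to see once the header is sorted by preference. The following is a small Python sketch, with a function name (parse_accept) that I made up for this example, which reads the Accept value from the captured packet and orders the media types by their q value. It ignores parameters other than q, so treat it as an illustration rather than a complete parser.

```python
def parse_accept(value: str):
    """Return (media_type, q) pairs sorted from most to least preferred."""
    prefs = []
    for item in value.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_type, q = parts[0], 1.0  # a missing q parameter means q=1
        for param in parts[1:]:
            key, _, val = param.partition("=")
            if key.strip() == "q":
                q = float(val)
        prefs.append((media_type, q))
    # The sort is stable, so entries with equal weight keep their original order.
    return sorted(prefs, key=lambda pair: pair[1], reverse=True)


# The Accept value as it appears in the captured request.
accept = ("text/xml,application/xml,application/xhtml+xml,"
          "text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5")

preferences = parse_accept(accept)
for media_type, q in preferences:
    print(f"{q:.1f}  {media_type}")
```

Note how image/png, which has no q value, ends up ahead of text/html even though it appears later in the header.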

Keep-Alive: 300
Connection: keep-alive

This tells the web server that the client would like the connection kept alive for 300 seconds, or until the client explicitly ends it. In HTTP 1.1 connections are persistent by default and remain open until one side closes them, unlike HTTP 1.0, which by default terminated the connection after every request. It makes a lot more sense to simply keep it open for a designated time, or until the web client terminates it.

Cookie: PREF=ID=01a0822454acb293:LD=en:TM=1121638094...

This last piece of the puzzle is the cookie and its values. Many people have some odd conceptions when it comes to cookies. All a cookie is, is simple text, flat ASCII if you will. There is nothing executable about it. What a cookie does contain is whatever information the server chose to store in it, e.g. an identifier or your preferences. Quickly said, there are also two types of cookies: session based and persistent. The first is only good for as long as your browser stays open, and the second will remain on your computer's hard drive for as long as it has been programmed to. You can also look at the cookies stored by your browser if you so choose. If you ever clicked "yes" on one of those "would you like us to remember your username and password?" questions, then please realize that this is done via a cookie that the server stores on your computer.
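To drive home that a cookie is nothing but text, here is a short Python sketch that takes the Cookie value from the captured packet apart. The helper name parse_cookie_header is hypothetical. Splitting the PREF value on ":" into smaller key=value fields is purely my assumption from how the value looks; the server is free to use any format it likes, and the browser never interprets it.

```python
def parse_cookie_header(value: str) -> dict:
    """Split a Cookie header value into a {name: value} dictionary."""
    cookies = {}
    for pair in value.split(";"):
        # Only the first '=' separates the name from the value.
        name, sep, val = pair.strip().partition("=")
        if sep:
            cookies[name] = val
    return cookies


# The Cookie value as it appears in the ASCII column of the captured request.
cookie_header = ("PREF=ID=01a0822454acb293:LD=en:TM=1121638094:"
                 "LM=1121638094:S=j-09p8QhpYS_CrSP")

cookies = parse_cookie_header(cookie_header)
print(cookies)

# Assumption: the value itself looks like colon-separated key=value fields.
fields = dict(field.split("=", 1) for field in cookies["PREF"].split(":"))
print(fields)
```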

Conclusion

Well, this brings us to the end of part one of the HTTP article series. In this article we covered what is seen in the ASCII content of a packet when a web client first connects to a web server. Many fields of interest are listed, as you can see, and they are of particular interest to someone who may be a hacker. Remember, the user agent string will reveal what your browser type and operating system are. This certainly helps a hacker who may be trying a client side exploit on you ;-). Till I see you in part two, have fun!

HTTP the protocol

Well, we saw in part one of the HTTP article that there are certain values that a web client will send to a web server. This is done in order to make sure that both the client and the server will speak the same language, as it were. These specific values are sent by the client, and then parsed by the web server once received.

Now seems a good time to point out that most everything on the Internet today communicates in a particular fashion, namely by observing the client/server model. In the case of HTTP there is, for example, Internet Explorer the client, and IIS the server. Another case would be the client Mozilla Firefox, and Apache the web server. There are some notable exceptions to this rule though; can you think of one?

Is all the world a client/server model?

Much to the chagrin of organizations like the RIAA and MPAA, peer-to-peer (P2P) protocols do not observe the client/server model. Peer to peer works much like its name suggests, i.e. through direct peer-to-peer connections. P2P does not use centralized servers, but is rather solely made up of client computers. The closest thing to a server that P2P uses is a supernode, which is once again, you guessed it, a client. Enough said on P2P though, and now back to the topic at hand! Oh! Don't be fooled into thinking that Trojans operate like P2P applications due to their common usage of ephemeral ports. Trojans very much operate in a client/server configuration. Before I get sidetracked once again, back to HTTP we go.

The devil is in the details

Well, the explanation above now brings us back to our packet, seen below. We shall now begin to go through the various fields, as seen in the ASCII content, and expand upon their meaning. Like the title above this paragraph says, the devil is very much in the details. Please note that I will make my comments directly beneath the packet.

10:14:50.526262 IP (tos 0x10, ttl 55, id 29186, offset 0, flags [none], proto: TCP (6), length: 1470) 72.14.207.99.80 > 192.168.1.100.1722: ., cksum 0xbe07 (correct), 3866955399:3866956829(1430) ack 3141402927 win 6432
0x0000: 4510 05be 7202 0000 3706 32aa 480e cf63 E...r...7.2.H..c
0x0010: c0a8 0164 0050 06ba e67d 0e87 bb3e 012f ...d.P...}...>./
0x0020: 5010 1920 be07 0000 4854 5450 2f31 2e31 P.......HTTP/1.1
0x0030: 2032 3030 204f 4b0d 0a43 6163 6865 2d43 .200.OK..Cache-C
0x0040: 6f6e 7472 6f6c 3a20 7072 6976 6174 650d ontrol:.private.
0x0050: 0a43 6f6e 7465 6e74 2d54 7970 653a 2074 .Content-Type:.t
0x0060: 6578 742f 6874 6d6c 0d0a 5365 7276 6572 ext/html..Server
0x0070: 3a20 4757 532f 322e 310d 0a54 7261 6e73 :.GWS/2.1..Trans
0x0080: 6665 722d 456e 636f 6469 6e67 3a20 6368 fer-Encoding:.ch
0x0090: 756e 6b65 640d 0a44 6174 653a 2053 6174 unked..Date:.Sat
0x00a0: 2c20 3330 204a 756c 2032 3030 3520 3134 ,.30.Jul.2005.14
0x00b0: 3a31 343a 3530 2047 4d54 0d0a 0d0a 6132 :14:50.GMT....a2
0x00c0: 630d 0a3c 6874 6d6c 3e3c 6865 6164 3e3c c..<html><head><
0x00d0: 6d65 7461 2068 7474 702d 6571 7569 763d meta.http-equiv=
0x00e0: 2263 6f6e 7465 6e74 2d74 7970 6522 2063 "content-type".c

Well, to begin with we should know where in the packet the HTTP data actually starts. If you still have the TCP/IP and tcpdump flyer I recommended you download, we can easily find out where the data starts. The 06 in the tenth byte of the IP header (the second byte of 3706 on the first line) tells us that the protocol being ferried about is TCP. The first byte of the IP header, 45, gives an IP header length of 5 32-bit words, or 20 bytes, so the TCP header begins at offset 0x0014. The value 5 in the leading nibble of 5010 at offset 0x0020 is the TCP header length, again in 32-bit words, so the TCP header is 20 bytes long and carries no options. With this info in hand we know that the HTTP data starts at 4854 (offset 0x0028) and carries on to the end of the packet itself. This is a quick and easy way to orient yourself to the contents of the packet. With that now dealt with, let us start breaking out the server's response as seen in the packet above.
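The same orientation exercise can be done in a few lines of Python. This sketch takes the first 48 bytes of the server's packet, copied from the hex dump above, and works out where the payload begins from the two header length fields. The variable names are my own, and it assumes an IPv4 packet carrying TCP, which is what this capture contains.

```python
# First 48 bytes of the server's packet, copied from the hex dump.
packet = bytes.fromhex(
    "451005be72020000370632aa480ecf63"
    "c0a80164005006bae67d0e87bb3e012f"
    "50101920be070000485454502f312e31"
)

# Low nibble of the first byte: IP header length in 32-bit words.
ip_header_len = (packet[0] & 0x0F) * 4
# Tenth byte of the IP header: the protocol number (6 means TCP).
protocol = packet[9]
# High nibble of the 13th TCP header byte: TCP header length in 32-bit words.
tcp_header_len = (packet[ip_header_len + 12] >> 4) * 4
payload_offset = ip_header_len + tcp_header_len

print("IP header length: ", ip_header_len)
print("Protocol number:  ", protocol)
print("TCP header length:", tcp_header_len)
print("Payload offset:    0x%04x" % payload_offset)
print("Payload begins:   ", packet[payload_offset:])
```

A TCP header length above 20 bytes would mean TCP options are present, and the payload would start correspondingly later.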

Time to bust out the info!

HTTP/1.1 200 OK

The line above appears in the ASCII content of the packet, starting at offset 0x0028. It says that the web server is speaking HTTP version 1.1. It also means that the document the web client requested has been found, and is included in the response. The numerical value 200 is actually a status code. More to follow on status codes, and their role, later.
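As a small illustration, the status line can be split into its three parts with a few lines of Python. The function name parse_status_line is made up for this sketch, which assumes a well-formed HTTP/1.x status line.

```python
def parse_status_line(line: str):
    """Split a status line into (version, numeric code, reason phrase)."""
    parts = line.rstrip("\r\n").split(" ", 2)
    version, code = parts[0], int(parts[1])
    # The reason phrase is only informational and may be missing.
    reason = parts[2] if len(parts) > 2 else ""
    return version, code, reason


version, code, reason = parse_status_line("HTTP/1.1 200 OK\r\n")
print(version, code, reason)
print("success" if 200 <= code < 300 else "not a success")
```

Clients are expected to act on the numeric code; the text after it is there for humans.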

Cache-Control: private

This cryptic little field means that the document sent to the web client is not to be cached by a proxy, and is intended only for the user requesting the document. There is a whole lot more to caching, and how it works. Interesting reading if one is so inclined.

Content-Type: text/html

Shown above is what the server is telling the client, i.e. that the included document being sent is in text/html format. That way the web client will know how to render the information.

Server: GWS/2.1

Identified here is the type of server, or server software, that is being used by, you guessed it, the web server. In this case it is the server software used by Google.

Transfer-Encoding: chunked

In HTTP 1.1 chunked transfer encoding is supported. What does it do though, you ask? Well, simply put, chunked transfer encoding will modify the body of a message so that it can be transferred as a series of chunks. Each chunk has its own size indicator, written in hexadecimal; the a2c just before <html> in the packet above is the size of the first chunk. In contrast a normal HTTP transfer will contain a "Content-Length" field indicating the amount of data being transferred.

Date: Sat, 30 Jul 2005 14:14:50 GMT

Well, we can infer from this line that it would be the date and time in GMT as seen on the server. That one was fairly simple to guess at, and be correct. Were they all to be that easy!

<html><head><meta http-equiv="content-type"

For those of you who may be unfamiliar with HTML code, and what it looks like, be aware that the line above is indeed HTML code. A pretty good giveaway that you are seeing HTML is the use of angle brackets, < and >, around each tag. It is via this HTML code that your web browser, e.g. Internet Explorer, knows how to properly render the page to you. The HTML determines the color of the page, and how all the information itself is formatted, be it in paragraph form, bullets, or a table.

Should you wish to check out all the HTML contained in the web page you are presently reading (this document!) please click on "View" in your web browser, then click on "Source". Doing so will display the web page in its source code format. Pretty neat eh! If you want to have more fun, then simply paste the web page's source code into, say, Notepad, make some modifications, save it as an .html file, and open that file in your browser. You will see that you have now modified the contents of your copy of the page. Once again that would elicit a "pretty neat" comment from me.
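Here is a minimal Python sketch of how a receiver might put a chunked body back together. The function decode_chunked and the "Hello, world" sample are my own, made up for illustration. The sketch assumes the whole body is already in memory and ignores any trailer headers after the final chunk.

```python
def decode_chunked(body: bytes) -> bytes:
    """Reassemble a chunked message body (trailer headers are ignored)."""
    out = bytearray()
    pos = 0
    while True:
        line_end = body.index(b"\r\n", pos)
        # The size is hexadecimal; anything after ';' is a chunk extension.
        size = int(body[pos:line_end].split(b";")[0], 16)
        if size == 0:
            break  # a zero-sized chunk marks the end of the body
        start = line_end + 2
        out += body[start:start + size]
        pos = start + size + 2  # skip the CRLF that follows the chunk data
    return bytes(out)


# Made-up sample: two chunks of 5 and 7 bytes, then the terminating chunk.
example = b"5\r\nHello\r\n7\r\n, world\r\n0\r\n\r\n"
decoded = decode_chunked(example)
print(decoded)

# The first chunk size seen in the captured response, converted to decimal.
first_chunk_size = int("a2c", 16)
print(first_chunk_size)
```

The a2c from the capture works out to 2604 bytes, more than fits in this one 1470-byte packet, so the first chunk continues in the packets that follow.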

Well on that note we will break this article at this point. I will cover in the last part of the series on HTTP more of the minutiae of HTTP, and ways of playing, or modifying HTTP requests. Till then keep a close eye on your packets!

HTTP the protocol

Well, over the course of the past two articles on HTTP that I have written, it is becoming clear that there is a fair amount to the HTTP protocol. It is certainly what I would term a "dense" protocol, simply due to the fact there is a lot to it, much like DNS is a dense protocol due to its many resource records and the functions they fulfill. You may be thinking "holy cow! Is there yet more to HTTP?" Indeed there is, and that is pretty neat if you ask me. Having a protocol such as HTTP perform so many various tasks via its built-in fields, such as the ones we have already covered, is no small feat. Therefore my hat goes off to the developers of the HTTP protocol, in all of its revisions. They are the true talents without a doubt.

With that being said, just what else is there to cover? We shall approach some of the remaining parts of the HTTP protocol one piece at a time. Also, as promised in part two, I shall quickly cover the usage of an HTTP proxy such as Spike Proxy. This excellent tool was written by Dave Aitel, who also happens to be one of the premier talents today in the world of network security.

What are status codes?

Like many protocols, HTTP works on the client/server model. Using this model, there logically would be a need to convey error conditions, as they are encountered during the session between the web client and web server. Makes sense, doesn't it? For if you did not have such status codes things could quickly get very messy. The status codes used by many application layer protocols are broken down into five broad categories, as follows:

Status code series 1xx (informational)
Status code series 2xx (success)
Status code series 3xx (redirection)
Status code series 4xx (client error)
Status code series 5xx (server error)

Each application layer protocol usually has specific error conditions assigned to the categories noted above. In our case they are as follows;

1xx Series

Code    100      Continue
Code    101      Switching Protocols

2xx Series

Code    200      OK
Code    201      Created
Code    202      Accepted
Code    203      Non-Authoritative Information

There are quite a few more as this goes up to the 500 series. They can all be found at this link. I hope you don't cringe when you see that it is the actual RFC for HTTP. The definitive source for any protocol is always the RFC for it. Found there will be all of the nitty, gritty details that I enjoy reading.
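If you would rather not memorize the list, Python's standard library carries the registered codes in http.HTTPStatus. The short sketch below looks up the reason phrase for a code and derives its series from the first digit. The describe function and the CLASSES table are names I made up for this example.

```python
from http import HTTPStatus

# The five series, keyed by the first digit of the status code.
CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}


def describe(code: int) -> str:
    """Return 'code phrase [series]' for a numeric status code."""
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:
        phrase = "(code not known to this Python version)"
    return f"{code} {phrase} [{CLASSES.get(code // 100, 'Unknown')}]"


for code in (100, 101, 200, 201, 202, 203, 404, 500):
    print(describe(code))
```

Even when a client does not recognize a specific code, the first digit alone tells it which series the response belongs to.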

Client requests and commands

Now that we have covered the GET request, as seen in the packet which contained the web client's request to the web server, it is worth knowing that there are many more request types that a web client can issue. Let's cover them! Following is a list of potential web client requests or actions to a web server.

GET
The web client is requesting a resource that is on the web server

POST
This illustrates that the web client is sending information to the web server. It is quite often used when, say, filling in an online form.

PUT
This occurs when a web client is sending a replacement document to the web server.

HEAD
You will see this when the web client wants some information about a resource on the server and is not requesting the resource itself.

DELETE
Well this one is pretty easy, and is used to delete a document from the web server.

TRACE
This one you won't normally see, but it is used when the web client wants the proxies to declare themselves. It is often used for debugging purposes.

OPTIONS
You will see this when the web client wants to know what other methods can be used for a document on the server.

CONNECT
Rounding out the various client methods is CONNECT. This is used when a web client wants to connect to an HTTPS server via a proxy.

There are quite a few methods that a web client can use to chat with a web server as we can now see. That being said, you will normally only see the GET request and sometimes the HEAD if all you do is normal browsing. All of these are of use though if you are trying to perhaps debug a web application.
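A safe way to watch a few of these methods in action is to talk to a throwaway server on your own machine. The Python sketch below starts a tiny server on the loopback address using only the standard library, then sends it a GET, a HEAD and an OPTIONS request. DemoHandler, BODY and try_methods are names invented for this example, and the server's behavior is deliberately simplistic.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

BODY = b"<html><body>hello</body></html>"


class DemoHandler(BaseHTTPRequestHandler):
    """A toy handler that answers GET, HEAD and OPTIONS for any path."""

    def _send_headers(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(BODY)))
        self.end_headers()

    def do_GET(self):
        self._send_headers()
        self.wfile.write(BODY)   # GET returns the headers and the document

    def do_HEAD(self):
        self._send_headers()     # HEAD returns the same headers, but no body

    def do_OPTIONS(self):
        self.send_response(204)
        self.send_header("Allow", "GET, HEAD, OPTIONS")
        self.end_headers()

    def log_message(self, *args):
        pass                     # keep the demo output quiet


def try_methods():
    """Send one request per method and collect (status, Allow header, body)."""
    server = HTTPServer(("127.0.0.1", 0), DemoHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    results = {}
    try:
        for method in ("GET", "HEAD", "OPTIONS"):
            conn = http.client.HTTPConnection(
                "127.0.0.1", server.server_port, timeout=5)
            conn.request(method, "/")
            resp = conn.getresponse()
            results[method] = (resp.status, resp.getheader("Allow"), resp.read())
            conn.close()
    finally:
        server.shutdown()
        server.server_close()
    return results


results = try_methods()
for method, (status, allow, body) in results.items():
    print(method, status, allow, len(body))
```

The HEAD response carries the same Content-Length as the GET response but an empty body, which is exactly the point of the method.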

Someone stole my cookie!

It probably wasn't the cookie monster either. There are quite a few misconceptions about cookies, as I alluded to earlier in this article series on HTTP. One of the biggest I find is just, what are they and what do they do? It always pays to be as educated as possible, and using that as my segue, let's discuss cookies!

Just what is a cookie anyways? Well, simply put, a cookie is a piece of plain text that the web server has the browser store on the client computer. It is used to store session variables such as your username and password, amongst other information. Cookies are not defined by the core HTTP specification itself, but they are now pretty much considered synonymous with it. Some websites force you into accepting cookies for the website to function properly. So in some cases it is not wise to deny cookies completely as part of your security settings.

There are two types of cookies; the session cookie, which is only valid for the length of the browser connection to the website, and the persistent cookie, which will survive beyond the session termination. It is important to realize, as I mentioned before that a cookie is not an executable. That being said the cookies themselves should ideally be encrypted, so as to negate casual dissection of them.
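From the server's side, the difference between the two types comes down to whether the Set-Cookie header carries a lifetime. Here is a minimal sketch using Python's standard http.cookies module. The cookie names session_id and remember_me and their values are invented for the example.

```python
from http.cookies import SimpleCookie

jar = SimpleCookie()

# No Max-Age or Expires attribute: a session cookie, gone when the browser closes.
jar["session_id"] = "abc123"

# With a lifetime: a persistent cookie, here kept for 30 days.
jar["remember_me"] = "1"
jar["remember_me"]["max-age"] = 30 * 24 * 3600
jar["remember_me"]["path"] = "/"

for morsel in jar.values():
    print("Set-Cookie:", morsel.OutputString())
```

Both headers are still plain text; the only thing that makes the second cookie persistent is the lifetime attribute the server attached to it.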

Using Spike proxy

I mentioned earlier that one way of furthering your knowledge of HTTP is to use a tool such as Spike Proxy. What this tool will allow you to do is intercept your web client requests, and modify them. You don't want to be doing this against a real web server, or else the owners might get irate with you! What I suggest is that you simply download a version of Apache, and install it on a computer at your home. Please be aware though that you will need a JRE on your computer to use Spike Proxy. Once this is done you are free to now invoke Spike Proxy, and start modifying your HTTP requests to see how your Apache web server will react to them.
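To give a feel for the kind of modification an intercepting proxy lets you make, here is a tiny Python sketch that rewrites one header in a raw request before it would be passed on. This is not Spike Proxy's code or interface. The set_header function and the my-test-client/0.1 value are made up for illustration, and the request is aimed at localhost on the assumption that you are testing against your own server.

```python
def set_header(raw_request: bytes, name: str, value: str) -> bytes:
    """Return a copy of raw_request with one header replaced (or added)."""
    head, sep, body = raw_request.partition(b"\r\n\r\n")
    lines = head.split(b"\r\n")
    prefix = name.lower().encode("ascii") + b":"
    # Keep the request line and every header except the one being replaced.
    kept = [line for line in lines[1:] if not line.lower().startswith(prefix)]
    new_line = f"{name}: {value}".encode("iso-8859-1")
    return b"\r\n".join([lines[0], *kept, new_line]) + sep + body


# A request like the captured one, pointed at a local test server instead.
original = (
    b"GET / HTTP/1.1\r\n"
    b"Host: localhost\r\n"
    b"User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.10) "
    b"Gecko/20050716 Firefox/1.0.6\r\n"
    b"\r\n"
)

modified = set_header(original, "User-Agent", "my-test-client/0.1")
print(modified.decode("iso-8859-1"))
```

Sending the modified bytes to your own Apache install and watching its access log is a simple way to see that the server believes whatever the client tells it.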

Final thoughts

Being involved in the computer network industry, in whatever form that might take, means that you must constantly be upgrading your skills, whether that means learning the new Windows web server or further developing your knowledge of a specific protocol. The underlying theme is that you must set aside some time to constantly seek out new information to learn. Staying still in this industry does not bode well for one's longevity in it. I sincerely hope that this article series was of interest to you. As always I welcome your feedback, both good and bad. Till next time!

About Don Parker

Don Parker, GCIA GCIH specializes in matters of intrusion detection, and incident handling. He has also enjoyed a role as guest speaker at various network security conferences, and writing for various online and print media on matters of computer security. You can contact Don Parker at don@windowsecurity.com