Write Your Own HTTP Server (2) -- HTTP Protocol Analysis

Latest recommended article published on 2022-06-18 16:50:05

灿哥！


Original link: https://blog.csdn.net/zjy900507/article/details/88633469


Write Your Own HTTP Server (1) -- UNIX C Network Programming
Write Your Own HTTP Server (2) -- HTTP Protocol Analysis
Write Your Own HTTP Server (3) -- Code Implementation

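The first step in writing an HTTP server is to analyze the format of the HTTP protocol. Only then can the server correctly parse the HTTP packets it receives and return well-formed responses.

HTTP packet format

First, let's use netcat to capture the packet a browser sends to a server and see what it really looks like.

(1) Capturing an HTTP packet

Run the command: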

nc  -l  127.0.0.1 8888  >  http.data

This listens on local port 8888. Enter the URL http://127.0.0.1:8888 in the browser, and the browser sends a GET request to nc, which writes the received data to the file http.data. Viewed as a hex dump, the received content is as follows:

00000000: 4745 5420 2f20 4854 5450 2f31 2e31 0d0a  GET / HTTP/1.1..
00000010: 486f 7374 3a20 3132 372e 302e 302e 313a  Host: 127.0.0.1:
00000020: 3838 3838 0d0a 436f 6e6e 6563 7469 6f6e  8888..Connection
00000030: 3a20 6b65 6570 2d61 6c69 7665 0d0a 5570  : keep-alive..Up
00000040: 6772 6164 652d 496e 7365 6375 7265 2d52  grade-Insecure-R
00000050: 6571 7565 7374 733a 2031 0d0a 5573 6572  equests: 1..User
00000060: 2d41 6765 6e74 3a20 4d6f 7a69 6c6c 612f  -Agent: Mozilla/
00000070: 352e 3020 2858 3131 3b20 4c69 6e75 7820  5.0 (X11; Linux 
00000080: 7838 365f 3634 2920 4170 706c 6557 6562  x86_64) AppleWeb
00000090: 4b69 742f 3533 372e 3336 2028 4b48 544d  Kit/537.36 (KHTM
000000a0: 4c2c 206c 696b 6520 4765 636b 6f29 2043  L, like Gecko) C
000000b0: 6872 6f6d 652f 3539 2e30 2e33 3037 312e  hrome/59.0.3071.
000000c0: 3131 3520 5361 6661 7269 2f35 3337 2e33  115 Safari/537.3
000000d0: 360d 0a41 6363 6570 743a 2074 6578 742f  6..Accept: text/
000000e0: 6874 6d6c 2c61 7070 6c69 6361 7469 6f6e  html,application
000000f0: 2f78 6874 6d6c 2b78 6d6c 2c61 7070 6c69  /xhtml+xml,appli
00000100: 6361 7469 6f6e 2f78 6d6c 3b71 3d30 2e39  cation/xml;q=0.9
00000110: 2c69 6d61 6765 2f77 6562 702c 696d 6167  ,image/webp,imag
00000120: 652f 6170 6e67 2c2a 2f2a 3b71 3d30 2e38  e/apng,*/*;q=0.8
00000130: 0d0a 4163 6365 7074 2d45 6e63 6f64 696e  ..Accept-Encodin
00000140: 673a 2067 7a69 702c 2064 6566 6c61 7465  g: gzip, deflate
00000150: 2c20 6272 0d0a 4163 6365 7074 2d4c 616e  , br..Accept-Lan
00000160: 6775 6167 653a 207a 682d 434e 2c7a 683b  guage: zh-CN,zh;
00000170: 713d 302e 382c 6c61 3b71 3d30 2e36 2c64  q=0.8,la;q=0.6,d
00000180: 613b 713d 302e 340d 0a0d 0a              a;q=0.4....

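Each line starts with the byte offset, followed by the raw bytes received (in hexadecimal); the right-hand column shows the corresponding ASCII characters, with non-printable bytes such as CR and LF displayed as dots.

As the dump shows, the browser uses HTTP/1.1 by default, and the header information is plain text (ASCII-encoded).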
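
To make the relationship between the two columns concrete, here is a minimal Python sketch that formats bytes in the same layout as the dump above. The layout matches what the `xxd` tool prints, but the article does not say which tool produced the dump, so treat that as an assumption. The function names `xxd_line` and `xxd` are made up for this illustration.

```python
def xxd_line(offset: int, chunk: bytes) -> str:
    """Format up to 16 bytes as: offset, hex digits in 2-byte groups, ASCII."""
    hex_digits = chunk.hex()
    # Group the hex digits four at a time (two bytes per group).
    groups = [hex_digits[i:i + 4] for i in range(0, len(hex_digits), 4)]
    # A full line has 8 groups of 4 digits plus 7 separating spaces = 39 columns.
    hex_column = " ".join(groups).ljust(39)
    # Printable ASCII (0x20..0x7e) is shown as-is; every other byte becomes '.'.
    ascii_column = "".join(chr(b) if 0x20 <= b <= 0x7E else "." for b in chunk)
    return f"{offset:08x}: {hex_column}  {ascii_column}"


def xxd(data: bytes) -> str:
    """Format a whole byte string, 16 bytes per line."""
    return "\n".join(
        xxd_line(i, data[i:i + 16]) for i in range(0, len(data), 16)
    )


if __name__ == "__main__":
    print(xxd(b"GET / HTTP/1.1\r\nHost: 127.0.0.1:8888\r\n"))
```

Feeding the first 16 bytes of the captured request through `xxd_line` reproduces the first line of the dump: the text `GET / HTTP/1.1` followed by the CR and LF bytes (0d 0a), which show up as two dots.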

(2) Capturing an HTTPS packet

Run the command:

nc  -l  127.0.0.1  8888  >  https.data

Enter the URL https://127.0.0.1:8888 in the browser to obtain the data the browser sends to the server. The content is as follows:

00000000: 1603 0100 c2ae 0100 00c2 aa03 0328 6fc2  .............(o.
00000010: a227 0f31 c392 c388 c392 42c2 8dc2 9dc2  .'.1......B.....
00000020: 8275 c297 2324 c38f 484d 75c2 8b23 5bc2  .u..#$..HMu..#[.
00000030: aac3 98c3 a17c 0d70 6cc2 be00 001c 2a2a  .....|.pl.....**
00000040: c380 2bc3 802f c380 2cc3 8030 c38c c2a9  ..+../..,..0....
00000050: c38c c2a8 c380 13c3 8014 00c2 9c00 c29d  ................
00000060: 002f 0035 000a 0100 0065 c2aa c2aa 0000  ./.5.....e......
00000070: c3bf 0100 0100 0017 0000 0023 0000 000d  ...........#....
00000080: 0014 0012 0403 0804 0401 0503 0805 0501  ................
00000090: 0806 0601 0201 0005 0005 0100 0000 0000  ................
000000a0: 1200 0000 1000 0e00 0c02 6832 0868 7474  ..........h2.htt
000000b0: 702f 312e 3175 5000 0000 0b00 0201 0000  p/1.1uP.........
000000c0: 0a00 0a00 086a 6a00 1d00 1700 182a 2a00  .....jj......**.
000000d0: 0100 0a                                  ...

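As you can see, what the browser sends over HTTPS is a binary stream rather than text. The leading bytes 16 03 01 mark the start of a TLS handshake record, so this capture is the beginning of the TLS handshake rather than a readable HTTP header.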
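
A server listening on a plain TCP port can tell the two captures apart by looking at the first few bytes. The sketch below is only a heuristic under stated assumptions: it treats a leading 0x16 byte (the TLS handshake record type) followed by 0x03 as TLS, and it treats an ASCII first line ending in an `HTTP/...` token as an HTTP request. The function names are hypothetical and do not come from the article.

```python
def looks_like_tls_handshake(data: bytes) -> bool:
    """Heuristic: a TLS handshake record starts with content type 0x16
    followed by a version whose major byte is 0x03."""
    return len(data) >= 3 and data[0] == 0x16 and data[1] == 0x03


def looks_like_http_request(data: bytes) -> bool:
    """Heuristic: the first line is ASCII text whose last field starts with 'HTTP/'."""
    first_line = data.split(b"\r\n", 1)[0]
    last_field = first_line.rsplit(b" ", 1)[-1]
    return first_line.isascii() and last_field.startswith(b"HTTP/")


if __name__ == "__main__":
    https_start = bytes.fromhex("16030100c2ae0100")  # first bytes of the HTTPS capture
    http_start = b"GET / HTTP/1.1\r\nHost: 127.0.0.1:8888\r\n"
    print(looks_like_tls_handshake(https_start), looks_like_http_request(https_start))
    print(looks_like_tls_handshake(http_start), looks_like_http_request(http_start))
```

A plain-HTTP server like the one in this series could use such a check to reject TLS connections early instead of trying to parse them as text.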

This article analyzes only the HTTP/1.1 protocol.

(3) Analyzing the HTTP GET request packet

The request data the browser sends to the server is:

GET / HTTP/1.1
Host: 127.0.0.1:8888
Connection: keep-alive
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8
Accept-Encoding: gzip, deflate, br
Accept-Language: zh-CN,zh;q=0.8,la;q=0.6,da;q=0.4

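In an HTTP message, the first line is the start line. For a request, its format is:

<method> <request-URL> <version>

The fields are separated by spaces.

In this example, method is GET, request-URL is /, and version is HTTP/1.1.

Because we only entered an IP address and a port number in the browser, the requested resource defaults to /.

If you enter http://127.0.0.1:8888/xxx/yy?name=abc&age=23 in the browser, the value of request-URL becomes /xxx/yy?name=abc&age=23.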
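
The start-line format above can be parsed with a few lines of code. The following is a minimal Python sketch, not the implementation used later in this series; `parse_request_line` and `split_request_url` are hypothetical helper names, and the query string is decoded with the standard library's `urllib.parse`.

```python
from urllib.parse import parse_qs, urlsplit


def parse_request_line(line: str):
    """Split '<method> <request-URL> <version>' into its three fields."""
    parts = line.rstrip("\r\n").split(" ")
    if len(parts) != 3:
        raise ValueError(f"malformed request line: {line!r}")
    method, request_url, version = parts
    return method, request_url, version


def split_request_url(request_url: str):
    """Separate the path from the query string and decode the query fields."""
    pieces = urlsplit(request_url)
    return pieces.path, parse_qs(pieces.query)


if __name__ == "__main__":
    method, url, version = parse_request_line(
        "GET /xxx/yy?name=abc&age=23 HTTP/1.1\r\n"
    )
    print(method, url, version)
    print(split_request_url(url))
```

Rejecting any start line that does not have exactly three fields gives the server a natural place to answer with a 400 response.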

(4) Capturing an HTTP POST request packet

Next, let's capture a POST request packet and see how it differs from the GET request packet.

First, create an HTML file at:
/home/hbfeng/Code/Year2017/Mon07/Day19/x.html

The file content is:

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>Document</title>
</head>
<body>
    <form action="http://127.0.0.1:8888" method="POST">
        color: <input type="text" name="color">
        <input type="submit" value="提交" />
    </form>
</body>
</html>

Then start the server:

nc  -l  127.0.0.1  8888  >  post.dat

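Enter the following address in the browser:
file:///home/hbfeng/Code/Year2017/Mon07/Day19/x.html

The page shown below appears. Fill in the text box and click the submit button to obtain a POST packet:

[Figure: capturing the POST data]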

The content of the POST packet is:

POST / HTTP/1.1
Host: 127.0.0.1:8888
Connection: keep-alive
Content-Length: 12
Cache-Control: max-age=0
Origin: null
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36
Content-Type: application/x-www-form-urlencoded
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8
Accept-Encoding: gzip, deflate, br
Accept-Language: zh-CN,zh;q=0.8,la;q=0.6,da;q=0.4

color=yellow

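Compared with the GET request, the POST packet contains the following additional items that we will need when writing the code later:

Content-Length: 12 : the size of the HTTP body in bytes. A POST request places its data in the HTTP body in URL-encoded form; each field has the form fieldname=value, and fields are separated by &.
A blank line separates the HTTP headers from the HTTP body.
The body contains the form content color=yellow, whose length (12 bytes) is exactly the value of Content-Length.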
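
The three observations above translate directly into parsing steps: find the blank line, read Content-Length from the headers, then take that many bytes as the body. Here is a minimal Python sketch under the assumption that the whole request is already in memory; `parse_request` and `parse_form` are hypothetical names, not functions from this series.

```python
from urllib.parse import parse_qs


def parse_request(raw: bytes):
    """Split a raw HTTP/1.1 request into (request_line, headers, body)."""
    # The blank line (CRLF CRLF) marks the end of the headers.
    head, _sep, rest = raw.partition(b"\r\n\r\n")
    lines = head.decode("ascii").split("\r\n")
    request_line = lines[0]
    headers = {}
    for line in lines[1:]:
        name, _colon, value = line.partition(":")
        # Header names are case-insensitive, so store them in lower case.
        headers[name.strip().lower()] = value.strip()
    # Content-Length says how many body bytes belong to this request.
    length = int(headers.get("content-length", "0"))
    body = rest[:length]
    return request_line, headers, body


def parse_form(body: bytes):
    """Decode an application/x-www-form-urlencoded body into a dict of lists."""
    return parse_qs(body.decode("ascii"))


if __name__ == "__main__":
    raw = (
        b"POST / HTTP/1.1\r\n"
        b"Host: 127.0.0.1:8888\r\n"
        b"Content-Length: 12\r\n"
        b"Content-Type: application/x-www-form-urlencoded\r\n"
        b"\r\n"
        b"color=yellow"
    )
    request_line, headers, body = parse_request(raw)
    print(request_line, headers["content-length"], parse_form(body))
```

When reading from a real socket, the body may arrive in several pieces, so a server would keep reading until it has received Content-Length bytes after the blank line; this sketch leaves that loop out.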

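Server workflow

Now that we know the format of the data the browser sends us, our HTTP server can parse the packet, dynamically generate a page, and send it to the browser.

The server's overall workflow is shown in the figure below:

[Figure: tinyhttpd workflow]

Format of the data returned to the client

Knowing how the server runs, we next need to know what format of data the browser expects to receive from the server.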

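The server returns data to the client according to the HTTP protocol. For example, when the response code is 400, the returned content is:

HTTP/1.0 400 BAD REQUEST
Content-type: text/html

<!DOCTYPE>
<html>
  <!-- HTML content -->
  ... ...
</html>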
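
The response follows the same shape as a request: a status line, header lines, a blank line, then the body. The following Python sketch assembles such a response; `build_response` is a hypothetical helper written for illustration, and it emits only the Content-type header shown in the example above.

```python
def build_response(status: int, reason: str, html: str) -> bytes:
    """Assemble status line, headers, blank line, and HTML body into bytes."""
    head = (
        f"HTTP/1.0 {status} {reason}\r\n"  # status line: version, code, reason
        "Content-type: text/html\r\n"      # tells the browser the body is HTML
        "\r\n"                             # blank line ends the headers
    )
    return head.encode("ascii") + html.encode("utf-8")


if __name__ == "__main__":
    page = "<html><body>Bad request</body></html>"
    print(build_response(400, "BAD REQUEST", page).decode("utf-8"))
```

Since no Content-Length header is sent here, the client only knows the body has ended when the server closes the connection. A server that keeps connections open would also need to send Content-Length.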