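First, how it all happened:
I've recently been working on a network security module that needs to process network traffic.
I started with the browser, capturing its data.
For someone like me who knows nothing about the web, analyzing HTML is pure torture.
Luckily there are packet capture tools, so I captured the traffic and analyzed that.
Everything worked, even browser downloads could be captured and analyzed properly. The chunked transfer mode (Transfer-Encoding: chunked) was annoying, but it could be handled too.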
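For reference, reassembling a chunked body looks roughly like the sketch below. The function name decode_chunked is made up for illustration, and it assumes the whole body is already in memory and that trailer headers can be ignored; it is not the code from my module.

```python
def decode_chunked(body: bytes) -> bytes:
    """Reassemble a body that was sent with Transfer-Encoding: chunked."""
    out = bytearray()
    pos = 0
    while True:
        line_end = body.index(b"\r\n", pos)
        # The size line is hexadecimal and may carry ";extension" parameters.
        size = int(body[pos:line_end].split(b";", 1)[0].strip(), 16)
        pos = line_end + 2
        if size == 0:
            break  # last chunk; any trailer headers after it are ignored
        out += body[pos:pos + size]
        pos += size + 2  # skip the chunk data and the CRLF that follows it
    return bytes(out)


if __name__ == "__main__":
    print(decode_chunked(b"4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n"))
```

A real capture arrives in pieces, so in practice the parser would have to keep state between packets instead of indexing into one complete buffer.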
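Finally it was usable, and I breathed a sigh of relief.
Then I suddenly found that some sites couldn't be processed. Huh, what's going on? Fine, back to capturing packets.
Found Content-Encoding: gzip. Ugh, one look and it's obviously compressed.
Now that's trouble: to process compressed data I have to decompress it, process it, and then compress it again.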
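The decompress, process, recompress round trip would look something like this for gzip. It is only a sketch: rewrite_gzip_body and the transform callback are hypothetical names, and it assumes the complete gzip body has already been collected.

```python
import gzip
from typing import Callable


def rewrite_gzip_body(body: bytes, transform: Callable[[bytes], bytes]) -> bytes:
    """Decompress a gzip-encoded body, apply transform, then compress it again."""
    plain = gzip.decompress(body)
    return gzip.compress(transform(plain))


if __name__ == "__main__":
    original = gzip.compress(b"<p>hello world</p>")
    changed = rewrite_gzip_body(original, lambda b: b.replace(b"world", b"there"))
    print(gzip.decompress(changed))
```

The recompressed body will almost never have the same length as the original, so any Content-Length header (or the chunk sizes) would need to be rewritten as well.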
Let's dig through the protocol spec first:
http://www.w3.org/Protocols/rfc2616/rfc2616.html
5.3 Request Header Fields
The request-header fields allow the client to pass additional information about the request, and about the client itself, to the server. These fields act as request modifiers, with semantics equivalent to the parameters on a programming language method invocation.
request-header = Accept                   ; Section 14.1
               | Accept-Charset           ; Section 14.2
               | Accept-Encoding          ; Section 14.3
               | Accept-Language          ; Section 14.4
               | Authorization            ; Section 14.8
               | Expect                   ; Section 14.20
               | From                     ; Section 14.22
               | Host                     ; Section 14.23
               | If-Match                 ; Section 14.24
               | If-Modified-Since        ; Section 14.25
               | If-None-Match            ; Section 14.26
               | If-Range                 ; Section 14.27
               | If-Unmodified-Since      ; Section 14.28
               | Max-Forwards             ; Section 14.31
               | Proxy-Authorization      ; Section 14.34
               | Range                    ; Section 14.35
               | Referer                  ; Section 14.36
               | TE                       ; Section 14.39
               | User-Agent               ; Section 14.43
14.3 Accept-Encoding
The Accept-Encoding request-header field is similar to Accept, but restricts the content-codings (section 3.5) that are acceptable in the response.
Accept-Encoding  = "Accept-Encoding" ":"
                   1#( codings [ ";" "q" "=" qvalue ] )
codings          = ( content-coding | "*" )
Examples of its use are:
Accept-Encoding: compress, gzip
Accept-Encoding:
Accept-Encoding: *
Accept-Encoding: compress;q=0.5, gzip;q=1.0
Accept-Encoding: gzip;q=1.0, identity; q=0.5, *;q=0
A server tests whether a content-coding is acceptable, according to an Accept-Encoding field, using these rules:
1. If the content-coding is one of the content-codings listed in
the Accept-Encoding field, then it is acceptable, unless it is
accompanied by a qvalue of 0. (As defined in section 3.9, a
qvalue of 0 means "not acceptable.")
2. The special "*" symbol in an Accept-Encoding field matches any
available content-coding not explicitly listed in the header
field.
3. If multiple content-codings are acceptable, then the acceptable
content-coding with the highest non-zero qvalue is preferred.
4. The "identity" content-coding is always acceptable, unless
specifically refused because the Accept-Encoding field includes
"identity;q=0", or because the field includes "*;q=0" and does
not explicitly include the "identity" content-coding. If the
Accept-Encoding field-value is empty, then only the "identity"
encoding is acceptable.
If an Accept-Encoding field is present in a request, and if the server cannot send a response which is acceptable according to the Accept-Encoding header, then the server SHOULD send an error response with the 406 (Not Acceptable) status code.
If no Accept-Encoding field is present in a request, the server MAY assume that the client will accept any content coding. In this case, if "identity" is one of the available content-codings, then the server SHOULD use the "identity" content-coding, unless it has additional information that a different content-coding is meaningful to the client.
Note: If the request does not include an Accept-Encoding field,
and if the "identity" content-coding is unavailable, then
content-codings commonly understood by HTTP/1.0 clients (i.e.,
"gzip" and "compress") are preferred; some older clients
improperly display messages sent with other content-codings. The
server might also make this decision based on information about
the particular user-agent or client.
Note: Most HTTP/1.0 applications do not recognize or obey qvalues
associated with content-codings. This means that qvalues will not
work and are not permitted with x-gzip or x-compress.
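To make the four rules above concrete, here is a small sketch of how a server could evaluate an Accept-Encoding value. The function names parse_accept_encoding and is_acceptable are made up for illustration, and the code follows only the RFC 2616 text quoted here; real servers may behave differently.

```python
from typing import Dict, Optional


def parse_accept_encoding(value: str) -> Dict[str, float]:
    """Map each listed coding (lowercased) to its qvalue; the default qvalue is 1.0."""
    codings: Dict[str, float] = {}
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        name, _, params = item.partition(";")
        q = 1.0
        for param in params.split(";"):
            key, _, val = param.partition("=")
            if key.strip().lower() == "q" and val.strip():
                q = float(val.strip())
        codings[name.strip().lower()] = q
    return codings


def is_acceptable(coding: str, header: Optional[str]) -> bool:
    """header is the field value, or None when the request has no Accept-Encoding."""
    if header is None:
        return True  # field absent: the server MAY assume any coding is accepted
    coding = coding.lower()
    codings = parse_accept_encoding(header)
    if coding in codings:
        return codings[coding] > 0   # rule 1: listed, unless q=0
    if "*" in codings:
        return codings["*"] > 0      # rule 2, and the "*;q=0" case of rule 4
    return coding == "identity"      # rule 4: identity unless refused


if __name__ == "__main__":
    header = "gzip;q=1.0, identity; q=0.5, *;q=0"
    for name in ("gzip", "identity", "deflate"):
        print(name, is_acceptable(name, header))
```

Rule 3 (prefer the highest non-zero qvalue) would sit on top of this: filter the codings the server can produce through is_acceptable, then pick the one with the largest qvalue.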
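Damn, that's several compression algorithms, and whenever support for a new one shows up, yours truly has to change the code too...
Forget it, I'll just keep playing dirty.
First try stripping the compression flag (the Accept-Encoding header) from the request, which tells the server that the client's browser doesn't support compression.
The server was all too friendly and returned uncompressed content. Haha, that'll do for now.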
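The trick boils down to something like the following sketch. strip_accept_encoding is a hypothetical name, and it assumes the complete request head is available as raw bytes with CRLF line endings and no folded header lines.

```python
def strip_accept_encoding(request: bytes) -> bytes:
    """Return the raw HTTP request with every Accept-Encoding header removed."""
    head, sep, body = request.partition(b"\r\n\r\n")
    lines = head.split(b"\r\n")
    kept = [lines[0]]  # the request line is always kept
    for line in lines[1:]:
        # Header field names are case-insensitive.
        if not line.lower().startswith(b"accept-encoding:"):
            kept.append(line)
    return b"\r\n".join(kept) + sep + body


if __name__ == "__main__":
    raw = (b"GET / HTTP/1.1\r\n"
           b"Host: example.com\r\n"
           b"Accept-Encoding: gzip, deflate\r\n"
           b"\r\n")
    print(strip_accept_encoding(raw).decode("ascii"))
```

According to the RFC text quoted above, a server MAY still pick a compressed coding when the field is missing, so this relies on the server being nice. Replacing the header with "Accept-Encoding: identity" instead of deleting it would be the stricter way to ask for uncompressed content.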
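Well, your friendliness is exactly why I get to play dirty.
PS: With compression turned off, the data the server sends back is much larger, typically around 2-3 times the size. Page loads are noticeably slower...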
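A quick way to check the size difference for a given page is to gzip it locally and compare lengths. This is only a sketch: compression_ratio is a made-up helper and the sample page is synthetic, so its ratio says nothing about real sites. The 2-3x figure above is just what I observed.

```python
import gzip


def compression_ratio(plain: bytes) -> float:
    """Uncompressed size divided by gzip-compressed size."""
    return len(plain) / len(gzip.compress(plain))


if __name__ == "__main__":
    # Synthetic, highly repetitive page; real HTML usually compresses less well.
    sample = b"<html><body>" + b"<p>hello, world</p>" * 200 + b"</body></html>"
    print(f"{len(sample)} bytes uncompressed, ratio {compression_ratio(sample):.1f}x")
```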
But whatever, I'm a rogue, who am I afraid of!