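HTTP/2 Parsing Series

References

RFC 7540 (HTTP/2)
RFC 7541 (HPACK)

Background

This came out of a requirement at work. Skipping the backstory, the tricky part is this: we receive a batch of hex strings and have to recover the original HTTP/2 traffic from them. A hex string may be only a fragment of a frame, or several frames concatenated together, and we need to extract both the headers and the body. The body may be gzip-compressed; when it is gzip-compressed text (text, JSON, and so on) it has to be decompressed and stored as text. It may also arrive uncompressed.

Example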

	data := []string{
		"505249202a20485454502f322e300d0a0d0a534d0d0a0d0a",
		"000006040000000000000401000000",
		"00000408000000000000ff0001",
		"00004f01050000000382049b60f1f460722d58d292d9531616a90b616c62b193a8e62afd107abf419644acad295649d0bd7350848d35ac93da930ceb90f4ff875886a8eb10649cbf50839bd9ab7a893f59d29ad86577d70f",
		"000012040000000000000300000080000400010000000500ffffff0000040800000000007fff0000",
		"000000040100000000",
		"000000040100000000",
		"000423010400000003887685de5aa635455f8b1d75d0620d263d4c7441ea0f0d033338366196c361be9413ea5f29141004ca8066e32cdc13aa62d1bf7b8b84842d695b05443c86aa6f588aa47e561cc581c034f001408821ea496a4ad4c8df93ce6c1db1794d999b9f3f970cc782367fd860830f139be4c7f222882218630c6e4628df79a8c8e8de8e52048571e71afe7f6c96c361be9413ea5f29141004ca8066e322b827d4c5a37f408ef2b472124a9620c9395642469b5186df7b5c58f26f408bf2b75948b10649cbd4bebfaf9d29aee30c566aaa2d8b1a99905b3b96c5e3f492c8b9ec9981c8b5634a4b654c585aa42d85b18ac64ea398abf441ea408ef2b24e85a71a27589611c68a4a478e0bc003aeb4f05b138dbccb820362408cf2b24e85ac2f6b4a84ac693f9d78b292595e2b25009f7de95c6de2b2500961c8cc6ebeeac3232eb2c8bf408df2b24e85ac2a2b3d482ac9352596c361be9413ea5f29141004ca8066e32cdc1094c5a37f408df2b24e859093d8398ab1281a1187ddbe1d37e1db7f7ceda8410ac0742b05a0fd2841927289b5e8112a64227fff60fa07d0800581f58fff9f4a10649ca25e8112a64227fff6dfd03ff9f4a10649ca25e8112a64227fff6dfd03ff9f4a7720c9394205c953816d9ffd83e81f42001607d63ffe7d29dc8324e57d7254e05b67ff6173e81ffc4089f2b585ed6950958d5f8f9648e4b921001b8272e3217ae5c71c408af2b252b26c190ab24737963490019fb24952c419272c13993f69f5596042469313408cf2b5854567a9055886aa53ff8f9648e4b921001b8272e3217ae5c71c408bf2b5854567a905588324e5862c931629cc9f408a416cee5b1649a935537f9724952c419272fdc854120c7937fd16498bf725b640173f408bf2b252b26c190ab4e7427fff09004a2236eb608e40bb238291b6e5000658dc69d78028e375a8df1c850b6f46468523408d969c14836db6dc7de91b7c8410c9201f946dbf1824707211a788fb6090a01b32bedca22b32b858df91d7db2372048372421632b8e140f38365f032dbb1c607c8e082376378826872b4c81d92409051c82091f9060133200af0c4c81c74050b8495a13c2f4089f2b252b26c190ab1a4a900161321109f0859644ebf7afb4e5fbcddf742d0af5d71f0debab38079ac01770357d9c0ec37de58004088f2b252b50798d27f9a1004c84427c2165913afdebed397ef377dd0b42bd75c7c37aeff5a839bd9ab40921d06591e0d2a569a83c63a1640fb9526a4bf870ba065e101e03b0f068565b644f87f4085f2b10649cb9ac7937a9bef6b8b460d1163c9bd490d6557022b810b8eb6065917408cf2b23c1a54ac81f72a4d497f96df697e940094ca3a941004d28066e059b8cbea6
2d1bf408cf2b23c1a54ac419272a4d49785138065e7bf54012a408e49a935532c3a283f858f61a6355f012a408528e6a0a69392647248200801740d05d680fbc00842c8445f0001820000000000031f8b0800000000000003b493dd8edb201085df85ebd8e61f9c5759ad2af00c5927fe1310abda28fbec254aa366a5b657f10dd20ce8cc37e89c0b99dc88644f1c548bcb19e354459c002346b2232bc6d4cf53b96735ad1513a537bafed6f8c87949fba649415401ab3ca77a390c29bbbc94e29c7b37d7dd3c36b33f367fc49b877843a9d6de367d297fd61f791c8a74c4349f638789ecdf2ee41c8757cf39a6db02a08a6e07c2522b9c90c248635a6741486e02d75232a5597938e08a05815d779bc0fc5efa8e6399a7a8bd41cb5508b493d207e44081231861704b9cb5d4734c5fc35225383d7f92e83cb30ec083d242510bae05ebb9744160eb1d3c53bdef4870c3e05d777a1dda43f10ef5a87ecc4beec7fe13b71a84d3a19f6e36bc9055dcce7fa7a45ac5b7a0881214c65f9693225f654cb9b9eb6e9e97ffcebb9bc241a05e8696794d191a6f5971456885a4de688d6a0babfe1debc9a91218b742b9d6b69da7b2e4990372c5997220b895df9c7abd5e7f010000ffff03001ed95fc7f9040000",
		"000000000100000003",
		"00002b01050000000582049d60f1f460722d58d292d9531616a90b616c600071c8de60d5485f2bf447c18772868fd24a8800b7bf",
		"",
	}

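Of these, the first 4 entries are request data and the last 7 are response data.

Parsing

Frame structure

How do we turn this hex data into HTTP/2 requests?
First, decode the hex into raw bytes and see whether the result is readable: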

	hexStr := "505249202a20485454502f322e300d0a0d0a534d0d0a0d0a"
	raw, err := hex.DecodeString(hexStr)
	if err != nil {
		fmt.Println("invalid hex:", err)
	}
	fmt.Printf("decoded: %q\n", raw) // %q keeps \r and \n visible

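The first entry is clearly the opening of an HTTP/2 connection:
"PRI * HTTP/2.0\r\n\r\nSM\r\n\r\n"
In fact, every HTTP/2 connection starts with the client sending 0x505249202a20485454502f322e300d0a0d0a534d0d0a0d0a, the client connection preface (RFC 7540, Section 3.5).
Most of the other hex strings decode to unreadable bytes because they are binary frames. A few decode partly to readable text; those are presumably pieces of a body.

RFC 7540 defines the frame layout as follows: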
+-----------------------------------------------+
|                 Length (24)                   |
+---------------+---------------+---------------+
|   Type (8)    |   Flags (8)   |
+-+-------------+---------------+-------------------------------+
|R|                 Stream Identifier (31)                      |
+=+=============================================================+
|                   Frame Payload (0...)                      ...
+---------------------------------------------------------------+
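The widths are in bits, so the fixed header is 24+8+8+1+31 = 72 bits,
i.e. 72/8 = 9 bytes.
Two hex digits represent one byte, so the first 18 hex digits of every frame are the fixed frame header.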
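The 9-byte layout above can be sketched in Go as follows. This is a minimal illustration, not the code used later in this post; the names `FrameHeader` and `parseFrameHeader` are made up for the example.

```go
package main

import (
	"encoding/binary"
	"encoding/hex"
	"errors"
	"fmt"
)

// FrameHeader mirrors the fixed 9-byte HTTP/2 frame header.
// The type name is an assumption made for this sketch.
type FrameHeader struct {
	Length   uint32 // 24-bit payload length in bytes
	Type     uint8  // frame type code
	Flags    uint8  // type-specific flag bits
	StreamID uint32 // 31 bits; the reserved R bit is masked off
}

// parseFrameHeader decodes the first 18 hex digits (9 bytes) of a frame.
func parseFrameHeader(hexStr string) (FrameHeader, error) {
	if len(hexStr) < 18 {
		return FrameHeader{}, errors.New("need at least 18 hex digits")
	}
	b, err := hex.DecodeString(hexStr[:18])
	if err != nil {
		return FrameHeader{}, err
	}
	return FrameHeader{
		Length:   uint32(b[0])<<16 | uint32(b[1])<<8 | uint32(b[2]),
		Type:     b[3],
		Flags:    b[4],
		StreamID: binary.BigEndian.Uint32(b[5:9]) & 0x7fffffff,
	}, nil
}

func main() {
	h, err := parseFrameHeader("000006040000000000000401000000")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", h)
}
```

Working on decoded bytes rather than on hex substrings keeps the offsets identical to the ones in the RFC diagram.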

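Parsing the data

Data item 1

After the fixed preface, the next entry is:
000006040000000000000401000000
1 0x000006: Length is 24 bits, i.e. 6 hex digits. The value is 6, so the payload is 6 bytes, i.e. 12 hex digits.
2 0x04: the next 8 bits (2 hex digits) are the Type. Looking it up in the table: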
+---------------+------+--------------+
| Frame Type    | Code | Section      |
+---------------+------+--------------+
| DATA          | 0x0  | Section 6.1  |
| HEADERS       | 0x1  | Section 6.2  |
| PRIORITY      | 0x2  | Section 6.3  |
| RST_STREAM    | 0x3  | Section 6.4  |
| SETTINGS      | 0x4  | Section 6.5  |
| PUSH_PROMISE  | 0x5  | Section 6.6  |
| PING          | 0x6  | Section 6.7  |
| GOAWAY        | 0x7  | Section 6.8  |
| WINDOW_UPDATE | 0x8  | Section 6.9  |
| CONTINUATION  | 0x9  | Section 6.10 |
+---------------+------+--------------+
0x4 is a SETTINGS frame. Here is how the RFC introduces it:

A SETTINGS frame MUST be sent by both endpoints at the start of a connection and MAY be sent at any other time by either endpoint over the lifetime of the connection

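In other words, the first frame after the connection preface must be a SETTINGS frame, and this data fits. Next, the Flags field.
3 0x00: the next 8 bits (2 hex digits) are the Flags. SETTINGS defines a single flag, ACK (0x1), which is not set here. The RFC also says this about SETTINGS frames: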

SETTINGS frames always apply to a connection, never a single stream. The stream identifier for a SETTINGS frame MUST be zero (0x0)

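That is, a SETTINGS frame always applies to the whole connection rather than to a single stream, so its stream identifier must be 0x0. With Flags at 0x00 no flag is set: the frame is not an acknowledgement, it simply carries the sender's configuration parameters. Such frames are sent at the start of a connection and may be sent again at any point during its lifetime.
4 0x00000000: the next 32 bits (8 hex digits). The first bit is R and the remaining 31 bits are the stream identifier.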

R: A reserved 1-bit field. The semantics of this bit are undefined, and the bit MUST remain unset (0x0) when sending and MUST be ignored when receiving.

This is a 1-bit reserved field: it must be left unset (0x0) when sending and must be ignored when receiving.

Stream Identifier: A stream identifier (see Section 5.1.1) expressed as an unsigned 31-bit integer. The value 0x0 is reserved for frames that are associated with the connection as a whole as opposed to an individual stream.

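The stream identifier can be thought of as a label for now, without going into what it does. An all-zero value looks special, though, and the RFC confirms it: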

Streams are identified with an unsigned 31-bit integer. Streams initiated by a client MUST use odd-numbered stream identifiers; those initiated by the server MUST use even-numbered stream identifiers. A stream identifier of zero (0x0) is used for connection control messages; the stream identifier of zero cannot be used to establish a new stream

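A stream identifier of 0x0 marks connection control messages, so it does not belong to any request and can be ignored for now.
5 0x000401000000: the payload. Length is 0x000006, i.e. 12 hex digits, so the rest of this entry is exactly the payload. The SETTINGS payload format is: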

+-------------------------------+
|       Identifier (16)         |
+-------------------------------+-------------------------------+
|                        Value (32)                             |
+---------------------------------------------------------------+

Each setting is 48 bits, i.e. 12 hex digits.
Identifier (16) is 4 hex digits, here 0x0004. Its meaning:

SETTINGS_INITIAL_WINDOW_SIZE (0x4): Indicates the sender’s initial
window size (in octets) for stream-level flow control. The
initial value is 2^16-1 (65,535) octets.
This setting affects the window size of all streams (see
Section 6.9.2).
Values above the maximum flow-control window size of 2^31-1 MUST
be treated as a connection error (Section 5.4.1) of type
FLOW_CONTROL_ERROR.

This sets the initial window size. The value is 0x01000000 = 16,777,216 octets. An octet is 8 bits, i.e. one byte, so the window is 16,777,216 bytes (16 MiB).
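As a rough sketch of the Identifier/Value layout above, the payload can be walked six bytes at a time. The identifier names come from RFC 7540, Section 6.5.2; `Setting`, `settingNames`, and `parseSettings` are names invented for this example.

```go
package main

import (
	"encoding/binary"
	"encoding/hex"
	"errors"
	"fmt"
)

// Setting is one 6-byte entry of a SETTINGS payload.
type Setting struct {
	ID    uint16
	Value uint32
}

// settingNames maps the identifiers defined in RFC 7540, Section 6.5.2.
var settingNames = map[uint16]string{
	0x1: "SETTINGS_HEADER_TABLE_SIZE",
	0x2: "SETTINGS_ENABLE_PUSH",
	0x3: "SETTINGS_MAX_CONCURRENT_STREAMS",
	0x4: "SETTINGS_INITIAL_WINDOW_SIZE",
	0x5: "SETTINGS_MAX_FRAME_SIZE",
	0x6: "SETTINGS_MAX_HEADER_LIST_SIZE",
}

// parseSettings splits a hex-encoded SETTINGS payload into its entries.
func parseSettings(payloadHex string) ([]Setting, error) {
	b, err := hex.DecodeString(payloadHex)
	if err != nil {
		return nil, err
	}
	if len(b)%6 != 0 {
		return nil, errors.New("SETTINGS payload must be a multiple of 6 bytes")
	}
	settings := make([]Setting, 0, len(b)/6)
	for i := 0; i < len(b); i += 6 {
		settings = append(settings, Setting{
			ID:    binary.BigEndian.Uint16(b[i : i+2]),
			Value: binary.BigEndian.Uint32(b[i+2 : i+6]),
		})
	}
	return settings, nil
}

func main() {
	settings, err := parseSettings("000401000000")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, s := range settings {
		fmt.Printf("%s (0x%x) = %d\n", settingNames[s.ID], s.ID, s.Value)
	}
}
```

The same function handles payloads with several settings, since each entry has a fixed size.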

Data item 2

Raw data: 0x00000408000000000000ff0001
Length: 0x000004, 4 bytes, i.e. 8 hex digits
Type: 0x08, WINDOW_UPDATE
Flags: 0x00. "The WINDOW_UPDATE frame does not define any flags", so this field is ignored.
R + Stream Identifier: 0x00000000, so the update applies to the connection as a whole
Value: 0x00ff0001

The payload of a WINDOW_UPDATE frame is one reserved bit plus an
unsigned 31-bit integer indicating the number of octets that the
sender can transmit in addition to the existing flow-control window.
The legal range for the increment to the flow-control window is 1 to
2^31-1 (2,147,483,647) octets

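The payload is one reserved bit plus a 31-bit integer: the number of bytes by which the flow-control window grows (here 0x00ff0001 = 16,711,681). A parser that only extracts requests and responses generally does not need to act on it, so it is ignored.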
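For completeness, here is a small sketch of how the increment could be read if it were needed. `parseWindowUpdate` is a hypothetical helper; the zero-increment check follows RFC 7540, Section 6.9.

```go
package main

import (
	"encoding/binary"
	"encoding/hex"
	"errors"
	"fmt"
)

// parseWindowUpdate extracts the 31-bit window size increment from a
// hex-encoded WINDOW_UPDATE payload.
func parseWindowUpdate(payloadHex string) (uint32, error) {
	b, err := hex.DecodeString(payloadHex)
	if err != nil {
		return 0, err
	}
	if len(b) != 4 {
		return 0, errors.New("WINDOW_UPDATE payload must be exactly 4 bytes")
	}
	// Drop the reserved top bit and keep the 31-bit increment.
	inc := binary.BigEndian.Uint32(b) & 0x7fffffff
	if inc == 0 {
		return 0, errors.New("an increment of 0 is a protocol error")
	}
	return inc, nil
}

func main() {
	inc, err := parseWindowUpdate("00ff0001")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("window size increment:", inc)
}
```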

Data item 3

Raw data: 0x00004f01050000000382049b60f1f460722d58d292d9531616a90b616c62b193a8e62afd107abf419644acad295649d0bd7350848d35ac93da930ceb90f4ff875886a8eb10649cbf50839bd9ab7a893f59d29ad86577d70f
Length: 0x00004f, 79 bytes, i.e. 158 hex digits
Type: 0x01, HEADERS
Flags: 0x05. The flags defined for HEADERS are:

END_STREAM (0x1)
END_HEADERS (0x4)
PADDED (0x8)
PRIORITY (0x20)

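0x05 means both END_STREAM (0x1) and END_HEADERS (0x4) are set. END_STREAM means this is the last frame the sender will send on this stream, so the request has no body. END_HEADERS means the header block is complete and no CONTINUATION frame follows.
Stream identifier: 0x00000003, set aside for now
Data: 0x82049b60f1f460722d58d292d9531616a90b616c62b193a8e62afd107abf419644acad295649d0bd7350848d35ac93da930ceb90f4ff875886a8eb10649cbf50839bd9ab7a893f59d29ad86577d70f: the payload, which can be decoded directly with HPACK (HPACK decoders are easy to find on GitHub, in both Python and Go). The decoded result: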
[{:method GET false} {:path /obj/ad-pattern/renderer/package.json false} {:authority sf3-fe-tos.pglstatp-toutiao.com false} {:scheme https false} {cache-control no-cache false} {accept-encoding gzip false} {user-agent okhttp/3.9.1 false}]
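The flag byte of this HEADERS frame is a bit mask, so reading it comes down to testing each bit. A minimal sketch, with constant and function names chosen for this example only:

```go
package main

import (
	"fmt"
	"strings"
)

// Flag bits defined for HEADERS frames in RFC 7540, Section 6.2.
const (
	flagEndStream  uint8 = 0x1
	flagEndHeaders uint8 = 0x4
	flagPadded     uint8 = 0x8
	flagPriority   uint8 = 0x20
)

// headersFlagNames lists the names of the flags set in a HEADERS flag byte.
func headersFlagNames(flags uint8) []string {
	var names []string
	if flags&flagEndStream != 0 {
		names = append(names, "END_STREAM")
	}
	if flags&flagEndHeaders != 0 {
		names = append(names, "END_HEADERS")
	}
	if flags&flagPadded != 0 {
		names = append(names, "PADDED")
	}
	if flags&flagPriority != 0 {
		names = append(names, "PRIORITY")
	}
	return names
}

func main() {
	fmt.Println(strings.Join(headersFlagNames(0x05), " | "))
}
```

When PADDED or PRIORITY is set, the payload starts with extra fields that have to be skipped before the header block is passed to an HPACK decoder; neither is set in the frames in this post.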

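Data item 4

Raw data: 0x000012040000000000000300000080000400010000000500ffffff0000040800000000007fff0000
Length: 0x000012, 18 bytes, i.e. 36 hex digits
Type: 0x04, SETTINGS
Flags: 0x00, no flag set
Stream identifier: 0x00000000, connection level, ignored
Payload: what remains is 0x000300000080000400010000000500ffffff0000040800000000007fff0000, but according to the Length field the payload is only 0x000300000080000400010000000500ffffff; the rest belongs to the next frame.
SETTINGS payload format: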

+-------------------------------+
|       Identifier (16)         |
+-------------------------------+-------------------------------+
|                        Value (32)                             |
+---------------------------------------------------------------+

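Each entry is 6 bytes: a 16-bit Identifier (4 hex digits) followed by a 32-bit Value (8 hex digits). The 18-byte payload therefore holds three settings:
0x0003 = 0x00000080: SETTINGS_MAX_CONCURRENT_STREAMS = 128
0x0004 = 0x00010000: SETTINGS_INITIAL_WINDOW_SIZE = 65,536
0x0005 = 0x00ffffff: SETTINGS_MAX_FRAME_SIZE = 16,777,215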

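Data item 4, part 2

The next frame is the part split off from the previous entry:
0x0000040800000000007fff0000
Length: 0x000004, 4 bytes, i.e. 8 hex digits
Type: 0x08, WINDOW_UPDATE
Flags: 0x00
Stream identifier: 0x00000000
Value: 0x7fff0000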
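The cut made here by hand, where one hex entry held a SETTINGS frame followed by a WINDOW_UPDATE frame, can be automated by reading the Length field repeatedly. The sketch below also returns any trailing bytes that do not yet form a whole frame, which covers the case where a frame is split across entries. `splitFrames` is a name made up for this example.

```go
package main

import (
	"encoding/hex"
	"fmt"
)

// splitFrames cuts buf into complete HTTP/2 frames (9-byte header plus
// payload). Bytes that do not yet form a complete frame are returned as
// rest, so the caller can prepend them to the next chunk.
func splitFrames(buf []byte) (frames [][]byte, rest []byte) {
	for len(buf) >= 9 {
		length := int(buf[0])<<16 | int(buf[1])<<8 | int(buf[2])
		total := 9 + length
		if len(buf) < total {
			break // payload is incomplete, wait for more data
		}
		frames = append(frames, buf[:total])
		buf = buf[total:]
	}
	return frames, buf
}

func main() {
	chunk, err := hex.DecodeString("000012040000000000000300000080000400010000000500ffffff0000040800000000007fff0000")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	frames, rest := splitFrames(chunk)
	for i, f := range frames {
		fmt.Printf("frame %d: type=0x%x payload=%x\n", i, f[3], f[9:])
	}
	fmt.Printf("leftover bytes: %d\n", len(rest))
}
```

Note that the connection preface is not a frame and has to be stripped before the first call.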

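Data item 5

Raw data: 0x000000040100000000
Length: 0
Type: 0x04, SETTINGS
Flags: 0x01, ACK (acknowledges the peer's SETTINGS frame; the payload is empty)
Stream identifier: 0x00000000

Data item 6

Same as data item 5.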

Data item 7, part 1

Raw data:
000423010400000003887685de5aa635455f8b1d75d0620d263d4c7441ea0f0d033338366196c361be9413ea5f29141004ca8066e32cdc13aa62d1bf7b8b84842d695b05443c86aa6f588aa47e561cc581c034f001408821ea496a4ad4c8df93ce6c1db1794d999b9f3f970cc782367fd860830f139be4c7f222882218630c6e4628df79a8c8e8de8e52048571e71afe7f6c96c361be9413ea5f29141004ca8066e322b827d4c5a37f408ef2b472124a9620c9395642469b5186df7b5c58f26f408bf2b75948b10649cbd4bebfaf9d29aee30c566aaa2d8b1a99905b3b96c5e3f492c8b9ec9981c8b5634a4b654c585aa42d85b18ac64ea398abf441ea408ef2b24e85a71a27589611c68a4a478e0bc003aeb4f05b138dbccb820362408cf2b24e85ac2f6b4a84ac693f9d78b292595e2b25009f7de95c6de2b2500961c8cc6ebeeac3232eb2c8bf408df2b24e85ac2a2b3d482ac9352596c361be9413ea5f29141004ca8066e32cdc1094c5a37f408df2b24e859093d8398ab1281a1187ddbe1d37e1db7f7ceda8410ac0742b05a0fd2841927289b5e8112a64227fff60fa07d0800581f58fff9f4a10649ca25e8112a64227fff6dfd03ff9f4a10649ca25e8112a64227fff6dfd03ff9f4a7720c9394205c953816d9ffd83e81f42001607d63ffe7d29dc8324e57d7254e05b67ff6173e81ffc4089f2b585ed6950958d5f8f9648e4b921001b8272e3217ae5c71c408af2b252b26c190ab24737963490019fb24952c419272c13993f69f5596042469313408cf2b5854567a9055886aa53ff8f9648e4b921001b8272e3217ae5c71c408bf2b5854567a905588324e5862c931629cc9f408a416cee5b1649a935537f9724952c419272fdc854120c7937fd16498bf725b640173f408bf2b252b26c190ab4e7427fff09004a2236eb608e40bb238291b6e5000658dc69d78028e375a8df1c850b6f46468523408d969c14836db6dc7de91b7c8410c9201f946dbf1824707211a788fb6090a01b32bedca22b32b858df91d7db2372048372421632b8e140f38365f032dbb1c607c8e082376378826872b4c81d92409051c82091f9060133200af0c4c81c74050b8495a13c2f4089f2b252b26c190ab1a4a900161321109f0859644ebf7afb4e5fbcddf742d0af5d71f0debab38079ac01770357d9c0ec37de58004088f2b252b50798d27f9a1004c84427c2165913afdebed397ef377dd0b42bd75c7c37aeff5a839bd9ab40921d06591e0d2a569a83c63a1640fb9526a4bf870ba065e101e03b0f068565b644f87f4085f2b10649cb9ac7937a9bef6b8b460d1163c9bd490d6557022b810b8eb6065917408cf2b23c1a54ac81f72a4d497f96df697e940094ca3a941004d28066e059b8cbea62d1
bf408cf2b23c1a54ac419272a4d49785138065e7bf54012a408e49a935532c3a283f858f61a6355f012a408528e6a0a69392647248200801740d05d680fbc00842c8445f0001820000000000031f8b0800000000000003b493dd8edb201085df85ebd8e61f9c5759ad2af00c5927fe1310abda28fbec254aa366a5b657f10dd20ce8cc37e89c0b99dc88644f1c548bcb19e354459c002346b2232bc6d4cf53b96735ad1513a537bafed6f8c87949fba649415401ab3ca77a390c29bbbc94e29c7b37d7dd3c36b33f367fc49b877843a9d6de367d297fd61f791c8a74c4349f638789ecdf2ee41c8757cf39a6db02a08a6e07c2522b9c90c248635a6741486e02d75232a5597938e08a05815d779bc0fc5efa8e6399a7a8bd41cb5508b493d207e44081231861704b9cb5d4734c5fc35225383d7f92e83cb30ec083d242510bae05ebb9744160eb1d3c53bdef4870c3e05d777a1dda43f10ef5a87ecc4beec7fe13b71a84d3a19f6e36bc9055dcce7fa7a45ac5b7a0881214c65f9693225f654cb9b9eb6e9e97ffcebb9bc241a05e8696794d191a6f5971456885a4de688d6a0babfe1debc9a91218b742b9d6b69da7b2e4990372c5997220b895df9c7abd5e7f010000ffff03001ed95fc7f9040000

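Length: 0x000423, 1,059 bytes, i.e. 2,118 hex digits
Type: 0x01, HEADERS
Flags: 0x04, END_HEADERS. This frame is part of the response, so the flag means the response headers fit in this single frame.
Stream identifier: 0x00000003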
Data: the 1,059 bytes (2,118 hex digits) that follow the 9-byte frame header in the raw data above, starting with 887685de5aa6 and ending with 842c8445f.
HPACK decoding packages such as python-hpack and go-hpack are easy to find online and can decode this. The result here is:

[{:status 200 false} {server Tengine false} {content-type application/json false} {content-length 386 false} {date Fri, 29 Dec 2023 03:33:27 GMT false} {vary Accept-Encoding false} {cache-control max-age=604800 false} {content-md5 LiEqGxtrK5hLx6i/wc5oZA== false} {etag W/"2e212a1b1b6b2b984bc7a8bfc1ce6864" false} {last-modified Fri, 29 Dec 2023 03:32:29 GMT false} {x-bdcdn-cache-status TCP_HIT false} {x-kfc-cachekey http://pinner-imgserver.byted.org/ad-pattern/renderer/package.json false} {x-tos-hash-crc64ecma 18007748152658362052 false} {x-tos-request-id 8efcff8e3e02998f658e3e02-ad3b797-ac37332 false} {x-tos-response-time Fri, 29 Dec 2023 03:33:22 GMT false} {x-tos-storage-class STANDARD false} {via n211-071-141, cache25.l2cn3129[0,0,200-0,H], cache2.l2cn3129[5,0], cache2.l2cn3129[5,0], vcache10.cn6153[0,0,200-0,H], vcache9.cn6153[16,0] false} {x-request-ip fdbd:dc01:26:318::66 false} {x-tt-trace-tag id=03;cdn-cache=hit;type=static false} {x-response-cinfo fdbd:dc01:26:318::66 false} {x-response-cache edge_hit false} {server-timing cdn-cache;desc=HIT,edge;dur=16 false} {x-tt-trace-host 01e2c5750bd17d62d55f00aeb647802bb74b9ade158bc42d40b3462da555698d59cc2acd09fa59b0d6adc48c950de0a3f95f2e3f6eb9d795c5d1ca6dcebe66e086a390357bb09c621a7b8c24af4307dd1c2bd21c9da023d0e8a230670e16cf4282 false} {x-tt-trace-id 00-2312291133279CD46DC5D7142CB691CB-6084E17E4D3E7AD8-00 false} {x-tt-logid 202312291133279CD46DC5D7142CB691CB false} {content-encoding gzip false} {ali-swift-global-savetime 1703820807 false} {age 353291 false} {x-cache HIT TCP_MEM_HIT dirn:12:116750332 false} {x-swift-savetime Tue, 02 Jan 2024 03:13:39 GMT false} {x-swift-cachetime 260388 false} {access-control-allow-origin * false} {timing-allow-origin * false} {eagleid 3add202017041740980111312e false}]

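Note on HPACK: the decoder's dynamic table has to be kept between header blocks (in memory, a cache, or a database). Otherwise later blocks that reference dynamic-table entries will come out incomplete or wrong!
Also note that this entry contains more data than its Length field covers; the excess has to be cut off and treated as the next frame.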
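To make the HPACK note above concrete, here is a minimal sketch of a dynamic table as described in RFC 7541 (Sections 2.3.2, 4.1 and 4.4). It is not the implementation of any particular HPACK package; `HeaderField`, `DynamicTable`, and the method names are assumptions for illustration.

```go
package main

import "fmt"

// HeaderField is one name/value pair.
type HeaderField struct{ Name, Value string }

// DynamicTable keeps the newest entry first and evicts the oldest entries
// once the size budget is exceeded.
type DynamicTable struct {
	entries []HeaderField
	size    int
	maxSize int
}

// NewDynamicTable creates an empty table with the given size budget.
func NewDynamicTable(maxSize int) *DynamicTable {
	return &DynamicTable{maxSize: maxSize}
}

// entrySize follows RFC 7541, Section 4.1: name + value + 32 bytes overhead.
func entrySize(f HeaderField) int { return len(f.Name) + len(f.Value) + 32 }

// Add inserts f at the front and evicts from the back while over budget.
func (t *DynamicTable) Add(f HeaderField) {
	t.entries = append([]HeaderField{f}, t.entries...)
	t.size += entrySize(f)
	for t.size > t.maxSize && len(t.entries) > 0 {
		last := t.entries[len(t.entries)-1]
		t.entries = t.entries[:len(t.entries)-1]
		t.size -= entrySize(last)
	}
}

// Lookup resolves an HPACK index of 62 or above; 62 is the newest entry.
func (t *DynamicTable) Lookup(index int) (HeaderField, bool) {
	i := index - 62
	if i < 0 || i >= len(t.entries) {
		return HeaderField{}, false
	}
	return t.entries[i], true
}

func main() {
	table := NewDynamicTable(4096)
	_, ok := table.Lookup(62)
	fmt.Println("fresh table has index 62:", ok)

	table.Add(HeaderField{"server", "Tengine"})
	f, ok := table.Lookup(62)
	fmt.Println("after one insert:", f, ok)
}
```

A decoder that is recreated for every frame starts from the "fresh table" state, which is why later blocks fail. RFC 7541 also keeps the request and response dynamic tables separate, so one table is needed per direction of each connection.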

Data item 7, part 2

What is left of the previous entry after cutting it at its declared length should be a frame of its own.
Value: 0x0001820000000000031f8b0800000000000003b493dd8edb201085df85ebd8e61f9c5759ad2af00c5927fe1310abda28fbec254aa366a5b657f10dd20ce8cc37e89c0b99dc88644f1c548bcb19e354459c002346b2232bc6d4cf53b96735ad1513a537bafed6f8c87949fba649415401ab3ca77a390c29bbbc94e29c7b37d7dd3c36b33f367fc49b877843a9d6de367d297fd61f791c8a74c4349f638789ecdf2ee41c8757cf39a6db02a08a6e07c2522b9c90c248635a6741486e02d75232a5597938e08a05815d779bc0fc5efa8e6399a7a8bd41cb5508b493d207e44081231861704b9cb5d4734c5fc35225383d7f92e83cb30ec083d242510bae05ebb9744160eb1d3c53bdef4870c3e05d777a1dda43f10ef5a87ecc4beec7fe13b71a84d3a19f6e36bc9055dcce7fa7a45ac5b7a0881214c65f9693225f654cb9b9eb6e9e97ffcebb9bc241a05e8696794d191a6f5971456885a4de688d6a0babfe1debc9a91218b742b9d6b69da7b2e4990372c5997220b895df9c7abd5e7f010000ffff03001ed95fc7f9040000
Length: 0x000182, 386 bytes
Type: 0x00, DATA
Flags: 0x00
Stream identifier: 0x00000003
Data: the 386 bytes after the 9-byte frame header, i.e. everything from 1f8b08 onward in the value above.
These bytes do not decode to readable text. Looking back at the response headers, {content-encoding gzip false} is present, so the body is probably gzip-compressed (1f 8b is also the gzip magic number).
So write the data straight into a gzip file, for example:

package main

import (
	"code.avlyun.org/hexiaojiao/gindemo/index/hpack/hpack"
	"encoding/base64"
	"encoding/hex"
	"encoding/json"
	"errors"
	"fmt"
	"os"
	"strconv"
)

func main() {
	ParseHttp2Hex("0001820000000000031f8b0800000000000003b493dd8edb201085df85ebd8e61f9c5759ad2af00c5927fe1310abda28fbec254aa366a5b657f10dd20ce8cc37e89c0b99dc88644f1c548bcb19e354459c002346b2232bc6d4cf53b96735ad1513a537bafed6f8c87949fba649415401ab3ca77a390c29bbbc94e29c7b37d7dd3c36b33f367fc49b877843a9d6de367d297fd61f791c8a74c4349f638789ecdf2ee41c8757cf39a6db02a08a6e07c2522b9c90c248635a6741486e02d75232a5597938e08a05815d779bc0fc5efa8e6399a7a8bd41cb5508b493d207e44081231861704b9cb5d4734c5fc35225383d7f92e83cb30ec083d242510bae05ebb9744160eb1d3c53bdef4870c3e05d777a1dda43f10ef5a87ecc4beec7fe13b71a84d3a19f6e36bc9055dcce7fa7a45ac5b7a0881214c65f9693225f654cb9b9eb6e9e97ffcebb9bc241a05e8696794d191a6f5971456885a4de688d6a0babfe1debc9a91218b742b9d6b69da7b2e4990372c5997220b895df9c7abd5e7f010000ffff03001ed95fc7f9040000")
}

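// ParseHttp2Hex parses one hex-encoded HTTP/2 frame and dispatches on its type.
func ParseHttp2Hex(hexStr string) {
	// The fixed frame header is 9 bytes, i.e. 18 hex digits.
	if len(hexStr) < 18 {
		fmt.Println("input is shorter than the 9-byte frame header")
		return
	}
	if _, err := hex.DecodeString(hexStr); err != nil {
		fmt.Println("input is not valid hex:", err)
		return
	}

	// Length: hex digits 0-5.
	frameLenResult, _ := slicePreStr(6, hexStr)
	frameLen, _ := strconv.ParseInt(frameLenResult, 16, 64)

	// Type: hex digits 6-7.
	frameTypeResult, _ := sliceStr(6, 8, hexStr)
	frameType, _ := strconv.ParseInt(frameTypeResult, 16, 64)

	// Flags: hex digits 8-9.
	frameFlagResult, _ := sliceStr(8, 10, hexStr)
	frameFlag, _ := strconv.ParseInt(frameFlagResult, 16, 64)

	// R + Stream Identifier: hex digits 10-17; mask off the reserved bit.
	streamIdentifierResult, _ := getSliceLen(10, 8, hexStr)
	streamIdentifier, _ := strconv.ParseInt(streamIdentifierResult, 16, 64)
	streamIdentifier &= 0x7fffffff

	// Payload: frameLen bytes, i.e. frameLen*2 hex digits after the header.
	contLen := int(frameLen) * 2
	contResult := ""
	if contLen > 0 {
		var err error
		contResult, err = getSliceLen(18, contLen, hexStr)
		if err != nil {
			fmt.Println("payload is shorter than the declared length:", err)
			return
		}
	}

	fmt.Println("________0___origin__", hexStr)
	fmt.Println("________1___frameLen__", frameLen)
	fmt.Println("________2___frameType__", frameType)
	fmt.Println("________3___frameFlag__", frameFlag)
	fmt.Println("________4___streamIdentifier__", streamIdentifier)
	fmt.Println("________5___cont__", contResult)
	if frameType == 0 {
		// DATA frame; assumes the PADDED flag is not set.
		dataDecode(contResult)
	} else if frameType == 1 || frameType == 9 {
		// HEADERS (0x1) and CONTINUATION (0x9) carry HPACK header blocks.
		// Assumes PADDED and PRIORITY are not set on the HEADERS frame.
		headerDecode([]string{contResult})
	}
}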

// headerDecode HPACK-decodes each hex-encoded header block in order.
// Note: the decoder created here starts with an empty dynamic table, so
// blocks that reference entries added in earlier calls will not decode
// correctly; a long-lived decoder per connection is needed for that.
func headerDecode(encodedHexValues []string) {
	decoder := hpack.NewDecoder(2560)
	decoder.SetDynamicTableMaxSize(2560)
	for _, encodedHex := range encodedHexValues {
		// Hex text -> raw header block bytes.
		encoded, err := hex.DecodeString(encodedHex)
		if err != nil {
			fmt.Println("_________1______", err)
			continue
		}
		headers, err := decoder.Decode(encoded)
		if err != nil {
			fmt.Println("_________2______", err)
		}
		headersJson, _ := json.Marshal(headers)
		showDataStr := base64.StdEncoding.EncodeToString(headersJson)

		fmt.Println("____base64__", showDataStr)
		fmt.Println("_________3______", headers)
	}
}

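// dataDecode writes the hex-encoded DATA payload to test1.gz.
func dataDecode(dataHex string) {
	data, err := hex.DecodeString(dataHex)
	if err != nil {
		fmt.Println("invalid hex in DATA payload:", err)
		return
	}
	if err := WriteFile(data, "test1.gz"); err != nil {
		fmt.Println("write failed:", err)
	}
}

// WriteFile writes data to fileName, creating or truncating the file.
func WriteFile(data []byte, fileName string) error {
	dstFile, err := os.Create(fileName)
	if err != nil {
		return err
	}
	defer dstFile.Close()
	// Write the raw bytes directly; no string conversion is needed.
	_, err = dstFile.Write(data)
	return err
}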

// slicePreStr returns the first n characters of originalString.
func slicePreStr(n int, originalString string) (str string, err error) {
	if n < 0 || n > len(originalString) {
		return "", errors.New("invalid length")
	}
	return originalString[:n], nil
}

// sliceStr returns originalString[m:n] (m inclusive, n exclusive).
func sliceStr(m int, n int, originalString string) (str string, err error) {
	if m < 0 || n > len(originalString) || m > n {
		return "", errors.New("invalid range")
	}
	return originalString[m:n], nil
}

// getSliceLen returns the substring of the given length starting at startIndex.
func getSliceLen(startIndex int, length int, originalString string) (substring string, err error) {
	if startIndex < 0 || length < 0 || startIndex+length > len(originalString) {
		return "", errors.New("invalid startIndex or length")
	}
	return originalString[startIndex : startIndex+length], nil
}

Decompressing the gzip file gives:
{"name":"ad-pattern-renderer","version":"1.0.513","main":"https://sf3-fe-tos.pglstatp-toutiao.com/obj/ad-pattern/renderer/0066b8/index.html","resources":[{"url":"https://sf3-fe-tos.pglstatp-toutiao.com/obj/ad-pattern/renderer/0066b8/index.js","md5":"cd38083a34374779a8d3427f26441561","level":1},{"url":"https://sf3-fe-tos.pglstatp-toutiao.com/obj/ad-pattern/renderer/0066b8/index.html","md5":"81b0e6b7e825ff0c44bfe2d0d2ed737e","level":1},{"url":"https://sf3-fe-tos.pglstatp-toutiao.com/obj/ad-pattern/renderer/0066b8/vendors~lp-sdk.js","md5":"3cb18addbd563508da9d8b24af3e9bad","level":1}],"fallback":"https://sf3-fe-tos.pglstatp-toutiao.com/obj/ad-pattern/renderer/0066b8/fallback.js","fallback_optimize":"https://sf3-fe-tos.pglstatp-toutiao.com/obj/ad-pattern/renderer/0066b8/fallback.js","engines":{"v3":{"name":"ad-pattern-renderer-v3","version":"3.0.12","main":"https://sf3-fe-tos.pglstatp-toutiao.com/obj/ad-pattern/renderer-v3-test/3.0.12/index.html","resources":[{"url":"https://sf3-fe-tos.pglstatp-toutiao.com/obj/ad-pattern/renderer-v3-test/3.0.12/index.html","md5":"adf0b4f91b601e7b81a9df9340b766e5","level":1},{"url":"https://sf3-fe-tos.pglstatp-toutiao.com/obj/ad-pattern/renderer-v3-test/3.0.12/index.js","md5":"4d12835a989cb0479a2de25215ad3284","level":1}]}}}
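Since the goal is to store the text, the detour through a file is optional. As a sketch, the DATA payload could also be decompressed in memory with the standard `compress/gzip` package. `isGzip`, `gunzip`, and `gzipBytes` are hypothetical helpers; `gzipBytes` exists only so that the example can produce its own input.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
)

// isGzip reports whether data starts with the gzip magic number 1f 8b.
func isGzip(data []byte) bool {
	return len(data) >= 2 && data[0] == 0x1f && data[1] == 0x8b
}

// gunzip decompresses a complete gzip stream held in memory.
func gunzip(data []byte) ([]byte, error) {
	zr, err := gzip.NewReader(bytes.NewReader(data))
	if err != nil {
		return nil, err
	}
	defer zr.Close()
	return io.ReadAll(zr)
}

// gzipBytes compresses plain; it is only used to build demo input.
func gzipBytes(plain []byte) ([]byte, error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write(plain); err != nil {
		return nil, err
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	body, err := gzipBytes([]byte(`{"name":"ad-pattern-renderer"}`))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	if isGzip(body) {
		plain, err := gunzip(body)
		if err != nil {
			fmt.Println("error:", err)
			return
		}
		fmt.Println(string(plain))
	}
}
```

When a body is spread over several DATA frames of the same stream, the payloads have to be concatenated in order before decompressing.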

Data item 8

000000000100000003
Length: 0
Type: 0x00, DATA
Stream identifier: 0x00000003
Flags: 0x01, END_STREAM:

END_STREAM (0x1): When set, bit 0 indicates that this frame is the
last that the endpoint will send for the identified stream.
Setting this flag causes the stream to enter one of the “half-
closed” states or the “closed” state (Section 5.1).

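Data item 9

00002b01050000000582049d60f1f460722d58d292d9531616a90b616c600071c8de60d5485f2bf447c18772868fd24a8800b7bf
Length: 0x00002b, 43 bytes
Type: 0x01, HEADERS
Flags: 0x05
Stream identifier: 0x00000005
Value: 0x82049d60f1f460722d58d292d9531616a90b616c600071c8de60d5485f2bf447c18772868fd24a8800b7bf
Decoding this value with a fresh HPACK decoder fails, because parts of it refer to the dynamic table; it can only be decoded once the dynamic table has been populated.
The static table has 61 entries (see the Static Table Definition in RFC 7541), so any index above 61 refers to the dynamic table. Only with the 34 entries that the earlier header blocks wrote to the dynamic table can this block be decoded.
The result is: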

[{:method GET false} {:path /obj/ad-pattern/renderer/0066b8/index.js false} {content-length 386 false} {:scheme https false} {range bytes=0- false} {server Tengine false}]
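To show where the 61 comes into play, here is a sketch of HPACK's prefix integer decoding (RFC 7541, Section 5.1) together with a check of which table an index falls into. `decodeInteger` and `tableFor` are names invented for this example, and the byte 0xbf in `main` is an illustrative value rather than one taken from a known position in the data above.

```go
package main

import (
	"errors"
	"fmt"
)

// decodeInteger decodes an HPACK integer with the given prefix width and
// returns the value and the number of bytes consumed.
func decodeInteger(b []byte, prefixBits uint) (value uint64, n int, err error) {
	if len(b) == 0 {
		return 0, 0, errors.New("empty input")
	}
	limit := uint64(1)<<prefixBits - 1
	value = uint64(b[0]) & limit
	if value < limit {
		return value, 1, nil // the value fits in the prefix
	}
	var shift uint
	for i := 1; i < len(b); i++ {
		value += uint64(b[i]&0x7f) << shift
		if b[i]&0x80 == 0 {
			return value, i + 1, nil
		}
		shift += 7
		if shift > 56 {
			return 0, 0, errors.New("integer too large")
		}
	}
	return 0, 0, errors.New("truncated integer")
}

// tableFor names the table an HPACK index refers to.
func tableFor(index uint64) string {
	switch {
	case index == 0:
		return "invalid"
	case index <= 61:
		return "static"
	default:
		return "dynamic"
	}
}

func main() {
	// 0x82 is the first byte of the block above: an indexed header field
	// (top bit set) whose 7-bit prefix holds index 2, i.e. ":method GET".
	idx, _, _ := decodeInteger([]byte{0x82}, 7)
	fmt.Println(idx, tableFor(idx))

	// 0xbf is an illustrative indexed field pointing past the static table.
	idx, _, _ = decodeInteger([]byte{0xbf}, 7)
	fmt.Println(idx, tableFor(idx), "position", idx-62)
}
```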

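Data item 10

The tenth entry is long; its raw form is listed at the top. It is again a HEADERS frame whose block needs both the static and the dynamic table, followed after splitting by a DATA frame with gzip content. The walkthrough is omitted.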
