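While analyzing DNS response packets recently, I noticed something interesting: fields that should contain a domain name frequently held "0xC00C", or some other two-byte value beginning with "C0". A look at RFC 1035 turned up the following: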
4.1.4. Message compression

In order to reduce the size of messages, the domain system utilizes a compression scheme which eliminates the repetition of domain names in a message. In this scheme, an entire domain name or a list of labels at the end of a domain name is replaced with a pointer to a prior occurance of the same name.

The pointer takes the form of a two octet sequence:

+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| 1  1|                OFFSET                   |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

The first two bits are ones. This allows a pointer to be distinguished from a label, since the label must begin with two zero bits because labels are restricted to 63 octets or less. (The 10 and 01 combinations are reserved for future use.) The OFFSET field specifies an offset from the start of the message (i.e., the first octet of the ID field in the domain header). A zero offset specifies the first byte of the ID field, etc.

The compression scheme allows a domain name in a message to be represented as either:

   - a sequence of labels ending in a zero octet
   - a pointer
   - a sequence of labels ending with a pointer

Pointers can only be used for occurances of a domain name where the format is not class specific. If this were not the case, a name server or resolver would be required to know the format of all RRs it handled. As yet, there are no such cases, but they may occur in future RDATA formats.
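So this turns out to be the DNS message compression scheme: a repeated name is replaced with an offset pointer. The pointer occupies two octets (16 bits), laid out as follows: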
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| 1  1|                OFFSET                   |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
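The two-octet layout can be handled with a mask and a shift. The following is a minimal Python sketch, not taken from any particular DNS library; the helper names `parse_pointer` and `make_pointer` are made up for illustration.

```python
def parse_pointer(b0: int, b1: int):
    """Return the 14-bit offset if the two octets form a compression
    pointer (top two bits of the first octet are 11), otherwise None."""
    if (b0 & 0xC0) != 0xC0:
        return None
    # Drop the two flag bits, then append the second octet.
    return ((b0 & 0x3F) << 8) | b1


def make_pointer(offset: int) -> bytes:
    """Encode an offset (0..16383) as a two-octet compression pointer."""
    if not 0 <= offset < (1 << 14):
        raise ValueError("offset does not fit in 14 bits")
    return (0xC000 | offset).to_bytes(2, "big")


print(parse_pointer(0xC0, 0x0C))   # offset stored in the bytes C0 0C
print(make_pointer(12).hex())      # pointer bytes for offset 12
```

A first octet such as 0x03 (an ordinary label length) makes `parse_pointer` return `None`, which is how a parser can tell the two cases apart.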
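The two leading bits are both 1, which distinguishes a pointer from a "label" (one length-prefixed component of a domain name): a label's length byte is at most 63, so its top two bits are always 00. The remaining 14 bits give the offset of the referenced name from the start of the DNS message. Because the names in the Answers section of a response have usually already appeared in the Queries section, later occurrences only need to point back to that offset. The names in Queries are naturally the ones referenced most often, and the first of them always sits at offset 12 (binary 00001100), because the DNS header is a fixed 12 bytes and the Queries section follows it immediately. With the two leading 1 bits prepended, the binary value is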
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
| 1  1| 0  0  0  0  0  0  0  0  0  0  1  1  0  0|
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
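that is, 0xC00C. By the same reasoning, the names that get referenced tend to sit near the start of the message, at offsets below 256, so the upper byte of the pointer carries only the two flag bits and most pointers begin with "C0".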
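To see the pointer being followed, here is a rough Python sketch of a name decoder. It is an illustration under stated assumptions, not a production parser: the function name `read_name`, the hand-built sample message (a query for example.com with one A record answer) and the limit of 64 pointer hops are all choices made for this example.

```python
def read_name(msg: bytes, pos: int):
    """Decode the domain name that starts at msg[pos].

    Returns (name, next_pos), where next_pos is the index just past the
    name as stored at the original position (a pointer counts as 2 bytes).
    """
    labels = []
    next_pos = None   # fixed the first time a pointer or the end is reached
    hops = 0
    while True:
        length = msg[pos]
        if (length & 0xC0) == 0xC0:          # compression pointer
            if next_pos is None:
                next_pos = pos + 2
            pos = ((length & 0x3F) << 8) | msg[pos + 1]
            hops += 1
            if hops > 64:                    # arbitrary guard against loops
                raise ValueError("too many compression pointers")
        elif length & 0xC0:                  # 10 and 01 are reserved
            raise ValueError("reserved label type")
        elif length == 0:                    # root label ends the name
            if next_pos is None:
                next_pos = pos + 1
            return ".".join(labels), next_pos
        else:                                # ordinary label
            labels.append(msg[pos + 1 : pos + 1 + length].decode("ascii"))
            pos += 1 + length


# Hand-built sample: 12-byte header, one question, one answer.
header = bytes.fromhex("123481800001000100000000")
question = b"\x07example\x03com\x00" + b"\x00\x01\x00\x01"
answer = (b"\xc0\x0c"                        # name: pointer to offset 12
          + b"\x00\x01\x00\x01"              # TYPE A, CLASS IN
          + b"\x00\x00\x0e\x10"              # TTL 3600
          + b"\x00\x04" + bytes([192, 0, 2, 1]))
msg = header + question + answer

print(read_name(msg, 12))                    # name in the Queries section
print(read_name(msg, len(header) + len(question)))  # name in the answer
```

Both calls yield the same name: the second one reads the bytes C0 0C, jumps back to offset 12, and decodes the labels stored in the Queries section.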