MODULAR CHARS in the DWG file format explained (compression algorithm)

Modular characters are a method of storing compressed integer values. They are used in the object map to indicate both handle offsets and file location offsets. They consist of a stream of bytes, terminating at the first byte whose high bit is 0.

In each byte, the high bit is a flag; when set, it indicates that another byte follows. The remaining 7 bits carry data. The concept is not difficult to understand, but is a little difficult to explain. Let's look at an example.

Assume the next two bytes in the file are:

10000010 00100100

We read bytes until we reach a byte with a high bit of 0. The second byte meets that criterion. Since the bytes are stored from least significant to most significant, let's reverse the order of the bytes so that they read MSB to LSB from left to right:

00100100 10000010


Now we drop the high order flag bits, leaving 7 data bits per byte:

0100100 0000010


And then re-group the bits into bytes from right to left, padding on the left with 0's:

00010010 00000010


Result = 2 + 18*256 = 4610
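The reverse, drop and re-group steps above amount to collecting 7 data bits per byte, least significant group first. A minimal Python sketch of that idea follows; the function name `read_modular_char_unsigned` and its `(value, next_pos)` return convention are made up for this illustration and do not come from any DWG library.

```python
def read_modular_char_unsigned(data, pos=0):
    """Decode one unsigned modular char starting at data[pos].

    Returns (value, next_pos). Hypothetical helper for illustration only.
    """
    value = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << shift  # low 7 bits are data
        shift += 7
        if not byte & 0x80:  # high bit clear: this was the last byte
            return value, pos


value, next_pos = read_modular_char_unsigned(bytes([0b10000010, 0b00100100]))
print(value)  # 4610
```

Shifting each 7-bit group by a growing offset gives the same value as the reverse-and-regroup description, without building the intermediate bit strings.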

Here's another example using the basic form F1101001 F0010111 F1100110 00110101, where F marks the flag bit:

11101001 10010111 11100110 00110101

We read bytes until we reach a byte with a high bit of 0. The fourth byte meets that criterion. Since the bytes are stored from least significant to most significant, let's reverse the order of the bytes so that they read MSB to LSB from left to right:

00110101 11100110 10010111 11101001


Now we drop the high order flag bits, leaving 7 data bits per byte:

0110101 1100110 0010111 1101001


And then re-group the bits into bytes from right to left, padding on the left with 0's:

00000110 10111001 10001011 11101001


Result = 233 + 139*256 + 185*256^2 + 6*256^3 = 112823273
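As a cross-check of the four-byte example, the short Python sketch below concatenates the 7-bit payloads and then splits the result back into ordinary 8-bit bytes, which should reproduce the coefficients 233, 139, 185 and 6 used in the result. The variable names are illustrative only.

```python
raw = bytes([0b11101001, 0b10010111, 0b11100110, 0b00110101])

# Concatenate the 7-bit payloads; the first byte holds the least significant bits.
value = 0
for i, byte in enumerate(raw):
    value |= (byte & 0x7F) << (7 * i)

# Split the result into ordinary 8-bit bytes, least significant first.
regrouped = list(value.to_bytes(4, "little"))

print(value)      # 112823273
print(regrouped)  # [233, 139, 185, 6]
```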

This process is further complicated by the fact that if the final byte (the one with high bit 0) also has the bit with value 64 (0x40) set, the number is to be negated.

This is a negative number: 10000101 01001011

Since the bytes are stored from least significant to most significant, let's reverse the order of the bytes so that they read MSB to LSB from left to right:

01001011 10000101


We then clear the bit that was used to represent the negative number, and note that the result must be negated:

00001011 10000101


Now we drop the high order flag bits:

0001011 0000101

 

And then re-group the bits into bytes from right to left, padding on the left with 0's:

00000101 10000101


Result: 133 + 5*256 = 1413, which we negate to get -1413
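A hedged Python sketch of the signed variant: it treats bit 0x40 of the terminating byte as the negation flag, so that byte contributes only 6 data bits. The function name `read_modular_char_signed` and its return convention are assumptions made for this example.

```python
def read_modular_char_signed(data, pos=0):
    """Decode one signed modular char starting at data[pos].

    Returns (value, next_pos). Hypothetical helper for illustration only.
    """
    value = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        if byte & 0x80:
            # Continuation byte: 7 data bits, more bytes follow.
            value |= (byte & 0x7F) << shift
            shift += 7
        else:
            # Final byte: bit 0x40 means negate, the low 6 bits are data.
            negative = bool(byte & 0x40)
            value |= (byte & 0x3F) << shift
            return (-value if negative else value), pos


value, next_pos = read_modular_char_signed(bytes([0b10000101, 0b01001011]))
print(value)  # -1413
```

Because the final byte's 0x40 bit is reserved here, this reader should only be applied where negation is possible; for handle offsets in the object map, where the text says no negation is used, the unsigned reading applies.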

Modular chars are also used to store handle offsets in the object map. In this case no negation is used; handles in the object map are always in increasing order.

MODULAR SHORTS

Modular shorts work just like modular chars, except that the base module is a short (16 bits) instead of a char. From a practical point of view there are only two cases to worry about here (one module or two), because two modules already make a long, and since modular shorts are used only to indicate object sizes, a maximum object size of 1 GB is probably a safe assumption. Assume the next four bytes in the file are:

00110001 11110100 10001101 00000000

Reverse the order of the shorts:

10001101 00000000 00110001 11110100


Reverse the order of the bytes in each short:

00000000 10001101 11110100 00110001


Drop the high order flag bit of each short, leaving 15 data bits per short:

000000010001101 111010000110001


And then re-group the bits into shorts from right to left, padding on the left with 0's:

0000000001000110 1111010000110001


Result: 62513 + 70*65536 = 4650033
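The modular short steps can be sketched the same way in Python. This assumes each short is stored little-endian (which is what reversing the bytes within each short implies) and that no negation flag is involved; the function name `read_modular_short` is hypothetical.

```python
import struct


def read_modular_short(data, pos=0):
    """Decode one modular short starting at data[pos].

    Returns (value, next_pos). Hypothetical helper for illustration only.
    """
    value = 0
    shift = 0
    while True:
        (word,) = struct.unpack_from("<H", data, pos)  # little-endian 16-bit
        pos += 2
        value |= (word & 0x7FFF) << shift  # low 15 bits are data
        shift += 15
        if not word & 0x8000:  # high bit clear: this was the last short
            return value, pos


size, next_pos = read_modular_short(bytes([0b00110001, 0b11110100,
                                           0b10001101, 0b00000000]))
print(size)  # 4650033
```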
