How arithmetic coding works

Arithmetic coding is a method of lossless data compression, and a form of entropy encoding. It differs from other entropy encoding methods in that, where those methods typically split the input message into symbols and encode each symbol separately, arithmetic coding encodes the entire message as a single number: a fraction n satisfying 0.0 ≤ n < 1.0.


How arithmetic coding works

Given a set of symbols and their probabilities, arithmetic coding yields near-optimal output. Compression algorithms that use arithmetic coding typically begin by estimating the probabilities of the input symbols, then encode. The more accurate the estimate, the closer the output is to optimal.

: Observing a simple signal source yields the following statistical model:

  • 60% chance of the symbol NEUTRAL
  • 20% chance of the symbol POSITIVE
  • 10% chance of the symbol NEGATIVE
  • 10% chance of the symbol END-OF-DATA. (The presence of this symbol means the stream is 'internally terminated', which is fairly common in data compression; the first and only time the decoder sees this symbol, it knows the entire stream has been decoded.)

Arithmetic coding can handle more than this simple four-symbol case; more complex models, including higher-order ones, work as well. In a higher-order model, the probability of the current symbol depends on the symbols that came before it; those preceding symbols are called the context. In English text, for example, the probability of the letter 'u' rises sharply after a 'Q' or 'q'. A model can also be adaptive: the probability distribution estimated for a given context is updated each time that context recurs, so the estimate tracks the actual distribution ever more closely. Whatever model the encoder uses, the decoder must use the same one.
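As an illustration, the following minimal sketch (hypothetical names, not from the original article) shows an order-1 adaptive model in Python: it keeps one frequency table per preceding symbol and re-estimates the probabilities after every symbol it sees:

    from collections import defaultdict

    class AdaptiveOrder1Model:
        """Order-1 adaptive model: the probability distribution depends on
        the previous symbol and is updated after every observation."""

        def __init__(self, alphabet):
            self.alphabet = list(alphabet)
            # Start every context with a count of 1 per symbol (Laplace
            # smoothing), so no symbol ever has probability zero.
            self.counts = defaultdict(lambda: {s: 1 for s in self.alphabet})

        def probabilities(self, context):
            table = self.counts[context]
            total = sum(table.values())
            return {s: c / total for s, c in table.items()}

        def update(self, context, symbol):
            # Encoder and decoder must call this identically so that
            # their models stay in lockstep.
            self.counts[context][symbol] += 1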

Every step of the encoding process, except for the very last, is the same. The encoder typically has to consider three pieces of data:

  • The next symbol that needs to be encoded
  • The current interval (before the first symbol is encoded, this interval is [0, 1), but it changes with every symbol encoded)
  • The probabilities the model assigns to each of the possible symbols at this step (as mentioned above, in higher-order or adaptive models these probabilities are not necessarily the same in every step)

The encoder divides the current interval into sub-intervals, the length of each sub-interval proportional to the probability of the corresponding symbol in the current context. The sub-interval corresponding to the symbol actually being encoded becomes the initial interval for the next step.

: For the four-symbol model given above:

  • the interval for NEUTRAL is [0, 0.6)
  • the interval for POSITIVE is [0.6, 0.8)
  • the interval for NEGATIVE is [0.8, 0.9)
  • the interval for END-OF-DATA is [0.9, 1)
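A minimal Python sketch of this narrowing step for the fixed four-symbol model above (the table name and layout are illustrative assumptions, not from the original article):

    # Sub-intervals of the four-symbol model above.
    MODEL = {
        "NEUTRAL":     (0.0, 0.6),
        "POSITIVE":    (0.6, 0.8),
        "NEGATIVE":    (0.8, 0.9),
        "END-OF-DATA": (0.9, 1.0),
    }

    def encode(symbols):
        """Return the final interval [low, high) after encoding all symbols."""
        low, high = 0.0, 1.0
        for sym in symbols:
            width = high - low
            sym_low, sym_high = MODEL[sym]
            # The symbol's sub-interval becomes the next step's interval.
            low, high = low + width * sym_low, low + width * sym_high
        return low, high

    # Encoding NEUTRAL, NEGATIVE, END-OF-DATA narrows [0, 1) down to
    # [0.534, 0.54) (up to floating-point rounding), the interval
    # containing the 0.538 decoded in the worked example below.
    print(encode(["NEUTRAL", "NEGATIVE", "END-OF-DATA"]))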

When all symbols have been encoded, the resulting final interval uniquely identifies the sequence of symbols that was encoded. Anyone who has that final interval and the model parameters that were used can reconstruct the symbol sequence.

The final interval itself does not actually need to be transmitted; it is enough to transmit a single fraction that lies within it. In practice, it suffices to transmit enough digits of that fraction (in whatever base) that every fraction beginning with those digits falls within the final interval.
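One way to pick such digits, sketched below under the assumption of binary output: find the shortest dyadic interval that fits entirely inside the final interval and transmit the digits that define it. (For [0.534, 0.54) this gives the 8-bit string 10001001; the worked example below uses the equally long .10001010 instead, which relies on the decoder knowing the exact digit count.)

    import math

    def shortest_binary_digits(low, high):
        """Shortest bit string d1...dn such that every fraction whose binary
        expansion begins .d1...dn lies inside the half-open interval
        [low, high)."""
        n = 1
        while True:
            k = math.ceil(low * 2 ** n)    # smallest numerator with k/2^n >= low
            if (k + 1) / 2 ** n <= high:   # the whole dyadic interval fits
                return format(k, f"0{n}b")
            n += 1

    print(shortest_binary_digits(0.534, 0.54))   # -> '10001001' (8 bits)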

: Below we decode a message encoded with the four-symbol model described above. The encoded result is 0.538 (for ease of understanding, decimal rather than binary is used here; we also assume the result has exactly as many digits as are needed for decoding. Both points are discussed below).

We start, as the encoder did, with the interval [0,1), and using the same model, we divide it into the same four sub-intervals that the encoder must have. Our fraction 0.538 falls into the sub-interval for NEUTRAL, [0, 0.6); this indicates to us that the first symbol the encoder read must have been NEUTRAL, so we can write that down as the first symbol of our message.

We then divide the interval [0, 0.6) into sub-intervals:

  • the interval for NEUTRAL would be [0, 0.36) -- 60% of [0, 0.6)
  • the interval for POSITIVE would be [0.36, 0.48) -- 20% of [0, 0.6)
  • the interval for NEGATIVE would be [0.48, 0.54) -- 10% of [0, 0.6)
  • the interval for END-OF-DATA would be [0.54, 0.6). -- 10% of [0, 0.6)

Our fraction of .538 is within the interval [0.48, 0.54); therefore the second symbol of the message must have been NEGATIVE.

Once more we divide our current interval into sub-intervals:

  • the interval for NEUTRAL would be [0.48, 0.516)
  • the interval for POSITIVE would be [0.516, 0.528)
  • the interval for NEGATIVE would be [0.528, 0.534)
  • the interval for END-OF-DATA would be [0.534, 0.540).

Our fraction of .538 falls within the interval of the END-OF-DATA symbol; therefore, this must be our next symbol. Since it is also the internal termination symbol, it means our decoding is complete. (If the stream were not internally terminated, we would need to know where the stream stops from some other source -- otherwise, we would continue the decoding process forever, mistakenly reading more symbols from the fraction than were in fact encoded into it.)
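The walk just performed can be written directly in code. A minimal Python sketch, reusing the illustrative model table from the encoder sketch above; instead of subdividing the interval, it equivalently rescales the fraction back onto [0, 1) after each symbol:

    MODEL = {"NEUTRAL": (0.0, 0.6), "POSITIVE": (0.6, 0.8),
             "NEGATIVE": (0.8, 0.9), "END-OF-DATA": (0.9, 1.0)}

    def decode(fraction, model=MODEL):
        """Read symbols out of the fraction until END-OF-DATA appears."""
        result = []
        while True:
            for sym, (lo, hi) in model.items():
                if lo <= fraction < hi:
                    result.append(sym)
                    # Map the chosen sub-interval back onto [0, 1).
                    fraction = (fraction - lo) / (hi - lo)
                    break
            if result[-1] == "END-OF-DATA":
                return result

    print(decode(0.538))   # -> ['NEUTRAL', 'NEGATIVE', 'END-OF-DATA']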

That the same message could have been encoded by the equally short fractions .534, .535, .536, .537 or .539 suggests that our use of decimal instead of binary introduced some inefficiency. This is correct; the information content of a three-digit decimal is approximately 9.966 bits; we could have encoded the same message in the binary fraction .10001010 (equivalent to .5390625 decimal) at a cost of only 8 bits. This is only slightly larger than the information content, or entropy, of our message, which, occurring with a probability of 0.6%, carries approximately 7.381 bits. (Note that the final zero must be specified in the binary fraction, or else the message would be ambiguous.)
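The arithmetic behind those figures, as a quick check in Python:

    import math

    p = 0.6 * 0.1 * 0.1       # P(NEUTRAL, NEGATIVE, END-OF-DATA) = 0.006
    print(-math.log2(p))      # entropy of the message: about 7.381 bits
    print(3 * math.log2(10))  # a 3-digit decimal carries about 9.966 bits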


Precision and renormalization

The above explanations of arithmetic coding contain some simplification. In particular, they are written as if the encoder first calculated the fractions representing the endpoints of the interval in full, using infinite precision, and only converted the fraction to its final form at the end of encoding. Rather than try to simulate infinite precision, most arithmetic coders instead operate at a fixed limit of precision that they know the decoder will be able to match, and round the calculated fractions to their nearest equivalents at that precision. An example shows how this would work if the model called for the interval [0,1) to be divided into thirds, and this was approximated with 8 bit precision. Note that now that the precision is known, so are the binary ranges we'll be able to use.

Symbol   Probability (as a fraction)   Interval at 8-bit precision (as fractions)   Interval at 8-bit precision (in binary)   Range in binary
A        1/3                           [0, 85/256)                                  [0.00000000, 0.01010101)                  00000000 - 01010100
B        1/3                           [85/256, 171/256)                            [0.01010101, 0.10101011)                  01010101 - 10101010
C        1/3                           [171/256, 1)                                 [0.10101011, 1.00000000)                  10101011 - 11111111
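The fractions in this table can be derived mechanically. A sketch of the rounding, assuming the model's cumulative probability boundaries are simply scaled by 256 and rounded to the nearest integer:

    PRECISION = 8            # bits
    SCALE = 1 << PRECISION   # 256

    # Cumulative probability boundaries: A ends at 1/3, B at 2/3, C at 1.
    boundaries = [0, 1 / 3, 2 / 3, 1]
    points = [round(b * SCALE) for b in boundaries]   # [0, 85, 171, 256]

    for sym, lo, hi in zip("ABC", points, points[1:]):
        # Print the inclusive binary range, as in the last column above.
        print(f"{sym}: {lo:08b} - {hi - 1:08b}")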

A process called renormalization keeps the finite precision from becoming a limit on the total number of symbols that can be encoded. Whenever the range is reduced to the point where all values in the range share certain beginning digits, those digits are sent to the output. However many digits of precision the computer can handle, it is now handling fewer than that, so the existing digits are shifted left, and at the right, new digits are added to expand the range as widely as possible. Note that this result occurs in two of the three cases from our previous example.

Symbol   Probability   Range                 Digits that can be sent to output   Range after renormalization
A        1/3           00000000 - 01010100   0                                   00000000 - 10101001
B        1/3           01010101 - 10101010   None                                01010101 - 10101010
C        1/3           10101011 - 11111111   1                                   01010110 - 11111111
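The shift itself can be sketched as follows (a hypothetical helper, assuming the range is held as a pair of 8-bit integers): while both ends of the range agree on their most significant bit, that bit is sent to the output and shifted out, with a 0 shifted in on the low end and a 1 on the high end to widen the range again:

    def renormalize(low, high, bits=8):
        """Emit the shared leading bits of the inclusive range [low, high]
        and shift them out, restoring the range to full width."""
        out = []
        top = 1 << (bits - 1)    # mask for the most significant bit
        mask = (1 << bits) - 1   # keep values inside 'bits' bits
        while (low & top) == (high & top):
            out.append((low & top) >> (bits - 1))
            low = (low << 1) & mask          # shift in a 0 on the right
            high = ((high << 1) & mask) | 1  # shift in a 1 on the right
        return out, low, high

    # Case A above: 00000000 - 01010100 shares a leading 0.
    print(renormalize(0b00000000, 0b01010100))
    # -> ([0], 0, 169): output the bit 0, new range 00000000 - 10101001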

Connections between arithmetic coding and other compression methods


Huffman coding

There is great similarity between arithmetic coding and Huffman coding -- in fact, Huffman coding is just a special case of arithmetic coding -- but because arithmetic coding translates the entire message into a single number represented in base b, rather than translating each symbol of the message into a series of digits in base b, it usually approaches optimal entropy encoding more closely than Huffman coding does.
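A quick numerical illustration of why (a check, not a claim from the article): a Huffman code must spend at least one whole bit on every symbol, while arithmetic coding can spend a fraction of a bit on a very probable one:

    import math

    p = 0.99              # a very probable symbol
    print(-math.log2(p))  # about 0.0145 bits: arithmetic coding's ideal cost
    # A Huffman code must still give this symbol a codeword of >= 1 bit.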


Range encoding

Arithmetic coding and range encoding are deeply related; they are so similar that their performance is generally considered identical, and where a difference does exist, range encoding trails by at most a few bits. Unlike arithmetic coding, range encoding is generally believed not to be covered by any company's patents.

The idea behind range encoding is that, instead of starting with the interval [0,1) and dividing it into sub-intervals proportional to the probability of each symbol, the encoder starts with a large range of non-negative integers, such as 000,000,000,000 to 999,999,999,999, and divides it into sub-ranges proportional to the probability of each symbol. When the sub-ranges get narrowed down sufficiently that the leading digits of the final result are known, those digits may be shifted "left" out of the calculation, and replaced by digits shifted in on the "right" -- each time this happens, it is roughly equivalent to a retroactive multiplication of the size of the initial range.
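A toy sketch of that idea in base 10 (hypothetical and greatly simplified; practical range coders work on binary machine words and must also handle carry propagation):

    DIGITS = 12
    TOP = 10 ** DIGITS   # the range starts as [0, 10^12)

    def narrow(low, span, cum_low, cum_high, total):
        """Narrow the integer range to the sub-range of a symbol whose
        cumulative frequency interval is [cum_low, cum_high) out of total."""
        return (low + span * cum_low // total,
                span * (cum_high - cum_low) // total)

    def shift_out(low, span):
        """While the leading decimal digit of the range is settled, emit it
        and shift it out, scaling the remaining range up by 10."""
        out = []
        while low // (TOP // 10) == (low + span - 1) // (TOP // 10):
            out.append(low // (TOP // 10))
            low = (low % (TOP // 10)) * 10
            span *= 10
        return out, low, span

    # A symbol with cumulative interval [6, 7) out of 10 (probability 0.1,
    # preceded by probability mass 0.6) settles the leading digit at once:
    low, span = narrow(0, TOP, 6, 7, 10)
    print(shift_out(low, span))   # -> ([6], 0, 1000000000000)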


US patents on arithmetic coding

A variety of specific techniques for arithmetic coding have been protected by US patents. Some of these patents may be essential for implementing the algorithms for arithmetic coding that are specified in some formal international standards. When this is the case, such patents are generally available for licensing under what are called reasonable and non-discriminatory (RAND) licensing terms (at least as a matter of standards-committee policy). In some well-known instances (including some involving IBM patents) such licenses are available for free, and in other instances, licensing fees are required. The availability of licenses under RAND terms does not necessarily satisfy everyone who might want to use the technology, as what may be "reasonable" fees for a company preparing a proprietary software product may seem much less reasonable for a free software or open source project.

One company well known for innovative work and patents in the area of arithmetic coding is IBM. Some commenters feel that the notion that no kind of practical and effective arithmetic coding can be performed without infringing on valid patents held by IBM or others is just a persistent urban legend in the data compression community (especially considering that effective designs for arithmetic coding have now been in use long enough for many of the original patents to have expired). However, since patent law provides no "bright line" test that proactively allows you to determine whether a court would find a particular use to infringe a patent, and as even investigating a patent more closely to determine what it actually covers could actually increase the damages awarded in an unfavorable judgement, the patenting of these techniques has nevertheless caused a chilling effect on their use. At least one significant compression software program, bzip2, deliberately discontinued the use of arithmetic coding in favor of Huffman coding due to the patent situation.

Some US patents relating to arithmetic coding are listed below.

  • Patent 4,122,440 — (IBM) Filed March 4, 1977, Granted Oct 24, 1978 (Now expired)
  • Patent 4,286,256 — (IBM) Granted Aug 25, 1981 (presumably now expired)
  • Patent 4,467,317 — (IBM) Granted Aug 21, 1984 (presumably now expired)
  • Patent 4,652,856 — (IBM) Granted Feb 4, 1986 (presumably now expired)
  • Patent 4,891,643 — (IBM) Filed 1986/09/15, granted 1990/01/02
  • Patent 4,905,297 — (IBM) Granted Feb 27, 1990
  • Patent 4,933,883 — (IBM) Granted Jun 12, 1990
  • Patent 4,935,882 — (IBM) Granted Jun 19, 1990
  • Patent 4,989,000 — (???) Filed 1989/06/19, granted 1991/01/29
  • Patent 5,099,440
  • Patent 5,272,478 — (Ricoh)

Note: This list is not exhaustive. See the following link for a list of more patents. [1]

Patents on arithmetic coding may exist in other jurisdictions; see software patents for a discussion of the patentability of software around the world.

 