Implementing Information Entropy and Information Content in C, [牛羚's Compression Algorithm Tutorial]-[chapter-1]-Information Entropy and HelloWorld...

Latest recommended article published on 2021-05-24 02:17:27

郭逗

Tags: implementing information entropy and information content in C

Improvements to hello_compress:

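As the next chapter will explain, the component that predicts upcoming data from the data already seen is called the "prediction model" (predictor/model), and the component that uses the model's predictions to strip redundancy from the data is called the "coder". These are the two most important parts of a compression algorithm. In hello_compress the model simply records the most recent three-character phrases, and the coder simply spends one bit to signal whether the prediction was right, so hello_compress is a simple, inefficient compressor. Some more advanced models and coders are listed here: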

Models include PPM (Prediction by Partial Matching), DMC (Dynamic Markov Compression), CTW (Context Tree Weighting), and CM (Context Mixing).

Coders include Shannon-Fano coding, Huffman coding, arithmetic coding, and range coding.
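To make the model/coder split concrete, here is a minimal sketch of an order-2 predictor of the kind described for hello_compress, paired with the one-bit hit/miss decision. The original hello_compress source is not shown in this section, so the type and function names (Model, model_predict, model_update, coder_flag, count_hits) are assumptions made up for illustration, not the author's code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Model: remembers which byte last followed each pair of bytes.
   All names in this sketch are hypothetical. */
typedef struct {
    unsigned char table[256][256]; /* table[c1][c2] = byte last seen after "c1 c2" */
    unsigned char c1, c2;          /* the two most recent bytes (the context) */
} Model;

void model_init(Model *m)
{
    memset(m, 0, sizeof(*m));
}

unsigned char model_predict(const Model *m)
{
    return m->table[m->c1][m->c2];
}

void model_update(Model *m, unsigned char actual)
{
    m->table[m->c1][m->c2] = actual; /* learn what followed this context */
    m->c1 = m->c2;                   /* slide the two-byte window */
    m->c2 = actual;
}

/* Coder, reduced to its decision:
   1 = prediction was right, only a flag bit is needed;
   0 = prediction was wrong, a flag bit plus the literal byte is needed. */
int coder_flag(Model *m, unsigned char actual)
{
    int hit = (model_predict(m) == actual);
    model_update(m, actual);
    return hit;
}

/* Count how many bytes of data the model predicts correctly. */
size_t count_hits(const unsigned char *data, size_t n)
{
    static Model m; /* static: the table is 64 KiB, kept off the stack */
    size_t i, hits = 0;

    assert(data != NULL || n == 0);
    model_init(&m);
    for (i = 0; i < n; i++)
        hits += (size_t)coder_flag(&m, data[i]);
    return hits;
}
```

On the 12-byte input "abcabcabcabc", count_hits returns 7: the first five bytes each arrive in a context that has not been seen yet and cost a flag bit plus a literal byte, while the remaining seven cost a flag bit only.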

In later sections I may use an efficient model and coder to build a compressor that performs close to WinRAR. For now, let us return to hello_compress.

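hello_compress keeps only the two most recent characters (these two characters are called the context). We could extend this to three characters, but predictor_map would then need a third dimension, and at four characters predictor_map becomes unmanageable: a table indexed by four full bytes would need 256^4 entries. So we adopt a different strategy: reduce the precision with which each character is recorded, and increase the number of characters recorded. The improved program follows: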
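The trade of precision for length can be looked at in isolation before reading the full program. The sketch below assumes the update rule the improved program uses, context = (context << 4) ^ byte on a 16-bit value. The helper names next_context and context_of are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: fold one more byte into a 16-bit hashed context.
   The cast back to unsigned short drops the bits shifted past bit 15. */
unsigned short next_context(unsigned short context, int next_char)
{
    return (unsigned short)(((unsigned)context << 4) ^ (unsigned)(next_char & 0xff));
}

/* Hypothetical helper: hash all bytes of a string, starting from context 0. */
unsigned short context_of(const char *s)
{
    unsigned short context = 0;

    assert(s != NULL);
    while (*s != '\0')
        context = next_context(context, (unsigned char)*s++);
    return context;
}
```

Each new byte pushes the older ones 4 bits to the left, so a byte drops out of the 16-bit context entirely once four more bytes have arrived: context_of("xabcd") and context_of("yabcd") are equal. The price is precision: "ab" and "`r" also hash to the same value (0x672), so both would share one predict_map slot. In exchange the table stays at 65536 entries no matter how the context is formed.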

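// file: hello_compress_improved
#include <stdio.h>
#include <string.h>

int main(int argc, char** argv)
{
    static char stdin_buffer[65536];
    static char stdout_buffer[65536];
    static unsigned char predict_map[65536]; // predicted next byte for each hashed context
    unsigned short context = 0;              // hash of the most recent bytes
    unsigned char block[8];                  // mispredicted bytes of the current block
    unsigned char flag_byte = 0;             // bit i set: byte i of the block was predicted
    int flag_pos = 0;                        // number of flag bits used so far
    int block_pos = 0;                       // number of literal bytes stored in block
    int i;
    int next_char;
    int out_char;

    // setbuffer is a BSD/glibc extension; the standard C equivalent is
    // setvbuf(stream, buffer, _IOFBF, size)
    setbuffer(stdin, stdin_buffer, sizeof(stdin_buffer));
    setbuffer(stdout, stdout_buffer, sizeof(stdout_buffer));

    if(argc == 2 && strcmp(argv[1], "encode") == 0)
    {
        // compress
        while((next_char = getchar()) != EOF)
        {
            // write to file if a flag_byte is full
            if(flag_pos == 8)
            {
                putchar(flag_byte);
                for(i = 0; i < block_pos; i++)
                {
                    putchar(block[i]);
                }
                flag_byte = 0;
                flag_pos = 0;
                block_pos = 0;
            }

            // fill a flag byte: set the bit on a hit, store the literal on a miss
            if(next_char == predict_map[context])
                flag_byte |= (1 << flag_pos);
            else
                block[block_pos++] = next_char;
            flag_pos += 1;

            // learn the byte, then shift it into the hashed context
            predict_map[context] = next_char;
            context = (context << 4) ^ (next_char & 0xff);
        }

        // write last block
        if(flag_pos > 0)
        {
            putchar(flag_byte);
            for(i = 0; i < block_pos; i++)
            {
                putchar(block[i]);
            }
        }
    }
    return 0;
}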
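The listing covers the encode path only. For reference, a decoder for the same format could look like the sketch below. It is an assumption derived from the encoder's output layout (one flag byte followed by the mispredicted bytes of that block), not the author's decoder, and it works on memory buffers rather than stdin/stdout so that a round trip is easy to check. hc_encode and hc_decode are hypothetical names; hc_encode only restates the encoder loop in buffer form.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Buffer version of the encoder loop. Returns the number of bytes written. */
size_t hc_encode(const unsigned char *in, size_t n, unsigned char *out)
{
    static unsigned char predict_map[65536];
    unsigned short context = 0;
    unsigned char block[8], flag_byte = 0;
    size_t flag_pos = 0, block_pos = 0, i, o = 0;

    assert(in != NULL && out != NULL);
    memset(predict_map, 0, sizeof(predict_map));
    for (i = 0; i < n; i++) {
        if (flag_pos == 8) { /* block full: flag byte first, then literals */
            out[o++] = flag_byte;
            memcpy(out + o, block, block_pos);
            o += block_pos;
            flag_byte = 0; flag_pos = 0; block_pos = 0;
        }
        if (in[i] == predict_map[context])
            flag_byte |= (unsigned char)(1u << flag_pos);
        else
            block[block_pos++] = in[i];
        flag_pos++;
        predict_map[context] = in[i];
        context = (unsigned short)(((unsigned)context << 4) ^ in[i]);
    }
    if (flag_pos > 0) { /* last, possibly partial, block */
        out[o++] = flag_byte;
        memcpy(out + o, block, block_pos);
        o += block_pos;
    }
    return o;
}

/* Assumed decoder: replays the same model so its predictions match the
   encoder's. Returns the number of bytes written. */
size_t hc_decode(const unsigned char *in, size_t n, unsigned char *out)
{
    static unsigned char predict_map[65536];
    unsigned short context = 0;
    size_t i = 0, o = 0;
    int bit;

    assert(in != NULL && out != NULL);
    memset(predict_map, 0, sizeof(predict_map));
    while (i < n) {
        unsigned char flag_byte = in[i++];
        for (bit = 0; bit < 8; bit++) {
            unsigned char c;
            if (flag_byte & (1u << bit)) {
                c = predict_map[context]; /* hit: the model already knows the byte */
            } else {
                if (i == n)
                    return o;             /* padding bits of the final block */
                c = in[i++];              /* miss: the literal is in the stream */
            }
            out[o++] = c;
            predict_map[context] = c;
            context = (unsigned short)(((unsigned)context << 4) ^ c);
        }
    }
    return o;
}
```

The caller has to size the buffers: hc_encode writes at most n + (n + 7) / 8 bytes, and hc_decode can produce up to 8 times its input. Unused bits in the final flag byte are zero, which reads as "literal follows"; the decoder treats running out of input at that point as the end of the data.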
