Assignment | An Analysis of LZW Encoding and Decoding


weixin_52188502

Category: Assignments

Article link: https://blog.csdn.net/weixin_52188502/article/details/115983665

Basic idea of LZW encoding and decoding

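LZW coding is built around a translation table called the dictionary. The encoder manages this dictionary to convert its input into its output. The input is a character stream, for example a string of 8-bit ASCII characters, and the output is a stream of codewords, each n bits wide (for example, 12 bits).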
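To make the "n-bit codeword stream" concrete, here is a minimal sketch of packing fixed-width codes into bytes. The function name `pack_codes`, the MSB-first bit order and the zero padding of the last byte are assumptions of this sketch, not something the assignment code is known to do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pack fixed-width codes MSB-first into out[].
 * Returns the number of bytes written (stops early if out[] is full).
 * The bit order and padding are assumptions of this sketch. */
size_t pack_codes(const uint16_t *codes, size_t n, unsigned width,
                  uint8_t *out, size_t out_cap)
{
    uint32_t acc = 0;    /* bit accumulator */
    unsigned nbits = 0;  /* number of pending bits in acc */
    size_t len = 0;

    assert(width >= 1 && width <= 16);
    for (size_t i = 0; i < n; i++) {
        acc = (acc << width) | (codes[i] & ((1u << width) - 1u));
        nbits += width;
        while (nbits >= 8) {             /* emit every complete byte */
            if (len == out_cap) return len;
            out[len++] = (uint8_t)(acc >> (nbits - 8));
            nbits -= 8;
        }
    }
    if (nbits > 0 && len < out_cap)      /* flush, zero-padded on the right */
        out[len++] = (uint8_t)(acc << (8 - nbits));
    return len;
}
```

With a width of 12, the five codes 0 1 2 2 4 occupy 8 bytes, which is already more than a 6-character input. This is one reason very small files can grow.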

Encoding

Example: encode abccbc
Initial dictionary:

Code	String
0	a
1	b
2	c

Encoding steps:

Step	P	C	P+C	P+C in dictionary?	New P	New entry	Output
1	-	a	a	yes	a	-	-
2	a	b	ab	no	b	ab:3	0
3	b	c	bc	no	c	bc:4	1
4	c	c	cc	no	c	cc:5	2
5	c	b	cb	no	b	cb:6	2
6	b	c	bc	yes	bc	-	-
7	bc	-	-	-	-	-	4

Output: 0 1 2 2 4, which stands for a, b, c, c, bc
Dictionary after encoding:

Code	String
0	a
1	b
2	c
3	ab
4	bc
5	cc
6	cb
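The table above can be reproduced with a short sketch. This is a simplified stand-in, not the assignment's encoder: the names `lzw_encode`, `find_entry` and `add_entry`, the linear dictionary search and the `alphabet` parameter (which lists the initial single-character entries) are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define DICT_MAX 4096              /* assumed capacity (12-bit codes) */

static int  prefix_of[DICT_MAX];   /* code of the string minus its last char; -1 for roots */
static char suffix_of[DICT_MAX];   /* last character of the string */
static int  dict_size;

/* Return the code for str(prefix) + c, or -1 if there is no such entry. */
static int find_entry(int prefix, char c)
{
    for (int i = 0; i < dict_size; i++)
        if (prefix_of[i] == prefix && suffix_of[i] == c)
            return i;
    return -1;
}

/* Append str(prefix) + c as a new entry if there is room. */
static void add_entry(int prefix, char c)
{
    if (dict_size < DICT_MAX) {
        prefix_of[dict_size] = prefix;
        suffix_of[dict_size] = c;
        dict_size++;
    }
}

/* Encode `in`; alphabet[i] becomes code i. Every input character must
 * appear in `alphabet`. Returns the number of codes written. */
size_t lzw_encode(const char *in, const char *alphabet, int *codes, size_t cap)
{
    size_t n = 0;
    int p = -1;                    /* code of the current prefix P; -1 = empty */

    dict_size = 0;
    for (const char *a = alphabet; *a; a++)
        add_entry(-1, *a);

    for (; *in; in++) {
        int pc = find_entry(p, *in);
        if (pc >= 0) {             /* P+C is in the dictionary: P = P+C */
            p = pc;
        } else {                   /* P+C is new: output P, add P+C, P = C */
            assert(p >= 0);        /* fails if the first char is not in the alphabet */
            if (n < cap) codes[n++] = p;
            add_entry(p, *in);
            p = find_entry(-1, *in);
            assert(p >= 0);        /* fails if C is not in the alphabet */
        }
    }
    if (p >= 0 && n < cap) codes[n++] = p;   /* flush the last prefix */
    return n;
}
```

Calling `lzw_encode("abccbc", "abc", codes, 16)` fills `codes` with 0 1 2 2 4 and leaves the dictionary with the seven entries listed above.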

Decoding

Decode: 0 1 2 2 4
Initial dictionary:

Code	String
0	a
1	b
2	c

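Decoding steps (Pw is the previous code, Cw the current code; C is the first character of the string for Cw):

Step	Pw	Cw	Cw in dictionary?	Action	Output
1	-	0	yes	P=a	a
2	0	1	yes	P=a, C=b, P+C=ab, ab:3	b
3	1	2	yes	P=b, C=c, P+C=bc, bc:4	c
4	2	2	yes	P=c, C=c, P+C=cc, cc:5	c
5	2	4	yes	P=c, C=b, P+C=cb, cb:6	bc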

Output: abccbc
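The decoding steps can be sketched as follows. This is an illustrative version that stores every dictionary string in full; the name `lzw_decode`, the small fixed limits and the `alphabet` parameter are assumptions of the sketch. It also handles a code that is not in the dictionary yet, which is the case discussed in the next section.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DICT_MAX 64   /* assumed capacity, enough for short examples */
#define STR_MAX  32   /* assumed maximum entry length, including '\0' */

/* Decode n codes into out (NUL-terminated); alphabet[i] is code i.
 * Returns the decoded length, or 0 on invalid input or lack of space. */
size_t lzw_decode(const int *codes, size_t n, const char *alphabet,
                  char *out, size_t out_cap)
{
    char dict[DICT_MAX][STR_MAX];
    int next_code = 0;
    int pw = -1;                       /* previous code Pw */
    size_t len = 0;

    assert(codes && alphabet && out);
    if (out_cap == 0) return 0;
    out[0] = '\0';
    for (const char *a = alphabet; *a && next_code < DICT_MAX; a++) {
        dict[next_code][0] = *a;
        dict[next_code][1] = '\0';
        next_code++;
    }
    for (size_t i = 0; i < n; i++) {
        int cw = codes[i];             /* current code Cw */
        char entry[STR_MAX];
        size_t pl = (pw >= 0) ? strlen(dict[pw]) : 0;

        if (cw >= 0 && cw < next_code) {
            strcpy(entry, dict[cw]);   /* Cw is in the dictionary */
        } else if (cw == next_code && pw >= 0 && pl + 2 <= STR_MAX) {
            /* Cw is not known yet: str(Pw) + first char of str(Pw) */
            memcpy(entry, dict[pw], pl);
            entry[pl] = dict[pw][0];
            entry[pl + 1] = '\0';
        } else {
            return 0;                  /* invalid code stream */
        }
        size_t elen = strlen(entry);
        if (len + elen + 1 > out_cap) return 0;
        memcpy(out + len, entry, elen + 1);
        len += elen;

        /* rebuild the entry the encoder added one step earlier:
         * str(Pw) + first char of the string just output */
        if (pw >= 0 && next_code < DICT_MAX) {
            if (pl + 2 > STR_MAX) return 0;
            memcpy(dict[next_code], dict[pw], pl);
            dict[next_code][pl] = entry[0];
            dict[next_code][pl + 1] = '\0';
            next_code++;
        }
        pw = cw;
    }
    return len;
}
```

Decoding the codes 0 1 2 2 4 with the alphabet "abc" yields abccbc, matching the table.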

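The repetition case (Cw not in the dictionary)

Suppose the input is abababa. With the initial dictionary used above (a:0, b:1, c:2), the encoder emits 0 1 3 5 and adds ab:3, ba:4 and aba:5 along the way. The decoder outputs a, b and ab, and then receives Cw = 5, which is not in its dictionary yet: at that point it only holds codes 0 to 4. In this case the missing string must be str(Pw) plus the first character of str(Pw), here ab + a = aba, so the decoder writes aba:5 into the dictionary and outputs aba.
Cw can be missing from the dictionary only when the encoder has just added a string to the dictionary and the very next thing it encodes is that same string.

Key points:
1. The decoder rebuilds the dictionary entries while it decodes.
2. The decoder's dictionary lags one step behind the encoder's.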

Experiment

Code walkthrough

void LZWEncode( FILE *fp, BITFILE *bf){
	int character;
	int string_code;
	int index;
	unsigned long file_length;

	fseek( fp, 0, SEEK_END);
	file_length = ftell( fp);
	fseek( fp, 0, SEEK_SET);
	BitsOutput( bf, file_length, 4*8);
	InitDictionary();
	string_code = -1;
	while( EOF!=(character=fgetc( fp))){
		index = InDictionary( character, string_code);
		if( 0<=index){	// string+character in dictionary
			string_code = index;
		}else{	// string+character not in dictionary
			output( bf, string_code);
			if( MAX_CODE > next_code){	// free space in dictionary
				// add string+character to dictionary
				AddToDictionary( character, string_code);
			}
			string_code = character;
		}
	}
	output( bf, string_code);
}

// LZW decoding
void LZWDecode( BITFILE *bf, FILE *fp){
	int character = 0;	// first character of the most recently decoded string
	int new_code, last_code=-1;
	int phrase_length;
	unsigned long file_length;

	// the encoder wrote the original file length into the first 32 bits
	file_length = BitsInput( bf, 4*8);
	if( -1 == file_length) file_length = 0;
	InitDictionary();	// start from the initial dictionary
	while (file_length > 0) {
		new_code = input(bf);	// read one code
		if (new_code >= next_code) {	// code is not in the dictionary yet
			// the string is str(last_code) + its first character;
			// d_stack holds strings reversed, so the trailing character goes in slot 0
			d_stack[0] = character;
			phrase_length = DecodeString(1, last_code);
		}
		else {	// code is in the dictionary
			phrase_length = DecodeString(0, new_code);
		}
		character = d_stack[phrase_length-1];	// first character of the current string
		while (0 < phrase_length) {	// pop the stack to write the string in order
			phrase_length--;
			fputc(d_stack[phrase_length],fp);
			file_length--;
		}
		if (MAX_CODE > next_code) {	// the dictionary still has free entries
			// add str(last_code) + character; on the first pass last_code is -1,
			// so this relies on AddToDictionary ignoring a negative prefix
			AddToDictionary(character, last_code);
		}
		last_code = new_code;	// the current code becomes the previous code
	}
}
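`DecodeString` and `d_stack` are not shown in this post. Judging only from how the decoder calls them, a plausible shape is a walk up a chain of parent codes that pushes one character per step onto a stack. The sketch below is a guess along those lines: the `struct entry` layout and the `decode_string` signature are hypothetical, not the assignment's actual definitions.

```c
#include <assert.h>

/* Hypothetical dictionary node: the string for an entry is its parent's
 * string plus one trailing character. Roots have parent -1. */
struct entry {
    int parent;
    unsigned char suffix;
};

/* Walk from `code` up to its root, pushing characters onto `stack`
 * beginning at index `start`. The string ends up reversed: its last
 * character is at stack[start] and its first at stack[height - 1].
 * Returns the new stack height, or 0 if the stack is too small. */
int decode_string(const struct entry *dict, int code, int start,
                  unsigned char *stack, int cap)
{
    int height = start;

    assert(start >= 0);
    while (code >= 0) {
        if (height >= cap) return 0;
        stack[height++] = dict[code].suffix;
        code = dict[code].parent;
    }
    return height;
}
```

Under this layout the decoder's two calls make sense. For a known code it starts at index 0. For an unknown code it first places the extra trailing character in slot 0 and starts at index 1. In both cases the first character of the string is at the top of the stack, and popping from the top writes the string in the right order.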

Compressing a text file


Compressed file size comparison


File type	Size before (KB)	Size after (KB)	Size reduction (%)
xls	24	17	29.2
txt	2	3	-50.0
psd	352	169	52.0
ppt	152	152	0
png	9	1	88.9
mp4	109	76	30.3
mp3	7	5	28.6
jpg	240	306	-27.5
gif	1213	1492	-23.0
docx	15	25	-66.7
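The last column appears to be the relative size reduction, (before - after) / before × 100. For example, the xls row gives (24 - 17) / 24 × 100 ≈ 29.2. A small helper for recomputing it, with the hypothetical name `size_reduction_percent`:

```c
#include <assert.h>

/* Relative size reduction in percent.
 * Positive: the file shrank. Negative: the file grew. */
double size_reduction_percent(double before_kb, double after_kb)
{
    assert(before_kb > 0.0);
    return (before_kb - after_kb) / before_kb * 100.0;
}
```

A negative value means the "compressed" file is larger than the original, as in the txt, gif and docx rows.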

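The experiment shows that LZW does not compress every file well, and some files even grow. I think the reason is how much repetition a file contains: the more repeated strings there are, the longer the string each code stands for, and the better the compression. Formats such as jpg, gif and docx are already compressed, so they leave little repetition for LZW to exploit. Because each output code is wider than the 8-bit character it replaces, such files end up larger.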
