Assignment 5: LZW Encoding

Latest recommended article published on 2022-07-07 15:34:12

一点点晚风一

Tags: big data

Article link: https://blog.csdn.net/m0_68377615/article/details/125471684

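This article explains how LZW encoding works, covering the encoding and decoding steps and how the dictionary is built. It presents implementations of the encoder and the decoder, walks through encoding and decoding on concrete examples, and focuses on how LZW compresses data and how well it performs on different file types.

Summary generated by CSDN using intelligent technology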

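I. Overview

The idea behind LZW encoding is to keep extracting new strings from the character stream (informally, new "entries") and to represent each entry by a "code name", that is, a codeword. Encoding the character stream then amounts to replacing its strings with codewords, which produces a codeword stream and thereby compresses the data.
LZW encoding is built around a translation table called the dictionary. The LZW encoder converts input to output by managing this dictionary.
The input to the LZW encoder is a character stream, which can be a string of 8-bit ASCII characters, and the output is a stream of codewords of n bits each (for example, 12 bits).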

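II. Encoding

Step 1: Initialize the dictionary to contain all possible single characters, and initialize the current prefix P to empty.

Step 2: Set the current character C to the next character in the character stream. If no characters remain, output the codeword for P and stop.

Step 3: Check whether P+C is in the dictionary.

(1) If yes, extend P with C, that is, set P = P+C, and return to Step 2.

(2) If no, output the codeword W corresponding to the current prefix P, add P+C to the dictionary, set P = C, and return to Step 2.

Note: the LZW encoding algorithm first initializes the dictionary, then reads characters sequentially from the file to be compressed and encodes them as described above, and finally writes the resulting codeword stream to the output file.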
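
To make the three steps concrete, here is a minimal sketch that follows them literally on an in-memory string. It is not the implementation used in this assignment: it stores every entry as a plain string in a flat table and searches it linearly, which is slow but easy to follow. The names `lzw_encode_naive`, `naive_find`, `NAIVE_MAX_ENTRIES`, and `NAIVE_MAX_LEN` are made up for this illustration, and the 4096-entry limit simply assumes 12-bit codewords.

```c
#include <string.h>

#define NAIVE_MAX_ENTRIES 4096   /* assumed limit: 12-bit codewords */
#define NAIVE_MAX_LEN     64     /* assumed longest entry, for this demo only */

/* Flat dictionary: entry i holds the string whose codeword is i. */
static char naive_dict[NAIVE_MAX_ENTRIES][NAIVE_MAX_LEN];
static int  naive_len[NAIVE_MAX_ENTRIES];
static int  naive_count;

/* Return the codeword of s[0..len-1], or -1 if it is not in the dictionary. */
static int naive_find(const char *s, int len) {
    for (int i = 0; i < naive_count; i++)
        if (naive_len[i] == len && memcmp(naive_dict[i], s, (size_t)len) == 0)
            return i;
    return -1;
}

/* Encode the NUL-terminated string `in`; store codewords in `out`
   and return how many were produced. */
int lzw_encode_naive(const char *in, int *out) {
    char p[NAIVE_MAX_LEN];   /* current prefix P (plus room for C) */
    int plen = 0;            /* length of P; 0 means P is empty */
    int n = 0;

    /* Step 1: every single character is an entry. */
    naive_count = 256;
    for (int i = 0; i < 256; i++) {
        naive_dict[i][0] = (char)i;
        naive_len[i] = 1;
    }

    for (; *in; in++) {                 /* Step 2: C = next character */
        p[plen] = *in;                  /* tentatively form P+C */
        if (plen + 1 < NAIVE_MAX_LEN && naive_find(p, plen + 1) >= 0) {
            plen++;                     /* Step 3(1): P = P+C */
        } else {
            out[n++] = naive_find(p, plen);   /* Step 3(2): output W for P */
            if (naive_count < NAIVE_MAX_ENTRIES && plen + 1 < NAIVE_MAX_LEN) {
                memcpy(naive_dict[naive_count], p, (size_t)plen + 1);
                naive_len[naive_count++] = plen + 1;   /* add P+C */
            }
            p[0] = *in;                 /* P = C */
            plen = 1;
        }
    }
    if (plen > 0)
        out[n++] = naive_find(p, plen); /* input exhausted: flush P */
    return n;
}
```

Calling `lzw_encode_naive("ABABABA", out)` produces the four codewords 65, 66, 256, 258: "AB" becomes entry 256, "BA" entry 257, and "ABA" entry 258, so the last three characters are covered by a single codeword.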

1. Building and initializing the dictionary tree

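#define MAX_CODE (256 * 256)     // upper limit on the number of dictionary entries

// dictionary tree
struct {
	int suffix;    // last character (suffix) of this entry
	int parent, firstchild, nextsibling;  // parent node, first child node, and next sibling node of this node
} dictionary[MAX_CODE + 1];
int next_code;    // codeword to assign to the next new entry
int d_stack[MAX_CODE]; // holds decoded information

// initialize the dictionary
void InitDictionary(void) {
	for (int i = 0; i < 256; i++) {    // write every single character into the dictionary
		dictionary[i].suffix = i;   // suffix character
		dictionary[i].parent = -1;   // no parent: single characters are the roots
		dictionary[i].firstchild = -1;   // no children yet
		dictionary[i].nextsibling = i + 1;   // next (right-hand) sibling
	}
	dictionary[255].nextsibling = -1;   // the last entry of the first level has no next sibling
	next_code = 256;   // codeword of the next entry to be added
}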

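By default the dictionary contains the 256 single-byte characters (values 0 to 255, which include the ASCII codes), so new entries are written starting at index 256.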
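
Because each node stores only its last character and a link to its parent, the full string of an entry is never stored explicitly; it can be recovered by following parent links back to a root. The sketch below illustrates this under some assumptions: `DemoNode`, `demo`, `demo_init`, and `demo_entry_string` are hypothetical names that are not part of the assignment code, and the table is kept small on purpose.

```c
#define DEMO_MAX 1024          /* small table, enough for a demonstration */

struct DemoNode {
    int suffix;                /* last character of the entry */
    int parent;                /* codeword of the entry without its last character; -1 for roots */
};
static struct DemoNode demo[DEMO_MAX];

/* Roots 0..255 are the single characters, as in InitDictionary. */
void demo_init(void) {
    for (int i = 0; i < 256; i++) {
        demo[i].suffix = i;
        demo[i].parent = -1;
    }
}

/* Write the string of `code` into buf (capacity cap), NUL-terminated.
   Returns its length, or -1 if the code is out of range or buf is too small. */
int demo_entry_string(int code, char *buf, int cap) {
    int len = 0;
    if (code < 0 || code >= DEMO_MAX) return -1;
    /* Following parent links yields the characters from last to first. */
    for (int c = code; c >= 0; c = demo[c].parent) {
        if (len + 1 >= cap) return -1;
        buf[len++] = (char)demo[c].suffix;
    }
    /* Reverse in place to obtain first-to-last order. */
    for (int i = 0, j = len - 1; i < j; i++, j--) {
        char t = buf[i];
        buf[i] = buf[j];
        buf[j] = t;
    }
    buf[len] = '\0';
    return len;
}
```

For example, after `demo_init()`, setting entry 256 to suffix 'B' with parent 'A' and entry 257 to suffix 'A' with parent 256 makes `demo_entry_string(257, buf, 8)` fill `buf` with "ABA".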

2. Checking whether P+C is in the dictionary

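int InDictionary(int character, int string_code) {   // is the old entry string_code (prefix) followed by character (suffix) in the dictionary?
	int s;    // position of the candidate entry in the dictionary
	// string_code = -1 means the prefix is empty, so P+C is a single character, which is always in the dictionary; return the character itself
	if (string_code < 0)  return character;
	// otherwise search the children of string_code, starting with the first child; return the matching child, or -1 if none matches
	s = dictionary[string_code].firstchild;
	while (s > -1) {
		if (character == dictionary[s].suffix)   // compare against the child's suffix, not the prefix's
			return s;
		s = dictionary[s].nextsibling;   // move on to the child's next sibling
	}
	return -1;
}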

3. Adding a new entry to the dictionary tree

void AddToDictionary(int character, int string_code) {  // newly read character (suffix) and old entry string_code (prefix)
	int s1, s2;
	// string_code = -1 means a single character, which is already in the dictionary; nothing to add
	if (string_code < 0)  return;
	// fill in the new entry
	dictionary[next_code].suffix = character;
	dictionary[next_code].parent = string_code;
	dictionary[next_code].firstchild = -1;
	dictionary[next_code].nextsibling = -1;
	// link the new entry under its prefix
	s1 = dictionary[string_code].firstchild;
	if (s1 < 0) {
		// the prefix has no children yet: the new entry becomes its first child
		dictionary[string_code].firstchild = next_code;
	}
	else {
		// otherwise walk to the last child and append the new entry as its sibling
		s2 = s1;
		while (dictionary[s2].nextsibling>-1) {
			s2 = dictionary[s2].nextsibling;
		}
		dictionary[s2].nextsibling = next_code;
	}
	next_code++;
}
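
AddToDictionary appends the new entry at the end of the sibling list, which requires walking the whole list. Since lookup does not depend on sibling order, one possible variation (not what the code above does) is to insert at the head of the list in constant time. The sketch below shows that alternative on a separate table; `TNode`, `t`, `t_init`, `t_find`, and `t_add_front` are hypothetical names chosen for this example.

```c
#define T_MAX 4096             /* assumed table size for the demonstration */

struct TNode { int suffix, parent, firstchild, nextsibling; };
static struct TNode t[T_MAX];
static int t_next;             /* codeword of the next new entry */

/* Same layout as InitDictionary: 256 root entries chained as siblings. */
void t_init(void) {
    for (int i = 0; i < 256; i++) {
        t[i].suffix = i;
        t[i].parent = -1;
        t[i].firstchild = -1;
        t[i].nextsibling = i + 1;
    }
    t[255].nextsibling = -1;
    t_next = 256;
}

/* Codeword of prefix+character, or -1 if prefix has no such child. */
int t_find(int character, int prefix) {
    if (prefix < 0) return character;
    for (int s = t[prefix].firstchild; s > -1; s = t[s].nextsibling)
        if (t[s].suffix == character) return s;
    return -1;
}

/* Insert prefix+character as the FIRST child of prefix (no list walk).
   Returns the new codeword, or -1 if prefix is empty or the table is full. */
int t_add_front(int character, int prefix) {
    if (prefix < 0 || t_next >= T_MAX) return -1;
    int code = t_next++;
    t[code].suffix = character;
    t[code].parent = prefix;
    t[code].firstchild = -1;
    t[code].nextsibling = t[prefix].firstchild;  /* old first child becomes our sibling */
    t[prefix].firstchild = code;
    return code;
}
```

With this variant, adding "AB" and then "AC" leaves entry 257 ("AC") as the first child of 'A' and entry 256 ("AB") as its sibling; `t_find` still locates both. The trade-off is that the most recently added children are examined first during lookup.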

4. Encode

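void LZWEncode(FILE* fp, BITFILE* bf) {
	int character;    // newly read character C
	int string_code;    // codeword of the prefix encoded so far (the old entry P)
	int index;      // dictionary index of P+C
	unsigned long file_length;    // length of the input file

	fseek(fp, 0, SEEK_END);   // move the file pointer to the end of the file
	file_length = ftell(fp);  // get the file length
	fseek(fp, 0, SEEK_SET);   // move the file pointer back to the start of the file
	BitsOutput(bf, file_length, 4 * 8);   // write the file length as a 32-bit header using BitsOutput
	InitDictionary();    // initialize the dictionary
	string_code = -1;    // start at -1 (empty prefix) so the first lookup treats the input as a single character
	while (EOF != (character = fgetc(fp))) {
		 // fgetc reads one byte from the file and advances the position by one byte;
		 // it returns EOF at the end of the file, which ends the loop
		index = InDictionary(character, string_code);    // look up P+C; returns its index in the dictionary, or -1 if it is absent
		if (0 <= index) {	// P+C is already in the dictionary
			string_code = index;  // P<-P+C
		}
		else {
			output(bf, string_code);   // output (a redefined output routine) writes the old entry's codeword to the encoded file
			if (MAX_CODE > next_code) {	// if the dictionary still has room
				AddToDictionary(character, string_code);    // add P+C to the dictionary
			}
			string_code = character;   // the current character becomes the old entry, P<-C
		}
	}
	if (0 <= string_code) {	// nothing to flush if the input file was empty
		output(bf, string_code);  // after the whole file has been read, output the last old entry
	}
}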
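
LZWEncode relies on BitsOutput and output, whose definitions are not shown in this section. To give a rough idea of what writing a fixed-width value into a bit stream can look like, here is a hypothetical in-memory sketch. It is not the BITFILE interface used above: `BitBuf` and `bitbuf_put` are invented names, and the most-significant-bit-first order is an assumption.

```c
#include <stddef.h>

/* Hypothetical in-memory bit writer; NOT the BITFILE API used above. */
struct BitBuf {
    unsigned char *data;   /* caller-provided buffer, zero-initialized */
    size_t cap;            /* capacity of data in bytes */
    size_t nbits;          /* number of bits written so far */
};

/* Append the low `width` bits of `value`, most significant bit first
   (the bit order is an assumption of this sketch).
   Returns 0 on success, -1 if width is invalid or the buffer would overflow. */
int bitbuf_put(struct BitBuf *b, unsigned long value, int width) {
    if (width < 0 || width > 32) return -1;
    if (b->nbits + (size_t)width > b->cap * 8) return -1;
    for (int i = width - 1; i >= 0; i--) {
        if ((value >> i) & 1UL)
            b->data[b->nbits / 8] |= (unsigned char)(0x80u >> (b->nbits % 8));
        b->nbits++;   /* zero bits need no write: the buffer starts zeroed */
    }
    return 0;
}
```

Writing a 32-bit length of 7 followed by the codeword 256 in 16 bits yields the bytes 00 00 00 07 01 00. A width of 16 bits is enough for any codeword below MAX_CODE (65536); whether output actually uses 16 bits is an assumption here.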