LZW Algorithm Explained

特立独行的猪鸭

Article link: https://blog.csdn.net/qq_40285036/article/details/103826481

1. Introduction to the LZW Algorithm

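LZW, also known as the "string table compression algorithm", is a lossless compression algorithm: it builds a table of strings and uses shorter codes to represent longer strings.
LZW compression involves three key objects: the data stream (CharStream), the code stream (CodeStream), and the string table (String Table). During encoding, the data stream is the input (for example, the character sequence of a text file) and the code stream is the output (the compressed codes). During decoding, the code stream is the input and the data stream is the output. The string table is needed in both directions.
The string table (called the dictionary below) is only an intermediate product of encoding and decoding, and it can be discarded once either one finishes.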

2. LZW Encoding by Hand

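Let us start with a simple scenario: we want to compress the sequence ILOVEYOUILOVEYOU, which contains only the uppercase letters A-Z.
Because the sequence contains only the letters A-Z, the dictionary below is already enough to encode it.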

Code	1	2	3	4	5	6	7	8	9	10	11	12	13	14	15	16	17	18	19	20	21	22	23	24	25	26
Seq	A	B	C	D	E	F	G	H	I	J	K	L	M	N	O	P	Q	R	S	T	U	V	W	X	Y	Z
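
As a minimal sketch, the initial dictionary above could be built as follows. The function name make_initial_dict and the use of std::map are assumptions for illustration, not part of the original code in section 4.

```cpp
#include <cassert>
#include <map>
#include <string>

// Build the initial dictionary from the table above:
// "A" -> 1, "B" -> 2, ..., "Z" -> 26.
// make_initial_dict is a hypothetical helper name.
std::map<std::string, int> make_initial_dict()
{
    std::map<std::string, int> dict;
    for (char c = 'A'; c <= 'Z'; ++c) {
        dict[std::string(1, c)] = c - 'A' + 1;
    }
    assert(dict.size() == 26);  // one entry per letter
    return dict;
}
```

With this numbering, make_initial_dict().at("I") gives 9, which is the first code in the hand simulation below.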

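Now suppose the dictionary also contained a subsequence of the input, for example the entry (27, "YOU"). Every occurrence of YOU could then be written as one code instead of three, so the encoded sequence would be shorter than before. LZW is built on exactly this idea. Let us look at the details.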

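Current is the subsequence currently being encoded. It is chosen by longest match. For example, to encode "IL" when the dictionary contains the entries (1, "I"), (2, "L"), and (3, "IL"), there are two possible encodings, 1 2 or 3. The rule is to always take the longest dictionary entry that matches, so the output is 3.
Next is the character that follows Current.
Code is the code emitted for Current.
Insert(Code, Seq) adds an entry to the dictionary.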
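
The longest-match rule can be sketched like this. The function name longest_match is hypothetical, and the sketch assumes the dictionary has the usual LZW property that every prefix of an entry is also an entry, so the search can stop at the first extension that is missing.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Return the longest prefix of text starting at pos that is in dict.
// Assumes every prefix of a dictionary entry is also an entry,
// so we stop at the first extension that is not found.
std::string longest_match(const std::map<std::string, int>& dict,
                          const std::string& text, std::size_t pos)
{
    assert(pos < text.size());
    std::string current;
    while (pos < text.size() && dict.count(current + text[pos]) > 0) {
        current += text[pos];
        ++pos;
    }
    return current;  // empty if even the first character is unknown
}
```

With the example entries (1, "I"), (2, "L"), (3, "IL"), calling longest_match(dict, "IL", 0) returns "IL", so the encoder emits 3 rather than 1 2.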

Current	Next	Code	Insert(Code,Seq)
I	L	09	(27, IL)
L	O	12	(28, LO)
O	V	15	(29, OV)
V	E	22	(30, VE)
E	Y	05	(31, EY)
Y	O	25	(32, YO)
O	U	15	(33, OU)
U	I	21	(34, UI)
IL	O	27	(35, ILO)
OV	E	29	(36, OVE)
EY	O	31	(37, EYO)
OU	(end)	33

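The Code column is the encoded sequence. If each character is stored in 1 byte, ILOVEYOUILOVEYOU takes 16 bytes. With LZW, and assuming each code is also stored in 1 byte, 12 bytes are enough. The cost is that the encoder has to maintain a dictionary in memory while it runs. The dictionary does not have to be transmitted, though, because the decoder rebuilds it as it decodes.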
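
To check the hand simulation, here is a compact encoder sketch that follows the same steps. It is not the implementation from section 4: the name lzw_encode_sketch, the std::map dictionary, and the restriction to uppercase A-Z with codes starting at 1 are assumptions made to match the table above.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Encode text (uppercase A-Z only) with the 1-based dictionary used
// in the hand simulation. lzw_encode_sketch is a hypothetical name.
std::vector<int> lzw_encode_sketch(const std::string& text)
{
    std::map<std::string, int> dict;
    for (char c = 'A'; c <= 'Z'; ++c) {
        dict[std::string(1, c)] = c - 'A' + 1;
    }
    int next_code = 27;  // first free code after 'Z'

    std::vector<int> codes;
    std::string current;  // longest match found so far
    for (char c : text) {
        assert(c >= 'A' && c <= 'Z');
        std::string extended = current + c;
        if (dict.count(extended) > 0) {
            current = extended;                 // keep extending the match
        } else {
            codes.push_back(dict.at(current));  // emit Code for Current
            dict[extended] = next_code++;       // Insert(Code, Current + Next)
            current = std::string(1, c);        // restart from Next
        }
    }
    if (!current.empty()) {
        codes.push_back(dict.at(current));      // flush the final match
    }
    return codes;
}
```

For ILOVEYOUILOVEYOU this returns the 12 codes 9, 12, 15, 22, 5, 25, 15, 21, 27, 29, 31, 33, the same as the Code column, compared with 16 input characters.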

3. LZW Decoding by Hand

Once the encoding process above is clear, decoding follows directly.

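Code: the code sequence being decoded.
Prev: the string decoded in the previous step.
Text: the string decoded from the current code.
Insert(Code, Seq): adds an entry to the dictionary, formed from Prev plus the first character of Text. The dictionary initially maps 1 through 26 to 'A' through 'Z'.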

Code	Prev	Text	Insert(Code,Seq)
9	(none)	I	(none)
12	I	L	(27,“IL”)
15	L	O	(28,“LO”)
22	O	V	(29,“OV”)
5	V	E	(30,“VE”)
25	E	Y	(31,“EY”)
15	Y	O	(32,“YO”)
21	O	U	(33,“OU”)
27	U	IL	(34,“UI”)
29	IL	OV	(35,“ILO”)
31	OV	EY	(36,“OVE”)
33	EY	OU	(37,“EYO”)
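
The decoding table can be reproduced with a short sketch like the one below. The name lzw_decode_sketch and the std::vector dictionary are assumptions for illustration. The sketch also covers one case that the table does not exercise: a code that is not in the dictionary yet, which can happen for inputs such as AAA. In that case the only possible string is Prev plus the first character of Prev.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Decode a 1-based LZW code sequence over A-Z.
// lzw_decode_sketch is a hypothetical name.
std::string lzw_decode_sketch(const std::vector<int>& codes)
{
    // dict[i] is the string for code i; index 0 is an unused placeholder.
    std::vector<std::string> dict(1);
    for (char c = 'A'; c <= 'Z'; ++c) {
        dict.push_back(std::string(1, c));
    }

    std::string output;
    std::string prev;  // Prev column; empty before the first code
    for (int code : codes) {
        assert(code >= 1);
        std::size_t idx = static_cast<std::size_t>(code);
        std::string text;
        if (idx < dict.size()) {
            text = dict[idx];  // normal case: code is already known
        } else {
            // Code was created by the encoder in its latest step,
            // so it must be Prev + first character of Prev.
            assert(idx == dict.size() && !prev.empty());
            text = prev + prev[0];
        }
        if (!prev.empty()) {
            dict.push_back(prev + text[0]);  // Insert(Code, Prev + Text[0])
        }
        output += text;
        prev = text;
    }
    return output;
}
```

Feeding it the 12 codes from section 2 gives back ILOVEYOUILOVEYOU, and the two codes 1, 27 decode to AAA through the special case.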

4. A Simple C++ Implementation of LZW Encoding and Decoding

#include <iostream>
#include <string>
#include <cstring>
#include <cstdio>	// printf, sprintf
#include <cstdlib>	// malloc

#define NotExist -1
#define MaxSize 1000

using namespace std;

typedef struct Dictionary{
	char	**seq;		// stored strings
	int		*code;		// code assigned to each string
	int		size;		// number of entries currently stored
	int		maxsize;	// capacity of the dictionary
}Dictionary, *PDictionary;

// Insert one (code, seq) entry into the dictionary
void insert_seq(PDictionary dict, int code, const char* seq)
{
	if (dict->size == dict->maxsize){
		printf("Dictionary is full, insert failed!\n");
		return;
	}
	int i = dict->size;
	dict->code[i] = code;
	dict->seq[i] = (char*)malloc(strlen(seq) + 1);	// +1 for the terminating '\0'
	strcpy(dict->seq[i], seq);
	dict->size++;
	return;
}

// Initialize the dictionary
void initDict(PDictionary dict, int maxsize)
{
	dict->maxsize = maxsize;
	dict->size = 0;
	dict->code = (int*)malloc(sizeof(int) * maxsize);
	dict->seq = (char**)malloc(sizeof(char*) * maxsize);

	// Preload 'A'..'Z' with codes 0..25 (the hand simulation above numbers them 1..26)
	char seq[2] = "A";
	for (int i = 0; i < 26; i++){
		insert_seq(dict, i, seq);
		seq[0]++;
	}
	return;
}

// Print the dictionary
void print_dict(PDictionary dict)
{
	printf("===================\n");
	printf("Code            Seq\n");
	for (int i = 0; i < dict->size; i++)
	{
		printf("%4d%7s\n", dict->code[i], dict->seq[i]);
	}
	return;
}

// Look up seq in the dictionary; return its code, or NotExist if it is absent
int is_exist_seq(PDictionary dict, const char *seq)
{
	for (int i = 0; i < dict->size; i++)
	{
		if (strcmp(seq, dict->seq[i]) == 0){
			return dict->code[i];
		}
	}
	return NotExist;
}

// Look up the string for a given code (the code is used directly as the index)
char* find_seq(PDictionary dict, int code)
{
	return dict->seq[code];
}

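// LZW encoding: print the code sequence for text (assumes text contains only 'A'..'Z')
void lzw_encode(PDictionary dict, const char* text)
{
	char cur[MaxSize];
	int code;
	int i = 0;
	int len = strlen(text);
	while (i < len)
	{
		// Start a new match with the current character
		int n = 0;
		cur[n++] = text[i];
		cur[n] = '\0';

		// Extend the match while it is still in the dictionary
		while (is_exist_seq(dict, cur) != NotExist && i < len)
		{
			i++;
			cur[n++] = text[i];	// at i == len this appends the terminating '\0'
			cur[n] = '\0';
		}

		if (i < len) {
			// cur is the longest match plus one character: insert it, then drop that character
			insert_seq(dict, dict->size, cur);
			cur[strlen(cur) - 1] = '\0';
		}
		code = is_exist_seq(dict, cur);
		printf("%d,", code);
	}
	return;
}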

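// LZW decoding: print the text for a code sequence
void lzw_decode(PDictionary dict, int *code, int size)
{
	char pre[MaxSize] = "";	// previously decoded string (empty before the first code)
	char str[MaxSize];
	char tmp[MaxSize];
	int i = 0;

	while (i < size)
	{
		if (code[i] < dict->size) {
			strcpy(str, find_seq(dict, code[i]));
		} else {
			// Code is not in the dictionary yet: it can only be pre + pre[0]
			sprintf(str, "%s%c", pre, pre[0]);
		}
		printf("%s", str);
		if (pre[0] != '\0')
		{
			// New entry: previous string plus the first character of the current one
			sprintf(tmp, "%s%c", pre, str[0]);
			insert_seq(dict, dict->size, tmp);
		}
		strcpy(pre, str);
		i++;
	}
	printf("\n");
	return;
}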

int main()
{
	Dictionary dict;
	int code[] = { 8, 11, 14, 21, 4, 24, 14, 20, 26, 28, 30, 32, 3, 14, 31, 20, 27, 29, 12, 4, 38, 40, 42, 4, 44 };
	int size = 25;

	initDict(&dict, MaxSize);
	//lzw_encode(&dict, "ILOVEYOUILOVEYOUDOYOULOVEMEDOYOULOVEME");
	lzw_decode(&dict, code, size);
	//print_dict(&dict);
	return 0;
}
