Experiment 3: Implementation and Analysis of LZW Encoding and Decoding

Geronimo620

Article link: https://blog.csdn.net/weixin_43175007/article/details/105694520

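Overview of LZW

This section draws on Wikipedia: https://en.wikipedia.org/wiki/Lempel%E2%80%93Ziv%E2%80%93Welch

LZW is named after Abraham Lempel, Jacob Ziv, and Terry Welch. Lempel and Ziv published the underlying LZ78 algorithm in 1978, and Welch published LZW in 1984 as an improved implementation of it. It became the first universal data compression method in widespread use on computers.

LZW is an adaptive dictionary coder: it analyses the strings in the file as it goes, builds a dictionary that maps strings to codewords according to the actual data, and encodes and decodes on the basis of that dictionary. It differs from an ordinary (static) dictionary coder in that the latter needs two passes over the file, one to build the dictionary and one to encode, and must transmit the dictionary along with the data. An adaptive dictionary coder makes a single pass, building the dictionary while it encodes, and does not need to transmit the dictionary.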

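LZW encoding and decoding: a worked example

The principle of LZW is easiest to understand from an example.

LZW encoding

The basic encoding steps are:

 1. Initially the dictionary contains only the default entries, for example 0->a, 1->b, 2->c. P and C are both empty.
 2. Read the next character C and form the string P+C.
 3. Look up P+C in the dictionary:
    - If P+C is in the dictionary, set P=P+C.
    - If P+C is not in the dictionary, output the code of P, add a new code for P+C to the dictionary, and set P=C.
 4. Go back to step 2 and repeat until every character of the input has been read; then output the code of the final P.

In practice the default dictionary is the set of all single characters, each mapped to its ASCII (byte) value.

Suppose we LZW-encode the following data:

ababbcbacb

Assume the initial dictionary is 'a'->0; 'b'->1; 'c'->2.

The process runs as follows: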

Step	P	C	P+C	P+C in dict?	Action	Output
1	-	a	a	Y	next P = 'a'	-
2	a	b	ab	N	next P = 'b'; 3 <- 'ab'	0
3	b	a	ba	N	next P = 'a'; 4 <- 'ba'	1
4	a	b	ab	Y	next P = 'ab'	-
5	ab	b	abb	N	next P = 'b'; 5 <- 'abb'	3
6	b	c	bc	N	next P = 'c'; 6 <- 'bc'	1
7	c	b	cb	N	next P = 'b'; 7 <- 'cb'	2
8	b	a	ba	Y	next P = 'ba'	-
9	ba	c	bac	N	next P = 'c'; 8 <- 'bac'	4
10	c	b	cb	Y	next P = 'cb'	-
11	cb	-	-	-	-	7

The data is therefore encoded as: 0 1 3 1 2 4 7.
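
The encoding steps can be sketched in a few lines of C. This is only a minimal illustration, not the lab code: it assumes the three-letter alphabet of the example, stores dictionary entries as plain strings, and searches them linearly. The names toy_find, toy_lzw_encode, TOY_MAX_ENTRIES and TOY_MAX_LEN are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define TOY_MAX_ENTRIES 64   /* hypothetical limit, ample for short demo inputs */
#define TOY_MAX_LEN     32   /* hypothetical maximum phrase length, incl. NUL */

/* Returns the code of string s in dict, or -1 if s is not an entry. */
int toy_find(char dict[][TOY_MAX_LEN], int n, const char *s) {
    for (int i = 0; i < n; i++)
        if (strcmp(dict[i], s) == 0) return i;
    return -1;
}

/* Encodes `in` (letters a, b, c only) into `out`; returns the number of codes. */
int toy_lzw_encode(const char *in, int *out) {
    char dict[TOY_MAX_ENTRIES][TOY_MAX_LEN] = { "a", "b", "c" };
    int n = 3;                  /* next free code */
    int count = 0;              /* codes written so far */
    char p[TOY_MAX_LEN] = "";   /* current prefix P, initially empty */
    char pc[TOY_MAX_LEN];       /* scratch buffer for P+C */

    for (; *in != '\0'; in++) {
        size_t plen = strlen(p);
        assert(*in >= 'a' && *in <= 'c');   /* assumption of this sketch */
        assert(plen + 1 < TOY_MAX_LEN);
        memcpy(pc, p, plen);
        pc[plen] = *in;
        pc[plen + 1] = '\0';
        if (toy_find(dict, n, pc) >= 0) {
            strcpy(p, pc);                         /* P+C known: P = P+C */
        } else {
            out[count++] = toy_find(dict, n, p);   /* output the code of P */
            assert(n < TOY_MAX_ENTRIES);
            strcpy(dict[n++], pc);                 /* new entry for P+C */
            p[0] = *in;                            /* P = C */
            p[1] = '\0';
        }
    }
    if (p[0] != '\0')
        out[count++] = toy_find(dict, n, p);       /* flush the final P */
    return count;
}
```

Called as toy_lzw_encode("ababbcbacb", out), the sketch fills out with the seven codes 0 1 3 1 2 4 7. A linear string search is far too slow for real files, which is why the lab code uses a tree instead.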

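LZW decoding

Now decode the encoded data again. The decoding algorithm is slightly more involved than encoding.

1. Initially the dictionary contains only the default entries, for example 0->a, 1->b, 2->c. pW and cW are both empty.
2. Read the first code cW and output its decoding. The first cW can always be decoded directly and always stands for a single character.
3. Set pW=cW.
4. Read the next code cW.
5. Look up cW in the dictionary:
   a. cW is in the dictionary:
     (1) Decode cW, that is, output Str(cW).
     (2) Let P=Str(pW) and C=the **first character** of Str(cW).
     (3) Add a new code for P+C to the dictionary.
   b. cW is not in the dictionary:
     (1) Let P=Str(pW) and C=the **first character** of Str(pW).
     (2) Add a new code for P+C to the dictionary; this new code is necessarily cW.
     (3) Output P+C.
6. Go back to step 3 and repeat until every code has been read.

Step 5 is the most important and the hardest to grasp; an example makes it easier.

Decode the following codewords:

0 1 3 1 2 4 7

Assume the same default dictionary as above. The decoding proceeds as follows; the first code is certainly in the default dictionary and can be decoded directly.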

Step	pW	cW	cW in dict?	Action	Output
1	-	0	-	-	a
2	0	1	Y	P = 'a'; C = 'b'; P+C = 'ab'; 3 <- 'ab'	b
3	1	3	Y	P = 'b'; C = 'a'; P+C = 'ba'; 4 <- 'ba'	ab
4	3	1	Y	P = 'ab'; C = 'b'; P+C = 'abb'; 5 <- 'abb'	b
5	1	2	Y	P = 'b'; C = 'c'; P+C = 'bc'; 6 <- 'bc'	c
6	2	4	Y	P = 'c'; C = 'b'; P+C = 'cb'; 7 <- 'cb'	ba
7	4	7	Y	P = 'ba'; C = 'c'; P+C = 'bac'; 8 <- 'bac'	cb

The case "cW not in dict" does not occur in this run. It is handled the same way, except that C is set to the first character of Str(pW) and the output is P+C.

Decoding yields the data: ababbcbacb.

Not only is the original data recovered, the dictionary built during decoding is also identical to the one built during encoding.
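
The decoding steps, including the "cW not in dict" branch that the run above never reaches, can be sketched as follows. This is a toy illustration under the same assumptions as the example (alphabet a, b, c; entries stored as strings), not the lab implementation; toy_lzw_decode and the TOY_ constants are hypothetical names.

```c
#include <assert.h>
#include <string.h>

#define TOY_MAX_ENTRIES 64   /* hypothetical limit for this sketch */
#define TOY_MAX_LEN     32   /* hypothetical maximum phrase length, incl. NUL */

/* Decodes `count` codes into the NUL-terminated string `out`.
 * The caller must supply a buffer large enough for the result.
 * Initial dictionary: 0->a, 1->b, 2->c. */
void toy_lzw_decode(const int *codes, int count, char *out) {
    char dict[TOY_MAX_ENTRIES][TOY_MAX_LEN] = { "a", "b", "c" };
    int n = 3;      /* next free code */
    int pw = -1;    /* previous code pW; -1 means "none yet" */

    out[0] = '\0';
    for (int i = 0; i < count; i++) {
        int cw = codes[i];
        char entry[TOY_MAX_LEN];
        size_t len;

        assert(cw >= 0);
        if (cw < n) {
            strcpy(entry, dict[cw]);        /* cW in dict: output Str(cW) */
        } else {
            /* cW not in dict: it can only be the entry about to be created,
             * so Str(cW) = Str(pW) + first character of Str(pW). */
            assert(pw >= 0 && cw == n);
            len = strlen(dict[pw]);
            assert(len + 1 < TOY_MAX_LEN);
            memcpy(entry, dict[pw], len);
            entry[len] = dict[pw][0];
            entry[len + 1] = '\0';
        }
        strcat(out, entry);

        if (pw >= 0) {                      /* add P+C = Str(pW) + entry[0] */
            assert(n < TOY_MAX_ENTRIES);
            len = strlen(dict[pw]);
            assert(len + 1 < TOY_MAX_LEN);
            memcpy(dict[n], dict[pw], len);
            dict[n][len] = entry[0];
            dict[n][len + 1] = '\0';
            n++;
        }
        pw = cw;
    }
}
```

Decoding the codes 0 1 3 1 2 4 7 reproduces ababbcbacb. The two-code sequence 0 3 (the encoding of "aaa") exercises the other branch: code 3 is not yet in the dictionary when it arrives, and the sketch reconstructs it as 'a' + 'a'.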

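Data structure for LZW encoding and decoding

Let us continue with the example above.

A little analysis shows that the prefix of every entry added during encoding or decoding already exists in the dictionary.

These entries can therefore be written as:

3: 'ab' = 0b
4: 'ba' = 1a
5: 'abb' = 3b
and so on.

Each new entry is thus an existing entry plus one character, which is why this experiment implements the dictionary in C++ as a trie (dictionary tree).

The resulting trie (lookup tree) looks like this:

[Figure: trie of the dictionary entries from the example]

Each dictionary entry, that is, each node, can be described by a struct: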

struct {
	int suffix;
	int parent;      // default = -1
	int firstchild;  // default = -1
	int nextsibling; // default = -1
} dictionary;

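suffix: the last character of the string this entry stands for, i.e. the character appended when the node was created as a child of its parent.
parent: the index of this node's parent node, or -1 if it has none.
firstchild: the index of this node's first child (the leftmost child in the figure), or -1 if it has none.
nextsibling: the index of the next node on the same level, or -1 if there is none. For 'ba' the next sibling is 'bc', i.e. 6.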
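
To make the four fields concrete, here is a small sketch that builds the first entries of the example by hand and reads a string back by following parent links. It is an illustration only: the alphabet is reduced to a, b, c, and Node, demo_dict, demo_init, demo_add and demo_string are hypothetical names, not part of the lab code.

```c
#include <assert.h>

typedef struct {
    int suffix;       /* last character of the entry's string */
    int parent;       /* index of the prefix entry, -1 for a root */
    int firstchild;   /* index of the first child, -1 if none */
    int nextsibling;  /* index of the next child of the same parent, -1 if none */
} Node;

#define DEMO_SIZE 16  /* hypothetical capacity for this sketch */
Node demo_dict[DEMO_SIZE];
int demo_next = 0;

/* Creates the roots 0->'a', 1->'b', 2->'c'. */
void demo_init(void) {
    const char roots[] = "abc";
    for (int i = 0; i < 3; i++) {
        demo_dict[i].suffix = roots[i];
        demo_dict[i].parent = -1;
        demo_dict[i].firstchild = -1;
        demo_dict[i].nextsibling = (i < 2) ? i + 1 : -1;
    }
    demo_next = 3;
}

/* Appends the entry Str(parent)+suffix as the last child of parent;
 * returns the index of the new entry. */
int demo_add(int parent, int suffix) {
    int idx;
    int *link;
    assert(demo_next < DEMO_SIZE && parent >= 0 && parent < demo_next);
    idx = demo_next++;
    demo_dict[idx].suffix = suffix;
    demo_dict[idx].parent = parent;
    demo_dict[idx].firstchild = -1;
    demo_dict[idx].nextsibling = -1;
    link = &demo_dict[parent].firstchild;      /* walk to the end of the child list */
    while (*link != -1)
        link = &demo_dict[*link].nextsibling;
    *link = idx;
    return idx;
}

/* Writes the string of `code` to buf (NUL-terminated); returns its length. */
int demo_string(int code, char *buf) {
    int len = 0;
    int i;
    for (int c = code; c >= 0; c = demo_dict[c].parent)
        len++;                                 /* depth of the node = string length */
    buf[len] = '\0';
    i = len;
    for (int c = code; c >= 0; c = demo_dict[c].parent)
        buf[--i] = (char)demo_dict[c].suffix;  /* fill from the back */
    return len;
}
```

After demo_init(), the calls demo_add(0,'b'), demo_add(1,'a'), demo_add(3,'b') and demo_add(1,'c') create entries 3 to 6 of the example; demo_dict[4].nextsibling is then 6, and demo_string(5, buf) yields "abb".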

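Carried over into code, the encoding/decoding procedure can be summarised as follows:

Build the initial default dictionary containing all single-byte (ASCII) characters (0-255); the int next_code marks the index at which the next dictionary entry will be written.
Read the input data into memory.
Read the characters one by one and encode them as described above, updating the fields of dictionary along the way.
Write the encoded output to a file.

Analysis of the LZW encoding and decoding code

The code provided for the experiment is as follows: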

bitio.h

/*
 * Declaration for bitwise IO
 *
 * vim: ts=4 sw=4 cindent
 */
#ifndef __BITIO__
#define __BITIO__

#include <stdio.h>

typedef struct{
	FILE *fp;
	unsigned char mask;
	int rack;
}BITFILE;

BITFILE *OpenBitFileInput( char *filename);
BITFILE *OpenBitFileOutput( char *filename);
void CloseBitFileInput( BITFILE *bf);
void CloseBitFileOutput( BITFILE *bf);
int BitInput( BITFILE *bf);
unsigned long BitsInput( BITFILE *bf, int count);
void BitOutput( BITFILE *bf, int bit);
void BitsOutput( BITFILE *bf, unsigned long code, int count);
#endif	// __BITIO__

bitio.c

/*
 * Definitions for bitwise IO
 *
 * vim: ts=4 sw=4 cindent
 */

#include <stdlib.h>
#include <stdio.h>
#include "bitio.h"
BITFILE *OpenBitFileInput( char *filename){
	BITFILE *bf;
	bf = (BITFILE *)malloc( sizeof(BITFILE));
	if( NULL == bf) return NULL;
	if( NULL == filename)	bf->fp = stdin;
	else bf->fp = fopen( filename, "rb");
	if( NULL == bf->fp){ free( bf); return NULL; }	// do not leak bf when fopen fails
	bf->mask = 0x80;
	bf->rack = 0;
	return bf;
}

BITFILE *OpenBitFileOutput( char *filename){
	BITFILE *bf;
	bf = (BITFILE *)malloc( sizeof(BITFILE));
	if( NULL == bf) return NULL;
	if( NULL == filename)	bf->fp = stdout;
	else bf->fp = fopen( filename, "wb");
	if( NULL == bf->fp){ free( bf); return NULL; }	// do not leak bf when fopen fails
	bf->mask = 0x80;
	bf->rack = 0;
	return bf;
}

void CloseBitFileInput( BITFILE *bf){
	fclose( bf->fp);
	free( bf);
}

void CloseBitFileOutput( BITFILE *bf){
	// Output the remaining bits
	if( 0x80 != bf->mask) fputc( bf->rack, bf->fp);
	fclose( bf->fp);
	free( bf);
}

int BitInput( BITFILE *bf){
	int value;

	if( 0x80 == bf->mask){
		bf->rack = fgetc( bf->fp);
		if( EOF == bf->rack){
			fprintf(stderr, "Read after the end of file reached\n");
			exit( -1);
		}
	}
	value = bf->mask & bf->rack;
	bf->mask >>= 1;
	if( 0==bf->mask) bf->mask = 0x80;
	return( (0==value)?0:1);
}

unsigned long BitsInput( BITFILE *bf, int count){
	unsigned long mask;
	unsigned long value;
	mask = 1UL << (count-1);	// unsigned shift: 1L << 31 overflows a 32-bit signed long
	value = 0L;
	while( 0!=mask){
		if( 1 == BitInput( bf))
			value |= mask;
		mask >>= 1;
	}
	return value;
}

void BitOutput( BITFILE *bf, int bit){
	if( 0 != bit) bf->rack |= bf->mask;
	bf->mask >>= 1;
	if( 0 == bf->mask){	// eight bits in rack
		fputc( bf->rack, bf->fp);
		bf->rack = 0;
		bf->mask = 0x80;
	}
}

void BitsOutput( BITFILE *bf, unsigned long code, int count){
	unsigned long mask;

	mask = 1UL << (count-1);	// unsigned shift: 1L << 31 overflows a 32-bit signed long
	while( 0 != mask){
		BitOutput( bf, (int)(0==(code&mask)?0:1));
		mask >>= 1;
	}
}
#if 0
int main( int argc, char **argv){
	BITFILE *bfi, *bfo;
	int bit;
	int count = 0;

	if( 1<argc){
		if( NULL==OpenBitFileInput( bfi, argv[1])){
			fprintf( stderr, "fail open the file\n");
			return -1;
		}
	}else{
		if( NULL==OpenBitFileInput( bfi, NULL)){
			fprintf( stderr, "fail open stdin\n");
			return -2;
		}
	}
	if( 2<argc){
		if( NULL==OpenBitFileOutput( bfo, argv[2])){
			fprintf( stderr, "fail open file for output\n");
			return -3;
		}
	}else{
		if( NULL==OpenBitFileOutput( bfo, NULL)){
			fprintf( stderr, "fail open stdout\n");
			return -4;
		}
	}
	while( 1){
		bit = BitInput( bfi);
		fprintf( stderr, "%d", bit);
		count ++;
		if( 0==(count&7))fprintf( stderr, " ");
		BitOutput( bfo, bit);
	}
	return 0;
}
#endif

lzw_E.c

/*
 * Definition for LZW coding 
 *
 * vim: ts=4 sw=4 cindent nowrap
 */
#include <stdlib.h>
#include <stdio.h>
#include "bitio.h"
#define MAX_CODE 65535

struct {
	int suffix;
	int parent, firstchild, nextsibling;
} dictionary[MAX_CODE+1];
int next_code;
int d_stack[MAX_CODE]; // stack for decoding a phrase

#define input(f) ((int)BitsInput( f, 16))
#define output(f, x) BitsOutput( f, (unsigned long)(x), 16)

int DecodeString( int start, int code);
void InitDictionary( void);
void PrintDictionary( void){
	int n;
	int count;
	for( n=256; n<next_code; n++){
		count = DecodeString( 0, n);
		printf( "%4d->", n);
		while( 0<count--) printf("%c", (char)(d_stack[count]));
		printf( "\n");
	}
}

int DecodeString( int start, int code){
	int count;
	count = start;
	while( 0<=code){
		d_stack[ count] = dictionary[code].suffix;
		code = dictionary[code].parent;
		count ++;
	}
	return count;
}
void InitDictionary( void){
	int i;

	for( i=0; i<256; i++){
		dictionary[i].suffix = i;
		dictionary[i].parent = -1;
		dictionary[i].firstchild = -1;
		dictionary[i].nextsibling = i+1;
	}
	dictionary[255].nextsibling = -1;
	next_code = 256;	// index of the next dictionary entry
}
/*
 * Input: string represented by string_code in dictionary,
 * Output: the index of character+string in the dictionary
 * 		index = -1 if not found
 */
int InDictionary( int character, int string_code){
	int sibling;
	if( 0>string_code) return character;	// first character: no prefix yet, so the root entry is the character itself
	sibling = dictionary[string_code].firstchild;//stringcode + character
	while( -1<sibling){
		if( character == dictionary[sibling].suffix) return sibling;
		sibling = dictionary[sibling].nextsibling;
	}
	return -1;
}

void AddToDictionary( int character, int string_code){
	int firstsibling, nextsibling;
	if( 0>string_code) return;
	dictionary[next_code].suffix = character;
	dictionary[next_code].parent = string_code;
	dictionary[next_code].nextsibling = -1;
	dictionary[next_code].firstchild = -1;
	firstsibling = dictionary[string_code].firstchild;
	if( -1<firstsibling){	// the parent has child
		nextsibling = firstsibling;
		while( -1<dictionary[nextsibling].nextsibling ) 
			nextsibling = dictionary[nextsibling].nextsibling;
		dictionary[nextsibling].nextsibling = next_code;
	}else{// no child before, modify it to be the first
		dictionary[string_code].firstchild = next_code;
	}
	next_code ++;
}

void LZWEncode( FILE *fp, BITFILE *bf){
	int character;
	int string_code;
	int index;
	unsigned long file_length;

	fseek( fp, 0, SEEK_END);
	file_length = ftell( fp);
	fseek( fp, 0, SEEK_SET);
	BitsOutput( bf, file_length, 4*8);
	InitDictionary();
	string_code = -1;
	while( EOF!=(character=fgetc( fp))){
		index = InDictionary( character, string_code);
		if( 0<=index){	// string+character in dictionary
			string_code = index;
		}else{	// string+character not in dictionary
			output( bf, string_code);
			if( MAX_CODE > next_code){	// free space in dictionary
				// add string+character to dictionary
				AddToDictionary( character, string_code);
			}
			string_code = character;//P=C
		}
	}
	output( bf, string_code);	// flush the code of the final P, which has no following character
}

void LZWDecode( BITFILE *bf, FILE *fp){
	int character;
	int new_code, last_code;
	int phrase_length;
	unsigned long file_length;
	InitDictionary();
	file_length = BitsInput( bf, 4*8);	// original file length, written by the encoder as a 32-bit header
	if( -1 == file_length) file_length = 0;
	/* to be completed (lab exercise) */
	last_code = -1;	// no code precedes the first one; the first code is always in the default (ASCII) dictionary, so character is assigned before it is first read
	while( 0<file_length){
		new_code = input( bf);
		if( new_code >= next_code){ // this is the case CSCSC( not in dict)
			d_stack[0] = character;	// d_stack[0] holds the first character of Str(last_code)
			phrase_length = DecodeString( 1, last_code);	// keep d_stack[0] and decode pW above it
		}else{	// in dict
			phrase_length = DecodeString( 0, new_code);	// decode cW
		}
		character = d_stack[phrase_length-1];	// first character of the decoded phrase
		while( 0<phrase_length){
			phrase_length --;
			fputc( d_stack[ phrase_length], fp);	// write the stack top-down, which restores reading order
			file_length--;
		}
		if( MAX_CODE>next_code){	// add the new phrase to dictionary
			AddToDictionary( character, last_code);
		}
		last_code = new_code;
	}
}




int main( int argc, char **argv){
	FILE *fp;
	BITFILE *bf;

	if( 4>argc){
		fprintf( stdout, "usage: \n%s <o> <ifile> <ofile>\n", argv[0]);
		fprintf( stdout, "\t<o>: E or D refers to encode or decode\n");
		fprintf( stdout, "\t<ifile>: input file name\n");
		fprintf( stdout, "\t<ofile>: output file name\n");
		return -1;
	}
	if( 'E' == argv[1][0]){ // do encoding
		fp = fopen( argv[2], "rb");
		bf = OpenBitFileOutput( argv[3]);
		if( NULL!=fp && NULL!=bf){
			LZWEncode( fp, bf);
			fclose( fp);
			CloseBitFileOutput( bf);
			fprintf( stdout, "encoding done\n");
		}
	}else if( 'D' == argv[1][0]){	// do decoding
		bf = OpenBitFileInput( argv[2]);
		fp = fopen( argv[3], "wb");
		if( NULL!=fp && NULL!=bf){
			LZWDecode( bf, fp);
			fclose( fp);
			CloseBitFileInput( bf);
			fprintf( stdout, "decoding done\n");
		}
	}else{	// otherwise
		fprintf( stderr, "not supported operation\n");
	}
	system("pause");
	return 0;
}

The code that deserves close analysis is in lzw_E.c.

Struct definition and initialisation

struct {
	int suffix;
	int parent, firstchild, nextsibling;
} dictionary[MAX_CODE+1];
int next_code;
int d_stack[MAX_CODE]; // stack for decoding a phrase

void InitDictionary( void){
	int i;

	for( i=0; i<256; i++){
		dictionary[i].suffix = i;
		dictionary[i].parent = -1;
		dictionary[i].firstchild = -1;
		dictionary[i].nextsibling = i+1;
	}
	dictionary[255].nextsibling = -1;
	next_code = 256;	// index of the next dictionary entry
}

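The dictionary struct array is defined with enough room reserved for new entries.

Initialising it means writing the single-character ASCII dictionary. Note that at this point every entry is a top-level node with no parent and no children, so parent and firstchild are both -1; entry 255 has no nextsibling, so that field is -1 as well.

d_stack[] is used during decoding to hold the string decoded in one loop iteration.

The global variable next_code marks the index of the next entry to be written to the dictionary.

Encoding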

void LZWEncode( FILE *fp, BITFILE *bf){
	int character;
	int string_code;
	int index;
	unsigned long file_length;

	fseek( fp, 0, SEEK_END);
	file_length = ftell( fp);
	fseek( fp, 0, SEEK_SET);
	BitsOutput( bf, file_length, 4*8);
	InitDictionary();
	string_code = -1;
	while( EOF!=(character=fgetc( fp))){
		index = InDictionary( character, string_code);
		if( 0<=index){	// string+character in dictionary
			string_code = index;
		}else{	// string+character not in dictionary
			output( bf, string_code);
			if( MAX_CODE > next_code){	// free space in dictionary
				// add string+character to dictionary
				AddToDictionary( character, string_code);
			}
			string_code = character;//P=C
		}
	}
	output( bf, string_code);	// flush the code of the final P, which has no following character
}

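Encoding involves three functions: the encoder LZWEncode itself, the dictionary lookup, and the insertion of dictionary entries. We look at them one by one.

First the main routine, LZWEncode:

In LZWEncode, character holds the newly read single character C (its ASCII value), and index holds the dictionary index of P+C (the index if the entry exists, -1 otherwise). string_code holds the code of the current prefix P, which is what eventually gets output, and bf is the bit-level output file the codes are written to.

Note that LZWEncode never manipulates the string P+C directly, which would be awkward. Instead, P is represented by its dictionary index (string_code) and C by its character value, and the pair is looked up in the dictionary. If P+C is in the dictionary, the lookup returns its index, string_code is set to that index (next P = P+C), and nothing is output.

The overall flow of the function is:

First read: nothing is output. Because string_code is still -1, InDictionary returns the character itself, so next P = C (string_code = index = character), and the next character is read.

Read the next character C and, together with the P obtained so far, use the index returned by InDictionary to decide whether P+C is in the dictionary.

In the dictionary: index is the index of P+C; it is assigned to string_code (next P = P+C), nothing is output, and the next character is read.
Not in the dictionary: index is -1. First output( bf, string_code) writes the index of P. Then AddToDictionary writes P+C into the dictionary, next P = C (string_code = character), and the next character is read.

After the last read: there is no C, only P, and P is guaranteed to be in the dictionary, so its index is output and encoding ends.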

/*
 * Input: string represented by string_code in dictionary,
 * Output: the index of character+string in the dictionary
 * 		index = -1 if not found
 */
int InDictionary( int character, int string_code){
	int sibling;
	if( 0>string_code) return character;	// first character: no prefix yet, so the root entry is the character itself
	sibling = dictionary[string_code].firstchild;//stringcode + character
	while( -1<sibling){
		if( character == dictionary[sibling].suffix) return sibling;
		sibling = dictionary[sibling].nextsibling;
	}
	return -1;
}

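The lookup function InDictionary takes the index of the newly read character C and the index of P.

Its return value is assigned to index in LZWEncode: -1 if P+C is not in the dictionary, otherwise the dictionary index of P+C.

The search uses the firstchild and nextsibling fields. The test 0>string_code handles the first read: string_code starts at -1 and can never be negative afterwards. On the first read C is certainly in the default dictionary, so the function returns C's index directly and the encoder moves on to the next read.

Thanks to the trie structure, checking whether P+C is in the dictionary reduces to checking whether P has children and then whether P+C is among them.

In the general case, the function first checks whether P has any children. If not, firstchild is -1, P+C cannot be in the dictionary, and -1 is returned.

Otherwise it walks through P's children and compares each suffix with C. If none matches, P+C is not in the dictionary and -1 is returned.

If one matches, that is, character == dictionary[sibling].suffix, the current value of sibling is returned: it is the index of that child, and therefore the dictionary index of P+C.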

void AddToDictionary( int character, int string_code){
	int firstsibling, nextsibling;
	if( 0>string_code) return;
	dictionary[next_code].suffix = character;
	dictionary[next_code].parent = string_code;
	dictionary[next_code].nextsibling = -1;
	dictionary[next_code].firstchild = -1;
	firstsibling = dictionary[string_code].firstchild;
	if( -1<firstsibling){	// the parent has child
		nextsibling = firstsibling;
		while( -1<dictionary[nextsibling].nextsibling ) 
			nextsibling = dictionary[nextsibling].nextsibling;
		dictionary[nextsibling].nextsibling = next_code;
	}else{// no child before, modify it to be the first
		dictionary[string_code].firstchild = next_code;
	}
	next_code ++;
}

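The insertion function AddToDictionary takes C and P as parameters. Adding an entry means filling in the fields of a dictionary element; the index of the new node is the current value of the global variable next_code.

Insertion steps:

Initialise the new node: suffix is the newly read C and parent is P. Because the entry is new, it has no children and no next sibling, so firstchild and nextsibling are -1.
Update the parent node and its children. First check whether the parent already has children:
If not, set the parent's firstchild to the new node's index, next_code.
If so, loop with nextsibling, which steps through the parent's existing children in order (each child's nextsibling is the index of the parent's next child). Having followed the chain to the last child, set that child's nextsibling to next_code.
Finally, increment next_code in preparation for the next insertion.

Decoding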

void LZWDecode( BITFILE *bf, FILE *fp){
	int character;
	int new_code, last_code;
	int phrase_length;
	unsigned long file_length;
	InitDictionary();
	file_length = BitsInput( bf, 4*8);	// original file length, written by the encoder as a 32-bit header
	if( -1 == file_length) file_length = 0;
	/* to be completed (lab exercise) */
	last_code = -1;	// no code precedes the first one; the first code is always in the default (ASCII) dictionary, so character is assigned before it is first read
	while( 0<file_length){
		new_code = input( bf);
		if( new_code >= next_code){ // this is the case CSCSC( not in dict)
			d_stack[0] = character;	// d_stack[0] holds the first character of Str(last_code)
			phrase_length = DecodeString( 1, last_code);	// keep d_stack[0] and decode pW above it
		}else{	// in dict
			phrase_length = DecodeString( 0, new_code);	// decode cW
		}
		character = d_stack[phrase_length-1];	// first character of the decoded phrase
		while( 0<phrase_length){
			phrase_length --;
			fputc( d_stack[ phrase_length], fp);	// write the stack top-down, which restores reading order
			file_length--;
		}
		if( MAX_CODE>next_code){	// add the new phrase to dictionary
			AddToDictionary( character, last_code);
		}
		last_code = new_code;
	}
}

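Decoding is harder to follow than encoding. Compared with encoding it involves one additional function, DecodeString; statements already explained above are not discussed again.

First the main routine, LZWDecode.

character holds the first character of pW or cW. It is not initialised because its value comes from the previous loop iteration and is always valid by the time it is used. new_code is the newly read cW of each iteration, and last_code is the pW set by the previous iteration (pW is always in the dictionary). phrase_length is the length of the string decoded in this iteration, as returned by DecodeString. d_stack[] holds the string decoded in one iteration; it is stored in reverse, so it must be written out from the top of the stack downwards. fp is the output file.

In the first iteration there is no pW (hence last_code starts at -1) and cW is certainly in the default dictionary. character is set to cW's character (which becomes the first character of pW in the next iteration), and that character, always a single one, is output. Then next pW = cW (last_code = new_code) and the next iteration begins.

In every later iteration the first step is to test whether cW is in the dictionary. The condition if( new_code >= next_code) is true when cW is not yet in the dictionary.

If cW is in the dictionary, things are simple: phrase_length = DecodeString( 0, new_code) decodes cW, so the stack length equals the length of Str(cW). character is set to the first character of cW (character = d_stack[phrase_length-1]), which becomes the first character of pW in the next iteration. The following while loop outputs the decoded characters in d_stack[]. AddToDictionary then adds pW plus the first character of cW to the dictionary. Finally next pW = cW (last_code = new_code) and the next iteration begins.

If cW is not in the dictionary, things are more involved:

phrase_length = DecodeString( 1, last_code) decodes pW, so the stack length is the length of pW plus 1. What is finally decoded is pW followed by the first character of pW, and this string is necessarily Str(cW). At the end of the previous iteration character was set to the first character of the phrase output then, which is Str(pW), so d_stack[0] = character places the first character of pW at the bottom of the stack. pW plus the first character of pW is then added to the dictionary. Finally next pW = cW (last_code = new_code) and the next iteration begins.

There is no special final iteration: file_length is decremented for every byte written, and the loop ends as soon as it reaches 0, that is, when the whole original file has been restored.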

int DecodeString( int start, int code){
	int count;
	count = start;
	while( 0<=code){
		d_stack[ count] = dictionary[code].suffix;
		code = dictionary[code].parent;
		count ++;
	}
	return count;
}

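DecodeString stores the decoded characters in d_stack[] in reverse order and returns the resulting stack length.

Its inputs are start, the position at which writing begins, and code, the code to decode.

Starting at position start, the function writes the decoding of code onto the stack, using the parent field of the trie. After reading the suffix of the current node it moves to that node's parent, so a single pass collects the suffixes of every node along the chain; read in reverse, they form the decoded string. code is set to the parent of the current node each time and is -1 once the root has been passed. At that point count is the stack length and is returned.

Note the two decoding statements for the cases where cW is or is not in the dictionary:

// cW not in dict:
d_stack[0] = character;	// d_stack[0] holds the first character of Str(last_code)
phrase_length = DecodeString( 1, last_code);	// keep d_stack[0] and decode pW above it
// cW in dict:
phrase_length = DecodeString( 0, new_code);	// decode cW

When cW is not in the dictionary, start is 1, which leaves slot 0 free for the last character of the output (the first character of pW). When cW is in the dictionary this is unnecessary and start is 0.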
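
The stack layout in the "not in dict" case is easier to see in isolation. The sketch below mimics DecodeString with two plain arrays instead of the dictionary struct; demo_suffix, demo_parent, demo_stack, demo_setup, demo_decode_string and demo_unknown_code are hypothetical names chosen for this illustration. It assumes the situation reached when decoding the codes for the input abababa: entry 256 is 'ab', the previous code pW is 256, and the incoming code 258 does not exist yet.

```c
#include <assert.h>

#define DEMO_CODES 260            /* hypothetical size: 256 roots plus a few entries */
int demo_suffix[DEMO_CODES];
int demo_parent[DEMO_CODES];
int demo_stack[DEMO_CODES];

/* Roots 0..255 stand for single bytes; entry 256 is "ab" (parent 'a', suffix 'b'). */
void demo_setup(void) {
    for (int i = 0; i < 256; i++) {
        demo_suffix[i] = i;
        demo_parent[i] = -1;
    }
    demo_suffix[256] = 'b';
    demo_parent[256] = 'a';
}

/* Same walk as DecodeString: push suffixes from the node up to its root,
 * starting at stack slot `start`; returns the resulting stack length. */
int demo_decode_string(int start, int code) {
    int count = start;
    while (code >= 0) {
        demo_stack[count++] = demo_suffix[code];
        code = demo_parent[code];
    }
    return count;
}

/* Rebuilds the phrase of an unknown code from pW (last_code) and the first
 * character of Str(pW); writes it to out in reading order, returns its length. */
int demo_unknown_code(int last_code, int first_char, char *out) {
    int len;
    demo_stack[0] = first_char;                 /* last character of the phrase */
    len = demo_decode_string(1, last_code);     /* Str(pW), reversed, above it */
    for (int i = 0; i < len; i++)
        out[i] = (char)demo_stack[len - 1 - i]; /* pop: top of stack first */
    out[len] = '\0';
    return len;
}
```

After demo_setup(), demo_unknown_code(256, 'a', buf) leaves the stack as 'a', 'b', 'a' from bottom to top and writes "aba" to buf, which is exactly the string the encoder assigned to code 258.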

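Analysis of LZW compression performance

Several common file formats were compressed with the code above.

It should be noted that LZW is lossless: no information is lost. Common archive formats such as ZIP and RAR do not normally use LZW, however (ZIP typically uses DEFLATE, which combines LZ77 with Huffman coding); LZW is best known from GIF, the Unix compress utility, and an optional TIFF compression mode.

The experiment applied LZW compression to ten file formats frequently encountered in coursework; the different image formats all contain the same picture. (The output is a bit-stream file.)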

[Figure: file sizes before and after LZW compression for the ten formats]

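As the results show, not every file became smaller, even though a compression algorithm was applied.

Take the image files: for the same picture, the BMP compressed very well (898 KB to 43 KB), whereas the TIF, PNG, and JPG versions all grew to varying degrees.

The audio and video files (WAV, TS, AVI), by contrast, were all compressed to some extent, though far less dramatically than the BMP.

The DOC file was compressed successfully, but the PDF and PPTX files grew.

This growth is in fact predictable, for the following reasons:

Growth mostly occurs when the data contains little repetition. The algorithm then keeps adding new dictionary entries while matching only short strings, so the output bit stream stays large. This alone is not enough to make the file bigger, however.
In addition, the coding unit of the source file is one byte, while every output code is 16 bits. A code that stands for a single input byte therefore takes two bytes in the output. Combined with low repetition, this explains why the file can grow instead of shrinking.
Finally, formats such as PNG are already compressed, so their data probably has very little repetition left.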
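
The size argument can be put into numbers. The sketch below assumes the output layout of the code above: a 32-bit length header followed by one 16-bit code per emitted phrase. The function names lzw16_output_bytes and lzw16_shrinks are hypothetical and exist only for this illustration.

```c
/* Bytes written for a given number of emitted codes:
 * 4-byte length header + 2 bytes (16 bits) per code. */
unsigned long lzw16_output_bytes(unsigned long code_count) {
    return 4UL + 2UL * code_count;
}

/* Nonzero when the encoded output is smaller than the input. */
int lzw16_shrinks(unsigned long input_bytes, unsigned long code_count) {
    return lzw16_output_bytes(code_count) < input_bytes;
}
```

For the 10-byte example ababbcbacb the encoder emits 7 codes, so the output is 4 + 2*7 = 18 bytes, larger than the input. In general the output only shrinks when each code stands for more than two input bytes on average, which is exactly what low-repetition or already compressed data prevents.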
