Data Compression Experiment [2]: Learning and Experimenting with Huffman Coding

AaronXueNF

Column: Graduate Data Compression Course. Tags: c++, Huffman tree, information compression

Article link: https://blog.csdn.net/AaronXueNF/article/details/120680003

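Supplement:
The essence of lossless source coding
Apply a suitable transformation to the discrete source so that the new source (the sequence of transformed symbols) is as close to equiprobable as possible. This maximizes the average amount of information carried by each code symbol of the new source!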

Overview of the Huffman algorithm

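Classification of Huffman coding

Static statistical model
- The code table is derived from training data; different sources share the same probability table
- The Huffman code table/tree does not need to be transmitted
Semi-adaptive statistical model
- The probability of each input source symbol is computed
- The input must be scanned twice, so the model is unsuitable for real-time transmission
- Both the Huffman code table and the compressed stream must be transmitted
Adaptive/dynamic statistical model
- A single scan
- The Huffman code table/tree is updated as the symbol frequencies change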

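Code-table generation for the semi-adaptive model:

First pass: compute the frequency/probability of every symbol
Sort the symbols by probability
Split the symbol set into two subsets whose total probabilities differ as little as possible
Prefix '0' to the codewords of the first subset and '1' to the codewords of the second
Recursively code the two subsets until no subset can be split further

Note that this top-down splitting procedure is Shannon-Fano coding. Huffman's algorithm proper works bottom-up: it repeatedly merges the two least probable nodes into a parent until a single root remains, which is what create_list() in the textbook's huff_enc (analyzed below) does.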
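The splitting procedure above works top-down; Huffman's own construction works bottom-up by repeatedly merging the two least probable nodes. Below is a minimal sketch of that bottom-up construction which returns only the code length of each symbol. It assumes the symbol counts are given in a std::vector indexed by symbol value; huffman_code_lengths is an illustrative name, and because the priority queue breaks ties arbitrarily, tied counts can yield different (equally optimal) lengths.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Huffman code length of every symbol (0 for symbols that never occur).
std::vector<int> huffman_code_lengths(const std::vector<std::uint64_t>& counts) {
    const std::size_t n = counts.size();
    std::vector<int> parent(n, -1);                 // parent of each node; leaves are 0..n-1
    using Item = std::pair<std::uint64_t, int>;     // (weight, node id)
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> pq;  // min-heap
    for (std::size_t i = 0; i < n; ++i)
        if (counts[i] > 0) pq.push({counts[i], static_cast<int>(i)});

    std::vector<int> lengths(n, 0);
    if (pq.size() == 1) {                           // a lone symbol still needs 1 bit
        lengths[pq.top().second] = 1;
        return lengths;
    }
    while (pq.size() > 1) {                         // merge the two least probable nodes
        Item a = pq.top(); pq.pop();
        Item b = pq.top(); pq.pop();
        const int id = static_cast<int>(parent.size());
        parent.push_back(-1);                       // new internal node
        parent[a.second] = id;
        parent[b.second] = id;
        pq.push({a.first + b.first, id});
    }
    for (std::size_t i = 0; i < n; ++i) {           // code length = depth of the leaf
        if (counts[i] == 0) continue;
        int depth = 0;
        for (int v = static_cast<int>(i); parent[v] != -1; v = parent[v]) ++depth;
        lengths[i] = depth;
    }
    return lengths;
}
```

With the classic counts {45, 13, 12, 16, 9, 5} this yields the lengths {1, 3, 3, 3, 4, 4}. Only the lengths are needed if the codewords are then assigned canonically.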

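Huffman decoding

The semi-adaptive model has to transmit its code table. Usually the file stores the symbol frequencies/probabilities instead, scaled to integers, and the decoder rebuilds the Huffman tree from them with the same deterministic procedure as the encoder; this costs fewer bits than writing out the Huffman tree structure itself.

There are two common ways to decode Huffman codes:

Bit-serial decoding: fixed input bit rate, variable output symbol rate
- Start at the root of the code tree and follow one branch per input bit (0 selects the left subtree, 1 the right subtree) until a leaf is reached; output the leaf's symbol and return to the root.
Lookup-table-based decoding
- If the longest codeword has l bits, build a lookup table with 2^l entries. For example, if symbol x has the 3-bit codeword "001" and the longest codeword has 5 bits, x occupies 4 entries of the table: "001xx" (the last two bits arbitrary).
- To decode, read l bits at a time and use them to index the table; output the symbol stored there (say its codeword has n bits), discard the first n bits, and read n new bits.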
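The lookup-table method can be sketched as follows. Assumptions made for illustration: each codeword is stored right-aligned in a std::uint32_t together with its length (like the code/length arrays of huff_enc), the input is a string of '0'/'1' characters rather than packed bytes, and build_table / table_decode are hypothetical names. Since the table has 2^L entries, this is practical only for a modest maximum length L.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

struct Entry { int symbol; int length; };      // length 0 marks an unused slot

// Build the 2^L-entry table. codes[s] is symbol s's codeword (right-aligned),
// lengths[s] its length (0 = unused), L the longest code length.
std::vector<Entry> build_table(const std::vector<std::uint32_t>& codes,
                               const std::vector<int>& lengths, int L) {
    std::vector<Entry> table(std::size_t(1) << L, Entry{-1, 0});
    for (std::size_t s = 0; s < codes.size(); ++s) {
        const int len = lengths[s];
        if (len == 0) continue;
        // The codeword followed by any (L - len) bits: 2^(L - len) consecutive entries.
        const std::uint32_t first = codes[s] << (L - len);
        const std::uint32_t count = std::uint32_t(1) << (L - len);
        for (std::uint32_t k = 0; k < count; ++k)
            table[first + k] = Entry{static_cast<int>(s), len};
    }
    return table;
}

// Decode a string of '0'/'1' characters, peeking L bits at a time.
std::vector<int> table_decode(const std::string& bits, const std::vector<Entry>& table, int L) {
    std::vector<int> out;
    std::size_t pos = 0;
    while (pos < bits.size()) {
        std::uint32_t window = 0;              // next L bits, zero-padded past the end
        for (int i = 0; i < L; ++i) {
            const std::size_t p = pos + static_cast<std::size_t>(i);
            window = (window << 1) | (p < bits.size() && bits[p] == '1' ? 1u : 0u);
        }
        const Entry& e = table[window];
        if (e.length == 0 || pos + static_cast<std::size_t>(e.length) > bits.size())
            break;                             // invalid code or truncated input
        out.push_back(e.symbol);
        pos += static_cast<std::size_t>(e.length);  // consume only this codeword's bits
    }
    return out;
}
```

With codewords 0, 10, 110, 111 for symbols 0 to 3 and L = 3, symbol 0 fills table entries 000 to 011, exactly the "0xx" pattern described above.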

[Figure: lookup-table-based decoding]

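Canonical Huffman coding

At its core, canonicalization is a data-structuring problem!
By imposing certain fixed conventions, the structure of the Huffman code tree can be reconstructed from very little data (essentially the code length of each symbol).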

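A typical canonical Huffman scheme works as follows:

Use some algorithm to count the frequency of each symbol and derive the number of bits (code length) the symbol needs
Count how many symbols have each code length, from the maximum length down to 1, then assign codewords to the symbols in increasing order
- The first codeword of the shortest length is 0 (all zeros)
- The first codeword of length i, f(i), is obtained from the last codeword of length i-1, g(i-1): f(i) = 2(g(i-1)+1), i.e. add 1 and shift left by one bit. For example, if the last codeword of length 4 is 1001, the first codeword of length 5 is 10100
Output the compressed data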

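In short: codewords of the same length count up by 1 in binary; when the length increases, add 1 and then append a 0. The example below makes this easy to see.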
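The "add 1 within a length; add 1 and append a 0 when the length grows" rule fits in a few lines. A sketch, assuming the code lengths are already known and that symbols of equal length receive codewords in increasing symbol order (a common convention, assumed here); canonical_codes is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assign canonical codewords (right-aligned) from code lengths (0 = unused symbol).
std::vector<std::uint32_t> canonical_codes(const std::vector<int>& lengths) {
    int max_len = 0;
    for (int len : lengths) if (len > max_len) max_len = len;
    std::vector<std::uint32_t> codes(lengths.size(), 0);
    std::uint32_t next = 0;                     // next codeword of the current length
    for (int len = 1; len <= max_len; ++len) {
        for (std::size_t s = 0; s < lengths.size(); ++s)
            if (lengths[s] == len) codes[s] = next++;   // same length: +1
        next <<= 1;                                     // longer length: append a 0
    }
    return codes;
}
```

For the lengths {2, 1, 3, 3} it assigns 10, 0, 110 and 111, i.e. {2, 0, 6, 7}. Because the shift happens after the last codeword of a length has been incremented, the first codeword of the next length is exactly 2(g(i-1)+1).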

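Three key lists are needed:

first[i]: the first codeword of length i
index[j]: the index of the first symbol whose code length is j
Table[k]: the symbol whose codeword has index k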

An example:
[Figure: canonical Huffman coding example]

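The example shows that, under this convention, once the length l of a codeword has been identified, subtracting the first codeword of length l from it gives the codeword's offset from that first codeword in the index. The symbol is therefore Table[index[l] + (codeword - first[l])].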
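Decoding with first[], index[] and Table[] can be sketched as follows: bits are accumulated one at a time, and as soon as the current value falls inside the range of codewords of its length, the offset from first[len] selects the symbol. Besides the three lists, the sketch keeps a count[] array (number of codewords per length) for the range test; that helper, the '0'/'1' string input and all names are assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

struct CanonicalTables {
    std::vector<std::uint32_t> first;  // first[i]: first codeword of length i
    std::vector<int> index;            // index[i]: position in table of the first length-i symbol
    std::vector<int> table;            // table[k]: symbol whose codeword has index k
    std::vector<int> count;            // count[i]: number of codewords of length i (added helper)
};

// Build the lists from code lengths (0 = unused); equal lengths in symbol order.
CanonicalTables build_tables(const std::vector<int>& lengths) {
    int max_len = 0;
    for (int len : lengths) if (len > max_len) max_len = len;
    CanonicalTables t;
    t.first.assign(max_len + 1, 0);
    t.index.assign(max_len + 1, 0);
    t.count.assign(max_len + 1, 0);
    std::uint32_t next = 0;
    for (int len = 1; len <= max_len; ++len) {
        t.first[len] = next;
        t.index[len] = static_cast<int>(t.table.size());
        for (std::size_t s = 0; s < lengths.size(); ++s)
            if (lengths[s] == len) { t.table.push_back(static_cast<int>(s)); ++t.count[len]; }
        next = (next + static_cast<std::uint32_t>(t.count[len])) << 1;  // +1 past last, append 0
    }
    return t;
}

std::vector<int> canonical_decode(const std::string& bits, const CanonicalTables& t) {
    std::vector<int> out;
    std::uint32_t code = 0;
    int len = 0;
    for (char b : bits) {
        code = (code << 1) | (b == '1' ? 1u : 0u);
        if (++len >= static_cast<int>(t.first.size())) break;  // longer than any codeword
        // Length-len codewords are first[len] .. first[len] + count[len] - 1;
        // with canonical codes the accumulated value is never below first[len].
        const std::uint32_t offset = code - t.first[len];
        if (offset < static_cast<std::uint32_t>(t.count[len])) {
            out.push_back(t.table[t.index[len] + static_cast<int>(offset)]);
            code = 0;
            len = 0;
        }
    }
    return out;
}
```

For the lengths {2, 1, 3, 3} (codewords 10, 0, 110, 111) the bit string 1001101110 decodes to the symbols 0, 1, 2, 3, 1.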

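Adaptive Huffman coding

In adaptive Huffman coding, the encoder and the decoder both start from an empty code tree and modify it in the same way as each symbol is read.

Nodes fall into two classes: leaf nodes (corresponding to actual source symbols) and internal nodes (corresponding to intermediate merge results).
The weight of a leaf node is the number of times its source symbol has occurred; the weight of an internal node is the sum of its children's weights. Each node is assigned a unique node number. With n source symbols there are 2n-1 nodes in total (leaves plus internal nodes).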

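Two rules of the dynamic Huffman tree:

A node with a larger weight also has a larger node number;
A parent's node number is always larger than its children's node numbers.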
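Both rules can be checked mechanically, which is handy when debugging an update routine. A sketch assuming a hypothetical layout in which nodes[k] is the node numbered k and each node stores its weight and its parent's number (-1 for the root). Rule 1 is equivalent to the weights being non-decreasing in node-number order.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical node record, stored at the index equal to its node number.
struct DynNode { int weight; int parent; };   // parent == -1 for the root

bool satisfies_numbering_rules(const std::vector<DynNode>& nodes) {
    // Rule 1: a larger weight never carries a smaller number.
    for (std::size_t k = 1; k < nodes.size(); ++k)
        if (nodes[k].weight < nodes[k - 1].weight) return false;
    // Rule 2: every parent is numbered higher than its child.
    for (std::size_t k = 0; k < nodes.size(); ++k)
        if (nodes[k].parent != -1 && nodes[k].parent <= static_cast<int>(k)) return false;
    return true;
}
```

For example, after coding "a a b" a valid tree is NYT (0, weight 0), leaf b (1, weight 1), their parent (2, weight 1), leaf a (3, weight 2), root (4, weight 3), and the check returns true for it.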

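Initializing the dynamic Huffman tree:

When the tree is initialized, the symbol frequencies are not yet known. A node called NYT (Not Yet Transmitted) is created with weight 0; its codeword serves as an escape code that differs from every symbol to be transmitted. The trees of both the transmitter and the receiver start out with this single NYT node.
During encoding, when an input symbol is not yet in the tree, the encoder outputs the NYT codeword followed by the symbol's raw (uncompressed) representation and updates the tree. When the decoder receives the NYT codeword, it reads the following bits as the raw representation of a new symbol and adds that symbol to the tree.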

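Algorithm:

The update procedure requires the nodes to keep a fixed order: node numbers increase from left to right and from bottom to top, from the NYT node up to the root.
A block is a set of nodes that have the same weight.
[Figure: update procedure flowchart]

Example:
[Figure: example of the update procedure]

The four diagrams show the steps of inserting the symbol v into a given code tree by following the flowchart.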

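Huffman coding experiments

Using the sample programs from Introduction to Data Compression (4th edition), carry out the following:

Encode the Sena, Sensin, and Omaha images with huff_enc;
Write a program that computes the differences between adjacent pixels, and encode the differences with huff_enc;
Use huff_enc and huff_dec to generate the code table of the Sensin image, encode the BookShelf1 and Sena images with that table, and compare the results with those obtained using each image's own code table.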

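Experimental materials

Images used:

Uncompressed 256 × 256, 8-bit grayscale images whose pixel values are stored in raster-scan order.
Sena
[Figure: Sena]
Sensin
[Figure: Sensin]
Omaha

Bookshelf

Programs used:

The sample programs huff_enc and huff_dec from Introduction to Data Compression (4th edition), analyzed at the end of this article;
A self-written DPCM encoder (predicting each pixel from its left neighbour);
A self-written viewer for 8-bit grayscale images.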

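Results

1. Huffman coding results

Image	Original size	Compressed size	Compression ratio (compressed/original)
Sena	64kB	56.1kB	87.66%
Sensin	64kB	59.9kB	93.59%
Omaha	64kB	57.0kB	89.06%

Conclusion:
Huffman coding alone compresses these images only modestly: the files shrink to roughly 88% to 94% of their original size.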

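2. Results of encoding the pixel-difference files

Example of a prediction (difference) image, Sensin:
[Figure: Sensin difference image]

Image	Original size	Compressed directly	Compressed after prediction	Ratio (direct)	Ratio (after prediction)
Sena	64kB	56.1kB	32.4kB	87.66%	51.56%
Sensin	64kB	59.9kB	37.8kB	93.59%	59.38%
Omaha	64kB	57.0kB	51.4kB	89.06%	80.31%

Conclusion:
Combining DPCM predictive coding with Huffman coding compresses the images considerably better than Huffman coding alone. This indicates that DPCM prediction effectively exploits the correlation between neighbouring pixel values: the prediction residual is approximately Laplacian distributed, i.e. the source symbols concentrate on a few values, which suits Huffman coding much better.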
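One way to quantify this without running huff_enc is the first-order entropy H of each file: a Huffman code built from a file's own histogram has an average length of at least H and less than H + 1 bits per pixel. A sketch, assuming the raw image has been read into a std::vector<std::uint8_t> (byte_entropy is a hypothetical name); computing it for an image and for its difference file should show how strongly the residuals concentrate.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// First-order entropy, in bits per symbol, of a byte buffer.
double byte_entropy(const std::vector<std::uint8_t>& data) {
    if (data.empty()) return 0.0;
    std::vector<std::size_t> hist(256, 0);
    for (std::uint8_t b : data) ++hist[b];
    const double n = static_cast<double>(data.size());
    double h = 0.0;
    for (std::size_t c : hist) {
        if (c == 0) continue;                 // 0 * log(0) is taken as 0
        const double p = static_cast<double>(c) / n;
        h -= p * std::log2(p);
    }
    return h;
}
```

A uniform histogram over all 256 values gives 8 bits per pixel, i.e. no gain is possible; the more the histogram piles up around a few values, the lower H and hence the shorter the Huffman code.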

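3. Encoding images with another image's code table

Image	Original size	Compressed size	Compression ratio
Sena	64kB	59.2kB	92.50%
Bookshelf1	64kB	70.8kB	110.63%

Conclusion:
When Sena and BookShelf1 are encoded with the Huffman code table of the Sensin image, compression gets worse for both. Sena and Sensin have visibly similar overall pixel-value distributions, so the loss for Sena is small (92.50% versus 87.66% with its own table). BookShelf1's pixel values differ considerably from Sensin's, so the loss is large, and the compressed file is even larger than the original. This shows that Huffman coding works well only when the symbol probabilities assumed by the code table closely match those of the source. A basic Huffman compressor therefore has to gather the file's symbol statistics before compressing and output the code table together with the compressed data.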
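The effect of a mismatched table can be estimated directly: the payload size is the sum over all symbols of count(s) × len(s), where the counts come from the image being coded and the lengths from the table being used. A sketch (coded_bits is a hypothetical helper; it ignores the stored table and the final padding bits):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Payload size in bits when a source with histogram `counts` is coded with
// code lengths `lengths` taken from some (possibly foreign) table.
// Returns false if a symbol that occurs has no codeword (length 0).
bool coded_bits(const std::vector<std::uint64_t>& counts, const std::vector<int>& lengths,
                std::uint64_t& total_bits) {
    total_bits = 0;
    for (std::size_t s = 0; s < counts.size(); ++s) {
        if (counts[s] == 0) continue;
        if (s >= lengths.size() || lengths[s] <= 0) return false;
        total_bits += counts[s] * static_cast<std::uint64_t>(lengths[s]);
    }
    return true;
}
```

For counts {1, 1, 6}, the matched lengths {2, 2, 1} cost 10 bits, whereas lengths {1, 2, 2}, fitted to a different histogram, cost 15 bits because the most frequent symbol now gets a longer codeword.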

Program analysis

The analysis is given as comments in the code.

Self-written DPCM encoder (predicting each pixel from its left neighbour)

imgdiff.cpp

#include <cstdint>
#include <cstdlib>
#include <fstream>
#include <iostream>
#include <vector>
using namespace std;

void usage()
{
	cout << "Usage:\n"
	     << "imgdiff [infile] [width] [height] [outfile]\n"
	     << "\t infile : Input raw img file.\n"
	     << "\t width : width of img.\n"
	     << "\t height : height of img.\n"
	     << "\t outfile : Out img contains the difference between pixels\n";
}

inline size_t pix_index(int x, int y, int width)
{
	return static_cast<size_t>(y) * width + x;
}

// Map a difference d to d + 127 clamped to [0, 255]. Differences outside
// [-127, 128] are clipped, so very large jumps cannot be recovered exactly.
inline uint8_t clip255(int x)
{
	x = x + 127;
	if (x >= 255)
		return UINT8_MAX;
	else if (x <= 0)
		return 0;
	else
		return static_cast<uint8_t>(x);
}

int main(int argc, char** argv)
{
	if (argc != 5) {
		usage();
		return 1;
	}

/* Read image file */
	ifstream img_in(argv[1], ios::binary);
	if (!img_in.is_open()) {
		cerr << "Cannot open input file " << argv[1] << endl;
		return 1;
	}

	const int width = atoi(argv[2]);
	const int height = atoi(argv[3]);
	if (width <= 0 || height <= 0) {
		cerr << "Invalid width or height" << endl;
		return 1;
	}
	const size_t n_pixels = static_cast<size_t>(width) * height;

	// Check that the file holds exactly width * height bytes
	img_in.seekg(0, ios_base::end);
	const streamoff file_size = img_in.tellg();
	img_in.seekg(0, ios_base::beg);
	if (file_size != static_cast<streamoff>(n_pixels)) {
		cerr << "File size does not match " << width << " x " << height << endl;
		return 1;
	}

	vector<uint8_t> img_data(n_pixels);
	img_in.read(reinterpret_cast<char*>(img_data.data()), static_cast<streamsize>(n_pixels));
	img_in.close();

/* Calculate diff img: the first pixel of each row is copied unchanged,
   every other pixel becomes clip255(current - left neighbour) */
	vector<uint8_t> diff_data(n_pixels);
	for (int y = 0; y < height; y++) {
		diff_data[pix_index(0, y, width)] = img_data[pix_index(0, y, width)];
		for (int x = 1; x < width; x++) {
			int diff = img_data[pix_index(x, y, width)] -
						img_data[pix_index(x - 1, y, width)];
			diff_data[pix_index(x, y, width)] = clip255(diff);
		}
	}

/* Save diff file */
	ofstream diff_out(argv[4], ios::binary);
	if (!diff_out.is_open()) {
		cerr << "Cannot open output file " << argv[4] << endl;
		return 1;
	}
	diff_out.write(reinterpret_cast<const char*>(diff_data.data()), static_cast<streamsize>(n_pixels));
	return 0;
}
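imgdiff maps each difference to diff + 127 clamped to [0, 255], so differences outside [-127, 128] (a jump from 255 down to 3, for instance) cannot be inverted exactly. If a lossless difference file is wanted, a modulo-256 residual, the same wrap-around idea used by diff() in the textbook's sub.c (which subtracts in the opposite order), keeps every residual in one byte and makes the inverse exact. A sketch with hypothetical function names:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Left-neighbour DPCM with modulo-256 residuals; the first pixel of each row is stored as-is.
std::vector<std::uint8_t> dpcm_mod256(const std::vector<std::uint8_t>& img, int width, int height) {
    std::vector<std::uint8_t> res(img.size());
    for (int y = 0; y < height; ++y) {
        const std::size_t row = static_cast<std::size_t>(y) * width;
        res[row] = img[row];
        for (int x = 1; x < width; ++x)   // conversion to uint8_t wraps modulo 256
            res[row + x] = static_cast<std::uint8_t>(img[row + x] - img[row + x - 1]);
    }
    return res;
}

// Exact inverse: add each residual to the reconstructed left neighbour, modulo 256.
std::vector<std::uint8_t> inverse_dpcm_mod256(const std::vector<std::uint8_t>& res, int width, int height) {
    std::vector<std::uint8_t> img(res.size());
    for (int y = 0; y < height; ++y) {
        const std::size_t row = static_cast<std::size_t>(y) * width;
        img[row] = res[row];
        for (int x = 1; x < width; ++x)
            img[row + x] = static_cast<std::uint8_t>(img[row + x - 1] + res[row + x]);
    }
    return img;
}
```

Wrap-around sends small negative residuals to values near 255, but this does not hurt Huffman coding, whose codeword lengths depend only on the symbol probabilities, not on the symbol values.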

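Self-written 8-bit grayscale image viewer (MATLAB)

imgshow.m

clear; close all; clc;

%% definition of size
M = 256;    % image height (rows)
N = 256;    % image width (columns)
%% definition of filepath
pathname = ".\images\diff";      % file directory
filename = "\sensin_diff.img";   % file name

%% read files
f = fopen(pathname + filename, 'r');
data = fread(f, 'uint8');
fclose(f);
% pixels are stored row by row (raster order) but reshape fills column by column,
% so reshape to N x M and transpose; only the first M*N bytes are displayed
image = uint8(reshape(data(1:M*N), N, M)');

%% show
imshow(image);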

Sample programs huff_enc and huff_dec from Introduction to Data Compression (4th edition)

idc.h

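#pragma once
#include <stdio.h>   /* FILE, used in the prototypes below */
#include <stdlib.h>  /* malloc, free, exit: used by imsub.c and sub.c */
#include <string.h>
#include "getopt.h"
#include "unistd.h"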

/* define the structure node */

struct node {
    float pro; /* probabilities */
    int l; /* location of probability before sorted */
    unsigned int code; /* code */
    struct node* left; /* pointer for binary tree */
    struct node* right; /* pointer for binary tree */
    struct node* forward;
    struct node* back;
    struct node* parent; /* pointer to parent */
    int check;
};

/* define subroutines and pointers to nodes */

typedef struct node NODE;
typedef struct node* BTREE; /* pointer to a tree node (struct NODE was never defined) */
BTREE create_list(float prob[], int loc[], int num);
void create_code(NODE* root, int lgth, unsigned int* code, char* length);

//BTREE make_list(int num);
//void write_code(NODE* root, int lgth, unsigned int* code, char* length, int* loc);


void sort(float* prob, int* loc, int num);
void huff(float prob[], int loc[], int num, unsigned int* code, char* length);
void getcode(FILE* fp, int num, unsigned int* code, char* length);
void value(int* values, unsigned char* image, int size, int num);
int files(int size, int* code, char* length, unsigned char* file);

huff_enc.c

#include <stdlib.h>
#include <stdio.h>
#include <math.h>
#include "idc.h"

/**********************************************************************
*                                                                      *
*  File: huff_enc.c                                                    *
*  Function:  Huffman encodes an input file assuming a 256 letter      *
*             alphabet                                                 *
*  Author  : S. Faltys                                                 *
*  Last mod: 7/21/95                                                   *
*  Usage:  see usage(), for details see man page or huff_enc.doc       *
*                                                                      *
***********************************************************************/

/*******************************************************************************
*NOTICE:                                                                       *
*This code is believed by the author to be bug free.  You are free to use and  *
*modify this code with the understanding that any use, either direct or        *
*derivative, must contain acknowledgement of its author and source.  The author*
*makes no warranty of any kind, expressed or implied, of merchantability or    *
*fitness for a particular purpose.  The author shall not be held liable for any*
*incidental or consequential damages in connection with or arising out of the  *
*furnishing, performance, or use of this software.  This software is not       *
*authorized for use in life support devices or systems.                        *
********************************************************************************/

void usage(void);

void main(int argc, char** argv)
{
	unsigned char* file; /*pointer to an array for file */
	char infile[80], outfile[80], codefile[80], scodefile[80];
	/* input and output files*/
	char temp[80], type, where, t;
	int size, num, c;
	int i, j, n; /* counters */
	FILE* ifp, * ofp, * cfp, * sfp, * tmp_fp;
	char* length, x; /*pointer to an array for code lengths*/
	int values[256], loc[256];
	unsigned int* code;
	/* pointer to an array for code */
	float prob[256], p;
	extern int optint;
	extern char* optarg;

	ifp = stdin;
	t = 0;  /*flag to see if an input filename was given*/
	ofp = stdout;
	x = 0; /* flag if output is piped to decoder */
	cfp = NULL;
	sfp = NULL;
	num = 256;

	code = (unsigned int*)malloc(num * sizeof(unsigned int));
	length = (char*)malloc(num * sizeof(char));

    // Parse command-line arguments
	while ((c = getopt(argc, argv, "i:o:c:s:h")) != EOF)
	{
		switch (c) {

			/* input file */

		case 'i':
			strcpy(infile, optarg);
			if ((ifp = fopen(infile, "rb")) == NULL) {
				fprintf(stderr, "Image file cannot be opened for input.\n");
				return;
			}
			t = 1;
			break;

			/* output file */

		case 'o':
			strcpy(outfile, optarg);
			if ((ofp = fopen(outfile, "wb")) == NULL) {
				fprintf(stderr, "Output file cannot be opened for output.\n");
				return;
			}
			x = 1;
			break;
			/* code file */

		case 'c':
			strcpy(codefile, optarg);
			if ((cfp = fopen(codefile, "rb")) == NULL) {
				fprintf(stderr, "Code file cannot be opened for input.\n");
				return;
			}
			getcode(cfp, num, code, length);
			break;

			/* file to store code in */

		case 's':

			strcpy(scodefile, optarg);
			if ((sfp = fopen(scodefile, "wb")) == NULL) {
				fprintf(stderr, "Code file cannot be opened for output.\n");
				return;
			}
			break;

		case 'h':
			usage();
			exit(1);
			break;
		}
	}

	/* get size of file */

	   /* create a temporary file for input */

	if (t == 0) {
		strcpy(infile, "tmpf");
		tmp_fp = fopen(infile, "wb+");
		while ((t = getc(ifp)) != EOF)
			putc(t, tmp_fp);
		fclose(tmp_fp);
		ifp = fopen(infile, "rb");
		t = 0;
	}

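	fseek(ifp, 0, 2); /* set file pointer at end of file */
	size = ftell(ifp); /* gets size of file */
	++size; /* note: size is now file length + 1; the extra byte after the data is
	           uninitialised but is still counted by value() and encoded by files() */
	fseek(ifp, 0, 0); /* set file pointer to beginning of file */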

/* get memory for file */

	file = (unsigned char*)malloc(size * sizeof(unsigned char) * 4); /*amer 3/99 */
	if (file == NULL) {
		printf("Unable to allocate memory for file.\n");
		exit(1);
	}

	/* get file */

	fread(file, sizeof(unsigned char), size, ifp);
	fclose(ifp);

	/* remove temporary file if one was used */

	if (t == 0)
		remove("tmpf");

	/* create code */

	if (cfp == NULL) {

		/* set values to zero */

		for (i = 0; i < num; i++)
			values[i] = 0;

		/* find values */
        // Count the occurrences of each symbol (byte value)
		value(values, file, size, num);

		/* find probs */
        // Compute the probability of each symbol
		p = size + 0.0;
		for (i = 0; i < num; i++)
			prob[i] = values[i] / p;

		/* set to zero */

		for (i = 0; i < num; i++) {
			code[i] = 0;
			length[i] = 0;
		}

		/* sort prob array */
        // Sort the symbols by probability in descending order
		sort(prob, loc, num);

		/* make huff code */
        // Build the Huffman code table
		huff(prob, loc, num, code, length);

	}

	/* encode file */
    // Encode the file
	size = files(size, code, length, file);

	/* write length of encoded file to the decoder */

	/*	if(x==0)
		fwrite(&size,sizeof(int),1,ofp);
	*/

	if (sfp == NULL) {

		/* write encoded file to file */

		fwrite(code, sizeof(unsigned int), num, ofp);
		fwrite(length, sizeof(char), num, ofp);
	}

	fwrite(file, sizeof(unsigned char), size, ofp);
	fclose(ofp);

	/* write code to a file */

	if (sfp != NULL) {
		fwrite(code, sizeof(unsigned int), num, sfp);
		fwrite(length, sizeof(char), num, sfp);
		fclose(sfp);
	}

}

void usage()
{
	fprintf(stderr, "Usage:\n");
	fprintf(stderr, "huff_enc [-i infile][-o outfile][-c codefile][-s storecode][-h]\n");
	fprintf(stderr, "\t imagein : file containing the input to be encoded.  If no\n");
	fprintf(stderr, "\t\t name is provided input is read from standard in.  This\n");
	fprintf(stderr, "\t\t feature can be used to directly encode the output of programs\n");
	fprintf(stderr, "\t\t such as jpegll_enc, and aqfimg_enc.\n");
	fprintf(stderr, "\t outfile : File to contain the encoded representation.  If no\n");
	fprintf(stderr, "\t\t name is provided the output is written to standard out.\n");
	fprintf(stderr, "\t codefile: If this option is used the program will read the\n");
	fprintf(stderr, "\t\t Huffman code from codefile.  If the option is not used the\n");
	fprintf(stderr, "\t\t program computes the Huffman code for the file being encoded\n");
	fprintf(stderr, "\t storecod: If this option is specified the Huffman code used to\n");
	fprintf(stderr, "\t\t encode the file is stored in codefile.  If this option is\n");
	fprintf(stderr, "\t\t not specified the code is stored in outfile\n");
}

imsub.c

#include <stdio.h>
/*

		imsub.c

		This is a file of subroutines

		create_code, create_list, huff, getim, storeim, and sort

*/

#include "idc.h"

/*******************************************************************************
*NOTICE:                                                                       *
*This code is believed by the author to be bug free.  You are free to use and  *
*modify this code with the understanding that any use, either direct or        *
*derivative, must contain acknowledgement of its author and source.  The author*
*makes no warranty of any kind, expressed or implied, of merchantability or    *
*fitness for a particular purpose.  The author shall not be held liable for any*
*incidental or consequential damages in connection with or arising out of the  *
*furnishing, performance, or use of this software.  This software is not       *
*authorized for use in life support devices or systems.                        *
********************************************************************************/

/* This subroutine writes code for a binary tree
   the code is placed in an array named code
   the length of the code is placed in an array named length

   This subroutine checks to see if the node it is given when it is called
   is at the end of the tree.  If it is the code and length for this node is
   written to arrays. If it is not at the end of the tree the code for the
   next node in the tree is written.  This subroutine travels left first
   assigning zeros for each step left and then travels right assigning
   ones for each step right. */

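/** 
 * @brief Generate the codewords from the code tree
 * @param root   input: pointer to the root node of the (sub)tree
 * @param lgth   input: code length of the root node
 * @param code   output: codeword of each symbol
 * @param length output: code length of each symbol
 *
 * @return none
 */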
void create_code(NODE* root, int lgth, unsigned int* code, char* length)
{
    // Recurse: append 0 to the codeword for the left subtree and 1 for the right subtree

	if (root->left != NULL) {

		/* writes code for next node */
		root->left->code = root->code;
		root->left->code = root->left->code << 01;

		/* keep track of length of code */
		++lgth;

		/* call subroutine again but with next node left as root node */
		create_code(root->left, lgth, code, length);
	}
	if (root->right != NULL) {

		/* writes code for next node */
		root->right->code = root->code;
		root->right->code = root->right->code << 01;

		/* add one for step right */
		root->right->code += 1;

		/* call subroutine again but with next node right as root node */
		create_code(root->right, lgth, code, length);
	}
	if (root->left == NULL && root->right == NULL) {

		/* write code and length to arrays,
		   code and length are written in the same order
		   in the code and length arrays as the probability
		   they go with before the probabilities were sorted */

		code[root->l] = root->code;
		length[root->l] = lgth;
	}
}

/* This subroutine creates a binary tree */
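/** 
 * @brief Build the Huffman code tree
 * @param prob   input: symbol probabilities, sorted in descending order
 * @param loc    input: original symbol value of each sorted probability
 * @param num    input: number of symbols
 *
 * @return root of the tree
 */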
BTREE create_list(float prob[], int loc[], int num)
{
	NODE* head, * tail, * f, * b, * e, * par = NULL; /* pointers to nodes */
	int i, j, k, count, x; /* counters */

	count = num - 1; /* counter used when assigning parents */

    // Allocate the head node
	if (num > 0) {
		head = (NODE*)malloc(sizeof(NODE)); /* head is the start of the list */
		head->pro = prob[0];
		head->l = loc[0];
		head->left = NULL;
		head->right = NULL;

		tail = head; /* tail is the end of the list */

/* create a list with a node for each probability
   and conect them together using b as a pointer */
        // Link the symbol nodes into a list in descending order of probability
		for (j = 1; j < num; j++) {
			tail->forward = (NODE*)malloc(sizeof(NODE));
			b = tail;
			tail = tail->forward;
			tail->back = b;
			tail->pro = prob[j];
			tail->l = loc[j];
			tail->left = NULL;
			tail->right = NULL;

		}

		/* create the tree by creating parents
		   after each parent is created its probability is equal to the sum
		   of its children
		   then the parent is inserted into the list so it is still sorted
		   by probabilities
		*/
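        // Scan from the tail: merge the two least probable nodes into a parent and unlink both children from the list,
        // then insert the parent into the list at the position given by its probability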

		for (i = 1; i < num; i++) {

            // Merge the last two nodes under a new parent and unlink the last node from the list
			tail->parent = (NODE*)malloc(sizeof(NODE));
			par = tail->parent; /* par is a pointer of parent */
			par->right = tail;
			par->left = tail->back;
			b = tail;
			tail = tail->back;
			tail->parent = par;
			b->back = NULL; /* unhook original list */
			tail->forward = NULL; /* unhook original list */
			par->pro = par->right->pro + par->left->pro;

            // Unless the node before the last one is the head, unlink it from the list as well
			if (count != 1) {
				b = tail;
				tail = tail->back;
				b->back = NULL; /* unhook original list */
				tail->forward = NULL; /* unhook original list */
			}

            // Insert the merged parent into the list according to its probability
			/* insert parent into list */
			b = head;
			x = 0; /* marker to see if parent has been placed in list */
			for (j = 1; j < count; j++) {
				if (b != NULL) {
					if ((b->pro <= par->pro) && x == 0) {

						if (b == head) {

							/* probability is larger than largest probability
							   in the list so it becomes the head */
							par->forward = b;
							b->back = par;
							head = par;
							x = 1;
						}
						if (x != 1) {

							/* f is a pointer to a node
							   the probability of f is larger than parent's
							   the probability of b is smaller then parent's
							   parent is placed between f and b */
							f = b->back;
							f->forward = par;
							par->back = f;
							par->forward = b;
							b->back = par;
							x = 1;
						}
					}
					if (x == 0 && count != 0)
						b = b->forward;
				}
			}

			/* if the probability of parent is the smallest on the list
			   x will equal zero */
			if (x == 0 && tail->pro > par->pro) {
				tail->forward = par;
				par->back = tail;
				tail = par;
			}

			count--;
		}
	}

	/* tree is created */
	head = par; /* the last parent created becomes the start of
			 the tree */
	head->code = 0; /* the code for the first node is set to
			 zero */
	return (BTREE)head;

}

/* This subroutine creates huffman code */
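/** 
 * @brief Huffman coding driver: builds the tree and derives the code table
 * @param prob   input: symbol probabilities (sorted)
 * @param loc    input: original symbol value of each sorted probability
 * @param num    input: number of symbols
 * @param code   output: codeword of each symbol
 * @param length output: code length of each symbol
 *
 * @return none
 */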
void huff(float prob[], int loc[], int num, unsigned int* code, char* length)
{
	NODE* tree; /* pointer to a node */
	int lgth;

	lgth = 0;
	tree = (NODE*)create_list(prob, loc, num); /* create tree */
	create_code(tree, lgth, code, length); /* get code from tree */
}

/* This subroutine gets an image */

void getim(char* fname, unsigned char* image, int size)
{
	FILE* fp;

	if ((fp = fopen(fname, "rb")) == NULL) {
		fprintf(stderr, "Image file cannot be opened for input.\n");
		return;
	}
	printf("Input file opened successsful.\n");
	fread(image, sizeof(unsigned char), size, fp);
	fclose(fp);
}

/* This subroutine stores an image to a file */

void storeim(char* fname, unsigned char* image, int size)
{
	FILE* fp;

	if ((fp = fopen(fname, "wb")) == NULL) {
		fprintf(stderr, "File cannot be opened for output.\n");
		return;
	}
	fwrite(image, sizeof(unsigned char), size, fp);
	fclose(fp);
}

/* This subroutine sorts an array */
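/** 
 * @brief Sort the symbols by probability (descending; zero-probability symbols go last)
 * @param value  symbol probabilities: unsorted on input, sorted on output
 * @param loc    output: original symbol value of each sorted entry
 * @param num    input: number of symbols
 *
 * @return none
 */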
void sort(float* value, int* loc, int num)
{
	int i, j, k; /* counters */
	int maxloc; /* location of largest probability */
	float* svalue; /* arrray to hold sorted values */
	float max; /* largest probability in the array */

/* get memory for an array of sorted probabilities and set it to zero */

	svalue = (float*)malloc(num * sizeof(float));
	for (i = 0; i < num; i++)
		svalue[i] = 0.0;

	i = num - 1;
	for (j = 0; j < num; j++) {
		if (value[j] == 0.0) {
			svalue[i] = value[j];
			loc[i] = j;
			--i;
		}
	}
	k = 0;
	for (i = 0; i < num; i++) {
		max = 0.0;
		/* find largest probability */
		for (j = 0; j < num; j++) {
			if (value[j] > max) {
				max = value[j];
				maxloc = j;
			}
		}
		if (max != 0.0) {
			svalue[k] = value[maxloc];
			loc[k] = maxloc;
			/* largest probability is set to zero */
			value[maxloc] = 0.0;
			k++;
		}
	}
	/* sorted probabilities are written to original array */
	for (i = 0; i < num; i++)
		value[i] = svalue[i];
	free(svalue);
}

sub.c

#include <stdio.h>
/*

	sub.c

	This is a file of subroutines

	getcode, diff, value, images, and files

*/

#include "idc.h"

/*******************************************************************************
*NOTICE:                                                                       *
*This code is believed by the author to be bug free.  You are free to use and  *
*modify this code with the understanding that any use, either direct or        *
*derivative, must contain acknowledgement of its author and source.  The author*
*makes no warranty of any kind, expressed or implied, of merchantability or    *
*fitness for a particular purpose.  The author shall not be held liable for any*
*incidental or consequential damages in connection with or arising out of the  *
*furnishing, performance, or use of this software.  This software is not       *
*authorized for use in life support devices or systems.                        *
********************************************************************************/

/* This subroutine gets code and code lengths from a file
   code is put in an array named code and the code lengths
   are put in an array named length */

void getcode(FILE *fp, int num, unsigned int *code, char *length)
{
        fread(code,sizeof(unsigned int),num,fp); /* gets code from a file */
        fread(length,sizeof(char),num,fp); /* gets code lengths from file */
        fclose(fp);
}

/* This subroutine calculates the differences between pixals, 
   the differences are put in an array named dimage so they can
   be used later and do not have to be recalculated */	

void diff(int *diffs, unsigned char *image,int rows,int cols,int num, unsigned char *dimage)
{
	int i,j,d; /* i,j are counters, d is the difference between pixals */
	unsigned char p; /* p is an unsigned char used to write the 
			    differences into an array */

/* find diffs */

	d=0-*(image); /* the first character's difference is subtracted 
			 from zero */

        for(i=0;i<rows;i++){
                for(j=0;j<cols;j++){
                if(i!=0 && j==0)

	/* the first column's difference is found by subtracting the
	   pixal value from the pixal value above it */

		d=*(image+(i-1)*cols)-*(image+i*cols);
		if(j!=0)
	
	/* all other column's differences are calculated by subtracting
	   the pixal value from the pixal value before it */

			d=*(image+i*cols+j-1)-*(image+i*cols+j);
		if (d<0)
		   d+=256; /* if the difference is negative 256 is added to
			      it so there are only 256 different difference
			      values */
                ++diffs[d]; /* sums up how many times each differece 
			       occurs */
		p=d; /* p equals the difference */
		*(dimage+i*cols+j)=p; /* the differences are written to
				         an array */
		}
	}
}

/* This program counts the number of times a character is used in a file
   the results are placed in an array named values */

void value(int *values, unsigned char *image, int size,int num)
{

	int i,c;

        for(i=0;i<size;i++){
                c=*(image+i);
                ++values[c];
		}
}

/* This subroutine encodes an image */

void images(int rows,int cols,int *code,char *length,unsigned char *file,unsigned char *dimage)
{
	int i,j,k,l,count; /* counters */
	int d,num;
	unsigned int w,c;
	unsigned char word;

	l=0;
	word=0;
	count=0;
	num=256;

/* encode image */
        for(i=0;i<rows;i++){
                for(j=0;j<cols;j++){
                        d=*(dimage+i*cols+j); /* d=difference between pixals */
                        if(length[d]==1)
                                w=length[d];
                                else
                                w=(1<<(length[d]-1));
                        c=code[d];
                for(k=0;k<length[d];k++){
                        if((c & w) == w) /* checks bit value of codes */
                                ++word;
                        c<<=1;
                        ++l;
                        if(l==8){
                                *(file+count)=word; /* encoded image */
                                ++count; /* counts lenghth of encoded image */
                                word=0;
                                l=0;
			}
                        else
                                word<<=1;
		}
		}
	}

/* left shift word so highest bit is the next code word */

        if(l!=0){
                word=word<<(7-l);
                *(file+count)=word;
                ++count;
	}

/* end to image equals NULL */
                word=0;

        *(file+count)=word;
        ++count;
/* store count at end of length array */

        length[num]=count;

}

/* This subroutine encodes a file */
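/** 
 * @brief Encode a file with the given code table
 * @param size   input: number of bytes in the original file
 * @param code   input: codeword of each symbol, indexed by byte value 0-255
 * @param length input: code length of each symbol, indexed by byte value 0-255
 * @param file   input: the original file; output: the encoded file
 *
 * @return length of the encoded file in bytes
 */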
int files(int size,int *code,char *length,unsigned char *file)
{
        int i,k,l,count; /* counters */
        int v, num;
        unsigned int w,c;
        unsigned char word, *newfile;

        l=0;
        word=0;
        count=0;
	num=256;

	newfile=(unsigned char*)malloc(size*sizeof(unsigned char) * 4); /* amer 3/99 */
        if (newfile==NULL){
                printf("Unable to allocate memory for file.\n");
                exit(1);
	}

/* encode file */
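// Note the bit manipulation used here
        for(i=0;i<size;i++){
                        // Read the input byte (symbol)
                        v=*(file+i); /* character  */

                        // w has a single 1 bit marking the codeword's most significant bit
                        // (assumes code lengths below 32 bits, the width of unsigned int)
                        if(length[v]==1)
                                w=length[v];
                                else
                                w=(1<<(length[v]-1));
                        // c holds the codeword
                        c=code[v];

                for(k=0;k<length[v];k++){
                        // Shift the codeword left bit by bit and copy its bits,
                        // from most to least significant, into the output byte;
                        // word is the output byte, l the number of bits already written into it
                        if((c & w) == w) /* checks bit value of codes */
                                ++word;
                        c<<=1;
                        ++l;
                        // 8 bits written: emit one byte
                        if(l==8){
                                *(newfile+count)=word; /* encoded file */
                                ++count; /* counts length of encoded file */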
                                word=0;
                                l=0;
			}
                        else
                                word<<=1;
		}
	}

/* left shift word so highest bit is the next code word */

        if(l!=0){
                word=word<<(7-l);
                *(newfile+count)=word;
                ++count;
	}

/* put encoded file array into old file array */

	for(i=0;i<count;i++)
		*(file+i)=*(newfile+i);

	return count;

}
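The packing loop in files() builds each output byte by adding a bit and shifting word, then left-aligns the last partial byte with zero padding. The same most-significant-bit-first packing written with explicit shifts might look like this (BitWriter is an illustrative class, not part of the textbook code; codewords are assumed right-aligned and at most 32 bits long):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

class BitWriter {
public:
    // Append the low `length` bits of `code`, most significant first (1 <= length <= 32).
    void put(std::uint32_t code, int length) {
        for (int k = length - 1; k >= 0; --k) {
            cur_ = static_cast<std::uint8_t>((cur_ << 1) | ((code >> k) & 1u));
            if (++nbits_ == 8) { out_.push_back(cur_); cur_ = 0; nbits_ = 0; }
        }
    }
    // Flush a partial byte, padding it with zero bits on the right.
    std::vector<std::uint8_t> finish() {
        if (nbits_ != 0) {
            out_.push_back(static_cast<std::uint8_t>(cur_ << (8 - nbits_)));
            cur_ = 0;
            nbits_ = 0;
        }
        return out_;
    }
private:
    std::vector<std::uint8_t> out_;
    std::uint8_t cur_ = 0;   // byte being filled
    int nbits_ = 0;          // bits already in cur_
};
```

For the codewords 1, 10, 110 and 111 it produces the bytes 0xDB 0x80: the ninth bit (1) is followed by seven zero padding bits, which is also how files() pads its final byte.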

huff_dec.c

#include <stdlib.h>
#include <stdio.h>
#include <math.h>
#include "unistd.h"
#include "idc.h"

/**********************************************************************
*                                                                      *
*  File: huff_dec.c                                                    *
*  Function:  this program decodes files encoded using huff_enc        *
*  Author  : S. Faltys                                                 *
*  Last mod: 7/21/95                                                   *
*  Usage:  see usage(), for details see man page or huff_dec.doc       *
*                                                                      *
***********************************************************************/

/*******************************************************************************
*NOTICE:                                                                       *
*This code is believed by the author to be bug free.  You are free to use and  *
*modify this code with the understanding that any use, either direct or        *
*derivative, must contain acknowledgement of its author and source.  The author*
*makes no warranty of any kind, expressed or implied, of merchantability or    *
*fitness for a particular purpose.  The author shall not be held liable for any*
*incidental or consequential damages in connection with or arising out of the  *
*furnishing, performance, or use of this software.  This software is not       *
*authorized for use in life support devices or systems.                        *
********************************************************************************/

void usage(void);

void main(int argc, char** argv)
{
	unsigned char* file, * efile;
	/*pointer to an array for file
	  pointer to an array for encoded image*/
	unsigned char w, p, n; /* place holders */
	char infile[80], outfile[80], codefile[80]; /* input and output files*/
	char temp[80], where, size;
	int num, s, x, c;
	int i, k, l, count, f; /* counters */
	FILE* ifp, * ofp, * cfp, * tmp_fp;
	unsigned int* code, word, d;
	char* length, t;  /* pointer to an array for code lengths */
	extern int optint;
	extern char* optarg;

	ifp = stdin;
	t = 0; /* flag to see if an input filename was given */
	ofp = stdout;
	cfp = NULL;
	num = 256;
	l = 0;
	word = 0;
	n = 1 << 7; /* set highest bit equal to one */
	size = 0;
	f = 10000;
	x = 0;

	/* get memory */

	code = (unsigned int*)malloc(num * sizeof(unsigned int));
	length = (char*)malloc(num * sizeof(char));
	file = (unsigned char*)malloc(f * sizeof(unsigned char));
	if (file == NULL) {
		printf("Unable to allocate memory for image.\n");
		exit(1);
	}

	while ((c = getopt(argc, argv, "i:o:c:h")) != EOF)
	{
		switch (c) {

			/* input file */

		case 'i':
			strcpy(infile, optarg);
			if ((ifp = fopen(infile, "rb")) == NULL) {
				printf("Cannot open file for input!\n");
				return;
			}
			t = 1;
			break;

			/* output file */

		case 'o':
			strcpy(outfile, optarg);
			ofp = fopen(outfile, "wb");
			break;

			/* code file */

		case 'c':
			strcpy(codefile, optarg);
			cfp = fopen(optarg, "rb");
			getcode(cfp, num, code, length);
			break;
		case 'h':
			usage();
			exit(1);
			break;
		}
	}

	fprintf(stderr, "\t Patience -- This may take a while\n");

	/* get file size */

	   /* create a temporary file for input */

	if (t == 0) {
		strcpy(infile, "tmpf");
		tmp_fp = fopen(infile, "wb+");
		while ((t = getc(ifp)) != EOF)
			putc(t, tmp_fp);
		fclose(tmp_fp);
		ifp = fopen(infile, "rb");
		t = 0;
	}

	fseek(ifp, 0, 2); /* set file pointer at end of file */
	s = ftell(ifp); /* gets size of file */
	fseek(ifp, 0, 0); /* set file pointer to begining of file */

	if (cfp == NULL) {
		fread(code, sizeof(unsigned int), num, ifp);
		fread(length, sizeof(char), num, ifp);
		s = s - ftell(ifp); /* s = the size of the encoded file */
	}

	/* get memory for encoded file */

	efile = (unsigned char*)malloc(s * sizeof(unsigned char));

	/* get encoded file */

	fread(efile, sizeof(unsigned char), s, ifp);
	fclose(ifp);

	/* remove temporary file if one was used */

	if (t == 0)
		remove("tmpf");

	/* decode file */

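	count = 0;
	w = *(efile + count); /* w equals encoded word */
	++count; /* counter to keep track of which word is being decoded */
	/* note: the loop ends right after loading efile[s-1] into w,
	   so the final encoded byte is never decoded */
	for (; count < s; ) {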
		for (k = 0; k < 8; k++) {
			if ((w & n) == n) { /* checks bit value of encoded word */
				++word;
			}
			++size; /* counter to keep track of length of word */

		/* check to see if word equals a code */

			for (i = 0; i < num; i++) {
				if (word == code[i] && size == length[i]) {
					if (l < f) {
						p = i; /* pixal value is a char */
						*(file + l) = p; /* decoded image */
						++l; /* counter of length of image */
						i = num; /* ends loop */
					}
					else {
						fwrite(file, sizeof(unsigned char), f, ofp);
						l = 0;
						p = i; /* pixal value is a char */
						*(file + l) = p; /* decoded image */
						++l; /* counter of length of image */
						i = num; /* ends loop */
						x = 1;
					}
					word = 0; /* reset word */
					size = 0; /* reset size */
				}
			}
			word <<= 1;
			w <<= 1;
		}
		w = *(efile + count); /* get next encoded value */
		++count;
	}
	*(file + l) = EOF;
	fwrite(file, sizeof(unsigned char), l, ofp);
	fclose(ofp);

}

void usage()
{
	fprintf(stderr, "Usage:\n");
	fprintf(stderr, "huff_dec [-i infile][-o outfile][-c codefile][-h]\n");
	fprintf(stderr, "\t imagein : file containing the Huffman-encoded data.  If no\n");
	fprintf(stderr, "\t\t name is provided input is read from standard in.\n");
	fprintf(stderr, "\t outfile : File to contain the reconstructed representation.  If no\n");
	fprintf(stderr, "\t\t name is provided the output is written to standard out.\n");
	fprintf(stderr, "\t codefile: This option is required if the Huffman encoded file\n");
	fprintf(stderr, "\t\t does not contain the Huffman code as part of the header\n");
	fprintf(stderr, "\t\t If this option is specified the program will read the\n");
	fprintf(stderr, "\t\t Huffman code from codefile.\n");
}
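huff_dec decodes by appending each bit to word and, after every bit, comparing word and its length against all 256 (code, length) pairs, so each input bit can cost up to 256 comparisons. The bit-serial method described earlier walks a code tree instead, one step per bit. A sketch assuming the same right-aligned code/length arrays as huff_enc, a '0'/'1' string as input, and a hypothetical TreeDecoder type:

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

struct TreeDecoder {
    struct Node { int child[2] = {-1, -1}; int symbol = -1; };
    std::vector<Node> nodes{Node{}};          // nodes[0] is the root

    // Add one codeword (right-aligned, `length` bits) leading to `symbol`.
    void insert(std::uint32_t code, int length, int symbol) {
        int v = 0;
        for (int k = length - 1; k >= 0; --k) {
            const int bit = static_cast<int>((code >> k) & 1u);
            if (nodes[v].child[bit] == -1) {
                nodes[v].child[bit] = static_cast<int>(nodes.size());
                nodes.push_back(Node{});
            }
            v = nodes[v].child[bit];
        }
        nodes[v].symbol = symbol;
    }

    // Walk the tree: 0 = left, 1 = right; emit a symbol at each leaf and restart at the root.
    std::vector<int> decode(const std::string& bits) const {
        std::vector<int> out;
        int v = 0;
        for (char b : bits) {
            v = nodes[v].child[b == '1' ? 1 : 0];
            if (v == -1) break;               // bit pattern not in the code
            if (nodes[v].symbol != -1) { out.push_back(nodes[v].symbol); v = 0; }
        }
        return out;
    }
};
```

Building the tree once from the 256 entries of the code table replaces the per-bit linear search with a single array lookup per bit.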
