1072 - Huffman Codes

Dan McAmbi is a member of a crack counter-espionage team and has recently obtained the partial contents of a file containing information vital to his nation's interests. The file had been compressed using Huffman encoding. Unfortunately, the part of the file that Dan has shows only the Huffman codes themselves, not the compressed information. Since Huffman codes are based on the frequencies of the characters in the original message, Dan's boss thinks that some information might be obtained if Dan can reverse the Huffman encoding process and obtain the character frequencies from the Huffman codes. Dan's gut reaction to this is that any given set of codes could be obtained from a wide variety of frequency distributions, but his boss is not impressed with this reasoned analysis. So Dan has come to you to get more definitive proof to take back to his boss.

Huffman encoding is an optimal data compression method if you know in advance the relative frequencies of letters in the text to be compressed. The method works by first constructing a Huffman tree as follows. Start with a forest of trees, each tree a single node containing a character from the text and its frequency (the character value is used only in the leaves of the resulting tree). Each step of the construction algorithm takes the two trees with the lowest frequency values (choosing arbitrarily if there are ties), and replaces them with a new tree formed by joining the two trees as the left and right subtrees of a new root node. The frequency value of the new root is the sum of the frequencies of the two subtrees. This procedure repeats until only one tree is left. An example of this is shown below, assuming we have a file with only 5 characters -- A, B, C, D and E -- with frequencies 10%, 14%, 31%, 25% and 20%, respectively.

[Figures p4122a.eps and p4122b.eps: step-by-step construction of the Huffman tree for the example frequencies.]

After you have constructed a Huffman tree, assign the Huffman codes to the characters as follows. Label each left branch of the tree with a 0 and each right branch with a 1. Reading down from the root to each character gives the Huffman code for that character. The tree above results in the following Huffman codes: A - 010, B - 011, C - 11, D - 10 and E - 00.

For the purpose of this problem, the tree with the lower frequency always becomes the left subtree of the new tree. If both trees have the same frequencies, either of the two trees can be chosen as the left subtree. Note that this means that for some frequency distributions, there are several valid Huffman encodings.

The same Huffman encoding can be obtained from several different frequency distributions: change 14% to 13% and 31% to 32%, and you still get the same tree and thus the same codes. Dan wants you to write a program to determine the total number of distinct ways you could get a given Huffman encoding, assuming that all percentages are positive integers. Note that two frequency distributions that differ only in the ordering of their percentages (for example 30% 70% for one distribution and 70% 30% for another) are not distinct.

Input 

The input consists of several test cases. Each test case consists of a single line starting with a positive integer n (2 ≤ n ≤ 20), which is the number of different characters in the compressed document, followed by n binary strings giving the Huffman encoding of each character. You may assume that these strings are indeed a Huffman encoding of some frequency distribution (though under our additional assumptions, it may still be the case that the answer is 0 -- see the last sample case below).

The last test case is followed by a line containing a single zero.

Output 

For each test case, print a line containing the test case number (beginning with 1) followed by the number of distinct frequency distributions that could result in the given Huffman codes.

Sample Input 

5 010 011 11 10 00 
8 00 010 011 10 1100 11010 11011 111 
8 1 01 001 0001 00001 000001 0000001 0000000 
0

Sample Output 

Case 1: 3035
Case 2: 11914
Case 3: 0





#include<stdio.h>
#include<stdlib.h>
#include<string.h>

/* Rebuild the code tree from the given codewords, then count, top-down, the
   ways to assign positive integer percentages to its nodes: the root gets
   100, and each internal node's value f is split into an ordered pair
   (j, f-j) with j >= f-j, subject to j not exceeding the last value
   generated so far.  Processing nodes in BFS order keeps the frontier of
   values non-increasing, which is exactly the condition for the greedy
   Huffman construction to produce this tree. */

const int maxn=22;
const int maxways=30000;
int n,i,j,new_tot,nodes,last,node,cases,ans[maxways][maxn],new_ans[maxways][maxn];
int l[maxn*2],r[maxn*2],step[maxn];	/* l/r: child links; step: BFS queue */
char a[maxn];

/* Replace entry `node` of every partial distribution by the two halves of
   its split; `nodes` is the index of the last entry in each row. */
void split(int node)
{
	int i,j,k,f;
	new_tot=0;
	for(i=0;ans[i][0];i++)	/* rows are terminated by a 0 sentinel */
	{
		f=ans[i][node];
		/* j is the larger half (j >= f-j >= 1); if f < 2 the range is
		   empty and the row simply dies, which yields answer 0 */
		for(j=f-1;j>(f-1)/2;j--)
			if(j<=ans[i][nodes])	/* may not exceed the last value generated */
			{
				for(k=0;k<nodes;k++)	/* copy the row, dropping entry `node` */
				{
					if(k>=node)
						new_ans[new_tot][k]=ans[i][k+1];
					else
						new_ans[new_tot][k]=ans[i][k];
				}
				new_ans[new_tot][k]=j;
				new_ans[new_tot++][k+1]=f-j;
			}
	}
	nodes++;
	memcpy(ans,new_ans,new_tot*maxn*sizeof(int));	/* don't assume 4-byte int */
	ans[new_tot][0]=0;	/* restore the sentinel */
}

int main()
{
	while(scanf("%d",&n)==1&&n)	/* check the return value so EOF also stops */
	{
		/* rebuild the tree; node 0 is the root, children are created in
		   pairs (Huffman codes always form a full binary tree) */
		memset(l,0,sizeof(l));
		memset(r,0,sizeof(r));
		last=0;
		for(i=0;i<n;i++)
		{
			scanf(" %21s",a);
			node=0;
			for(j=0;a[j];j++)
				if(a[j]=='0')
				{
					if(!l[node])
					{
						l[node]=++last;
						r[node]=++last;
					}
					node=l[node];
				}
				else
				{
					if(!r[node])
					{
						l[node]=++last;
						r[node]=++last;
					}
					node=r[node];
				}
		}
		/* BFS from the root, starting with the single distribution {100} */
		i=0;
		j=0;
		step[0]=0;
		node=0;	/* position of the current BFS node within each row */
		nodes=0;
		ans[0][0]=100;
		ans[1][0]=0;	/* sentinel */
		while(i<=j)
		{
			if(l[step[i]])	/* internal node: split its value */
			{
				split(node);
				step[++j]=r[step[i]];	/* larger half goes to the right child */
				step[++j]=l[step[i]];
			}
			else	/* leaf: its value is final; skip past it */
				node++;
			i++;
		}
		printf("Case %d: %d\n",++cases,new_tot);
	}
	return 0;
}

