Data Structures, 05-Tree 9: Huffman Codes

Problem

In 1953, David A. Huffman published his paper “A Method for the Construction of Minimum-Redundancy Codes”, and hence printed his name in the history of computer science. As a professor who gives the final exam problem on Huffman codes, I am encountering a big problem: the Huffman codes are NOT unique. For example, given a string “aaaxuaxz”, we can observe that the frequencies of the characters ‘a’, ‘x’, ‘u’ and ‘z’ are 4, 2, 1 and 1, respectively. We may either encode the symbols as {‘a’=0, ‘x’=10, ‘u’=110, ‘z’=111}, or in another way as {‘a’=1, ‘x’=01, ‘u’=001, ‘z’=000}, both compress the string into 14 bits. Another set of code can be given as {‘a’=0, ‘x’=11, ‘u’=100, ‘z’=101}, but {‘a’=0, ‘x’=01, ‘u’=011, ‘z’=001} is NOT correct since “aaaxuaxz” and “aazuaxax” can both be decoded from the code 00001011001001. The students are submitting all kinds of codes, and I need a computer program to help me determine which ones are correct and which ones are not.

Input Specification:
Each input file contains one test case. For each case, the first line gives an integer N (2≤N≤63), then followed by a line that contains all the N distinct characters and their frequencies in the following format:

c[1] f[1] c[2] f[2] … c[N] f[N]

where c[i] is a character chosen from {‘0’ - ‘9’, ‘a’ - ‘z’, ‘A’ - ‘Z’, ‘_’}, and f[i] is the frequency of c[i] and is an integer no more than 1000. The next line gives a positive integer M (≤1000), then followed by M student submissions. Each student submission consists of N lines, each in the format:

c[i] code[i]

where c[i] is the i-th character and code[i] is a non-empty string of no more than 63 '0's and '1's.

Output Specification:
For each test case, print in each line either “Yes” if the student’s submission is correct, or “No” if not.

Note: The optimal solution is not necessarily generated by Huffman algorithm. Any prefix code with code length being optimal is considered correct.

Sample Input:

7
A 1 B 1 C 1 D 3 E 3 F 6 G 6
4
A 00000
B 00001
C 0001
D 001
E 01
F 10
G 11
A 01010
B 01011
C 0100
D 011
E 10
F 11
G 00
A 000
B 001
C 010
D 011
E 100
F 101
G 110
A 00000
B 00001
C 0001
D 001
E 00
F 10
G 11

Sample Output:

Yes
Yes
No
No

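Analysis

The task, roughly: given the characters and their frequencies, decide whether each submitted set of codes is an optimal encoding, that is, a prefix code whose total encoded length is minimal.

I spent a whole evening fixing rejected submissions, until I noticed that the expected output is "Yes" and "No", not "yes" and "no"…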
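The optimal length that every submission is measured against can be obtained without keeping an explicit tree. The following is a minimal sketch, assuming the frequencies are already in a vector; `optimalCost` is a hypothetical name that does not appear in the listing further down, and it uses `std::priority_queue` instead of a hand-written heap.

```cpp
#include <functional>
#include <queue>
#include <vector>

// Minimum total encoded length (weighted path length, WPL) for the
// given frequencies. Every merge of the two lightest subtrees pushes
// all leaves below them one level deeper, so the sum of the merged
// weights over all merges equals the WPL of the resulting tree.
long long optimalCost(const std::vector<int>& freqs) {
    std::priority_queue<long long, std::vector<long long>,
                        std::greater<long long>>
        pq(freqs.begin(), freqs.end());  // min-heap of subtree weights
    long long total = 0;
    while (pq.size() > 1) {
        long long a = pq.top(); pq.pop();
        long long b = pq.top(); pq.pop();
        total += a + b;   // cost contributed by this merge
        pq.push(a + b);   // merged subtree goes back into the heap
    }
    return total;
}
```

For the sample frequencies 1, 1, 1, 3, 3, 6, 6 this gives 53, and for the "aaaxuaxz" example (4, 2, 1, 1) it gives the 14 bits quoted in the problem statement.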

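Solution 1

Pure simulation: build the Huffman tree, then build a tree from each student's submission, and finally decide whether the submission is optimal.
By the book, I wrote my own min-heap to hold Huffman trees: heap insertion, heap deletion, heap initialization, and construction of the Huffman tree.
Then the main part: for each code, an input character '0' creates (or follows) the left child and '1' the right child, and the character's weight is stored at the node where its code string ends. Once a whole submission has been read this yields a tree, which is then checked for whether its node count and WPL (weighted path length) match those of the optimal Huffman tree.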
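The per-submission check described above can also be sketched compactly. This is only an illustration under stated assumptions, not the listing that follows: `isPrefixFree` and `isCorrect` are hypothetical names, the trie is stored in a vector rather than as linked tree nodes, and the optimal cost is assumed to have been computed beforehand and passed in. It tests the prefix property directly instead of counting nodes.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

using Submission = std::vector<std::pair<char, std::string>>;

// True when no code is a prefix of (or equal to) another code.
bool isPrefixFree(const Submission& codes) {
    struct Node { int child[2]; bool leaf; };
    std::vector<Node> trie(1, Node{{-1, -1}, false});  // index 0 is the root
    for (const auto& entry : codes) {
        int cur = 0;
        for (char bit : entry.second) {
            if (trie[cur].leaf) return false;  // an earlier code is a prefix of this one
            int b = bit - '0';                 // '0' goes left, '1' goes right
            if (trie[cur].child[b] == -1) {
                trie[cur].child[b] = static_cast<int>(trie.size());
                trie.push_back(Node{{-1, -1}, false});
            }
            cur = trie[cur].child[b];
        }
        // Ending on an existing leaf means a duplicate code; ending on a
        // node with children means this code is a prefix of an earlier one.
        if (trie[cur].leaf || trie[cur].child[0] != -1 || trie[cur].child[1] != -1)
            return false;
        trie[cur].leaf = true;
    }
    return true;
}

// A submission is accepted when its total length equals the optimal
// cost and its codes form a prefix code.
bool isCorrect(const std::map<char, int>& freq, const Submission& codes,
               long long optimalCost) {
    long long cost = 0;
    for (const auto& entry : codes) {
        auto it = freq.find(entry.first);
        if (it == freq.end()) return false;  // character not in the input
        cost += static_cast<long long>(it->second) *
                static_cast<long long>(entry.second.size());
    }
    return cost == optimalCost && isPrefixFree(codes);
}
```

On the sample input (optimal cost 53) the third submission fails the cost test with 63, and the fourth fails the prefix test because "00" for E is a prefix of "001" for D.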

#include<cstdio>
#include<cstdlib>
#include<string>
#include<iostream>
#include<map>
#define HeapCapacity 64
#define MinData 0
typedef struct TreeNode *HuffmanTree;
typedef struct Heap *MinHeap;
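struct Heap{
	// Min-heap
	HuffmanTree *data;  // array of Huffman tree pointers
	int size; // current number of elements in the heap
};
struct TreeNode{
	// Huffman tree node
	int weight;  // frequency
	HuffmanTree left; 
	HuffmanTree right; 
};
using namespace std;

MinHeap createHeap();   // allocate an empty heap
HuffmanTree createHuffman();  // allocate a Huffman tree node
void sortHeap(MinHeap H,int i); // sift down the sub-heap rooted at index i
void adjust(MinHeap H);  // restore the heap property over the whole heap
MinHeap InitHeap(int n); // initialize the heap with n input nodes
HuffmanTree Delete(MinHeap H); // remove and return the minimum element
void Insert(MinHeap H,HuffmanTree Huff); // insert a tree into the heap
HuffmanTree Huffman(MinHeap H);  // build the Huffman tree
int WPL(HuffmanTree Huff,int depth); // weighted path length (total encoded length) of a Huffman tree


map<char,int> mappp;  // maps each character to its frequency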

// Allocate an empty heap
MinHeap createHeap(){
	MinHeap H;
	H = (MinHeap)malloc(sizeof(struct Heap));
	// The array holds pointers to nodes, so size it by the pointer type
	H->data = (HuffmanTree *)malloc(sizeof(HuffmanTree) * HeapCapacity);
	H->size = 0; 
	// Sentinel at index 0 with the smallest possible weight
	HuffmanTree Huff = createHuffman();
	H->data[0] = Huff; 
	return H;
}

// Allocate a Huffman tree node
HuffmanTree createHuffman(){
	HuffmanTree Huff;
	Huff = (HuffmanTree)malloc(sizeof(struct TreeNode));
	Huff->weight = MinData;   // start with the minimum frequency
	Huff->left = NULL;
	Huff->right = NULL;
	return Huff;
}

// Sift down the sub-heap rooted at index i
void sortHeap(MinHeap H
To answer this question, here are the basic concept of a Huffman tree, how it is built, and how it is used for encoding. A Huffman tree is a special binary tree whose leaf nodes correspond to the characters to be encoded, while its internal nodes carry only combined weights. It is built as follows:

1. Count how often each character occurs in the file.
2. Turn each character and its frequency into a node; together these nodes form a forest.
3. Pick the two nodes with the smallest weights from the forest and merge them into a new node whose weight is the sum of the two. The new node becomes the root of a tree with the two original nodes as its left and right children.
4. Insert the new node into the forest and repeat step 3 until only one tree is left. That tree is the Huffman tree.

Once the tree is built it can be used for encoding:

1. Start at the root of the Huffman tree and walk toward the leaf of the character to encode, appending 0 for each step to the left and 1 for each step to the right.
2. When the leaf is reached, the bits collected along the path are that character's code.

Below is an example implementation in Python, assuming the file to encode is named input.txt and the encoded file is named output.txt:

```python
import heapq


class HuffmanNode:
    def __init__(self, char, freq):
        self.char = char  # None for internal nodes
        self.freq = freq
        self.left = None
        self.right = None

    def __lt__(self, other):
        # heapq orders nodes by frequency
        return self.freq < other.freq


class HuffmanTree:
    def __init__(self, text):
        self.text = text
        self.freq = {}
        self.heap = []
        self.codes = {}
        self.reverse_codes = {}

    def count_freq(self):
        for char in self.text:
            self.freq[char] = self.freq.get(char, 0) + 1

    def make_heap(self):
        for char, freq in self.freq.items():
            heapq.heappush(self.heap, HuffmanNode(char, freq))

    def merge_nodes(self):
        # Repeatedly merge the two lightest trees until one remains
        while len(self.heap) > 1:
            node1 = heapq.heappop(self.heap)
            node2 = heapq.heappop(self.heap)
            merged = HuffmanNode(None, node1.freq + node2.freq)
            merged.left = node1
            merged.right = node2
            heapq.heappush(self.heap, merged)

    def make_codes_helper(self, root, current_code):
        if root is None:
            return
        if root.char is not None:
            # A text with a single distinct character yields a lone leaf;
            # give it the code "0" instead of an empty string
            code = current_code or "0"
            self.codes[root.char] = code
            self.reverse_codes[code] = root.char
            return
        self.make_codes_helper(root.left, current_code + "0")
        self.make_codes_helper(root.right, current_code + "1")

    def make_codes(self):
        root = heapq.heappop(self.heap)
        self.make_codes_helper(root, "")

    def get_encoded_text(self):
        return "".join(self.codes[char] for char in self.text)

    def pad_encoded_text(self, encoded_text):
        # extra_padding is always in 1..8; it is stored in the first byte
        extra_padding = 8 - len(encoded_text) % 8
        encoded_text += "0" * extra_padding
        padded_info = "{0:08b}".format(extra_padding)
        return padded_info + encoded_text

    def get_byte_array(self, padded_encoded_text):
        if len(padded_encoded_text) % 8 != 0:
            raise ValueError("Encoded text not padded properly")
        b = bytearray()
        for i in range(0, len(padded_encoded_text), 8):
            b.append(int(padded_encoded_text[i:i + 8], 2))
        return b

    def compress(self):
        self.count_freq()
        self.make_heap()
        self.merge_nodes()
        self.make_codes()
        encoded_text = self.get_encoded_text()
        padded_encoded_text = self.pad_encoded_text(encoded_text)
        b = self.get_byte_array(padded_encoded_text)
        with open("output.txt", "wb") as output:
            output.write(bytes(b))
        print("Compressed")
        return "output.txt"

    def remove_padding(self, padded_encoded_text):
        extra_padding = int(padded_encoded_text[:8], 2)
        padded_encoded_text = padded_encoded_text[8:]
        return padded_encoded_text[:-extra_padding]

    def decode_text(self, encoded_text):
        current_code = ""
        decoded_text = ""
        for bit in encoded_text:
            current_code += bit
            if current_code in self.reverse_codes:
                decoded_text += self.reverse_codes[current_code]
                current_code = ""
        return decoded_text

    def decompress(self, input_path):
        # Relies on the code table built by compress() on this same object
        with open(input_path, "rb") as file:
            bit_string = "".join(format(byte, "08b") for byte in file.read())
        encoded_text = self.remove_padding(bit_string)
        decompressed_text = self.decode_text(encoded_text)
        with open("decompressed.txt", "w") as output:
            output.write(decompressed_text)
        print("Decompressed")
        return "decompressed.txt"


if __name__ == "__main__":
    with open("input.txt", "r") as f:
        text = f.read()
    tree = HuffmanTree(text)
    compressed_file_path = tree.compress()
    decompressed_file_path = tree.decompress(compressed_file_path)
```