霍夫曼编码
原理
霍夫曼编码是依据每个字母出现的频率来选择编码长度。具体可通过最小堆实现,先将字母按照出现的频率升序排列得到minHeap
,然后将最小的两个结点node_left
(最小),node_right
(次小)合并成一个新结点node_top
(频次为两个结点之和);在minHeap
中删除node_left
,node_right
,同时添加node_top
,根据频次调整node_top
在minHeap
中的位置。直到minHeap
中只剩下一个元素。
假设要对we will we will r u
进行压缩编码。
按照上述构建带权的二叉树:
有了上面带权重的二叉树之后,我们就可以进行编码了。我们把二叉树分支中左边的支路编码为0,右边分支表示为1。
注:图片来源详细图解哈夫曼Huffman编码树.
代码实现
转自https://www.geeksforgeeks.org/huffman-coding-greedy-algo-3/。
// C++ program for Huffman Coding
#include<queue>
#include<iostream>
using namespace std;
// A Huffman tree node
struct MinHeapNode {
// One of the input characters
char data;
// Frequency of the character
unsigned freq;
// Left and right child
MinHeapNode* left, * right;
MinHeapNode(char data, unsigned freq)
{
left = right = NULL;
this->data = data;
this->freq = freq;
}
};
// For comparison of
// two heap nodes (needed in min heap)
struct compare {
bool operator()(MinHeapNode* l, MinHeapNode* r)
{
return (l->freq > r->freq);
}
};
// Prints huffman codes from
// the root of Huffman Tree.
void printCodes(struct MinHeapNode* root, string str)
{
if (!root)
return;
if (root->data != '$')
cout << root->data << ": " << str << "\n";
printCodes(root->left, str + "0");
printCodes(root->right, str + "1");
}
// The main function that builds a Huffman Tree and
// print codes by traversing the built Huffman Tree
void HuffmanCodes(char data[], int freq[], int size)
{
struct MinHeapNode* left, * right, * top;
// Create a min heap & inserts all characters of data[]
priority_queue<MinHeapNode*, vector<MinHeapNode*>, compare> minHeap;
for (int i = 0; i < size; ++i)
minHeap.push(new MinHeapNode(data[i], freq[i]));
// Iterate while size of heap doesn't become 1
while (minHeap.size() != 1) {
// Extract the two minimum
// freq items from min heap
left = minHeap.top();
minHeap.pop();
right = minHeap.top();
minHeap.pop();
// Create a new internal node with
// frequency equal to the sum of the
// two nodes frequencies. Make the
// two extracted node as left and right children
// of this new node. Add this node
// to the min heap '$' is a special value
// for internal nodes, not used
top = new MinHeapNode('$', left->freq + right->freq);
top->left = left;
top->right = right;
minHeap.push(top);
}
// Print Huffman codes using
// the Huffman tree built above
printCodes(minHeap.top(), "");
}
// Driver program to test above functions
int main()
{
char arr[] = { 'a', 'b', 'c', 'd', 'e', 'f' };
int freq[] = { 5, 9, 12, 13, 16, 45 };
int size = sizeof(arr) / sizeof(arr[0]);
string s1 = "abcdef";
vector<char> cha(s1.c_str(), s1.c_str() + s1.size() + 1);
HuffmanCodes(&cha[0], freq, size);
return 0;
}
// This code is contributed by Aditya Goel
f: 0
c: 100
d: 101
a: 1100
b: 1101
e: 111