Huffman coding.
The basic idea of Huffman coding is to encode symbols that appear with higher probability using fewer output bits. Since symbols may be represented by different numbers of bits, it is important that no symbol's code is a prefix of another symbol's code. For example, with codes a=0, b=10, c=11, the bit string 01011 splits unambiguously as 0|10|11 = abc.
Static Huffman Coding.
Create a list of nodes, one per symbol, sorted by probability. Each node's weight is set equal to its symbol's probability.
Start of loop:
1. Find and remove the two nodes with the smallest probabilities. Call these nodes A and B.
2. Create a new node with weight[node] = weight[A] + weight[B].
3. Assign the left and right children of the new node to A and B.
4. Insert the new node back into the sorted list.
5. Repeat the loop until the list consists of only the last node.
On each loop iteration in the process of creating the Huffman tree, it is possible to decide which child will be the left one and which the right one. In addition, if some symbols or sums of symbols have equal weights, any of them may be selected when the minimal-weight node is searched for. Therefore it is possible to create many different Huffman trees. Each tree is a valid Huffman tree and can therefore be used for compression.
Encoding Static Huffman Code.
Build Huffman Tree and calculate codes.
Start the encoding loop here.
o Read the next input symbol s.
o Find or calculate the code for symbol s - code[s].
o Output code[s].
o Continue the encoding loop.
Decoding Static Huffman Code.
Decoding is symmetric to encoding; nevertheless, here are the steps:
o Build the Huffman tree and/or calculate the codes.
Start the decoding loop here.
o Find a code corresponding to the given bit stream, starting from the current position.
Suppose the found code corresponds to symbol s. The length of code[s] is l bits.
o Output symbol s.
o Advance the current position in the input stream by l bits.
o Continue the decoding loop.
Create CodeBook with codes listed.
It is possible to maintain an array, hash map, list, tree, heap, or any other structure holding the code ready for use for each symbol. The key is the symbol.
We want this process to be fast, so we build a hash map that stores the code of each symbol.
General notes:
- The list of probabilities does not have to contain symbols that are not in use.
#include <iostream>
#include <string>
#include <fstream>
#include <cstring>   // for strlen
using namespace std;

ofstream outfile;
ifstream infile;

// One node of the Huffman tree; leaves carry a symbol in 'data'.
struct node
{
    int iweight;         // node weight: symbol frequency, or sum for internal nodes
    string code;         // bit string assigned to this node
    char data;           // the symbol itself (leaves only)
    node *left, *right;
};
// Two bubble passes over A[m..n]: afterwards the two smallest-weight
// nodes of the range sit at positions m and m+1.
void bubble(node A[], int m, int n)
{
    int count = 0;
    while (count < 2)
    {
        for (int i = n; i > m; i--)
        {
            if (A[i].iweight < A[i-1].iweight)
                swap(A[i], A[i-1]);
        }
        count++;
    }
}
int weight[256] = {0}, j = 0;   // symbol frequencies; j = number of distinct symbols
node A[512];                    // all tree nodes: leaves first, merged nodes appended
string Hash[256];               // codebook: Hash[symbol] = its bit string
// Consume nodes in pairs (j, j+1), appending each merged parent at A[count].
// With n leaves, the root ends up at A[2*n - 2].
void Create_Tree(node A[], int n)
{
    int j, count = n - 1;
    for (j = 0; j < count; j += 2)
    {
        if (count - j > 1)
            bubble(A, j, count);   // bring the two lightest nodes to j, j+1
        count++;
        A[count].iweight = A[j].iweight + A[j+1].iweight;
        A[count].left = &A[j];
        A[count].right = &A[j+1];
    }
}
// Assign '0' to each left edge and '1' to each right edge; a leaf's code
// is the concatenation of edge labels on the path from the root.
void Encoding(node* &T)
{
    if (T != NULL)
    {
        if (T->left != NULL)   // internal nodes always have both children
        {
            T->left->code = T->code + '0';
            T->right->code = T->code + '1';
        }
        else                   // leaf: record its code in the codebook
            Hash[(unsigned char)T->data] = T->code;
        Encoding(T->left);
        Encoding(T->right);
    }
}
// Walk the tree bit by bit; each time a leaf is reached, emit its symbol
// and restart from the root.
void Decoding(char code[])
{
    node *Root = &A[2*j - 2];
    node *p;
    int len = strlen(code);
    for (int i = 0; i < len; )
    {
        p = Root;
        while (p->right != NULL && i < len)
        {
            if (code[i] == '0')
                p = p->left;
            else
                p = p->right;
            i++;
        }
        outfile << p->data;
    }
}
int main()
{
    infile.open("in.txt");
    char ch[2000], code[10000];
    infile.read(ch, 1999);          // read the whole file (up to 1999 chars)
    ch[infile.gcount()] = '\0';
    infile.close();

    // Count symbol frequencies; the cast avoids negative indexes for chars >= 128.
    for (int i = 0; i < (int)strlen(ch); i++)
        weight[(unsigned char)ch[i]]++;

    // Create one leaf node per symbol actually present.
    for (int i = 0; i < 256; i++)
    {
        if (weight[i] != 0)
        {
            A[j].iweight = weight[i];
            A[j].data = i;
            j++;
        }
    }

    Create_Tree(A, j);
    node *p = &A[2*j - 2];          // root of the finished tree
    Encoding(p);

    // Write the encoded bit string (as '0'/'1' characters).
    outfile.open("out.txt");
    for (int i = 0; i < (int)strlen(ch); i++)
        outfile << Hash[(unsigned char)ch[i]];
    outfile.close();

    // Read the bits back and decode them.
    infile.open("out.txt");
    outfile.open("Decoding.txt");
    infile.get(code, 10000, EOF);   // marked
    Decoding(code);
    infile.close();
    outfile.close();
    return 0;
}