Huffman Coding



</pre><p>Before defining the Huffman tree, here are a few related concepts:</p>
<blockquote style="margin: 0 0 0 40px; border: none; padding: 0px;">
<p><u>Path</u>:</p>
<p>The branches leading from one node of a tree to another node form the path between those two nodes.</p>
<p><u>Path length</u>:</p>
<p>The number of branches on a path.</p>
<p><u>Path length of a tree</u>:</p>
<p>The sum of the path lengths from the root to every node.</p>
<p><u>Weighted path length of a node</u>:</p>
<p>If a node of a tree carries a weight, the product of the node's path length (measured from the root) and its weight is called the node's weighted path length.</p>
</blockquote>
<p><strong><span style="font-size:18px;">What is a weight? (from Baidu Baike)</span></strong></p>
<p><strong>In computer science (<a href="http://baike.baidu.com/view/9900.htm" target="_blank">data structures</a>)</strong></p>
<p>A weight is a value attached to a path and can be thought of as a distance between nodes. In coding, it usually refers to the probability with which the character behind a binary code occurs.</p>
<p>In a Huffman tree, a larger weight means a higher probability of occurrence.</p>
<p>The weight of a node is in effect the share of the whole tree that the node's subtree accounts for.</p>
<p>Suppose the four <a href="http://baike.baidu.com/view/2335663.htm" target="_blank">leaf nodes</a> a, b, c, d have weights 7, 5, 2, 4. These values come from real data, for example the number of times the letters a, b, c, d occur in a piece of text. Saying that node a has weight 7 means that a carries a share of 7 in the system. The weights could be converted to percentages, but that is more cumbersome and changes nothing.</p>
<p style="text-align: center;"><strong>For a binary tree with n weighted leaf nodes, the weighted path length (WPL) of the tree is WPL = Σ<sub>k=1</sub><sup>n</sup> W<sub>k</sub>L<sub>k</sub>:</strong><img src="https://img-blog.csdn.net/20140802171506353?watermark/2/text/aHR0cDovL2Jsb2cuY3Nkbi5uZXQvd29sb25nemh1bWVuZw==/font/5a6L5L2T/fontsize/400/fill/I0JBQkFCMA==/dissolve/70/gravity/Center" alt="" /></p>
<p style="text-align: center;">In the formula, W<sub>k</sub> is the weight of the k-th leaf node and L<sub>k</sub> is the path length of that node.</p>
<p style="text-align: left;">Example</p>
<p style="text-align: left;"><img src="http://hi.csdn.net/attachment/201203/11/0_13314321000EIR.gif" alt="" /></p>
<div><strong>In general, a binary tree is built from n (n > 0) weighted leaves, with the restriction that every node other than these n leaves has degree 2.</strong></div>
<div><strong>Many binary trees usually satisfy this condition. The one with the smallest weighted path length is called the</strong> <span style="color: rgb(0, 0, 153); "><strong>Huffman tree</strong></span> <strong>or</strong> <span style="color: rgb(0, 0, 153); "><strong>optimal binary tree</strong></span><strong>.</strong></div>
<div>
<p><strong><span style="font-size: 24px; ">II. Constructing a Huffman Tree</span></strong></p>
<div><strong>By the definition of the Huffman tree, minimizing the WPL of a binary tree requires that leaves with larger weights lie closer to the root and leaves with smaller weights lie farther from it.</strong></div>
<div>Based on this observation, Huffman proposed a method for constructing an optimal binary tree. Its basic idea is as follows:</div>
<img src="http://hi.csdn.net/attachment/201203/11/0_1331433194xuvE.gif" alt="" /></div>
<div><strong>The following figure shows the process of building a Huffman tree with the Huffman algorithm:</strong></div>
<div><img src="http://hi.csdn.net/attachment/201203/11/0_1331433273zW25.gif" alt="" /></div>
<div style="font-size: 14px; line-height: 26px; ">
<p><strong><span style="font-size: 24px; "><span style="font-family:Microsoft YaHei;">III. Applying the Huffman Tree to Coding</span></span></strong></p>
<div><strong>When a message is transmitted, every character in it has to be encoded in binary. A code should be designed according to two principles:</strong></div>
<div><strong>(1) The binary code sent by the sender must decode uniquely at the receiver, so that the decoded result is identical to the message that was sent;</strong></div>
<div><strong>(2) The binary code sent should be as short as possible. Two ways of encoding are described below.</strong></div>
<div> </div>
<div><strong>1. Fixed-length coding</strong></div>
<div><strong>In this scheme every character has a code of the same length (the code length is the number of bits in each code). Suppose the character set contains only the four characters A, B, C, D, encoded with two bits as 00, 01, 10, 11. For the message ABACCDA, the binary sequence 00010010101100 is sent, 14 bits in total. The receiver decodes it two bits at a time. Decoding is simple and unique, but the code is not the shortest possible.</strong></div>
<div> </div>
<div><strong>2. Variable-length coding</strong></div>
<div><strong>To keep the number of bits as small as possible, the codes can have different lengths: frequently used characters get relatively short codes and rarely used characters get longer ones. For example, A, B, C, D could be assigned 0, 00, 1, 01, and the message above would be sent as the binary sequence 000011010, which is only 9 bits long. This creates a problem, however: the receiver cannot decode the message, because it cannot tell whether the leading four 0s stand for four A's, one B and two A's, or two B's. Decoding is not unique, so this code cannot be used.</strong></div>
<div> </div>
<div><strong>Therefore, a variable-length code that shortens the message must also guarantee <span style="background-color: rgb(51, 204, 0); ">uniqueness</span>: no character's code may be a prefix of another character's code. Such a code is called a prefix code.</strong></div>
<div> </div>
<div><strong>A Huffman tree yields a prefix code as follows:</strong></div>
<div><strong>(1) Build a Huffman tree using the frequency of each character in the character set as its weight;</strong></div>
<div><strong>(2) Starting from the root, label every left branch on the path to a leaf with 0 and every right branch with 1. The labels read from the root to a leaf form the code of that leaf.</strong></div>
<div> </div>
<div><strong>Example:</strong></div>
<div>Suppose a text file TFile contains only the 7 characters {A,B,C,D,E,F,G}, which occur {5,24,7,17,34,5,13} times in the text.</div>
<div>A Huffman tree can be used to build a variable-length code for TFile that satisfies the prefix-code requirement.</div>
<div> </div>
<div>Procedure:</div>
<div> </div>
<div>1. Make each of the 7 characters in TFile a leaf node, with the number of occurrences of the character as the weight of the leaf.</div>
<div>2. Let every left branch of the Huffman tree represent the bit 0 and every right branch the bit 1. The sequence of bits on the branches from the root to a leaf is the code of the character at that leaf.</div>
<div>3. Because the path from the root to a leaf can never pass through another leaf, this code is always a prefix code, and the weighted path length of the Huffman tree is exactly the total length of the encoded TFile.</div>
<div> </div>
<div>A code built from a Huffman tree is called a <strong>Huffman code</strong>.</div>
<img src="https://img-blog.csdn.net/20140802174742796?watermark/2/text/aHR0cDovL2Jsb2cuY3Nkbi5uZXQvd29sb25nemh1bWVuZw==/font/5a6L5L2T/fontsize/400/fill/I0JBQkFCMA==/dissolve/70/gravity/SouthEast" alt="" /></div>
<div style="font-family: Arial; font-size: 14px; line-height: 26px; ">
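<p>To make the weighted path length concrete, here is a minimal sketch that evaluates WPL = Σ W<sub>k</sub>L<sub>k</sub> for the leaves a, b, c, d with weights 7, 5, 2, 4. The function name <code>weightedPathLength</code> and the leaf depths used in the note below are assumptions chosen for illustration; they are not taken from the figures in this article.</p>

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of the original program):
// WPL = sum over all leaves of weight[k] * depth[k], where depth[k] is the
// number of branches on the path from the root to leaf k.
long long weightedPathLength(const std::vector<long long>& weight,
                             const std::vector<int>& depth) {
    long long wpl = 0;
    for (std::size_t k = 0; k < weight.size() && k < depth.size(); ++k)
        wpl += weight[k] * depth[k];
    return wpl;
}
```

<p>With all four leaves at depth 2, <code>weightedPathLength({7, 5, 2, 4}, {2, 2, 2, 2})</code> gives 36. Placing a at depth 1, b at depth 2, and c and d at depth 3 gives 35, so putting the heavier leaves nearer the root lowers the WPL.</p>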

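<p>The prefix condition can be checked mechanically. The sketch below is an illustration rather than part of the original program: <code>isPrefixFree</code> and <code>decode</code> are hypothetical helper names, and <code>decode</code> assumes the table it is given maps each codeword of a prefix code to its character.</p>

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Returns true if no codeword is a prefix of another codeword.
bool isPrefixFree(const std::vector<std::string>& codes) {
    for (std::size_t i = 0; i < codes.size(); ++i)
        for (std::size_t j = 0; j < codes.size(); ++j)
            if (i != j && codes[j].compare(0, codes[i].size(), codes[i]) == 0)
                return false;  // codes[i] is a prefix of codes[j]
    return true;
}

// Decodes a bit string by matching codewords from left to right.
// Returns an empty string if the bits do not end on a codeword boundary.
std::string decode(const std::map<std::string, char>& table,
                   const std::string& bits) {
    std::string out, current;
    for (char bit : bits) {
        current += bit;
        auto it = table.find(current);
        if (it != table.end()) {  // a complete codeword has been read
            out += it->second;
            current.clear();
        }
    }
    return current.empty() ? out : std::string();
}
```

<p>The fixed-length code 00, 01, 10, 11 passes the check, and decoding 00010010101100 with it yields ABACCDA. The code 0, 00, 1, 01 fails the check because 0 is a prefix of both 00 and 01.</p>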

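<p>For the TFile example, the code length of each character can be obtained by repeatedly merging the two lightest nodes. The following sketch uses a min-heap from the standard library instead of the array scan in the program below; the names <code>huffmanCodeLengths</code> and <code>encodedBits</code> are assumptions made for this illustration. When weights tie, individual code lengths may differ from those in the figure, but the total encoded length does not.</p>

```cpp
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical helper: returns the Huffman code length of every symbol.
std::vector<int> huffmanCodeLengths(const std::vector<long long>& freq) {
    const int n = static_cast<int>(freq.size());
    std::vector<int> parent(n, -1);          // parent index, -1 for a root
    using Item = std::pair<long long, int>;  // (weight, node index)
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> heap;
    for (int i = 0; i < n; ++i) heap.push({freq[i], i});
    while (heap.size() > 1) {
        Item a = heap.top(); heap.pop();     // lightest node
        Item b = heap.top(); heap.pop();     // second lightest node
        const int id = static_cast<int>(parent.size());
        parent.push_back(-1);                // new internal node
        parent[a.second] = id;
        parent[b.second] = id;
        heap.push({a.first + b.first, id});
    }
    std::vector<int> length(n, 0);
    for (int i = 0; i < n; ++i)              // depth of a leaf = code length
        for (int p = parent[i]; p != -1; p = parent[p]) ++length[i];
    return length;
}

// Total number of bits needed: sum of freq[k] * length[k], i.e. the WPL.
long long encodedBits(const std::vector<long long>& freq,
                      const std::vector<int>& length) {
    long long bits = 0;
    for (std::size_t k = 0; k < freq.size() && k < length.size(); ++k)
        bits += freq[k] * length[k];
    return bits;
}
```

<p>For the frequencies {5,24,7,17,34,5,13}, the Huffman code needs 267 bits in total, whereas a fixed-length code with 3 bits per character needs 3 × 105 = 315 bits.</p>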
<pre name="code" class="cpp">#include <cstdio>
#include <cstring>

#define N 10          // maximum number of characters to encode (leaf nodes)
#define M (2 * N - 1) // maximum number of nodes in the tree

class HTNode { // a tree node; index 0 means "no node"
public:
    unsigned int weight;
    unsigned int parent, lchild, rchild;
};

class HTCode {
public:
    char data;    // character to encode
    int weight;   // weight of the character
    char code[N]; // code of the character (at most N-1 bits plus '\0')
};

// Reads the number of characters n, then n characters and n weights.
// Returns false if the input is malformed or n is outside 1..N.
bool Init(HTCode hc[], int *n) {
    printf("input n = ");
    if (scanf("%d", n) != 1 || *n < 1 || *n > N) return false;
    printf("\ninput %d character\n", *n);
    for (int i = 1; i <= *n; ++i) // the leading space skips whitespace and newlines
        if (scanf(" %c", &hc[i].data) != 1) return false;
    printf("\ninput %d weight\n", *n);
    for (int i = 1; i <= *n; ++i)
        if (scanf("%d", &hc[i].weight) != 1 || hc[i].weight < 0) return false;
    return true;
}

// Among ht[1..k], selects the two nodes whose parent is 0 and whose weights
// are smallest; their indices are returned through s1 and s2.
void Select(HTNode ht[], int k, int *s1, int *s2) {
    int i;
    for (i = 1; i <= k && ht[i].parent != 0; ++i) {}
    *s1 = i;
    for (i = 1; i <= k; ++i)
        if (ht[i].parent == 0 && ht[i].weight < ht[*s1].weight) *s1 = i;
    for (i = 1; i <= k; ++i)
        if (ht[i].parent == 0 && i != *s1) break;
    *s2 = i;
    for (i = 1; i <= k; ++i)
        if (ht[i].parent == 0 && i != *s1 && ht[i].weight < ht[*s2].weight) *s2 = i;
}

// Builds the Huffman tree ht and derives the codes of the n characters.
void HuffmanCoding(HTNode ht[], HTCode hc[], int n) {
    char cd[N];
    int i, s1, s2, start;
    unsigned int c, f;
    int m = 2 * n - 1;
    for (i = 1; i <= m; ++i) {
        ht[i].weight = (i <= n) ? hc[i].weight : 0; // internal nodes start at 0
        ht[i].parent = ht[i].lchild = ht[i].rchild = 0;
    }
    for (i = n + 1; i <= m; ++i) { // merge the two lightest roots into node i
        Select(ht, i - 1, &s1, &s2);
        ht[s1].parent = i;
        ht[s2].parent = i;
        ht[i].lchild = s1;
        ht[i].rchild = s2;
        ht[i].weight = ht[s1].weight + ht[s2].weight;
    }
    cd[n - 1] = '\0';
    for (i = 1; i <= n; ++i) { // walk from each leaf up to the root
        start = n - 1;
        for (c = i, f = ht[i].parent; f; c = f, f = ht[f].parent)
            cd[--start] = (ht[f].lchild == c) ? '0' : '1';
        strcpy(hc[i].code, &cd[start]);
    }
}

int main() {
    int n;
    HTNode ht[M + 1];
    HTCode hc[N + 1];
    if (!Init(hc, &n)) { // read the characters and weights
        printf("\ninvalid input\n");
        return 1;
    }
    HuffmanCoding(ht, hc, n); // build the Huffman tree and the codes
    for (int i = 1; i <= n; ++i)
        printf("\n%c---%s", hc[i].data, hc[i].code);
    printf("\n");
    return 0;
}
</pre>

 





