[Knowledge Discovery] huffman, an open-source Python library for Huffman coding

fjssharpsword

Article link: https://blog.csdn.net/fjssharpsword/article/details/78109265

1. Huffman tree:

Install: pip install huffman

GitHub: https://github.com/nicktimko/huffman
PyPI: https://pypi.python.org/pypi/huffman
The source code is well worth reading.

2. Example:

# -*- coding: utf-8 -*-
'''
Created on 2017-09-26

@author: Administrator
'''
import huffman
import collections

# Build a codebook from explicit (symbol, weight) pairs
t1 = huffman.codebook([('A', 2), ('B', 4), ('C', 1), ('D', 1)])
print(t1)
# Build a codebook from the character frequencies counted in a string
t2 = huffman.codebook(collections.Counter('man the stand banana man').items())
print(t2)

Note: given an iterable of 2-tuples in (symbol, weight) format, huffman.codebook generates a Huffman codebook, returned as a dictionary in {symbol: code, ...} format.
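
To show what a {symbol: code} codebook is used for, here is a minimal sketch that encodes and decodes a string. The encode and decode helpers are hypothetical and not part of the huffman library. The codebook is hard-coded as an illustrative prefix-free assignment for the weights above, and the library's actual output may differ.

```python
# Illustrative codebook in the {symbol: code} shape described above.
# These particular codes are an assumption chosen to be prefix-free;
# they are not guaranteed to match what huffman.codebook returns.
codebook = {'A': '10', 'B': '0', 'C': '110', 'D': '111'}


def encode(text, codebook):
    """Concatenate the code of every symbol in text."""
    return ''.join(codebook[symbol] for symbol in text)


def decode(bits, codebook):
    """Read bits left to right, emitting a symbol whenever the buffer matches a code."""
    reverse = {code: symbol for symbol, code in codebook.items()}
    symbols, buffer = [], ''
    for bit in bits:
        buffer += bit
        if buffer in reverse:
            symbols.append(reverse[buffer])
            buffer = ''
    if buffer:
        raise ValueError('trailing bits do not form a complete code')
    return ''.join(symbols)


message = 'ABBACBDB'
bits = encode(message, codebook)
print(bits)
print(decode(bits, codebook))
```

Because no code is a prefix of another, the decoder can emit a symbol as soon as its buffer matches a code, with no separators between codes.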

3. Building a Huffman tree, for reference:

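# Build a Huffman tree
import heapq
import itertools

# huff_df is assumed to already exist; after transposing, each row is taken to be [frequency, node]
tiebreak = itertools.count()  # unique id so that equal weights never compare node payloads (TypeError in Python 3)
trees = [(freq, next(tiebreak), node) for freq, node in huff_df.values.T.tolist()]  # DataFrame to list
heapq.heapify(trees)
while len(trees) > 1:
    # Pop the two lowest-weight nodes and merge them under a new parent
    rightChild, leftChild = heapq.heappop(trees), heapq.heappop(trees)
    parentNode = (leftChild[0] + rightChild[0], next(tiebreak), leftChild, rightChild)
    heapq.heappush(trees, parentNode)
print(trees)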

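huff_df is a DataFrame. Once transposed and converted to a list, each entry holds a node and its frequency, with the frequency first, since the code reads the weight from index 0.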
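
The tree alone does not yet give the codes. As a minimal sketch of the remaining step, the example below builds a tree with heapq from a plain list of (symbol, weight) pairs instead of a DataFrame, then walks it to assign '0' to left branches and '1' to right branches. The names build_tree and assign_codes and the tuple layout of the nodes are assumptions made for illustration, not the huffman library's implementation.

```python
import heapq
import itertools


def build_tree(weights):
    """Build a Huffman tree from (symbol, weight) pairs.

    Leaves are (weight, id, symbol); internal nodes are (weight, id, left, right).
    The unique id breaks ties so tuples never compare symbols or subtrees.
    """
    counter = itertools.count()
    heap = [(weight, next(counter), symbol) for symbol, weight in weights]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two lowest-weight nodes under a new parent
        right, left = heapq.heappop(heap), heapq.heappop(heap)
        heapq.heappush(heap, (left[0] + right[0], next(counter), left, right))
    return heap[0]


def assign_codes(node, prefix='', codes=None):
    """Walk the tree, appending '0' for left branches and '1' for right branches."""
    codes = {} if codes is None else codes
    if len(node) == 3:
        # Leaf: a single-symbol tree would have an empty prefix, so fall back to '0'
        codes[node[2]] = prefix or '0'
    else:
        assign_codes(node[2], prefix + '0', codes)
        assign_codes(node[3], prefix + '1', codes)
    return codes


weights = [('A', 2), ('B', 4), ('C', 1), ('D', 1)]
codes = assign_codes(build_tree(weights))
print(codes)
```

The exact bit patterns depend on how ties are broken and on which child is labelled '0'. The code lengths are what make the result optimal: the heaviest symbol gets the shortest code.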