Python的Huffman编码文件压缩实现
一、背景介绍:
Huffman编码是一种无损编码方式,采用Huffman树实现,具体可以见百度百科。
代码参考博客https://blog.csdn.net/EggWave/article/details/78697965。本篇将Huffman编码文件压缩脱离django框架,独立实现。
二、注意事项:
如果直接将Huffman编码写入文件,一股脑儿将0,1比特串怼进文件里,你会发现最后的压缩文件比没压缩之前的还大。因此,在输出文件之前,应该将8位比特串转化为一个字节,再逐个将字节写入文件中。
import six
#############################
#create time 2020.4.20
#定义一个huffman节点类
#############################
class Node:
def __init__(self,freq):
self.left = None
self.right = None
self.father = None
self.freq = freq
def is_left(self):
return self.father.left == self
#############################
#create time 2020.4.20
#节点生成函数
#############################
def create_nodes(frequencies):
return [Node(freq) for freq in frequencies]
#############################
#create time 2020.4.20
#huffman树生成函数
#############################
def create_huffman_tree(nodes):
queue = nodes[:]
while len(queue) > 1:
queue.sort(key=lambda item: item.freq)
node_left = queue.pop(0)
node_right = queue.pop(0)
node_father = Node(node_left.freq + node_right.freq)
node_father.left = node_left
node_father.right = node_right
node_left.father = node_father
node_right.father = node_father
queue.append(node_father)
queue[0].father = None
return queue[0]
###############################
#create time 2020.4.20
#根据huffman树产生huffman编码
###############################
def huffman_encoding(nodes, root):
codes = [''] * len(nodes)
for i in range(len(nodes)):
node_tmp = nodes[i]
while node_tmp != root:
if node_tmp.is_left():
codes[i] = '0' + codes[i]
else:
codes[i] = '1' + codes[i]
node_tmp = node_tmp.father
return codes
#############################################
#create time 2020.4.20
# 获取字符出现的频数
#############################################
def count_frequency(input_string):
# 用于存放字符
char_store = []
# 用于存放频数
freq_store = []
# 解析字符串
for index in range(len(input_string)):
if char_store.count(input_string[index]) > 0:
temp = int(freq_store[char_store.index(input_string[index])])
temp = temp + 1
freq_store[char_store.index(input_string[index])] = temp
else:
char_store.append(input_string[index])
freq_store.append(1)
# 返回字符列表和频数列表
return char_store, freq_store
############################
#create time 2020.4.20
# 获取字符、频数的列表
############################
def get_char_frequency(char_store=[], freq_store=[]):
# 用于存放char_frequency
char_frequency = []
for item in zip(char_store, freq_store):
temp = (item[0], item[1])
char_frequency.append(temp)
return char_frequency
################################
#create time 2020.4.20
#编码转换
################################
def write_file(code):
f=open("huffman_encoding.txt","wb")
out=0
while len(code)>8:
for x in range(8):
out=out<<1
if code[x]=="1":
out=out|1
code=code[8:]
f.write(six.int2byte(out))
out=0
f.write(six.int2byte(len(code)))
out=0
for i in range(len(code)):
out=out<<1
if code[i]=="1":
out=out|1
for i in range(8-len(code)):
out=out<<1
f.write(six.int2byte(out))
f.close()
return True
###############################
#create time 2020.4.20
# 将字符转换成huffman编码
###############################
def get_huffman_file(input_string, char_frequency, codes):
# 逐个字符替换
file_content = ''
for index in range(len(input_string)):
for item in zip(char_frequency, codes):
if input_string[index] == item[0][0]:
file_content = file_content + item[1]
return file_content
##################################
#create time 2020.4.20
#解码huffman编码文件
###################################
def decode_huffman(input_string, char_store, freq_store):
encode = ''
decode = ''
for index in range(len(input_string)):
encode = encode + input_string[index]
for item in zip(char_store, freq_store):
if encode == item[1]:
decode = decode + item[0]
encode = ''
return decode
fo=open("/home/ming/Desktop/图像处理/信息论大作业/text.txt","r+")#读取待压缩文件
input_string=fo.read()
fo.close()
char_store, freq_store=count_frequency(input_string)#字母出现次数统计
char_frequency=get_char_frequency(char_store,freq_store)#频段统计
nodes=create_nodes([i[1] for i in char_frequency])#节点生成
root= create_huffman_tree(nodes)#标记根结点
codes=huffman_encoding(nodes,root)#生成huffman树
save_file=get_huffman_file(input_string,char_frequency,codes)#根据生成的huffman树,生成huffman编码
write_file(save_file)#将01比特串按位写入文件
三、压缩结果:
待压缩文件大小为6.5kb,可以看如下图所示。
压缩后,压缩文件的大小为3.6kb,很明显,压缩效果显著。