Implementing File Compression and Decompression with the Huffman Algorithm in Python

Latest recommended article published 2024-05-11 10:30:00

LeafCC

License: CC 4.0 BY-SA

Tags: Python, Huffman algorithm, file compression, file decompression

Article link: https://blog.csdn.net/cc815107613/article/details/103260408

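The code is commented in detail, so I won't explain much more here. Note that .docx files compress poorly because the format is already compressed (a .docx is a ZIP archive), whereas .txt files show a clear reduction in size.
GitHub repository: https://github.com/LeafCCC/HuffmanFileCompression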
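
To make the note about .docx versus .txt concrete, here is a minimal sketch that is separate from the program below. It estimates how many bits a byte-level Huffman code needs for plain text and for the same text after `zlib` compression, with the latter standing in for an already-compressed format. The helper `huffman_bits`, the word list, and the sample size are all assumptions made for illustration; the estimate also ignores the space needed to store the code table.

```python
import heapq
import random
import zlib
from collections import Counter


def huffman_bits(data: bytes) -> int:
    """Bits needed to Huffman-code `data`, ignoring the stored code table."""
    counts = Counter(data)
    if len(counts) <= 1:
        return len(data)  # assumption: a lone symbol still costs 1 bit per byte
    heap = list(counts.values())
    heapq.heapify(heap)
    total = 0
    while len(heap) > 1:
        # Merge the two lightest subtrees, as in Huffman tree construction.
        merged = heapq.heappop(heap) + heapq.heappop(heap)
        total += merged  # each merge adds one bit to every byte beneath it
        heapq.heappush(heap, merged)
    return total


# Hypothetical sample data: random words drawn from a small vocabulary.
random.seed(0)
words = ["huffman", "tree", "node", "byte", "code", "file", "compress", "weight",
         "leaf", "root", "left", "right", "count", "table", "bits", "data"]
text = " ".join(random.choice(words) for _ in range(20000)).encode("ascii")
packed = zlib.compress(text)  # stands in for an already-compressed file

ratio_text = huffman_bits(text) / (8 * len(text))
ratio_packed = huffman_bits(packed) / (8 * len(packed))
print(f"plain text:  {ratio_text:.2f} of the original size")
print(f"zlib output: {ratio_packed:.2f} of the original size")
```

The plain text uses only a couple of dozen distinct byte values with uneven frequencies, so short codes pay off. The compressed bytes are spread almost evenly across all 256 values, which leaves a Huffman code very little to exploit.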

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

import sys
sys.setrecursionlimit(1000000)	  # Python's default recursion limit is low (1000 by default); the author found compression exceeds it, so sys is used to raise the limit

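# Lists and dictionaries used in the code
# bytes_list: the bytes read from the input file
# count_dict: key = distinct byte, value = number of occurrences
# node_dict: key = distinct byte, value = its tree node
# bytes_dict: key = distinct byte, value = its Huffman code
# nodes: holds the nodes while the tree is built; only the root remains afterwards

# Class definition for a Huffman tree node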
class node(object):
    def __init__(self,value=None,left=None,right=None,father=None):
        self.value=value
        self.left=left
        self.right=right
        self.father=father  # the fields above hold the node's weight, left and right children, and parent

# Build the Huffman tree
def creat_tree(nodes_list):
    nodes_list.sort(key=lambda x: x.value)      # sort the nodes by weight, ascending
    if len(nodes_list)==1:
        return nodes_list[0]        # a single remaining node is the root
    # create the parent of the two lowest-weight nodes
    father_node=node(nodes_list[0].value+nodes_list[1].value,nodes_list[0],nodes_list[1])
    nodes_list[0].father=nodes_list[1].father=father_node
    nodes_list.pop(0)
    nodes_list.pop(0)
    nodes_list.insert(0,father_node)   # the two lowest-weight nodes are removed and their parent is added
    return creat_tree(nodes_list)

def node_encode(node1):            # encode a leaf by walking up to the root: b'0' for a left child, b'1' for a right child
    if node1.father is None:
        return b''
    if node1.father.left is node1:
        return node_encode(node1.father)+b'0'
    else:
        return node_encode(node1.father)+b'1'

def file_encode(input_file):
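    print('Opening and reading the file...\n')
    with open(input_file,'rb') as f:
        f.seek(0, 2)        # get the file length: seek(0,2) moves to the end of the file, tell() reports the current position, and seek(0) returns to the start
        size=f.tell()
        f.seek(0)
        bytes_list=[0]*size  # a list of length size that holds the bytes read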

        i=0
        while i<size:
            bytes_list[i]=f.read(1