A Few Questions About Huffman Coding

Latest recommended article published 2024-10-05 15:32:05

张焚雪

Article tags: data structures, algorithms

Article link: https://blog.csdn.net/2301_79096986/article/details/141817777

I. Introduction

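I came across a problem on a coding practice site that asked for an encoding of an alphabet, given how often each letter occurs, such that the total encoded length is as small as possible. Huffman coding came to mind immediately, so I wrote this article to collect the questions around it.

In this article I first cover the basics of Huffman coding, namely the concept. Then I use a flowchart to show the logic of my code and include the code itself. Finally, I modify the code so that it counts the letters in an English article, encodes them based on those counts, and prints the resulting codes.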

II. Huffman Coding

2.1 Concept

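Huffman coding is a widely used algorithm for lossless data compression. Its core idea is to give each letter a binary code whose length depends on how often that letter occurs in the text: frequent letters get short codes, rare letters get longer ones. For example, take a passage made of the three letters a, b and c: aaabbbaaaac. Here a occurs 7 times, b 3 times and c once, and based on these counts we encode them as a:1, b:00, c:01.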
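A quick way to check the example is to encode the passage with this table and count the bits. A minimal Python sketch that uses only the table given above:

```python
# Code table from the example: a:1, b:00, c:01
codes = {'a': '1', 'b': '00', 'c': '01'}
text = 'aaabbbaaaac'

# Encoding is just concatenating the code of every letter.
encoded = ''.join(codes[ch] for ch in text)
print(encoded)        # 111000000111101
print(len(encoded))   # 15 bits = 7*1 + 3*2 + 1*2
# For comparison, a fixed-length code needs 2 bits per letter for 3 letters.
print(2 * len(text))  # 22
```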

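Looking at this example, one property stands out: no shorter code is the beginning (prefix) of a longer one. Suppose we changed a to 0 and left the others unchanged. Then we could no longer tell the letters apart: the passage above would encode to 000000000000001, which can just as well be split as bbbbbbac (six 00s, then 0, then 01), clearly not the original text. So, to keep codes distinguishable, a short code must never be a prefix of a longer one (a so-called prefix-free code), and the Huffman tree is what guarantees this.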
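The prefix property can be checked mechanically, and it is also what makes left-to-right decoding work: with a prefix-free table, the first codeword that matches the bits read so far is always the right one. A small sketch; `is_prefix_free` and `decode` are helper names of my own, not part of the code later in this article:

```python
def is_prefix_free(codes):
    # No codeword may be the beginning of a different codeword.
    words = list(codes.values())
    return not any(
        i != j and w2.startswith(w1)
        for i, w1 in enumerate(words)
        for j, w2 in enumerate(words)
    )


def decode(bits, codes):
    # Read bits left to right and emit a letter as soon as the buffer
    # matches a codeword; this is only unambiguous for a prefix-free code.
    lookup = {w: ch for ch, w in codes.items()}
    out, buf = [], ''
    for b in bits:
        buf += b
        if buf in lookup:
            out.append(lookup[buf])
            buf = ''
    if buf:
        raise ValueError('trailing bits do not form a codeword: ' + buf)
    return ''.join(out)


good = {'a': '1', 'b': '00', 'c': '01'}
bad = {'a': '0', 'b': '00', 'c': '01'}
print(is_prefix_free(good))                 # True
print(is_prefix_free(bad))                  # False: '0' is a prefix of '00' and '01'
print(decode('111000000111101', good))      # aaabbbaaaac
```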

2.2 Huffman Tree

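Sticking with the same example, here is its Huffman tree. The root (weight 11) has an internal node of weight 4 on its 0-branch and a (7) on its 1-branch; the weight-4 node has b (3) on its 0-branch and c (1) on its 1-branch. Reading the bits from the root down to a leaf gives exactly a:1, b:00, c:01.

Building the tree is simple. First put every letter and its frequency into a table. Then repeatedly take the two entries with the smallest values, join them under a new node, and put that new node back into the table; the new node's value is the sum of the two values. Stop when only one node, the root, is left. Because every letter ends up at a leaf, no letter's code can be a prefix of another letter's code.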
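The construction described above can be sketched with Python's `heapq` module, which always hands back the smallest entry, so it plays the role of the table we keep taking the two smallest entries from. This is an alternative to the `min()`-based code in 2.3, not the same program; `huffman_codes` is a hypothetical name, and the tie-breaking counter is only there so that `heapq` never has to compare two dictionaries:

```python
import heapq
from itertools import count


def huffman_codes(freq):
    """Return {symbol: code} for a dict {symbol: frequency}."""
    if len(freq) == 1:
        # Only one symbol: nothing to merge, but it still needs one bit.
        return {sym: '0' for sym in freq}
    tie = count()  # unique tie-breaker so heapq never compares two dicts
    # Each heap entry is one subtree: (weight, tie-breaker, {symbol: code so far}).
    heap = [(w, next(tie), {sym: ''}) for sym, w in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Take the two lightest subtrees out of the table...
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        # ...hang them under a new node: left edge is '0', right edge is '1'...
        merged = {sym: '0' + code for sym, code in left.items()}
        merged.update({sym: '1' + code for sym, code in right.items()})
        # ...and put the new node back with the summed weight.
        heapq.heappush(heap, (w1 + w2, next(tie), merged))
    return heap[0][2]


print(huffman_codes({'a': 7, 'b': 3, 'c': 1}))  # {'c': '00', 'b': '01', 'a': '1'}
```

Which child gets 0 and which gets 1 is only a convention, so this result has the same code lengths as a:1, b:00, c:01 and therefore the same total length. Because bits are prepended while merging, the codes come out already in root-to-leaf order.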

2.3 Python Code 1.0

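class HTreeNode(object):
    # A node of the Huffman tree.
    # key: the letter (None for internal nodes); value: frequency or summed weight;
    # father: the parent node; code: the bit ('0' or '1') on the edge to the parent.
    def __init__(self,key,value,father,code = ''):
        self.key = key
        self.value = value
        self.father = father
        self.code = code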


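def find_father(child1,child2):
    # Join two nodes under a new parent whose value is the sum of theirs;
    # child1 (the smaller one) gets the bit '0', child2 gets '1'.
    value = child1.value + child2.value
    father = HTreeNode(None,value,None)
    child1.father = father
    child2.father = father
    child1.code = '0'
    child2.code = '1'
    return father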

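def getCoding(child):
    # Walk from a leaf up to the root and collect the bit on each edge.
    coding_list = []
    char = child.key
    while child.father is not None:
        coding_list.append(child.code)
        child = child.father
    # The bits were collected leaf-to-root, but a code is read root-to-leaf.
    coding_list.reverse()
    return coding_list, char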

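def HaffmanCoding(dic):
    # Returns the code (a list of bits) of every letter in dic, in dic's order.
    coding_list = []
    char_list = []
    node_list = []   # working "table" of nodes that still need merging
    nodes = []       # the original leaves, kept so their codes can be read out
    for key, value in dic.items():
        node = HTreeNode(key,value,None)
        node_list.append(node)
        nodes.append(node)
    if len(nodes) == 1:
        # With a single distinct letter there is nothing to merge; give it '0'.
        return [['0']], [nodes[0].key]
    # Repeatedly merge the two smallest nodes until only the root is left.
    while len(node_list) > 1:
        child1 = min(node_list, key=lambda n: n.value)
        node_list.remove(child1)
        child2 = min(node_list, key=lambda n: n.value)
        node_list.remove(child2)
        father = find_father(child1, child2)
        node_list.append(father)
    # Read every leaf's code by walking up to the root.
    for node in nodes:
        code, char = getCoding(node)
        coding_list.append(code)
        char_list.append(char)
    return coding_list, char_list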

if __name__ == '__main__':
    dic = {'a':10, 'b':20, 'c':3, 'd':4, 'e':18, 'f':50}
    # Letters to encode and their frequencies; a separate function could return this frequency dict.
    coding_list, char_list = HaffmanCoding(dic)

    # Total encoded length = sum of (code length * frequency) over all letters.
    length = 0
    for code, freq in zip(coding_list, dic.values()):
        length += len(code) * freq
    print("the min length : ",length)
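Since the problem that prompted this article only asks for the minimum total length, it can also be computed without building any codes: every merge pushes both merged subtrees one level deeper, so each letter under them gains one bit, and the total length equals the sum of all merged weights. A sketch under that reasoning; `min_total_length` is my own name, and giving a lone letter one bit per occurrence is an assumed convention:

```python
import heapq


def min_total_length(freq):
    """Total encoded length of an optimal (Huffman) code, without building codes."""
    weights = list(freq.values())
    if len(weights) == 1:
        return weights[0]  # assumed convention: a lone letter costs one bit each
    heapq.heapify(weights)
    total = 0
    while len(weights) > 1:
        # Merging adds one bit to every letter below the new node,
        # i.e. the merged weight is added to the total.
        merged = heapq.heappop(weights) + heapq.heappop(weights)
        total += merged
        heapq.heappush(weights, merged)
    return total


print(min_total_length({'a': 10, 'b': 20, 'c': 3, 'd': 4, 'e': 18, 'f': 50}))  # 219
```

For the dictionary above this gives 219 (merges 7, 17, 35, 55, 105), the same value as summing code length times frequency.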

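2.4 Flowchart

Here I only drew the flowchart for the Huffman coding part, not for the whole program. The flowchart is as follows:

[Figure: flowchart of the Huffman coding part]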

2.5 Python Code 2.0

Next, I wanted the code to build the frequency dictionary by itself and print the resulting codes, which led to version 2.0:

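class HTreeNode(object):
    # A node of the Huffman tree.
    # key: the letter (None for internal nodes); value: frequency or summed weight;
    # father: the parent node; code: the bit ('0' or '1') on the edge to the parent.
    def __init__(self,key,value,father,code = ''):
        self.key = key
        self.value = value
        self.father = father
        self.code = code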


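def find_father(child1,child2):
    # Join two nodes under a new parent whose value is the sum of theirs;
    # child1 (the smaller one) gets the bit '0', child2 gets '1'.
    value = child1.value + child2.value
    father = HTreeNode(None,value,None)
    child1.father = father
    child2.father = father
    child1.code = '0'
    child2.code = '1'
    return father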

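def getCoding(child):
    # Walk from a leaf up to the root and collect the bit on each edge.
    coding_list = []
    char = child.key
    while child.father is not None:
        coding_list.append(child.code)
        child = child.father
    # The bits were collected leaf-to-root, but a code is read root-to-leaf,
    # so reverse them before they are printed.
    coding_list.reverse()
    return coding_list, char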

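def HaffmanCoding(dic):
    # Returns the code (a list of bits) of every letter in dic, in dic's order.
    coding_list = []
    char_list = []
    node_list = []   # working "table" of nodes that still need merging
    nodes = []       # the original leaves, kept so their codes can be read out
    for key, value in dic.items():
        node = HTreeNode(key,value,None)
        node_list.append(node)
        nodes.append(node)
    if len(nodes) == 1:
        # With a single distinct letter there is nothing to merge; give it '0'.
        return [['0']], [nodes[0].key]
    # Repeatedly merge the two smallest nodes until only the root is left.
    while len(node_list) > 1:
        child1 = min(node_list, key=lambda n: n.value)
        node_list.remove(child1)
        child2 = min(node_list, key=lambda n: n.value)
        node_list.remove(child2)
        father = find_father(child1, child2)
        node_list.append(father)
    # Read every leaf's code by walking up to the root.
    for node in nodes:
        code, char = getCoding(node)
        coding_list.append(code)
        char_list.append(char)
    return coding_list, char_list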

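def frequency_dic(path):
    # Count how often each English letter occurs in a text file (case-insensitive).
    # count_dic = {chr(i): 0 for i in range(ord('a'),ord('z')+1)}
    count_dic = {}
    with open(path,'r',encoding='utf-8') as file:
        for line in file:
            for char in line:
                # A few characters lower-case to two characters (e.g. 'İ'),
                # so check the length instead of calling ord() on the result.
                char_l = char.lower()
                # Only a-z are counted; digits, spaces and punctuation are skipped.
                if len(char_l) == 1 and 'a' <= char_l <= 'z':
                    count_dic[char_l] = count_dic.get(char_l, 0) + 1
    return count_dic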

def print_code(coding_list = []):
    str_list = [''.join(sublist) for sublist in coding_list]
    big_list = '\n'.join(str_list)
    print(big_list)

if __name__ == '__main__':
    #dic = {'a':10, 'b':20, 'c':3, 'd':4, 'e':18, 'f':50}
    # Letters to encode and their frequencies; frequency_dic builds this dict from a text file.
    path = r"C:\Users\20349\Desktop\Studing\English_Text.txt"
    dic = frequency_dic(path)
    coding_list, char_list = HaffmanCoding(dic)
    print_code(coding_list)
    # Total encoded length = sum of (code length * frequency) over all letters.
    length = 0
    for code, freq in zip(coding_list, dic.values()):
        length += len(code) * freq
    print("the min length : ",length)
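To check that printed codes can actually be used, one can encode some text and decode it back. The sketch below is self-contained rather than reusing HaffmanCoding: `build_codes`, `encode` and `decode` are my own helper names, and the sample is the opening sentence of the English article quoted below, filtered to letters the same way frequency_dic does:

```python
import heapq
from collections import Counter
from itertools import count


def build_codes(freq):
    # Merge-the-two-smallest, with each heap entry carrying its subtree's codes.
    if len(freq) == 1:
        return {sym: '0' for sym in freq}
    tie = count()
    heap = [(w, next(tie), {sym: ''}) for sym, w in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {s: '0' + c for s, c in left.items()}
        merged.update({s: '1' + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, next(tie), merged))
    return heap[0][2]


def encode(letters, codes):
    # Concatenate the code of every letter.
    return ''.join(codes[ch] for ch in letters)


def decode(bits, codes):
    # Read bits left to right; because the code is prefix-free,
    # the first codeword that matches the buffer is the right one.
    lookup = {c: s for s, c in codes.items()}
    out, buf = [], ''
    for bit in bits:
        buf += bit
        if buf in lookup:
            out.append(lookup[buf])
            buf = ''
    return ''.join(out)


sample = "Continuous learning is a key factor in personal and professional growth."
# Keep lower-case a-z only, like frequency_dic.
letters = ''.join(ch for ch in sample.lower() if 'a' <= ch <= 'z')
codes = build_codes(Counter(letters))
bits = encode(letters, codes)

print(decode(bits, codes) == letters)  # True: the round trip restores the letters
# A fixed-length code for 26 letters needs 5 bits per letter.
print(len(bits), 'bits with Huffman codes vs', 5 * len(letters), 'bits with 5-bit codes')
```

Since a Huffman code is optimal among prefix codes, its bit count can never exceed the fixed 5-bit figure.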

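(Of course, this code still has plenty of shortcomings; I was simply too lazy to improve, optimize and polish it.)

I am also attaching an English article that I had an AI generate at random: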

The Importance of Continuous Learning

Continuous learning is a key factor in personal and professional growth. In today's rapidly evolving world, it is essential to stay updated with new technologies, methodologies, and best practices. Learning does not stop after formal education; rather, it is a lifelong process that can significantly enhance one's skills and knowledge.

Why is continuous learning important? Here are a few reasons:

1. **Staying Relevant**: The job market is constantly changing, and new skills are required to remain competitive. Continuous learning helps you stay relevant in your field.
2. **Problem Solving**: Learning new things enhances your problem-solving abilities. It broadens your perspective and allows you to approach challenges from different angles.
3. **Personal Growth**: Continuous learning contributes to personal growth by increasing self-confidence and fostering a sense of accomplishment.

How can you make continuous learning a part of your life?

1. **Set Goals**: Define clear learning goals that align with your career aspirations.
2. **Utilize Resources**: Take advantage of online courses, workshops, and seminars to expand your knowledge.
3. **Practice Regularly**: Consistency is key. Make time for learning every day, even if it's just for a short period.

In conclusion, continuous learning is not just beneficial but necessary for success in both personal and professional life. Embrace the journey of lifelong learning and watch yourself grow.

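(Note that my code only encodes letters; characters such as Arabic numerals are filtered out and not encoded, because this is just an example. To encode them as well, simply remove the filtering condition.)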
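As the note says, lifting the restriction only means dropping the letter check. With `collections.Counter` both variants fit in a few lines; `char_frequencies` and its `letters_only` flag are hypothetical names for illustration, and the sample is the first part of one line of the article above:

```python
from collections import Counter


def char_frequencies(text, letters_only=True):
    # letters_only=True mirrors frequency_dic: lower-case a-z only.
    if letters_only:
        return Counter(ch for ch in text.lower() if 'a' <= ch <= 'z')
    # Restriction removed: still lower-cased, but digits, spaces and
    # punctuation are now counted (and would get Huffman codes too).
    return Counter(text.lower())


line = "1. **Set Goals**: Define clear learning goals"
print(char_frequencies(line)['s'])                      # 3
print(char_frequencies(line)['1'])                      # 0: digits are filtered out
print(char_frequencies(line, letters_only=False)['1'])  # 1
```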

That's all.