Information Entropy: Calculation and Code

Ie802.3

Category: Information Theory. Tags: algorithms

Article link: https://blog.csdn.net/weixin_52450702/article/details/126853331

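This post shows how to use Python to build a mathematical model of a discrete source, compute its information entropy, and plot the letter probabilities as a bar chart with matplotlib. It covers the physical meaning of information entropy, how to compute it, and its role in measuring the uncertainty of a source. Using a sample text, it counts how often each letter occurs, computes the entropy, and ends with a visualization of the result.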


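Objective: understand the mathematical model of a discrete source and information entropy.
Task: using the contents of the attached English text file as the source, build a mathematical model whose source symbols are the 26 English letters (case-sensitive), and output the probability of each letter and the information entropy of the model.
Requirement: use a programming language you are familiar with to build the source model and output the letter probabilities and the entropy of the source.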

The solution is written in Python and ends by drawing a bar chart that shows the probability of each letter.

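For sources with the same number of messages, the more unevenly the message probabilities are distributed, the smaller the source entropy and the lower the uncertainty; when the messages are equiprobable, the source entropy is largest and the uncertainty is highest.
When the messages are equiprobable, the more messages there are, the larger the source entropy and the higher the uncertainty.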
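
Both statements can be checked numerically. The following is a minimal sketch and not part of the original experiment: the helper name `entropy` and the three example distributions are made up purely for illustration.

```python
import math


def entropy(probs):
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return sum(p * math.log2(1 / p) for p in probs if p > 0)


# Hypothetical distributions chosen only for illustration.
skewed_4 = [0.7, 0.1, 0.1, 0.1]   # four messages, very unequal probabilities
uniform_4 = [0.25] * 4            # four messages, equiprobable
uniform_8 = [0.125] * 8           # eight messages, equiprobable

print(entropy(skewed_4))    # smaller than the uniform case below
print(entropy(uniform_4))   # log2(4) = 2 bits
print(entropy(uniform_8))   # log2(8) = 3 bits
```

With four messages, the skewed distribution comes out below the 2 bits of the equiprobable one, and doubling the number of equiprobable messages adds exactly one bit.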

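Source entropy has three physical interpretations:

The source entropy H(X) is the average amount of information provided by each discrete message after the source has produced its output.
The source entropy H(X) is the average uncertainty about the source before it produces its output.
The source entropy H(X) reflects the randomness of the variable X.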
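
For reference, all three readings describe the same quantity. The post does not write the formula out, so the block below states the standard Shannon definition, assuming a source with n symbols x_1, ..., x_n, probabilities p(x_i), and base-2 logarithms (entropy measured in bits):

```latex
I(x_i) = \log_2 \frac{1}{p(x_i)}, \qquad
H(X) = \sum_{i=1}^{n} p(x_i)\, I(x_i)
     = -\sum_{i=1}^{n} p(x_i) \log_2 p(x_i), \qquad
0 \le H(X) \le \log_2 n
```

Here I(x_i) is the self-information of a single symbol, and H(X) is its average over the source. The upper bound is reached only when all n symbols are equiprobable.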

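The entropy is computed by summing p * log2(1/p) over every letter with non-zero probability, where dict3 maps each letter to its probability:

sum1 = 0
for i in dict3:
    if dict3[i] != 0:
        sum1 += dict3[i] * (math.log(1 / (dict3[i]), 2))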
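
The loop above depends on `dict3` and `sum1` being defined elsewhere in the script. As a small worked example, here is a self-contained sketch of the same sum; the function name `entropy_bits` and the probability table are hypothetical stand-ins, not taken from the post.

```python
import math


def entropy_bits(prob_table):
    """Sum p * log2(1/p) over the entries of a {symbol: probability} dict."""
    total = 0.0
    for p in prob_table.values():
        if p != 0:  # a symbol that never occurs adds nothing; 1/0 would fail
            total += p * math.log2(1 / p)
    return total


# Hypothetical table standing in for dict3; "d" never occurs.
example = {"a": 0.5, "b": 0.25, "c": 0.25, "d": 0.0}
print(entropy_bits(example))  # 0.5*1 + 0.25*2 + 0.25*2 = 1.5
```

`math.log2(x)` gives the same result as `math.log(x, 2)` and is used here only because it reads more directly.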

The complete code is shown below:

import string
import matplotlib.pyplot as plt
import math


def draw_from_dict(dicdata, RANGE, heng=0):
    # dicdata: dictionary mapping each letter to its probability.
    # RANGE: number of dictionary entries to display.
    # heng=0 draws vertical bars; heng=1 draws horizontal bars, which can make
    # the letter labels easier to read.
    if heng not in (0, 1):
        raise ValueError("heng must be 0 or 1")
    # Sort by letter so the bars appear in a stable order, then truncate.
    by_key = sorted(dicdata.items(), key=lambda item: item[0])[:RANGE]
    x = [letter for letter, _ in by_key]
    y = [prob for _, prob in by_key]
    plt.title("Character probability statistics")
    if heng == 0:
        plt.bar(x, y)
        plt.xlabel("Sequential letters")
        plt.ylabel("Probability of occurrence of each letter")
    else:
        # The axes swap roles for horizontal bars.
        plt.barh(x, y)
        plt.xlabel("Probability of occurrence of each letter")
        plt.ylabel("Sequential letters")
    # Label every non-zero bar with its probability.
    for letter, prob in zip(x, y):
        if prob != 0:
            if heng == 0:
                plt.text(letter, prob, '%.3f' % prob, ha='center', va='bottom', fontsize=5)
            else:
                plt.text(prob, letter, '%.3f' % prob, ha='left', va='center', fontsize=5)
    plt.show()


def countLetters(text):
    # Count the alphabetic characters in the text passed in.
    s_count = 0
    for i in text:
        if i.isalpha():
            s_count += 1
    print('Number of letters:', s_count)
    return s_count


s = 'Love is a set of emotions and behaviors characterized by intimacy, passion, and commitment. It involves care, ' \
    'closeness, protectiveness, attraction, affection, and trust. Love can vary in intensity and can change over ' \
    'time. It is associated with a range of positive emotions, including happiness, excitement, life satisfaction, ' \
    'and euphoria, but it can also result in negative emotions such as jealousy and stress. '
letterSum = countLetters(s)
asciiAll = string.ascii_lowercase + string.ascii_uppercase
# Probability of each letter, kept at full precision so that rounding
# does not distort the entropy; letters that never occur get 0.0.
dict3 = {key: s.count(key) / letterSum for key in asciiAll}

for i in sorted(dict3):
    print((i, round(dict3[i], 6)))
# H(X) = sum of p * log2(1/p) over the letters that actually occur.
sum1 = 0
for i in dict3:
    if dict3[i] != 0:
        sum1 += dict3[i] * math.log2(1 / dict3[i])

print('Computed entropy:', round(sum1, 4))

draw_from_dict(dict3, 52, 0)
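
The full script above works on a string embedded in the code and calls `s.count` once per letter. As an alternative sketch (not the author's method), the same model can be built in a single pass with `collections.Counter`; the function name `letter_model` and the sample string are assumptions made for illustration.

```python
import math
import string
from collections import Counter


def letter_model(text):
    """Return (probabilities, entropy in bits) over the 52 cased ASCII letters."""
    letters = string.ascii_lowercase + string.ascii_uppercase
    # One pass over the text; anything that is not an ASCII letter is ignored.
    counts = Counter(ch for ch in text if ch in letters)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("text contains no ASCII letters")
    probs = {ch: counts[ch] / total for ch in letters}
    h = sum(p * math.log2(1 / p) for p in probs.values() if p > 0)
    return probs, h


# Hypothetical sample input: 10 letters, of which 'l' occurs 3 times.
probs, h = letter_model("Hello World")
print(probs["l"], probs["H"], round(h, 4))
```

To use the attached text file that the assignment refers to, the file contents could be read into a string with `open(path, encoding="utf-8").read()` and passed to `letter_model`. The path is not given in the post, so it is left as a placeholder here.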

The matplotlib package renders the result as a bar chart of the letter probabilities.
