Experiment objective: Understand the mathematical model of a discrete source and the concept of information entropy.
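Experiment content: Using the content of the English text file in the attachment as the source, build a mathematical model whose source symbols are the 26 English letters with uppercase and lowercase distinguished (52 symbols in total). Output the probability of each letter and the information entropy of the model.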
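In standard notation, the model can be sketched as a probability space over the 52 letters, ordered a to z then A to Z. The symbols x_i, n_i (the number of occurrences of letter x_i) and N (the total number of letters) are notation introduced here for illustration, not taken from the attachment:

```latex
\begin{bmatrix} X \\ P(X) \end{bmatrix}
=
\begin{bmatrix}
x_1 = \mathrm{a} & x_2 = \mathrm{b} & \cdots & x_{52} = \mathrm{Z} \\
p(x_1) & p(x_2) & \cdots & p(x_{52})
\end{bmatrix},
\qquad
p(x_i) = \frac{n_i}{N}, \quad \sum_{i=1}^{52} p(x_i) = 1

H(X) = \sum_{i=1}^{52} p(x_i)\log_2\frac{1}{p(x_i)} = -\sum_{i=1}^{52} p(x_i)\log_2 p(x_i) \quad \text{(bits/symbol)}
```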
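Requirements: Using a programming language of your choice, build the source model and output the probability of each English letter and the information entropy of the source.
This report uses Python and finally plots a bar chart of the probability of each letter.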
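The task specifies an attached text file, whereas the full code below hard-codes a passage in the string s. A minimal sketch of reading the attachment instead, assuming it is a UTF-8 plain-text file; the function name letter_probabilities_from_file and the example path english.txt are hypothetical:

```python
import string
from collections import Counter


def letter_probabilities_from_file(path, encoding="utf-8"):
    """Return {letter: probability} for the 52 ASCII letters in a text file.

    Non-letter characters (digits, spaces, punctuation, non-ASCII) are ignored.
    Letters are case-sensitive, so 'a' and 'A' are separate symbols.
    """
    with open(path, encoding=encoding) as f:
        text = f.read()
    counts = Counter(ch for ch in text if ch in string.ascii_letters)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("the file contains no ASCII letters")
    # Keep all 52 symbols, including letters that never occur (probability 0).
    return {ch: counts[ch] / total
            for ch in string.ascii_lowercase + string.ascii_uppercase}


# Usage (hypothetical file name):
# probs = letter_probabilities_from_file("english.txt")
```

Returning all 52 keys, even those with probability 0, keeps the dictionary in the same shape as dict3 in the full code, so it can be passed to the same entropy and plotting steps.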
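Source entropy has the following properties:
- For sources with the same number of messages, the more uneven the probability distribution, the smaller the entropy and the lower the uncertainty; when the messages are equiprobable, the entropy and the uncertainty are at their maximum.
- For equiprobable messages, the more messages there are, the larger the entropy (it equals log2 n for n messages) and the greater the uncertainty.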
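The two properties above can be checked numerically. A minimal sketch; the helper name entropy and the example distributions are illustrative assumptions, not part of the experiment:

```python
import math


def entropy(probs):
    """Shannon entropy in bits: H = sum of p * log2(1/p), skipping p = 0."""
    return sum(p * math.log2(1 / p) for p in probs if p > 0)


# Same number of messages (4), different distributions.
uniform4 = [0.25, 0.25, 0.25, 0.25]
skewed4 = [0.7, 0.1, 0.1, 0.1]
print(entropy(uniform4))  # 2.0: equiprobable gives the maximum, log2(4)
print(entropy(skewed4))   # smaller than 2.0: the distribution is uneven

# Equiprobable messages with increasing count n: H = log2(n) grows with n.
for n in (2, 4, 26, 52):
    print(n, entropy([1 / n] * n))
```

For this experiment's 52 symbols, the upper bound is log2 52 ≈ 5.70 bits, reached only if every letter were equally likely.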
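Source entropy has three physical interpretations:
- H(X) is the average amount of information carried by each discrete message after the source emits it.
- H(X) is the average uncertainty of the source before it emits a message.
- H(X) reflects the randomness of the random variable X.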
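The first interpretation can be made concrete: a symbol x with probability p(x) carries self-information I(x) = log2(1/p(x)) bits, and H(X) is the probability-weighted average of I(x). The sketch below uses a hypothetical three-symbol toy source to show that the two calculations agree:

```python
import math


def self_information(p):
    """Self-information I(x) = log2(1/p), in bits, of a symbol with probability p > 0."""
    return math.log2(1 / p)


# Hypothetical toy source with three symbols.
source = {"a": 0.5, "b": 0.25, "c": 0.25}

for symbol, p in source.items():
    print(symbol, p, self_information(p))  # a: 1 bit; b and c: 2 bits each

# H(X) = E[I(X)]: the average self-information weighted by probability.
H = sum(p * self_information(p) for p in source.values())
print("H(X) =", H)  # 0.5*1 + 0.25*2 + 0.25*2 = 1.5 bits
```

Rarer symbols carry more self-information, which matches the second interpretation: the more evenly spread the probabilities, the harder the next symbol is to predict.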
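The entropy H(X) = Σ p(x_i)·log2(1/p(x_i)) is computed as follows; letters with zero probability are skipped, since they contribute 0 to the sum: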
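```python
import math
# dict3 maps each letter to its probability (built in the full code below).
sum1 = 0
for i in dict3:
    if dict3[i] != 0:  # p = 0 contributes nothing, since p*log(1/p) -> 0
        sum1 += dict3[i] * math.log(1 / dict3[i], 2)
```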
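The same calculation can be written more compactly with collections.Counter, which counts every letter in a single pass instead of calling s.count repeatedly. This is a sketch; the function name text_entropy is hypothetical, and it assumes, as the experiment does, that only the 52 ASCII letters are symbols and that case matters:

```python
import math
import string
from collections import Counter


def text_entropy(text):
    """Entropy in bits/symbol of the case-sensitive ASCII-letter source in text."""
    counts = Counter(ch for ch in text if ch in string.ascii_letters)
    total = sum(counts.values())
    # Letters absent from the text have p = 0 and contribute nothing, so only
    # the observed letters are summed. An empty text yields 0.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


print(text_entropy("aabb"))  # 1.0: two equiprobable symbols
print(text_entropy("aA"))    # 1.0: 'a' and 'A' are distinct symbols
```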
The full code is as follows:
```python
import math
import string

import matplotlib.pyplot as plt


def draw_from_dict(dicdata, RANGE, heng=0):
    # dicdata: dictionary mapping each letter to its probability.
    # RANGE: number of dictionary entries to display.
    # heng=0: vertical bars; heng=1: horizontal bars. Since text reads left to
    # right, horizontal bars can make the letter axis easier to read.
    if heng not in (0, 1):
        return "heng must be 0 or 1!"
    # Sort by key (letter) for a fixed order; in ASCII, A-Z sorts before a-z.
    by_key = sorted(dicdata.items(), key=lambda item: item[0])[:RANGE]
    x = [d[0] for d in by_key]
    y = [d[1] for d in by_key]
    plt.title("Character probability statistics")
    if heng == 0:
        plt.bar(x, y)
        plt.xlabel("Sequential letters")
        plt.ylabel("Probability of occurrence of each letter")
    else:
        plt.barh(x, y)
        plt.xlabel("Probability of occurrence of each letter")
        plt.ylabel("Sequential letters")
    # Label every non-zero bar with its probability (3 decimal places).
    for xx, yy in zip(x, y):
        if yy != 0:
            if heng == 0:
                plt.text(xx, yy, '%.3f' % yy, ha='center', va='bottom', fontsize=5)
            else:
                plt.text(yy, xx, '%.3f' % yy, ha='left', va='center', fontsize=5)
    plt.show()


def countLetters(text):
    # Count the ASCII letters (a-z, A-Z) in the given text.
    s_count = sum(1 for ch in text if ch in string.ascii_letters)
    print('Number of letters:', s_count)
    return s_count


s = ('Love is a set of emotions and behaviors characterized by intimacy, passion, and commitment. It involves care, '
     'closeness, protectiveness, attraction, affection, and trust. Love can vary in intensity and can change over '
     'time. It is associated with a range of positive emotions, including happiness, excitement, life satisfaction, '
     'and euphoria, but it can also result in negative emotions such as jealousy and stress. ')
letterSum = countLetters(s)
print(letterSum)

# The 52 source symbols: a-z followed by A-Z (case-sensitive).
asciiAll = string.ascii_lowercase + string.ascii_uppercase
# p(letter) = occurrences of the letter / total number of letters.
dict3 = {key: s.count(key) / letterSum for key in asciiAll}
for i in sorted(dict3):
    print((i, round(dict3[i], 6)))  # round only for display

# H(X) = sum of p * log2(1/p); letters with p = 0 contribute nothing.
sum1 = 0
for i in dict3:
    if dict3[i] != 0:
        sum1 += dict3[i] * math.log(1 / dict3[i], 2)
print('Computed entropy:', round(sum1, 4))
draw_from_dict(dict3, 52, 0)
```
The visualization is produced with the matplotlib package; the resulting bar chart is shown below.
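A quick sanity check on the computed model ties it back to the properties listed earlier: the probabilities should sum to 1, and the entropy should lie between 0 and log2 of the number of symbols (log2 52 ≈ 5.70 bits here). This is a sketch; the function name check_source_model and its tol parameter are hypothetical, and it expects a dictionary shaped like dict3:

```python
import math


def check_source_model(probs, tol=1e-9):
    """Sanity-check a {symbol: probability} model and return its entropy in bits.

    Raises ValueError if the probabilities are invalid or the entropy falls
    outside the theoretical range 0 <= H <= log2(number of symbols).
    """
    if any(p < 0 for p in probs.values()):
        raise ValueError("probabilities must be non-negative")
    total = sum(probs.values())
    if abs(total - 1.0) > tol:
        raise ValueError(f"probabilities sum to {total}, not 1")
    h = sum(p * math.log2(1 / p) for p in probs.values() if p > 0)
    h_max = math.log2(len(probs))
    if not (-tol <= h <= h_max + tol):
        raise ValueError(f"entropy {h} outside [0, {h_max}]")
    return h


# Usage with the dictionary from the full code: check_source_model(dict3)
```

If the probabilities have been rounded before the check (for example with round(..., 6)), their sum drifts slightly from 1, so pass a looser tol such as 1e-4.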