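I have an assignment that reads:

What is the average number of bits needed to store one letter of British English if perfect compression is used?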
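Since the entropy of an experiment can be interpreted as the minimum average number of bits needed to store its outcome, I wrote a program that computes each letter's contribution to the entropy and adds them up to get the entropy of the letter distribution.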
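For reference, the quantity described above is the Shannon entropy of the letter distribution, where $p_i$ is the relative frequency of the $i$-th letter and the sum runs over all 26 letters. As a worked example of one term, E with $p = 0.127$ contributes $0.127 \log_2(1/0.127) \approx 0.378$ bits.

```latex
H = \sum_{i=1}^{26} p_i \log_2 \frac{1}{p_i} = -\sum_{i=1}^{26} p_i \log_2 p_i \quad \text{bits per letter}
```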
This gave me 4.17 bits, but according to this link, a perfect compression algorithm needs only 2 bits per character!
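A likely reason for the gap, offered here as an explanation rather than something taken from the linked page: the 4.17-bit figure treats every letter as independent of the letters before it. English is quite predictable from context (after Q, a U is almost certain), and conditioning never increases entropy, so a compressor that models context can go below the per-letter figure. The sketch below estimates both numbers from any sample text you supply. The helper names `letters_only`, `unigram_entropy` and `conditional_entropy` are hypothetical, and it assumes only the letters A to Z count (spaces and punctuation are dropped).

```python
import math
from collections import Counter


def letters_only(text):
    """Keep only A-Z, upper-cased (assumption: spaces/punctuation are ignored)."""
    return [c for c in text.upper() if "A" <= c <= "Z"]


def unigram_entropy(text):
    """Zero-order estimate: letters treated as independent, H = -sum p*log2(p)."""
    letters = letters_only(text)
    n = len(letters)
    counts = Counter(letters)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def conditional_entropy(text):
    """First-order estimate of H(X_n | X_{n-1}) from bigram counts in the sample."""
    letters = letters_only(text)
    pairs = Counter(zip(letters, letters[1:]))  # counts of (previous, current)
    prev = Counter(letters[:-1])                # counts of each letter as 'previous'
    total = sum(pairs.values())
    h = 0.0
    for (a, b), c in pairs.items():
        p_joint = c / total   # estimate of P(prev=a, cur=b)
        p_cond = c / prev[a]  # estimate of P(cur=b | prev=a)
        h -= p_joint * math.log2(p_cond)
    return h
```

Run both functions on a large sample of English prose: the first-order estimate is typically well below the independent-letter figure, and models that look further back (more letters, or whole words) lower it further. These are plug-in estimates from counts, so on a short sample the bigram figure is biased downward because many pairs are seen only once.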
So how can I implement this perfect compression algorithm?

```python
import math

LETTERS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
# Relative frequency of each letter in English text, in the same order as LETTERS
PERC = [0.082, 0.015, 0.028, 0.043, 0.127, 0.022, 0.02, 0.061, 0.07,
        0.002, 0.008, 0.04, 0.024, 0.067, 0.075, 0.019, 0.001, 0.060,
        0.063, 0.091, 0.028, 0.01, 0.023, 0.001, 0.02, 0.001]


def find_perc(s):
    """Return the relative frequency of the letter s (case-insensitive)."""
    temp = s.upper()
    if len(temp) != 1 or temp not in LETTERS:
        raise ValueError("not a letter A-Z: %r" % s)
    return PERC[LETTERS.index(temp)]


def calc_ent(s):
    """Return the letter's contribution to the entropy: P * log2(1/P) bits."""
    P = find_perc(s)
    # Binary entropy of the yes/no event "the letter is s". This is a different
    # quantity and is NOT a term of the letter entropy, so it stays unused:
    # -P*math.log2(P) - (1-P)*math.log2(1-P)
    return P * math.log2(1 / P)


total = 0.0
for letter in LETTERS:  # all 26 letters, 'A' through 'Z'
    total += calc_ent(letter)
print("Entropy (min. average bits per letter): %f" % total)
```
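One concrete way to turn the frequency table into a compressor is a Huffman code. This is a swapped-in, standard technique, not something the assignment or the linked page specifies. It gives shorter bit strings to frequent letters, and when letters are coded one at a time its average length L satisfies H ≤ L < H + 1. It therefore lands close to the entropy computed above, but no code that encodes each letter independently can go below that entropy, so reaching figures like 2 bits needs a model that uses context (for example, arithmetic coding driven by predictions from the previous letters). The function name `huffman_code_lengths` and the `FREQ` table are illustrative assumptions; the frequencies are copied from the question and normalized, since they sum to 1.001.

```python
import heapq
import math

# Letter frequencies from the question (they sum to 1.001, so we normalize below)
FREQ = dict(zip("ABCDEFGHIJKLMNOPQRSTUVWXYZ",
                [0.082, 0.015, 0.028, 0.043, 0.127, 0.022, 0.02, 0.061, 0.07,
                 0.002, 0.008, 0.04, 0.024, 0.067, 0.075, 0.019, 0.001, 0.060,
                 0.063, 0.091, 0.028, 0.01, 0.023, 0.001, 0.02, 0.001]))


def huffman_code_lengths(probs):
    """Return {symbol: code length in bits} for a binary Huffman code."""
    # Heap entries: (probability, unique tie-breaker, {symbol: depth so far})
    heap = [(p, i, {sym: 0}) for i, (sym, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        # Merge the two least probable subtrees; every symbol in them gets 1 bit deeper
        p1, _, d1 = heapq.heappop(heap)
        p2, _, d2 = heapq.heappop(heap)
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (p1 + p2, next_id, merged))
        next_id += 1
    return heap[0][2]


lengths = huffman_code_lengths(FREQ)
total = sum(FREQ.values())
avg_bits = sum(FREQ[s] * lengths[s] for s in FREQ) / total
entropy = -sum((p / total) * math.log2(p / total) for p in FREQ.values())
print(f"Entropy: {entropy:.3f} bits/letter, Huffman: {avg_bits:.3f} bits/letter")
```

Only the code lengths are computed here, since they are all that is needed to compare the average cost with the entropy; assigning actual bit strings is a straightforward walk over the same merge order.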