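"Introduction to Data Mining" contains the following example: suppose there are 10 groups of items, equal in size but each of a different type, and R items are drawn at random, with R between 0 and 60. How does the probability that all 10 types appear in the sample change as the sample size grows? Running a sampling experiment with a random number generator gives an estimate of this trend. The code is as follows: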
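import random
import pylab as pl

TYPES = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J']

def _getAnswer(sample):
    """Draw `sample` items with replacement; return True if all 10 types appear."""
    dic = {}
    answer = [random.choice(TYPES) for i in range(sample)]
    # Count how many times each type was drawn.
    for item in answer:
        dic[item] = dic.get(item, 0) + 1
    # Only drawn types become keys, so 10 keys means every type was selected.
    return len(dic) == len(TYPES)

if __name__ == '__main__':
    n = 10000  # number of trials per sample size
    results = []
    # Fewer than 10 draws can never cover all 10 types, so start at 10.
    for sample in range(10, 60):
        count = 0
        for i in range(n):
            if _getAnswer(sample):
                count = count + 1
        # Fraction of trials in which all 10 types were selected.
        prob = count / float(n)
        results.append(prob)
    pl.plot(range(10, 60), results)
    pl.show()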
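As a cross-check on the simulation, the same probability can be computed exactly. This is a minimal sketch, assuming each draw is independent and equally likely to be any of the 10 types (which is what `random.choice` does above); the function name `prob_all_types` is introduced here for illustration and is not from the book. By inclusion-exclusion, the probability that `sample` draws cover all k types is the sum over j = 0..k of (-1)^j * C(k, j) * ((k - j) / k)^sample.

```python
from math import comb  # available in Python 3.8+


def prob_all_types(sample, k=10):
    """Exact probability that `sample` draws with replacement cover all k types.

    Assumes every draw is independent and uniform over the k types.
    """
    # Inclusion-exclusion over the number j of types that are missed.
    return sum((-1) ** j * comb(k, j) * ((k - j) / k) ** sample
               for j in range(k + 1))


# Print the exact value for a few sample sizes used in the simulation.
for sample in (10, 20, 40, 59):
    print(sample, round(prob_all_types(sample), 4))
```

If the simulated curve is correct, it should stay close to these values, with the remaining gap shrinking as the number of trials `n` increases.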
The figure generated by the simulation with 10,000 trials per sample size: