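A linear probabilistic counter can estimate the cardinality (the number of distinct elements) of a data set in linear time.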
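A short sketch of where the estimate comes from. The symbols are notation introduced here, not taken from the original: m is the bitmap size (`rna` in the script), n is the number of distinct items, and V is the fraction of bitmap slots still 0 (`rate` in the script). The hash is assumed to spread items uniformly over the slots.

```latex
\Pr[\text{a given slot stays } 0] = \left(1 - \frac{1}{m}\right)^{n} \approx e^{-n/m}
\quad\Longrightarrow\quad
\mathbb{E}[V] \approx e^{-n/m}
\quad\Longrightarrow\quad
\hat{n} = -m \ln V
```

With the figures from the run reported in this post, m = 20000 and V = 0.0029 give an estimate of roughly 116860.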
# -*- coding: utf-8 -*-
"""
Created on Tue Jan 15 16:17:10 2013
@author: chen ming
Cardinality estimation: linear probabilistic counter
"""
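import random
import math

M = 120000   # number of items; random floats, so almost surely all distinct
rna = 20000  # number of slots in the bitmap

data = [random.random() for i in range(M)]
bitmap = [0] * rna
for it in data:
    # modulo rna (not rna - 1) so that every slot of the bitmap can be hit
    ind = hash(it) % rna
    bitmap[ind] = 1

# fraction of slots that are still 0
rate = float(rna - sum(bitmap)) / rna
print('sum(bitmap) %d' % sum(bitmap))
print('size(bitmap) %d' % rna)
print('num of 0 in bitmap %f' % rate)

# linear counting estimate: -m * ln(fraction of zero slots)
aa = -rna * math.log(rate)
print('result: %d ' % aa)

# ratio of the estimate to the true count M
ac = aa / M
print('result//M: %f ' % ac)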
Result:
sum(bitmap) 19942
size(bitmap) 20000
num of 0 in bitmap 0.002900
result: 116860
result//M: 0.973841
The estimate is about 97.4% of the true count M, i.e. a relative error of about 2.6%.
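The same idea can be wrapped in a reusable function. This is a minimal sketch rather than the author's code: the names `stable_hash` and `linear_count` are hypothetical, it targets Python 3, and it swaps the built-in `hash` for an MD5-based hash so that results are reproducible across runs. It also raises an error when the bitmap is completely full, because the logarithm of a zero fraction is undefined.

```python
import hashlib
import math


def stable_hash(item):
    """Map an item to a non-negative integer (MD5 of its repr; an illustrative choice)."""
    digest = hashlib.md5(repr(item).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def linear_count(items, m):
    """Estimate the number of distinct items using a bitmap with m slots."""
    bitmap = bytearray(m)
    for item in items:
        bitmap[stable_hash(item) % m] = 1
    zeros = m - sum(bitmap)
    if zeros == 0:
        # every slot is set, so -m * log(0) would be undefined
        raise ValueError("bitmap is full; use a larger m")
    return -m * math.log(zeros / m)


if __name__ == "__main__":
    # 10000 distinct values, each appearing three times
    stream = [i % 10000 for i in range(30000)]
    print("estimate: %.0f" % linear_count(stream, 20000))
```

Duplicates do not change the estimate, since a repeated item only sets a bit that is already set. The estimate gets noisier as the bitmap fills up, so a bitmap that leaves a fair share of slots empty is likely to give tighter results than the nearly full one in the run above.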