Python使用布隆过滤器

安装
Python
pip <span class="hljs-keyword">install</span> pybloom
1
pip < span class = "hljs-keyword" > install < / span > pybloom

该模块包含两个类实现布隆过滤器功能。
BloomFilter 是定容。
ScalableBloomFilter 可以自动扩容

使用
Python
<span class="hljs-meta">></span><span class="bash">>> from pybloom import BloomFilter</span> <span class="hljs-meta">></span><span class="bash">>> f = BloomFilter(capacity=1000, error_rate=0.001) <span class="hljs-comment"># capacity是容量, error_rate 是能容忍的误报率</span></span> <span class="hljs-meta">></span><span class="bash">>> f.add(<span class="hljs-string">'Traim304'</span>) <span class="hljs-comment"># 当不存在该元素,返回False</span></span> False <span class="hljs-meta">></span><span class="bash">>> f.add(<span class="hljs-string">'Traim304'</span>) <span class="hljs-comment"># 若存在,返回 True</span></span> True <span class="hljs-meta">></span><span class="bash">>> <span class="hljs-string">'Traim304'</span> <span class="hljs-keyword">in</span> f <span class="hljs-comment"># 值得注意的是若返回 True。该元素可能存在, 也可能不存在。过滤器能容许存在一定的错误</span></span> True <span class="hljs-meta">></span><span class="bash">>> <span class="hljs-string">'Jacob'</span> <span class="hljs-keyword">in</span> f <span class="hljs-comment"># 但是 False。则必定不存在</span></span> False <span class="hljs-meta">></span><span class="bash">>> len(f) <span class="hljs-comment"># 当前存在的元素</span></span> 1 <span class="hljs-meta"> ></span><span class="bash">>> f = BloomFilter(capacity=1000, error_rate=0.001) </span> <span class="hljs-meta">></span><span class="bash">>> from pybloom import ScalableBloomFilter</span> <span class="hljs-meta">></span><span class="bash">>> sbf = ScalableBloomFilter(mode=ScalableBloomFilter.SMALL_SET_GROWTH)</span> <span class="hljs-meta">></span><span class="bash">>> <span class="hljs-comment"># sbf.add() 与 BloomFilter 同</span></span>
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
< span class = "hljs-meta" >> < / span > < span class = "bash" >>> from pybloom import BloomFilter < / span >
< span class = "hljs-meta" >> < / span > < span class = "bash" >>> f = BloomFilter ( capacity = 1000 , error_rate = 0.001 ) < span class = "hljs-comment" > # capacity是容量, error_rate 是能容忍的误报率</span></span>
< span class = "hljs-meta" >> < / span > < span class = "bash" >>> f . add ( < span class = "hljs-string" > 'Traim304' < / span > ) < span class = "hljs-comment" > # 当不存在该元素,返回False</span></span>
False
< span class = "hljs-meta" >> < / span > < span class = "bash" >>> f . add ( < span class = "hljs-string" > 'Traim304' < / span > ) < span class = "hljs-comment" > # 若存在,返回 True</span></span>
True
< span class = "hljs-meta" >> < / span > < span class = "bash" >>> < span class = "hljs-string" > 'Traim304' < / span > < span class = "hljs-keyword" > in < / span > f < span class = "hljs-comment" > # 值得注意的是若返回 True。该元素可能存在, 也可能不存在。过滤器能容许存在一定的错误</span></span>
True
< span class = "hljs-meta" >> < / span > < span class = "bash" >>> < span class = "hljs-string" > 'Jacob' < / span > < span class = "hljs-keyword" > in < / span > f < span class = "hljs-comment" > # 但是 False。则必定不存在</span></span>
False
< span class = "hljs-meta" >> < / span > < span class = "bash" >>> len ( f ) < span class = "hljs-comment" > # 当前存在的元素</span></span>
1
< span class = "hljs-meta" >
> < / span > < span class = "bash" >>> f = BloomFilter ( capacity = 1000 , error_rate = 0.001 ) < / span >
< span class = "hljs-meta" >> < / span > < span class = "bash" >>> from pybloom import ScalableBloomFilter < / span >
< span class = "hljs-meta" >> < / span > < span class = "bash" >>> sbf = ScalableBloomFilter ( mode = ScalableBloomFilter . SMALL_SET_GROWTH ) < / span >
< span class = "hljs-meta" >> < / span > < span class = "bash" >>> < span class = "hljs-comment" > # sbf.add() 与 BloomFilter 同</span></span>
超过误报率时抛出异常
Python
>>> f = BloomFilter(capacity=<span class="hljs-number">1000</span>, error_rate=<span class="hljs-number">0.0000001</span>) >>> <span class="hljs-keyword">for</span> <span class="hljs-keyword">a</span> <span class="hljs-keyword">in</span> range(<span class="hljs-number">1000</span>): ... _ = f.<span class="hljs-built_in">add</span>(<span class="hljs-keyword">a</span>) ... >>> <span class="hljs-built_in">len</span>(<span class="hljs-keyword">a</span>) Traceback (most recent call <span class="hljs-keyword">last</span>): File <span class="hljs-string">"<stdin>"</span>, <span class="hljs-built_in">line</span> <span class="hljs-number">1</span>, <span class="hljs-keyword">in</span> <module> TypeError: object <span class="hljs-keyword">of</span> type <span class="hljs-string">'int'</span> has no <span class="hljs-built_in">len</span>() >>> <span class="hljs-built_in">len</span>(f) <span class="hljs-number">1000</span> >>> f.<span class="hljs-built_in">add</span>(<span class="hljs-number">1000</span>) False >>> f.<span class="hljs-built_in">add</span>(<span class="hljs-number">1001</span>) <span class="hljs-comment"># 当误报率超过 error_rate 会报错</span> Traceback (most recent call <span class="hljs-keyword">last</span>): File <span class="hljs-string">"<stdin>"</span>, <span class="hljs-built_in">line</span> <span class="hljs-number">1</span>, <span class="hljs-keyword">in</span> <module> File <span class="hljs-string">"/usr/local/lib/python2.7/site-packages/pybloom/pybloom.py"</span>, <span class="hljs-built_in">line</span> <span class="hljs-number">182</span>, <span class="hljs-keyword">in</span> <span class="hljs-built_in">add</span> raise IndexError(<span class="hljs-string">"BloomFilter is at capacity"</span>) IndexError: BloomFilter is <span class="hljs-keyword">at</span> capacity
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
>>> f = BloomFilter ( capacity = < span class = "hljs-number" > 1000 < / span > , error_rate = < span class = "hljs-number" > 0.0000001 < / span > )
>>> < span class = "hljs-keyword" > for < / span > < span class = "hljs-keyword" > a < / span > < span class = "hljs-keyword" > in < / span > range ( < span class = "hljs-number" > 1000 < / span > ) :
. . .      _ = f . < span class = "hljs-built_in" > add < / span > ( < span class = "hljs-keyword" > a < / span > )
. . .
>>> < span class = "hljs-built_in" > len < / span > ( < span class = "hljs-keyword" > a < / span > )
Traceback ( most recent call < span class = "hljs-keyword" > last < / span > ) :
   File < span class = "hljs-string" > "<stdin>" < / span > , < span class = "hljs-built_in" > line < / span > < span class = "hljs-number" > 1 < / span > , < span class = "hljs-keyword" > in < / span > < module >
TypeError : object < span class = "hljs-keyword" > of < / span > type < span class = "hljs-string" > 'int' < / span > has no < span class = "hljs-built_in" > len < / span > ( )
>>> < span class = "hljs-built_in" > len < / span > ( f )
< span class = "hljs-number" > 1000 < / span >
>>> f . < span class = "hljs-built_in" > add < / span > ( < span class = "hljs-number" > 1000 < / span > )
False
>>> f . < span class = "hljs-built_in" > add < / span > ( < span class = "hljs-number" > 1001 < / span > ) < span class = "hljs-comment" > # 当误报率超过 error_rate 会报错</span>
Traceback ( most recent call < span class = "hljs-keyword" > last < / span > ) :
   File < span class = "hljs-string" > "<stdin>" < / span > , < span class = "hljs-built_in" > line < / span > < span class = "hljs-number" > 1 < / span > , < span class = "hljs-keyword" > in < / span > < module >
   File < span class = "hljs-string" > "/usr/local/lib/python2.7/site-packages/pybloom/pybloom.py" < / span > , < span class = "hljs-built_in" > line < / span > < span class = "hljs-number" > 182 < / span > , < span class = "hljs-keyword" > in < / span > < span class = "hljs-built_in" > add < / span >
     raise IndexError ( < span class = "hljs-string" > "BloomFilter is at capacity" < / span > )
IndexError : BloomFilter is < span class = "hljs-keyword" > at < / span > capacity

pybloom github




  • zeropython 微信公众号 5868037 QQ号 5868037@qq.com QQ邮箱
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值