BloomFilter
Basic usage
First, let's look at how Guava's BloomFilter is used.
Add the dependency
<dependency>
<groupId>com.google.guava</groupId>
<artifactId>guava</artifactId>
<version>27.0.1-jre</version>
</dependency>
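A simple example
package com.example.demo;

import com.google.common.hash.BloomFilter;
import com.google.common.hash.Funnels;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;

import java.nio.charset.StandardCharsets;

@SpringBootApplication
public class DemoApplication {

    public static void main(String[] args) {
        // Size the filter for 10,000 expected insertions at a 0.01% false positive probability
        BloomFilter<CharSequence> bloomFilter = BloomFilter.create(
                Funnels.stringFunnel(StandardCharsets.UTF_8),
                10000, 0.0001);

        // Insert the strings "0" .. "4999"
        for (int i = 0; i < 5000; i++) {
            bloomFilter.put(String.valueOf(i));
        }
        System.out.println("Data written");

        // "0".."4999" are always reported as present (a Bloom filter has no false negatives).
        // "5000".."9999" were never inserted, so any "might exist" among them is a false positive.
        for (int i = 0; i < 10000; i++) {
            if (bloomFilter.mightContain(String.valueOf(i))) {
                System.out.println(i + " might exist");
            } else {
                System.out.println(i + " does not exist");
            }
        }

        // Start the Spring Boot application (not needed for the Bloom filter demo itself)
        SpringApplication.run(DemoApplication.class, args);
    }
}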
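The demo asks `create` for 10,000 expected insertions at a false positive probability of 0.0001. What that request costs follows from the standard Bloom filter sizing formulas: m = -n ln p / (ln 2)^2 bits and k = round(m / n · ln 2) hash functions. Below is a standalone sketch of them. As far as I know, Guava's BloomFilter has package-private helpers named `optimalNumOfBits` and `optimalNumOfHashFunctions` that compute these. The class `BloomSizing` is a hypothetical re-implementation, not Guava's code.

```java
// Standalone sketch of the standard Bloom filter sizing formulas (hypothetical class;
// the method names mirror Guava's package-private helpers but this is not Guava's code).
final class BloomSizing {

    // m = -n * ln(p) / (ln 2)^2 : bits needed for n insertions at false positive rate p
    static long optimalNumOfBits(long n, double p) {
        if (p == 0) {
            p = Double.MIN_VALUE; // avoid ln(0) = -infinity
        }
        return (long) (-n * Math.log(p) / (Math.log(2) * Math.log(2)));
    }

    // k = round(m / n * ln 2), with at least one hash function
    static int optimalNumOfHashFunctions(long n, long m) {
        return Math.max(1, (int) Math.round((double) m / n * Math.log(2)));
    }

    public static void main(String[] args) {
        long m = optimalNumOfBits(10_000, 0.0001);
        int k = optimalNumOfHashFunctions(10_000, m);
        System.out.println("bits = " + m + ", hash functions = " + k);
    }
}
```

For the demo's parameters this prints `bits = 191701, hash functions = 13`, which is roughly 23 KB of bits. For n = 1000 and p = 1e-16 it returns 76680, the figure quoted in the TODO comment inside `create`. The bit count grows with -ln p, so each extra factor of 10 in precision adds a fixed amount of bits per element.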
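Under the hood, Guava stores the bit array in `long` values, each holding 64 bits. In this version, LockFreeBitArray wraps an AtomicLongArray.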
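Storing bits in longs means each bit index has to be split into a word index and a position inside that word. The sketch below shows this mapping with a plain `long[]`. The class name `LongBitArray` is hypothetical, and the class is not thread-safe. Guava's LockFreeBitArray does the same arithmetic but updates an AtomicLongArray with compare-and-set so concurrent puts are safe.

```java
// Minimal long-backed bit array (hypothetical, not thread-safe).
// Bit i lives in word i >>> 6 (i / 64), at position i & 63 within that word.
final class LongBitArray {
    private final long[] data;

    LongBitArray(long numBits) {
        // Round up so numBits fits into whole 64-bit words
        this.data = new long[(int) ((numBits + 63) / 64)];
    }

    // Sets bit i; returns true if the bit was previously 0
    boolean set(long i) {
        int word = (int) (i >>> 6);
        long mask = 1L << i; // Java only uses the low 6 bits of the shift distance (i & 63)
        boolean changed = (data[word] & mask) == 0;
        data[word] |= mask;
        return changed;
    }

    // Returns true if bit i is 1
    boolean get(long i) {
        return (data[(int) (i >>> 6)] & (1L << i)) != 0;
    }

    // Capacity in bits is always a multiple of 64
    long bitSize() {
        return (long) data.length * Long.SIZE;
    }
}
```

Asking for 100 bits therefore allocates two longs (128 bits). Bit 70 lands in word 1 at position 6, which is independent of bit 6 in word 0.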
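Source code analysis
Guava's Bloom filter is implemented mainly in two classes: BloomFilter and BloomFilterStrategies.
Without further ado, let's go straight to the source.
BloomFilter has four fields: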
/** The bit set of the BloomFilter (not necessarily power of 2!) */
private final LockFreeBitArray bits;
/** Number of hashes per element */
private final int numHashFunctions;
/** The funnel to translate Ts to bytes */
private final Funnel<? super T> funnel;
/** The strategy we employ to map an element T to {@code numHashFunctions} bit indexes. */
private final Strategy strategy;
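- funnel: Funnel is an interface defined in Guava that works together with PrimitiveSink. Its job is to break an object of any type down into Java primitive values (char, byte, int, …). The default implementation is based on java.nio.ByteBuffer, and everything is ultimately converted into a byte array.
- strategy: Strategy is an interface nested inside the BloomFilter class with three methods: put (insert an element), mightContain (test whether an element might be present) and ordinal. It is implemented by BloomFilterStrategies, which is an enum.
- numHashFunctions: the number of hash functions.
- bits: LockFreeBitArray (defined in BloomFilterStrategies) encapsulates the operations on the bit array, such as setting a bit to 1 and returning the size of the array in bits.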
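To see how the four fields cooperate, here is a toy filter that mirrors them. The class `ToyBloomFilter` is hypothetical, and several pieces are swapped in. A `java.util.BitSet` stands in for LockFreeBitArray. A `Function<T, byte[]>` stands in for Funnel. FNV-1a plus the MurmurHash3 `fmix64` finalizer replace Guava's Murmur3_128. The index derivation is double hashing (Kirsch-Mitzenmacher): index i is h1 + i·h2 with the sign bit cleared, reduced modulo the bit count. This follows my reading of Guava's MURMUR128_MITZ_64 strategy but should be treated as an approximation.

```java
import java.util.BitSet;
import java.util.function.Function;

// Toy Bloom filter mirroring the four fields of Guava's BloomFilter (hypothetical class).
// Substitutions: BitSet for LockFreeBitArray, Function<T, byte[]> for Funnel,
// FNV-1a plus MurmurHash3's fmix64 finalizer instead of Murmur3_128.
final class ToyBloomFilter<T> {
    private final BitSet bits;                        // the bit array ("bits")
    private final int bitSize;                        // number of addressable bits
    private final int numHashFunctions;               // "numHashFunctions"
    private final Function<? super T, byte[]> funnel; // turns a T into bytes ("funnel")

    ToyBloomFilter(Function<? super T, byte[]> funnel, int bitSize, int numHashFunctions) {
        if (bitSize <= 0 || numHashFunctions <= 0) {
            throw new IllegalArgumentException("bitSize and numHashFunctions must be > 0");
        }
        this.bits = new BitSet(bitSize);
        this.bitSize = bitSize;
        this.numHashFunctions = numHashFunctions;
        this.funnel = funnel;
    }

    // The "strategy": indexes h1, h1 + h2, h1 + 2*h2, ... (double hashing).
    // Returns true if at least one bit changed from 0 to 1.
    boolean put(T element) {
        long[] h = hashes(element);
        long combined = h[0];
        boolean bitsChanged = false;
        for (int i = 0; i < numHashFunctions; i++) {
            int index = toIndex(combined);
            if (!bits.get(index)) {
                bits.set(index);
                bitsChanged = true;
            }
            combined += h[1];
        }
        return bitsChanged;
    }

    // True if all k bits are set; a single 0 bit proves the element was never added.
    boolean mightContain(T element) {
        long[] h = hashes(element);
        long combined = h[0];
        for (int i = 0; i < numHashFunctions; i++) {
            if (!bits.get(toIndex(combined))) {
                return false;
            }
            combined += h[1];
        }
        return true;
    }

    // Clear the sign bit, then reduce modulo the bit-array size.
    private int toIndex(long combined) {
        return (int) ((combined & Long.MAX_VALUE) % bitSize);
    }

    // Two 64-bit hashes of the funneled bytes.
    private long[] hashes(T element) {
        long h1 = 0xcbf29ce484222325L; // FNV-1a 64-bit offset basis
        for (byte b : funnel.apply(element)) {
            h1 ^= (b & 0xff);
            h1 *= 0x100000001b3L;      // FNV-1a 64-bit prime
        }
        return new long[] {h1, fmix64(h1)};
    }

    // MurmurHash3's 64-bit finalizer, used here to derive a second hash from the first.
    private static long fmix64(long h) {
        h ^= h >>> 33;
        h *= 0xff51afd7ed558ccdL;
        h ^= h >>> 33;
        h *= 0xc4ceb9fe1a85ec53L;
        h ^= h >>> 33;
        return h;
    }
}
```

For example, `new ToyBloomFilter<String>(s -> s.getBytes(StandardCharsets.UTF_8), 10_000, 7)` behaves like a small version of the demo filter. Bits only ever go from 0 to 1, so an inserted element is always reported as present. False positives appear only when other elements have already set all k of its bits.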
create
static <T> BloomFilter<T> create(
    Funnel<? super T> funnel, long expectedInsertions, double fpp, Strategy strategy) {
  checkNotNull(funnel);
  checkArgument(
      expectedInsertions >= 0, "Expected insertions (%s) must be >= 0", expectedInsertions);
  checkArgument(fpp > 0.0, "False positive probability (%s) must be > 0.0", fpp);
  checkArgument(fpp < 1.0, "False positive probability (%s) must be < 1.0", fpp);
  checkNotNull(strategy);
  if (expectedInsertions == 0) {
    expectedInsertions = 1;
  }
  /*
   * TODO(user): Put a warning in the javadoc about tiny fpp values, since the resulting size
   * is proportional to -log(p), but there is not much of a point after all, e.g.
   * optimalM(1000, 0.0000000000000001) = 76680 which is less