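Google's Guava library ships a BloomFilter implementation. Guava's BloomFilter and Redis bitmaps are both built on a bit array, so a Bloom filter can be implemented on top of Redis as well. The difference is where the bits live: Guava keeps them in the local JVM heap, which costs application memory and cannot be shared between nodes, so it is a poor fit for distributed systems, whereas a Redis bitmap lives on a separate Redis server that every instance can reach. This post therefore shares a Redis-based BloomFilter (it helps to read up on Redis bitmaps first: https://blog.csdn.net/u012888052/article/details/80380143).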
Straight to the code:
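import com.google.common.hash.Funnels;
import com.google.common.hash.Hashing;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.context.properties.ConfigurationProperties;
import org.springframework.dao.DataAccessException;
import org.springframework.data.redis.connection.RedisConnection;
import org.springframework.data.redis.core.RedisCallback;
import org.springframework.data.redis.core.RedisTemplate;
import org.springframework.lang.Nullable;
import org.springframework.stereotype.Component;

// Assumption: javax.annotation (Spring Boot 2.x); on Spring Boot 3.x use jakarta.annotation.PostConstruct.
import javax.annotation.PostConstruct;
import java.nio.charset.Charset;
import java.util.List;

@ConfigurationProperties("bloom.filter")
@Component
public class RedisBloomFilter {
    // Expected number of insertions
    private long expectedInsertions;
    // Acceptable false-positive probability
    private double fpp;
    @Autowired
    private RedisTemplate redisTemplate;
    // Length of the bit array
    private long numBits;
    // Number of hash functions
    private int numHashFunctions;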
    public long getExpectedInsertions() {
        return expectedInsertions;
    }
    public void setExpectedInsertions(long expectedInsertions) {
        this.expectedInsertions = expectedInsertions;
    }
    public void setFpp(double fpp) {
        this.fpp = fpp;
    }
    public double getFpp() {
        return fpp;
    }
    @PostConstruct
    public void init() {
        this.numBits = optimalNumOfBits(expectedInsertions, fpp);
        this.numHashFunctions = optimalNumOfHashFunctions(expectedInsertions, numBits);
    }
    // Computes the number of hash functions: k = max(1, round(m / n * ln 2))
    private int optimalNumOfHashFunctions(long n, long m) {
        return Math.max(1, (int) Math.round((double) m / n * Math.log(2)));
    }
    // Computes the bit-array length: m = -n * ln(p) / (ln 2)^2
    private long optimalNumOfBits(long n, double p) {
        if (p == 0) {
            p = Double.MIN_VALUE;
        }
        return (long) (-n * Math.log(p) / (Math.log(2) * Math.log(2)));
    }
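    /**
     * Checks whether the key may be in the set (false positives are possible,
     * false negatives are not).
     */
    public boolean isExist(String key) {
        long[] indexs = getIndexs(key);
        List list = redisTemplate.executePipelined(new RedisCallback<Object>() {
            @Nullable
            @Override
            public Object doInRedis(RedisConnection redisConnection) throws DataAccessException {
                // executePipelined opens and closes the pipeline itself, so the callback
                // only queues commands; it must not close the connection and must return null.
                for (long index : indexs) {
                    redisConnection.getBit("bf:hilite".getBytes(), index);
                }
                return null;
            }
        });
        // The key can only be present if every queried bit is set.
        return !list.contains(false);
    }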
    /**
     * Stores the key in the Redis bitmap.
     */
    public void put(String key) {
        long[] indexs = getIndexs(key);
        redisTemplate.executePipelined(new RedisCallback<Object>() {
            @Nullable
            @Override
            public Object doInRedis(RedisConnection redisConnection) throws DataAccessException {
                // executePipelined opens and closes the pipeline itself, so the callback
                // only queues commands; it must not close the connection and must return null.
                for (long index : indexs) {
                    redisConnection.setBit("bf:hilite".getBytes(), index, true);
                }
                return null;
            }
        });
    }
    /**
     * Derives the bitmap offsets for a key.
     */
    private long[] getIndexs(String key) {
        long hash1 = hash(key);
        long hash2 = hash1 >>> 16;
        long[] result = new long[numHashFunctions];
        for (int i = 0; i < numHashFunctions; i++) {
            // Double hashing: the i-th offset is derived from hash1 + i * hash2.
            long combinedHash = hash1 + i * hash2;
            if (combinedHash < 0) {
                // Flip the bits so the value is non-negative before taking the modulus.
                combinedHash = ~combinedHash;
            }
            result[i] = combinedHash % numBits;
        }
        return result;
    }
    /**
     * Computes a 64-bit hash of the key.
     */
    private long hash(String key) {
        Charset charset = Charset.forName("UTF-8");
        return Hashing.murmur3_128().hashObject(key, Funnels.stringFunnel(charset)).asLong();
    }
}
YAML configuration:
bloom:
  filter:
    expectedInsertions: 1000
    fpp: 0.001
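To make this configuration concrete, the sketch below plugs expectedInsertions = 1000 and fpp = 0.001 into the same two formulas the class uses. The class name `BloomSizingExample` is hypothetical and exists only for this illustration.

```java
// Illustrative only: BloomSizingExample is a hypothetical helper, not part of RedisBloomFilter.
final class BloomSizingExample {

    // Same formula as optimalNumOfBits: m = -n * ln(p) / (ln 2)^2
    static long optimalNumOfBits(long n, double p) {
        if (p == 0) {
            p = Double.MIN_VALUE;
        }
        return (long) (-n * Math.log(p) / (Math.log(2) * Math.log(2)));
    }

    // Same formula as optimalNumOfHashFunctions: k = max(1, round(m / n * ln 2))
    static int optimalNumOfHashFunctions(long n, long m) {
        return Math.max(1, (int) Math.round((double) m / n * Math.log(2)));
    }

    public static void main(String[] args) {
        long n = 1000;      // bloom.filter.expectedInsertions
        double p = 0.001;   // bloom.filter.fpp
        long m = optimalNumOfBits(n, p);
        int k = optimalNumOfHashFunctions(n, m);
        System.out.println("numBits = " + m);          // 14377
        System.out.println("numHashFunctions = " + k); // 10
    }
}
```

With these settings the bitmap needs 14377 bits (under 2 KB), and every put or isExist call touches 10 bit offsets.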
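The overall flow is:
1: Compute the bit-array length and the number of hash functions from the expected number of insertions and the acceptable error rate.
2: Hash the key and, using the number of hash functions, derive an array of distinct bit offsets for that key; these offsets are what the key is matched against.
3: Iterate over the key's offsets and set the bit at each offset to 1 in the bitmap stored under the single Redis key bf:hilite.
4: To test membership, convert the key to its offset array in the same way and check each bit with getBit(); the key may be present only if every bit is set.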
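The four steps above can be tried without a Redis server. The following is a minimal sketch under stated assumptions, not the class from this post: a `java.util.BitSet` stands in for the Redis bitmap, a 64-bit FNV-1a hash replaces Guava's murmur3_128 so that no dependency is needed, and the class name `LocalBloomFilterSketch` is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

// Hypothetical in-memory stand-in: a BitSet plays the role of the Redis bitmap "bf:hilite".
final class LocalBloomFilterSketch {
    private final BitSet bits;
    private final long numBits;
    private final int numHashFunctions;

    LocalBloomFilterSketch(long expectedInsertions, double fpp) {
        // Step 1: derive the bit-array length and the number of hash functions.
        this.numBits = (long) (-expectedInsertions * Math.log(fpp) / (Math.log(2) * Math.log(2)));
        this.numHashFunctions =
                Math.max(1, (int) Math.round((double) numBits / expectedInsertions * Math.log(2)));
        this.bits = new BitSet((int) numBits); // the sketch only supports int-sized bit arrays
    }

    // Assumption: 64-bit FNV-1a is swapped in for Guava's murmur3_128.
    private static long hash(String key) {
        long h = 0xcbf29ce484222325L;
        for (byte b : key.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xff);
            h *= 0x100000001b3L;
        }
        return h;
    }

    // Step 2: combine two hash values into numHashFunctions offsets.
    private int[] indexes(String key) {
        long hash1 = hash(key);
        long hash2 = hash1 >>> 16;
        int[] result = new int[numHashFunctions];
        for (int i = 0; i < numHashFunctions; i++) {
            long combined = hash1 + i * hash2;
            if (combined < 0) {
                combined = ~combined;
            }
            result[i] = (int) (combined % numBits);
        }
        return result;
    }

    // Step 3: set every offset to 1 (setBit in the Redis version).
    void put(String key) {
        for (int index : indexes(key)) {
            bits.set(index);
        }
    }

    // Step 4: the key may be present only if every offset is 1 (getBit in the Redis version).
    boolean isExist(String key) {
        for (int index : indexes(key)) {
            if (!bits.get(index)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        LocalBloomFilterSketch filter = new LocalBloomFilterSketch(1000, 0.001);
        filter.put("user:42");
        System.out.println(filter.isExist("user:42")); // true: inserted keys are never reported missing
        System.out.println(filter.isExist("user:43")); // usually false; a false positive is possible
    }
}
```

The Redis version differs only in where the bits are stored: each `bits.set` becomes a pipelined setBit and each `bits.get` a pipelined getBit against bf:hilite.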
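The code also shows how the parameters relate. Once numBits is substituted, the number of hash functions numHashFunctions is roughly -log2(fpp): it does not depend on expectedInsertions, and it grows as the acceptable error rate fpp shrinks (logarithmically, not in strict inverse proportion). The bit-array length numBits is directly proportional to expectedInsertions and also grows as fpp shrinks, in proportion to ln(1/fpp).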
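These relationships can be checked numerically. The sketch below repeats the two sizing formulas in a hypothetical class, `BloomScalingDemo`, and prints the results for two insertion counts and two error rates; the chosen values are arbitrary examples.

```java
// Hypothetical demo class; the two static methods repeat the formulas from RedisBloomFilter.
final class BloomScalingDemo {

    // m = -n * ln(p) / (ln 2)^2
    static long numBits(long n, double p) {
        return (long) (-n * Math.log(p) / (Math.log(2) * Math.log(2)));
    }

    // k = max(1, round(m / n * ln 2))
    static int numHashFunctions(long n, long m) {
        return Math.max(1, (int) Math.round((double) m / n * Math.log(2)));
    }

    public static void main(String[] args) {
        long[] insertions = {1_000L, 1_000_000L};
        double[] rates = {0.01, 0.001};
        for (double p : rates) {
            for (long n : insertions) {
                long m = numBits(n, p);
                int k = numHashFunctions(n, m);
                System.out.printf("n=%d p=%s -> numBits=%d numHashFunctions=%d%n", n, p, m, k);
            }
        }
    }
}
```

Running it should show numBits growing roughly 1000-fold when n does, while numHashFunctions stays at 7 for p = 0.01 and at 10 for p = 0.001 regardless of n.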