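What is a Bloom filter
A **Bloom filter** is used to quickly test whether an element may belong to a set. Because it is backed by a **bit array**, a Bloom filter **takes up very little space**.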
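Initialization
- Initializing a Bloom filter takes two parameters, the **false-positive rate** and the **expected number of elements**; the **tryInit** method performs the initialization.
- The expected number of elements and the false-positive rate determine the **length of the bit array** and the **number of hash functions**. If the array is too long it **takes up more space**; if there are too many hash functions the Bloom filter becomes **less efficient**, because every add or lookup has to compute and touch more bit positions.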
```java
@Bean
public RBloomFilter<String> userRegisterCachePenetrationBloomFilter(RedissonClient redissonClient, UserRegisterBloomFilterProperties userRegisterBloomFilterProperties) {
    RBloomFilter<String> cachePenetrationBloomFilter = redissonClient.getBloomFilter(userRegisterBloomFilterProperties.getName());
    cachePenetrationBloomFilter.tryInit(userRegisterBloomFilterProperties.getExpectedInsertions(), userRegisterBloomFilterProperties.getFalseProbability());
    return cachePenetrationBloomFilter;
}
```
```java
@Override
public boolean tryInit(long expectedInsertions, double falseProbability) {
    if (falseProbability > 1) {
        throw new IllegalArgumentException("Bloom filter false probability can't be greater than 1");
    }
    if (falseProbability < 0) {
        throw new IllegalArgumentException("Bloom filter false probability can't be negative");
    }

    // Derive the bit array length from the two parameters
    size = optimalNumOfBits(expectedInsertions, falseProbability);
    if (size == 0) {
        throw new IllegalArgumentException("Bloom filter calculated size is " + size);
    }
    if (size > getMaxSize()) {
        throw new IllegalArgumentException("Bloom filter size can't be greater than " + getMaxSize() + ". But calculated size is " + size);
    }
    // Derive the number of hash functions
    hashIterations = optimalNumOfHashFunctions(expectedInsertions, size);

    CommandBatchService executorService = new CommandBatchService(commandExecutor);
    // The assert fails if a config has already been stored under this name
    executorService.evalReadAsync(configName, codec, RedisCommands.EVAL_VOID,
            "local size = redis.call('hget', KEYS[1], 'size');" +
                    "local hashIterations = redis.call('hget', KEYS[1], 'hashIterations');" +
                    "assert(size == false and hashIterations == false, 'Bloom filter config has been changed')",
            Arrays.<Object>asList(configName), size, hashIterations);
    // Store the config
    executorService.writeAsync(configName, StringCodec.INSTANCE,
            new RedisCommand<Void>("HMSET", new VoidReplayConvertor()), configName,
            "size", size, "hashIterations", hashIterations,
            "expectedInsertions", expectedInsertions, "falseProbability", BigDecimal.valueOf(falseProbability).toPlainString());
    try {
        executorService.execute();
    } catch (RedisException e) {
        if (e.getMessage() == null || !e.getMessage().contains("Bloom filter config has been changed")) {
            throw e;
        }
        // Already initialized: load the existing config and report false
        readConfig();
        return false;
    }
    return true;
}
```
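The behaviour worth noticing in `tryInit` is that only the first caller's configuration is stored; a later call leaves it untouched and returns `false`. As a rough in-memory analogy (not Redisson code), the sketch below uses a `ConcurrentHashMap` in place of the Redis config hash. `InitOnceRegistry` and `Config` are hypothetical names introduced only for this illustration.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

final class InitOnceRegistry {

    // Hypothetical stand-in for the size / hashIterations fields of the stored config.
    static final class Config {
        final long size;
        final int hashIterations;

        Config(long size, int hashIterations) {
            this.size = size;
            this.hashIterations = hashIterations;
        }
    }

    private final ConcurrentMap<String, Config> configs = new ConcurrentHashMap<>();

    // Returns true only if this call stored the config; an existing config is never overwritten.
    boolean tryInit(String name, long size, int hashIterations) {
        return configs.putIfAbsent(name, new Config(size, hashIterations)) == null;
    }

    Config configOf(String name) {
        return configs.get(name);
    }
}
```

This is why the bean definition above can call `tryInit` on every application start: once the filter has been initialized, the call becomes a no-op that returns `false`.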
```java
// Computes the length of the bit array
private long optimalNumOfBits(long n, double p) {
    if (p == 0) {
        p = Double.MIN_VALUE;
    }
    return (long) (-n * Math.log(p) / (Math.log(2) * Math.log(2)));
}

// Computes the number of hash functions
private int optimalNumOfHashFunctions(long n, long m) {
    return Math.max(1, (int) Math.round((double) m / n * Math.log(2)));
}
```
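To get a feel for these two formulas, the sketch below copies them into a standalone class and prints the resulting bit-array size and hash count for a few false-positive rates. The class name `BloomSizing` and the sample value of 1,000,000 expected elements are assumptions chosen for illustration; they are not part of Redisson.

```java
// Standalone sketch using the same formulas as the two helpers above.
final class BloomSizing {

    // m = -n * ln(p) / (ln 2)^2
    static long optimalNumOfBits(long n, double p) {
        if (p == 0) {
            p = Double.MIN_VALUE;
        }
        return (long) (-n * Math.log(p) / (Math.log(2) * Math.log(2)));
    }

    // k = max(1, round(m / n * ln 2))
    static int optimalNumOfHashFunctions(long n, long m) {
        return Math.max(1, (int) Math.round((double) m / n * Math.log(2)));
    }

    public static void main(String[] args) {
        long n = 1_000_000L; // hypothetical expected element count
        for (double p : new double[] {0.1, 0.01, 0.001}) {
            long m = optimalNumOfBits(n, p);
            int k = optimalNumOfHashFunctions(n, m);
            System.out.printf("p=%.3f bits=%d (~%.2f MiB) hashes=%d%n",
                    p, m, m / 8.0 / 1024 / 1024, k);
        }
    }
}
```

For 1,000,000 elements at a 1% false-positive rate this works out to 9,585,058 bits (a little over 1 MiB) and 7 hash functions. Lowering the false-positive rate increases both numbers.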
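Adding an element
- The element is run through the hash functions to produce a series of bit indexes; **adding an element** means **setting every one of those indexes to 1**.
- If at least one of those positions **was originally 0**, the element is considered **not present** in the Bloom filter beforehand, and the add succeeds.
- If every index was already 1, the element is considered present in the Bloom filter. Because a Bloom filter **can report false positives**, the database still has to be queried to confirm.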
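The three rules above can be sketched with a plain `java.util.BitSet`. This is a minimal single-process illustration, not the Redisson implementation: the class name `TinyBloomFilter`, the constructor parameters, and the way the two base hashes are derived from `hashCode()` are all assumptions made to keep the example short.

```java
import java.util.BitSet;

final class TinyBloomFilter {
    private final BitSet bits;
    private final int size;
    private final int hashIterations;

    TinyBloomFilter(int size, int hashIterations) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashIterations = hashIterations;
    }

    // Derives hashIterations indexes from two base hashes (simplified double hashing).
    private int[] indexes(Object o) {
        long h1 = o.hashCode();
        // Assumption: a cheap second hash, forced odd so it is never zero.
        long h2 = ((h1 * 0x9E3779B97F4A7C15L) >>> 32) | 1L;
        int[] idx = new int[hashIterations];
        long hash = h1;
        for (int i = 0; i < hashIterations; i++) {
            idx[i] = (int) ((hash & Long.MAX_VALUE) % size);
            hash += h2;
        }
        return idx;
    }

    // Sets every index to 1; returns true if at least one bit was 0 before.
    boolean add(Object o) {
        boolean changed = false;
        for (int i : indexes(o)) {
            if (!bits.get(i)) {
                bits.set(i);
                changed = true;
            }
        }
        return changed;
    }

    // Returns false as soon as one index is 0; true only means "possibly present".
    boolean mightContain(Object o) {
        for (int i : indexes(o)) {
            if (!bits.get(i)) {
                return false;
            }
        }
        return true;
    }
}
```

Adding the same element twice returns `true` the first time and `false` the second time, because the second call finds all of its bits already set.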
```java
public boolean add(T object) {
    // Derive two long hash values from the element to be inserted
    long[] hashes = hash(object);

    while (true) {
        if (size == 0) {
            readConfig();
        }

        int hashIterations = this.hashIterations;
        long size = this.size;

        // Build the array of bit indexes:
        // the two hash values are combined to produce hashIterations hash values, one index each
        long[] indexes = hash(hashes[0], hashes[1], hashIterations, size);

        CommandBatchService executorService = new CommandBatchService(commandExecutor);
        addConfigCheck(hashIterations, size, executorService);
        RBitSetAsync bs = createBitSet(executorService);
        for (int i = 0; i < indexes.length; i++) {
            // Set the bit at this index to 1
            bs.setAsync(indexes[i]);
        }
        try {
            List<Boolean> result = (List<Boolean>) executorService.execute().getResponses();

            for (Boolean val : result.subList(1, result.size()-1)) {
                if (!val) {
                    // Element added successfully
                    return true;
                }
            }
            // Element already exists
            return false;
        } catch (RedisException e) {
            if (e.getMessage() == null || !e.getMessage().contains("Bloom filter config has been changed")) {
                throw e;
            }
        }
    }
}
```
```java
private long[] hash(Object object) {
    ByteBuf state = encode(object);
    try {
        return Hash.hash128(state);
    } finally {
        state.release();
    }
}

private long[] hash(long hash1, long hash2, int iterations, long size) {
    long[] indexes = new long[iterations];
    long hash = hash1;
    for (int i = 0; i < iterations; i++) {
        indexes[i] = (hash & Long.MAX_VALUE) % size;
        // How the successive hash values are derived
        if (i % 2 == 0) {
            // Next hash value
            hash += hash2;
        } else {
            // Next hash value
            hash += hash1;
        }
    }
    return indexes;
}
```
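To see how the alternating `hash2` / `hash1` increments produce the indexes, the loop can be traced with small hand-picked numbers. The class name `IndexDerivation` and the input values are hypothetical; real inputs are the two `long` values returned by `hash(object)`.

```java
import java.util.Arrays;

final class IndexDerivation {

    // Same loop as the hash(long, long, int, long) helper above.
    static long[] indexes(long hash1, long hash2, int iterations, long size) {
        long[] indexes = new long[iterations];
        long hash = hash1;
        for (int i = 0; i < iterations; i++) {
            // Clearing the sign bit keeps the remainder non-negative.
            indexes[i] = (hash & Long.MAX_VALUE) % size;
            hash += (i % 2 == 0) ? hash2 : hash1;
        }
        return indexes;
    }

    public static void main(String[] args) {
        // Hypothetical inputs: hash1 = 5, hash2 = 11, 4 hash iterations, 64 bits.
        System.out.println(Arrays.toString(indexes(5L, 11L, 4, 64L)));
        // prints [5, 16, 21, 32]
    }
}
```

The running value goes 5, 16 (5 + 11), 21 (16 + 5), 32 (21 + 11), and each value is reduced modulo the bit-array size to give an index.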
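Checking whether an element exists
- The element is run through the hash functions to produce a series of bit indexes; **checking whether the element exists** means checking **whether all of those indexes are 1**.
- If any one of those positions is 0, the element is considered **not present** in the Bloom filter.
- If **all of the indexes are 1**, the element is considered present in the Bloom filter. Because a Bloom filter **can report false positives**, the database still has to be queried to confirm.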
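Put together, the two outcomes give the usual cache-penetration guard: a negative answer is final, while a positive answer is only a hint that must be confirmed against the database. The sketch below is an illustration under stated assumptions: `GuardedLookup` is a hypothetical name, a `Predicate<String>` stands in for the Bloom filter's `contains`, and a `Map` stands in for the database.

```java
import java.util.Map;
import java.util.Optional;
import java.util.function.Predicate;

final class GuardedLookup {
    private final Predicate<String> mightContain; // stand-in for bloomFilter.contains(...)
    private final Map<String, String> database;   // stand-in for the real data store
    int databaseHits = 0;                         // counts how often the "database" is queried

    GuardedLookup(Predicate<String> mightContain, Map<String, String> database) {
        this.mightContain = mightContain;
        this.database = database;
    }

    Optional<String> find(String key) {
        if (!mightContain.test(key)) {
            // Definitely absent: skip the database entirely.
            return Optional.empty();
        }
        // Possibly present (could be a false positive): confirm with the database.
        databaseHits++;
        return Optional.ofNullable(database.get(key));
    }
}
```

A key the filter rejects never reaches the database, and a false positive costs one extra database query that comes back empty.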
```java
public boolean contains(T object) {
    // Derive two long hash values from the element being looked up
    long[] hashes = hash(object);

    while (true) {
        if (size == 0) {
            readConfig();
        }

        int hashIterations = this.hashIterations;
        long size = this.size;

        // Build the array of bit indexes:
        // the two hash values are combined to produce hashIterations hash values, one index each
        long[] indexes = hash(hashes[0], hashes[1], hashIterations, size);

        CommandBatchService executorService = new CommandBatchService(commandExecutor);
        addConfigCheck(hashIterations, size, executorService);
        RBitSetAsync bs = createBitSet(executorService);
        for (int i = 0; i < indexes.length; i++) {
            // Read the bit at this index
            bs.getAsync(indexes[i]);
        }
        try {
            List<Boolean> result = (List<Boolean>) executorService.execute().getResponses();

            for (Boolean val : result.subList(1, result.size()-1)) {
                if (!val) {
                    // A bit that is not 1 means the element is not present
                    return false;
                }
            }
            // All bits are 1, so the element is considered present
            return true;
        } catch (RedisException e) {
            if (e.getMessage() == null || !e.getMessage().contains("Bloom filter config has been changed")) {
                throw e;
            }
        }
    }
}
```
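Deletion
- A Bloom filter supports adding elements but not deleting them (clearing one element's bits could also clear bits shared with other elements), so consider rebuilding the Bloom filter periodically.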
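One way to rebuild periodically is to fill a fresh filter from the source of truth and then swap it in atomically, so readers never see a half-built filter. The sketch below is only an in-memory illustration of that swap: `RebuildableFilter` is a hypothetical name, and a `Set<String>` stands in for the filter itself.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicReference;

final class RebuildableFilter {
    // Readers always see either the old or the new filter, never a partially built one.
    private final AtomicReference<Set<String>> current = new AtomicReference<>(new HashSet<>());

    boolean mightContain(String key) {
        return current.get().contains(key);
    }

    // Builds a fresh filter from the live keys, then swaps it in; deleted keys simply drop out.
    void rebuild(Collection<String> liveKeys) {
        Set<String> fresh = new HashSet<>(liveKeys);
        current.set(fresh);
    }
}
```

With a Redis-backed filter the same idea would mean populating a filter under a new name and switching readers over once it is complete; how that switch is scheduled and performed depends on the application.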