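Bloom Filter

What is a Bloom filter

  • A **Bloom filter** is used to quickly test whether an element is in a set. Because it is backed by a **bit array**, it **takes up very little space**.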

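Initialization

  • Initializing a Bloom filter takes two parameters, the **expected number of elements** and the **false-positive rate**; the initialization itself is done by calling the **tryInit** method.
  • These two parameters determine the **length of the bit array** and the **number of hash functions**. A longer bit array **uses more space**, and more hash functions make the filter **slower**, because every add or lookup has to touch one bit per hash function.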
@Bean
public RBloomFilter<String> userRegisterCachePenetrationBloomFilter(RedissonClient redissonClient, UserRegisterBloomFilterProperties userRegisterBloomFilterProperties) {
    RBloomFilter<String> cachePenetrationBloomFilter = redissonClient.getBloomFilter(userRegisterBloomFilterProperties.getName());
    cachePenetrationBloomFilter.tryInit(userRegisterBloomFilterProperties.getExpectedInsertions(), userRegisterBloomFilterProperties.getFalseProbability());
    return cachePenetrationBloomFilter;
}
@Override
public boolean tryInit(long expectedInsertions, double falseProbability) {
    if (falseProbability > 1) {
        throw new IllegalArgumentException("Bloom filter false probability can't be greater than 1");
    }
    if (falseProbability < 0) {
        throw new IllegalArgumentException("Bloom filter false probability can't be negative");
    }

    size = optimalNumOfBits(expectedInsertions, falseProbability);
    if (size == 0) {
        throw new IllegalArgumentException("Bloom filter calculated size is " + size);
    }
    if (size > getMaxSize()) {
        throw new IllegalArgumentException("Bloom filter size can't be greater than " + getMaxSize() + ". But calculated size is " + size);
    }
    hashIterations = optimalNumOfHashFunctions(expectedInsertions, size);

    CommandBatchService executorService = new CommandBatchService(commandExecutor);
    executorService.evalReadAsync(configName, codec, RedisCommands.EVAL_VOID,
            "local size = redis.call('hget', KEYS[1], 'size');" +
                    "local hashIterations = redis.call('hget', KEYS[1], 'hashIterations');" +
                    "assert(size == false and hashIterations == false, 'Bloom filter config has been changed')",
                    Arrays.<Object>asList(configName), size, hashIterations);
    executorService.writeAsync(configName, StringCodec.INSTANCE,
                                            new RedisCommand<Void>("HMSET", new VoidReplayConvertor()), configName,
            "size", size, "hashIterations", hashIterations,
            "expectedInsertions", expectedInsertions, "falseProbability", BigDecimal.valueOf(falseProbability).toPlainString());
    try {
        executorService.execute();
    } catch (RedisException e) {
        if (e.getMessage() == null || !e.getMessage().contains("Bloom filter config has been changed")) {
            throw e;
        }
        readConfig();
        return false;
    }

    return true;
}
// Computes the length of the bit array
private long optimalNumOfBits(long n, double p) {
    if (p == 0) {
        p = Double.MIN_VALUE;
    }
    return (long) (-n * Math.log(p) / (Math.log(2) * Math.log(2)));
}
// Computes the number of hash functions
private int optimalNumOfHashFunctions(long n, long m) {
    return Math.max(1, (int) Math.round((double) m / n * Math.log(2)));
}
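
To get a feel for how the two parameters drive the sizing, the two helper formulas can be run on their own. The sketch below copies them into a standalone class; the class name `BloomSizingDemo` and the sample inputs are illustrative assumptions and are not part of Redisson.

```java
// Standalone sketch of the sizing formulas used by tryInit.
// BloomSizingDemo is a hypothetical name chosen for this example.
final class BloomSizingDemo {

    // m = -n * ln(p) / (ln 2)^2, truncated to a whole number of bits
    static long optimalNumOfBits(long n, double p) {
        if (p == 0) {
            p = Double.MIN_VALUE; // avoid ln(0)
        }
        return (long) (-n * Math.log(p) / (Math.log(2) * Math.log(2)));
    }

    // k = max(1, round(m / n * ln 2))
    static int optimalNumOfHashFunctions(long n, long m) {
        return Math.max(1, (int) Math.round((double) m / n * Math.log(2)));
    }

    public static void main(String[] args) {
        long n = 1_000_000L; // sample expected insertions (assumption)
        for (double p : new double[] {0.1, 0.01, 0.001}) {
            long m = optimalNumOfBits(n, p);
            int k = optimalNumOfHashFunctions(n, m);
            System.out.printf("p=%.3f bits=%d (~%.2f MiB) hashes=%d%n",
                    p, m, m / 8.0 / 1024 / 1024, k);
        }
    }
}
```

For one million expected elements and a 1% false-positive rate this works out to 9,585,058 bits (a little over 1 MiB) and 7 hash functions, which illustrates the trade-off: a lower false-positive rate costs both more bits and more hash computations per operation.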

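Adding an element

  • The element is run through the hash functions to obtain a series of bit indexes; **adding the element** means **setting every one of those bits to 1**.
  • If at least one of those bits **was previously 0**, the element is considered **not present** in the Bloom filter and the add succeeds.
  • If all of those bits were already 1, the element is considered present. Bloom filters **produce false positives**, however, so the database still has to be queried to confirm.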
public boolean add(T object) {
    // Derive two long hash values from the element to be inserted
    long[] hashes = hash(object);

    while (true) {
        if (size == 0) {
            readConfig();
        }

        int hashIterations = this.hashIterations;
        long size = this.size;

        // Compute the array of bit indexes:
        // the two hash values are combined to generate hashIterations hash values, each mapped to a bit index
        long[] indexes = hash(hashes[0], hashes[1], hashIterations, size);

        CommandBatchService executorService = new CommandBatchService(commandExecutor);
        addConfigCheck(hashIterations, size, executorService);
        RBitSetAsync bs = createBitSet(executorService);
        for (int i = 0; i < indexes.length; i++) {
            // Set the bit at this index to 1
            bs.setAsync(indexes[i]);
        }
        try {
            List<Boolean> result = (List<Boolean>) executorService.execute().getResponses();

            for (Boolean val : result.subList(1, result.size()-1)) {
                if (!val) {
                    // At least one bit was previously 0: the element was added
                    return true;
                }
            }
            // All bits were already 1: the element is treated as already present
            return false;
        } catch (RedisException e) {
            if (e.getMessage() == null || !e.getMessage().contains("Bloom filter config has been changed")) {
                throw e;
            }
        }
    }
}

private long[] hash(Object object) {
    ByteBuf state = encode(object);
    try {
        return Hash.hash128(state);
    } finally {
        state.release();
    }
}

private long[] hash(long hash1, long hash2, int iterations, long size) {
    long[] indexes = new long[iterations];
    long hash = hash1;
    for (int i = 0; i < iterations; i++) {
        indexes[i] = (hash & Long.MAX_VALUE) % size;
        // How each successive hash value is derived
        if (i % 2 == 0) {
            // Next hash value
            hash += hash2;
        } else {
            // Next hash value
            hash += hash1;
        }
    }
    return indexes;
}
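
The add and lookup semantics can also be tried without Redis. The following is a minimal in-memory sketch, assuming a `java.util.BitSet` in place of the Redis bitmap and a simple 64-bit mixer over `String.hashCode()` in place of `Hash.hash128`; the class `InMemoryBloomSketch` and its helper names are hypothetical. Only the index derivation mirrors the `hash(hash1, hash2, iterations, size)` method above.

```java
import java.util.BitSet;

// In-memory illustration only; not how Redisson stores the filter.
final class InMemoryBloomSketch {
    private final BitSet bits;
    private final int size;
    private final int hashIterations;

    InMemoryBloomSketch(int size, int hashIterations) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashIterations = hashIterations;
    }

    // Assumed stand-in for a 128-bit hash: two 64-bit values from hashCode().
    private static long[] baseHashes(String value) {
        long h1 = mix(value.hashCode());
        long h2 = mix(h1 ^ 0x9E3779B97F4A7C15L);
        return new long[] {h1, h2};
    }

    // SplitMix64-style bit mixer, used here only to spread the input bits.
    private static long mix(long z) {
        z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
        z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
        return z ^ (z >>> 31);
    }

    // Same index derivation as the two-hash scheme shown above.
    private int[] indexes(String value) {
        long[] h = baseHashes(value);
        int[] out = new int[hashIterations];
        long hash = h[0];
        for (int i = 0; i < hashIterations; i++) {
            out[i] = (int) ((hash & Long.MAX_VALUE) % size);
            hash += (i % 2 == 0) ? h[1] : h[0];
        }
        return out;
    }

    // Returns true if at least one bit was flipped from 0 to 1.
    boolean add(String value) {
        boolean changed = false;
        for (int idx : indexes(value)) {
            if (!bits.get(idx)) {
                bits.set(idx);
                changed = true;
            }
        }
        return changed;
    }

    // Returns false as soon as any of the element's bits is 0.
    boolean contains(String value) {
        for (int idx : indexes(value)) {
            if (!bits.get(idx)) {
                return false;
            }
        }
        return true;
    }
}
```

Adding the same value twice returns `true` the first time and `false` the second, matching the return-value convention of `add` described above.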


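Checking whether an element exists

  • The element is run through the hash functions to obtain a series of bit indexes; **checking whether the element exists** means checking **whether all of those bits are 1**.
  • If at least one of those bits is 0, the element is considered **not present** in the Bloom filter.
  • If **all of the bits are 1**, the element is considered present. Bloom filters **produce false positives**, however, so the database still has to be queried to confirm.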
public boolean contains(T object) {
    // Derive two long hash values from the element being looked up
    long[] hashes = hash(object);

    while (true) {
        if (size == 0) {
            readConfig();
        }

        int hashIterations = this.hashIterations;
        long size = this.size;

        // Compute the array of bit indexes:
        // the two hash values are combined to generate hashIterations hash values, each mapped to a bit index
        long[] indexes = hash(hashes[0], hashes[1], hashIterations, size);

        CommandBatchService executorService = new CommandBatchService(commandExecutor);
        addConfigCheck(hashIterations, size, executorService);
        RBitSetAsync bs = createBitSet(executorService);
        for (int i = 0; i < indexes.length; i++) {
            // Read the bit at this index
            bs.getAsync(indexes[i]);
        }
        try {
            List<Boolean> result = (List<Boolean>) executorService.execute().getResponses();

            for (Boolean val : result.subList(1, result.size()-1)) {
                if (!val) {
                    // Some bit is not 1: the element is treated as absent
                    return false;
                }
            }
            // All bits are 1: the element is treated as present
            return true;
        } catch (RedisException e) {
            if (e.getMessage() == null || !e.getMessage().contains("Bloom filter config has been changed")) {
                throw e;
            }
        }
    }
}
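
Because a positive answer may be a false positive, a caller typically treats the filter as a cheap pre-check and falls through to the database only when the filter says "maybe". The sketch below shows that flow under assumptions: `PenetrationGuardSketch`, the `mightContain` predicate and the `database` lookup function are hypothetical names, with the predicate standing in for something like `bloomFilter::contains`.

```java
import java.util.Optional;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical guard that consults a Bloom-filter-like predicate before the database.
final class PenetrationGuardSketch<V> {
    private final Predicate<String> mightContain;          // e.g. bloomFilter::contains
    private final Function<String, Optional<V>> database;  // assumed lookup function

    PenetrationGuardSketch(Predicate<String> mightContain,
                           Function<String, Optional<V>> database) {
        this.mightContain = mightContain;
        this.database = database;
    }

    Optional<V> find(String key) {
        if (!mightContain.test(key)) {
            // A negative answer is definite, so the database is never touched.
            return Optional.empty();
        }
        // A positive answer may be a false positive; the database decides.
        return database.apply(key);
    }
}
```

With this shape, requests for keys that were never added are rejected without a database round trip, while a false positive only costs one extra query that returns nothing.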

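Deletion

  • A Bloom filter only supports adding elements, not deleting them, because clearing a bit could also erase other elements that share it. One option is therefore to rebuild the Bloom filter periodically.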
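
One way to picture the periodic rebuild is to fill a fresh filter from the current source of truth and then swap it in atomically, so that readers never see a half-built filter. This is only a sketch under assumptions: `RebuildableFilterSketch`, the nested `MembershipFilter` interface and the `factory` supplier are hypothetical and do not correspond to any Redisson API.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Hypothetical holder that replaces the whole filter instead of deleting from it.
final class RebuildableFilterSketch {

    // Minimal filter abstraction assumed for this example.
    interface MembershipFilter {
        void add(String value);
        boolean mightContain(String value);
    }

    private final Supplier<MembershipFilter> factory;
    private final AtomicReference<MembershipFilter> current;

    RebuildableFilterSketch(Supplier<MembershipFilter> factory) {
        this.factory = factory;
        this.current = new AtomicReference<>(factory.get());
    }

    boolean mightContain(String value) {
        return current.get().mightContain(value);
    }

    void add(String value) {
        current.get().add(value);
    }

    // Build a new filter from the keys that still exist, then swap it in.
    void rebuild(Iterable<String> liveKeys) {
        MembershipFilter fresh = factory.get();
        for (String key : liveKeys) {
            fresh.add(key);
        }
        current.set(fresh);
    }
}
```

Elements added between the start of a rebuild and the swap would be lost in this simple version, so a real job would need to replay or double-write them.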