Redis集群实现布隆过滤器

最新推荐文章于 2024-07-26 15:45:58 发布

胖虎。。

最新推荐文章于 2024-07-26 15:45:58 发布

阅读量7.2k

点赞数 9

分类专栏： redis java 文章标签：布隆布隆过滤器 redis实现布隆过滤器

本文链接：https://blog.csdn.net/weixin_38003389/article/details/89918747

版权

java 同时被 2 个专栏收录

33 篇文章 2 订阅

订阅专栏

redis

6 篇文章 0 订阅

订阅专栏

前言

其实，在这之前我还真的不知道布隆过滤器是个啥，感谢群里一位大神网友，提供了相关代码和资料，至此我才知道什么叫布隆过滤器，本文就是为了指导很多像我一样的人，初识布隆过滤器，并可以结合 redis 及代码实现。

正文

本文分两部分讲解

第一部分让读者知道什么是布隆过滤器，以及适用的场景。

第二部分我们结合 redis集群实现一个简单布隆过滤器，并测试达到过滤的效果，本文的封装的代码可以直接拿到项目使用，有需要的在末尾添加微信。

介绍布隆过滤器

概念：布隆过滤器（Bloom Filter）是1970年由布隆提出的。它实际上是一个很长的二进制向量和一系列随机映射函数。布隆过滤器可以用于检索一个元素是否在一个集合中。它的优点是空间效率和查询时间都比一般的算法要好的多，缺点是有一定的误识别率和删除困难。

为什么叫布隆，是拿自己的名字命名了，概念并不能清楚的了解布隆过滤器，接下来我用自己总结的话解释给大家。

布隆过滤器的实现基础是哈希函数，但不同于哈希表的精确查找，他是一个很大很大的数组，我认为把这个数组理解成一个带坐标的（x，y，z）三维空间，每一个值在这个空间里都可以用 x，y，z 三个坐标找到他。

贴一个网上的图，虽然我知道这个图只是辅助作用。

具体操作流程

首先我们要存一个元素，这个元素的key 固定，这个元素的 value 是字符串 1，那么我们通过hash算法得到了 value 为1 的值在这个三维空间的坐标是多少了，下面看代码截图。

然后，当我们再从过滤器里判断 value 为 1 的元素存不存在我们就再次通过hash算法生成 1 的坐标，当数组内3个值均存在这个key里，我们就可以说，这个key 的value 为1 的元素存在，否则不存在，请看截图。

上面代码使用 bitmap 位存储的方式操作，简单说一句，Redis Setbit 命令用于对 key 所储存的字符串值，设置或清除指定偏移量上的位(bit)。

基本语法

redis 127.0.0.1:6379> Setbit KEY_NAME OFFSET

同理，getbit就是返回这个ket 的偏移量存不存在。

适用场景

布隆过滤器适用于大量数据的过滤，判断是否存在。比如垃圾邮箱过滤，非法ip过滤等，一般校验被判断数据在不在过滤器里，只能返回要么在，要么不在，而且不是100%准确，但是根据设置误差率可以满足大概 99.9999% 准确率。

代码实现布隆过滤器

前置准备工作

本机安装一个 redis 集群，端口按默认的，然后启动，mac用户可看博主博客 Mac系统搭建Redis集群模式，公众号用户可复制链接：https://blog.csdn.net/weixin_38003389/article/details/86295944。
创建一个 eureka-service ，端口随意，保证你的 eureka客户端能注册上，然后启动。
本文基于上篇文章的redis -tool 工程改写的，有兴趣可关注并查看博主更多文章。
redis -tool 工程pom文件，滑动滚轮即可看到pom 的内容。

redis-tool工程 pom需添加依赖

<dependency>
            <groupId>com.google.guava</groupId>
            <artifactId>guava</artifactId>
            <version>22.0</version>
        </dependency>

使用所有框架和中间件的版本

环境
框架	版本
Spring Boot	2.0.3.RELEASE
Spring Cloud	Finchley.RELEASE
redis	redis-4.0.11
JDK	1.8.x

核心类

package com.config;

import com.google.common.base.Preconditions;
import com.google.common.hash.Funnel;
import com.google.common.hash.Hashing;
import org.springframework.beans.factory.annotation.Configurable;

@Configurable
public class BloomFilterHelper<T> {
    private int numHashFunctions;
    private int bitSize;
    private Funnel<T> funnel;

    public BloomFilterHelper(Funnel<T> funnel, int expectedInsertions, double fpp) {
        Preconditions.checkArgument(funnel != null, "funnel不能为空");
        this.funnel = funnel;
        bitSize = optimalNumOfBits(expectedInsertions, fpp);
        numHashFunctions = optimalNumOfHashFunctions(expectedInsertions, bitSize);
    }

    public int[] murmurHashOffset(T value) {
        int[] offset = new int[numHashFunctions];

        long hash64 = Hashing.murmur3_128().hashObject(value, funnel).asLong();
        int hash1 = (int) hash64;
        int hash2 = (int) (hash64 >>> 32);
        for (int i = 1; i <= numHashFunctions; i++) {
            int nextHash = hash1 + i * hash2;
            if (nextHash < 0) {
                nextHash = ~nextHash;
            }
            offset[i - 1] = nextHash % bitSize;
        }

        return offset;
    }

    /**
     * 计算bit数组长度
     */
    private int optimalNumOfBits(long n, double p) {
        if (p == 0) {
            p = Double.MIN_VALUE;
        }
        return (int) (-n * Math.log(p) / (Math.log(2) * Math.log(2)));
    }

    /**
     * 计算hash方法执行次数
     */
    private int optimalNumOfHashFunctions(long n, long m) {
        return Math.max(1, (int) Math.round((double) m / n * Math.log(2)));
    }

}

封装出来的工具类

package com.redislock;

import com.config.BloomFilterHelper;
import com.google.common.base.Preconditions;
import org.springframework.stereotype.Component;
import redis.clients.jedis.JedisCluster;

@Component
public class RedisBloomFilter<T> {
    private JedisCluster cluster;

    public RedisBloomFilter(JedisCluster jedisCluster) {
        this.cluster = jedisCluster;
    }

    /**
     * 根据给定的布隆过滤器添加值
     */
    public <T> void addByBloomFilter(BloomFilterHelper<T> bloomFilterHelper, String key, T value) {
        Preconditions.checkArgument(bloomFilterHelper != null, "bloomFilterHelper不能为空");
        int[] offset = bloomFilterHelper.murmurHashOffset(value);
        for (int i : offset) {
            //redisTemplate.opsForValue().setBit(key, i, true);
            cluster.setbit(key, i, true);
        }
    }

    /**
     * 根据给定的布隆过滤器判断值是否存在
     */
    public <T> boolean includeByBloomFilter(BloomFilterHelper<T> bloomFilterHelper, String key, T value) {
        Preconditions.checkArgument(bloomFilterHelper != null, "bloomFilterHelper不能为空");
        int[] offset = bloomFilterHelper.murmurHashOffset(value);
        for (int i : offset) {
            //if (!redisTemplate.opsForValue().getBit(key, i)) {
            if (!cluster.getbit(key, i)) {
                return false;
            }
        }

        return true;
    }
}

redis 配置

spring.redis.cluster.nodes=127.0.0.1:7000,127.0.0.1:7001,127.0.0.1:7002,127.0.0.1:7003,127.0.0.1:7004,127.0.0.1:7005
spring.redis.password=
#连接池最大连接数（使用负值表示没有限制）
spring.redis.pool.max-active=8
#连接池最大阻塞等待时间（使用负值表示没有限制）
spring.redis.pool.max-wait=-1
#连接池中的最大空闲连接
spring.redis.pool.max-idle=8
#连接池中的最小空闲连接
spring.redis.pool.min-idle=0
#连接超时时间（毫秒）
spring.redis.timeout=0
spring.redis.commandTimeout=5000



/**
 * @author yanlin
 * @version v1.3
 * @date 2019-04-13 3:50 PM
 * @since v8.0
 **/
@Configuration
public class RedisConfig {
    private Logger logger = LoggerFactory.getLogger(RedisConfig.class);

    @Value("${spring.redis.cluster.nodes}")
    private String clusterNodes;
    @Value("${spring.redis.timeout}")
    private int timeout;
    @Value("${spring.redis.pool.max-idle}")
    private int maxIdle;
    @Value("${spring.redis.pool.max-wait}")
    private long maxWaitMillis;
    @Value("${spring.redis.commandTimeout}")
    private int commandTimeout;

    @Bean
    public JedisCluster getJedisCluster() {
        String[] cNodes = clusterNodes.split(",");
        Set<HostAndPort> nodes = new HashSet<>();
        // 分割出集群节点
        for (String node : cNodes) {
            String[] hp = node.split(":");
            nodes.add(new HostAndPort(hp[0], Integer.parseInt(hp[1])));
        }
        JedisPoolConfig jedisPoolConfig = new JedisPoolConfig();
        jedisPoolConfig.setMaxIdle(maxIdle);
        jedisPoolConfig.setMaxWaitMillis(maxWaitMillis);
        return new JedisCluster(nodes, commandTimeout, jedisPoolConfig);

    }


    /**
     * redis序列化
     *
     * @param connectionFactory
     * @return
     */
    @Bean
    public RedisTemplate<String, Serializable> redisTemplate(LettuceConnectionFactory connectionFactory) {
        RedisTemplate<String, Serializable> redisTemplate = new RedisTemplate<>();
        redisTemplate.setKeySerializer(new StringRedisSerializer());
        redisTemplate.setValueSerializer(new GenericJackson2JsonRedisSerializer());
        redisTemplate.setConnectionFactory(connectionFactory);
        return redisTemplate;
    }

}

测试


@SpringBootApplication
@EnableDiscoveryClient
@ComponentScan(value = {"com.annotaion", "cn.springcloud", "com.config", "com.redislock"})
public class Ch34EurekaClientApplication implements ApplicationRunner {

    private static BloomFilterHelper<CharSequence> bloomFilterHelper;

    @Autowired
    RedisBloomFilter redisBloomFilter;

    public static void main(String[] args) {
        SpringApplication.run(Ch34EurekaClientApplication.class, args);

    }

    @PostConstruct
    public void init() {
        bloomFilterHelper = new BloomFilterHelper<>(Funnels.stringFunnel(Charset.defaultCharset()), 1000, 0.1);
    }

    @Override
    public void run(ApplicationArguments args) throws Exception {



//******* Redis集群测试布隆方法*********
        int j = 0;
        for (int i = 0; i < 100; i++) {
            redisBloomFilter.addByBloomFilter(bloomFilterHelper, "bloom", i+"");
        }
        for (int i = 0; i < 1000; i++) {
            boolean result = redisBloomFilter.includeByBloomFilter(bloomFilterHelper, "bloom", i+"");
            if (!result) {
                j++;
            }
        }
        System.out.println("漏掉了" + j + "个");
    }
}

输出结果