Redis Bloom Filter

南宫拾壹

Article link: https://blog.csdn.net/mutf7/article/details/120099152

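01. What is a Bloom filter

Typical scenarios: cache-penetration failures, idempotency checks, and the recommendation feeds of Toutiao and Douyin.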

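When browsing Toutiao, you have probably noticed that it recommends articles and images based on your interests. If you like cars or photos of models, for example, it keeps recommending car articles and model photos.

The topic today is not the recommendation algorithm but the read-deduplication algorithm.

When the recommendation engine picks articles for you, it has to filter out the ones you have already read; otherwise it will recommend them again. Without read deduplication, an article you read today could be recommended to you again tomorrow. For a good user experience, content you have already read must be deduplicated.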

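The obvious way to implement read deduplication is to store the history in a database. For a top-tier internet company this is a problem: in a high-concurrency scenario like Toutiao, querying the database on every recommendation will overwhelm the database. So this is handled with a Redis cache instead.

Since Redis is involved, which data structure should be used?

Most people think of a set.

A set is a good deduplication structure and works for small amounts of data, but it is a poor fit for large volumes, such as several months of read history, because it stores every element in full. The usual industry practice is to use a Bloom filter.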
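
To make the idea concrete, here is a minimal in-memory sketch of the data structure in Java. It is illustrative only: the class name SimpleBloomFilter, the FNV-1a hash, and the double-hashing scheme are assumptions chosen for brevity, not how RedisBloom is implemented. The point it shows is that the filter stores only bits, never the elements themselves, and that a lookup can answer "definitely absent" or "possibly present".

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

/** Illustrative in-memory Bloom filter; not RedisBloom's implementation. */
public class SimpleBloomFilter {
    private final BitSet bits;
    private final int size;       // number of bits (m)
    private final int hashCount;  // number of hash positions per element (k)

    public SimpleBloomFilter(int size, int hashCount) {
        if (size <= 0 || hashCount <= 0) {
            throw new IllegalArgumentException("size and hashCount must be positive");
        }
        this.bits = new BitSet(size);
        this.size = size;
        this.hashCount = hashCount;
    }

    // 32-bit FNV-1a hash, perturbed by a seed so two base hashes can be derived.
    private static int fnv1a(byte[] data, int seed) {
        int h = 0x811C9DC5 ^ seed;
        for (byte b : data) {
            h ^= (b & 0xFF);
            h *= 0x01000193;
        }
        return h;
    }

    // Double hashing: the i-th bit position is (h1 + i * h2) mod size.
    private int position(byte[] data, int i) {
        int h1 = fnv1a(data, 0);
        int h2 = fnv1a(data, 0x9E3779B9);
        return Math.floorMod(h1 + i * h2, size);
    }

    /** Sets the k bits that correspond to the element. */
    public void add(String item) {
        byte[] data = item.getBytes(StandardCharsets.UTF_8);
        for (int i = 0; i < hashCount; i++) {
            bits.set(position(data, i));
        }
    }

    /** false means definitely never added; true means possibly added. */
    public boolean mightContain(String item) {
        byte[] data = item.getBytes(StandardCharsets.UTF_8);
        for (int i = 0; i < hashCount; i++) {
            if (!bits.get(position(data, i))) {
                return false;
            }
        }
        return true;
    }
}
```

An element that was added always reports true, so there are no false negatives. An element that was never added can still report true when other elements happen to have set all of its bits; that is the false-positive rate the error_rate parameter controls.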

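02. Installation

Redis itself (6.2.4 is used here) does not include a Bloom filter. It is provided by RedisBloom, a module developed by Redis Labs, which must be built and loaded before use:

Download (source on GitHub; also available from the Redis website): https://github.com/RedisLabsModules/redisbloom/

Java client (JRedisBloom) documentation: https://github.com/RedisBloom/JRedisBloom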

2.1 Download the Bloom filter module

wget https://github.com/RedisLabsModules/rebloom/archive/v1.1.1.tar.gz
tar -zxvf v1.1.1.tar.gz

2.2 Build the module

cd RedisBloom-1.1.1
make

After a successful build, the directory contains a .so file.

2.3 Load the Bloom filter module into Redis

1. Add the following loadmodule directive to redis.conf

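# In a Redis cluster, every node's configuration file needs this line.
# loadmodule takes the path to the compiled .so file (rebloom.so for the 1.1.1 build above).
loadmodule /example/redis/RedisBloom-1.1.1/rebloom.so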

2. Restart Redis after adding the configuration

>cd <redis install directory>
>ps -ef | grep redis
>kill <redis pid>
>src/redis-server ./redis.conf

3. Connect to the Redis server with the client

[root@iZwz94p9y07ns86pck1l2jZ redis-6.2.4]# src/redis-cli 
127.0.0.1:6379> auth mkxiaoer
OK
127.0.0.1:6379> bf.add filter 2
(integer) 1

03. Bloom filter commands

BF.RESERVE {key} {error_rate} {capacity}

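error_rate: the false-positive rate the filter is allowed to have. The lower the value, the larger the filter's bit array and the more memory it uses.
capacity: the number of elements expected to be stored. Once the actual number of stored elements exceeds this value, the filter's accuracy degrades.

Note: to use custom parameters, create the filter explicitly with bf.reserve before the first add.

If the key already exists, bf.reserve returns an error.

If bf.reserve is not used, bf.add creates the filter with the module defaults: error_rate 0.01 and capacity 100.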
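
To get a feel for how error_rate and capacity translate into memory, the textbook Bloom filter sizing formulas can be evaluated directly. The sketch below is an estimate under that assumption; the class and method names are hypothetical, and the memory RedisBloom actually allocates may differ from these figures.

```java
/** Textbook Bloom filter sizing; an estimate, not RedisBloom's exact allocation. */
public class BloomSizing {

    /** Bits needed for n elements at false-positive rate p: m = -n * ln(p) / (ln 2)^2. */
    public static long optimalBits(long capacity, double errorRate) {
        double ln2 = Math.log(2);
        return (long) Math.ceil(-capacity * Math.log(errorRate) / (ln2 * ln2));
    }

    /** Number of hash functions that minimizes false positives: k = (m / n) * ln 2. */
    public static int optimalHashes(long bits, long capacity) {
        return Math.max(1, (int) Math.round((double) bits / capacity * Math.log(2)));
    }
}
```

For error_rate 0.01 and capacity 100, optimalBits(100, 0.01) gives 959 bits (about 120 bytes) and optimalHashes(959, 100) gives 7. Lowering error_rate while keeping capacity fixed increases the bit count, which matches the description of error_rate above.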

127.0.0.1:6379> bf.reserve filter 0.01 100
OK
127.0.0.1:6379> bf.add filter 1
(integer) 1
127.0.0.1:6379> bf.exists filter 1
(integer) 1
127.0.0.1:6379> bf.madd filter 1
1) (integer) 0
127.0.0.1:6379> bf.madd filter 1 2 3 4 5 6
1) (integer) 0
2) (integer) 1
3) (integer) 1
4) (integer) 1
5) (integer) 1
6) (integer) 1
127.0.0.1:6379> bf.madd filter 1 2 3 4 5 6 
1) (integer) 0
2) (integer) 0
3) (integer) 0
4) (integer) 0
5) (integer) 0
6) (integer) 0
127.0.0.1:6379> bf.madd filter 1 2 3 4 5 6 7 6
1) (integer) 0
2) (integer) 0
3) (integer) 0
4) (integer) 0
5) (integer) 0
6) (integer) 0
7) (integer) 1
8) (integer) 0
127.0.0.1:6379> bf.mexists filter 1 2 3 4 5 6 7 6
1) (integer) 1
2) (integer) 1
3) (integer) 1
4) (integer) 1
5) (integer) 1
6) (integer) 1
7) (integer) 1
8) (integer) 1

04. Using a Bloom filter to solve cache penetration

4.1 Add the Bloom filter client dependency

<dependency>
    <groupId>com.redislabs</groupId>
    <artifactId>jrebloom</artifactId>
    <version>2.1.0</version>
</dependency>

4.2 Configuration properties

redis:
  bloom:
    host: 
    port: 6379
    password: 
    capacity: 100
    rate: 0.01

4.3 Bloom filter configuration class

package com.example.config;

import io.rebloom.client.Client;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import redis.clients.jedis.JedisPool;
import redis.clients.jedis.JedisPoolConfig;

@Configuration
public class RedisBloomConfiguration {

    private static final Logger log = LoggerFactory.getLogger(RedisBloomConfiguration.class);

    // Redis host
    @Value("${redis.bloom.host}")
    private String host;
    // Redis password
    @Value("${redis.bloom.password}")
    private String password;
    // Redis port
    @Value("${redis.bloom.port}")
    private Integer port;
    // Expected capacity (number of elements)
    @Value("${redis.bloom.capacity}")
    private Integer capacity;
    // Target error rate
    @Value("${redis.bloom.rate}")
    private Double rate;

    @Bean
    public JedisPool jedisPool() {
        JedisPoolConfig poolConfig = new JedisPoolConfig();
        poolConfig.setMaxIdle(8);
        poolConfig.setMaxTotal(8);
        poolConfig.setMaxWaitMillis(5 * 1000);
        JedisPool jp = new JedisPool(poolConfig, host, port,
                3 * 1000, password, 0);
        return jp;
    }

    @Bean
    public Client rebloomClient(JedisPool pool) {
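        // 1: Create the Bloom filter client
        Client client = new Client(pool);
        try {
            // 2: Reserve the filter with the configured capacity and error rate.
            // The key must match the one used by addMulti/existsMulti elsewhere.
            client.createFilter("redis:bloom:category", capacity, rate);
        } catch (Exception ex) {
            log.info("Bloom filter already exists, exception message: {}", ex.getMessage());
        }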
        return client;
    }
}

4.4 Load initial data into the Bloom filter

package com.example.service;

import com.baomidou.mybatisplus.core.conditions.query.QueryWrapper;
import com.baomidou.mybatisplus.extension.service.impl.ServiceImpl;
import com.example.entity.Category;
import com.example.mapper.CategoryMapper;
import io.rebloom.client.Client;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;
import org.springframework.util.CollectionUtils;

import javax.annotation.PostConstruct;
import java.util.List;

@Service
public class CategoryServiceImpl extends ServiceImpl<CategoryMapper, Category> implements CategoryService {

    @Autowired
    private Client rebloomClient;

    /**
     * Load the data that needs to be filtered into the Bloom filter at startup
     */
    @PostConstruct
    public void initBloomData() {
        List<Category> categories = this.findCategory(0);
        if (!CollectionUtils.isEmpty(categories)) {
            // Put the ids of the first-level categories into the Bloom filter
            String[] ids = new String[categories.size()];
            for (int i = 0; i < categories.size(); i++) {
                ids[i] = categories.get(i).getId() + "";
            }
            // Equivalent to bf.madd
            rebloomClient.addMulti("redis:bloom:category", ids);
        }
    }

    @Override
    public List<Category> findCategory(Integer cid) {
        QueryWrapper<Category> queryWrapper = new QueryWrapper<>();
        queryWrapper.eq("pid", cid);
        return this.list(queryWrapper);
    }
}

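Question 1: how can elements created later be added to the Bloom filter?

Scheduled task
Restart the service (always as a rolling release): with three servers, for example, stop one and start it again, then stop the next and start it again, and so on
Add the new data to the Bloom filter directly
MQ synchronization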
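
As one way to picture the scheduled-task option, the sketch below pushes only the ids created since the previous run. Everything in it is hypothetical: the class name BloomIncrementalSync, the assumption that ids are auto-incrementing numbers, and the two callbacks, which stand in for a database query and for a call such as addMulti on the Bloom filter client.

```java
import java.util.List;
import java.util.function.Consumer;
import java.util.function.LongFunction;

/** Hypothetical helper: adds ids created since the last run to a Bloom filter. */
public class BloomIncrementalSync {
    private final LongFunction<List<Long>> idsGreaterThan; // stands in for a DB query
    private final Consumer<String[]> bloomAdd;             // stands in for bf.madd
    private long lastSyncedId = 0;

    public BloomIncrementalSync(LongFunction<List<Long>> idsGreaterThan,
                                Consumer<String[]> bloomAdd) {
        this.idsGreaterThan = idsGreaterThan;
        this.bloomAdd = bloomAdd;
    }

    /** One run of the scheduled task; returns how many ids were added. */
    public synchronized int runOnce() {
        List<Long> newIds = idsGreaterThan.apply(lastSyncedId);
        if (newIds.isEmpty()) {
            return 0;
        }
        String[] values = new String[newIds.size()];
        long max = lastSyncedId;
        for (int i = 0; i < newIds.size(); i++) {
            long id = newIds.get(i);
            values[i] = String.valueOf(id);
            max = Math.max(max, id);
        }
        bloomAdd.accept(values);
        // Advance the watermark only after the add succeeded.
        lastSyncedId = max;
        return values.length;
    }
}
```

A scheduler would call runOnce() at a fixed interval. Until the next run, newly created ids are missing from the filter and would be rejected as nonexistent, which is why adding at creation time or via MQ gives fresher results.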

4.5 Using the Bloom filter to solve cache penetration

    // KAssert, ValidationException and JsonUtil appear to be project helper classes not shown in this article.
    @Autowired
    private CategoryService categoryService;
    @Autowired
    private RedisTemplate redisTemplate;
    @Autowired
    private Client reBloomClient;

    @GetMapping("/findCategory/{cid}")
    public List<Category> findCategory(@PathVariable("cid") Integer cid) {
        // Reject the request immediately if no category id was passed in
        KAssert.isEmpty(cid, 401, "Category does not exist");
        // The filter was populated with: bf.madd redis:bloom:category "1" "2" "3" "4" "5"
        // This check is equivalent to:   bf.mexists redis:bloom:category "1"
        boolean[] booleans = reBloomClient.existsMulti("redis:bloom:category", cid + "");
        // If cid is not in the Bloom filter, the data does not exist, so return right away
        if (!booleans[0]) {
            throw new ValidationException(401, "Category does not exist");
        }

        // Does using a Redis cache really mean requests never reach the database?
        // No: a request for a nonexistent id always misses the cache, hence the check above.
        List<Category> categoryList = new ArrayList<>();
        // Try the cache first
        String categories = (String) redisTemplate.opsForValue().get("subCid:" + cid);
        if (StringUtils.isEmpty(categories)) {
            log.info("Cache miss, querying the database");
            categoryList = categoryService.findCategory(cid);
            if (!CollectionUtils.isEmpty(categoryList)) {
                redisTemplate.opsForValue().set("subCid:" + cid, JsonUtil.obj2String(categoryList));
            }
        } else {
            categoryList = JsonUtil.string2Obj(categories, List.class, Category.class);
        }
        return categoryList;
    }
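
The lookup order in the controller (Bloom filter, then cache, then database) can also be shown without Spring or Redis. The sketch below is a simplified stand-in under stated assumptions: the class name BloomGuardedLookup is hypothetical, a Predicate plays the role of bf.exists, a ConcurrentHashMap plays the role of the Redis cache, and a Function plays the role of the database query.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;
import java.util.function.Predicate;

/** Hypothetical stand-in for the Bloom filter -> cache -> database lookup order. */
public class BloomGuardedLookup<V> {
    private final Predicate<String> mightExist;      // stands in for bf.exists
    private final Function<String, V> loadFromDb;    // returns null when the row is absent
    private final Map<String, V> cache = new ConcurrentHashMap<>(); // stands in for Redis
    private final AtomicInteger dbHits = new AtomicInteger();

    public BloomGuardedLookup(Predicate<String> mightExist, Function<String, V> loadFromDb) {
        this.mightExist = mightExist;
        this.loadFromDb = loadFromDb;
    }

    public Optional<V> find(String id) {
        // 1. Ids the filter has never seen are rejected before touching cache or database.
        if (!mightExist.test(id)) {
            return Optional.empty();
        }
        // 2. Serve from the cache when possible.
        V cached = cache.get(id);
        if (cached != null) {
            return Optional.of(cached);
        }
        // 3. Fall back to the database and cache only non-empty results.
        dbHits.incrementAndGet();
        V loaded = loadFromDb.apply(id);
        if (loaded != null) {
            cache.put(id, loaded);
        }
        return Optional.ofNullable(loaded);
    }

    /** Number of times the database loader was called. */
    public int dbHits() {
        return dbHits.get();
    }
}
```

With this order, a flood of requests for ids that do not exist stops at step 1. Only false positives from the filter and genuine cache misses reach the database.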

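05. Limitations and use cases of Bloom filters

Because a Bloom filter has a certain false-positive rate, it is not suitable for business scenarios that require strictly exact results.
Blacklists
Read deduplication for Douyin and Toutiao feeds
Monitoring user behavior on a platform and pushing content users like based on that behavior
Big data analytics platforms
Blocking spam email and IP addresses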
