php+redis布隆过滤器

最新推荐文章于 2024-08-03 14:49:17 发布

Joker_Wangx

最新推荐文章于 2024-08-03 14:49:17 发布

阅读量901

点赞数 1

分类专栏： Redis 文章标签： redis php

本文链接：https://blog.csdn.net/wx940627/article/details/108849933

版权

Redis 专栏收录该内容

1 篇文章 0 订阅

订阅专栏

欢迎评论，共同学习，手动比心。

0、什么是布隆过滤器？

布隆过滤器（Bloom Filter）是1970年由布隆提出的。它实际上是一个很长的二进制向量和一系列随机映射函数。布隆过滤器可以用于检索一个元素是否在一个集合中。它的优点是空间效率和查询时间都比一般的算法要好的多，缺点是有一定的误识别率和删除困难。

时间复杂度O(1);

1、布隆过滤器的思想

如果想要判断一个元素是不是在一个集合里，一般想到的是将所有元素保存起来，然后通过比较确定。链表，树等等数据结构都是这种思路. 但是随着集合中元素的增加，我们需要的存储空间越来越大，检索速度也越来越慢(O(n),O(logn))。不过世界上还有一种叫作散列表（又叫哈希表，Hash table）的数据结构。它可以通过一个Hash函数将一个元素映射成一个位阵列（Bit array）中的一个点。

Hash面临的问题就是冲突，解决方法就是使用多个Hash函数对元素进行处理。

在这里插入图片描述

当一个元素被加入集合时，通过K个Hash函数将这个元素映射成一个位阵列（Bit array）中的K个点，把它们置为1。检索时，我们只要看看这些点是不是都是1就（大约）知道集合中有没有它了：

如果这些点有任何一个0，则被检元素一定不在；
如果都是1，则被检元素很可能在。

在这里插入图片描述

2、布隆过滤器处理流程

2.1 开辟空间

开辟一个长度为m的位阵列（Bit array）。

2.2 寻找hash函数

获取几个hash函数，前辈们已经发明了很多运行良好的hash函数，比如BKDRHash，JSHash，RSHash等等。这些hash函数我们直接获取就可以了。

2.3 写入数据

将所需要判断的内容经过这些hash函数计算，得到几个值，比如用3个hash函数，得到值分别是1000，2000，3000。之后设置m位数组的第1000，2000，3000位的值位二进制1。

2.4 判断

接下来就可以判断一个新的内容是不是在我们的集合中。判断的流程和写入的流程是一致的。

3、应用场景

缓存击穿
爬虫系统，URL的记录
黑名单，垃圾邮件的判别
集合重复元素的判别
查询加速（比如基于key-value的存储系统）

4、代码实现（php + redis）

4.1 Laravel框架实现的控制器代码

<?php

namespace App\Http\Controllers\Api;

use App\Common\CustomRedis;
use App\Http\Controllers\Api\BloomFilter;
use App\Http\Controllers\Controller;

class TestBloomController extends Controller
{

    /**
     * 向布隆过滤器压入字符串
     * 这里使用了两个hash函数
     *
     * @return false|string
     */
    public function testPut()
    {
        $redis = new CustomRedis();
        $HashFunctionArr = ['JSHash', 'PJWHash'];
        $RedisKey = 'test';

        $BloomLength = 8;//设置布隆位阵列的长度，hash后的数值对其取余
        $BloomFilter = new BloomFilter($redis, $HashFunctionArr, $RedisKey, $BloomLength);
        $res = $BloomFilter->put('joker');

        return json_encode($res);
    }

    /**
     * 判断是否存在
     *
     * @return false|string
     */
    public function testExists()
    {
        $redis = new CustomRedis();
        $HashFunctionArr = ['JSHash', 'PJWHash'];
        $RedisKey = 'test';
        $BloomFilter = new BloomFilter($redis, $HashFunctionArr, $RedisKey);

        $res = $BloomFilter->isExists('joker');

        return json_encode($res);
    }
}

4.2 BloomFilter 布隆过滤器（需要依赖redis对象）

代码中的$Redis可以替换为laravel自带的：use Illuminate\Support\Facades\Redis;

<?php

namespace App\Http\Controllers\Api;

use App\Http\Controllers\Api\BloomFilterHash;

class BloomFilter
{
    public $HashFunctionArr;
    public $Redis;
    public $RedisKey;
    public $BloomLength;

    public function __construct(Object $redis, Array $HashFunctionArr, string $RedisKey,
                                int $BloomLength = 0)
    {
        $this->HashFunctionArr = $HashFunctionArr;
        $this->Redis = $redis;
        $this->RedisKey = $RedisKey;
        $this->BloomLength = $BloomLength;
    }

    /**
     * 添加 : 将哈希函数计算后的数字，在二进制对应位置置为1
     * 返回 : hash处理后的数字
     *
     * @param string $string
     * @return mixed
     */
    public function put(string $string)
    {
        $arr = [];
        foreach($this->HashFunctionArr as $function)
        {
            $arr[] = $hash = $this->BloomLength == 0 ?
                BloomFilterHash::$function($string) :
                BloomFilterHash::$function($string) % $this->BloomLength;
            $this->Redis->setBit($this->RedisKey, $hash, 1);
        }

        return $arr;

    }

    /**
     * 查询是否存在
     *  1,存在的一定会存在
     *  2,不存在有一定几率会误判
     *
     * @param string $string
     * @return bool
     */
    public function isExists(string $string)
    {
        $len = strlen($string);
        $res = [];
        foreach($this->HashFunctionArr as $function)
        {
            $hash = $this->BloomLength == 0 ?
                BloomFilterHash::$function($string, $len) :
                BloomFilterHash::$function($string, $len) % $this->BloomLength;
            $res[] = $this->Redis->getBit($this->RedisKey, $hash);
        }
        foreach($res as $bit)
        {
            if($bit == 0)
            {
                return false;
            }
        }
        return true;
    }
}

4.3 BloomFilterHash hash方法类

<?php

namespace App\Http\Controllers\Api;

/**
 * hash方法类
 *
 * Class BloomFilterHash
 * @package App\Http\Controllers\Api
 */
class BloomFilterHash
{
    /**
     * 由Justin Sobel编写的按位散列函数
     */
    public static function JSHash($string, $len = null)
    {
        $hash = 1315423911;
        $len || $len = strlen($string);
        for($i = 0; $i < $len; $i++)
        {
            $hash ^= (($hash << 5) + ord($string[$i]) + ($hash >> 2));
        }
        return ($hash % 0xFFFFFFFF) & 0xFFFFFFFF;
    }

    /**
     * 该哈希算法基于AT＆T贝尔实验室的Peter J. Weinberger的工作。
     * Aho Sethi和Ulman编写的“编译器（原理，技术和工具）”一书建议使用采用此特定算法中的散列方法的散列函数。
     */
    public static function PJWHash($string, $len = null)
    {
        $bitsInUnsignedInt = 4 * 8; //（unsigned int）（sizeof（unsigned int）* 8）;
        $threeQuarters = ($bitsInUnsignedInt * 3) / 4;
        $oneEighth = $bitsInUnsignedInt / 8;
        $highBits = 0xFFFFFFFF << (int)($bitsInUnsignedInt - $oneEighth);
        $hash = 0;
        $test = 0;
        $len || $len = strlen($string);
        for($i = 0; $i < $len; $i++)
        {
            $hash = ($hash << (int)($oneEighth)) + ord($string[$i]);
        }
        $test = $hash & $highBits;
        if($test != 0)
        {
            $hash = (($hash ^ ($test >> (int)($threeQuarters))) & (~$highBits));
        }
        return ($hash % 0xFFFFFFFF) & 0xFFFFFFFF;
    }

    /**
     * 类似于PJW Hash功能，但针对32位处理器进行了调整。它是基于UNIX的系统上的widley使用哈希函数。
     */
    public static function ELFHash($string, $len = null)
    {
        $hash = 0;
        $len || $len = strlen($string);
        for($i = 0; $i < $len; $i++)
        {
            $hash = ($hash << 4) + ord($string[$i]);
            $x = $hash & 0xF0000000;
            if($x != 0)
            {
                $hash ^= ($x >> 24);
            }
            $hash &= ~$x;
        }
        return ($hash % 0xFFFFFFFF) & 0xFFFFFFFF;
    }

    /**
     * 这个哈希函数来自Brian Kernighan和Dennis Ritchie的书“The C Programming Language”。
     * 它是一个简单的哈希函数，使用一组奇怪的可能种子，它们都构成了31 .... 31 ... 31等模式，它似乎与DJB哈希函数非常相似。
     */
    public static function BKDRHash($string, $len = null)
    {
        $seed = 131;  # 31 131 1313 13131 131313 etc..
        $hash = 0;
        $len || $len = strlen($string);
        for($i = 0; $i < $len; $i++)
        {
            $hash = (int)(($hash * $seed) + ord($string[$i]));
        }
        return ($hash % 0xFFFFFFFF) & 0xFFFFFFFF;
    }

    /**
     * 这是在开源SDBM项目中使用的首选算法。
     * 哈希函数似乎对许多不同的数据集具有良好的总体分布。它似乎适用于数据集中元素的MSB存在高差异的情况。
     */
    public static function SDBMHash($string, $len = null)
    {
        $hash = 0;
        $len || $len = strlen($string);
        for($i = 0; $i < $len; $i++)
        {
            $hash = (int)(ord($string[$i]) + ($hash << 6) + ($hash << 16) - $hash);
        }
        return ($hash % 0xFFFFFFFF) & 0xFFFFFFFF;
    }

    /**
     * 由Daniel J. Bernstein教授制作的算法，首先在usenet新闻组comp.lang.c上向世界展示。
     * 它是有史以来发布的最有效的哈希函数之一。
     */
    public static function DJBHash($string, $len = null)
    {
        $hash = 5381;
        $len || $len = strlen($string);
        for($i = 0; $i < $len; $i++)
        {
            $hash = (int)(($hash << 5) + $hash) + ord($string[$i]);
        }
        return ($hash % 0xFFFFFFFF) & 0xFFFFFFFF;
    }

    /**
     * Donald E. Knuth在“计算机编程艺术第3卷”中提出的算法，主题是排序和搜索第6.4章。
     */
    public static function DEKHash($string, $len = null)
    {
        $len || $len = strlen($string);
        $hash = $len;
        for($i = 0; $i < $len; $i++)
        {
            $hash = (($hash << 5) ^ ($hash >> 27)) ^ ord($string[$i]);
        }
        return ($hash % 0xFFFFFFFF) & 0xFFFFFFFF;
    }

    /**
     * 参考 http://www.isthe.com/chongo/tech/comp/fnv/
     */
    public static function FNVHash($string, $len = null)
    {
        $prime = 16777619; //32位的prime 2^24 + 2^8 + 0x93 = 16777619
        $hash = 2166136261; //32位的offset
        $len || $len = strlen($string);
        for($i = 0; $i < $len; $i++)
        {
            $hash = (int)($hash * $prime) % 0xFFFFFFFF;
            $hash ^= ord($string[$i]);
        }
        return ($hash % 0xFFFFFFFF) & 0xFFFFFFFF;
    }
}

5、再考虑能不能写个composer包，哈哈哈哈，写好了会发出来

参考文章：http://imhuchao.com/1271.html

Joker_Wangx

关注

1
点赞
踩
3

收藏

觉得还不错? 一键收藏
1
评论
php+redis布隆过滤器

欢迎评论，共同学习，手动比心。目录0、什么是布隆过滤器？1、布隆过滤器的思想2、布隆过滤器处理流程2.1 开辟空间2.2 寻找hash函数2.3 写入数据2.4 判断3、应用场景4、代码实现（php + redis）4.1 Laravel框架实现的控制器代码4.2 BloomFilter 布隆过滤器（需要依赖redis对象）4.3 BloomFilterHash hash方法类5、再考虑能不能写个composer包，哈哈哈哈，写好了会发出来0、什么是布隆过滤器？布隆过滤器（Bloom Fil.
复制链接

扫一扫

专栏目录