数据结构-六度分离(Bloom过滤器)

文章介绍了如何利用Bloom过滤器和广度优先搜索(BFS)在大规模社交网络图中模拟‘六度分离’实验。通过添加和获取关注者,以及使用BloomFilter来限制内存使用,从起点Alice开始,寻找到达终点Bob的最短路径。实验展示了在网络中信息传播的可能性和效率。
摘要由CSDN通过智能技术生成

你听说过“六度分离[1]”实验,并决定亲自测试。你的计划很简单:随机选择两个人,Alice和Bob,以及一个秘密单词。请Alice在他们最喜欢的社交媒体Touhiter上发布该消息,并将其发布给所有追随者,请他们按照相同的指示重新发布该消息。

然后等到Bob收到消息,问他们需要多少次回复。(当然,一次是不够的,所以你必须重复几次实验来建立信心。)

在本作业中,您将实现并模拟这一点。有一个包含4亿个顶点的底层(有向)图:您可以对每个用户进行以下访问(表示为一个顶点;您不必实现它):

- add_follower(Vertex): Adds a follower to the current user. 

- get_followers(): Returns a list of all the followers connected to the Vertex.

- get_username(): Returns the (guaranteed unique) username of the user.

您的目标是实现一个BFS(广度优先搜索)来执行实验,从Alice(作为输入提供给您的起始顶点)开始,到Bob(提供给您另一个节点)时停止。不幸的是,您无法承受使用太多内存的代价,尤其是对于40000000大小的数据结构而言。虽然错误概率很小,但您还是可以接受的,因此您必须首先实现Bloom过滤器(类BloomFilter),并通过实现GraphBFS方法修改BFS算法(类Bloom过滤器)来使用它:

bfs(self, start: Vertex, end: Vertex, bloom_filter: BloomFilter)

请注意,您不必实现类Vertex,也不必创建自己的哈希函数。它们将在测试期间提供给您的代码!您也不必指定比特数bits和散列函数m的数量,这也将在测试期间由您的代码传递(当然,对于您自己的测试,您可能希望使用问题e中的设置)。例如,作业的书面部分)。

[1] 六度分离:早在上个世纪60年代,美国著名社会心理学家米尔格伦(Stanley Milgram)就提出了“六级分隔”(Six Degrees of Separation)的理论,并设计了一个连锁信的实验来验证这个假设。他认为,任何两个陌生人都可以通过“朋友的朋友”建立联系,并且他们之间所间隔的人不会超过六个,无论是美国总统与威尼斯的船夫,或热带雨林中的土著与爱斯基摩人。也就是说,最多通过六个人你就能够认识任何一个陌生人。这就是著名的“小世界假设”。

从2001年秋天开始,美国哥伦比亚大学的社会学教授瓦茨(Duncan Watts)组建了一个研究小组,利用Email这一现代通信工具,开始进行“小世界假设”的实验(http://smallworld.columbia.edu)。在1年多时间里,总共有166个国家和地区的6万多名志愿者参与实验,实验结果证明,一封邮件平均被转发6次,即可回到接收者那里。

这个玄妙理论引来了数学家、物理学家和电脑科学家纷纷投入研究,结果发现,世界上许多其他的网络也有极相似的结构。经济活动中的商业联系网络、通过超文本链接的网络、生态系统中的食物链,甚至人类脑神经元,以及细胞内的分子交互作用网络,都有着完全相同的组织结构。

from typing import Callable, List

"""
BloomFilter Class
----------

This class represents the bloom filter to be used in the graph. 

Each BloomFilter consists of the following properties:
    - hash_functions: A list of hash functions used in the bloom filter. Each function hashes a string to a number in the range of [0, bits*elements).
    - bits: The number of bits per element in the bloom filter.
    - elements: The number of elements in the bloom filter (capacity).
    - filter: The bit array representing the bloom filter.

The class also supports the following functions:
    - add(str): Adds a new entry to the bloom filter.
    - get(str): Returns a boolean that probabilistically indicates whether the given string is in the bloom filter.

Your task is to complete the following functions which are marked by the TODO comment.
You are free to add properties and functions to the class as long as the given signatures remain identical.
Good Luck!
"""


class BloomFilter:
    # These are the defined properties as described above
    hash_functions: List[Callable[[str], int]]
    bits: int
    elements: int
    filter: List[bool]

    def __init__(self, hash_functions: List[Callable[[str], int]], bits: int, elements: int) -> None:
        """
        The constructor for the BloomFilter class.
        :param hash_functions: A list of hash functions used in the bloom filter. Each function hashes a string to a number in the range of [0, BITS).
        :param bits: The number of bits in the bloom filter.
        :param elements: The number of elements in the bloom filter (capacity).
        """
        # TODO: Fill this in
        self.hash_functions = hash_functions
        self.bits = bits
        self.elements = elements
        self.bitMap = [0] * self.bits

    def add(self, item: str) -> None:
        """
        Adds a new entry to the bloom filter.
        :param item: The string to be added to the bloom filter.
        """
        # TODO: Fill this in
        for func in self.hash_functions:
            self.bitMap[func(item) % self.bits] = 1

    def get(self, item: str) -> bool:
        """
        Given a string, returns a boolean that probabilistically indicates whether the given string is in the bloom filter.
        :param item: The string to be checked in the bloom filter.
        :return: A boolean that indicates whether the string has been seen before.
        """
        # TODO: Fill this in
        for func in self.hash_functions:
            if self.bitMap[func(item) % self.bits] != 1:
                return False

        return True


"""
Vertex Class
----------

This class represents an individual user in the social network. 

Each Vertex consists of the following properties:
    - username: The username for the current User. Guaranteed to be unique.
    - followers: A list of followers connected to this Vertex (adjlist)

The class also supports the following functions:
    - add_follower(Vertex): Adds a follower to the current user. 
    - get_followers(): Returns a list of all the followers connected to the Vertex.
    - get_username(): Returns the username of the User.

**You do not need to edit this class**

"""


class Vertex:
    # These are the defined properties as described above
    username: str
    followers: "List[Vertex]"

    def __init__(self, username: str) -> None:
        """
        The constructor for the Vertex class.
        :param username: The username of the user.
        """
        self.username = username
        self.followers = []

    def add_follower(self, vertex: "Vertex") -> None:
        """
        Adds a follower to the given user with the given distance.
        :param vertex: The vertex to add an edge to.
        """

        # vertex is None, self-loop or already in the adjlist
        if not vertex or vertex == self or vertex in self.followers:
            return

        # Add to adjlist
        self.followers.append(vertex)

    def get_followers(self) -> "List[Vertex]":
        """
        Returns the list of followers for the user.
        :return: The list of followers for the user.
        """

        return self.followers

    def get_username(self) -> str:
        """
        Returns the username of the user.
        :return: The username of the user.
        """

        return self.username



"""
Graph Class
----------

This class represents the social network as a graph. 

The class supports the following functions:
    - bfs(Vertex, Vertex, BloomFilter): Perform a BFS from the start vertex to the end using the bloom filter.

Your task is to complete the following functions which are marked by the TODO comment.
You are free to add properties and functions to the class as long as the given signatures remain identical.
Good Luck!
"""
from collections import deque


class Graph:
    def bfs(self, start: Vertex, end: Vertex, bloom_filter: BloomFilter) -> int:
        """
        Performs a BFS from the start vertex to the end using the bloom filter.
        Note you should use the bloom filter to avoid excessive memory usage with large graph inputs.
        The bloom filter is constructed from your filled in BloomFilter class.
        :param start: The start vertex.
        :param end: The end vertex.
        :param bloom_filter: The bloom filter used in the graph.
        :return: The degree of separation between the start and end vertices or -1 if they're not connected.
        """
        # TODO: Fill this in
        queue = deque()
        queue.append(start)
        visited = bloom_filter
        visited.add(start.get_username())
        count = 0
        while queue:
            v = queue.popleft()
            for w in v.get_followers():
                if w.get_username() == end.get_username():
                    return count

                if not visited.get(w.get_username()):
                    visited.add(w.get_username())
                    queue.append(w)
            count += 1

        return -1



from unittest import TestCase

BITS = 4
ELEMENTS = 4

"""
This is a sample test suite for you to write your own test.
Note that the provided hash functions are NOT good hash function but
are used for demonstration purposes only. You should write your own
hash functions and test cases to test your implementation.

No tests are provided or will be provided for the bfs function.
You are free (and encouraged!) to use this as a base to write your own tests.
"""


# These are BAD hash function and are used for demonstration only.
# It just takes the first letter of a string and returns its ASCII value in the range [0, BITS * ELEMENTS).
# This means that it will always return the same value for a given string
# or any string that starts with the same letter.
def bad_hash_function_1(name: str) -> int:
    return ord(name[0]) % (BITS * ELEMENTS)


def bad_hash_function_2(name: str) -> int:
    return (ord(name[0]) + 8) % (BITS * ELEMENTS)


class TestBloomFilterSample(TestCase):
    def setUp(self) -> None:
        self.bf = BloomFilter([bad_hash_function_1, bad_hash_function_2], bits=16, elements=1)

    def test_bloom_filter_sample(self):
        self.bf.add("a")  # This should set the 1st and 9th index to True
        self.assertEqual(self.bf.get("a"), True, "Bloom Filter returns incorrect result")

        self.bf.add("z")  # This should set the 10th and 2nd index to True
        self.assertEqual(self.bf.get("z"), True, "Bloom Filter returns incorrect result")

        # This should return True because the 2nd and 10th index are set to True even though the key 'b' has not been added
        # This is what makes a Bloom Filter a probabilistic data structure
        self.assertEqual(self.bf.get("b"), True, "Bloom Filter returns incorrect result")

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值