系统设计学习（二）Design a key-value cache to save the results of the most recent web server queries

会说话的皮卡丘

已于 2022-08-28 18:32:15 修改

阅读量396

点赞数 5

分类专栏：系统设计文章标签：学习缓存

于 2022-06-06 22:52:08 首次发布

本文链接：https://blog.csdn.net/xiaocaij_icai/article/details/125125270

版权

系统设计专栏收录该内容

6 篇文章 1 订阅

订阅专栏

老规矩先看题

Use cases We’ll scope the problem to handle only the following use cases
User sends a search request resulting in a cache hit
User sends a search request resulting in a cache miss
Service has high availability Constraints and assumptions
State assumptions Traffic is not evenly distributed
Popular queries should almost always be in the cache
Need to determine how to expire/refresh Serving from cache requires fast lookups
Low latency between machines Limited memory in cache
Need to determine what to keep/remove
Need to cache millions of queries 10 million users 10 billion queries per month

计算：
缓存存储的键的有序列表分别为 query, value: results
query - 50 bytes
title - 20 bytes
snippet - 200 bytes
Total: 270 bytes
也就是说单条缓存大概需要270bytes
如果10 billion的用户每条数据都是独一无二的话
可以计算得到一个月的存储消耗为2.7t
4000 requests per second = 01 billion requests per month
也就是说每秒大概四千次请求平均

Create a high level design
在这里插入图片描述
因为缓存的容量有限制，所以我们要设置一个过期策略，这里的话可以采用LRU机制
具体设计大概如下：
1.client发送一个请求到webserver，运行反向代理
2.webserver执行查询api
3.查询api会执行下述操作
1）解析语法：
删除标记
将文本分解为术语
修复拼写错误
规范化大写
将查询转换为使用布尔运算
2）检查是否命中缓存：
如果命中：
利用LRU机制更新缓存
返回缓存内容
如果没有命中：
使用Reverse Index Service查找匹配的文档，对查找出来的文档利用匹配对进行排序得出排名，其中排名最高的就是要查找的文档
使用Document Service 返回 titles and snippets
更新缓存的内容，更新LRU
关于LRU的实现其实是一个很好的设计，是用hash+链表的结构设计能够使得其插入和查找的时间复杂度都为常数级别，lc上有一道相关的算法题，忘了的可以看一下，这题Java写起来比较快，cpp要自己写一遍所有的基本操作hhh
https://leetcode.cn/problems/lru-cache/
查询api的实现：

class QueryApi(object):

    def __init__(self, memory_cache, reverse_index_service):
        self.memory_cache = memory_cache
        self.reverse_index_service = reverse_index_service

    def parse_query(self, query):
        """Remove markup, break text into terms, deal with typos,
        normalize capitalization, convert to use boolean operations.
        """
        ...

    def process_query(self, query):
        query = self.parse_query(query)
        results = self.memory_cache.get(query)
        if results is None:
            results = self.reverse_index_service.process_search(query)
            self.memory_cache.set(query, results)
        return results

节点的实现：

class Node(object):

    def __init__(self, query, results):
        self.query = query
        self.results = results

链表的实现：

class LinkedList(object):

    def __init__(self):
        self.head = None
        self.tail = None

    def move_to_front(self, node):
        ...

    def append_to_front(self, node):
        ...

    def remove_from_tail(self):
        ...

缓存的实现：

class Cache(object):

    def __init__(self, MAX_SIZE):
        self.MAX_SIZE = MAX_SIZE
        self.size = 0
        self.lookup = {}  # key: query, value: node
        self.linked_list = LinkedList()

    def get(self, query)
        """Get the stored query result from the cache.

        Accessing a node updates its position to the front of the LRU list.
        """
        node = self.lookup[query]
        if node is None:
            return None
        self.linked_list.move_to_front(node)
        return node.results

    def set(self, results, query):
        """Set the result for the given query key in the cache.

        When updating an entry, updates its position to the front of the LRU list.
        If the entry is new and the cache is at capacity, removes the oldest entry
        before the new entry is added.
        """
        node = self.lookup[query]
        if node is not None:
            # Key exists in cache, update the value
            node.results = results
            self.linked_list.move_to_front(node)
        else:
            # Key does not exist in cache
            if self.size == self.MAX_SIZE:
                # Remove the oldest entry from the linked list and lookup
                self.lookup.pop(self.linked_list.tail.query, None)
                self.linked_list.remove_from_tail()
            else:
                self.size += 1
            # Add the new key and value
            new_node = Node(query, results)
            self.linked_list.append_to_front(new_node)
            self.lookup[query] = new_node

缓存更细的条件：
页面内容更改
删除页面或添加新页面
页面排名更改
解决这些最直接的办法是设置一个最大时间

非常感谢阅读到最后，也感谢您的支持。感兴趣的同学可以看一下我的这一篇帖子
系统设计学习(一)Design Pastebin.com (or Bit.ly)

会说话的皮卡丘

关注

5
点赞
踩
1

收藏

觉得还不错? 一键收藏
打赏
0
评论
系统设计学习（二）Design a key-value cache to save the results of the most recent web server queries

老规矩先看题计算：缓存存储的键的有序列表分别为 query, value: resultsquery - 50 bytestitle - 20 bytessnippet - 200 bytesTotal: 270 bytes也就是说单条缓存大概需要270bytes如果10 billion的用户每条数据都是独一无二的话可以计算得到一个月的存储消耗为2.7t4000 requests per second = 01 billion requests per month也就是说每秒大概四千次请
复制链接

扫一扫