LevelDB Source Code Notes (3): Cache

windows2

Article link: https://blog.csdn.net/windows2/article/details/24587627

LevelDB's cache uses the LRU (Least Recently Used) replacement policy. It is implemented in ShardedLRUCache, whose main members are:

class ShardedLRUCache : public Cache {
 private:
  LRUCache shard_[kNumShards];
  port::Mutex id_mutex_;
  uint64_t last_id_;

  static inline uint32_t HashSlice(const Slice& s) {
    return Hash(s.data(), s.size(), 0);
  }

  static uint32_t Shard(uint32_t hash) {
    return hash >> (32 - kNumShardBits);
  }

 public:
  explicit ShardedLRUCache(size_t capacity)
      : last_id_(0) {
    const size_t per_shard = (capacity + (kNumShards - 1)) / kNumShards;
    for (int s = 0; s < kNumShards; s++) {
      shard_[s].SetCapacity(per_shard);
    }
  }
  virtual ~ShardedLRUCache() { }
  virtual Handle* Insert(const Slice& key, void* value, size_t charge,
                         void (*deleter)(const Slice& key, void* value)) {
    const uint32_t hash = HashSlice(key);
    return shard_[Shard(hash)].Insert(key, hash, value, charge, deleter);
  }
  virtual Handle* Lookup(const Slice& key) {
    const uint32_t hash = HashSlice(key);
    return shard_[Shard(hash)].Lookup(key, hash);
  }
  virtual void Release(Handle* handle);
  virtual void Erase(const Slice& key);
  virtual void* Value(Handle* handle);

};

Its key member is LRUCache shard_[kNumShards];

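Each ShardedLRUCache contains several LRUCache instances. To look up a key, it first computes which shard the key belongs to, shard = Shard(HashSlice(key)), and then searches shard_[shard]. The shard is chosen from the high bits of the hash, which is a common technique. Splitting the cache across several LRUCache instances reduces lock contention between threads, since each shard has its own mutex. Throughout the cache, mutexes and reference counts are used to keep it thread-safe.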
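To make the shard selection concrete, here is a minimal sketch. The value kNumShardBits = 4 is an assumption (it is not shown in the excerpt above), and PerShardCapacity and ShardIsInRange are hypothetical helper names that just restate the constructor's rounding and a sanity check.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed constants; the excerpt above only uses them by name.
static const int kNumShardBits = 4;
static const int kNumShards = 1 << kNumShardBits;  // 16 shards

// Pick a shard from the top kNumShardBits bits of a 32-bit hash.
static uint32_t Shard(uint32_t hash) {
  return hash >> (32 - kNumShardBits);
}

// Hypothetical helper: per-shard capacity, rounded up so that all shards
// together can hold at least `capacity`.
static size_t PerShardCapacity(size_t capacity) {
  return (capacity + (kNumShards - 1)) / kNumShards;
}

// Hypothetical helper: checks that a hash maps to a valid shard index.
static bool ShardIsInRange(uint32_t hash) {
  return Shard(hash) < static_cast<uint32_t>(kNumShards);
}
```

One plausible reason for taking the high bits: the per-shard hash table picks its bucket from the low bits (hash & (length_ - 1)), so the two choices do not reuse the same bits.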

LRUCache itself uses a fairly standard algorithm.

class LRUCache {
 public:
  void SetCapacity(size_t capacity) { capacity_ = capacity; }

  // Like Cache methods, but with an extra "hash" parameter.
  Cache::Handle* Insert(const Slice& key, uint32_t hash,
                        void* value, size_t charge,
                        void (*deleter)(const Slice& key, void* value));
  Cache::Handle* Lookup(const Slice& key, uint32_t hash);
  void Release(Cache::Handle* handle);
  void Erase(const Slice& key, uint32_t hash);

 private:
  // Initialized before use.
  size_t capacity_;

  // mutex_ protects the following state.
  port::Mutex mutex_;
  size_t usage_;

  // Dummy head of LRU list.
  // lru.prev is newest entry, lru.next is oldest entry.
  LRUHandle lru_;

  HandleTable table_;
};

LRUCache maps a key to its value (or returns NULL when the key is missing). The pointers for each key/value pair are stored in a Handle node.

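Its main members include lru_, a doubly linked list ordered by time of last use. A doubly linked list suits insertion and removal, and in particular it lets the oldest entry be removed quickly. lru_.prev points to the most recently used entry and lru_.next points to the oldest. capacity_ is the maximum total charge the cache may hold, and usage_ is the charge currently in use. When usage_ > capacity_, the oldest entries are removed to free memory.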
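The list bookkeeping described above can be sketched as follows. This is a simplified illustration, not LevelDB's code: Node and LruList are hypothetical names, there is no hash table or locking, and the caller keeps ownership of the nodes.

```cpp
#include <cassert>
#include <cstddef>

// Simplified stand-in for LRUHandle: only the list links and the charge.
struct Node {
  Node* next;
  Node* prev;
  size_t charge;
};

// Hypothetical miniature of the LRU bookkeeping.
class LruList {
 public:
  explicit LruList(size_t capacity) : capacity_(capacity), usage_(0) {
    // An empty circular list: the dummy head points at itself.
    lru_.next = &lru_;
    lru_.prev = &lru_;
    lru_.charge = 0;
  }

  // Make e the newest entry by linking it just before the dummy head.
  void Append(Node* e) {
    e->next = &lru_;
    e->prev = lru_.prev;
    e->prev->next = e;
    e->next->prev = e;
    usage_ += e->charge;
  }

  // Unlink e from wherever it sits in the list.
  void Remove(Node* e) {
    assert(e != &lru_);  // The dummy head is never removed.
    e->next->prev = e->prev;
    e->prev->next = e->next;
    usage_ -= e->charge;
  }

  // Mark e as just used: move it to the newest position.
  void Touch(Node* e) {
    Remove(e);
    Append(e);
  }

  // Unlink oldest entries until usage fits; returns how many were unlinked.
  int EvictToCapacity() {
    int evicted = 0;
    while (usage_ > capacity_ && lru_.next != &lru_) {
      Remove(lru_.next);
      ++evicted;
    }
    return evicted;
  }

  Node* oldest() { return lru_.next == &lru_ ? nullptr : lru_.next; }
  Node* newest() { return lru_.prev == &lru_ ? nullptr : lru_.prev; }
  size_t usage() const { return usage_; }

 private:
  size_t capacity_;
  size_t usage_;
  Node lru_;  // Dummy head: lru_.prev is newest, lru_.next is oldest.
};
```

Because of the dummy head, Append and Remove never need to special-case an empty list or the first and last entries.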

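The other member is table_, a hand-written hash table (internally an array of buckets, each holding a linked list of entries) that finds the entry for a given key.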

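Both the doubly linked list and the HandleTable use LRUHandle as their node type. An LRUHandle therefore holds the key/value data, the prev and next pointers needed by the doubly linked list, and the next_hash pointer needed by the HandleTable.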

Its members are defined as follows:

struct LRUHandle {
  void* value;
  void (*deleter)(const Slice&, void* value);
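  LRUHandle* next_hash; // Chain pointer needed by the HandleTable
  LRUHandle* next;      // next pointer for the doubly linked list
  LRUHandle* prev;      // prev pointer for the doubly linked list
  size_t charge;        // Size charged for this entry, as passed to Insert (apparently only the value's size? Not important here)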
  size_t key_length;
  uint32_t refs;
  uint32_t hash;      // Hash of key(); used for fast sharding and comparisons
  char key_data[1];   // Beginning of key
......
};

A few tricks are used here.

1. key_data[1]. This must be the last member of the struct. When an LRUHandle is allocated, the requested size is sizeof(LRUHandle)-1 + key.size():

  LRUHandle* e = reinterpret_cast<LRUHandle*>(
      malloc(sizeof(LRUHandle)-1 + key.size()));

Indexing through key_data reaches the extra memory allocated past the end of the struct, which effectively gives a variable-length buffer.
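A small sketch of the trailing-array trick, using a simplified Entry struct rather than the real LRUHandle. NewEntry and FreeEntry are hypothetical helper names. Note that this relies on a widely used idiom (writing past a one-element trailing array into memory the same malloc call reserved) rather than on anything the C++ standard spells out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <string>

// Simplified stand-in for LRUHandle; only the fields needed for the trick.
struct Entry {
  size_t key_length;
  char key_data[1];  // Must stay the last member: the key bytes start here.

  std::string key() const { return std::string(key_data, key_length); }
};

// Hypothetical helper: allocate the header and the key bytes in one block.
Entry* NewEntry(const std::string& key) {
  // sizeof(Entry) already counts one byte of key_data, hence the -1.
  Entry* e = reinterpret_cast<Entry*>(
      std::malloc(sizeof(Entry) - 1 + key.size()));
  assert(e != nullptr);
  e->key_length = key.size();
  std::memcpy(e->key_data, key.data(), key.size());
  return e;
}

// A single free releases both the header and the key bytes.
void FreeEntry(Entry* e) { std::free(e); }
```

The benefit is one allocation per entry instead of two, and the key sits right next to the fields that are read during a lookup.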

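2. refs.

A cache entry can be replaced at any time. When a Handle is removed from the cache, another thread outside the cache may still be using it, so freeing its memory right away would be an error.

Reference counting together with the deleter callback solves this. When an entry is inserted, refs is set to 2: one reference for the caller, which is using the entry at that moment, and one for the cache itself, which keeps the entry. Lookup is the other function through which callers obtain entries, and every handle handed out of the cache does refs++. When the caller is finished, it must call Release, which does refs-- (through an internal Unref helper). Removing an entry from the cache also does refs--. Only when refs drops to 0 is the deleter actually called and the memory freed, and the cache's usage_ is adjusted at the same time.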
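The reference-counting lifecycle can be sketched as below. This is an illustration under simplifying assumptions: there is no mutex (LevelDB makes these updates under the shard's mutex_), no usage accounting, the deleter takes only the value, and Ref, EraseFromCache and CountingDeleter are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Simplified stand-in for LRUHandle: value, deleter and reference count only.
struct Entry {
  void* value;
  void (*deleter)(void* value);
  uint32_t refs;
};

// Drop one reference; destroy the entry when the last one goes away.
void Unref(Entry* e) {
  assert(e->refs > 0);
  e->refs--;
  if (e->refs == 0) {
    (*e->deleter)(e->value);
    delete e;
  }
}

// Insert: one reference for the cache, one for the caller.
Entry* Insert(void* value, void (*deleter)(void* value)) {
  Entry* e = new Entry;
  e->value = value;
  e->deleter = deleter;
  e->refs = 2;
  return e;
}

// A lookup hit: each handle handed out takes another reference.
Entry* Ref(Entry* e) {
  e->refs++;
  return e;
}

// Called by the user when done with a handle.
void Release(Entry* e) { Unref(e); }

// Called when the cache evicts or erases the entry: drops the cache's own
// reference, but outstanding handles keep the entry alive.
void EraseFromCache(Entry* e) { Unref(e); }

// Example deleter: counts how many times it ran.
void CountingDeleter(void* value) { ++*static_cast<int*>(value); }
```

In this sketch an entry erased from the cache while two handles are outstanding is only destroyed after both handles have been released.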

3. The doubly linked list and the hash table share the same handle struct as their basic node.

The main member variables of HandleTable are:

 private:
  // The table consists of an array of buckets where each bucket is
  // a linked list of cache entries that hash into the bucket.
  uint32_t length_;
  uint32_t elems_;
  LRUHandle** list_;

Lookup works as follows:

  LRUHandle** FindPointer(const Slice& key, uint32_t hash) {
    LRUHandle** ptr = &list_[hash & (length_ - 1)];
    while (*ptr != NULL &&
           ((*ptr)->hash != hash || key != (*ptr)->key())) {
      ptr = &(*ptr)->next_hash;
    }
    return ptr;
  }

This is in effect a hash table that resolves collisions by separate chaining.
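To show why FindPointer returns a pointer to a pointer, here is a minimal chained table built around the same loop. It is a sketch under assumptions: ChainedTable and Node are hypothetical names, keys are std::string instead of Slice, the bucket count is fixed at 4 (no resizing), and the table does not own its nodes.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Simplified stand-in for LRUHandle: the key, its hash and the chain link.
struct Node {
  std::string key;
  uint32_t hash;
  Node* next_hash;
};

class ChainedTable {
 public:
  ChainedTable() {
    for (uint32_t i = 0; i < kLength; i++) list_[i] = nullptr;
  }

  Node* Lookup(const std::string& key, uint32_t hash) {
    return *FindPointer(key, hash);
  }

  // Insert n; returns the node it replaced (same key and hash) or nullptr.
  Node* Insert(Node* n) {
    Node** ptr = FindPointer(n->key, n->hash);
    Node* old = *ptr;
    n->next_hash = (old == nullptr ? nullptr : old->next_hash);
    *ptr = n;
    return old;
  }

  // Unlink and return the matching node, or nullptr if absent.
  Node* Remove(const std::string& key, uint32_t hash) {
    Node** ptr = FindPointer(key, hash);
    Node* result = *ptr;
    if (result != nullptr) *ptr = result->next_hash;
    return result;
  }

 private:
  static const uint32_t kLength = 4;
  // hash & (kLength - 1) only selects a valid bucket for powers of two.
  static_assert((kLength & (kLength - 1)) == 0, "kLength must be 2^n");
  Node* list_[kLength];

  // Returns the slot that points at the match, or the trailing null slot
  // of the chain when there is no match.
  Node** FindPointer(const std::string& key, uint32_t hash) {
    Node** ptr = &list_[hash & (kLength - 1)];
    while (*ptr != nullptr && ((*ptr)->hash != hash || key != (*ptr)->key)) {
      ptr = &(*ptr)->next_hash;
    }
    return ptr;
  }
};
```

Since the returned slot is either a bucket head or some node's next_hash field, Insert and Remove can splice the chain by writing through it, with no separate "previous node" bookkeeping. Comparing the stored hash before the key also skips most string comparisons on colliding entries.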