2022-15445-fall-project1


小贺的学习日记 (Xiao He's study diary)


Tags: hash algorithms, algorithms, database development

Article link: https://blog.csdn.net/Hxy_666/article/details/129736777


Understanding LRU-K

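LRU-K is an extension of LRU. LRU evicts the page that has gone the longest without being accessed.

LRU-K works like this: when the buffer is full, the page to evict is chosen among the pages that have been accessed fewer than k times. Only if every page in the buffer has been accessed at least k times does it fall back to an LRU-style choice among them. (Strictly speaking, that fallback compares the time of each page's k-th most recent access, which is the same as plain LRU only when k = 1.)

So two queues are needed. The first, history, holds pages accessed fewer than k times. A hash table fits well alongside it, with the page as key and its access count as value. (The key/value pair lives in the hash table, which is a separate structure from the queue itself; I mixed the two up at first.) The second queue holds pages accessed k or more times. Whenever a page in this queue is accessed again, it moves to the head, so when the victim has to come from this queue we can simply evict the page at the tail.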
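To make the two-queue idea concrete, here is a minimal single-threaded sketch. The class TwoQueueLruK and its methods are hypothetical, not the bustub LRUKReplacer API. Every frame is assumed evictable, and one map of list iterators serves both queues because a frame is only ever in one of them.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <optional>
#include <unordered_map>

// history_: frames seen fewer than k times, newest first (evicted FIFO from the back).
// cache_:   frames seen at least k times, most recently used first.
class TwoQueueLruK {
 public:
  explicit TwoQueueLruK(std::size_t k) : k_(k) {}

  void Access(int frame) {
    std::size_t count = ++count_[frame];
    if (count == k_) {
      // Promotion: leave the history queue (absent when k == 1), enter the cache.
      if (auto it = pos_.find(frame); it != pos_.end()) {
        history_.erase(it->second);
      }
      cache_.push_front(frame);
      pos_[frame] = cache_.begin();
    } else if (count > k_) {
      // Already cached: move to the head to mark it most recently used.
      cache_.erase(pos_[frame]);
      cache_.push_front(frame);
      pos_[frame] = cache_.begin();
    } else if (count == 1) {
      // First access with k > 1: join the history queue.
      history_.push_front(frame);
      pos_[frame] = history_.begin();
    }
  }

  // Prefer the oldest history frame; otherwise take the tail of the cache queue.
  std::optional<int> Evict() {
    std::list<int> &victims = history_.empty() ? cache_ : history_;
    if (victims.empty()) {
      return std::nullopt;
    }
    int frame = victims.back();
    victims.pop_back();
    pos_.erase(frame);
    count_.erase(frame);
    return frame;
  }

 private:
  std::size_t k_;
  std::list<int> history_;
  std::list<int> cache_;
  std::unordered_map<int, std::size_t> count_;
  std::unordered_map<int, std::list<int>::iterator> pos_;
};
```

With k = 2 and accesses 1, 2, 1, 3, frame 1 is promoted to the cache queue, so frames 2 and 3 are evicted (in that order) before it.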

Task #1 - Extendible Hash Table

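This is the same as last year's (Fall 2021) Project 2, except that it drops the function that merges buckets on deletion (which gave me a lot of trouble).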

    size_t size_;
    int depth_;
    std::list<std::pair<K, V>> list_;

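The bucket class's member variables show that a bucket mainly stores key-value pairs: list_ holds the pairs, size_ is the bucket's capacity, and depth_ is its local depth.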
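For reference, here is a standalone sketch of a bucket built on exactly these three fields. It is not the skeleton's Bucket class. The template, the method set, and the behavior (Insert overwrites an existing key and fails only when a new key meets a full bucket) are my assumptions, modeled on how the Insert code below uses IsFull() and GetDepth().

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <list>
#include <utility>

template <typename K, typename V>
class Bucket {
 public:
  Bucket(std::size_t size, int depth) : size_(size), depth_(depth) {}

  // Copies the value for `key` into `value`; returns false if the key is absent.
  bool Find(const K &key, V &value) const {
    for (const auto &[k, v] : list_) {
      if (k == key) {
        value = v;
        return true;
      }
    }
    return false;
  }

  // Updates an existing key in place; a new key is rejected when the bucket is full.
  bool Insert(const K &key, const V &value) {
    for (auto &item : list_) {
      if (item.first == key) {
        item.second = value;
        return true;
      }
    }
    if (IsFull()) {
      return false;
    }
    list_.emplace_back(key, value);
    return true;
  }

  // Finds the pair first, then erases it through its iterator.
  bool Remove(const K &key) {
    auto it = std::find_if(list_.begin(), list_.end(),
                           [&key](const auto &item) { return item.first == key; });
    if (it == list_.end()) {
      return false;
    }
    list_.erase(it);
    return true;
  }

  bool IsFull() const { return list_.size() >= size_; }
  int GetDepth() const { return depth_; }

 private:
  std::size_t size_;                  // capacity
  int depth_;                         // local depth
  std::list<std::pair<K, V>> list_;   // stored key-value pairs
};
```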

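The hash table's member functions are described clearly in the header, so I won't go over them.

For Task 1 I will mainly discuss insertion into the hash table; everything else is easy.

About the extendible hash table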

Extendible Hashing (Dynamic approach to DBMS) - GeeksforGeeks

This page explains it in more detail.

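Insertion has the following cases:

1. The bucket is not full: find the bucket through the directory and insert directly.

2. The bucket is full and global depth > local depth

Split the bucket into two, update the directory entries, and redistribute the old bucket's items between the two. Then retry inserting the new item (going through the three cases again).

3. The bucket is full and global depth == local depth

First double the directory. Then split the bucket, update the directory, and redistribute the items as in case 2. Then retry inserting the new item (going through the three cases again).

Looking closely, every insertion eventually ends in case 1, and case 3 differs from case 2 only by the extra directory expansion.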
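The three cases come down to a few bit operations. The sketch below spells them out on raw hash values. DirIndex, GoesToOneBucket and InsertCase are hypothetical helper names. DirIndex is meant to mirror what the skeleton's IndexOf does with the global-depth mask (as I understand it), and the split rule matches the `mask = 1 << local_depth` test in the Insert code below.

```cpp
#include <cassert>
#include <cstddef>

// Directory slot for a hash value: its low `global_depth` bits.
std::size_t DirIndex(std::size_t hash, int global_depth) {
  std::size_t mask = (std::size_t{1} << global_depth) - 1;
  return hash & mask;
}

// When a bucket with local depth d splits, bit d of the hash picks the side:
// 0 stays in the "0" bucket, 1 moves to the "1" bucket.
bool GoesToOneBucket(std::size_t hash, int local_depth) {
  return (hash & (std::size_t{1} << local_depth)) != 0;
}

// Which of the three insertion cases above applies.
int InsertCase(bool bucket_full, int global_depth, int local_depth) {
  if (!bucket_full) {
    return 1;  // insert directly
  }
  if (global_depth > local_depth) {
    return 2;  // split the bucket only
  }
  return 3;    // double the directory, then split
}
```

For hash 11 (binary 1011) with global depth 2, the slot is 3 (binary 11). Splitting a bucket of local depth 2 looks at bit 2, which is 0, so the item stays in the "0" bucket; a split at local depth 3 looks at bit 3, which is 1.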

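The directory expansion code:

template <typename K, typename V>
void ExtendibleHashTable<K, V>::Grow() {
  size_t capacity = dir_.size();
  dir_.resize(capacity << 1);
  // The new upper half mirrors the lower half: slot i + capacity differs from
  // slot i only in the new top bit, which no bucket distinguishes yet.
  for (size_t i = 0; i < capacity; i++) {
    dir_[i + capacity] = dir_[i];
  }
  global_depth_++;
}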

Note that Grow() is not declared in the provided header, so we have to declare it ourselves.

The complete Insert code:
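The doubling step can be checked on its own. This sketch uses plain bucket ids in a std::vector<int> instead of shared_ptr<Bucket>; GrowDirectory is a hypothetical name, and the loop is the same as in Grow().

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Doubles the directory; slot i and slot i + old_size end up pointing at the same bucket.
void GrowDirectory(std::vector<int> &dir, int &global_depth) {
  std::size_t old_size = dir.size();
  dir.resize(old_size * 2);
  for (std::size_t i = 0; i < old_size; ++i) {
    dir[i + old_size] = dir[i];
  }
  ++global_depth;
}
```

Starting from directory {7, 9} at global depth 1, one call gives {7, 9, 7, 9} at global depth 2. No bucket is split or moved; only the number of pointers to each bucket doubles.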

template <typename K, typename V>
void ExtendibleHashTable<K, V>::Insert(const K &key, const V &value) {
  std::scoped_lock<std::mutex> lock(latch_);
  // Keep splitting while the target bucket is full: one split may send every
  // existing item to the same half, so the check has to be repeated.
  while (dir_[IndexOf(key)]->IsFull()) {
    size_t index = IndexOf(key);
    auto target_bucket = dir_[index];
    int bucket_localdepth = target_bucket->GetDepth();
    // Case 3: local depth equals global depth, so double the directory first.
    if (global_depth_ == bucket_localdepth) {
      Grow();
    }
    // Bit number `bucket_localdepth` of the hash decides which half an item goes to.
    int mask = 1 << bucket_localdepth;
    // Two new buckets (local depth + 1) replace the one the directory pointed to.
    auto bucket_0 = std::make_shared<Bucket>(bucket_size_, bucket_localdepth + 1);
    auto bucket_1 = std::make_shared<Bucket>(bucket_size_, bucket_localdepth + 1);
    // Redistribute the old bucket's items.
    for (auto &item : target_bucket->GetItems()) {
      size_t hash_key = std::hash<K>()(item.first);
      if ((hash_key & mask) != 0U) {
        bucket_1->Insert(item.first, item.second);
      } else {
        bucket_0->Insert(item.first, item.second);
      }
    }
    num_buckets_++;  // two new buckets replace one: net +1
    // Repoint every directory slot that referenced the old bucket.
    for (size_t i = 0; i < dir_.size(); i++) {
      if (dir_[i] == target_bucket) {
        if ((i & mask) == 0U) {
          dir_[i] = bucket_0;
        } else {
          dir_[i] = bucket_1;
        }
      }
    }
  }
  // The target bucket now has room: insert the new pair.
  auto index = IndexOf(key);
  auto target_bucket = dir_[index];
  target_bucket->Insert(key, value);
}
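To check the splitting logic in isolation, here is a single-threaded, self-contained version of the same algorithm. MiniExtendibleHash and its members are hypothetical: there is no latch_ and no num_buckets_, and a bucket is reduced to a depth plus a list. Unlike the code above, it first overwrites an existing key, so a full bucket is not split just to update a value.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <list>
#include <memory>
#include <optional>
#include <utility>
#include <vector>

template <typename K, typename V>
class MiniExtendibleHash {
 public:
  explicit MiniExtendibleHash(std::size_t bucket_size)
      : bucket_size_(bucket_size), dir_{std::make_shared<Bucket>(0)} {}

  std::optional<V> Find(const K &key) const {
    for (const auto &[k, v] : dir_[IndexOf(key)]->items) {
      if (k == key) return v;
    }
    return std::nullopt;
  }

  void Insert(const K &key, const V &value) {
    // Existing key: update in place, no split needed.
    for (auto &item : dir_[IndexOf(key)]->items) {
      if (item.first == key) {
        item.second = value;
        return;
      }
    }
    // Cases 2 and 3: split until the target bucket has room.
    while (dir_[IndexOf(key)]->items.size() >= bucket_size_) {
      auto old = dir_[IndexOf(key)];
      if (old->depth == global_depth_) Grow();          // case 3
      std::size_t bit = std::size_t{1} << old->depth;   // split bit
      auto b0 = std::make_shared<Bucket>(old->depth + 1);
      auto b1 = std::make_shared<Bucket>(old->depth + 1);
      for (const auto &item : old->items) {
        ((std::hash<K>()(item.first) & bit) ? b1 : b0)->items.push_back(item);
      }
      for (std::size_t i = 0; i < dir_.size(); ++i) {
        if (dir_[i] == old) dir_[i] = (i & bit) ? b1 : b0;
      }
    }
    dir_[IndexOf(key)]->items.emplace_back(key, value);  // case 1
  }

  int GlobalDepth() const { return global_depth_; }
  std::size_t DirSize() const { return dir_.size(); }

 private:
  struct Bucket {
    explicit Bucket(int d) : depth(d) {}
    int depth;
    std::list<std::pair<K, V>> items;
  };

  std::size_t IndexOf(const K &key) const {
    return std::hash<K>()(key) & ((std::size_t{1} << global_depth_) - 1);
  }

  void Grow() {
    std::size_t n = dir_.size();
    dir_.resize(n * 2);
    for (std::size_t i = 0; i < n; ++i) dir_[i + n] = dir_[i];
    ++global_depth_;
  }

  std::size_t bucket_size_;
  int global_depth_ = 0;
  std::vector<std::shared_ptr<Bucket>> dir_;
};
```

With bucket size 2, inserting the keys 0 to 7 forces several splits and at least one directory doubling. Every key stays findable, and the directory size is always 2 to the power of the global depth.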

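(While writing these notes I noticed my code had no comments at all, which is not a good habit.)

When running make for Task 1, the tooling suggested using std::any_of, so here is a brief introduction.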

Since C++11 it is declared as

template< class InputIt, class UnaryPredicate >
bool any_of( InputIt first, InputIt last, UnaryPredicate p );

What it does:

Checks if unary predicate p returns true for at least one element in the range [first, last).

Its two "siblings":

template< class InputIt, class UnaryPredicate >
bool all_of( InputIt first, InputIt last, UnaryPredicate p );

Checks if unary predicate p returns true for all elements in the range [first, last).

template< class InputIt, class UnaryPredicate >
bool none_of( InputIt first, InputIt last, UnaryPredicate p );

Checks if unary predicate p returns true for no elements in the range [first, last).
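A small usage sketch of the three algorithms on a bucket-like list of pairs. The helper names and the Pairs alias are illustrative only. The predicates only read the elements, which is how these algorithms are meant to be used.

```cpp
#include <algorithm>
#include <cassert>
#include <list>
#include <utility>

using Pairs = std::list<std::pair<int, int>>;

// True if some pair has this key.
bool HasKey(const Pairs &items, int key) {
  return std::any_of(items.begin(), items.end(),
                     [key](const auto &item) { return item.first == key; });
}

// True if every value is positive (vacuously true for an empty list).
bool AllValuesPositive(const Pairs &items) {
  return std::all_of(items.begin(), items.end(),
                     [](const auto &item) { return item.second > 0; });
}

// True if no key is negative (also true for an empty list).
bool NoNegativeKeys(const Pairs &items) {
  return std::none_of(items.begin(), items.end(),
                      [](const auto &item) { return item.first < 0; });
}
```

On an empty range, any_of returns false while all_of and none_of both return true.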

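The bucket's Remove method originally used std::any_of with a lambda that called list_.remove(item) on the element it had just matched. That only works by accident. The predicate mutates the list being traversed, and the algorithm may still use the now-dangling iterator after the predicate returns (libstdc++, for example, implements any_of on top of find_if and then compares the returned iterator with last). The trailing `return false;` was also unreachable. A safer version locates the element with std::find_if and erases it through the returned iterator:

template <typename K, typename V>
auto ExtendibleHashTable<K, V>::Bucket::Remove(const K &key) -> bool {
  // Locate the pair with this key; an empty bucket simply yields end().
  auto it = std::find_if(list_.begin(), list_.end(), [&key](const auto &item) { return item.first == key; });
  if (it == list_.end()) {
    return false;  // key not in this bucket
  }
  list_.erase(it);  // erase by iterator: no mutation inside the predicate
  return true;
}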

The third argument passed to the algorithm is a lambda (an anonymous function).

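The basic form of a lambda is [](){}.

[] holds the captured variables, () the parameter list, and {} the function body.

A lambda has no name: [] stands where a function name would be. Its return type is deduced from its return statements unless you write it explicitly with -> type.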
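A short sketch of the two capture styles and of return types. CaptureDemo and half_of are illustrative names. Capturing by value copies the variable when the lambda is created; capturing by reference sees later changes.

```cpp
#include <cassert>

int CaptureDemo() {
  int x = 1;
  auto by_value = [x]() { return x * 10; };  // copies x (1) right now
  auto by_ref = [&x]() { return x * 10; };   // refers to x itself
  x = 5;
  return by_value() + by_ref();              // 10 + 50, return type deduced as int
}

// Explicit trailing return type instead of deduction.
auto half_of = [](int v) -> double { return v / 2.0; };
```

CaptureDemo() returns 60 and half_of(3) returns 1.5. In the Remove code above, [&key] captures the key by reference, which is enough because the lambda does not outlive the call.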

Task #2 - LRU-K Replacement Policy

Introduction to the LRU-K and 2Q cache algorithms - Jianshu (简书)

This article gives a short introduction to LRU-K.

Here is how 15-445 describes it:

The LRU-K algorithm evicts a frame whose backward k-distance is maximum of all frames in the replacer. Backward k-distance is computed as the difference in time between current timestamp and the timestamp of kth previous access. A frame with less than k historical accesses is given +inf as its backward k-distance. When multipe frames have +inf backward k-distance, the replacer evicts the frame with the earliest timestamp.


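In other words, the frame evicted is the one whose k-th most recent access lies furthest in the past. A frame that has not yet been accessed k times gets a backward k-distance of +inf. Frames with +inf distance are evicted first, earliest timestamp first, so among them the policy degenerates to FIFO.

With that summary, Task 2 is not that hard.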
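To see the rule in its literal form, here is a sketch that stores each frame's last k access timestamps and picks the victim by backward k-distance. ExactLruK is a hypothetical class, not the project's replacer. Every frame is treated as evictable, k is assumed to be at least 1, and "earliest timestamp" for +inf frames is taken to mean the first access.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <optional>
#include <unordered_map>

class ExactLruK {
 public:
  explicit ExactLruK(std::size_t k) : k_(k) {}  // requires k >= 1

  void RecordAccess(int frame) {
    auto &times = history_[frame];
    times.push_back(now_++);
    if (times.size() > k_) times.pop_front();  // keep only the last k timestamps
  }

  std::optional<int> Evict() {
    std::optional<int> victim;
    bool victim_inf = false;
    std::size_t victim_ts = 0;
    for (const auto &[frame, times] : history_) {
      bool inf = times.size() < k_;  // fewer than k accesses: +inf distance
      // front() is the k-th most recent access, or the first access if inf.
      // An older front() means a larger backward k-distance.
      std::size_t ts = times.front();
      bool better = !victim || (inf && !victim_inf) ||
                    (inf == victim_inf && ts < victim_ts);
      if (better) {
        victim = frame;
        victim_inf = inf;
        victim_ts = ts;
      }
    }
    if (victim) history_.erase(*victim);
    return victim;
  }

 private:
  std::size_t k_;
  std::size_t now_ = 0;
  std::unordered_map<int, std::deque<std::size_t>> history_;
};
```

With k = 2 and accesses 1, 1, 2, 2, 1 (timestamps 0 to 4), frame 1 is evicted first even though it was accessed last: its 2nd most recent access (time 1) is older than frame 2's (time 2). A cache list that moves a frame to the head on every access, as in the two-list design below, would evict frame 2 here instead.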

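The LRU-K article linked above uses two data structures: a list history_list_ for pages accessed fewer than k times, and a list cache_list_ for pages accessed at least k times.

So we keep two linked lists accordingly.

When a page's access count reaches k, it must be removed from history_list_ and added to cache_list_. To make that removal fast, we add a hash map from each page to its iterator in the list.

cache_list_ gets its own hash map for the same reason.

Finally, we need one hash map recording each page's access count and another recording whether each page is evictable (i.e. not pinned).

So the final member variables are: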

  size_t curr_size_{0};
  size_t replacer_size_;
  std::unordered_map<frame_id_t, size_t> access_count_;
  std::list<frame_id_t> history_list_;
  std::unordered_map<frame_id_t, std::list<frame_id_t>::iterator> history_map_;
  std::list<frame_id_t> cache_list_;
  std::unordered_map<frame_id_t, std::list<frame_id_t>::iterator> cache_map_;
  std::unordered_map<frame_id_t, bool> is_evictable_;
  size_t k_;
  std::mutex latch_;

Now the member functions, starting with Evict().

To find the frame to evict, look in history_list_ first; only if it has no evictable frame, look in cache_list_.

auto LRUKReplacer::Evict(frame_id_t *frame_id) -> bool {
  std::scoped_lock<std::mutex> lock(latch_);
  if (curr_size_ == 0) {
    return false;  // nothing is evictable
  }
  // Frames with fewer than k accesses (+inf distance) go first. history_list_
  // is newest-first, so scanning from the back finds the earliest first access.
  for (auto it = history_list_.rbegin(); it != history_list_.rend(); ++it) {
    if (is_evictable_[*it]) {
      *frame_id = *it;
      access_count_[*frame_id] = 0;
      history_list_.erase(history_map_[*frame_id]);
      history_map_.erase(*frame_id);
      is_evictable_[*frame_id] = false;
      curr_size_--;
      return true;  // return at once; `it` must not be reused after the erase
    }
  }
  // Otherwise take the least recently accessed evictable frame in cache_list_.
  for (auto it = cache_list_.rbegin(); it != cache_list_.rend(); ++it) {
    if (is_evictable_[*it]) {
      *frame_id = *it;
      access_count_[*frame_id] = 0;
      cache_list_.erase(cache_map_[*frame_id]);
      cache_map_.erase(*frame_id);
      is_evictable_[*frame_id] = false;
      curr_size_--;
      return true;
    }
  }
  return false;
}

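RecordAccess() moves a frame from history_list_ to cache_list_ when its count reaches k. Two details matter: the stale entry must also be dropped from history_map_, and with k = 1 the frame was never in history_list_, so erasing history_map_[frame_id] there would use an invalid iterator.

void LRUKReplacer::RecordAccess(frame_id_t frame_id) {
  std::scoped_lock<std::mutex> lock(latch_);
  if (frame_id < 0 || frame_id >= static_cast<int>(replacer_size_)) {
    throw std::exception();  // invalid frame id
  }
  size_t count = ++access_count_[frame_id];
  if (count == k_) {  // just reached k accesses: move into the cache
    auto h = history_map_.find(frame_id);
    if (h != history_map_.end()) {  // absent when k_ == 1
      history_list_.erase(h->second);
      history_map_.erase(h);
    }
    cache_list_.push_front(frame_id);
    cache_map_[frame_id] = cache_list_.begin();
  } else if (count > k_) {  // already cached: move to the front to mark the access
    cache_list_.erase(cache_map_[frame_id]);
    cache_list_.push_front(frame_id);
    cache_map_[frame_id] = cache_list_.begin();
  } else if (count == 1) {  // first access (k_ > 1): enter the history list
    history_list_.push_front(frame_id);
    history_map_[frame_id] = history_list_.begin();
  }
}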

The remaining functions can be written by following the steps in the header comments.
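As an illustration of what those remaining functions involve, here is a sketch of SetEvictable, Remove and Size over the same kind of members as above. ReplacerState, int frame ids and std::logic_error are stand-ins of my own. The behavior (only a real state change moves curr_size_, unknown frames are ignored, removing a non-evictable frame is an error) is my reading of the header comments, not something verified against the grader. It also assumes history_map_ only holds frames that are currently in history_list_.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <stdexcept>
#include <unordered_map>

struct ReplacerState {
  std::size_t curr_size_{0};
  std::unordered_map<int, std::size_t> access_count_;
  std::list<int> history_list_;
  std::unordered_map<int, std::list<int>::iterator> history_map_;
  std::list<int> cache_list_;
  std::unordered_map<int, std::list<int>::iterator> cache_map_;
  std::unordered_map<int, bool> is_evictable_;

  void SetEvictable(int frame_id, bool evictable) {
    auto it = access_count_.find(frame_id);
    if (it == access_count_.end() || it->second == 0) {
      return;  // not tracked (never accessed, or already evicted)
    }
    bool &flag = is_evictable_[frame_id];
    if (flag == evictable) {
      return;  // no state change, so curr_size_ stays the same
    }
    flag = evictable;
    if (evictable) {
      ++curr_size_;
    } else {
      --curr_size_;
    }
  }

  void Remove(int frame_id) {
    auto it = access_count_.find(frame_id);
    if (it == access_count_.end() || it->second == 0) {
      return;  // nothing to remove
    }
    if (!is_evictable_[frame_id]) {
      throw std::logic_error("Remove called on a non-evictable frame");
    }
    if (auto h = history_map_.find(frame_id); h != history_map_.end()) {
      history_list_.erase(h->second);
      history_map_.erase(h);
    } else if (auto c = cache_map_.find(frame_id); c != cache_map_.end()) {
      cache_list_.erase(c->second);
      cache_map_.erase(c);
    }
    access_count_.erase(it);
    is_evictable_.erase(frame_id);
    --curr_size_;
  }

  std::size_t Size() const { return curr_size_; }
};
```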

Task #3 - Buffer Pool Manager Instance

This task puts Task 1 and Task 2 to work, and here we see what each one is for.

Open the header and look at the member variables:

  /** Page table for keeping track of buffer pool pages. */
  ExtendibleHashTable<page_id_t, frame_id_t> *page_table_;
  /** Replacer to find unpinned pages for replacement. */
  LRUKReplacer *replacer_;

Clearly, the extendible hash table stores the mapping from page id to frame id, and LRU-K is the page replacement policy.

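Again, this part mostly follows the steps in the comments. One thing to watch: the page array pages_ must be indexed by frame_id.

The reason is how it is allocated: pages_ = new Page[pool_size_];

pool_size_ is the size of the buffer pool, and the buffer pool holds frames, so the array is naturally indexed by frame number, i.e. frame_id.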
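To make the frame_id indexing concrete, here is a toy pool. TinyPool, its Page struct, NewPage and Lookup are hypothetical simplifications: there is no disk I/O and no latch, and a std::unordered_map stands in for the ExtendibleHashTable page table. When no frame is free it returns nullptr, where the real code would ask the LRU-K replacer for a victim.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <unordered_map>

struct Page {
  int page_id = -1;
  int pin_count = 0;
};

class TinyPool {
 public:
  explicit TinyPool(std::size_t pool_size) : pages_(new Page[pool_size]) {
    for (std::size_t i = 0; i < pool_size; ++i) {
      free_list_.push_back(static_cast<int>(i));  // every frame starts free
    }
  }
  ~TinyPool() { delete[] pages_; }
  TinyPool(const TinyPool &) = delete;
  TinyPool &operator=(const TinyPool &) = delete;

  // Places a new page in a free frame; nullptr when no frame is free.
  Page *NewPage() {
    if (free_list_.empty()) {
      return nullptr;
    }
    int frame_id = free_list_.front();
    free_list_.pop_front();
    Page &page = pages_[frame_id];  // indexed by frame id, not page id
    page.page_id = next_page_id_++;
    page.pin_count = 1;
    page_table_[page.page_id] = frame_id;  // page id -> frame id
    return &page;
  }

  // Page id -> frame id via the page table, then frame id -> Page via pages_.
  Page *Lookup(int page_id) {
    auto it = page_table_.find(page_id);
    return it == page_table_.end() ? nullptr : &pages_[it->second];
  }

 private:
  Page *pages_;
  std::list<int> free_list_;
  std::unordered_map<int, int> page_table_;
  int next_page_id_ = 0;
};
```

Page ids keep growing without bound while frame ids stay within [0, pool_size), which is why pages_[page_id] would eventually go out of range.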

Full code:

GitHub - HexyinUESTC/bustub_2022 at local_new