【CMU 15-445/645 2022 FALL】

海是不够秃

Last modified on 2023-05-06 11:56:57


Tags: database, C++

First published on 2023-05-04 19:09:50

Article link: https://blog.csdn.net/qq_44763629/article/details/130492495


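CMU 15-445/645 Fall 2022: Project 1

Course link: this is the database course at Carnegie Mellon University

Original Fall 2022 code: the course and its code are updated every year, so contact me if the link stops working

Code style and task details differ slightly between versions, so watch for the differences. For example, the 2022 version makes heavy use of templates and has you implement an Extendible Hash Table to replace unordered_map

Course Gradescope: non-CMU students can also run the online tests. See the course ASSIGNMENTS and FAQ pages for how to use it. The 2022 labs should remain open for online submission until 2024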

Task #1 Extendible Hash Table

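A gripe: this task feels like implementing something for its own sake. Just using unordered_map would be so much nicer.

The Extendible Hash Table is not used until Task 3.

Before implementing insert, remove, and find for the Extendible Hash Table, make sure its logic is clear, especially the roles of global_depth_ and local_depth_.

global_depth_ has three roles:
- It fixes the size of dir_, which is $2^{global\_depth\_}$ slots
- The low global_depth_ bits of hash(key) are used as the index into dir_
- It bounds local_depth_ (a Bucket's local_depth_ never exceeds global_depth_), which keeps Buckets from deepening without limit and hurting efficiency
local_depth_ is kept per Bucket, i.e. per dir_[index]. It is the number of low hash bits that all keys in that Bucket share, so $2^{global\_depth\_ - local\_depth\_}$ slots of dir_ point at the same Bucket. The capacity of a Bucket is a separate, fixed setting (bucket_size_ in BusTub).

Insert(), Find(), and Remove() all compute the dir_ index with the code below. I am not fluent with bit operations, so I am writing it down.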

template <typename K, typename V>
auto ExtendibleHashTable<K, V>::IndexOf(const K &key) -> size_t {
  int mask = (1 << global_depth_) - 1;  // mask consists of global_depth_ one-bits
  return std::hash<K>()(key) & mask;    // keep the low global_depth_ bits of the hash
}
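
To make the masking concrete, here is a small standalone sketch. `LowBits` is a hypothetical helper, not part of BusTub. It takes an already computed hash value, so the result does not depend on how `std::hash` behaves on a given platform.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: keep only the lowest `depth` bits of `hash`.
// This mirrors what IndexOf does with global_depth_.
inline auto LowBits(std::size_t hash, int depth) -> std::size_t {
  assert(depth >= 0 && depth < 32);
  const std::size_t mask = (std::size_t{1} << depth) - 1;  // `depth` one-bits
  return hash & mask;
}
```

With a depth of 2, a hash of 0b1000 maps to slot 0b00 and a hash of 0b1011 maps to slot 0b11.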

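Next, when do global_depth_ and local_depth_ need to grow?

As in the figure, when a new element with hash(key) = 0b1000 is inserted, index is the low two bits $0b00$, so the element belongs in dir_[0]. But that Bucket already holds 2 elements, i.e. it is full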

[Image]

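So the Extendible Hash Table has to be expanded when, and only when, the target Bucket is full. If that Bucket's local_depth_ equals global_depth_, the directory must double first (steps 1 to 3). If local_depth_ is smaller, only the split in step 4 is needed.

1. $dir\_.resize(2^{global\_depth\_ + 1})$
2. $dir\_[2^{global\_depth\_} + i] = dir\_[i],\ 0 \le i < 2^{global\_depth\_}$
3. $global\_depth\_$++
4. Split $dir\_[index]$: create a new Bucket, point $dir\_[2^{global\_depth\_ - 1} + index]$ at it (index being the slot computed before doubling), and redistribute the $\langle key, value \rangle$ pairs of $dir\_[index]$ between the two Buckets

Note that step 2 must really copy the Bucket pointers. Otherwise a call such as Find(0b11) runs into the following problem: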

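[Image]

The correct approach is shown in the figure below:

[Image]

After the insert completes, the table looks like this:

[Image]

In the figure a Bucket split has happened, but the Bucket produced by the split contains no elements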
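
The expansion steps, including the pointer copy during directory doubling and the retry after a split, can be sketched as a toy table. This is a minimal sketch under stated assumptions, not the BusTub class: `ToyTable` and its members are hypothetical names, keys are plain `int` values hashed by identity, and there is no locking.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Toy extendible hash table: int keys, identity hash, no locking.
class ToyTable {
 public:
  explicit ToyTable(std::size_t bucket_capacity) : capacity_(bucket_capacity) {
    assert(bucket_capacity > 0);
    dir_.push_back(std::make_shared<Bucket>());
  }

  void Insert(int key, int value) {
    while (true) {
      std::shared_ptr<Bucket> bucket = dir_[IndexOf(key)];
      for (auto &kv : bucket->items) {
        if (kv.first == key) {
          kv.second = value;  // key already present: overwrite
          return;
        }
      }
      if (bucket->items.size() < capacity_) {
        bucket->items.emplace_back(key, value);
        return;
      }
      Split(bucket);  // bucket full: split, then retry the insert
    }
  }

  auto Find(int key, int *value) const -> bool {
    for (const auto &kv : dir_[IndexOf(key)]->items) {
      if (kv.first == key) {
        *value = kv.second;
        return true;
      }
    }
    return false;
  }

  auto GlobalDepth() const -> int { return global_depth_; }

 private:
  struct Bucket {
    int depth = 0;  // local depth
    std::vector<std::pair<int, int>> items;
  };

  auto IndexOf(int key) const -> std::size_t {
    return static_cast<std::size_t>(key) & ((std::size_t{1} << global_depth_) - 1);
  }

  void Split(const std::shared_ptr<Bucket> &old_bucket) {
    if (old_bucket->depth == global_depth_) {
      // Double dir_, copy the Bucket pointers, bump global_depth_.
      const std::size_t old_size = dir_.size();
      dir_.resize(old_size * 2);
      for (std::size_t i = 0; i < old_size; ++i) {
        dir_[old_size + i] = dir_[i];
      }
      ++global_depth_;
    }
    // Slots that pointed at the old bucket and have the newly
    // significant bit set are repointed at a fresh bucket.
    const std::size_t bit = std::size_t{1} << old_bucket->depth;
    auto new_bucket = std::make_shared<Bucket>();
    new_bucket->depth = ++old_bucket->depth;
    for (std::size_t i = 0; i < dir_.size(); ++i) {
      if (dir_[i] == old_bucket && (i & bit) != 0) {
        dir_[i] = new_bucket;
      }
    }
    // Redistribute the old bucket's pairs through the updated directory.
    std::vector<std::pair<int, int>> items;
    items.swap(old_bucket->items);
    for (const auto &kv : items) {
      dir_[IndexOf(kv.first)]->items.push_back(kv);
    }
  }

  std::size_t capacity_;
  int global_depth_ = 0;
  std::vector<std::shared_ptr<Bucket>> dir_;
};
```

With a capacity of 2, inserting keys 0b0000, 0b0100 and then 0b1000 forces several splits in a row, because the first two keys agree on their two lowest bits. The early splits leave the new Bucket empty, which is the situation described for the last figure.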

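Remember to take the lock!

Be careful with erase while iterating over a std::list. At first I thought my implementation logic was wrong, then I found out I had blown away my own iterator...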

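#include <iostream>
#include <vector>
#include <list>

using namespace std;

// Print every element of an STL container on one line.
template<class K>
auto stlprint(const K& l) -> void {
    for(auto ite = l.begin(); ite != l.end(); ite++) {
        cout << *ite << " ";
    }
    cout << endl;
}

int main()
{
    list<int> l{1,2,3,4,5,6,7,8,9};
    stlprint(l);
    // Correct usage: ite++ advances first, erase() gets the old position
    for(auto ite = l.begin(); ite != l.end();) {
        if (*ite == 5) {
            l.erase(ite++);
        } else {
            ite++;
        }
    }
    stlprint(l);
    // Deliberately wrong usage: erase() invalidates ite, so the ite++ in the
    // loop header is undefined behavior. The same pattern is invalid for
    // vector as well, even when it appears to work.
    for(auto ite = l.begin(); ite != l.end(); ite++) {
        if (*ite == 4) {
            l.erase(ite);   // interrupted by signal 11: SIGSEGV
            // with a list the iterator is simply gone, so the traversal cannot continue
        }
    }
    stlprint(l);
    return 0;
}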
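
A pattern that works for both std::list and std::vector is to continue from the iterator that erase() returns. The sketch below shows it; `EraseAll` and `EraseAllSelfCheck` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <list>
#include <vector>

// Erase every element equal to `value` while iterating.
// erase() returns the iterator following the removed element,
// so the loop never touches an invalidated iterator.
template <typename Container, typename T>
void EraseAll(Container &c, const T &value) {
  for (auto it = c.begin(); it != c.end();) {
    if (*it == value) {
      it = c.erase(it);
    } else {
      ++it;
    }
  }
}

// Small usage example that can be called from main().
inline void EraseAllSelfCheck() {
  std::list<int> l{1, 2, 3, 4, 5};
  EraseAll(l, 4);
  assert((l == std::list<int>{1, 2, 3, 5}));
}
```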

Task #2 LRU-K Replacer

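LRU records the timestamp of each frame's most recent use and evicts the frame whose most recent use is the oldest. An implementation neither needs nor benefits from real timestamps; a list is enough to express the relative order. An $O(1)$ LRU needs one list<key> and one unordered_map<key, list<key>::iterator>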
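
The list plus map combination can be sketched as follows. This is a minimal illustration with hypothetical names (`LruOrder`, `Touch`, `Evict`), `int` keys, no capacity limit and no locking.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <unordered_map>

// Minimal LRU order tracker: front of the list = most recently used.
class LruOrder {
 public:
  // Record an access by moving `key` to the front.
  void Touch(int key) {
    auto it = pos_.find(key);
    if (it != pos_.end()) {
      order_.erase(it->second);  // O(1): the map hands us the list node
    }
    order_.push_front(key);
    pos_[key] = order_.begin();
  }

  // Remove the least recently used key and store it in *victim.
  // Returns false when nothing is tracked.
  auto Evict(int *victim) -> bool {
    assert(victim != nullptr);
    if (order_.empty()) {
      return false;
    }
    *victim = order_.back();
    pos_.erase(*victim);
    order_.pop_back();
    return true;
  }

  auto Size() const -> std::size_t { return order_.size(); }

 private:
  std::list<int> order_;
  std::unordered_map<int, std::list<int>::iterator> pos_;
};
```

Iterators of std::list stay valid while other elements are inserted or erased, which is why they can be stored in the map.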

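LRU-K records the timestamps of each frame's K most recent uses and evicts the frame whose K-th most recent use is the oldest.

frame_call_counts records the number of accesses per frame. While the count is below k:
- if the frame is not yet in eventmap, record it in eventlist and eventmap
- if it is already in eventmap, leave its position alone and only bump the access count
When the count reaches k, remove the frame from eventlist and eventmap and add it to lruklist and lrukmap
When the count exceeds k, maintain lruklist with the plain LRU policy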

  size_t curr_size_{0};
  size_t replacer_size_;
  size_t k_;
  std::mutex latch_;

  std::list<frame_id_t> lruklist_;                                           // list of lru
  std::unordered_map<frame_id_t, std::list<frame_id_t>::iterator> lrukmap_;  // map of lru

  std::unordered_map<frame_id_t, bool> evictable_;
  std::unordered_map<frame_id_t, size_t> frame_call_counts_;
  // std::unordered_map<frame_id_t, std::list<size_t> > frame_call_list_;
  // no need to keep a list of the last k accesses

  std::list<frame_id_t> eventlist_;                                           // list of event
  std::unordered_map<frame_id_t, std::list<frame_id_t>::iterator> eventmap_;  // map of event

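The LRU-K eviction flow is:

1. Evict first from eventmap and eventlist, i.e. among frames with fewer than k accesses, because the time of their K-th most recent access is treated as infinitely far back. Note that among these frames the victim is chosen not by access count but by the timestamp of the first access, i.e. from the tail of the list (if you insert at the head). Also check whether the frame is pinned: only a frame whose evictable state is true may be evicted
2. When eventmap and eventlist contain no eligible frame, evict the least recently used frame from lrukmap and lruklist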
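
The bookkeeping and eviction order described above can be sketched as a toy replacer. All names (`TwoQueueReplacer`, `history_`, `cache_`) are hypothetical, frames are plain `int` values, frames start out non-evictable as an assumption of this sketch, and there is no locking. It follows the simplification used in this post: frames with at least k accesses are ordered by plain LRU rather than by the exact timestamp of their K-th most recent access.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <unordered_map>

// Sketch of the two-queue scheme from this post; not the BusTub class.
class TwoQueueReplacer {
 public:
  explicit TwoQueueReplacer(std::size_t k) : k_(k) { assert(k >= 1); }

  void RecordAccess(int frame) {
    const std::size_t count = ++counts_[frame];
    if (count < k_) {
      // Fewer than k accesses: keep the position of the first access.
      if (count == 1) {
        PushFront(&history_, &history_pos_, frame);
      }
    } else {
      // The k-th access leaves the history queue; later accesses
      // only refresh the LRU position.
      Remove(&history_, &history_pos_, frame);
      Remove(&cache_, &cache_pos_, frame);
      PushFront(&cache_, &cache_pos_, frame);
    }
  }

  void SetEvictable(int frame, bool evictable) { evictable_[frame] = evictable; }

  // History queue first (oldest first access), then the LRU queue.
  auto Evict(int *victim) -> bool {
    return EvictFrom(&history_, &history_pos_, victim) ||
           EvictFrom(&cache_, &cache_pos_, victim);
  }

 private:
  using List = std::list<int>;
  using Pos = std::unordered_map<int, List::iterator>;

  static void PushFront(List *l, Pos *p, int frame) {
    l->push_front(frame);
    (*p)[frame] = l->begin();
  }

  static void Remove(List *l, Pos *p, int frame) {
    auto it = p->find(frame);
    if (it == p->end()) {
      return;
    }
    l->erase(it->second);
    p->erase(it);
  }

  auto EvictFrom(List *l, Pos *p, int *victim) -> bool {
    for (auto it = l->rbegin(); it != l->rend(); ++it) {
      auto ev = evictable_.find(*it);
      if (ev != evictable_.end() && ev->second) {
        *victim = *it;
        Remove(l, p, *victim);  // `it` is not used after this point
        counts_.erase(*victim);
        evictable_.erase(*victim);
        return true;
      }
    }
    return false;
  }

  std::size_t k_;
  List history_;  // frames with fewer than k accesses
  Pos history_pos_;
  List cache_;    // frames with at least k accesses
  Pos cache_pos_;
  std::unordered_map<int, std::size_t> counts_;
  std::unordered_map<int, bool> evictable_;
};
```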

Remember to take the lock!

Task #3 Buffer Pool Manager Instance

This task implements a buffer pool manager. It touches disk reads and writes, LRU-K frame replacement, and more

The main data members are:

  /** Array of buffer pool pages. */
  Page *pages_;
  /** Pointer to the disk manager. */
  DiskManager *disk_manager_ __attribute__((__unused__));
  /** Pointer to the log manager. Please ignore this for P1. */
  LogManager *log_manager_ __attribute__((__unused__));
  /** Page table for keeping track of buffer pool pages. */
  ExtendibleHashTable<page_id_t, frame_id_t> *page_table_;
  /** Replacer to find unpinned pages for replacement. */
  LRUKReplacer *replacer_;
  /** List of free frames that don't have any pages on them. */
  std::list<frame_id_t> free_list_;
  /** This latch protects shared data structures. We recommend updating this comment to describe what it protects. */
  std::mutex latch_;

Here pages_ is constructed as a Page array of the given size. The main data members of Page are:

  /** The actual data that is stored within a page. */
  char data_[BUSTUB_PAGE_SIZE]{};
  /** The ID of this page. */
  page_id_t page_id_ = INVALID_PAGE_ID;
  /** The pin count of this page. */
  int pin_count_ = 0;
  /** True if the page is dirty, i.e. it is different from its corresponding page on disk. */
  bool is_dirty_ = false;
  /** Page latch. */
  ReaderWriterLatch rwlatch_;

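The two most important member functions to implement in this task are NewPgImp and FetchPgImp. The header file spells out their responsibilities clearly. Points to watch:

- Whenever a page being replaced is dirty, write it back to disk promptly!
- Clear or reinitialize data_, page_id_, pin_count_, and is_dirty_ of the Page promptly!
- In DeletePgImp, the frame must be added back to free_list_
- Many Buffer Pool Manager operations have to be paired with LRUKReplacer calls, since the former's logic is built on the latter: NewPgImp and FetchPgImp must call LRUKReplacer to record the access, DeletePgImp must remove the frame's record from LRUKReplacer, and so on
- Remember to take the lock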

/**
   * TODO(P1): Add implementation
   *
   * @brief Create a new page in the buffer pool. Set page_id to the new page's id, or nullptr if all frames
   * are currently in use and not evictable (in another word, pinned).
   *
   * You should pick the replacement frame from either the free list or the replacer (always find from the free list
   * first), and then call the AllocatePage() method to get a new page id. If the replacement frame has a dirty page,
   * you should write it back to the disk first. You also need to reset the memory and metadata for the new page.
   *
   * Remember to "Pin" the frame by calling replacer.SetEvictable(frame_id, false)
   * so that the replacer wouldn't evict the frame before the buffer pool manager "Unpin"s it.
   * Also, remember to record the access history of the frame in the replacer for the lru-k algorithm to work.
   *
   * @param[out] page_id id of created page
   * @return nullptr if no new pages could be created, otherwise pointer to new page
   */
  auto NewPgImp(page_id_t *page_id) -> Page * override;

  /**
   * TODO(P1): Add implementation
   *
   * @brief Fetch the requested page from the buffer pool. Return nullptr if page_id needs to be fetched from the disk
   * but all frames are currently in use and not evictable (in another word, pinned).
   *
   * First search for page_id in the buffer pool. If not found, pick a replacement frame from either the free list or
   * the replacer (always find from the free list first), read the page from disk by calling disk_manager_->ReadPage(),
   * and replace the old page in the frame. Similar to NewPgImp(), if the old page is dirty, you need to write it back
   * to disk and update the metadata of the new page
   *
   * In addition, remember to disable eviction and record the access history of the frame like you did for NewPgImp().
   *
   * @param page_id id of page to be fetched
   * @return nullptr if page_id cannot be fetched, otherwise pointer to the requested page
   */
  auto FetchPgImp(page_id_t page_id) -> Page * override;
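
The NewPgImp flow documented above (free list first, then a victim, write back if dirty, reset, pin) can be sketched with a toy pool. Everything here is a hypothetical stand-in and not BusTub code: `ToyPool` and `ToyPage` are made-up names, the "disk" is an in-memory map, a page holds a single `int`, and a linear scan for an unpinned frame replaces the LRU-K replacer.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <unordered_map>
#include <vector>

struct ToyPage {
  int page_id = -1;  // -1 plays the role of INVALID_PAGE_ID
  int pin_count = 0;
  bool is_dirty = false;
  int data = 0;      // stand-in for the page bytes
};

class ToyPool {
 public:
  explicit ToyPool(std::size_t frames) : pages_(frames) {
    assert(frames > 0);
    for (std::size_t i = 0; i < frames; ++i) {
      free_list_.push_back(i);
    }
  }

  // Returns the new, pinned page, or nullptr if every frame is pinned.
  auto NewPage() -> ToyPage * {
    std::size_t frame = 0;
    if (!free_list_.empty()) {  // 1. the free list comes first
      frame = free_list_.front();
      free_list_.pop_front();
    } else {                    // 2. otherwise look for a victim
      bool found = false;
      for (std::size_t i = 0; i < pages_.size() && !found; ++i) {
        if (pages_[i].pin_count == 0) {
          frame = i;
          found = true;
        }
      }
      if (!found) {
        return nullptr;
      }
      ToyPage &old = pages_[frame];
      if (old.is_dirty) {       // 3. write a dirty victim back
        disk_[old.page_id] = old.data;
      }
      page_table_.erase(old.page_id);
    }
    ToyPage &page = pages_[frame];
    page = ToyPage{};           // 4. reset data and metadata
    page.page_id = next_page_id_++;
    page.pin_count = 1;         // 5. pin the new page
    page_table_[page.page_id] = frame;
    return &page;
  }

  auto Unpin(int page_id, bool is_dirty) -> bool {
    auto it = page_table_.find(page_id);
    if (it == page_table_.end() || pages_[it->second].pin_count == 0) {
      return false;
    }
    ToyPage &page = pages_[it->second];
    --page.pin_count;
    page.is_dirty = page.is_dirty || is_dirty;
    return true;
  }

  // Test helper: read what was written back to the fake disk.
  auto OnDisk(int page_id, int *data) const -> bool {
    auto it = disk_.find(page_id);
    if (it == disk_.end()) {
      return false;
    }
    *data = it->second;
    return true;
  }

 private:
  std::vector<ToyPage> pages_;
  std::list<std::size_t> free_list_;
  std::unordered_map<int, std::size_t> page_table_;
  std::unordered_map<int, int> disk_;
  int next_page_id_ = 0;
};
```

With a single frame, a second NewPage() call fails while the first page is pinned. After the first page is unpinned as dirty, the next NewPage() reuses the frame and the old contents show up on the fake disk.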

Code style

The project enforces strict code style requirements

/* No else after a return */
if (condition) {
  // do something
  return;
}
// do something

/* rather than this */
if (condition) {
  // do something
  return;
} else {
  // do something
}

The online tests are a real pain because the test cases are hidden. It finally passed, but it is slow

[Image]