Project-1-Buffer-Pool-Manager

Lab 2 Summary

 CMU 15-445's second assignment is about implementing a buffer pool in memory. It deepens understanding of how a buffer pool works through three parts: frame management with the LRU algorithm, management of a single buffer pool, and management of multiple buffer pools in parallel.

 Lab page: https://15445.courses.cs.cmu.edu/fall2021/project1/#buffer-pool-instance

 Let's first walk through how a page moves between disk and memory; once these pieces are clear, the rest of the lab is easy to follow.

 1. The DBMS never reads stored pages directly from disk; anyone who has started watching the course will recognize this point.

 2. When a SQL statement is executed, the execution engine first asks the in-memory buffer pool for the page. Two cases can occur:

  • If the page is already in the buffer pool: there are two kinds of page operations, reads and writes, and writes still only modify the frame inside the buffer pool. So with multiple threads at work, a cached page may not yet have been written to disk, i.e. it may be a dirty page. In this case the manager simply pins the page and returns a pointer to it; what is done with the page is decided by the upper layer, and the dirty page is written back to disk later, when the frame is evicted or explicitly flushed.
  • If the page is not in the buffer pool: the buffer pool manager has the disk manager read the needed page from disk, caches it in the pool, and then returns a pointer to that page to the execution engine.

DBMS Gets Page

 3. How pages are swapped in and out of the buffer pool, controlled by policy and in a thread-safe way. The pieces involved are:

  • buffer pool: the array of frames in memory
  • page table: a hash table mapping page_id to frame_id (map[page_id] = frame_id)
  • replacer: decides whether a page may be evicted
  • pin/reference counter: counts the threads currently using the page; only a frame whose pin_count is 0 may be evicted (and its page written back to disk if dirty)

 Whenever a page is operated on, whether it is being created, deleted, or read in from disk, the page must be pinned, meaning it is currently in use by a thread and must not be evicted. Such operations also have to be atomic, so they are protected by a latch.

Buffer Pool Manager
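
To make these pieces concrete, here is a small toy model of the fetch path. It is not the bustub code, just plain standard containers under the assumptions above, showing how the page table, free list, LRU list, and pin counts interact:

#include <list>
#include <optional>
#include <unordered_map>
#include <vector>

// A frame is one in-memory slot that can hold a page.
struct Frame {
  int page_id = -1;
  int pin_count = 0;
  bool is_dirty = false;
  // ... the page's data bytes would live here ...
};

struct ToyBufferPool {
  std::vector<Frame> frames;                // the buffer pool: an array of frames
  std::unordered_map<int, int> page_table;  // page_id -> frame index
  std::list<int> free_list;                 // frames not holding any page yet
  std::list<int> lru;                       // evictable frames, least recently used at the back

  // Returns the index of the frame holding page_id, loading it from "disk" if needed.
  std::optional<int> Fetch(int page_id) {
    if (auto it = page_table.find(page_id); it != page_table.end()) {
      frames[it->second].pin_count++;       // case 1: already cached, just pin it
      lru.remove(it->second);               // a pinned frame must not be evictable
      return it->second;
    }
    int frame_id;                           // case 2: bring the page in from disk
    if (!free_list.empty()) {
      frame_id = free_list.front();         // prefer a free frame
      free_list.pop_front();
    } else if (!lru.empty()) {
      frame_id = lru.back();                // otherwise evict the least recently used frame
      lru.pop_back();
      if (frames[frame_id].is_dirty) { /* write the evicted page back to disk */ }
      page_table.erase(frames[frame_id].page_id);
    } else {
      return std::nullopt;                  // every frame is pinned: give up
    }
    /* read page_id's data from disk into frames[frame_id] */
    frames[frame_id] = Frame{page_id, 1, false};  // comes back pinned by the caller
    page_table[page_id] = frame_id;
    return frame_id;
  }
};

The real FetchPgImp/NewPgImp implementations later in this post follow the same shape, just with bustub's Page, DiskManager, and latch plumbing added.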

LRU Replacement Policy

My Understanding

 The LRU replacement policy evicts whatever was used least recently. In the data structure this means removing the entry at the tail of the list, while a frame that gets used is inserted at the head; on insertion we also check whether the list has exceeded its capacity. LRUReplacer manages frame_ids, i.e. it decides which frames in the buffer pool are currently allowed to be evicted. We hand frames over to the LRUReplacer, and the LRU policy decides which frame to victimize and when.

  • Victim: evict a frame from the replacer; the buffer pool manager calls Victim when it needs to replace a page.
  • Pin: a page carries a pin_count recording whether threads are using it, so when a page is pinned it is in use and must not be evicted; in the LRU replacer this means the page's frame must not appear in the evictable list.
  • Unpin: the page may now be evicted, so its frame is added to the evictable list; if it is already there, nothing needs to be done.

 To sum up, the LRU replacer is purely a policy component, one step inside the buffer pool manager; by itself it cannot operate on pages.
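
A minimal usage sketch of that interface, assuming the LRUReplacer implemented below (a replacer sized for 3 frames, with no other activity):

#include "buffer/lru_replacer.h"

void LruUsageSketch() {
  bustub::LRUReplacer replacer(3);  // can track at most 3 evictable frames
  replacer.Unpin(1);                // frame 1 is no longer in use, so it becomes evictable
  replacer.Unpin(2);                // frame 2 becomes evictable as well
  replacer.Pin(2);                  // frame 2 is in use again: remove it from the evictable list
  bustub::frame_id_t victim;
  replacer.Victim(&victim);         // victim == 1, the frame that has been evictable the longest
}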

Implementation Details
TODO 1 Data Structure
 private:
  // TODO(student): implement me!
  std::mutex lru_latch_;  // protects lru_list_ and lru_map_
  size_t num_pages_;      // capacity of the replacer
  std::list<frame_id_t> lru_list_;  // evictable frames; most recently unpinned at the front
  std::unordered_map<frame_id_t, std::list<frame_id_t>::iterator> lru_map_{};  // frame_id -> its node in lru_list_
TODO 2 Constructor and Destructor
LRUReplacer::LRUReplacer(size_t num_pages) {
  std::lock_guard<std::mutex> lock_guard(lru_latch_);
  num_pages_ = num_pages;
}

LRUReplacer::~LRUReplacer() {
  // std::lock_guard<std::mutex> lock_guard(lru_latch_);
  // I felt so dumb, this took me two days, so silly~~~
  lru_list_.clear();
  lru_map_.clear();
}
TODO 3 Victim
bool LRUReplacer::Victim(frame_id_t *frame_id) {
  std::lock_guard<std::mutex> lock_guard(lru_latch_);
  if (lru_list_.empty()) {
    return false;
  }
  *frame_id = lru_list_.back();
  lru_map_.erase(*frame_id);
  lru_list_.pop_back();
  return true;
}
TODO 4 Pin
void LRUReplacer::Pin(frame_id_t frame_id) {
  std::lock_guard<std::mutex> lock_guard(lru_latch_);
  auto item = lru_map_.find(frame_id);
  if (item == lru_map_.end()) {
    return;
  }
  lru_list_.erase(item->second);
  lru_map_.erase(item);
}
TODO 5 Unpin
void LRUReplacer::Unpin(frame_id_t frame_id) {
  std::lock_guard<std::mutex> lock_guard(lru_latch_);
  if (lru_list_.size() >= num_pages_) {
    return;
  }
  if (lru_map_.count(frame_id) != 0) {
    return;
  }
  lru_list_.push_front(frame_id);
  lru_map_[frame_id] = lru_list_.begin();
}
Test
	make lru_replacer_test
	./test/lru_replacer_test

LRU local test

	valgrind --leak-check=full --suppressions=../build_support/valgrind.supp ./test/lru_replacer_test --gtest_filter=LRUReplacerTest.SimplePageTest

LRU memory check

Buffer Pool Manager Instance

My Understanding

 The Buffer Pool Manager implements a handful of operations: creating new pages backed by disk, fetching pages from disk, flushing pages, and so on. For this part, really do go back over the lecture video and slides repeatedly; whenever a page's contents are operated on, the page is pinned, exactly as the slides describe.
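
From the caller's side, here is a minimal sketch of how these operations are meant to be used; it assumes an already constructed manager bpm and the standard bustub method names that the implementations below fill in:

#include <cstring>

#include "buffer/buffer_pool_manager.h"

void BpmUsageSketch(bustub::BufferPoolManager *bpm) {
  bustub::page_id_t page_id;
  bustub::Page *page = bpm->NewPage(&page_id);  // allocate a new page; it comes back pinned (pin_count == 1)
  if (page == nullptr) {
    return;                                     // every frame is pinned, nothing can be evicted
  }
  std::memcpy(page->GetData(), "hello", 6);     // writes go to the in-memory frame, not straight to disk
  bpm->UnpinPage(page_id, /*is_dirty=*/true);   // done for now; mark dirty so it gets flushed on eviction

  page = bpm->FetchPage(page_id);               // pin it again, re-reading from disk if it was evicted
  // ... read page->GetData() ...
  bpm->UnpinPage(page_id, /*is_dirty=*/false);
  bpm->FlushPage(page_id);                      // or force the page out to disk explicitly
}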

Implementation Details
Notice
page.h
 private:
  /** Zeroes out the data that is held within the page. */
  inline void ResetMemory() { memset(data_, OFFSET_PAGE_START, PAGE_SIZE); }

  /** The actual data that is stored within a page. */
  char data_[PAGE_SIZE]{};
  /** The ID of this page. */
  page_id_t page_id_ = INVALID_PAGE_ID;
  /** The pin count of this page. */
  int pin_count_ = 0;
  /** True if the page is dirty, i.e. it is different from its corresponding page on disk. */
  bool is_dirty_ = false;
  /** Page latch. */
  ReaderWriterLatch rwlatch_;
buffer_pool_manager_instance.h
/** Number of pages in the buffer pool. */
  const size_t pool_size_;
  /** How many instances are in the parallel BPM (if present, otherwise just 1 BPI) */
  const uint32_t num_instances_ = 1;
  /** Index of this BPI in the parallel BPM (if present, otherwise just 0) */
  const uint32_t instance_index_ = 0;
  /** Each BPI maintains its own counter for page_ids to hand out, must ensure they mod back to its instance_index_ */
  std::atomic<page_id_t> next_page_id_ = instance_index_;

  /** Array of buffer pool pages. */
  Page *pages_;
  /** Pointer to the disk manager. */
  DiskManager *disk_manager_ __attribute__((__unused__));
  /** Pointer to the log manager. */
  LogManager *log_manager_ __attribute__((__unused__));
  /** Page table for keeping track of buffer pool pages. */
  std::unordered_map<page_id_t, frame_id_t> page_table_;
  /** Replacer to find unpinned pages for replacement. */
  Replacer *replacer_;
  /** List of free pages. */
  std::list<frame_id_t> free_list_;
  /** This latch protects shared data structures. We recommend updating this comment to describe what it protects. */
  std::mutex latch_;
disk_manager.h
  /**
   * Write a page to the database file.
   * @param page_id id of the page
   * @param page_data raw page data
   */
  void WritePage(page_id_t page_id, const char *page_data);

  /**
   * Read a page from the database file.
   * @param page_id id of the page
   * @param[out] page_data output buffer
   */
  void ReadPage(page_id_t page_id, char *page_data);
TODO 1 FlushPgImp
bool BufferPoolManagerInstance::FlushPgImp(page_id_t page_id) {
  // Make sure you call DiskManager::WritePage!
  std::lock_guard<std::mutex> lock_guard(latch_);
  if (page_id == INVALID_PAGE_ID) {
    return false;
  }
  auto item = page_table_.find(page_id);
  if (item == page_table_.end()) {
    return false;
  }

  frame_id_t frame_id = item->second;
  Page *page = &pages_[frame_id];
  disk_manager_->WritePage(page->GetPageId(), page->GetData());
  page->is_dirty_ = false;

  return true;
}
TODO 2 FlushAllPgsImp
void BufferPoolManagerInstance::FlushAllPgsImp() {
  // You can do it!
  std::lock_guard<std::mutex> lock_guard(latch_);
  for (auto &item : page_table_) {
    frame_id_t frame_id = item.second;
    Page *page = &pages_[frame_id];
    disk_manager_->WritePage(page->GetPageId(), page->GetData());
    page->is_dirty_ = false;
  }
}
TODO 3 NewPgImp
Page *BufferPoolManagerInstance::NewPgImp(page_id_t *page_id) {
  // 0.   Make sure you call AllocatePage!
  // 1.   If all the pages in the buffer pool are pinned, return nullptr.
  // 2.   Pick a victim page P from either the free list or the replacer. Always pick from the free list first.
  // 3.   Update P's metadata, zero out memory and add P to the page table.
  // 4.   Set the page ID output parameter. Return a pointer to P.
  std::lock_guard<std::mutex> lock_guard(latch_);

  frame_id_t frame_id;
  Page *page = nullptr;

  if (!free_list_.empty()) {
    frame_id = free_list_.front();
    free_list_.pop_front();
    page = &pages_[frame_id];
  } else if (replacer_->Victim(&frame_id)) {
    page = &pages_[frame_id];
    if (page->IsDirty()) {
      disk_manager_->WritePage(page->GetPageId(), page->GetData());
    }
    page_table_.erase(page->GetPageId());
  } else {
    return nullptr;
  }

  page_id_t new_page_id = AllocatePage();
  page->page_id_ = new_page_id;
  page->is_dirty_ = false;
  page->pin_count_ = 1;
  page->ResetMemory();

  page_table_[page->GetPageId()] = frame_id;
  *page_id = page->GetPageId();
  replacer_->Pin(frame_id);

  return page;
}
TODO 4 FetchPgImp
Page *BufferPoolManagerInstance::FetchPgImp(page_id_t page_id) {
  // 1.     Search the page table for the requested page (P).
  // 1.1    If P exists, pin it and return it immediately.
  // 1.2    If P does not exist, find a replacement page (R) from either the free list or the replacer.
  //        Note that pages are always found from the free list first.
  // 2.     If R is dirty, write it back to the disk.
  // 3.     Delete R from the page table and insert P.
  // 4.     Update P's metadata, read in the page content from disk, and then return a pointer to P.

  std::lock_guard<std::mutex> lock_guard(latch_);
  frame_id_t frame_id;
  Page *page = nullptr;

  auto item = page_table_.find(page_id);
  if (item != page_table_.end()) {
    frame_id = item->second;
    page = &pages_[frame_id];
    page->pin_count_++;
    replacer_->Pin(frame_id);
    return page;
  }

  if (!free_list_.empty()) {
    frame_id = free_list_.front();
    free_list_.pop_front();
    page = &pages_[frame_id];
  } else if (replacer_->Victim(&frame_id)) {
    page = &pages_[frame_id];
    if (page->IsDirty()) {
      disk_manager_->WritePage(page->GetPageId(), page->GetData());
    }
    page_table_.erase(page->GetPageId());
  } else {
    return nullptr;
  }

  page->page_id_ = page_id;
  page->pin_count_ = 1;
  page->is_dirty_ = false;
  disk_manager_->ReadPage(page->GetPageId(), page->GetData());
  page_table_[page->GetPageId()] = frame_id;
  replacer_->Pin(frame_id);

  return page;
}
TODO 5 DeletePgImp
bool BufferPoolManagerInstance::DeletePgImp(page_id_t page_id) {
  // 0.   Make sure you call DeallocatePage!
  // 1.   Search the page table for the requested page (P).
  // 1.   If P does not exist, return true.
  // 2.   If P exists, but has a non-zero pin-count, return false. Someone is using the page.
  // 3.   Otherwise, P can be deleted. Remove P from the page table, reset its metadata and return it to the free list.

  std::lock_guard<std::mutex> lock_guard(latch_);

  DeallocatePage(page_id);

  auto item = page_table_.find(page_id);
  if (item == page_table_.end()) {
    return true;
  }

  frame_id_t frame_id = item->second;
  Page *page = &pages_[frame_id];
  if (page->GetPinCount() != 0) {
    return false;
  }

  if (page->IsDirty()) {
    disk_manager_->WritePage(page->GetPageId(), page->GetData());
  }

  replacer_->Pin(frame_id);
  page_table_.erase(page->GetPageId());
  page->pin_count_ = 0;
  page->is_dirty_ = false;
  page->ResetMemory();
  page->page_id_ = INVALID_PAGE_ID;
  free_list_.push_back(frame_id);

  return true;
}
TODO 6 UnpinPgImp
bool BufferPoolManagerInstance::UnpinPgImp(page_id_t page_id, bool is_dirty) {
  std::lock_guard<std::mutex> lock_guard(latch_);
  auto item = page_table_.find(page_id);
  if (item == page_table_.end()) {
    return false;
  }

  frame_id_t frame_id = item->second;
  Page *page = &pages_[frame_id];
  if (page->GetPinCount() <= 0) {
    return false;
  }

  page->pin_count_--;
  if (is_dirty) {
    page->is_dirty_ = true;
  }

  if (page->GetPinCount() <= 0) {
    replacer_->Unpin(frame_id);
  }

  return true;
}
Test
	make buffer_pool_manager_instance_test
	./test/buffer_pool_manager_instance_test

Buffer Pool Manager Instance local test

	valgrind --leak-check=full --suppressions=../build_support/valgrind.supp ./test/buffer_pool_manager_instance_test --gtest_filter=BufferPoolManagerInstanceTest.BinaryDataTest:BufferPoolManagerInstanceTest.SampleTest

Buffer Pool Manager Instance memory check

Parallel Buffer Pool Manager

My Understanding

multiple buffer pools

 The Parallel Buffer Pool Manager asks us to implement a parallel version of the buffer pool, which is the first of the buffer pool optimizations from lecture: multiple buffer pools. Presumably later assignments will be built around the other optimizations too, which is pretty impressive. This part assigns pages to pools using the second approach from lecture, hashing: given a page, its page_id determines which buffer pool instance it is stored in. Note:

page_id_t BufferPoolManagerInstance::AllocatePage() {
  const page_id_t next_page_id = next_page_id_;
  next_page_id_ += num_instances_;
  ValidatePageId(next_page_id);
  return next_page_id;
}

void BufferPoolManagerInstance::ValidatePageId(const page_id_t page_id) const {
  assert(page_id % num_instances_ == instance_index_);  // allocated pages mod back to this BPI
}

 Tracing the code above, we can see that every allocated page_id hashes (mod the number of instances) back to the instance that allocated it, so deciding which pool a page lives in is just that modulo. To get parallelism, each delegating method simply operates on the data of the buffer pool instance responsible for the page. What's left is the allocation strategy for new pages and how well it performs; I don't have a clever scheme, so on each NewPage I just probe the instances one by one starting from a rotating index, which may perform poorly.
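
A tiny worked example of that routing, with hypothetical numbers just to make the arithmetic concrete:

#include <cassert>
#include <initializer_list>

int main() {
  const int num_instances = 4;
  // AllocatePage() in instance 1 hands out page ids 1, 5, 9, ... (start at the
  // instance index, step by num_instances), so page_id % num_instances recovers the owner.
  for (int page_id : {1, 5, 9}) {
    assert(page_id % num_instances == 1);
  }
  return 0;
}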

Implementation Details
TODO 1 Structure
 private:
  std::mutex parallel_buffer_pool_latch_;
  size_t num_instance_;
  size_t pool_size_;
  size_t start_index_;
  std::vector<BufferPoolManagerInstance *> buffer_pool_manager_;
TODO 2 Constructor
ParallelBufferPoolManager::ParallelBufferPoolManager(size_t num_instances, size_t pool_size, DiskManager *disk_manager,
                                                     LogManager *log_manager) {
  // Allocate and create individual BufferPoolManagerInstances
  std::lock_guard<std::mutex> lock_guard(parallel_buffer_pool_latch_);
  num_instance_ = num_instances;
  pool_size_ = pool_size;
  start_index_ = 0;
  for (size_t i = 0; i < num_instance_; ++i) {
    buffer_pool_manager_.push_back(
        new BufferPoolManagerInstance(pool_size_, num_instance_, i, disk_manager, log_manager));
  }
}
TODO 3 Destructor
// Update constructor to destruct all BufferPoolManagerInstances and deallocate any associated memory
ParallelBufferPoolManager::~ParallelBufferPoolManager() {
  for (size_t i = 0; i < num_instance_; ++i) {
    delete buffer_pool_manager_[i];
  }
}
TODO 4 GetPoolSize
size_t ParallelBufferPoolManager::GetPoolSize() {
  // Get size of all BufferPoolManagerInstances
  return pool_size_ * num_instance_;
}
TODO 5 GetBufferPoolManager
BufferPoolManager *ParallelBufferPoolManager::GetBufferPoolManager(page_id_t page_id) {
  // Get BufferPoolManager responsible for handling given page id. You can use this method in your other methods.
  return buffer_pool_manager_[page_id % num_instance_];
}

TODO 6 FetchPgImp
Page *ParallelBufferPoolManager::FetchPgImp(page_id_t page_id) {
  // Fetch page for page_id from responsible BufferPoolManagerInstance
  return GetBufferPoolManager(page_id)->FetchPage(page_id);
}
TODO 7 UnpinPgImp
bool ParallelBufferPoolManager::UnpinPgImp(page_id_t page_id, bool is_dirty) {
  // Unpin page_id from responsible BufferPoolManagerInstance
  return GetBufferPoolManager(page_id)->UnpinPage(page_id, is_dirty);
}
TODO 8 FlushPgImp
bool ParallelBufferPoolManager::FlushPgImp(page_id_t page_id) {
  // Flush page_id from responsible BufferPoolManagerInstance
  return GetBufferPoolManager(page_id)->FlushPage(page_id);
}
TODO 9 NewPgImp
Page *ParallelBufferPoolManager::NewPgImp(page_id_t *page_id) {
  // create new page. We will request page allocation in a round robin manner from the underlying
  // BufferPoolManagerInstances
  // 1.   From a starting index of the BPMIs, call NewPageImpl until either 1) success and return 2) looped around to
  // starting index and return nullptr
  // 2.   Bump the starting index (mod number of instances) to start search at a different BPMI each time this function
  // is called

  std::lock_guard<std::mutex> lock_guard(parallel_buffer_pool_latch_);
  for (size_t i = start_index_; i < start_index_ + num_instance_; ++i) {
    BufferPoolManager *manager = buffer_pool_manager_[i % num_instance_];
    Page *page = manager->NewPage(page_id);
    if (page != nullptr) {
      start_index_ = (i + 1) % num_instance_;
      return page;
    }
  }
  return nullptr;
}
TODO 10 DeletePgImp
bool ParallelBufferPoolManager::DeletePgImp(page_id_t page_id) {
  // Delete page_id from responsible BufferPoolManagerInstance
  return GetBufferPoolManager(page_id)->DeletePage(page_id);
}
TODO 11 FlushAllPgsImp
void ParallelBufferPoolManager::FlushAllPgsImp() {
  // flush all pages from all BufferPoolManagerInstances
  for (size_t i = 0; i < num_instance_; ++i) {
    buffer_pool_manager_[i]->FlushAllPages();
  }
}
Test
 	make parallel_buffer_pool_manager_test
 	./test/parallel_buffer_pool_manager_test

Parallel Buffer Pool Manager local test

	valgrind --leak-check=full --suppressions=../build_support/valgrind.supp ./test/parallel_buffer_pool_manager_test --gtest_filter=ParallelBufferPoolManagerTest.BinaryDataTest:ParallelBufferPoolManagerTest.SampleTest

Parallel Buffer Pool Manager memory check

Gradescope
	make format
 	make check-lint
 	make check-clang-tidy
 	
	zip project1-submission.zip \
    src/include/buffer/lru_replacer.h \
    src/buffer/lru_replacer.cpp \
    src/include/buffer/buffer_pool_manager_instance.h \
    src/buffer/buffer_pool_manager_instance.cpp \
    src/include/buffer/parallel_buffer_pool_manager.h \
    src/buffer/parallel_buffer_pool_manager.cpp	

Gradescope

Summary

 It feels like a door to a new world has opened. A few points worth noting:

  • In the LRU destructor (and the others too), it is still better to clean things up explicitly, but don't do anything silly there (that's what cost me two days).
  • For page allocation in the parallel buffer pool, the comments suggest round robin; you can try to spread allocations more evenly. I simply start from the instance after the one that served the previous allocation and probe them in order. That may look balanced at first, but since pages end up being accessed with different frequencies it is really only an imagined kind of load balancing; I couldn't think it all the way through, so a rough global policy that gets the job done is enough.
  • Finally, keep going back to the videos and slides to deepen understanding.

 Alright, that's basically it. Back to studying; still a long way to go~
