[15-445 fall 2023] P2代码解析

最新推荐文章于 2024-07-19 00:12:38 发布

Wedge_Ss

最新推荐文章于 2024-07-19 00:12:38 发布

阅读量1.5k

点赞数 18

文章标签：数据库 c++ 数据库系统

本文链接：https://blog.csdn.net/MarksSa/article/details/135646298

版权

Task#1 Read/Write Page Guards

这个Guard主要作用在于避免Page使用完后，忘记把Page给Unpin或者是Unlock了，这就会导致该Page无法被evit以及被其他线程进行Lock。Page Guard在离开作用域后，就会触发析构函数，自动执行释放操作，达到RAII的效果，类比对应的就是智能指针。

我在P1中的bufferpool实现有问题，但是分数拿满了，因此PageGuard测试跑不过，有可能是bufferpool那边出问题了。说下当时犯的错误，我当时认为bufferpool manager只要查看Page对象的元数据或者要对元数据进行修改，那就要获取Page的读写锁。这其实是不对的，Page的读写锁管理的是Page的data，bufferpool manager的latch_确保了对Page的元数据操作的有序性。唯有当该Page的pin_count非0，而bufferpool manager还要对该读写Page的data时，此时才应获取Page的读写锁。

BasicPageGuard

最基本的PageGuard，无需获取读写锁，仅需确保在Guard调用析构函数时，及时Unpin即可。

这里需要注意点在于：拷贝构造函数和‘=’操作符的重载实现，都需要先把原Guard中的Page进行Drop操作后，再给Guard赋新值。

Reset

Reset方法是我自己新加的，功能就是将对象自身的bpm_ 、page_、is_dirty_属性进行初始化。

Drop

需要对Guard内部的Page进行Unpin操作，但是需要先检查Guard中是否保存了合法的Page。如果Page为nullptr，那么表明该Guard已经被初始化了，无需进行Drop操作。

拷贝构造和=号重载实现

其实就是接管that管理的Page。

首先需要先调用Drop()来释放Guard上管理的旧Page，后续将that对象上的Page信息拷贝过来。由于接管了that管理的Page，因此that也需要进行Reset操作，避免that后续析构会对Page进行Unpin操作。

operator = 的重载实现和BasicPageGuard(BasicPageGuard &&that )操作流程相同，就是返回值变成return *this即可。

UpgradeRead()/UpgradeWrite

将BasicPageGuard管理的Page转接给一个WriteGuard/ReadGuard进行管理。

注意：还需要完成对Page的读写锁的申请。此外，头文件中指出一个要求，函数执行过程中，这个Page不应该存在被Evit出去的可能，因此在这个过程中不能调用Drop方法。并且，Page的管理转接给WriteGuard后，原BasicGuard将会失效，不再管理任何Page。

先申请该Page的读锁/写锁，然后根据BasicGuard中的Page和bpm信息来构建一个ReadGuard/WriteGuard，同时调用Reset方法，将BasicGuard自身初始化。

WritePageGuard/ReadPageGuard

Drop

相比BasicGuard，Write/ReadGuard还持有了写/读锁，因此多一步释放锁的步骤，当前也需要检查Guard中是否保存了合法的Page信息，如果该Guard已经被初始化了，不再管理任何Page信息，无需进行任何操作。释放完锁，后续就是让guard_调用Drop函数来进行Unpin释放Page即可。

拷贝构造和=号重载实现

同理，先将自身管理的Page进行释放操作，调用Drop，然后将that的guard_拷贝过来即可。注意需要将that进行初始化。

BufferPoolManager

FetchPageBasic

FetchPage返回的是Page * ，而这里要求BasicPageGuard，因此非常简单，就是把FetchPage的返回值构造出一个BasicPageGuard即可。

FetchPageRead / FetchPageWrite

相比FetchPageBasic，这俩函数要求返回的是ReadPageGuard/WritePageGuard，相比FetchPageBasic就是多了一个步骤，获取Page的读写锁，然后构造出Read/WriteGuard即可。

NewPageGuarded

同理，调用NewPage，然后将NewPage得到的Page构造出一个BasicPageGuard即可。

Task#2 Extendible Hash Table Pages

这里主要实现的是针对底层Hash Table的Header、Directory、Bucket这些基本类的完善

ExtendibleHTableHeaderPage

Init

将参数max_depth复制给max_depth_，然后通过memset来将directory_page_ids数组设置成全-1即可。

HashToDirectoryIndex

Header将Hash映射为Index是取Hash的高max_depth_位，那我们通过右移来取高位即可。

if (max_depth == 0) {
    return 0;
}
return hash >> (sizeof(hash) * 8 - max_depth);

我们需要知道一个冷知识，C++中左移或者右移超过了这个数字的类型的位数，那么就会自动对移动的位数进行取余然后再进行移位。

例如：这里的hash为uint32_t为32bit，hash >> 32的值，按照我们正常预想来说应该为0，但是真正运行过程中，移位的位数大于等于了32，那么移位的位数就会进行取余，32 % 32 = 0，最后hash >> 32 => 会变成 hash >> 0，hash就保持不变了。因此我特地为max_depth==0的情况加了if分支来处理。

GetDirectoryPageId/SetDirectoryPageId

这俩函数的实现过于简单，不赘述

ExtendibleHTableDirectoryPage

Init

max_depth_ = max_depth;
global_depth_ = 0;
memset(local_depths_, 0, HTABLE_DIRECTORY_ARRAY_SIZE);
memset(bucket_page_ids_, -1, sizeof(page_id_t) * HTABLE_DIRECTORY_ARRAY_SIZE);

非常简单，对成员变量进行初始化

SetBucketPageId

auto count = 1 << global_depth_;
for (auto idx = 0; idx < count; ++idx) {
    if (bucket_page_ids_[idx] == bucket_page_id) {
        local_depths_[bucket_idx] = local_depths_[idx];
        break;
    }
    if (idx == count - 1) {
        local_depths_[bucket_idx] = global_depth_;
    }
}
bucket_page_ids_[bucket_idx] = bucket_page_id;

这个函数的作用显而易见，就是在bucket_page_ids数组中填写指定bucket的pageId，并给该bucket的local depth赋值。

但是还有一些点需要注意：

1、第一种情况，一个新的Bucket需要一个新的page来映射，此时，这个bucket的local depth必然等于global depth；

2、第二种情况，这个pageId并非是一个新的page，已有一个bucket对应了该page，那么就会出现多个bucket映射同个page，此时这几个bucket的local_depth均小于global_depth，因此更新该bucket的local depth时需要遍历bucket_page_ids_来找到哪个bucket也是对应这个pageId，然后将该bucket的local depth来拷贝过来。

GetSplitImageIndex

在Global Depth增大时，我会调用该函数。Global Depth增大，htable_dir的容量会翻倍，htable_dir扩容的过程举例来说：

bucket: page: => bucket: page:

0 1 00 1

1 2 01 2

10 1

11 2

我们可以看到0变为00和10，1变为01和11。

10就是0的splitImageIndex，11就是1的splitImageIndex

因此当Global Depth扩容时，就是会给每个原有的bucket创建一个对应的SplitImage，而这些SplitImage和原Bucket都是使用相同的page。

每个bucket对应的SplitImageIndex = (1 << global_depth_) + bucket_idx

GetGlobalDepthMask和GetLocalDepthMask

Mask就是根据depth的大小来取低depth位的。

因此mask = (1 << depth) - 1

IncrGlobalDepth

BUSTUB_ASSERT(global_depth_ < max_depth_, true);
// update the whole directory
uint32_t count = 1 << global_depth_;
for (uint32_t idx = 0; idx < count; ++idx) {
    page_id_t pg_id = bucket_page_ids_[idx];
    uint32_t add_idx = GetSplitImageIndex(idx);
    bucket_page_ids_[add_idx] = pg_id;
    local_depths_[add_idx] = local_depths_[idx];
}
++global_depth_;

结合GetSplitImageIndex函数那边的讲解来理解

DecrGlobalDepth

减小Global Depth就非常简单了，因为就是将htable_dir中后面的那些bucket作废即可，直接减小Global Depth即可，那么就访问不到那些了。

其余函数都十分简单粗暴，就不赘述了。

ExtendibleHTableBucketPage<K, V, KC>

Lookup

for (uint32_t i = 0; i < size_; ++i) {
    if (!cmp(key, array_[i].first)) {
        value = array_[i].second;
        return true;
    }
}
return false;

主要就是要看懂cmp的用法，cmp就是传递两个参数，两个一样大返回0，左边大返回1，右边大返回-1.

Insert

需要注意一点，当前作用的版本的实现并不支持重复key，因此Insert前，需要检查一下是否已有相同的key，若已有就返回false。

如果没有重复key，并且bucket未满，那就直接放到数组末尾，然后增大size_即可。

Remove

  for (uint32_t i = 0; i < size_; ++i) {
    if (!cmp(key, array_[i].first)) {
      if (i != size_ - 1) {
        // if the removed one located in the last
        // we only need to --size
        for (uint32_t j = i; j < size_ - 1; ++j) {
          array_[j] = array_[j + 1];
        }
        // uint32_t size = sizeof(std::pair<K, V>) * (size_ - i - 1);
        // memmove(&array_[i], &array_[i + 1], size);
      }
      --size_;
      return true;
    }
  }
  return false;

进行remove的时候，如果被删除的元素在中间位置，由于bucket是使用数组来存储的，那么后面的元素都要整体前移，注意这里不能使用memmove，array里面的pair<KeyType, ValueType>中的KeyType和ValueType的copy constructor并非是trivial的，那么就不支持bitwise的拷贝。

bitwise拷贝和copy constructor是否non-trivial这块可以去看C++相关资料，可以去看《深入理解C++对象模型》

因此我改用了=号赋值来进行一个个前移。

其余比较简单，就不赘述了

Task#3 DiskExtendibleHashTable

这里需要频繁从内存池获取内存帧，课堂提示中也建议使用Guard

这里有几个坑，我们需要注意：

1、在Grade Scope中部分测试会使用很小的内存池，一个内存池中仅有3个Page，因此每个方法的实现中在一些Page用来后，我们需要主动进行Drop来腾出内存池；例如：htable_header我们用来找到指定的htable_dir后就没用了，我们就可以第一时间将htable_header的Page进行Drop；

2、一个Page Guard中有两个方法可以让我们获取到Page中的data，GetData和GetDataMut，前者返回的是const char *因此无法对Page中的data进行修改，仅可查看Page中的数据内容；后者返回的是char *，可以对Page中的数据进行修改，此外GetDataMut调用的时候，还会自动将该Page标记为脏的；

3、我们可以看到前面的ExtendibleHtable类中，我们都没有实现构造函数，我们都是实现了一个Init函数，因为构造函数一般对应了new方法，new会申请一块内存，同时会执行构造函数对这块内存进行初始化。但是，我们后续都不会用new来申请内存，我们直接都是使用内存池中的内存，我们直接用指定类型的指针指向Page的地址头，然后调用Init函数来完成对象的初始化；

4、我一开始看到每个方法的开头都必定都是先从HTable Header来查找到指定的Dir，因此直接新增了一个成员变量htable_header_的指针来指向内存池中htable_header所在的内存帧。这会出问题，因为内存池可能会因为实在太小，将这个高频访问的htable_header所在的内存帧中的数据给evit出去，用来存放其他数据了。

Init

index_name_ = name;
pages_ = bpm_->GetPages();
auto new_page = bpm_->NewPageGuarded(&header_page_id_);
auto header_page = new_page.UpgradeWrite();
auto htable_header = reinterpret_cast<ExtendibleHTableHeaderPage *>(header_page.GetDataMut());
htable_header->Init(header_max_depth);

GetValue

uint32_t hash = Hash(key);
auto header_page = bpm_->FetchPageRead(header_page_id_);
auto htable_header = reinterpret_cast<const ExtendibleHTableHeaderPage *>(header_page.GetData());
uint32_t dir_idx = htable_header->HashToDirectoryIndex(hash);
page_id_t dir_page_id = htable_header->GetDirectoryPageId(dir_idx);
if (dir_page_id == -1) {
    return false;
}
header_page.Drop();
auto dir_page = bpm_->FetchPageRead(dir_page_id);
auto htable_dir = reinterpret_cast<const ExtendibleHTableDirectoryPage *>(dir_page.GetData());
uint32_t bucket_idx = htable_dir->HashToBucketIndex(hash);
page_id_t bukcet_page_id = htable_dir->GetBucketPageId(bucket_idx);
if (bukcet_page_id == -1) {
    return false;
}
dir_page.Drop();
auto bucket_page = bpm_->FetchPageRead(bukcet_page_id);
auto htable_bucket = reinterpret_cast<const ExtendibleHTableBucketPage<K, V, KC> *>(bucket_page.GetData());
V res;
if (htable_bucket->Lookup(key, res, cmp_)) {
    result->push_back(res);
    return true;
}
return false;

从HTable Header找到HTable Dir，然后再找到HTable Bucket，过程中一旦发现还没有指定的Page，那就失败了。

可以看到，我会及时Drop掉那些已经不再用的page

Insert

uint32_t hash = Hash(key);
auto header_page = bpm_->FetchPageWrite(header_page_id_);
auto htable_header = reinterpret_cast<ExtendibleHTableHeaderPage *>(header_page.GetDataMut());
uint32_t dir_idx = htable_header->HashToDirectoryIndex(hash);
page_id_t dir_page_id = htable_header->GetDirectoryPageId(dir_idx);
if (dir_page_id == -1) {
    return InsertToNewDirectory(htable_header, dir_idx, hash, key, value);
}
header_page.Drop();
auto dir_page = bpm_->FetchPageWrite(dir_page_id);
auto htable_dir = reinterpret_cast<ExtendibleHTableDirectoryPage *>(dir_page.GetDataMut());

uint32_t bucket_idx = htable_dir->HashToBucketIndex(hash);
page_id_t bucket_page_id = htable_dir->GetBucketPageId(bucket_idx);
if (bucket_page_id == -1) {
    return InsertToNewBucket(htable_dir, bucket_idx, key, value);
}
auto bucket_page = bpm_->FetchPageWrite(bucket_page_id);
auto htable_bucket = reinterpret_cast<ExtendibleHTableBucketPage<K, V, KC> *>(bucket_page.GetDataMut());

if (!htable_bucket->IsFull()) {
    return htable_bucket->Insert(key, value, cmp_);
}
// if the bucket is full, we should split the bucket first
while (htable_bucket->IsFull()) {
    uint32_t local_depth = htable_dir->GetLocalDepth(bucket_idx);
    uint32_t global_depth = htable_dir->GetGlobalDepth();
    if (local_depth == global_depth && global_depth == directory_max_depth_) {
        return false;
    }
    // 1.create a new bucket
    page_id_t new_bucket_page_id = -1;
    auto new_page = bpm_->NewPageGuarded(&new_bucket_page_id);
    if (new_bucket_page_id == -1) {
        return false;
    }
    auto new_bucket_page = new_page.UpgradeWrite();
    auto new_htable_bucket = reinterpret_cast<ExtendibleHTableBucketPage<K, V, KC> *>(new_bucket_page.GetDataMut());
    new_htable_bucket->Init(bucket_max_size_);
    // 2.migrate entries to the new bucket
    MigrateEntries(htable_bucket, new_htable_bucket, 0, htable_dir->GetLocalDepthMask(bucket_idx));
    // 3.compare local depth with global depth
    if (local_depth + 1 > global_depth) {
        // local depth + 1 is larger than global depth
        htable_dir->IncrGlobalDepth();
    }
    // 4.update dir mapping with split image
    // example:
    // bucket_idx   **101
    //              *1101 *0101
    // local = 3, global = 5, dis = 2, need to update 4 bucket
    global_depth = htable_dir->GetGlobalDepth();
    uint32_t update_count = 1 << (global_depth - local_depth);
    uint32_t base_idx = bucket_idx & htable_dir->GetLocalDepthMask(bucket_idx);
    bucket_idx = htable_dir->HashToBucketIndex(hash);
    for (uint32_t i = 0; i < update_count; ++i) {
        uint32_t tmp_idx = (i << local_depth) + base_idx;
        if (i % 2 == 0) {
            htable_dir->IncrLocalDepth(tmp_idx);
        } else {
            if (tmp_idx == bucket_idx) {
                htable_bucket = new_htable_bucket;
            }
            UpdateDirectoryMapping(htable_dir, tmp_idx, new_bucket_page_id, local_depth + 1, 0);
        }
    }
}
return htable_bucket->Insert(key, value, cmp_);

这里插入，着重需要讲的是当一个bucket满了以后的情况：

1、如果bucket的local depth < global depth：

我们根据local depth和global depth之间的差距大小来确定有多少个bucket共用一个page

例如：

bucket_idx = **101, local depth = 3, global depth = 5

那么dis = global - local = 2，就有1 << 2 = 4 个bucket共用一个page，我们需要更新4个bucket。

对应的就是*1101和*0101，这两拨bucket将分别使用各自的page。

我的实现中让*1101使用新的page，因此*0101仅需IncrLocalDepth即可。

我们已知local depth = 3，因此我们获取local mask来获取bucket_idx的低3位， base_idx = bucket_idx & local_mask

然后其余4个bucket都是在base_idx上，4、5位上有所不同。因此通过for循环遍历，来tmp_idx = (i << local_depth) + base_idx，对应的就是每次只改变第4bit上的数字，一路递增加上去，由于是二进制，因此i为偶数时，第4位必然是0；i为奇数时，第4位必然为1；

此外，我们进行split bucket后，有可能split后，原来的bucket中的元素全都migrate到了一个bucket中，导致新的元素要插入的bucket还是满的，因此我们这个Split的检查必须是一个递归的。

2、如果bukcet的local depth == global depth

那我们在split bucket前还需要先IncrGlobalDepth，来扩容HTableDir

Remove

uint32_t hash = Hash(key);
auto header_page = bpm_->FetchPageWrite(header_page_id_);
auto htable_header = reinterpret_cast<ExtendibleHTableHeaderPage *>(header_page.GetDataMut());
uint32_t dir_idx = htable_header->HashToDirectoryIndex(hash);
page_id_t dir_page_id = htable_header->GetDirectoryPageId(dir_idx);
if (dir_page_id == -1) {
    return false;
}
header_page.Drop();
auto dir_page = bpm_->FetchPageWrite(dir_page_id);
auto htable_dir = reinterpret_cast<ExtendibleHTableDirectoryPage *>(dir_page.GetDataMut());

uint32_t bucket_idx = htable_dir->HashToBucketIndex(hash);
page_id_t bucket_page_id = htable_dir->GetBucketPageId(bucket_idx);
if (bucket_page_id == -1) {
    return false;
}

auto bucket_page = bpm_->FetchPageWrite(bucket_page_id);
auto htable_bucket = reinterpret_cast<ExtendibleHTableBucketPage<K, V, KC> *>(bucket_page.GetDataMut());
bool res = htable_bucket->Remove(key, cmp_);
bucket_page.Drop();
if (!res) {
    return false;
}
auto check_page_id = bucket_page_id;
auto check_page = bpm_->FetchPageRead(check_page_id);
auto check_bucket = reinterpret_cast<const ExtendibleHTableBucketPage<K, V, KC> *>(check_page.GetData());
uint32_t local_depth = htable_dir->GetLocalDepth(bucket_idx);
uint32_t global_depth = htable_dir->GetGlobalDepth();
while (local_depth > 0) {
    uint32_t convert_mask = 1 << (local_depth - 1);
    uint32_t merge_bucket_idx = bucket_idx ^ convert_mask;
    uint32_t merge_local_depth = htable_dir->GetLocalDepth(merge_bucket_idx);
    page_id_t merge_bucket_page_id = htable_dir->GetBucketPageId(merge_bucket_idx);
    auto merge_bucket_page = bpm_->FetchPageRead(merge_bucket_page_id);
    auto merge_bucket = reinterpret_cast<const ExtendibleHTableBucketPage<K, V, KC> *>(merge_bucket_page.GetData());
    if (merge_local_depth != local_depth || (!check_bucket->IsEmpty() && !merge_bucket->IsEmpty())) {
        break;
    }
    if (check_bucket->IsEmpty()) {
        check_page.Drop();
        bpm_->DeletePage(check_page_id);
        check_bucket = merge_bucket;
        check_page_id = merge_bucket_page_id;
        check_page = std::move(merge_bucket_page);
    } else {
        merge_bucket_page.Drop();
        bpm_->DeletePage(merge_bucket_page_id);
    }
    htable_dir->DecrLocalDepth(bucket_idx);
    local_depth = htable_dir->GetLocalDepth(bucket_idx);
    uint32_t local_depth_mask = htable_dir->GetLocalDepthMask(bucket_idx);
    uint32_t mask_idx = bucket_idx & local_depth_mask;
    uint32_t update_count = 1 << (global_depth - local_depth);
    for (uint32_t i = 0; i < update_count; ++i) {
        uint32_t tmp_idx = (i << local_depth) + mask_idx;
        UpdateDirectoryMapping(htable_dir, tmp_idx, check_page_id, local_depth, 0);
    }
}

while (htable_dir->CanShrink()) {
    htable_dir->DecrGlobalDepth();
}
return true;

这里着重需要讲的是Remove后，需要merge的情况，我们首先讲清楚什么情况下需要merge

这个bucket的local depth不为0，且大于0，对应的Split Bucket的local depth相同，并且两个bucket其中有一个为空即可。

因此，我们在merge的过程中，即使该bucket不为空，但是对应的split bukcet可能为空，因此每次都需要对两个bucket进行检查：

if (merge_local_depth != local_depth || (!check_bucket->IsEmpty() && !merge_bucket->IsEmpty())) { break; }

这个判断分支就是，如果两个bucket的depth不相等，或者是两个bucket的都不是空的，那么就中止

我们只要将两个bucket中其中非空的那个bucket的page拿来共用即可，将空的bucket的page进行drop和delete处理。

后续就是将共用一个page的bucket在directory中进行更新，使用UpdateDirectoryMapping函数进行，该函数实现大家都可以自由发挥，对于我而言，这个函数的最后一个参数local_depth_mask没什么用处，因此我在调用该函数时，最后一个参数我就直接给0了。

举例：

bucket_idx = *1101, local depth = 4, global depth = 5

那么该bucket对应的split bucket为*0101

经过merge后，local depth = 4 - 1 = 3，对应会有 1 << (global depth - local depth) = 1 << 2 = 4个bucket需要更新。

local depth = 3，因此我们仅需取bucket_idx的低3位，mask_idx = bucket_idx * local_mask

后续就是遍历低3位为mask_idx的所有bucket，更新这些bucket在HTableDirectory中的信息。

最后就是循环调用CanShrink方法，如果可以减小global depth，那就减小即可。

MigrateEntries

++local_depth_mask;
for (uint32_t i = 0; i < old_bucket->Size(); ++i) {
    K key = old_bucket->KeyAt(i);
    uint32_t hash = Hash(key);
    if ((hash & local_depth_mask) != 0) {
        auto &entry = old_bucket->EntryAt(i);
        new_bucket->Insert(entry.first, entry.second, cmp_);
        old_bucket->RemoveAt(i);
    }
}

举例：

local_depth_mask= 00111

表明local depth = 3，那么就是split bucket就是按照第4位来区分，该将哪些元素放到新的bucket中。

因此选择将local_depth_mask++，此时local_depth_mask = 01000，即可拿来和hash进行&操作，来查看哪些key的第4位是0还是1。

其余方法的实现就比较简单，不赘述