Project 3: Query Execution - CSDN Blog

Article link: https://blog.csdn.net/cube__4/article/details/127092095

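Preface: This project implements the basic query executors of the BusTub database, nine in total, and this post works through them step by step. The difficulty lies mainly in understanding the source code and applying it flexibly. Let's get started.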

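There are three main classes, as listed above. Read their source code closely to understand how they are constructed and implemented.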

Catalog

This section focuses on Catalog; the other classes can be studied by analogy.

using table_oid_t = uint32_t;   // physical table OID
using column_oid_t = uint32_t;  // column OID
using index_oid_t = uint32_t;   // index OID

These lines define a few type aliases that are used throughout the table code.

Main data members

table_info

struct TableInfo {
  TableInfo(Schema schema, std::string name, std::unique_ptr<TableHeap> &&table, table_oid_t oid)
      : schema_{std::move(schema)}, name_{std::move(name)}, table_{std::move(table)}, oid_{oid} {}
  /** The table schema */
  Schema schema_;
  /** The table name */
  const std::string name_;
  /** An owning pointer to the table heap */
  std::unique_ptr<TableHeap> table_;
  /** The table OID */
  const table_oid_t oid_;
};

This struct stores the main metadata of a physical table. It is what the Catalog hands back when a table is looked up.

index_info

struct IndexInfo {
  IndexInfo(Schema key_schema, std::string name, std::unique_ptr<Index> &&index, index_oid_t index_oid,
            std::string table_name, size_t key_size)
      : key_schema_{std::move(key_schema)},
        name_{std::move(name)},
        index_{std::move(index)},
        index_oid_{index_oid},
        table_name_{std::move(table_name)},
        key_size_{key_size} {}
  /** The schema for the index key */
  Schema key_schema_;
  /** The name of the index */
  std::string name_;
  /** An owning pointer to the index */
  std::unique_ptr<Index> index_;
  /** The unique OID for the index */
  index_oid_t index_oid_;
  /** The name of the table on which the index is created */
  std::string table_name_;
  /** The size of the index key, in bytes */
  const size_t key_size_;
};

Like table_info, this struct stores the metadata of an index. Note that an index is always built on a table that already exists.

Main member functions


  /** Indicates that an operation returning a `TableInfo*` failed */
  static constexpr TableInfo *NULL_TABLE_INFO{nullptr};

  /** Indicates that an operation returning a `IndexInfo*` failed */
  static constexpr IndexInfo *NULL_INDEX_INFO{nullptr};

These declare two static constants. Because they are declared constexpr, they must be initialized inside the class definition.
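To make the lookup pattern concrete, here is a minimal sketch of a catalog that owns its table metadata and returns a constexpr null pointer on failure. MiniCatalog and MiniTableInfo are hypothetical stand-ins invented for this example; they leave out the schema and the TableHeap, and their methods are not the real BusTub API.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>

using table_oid_t = uint32_t;

// Simplified stand-in for TableInfo (assumption: no schema or TableHeap).
struct MiniTableInfo {
  std::string name_;
  table_oid_t oid_;
};

// Hypothetical miniature catalog, not the real BusTub class.
class MiniCatalog {
 public:
  // A static constexpr member must carry its initializer in the class body.
  static constexpr MiniTableInfo *NULL_TABLE_INFO{nullptr};

  // Registers a new table; fails if the name is already taken.
  auto CreateTable(const std::string &name) -> MiniTableInfo * {
    if (names_.count(name) != 0) {
      return NULL_TABLE_INFO;
    }
    table_oid_t oid = next_oid_++;
    auto info = std::make_unique<MiniTableInfo>(MiniTableInfo{name, oid});
    MiniTableInfo *raw = info.get();
    tables_.emplace(oid, std::move(info));  // the catalog keeps ownership
    names_.emplace(name, oid);
    return raw;
  }

  // Looks a table up by name; returns NULL_TABLE_INFO if it does not exist.
  auto GetTable(const std::string &name) const -> MiniTableInfo * {
    auto it = names_.find(name);
    if (it == names_.end()) {
      return NULL_TABLE_INFO;
    }
    assert(tables_.count(it->second) == 1);  // both maps must stay in sync
    return tables_.at(it->second).get();
  }

 private:
  std::unordered_map<table_oid_t, std::unique_ptr<MiniTableInfo>> tables_;
  std::unordered_map<std::string, table_oid_t> names_;
  table_oid_t next_oid_{0};
};
```

Callers compare the returned pointer against NULL_TABLE_INFO instead of catching an exception, which is the convention the two constants above suggest.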

The other classes can be summarized in the same way and are not covered here. Next comes the implementation of the executors.

Executor Implementation

There are nine executors in total. This post covers the ones that are harder to implement, along with a few pitfalls.

First, an overview of how a query plan is actually executed.

[Figure: overall execution flow of a query plan, including Init and Next]

The main work is implementing the Init and Next functions shown in the figure above.
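The Init/Next contract can be sketched as follows. This is a simplified illustration, not BusTub code: MiniExecutor, CountTo and Execute are hypothetical names, and a tuple is reduced to a single int.

```cpp
#include <cassert>
#include <vector>

// Hypothetical miniature of the executor interface; tuples are plain ints.
class MiniExecutor {
 public:
  virtual ~MiniExecutor() = default;
  virtual void Init() = 0;                  // (re)set all iteration state
  virtual auto Next(int *out) -> bool = 0;  // one tuple per call, false when exhausted
};

// Toy leaf executor that emits 0, 1, ..., limit - 1.
class CountTo : public MiniExecutor {
 public:
  explicit CountTo(int limit) : limit_(limit) {}

  void Init() override { next_ = 0; }

  auto Next(int *out) -> bool override {
    if (next_ >= limit_) {
      return false;
    }
    *out = next_++;
    return true;
  }

 private:
  int limit_;
  int next_{0};
};

// Driver loop: call Init once, then pull tuples until Next returns false.
auto Execute(MiniExecutor *executor) -> std::vector<int> {
  assert(executor != nullptr);
  std::vector<int> result_set;
  executor->Init();
  int tuple = 0;
  while (executor->Next(&tuple)) {
    result_set.push_back(tuple);
  }
  return result_set;
}
```

Every tuple for which Next returns true ends up in result_set, which is why an executor that should not produce output has to return false.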

Sequential Scan

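Given a table, perform a sequential scan. Keep the following points in mind:

The result_set has to be filled, which means every call to Next must return one tuple and its rid.
The position to resume from must be remembered, which means the table iterator has to be kept as executor state.

The output format must be controlled: the table's own schema can differ from the plan's output schema, so this has to be checked and the tuple rebuilt accordingly.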

auto opt = plan_->OutputSchema();
std::vector<Value> values;
values.reserve(opt->GetColumnCount());
// When the output schema differs from the table schema, evaluate every
// output column to collect the Values needed to build the tuple
for (const auto &column : opt->GetColumns()) {
  auto value = column.GetExpr()->Evaluate(tp, &schema);
  values.push_back(value);
}
*tuple = Tuple(values, opt);  // build the output tuple
*rid = tp->GetRid();

Whenever the predicate is not nullptr, it has to be evaluated against the tuple before the tuple is returned.
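Putting the three points together, a scan's Next might look roughly like the sketch below. It is an illustration under simplifying assumptions: MiniSeqScan is a hypothetical class, rows are vectors of ints, the output schema is a list of column indexes, and an empty std::function plays the role of a nullptr predicate.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

using Row = std::vector<int>;

// Hypothetical miniature executor; rows, column indexes and the predicate
// stand in for TableHeap, Schema and AbstractExpression.
class MiniSeqScan {
 public:
  MiniSeqScan(const std::vector<Row> *table, std::vector<std::size_t> out_cols,
              std::function<bool(const Row &)> predicate)
      : table_(table), out_cols_(std::move(out_cols)), predicate_(std::move(predicate)) {}

  void Init() { cursor_ = 0; }  // rewind the stored scan position

  // Emits at most one row per call; *rid is the row's position in the table.
  auto Next(Row *out, std::size_t *rid) -> bool {
    while (cursor_ < table_->size()) {
      std::size_t pos = cursor_++;  // the next call resumes after this row
      const Row &row = (*table_)[pos];
      if (predicate_ && !predicate_(row)) {
        continue;  // a predicate exists and rejected the row
      }
      out->clear();
      for (std::size_t col : out_cols_) {
        assert(col < row.size());
        out->push_back(row[col]);  // project onto the output schema
      }
      *rid = pos;
      return true;
    }
    return false;  // table exhausted
  }

 private:
  const std::vector<Row> *table_;
  std::vector<std::size_t> out_cols_;
  std::function<bool(const Row &)> predicate_;
  std::size_t cursor_{0};
};
```

The cursor is a member rather than a local variable because Next is called once per output row and has to pick up where the previous call stopped.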

Insert, Delete, Update

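These operations all change both the table data and its indexes, so they have to modify the physical table (TableHeap) as well as the index entries (Index). They must not add anything to the result_set, so Next processes every tuple and then returns false, which is somewhat like a pipeline breaker.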
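A minimal sketch of that pattern for an insert, assuming a much simpler storage model than BusTub's: MiniTable and MiniInsert are hypothetical, the table is a vector of rows, and the index is a std::multimap on column 0.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

using Row = std::vector<int>;

struct MiniTable {
  std::vector<Row> rows_;                  // stand-in for TableHeap
  std::multimap<int, std::size_t> index_;  // stand-in for an Index on column 0
};

// Hypothetical insert executor: drains its child in one call, emits nothing.
class MiniInsert {
 public:
  MiniInsert(MiniTable *table, const std::vector<Row> *child) : table_(table), child_(child) {}

  void Init() { cursor_ = 0; }

  // Always returns false so that no row reaches the result set.
  auto Next(Row * /*out*/, std::size_t * /*rid*/) -> bool {
    while (cursor_ < child_->size()) {
      const Row &row = (*child_)[cursor_++];
      assert(!row.empty());
      std::size_t rid = table_->rows_.size();
      table_->rows_.push_back(row);         // update the table data
      table_->index_.emplace(row[0], rid);  // keep the index in sync
    }
    return false;
  }

 private:
  MiniTable *table_;
  const std::vector<Row> *child_;
  std::size_t cursor_{0};
};
```

Delete and update would follow the same shape, replacing the two write lines with the matching table and index operations.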

Aggregation

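This is similar to the sequential scan: when Having is not nullptr, the predicate has to be checked, and a tuple is returned only if it satisfies the condition. The key part is Init, because it has to build the hash table that Next then reads from.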
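The split between Init and Next can be sketched like this, assuming a single SUM aggregate grouped by one integer column. MiniSumAggregate is a hypothetical class, the child is a vector of (group key, value) pairs, and an empty std::function stands for a missing Having clause.

```cpp
#include <cassert>
#include <functional>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical aggregation executor: SUM(value) GROUP BY key.
class MiniSumAggregate {
 public:
  MiniSumAggregate(const std::vector<std::pair<int, int>> *child, std::function<bool(int)> having)
      : child_(child), having_(std::move(having)) {}

  // Init does the heavy lifting: it drains the child and builds the hash table.
  void Init() {
    assert(child_ != nullptr);
    sums_.clear();
    for (const auto &kv : *child_) {
      sums_[kv.first] += kv.second;
    }
    it_ = sums_.begin();
  }

  // Next only walks the finished hash table and applies the Having filter.
  auto Next(std::pair<int, int> *out) -> bool {
    while (it_ != sums_.end()) {
      auto cur = it_++;
      if (having_ && !having_(cur->second)) {
        continue;  // group rejected by Having
      }
      *out = std::make_pair(cur->first, cur->second);
      return true;
    }
    return false;
  }

 private:
  const std::vector<std::pair<int, int>> *child_;
  std::function<bool(int)> having_;
  std::unordered_map<int, int> sums_;
  std::unordered_map<int, int>::iterator it_;
};
```

Groups come out in hash table order, which is unspecified, so callers should not rely on any particular ordering of the result.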

Nested Loop Join

The programming difficulty starts to rise here. This executor has to implement the relation below.

[Figure: the join relation to be implemented]

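I/O cost also has to be considered. For that reason the executor remembers its current position in the outer table while it iterates over the inner table. The loop condition relies on the short-circuit behavior of the || operator to tell the cases apart: while flag_ is true, the outer executor is not advanced.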

while (flag_ || left_executor_->Next(&left_tuple_, &left_rid_)) {
    while (right_executor_->Next(&right_tuple, &right_rid)) {
      res = true;
      if (predicate != nullptr) {
        res = predicate->EvaluateJoin(&left_tuple_, left_schema, &right_tuple, right_schema).GetAs<bool>();
      }
      if (res) {
        std::vector<Value> values;
        values.reserve(output_schema->GetColumnCount());
        for (const auto &column : output_schema->GetColumns()) {
          values.push_back(column.GetExpr()->EvaluateJoin(&left_tuple_, left_schema, &right_tuple, right_schema));
        }
        *tuple = Tuple(values, output_schema);
        flag_ = true;
        return true;
      }
    }
    flag_ = false;
    right_executor_->Init();
  }

flag_ is initialized to false and then updated as needed.
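The same control flow can be tried out in isolation with the sketch below. It keeps the flag_ logic of the loop above but replaces the BusTub types with hypothetical ones: VecSource is a toy child executor over ints, and the join predicate is a std::function.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Toy child executor over a vector of ints (hypothetical stand-in).
class VecSource {
 public:
  explicit VecSource(std::vector<int> data) : data_(std::move(data)) {}
  void Init() { pos_ = 0; }
  auto Next(int *out) -> bool {
    if (pos_ >= data_.size()) {
      return false;
    }
    *out = data_[pos_++];
    return true;
  }

 private:
  std::vector<int> data_;
  std::size_t pos_{0};
};

class MiniNestedLoopJoin {
 public:
  MiniNestedLoopJoin(VecSource *left, VecSource *right, std::function<bool(int, int)> pred)
      : left_(left), right_(right), pred_(std::move(pred)) {}

  void Init() {
    assert(left_ != nullptr && right_ != nullptr);
    left_->Init();
    right_->Init();
    flag_ = false;
  }

  auto Next(std::pair<int, int> *out) -> bool {
    int right_val = 0;
    // Short-circuit: while flag_ is true, the left child is NOT advanced,
    // so the inner scan continues with the same outer value.
    while (flag_ || left_->Next(&left_val_)) {
      while (right_->Next(&right_val)) {
        if (!pred_ || pred_(left_val_, right_val)) {
          *out = std::make_pair(left_val_, right_val);
          flag_ = true;  // resume the inner scan on the next call
          return true;
        }
      }
      flag_ = false;   // inner side exhausted for this outer value
      right_->Init();  // rewind the inner side
    }
    return false;
  }

 private:
  VecSource *left_;
  VecSource *right_;
  std::function<bool(int, int)> pred_;
  int left_val_{0};
  bool flag_{false};
};
```

The outer value is a member because a single outer tuple can match several inner tuples, and each match is returned by a separate call to Next.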

Hash Join


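This operator is harder. Following the official hints, I designed a dedicated hash key structure and then specialized the hash function for it, so that it can handle tuples (multiple values). The form looks like this: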

// ------------------------------------------------------------------
namespace bustub {
struct HashJoinKey {
  Value keys_;
  auto operator==(const HashJoinKey &other) const -> bool {
    return (keys_.CompareEquals(other.keys_) == CmpBool::CmpTrue);
  }
};
} 
// ------------------------------------------------------------------
namespace std {
/** Implements std::hash on Key */
template <>
struct hash<bustub::HashJoinKey> {
  auto operator()(const bustub::HashJoinKey &key) const -> std::size_t {
    size_t curr_hash = 0;
    if (!key.keys_.IsNull()) {
      curr_hash = bustub::HashUtil::CombineHashes(curr_hash, bustub::HashUtil::HashValue(&key.keys_));
    }
    return curr_hash;
  }
};
}
// --------------------------------------------------------------------

With the data structures above, Init can process the tuples of the outer table and store them in a std::unordered_map for later lookup.

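Next then iterates over the inner table. Watch out for the case where several tuples share the same key; handling it requires a few extra member variables: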

 private:
  /** The HashJoin plan node to be executed. */
  const HashJoinPlanNode *plan_;
  std::unique_ptr<AbstractExecutor> left_child_;
  std::unique_ptr<AbstractExecutor> right_child_;
//-------------------------------------------------
  const Schema *l_schema_;
  std::unordered_map<HashJoinKey, std::vector<Tuple>> h_map_;
  std::vector<Tuple> left_tuples_{};
  RID right_rid_;
  Tuple right_tuple_;
  bool flag_;
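One possible way to use such members is sketched below: Init builds the hash table from the outer (left) side, and Next probes it with inner (right) rows while remembering which matches for the current right row are still pending. This is an illustration only. MiniHashJoin is hypothetical, rows are vectors of ints, the join key is a single int column (so std::hash<int> suffices and no HashJoinKey is needed), and pending_ and pending_pos_ take over the role that left_tuples_ and flag_ might play in the class above.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

using Row = std::vector<int>;

class MiniHashJoin {
 public:
  MiniHashJoin(const std::vector<Row> *left, const std::vector<Row> *right, std::size_t left_key,
               std::size_t right_key)
      : left_(left), right_(right), left_key_(left_key), right_key_(right_key) {}

  // Build phase: group all left rows by their join key.
  void Init() {
    h_map_.clear();
    for (const Row &row : *left_) {
      assert(left_key_ < row.size());
      h_map_[row[left_key_]].push_back(row);
    }
    right_pos_ = 0;
    pending_ = nullptr;
    pending_pos_ = 0;
  }

  // Probe phase: emits one joined row (left columns, then right columns) per call.
  auto Next(Row *out) -> bool {
    while (true) {
      if (pending_ != nullptr && pending_pos_ < pending_->size()) {
        // Several left rows share the key: hand them out one call at a time.
        const Row &l = (*pending_)[pending_pos_++];
        const Row &r = (*right_)[right_pos_ - 1];
        *out = l;
        out->insert(out->end(), r.begin(), r.end());
        return true;
      }
      pending_ = nullptr;
      if (right_pos_ >= right_->size()) {
        return false;  // inner side exhausted
      }
      const Row &r = (*right_)[right_pos_++];
      assert(right_key_ < r.size());
      auto it = h_map_.find(r[right_key_]);
      if (it != h_map_.end()) {
        pending_ = &it->second;  // stays valid: the map is not modified after Init
        pending_pos_ = 0;
      }
    }
  }

 private:
  const std::vector<Row> *left_;
  const std::vector<Row> *right_;
  std::size_t left_key_;
  std::size_t right_key_;
  std::unordered_map<int, std::vector<Row>> h_map_;
  const std::vector<Row> *pending_{nullptr};
  std::size_t pending_pos_{0};
  std::size_t right_pos_{0};
};
```

Without the pending state, a right row whose key matches several left rows would produce only the first match, which is the pitfall mentioned above.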