Environment Setup
On Windows, download VirtualBox and install an Ubuntu 22.04 virtual machine
Configure git
#git
$ git clone git@github.com:cmu-db/bustub.git ./database-sys
$ cd database-sys
$ git reset --hard d830931 #2022fall
$ git remote rm origin
$ git remote add origin git@github.com:sleepsheee/database-sys.git # add your own repository as the remote
$ git push -u origin master
#build
$ sudo build_support/packages.sh
$ mkdir build
$ cd build
$ cmake -DCMAKE_BUILD_TYPE=Debug ..
$ make -j$(nproc)
(Optional) Configure VS Code: clang, clangd, lldb, cmake
$ sudo apt install clang clangd lldb cmake
Environment test
$ cd build
$ make starter_trie_test
$ ./test/starter_trie_test
Project 0 C++ Primer
Implement a Trie
Lecture 01 : Relational Model & Relational Algebra
1. Databases
database vs DBMS
2. Flat File Strawman
The database is stored as comma-separated value (CSV) files that the DBMS manages.
Issues: …
3. Database Management System
A DBMS is software that allows applications to store and analyze information in a database.
A data model is a collection of concepts for describing the data in a database.
Examples: relational (most common), NoSQL (key/value, graph), array/matrix/vectors
A schema is a description of a particular collection of data based on a data model.
4. Relational Model
5. Data Manipulation Languages (DMLs)
6. Relational Algebra
Lecture 03 : Database Storage I
Storage
Volatile Devices (memory):
- data is lost when power is pulled from the machine
- fast random access
Non-Volatile Devices (disk):
- block/page addressable
Our DBMS architecture assumes that the database is stored on disk. Because the system cannot operate on data directly on disk, the components of the DBMS are responsible for moving data between non-volatile disk and volatile memory.
Disk-Oriented DBMS Overview
The database is all on disk, and the data in database files is organized into pages.
- buffer pool: manages the data movement back and forth between disk and memory.
- execution engine: executes queries. The execution engine asks the buffer pool for a specific page, and the buffer pool takes care of bringing that page into memory and giving the execution engine a pointer to that page in memory.
Lecture #04: Database Storage II
- Log-Structured Storage
When the in-memory page gets full, the DBMS writes it out to disk. Pages on disk are immutable.
Project 1 Buffer Pool
Task1 Extendible Hashing
- Build a thread-safe extendible hash table to manage the DBMS’s buffer pool page table.
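- The page table is an in-memory hash table that tracks the pages currently in memory. It is not the OS page table!
- It maps a page id (on disk) to a frame id (in the buffer pool)
- How extendible hashing works
- ExtendibleHashTable nests a Bucket class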
- Bucket
size_t size_; //bucket size
int depth_; //local depth
std::list<std::pair<K, V>> list_; // data stored in the bucket, implemented as a list
- ExtendibleHashTable
int global_depth_; // The global depth of the directory
size_t bucket_size_; // The size of a bucket
int num_buckets_; // The number of buckets in the hash table
mutable std::mutex latch_;
std::vector<std::shared_ptr<Bucket>> dir_; // The directory of the hash table, a vector of shared pointer pointing to bucket
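Directory expansion:
global depth goes from 2 to 3
After expansion, each new directory entry points to the same bucket as the pre-expansion entry whose last 2 bits (the previous global depth) match, i.e. the entry with the same low-order bits
Example: entries 4, 5, 6, 7 point to the buckets that entries 0, 1, 2, 3 point to, respectively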
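The directory doubling described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `DemoBucket` and `DoubleDirectory` are hypothetical names, and the sketch ignores locking.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for the Bucket class; only the local depth is kept.
struct DemoBucket {
  int depth;
};

// Doubles the directory. Entry (i + old_size) aliases entry i, so both
// entries share the same low `global_depth` bits and the same bucket.
// Returns the new global depth.
int DoubleDirectory(std::vector<std::shared_ptr<DemoBucket>> &dir, int global_depth) {
  const std::size_t old_size = dir.size();
  assert(old_size == (std::size_t{1} << global_depth));
  dir.reserve(old_size * 2);  // avoid reallocation while copying entries
  for (std::size_t i = 0; i < old_size; ++i) {
    dir.push_back(dir[i]);  // copies the shared_ptr, not the bucket
  }
  return global_depth + 1;
}
```

With a global depth of 2 the directory has 4 entries; after the call it has 8, and entries 4 to 7 share buckets with entries 0 to 3. No bucket is copied or split at this point.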
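Bucket split
- If 2n directory entries point to one bucket, then after the split each of the two buckets has n directory entries pointing to it
- local depth 2->3:
6 - 00 110, new bucket: insert into new, erase from old
10 - 01 010, remains in old bucket
22 - 10 110, new bucket: insert into new, erase from old
26 - 11 010, remains in old bucket
- If the third-lowest bit is 1, the key moves to the new bucket object
preindex = 10
index = 110 / 010
In C++ binary literals, 0b010 == 0b10
- Update the directory pointers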
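A rough sketch of the split step, under some simplifying assumptions: keys are plain integers used as their own hash, buckets are bare `std::list<int>`, and the directory stores integer bucket ids. `SplitBucket` and `RepointDirectory` are hypothetical helper names, not part of the project skeleton.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <vector>

// Moves every key whose bit at position `old_local_depth` is 1 from
// `old_bucket` into `new_bucket`. Keys act as their own hash here
// (an assumption for illustration only).
void SplitBucket(std::list<int> &old_bucket, std::list<int> &new_bucket, int old_local_depth) {
  assert(old_local_depth >= 0);
  const unsigned mask = 1u << old_local_depth;  // 0b100 when depth goes 2 -> 3
  for (auto it = old_bucket.begin(); it != old_bucket.end();) {
    if ((static_cast<unsigned>(*it) & mask) != 0) {
      new_bucket.push_back(*it);
      it = old_bucket.erase(it);  // erase returns the next valid iterator
    } else {
      ++it;
    }
  }
}

// Repoints directory entries: among entries whose low bits equal
// `pre_index`, those with the new bit set now refer to `new_bucket_id`.
void RepointDirectory(std::vector<int> &dir, unsigned pre_index, int old_local_depth,
                      int new_bucket_id) {
  const unsigned mask = 1u << old_local_depth;
  for (std::size_t i = 0; i < dir.size(); ++i) {
    const bool same_low_bits = (i & (mask - 1)) == pre_index;
    if (same_low_bits && (i & mask) != 0) {
      dir[i] = new_bucket_id;
    }
  }
}
```

For the keys 6, 10, 22, 26 and an old local depth of 2, the keys 6 and 22 move to the new bucket while 10 and 26 stay, matching the bit patterns listed above.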
Task2 - LRU-K Replacement Policy
- Build a data structure that tracks the usage of pages using the LRU-K policy.
- When the DBMS needs to free up frames to make room for new pages, it has to evict pages from the buffer pool.
//class LRUKReplacer
size_t current_timestamp_{0}; // timestamp
size_t curr_size_{0}; // current size of the LRU replacer
size_t replacer_size_; // capacity
size_t k_;
struct Frameinfo {
bool evictable_{false};
std::queue<size_t> time_; // holds at most k timestamps; on eviction, find the frame with the smallest k-th timestamp
};
std::unordered_map<frame_id_t, struct Frameinfo> hash_; // hash map
std::mutex latch_;
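Evict: frames with fewer than k accesses are evicted first; among them, evict the one with the smallest (earliest) timestamp
If every evictable frame has k accesses, evict the one whose oldest recorded access (the front of its queue) has the smallest timestamp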
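The eviction rule can be sketched as below. This is a simplified, single-threaded illustration, not the project's interface: `DemoFrameInfo`, `RecordAccess` and `PickVictim` are hypothetical names, and frame ids are plain `int`.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <queue>
#include <unordered_map>

struct DemoFrameInfo {
  bool evictable{false};
  std::queue<std::size_t> times;  // at most k timestamps, oldest at the front
};

// Records one access and keeps only the k most recent timestamps.
void RecordAccess(DemoFrameInfo &info, std::size_t timestamp, std::size_t k) {
  assert(k > 0);
  info.times.push(timestamp);
  if (info.times.size() > k) {
    info.times.pop();
  }
}

// Returns the frame to evict, or std::nullopt if nothing is evictable.
// Frames with fewer than k accesses win over frames with k accesses;
// ties within a group go to the smallest front-of-queue timestamp.
std::optional<int> PickVictim(const std::unordered_map<int, DemoFrameInfo> &frames,
                              std::size_t k) {
  std::optional<int> victim;
  bool victim_has_k = true;
  std::size_t victim_time = 0;
  for (const auto &[id, info] : frames) {
    if (!info.evictable || info.times.empty()) {
      continue;
    }
    const bool has_k = info.times.size() >= k;
    const std::size_t oldest = info.times.front();
    const bool better = !victim.has_value() || (!has_k && victim_has_k) ||
                        (has_k == victim_has_k && oldest < victim_time);
    if (better) {
      victim = id;
      victim_has_k = has_k;
      victim_time = oldest;
    }
  }
  return victim;
}
```

Because the queue is capped at k entries, its front is the k-th most recent access once a frame has been touched k times, so comparing fronts is enough to find the largest backward k-distance.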
Task3 - Buffer Pool Manager Instance
- Fetches database pages from the DiskManager and stores them in memory.
- Writes dirty pages out to disk, either when explicitly instructed to do so or when it needs to evict a page to make space for a new page.
/** Number of pages in the buffer pool. */
const size_t pool_size_;
/** The next page id to be allocated */
std::atomic<page_id_t> next_page_id_ = 0;
/** Bucket size for the extendible hash table */
const size_t bucket_size_ = 4;
/** Array of buffer pool pages. */
Page *pages_;
/** Pointer to the disk manager. */
DiskManager *disk_manager_ __attribute__((__unused__));
/** Pointer to the log manager. Please ignore this for P1. */
LogManager *log_manager_ __attribute__((__unused__));
/** Page table for keeping track of buffer pool pages. */
ExtendibleHashTable<page_id_t, frame_id_t> * ;
/** Replacer to find unpinned pages for replacement. */
LRUKReplacer *replacer_;
/** List of free frames that don't have any pages on them. */
std::list<frame_id_t> free_list_;
/** This latch protects shared data structures. We recommend updating this comment to describe what it protects. */
std::mutex latch_;
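To show how these members might cooperate, here is a heavily simplified, single-threaded sketch of the fetch path. Everything in it is an assumption made for illustration: `DemoPage` and `DemoBufferPool` are hypothetical, the "disk" is an in-memory map, the page table is a `std::unordered_map` instead of the extendible hash table, and the victim is simply any unpinned frame rather than the LRU-K replacer's choice.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <optional>
#include <string>
#include <unordered_map>
#include <vector>

struct DemoPage {
  int page_id{-1};  // -1 means the frame holds no page
  int pin_count{0};
  bool is_dirty{false};
  std::string data;
};

class DemoBufferPool {
 public:
  DemoBufferPool(std::size_t pool_size, std::unordered_map<int, std::string> *disk)
      : pages_(pool_size), disk_(disk) {
    assert(disk != nullptr);
    for (std::size_t i = 0; i < pool_size; ++i) {
      free_list_.push_back(static_cast<int>(i));
    }
  }

  // Returns the in-memory page, or nullptr if every frame is pinned.
  DemoPage *FetchPage(int page_id) {
    auto it = page_table_.find(page_id);
    if (it != page_table_.end()) {  // already resident: just pin it
      ++pages_[it->second].pin_count;
      return &pages_[it->second];
    }
    std::optional<int> frame = FindFrame();
    if (!frame.has_value()) {
      return nullptr;
    }
    DemoPage &page = pages_[*frame];
    if (page.page_id != -1) {  // evicting an old page
      if (page.is_dirty) {
        (*disk_)[page.page_id] = page.data;  // write back before reuse
      }
      page_table_.erase(page.page_id);
    }
    page.page_id = page_id;
    page.pin_count = 1;
    page.is_dirty = false;
    page.data = (*disk_)[page_id];  // "read" the page from disk
    page_table_[page_id] = *frame;
    return &page;
  }

  void UnpinPage(int page_id, bool is_dirty) {
    auto it = page_table_.find(page_id);
    if (it == page_table_.end()) {
      return;
    }
    DemoPage &page = pages_[it->second];
    if (page.pin_count > 0) {
      --page.pin_count;
    }
    page.is_dirty = page.is_dirty || is_dirty;
  }

 private:
  // Free frames first; otherwise any unpinned frame (replacer stand-in).
  std::optional<int> FindFrame() {
    if (!free_list_.empty()) {
      const int frame = free_list_.front();
      free_list_.pop_front();
      return frame;
    }
    for (std::size_t i = 0; i < pages_.size(); ++i) {
      if (pages_[i].pin_count == 0) {
        return static_cast<int>(i);
      }
    }
    return std::nullopt;
  }

  std::vector<DemoPage> pages_;
  std::unordered_map<int, std::string> *disk_;
  std::unordered_map<int, int> page_table_;  // page id -> frame id
  std::list<int> free_list_;
};
```

The order of checks is the point of the sketch: page table hit, then free list, then eviction with write-back of a dirty victim. A thread-safe version would hold the latch around each of these operations.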