CMU15-445：Database Systems Fall 2021

fkuner

已于 2022-06-10 14:49:07 修改

阅读量1.8k

点赞数

分类专栏： CMU15-445 数据库文章标签：数据库

于 2022-03-23 01:48:12 首次发布

本文链接：https://blog.csdn.net/younothings/article/details/123676030

版权

数据库同时被 2 个专栏收录

3 篇文章 0 订阅

订阅专栏

CMU15-445

1 篇文章 0 订阅

订阅专栏

Just to keep track of the course for myself

文章目录

Lecture
Homework
Project
Reference

Lecture

Lecture 1: Course Introduction and the Relational Model
Lecture 2: Advanced SQL
Lecture 3: Database Storage I
Lecture 4: Database Storage II
Lecture 5: Buffer Pools
Lecture 6: Hash Tables
Lecture 7: Tree Indexes
Lecture 8: Index Concurrency Control
Lecture 9: Sorting + Aggregations
Lecture10: Joins Algorithms
Lecture 11: Query Execution I
Lecture 12: Query Execution II
Lecture 13: Query Planning & Optimization I (13:11)

Lecture1: Course Introduction and the Relational Model

Databases、Flat File Strawman、Database Management System、Relational Mode、Data Manipulation Languages (DMLs)、Relational Algebra

Relational Algebra
Select、Projection、Union、Itersection、Difference、Product、Join、Rename、Assignment、Duplicate Elimination、Aggregation、Sorting、Division

Lecture 2: Advanced SQL

Relational Languages、SQL History、Joins、Aggregates、String Operations、Date and Time、Output Redirection、Output Control、Nested Queries、Window Functions、Common Table Expressions

Lecture 3: Database Storage I

Storage, Disk-Oriented DBMS Overview, DBMS vs. OS, File Storage, Database Pages, Database Heap, Page Layout, Tuple Layout,

DBMS vs OS
Why not use the OS? Why not use the mmap?

File Storage
The DBMS stores a database as one or more files on disk.

Storage Manager
The storage manager is responsible for maintaining a database’s files.
It organizes the files as a collection of pages.

Pages
A page is fixed-size block of data. It contrain tuples, meta-data, indexes, log records…
Each page is given a unique identifier.
Hardware Page(usually 4KB) & OS Page(usually 4KB) & Database Page(512B-16KB)
Different DBMS manage pages in files on disk in different ways:

Heap File Organization (Linked List & Page Directory)
Sequential / Sorted File Organization
Hashing File Organization

Page Layout
Two approaches about organizing the data stored inside of the page:

Tuple-oriented
Log-structured

Page Header: Page Size、Checksum、DBMS Version、Transaction Visibility、Compression Information

Tuple Storage & Slotted Pages

Record ID
Each tuple is assigned a unique record identifier.

Most common: page_id + offset/slot
Can also contain file location info

Tuple Layout
Each tuple is prefixed with a header that contains meta-data about it.

Visibility info (concurrency control)
Bit Map for NULL values

We do not need to store meta-data about the schema.

physical denormalization??

Log-structured File Organization

The system appends log records to the file of how the database was modified:

Inserts store the entire tuple.
Deletes mark the tuple as deleted.
Updates contain the delta of just the attributes that were modified.

Why better write performance? avoid random io

To read a record, the DBMS scans the log backwards and “recreates” the tuple to find what it needs.

leveldb/rocksdb/hbase/cassandra

Lecture 4: Database Storage II

Data Representation, Workloads, Storage Models,

Variable Precision Numbers
Typically faster than arbitrary precision numbers but can have rouding errors…
Examples: FLOAT, REAL/DOUBLE

Fixed Precision Numbers
Examples: NUMERIC, DECIMAL

Large Values
To store values that are larger than a page, the DBMS uses separate overflow storage pages.
Example: BLOB

System Catalogs
Almost every DBMS stores the database’s catalog inside itself (i.e., as tables).

Database Workloads
On-Line Transaction Processing (OLTP): Fast operations that only read/update a small amount of data each time.
On-Line Analytical Processing (OLAP): Complex queries that read a lot of data to compute aggregates.
Hybrid Transacntion + Analytical Processing (HTAP): OLTP + OLAP together on the same database instance

N-ARY Storage Model (NSM)
The DBMS stores all attributes for a single tuple contiguously in a page.

Decomposition Storage Model (DSM)
The DBMS stores the values of a single attribute for all tuples contiguously in a page.
Also known as a column store.

Tuple Identification
Fixed-length Offesets & Embedded Tuple Ids

Lecture 5: Buffer Pools

Introduction, Locks vs. Latches, Buffer Pool, Buffer Pool Optimizations, OS Page Cache, Buffer Replacement Policies, Other Memory Pools

Buffer Pool Organization
Memory region organized as an array of fixed-size pages.
An array entry is called a frame.

Buffer Pool Meta-Data
The page table keeps track of pages that are currently in memory.
Also maintains additional meta-data per page: Dirty Flag, Pin/Reference Counter

Allocation Policies
Global Policies & Local Policies

Multiple Buffer Pools
Multiple buffer pool instances, Per-database buffer pool, Per-page type buffer pool
How to choose buffer pools: Object Id & Hashing

Pre-Fetching
The DBMS can also prefetch pages based on a query plan.

Scan Sharing
Queries can reuse data retrieved from storage or operator computations.
Aslo called synchronized scans.

Buffer Pool Bypass
The sequential scan operator will not store fetched pages in the buffer pool to avoid overhead.

OS Page Cache
Most DBMSs use direct I/O to bypass the OS’s page cache.

Buffer Replacement Policies
LRU, CLOCK, LRU-K
Localization, Priority hints

Dirty Pages
FAST: If a page in the buffer pool is not dirty, then the DBMS can simply drop it.
SLOW: If a page is dirty, then the DBMS must write back to disk to ensure that its changes are persisted.

Background Writing
The DBMS can periodically walk through the page table and write dirty pages to disk.

Other Memory Pools
The DBMS needs memory for things other than just tuples and indexes.

Sorting + Join Buffers
Query Caches
Maintenance Buffers
Log Buffers
Dictionary Caches

Lecture 6: Hash Tables

Data Structures, Hash Table, Hash Functions, Static Hashing Schemes, Dynamic Hashing Schemes

Data Structures
Internal Meta-Data, Core Data Storage, Temporary Data Structures, Table Indices

Hash Table
Trade-off: Hash Function & Hash Scheme

Hash Functions
CRC-64、MurmurHash、Google CityHash、Facebook XXHash、Google FarmHash

Static Hashing Schemes
Linear Probe Hashing
Robin Hood Hashing
Cuckoo Hashing

Dynamic Hashing Schemes
Chained Hashing
Extendible Hashing
Linear Hashing

Lecture 7: Tree Indexes

Table Indexes, B+Tree, B+Tree Design Choices, Optimizations

Table Indexes
A table index is a replica of a subset of a table’s attributes that are organized and/or sorted for efficient access using those attributes.

Leaf Node Values

Record IDs: A pointer to the location of the tuple to which the index entry corresponds.
Tuple Data: The leaf nodes store the actual contents of the tuple; Secondary indexes must store the Record ID as their values.

Clustered Indexes
The table is stored in the sort order specified by the primary key.

B+Tree Design Choices