CMU15-445:Database Systems Fall 2021

Just to keep track of the course for myself

Lecture

  • Lecture 1: Course Introduction and the Relational Model
  • Lecture 2: Advanced SQL
  • Lecture 3: Database Storage I
  • Lecture 4: Database Storage II
  • Lecture 5: Buffer Pools
  • Lecture 6: Hash Tables
  • Lecture 7: Tree Indexes
  • Lecture 8: Index Concurrency Control
  • Lecture 9: Sorting + Aggregations
  • Lecture10: Joins Algorithms
  • Lecture 11: Query Execution I
  • Lecture 12: Query Execution II
  • Lecture 13: Query Planning & Optimization I (13:11)

Lecture1: Course Introduction and the Relational Model

Databases、Flat File Strawman、Database Management System、Relational Mode、Data Manipulation Languages (DMLs)、Relational Algebra

Relational Algebra
Select、Projection、Union、Itersection、Difference、Product、Join、Rename、Assignment、Duplicate Elimination、Aggregation、Sorting、Division

Lecture 2: Advanced SQL

Relational Languages、SQL History、Joins、Aggregates、String Operations、Date and Time、Output Redirection、Output Control、Nested Queries、Window Functions、Common Table Expressions

Lecture 3: Database Storage I

Storage, Disk-Oriented DBMS Overview, DBMS vs. OS, File Storage, Database Pages, Database Heap, Page Layout, Tuple Layout,

DBMS vs OS
Why not use the OS? Why not use the mmap?

File Storage
The DBMS stores a database as one or more files on disk.

Storage Manager
The storage manager is responsible for maintaining a database’s files.
It organizes the files as a collection of pages.

Pages
A page is fixed-size block of data. It contrain tuples, meta-data, indexes, log records…
Each page is given a unique identifier.
Hardware Page(usually 4KB) & OS Page(usually 4KB) & Database Page(512B-16KB)
Different DBMS manage pages in files on disk in different ways:

  • Heap File Organization (Linked List & Page Directory)
  • Sequential / Sorted File Organization
  • Hashing File Organization

Page Layout
Two approaches about organizing the data stored inside of the page:

  • Tuple-oriented
  • Log-structured

Page Header: Page Size、Checksum、DBMS Version、Transaction Visibility、Compression Information

Tuple Storage & Slotted Pages

Record ID
Each tuple is assigned a unique record identifier.

  • Most common: page_id + offset/slot
  • Can also contain file location info

Tuple Layout
Each tuple is prefixed with a header that contains meta-data about it.

  • Visibility info (concurrency control)
  • Bit Map for NULL values

We do not need to store meta-data about the schema.

physical denormalization??

Log-structured File Organization

The system appends log records to the file of how the database was modified:

  • Inserts store the entire tuple.
  • Deletes mark the tuple as deleted.
  • Updates contain the delta of just the attributes that were modified.

Why better write performance? avoid random io

To read a record, the DBMS scans the log backwards and “recreates” the tuple to find what it needs.

leveldb/rocksdb/hbase/cassandra

Lecture 4: Database Storage II

Data Representation, Workloads, Storage Models,

Variable Precision Numbers
Typically faster than arbitrary precision numbers but can have rouding errors…
Examples: FLOAT, REAL/DOUBLE

Fixed Precision Numbers
Examples: NUMERIC, DECIMAL

Large Values
To store values that are larger than a page, the DBMS uses separate overflow storage pages.
Example: BLOB

System Catalogs
Almost every DBMS stores the database’s catalog inside itself (i.e., as tables).

Database Workloads
On-Line Transaction Processing (OLTP): Fast operations that only read/update a small amount of data each time.
On-Line Analytical Processing (OLAP): Complex queries that read a lot of data to compute aggregates.
Hybrid Transacntion + Analytical Processing (HTAP): OLTP + OLAP together on the same database instance

N-ARY Storage Model (NSM)
The DBMS stores all attributes for a single tuple contiguously in a page.

Decomposition Storage Model (DSM)
The DBMS stores the values of a single attribute for all tuples contiguously in a page.
Also known as a column store.

Tuple Identification
Fixed-length Offesets & Embedded Tuple Ids

Lecture 5: Buffer Pools

Introduction, Locks vs. Latches, Buffer Pool, Buffer Pool Optimizations, OS Page Cache, Buffer Replacement Policies, Other Memory Pools

Buffer Pool Organization
Memory region organized as an array of fixed-size pages.
An array entry is called a frame.

Buffer Pool Meta-Data
The page table keeps track of pages that are currently in memory.
Also maintains additional meta-data per page: Dirty Flag, Pin/Reference Counter

Allocation Policies
Global Policies & Local Policies

Multiple Buffer Pools
Multiple buffer pool instances, Per-database buffer pool, Per-page type buffer pool
How to choose buffer pools: Object Id & Hashing

Pre-Fetching
The DBMS can also prefetch pages based on a query plan.

Scan Sharing
Queries can reuse data retrieved from storage or operator computations.
Aslo called synchronized scans.

Buffer Pool Bypass
The sequential scan operator will not store fetched pages in the buffer pool to avoid overhead.

OS Page Cache
Most DBMSs use direct I/O to bypass the OS’s page cache.

Buffer Replacement Policies
LRU, CLOCK, LRU-K
Localization, Priority hints

Dirty Pages
FAST: If a page in the buffer pool is not dirty, then the DBMS can simply drop it.
SLOW: If a page is dirty, then the DBMS must write back to disk to ensure that its changes are persisted.

Background Writing
The DBMS can periodically walk through the page table and write dirty pages to disk.

Other Memory Pools
The DBMS needs memory for things other than just tuples and indexes.

  • Sorting + Join Buffers
  • Query Caches
  • Maintenance Buffers
  • Log Buffers
  • Dictionary Caches

Lecture 6: Hash Tables

Data Structures, Hash Table, Hash Functions, Static Hashing Schemes, Dynamic Hashing Schemes

Data Structures
Internal Meta-Data, Core Data Storage, Temporary Data Structures, Table Indices

Hash Table
Trade-off: Hash Function & Hash Scheme

Hash Functions
CRC-64、MurmurHash、Google CityHash、Facebook XXHash、Google FarmHash

Static Hashing Schemes
Linear Probe Hashing
Robin Hood Hashing
Cuckoo Hashing

Dynamic Hashing Schemes
Chained Hashing
Extendible Hashing
Linear Hashing

Lecture 7: Tree Indexes

Table Indexes, B+Tree, B+Tree Design Choices, Optimizations

Table Indexes
A table index is a replica of a subset of a table’s attributes that are organized and/or sorted for efficient access using those attributes.

Leaf Node Values

  1. Record IDs: A pointer to the location of the tuple to which the index entry corresponds.
  2. Tuple Data: The leaf nodes store the actual contents of the tuple; Secondary indexes must store the Record ID as their values.

Clustered Indexes
The table is stored in the sort order specified by the primary key.

B+Tree Design Choices

  1. Node Size
  2. Merge Threshold
  3. Variable-Length Keys
  4. Intra-Node Search

Lecture 14: Query Planner and Op

rule:

cost-based search: histogram, sampling, dynamic programming

Homework

Project

Reference

https://15445.courses.cs.cmu.edu/fall2021/

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 1
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论 1
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值