整理自CSE 444 Database Internals, Spring 2019 的课程Lectures,课程地址:https://courses.cs.washington.edu/courses/cse444/19sp/
Lecture1 Introduction
Six Labs
- Lab1:Build a DBMS that can scan a relation on disk.
- Lab2:Build a DBMS that can rn simple SQL queries and also supports data updates. a
- Lab3:Add a lock manager(transactions).
- Lab4:Add a write-ahead log(transactions).
- Lab5:Add a query optimizer.
Lab purpose
- Hands-on experience building a DBMS.
- Deepen your understanding significantly.
- We will build a classical DBMS.
Review:Data Model
What is a data model ?
- A mathematical formalism for data
What is the relational data model?
- Data is stored in tabels(aka relations).
- Data is requeried via relational queries.
- Queries are set-at-a-time
Review:Transactions
What is a transaction?
- A set of instructions that must be executed all or nothing.
What properties do transactions have?
- ACID
- Better: Serialization, recovery
Lecture2 Review of the Relational Model
Database/Relation/Tuple
A Database is collection of relations.
A Relation R is subset of S1 x S2 x … x Sn
- – Where Si is the domain of attribute i
- – n is number of attributes of the relation
- – A relation is a set of tuples
A Tuple t is an element of S1 x S2 x … x Sn
- Other names: relation = table; tuple = row
Discussion
Rows in a relation:
- Ordering immaterial (a relation is a set)
- All rows are distinct – set semantics
- Query answers may have duplicates – bag semantics
Columns in a tuple:
- Ordering is significant
- Applications refer to columns by their names
Domain of each column is a primitive type
Schema(结构)
Relation schema: describes column heads
- Relation name
- Name of each field (or column, or attribute)
- Domain of each field
Degree (or arity) of relation: # attributes
Database schema: set of all relation schemas
Instance
Relation instance: concrete table content
- Set of tuples (also called records) matching the schema
Database instance: set of all relation instances
Key Constraints
Super Key: “set of attributes that functionally determines all attributes”
Key: Minimal super-key; a.k.a. “candidate key”
Primary key: One minimal key can be selected as primary key
Foreign Key Constraints
A relation can refer to a tuple in another relation
Foreign key
- Field that refers to tuples in another relation
- Typically, this field refers to the primary key of other relation
- Can pick another field as well
Relational Query Language
Set-at-a-time:
- Query inputs and outputs are relations
Two variants of the query language
- Relational algebra: specifies order of operations
- Relational calculus / SQL: declarative
Relational Algebra
Queries specified in an operational manner
- A query gives a step-by-step procedure
Relational operators
- Take one or two relation instances as argument
- Return one relation instance as result
- Easy to compose into relational algebra expressions
Five Basic Relational Operators
Example of Algebra Queries
Extended Operators of Relational Algebra
Structured Query Language: SQL
Declarative query language, based on the relational calculus (see 344)
Data definition language(DDL)
- Statements to create, modify tables and views
Data manipulation language(DML)
- Statements to issue queries, insert, delete data
SQL Query
Nested Queries
Nested query
- Query that has another query embedded within it
- The embedded query is called a subquery
Why do we need them?
- Enables to refer to a table that must itself be computed
Subqueries can appear in:
- WHERE clause (common)
- FROM clause (less common)
- HAVING clause (less common)
Subqueries Returning Relations
Correlated Queries
Aggregation
SQL supports several aggregation operations: sum, count, min, max, avg
Except count, all aggregations apply to a single attribute
Grouping and Aggregation
RA
RA = Relational Algebra :
- RA is similar to normal algebra (as in 2+3*x-y), except we use relations as values instead of numbers, and the operations and operators are different.
From SQL to RA
Benefits of Relational Model
Physical data independence
- Can change how data is organized on disk without affecting applications
Logical data independence
- Can change the logical schema without affecting applications (not 100%... consider updates)
Query Evaluation Steps Review