Database Management Systems
Part 1: Foundations
Chapter 1. Overview of database systems
database: a collection of data
network data model, hierarchical data model, relational data model (Edgar Codd, 1970), object-oriented model, object-relational model
advantages of DBMS: data independence, efficient data access, data integrity and security, data administration, concurrent access and crash recovery, reduced application development time
data model: a collection of high-level data description constructs that hide many low-level storage details. (semantic data model: more abstract, high-level, e.g. ER)
relation: a set of records; schema: field (attribute, column) + type
integrity constraints: conditions that records in a relation must satisfy
levels of abstraction in a DBMS: external, conceptual (logical), physical
DDL: data definition language --> external, conceptual
system catalog: store information about external, conceptual, and physical schemas
physical schema: specify storage details - how relations described in conceptual schema are actually stored on secondary storage devices (file organizations, indexes)
external schema: a collection of views and relations; one conceptual/physical schema --> several external schemas, guided by end-user requirements
view: records in a view are not stored in DBMS, but computed using a definition of the view, in terms of relations stored
logical data independence: users shielded from changes in logical structure (schema) of data; physical data independence: conceptual schema hides storage details without altering applications
query language: relational calculus, relational algebra
DML: data manipulation language
transaction: any one execution of a user program in a DBMS -- the basic unit of change as seen by DBMS
concurrent execution of transactions --> locking protocal: a set of rules followed by each transaction to ensure net effect of interleaved transactions is identical to executing all transactions in some serial order
lock: mechanism to control access to database objects --> shared lock (read) and exclusive lock (write)
incomplete transactions and system crashes --> write-ahead log (WAL): each write action must be recorded in log on disk before corresponding change is reflected
checkpoint: periodically forcing some information (selectively a collection of pages in main memory) to disk --> reduce time needed to recover from crash
Architecture of a DBMS:
query optimizer: use information about how data is stored to produce an efficient execution plan for evaluating the query
execution plan: a blueprint for evaluating the query, a tree of relational operators annotated with access methods etc.
file and access methods layer: file = a collection of pages/records; heap file: files of unordered pages; indexes
buffer manager: bring pages in from disk to main memory as needed in response to read requests
disk space manager: manage space on disk
transaction manager, lock manager, recovery manager
database implementors, end users, database application programmers, database administrator (DBA)
tasks of DBA: design of conceptual and physical schemas, security and authorization, data availability and recovery from failures, database tuning
Chapter 2. Introduction to database design
entity-relationship data model --> conceptual database design
database design process:
(1) requirement analysis
(2) conceptual database design: develop a high-level description of data, along with constraints
(3) logical database design: convert conceptual design into database schema
(4) schema refinement: normalizing relations
(5) physical database design: build indexes, cluster tables
(6) application and security design
entity --> entity set; attributes, domain
key: a minimal set of attributes whose values uniquely identify an entity in the set. (more than one candidate key; designate one of them as primary key)
relationship set: a set of n-tuples; descriptive attributes; an instance of a relationship set is a set of relationships; ternary relationship; a relationship is uniquely identified by the participating entities
roles, role indicator
key constraints: arrow, A-->B means given a A entity, we can uniquely determine the B relationship in which it appears; one-to-many, many-to-many, one-to-one
participation constraint: total (thick line), partial
weak entities: identified uniquely only by considering some of its attributes in conjunction with the primary key of another entity called the identifying owner (one-to-many relationship; total participation). identifying relationship: dark lines; partial key of weak entity set: broken underline
class hierarchy: ISA. overlap constraints, covering constraints
aggregation: a relationship set (dashed box) participates in another relationship set
UML: unified modeling language approach; business modeling, system modeling, conceptual database modeling, physical database modeling, hardware system modeling
Chapter 3. The relational model
CREATE TABLE Students (sid CHAR(20),
name CHAR(20),
age INTEGER)
INSERT INTO Students (sid, name, age)
VALUES (53678, 'Smith', 19)
DELETE FROM Students S
WHERE S.name = 'Joe'
UPDATE Students S
SET S.gpa = S.gpa - 0.1
WHERE S.gpa >= 3.3
Integrity constraints (IC): a condition specified on a database schema and restricts the data that can be stored in an instance of the database
legal: database instance satisfies all integrity constraints --> DBMS enforces ICs
domain constraints;
key constraints: a certain minimal subset of fields of a relation is a unique identifier for a tuple --> (candidate) key: (1) two distinct tuples in a legal instance cannot have identical values in all fields of a key (2) no subset of the set of fields in a key is a unique identifier for a tuple (has to be minimal)
superkey: a set of fields that contains a key (the set of all fields is always a superkey)
UNIQUE: declare a key, PRIMARY KEY: declare a primary key
CREATE TABLE students (sid CHAR(20),
name CHAR(20),
age INTEGER,
UNIQUE (name, age),
CONSTRAINT studentsKey PRIMARY KEY (sid))
CREATE TABLE enrolled (stuid CHAR(20),
cid CHAR(20),
PRIMARY KEY (stuid, cid),
FOREIGN KET (stuid) REFERENCES students)
general constraints --> SQL: table constraints and assertions (e.g. participation, overlap, covering)
referential integrity enforcement steps --> 4 options on DELETE and UPDATE
The options are specified as part of the foreign key declaration: FOREIGN KEY (stuid) REFERENCES students ON DELETE CASCADE ON UPDATE NO ACTION
ON DELETE SET DEFAULT (specify default: sid CHAR(20) DEFAULT '53666'), ON DELETE SET NULL
By default, a constraint is checked at the end of every SQL statement that could lead to a violation (IMMEDIATE); if there is a violation, the statement is rejected
--> sometimes too inflexible: defer constraint checking: SET CONSTRAINT ConstraintFoo DEFERRED <-- check at commit time
view: not explicitly stored but computed as needed from a view definition
CREATE VIEW B-Students (name, sid)
AS SELECT S.sname, S.sid
FROM Students S, Enrolled E
WHERE S.sid = E.stuid AND E.grade = 'B'
The view can be used just like a base table --> view mechanism provides support for logical data independence (also valuable for security: access control)
updatable views: defined on a single base table using just selection and projection, with no use of aggregate operations
DROP VIEW, DROP TABLE (RESTRICT | CASCADE), ALTER TABLE xxx ADD COLUMN xxx
Chapter 4. Relational algebra and calculus
relational algebra: selection, projection, set operations (union, intersection, set-difference, cross-product), renaming (\rho), join (condition, equijoin, natural), division
union-compatible: same number of fields, and corresponding fields, taken in order from left to right, have the same domains
division: A/B, the set of all x values such that for every y value in B, there is a tuple <x, y> in A
\pi_x(A) - \pi_x((\pi_x(A) * B) - A)
Section 4.2.6: exercises
relational calculus (omitted)
Chapter 5. SQL: queries, constraints, triggers
data manipulation language, data definition language, triggers and advanced integrity constraints, embedded and dynamic SQL, client-server execution and remote database access, transaction management, security, advanced feature
active database: has a collection of triggers --> a trigger describes actions to be taken when certain situations arise
SELECT xxx AS yyy
WHERE xxx LIKE 'B_%B': wildcard symbols % (zero or more arbitrary characters) and _ (exactly one arbitrary character)
UNION, INTERSECT, EXCEPT, (NOT) IN, op ANY, op ALL, (NOT) EXISTS
nested query: a subquery typically appears within the WHERE clause of a query (can appear in FROM or HAVING clause)
correlated query:
SELECT S.name
FROM Sailors S
WHERE EXISTS ( SELECT *
FROM Reserves R
WHERE R.bid = 103 AND R.sid = S.sid )
The subquery must be re-evaluated for each row in Sailors. Similar predicate: NOT EXISTS, UNIQUE
The other good use of * in SELECT is COUNT(*)
set comparison: op ANY, op ALL
SELECT S.sid
FROM Sailors S
WHERE S.rating >= ALL ( SELECT S2.rating FROM Sailors S2 )
equivalent: IN and = ANY, NOT IN <> ALL
INTERSECT --> rewritten using IN, EXCEPT --> NOT IN
division:
SELECT S.name
FROM Sailors S
WHERE NOT EXISTS (( SELECT B.bid
FROM Boats B)
EXCEPT
(SELECT R.bid
FROM Reserves R
WHERE R.sid = S.sid ))
SELECT S.name
FROM Sailors S
WHERE NOT EXISTS ( SELECT B.bid
FROM Boats B
WHERE NOT EXISTS ( SELECT R.bid
FROM Reserves R
WHERE R.bid = B.bid AND R.sid = S.sid ))
aggregate operators: COUNT [DISTINCT], SUM [DISTINCT], AVG [DISTINCT], MAX [A], MIN[A]
If the SELECT clause uses an aggregate operation, then it must use only aggregate operations unless the query contains a GROUP BY clause
SELECT [DISTINCT] select-list
FROM from-list
WHERE qualification
GROUP BY grouping-list
HAVING grouping-qualification
SELECT S.rating, MIN (S.age) AS minage
FROM Sailors S
WHERE S.age >= 18
GROUP BY S.rating
HAVING COUNT(*) > 1 AND EVERY (S.age <= 60)
Section 5.5.2: examples of aggregate queries
comparisons using null values: IS [NOT] NULL, arithmetic operations + - * / is null (unknown) if one of the argument is null
COUNT(*) handles null just like others (they get counted), and all other aggregate operations simply discard null values
outer joins: rows without a match appear exactly once in the result, with the unmatched result columns assigned null values
SELECT S.sid, R.bid
FROM Sailors S NATURAL LEFT OUTER JOIN Reserves R
disallowing null values: NOT NULL (implicit for primary key)
table constraints: CHECK conditional-expression
CREATE TABLE Reserves ( ...
CHECK ( 'Interlake' <>
( SELECT B.bname
FROM Boats B
WHERE B.bid = Reserves.bid )))
CREATE DOMAIN ratingval INTEGER DEFAULT 1 CHECK ( VALUE >= 1 AND VALUE <= 10 ) --> can be used in schema declaration: rating ratingval
CREATE TYPE ratingtype AS INTEGER --> a new distinct type, cannot be compared with values of other types, e.g. integer
assertions (involve two more more tables):
CREATE ASSERTION smallClub
CHECK (( SELECT COUNT (S.sid) FROM Sailors S )
+ ( SELECT COUNT (B.bid) FROM Boats B )
< 100 )
trigger: a procedure that is automatically invoked by DBMS in response to specified changes to the database -- event, condition, action
active database: a database that has a set of associated triggers
init_count: statement-level trigger
incr_count: row-level trigger
set-oriented trigger:
In an active database, when DBMS is about to execute a statement that modifies the database, it checks whether some trigger is activated by the statement. If so, the DBMS processes the trigger by evaluating its condition part, and then (if the condition evaluates to true) executing its action part.
Uses of triggers: to maintain database consistency; alert users to unusual events; generate a log of events to support auditing and security checks; gather statistics on table accesses and modifications