Database Management Systems (Foundations)

最新推荐文章于 2020-11-27 15:01:24 发布

lllcfr1

最新推荐文章于 2020-11-27 15:01:24 发布

阅读量952

点赞数

分类专栏： Book Database

Book 同时被 2 个专栏收录

27 篇文章 0 订阅

订阅专栏

Database

8 篇文章 0 订阅

订阅专栏

Database Management Systems

Part 1: Foundations

Chapter 1. Overview of database systems

database: a collection of data

network data model, hierarchical data model, relational data model (Edgar Codd, 1970), object-oriented model, object-relational model

advantages of DBMS: data independence, efficient data access, data integrity and security, data administration, concurrent access and crash recovery, reduced application development time

data model: a collection of high-level data description constructs that hide many low-level storage details. (semantic data model: more abstract, high-level, e.g. ER)

relation: a set of records; schema: field (attribute, column) + type

integrity constraints: conditions that records in a relation must satisfy

levels of abstraction in a DBMS: external, conceptual (logical), physical

DDL: data definition language --> external, conceptual

system catalog: store information about external, conceptual, and physical schemas

physical schema: specify storage details - how relations described in conceptual schema are actually stored on secondary storage devices (file organizations, indexes)

external schema: a collection of views and relations; one conceptual/physical schema --> several external schemas, guided by end-user requirements

view: records in a view are not stored in DBMS, but computed using a definition of the view, in terms of relations stored

logical data independence: users shielded from changes in logical structure (schema) of data; physical data independence: conceptual schema hides storage details without altering applications

query language: relational calculus, relational algebra

DML: data manipulation language

transaction: any one execution of a user program in a DBMS -- the basic unit of change as seen by DBMS

concurrent execution of transactions --> locking protocal: a set of rules followed by each transaction to ensure net effect of interleaved transactions is identical to executing all transactions in some serial order

lock: mechanism to control access to database objects --> shared lock (read) and exclusive lock (write)

incomplete transactions and system crashes --> write-ahead log (WAL): each write action must be recorded in log on disk before corresponding change is reflected

checkpoint: periodically forcing some information (selectively a collection of pages in main memory) to disk --> reduce time needed to recover from crash

Architecture of a DBMS:

query optimizer: use information about how data is stored to produce an efficient execution plan for evaluating the query

execution plan: a blueprint for evaluating the query, a tree of relational operators annotated with access methods etc.

file and access methods layer: file = a collection of pages/records; heap file: files of unordered pages; indexes

buffer manager: bring pages in from disk to main memory as needed in response to read requests

disk space manager: manage space on disk

transaction manager, lock manager, recovery manager

database implementors, end users, database application programmers, database administrator (DBA)

tasks of DBA: design of conceptual and physical schemas, security and authorization, data availability and recovery from failures, database tuning

Chapter 2. Introduction to database design

entity-relationship data model --> conceptual database design

database design process:

(1) requirement analysis

(2) conceptual database design: develop a high-level description of data, along with constraints

(3) logical database design: convert conceptual design into database schema

(4) schema refinement: normalizing relations

(5) physical database design: build indexes, cluster tables

(6) application and security design

entity --> entity set; attributes, domain

key: a minimal set of attributes whose values uniquely identify an entity in the set. (more than one candidate key; designate one of them as primary key)

relationship set: a set of n-tuples; descriptive attributes; an instance of a relationship set is a set of relationships; ternary relationship; a relationship is uniquely identified by the participating entities

roles, role indicator

key constraints: arrow, A-->B means given a A entity, we can uniquely determine the B relationship in which it appears; one-to-many, many-to-many, one-to-one

participation constraint: total (thick line), partial

weak entities: identified uniquely only by considering some of its attributes in conjunction with the primary key of another entity called the identifying owner (one-to-many relationship; total participation). identifying relationship: dark lines; partial key of weak entity set: broken underline

class hierarchy: ISA. overlap constraints, covering constraints

aggregation: a relationship set (dashed box) participates in another relationship set

UML: unified modeling language approach; business modeling, system modeling, conceptual database modeling, physical database modeling, hardware system modeling

Chapter 3. The relational model

CREATE TABLE Students (sid CHAR(20),
                       name CHAR(20),
                       age INTEGER)

INSERT INTO Students (sid, name, age)
VALUES (53678, 'Smith', 19)

DELETE FROM Students S
WHERE S.name = 'Joe'

UPDATE Students S
SET S.gpa = S.gpa - 0.1
WHERE S.gpa >= 3.3

Integrity constraints (IC): a condition specified on a database schema and restricts the data that can be stored in an instance of the database

legal: database instance satisfies all integrity constraints --> DBMS enforces ICs

domain constraints;

key constraints: a certain minimal subset of fields of a relation is a unique identifier for a tuple --> (candidate) key: (1) two distinct tuples in a legal instance cannot have identical values in all fields of a key (2) no subset of the set of fields in a key is a unique identifier for a tuple (has to be minimal)

superkey: a set of fields that contains a key (the set of all fields is always a superkey)

UNIQUE: declare a key, PRIMARY KEY: declare a primary key

CREATE TABLE students (sid CHAR(20),
                       name CHAR(20),
                       age INTEGER,
                       UNIQUE (name, age),
                       CONSTRAINT studentsKey PRIMARY KEY (sid))

CREATE TABLE enrolled (stuid CHAR(20),
                       cid CHAR(20),
                       PRIMARY KEY (stuid, cid),
                       FOREIGN KET (stuid) REFERENCES students)

general constraints --> SQL: table constraints and assertions (e.g. participation, overlap, covering)

referential integrity enforcement steps --> 4 options on DELETE and UPDATE

The options are specified as part of the foreign key declaration: FOREIGN KEY (stuid) REFERENCES students ON DELETE CASCADE ON UPDATE NO ACTION

ON DELETE SET DEFAULT (specify default: sid CHAR(20) DEFAULT '53666'), ON DELETE SET NULL

By default, a constraint is checked at the end of every SQL statement that could lead to a violation (IMMEDIATE); if there is a violation, the statement is rejected

--> sometimes too inflexible: defer constraint checking: SET CONSTRAINT ConstraintFoo DEFERRED <-- check at commit time

view: not explicitly stored but computed as needed from a view definition

CREATE VIEW B-Students (name, sid)
       AS SELECT S.sname, S.sid
FROM Students S, Enrolled E
WHERE S.sid = E.stuid AND E.grade = 'B'

The view can be used just like a base table --> view mechanism provides support for logical data independence (also valuable for security: access control)

updatable views: defined on a single base table using just selection and projection, with no use of aggregate operations

DROP VIEW, DROP TABLE (RESTRICT | CASCADE), ALTER TABLE xxx ADD COLUMN xxx

Chapter 4. Relational algebra and calculus

relational algebra: selection, projection, set operations (union, intersection, set-difference, cross-product), renaming (\rho), join (condition, equijoin, natural), division

union-compatible: same number of fields, and corresponding fields, taken in order from left to right, have the same domains

division: A/B, the set of all x values such that for every y value in B, there is a tuple <x, y> in A

\pi_x(A) - \pi_x((\pi_x(A) * B) - A)

Section 4.2.6: exercises

relational calculus (omitted)

Chapter 5. SQL: queries, constraints, triggers

data manipulation language, data definition language, triggers and advanced integrity constraints, embedded and dynamic SQL, client-server execution and remote database access, transaction management, security, advanced feature

active database: has a collection of triggers --> a trigger describes actions to be taken when certain situations arise

SELECT xxx AS yyy

WHERE xxx LIKE 'B_%B': wildcard symbols % (zero or more arbitrary characters) and _ (exactly one arbitrary character)

UNION, INTERSECT, EXCEPT, (NOT) IN, op ANY, op ALL, (NOT) EXISTS

nested query: a subquery typically appears within the WHERE clause of a query (can appear in FROM or HAVING clause)

correlated query:

SELECT S.name
FROM   Sailors S
WHERE  EXISTS ( SELECT *
                FROM Reserves R
                WHERE R.bid = 103 AND R.sid = S.sid )

The subquery must be re-evaluated for each row in Sailors. Similar predicate: NOT EXISTS, UNIQUE

The other good use of * in SELECT is COUNT(*)

set comparison: op ANY, op ALL

SELECT S.sid
FROM   Sailors S
WHERE  S.rating >= ALL ( SELECT S2.rating FROM Sailors S2 )

equivalent: IN and = ANY, NOT IN <> ALL

INTERSECT --> rewritten using IN, EXCEPT --> NOT IN

division:

SELECT S.name
FROM   Sailors S
WHERE NOT EXISTS (( SELECT B.bid
                    FROM Boats B)
                    EXCEPT
                    (SELECT R.bid
                     FROM Reserves R
                     WHERE R.sid = S.sid ))

SELECT S.name
FROM Sailors S
WHERE NOT EXISTS ( SELECT B.bid
                   FROM Boats B
                   WHERE NOT EXISTS ( SELECT R.bid
                                      FROM Reserves R
                                      WHERE R.bid = B.bid AND R.sid = S.sid ))

aggregate operators: COUNT [DISTINCT], SUM [DISTINCT], AVG [DISTINCT], MAX [A], MIN[A]

If the SELECT clause uses an aggregate operation, then it must use only aggregate operations unless the query contains a GROUP BY clause

SELECT   [DISTINCT] select-list
FROM     from-list
WHERE    qualification
GROUP BY grouping-list
HAVING   grouping-qualification

SELECT   S.rating, MIN (S.age) AS minage
FROM     Sailors S
WHERE    S.age >= 18
GROUP BY S.rating
HAVING   COUNT(*) > 1 AND EVERY (S.age <= 60)

Section 5.5.2: examples of aggregate queries

comparisons using null values: IS [NOT] NULL, arithmetic operations + - * / is null (unknown) if one of the argument is null

COUNT(*) handles null just like others (they get counted), and all other aggregate operations simply discard null values

outer joins: rows without a match appear exactly once in the result, with the unmatched result columns assigned null values

SELECT S.sid, R.bid
FROM Sailors S NATURAL LEFT OUTER JOIN Reserves R

disallowing null values: NOT NULL (implicit for primary key)

table constraints: CHECK conditional-expression

CREATE TABLE Reserves ( ...
                        CHECK ( 'Interlake' <>
                                  ( SELECT B.bname
                                    FROM Boats B
                                    WHERE B.bid = Reserves.bid )))

CREATE DOMAIN ratingval INTEGER DEFAULT 1 CHECK ( VALUE >= 1 AND VALUE <= 10 ) --> can be used in schema declaration: rating ratingval

CREATE TYPE ratingtype AS INTEGER --> a new distinct type, cannot be compared with values of other types, e.g. integer

assertions (involve two more more tables):

CREATE ASSERTION smallClub
CHECK (( SELECT COUNT (S.sid) FROM Sailors S )
        + ( SELECT COUNT (B.bid) FROM Boats B )
        < 100 )

trigger: a procedure that is automatically invoked by DBMS in response to specified changes to the database -- event, condition, action

active database: a database that has a set of associated triggers

init_count: statement-level trigger

incr_count: row-level trigger

set-oriented trigger:

In an active database, when DBMS is about to execute a statement that modifies the database, it checks whether some trigger is activated by the statement. If so, the DBMS processes the trigger by evaluating its condition part, and then (if the condition evaluates to true) executing its action part.

Uses of triggers: to maintain database consistency; alert users to unusual events; generate a log of events to support auditing and security checks; gather statistics on table accesses and modifications