Database Management Systems (Foundations)

Database Management Systems

Part 1: Foundations


Chapter 1. Overview of database systems

database: a collection of data

network data model, hierarchical data model, relational data model (Edgar Codd, 1970), object-oriented model, object-relational model

advantages of DBMS: data independence, efficient data access, data integrity and security, data administration, concurrent access and crash recovery, reduced application development time


data model: a collection of high-level data description constructs that hide many low-level storage details. (semantic data model: more abstract, high-level, e.g. ER)

relation: a set of records; schema: field (attribute, column) + type

integrity constraints: conditions that records in a relation must satisfy

levels of abstraction in a DBMS: external, conceptual (logical), physical

DDL: data definition language --> external, conceptual

system catalog: store information about external, conceptual, and physical schemas

physical schema: specify storage details - how relations described in conceptual schema are actually stored on secondary storage devices (file organizations, indexes)

external schema: a collection of views and relations; one conceptual/physical schema --> several external schemas, guided by end-user requirements

view: records in a view are not stored in DBMS, but computed using a definition of the view, in terms of relations stored

logical data independence: users shielded from changes in logical structure (schema) of data; physical data independence: conceptual schema hides storage details without altering applications

query language: relational calculus, relational algebra

DML: data manipulation language


transaction: any one execution of a user program in a DBMS -- the basic unit of change as seen by DBMS

concurrent execution of transactions --> locking protocal: a set of rules followed by each transaction to ensure net effect of interleaved transactions is identical to executing all transactions in some serial order

lock: mechanism to control access to database objects --> shared lock (read) and exclusive lock (write)

incomplete transactions and system crashes --> write-ahead log (WAL): each write action must be recorded in log on disk before corresponding change is reflected

checkpoint: periodically forcing some information (selectively a collection of pages in main memory) to disk --> reduce time needed to recover from crash


Architecture of a DBMS:


query optimizer: use information about how data is stored to produce an efficient execution plan for evaluating the query

execution plan: a blueprint for evaluating the query, a tree of relational operators annotated with access methods etc.

file and access methods layer: file = a collection of pages/records; heap file: files of unordered pages; indexes

buffer manager: bring pages in from disk to main memory as needed in response to read requests

disk space manager: manage space on disk

transaction manager, lock manager, recovery manager


database implementors, end users, database application programmers, database administrator (DBA)

tasks of DBA: design of conceptual and physical schemas, security and authorization, data availability and recovery from failures, database tuning




Chapter 2. Introduction to database design

entity-relationship data model --> conceptual database design


database design process:

(1) requirement analysis

(2) conceptual database design: develop a high-level description of data, along with constraints

(3) logical database design: convert conceptual design into database schema

(4) schema refinement: normalizing relations

(5) physical database design: build indexes, cluster tables

(6) application and security design


entity --> entity set; attributes, domain

key: a minimal set of attributes whose values uniquely identify an entity in the set. (more than one candidate key; designate one of them as primary key)

relationship set: a set of n-tuples; descriptive attributes; an instance of a relationship set is a set of relationships; ternary relationship; a relationship is uniquely identified by the participating entities

roles, role indicator


key constraints: arrow, A-->B means given a A entity, we can uniquely determine the B relationship in which it appears; one-to-many, many-to-many, one-to-one

participation constraint: total (thick line), partial

weak entities: identified uniquely only by considering some of its attributes in conjunction with the primary key of another entity called the identifying owner (one-to-many relationship; total participation). identifying relationship: dark lines; partial key of weak entity set: broken underline

class hierarchy: ISA. overlap constraints, covering constraints

aggregation: a relationship set (dashed box) participates in another relationship set


UML: unified modeling language approach; business modeling, system modeling, conceptual database modeling, physical database modeling, hardware system modeling




Chapter 3. The relational model

CREATE TABLE Students (sid CHAR(20),
                       name CHAR(20),
                       age INTEGER)

INSERT INTO Students (sid, name, age)
VALUES (53678, 'Smith', 19)

DELETE FROM Students S
WHERE S.name = 'Joe'

UPDATE Students S
SET S.gpa = S.gpa - 0.1
WHERE S.gpa >= 3.3
Integrity constraints (IC): a condition specified on a database schema and restricts the data that can be stored in an instance of the database

legal: database instance satisfies all integrity constraints --> DBMS enforces ICs

domain constraints;

key constraints: a certain minimal subset of fields of a relation is a unique identifier for a tuple --> (candidate) key: (1) two distinct tuples in a legal instance cannot have identical values in all fields of a key (2) no subset of the set of fields in a key is a unique identifier for a tuple (has to be minimal)

superkey: a set of fields that contains a key (the set of all fields is always a superkey)

UNIQUE: declare a key, PRIMARY KEY: declare a primary key

CREATE TABLE students (sid CHAR(20),
                       name CHAR(20),
                       age INTEGER,
                       UNIQUE (name, age),
                       CONSTRAINT studentsKey PRIMARY KEY (sid))

CREATE TABLE enrolled (stuid CHAR(20),
                       cid CHAR(20),
                       PRIMARY KEY (stuid, cid),
                       FOREIGN KET (stuid) REFERENCES students)

general constraints --> SQL: table constraints and assertions (e.g. participation, overlap, covering)

referential integrity enforcement steps --> 4 options on DELETE and UPDATE

The options are specified as part of the foreign key declaration: FOREIGN KEY (stuid) REFERENCES students ON DELETE CASCADE ON UPDATE NO ACTION

ON DELETE SET DEFAULT (specify default: sid CHAR(20) DEFAULT '53666'), ON DELETE SET NULL

By default, a constraint is checked at the end of every SQL statement that could lead to a violation (IMMEDIATE); if there is a violation, the statement is rejected

--> sometimes too inflexible: defer constraint checking: SET CONSTRAINT ConstraintFoo DEFERRED <-- check at commit time


view: not explicitly stored but computed as needed from a view definition

CREATE VIEW B-Students (name, sid)
       AS SELECT S.sname, S.sid
FROM Students S, Enrolled E
WHERE S.sid = E.stuid AND E.grade = 'B'

The view can be used just like a base table --> view mechanism provides support for logical data independence (also valuable for security: access control)

updatable views: defined on a single base table using just selection and projection, with no use of aggregate operations

DROP VIEW, DROP TABLE (RESTRICT | CASCADE), ALTER TABLE xxx ADD COLUMN xxx




Chapter 4. Relational algebra and calculus

relational algebra: selection, projection, set operations (union, intersection, set-difference, cross-product), renaming (\rho), join (condition, equijoin, natural), division

union-compatible: same number of fields, and corresponding fields, taken in order from left to right, have the same domains

division: A/B, the set of all x values such that for every y value in B, there is a tuple <x, y> in A

\pi_x(A) - \pi_x((\pi_x(A) * B) - A)

Section 4.2.6: exercises

relational calculus (omitted)




Chapter 5. SQL: queries, constraints, triggers

data manipulation language, data definition language, triggers and advanced integrity constraints, embedded and dynamic SQL, client-server execution and remote database access, transaction management, security, advanced feature

active database: has a collection of triggers --> a trigger describes actions to be taken when certain situations arise

SELECT xxx AS yyy

WHERE xxx LIKE 'B_%B': wildcard symbols % (zero or more arbitrary characters) and _ (exactly one arbitrary character)

UNION, INTERSECT, EXCEPT, (NOT) IN, op ANY, op ALL, (NOT) EXISTS

nested query: a subquery typically appears within the WHERE clause of a query (can appear in FROM or HAVING clause)

correlated query:

SELECT S.name
FROM   Sailors S
WHERE  EXISTS ( SELECT *
                FROM Reserves R
                WHERE R.bid = 103 AND R.sid = S.sid )

The subquery must be re-evaluated for each row in Sailors. Similar predicate: NOT EXISTS, UNIQUE

The other good use of * in SELECT is COUNT(*)

set comparison: op ANY, op ALL

SELECT S.sid
FROM   Sailors S
WHERE  S.rating >= ALL ( SELECT S2.rating FROM Sailors S2 )

equivalent: IN and = ANY, NOT IN <> ALL

INTERSECT --> rewritten using IN, EXCEPT --> NOT IN

division:

SELECT S.name
FROM   Sailors S
WHERE NOT EXISTS (( SELECT B.bid
                    FROM Boats B)
                    EXCEPT
                    (SELECT R.bid
                     FROM Reserves R
                     WHERE R.sid = S.sid ))

SELECT S.name
FROM Sailors S
WHERE NOT EXISTS ( SELECT B.bid
                   FROM Boats B
                   WHERE NOT EXISTS ( SELECT R.bid
                                      FROM Reserves R
                                      WHERE R.bid = B.bid AND R.sid = S.sid ))

aggregate operators: COUNT [DISTINCT], SUM [DISTINCT], AVG [DISTINCT], MAX [A], MIN[A]

If the SELECT clause uses an aggregate operation, then it must use only aggregate operations unless the query contains a GROUP BY clause

SELECT   [DISTINCT] select-list
FROM     from-list
WHERE    qualification
GROUP BY grouping-list
HAVING   grouping-qualification

SELECT   S.rating, MIN (S.age) AS minage
FROM     Sailors S
WHERE    S.age >= 18
GROUP BY S.rating
HAVING   COUNT(*) > 1 AND EVERY (S.age <= 60)

Section 5.5.2: examples of aggregate queries

comparisons using null values: IS [NOT] NULL, arithmetic operations + - * / is null (unknown) if one of the argument is null

COUNT(*) handles null just like others (they get counted), and all other aggregate operations simply discard null values

outer joins: rows without a match appear exactly once in the result, with the unmatched result columns assigned null values

SELECT S.sid, R.bid
FROM Sailors S NATURAL LEFT OUTER JOIN Reserves R

disallowing null values: NOT NULL (implicit for primary key)


table constraints: CHECK conditional-expression

CREATE TABLE Reserves ( ...
                        CHECK ( 'Interlake' <>
                                  ( SELECT B.bname
                                    FROM Boats B
                                    WHERE B.bid = Reserves.bid )))

CREATE DOMAIN ratingval INTEGER DEFAULT 1 CHECK ( VALUE >= 1 AND VALUE <= 10 ) --> can be used in schema declaration: rating ratingval

CREATE TYPE ratingtype AS INTEGER --> a new distinct type, cannot be compared with values of other types, e.g. integer

assertions (involve two more more tables):

CREATE ASSERTION smallClub
CHECK (( SELECT COUNT (S.sid) FROM Sailors S )
        + ( SELECT COUNT (B.bid) FROM Boats B )
        < 100 )


trigger: a procedure that is automatically invoked by DBMS in response to specified changes to the database -- event, condition, action

active database: a database that has a set of associated triggers

init_count: statement-level trigger

incr_count: row-level trigger


set-oriented trigger:


In an active database, when DBMS is about to execute a statement that modifies the database, it checks whether some trigger is activated by the statement. If so, the DBMS processes the trigger by evaluating its condition part, and then (if the condition evaluates to true) executing its action part.

Uses of triggers: to maintain database consistency; alert users to unusual events; generate a log of events to support auditing and security checks; gather statistics on table accesses and modifications








  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值