4. 数据库管理系统

Content

  • The Architecture of DBMS(数据库管理系统的结构)
    1. The components of DBMS core(DBMS的内核组成结构)
    2. The process structure of DBMS(DBMS运行状态下的进程结构)
  • Database Access Management(数据库访问管理,物理层)
  • Query Optimization
  • Transaction Management
    1. Recovery
    2. Concurrent Control

4.1 The Components of DBMS Core

在这里插入图片描述

4.2 The Process Structure of DBMS

  • Single process structure
  • Multi processes structure
  • Multi threads structure
  • Communication protocols between processes threads

Single process structure 单进程

在这里插入图片描述

Multi processes structure 多进程

在这里插入图片描述

多线程

在这里插入图片描述

4.3 Database Access Management

  • Access types
  • File organization
  • Index technique
  • Access primitives

Index Technique

  • B+ Tree
  • Clustering index(簇集索引)
  • Inverted file(栅格文件)
  • Dynamic hashing(动态哈希)
  • Grid structure file and partitioned hash function
  • Bitmap index (used in data warehouse)
  • Others

Query Optimization

The operation optimization of the tree

  • Decide the order of two joins
  • For every join operation, there are many computing method:
  • Nest Loop, R as O, S as I
  • Nest Loop, S as O, R as I
  • Nest Loop, use possible index on R as S
  • Merge Scan
  • Hash Join
    This is so called algebra optimization. It takes a series of transform on original query expression, and transform it into an equivalent, most effective, form to be executed.
    For example:
Query tree
The equivalent transform rules of relational algebra
  • Exchange rule of ∞/×: E1×E2≡E2×E1
    连接和笛卡尔乘积的交换律
  • Combination rule of ∞/×: E1×(E2×E3)≡(E1×E2)×E3
    结合律
  • Cluster rule of ∏: ∏A1···A(∏B1···Bn(E))≡∏A1···An(E), legal when A1···An is sub set of {B1···Bm}
    投影的串接律
  • Cluster rule of σ: σF1(σF2(E))≡σF1ΛF2(E)
    选择的串接律
  • Exchange rule of σ and ∏: σF(∏A1···An(E))≡∏A1···An(σF(E)), if F includes attributes B1···Bm which don’t belong to A1···An, then ∏A1···An(σF(E))≡∏A1···AnσF(∏A1···An, B1···Bn(E))
    选择与投影的交换律
  • If the attributes in F are all the attributes in E1, then σF(E1×E2)≡σF(E1)×E2
  • if F in the form of F1ΛF2, and there are only E1’s attributes in F1, and there are only E2’s attributes in F2, then σF(E1×E2)≡σF(E1)×σF(E2)
  • if F in the form of F1ΛF2, and there are only E1’s attributes in F1, while F2 includes the attributes both in E1 and E2, then σF(E1×E2)≡σF2(σF1((E1)×E2)
  • σF(E1∪E2)≡σF(E1)∪σF(E2)
  • σF(E1-E2)≡σF(E1)-σF(E2)
  • Suppose A1···An is a set of attributes, in which B1···Bn are E1’s attributes, and C1···Ck are E2’s attributes, then ∏A1···An(E1×E2)≡∏B1···Bn(E1)×∏C1···Cn(E2))
Basic principles

The target of algebra optimization is to make the scale of the operands which involved in binary operations be as small as possible:
代数优化的目标是对用户提交的初始查询做改写,把它变成一个最优的形式,原则二元操作、连接操作的规模尽可能地小:

  • Push down the unary operations as low as possible
  • Look for and combine the common sub-expression

The Operation Optimization 操作优化

How to find a ‘good’ access strategy to compute the query improved by algebra optimization is introduced in this section:

  • Optimization of select operation 选择
  • Optimization of project operation 投影
  • Optimization of set operation 集合
  • Optimization of join operation 连接
  • Optimization of combined operation 组合

Optimization of join operation

  • Merge scan 归并扫描: order the relation R and S on disk in ahead, then we can compare their tuples in order, and both relation only need to scan one time. If R and S have not ordered in ahead, must consider the ordering cost to see if it is worth to use this method.
  • Using index or hash to look for mapping tuples:
  • hash join 哈希联结

4.5 Recovery

4.5.1 Introduction

The main roles of recovery mechanism in DBMS are:

  • Reducing the likeihood of failures 减少系统发生故障的可能(prevention 防)
  • Recover from failures (solving)
    Restore DB to a consistent state after some failures
  • Redundancy is necessary 冗余是必须的
  • Should inspect all possible failures 监测所有可能的故障
Periodical dumping
  • Variation: Backup+Incremental dumping 增量存储
    I.D.: updated parts of DB
Backup and Log

Log: record of all changes on DB since the last backup copy was made.
Change:

Old valueNew value
before image B.Iafter image A.I

While recovering:

  • Some transaction maybe half done, should undo them with B.I recorded in Log
  • Some transactions have finished but the results have not been written into DB in time, should redo them with A.I recorded in Log (finish writing into DB).
    It is possible to recover DB to the most recent consistent state with Log.

4.5.3 Transaction

A transaction T is a finite sequence of actions on DB exhibiting the following effects:
一个事务T就是一组对数据库操作的集合,具有以下性质:ACID准则

  • Atomic action 原子性: Nothing or All
  • Consistency preservation 保持一致性: consistency state of DB→another consistency state of DB
  • Isolation 隔离性: concurrent transactions should run as if they are independent each other.
    Unix隔离性更好,Windows隔离性差,会导致系统崩溃。
  • Durability 持久性: The effect of a successfullly completed transaction are permanently reflected in DB and recoverable even failure occurs later.
    一个事务成功完成的影响是应该永久反应在数据库中,将来发生故障也是可恢复的

4.5.3 Some Structures to Support Recovery

Recovery information (such as Log) should be stored in nonvolatile storage 非挥发存储器. The following information need to be stored in order to support recovery:

  • Commit list 提交事务列表: list of TID which have been committed.
  • Active list: list of TID which is in progress
  • Log:

4.5.4 Commit Rule ang Log Ahead Rule

  • Commit Rule 提交规则
    A.I must be writen to nonvolatile storage before commit of the transaction
    事务提交之前,新值A.I必须写到非挥发存储器
  • Log Ahead Rule 先寄后写规则
    If A.I is written to DB before commit then B.I must first written to Log.
  • Recovery strategies
  1. undo(undo(undo—undo(x)—)) = undo(x) 老值还原
  2. redo(redo(redo—redo(x)—)) = redo(x) 新值重做
Three kinds of update strategy
1. A.I→DB before commit 直接改数据库

TID→active list
B.I→Log
A.I→DB//Log Ahead Rule
···
TID→commit list
delete TID from active list

The recovery after failure this situation

Check two lists for every TID while restarting after failure 重启动恢复

Commit listActive list
Undo, delete TID from active list
delete TID from active list
nothing to do
2. A.I→DB after commit 直接改数据库

TID→active list
A.I→Log//Commit Rule
···
TID→commit list
A.I→DB
delete TID from active list

The recovery after failure this situation

Check two lists for every TID while restarting after failure 重启动恢复

Commit listActive list
delete TID from active list
redo, delete TID from active list
nothing to do
3. A.I →DB concurrently with commit

TID→active list
A.I, B.I→Log //two Rule
A.I → DB //partially done 后台线程执行
···
TID→commit list
A.I→DB //completed
delete TID from active list

The recovery after failure this situation

Check two lists for every TID while restarting after failure 重启动恢复

Commit listActive list
Undo, delete TID from active list
redo, delete TID from active list
nothing to do
Conclusion
undoredo
a×
b
c
d×

4.6 Concurrency Control 并发控制

4.6.1 Introduction
In multi users DBMS, permit multi transaction access the database concurrently.
在一个多用户关系系统中, 允许多个事务同时运行

  • Why concurrency?
  1. Improving system utilization response time.
  2. Different transaction may access to different parts of database.
  • Problems arise from concurrent executions
    在这里插入图片描述
  1. 更新丢失
  2. 脏读
  3. 不可重复读
    So there maybe three kinds of conflict when transactions execute concurrently. They are write – write, write – read, and read – write conflicts. Write – write conflict must be avoided anytime. Write – read and read – write conflicts should be avoided generally, but they are endurable in some applications.

4.6.2 Serialization — the criterion for concurrency consistency

Definition: suppose {T1,T2,…Tn} is a set of transactions executing concurrently. If a schedule of {T1,T2,…Tn} produces the same effect on database as some serial execution of this set of transactions, then the schedule is serializable.
Problem: different schedule → different equivalent Principles of Database Systems, Xu Lizhen 43 serial execution → different result? (yes, n!)
在这里插入图片描述
The result of this schedule is the same as serial execution TA →TB →TC, so
it is serializable. The equivalent serial executiont is TA→TB→TC .

4.6.3 Locking Protocol

Locking method is the most basic concurrency control method. There maybe many kinds of locking protocols.

X locks
  • Only one type of lock, for both read and write.
  • Compatibility matrix :
    NL-no lock
    X-X lock
    Y -compatible
    N-incompatible
NLX
NLYY
XYN

在这里插入图片描述

Two Phase Locking
  • Definiton1: In a transaction, if all locks precede all unlocks, then the transaction is called two phase transaction. This restriction is called two phase locking protocol.
    在一个事务中,所有的加锁请求都在所有的锁释放之前,这个事务就是一个两阶段事务。这种限制就叫两段加锁协议。
  • Definition2: In a transaction, if it first acquires a lock on the object before operating on it, it is called well-formed
    合式的
  • Theorem: If S is any schedule of well-formed and two phase transactions,
    then S is serializable.
Conclusions
  1. Well-formed + 2PL : serializable
  2. Well-formed + 2PL + unlock update at EOT: serializable and recoverable. (without domino phenomena)
  3. Well-formed + 2PL + holding all locks to EOT: strict two phase locking transaction.
(S, X) locks

S lock - if read access is intended
X lock - if update access is intended

NLSX
NLYYY
SYYN
XYNN
(S, U, X) locks
  • U lock — update lock.
    For an update access the transaction first acquires a U-lock and then promote it to X-lock.
  • Purpose: shorten the time of exclusion, so as to boost concurrency degree, and reduce deadlock.
NLSUX
NLYYYY
SYYYN
UYYNN
XYNNN

三种锁协议

4.6.5 Deadlock & Live Lock

Dead lock: wait in cycle, no transaction can obtain all of resources needed to complete.
Live lock: although other transactions release their resource in limited time, some transaction can not get the resources needed for a very long time.

  • Live lock is simpler, only need to adjust schedule strategy, such
    as FIFO 先后顺序排队
  • Deadlock: (1) Prevention(don’t let it occur); (2) Solving(permit it
    occurs, but can solve it)
Deadlock Detection 死锁检测

Timeout: If a transaction waits for some specified time then deadlock is assumed and the transaction should be aborted.
当事务的等待时间超过规定时间,就被认为发生死锁,事务就被删除。
Detect deadlock by wait-for graph G=<V,E> 等待图
V : set of transactions {Ti|Ti is a transaction in DBS (i=1,2,…n)} 顶点集合
E : {<Ti,Tj>|Ti waits for Tj (i ≠ j)} 边集合
If there is cycle in the graph, the deadlock occurs. 出现环路,出现死锁

  • When to detect? 检查时机
  1. whenever one transaction waits. 加新的边时
  2. periodically 周期性
  • What to do when detected?
  1. Pick a victim (youngest, minimum abort cost, …) 在循环等待链上选一个牺牲者
  2. Abort the victim and release its locks and resources 舍弃牺牲者,释放锁和资源
  3. Grant a waiter 等待者执行
  4. Restart the victim ( automatically or manually) 重启牺牲者
Deadlock avoidance
  1. Requesting all locks at initial time of transaction. 在事务的初始时间请求所有锁
  2. Requesting locks in a specified order of resource. 排序
  3. Abort once conflicted. 理论上可行,实用度低
  4. Transaction Retry 事务重置
    Every transaction is uniquely time stamped. If TA requires a lock on a data object that is already locked by TB, one of the following methods is used:
    给每个事务安排一个时间戳,当事务TA对某个数据对象申请一个锁,但已经被TB占有,方法如下:
  • Wait-die: TA waits if it is older than TB, otherwise it “dies”, i.e. it is aborted and automatically retried with original timestamp.
  • 等待死亡法:TA比TB年龄大,TA等待,否则死亡,即把自己滚回,然后自动地按原时间戳重新运行
  • Wound-wait: TA waits if it is younger than TB, otherwise wound” TB, i.e. TB is aborted and automatically retried with original timestamp.
    击伤等待法:TA比TB年轻,TA等待,否则他把TB击伤,即TB退出,休眠一会儿然后用原时间戳运行
    In above, both have only one direction wait, either older → younger or younger → older. It is impossible to occur wait in cycle, so the dead lock is avoided.
    以上方法都是单方向的等待,
评论 3
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值