4. 数据库管理系统

最新推荐文章于 2024-08-07 12:43:48 发布

林林丶

最新推荐文章于 2024-08-07 12:43:48 发布

阅读量1k

点赞数

本文链接：https://blog.csdn.net/qq_42364827/article/details/85683722

版权

文章目录

Content
4.1 The Components of DBMS Core
4.2 The Process Structure of DBMS
4.3 Database Access Management
4.5 Recovery
4.6 Concurrency Control 并发控制

Content

The Architecture of DBMS(数据库管理系统的结构)
1. The components of DBMS core(DBMS的内核组成结构)
2. The process structure of DBMS(DBMS运行状态下的进程结构)
Database Access Management(数据库访问管理，物理层）
Query Optimization
Transaction Management
1. Recovery
2. Concurrent Control

4.1 The Components of DBMS Core

在这里插入图片描述

4.2 The Process Structure of DBMS

Single process structure
Multi processes structure
Multi threads structure
Communication protocols between processes threads

Single process structure 单进程

在这里插入图片描述

Multi processes structure 多进程

在这里插入图片描述

多线程

在这里插入图片描述

4.3 Database Access Management

Access types
File organization
Index technique
Access primitives

Index Technique

B+ Tree
Clustering index(簇集索引)
Inverted file(栅格文件)
Dynamic hashing(动态哈希)
Grid structure file and partitioned hash function
Bitmap index (used in data warehouse)
Others

Query Optimization

The operation optimization of the tree

Decide the order of two joins
For every join operation, there are many computing method:
Nest Loop, R as O, S as I
Nest Loop, S as O, R as I
Nest Loop, use possible index on R as S
Merge Scan
Hash Join
This is so called algebra optimization. It takes a series of transform on original query expression, and transform it into an equivalent, most effective, form to be executed.
For example:

Query tree

The equivalent transform rules of relational algebra

Exchange rule of ∞/×: E1×E2≡E2×E1
连接和笛卡尔乘积的交换律
Combination rule of ∞/×: E1×(E2×E3)≡(E1×E2)×E3
结合律
Cluster rule of ∏: ∏A1···A(∏B1···Bn(E))≡∏A1···An(E), legal when A1···An is sub set of {B1···Bm}
投影的串接律
Cluster rule of σ: σF1(σF2(E))≡σF1ΛF2(E)
选择的串接律
Exchange rule of σ and ∏: σF(∏A1···An(E))≡∏A1···An(σF(E)), if F includes attributes B1···Bm which don’t belong to A1···An, then ∏A1···An(σF(E))≡∏A1···AnσF(∏A1···An, B1···Bn(E))
选择与投影的交换律
If the attributes in F are all the attributes in E1, then σF(E1×E2)≡σF(E1)×E2
if F in the form of F1ΛF2, and there are only E1’s attributes in F1, and there are only E2’s attributes in F2, then σF(E1×E2)≡σF(E1)×σF(E2)
if F in the form of F1ΛF2, and there are only E1’s attributes in F1, while F2 includes the attributes both in E1 and E2, then σF(E1×E2)≡σF2(σF1((E1)×E2)
σF(E1∪E2)≡σF(E1)∪σF(E2)
σF(E1-E2)≡σF(E1)-σF(E2)
Suppose A1···An is a set of attributes, in which B1···Bn are E1’s attributes, and C1···Ck are E2’s attributes, then ∏A1···An(E1×E2)≡∏B1···Bn(E1)×∏C1···Cn(E2))

Basic principles

The target of algebra optimization is to make the scale of the operands which involved in binary operations be as small as possible:
代数优化的目标是对用户提交的初始查询做改写，把它变成一个最优的形式，原则二元操作、连接操作的规模尽可能地小：

Push down the unary operations as low as possible
Look for and combine the common sub-expression

The Operation Optimization 操作优化

How to find a ‘good’ access strategy to compute the query improved by algebra optimization is introduced in this section:

Optimization of select operation 选择
Optimization of project operation 投影
Optimization of set operation 集合
Optimization of join operation 连接
Optimization of combined operation 组合

Optimization of join operation

Merge scan 归并扫描: order the relation R and S on disk in ahead, then we can compare their tuples in order, and both relation only need to scan one time. If R and S have not ordered in ahead, must consider the ordering cost to see if it is worth to use this method.
Using index or hash to look for mapping tuples:
hash join 哈希联结

4.5 Recovery

4.5.1 Introduction

The main roles of recovery mechanism in DBMS are:

Reducing the likeihood of failures 减少系统发生故障的可能(prevention 防)
Recover from failures (solving)
Restore DB to a consistent state after some failures
Redundancy is necessary 冗余是必须的
Should inspect all possible failures 监测所有可能的故障

Periodical dumping

Variation: Backup+Incremental dumping 增量存储
I.D.: updated parts of DB

Backup and Log

Log: record of all changes on DB since the last backup copy was made.
Change:

Old value	New value
before image B.I	after image A.I

While recovering:

Some transaction maybe half done, should undo them with B.I recorded in Log
Some transactions have finished but the results have not been written into DB in time, should redo them with A.I recorded in Log (finish writing into DB).
It is possible to recover DB to the most recent consistent state with Log.

4.5.3 Transaction

A transaction T is a finite sequence of actions on DB exhibiting the following effects:
一个事务T就是一组对数据库操作的集合，具有以下性质：ACID准则

Atomic action 原子性: Nothing or All
Consistency preservation 保持一致性: consistency state of DB→another consistency state of DB
Isolation 隔离性: concurrent transactions should run as if they are independent each other.
Unix隔离性更好，Windows隔离性差，会导致系统崩溃。
Durability 持久性: The effect of a successfullly completed transaction are permanently reflected in DB and recoverable even failure occurs later.
一个事务成功完成的影响是应该永久反应在数据库中，将来发生故障也是可恢复的

4.5.3 Some Structures to Support Recovery

Recovery information (such as Log) should be stored in nonvolatile storage 非挥发存储器. The following information need to be stored in order to support recovery:

Commit list 提交事务列表: list of TID which have been committed.
Active list: list of TID which is in progress
Log:

4.5.4 Commit Rule ang Log Ahead Rule

Commit Rule 提交规则
A.I must be writen to nonvolatile storage before commit of the transaction
事务提交之前，新值A.I必须写到非挥发存储器
Log Ahead Rule 先寄后写规则
If A.I is written to DB before commit then B.I must first written to Log.
Recovery strategies

undo(undo(undo—undo(x)—)) = undo(x) 老值还原
redo(redo(redo—redo(x)—)) = redo(x) 新值重做

Three kinds of update strategy

1. A.I→DB before commit 直接改数据库

TID→active list
B.I→Log
A.I→DB//Log Ahead Rule
···
TID→commit list
delete TID from active list

The recovery after failure this situation

Check two lists for every TID while restarting after failure 重启动恢复

Commit list	Active list
	√	Undo, delete TID from active list
√	√	delete TID from active list
√		nothing to do

2. A.I→DB after commit 直接改数据库

TID→active list
A.I→Log//Commit Rule
···
TID→commit list
A.I→DB
delete TID from active list

The recovery after failure this situation

Check two lists for every TID while restarting after failure 重启动恢复

Commit list	Active list
	√	delete TID from active list
√	√	redo, delete TID from active list
√		nothing to do

3. A.I →DB concurrently with commit

TID→active list
A.I, B.I→Log //two Rule
A.I → DB //partially done 后台线程执行
···
TID→commit list
A.I→DB //completed
delete TID from active list

The recovery after failure this situation

Check two lists for every TID while restarting after failure 重启动恢复

Commit list	Active list
	√	Undo, delete TID from active list
√	√	redo, delete TID from active list
√		nothing to do

Conclusion

	undo	redo
a	×	√
b	√	√
c	√	√
d	×	√

4.6 Concurrency Control 并发控制

4.6.1 Introduction
In multi users DBMS, permit multi transaction access the database concurrently.
在一个多用户关系系统中，允许多个事务同时运行

Why concurrency?

Improving system utilization response time.
Different transaction may access to different parts of database.

Problems arise from concurrent executions

更新丢失
脏读
不可重复读
So there maybe three kinds of conflict when transactions execute concurrently. They are write – write, write – read, and read – write conflicts. Write – write conflict must be avoided anytime. Write – read and read – write conflicts should be avoided generally, but they are endurable in some applications.

4.6.2 Serialization — the criterion for concurrency consistency

Definition: suppose {T1,T2,…Tn} is a set of transactions executing concurrently. If a schedule of {T1,T2,…Tn} produces the same effect on database as some serial execution of this set of transactions, then the schedule is serializable.
Problem: different schedule → different equivalent Principles of Database Systems, Xu Lizhen 43 serial execution → different result? (yes, n!)
在这里插入图片描述
The result of this schedule is the same as serial execution TA →TB →TC, so
it is serializable. The equivalent serial executiont is TA→TB→TC .

4.6.3 Locking Protocol

Locking method is the most basic concurrency control method. There maybe many kinds of locking protocols.

X locks

Only one type of lock, for both read and write.
Compatibility matrix :
NL－no lock
X－X lock
Y －compatible
N－incompatible

	NL	X
NL	Y	Y
X	Y	N

在这里插入图片描述

Two Phase Locking

Definiton1: In a transaction, if all locks precede all unlocks, then the transaction is called two phase transaction. This restriction is called two phase locking protocol.
在一个事务中，所有的加锁请求都在所有的锁释放之前，这个事务就是一个两阶段事务。这种限制就叫两段加锁协议。
Definition2: In a transaction, if it first acquires a lock on the object before operating on it, it is called well-formed
合式的
Theorem: If S is any schedule of well-formed and two phase transactions,
then S is serializable.

Conclusions

Well-formed + 2PL : serializable
Well-formed + 2PL + unlock update at EOT: serializable and recoverable. (without domino phenomena)
Well-formed + 2PL + holding all locks to EOT: strict two phase locking transaction.

(S, X) locks

S lock - if read access is intended
X lock - if update access is intended

	NL	S	X
NL	Y	Y	Y
S	Y	Y	N
X	Y	N	N

(S, U, X) locks

U lock — update lock.
For an update access the transaction first acquires a U-lock and then promote it to X-lock.
Purpose: shorten the time of exclusion, so as to boost concurrency degree, and reduce deadlock.

	NL	S	U	X
NL	Y	Y	Y	Y
S	Y	Y	Y	N
U	Y	Y	N	N
X	Y	N	N	N

三种锁协议

4.6.5 Deadlock & Live Lock

Dead lock: wait in cycle, no transaction can obtain all of resources needed to complete.
Live lock: although other transactions release their resource in limited time, some transaction can not get the resources needed for a very long time.

Live lock is simpler, only need to adjust schedule strategy, such
as FIFO 先后顺序排队
Deadlock: (1) Prevention(don’t let it occur); (2) Solving(permit it
occurs, but can solve it)

Deadlock Detection 死锁检测

Timeout: If a transaction waits for some specified time then deadlock is assumed and the transaction should be aborted.
当事务的等待时间超过规定时间，就被认为发生死锁，事务就被删除。
Detect deadlock by wait-for graph G=<V,E> 等待图
V : set of transactions {Ti|Ti is a transaction in DBS (i=1,2,…n)} 顶点集合
E : {<Ti,Tj>|Ti waits for Tj (i ≠ j)} 边集合
If there is cycle in the graph, the deadlock occurs. 出现环路，出现死锁

When to detect? 检查时机

whenever one transaction waits. 加新的边时
periodically 周期性

What to do when detected?

Pick a victim (youngest, minimum abort cost, …) 在循环等待链上选一个牺牲者
Abort the victim and release its locks and resources 舍弃牺牲者，释放锁和资源
Grant a waiter 等待者执行
Restart the victim ( automatically or manually) 重启牺牲者

Deadlock avoidance

Requesting all locks at initial time of transaction. 在事务的初始时间请求所有锁
Requesting locks in a specified order of resource. 排序
Abort once conflicted. 理论上可行，实用度低
Transaction Retry 事务重置
Every transaction is uniquely time stamped. If TA requires a lock on a data object that is already locked by TB, one of the following methods is used:
给每个事务安排一个时间戳，当事务TA对某个数据对象申请一个锁，但已经被TB占有，方法如下：

Wait-die: TA waits if it is older than TB, otherwise it “dies”, i.e. it is aborted and automatically retried with original timestamp.
等待死亡法：TA比TB年龄大，TA等待，否则死亡，即把自己滚回，然后自动地按原时间戳重新运行
Wound-wait: TA waits if it is younger than TB, otherwise wound” TB, i.e. TB is aborted and automatically retried with original timestamp.
击伤等待法：TA比TB年轻，TA等待，否则他把TB击伤，即TB退出，休眠一会儿然后用原时间戳运行
In above, both have only one direction wait, either older → younger or younger → older. It is impossible to occur wait in cycle, so the dead lock is avoided.
以上方法都是单方向的等待，