7.1 Introduction
What is DDB?
A distributed database (DDB) is a collection of correlated data that is spread across the nodes of a network and managed by software called a distributed database management system (DDBMS).
Two kinds:
- Distributed physically, centralized logically (general DDB)
- Distributed physically, distributed logically too (FDBMS)
This course takes the first kind as its main topic.
Features of DDBS
- Distribution
- Correlation
- DDBMS
The advantages of DDBS
- Local autonomy
- Good availability (because multiple copies are supported)
- Good flexibility
- Low system cost
- Parallel processing
The disadvantages of DDBS
- Hard to integrate existing databases
- Too complex
The main problems in DDBS
- Query Optimization (different optimization goal)
- Concurrency control (should consider whole network)
- Recovery mechanism (all sub-transactions must commit or abort simultaneously)
Another problem specific to DDBS
- Data distribution
Data Distribution
- Centralized
- Partitioned
- Replicated (full replication)
- Hybrid (a mix of the above)
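The four strategies can be pictured as different mappings from fragments to the sites that store them. The following is a minimal sketch only: the `allocate` helper, the round-robin placement, the `replicated_fragments` parameter and the fragment and site names are all assumptions made for illustration, not part of the course material.

```python
def allocate(strategy, fragments, sites, replicated_fragments=()):
    """Return a dict mapping each fragment to the list of sites that store it.

    Hypothetical helper: real systems choose placements from workload statistics.
    """
    if strategy == "centralized":
        # Everything lives on a single site (here: the first one).
        return {f: [sites[0]] for f in fragments}
    if strategy == "partitioned":
        # Each fragment is stored exactly once; placement is round-robin.
        return {f: [sites[i % len(sites)]] for i, f in enumerate(fragments)}
    if strategy == "replicated":
        # Every site holds a copy of every fragment.
        return {f: list(sites) for f in fragments}
    if strategy == "hybrid":
        # Selected fragments are fully replicated, the rest are partitioned.
        partitioned = allocate("partitioned", fragments, sites)
        return {
            f: list(sites) if f in replicated_fragments else partitioned[f]
            for f in fragments
        }
    raise ValueError(f"unknown strategy: {strategy}")


fragments = ["F1", "F2", "F3"]
sites = ["site1", "site2"]
for name in ("centralized", "partitioned", "replicated"):
    print(name, allocate(name, fragments, sites))
print("hybrid", allocate("hybrid", fragments, sites, replicated_fragments={"F2"}))
```

In this sketch the hybrid case replicates only `F2`, which is the kind of choice one might make for a fragment that is read often at every site.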
Comparison of four strategies
Unit of Data Distribution
- By relation (or file), which means no partitioning
- By fragment
  - Horizontal fragmentation: partition by tuples, for example one fragment per department
  - Vertical fragmentation: partition by attributes
  - Mixed fragmentation: both
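Horizontal and vertical fragmentation can be illustrated on a tiny in-memory relation. This is a sketch under stated assumptions: the `STUDENT` relation, its attributes and the two helper functions are hypothetical and exist only to show the idea.

```python
# A global relation as a list of tuples (dicts). Names and data are made up.
STUDENT = [
    {"id": 1, "name": "Ann", "dept": "CS", "gpa": 3.6},
    {"id": 2, "name": "Bob", "dept": "EE", "gpa": 3.1},
    {"id": 3, "name": "Cid", "dept": "CS", "gpa": 2.9},
]


def horizontal_fragments(relation, attribute):
    """Split the tuples into one fragment per value of `attribute`."""
    fragments = {}
    for row in relation:
        fragments.setdefault(row[attribute], []).append(row)
    return fragments


def vertical_fragment(relation, key, attributes):
    """Project onto `attributes`, always keeping the key so fragments can be re-joined."""
    columns = [key] + [a for a in attributes if a != key]
    return [{c: row[c] for c in columns} for row in relation]


by_dept = horizontal_fragments(STUDENT, "dept")        # one fragment per department
personal = vertical_fragment(STUDENT, "id", ["name", "dept"])
grades = vertical_fragment(STUDENT, "id", ["gpa"])
print(by_dept["CS"])
print(personal)
print(grades)
```

Keeping the key in every vertical fragment is what makes it possible to join the fragments back into the global relation; a mixed fragmentation would apply `vertical_fragment` to each horizontal fragment (or the other way round).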
The criteria of fragmentation
- Completeness: every tuple or attribute must be mapped to some fragment
- Reconstruction: it must be possible to reconstruct the original global relation from the fragments
- Disjointness: for horizontal fragmentation, the fragments must not overlap
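The three criteria can be checked mechanically for a proposed fragmentation. The sketch below is illustrative only: `check_horizontal` and `join_on_key` are hypothetical helpers, relations are modelled as lists of dicts, and the relation is assumed to contain no duplicate tuples.

```python
def check_horizontal(relation, fragments):
    """Check the three criteria for a list of horizontal fragments of `relation`."""

    def as_key(row):
        # Dicts are not hashable, so compare rows as sorted (attribute, value) pairs.
        return tuple(sorted(row.items()))

    original = {as_key(r) for r in relation}
    pieces = [as_key(r) for fragment in fragments for r in fragment]
    return {
        # Every tuple of the relation appears in at least one fragment.
        "completeness": original <= set(pieces),
        # The union of the fragments gives back exactly the original relation.
        "reconstruction": set(pieces) == original,
        # No tuple is stored in more than one fragment.
        "disjointness": len(pieces) == len(set(pieces)),
    }


def join_on_key(left, right, key):
    """Rebuild rows from two vertical fragments by joining on the shared key."""
    right_by_key = {row[key]: row for row in right}
    return [{**row, **right_by_key[row[key]]} for row in left if row[key] in right_by_key]


R = [{"id": 1, "dept": "CS"}, {"id": 2, "dept": "EE"}]
print(check_horizontal(R, [[R[0]], [R[1]]]))        # a valid fragmentation
print(check_horizontal(R, [[R[0], R[1]], [R[1]]]))  # overlapping fragments
print(check_horizontal(R, [[R[0]]]))                # a tuple is missing
print(join_on_key([{"id": 1, "name": "Ann"}], [{"id": 1, "gpa": 3.6}], "id"))
```

For vertical fragments, reconstruction is a join on the key rather than a union, which is why disjointness is stated for horizontal fragmentation only: the key attribute has to be repeated in every vertical fragment.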
Problems caused by Data Distribution
- Consistency of multiple copies
- Distribution consistency
- Redistribution
- Piggybacking
- Translation of global queries to fragment queries and selection of physical copies
- Design of database fragments and allocation of fragments
Federated Database
- In practical applications, there is a strong demand for integrating multiple existing, distributed and heterogeneous databases.
- A database system in which every member is autonomous and the members collaborate with each other on the basis of negotiation is called a federated database system.
- There is no global schema in a federated database system; every federated member keeps its own data schema.
- The members negotiate with each other to decide their respective input/output schemas; the data-sharing relations between them are then established.
The schema structure in a federated database system
7.4 Query Optimization in DDBMS
- Optimization goal: minimize the transmission cost on the network
- Algebra optimization
- Translation of global queries to fragment queries and selection of physical copies
- Query Decomposition
- Global query plan
An example of global query optimization
Global query optimization may produce an execution plan based on cost estimation, such as:
- send R2 to site 1, where the copy is called R'
- execute on site 1:
SELECT *
FROM R1, R'
WHERE R1.a = R'.b;
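The choice of which relation to ship can be sketched with a toy cost model. Everything here is an assumption made for illustration: the linear cost formula, its constants, the function names and the relation sizes are hypothetical, and R1 is taken to be stored at site 1 and R2 at site 2, as in the plan above.

```python
def transfer_cost(num_bytes, startup=1.0, per_byte=0.001):
    """Hypothetical linear cost model: fixed startup cost plus a per-byte cost."""
    return startup + per_byte * num_bytes


def choose_join_site(size_r1, size_r2, result_size, query_site=1):
    """Compare shipping R2 to site 1 with shipping R1 to site 2.

    The result must end up at `query_site`, so a join executed at the other
    site also pays for shipping the result back.
    """
    plans = {
        "send R2 to site 1": transfer_cost(size_r2)
        + (0 if query_site == 1 else transfer_cost(result_size)),
        "send R1 to site 2": transfer_cost(size_r1)
        + (0 if query_site == 2 else transfer_cost(result_size)),
    }
    best = min(plans, key=plans.get)
    return best, plans


# R1 is large and R2 is small, so moving R2 is the cheaper plan.
best, plans = choose_join_site(size_r1=1_000_000, size_r2=10_000, result_size=5_000)
print(best)
print(plans)
```

A real optimizer would estimate `result_size` from statistics and would also consider strategies such as semi-joins; the point of the sketch is only that the plan is picked by comparing estimated transmission costs.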
7.5 Recovery Mechanism in DDBMS
- The basic principle is the same as that in centralized DBMS
- Distributed transactions: the key to distributed transaction management is ensuring that all sub-transactions either commit together or abort together.
- Coordinating the sub-transactions relies on communication, and communication is not reliable.
- Two-phase commit protocol
- Combination of failures
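The core of the two-phase commit protocol can be sketched in a few lines. This is a simplified in-process simulation, not a full implementation: the `Participant` class and `two_phase_commit` function are hypothetical, and logging, timeouts, message loss and coordinator failure (the "combination of failures" above) are deliberately left out.

```python
class Participant:
    """A site running one sub-transaction. `can_commit` simulates its local outcome."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "active"

    def prepare(self):
        # Phase 1: vote. A real participant would force a log record before voting yes.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def two_phase_commit(participants):
    """Coordinator logic: commit only if every participant votes yes."""
    # Phase 1: ask every participant to prepare and collect the votes.
    votes = [p.prepare() for p in participants]
    decision = "commit" if all(votes) else "abort"
    # Phase 2: broadcast the global decision to every participant.
    for p in participants:
        if decision == "commit":
            p.commit()
        else:
            p.abort()
    return decision


sites = [Participant("site1"), Participant("site2", can_commit=False)]
print(two_phase_commit(sites))                # one "no" vote aborts everywhere
print([(p.name, p.state) for p in sites])
```

Because no participant commits before the coordinator has seen every vote, the sub-transactions end in the same state, which is the all-or-nothing property described above.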
7.6 Concurrency Control in DDBMS
- The basic principle is the same as in a centralized DBMS: concurrent transactions must be scheduled serializably
- Because of multiple copies, locking must be done globally
- Communication overhead
- Global deadlock
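A global deadlock can exist even when no single site sees a cycle in its own wait-for graph. One common way to detect it, sketched here under assumptions, is to merge the local wait-for graphs and search the result for a cycle; the function names, the graph representation and the transaction names are hypothetical.

```python
def merge_wait_for(local_graphs):
    """Union the wait-for edges reported by every site into one global graph."""
    merged = {}
    for graph in local_graphs:
        for txn, waits_for in graph.items():
            merged.setdefault(txn, set()).update(waits_for)
    return merged


def has_cycle(graph):
    """Depth-first search for a cycle in a directed graph {node: set(successors)}."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    colour = {}

    def visit(node):
        colour[node] = GREY
        for nxt in graph.get(node, ()):
            state = colour.get(nxt, WHITE)
            if state == GREY:
                return True  # back edge: nxt is already on the current path
            if state == WHITE and visit(nxt):
                return True
        colour[node] = BLACK
        return False

    return any(colour.get(n, WHITE) == WHITE and visit(n) for n in list(graph))


site1 = {"T1": {"T2"}}  # at site 1, T1 waits for a lock held by T2
site2 = {"T2": {"T1"}}  # at site 2, T2 waits for a lock held by T1
print(has_cycle(site1), has_cycle(site2))        # neither site sees a cycle
print(has_cycle(merge_wait_for([site1, site2])))  # the merged graph has one
```

Shipping the local graphs to one place is itself part of the communication overhead listed above, which is why the detection frequency is a trade-off in practice.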