Database Principles and Applications (Detailed)

JAINGけんそう

Last modified: 2022-12-14 15:48:08


Tags: database, sql, mysql

First published: 2022-12-14 15:48:05

Article link: https://blog.csdn.net/weixin_65244004/article/details/128316953


Database Principles and Applications

-JAING

Chapter 1: Introduction

1.1 Terminology

1. Data

Data is the basic object stored in a database, defined as symbolic records that describe things. The meaning of data is called its semantics, and data is inseparable from its semantics.

2. Database (DB)

A database is a large, organized, shareable collection of data stored in a computer over the long term.

Basic characteristics of a database: 1. permanent storage; 2. organized; 3. shareable; 4. low redundancy; 5. easy to extend.

3. Database Management System (DBMS)

A DBMS is an example of client/server (C/S) software.

It is a layer of data management software that sits between the user and the operating system.

Like the operating system, the DBMS is basic system software of the computer.

Main functions: it provides a data definition language (DDL), a data manipulation language (DML), and so on.

4. Database System (DBS)

A database system is a system for storing, managing, processing and maintaining data. It is made up of the database (DB), the database management system (DBMS), application programs and the database administrator (DBA).

1.2 Key Concepts

The development of data management technology (three stages)

1. Manual management stage:

Before computers appeared, people recorded, stored and processed data by conventional means: recording on paper, calculating with tools such as the abacus and the slide rule, and relying mainly on the human brain to manage and use the data.

Characteristics:

The computer system provides no management functions for user data.

Data cannot be shared.

Data is not saved.

Data and programs are not independent of each other.

2. File system stage:

In this stage (from the late 1950s to the mid-1960s) computers were used not only for scientific computing but also for information management. As data volumes grew, storing, retrieving and maintaining data became a pressing need, and data structures and data management techniques developed rapidly. By then, external storage included direct-access devices such as magnetic disks and drums. On the software side, operating systems and high-level software appeared. The file system in the operating system is data management software dedicated to managing external storage, and files are one of the important resources that the operating system manages.

Characteristics:

Data can be kept for a long time as "files" on disk in external storage. Because computer applications shifted towards information management, files had to be queried, modified and inserted into heavily; these operations are provided by the file system.

Drawbacks: files are read as character streams; data redundancy; inconsistency (the consistency of the data has to be ensured manually); poor data independence; weak access control.

3. Database system stage:

Since the late 1960s computer performance has improved further. More importantly, large-capacity disks appeared, greatly increasing storage capacity while prices fell. Only on this basis did it become possible to overcome the shortcomings of file systems and to meet the need for many users and many applications to share data, so that data can serve as many applications as possible. This gave rise to database technology.

Characteristics:

(1) Data is structured.

(2) Data is highly shareable, has little redundancy and is easy to extend.

(3) Data independence is high.

(4) Data is managed and controlled uniformly by the DBMS.

The smallest unit of data access is the data item.

Data Model

I. Classification of data models

1. Conceptual model

A conceptual model describes data and information from the user's point of view and is used mainly in database design.

There are many ways to express a conceptual model. The most common is the Entity-Relationship approach, which describes the conceptual model with E-R diagrams. The E-R approach is also called the E-R model.

2. Logical model and physical model

Logical models include the hierarchical model, the network model, the relational model, the object-oriented model and the object-relational model. They are used mainly in the implementation of database management systems.

The physical model is the lowest-level abstraction of data. It describes how data is represented and accessed inside the system and how it is stored and accessed on disk or tape. It is oriented towards the computer system.

II. Components of a data model

A data model usually consists of three parts: data structure, data operations and integrity constraints.

The data structure describes the objects that make up the database and the relationships between them. Data models are usually named after the type of their data structure: models with hierarchical, network and relational structures are called the hierarchical model, the network model and the relational model respectively.

Data operations are the set of operations that may be performed on instances of the objects in the database, together with the rules that govern them. They fall into two main classes: query and update (insert, delete, modify).

Integrity constraints are the restrictions and dependency rules that the data and its relationships must obey in a given data model. In the relational model they appear as entity integrity and referential integrity. For example, a university database may stipulate that a student who fails more than six courses cannot be awarded a bachelor's degree, or that the retirement age of professors is 65.

Classification of logical models (non-relational and relational)

Non-relational models: hierarchical model, network model

Relational model

Hierarchical model: a set of basic hierarchical links that satisfies the following two conditions is a hierarchical model (that is, a tree structure):

① Exactly one node has no parent; this node is called the root.

② Every node other than the root has exactly one parent.

Network model: a set of basic hierarchical links that satisfies the following two conditions (each link is one-to-many, and the overall structure resembles a directed graph):

① More than one node may have no parent.

② A node may have more than one parent.

Relational model terminology:

1. Relation: a relation corresponds to what is usually called a table.

2. Attribute: a column of the table is an attribute.

3. Domain: the range of values an attribute may take.

4. Tuple: a row of the table is a tuple.

5. Key: an attribute group in the table that uniquely determines a tuple.

6. Component: the value of one attribute in a tuple.

7. Relation schema: the description of a relation, generally written as RelationName(attribute 1, attribute 2, ..., attribute n).

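Drawing E-R diagrams:

The Entity-Relationship approach is one way of representing a conceptual model. It provides notation for entity types, attributes and relationships.

Entity type: drawn as a rectangle with the entity name written inside.

Attribute: drawn as an ellipse and joined to its entity type by an undirected edge.

Relationship: drawn as a diamond with the relationship name written inside, joined to each participating entity type by an undirected edge and labelled with the type of the relationship (1:1, 1:n or m:n).

For example, take two simple entities, Organization and Employee. Their relationship is 1:m: one organization can have many employees.

As another example, take the entities Student and Book. Their relationship is m:n, which gives rise to an intermediate table of the books that students borrow: one book can be borrowed by many people, and one person can borrow many books.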

Database system architecture:

The three-level schema structure of a database system: schema, external schema and internal schema.

Schema (also called the logical schema):

① A description of the logical structure and characteristics of all the data in the database.

② The common data view of all users, integrating the requirements of all users.

Position of the schema: it is the middle layer of the database system's schema structure.

① It is independent of the physical storage details of the data and of the hardware environment.

② It is independent of specific applications, development tools and high-level programming languages.

External schema (also called the subschema or user schema):

① A description of the logical structure and characteristics of the local data used by database users (both application programmers and end users).

② A database user's view of the data: the logical representation of the data relevant to one application.

Uses of the external schema: ① it is a powerful measure for ensuring database security; ② each user can see and access only the data in the corresponding external schema.

Internal schema (also called the storage schema):

① A description of the physical structure and storage method of the data.

② The way data is represented inside the database:

a. How records are stored (sequentially, in a B-tree structure, or by hashing)

b. How indexes are organized

c. Whether data is stored compressed

d. Whether data is encrypted

e. The rules for the structure of stored records

Notes:

1. A mapping exists between the physical schema (internal schema) and the conceptual schema (logical schema); an external schema can be derived from the base tables of the conceptual schema.

2. When the storage structure of the database changes, the DBA adjusts the schema/internal schema mapping so that the schema stays unchanged, which guarantees the physical independence of the data.

3. When the schema changes, the DBA adjusts the external schema/schema mapping so that the external schemas stay unchanged, which guarantees the logical independence of the data.

Two-level mapping in a DBS:

The DBMS provides two levels of mapping between the three schema levels: the external schema/schema mapping and the schema/internal schema mapping. These two mappings give the data in the database a high degree of logical and physical independence.

Summary:

Database system (DBS):

1. The DBMS provides a high-level user interface.

2. It supports efficient query processing and optimization.

3. It supports efficient catalog management.

4. It supports efficient concurrency control and recovery.

5. It supports integrity constraints.

6. It provides thorough access control.

Main functions of a database management system (DBMS):

Database definition and creation
Data organization, storage and management
Database transaction management and operation management
Data access
Database establishment and maintenance

How a DBMS is used: build, use, maintain.

Three-level schema structure of a database: external schema, conceptual schema, internal schema.

Benefits:

1. Users can work with data logically and abstractly, without caring how the data is actually represented and stored in the computer.

2. Developers only need to focus on one layer of the whole structure.

3. It favours standardization.

4. It favours the reuse of logic.

5. It reduces the dependencies between layers.

6. Layers are easy to replace.

Chapter 2: The Relational Model

Relational model:

As mentioned earlier, the relational model consists of three parts: the relational data structure, the set of relational operations, and the relational integrity constraints. We therefore need to understand what each part means.

The relational data model was first proposed by Edgar F. Codd in a 1970 paper.

The relational model has a sound theoretical foundation, which some other models lack.

The goals of the relational model are:

To allow a high degree of data independence. Users' interaction with the database must not be affected by changes to the internal view of the data, in particular record ordering and access paths.

To provide a solid foundation for dealing with data semantics, consistency and redundancy. Codd's paper introduced the concept of normalised relations.

To enable set-oriented data manipulation languages.

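History of the relational data model:

One of the most important implementations of the relational model was System R, developed by IBM in the late 1970s.

System R was intended as a "proof of concept" to show that relational database systems could really be built and could work efficiently. It led to two major developments. The first was a structured query language called SQL, which has since become an ISO standard and the de facto standard relational language. The second was the range of commercial relational DBMS products produced in the 1980s, such as DB2, SQL/DS and ORACLE. There are now several hundred relational database systems, both commercial and open source.

Relations: the relational model is based on the mathematical concept of a relation.

Because Codd was a mathematician, he used terminology from that field, in particular set theory and logic. We will not treat the concepts in that way, but we do need to understand the terms associated with the relational data model. A relation is represented as a two-dimensional table with rows and columns (much like a spreadsheet). Relations hold information about the entities to be represented in the database. The rows correspond to individual records. The columns correspond to attributes or fields. The order of the attributes does not matter: they can appear in any order and the relation stays the same.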

Types of relation:

Base table: a table that actually exists; it is the logical representation of the data actually stored.

Query table: the table that corresponds to a query result, or a temporary table produced during a query.

View table: a table derived from base tables or other views. It is a virtual table and does not actually store data.

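Properties of relations:

The name of a relation is unique: no two relations can have the same name.

The name of an attribute is unique only within its relation: two separate relations may each have an attribute called Name, but a single relation may not have two.

All values of an attribute come from the same domain. For example, we should not allow a postcode to appear in a salary column.

The order of attributes in a relation has no significance: reordering the columns does not produce a different relation.

The order of rows in a relation has no significance: rearranging the rows does not produce a different relation.

Each cell of a relation should contain at most one value. For example, we cannot store two phone numbers in the same cell.

All records in a relation should be distinct: no two rows may have exactly the same values in every column (no duplicates). Any two rows of a relation must therefore differ in the value of at least one attribute.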

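Relational data structure and its formal definition:

The relational model is built on set algebra, so the relational data structure is defined formally in set-theoretic terms.

1. Domain: a domain is a set of values of the same data type. The integers and the natural numbers are both domains.

2. Cartesian product: the Cartesian product is a set operation on domains.

Given a group of domains D1, D2, …, Dn, some of which may be identical, their Cartesian product is:

D1×D2×…×Dn = {(d1, d2, …, dn) | di ∈ Di, i = 1, 2, …, n}

Each element (d1, d2, …, dn) of the Cartesian product is called an n-tuple, or simply a tuple.

A Cartesian product can be shown as a two-dimensional table in which each row corresponds to a tuple and the values in each column come from one domain.

For example, take three domains:

D1 = the set of supervisors SUPERVISOR = {Zhang Qingmei, Liu Yi}

D2 = the set of specialities SPECIALITY = {Computer Science, Information}

D3 = the set of postgraduates POSTGRADUATE = {Li Yong, Liu Chen, Wang Min}

The Cartesian product D1×D2×D3 contains 2×2×3 = 12 tuples, one for every combination of a supervisor, a speciality and a postgraduate.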
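The product of the three example domains can be enumerated with a few lines of Python. This is only an illustrative sketch: the romanized names stand in for the domain values in the text, and the subset `sap` is a hypothetical relation chosen to show that a relation is simply a subset of the product.

```python
from itertools import product

# The three example domains (names romanized for this sketch).
supervisor = ["Zhang Qingmei", "Liu Yi"]
speciality = ["Computer Science", "Information"]
postgraduate = ["Li Yong", "Liu Chen", "Wang Min"]

# D1 x D2 x D3: every combination of one value from each domain.
cartesian = list(product(supervisor, speciality, postgraduate))
print(len(cartesian))  # 2 * 2 * 3 tuples

# A relation on these domains is any subset of the product.
# This particular subset is hypothetical.
sap = {
    ("Zhang Qingmei", "Computer Science", "Li Yong"),
    ("Zhang Qingmei", "Computer Science", "Liu Chen"),
    ("Liu Yi", "Information", "Wang Min"),
}
print(sap <= set(cartesian))
```

Most tuples of the full product are meaningless in practice (a postgraduate normally has one supervisor and one speciality), which is why only a subset is kept as the relation.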

3. Relation

A subset of D1×D2×…×Dn is called a relation on the domains D1, D2, …, Dn, written R(D1, D2, …, Dn).

R: the name of the relation

n: the degree of the relation

A relation is also a two-dimensional table: each row corresponds to a tuple, each column corresponds to a domain, and each column is called an attribute.

Candidate key: if the values of an attribute group in a relation uniquely identify a tuple, and no attribute can be removed from the group without losing that property, the group is called a candidate key (a superkey with no redundant attributes).

If a relation has several candidate keys, one of them is chosen as the primary key.

The attributes that belong to any candidate key are called prime attributes; attributes that belong to no candidate key are called non-prime attributes. In the simplest case a candidate key consists of a single attribute. If all the attributes of a relation schema together form its candidate key, the key is called an all-key.

Relational operations:

The relational operations commonly used in the relational model fall into two groups: query operations, and the insert, delete and update operations.

Query operations include select, project, join, divide, union, difference (except), intersection and Cartesian product. Of these, select, project, union, difference and Cartesian product are the five basic operations; the others can be defined and derived from them.
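As a rough illustration of the five basic operations, the sketch below models a relation as a Python set of tuples. The function names and the sample data are assumptions made for this example; intersection is included to show one operation derived from a basic one.

```python
def select(rows, predicate):
    """Selection: keep the tuples that satisfy the predicate."""
    return {r for r in rows if predicate(r)}

def project(rows, indexes):
    """Projection: keep only the columns at the given positions."""
    return {tuple(r[i] for i in indexes) for r in rows}

def union(r, s):
    return r | s

def difference(r, s):
    return r - s

def cartesian_product(r, s):
    """Concatenate every tuple of r with every tuple of s."""
    return {a + b for a in r for b in s}

def intersection(r, s):
    """Derived operation: r ∩ s = r - (r - s)."""
    return difference(r, difference(r, s))

# Hypothetical sample relations with columns (id, label).
r = {(1, "a"), (2, "b")}
s = {(2, "b"), (3, "c")}

print(select(r, lambda t: t[0] > 1))
print(project(r, [0]))
print(intersection(r, s))
```

Because relations are sets, projection removes duplicate rows automatically, which matches the rule that a relation never contains two identical tuples.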

Integrity of relations:

The integrity rules of the relational model are constraints placed on relations. The model has three kinds of integrity constraint:

Entity integrity: the relational model uses the primary key as the unique identifier, and the attributes of the primary key (prime attributes) cannot be null.

Referential integrity: enforced through foreign keys; a foreign key value must either be null or match a key value in the referenced relation.

User-defined integrity: reflects the semantic requirements that the data of a particular application must satisfy.

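Data integrity in MySQL:

1. Foreign key constraints in MySQL: a MySQL foreign key constraint (FOREIGN KEY) links the data in two tables. It may consist of one column or several, and a table may have one or more foreign keys.

A foreign key implements referential integrity. A foreign key value may be NULL; if it is not NULL, it must equal some primary key value in the other table.

A foreign key is a column of a table that is not that table's primary key but corresponds to the primary key of another table. Once the foreign key is defined, rows in the other table that are still referenced cannot be deleted.

The main purpose of a foreign key is to keep data consistent and complete. For example, the department table tb_dept has the primary key id, and the employee table tb_emp5 has a column deptId that references this id.

Parent table: of two related tables, the one that holds the primary key of the linked columns.

Child table: of two related tables, the one that holds the foreign key of the linked columns.

Choosing the columns for a MySQL foreign key constraint

A foreign key definition must follow these rules:

1. The parent table must already exist in the database, or be the table currently being created. In the latter case the parent and child are the same table; such a table is called a self-referencing table, and the structure is called self-referential integrity.

2. A primary key must be defined for the parent table.

3. The primary key cannot contain NULLs, but NULLs are allowed in the foreign key. In other words, the foreign key is valid as long as each of its non-NULL values appears in the referenced primary key.

4. A column name or combination of column names is given after the parent table's name. This column or combination must be the primary key or a candidate key of the parent table.

5. The number of columns in the foreign key must equal the number of columns in the parent table's primary key.

6. The data type of each foreign key column must match the data type of the corresponding column in the parent table's primary key.

A foreign key is created in a table with the FOREIGN KEY keyword. The general form of the clause is:

[CONSTRAINT <foreign key name>] FOREIGN KEY (<column> [, <column 2>, …]) REFERENCES <parent table> (<primary key column 1> [, <primary key column 2>, …])

Here, the foreign key name is the name of the foreign key constraint being defined (a table cannot have two foreign keys with the same name); the column is the child-table column that receives the constraint; the parent table is the table the child's foreign key depends on; and the primary key column is the primary key column, or combination of columns, defined in the parent table.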
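The behaviour described above can be tried out without a MySQL server. The sketch below uses Python's built-in `sqlite3` module as a stand-in, so it is not MySQL itself: SQLite needs `PRAGMA foreign_keys = ON` before it enforces foreign keys, and details such as error messages differ. The tables `tb_dept` and `tb_emp5` follow the example in the text; the other columns, the constraint name `fk_emp_dept` and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when enabled

conn.execute("CREATE TABLE tb_dept (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("""
    CREATE TABLE tb_emp5 (
        id     INTEGER PRIMARY KEY,
        name   TEXT NOT NULL,
        deptId INTEGER,
        CONSTRAINT fk_emp_dept FOREIGN KEY (deptId) REFERENCES tb_dept (id)
    )
""")

conn.execute("INSERT INTO tb_dept VALUES (1, 'Sales')")
conn.execute("INSERT INTO tb_emp5 VALUES (1, 'Ann', 1)")     # matches a parent row
conn.execute("INSERT INTO tb_emp5 VALUES (2, 'Bob', NULL)")  # a NULL foreign key is allowed

def violates(sql):
    """Return True if the statement is rejected by an integrity constraint."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False

# No department 99 exists, so this insert breaks referential integrity.
bad_insert = violates("INSERT INTO tb_emp5 VALUES (3, 'Cy', 99)")
# Department 1 is still referenced by Ann, so it cannot be deleted.
bad_delete = violates("DELETE FROM tb_dept WHERE id = 1")
print(bad_insert, bad_delete)
```

Both rejected statements leave the tables unchanged, which is the point of the constraint: the child table can never refer to a parent row that does not exist.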

Cascaded deletion: if a record in the parent table is deleted, the corresponding records in the child table are deleted automatically.
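A cascaded deletion can be sketched in the same way. Again this uses Python's `sqlite3` module rather than MySQL, and the table names `parent` and `child` and their rows are hypothetical; the `ON DELETE CASCADE` clause itself is standard SQL that MySQL's InnoDB engine also accepts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

conn.execute("CREATE TABLE parent (id INTEGER PRIMARY KEY)")
conn.execute("""
    CREATE TABLE child (
        id        INTEGER PRIMARY KEY,
        parent_id INTEGER REFERENCES parent (id) ON DELETE CASCADE
    )
""")
conn.executemany("INSERT INTO parent VALUES (?)", [(1,), (2,)])
conn.executemany("INSERT INTO child VALUES (?, ?)", [(10, 1), (11, 1), (12, 2)])

# Deleting parent 1 also removes children 10 and 11.
conn.execute("DELETE FROM parent WHERE id = 1")
remaining = conn.execute("SELECT id FROM child ORDER BY id").fetchall()
print(remaining)
```

Without the `ON DELETE CASCADE` clause the same `DELETE` would be rejected, because the parent row would still be referenced.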

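Deadlock (deadly embrace): two transactions each hold a lock that the other needs, each waits for the other to release it, and neither will release its own lock. (According to the source, a deadlock can be resolved by adding a table.)

The four necessary conditions for deadlock:

(1) Mutual exclusion: a resource can be used by only one process at a time.

(2) Hold and wait: a process that is blocked while requesting a resource keeps the resources it has already acquired.

(3) No preemption: resources that a process has acquired cannot be taken away by force before the process has finished with them.

(4) Circular wait: several processes form a closed chain in which each waits for a resource held by the next.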
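The circular-wait condition can be checked mechanically by looking for a cycle in a "wait-for" graph. The function below is a minimal sketch under that assumption; the function name and the dictionary format are invented for this example and are not taken from any particular DBMS.

```python
def has_deadlock(wait_for):
    """Detect a cycle in a wait-for graph.

    wait_for maps each transaction to the set of transactions
    whose locks it is currently waiting for.
    """
    visiting, done = set(), set()

    def visit(t):
        if t in done:
            return False
        if t in visiting:
            return True  # reached a transaction already on the current path
        visiting.add(t)
        found = any(visit(n) for n in wait_for.get(t, ()))
        visiting.discard(t)
        done.add(t)
        return found

    return any(visit(t) for t in wait_for)

# T1 waits for T2 and T2 waits for T1: a circular wait.
print(has_deadlock({"T1": {"T2"}, "T2": {"T1"}}))
# T1 waits for T2, but T2 waits for nobody: no deadlock.
print(has_deadlock({"T1": {"T2"}, "T2": set()}))
```

A system that finds such a cycle typically breaks it by aborting one of the transactions involved, which removes the circular-wait condition.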

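Table constraints: ① primary key constraint, ② NOT NULL constraint, ③ UNIQUE constraint, ④ foreign key constraint, ⑤ user-defined constraint (CHECK constraint). Note that MySQL parsed but ignored CHECK constraints before version 8.0.16; from 8.0.16 onwards they are enforced.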

Keys: we need to be able to identify each row of a relation uniquely by its attribute values, so that a particular row can be retrieved or related to other records. For this we use relational keys, which consist of one attribute or a set of attributes chosen from the relation concerned.

Superkey: a set of attributes that uniquely identifies the tuples of a relation is called a superkey of the relation schema.

The problem with superkeys: they may contain attributes that are not strictly needed for unique identification.

For example, in (Bno, Street, Area) the street and area are not needed, and in (Bno, Postcode, TelNo) the postcode and telephone number are not needed. Here the single attribute Bno is clearly enough to identify any particular record (indeed, providing this identification is the main reason for including the Bno attribute).

Candidate key: a superkey with no redundant attributes is called a candidate key. In other words, removing any attribute from a candidate key leaves a set that is no longer a key.

Properties of a candidate key K of a relation R:

Uniqueness: in every row of R, the value of K uniquely identifies that row. (In other words, no two rows of R can have the same value of K.)

Irreducibility: no proper subset of K has the uniqueness property, so K cannot be made from fewer attributes.

Some relations may have several candidate keys.
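Uniqueness and irreducibility can be tested against a sample of data. The sketch below is illustrative only: the branch rows are hypothetical, and a key found this way merely fits the rows at hand, whereas real candidate keys follow from the meaning of the data rather than from one snapshot of it.

```python
from itertools import combinations

def is_superkey(rows, attrs):
    """True if no two rows agree on all of the given attributes."""
    seen = {tuple(row[a] for a in attrs) for row in rows}
    return len(seen) == len(rows)

def candidate_keys(rows):
    """Return the irreducible superkeys of the sample rows."""
    attrs = sorted(rows[0])
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            # Skip combinations that contain a key already found:
            # they are superkeys but not irreducible.
            if any(set(k) <= set(combo) for k in keys):
                continue
            if is_superkey(rows, combo):
                keys.append(combo)
    return keys

# Hypothetical branch data using the attribute names from the text.
branches = [
    {"Bno": "B1", "Street": "Main St", "Area": "North"},
    {"Bno": "B2", "Street": "Main St", "Area": "South"},
    {"Bno": "B3", "Street": "High St", "Area": "North"},
    {"Bno": "B4", "Street": "Main St", "Area": "North"},
]

print(is_superkey(branches, ("Bno", "Street", "Area")))
print(candidate_keys(branches))
```

Here (Bno, Street, Area) is a superkey, but only Bno survives the irreducibility test.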

Primary key: the candidate key that the user chooses to identify tuples. For every relation in the database we must choose one of its candidate keys as the primary key. Because a relation cannot contain duplicate rows (by definition), each row can always be identified uniquely. In theory, therefore, every relation has at least one candidate key, and a primary key can always be found. In the worst case the whole set of attributes serves as the primary key, but usually some smaller subset is enough. However, because many database systems allow relations to contain duplicates, this theoretical property does not necessarily hold in practice. If no natural primary key can be found, it is usually best to introduce an artificial key attribute.

Foreign key: if an attribute K of relation schema R is the primary key of another schema, then K is called a foreign key in R.

[Figure: foreign key example]

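Relational database schemas:

To represent the conceptual schema of a relational database, we write each relation as RelationName(attribute 1, attribute 2, …, attribute n). In other words, we write the name of the relation followed by the list of the names of its attributes. Primary key attributes are underlined. Foreign key attributes should be shown with some distinguishing feature, such as a dashed underline.

Nulls: suppose we define the Viewing relation as Viewing(Pno, Rno, Date, Time, Comment). The Comment field is meant to be filled in after the viewing, so when a viewing is first arranged there is no data to store in it. To represent this initial lack of information we can use a null. The use of nulls is controversial. Codd regarded nulls as an integral part of the relational model. Others, however, argue that nulls undermine the model's solid theoretical foundation. In general, the problem of missing information is not yet fully understood; nulls are one way of handling it, though they may not be the best.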

Integrity constraints:

Relational integrity: the relational data model includes a number of integrity constraints that apply to the relations we create. The two main integrity rules are the entity integrity rule and the referential integrity rule. Both depend on the concept of null. A null represents an attribute value that is currently unknown or that does not apply to a particular record. Nulls provide a way of handling incomplete or exceptional data. A null is not the same as zero, an empty string or a string filled with spaces: all of those are values, whereas a null means that there is no value.

Entity integrity: entity integrity is a constraint on the uniqueness of the records in a relation, that is, on the primary key. More precisely, the prime attribute values in a relation cannot be NULL and cannot be duplicated, so that every row of the table can be identified uniquely. The entity integrity rule: in a relation, no attribute of the primary key may be null. The primary key is a minimal superkey used to identify records uniquely, which means that no proper subset of the primary key is enough to provide unique identification. If we allowed any part of the primary key to be null, we would be implying that not all of its attributes are needed for unique identification.

Referential integrity: if a relation contains a foreign key, each foreign key value must either match a candidate key value of some record in the home relation or be wholly null. Referential integrity constrains how related tables in a relational database refer to each other's data; in other words, it is the constraint on foreign keys. More precisely, a foreign key in one relation must hold either a valid primary key value of the other relation or NULL. Referential integrity keeps the data across tables valid and complete. It is usually implemented by defining a foreign key that references the primary key of another table, and it can also be maintained with triggers.

Chapter 3: SQL

SQL is a declarative language for manipulating a relational database.

1. Basic concepts: in SQL a relation corresponds to a base table, and one or more base tables correspond to a storage file. A table can have several indexes, which are also kept in the storage file. The logical structure of the storage files forms the internal schema of the relational database. A view is a table derived from one or more base tables. It is not stored independently in the database: its data remains in the base tables from which the view is derived.

2. Terms and explanations:

Index: building an index is an effective way to speed up queries. A user (the database administrator or the creator of the table) can build one or more indexes on a base table to provide several access paths, and the system automatically chooses a suitable index as the access path when it accesses data.

Indexes are an internal implementation technique of relational databases and belong to the internal schema.

Unique index: created with the UNIQUE keyword; each index value corresponds to exactly one data record.

Non-unique index: created without the UNIQUE keyword.

Clustered index: an index organization in which the order of the index entries matches the physical order of the records in the table. A clustered index can be built on the columns that are queried most often; it is not suitable for columns that are updated often. A table can have only one clustered index, which is usually the primary key by default. "Matching the physical order of the records" means, for example, that when the index is ascending the records are stored in ascending order too.

View: a view is a table derived from one or more base tables (or views). The database stores only the definition of the view, not the data that corresponds to it; that data remains in the original base tables. When the data in the base tables changes, the data returned by a query on the view changes with it.

What views are for:

① They simplify user operations: the user only runs a simple query against a virtual table and does not need to know how that virtual table was derived.

② They let users look at the same data from several angles: this flexibility is essential when many different kinds of user share one database.

③ They provide a degree of logical independence when the database is restructured: restructuring the database does not necessarily require changes to applications.

④ They can protect confidential data: allowing users to query only the views provided to them, rather than the tables directly, hides the confidential data in the tables.

⑤ Used appropriately, they let queries be expressed more clearly.
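Two of these points, that a view stores no data of its own and that it can hide confidential columns, can be seen in a small experiment. The sketch uses Python's `sqlite3` module as a stand-in for MySQL; the table `staff`, the view `staff_public` and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staff (id INTEGER PRIMARY KEY, name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO staff VALUES (?, ?, ?)",
    [(1, "Ann", 30000), (2, "Bob", 26000)],
)

# The view exposes only the non-confidential columns.
conn.execute("CREATE VIEW staff_public AS SELECT id, name FROM staff")

before = conn.execute("SELECT name FROM staff_public ORDER BY id").fetchall()

# Change the base table; the view is not touched directly.
conn.execute("UPDATE staff SET name = 'Anna' WHERE id = 1")

after = conn.execute("SELECT name FROM staff_public ORDER BY id").fetchall()
columns = [d[0] for d in conn.execute("SELECT * FROM staff_public").description]

print(before, after, columns)
```

The second query on the view returns the updated name even though only the base table was modified, and the salary column never appears in the view.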

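Chapter 4: The ER Model (Entity-Relationship Model)

The Entity-Relationship (ER) model is a high-level conceptual data model. It was originally developed in the 1970s to facilitate database design.

Overview: a conceptual data model has two main purposes:

To support a user's perception of the data.

To conceal the technical aspects of database design.

Conceptual data models are also independent of the particular DBMS used to implement the database.

The basic concepts of the ER model are entities, attributes and relationships. These concepts are expressed in ER diagrams.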

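Entities and relationships:

An example of a basic ER diagram:

[Figure: ER diagram linking the entities Student and School]

The entities "Student" and "School" are shown as rectangles, and the relationship they take part in is shown as the line joining them.

We can read this diagram in two ways:

1. A student attends one school (a 1:1 relationship from the student's point of view).

2. A school is attended by one student (a 1:1 relationship from the school's point of view).

An ER diagram shows relationships between entity types. Each entity box in the diagram refers to a set of entities, for example a set of students.

Each member of the set may take part in the specified relationship.

Here is a more complex ER diagram:

[Figure: ER diagram of head teachers, schools, teachers and children]

Cardinality of relationships:

A crow's foot marks the "many" end of a relationship.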

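Interpreting ER diagrams:

We can interpret this ER diagram as follows:

1. A head teacher manages exactly one school (1:1 from the head teacher's point of view).

2. Each school has exactly one head teacher (1:1 from the school's point of view).

3. A school employs many teachers (1:M from the school's point of view).

4. A teacher teaches many children (1:M from the teacher's point of view), and each child is taught by many teachers (1:M from the child's point of view), so overall this relationship is M:M (taking both directions together).

We always interpret each relationship in an ER diagram from the point of view of a single instance of an entity.

1. Even for the M:M relationship "teacher teaches child", we read it as:

One teacher teaches many children (1:M from one teacher's point of view).

One child is taught by many teachers (1:M from one child's point of view).

2. For this reason we always write entity names in the singular, to help us remember the rule above: for example, teacher rather than teachers.

For any given relationship we should always be able to give two interpretations: one in the forward direction ("teaches") and one in the reverse direction ("is taught by").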

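Optionality and participation:

Consider the 1:M relationship between school and teacher from the earlier example. We interpret it as:

A school employs one or more teachers (1:M from a school's point of view).

A teacher is employed by one school (1:1 from a teacher's point of view).

However, some teachers may not be employed by any school. We then say that the employment relationship is optional from the teacher's point of view: a particular teacher may or may not be employed by a school.

To express optionality in an ER diagram, we draw the connecting line of the relationship slightly differently. For example:

[Figure: ER diagram with a half-dashed, half-solid line between Teacher and School]

In this diagram, the dashed half of the line (the half attached to the Teacher entity) shows that being employed by a school is optional for a teacher.

From the point of view of a particular school, however, it is essential that the school employs at least one teacher. A school must therefore always employ teachers, which is shown by the solid half of the line attached to the School entity.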

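Visualising relationships: a relationship represents a mapping from one entity set to another. For example, a 1:M relationship maps an entity in one set to one or more entities in another set.

We can visualise a relationship with a mapping diagram:

[Figure: mapping diagram between schools and teachers]

In the mapping diagram some teachers are not employed by any school. A teacher's participation in the relationship is optional rather than mandatory, which the ER diagram makes explicit with the dashed half-line attached to the Teacher entity at the "many" end of the 1:M relationship.

An alternative notation for showing attributes:

[Figure: alternative attribute notation]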

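Many-to-many: when a system is first analysed, some relationships may turn out to be M:M. An M:M relationship can sometimes be converted into two 1:M relationships.

Weak entities: a weak entity is one that cannot exist without the existence of some other entity.

For example, a given film must be recorded on one or more DVDs, and a given DVD must contain a particular film. We therefore cannot have a DVD without a corresponding film. If we delete a film from the database, we also want to delete all the information about the DVDs that contain it. DVD is therefore a weak entity, because it depends on the corresponding film.

Unary relationships: we said earlier that a relationship maps entities in one set to one or more entities in another set. We sometimes need to show a relationship between entities of the same set. Typical examples include:

Left diagram: a person may optionally be married to another person. Right diagram: an employee manages zero or more other employees, and each employee is managed by one other employee.

Multiple relationships between entities: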

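5. Converting an ER model into relations: how do we convert an ER model into a set of relations?

• First define relations to represent the entities in the ER model.

• Then add columns to each relation to represent the attributes of the corresponding entity.

• Represent relationships using foreign keys, or possibly using separate relations.

• Identify the primary key and any foreign keys of each relation. This last step is usually carried out in parallel with the others.

Converting relationships: relationships are handled according to their cardinality (1:1, 1:M or M:M).

• Consider a 1:M relationship, for example between departments and staff. To represent it we start with two relations: Department(Dno, Name, NumRooms) and Staff(Sno, Name).

• To establish the relationship, each member of staff must be linked to the corresponding department by embedding a foreign key. We therefore extend the Staff relation (the "many" side) with an extra column, the foreign key DeptNo.

To handle a many-to-many relationship, such as the viewing relationship between properties and tenants, we introduce a separate relation that represents the M:M relationship.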

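Summary. Let X and Y be two relations with primary keys keyx and keyy.

One-to-many: embed keyx (the primary key of the "one" side) as a foreign key in Y (the "many" side).

Many-to-many: create a third relation, say Z, that contains the primary keys of both X and Y.

One-to-one: there are two ways of representing a 1:1 relationship from X to Y.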
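The one-to-many and many-to-many rules can be written down as table definitions. The sketch below uses Python's `sqlite3` module as a stand-in for MySQL. Department and Staff follow the example in the text; the Property, Renter and Viewing tables, their column names (Pno, Rno) and all sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    -- 1:M: the key of the 'one' side is embedded in the 'many' side.
    CREATE TABLE Department (Dno INTEGER PRIMARY KEY, Name TEXT, NumRooms INTEGER);
    CREATE TABLE Staff (
        Sno    INTEGER PRIMARY KEY,
        Name   TEXT,
        DeptNo INTEGER REFERENCES Department (Dno)
    );

    -- M:N: a third relation holds the primary keys of both sides.
    CREATE TABLE Property (Pno INTEGER PRIMARY KEY);
    CREATE TABLE Renter (Rno INTEGER PRIMARY KEY);
    CREATE TABLE Viewing (
        Pno INTEGER REFERENCES Property (Pno),
        Rno INTEGER REFERENCES Renter (Rno),
        PRIMARY KEY (Pno, Rno)
    );
""")

conn.executemany("INSERT INTO Property VALUES (?)", [(1,), (2,)])
conn.executemany("INSERT INTO Renter VALUES (?)", [(1,), (2,)])
conn.executemany("INSERT INTO Viewing VALUES (?, ?)", [(1, 1), (2, 1), (1, 2)])

# One renter views many properties, and one property has many renters.
viewed_by_renter_1 = [r[0] for r in conn.execute(
    "SELECT Pno FROM Viewing WHERE Rno = 1 ORDER BY Pno")]
renters_of_property_1 = [r[0] for r in conn.execute(
    "SELECT Rno FROM Viewing WHERE Pno = 1 ORDER BY Rno")]
print(viewed_by_renter_1, renters_of_property_1)
```

The composite primary key on Viewing is one reasonable choice here; a real design might add the date and time of the viewing to the key or use an artificial key instead.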

Exercises: 1. Consider the following relations, which might form part of a hairdressing database:

Client(CNo,Name,Phone,FavouriteStylist)

Stylist(SNo,Name,Phone)

Treatment(TreatmentName,Price,Duration)

Booking(CNo,SNo,Date,Time,TreatmentName)

Identify all the candidate keys in each case and discuss any assumptions you make. Choose a primary key for each relation and identify any foreign keys.

Answer: a candidate key is a unique and irreducible identifier for an entity.

For Client, the candidate key is (CNo).

For Stylist, the candidate key is (SNo).

For Treatment, the candidate key is (TreatmentName), although it might also sensibly be (TreatmentName, Duration).

For Booking, the candidate keys are (CNo, Date, Time) or (SNo, Date, Time), if it is assumed that a stylist can treat only one client at a time. Either of these could be used as the primary key, or an artificial key (BookingNo) could be introduced instead.

FavouriteStylist is a foreign key in Client, and CNo, SNo and TreatmentName are all foreign keys in Booking.

The Tutor and Student tables below show the tutors assigned to students. Each student's tutor is identified by the tutorID column of the Student table. Primary keys are underlined.

Do these tables conform to the concepts of entity integrity and referential integrity?

Entity integrity: each entity is unique and has a non-null primary key. Referential integrity: every foreign key is a primary key in the other relation. The tables conform to neither. The Student table has a null in the primary key column, violating entity integrity. The Student table also has an entry 45 in the tutorID column (a foreign key column), which does not correspond to any existing value in the primary key column (tutorID) of the Tutor table, thus violating referential integrity.

Draw an ER diagram representing the Tutor and Student example shown above.

From the tutor's point of view: one tutor may have many students, a one-to-many relationship.

From the student's point of view: one student has exactly one tutor, a one-to-one relationship.

Judging from the given data, a tutor need not be assigned to any student, but every student must be assigned a tutor.

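A bakery uses a database system to record details of its customers, products and orders. The system records each customer's details, including name, address and contact telephone number. A customer can place many orders, and each order asks for a variety of products. For each order the system records the date on which it was placed and the date on which it is to be delivered. Each order is delivered to a single customer, who may differ from the customer who placed it (a gift, for example). Each product has a unique name and a unit price. Some products are made up of a combination of other products. For example, the "Cocktail Party Selection" consists of five "Cheese Straws", two "Sausage Rolls" and three "出口卷".

Construct an Entity-Relationship (E-R) diagram to model the entities, attributes and relationships described above. Make sure you show the participation and cardinality constraints that apply to each relationship. Briefly state what each entity is intended to represent.

An order must include at least one product.

Some products may never have been ordered (new products, for example).

A product is not necessarily made up of other products, and a product is not necessarily a component of other products.

An order is identified by its order number.

Underline the primary key of each relation and indicate the foreign keys clearly:

Foreign keys are shown in italics:

4. Convert the following ER model into relations for a database: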

Chapter 5: Relational Data Theory

1. Normalisation:

Overview: normalising a database is worthwhile. It makes data storage more efficient and it simplifies data maintenance.

Data dependency: a data dependency is a constraint between the attributes within a relation. There are several kinds, the most important being functional dependency (FD) and multivalued dependency (MVD).

Normalisation theory is used to restructure relation schemas: by decomposing a schema we eliminate unsuitable data dependencies and so resolve the problems of data redundancy, insertion anomalies, update anomalies and deletion anomalies.

Unnormalised relation table

1NF:

If the domain of every attribute of a relation schema r(R) is indivisible (that is, atomic), then r(R) is in first normal form, written r(R) ∈ 1NF.

The goal of first normal form is to divide the basic data into logical units called entity sets or tables. Once each entity has been designed, it needs to be given a primary key.

First normal form is the minimum requirement for a relation schema. A database schema that does not satisfy first normal form cannot be called a relational database.

Achieving first normal form:

1. Extend the table rows by replicating the non-repeating columns for each value of the repeating item.

2. Split the repeating data and the non-repeating data into separate tables (a non-loss decomposition). We must then choose a primary key for the table of non-repeating data and insert it as a foreign key into the table of repeating data. (This method is better and helps us on the way to 2NF.)

注意：一个不好的关系模式会存在以下一些问题：

数据冗余太大：信息被重复存储，导致浪费大量存储空间

(Too much data redundancy: information is stored repeatedly, which wastes a lot of storage space.)

上面这张表：关于史密斯的信息是重复的，而且 2.7级员工的收入为26，813英镑也是重复的。

更新异常：当重复信息的一个副本被修改，所有副本都必须进行同样的修改。因此当更新数据时，系统要付出很大的代价来维护数据库的完整性，否则会面临数据不一致的危险。

(Update Exception: When one copy of duplicate information is modified, all copies must be modified in the same way. Therefore, when updating data, the system has to pay a great price to maintain the integrity of the database, otherwise it will face the danger of inconsistent data.)

以上的重复可能会使表的更新造成困难，

假设史密斯升到了一个新年级。那么，史密斯的所有记录都需要更改。

假设2.7级的工资发生变化。2.7级的所有工作人员的所有记录都必须更改。

事实应该只存储一次。这样更新就没有问题了。

Therefore a relation that is only in 1NF can still suffer from update anomalies.

插入异常：只有当一些信息事先已经存放在数据库中时，另外一些信息才能存入数据库中

( Insert exception: only when some information has been stored in the database in advance, other information can be stored in the database.)

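For example, the fact that Grade = 2.8 pays Salary = 27491 cannot be recorded until there is a staff member on that grade to attach it to. This is an insertion anomaly.

Deletion anomaly: deleting some information may cause other, unrelated information to be lost as well.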

解决方案:无损分解(NLD)(Solution: Non-loss decomposition (NLD))

乍一看，新方案似乎会占用更多的空间(但是实际上花费更少，因为我们删除了重复)
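The decomposition idea can be sketched in a few lines of Python. This is only an illustration: the names Jones and Brown are invented, while Smith and the grade/salary pairs come from the examples above. It shows that splitting the table removes the repeated grade/salary facts and that a natural join on grade rebuilds the original rows, which is what "non-loss" means.

```python
# Hypothetical staff rows: grade determines salary, so salary is repeated.
staff = [
    ("Smith", "2.7", 26813),
    ("Jones", "2.7", 26813),   # invented name, for illustration only
    ("Brown", "2.8", 27491),   # invented name, for illustration only
]

# Decompose into two relations; sets drop the repeated (grade, salary) facts.
staff_grade = {(name, grade) for name, grade, _ in staff}
grade_salary = {(grade, salary) for _, grade, salary in staff}

# A natural join on grade rebuilds the original relation.
rejoined = {
    (name, grade, salary)
    for name, grade in staff_grade
    for g, salary in grade_salary
    if g == grade
}

print(rejoined == set(staff))   # the decomposition loses nothing
print(len(grade_salary))        # each grade/salary fact is stored once
```

After the split, a change to the salary for grade 2.7 touches a single row of `grade_salary`, so the update anomaly described earlier no longer arises.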

元组表示真实的语句：

在上面的表中我们可以知道ID，Name为候选键。从中我们可以选择主键。

HoD不是键。

函数依赖：

In any relation, a column (or set of columns) Y is functionally dependent on a column (or set of columns) X if, at any moment, exactly one value of Y is associated with each value of X.

For example, at any given time exactly one surname (Y) is associated with a particular registration number (X).

请注意，姓氏可能会改变。

This dependency is one-way (directional): the registration number determines the surname, but not vice versa.

By definition, the attribute (or attributes) on the left-hand side of a functional dependency (FD) is called the determinant of the FD.

We call X the determinant and write: X -> Y

By definition, non-key columns are functionally dependent on the primary key columns.

X -> Y is a full functional dependency (FFD) if Y depends on the whole of X and not on any proper subset of X.

Full functional dependency: consider the relation schema S(Sno, Sname, Cno, Grade).

To find a student's grade in a particular course, I need both the student number Sno and the course number Cno. Knowing only Sno or only Cno is not enough. We say that Y[Grade] is fully functionally dependent on X[Sno, Cno].

Partial functional dependency: to find a student's name Sname, the student number Sno alone is enough. Y[Sname] depends only on the subset [Sno] of X[Sno, Cno], so we say that Y is partially functionally dependent on X.
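A small Python check can make the difference between full and partial dependency concrete. This is a sketch on invented sample rows for S(Sno, Sname, Cno, Grade); the helper name `fd_holds` is hypothetical. Note that a sample of rows can only refute a dependency, it can never prove that the dependency holds for all future data.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key
        if seen.setdefault(key, value) != value:
            return False
    return True

# Invented sample data for S(Sno, Sname, Cno, Grade)
rows = [
    {"Sno": "S1", "Sname": "Li", "Cno": "C1", "Grade": 90},
    {"Sno": "S1", "Sname": "Li", "Cno": "C2", "Grade": 75},
    {"Sno": "S2", "Sname": "Wang", "Cno": "C1", "Grade": 82},
]

# Grade needs the whole key (Sno, Cno): a full dependency.
grade_on_key = fd_holds(rows, ["Sno", "Cno"], ["Grade"])
grade_on_sno = fd_holds(rows, ["Sno"], ["Grade"])
grade_on_cno = fd_holds(rows, ["Cno"], ["Grade"])

# Sname needs only Sno: a partial dependency on (Sno, Cno).
sname_on_sno = fd_holds(rows, ["Sno"], ["Sname"])

print(grade_on_key, grade_on_sno, grade_on_cno, sname_on_sno)
```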

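Transitive functional dependency: consider the relation schema S(Sno, Sdept, Mname).

If I know a student's number Sno, I know the student's department Sdept (in principle a student belongs to exactly one department).

If I know a department Sdept, I know the name of its head, Mname (a department has exactly one head).

So knowing Sno also tells me Mname. The dependency is not direct, however: Sdept acts as the bridge that connects the two (Sno -> Sdept and Sdept -> Mname).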
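The "bridge" reasoning is what an attribute-closure computation does. The sketch below is a standard textbook closure loop, written here as an illustration rather than taken from the original notes; the function name `closure` is an assumption.

```python
def closure(attrs, fds):
    """Return every attribute determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If we already know the whole left side, we learn the right side.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

# The two direct dependencies from the example above
fds = [(("Sno",), ("Sdept",)), (("Sdept",), ("Mname",))]

from_sno = closure({"Sno"}, fds)
from_sdept = closure({"Sdept"}, fds)
print(sorted(from_sno))    # Mname is reached through Sdept
print(sorted(from_sdept))  # Sdept does not determine Sno
```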

Second Normal Form (2NF):

A relation is in second normal form if it is in 1NF and every non-prime attribute is fully functionally dependent on every candidate key.

The goal of 2NF is to move non-prime attributes that depend on only part of a candidate key into other tables.

上面这张表不满足第二范式：因为非主属性(如Sname)在主键(Sno，Bno)上不是完全函数依赖。

我们可以通过拆分表的方式来实现第二范式。

我们可以将上面这张表进行分解来满足第二范式：

看起来我们需要的两张表占用了很多的空间，但是实际上我们节约了更多得空间，并且避免了不必要的重复。

Third Normal Form (3NF): even after the steps above, a problem remains. The dependency of salary on Sno has a special character: it is indirect (transitive), because it arises through a third, non-key attribute.

为了避免出现传递依赖性，我们就将引入第三范式。

我们在满足1NF，2NF的基础上，我们可以对表进行拆分，来满足第三范式。

我们可以观察上面两张表：Loan已经满足第三范式。但是StaffBorrower2并不满足。

我们将第一张进行拆分：

以上的表是满足3NF。

3NF解决了大多数问题，但是有一些罕见的异常，这种情况仍然会出现(尤其是当候选关键字重叠时)。

(3NF has solved most of the problems, but there are some rare exceptions, which will still occur (especially when candidate key overlaps).)

因此，我们还有一些范式来解决这些问题。

例如：BCNF

通常认为BCNF是修正的第三范式，有时也称为扩充的第三范式。

一个满足BCNF的关系模式有：

所有非主属性都完全函数依赖于每个候选码

所有的主属性都完全函数依赖于每个不包含它的候选码

没有任何属性完全函数依赖于非码的任何一组属性

BCNF范式排除了：

任何属性(包括主属性和非主属性)对候选码的部分依赖和传递依赖

主属性之间的传递依赖。

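Normalisation is not always the best choice, however.

Performance is one example:

– If a particular query is slow, it may be because it has to perform several operations to join normalised tables together.

– Consider maintaining the query result as a relation in its own right (accepting the functional dependencies and duplication this introduces), especially if the query is used frequently.

– The same applies to routine reports and commonly used calculated values (group subtotals or expressions).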

第六章：PHP

1.PHP 是一种创建动态交互性站点的强有力的服务器端（运行在服务器端的脚本语言）脚本语言。

PHP 是免费的，并且使用非常广泛。同时，对于像微软 ASP 这样的竞争者来说，PHP 无疑是另一种高效率的选项。

Web发展历史：Web 1.0：“只读”的信息展示平台

Web 2.0 ：“互动”的内容生产网络

Web3：“去中心”的个性化环境

3.静态网站的访问流程：

动态网站的访问流程：

语法：

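If the closing tag ?> is written, any whitespace after it is sent to the client as output, which adds unnecessary overhead during transmission.

For this reason, the closing tag is usually omitted in files that contain only PHP.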

变量：

预定义变量：

可变变量：

变量的传值：

值传递：

引用传递：

常量（不需要美元符号定义）：

特殊常量：

系统常量：

魔术常量：

数据类型：

字符串类型的转换：

类型判断：

8.进制

9.浮点数（不是精确的）

运算符：

连接运算符：

错误抑制符：

自操作：

位运算符：

异或：有且仅有一个为true，才能为true。

数据运算符：

三元运算符：

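The expression expr1 ? expr2 : expr3 evaluates to expr2 when expr1 evaluates to TRUE, and to expr3 when expr1 evaluates to FALSE.

Since PHP 5.3 the middle part of the ternary operator can be omitted. The expression expr1 ?: expr3 returns expr1 when expr1 evaluates to TRUE, and expr3 otherwise.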

组合比较符(PHP7+)：

同时还有一些HTML的表单与数组。实战练习。这里不再讲述。

第七章：关系代数与三级结构模式

(Relational algebra and ANSI-SPARC)

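Relational algebra: relational algebra is a theoretical language.

– It is the theoretical foundation of query languages such as SQL.

– It contains operators that work on one or more relations.

– These operators give us ways of building a new relation from given ones.

– In some respects it resembles ordinary arithmetic (which is usually called algebra when it deals with variables rather than explicit numbers).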

例如，对于数字，我们可以这样写：

运算符：

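In ordinary arithmetic we can use the operators plus, minus, times and divide. These operate on two numbers and produce one number as the result: they are binary operators.

"Minus" also has another meaning, as in -7. Here the operator works on one number and produces one number as the result, so it is a unary operator.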

关系运算符：并（Union）、差（Difference）、投影（Projection）、笛卡尔积（Cartesian Product）、选择（Restriction）为五个基本操作，交（Intersection）、连接（Join）、除（Division）等为附加操作，附加操作可以用五个基本操作表示。

Restriction (also called selection): restriction selects from relation R the tuples for which the logical expression F is true. It operates on rows.

写作：RESTRICT R TO C

举例:列出薪金高于20，000英镑的所有员工

投影：投影操作主要是从列的角度进行运算，但投影成功之后不仅取消了原关系中的某些列，而且还可能取消某些元组（避免重复行）。

写作：PROJECT ColumnList FROM R

举例：为所有员工制作一份工资清单，只显示Sno、姓名和薪资详细信息。

列出组织内的职位：–请注意，结果的行数比源关系少。为什么？
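Restriction and projection can be mimicked on a list of rows. The sketch below uses invented staff data and hypothetical helper names (`restrict`, `project`); it is not the notation of any particular DBMS. It also answers the question above: projection removes duplicate rows, so listing positions yields fewer rows than the source relation.

```python
def restrict(relation, predicate):
    """Keep only the rows for which the predicate is true (row-wise)."""
    return [row for row in relation if predicate(row)]

def project(relation, columns):
    """Keep only the named columns and drop duplicate rows (column-wise)."""
    seen = set()
    result = []
    for row in relation:
        values = tuple(row[c] for c in columns)
        if values not in seen:
            seen.add(values)
            result.append(dict(zip(columns, values)))
    return result

# Invented sample relation
staff = [
    {"Sno": "SL21", "Name": "White", "Position": "Manager", "Salary": 30000},
    {"Sno": "SG37", "Name": "Beech", "Position": "Assistant", "Salary": 12000},
    {"Sno": "SG14", "Name": "Ford", "Position": "Assistant", "Salary": 18000},
]

well_paid = restrict(staff, lambda r: r["Salary"] > 20000)
payroll = project(staff, ["Sno", "Name", "Salary"])
positions = project(staff, ["Position"])

print([r["Name"] for r in well_paid])
print(len(payroll), len(positions))
```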

Union: the union of two relations R and S is obtained by combining their rows into a single relation and eliminating duplicate rows.

写作：

举例：查询:构造一个包含所有城市的列表，这些城市要么有分行，要么有财产。

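Union compatibility:

– The number of columns and the order of the columns must be the same in both relations.

– The data types of corresponding columns may differ but must be compatible.

– Corresponding columns must represent the same attribute.

For example, we cannot form the union of Branch and Property: both have 5 attributes, so the column count is not the problem, but the corresponding attributes do not match (for example, postcode versus rent).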

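Difference: the difference operator defines a relation consisting of the rows that are in relation R but not in relation S.

Written as:

Example query: construct a list of all cities that have a branch office but no property.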
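Python sets behave like relations for these two operators. In this sketch a relation is a pair of column names and a set of row tuples; the city names and the helper names are invented for illustration, and the compatibility check is reduced to comparing column lists.

```python
def check_compatible(r, s):
    """Minimal union-compatibility check: same columns in the same order."""
    if r[0] != s[0]:
        raise ValueError("relations are not union-compatible")

def union(r, s):
    check_compatible(r, s)
    return (r[0], r[1] | s[1])   # set union removes duplicate rows

def difference(r, s):
    check_compatible(r, s)
    return (r[0], r[1] - s[1])   # rows in r that are not in s

# Invented data: cities with a branch, cities with a property
branch_cities = (("City",), {("London",), ("Aberdeen",), ("Glasgow",)})
property_cities = (("City",), {("London",), ("Glasgow",), ("Bristol",)})

either = union(branch_cities, property_cities)
branch_only = difference(branch_cities, property_cities)

print(sorted(either[1]))
print(sorted(branch_only[1]))
```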

笛卡尔乘积：令A和B是任意两个集合，若序偶的第一个成员是A的元素，第二个成员是B的元素，所有这样的序偶集合，称为集合A和B的笛卡尔乘积或直积，记做A X B。

写作：

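To bring related data together, we often need to combine the rows of one relation with the corresponding rows of another.

Because the two input relations may have attributes with the same name, we must avoid ending up with several identically named attributes in the result.

– Thus, in the previous example, we end up with STAFF.Dno and DEPT.Dno.

– We also have STAFF.Name and DEPT.Name.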

举例：列出至少看过一处房产的房客的名字，以及相关的物业编号和任何注释。

To answer this query, we need the names from the renter relation combined with the property numbers and comments from the viewing relation.

六，使用笛卡尔积：

下面结果有什么问题：

如果要解决上面这个问题的话：使用连接操作符。

The effect of the restriction is to eliminate the rows that were formed by combining unrelated rows from the two original tables.

最后，投影会给我们提供我们最初寻找的数据:

(Finally, a projection will give us the data we originally sought:)

Join: commonly used joins in databases include the natural join, the equi-join, the left join and the right join.

Natural join: the natural join operator is a combination of Cartesian product, restriction and projection.

We denote the natural join of two relations R and S as follows:

In effect, the natural join performs the following steps. It first forms the Cartesian product of R and S. It then restricts the result to the rows in which the common attributes of R and S have the same value (usually the common attributes are a primary key and a foreign key). Finally, it applies projection so that each common attribute of R and S appears only once in the final result.
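The three steps of the natural join can be written out directly. This is a deliberately naive sketch on invented STAFF and DEPT rows (real systems use far more efficient join algorithms), and the department name column is called DName here so that Dno is the only common attribute.

```python
from itertools import product

def natural_join(r, s):
    """Cartesian product, then restriction on common attributes, then projection."""
    if not r or not s:
        return []
    common = [c for c in r[0] if c in s[0]]
    result = []
    for a, b in product(r, s):                    # step 1: Cartesian product
        if all(a[c] == b[c] for c in common):     # step 2: restriction
            result.append({**a, **b})             # step 3: common columns kept once
    return result

# Invented sample data
staff = [
    {"Sno": "S1", "Name": "Smith", "Dno": "D1"},
    {"Sno": "S2", "Name": "Jones", "Dno": "D2"},
]
dept = [
    {"Dno": "D1", "DName": "Sales"},
    {"Dno": "D2", "DName": "Finance"},
]

joined = natural_join(staff, dept)
for row in joined:
    print(row)
```

The product has 4 rows; the restriction keeps the 2 rows whose Dno values match, and each result row carries Dno only once.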

举例：

三级结构模式(ANSI-SPARC):

The architecture of most commercial DBMSs is based on the ANSI-SPARC architecture.
The three levels are the conceptual schema (Schema), the external schema (External Schema) and the internal schema (Internal Schema).

The objective of the three-level architecture is to separate the users' view(s) of the database from the way it is physically represented. This is desirable for the following reasons:

-它允许独立定制的用户视图。

(It allows independent customised user views.)

每个用户应该能够访问相同的数据，但拥有不同的数据的定制视图。这些应该是独立的。

(Each user should be able to access the same data, but have a different customised view of the data. These should be independent: changes to one view should not affect others.)

-它对用户隐藏了物理存储的细节。

(It hides the physical storage details from users.)

用户不必在乎理物理数据库存储细节。他们可以处理数据本身，而不必担心它的物理存储方式。

(Users should not have to deal with physical database storage details. They should be allowed to work with the data itself, without concern for how it is physically stored.)

DBA应该能够更改数据库的存储结构，但不影响用户的视图。

(The database administrator should be able to change the database storage structures without affecting the users’views.)

The internal structure of the database should be unaffected by changes to the physical aspects of storage.

数据库管理员能够更改概念上的或数据库的全局结构，而不影响用户（用户的视图不会改变）。

(The database administrator should be able to change the conceptual or global structure of the database without affecting the users.)

下面将介绍三个模式

外模式(The External Level):

-数据库用户的数据视图 (The external level represents the user’s view of the database.)

-它描述了数据库中与特定用户相关的部分。 ( It describes the part of the database relevant to a particular user.)

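– For example, a large organisation may have finance and stock control departments.

– Finance staff will not usually look at stock details, because they are concerned with accounting matters.

– Staff in each department therefore need a different user interface to the information stored in the database.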

-视图可以提供相同数据的不同表示。(Views may provide different representations of the same data.)

For example, some users may view dates as (day/month/year) while others prefer (year/month/day).

-一些视图可能包含导出或计算的数据。(Some views might include derived or calculated data.)

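– For example, a person's age can be calculated from the date of birth rather than stored, because a stored age would need updating every year.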
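Both ideas (a different representation of the same data, and derived data) can be tried with a view. The sketch below uses Python's built-in sqlite3 module with an in-memory database; the table, view and column names are invented, and the age formula is one simple way to derive whole years in SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (name TEXT, date_of_birth TEXT)")
conn.execute("INSERT INTO person VALUES ('Ann', '2000-06-15')")

# The view stores nothing: it re-formats the date and derives the age on demand.
conn.execute("""
    CREATE VIEW person_view AS
    SELECT name,
           strftime('%d/%m/%Y', date_of_birth) AS dob_dmy,
           CAST(strftime('%Y', 'now') AS INTEGER)
             - CAST(strftime('%Y', date_of_birth) AS INTEGER)
             - (strftime('%m-%d', 'now') < strftime('%m-%d', date_of_birth)) AS age
    FROM person
""")

name, dob_dmy, age = conn.execute(
    "SELECT name, dob_dmy, age FROM person_view"
).fetchone()
print(name, dob_dmy, age)
```

The base table keeps the date in year-month-day form, while users of the view see day/month/year and an age that is always current.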

外模式的用途：①保证数据库安全性的一个有力措施 ②每个用户只能看见和访问所对应的外模式中的数据.

(Uses of external schema: ① A powerful measure to ensure database security ② Each user can only see and access the data in the corresponding external schema.)

概念模式(The Conceptual Level):

1. The conceptual level describes what data is stored in the database and the relationships among the data. It is the description of the logical structure and characteristics of all data in the database.

2.概念模式代表: –所有实体、它们的属性以及它们的关系。

–数据的约束。

–安全和完整性信息。

(– All entities, their attributes, and their relationships.

– The constraints on the data. – Security and integrity information.)

3.概念级别的描述不得包含任何与物理存储相关的详细信息。

(The description of the conceptual level must not contain any storagedependent details)

4.所有用户的公共数据视图(Public data view of all users)

5.是数据库系统模式结构的中间层.

内模式(The Internal Level):

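The internal level covers the physical representation of the database on the computer (and may be specified in some programming language).

It describes how the data is stored in the database in terms of particular data structures and file organisations.

The internal level is concerned with:

– Allocating storage space for data and indexes.

– Describing the forms that records will take when stored.

– Record placement: assembling records into files.

– Data compression, security and encryption techniques.

The internal level interfaces with the operating system to place data on storage devices, build indexes, retrieve data and so on.

Below the internal level is the physical level, which is managed by the operating system under the direction of the DBMS. It deals with the physical storage of data on devices such as disks.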

级别之间的差异：

九：数据库模式:数据库的总体描述称为数据库模式。

-There may be many external schemas for a given database.

-There is only one conceptual schema per database.

-There is only one internal schema per database.

十,模式之间的映射:

The DBMS is responsible for mapping between the three types of schema (that is, how they actually correspond to one another).

It must also check the schemas for consistency: each external schema must be derivable from the conceptual schema.

The conceptual/internal mapping enables the DBMS to find the actual record, or combination of records, in physical storage that makes up a logical record in the conceptual schema.

数据独立性:ANSI-SPARC体系结构的一个主要目标是提供数据独立性.

(A major objective of the ANSI-SPARC architecture is to provide data independence)

数据独立性有两种:1.逻辑数据独立性2.物理数据独立性

Physical data independence means that users' application programs are independent of how the data is stored on disk: the conceptual schema is immune to changes in the internal schema.

Logical data independence means that users' application programs are independent of the logical structure of the database: external schemas are immune to changes in the conceptual schema, so user programs can stay unchanged when the logical structure changes.

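12. Database languages: a DBMS usually provides a data sublanguage for working with the database and its schemas. The data sublanguage has two parts: a data definition language (DDL) and a data manipulation language (DML).

The DDL is used to specify the database schema; the DML is used to update the database and to retrieve information from it.

DDL: the DDL is a descriptive language that lets users describe and name the entities required and the relationships that may exist between different entities.

Different databases may have different DDLs.

DML: the DML is a language that provides a set of operations for manipulating the data held in the database.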

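13. Data models: a database schema is usually written in the data definition language of a particular DBMS. A data model is an integrated collection of concepts for describing the data in an organisation, the relationships between the data, and the constraints on the data.

A data model has three components:

– A structural part, consisting of a set of rules according to which databases can be constructed.

– A manipulative part, defining the types of operation that are allowed on the data.

– A set of integrity rules, which ensures that the stored data is accurate.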

Examples: -Network Model -Hierarchical Model(层次模型) -Relational Model • (Entity-Relationship Model) –Object-Oriented Model –Object-Relational Model

第八章：数据库管理系统与数据法律

( Database management systems and data law)

1.数据库管理(Database Administration):几种特殊的角色:

数据管理员(Data Administrator (DA)

数据库管理员(DBA)

数据字典 (Data Dictionary)

2.数据管理员(Data Administrator (DA)与数据库管理员(DBA):

数据库管理员和数据管理员、两者都有具体的责任，相辅相成。DA具有更具战略性和管理性的角色。 DBA的技术性更强。DA将负责整个组织的信息战略(即超越各种子系统，如DBMS)

并将关注大规模开发(例如，在平台之间移动子系统、开发新的子系统、确保弹性、开发和维护标准、政策和程序等)。(• The DA has a more strategic and managerial role. • The DBA has a more technical role. • The DA will be responsible for the Information Strategy of the organisation as a whole (i.e. transcending the various subsystems, such as a DBMS) • and will be concerned with large-scale developments (e.g. moving subsystems between platforms, developing new subsystems, ensuring resilience, developing & maintaining standards, policies & procedures etc.)

数据管理员:在大多数组织中(可能在任何规模的组织中)，许多用户将共享数据。共享数据对核心业务至关重要。(the shared data will be crucial to the core business.)

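Data may be partitioned across different systems. In a university, for example: finance, student records, recruitment and admissions, vacation lettings, estates and buildings, payroll, personnel.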

DA与DBA是至关重要的。

Database administrator: in any DBMS installation we would expect to find a DBA.

- The DBA is responsible for the orderly running of the DBMS.

For example, an "instance" of MySQL is running all the time (even when no user is logged on to it). Users' data is not stored in separate files but in common data areas, and is backed up and journaled (logged) together. So at the very least there must be someone who makes sure that all this happens properly.

-DBA角色负责管理特定平台上的特定子系统(The DBA role is concerned with managing a particular (sub)system on a particular platform)

-DBA 将在系统设置期间和之后的系统运行期间扮演角色.(The DBA will have roles both during the setting-up of a system, and, later, while it is running.)

- Together with the DA, the DBA will:

– participate in deciding the information content of the database;

– write the conceptual specification ("conceptual schema");

– decide how the data should be stored, using the facilities of the DBMS to provide the mapping from the logical to the physical.

- The DBA will also:

– write "sub-schemas" for user views;

– document the views;

– allocate ownership rights and duties;

– with the DA, adjudicate between different interests;

– use the DBMS management tools to monitor, tune, reorganise, protect, back up and reload.

数据库管理员使用的管理工具:任何成熟的DBMS都将提供工具来: 从其他格式的文件中批量加载数据,重构数据:例如，将数据分布到几个站点以可配置和可重构的方式提供对数据的差异访问,维护动态备份和恢复设施，以便能够快速恢复平台崩溃的DBMS (这些设施可能是自动的，但可由数据库管理员配置) 允许访问有关DBMS、其内容、其用户及其性能的数据(通过数据字典) 重新调整参数以提高性能。(bulk load data from files in other formats • restructure data: for example to distribute data across several sites • provide differential access to data in ways that can be configured and restructured • maintain dynamic backup and restore facilities to enable rapid recovery of a DBMS whose platform crashes • (these facilities may be automatic but configurable by the DBA) • give access to data about the DBMS, its contents, its users and its performance (via the Data Dictionary) • retune parameters to improve performance)

数据字典:DBMS必须提供允许用户找出存储项(元数据)属性的工具。DBMS本身的组件也可以使用这些工具。这些工具统称为数据字典(或系统目录)。数据字典必须保存有关存储在数据库中的表、视图、表单、查询和报表的信息。例如，它存储有关名称、类型、大小和适用于数据库中每个表的约束的信息。它还必须存储系统用户的信息。例如，它存储授权用户的姓名和详细信息，包括每个用户可以访问数据库的哪些区域的信息 其中一些信息只有数据库管理员可以访问。(A DBMS must provide facilities to allow users to find out the properties of the stored items (metadata). • These facilities may also be used by components of the DBMS itself. • Collectively, these facilities are called the Data Dictionary (or the System Catalogue). • The Data Dictionary must hold information about the tables, views, forms, queries, and reports stored in the database. • for example, it stores information about the name, types, sizes, and constraints that apply to each of the tables in the database • It must also store information about the users of the system. • For example, it stores the names and details of authorised users including information about which areas of the database are accessible to each user • Some of this information is accessible only to the DBA.)

In a relational DBMS, the data dictionary is often made to look as if it were made up of tables; these are in fact views. For example, in the Oracle RDBMS there are many views with names such as USER_xxx that give information relating to a particular user. For instance, USER_CATALOG shows all tables and views owned by the current user (CAT is a synonym for it) and has only two columns: name and type. In MySQL, INFORMATION_SCHEMA provides access to metadata. Take a look at this in your own database.
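The same idea can be tried without an Oracle or MySQL server. The sketch below uses SQLite through Python's sqlite3 module as a stand-in: SQLite exposes its catalogue through the `sqlite_master` table and the `PRAGMA table_info` command, which play a role comparable to USER_CATALOG or INFORMATION_SCHEMA. The table and view names are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staff (sno TEXT PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("CREATE VIEW staff_names AS SELECT name FROM staff")

# Ask the catalogue which tables and views exist (metadata, not user data).
catalog = conn.execute(
    "SELECT name, type FROM sqlite_master "
    "WHERE type IN ('table', 'view') ORDER BY name"
).fetchall()

# Ask the catalogue for the columns of one table.
# Each row is (cid, name, type, notnull, dflt_value, pk).
columns = conn.execute("PRAGMA table_info(staff)").fetchall()

print(catalog)
print([(c[1], c[2]) for c in columns])
```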

法律和专业问题(GDPR, DPA, FOI):

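Data protection legislation: both the DA and DBA roles carry specific responsibilities (legal, professional, security) relating to data handling. As an individual, you should know your rights over your own data. As an information professional, you may have to handle other people's information and may need to advise others on the law. The current legislation is the General Data Protection Regulation (GDPR, in force since 2018) and the Data Protection Act 2018, which differ considerably from the earlier legislation (the Data Protection Acts of 1998 and 1984). If you work for a public body, the Freedom of Information Act 2000 (UK) and the Freedom of Information (Scotland) Act 2002 may also apply.

GDPR definitions: first, some points about terminology.

– Data is not just computer data. Any systematic collection of records is included, including paper records that form part of a filing system.

– Personal data is information relating to an identified or identifiable individual, for example a name, a number, an IP address or a cookie. It must "relate to" the individual: it must concern them (consider the content of the information, the purpose for which you process it, and the likely impact or effect of that processing on the individual).

– Sensitive personal data includes racial or ethnic origin, political opinions, religious or philosophical beliefs, trade union membership, genetic data, biometric data, health data, and data about a natural person's sex life or sexual orientation.

– A data subject (natural person) is any living individual who is the subject of personal data.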

七项原则:

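1. Lawfulness, fairness and transparency 2. Purpose limitation 3. Data minimisation 4. Accuracy 5. Storage limitation 6. Integrity and confidentiality (security) 7. Accountability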

第八章:数据库管理系统(DBMS)

Recovery(恢复)

Distribution(分布式)

什么是分布式数据库：分布式+数据库。用一句话总结为：由多个独立实体组成，并且彼此通过网络进行互联的数据库。

Distributed database:

- Logically: an interrelated collection of shared data (and a description of this data).

- Physically: distributed over a computer network. It could span two computers in the same room or many computers across the world, under the control of more than one CPU.

Distributed DBMS: the software system that permits the management of the distributed database and makes the distribution transparent to users.

2.关键概念：

1.逻辑相关的共享数据的集合。

(Collection of logically-related shared data.)

数据被分割成碎片。

(Data split into fragments.)

片段可以被复制。

(Fragments may be replicated.)

分配给站点的片段/副本。

(Fragments/replicas allocated to sites.)

通过通信网络连接的站点。

(Sites linked by a communications network.)

每个站点的数据都在DBMS的控制之下。

(Data at each site is under control of a DBMS.)

数据库管理系统自主处理本地应用程序。

(DBMSs handle local applications autonomously)

每个DBMS至少参与一个全局应用程序。

(Each DBMS participates in at least one global application.)

3.分布式数据库管理系统的优点：

反映组织结构—数据库片段位于他们所涉及的部门。

(Reflects organizational structure — database fragments are located in the departments they relate to. )

本地自治—一个部门可以控制关于他们的数据。

(Local autonomy — a department can control the data about them (as they are the ones familiar with it.)

提高可用性—一个数据库系统中的故障只会影响一个片段，而不是整个数据库。

(Improved availability — a fault in one database system will only affect one fragment, instead of the entire database.)

Improved performance: data is located near the site of greatest demand, and the database systems themselves are parallelised, allowing load on the databases to be balanced among servers. (A high load on one module of the database will not affect the other modules in a distributed database.)

Economics: it costs less to create a network of smaller computers with the power of a single large computer.

模块化—系统可以从中修改、添加和删除分布式数据库，不影响其他模块(系统)。

(Modularity — systems can be modified, added and removed from the distributed database without affecting other modules (systems). )

4.分布式数据库管理系统的缺点：

Complexity: the DBAs must do extra work to ensure that the distributed nature of the system is transparent, and to maintain multiple disparate systems instead of one big one. Extra database design work is also needed to account for the disconnected nature of the database: for example, joins become prohibitively expensive when performed across multiple systems.

经济性—更高的复杂性和更广泛的基础架构意味着额外的劳动力成本。

(Economics — increased complexity and a more extensive infrastructure means extra labour costs. )

Security: remote database fragments must be secured, and because they are not centralised the remote sites must be secured as well. The infrastructure must also be secured (for example, by encrypting the network links between remote sites).

Difficult to maintain integrity: in a distributed database, enforcing integrity over a network may require too much of the network's resources to be feasible.

缺乏经验——分布式数据库很难使用，而且作为一个年轻的领域没有太多现成的正确实践的经验。

(Inexperience — distributed databases are difficult to work with, and as a young field there is not much readily available experience on proper practice. )

DDBMS是一个分布式数据库管理系统 –扩展通信服务。 –扩展数据字典。 –分布式查询处理。 –扩展的并发控制。 –扩展恢复服务。

(DDBMS is a Distributed Database Management System with – Extended communication services. – Extended Data Dictionary. – Distributed query processing. – Extended concurrency control. – Extended recovery services.Plus all the functionality you would expect from a centralized DBMS)

对比：

5.两种形式的DDB:

1.同构分布式数据库管理系统(Homogeneous DDBMS):–所有站点使用相同的DBMS产品。 –更易于设计和管理。 –该方法提供增量增长，并允许增加性能。

(– All sites use same DBMS product. – Much easier to design and manage. – Approach provides incremental growth and allows increased performance.)

2. Heterogeneous DDBMS:

– Sites may run different DBMS products, possibly with different underlying data models.

– This occurs when sites have implemented their own databases and integration is considered later.

– Translations are required to allow for different hardware, different DBMS products, or both.

– The typical solution is to use gateways.

各种特点:

Homogeneous distributed databases:

– Databases that are homogeneous were probably designed to be distributed from the start.

– The designers had the luxury of being able to choose a common DBMS.

– Foreign keys allow tables in one database to be linked to tables in another.

– The physical distribution is probably for security and service protection.

– There are far fewer problems to solve when building and maintaining such a database.

– This type of design is known as top-down.

Heterogeneous distributed databases:

– The different databases probably already exist and are in use by different users.

– It becomes desirable to join them into one apparently single database, without simply throwing them all away and starting again.

– They could each have a different DBMS.

– There will be more than one DD, DA and DBA.

– There may not be an easy way of matching fields in one database to the equivalent fields in another.

– Joining such databases is known as a bottom-up join.

Joining a DDBMS bottom-up: the job of the DDBMS is to sit above the several separate databases and provide the user with a single view, as if there were really only one database. It does not integrate the databases into one new single database; they stay where they are.

One method of achieving this is to use a gateway, which translates from one DBMS to another. There are also attempts to develop open specifications that would allow one DBMS to communicate with another without the need for a gateway.

A much more difficult problem is joining the data from more than one database: for example, the current account database has William Smith, but marketing communicates with Bill Smith.

Bottom-up joins have many difficulties because the databases were not designed to be joined. Each will have its own DBA, so there needs to be human integration too. The purpose for which a database was designed may not match the reasons it is needed once it has been joined (for example, city council and social services).

划分方案:分布式数据库的任何部分都称为片段对于同构分布式数据库，最常见的分区形式是 水平和垂直: 水平分区是指表中不同的完整记录存储在不同的位置。垂直分区是指不同的属性存储在不同的地点。(• Any part of a distributed database is called a fragment • For homogeneous DDBs, the most common forms of partitioning are horizontal and vertical: • Horizontal partitioning is where different full records from a table are stored in different locations. • Vertical partitioning is where different attributes are stored in different locations.)

水平分片：按行进行数据分割，数据被切割为一个个数据组，分散到不同节点上。

垂直分片：按列进行数据切割，一个数据表的模式（Schema）被切割为多个小的模式。
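The two fragmentation schemes, and the way the original table is rebuilt from the fragments, can be sketched with plain Python lists. The customer data and fragment names below are invented; the point is only that horizontal fragments are recombined with a union, while vertical fragments must each keep the key so they can be recombined with a join.

```python
# Invented sample table
customers = [
    {"id": 1, "name": "Ann", "city": "London", "balance": 120},
    {"id": 2, "name": "Bob", "city": "Glasgow", "balance": 45},
    {"id": 3, "name": "Cy", "city": "London", "balance": 300},
]

# Horizontal fragmentation: whole rows are split by a predicate.
london = [r for r in customers if r["city"] == "London"]
elsewhere = [r for r in customers if r["city"] != "London"]
rebuilt_h = sorted(london + elsewhere, key=lambda r: r["id"])   # union of fragments

# Vertical fragmentation: columns are split, and every fragment keeps the key.
contact = [{"id": r["id"], "name": r["name"], "city": r["city"]} for r in customers]
account = [{"id": r["id"], "balance": r["balance"]} for r in customers]
account_by_id = {r["id"]: r for r in account}
rebuilt_v = [{**c, **account_by_id[c["id"]]} for c in contact]  # join on id

print(rebuilt_h == customers, rebuilt_v == customers)
```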

垂直拆分指按照功能进行拆分，秉着“专业的人干专业的事”的原则，把一个复杂的功能拆分为多个单一、简单的功能，不同单一简单功能组合在一起，和未拆分前完成的功能是一样的。由于每个功能职责单一、简单，使得维护和变更都变得更简单、容易、安全，所以更易于产品版本的迭代，还能够快速的进行敏捷发布和上线。

优点：

1. 拆分后业务清晰，拆分规则明确。

(After the split, the business is clear and the split rules are clear.)

2.系统之间整合或扩展容易。

(Easy integration or expansion between systems.)

3. 数据维护简单。

(Simple data maintenance)

缺点：

1. 部分业务表无法join，只能通过接口方式解决，提高了系统复杂度。

(Some business tables cannot be join, but can only be solved by interface, which increases the complexity of the system.)

2. 受每种业务不同的限制存在单库性能瓶颈，不易数据扩展跟性能提高。

(Due to the different restrictions of each service, there is a single library performance bottleneck, which makes it difficult to expand data and improve performance.)

3. 事务处理复杂。

(Complex transaction processing)

水平拆分是指由于单一节点无法满足需求，需要扩展为多个节点，多个节点具有一致的功能，组成一个服务池，一个节点服务一部分请求量，所有节点共同处理大规模高并发的请求量。

优点：

1. 不存在单库大数据，高并发的性能瓶颈。

(There is no performance bottleneck of single-library big data and high concurrency.)

2. 对应用透明，应用端改造较少。

(It is transparent to the application, with less modification on the application side.)

3. 按照合理拆分规则拆分，join操作基本避免跨库。

(Split according to reasonable splitting rules, and the join operation basically avoids cross-library.)

4. 提高了系统的稳定性跟负载能力。

(Improve the stability and load capacity of the system.)

缺点：

1. 拆分规则难以抽象。

(Split rules are difficult to abstract.)

2. 分片事务一致性难以解决。

(Fragment transaction consistency is difficult to solve.)

3. 数据多次扩展难度跟维护量极大。

(Multiple data expansion is difficult and the amount of maintenance is huge.)

4. 跨库join性能较差。

(Cross-base join performance is poor)

Transactions(事务) and concurrency(并发)

事务：

我们知道许多数据库管理系统允许用户同时操作数据库，如果这些操作不受控制，它们会相互干扰其他，数据库可能会变得不一致。为了克服这个问题，DBMS实现了并发控制协议。

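Transaction management: a transaction is a user-defined sequence of database operations that are either all performed or not performed at all; it is an indivisible unit of work. In a relational database, a transaction can be a single SQL statement or a group of SQL statements. The transaction is the basic unit of both concurrency control and recovery, and a logical unit of work in the database.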

事务通常是以BEGIN TRANSACTION开始，以COMMIT或ROLLBACK结束。COMMIT表示提交，ROLLBACK表示回滚，在事务运行的过程中发生某种故障事务不能继续执行，系统就会将事务对数据库的已完成操作全部撤销，从而回滚到事务开始时的状态。
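A minimal sketch of BEGIN, COMMIT and ROLLBACK, using SQLite through Python's sqlite3 module. The account table, the CHECK constraint and the `transfer` helper are assumptions made for this illustration. The connection is opened with `isolation_level=None` so that the module does not start transactions implicitly and the BEGIN/COMMIT/ROLLBACK statements are issued by hand.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)  # we control transactions
conn.execute(
    "CREATE TABLE account (id INTEGER PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.execute("INSERT INTO account VALUES (1, 100), (2, 50)")

def transfer(conn, src, dst, amount):
    """Move money between accounts; either both updates happen or neither."""
    conn.execute("BEGIN")
    try:
        conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?",
                     (amount, dst))
        conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?",
                     (amount, src))
        conn.execute("COMMIT")
        return True
    except sqlite3.IntegrityError:
        # The debit violated the CHECK constraint: undo the credit as well.
        conn.execute("ROLLBACK")
        return False

first = transfer(conn, 1, 2, 30)     # succeeds
second = transfer(conn, 1, 2, 500)   # fails and is rolled back
balances = conn.execute("SELECT balance FROM account ORDER BY id").fetchall()
print(first, second, balances)
```

In the failed transfer the credit to account 2 had already been applied when the debit was rejected; the ROLLBACK returns both accounts to their state at BEGIN.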

3.事务的特性：

原子性（Atomicity）、一致性（Consistency）、隔离性（Isolation）、持续性（Durability），简称ACID

1.原子性：事务是数据库的逻辑工作单位，事务中包括的操作要么都做，要么都不做

2.一致性：事务执行的结果必须是使数据库从一个一致性状态变到另一个一致性状态。事务执行过程中出现故障则称这时的数据库处于不一致性状态。

3.隔离性：一个事务的执行不能被其他事务干扰，并发执行的各个事务之间不能互相干扰

4.持续性（永久性）：一个事务一旦提交，它对数据库中数据的改变就应该是永久性的。

事务的ACID特性可能遭到破坏的因素有：

(1) 多个事务并行运行时，不同事务的操作交叉执行

(2) 事务在运行过程中被强制停止

并发:

1.并发控制概述：事务是并发控制的基本单位，并发控制用于保证事务的隔离性和一致性。

事务读数据x一般记为R(x)，写数据x一般记为W(x)

如果不对并发操作进行正确调度，可能导致数据的不一致性问题，主要包括丢失修改、不可重复读和读“脏”数据。

-丢失修改：两个事务读入同一数据并修改，其中一个事务的修改会丢失

- Non-repeatable read: after transaction T1 reads some data, T2 updates it, so T1 cannot reproduce its earlier read result.

-读脏数据：“脏”数据指事务T1修改某一数据，并将其写回磁盘，事务T2读取同一数据后，T1由于某种原因被撤销，则T2读取到的数据就为“脏”数据，即不正确的数据。
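The lost update can be reproduced with a tiny schedule simulator. Everything here is invented for illustration (the item x, the starting value 100, and the two transactions that subtract 30 and add 50); each transaction reads x into a private copy with R(x) and later writes back its result with W(x).

```python
# What each transaction adds to the value it read (assumed for this example)
DELTA = {"T1": -30, "T2": 50}

def run(schedule):
    """Execute a schedule of (transaction, operation) steps on item x."""
    db = {"x": 100}
    local = {}                                     # each transaction's private copy
    for txn, op in schedule:
        if op == "R":
            local[txn] = db["x"]                   # R(x)
        else:
            db["x"] = local[txn] + DELTA[txn]      # W(x)
    return db["x"]

serial = [("T1", "R"), ("T1", "W"), ("T2", "R"), ("T2", "W")]
interleaved = [("T1", "R"), ("T2", "R"), ("T1", "W"), ("T2", "W")]

serial_result = run(serial)
interleaved_result = run(interleaved)
print(serial_result, interleaved_result)
```

In the interleaved schedule T2 writes a value computed from the old x, so T1's subtraction disappears: this is the lost update.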

举例：

脏读(dirty read):

在这个场景中，B希望取款500元而后又撤销了动作，而A往相同的账户中转账100元，就因为A事务读取了B事务尚未提交的数据，因而造成账户白白丢失了500元。在Oracle数据库中，不会发生脏读的情况。

不可重复读（unrepeatable read）:

假设A在取款事务的过程中，B往该账户转账100元，A两次读取账户的余额发生不一致。

幻象读（phantom read）：

假设银行系统在同一个事务中，两次统计存款账户的总金额，在两次统计过程中，刚好新增了一个存款账户，并存入100元，这时，两次统计的总金额将不一致：

第一类丢失更新：

A事务撤销时，把已经提交的B事务的更新数据覆盖了。这种错误可能造成很严重的问题，通过下面的账户取款转账就可以看出来：

第二类丢失更新 :

A事务覆盖B事务已经提交的数据，造成B事务所做操作丢失:

解决并发问题的途径是什么?

答案是：采取有效的隔离机制。

怎样实现事务的隔离呢？

隔离机制的实现必须使用锁。

锁的基本原理:

a.当一个事务访问某个数据库资源时，如果执行的是select语句，必须为资源加上共享锁，如果执行的是insert,update,delete语句，必须为资源加上排他锁，这些锁锁定正在被操作的资源。

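b. When a second transaction wants to access the same resource, it must likewise place a shared lock on it for a SELECT statement, or an exclusive lock for an INSERT, UPDATE or DELETE statement. It cannot always obtain the lock immediately, however. If the first transaction holds a shared lock, the second can obtain a shared lock at once; if it requests an exclusive lock, or if the first transaction holds an exclusive lock, the second must wait until the first transaction has finished.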
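The rule in a. and b. amounts to a small compatibility table. The sketch below encodes it for shared (S) and exclusive (X) locks only; the function name `can_grant` is hypothetical and no real lock manager is implied.

```python
# (lock already held, lock requested) -> can both be held at once?
COMPATIBLE = {
    ("S", "S"): True,    # many readers may share a resource
    ("S", "X"): False,   # a writer must wait for readers
    ("X", "S"): False,   # a reader must wait for the writer
    ("X", "X"): False,   # writers never share
}

def can_grant(requested, held):
    """A lock is granted only if it is compatible with every lock already held."""
    return all(COMPATIBLE[(h, requested)] for h in held)

print(can_grant("S", ["S", "S"]))  # another reader joins
print(can_grant("X", ["S"]))       # writer blocked by a reader
print(can_grant("S", ["X"]))       # reader blocked by a writer
print(can_grant("X", []))          # nothing held, so the lock is granted
```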

1.共享锁（s锁）

共享锁用于读取数据操作，它允许其他事务同时读取锁定的资源，但不允许其他事务更新

它。

2.排他锁（X锁）

Exclusive locks are used when data is modified. Other transactions can neither read nor modify a resource locked in this way.

3.更新锁（U锁）

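An update lock is used in the initial phase of an update operation to lock resources that may be modified, which avoids the deadlocks that can arise when shared locks are used for this purpose.

An update involves two steps:

1) While the rows matching the WHERE condition are being scanned, the statement is still a query, and it holds an update lock.

2) If a write will be performed, the lock is upgraded to an exclusive lock; otherwise it is converted to a shared lock.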

4.悲观锁

悲观锁是指假设并发更新冲突会发生，所以不管冲突是否真的发生，都会使用锁机制。悲观锁会完成以下功能：锁住读取的记录，防止其它事务读取和更新这些记录。其它事务会一直阻塞，直到这个事务结束.悲观锁是在使用了数据库的事务隔离功能的基础上，独享占用的资源，以此保证读取数据一致性，避免修改丢失。悲观锁可以使用Repeatable Read事务，它完全满足悲观锁的要求。

5.乐观锁

乐观锁不会锁住任何东西，也就是说，它不依赖数据库的事务机制，乐观锁完全是应用系统层面的东西。如果使用乐观锁，那么数据库就必须加版本字段，否则就只能比较所有字段，但因为浮点类型不能比较，所以实际上没有版本字段是不可行的。
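The version-column technique can be sketched with SQLite through Python's sqlite3 module. The table layout and the `read`/`write` helpers are invented for this example. The essential part is the `WHERE ... AND version = ?` condition: a writer whose copy is stale updates zero rows and learns that it has to re-read and retry.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER, version INTEGER)"
)
conn.execute("INSERT INTO account VALUES (1, 100, 1)")
conn.commit()

def read(conn, account_id):
    """Return (balance, version) as seen right now."""
    return conn.execute(
        "SELECT balance, version FROM account WHERE id = ?", (account_id,)
    ).fetchone()

def write(conn, account_id, new_balance, expected_version):
    """Update only if nobody changed the row since we read it."""
    cur = conn.execute(
        "UPDATE account SET balance = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (new_balance, account_id, expected_version),
    )
    conn.commit()
    return cur.rowcount == 1

# Two users read the same version, then both try to write.
bal_a, ver_a = read(conn, 1)
bal_b, ver_b = read(conn, 1)
ok_a = write(conn, 1, bal_a - 30, ver_a)   # first writer wins
ok_b = write(conn, 1, bal_b + 50, ver_b)   # stale version, rejected
final = read(conn, 1)
print(ok_a, ok_b, final)
```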

恢复：

数据转储

转储即DBA定期地将整个数据库复制到磁带或另一个磁盘上保存起来的过程。这些备用的数据称为后备副本。

转储又分为静态转储和动态转储：静态转储必须等待正在运行的用户事务结束才能进行；动态转储是指转储期间允许对数据库进行存取或修改，即转储和用户事务可以并发执行。

也可分为海量转储和增量转储：海量转储即每次转储全部数据库，增量转储即每次只转储上一次转储后更新的数据。

登记日志文件（Logging）

日志文件是用来记录事务对数据库的更新操作的文件。不同数据库系统采用的日志文件格式并不完全一样，主要有两种格式：以记录为单位的日志文件和以数据块为单位的日志文件

对以记录为单位的日志文件，日志文件中需要登记的内容包括：

各个事务的开始（BEGIN TRANSACTION）标记

各个事务的结束（COMMIT或ROLLBACK）标记

各个事务的所有更新操作

以上每一条内容记为一个日志记录（log record）

每个日志记录的内容主要包括：

事务标识（标明是哪个事务）

操作的类型（插入、删除或修改）

操作对象（记录内部标识）

更新前数据的旧值（对插入操作而言，此项为空值）

更新后数据的新值（对删除操作而言，此项为空值）

对以数据块为单位的日志文件，日志记录的内容包括事务标识和被更新的数据块。由于将更新前的整个块和更新后的整个块都放入日志文件中，操作的类型和操作对象等信息就不必放入日志记录中了。

日志文件的作用：

日志文件用于事务故障恢复和系统故障恢复，并协助后备副本进行介质故障恢复

具体作用如下：

事务故障恢复和系统故障恢复必须用日志文件。

在动态转储方式中必须建立日志文件，备份副本和日志文件结合起来才能有效地恢复数据库。

在静态转储方式中，也可以建立日志文件。

登记日志文件

为保证数据库是可恢复的，登记日志文件时必须遵循两条规则：

(1) 登记的次序严格按并发事务执行的时间次序

(2) 必须先写日志文件，后写数据库

如果先写了数据库修改，但是没有登记这个日志，那么中途运行故障就无法恢复这个修改了。

恢复策略

REDO：重做，正向扫描日志文件，对每个REDO事务重新执行日志文件登记的操作

UNDO: scan the log file backwards and perform the inverse of each update operation of every transaction to be undone.

COMMIT：提交，将事务中所有对数据库的更新写回到磁盘上的物理数据库中，事务正常结束

ROLLBACK：回滚，事务运行的过程中发生了某种故障，事务不能继续执行，系统将事务中对数据库的所有已完成操作全部撤销，回滚到事务开始时的状态

事务故障的恢复：

（1）反向扫描日志文件（即从最后向前扫描日志文件），查找该事务的更新操作

（2）对该事务的更新操作执行逆操作。（来得及或者未来得及写入数据库都没关系）

（3）继续反向扫描日志文件，查找该事务的其他更新操作，并做同样处理

（4）如此继续，直到读到该事务的开始标记

系统故障的恢复：

（1）正向扫描日志文件，找出在故障发生前已经提交的事务（这些事务既有BEGIN TRANSACTION记录，也有COMMIT记录），将其事务标记记入REDO队列；同时找出故障发生时尚未完成的事务（这些事务只有BEGIN TRANSACTION记录，无相应的COMMIT记录），将其事务标记记入UNDO队列

（2）对撤销队列中的各个事务执行UNDO操作

（3）对重做队列中的各个事务执行REDO操作

为什么要REDO？考虑已提交事务对数据库的更新可能还留在缓冲区没来得及写入数据库（磁盘）。
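The three steps of system-failure recovery can be simulated on a toy log. The log format, the item names and the disk contents below are invented; real DBMS logs are far richer. The example assumes that at the moment of the crash T1 had committed but its write was still in the buffer, while T2 had not committed but one of its writes had already reached disk.

```python
# Log records: (txn, "BEGIN"), (txn, "COMMIT") or (txn, "UPDATE", item, old, new)
log = [
    ("T1", "BEGIN"),
    ("T1", "UPDATE", "A", 100, 70),
    ("T2", "BEGIN"),
    ("T2", "UPDATE", "B", 50, 80),
    ("T1", "COMMIT"),
    ("T2", "UPDATE", "C", 10, 20),
    # crash here: T2 has no COMMIT record
]

def recover(log, disk):
    """Build the REDO and UNDO queues from the log, then repair the disk state."""
    committed = {rec[0] for rec in log if rec[1] == "COMMIT"}
    started = {rec[0] for rec in log if rec[1] == "BEGIN"}
    redo_queue = committed
    undo_queue = started - committed
    for rec in reversed(log):                      # UNDO: backward scan, restore old values
        if rec[1] == "UPDATE" and rec[0] in undo_queue:
            disk[rec[2]] = rec[3]
    for rec in log:                                # REDO: forward scan, reapply new values
        if rec[1] == "UPDATE" and rec[0] in redo_queue:
            disk[rec[2]] = rec[4]
    return redo_queue, undo_queue

# Disk at crash time: T1's write to A was not flushed, T2's write to B was.
disk = {"A": 100, "B": 80, "C": 10}
redo_queue, undo_queue = recover(log, disk)
print(redo_queue, undo_queue, disk)
```

After recovery the committed change to A is present and the uncommitted change to B is gone, whatever had or had not reached disk before the crash. This only works because the log record is always written before the corresponding database change.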

介质故障的恢复：

（1）装入最新的数据库后备副本，使数据库恢复到最近一次转储时的一致性状态

（2）装入相应的日志文件副本（转储结束时刻的日志文件副本），重做已完成的事务，即扫描日志文件找出需要重做和撤销的事务

在故障发生时还未完成的事务需要撤销，在检查点和故障点之间完成的事务需要重做，因为它们对数据库所做的修改在故障发生时可能还在缓冲区中。