XJTLU CSE103 revision note

/* The assessment generally covered the relational model (and of course relational algebra), ER modeling and SQL. Please do review the SQL part, since most MCQs are related to SQL. */

1. The relational Model

1.1 Terminologies

1.1.1 About a relation

Relation: a table with columns and rows.
Attributes: a named column of a relation.
Domain: the set of all allowable values for one or more attributes.
Tuple: a row of a relation.
Intension and extension: the structure of a relation + the domains and restrictions on possible values: intension;
the tuples: extension(state), changes over time
Degree: number of attributes.
Cardinality: number of tuples.

1.1.2 Database relations

Relation schema: the set of { A 1 : D 1 , A 2 : D 2 . . . . . . A n : D n } \{{A_1:D1,A_2:D_2......A_n:D_n}\} {A1:D1,A2:D2......An:Dn} is called a relation schema. A stands for attributes and D stands for domains.
a relation R defined by a relation schema S is a set of mappings from the attribute names to corresponding domains. R is the set of many tuples like: { A 1 : d 1 , A 2 : d 2 , . . . . . . , A n : d n } , s u c h   t h a t   d i ∈ D i \{A_1:d_1, A_2:d_2,......,A_n:d_n\}, such\ that\ d_i\in D_i {A1:d1,A2:d2,......,An:dn},such that diDi

When we write out a relation as a table, we usually write the attribute names as the colomn, and tuples’ attributes in rows, under the proper name and within the proper domain. Every row is referred to as a relation instance.

1.2 Properties of a relation

  • have a unique name
  • each cell of the relation contains one atomic (single) value (relations satisfy this are referred to as first normal form or normalized)
  • each attribute has a unique name
  • No duplicate tuples (methematically, a relation is not required to have no repeated tuples, however in databases people chose to disallow repeating to simplify the relational data model)
  • orders of attributes or tuples have no significance theoretically.

1.3 Relational keys—identifies each tuple in a relation

Keys that contain more than 1 attribute are often referred to as composite key

  • Superkey

A/A set of key(s) that uniquely identifies a tuple within a relation. may contain additional attributes (unnecessary) for identifying

  • Candicate key

A superkey such that no subsets of it could be superkey (a superkey that contains no unnecessary attributes)
Identifying candicate key can’t only search in the relation, an understanding and guarantee in real-world meaning is needed.

  • Primary key

The candicate key that is selected to be the identifying key within the relation.

  • Alternate key

The candicate keys that are not selected to be primary key.

  • Foreign key

An/A set of attribute(s) that matches the candicate key of some (possibly the same) relation.




Representing a relational schema:

The common convention for representing a relation schema is to give the name of the relation followed by the attribute names in parentheses. Normally, the primary key is underlined.

1.4 Integrity Constraints

Two principal integrity rules for relational models: entity integrity and referential integrity

1.4.1 Nulls

Null represents a value for an attribute that is currently unknown or is not applicable for this tuple.

Unknown/no values supplied yet/not applicable----to deal with incomplete or exceptional data
Nulls may raise implementation problems, since the relational model is based on first-order predicate calculus(一阶逻辑), while allowing nulls means that we have to work with higher-valued logic. (Incorporation of nulls)

1.4.2 Entity integrity (for base relation)

base relation means a relation that physically stored in the database(in contrast to a view).

For base relation, no attribute of a primary key can be null.

Reason:

If we allow a null for any part of a primary key, we are implying that not all the attributes are needed to distinguish between tuples, which contradicts the definition of the primary key.

1.4.3 Referential Integrity (for foreign keys)

Statement:

If a foreign key exists in a relation, either the foreign key value must match a candidate key value of some tuple in its home relation or the foreign key value must be wholly null.

foreign key must be contained in target (home) relation.

1.5 Views

Views in relational models are different from that in three-level architecture.

a view is a virtual or derived relation: a relation that does not necessarily exist in its own right, but may be dynamically derived from one or more base relations.

In other words, a view is a derived relation that doesn’t necessarily exist.

Base relation

Definition: a base relation is a named relation corresponding to an entity in the conceptual schema, whose tuples are physically stored in the database. (与conceptual schema的实体对应的relation)

View (dynamic, changes to base relations reflect immediately in view, vice versa)

The dynamic result of one or more relational operations operating on the base relations to produce another relation. A view is a virtual relation that does not necessarily exist in the database but can be produced upon request by a particular user, at the time of request.

A view is a relation that appears to the user to exist, and could be manupulated as if it is a base relation, but does not necessarily exist in storage (while base relations do).

The content of the view are defined upon queries.

2. Relational algebra1

The relational algebra is a theoretical language with operations that work on one or more relations to define another relation without changing the original relation(s).

Both operands and results are relations (closure: relations are closed under the algebra).

Relational algebra is a relation-at-a-time language in which all tuples are manipulated in one statement without looping.

2.1 Basic 5 operations: selection, projection, Cartesian product, union, set difference

  • Selection: σ p r e d i c a t e ( R ) \sigma_{predicate}(R) σpredicate(R)

Selection selects the tuples that meet the condition predicate.

  • Projection: Π a 1 , a 2 , . . . , a n ( R ) \Pi_{a_1,a_2,...,a_n}(R) Πa1,a2,...,an(R)

Projection defines a relation that contains a vertical subset of R, extracting the specified attributes and eliminating duplicates.

  • Union: R ∪ S R\cup S RS

Union defines a relation that contains all the tuples from both R and S, eliminating duplicated tuples.

Union compatible: two sets having their attributes all matched (not necessarily having same names, but have same domain)

two sets must be union-compatible to perform union.

  • Intersection: R ∩ S R\cap S RS

Intersection defines a relation that contains the tuples that are both in R and S.

Two sets must be union-compatible to perform intersection.

  • Set difference: R − S R-S RS

Set difference defines a relation consisting of the tuples that are in relation R but not in the relation S.

  • Cartesian product: R × S R\times S R×S

Cartesian product difines a relation that contains all possible pairs of tuples from the two relations. The degree is simply added.

The relations must be union-compatible when performing union, intersection and set difference operations.

2.2 Join operations:

Join operation is a selection over the Cartesian product. Join is one of the most difficult operations to implement efficiently.

  • Theta join ( θ \theta θ-join) R ⋈ F S = σ F ( R × S ) R\bowtie _FS=\sigma_F(R\times S) RFS=σF(R×S)

Theta join defines a relation that contains the tuples satisfying the predicate F from the cartesian product of R and S. Equijoin: when the predicate contains only =, the term equijoin is used.

  • Natural join R ⋈ S R\bowtie S RS

The degree of the result is d e g r e e R + d e g r e e S − d e g r e e c o m m o n _ a t t r i b u t e s degree_R+degree_S-degree_{common\_attributes} degreeR+degreeSdegreecommon_attributes.

Natural join is an equijoin over all common attributes.

  • Outer join
    R \leftouterjoin S= ( R ⋈ S ) ∪ R (R\bowtie S)\cup R (RS)R(the missing values in second relation is set to null, it can’t be unioned, just for understanding)
    The outer join preserves tuples that would have been lost by other types of join.

  • Semijoin: R   ▹ F S = Π A ( R ⋈ F S ) R\ \triangleright_{F}S=\Pi_A(R\bowtie_FS) R FS=ΠA(RFS)

The semijoin defines a relation that contains the tuples of R that participates R ⋈ F S R\bowtie_FS RFS .

  • Division operation: R ÷ S R\div S R÷S

Suppose that relation R has attributes set A, relation S has attributes set B, where B ⊆ A B\subseteq A BA.

R ÷ S = Π A − B R\div S=\Pi_{A-B} R÷S=ΠAB(The set of tuples in R that matches with every tuple in S,R中与S里所有属性均有对应的tuples2)

2.3 Aggregation and Grouping Operations

  • Aggregate operations applies the aggregate function list to the relation.

main aggregate functions:

COUNT—returns the number of values in the associated attribute对应attribute有的值的总数

SUM—returns the sum of the values of the associated attribute

AVG—returns the average of the values in the associated attribute

MIN—returns the minimum of the values in the associated attribute

MAX—returns the maximum of the values in the associated attribute

  • Grouping operation

Groups the relation by selected attributes, and then applies the aggregate operation, the result of which are pot into the result relation.

3. Entity-Relationship Modeling

3.1 Concepts of the ER model

  • Entity:

Entity type: a group of objects with same properties, having independent existence.

Entity occurence: Uniquely identifiable object of an entity type.

Strong/Weak Entity types: not existence-dependent on some other entity types.

  • Attributes:

Attribute: property of an entity or a relationship type

Simple/Composite attributes: simple attributes are attributes composed of a single component with an independent existence(cannot be broken down into smaller components). Composite attributes consists of multiple components.

Derived attributes: an attribute whose value can be derived from other attributes.

Attribute domain: set of allowable values for one or more attributes

  • Relationship:

Relationship type: set of meaningful associations among entity types.

Relationship occurrence: Uniquely identifiable association, which includes one occurrence from each participating entity type.

Degree of a relationship: number of participating entities in relationship.

Recursive relationship: same entity type participates more than once in different roles.

3.2 Structural Constraints

  • multiplicity:

Multiplicity is the range of possible occurrences of an entity type that may relate to a single occurrence of the another entity type through a particular relationship.

According to multiplicity, binary relationships are generally referred to as being: one-to-one/one-to-many/many-to-many relationships.

Cardinality: describes the maximum number of possible relationship occurrences for an entity participating in a given relationship type.

Participation: determines whether all or only some entity occurrences participate in a relationship.

3.3 Problems with ER Models–connection traps

  • fan traps

    在这里插入图片描述

    Solution: restructure the model to resolve the fan trap

  • chasm trap:

    missing value

在这里插入图片描述

​ Solution: add another relationship to direct the entities.

3.4 Enhenced ER Modeling: specialization&generalization

Object-oriented.

Inheritance: entity in subclass can inherit some attributes from the superclass, and have some subclass-specific attributes at the same time.

Specialization: the process of maximizing difference between members of an entity by identifying their distinguishing characteristics.

Generalization: the process of minimizing differences between entities by identifying their common characteristics.

Two constraints that apply:

  • Participation constraints

Determines whether every member in superclass must participate as a member of a subclass (是否所有attribute都必须继承) mandatory/optional

  • Disjoint constraint

Determines whether an entity occurrence can be a member of more than one occurrences.

disjoint(only one)/nondisjoint

3.5 Normalization–designing relational database

Normalization is a technique for producing a set of suitable relations that support the data requirements of an enterprise.

  • characteristics of a suitable set of relations:
    • minimal number of attributes necessary for the requirement
    • attributes of one relation have close logical relationship
    • minimal redundancy

Data redundancy may lead to update anomalies, including insertion, deletion and modification.

3.5.1 Functional dependency

—relationship between attributes

e.g. A and B are attributes of relation R, B is functionally dependent on A (denoted A->B) if each value of A in R is associated with exactly one value of B in R.

  • Determinant: refers to the attribute (or group of attributes) that is on the left-hand side of the arrow of functional dependency

  • Full functional dependency: determinants have the minimal number of attributes to maintain functional dependency. (any subset of the determinant cannnot maintain functional dependency)

    与partial functional dependency对应

  • Functional dependency is transitive

3.5.2 Identifying functional dependencies and primary key

if the meaning of the attributes and the relationship between them is well understood, it’s easy to identify these functional dependencies. Otherwise, use sample data and experience.

  • Identify candidate keys

    All attributes that are not part of a candidate key should be functionally dependent on that key.

3.5.3 The process of normalization

  • UNF (Unnormalized form)

    UNF is a table that contains one or more repeating groups.

    Transform the data from the source into table format with columns and rows.

  • 1NF (First Normal Form) every cell has only one value

    1NF is a form that every cell contains only one value.

    Choose a group of attributes as keys, then remove the repeating gtoups by creating new attributes or columns

  • 2NF: 1NF, and every non-primary-key attribute is fully functionally dependent on the primary key

    remove partial dependencies

    identify the primary key, identify the functional dependencies.

    If partial dependencies exist, replace them in a new relation along with a copy of their dominant

  • 3NF: no non-primary-key attribute is transitively dependent on the primary key.

    remove transitive dependencies

4. SQL (Structured Query Language)

Please refer to W3Schools SQL tutorials.

5. Finally, about the exam

The exam contains only MCQs. Be familiar with the SQL, and hopefully you can come out of the exam hall by an hour.


  1. 2019-2020 syllabus did not cover relational calculus. ↩︎

  2. 此处确实较难理解,建议观看:Division in relational algebra ↩︎

  • 0
    点赞
  • 4
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值