数据建模&数据库设计概览

软件系统分析与设计专栏

Data Modeling

● A technique for organizing and documenting a system’s data. Sometimes
called database modeling.
Entity Relationship Diagram (ERD) - a data model utilizing several notations
to depict data in terms of the entities and relationships described by that data.

An ERD Example - Requirement

● We need to store data about STUDENT, COURSE, LECTURE.
● The value of STUDENT ID uniquely identifies one and only one STUDENT. The value of COURSE ID uniquely identifies one and only one COURSE. The value of LECTURE ID uniquely identifies one and only one LECTURE.
● For a student we need to know STUDENT NAME. For a COURSE we need to know COURSE NAME, COURSE DESCRIPTION and COURSE DEPARTMENT. For a LECTURE we need to know LECTURE NAME, LECTURE DEPARTMENT.
● A STUDENT applies zero, one or more COURSE APPLICATIONS.
● A COURSE is applied by one or more COURSE APPLICATIONS.
● A COURSE APPLICATION identifies a single COURSE on a single STUDENT. The STUDENT ID identifies the STUDENT, and the COURSE ID identifies the COURSE. Together, they identify one and only one COURSE APPLICATION.
● A LECTURE teaches zero, one or more COURSES.
● A COURSE is taught by exactly one LECTURE.

Entity - something about which the business needs to store data. A class of persons, places, objects, events or concepts about which we need to capture
and store data.
Attribute - a descriptive property or characteristic of an entity. Synonyms include element, property and field.
Relationship - a natural business association between one or more entities.
在这里插入图片描述

Attribute Domains

Logical Data TypeLogical Business MeaningDomainExamples
NUMBERAny number, real or integer.For integers, specify the range.
For real numbers, specify the range and precision.
{10-99}
{1.000-799.999}
TEXTA string of characters, inclusive of numbers. When numbers are included in a TEXT attribute, it means that we do not expect to perform arithmetic or comparisons with those numbers.Maximum size of attribute. Actual values usually infinite; however, users may specify certain narrative restrictions.Text(30)
MEMOSame as TEXT but of an indeterminate size. Some business systems require the ability to attach potentially lengthy notes to a give database record.
DATEAny date in any format.Variation on the MMDDYYYY format.MMDDYYYY
MMYYYY
TIMEAny time in any format.For AM/PM times: HHMMT
For military (24-hour times): HHMM
HHMMT
HHMM
YES/NOAn attribute that can assume only one of these two values.{YES, NO}{YES, NO}
{ON, OFF}
VALUE SETA finite set of values. In most cases, a coding scheme would be established (e.g.,FR=Freshman, SO=Sophomore, JR=Junior, SR=Senior).{value#1, value#2,…value#n}
{table of codes and meanings}
{M=Male, F=Female}
IMAGEAny picture or image.
Default ValueInterpretation Examples
A legal value from the domainFor an instance of the attribute, if the user does not specify a value, then use this value.
NONE or NULLFor an instance of the attribute, if the user does not specify a value, then leave it blank.
Required or NOT NULLFor an instance of the attribute, require that the user enter a legal value from the domain. (This is used when no value in the domain is common enough to be a default but some value must be entered.)

Identification

● Candidate Key - one of a number keys that may server as the primary key of an entity
● Primary Key - a candidate key that will most commonly be used to uniquely identify a single entity instance
● Alternate Key - a candidate key that is not selected to become the primary key
● Subsetting Criteria - an attribute(s) whose finite values divide entity instances into subsets 属性的有限值将实体实例划分为子集的属性

Cardinality

● Cardinality - the minimum and maximum number of occurrences of one entity that may be related to a single occurrence of the other entity
在这里插入图片描述

Degree

● Degree - the number of entities that participate in a relationship
● Recursive Relationship (degree = 1)
● Binary Relationship (degree = 2)
● Ternary Relationship (degree = 3)
在这里插入图片描述

Foreign Keys

● Parent entity VS Child entity

Parent entity: a data entity that contributes one or more attributes to another entity, called the child. In a one-to-many relationship the parent is the entity on the “one” side
Child entity: a data entity that derives one or more attributes from another entity, called the parent. In a one-to-many relationship the child is the entity on the “many” side

● Strong entity VS Weak entity

Strong (independent) entity: a data entity that does not depend on any other entity for its identification
Weak entity: a data entity whose identification is dependent on the parent entity’s existence

● Identifying relationship VS Nonidentifying relationship

○ Identifying relationship:A relationship in which each participating entity has its own independent primary key
○ Nonidentifying relationship:A relationship in which the parent entity’s key is also part of the primary of the child entity

● Nonspecific relationship

○ A relationship where many instances of an entity are associated with many instances of another entity. Also called many-to-many relationship

● Resolve nonspecific relationships

○ Resolving with an associative entity
○ Resolving by recognizing a fundamental business entity
○ Resolving by recognizing separate relationship

Notations

● Martin / IE / Crow’s Foot
● Chen


Student Information System

Entity Discovery

● Pay attention to key words in the interviews or JRP discussions
● Specifically ask system owners and users to identify things about which they would like to capture and store
● Study existing forms, files and reports. Some forms identify event entities
● Scan each use case narrative for nouns
● Learn physical data models of the existing system

The Context Data Model

在这里插入图片描述

The Key-Based Data Model

在这里插入图片描述


Guidelines for Keys

● The value of a key should not change over the lifetime of each entity instance
● The value of a key cannot be null
● Controls must be installed to ensure that the value of a key is valid

Methods to Generate Keys

● Auto increment
● Global unique
● Intelligence keys (business code)

Generalized Hierarchies

● The subtypes inherit the keys of the supertypes
在这里插入图片描述

The Fully Attributed Data Model

在这里插入图片描述

How to Construct Data Models

● Entity discovery
● Construct the context data model
● Construct the key-based data model
● Generalize hierarchies
● Construct the fully attributed data model

Analyzing the Data Model

● Good data model

○ Simple
○ Essentially nonredundant
○ Flexible and adaptable to future needs

● Normalization (to improve a data model)

○ First normal form (1NF)
An entity whose attributes have no more than one value for a single instance of that entity
○ Second normal form (2NF)
An entity whose non-primary-key attributes are dependent on the full primary key
○ Third normal form (3NF)
An entity whose non-primary-key attributes are not dependent on any other non-primary-key attributes

Third Normal Form

Data-to-Location-CRUD Matrix

Relational Database Design

Logical Data Model to Database Schema

● Each fundamental, associative and weak entity is implemented as a separate table

○ The primary key is identified as such and implemented as an index into the table
○ Each alternative key is implemented as its own index into the table
○ An index should be created for any nonkey attributes that were identified as subsetting criteria
requirements
○ Each foreign key will be implemented as such
○ Attributes will be implemented as fields with clear
Data type
Size of the field
Null or not null
Domains
Default value

● Supertype/subtype entities present additional options

○ Each supertype and subtype can be implemented with a separate table
○ Alternatively, if the subtypes are of similar size and data content, they can be collapsed into
the supertype table
在这里插入图片描述
○ Alternatively, the supertype’s attributes could be duplicated in a table for each subtype
Database Schema

Indexes

● Primary keys
● Alternative keys (unique)
● Query scenarios
● Sorting scenarios

Data and Referential Integrity

● Key Integrity
● Domain Integrity
● Referential Integrity
○ No restriction
○ Delete: Cascade
○ Delete: Restrict
○ Delete: Set null

Think More

● How much data will you have in a period?
● How often will you write into the DB?
● How complex will your queries be?
● What’s the backup and data restore plan?
● Who can access what type data?
● Other performance requirements?
● How to manage schema versions?

Schema Migration

● In software engineering, schema migration (also database migration, database change management) refers to the management of incremental,
reversible changes to relational database schemas. A schema migration is
performed on a database whenever it is necessary to update or revert that
database’s schema to some newer or older version.
● https://en.wikipedia.org/wiki/Schema_migration
● DB Migration in RoR:
https://guides.rubyonrails.org/active_record_migrations.html

Partitioning

● Considering about the table course_completions

○ How many course completions will be added in each semester?
○ Is it required that query the completions in different semesters? If yes, how often?

● Another common case in e-commerce: orders

Distribution

● Single DB Server
● Master/Slave(s) Replication

● Master/Master Replication

DB Users and Privileges

● DB admin
● DB writer
● DB reader

Normalization VS Denormalization

● 3NF can lead to pieces of a query being distributed across many tables,
which leads to slow and complex joins to get the full picture for a query.
● Example
○ As an admin, I want to know a classroom’s schedule (including teacher name, course name, course department, arrangement, etc.)
Normalized
Denormalized
Non Relational Database Design
NoSQL Database Types
● Document databases - pair each key with a complex data structure known as
a document. E.g. MongoDB
● Graph stores - store information about networks of data, such as social
connections. E.g. Neo4J
● Key-value stores - every single item in the database is stored as an attribute
name (or key), together with its value. E.g. Redis
● Wide-column stores - optimized for queries over large datasets, and store
columns of data together, instead of rows. E.g. HBase
Document Database - MongoDB
● Documents in a collection
● Documents can contain many different
key-value pairs, or key-array pairs.
● Very easy to use object-oriented
programming
Document Database - MongoDB
● Documents can contain even nested
documents.
● Rapidly changing structured,
semi-structured, and unstructured data.
● Design question: modeling One-to-N
relationships
Graph Store - Neo4j
● https://neo4j.com/developer/guide-data-modeling/
● https://neo4j.com/graphgists/
● https://neo4j.com/graphgist/restaurant-recommendations
Key-value Store - Redis
● In-memory data structure store
● Used as a database, cache and message broker
○ Database: https://redis.io/topics/twitter-clone
○ Cache
○ Message broker: https://redis.io/topics/pubsub
Wide-column Store - HBase
● This project’s goal is the hosting of very large tables – billions of rows X
millions of columns – atop clusters of commodity hardware.

Readings

● textbook
○ Chapter 8
○ Chapter 14
● Extracurricular readings
○ https://www.mongodb.com/nosql-explained
○ https://redis.io/topics/introduction
○ https://hbase.apache.org/book.html#arch.overview

  • 0
    点赞
  • 4
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值