Data structure and relationships among entities.
Data Model (conceptual level, logical level, physical level)
Diff b/t Dimensional Model and ER Model
Dimensions, Facts, Granularity and various types of dimensions.
Conceptual Data Model
(collating requirements, focus on entities and relationships)
Logical Data Model
(attributes, key groups, relationships, business rules)
Physical Data Model
(performance of database, strategy of indexing, physical storage, denormalization are considered in this phase)
Logical Data Model | Physical Data Model |
Represents business info and defines business rules | Represents the physical implementation of the model in a database. |
Entity | table |
Attribute | column |
Primary key | primary key constraint |
alternate key | unique constraint or unique index |
Inversion key entry | non unique index |
rule | check constraint, default value |
relationship | foreign key |
definition | comment |
|
|
Basics of RDBMS
1. RDBMS
2. Data modeling approaches(Conceptual, Logical, Physical)
3. ER Modeling
4. Keys(CK, PK, FK)
5. Normalization Levels
6. OLTP
7. Data mart
8. OLAP
9. Datawarehouse(DWH)
Identifying Relationships
(It exists when the child table cannot be uniquely identified in the absence of the parent)
Ex : HolderAccount (p), Account, Holder
Non-Identifying Relationships
(the child can uniquely be identified without the parent)
Ex : Account, AccountType
Types of Entities :
Independent entity, dependent entity
Ex: unit, unit head
Referential Integrity
(Every non-null foreign key value must be matched by an existing primary key value)
Types of Attributes :
Key and Non-key attributes
Required or optional attributes
Simple and composite attributes
Single-valued and multi-valued attributes
Stored, coded, or derived attributes
ER Model only supports two dimensional view of data.
Dimensional database is optimized to retrieve data and analyze it. It tends to be subject oriented.
Dimensional Modeling(DM) is a logical design technique used in DWH.
Mostly denormalized relational model
Made up of entities with attributes
Relationships defined by primary keys and foreign keys
Used by most contemporary BI solutions “Right” mix of normalization and de normalization often called Dimentional Normalization.
Consists of two types of tables : Dimension tables and Fact tables.
Dimension Keys
Dimension Business Key
(Column or columns that uniquely identify an instance of the business record)
(Used in the process of ETL to tie fact records with dimension members)
Dimension Record Surrogate Key
(Defines the dimension’s primary key)
(Relates to the fact table foreign key field)
(Numeric data type, typically integer)
Dimension Surrogate Key
a. Surrogate Key Usage
Consolidates multi-value business keys
Allows tracking of dimension history
Standardizes dimension tables
Limits fact table width for optimization
b. Surrogate key Design Practices
Avoid smart keys
Avoid production keys (may changes!)
A company may acquire a competitor and thereby change the essential building rules changed-record, but intentionally not change the key.
Narrow as possible
Dimension Types
1. Slowly Changing Dimension
It is a dimension whose attributes or attributes for a record(row) change slowly over time.
Type-1(Overwriting the history)
Type-2(Preserving the history)
Type-3(Preserving one or many versions of history)
2. Junk(garbage) Dimension
A garbage Dimension is a dimension that includes low-cardinality columns such as codes, indicators and status flags.
The attributes in a Junk Dimension are not related to any hierarchy.
3. Role-playing Dimension
A single dimension which gets differently expressed in a fact table using views is called a role-playing dimension.
Ex : Create two views for OrderDate and ReceivedData based on the same date dimension table.
Facts (or measures) can be categorized by
a) Numerical data type
b) Aggregation type
c) Additive nature
d) Granularity
Useful facts normally are numeric and additive.
Fact Types :
1) Transactional
2) Additive
Additive (Facts that can be summed up/aggregated across all dimensions within the fact table)
ex : dollars sold
Semi-Additive (Facts that can be added up for some of dimensions in a fact table, but not for others)
ex : account balance, inventory level, distinct counts
Non-Additive (Facts cannot be summed up for any of the dimensions which are present in the fact table)
Ex: measurement of room temperature
3) Snapshot or inventory
4) Factless
5) Derived
6) Aggregate
When to use snowflake
1) The dimension table has two or more sets of attributes which normally define information at different grains.
2) The attribute sets of the same dimension table are getting populated by different source systems.
It is not possible to implement a many to many relationship in a physical model as two tables can’t be children of one another
It needs to use a bridge table to solve this issue.
Introduce a junction table Customer_PaymentModes, which has many to one relationship with both Customer and PaymentModes.