- DW Design Procedure
- The basic step of data warehousing starts with data modeling. i.e. creation dimensions and facts.
- Data warehouse starts with collection of data from source systems such as OLTP,CRM,ERPs etc
- Cleansing and transformation process is done with ETL(Extraction Transformation Loading) tool.
- By the end of ETL process target databases(dimensions, facts) are ready with data which accomplishes the business rules.
- Now finally with the use of Reporting tools(OLAP) we can get the information which is used for decision support.
- Granularity—Granularity is the level of detail required. In other words, should a data warehouse store every single transaction? Should it summarize transactions as a single record for each day, month, year, and so on? The more granularity the data warehouse contains, the bigger fact tables are because the more records they contain.
- Snowflake schema------ a normalized star schema, such that dimension entities are normalized (dimensions are separated into multiple tables). Normalized dimensions have all duplication removed from each dimension, such that the result is a single fact table, connected directly to some of the dimensions.
- class->table; object->column (dimension ,measure, detail)
- Source DB->ETL (Informatica)->Target DB(ErWin) -> redesign many schema on Target DB for different purpose (Desigher: Schema)->Build many Universe for different requirement from schema(Designer)->Build Query&Reporting (BO&Webintelligent)>Admin
- Non-Additive: Non-additive facts are facts that cannot be summed up for any of the dimensions present in the fact table.
- Difference from OLAP and OLTP
OLTP
OLAP
Current data
Current and historical data
Short database transactions
Long database transactions
Online update/insert/delete
Batch update/insert/delete
Normalization is promoted
De-normalization is promoted
High volume transactions
Low volume transactions
Transaction recovery is necessary
Transaction recovery is not necessary
- Materialized View (MV): The view is a tail raid representation of data, but materialized view is stores precaluculated data, view is a logical structure but MV is physical structure, view is can't occupy the data space (use stored query) but MV occupies data space.
- Star schema contains the dimension tables mapped around one or more fact tables. It is a denormalised model. No need to use complicated joins. Queries results fast.
Snowflake schema It is the normalised form of Star schema. contains in depth joins ,because the tables r spitted in to many pieces. We can easily do modification directly in the tables. We have to use complicated joins ,since we have more tables . There will be some delay in processing the Query .
The snowflake schema is a schema in which the fact table is indirectly linked to a number of dimension tables. The dimension tables are normalized to remove redundant data and partitioned into a number of dimension tables for ease of maintenance. An example of the snowflake schema is the splitting of the Product dimension into the product_category dimension and product_manufacturer dimension..
- Conformed dimensions' are dimensions which are common to the cubes
- Generate DATE DIM drop table generic_date_dim
go
CREATE TABLE generic_date_dim (
generic_date_dim_id int not null primary key,
date_value smalldatetime NOT NULL unique,
the_day varchar(60) NOT NULL,
the_month varchar(60) NOT NULL,
the_year varchar(60) NOT NULL,
day_of_the_month int NOT NULL,
day_of_the_year int NOT NULL,
week_of_the_year int NOT NULL,
month_of_the_year int NOT NULL,
calendar_quarter int NOT NULL,
fiscal_year int NOT NULL,
fiscal_quarter int NOT NULL
)
go
declare @startDay datetime, @endDay datetime
select @startDay = '1/1/2003',
@endDay = '4/30/2004'
declare @i int, @currentDay smalldatetime
set @currentDay = @startDay
begin transaction
while @currentDay <= @endDay
begin
insert into generic_date_dim([generic_date_dim_id], [date_value],
[the_day], [the_month], [the_year], [day_of_the_month], [day_of_the_year],
[week_of_the_year], [month_of_the_year], [calendar_quarter], [fiscal_year],
[fiscal_quarter])
select dbo.date$calcIdValue(@currentDay),
@currentDay as dateValue,
datename(dw, @currentDay) as theDay,
datename(month, @currentDay) as theMonth,
datename(year, @currentDay) as theYear,
datepart(day,@currentDay) as dayOfTheMonth,
datepart(dayofyear,@currentDay) as dayOfTheYear,
datepart(week,@currentDay) as weekOfTheYear,
datepart(month,@currentDay) as monthOfTheYear,
cast((datepart(month,@currentDay) / 4) + 1 as varchar(3)) as
calendarQuarter,
datepart(year, @currentDay) as fiscalYear,
cast((datepart(month,@currentDay) / 4) + 1 as varchar(3)) as fiscalQuarter
if @@error <> 0
begin
raiserror 50000 'Error creating timeByDay dimension'
rollback transaction
goto endofscript
end
set @currentDay = dateadd(day,1,@currentDay)
end
commit transaction
- Data Warehouse Performance Tunings??