TIPS: Data Warehouse

3 篇文章 0 订阅

 

  1. DW Design Procedure
    • The basic step of data warehousing starts with data modeling. i.e. creation dimensions and facts.
    • Data warehouse starts with collection of data from source systems such as OLTP,CRM,ERPs etc
    • Cleansing and transformation process is done with ETL(Extraction Transformation Loading) tool.
    • By the end of ETL process target databases(dimensions, facts) are ready with data which accomplishes the business rules.
    • Now finally with the use of Reporting tools(OLAP) we can get the information which is used for decision support.
  2.  

  3. GranularityGranularity is the level of detail required. In other words, should a data warehouse store every single transaction? Should it summarize transactions as a single record for each day, month, year, and so on? The more granularity the data warehouse contains, the bigger fact tables are because the more records they contain.
  4.  

  5. Snowflake schema------ a normalized star schema, such that dimension entities are normalized (dimensions are separated into multiple tables). Normalized dimensions have all duplication removed from each dimension, such that the result is a single fact table, connected directly to some of the dimensions.
  6.  

  7. class->table; object->column (dimension ,measure, detail)
  8.  

  9. Source DB->ETL (Informatica)->Target DB(ErWin) -> redesign many schema on Target DB for different purpose (Desigher: Schema)->Build many Universe for different requirement from schema(Designer)->Build Query&Reporting (BO&Webintelligent)>Admin
  10.  

  11. Non-Additive: Non-additive facts are facts that cannot be summed up for any of the dimensions present in the fact table.
  12.  

  13. Difference from OLAP and OLTP
  14. OLTP

    OLAP

    Current data

    Current and historical data

    Short database transactions

    Long database transactions

    Online update/insert/delete

    Batch update/insert/delete

    Normalization is promoted

    De-normalization is promoted

    High volume transactions

    Low volume transactions

    Transaction recovery is necessary

    Transaction recovery is not necessary

     

  15. Materialized View (MV): The view is a tail raid representation of data, but materialized view is stores precaluculated data, view is a logical structure but MV is physical structure, view is can't occupy the data space (use stored query) but MV occupies data space.
  16.  

  17. Star schema contains the dimension tables mapped around one or more fact tables. It is a denormalised model. No need to use complicated joins. Queries results fast.
  18.  

    Snowflake schema It is the normalised form  of Star schema. contains in depth joins ,because the tables r spitted in to many pieces. We can easily do modification directly in the tables. We have to use complicated joins ,since we have more tables . There will be some delay in processing the Query .

    The snowflake schema is a schema in which the fact table is indirectly linked to a number of dimension tables. The dimension tables are normalized to remove redundant data and partitioned into a number of dimension tables for ease of maintenance. An example of the snowflake schema is the splitting of the Product dimension into the product_category dimension and product_manufacturer dimension..

     

  19. Conformed dimensions' are dimensions which are common to the cubes
  20.  

  21. Generate DATE DIM drop table generic_date_dim
  22. go

    CREATE TABLE generic_date_dim (

           generic_date_dim_id  int not null primary key,

           date_value           smalldatetime NOT NULL unique,

           the_day              varchar(60) NOT NULL,

           the_month            varchar(60) NOT NULL,

           the_year             varchar(60) NOT NULL,

           day_of_the_month     int NOT NULL,

           day_of_the_year      int NOT NULL,

           week_of_the_year     int NOT NULL,

           month_of_the_year    int NOT NULL,

           calendar_quarter     int NOT NULL,

           fiscal_year          int NOT NULL,

           fiscal_quarter       int NOT NULL

    )

    go

     

     

    declare @startDay datetime, @endDay datetime

     

     

    select  @startDay = '1/1/2003',

     @endDay =   '4/30/2004'

     

     

    declare @i int, @currentDay smalldatetime

    set @currentDay = @startDay

     

     

    begin transaction

     

    while @currentDay <= @endDay

    begin

     

    insert into generic_date_dim([generic_date_dim_id], [date_value],

    [the_day], [the_month], [the_year], [day_of_the_month], [day_of_the_year],

    [week_of_the_year], [month_of_the_year], [calendar_quarter], [fiscal_year],

    [fiscal_quarter])

      select  dbo.date$calcIdValue(@currentDay),

      @currentDay as dateValue,

      datename(dw, @currentDay) as theDay,

      datename(month, @currentDay) as theMonth,

      datename(year, @currentDay) as theYear,

      datepart(day,@currentDay) as dayOfTheMonth,

      datepart(dayofyear,@currentDay) as dayOfTheYear,

      datepart(week,@currentDay) as weekOfTheYear,

      datepart(month,@currentDay) as monthOfTheYear,

      cast((datepart(month,@currentDay) / 4) + 1 as varchar(3)) as

      calendarQuarter,

      datepart(year, @currentDay) as fiscalYear,

      cast((datepart(month,@currentDay) / 4) + 1 as varchar(3)) as fiscalQuarter

     

      if @@error <> 0

      begin

      raiserror 50000 'Error creating timeByDay dimension'

      rollback transaction

      goto endofscript

      end

     

      set @currentDay = dateadd(day,1,@currentDay)

      end

      

    commit transaction

     

     

  23. Data Warehouse Performance Tunings??
  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值