The Microsoft Data Warehouse Toolkit读书笔记(五):第四章 系统安装与物理设计

第一章 系统安装与物理设计

一、概述

主要任务包括:

n         决定DW/BI系统的规模

n         生产、测试、开发环境的软硬件配置

n         逻辑模型到物理模型转换

n         如何对事实表分区

n         物理数据如何存储

n         索引计划

 

二、系统模型考虑

可以从四个方面来考虑:数据容量、应用复杂性、同时在线用户数与系统可用性需求。

Overall System Size

Small

Medium

Large

Range of Configurations

One “Commodity”

32-bit or 64-bit

4-way

One high-end (8 or more processors) 64-bit system with max memory, or distribute components to multiple commodity servers

Distributed system on high-end 64-bit servers; some clusters

Data Volume

< 500 M fact rows,

< 50 GB atomic fact data

< 5 B fact rows,

< 500 GB atomic fact data

Billions and billions…

Usage Complexity

> 60% Simple use

30% Medium complexity

< 10% Demanding use

50% Simple use

35% Medium

15% Demanding

35% Simple

40% Medium

25% Demanding

Number of Users

< a dozen simultaneous users

< 200 simultaneous users

Thousands

Availability

Several hours of downtime each night acceptable

< 1 hour downtime or query slowdown each night

24x7 with excellent query performance

 

三、系统配置考虑

预算

单机/分布式系统

2.存储系统选择

RAID/SAN

Note 

Directly attached RAID disks can offer better performance than SANs, particularly for sequential I/O. However, the advantages of the SAN technology usually, although not always, outweigh the difference in performance for DW/BI applications.

 

3.高可用性

并行计算;分区技术;数据库镜像技术

四、软件安装与配置

角色不同,需要安装的软件也不同。

We strongly recommend that all relational database files be placed on RAID arrays, either RAID-1 or RAID-5, preferably with hardware controllers rather than managed by the operating system. RAID-1 is significantly better than RAID-5, although predictably more expensive, because it’s faster both for writes and for recovery after a disk failure.

The SQL Server system databases master, model, and msdb can be placed on the RAID-1 array that holds the operating system. These databases are typically very small, and there’s usually plenty of room for them on that array. Alternatively, place them on their own small RAID-1 array. A third alternative is to place them on the same RAID array as user databases. It is vital that these databases, especially master, be placed on a fault-tolerant array.

The fourth system database, tempdb, could grow significantly as the data warehouse database is being used. You should pre-allocate tempdb to a large size to avoid auto-growth during query operations. Don’t place tempdb on the system RAID-1 array which is usually small. Use at least one file per CPU for high-performance tempdb operations. Be sure to spread tempdb out over many drives to maximize I/O performance. If the total number of drives on your system is limited, you can place tempdb on the same RAID array as the user databases, especially if you use RAID-1 or RAID-0+1 for the user databases. However, if you set up the user databases on a RAID-5 array, you should consider separating out tempdb onto its own RAID-1 array to minimize potential bottlenecks.

 

五、物理设计

代理键

使用数值型,并且占用字节尽可能小

2.字符字段

定义三种类型 :

Short string with 20 characters

Intermediate string with 50 characters

Long string with 100 characters

小于10位用CHAR

3Insert an Unknown Member Row

为每个维度插入一条缺省数据

4Table and Column Extended Properties

将元数据定义在数据库对象的扩展属性上,如为表个字段增加描述,SCD等其它元数据

5Housekeeping Columns

定义审核维,并在事实表中与它关联;用来跟踪数据如何被加载进DW系统,一般在ETL阶段填充数据。

6.索引与约束

应当定义主外键,但在加载时应让其失效。少定义复合索引,通常一个外键定义一个索引。聚集索引一般定义为日期字段。

7.事实表分区

分区函数两端应是空分区。在初始加载时通常用到swap partition,以利于并行加载

 

8Staging Tables

只将加载时需要查询的数据放出关系数据库,如不需要转换,则应使用文件系统。

9.元数据

定义元数据库

 
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值