The Microsoft Data Warehouse Toolkit读书笔记(五):第四章系统安装与物理设计

最新推荐文章于 2024-10-03 20:57:12 发布

brightgems

最新推荐文章于 2024-10-03 20:57:12 发布

阅读量578

点赞数

分类专栏：数据库技术文章标签： microsoft 读书 performance system database 数据库

本文链接：https://blog.csdn.net/brightgems/article/details/2030997

版权

数据库技术专栏收录该内容

5 篇文章 0 订阅

订阅专栏

第一章系统安装与物理设计

一、概述

主要任务包括：

n 决定DW/BI系统的规模

n 生产、测试、开发环境的软硬件配置

n 逻辑模型到物理模型转换

n 如何对事实表分区

n 物理数据如何存储

n 索引计划

二、系统模型考虑

可以从四个方面来考虑：数据容量、应用复杂性、同时在线用户数与系统可用性需求。

Overall System Size	Small	Medium	Large
Range of Configurations	One “Commodity” 32-bit or 64-bit 4-way	One high-end (8 or more processors) 64-bit system with max memory, or distribute components to multiple commodity servers	Distributed system on high-end 64-bit servers; some clusters
Data Volume	< 500 M fact rows, < 50 GB atomic fact data	< 5 B fact rows, < 500 GB atomic fact data	Billions and billions…
Usage Complexity	> 60% Simple use 30% Medium complexity < 10% Demanding use	50% Simple use 35% Medium 15% Demanding	35% Simple 40% Medium 25% Demanding
Number of Users	< a dozen simultaneous users	< 200 simultaneous users	Thousands
Availability	Several hours of downtime each night acceptable	< 1 hour downtime or query slowdown each night	24x7 with excellent query performance

三、系统配置考虑

预算

单机/分布式系统

2．存储系统选择

RAID/SAN

Note	Directly attached RAID disks can offer better performance than SANs, particularly for sequential I/O. However, the advantages of the SAN technology usually, although not always, outweigh the difference in performance for DW/BI applications.

3．高可用性

并行计算；分区技术；数据库镜像技术

四、软件安装与配置

角色不同，需要安装的软件也不同。

We strongly recommend that all relational database files be placed on RAID arrays, either RAID-1 or RAID-5, preferably with hardware controllers rather than managed by the operating system. RAID-1 is significantly better than RAID-5, although predictably more expensive, because it’s faster both for writes and for recovery after a disk failure.

The SQL Server system databases master, model, and msdb can be placed on the RAID-1 array that holds the operating system. These databases are typically very small, and there’s usually plenty of room for them on that array. Alternatively, place them on their own small RAID-1 array. A third alternative is to place them on the same RAID array as user databases. It is vital that these databases, especially master, be placed on a fault-tolerant array.

The fourth system database, tempdb, could grow significantly as the data warehouse database is being used. You should pre-allocate tempdb to a large size to avoid auto-growth during query operations. Don’t place tempdb on the system RAID-1 array which is usually small. Use at least one file per CPU for high-performance tempdb operations. Be sure to spread tempdb out over many drives to maximize I/O performance. If the total number of drives on your system is limited, you can place tempdb on the same RAID array as the user databases, especially if you use RAID-1 or RAID-0+1 for the user databases. However, if you set up the user databases on a RAID-5 array, you should consider separating out tempdb onto its own RAID-1 array to minimize potential bottlenecks.