These notes exist only as motivation to keep myself reading.
In this first chapter we lay the groundwork for the case studies that follow.
We’ll begin by stepping back to consider data warehousing from a macro perspective.
Some readers may be disappointed to learn that it is not all about
tools and techniques—first and foremost, the data warehouse must consider
the needs of the business. We’ll drive stakes in the ground regarding the goals
of the data warehouse while observing the uncanny similarities between the
responsibilities of a data warehouse manager and those of a publisher. With
this big-picture perspective, we’ll explore the major components of the warehouse
environment, including the role of normalized models. Finally, we’ll
close by establishing fundamental vocabulary for dimensional modeling. By
the end of this chapter we hope that you’ll have an appreciation for the need
to be half DBA (database administrator) and half MBA (business analyst) as
you tackle your data warehouse.
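In short: the first chapter lays the groundwork for the later case studies. It starts from a macro view of data warehousing. Some readers may be disappointed that it is not all about tools and techniques; the needs of the business come first. The chapter pins down the goals of the data warehouse and notes the uncanny similarities between the responsibilities of a data warehouse manager and those of a publisher. With that big picture in place, it covers the major components of the warehouse environment, including the role of normalized models, and closes with the basic vocabulary of dimensional modeling. The hope is that by the end you see the need to be half database administrator and half business analyst when tackling your own data warehouse.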
Chapter 1 discusses the following concepts:
- Business-driven goals of a data warehouse
- Data warehouse publishing
- Major components of the overall data warehouse
- Importance of dimensional modeling for the data warehouse presentation area
- Fact and dimension table terminology
- Myths surrounding dimensional modeling
- Common data warehousing pitfalls to avoid
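Chapter 1 topics, restated:
- Business-driven goals of the data warehouse
- Publishing the data warehouse
- Major components of the whole data warehouse
- Why dimensional modeling matters for the data
- Fact table and dimension table terminology
- Myths surrounding dimensional modeling
- Common data warehousing pitfalls that should be avoided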
Data Warehouse Requirements
The data warehouse must make an organization’s information easily accessible.
The data warehouse must present the organization’s information consistently.
The data warehouse must be adaptive and resilient to change.
The data warehouse must be a secure bastion that protects our information.
The data warehouse must serve as the foundation for improved decision
making.
The business community must accept the data warehouse if it is to be
deemed successful.
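Data warehouse requirements, restated:
The data warehouse must be easy to access.
The data warehouse must present the organization's information consistently.
The data warehouse must adapt to changing requirements.
The data warehouse must be secure enough to protect the organization's information.
The data warehouse must serve to improve our decision making.
The data warehouse succeeds only if the business community accepts it.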
Components of a Data Warehouse
One of the biggest threats to data warehousing success is confusing the
components' roles and functions. There are four separate and distinct
components to be considered as we explore the data warehouse environment:
operational source systems, data staging area, data presentation area, and
data access tools.
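Confusing the components and their functions is one of the biggest obstacles to data warehousing success. The four distinct components are the operational source systems, the data staging area, the data presentation area, and the data access tools.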
Data Staging Area
The data staging area of the data warehouse is both a storage area and a set of
processes commonly referred to as extract-transformation-load (ETL). The data
staging area is everything between the operational source systems and the
data presentation area. It is somewhat analogous to the kitchen of a restaurant,
where raw food products are transformed into a fine meal. In the data warehouse,
raw operational data is transformed into a warehouse deliverable fit for
user query and consumption. Similar to the restaurant’s kitchen, the backroom
data staging area is accessible only to skilled professionals. The data warehouse
kitchen staff is busy preparing meals and simultaneously cannot be
responding to customer inquiries. Customers aren’t invited to eat in the
kitchen. It certainly isn’t safe for customers to wander into the kitchen. We
wouldn’t want our data warehouse customers to be injured by the dangerous
equipment, hot surfaces, and sharp knives they may encounter in the kitchen,
so we prohibit them from accessing the staging area. Besides, things happen in
the kitchen that customers just shouldn’t be privy to.
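The data staging area is both a storage area and the set of ETL processes; it is everything between the operational source systems and the data presentation area.
It is like a kitchen that turns raw ingredients into a fine meal. In the data warehouse, raw operational data is transformed into a deliverable fit for user query and consumption. As with a kitchen, only skilled professionals may enter. The kitchen staff is busy preparing meals and cannot answer customer inquiries at the same time. Customers are kept out of the kitchen for their own safety: we do not want our data warehouse customers hurt by dangerous equipment, hot surfaces, and sharp knives, so we bar them from the staging area. Besides, some things that happen in the kitchen are not for customers to see.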
The key architectural requirement for the data staging area is that it is off-limits to
business users and does not provide query and presentation services.
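The key requirement for the data staging area is that it is off-limits to business users and provides no query or presentation services.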
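The extract-transformation-load flow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample rows, the field names, and the functions `extract`, `transform`, `load`, and `run_etl` are all hypothetical and do not come from the book. The point is that the intermediate, messy data lives only inside the staging functions, and callers see nothing but the cleaned result.

```python
from datetime import date
from decimal import Decimal

# Hypothetical raw rows, as they might arrive from an operational source system.
RAW_ORDERS = [
    {"order_id": "1001", "cust": " alice ", "amount": "19.90", "order_date": "2024-01-05"},
    {"order_id": "1002", "cust": "BOB", "amount": "5.00", "order_date": "2024-01-06"},
    {"order_id": "1002", "cust": "BOB", "amount": "5.00", "order_date": "2024-01-06"},  # duplicate
]


def extract():
    """Extract: copy the raw rows out of the source system unchanged."""
    return list(RAW_ORDERS)


def transform(rows):
    """Transform: clean names, convert types, and drop duplicate orders."""
    seen = set()
    cleaned = []
    for row in rows:
        order_id = int(row["order_id"])
        if order_id in seen:
            continue  # skip a repeated order
        seen.add(order_id)
        cleaned.append({
            "order_id": order_id,
            "customer_name": row["cust"].strip().title(),
            # Decimal avoids floating-point rounding surprises with money.
            "amount_cents": int(Decimal(row["amount"]) * 100),
            "order_date": date.fromisoformat(row["order_date"]),
        })
    return cleaned


def load(rows, presentation_area):
    """Load: append the cleaned rows to the queryable presentation area."""
    presentation_area.extend(rows)
    return len(rows)


def run_etl():
    """Run the whole staging pipeline and return only the presentation data."""
    presentation_area = []
    load(transform(extract()), presentation_area)
    return presentation_area
```

Calling `run_etl()` returns two cleaned orders. The raw, duplicated input never leaves the staging functions, which mirrors the rule that customers do not eat in the kitchen.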
It is acceptable to create a normalized database to support the staging processes;
however, this is not the end goal. The normalized structures must be off-limits to
user queries because they defeat understandability and performance. As soon as a
database supports query and presentation services, it must be considered part of
the data warehouse presentation area. By default, normalized databases are excluded
from the presentation area, which should be strictly dimensionally structured.
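Creating a normalized database to support the staging processes is acceptable, but it is not the end goal. Normalized structures must be off-limits to user queries because they hurt understandability and performance. Once a database supports query and presentation services, it must be considered part of the data warehouse presentation area. By default, normalized databases are excluded from the presentation area, which should be strictly dimensional.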
Data in the queryable presentation area of the data warehouse must be dimensional,
must be atomic, and must adhere to the data warehouse bus architecture.
Data in the presentation area of the data warehouse must be dimensional, must be atomic, and must conform to the data warehouse bus architecture.
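To make "dimensional" and "atomic" more concrete, here is a minimal sketch of a presentation-area layout using Python's built-in `sqlite3` module. Everything in it is an assumption made for illustration: the table names (`date_dim`, `product_dim`, `sales_fact`), the columns, and the sample rows are hypothetical, and the sketch does not try to show the bus architecture. It stores one fact row per product per day, the most detailed level this toy example has, and describes each row through keys into two dimension tables.

```python
import sqlite3


def build_presentation_area():
    """Create a tiny in-memory dimensional layout with sample data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Dimension tables hold the descriptive context.
        CREATE TABLE date_dim (
            date_key  INTEGER PRIMARY KEY,
            full_date TEXT,
            month     TEXT
        );
        CREATE TABLE product_dim (
            product_key  INTEGER PRIMARY KEY,
            product_name TEXT,
            category     TEXT
        );
        -- The fact table holds measurements at the atomic grain:
        -- one row per product per day in this example.
        CREATE TABLE sales_fact (
            date_key     INTEGER REFERENCES date_dim(date_key),
            product_key  INTEGER REFERENCES product_dim(product_key),
            quantity     INTEGER,
            sales_amount REAL
        );
    """)
    conn.executemany("INSERT INTO date_dim VALUES (?, ?, ?)", [
        (20240105, "2024-01-05", "2024-01"),
        (20240206, "2024-02-06", "2024-02"),
    ])
    conn.executemany("INSERT INTO product_dim VALUES (?, ?, ?)", [
        (1, "Espresso", "Coffee"),
        (2, "Green Tea", "Tea"),
    ])
    conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?)", [
        (20240105, 1, 2, 6.0),
        (20240105, 2, 1, 2.5),
        (20240206, 1, 3, 9.0),
    ])
    return conn


def sales_by_category(conn):
    """Roll the atomic facts up by a dimension attribute."""
    return conn.execute("""
        SELECT p.category, SUM(f.sales_amount)
        FROM sales_fact AS f
        JOIN product_dim AS p ON p.product_key = f.product_key
        GROUP BY p.category
        ORDER BY p.category
    """).fetchall()
```

Because the facts are kept at the atomic grain, the same table can be summarized by category, by month, or by any other dimension attribute without reloading data. For example, `sales_by_category(build_presentation_area())` groups the three fact rows into one total per product category.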