如何创建一个成功的数据仓库(data warehouse) (想了解数据仓库的人士快看)

原创 2001年09月14日 21:14:00

       如何创建一个成功的数据仓库(data warehose),下面的故事将告诉你!

     

The company's first data warehouse project began with a casual conversation between several executives on their way to lunch. The people involved were the IT manager for decision support as well as several members of a department that had just decided to install a data warehouse. They had planned to install their data warehouse without any involvement from the IT department; nonetheless, the following conversation ensued:

"We urgently need a data warehouse to analyze our data!"
"In that case, why don't you take an OLAP tool with a multidimensional database? "
"Is it possible to make the sales figures available to our sales people?"
"Yes, of course, that's no problem because of its Web capacity."
"We need our answers very fast."
"Performance isn't an issue, the data requested can be made available on a local server."
"Great, when can we start our analyses?"
"Installing such a system shouldn't take more than a few weeks."

Encouraged by these casual tips from an expert, the department decided to build a data warehouse that corresponded exactly to its specific needs. Some months later, the data warehouse was installed according to the original specifications. After the first successes had become public knowledge within the company, other departments began to show interest in the data warehouse. Proudly, the system was displayed, and enthusiasm was spreading. Suddenly, each department wanted their own data warehouse, and requests began to pile up on the desks of the IT managers. However, apart from the casual conversation previously, the IT department had not been involved in the development of this first data warehouse. The project itself had been implemented by the department with the help of an external system integrator. Nobody had planned on integrating additional user groups. It had become imperative to further develop this successful data warehouse. At this point it became clear that the department had locked itself into a data mart with only limited scalability, instead of building a data warehouse with unlimited capacity for expansion. This difference between data marts and data warehouses is a basic issue that the whole company now had to face.

What to do Next?

Based on the situation mentioned, some questions arise, such as: Can a data warehouse originally conceived only for one department be used for the whole company, or should new data marts be built for each department? If the latter solution is preferred, how would one department access data from another department? Who will guarantee that all users will receive exactly the same information? The question about whether to start with data marts or a data warehouse has been widely discussed.1, 2, 3 It can only be answered by clearly defining the project抯 goals: does it have to cover the information needs for certain departments or is it seen as the first step toward a shared enterprise information pool. If only departmental needs are the issue, it will suffice to install some isolated data marts. However, if a company regards access to an integrated, company-wide database as critical for its future survival in the market, then an enterprise data warehouse is the solution to implement.

My thoughts, so far, may have created the impression that there are only two options: either quickly install a few data marts to cover a few departments' current needs for information or embark on the expensive adventure of installing an enterprise data warehouse. There is a third option that combines the best of both worlds and can be implemented quickly without sacrificing future growth options. This third option is to lay down the foundation of an enterprise data warehouse by starting with a scalable data warehouse framework in a pilot project.

How to Proceed

The procedures that lead to this scalable data warehouse pilot project are specifically designed to satisfy two seemingly contradictory requirements ?fast delivery and expandability. Assuming the project is well prepared, it should not take more than three or four months to implement a fully operational pilot for an enterprise data warehouse. After it is finished, a company-wide data warehouse platform will be available allowing users to execute concrete analyses and develop a better understanding of their real and shared needs.

What is the difference between a data warehouse pilot and a departmental data mart? In fact, these two approaches differ more in their strategic goals than in total expenditure required for conceptualization and implementation. To run a project within a department means that you do not have to negotiate with other departments and IT managers ?something that can prove time-consuming. By contrast, if you want to establish a company-wide project, you must coordinate this effort with other departments, IT management and top executives.

Figure 1: The process of creating a pilot for an enterprise.

Figure 1 shows a preparatory phase to start the data warehouse pilot project. Thorough preparations will ensure that the pilot project will not exceed a three to four month time frame. Among other things, the project team will have to clarify technical issues regarding the system and the contracts with the system partners selected, issues mostly arising out of the chosen data warehouse architecture. The selection criteria for the central database computer will depend upon the amount of data anticipated now (and in the possibly known future), the number and types of users and the complexity of the queries. From the user's point of view, the selection of the analysis tool is the most critical issue; however, thanks to standardized interfaces like ODBC, it is not mandatory to stay with a chosen tool forever. In the beginning, it is sufficient to have a suitable OLAP tool for multidimensional analysis and software programs for accessing the data warehouse database directly.

Project Design

During the design phase, all the information necessary for implementing the data warehouse must be gathered, such as:

  • Requirements of the departments regarding the potential information uses;
  • Description of source data used;
  • Definition of business terms, data definitions and transformation rules;
  • Data models for the central data warehouse and the local data marts.

Simultaneously, the necessary hardware and software must be installed. Basically, the design phase can be broken down into four steps:

Business questions from departments. In order to increase the pilot project's chance for success, the selected business questions need to be of the greatest potential usefulness. Business questions do not necessarily have to be stated as questions. Existing reports that contain key figures or concrete suggestions as to analyses not possible before can also be used.

Data sources available. After the users' requests have been roughly analyzed, the IT department must investigate the source systems and interfaces available within the company. Due to the time constraints the pilot project faces, only data can be considered that is available and meets certain quality standards, such as completeness and correct contents. For a successful pilot installation, it is important to focus on the most important requirements. Therefore, the business questions must be correlated to the available data.

Business data model. The business data model reflects the real objects customer, order, product, etc. and their relationships to each other. In order to represent them correctly in the business data model, business rules have to be applied, such as "each order relates to one customer," or "each customer can belong to various categories."4

Logical data model. The logical data model in its normalized form is based on the business data model, and all objects are represented with their attributes. Usually, not all attributes available from the source systems are needed for answering the business questions submitted. However, potentially useful data elements will be integrated so it will not later be necessary to repeat all the analyses performed for the pilot project.

Figure 2: All the data is   available through the access layer.
Figure 2: All the data is available through the access layer.

For reasons of performance or because the query tool requires it, denormalized data models are needed alongside a normalized data model.5 One possibility is to complement the normalized data model with summary tables. In another approach, so-called "star schema" or "star models" are created in addition to the normalized data model (see Figure 2). Together with the normalized data, they are available to uses through views on the database. Each time, the data warehouse is accessed through the security layer, in which all the access authorizations are stores. The normalized approach provides a magnitude of much greater capability and scalability in allowing for any question to be asked of the data and also to easily add more data in the future to the data warehouse database.

Figure 2 represents the data marts as logical structures, i.e., the numbers are recalculated each time they are called up. When starting the pilot project, the first step should be to create the data marts logically. Only if performance is really lacking, will they be physically implemented by using fact tables, since optimizing performance is no the pilot project抯 top priority. If the performance is acceptable, it is sufficient to begin analyzing the data selected. More fine-tuning of both the database and the tools utilized should be accomplished AFTER the users/managers have begun to report their experiences and new value.

A note with regard to these two different data models: the normalized data model represents all the business relationships and, therefore, should not be changed without very good reasons for doing so. By contrast, data marts are for the most part based on star schema models and contain data for specific subject areas. If an organization utilized star models and there are changing business requirements, the data marts have to be redesigned and/or re- adjusted to meet any new business requirements. This can be very expensive and also limit the future scalability and growth of your data warehouse.

Checking the Results

The last step before adopting the logical data model is to check it by using selected business queries. A typical business query may be: "Give me all sales in the a specific month, broken down into industries (i.e.,hotels and restaurants only); number of transactions; types of customers; and mode of payment."

Using this query, system integrator and users check the model table by table to find possible interpretative errors of the data modeler. Experience shows that end users are very well able to understand a logical data model, even if they have never seen one before. Particularly for direct queries to the database, a profound understanding of the data warehouse data model is a necessity.

Implementation of the Design

After finishing the design phase with the system check outlined, the design will be implemented on the target system with the first users creating analysis and reports. The implementation phase consists of all steps necessary to transfer data from the operational systems into the data warehouse.

Core steps for success in using this method are:

  1. The transformation of the logical data model into a physical data structure on the target system;
  2. The creation of extraction and transformation programs;
  3. The implementation of the required control procedures to periodically update the data in the data warehouse;
  4. Users defining and testing their new analysis and reporting.

A pilot project for an enterprise data warehouse will usually contain only a few gigabytes of data, involve one or two departments and two to four source systems.

It may require more coordination between departments, IT managers and company executives than does a quickly installed, isolated data mart, but these efforts will really pay off once the pilot data warehouse is up and running. Your company will be using this platform both for its current informational needs and with an eye to the future as your business and your requirements expand.

 

 

Hive为什么适合做数据仓库

Hive为什么适合做数据仓库
  • u012965373
  • u012965373
  • 2017年10月09日 15:56
  • 179

数据仓库(data warehouse)与商务智能开发工具简介

数据仓库(data warehouse)是一个面向主题的、集成的、稳定的、包含历史数据的数据集合,它用于支持 经营管理中的决策制定过程。所谓主题,是指用户使用数据仓库进行决策时所关心的重点方面。数据仓...
  • mack415858775
  • mack415858775
  • 2013年09月18日 17:30
  • 4270

数据仓库成功的主要指标

数据仓库项目开始之前,为慎重起见,先花费一点时间评估一下继续行进所需要的机构准备情况。 根据从开发实践中积累起来的经验,确定了五个用于区别能够平稳推进的项目与总是出问题的项目的因素。 这些因素是衡...
  • liguoming05
  • liguoming05
  • 2013年09月09日 09:36
  • 1177

数据仓库(七):Oracle Warehouse Builder(OWB)创建数据仓库

本文简述使用OWB创建数据仓库的一般过程。Oracle的OWB是目前最好的三大ETL产品之一。OWB不但可以可以完成数据的抽取、转换和加载,还能帮助用户在Oracle数据库中创建ROLAP(Relat...
  • kingzone_2008
  • kingzone_2008
  • 2013年05月14日 10:01
  • 17766

数据仓库之四部"圣经"

Bill Inmon在Building the Data Warehouse主张建立数据仓库时采用自上而下(DWDM)方式 Ralph Kimball在The DataWarehouse Toolki...
  • u011538954
  • u011538954
  • 2017年02月06日 10:56
  • 678

Oracle数据仓库创建教程

Oracle数据仓库创建教程。如何创建一个数据仓库,创建实例,最近开始Oracle的数仓建模学习,详细记录了图形界面下的 Oracle database 12C 数据仓库创建过程。本教程在Linux环...
  • dream_an
  • dream_an
  • 2016年03月22日 21:13
  • 2415

数据仓库创建步骤

建设数据仓库   建立数据仓库是一个解决企业问题的过程,业务人员往往不懂如何建立和使用数据仓库,发挥其决策支持的作用;信息部门的人员往往又不懂业务,不知道应该建立哪些决策主题,从数据源中抽取哪些...
  • u013412535
  • u013412535
  • 2015年04月26日 18:59
  • 1590

Hadoop之数据仓库构建-Hive

目录 Hive是什么... 1 hadoop是什么... 1 hadoop、hive两者关系... 2 hive的优缺点... 2 hive的存在意义... 2 hive常用命令...
  • erliang20088
  • erliang20088
  • 2015年10月25日 13:26
  • 1496

漫谈数据仓库之维度建模

0x00 前言 下面的内容,是笔者在学习和工作中的一些总结,其中概念性的内容大多来自书中,实践性的内容大多来自自己的工作和个人理解。由于资历尚浅,难免会有很多错误,望批评指正! 概述数据仓库包含的...
  • zhaodedong
  • zhaodedong
  • 2017年01月07日 14:57
  • 3193

什么是数据仓库(Data Warehouse)

什么是数据仓库(Data Warehouse) 数据仓库(Data Warehouse)是一个面向主题的(Subject Oriented)、集成的(Integrate)、相对稳定的(Non-Vo...
  • wk360833257
  • wk360833257
  • 2013年10月09日 11:28
  • 320
内容举报
返回顶部
收藏助手
不良信息举报
您举报文章:如何创建一个成功的数据仓库(data warehouse) (想了解数据仓库的人士快看)
举报原因:
原因补充:

(最多只允许输入30个字)