如何创建一个成功的数据仓库(data warehouse) (想了解数据仓库的人士快看)

       如何创建一个成功的数据仓库(data warehose),下面的故事将告诉你!

     

The company's first data warehouse project began with a casual conversation between several executives on their way to lunch. The people involved were the IT manager for decision support as well as several members of a department that had just decided to install a data warehouse. They had planned to install their data warehouse without any involvement from the IT department; nonetheless, the following conversation ensued:

"We urgently need a data warehouse to analyze our data!"
"In that case, why don't you take an OLAP tool with a multidimensional database? "
"Is it possible to make the sales figures available to our sales people?"
"Yes, of course, that's no problem because of its Web capacity."
"We need our answers very fast."
"Performance isn't an issue, the data requested can be made available on a local server."
"Great, when can we start our analyses?"
"Installing such a system shouldn't take more than a few weeks."

Encouraged by these casual tips from an expert, the department decided to build a data warehouse that corresponded exactly to its specific needs. Some months later, the data warehouse was installed according to the original specifications. After the first successes had become public knowledge within the company, other departments began to show interest in the data warehouse. Proudly, the system was displayed, and enthusiasm was spreading. Suddenly, each department wanted their own data warehouse, and requests began to pile up on the desks of the IT managers. However, apart from the casual conversation previously, the IT department had not been involved in the development of this first data warehouse. The project itself had been implemented by the department with the help of an external system integrator. Nobody had planned on integrating additional user groups. It had become imperative to further develop this successful data warehouse. At this point it became clear that the department had locked itself into a data mart with only limited scalability, instead of building a data warehouse with unlimited capacity for expansion. This difference between data marts and data warehouses is a basic issue that the whole company now had to face.

What to do Next?

Based on the situation mentioned, some questions arise, such as: Can a data warehouse originally conceived only for one department be used for the whole company, or should new data marts be built for each department? If the latter solution is preferred, how would one department access data from another department? Who will guarantee that all users will receive exactly the same information? The question about whether to start with data marts or a data warehouse has been widely discussed.1, 2, 3 It can only be answered by clearly defining the project抯 goals: does it have to cover the information needs for certain departments or is it seen as the first step toward a shared enterprise information pool. If only departmental needs are the issue, it will suffice to install some isolated data marts. However, if a company regards access to an integrated, company-wide database as critical for its future survival in the market, then an enterprise data warehouse is the solution to implement.

My thoughts, so far, may have created the impression that there are only two options: either quickly install a few data marts to cover a few departments' current needs for information or embark on the expensive adventure of installing an enterprise data warehouse. There is a third option that combines the best of both worlds and can be implemented quickly without sacrificing future growth options. This third option is to lay down the foundation of an enterprise data warehouse by starting with a scalable data warehouse framework in a pilot project.

How to Proceed

The procedures that lead to this scalable data warehouse pilot project are specifically designed to satisfy two seemingly contradictory requirements ?fast delivery and expandability. Assuming the project is well prepared, it should not take more than three or four months to implement a fully operational pilot for an enterprise data warehouse. After it is finished, a company-wide data warehouse platform will be available allowing users to execute concrete analyses and develop a better understanding of their real and shared needs.

What is the difference between a data warehouse pilot and a departmental data mart? In fact, these two approaches differ more in their strategic goals than in total expenditure required for conceptualization and implementation. To run a project within a department means that you do not have to negotiate with other departments and IT managers ?something that can prove time-consuming. By contrast, if you want to establish a company-wide project, you must coordinate this effort with other departments, IT management and top executives.

Figure 1: The process of creating a pilot for an enterprise.

Figure 1 shows a preparatory phase to start the data warehouse pilot project. Thorough preparations will ensure that the pilot project will not exceed a three to four month time frame. Among other things, the project team will have to clarify technical issues regarding the system and the contracts with the system partners selected, issues mostly arising out of the chosen data warehouse architecture. The selection criteria for the central database computer will depend upon the amount of data anticipated now (and in the possibly known future), the number and types of users and the complexity of the queries. From the user's point of view, the selection of the analysis tool is the most critical issue; however, thanks to standardized interfaces like ODBC, it is not mandatory to stay with a chosen tool forever. In the beginning, it is sufficient to have a suitable OLAP tool for multidimensional analysis and software programs for accessing the data warehouse database directly.

Project Design

During the design phase, all the information necessary for implementing the data warehouse must be gathered, such as:

  • Requirements of the departments regarding the potential information uses;
  • Description of source data used;
  • Definition of business terms, data definitions and transformation rules;
  • Data models for the central data warehouse and the local data marts.

Simultaneously, the necessary hardware and software must be installed. Basically, the design phase can be broken down into four steps:

Business questions from departments. In order to increase the pilot project's chance for success, the selected business questions need to be of the greatest potential usefulness. Business questions do not necessarily have to be stated as questions. Existing reports that contain key figures or concrete suggestions as to analyses not possible before can also be used.

Data sources available. After the users' requests have been roughly analyzed, the IT department must investigate the source systems and interfaces available within the company. Due to the time constraints the pilot project faces, only data can be considered that is available and meets certain quality standards, such as completeness and correct contents. For a successful pilot installation, it is important to focus on the most important requirements. Therefore, the business questions must be correlated to the available data.

Business data model. The business data model reflects the real objects customer, order, product, etc. and their relationships to each other. In order to represent them correctly in the business data model, business rules have to be applied, such as "each order relates to one customer," or "each customer can belong to various categories."4

Logical data model. The logical data model in its normalized form is based on the business data model, and all objects are represented with their attributes. Usually, not all attributes available from the source systems are needed for answering the business questions submitted. However, potentially useful data elements will be integrated so it will not later be necessary to repeat all the analyses performed for the pilot project.

Figure 2: All the data is 
available through the access layer.
Figure 2: All the data is available through the access layer.

For reasons of performance or because the query tool requires it, denormalized data models are needed alongside a normalized data model.5 One possibility is to complement the normalized data model with summary tables. In another approach, so-called "star schema" or "star models" are created in addition to the normalized data model (see Figure 2). Together with the normalized data, they are available to uses through views on the database. Each time, the data warehouse is accessed through the security layer, in which all the access authorizations are stores. The normalized approach provides a magnitude of much greater capability and scalability in allowing for any question to be asked of the data and also to easily add more data in the future to the data warehouse database.

Figure 2 represents the data marts as logical structures, i.e., the numbers are recalculated each time they are called up. When starting the pilot project, the first step should be to create the data marts logically. Only if performance is really lacking, will they be physically implemented by using fact tables, since optimizing performance is no the pilot project抯 top priority. If the performance is acceptable, it is sufficient to begin analyzing the data selected. More fine-tuning of both the database and the tools utilized should be accomplished AFTER the users/managers have begun to report their experiences and new value.

A note with regard to these two different data models: the normalized data model represents all the business relationships and, therefore, should not be changed without very good reasons for doing so. By contrast, data marts are for the most part based on star schema models and contain data for specific subject areas. If an organization utilized star models and there are changing business requirements, the data marts have to be redesigned and/or re- adjusted to meet any new business requirements. This can be very expensive and also limit the future scalability and growth of your data warehouse.

Checking the Results

The last step before adopting the logical data model is to check it by using selected business queries. A typical business query may be: "Give me all sales in the a specific month, broken down into industries (i.e.,hotels and restaurants only); number of transactions; types of customers; and mode of payment."

Using this query, system integrator and users check the model table by table to find possible interpretative errors of the data modeler. Experience shows that end users are very well able to understand a logical data model, even if they have never seen one before. Particularly for direct queries to the database, a profound understanding of the data warehouse data model is a necessity.

Implementation of the Design

After finishing the design phase with the system check outlined, the design will be implemented on the target system with the first users creating analysis and reports. The implementation phase consists of all steps necessary to transfer data from the operational systems into the data warehouse.

Core steps for success in using this method are:

  1. The transformation of the logical data model into a physical data structure on the target system;
  2. The creation of extraction and transformation programs;
  3. The implementation of the required control procedures to periodically update the data in the data warehouse;
  4. Users defining and testing their new analysis and reporting.

A pilot project for an enterprise data warehouse will usually contain only a few gigabytes of data, involve one or two departments and two to four source systems.

It may require more coordination between departments, IT managers and company executives than does a quickly installed, isolated data mart, but these efforts will really pay off once the pilot data warehouse is up and running. Your company will be using this platform both for its current informational needs and with an eye to the future as your business and your requirements expand.

 

 

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
数据仓库又称为数据集市,是一个用于存储和管理企业各种数据的中心化系统。它通过将来自不同来源的数据进行整合和转化,为企业提供决策支持和业务分析的数据基础。 数据仓库的主要目的是提供一种经过整理和预处理的数据源,供企业管理层和决策者进行分析和决策使用。它不只是一个数据库,而是一个面向主题的跨系统的集成数据源,在数据集成和数据转换的基础上,提供简单、一致和易理解的数据视图。数据仓库包括了基本数据集以及与决策过程相关的元数据,以支持数据挖掘、OLAP分析和预测等业务需求。 数据仓库的优势主要体现在以下几个方面: 1. 数据一致性:数据仓库将来自不同系统的数据进行整合和统一,保证了数据在各个应用系统之间的一致性和准确性。 2. 决策支持:数据仓库为企业管理层提供了全面、准确的数据信息,帮助他们进行决策和制定战略。 3. 数据分析:通过数据仓库,企业可以进行复杂的数据分析,挖掘隐藏在海量数据背后的有价值信息,从而发现业务机会、预测市场趋势等。 4. 灵活性和可扩展性:数据仓库的设计可以根据企业需求进行灵活的配置和扩展,以适应业务的变化和发展。 总之,数据仓库作为企业信息化建设的核心组成部分,帮助企业整合、管理和分析多源、多种类的数据,为企业的决策和业务提供了有力支持,成为现代企业管理的重要工具。
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值