【网摘】Data Warehousing and BI Introduction

Resource:

Data Modeling Resource Center: http://infogoal.com/dmc/dmcdmd.htm

Data Warehousing Tutorial:  http://www.infogoal.com/datawarehousing/

  

  

Data Warehousing and Business Intelligence Introduction

from http://www.infogoal.com/datawarehousing/overview.htm

In this part of the Data Warehousing and Business Intelligence Tutorial we will discuss:

  • Definition of Business Intelligence
  • Definition of Data Warehousing
  • Benefits of Data Warehousing
  • Decisions Impact the Bottom Line
  • Operational Data Versus Warehouse Data
  • Data timeliness, consistency, and comparability
  • Decision Support Goals
  • What Data Warehouse IS and IS NOT
What is Business Intelligence (BI)?

Business Intelligence (BI) is the practice of supporting decision making through the presentation and analysis of data.  BI supports analytical processes such as:  customer segmentation, product mix analysis, budgeting, and operations improvements.  Techniques and tools associated with BI including data mining, dashboards, statistical analysis and regression modeling.

【 

     BI ---(function/purpose)---  Decision Making

     BI ---(Based On) --- Data Warehouse

     BI –(technique/tools)--  data mining / dashboards / statistical analysis / regression modeling

 

The article titled Business Intelligence and the Data Warehouse provides further information about BI.  Data is the needed raw material for material for BI and that is where Data Warehousing comes into play.

  
What is Data Warehousing?

Data Warehousing is a total architecture for collecting, storing, and delivering decision support data for an entire enterprise. Data warehousing is a broad area that is described point by point in this series of tutorials.

William (Bill) H. Inmon has provided an alternate and useful definition, “A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management’s decision-making process.”

As a total architecture it includes people, processes and technologies to achieve its goal which is to provide decision support data that is:

  •  
    •  
      •  
        • Consistent across the enterprise
        • Integrated
        • Standardized
        • Easy to access from multiple viewpoints
        • Easy to understand

   Dimension and Fact tables

Benefits of Data Warehousing

Data warehousing and business intelligence efforts improve decision making which in turn provides numerous benefits including:

  • Identifies ways to increase revenues or results
  • Helps to control costs
  • Helps to manage risks
  • Helps to increase customer satisfaction
Decisions Impact The Bottom Line

Decisions can impact the bottom line by reducing cost and increasing revenues.

Costs may be reduced by:

  •  
    •  
      •  
        •  
          • Avoiding problems such a poor credit risks
          • Negotiating improvements in supply
          • Dropping unprofitable products
          • Reducing waste due to low quality

Revenues may be increased by:

  •  
    •  
      • Understanding and better serving customers
      • Focusing on the most profitable products
      • Cross selling to customers
      • Capitalizing on trends
      • Growing marketing opportunities
Operational Data vs. Data Warehouse Data

The analytic data stored in the data warehouse is different from operational data.

Operational data is:

  1. Optimized for Transaction Processing
  2. Frequently Updated
  3. Designed using Entity Relationship Modeling

In contrast, data warehouse / analytical data is:

  • Optimized for Analytical Processing
  • Not Updated (It is loaded instead)
  • Designed using multiple patterns including Entity Relationship Modeling and Multidimensional Modeling

-- Star Schema / Snowflake

  

High Quality Data

High quality data has the following characteristics:

  • Timeliness - Data is up to date
  • Consistency - The same answers are produced each time in each place
  • Comparability - Numbers can be added and compared

See our article Data Sources for Data Warehousing and Business Intelligence to learn how to assess and improve the quality of data.

Decision Support Goals

The Decision Support function typically has the following goals:

  • Make fact based decisions
  • Make timely decisions
  • Make profitable decisions that reduce costs and increase revenue

These decisions can support a number of stakeholders

  • Customers
  • Employees
  • Shareholders
  • Suppliers
  • Community
What a Data Warehouse Is and Is Not

Let's make clear what a data warehouse is and is not.

A data warehouse is:

  • A copy of operational or other data
  • Data useful for strategic decision making

A data warehouse is not:

  • Another name for a database
  • All historical data
  • Operational data

================================================================================

Business Intelligence and the Data Warehouse

There are numerous tools available for analyzing and presenting data.  We will help you sort through the options.  The types of tools breaks down into these categories:

  • Interactive Query and Analysis Tools
  • Reporting Tools
  • Data Mining Tools

This tutorial provides an overview of each of these types of tools.  It is our plan to create a multi part tutorial that provides further detail about each major business intelligence tool category and technique.

Interactive Query and Analysis Tools

Query tools enable the exploration of data through a user friendly exploration interface.  These tools are typically: 

  • Interactive
  • Ad Hoc
  • Driven By Spontaneous User Questions
  • Display Lower Volumes of data

They provide views of data that follow familiar patterns: 

  • Spreadsheet
  • Drill Down
  • Roll Up
  • Pivot

Examples of query tools include:

  • Access and Excel - Microsoft
  • ProClairity - Microsoft
  • BusinessQuery - Business Objects (now SAP)
  • ReportNet – Cognos
Reporting Tools

Reporting tools produce outputs that can be stored and reviewed.  Often reports are produced on time schedule such as monthly.  

Reporting tools are typically:

  • Less Interactive
  • Less Ad Hoc
  • Report View of Data (Header and Detail)
  • Driven By Pre-established user questions
  • Display Moderate Volumes of data

Examples of reporting tools include: 

  • Access - Microsoft
  • Crystal Reports
  • Managed Reporting Environment - SolutionsIQ
  • SQL Server 2005 Reporting Services - Microsoft
 
 
Data Mining Tools

Data mining tools are used by specialized analysts and driven by the search for patterns. Analytic methods are used such as:

  • Associations and Clusters
  • Decision Trees
  • Fuzzy Logic
  • Genetic Algorithm
  • Naive Bayes 
  • Neural Networks
  • Regression Trees
  • Sequential Clusters
  • Time Series

In addition to these methods, mathematical techniques are used: linear regression, probability and optimization algorithms.

Examples of data mining tools include: 

  • BusinessMiner - SAP/Business Objects
  • Enterprise Miner - SAS
  • See5 - RuleQuest Research
  • SQL Server Analysis Services – Microsoft

The patterns and rules discovered through data mining can be used to improve decision making and to forecast the results of those decisions.

=================================================================================

Data Warehousing Architecture

Technical architecture is all about making the right choices for the data warehousing and business intelligence effort.  This article will help you to set the foundation for the successful data warehouse.

According to IEEE standard 1471-2000, "Software architecture is the fundamental organization of a system,
embodied in its components, their relationships to each other and the environment, and the principles governing its design and evolution".

Data warehousing technical architecture includes:

  • Functional and Non-functional Requirements
  • Architectural Principles
  • Buy, Build or Re-Use
  • Metadata
  • Data Sources
  • Extracting
  • Physical Storage and Operation
  • Data Model Patterns
  • Mapping, Transforming, Enriching, and Loading
  • Analyzing and Presenting
  • Managing, Operating, and Securing
  • Architecture Roadmaps

This is a large topic, so there are many references to supporting data warehousing and business intelligence articles at this and other websites.

At another level, data warehousing architecture builds on the classic system pattern: input, process and output:

Data Warehousing Basic Technical Architecture

Functional and Non-functional Data Warehousing Requirements

The recommendation "Begin with the end in mind" is very true for data warehousing and business intelligence.  The end that we have in mind is a system that satisfies both functional and non-functional requirements, that is, a system that does what it is supposed to do.

Functional requirements (business requirements) are needs identified by the business relating to data and business processes.  For example, we may be looking for a system that provides information about customers, territories and products and that supports business processes of selling and customer support.  See the article, Requirements for Data Warehousing and Business Intelligence, for guidance on how to gather and organize business requirements.

Non-functional requirements are needs about performance and IT chosen practices.  Performance includes issues such as required system availability and recoverability.  It depends on the volume of data and number of users expected for the data warehouse.  IT chosen practices include selected technologies (the "tech stack") and standards. 

Data Warehousing Architectural Principles

Data warehousing architectural principles, largely complementing enterprise architecture set a framework for decision making.  One set of architectural principles are the "ilities":

  • Flexibility - systems should adaptable to changing conditons
  • Scalability - systems should be expandable

Some additional principles might include:

  • Re-use before buy and buy before build
  • Develop in manageable steps.
  • Don't boil the ocean
Buy, Build or Re-Use Data Warehouse Components?

A critical question to data warehousing efforts, is how to obtain the resources that make up the data warehousing system.  There are three options:

  • Re-use existing resources
  • Buy a new resource
  • Build a resource

Re-using existing resources can often save money and deliver a superior and more maintainable solution.  If we buy or build new components every time that there is a new project, then the portfolio of resources will soon become bloated and expensive to maintain.  Re-use can have drawbacks.  Existing resources may not meet current function or non-functional requirements. 

Buying a new data warehousing resource can save time and money over building a resource.  Buying is a good choice when products are available for a price less than building and meet a large percentage of requirements.  Purchased software may have more features and fewer problems than home grown software for example.

Building a resource can be a good answer when there are no existing resources to re-use and purchased resources that meet requirements are not available for a reasonable price.  Building a solution or part of a solution can result in a competitive advantage where your organization has a capability that is not readily duplicated by competitors.  Cost is also a factor.  Purchased software often has a per user or per computer charge while in-house developed software can be made available to internal users without additional licensing fees.

In general, we recommend "Re-use before buy and buy before build".  Some combination is likely.  Create a list of needed resources and specifying the type of sourcing for each item.

Metadata for Data Warehousing and Business Intelligence

Metadata is often defined as "data about data".  In practice, data warehousing metadata is any data that describes or controls the system that is not procedural programming code. 

Examples of metadata include:

  • Data definitions
  • Data models
  • Data mapping specifications

Defining data once through metadata and then re-using those data definitions can save much development and support time while resulting in more consistent data warehousing solutions.

Metadata is typically created in tools such as the data modeling tool and the ETL tool.  It may then be stored in metadata repository that manages and coordinates this information.

See the article, Metadata for Data Warehousing and Business Intelligence, for further insights.

Data Sources for the Data Warehouse

The information provided by the data warehouse and business intelligence is only as good as its inputs.  Finding, understanding, selecting and improving data sources are critical to the success of any data warehousing project.

A leading cause of data warehousing project failures is a lack of understanding of data sources and poor data quality of data sources.  We recommend gaining an understanding of the data by use of data profiling tools and the improvement of its data quality through data cleansing approaches.

The article Data Sources for Data Warehousing and Business Intelligence provides further information.

Data Warehouse Data Extracts

The data warehousing extract process pulls data out of data sources so that is available for later transformation and then load into the data warehouse and other databases.  Architectural choices include choice of extract tool and timing of extracts.

Data warehouse extracts are further explained the article ETL - Extract Transform Load for Data Warehousing and Business Intelligence.

Database Selection and Physical Storage for Data Warehouse

The choice of where and how to store the data for the data warehousing system is a critical architectural question.  Part of the issue is data warehousing "style":

  • Enterprise data warehouse
  • Basic data warehouse system
  • Specialized data warehouse / data marts
  • Federated data warehouse
  • Virtual data warehouse
  • ROLAP vs MOLAP

The basic data warehouse system calls for the creation of the following types of databases:

  • Data source staging (Kimball's "Landing Area")
  • Data warehouse (Kimbal's "Backroom")
  • Data marts (Kimball's "Frontroom")

The article, Database Choices for Data Warehousing and Business Intelligence, describes each of these alternatives in detail.

Data Model Patterns for Data Warehousing

A data model is a graphical view of data created for analysis and design purposes.  While architecture does not include designing data warehouse database in detail, it does include defining principles and patterns modeling specialized parts of the data warehouse system.

Areas that require specialized patterns are:

  • Staging / Landing Area - looks like source system
  • Data warehouse / Backroom - uses normalized ERD
  • Data Mart / Frontroom - uses dimension modeling - the ROLAP star schema or the MOLAP cube

In addition to these specialized patterns, the architecture should include other pattern descriptions for:

  • Naming of tables and columns
  • Assignment of keys
  • Indexing
  • Relational Integrity (RI)
  • Audit history

These patterns are described in the article Data Models for Data Warehousing and Business Intelligence in greater detail.

Mapping, Transforming, Enriching, and Loading Data Warehousing Data

After data has been extracted and the physical storage areas created, it is time to pump the data through the data warehousing system - from data sources to staging to data warehouse to data mart to BI query to the business user.

These key activities are needed to support this process:

  • Mapping - Data sources are aligned with data targets.  We decide what data goes where.
  • Transforming - Data is modified to meet requirements.
  • Enriching - Additional data may be added such as geocoding.
  • Loading - Data is inserted into databases

The article ETL - Extract Transform Load for Data Warehousing and Business Intelligence provides further information on this subject.

Business Intelligence Analysis and Presentation

Business Intelligence is the part of the data warehousing system where business users analyze the data and prepare presentations. The data warehouse architecture must provide for the needs of the business people who will access the system.

Business people are likely to act like farmers who harvest a crop of known information or explorers who are seeking new patterns. Both types of access must be supported. 

Data warehouse architecture includes the selection and use of the following types of tools: 

  • Query Tools  
  • Reporting Tools  
  • Fixed Reports  
  • Analysis Tools  
  • Data Mining Tools

The article BI - Business Intelligence supplies further information about each tool category.

Managing, Operating, and Securing the Data Warehouse

To achieve benefits from business intelligence, it is important to manage the data warehouse to make sure that it continues to provide value and avoids risk of loss.  This includes activities like:

  • Obtaining feedback from BI users
  • Assuring the quality of data
  • Monitoring the performance of the data warehouse
  • Securing the data warehouse from threats
  • Enabling governance of the overall system

The management of data warehousing is further described in the article Operations.

Data Warehousing Architecture Roadmaps

Data warehousing architecture helps to answer critical questions:

  • What is the current state of data warehousing and business intelligence?
  • What is the desired future state of data warehousing and business intelligence?
  • What are the high level steps that must be taken to move from the current state to the future state?

Roadmaps identify actions that must be taken to achieve the desired future state.  In addition, roadmaps specify intermediate future states that must be passed through to achieve the future state.

==================================================================================

转载于:https://www.cnblogs.com/fangwenyu/archive/2009/11/30/1613668.html

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值