The Microsoft Data Warehouse Toolkit读书笔记(八):第七章 OLAP数据库设计

最新推荐文章于 2021-11-23 14:30:57 发布

置顶 brightgems

最新推荐文章于 2021-11-23 14:30:57 发布

阅读量1.5k

点赞数

分类专栏：数据仓库与商业智能文章标签： microsoft 数据库读书 hierarchy attributes properties

本文链接：https://blog.csdn.net/brightgems/article/details/2040121

版权

数据仓库与商业智能专栏收录该内容

42 篇文章 0 订阅

订阅专栏

一、为什么使用分析服务

AS可以提供良好的查询性能和分析功能

1、聚集管理

两个挑战：哪些聚集是有用的；如何填充和管理这个聚集

2、聚集操作

预处理并存储聚集结果。允许用户直接参与前端操作

3、度量值汇总

4、查询性能

人们使用OLAP的主要原因。

5、其它原因

1）行级的安全性

2）高可用性

3）把元数据向业务用户公开

6、重要更新

· In Analysis Services 2000, dimensions were strongly tied to hierarchies like Year, Quarter, Month, Day. Non-hierarchical attributes like DayName or IsWeekend were second-class citizens and difficult to work with. In Analysis Services 2005, dimensions are attribute-based, and behave like relational dimensions. We talk more about this change later in this chapter, when we discuss how to design the dimensions.

· Analysis Services 2005 no longer requires that dimensions be cached in memory. You’ll be able to build cubes with extremely large dimensions—possibly beyond 100 M members. For those who don’t have extreme dimensions, the main implication of this important change is that although the server will still benefit from lots of physical memory, it should no longer fail if memory is tight.

· Proactive caching is used for BI applications with low latency. We discuss proactive caching in Chapter 17.

· The metadata repository is gone, replaced by XML data definition files.

· You can use the standard management tools to process partitions in parallel. Parallel processing occurs by default.

· Calculations are computed on the server, rather than distributed between client and server. This has the effect of reducing the performance of simple queries—probably by an amount that’s not noticeable to people. On the upside, complex queries will resolve much faster because the server rather than the client is always operating on the data.

· You can capture the text of users’ queries, which makes it much easier to understand what users are doing and to debug problematic queries.

· The Key Performance Indicator (KPI) framework lets you define corporate measures on the server. A KPI consists of expressions for value, goal, current status, and trend, which are displayed using simple graphics like gauges and stoplights.

· Analysis Services 2005 lets you present one database in multiple languages, including both metadata (the names of dimensions and levels) and data (dimension member contents and measure calculation and display).

· The administrative security model is richer than in Analysis Services 2000.

· MDX Scripts are the new mechanism for defining Calculated Members, Named Sets, and Cell Calculations. MDX Script syntax is simplified and improved, and can actually be debugged.

· You can write stored procedures for Analysis Services, using popular languages like VB.NET or C++. Stored procedures will perform far better than the old user-defined functions because stored procedures execute on the server.

· Data writeback, which was a flawed feature in Analysis Services 2000, is much better. Writeback performance has improved dramatically.

· There are new features for developers, like web services and the XMLA protocol. Analysis Services 2005 continues support for the old protocols and object models: OLE DB for OLAP, ADOMD, and ADOMD.NET. A new management object model, Analysis Management Objects (AMO), replaces DSO. AMO’s most important new feature is the ability to script an object’s creation or modification.

· There are new features to improve system management, like server trace events for monitoring and auditing system usage and performance. The Flight Recorder feature makes it much easier to debug problems.

7、为什么不使用AS

二、OLAP结构设计

步骤如下：

1） Set up the design and development environment

2） Create a Data Source View

3） Create and fine-tune your dimensions

4） Run the Cube Wizard and edit the resulting cube

5） Deploy the database to your development server

6） Create calculations and other decorations

7） Iterate, iterate, iterate

1、Getting Started

A. 安装

BI DEV STUDIO, Management studio

B、Create Relational Views

为维度模型中的每张表建立一个视图，并友好地命名

C、Populate the Data Warehouse Database

ETL过程

2、Create a Project and a Data Source View

在DSV中定义事实表和维度表，及它们的关系。在CSV中定义的对象描述将在Cube Wizard中引用

3、维度设计

Create dimensions one at a time, as we describe in the next section. Work on each dimension, setting its attributes and properties correctly, processing and browsing it until you like the way it looks. Move down the list of dimensions until you’ve defined all the dimensions associated with a fact table. Only then should you run the Cube Wizard.

· Dimension hierarchy: A common drilldown path in the dimension, predefined by the system developer for the benefit of users.

o Natural hierarchy: A drilldown path with referential integrity between attributes at different levels, like Year to Month to Day.

o Navigation hierarchy: A drilldown path between unrelated attributes, like Gender to Age. This is sometimes called a reporting hierarchy.

· Attribute hierarchy: Each dimension attribute is exposed as a single-level attribute hierarchy. You optionally define one or more multi-level dimension hierarchies from those attributes.

· Attribute relationship: An attribute relationship is, effectively, a declaration of referential integrity between two attributes. All attributes have an attribute relationship to the key of a dimension. You can and should define other attribute relationships.

· Member: A member is an item in a dimension level or attribute hierarchy. Examples are January 2006 in the Month level of the Date dimension, or Rural Cycle Emporium in the Customer level of the Customer dimension. A member is analogous to a row in the dimension table.

o Leaf member: A leaf is a member at the finest grain of the hierarchy, like a date, customer, or employee.

o All member: A member, usually defined only in Analysis Services, that represents the aggregation of all the leaf members: All customers, All products, and so on. The All member is optional.

o Default member: The member that’s included by default if a query does not reference the dimension. By default, the Default member is set equal to the All member—and behaves exactly as the relational database would if you exclude a dimension from the query.

· Reference dimension: The Kimball Method doesn’t have a name for this concept. A reference dimension is a dimension that’s referenced by others. For example, many dimensions like Customer, Vendor, and Shipping Destination could all use the same Geography table. All you need is to put the Geography surrogate key into those primary dimensions. We haven’t found reference dimensions to be particularly useful, and we don’t discuss them in this chapter.

· Fact dimension: A fact dimension is what the Kimball Method calls a degenerate dimension. It’s a dimension that’s populated from the fact table. The common example is a Purchase Order and Line Item Number from a sales transaction fact table. We don’t expect users to slice and dice by PO numbers, but it’s often very convenient to have them available in both the relational data warehouse and the cube. However, we almost never build a separate relational dimension table from this kind of entity. Any interesting attributes of the purchase order have been pulled out into one or more dimension tables. This entity is halfway between fact and dimension, so we call it a degenerate dimension. There are several ways to support degenerate dimensions in Analysis Services.

A、Standard Dimensions

A standard dimension contains a surrogate key, one or more attributes and attribute hierarchies, and sometimes zero or more multi-level hierarchies. Analysis Services can build dimensions from dimension tables that:

· Are denormalized into a star structure, as the Kimball Method recommends

· Are normalized into a snowflake structure, with separate tables for each hierarchical level

· Include a mixture of normalized and denormalized structures

· Include a parent-child hierarchy

· Have all Type 1 (update history) attributes, Type 2 (track history) attributes, or a combination

Tip: Analysis Services 2005 has built in many features to help you live without surrogate keys. For example, you can set up the database so that it automatically finds and “fixes” referential integrity violations. In our view, this activity should be done during the ETL process. Build Analysis Services databases from clean, dimensional data in the relational database.

If you have a lot of Type 1 changes in a very large database, you’re going to be unpleasantly surprised by the incremental processing performance.

B、Variable Depth or Parent-Child Hierarchies

当数据库很大或维度成员很多时，父子维度的性能并不理想

C、Multivalued or Many-to-Many Dimensions

可以将Bridge table 看成事实表，在DSV中不需要定义bridge table

4、Creating and Editing Dimensions

建立维度时有两种方法，一种是从关系数据仓库建立维度；另一种是先建立维度，再反向生成关系数据库，这使得微软可能在未来建立适用于不同行业的BI模板。

When you’re editing a new dimension, take the following steps:

· Edit the names of the dimension and each attribute within the dimension.

· Edit the properties of the dimension.

· Edit the properties of each attribute.

· Create attribute relationships and edit their properties.

· Create hierarchies. Edit the hierarchies’ properties and the properties of each level.

· If necessary, define dimension translations.

· Build, deploy, and process the dimension so you can look at the dimension’s data.

· Iterate, iterate, iterate.

A、Editing Dimension Properties

几个重要的属性:

Name: The name shows up in the user interface, so keep it short yet descriptive.
Description: Most query tools for Analysis Services have a way of showing the description to the user, usually as a tool tip. As discussed previously, populate the Description metadata from the information you collected during your requirements gathering process.
Type: Type is a frustratingly vague name for a property, but it’s hard to think of a better name. Type is a classification of the dimension that’s used by Analysis Services only in a few special cases (Time and Account). Most of the time, if you set the dimension type to anything other than Time or Account, nothing happens. Set your Date dimension’s Type to Time, so Analysis Services knows to use the dimension in time-related calculations.
AttributeAllMemberName: This label will show up to users when they start drilling down in the dimension and are looking at the All member of the dimension. By default, this property is blank, but that doesn’t mean the All member’s name is blank. Instead, the default All member name is All (on English language systems).
ErrorConfiguration: Analysis Services provides a lot of different options for handling problems with the dimension data, like duplicate keys and referential integrity violations. You can see the options if you set ErrorConfiguration to (custom). If you’re sourcing your dimension from a solid Kimball Method dimension table, as we’ve described in great detail in this book, you shouldn’t have to worry about this property. You shouldn’t have bad dimension data; shame on you if you do.

· ProcessingMode: Sometimes, all of the aggregations associated with a dimension need to be recalculated. With this property, you can specify how quickly after an update the new cube data is available for users to query.

· ProactiveCaching: This property is useful for low latency population of the dimension.

· StorageMode: Dimensions can be stored in Analysis Services format (MOLAP), or in the relational database (ROLAP). The vast majority of Analysis Services implementations should use MOLAP for all dimensions

B、Editing Attribute Properties

几个重要属性：

· Usage: This property is usually set correctly by the Dimension Wizard. One attribute for the dimension can and should have its usage set as Key. This is, obviously, the surrogate key for the dimension. Almost always, the other attributes are correctly set as Regular. The two exceptions are:

o The parent key for a Parent-Child dimension (set to Parent)

o A column in a Chart of Accounts dimension that defines the account type (set to AccountType)

· Source KeyColumns: By default, the KeyColumns property is correctly set to the appropriate column in the relational dimension table. The KeyColumns can be made up of more than one column, but in a Kimball Method system they always use the single-column surrogate key.

· NameColumn: You have an opportunity to set the NameColumn for the Key attribute in the Dimension Wizard.

· OrderBy and OrderByAttribute: You can easily set an attribute to be ordered by either its key or its name. For example, we usually want to sort months not alphabetically (by month name), but by the month number.

· IsAggregatable: This property of an attribute specifies whether the attribute can be aggregated. It should almost always be set to the default value of True. If you set this property to False, the attribute hierarchy will not contain an All level.

· AttributeHierarchyDisplayFolder: You can add a lot of value to your Analysis Services database by adding display folders for the Attribute Hierarchies. If you have more than 10 to 12 attributes in your dimension, you should create multiple display folders and assign attributes to them. Creating a display folder is as simple as typing a name into the attribute’s AttributeHierarchyDisplayFolder property

· AttributeHierarchyVisible: You can also prevent a specific attribute from appearing in the list of Attribute Hierarchies, by changing AttributeHierarchyVisible to False. Hiding the attribute hierarchy doesn’t prevent you from exposing the attribute in a multi-level hierarchy. In fact, it’s common to hide the attribute hierarchy for attributes that are available through multi-level hierarchies, especially if the dimension has a lot of attributes.

· DefaultMember: Each Attribute Hierarchy has a default member. Unless you change it here, the default member is the “All member” for the dimension

· Parent-child properties: If the dimension has a parent-child hierarchy, you need to set up several properties. parent-child properties include:

o RootMemberIf: How do you identify a member that’s at the top of the tree? This would be the CEO in an employee hierarchy. There are several options, all self-explanatory.

o NamingTemplate: How do you know what level of a parent-child hierarchy you’re on? Many parent-child hierarchies don’t have meaningful levels, but some do—employee organization is again a good example. For example, the top levels could be called Executive, VP, Senior Management, and then the remaining levels just called Level4, Level5, and so on. Use the NamingTemplate property to name these levels.

C、Related Attributes

When you create a dimension, the wizards automatically add all the dimension’s attributes as Related Attributes of the key.

尽量使用间接关联，如年->月->日，而不是年->日

成员键必须唯一

相关属性设置是否正确让会影响CUBE的性能

There is one vital property associated with a Related Attribute: RelationshipType, which can be Rigid or Flexible. RelationshipType is where Analysis Services manages Type 1 and Type 2 dimension changes. Rigid means the relationship is fixed over time: The Related Attribute is managed as a Type 2 slowly changing attribute. Flexible means the relationship can change over time: The Related Attribute can be updated in place

D、Creating Hierarchies and Editing Hierarchy Properties

Hierarchies have few interesting properties:

· Name and Description: Ensure these properties are set correctly, as we discussed earlier for the dimension.

· AllMemberName: The name of the All member of the hierarchy. This property of the hierarchy is very much like the AttributeAllMemberName property of the dimension. All attribute hierarchies of a dimension share the same All Member name. User-defined hierarchies can have unique All Member names.

· DisplayFolder: Usually we want the hierarchy to show up in the top level when the user is browsing the dimension, so we leave DisplayFolder blank. If you have a really complicated dimension, it may make sense to put hierarchies into a display folder, as discussed earlier for attributes.

· Display order for hierarchies: A subtlety of the Dimension Designer interface is that the order in which the hierarchies are displayed in the hierarchy pane is the same order in which they’ll appear in most query tools’ user interfaces. You can drag and drop the hierarchies to re-order them. This isn’t an official property that you can edit, except by reordering the hierarchies.

Levels are constructed from attributes：

· Name and Description: Ensure these properties are set correctly, as we discussed earlier for the dimension.

· HideMemberIf: Usually, you want this property set to its default value of Never. You might want to hide a member if you have an unbalanced hierarchy that doesn’t have members at all levels. The easiest example of an unbalanced hierarchy is a worldwide geography hierarchy. Small countries like Luxembourg don’t have states, so you’d usually just carry the country name down to the state level. In this case, you may choose to set HideMemberIf to ParentName or OnlyChildWithParentName. Be warned: Not all client query tools respect the HideMemberIf setting.

E、Browsing Dimension Data

5、Creating and Editing the Cube

术语：

· Measure: An Analysis Services measure is what the Kimball Method calls a fact.

o Physical measures are defined from a column in the relational fact table view. Because you’re sourcing from a view, you can create simple calculations in the view definition.

o Calculated measures are defined using Multidimensional Expressions (MDX). We discuss this topic in much greater detail later in the chapter.

o An aggregate function is associated with each measure. The default is to sum measures, but you can count, min, max, count distinct, or summarize by account. Some of the more complex aggregate functions, like by account, are available only in Enterprise Edition.

· Measure group: A measure group is directly analogous to a Kimball Method fact table. A measure group contains data at a grain specified by its dimensionality. If you have a relational fact table, you should expect to have a corresponding measure group.

· Cube: An Analysis Services cube is roughly analogous to a set of Kimball Method business process dimensional models. You could define a cube that encompasses the entire enterprise dimensional model, although that may be impractical for large and complex organizations.

This definition of a cube differs from Analysis Services 2000. The best practice in Analysis Services 2005 is to define a single cube for a database. To those familiar with Analysis Services 2000, this would be like defining a virtual cube that logically combines all the physical cubes in a database. You are still permitted to create multiple cubes in a database, but you shouldn’t. Instead, create a single cube with multiple measure groups.

· Analysis Services database: As discussed, an Analysis Services 2005 database should be one-for-one with the cube. You’re allowed to create multiple cubes within a single database, but the best practice is to create a database with a single cube. If you follow this practice, database and cube are synonymous.

· Perspective: Simplify the overall view of the cube by defining perspectives for different user communities. Perspectives are available only in SQL Server Enterprise Edition.

· Cells: A cell is a location in the cube.

o A leaf cell is analogous to a row in a fact table—although more accurately it’s a column in a row.

o A non-leaf cell is a location that corresponds to a summarized value, like Total Sales in November 2005 for Brand X. A cell is, effectively, the address of a number that you could bring back in a query. A cell represents a single coordinate from each dimension hierarchy, including the Measures dimension.

o A cell’s contents can be calculated, as we discuss later in this chapter.

o A cell can be empty. An empty cell takes up no space in an Analysis Services cube.

· Measure dimension: Analysis Services automatically creates a dimension called Measures where it stores all the fact-related attributes (called measures in Analysis Services). Users can locate measures by navigating to the Measures dimension.

The most important properties for a measure group are:

· Name and Description: Ensure these are set correctly. Most Analysis Services client query tools reveal the Description as a tooltip.

· StorageMode and ProactiveCaching: Usually during development, these are set to MOLAP and Off respectively. We talk about these properties later in this chapter, when we discuss physical design considerations. Early in the development process, you’ll be iterating on the design. Using MOLAP storage now—even if later you’ll be moving to a different kind of storage—keeps the development cycle faster and simpler.

· StorageLocation: You can point the cube’s MOLAP storage to a location other than the default.

The most important properties for a measure are:

· Name, Description, and FormatString: Ensure these are set correctly. Most Analysis Services client query tools automatically format data correctly, according to the FormatString.

· DisplayFolder: By default, measures are grouped by, well, measure group. Since each measure group can have dozens of measures, DisplayFolders give you an extra grouping layer within a measure group. They don’t exist anywhere else. You make them up by typing them in the property window.

· AggregateFunction: Most measures are defined to summarize by summing or counting; less frequently by distinct count, min, or max. You can also define options for semi-additive or non-additive measures: averaging, beginning of period, end of period, and so on.

· Visible: It’s surprisingly common to hide a measure. Many measures are used for calculations but aren’t very interesting on their own. Hiding them reduces clutter for the business users.

A、Edit the Cube Structure

There are a few properties of the cube：

· Name and Description: Edit the cube’s name and description as appropriate.

· DefaultMeasure: One measure will come up by default, if users don’t specify a measure in a query. You should choose which measure that is, rather than letting Analysis Services choose it for you.

· StorageLocation: You can set the StorageLocation here, rather than for each measure group individually.

B、Edit Dimension Usage

The Dimension Usage tab’s summarization of dimension usage is very similar to the Kimball Method bus matrix with the rows and columns flipped. At a glance you can identify which dimensions participate in which measure group, and at which cardinality.

Most dimension relationships are Regular, but you can also specify Fact, Reference, Data Mining, and Many-to-Many relationships here.

C、Build, Deploy, and Process the Project

By choosing Process, you’re doing three things:

· Building the project, looking for structural errors.

· Deploying that project to the target Analysis Services server, creating a new database structure (or updating an existing database structure).

· Processing that database, to fully or incrementally add data.

D、Create Calculations

Calculated measures will show up under the Measures dimension. Many calculated measures are quite simple, like the sum or ratio of two other measures. To a business user browsing the cube, calculated measures are largely indistinguishable from physical measures. Some query tools, including the Cube Browser integrated with BI Studio, will show a different icon for calculated measures and physical measures. A calculated measure is really just a calculated member, assigned to the Measures dimension. A very complex calculated measure may perform less well than a physical measure because calculations are performed at runtime.
Calculated members can be created on any dimension. Non-measure calculated members can be very powerful. They’re a way to create a kind of calculation that applies for some or all members. For example, you could create a calculated member Year To Date on the Date dimension, to automatically calculate multiple measures year-to-date.

Tip

The Add Business Intelligence Wizard will create a wide variety of calculations for you, including the Year To Date calculated member.

Named sets are a set of dimension members. A really simple named set would specify the set explicitly, perhaps as a list of important products. More complicated set definitions locate the set of products with a high price, products that sold well this year, and so on.
Calculated sub-cubes are a way to calculate an arbitrary portion of the cube’s data and summarizations. A common use of calculated subcubes is to allocate summary data (like monthly quotas) down to more fine-grained data. Or, calculated sub-cubes can provide a complex summarization method, if your business rules are more complicated than can be supported by the standard Aggregate Functions. If you’re among the handful of people who are intimately familiar with calculated cells from Analysis Services 2000, you’ll dig into calculated subcubes and readily understand the improvements that MDX scripting and scoped assignments provide.

E、Define Key Performance Indicators

· Name of the KPI: And the measure group with which the KPI is associated.

· Value to be measured: The underlying measure for the KPI. Most often you’ll use a physical measure or a calculated measure that you’ve already defined. You can enter an MDX expression here, but it usually makes more sense to create a calculated measure and use that. In the very simple example we used in this section, the value to be measured might be [Sales Revenue].

Goal for the value: An MDX expression that defines the target for the measure. A trivial goal would be a number: The sales target is $100,000. That would give a stupid KPI. More interesting would be a goal to increase sales 0.1 percent from the same day the previous week, or a goal that’s based on sales quotas
Status: A graphic and an MDX expression that describe how the business is doing relative to its goal. Analysis Services provides several built-in graphic gauges, including thermometers, traffic lights, and happy faces. The MDX expression needs to evaluate to numbers between -1 (very bad) to +1 (very good). If you’re clever about defining the status expression, you can do a good job of conveying important information about wide deviations from the goal, and keep minor deviations quiet.
Trend: A graphic and an MDX expression that describe whether you’re moving toward or away from your goal. Is the situation getting better or worse? As with the status graphic, Analysis Services provides a few built-in trend graphics, notably a trend arrow. The MDX expression must evaluate to numbers between -1 (going south in a hurry) and +1 (getting better fast).

F、Create Actions

The most obvious use of an Action is to execute a relational query. As we’ve already described, the majority of SQL Server 2005 Analysis Services databases are built at the same grain as the underlying dimensional model. An interesting BI application could include an Action that drills back to enterprise data, perhaps even back to the original transaction system. This query Action can easily be implemented as an Action that launches a Reporting Services report.

Analysis Services provides for three kinds of Actions:

· Drillthrough to the cube’s leaf-level data.

· Execute a Reporting Services report.

· Execute a generic Action. Actions can bring up a URL, return a rowset or dataset, or execute a command. An Action is defined as an MDX statement, and can be attached to specific parts of the cube. For example, you can have one Action that executes when someone right-clicks on the Month label in the Date dimension, and a second Action that launches when someone right-clicks on a data cell.

G、Maintain Perspectives

An Analysis Services Perspective is like a big view on top of the database, limiting a user’s view to a set of related measure groups and dimensions

与数据集市概念类似，但并不是用于安全方面的设置。

三、物理设计

1、Storage Mode: MOLAP, HOLAP, ROLAP

Some people argue that MOLAP storage wastes disk space. You’ve already copied data at least once to store it in the relational data warehouse. Now you’re talking about copying it again for the cube storage. To this argument we reply that a relational index also copies data. No one asserts you shouldn’t index your relational database.

MOLAP storage is highly efficient. The leaf data in MOLAP mode (data and indexes) tends to require about 20 percent of the storage of the relational source (data only, no indexes). The MOLAP store of the leaf data is comparable in size to a single relational index on the fact table—and buys a lot more for you than any single relational index could possibly do.

Another common argument against MOLAP is that it slows processing. ROLAP is always the slowest to process because writing the aggregations to the relational database is expensive. HOLAP is slightly faster to process than MOLAP, but the difference is surprisingly small.

2、Designing Aggregations

小的数据集不需要聚集

We recommend that you use the Aggregation Design Wizard during development. Set the “performance gain reaches” number very low: 5 to 20 percent,

The Aggregation Design Wizard positively encourages you to build too many aggregations, with its default setting of 30 percent. The first several aggregations provide a lot of benefit. After that, the incremental benefit at query time is slight; and the cost during processing time can grow substantially.

As clever as the Aggregation Design Wizard is, and as good as its recommendations are, it’s missing the most important information in aggregation design: usage. The aggregations that you really want are those based on the queries that business users issue.

3、Partitioning Plan

Partitioning is vital for large measure groups because partitions can greatly help query performance. The Analysis Services query engine can selectively query partitions: It’s smart enough to access only the partitions that contain the data requested in a query. This difference can be substantial for a cube built on, say, a billion rows of data.

With a small partition for current data, you have many more options to easily support real-time data delivery.

Partitioning also makes it possible—even easy!—to set up your measure group to support a rolling window of data. For example, your users may want to keep only 90 days of detailed data live in the cube. With a partitioned measure group, you simply drop the dated partitions. With a single partition, you have no option for deleting data other than reprocessing the entire measure group

There are several critical pieces of information to set correctly whenever you create a new partition:

· Source: It’s your job to ensure that the data that flows into each partition doesn’t overlap. In other words, the partition for year 2005 needs a source query that limits its data only to 2005.

· Slice: The Slice is where you tell Analysis Services which partitions to selectively access when resolving a query. Analysis Services 2005 sets this property automatically for you for MOLAP partitions. If you have a ROLAP partition—which is relatively unlikely unless you have a very low latency partition—set the slice by hand.

· Aggregations and storage: One interesting feature of Analysis Services partitions is that storage mode and aggregation design can differ for each partition. The most common intentional use of this feature is to add new kinds of aggregations only to current partitions as the aggregation plan changes over time.

4、Planning for Deployment

One of the biggest questions, of course, is how to size the system. Earlier in this chapter we mentioned the 20 percent rule: The leaf level MOLAP data tends to take 20 percent of the space required by the same data in the relational database (data only, no indexes). Another equally broad rule of thumb is that aggregations double that figure, up to a total of 40 percent of the relational data. This is still amazingly small.

5、Historical Processing Plan

We recommend that you use an Integration Services package rather than a script to perform measure group processing.

6、Incremental Processing Plan

A、Scheduled Processing

Incremental measure group processing is more complicated than dimension processing because you must design your system so you process only new data. Analysis Services doesn’t check to make sure you’re not inadvertently adding data twice or skipping a set of data.

It’s tempting to use the transaction date as the filter condition for the view or query that finds the current rows. In the real world, we usually see data flowing in for multiple days, so we tend to prefer the audit key method. If you’re sure that can’t happen in your environment, you can use the transaction date.

B、Planning for Updates to Dimensions

Specify whether or not to compute aggregations in the background by setting the ProcessingMode property of the dimension. The ProcessingMode can be:

· Regular: During processing, the leaf-level data plus any aggregations and indexes are computed before the processed cube is published to users.

· LazyAggregations: Aggregations and indexes are computed in the background, and new data is made available to the users as soon as the leaflevel data is processed. This sounds great, but it can be problematic for query performance, depending on the timing of the processing. You want to avoid a situation where many users are querying a large cube at a time when that cube has no indexes or aggregations in place because it’s performing background processing.

Deleting a dimension member is impossible, short of fully reprocessing the dimension. Fully reprocessing a dimension requires that any cubes using this dimension also be fully reprocessed. If you must delete dimension members, the best approach is to create a Type 1 attribute to flag whether the dimension member is currently active, and to filter those dimension members out of most reports and queries. Monthly or annually, fully reprocess the database.

C、Planning for Fact Updates and Deletes

There are several kinds of deleted data. The simplest, where you roll off the oldest month or year of fact data, is easily handled with a partitioned measure group. Just delete the partition by right-clicking it and choosing Delete in Management Studio or, more professionally, by scripting that action.

As with fact updates, deleting specific fact rows is impossible. The scenario is fairly common: You inadvertently load the same data twice into the relational database. It’s unpleasant but certainly possible to back out that load from the relational tables, especially if you use the auditing system described in Chapter 6. But within Analysis Services, you’re pretty much out of luck: Fully reprocess the affected partition.

brightgems

关注

0
点赞
踩
0

收藏

觉得还不错? 一键收藏
0
评论
The Microsoft Data Warehouse Toolkit读书笔记(八):第七章 OLAP数据库设计

一、为什么使用分析服务AS可以提供良好的查询性能和分析功能1、聚集管理两个挑战：哪些聚集是有用的；如何填充和管理这个聚集2、聚集操作预处理并存储聚集结果。允许用户直接参与前端操作3、度量值汇总4、查询性能人们使用OLAP的主要原因。5、其它原因1）行级的安全性2）高可用性3）把元数据向业务用户公开6、重要更新· In An
复制链接

扫一扫

专栏目录