Mondrian ROLAP 小结

最新推荐文章于 2019-05-17 15:19:45 发布

大林-Java

最新推荐文章于 2019-05-17 15:19:45 发布

阅读量3.1k

点赞数

分类专栏：数据库OLTP&数据仓库OLAP

本文链接：https://blog.csdn.net/paul_wei2008/article/details/19349157

版权

数据库OLTP&数据仓库OLAP 专栏收录该内容

4 篇文章 0 订阅

订阅专栏

eclipse mondrian rolap:

eclipse配置mondrian3.5 + jpvoit1.8 服务器jettty8，

启动报错java.lang.ClassNotFoundException: mondrian.web.servlet.MdxQueryServlet

解决方法如下：

1）将mondrian.war解压后,lib下jar包拷贝到Dynamic Project中 /WEB-INF/lib

将jpvoit.war解压后,lib下拷贝到到Dynamic Project中 /WEB-INF/lib，并且将jpivot、wcf文件夹及busy.jsp、testpage.jsp拷贝到 /WebContent下

2）或者WEB-INF/web.xml注释掉:MDXQueryServlet

mondrian.web.servlet.MdxQueryServlet 改成 mondrian.web.servlet.MDXQueryServlet（注意大小写）

mondrian3.5中是MdxQueryServlet ，jpivot中是MDXQueryServlet

MDX:解析SQL

1.在WEB-INF下添加一个mondrian.properties文件，内容如下：

mondrian.rolap.generate.formatted.sql=true

2 log4j.properties（src目录）

log4j.rootLogger=WARN, MONDRIAN

log4j.appender.MONDRIAN=org.apache.log4j.ConsoleAppender
log4j.appender.MONDRIAN.layout=org.apache.log4j.PatternLayout
log4j.appender.MONDRIAN.layout.ConversionPattern=%-4r [%t] %-5p %c %x - %m%n

log4j.category.mondrian.sql=DEBUG

log4j.category.mondrian.mdx=MONDRIAN --对比MDX与SQL

MDX（Multi Dimensional Expression）:

OLAP多维立方体查询语言，支持定义和操作多维对象和数据，在功能上类似于关系数据库查询语言SQL。是OLAP服务器与外界交互的专用语言.

Mondrian是一个开放源代码的Rolap服务器，使用java开发。它实现了MDX语言,XML解析,JOLAP规范.它从SQL和其它数据源读取数据并把这数据聚集在内存缓存中，能够分析存储在SQL数据库中的海量数据而不需要编写任何SQL脚本。Mondrian用于数据的多维探测。它支持把MDX（Multi-Dimensional eXpression）语言转换成SQL。而且自定义了一种使用mdx语言的客户端接口。

•数据立方：维和度量在一个特定主题范围内的集合.

•度量：(投影) 一个你感兴趣的一个测量值，例如, 某种产品的销量.

•维：关联维度表一个属性, 属性的集合, 用于度量的分类. 例如, 你希望通过产品的颜色、客户的性别来区分它们的销量。在这里，颜色和性别被称之为维

•成员(member)是指维上的一个点,是该维值集合中的一个值. 如名为gender的层次结构有两个member：‘M’ and ‘F’.

•层次结构(hierarchy)是为了方便分析问题而将一组member组织成的结构. 例如，store层次结构由store name, city, state, 和nation组成。（下钻）

•级别（Level）人们观察数据的某个特定角度（即某个维）还可以存在细节程度不同的各个描述方面（时间维：日期、月份、季度、年）。即维的级别

MDX转换SQL：mondrian.properties

# Allow the use of aggregates

mondrian.rolap.aggregates.Use=true

mondrian.rolap.aggregates.Read=true

mondrian.native.topcount.enable=true

mondrian.native.filter.enable=true

# mondrian.properties

mondrian.result.limit=50000

# For XML/A JSPs

mondrian.rolap.generate.formatted.sql=true

MDX：语言

SELECT

NON EMPTY {[Measures].[Store Sales]} ON COLUMNS,

NON EMPTY CrossJoin([Customers].[Country].Members, [Gender].[Gender].Members) ON ROWS

FROM [Sales]

WHERE {[Education Level].[Bachelors Degree], [Education Level].[Partial College]}

[Product].[All Products].Children：维度Product，层次All Products 子级别

nameColumn="protypename"：显示维度名称

ordinalColumn:s属性使层次表中指定的Column在给定的级别下能够按顺序排列

hasAll：true显示所有层次级别，false隐藏所有层次级别

如果<Hierarchy>元素的 hasAll属性为"false", 则 'all' 级别将不存在. 该维的默认成员将是第一个level的第一个成员; 例如, 在一个时间层次,将是该层次的第一个年份. 改变默认成员可能出现一些意外, 因为一般情况下推荐使用hasAll="true".

degenerate:虚拟维度不需要关联表，可以是单表中某个字段，如Male

type:数据类型（非datatype）

Inline tables:内嵌表

<Dimension name="Severity">
<Hierarchy hasAll="true" primaryKey="severity_id">
    <InlineTable alias="severity">
      <ColumnDefs>
        <ColumnDef name="id" type="Numeric"/>
        <ColumnDef name="desc" type="String"/>
      </ColumnDefs>
      <Rows>
        <Row>
          <Value column="id">1</Value>
          <Value column="desc">High</Value>
        </Row>
        <Row>
          <Value column="id">2</Value>
          <Value column="desc">Medium</Value>
        </Row>
        <Row>
          <Value column="id">3</Value>
          <Value column="desc">Low</Value>
        </Row>
      </Rows>
    </InlineTable>
    <Level name="Severity" column="id" nameColumn="desc" uniqueMembers="true"/>
</Hierarchy>
</Dimension>

这和你在数据库在有一个名为'severity' 的效果一样:

id	desc
1	High
2	Medium
3	Low

CalculatedMember：计算成员非实际存在的列

DimensionUsage：共享维

Join 优化：

If a dimension has a small number of members, Mondrian reads it into a cache on first use. See the mondrian.rolap.LargeDimensionThreshold property.
If a dimension (or, more precisely, the level of the dimension being accessed) is in the fact table, Mondrian does not perform a join.
If two dimensions access the same table via the same join path, Mondrian only joins them once. For example, [Gender] and [Age] might both be columns in the customerstable, joined via sales_1997.cust_id = customers.cust_id.

退化维：不值得建立维度表（如：没有额外信息需要关联，造成无谓连接损耗性能）

因为这里没有连接，所以维度foreignkey 属性也就没有必要使用了，并且层次<table>元素和primaryKey 属性也不需要

Virtual cubes：虚拟立方虚拟立方将两个或两个以上的正规立方(regular cube)联接起来

<VirtualCube name="Warehouse and Sales">
<CubeUsages>
<CubeUsage cubeName="Sales" ignoreUnrelatedDimensions="true"/>
<CubeUsage cubeName="Warehouse"/>
<CubeUsages/>
<VirtualCubeDimension cubeName="Sales" name="Customers"/>
<VirtualCubeDimension cubeName="Warehouse" name="Warehouse"/>
<VirtualCubeMeasure cubeName="Warehouse" name="[Measures].[Average Warehouse Sale]"/>
<CalculatedMember name="Profit Per Unit Shipped" dimension="Measures">
<Formula>[Measures].[Profit] / [Measures].[Units Shipped]</Formula>
</CalculatedMember>
</VirtualCube>

父子层次：树形结构

这里parentColumn 和nullParentValue是重要的属性:
属性parentColumn 是一个成员连接到它父亲成员的列名。在这种情况下, 它是指向雇员经理的外键。元素<Level>的子元素 <ParentExpression> 是与属性 parentColumn 有相同作用的，但是元素允许定义任意的SQL表达式, 就像元素 <Expression>. 属性 parentColumn (或者元素<ParentExpression>) 是维一向Mondrian指出层次有父子结构的。
属性 nullParentValue 是指明成员没有父成员的值。缺省情况下 nullParentValue="null", 但是因为许多数据库不支持null, 建模时用其他值来代替空值，0和-1.

Closure tables ：查找特定层次进行调整父子层次。

Calculated members ：计算成员（如下三种）自定义表达式

假设你想要创建一个度量，但事实表中并没有这个列，一种方法是使用WITH MEMBER子句，如下：

WITH MEMBER [Measures].[Profit] AS '[Measures].[Store Sales]-[Measures].[Store Cost]', FORMAT_STRING = '$#,###' SELECT {[Measures].[Store Sales], [Measures].[Profit]} ON COLUMNS, {[Product].Children} ON ROWS FROM [Sales] WHERE [Time].[1997]

但更好的办法是在模式中定义这个成员，下面在是cube中的定义

<CalculatedMember name="Profit" dimension="Measures"> <Formula>[Measures].[Store Sales] - [Measures].[Store Cost]</Formula> <CalculatedMemberProperty name="FORMAT_STRING" value="$#,##0.00"/> </CalculatedMember>

如果你喜欢，也可以在属性中声明，如下：

<CalculatedMember name="Profit" dimension="Measures" formula="[Measures].[Store Sales]-[Measures].[Store Cost]"> <CalculatedMemberProperty name="FORMAT_STRING" value="$#,##0.00"/> </CalculatedMember>

<CalculatedMemberProperty > (not <Property>) 元素和MDX语句中的FORMAT_STRING = '$#,###'片段相符，你也可以在这里定义其他特征(properties)，但FORMAT_STRING是到目前为止最常用的。

FORMAT_STRING特征值还可以用一个表达式来判断，在格式化一个单元时，第一个表达式(value<0)将用于判断选择哪个format string来进行格式化，然后才将相应的format string 用于指定的单元值，下面是一个添加了额外表达式的例子：

<CalculatedMemberProperty name="FORMAT_STRING" expression="Iif(Value < 0, '|($#,##0.00)|style=red', '|$#,##0.00|style=green')"/>

One additional calculated member property that is worth mentioning is DATATYPE. As with measures, setting datatype specifies how the calculated member is returned via XML for Analysis. The DATATYPE property of a calculated member can have values "String", "Integer", or "Numeric":

You can specify SOLVE_ORDER for the calculated member property. Solve order determines the priority of calculation in the event of competing expressions <CalculatedMemberProperty name="SOLVE_ORDER" value="2000"/>

Named sets :自定义命名集，随时引用

The WITH SET clause of an MDX statement allows you to declare a set expression which can be used throughout that query. For example,

WITH SET [Top Sellers] AS 'TopCount([Warehouse].[Warehouse Name].MEMBERS, 5, [Measures].[Warehouse Sales])' SELECT {[Measures].[Warehouse Sales]} ON COLUMNS, {[Top Sellers]} ON ROWS FROM [Warehouse] WHERE [Time].[Year].[1997]

The WITH SET clause is very similar to the WITH MEMBER clause, and as you might expect, it has a construct in schema analogous to < CalculatedMember>. The <NamedSet> element allows you to define a named set in your schema as part of a cube definition. It is implicitly available for any query against that cube:

<Cube name="Warehouse"> ... <NamedSet name="Top Sellers"> <Formula>TopCount([Warehouse].[Warehouse Name].MEMBERS, 5, [Measures].[Warehouse Sales])</Formula> </NamedSet> </Cube> SELECT {[Measures].[Warehouse Sales]} ON COLUMNS, {[Top Sellers]} ON ROWS FROM [Warehouse] WHERE [Time].[Year].[1997]

Warehouse	Warehouse Sales
Treehouse Distribution	31,116.37
Jorge Garcia, Inc.	30,743.77
Artesia Warehousing, Inc.	29,207.96
Jorgensen Service Storage	22,869.79
Destination, Inc.	22,187.42

A named set defined against a cube is not inherited by a virtual cubes defined against that cube. (But you can define a named set against a virtual cube.)

You can also define a named set as global to a schema:

<Schema> <Cube name="Sales" ... /> <Cube name="Warehouse" ... /> <VirtualCube name="Warehouse and Sales" .../> <NamedSet name="CA Cities" formula="{[Store].[USA].[CA].Children}"/> <NamedSet name="Top CA Cities"> <Formula>TopCount([CA Cities], 2, [Measures].[Unit Sales])</Formula> </NamedSet> </Schema>

User-defined function：用户自定义函数，有些函数不满足时候需要自定义

Sometimes Mondrian's schema language isn't flexible enough, or the MDX language isn't powerful enough, to solve the problem at hand. What you want to do is add a little of your own Java code into the Mondrian application, and a plug-in is a way to do this.

Each of Mondrian's extensions is technically a Service Provider Interface (SPI); in short, a Java interface which you write code to implement, and which Mondrian will call at runtime. You also need to register an extension (usually somewhere in your schema.xml file) and to ensure that it appears on the classpath.

Plug-ins include user-defined functions; cell, member and property formatters; dynamic schema processors and data source change listeners. There is incomplete support for member readers and cell readers, and in future we may support pluggable SQL dialects.

Other extensions include Dynamic datasource xmla servlet

A user-defined function must have a public constructor and implement the mondrian.spi.UserDefinedFunction interface. For example,

package com.acme; import mondrian.olap.*; import mondrian.olap.type.*; import mondrian.spi.UserDefinedFunction; /** * A simple user-defined function which adds one to its argument. */ public class PlusOneUdf implements UserDefinedFunction { // public constructor public Object execute(Evaluator evaluator, Exp[] arguments) { final Object argValue = arguments[0].evaluateScalar(evaluator); if (argValue instanceof Number) { return new Double(((Number) argValue).doubleValue() + 1); } else { // Argument might be a RuntimeException indicating that // the cache does not yet have the required cell value. The // function will be called again when the cache is loaded. return null; } } public String[] getReservedWords() { return null; } }

Declare it in your schema:

<Schema> ... <UserDefinedFunction name="PlusOne" class="com.acme.PlusOneUdf" /> </Schema>

And use it in any MDX statement:

WITH MEMBER [Measures].[Unit Sales Plus One] AS 'PlusOne([Measures].[Unit Sales])' SELECT {[Measures].[Unit Sales]} ON COLUMNS, {[Gender].MEMBERS} ON ROWS FROM [Sales]

If a user-defined function has a public constructor with one string argument, Mondrian will pass in the function's name. Why? This allows you to define two or more user-defined functions using the same class:

package com.acme; import mondrian.olap.*; import mondrian.olap.type.*; import mondrian.spi.UserDefinedFunction; /** * A user-defined function which either adds one to or * subtracts one from its argument. */ public class PlusOrMinusOneUdf implements UserDefinedFunction { private final name; private final isPlus; // public constructor with one argument public PlusOneUdf(String name) { this.name = name; if (name.equals("PlusOne")) { isPlus = true; } else if (name.equals("MinusOne")) { isPlus = false; } else { throw new IllegalArgumentException("Unexpected name " + name); } } public Object execute(Evaluator evaluator, Exp[] arguments) { final Object argValue = arguments[0].evaluateScalar(evaluator); if (argValue instanceof Number) { if (isPlus) { return new Double(((Number) argValue).doubleValue() + 1); } else { return new Double(((Number) argValue).doubleValue() - 1); } } else { // Argument might be a RuntimeException indicating that // the cache does not yet have the required cell value. The // function will be called again when the cache is loaded. return null; } } public String[] getReservedWords() { return null; } }

and register two the functions in your schema:

<Schema> ... <UserDefinedFunction name="PlusOne" class="com.acme.PlusOrMinusOneUdf"> <UserDefinedFunction name="MinusOne" class="com.acme.PlusOrMinusOneUdf"> </Schema>

If you're tired of writing duplicated User-defined Function declarations in schema files, you can pack your User-defined Function implemention classes into a jar file with a embedded resource file META-INF/services/mondrian.spi.UserDefinedFunction. This resource file contains class names of implementations of interface mondrian.spi.UserDefinedFunction, one name per line. For more details, you may look into src/main/META-INF/services/mondrian.spi.UserDefinedFunction in source ball and Service Provider. User-defined Functions declared by this means are available to all mondrian schema in one JVM.

Caution: you can't define more than one User-defined Function implementations in one class when you declare User-defined Functions in this way.

Member reader：成员读取器用于访问成员

<Dimension name="Has bought dairy">
<Hierarchy hasAll="true" memberReaderClass="mondrian.rolap.HasBoughtDairySource">
<Level name="Has bought dairy" uniqueMembers="true"/>
<Parameter name="expression" value="not used"/>
</Hierarchy>
</Dimension>.

TimeDimension：时间维

Aggregate tables：汇总表

当事实表包含巨大数量的数据时，可以使用汇总表来改善mondrian的执行性能。汇总表保存了事实表中预先计算好的数据

下面是一个简单的汇总表.

<Cube name="Sales"> <Table name="sales_fact_1997"> <AggName name="agg_c_special_sales_fact_1997"> 

<AggForeignKeyfactColumn="product_id" aggColumn="product_id"/> <AggFactCount column="FACT_COUNT"/> <AggMeasure name="[Measures].[Store Cost]" column="STORE_COST_SUM"/> <AggMeasure name="[Measures].[Store Sales]" column="STORE_SALES_SUM"/> <AggLevel name="[Product].[Product Family]" column="PRODUCT_FAMILY"/> <AggLevel name="[Time].[Quarter]" column="TIME_QUARTER"/> <AggLevel name="[Time].[Year]" column="TIME_YEAR"/> <AggLevel name="[Time].[Quarter]" column="TIME_QUARTER"/> <AggLevel name="[Time].[Month]" column="TIME_MONTH"/> </AggName> </Table>  </Cube>

<AggForeignKey> 元素, 这里没有显示, 允许你直接引用一个维表，而不必包含汇总表中的列. It is described in the aggregate tables guide.

In practice, a cube which is based upon a very large fact table may have several aggregate tables. It is inconvenient to declare each aggregate table explicitly in the schema XML file, and luckily there is a better way. In the following example, Mondrian locates aggregate tables by pattern-matching.

<Cube name="Sales"> <Table name="sales_fact_1997"> <AggPattern pattern="agg_.*_sales_fact_1997">  <AggFactCount column="FACT_COUNT"/> <AggMeasure name="[Measures].[Store Cost]" column="STORE_COST_SUM"/> <AggMeasure name="[Measures].[Store Sales]" column="STORE_SALES_SUM"/> <AggLevel name="[Product].[Product Family]" column="PRODUCT_FAMILY"/> <AggLevel name="[Time].[Quarter]" column="TIME_QUARTER"/> <AggLevel name="[Time].[Year]" column="TIME_YEAR"/> <AggLevel name="[Time].[Quarter]" column="TIME_QUARTER"/> <AggLevel name="[Time].[Month]" column="TIME_MONTH"/> <AggExclude name="agg_c_14_sales_fact_1997"/>  <AggExclude name="agg_lc_100_sales_fact_1997"/> </AggPattern> </Table> </Cube>

It tells Mondrian to treat all tables which match the pattern "agg_.*_sales_fact_1997" as aggregate tables, except "agg_c_14_sales_fact_1997" and "agg_lc_100_sales_fact_1997". Mondrian uses rules to deduce the roles of the columns in those tables, so it's important to adhere to strict naming conventions. The naming conventions are described in the aggregate tables guide.

The performance guide has advice on choosing aggregate tables.

agg_.+_sales_fact_1997

This will match the following table names:

agg_l_05_sales_fact_1997
agg_c_14_sales_fact_1997
agg_lc_100_sales_fact_1997
agg_c_special_sales_fact_1997
AGG_45_SALES_FACT_1997
AGG_drop_time_id_sales_fact_1997

Access-control：访问控制

Roles are defined by `<Role>` elements, which occur as direct children of the `<Schema>` element, after the last `<Cube>`. Here is an example of a role:

<Role name="California manager"> <SchemaGrant access="none"> <CubeGrant cube="Sales" access="all"> <HierarchyGrant hierarchy="[Store]" access="custom" topLevel="[Store].[Store Country]"> <MemberGrant member="[Store].[USA].[CA]" access="all"/> <MemberGrant member="[Store].[USA].[CA].[Los Angeles]" access="none"/> </HierarchyGrant> <MemberGrant member="[Customers].[USA].[CA]" access="all"/> <MemberGrant member="[Customers].[USA].[CA].[Los Angeles]" access="none"/> </HierarchyGrant> <HierarchyGrant hierarchy="[Gender]" access="none"/> </CubeGrant> </SchemaGrant> </Role>

A <SchemaGrant> defines the default access for objects in a schema. The access attribute can be "all" or "none"; this access can be overridden for specific objects. In this case, because access="none", a user would only be able to browse the "Sales" cube, because it is explicitly granted.

A <CubeGrant> defines the access to a particular cube. As for <SchemaGrant>, the access attribute can be "all" or "none", and can be overridden for specific sub-objects in the cube.

A <HierarchyGrant> defines access to a hierarchy. The access attribute can be "all", meaning all members are visible; "none", meaning the hierarchy's very existence is hidden from the user; and "custom". With custom access, you can use the topLevel attribute to define the top level which is visible (preventing users from seeing too much of the 'big picture', such as viewing revenues rolled up to the Store Country level); or use the bottomLevel attribute to define the bottom level which is visible (here, preventing users from invading looking at individual customers' details); or control which sets of members the user can see, by defining nested <MemberGrant> elements.

聚合表例子：

<Cube name="Sales">
  <Table name="sales">
    <AggName name="agg_2_sales">
      <AggFactCount column="row count"/>
      <AggForeignKey factColumn="prodid" aggColumn="prodid"/>
      <AggMeasure name="[Measures].[Unit Sales]" column="sum units"/>
      <AggMeasure name="[Measures].[Min Units]" column="min units"/>
      <AggMeasure name="[Measures].[Max Units]" column="max units"/>
      <AggMeasure name="[Measures].[Dollar Sales]" column="sum dollars"/>
      <AggLevel name="[Time].[Year]" column="year"/>
      <AggLevel name="[Time].[Quarter]" column="quarter"/>
      <AggLevel name="[Time].[Month]" column="month"/>
      <AggLevel name="[Payment Method].[Payment Method]" column="payment"/>
      <AggLevel name="[Customer].[State]" column="state"/>
    </AggName>
  </Table>
  <Dimension name="Product">
    <Hierarchy hasAll="true" primaryKey="prodid" primaryKeyTable="Product">
      <Join leftKey="mfrid" rightKey="mfrid">
        <Table name="Product"/>
        <Table name="Mfr"/>
      </Join>
      <Level name="Manufacturer" table="Mfr" column="mfrid"/>
      <Level name="Brand" table="Product" column="brand"/>
      <Level name="Name" table="Product" column="prodid"/>
    </Hierarchy>
  </Dimension>
  <Dimension name="Day" foreignKey="day">
    <Hierarchy hasAll="true" primaryKey="day">
      <Table name="Time" />
      <Level name="Year" column="year" type="Numeric" uniqueMembers="true" />
      <Level name="Quarter" column="quarter" uniqueMembers="false" />
      <Level name="Month" column="month" type="Numeric" uniqueMembers="false" />
    </Hierarchy>
  </Dimension>
  <Dimension name="Customer" foreignKey="custid">
    <Hierarchy hasAll="true" primaryKey="custid">
      <Table name="Customer" />
      <Level name="City" column="city" uniqueMembers="ture" />
      <Level name="State" column="state" uniqueMembers="true" />
    </Hierarchy>
  </Dimension>
  <Dimension name="Payment method">
    <Hierarchy hasAll="true">
      <Level name="Payment method" column="payment" uniqueMembers="ture" />
    </Hierarchy>
  </Dimension>
  <AggMeasure name="Unit Sales" aggregator="sum" />
  <AggMeasure name="Min Units" aggregator="min" />
  <AggMeasure name="Max Units" aggregator="max" />
  <AggMeasure name="Dollar Sales" aggregator="sum" />
</Cube>