Pivoting Data in OWB

转载 2012年03月29日 19:14:56

The pivot transformation operator enables you to transform a single row of attributes into multiple rows in an efficient manner. This example illustrates transforming a table that has a row for each year with the quarterly sales in a table with a row for each quarter. The OWB pivot operator makes this simple (there is also an unpivot).

So taking a simple example as follows:

YEAR   Q1_sales      Q2_sales Q3_sales   Q4_sales
---------- ---------- ---------- ---------- ----------
      2005      10000      15000      14000      25000
      2006      12000      16000      15000      35000
      2007      16000      19000      15000      34000

we wish to transform the data set to the following with a row for each quarter:

---------- -- ----------
      2005 Q1      10000
      2006 Q1      12000
      2007 Q1      16000
      2005 Q2      15000
      2006 Q2      16000
      2007 Q2      19000
      2005 Q3      14000

We can design this in the OWB mapping as;

Looking at the internals of the operator we see how this is described. The pivot operator allows you to define the input columns, the output columns and how the data is pivoted. This is achieved by defining a few pieces of information;
  • the key columns (the columns from the source that will appear in the output of the pivoted data)
  • the row locator (this is the pivot column)
  • the pivot transformation (which values to project for the pivoted columns
Firstly in our example we define the key column to be YEAR, this will be the same for each pivoted row;

Then we define the QUARTER column as the pivot column, this is the row locator (in OWB terms).

Finally we define how the row is transformed from a row with columns to a number of rows, we do this by entering a row in the table for each case we desire (so we have a row for Q1, a row for Q2, a row ... etc.).

This makes the map design so much simpler as you can see, since the operator encapsulates the transformation in a simple manner. The example has been scripted (get the script here) so that you can create it and have a look around at how this is done....

Here is the sample data I used also:
-- Execute the following where you deployed SALES_BY_QTR
-- I manually added some rows in SALES_BY_QTR for the example:
insert into SALES_BY_QTR values (2005, 10000, 15000, 14000, 25000);
insert into SALES_BY_QTR values (2006, 12000, 16000, 15000, 35000);
insert into SALES_BY_QTR values (2007, 16000, 19000, 15000, 34000);

The reverse of this scenario is the unpivot, the script for the unpivot example can be found here.

You select the key just like the pivot above, then define the row locator (or unpivot column), defining the values for each match row:

Then define the output attributes for the unpivot:

Finally define the unpivot transformations (how the column data is taken from the matching row):

If your data has many rows with sales values for a quarter (for a single year) you will need to aggregate the data before unpivoting, for example the map below first aggregates and sums sales before unpivoting. The data is grouped by YEAR (key) and QUARTER (row locator) and
the output expression has SUM(SALES), the map then unpivots that data. (you cannot tweak the agg function
just now in the unpivot)

Hope this is useful and helps illustrates the pivoting transformation capability.

SQLServer 2005 MSG 2570 类型为"In-row data" 中。列的值超出了数据类型的范围

修复 SQLServer 2005 MSG 2570 类型为"In-row data")中。列 "NAME" 的值超出了数据类型"varchar"的范围...

HR--Copy of fetching HR data in different ways

There are 3 methods in fetching the data from HR database tables . 1.using select statements . Thi...

Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center(论文译文)

Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center论文译文: 对Mesos的整体架构进行介绍...

【分享】More Author Data in ArnetMiner

More Autor Data in ArnetMiner aminer-authorname-and-id: Mapping between name and id of authors in...

bug宝典 hadoop篇 /hadoop/hdfs/data is in an inconsistent state: file VERSION has cTime missing.

原本2个datanode,现在只看到了一个 从下图也可以看到只有一个live nodes,但是也没有dead node啊。 检查进程, 缺少DataNode[hadoop@dashuju1...
  • warrah
  • warrah
  • 2017年02月20日 13:40
  • 495

数据结构复习笔记(Data Structures & Algorithms in Java, 4th) --Memory Management (Heap & Stack)

这篇文章主要是学习Java虚拟机(JVM)内存管理的基础,学习内存管理所需要使用的几个数据结构 (Stack, Heap)。...

RDF data in Neo4J - the Tinkerpop story

My previous blog post discussed the use of Neo4J as a RDF triple store. Michael Hunger however...

Spatial data in sql and draw boundary on bing maps

Spatial data at the sql server data in sql and draw boundary on bing maps" TITLE="Spatial d...

II. Data types in the Win32 API

II. Data types in the Win32 API (Win32 API 中的数据类型)WINAPI 定义了它独有的数据类型们,它们被定义在 C/C++ 语言中(int, char, fl...

Persistent and Transient Data Structures in Clojure

最近在项目中用到了Transient数据结构,使用该数据结构对程序执行效率会有一定的提高。刚刚接触Transient Data Stuctures,下面将自己关于对其的了解总结如下:1.clojure...
  • zdplife
  • zdplife
  • 2016年08月06日 21:03
  • 484
您举报文章:Pivoting Data in OWB