Pipeline Partitioning

Original post, July 9, 2015 17:55:41

A pipeline consists of a source qualifier and all the transformations and targets that receive data from that source qualifier.

You can set the following attributes to partition a pipeline:
• Partition points. Partition points mark thread boundaries and divide the pipeline into stages. The Integration Service redistributes rows of data at partition points.
• Number of partitions. A partition is a pipeline stage that executes in a single thread. If you purchase the Partitioning option, you can set the number of partitions at any partition point. When you increase or decrease the number of partitions at any partition point, the Workflow Manager increases or decreases the number of partitions at all partition points in the pipeline.
• Partition types. The Integration Service creates a default partition type at each partition point. If you have the Partitioning option, you can change the partition type. The partition type determines how the Integration Service redistributes data across partition points.


A partition is a pipeline stage that executes in a single reader, transformation, or writer thread. The number of partitions in any pipeline stage equals the number of threads in the stage. By default, the Integration Service creates one partition in every pipeline stage.

You can define up to 64 partitions at any partition point in a pipeline. The number of partitions remains consistent throughout the pipeline: if you define three partitions at any partition point, the Workflow Manager creates three partitions at all other partition points in the pipeline. In certain circumstances, the number of partitions in the pipeline must be set to one.
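To make the "one count everywhere" rule concrete, here is a minimal Python model. The `Pipeline` class, its method names, and the partition point names are hypothetical; the sketch only mirrors the behavior described above (one partition by default, a limit of 64, one thread per partition in each stage, and a change at one point propagating to every point). It is not an Informatica API.

```python
# Hypothetical model of partition counts in a pipeline; not an Informatica API.
MAX_PARTITIONS = 64  # documented maximum per partition point


class Pipeline:
    def __init__(self, partition_points):
        # Every partition point starts with one partition (the default).
        self.partitions = {point: 1 for point in partition_points}

    def set_partitions(self, point, count):
        if point not in self.partitions:
            raise KeyError(f"unknown partition point: {point}")
        if not 1 <= count <= MAX_PARTITIONS:
            raise ValueError(f"partition count must be between 1 and {MAX_PARTITIONS}")
        # Setting the count at one point sets it at every point,
        # so all stages always run the same number of partitions.
        for p in self.partitions:
            self.partitions[p] = count

    def threads_per_stage(self):
        # One thread per partition in each stage; the count is uniform.
        counts = set(self.partitions.values())
        if len(counts) != 1:
            raise RuntimeError("partition count must be uniform across the pipeline")
        return counts.pop()


if __name__ == "__main__":
    pipe = Pipeline(["Source Qualifier", "Aggregator", "Target"])
    pipe.set_partitions("Aggregator", 3)
    print(pipe.partitions)
    # {'Source Qualifier': 3, 'Aggregator': 3, 'Target': 3}
```

Keeping a per-point dictionary (rather than a single integer) is deliberate: it shows that the tool stores a setting at each point but keeps them in lockstep.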


You can define the following partition types in the Workflow Manager:
• Database partitioning. The Integration Service queries the IBM DB2 or Oracle database system for table partition information. It reads partitioned data from the corresponding nodes in the database. You can use database partitioning with Oracle or IBM DB2 source instances on a multi-node tablespace. You can use database partitioning with DB2 targets.
• Hash auto-keys. The Integration Service uses a hash function to group rows of data among partitions. The Integration Service groups the data based on a partition key. The Integration Service uses all grouped or sorted ports as a compound partition key. You may need to use hash auto-keys partitioning at Rank, Sorter, and unsorted Aggregator transformations.
• Hash user keys. The Integration Service uses a hash function to group rows of data among partitions. You define the number of ports to generate the partition key.
• Key range. With key range partitioning, the Integration Service distributes rows of data based on a port or set of ports that you define as the partition key. For each port, you define a range of values. The Integration Service uses the key and ranges to send rows to the appropriate partition. Use key range partitioning when the sources or targets in the pipeline are partitioned by key range.
• Pass-through. In pass-through partitioning, the Integration Service processes data without redistributing rows among partitions. All rows in a single partition stay in the partition after crossing a pass-through partition point. Choose pass-through partitioning when you want to create an additional pipeline stage to improve performance, but do not want to change the distribution of data across partitions.
• Round-robin. The Integration Service distributes data evenly among all partitions. Use round-robin partitioning where you want each partition to process approximately the same number of rows.
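The row-distribution rules of the non-database partition types can be simulated in a few lines of Python. This is a conceptual sketch, not Informatica code: rows are plain dicts keyed by port name, `zlib.crc32` stands in for the Integration Service's own hash function, and the key range boundaries (start inclusive, end exclusive) are an assumption of this sketch. Hash auto-keys and hash user keys share one function here; they differ only in which ports are passed as the key.

```python
# Conceptual simulation of partition types; not Informatica code.
# zlib.crc32 is a stand-in for the Integration Service's hash function.
import zlib


def round_robin(rows, n):
    # Deal rows out in turn so partition sizes differ by at most one row.
    parts = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        parts[i % n].append(row)
    return parts


def hash_partition(rows, n, key_ports):
    # Hash auto-keys: key_ports = all grouped/sorted ports of the transformation.
    # Hash user keys: key_ports = the ports you choose.
    # Rows with equal key values always land in the same partition.
    parts = [[] for _ in range(n)]
    for row in rows:
        key = "|".join(str(row[p]) for p in key_ports)
        parts[zlib.crc32(key.encode()) % n].append(row)
    return parts


def key_range(rows, port, ranges):
    # ranges: one (start, end) pair per partition; None means unbounded.
    # Assumption: start is inclusive, end is exclusive. Rows outside
    # every range are skipped in this sketch.
    parts = [[] for _ in ranges]
    for row in rows:
        value = row[port]
        for i, (start, end) in enumerate(ranges):
            if (start is None or value >= start) and (end is None or value < end):
                parts[i].append(row)
                break
    return parts


def pass_through(parts):
    # No redistribution: every row stays in its current partition.
    return [list(p) for p in parts]


if __name__ == "__main__":
    regions = ["east", "west", "east", "north", "west", "east"]
    rows = [{"id": i, "region": r} for i, r in enumerate(regions)]
    print([len(p) for p in round_robin(rows, 3)])  # [2, 2, 2]
    ranges = [(None, 2), (2, 4), (4, None)]
    print([[r["id"] for r in p] for p in key_range(rows, "id", ranges)])
    # [[0, 1], [2, 3], [4, 5]]
```

Round-robin balances row counts but may split a group across partitions; the hash variants keep each key value together at the cost of possibly uneven partition sizes.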


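1) Partitioning is done per pipeline. Partition points divide the pipeline into stages, and at each partition point you can set the partition type and the number of partitions. The number of partitions must be the same throughout the pipeline. At each partition point, rows are redistributed among the partitions according to the partition type, and the partitioned data is then processed by the partition point's transformation. The distribution of rows stays the same until the data reaches the next partition point; if that partition point uses a partition type other than pass-through (or a type different from the one at the previous partition point), the data is repartitioned.
At some pipeline stages you can put all source data into one partition and leave the other partitions empty. This lets you sort all the data within a single partition and then pass the sorted data to transformations that need sorted input, such as a Joiner or Aggregator transformation configured for sorted input.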
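The flow in note 1) can be sketched as a chain of redistributions. In this hypothetical Python sketch, `redistribute` returns the partition layout after one partition point: pass-through keeps the layout, round-robin and hash rebuild it, and the `"single"` option stands in for whatever configuration routes every row to one partition. `"single"` is not a named Informatica partition type, and `zlib.crc32` is only a stand-in hash.

```python
# Sketch of rows moving across partition points; all names are hypothetical.
import zlib


def redistribute(parts, partition_type, key_port=None):
    """Return the partition layout after a partition point of the given type."""
    n = len(parts)
    if partition_type == "pass-through":
        # Layout is unchanged: rows stay where they were.
        return [list(p) for p in parts]
    rows = [row for p in parts for row in p]
    new_parts = [[] for _ in range(n)]
    if partition_type == "round-robin":
        for i, row in enumerate(rows):
            new_parts[i % n].append(row)
    elif partition_type == "hash":
        for row in rows:
            h = zlib.crc32(str(row[key_port]).encode())
            new_parts[h % n].append(row)
    elif partition_type == "single":
        # Hypothetical layout: every row goes to partition 0, the rest stay empty.
        new_parts[0] = rows
    else:
        raise ValueError(f"unsupported partition type: {partition_type}")
    return new_parts


if __name__ == "__main__":
    rows = [{"id": i, "k": k} for i, k in enumerate([5, 3, 9, 1, 7, 2])]
    stage1 = redistribute([rows, [], []], "round-robin")  # 3 partitions
    stage2 = redistribute(stage1, "pass-through")         # same layout as stage1
    stage3 = redistribute(stage2, "single")               # all rows in partition 0
    stage3[0].sort(key=lambda r: r["k"])                  # sort inside one partition
    print([r["k"] for r in stage3[0]], stage3[1], stage3[2])
    # [1, 2, 3, 5, 7, 9] [] []
```

Sorting inside one partition gives a single, globally ordered stream, which is what a transformation that expects sorted input needs; the trade-off is that this stage no longer runs in parallel.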

2) Some transformations have partition points set by default, for example:
Source Qualifier, Normalizer
Controls how the Integration Service extracts data from the source and passes it to the source qualifier.
You cannot delete this partition point.

Rank, Unsorted Aggregator
Ensures that the Integration Service groups rows properly before it sends them to the transformation.
You can delete these partition points if the pipeline contains only one partition or if the Integration Service passes all rows in a group to a single partition before they enter the transformation.

Target Instances
Controls how the writer passes data to the targets.
You cannot delete this partition point.

Multiple Input Group
The Workflow Manager creates a partition point at a multiple input group transformation when it is configured to process each partition with one thread, or when a downstream Custom transformation with one input group is configured to process each partition with one thread.
For example, the Workflow Manager creates a partition point at a Joiner transformation that is connected to a downstream Custom transformation configured to use one thread per partition.
This ensures that the Integration Service uses one thread to process each partition at a Custom transformation that requires one thread per partition.
You cannot delete this partition point.
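The deletion rules listed above can be summarized as a small decision function. This is a hypothetical Python encoding for illustration only: the function name, its parameters, and the string names for transformations are invented here, and the final rule (that a partition point you added yourself can be removed) is an assumption not stated above.

```python
# Hypothetical encoding of the default partition point rules above;
# transformation names are plain strings, not an Informatica API.
FIXED_POINTS = {"Source Qualifier", "Normalizer", "Target"}
GROUPING_POINTS = {"Rank", "Unsorted Aggregator"}


def can_delete_partition_point(transformation, num_partitions,
                               groups_in_one_partition=False,
                               one_thread_per_partition=False):
    """Return True if the partition point at `transformation` may be deleted.

    groups_in_one_partition: all rows of each group reach a single partition
        before entering the transformation.
    one_thread_per_partition: the point exists because a multiple input group
        transformation (or a downstream Custom transformation) is configured
        to process each partition with one thread.
    """
    if transformation in FIXED_POINTS or one_thread_per_partition:
        return False
    if transformation in GROUPING_POINTS:
        return num_partitions == 1 or groups_in_one_partition
    # Assumption: a partition point you added yourself can be removed.
    return True
```

For example, `can_delete_partition_point("Rank", 4)` is `False`, while the same call with `groups_in_one_partition=True` is `True`.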

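3) For transformations that need to regroup data, you must set the partition point and partition type yourself so that the data passed to the transformation meets its requirements, such as grouped data, sorted data, or cached data. If the partition type or the number of partitions is set incorrectly, the session can fail.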
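Why the partition type matters for grouping transformations can be seen in a small Python sketch, assuming each partition aggregates its rows independently. With round-robin, one group is split across partitions and produces duplicate partial sums; with hashing on the group port, each group stays in one partition. The helper names and sample data are illustrative, `zlib.crc32` is a stand-in hash, and how the product itself reacts to a bad configuration (a validation error or a failed session) is not modeled.

```python
# Sketch: an Aggregator runs independently in each partition, so a group split
# across partitions yields duplicate partial results. Names are illustrative.
import zlib
from collections import defaultdict


def round_robin(rows, n):
    parts = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        parts[i % n].append(row)
    return parts


def hash_by(rows, n, port):
    # Equal values of `port` always map to the same partition.
    parts = [[] for _ in range(n)]
    for row in rows:
        parts[zlib.crc32(str(row[port]).encode()) % n].append(row)
    return parts


def aggregate_per_partition(parts, group_port, value_port):
    # SUM(value_port) GROUP BY group_port, computed separately per partition.
    results = []
    for part in parts:
        sums = defaultdict(int)
        for row in part:
            sums[row[group_port]] += row[value_port]
        results.extend(sums.items())
    return sorted(results)


if __name__ == "__main__":
    data = [("A", 10), ("A", 20), ("B", 5), ("B", 15)]
    rows = [{"dept": d, "amt": a} for d, a in data]
    print(aggregate_per_partition(round_robin(rows, 2), "dept", "amt"))
    # [('A', 10), ('A', 20), ('B', 5), ('B', 15)]  <- groups split, wrong totals
    print(aggregate_per_partition(hash_by(rows, 2, "dept"), "dept", "amt"))
    # [('A', 30), ('B', 20)]
```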


