Informatica Powercenter调优

最新推荐文章于 2024-04-09 16:42:36 发布

shpd2852856

最新推荐文章于 2024-04-09 16:42:36 发布

阅读量2.3k

点赞数

分类专栏： Informatica Powercenter 文章标签： transformation table cache session join sql

本文链接：https://blog.csdn.net/shpd2852856/article/details/7408281

版权

Informatica Powercenter 专栏收录该内容

3 篇文章 0 订阅

订阅专栏

Mapping 调优

1) 尽量在source qualifier里过滤数据

2) Join的时候，用数据量较少的表作为Master Table, 尽量用整型字段作为key来join

3) 用aggregator, joiner的时候，如果能确认输入的数据已排过序, 可以打开sorted input. (Data must be sorted in the order of the 'group by' ports.) if the size of the data is small, can use the sorter trans before the aggregater trans. Otherwise do the sort out of informatica. when the size greater than 250,000 rows, it's better to use sorted input.

4) 用|| 而不是concat函数来连接字符

5) Transaction Control Transformation会影响后面的transformation（Aggregater[, Joiner, Rank, Soter] with the All input level transformation scope）或者target的效率, 因为它会忽略超过transaction上线的数据

6) SQL Override的时候尽量不要用aggregate function （it may cause the temp space in Oracle to fill up when selecting large amounts of data）

7) 优化Lookup

caching lookups, 但是当输入记录比较少（5000 rows）,但是lookup table比较大（5,000,000）, then the lookup table should not be cached.

static cache(by default), it is built when it processes the first lookup request, and its not updated during the session if rows are inserted.

dynamic cache, it should be used when the lookup table is also the target table. so that the lookup cache is updated.

persistent cache, when u want to save and reuse the cache files, when the lookup table doesn't change between session runs.

lookup trans应该去掉不用的端口

lookup condition把=（equality operator）用作第一个条件

lookup trans默认给每一个lookup port添加order by, 可以在lookup override sql中注释掉一部分字段来提高效率。或者在order by port上添加index

===================================================================================

Identify the bottlenecks in below stages :
Target
Source
Mapping
Session
System

Methods :
1) Test the throughput
To verify the database tuning, create a pipeline with one partition and measure the reader throughput in the workflow monitor and after that add more partitions and verify the throughput scales linearly. If the session has two partitions, the reader throughput should be twice as fast. If the reader throughput is slow then database has to be tuned properly.

Optimizing the Target
1) drop indexes and key constraints before load, then create them after load by pre-session and post-session sql command or pre-load and post-load store procedure.

2) minimize the relational database connections

3) using bulk load

4) In case the target is flat file, create in the same Informatica server, then FTP it to other server if needs.

Optimizing the Source
1) better to join multiple sources in the source qualifier, instead of using joiner trans.

2) create an index on the order by or group by column.

3) using conditional filter in the source qualifer to filter out unnecessary rows.

Optimizing the LookUp
1) consider to disable the lookup cache, if the lookup table is huge, and no efficient sql override. But also needs to consider about the no. of rows which will refer the lookup table. Uncached lookup can be made efficient by creating index on columns used in the lookup condition.

2) using conditional filter in the sql override to reduce the no. of rows cached.

3) don't fetch the unnecessary fields

4) remove the default ORDER BY clause, and create index on columns used in the ORDER BY clause.

5) use lookup trans towards the end of the mapping, this eliminates carring extra ports through the pipeline.

Optimizing the Aggregator
1) use Sorter trans first, then enable the Sorted Input Option. You can do the ORDER BY in SQ also.

2) if possible, use nemeric ports for group by columns.

Optimizing the Joiner
1) mark the source with less number of rows as MASTER.

2) if possible, use nemeric ports for join condition.

Optimizing the Sorter
1) if the amount of source data is greater than the Sorter cache size, then the Integration Service requires more than twice the amount of disk space. For optimal performance, allocate sort cache size as 16 MB of physical memory.

Optimizing the Expression
1) use operatoer instead of function in the expression.

2) uncheck the output option for the ports which are not required as output.

Optimizing in General
1) remove the unnecessary ports

2) don't use the verbose data

3) avoid unnecessary data type conversion

4) when updating targets, only link the necessary ports

5) use shortcut in sharefolder, and reusable components like mapplet, transformation

Optimizing in Session
1) A workflow runs faster when you do not configure it to write session and workflow log files. Workflows and sessions always create binary logs. To write the session or workflow log file, the Integration Service writes logging events twice

Optimizing by Partition
1) In task window under Config Object tab there is one option called Partitioning Options. In that there is one attribute called Dynamic Partitioning. We need to select the value Based on number of nodes in grid.
2) This will dynamically use the available nodes on Informatica servers. If dev has two nodes then the session will run with two partitions and if Prod has five nodes then the session will run with five partitions.
3) Merge the output file if using dynamical partitions. (sequential merge)
4) It is always best to select Target instances and Source qualifier as partition points. For these transformations we can select direct pass-through partition as type.
5) Also if the mapping involves aggregator we need to select a partition point before the aggregator. It is advised to select the sorter transformation as a partition point and for sorter we can select Hash Auto Keys partition as type