Informatica Best Practeces for ICC-CSDN博客

本文链接：https://blog.csdn.net/langcaiye/article/details/6573074

本文探讨了ICC作为整合标准的角色及其应用场景，包括命名规则、SQL覆盖问题、source与target的组织方式等。同时，提供了Lookup Transformation优化技巧，并讨论了复用与非复用转换、会话及工作流的内部差异。

摘要生成于 C知道，由 DeepSeek-R1 满血版支持，前往体验 >

ICC的意思我总是记不住。老外好像蛮喜欢搞这些概念性的东西，叫的名字一个比一个好听。我只谈论一下我对ICC粗浅的理解。

所谓整合的东西，在这里就是提供了一些标准。包括命名规则等等。

mapping里其实是不提倡override任何东西的。尤其是sql override这种其实很好用的尴尬功能。因为写sql真的很方便，但sql是基于特定数据库的。出于将来对不同数据库迁移的考虑，sql override又很不应该被提倡。一个现实好用却不被提倡的东西，只能是个杯具。

关于source和target统一建立在一个独立的文件夹里，然后其他文件夹使用其快捷方式（shortcut）。这种组织方式是有其一定优越性的。缺点是把事情搞得复杂了，但优点比较明显：对于source和target的维护只要在一个地方进行就可以了。不过，再仔细想想，source和target变了，mapping还能独善其身吗？！

target load type里头的bulk和normal

bulk load绕过DB的log，大量快速进行，无法roll back。一般情况下建议只在truncate load的时候用bulk，存在update的时候都用normal。事实上对于update的操作，bulk会自动跳档到normal的。所以这里的建议是对同是存在insert和update的时候，均使用normal。

Lookup Transformation Optimizing Tips

When your source is large, cache lookup table columns for those lookup tables of 500,000 rows or less. This typically improves performance by 10 to 20 percent.
The rule of thumb is not to cache any table over 500,000 rows. This is only true if the standard row byte count is 1,024 or less. If the row byte count is more than 1,024, then the 500k rows will have to be adjusted down as the number of bytes increase (i.e., a 2,048 byte row can drop the cache row count to between 250K and 300K, so the lookup table should not be cached in this case). This is just a general rule though. Try running the session with a large lookup cached and not cached. Caching is often still faster on very large lookup tables.
When using a Lookup Table Transformation, improve lookup performance by placing all conditions that use the equality operator = first in the list of conditions under the condition tab.
Cache only lookup tables if the number of lookup calls is more than 10 to 20 percent of the lookup table rows. For fewer number of lookup calls, do not cache if the number of lookup table rows is large. For small lookup tables(i.e., less than 5,000 rows), cache for more than 5 to 10 lookup calls.
Replace lookup with decode or IIF (for small sets of values).
If caching lookups and performance is poor, consider replacing with an unconnected, uncached lookup.
For overly large lookup tables, use dynamic caching along with a persistent cache. Cache the entire table to a persistent file on the first run, enable the update else insert option on the dynamic cache and the engine will never have to go back to the database to read data from this table. You can also partition this persistent cache at run time for further performance gains.
Review complex expressions.

关于不推荐使用update else insert这种方式，我只找到这个非官方的回答： Informatica best practices does not recommend this approach because generally it does result in slower performance.

Questions:

What's the internal differences between reusable and non-reusable transformation, session, and worklet?

How to organize the tgt table, mapping and workflow?

最后是一些rep DB里的table的参考（更多建议使用View）

OPB_SUBJECT - PowerCenter folders table

This table stores the name of each PowerCenter repository folder.

Usage: Join any of the repository tables that have SUBJECT_ID as column with that of SUBJ_ID in this table to know the folder name.

OPB_MAPPING - Mappings table

This table stores the name and ID of each mapping and its corresponding folder.

Usage: Join any of the repository tables that have MAPPING_ID as column with that of MAPPING_ID in this table to know the mapping name.

OPB_TASK - Tasks table like sessions, workflow etc

This table stores the name and ID of each task like session, workflow and its corresponding folder.

Usage: Join any of the repository tables that have TASK_ID as column with that of TASK_ID/SESSION_ID in this table to know the task name. Observe that the session and also workflow are stored as tasks in the repository. TASK_TYPE for session is 68 and that of the workflow is 71.

OPB_SESSION - Session & Mapping linkage table

This table stores the linkage between the session and the corresponding mapping. As informed in the earlier paragraph, you can use the SESSION_ID in this table to join with TASK_ID of OPB_TASK table.

OPB_TASK_ATTR - Task attributes tables

This is the table that stores the attribute values (like Session log name etc) for tasks.

Usage: Use the ATTR_ID of this table to that of the ATTR_ID of OPB_ATTR table to find what each attribute in this table means. You can know more about OPB_ATTR table in the next paragraphs.

OPB_WIDGET - Transformations table

This table stores the names and IDs of all the transformations with their folder details.

Usage: Use WIDGET_ID from this table to that of the WIDGET_ID of any of the tables to know the transformation name and the folder details. Use this table in conjunction with OPB_WIDGET_ATTR or OPB_WIDGET_EXPR to know more about each transformation etc.

OPB_WIDGET_FIELD - Transformation ports table

This table stores the names and IDs of all the transformation fields for each of the transformations.

Usage: Take the FIELD_ID from this table and match it against the FIELD_ID of any of the tables like OPB_WIDGET_DEP and you can get the corresponding information.

OPB_WIDGET_ATTR - Transformation properties table

This table stores all the properties details about each of the transformations.

Usage: Use the ATTR_ID of this table to that of the ATTR_ID of OPB_ATTR table to find what each attribute in this transformation means.

OPB_EXPRESSION - Expressions table

This table stores the details of the expressions used anywhere in PowerCenter.

Usage: Use this table in conjunction with OPB_WIDGET/OPB_WIDGET_INST and OPB_WIDGET_EXPR to get the expressions in the Expression transformation for a particular, mapping or a set.

OPB_ATTR - Attributes

This table has a list of attributes and their default values if any. You can get the ATTR_ID from this table and look it up against any of the tables where you can get the attribute value. You should also make a note of the ATTR_TYPE, OBJECT_TYPE_ID before you pick up the ATTR_ID. You can find the same ATTR_ID in the table, but with different ATTR_TYPE or OBJECT_TYPE_ID.

OPB_COMPONENT - Session Component

This table stores the component details like Post-Session-Success-Email, commands in Post-Session/pre-Session etc.

Usage: Match the TASK_ID with that of the SESSION_ID in OPB_SESSION table to get the SESSION_NAME and to get the shell command or batch command that is there for the session, join this table with OPB_TASK_VAL_LIST table on TASK_ID.

OPB_CFG_ATTR - Session Configuration Attributes

This table stores the attribute values for Session Object configuration like "Save Session log by", Session log path etc.

Widget Ids and transformation types

1 Source
2 Target
3 Source Qualifier
4 Update Strategy
5 expression
6 Stored Procedures
7 Sequence Generator
8 External Procedures
9 Aggregator
10 Filter
11 Lookup
12 Joiner
14 Normalizer
15 Router
26 Rank
44 mapplet
46 mapplet input
47 mapplet output
55 XML source Qualifier
80 Sorter
97 Custom Transformation