Informatica Dynamic Lookup Cache

A LookUp cache does not change its data once built. But what if the underlying table upon which lookup was done changes the data after the lookup cache is created? Is there a way so that the cache always remain up-to-date even if the underlying table changes?

Lookup组件的cache在数据一旦创建后不会改变。但是如果lookup table在cache创建后 发生了改变呢?有没有一种随着lookup table实施更新cache的方式?

1 Why do we need Dynamic Lookup Cache?

我们为什么需要dynamic lookup cache

Let's think about this scenario. You are loading your target table through a mapping.Inside the mapping you have a Lookup and in the Lookup, you are actually looking up the same target table you are loading.

考虑下这种情境, 你通过一个mapping来load的target table,mapping里存在lookup, 并且lookup table正是你在loading的target table。

You may ask me, "So? What's the big deal? We all do it quite often...". And yes you are right.

我们或许经常遇到这样的情境。

There is no "big deal" because Informatica (generally) caches the lookup table in the very beginning of the mapping, so whatever record getting inserted to the target table through the mapping, will have no effect on the Lookupcache.

因为informaica会在mapping开始的时候加载lookup table到cache,所以无论什么记录被插入到target table都不会影响lookup。

The lookup will still hold the previously cached data, even if the underlying target table is changing.

Lookup table保留原有的cache,不会改变。

But what if you want your Informatica Lookup cache to get updated as and when the data in the underlying target table changes?

但是如果你希望lookup cache和lookup table能够实时同步呢?

What if you want your lookup cache to always show the exact snapshot of the data in your target table at that point in time? Clearly this requirement will not be fullfilled in case you use a static cache. You will need a dynamic cache tohandle this.

你需要使用dynamic lookup。

But in which scenario will someone need to use a dynamic cache? To understand this,let's first understand a static cache scenario.

具体在何种情况下使用dynamic lookup呢? 先理解下static cache的方案。

2 Static Lookup Cache Scenario

Let's suppose you run a retail business and maintain all your customer information in a customer master table (RDBMS table). Every night, all the customers from your customer master table is loaded in to a Customer Dimension table in your datawarehouse. Your source customer table is a transaction system table, probably in 3rd normal form, and does not store history. Meaning, if a customer changes his address, the old address is updated with the new address.

假设你运营零售业务,在客户主表里维护所有的客户信息。每天晚上客户主表里所欲客户都要load到DW的客户维度表里。你的源客户表是事物系统表,也许符合3NF,  但是不存储历史。这意味着,如果一个客户变更了地址,旧地址将会被变更为新地址。

But your data warehouse table stores the history (may be in the form of SCDType-II). There is a map that loads your data warehouse table from the source table. Typically you do a Lookup on target (static cache) and check with every incoming customer record to determine if the customer is already existing in target or not. If the customer is not already existing in target, you conclude the customer is new and INSERT the record whereas if the customer is already existing, you may want to update the target record with this new record (if the record is updated). This scenario - commonly known as 'UPSERT' (update else insert) scenario 

但是你的数据仓库存储历史。.从源表load进DW表存在映射。典型地,你对目标进行查找并与进来的客户记录比对来决定该客户是存在于目标中。如果客户不存在,这是一个新客户插入这条记录,如果已经存在,用新的记录去更新。这种方案,通常叫做 UPSERT

Youdon't need dynamic Lookup cache for the above type of scenario.

这种情况下你不需要dynamiclookup

DynamicLookup Cache Scenario

Noticein the previous example I mentioned that your source table is an RDBMS table.Generally speaking, this ensures that your source table does not have anyduplicate record.

But,What if you had a flat file as source with many duplicate records in the samebunch of data that you are trying to load? (Or even a RDBMS table may alsocontain duplicate records)

Wouldthe scenario be same if the bunch of data I am loading contains duplicate?

UnfortunatelyNot. Let's understand why from the below illustration. As you can see below,the new customer "Linda" has been entered twice in the source system- most likely mistakenly. The customer "Linda" is not present in yourtarget system and hence does not exist in the target side lookup cache.

Whenyou try to load the target table, Informatica processes row 3 and inserts it totarget as customer "Linda" does not exist in target. Then Informaticaprocesses row 4 and again inserts "Linda" into target sinceInformatica lookup's static cache can not detect that the customer "Linda"has already been inserted. This results into duplicate rows in target.

Theproblem arising from above scenario can be resolved by using dynamic lookupcache

Hereare some more examples when you may consider using dynamic lookup,

  • Updating a master customer table with both new and updated customer information coming together as shown above
  • Loading data into a slowly changing dimension table and a fact table at the same time. Remember, you typically lookup the dimension while loading to fact. So you load dimension table before loading fact table. But using dynamic lookup, you can load both simultaneously.
  • Loading data from a file with many duplicate records and to eliminate duplicate records in target by updating a duplicate row i.e. keeping the most recent row or the initial row
  • Loading the same data from multiple sources using a single mapping. Just consider the previous Retail business example. If you have more than one shops and Linda has visited two of your shops for the first time, customer record Linda will come twice during the same load.

Howdoes dynamic lookup cache work

Onceyou have configured your lookup to use dynamic cache (we will see below how todo that), when Integration Service reads a row from the source, it updates thelookup cache by performing one of the following actions:

  • Inserts the row into the cache: If the incoming row is not in the cache, the Integration Service inserts the row in the cache based on input ports or generated Sequence-ID. The Integration Serviceflags the row as insert.
  • Updates the row in the cache: If the row exists in the cache, the Integration Service updates the row in the cache based on the input ports. The Integration Serviceflags the row as update.
  • Makes no change to the cache: This happens when the row exists in the cache and the lookup is configured or specified To Insert New Rows only or, the row is not in the cache and lookup is configured to update existing rows only or, the row is in the cache, but based on the lookup condition, nothing changes. The Integration Serviceflags the row as unchanged.

Noticethat Integration Service actually flags the rows based on the above threeconditions.

Andthat's a great thing, because, if you know the flag you can actually reroutethe row to achieve different logic.

Fortunately,as soon as you create a dynamic lookup Informatica adds one extra port to thelookup. This new port is called:

  • NewLookupRow

Usingthe value of this port, the rows can be routed for insert, update or to donothing. You just need to use a Router or Filter transformation followed by anUpdate Strategy.

Oh,forgot to tell you the actual values that you can expect in NewLookupRow portare:

  • 0 = Integration Service does not update or insert the row in the cache.
  • 1 = Integration Service inserts the row into the cache.
  • 2 = Integration Service updates the row in the cache.

Whenthe Integration Service reads a row, it changes the lookup cache depending onthe results of the lookup query and the Lookup transformation properties youdefine. It assigns the value 0, 1, or 2 to the NewLookupRow port to indicate ifit inserts or updates the row in the cache, or makes no change.

Configuringa Dynamic Lookup - Mapping Example

Ok, Idesign a mapping for you to show Dynamic lookup implementation. I have given afull screenshot of the mapping. Since the screenshot is slightly bigger, so Ilink it below. Just click to expand the image.

Ifyou check the mapping screenshot, there I have used a router to reroute theINSERT group and UPDATE group. The router screenshot is also given below. Newrecords are routed to the INSERT group and existing records are routed to theUPDATE group.

DynamicLookup Sequence ID

Whileusing a dynamic lookup cache, we must associate each lookup/output port with aninput/output port or a sequence ID. The Integration Service uses the data inthe associated port to insert or update rows in the lookup cache. The Designerassociates the input/output ports with the lookup/output ports used in thelookup condition.

Whenwe select Sequence-ID in the Associated Port column, the Integration Servicegenerates a sequence ID for each row it inserts into the lookup cache.

When theIntegration Service creates the dynamic lookup cache, it tracks the range ofvalues in the cache associated with any port using a sequence ID and itgenerates a key for the port by incrementing the greatest sequence ID existingvalue by one, when the inserting a new row of data into the cache.

Whenthe Integration Service reaches the maximum number for a generated sequence ID,it starts over at one and increments each sequence ID by one until it reachesthe smallest existing value minus one. If the Integration Service runs out ofunique sequence ID numbers, the session fails.

DynamicLookup Ports

Thelookup/output port output value depends on whether we choose to output old ornew values when the Integration Service updates a row:

  • Output old values on update: The Integration Service outputs the value that existed in the cache before it updated the row.
  • Output new values on update: The Integration Service outputs the updated value that it writes in the cache. The lookup/output port value matches the input/output port value.

Note: We can configure to output old or newvalues using theOutputOld Value On Update transformation property.

HandlingNULL in dynamic LookUp

Ifthe input value is NULL and we select the IgnoreNull inputs for Update property for the associated input port, theinput value does not equal the lookup value or the value out of theinput/output port. When you select the IgnoreNull property, the lookup cache and the target table might becomeunsynchronized if you pass null values to the target. You must verify that youdo not pass null values to the target.

Whenyou update a dynamic lookup cache and target table, the source data mightcontain some null values. The Integration Service can handle the null values inthe following ways:

  • Insert null values: The Integration Service uses null values from the source and updates the lookup cache and target table using all values from the source.
  • Ignore Null inputs for Update property : The Integration Service ignores the null values in the source and updates the lookup cache and target table using only the not null values from the source.

If weknow the source data contains null values, and we do not want the IntegrationService to update the lookup cache or target with null values, then we need tocheck theIgnore Nullproperty for the corresponding lookup/output port.

Whenwe choose to ignore NULLs, we must verify that we output the same values to thetarget that the Integration Service writes to the lookup cache. We canConfigure the mapping based on the value we want the Integration Service tooutput from the lookup/output ports when it updates a row in the cache, so thatlookup cache and the target table might not become unsynchronized.

  • New values. Connect only lookup/output ports from the Lookup transformation to the target.
  • Old values. Add an Expression transformation after the Lookup transformation and before the Filter or Router transformation. Add output ports in the Expression transformation for each port in the target table and create expressions to ensure that we do not output null input values to the target.

Someother details about Dynamic Lookup

Whenwe run a session that uses a dynamic lookup cache, the Integration Servicecomparesthe values in all lookup ports with the values in their associated input portsby default.

Itcompares the values to determine whether or not to update the row in the lookupcache. When a value in an input port differs from the value in the lookup port,the Integration Service updates the row in the cache.

Butwhat if we don't want to compare all ports?

Wecan choose the ports we want the Integration Service to ignore when it comparesports. The Designer only enables this property for lookup/output ports when theport is not used in the lookup condition. We can improve performance byignoring some ports during comparison. (Learn how to improveperformance of lookup transformation here)

Wemight want to do this when the source data includes a column that indicateswhether or not the row contains data we need to update. Select theIgnore inComparison property for all lookup ports except the port that indicates whether or not to updatethe row in the cache and target table.

Note: We must configure the Lookuptransformation to compare at least one port else the Integration Service failsthe session when we ignore all ports.

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值