Boost lookup performance

This article presents several techniques for optimizing Informatica Lookup transformations, including lookup caching, persistent caches, unconnected lookups, and the ORDER BY clause in SQL overrides. These techniques can significantly reduce resource consumption and improve processing efficiency.




Introduction

Lookups are expensive in terms of resources and time.

A few tips on how to set up lookup transformations can dramatically improve their main constraints: time and resource consumption.

In this article you will learn about the following topics:

- Lookup cache

- Persistent lookup cache

- Unconnected lookup

- Order by clause within SQL

Lookup Cache

Problem:

For non-cached lookups, Informatica hits the database and runs the lookup query for every record coming from the source, with a significant impact on time and resources. If the source qualifier returns 2 million rows, Informatica hits the database 2 million times with the same query.


 Solution:


When a lookup is cached, Informatica queries the database once, brings the whole result set to the Informatica server, and stores it in a cache file. The next time the lookup is called, Informatica reads from the cache file instead. As a result, Informatica saves the time and resources needed to hit the database again.
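The difference between the two behaviours can be sketched in plain Python. This is an illustration of the idea, not Informatica internals; `fetch_lookup_rows` is a hypothetical stand-in for the round trip to the lookup table.

```python
def fetch_lookup_rows():
    """Simulates one round trip to the lookup table in the database."""
    return {1: "ES", 2: "UK", 3: "DE"}   # employee_id -> country

def lookup_non_cached(source_rows):
    """One database hit per source row: N round trips for N rows."""
    trips, out = 0, []
    for emp_id in source_rows:
        table = fetch_lookup_rows()       # hits the database every time
        trips += 1
        out.append(table.get(emp_id))
    return out, trips

def lookup_cached(source_rows):
    """One database hit total: the result set is kept in a local cache."""
    table = fetch_lookup_rows()           # single round trip
    return [table.get(emp_id) for emp_id in source_rows], 1
```

Both functions return the same results; the cached variant simply pays the database cost once instead of once per row.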


 When to cache a lookup?


As a general rule, we will use lookup cache when the following condition is satisfied:


                                              N>>M


where N is the number of records coming from the source and M is the number of records retrieved from the lookup table.


Note: Remember to create a database index on the columns used in the lookup condition to improve the performance of non-cached lookups.
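As a rough illustration, the N >> M rule can be written as a small predicate. The threshold factor of 10 is an arbitrary assumption for the sketch, not an Informatica setting; in practice the break-even point depends on row width, network latency, and available memory.

```python
def should_cache_lookup(n_source_rows, m_lookup_rows, factor=10):
    """Return True when N >> M, i.e. when caching is likely to pay off.

    The factor is a hypothetical tuning knob: caching wins when the
    source row count dwarfs the lookup table size.
    """
    return n_source_rows > factor * m_lookup_rows
```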


 


Persistent Lookup Cache


 Problem:


Informatica caches lookups by default. Consider the following scenario: the same lookup table is used in many different mappings. In each Lookup transformation, Informatica builds the same lookup cache over and over again. Do we really need to rebuild the lookup cache for every lookup?


 Solution:


It is possible to build the cache file once instead of creating the same cache file N-times.


Simply enabling the persistent cache option allows Informatica to reuse work it has already done, saving resources and time.


Check the following properties in the transformation to use a persistent lookup cache:


- Lookup caching enabled


- Lookup cache persistent


 




Figure 1: Cache Persistent Enabled


 


From then onwards, the same cache file is reused in all consecutive runs, saving the time of rebuilding it. However, if the lookup data changes, the cache must be refreshed, either by deleting the cache file or by checking the option “Re-cache from lookup source”.
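The build-once, reuse-afterwards idea behind a persistent cache can be sketched as follows. The pickle file and the `fetch_lookup_rows` helper are illustrative stand-ins for Informatica's own cache files, and the `recache` flag plays the role of “Re-cache from lookup source”.

```python
import os
import pickle

def fetch_lookup_rows():
    """Simulates the database query that builds the lookup cache."""
    return {1: "ES", 2: "UK", 3: "DE"}   # employee_id -> country

def get_lookup_table(cache_path="lkp_country.pkl", recache=False):
    """Build the cache once and persist it; later calls read the file.

    Passing recache=True forces a rebuild, mirroring the
    'Re-cache from lookup source' option.
    """
    if recache or not os.path.exists(cache_path):
        table = fetch_lookup_rows()       # hit the database once
        with open(cache_path, "wb") as f:
            pickle.dump(table, f)
        return table
    with open(cache_path, "rb") as f:     # reuse the persisted cache
        return pickle.load(f)
```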




Figure 2: Re-cache from Lookup Source Enabled


 


When a reusable lookup is shared across multiple mappings, enable the “Re-cache” option in one mapping only and leave it disabled in the others. Whenever the cache needs to be refreshed, simply run that first mapping.


Note: Take into account that data integrity must be ensured in long-running ETL processes when the underlying tables change frequently. Furthermore, Informatica PowerCenter cannot create cache files larger than 2 GB; if a cache exceeds 2 GB, Informatica splits it into multiple files, which decreases performance. In that case, consider joining the lookup source table in the database instead.


 


Unconnected lookup


 Problem:


Imagine the following mapping with 1,000,000 records retrieved from the Source Qualifier:




Figure 3: Connected Lookup Transformation


Suppose that, out of a million records, only 10% satisfy the condition. With a connected lookup, the transformation is still executed for the other 900,000 rows even though there is no match.


 Solution:


It is possible to call the Lookup transformation only when the condition is satisfied. In our scenario, the transformation would then be executed only 100,000 times out of 1 million. The solution is to use an Expression transformation that calls a Lookup transformation not connected to the data flow:




Figure 4: Unconnected Lookup Transformation


 


For instance, an Expression transformation will contain a port with the following expression:


IIF(ISNULL(COUNTRY), :LKP.LKP_COUNTRY(EMPLOYEE_ID), COUNTRY)


 


If COUNTRY is null, the lookup named LKP_COUNTRY is called with EMPLOYEE_ID as its argument; otherwise the existing COUNTRY value is kept.


The Lookup transformation has EMPLOYEE_ID as its input port and COUNTRY as the return port.


 


Order by clause within SQL


Informatica takes the time (and the effort) to bring in the data for every port in the Lookup transformation. Therefore, it is recommended to remove any ports that are not used, to avoid unnecessary processing.


It is also a best practice to apply an ORDER BY clause to the columns used in the lookup condition. Informatica appends this ORDER BY clause by default, which saves it the time and space of building its own index. However, Informatica sorts on every column in the SELECT statement, so redundant or unnecessary columns should be removed from the query.


To avoid the sort altogether, just add a comment marker at the end of the SQL override:




Figure 5: Avoiding ORDER BY in the SQL Override
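The reason the trailing comment works can be sketched as simple string concatenation: Informatica appends its generated ORDER BY after the override text, so a `--` at the end of the override turns that clause into a SQL comment. The SQL strings and the helper below are illustrative, not Informatica's exact behaviour.

```python
def final_lookup_sql(override, generated_order_by):
    """Models Informatica concatenating its ORDER BY after the override."""
    return f"{override} {generated_order_by}"

# Trailing "--" in the override comments out whatever Informatica appends.
override = "SELECT EMPLOYEE_ID, COUNTRY FROM EMP --"
sql = final_lookup_sql(override, "ORDER BY EMPLOYEE_ID, COUNTRY")
# The generated ORDER BY now sits after "--" and is ignored by the database.
```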


 


To sum up, Informatica lookups can be enhanced with different configurations to increase performance and save resources and time. However, before applying any of the features discussed here, analyze the tables and SQL queries involved.
