TIGER QA CheckList for DataStage Programs

kj110

Article link: https://blog.csdn.net/kj110/article/details/5959482

General Rules
1 Files and identifiers named according to standards
2 Header documented with copyright, overview and amendment history
3 Comments added on amendment history and around code
4 Use of Transformer Stages avoided if there are more than 500,000 records
5 Use of multiple Transformer Stages avoided where the logic can be combined into a single stage
6 Use $PROJDEF as the value of any job parameter that is derived from an environment variable, if possible
7 Define password variables as encrypted type
8 Keep each parallel/server job to no more than 50 stages, including those in containers
9 Disable job monitor by setting $APT_NO_JOBMON=TRUE, if possible
10 Use the Sequential File Stage’s filter option to read only part of the file content
11 Perform sort before Join, Merge and Aggregator Stages
12 Unneeded columns removed as early as possible in the job flow
13 Combine operations around the same sort keys if possible
14 Null field value and null field length defined for nullable columns in the Sequential File Stage
15 Crossed links avoided
16 Use of server jobs avoided, if possible
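
Rules 4 and 8 above set numeric limits, so they can be checked mechanically. The following is only a minimal sketch: `JobSummary` is a hypothetical structure filled in by hand or by your own tooling, not a DataStage API object, and the stage type name "Transformer" is an assumed label.

```python
from dataclasses import dataclass
from typing import List

MAX_STAGES = 50                   # rule 8: stage limit, containers included
TRANSFORMER_ROW_LIMIT = 500_000   # rule 4: record threshold for Transformer Stages


@dataclass
class JobSummary:
    """Hypothetical summary of one job (not a DataStage API object)."""
    name: str
    stage_types: List[str]   # one entry per stage, container contents expanded
    expected_rows: int       # estimated number of records the job processes


def check_general_rules(job: JobSummary) -> List[str]:
    """Return a list of findings for rules 4 and 8; an empty list means no finding."""
    findings = []
    stage_count = len(job.stage_types)
    if stage_count > MAX_STAGES:
        findings.append(
            f"{job.name}: rule 8, {stage_count} stages exceeds {MAX_STAGES}"
        )
    if "Transformer" in job.stage_types and job.expected_rows > TRANSFORMER_ROW_LIMIT:
        findings.append(
            f"{job.name}: rule 4, Transformer Stage used with "
            f"{job.expected_rows} records"
        )
    return findings
```

A job with 51 stages, one of them a Transformer, and 600,000 expected records would produce two findings.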
Parallel Jobs
17 Runtime column propagation avoided
18 Repartitioning avoided
19 Use link sort with repartition, if possible
20 Use link sort instead of sort stage, if possible
21 Define the link with less data as the reference link for the Lookup Stage
22 Use a one-node configuration for jobs that deal with small amounts of data
23 Use a stage-wise config (.apt) only for reading files on a specific machine
24 Use of Basic Transformer Stage avoided
25 Use Change Capture Stage instead of Compare Stage, if possible
26 Use persistent data sets for sharing data between parallel jobs and when large volumes of data are processed
27 Use of data sets for long-term backup and recovery of source data avoided
28 Reading from sequential files using the Same partition method avoided
29 Use Join Stage instead of Lookup Stage if datasets are larger than available memory resources
30 Use Copy Stage for simple operations such as renaming columns, dropping columns and implicit type conversions
31 Use Modify Stage for single column operations such as keep/drop column, type conversion and null handling
32 Use Modify Stage for explicit type conversion and null handling
33 Every parallel job should have $APT_CONFIG_FILE as a project parameter with the value $PROJDEF
34 Write to a data set or file set instead of a sequential file if the data is going to be read back in parallel
35 Modification of the buffering mode avoided
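
Rule 29 comes down to a size comparison. Here is a rough sketch of that decision, assuming the reference data size can be estimated as rows times average row width. The function name and the memory budget parameter are made up for illustration, and the estimate ignores per-row overhead inside the engine.

```python
def choose_reference_stage(reference_rows: int,
                           avg_row_bytes: int,
                           memory_budget_bytes: int) -> str:
    """Suggest 'Lookup' when the reference data should fit in memory, else 'Join'.

    All three inputs are estimates supplied by the reviewer (assumption);
    nothing here is read from DataStage itself.
    """
    estimated_bytes = reference_rows * avg_row_bytes
    return "Lookup" if estimated_bytes <= memory_budget_bytes else "Join"
```

For example, 10,000,000 reference rows of about 200 bytes each come to roughly 2 GB, so with a 1 GB budget the sketch suggests a Join Stage.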
Server Jobs
36 Use an Inter Process Stage instead of a Sequential File Stage if the data is going to be read back
Job Sequence
37 Sequencers wrapped in StartLoop/EndLoop Activities with a large loop count avoided
Container
38 Use of Server Shared Container in parallel job avoided, if possible
Routine
39 Attach the job again to get a new job handle if the job is reset in the routine
Multiple Instance
40 Use of multiple instances avoided, if possible
41 Table/file locks avoided
42 Invocation ID given in each calling program
43 Duplicate invocation IDs avoided
44 Misarranged log entries avoided
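
Rules 42 and 43 can be reviewed from a plain list of the calls that start a multiple-instance job. This is a minimal sketch under the assumption that you have collected `(job_name, invocation_id)` pairs yourself. The function names are hypothetical and nothing is queried from DataStage.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Call = Tuple[str, str]  # (job_name, invocation_id); empty string means no ID given


def find_missing_invocation_ids(calls: Iterable[Call]) -> List[str]:
    """Rule 42: list the jobs that are called without an invocation ID."""
    return [job for job, invocation_id in calls if not invocation_id]


def find_duplicate_invocation_ids(calls: Iterable[Call]) -> List[Call]:
    """Rule 43: list (job, invocation ID) pairs that appear more than once."""
    counts = Counter(calls)
    return sorted(
        pair for pair, n in counts.items()
        if n > 1 and pair[1]  # skip calls with no ID; rule 42 reports those
    )
```

Running both functions over the calls of one sequence gives a quick list of entries to fix before instances collide at run time.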
Meta Data
45 Date and timestamp types defined correctly (timestamp length is 19 for Oracle)
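
The length of 19 in rule 45 matches a timestamp written to the second with no fractional part. The sketch below assumes the `YYYY-MM-DD HH:MM:SS` layout, which the checklist does not state, and counts the characters.

```python
from datetime import datetime

# Assumed text layout: date, one space, time to the second, no fractional seconds.
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def timestamp_text_length(value: datetime) -> int:
    """Return the character count of a timestamp rendered in the assumed layout."""
    return len(value.strftime(TIMESTAMP_FORMAT))
```

The date part takes 10 characters, the separator 1, and the time part 8, which gives 19. A column defined with a shorter length would truncate the seconds.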
