最近要求写一段程序,现把问题描述一下,以整理思绪。
这张表叫做收寄表TB_EVT_MAIL_CLCT,主要描述收寄的相关信息,为做简化,包含字段有邮件编号,收寄日期,收件人,发件人,加载日期,加载时间
收寄日期(CLCT_DATE)就是指的是你去邮局寄邮件的日期,但收寄日期并不代表邮局人员录入数据到系统的时间,很有可能他是隔了一两天才录入得,据经验过了一周之后,一周之前的收寄日期所在邮件的都会进入系统中。比如收寄日期为20130601,那20130608时,收寄日期为20130601所在的记录都进入系统。
加载日期(LOAD_DATE)就是指加载到数据仓库的日期。一般每隔半小时都会有数据加载到数据库中,也就是说收寄日期为20130601的邮件信息有可能在20130602不同时间段有数据加载入库。
目的是用来分析,比如收寄日期为20130601,大致何时这个日期的邮件都会加载入库。
--------------------------------------------------------------------------------------------------------------------------------------------------
我的设计如下:
设计一张表,用来解决问题。该表的字段有如下,举2条记录
Begin_Date | Fir_Acc | Sec_Acc | Thi_Acc | Fou_Acc | Fiv_Acc | Six_Acc | Sev_Acc | End_Acc |
20130601 | 100 | 200 | 300 | 400 | 500 | 500 | 500 | 20130605 |
20130602 | 100 | 200 | 300 | 300 | 300 | 300 | 300 | 20130604 |
不再增加便意味着结束。
邮件记录数比如FirDay_Account, 可以如下得到
SEL count(*)
From TB_EVT_MAIL_CLCT
WHERE CLCT_DATE = '20130601'
邮件记录数比如SecDay_Account, 可以如下得到
SEL count(*)
From TB_EVT_MAIL_CLCT
WHERE CLCT_DATE = '20130602'
可以看到第二天邮件的数量的是累加的。
程序可以用perl的DBI或者sql来写,程序每天执行一次。
--------------------------------------------------------------------------------------------------------------------------------------------------
之前一直用c语言的单个增加的思想,导致在设计实现上存在困难,具体表现在处理变量变化性上出现问题。仔细一想,用集合的思想便可以解决问题。比如group by便是典型的集合思想,有效解决了时间递进的问题。以下是实现的步骤
之前的改进:为后续做更好的分析,在目标表的末尾添加一字段End_Account。
0、目标表设置为SET,这样可以排重。
1、Load_Date -CLCT_Date作为从源表中插入数据的选择条件,共分7种情况,并将数据加载到临时表TEMP_TB_DAILY_MAIL_CLCT_SUM
对特殊的两个字段处理,End_Date 取TD数据库时间最大值。End_Account先赋值0.
2、临时表的记录按CLCT_Date做Group by,并做sum,加载到目标表中。
3、处理End_Date和End_Account字段,这里做个假设,若是相邻两天的记录数相同且不为0,则表明该天的记录数都已加入入库。这样便可设置字段End_Account和End_Date.
缺点:执行过程效率低
改进方向:1、临时表真正建为VOLATILE 2、对于已全部加载的数据,通过某个逻辑判断,使得不再计算。
启发:陷入思维的沼泽时,不妨以系统地,集合的思想来对待问题,说不定豁然开朗。
贴上核心代码以供参考:
DROP TABLE ${TARGETDB}.TEMP_TB_DAILY_MAIL_CLCT_SUM;
.IF ERRORCODE <> 0 THEN QUIT 12;
CREATE MUTISET TABLE ${TARGETDB}.TEMP_TB_DAILY_MAIL_CLCT_SUM
{BEGIN_Date DATE NOT NULL
,Fir_Account INTEGER
,Sec_Account INTEGER
,Thi_Account INTEGER
,Fou_Account INTEGER
,Fiv_Account INTEGER
,Six_Account INTEGER
,Sev_Account INTEGER
,END_Date DATE
,END_Account INTEGER
}
PRIMARY INDEX(BEGIN_Date);
.IF ERRORCODE <> 0 THEN QUIT 12;
###从TB_EVT_MAIL_CLCT加载数据到临时表###
INSERT INTO ${TARGETDB}.TEMP_TB_DAILY_MAIL_CLCT_SUM
CLCT_Date AS BEGIN_Date,fir_Account,sec_Account,thi_Account,fou_Account,fiv_Account,six_Account,sev_Account,END_Date,END_Account
select
CLCT_Date,count(*),0,0,0,0,0,0,MAX_Date,0
FROM ${TARGETDB}.TB_EVT_MAIL_CLCT
WHERE CLCT_Date = Load_Date and CLCT_Date<date+1
GROUP BY CLCT_Date;
.IF ERRORCODE <> 0 THEN QUIT 12;
INSERT INTO ${TARGETDB}.TEMP_TB_DAILY_MAIL_CLCT_SUM
CLCT_Date AS BEGIN_Date,fir_Account,sec_Account,thi_Account,fou_Account,fiv_Account,six_Account,sev_Account,END_Date,END_Account
select
CLCT_Date,0,count(*),0,0,0,0,0,MAX_Date,0
FROM ${TARGETDB}.TB_EVT_MAIL_CLCT
WHERE CLCT_Date = Load_Date-1 and CLCT_Date<date+1
GROUP BY CLCT_Date;
.IF ERRORCODE <> 0 THEN QUIT 12;
INSERT INTO ${TARGETDB}.TEMP_TB_DAILY_MAIL_CLCT_SUM
CLCT_Date AS BEGIN_Date,fir_Account,sec_Account,thi_Account,fou_Account,fiv_Account,six_Account,sev_Account,END_Date,END_Account
select
CLCT_Date,0,0,count(*),0,0,0,0,MAX_Date,0
FROM ${TARGETDB}.TB_EVT_MAIL_CLCT
WHERE CLCT_Date = Load_Date-2 and CLCT_Date<date+1
GROUP BY CLCT_Date;
.IF ERRORCODE <> 0 THEN QUIT 12;
INSERT INTO ${TARGETDB}.TEMP_TB_DAILY_MAIL_CLCT_SUM
CLCT_Date AS BEGIN_Date,fir_Account,sec_Account,thi_Account,fou_Account,fiv_Account,six_Account,sev_Account,END_Date,END_Account
select
CLCT_Date,0,0,0,count(*),0,0,0,MAX_Date,0
FROM ${TARGETDB}.TB_EVT_MAIL_CLCT
WHERE CLCT_Date = Load_Date-3 and CLCT_Date<date+1
GROUP BY CLCT_Date;
.IF ERRORCODE <> 0 THEN QUIT 12;
INSERT INTO ${TARGETDB}.TEMP_TB_DAILY_MAIL_CLCT_SUM
CLCT_Date AS BEGIN_Date,fir_Account,sec_Account,thi_Account,fou_Account,fiv_Account,six_Account,sev_Account,END_Date,END_Account
select
CLCT_Date,0,0,0,0,count(*),0,0,MAX_Date,0
FROM ${TARGETDB}.TB_EVT_MAIL_CLCT
WHERE CLCT_Date = Load_Date-4 and CLCT_Date<date+1
GROUP BY CLCT_Date;
.IF ERRORCODE <> 0 THEN QUIT 12;
INSERT INTO ${TARGETDB}.TEMP_TB_DAILY_MAIL_CLCT_SUM
CLCT_Date AS BEGIN_Date,fir_Account,sec_Account,thi_Account,fou_Account,fiv_Account,six_Account,sev_Account,END_Date,END_Account
select
CLCT_Date,0,0,0,0,0,count(*),0,MAX_Date,0
FROM ${TARGETDB}.TB_EVT_MAIL_CLCT
WHERE CLCT_Date = Load_Date-5 and CLCT_Date<date+1
GROUP BY CLCT_Date;
.IF ERRORCODE <> 0 THEN QUIT 12;
INSERT INTO ${TARGETDB}.TEMP_TB_DAILY_MAIL_CLCT_SUM
CLCT_Date AS BEGIN_Date,fir_Account,sec_Account,thi_Account,fou_Account,fiv_Account,six_Account,sev_Account,END_Date,END_Account
select
CLCT_Date,0,0,0,0,0,count(*),0,MAX_Date,0
FROM ${TARGETDB}.TB_EVT_MAIL_CLCT
WHERE CLCT_Date = Load_Date-6 and CLCT_Date<date+1
GROUP BY CLCT_Date;
.IF ERRORCODE <> 0 THEN QUIT 12;
INSERT INTO ${TARGETDB}.TEMP_TB_DAILY_MAIL_CLCT_SUM
CLCT_Date AS BEGIN_Date,fir_Account,sec_Account,thi_Account,fou_Account,fiv_Account,six_Account,sev_Account,END_Date,END_Account
select
CLCT_Date,0,0,0,0,0,0,count(*),MAX_Date,0
FROM ${TARGETDB}.TB_EVT_MAIL_CLCT
WHERE CLCT_Date = Load_Date-7 and CLCT_Date<date+1
GROUP BY CLCT_Date;
.IF ERRORCODE <> 0 THEN QUIT 12;
###从临时表中加载到目标表TB_DAILY_MAIL_CLCT_SUM###
INSERT INTO ${TARGETDB}.TB_DAILY_MAIL_CLCT_SUM
sel BEGIN_Date
,sum(fir_Account)
,sum(sec_Account)
,sum(thi_Account)
,sum(fou_Account)
,sum(fiv_Account)
,sum(six_Account)
,sum(sev_Account)
,END_Date
,END_Account
FROM ${TARGETDB}.TB_DAILY_MAIL_CLCT_SUM
GROUP BY BEGIN_Date;
.IF ERRORCODE <> 0 THEN QUIT 12;
###修改结束时间###
UPDATE ${TARGETDB}.TB_DAILY_MAIL_CLCT_SUM
SET END_Date = CLCT_DATE+5 and END_Account = six_Account
WHERE six_Account=sev_Account and six_Account<>0;
.IF ERRORCODE <> 0 THEN QUIT 12;
UPDATE ${TARGETDB}.TB_DAILY_MAIL_CLCT_SUM
SET END_Date = CLCT_DATE+4 and END_Account = fiv_Account
and sev_Account = fiv_Account
WHERE fiv_Account=six_Account and fiv_Account<>0;
.IF ERRORCODE <> 0 THEN QUIT 12;
UPDATE ${TARGETDB}.TB_DAILY_MAIL_CLCT_SUM
SET END_Date = CLCT_DATE+3 and END_Account = fou_Account
and six_Account = fou_Account
and sev_Account = fou_Account
WHERE fou_Account=fiv_Account and fou_Account<>0;
.IF ERRORCODE <> 0 THEN QUIT 12;
UPDATE ${TARGETDB}.TB_DAILY_MAIL_CLCT_SUM
SET END_Date = CLCT_DATE+2 and END_Account = thi_Account
and fiv_Account = thi_Account
and six_Account = thi_Account
and sev_Account = thi_Account
WHERE thi_Account=fou_Account and thi_Account<>0;
.IF ERRORCODE <> 0 THEN QUIT 12;
UPDATE ${TARGETDB}.TB_DAILY_MAIL_CLCT_SUM
SET END_Date = CLCT_DATE+1 and END_Account = sec_Account
and fou_Account = sec_Account
and fiv_Account = sec_Account
and six_Account = sec_Account
and sev_Account = sec_Account
WHERE sec_Account=thi_Account and sec_Account<>0;
.IF ERRORCODE <> 0 THEN QUIT 12;
UPDATE ${TARGETDB}.TB_DAILY_MAIL_CLCT_SUM
SET END_Date = CLCT_DATE and END_Account = fir_Account
and thi_Account = fir_Account
and fou_Account = fir_Account
and fiv_Account = fir_Account
and six_Account = fir_Account
and sev_Account = fir_Account
WHERE fir_Account=sec_Account and fir_Account<>0;
.IF ERRORCODE <> 0 THEN QUIT 12;