妥投率问题

          最近要求写一段程序,现把问题描述一下,以整理思绪。

          这张表叫做收寄表TB_EVT_MAIL_CLCT,主要描述收寄的相关信息,为做简化,包含字段有邮件编号,收寄日期,收件人,发件人,加载日期,加载时间

           收寄日期(CLCT_DATE)就是指的是你去邮局寄邮件的日期,但收寄日期并不代表邮局人员录入数据到系统的时间,很有可能他是隔了一两天才录入得,据经验过了一周之后,一周之前的收寄日期所在邮件的都会进入系统中。比如收寄日期为20130601,那20130608时,收寄日期为20130601所在的记录都进入系统。

           加载日期(LOAD_DATE)就是指加载到数据仓库的日期。一般每隔半小时都会有数据加载到数据库中,也就是说收寄日期为20130601的邮件信息有可能在20130602不同时间段有数据加载入库。

           目的是用来分析,比如收寄日期为20130601,大致何时这个日期的邮件都会加载入库。

--------------------------------------------------------------------------------------------------------------------------------------------------

 我的设计如下:

设计一张表,用来解决问题。该表的字段有如下,举2条记录

 

Begin_DateFir_AccSec_AccThi_AccFou_AccFiv_AccSix_AccSev_AccEnd_Acc
20130601100200            300      40050050050020130605
2013060210020030030030030030020130604
 

不再增加便意味着结束。

邮件记录数比如FirDay_Account,  可以如下得到

SEL       count(*)
From   TB_EVT_MAIL_CLCT
WHERE  CLCT_DATE = '20130601'

邮件记录数比如SecDay_Account,  可以如下得到

SEL       count(*)
From   TB_EVT_MAIL_CLCT
WHERE  CLCT_DATE = '20130602'

可以看到第二天邮件的数量的是累加的。

程序可以用perl的DBI或者sql来写,程序每天执行一次。

--------------------------------------------------------------------------------------------------------------------------------------------------

之前一直用c语言的单个增加的思想,导致在设计实现上存在困难,具体表现在处理变量变化性上出现问题。仔细一想,用集合的思想便可以解决问题。比如group by便是典型的集合思想,有效解决了时间递进的问题。以下是实现的步骤

之前的改进:为后续做更好的分析,在目标表的末尾添加一字段End_Account。

0、目标表设置为SET,这样可以排重。

1、Load_Date -CLCT_Date作为从源表中插入数据的选择条件,共分7种情况,并将数据加载到临时表TEMP_TB_DAILY_MAIL_CLCT_SUM

     对特殊的两个字段处理,End_Date 取TD数据库时间最大值。End_Account先赋值0.

2、临时表的记录按CLCT_Date做Group by,并做sum,加载到目标表中。

3、处理End_Date和End_Account字段,这里做个假设,若是相邻两天的记录数相同且不为0,则表明该天的记录数都已加入入库。这样便可设置字段End_Account和End_Date.

缺点:执行过程效率低

改进方向:1、临时表真正建为VOLATILE   2、对于已全部加载的数据,通过某个逻辑判断,使得不再计算。

启发:陷入思维的沼泽时,不妨以系统地,集合的思想来对待问题,说不定豁然开朗。

 

 

贴上核心代码以供参考:

DROP TABLE ${TARGETDB}.TEMP_TB_DAILY_MAIL_CLCT_SUM;
.IF ERRORCODE <> 0 THEN QUIT 12;

CREATE MUTISET TABLE ${TARGETDB}.TEMP_TB_DAILY_MAIL_CLCT_SUM
{BEGIN_Date DATE NOT NULL
 ,Fir_Account INTEGER 
 ,Sec_Account INTEGER 
 ,Thi_Account INTEGER 
 ,Fou_Account INTEGER 
 ,Fiv_Account INTEGER 
 ,Six_Account INTEGER 
 ,Sev_Account INTEGER 
 ,END_Date	DATE
 ,END_Account INTEGER 
}
PRIMARY INDEX(BEGIN_Date);
.IF ERRORCODE <> 0 THEN QUIT 12;

###从TB_EVT_MAIL_CLCT加载数据到临时表###
INSERT INTO ${TARGETDB}.TEMP_TB_DAILY_MAIL_CLCT_SUM
CLCT_Date AS BEGIN_Date,fir_Account,sec_Account,thi_Account,fou_Account,fiv_Account,six_Account,sev_Account,END_Date,END_Account
select
CLCT_Date,count(*),0,0,0,0,0,0,MAX_Date,0
FROM ${TARGETDB}.TB_EVT_MAIL_CLCT
WHERE CLCT_Date = Load_Date and CLCT_Date<date+1
GROUP BY CLCT_Date;
.IF ERRORCODE <> 0 THEN QUIT 12;

INSERT INTO ${TARGETDB}.TEMP_TB_DAILY_MAIL_CLCT_SUM
CLCT_Date AS BEGIN_Date,fir_Account,sec_Account,thi_Account,fou_Account,fiv_Account,six_Account,sev_Account,END_Date,END_Account
select
CLCT_Date,0,count(*),0,0,0,0,0,MAX_Date,0
FROM ${TARGETDB}.TB_EVT_MAIL_CLCT
WHERE CLCT_Date = Load_Date-1 and CLCT_Date<date+1
GROUP BY CLCT_Date;
.IF ERRORCODE <> 0 THEN QUIT 12;

INSERT INTO ${TARGETDB}.TEMP_TB_DAILY_MAIL_CLCT_SUM
CLCT_Date AS BEGIN_Date,fir_Account,sec_Account,thi_Account,fou_Account,fiv_Account,six_Account,sev_Account,END_Date,END_Account
select
CLCT_Date,0,0,count(*),0,0,0,0,MAX_Date,0
FROM ${TARGETDB}.TB_EVT_MAIL_CLCT
WHERE CLCT_Date = Load_Date-2 and CLCT_Date<date+1
GROUP BY CLCT_Date;
.IF ERRORCODE <> 0 THEN QUIT 12;

INSERT INTO ${TARGETDB}.TEMP_TB_DAILY_MAIL_CLCT_SUM
CLCT_Date AS BEGIN_Date,fir_Account,sec_Account,thi_Account,fou_Account,fiv_Account,six_Account,sev_Account,END_Date,END_Account
select
CLCT_Date,0,0,0,count(*),0,0,0,MAX_Date,0
FROM ${TARGETDB}.TB_EVT_MAIL_CLCT
WHERE CLCT_Date = Load_Date-3 and CLCT_Date<date+1
GROUP BY CLCT_Date;
.IF ERRORCODE <> 0 THEN QUIT 12;

INSERT INTO ${TARGETDB}.TEMP_TB_DAILY_MAIL_CLCT_SUM
CLCT_Date AS BEGIN_Date,fir_Account,sec_Account,thi_Account,fou_Account,fiv_Account,six_Account,sev_Account,END_Date,END_Account
select
CLCT_Date,0,0,0,0,count(*),0,0,MAX_Date,0
FROM ${TARGETDB}.TB_EVT_MAIL_CLCT
WHERE CLCT_Date = Load_Date-4 and CLCT_Date<date+1
GROUP BY CLCT_Date;
.IF ERRORCODE <> 0 THEN QUIT 12;

INSERT INTO ${TARGETDB}.TEMP_TB_DAILY_MAIL_CLCT_SUM
CLCT_Date AS BEGIN_Date,fir_Account,sec_Account,thi_Account,fou_Account,fiv_Account,six_Account,sev_Account,END_Date,END_Account
select
CLCT_Date,0,0,0,0,0,count(*),0,MAX_Date,0
FROM ${TARGETDB}.TB_EVT_MAIL_CLCT
WHERE CLCT_Date = Load_Date-5 and CLCT_Date<date+1
GROUP BY CLCT_Date;
.IF ERRORCODE <> 0 THEN QUIT 12;

INSERT INTO ${TARGETDB}.TEMP_TB_DAILY_MAIL_CLCT_SUM
CLCT_Date AS BEGIN_Date,fir_Account,sec_Account,thi_Account,fou_Account,fiv_Account,six_Account,sev_Account,END_Date,END_Account
select
CLCT_Date,0,0,0,0,0,count(*),0,MAX_Date,0
FROM ${TARGETDB}.TB_EVT_MAIL_CLCT
WHERE CLCT_Date = Load_Date-6 and CLCT_Date<date+1
GROUP BY CLCT_Date;
.IF ERRORCODE <> 0 THEN QUIT 12;

INSERT INTO ${TARGETDB}.TEMP_TB_DAILY_MAIL_CLCT_SUM
CLCT_Date AS BEGIN_Date,fir_Account,sec_Account,thi_Account,fou_Account,fiv_Account,six_Account,sev_Account,END_Date,END_Account
select
CLCT_Date,0,0,0,0,0,0,count(*),MAX_Date,0
FROM ${TARGETDB}.TB_EVT_MAIL_CLCT
WHERE CLCT_Date = Load_Date-7 and CLCT_Date<date+1
GROUP BY CLCT_Date;
.IF ERRORCODE <> 0 THEN QUIT 12;

###从临时表中加载到目标表TB_DAILY_MAIL_CLCT_SUM###
INSERT INTO ${TARGETDB}.TB_DAILY_MAIL_CLCT_SUM
sel BEGIN_Date
   ,sum(fir_Account)
   ,sum(sec_Account)
   ,sum(thi_Account)
   ,sum(fou_Account)
   ,sum(fiv_Account)
   ,sum(six_Account)
   ,sum(sev_Account)
   ,END_Date
   ,END_Account
FROM ${TARGETDB}.TB_DAILY_MAIL_CLCT_SUM
GROUP BY BEGIN_Date;
.IF ERRORCODE <> 0 THEN QUIT 12;

###修改结束时间###
UPDATE ${TARGETDB}.TB_DAILY_MAIL_CLCT_SUM
SET END_Date = CLCT_DATE+5 and END_Account = six_Account
WHERE six_Account=sev_Account and six_Account<>0;
.IF ERRORCODE <> 0 THEN QUIT 12;

UPDATE ${TARGETDB}.TB_DAILY_MAIL_CLCT_SUM
SET END_Date = CLCT_DATE+4 and END_Account = fiv_Account
		and sev_Account = fiv_Account
WHERE fiv_Account=six_Account and fiv_Account<>0;
.IF ERRORCODE <> 0 THEN QUIT 12;

UPDATE ${TARGETDB}.TB_DAILY_MAIL_CLCT_SUM
SET END_Date = CLCT_DATE+3 and END_Account = fou_Account
		and six_Account = fou_Account
		and sev_Account = fou_Account
WHERE fou_Account=fiv_Account and fou_Account<>0;
.IF ERRORCODE <> 0 THEN QUIT 12;

UPDATE ${TARGETDB}.TB_DAILY_MAIL_CLCT_SUM
SET END_Date = CLCT_DATE+2 and END_Account = thi_Account
		and fiv_Account = thi_Account
		and six_Account = thi_Account
		and sev_Account = thi_Account
WHERE thi_Account=fou_Account and thi_Account<>0;
.IF ERRORCODE <> 0 THEN QUIT 12;

UPDATE ${TARGETDB}.TB_DAILY_MAIL_CLCT_SUM
SET END_Date = CLCT_DATE+1 and END_Account = sec_Account
		and fou_Account = sec_Account
		and fiv_Account = sec_Account
		and six_Account = sec_Account
		and sev_Account = sec_Account
WHERE sec_Account=thi_Account and sec_Account<>0;
.IF ERRORCODE <> 0 THEN QUIT 12;

UPDATE ${TARGETDB}.TB_DAILY_MAIL_CLCT_SUM
SET END_Date = CLCT_DATE and END_Account = fir_Account
		and thi_Account = fir_Account
		and fou_Account = fir_Account
		and fiv_Account = fir_Account
		and six_Account = fir_Account
		and sev_Account = fir_Account
WHERE fir_Account=sec_Account and fir_Account<>0;
.IF ERRORCODE <> 0 THEN QUIT 12;


 

 

 

 


 

 

 

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值