一千万条数据去重_DB2千万级数据去重

insert into FACT_WBCKTJB_TMP select * from FACT_WBCKTJB where (rq,zh)) in(select rq,zh from FACT_WBCKTJB group by rq,zh having count(*)>1);

--3.删除原表重复数据

delete from FACT_WBCKTJB where (rq,zh)) in(select rq,zh from FACT_WBCKTJB group by rq,zh having count(*)>1);

--4.导入去重后数据

INSERT INTO FACT_WBCKTJB SELECT RQ, ZH, XM, JGH, BZ, YE_YB, YE_MY, YJS, YRJ, NJS, NRJ FROM FACT_WBCKTJB_TMP GROUP BY RQ, ZH, XM, JGH, BZ, YE_YB, YE_MY, YJS, YRJ, NJS, NRJ;

实际操作时,第2步就过不去了,数据量太大,将第2步换成导入19日后全部数据。

insert into FACT_WBCKTJB_TMP select * from FACT_WBCKTJB where rq>='2017/06/19';

这边命令执行了一个小时,坑啊——,万幸没把数据库搞挂。

其实执行到四十分钟时候,数据就已经全部插到TMP表了。

select count(1) from FACT_WBCKTJB where rq='2017/07/02';

select count(1) from FACT_WBCKTJB_TMP where rq='2017/07/02';

上面两个条数在四十分钟的时候已经一致了。以为是事务日志的原因,关了事务日志。

db2 commit;

db2 alter table FACT_WBCKTJB activate not logged initially;

db2 commit;

重新执行又是一个小时,没用。临时表还hang住了,drop不掉,查询没数据。8.2没有db2top,用db2pd找到对应的application,然后force掉。

db2pd -db bankdm -locks -transactions -applications -dynamic;

db2 force application 11111;

db2 drop table FACT_WBCKTJB_TMP;

换成用游标的方式把数据导出,为了避免前面步骤的第3步delete操作再卡住,直接把全部数据导出来,最终执行的步骤如下。

--1.创建临时表

19日以前的正确数据

create table FACT_WBCKTJB_18 like FACT_WBCKTJB;

19日以后的重复数据

create table FACT_WBCKTJB_19 like FACT_WBCKTJB;

--2.将数据以6月19日为界,导入两个临时表

db2 connect to bankdm ;

db2 "declare C1 cursor for SELECT * FROM DB2INST1.FACT_WBCKTJB where rq

db2 "LOAD FROM C1 of cursor insert INTO DB2INST1.FACT_WBCKTJB_18 NONRECOVERABLE" >>./load.log

db2 "declare C2 cursor for SELECT * FROM DB2INST1.FACT_WBCKTJB where rq

>='2017/06/19' with ur ";

db2 "LOAD FROM C2 of cursor insert INTO DB2INST1.FACT_WBCKTJB_19 NONRECOVERABLE" >>./load.log

db2 connect reset;

--3.将19日之后的数据去重后导入原表

db2 connect to bankdm ;

db2 "declare C1 cursor for SELECT RQ, ZH, XM, JGH, BZ, YE_YB, YE_MY, YJS, YRJ, NJS, NRJ FROM DB2INST1.FACT_WBCKTJB_19 where rq>='2017/06/19' and rq

db2 "LOAD FROM C1 of cursor insert INTO DB2INST1.FACT_WBCKTJB NONRECOVERABLE" >>./load.log

db2 "declare C2 cursor for SELECT RQ, ZH, XM, JGH, BZ, YE_YB, YE_MY, YJS, YRJ, NJS, NRJ FROM DB2INST1.FACT_WBCKTJB_19 where rq>='2017/06/24' and rq

db2 "LOAD FROM C2 of cursor insert INTO DB2INST1.FACT_WBCKTJB NONRECOVERABLE" >>./load.log

db2 "declare C3 cursor for SELECT RQ, ZH, XM, JGH, BZ, YE_YB, YE_MY, YJS, YRJ, NJS, NRJ FROM DB2INST1.FACT_WBCKTJB_19 where rq>='2017/06/29' GROUP BY RQ, ZH, XM, JGH, BZ, YE_YB, YE_MY, YJS, YRJ, NJS, NRJ with ur ";

db2 "LOAD FROM C3 of cursor insert INTO DB2INST1.FACT_WBCKTJB NONRECOVERABLE" >>./load.log

db2 connect reset;

上面的语句放shell里执行了,按日期分了三个批次,执行的时候仅仅用了三四分钟,可能不分批也没事。

--4.导入19日以前的数据

db2 connect to bankdm ;

db2 "declare C1 cursor for SELECT * FROM DB2INST1.FACT_WBCKTJB_18 with ur ";

db2 "LOAD FROM C1 of cursor insert INTO DB2INST1.FACT_WBCKTJB NONRECOVERABLE" >>./load.log

--5.清理临时表,直接贴执行过程。

$ touch 1.del;

$ db2 connect to bankdm

Database Connection Information

Database server = DB2/AIX64 8.2.0

SQL authorization ID = ODS

Local database alias = BANKDM

$ db2 "load from 1.del of del replace into db2inst1.FACT_WBCKTJB_19 nonrecoverable"

SQL3109N The utility is beginning to load data from file "/ods/1.del".

SQL2036N The path for the file or device "/ods/1.del" is not valid.

SQL3107W There is at least one warning message in the message file.

Number of rows read = 0

Number of rows skipped = 0

Number of rows loaded = 0

Number of rows rejected = 0

Number of rows deleted = 0

Number of rows committed = 0

$ db2 drop table db2inst1.FACT_WBCKTJB_18

$ db2 "load from 1.del of del replace into db2inst1.FACT_WBCKTJB_19 nonrecoverable"

SQL3109N The utility is beginning to load data from file "/ods/1.del".

SQL2036N The path for the file or device "/ods/1.del" is not valid.

SQL3107W There is at least one warning message in the message file.

Number of rows read = 0

Number of rows skipped = 0

Number of rows loaded = 0

Number of rows rejected = 0

Number of rows deleted = 0

Number of rows committed = 0

$ db2 drop table db2inst1.FACT_WBCKTJB_18

$ db2 connect reset

总结:

1.cursor是个好东西,最好的数据数据搬运车,这两年工作多亏他了;

2.insert的执行过程还需要细细研究;

3.db2该学习oracle的rowid这种唯一列标识;

4.db2 V8明年终于可以sayounara了;

5.有空的时候需要锻炼锻炼颈椎和腰椎了。

  • 0
    点赞
  • 3
    收藏
    觉得还不错? 一键收藏
  • 0
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值