SQL being run: select * from t1,t2 where t1.object_id=t2.object_id; -- joining two 2-billion-row tables took a full day and night
Why the join is slow: a single process's PGA cannot hold the big tables' data --> heavy temp tablespace consumption --> rows shuttle back and forth between memory and temp.
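While the slow join is running, the temp spill can be confirmed directly. A minimal check, assuming the v$ views are accessible (v$tempseg_usage reports temp segment consumption per session):
select s.sid, s.sql_id, u.segtype, u.blocks -- segtype HASH or SORT indicates join/sort spill
from v$tempseg_usage u, v$session s
where u.session_addr = s.saddr;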
Optimization options:
1) Enable parallel execution with a parallel hash join. -- not feasible here (a sketch of the attempt follows this list)
2) Borrow the MPP/Hadoop architectural idea: split the data.
Ways to split the data: one is partitioning; the other is splitting into separate tables. This article uses partitioning.
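For reference, the rejected attempt in 1) would look roughly like this (degree 8 is an assumed value); with 2 billion rows on each side, the hash work areas still cannot hold the build input, so the query keeps spilling to temp:
select /*+ use_hash(t1 t2) parallel(t1, 8) parallel(t2, 8) */ *
from t1, t2
where t1.object_id = t2.object_id;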
Steps:
1) Create tables P1 and P2. P1 takes T1's structure plus one extra column, HASH_VALUE, and is LIST-partitioned on HASH_VALUE; P2 is built from T2 the same way (its DDL is sketched after P1's below).
Note: how many partitions you need in practice is your call, but P1 and P2 must be partitioned identically.
CREATE TABLE P1(
HASH_VALUE NUMBER,
OWNER VARCHAR2(30),
OBJECT_NAME VARCHAR2(128),
SUBOBJECT_NAME VARCHAR2(30),
OBJECT_ID NUMBER,
DATA_OBJECT_ID NUMBER,
OBJECT_TYPE VARCHAR2(19),
CREATED DATE,
LAST_DDL_TIME DATE,
TIMESTAMP VARCHAR2(19),
STATUS VARCHAR2(7),
TEMPORARY VARCHAR2(1),
GENERATED VARCHAR2(1),
SECONDARY VARCHAR2(1),
NAMESPACE NUMBER,
EDITION_NAME VARCHAR2(30)
)
PARTITION BY list(HASH_VALUE)
(
partition p0 values (0),
partition p1 values (1),
partition p2 values (2),
partition p3 values (3),
partition p4 values (4)
);
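For completeness, a sketch of P2's DDL, assuming T2 shares T1's column layout (adjust the column list to whatever T2 actually holds); the partition clause must match P1's exactly:
CREATE TABLE P2(
HASH_VALUE NUMBER,
OWNER VARCHAR2(30),
OBJECT_NAME VARCHAR2(128),
SUBOBJECT_NAME VARCHAR2(30),
OBJECT_ID NUMBER,
DATA_OBJECT_ID NUMBER,
OBJECT_TYPE VARCHAR2(19),
CREATED DATE,
LAST_DDL_TIME DATE,
TIMESTAMP VARCHAR2(19),
STATUS VARCHAR2(7),
TEMPORARY VARCHAR2(1),
GENERATED VARCHAR2(1),
SECONDARY VARCHAR2(1),
NAMESPACE NUMBER,
EDITION_NAME VARCHAR2(30)
)
PARTITION BY list(HASH_VALUE)
(
partition p0 values (0),
partition p1 values (1),
partition p2 values (2),
partition p3 values (3),
partition p4 values (4)
);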
2) Insert the data into P1 and P2.
Oracle's hash partitioning is built on the ora_hash function:
partition by hash(object_id) <=> ora_hash(object_id,4294967295)
ora_hash(column, max_bucket): max_bucket defaults to 4294967295 and can be set to anything from 0 to 4294967295.
ora_hash(object_id,4) hashes each object_id value and drops it into one of the buckets 0, 1, 2, 3, 4.
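A quick sanity check before committing to five buckets, verifying that ora_hash spreads t1 roughly evenly:
-- each bucket should hold roughly one fifth of the rows
select ora_hash(object_id, 4) as hash_value, count(*) as cnt
from t1
group by ora_hash(object_id, 4)
order by 1;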
Rows with a NULL join key can never satisfy t1.object_id = t2.object_id, so purge them first:
delete t1 where object_id is null;
commit;
delete t2 where object_id is null;
commit;
insert into p1
select ora_hash(object_id,4), a.* from t1 a; --- in production, load with append + parallel hints (direct-path parallel insert); see the sketch below
commit;
insert into p2
select ora_hash(object_id,4), a.* from t2 a; --- likewise append + parallel in production
commit;
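A sketch of that production-style load for P1, assuming a degree of parallelism of 8 and that parallel DML is permitted on the system (repeat for P2/t2):
alter session enable parallel dml;
insert /*+ append parallel(p1, 8) */ into p1
select /*+ parallel(a, 8) */ ora_hash(object_id, 4), a.*
from t1 a;
commit; -- a direct-path insert must be committed before the table can be queried again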
3) Run the join.
After optimization, the whole job finishes in roughly one hour. The hash_value predicates below prune each query to a single partition on each side, so every join handles only one fifth of the data. The five queries can be wrapped in a stored procedure (a sketch follows them).
select * from p1,p2 where p1.object_id=p2.object_id and p1.hash_value=0 and p2.hash_value=0;
select * from p1,p2 where p1.object_id=p2.object_id and p1.hash_value=1 and p2.hash_value=1;
select * from p1,p2 where p1.object_id=p2.object_id and p1.hash_value=2 and p2.hash_value=2;
select * from p1,p2 where p1.object_id=p2.object_id and p1.hash_value=3 and p2.hash_value=3;
select * from p1,p2 where p1.object_id=p2.object_id and p1.hash_value=4 and p2.hash_value=4;
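A minimal PL/SQL sketch of that stored procedure. The result table r and its column list are hypothetical placeholders; project whatever columns your job actually needs:
-- hypothetical table to hold the join output
create table r (
  object_id number,
  owner_1 varchar2(30),
  owner_2 varchar2(30)
);
create or replace procedure join_by_bucket as
begin
  -- one bucket per iteration: each insert joins exactly one partition pair
  for i in 0 .. 4 loop
    insert into r
    select p1.object_id, p1.owner, p2.owner
    from p1, p2
    where p1.object_id = p2.object_id
      and p1.hash_value = i
      and p2.hash_value = i;
    commit;
  end loop;
end;
/
Run it in SQL*Plus with exec join_by_bucket.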