批量插入如何提高性能?
原条件要求如下:
1)原表:100万条记录;
2)一次插入10万条记录;
3)原表中并且有索引存在;
问题:
1)此时要求提高插入的性能,如何快速完成一次性的插入?
2)如果一次性插入50万条记录情况下,如何快速完成一次性的插入?
1、为了使测试就会更有针对性,首先思考影响批量插入性能的可能因素有哪些?
然后分别针对这些可能的因素,一一证明这些因素是否真的影响到批量插入的性能,
并分析哪些因素是影响批量插入的主要因素?
/* wwj:如果你一开始就能猜到测试想证明什么问题,测试就会更有针对性*/
讨论出来影响批量插入性能的可能因素有:
记录数据量:原表:100万条记录,一次插入10万条记录
1)、有无append
2)、nologging(对表以及对索引)
3)、索引的个数
4)、parallel(aa,10)(对表以及对索引)
5)、实时更新索引和重建索引因素
6)、分区表和分区索引
7)、oracle的版本
--ADO往SP中传数组的方法
8) 一次插入50万条记录的话,以上情况又是如何?
2、测试过程:
测试的环境:
数据库版本:oracle 10g
虚拟机内存大小:512M 一个内存
测试步骤:
记录数据量:原表:100万条记录,一次插入10万条记录
a) 创建表:
drop table aa;
create table aa as
select * from all_col_comments;
b) 导入100万条记录
begin
for i in 1..24 loop
insert into aa
select * from all_col_comments;
end loop;
end ;
SQL> select count(*) from aa;
COUNT(*)
----------
1037746
c)、给新建的表的前三个字段创建复合索引
-- Create/Recreate indexes
--drop index aa_index
create index aa_index on AA (owner, table_name, column_name);
d)、建立大约10万条记录的表
drop table temp
create table temp
as
select * from all_col_comments;
begin
for i in 1..2 loop
insert into temp
select * from all_col_comments;
end loop;
end;
SQL> select count(*) from temp;
COUNT(*)
----------
124538
e)、创建一个一次插入10万记录的insert存储过程
create or replace procedure insertaa is
begin
insert into aa
select * from temp;
end insertaa;
f)、调用存储过程
SQL> begin
insertaa();
end; 2 3
4 /
PL/SQL procedure successfully completed.
Elapsed: 00:02:30.86
---------------------------------------------------------------------------------------------------------------------------------------------
---------------------------------------------------------------------------------------------------------------------------------------------
测试影响因素:
1)、比较 添加了APPEND注释的insert存储过程
create or replace procedure insertaa1 is
begin
insert /*+ APPEND */ into aa
select * from temp;
end insertaa1;
SQL> begin
2 insertaa1();
3 end;
4 /
PL/SQL procedure successfully completed.
Elapsed: 00:00:57.13
--------------------------------------------------------------------------------------------------------------------------------------------
2)nologging(对表以及对索引)
修改aa表的属性,使得nologging对表aa
alter table aa nologging;
SQL> begin
2 insertaa();
3 end;
4 /
PL/SQL procedure successfully completed.
Elapsed: 00:00:23.69
分析比较:没有添加APPEND的存储过程时间:Elapsed: 00:02:30.86
没有添加APPEND的存储过程在执行alter table aa nologging之后需要的时间:Elapsed: 00:00:23.69
SQL> begin
2 insertaa1();
3 end;
4 /
PL/SQL procedure successfully completed.
Elapsed: 00:00:08.33
分析比较:没有添加APPEND的存储过程时间:Elapsed: 00:02:30.86
执行添加APPEND的存储过程时间:Elapsed: 00:00:57.13
添加APPEND的存储过程在执行alter table aa nologging之后需要的时间Elapsed: 00:00:08.33
修改aa表中索引的属性,使得nologging对索引aa_index
alter index aa_index nologging;
SQL> begin
2 insertaa();
3 end;
4 /
PL/SQL procedure successfully completed.
Elapsed: 00:00:17.83
分析比较:没有添加APPEND的存储过程时间:Elapsed: 00:02:30.86
没有添加APPEND的存储过程在执行alter table aa nologging之后需要的时间:Elapsed: 00:00:23.69
没有添加APPEND的存储过程在继续执行alter index aa_index nologging之后需要的时间:Elapsed: 00:00:17.83
SQL> begin
2 insertaa1();
3 end;
4 /
PL/SQL procedure successfully completed.
Elapsed: 00:00:08.32
分析比较:没有添加APPEND的存储过程时间:Elapsed: 00:02:30.86
执行添加APPEND的存储过程时间:Elapsed: 00:00:57.13
添加APPEND的存储过程在执行alter table aa nologging之后需要的时间Elapsed: 00:00:08.33
添加APPEND的存储过程在继续执行alter index aa_index nologging之后需要的时间:Elapsed: 00:00:08.32
建立存储过程如下:
create or replace procedure insertaa2 is
begin
insert /*+ APPEND nologging*/ into aa
select * from temp;
end insertaa1;
SQL> begin
2 insertaa1();
3 end;
4 /
PL/SQL procedure successfully completed.
Elapsed: 00:00:08.13
SQL> begin
2 insertaa2();
3 end;
4 /
PL/SQL procedure successfully completed.
Elapsed: 00:00:07.04
SQL> begin
2 insertaa1();
3 end;
4 /
PL/SQL procedure successfully completed.
Elapsed: 00:00:06.75
SQL> begin
2 insertaa2();
3 end;
4 /
PL/SQL procedure successfully completed.
Elapsed: 00:00:05.43
SQL> begin
2 insertaa1();
3 end;
4 /
PL/SQL procedure successfully completed.
Elapsed: 00:00:05.07
SQL> begin
2 insertaa2();
3 end;
4 /
PL/SQL procedure successfully completed.
Elapsed: 00:00:05.99
SQL> begin
2 insertaa1();
3 end;
4 /
PL/SQL procedure successfully completed.
Elapsed: 00:00:06.11
APPEND hint的学习:
APPEND When the APPEND hint is used with the INSERT statement, data is appended to the table. Existing free space in the block is not used.
If a table or an index is specified with nologging, this hint applied with an insert statement produces a direct path insert which reduces
generation of redo.
分析:由于测试中aa表以及aa_index索引都设置为nologging状态,所以使用了APPEND hint之后,是不会产生redo的,存储过程加与不加nologging提示无所谓的,
并且在oracle提示中nologging提示是不存在的。
--------------------------------------------------------------------------------------------------------------------------------------------------
3)、索引的个数
SQL> alter table aa nologging;
Table altered.
SQL> alter index aa_index nologging;
Index altered.
在nologging table以及nologging index条件下,并且使用APPEND提示,不多加索引时所测试的情况如下:
SQL> begin
2 insertaa1();
3 end;
4 /
PL/SQL procedure successfully completed.
Elapsed: 00:00:51.60
在nologging table以及nologging index条件下,并且使用APPEND提示,多创建两个索引的测试情况如下:
create index aa_index1 on AA (table_name, column_name);
create index aa_index2 on AA (column_name);
SQL> begin
2 insertaa1();
3 end;
4 /
PL/SQL procedure successfully completed.
Elapsed: 00:02:29.57
分析:条件都是在nologging table以及nologging index条件下,并且使用APPEND提示,在相同的条件下,增加索引的个数会影响的批量插入的性能,
使批量插入性能下降;
如何才能既能在多个索引情况下,又能够提高性能呢?
在后面会测试下实时更新索引和重建索引之间的性能比较;
---------------------------------------------------------------------------------------------------------------------------------------------------
4)、parallel(aa,10)(对表以及对索引)
要使parallel并行提示有效,必须要执行ALTER SESSION { ENABLE | FORCE } PARALLEL DML,此时加PARALLEL提示才有作用;
新建存储过程,加APPEND以及parallel提示;
create or replace procedure insertaa2 is
begin
insert /*+ APPEND parallel(aa,10)*/ into aa
select * from temp;
end insertaa2;
不加parallel提示的情况下测试如下:(条件都是在nologging table以及nologging index条件下,并且使用APPEND提示,并且还是在只有一个索引的情况下)
SQL> begin
2 insertaa1();
3 end;
4 /
PL/SQL procedure successfully completed.
Elapsed: 00:01:06.19
加parallel提示的情况下测试如下:(条件都是在nologging table以及nologging index条件下,并且使用APPEND提示,并且还是在只有一个索引的情况下)
SQL> ALTER SESSION ENABLE PARALLEL DML;
Session altered.
SQL> begin
2 insertaa2();
3 end;
4 /
PL/SQL procedure successfully completed.
Elapsed: 00:00:44.53
再执行一次不加parallel提示的情况:
SQL> begin
2 insertaa1();
3 end;
4 /
PL/SQL procedure successfully completed.
Elapsed: 00:00:06.59
SQL> begin
2 insertaa1();
3 end;
4 /
PL/SQL procedure successfully completed.
Elapsed: 00:00:16.19
SQL> begin
2 insertaa1();
3 end;
4 /
PL/SQL procedure successfully completed.
Elapsed: 00:00:06.80
再加parallel提示的情况:
SQL> ALTER SESSION ENABLE PARALLEL DML;
Session altered.
SQL> begin
2 insertaa2();
3 end;
4 /
PL/SQL procedure successfully completed.
Elapsed: 00:00:19.94
SQL> ALTER SESSION ENABLE PARALLEL DML;
Session altered.
SQL> begin
2 insertaa2();
3 end;
4 /
PL/SQL procedure successfully completed.
Elapsed: 00:00:13.69
SQL> ALTER SESSION ENABLE PARALLEL DML;
Session altered.
SQL> begin
2 insertaa2();
3 end;
4 /
PL/SQL procedure successfully completed.
Elapsed: 00:00:19.45
分析:parallel(aa,10)提示并没有完全提高性能
将parallel(aa,10)改为parallel(aa,3)又会如何?
不加parallel提示的情况下测试如下:
SQL> begin
2 insertaa1();
3 end;
4 /
PL/SQL procedure successfully completed.
Elapsed: 00:00:05.30
SQL> begin
2 insertaa1();
3 end;
4 /
PL/SQL procedure successfully completed.
Elapsed: 00:00:05.53
加parallel提示的情况下测试如下:
SQL> ALTER SESSION ENABLE PARALLEL DML;
Session altered.
SQL> begin
2 insertaa2();
3 end;
4 /
PL/SQL procedure successfully completed.
Elapsed: 00:00:15.87
SQL> ALTER SESSION ENABLE PARALLEL DML;
Session altered.
SQL> begin
2 insertaa2();
3 end;
4 /
PL/SQL procedure successfully completed.
Elapsed: 00:00:10.55
分析:parallel(aa,5)提示并没有完全提高性能
因为parallel是多线程实现,线程切换和管理需要时间,而这些线程在单CPU下事实上是串行的,
对不分区的表进行parallel插入的话,需要分配多个临时表,先同时对多个临时表进行插入,然后再合并,
对多个临时表的插入也是使用多线程实现,是串行执行的。
5)、实时更新索引和重建索引因素
对表多建几个索引,有什么变化?
多创建两个索引
create index aa_index1 on AA (table_name, column_name);
create index aa_index2 on AA (column_name);
比较rebuild+insert(无索引或索引实效)和insert(有索引)测试(多个索引)的几种情况的测试:
1、rebuild+insert(无索引)情况下:
--做插入操作的存储过程定义,包含新建索引
create or replace procedure insertaa2 is
begin
insert /*+ APPEND nologging parallel(aa,10)*/ into aa
select * from temp;
EXECUTE IMMEDIATE 'create index aa_index on aa(owner, table_name, column_name)'; --新建索引
EXECUTE IMMEDIATE 'create index aa_index1 on AA (table_name, column_name)'; --新建索引
EXECUTE IMMEDIATE 'create index aa_index2 on AA (column_name)'; --新建索引
end insertaa2;
无索引测试步骤:
1)、建表aa,数据量115万条左右;
2)、调用存储过程插入10万条记录,并且即时创建索引;
SQL> begin
2 insertaa2();
3 end;
4 /
PL/SQL procedure successfully completed.
Elapsed: 00:03:36.41
比较:只有aa_index一个索引的情况下所需时间为:Elapsed: 00:01:30.47
分析:索引个数的增加使得所消耗的时间近三倍,时间的增长幅度比较的大
2、rebuild+insert(索引实效)情况下:
--做插入操作的存储过程定义,包含rebuild索引
create or replace procedure insertaa is
begin
insert /*+ APPEND nologging parallel(aa,10)*/ into aa
select * from temp;
EXECUTE IMMEDIATE 'alter index aa_index rebuild nologging'; --rebuild索引
EXECUTE IMMEDIATE 'alter index aa_index1 rebuild nologging'; --rebuild索引
EXECUTE IMMEDIATE 'alter index aa_index2 rebuild nologging'; --rebuild索引
end insertaa;
索引实效的测试步骤:
1)、建表aa,数据量115万条左右;
2)、创建索引
create index aa_index on AA (owner, table_name, column_name);
create index aa_index1 on AA (table_name, column_name);
create index aa_index2 on AA (column_name);
3)、使索引失效
alter index aa_index unusable;
alter index aa_index1 unusable;
alter index aa_index2 unusable;
SQL> select status
2 from all_indexes a
3 where a.owner=upper('daimin')
4 and a.index_name=upper('aa_index');
STATUS
--------
UNUSABLE --aa_index索引失效
SQL> select status
2 from all_indexes a
3 where a.owner=upper('daimin')
4 and a.index_name=upper('aa_index1');
STATUS
--------
UNUSABLE --aa_index1索引失效
SQL> select status
2 from all_indexes a
3 where a.owner=upper('daimin')
4 and a.index_name=upper('aa_index2');
STATUS
--------
UNUSABLE --aa_index2索引失效
4)、调用存储过程插入10万条记录,并且即时rebuild索引;
SQL> begin
2 insertaa();
3 end;
4 /
PL/SQL procedure successfully completed.
Elapsed: 00:00:41.00
比较:只有aa_index一个索引的情况下所需时间为: Elapsed: 00:00:26.15
分析:索引个数的使得所消耗的时间增加幅度不是很大
2、insert(有索引)情况下:
--做插入操作的存储过程定义
create or replace procedure insertaa1 is
begin
insert /*+ APPEND nologging parallel(aa,10)*/ into aa
select * from temp;
end insertaa1;
有索引的测试步骤:
1)、建表aa,数据量115万条左右;
2)、创建索引
create index aa_index on AA (owner, table_name, column_name);
create index aa_index1 on AA (table_name, column_name);
create index aa_index2 on AA (column_name);
3)、调用存储过程插入10万条记录
SQL> begin
2 insertaa1();
3 end; 4 /
PL/SQL procedure successfully completed.
Elapsed: 00:00:28.49
比较:只有aa_index一个索引的情况下所需时间为:Elapsed: 00:00:15.94
分析:索引个数的使得所消耗的时间增加幅度不是很大
结论:还是insert(有索引)情况下快
6)、分区表和分区索引
wwj 说:
insert其他分区,也可以比对一下rebuild index
Joy Dai 说:
你是说让数据分散,插入到不同的分区?
wwj 说:
相同分区
wwj 说:
是有数据的
Joy Dai 说:
哦,你的意思是指不要插入到最后一个分区?
Joy Dai 说:
插入到中间的分区?
wwj 说:
是
1)、分区表和分区索引,插入到相同分区测试
create table temp
as
select * from all_col_comments
where rownum<1;
begin
for i in 1..3 loop
insert into temp
select * from all_col_comments;
end loop;
end;
SQL> select count(*) from temp;
COUNT(*)
----------
124538
SQL> drop table aa;
Table dropped.
SQL> create table aa as
2 select rownum as r,a.* from all_col_comments a;
Table created..
SQL> begin
2 for i in 1..24 loop
3 insert into aa
4 select 41554*i+rownum as r,a.* from all_col_comments a;
5 end loop;
6 end ;
7 /
PL/SQL procedure successfully completed.
SQL> select count(*) from aa;
COUNT(*)
----------
1039775
SQL> select max(r) from aa;
MAX(R)
----------
1038886
设置aa表nologging属性
SQL> alter table aa nologging;
Table altered.
新建索引
SQL> create index aa_index on aa(owner, table_name, column_name);
Index created.
创建分区表
-- Create table
drop table BBp;
SQL> create table BBp
2 PARTITION BY RANGE ( r )
3 (
4 PARTITION quarter_one VALUES LESS THAN ( 200000),
5 PARTITION quarter_two VALUES LESS THAN ( 400000),
6 PARTITION quarter_three VALUES LESS THAN ( 600000),
7 PARTITION quarter_four VALUES LESS THAN ( 800000),
8 PARTITION quarter_five VALUES LESS THAN ( 1000000),
9 PARTITION quarter_six VALUES LESS THAN (MAXVALUE)
10 )
11 as
12 select * from aa;
Table created.
创建分区索引
SQL> create index BBp_index on BBp (owner, table_name, column_name) LOCAL;
Index created.
设置BBp表nologging属性
SQL> alter table BBp nologging;
Table altered.
SQL> select count(*) from BBp;
COUNT(*)
----------
1039775
针对不同的想法,创建不同的存储过程进行测试:
--将10多条记录插入到不是分区表的表中去的存储过程
create or replace procedure insertaa1 is
begin
insert /*+ APPEND */ into aa
select 1039775+rownum as r ,t.* from temp t;
end insertaa1;
--因为124538<200000,该插入操作是插入到第一个分区quarter_1中(插入到带有数据的分区中去)
create or replace procedure insertBB1 is
begin
insert /*+ APPEND */ into BBp
select rownum as r ,t.* from temp t;
end insertBB1;
--因为1000000+124538>1000000,该插入操作是将数据插入到最后一个几乎没有数据的分区quarter_13中去
create or replace procedure insertBB2 is
begin
insert /*+ APPEND */ into BBp
select 1000000+rownum as r ,t.* from temp t;
end insertBB2;
-- temp表中的数据被分散插入到不同分区
create or replace procedure insertBB3 is
begin
insert /*+ APPEND */ into BBp
select trunc(dbms_random.value(1, 1039775))+rownum as r ,t.* from temp t;
end insertBB3;
调用存储过程:
SQL> begin
2 insertaa1(); --将10多条记录插入到不是分区表的表中
3 end;
4 /
PL/SQL procedure successfully completed.
Elapsed: 00:02:36.77
SQL> begin
2 insertbb1(); --所有数据插入有数据的相同分区
3 end;
4 /
PL/SQL procedure successfully completed.
Elapsed: 00:00:58.61
SQL> begin
2 insertbb2(); --所有数据插入到一个接近空的分区
3 end;
4 /
PL/SQL procedure successfully completed.
Elapsed: 00:00:48.33
SQL> begin
2 insertbb3(); --所有数据分散插入到不同分区
3 end;
4 /
PL/SQL procedure successfully completed.
Elapsed: 00:03:11.01
比较: 将数据批量插入到一个接近空的分区速度最快,将数据批量分散插入到不同分区速度最慢
2)对比先使索引失效,然后rebuild index的方法
针对不同的想法,创建不同的存储过程进行测试:
使索引失效
alter index aa_index unusable;
SQL> select status
2 from all_indexes a
3 where a.owner=upper('daimin')
4 and a.index_name=upper('aa_index');
STATUS
----------------
UNUSABLE
--将10多条记录插入到不是分区表的表中去的存储过程
create or replace procedure insertaa1 is
begin
insert /*+ APPEND */ into aa
select 1039775+rownum as r ,t.* from temp t;
EXECUTE IMMEDIATE 'alter index aa_index rebuild nologging'; --rebuild索引aa_index
end insertaa1;
SQL> begin
2 insertaa1();
3 end;
4 /
PL/SQL procedure successfully completed.
Elapsed: 00:01:17.25
使分区索引失效
ALTER INDEX BBp_index MODIFY PARTITION quarter_one UNUSABLE;
--因为124538<200000,该插入操作是插入到第一个分区quarter_one中(插入到带有数据的分区中去)
create or replace procedure insertBB1 is
begin
insert /*+ APPEND */ into BBp
select rownum as r ,t.* from temp t;
EXECUTE IMMEDIATE 'alter index BBp_index rebuild partition quarter_one nologging'; --rebuild quarter_1分区的索引BBp_index
end insertBB1;
SQL> begin
2 insertbb1();
3 end;
4 /
PL/SQL procedure successfully completed.
Elapsed: 00:00:26.15
使分区索引失效
ALTER INDEX BBp_index MODIFY PARTITION quarter_six UNUSABLE;
--因为1000000+124538>1000000,该插入操作是将数据插入到最后一个几乎没有数据的分区quarter_six中去
create or replace procedure insertBB2 is
begin
insert /*+ APPEND */ into BBp
select 1000000+rownum as r ,t.* from temp t;
EXECUTE IMMEDIATE 'alter index BBp_index rebuild partition quarter_six nologging'; --rebuild quarter_six分区的索引BBp_index
end insertBB2;
SQL> begin
2 insertbb2();
3 end;
4 /
PL/SQL procedure successfully completed.
Elapsed: 00:00:08.78
使索引失效
alter index BBp_index unusable;
-- temp表中的数据被分散插入到不同分区
create or replace procedure insertBB3 is
begin
insert /*+ APPEND */ into BBp
select trunc(dbms_random.value(1, 1039775))+rownum as r ,t.* from temp t;
EXECUTE IMMEDIATE 'alter index BBp_index rebuild nologging'; --rebuild bbp的索引BBp_index
end insertBB3;
SQL> begin
2 insertbb3();
3 end;
4 /
begin
*
ERROR at line 1:
ORA-14086: a partitioned index may not be rebuilt as a whole
ORA-06512: at "DAIMIN.INSERTBB3", line 5
ORA-06512: at line 2
然后修改存储过程如下:
-- temp表中的数据被分散插入到不同分区
create or replace procedure insertBB3 is
begin
insert /*+ APPEND */ into BBp
select trunc(dbms_random.value(1, 1039775))+rownum as r ,t.* from temp t;
EXECUTE IMMEDIATE 'alter index BBp_index rebuild partition quarter_one nologging';
EXECUTE IMMEDIATE 'alter index BBp_index rebuild partition quarter_two nologging';
EXECUTE IMMEDIATE 'alter index BBp_index rebuild partition quarter_three nologging';
EXECUTE IMMEDIATE 'alter index BBp_index rebuild partition quarter_four nologging';
EXECUTE IMMEDIATE 'alter index BBp_index rebuild partition quarter_five nologging';
EXECUTE IMMEDIATE 'alter index BBp_index rebuild partition quarter_six nologging';
end insertBB3;
SQL> begin
2 insertBB3();
3 end;
4 /
PL/SQL procedure successfully completed.
Elapsed: 00:01:28.62
比较:使用先使索引失效,然后rebuild index的方法都比实时更新索引的批量插入的性能好,并且在使用rebuild index的方法中仍然是
将数据批量插入到一个接近空的分区速度最快,将数据批量分散插入到不同分区速度最慢