为1.7亿记录表创建快速索引

 

From: http://www.zhujiangroad.com/html/soft/s8076.html

 

为1.7亿记录表创建快速索引

 

本文讲述了在大表上创建索引需要注意的事项,以及整个过程。

需求:在STAT_SUBMIT_CENTER表的RECORDTIME字段上面创建一索引。

环境:SunOS 5.9

 

oracle 9204

8 cpu

3G mem

  

1. 查看表的具体情况

是不是分区表,有多少个分区,分区字段:

SQL> col table_name for a20

SQL> col column_name for a20

SQL> select a.table_name,a.partitioned,b.partition_count,c.column_name 

2  from user_tables a, user_part_tables b, user_part_key_columns c 

3  where a.table_name='STAT_SUBMIT_CENTER' 

4  and b.table_name='STAT_SUBMIT_CENTER' 

5  and c.name='STAT_SUBMIT_CENTER';

 

TABLE_NAME           PAR PARTITION_COUNT COLUMN_NAME

-------------------- --- --------------- --------------------

STAT_SUBMIT_CENTER   YES              50 MSGDATE

 

  

已使用的每个分区的大小:

SQL> select segment_name,partition_name,round(bytes/1024/1024) from user_segments where segment_name ='STAT_SUBMIT_CENTER' and bytes/1024/1024>0.25 order by 3 desc;

 

SEGMENT_NAME               PARTITION_NAME                 ROUND(BYTES/1024/1024)

-------------------------- ------------------------------ ----------------------

STAT_SUBMIT_CENTER         STAT_SUBMIT_CENTER_20051101                      1722

STAT_SUBMIT_CENTER         STAT_SUBMIT_CENTER_20051021                      1488

STAT_SUBMIT_CENTER         STAT_SUBMIT_CENTER_20051111                      1440

STAT_SUBMIT_CENTER         STAT_SUBMIT_CENTER_20051121                      1355

STAT_SUBMIT_CENTER         STAT_SUBMIT_CENTER_20051221                      1335

STAT_SUBMIT_CENTER         STAT_SUBMIT_CENTER_20050911                      1309

STAT_SUBMIT_CENTER         STAT_SUBMIT_CENTER_20051211                      1253

STAT_SUBMIT_CENTER         STAT_SUBMIT_CENTER_20051201                      1247

STAT_SUBMIT_CENTER         STAT_SUBMIT_CENTER_20050921                      1198

STAT_SUBMIT_CENTER         STAT_SUBMIT_CENTER_20060101                      1151

STAT_SUBMIT_CENTER         STAT_SUBMIT_CENTER_20060111                      1068

STAT_SUBMIT_CENTER         STAT_SUBMIT_CENTER_20051001                      1018

STAT_SUBMIT_CENTER         STAT_SUBMIT_CENTER_20051011                       865

STAT_SUBMIT_CENTER         STAT_SUBMIT_CENTER_20060121                       796

 

14 rows selected.

  

整个表的大小:

SQL> select segment_name,sum(bytes/1024/1024) from user_segments where segment_name ='STAT_SUBMIT_CENTER' group by segment_name;

 

SEGMENT_NAME                     SUM(BYTES/1024/1024)

-------------------------------- --------------------

STAT_SUBMIT_CENTER                              17234

  

表的记录数:

SQL> set timing on

SQL> select count(*) from STAT_SUBMIT_CENTER; 

 

COUNT(*)

----------

170341007

Elapsed: 00:14:18.60

  

还有这个表上的索引情况如下:

table   STAT_SUBMIT_CENTER             17234 Mindex   IDX_SUBCEN_ADDRUSER             5155 M        ADDRUSER    PK_STAT_SUBMIT_CENTER          10653 M        MSGDATE,ADDRUSER,MSGID

然后查看一些数据库参数情况:

SQL> show  parameter   work

NAME                                 TYPE        VALUE

------------------------------------ ----------- ------------------------------

workarea_size_policy                 string      AUTO

SQL> show parameter pga

NAME                                 TYPE        VALUE

------------------------------------ ----------- ------------------------------

pga_aggregate_target                 big integer 209715200

SQL> select * from dba_temp_files;

FILE_NAME

------------------------------------------------------------------------------------------------------------------------  

FILE_ID TABLESPACE_NAME                     BYTES     BLOCKS STATUS    RELATIVE_FNO AUT   MAXBYTES  MAXBLOCKS

---------- ------------------------------ ---------- ---------- --------- ------------ --- ---------- ----------

INCREMENT_BY USER_BYTES USER_BLOCKS

------------ ---------- -----------

/bgdata/oracle/temp01.dbf         1 TEMP                           3563061248     434944 AVAILABLE            1 YES 4294967296     524288        6400 3562012672      434816

 

2. 需要考虑的几个方面

1)创建的索引需要几个G的磁盘空间

2)创建索引需要排序,使用pga_aggregate_target,要把这个值从200M加大到2G

3)如果内存不够,需要temp表空间,把temp表空间加大到8G。itpub上有一个帖子说过,15亿条记录用了34G空间。

4)在线创建,时间会比较长。讨论后,停这个表的操作,非online创建。

3. 实际操作过程

1)数据文件够,不扩展;temp数据文件扩展:

alter database tempfile '/bgdata/oracle/temp01.dbf' resize 8192m;

2)在workarea_size_policy=AUTO的情况下,改pga_aggregate_target=2048m,对于串行操作,一个session能使用的pga=MIN(5%PGA_AGGREGATE_TARGET,100MB),这样可以使得pga用到最大的值:

alter system set pga_aggregate_target=2048m;

3)因为这是一个比较长的过程,所以写脚本让后台运行:

createind.sh

vi createind.sh

#!/bin/sh

sqlplus user/password <<EOF

create index IDX_SUBMIT_RECORDTIME on STAT_SUBMIT_CENTER(RECORDTIME) local;

exit

EOF

4)创建过程中可以观察v$sort_segment,v$sort_usage看排序情况:

select tablespace_name,current_users,total_blocks,used_blocks,free_blocks from v$sort_segment;

select * from v$sort_usage;

5)创建完成后,把tempfile和pga_aggregate_target改回原值:

alter database tempfile '/bgdata/oracle/temp01.dbf' resize 4096m;

alter system set pga_aggregate_target=500m;

4. 实际创建过程中观察到的情况

1)开始之前:

SQL> select tablespace_name,current_users,total_blocks,used_blocks,free_blocks from v$sort_segment;

TABLESPACE_NAME CURRENT_USERS TOTAL_BLOCKS USED_BLOCKS FREE_BLOCKS

------------------------------- ------------- ------------ ----------- -----------

TEMP 0 431360 0 431360

 

SQL> select * from v$sort_usage;

no rows selected

2)创建之初,抓到这么一条sql:

insert into obj$(owner#,name,namespace,obj#,type#,ctime,mtime,stime,status,remoteowner,linkname,subname,dataobj#,flags,oid$,spare1,spare2)values(:1,:2,:3,:4,:5,:6,:7,:8,:9,:10,:11,:12,:13,:14,:15,:16, :17)

3)然后v$sort_segment.USED_BLOCKS变大,v$sort_usage.BLOCKS变大,一直增长到:

SQL> select tablespace_name,current_users,total_blocks,used_blocks,free_blocks from v$sort_segment;

TABLESPACE_NAME CURRENT_USERS TOTAL_BLOCKS USED_BLOCKS FREE_BLOCKS

------------------------------- ------------- ------------ ----------- -----------

TEMP 1 431360 46720 384640

SQL> select * from v$sort_usage;

USERNAME USER SESSION_ADDR SESSION_NUM SQLADDR SQLHASH

------------------------------ ------------------------------ ---------------- ----------- ---------------- ----------

TABLESPACE CONTENTS SEGTYPE SEGFILE# SEGBLK# EXTENTS BLOCKS SEGRFNO#

------------------------------- --------- --------- ---------- ---------- ---------- ---------- ----------

DPC DPC 00000003974CFFB0 6134 0000000399CAB288 1254950678TEMP TEMPORARY SORT 201

431113 365 46720 1

这个过程中抓到的sql:select file# from file$ where ts#=:1

4)v$sort_segment.USED_BLOCKS变为0,v$sort_usage.BLOCKS变为0

5)重复3,4两步,估计这个是创建一个分区的索引

需要解释一下的是,上面的sql只是我随机抓到的运行时间比较长的,整个create index过程会复杂很多,具体怎么样可以用sqltrace跟踪。这里主要看的是temp表空间的使用情况。


同时,在创建的过程中:

SQL> select segment_name,partition_name from user_segments where segment_name='IDX_SUBMIT_RECORDTIME';

no rows selected

SQL> select index_name,partition_name from user_ind_partitions where INDEX_NAME='IDX_SUBMIT_RECORDTIME';

no rows selected

当时忘了查user_segments中其实是有一个segment_name为一串数字的记录,那个才是正在创建的索引;如果这个事务失败了,将回滚。


最后耗时99分钟完成。

 

5. 创建完成后分析索引

但是接下来还有一件事。创建完成后要分析索引,否则就是走了索引,查询也巨慢无比。

SQL> explain plan for select count(*) from stat_submit_center where recordtime>trunc(sysdate);

Explained.

SQL> @?/rdbms/admin/utlxplp.sql

PLAN_TABLE_OUTPUT

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

| Id | Operation | Name | Rows | Bytes | Cost | Pstart| Pstop |

-------------------------------------------------------------------------------------------------

| 0 | SELECT STATEMENT | | 1 | 9 | 4 | | || 1 | SORT AGGREGATE | | 1 | 9 | | | || 2 | PARTITION RANGE ALL | | | | | 1 | 50 ||* 3 | INDEX FAST FULL SCAN| IDX_SUBMIT_RECORDTIME | 8878K| 76M| 4 | 1 | 50 |

-------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

---------------------------------------------------

3 - filter("STAT_SUBMIT_CENTER"."RECORDTIME">TRUNC(SYSDATE@!))Note: cpu costing is off16

rows selected.

SQL> set autotrace on explain

SQL> set timing on

SQL> select count(*) from stat_submit_center where recordtime>trunc(sysdate);aa^Cselect count(*) from stat_submit_center where recordtime>trunc(sysdate)*ERROR at line 1:ORA-01013: user requested cancel of current operationElapsed: 00:11:49.85

SQL>

SQL> set autotrace off

上面可以看到,因为没有分析索引,虽然它走的是新建的IDX_SUBMIT_RECORDTIME这个索引,但是查询巨慢,10分钟后也没有结果。下面我们分析一下。

SQL> Analyze index IDX_SUBMIT_RECORDTIME estimate statistics;

Index analyzed.

Elapsed: 00:00:06.84

SQL> set autotrace on explain

SQL> select count(*) from stat_submit_center where recordtime>trunc(sysdate);COUNT(*)

----------

926736

Elapsed: 00:00:05.37

Execution Plan

----------------------------------------------------------

0 SELECT STATEMENT Optimizer=CHOOSE (Cost=4360 Card=1 Bytes=9)1 0 SORT (AGGREGATE)2 1 PARTITION RANGE (ALL)3 2 INDEX (RANGE SCAN) OF 'IDX_SUBMIT_RECORDTIME' (NON-UNIQUE) (Cost=4360 Card=8878740 Bytes=79908660)

SQL> set autotrace off

索引分析之后,查询时间在5分钟,效率大大提高。

 

 

 

 

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值