原文链接:http://blog.itpub.net/24236575/viewspace-1116110/
分区概述:
为了简化数据库大表的管理,从ORACLE推出了分区选项。分区将表分离在若干不同的表空间上,用分而治之的方法来支撑无限膨胀的大表,给大表在物理一级的可管理性。将大表分割成较小的分区可以改善表的维护、备份、恢复、事务及查询性能。针对当前社保及电信行业的大量日常业务数据,可以推荐使用ORACLE的该选项。
分区优点:
1 、增强可用性:如果表的一个分区由于系统故障而不能使用,表的其余好的分区仍然可以使用;
2 、减少关闭时间:如果系统故障只影响表的一部分分区,那么只有这部分分区需要修复,故能比整个大表修复花的时间更少;
3 、维护轻松:如果需要重建表,独立管理每个分区比管理单个大表要轻松得多;
4 、均衡I/O:可以把表的不同分区分配到不同的磁盘来平衡I/O改善性能;
5 、改善性能:对大表的查询、增加、修改等操作可以分解到表的不同分区来并行执行,可使运行速度更快;
6 、分区对用户透明,最终用户感觉不到分区的存在。
关于何时应该进行分区,Oracle有如下建议:
(1)Tables greater than 2GB should always be considered for partitioning. 表大小超过2G,总是要考虑分区
(2)Tables containing historical data, in which new data is added into the newest partition. A typical example is a historical table where only the current month's data is updatable and the other 11 months are read only. 表有历史数据和当前数据之分
一、常见分区:
1、范围分区
创建表空间,供分区使用:
create tablespace ts_emp1 datafile 'D:APPPETERORADATAWJIAO s_emp1.dbf' size 1m;
create tablespace ts_emp2 datafile 'D:APPPETERORADATAWJIAO s_emp2.dbf' size 1m;
create tablespace ts_emp3 datafile 'D:APPPETERORADATAWJIAO s_emp3.dbf' size 1m;
创建分区表:
create table emp
(
empno number(4) not null,
ename varchar2(10),
sal number(7,2)
)
partition by range(empno)
(
partition p1000 values less than (1000) tablespace ts_emp1,
partition p2000 values less than (2000) tablespace ts_emp2,
partition pmax values less than (maxvalue) tablespace ts_emp3
);
观察表内分区信息
HR@wjiao>select segment_name,segment_type,partition_name from user_segments where segment_name='EMP';
SEGMENT_NAME SEGMENT_TYPE PARTITION_NAME
------------------------------ ------------------ ------------------------------
EMP TABLE PARTITION P1000
EMP TABLE PARTITION P2000
EMP TABLE PARTITION PMAX
模拟数据:
insert into emp values (100,'Tom',1000);
insert into emp values (500,'Peter',2000);
insert into emp values (1000,'Scott',3000);
insert into emp values (1999,'Bill',4000);
insert into emp values (5000,'Gates',6000);
commit;
观察记录分布
HR@wjiao>select * from emp;
EMPNO ENAME SAL
---------- ---------- ----------
100 Tom 1000
500 Peter 2000
1000 Scott 3000
1999 Bill 4000
5000 Gates 6000
HR@wjiao>select * from emp partition(p1000);
EMPNO ENAME SAL
---------- ---------- ----------
100 Tom 1000
500 Peter 2000
HR@wjiao>select * from emp partition(p2000);
EMPNO ENAME SAL
---------- ---------- ----------
1000 Scott 3000
1999 Bill 4000
HR@wjiao>select * from emp partition(pmax);
EMPNO ENAME SAL
---------- ---------- ----------
5000 Gates 6000
维护分区
exp导出指定表分区
C:>exp userid=hr/hr file=d:apppeteroradataemp_p1000.dmp tables=emp:p1000
imp导入指定表分区
C:>imp userid=hr/hr file=d:apppeteroradataemp_p1000.dmp tables=(emp:p1000) ignore=y
分区不存在,则插入到p2000
ignore=y必须要有,忽略创建错误,否则导入失败
truncate指定分区
alter table emp truncate partition pmax;
drop指定分区
alter table emp drop partition pmax;
超出范围的数据,报错
HR@wjiao>insert into emp values (5000,'Gates',6000);
insert into emp values (5000,'Gates',6000)
*
ERROR at line 1:
ORA-14400: inserted partition key does not map to any partition
加入分区
alter table emp add partition pmax values less than(maxvalue);
HR@wjiao>insert into emp values (5000,'Gates',6000);
1 row created.
2、hash分区
hash分区最主要的机制是根据hash算法来计算具体某条纪录应该插入到哪个分区中
(问:hash算法是干什么的?呵呵,只能去看看数据结构了)
hash算法中最重要的是hash函数,Oracle中如果你要使用hash分区,只需指定分区的数量即可
建议分区的数量采用2的n次方,这样可以使得各个分区间数据分布更加均匀
hash列上数值不能有太多的重复值,否则会导致数据分布不均匀
drop table emp purge;
create table emp
(
empno number(5) not null,
ename varchar2(10),
sal number(7,2)
)
partition by hash(empno)
partitions 8 store in (users)
;
--插入数据,递增序列,分区数据分布均匀
set timing on;
begin
for i in 1..96400 loop
execute immediate 'insert into emp values(:1,:2,:3)' using i,'emp'||i,i;
end loop;
commit;
end;
/
HR@wjiao>analyze table emp compute statistics for table for all columns for all indexes;
HR@wjiao>select partition_name,count(*)
2 from user_extents
3 where segment_name = 'EMP'
4 group by partition_name;
PARTITION_NAME COUNT(*)
------------------------------ ----------
SYS_P82 6
SYS_P79 6
SYS_P81 6
SYS_P77 6
SYS_P78 6
SYS_P80 6
SYS_P83 6
SYS_P84 6
8 rows selected.
hash列上重复值过多,导致数据分布不均匀,建分区表应尽量使数据均匀分布
set timing on;
begin
for i in 1..96400 loop
execute immediate 'insert into emp values(:1,:2,:3)' using mod(i,4),'emp'||i,i;
end loop;
commit;
end;
/
HR@wjiao>select partition_name,count(*)
2 from user_extents
3 where segment_name = 'EMP'
4 group by partition_name;
PARTITION_NAME COUNT(*)
------------------------------ ----------
SYS_P205 6
SYS_P206 6
SYS_P208 16
SYS_P210 15
SYS_P212 16
SYS_P209 6
SYS_P207 6
SYS_P211 16
8 rows selected.
3、复合分区
将范围分区、hash分区综合在一起使用的技术
drop table emp purge;
create table emp
(
empno number(5) not null,
ename varchar2(10),
sal number(7,2)
)
partition by range(empno)
subpartition by hash(empno)
subpartitions 2
(
partition p1000 values less than (1000),
partition p2000 values less than (2000),
partition pmax values less than (maxvalue)
);
上面的例子中将雇员表先按照雇员工号empno进行范围分区,然后再把每个分区分为两个子hash分区。例子中一共将产生6个分区
HR@wjiao>select segment_name,segment_type,partition_name from user_segments where segment_name='EMP';
SEGMENT_NAME SEGMENT_TYPE PARTITION_NAME
------------------------------ ------------------ ------------------------------
EMP TABLE SUBPARTITION SYS_SUBP219
EMP TABLE SUBPARTITION SYS_SUBP220
EMP TABLE SUBPARTITION SYS_SUBP221
EMP TABLE SUBPARTITION SYS_SUBP222
EMP TABLE SUBPARTITION SYS_SUBP223
EMP TABLE SUBPARTITION SYS_SUBP224
6 rows selected.
查询需要使用子分区
HR@wjiao>select * from emp subpartition(SYS_SUBP219);
EMPNO ENAME SAL
---------- ---------- ----------
100 Tom 1000
500 Peter 2000
4、列表分区
一般分区列包含数据可枚举,有限个值,例如地区、或国家等
create table emp
(
empno number(4) not null,
ename varchar2(10),
location varchar2(13)
)
partition by list(location)
(
partition pnkg values ('Nanjing'),
partition pswc values ('Suzhou','Wuxi','Changzhou'),
partition pxls values ('Xuzhou','Lianyg','Suqian')
);
模拟数据
insert into emp values (100,'Tom','Nanjing');
insert into emp values (500,'Peter','Changzhou');
insert into emp values (1000,'Scott','Xuzhou');
insert into emp values (1999,'Bill','Wuxi');
insert into emp values (5000,'Gates','Lianyg');
commit;
观察记录分布
HR@wjiao>select * from emp partition(pnkg);
EMPNO ENAME LOCATION
---------- ---------- -------------
100 Tom Nanjing
HR@wjiao>select * from emp partition(pswc);
EMPNO ENAME LOCATION
---------- ---------- -------------
500 Peter Changzhou
1999 Bill Wuxi
二、如果一个表创建时没有利用分区技术,如何对一个没有添加过分区表的表上添加分区表呢?
比如scott下已存在emp_unpart表,未使用表分区技术
1、最简单方法
drop table emp_part purge;
create table emp_part
partition by hash(empno)
partitions 4 store in (users)
as select * from emp_unpart;
这个方法虽然简单,由于实际生产环境中数据量太大,几乎不可行
2、在线重定义,利用oracle提供的dbms_redefinition包
生成相同表结构的分区表
drop table emp_part purge;
create table emp_part
partition by hash(empno)
partitions 4 store in (users)
as select * from emp_unpart where 1=2;
SCOTT@wjiao>select segment_name,partition_name from user_extents;
SEGMENT_NAME PARTITION_NAME
------------------------------ ------------------------------
EMP_PART SYS_P285
EMP_PART SYS_P286
EMP_PART SYS_P287
EMP_PART SYS_P288
为了能够看到表的变化
SCOTT@wjiao>grant select on emp_unpart to hr;
HR@wjiao>select tablespace_name,table_name,owner from all_tables where owner = 'SCOTT';
TABLESPACE_NAME TABLE_NAME OWNER
------------------------------ ------------------------------ -----------------------------
USERS EMP_UNPART SCOTT
sys下执行表的在线重定义
SYS@wjiao>begin --使用包 DBMS_REDEFINITION 将非分区表转为分区表
2 dbms_redefinition.can_redef_table('SCOTT','EMP_UNPART',2);
3 dbms_redefinition.start_redef_table('SCOTT','EMP_UNPART','EMP_PART',null,2);
4 dbms_redefinition.finish_redef_table('SCOTT','EMP_UNPART','EMP_PART');
5 end;
6 /
PL/SQL procedure successfully completed.
SCOTT@wjiao>select segment_name,partition_name from user_extents;
SEGMENT_NAME PARTITION_NAME
------------------------------ ------------------------------
EMP_UNPART SYS_P285
EMP_UNPART SYS_P286
EMP_UNPART SYS_P287
EMP_UNPART SYS_P288
HR@wjiao>select tablespace_name,table_name,owner from all_tables where owner = 'SCOTT';
TABLESPACE_NAME TABLE_NAME OWNER
------------------------------ ------------------------------ ------------------------------
USERS EMP_PART SCOTT
从数据可以看出:在线重定义后,未分区的表重新使用分区表的定义;
三、如何实现分区滑动窗口
有需求:由于数据量巨大,只计划保留最近3个月数据,之前的数据入数据仓库
drop table emp purge;
drop table t_emp purge;
create table emp
(
empno number(4) not null,
ename varchar2(10),
hiredate date
)
partition by range(hiredate)
(
partition p201309 values less than (to_date('20131001','yyyymmdd')),
partition p201310 values less than (to_date('20131101','yyyymmdd')),
partition p201311 values less than (to_date('20131201','yyyymmdd')),
partition pmax values less than (maxvalue)
);
模拟数据
insert into emp values (100,'Tom', to_date('20130907','yyyymmdd'));
insert into emp values (500,'Peter', to_date('20131007','yyyymmdd'));
insert into emp values (1000,'Scott',to_date('20131107','yyyymmdd'));
insert into emp values (1999,'Bill', to_date('20131207','yyyymmdd'));
insert into emp values (5000,'Gates',to_date('20140107','yyyymmdd'));
commit;
假设当前时间为20131231235959,1s后操作如下
1、exchange分区和临时表,发生数据交换
创建另一张相同表结构的表
create table t_emp as select * from emp where 1=2;
观察记录分布
HR@wjiao>select * from emp partition(p201309);
EMPNO ENAME HIREDATE
---------- ---------- ---------
100 Tom 07-SEP-13
HR@wjiao>select * from t_emp;
no rows selected
利用exchange实现数据交换。exchange利用了指针原理,仅仅改变数据字典,速度非常快,几乎不涉及IO操作
HR@wjiao>alter table emp exchange partition p201309 with table t_emp;
注意exchange后数据变化,表面上分区和表数据发生了交换,实际是后台指针改变链接位置
HR@wjiao>select * from emp partition(p201309);
no rows selected
HR@wjiao>select * from t_emp;
EMPNO ENAME HIREDATE
---------- ---------- ---------
100 Tom 07-SEP-13
2、merge表分区
HR@wjiao>alter table emp merge partitions p201309,p201310 into partition p201310;
观察数据字典变化
HR@wjiao>select partition_name from user_extents where segment_name = 'EMP';
PARTITION_NAME
------------------------------
P201310
P201311
PMAX
3、split最后一个分区
HR@wjiao>alter table emp split partition pmax at (to_date('20140101','yyyymmdd')) into (partition p201312 tablespace ts_emp2, partition pmax tablespace ts_emp3);
观察数据字典变化
HR@wjiao>select partition_name from user_extents where segment_name = 'EMP';
PARTITION_NAME
------------------------------
P201310
P201311
P201312
PMAX
观察记录分布
HR@wjiao>select * from emp partition(p201312);
EMPNO ENAME HIREDATE
---------- ---------- ---------
1999 Bill 07-DEC-13
HR@wjiao>select * from emp partition(pmax);
EMPNO ENAME HIREDATE
---------- ---------- ---------
5000 Gates 07-JAN-14
接下来将t_emp的数据入到数据仓库
至此,滑动窗口完成