Hash Partitions
Filed under: Infrastructure,Partitioning,Performance — Jonathan Lewis @ 5:57 pm BST Sep 21,2009
I made a throwaway comment in a recent posting about using powers of two for the number of partitions when using hash partitioning. The article in question was talking about globally partitioned indexes, but the “power of 2” principle was first associated with tables.
Here’s a simple demonstration of hash partitioning in action demonstrating why Oracle adopted this “power of 2” rule. We start by creating a table that doesn’t obey the rule – with six partitions – and collect stats on it to see how many rows go into each partition:
create table t1 (
id,
v1
)
partition by hash(id)
partitions 6
as
select
rownum,
object_name
from
all_objects
where
rownum <= 32000
;
column partition_name new_value m_pt
execute dbms_stats.gather_table_stats(user,'t1')
select partition_position, partition_name, num_rows
from user_tab_partitions
where table_name = 'T1'
;
PARTITION_POSITION PARTITION_NAME NUM_ROWS
------------------ -------------------- ----------
1 SYS_P2161 3981
2 SYS_P2162 3988
3 SYS_P2163 8246
4 SYS_P2164 7945
5 SYS_P2165 3912
6 SYS_P2166 3928
Rather than seeing evenly scattered data, we see four small and two big partitions – and the two big partitions are numbers three and four, and they’re twice the size of the little ones (plus or minus a bit). Let’s see what happens if we tell Oracle to increase the number of partitons to seven:
alter table t1 add partition;
execute dbms_stats.gather_table_stats(user,'t1')
select partition_position, partition_name, num_rows
from user_tab_partitions
where table_name = 'T1'
;
PARTITION_POSITION PARTITION_NAME NUM_ROWS
------------------ -------------------- ----------
1 SYS_P2161 3981
2 SYS_P2162 3988
3 SYS_P2163 4112
4 SYS_P2164 7945
5 SYS_P2165 3912
6 SYS_P2166 3928
7 SYS_P2167 4134
Partition three (the lower of the two big partitions) has been split into two, and half its data copied into the new partition (partition seven). Oracle has improved the data distribution with minimum overheads. To get a uniform spread of data across partitions we need to do another “add partition” which will split partition four into partitions four and eight spreading the data evenly between them.
If you create a table with 2^N hash partitions then (assuming random distribution of lots of distinct values for the partitioning key) each partition will hold a similar amount of data. If you need to increase the number of partitions you “add partition” 2^N times and Oracle, working from the bottom up, will split partition X into two partitions, X and X + 2^N.
The same “power of 2” rule applies to hash subpartitions when you use composite partitioning.