oracle每一行的hash值,Hash分区表分区数与数据分布的测试

最新推荐文章于 2023-09-19 19:32:54 发布

weixin_39743357

最新推荐文章于 2023-09-19 19:32:54 发布

阅读量437

点赞数

oracle每一行的hash值

较早就知道Hash分区建议分区数是2的幂，只是当作经验值记录，一直都没有测试过，今天做了个简单测试，供大家参考。亲手做过的实验，

记忆更加深刻一些。

Oracle 10.2.0.4

1. 建立分区数为5的hash分区表test01：

create table test01

partition by hash(object_id)

(partition p1,

partition p2,

partition p3,

partition p4,

Partition p5)

as select * from sys.dba_objects;

查看各个分区的记录数 (隐约可以看出如果1，5合并的话，数据分布会非常平均)：

select count(*) from test01 partition (p1);

6746

select count(*) from test01 partition (p2);

13550

select count(*) from test01 partition (p3);

13764

select count(*) from test01 partition (p4);

13445

select count(*) from test01 partition (p5);

6777

2. 直接建立分区数为8 (2的3次方) 的hash分区表test02：

create table test02

partition by hash(object_id)

(partition p1,

partition p2,

partition p3,

partition p4,

partition p5,

partition p6,

partition p7,

Partition p8)

as select * from sys.dba_objects;

查看各个分区的记录数 (数据是平均分布的)：

select count(*) from test02 partition (p1);

6750

select count(*) from test02 partition (p2);

6861

select count(*) from test02 partition (p3);

6891

select count(*) from test02 partition (p4);

6682

select count(*) from test02 partition (p5);

6778

select count(*) from test02 partition (p6);

6689

select count(*) from test02 partition (p7);

6874

select count(*) from test02 partition (p8);

6766

3. 在test01上增加hash分区p6：

alter table test01 add partition p6 ;

这时候后来看test01的数据分布：

select count(*) from test01 partition (p1); -- 没变

6746

select count(*) from test01 partition (p2); -- 少了6689

6861

select count(*) from test01 partition (p3); -- 没变

13764

select count(*) from test01 partition (p4); -- 没变

13445

select count(*) from test01 partition (p5); -- 没变

6777

select count(*) from test01 partition (p6); -- 恰好是6689

6689

4. 在test01上增加hash分区p7：

alter table test01 add partition p7 ;

这时候后来看test01的数据分布(以下比较是相对于加入p6后)：

select count(*) from test01 partition (p1); -- 没变

6746

select count(*) from test01 partition (p2); -- 没变

6861

select count(*) from test01 partition (p3); -- 少了6874

6890

select count(*) from test01 partition (p4); -- 没变

13445

select count(*) from test01 partition (p5); -- 没变

6777

select count(*) from test01 partition (p6); -- 没变

6689

select count(*) from test01 partition (p7); -- 恰好是6874

6874

5. 在test01上增加hash分区p8：

alter table test01 add partition p8 ;

这时候后来看test01的数据分布(以下比较是相对于加入p7后)：

select count(*) from test01 partition (p1); -- 没变

6746

select count(*) from test01 partition (p2); -- 没变

6861

select count(*) from test01 partition (p3); -- 没变

6890

select count(*) from test01 partition (p4); -- 少了6765

6680

select count(*) from test01 partition (p5); -- 没变

6777

select count(*) from test01 partition (p6); -- 没变

6689

select count(*) from test01 partition (p7); -- 没变

6874

select count(*) from test01 partition (p7); -- 恰好是6765

6765

大家从上面的数据分布拆分情况可以大致看出Oracle是如何将数据平均分布

的，也应该大致理解了为什么Oracle的HASH分区数建议是2个幂。

还可以看到加入到8个分区(2的3次方)后数据都平均分布了，和一次性直接划分

为8个分区数据分布比较接近 (但是不相同)。

6. 下面简单测试一下如果从8个分区继续加入到9，10，11，16

个分区又是怎样的情况呢？这里我们还是以test01表来做测试。

alter table test01 add partition p9 ;

这时候后来看test01的数据分布(以下比较是相对于加入p8后)：

select count(*) from test01 partition (p1); -- 少了3390

3356

select count(*) from test01 partition (p2); -- 没变

6861

select count(*) from test01 partition (p3); -- 没变

6890

select count(*) from test01 partition (p4); -- 没变

6680

select count(*) from test01 partition (p5); -- 没变

6777

select count(*) from test01 partition (p6); -- 没变

6689

select count(*) from test01 partition (p7); -- 没变

6874

select count(*) from test01 partition (p8); -- 没变

6765

select count(*) from test01 partition (p9); -- 恰好是3390

3390

7. alter table test01 add partition p10 ;

这时候后来看test01的数据分布(以下比较是相对于加入p9后)：

select count(*) from test01 partition (p1); -- 没变

3356

select count(*) from test01 partition (p2); -- 少了3443

3418

select count(*) from test01 partition (p3); -- 没变

6890

select count(*) from test01 partition (p4); -- 没变

6680

select count(*) from test01 partition (p5); -- 没变

6777

select count(*) from test01 partition (p6); -- 没变

6689

select count(*) from test01 partition (p7); -- 没变

6874

select count(*) from test01 partition (p8); -- 没变

6765

select count(*) from test01 partition (p9); -- 没变

3390

select count(*) from test01 partition (p10); -- 恰好是3443

3443

8. alter table test01 add partition p11 ;

这时候后来看test01的数据分布(以下比较是相对于加入p10后)：

select count(*) from test01 partition (p1); -- 没变

3356

select count(*) from test01 partition (p2); -- 没变

3418

select count(*) from test01 partition (p3); -- 少了3444

3446

select count(*) from test01 partition (p4); -- 没变

6680

select count(*) from test01 partition (p5); -- 没变

6777

select count(*) from test01 partition (p6); -- 没变

6689

select count(*) from test01 partition (p7); -- 没变

6874

select count(*) from test01 partition (p8); -- 没变

6765

select count(*) from test01 partition (p9); -- 没变

3390

select count(*) from test01 partition (p10); -- 没变

3443

select count(*) from test01 partition (p11); -- 恰好是3444

3444

OK, 其实不用测试这么多，大家就可以看出规律了，但是这里之所以测试

这些，是为了通过概率的方式统计一下到底每次在拆分数据量的时候有什

么规律 (虽然大前提是hash算法)。这里可以粗略知道的是：假设一个

表从8个分区增加到16个分区， partition 1~8 的 hash bucket no 应

该和9～16 的对应相等，因为9～16的数据都是分别从1～8 partition中

拆分出来的。

9. 现在我们一次性将分区加到16个，看看数据分布情况，明显已经均匀分布了。

select count(*) from test01 partition (p1);

3356

select count(*) from test01 partition (p2);

3418

select count(*) from test01 partition (p3);

3446

select count(*) from test01 partition (p4);

3322

select count(*) from test01 partition (p5);

3427

select count(*) from test01 partition (p6);

3367

select count(*) from test01 partition (p7);

3392

select count(*) from test01 partition (p8);

3421

select count(*) from test01 partition (p9);

3390

select count(*) from test01 partition (p10);

3443

select count(*) from test01 partition (p11);

3444

select count(*) from test01 partition (p12);

3358

select count(*) from test01 partition (p13);

3350

select count(*) from test01 partition (p14);

3322

select count(*) from test01 partition (p15);

3482

select count(*) from test01 partition (p16);

3344

[本帖最后由 tolywang 于 2011-1-21 10:01 编辑]

weixin_39743357

关注

0
点赞
踩
1

收藏

觉得还不错? 一键收藏
0
评论
复制链接

分享到 QQ

分享到新浪微博

扫一扫

评论

被折叠的条评论为什么被折叠?

到【灌水乐园】发言

查看更多评论

添加红包

成就一亿技术人!

hope_wisdom

发出的红包

实付元

使用余额支付

点击重新获取

扫码支付

钱包余额 0

抵扣说明：

1.余额是钱包充值的虚拟货币，按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载，可以购买VIP、付费专栏及课程。