Table Partitioning Performance Test

Latest recommended article published 2021-07-01 18:03:39

追寻北极



Article link: https://blog.csdn.net/joeyon1985/article/details/39393931


Partition types

HASH, RANGE, LIST, KEY

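RANGE partitioning: assigns rows to partitions based on column values that fall within given contiguous ranges.
LIST partitioning: similar to RANGE partitioning, except that the partition is chosen by matching the column value against one of a set of discrete values.
HASH partitioning: the partition is chosen based on the value returned by a user-defined expression that is computed from the column values of the rows being inserted. The expression can be any valid MySQL expression that yields a non-negative integer value.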
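To make the three definitions concrete, here is a minimal Python model of how each type picks a partition for a row. It is a sketch, not MySQL's implementation: the function names are hypothetical, HASH is modeled with the MOD(expr, N) rule used by MySQL's HASH partitioning, and KEY is left out because its hashing function is internal to the server.

```python
def range_partition(value, bounds):
    """RANGE: return the index of the first partition whose
    VALUES LESS THAN bound is greater than value (None = MAXVALUE)."""
    for index, bound in enumerate(bounds):
        if bound is None or value < bound:
            return index
    # MySQL rejects a row that fits no partition; model that with an error.
    raise ValueError(f"no partition for value {value!r}")


def list_partition(value, value_sets):
    """LIST: return the index of the partition whose VALUES IN set contains value."""
    for index, values in enumerate(value_sets):
        if value in values:
            return index
    raise ValueError(f"no partition for value {value!r}")


def hash_partition(expr_value, num_partitions):
    """HASH: partition number = MOD(expr, num_partitions).
    Assumes expr_value is a non-negative integer."""
    return expr_value % num_partitions


print(range_partition(1995, [1995, 1996, None]))  # 1
print(list_partition(3, [{1, 2}, {3, 4}]))        # 1
print(hash_partition(10, 3))                      # 1
```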

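HASH is suited to columns whose values are fairly random. A hash is computed from the column value and the rows are spread evenly across the partitions. A column such as gender is a poor fit for HASH partitioning: it holds only two values (male or female), so it has no randomness and the rows cannot be spread evenly.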
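The gender example can be checked with a small Python sketch that applies the MOD(expr, N) rule used by HASH partitioning. The 0/1 gender encoding, the four-partition layout, and the helper names are assumptions for illustration.

```python
from collections import Counter


def hash_partition(expr_value, num_partitions):
    # HASH partitioning: partition number = MOD(expr, num_partitions)
    return expr_value % num_partitions


def partition_sizes(values, num_partitions):
    """Number of rows that land in each partition."""
    counts = Counter(hash_partition(v, num_partitions) for v in values)
    return [counts[p] for p in range(num_partitions)]


# A high-cardinality column (e.g. an id) spreads evenly over 4 partitions.
print(partition_sizes(range(1, 10001), 4))  # [2500, 2500, 2500, 2500]

# A gender column coded 0/1 (hypothetical encoding) fills only 2 of the 4.
genders = [i % 2 for i in range(10000)]
print(partition_sizes(genders, 4))          # [5000, 5000, 0, 0]
```

However many partitions are declared, a two-valued column can never use more than two of them.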

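KEY partitioning: similar to HASH partitioning, except that you supply only one or more columns to be evaluated, and the MySQL server provides its own hashing function. The columns do not have to contain integer values, because MySQL's hashing function always produces an integer result. ---- rarely used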

Creating a partitioned table:

mysql> create table t1 (id int) partition by hash(id) partitions 3;

Query OK, 0 rows affected (0.03 sec)

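【Experiment】

Create one partitioned table and one non-partitioned table, then compare their performance.

Create the partitioned table: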

mysql> create table part_tab ( c1 int default NULL, c2 varchar(30) default null, c3 date default null) engine=myisam
-> partition by range(year(c3))(
-> partition p0 values less than (1995),
-> partition p1 values less than (1996),
-> partition p2 values less than (1997),
-> partition p3 values less than (1998),
-> partition p4 values less than (1999),
-> partition p5 values less than (2000),
-> partition p6 values less than (2001),
-> partition p7 values less than (2002),
-> partition p8 values less than (2003),
-> partition p9 values less than (2004),
-> partition p10 values less than (2010),
-> partition p11 values less than MAXVALUE);
Query OK, 0 rows affected (0.14 sec)
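The partition definitions above can be mirrored in Python to see which partition a given c3 value lands in. This sketch relies only on what the CREATE TABLE statement shows; `part_tab_partition` is a hypothetical helper. `bisect_right` counts how many bounds are less than or equal to YEAR(c3), which is exactly the index of the first partition whose VALUES LESS THAN bound is greater.

```python
from bisect import bisect_right
from datetime import date

# VALUES LESS THAN bounds of p0..p10 from the CREATE TABLE above;
# p11 is the MAXVALUE partition.
PART_TAB_BOUNDS = [1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2010]


def part_tab_partition(c3: date) -> str:
    """PARTITION BY RANGE (YEAR(c3)): name of the partition that stores c3."""
    index = bisect_right(PART_TAB_BOUNDS, c3.year)
    return f"p{index}"


print(part_tab_partition(date(1995, 6, 15)))  # p1
print(part_tab_partition(date(2007, 1, 1)))   # p10
print(part_tab_partition(date(2015, 1, 1)))   # p11
```

Note that p10 covers six years (2004 to 2009) while p1 to p9 each cover a single year.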

Create the non-partitioned table:
mysql> create table no_part_tab ( c1 int default NULL, c2 varchar(30) default null, c3 date default null) engine=myisam;
Query OK, 0 rows affected (0.11 sec)

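mysql> \d // # A stored procedure follows, so change the statement delimiter to "//" (\d is the mysql client's shorthand for delimiter). A stored procedure is essentially a stored collection of SQL statements.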
mysql> create procedure load_part_tab()
-> begin
-> declare v int default 0;
-> while v < 8000000
-> do
-> insert into part_tab
-> values (v,'testing partitions',adddate('1995-01-01',(rand(v)*36520)mod 3652));
-> set v = v+1;
-> end while;
-> end
-> //
Query OK, 0 rows affected (0.04 sec)

mysql> \d ; # once the procedure has been created, change the delimiter back to ";"

The stored procedure above exists only to generate a large volume of test data (8 million rows).
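The date expression in the procedure spreads the rows evenly over ten years. Because `rand(v)*36520` lies in [0, 36520) and 36520 is exactly 10 × 3652, taking it `mod 3652` gives a day offset that is uniform over [0, 3652), i.e. dates from 1995-01-01 up to the end of 2004. Partitions p1 to p10 therefore each receive roughly a tenth of the rows, and p0 and p11 stay empty. The Python sketch below reproduces the formula; it uses Python's `random` module as a stand-in for MySQL's `RAND()` (the two sequences differ) and assumes the fractional offset is truncated to whole days. It also suggests why Test 1 counts about 795,000 rows: there are 363 days strictly between 1995-01-01 and 1995-12-31, and 8,000,000 × 363 / 3652 ≈ 795,181.

```python
import random
from collections import Counter
from datetime import date, timedelta

START = date(1995, 1, 1)


def generated_date(r: float) -> date:
    """Mirror adddate('1995-01-01', (rand(v)*36520) mod 3652) for r in [0, 1).
    Assumption: the fractional day offset is truncated to whole days."""
    offset = int((r * 36520) % 3652)
    return START + timedelta(days=offset)


def year_distribution(n: int, seed: int = 42) -> Counter:
    """Count generated rows per YEAR(c3); random.Random stands in for RAND()."""
    rng = random.Random(seed)
    return Counter(generated_date(rng.random()).year for _ in range(n))


dist = year_distribution(100_000)
for year in sorted(dist):
    print(year, dist[year])  # each year gets roughly 10,000 of the 100,000 rows
```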

mysql> call load_part_tab(); # call the load_part_tab stored procedure
Query OK, 1 row affected (9 min 18.95 sec)

Copy the data from part_tab into no_part_tab in a single statement (much faster than the row-by-row procedure):
mysql> insert no_part_tab select * from part_tab;
Query OK, 8000000 rows affected (8.97 sec)
Records: 8000000 Duplicates: 0 Warnings: 0

Test 1:

Before the experiment, make sure the two tables contain identical data, so that the results are comparable.

mysql> select count(*) from part_tab where c3 > date '1995-01-01' and c3 < date '1995-12-31';
+----------+
| count(*) |
+----------+
| 795181 |
+----------+
1 row in set (0.49 sec)

mysql> select count(*) from no_part_tab where c3 > date '1995-01-01' and c3 < date '1995-12-31';
+----------+
| count(*) |
+----------+
| 795181 |
+----------+
1 row in set (3.94 sec)

mysql> desc select count(*) from part_tab where c3 > date '1995-01-01' and c3 < date '1995-12-31'\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: part_tab
type: ALL # full scan, but only of the matching partition
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 798458
Extra: Using where
1 row in set (0.09 sec)

mysql> desc select count(*) from no_part_tab where c3 > date '1995-01-01' and c3 < date '1995-12-31'\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: no_part_tab
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 8000000
Extra: Using where
1 row in set (0.00 sec)

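Conclusion: with partitioning, MySQL only needs to scan the partition holding 1995 (an estimated 798,458 rows, roughly 790,000), whereas the non-partitioned table requires a full scan of all 8,000,000 rows, and the query ran in 0.49 s instead of 3.94 s. Partitioning can therefore speed up queries whose WHERE clause filters on the partitioning column, because partitions that cannot contain matching rows are skipped (partition pruning).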
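The pruning step behind this result can be sketched in Python: for a predicate on c3, keep only the partitions whose year range can overlap the requested dates. This is a simplified model that assumes pruning works at the granularity of YEAR(c3); it is conservative, so it may keep an extra partition at the edges but never drops one that could hold matching rows. The names are hypothetical.

```python
from datetime import date

# VALUES LESS THAN bounds of p0..p10 in part_tab; p11 is MAXVALUE.
BOUNDS = [1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2010]


def partition_year_ranges():
    """Return (name, first_year, last_year) per partition; None = unbounded."""
    ranges, lower = [], None
    for i, bound in enumerate(BOUNDS):
        ranges.append((f"p{i}", lower, bound - 1))
        lower = bound
    ranges.append((f"p{len(BOUNDS)}", lower, None))  # MAXVALUE partition
    return ranges


def prune(low: date, high: date):
    """Partitions that may hold rows with low < c3 < high.
    Compares whole years, so the result can only be too wide, never too narrow."""
    kept = []
    for name, first, last in partition_year_ranges():
        overlaps_low = last is None or last >= low.year
        overlaps_high = first is None or first <= high.year
        if overlaps_low and overlaps_high:
            kept.append(name)
    return kept


print(prune(date(1995, 1, 1), date(1995, 12, 31)))  # ['p1']
```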

Test 2:
Create an index on the non-partitioned table and check how the statement is executed.

mysql> create index idx_c3 on no_part_tab(c3);
Query OK, 8000000 rows affected (32.68 sec)
Records: 8000000 Duplicates: 0 Warnings: 0

Analysis:

mysql> desc select count(*) from no_part_tab where c3 > date '1995-01-01' and c3 < date '1995-12-31'\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: no_part_tab
type: range
possible_keys: idx_c3
key: idx_c3
key_len: 4
ref: NULL
rows: 785678
Extra: Using where; Using index
1 row in set (0.16 sec)

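Conclusion: after an index is created on the non-partitioned table, the same statement is resolved with a range scan on idx_c3 (type: range) instead of a full table scan, and the estimated number of rows examined drops from 8,000,000 to 785,678. Because Extra shows "Using index", the count can be computed from the index alone without reading the table rows, so efficiency clearly improves.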
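Why a range scan helps can be illustrated with a sorted Python list standing in for the index on c3 (an analogy, not MyISAM's actual B-tree structure): two binary searches locate the boundaries of the matching keys, so counting the rows in a date range does not require looking at every entry. The helper name and sample dates are hypothetical.

```python
from bisect import bisect_left, bisect_right
from datetime import date


def count_open_range(sorted_keys, low, high):
    """Count keys with low < key < high using two binary searches."""
    start = bisect_right(sorted_keys, low)  # first position with key > low
    end = bisect_left(sorted_keys, high)    # first position with key >= high
    return max(0, end - start)


# A tiny sorted "index" on c3 (hypothetical sample values).
keys = sorted([date(1994, 12, 31), date(1995, 1, 1), date(1995, 3, 1),
               date(1995, 12, 31), date(1996, 1, 1)])
print(count_open_range(keys, date(1995, 1, 1), date(1995, 12, 31)))  # 1
```

The strict `>` and `<` in the test queries map to `bisect_right` for the lower bound and `bisect_left` for the upper bound, so both boundary dates are excluded.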

Test 3:

Compare query efficiency with and without an index.

mysql> create index idx_c3 on part_tab(c3);
Query OK, 8000000 rows affected (31.85 sec)
Records: 8000000 Duplicates: 0 Warnings: 0

mysql> desc select count(*) from part_tab where c3 > date '1995-01-01' and c3 < date '1995-12-31'\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: part_tab
type: index
possible_keys: idx_c3
key: idx_c3
key_len: 4
ref: NULL
rows: 798458
Extra: Using where; Using index
1 row in set (0.14 sec)

Test a query that also filters on a non-indexed column:

mysql> select count(*) from no_part_tab where c3 > date '1995-01-01' and c3 < date '1995-12-31' and c2='hello';

+----------+
| count(*) |
+----------+
| 0 |
+----------+
1 row in set (4.90 sec)

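Conclusion: c2 is not indexed, so this query cannot be answered from idx_c3 alone; MySQL has to read the table rows to check c2, and the query takes 4.90 s. A lookup that does not go through an index takes longer than one that does.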
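The difference between an index-only count and a query that also filters on c2 can be modeled in Python. In this sketch (all names and rows hypothetical, with idx_c3 represented as sorted `(c3, row_id)` pairs), the c3-only count never touches the table, while the extra `c2 = ...` condition forces one table-row lookup per index hit. Whether MySQL actually chose idx_c3 for the query above is not shown, since no EXPLAIN was run for it.

```python
from bisect import bisect_left, bisect_right
from datetime import date


def build_index(rows):
    """Secondary index on c3: sorted (c3, row_id) pairs, standing in for idx_c3."""
    return sorted((c3, row_id) for row_id, (_c2, c3) in rows.items())


def hit_range(index, low, high):
    """Slice bounds of index entries with low < c3 < high."""
    keys = [c3 for c3, _ in index]  # rebuilt per call for clarity, not speed
    return bisect_right(keys, low), bisect_left(keys, high)


def count_c3_only(index, low, high):
    """Index-only count ("Using index"): returns (count, table rows read)."""
    start, end = hit_range(index, low, high)
    return max(0, end - start), 0


def count_c3_and_c2(index, rows, low, high, c2_value):
    """c2 is not in the index, so each index hit needs a table-row lookup."""
    start, end = hit_range(index, low, high)
    matches = rows_read = 0
    for _, row_id in index[start:end]:
        rows_read += 1
        if rows[row_id][0] == c2_value:
            matches += 1
    return matches, rows_read


# Hypothetical rows: row_id -> (c2, c3)
rows = {
    1: ("testing partitions", date(1995, 3, 1)),
    2: ("testing partitions", date(1995, 7, 9)),
    3: ("testing partitions", date(1996, 2, 2)),
}
index = build_index(rows)
print(count_c3_only(index, date(1995, 1, 1), date(1995, 12, 31)))                   # (2, 0)
print(count_c3_and_c2(index, rows, date(1995, 1, 1), date(1995, 12, 31), "hello"))  # (0, 2)
```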

Test 4: DELETE
mysql> delete from part_tab where c3 > date '1995-01-01' and c3 < date '1995-12-31';
Query OK, 795181 rows affected (14.02 sec)

mysql> delete from no_part_tab where c3 > date '1995-01-01' and c3 < date '1995-12-31';
Query OK, 795181 rows affected (15.21 sec)

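Conclusion: for deletes, the partitioned table is still slightly faster than the non-partitioned one (14.02 s versus 15.21 s), which reflects how partitioning makes data easier to manage.
The following example illustrates the data-management benefit. Suppose table t1 records the company's revenue for every month (12 months) of the current year. Without partitioning, all of the data is stored in a single data file, so deleting the first quarter's records requires scanning the whole table (assuming no usable index on the date column) to find the matching rows. If t1 had been partitioned by month in advance, you would only need to drop the partitions for January, February and March (for example with ALTER TABLE t1 DROP PARTITION ...), which discards their data directly. To a certain extent, then, partitioning does make management easier.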
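The quarterly-deletion example can be modeled in Python: a month-partitioned table discards whole partitions, while an unpartitioned table without a usable index has to examine every row to find the ones to delete. `MonthPartitionedTable` and `delete_rows` are hypothetical names for this sketch, and the revenue figures are made up; in MySQL the partition operation would be `ALTER TABLE ... DROP PARTITION`.

```python
from collections import defaultdict


class MonthPartitionedTable:
    """Toy model of a revenue table partitioned by month (hypothetical t1)."""

    def __init__(self):
        self.partitions = defaultdict(list)  # month -> amounts stored in that partition

    def insert(self, month, amount):
        self.partitions[month].append(amount)

    def drop_partitions(self, months):
        # Like ALTER TABLE ... DROP PARTITION: whole partitions go, no rows are scanned.
        for month in months:
            self.partitions.pop(month, None)

    def row_count(self):
        return sum(len(amounts) for amounts in self.partitions.values())


def delete_rows(rows, months):
    """Unpartitioned DELETE with no usable index: every (month, amount) row is examined.
    Returns (remaining rows, number of rows examined)."""
    remaining = [(m, a) for m, a in rows if m not in months]
    return remaining, len(rows)


table = MonthPartitionedTable()
rows = []
for month in range(1, 13):
    for amount in range(100):
        table.insert(month, amount)
        rows.append((month, amount))

table.drop_partitions([1, 2, 3])  # first quarter removed partition by partition
remaining, examined = delete_rows(rows, {1, 2, 3})
print(table.row_count(), len(remaining), examined)  # 900 900 1200
```

Both approaches leave the same 900 rows; the difference is that the partitioned version never looks at the 1,200 individual rows.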