分区
官网地址 https://dev.mysql.com/doc/refman/5.7/en/partitioning.html
概述
随着时间的推移,或者是业务量的增加,导致我们系统中的某些表的数据量过大,查询相应速度过慢,甚至产生锁表堵塞现象,产生假死。那么如何解决这种数据量增加,减少历史数据的影响呢?如何定期淘汰历史数据,提升查询效率,减少缓存、IO开销呢?本节会通过mysql的分区特性来给大家说道说道。在系统设计初期没有考虑到表分区的问题,又该如何迁移表呢?本篇文章带着大家揭开谜底。
RANGE Partitioning
这种分区规则是最常见的,我们可以按照时间进行分区,通常是按照天进行
例子:
CREATE TABLE members (
firstname VARCHAR(25) NOT NULL,
lastname VARCHAR(25) NOT NULL,
username VARCHAR(16) NOT NULL,
email VARCHAR(35),
joined DATE NOT NULL
)
PARTITION BY RANGE( YEAR(joined) ) (
PARTITION p0 VALUES LESS THAN (1960),
PARTITION p1 VALUES LESS THAN (1970),
PARTITION p2 VALUES LESS THAN (1980),
PARTITION p3 VALUES LESS THAN (1990),
PARTITION p4 VALUES LESS THAN MAXVALUE
);
上面这个例子是通过年份进行分区,这样可以在查询的时候按照分区字段进行查询,快速的淘汰历史数据。
LIST Partitioning
例子:
CREATE TABLE employees (
id INT NOT NULL,
fname VARCHAR(30),
lname VARCHAR(30),
hired DATE NOT NULL DEFAULT '1970-01-01',
separated DATE NOT NULL DEFAULT '9999-12-31',
job_code INT,
store_id INT
)
PARTITION BY LIST(store_id) (
PARTITION pNorth VALUES IN (3,5,6,9,17),
PARTITION pEast VALUES IN (1,2,10,11,19,20),
PARTITION pWest VALUES IN (4,12,13,14,18),
PARTITION pCentral VALUES IN (7,8,15,16)
);
//可以执行成功
insert into employees values(1,"zhang","deheng",'2013-02-03','9999-12-31',1,1);
//找不到分区,预先定义的分区不存在,所以不能插入
insert into employees values(1,"zhang","deheng",'2013-02-03','9999-12-31',1,21);
explain partitions select * from employees;
explain partitions select * from employees where store_id in (1,16);
list 分区的优势在于可以将灵活的离散数据进行分区。
HASH Partitioning
例子:创建一个t1 表,同时用hash 为col3 列建立4个分区
CREATE TABLE t1 (col1 INT, col2 CHAR(5), col3 DATE)
PARTITION BY HASH( YEAR(col3) )
PARTITIONS 4;
当我们执行 insert into t1 values(1,”2”,’2015-09-01’)时 ,hash 会为我们通过下面的代码计算所属的分区
MOD(YEAR('2005-09-01'),4)
= MOD(2005,4)
= 1
看效果
explain partitions select * from t1 ;
explain partitions select * from t1 where col3='2015-09-02';
Hash 分区这里可以不需要指定partitions子句,但是必须要写分区数量。
Linear Hash 分区
mysql也支持线性hashing 和Hash分区是两种不同的算法,优点是在数据量大的场景,譬如TB级,增加、删除、合并和拆分分区会更快,缺点是,相对于HASH分区,它数据分布不均匀的概率更大。
格式如下:
CREATE TABLE employees (
id INT NOT NULL,
fname VARCHAR(30),
lname VARCHAR(30),
hired DATE NOT NULL DEFAULT '1970-01-01',
separated DATE NOT NULL DEFAULT '9999-12-31',
job_code INT,
store_id INT
)
PARTITION BY LINEAR HASH( YEAR(hired) )
PARTITIONS 4;
算法实现原理:
例子:
CREATE TABLE t2 (col1 INT, col2 CHAR(5), col3 DATE)
PARTITION BY LINEAR HASH( YEAR(col3) )
PARTITIONS 6;
1.当我们插入记录的col3是2003-04-14时,计算过程如下:
V = POWER(2, CEILING( LOG(2,6) )) = 8
N = YEAR('2003-04-14') & (8 - 1)
= 2003 & 7
= 3
(3 >= 6 is FALSE: record stored in partition #3)
2.当我们插入记录的col3是1998-10-19时,计算过程如下:
V = 8
N = YEAR('1998-10-19') & (8 - 1)
= 1998 & 7
= 6
(6 >= 6 is TRUE: additional step required)
N = 6 & ((8 / 2) - 1)
= 6 & 3
= 2
(2 >= 6 is FALSE: record stored in partition #2)
KEY Partitioning
key 分区算法使用的是md5进行计算,key对象必须是列,不能是表达式,key分区的定义分两种形式,一种是有主键或者唯一键,另一种是没有主键或者唯一键的情况
1.有主键的格式
CREATE TABLE k1 (
id INT NOT NULL PRIMARY KEY,
name VARCHAR(20)
)
PARTITION BY KEY()
PARTITIONS 2;
2.没主键的格式
CREATE TABLE k1 (
id INT NOT NULL,
name VARCHAR(20),
UNIQUE KEY (id)
)
PARTITION BY KEY()
PARTITIONS 2;
Subpartitioning
子分区是将前面几个分区模式组合起来进行使用,详细参考官网https://dev.mysql.com/doc/refman/5.7/en/partitioning-subpartitions.html
How MySQL Partitioning Handles NULL
官网给出了每种分区模式在处理Null 的情况https://dev.mysql.com/doc/refman/5.7/en/partitioning-handling-nulls.html
如何将表迁移到分区表
以下内容转载自:https://blog.csdn.net/fdipzone/article/details/79769524
mysql重建表分区并保留数据的方法:
1.创建与原始表一样结构的新表,新分区。
2.将原始表中数据复制到新表。
3.删除原始表。
4.将新表名称改为原始表名称。
实例:
日志表原始结构如下,按id分区。
CREATE DATABASE `test`;
use `test`;
CREATE TABLE `log` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`content` text NOT NULL COMMENT '内容',
`status` tinyint(3) unsigned NOT NULL COMMENT '记录状态',
`addtime` int(11) unsigned NOT NULL COMMENT '添加时间',
`lastmodify` int(11) unsigned NOT NULL COMMENT '最后修改时间',
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8
/*!50100 PARTITION BY RANGE (id)
(PARTITION p10w VALUES LESS THAN (100000) ENGINE = InnoDB,
PARTITION p20w VALUES LESS THAN (200000) ENGINE = InnoDB,
PARTITION p50w VALUES LESS THAN (500000) ENGINE = InnoDB,
PARTITION p100w VALUES LESS THAN (1000000) ENGINE = InnoDB,
PARTITION pmax VALUES LESS THAN MAXVALUE ENGINE = InnoDB) */;
insert into `log`(content,status,addtime,lastmodify)
values('content1',1, unix_timestamp('2018-01-11 00:00:00'), unix_timestamp('2018-01-11 00:00:00')),
('content2',1, unix_timestamp('2018-02-22 00:00:00'), unix_timestamp('2018-02-22 00:00:00')),
('content3',1, unix_timestamp('2018-03-31 00:00:00'), unix_timestamp('2018-03-31 00:00:00'));
查看数据分区分布
SELECT PARTITION_NAME,TABLE_ROWS FROM INFORMATION_SCHEMA.PARTITIONS WHERE TABLE_SCHEMA='test' AND TABLE_NAME = 'log';
+----------------+------------+
| PARTITION_NAME | TABLE_ROWS |
+----------------+------------+
| p10w | 3 |
| p20w | 0 |
| p50w | 0 |
| p100w | 0 |
| pmax | 0 |
+----------------+------------+
日志数据需要按时间进行搜寻,因此需要按日志时间重建分区。
1.创建log2,按时间分区(每月1个分区)
CREATE TABLE `log2` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`content` text NOT NULL COMMENT '内容',
`status` tinyint(3) unsigned NOT NULL COMMENT '记录状态',
`addtime` int(11) unsigned NOT NULL COMMENT '添加时间',
`lastmodify` int(11) unsigned NOT NULL COMMENT '最后修改时间',
PRIMARY KEY (`id`,`addtime`),
KEY `id`(`id`),
KEY `addtime`(`addtime`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8
/*!50100 PARTITION BY RANGE (addtime)
(PARTITION p201801 VALUES LESS THAN (unix_timestamp('2018-02-01 00:00:00')) ENGINE = InnoDB,
PARTITION p201802 VALUES LESS THAN (unix_timestamp('2018-03-01 00:00:00')) ENGINE = InnoDB,
PARTITION p201803 VALUES LESS THAN (unix_timestamp('2018-04-01 00:00:00')) ENGINE = InnoDB,
PARTITION p201804 VALUES LESS THAN (unix_timestamp('2018-05-01 00:00:00')) ENGINE = InnoDB,
PARTITION pmax VALUES LESS THAN MAXVALUE ENGINE = InnoDB) */;
2.将log的数据复制到log2
insert into `log2` select * from `log`;
3.删除log表
drop table `log`;
4.将log2表改名为log
rename table `log2` to `log`;
执行后查看数据分区分布
SELECT PARTITION_NAME,TABLE_ROWS FROM INFORMATION_SCHEMA.PARTITIONS WHERE TABLE_SCHEMA='test' AND TABLE_NAME = 'log';
+----------------+------------+
| PARTITION_NAME | TABLE_ROWS |
+----------------+------------+
| p201801 | 1 |
| p201802 | 1 |
| p201803 | 1 |
| p201804 | 0 |
| pmax | 0 |
+----------------+------------+
可以看到log表的数据已经按新分区存储。