本文包含知识点:
1、mysql数据库给大表增加字段;
2、mysql表分区;
3、linux排查磁盘空间;
4、linux中的mysql导入导出数据;
环境:centos7.4 + mysql5.7.24
事件原由:线上版本迭代的过程中,根据新需求需要对表进行增减字段、重新分区,表数据达到一定量(600W左右),占用磁盘100多G,表结构如下:
CREATE TABLE `news_abc_article` (
`id` bigint(20) NOT NULL AUTO_INCREMENT COMMENT '主键',
`article_id` varchar(50) NOT NULL COMMENT '文章ID',
`crawl_year` int(4) NOT NULL COMMENT '抓取年份,第一版的时候是根据这个字段值来分区的',
`crawl_time` datetime NOT NULL COMMENT '抓取时间',
`publish_time` datetime NOT NULL COMMENT '发布时间,第二版用此字段值进行表分区',
`source` varchar(50) CHARACTER SET utf8mb4 DEFAULT NULL COMMENT '网站来源',
`source_url` varchar(500) DEFAULT NULL COMMENT '文章URL',
`title` varchar(200) CHARACTER SET utf8mb4 DEFAULT NULL COMMENT '标题',
`content_type` varchar(10) DEFAULT NULL COMMENT '内容类型(html、pdf)',
`clustering_id` varchar(100) DEFAULT NULL COMMENT '聚类ID',
`is_new_clustering` tinyint(1) DEFAULT NULL COMMENT '是否为新聚类',
`is_useable` tinyint(1) DEFAULT '0' COMMENT '可用状态,1可用 0不可用',
`is_unimportant` tinyint(1) DEFAULT NULL COMMENT '是否不重要',
`summary` varchar(1000) CHARACTER SET utf8mb4 DEFAULT NULL COMMENT '摘要',
`content` longtext CHARACTER SET utf8mb4 COMMENT '正文',
`html_content` longtext CHARACTER SET utf8mb4 COMMENT '带标签正文',
`updateid` decimal(15,0) NOT NULL COMMENT '变动标识',
`isvalid` tinyint(1) NOT NULL DEFAULT '1' COMMENT '是否有效,0无效 1有效,可用于逻辑删除',
`entrytime` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT '入库时间',
`updatetime` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT '更新时间',
`emotion` varchar(10) DEFAULT NULL COMMENT '文章情绪归类,取值:正面、负面、中性',
`news_type` varchar(50) DEFAULT NULL COMMENT '新闻类别',
PRIMARY KEY (`id`,`publish_time`),
KEY `idx_crawl_time` (`crawl_time`) USING BTREE,
KEY `idx_publish_time` (`publish_time`) USING BTREE,
KEY `idx_article_publish_time` (`article_id`,`publish_time`) USING BTREE,
KEY `idx_article_crawl_year` (`article_id`,`crawl_year`) USING BTREE,
KEY `idx_updateid` (`updateid`) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=546 DEFAULT CHARSET=utf8 COMMENT='文章'
PARTITION BY RANGE (TO_DAYS(publish_time))(
PARTITION p_201901 VALUES LESS THAN (TO_DAYS('2019-01-01')) ENGINE = InnoDB,
PARTITION p_201902 VALUES LESS THAN (TO_DAYS('2019-02-01')) ENGINE = InnoDB,
PARTITION p_201903 VALUES LESS THAN (TO_DAYS('2019-03-01')) ENGINE = InnoDB,
PARTITION p_201904 VALUES LESS THAN (TO_DAYS('2019-04-01')) ENGINE = InnoDB,
PARTITION p_201905 VALUES LESS THAN (TO_DAYS('2019-05-01')) ENGINE = InnoDB,
PARTITION p_201906 VALUES LESS THAN (TO_DAYS('2019-06-01')) ENGINE = InnoDB,
PARTITION p_201907 VALUES LESS THAN (TO_DAYS('2019-07-01')) ENGINE = InnoDB,
PARTITION p_201908 VALUES LESS THAN (TO_DAYS('2019-08-01')) ENGINE = InnoDB,
PARTITION p_201909 VALUES LESS THAN (TO_DAYS('2019-09-01')) ENGINE = InnoDB,
PARTITION p_201910 VALUES LESS THAN (TO_DAYS('2019-10-01')) ENGINE = InnoDB,
PARTITION p_201911 VALUES LESS THAN (TO_DAYS('2019-11-01')) ENGINE = InnoDB,
PARTITION p_201912 VALUES LESS THAN (TO_DAYS('2019-12-01')) ENGINE = InnoDB,
PARTITION p_202001 VALUES LESS THAN (TO_DAYS('2020-01-01')) ENGINE = InnoDB,
PARTITION p_202002 VALUES LESS THAN (TO_DAYS('2020-02-01')) ENGINE = InnoDB,
PARTITION p_202003 VALUES LESS THAN (TO_DAYS('2020-03-01')) ENGINE = InnoDB,
PARTITION p_202004 VALUES LESS THAN (TO_DAYS('2020-04-01')) ENGINE = InnoDB,
PARTITION p_202005 VALUES LESS THAN (TO_DAYS('2020-05-01')) ENGINE = InnoDB,
PARTITION p_202006 VALUES LESS THAN (TO_DAYS('2020-06-01')) ENGINE = InnoDB,
PARTITION p_202007 VALUES LESS THAN (TO_DAYS('2020-07-01')) ENGINE = InnoDB,
PARTITION p_202008 VALUES LESS THAN (TO_DAYS('2020-08-01')) ENGINE = InnoDB,
PARTITION p_202009 VALUES LESS THAN (TO_DAYS('2020-09-01')) ENGINE = InnoDB,
PARTITION p_202010 VALUES LESS THAN (TO_DAYS('2020-10-01')) ENGINE = InnoDB,
PARTITION p_202011 VALUES LESS THAN (TO_DAYS('2020-11-01')) ENGINE = InnoDB,
PARTITION p_202012 VALUES LESS THAN (TO_DAYS('2020-12-01')) ENGINE = InnoDB,
PARTITION p_202101 VALUES LESS THAN (TO_DAYS('2021-01-01')) ENGINE = InnoDB,
PARTITION p_202102 VALUES LESS THAN (TO_DAYS('2021-02-01')) ENGINE = InnoDB,
PARTITION p_202103 VALUES LESS THAN (TO_DAYS('2021-03-01')) ENGINE = InnoDB,
PARTITION p_202104 VALUES LESS THAN (TO_DAYS('2021-04-01')) ENGINE = InnoDB,
PARTITION p_202105 VALUES LESS THAN (TO_DAYS('2021-05-01')) ENGINE = InnoDB,
PARTITION p_202106 VALUES LESS THAN (TO_DAYS('2021-06-01')) ENGINE = InnoDB,
PARTITION p_202107 VALUES LESS THAN (TO_DAYS('2021-07-01')) ENGINE = InnoDB,
PARTITION p_202108 VALUES LESS THAN (TO_DAYS('2021-08-01')) ENGINE = InnoDB,
PARTITION p_202109 VALUES LESS THAN (TO_DAYS('2021-09-01')) ENGINE = InnoDB,
PARTITION p_202110 VALUES LESS THAN (TO_DAYS('2021-10-01')) ENGINE = InnoDB,
PARTITION p_202111 VALUES LESS THAN (TO_DAYS('2021-11-01')) ENGINE = InnoDB,
PARTITION p_202112 VALUES LESS THAN (TO_DAYS('2021-12-01')) ENGINE = InnoDB,
PARTITION p_2022 VALUES LESS THAN MAXVALUE ENGINE = InnoDB
);
可以看到,表中content和html_content字段值较大,具体表结构及存储值的合理性,不在此篇文章讨论。
需求1:新增一个字段`channel` varchar(20) DEFAULT NULL COMMENT '信息源分类';
分析:大数据量表直接执行容易导致mysql锁表,导致业务堆积,也可能导致mysql服务宕掉;测试环境mysql试过直接执行"alter table news_abc_article add column `channel` varchar(20) DEFAULT NULL COMMENT '信息源分类'"发现执行很久(3 hour)未响应,于是mysql执行show processlist;把该条alter语句的进程kill掉。
推荐思路:夜间业务量最少的时候暂停业务-->rename表-->新建含channel字段的表-->原来数据导出-->导入到新表-->恢复业务
这个期间,导出数据蛮快的。(我这是因为在客户的机器上操作的,测试和线上机器都比较严格)多提一句,导成格式为×××.sql后,如果需要拷贝到其它机器执行导入,最好要压缩一下,linux上,tar.gz格式是相对压缩率比较高的,tar -zcvf ×××.sql。
需求二:表要重新分区,根据crawl_time(值的格式形如2019-01-01 12:00:14)字段的年月来分区(上一版是根据crawl_year抓取年份来分区的),期间要做的工作和需求一类似,也涉及到导入导出数据,不赘述。【表分区并不会实际减少表数据占用空间,但是在查询时候用到了分区字段的话,会加快查询速度】
需求三:mysql导出导入数据
mysql导出数据方式1:mysql执行select * from news_abc_article where crawl_time>"2019-05-01 00:00:01" into outfile '/data.sql' character set 'utf8';
发现报错: ERROR 1290 (HY000): The MySQL server is running with the --secure-file-priv option so it cannot execute this statement
解决方案: https://blog.csdn.net/qq_31518899/article/details/75662090
【不太推荐上述方式】
mysql导出数据方式2:
导出:使用mysql自带的mysqldump【推荐】
mysqldump后面支持很多参数,有兴趣的可以直接执行mysqldump --help查看
mysqldump -t -h127.0.0.1 -p3306 -usimonsfan -p htsec_db news_abc_article --where="crawl_time<'2019-01-01 00:00:00'" > /data.sql
-t参数表示不导出结构,只导出数据,如上语句即表示导入本台机器中,htsec_db数据库的news_abc_article表中crawl_time条件大于2019年度的数据到根目录/下的data.sql文件中
mysqldump -d 数据库名 -u用户名 -p > xxx.sql
-d表示导出结构不导出数据
导入:source /data.sql
登录进mysql中,执行如下
mysql -u用户名 -p密码
source /data.sql
这个过程中,如果导入的数据有错误(比如主键重复、或者其它错误啥的),source执行的导入会跳过错误,自动继续执行导入,直到全部数据导入为止。顺带提一句,linux上的mysql中执行查询时候加上"\G"可以让查询结果格式化,例如"select * from news_abc_article limit 1 \G;",因为今天有个同事看我写这个SQL时吃惊于竟然还可以用"\G"格式化。
需求四:linux查看及释放磁盘空间
首先是几个查看磁盘空间的命令:
du -sh ${dir} :查看机器上某个路径的磁盘空间
df -lh :查看机器的整体磁盘空间
ll -h :查看当前路径下的各个文件占用的磁盘空间
因为某个场景下,我需要把线上mysql某个表数据导入到线下测试环境mysql,在执行导入data.sql(大约50G的样子)时候,发现报错:表空间满了(这个错误忘记截图了),于是乎就用df -lh查看了下磁盘空间情况
执行结果:
[root@e-abc-yq-mysql data]# df -lh
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/centos-root 15G 4.3G 11G 29% /
devtmpfs 16G 7.6G 8.1G 49% /dev
tmpfs 16G 0 16G 0% /dev/shm
tmpfs 16G 137M 16G 1% /run
tmpfs 16G 0 16G 0% /sys/fs/cgroup
/dev/vda1 497M 172M 326M 35% /boot
tmpfs 3.2G 36K 3.2G 1% /run/user/0
/dev/vdb 187G 187G 0 100% /data
发现/data/下空间满了(/data目录下是测试环境mysql配置的存放数据目录),于是进去删除了一些不需要的数据(由于测试环境之前测试有一些测试表数据,竟然有80多G),包括一些不需要的表分区数据ibd文件,形如"news_abc_article_test#P#p_202011.ibd"。执行rm -rf ×××.ibd语句,删除了后,用du -sh /data 查看了一下,发现/data空间只占用了100G,和预期的差不多,于是继续执行source /data.sql导入mysql,期间又报错,还是磁盘空间已满,这下懵圈了,脑袋瓜子嗡嗡的,明明刚刚释放80多G空间,怎么又会不够呢?于是df -lh查看了一下,发现/data目录下还是:
[root@e-abc-yq-mysql data]# df -lh
[root@e-abc-yq-mysql data]# df -lh
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/centos-root 15G 4.3G 11G 29% /
devtmpfs 16G 7.6G 8.1G 49% /dev
tmpfs 16G 0 16G 0% /dev/shm
tmpfs 16G 137M 16G 1% /run
tmpfs 16G 0 16G 0% /sys/fs/cgroup
/dev/vda1 497M 172M 326M 35% /boot
tmpfs 3.2G 36K 3.2G 1% /run/user/0
/dev/vdb 187G 187G 0 100% /data
和没清理前一样,继续折腾了一会还是如此,无奈,找客户方的运维咨询了一下,他们提议说,可以把清理的数据对应的服务重启一下试试,于是就重启了测试环境的mysql,再执行du -sh /data以及df -lh发现ok
[root@e-abc-yq-mysql data]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/centos-root 15G 4.3G 11G 29% /
devtmpfs 16G 7.6G 8.1G 49% /dev
tmpfs 16G 0 16G 0% /dev/shm
tmpfs 16G 137M 16G 1% /run
tmpfs 16G 0 16G 0% /sys/fs/cgroup
/dev/vda1 497M 172M 326M 35% /boot
tmpfs 3.2G 36K 3.2G 1% /run/user/0
/dev/vdb 187G 109G 78G 59% /data
所以结论就是:要重启服务,才能刷新释放服务对应的磁盘空间。偶尔总结一下工作,如有错误,欢迎留言。
【引申阅读】
Mysql主从复制-基于GTID复制:https://blog.csdn.net/fanrenxiang/article/details/70197004
Mysql主从复制-基于日志点复制:https://blog.csdn.net/fanrenxiang/article/details/70194707
Mysql二进制日志详解:https://blog.csdn.net/fanrenxiang/article/details/70193636
Mysql慢查询日志详解:https://blog.csdn.net/fanrenxiang/article/details/83687241