
This article covers the following topics:

1. Adding a column to a large MySQL table;

2. MySQL table partitioning;

3. Troubleshooting disk space on Linux;

4. Importing and exporting MySQL data on Linux.

Environment: CentOS 7.4 + MySQL 5.7.24

Background: during an online release iteration, new requirements called for adding/removing columns on a table and repartitioning it. The table holds roughly 6 million rows and occupies more than 100 GB of disk. Its structure is as follows:

CREATE TABLE news_abc_article (
id bigint(20) NOT NULL AUTO_INCREMENT COMMENT 'primary key',
article_id varchar(50) NOT NULL COMMENT 'article ID',
crawl_year int(4) NOT NULL COMMENT 'crawl year; the first version partitioned on this column',
crawl_time datetime NOT NULL COMMENT 'crawl time',
publish_time datetime NOT NULL COMMENT 'publish time; the second version partitions on this column',
source varchar(50) CHARACTER SET utf8mb4 DEFAULT NULL COMMENT 'source website',
source_url varchar(500) DEFAULT NULL COMMENT 'article URL',
title varchar(200) CHARACTER SET utf8mb4 DEFAULT NULL COMMENT 'title',
content_type varchar(10) DEFAULT NULL COMMENT 'content type (html, pdf)',
clustering_id varchar(100) DEFAULT NULL COMMENT 'clustering ID',
is_new_clustering tinyint(1) DEFAULT NULL COMMENT 'whether this is a new cluster',
is_useable tinyint(1) DEFAULT '0' COMMENT 'usable flag: 1 usable, 0 not usable',
is_unimportant tinyint(1) DEFAULT NULL COMMENT 'whether the article is unimportant',
summary varchar(1000) CHARACTER SET utf8mb4 DEFAULT NULL COMMENT 'summary',
content longtext CHARACTER SET utf8mb4 COMMENT 'article body',
html_content longtext CHARACTER SET utf8mb4 COMMENT 'article body with HTML tags',
updateid decimal(15,0) NOT NULL COMMENT 'change identifier',
isvalid tinyint(1) NOT NULL DEFAULT '1' COMMENT 'valid flag: 0 invalid, 1 valid; used for logical deletion',
entrytime datetime NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT 'insert time',
updatetime datetime NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT 'update time',
emotion varchar(10) DEFAULT NULL COMMENT 'article sentiment: positive, negative, neutral',
news_type varchar(50) DEFAULT NULL COMMENT 'news category',
PRIMARY KEY (id,publish_time),
KEY idx_crawl_time (crawl_time) USING BTREE,
KEY idx_publish_time (publish_time) USING BTREE,
KEY idx_article_publish_time (article_id,publish_time) USING BTREE,
KEY idx_article_crawl_year (article_id,crawl_year) USING BTREE,
KEY idx_updateid (updateid) USING BTREE
) ENGINE=InnoDB AUTO_INCREMENT=546 DEFAULT CHARSET=utf8 COMMENT='Articles'
PARTITION BY RANGE (TO_DAYS(publish_time))(
PARTITION p_201901 VALUES LESS THAN (TO_DAYS('2019-01-01')) ENGINE = InnoDB,
PARTITION p_201902 VALUES LESS THAN (TO_DAYS('2019-02-01')) ENGINE = InnoDB,
PARTITION p_201903 VALUES LESS THAN (TO_DAYS('2019-03-01')) ENGINE = InnoDB,
PARTITION p_201904 VALUES LESS THAN (TO_DAYS('2019-04-01')) ENGINE = InnoDB,
PARTITION p_201905 VALUES LESS THAN (TO_DAYS('2019-05-01')) ENGINE = InnoDB,
PARTITION p_201906 VALUES LESS THAN (TO_DAYS('2019-06-01')) ENGINE = InnoDB,
PARTITION p_201907 VALUES LESS THAN (TO_DAYS('2019-07-01')) ENGINE = InnoDB,
PARTITION p_201908 VALUES LESS THAN (TO_DAYS('2019-08-01')) ENGINE = InnoDB,
PARTITION p_201909 VALUES LESS THAN (TO_DAYS('2019-09-01')) ENGINE = InnoDB,
PARTITION p_201910 VALUES LESS THAN (TO_DAYS('2019-10-01')) ENGINE = InnoDB,
PARTITION p_201911 VALUES LESS THAN (TO_DAYS('2019-11-01')) ENGINE = InnoDB,
PARTITION p_201912 VALUES LESS THAN (TO_DAYS('2019-12-01')) ENGINE = InnoDB,
PARTITION p_202001 VALUES LESS THAN (TO_DAYS('2020-01-01')) ENGINE = InnoDB,
PARTITION p_202002 VALUES LESS THAN (TO_DAYS('2020-02-01')) ENGINE = InnoDB,
PARTITION p_202003 VALUES LESS THAN (TO_DAYS('2020-03-01')) ENGINE = InnoDB,
PARTITION p_202004 VALUES LESS THAN (TO_DAYS('2020-04-01')) ENGINE = InnoDB,
PARTITION p_202005 VALUES LESS THAN (TO_DAYS('2020-05-01')) ENGINE = InnoDB,
PARTITION p_202006 VALUES LESS THAN (TO_DAYS('2020-06-01')) ENGINE = InnoDB,
PARTITION p_202007 VALUES LESS THAN (TO_DAYS('2020-07-01')) ENGINE = InnoDB,
PARTITION p_202008 VALUES LESS THAN (TO_DAYS('2020-08-01')) ENGINE = InnoDB,
PARTITION p_202009 VALUES LESS THAN (TO_DAYS('2020-09-01')) ENGINE = InnoDB,
PARTITION p_202010 VALUES LESS THAN (TO_DAYS('2020-10-01')) ENGINE = InnoDB,
PARTITION p_202011 VALUES LESS THAN (TO_DAYS('2020-11-01')) ENGINE = InnoDB,
PARTITION p_202012 VALUES LESS THAN (TO_DAYS('2020-12-01')) ENGINE = InnoDB,
PARTITION p_202101 VALUES LESS THAN (TO_DAYS('2021-01-01')) ENGINE = InnoDB,
PARTITION p_202102 VALUES LESS THAN (TO_DAYS('2021-02-01')) ENGINE = InnoDB,
PARTITION p_202103 VALUES LESS THAN (TO_DAYS('2021-03-01')) ENGINE = InnoDB,
PARTITION p_202104 VALUES LESS THAN (TO_DAYS('2021-04-01')) ENGINE = InnoDB,
PARTITION p_202105 VALUES LESS THAN (TO_DAYS('2021-05-01')) ENGINE = InnoDB,
PARTITION p_202106 VALUES LESS THAN (TO_DAYS('2021-06-01')) ENGINE = InnoDB,
PARTITION p_202107 VALUES LESS THAN (TO_DAYS('2021-07-01')) ENGINE = InnoDB,
PARTITION p_202108 VALUES LESS THAN (TO_DAYS('2021-08-01')) ENGINE = InnoDB,
PARTITION p_202109 VALUES LESS THAN (TO_DAYS('2021-09-01')) ENGINE = InnoDB,
PARTITION p_202110 VALUES LESS THAN (TO_DAYS('2021-10-01')) ENGINE = InnoDB,
PARTITION p_202111 VALUES LESS THAN (TO_DAYS('2021-11-01')) ENGINE = InnoDB,
PARTITION p_202112 VALUES LESS THAN (TO_DAYS('2021-12-01')) ENGINE = InnoDB,
PARTITION p_2022 VALUES LESS THAN MAXVALUE ENGINE = InnoDB
);

As you can see, the content and html_content columns hold large values; whether this table structure and storage choice are reasonable is outside the scope of this article.

Requirement 1: add a new column channel varchar(20) DEFAULT NULL COMMENT 'source category';

Analysis: running the ALTER directly on such a large table can easily lock the table for a long time, cause business traffic to pile up, and may even bring the MySQL service down. On the test-environment MySQL I tried running "alter table news_abc_article add column channel varchar(20) DEFAULT NULL COMMENT 'source category'" directly; after running for about 3 hours it still had not finished, so I ran show processlist; and killed the thread executing that ALTER statement.
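For reference, the check-and-kill step looks roughly like this from the shell (same account as in the mysqldump examples below; the thread ID 12345 is a placeholder, take the real Id from the SHOW PROCESSLIST output):

mysql -usimonsfan -p -e "SHOW FULL PROCESSLIST;"   # locate the long-running ALTER and note its Id
mysql -usimonsfan -p -e "KILL 12345;"              # kill that thread by its Id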

Recommended approach: during the nightly low-traffic window, pause the business -> rename the table -> create a new table that contains the channel column -> export the original data -> import it into the new table -> resume the business.
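A minimal sketch of that flow (credentials and file names follow the mysqldump examples later in the article; this is one workable ordering, dumping before the rename so the table name inside the dump's INSERT statements points at the new table):

# 1. data-only dump while the table still has its original name;
#    -c (--complete-insert) writes column names into each INSERT so the rows
#    can later be loaded into a table that has the extra channel column
mysqldump -t -c -usimonsfan -p htsec_db news_abc_article > /data.sql
# 2. move the old table aside and recreate it with the extra column
mysql -usimonsfan -p htsec_db <<'SQL'
RENAME TABLE news_abc_article TO news_abc_article_old;
-- recreate news_abc_article here with the full DDL shown above, plus:
--   channel varchar(20) DEFAULT NULL COMMENT 'source category'
SQL
# 3. load the dump into the new table; channel stays NULL for existing rows
mysql -usimonsfan -p htsec_db < /data.sql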

During this process the data export itself was fairly quick. One more note: after exporting to an xxx.sql file, if you need to copy it to another machine for import, compress it first; on Linux, tar.gz gives a relatively high compression ratio, e.g. tar -zcvf xxx.sql.tar.gz xxx.sql.
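For example (the host name and paths are placeholders):

tar -zcvf data.sql.tar.gz data.sql            # compress before copying
scp data.sql.tar.gz root@test-host:/data/     # copy to the target machine
tar -zxvf /data/data.sql.tar.gz -C /data/     # extract on the target machine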

Requirement 2: the table had to be repartitioned by the year and month of crawl_time (whose values look like 2019-01-01 12:00:14); the previous version was partitioned by crawl_year. The work involved is similar to requirement 1 and also involves exporting and importing data, so I will not repeat it here. [Partitioning does not actually reduce the disk space the table occupies, but queries that filter on the partitioning column only have to scan the matching partitions and therefore run faster.]
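For reference, the repartitioning itself is a single ALTER TABLE ... PARTITION BY RANGE statement; the sketch below uses publish_time to match the DDL at the top of the article, and the same pattern applies to any datetime column. Note that MySQL requires the partitioning column to be part of every unique key, which is why the primary key above is (id, publish_time), and that this ALTER rebuilds the whole table, so on a table this size it is as heavy as the export/import flow:

mysql -usimonsfan -p htsec_db <<'SQL'
ALTER TABLE news_abc_article
PARTITION BY RANGE (TO_DAYS(publish_time)) (
  PARTITION p_201901 VALUES LESS THAN (TO_DAYS('2019-01-01')),
  PARTITION p_201902 VALUES LESS THAN (TO_DAYS('2019-02-01')),
  -- ... one partition per month, as in the full DDL above ...
  PARTITION p_2022 VALUES LESS THAN MAXVALUE
);
SQL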

Requirement 3: exporting and importing MySQL data

MySQL export method 1: run select * from news_abc_article where crawl_time>'2019-05-01 00:00:01' into outfile '/data.sql' character set 'utf8'; in the mysql client.

This failed with: ERROR 1290 (HY000): The MySQL server is running with the --secure-file-priv option so it cannot execute this statement

Workaround: https://blog.csdn.net/qq_31518899/article/details/75662090

[This method is not particularly recommended.]
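The error comes from the secure_file_priv setting, which restricts INTO OUTFILE to a single directory (or disables it entirely). A quick check, where /var/lib/mysql-files is the usual default on a MySQL 5.7 package install but may differ on your server:

mysql -usimonsfan -p -e "SHOW VARIABLES LIKE 'secure_file_priv';"
# if it shows a directory such as /var/lib/mysql-files, write the outfile there:
mysql -usimonsfan -p htsec_db -e \
  "SELECT * FROM news_abc_article WHERE crawl_time > '2019-05-01 00:00:01'
   INTO OUTFILE '/var/lib/mysql-files/data.txt' CHARACTER SET 'utf8';"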

MySQL export method 2:

Export: use mysqldump, which ships with MySQL [recommended]

mysqldump supports many options; run mysqldump --help if you want to see them all.

1. mysqldump -t -h127.0.0.1 -P3306 -usimonsfan -p htsec_db news_abc_article --where="crawl_time<'2019-01-01 00:00:00'" > /data.sql

-t means dump data only, without the table structure. The command above exports the rows of the news_abc_article table in the htsec_db database on this machine whose crawl_time is earlier than 2019-01-01 into the file /data.sql under the root directory / (note that the port is passed with an uppercase -P; the lowercase -p is for the password).

2. mysqldump -d <database> -u<user> -p > xxx.sql

-d means dump the table structure only, without data.
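Used together, the two forms can rebuild a table on another machine; a sketch (credentials, condition, and file names are placeholders):

mysqldump -d -usimonsfan -p htsec_db news_abc_article > /schema.sql       # structure only
mysqldump -t -usimonsfan -p htsec_db news_abc_article \
  --where="crawl_time<'2019-01-01 00:00:00'" > /data.sql                  # data only
mysql -usimonsfan -p htsec_db < /schema.sql                               # on the target: load structure first
mysql -usimonsfan -p htsec_db < /data.sql                                 # then load the data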

Import: source /data.sql

Log in to the mysql client and run the following:

mysql -u<user> -p<password>
source /data.sql
During this process, if some of the imported rows hit errors (duplicate primary keys or anything else), source reports the error, skips it, and keeps importing until the whole file has been processed. As an aside, appending \G to a query in the Linux mysql client formats the result vertically, e.g. select * from news_abc_article limit 1 \G — a colleague watching me write this SQL today was surprised that \G could be used to format output.
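For unattended imports, the mysql client's --force (-f) option gives the same skip-and-continue behavior in batch mode; a sketch:

mysql --force -usimonsfan -p htsec_db < /data.sql   # continue past SQL errors instead of aborting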

Requirement 4: checking and freeing disk space on Linux

First, a few commands for checking disk usage:

du -sh ${dir} : shows the total disk usage of a given path

df -lh : shows the machine's overall disk usage, per filesystem

ll -h : shows the size of each file in the current directory (a combined example follows below)
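A typical drill-down with these commands, starting from the full filesystem and narrowing down to the largest entries (paths are illustrative):

df -lh                              # which filesystem is full?
du -sh /data                        # total size of the suspect directory
du -sh /data/* | sort -rh | head    # its largest entries first
ls -lh /data                        # individual file sizes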

In one scenario I needed to import a table from the production MySQL into the offline test-environment MySQL. While importing data.sql (roughly 50 GB), the import failed with a tablespace-full error (I forgot to capture a screenshot of it), so I checked the disk with df -lh.

Output:

[root@e-abc-yq-mysql data]# df -lh
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/centos-root 15G 4.3G 11G 29% /
devtmpfs 16G 7.6G 8.1G 49% /dev
tmpfs 16G 0 16G 0% /dev/shm
tmpfs 16G 137M 16G 1% /run
tmpfs 16G 0 16G 0% /sys/fs/cgroup
/dev/vda1 497M 172M 326M 35% /boot
tmpfs 3.2G 36K 3.2G 1% /run/user/0
/dev/vdb 187G 187G 0 100% /data
The /data filesystem was full (/data is the data directory configured for the test-environment MySQL), so I went in and deleted data that was no longer needed; leftover test tables from earlier testing turned out to occupy more than 80 GB, including some unneeded partition data files (.ibd) with names like "news_abc_article_test#P#p_202011.ibd", which I removed with rm -rf xxx.ibd. Afterwards du -sh /data reported only about 100 GB in use, roughly what I expected, so I resumed source /data.sql. The import failed again, still complaining that the disk was full, which left me completely baffled: I had just freed more than 80 GB, how could the disk be full again? Running df -lh showed that /data was still:

[root@e-abc-yq-mysql data]# df -lh
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/centos-root 15G 4.3G 11G 29% /
devtmpfs 16G 7.6G 8.1G 49% /dev
tmpfs 16G 0 16G 0% /dev/shm
tmpfs 16G 137M 16G 1% /run
tmpfs 16G 0 16G 0% /sys/fs/cgroup
/dev/vda1 497M 172M 326M 35% /boot
tmpfs 3.2G 36K 3.2G 1% /run/user/0
/dev/vdb 187G 187G 0 100% /data
exactly the same as before the cleanup. After poking at it for a while with no change, I asked the customer's ops team; they suggested restarting the service that the deleted data belonged to. So I restarted the test-environment MySQL, ran du -sh /data and df -lh again, and everything looked right:

[root@e-abc-yq-mysql data]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/centos-root 15G 4.3G 11G 29% /
devtmpfs 16G 7.6G 8.1G 49% /dev
tmpfs 16G 0 16G 0% /dev/shm
tmpfs 16G 137M 16G 1% /run
tmpfs 16G 0 16G 0% /sys/fs/cgroup
/dev/vda1 497M 172M 326M 35% /boot
tmpfs 3.2G 36K 3.2G 1% /run/user/0
/dev/vdb 187G 109G 78G 59% /data
So the conclusion is: when files are deleted while a running process still holds them open, the space is not returned to the filesystem until that process releases them, so the corresponding service has to be restarted (or made to close the files) before the freed disk space actually shows up. This is just an occasional write-up of day-to-day work; if anything here is wrong, feel free to leave a comment.
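A quicker way to confirm this situation, assuming lsof is installed, is to list files that have been deleted but are still held open by a running process; restarting that process (or otherwise getting it to close the files) is what actually returns the space:

lsof +L1                       # open files whose link count is 0, i.e. deleted but still held open
lsof -nP | grep '(deleted)'    # equivalent: filter on the "(deleted)" marker in lsof output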

Author: simonsfan
Source: CSDN
Original article: https://blog.csdn.net/fanrenxiang/article/details/91491778
Copyright notice: this is an original post by the author; please include a link to the original when reposting.
