MySQL千万级、亿级数据性能调优

前言

公司的一个项目,单表数据量过千万级甚至过亿,刚上线就遇到了性能问题。刚好可以优化一下,记录一下优化过程。

本文使用mysqlslap进行压测,对此工具不熟悉的先看我的另外一篇文章
https://blog.csdn.net/sishenhzy/article/details/104148892

环境

在自己的工作电脑上进行测试

数据库:mysql 5.7
压测工具:mysql自带压测工具mysqlslap
CPU:i5-8250U 4核心8线程
内存:16GB
硬盘:SSD 256GB

表结构

数据要求比较特殊,没有设置主键。

/*DDL 信息*/------------

CREATE TABLE `ads_cusprofile_et_info` (
  `dpt_dt` varchar(30) DEFAULT NULL COMMENT '航班日期',
  `oc_carr` varchar(20) DEFAULT NULL COMMENT 'OC承运人',
  `oc_flt_nbr` varchar(10) DEFAULT NULL COMMENT 'OC航班号',
  `dpt_airpt` varchar(100) DEFAULT NULL COMMENT '航段始发机场',
  `arrv_airpt` varchar(100) DEFAULT NULL COMMENT '航段到达机场',
  `book_oc_cab` varchar(10) DEFAULT NULL COMMENT 'oc订座母仓位',
  `iss_dt` varchar(20) DEFAULT NULL COMMENT '出票日期',
  `label_id` varchar(64) DEFAULT NULL COMMENT '标签标识',
  `label_val` varchar(100) DEFAULT NULL COMMENT '标签值',
  `user_num` varchar(10) DEFAULT NULL COMMENT '人数',
  `total_money` varchar(10) DEFAULT NULL COMMENT '总金额'
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COMMENT='ads et信息表'
mysql> desc ads_cusprofile_et_info;
+-------------+--------------+------+-----+---------+-------+
| Field       | Type         | Null | Key | Default | Extra |
+-------------+--------------+------+-----+---------+-------+
| dpt_dt      | varchar(30)  | YES  |     | NULL    |       |
| oc_carr     | varchar(20)  | YES  |     | NULL    |       |
| oc_flt_nbr  | varchar(10)  | YES  |     | NULL    |       |
| dpt_airpt   | varchar(100) | YES  |     | NULL    |       |
| arrv_airpt  | varchar(100) | YES  |     | NULL    |       |
| book_oc_cab | varchar(10)  | YES  |     | NULL    |       |
| iss_dt      | varchar(20)  | YES  |     | NULL    |       |
| label_id    | varchar(64)  | YES  |     | NULL    |       |
| label_val   | varchar(100) | YES  |     | NULL    |       |
| user_num    | varchar(10)  | YES  |     | NULL    |       |
| total_money | varchar(10)  | YES  |     | NULL    |       |
+-------------+--------------+------+-----+---------+-------+
11 rows in set (0.00 sec)

初始化表是没有设置索引的。

测试方法

sql语句

SELECT iss_dt AS date,SUM( user_num ) AS count FROM ads_cusprofile_et_info WHERE label_id = 'sex' AND iss_dt >= '20181228' AND iss_dt <= '20191228' GROUP BY iss_dt

mysqlslapy压测命令

  • 1线程,1请求。相当于只查一次数据库。
mysqlslap --concurrency=1 --create-schema=vi_db --query="SELECT iss_dt AS date,SUM( user_num ) AS count FROM ads_cusprofile_et_info WHERE label_id = 'sex' AND iss_dt >= '20181228' AND iss_dt <= '20191228' GROUP BY iss_dt" --number-of-queries=1  -uroot -proot
  • 10线程,10请求。分别10个线程,每个线程执行1次语句
mysqlslap --concurrency=10 --create-schema=vi_db --query="SELECT iss_dt AS date,SUM( user_num ) AS count FROM ads_cusprofile_et_info WHERE label_id = 'sex' AND iss_dt >= '20181228' AND iss_dt <= '20191228' GROUP BY iss_dt" --number-of-queries=10  -uroot -proot
  • 100线程,100请求。分别100个线程,每个线程执行1次语句
mysqlslap --concurrency=100 --create-schema=vi_db --query="SELECT iss_dt AS date,SUM( user_num ) AS count FROM ads_cusprofile_et_info WHERE label_id = 'sex' AND iss_dt >= '20181228' AND iss_dt <= '20191228' GROUP BY iss_dt" --number-of-queries=100  -uroot -proot
  • 10线程,100请求。分别10个线程,每个线程执行10次语句
mysqlslap --concurrency=10 --create-schema=vi_db --query="SELECT iss_dt AS date,SUM( user_num ) AS count FROM ads_cusprofile_et_info WHERE label_id = 'sex' AND iss_dt >= '20181228' AND iss_dt <= '20191228' GROUP BY iss_dt" --number-of-queries=100  -uroot -proot
  1. 通过优化前后分别执行mysqlslap压测出来的结果进行时间对比
  2. 通过EXPLAIN对sql语句进行性能分析。

开始测试

. 10万数据

往数据表ads_cusprofile_et_info插入10万数据。

mysql> select count(1) from ads_cusprofile_et_info;
+----------+
| count(1) |
+----------+
|   100000 |
+----------+
1 row in set (0.03 sec)

分别执行mysqlslap脚本

1线程,1请求

C:\Users\Hzy>mysqlslap --concurrency=1 --create-schema=vi_db --query="SELECT iss_dt AS date,SUM( user_num ) AS count FROM ads_cusprofile_et_info WHERE label_id = 'sex' AND iss_dt >= '20181228' AND iss_dt <= '20191228' GROUP BY iss_dt" --number-of-queries=1  -uroot -proot
mysqlslap: [Warning] Using a password on the command line interface can be insecure.
Benchmark
        Average number of seconds to run all queries: 0.046 seconds
        Minimum number of seconds to run all queries: 0.046 seconds
        Maximum number of seconds to run all queries: 0.046 seconds
        Number of clients running queries: 1
        Average number of queries per client: 1

单次请求时间0.046 seconds

10线程,10请求

C:\Users\Hzy>mysqlslap --concurrency=10 --create-schema=vi_db --query="SELECT iss_dt AS date,SUM( user_num ) AS count FROM ads_cusprofile_et_info WHERE label_id = 'sex' AND iss_dt >= '20181228' AND iss_dt <= '20191228' GROUP BY iss_dt" --number-of-queries=10  -uroot -proot
mysqlslap: [Warning] Using a password on the command line interface can be insecure.
Benchmark
        Average number of seconds to run all queries: 0.172 seconds
        Minimum number of seconds to run all queries: 0.172 seconds
        Maximum number of seconds to run all queries: 0.172 seconds
        Number of clients running queries: 10
        Average number of queries per client: 1

100线程,100请求

C:\Users\Hzy>mysqlslap --concurrency=100 --create-schema=vi_db --query="SELECT iss_dt AS date,SUM( user_num ) AS count FROM ads_cusprofile_et_info WHERE label_id = 'sex' AND iss_dt >= '20181228' AND iss_dt <= '20191228' GROUP BY iss_dt" --number-of-queries=100  -uroot -proot
mysqlslap: [Warning] Using a password on the command line interface can be insecure.
Benchmark
        Average number of seconds to run all queries: 3.594 seconds
        Minimum number of seconds to run all queries: 3.594 seconds
        Maximum number of seconds to run all queries: 3.594 seconds
        Number of clients running queries: 100
        Average number of queries per client: 1

10线程,100请求

C:\Users\Hzy>mysqlslap --concurrency=10 --create-schema=vi_db --query="SELECT iss_dt AS date,SUM( user_num ) AS count FROM ads_cusprofile_et_info WHERE label_id = 'sex' AND iss_dt >= '20181228' AND iss_dt <= '20191228' GROUP BY iss_dt" --number-of-queries=100  -uroot -proot
mysqlslap: [Warning] Using a password on the command line interface can be insecure.
Benchmark
        Average number of seconds to run all queries: 3.547 seconds
        Minimum number of seconds to run all queries: 3.547 seconds
        Maximum number of seconds to run all queries: 3.547 seconds
        Number of clients running queries: 10
        Average number of queries per client: 10

. 100万数据

往数据表ads_cusprofile_et_info插入100万数据。

mysql> select count(1) from ads_cusprofile_et_info;
+----------+
| count(1) |
+----------+
|  1000000 |
+----------+
1 row in set (0.31 sec)

分别执行mysqlslap脚本

1线程,1请求

C:\Users\Hzy>mysqlslap --concurrency=1 --create-schema=vi_db --query="SELECT iss_dt AS date,SUM( user_num ) AS count FROM ads_cusprofile_et_info WHERE label_id = 'sex' AND iss_dt >= '20181228' AND iss_dt <= '20191228' GROUP BY iss_dt" --number-of-queries=1  -uroot -proot
mysqlslap: [Warning] Using a password on the command line interface can be insecure.
Benchmark
        Average number of seconds to run all queries: 1.500 seconds
        Minimum number of seconds to run all queries: 1.500 seconds
        Maximum number of seconds to run all queries: 1.500 seconds
        Number of clients running queries: 1
        Average number of queries per client: 1

单次请求时间 1.500 seconds

10线程,10请求

C:\Users\Hzy>mysqlslap --concurrency=10 --create-schema=vi_db --query="SELECT iss_dt AS date,SUM( user_num ) AS count FROM ads_cusprofile_et_info WHERE label_id = 'sex' AND iss_dt >= '20181228' AND iss_dt <= '20191228' GROUP BY iss_dt" --number-of-queries=10  -uroot -proot
mysqlslap: [Warning] Using a password on the command line interface can be insecure.
Benchmark
        Average number of seconds to run all queries: 2.500 seconds
        Minimum number of seconds to run all queries: 2.500 seconds
        Maximum number of seconds to run all queries: 2.500 seconds
        Number of clients running queries: 10
        Average number of queries per client: 1

100线程,100请求

C:\Users\Hzy>mysqlslap --concurrency=100 --create-schema=vi_db --query="SELECT iss_dt AS date,SUM( user_num ) AS count FROM ads_cusprofile_et_info WHERE label_id = 'sex' AND iss_dt >= '20181228' AND iss_dt <= '20191228' GROUP BY iss_dt" --number-of-queries=100  -uroot -proot
mysqlslap: [Warning] Using a password on the command line interface can be insecure.
Benchmark
        Average number of seconds to run all queries: 38.641 seconds
        Minimum number of seconds to run all queries: 38.641 seconds
        Maximum number of seconds to run all queries: 38.641 seconds
        Number of clients running queries: 100
        Average number of queries per client: 1

10线程,100请求

C:\Users\Hzy>mysqlslap --concurrency=10 --create-schema=vi_db --query="SELECT iss_dt AS date,SUM( user_num ) AS count FROM ads_cusprofile_et_info WHERE label_id = 'sex' AND iss_dt >= '20181228' AND iss_dt <= '20191228' GROUP BY iss_dt" --number-of-queries=100  -uroot -proot
mysqlslap: [Warning] Using a password on the command line interface can be insecure.
Benchmark
        Average number of seconds to run all queries: 45.938 seconds
        Minimum number of seconds to run all queries: 45.938 seconds
        Maximum number of seconds to run all queries: 45.938 seconds
        Number of clients running queries: 10
        Average number of queries per client: 10

平均实行时间来看,100万已经明显比10万慢了很多

. 1000万数据

往数据表ads_cusprofile_et_info插入1000万数据。

mysql> select count(1) from ads_cusprofile_et_info;
+----------+
| count(1) |
+----------+
| 10000000 |
+----------+
1 row in set (4.20 sec)

分别执行mysqlslap脚本

1线程,1请求

C:\Users\Hzy>mysqlslap --concurrency=1 --create-schema=vi_db --query="SELECT iss_dt AS date,SUM( user_num ) AS count FROM ads_cusprofile_et_info WHERE label_id = 'sex' AND iss_dt >= '20181228' AND iss_dt <= '20191228' GROUP BY iss_dt" --number-of-queries=1  -uroot -proot
mysqlslap: [Warning] Using a password on the command line interface can be insecure.
Benchmark
        Average number of seconds to run all queries: 12.046 seconds
        Minimum number of seconds to run all queries: 12.046 seconds
        Maximum number of seconds to run all queries: 12.046 seconds
        Number of clients running queries: 1
        Average number of queries per client: 1

单次请求时间 12.046 seconds

10线程,10请求

C:\Users\Hzy>mysqlslap --concurrency=10 --create-schema=vi_db --query="SELECT iss_dt AS date,SUM( user_num ) AS count FROM ads_cusprofile_et_info WHERE label_id = 'sex' AND iss_dt >= '20181228' AND iss_dt <= '20191228' GROUP BY iss_dt" --number-of-queries=10  -uroot -proot
mysqlslap: [Warning] Using a password on the command line interface can be insecure.
Benchmark
        Average number of seconds to run all queries: 48.109 seconds
        Minimum number of seconds to run all queries: 48.109 seconds
        Maximum number of seconds to run all queries: 48.109 seconds
        Number of clients running queries: 10
        Average number of queries per client: 1

100线程,100请求

C:\Users\Hzy>mysqlslap --concurrency=100 --create-schema=vi_db --query="SELECT iss_dt AS date,SUM( user_num ) AS count FROM ads_cusprofile_et_info WHERE label_id = 'sex' AND iss_dt >= '20181228' AND iss_dt <= '20191228' GROUP BY iss_dt" --number-of-queries=100  -uroot -proot
mysqlslap: [Warning] Using a password on the command line interface can be insecure.
Benchmark
        Average number of seconds to run all queries: 523.782 seconds
        Minimum number of seconds to run all queries: 523.782 seconds
        Maximum number of seconds to run all queries: 523.782 seconds
        Number of clients running queries: 100
        Average number of queries per client: 1

523.782 seconds,100线程100请求下需要500多秒,CPU飚满。宕机状态了

10线程,100请求

C:\Users\Hzy>mysqlslap --concurrency=10 --create-schema=vi_db --query="SELECT iss_dt AS date,SUM( user_num ) AS count FROM ads_cusprofile_et_info WHERE label_id = 'sex' AND iss_dt >= '20181228' AND iss_dt <= '20191228' GROUP BY iss_dt" --number-of-queries=100  -uroot -proot
mysqlslap: [Warning] Using a password on the command line interface can be insecure.
Benchmark
        Average number of seconds to run all queries: 511.219 seconds
        Minimum number of seconds to run all queries: 511.219 seconds
        Maximum number of seconds to run all queries: 511.219 seconds
        Number of clients running queries: 10
        Average number of queries per client: 10

1000万数据下,基本没法进行了。

EXPLAIN 分析:

Explain 用法可以参考下面这篇博客:
https://segmentfault.com/a/1190000008131735

mysql> EXPLAIN SELECT iss_dt AS date,SUM( user_num ) AS count FROM ads_cusprofile_et_info WHERE label_id = 'sex' AND iss_dt >= '20181228' AND iss_dt <= '20191228' GROUP BY iss_dt;
+----+-------------+------------------------+------------+------+---------------+------+---------+------+---------+----------+----------------------------------------------+
| id | select_type | table                  | partitions | type | possible_keys | key  | key_len | ref  | rows    | filtered | Extra                                        |
+----+-------------+------------------------+------------+------+---------------+------+---------+------+---------+----------+----------------------------------------------+
|  1 | SIMPLE      | ads_cusprofile_et_info | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 9862965 |     1.11 | Using where; Using temporary; Using filesort |
+----+-------------+------------------------+------------+------+---------------+------+---------+------+---------+----------+----------------------------------------------+
1 row in set, 1 warning (0.00 sec)

分析结果:
type:ALL
就是全表扫描。
possible_keys 、key:NULL
说明没有用到索引。
rows :9862965
可能扫描的行数,接近1千万行。也说明全表扫描。(实际扫描行数有偏差)
filtered :1.11
只过滤了1.11%的数据。
Extra:Using where; Using temporary; Using filesort

  • Using temporary: 查询有使用临时表, 一般出现于排序, 分组和多表 join 的情况, 查询效率不高, 需要优化
  • Using filesort: 表示 MySQL 需额外的排序操作, 不能通过索引顺序达到排序效果. 一般有 Using filesort, 都建议优化去掉, 因为这样的查询 CPU 资源消耗大

开始优化:

  1. label_id字段添加索引。
ALTER TABLE `vi_db`.`ads_cusprofile_et_info` ADD KEY `INX_LABEL_ID` (`label_id`); 

再次查询分析:

C:\Users\Hzy>mysqlslap --concurrency=1 --create-schema=vi_db --query="SELECT iss_dt AS date,SUM( user_num ) AS count FROM ads_cusprofile_et_info WHERE label_id = 'sex' AND iss_dt >= '20181228' AND iss_dt <= '20191228' GROUP BY iss_dt" --number-of-queries=1  -uroot -proot
mysqlslap: [Warning] Using a password on the command line interface can be insecure.
Benchmark
        Average number of seconds to run all queries: 16.062 seconds
        Minimum number of seconds to run all queries: 16.062 seconds
        Maximum number of seconds to run all queries: 16.062 seconds
        Number of clients running queries: 1
        Average number of queries per client: 1

这时候发现查询耗时更长了,是因为没有用上索引?

其实不是的,正式因为走了索引,如果只select 索引的字段label_id,那速度是非常高的,但是如果select了非索引的其他字段(我这里是iss_dt,user_num),这些字段的的值不在索引的磁盘页上,所以需要回表查询这些非索引列的数据,因此会产生大量的随机io。这里虽然通过索引扫描的行数变少了,但是通过索引查找其他非索引列的值所产生的随机io将变得巨大,性能还不如全表扫描,因此这里时间边长的原因是回表查询非索引列随机io这个原因造成的。
试想一下,千万级别的表,索引过滤后如果还有百万的数据量,都回表随机io查数据,那得慢成啥样。

那是不是要将所有影响结果的字段都建立索引就可以解决问题?
也不是的,mysql一般情况下只会选择一个索引进行筛选行。原因请参考以下博客:

https://dbaplus.cn/news-155-1531-1.html

因此。要优化这类型亿级,千万级的表,可以参考以下要点:
  1. 建立联合索引,并且遵循最左原则,第一个索引位置放在第一个查询条件,并且第一个查询条件的值相比其他索引的值应该是最分散的。
  2. 所有需要select的字段都做索引,避免走索引后再回表查询其他字段,这里查出来都是至少百万的结果集,再回表io,消耗巨大。也就是直接从索引中获取结果集。
  3. 时间iss_dt字段修改为datetime或者int数值类型,user_num字段修改为int 数值类型。做> < 之类的条件查询或者sum之类的数值类操作效率更高,避免mysql进行数据类型转换。

添加联合索引

ALTER TABLE `vi_db`.`ads_cusprofile_et_info` ADD INDEX `IDX_LABEL` (`label_val`, `label_id`, `iss_dt`, `user_num`, `total_money`); 

加完以后表结构是这样子的

/*DDL 信息*/------------

CREATE TABLE `ads_cusprofile_et_info` (
  `dpt_dt` varchar(30) DEFAULT NULL COMMENT '航班日期',
  `oc_carr` varchar(20) DEFAULT NULL COMMENT 'OC承运人',
  `oc_flt_nbr` varchar(10) DEFAULT NULL COMMENT 'OC航班号',
  `dpt_airpt` varchar(100) DEFAULT NULL COMMENT '航段始发机场',
  `arrv_airpt` varchar(100) DEFAULT NULL COMMENT '航段到达机场',
  `book_oc_cab` varchar(10) DEFAULT NULL COMMENT 'oc订座母仓位',
  `iss_dt` int(20) DEFAULT NULL COMMENT '出票日期',
  `label_id` varchar(64) DEFAULT NULL COMMENT '标签标识',
  `label_val` varchar(100) DEFAULT NULL COMMENT '标签值',
  `user_num` int(10) DEFAULT NULL COMMENT '人数',
  `total_money` varchar(10) DEFAULT NULL COMMENT '总金额',
  KEY `IDX_LABEL` (`label_val`,`label_id`,`iss_dt`,`user_num`,`total_money`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COMMENT='ads et信息表'

sql语句改为

SELECT iss_dt AS date,SUM( user_num ) AS count FROM ads_cusprofile_et_info WHERE  label_val='女' and  label_id = 'sex' AND iss_dt >= 20181228 AND iss_dt <= 20191228 GROUP BY iss_dt

表数据量加到4000万。

mysql> select count(1) from ads_cusprofile_et_info;
+----------+
| count(1) |
+----------+
| 40000000 |
+----------+
1 row in set (19.97 sec)
C:\Users\Hzy>mysqlslap --concurrency=100 --create-schema=vi_db --query="SELECT iss_dt AS date,SUM( user_num ) AS count FROM ads_cusprofile_et_info WHERE  label_val='女' and  label_id = 'sex' AND iss_dt >= 20181228 AND iss_dt <= 20191228 GROUP BY iss_dt" --number-of-queries=1000  -uroot -proot
mysqlslap: [Warning] Using a password on the command line interface can be insecure.
Benchmark
        Average number of seconds to run all queries: 0.141 seconds
        Minimum number of seconds to run all queries: 0.141 seconds
        Maximum number of seconds to run all queries: 0.141 seconds
        Number of clients running queries: 100
        Average number of queries per client: 10

0.141 seconds

可以看到,100并发,1000请求的查询都变得超级快,4000万数据也是毫秒级返回。

直接执行sql

mysql> SELECT iss_dt AS date,SUM( user_num ) AS count FROM ads_cusprofile_et_info WHERE  label_val='女' and  label_id = 'sex' AND iss_dt >= 20181228 AND iss_dt <= 20191228 GROUP BY iss_dt;
+----------+---------+
| date     | count   |
+----------+---------+
| 20181228 | 4000784 |
| 20181229 | 4915056 |
| 20181230 |     512 |
| 20190110 |  283812 |
| 20190112 |  495144 |
| 20191121 |  182784 |
| 20191122 |  251136 |
| 20191123 |  181248 |
| 20191124 |  197376 |
| 20191125 |  224256 |
| 20191126 |  222720 |
| 20191127 |  152832 |
| 20191128 |  176640 |
| 20191129 |  231936 |
| 20191201 |  157440 |
| 20191223 |    5376 |
| 20191225 | 1016064 |
| 20191226 | 1201920 |
| 20191227 | 1099008 |
| 20191228 | 1094400 |
+----------+---------+
20 rows in set (0.96 sec)

也是毫秒级别返回。

EXPLAIN 分析

mysql> explain SELECT iss_dt AS date,SUM( user_num ) AS count FROM ads_cusprofile_et_info WHERE  label_val='女' and  label_id = 'sex' AND iss_dt >= 20181228 AND iss_dt <= 20191228 GROUP BY iss_dt;
+----+-------------+------------------------+------------+-------+---------------+-----------+---------+------+---------+----------+--------------------------+
| id | select_type | table                  | partitions | type  | possible_keys | key       | key_len | ref  | rows    | filtered | Extra                    |
+----+-------------+------------------------+------------+-------+---------------+-----------+---------+------+---------+----------+--------------------------+
|  1 | SIMPLE      | ads_cusprofile_et_info | NULL       | range | IDX_LABEL     | IDX_LABEL | 667     | NULL | 2853312 |   100.00 | Using where; Using index |
+----+-------------+------------------------+------------+-------+---------------+-----------+---------+------+---------+----------+--------------------------+
1 row in set, 1 warning (0.00 sec)

key 显示使用了索引IDX_LABEL
Extra 显示使用了索引,并且没有生成临时表,没有filesort。

因为是在自己的开发电脑上测试调优的,真正到了生产环境,linux系统和生产机器性能会更好,查询速度会更快,优化完以后千万级别甚至亿级别的数据毫秒级返回应该是没问题的。算是优化成功了

写的比较糙,如有错漏,欢迎指出。

  • 0
    点赞
  • 18
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值