前言
公司的一个项目,单表数据量过千万级甚至过亿,刚上线就遇到了性能问题。刚好可以优化一下,记录一下优化过程。
本文使用mysqlslap进行压测,对此工具不熟悉的先看我的另外一篇文章
https://blog.csdn.net/sishenhzy/article/details/104148892
环境
在自己的工作电脑上进行测试
数据库:mysql 5.7
压测工具:mysql自带压测工具mysqlslap
CPU:i5-8250U 4核心8线程
内存:16GB
硬盘:SSD 256GB
表结构
数据要求比较特殊,没有设置主键。
/*DDL 信息*/------------
CREATE TABLE `ads_cusprofile_et_info` (
`dpt_dt` varchar(30) DEFAULT NULL COMMENT '航班日期',
`oc_carr` varchar(20) DEFAULT NULL COMMENT 'OC承运人',
`oc_flt_nbr` varchar(10) DEFAULT NULL COMMENT 'OC航班号',
`dpt_airpt` varchar(100) DEFAULT NULL COMMENT '航段始发机场',
`arrv_airpt` varchar(100) DEFAULT NULL COMMENT '航段到达机场',
`book_oc_cab` varchar(10) DEFAULT NULL COMMENT 'oc订座母仓位',
`iss_dt` varchar(20) DEFAULT NULL COMMENT '出票日期',
`label_id` varchar(64) DEFAULT NULL COMMENT '标签标识',
`label_val` varchar(100) DEFAULT NULL COMMENT '标签值',
`user_num` varchar(10) DEFAULT NULL COMMENT '人数',
`total_money` varchar(10) DEFAULT NULL COMMENT '总金额'
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COMMENT='ads et信息表'
mysql> desc ads_cusprofile_et_info;
+-------------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------------+--------------+------+-----+---------+-------+
| dpt_dt | varchar(30) | YES | | NULL | |
| oc_carr | varchar(20) | YES | | NULL | |
| oc_flt_nbr | varchar(10) | YES | | NULL | |
| dpt_airpt | varchar(100) | YES | | NULL | |
| arrv_airpt | varchar(100) | YES | | NULL | |
| book_oc_cab | varchar(10) | YES | | NULL | |
| iss_dt | varchar(20) | YES | | NULL | |
| label_id | varchar(64) | YES | | NULL | |
| label_val | varchar(100) | YES | | NULL | |
| user_num | varchar(10) | YES | | NULL | |
| total_money | varchar(10) | YES | | NULL | |
+-------------+--------------+------+-----+---------+-------+
11 rows in set (0.00 sec)
初始化表是没有设置索引的。
测试方法
sql语句
SELECT iss_dt AS date,SUM( user_num ) AS count FROM ads_cusprofile_et_info WHERE label_id = 'sex' AND iss_dt >= '20181228' AND iss_dt <= '20191228' GROUP BY iss_dt
mysqlslapy压测命令
- 1线程,1请求。相当于只查一次数据库。
mysqlslap --concurrency=1 --create-schema=vi_db --query="SELECT iss_dt AS date,SUM( user_num ) AS count FROM ads_cusprofile_et_info WHERE label_id = 'sex' AND iss_dt >= '20181228' AND iss_dt <= '20191228' GROUP BY iss_dt" --number-of-queries=1 -uroot -proot
- 10线程,10请求。分别10个线程,每个线程执行1次语句
mysqlslap --concurrency=10 --create-schema=vi_db --query="SELECT iss_dt AS date,SUM( user_num ) AS count FROM ads_cusprofile_et_info WHERE label_id = 'sex' AND iss_dt >= '20181228' AND iss_dt <= '20191228' GROUP BY iss_dt" --number-of-queries=10 -uroot -proot
- 100线程,100请求。分别100个线程,每个线程执行1次语句
mysqlslap --concurrency=100 --create-schema=vi_db --query="SELECT iss_dt AS date,SUM( user_num ) AS count FROM ads_cusprofile_et_info WHERE label_id = 'sex' AND iss_dt >= '20181228' AND iss_dt <= '20191228' GROUP BY iss_dt" --number-of-queries=100 -uroot -proot
- 10线程,100请求。分别10个线程,每个线程执行10次语句
mysqlslap --concurrency=10 --create-schema=vi_db --query="SELECT iss_dt AS date,SUM( user_num ) AS count FROM ads_cusprofile_et_info WHERE label_id = 'sex' AND iss_dt >= '20181228' AND iss_dt <= '20191228' GROUP BY iss_dt" --number-of-queries=100 -uroot -proot
- 通过优化前后分别执行mysqlslap压测出来的结果进行时间对比
- 通过
EXPLAIN
对sql语句进行性能分析。
开始测试
. 10万数据
往数据表ads_cusprofile_et_info插入10万数据。
mysql> select count(1) from ads_cusprofile_et_info;
+----------+
| count(1) |
+----------+
| 100000 |
+----------+
1 row in set (0.03 sec)
分别执行mysqlslap脚本
1线程,1请求
C:\Users\Hzy>mysqlslap --concurrency=1 --create-schema=vi_db --query="SELECT iss_dt AS date,SUM( user_num ) AS count FROM ads_cusprofile_et_info WHERE label_id = 'sex' AND iss_dt >= '20181228' AND iss_dt <= '20191228' GROUP BY iss_dt" --number-of-queries=1 -uroot -proot
mysqlslap: [Warning] Using a password on the command line interface can be insecure.
Benchmark
Average number of seconds to run all queries: 0.046 seconds
Minimum number of seconds to run all queries: 0.046 seconds
Maximum number of seconds to run all queries: 0.046 seconds
Number of clients running queries: 1
Average number of queries per client: 1
单次请求时间0.046 seconds
10线程,10请求
C:\Users\Hzy>mysqlslap --concurrency=10 --create-schema=vi_db --query="SELECT iss_dt AS date,SUM( user_num ) AS count FROM ads_cusprofile_et_info WHERE label_id = 'sex' AND iss_dt >= '20181228' AND iss_dt <= '20191228' GROUP BY iss_dt" --number-of-queries=10 -uroot -proot
mysqlslap: [Warning] Using a password on the command line interface can be insecure.
Benchmark
Average number of seconds to run all queries: 0.172 seconds
Minimum number of seconds to run all queries: 0.172 seconds
Maximum number of seconds to run all queries: 0.172 seconds
Number of clients running queries: 10
Average number of queries per client: 1
100线程,100请求
C:\Users\Hzy>mysqlslap --concurrency=100 --create-schema=vi_db --query="SELECT iss_dt AS date,SUM( user_num ) AS count FROM ads_cusprofile_et_info WHERE label_id = 'sex' AND iss_dt >= '20181228' AND iss_dt <= '20191228' GROUP BY iss_dt" --number-of-queries=100 -uroot -proot
mysqlslap: [Warning] Using a password on the command line interface can be insecure.
Benchmark
Average number of seconds to run all queries: 3.594 seconds
Minimum number of seconds to run all queries: 3.594 seconds
Maximum number of seconds to run all queries: 3.594 seconds
Number of clients running queries: 100
Average number of queries per client: 1
10线程,100请求
C:\Users\Hzy>mysqlslap --concurrency=10 --create-schema=vi_db --query="SELECT iss_dt AS date,SUM( user_num ) AS count FROM ads_cusprofile_et_info WHERE label_id = 'sex' AND iss_dt >= '20181228' AND iss_dt <= '20191228' GROUP BY iss_dt" --number-of-queries=100 -uroot -proot
mysqlslap: [Warning] Using a password on the command line interface can be insecure.
Benchmark
Average number of seconds to run all queries: 3.547 seconds
Minimum number of seconds to run all queries: 3.547 seconds
Maximum number of seconds to run all queries: 3.547 seconds
Number of clients running queries: 10
Average number of queries per client: 10
. 100万数据
往数据表ads_cusprofile_et_info插入100万数据。
mysql> select count(1) from ads_cusprofile_et_info;
+----------+
| count(1) |
+----------+
| 1000000 |
+----------+
1 row in set (0.31 sec)
分别执行mysqlslap脚本
1线程,1请求
C:\Users\Hzy>mysqlslap --concurrency=1 --create-schema=vi_db --query="SELECT iss_dt AS date,SUM( user_num ) AS count FROM ads_cusprofile_et_info WHERE label_id = 'sex' AND iss_dt >= '20181228' AND iss_dt <= '20191228' GROUP BY iss_dt" --number-of-queries=1 -uroot -proot
mysqlslap: [Warning] Using a password on the command line interface can be insecure.
Benchmark
Average number of seconds to run all queries: 1.500 seconds
Minimum number of seconds to run all queries: 1.500 seconds
Maximum number of seconds to run all queries: 1.500 seconds
Number of clients running queries: 1
Average number of queries per client: 1
单次请求时间 1.500 seconds
10线程,10请求
C:\Users\Hzy>mysqlslap --concurrency=10 --create-schema=vi_db --query="SELECT iss_dt AS date,SUM( user_num ) AS count FROM ads_cusprofile_et_info WHERE label_id = 'sex' AND iss_dt >= '20181228' AND iss_dt <= '20191228' GROUP BY iss_dt" --number-of-queries=10 -uroot -proot
mysqlslap: [Warning] Using a password on the command line interface can be insecure.
Benchmark
Average number of seconds to run all queries: 2.500 seconds
Minimum number of seconds to run all queries: 2.500 seconds
Maximum number of seconds to run all queries: 2.500 seconds
Number of clients running queries: 10
Average number of queries per client: 1
100线程,100请求
C:\Users\Hzy>mysqlslap --concurrency=100 --create-schema=vi_db --query="SELECT iss_dt AS date,SUM( user_num ) AS count FROM ads_cusprofile_et_info WHERE label_id = 'sex' AND iss_dt >= '20181228' AND iss_dt <= '20191228' GROUP BY iss_dt" --number-of-queries=100 -uroot -proot
mysqlslap: [Warning] Using a password on the command line interface can be insecure.
Benchmark
Average number of seconds to run all queries: 38.641 seconds
Minimum number of seconds to run all queries: 38.641 seconds
Maximum number of seconds to run all queries: 38.641 seconds
Number of clients running queries: 100
Average number of queries per client: 1
10线程,100请求
C:\Users\Hzy>mysqlslap --concurrency=10 --create-schema=vi_db --query="SELECT iss_dt AS date,SUM( user_num ) AS count FROM ads_cusprofile_et_info WHERE label_id = 'sex' AND iss_dt >= '20181228' AND iss_dt <= '20191228' GROUP BY iss_dt" --number-of-queries=100 -uroot -proot
mysqlslap: [Warning] Using a password on the command line interface can be insecure.
Benchmark
Average number of seconds to run all queries: 45.938 seconds
Minimum number of seconds to run all queries: 45.938 seconds
Maximum number of seconds to run all queries: 45.938 seconds
Number of clients running queries: 10
Average number of queries per client: 10
平均实行时间来看,100万已经明显比10万慢了很多
. 1000万数据
往数据表ads_cusprofile_et_info插入1000万数据。
mysql> select count(1) from ads_cusprofile_et_info;
+----------+
| count(1) |
+----------+
| 10000000 |
+----------+
1 row in set (4.20 sec)
分别执行mysqlslap脚本
1线程,1请求
C:\Users\Hzy>mysqlslap --concurrency=1 --create-schema=vi_db --query="SELECT iss_dt AS date,SUM( user_num ) AS count FROM ads_cusprofile_et_info WHERE label_id = 'sex' AND iss_dt >= '20181228' AND iss_dt <= '20191228' GROUP BY iss_dt" --number-of-queries=1 -uroot -proot
mysqlslap: [Warning] Using a password on the command line interface can be insecure.
Benchmark
Average number of seconds to run all queries: 12.046 seconds
Minimum number of seconds to run all queries: 12.046 seconds
Maximum number of seconds to run all queries: 12.046 seconds
Number of clients running queries: 1
Average number of queries per client: 1
单次请求时间 12.046 seconds
10线程,10请求
C:\Users\Hzy>mysqlslap --concurrency=10 --create-schema=vi_db --query="SELECT iss_dt AS date,SUM( user_num ) AS count FROM ads_cusprofile_et_info WHERE label_id = 'sex' AND iss_dt >= '20181228' AND iss_dt <= '20191228' GROUP BY iss_dt" --number-of-queries=10 -uroot -proot
mysqlslap: [Warning] Using a password on the command line interface can be insecure.
Benchmark
Average number of seconds to run all queries: 48.109 seconds
Minimum number of seconds to run all queries: 48.109 seconds
Maximum number of seconds to run all queries: 48.109 seconds
Number of clients running queries: 10
Average number of queries per client: 1
100线程,100请求
C:\Users\Hzy>mysqlslap --concurrency=100 --create-schema=vi_db --query="SELECT iss_dt AS date,SUM( user_num ) AS count FROM ads_cusprofile_et_info WHERE label_id = 'sex' AND iss_dt >= '20181228' AND iss_dt <= '20191228' GROUP BY iss_dt" --number-of-queries=100 -uroot -proot
mysqlslap: [Warning] Using a password on the command line interface can be insecure.
Benchmark
Average number of seconds to run all queries: 523.782 seconds
Minimum number of seconds to run all queries: 523.782 seconds
Maximum number of seconds to run all queries: 523.782 seconds
Number of clients running queries: 100
Average number of queries per client: 1
523.782 seconds,100线程100请求下需要500多秒,CPU飚满。宕机状态了
10线程,100请求
C:\Users\Hzy>mysqlslap --concurrency=10 --create-schema=vi_db --query="SELECT iss_dt AS date,SUM( user_num ) AS count FROM ads_cusprofile_et_info WHERE label_id = 'sex' AND iss_dt >= '20181228' AND iss_dt <= '20191228' GROUP BY iss_dt" --number-of-queries=100 -uroot -proot
mysqlslap: [Warning] Using a password on the command line interface can be insecure.
Benchmark
Average number of seconds to run all queries: 511.219 seconds
Minimum number of seconds to run all queries: 511.219 seconds
Maximum number of seconds to run all queries: 511.219 seconds
Number of clients running queries: 10
Average number of queries per client: 10
1000万数据下,基本没法进行了。
EXPLAIN 分析:
Explain 用法可以参考下面这篇博客:
https://segmentfault.com/a/1190000008131735
mysql> EXPLAIN SELECT iss_dt AS date,SUM( user_num ) AS count FROM ads_cusprofile_et_info WHERE label_id = 'sex' AND iss_dt >= '20181228' AND iss_dt <= '20191228' GROUP BY iss_dt;
+----+-------------+------------------------+------------+------+---------------+------+---------+------+---------+----------+----------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+------------------------+------------+------+---------------+------+---------+------+---------+----------+----------------------------------------------+
| 1 | SIMPLE | ads_cusprofile_et_info | NULL | ALL | NULL | NULL | NULL | NULL | 9862965 | 1.11 | Using where; Using temporary; Using filesort |
+----+-------------+------------------------+------------+------+---------------+------+---------+------+---------+----------+----------------------------------------------+
1 row in set, 1 warning (0.00 sec)
分析结果:
type:ALL
就是全表扫描。
possible_keys 、key:NULL
说明没有用到索引。
rows :9862965
可能扫描的行数,接近1千万行。也说明全表扫描。(实际扫描行数有偏差)
filtered :1.11
只过滤了1.11%的数据。
Extra:Using where; Using temporary; Using filesort
- Using temporary: 查询有使用临时表, 一般出现于排序, 分组和多表 join 的情况, 查询效率不高, 需要优化
- Using filesort: 表示 MySQL 需额外的排序操作, 不能通过索引顺序达到排序效果. 一般有 Using filesort, 都建议优化去掉, 因为这样的查询 CPU 资源消耗大
开始优化:
- label_id字段添加索引。
ALTER TABLE `vi_db`.`ads_cusprofile_et_info` ADD KEY `INX_LABEL_ID` (`label_id`);
再次查询分析:
C:\Users\Hzy>mysqlslap --concurrency=1 --create-schema=vi_db --query="SELECT iss_dt AS date,SUM( user_num ) AS count FROM ads_cusprofile_et_info WHERE label_id = 'sex' AND iss_dt >= '20181228' AND iss_dt <= '20191228' GROUP BY iss_dt" --number-of-queries=1 -uroot -proot
mysqlslap: [Warning] Using a password on the command line interface can be insecure.
Benchmark
Average number of seconds to run all queries: 16.062 seconds
Minimum number of seconds to run all queries: 16.062 seconds
Maximum number of seconds to run all queries: 16.062 seconds
Number of clients running queries: 1
Average number of queries per client: 1
这时候发现查询耗时更长了,是因为没有用上索引?
其实不是的,正式因为走了索引,如果只select 索引的字段label_id,那速度是非常高的,但是如果select了非索引的其他字段(我这里是iss_dt,user_num),这些字段的的值不在索引的磁盘页上,所以需要回表查询这些非索引列的数据,因此会产生大量的随机io。这里虽然通过索引扫描的行数变少了,但是通过索引查找其他非索引列的值所产生的随机io将变得巨大,性能还不如全表扫描,因此这里时间边长的原因是
回表查询非索引列随机io
这个原因造成的。
试想一下,千万级别的表,索引过滤后如果还有百万的数据量,都回表随机io查数据,那得慢成啥样。
那是不是要将所有影响结果的字段都建立索引就可以解决问题?
也不是的,mysql一般情况下只会选择一个索引进行筛选行。原因请参考以下博客:
因此。要优化这类型亿级,千万级的表,可以参考以下要点:
- 建立联合索引,并且遵循最左原则,第一个索引位置放在第一个查询条件,并且第一个查询条件的值相比其他索引的值应该是最分散的。
- 所有需要select的字段都做索引,避免走索引后再回表查询其他字段,这里查出来都是至少百万的结果集,再回表io,消耗巨大。也就是直接从索引中获取结果集。
- 时间iss_dt字段修改为datetime或者int数值类型,user_num字段修改为int 数值类型。做> < 之类的条件查询或者sum之类的数值类操作效率更高,避免mysql进行数据类型转换。
添加联合索引
ALTER TABLE `vi_db`.`ads_cusprofile_et_info` ADD INDEX `IDX_LABEL` (`label_val`, `label_id`, `iss_dt`, `user_num`, `total_money`);
加完以后表结构是这样子的
/*DDL 信息*/------------
CREATE TABLE `ads_cusprofile_et_info` (
`dpt_dt` varchar(30) DEFAULT NULL COMMENT '航班日期',
`oc_carr` varchar(20) DEFAULT NULL COMMENT 'OC承运人',
`oc_flt_nbr` varchar(10) DEFAULT NULL COMMENT 'OC航班号',
`dpt_airpt` varchar(100) DEFAULT NULL COMMENT '航段始发机场',
`arrv_airpt` varchar(100) DEFAULT NULL COMMENT '航段到达机场',
`book_oc_cab` varchar(10) DEFAULT NULL COMMENT 'oc订座母仓位',
`iss_dt` int(20) DEFAULT NULL COMMENT '出票日期',
`label_id` varchar(64) DEFAULT NULL COMMENT '标签标识',
`label_val` varchar(100) DEFAULT NULL COMMENT '标签值',
`user_num` int(10) DEFAULT NULL COMMENT '人数',
`total_money` varchar(10) DEFAULT NULL COMMENT '总金额',
KEY `IDX_LABEL` (`label_val`,`label_id`,`iss_dt`,`user_num`,`total_money`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COMMENT='ads et信息表'
sql语句改为
SELECT iss_dt AS date,SUM( user_num ) AS count FROM ads_cusprofile_et_info WHERE label_val='女' and label_id = 'sex' AND iss_dt >= 20181228 AND iss_dt <= 20191228 GROUP BY iss_dt
表数据量加到4000万。
mysql> select count(1) from ads_cusprofile_et_info;
+----------+
| count(1) |
+----------+
| 40000000 |
+----------+
1 row in set (19.97 sec)
C:\Users\Hzy>mysqlslap --concurrency=100 --create-schema=vi_db --query="SELECT iss_dt AS date,SUM( user_num ) AS count FROM ads_cusprofile_et_info WHERE label_val='女' and label_id = 'sex' AND iss_dt >= 20181228 AND iss_dt <= 20191228 GROUP BY iss_dt" --number-of-queries=1000 -uroot -proot
mysqlslap: [Warning] Using a password on the command line interface can be insecure.
Benchmark
Average number of seconds to run all queries: 0.141 seconds
Minimum number of seconds to run all queries: 0.141 seconds
Maximum number of seconds to run all queries: 0.141 seconds
Number of clients running queries: 100
Average number of queries per client: 10
0.141 seconds
可以看到,100并发,1000请求的查询都变得超级快,4000万数据也是毫秒级返回。
直接执行sql
mysql> SELECT iss_dt AS date,SUM( user_num ) AS count FROM ads_cusprofile_et_info WHERE label_val='女' and label_id = 'sex' AND iss_dt >= 20181228 AND iss_dt <= 20191228 GROUP BY iss_dt;
+----------+---------+
| date | count |
+----------+---------+
| 20181228 | 4000784 |
| 20181229 | 4915056 |
| 20181230 | 512 |
| 20190110 | 283812 |
| 20190112 | 495144 |
| 20191121 | 182784 |
| 20191122 | 251136 |
| 20191123 | 181248 |
| 20191124 | 197376 |
| 20191125 | 224256 |
| 20191126 | 222720 |
| 20191127 | 152832 |
| 20191128 | 176640 |
| 20191129 | 231936 |
| 20191201 | 157440 |
| 20191223 | 5376 |
| 20191225 | 1016064 |
| 20191226 | 1201920 |
| 20191227 | 1099008 |
| 20191228 | 1094400 |
+----------+---------+
20 rows in set (0.96 sec)
也是毫秒级别返回。
EXPLAIN 分析
mysql> explain SELECT iss_dt AS date,SUM( user_num ) AS count FROM ads_cusprofile_et_info WHERE label_val='女' and label_id = 'sex' AND iss_dt >= 20181228 AND iss_dt <= 20191228 GROUP BY iss_dt;
+----+-------------+------------------------+------------+-------+---------------+-----------+---------+------+---------+----------+--------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+------------------------+------------+-------+---------------+-----------+---------+------+---------+----------+--------------------------+
| 1 | SIMPLE | ads_cusprofile_et_info | NULL | range | IDX_LABEL | IDX_LABEL | 667 | NULL | 2853312 | 100.00 | Using where; Using index |
+----+-------------+------------------------+------------+-------+---------------+-----------+---------+------+---------+----------+--------------------------+
1 row in set, 1 warning (0.00 sec)
key 显示使用了索引IDX_LABEL
Extra 显示使用了索引,并且没有生成临时表,没有filesort。
因为是在自己的开发电脑上测试调优的,真正到了生产环境,linux系统和生产机器性能会更好,查询速度会更快,优化完以后千万级别甚至亿级别的数据毫秒级返回应该是没问题的。算是优化成功了
写的比较糙,如有错漏,欢迎指出。