优化SQL：提升大数据时代性能瓶颈的关键策略-CSDN博客

本文链接：https://blog.csdn.net/xiaomengxin521/article/details/121707215

为什么要优化sql

随着生产环境数据量的急剧增长，很多sql开发逐渐出现性能问题，对生产的影响越来越大，sql语句成为整个系统的瓶颈

查看sql执行频率

show global status 'Com_______'查看全局的执行频率
show global status like ‘Innodb_rows_%’

定位低效率执行sql

1、开启慢查询日志

2、show processlist

查看当前mysql在进行的线程，包括线程状态，是否锁表等，可以实时地查看sql的执行情况同时对一些锁表操作进行优化
各列的含义和用途
	id列：一个标识，kill的时候很有用
	User列：显示当前用户，如果不是root这个命令只显示你权限范围内的sql语句
	Host列：显示这个语句是从那个ip的哪个端口上发出的，可以用来追踪出问题语句的用户
	db列：显示这个进程连接的是哪个数据库
	Command列：显示当前连接的执行命令，一般就是休眠（sleep），查询（query），连接（connect）
	time列：此这个状态持续的时间单位是秒
	state：只是一个执行中的状态
	info：显示这个sql语句，因为长度有限，一般显示不全，但是一个判断问题语句的重要依据

3、show profile分析sql

查看系统是否支持profiling：
	select @@have_profiling
查看系统是否开启profiling：
	select @@profiling
查询查询语句总耗时：
	show profiles
查看具体的查询占用时间：
	show profile for query 191（查询sql语句id）;
	-- sending data：从查询开始到数据返回到客户端

执行计划

查看执行计划 explain select * from emp;
在这里插入图片描述

id

sql语句执行的顺序
1、如果id相同，那么执行顺序从上到下
2、如果id不同，如果是子查询，id的序号会递增，id值越大优先级越高，越先被执行
3、id相同和不同的，同时存在：
	相同的可以认为是一组，从上往下顺序执行，在所有组中，id值越大，优先级越高，越先执行

select_type

区分sql的查询类型
sample:简单的查询，不包含子查询和union
	explain select * from emp;
	explain select id,name from t_user where id in (select id from t_user where id = 1);
primary:查询中若包含任何复杂的子查询，最外层查询则被标记为Primary
	explain select staname,ename supname from (select ename staname,mgr from emp) t join emp on t.mgr=emp.empno ;
union:若第二个select出现在union之后，则被标记为union
dependent union:跟union类似，此处的depentent表示union或union all联合而成的结果会受外部表影响
union result:从union表获取结果的select
	explain select id,name from t_user where id = 1 union select id,name from t_user where id = 2;
subquery:在select或者where列表中包含子查询
	explain select t1.id,(select tt.name from t_user tt where t1.id = tt.id) name from t_user t1;
	explain select * from t_user where id > (select min(id) from t_user);
dependent subquery:subquery的子查询要受到外部表查询的影响
	explain select * from emp e where e.deptno in (select distinct deptno from dept);
DERIVED: from子句中出现的子查询，也叫做派生类
UNCACHEABLE SUBQUERY：表示使用子查询的结果不能被缓存

table

表示sql正在访问哪一个表，表名或者别名，可能是临时表或者union合并结果集
1、如果是具体表名，则表明从实际的物理表中获取数据
2、表名是derived的形式，表示使用了id为N的查询产生的衍生表
3、当有union result的时候，表名是union n1,n2等形式，n1,n2表示参与union的id

type

type显示的是访问类型，访问类型表示用何种方式去访问我们的数据；
效率从高到低：
	system > const > eq_ref > ref > fulltext > ref_or_null > index_merge > unique_subquery > index_subquery > range > index > ALL 

all:全表扫描，一般情况下出现这样的sql语句而且数据量比较大的话那么就需要进行优化
		explain select id from t_user;
index: 全索引扫描这个比all的效率要好，主要有两种情况，一种是当前的查询时覆盖索引，即我们需要的数据在索引中就可以索取，或者是使用了索引进行排序，这样就避免数据的重排序
		explain select id from t_user;
range: 表示利用索引查询的时候限制了范围，在指定范围内进行查询，这样避免了index的全索引扫描，适用的操作符： =, <>, >, >=, <, <=, IS NULL, BETWEEN, LIKE, or IN() 
		explain select * from t_user where id between 2 and 3;
index_subquery: 利用索引来关联子查询，不再扫描全表
unique_subquery: 该连接类型类似与index_subquery,使用的是唯一索引
index_merge: 在查询过程中需要多个索引组合使用
ref_or_null: 对于某个字段即需要关联条件，也需要null值的情况下，查询优化器会选择这种访问方式
ref: 使用了非唯一性索引进行数据的查找
		explain select * from t_user where sex = 1;
eq_ref: 使用唯一性索引进行数据查找
		explain select * from emp,emp2 where emp.empno = emp2.empno;
const: 这个表至多有一个匹配行，
		explain select * from t_user where id = 1;
system：表只有一行记录（等于系统表），这是const类型的特例，平时不会出现		
INSERT INTO runoob_tbl (runoob_title, runoob_author, submission_date) VALUES ("学习 PHP", "菜鸟教程", NOW());

possible_key

可能使用到的索引

key

properties 实际使用的索引

key_len

properties 索引长度：越短效率越高

rows

可能扫描的行数，越少效率越高

ref

显示哪列的索引被使用了

extra

额外的信息
using filesort: 
	说明mysql无法利用索引进行排序，只能利用排序算法进行排序，会消耗额外的位置
using temporary: 
	建立临时表来保存中间结果，查询完成之后把临时表删除
using index: 
	这个表示当前的查询时覆盖索引的，直接从索引中读取数据，而不用访问数据表
	explain select name,count(1) from t_user where sex = 1 group by name;
using where: 
	使用where进行条件过滤
impossible where：
	where语句的结果总是false

通过索引优化

索引匹配方式

1、全值匹配: 全值匹配指的是和索引中的所有列进行匹配
	explain select * from staffs where name = 'July' and age = '23' and pos = 'dev';
2、匹配最左前缀: 只匹配前面的几列
	explain select * from staffs where name = 'July' and age = '23';
	explain select * from staffs where name = 'July' and age = '23';
3、匹配前缀: 可以匹配某一列的值的开头部分
	explain select * from staffs where name like 'J%';
4、匹配范围: 可以查找某一个范围的数据
	explain select * from t_user where id > 2;
5、精确匹配某一列并范围匹配另外一列: 可以查询第一列的全部和第二列的部分
	explain select * from t_user where sex = 1 and age > 10;
6、只访问索引的查询: 只访问索引，不需要访问数据行，本质上就是覆盖索引
	explain select name,sex,age from t_user where sex = 1 and age = 10;

组合索引

当包含多个列作为索引，需要注意的是正确的顺序依赖于该索引的查询，同时需要考虑如果更好的满足排序和分组的需要建立a,b,c索引，不同SQL语句使用索引情况
where a=3 							只使用了a
where a=3 and b=5					使用a，b
where a=3 and b=5 and c=4			使用了a，b，c
where b=3 or where c=4				否
where a=3 and c=4					仅使用a
where a=3 and b>10 and c=7			使用了a，b
where a=3 and b like '%xx%' and c=7 使用了a

SQL优化

关联查询

三种关联查询
1、Simple Nested-Loop Join 简单的关联查询，类似直接排序for内嵌套for
2、Index Nested-Loop Join 索引关联查询
3、Block Nested-Loop Join 缓存块关联查询
	1、join buffer会缓存所有参与查询的列，而不是只有join列
	2、可以调整join_buffer_size缓存大小
	3、join_buffer_size的默认值是256k，join_buffer_size的最大值在
	4、使用block Nested-Loop Join算法需要开启优化器管理配置的optimizer_switch的设置
		block_nested_loop为on，默认开启
show variables like '%optimizer_switch%';

插入

批量

load data local infile 'path/file.log' into table 'table_name' fields terminated by ',' lines terminated by '\n';
尽量导入id有序的数据，无序的数据数据库要重新维护插入
关闭唯一校验 setUNIQUE_CHECKS=0
关闭事务自动提交 set AUTOCOMMIT=0

少量

尽量使用减少重复获取连接 insert into table_name values(...,...,...);
手动事务管理
数据有序插入

order by 语句

using index 使用索引进行排序（返回索引列）
	要么全部升序，要么全部降序
	如果是复合索引，要保证索引顺序
filesort 通过文件系统进行排序
	两次扫描算法：效率比较低（先排序字段和指针在sort buffer中排序 然后在临时表中排序）
	一次扫描算法：性能高吃内存（直接在sort buffer中排序）
	如果max_length_for_sort_data的大于Query语句字段总大小用一次扫描算法可以适当提高sort_buffer_size 和 max_length_for_soft_data系统变来增大排序区的大小
		show variables like 'max_length_for_sort_data';
		show variables like 'sort_buffer_size';

优化 group by 语句

group by 字段添加索引
group by会先进行排序再进行分组
	如果只要分组不排序，可以在分组后面添加 order by null优化

使用sql提示

sql提示是优化数据库的一个重要手段，简单来说就是在sql语句中加入一些认为的提示来达到优化操作的目的

USE INDEX(建议使用) ---> 
	explain select * from t_corp use index (index_gbm_dm) where c_gbm = '000';
IGNORE INDEX ---> 
	explain select * from t_corp IGNORE index (index_corp_gbm) where c_gbm = '000';
FORCE INDEX（强制使用）--->
	explain select * from t_corp force index (index_gbm_dm) where c_dm is not null;