数据库调优系列

最新推荐文章于 2024-10-02 00:27:17 发布

老宋别偷懒

最新推荐文章于 2024-10-02 00:27:17 发布

阅读量98

点赞数 1

文章标签：数据库 sql

本文链接：https://blog.csdn.net/weixin_43538215/article/details/120171443

版权

B-Tree(B+Tree)的特性
- 完全匹配： inde(name) ===> where name = ‘xxx’
- 范围匹配： index(age) ===> where age > 5;
- 前缀匹配： index(name) ===> where name like ‘xx%’
B-Tree(B+Tree)的限制
- index(name,age,sex)
- 查询条件不包括最左列，无法使用索引。
- ===> where age = 5 and sex = 1 无法使用索引
- 跳过了索引中的列，则无法完全使用索引。
- ===> where name = ‘xxx’ and sex = 32 只能用name这一列
- 查询中有某个列的范围(模糊)查询，则其右边所有列都无法使用索引
- ===> where name=‘xxx’ and sex = 1 只能用name、age 两列
Hash索引
- 一般性能比B-Tree(B+Tree )要好一些
Hash索引限制
- Hash索引并不是按照索引值排序，所以没办法使用排序。
- 不支持部分索引列匹配查找。
- ===> hash(a,b) -> where a = 1 无法使用索引
- 只支持等值查询(例如 = 、IN) ，不支持范围查询、模糊查询。
- Hash冲突越严重，性能下降越厉害。

创建索引的原则

建议创建索引的场景
- select语句，频繁作为where条件的字段。
- update/delete语句的where条件。
- 需要分组、排序的字段。
- distinct所使用的字段。
- 字段的值有唯一性约束。
- 对于多表查询，联接字段应创建索引，且类型务必保持一致（避免隐式转换）。
不建议创建索引的场景
- where子句里用不到的字段。
- 表的记录非常少。
- 有大量的重复数据，选择性低。
- ===> 索引的选择性越高，查询效率越好，因为可以在查找是过滤更多行。
- 频繁更新的字段，如果创建索引要考虑其索引维护开销。

导致索引失效的场景

索引不独立。独立是指：列不能是表达式的一部分，也不能是函数的参数。.

explain select * from employees where emp_no + 1 = 10003;
#解决方案：事先计算好表达式的值，在传过来，避免在SQL where 条件的左侧做计算
explain select * from employees where emp_no = 10002;

explain select * from employees where substring(first_name , 1 , 3) = 'Geo'
#解决方案：预先计算好结果，在传过来，在where条件的左侧，不要使用函数；或者使用等价的SQL去实现
explain select * from employees where first_name like 'Geo%'

使用了左模糊。

explain select * from employees where first_name like  '%Geo%'
#解决方案：尽量不去使用左模糊
explain select * from employees where first_name like  'Geo%'

使用了OR查询的部分字段没有索引。

explain select * from employees where first_name = 'Georgi' or last_name = 'Georgi';
#解决方案：全部添加索引(添加索引列 last_name)
explain select * from employees where first_name = 'Georgi' or last_name = 'Georgi';

字符串条件未使用 ’ ’ 引起来。

explain select * from dept_emp where dept_no = 3;
#解决方案：规范的编写SQL
explain select * from dept_emp where dept_no = '3';

不符合最左前缀查询的原则。

# 建立索引为index(last_name,first_name)
explain select * from employees where first_name = 'Facello'
#解决方案：使用正确是查询顺序，修改sql语句或者调整索引顺序
explain select * from employees where last_name = 'Facello'

索引字段建议添加NOT NULL 约束。
===> 单列索引无法使用null值，复合索引无法存储全为null的值
===>查询的时候使用is null 为查询条件时，不能利用到索引，只能全表扫描
===>Mysql官方建议尽量把字段定义为not null

explain select * from employees where first_name is null;
#解决方案：把索引字段设置成NOT NULL，甚至可以把所有字段设置成为NOT NULL

隐式转换导致索引失效。

连接表去查询的时候，查询条件的类型不一致，比如说一个是varcahr类型的，一个是int类型的，就会导致隐式
转换，导致索引失效。

索引调优

长字段的索引调优 ( 说明：在实际项目中遇到对长字段进行索引又遇到瓶颈的时候该如何处理？)

 比如是first_name 是一个存储很长的字段，我们去检索的时候分为两种情况，一种是精确搜索，一种是模糊搜索
 Q:对于这种精确查询我们该怎么优化呢？
 select * from employees where first_name = 'Facello';
 A:我们需要额外增加一个字段，可以存储这个长字段的hash值，mysql默认有crc32()或者fnc64()两种方法。
 例如增加字段 first_name_hash 来存储 first_name 的hash值 插入时可以：
 insert into employees(emp_no, birth_date, first_name, last_name, gender, hire_date,
 first_name_hash) value (999999,now(),'tom.................................','tom','M',
 now(),crc32('tom.................................'));
 然后可以对 first_name_hash 建立索引，执行查询语句：
 select * from employees where first_name_hash = crc32('Facello') and	
 first_name = 'Facello';
 后面加上 [first_name = 'Facello'] 是为了避免hash冲突.
 
###########################################################

Q:对于遇到模糊查询我们该怎么优化呢？
select * from employees where first_name like 'Facello%';
A:我们可以使用前缀索引，建立前缀索引
alert table employess add key (first_name(7));
这里有个值是7 ， 表示给这个字段的前几位做前缀索引呢？那么这个值应该怎么取呢？
它的取值标准是来自 一个值 叫索引的选择性， 数值越大，表示选择性越高，性能越好
索引的选择性 = 不重复的索引值/数据表的总记录数
例如：下面公式得出 first_name 这个字段的索引的选择性为 0.0042
select count(distinct first_name) / count(*) from employees;
我们可以尝试使用取值，得出最接近0.0042 的值， 5: 0.0038   6: 0.0041   7:0.0042
select count(distinct left(first_name , 5))/count(*) from employees;
结论：
alter table employees add key (first_name(7));

#########################################################

Q:如果遇到需要后匹配的话，可以用到“后缀索引”?
A:“后缀索引”: mysql不支持后缀索引 ， 只需要额外创建一个字段，比如说first_name_reverse,在存储
的时候，把 first_name 翻转过来在存储，比如：Facello ===> ollecaF 存储到first_name_reverse，
然后再为这个字段创建一个前缀索引

单列索引和联合索引

explain select * from salaries where from_date = '1986-06-26' and to_date = '1987-06-26';
创建两个单列索引：130 ms (execution: 43 ms, fetching: 87 ms)

覆盖索引

 对于索引X,SELECT的字段只需要从索引就能获得，而无需到表数据里获取，这样的索引就叫覆盖索引。

explain select * from salaries where from_date = '1986-06-26' and to_date = '1987-06-26';
type: ref rows: 86 Extra: null

explain select from_date , to_date  from salaries where from_date = '1986-06-26' 
and to_date = '1987-06-26';
type: ref rows: 86 Extra: usering index

可以通过来比较两次开销：
SET OPTIMIZER_TRACE = "enabled=on" , END_MARKERS_IN_JSON = ON;
SET optimizer_trace_offset=-30, optimizer_trace_limit=30;

SELECT * FROM information_schema.OPTIMIZER_TRACE
WHERE QUERY LIKE '%salaries%'
LIMIT 30;


 结论：尽量只查询想要的字段
 原因一：有可能是使用到覆盖索引
 原因二：减少网络传输的开销

重复索引、冗余索引、未使用的索引【索引是有开销的】

重复索引错误实例：
create table test_table(
id int not null primary key auto_increment,
a int not null,
b int not null,
UNIQUE(id),
INDEX(id)
) ENGINE = InnoDB;
改进：发生了重复的索引
create table test_table(
id int not null primary key auto_increment,
a int not null,
b int not null
) ENGINE = InnoDB;

冗余索引：如果已经存在索引index(A, B), 又创建了index(A)，那么index(A)就是index(A,B)的冗余索引	

未使用的索引：创建但是没有用到的索引。

JOIN优化

JOIN的种类
- 在这里插入图片描述

1.left join;
2.right join;
3.inner join;
4.left join ON a.key = b.key where b.key is null;
5.right join on a.key = b.key where a.key is null;
5.full outer join;
5.full outer join on a.key = b.key where a.key is null or b.key is null;
6.cross join;[求笛卡尔连接值]
7.cross join on a,key = b.key; ==> [相当于 inner join]

Table  Join Type
t1		range
t2		ref
t3    ALL

对于上面的表结构JOIN算法伪代码：

JOIN算法1 - Nested-Loop Join(NLJ) 嵌套循环算法

for each row in t1 matching range{
	for each row in t2 matching reference key{
		for each row in t3{
			if row satisfies join conditions, send to client
		}
	}
}

结论：会全表扫描

JOIN算法2 - Block-Nested-Loop Join(BNLJ) 块循环嵌套算法

for each row in t1 matching range{
	for each row in t2 matching reference key{
		store used columns from t1,t2 in join buffer
		if buffer is full{
			for each row in t3{
				for each t1,t2 combination in join buffer{
					if row satisfies join conditions, send to client
				}
			}
		}
	}
}
结论：把t1 , t2 需要用到的字段存到join buffer[连接缓存]中，然后去扫描t3表。

如果一次性将1 , t2 需要用到的字段存到join buffer中，需要扫描1次t3表，如果一次存不进去的话需要扫描
几次t3表呢？
计算公式： (S * C)/join_buffer_size + 1
		S:缓存t1/t2表的一行数据需要的空间 
		C:缓存的行数
		S * C:如果缓存所有的行数需要多大的空间
		join_buffer_size: join buffer 的大小

使用join buffer的条件
1.连接类型是ALL、index或range
2.第一个非常量表不会分配join buffer，即使类型是ALL或者是index
3.join buffer只会缓存需要的字段，而非整行数据
4.可以通过join_buffer_size变量设置join buffer的大小

show variables like 'join_buffer_size';
设置当前session的join_buffer_size
set join_buffer_size = 1024 * 1024 * 50;
设置全局的join_buffer_size中join_buffer_size,不建议
set global join_buffer_size = 1024 * 1024 * 50;