Mysql索引优化和锁机制

最新推荐文章于 2022-10-30 22:56:14 发布

大熊超人

最新推荐文章于 2022-10-30 22:56:14 发布

阅读量259

点赞数

文章标签： mysql 数据库 database

本文链接：https://blog.csdn.net/m0_46376464/article/details/117109051

版权

1. mysql的架构

存储引擎

对比项	MyISAM	InnoDB
主外键	不支持	支持
事务	不支持	支持
行表锁	表锁，即使操作一条记录也会锁住整个表，不适合高并发的操作	行锁，操作时只锁某一行，不对其他行有影响，适合高并发的操作
缓存	只缓存索引，不缓存真实数据	不仅缓存索引还要缓存真实数据，对内存要求较高，而且内存大小对性能有决定性的影响。
表空间	小	大
关注点	性能	事务
默认安装	Y	Y

mysql逻辑架构分四层
- 连接层：与其他语言的链接，如Perl、Python
- 服务层：mysql查询，内部优化
- 引擎层：可拔插的引擎，innoDB、myISAM
- 存储层：硬件

查看引擎

show engines;
show variables like '%engine%';

2. 索引

1. 性能下降SQL慢，执行时间长，等待时间长原因

查询语句写的烂

索引失效

单值索引（在查的时候只给某个表的某一个字段建索引）

select * from user where name = '';
/*
根据user表的name字段建立索引
*/
create index idx_user_name on user(name);

复合索引

select * from user where name = '' and email = '';
/*
根据user表的name字段和email建立复合索引
*/
create index idx_user_nameEmail on user(name,email);

关联查询太多join（设计缺陷或不得已的需求）
服务器调优及各个参数设置（缓冲、线程数等）

2. 如何进行MySql优化

mysql优化的总结：

开启慢查询，捕获慢sql
explain+慢sql分析
show profile查询sql在Mysql服务器里面的执行细节和生命周期情况
sql数据库服务器参数的调优

3. 什么是索引

索引：是帮助提高mysql高效获取数据的【【数据结构】】
可以简单的理解为：排好序的快速查找数据结构
数据本身之外，数据库还维护着一个满足特定查找算法的【数据结构】，这些数据结构以某种方式指向数据，这样就可以在这些数据的基础上实现高级查找算法，这种数据结构就是索引。
索引往往以索引文件的形式存储在磁盘上
索引往往指的是B树（多路搜索树）结构组织的索引
聚集索引、全文索引、复合索引、前缀索引、唯一索引默认都是使用B+树索引，统称索引。
除了B+树类型，还有哈希索引（hash index）
主键索引一定是唯一索引，反过来不一定

4. 索引的优势劣势

优势
- 提高数据检索的效率，降低数据库IO的成本
- 降低数据排序成本，降低了CPU的消耗
劣势
- 实际上索引也是一张表，该表保存了主键与索引字段，并指向实体表的记录，所以索引列也是要占空间的
- 提高查询速度的同时，降低了更新表的速度（增删改时，索引也会发生变化）
- 数据量较大，需要花时间研究建立最优秀的索引

5. 索引分类

单值索引：一个索引只包含单个列，一个表可以有多个单列索引，但一个表最好不超过五个单值索引
唯一索引：索引列的值必须唯一，但允许有空值
复合索引：一个索引包含多个列

6. 索引基本语法

创建

CREATE INDEX indexName ON mytable(字段);

删除
```
DROP INDEX [indexName] ON mytable;
```
查看
```
show index from mytable;
```

7. 有四种方式来添加数据表的索引

ALTER TABLE table_name ADD PRIMARY KEY(column_list)：该语句添加一个主键，这意味着索引值必须是唯一的，且不能为null；
ALTER TABLE table_name ADD UNIQUE index_name (column_list)：该语句创建索引的值必须是唯一的（除了null，null可能会出现多次）
ALTER TABLE table_name ADD INDEX index_name (column_list)：添加普通索引，所以只可出现多次
ALTER TABLE table_name ADD FULLTEXT index_name (column_list)：该语句指定了索引为FULLTEXT，用于全文索引

8. mysql索引结构

主要探究BTee索引的结构
最底层是叶子节点，只有叶子节点保存数据，其他节点只保存指针和关键字
父节点的关键字在子节点中存在，要么取子节点最大值，要么最小值，如果子节点是升序排列，则父节点关键字取最小值
比如查找28，过程：因为28大于20小于35，所以由数据页1的p2指针指向数据页3，因为28小于30，所以由数据页3的p1指针指向数据页7，最后找到28存储的data

9. 哪些情况需要创建索引

主键自动建立唯一索引
频繁作为查询条件的字段应该创建索引
查询中与其它表关联的字段，外键关系建立索引
频繁更新的字段不适合创建索引，因为每次更新不单单是更新了记录，还会更新索引，加重IO负担
where条件里用不到的字段不创建索引
在高并发下倾向创建组合索引
查询中排序的字段，排序字段若通过索引去访问将大大提高排序速度
查询中统计或者分组字段

10. 哪些情况不需要创建索引

表记录太少（3百万以下）
经常增删改的表
数据重复且分布平均的表字段，因此应该只为最经常查询和最经常排序的数据列建立索引。注意，如果某个数据列包含许多重复的内容，为它建立索引就没有太大的实际效果。

11. 性能分析 Explain

用法：explain sql语句;
作用：
- 表的读取顺序
- 数据读取操作的操作类型
- 哪些索引可以使用
- 是哪索引被实际使用
- 表之间的引用
- 每张表有多少行被优化器查询
explain的字段：
- id
  - id相同，执行顺序从上到下
  - id不同，如果是子查询，id的序号会递增，id值越大优先级越高，越先被执行
- select_type
  - SIMPLE : 最简单的查询，不包含子查询和union
  - PRIMARY：主查询，最外层，最后执行
  - SUBQUERY：from之前的子查询
  - DERIVED：from字句中出现的子查询，也叫做派生表
  - UNION：union关键字后面的表的select_type都是union
  - UNION RESULT:从union表获取结果的select
- table
  - 显示这一张行数据是关于哪张表的
- type
  - 显示查询使用了何种类型，由好到坏依次为：system>const>eq_ref>ref>range>index>All
  - system:系统表，只有一行数据，const的特例，忽视掉
  - const：表示通过索引一次就找到了，用于比较primary和unique索引，因为只匹配一行数据，如where查主键的一个值
  - eq_ref：常用于主键和唯一性索引扫描，对于每一个索引键，表中只有一条记录与之匹配
  - ref：非唯一性索引扫描，返回匹配某个单独值的所有行（符合条件的多行）
  - range:只检索给定范围的行，使用一个索引来选择行。key列显示使用了哪个索引。一般就是在你的where语句中出现了between、<、>、in等的查询。这种范围扫描索引扫描比全表扫描要好，不会扫描全部索引
  - index:.与all的区别在于index只遍历索引数
  - ALL：遍历全表
- possible_keys
  - 显示可能应用在这张表的索引，一个或者多个（但不一定被查询实际使用）
- key
  - 实际使用的索引【查询中若使用了覆盖索引，则该索引仅出现在key列表中】
  - 覆盖索引：select的字段和复合索引的个数和顺序一致，举例：用c1、c2建索引，查询为select c1，c2 from …
- key_len
  - 索引中使用的字节数，可通过该值计算查询中使用的索引的长度。在不损失精度的情况下，【长度越短越好】。该值为索引字段的最大可能长度，【并非实际使用长度】
  - 即key_len是根据表计算而得，不是通过表内检索出来的
- ref
  - 显示索引的哪一列被使用了，如果可能的话，是一个常数。哪些列或常量被用于查找索引列上的值。查询中与其它表关联的字段，外键关系建立索引。
- rows
  - 根据表统计信息及索引选用情况，大致估算出找到所需记录要读取的行数，越小越好
  - 也就是每个表有多少行被优化器查询
- Extra
  - 包含不适合在其他列中显示但十分重要的额外信息
    - using filesort：说明mysql会对数据使用一个外部的索引排序，而不是按照表内的索引顺序进行读取。MySQL中无法利用索引完成的排序操作成为“文件排序”。
    - using temporary：使用了临时表保存中间结果，MYsql对查询结果排序时使用临时表
      常见于order by 和 group by
    - using index：查询中使用了覆盖索引，【效率不错】
      同时出现了using where 表明索引被用来执行索引键值的查找（where 后面的列被用于建索引）
      没出现using where表明索引中的列只用来select，没进行where
    - using index condition：搜索条件中虽然出现了索引列，但是有部分条件无法使用索引，会根据能用索引的条件先搜索一遍再匹配无法使用索引的条件
    - using where：使用了where
    - using join buffer：使用了连接缓存，配置文件里的using buffer调大
    - impossible where：where子句的值总是false，不能用来获取任何元组
    - selec table optimized away：在没有group by子句的情况下，基于索引优化min/max
    - distinct：优化distinct操作，在找到第一匹配的元组后即停止找同样的动作
- Filtered：它指返回结果的行占需要读到的行(rows列的值)的百分比

12. 索引的建立和优化

单表索引优化：查询category_id为1，且comments>1的情况下，views最多的article_id

CREATE table if not exists `article`(
    `id` int(10) unsigned not null primary key auto_increment,
    `author_id` int(10) unsigned not null,
    `category_id` int(10) unsigned not null,
    `views` int(10) unsigned not null,
    `comments` int(10) unsigned not null,
    `title` varbinary(255) not null,
    `content` text not null
);

insert into `article`(`author_id`,`category_id`,`views`,`comments`,`title`,`content`) values
(1,1,1,1,'1','1'),
(2,2,2,2,'2','2'),
(1,1,3,3,'3','3');

select * from article;

explain查询语句，发现 type为all，extra提示using filesort，需要建立索引优化SQL

explain select id, author_id from article where category_id=1 and comments>1 order by views DESC limit 1;

新建索引后再explain，type为range，但是extra还是提示using filesort，因为comments为范围查询，sql无法利用索引再对后面的views进行检索，即range类型查询字段后面的索引无效。此索引不合格，删掉重来

create index idx_article_ccv on article(category_id, comments, views);

创建索引时越过需要范围查询的comments字段，explain发现type为ref，using filesort也没了，优化成功。

create index idx_article_cv on article(category_id, views);

两表索引优化：两表索引优化,左拼给右表加索引。

create table if not exists `class` (
    `id` int(10) unsigned not null AUTO_INCREMENT,
    `card` int(10) unsigned not null,
    primary key(`id`)
);
create table if not exists `book` (
    `bookid` int(10) unsigned not null AUTO_INCREMENT,
    `card` int(10) unsigned not null,
    primary key(`bookid`)
);
insert into class(card) values(FLOOR(1 + (RAND() * 20)));
insert into book(card) values(FLOOR(1 + (RAND() * 20)));

explain查询语句，发现 type为all，需要建立索引优化SQL

explain select * from class left join book on class.card=book.card;

给右表建立索引，分析发现class为all，book为ref

Alter table `book` add index Y (`card`);

给左表建立索引，分析发现class为index，book为all
得出结论：两表索引优化,左连给右表加索引。

三表索引优化：三表连接，用小表驱动大表，索引建立在left 或 right 后面的表

create table if not exists `phone`(
`phoneid` int(10) unsigned not null auto_increment,
`card` int(10) unsigned not null,
primary key(`phoneid`)
)engine=InnoDB;
insert into phone(card) values(floor(1+(rand()*20)));
insert into phone(card) values(floor(1+(rand()*20)));

show index from book;
drop index Y on book;

alter table `phone` add index z (`card`);
alter table `book` add index Y (`card`);

explain select * from class left join book on class.card=book.card left join phone on book.card=phone.card;

13. 如何避免索引失效

create table staffs(
    id int primary key auto_increment,
    NAME varchar(24) not null default '' COMMENT '姓名',
    age int not null default 0 comment '年龄',
    pos varchar(20) not null default '' COMMENT '职位',
    add_time timestamp not null default current_timestamp comment '入职时间'
)charset utf8 comment '员工记录表';

insert into staffs(NAME, age, pos, add_time) values('z3', 22, 'manager', NOW());
insert into staffs(NAME, age, pos, add_time) values('July', 23, 'dev', NOW());
insert into staffs(NAME, age, pos, add_time) values('2000', 23, 'dev', NOW());

alter table staffs add index idx_staffs_nameAgePos(name, age, pos);

select * from staffs where age=25 and pos='dev';
select * from staffs where pos='dev';

explain select * from staffs where age=25 and pos='dev';
explain select * from staffs where pos='dev';

经过分析，发现上面索引失效。

防止索引失效：

全值匹配我最爱(推荐全职匹配)
最佳左前缀法则(最重要)
- 建立索引时的第一个字段不能丢失，否则会造成索引失效。
- 例如索引是idx_staffs_nameAgePos(name, age, pos);则必须从name开始，不能跳过name直接age或者pos。
不在索引列上作任何操作（计算、函数、（自动or手动）类型转换），会导致索引失效而转向全表扫描
存储引擎不能使用索引中范围条件右边的列
尽量使用覆盖索引（只访问索引的查询（索引列和查询列一致）），减少select *
mysql在使用不等于（！=或者<>）的时候无法使用索引会导致全表扫描
is null，is not null也无法使用索引
like以通配符开头(‘%abc…’)mysql索引失效会变成全表扫描的操作
- 解决方法：like%加右边
- 正确写法：where a = 3 and b like ‘kk%’ and c = 4;
字符串不加单引号索引失效
少用or，用它来连接时会索引失效

口诀：

全职匹配我最爱，最左点缀要遵守
带头大哥不能死，中间兄弟不能断
索引列上少计算，范围之后全失效
like百分写最右，覆盖索引不写星
不等空值还有or，索引失效要少用
var引号不可丢，SQL高级也不难

14. 索引优化面试题

create table test03 (
	id int primary key not null auto_increment,
    c1 char(10),
    c2 char(10),
    c3 char(10),
    c4 char(10),
    c5 char(10)
);

insert into test03(c1,c2,c3,c4,c5) values('a1','a2','a3','a4','a5');
insert into test03(c1,c2,c3,c4,c5) values('b1','b2','b3','b4','b5');
insert into test03(c1,c2,c3,c4,c5) values('c1','c2','c3','c4','c5');
insert into test03(c1,c2,c3,c4,c5) values('d1','d2','d3','d4','d5');
insert into test03(c1,c2,c3,c4,c5) values('e1','e2','e3','e4','e5');

select * from test03;

create index idx_test03_c1234 on test03(c1,c2,c3,c4);
show index from test03;

问题：根据以下sql分析下索引使用情况

1. explain select * from test03 where c1='a1' and c2='a2' and c3='a3' and c4='a4';
2. explain select * from test03 where c1='a1' and c2='a2' and c4='a4' and c3='a3';
3. explain select * from test03 where c4='a4' and c3='a3' and c2='a2' and c1='a1';
4. explain select * from test03 where c1='a1' and c2='a2' and c3>'a3' and c4='a4';
5. explain select * from test03 where c1='a1' and c2='a2' and c4>'a4' and c3='a3';
6. explain select * from test03 where c1='a1' and c2='a2' and c4='a4' order by c3;
7. explain select * from test03 where c1='a1' and c2='a2' order by c3;
8. explain select * from test03 where c1='a1' and c2='a2' order by c4;
9. explain select * from test03 where c1='a1' and c5='a5' order by c2,c3;
10. explain select * from test03 where c1='a1' and c5='a5' order by c3,c2;
11. explain select * from test03 where c1='a1' and c2='a2' order by c2,c3;
12. explain select * from test03 where c1='a1' and c2='a2' order by c3,c2;

第一条sql：正常

第二条sql：和第一条一样，正常，因为mysql会自动调优，但不推荐这样写

第三条sql：和第一条一样，正常，因为mysql会自动调优，但不推荐这样写

第四条sql：出问题，type为range，ref为null，using index condition，只用到了3个索引，因为范围之后全失效。

第五条sql：type为range，ref为null，using index condition，用到了4个索引，因为mysql底层会自动调优，把C3放到C4前面。

第六条sql：type为ref，只用到了c1,c2两个索引，其实c3也用到了，但是只使用了c3的排序功能，所以没有统计到explain里面。

第七条sql：和第六条一样，但filtered仅为20%

第八条sql：用到了c1，c2索引，但是有文件内排序错误，因为违背了最佳左前缀法则，没写c3直接写的c4

第九条sql：只用c1一个字段索引，c2,c3用于排序，无filesort。

第十条sql：出现了filesort，因为违背了最佳左前缀法则。

第十一条sql：只用c1,c2两个字段索引，c2,c3用于排序，无filesort。

第十二条sql：没有filesort，因为前面有c2字段索引，所以后面group by没有filesort。建议与第十条对比。

group by会导致排序，如果错乱后会导致临时表产生

15. 查询截取分析

查询优化的原则：

小表驱动大表

select * from tbl_emp;  -- 大表
select * from tbl_dept; -- 小表

explain select * from tbl_emp e where e.deptId in (select id from tbl_dept d);
-- 当tbl_dept的数据集小于tbl_emp的数据集时，用in优于用exists
explain select * from tbl_emp e where exists (select 1 from tbl_dept d where d.id=e.deptId);
-- 当tbl_emp的数据集小于tbl_dept的数据集时，用exists优于用in

/**
mysql中exists和in的区别：
1. in是把外表和内表做hash连接，先查询内表，也就是先查询in括号里面的表；
2. exists是对外表做loop循环，循环后拿外表的每一项对内表进行验证，看看外表的每一项在内表是否存在，也就是说，EXISTS()查询是将主查询的结果集放到子查询中做验证，根据验证结果是true或false来决定主查询数据结果是否得以保存；
在外表大的时用in效率更快，内表大用exists更快。
**/

Order by排序优化
- ORDER BY满足两种情况，会使用Index方式排序：
  - ORDER BY语句使用索引最左前列
  - 使用Where子句与ORDER BY子句条件列组合满足索引最左前列
- 提高order by 的方法：
  - 使用order by大忌使用select * ，应该只查询需要的字段
    - 当查询的字段的大小总和<max_length_for_sort_data而且排序字段不是text/BLOB时，会使用改进算法–单路排序，否则使用对路排序，速度慢
    - 单路多路都可能超过sort_buffer的容量（单路可能性更大），超出后会创建temporary，导致慢速
  - 尝试提高sort_buffer_size
  - 尝试提高max_length_for_sort_data，但是设置太高，容易超过sort_buffer_size
group by ：同order by，但能用where就不用having

16. 开启慢sql日志

开启：set global slow_query_log=1;
查看是否开启：show variables like ‘%slow_query_log%’;
查看慢查询设置时间：show variables like ‘%long_query_time%’;
设置阈值为1秒：set global long_query_time=1;
显示当前系统中较慢的sql 条数：show global status like ‘%show_queries%’;
若执行超过阈值的sql会在慢查询中显示：select sleep(2);

17 批量数据脚本 - 往表里插入50W条数据

create database bigData;
use bigData;

-- 部门表
create table dept(
	id int unsigned primary key auto_increment,
    deptno mediumint unsigned not null default 0,
    dname varchar(20) not null default "",
    loc varchar(13) not null default ""
)engine=innodb default charset=GBK;

-- 员工表
create table emp(
	id int unsigned primary key auto_increment,
    empno mediumint unsigned not null default 0,/*编号*/
    ename varchar(20) not null default "",/*名字*/
    job varchar(9) not null default "",/*工作*/
    mgr mediumint unsigned not null default 0,/*上级编号*/
    hiredate Date not null,/*入职时间*/
    sal decimal(7,2) not null,/*薪水*/
    comm decimal(7,2) not null,/*红利*/
    deptno mediumint unsigned not null default 0 /*部门编号*/
)engine=innodb default charset=GBK;

-- 这个值要设为ON
show variables like 'log_bin_trust_function_creators';

-- 随机产生字符串
DELIMITER $$
create function rand_string(n int) returns varchar(255)
begin
	declare chars_str varchar(100) default 'abcdefghijklmnopqrstuvwxyzABCDEFGHIGKLMNOPQRSTUVWXYZ';
    declare return_str varchar(255) default '';
    declare i int default 0;
    while i < n do
    set return_str=concat(return_str,SUBSTRING(chars_str,FLOOR(1+RAND()*52),1));
    set i = i + 1;
    end while;
    return return_Str;
end $$

-- 随机产生部门编号
DELIMITER $$
create function rand_num()
returns int(5)
begin
	declare i int default 0;
    set i = floor(100+rand()*10);
	return i;
end $$

-- 假如要删除
-- drop function rand_num;

-- 创建存储过程
DELIMITER $$
create procedure insert_emp(in start int(10), in max_num int(10))
begin
	declare i int default 0;
    -- set autocommit = 0;
    repeat
    set i = i + 1;
    insert into emp(empno, ename, job, mgr, hiredate, sal, comm, deptno) values ((start+i), rand_string(6), 'SALESMAN', 0001, curdate(), 2000, 400, rand_num());
    until i = max_num
    end repeat;
    commit;
end $$

-- 创建存储过程，往dept表添加随机数据
DELIMITER $$
create procedure insert_dept(in start int(10), in max_num int(10))
begin
	declare i int default 0;
    set autocommit = 0;
    repeat
    set i = i + 1;
    insert into dept(deptno, dname, loc) values ((start + i), rand_string(10),rand_string(8));
    until i = max_num
    end repeat;
    commit;
end $$
-- drop procedure insert_dept;

-- 调用存储过程
DELIMITER ;
CALL insert_dept(100, 10);

DELIMITER ;
CALL insert_emp(100001, 500000);

18. Show Profile

Show Profile是mysql提供的可以用来分析当前会话中语句执行的资源消耗情况。可以用于SQL的调优的测量
默认情况下，参数处于关闭状态，并保存最近15次的运行结果

show variables like '%profiling%';
set profiling = on;
show profiles;
show profile cpu, block io for query 17;#一条sql内部执行的完整生命周期
/**
#如果status出现以下条目，降低速度
1 coverting HEAP to MyISAM :查询结果太大，内存不够，往磁盘上搬了
2 creating tpm table ：创建临时表，[拷贝数据到临时表][用完再删除]
3 cooying to tmp table on disk ：把内存中临时表复制到磁盘。危险！！ 
4 locked
**/

19. Mysql锁机制

锁的分类
- 从对数据操作的粒度来分：
  - 表锁
  - 行锁
- 从对数据操作的类型：
  - 读锁（共享锁）：针对同一份数据，多个读操作可以同时进行，不会互相影响
  - 写锁（排它锁）：当前写锁没有完成之前，它会阻断其他写锁和读锁
  - 总结：读锁阻塞改表，写锁阻塞读写

use test01;
show open tables from test01;
lock table tbla read, book write;
unlock tables; -- 释放锁
select * from tbla;

表锁(MyISAM默认使用表锁)
- 锁定表：LOCK TABLES tbl_name {READ | WRITE},[ tbl_name {READ | WRITE},…]
- 表锁：级操作一条记录也会锁着整张表，不适合高并发的操作
行锁(innodb的特点)
- 锁定一行：sql语句+for update;
- 行锁的前提：使用索引，使用事务
- 索引失效后，行锁会变表锁
间隙锁
- 有事务A和事务B，事务A会对数据库表增加10-15这个区间锁，这时insert id = 12 的数据的时候就会因为区间锁10-15而被锁住无法执行，即使12这个id在表中实际上不存在，但他仍然会锁住。