mysql-底层存储和索引原理

最新推荐文章于 2024-03-09 11:04:13 发布

dragonlicat

最新推荐文章于 2024-03-09 11:04:13 发布

阅读量302

点赞数

分类专栏： mysql 文章标签： mysql

本文链接：https://blog.csdn.net/u011935048/article/details/105624277

版权

mysql 专栏收录该内容

3 篇文章 0 订阅

订阅专栏

本文介绍了MySQL底层数据的存储方式，包括.frm和.ibd文件的作用。重点讲解了索引原理，特别是B+tree的优势。讨论了聚簇索引、主键索引、唯一索引的区别，并解释了回表和覆盖索引的概念。此外，还阐述了最左匹配原则及其应用，以及如何选择合适的字段作为索引字段，强调了离散度高的字段更适合建立索引。

摘要由CSDN通过智能技术生成

文章目录

mysql 底层数据存储


mysql> use employees;
Database changed
mysql> show tables;
+---------------------+
| Tables_in_employees |
+---------------------+
| departments         |
| dept_emp            |
| dept_manager        |
| employees           |
| salaries            |
| titles              |
+---------------------+
6 rows in set (0.00 sec)

数据目录下的database /appdata/mysql-data/employees
总用量 237888
-rw-r----- 1 mysql mysql        61 4月  19 21:47 db.opt
-rw-r----- 1 mysql mysql      8606 4月  19 21:47 departments.frm
-rw-r----- 1 mysql mysql    114688 4月  19 21:48 departments.ibd
-rw-r----- 1 mysql mysql      8676 4月  19 21:47 dept_emp.frm
-rw-r----- 1 mysql mysql  31457280 4月  19 21:48 dept_emp.ibd
-rw-r----- 1 mysql mysql      8676 4月  19 21:47 dept_manager.frm
-rw-r----- 1 mysql mysql    131072 4月  19 21:48 dept_manager.ibd
-rw-r----- 1 mysql mysql      8768 4月  19 21:47 employees.frm
-rw-r----- 1 mysql mysql  23068672 4月  19 21:48 employees.ibd
-rw-r----- 1 mysql mysql      8674 4月  19 21:47 salaries.frm
-rw-r----- 1 mysql mysql 146800640 4月  19 21:49 salaries.ibd
-rw-r----- 1 mysql mysql      8672 4月  19 21:47 titles.frm
-rw-r----- 1 mysql mysql  41943040 4月  19 21:48 titles.ibd

看一下db.opt 存储当前数据库的默认字符集和字符校验规则

cat db.opt
default-character-set=utf8
default-collation=utf8_general_ci

mysql> show create database employees;
+-----------+--------------------------------------------------------------------+
| Database  | Create Database                                                    |
+-----------+--------------------------------------------------------------------+
| employees | CREATE DATABASE `employees` /*!40100 DEFAULT CHARACTER SET utf8 */ |
+-----------+--------------------------------------------------------------------+
1 row in set (0.00 sec)

该目录下每个table 对应两个存储文件 .frm .ibd

.frm 数据表的定义相关信息，可以用来恢复表结构
.idb innoDB的数据文件，主要包括索引数据

mysql索引原理

mysql的索引使用了B+tree，为什么使用这个数据结构呢？

tree

平衡二叉树

定义：
（1）它的左子树和右子树的深度之差(平衡因子)的绝对值不超过1
（2）它的左子树和右子树都是一颗平衡二叉树
从定义可知，当数据很多时，AVL树的深度也会随之增长，在检索过程中，会发生多次IO
在这里插入图片描述

B tree

B-tree中，每个结点包含：
1、关键字数 n
2、度（Degree） n+1
3、所有节点为完整数据

在这里插入图片描述

B+tree

1、所有叶子节点为完整数据，且叶子节点有指针指向相邻节点，可看做是一个有序集合
在这里插入图片描述

B+树解决了AVL树的深度问题，也支持所有B树的优点，并且叶子节点有指针指向相邻节点，增强了排序功能

聚簇索引，主键索引，唯一索引

聚簇索引的叶子节点就是数据节点，
一般情况下主键就是默认的聚簇索，
而非聚簇索引的叶子节点仍然是索引节点，只不过有指向对应数据块的指针，
如果表中没有定义主键，InnoDB 会选择一个唯一的非空索引代替
如果没有这样的索引，InnoDB会隐式地定义_rowid来作为聚簇索引

14.6.2.1 Clustered and Secondary Indexes
Every InnoDB table has a special index called the clustered index where the data for the rows is stored. Typically, the clustered index is synonymous with the primary key. To get the best performance from queries, inserts, and other database operations, you must understand how InnoDB uses the clustered index to optimize the most common lookup and DML operations for each table.

When you define a PRIMARY KEY on your table, InnoDB uses it as the clustered index. Define a primary key for each table that you create. If there is no logical unique and non-null column or set of columns, add a new auto-increment column, whose values are filled in automatically.

If you do not define a PRIMARY KEY for your table, MySQL locates the first UNIQUE index where all the key columns are NOT NULL and InnoDB uses it as the clustered index.

If the table has no PRIMARY KEY or suitable UNIQUE index, InnoDB internally generates a hidden clustered index named GEN_CLUST_INDEX on a synthetic column containing row ID values. The rows are ordered by the ID that InnoDB assigns to the rows in such a table. The row ID is a 6-byte field that increases monotonically as new rows are inserted. Thus, the rows ordered by the row ID are physically in insertion order.

什么是回表，什么是覆盖索引

根据索引分类可知，只有聚簇索引的叶子节点才存放了所有数据，而普通索引都是叶子节点都是主键信息，查询：
1.直接使用了聚簇索引，查一次B+tree，便可以检索出结果数据
2.未使用聚簇索引，使用了普通索引，查一次普通索引的B+tree，获取主键信息，再查聚簇索引，获取结果数据【回表查询】
3.如果不需要查询完整数据，只查询普通索引的索引字段，则只查询一次普通索引的B+tree，就可以得到结果【覆盖索引查询，减少回表操作，这也是不建议在查询语句中使用select * 的原因】

最左匹配原则

在一个数据表中，如果建立了这样(a,b)的一个联合索引

1. select * from table where a="aa" and b="bbb";
2. select * from table where b="bbb" and  a="aa";
3. select * from table where a="aa" ;
4. select * from table where b="bbb";
5. select a,b from table where b="bbb";

从索引的结构上来说[a,b]的一个有序B+tree，1,3语句是毫无疑问走该联合索引查询，2语句看上去不满足最左匹配，但是mysql的优化器Query Optimizer，会将其优化成1，同样满足最左匹配；4语句是无法走索引的，而5语句由于select a,b数据，在该索引里完全存储了，虽然不满足最左匹配，但是满足了覆盖索引的条件，也会只查一次该索引，而不会再去回表。