MySQL架构-CSDN博客

2019独角兽企业重金招聘Python工程师标准>>>

MySQL架构概述

MySQL是基于分层的架构,它由子系统和支持组件组成，通过它们之间的交互来读取、解析以及执行查询，并缓存和返回查询结果。MySQL架构由五个基本的子系统组成，它们协同工作以响应对MySQL数据库服务器的请求。

下图就是MySQL的体系结构：

1)查询引擎

SQL接口

SQL接口提供一套机制来接收命令以及传输结果给用户。MySQL的SQL接口遵从ANSI SQL标准，它和绝大多数遵从ANSI标准的数据库服务器一样接受同样基础的SQL语句。尽管MySQL支持的许多命令中有非ANSI标准的选项, MySQL的开发者已经努力使它们非常接近于ANSI SQL标准。

到数据库服务器之间的连接是从网络通信通路上接收到的，且每一个连接都会创建一个线程。线程化的过程是MySQL服务器中可执行路径的核心。MySQL是一个真正的多线程应用程序，除了某些辅助线程之外，每个线程都独立执行。传入的SQL命令存储在类结构中并且通过将结果输出到网络通信协议中结果就被传输到客户端。一旦创建了一个线程，MySQL服务器会尝试解析SQL命令并存储部件的内部数据结构。

查询解析器

当客户端发起一个查询时会创建一个新的线程，并且SQL语句会转发给解析器进行语法验证（或者由于错误被驳回）。MySQL解析器是使用一个大型的Lex-YACC脚本且由Bison编译实现的。解析器构造一个查询的结构，用于表示内存中的一个可用于执行查询的树形结构（也称为抽象语法树）。

查询优化器

The optimizer used is a SELECT-PROJECT-JOIN strategy that attempts to restructure the query by first doingany restrictions (SELECT) to narrow the number of tuples to work with, then performs theprojections to reduce the number of attributes (fields) in the resulting tuples, and finally evaluatesany join conditions. While not considered a member of the extremely complicated queryoptimizer category, the SELECT-PROJECT-JOIN strategy falls into the category of heuristicoptimizers. In this case, the heuristics (rules) are simply

• Horizontally eliminate extra data by evaluating the expressions in the WHERE (HAVING)clause.

• Vertically eliminate extra data by limiting the data to the attributes specified in theattribute list. The exception is the storage of the attributes used in the join clause thatmay not be kept in the final query.

• Evaluate join expressions.

This results in a strategy that ensures a known-good access method to retrieve data in anefficient manner. Despite critical reviews, the SELECT-PROJECT-JOIN strategy has proveneffective at executing the typical queries found in transaction processing.The first step in the optimizer is to check for the existence of tables and access control bythe user. If there are errors, the appropriate error message is returned and control returns tothe thread manager, or listener. Once the correct tables have been identified, they are openedand the appropriate locks are applied for concurrency control.Once all of the maintenance and setup tasks are complete, the optimizer uses the internalquery structure and evaluates the WHERE conditions (a restrict operation) of the query.Results are returned as temporary tables to prepare for the next step. If UNION operators arepresent, the optimizer executes the SELECT portions of all statements in a loop before continuing.The next step in the optimizer is to execute the projections. These are executed in a similarmanner as the restrict portions, again storing the intermediate results as temporary tables andsaving only those attributes specified in the column specification in the SELECT statement.Lastly, the structure is analyzed for any JOIN conditions that are built using the join class, andthen the join::optimize() method is called. At this stage the query is optimized by evaluatingthe expressions and eliminating any conditions that result in dead branches or always true oralways false conditions (as well as many other similar optimizations). The optimizer is attemptingto eliminate any known-bad conditions in the query before executing the join. This is donebecause joins are the most expensive and time consuming of all of the relational operators. Itis also important to note that the join optimization step is performed for all queries that have aWHERE or HAVING clause regardless of whether there are any join conditions. This enablesdevelopers to concentrate all of the expression evaluation code in one place. Once the joinoptimization is complete, the optimizer uses a series of conditional statements to route thequery to the appropriate library method for execution.

Query Execution

Execution of the query is handled by a set of library methods designed to implement a particularquery. For example, the mysql_insert() method is designed to insert data. Likewise, thereis a mysql_select() method designed to find and return data matching the WHERE clause. Thislibrary of execution methods is located in a variety of source code files under a file of a similarname (e.g., sql_insert.cc or sql_select.cc). All of these methods have as a parameter a threadobject that permits the method to access the internal query structure and eases execution.Results from each of the execution methods are returned using the network communicationpathways library. The query execution library methods are clearly implemented using theinterpretative model of query execution

查询缓存

虽然不是它自己的子系统，查询缓存应该是查询优化和执行子系统的重要组成部分。查询缓存是一个了不起的发明，不仅缓存查询结构，而且还缓存查询结果。这使系统能够检查常用的查询并且缩短整个查询优化和执行阶段。这是MySQL另一特性。其他数据库系统缓存查询语句，但没有缓存查询结果的。As you can appreciate, the query cache must also allow for situations where the results are “dirty” in the sense that something has changed since the last time the query was run (e.g., an INSERT, UPDATE, or DELETE was run against the base table) and that the cached queries may need to be occasionally purged.

2) Buffer Manager/Cache and Buffers

The caching and buffers subsystem is responsible for ensuring that the most frequently useddata (or structures, as you will see) are available in the most efficient manner possible. In otherwords, the data must be resident or ready to read at all times. The caches dramatically increasethe response time for requests for that data because the data is in memory and thus no additionaldisk access is necessary to retrieve it. The cache subsystem was created to encapsulate all of thecaching and buffering into a loosely coupled set of library functions. Although you will find thecaches implemented in several different source code files, they are considered part of the samesubsystem.A number of caches are implemented in this subsystem. Most of the cache mechanismsuse the same or similar concept of storing data as structures in a linked list. The caches areimplemented in different portions of the code to tailor the implementation to the type of datathat is being cached. Let’s look at each of the caches.

Table Cache

The table cache was created to minimize the overhead in opening, reading, and closing tables(the .FRM files on disk). For this reason, the table cache is designed to store metadata about thetables in memory. This makes it much faster for a thread to read the schema of the table withouthaving to reopen the file every time. Each thread has its own list of table cache structures. Thispermits the threads to maintain their own views of the tables so that if one thread is altering theschema of a table (but has not committed the changes) another thread may use that table withthe original schema. The structure used is a simple one that includes all of the metadata informationfor a table. The structures are stored in a linked list in memory and associated with eachthread.

Record Cache

The record cache was created to enhance sequential reads from the storage engines. Thus therecord cache is usually only used during table scans. It works like a read-ahead buffer by retrievinga block of data at a time, thus resulting in fewer disk accesses during the scan. Fewer diskaccesses generally equates to improved performance. Interestingly, the record cache is alsoused in writing data sequentially by writing the new (or altered) data to the cache first and thenwriting the cache to disk when full. In this way write performance is improved as well. Thissequential behavior (called locality of reference) is the main reason the record cache is mostoften used with the MyISAM storage engine, although it is not limited to MyISAM. The recordcache is implemented in an agnostic manner that doesn’t interfere with the code used toaccess the storage engine API. Developers don’t have to do anything to take advantage of therecord cache as it is implemented within the layers of the API.

Key Cache

The key cache is a buffer for frequently used index data. In this case, it is a block of data for theindex file (B-tree) and is used exclusively for MyISAM tables (the .MYI files on disk). The indexesthemselves are stored as linked lists within the key cache structure. A key cache is created whena MyISAM table is opened for the first time. The key cache is accessed on every index read. If anindex is found in the cache, it is read from there; otherwise, a new index block must be readfrom disk and placed into the cache. However, the cache has a limited size and is tunable bychanging the key_cache_block_size configuration variable. Thus not all blocks of the index filewill fit into memory. So how does the system keep track of which blocks have been used?The cache implements a monitoring system to keep track of how frequent the index blocksare used. The key cache has been implemented to keep track of how “warm” the index blocks are.Warm in this case refers to how many times the index block has been accessed over time. Valuesfor warm include BLOCK_COLD, BLOCK_WARM, and BLOCK_HOT. As the blocks cool off and new blocksbecome warm, the cold blocks are purged and the warm blocks added. This strategy is a leastrecently used (LRU) page-replacement strategy—the same algorithm used for virtual memorymanagement and disk buffering in operating systems—that has been proven to be remarkablyefficient even in the face of much more sophisticated page-replacement algorithms. In a similarway, the key cache keeps track of the index blocks that have changed (called getting “dirty”).When a dirty block is purged, its data is written back to the index file on disk before being replaced.Conversely, when a clean block is purged it is simply removed from memory.

Privilege Cache

The privilege cache is used to store grant data on a user account. This data is stored in the samemanner as an access control list (ACL), which lists all of the privileges a user has for an objectin the system. The privilege cache is implemented as a structure stored in a first in, last out(FILO) hash table. Data for the cache is gathered when the grant tables are read during userauthentication and initialization. It is important to store this data in memory as it saves a lot oftime reading the grant tables.

Hostname Cache

The hostname cache is another of the helper caches, like the privilege cache. It too is implementedas a stack of a structure. It contains the hostnames of all the connections to the server.It may seem surprising, but this data is frequently requested and therefore in high demand anda candidate for a dedicated cache.

Miscellaneous

A number of other small cache mechanisms are implemented throughout the MySQL sourcecode. One example is the join buffer cache used during complex join operations. For example,some join operations require comparing one tuple to all the tuples in the second table. A cachein this case can store the tuples read so that the join can be implemented without having toreread the second table into memory multiple times.

3)存储管理器

存储管理器与操作系统交互以有效地将数据写入到磁盘上。由于存储功能位于一个独立的的子系中，MySQL引擎工作在一个远离操作系统的抽象层次。存储管理器将用户表中的所有数据写到磁盘上。索引、日志以及系统内部数据。

4)事务管理器

事务管理器的功能是为了方便数据访问的并发控制。该子系统提供一个锁装置以确保多个用户同时以一种一致性的方式访问数据，不破坏或损坏数据。事务控制发生于锁管理器子组件放置和释放事务中使用的各个对象上。

5)恢复管理器

恢复管理器的工作是保存数据的副本以便日后恢复，以免丢失数据。它也会记录数据库中修改数据以及其它重要事件的命令。到目前为止，只有InnoDB和BDB表处理程序。

子系统间的交互和控制流

查询引擎请求的数据被读取或写入到缓冲区管理器，以满足用户的查询。它依赖于事务管理器对数据进行锁定，以保证并发。执行表的创建和删除操作，查询引擎绕过缓冲区管理器直接访问存储管理器来创建或删除文件系统中的文件。

为了使查询引擎更有效地检索，缓冲区管理器缓存存储管理器中的数据。在它执行任何修改操作之前它依赖于事务管理器来检测数据的锁状态。

事务管理器依赖于查询缓存和存储管理器在内存和文件系统中的数据在上放置锁。

恢复管理器使用存储管理器来存储命令/事件日志以及数据备份到文件系统中。这取决于事务管理器获取写入的日志文件上的锁。从崩溃中恢复期间，恢复管理器也需要使用缓冲区管理器。

存储管理器依赖于操作系统的文件系统来进行持久性存储和数据检索。这取决于事务管理器获取的锁状态信息。