本文则是从内核的角度, 通过源码层来剖析SQL语句的构成(只以SELECT语句为例), 本文的意义,在于帮助内核实现者或者对内核对SQL实现感兴趣者掌握如何理解SQL的剖析(抛砖引玉, ^_^).
如下是MySQL源码sql_lex.h的注释,非常重要, 本文以黑色字体给出MySQL原文, 蓝色字体标识出深入解析的内容, 红色标示特别强调的内容.
/* //如下是基本概念,需要理解SELECT命令中的基本概念(查询表达式/查询块, 形象地举个例子,子查询就是一个"查询表达式", 派生表就是一个查询块)对应的MySQL表达方式(SELECT_LEX_UNIT/SELECT_LEX,需要自己阅读这2个类的定义,要不可能白看了本文...哈):
Class SELECT_LEX_UNIT represents a query expression.
Class SELECT_LEX represents a query block.
A query expression contains one or more query blocks (more than one means
that we have a UNION query). //这是在说明二者(查询表达式/查询块)的关系. UNION前后都可是一个查询块,没有UNION的时候,就那么一个简单SELECT语句,首先其是一个"query expression",然后进一步被当作"query block"
These classes are connected as follows:
Both classes have a master, a slave, a next and a prev field. //这2个类中,每个类都有一个master,一个slave, 一个next和一个prev,但注意他们是不同的--名字相同但类型不同,前往不可"见名生义"认为他们是一样的. 下面则是解释它们的不同之处:
For class SELECT_LEX, master and slave connect to objects of type
SELECT_LEX_UNIT, whereas for class SELECT_LEX_UNIT, they connect
to SELECT_LEX.
//下面这2句是在说"父和子"的关系,其意义在于,从某个"子"出发可以找到其"父",从"父"出发可以找到其"子", 父子关系就是这么建立起来的. 解析SQL需要知道父子关系/邻里关系/引用关系(谁参照了谁). 也就是说,一个SQL是一个整体,解析就需要化整为零; 怎么化整为零,靠的就是用不同的对象标示出关键的"子"体,分解过程把子体标示出来,并标示出各种"联系"(如此处的父子间的关联关系)
master is pointer to outer node.
slave is pointer to the first inner node
neighbors are two SELECT_LEX or SELECT_LEX_UNIT objects on
the same level. //与一个"对象"相邻的,就是"neighbor/邻居"了,邻居一定和自己是同类项.所以此处说邻居是SELECT_LEX 或 SELECT_LEX_UNIT.
//如下继续深入解释邻居指向的关系/指向"父"的关系/指向"子"的关系
The structures are linked with the following pointers:
- list of neighbors (next/prev) (prev of first element point to slave
pointer of outer structure)
- For SELECT_LEX, this is a list of query blocks.
- For SELECT_LEX_UNIT, this is a list of subqueries. //哈,可以了解子查询是怎么标示的
- pointer to outer node (master), which is
If this is SELECT_LEX_UNIT
- pointer to outer select_lex.
If this is SELECT_LEX
- pointer to outer SELECT_LEX_UNIT.
- pointer to inner objects (slave), which is either:
If this is an SELECT_LEX_UNIT:
- first query block that belong to this query expression. //注意这里表示的"第一个子",如果有多个"子",怎么找呢?------呵呵呵呵,别忘了有"邻居"
If this is an SELECT_LEX
- first query expression that belong to this query block (subqueries).
- list of all SELECT_LEX objects (link_next/link_prev) //注意和"next/prev"的区别.首先,这里的数据类型只是SELECT_LEX,即只能是"查询块", 可以表示"多个派生表".
This is to be used for things like derived tables creation, where we
go through this list and create the derived tables.
//其他一些变量的作用(应对的是其他一些情况),如fake_select_lex,不多解释了,可自己深入探索
If query expression contain several query blocks (UNION now,
INTERSECT etc later) then it has a special select_lex called
fake_select_lex. It used for storing global parameters (like ORDER BY,
LIMIT) and executing union.
Subqueries used in global ORDER BY clause will be attached to this
fake_select_lex, which will allow them to correctly resolve fields of
the containing UNION and outer selects.
For example for following query: //用例子说明,前前后后得多对照阅读,才可切实理解其关系
select *
from table1
where table1.field IN (select * from table1_1_1 union
select * from table1_1_2)
union
select *
from table2
where table2.field=(select (select f1 from table2_1_1_1_1
where table2_1_1_1_1.f2=table2_1_1.f3)
from table2_1_1
where table2_1_1.f1=table2.f2)
union
select * from table3;
we will have following structure:
select1: (select * from table1 ...)
select2: (select * from table2 ...)
select3: (select * from table3)
select1.1.1: (select * from table1_1_1)
...
main unit
fake0 //第1层
select1 select2 select3 //使用UNION联接起来的3个查询语句,是例子的主体结构. select3结构简单,没有子部分
|^^ |^
s||| ||master
l||| |+---------------------------------+
a||| +---------------------------------+|
v|||master slave || //master左边的2个竖线标示select1是2个unit的master
e||+-------------------------+ ||
V| neighbor | V| //本行的V表示的是向下的箭头
unit1.1<+==================>unit1.2 unit2.1
fake1.1 //第2层,第1层的子层
select1.1.1 select 1.1.2 select1.2.1 select2.1.1 //select1.2.1不存在,因该是MySQL原版文字错误。 如果是要对应unit1.2,则应该是“select 1.1.2”,中间不应该有空格
|^
||
V|
unit2.1.1.1
select2.1.1.1.1
relation in main unit will be following:
(bigger picture for:
main unit
fake0
select1 select2 select3
in the above picture)
main unit
|^^^^|fake_select_lex
|||||+--------------------------------------------+
||||+--------------------------------------------+|
|||+------------------------------+ ||
||+--------------+ | ||
slave||master | | ||
V| neighbor | neighbor | master|V //邻居关系
select1<========>select2<========>select3 fake0
//如上,main unit是一个主体(表达式,即SELECT_LEX_UNIT结构对象),其slave是select1,select1如果邻居以双向链表的形式指向其兄弟select2/select3。而select1的master就是main unit。
//如下说明了词法解析的过程。
//1 主体是main unit,SELECT_LEX_UNIT结构对象,一切的master。其自己的master为NULL. 从代码里看,形式上,存在一个"根"对象,用LEX表示.可以通过sql_lex.h中的"struct LEX: public Query_tables_list"了解. 代码实现的时候,先是生成一个查询块,然后生成此查询块对应的查询表达式,让查询表达式作为查询块的master,让查询块作为查询表达式的slave.
//2 过程为,先找到slecet1查询块(SELECT_LEX),然后继续找union标示的下一个查询块select2,同理找select3
//3 当selelct3之后没有其他并列的查询块,开始递归向下,找子查询块。selelct3么有子,开始返回到上一个查询块select2,然后找select2的子。这样找到了select2.1.1和select2.1.2。并为select2.1.2找到了子select2.1.1.1.1,没有其它子后,然后继续回退找上一个union的子select1,为select1找到子select1.1.1和selelct。
//4 注意找每个查询块的时候,其本身先是一个查询表达式,然后才是一个查询块。切记。
list of all select_lex will be following (as it will be constructed by
parser):
select1->select2->select3->select2.1.1->select 2.1.2->select2.1.1.1.1-+
|
+---------------------------------------------------------------------+
|
+->select1.1.1->select1.1.2
*/