Here is a code snippet:
int a = 1
If we want to do anything with this program (execute it, analyze it, format it, and so on), we first have to transform it into a data structure that we can work with.
The first step is lexical analysis, also called tokenization: identifying the minimal sequences of characters (tokens) that carry some meaning.
      int       a        =       1
// |       | |      | |     | |     |
// \_'int'_/ \_name_/ \_'='_/ \_num_/
Tokens represent the alphabet of a language: they can't be broken into smaller parts, and they can be combined to form a program.
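To make tokenization concrete, here is a minimal sketch in Python. The token kinds (KEYWORD, NAME, NUM, EQUALS, SEMI) and the tokenize helper are names assumed for this illustration; they do not come from any particular compiler.

```python
import re

# Hypothetical token kinds for a tiny C-like language (illustrative names only).
# Order matters: the keyword pattern must be tried before the generic name pattern.
TOKEN_SPEC = [
    ("KEYWORD", r"\bint\b"),
    ("NAME", r"[A-Za-z_]\w*"),
    ("NUM", r"\d+"),
    ("EQUALS", r"="),
    ("SEMI", r";"),
    ("SKIP", r"\s+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{kind}>{pattern})" for kind, pattern in TOKEN_SPEC))


def tokenize(source):
    """Split source into (kind, text) pairs, dropping whitespace."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at offset {pos}")
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


print(tokenize("int a = 1"))
# prints: [('KEYWORD', 'int'), ('NAME', 'a'), ('EQUALS', '='), ('NUM', '1')]
```

Note that whitespace is already thrown away at this stage: the token list only keeps the pieces that carry meaning.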
Intuitively, not every combination of valid tokens produces a valid program. Consider, for instance:
const const a b = = 1 > 2
This is not a valid program. So there must be something that dictates the valid ways of combining tokens; this is usually called a grammar. A grammar defines the relationships between tokens by grouping them into intermediate structures that can be recursively combined.
We can also say that a grammar describes the syntax of a language.
For instance:
   int       a        =         1         ;
// |  |             |   |                ||
// |  \_Identifier__/   \_NumericLiteral_/|
// |         |                  |
// |  \________VariableDeclarator________/
// |                   |
// \_________VariableDeclaration__________/
Here the grammar tells us that this is a valid VariableDeclaration, which is composed of one (or more) VariableDeclarators. Each VariableDeclarator in turn has a left-hand side that is an Identifier and a right-hand side that can be any expression; in this case it is simply a NumericLiteral.
You may have noticed that these structures are arranged in a tree, and since they represent the syntax of a language it is natural to call them Syntax Trees.
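As a rough sketch of how a grammar turns tokens into such a tree, here is a small recursive-descent parser in Python. Everything in it is an assumption made for illustration: the grammar written in the comments, the Parser class, the parse helper, and the dictionary shape of the nodes. The right-hand side is deliberately limited to a number or a name, whereas a real grammar would allow any expression.

```python
import re

TOKEN_RE = re.compile(r"\s*(?:(?P<NUM>\d+)|(?P<NAME>[A-Za-z_]\w*)|(?P<PUNCT>[=;,]))")


def tokenize(source):
    """Return (kind, text) pairs; the word 'int' is reclassified as a keyword."""
    tokens, pos = [], 0
    source = source.rstrip()
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        kind, text = match.lastgroup, match.group(match.lastgroup)
        if kind == "NAME" and text == "int":
            kind = "KEYWORD"
        tokens.append((kind, text))
        pos = match.end()
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def expect(self, kind, text=None):
        """Consume the next token if it matches, otherwise fail."""
        if self.pos >= len(self.tokens):
            raise SyntaxError(f"expected {text or kind}, found end of input")
        tok_kind, tok_text = self.tokens[self.pos]
        if tok_kind != kind or (text is not None and tok_text != text):
            raise SyntaxError(f"expected {text or kind}, found {tok_text!r}")
        self.pos += 1
        return tok_text

    def peek_text(self):
        return self.tokens[self.pos][1] if self.pos < len(self.tokens) else None

    def variable_declaration(self):
        # VariableDeclaration := 'int' VariableDeclarator (',' VariableDeclarator)* ';'
        self.expect("KEYWORD", "int")
        declarators = [self.variable_declarator()]
        while self.peek_text() == ",":
            self.expect("PUNCT", ",")
            declarators.append(self.variable_declarator())
        self.expect("PUNCT", ";")
        if self.pos != len(self.tokens):
            raise SyntaxError("unexpected tokens after declaration")
        return {"type": "VariableDeclaration", "declarations": declarators}

    def variable_declarator(self):
        # VariableDeclarator := Identifier '=' Expression
        name = self.expect("NAME")
        self.expect("PUNCT", "=")
        return {
            "type": "VariableDeclarator",
            "id": {"type": "Identifier", "name": name},
            "init": self.expression(),
        }

    def expression(self):
        # Expression := NumericLiteral | Identifier  (deliberately tiny)
        if self.pos < len(self.tokens) and self.tokens[self.pos][0] == "NUM":
            return {"type": "NumericLiteral", "value": int(self.expect("NUM"))}
        return {"type": "Identifier", "name": self.expect("NAME")}


def parse(source):
    return Parser(tokenize(source)).variable_declaration()


print(parse("int a = 1;"))

try:
    parse("int int a = = 1;")  # every token is valid, but the combination is not
except SyntaxError as error:
    print("rejected:", error)
# prints: rejected: expected NAME, found 'int'
```

The second call shows the point made earlier: the tokenizer accepts every piece of the input, and it is the grammar that rejects the combination.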
We have only one question left: why Abstract? Consider these variants of the same declaration:
int a = 42;

int a =
  42;

int
a
=
42;
What is the syntax tree of these variants? It turns out that it's the same as in the original example (apart from the literal's value). Things like whitespace, formatting, and semicolons are usually ignored in these tree representations because they don't generally carry useful information.
And that's why these trees are called Abstract: they are not a faithful, concrete representation of the original source code, but an abstraction that discards some details in order to focus on the syntactic structure.
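The claim that differently formatted sources share one tree can be checked with a small Python sketch. The parse_declaration helper and its node layout are hypothetical, and it only understands the single form `int <name> = <number>;`. It also leans on re.findall, which silently skips anything the pattern does not match, so it is a demonstration rather than a robust parser.

```python
import re


def parse_declaration(source):
    """Parse `int <name> = <number>;` into a small tree (hypothetical node shapes)."""
    # findall keeps only the meaningful pieces; whitespace and newlines vanish here.
    tokens = re.findall(r"[A-Za-z_]\w*|\d+|[=;]", source)
    if len(tokens) != 5 or tokens[0] != "int" or tokens[2] != "=" or tokens[4] != ";":
        raise SyntaxError(f"not a simple int declaration: {source!r}")
    name, value = tokens[1], tokens[3]
    if not name.isidentifier() or not value.isdigit():
        raise SyntaxError(f"not a simple int declaration: {source!r}")
    # The tree records structure only: no spaces, no line breaks, no semicolon.
    return {
        "type": "VariableDeclaration",
        "declarations": [
            {
                "type": "VariableDeclarator",
                "id": {"type": "Identifier", "name": name},
                "init": {"type": "NumericLiteral", "value": int(value)},
            }
        ],
    }


variants = [
    "int a = 42;",
    "int a =\n  42;",
    "int\na\n=\n42;",
]
trees = [parse_declaration(variant) for variant in variants]
print(trees[0] == trees[1] == trees[2])
# prints: True
```

Going the other way is not possible: from the tree alone you cannot tell which of the three layouts the programmer originally typed, which is exactly the detail the abstraction discards.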
Reference:
https://blog.buildo.io/a-tour-of-abstract-syntax-trees-906c0574a067