These notes use 编译原理及实践 (Compiler Construction: Principles and Practice) as the textbook.
Chapter Six. Semantic Analysis
attributes and attribute grammars
$X.a$: the value of the attribute $a$ associated with $X$, where $X$ is a grammar symbol and $a$ is an attribute associated with $X$.
syntax-directed semantics: attributes are associated directly with the grammar symbols of the language.
Given attributes $a_{1}, a_{2}, \ldots, a_{k}$, for each grammar rule $X_{0}\rightarrow X_{1} X_{2} \ldots X_{n}$ (where $X_{0}$ is a nonterminal), the values of the attributes $X_{i}.a_{j}$ of each grammar symbol $X_{i}$ are related to the values of the attributes of the other symbols in the rule.
An attribute grammar for attributes $a_{1}, a_{2}, \ldots, a_{k}$ is the collection of all attribute equations, or semantic rules, of the following form, for all the grammar rules of the language:

$$X_{i}.a_{j}=f_{ij}(X_{0}.a_{1},\ldots,X_{0}.a_{k}, X_{1}.a_{1},\ldots,X_{1}.a_{k},\ldots,X_{n}.a_{1},\ldots,X_{n}.a_{k})$$

where $f_{ij}$ is a mathematical function of its arguments.
dependency graph
The dependency graph of a string (sentence) is the union of the dependency graphs of the grammar rule choices representing each (nonleaf) node of the parse tree of the string.
Each attribute equation

$$X_{i}.a_{j}=f_{ij}(X_{0}.a_{1},\ldots,X_{0}.a_{k}, X_{1}.a_{1},\ldots,X_{1}.a_{k},\ldots,X_{n}.a_{1},\ldots,X_{n}.a_{k})$$

contributes an edge from each node $X_{m}.a_{k}$ appearing on its right-hand side to the node $X_{i}.a_{j}$, expressing the dependency of $X_{i}.a_{j}$ on $X_{m}.a_{k}$.
Parse tree method: the dependency graph is constructed from the specific parse tree at compile time. This adds complexity to compilation and requires circularity detection.
Rule-based method: fix an order for attribute evaluation at compiler construction time. This depends on an analysis of the attribute equations, or semantic rules.
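The parse tree method can be sketched in Python as a depth-first topological sort over a dependency graph. This is only an illustration under stated assumptions: the graph is assumed to be already built and given as a dict, and the names `evaluation_order`, `deps`, and the attribute labels are hypothetical, not taken from the textbook.

```python
def evaluation_order(deps):
    """Return attribute instances in an order that respects `deps`.

    `deps` maps each attribute instance (a string such as "number1.val")
    to the list of instances its equation reads. Raises ValueError if
    the dependencies are circular.
    """
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, in progress, finished
    state = {}
    order = []

    def visit(node):
        color = state.get(node, WHITE)
        if color == BLACK:
            return
        if color == GRAY:
            # We came back to a node that is still being processed.
            raise ValueError(f"circular dependency through {node}")
        state[node] = GRAY
        for source in deps.get(node, []):
            visit(source)  # everything `node` reads is evaluated first
        state[node] = BLACK
        order.append(node)

    for node in deps:
        visit(node)
    return order


# Illustrative dependencies for a rule  number -> number digit.
deps = {
    "number1.val": ["number2.val", "digit.val"],
    "number2.val": [],
    "digit.val": [],
}
order = evaluation_order(deps)
print(order)  # ['number2.val', 'digit.val', 'number1.val']
```

Evaluating the attributes in the returned order guarantees that every value is available before an equation reads it.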
Classification of the attributes:
- synthesized attributes
- inherited attributes
Synthesized attributes:
An attribute is synthesized:
- if all its dependencies point from child to parent in the parse tree.
- given a grammar rule $A\rightarrow X_{1}X_{2}\ldots X_{n}$, the only associated attribute equation with an $A.a$ on the left-hand side is of the form: $A.a = f(X_{1}.a_{1},\ldots,X_{1}.a_{k},\ldots,X_{n}.a_{1},\ldots,X_{n}.a_{k})$
S-attributed grammar:
An attribute grammar in which all the attributes are synthesized
The attribute values of an S-attributed grammar can be computed by a single bottom-up, or post-order, traversal of the parse or syntax tree.
Procedure PostEval(T: treenode);
Begin
  For each child C of T do
    PostEval(C);
  Compute all synthesized attributes of T;
End;
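The PostEval pseudocode can be made concrete for one small case. The sketch below assumes a syntax tree for integer expressions with a single synthesized attribute `val`; the `Node` class and the node kinds are illustrative choices, not part of the original notes.

```python
class Node:
    """Syntax tree node; `val` is the synthesized attribute."""

    def __init__(self, kind, children=(), value=None):
        self.kind = kind            # "num", "+", or "*"
        self.children = list(children)
        self.value = value          # literal value for "num" leaves
        self.val = None             # filled in by post_eval


def post_eval(t):
    # Visit every child first, then compute the attribute of t itself.
    for c in t.children:
        post_eval(c)
    if t.kind == "num":
        t.val = t.value
    elif t.kind == "+":
        t.val = t.children[0].val + t.children[1].val
    elif t.kind == "*":
        t.val = t.children[0].val * t.children[1].val
    else:
        raise ValueError(f"unknown node kind {t.kind!r}")


# Tree for (3 + 4) * 5
tree = Node("*", [Node("+", [Node("num", value=3), Node("num", value=4)]),
                  Node("num", value=5)])
post_eval(tree)
print(tree.val)  # 35
```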
An attribute that is not synthesized is called an inherited attribute.
Inherited attributes can be computed by a preorder traversal, or a combined preorder/inorder traversal, of the parse or syntax tree, as in the following pseudocode:
Procedure PreEval(T: treenode);
Begin
  For each child C of T do
    Compute all inherited attributes of C;
    PreEval(C);
End;
Inherited attributes:
are computed in preorder and are often treated as parameters of the call.
Synthesized attributes:
are computed in postorder and are often treated as returned values of the call.
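A small Python sketch of this parameter/return-value convention, using based numbers such as `345o` (octal) as a hypothetical example. The grammar `num -> num digit | digit`, the suffix character `o`, and all function names are assumptions made for illustration.

```python
def eval_based_num(digits, basechar):
    # The base is determined at the root and passed down the tree.
    base = 8 if basechar == "o" else 10
    return eval_num(digits, base)


def eval_num(digits, base):
    """num -> num digit | digit.

    `base` is the inherited attribute (a parameter);
    the return value is the synthesized attribute val.
    """
    if len(digits) == 1:
        return eval_digit(digits, base)
    return eval_num(digits[:-1], base) * base + eval_digit(digits[-1], base)


def eval_digit(d, base):
    v = int(d)
    if v >= base:
        raise ValueError(f"digit {d} is not valid in base {base}")
    return v


print(eval_based_num("345", "o"))  # 229
print(eval_based_num("128", "d"))  # 128
```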
The computation of attributes during parsing:
- Which attributes can be computed successfully at the same time as parsing depends on the power and properties of the parsing method employed.
- All the major parsing methods process the input program from left to right (the first L in LL and LR).
- This requires that the attributes be capable of evaluation by a left-to-right traversal of the parse tree (synthesized attributes will always be OK).
L-attributed:
An attribute grammar for attributes $a_{1}, \ldots, a_{k}$ is L-attributed if, for each inherited attribute $a_{j}$ and each grammar rule

$$X_{0} \rightarrow X_{1}X_{2}\ldots X_{n}$$

the associated equations for $a_{j}$ are all of the form:

$$X_{i}.a_{j}=f_{ij}(X_{0}.a_{1},\ldots,X_{0}.a_{k}, X_{1}.a_{1},\ldots,X_{1}.a_{k},\ldots,X_{i-1}.a_{1},\ldots,X_{i-1}.a_{k})$$

That is, the value of $a_{j}$ at $X_{i}$ can depend only on attributes of the symbols $X_{0}, \ldots, X_{i-1}$ that occur to the left of $X_{i}$ in the rule.
Given an L-attributed grammar in which the inherited attributes do not depend on the synthesized attributes:
- Top-down parser: a recursive-descent parser can evaluate all the attributes by turning the inherited attributes into parameters and synthesized attributes into returned values.
- Bottom-up parser: LR parsers are suited to handling primarily synthesized attributes; inherited attributes are difficult for them to handle.
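For the top-down case, here is a minimal recursive-descent sketch in Python. It assumes the illustrative grammar `S -> ( S ) S | ε`, an inherited attribute `level` (the current nesting level, which depends only on the parent's `level`), and a synthesized attribute giving the deepest level reached. The function names are hypothetical.

```python
def max_depth(text):
    """Parse balanced parentheses and return the maximum nesting depth."""
    pos = 0

    def s(level):
        # `level` is inherited (a parameter); the deepest level seen in
        # this subtree is synthesized (the return value).
        nonlocal pos
        if pos < len(text) and text[pos] == "(":
            pos += 1
            inner = s(level + 1)          # inherited value passed down
            if pos >= len(text) or text[pos] != ")":
                raise SyntaxError("expected ')'")
            pos += 1
            rest = s(level)
            return max(inner, rest)
        return level                      # S -> epsilon

    depth = s(0)
    if pos != len(text):
        raise SyntaxError(f"unexpected character at position {pos}")
    return depth


print(max_depth("(()())()"))  # 2
```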
computing synthesized attributes during LR parsing
Value stack: stores synthesized attributes and is manipulated in parallel with the parsing stack.
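The sketch below illustrates only the value stack side of this idea. It assumes the sequence of shift and reduce actions has already been chosen by an LR parser and simply replays it; `run_actions` and the action encoding are hypothetical, and no real parser generator API is implied.

```python
def run_actions(actions):
    """Replay LR parser actions while maintaining a value stack.

    Each action is ("shift", token_value) or ("reduce", rhs_length, rule),
    where `rule` computes the synthesized attribute of the left-hand side
    from the attribute values of the right-hand side symbols.
    """
    valstack = []
    for action in actions:
        if action[0] == "shift":
            valstack.append(action[1])
        else:
            _, n, rule = action
            start = len(valstack) - n
            args = valstack[start:]      # values of the right-hand side
            del valstack[start:]         # pop them, as the parsing stack does
            valstack.append(rule(*args)) # push the value of the left-hand side
    return valstack


# Actions for "3 + 4" with the grammar  E -> E + T | T,  T -> num.
actions = [
    ("shift", 3),
    ("reduce", 1, lambda v: v),              # T -> num
    ("reduce", 1, lambda v: v),              # E -> T
    ("shift", None),                         # "+" carries no value
    ("shift", 4),
    ("reduce", 1, lambda v: v),              # T -> num
    ("reduce", 3, lambda e, _, t: e + t),    # E -> E + T
]
print(run_actions(actions))  # [7]
```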
inheriting previously computed synthesized attributes during LR parsing
An action associated with a nonterminal on the right-hand side of a rule can make use of the synthesized attributes of the symbols to its left in the rule.
This can be arranged through an ε-production as follows:
$$
\begin{matrix}
A\rightarrow B\,D\,C\\
B\rightarrow \ldots & \{\text{compute } B.s\}\\
D\rightarrow \epsilon & \{saved_{i}= f(valstack[top])\}\\
C\rightarrow \ldots & \{\text{now } saved_{i} \text{ is available}\}
\end{matrix}
$$
Problems:
- This requires the programmer to access the value stack directly during a parse, which may be risky in automatically generated parsers.
- It only works if the position of the previously computed attribute is predictable from the grammar.
The best technique for dealing with inherited attributes in LR parsing:
- use external data structures (symbol table or nonlocal variables) to hold inherited attribute values.
- add ε-productions or embedded actions as in Yacc (which may add parsing conflicts) to allow changes to these data structures to occur at appropriate moments.
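A rough Python sketch of this technique, again replaying parser actions rather than running a real LR parser. It assumes a hypothetical grammar `decl -> type M var-list` with a marker `M -> ε`; the function name `replay_declaration` and the action names are invented for illustration.

```python
def replay_declaration(actions):
    """Replay actions for  decl -> type M var-list,  M -> epsilon."""
    symtab = {}
    current_dtype = None   # external variable holding the inherited dtype
    valstack = []
    for kind, value in actions:
        if kind == "shift":
            valstack.append(value)
        elif kind == "marker":
            # M -> epsilon is reduced right after `type`, so the type's
            # synthesized value is on top of the value stack.
            current_dtype = valstack[-1]
            valstack.append(None)
        elif kind == "declare_id":
            # var -> id: use the externally stored dtype.
            name = valstack.pop()
            symtab[name] = current_dtype
            valstack.append(None)
    return symtab


actions = [
    ("shift", "int"),
    ("marker", None),
    ("shift", "x"),
    ("declare_id", None),
    ("shift", "y"),
    ("declare_id", None),
]
print(replay_declaration(actions))  # {'x': 'int', 'y': 'int'}
```

The later actions never need to know where the type sits on the value stack, since they read only the external variable.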
Modifications to the grammar that do not change the legal strings of the language can make the computation of attributes simpler or more complex.
The properties of attributes depend heavily on the structure of the grammar.
Given an attribute grammar, all inherited attributes can be changed into synthesized attributes by suitable modification of the grammar, without changing the language of the grammar (Knuth [1968]).
Semantic checks refer to properties of identifiers in the program – their scope or type
This requires an environment that stores information about identifiers: the symbol table.
Each entry in the symbol table contains the name of an identifier and additional information: its kind, its type, whether it is a constant, and so on.
The structure of the symbol table:
- linear list
- various search tree structures
- hash tables
Hash collisions:
- a collision resolution scheme is needed when two names hash to the same index.
- separate chaining is preferred.
The process of the hash function:
- Each character of the string (the identifier name) is converted into a nonnegative integer.
- These integers are combined in some way to form a single integer.
- The resulting integer is scaled to the range 0…size-1.
The algorithm of the hash function:
- One simple algorithm: ignore many of the characters and add together only the values of the first few, or the first, middle, and last, characters.
- Another simple method: add up the values of all the characters.
- One good solution: repeatedly use a constant number $\alpha$ as a multiplying factor when adding in the value of the next character.
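A minimal Python sketch combining the multiplying-factor hash with separate chaining. The values `alpha=16` and `size=211` are illustrative assumptions, and the class and method names are hypothetical.

```python
class SymbolTable:
    """Hash table with separate chaining (one list per bucket)."""

    def __init__(self, size=211, alpha=16):
        self.size = size
        self.alpha = alpha
        self.buckets = [[] for _ in range(size)]

    def hash(self, name):
        h = 0
        for ch in name:
            # Multiply the running value by alpha, add the next character,
            # and keep the result in the range 0..size-1.
            h = (self.alpha * h + ord(ch)) % self.size
        return h

    def insert(self, name, info):
        # New entries go to the front of the chain.
        self.buckets[self.hash(name)].insert(0, (name, info))

    def lookup(self, name):
        for entry_name, info in self.buckets[self.hash(name)]:
            if entry_name == name:
                return info
        return None


table = SymbolTable()
table.insert("count", "int")
print(table.hash("ab"), table.hash("ba"))  # 173 188
print(table.lookup("count"))               # int
```

Unlike simply adding up character values, this hash gives `ab` and `ba` different indices, because the position of each character affects the result.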
four basic kinds of declarations:
- constant declarations,
- type declarations,
- variable declarations
- procedure/function declarations.
Declarations
The strategies:
- use one symbol table to hold the names from all the different kinds of declarations.
- use a different symbol table for each kind of declaration.
- associate separate symbol tables with different regions of a program and link them together according to the semantic rules of the language.
two rules:
- declaration before use
- the most closely nested rule for block structure
Another solution: build a new symbol table for each scope and link the tables together from inner to outer scopes.
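One way to picture linked per-scope tables is the Python sketch below. The `Scope` class and its methods are hypothetical names; lookup walks from the innermost scope outward, which realizes the most closely nested rule.

```python
class Scope:
    """One symbol table per scope, linked to the enclosing scope."""

    def __init__(self, parent=None):
        self.names = {}
        self.parent = parent   # link from inner scope to outer scope

    def insert(self, name, info):
        self.names[name] = info

    def lookup(self, name):
        scope = self
        while scope is not None:
            if name in scope.names:
                return scope.names[name]   # innermost declaration wins
            scope = scope.parent
        return None


global_scope = Scope()
global_scope.insert("x", "int")
global_scope.insert("y", "float")

inner = Scope(parent=global_scope)
inner.insert("x", "char")

print(inner.lookup("x"))         # char
print(inner.lookup("y"))         # float
print(global_scope.lookup("x"))  # int
```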
Interaction of same-level declarations algorithm:
- Lookup before each insert.
- Determine by some mechanism whether any preexisting declarations with the same name are at the same level or not.
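A short sketch of the lookup-before-insert check, assuming each table entry records the nesting level at which the name was declared. The function `declare` and the table layout are illustrative assumptions.

```python
def declare(table, name, info, level):
    """Insert `name` at nesting `level`, rejecting same-level duplicates.

    `table` maps each name to a list of (level, info) pairs,
    with the innermost declaration last.
    """
    entries = table.setdefault(name, [])
    # Look up before inserting: a clash matters only at the same level.
    if entries and entries[-1][0] == level:
        raise NameError(f"{name!r} is already declared at level {level}")
    entries.append((level, info))


table = {}
declare(table, "x", "int", level=0)
declare(table, "x", "char", level=1)   # allowed: hides the outer x
```

A second `declare(table, "x", ..., level=1)` would raise `NameError`, since an `x` already exists at that level.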
Different kinds of declaration:
- sequential declaration: each declaration is added to the symbol table as it is processed.
- collateral declaration:
  - declarations are not added immediately to the existing symbol table
  - they are accumulated in a new table (or temporary structure)
  - they are then added to the existing table after all declarations have been processed.
- recursive declaration: declarations may refer to themselves or to each other.
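The difference between sequential and collateral declaration can be seen in a small Python sketch. It assumes each declaration is a pair of a name and an initializer function that reads the environment; this encoding and the function names are hypothetical.

```python
def declare_sequential(env, decls):
    env = dict(env)
    for name, init in decls:
        # Each declaration is added as soon as it is processed,
        # so later initializers see earlier ones in the same group.
        env[name] = init(env)
    return env


def declare_collateral(env, decls):
    new = {}
    for name, init in decls:
        # Initializers see only the existing table.
        new[name] = init(env)
    result = dict(env)
    result.update(new)   # added only after all declarations are processed
    return result


outer = {"i": 1}
# Inner block declaring  i = 2, j = i + 1
decls = [("i", lambda env: 2), ("j", lambda env: env["i"] + 1)]

print(declare_sequential(outer, decls)["j"])  # 3
print(declare_collateral(outer, decls)["j"])  # 2
```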
type checking
Languages fall into two general groups:
- those that permit the direct use of recursion in type declarations.
- those that do not permit the direct use of recursion in type declarations.
Classification of type equivalence:
- Structural equivalence
- Name equivalence
- Declaration equivalence
Structural equivalence:
two types are the same if and only if they have the same structure.
two types are the same if and only if they have syntax trees that are identical in structure.
Two arrays are equivalent if they have the same size and component type.
Two records are equivalent if they have the same components, with the same names, in the same order.
Possible variations:
The size of the array can be ignored.
The components of a structure or union can be in a different order.
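A minimal sketch of a structural equivalence test in Python, assuming non-recursive types represented as nested tuples. The tuple encoding (`("array", size, elem)`, `("record", fields)`) and the name `struct_equal` are assumptions made for illustration. It implements the strict variant, in which array sizes and field order both matter.

```python
def struct_equal(t1, t2):
    """Structural equivalence for non-recursive type expressions."""
    if isinstance(t1, str) or isinstance(t2, str):
        return t1 == t2                       # simple types such as "int"
    if t1[0] != t2[0]:
        return False                          # different type constructors
    if t1[0] == "array":
        _, size1, elem1 = t1
        _, size2, elem2 = t2
        return size1 == size2 and struct_equal(elem1, elem2)
    if t1[0] == "record":
        fields1, fields2 = t1[1], t2[1]
        return (len(fields1) == len(fields2) and
                all(n1 == n2 and struct_equal(f1, f2)
                    for (n1, f1), (n2, f2) in zip(fields1, fields2)))
    raise ValueError(f"unknown type constructor {t1[0]!r}")


a = ("array", 10, "int")
b = ("array", 10, "int")
c = ("array", 20, "int")
r1 = ("record", (("x", "int"), ("y", a)))
r2 = ("record", (("x", "int"), ("y", b)))

print(struct_equal(a, b), struct_equal(a, c), struct_equal(r1, r2))
# True False True
```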
Name equivalence:
Two type expressions are equivalent if and only if they are either the same simple type or the same type name.
Type expressions other than simple types or type names may still be allowed in variable declarations or as subexpressions of type expressions.
In that case a type expression may have no explicit name, and the compiler has to generate an internal name for it that is different from any other name.
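The internal-name idea can be sketched in Python as follows. This assumes simple types and type names are strings and every other type expression is anonymous; the `$t` naming scheme and the function names are invented for illustration.

```python
import itertools

_counter = itertools.count(1)


def type_name_of(type_expr):
    """Return the name used for name equivalence.

    Simple types and type names are strings and name themselves.
    Any other type expression gets a fresh internal name that
    differs from every other name.
    """
    if isinstance(type_expr, str):
        return type_expr
    return f"$t{next(_counter)}"


def name_equal(name1, name2):
    return name1 == name2


# x: array [10] of int;  y: array [10] of int
x_type = type_name_of(("array", 10, "int"))
y_type = type_name_of(("array", 10, "int"))
print(name_equal(x_type, y_type))                          # False
print(name_equal(type_name_of("T"), type_name_of("T")))    # True
```

Under this scheme two variables declared with identical but anonymous array types are not equivalent, while two variables declared with the same type name are.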
Declaration equivalence:
weaker version of name equivalence
Overloading: the same operator name is used for two different operations
type conversion and coercion:
Many languages allow arithmetic expressions of mixed type.
There are two approaches a language can take to such conversions:
- Require the programmer to supply a conversion function (Modula-2).
- Have the type checker supply the conversion automatically (C); this is called coercion.
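A sketch of the second approach in Python: a type checker for a binary arithmetic operator that inserts an implicit int-to-float conversion. The type names, the conversion label `int_to_float`, and the function `check_binary` are illustrative assumptions, not C's actual conversion rules.

```python
def check_binary(op, left_type, right_type):
    """Type-check `left op right`.

    Returns (result_type, left_conversion, right_conversion), where a
    conversion is None or the name of a coercion the checker inserts.
    """
    numeric = ("int", "float")
    if left_type not in numeric or right_type not in numeric:
        raise TypeError(f"operator {op!r} needs numeric operands")
    if left_type == right_type:
        return left_type, None, None
    # Mixed int/float: coerce the int operand to float.
    if left_type == "int":
        return "float", "int_to_float", None
    return "float", None, "int_to_float"


print(check_binary("+", "int", "float"))
# ('float', 'int_to_float', None)
```

Under the first approach the checker would instead raise an error on mixed operands and leave the conversion call to the programmer.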
Polymorphic typing