These notes follow the textbook 编译原理及实践 (Compiler Construction: Principles and Practice).
Chapter Five. Bottom-Up Parsing
LR grammars (the input is scanned from Left to right, and the parse traces out a Rightmost derivation in reverse)
shift-reduce parsers
action:
- Shift : Shift a terminal from the front of the input to the top of the stack.
- Reduce: Reduce a string $\alpha$ at the top of the stack to a nonterminal $A$, given the BNF choice $A \rightarrow \alpha$.
Because of these two actions, a bottom-up parser is also called a shift-reduce parser.
One further feature of bottom-up parsers: grammars are always augmented with a new start symbol.
If $S$ is the start symbol, a new start symbol $S'$ is added to the grammar, with the single production $S' \rightarrow S$.
LR(0) item
NFA of LR(0):
- If X is a token or a nonterminal, the item can be written as $A \rightarrow \alpha \cdot X \eta$, and there is a transition on X from this item to the item $A \rightarrow \alpha X \cdot \eta$.
- If X is a token, then this transition corresponds to a shift of X from the input to the top of the stack during a parse.
- If X is a nonterminal, X will never appear as an input symbol. Such a transition still corresponds to pushing X onto the stack during a parse, but this can only occur during a reduction by a production $X \rightarrow \beta$. For this reason the item $A \rightarrow \alpha \cdot X \eta$ also has $\epsilon$-transitions to every initial item $X \rightarrow \cdot \beta$.
DFA of LR(0)
LR(0) Parsing Algorithm:
Let s be the current state (at the top of the parsing stack). Then actions are defined as follows:
- If state s contains any item of the form $A \rightarrow \alpha \cdot X \beta$, where X is a terminal, then the action is to shift the current input token onto the stack. If this token is X, the new state to be pushed is the state containing the item $A \rightarrow \alpha X \cdot \beta$. If the token is not X, an error is declared.
- If state s contains any complete item (an item of the form $A \rightarrow \gamma \cdot$), then the action is to reduce by the rule $A \rightarrow \gamma$.
- To carry out the reduction, remove $\gamma$ and its states from the stack and back up to the state from which the construction of $\gamma$ began. This state must contain an item of the form $B \rightarrow \alpha \cdot A \beta$. Push $A$ onto the stack, followed by the state containing the item $B \rightarrow \alpha A \cdot \beta$.
- A reduction by the rule $S' \rightarrow S$, where $S'$ is the augmented start symbol, is equivalent to acceptance if the input is empty, and to an error if the input is not empty.
A grammar is LR(0) if and only if each state of the DFA is one of the following:
- a shift state (a state containing only "shift" items), or
- a reduce state containing a single complete item.
LR(0) Parsing Table
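The rows of the table are labeled with the states of the DFA. Each row records whether the state is a shift or a reduce state, the rule to use for a reduction, the next state for each input token, and the Goto state for each nonterminal.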
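The parsing algorithm and table above can be sketched in C for one small grammar. This is only an illustration under stated assumptions: the grammar $A' \rightarrow A$, $A \rightarrow ( A ) \mid a$, the state numbering, and the names `lr0_parse`, `shift_to` and `goto_A` are chosen for this sketch and are not from the notes. The stack holds states only, since the grammar symbols can be recovered from them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { MAX_STACK = 256 };

/* Every LR(0) state is either a shift state or a reduce state. */
typedef enum { SHIFT, REDUCE_A_TO_a, REDUCE_A_TO_PAREN, ACCEPT } Kind;

/* Assumed DFA for A' -> A, A -> ( A ) | a (hypothetical state numbering):
   0: start, 1: A' -> A., 2: A -> a., 3: A -> (.A), 4: A -> (A.), 5: A -> (A). */
static const Kind kind[6] = {
    SHIFT, ACCEPT, REDUCE_A_TO_a, SHIFT, SHIFT, REDUCE_A_TO_PAREN
};

/* Transition on a terminal; -1 means there is no transition (error). */
static int shift_to(int state, char tok) {
    switch (state) {
    case 0: case 3:
        if (tok == '(') return 3;
        if (tok == 'a') return 2;
        return -1;
    case 4:
        return tok == ')' ? 5 : -1;
    default:
        return -1;
    }
}

/* Goto transition on the nonterminal A; -1 means none. */
static int goto_A(int state) {
    if (state == 0) return 1;
    if (state == 3) return 4;
    return -1;
}

/* Returns true if input is a sentence of the grammar. */
bool lr0_parse(const char *input) {
    int stack[MAX_STACK];
    int top = 0;
    size_t pos = 0;
    stack[0] = 0;
    for (;;) {
        int s = stack[top];
        switch (kind[s]) {
        case SHIFT: {
            int next = shift_to(s, input[pos]);
            if (next < 0) return false;              /* token is not X: error */
            if (top + 1 >= MAX_STACK) return false;  /* stack overflow */
            stack[++top] = next;
            pos++;
            break;
        }
        case REDUCE_A_TO_a:
        case REDUCE_A_TO_PAREN: {
            int len = (kind[s] == REDUCE_A_TO_a) ? 1 : 3;  /* length of gamma */
            assert(top >= len);
            top -= len;                       /* back up to where gamma began */
            int next = goto_A(stack[top]);
            if (next < 0) return false;
            stack[++top] = next;              /* push the Goto state for A */
            break;
        }
        case ACCEPT:
            return input[pos] == '\0';        /* accept only on empty input */
        }
    }
}
```

Calling `lr0_parse("((a))")` returns true, while `lr0_parse("(a")` returns false because no shift is possible on the end of the input.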
SLR(1) Parsing
algorithm:
- If state s contains any item of the form $A \rightarrow \alpha \cdot X \beta$, where X is a terminal, and X is the next token in the input string, then the action is to shift the current input token onto the stack, and the new state to be pushed on the stack is the state containing the item $A \rightarrow \alpha X \cdot \beta$.
- If state s contains the complete item $A \rightarrow \gamma \cdot$, and the next token in the input string is in $Follow(A)$, then the action is to reduce by the rule $A \rightarrow \gamma$.
- A reduction by the rule $S' \rightarrow S$, where $S'$ is the augmented start symbol, is equivalent to acceptance. This will happen only if the next input token is \$.
- In all other reductions, remove the string $\gamma$ and all of its corresponding states from the parsing stack, and back up in the DFA to the state from which the construction of $\gamma$ began.
- This state must contain an item of the form $B \rightarrow \alpha \cdot A \beta$. Push $A$ onto the stack, and push the state containing the item $B \rightarrow \alpha A \cdot \beta$.
- If the next input token is such that neither of the above two cases (shift or reduce) applies, an error is declared.
A grammar is SLR(1) if and only if, for any state $s$, the following two conditions are satisfied:
- For any item $A \rightarrow \alpha \cdot X \beta$ in $s$ with $X$ a terminal, there is no complete item $B \rightarrow \gamma \cdot$ in $s$ with $X \in Follow(B)$.
- For any two complete items $A \rightarrow \alpha \cdot$ and $B \rightarrow \beta \cdot$ in $s$, $Follow(A) \cap Follow(B)$ is empty.
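The two conditions can be checked state by state. The following is a minimal sketch, assuming that token sets are stored as bitmasks and that each state has already been summarized by its shift tokens and by the Follow sets of its complete items. The names `TokenSet`, `StateSummary`, `state_is_slr1` and the `TOK_*` constants are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* A set of terminals as a bitmask: one bit per terminal. */
typedef unsigned TokenSet;

/* Hypothetical terminals used only for illustration. */
enum { TOK_PLUS = 1 << 0, TOK_N = 1 << 1, TOK_END = 1 << 2, TOK_ELSE = 1 << 3 };

/* Hypothetical summary of one DFA state for the SLR(1) test. */
typedef struct {
    TokenSet shift_tokens;   /* terminals X with an item A -> alpha . X beta */
    size_t n_complete;       /* number of complete items in the state */
    const TokenSet *follow;  /* follow[i] = Follow(left-hand side of item i) */
} StateSummary;

/* Returns true if the state satisfies both SLR(1) conditions. */
bool state_is_slr1(const StateSummary *st) {
    assert(st != NULL);
    TokenSet seen = 0;  /* union of the Follow sets examined so far */
    for (size_t i = 0; i < st->n_complete; i++) {
        if (st->shift_tokens & st->follow[i]) return false;  /* shift-reduce */
        if (seen & st->follow[i]) return false;              /* reduce-reduce */
        seen |= st->follow[i];
    }
    return true;
}
```

Checking each Follow set against the union of the earlier ones is equivalent to the pairwise test, because a set that meets the union must meet at least one of its members. A grammar is SLR(1) when `state_is_slr1` holds for every state of the DFA.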
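An SLR(1) parser may overflow its parsing stack on grammars with a lot of right recursion: with a right-recursive rule the parser has to shift the whole list before it can perform the first reduction, so the stack grows with the length of the input.
Removing right recursion (analogous to removing left recursion):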
- Immediate right recursion with a single recursive choice: $A \rightarrow \alpha A \mid \beta$ is transformed to
$A \rightarrow A' \beta$
$A' \rightarrow A' \alpha \mid \epsilon$
- Immediate right recursion with several choices: $A \rightarrow \alpha_1 A \mid \alpha_2 A \mid \ldots \mid \alpha_n A \mid \beta_1 \mid \beta_2 \mid \ldots \mid \beta_m$ is transformed to
$A \rightarrow A' (\beta_1 \mid \beta_2 \mid \ldots \mid \beta_m)$
$A' \rightarrow A' (\alpha_1 \mid \alpha_2 \mid \ldots \mid \alpha_n) \mid \epsilon$
- General right recursion, for grammars with no $\epsilon$-productions and no cycles. The algorithm:
- Pick an arbitrary order for all nonterminals, say, $A_1, \ldots, A_m$.
- Eliminate all rules of the form $A_i \rightarrow \gamma A_j$ with $j \leqslant i$.
- Afterwards, every step in a possible right-recursive loop can only increase the index, and thus the original index cannot be reached again.
Pseudocode:
for i := 1 to m do
  for j := 1 to i-1 do
    replace each rule Ai -> β Aj by the rule Ai -> β α1 | β α2 | ... | β αk,
    where Aj -> α1 | α2 | ... | αk is the current rule for Aj
  remove, if necessary, immediate right recursion involving Ai
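As a small worked check of the immediate case, take the hypothetical grammar $A \rightarrow a A \mid b$ (so $\alpha = a$ and $\beta = b$), which is transformed to $A \rightarrow A' b$, $A' \rightarrow A' a \mid \epsilon$. Both grammars should derive the same strings, for example $aab$:

```latex
\begin{align*}
\text{original:}    \quad & A \Rightarrow aA \Rightarrow aaA \Rightarrow aab \\
\text{transformed:} \quad & A \Rightarrow A'b \Rightarrow A'ab \Rightarrow A'aab \Rightarrow aab
\end{align*}
```

With the original rule a shift-reduce parser must shift $a$, $a$ and $b$ before the first reduction. With the transformed rules it reduces $A' \rightarrow \epsilon$ at once and then reduces by $A' \rightarrow A' a$ after each $a$, so the stack depth no longer depends on the number of $a$'s.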
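Two kinds of parsing conflicts can occur:
shift-reduce conflicts
reduce-reduce conflicts
For a shift-reduce conflict, the natural disambiguating rule is to always prefer the shift over the reduce. Reduce-reduce conflicts are harder to resolve and often indicate an error in the design of the grammar.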
LR(1) Parsing
An LR(1) item is a pair consisting of an LR(0) item and a lookahead token, written $[A \rightarrow \alpha \cdot \beta, a]$, where the lookahead $a$ is a token that may follow $A$ at this point in the parse.
Transitions of the NFA of LR(1) items:
- Given an LR(1) item $[A \rightarrow \alpha \cdot X \gamma, a]$, where $X$ is any symbol (terminal or nonterminal), there is a transition on $X$ to the item $[A \rightarrow \alpha X \cdot \gamma, a]$.
- Given an LR(1) item $[A \rightarrow \alpha \cdot B \gamma, a]$, where $B$ is a nonterminal, there are $\epsilon$-transitions to items $[B \rightarrow \cdot \beta, b]$ for every production $B \rightarrow \beta$ and for every token $b$ in $First(\gamma a)$.
The start state of the NFA of LR(1) items is the item $[S' \rightarrow \cdot S, \$]$.
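The lookaheads of the $\epsilon$-transitions come from $First(\gamma a)$. The following is a minimal sketch of that computation, assuming the First sets and nullable flags of the nonterminals are already known and that token sets are bitmasks. The symbol encoding, `GrammarInfo`, `example_grammar` and `first_of_gamma_a` are hypothetical names, and the example data describe the grammar $A \rightarrow ( A ) \mid a$.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef unsigned TokenSet;  /* bit t is set when terminal t is in the set */

/* Hypothetical symbol encoding: terminals first, then nonterminals. */
enum { T_LPAREN, T_RPAREN, T_a, T_END, N_TERMINALS,
       NT_A = N_TERMINALS, N_SYMBOLS };

typedef struct {
    TokenSet first[N_SYMBOLS];  /* First sets (used for nonterminals only) */
    bool nullable[N_SYMBOLS];   /* true if the symbol can derive epsilon */
} GrammarInfo;

/* Assumed data for A -> ( A ) | a: First(A) = { (, a }, A is not nullable. */
const GrammarInfo example_grammar = {
    .first = { [NT_A] = (1u << T_LPAREN) | (1u << T_a) },
    .nullable = { false }
};

/* Computes First(gamma a) for a symbol string gamma of length n
   followed by the lookahead terminal a. */
TokenSet first_of_gamma_a(const GrammarInfo *g, const int *gamma,
                          size_t n, int a) {
    assert(g != NULL && a >= 0 && a < N_TERMINALS);
    TokenSet result = 0;
    for (size_t i = 0; i < n; i++) {
        int sym = gamma[i];
        assert(sym >= 0 && sym < N_SYMBOLS);
        if (sym < N_TERMINALS)              /* a terminal ends the scan */
            return result | (1u << sym);
        result |= g->first[sym];
        if (!g->nullable[sym])              /* cannot see past this symbol */
            return result;
    }
    return result | (1u << a);  /* all of gamma can vanish: a is visible */
}
```

For the item $[A \rightarrow ( \cdot A ), \$]$ we have $\gamma = )$ and $a = \$$, so the function returns the set containing only `T_RPAREN`, and the closure adds $[A \rightarrow \cdot ( A ), )]$ and $[A \rightarrow \cdot a, )]$.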
General LR(1) parsing algorithm:
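Let $s$ be the current state (at the top of the parsing stack). Then actions are defined as follows:
- If state $s$ contains any LR(1) item of the form $[A \rightarrow \alpha \cdot X \beta, a]$, where $X$ is a terminal, and $X$ is the next token in the input string, then the action is to shift the current input token onto the stack, and the new state to be pushed on the stack is the state containing the LR(1) item $[A \rightarrow \alpha X \cdot \beta, a]$.
- If state $s$ contains the complete LR(1) item $[A \rightarrow \alpha \cdot, a]$, and the next token in the input string is $a$, then the action is to reduce by the rule $A \rightarrow \alpha$. The reduction itself is carried out as in SLR(1) parsing, and a reduction by $S' \rightarrow S$ is equivalent to acceptance.
- If the next input token is such that neither of the above two cases applies, an error is declared.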
A grammar is LR(1) if and only if, for any state $s$, the following two conditions are satisfied:
- For any item $[A \rightarrow \alpha \cdot X \beta, a]$ in $s$ with $X$ a terminal, there is no item in $s$ of the form $[B \rightarrow \gamma \cdot, X]$ (otherwise there is a shift-reduce conflict).
- There are no two items in $s$ of the form $[A \rightarrow \alpha \cdot, a]$ and $[B \rightarrow \beta \cdot, a]$ (otherwise there is a reduce-reduce conflict).
LALR(1) Parsing
LALR(1) parsing rests on two principles, where the core of a state is the set of LR(0) items obtained by dropping the lookaheads:
- The core of a state of the DFA of LR(1) items is a state of the DFA of LR(0) items.
- Given two states $s_1$ and $s_2$ of the DFA of LR(1) items that have the same core, suppose there is a transition on the symbol $X$ from $s_1$ to a state $t_1$. Then there is also a transition on $X$ from state $s_2$ to a state $t_2$, and the states $t_1$ and $t_2$ have the same core.
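Deciding whether two LR(1) states can be merged comes down to comparing their cores. A minimal sketch, assuming an item is stored as a production index, a dot position and a lookahead; `LR1Item`, `core_contains` and `same_core` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical representation of an LR(1) item. */
typedef struct {
    int production;  /* index of the grammar rule */
    int dot;         /* position of the dot within the right-hand side */
    int lookahead;   /* lookahead token */
} LR1Item;

/* True if some item in items[0..n) has the same LR(0) part as *it. */
static bool core_contains(const LR1Item *items, size_t n, const LR1Item *it) {
    for (size_t i = 0; i < n; i++)
        if (items[i].production == it->production && items[i].dot == it->dot)
            return true;
    return false;
}

/* True if the two states have the same core, i.e. the same set of
   LR(0) items once the lookaheads are ignored. */
bool same_core(const LR1Item *s1, size_t n1, const LR1Item *s2, size_t n2) {
    assert(s1 != NULL && s2 != NULL);
    for (size_t i = 0; i < n1; i++)
        if (!core_contains(s2, n2, &s1[i])) return false;
    for (size_t i = 0; i < n2; i++)
        if (!core_contains(s1, n1, &s2[i])) return false;
    return true;
}
```

An LALR(1) construction would merge every group of states for which `same_core` holds, taking the union of the lookaheads of matching items. The quadratic comparison keeps the sketch short; a real generator would typically sort or hash the cores.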
Computing the DFA of LALR(1) items directly by propagating lookaheads: not covered in these notes.
error recovery
An LR(1) parser can, for example, detect errors earlier than an LALR(1) or SLR(1) parser, and these latter can detect errors earlier than an LR(0) parser. (Here "earlier" means that fewer reductions are performed before the error is reported.)
Good error recovery in bottom-up parsers works by removing symbols from either the parsing stack or the input, or both.
There are three possible alternative actions:
- Pop a state from the stack.
- Successively pop tokens from the input until a token is seen for which we can restart the parse.
- Push a new state onto the stack.
More concretely, one recovery method proceeds as follows:
1. Pop states from the parsing stack until a state is found with nonempty Goto entries.
2. If there is a legal action on the current input token from one of the Goto states, push that state onto the stack and restart the parse. If there are several such states, prefer a shift to a reduce. Among the reduce actions, prefer one whose associated nonterminal is least general.
3. If there is no legal action on the current input token from one of the Goto states, advance the input.
This method can fall into an infinite loop, because step 2 pushes new states onto the stack. There are several possible solutions:
- Insist on a shift action from a Goto state in step 2. (This is probably too restrictive.)
- If the next legal move is a reduction, set a flag that causes the parser to keep track of the sequence of states during the following reductions. If the same state recurs, pop stack states until the original state at which the error occurred is removed, and start again.
Yacc specification format, followed by a sample specification for a simple integer calculator:
{definitions}
%%
{rules}
%%
{auxiliary routines}
%{
#include <stdio.h>
#include <ctype.h>
int yylex(void);
int yyerror(char *s);
%}
%token NUMBER
%%
command : exp { printf("%d\n", $1); }
        ; /* allows printing of the result */
exp : exp '+' term { $$ = $1 + $3; }
    | exp '-' term { $$ = $1 - $3; }
    | term { $$ = $1; }
    ;
term : term '*' factor { $$ = $1 * $3; }
     | factor { $$ = $1; }
     ;
factor : NUMBER { $$ = $1; }
       | '(' exp ')' { $$ = $2; }
       ;
%%
int main(void)
{ return yyparse();
}
int yylex(void)
{ int c;
  while ((c = getchar()) == ' ')
    ; /* eliminates blanks */
  if (isdigit(c)) {
    ungetc(c, stdin); /* put the digit back so scanf can read the whole number */
    scanf("%d", &yylval);
    return NUMBER;
  }
  if (c == '\n') return 0; /* makes the parse stop */
  return c;
}
int yyerror(char *s)
{ fprintf(stderr, "%s\n", s);
  return 0;
} /* allows for printing of an error message */
Yacc precedence and conflict resolution:
- For a reduce-reduce conflict, Yacc disambiguates by preferring the reduction by the grammar rule listed first in the specification file. For a shift-reduce conflict with no precedence information, it prefers the shift.
- %left '+' '-'
%left '*' (specified in the definitions section)
The operators '+' and '-' have the same precedence and are left associative.
The operator '*' is left associative and has higher precedence than '+' and '-', because it is declared later.
- The priority of a rule is given by the last token occurring on the right-hand side of that rule.
- If the rule has higher priority than the lookahead token, the conflict is resolved in favor of reducing. If the token has higher priority, it is resolved in favor of shifting.
- If the rule and token have equal priority, then left associativity favors reducing, right associativity favors shifting, and nonassoc yields an error action.
- A specific precedence can be assigned to a rule using the %prec directive.
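The precedence rules above amount to a small decision function. A minimal sketch, assuming that both the rule and the lookahead token have been given a precedence (a larger number meaning higher priority); `Assoc`, `Resolution` and `resolve_shift_reduce` are hypothetical names, not part of Yacc.

```c
#include <assert.h>

typedef enum { ASSOC_LEFT, ASSOC_RIGHT, ASSOC_NONASSOC } Assoc;
typedef enum { DO_SHIFT, DO_REDUCE, DO_ERROR } Resolution;

/* Resolves a shift-reduce conflict between a rule and a lookahead token.
   Assumption: both precedences are defined and positive. assoc is the
   associativity of the token, which matters only when they are equal. */
Resolution resolve_shift_reduce(int rule_prec, int token_prec, Assoc assoc) {
    assert(rule_prec > 0 && token_prec > 0);
    if (rule_prec > token_prec) return DO_REDUCE;  /* rule binds tighter */
    if (rule_prec < token_prec) return DO_SHIFT;   /* token binds tighter */
    switch (assoc) {
    case ASSOC_LEFT:  return DO_REDUCE;
    case ASSOC_RIGHT: return DO_SHIFT;
    default:          return DO_ERROR;             /* nonassoc */
    }
}
```

With '+' and '-' at level 1 and '*' at level 2, all left associative, the parser holding exp '+' exp with lookahead '*' gets `DO_SHIFT`, while holding exp '*' exp with lookahead '+' it gets `DO_REDUCE`.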
Local error recovery mechanisms:
These adjust the parse stack and the input at the point where the error was detected, in a way that allows parsing to resume.
The Yacc parser generator uses a special error symbol to control the recovery process.
The error symbol is treated as a terminal symbol and may appear in grammar rules.
When the LR parser reaches an error state, it takes the following actions:
- Pop the stack (if necessary) until a state is reached in which the action for the error token is shift.
- Shift the error token.
- Discard input symbols (if necessary) until a lookahead is reached that has a nonerror action in the current state.
- Resume normal parsing.