Compiler Principles, Chapter 5: Bottom-Up Parsing

These notes use 编译原理及实践 (Compiler Construction: Principles and Practice) as the textbook.

Chapter Five. Bottom-Up Parsing

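LR grammars (the input is scanned from Left to right, producing a Rightmost derivation in reverse)
Shift-reduce parsers
A shift-reduce parser has two possible actions (besides accept):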

  1. Shift : Shift a terminal from the front of the input to the top of the stack.
  2. Reduce: Reduce a string $\alpha$ at the top of the stack to a nonterminal $A$, given the BNF choice $A \rightarrow \alpha$.
    For this reason a bottom-up parser is also called a shift-reduce parser.
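
The two actions can be sketched with a deliberately naive recognizer. The grammar E -> E + n | n, the function name `shift_reduce`, and the "reduce whenever the top of the stack matches a right-hand side, longest first" policy are all assumptions made for illustration; a real LR parser chooses between shift and reduce from a DFA state, not by pattern matching on the stack.

```python
# Illustrative grammar (an assumption, not from the notes): E -> E + n | n
RULES = [("E", ("E", "+", "n")), ("E", ("n",))]
START = "E"

def shift_reduce(tokens):
    """Return the list of actions taken; raise ValueError on a syntax error."""
    stack, rest, trace = [], list(tokens), []
    # Naive handle choice: try longer right-hand sides first.
    by_length = sorted(RULES, key=lambda rule: -len(rule[1]))
    while True:
        for lhs, rhs in by_length:
            if tuple(stack[-len(rhs):]) == rhs:
                del stack[-len(rhs):]          # pop the string alpha ...
                stack.append(lhs)              # ... and push the nonterminal A
                trace.append(f"reduce {lhs} -> {' '.join(rhs)}")
                break
        else:
            if rest:
                stack.append(rest.pop(0))      # shift the next input token
                trace.append(f"shift {stack[-1]}")
            elif stack == [START]:
                trace.append("accept")         # stands in for reducing by E' -> E
                return trace
            else:
                raise ValueError(f"syntax error, stack = {stack}")

for action in shift_reduce(["n", "+", "n"]):
    print(action)
```

The greedy policy happens to work for this small grammar only; it is not a general parsing method.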

One further feature of bottom-up parsers: grammars are always augmented with a new start symbol.
If $S$ is the start symbol, a new start symbol $S'$ is added to the grammar with the single rule $S' \rightarrow S$.

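LR(0) items
An LR(0) item is a production choice with a distinguished position, marked by a dot, in its right-hand side.
NFA of LR(0) items:

  1. If $X$ is a token or a nonterminal, an item with $X$ after the dot can be written as $A \rightarrow \alpha \cdot X \eta$, and there is a transition on $X$ to the item $A \rightarrow \alpha X \cdot \eta$.
  2. If $X$ is a token, then this transition corresponds to a shift of $X$ from the input to the top of the stack during a parse.
  3. If $X$ is a nonterminal, $X$ will never appear as an input symbol. Such a transition still corresponds to the pushing of $X$ onto the stack during a parse, but this can only occur during a reduction by a production $X \rightarrow \beta$. For this reason there is also an $\epsilon$-transition from $A \rightarrow \alpha \cdot X \eta$ to the initial item $X \rightarrow \cdot \beta$ for every production of $X$.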
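
The transitions above can be sketched as two small functions. This is a minimal sketch, assuming the augmented grammar A' -> A, A -> ( A ) | a and an item encoding of (production index, dot position); the names `closure`, `goto` and `show` are illustrative. `closure` follows the $\epsilon$-transitions and `goto` follows a transition on a symbol, which together is the usual subset construction that turns the NFA of items into DFA states.

```python
# Augmented grammar (assumed for illustration): A' -> A, A -> ( A ) | a
GRAMMAR = [("A'", ("A",)), ("A", ("(", "A", ")")), ("A", ("a",))]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}

def closure(items):
    """Follow epsilon-transitions: add X -> .beta whenever the dot precedes X."""
    result, work = set(items), list(items)
    while work:
        prod, dot = work.pop()
        rhs = GRAMMAR[prod][1]
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for index, (lhs, _) in enumerate(GRAMMAR):
                if lhs == rhs[dot] and (index, 0) not in result:
                    result.add((index, 0))
                    work.append((index, 0))
    return frozenset(result)

def goto(items, symbol):
    """Transition on `symbol`: move the dot over it, then take the closure."""
    moved = {(prod, dot + 1) for prod, dot in items
             if dot < len(GRAMMAR[prod][1]) and GRAMMAR[prod][1][dot] == symbol}
    return closure(moved)

def show(item):
    """Render an item with a dot at its distinguished position."""
    prod, dot = item
    lhs, rhs = GRAMMAR[prod]
    return f"{lhs} -> {' '.join(rhs[:dot])} . {' '.join(rhs[dot:])}"

start = closure({(0, 0)})
for item in sorted(goto(start, "(")):
    print(show(item))
```

An empty result from `goto` means there is no transition on that symbol.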

DFA of LR(0)
LR(0) Parsing Algorithm:
Let $s$ be the current state (at the top of the parsing stack). Then actions are defined as follows:

  1. If state $s$ contains any item of the form $A \rightarrow \alpha \cdot X \beta$, where $X$ is a terminal, then the action is to shift the current input token onto the stack.
    If this token is $X$, the new state to be pushed is the state containing the item $A \rightarrow \alpha X \cdot \beta$. If the token is not $X$ for some item of this form, an error is declared.

  2. If state $s$ contains any complete item (an item of the form $A \rightarrow \gamma \cdot$), then the action is to reduce by the rule $A \rightarrow \gamma$.

    1. A reduction by the rule $S' \rightarrow S$, where $S$ is the original start symbol, means acceptance if the input is empty and an error if the input is not empty.
    2. In all other cases, remove $\gamma$ and its states from the stack and back up to the state from which the construction of $\gamma$ began. This state must contain an item of the form $B \rightarrow \alpha \cdot A \beta$; push $A$ onto the stack, then push the state containing the item $B \rightarrow \alpha A \cdot \beta$.
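
The algorithm above can be sketched as a small driver. This is a sketch under stated assumptions: the grammar A' -> A, A -> ( A ) | a is assumed to be LR(0), items are (production index, dot position) pairs, and the stack holds DFA states only, since the grammar symbol pushed with each state is implied by the transition that reached it. The function names are illustrative.

```python
# Augmented grammar (assumed for illustration): A' -> A, A -> ( A ) | a
GRAMMAR = [("A'", ("A",)), ("A", ("(", "A", ")")), ("A", ("a",))]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}

def closure(items):
    """Add X -> .beta for every nonterminal X that appears right after a dot."""
    result, work = set(items), list(items)
    while work:
        prod, dot = work.pop()
        rhs = GRAMMAR[prod][1]
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for index, (lhs, _) in enumerate(GRAMMAR):
                if lhs == rhs[dot] and (index, 0) not in result:
                    result.add((index, 0))
                    work.append((index, 0))
    return frozenset(result)

def goto(items, symbol):
    """DFA transition on `symbol`; an empty set means no transition."""
    moved = {(p, d + 1) for p, d in items
             if d < len(GRAMMAR[p][1]) and GRAMMAR[p][1][d] == symbol}
    return closure(moved)

def parse(tokens):
    """Return True if `tokens` is accepted; assumes the grammar is LR(0)."""
    stack, rest = [closure({(0, 0)})], list(tokens)
    while True:
        state = stack[-1]
        complete = [p for p, d in state if d == len(GRAMMAR[p][1])]
        if complete:                               # reduce state
            lhs, rhs = GRAMMAR[complete[0]]
            if complete[0] == 0:                   # reduction by A' -> A
                return not rest                    # accept only on empty input
            del stack[len(stack) - len(rhs):]      # back up to where rhs began
            stack.append(goto(stack[-1], lhs))     # push the state reached on A
        else:                                      # shift state
            target = goto(state, rest.pop(0)) if rest else frozenset()
            if not target:
                return False                       # error: no legal shift
            stack.append(target)

print(parse(list("((a))")), parse(list("(a")))
```

States are recomputed on demand here to keep the sketch short; a table-driven parser would build the DFA once and index into it.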

A grammar is LR(0) if and only if each state of its DFA of LR(0) items is one of the following:

  1. a shift state (a state containing only "shift" items), or
  2. a reduce state containing a single complete item.

LR(0) Parsing Table
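The table rows are labeled with the states of the DFA. Each row records the state's action (shift or reduce, with the rule to use for a reduce), the transitions on input tokens, and the Goto transitions on nonterminals.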

SLR(1) Parsing
Algorithm (let $s$ be the current state, at the top of the parsing stack):

  1. If state $s$ contains any item of the form $A \rightarrow \alpha \cdot X \beta$, where $X$ is a terminal and $X$ is the next token in the input string, then the action is to shift the current input token onto the stack, and the new state to be pushed on the stack is the state containing the item $A \rightarrow \alpha X \cdot \beta$.
  2. If state $s$ contains the complete item $A \rightarrow \gamma \cdot$, and the next token in the input string is in $Follow(A)$, then the action is to reduce by the rule $A \rightarrow \gamma$.
    1. A reduction by the rule $S' \rightarrow S$, where $S$ is the original start symbol, is equivalent to acceptance; this will happen only if the next input token is $\$$.
    2. In all other cases, remove the string $\gamma$ and all of its corresponding states from the parsing stack, and back up in the DFA to the state from which the construction of $\gamma$ began.
    3. This state must contain an item of the form $B \rightarrow \alpha \cdot A \beta$. Push $A$ onto the stack, and push the state containing the item $B \rightarrow \alpha A \cdot \beta$.
  3. If the next input token is such that neither of the above two cases applies, an error is declared.

A grammar is SLR(1) if and only if, for any state $s$, the following two conditions are satisfied:

  1. For any item $A \rightarrow \alpha \cdot X \beta$ in $s$ with $X$ a terminal, there is no complete item $B \rightarrow \gamma \cdot$ in $s$ with $X \in Follow(B)$.
  2. For any two complete items $A \rightarrow \alpha \cdot$ and $B \rightarrow \beta \cdot$ in $s$, $Follow(A) \cap Follow(B)$ is empty.
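
The two conditions can be checked state by state. The sketch below assumes an item encoding of (left-hand side, right-hand side, dot position), and both the state and the Follow sets are written out by hand for the illustrative grammar E' -> E, E -> E + n | n; `slr_conflicts` is a hypothetical helper name.

```python
# Terminals of the illustrative grammar E' -> E, E -> E + n | n (an assumption).
TERMINALS = {"+", "n", "$"}

def slr_conflicts(state, follow):
    """Return a list of conflict descriptions for one DFA state."""
    shifts = {rhs[dot] for lhs, rhs, dot in state
              if dot < len(rhs) and rhs[dot] in TERMINALS}
    complete = [(lhs, rhs) for lhs, rhs, dot in state if dot == len(rhs)]
    conflicts = []
    for lhs, rhs in complete:                      # condition 1
        for token in sorted(shifts & follow[lhs]):
            conflicts.append(f"shift-reduce on {token}: {lhs} -> {' '.join(rhs)}")
    for i, (a, _) in enumerate(complete):          # condition 2
        for b, _ in complete[i + 1:]:
            for token in sorted(follow[a] & follow[b]):
                conflicts.append(f"reduce-reduce on {token}: {a} vs {b}")
    return conflicts

# State reached from the start state on E: { E' -> E . ,  E -> E . + n }
state1 = [("E'", ("E",), 1), ("E", ("E", "+", "n"), 1)]
follow = {"E'": {"$"}, "E": {"$", "+"}}
print(slr_conflicts(state1, follow))
```

This state mixes a shift item with a complete item, so the grammar is not LR(0); the Follow set of E' is what separates the two actions under SLR(1).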

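Right recursion makes an LR parser (SLR(1) included) stack up the whole recursive list before the first reduction, so a grammar with a lot of right recursion is likely to cause stack overflow.
Right recursion can be removed in much the same way as left recursion: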

  1. $A \rightarrow \alpha A \mid \beta$
    transform to
    $A \rightarrow A' \beta$
    $A' \rightarrow A' \alpha \mid \epsilon$

  2. $A \rightarrow \alpha_1 A \mid \alpha_2 A \mid \ldots \mid \alpha_n A \mid \beta_1 \mid \beta_2 \mid \ldots \mid \beta_m$
    transform to
    $A \rightarrow A' (\beta_1 \mid \beta_2 \mid \ldots \mid \beta_m)$
    $A' \rightarrow A' (\alpha_1 \mid \alpha_2 \mid \ldots \mid \alpha_n) \mid \epsilon$

  3. General right recursion
    This applies to grammars with no $\epsilon$-productions and no cycles.
    Algorithm:

    1. Pick an arbitrary order for all nonterminals, say $A_1, \ldots, A_m$.
    2. Eliminate all rules of the form $A_i \rightarrow \gamma A_j$ with $j \leqslant i$.
    3. Every step in a right-recursive loop would then only increase the index, and thus the original index cannot be reached again.

    Pseudocode:

for i := 1 to m do
	for j := 1 to i-1 do
		replace each rule Ai -> β Aj by the rules Ai -> β α1 | β α2 | ... | β αk,
		where Aj -> α1 | α2 | ... | αk is the current rule for Aj
	remove, if necessary, immediate right recursion involving Ai
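
The pseudocode can be sketched in Python as follows. The grammar encoding (a dict from nonterminal to a list of right-hand-side tuples, with the empty tuple standing for $\epsilon$), the function names, and the example grammar A -> a B | c, B -> b A | d are assumptions made for illustration. The helper `remove_immediate` is the transformation $A \rightarrow \alpha A \mid \beta$ to $A \rightarrow A'\beta$, $A' \rightarrow A'\alpha \mid \epsilon$.

```python
def remove_immediate(grammar, a):
    """Rewrite A -> alpha A | beta as A -> A' beta, A' -> A' alpha | epsilon."""
    recursive = [rhs[:-1] for rhs in grammar[a] if rhs and rhs[-1] == a]
    others = [rhs for rhs in grammar[a] if not rhs or rhs[-1] != a]
    if not recursive:
        return                                     # nothing to do for this A
    new = a + "'"                                  # fresh name, assumed unused
    grammar[a] = [(new,) + beta for beta in others]
    grammar[new] = [(new,) + alpha for alpha in recursive] + [()]

def remove_right_recursion(grammar, order):
    """Apply the ordering algorithm in place; `order` lists A1, ..., Am."""
    for i, ai in enumerate(order):
        for aj in order[:i]:
            expanded = []
            for rhs in grammar[ai]:
                if rhs and rhs[-1] == aj:          # rule Ai -> beta Aj, j < i
                    expanded += [rhs[:-1] + alpha for alpha in grammar[aj]]
                else:
                    expanded.append(rhs)
            grammar[ai] = expanded
        remove_immediate(grammar, ai)
    return grammar

# Indirect right recursion: A -> a B | c,  B -> b A | d
g = {"A": [("a", "B"), ("c",)], "B": [("b", "A"), ("d",)]}
for lhs, alternatives in remove_right_recursion(g, ["A", "B"]).items():
    print(lhs, "->", " | ".join(" ".join(rhs) or "epsilon" for rhs in alternatives))
```

In this example B originally derives any number of "b a" pairs followed by "b c" or "d"; after the rewrite the repetition is produced by the left-recursive B', which a bottom-up parser can reduce as it goes.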

An SLR(1) parsing table can contain two kinds of parsing conflicts:
shift-reduce conflicts
reduce-reduce conflicts
A common disambiguating rule for a shift-reduce conflict is to always prefer the shift over the reduce.

LR(1) Parsing
An LR(1) item is a pair consisting of an LR(0) item and a lookahead token, written $[A \rightarrow \alpha \cdot \beta, a]$, where the lookahead $a$ is a token that may follow $A$ at this point of the parse.
Transitions of the NFA of LR(1) items:

  1. Given an LR(1) item $[A \rightarrow \alpha \cdot X \gamma, a]$, where $X$ is any symbol (terminal or nonterminal), there is a transition on $X$ to the item $[A \rightarrow \alpha X \cdot \gamma, a]$.
  2. Given an LR(1) item $[A \rightarrow \alpha \cdot B \gamma, a]$, where $B$ is a nonterminal, there are $\epsilon$-transitions to items $[B \rightarrow \cdot \beta, b]$ for every production $B \rightarrow \beta$ and for every token $b$ in $First(\gamma a)$.

The start state of the NFA of LR(1) items is the item $[S' \rightarrow \cdot S, \$]$.
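
The two transition rules can be sketched as a closure and a goto over items encoded as (production index, dot position, lookahead). This is a minimal sketch, assuming the grammar A' -> A, A -> ( A ) | a; because that grammar has no $\epsilon$-productions, $First(\gamma a)$ is computed from the first symbol of $\gamma a$ only, which would not be valid in general. All function names are illustrative.

```python
# Augmented grammar (assumed for illustration): A' -> A, A -> ( A ) | a
GRAMMAR = [("A'", ("A",)), ("A", ("(", "A", ")")), ("A", ("a",))]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}

def first_sets():
    """First sets by fixed-point iteration; valid only without epsilon-productions."""
    first = {n: set() for n in NONTERMINALS}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            head = rhs[0]
            new = first[head] if head in NONTERMINALS else {head}
            if not new <= first[lhs]:
                first[lhs] |= new
                changed = True
    return first

FIRST = first_sets()

def first_of(symbols, lookahead):
    """First(gamma a): with no epsilon-productions only the first symbol matters."""
    if not symbols:
        return {lookahead}
    head = symbols[0]
    return FIRST[head] if head in NONTERMINALS else {head}

def closure(items):
    """Rule 2: from [A -> alpha . B gamma, a] add [B -> . beta, b], b in First(gamma a)."""
    result, work = set(items), list(items)
    while work:
        prod, dot, a = work.pop()
        rhs = GRAMMAR[prod][1]
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for b in first_of(rhs[dot + 1:], a):
                for index, (lhs, _) in enumerate(GRAMMAR):
                    item = (index, 0, b)
                    if lhs == rhs[dot] and item not in result:
                        result.add(item)
                        work.append(item)
    return frozenset(result)

def goto(items, symbol):
    """Rule 1: move the dot over `symbol`, keeping the lookahead unchanged."""
    moved = {(p, d + 1, a) for p, d, a in items
             if d < len(GRAMMAR[p][1]) and GRAMMAR[p][1][d] == symbol}
    return closure(moved)

start = closure({(0, 0, "$")})
print(sorted(goto(start, "(")))
```

Inside the parentheses the new initial items carry the lookahead ")" rather than "$", which is the extra precision LR(1) has over a Follow set.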

General LR(1) parsing algorithm:
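Let $s$ be the current state (at the top of the parsing stack). Then actions are defined as follows:

  1. If state $s$ contains any LR(1) item of the form $[A \rightarrow \alpha \cdot X \beta, a]$, where $X$ is a terminal, and $X$ is the next token in the input string, then the action is to shift the current input token onto the stack, and the new state to be pushed on the stack is the state containing the LR(1) item $[A \rightarrow \alpha X \cdot \beta, a]$.
  2. If state $s$ contains the complete LR(1) item $[A \rightarrow \alpha \cdot, a]$, and the next token in the input string is $a$, then the action is to reduce by the rule $A \rightarrow \alpha$. A reduction by the rule $S' \rightarrow S$, where $S$ is the original start symbol, is equivalent to acceptance (this happens only if the next input token is $\$$). In all other cases, the new state is computed as in SLR(1) parsing: remove $\alpha$ and its states from the stack, back up to the state from which the construction of $\alpha$ began, push $A$, and push the state reached from there on $A$.
  3. If the next input token is such that neither of the above two cases applies, an error is declared.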

A grammar is LR(1) if and only if, for any state $s$, the following two conditions are satisfied:

  1. For any item $[A \rightarrow \alpha \cdot X \beta, a]$ in $s$ with $X$ a terminal, there is no item in $s$ of the form $[B \rightarrow \gamma \cdot, X]$ (otherwise there is a shift-reduce conflict).
  2. There are no two items in $s$ of the form $[A \rightarrow \alpha \cdot, a]$ and $[B \rightarrow \beta \cdot, a]$ (otherwise there is a reduce-reduce conflict).

LALR(1) Parsing
Principles:

  1. The core of a state of the DFA of LR(1) items is a state of the DFA of LR(0) items.
  2. Given two states $s_1$ and $s_2$ of the DFA of LR(1) items that have the same core, suppose there is a transition on the symbol $X$ from $s_1$ to a state $t_1$. Then there is also a transition on $X$ from state $s_2$ to a state $t_2$, and the states $t_1$ and $t_2$ have the same core.
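
These two principles are what allow LR(1) states with the same core to be merged into one LALR(1) state. A minimal sketch, assuming items encoded as (production index, dot position, lookahead) and two hand-written LR(1) states of the grammar A' -> A, A -> ( A ) | a; `core` and `merge_by_core` are illustrative names, and the sketch merges item sets only, without rebuilding the transitions.

```python
from collections import defaultdict

def core(state):
    """The LR(0) part of a state: its items with the lookaheads dropped."""
    return frozenset((prod, dot) for prod, dot, _ in state)

def merge_by_core(lr1_states):
    """Group LR(1) states by core and union the items of each group."""
    groups = defaultdict(set)
    for state in lr1_states:
        groups[core(state)] |= set(state)
    return [frozenset(items) for items in groups.values()]

# Productions numbered 0: A' -> A, 1: A -> ( A ), 2: A -> a (an assumption).
# s2 is reached after one "(", s5 after two or more; they differ only in lookaheads.
s2 = frozenset({(1, 1, "$"), (1, 0, ")"), (2, 0, ")")})
s5 = frozenset({(1, 1, ")"), (1, 0, ")"), (2, 0, ")")})
merged = merge_by_core([s2, s5])
print(len(merged), sorted(merged[0]))
```

By the second principle, the transitions out of s2 and s5 lead to states that again share a core, so the merged states still form a DFA.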

Propagating lookaheads (computing the LALR(1) DFA directly from the DFA of LR(0) items): not covered in these notes.

Error recovery
An LR(1) parser can, for example, detect errors earlier than an LALR(1) or SLR(1) parser, and these latter can detect errors earlier than an LR(0) parser. (Earlier detection means the error is reported closer to the point where it occurs.)

Good error recovery in bottom-up parsers works by removing symbols from the parsing stack, from the input, or from both.
There are three possible alternative actions:

  1. Pop a state from the stack.
  2. Successively pop tokens from the input until a token is seen for which we can restart the parse.
  3. Push a new state onto the stack.

More concretely, one method is:

  1. Pop states from the parsing stack until a state is found with nonempty Goto entries.
  2. If there is a legal action on the current input token from one of the Goto states, push that state onto the stack and restart the parse. If there are several such states, prefer a shift to a reduce. Among the reduce actions, prefer one whose associated nonterminal is least general.
  3. If there is no legal action on the current input token from one of the Goto states, advance the input.

This method can fall into an infinite loop, because step 2 pushes new states onto the stack without consuming input. There are several possible solutions:

  1. Insist on a shift action from a Goto state in step 2. (This is too restrictive.)
  2. If the next legal move is a reduction, set a flag that causes the parser to keep track of the sequence of states during the following reductions; if the same state recurs, pop stack states until the original state is removed.

Yacc specification file format:
{definitions}
%%
{rules}
%%
{auxiliary routines}

%{
#include <stdio.h>
#include <ctype.h>
int yylex(void);
int yyerror(char *s);
%}
%token NUMBER
%%
command: exp { printf("%d\n", $1); }
       ;            /* allows printing of the result */
exp: exp '+' term   { $$ = $1 + $3; }
   | exp '-' term   { $$ = $1 - $3; }
   | term           { $$ = $1; }
   ;
term: term '*' factor   { $$ = $1 * $3; }
    | factor            { $$ = $1; }
    ;
factor: NUMBER          { $$ = $1; }
      | '(' exp ')'     { $$ = $2; }
      ;
%%
int main(void)
{ return yyparse();
}
int yylex(void)
{   int c;
    while ((c = getchar()) == ' ')
        ;  /* eliminates blanks */
    if (isdigit(c)) {
        ungetc(c, stdin);      /* put the digit back so scanf can read the number */
        scanf("%d", &yylval);
        return NUMBER;
    }
    if (c == '\n') return 0;
        /* makes the parse stop */
    return c;
}
int yyerror(char *s)
{ fprintf(stderr, "%s\n", s);
  return 0;
} /* allows for printing of an error message */

Yacc disambiguation and precedence:

  1. For a reduce-reduce conflict, Yacc disambiguates by preferring the reduction by the grammar rule listed first in the specification file. For a shift-reduce conflict, its default is to prefer the shift.
  2. %left '+' '-'
    %left '*' (specified in the definitions section)
    The operators '+' and '-' have the same precedence and are left associative.
    The operator '*' is left associative and has higher precedence than '+' and '-', because it is declared later.
  3. The priority of a rule is given by the last token occurring on the right-hand side of that rule. In a shift-reduce conflict it is compared with the priority of the lookahead token:
    1. If the rule has higher priority, the conflict is resolved in favor of reducing; if the token has higher priority, in favor of shifting.
    2. If the rule and token have equal priority, then left associativity favors reducing, right associativity favors shifting, and nonassoc yields an error action.
    3. A specific precedence can be assigned to a rule using the %prec directive.
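
The resolution rule for shift-reduce conflicts can be sketched as a small decision function. This models the rule as described above and is not Yacc's actual implementation; the table `DECLARATIONS` mirrors the two %left lines, and `NONTERMINALS`, `rule_level` and `resolve` are hypothetical names.

```python
# Declarations in increasing precedence, mirroring  %left '+' '-'  then  %left '*'.
DECLARATIONS = [("left", ["+", "-"]), ("left", ["*"])]
LEVEL = {tok: (n, assoc)
         for n, (assoc, toks) in enumerate(DECLARATIONS, 1) for tok in toks}
NONTERMINALS = {"exp"}                 # assumed grammar: exp -> exp op exp | ...

def rule_level(rhs, prec=None):
    """Precedence of a rule: its %prec token if given, else its last token."""
    if prec is not None:
        return LEVEL[prec][0]
    tokens = [sym for sym in rhs if sym not in NONTERMINALS]
    if tokens and tokens[-1] in LEVEL:
        return LEVEL[tokens[-1]][0]
    return None                        # the rule has no precedence

def resolve(rhs, lookahead, prec=None):
    """Return 'reduce', 'shift' or 'error' for a shift-reduce conflict."""
    rule = rule_level(rhs, prec)
    if rule is None or lookahead not in LEVEL:
        return "shift"                 # default when no precedence applies
    token, assoc = LEVEL[lookahead]
    if rule != token:
        return "reduce" if rule > token else "shift"
    return {"left": "reduce", "right": "shift", "nonassoc": "error"}[assoc]

print(resolve(["exp", "+", "exp"], "*"))     # rule '+' against lookahead '*'
print(resolve(["exp", "*", "exp"], "+"))     # rule '*' against lookahead '+'
print(resolve(["-", "exp"], "*", prec="*"))  # unary minus given the level of '*'
```

The last call shows the effect of %prec: without it the rule for unary minus would take the lower level of '-' and the parser would shift the '*'.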

Local error recovery mechanisms:
These adjust the parse stack and the input at the point where the error was detected, in a way that allows parsing to resume.
The Yacc parser generator uses a special symbol named error to control the recovery process.
error is considered a terminal symbol.
When the LR parser reaches an error state, it takes the following actions:

  1. Pop the stack (if necessary) until a state is reached in which the action for the error token is shift.
  2. Shift the error token.
  3. Discard input symbols (if necessary) until a lookahead is reached that has a nonerror action in the current state.
  4. Resume normal parsing.