Compiler Principles, Chapter 4: Top-Down Parsing

These notes follow the textbook 编译原理及实践 (Compiler Construction: Principles and Practice).

Chapter Four. Top-Down Parsing

Recursive-descent parsing (requires EBNF)
LL(k) parsing
In LL(1) parsing, the pending part of the parse is kept on an explicit stack, so the symbols of a production's right-hand side are pushed in reverse order.
Actions: generate (replace a non-terminal $A$ at the top of the stack using a grammar rule $A \rightarrow \alpha$) or match.
LL(1) parsing table-construction rules:

  1. If $A \rightarrow \alpha$ is a production and $\alpha \Rightarrow^{*} a\beta$ ($a$ is a token), add $A \rightarrow \alpha$ to the table entry $M[A,a]$;
  2. If $A \rightarrow \alpha$ is a production, $\alpha \Rightarrow^{*} \epsilon$, and $S\$ \Rightarrow^{*} \beta A a \gamma$ ($S$ is the start symbol, $a$ is a token or \$), add $A \rightarrow \alpha$ to the table entry $M[A,a]$.

A grammar is an LL(1) grammar if its parsing table has at most one production in each table entry.
algorithm:

  1. If the top of the parsing stack is terminal $a$ and the next input token is $a$, then match (pop the parsing stack, advance the input); else error.
  2. If the top of the parsing stack is non-terminal $A$, the next input token is $a$, and there is an entry $M[A,a] = A \rightarrow X_{1}X_{2}...X_{n}$, then generate (pop the parsing stack and push $X_{n},...,X_{1}$ onto it, i.e. in reverse order, so that $X_{1}$ ends up on top); else error.
  3. If the top of the parsing stack is \$ and the next input token is \$, accept.
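
The three steps above can be sketched as a small table-driven driver. This is only an illustration under assumptions: the grammar `S -> ( S ) S | ε` (balanced parentheses), the dictionary layout of `TABLE`, and the name `ll1_parse` are chosen for the example and are not part of the notes.

```python
# Assumed example grammar: S -> ( S ) S | epsilon
# TABLE[(A, a)] holds the right-hand side to generate; [] stands for epsilon.
TABLE = {
    ("S", "("): ["(", "S", ")", "S"],
    ("S", ")"): [],
    ("S", "$"): [],
}

def ll1_parse(table, start, tokens):
    """Return the list of productions applied, or raise SyntaxError."""
    nonterminals = {A for (A, _) in table}
    stack = ["$", start]              # the top of the stack is the end of the list
    tokens = list(tokens) + ["$"]
    pos = 0
    steps = []
    while True:
        top, tok = stack[-1], tokens[pos]
        if top == "$" and tok == "$":
            return steps              # accept
        if top in nonterminals:
            rhs = table.get((top, tok))
            if rhs is None:
                raise SyntaxError(f"no table entry M[{top},{tok}]")
            stack.pop()
            stack.extend(reversed(rhs))   # generate: push X_n ... X_1
            steps.append((top, tuple(rhs)))
        elif top == tok:
            stack.pop()               # match: pop the stack, advance the input
            pos += 1
        else:
            raise SyntaxError(f"expected {top!r}, got {tok!r}")
```

Calling `ll1_parse(TABLE, "S", "()")` applies three productions (one `S -> ( S ) S` and two `S -> ε`), while an unbalanced input such as `"())"` raises `SyntaxError`.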

left recursion removal
immediate left recursion
indirect left recursion

  1. immediate left recursion
    $A \rightarrow A\alpha | \beta$
    transform to
    $A \rightarrow \beta A'$
    $A' \rightarrow \alpha A' | \epsilon$

  2. general immediate left recursion
    $A \rightarrow A\alpha_{1} | A\alpha_{2} | ... | A\alpha_{n} | \beta_{1} | \beta_{2} | ... | \beta_{m}$
    transform to
    $A \rightarrow \beta_{1}A' | \beta_{2}A' | ... | \beta_{m}A'$
    $A' \rightarrow \alpha_{1}A' | \alpha_{2}A' | ... | \alpha_{n}A' | \epsilon$
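
The transformation for the general immediate case can be sketched as follows. The representation is an assumption made for the example: each alternative is a list of symbols, the empty list stands for $\epsilon$, and the new non-terminal is named by appending `'`.

```python
def remove_immediate_left_recursion(A, alternatives):
    """Rewrite A -> A a1 | ... | A an | b1 | ... | bm into rules for A and A'."""
    A_new = A + "'"                                   # hypothetical name for A'
    alphas = [alt[1:] for alt in alternatives if alt[:1] == [A]]
    betas = [alt for alt in alternatives if alt[:1] != [A]]
    if not alphas:
        return {A: alternatives}                      # nothing to remove
    return {
        A: [beta + [A_new] for beta in betas],        # A  -> b_i A'
        A_new: [alpha + [A_new] for alpha in alphas] + [[]],   # A' -> a_i A' | epsilon
    }
```

For `exp -> exp + term | exp - term | term` this yields `exp -> term exp'` and `exp' -> + term exp' | - term exp' | ε`.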

  3. general left recursion
    applies to grammars with no $\epsilon$-productions and no cycles
    algorithm:

    1. Pick an arbitrary order for all non-terminals, say $A_{1},...,A_{m}$.
    2. Eliminate all rules of the form $A_{i} \rightarrow A_{j}\gamma$ with $j \leqslant i$.
    3. After that, every step in a would-be left-recursive loop can only increase the index, so the original index cannot be reached again.

    pseudo code:

for i := 1 to m do
	for j := 1 to i-1 do
		replace each rule Ai -> Aj β by the rules Ai -> α1 β | α2 β | ... | αk β,
		where Aj -> α1 | α2 | ... | αk are the current rules for Aj
	remove, if necessary, immediate left recursion involving Ai
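
A runnable sketch of the pseudocode above, assuming the grammar is a dictionary from non-terminal to a list of alternatives (each a list of symbols, `[]` for $\epsilon$). The function name and the `'` naming of new non-terminals are illustrative choices.

```python
def remove_left_recursion(grammar, order):
    """Remove general left recursion; `order` lists the non-terminals A1..Am."""
    g = {A: [list(alt) for alt in alts] for A, alts in grammar.items()}
    for i, Ai in enumerate(order):
        for Aj in order[:i]:
            # Replace Ai -> Aj beta by Ai -> alpha_k beta for every Aj -> alpha_k.
            expanded = []
            for alt in g[Ai]:
                if alt[:1] == [Aj]:
                    expanded += [alpha + alt[1:] for alpha in g[Aj]]
                else:
                    expanded.append(alt)
            g[Ai] = expanded
        # Remove, if necessary, immediate left recursion involving Ai.
        alphas = [alt[1:] for alt in g[Ai] if alt[:1] == [Ai]]
        if alphas:
            new = Ai + "'"
            betas = [alt for alt in g[Ai] if alt[:1] != [Ai]]
            g[Ai] = [beta + [new] for beta in betas]
            g[new] = [alpha + [new] for alpha in alphas] + [[]]
    return g

# Assumed example: A -> B a | A a | c,  B -> B b | A b | d
GRAMMAR = {
    "A": [["B", "a"], ["A", "a"], ["c"]],
    "B": [["B", "b"], ["A", "b"], ["d"]],
}
RESULT = remove_left_recursion(GRAMMAR, ["A", "B"])
```

With the order `A, B`, the result is `A -> B a A' | c A'`, `A' -> a A' | ε`, `B -> c A' b B' | d B'`, `B' -> b B' | a A' b B' | ε`; no rule begins with its own left-hand side any more.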

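A right-recursive grammar (such as the result of left recursion removal) can still be evaluated left-associatively: the procedure for the new non-terminal takes the value computed so far and hands back the final result as its return value.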
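
As an illustration, here is a minimal recursive-descent evaluator for the assumed grammar `exp -> number exp'`, `exp' -> - number exp' | ε`. The grammar and all names are hypothetical; the point is only that `exp_prime` receives the left operand as a parameter, so subtraction associates to the left even though the grammar is right-recursive.

```python
def evaluate(tokens):
    """Evaluate a list of tokens such as ["8", "-", "3", "-", "2"]."""
    pos = 0

    def number():
        nonlocal pos
        value = int(tokens[pos])
        pos += 1
        return value

    def exp_prime(left):
        # exp' -> - number exp' | epsilon; `left` is the value computed so far
        nonlocal pos
        if pos < len(tokens) and tokens[pos] == "-":
            pos += 1
            return exp_prime(left - number())
        return left

    def exp():
        # exp -> number exp'
        return exp_prime(number())

    return exp()
```

`evaluate(["8", "-", "3", "-", "2"])` gives `3`, i.e. `(8 - 3) - 2`, rather than `8 - (3 - 2) = 7`.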

left factoring
solution:
$A \rightarrow \alpha\beta | \alpha\gamma$
transforms to
$A \rightarrow \alpha A'$
$A' \rightarrow \beta | \gamma$
algorithm:
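
One possible way to mechanize a single round of left factoring is sketched below; it is an assumption-laden illustration, not necessarily the textbook's algorithm. Alternatives are lists of symbols, alternatives are grouped by their first symbol, and the longest common prefix of each group is factored out into a new non-terminal named with `'`.

```python
def common_prefix(seqs):
    """Longest common prefix of several symbol lists."""
    prefix = []
    for symbols in zip(*seqs):
        if len(set(symbols)) != 1:
            break
        prefix.append(symbols[0])
    return prefix

def left_factor(A, alternatives):
    """One round of left factoring for the non-terminal A."""
    groups = {}
    for alt in alternatives:
        groups.setdefault(alt[0] if alt else None, []).append(alt)
    result = {A: []}
    count = 0
    for first, group in groups.items():
        if first is None or len(group) == 1:
            result[A] += group                     # nothing to factor
            continue
        count += 1
        new = A + "'" * count                      # hypothetical new name
        prefix = common_prefix(group)
        result[A].append(prefix + [new])           # A  -> alpha A'
        result[new] = [alt[len(prefix):] for alt in group]   # A' -> beta | gamma
    return result
```

For `A -> a b | a c | d` this returns `A -> a A' | d` and `A' -> b | c`. The new non-terminal may itself need another round if its alternatives still share a prefix.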

It is more difficult to adapt LL(1) parsing to syntax tree construction:

  1. The structure of the syntax tree can be obscured by left factoring and left recursion removal;
  2. The parsing stack represents only predicted structure, not structure that has actually been seen.

The solution:

  1. An extra stack is used to keep track of syntax tree nodes;
  2. "Action" markers are placed in the parsing stack to indicate when and what actions on the tree stack should occur.

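First set: $First(\alpha)$ is the set of tokens that can begin a string derived from $\alpha$; it also contains $\epsilon$ if $\alpha$ can derive $\epsilon$.
Follow set: $Follow(A)$ is the set of tokens (or \$) that can appear immediately after the non-terminal $A$ in some derivation; it never contains $\epsilon$.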

First Set algorithm:
For a string $\alpha = X_{1}X_{2}...X_{n}$:

  1. $First(\alpha)$ contains $First(X_{1})-\{\epsilon\}$.
  2. For each $i=2,...,n$, if $First(X_{k})$ contains $\epsilon$ for all $k=1,...,i-1$, then $First(\alpha)$ contains $First(X_{i})-\{\epsilon\}$.
  3. If all the sets $First(X_{1}),...,First(X_{n})$ contain $\epsilon$, then $First(\alpha)$ contains $\epsilon$.
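
The three rules can be sketched as a fixed-point computation. The grammar layout (a dictionary of alternatives, `[]` for $\epsilon$), the `EPS` marker, and the small expression grammar are assumptions made for the example.

```python
EPS = "ε"

def first_of_string(symbols, first):
    """First set of X1 X2 ... Xn, given the First sets of the non-terminals."""
    result = set()
    for X in symbols:
        fx = first.get(X, {X})       # the First set of a terminal is the terminal itself
        result |= fx - {EPS}         # rules 1 and 2
        if EPS not in fx:
            return result
    result.add(EPS)                  # rule 3: every Xi can derive epsilon
    return result

def first_sets(grammar):
    """Repeat the rules over all productions until nothing changes."""
    first = {A: set() for A in grammar}
    changed = True
    while changed:
        changed = False
        for A, alternatives in grammar.items():
            for alt in alternatives:
                new = first_of_string(alt, first)
                if not new <= first[A]:
                    first[A] |= new
                    changed = True
    return first

# Assumed example: exp -> term exp',  exp' -> + term exp' | ε,  term -> ( exp ) | num
GRAMMAR = {
    "exp": [["term", "exp'"]],
    "exp'": [["+", "term", "exp'"], []],
    "term": [["(", "exp", ")"], ["num"]],
}
FIRST = first_sets(GRAMMAR)
```

Here `FIRST["exp"]` and `FIRST["term"]` are both `{"(", "num"}`, and `FIRST["exp'"]` is `{"+", "ε"}`.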

Follow Set algorithm:

  1. If $A$ is the start symbol, then \$ is in $Follow(A)$.
  2. If there is a production $B \rightarrow \alpha A \gamma$, then $First(\gamma)-\{\epsilon\}$ is contained in $Follow(A)$.
  3. If there is a production $B \rightarrow \alpha A \gamma$ and $\epsilon$ is in $First(\gamma)$, then $Follow(A)$ contains $Follow(B)$.
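
A sketch of the Follow rules, again as a fixed-point iteration. It assumes the First sets are already known and simply written out by hand in `FIRST`; the grammar, the names, and the `EPS` marker are illustrative.

```python
EPS = "ε"

def first_of_string(symbols, first):
    """First set of a symbol string, given First sets of the non-terminals."""
    result = set()
    for X in symbols:
        fx = first.get(X, {X})
        result |= fx - {EPS}
        if EPS not in fx:
            return result
    return result | {EPS}

def follow_sets(grammar, start, first):
    follow = {A: set() for A in grammar}
    follow[start].add("$")                         # rule 1
    changed = True
    while changed:
        changed = False
        for B, alternatives in grammar.items():
            for alt in alternatives:
                for i, A in enumerate(alt):
                    if A not in grammar:
                        continue                   # only non-terminals have Follow sets
                    f_gamma = first_of_string(alt[i + 1:], first)
                    new = f_gamma - {EPS}          # rule 2
                    if EPS in f_gamma:
                        new |= follow[B]           # rule 3 (also when gamma is empty)
                    if not new <= follow[A]:
                        follow[A] |= new
                        changed = True
    return follow

# Assumed example: exp -> term exp',  exp' -> + term exp' | ε,  term -> ( exp ) | num
GRAMMAR = {
    "exp": [["term", "exp'"]],
    "exp'": [["+", "term", "exp'"], []],
    "term": [["(", "exp", ")"], ["num"]],
}
FIRST = {"exp": {"(", "num"}, "exp'": {"+", EPS}, "term": {"(", "num"}}
FOLLOW = follow_sets(GRAMMAR, "exp", FIRST)
```

For this grammar, `Follow(exp)` and `Follow(exp')` are `{"$", ")"}` and `Follow(term)` is `{"+", "$", ")"}`.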

Simplified LL(1) parsing table-construction rules (repeat for each non-terminal $A$ and each production $A \rightarrow \alpha$):

  1. For each token $a$ in $First(\alpha)$, add $A \rightarrow \alpha$ to the entry $M[A,a]$.
  2. If $\epsilon$ is in $First(\alpha)$, for each element $a$ of $Follow(A)$ (a token or \$), add $A \rightarrow \alpha$ to $M[A,a]$.
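
The two rules translate almost directly into code. In this sketch the First and Follow sets are assumed to be given (written out by hand for a small expression grammar), and each table entry is a list so that conflicts stay visible; all names are illustrative.

```python
EPS = "ε"

def first_of_string(symbols, first):
    """First set of a symbol string, given First sets of the non-terminals."""
    result = set()
    for X in symbols:
        fx = first.get(X, {X})
        result |= fx - {EPS}
        if EPS not in fx:
            return result
    return result | {EPS}

def build_ll1_table(grammar, first, follow):
    """Return M as a dict mapping (A, a) to the list of right-hand sides added."""
    table = {}
    for A, alternatives in grammar.items():
        for alt in alternatives:
            f_alpha = first_of_string(alt, first)
            lookaheads = f_alpha - {EPS}           # rule 1
            if EPS in f_alpha:
                lookaheads |= follow[A]            # rule 2
            for a in lookaheads:
                table.setdefault((A, a), []).append(alt)
    return table

# Assumed example: exp -> term exp',  exp' -> + term exp' | ε,  term -> ( exp ) | num
GRAMMAR = {
    "exp": [["term", "exp'"]],
    "exp'": [["+", "term", "exp'"], []],
    "term": [["(", "exp", ")"], ["num"]],
}
FIRST = {"exp": {"(", "num"}, "exp'": {"+", EPS}, "term": {"(", "num"}}
FOLLOW = {"exp": {"$", ")"}, "exp'": {"$", ")"}, "term": {"+", "$", ")"}}
TABLE = build_ll1_table(GRAMMAR, FIRST, FOLLOW)
IS_LL1 = all(len(entry) <= 1 for entry in TABLE.values())
```

The example grammar produces seven entries, each holding a single production, so `IS_LL1` is true. An entry with two or more productions would indicate that the grammar is not LL(1).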

A grammar in BNF is $LL(1)$ if the following conditions are satisfied:

  1. For every production $A \rightarrow \alpha_{1} | \alpha_{2} | ... | \alpha_{n}$, $First(\alpha_{i}) \cap First(\alpha_{j})$ is empty for all $i$ and $j$, $1 \leqslant i,j \leqslant n$, $i \neq j$.
  2. For every non-terminal $A$ such that $First(A)$ contains $\epsilon$, $First(A) \cap Follow(A)$ is empty.

error recovery
different levels of response to errors:

  1. Give a meaningful error message;
    determine as closely as possible the location where that error has occurred.
  2. Some form of error correction; (error repair)
    the parser attempts to infer a correct program from the incorrect one given.

Some important considerations that apply are the following:

  1. To determine that an error has occurred as soon as possible.
  2. After an error has occurred, pick a likely place to resume the parse and try to parse as much of the code as possible.
  3. To avoid the error cascade problem.
  4. To avoid infinite loops on error.

Panic mode:

  1. A standard form of error recovery in recursive-descent parsers;
  2. The error handler will consume a possibly large number of tokens in an attempt to find a place to resume parsing;

mechanism:

  1. A set of synchronizing tokens is provided to each recursive procedure;
  2. If an error is encountered, the parser scans ahead, throwing away tokens until one of the synchronizing tokens is seen in the input, at which point parsing is resumed.
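
A minimal sketch of this mechanism in a recursive-descent parser, assuming the grammar `exp -> term { + term }`, `term -> ( exp ) | num`. The helper names `check_input` and `scan_to`, the token spelling, and the error messages are all choices made for this example.

```python
FIRST_TERM = {"(", "num"}          # tokens that can start a term

class Parser:
    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]
        self.pos = 0
        self.errors = []

    @property
    def tok(self):
        return self.tokens[self.pos]

    def scan_to(self, synch):
        # Throw away tokens until a synchronizing token; "$" always stops the scan.
        while self.tok not in synch and self.tok != "$":
            self.pos += 1

    def check_input(self, expected, synch):
        if self.tok not in expected:
            self.errors.append(f"unexpected {self.tok!r} at position {self.pos}")
            self.scan_to(expected | synch)

    def match(self, expected):
        if self.tok == expected:
            self.pos += 1
        else:
            self.errors.append(f"expected {expected!r} at position {self.pos}")

    def exp(self, synch):
        self.term(synch | {"+"})
        while self.tok == "+":
            self.match("+")
            self.term(synch | {"+"})

    def term(self, synch):
        self.check_input(FIRST_TERM, synch)     # on entry: a term must start here
        if self.tok == "(":
            self.match("(")
            self.exp(synch | {")"})
            self.match(")")
        elif self.tok == "num":
            self.match("num")
        self.check_input(synch, FIRST_TERM)     # on exit: a synchronizing token must follow

def parse(tokens):
    parser = Parser(tokens)
    parser.exp({"$"})
    return parser.errors
```

`parse(["num", "+", "*", "num"])` reports one error for the stray `*`, skips it, and continues with the following `num`; a correct input yields an empty error list.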

In a table-driven parser, panic mode can be driven either by extra entries in the parsing table or by a separate stack of synchronizing sets (a synchset stack).
Additions to the LL(1) parsing table:

  1. $M[A,a] = sync$, if $a \in Follow(A)$
  2. $M[A,a] = skip$, if $a \notin First(A) \cup Follow(A)$

Given a non-terminal $A$ at the top of the stack and an input token that is not in $First(A)$ (or $Follow(A)$, if $\epsilon$ is in $First(A)$), there are three possible alternatives:

  1. Pop A from the stack.
  2. Successively pop tokens from the input until a token is seen at which the parse can be restarted.
  3. Push a new non-terminal onto the stack

algorithm:

  1. alternative 1 (pop)
    if the current input token is \$ or is in $Follow(A)$;
  2. alternative 2 (scan)
    if the current input token is not \$ and is not in $First(A) \cup Follow(A)$;
  3. alternative 3 (push a new non-terminal)
    occasionally useful in special situations.
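
Alternatives 1 and 2 can be sketched on top of a table-driven driver. The table and Follow sets below are written by hand for the assumed grammar `exp -> term exp'`, `exp' -> + term exp' | ε`, `term -> ( exp ) | num`. How to treat a mismatched terminal or leftover input is not covered by the rules above, so dropping the expected terminal and skipping extra tokens are assumptions of this sketch.

```python
TABLE = {
    ("exp", "("): ["term", "exp'"], ("exp", "num"): ["term", "exp'"],
    ("exp'", "+"): ["+", "term", "exp'"], ("exp'", ")"): [], ("exp'", "$"): [],
    ("term", "("): ["(", "exp", ")"], ("term", "num"): ["num"],
}
FOLLOW = {"exp": {"$", ")"}, "exp'": {"$", ")"}, "term": {"+", "$", ")"}}

def parse_with_recovery(table, follow, start, tokens):
    """Parse to the end of the input and return the list of recovery actions."""
    stack = ["$", start]
    tokens = list(tokens) + ["$"]
    pos, errors = 0, []
    while True:
        top, tok = stack[-1], tokens[pos]
        if top == "$":
            if tok == "$":
                return errors
            errors.append(f"skip extra {tok!r}")    # assumption: drop leftover input
            pos += 1
        elif top in follow:                          # top is a non-terminal
            rhs = table.get((top, tok))
            if rhs is not None:
                stack.pop()
                stack.extend(reversed(rhs))          # generate
            elif tok == "$" or tok in follow[top]:
                errors.append(f"pop {top}")          # alternative 1 (pop)
                stack.pop()
            else:
                errors.append(f"skip {tok!r}")       # alternative 2 (scan)
                pos += 1
        elif top == tok:
            stack.pop()                              # match
            pos += 1
        else:
            errors.append(f"missing {top!r}")        # assumption: drop the expected terminal
            stack.pop()
```

For the input `num + + num` the parser pops `term` once (the second `+` is in `Follow(term)`) and then finishes normally; for `+ num` it skips the leading `+`. Every recovery step either shrinks the stack or consumes input, which avoids infinite loops on error.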