Compiler Principles, Chapter 4: Top-Down Parsing


CarolusRex


Article link: https://blog.csdn.net/CarolusRex/article/details/105511142


These notes use Compiler Construction: Principles and Practice (编译原理及实践) as the textbook.

Chapter Four. Top-Down Parsing

Recursive-descent parsing (requires EBNF)
LL(k) parsing
In LL(1) parsing, the parse is driven by an explicit stack, so the right-hand side of a production is pushed in reverse order
Actions: generate (replace the non-terminal $A$ at the top of the stack using a grammar rule $A\rightarrow \alpha$) or match
LL(1) parsing table construction rules:

If $A\rightarrow \alpha$ and $\alpha \Rightarrow^{*} a\beta$ (where $a$ is a token), add $A\rightarrow \alpha$ to the table entry $M[A, a]$;
If $A\rightarrow \alpha$, $\alpha \Rightarrow^{*} \epsilon$, and $S\,\$ \Rightarrow^{*} \beta A a \gamma$ (where $S$ is the start symbol and $a$ is a token or $\$$), add $A\rightarrow \alpha$ to the table entry $M[A, a]$.

A grammar is LL(1) if its LL(1) parsing table has at most one production in each table entry.
Parsing algorithm:

If the top of the parsing stack is a terminal $a$ and the next input token is $a$, then match (pop the parsing stack and advance the input); otherwise report an error.
If the top of the parsing stack is a non-terminal $A$, the next input token is $a$, and the entry $M[A,a]=A\rightarrow X_{1}X_{2}...X_{n}$ exists, then generate (pop the parsing stack and push $X_{n},...,X_{1}$, that is, the $X_{i}$ in reverse order, so that $X_{1}$ ends up on top); otherwise report an error.
If the top of the parsing stack is $\$$ and the next input token is $\$$, accept.
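The three cases above can be sketched as a small table-driven loop. This is only a minimal illustration under stated assumptions: the grammar $S\rightarrow (S)S|\epsilon$, the hand-written `TABLE`, and the function name `ll1_parse` are all chosen for the example and are not part of the notes.

```python
END = "$"  # end-of-input marker, also used as the bottom-of-stack marker

# Assumed example grammar: S -> ( S ) S | epsilon (balanced parentheses).
# TABLE[(A, a)] is the right-hand side selected for non-terminal A on token a.
TABLE = {
    ("S", "("): ["(", "S", ")", "S"],
    ("S", ")"): [],   # S -> epsilon
    ("S", END): [],   # S -> epsilon
}
NONTERMINALS = {"S"}


def ll1_parse(tokens, start="S"):
    """Return the list of actions taken, or raise SyntaxError."""
    stack = [END, start]            # the top of the stack is the end of the list
    tokens = list(tokens) + [END]
    pos = 0
    actions = []
    while True:
        top, look = stack[-1], tokens[pos]
        if top == END and look == END:
            actions.append("accept")
            return actions
        if top in NONTERMINALS:
            rhs = TABLE.get((top, look))
            if rhs is None:
                raise SyntaxError(f"no entry M[{top}, {look}]")
            stack.pop()
            stack.extend(reversed(rhs))   # push X1..Xn in reverse order
            actions.append(f"generate {top} -> {' '.join(rhs) or 'epsilon'}")
        elif top == look:
            stack.pop()                   # match: pop the stack, advance the input
            pos += 1
            actions.append(f"match {look}")
        else:
            raise SyntaxError(f"expected {top}, found {look}")
```

Calling `ll1_parse("()")` walks through generate and match steps and ends with `accept`, while an unbalanced input such as `")("` raises `SyntaxError`.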

left recursion removal
immediate left recursion
indirect left recursion

immediate left recursion
$A\rightarrow A\alpha |\beta$
transform to
$A\rightarrow \beta A'$
$A'\rightarrow \alpha A' | \epsilon$
general immediate left recursion
$A\rightarrow A\alpha_{1}|A\alpha_{2}|...|A\alpha_{n}|\beta_{1}|\beta_{2}|...|\beta_{m}$
transform to
$A\rightarrow \beta_{1} A'|\beta_{2} A'|...|\beta_{m} A'$
$A'\rightarrow \alpha_{1} A'|\alpha_{2} A'|...|\alpha_{n} A'|\epsilon$
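As a rough sketch, the general immediate case can be written as a function that splits the alternatives of $A$ into the $A\alpha_{i}$ group and the $\beta_{j}$ group. The representation (each right-hand side as a list of symbols, `[]` for $\epsilon$) and the name `remove_immediate_left_recursion` are assumptions made for this example.

```python
def remove_immediate_left_recursion(a, alternatives):
    """Rewrite A -> A a1 | ... | A an | b1 | ... | bm without left recursion.

    `alternatives` is a list of right-hand sides, each a list of symbols;
    the empty list stands for epsilon. Returns a dict of new rules.
    """
    new = a + "'"                                   # the fresh non-terminal A'
    alphas = [rhs[1:] for rhs in alternatives if rhs and rhs[0] == a]
    betas = [rhs for rhs in alternatives if not rhs or rhs[0] != a]
    if not alphas:                                  # nothing to remove
        return {a: alternatives}
    return {
        a: [beta + [new] for beta in betas],                 # A  -> bj A'
        new: [alpha + [new] for alpha in alphas] + [[]],     # A' -> ai A' | epsilon
    }
```

For example, the rule `exp -> exp addop term | term` becomes `exp -> term exp'` and `exp' -> addop term exp' | epsilon`.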
general left recursion
The algorithm below works for grammars with no $\epsilon$-productions and no cycles.
algorithm:
1. Pick an arbitrary order for all non-terminals, say $A_{1},...,A_{m}$.
2. Eliminate all rules of the form $A_{i}\rightarrow A_{j}\gamma$ with $j\leqslant i$.
3. After that, every step in a potential left-recursive loop can only increase the index, so the original index can never be reached again and no left recursion remains.
Pseudocode:

for i := 1 to m do
	for j := 1 to i-1 do
		replace each rule Ai -> Aj β by the rules Ai -> α1 β | α2 β | ... | αk β,
		where Aj -> α1 | α2 | ... | αk are the current rules for Aj
	remove, if necessary, immediate left recursion involving Ai
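The pseudocode translates fairly directly into Python. The sketch below assumes a grammar stored as a dict from non-terminal to a list of right-hand sides (lists of symbols, `[]` for $\epsilon$); the function name and the sample grammar in the usage note are illustrative assumptions.

```python
def eliminate_left_recursion(grammar, order):
    """Remove general left recursion, assuming no epsilon-productions and no cycles.

    `grammar` maps each non-terminal to a list of right-hand sides (lists of
    symbols); `order` is the chosen ordering A1, ..., Am of the non-terminals.
    """
    g = {a: [list(rhs) for rhs in rhss] for a, rhss in grammar.items()}
    for i, ai in enumerate(order):
        for aj in order[:i]:
            # Replace Ai -> Aj beta by Ai -> alpha1 beta | ... | alphak beta.
            expanded = []
            for rhs in g[ai]:
                if rhs and rhs[0] == aj:
                    expanded.extend(alt + rhs[1:] for alt in g[aj])
                else:
                    expanded.append(rhs)
            g[ai] = expanded
        # Remove, if necessary, immediate left recursion involving Ai.
        alphas = [rhs[1:] for rhs in g[ai] if rhs and rhs[0] == ai]
        if alphas:
            betas = [rhs for rhs in g[ai] if not rhs or rhs[0] != ai]
            new = ai + "'"
            g[ai] = [beta + [new] for beta in betas]
            g[new] = [alpha + [new] for alpha in alphas] + [[]]
    return g
```

With the grammar `A -> B a | A a | c`, `B -> B b | A b | d` and the order `["A", "B"]`, the result has `A -> B a A' | c A'`, `A' -> a A' | epsilon`, `B -> c A' b B' | d B'`, and `B' -> b B' | a A' b B' | epsilon`.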

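After left recursion removal the grammar is right-recursive, but it can still be evaluated in a left-associative way: pass the value computed so far into the procedure for the new non-terminal and return the updated value.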
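A small recursive-descent sketch of this idea, assuming the grammar `exp -> term exp'`, `exp' -> - term exp' | epsilon` with integer tokens as terms. The names `evaluate`, `exp_rest`, and `val_so_far` are made up for the example.

```python
def evaluate(tokens):
    """Evaluate a chain of subtractions left-associatively.

    `tokens` is a list such as [7, "-", 3, "-", 2]; integers are terms.
    """
    tokens = list(tokens) + ["$"]
    pos = 0

    def term():
        nonlocal pos
        tok = tokens[pos]
        if not isinstance(tok, int):
            raise SyntaxError(f"number expected, found {tok!r}")
        pos += 1
        return tok

    def exp_rest(val_so_far):
        # exp' -> - term exp' | epsilon
        # The value computed so far is passed in and the updated value returned,
        # so the right-recursive rule still evaluates left to right.
        nonlocal pos
        if tokens[pos] == "-":
            pos += 1
            return exp_rest(val_so_far - term())
        return val_so_far

    def exp():
        # exp -> term exp'
        return exp_rest(term())

    result = exp()
    if tokens[pos] != "$":
        raise SyntaxError(f"unexpected {tokens[pos]!r}")
    return result
```

Here `evaluate([7, "-", 3, "-", 2])` computes `(7 - 3) - 2`, not `7 - (3 - 2)`.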

left factoring
solution:
$A\rightarrow \alpha\beta | \alpha\gamma$
transforms to
$A\rightarrow \alpha A'$
$A'\rightarrow \beta | \gamma$
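A single factoring step of this kind can be sketched as follows. This is only an illustration of the transformation shown above, not the full algorithm; the grouping by first symbol, the list-of-symbols representation, and the name `left_factor_once` are assumptions.

```python
def common_prefix(seqs):
    """Longest common prefix of several symbol lists."""
    prefix = []
    for symbols in zip(*seqs):
        if len(set(symbols)) != 1:
            break
        prefix.append(symbols[0])
    return prefix


def left_factor_once(a, alternatives):
    """Factor out the common prefix of alternatives that share a first symbol."""
    groups = {}
    for rhs in alternatives:
        groups.setdefault(rhs[0] if rhs else None, []).append(rhs)
    result = {a: []}
    count = 0
    for first, group in groups.items():
        if first is None or len(group) == 1:
            result[a].extend(group)          # nothing to factor here
            continue
        count += 1
        new = a + "'" * count                # fresh non-terminal A', A'', ...
        prefix = common_prefix(group)
        result[a].append(prefix + [new])                      # A  -> alpha A'
        result[new] = [rhs[len(prefix):] for rhs in group]    # A' -> beta | gamma
    return result
```

Applied to `A -> a b | a c | d` this gives `A -> a A' | d` and `A' -> b | c`. The new alternatives may share a prefix again, so in general the step has to be repeated until nothing changes.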
algorithm:
…
Syntax tree construction is harder to fit into LL(1) parsing, for two reasons:

The structure of the syntax tree can be obscured by left factoring and left recursion removal;
The parsing stack represents only predicted structure, not structure that has actually been seen.
The solution:
An extra stack is used to keep track of syntax tree nodes;
“action” markers are placed in the parsing stack to indicate when and what actions on the tree stack should occur.
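The two-stack idea can be sketched like this. Everything specific here is an assumption for illustration: the grammar `E -> num E'`, `E' -> + num E' | epsilon`, the marker names `#leaf` and `#add`, and the tuple representation of tree nodes.

```python
END = "$"

# Assumed grammar with action markers embedded in the right-hand sides:
#   E  -> num #leaf E'
#   E' -> + num #leaf #add E' | epsilon
TABLE = {
    ("E", "num"): ["num", "#leaf", "E'"],
    ("E'", "+"): ["+", "num", "#leaf", "#add", "E'"],
    ("E'", END): [],
}


def kind(token):
    """Integers are 'num' tokens; every other token is its own kind."""
    return "num" if isinstance(token, int) else token


def parse_to_tree(tokens):
    stack = [END, "E"]      # parsing stack (predicted structure)
    nodes = []              # extra stack of syntax tree nodes (structure seen)
    tokens = list(tokens) + [END]
    pos, last = 0, None
    while True:
        top, look = stack[-1], tokens[pos]
        if top == "#leaf":              # action: push the token just matched
            stack.pop()
            nodes.append(last)
        elif top == "#add":             # action: combine the two topmost nodes
            stack.pop()
            right, left = nodes.pop(), nodes.pop()
            nodes.append(("+", left, right))
        elif top == END and look == END:
            return nodes.pop()
        elif top in ("E", "E'"):
            rhs = TABLE.get((top, kind(look)))
            if rhs is None:
                raise SyntaxError(f"unexpected {look!r}")
            stack.pop()
            stack.extend(reversed(rhs))
        elif top == kind(look):
            last = look
            stack.pop()
            pos += 1
        else:
            raise SyntaxError(f"expected {top}, found {look!r}")
```

For the input `[1, "+", 2, "+", 3]` the tree stack ends up holding `("+", ("+", 1, 2), 3)`, so the tree is left-associative even though the grammar is right-recursive.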

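First set: $First(\alpha)$ is the set of tokens that can begin a string derived from $\alpha$ (plus $\epsilon$ if $\alpha$ can derive the empty string).
Follow set: $Follow(A)$ is the set of tokens (or $\$$, but never $\epsilon$) that can appear immediately after the non-terminal $A$ in some derivation.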

First Set algorithm:
For a string $\alpha = X_{1}X_{2}...X_{n}$:

$First(\alpha)$ contains $First(X_{1})-\{\epsilon\}$.
For each $i = 2, ..., n$, if $First(X_{k})$ contains $\epsilon$ for all $k = 1, ..., i-1$, then $First(\alpha)$ contains $First(X_{i})-\{\epsilon\}$.
If all the sets $First(X_{1}), ..., First(X_{n})$ contain $\epsilon$, then $First(\alpha)$ contains $\epsilon$.
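One way to turn these rules into code is a fixed-point iteration over all productions. The sketch below assumes a grammar stored as a dict from non-terminal to a list of right-hand sides, with `[]` standing for $\epsilon$; the small expression grammar and the function names are illustrative choices.

```python
EPS = "ε"

# Assumed example grammar (any symbol that is not a key is a token).
GRAMMAR = {
    "exp": [["term", "exp'"]],
    "exp'": [["addop", "term", "exp'"], []],
    "addop": [["+"], ["-"]],
    "term": [["(", "exp", ")"], ["number"]],
}


def first_of_string(symbols, first):
    """First set of the string X1 X2 ... Xn, given First sets of non-terminals."""
    result = set()
    for x in symbols:
        fx = first[x] if x in first else {x}   # for a token a, First(a) = {a}
        result |= fx - {EPS}
        if EPS not in fx:
            return result                      # Xi cannot vanish: stop here
    result.add(EPS)                            # every Xi can derive epsilon
    return result


def compute_first(grammar):
    """Iterate until no First set changes."""
    first = {a: set() for a in grammar}
    changed = True
    while changed:
        changed = False
        for a, alternatives in grammar.items():
            for rhs in alternatives:
                new = first_of_string(rhs, first)
                if not new <= first[a]:
                    first[a] |= new
                    changed = True
    return first
```

For the sample grammar, `compute_first(GRAMMAR)["exp'"]` contains `+`, `-`, and `ε`, while `First(exp)` and `First(term)` are both `{"(", "number"}`.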

Follow Set algorithm:

If $A$ is the start symbol, then $\$$ is in $Follow(A)$.
If there is a production $B\rightarrow \alpha A \gamma$, then $First(\gamma)-\{\epsilon\}$ is in $Follow(A)$.
If there is a production $B\rightarrow \alpha A \gamma$ and $\epsilon$ is in $First(\gamma)$, then $Follow(A)$ contains $Follow(B)$.
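The Follow rules can be iterated to a fixed point in the same style. The block below is a self-contained sketch: the grammar, the dict-of-lists representation, and the function names are assumptions for the example, and the First computation is repeated only so that the block runs on its own.

```python
EPS, END = "ε", "$"

GRAMMAR = {   # assumed example grammar; symbols that are not keys are tokens
    "exp": [["term", "exp'"]],
    "exp'": [["addop", "term", "exp'"], []],
    "addop": [["+"], ["-"]],
    "term": [["(", "exp", ")"], ["number"]],
}


def first_of_string(symbols, first):
    result = set()
    for x in symbols:
        fx = first[x] if x in first else {x}
        result |= fx - {EPS}
        if EPS not in fx:
            return result
    return result | {EPS}


def compute_first(grammar):
    first = {a: set() for a in grammar}
    changed = True
    while changed:
        changed = False
        for a, alternatives in grammar.items():
            for rhs in alternatives:
                new = first_of_string(rhs, first)
                if not new <= first[a]:
                    first[a] |= new
                    changed = True
    return first


def compute_follow(grammar, start, first):
    follow = {a: set() for a in grammar}
    follow[start].add(END)                        # rule 1: $ follows the start symbol
    changed = True
    while changed:
        changed = False
        for b, alternatives in grammar.items():
            for rhs in alternatives:
                for i, a in enumerate(rhs):
                    if a not in grammar:          # only non-terminals have Follow sets
                        continue
                    first_gamma = first_of_string(rhs[i + 1:], first)
                    new = first_gamma - {EPS}     # rule 2
                    if EPS in first_gamma:        # rule 3 (also when gamma is empty)
                        new |= follow[b]
                    if not new <= follow[a]:
                        follow[a] |= new
                        changed = True
    return follow
```

With `first = compute_first(GRAMMAR)`, the call `compute_follow(GRAMMAR, "exp", first)` gives `Follow(exp) = Follow(exp') = {"$", ")"}` and `Follow(term) = {"+", "-", "$", ")"}`.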

Simplified LL(1) parsing table construction rules, for each production $A\rightarrow \alpha$:

For each token $a$ in $First(\alpha)$, add $A\rightarrow \alpha$ to the entry $M[A, a]$.
If $\epsilon$ is in $First(\alpha)$, then for each element $a$ of $Follow(A)$ (a token or $\$$), add $A\rightarrow \alpha$ to $M[A, a]$.
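A compact sketch of these two rules, assuming the First and Follow sets are already known. The grammar $S\rightarrow (S)S|\epsilon$ and its hand-computed `FIRST` and `FOLLOW` dicts are assumptions for the example, as are the function names.

```python
EPS, END = "ε", "$"

GRAMMAR = {"S": [["(", "S", ")", "S"], []]}   # assumed example grammar
FIRST = {"S": {"(", EPS}}                      # hand-computed for this grammar
FOLLOW = {"S": {")", END}}                     # hand-computed for this grammar


def first_of_string(symbols, first):
    result = set()
    for x in symbols:
        fx = first[x] if x in first else {x}   # for a token a, First(a) = {a}
        result |= fx - {EPS}
        if EPS not in fx:
            return result
    return result | {EPS}


def build_table(grammar, first, follow):
    """Map (A, a) to the list of right-hand sides placed in entry M[A, a]."""
    table = {}
    for a, alternatives in grammar.items():
        for rhs in alternatives:
            first_alpha = first_of_string(rhs, first)
            lookaheads = first_alpha - {EPS}          # rule 1
            if EPS in first_alpha:                    # rule 2
                lookaheads |= follow[a]
            for tok in lookaheads:
                table.setdefault((a, tok), []).append(rhs)
    return table


def conflicts(table):
    """Entries holding more than one production (the grammar is then not LL(1))."""
    return sorted(key for key, rhss in table.items() if len(rhss) > 1)
```

Keeping a list per entry, rather than a single production, makes conflicts visible: for this grammar `conflicts(build_table(GRAMMAR, FIRST, FOLLOW))` is empty.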

A grammar in BNF is $LL(1)$ if the following conditions are satisfied:

For every production $A\rightarrow \alpha_{1}|\alpha_{2}|...|\alpha_{n}$, $First(\alpha_{i})\cap First(\alpha_{j})$ is empty for all $i$ and $j$ with $1\leqslant i,j\leqslant n$ and $i\neq j$.
For every non-terminal $A$ such that $First(A)$ contains $\epsilon$, $First(A)\cap Follow(A)$ is empty.
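The two conditions can be checked directly once First and Follow are available. In the sketch below the sets are passed in as hand-computed dicts; the function name `is_ll1` and the sample grammars in the usage note are assumptions for illustration.

```python
from itertools import combinations

EPS, END = "ε", "$"


def first_of_string(symbols, first):
    result = set()
    for x in symbols:
        fx = first[x] if x in first else {x}   # for a token a, First(a) = {a}
        result |= fx - {EPS}
        if EPS not in fx:
            return result
    return result | {EPS}


def is_ll1(grammar, first, follow):
    for a, alternatives in grammar.items():
        # Condition 1: First sets of the alternatives are pairwise disjoint.
        firsts = [first_of_string(rhs, first) for rhs in alternatives]
        for f1, f2 in combinations(firsts, 2):
            if f1 & f2:
                return False
        # Condition 2: if A can vanish, First(A) and Follow(A) are disjoint.
        if EPS in first[a] and first[a] & follow[a]:
            return False
    return True
```

For instance, `S -> ( S ) S | ε` passes both conditions, `A -> a | a b` fails the first one, and `S -> A a`, `A -> a | ε` fails the second because `a` is in both `First(A)` and `Follow(A)`.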

error recovery
different levels of response to errors:

Give a meaningful error message;
Determine as closely as possible the location where the error occurred;
Attempt some form of error correction (error repair):
the parser tries to infer a correct program from the incorrect one given.

Some important considerations that apply are the following:

To determine that an error has occurred as soon as possible.
After an error has occurred, to pick a likely place to resume the parse and to try to parse as much of the code as possible.
To avoid the error cascade problem.
To avoid infinite loops on error.

Panic mode:

A standard form of error recovery in recursive-descent parsers.
The error handler will consume a possibly large number of tokens in an attempt to find a place to resume parsing.

mechanism:

A set of synchronizing tokens is provided to each recursive procedure;
If an error is encountered, the parser scans ahead, throwing away tokens until one of the synchronizing tokens is seen in the input, at which point parsing is resumed.
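A minimal recursive-descent sketch of this mechanism, under several assumptions: the grammar `exp -> term { + term }`, `term -> num | ( exp )`, tokens given as strings with `"num"` standing for any number, and helper names `check_input` and `scan_to` chosen for the example.

```python
END = "$"
FIRST_EXP = {"(", "num"}    # First(exp) = First(term) for the assumed grammar


class Parser:
    def __init__(self, tokens):
        self.tokens = list(tokens) + [END]
        self.pos = 0
        self.errors = []

    def look(self):
        return self.tokens[self.pos]

    def error(self, message):
        self.errors.append(f"position {self.pos}: {message}")

    def scan_to(self, synch):
        # Throw tokens away until a synchronizing token (or the end) is seen.
        while self.look() not in synch and self.look() != END:
            self.pos += 1

    def check_input(self, first, follow):
        if self.look() not in first:
            self.error(f"unexpected {self.look()!r}")
            self.scan_to(first | follow)

    def match(self, expected):
        if self.look() == expected:
            self.pos += 1
        else:
            self.error(f"expected {expected!r}")

    def exp(self, synch):
        # exp -> term { + term }
        self.check_input(FIRST_EXP, synch)
        if self.look() in FIRST_EXP:
            self.term(synch | {"+"})
            while self.look() == "+":
                self.match("+")
                self.term(synch | {"+"})
            self.check_input(synch, set())   # resynchronize before returning

    def term(self, synch):
        # term -> num | ( exp )
        self.check_input(FIRST_EXP, synch)
        if self.look() == "num":
            self.match("num")
        elif self.look() == "(":
            self.match("(")
            self.exp(synch | {")"})
            self.match(")")


def parse(tokens):
    parser = Parser(tokens)
    parser.exp({END})
    return parser.errors
```

Each procedure extends the synchronizing set it received with the tokens that may follow its own callee, so one stray token such as the second `num` in `( num num ) + num` produces a single message instead of a cascade. Stopping at the end marker in `scan_to` keeps the recovery from looping forever.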

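In a table-driven LL(1) parser, panic mode can be implemented either by adding recovery entries to the parsing table or by keeping a stack of synchronizing sets (a synchset stack).
LL(1) parsing table additions (for entries that are otherwise empty):

$M[A,a]=\text{sync}$, if $a \in Follow(A)$
$M[A,a]=\text{skip}$, if $a \notin First(A) \cup Follow(A)$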

Given a non-terminal A at the top of the stack and an input token that is not in First(A) (or Follow(A), if ε is in First(A)), there are three possible alternatives:

Pop A from the stack.
Successively pop tokens from the input until a token is seen from which the parse can restart.
Push a new non-terminal onto the stack.

algorithm:

Alternative 1 (pop):
use it if the current input token is $\$$ or is in Follow(A).
Alternative 2 (scan):
use it if the current input token is not $\$$ and is not in First(A)∪Follow(A).
Alternative 3 (push a new non-terminal):
occasionally useful in special situations.
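The pop and scan alternatives can be sketched in a table-driven parser as follows. The grammar, the hand-built `TABLE` and `FOLLOW` sets, and the function name are assumptions for this example. Two further choices are not taken from the notes and are assumptions as well: a mismatched terminal on top of the stack is popped as if it were missing from the input, and extra input after the stack is exhausted is skipped.

```python
END = "$"

# Assumed grammar: exp -> term exp' ; exp' -> + term exp' | epsilon ;
#                  term -> num | ( exp )
TABLE = {
    ("exp", "num"): ["term", "exp'"],
    ("exp", "("): ["term", "exp'"],
    ("exp'", "+"): ["+", "term", "exp'"],
    ("exp'", ")"): [],
    ("exp'", END): [],
    ("term", "num"): ["num"],
    ("term", "("): ["(", "exp", ")"],
}
FOLLOW = {   # hand-computed Follow sets; the keys double as the non-terminals
    "exp": {")", END},
    "exp'": {")", END},
    "term": {"+", ")", END},
}


def parse_with_recovery(tokens):
    """Parse and return the list of recovery actions that were needed."""
    stack = [END, "exp"]
    tokens = list(tokens) + [END]
    pos, errors = 0, []
    while True:
        top, look = stack[-1], tokens[pos]
        if top == END and look == END:
            return errors
        if top in FOLLOW:                                   # non-terminal on top
            rhs = TABLE.get((top, look))
            if rhs is not None:
                stack.pop()
                stack.extend(reversed(rhs))
            elif look == END or look in FOLLOW[top]:
                errors.append(f"pop {top} at {look!r}")    # alternative 1
                stack.pop()
            else:
                errors.append(f"scan past {look!r}")       # alternative 2
                pos += 1
        elif top == look:
            stack.pop()
            pos += 1
        elif top == END:
            errors.append(f"scan past {look!r}")           # assumed: skip extra input
            pos += 1
        else:
            errors.append(f"missing {top!r}")              # assumed: pop the terminal
            stack.pop()
```

Every iteration either pops the stack, consumes a token, or expands a non-terminal of a grammar without left recursion, so the loop terminates. For `["num", "+", "+", "num"]` the parser pops `term` at the second `+` and then continues normally.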
