Notes for Formal Languages and Automata Theory

cervoliu

Last modified: 2022-04-16 10:10:08

First published: 2022-02-23 11:34:16

Article link: https://blog.csdn.net/lyd_7_29/article/details/123082792

Operations on languages: union, concatenation, Kleene closure, positive closure

Alphabet

Non-empty finite set of symbols
$\Sigma$

Strings

finite sequence of symbols of $\Sigma$
(word)

empty string : $\varepsilon$

$\Sigma^*$ : the set of all strings over alphabet $\Sigma$ (including $\varepsilon$ )

$\Sigma^*$ is a monoid with respect to concatenation, with $\varepsilon$ as the identity element

Languages

A language over alphabet $\Sigma$ is a subset (which can be infinite) of $\Sigma^*$
(personally, similar to vocabulary)

Kleene star or Kleene closure of a language $L$
$L^* = \cup_{n\geq 0}L^n$

which is consistent with the notation of $\Sigma^*$ by considering $\Sigma$ as a language over alphabet $\Sigma$

Positive closure of a language $L$
$L^+ = \cup_{n\geq 1} L^n$
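
The operations above can be tried out on small finite languages. The following is a minimal Python sketch, not part of the original notes: languages are modelled as sets of `str`, $\varepsilon$ is the empty string `""`, and because $L^*$ is usually infinite the closures are truncated at a length bound. The helper names (`concat`, `power`, `star_upto`, `plus_upto`) are hypothetical.

```python
def concat(l1, l2):
    """Concatenation L1 L2 = { xy : x in L1, y in L2 }."""
    return {x + y for x in l1 for y in l2}


def power(lang, n):
    """L^n, using the convention L^0 = {epsilon} (the empty string)."""
    result = {""}
    for _ in range(n):
        result = concat(result, lang)
    return result


def star_upto(lang, max_len):
    """All strings of L^* whose length is at most max_len."""
    result = {""}
    frontier = {""}
    while frontier:
        # Extend the newest strings by one more word of L, drop long or known ones.
        frontier = {w for w in concat(frontier, lang) if len(w) <= max_len} - result
        result |= frontier
    return result


def plus_upto(lang, max_len):
    """All strings of L^+ = L L^* whose length is at most max_len."""
    return {w for w in concat(lang, star_upto(lang, max_len)) if len(w) <= max_len}


L = {"a", "bb"}
print(sorted(power(L, 2)))      # ['aa', 'abb', 'bba', 'bbbb']
print(sorted(star_upto(L, 3)))  # ['', 'a', 'aa', 'aaa', 'abb', 'bb', 'bba']
```

Since $\varepsilon \notin L$ in this example, `plus_upto(L, 3)` is the same set without the empty string.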

Some Properties

P1 Concatenation is associative

P2 Concatenation is not commutative

P3 $L\{\varepsilon\}=\{\varepsilon\}L=L$

P4 $L\varnothing=\varnothing L= \varnothing$

P5 Distributive properties

$(L_1\cup L_2)L_3 = L_1L_3\cup L_2L_3$
$L_1(L_2\cup L_3) = L_1L_2\cup L_1L_3$

P6 if $L_1\subseteq L_2$ and $L_3\subseteq L_4$ , then $L_1L_3 \subseteq L_2L_4$

P7 $\varnothing^*=\{\varepsilon\}$

P8 $\{\varepsilon\}^*=\{\varepsilon\}$

P9 if $\varepsilon \in L$ , then $L^*=L^+$

P10 $LL^*=L^*L=L^+$

P11 $(L^*)^*=L^*$

P12 $L^*L^*=L^*$

P13 $(L_1L_2)^*L_1=L_1(L_2L_1)^*$

P14 $(L_1\cup L_2)^*=(L_1^*L_2^*)^*$
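
As a short worked check, P7 and P10 follow directly from the definitions. The sketch below assumes the usual convention $L^0=\{\varepsilon\}$ (which the notes use implicitly) and that concatenation distributes over arbitrary unions, as in P5:

$\varnothing^* = \varnothing^0 \cup \bigcup_{n\geq 1}\varnothing^n = \{\varepsilon\}\cup\varnothing = \{\varepsilon\}$ (each $\varnothing^n$ with $n\geq 1$ is $\varnothing$ by P4)
$LL^* = L\bigcup_{n\geq 0}L^n = \bigcup_{n\geq 0}L^{n+1} = \bigcup_{n\geq 1}L^n = L^+$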

Regular Expressions

$\varnothing$ , $\varepsilon$ , and $a$ for each $a\in \Sigma$ are regular expressions representing the languages $\varnothing$ , $\{\varepsilon\}$ , and $\{a\}$ respectively
If $r$ and $s$ are regular expressions representing the languages $R$ and $S$ , then so are
a. $(r + s)$ representing the language $R\cup S$
b. $(r s)$ representing the language $R S$
c. $(r^*)$ representing the language $R^*$

(use as few parentheses as possible when writing expressions)

$L(r)$ denotes the regular language represented by the regular expression $r$
sometimes we just write $r$ instead of $L(r)$ to indicate the regular language

Equivalent
expressions $r_1$ and $r_2$ representing the same language
we write $r_1 \approx r_2$ ( $r_1=r_2$ is also reasonable)

Some identities

$r\varepsilon\approx \varepsilon r\approx r$
$r_1r_2\not\approx r_2r_1$ , in general
$(r_1r_2)r_3\approx r_1(r_2r_3)$
$r\varnothing\approx \varnothing r \approx \varnothing$
$\varnothing^*\approx \varepsilon$
$\varepsilon^*\approx \varepsilon$
if $\varepsilon \in L(r)$ then $r^*\approx r^+$
$rr^*\approx r^*r \approx r^+$
Distributive identity.
$(r_1+r_2)r_3\approx r_1r_3+r_2r_3$
$r_1(r_2+r_3)\approx r_1r_2+r_1r_3$
$(r^*)^*\approx r^*$
$(r_1r_2)^*r_1\approx r_1(r_2r_1)^*$
$(r_1+r_2)^*\approx (r_1^*r_2^*)^*$
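
Identities like these can be sanity-checked on short strings. The sketch below is only an illustration under stated assumptions: it uses Python's standard `re` module, where union is written `|` rather than `+`, takes $r_1=a$ and $r_2=b$ (or $r_1=ab$ for the non-commutativity check), and compares the two languages only up to a length bound, so agreement is evidence and not a proof. The helper names `language_upto` and `agree_upto` are hypothetical.

```python
import re
from itertools import product


def language_upto(pattern, alphabet, max_len):
    """All strings over `alphabet` of length <= max_len that `pattern` matches entirely."""
    compiled = re.compile(pattern)
    return {
        "".join(chars)
        for n in range(max_len + 1)
        for chars in product(alphabet, repeat=n)
        if compiled.fullmatch("".join(chars))
    }


def agree_upto(p1, p2, alphabet="ab", max_len=6):
    """True if the two patterns accept the same strings up to max_len."""
    return language_upto(p1, alphabet, max_len) == language_upto(p2, alphabet, max_len)


# (r1 + r2)* vs (r1* r2*)*  with r1 = a, r2 = b
print(agree_upto(r"(?:a|b)*", r"(?:a*b*)*"))   # True
# (r1 r2)* r1 vs r1 (r2 r1)*  with r1 = a, r2 = b
print(agree_upto(r"(?:ab)*a", r"a(?:ba)*"))    # True
# r1 r2 vs r2 r1: concatenation is not commutative
print(agree_upto(r"ab", r"ba"))                # False
```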

Grammars

Context-Free Grammars (CFG)

A grammar is a quadruple
$\mathcal{G}=(V, T, S, P)$
where

$V = N\cup T$ , where $N$ is a finite set of non-terminals
$T$ is a finite set of terminals
$S\in N$ is the start symbol
$P$ is a finite subset of $N\times V^*$ , called the set of production rules.

Some definitions

Consider a binary relation $\Rightarrow_{\mathcal{G}}$ on $V^*$ , called the one-step relation.
$\alpha \Rightarrow_{\mathcal{G}} \beta$ iff $\alpha=\alpha_1A\alpha_2$ , $\beta=\alpha_1\gamma\alpha_2$ and $A\rightarrow \gamma \in P$ .
if $\alpha \Rightarrow_{\mathcal{G}} \beta$ , then we say $\alpha$ yields $\beta$ in one step in $\mathcal{G}$ .
The reflexive-transitive closure of $\Rightarrow_{\mathcal{G}}$ is denoted by $\Rightarrow_{\mathcal{G}}^*$ . That is, for $\alpha,\beta \in V^*$ , $\alpha \Rightarrow_{\mathcal{G}}^* \beta$ iff $\exists n\geq 0$ and $\alpha_0,\alpha_1,...,\alpha_n \in V^*$ such that $\alpha=\alpha_0\Rightarrow_{\mathcal{G}}\alpha_1\Rightarrow_{\mathcal{G}}\alpha_2\Rightarrow_{\mathcal{G}}...\Rightarrow_{\mathcal{G}}\alpha_{n-1}\Rightarrow_{\mathcal{G}}\alpha_n=\beta$
if $\alpha\Rightarrow_{\mathcal{G}}^*\beta$ , then we say $\alpha$ derives $\beta$ . Further, $\alpha\Rightarrow_{\mathcal{G}}^*\beta$ is called a derivation in $\mathcal{G}$ , and $\beta$ is the yield of the derivation.
In a given context, if we’re dealing with only one grammar $\mathcal{G}$ , then simply write $\Rightarrow$ instead of $\Rightarrow_{\mathcal{G}}$
if $\alpha=\alpha_0\Rightarrow_{\mathcal{G}}\alpha_1\Rightarrow_{\mathcal{G}}\alpha_2\Rightarrow_{\mathcal{G}}...\Rightarrow_{\mathcal{G}}\alpha_{n-1}\Rightarrow_{\mathcal{G}}\alpha_n=\beta$ is a derivation, then the length of the derivation is $n$ and it may be written as $\alpha\Rightarrow_{\mathcal{G}}^n\beta$
A string $\alpha \in V^*$ is said to be a sentential form in $\mathcal{G}$ , if $\alpha$ can be derived from the start symbol $S$ of $\mathcal{G}$ , viz. $S\Rightarrow^*\alpha$
In particular, if $\alpha \in T^*$ , then the sentential form $\alpha$ is known as a sentence, and we say $\alpha$ is generated by $\mathcal{G}$
The language generated by $\mathcal{G}$ , denoted by $L(\mathcal{G})$ , is the set of all sentences generated by $\mathcal{G}$ . That is, $L(\mathcal{G})=\{x\in T^* \mid S\Rightarrow^*x\}$
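
The definitions above can be made concrete by enumerating short sentences of a small grammar. This is a minimal Python sketch under several assumptions that are not in the notes: symbols are single characters, the non-terminals are exactly the keys of the `productions` dictionary, only leftmost derivations are explored (enough to reach every sentence of a CFG), and sentential forms are pruned with a simple length bound so the search terminates. The function name `generate` and the example grammar $S\rightarrow aSb \mid \varepsilon$ are illustrative.

```python
from collections import deque


def generate(productions, start, max_len):
    """Sentences of length <= max_len, found by breadth-first leftmost derivation."""
    sentences = set()
    seen = {start}
    queue = deque([start])
    while queue:
        form = queue.popleft()
        # Position of the leftmost non-terminal, or None if the form is a sentence.
        idx = next((i for i, c in enumerate(form) if c in productions), None)
        if idx is None:
            sentences.add(form)
            continue
        for rhs in productions[form[idx]]:
            new = form[:idx] + rhs + form[idx + 1:]
            terminals = sum(1 for c in new if c not in productions)
            # Prune forms that can no longer yield a short sentence; the bound on
            # the total length is a heuristic that is adequate for small examples.
            if terminals <= max_len and len(new) <= 2 * max_len + 2 and new not in seen:
                seen.add(new)
                queue.append(new)
    return sentences


grammar = {"S": ["aSb", ""]}  # S -> aSb | epsilon
print(sorted(generate(grammar, "S", 4)))  # ['', 'aabb', 'ab']
```

Each queue entry is a sentential form, and each expansion is one application of the one-step relation $\Rightarrow$.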

A production rule $A\rightarrow \alpha$ can be applied independently of the symbols neighbouring $A$ , i.e. it is context free.

Thus the grammar defined here is known as context free grammar, simply CFG.

Derivation Trees

also known as parse trees
the yield $\alpha$ of a derivation $A\Rightarrow^*\alpha$ can be read off the derivation tree by juxtaposing the labels of the leaf nodes from left to right.
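
Reading the yield off a tree is a left-to-right traversal of the leaves. A minimal Python sketch, assuming a hypothetical encoding in which a node is a pair `(label, children)` and an $\varepsilon$ leaf carries the empty string as its label:

```python
def leaf(label):
    """A leaf node: a label with no children."""
    return (label, [])


def tree_yield(node):
    """Concatenate the labels of the leaves from left to right."""
    label, children = node
    if not children:
        return label
    return "".join(tree_yield(child) for child in children)


# Derivation tree for S => aSb => aaSbb => aabb under S -> aSb | epsilon
tree = ("S", [
    leaf("a"),
    ("S", [leaf("a"), ("S", [leaf("")]), leaf("b")]),
    leaf("b"),
])
print(tree_yield(tree))  # aabb
```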

Ambiguity

ambiguous grammar/unambiguous grammar
inherently ambiguous language

Regular Grammars

linear grammar : the right-hand side of every production rule contains at most one nonterminal symbol
left-linear (right-linear) grammar : that nonterminal symbol, if present, is at the left end (right end) of the right-hand side

The language generated by a right linear grammar is regular.
For every regular language $L$ , there exists a right linear grammar that generates $L$

Right linear grammars are also called regular grammars

The same results hold for left-linear grammars.
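
A small worked example (an illustrative grammar, not taken from the notes): the right-linear grammar with productions $S\rightarrow aS \mid b$ generates the regular language $a^*b$ , for instance
$S\Rightarrow aS\Rightarrow aaS\Rightarrow aab$
The same language is generated by the left-linear grammar $S\rightarrow Ab$ , $A\rightarrow Aa \mid \varepsilon$ , for instance
$S\Rightarrow Ab\Rightarrow Aab\Rightarrow Aaab\Rightarrow aab$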

Directed Graph Representation

Digraph (short for directed graph)

Chomsky Normal Form (CNF)

All productions are of the form $A\rightarrow BC$ or $A\rightarrow a$ , where $A, B, C$ are variables, and $a$ is terminal symbol.
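
For instance (an illustrative grammar, not from the notes), $\{a^nb^n \mid n\geq 1\}$ is generated by the CNF grammar
$S\rightarrow AC \mid AB,\quad C\rightarrow SB,\quad A\rightarrow a,\quad B\rightarrow b$
with, for example, the derivation
$S\Rightarrow AC\Rightarrow aC\Rightarrow aSB\Rightarrow aABB\Rightarrow aaBB\Rightarrow aabB\Rightarrow aabb$
Every production here has either two variables or a single terminal on its right-hand side, so a grammar in this strict form cannot generate $\varepsilon$ .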

Greibach Normal Form (GNF)

All productions are of the following form :
$A\rightarrow aX$ , where $a\in T$ and $X\in N^*$ (a possibly empty string of nonterminals)

Simplification of CFG

“3R”

Reduction, i.e. removal of useless productions (two phases: a symbol is useful if it is 1. generating and 2. reachable)
Removal of unit productions
Removal of $\varepsilon$ -productions
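
The two phases of the reduction step can be sketched as follows. This is a minimal Python illustration under assumptions of my own: symbols are single characters, `productions` maps each non-terminal to a list of right-hand sides, and the function names are hypothetical. The order matters: non-generating symbols are removed first, then unreachable ones.

```python
def generating_symbols(productions, terminals):
    """Symbols that derive some terminal string (every terminal is generating)."""
    gen = set(terminals)
    changed = True
    while changed:
        changed = False
        for head, bodies in productions.items():
            if head not in gen and any(all(s in gen for s in body) for body in bodies):
                gen.add(head)
                changed = True
    return gen


def reachable_symbols(productions, start):
    """Symbols that occur in some sentential form derivable from the start symbol."""
    reach = {start}
    stack = [start]
    while stack:
        head = stack.pop()
        for body in productions.get(head, []):
            for symbol in body:
                if symbol not in reach:
                    reach.add(symbol)
                    stack.append(symbol)
    return reach


def remove_useless(productions, terminals, start):
    """Phase 1: keep generating symbols only. Phase 2: keep reachable symbols only."""
    gen = generating_symbols(productions, terminals)
    kept = {
        head: [body for body in bodies if all(s in gen for s in body)]
        for head, bodies in productions.items()
        if head in gen
    }
    reach = reachable_symbols(kept, start)
    return {head: bodies for head, bodies in kept.items() if head in reach}


# S -> AB | a, A -> b; B has no productions, so it is not generating.
grammar = {"S": ["AB", "a"], "A": ["b"]}
print(remove_useless(grammar, {"a", "b"}, "S"))  # {'S': ['a']}
```

In the example, dropping $S\rightarrow AB$ in phase 1 is what makes $A$ unreachable in phase 2; running the phases in the opposite order would leave the useless rule $A\rightarrow b$ in place.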

Finite Automata (FA)

Deterministic Finite Automata (DFA)

A (deterministic) finite automaton $\mathscr{A}=(Q,\Sigma,q_0,\delta,F)$
$Q$ is a finite set called the set of states
$\Sigma$ is a finite set called the input alphabet
$q_0 \in Q$ , called the initial/start state
$\delta : Q \times \Sigma \rightarrow Q$ is a function called transition function or next-state function (which can be extended, say $\hat\delta : Q \times \Sigma^*\rightarrow Q$ )
$F\subseteq Q$ , called the set of final/accept states

Useful graphical representation : Transition Diagram
which can be constructed as follows:

Every state in $Q$ is represented by a node
If $\delta(p,a)=q$ , then there’s an arc from $p$ to $q$ labeled $a$
There’s an arrow with no source into the initial state $q_0$
Final states are indicated by double circles

“accepted by”

Language of a DFA : $L(\mathscr{A})$

Configurations
a configuration is a pair $(q, x) \in Q\times \Sigma^*$ : the current state and the input still to be read

the computation of $\mathscr{A}$ on the input $x$
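
A DFA is straightforward to simulate. The sketch below is a minimal Python illustration, assuming a DFA is stored as the tuple $(Q,\Sigma,q_0,\delta,F)$ with $\delta$ as a dictionary keyed by `(state, symbol)`; the example automaton (strings over $\{a,b\}$ with an even number of $a$'s) and all function names are hypothetical. A word $x$ is taken to be accepted when $\hat\delta(q_0,x)\in F$ .

```python
def delta_hat(delta, state, word):
    """Extended transition function: apply delta once per input symbol."""
    for symbol in word:
        state = delta[(state, symbol)]
    return state


def accepts(dfa, word):
    """True if the state reached from the start state on `word` is final."""
    _, _, start, delta, finals = dfa
    return delta_hat(delta, start, word) in finals


def computation(dfa, word):
    """The sequence of configurations (state, unread input) on `word`."""
    _, _, state, delta, _ = dfa
    configs = [(state, word)]
    while word:
        state, word = delta[(state, word[0])], word[1:]
        configs.append((state, word))
    return configs


even_a = (
    {"even", "odd"},                       # Q
    {"a", "b"},                            # Sigma
    "even",                                # q0
    {("even", "a"): "odd", ("even", "b"): "even",
     ("odd", "a"): "even", ("odd", "b"): "odd"},  # delta
    {"even"},                              # F
)
print(accepts(even_a, "abab"))    # True
print(computation(even_a, "ab"))  # [('even', 'ab'), ('odd', 'b'), ('odd', '')]
```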

Nondeterministic Finite Automata (NFA)

NFA

$\varepsilon$ -NFA

A nondeterministic finite automaton $\mathscr{N}=(Q,\Sigma,q_0,\delta,F)$ , where $Q,\Sigma,q_0,F$ are as in a DFA; whereas, the transition function $\delta$ is as below:
$\delta : Q \times (\Sigma \cup \{\varepsilon\}) \rightarrow \wp(Q)$
is a function that, for a given state and an input symbol (possibly $\varepsilon$ ), assigns a set of next states, possibly the empty set.

$\delta$ can also be extended to $\hat\delta$

Every DFA can be treated as an NFA.

$L(\mathscr{N})=\{x \in \Sigma^* | \hat\delta(q_0,x) \cap F \neq \varnothing \}$
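
The acceptance condition above can be simulated by tracking the set of states the automaton could be in. A minimal Python sketch under my own encoding assumptions: $\delta$ is a dictionary from `(state, symbol)` to a set of states, an $\varepsilon$-move uses the empty string `""` as its symbol, and missing keys stand for the empty set. The example automaton for $a^*b^*$ and the function names are hypothetical.

```python
def epsilon_closure(delta, states):
    """All states reachable from `states` using epsilon-moves only."""
    closure = set(states)
    stack = list(states)
    while stack:
        q = stack.pop()
        for p in delta.get((q, ""), set()):
            if p not in closure:
                closure.add(p)
                stack.append(p)
    return closure


def delta_hat(delta, states, word):
    """Extended transition function on sets of states."""
    current = epsilon_closure(delta, states)
    for symbol in word:
        moved = set()
        for q in current:
            moved |= delta.get((q, symbol), set())
        current = epsilon_closure(delta, moved)
    return current


def accepts(nfa, word):
    """x is accepted iff delta_hat(q0, x) contains at least one final state."""
    _, _, start, delta, finals = nfa
    return bool(delta_hat(delta, {start}, word) & finals)


# epsilon-NFA for a*b*: loop on a in q0, epsilon-move to q1, loop on b in q1.
a_star_b_star = (
    {"q0", "q1"},
    {"a", "b"},
    "q0",
    {("q0", "a"): {"q0"}, ("q0", ""): {"q1"}, ("q1", "b"): {"q1"}},
    {"q1"},
)
print(accepts(a_star_b_star, "aabb"))  # True
print(accepts(a_star_b_star, "ba"))    # False
```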

Equivalence of RE & FA

Pushdown Automata (PDA)

$(Q,\Sigma,\Gamma,\delta,q_0,Z_0,F)$
$\Gamma$ : A finite stack alphabet.
$Z_0$ : The start symbol in the stack.

Equivalence of CFG & PDA

Turing Machine

e.g. $\{ a^nb^nc^n | n\geq 0\}$
not a CFL

Can be recognized by “a modified PDA with 2 stacks”, which is a variation of a TM

A TM is a seven-tuple $(Q, \Sigma, \Gamma, \delta, q_0, B, F)$

$Q$ is a finite set of states
$\Sigma$ is a finite set of input symbols
$\Gamma$ is a finite set of tape symbols
$\delta$ is the transition function : $Q\times \Gamma \rightarrow Q\times \Gamma \times \{R,L\}$ , $\delta(q, X)=(p, Y, \rightarrow/\leftarrow)$
$q_0$ is the start state
$B$ is the blank symbol
$F \subseteq Q$ is the set of final states
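
To make the seven-tuple concrete, here is a minimal Python sketch of a TM simulator together with one possible machine for $\{a^nb^nc^n \mid n\geq 0\}$ . Everything specific in it is an assumption for illustration and not taken from the notes: the state names `q0` to `q4` and `qf`, the marker symbols `X`, `Y`, `Z`, the choice to accept as soon as a final state is entered, and the `max_steps` safety limit. The transition function is partial; a missing entry means the machine halts without accepting.

```python
# delta(q, X) = (p, Y, direction). Each pass marks one a, one b and one c.
DELTA = {
    ("q0", "a"): ("q1", "X", "R"),  # mark the next a
    ("q0", "Y"): ("q4", "Y", "R"),  # no a left: go check the rest
    ("q0", "B"): ("qf", "B", "R"),  # empty input (n = 0)
    ("q1", "a"): ("q1", "a", "R"),
    ("q1", "Y"): ("q1", "Y", "R"),
    ("q1", "b"): ("q2", "Y", "R"),  # mark the next b
    ("q2", "b"): ("q2", "b", "R"),
    ("q2", "Z"): ("q2", "Z", "R"),
    ("q2", "c"): ("q3", "Z", "L"),  # mark the next c, then rewind
    ("q3", "a"): ("q3", "a", "L"),
    ("q3", "b"): ("q3", "b", "L"),
    ("q3", "Y"): ("q3", "Y", "L"),
    ("q3", "Z"): ("q3", "Z", "L"),
    ("q3", "X"): ("q0", "X", "R"),
    ("q4", "Y"): ("q4", "Y", "R"),  # only marked b's and c's may remain
    ("q4", "Z"): ("q4", "Z", "R"),
    ("q4", "B"): ("qf", "B", "R"),
}


def run_tm(delta, start, finals, blank, word, max_steps=10_000):
    """Return True if the machine enters a final state, False if it gets stuck."""
    tape = dict(enumerate(word))  # sparse tape: unwritten cells hold the blank
    state, head = start, 0
    for _ in range(max_steps):
        if state in finals:
            return True
        key = (state, tape.get(head, blank))
        if key not in delta:
            return False  # no applicable move: halt without accepting
        state, symbol, move = delta[key]
        tape[head] = symbol
        head += 1 if move == "R" else -1
    raise RuntimeError("step limit reached")


def accepts(word):
    return run_tm(DELTA, "q0", {"qf"}, "B", word)


print(accepts("aabbcc"))  # True
print(accepts("aabbc"))   # False
```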

Instantaneous Description

How to describe the configuration of TM

sequence of symbols in tape
state of TM
read/write head of TM

i.e. $X_1...X_{i-1}qX_i...X_n$ , where $q\in Q$ is the current state and the head is scanning $X_i$

Language of TM

$\{w \in \Sigma^* \mid q_0w \vdash^* \alpha p\beta ,\ p\in F,\ \alpha,\beta \in \Gamma^*\}$

The languages accepted by TMs are called recursively enumerable (RE) languages.
