INT201-Decision, Computation and Language(1)

SP FA

Last modified: 2023-01-03 01:23:09


Category: INT. Tags: algorithms, automata

First published: 2022-10-13 11:57:18

Article link: https://blog.csdn.net/SP_FA/article/details/127267441


Table of contents

1. DFA
- 1.1 Language defined by DFA
- 1.2 Regular operations on languages
2. NFA
3. Regular Language
4. Context-Free Languages

1. DFA

A DFA (deterministic finite automaton) is a finite-state machine that accepts or rejects a given string of symbols by running through a state sequence uniquely determined by the string.

A DFA is defined as a 5-tuple: $M=(Q,\Sigma,\delta,q,F)$

$Q$ is a finite set of states.
$\Sigma$ is a finite set of symbols, called the alphabet of the automaton.
$\delta:Q\times\Sigma\rightarrow Q$ is a function, called the transition function.
$q\in Q$ is called the initial state.
$F\subseteq Q$ is a set of accepting / terminal states.

We can extend the transition function $\delta$ so that it tells us which state we reach after a whole word $w\in\Sigma^*$ (not just a single letter) has been scanned:
extend the map $\delta:Q\times\Sigma\rightarrow Q$ to $\delta^*:Q\times\Sigma^*\rightarrow Q$ by defining:
$\begin{aligned}\delta^*(q,\epsilon)=q~~~~&\text{for all }q\in Q\\\delta^*(q,wa)=\delta(\delta^*(q,w),a)~~~~&\text{for all }q\in Q,w\in\Sigma^*,a\in\Sigma\end{aligned}$
These two rules define $\delta^*$ recursively, and they imply:
$\delta^*(q,vw)=\delta^*(\delta^*(q,v),w)~~~~\text{for all }q\in Q,v,w\in\Sigma^*$

1.1 Language defined by DFA

Suppose we have a DFA $M=(Q,\Sigma,\delta,q_0,F)$ with initial state $q_0$. A word $w\in\Sigma^*$ is said to be accepted or recognized by $M$ if $\delta^*(q_0,w)\in F$; otherwise it is said to be rejected. The set of all words accepted by $M$ is called the language accepted by $M$ and is denoted by $L(M)$:
$L(M)=\{w\in\Sigma^*:\delta^*(q_0,w)\in F\}$
Any finite language is accepted by some DFA.
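As a minimal sketch of the definitions above, the extended transition function can be written as a loop over the symbols of the word. The dictionary encoding of $\delta$ and the example automaton `EVEN_ONES` (words with an even number of 1s) are illustrative assumptions, not part of the course material.

```python
from typing import Dict, Set, Tuple

def delta_star(delta: Dict[Tuple[str, str], str], q: str, w: str) -> str:
    """Extended transition function: apply delta once per symbol of w."""
    for a in w:
        q = delta[(q, a)]
    return q

def accepts(delta: Dict[Tuple[str, str], str], start: str,
            accepting: Set[str], w: str) -> bool:
    """w is accepted iff delta*(start, w) lands in an accepting state."""
    return delta_star(delta, start, w) in accepting

# Hypothetical example DFA over {0, 1}: accepts words with an even number of 1s.
EVEN_ONES = {
    ("even", "0"): "even", ("even", "1"): "odd",
    ("odd", "0"): "odd", ("odd", "1"): "even",
}

print(accepts(EVEN_ONES, "even", {"even"}, "1001"))  # True
print(accepts(EVEN_ONES, "even", {"even"}, "1011"))  # False
```

Because $\delta$ is a total function, the lookup never fails on words over the alphabet, which mirrors the "exactly one transition per state and symbol" property of a DFA.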

1.2 Regular operations on languages

A language $A$ is called regular, if there exists a finite automaton $M$ such that $A = L (M)$

Let $A$ and $B$ be two languages over the same alphabet.
The union of $A$ and $B$ is defined as: $A\cup B=\{w:w\in A\text{ or }w\in B\}$
The concatenation of $A$ and $B$ is defined as: $AB=\{ww^\prime:w\in A\text{ and }w^\prime\in B\}$
The Kleene star of $A$ is defined as: $A^*=\bigcup\limits_{i\in N}A_i=A_0\cup A_1\cup A_2\cup\cdots$ where $\begin{aligned}A_0&=\{\epsilon\}\\A_1&=A\\A_{i+1}&=\{wv:w\in A_i,v\in A\}\end{aligned}$
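For finite languages, the three operations can be tried out directly on Python sets. This is only a sketch: the Kleene star is infinite in general, so the hypothetical helper `star_up_to` computes the finite approximation $A_0\cup\cdots\cup A_k$.

```python
from typing import Set

def union(a: Set[str], b: Set[str]) -> Set[str]:
    return a | b

def concat(a: Set[str], b: Set[str]) -> Set[str]:
    """AB = { w w' : w in A and w' in B }."""
    return {w + v for w in a for v in b}

def star_up_to(a: Set[str], k: int) -> Set[str]:
    """A_0 ∪ A_1 ∪ ... ∪ A_k, a finite approximation of the Kleene star."""
    result = {""}   # A_0 = {epsilon}
    layer = {""}
    for _ in range(k):
        layer = concat(layer, a)   # A_{i+1} = { wv : w in A_i, v in A }
        result |= layer
    return result

A = {"a", "bb"}
B = {"c"}
print(sorted(union(A, B)))       # ['a', 'bb', 'c']
print(sorted(concat(A, B)))      # ['ac', 'bbc']
print(sorted(star_up_to(A, 2)))  # ['', 'a', 'aa', 'abb', 'bb', 'bba', 'bbbb']
```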

2. NFA

A finite automaton is deterministic if the next state the machine goes to on any given symbol is uniquely determined.

A DFA has exactly one transition leaving each state for each symbol.

A finite automaton is nondeterministic if the machine allows several choices, or no choice, for the next state on a given symbol. For a state $q$ and symbol $s\in\Sigma$ , an NFA can have:

Multiple edges leaving $q$ labelled with the same symbol $s$
No edge leaving $q$ labelled with symbol $s$
Edge leaving $q$ labelled with $\epsilon$ (without reading any symbol)

The machine splits into multiple copies of itself (threads):

Each copy proceeds with computation independently of others
The NFA may be in a set of states, instead of a single state.
The NFA follows all possible computation paths in parallel.
If a copy is in a state and the next input symbol does not appear on any outgoing edge from that state, then the copy dies or crashes.
The NFA accepts the input string if any copy ends in an accept state after reading the entire string.

For any alphabet $\Sigma$ , we define $\Sigma_\epsilon=\Sigma\cup\{\epsilon\}$ .
An NFA is a 5-tuple $M=(Q,\Sigma,\delta,q,F)$

$Q$ is a finite set of states
$\Sigma$ is a finite set of symbols, called the alphabet of the automaton
$\delta:Q\times\Sigma_\epsilon\rightarrow P(Q)$ is a function, called the transition function
$q\in Q$ is called the initial / start state
$F\subseteq Q$ is a set of accepting / terminal states

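Extend the map $\delta$ to a map $\delta^*:Q\times\Sigma^*\rightarrow P(Q)$ by defining: $\begin{aligned}\delta^*(q,\epsilon)=\{q\}~~~~&\text{for all }q\in Q\\\delta^*(q,wa)=\bigcup\limits_{p\in\delta^*(q,w)}\delta(p,a)~~~~&\text{for all }q\in Q,w\in\Sigma^*,a\in\Sigma\end{aligned}$
This form covers NFAs without $\epsilon$-transitions; when $\epsilon$-transitions are present, each set is additionally closed under $\epsilon$-moves (see the $\epsilon$-closure in Section 2.4).

Let $M=(Q,\Sigma,\delta,q_0,F)$ be an NFA with start state $q_0$ , and let $w\in \Sigma^*$ . We say that $M$ accepts $w$ if $\delta^*(q_0,w)\cap F\ne\varnothing$ , that is, if at least one state reachable on $w$ is accepting.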
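The extended map for an NFA can be sketched by tracking the set of states the machine may currently be in. The sketch below assumes an NFA without $\epsilon$-transitions, encodes $\delta$ as a dictionary in which missing entries mean the empty set, and uses a made-up example automaton `ENDS_IN_01`.

```python
from typing import Dict, Set, Tuple

def nfa_delta_star(delta: Dict[Tuple[str, str], Set[str]], q: str, w: str) -> Set[str]:
    """Set of states reachable from q on w (NFA without epsilon-transitions)."""
    current = {q}   # base case: on the empty word we stay in {q}
    for a in w:
        # union of delta(p, a) over all p in the current set
        current = set().union(*(delta.get((p, a), set()) for p in current))
    return current

def nfa_accepts(delta, start: str, accepting: Set[str], w: str) -> bool:
    """Accept iff some reachable state is accepting."""
    return bool(nfa_delta_star(delta, start, w) & accepting)

# Hypothetical example NFA over {0, 1}: accepts words ending in "01".
ENDS_IN_01 = {
    ("s", "0"): {"s", "a"},
    ("s", "1"): {"s"},
    ("a", "1"): {"f"},
}

print(sorted(nfa_delta_star(ENDS_IN_01, "s", "1101")))  # ['f', 's']
print(nfa_accepts(ENDS_IN_01, "s", {"f"}, "1101"))      # True
print(nfa_accepts(ENDS_IN_01, "s", {"f"}, "10"))        # False
```

A copy that "dies" corresponds to a state contributing the empty set to the union.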

Suppose, in a DFA, we can get from state $p$ to state $q$ via transitions labelled by letters of a word $w$ . Then we say that the states $p$ and $q$ are connected by a path with label $w$ .

In an NFA, if $\delta(p,a)=\{q,r\}$ we could write: $\{p\}\stackrel{a}{\longrightarrow}\{q,r\}$

2.1 Language accepted by NFA

Let $M=(Q,\Sigma,\delta,q,F)$ be an NFA. The language $L (M)$ accepted by $M$ is defined as $L(M)=\{w\in\Sigma^*:M\text{ accepts } w\}$

2.2 Equivalence of DFAs and NFAs

Two machines are equivalent if they recognize the same language.

A DFA is a restricted form of NFA, so every DFA can be viewed as an NFA. The converse also holds:

Every NFA has an equivalent DFA
We can convert an arbitrary NFA to a DFA that accepts the same language
DFAs therefore have the same power as NFAs

2.3 DFA to NFA

The formal conversion of a DFA to an NFA is done as follows. Let $M=(Q,\Sigma,\delta,q,F)$ be a DFA, so $\delta$ is a function $\delta:Q\times\Sigma\rightarrow Q$ . We define the function $\delta^\prime:Q\times\Sigma_\epsilon\rightarrow P(Q)$ : for any $r\in Q$ and any $a\in\Sigma_\epsilon$ , $\delta^\prime(r,a)=\begin{cases}\{\delta(r,a)\}&\text{if }a\ne\epsilon\\\varnothing&\text{if }a=\epsilon\end{cases}$ Then $N=(Q,\Sigma,\delta^\prime,q,F)$ is an NFA whose behavior is exactly the same as that of the DFA $M$ . The easiest way to see this is to observe that the state diagrams of $M$ and $N$ are equal. Therefore, we have $L (M) = L (N)$ .
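In code, this conversion amounts to wrapping every target state in a singleton set. The following sketch assumes a dictionary encoding of $\delta$ in which a missing key (including every $\epsilon$ move, written here as the empty string) stands for the empty set; the example table `DFA` is hypothetical.

```python
def dfa_to_nfa(dfa_delta):
    """Build delta' from delta: each DFA target r' becomes the singleton set {r'}."""
    return {(r, a): {target} for (r, a), target in dfa_delta.items()}

def nfa_delta(nfa, r, a):
    """Look up delta'(r, a); absent pairs, including epsilon moves, give the empty set."""
    return nfa.get((r, a), set())

# Hypothetical DFA transition table (even number of 1s).
DFA = {
    ("even", "0"): "even", ("even", "1"): "odd",
    ("odd", "0"): "odd", ("odd", "1"): "even",
}
NFA = dfa_to_nfa(DFA)

print(nfa_delta(NFA, "even", "1"))  # {'odd'}
print(nfa_delta(NFA, "even", ""))   # set()
```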

2.4 NFA to DFA

Definition 1
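The $\epsilon$-closure of a set of states $R\subseteq Q$ :
$E(R)=\{q\mid q \text{ can be reached from }R\text{ by travelling over zero or more }\epsilon\text{ transitions}\}$

Definition 2
Given a set of states $R$ and a symbol $a\in\Sigma$ , we define $R_a=E(J)$ , the $\epsilon$-closure of $J$ , where $J$ is the set of states that can be reached from a state in $R$ by travelling over one edge labelled $a$ .
The equivalent DFA uses sets of NFA states as its states: it starts in $E(\{q\})$ , moves from $R$ to $R_a$ on input $a$ , and accepts in every set that contains an accepting state of the NFA.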
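The two definitions are the building blocks of the usual subset construction. Below is a minimal sketch of that construction, assuming $\epsilon$ is encoded as the empty string and missing dictionary entries mean the empty set; the function names and the example NFA (for the language $a^*b$) are illustrative choices.

```python
EPS = ""  # assumption: the empty string labels epsilon edges

def eps_closure(delta, states):
    """E(R): everything reachable from R over zero or more epsilon edges."""
    stack, seen = list(states), set(states)
    while stack:
        p = stack.pop()
        for r in delta.get((p, EPS), set()):
            if r not in seen:
                seen.add(r)
                stack.append(r)
    return frozenset(seen)

def move(delta, states, a):
    """R_a: follow one a-edge from some state of R, then take the epsilon-closure."""
    j = set()
    for p in states:
        j |= delta.get((p, a), set())
    return eps_closure(delta, j)

def nfa_to_dfa(delta, alphabet, start, accepting):
    """Subset construction restricted to the reachable subsets."""
    d_start = eps_closure(delta, {start})
    d_delta, todo, seen = {}, [d_start], {d_start}
    while todo:
        r = todo.pop()
        for a in alphabet:
            t = move(delta, r, a)
            d_delta[(r, a)] = t
            if t not in seen:
                seen.add(t)
                todo.append(t)
    d_accepting = {r for r in seen if r & accepting}
    return d_delta, d_start, d_accepting

# Hypothetical NFA for a*b: q0 --eps--> q1, q1 --a--> q1, q1 --b--> q2.
NFA = {("q0", EPS): {"q1"}, ("q1", "a"): {"q1"}, ("q1", "b"): {"q2"}}
D_DELTA, D_START, D_ACCEPTING = nfa_to_dfa(NFA, "ab", "q0", {"q2"})

print(sorted(D_START))                # ['q0', 'q1']
print(len({r for r, _ in D_DELTA}))   # 4 reachable subsets, including the empty set
```

The empty subset acts as the dead state of the resulting DFA, so its transition function is total.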

3. Regular Language

Definition
Previous: A language is regular if it is recognized by some DFA
Now: A language is regular if and only if some NFA recognizes it
Some operations on languages: Union, Concatenation and Kleene star

3.1 Closed under operation

A collection $S$ of objects is closed under operation $f$ if applying $f$ to members of $S$ always returns an object still in $S$ .

Regular languages are indeed closed under the regular operations (Union, Concatenation, Kleene star)

3.1.1 Regular Languages Closed Under Union

Proof
Since $A$ and $B$ are regular languages over the same alphabet $\Sigma$ , there are DFAs $M_1=(Q_1,\Sigma,\delta_1,q_1,F_1)$ and $M_2=(Q_2,\Sigma,\delta_2,q_2,F_2)$ that accept $A$ and $B$ , respectively.

We can define $M=(Q,\Sigma,\delta,q,F)$ where:

$Q=Q_1\times Q_2=\{(r_1,r_2):r_1\in Q_1\text{ and }r_2\in Q_2\}$
$q = (q_1,q_2)$
$F=\{(r_1,r_2):r_1\in F_1\text{ or }r_2\in F_2\}$
$\delta((r_1,r_2),a)=(\delta_1(r_1,a),\delta_2(r_2,a))$ for all $(r_1,r_2)\in Q$ and $a\in\Sigma$

Then: $M\text{ accepts } w\iff\delta^*((q_1,q_2),w)\in F\iff\delta_1^*(q_1,w)\in F_1\text{ or }\delta_2^*(q_2,w)\in F_2$

So $L(M)=L(M_1)\cup L(M_2)=A\cup B$ , and therefore $A\cup B$ is regular.
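The product construction from this proof can be sketched as follows. The two component DFAs (words ending in `a`, and words of even length) and all identifiers are assumptions chosen for illustration.

```python
from itertools import product

def union_dfa(states1, d1, s1, f1, states2, d2, s2, f2, alphabet):
    """Product automaton: run both DFAs in lockstep, accept if either accepts."""
    pairs = list(product(states1, states2))
    delta = {((r1, r2), a): (d1[(r1, a)], d2[(r2, a)])
             for r1, r2 in pairs for a in alphabet}
    accepting = {(r1, r2) for r1, r2 in pairs if r1 in f1 or r2 in f2}
    return delta, (s1, s2), accepting

def run(delta, start, accepting, w):
    state = start
    for a in w:
        state = delta[(state, a)]
    return state in accepting

# Hypothetical M1: words over {a, b} ending in "a" (accepting state "y").
D1 = {("x", "a"): "y", ("x", "b"): "x", ("y", "a"): "y", ("y", "b"): "x"}
# Hypothetical M2: words over {a, b} of even length (accepting state "e").
D2 = {("e", "a"): "o", ("e", "b"): "o", ("o", "a"): "e", ("o", "b"): "e"}

DELTA, START, ACCEPTING = union_dfa({"x", "y"}, D1, "x", {"y"},
                                    {"e", "o"}, D2, "e", {"e"}, "ab")

print(run(DELTA, START, ACCEPTING, "a"))    # True  (ends in a)
print(run(DELTA, START, ACCEPTING, "bb"))   # True  (even length)
print(run(DELTA, START, ACCEPTING, "bab"))  # False (neither)
```

Replacing `or` by `and` in the accepting set gives the analogous automaton for the intersection of the two languages.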

Proof from the perspective of NFA
Suppose $M_1$ and $M_2$ are NFAs, and assume that $Q_1\cap Q_2=\varnothing$ .
We can define $M=(Q,\Sigma,\delta,q_0,F)$ where:

$Q=\{q_0\}\cup Q_1\cup Q_2$ , with $q_0$ a new state
$q_0$ is the start state of $M$
$F=F_1\cup F_2$

Then, for $r\in Q$ and $a\in\Sigma_\epsilon$ : $\delta(r,a)=\begin{cases}\delta_1(r,a)&\text{if }r\in Q_1\\\delta_2(r,a)&\text{if }r\in Q_2\\\{q_1,q_2\}&\text{if }r=q_0\text{ and }a=\epsilon\\\varnothing&\text{if }r=q_0\text{ and }a\ne\epsilon\end{cases}$

In graph-theoretic terms, this amounts to creating a virtual node $q_0$ that points to the start states of the two NFAs.
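A small sketch of this NFA construction: merge the two transition tables and add a fresh start state with $\epsilon$ edges to both old start states. It assumes disjoint state names, encodes $\epsilon$ as the empty string, and uses two made-up NFAs accepting exactly `a` and exactly `bb`.

```python
EPS = ""  # assumption: the empty string labels epsilon edges

def union_nfa(d1, s1, f1, d2, s2, f2, new_start="q0"):
    """New start state with epsilon edges to the start states of both NFAs."""
    delta = {}
    delta.update(d1)
    delta.update(d2)
    delta[(new_start, EPS)] = {s1, s2}
    return delta, new_start, set(f1) | set(f2)

def eps_closure(delta, states):
    stack, seen = list(states), set(states)
    while stack:
        p = stack.pop()
        for r in delta.get((p, EPS), set()):
            if r not in seen:
                seen.add(r)
                stack.append(r)
    return seen

def nfa_accepts(delta, start, accepting, w):
    """Simulate all threads at once, closing under epsilon after each step."""
    current = eps_closure(delta, {start})
    for a in w:
        nxt = set()
        for p in current:
            nxt |= delta.get((p, a), set())
        current = eps_closure(delta, nxt)
    return bool(current & accepting)

# Hypothetical NFAs: N1 accepts exactly "a", N2 accepts exactly "bb".
N1 = {("p0", "a"): {"p1"}}
N2 = {("r0", "b"): {"r1"}, ("r1", "b"): {"r2"}}
DELTA, START, ACCEPTING = union_nfa(N1, "p0", {"p1"}, N2, "r0", {"r2"})

print(nfa_accepts(DELTA, START, ACCEPTING, "a"))   # True
print(nfa_accepts(DELTA, START, ACCEPTING, "bb"))  # True
print(nfa_accepts(DELTA, START, ACCEPTING, "ab"))  # False
```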

3.1.2 Regular Languages Closed Under Concatenation

The concatenation of $A_1$ and $A_2$ is defined as: $A_1A_2=\{ww^\prime:w\in A_1\text{ and }w^\prime\in A_2\}$

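Proof
Let $M_1=(Q_1,\Sigma,\delta_1,q_1,F_1)$ and $M_2=(Q_2,\Sigma,\delta_2,q_2,F_2)$ be NFAs accepting $A_1$ and $A_2$ , with $Q_1\cap Q_2=\varnothing$ . Define $M=(Q,\Sigma,\delta,q,F)$ where:

$Q=Q_1\cup Q_2$
$q=q_1$
$F=F_2$

Then, for $r\in Q$ and $a\in\Sigma_\epsilon$ : $\delta(r,a)=\begin{cases}\delta_1(r,a)&\text{if }r\in Q_1\text{ and }r\notin F_1\\\delta_1(r,a)&\text{if }r\in F_1\text{ and }a\ne\epsilon\\\delta_1(r,a)\cup\{q_2\}&\text{if }r\in F_1\text{ and }a=\epsilon\\\delta_2(r,a)&\text{if }r\in Q_2\end{cases}$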

This amounts to connecting every state in $F_1$ to $q_2$ , so the two automata are joined end to end.

3.1.3 Regular Languages Closed Under Kleene star

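Proof
Let $M_1=(Q_1,\Sigma,\delta_1,q_1,F_1)$ be an NFA accepting $A$ . Define $M=(Q,\Sigma,\delta,q_0,F)$ where:

$Q=\{q_0\}\cup Q_1$ , with $q_0$ a new state
$q_0$ is the start state of $M$
$F=\{q_0\}\cup F_1$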

Then: $\delta(q,a)=\begin{cases}\delta_1(q,a)&\text{if }q\in Q_1\text{ and }q\notin F_1\\\delta_1(q,a)&\text{if }q\in F_1\text{ and }a\ne\epsilon\\\delta_1(q,a)\cup\{q_1\}&\text{if }q\in F_1\text{ and }a=\epsilon\\\{q_1\}&\text{if }q=q_0\text{ and }a=\epsilon\\\varnothing&\text{if }q=q_0\text{ and }a\ne\epsilon\end{cases}$

Every state in $F_1$ is connected back to $q_1$ ; put plainly, this is recursion.

3.1.4 Regular Languages Closed Under Complement and Intersection

If $A$ is a regular language over the alphabet $\Sigma$ , then the complement: $\bar A=\{w\in\Sigma^*:w\notin A\}$ is also a regular language.

If $A_1$ and $A_2$ are regular languages over the same alphabet $\Sigma$ , then the intersection: $A_1\cap A_2=\{w\in\Sigma^*:w\in A_1\text{ and }w\in A_2\}$ is also a regular language.

3.2 Regular Expressions

Regular expressions are means to describe certain languages.

Let $\Sigma$ be a non-empty alphabet.

$\epsilon$ is a regular expression
$\varnothing$ is a regular expression
For each $a\in\Sigma$ , $a$ is a regular expression
If $R_1$ and $R_2$ are regular expressions, then $R_1\cup R_2$ , $R_1R_2$ and $R_1^*$ are also regular expressions

If $R$ is a regular expression, then $L (R)$ is the language generated / described / defined by $R$ .

$\epsilon$ describes the language $\{\epsilon\}$
$\varnothing$ describes the language $\varnothing$
For each $a\in\Sigma$ , the regular expression $a$ describes the language $\{a\}$
If $R_1,R_2$ are regular expressions and $L_1,L_2$ are the languages described by them, respectively, then $R_1\cup R_2$ describes the language $L_1\cup L_2$ , $R_1R_2$ describes $L_1L_2$ , and $R_1^*$ describes $L_1^*$

3.3 Kleene’s Theorem

Let $L$ be a language. Then $L$ is regular iff there exists a regular expression that describes $L$ .

If a language is described by a regular expression, then it is regular.
If a language is regular, then it has a regular expression.

3.4 GNFA

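A GNFA (generalized NFA) can be defined as a 5-tuple $(Q,\Sigma,\delta,\{s\},\{t\})$

$Q$ is a finite set of states
$\Sigma$ is a finite alphabet
$\delta:(Q\setminus\{t\})\times(Q\setminus\{s\})\rightarrow R$ is the transition function, where $R$ is the set of all regular expressions over $\Sigma$
$s\in Q$ is the start state
$t\in Q$ is the accept state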

3.4.1 DFA to GNFA

Convert a DFA into a regular expression

[Figure: DFA to GNFA]

3.5 Pumping Lemma for Regular Languages

A tool that can be used to prove that certain languages are not regular. This theorem states that all regular languages have a special property.

This property states that all strings in the language can be “pumped” if they are at least as long as a certain special value, called the pumping length. That means each such string contains a section that can be repeated any number of times with the resulting string remaining in the language.

If a language $L$ is regular, it always satisfies the pumping lemma. Consequently, if some sufficiently long string in $L$ cannot be pumped (every admissible way of pumping it produces a string that is not in $L$ ), then $L$ is surely not regular.
The converse fails: if the pumping lemma holds for a language, that does not mean the language is regular.
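The section does not spell out the conditions, so the sketch below assumes the standard formulation: every $s\in L$ with $|s|\ge p$ splits as $s=xyz$ with $|y|\ge1$ , $|xy|\le p$ , and $xy^iz\in L$ for all $i\ge0$ . For the classic non-regular language $\{a^nb^n:n\ge0\}$ , a brute-force search over all admissible splits of $a^pb^p$ finds none that survives pumping. The function names are illustrative.

```python
def in_language(s: str) -> bool:
    """Membership test for L = { a^n b^n : n >= 0 }."""
    n = len(s) // 2
    return s == "a" * n + "b" * n

def has_pumpable_split(s: str, p: int, max_i: int = 3) -> bool:
    """True if some split s = xyz with |xy| <= p and |y| >= 1
    keeps x y^i z in L for every i in 0..max_i."""
    for xy_len in range(1, min(p, len(s)) + 1):
        for x_len in range(xy_len):
            x, y, z = s[:x_len], s[x_len:xy_len], s[xy_len:]
            if all(in_language(x + y * i + z) for i in range(max_i + 1)):
                return True
    return False

for p in range(1, 6):
    s = "a" * p + "b" * p
    print(p, has_pumpable_split(s, p))  # False for every p
```

A `False` result is conclusive for the tested string, since some small $i$ already breaks every split. A `True` result would only mean the splits survive up to `max_i`, not for all $i$ .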

4. Context-Free Languages

4.1 CFG

Context-Free Grammar

A context-free grammar is a 4-tuple $G=(V,\Sigma,R,S)$ , where

$V$ is a finite set, whose elements are called variables
$\Sigma$ is a finite set, whose elements are called terminals
$V\cap\Sigma=\varnothing$
$S$ is an element of $V$ , it is called the start variable
$R$ is a finite set, whose elements are called rules. Each rule has the form $A\rightarrow w$ , where $A\in V$ and $w\in(V\cup\Sigma)^*$

Definition 1: yield $\Rightarrow$
Let $G=(V,\Sigma,R,S)$ be a context free grammar with

$A\in V$
$u,v,w\in(V\cup\Sigma)^*$
$A\rightarrow w$ is a rule of the grammar

The string $u w v$ can be derived in one step from the string $u A v$ , written as $uAv\Rightarrow uwv$

Definition 2: derive $\stackrel{*}{\Rightarrow}$
Let $G=(V,\Sigma,R,S)$ be a context free grammar with

$u,v\in(V\cup\Sigma)^*$

The string $v$ can be derived from the string $u$ , written as $u\stackrel{*}{\Rightarrow}v$ , if one of the following conditions holds:

$u = v$
there exist an integer $k\geq2$ and a sequence $u_1,u_2,\cdots,u_k$ of strings in $(V\cup\Sigma)^*$ , such that
- $u=u_1$
- $v=u_k$
- $u_1\Rightarrow u_2\Rightarrow u_3\Rightarrow\cdots\Rightarrow u_k$

4.1.1 Language of CFG

The language of a CFG $G=(V,\Sigma,R,S)$ is $L(G)=\{w\in\Sigma^*\mid S\stackrel*\Rightarrow w\}$ . Such a language is called context-free, and satisfies $L(G)\subseteq\Sigma^*$ .
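To make the definition concrete, the sketch below enumerates the short words of $L(G)$ by a breadth-first search over leftmost derivations. The grammar encoding (a dictionary from variables to lists of right-hand sides), the example grammar $S\rightarrow aSb\mid\epsilon$ and the `slack` parameter are assumptions; `slack` bounds how much longer than `max_len` a sentential form may get, so the search is complete only for grammars where that bound suffices.

```python
from collections import deque

def generate(rules, start, max_len, slack=2):
    """Terminal strings w with start =>* w and |w| <= max_len."""
    variables = set(rules)
    seen, queue, words = {(start,)}, deque([(start,)]), set()
    while queue:
        form = queue.popleft()
        # index of the leftmost variable, or None if the form is all terminals
        idx = next((i for i, x in enumerate(form) if x in variables), None)
        if idx is None:
            words.add("".join(form))
            continue
        for rhs in rules[form[idx]]:
            new = form[:idx] + tuple(rhs) + form[idx + 1:]   # one step u A v => u w v
            terminals = sum(1 for x in new if x not in variables)
            # prune: terminals never disappear, and forms may not grow past the slack
            if terminals <= max_len and len(new) <= max_len + slack and new not in seen:
                seen.add(new)
                queue.append(new)
    return words

# Hypothetical grammar: S -> a S b | epsilon, generating { a^n b^n : n >= 0 }.
RULES = {"S": [("a", "S", "b"), ()]}

print(sorted(generate(RULES, "S", 4)))  # ['', 'aabb', 'ab']
```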

Theorem
Let $\Sigma$ be an alphabet and let $L\subseteq\Sigma^*$ be a regular language. Then $L$ is a context-free language (Every regular language is context-free)

4.2 CNF

Chomsky Normal Form

A context-free grammar $G=(V,\Sigma,R,S)$ is said to be in Chomsky normal form, if every rule in $R$ has one of the following three forms:

$A\rightarrow BC$ , where $A, B, C$ are elements of $V$ , $B\ne S$ and $C\ne S$
$A\rightarrow a$ , where $A$ is an element of $V$ and $a$ is an element of $\Sigma$
$S\rightarrow\epsilon$ , where $S$ is the start variable

Grammars in CNF are far easier to analyze.
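The three rule shapes translate directly into a checker. This is a sketch under an assumed encoding: `rules` maps each variable to a list of right-hand sides, each a tuple of symbols, with the empty tuple standing for $\epsilon$ .

```python
def is_cnf(rules, start, terminals):
    """Check that every rule is A -> BC (B, C variables other than the start),
    A -> a (a terminal), or start -> epsilon."""
    variables = set(rules)
    for head, bodies in rules.items():
        for body in bodies:
            if len(body) == 2:
                if not all(x in variables and x != start for x in body):
                    return False
            elif len(body) == 1:
                if body[0] not in terminals:
                    return False
            elif len(body) == 0:
                if head != start:
                    return False
            else:
                return False   # more than two symbols on the right-hand side
    return True

# Hypothetical grammars over the terminals {a, b}.
G1 = {"S": [("A", "B"), ()], "A": [("a",)], "B": [("b",)]}
G2 = {"S": [("a", "S", "b"), ()]}

print(is_cnf(G1, "S", {"a", "b"}))  # True
print(is_cnf(G2, "S", {"a", "b"}))  # False
```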

Theorem
Let $\Sigma$ be an alphabet and let $L\subseteq\Sigma^*$ be a CFL. There exists a CFG in CNF, whose language is $L$ . That is, every CFL can be described by a CFG in CNF

4.2.1 Converting CFG into CNF

Eliminate the start variable from the right-hand side of the rules.
- New start variable $S_0$
- New rule $S_0\rightarrow S$
Remove $\epsilon$ -rules $A\rightarrow\epsilon$ , where $A\in V-\{S\}$ . When removing $A\rightarrow\epsilon$ rules, insert all new replacements
- Before: $B\rightarrow AbA$ and $A\rightarrow\epsilon|\cdots$
- After: $B\rightarrow AbA|bA|Ab|b$ and $A\rightarrow\cdots$
Remove unit rules $A\rightarrow B$ , where $A,B\in V$
- Before: $A\rightarrow B$ and $B\rightarrow xCy$
- After: $A\rightarrow xCy$ and $B\rightarrow xCy$
Eliminate all rules having more than two symbols on the right-hand side.
- Before: $A\rightarrow B_1B_2B_3$
- After: $A\rightarrow B_1A_1, A_1\rightarrow B_2B_3$
Eliminate all rules of the form $A\rightarrow u_1u_2$ , where $u_1$ and $u_2$ are not both variables, by introducing a new variable for each terminal.
- Before: $A\rightarrow ab$
- After: $A\rightarrow B_1B_2, B_1\rightarrow a, B_2\rightarrow b$
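The $\epsilon$-rule step is the one where replacements are easy to miss, so here is a small sketch of it: for one right-hand side and one nullable variable, keep or drop each occurrence of that variable independently. The helper name `expand_nullable` and the tuple encoding are assumptions.

```python
from itertools import product

def expand_nullable(body, nullable):
    """All right-hand sides obtained by keeping or dropping each occurrence of `nullable`."""
    # each symbol offers one option, except the nullable variable, which offers two
    options = [((x,), ()) if x == nullable else ((x,),) for x in body]
    return {sum(choice, ()) for choice in product(*options)}

# The example from the list above: B -> AbA with A -> epsilon removed.
bodies = expand_nullable(("A", "b", "A"), "A")
print(sorted("".join(b) for b in bodies))  # ['Ab', 'AbA', 'b', 'bA']
```

If a right-hand side consists only of the nullable variable, the result contains the empty tuple, that is, a new $\epsilon$-rule for the head, which then has to be removed in turn.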

4.3 PDA

NFA is a PDA without stack.

Pushdown Automata
The class of languages that can be accepted by pushdown automata is exactly the class of context-free languages (finite automata are for regular languages).

The input for a pushdown automaton is a string $w$ in $\Sigma^*$
Different from finite automata, PDAs have a single stack.
The stack has two operations:
- push: adds item to top of stack
- pop: removes item from top of stack


Tape: divided into cells that store symbols belonging to $\Sigma_\epsilon=\Sigma\cup\{\epsilon\}$
Tape head: move along the tape, one cell to the right per move.
Stack: contains symbols from a finite set $Γ$ , called the stack alphabet. This set contains a special symbol $\$$ (often used to mark the bottom of the stack).
State control: can be in any one of a finite number of states. The set of states is denoted by $Q$ . The set $Q$ contains one special state $q$ , called the start state.

PDA Transition
If the PDA

in state $q_i$
reads $a\in\Sigma_\epsilon$
pops $b\inΓ_\epsilon$ off the stack

If $a=\epsilon$ , then no input symbol is read.
If $b=\epsilon$ , then nothing is popped off stack.

Then the PDA

moves to state $q_j$
pushes $c\inΓ_\epsilon$ onto the top of the stack

If $c=\epsilon$ , then nothing is pushed onto stack
If $c=u_1u_2\cdots u_k$ with $k\geq1$ and $u_1,u_2,\cdots,u_k\inΓ$ , then $b$ is replaced by $c$ , and $u_k$ becomes the new top symbol of the stack.

A pushdown automaton is a 6-tuple $M=(Q,\Sigma,Γ,\delta,q,F)$

$Q$ is finite set of states
$\Sigma$ is (finite) input (tape) alphabet
$Γ$ is (finite) stack alphabet
$\delta$ is the transition function: $Q\times\Sigma_\epsilon\timesΓ\rightarrow Q\times\{N,R\}\timesΓ_\epsilon^*$
$q\in Q$ is start state
$F\subseteq Q$ is set of accept states

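Let $r,r^\prime\in Q$ , $a\in\Sigma_\epsilon$ , $b\inΓ$ , $\sigma\in\{N,R\}$ , and $c\inΓ^*$ . A transition $\delta(r,a,b)=(r^\prime,\sigma,c)$ means: in state $r$ , reading $a$ with $b$ on top of the stack, the PDA switches to state $r^\prime$ , moves the tape head according to $\sigma$ , and replaces $b$ by $c$ .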

The tape head moves according to $\sigma$ :

If $\sigma=R$ , it moves one cell to the right
If $\sigma=N$ , it does not move

4.3.1 Nondeterministic PDA

PDA transition function allows for nondeterminism $\delta:Q\times\Sigma_\epsilon\timesΓ_\epsilon\rightarrow P(Q\timesΓ_\epsilon)$

4.3.2 Language accepted by PDA

The set of all input strings that are accepted by PDA $M$ is the language recognized by $M$ and is denoted by $L (M)$
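Acceptance by a nondeterministic PDA can be sketched as a search over configurations (state, input position, stack), using the transition type $\delta:Q\times\Sigma_\epsilon\timesΓ_\epsilon\rightarrow P(Q\timesΓ_\epsilon)$ from Section 4.3.1. The example PDA for $\{0^n1^n:n\ge0\}$ , the empty-string encoding of $\epsilon$ and the `max_stack` bound (which keeps the search finite) are assumptions of this sketch.

```python
from collections import deque

EPS = ""  # assumption: the empty string stands for epsilon

def pda_accepts(delta, start, accepting, w, max_stack=None):
    """Breadth-first search over configurations (state, position, stack)."""
    if max_stack is None:
        max_stack = len(w) + 2   # assumed bound, sufficient for the example below
    first = (start, 0, ())
    seen, queue = {first}, deque([first])
    while queue:
        state, pos, stack = queue.popleft()
        if pos == len(w) and state in accepting:
            return True
        reads = [EPS] + ([w[pos]] if pos < len(w) else [])
        pops = [EPS] + ([stack[-1]] if stack else [])
        for a in reads:
            for b in pops:
                for nxt, c in delta.get((state, a, b), set()):
                    rest = stack[:-1] if b != EPS else stack       # pop b, if any
                    new_stack = rest + ((c,) if c != EPS else ())  # push c, if any
                    cfg = (nxt, pos + len(a), new_stack)
                    if len(new_stack) <= max_stack and cfg not in seen:
                        seen.add(cfg)
                        queue.append(cfg)
    return False

# Hypothetical PDA for { 0^n 1^n : n >= 0 }; "$" marks the bottom of the stack.
DELTA = {
    ("q1", EPS, EPS): {("q2", "$")},
    ("q2", "0", EPS): {("q2", "0")},
    ("q2", "1", "0"): {("q3", EPS)},
    ("q3", "1", "0"): {("q3", EPS)},
    ("q3", EPS, "$"): {("q4", EPS)},
}

print(pda_accepts(DELTA, "q1", {"q1", "q4"}, "0011"))  # True
print(pda_accepts(DELTA, "q1", {"q1", "q4"}, "001"))   # False
```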

4.4 Equivalence of PDA and context-free languages

Let $\Sigma$ be an alphabet and let $A\subseteq\Sigma^*$ be a language. Then $A$ is context-free if and only if there exists a pushdown automaton that accepts $A$ .

If $A = L (G)$ for some CFG $G$ , then $A = L (M)$ for some PDA $M$ .
If $A = L (M)$ for some PDA $M$ , then $A = L (G)$ for some CFG $G$ .

Proof: If $A = L (G)$ for some CFG $G$ , then $A = L (M)$ for some PDA $M$ .

Basic idea: Given CFG $G$ , convert it into PDA $M$ with $L (M) = L (G)$ by building PDA that simulates a leftmost derivation.

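However, under the transition function $\delta:Q\times\Sigma_\epsilon\timesΓ_\epsilon\rightarrow P(Q\timesΓ_\epsilon)$ , a PDA can push at most one symbol per transition, whereas simulating a rule $A\rightarrow w$ requires pushing the whole string $w$ . How can we solve this problem? A standard answer is to add intermediate states that push the symbols of $w$ one at a time through a chain of $\epsilon$-transitions.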

4.4.1 CFLs and regular languages

If $A$ is a regular language, then $A$ is also a CFL.

4.5 The pumping lemma for context-free languages

4.5.1 Pumping Lemma for CFLs

Let $L$ be a context-free language. Then there exists an integer $p\ge1$ , called the pumping length, such that the following holds: Every string $s$ in $L$ , with $|s|\ge p$ , can be written as $s = uvxyz$ , such that

$|vy|\ge1$
$|vxy|\le p$
$uv^ixy^iz\in L$ , for all $i\ge0$ .
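The lemma is typically used to show that a language is not context-free. As a sketch, the brute-force search below tries every split $s=uvxyz$ of $a^pb^pc^p$ satisfying the first two conditions and finds none that survives pumping, which is the standard argument that $\{a^nb^nc^n:n\ge0\}$ is not context-free. The function names are illustrative.

```python
def in_language(s: str) -> bool:
    """Membership test for L = { a^n b^n c^n : n >= 0 }."""
    n = len(s) // 3
    return s == "a" * n + "b" * n + "c" * n

def has_pumpable_split(s: str, p: int, max_i: int = 2) -> bool:
    """True if some split s = uvxyz with |vy| >= 1 and |vxy| <= p
    keeps u v^i x y^i z in L for every i in 0..max_i."""
    n = len(s)
    for start in range(n):                                   # vxy = s[start:end]
        for end in range(start + 1, min(start + p, n) + 1):
            for v_end in range(start, end + 1):
                for y_start in range(v_end, end + 1):
                    u, v, x = s[:start], s[start:v_end], s[v_end:y_start]
                    y, z = s[y_start:end], s[end:]
                    if len(v) + len(y) == 0:
                        continue
                    if all(in_language(u + v * i + x + y * i + z)
                           for i in range(max_i + 1)):
                        return True
    return False

for p in range(1, 5):
    s = "a" * p + "b" * p + "c" * p
    print(p, has_pumpable_split(s, p))  # False for every p
```

Since $|vxy|\le p$ , the window $vxy$ never touches both the `a` block and the `c` block, so pumping always unbalances the three counts.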

Split String Using Parse Tree

More generally, consider a “long” string $s\in L$ .
Its parse tree is “tall”, so there exists a repeated variable $R$ on some path from the root $S$ to a leaf.
Split the string $s = uvxyz$ into 5 pieces based on the repeated variable $R$ :
- $u$ is the part before the $R$-$R$ subtree (in depth-first order)
- $v$ is the part before the second $R$ subtree within the $R$-$R$ subtree
- $x$ is what the second $R$ eventually becomes
- $y$ is the part after the second $R$ within the $R$-$R$ subtree
- $z$ is the part after the $R$-$R$ subtree