形式语言与自动机_笔记整理(一)_有穷自动机与正则表达式

最新推荐文章于 2021-06-24 20:30:43 发布

Vitalia

最新推荐文章于 2021-06-24 20:30:43 发布

阅读量2.8k

点赞数 4

分类专栏：理论基础文章标签：形式语言与自动机有穷自动机 DFA

本文链接：https://blog.csdn.net/Vitalia/article/details/78883457

版权

理论基础专栏收录该内容

21 篇文章 3 订阅

订阅专栏

Outline
Preliminary
- Math Preliminary
- PROOF TECHNIQUES
Languages
- String Operations
- Operations on Languages
Finite Automata
- Deterministic Finite Automata
- Regular Language
Nondeterministic Finite Automata
- Nondeterministic
- Formal NFA
Regular Expressions
Summary

昨天晚上入睡的时候已经将近三点了，最近两天无所事事却又异常失眠。偶然发现CSDNblog的访问量已经超过了QQ空间，可见两年多自己在CS之路走得要比十多年的社交大道平坦得多呀，决定暂时中断社交平台的互动，以彰显期末复习之志。

回看博客记录的东西，除了冗余可读性差的代码，就是抱大腿的刻骨铭心记忆。回忆这半个学期确实没做什么事情。开学时不知天高地厚选了很多课，上起来才真正觉得力不从心。我以为我可以再强大一点，但是我并没有意识到，很多理论课自学难度稍微大于可以偶尔翘掉课堂然后自学并完成实验的应用课程。

一直清楚地知道自己并不能从完成大作业中获得很大的满足感，短暂的既视成就感不是我想要的，从不玩任何游戏的我以后也绝对不会去做也做不了游戏开发，但是我的心中一直不变的做一个优秀的并且快乐的程序员的梦想没有走远。

昨天晚上看到有人重提“写字台像乌鸦”的梗，又开始了一波感时伤世，你说啊，多愁伤感怎么会是我在这个时候应该想到的事情呢？细数自己应该不剩多少时间去挥霍游浪了，接下来的旅程怎么说都应该是道阻且艰的，怎就有胆量如此放肆任性而不计较呢？

Outline

Part 1
Preliminaries
Mathematical Knowledge
String and Language

Part 2
Finite Automata and Regular Expression
Context Free Grammar and Pushdown Automata
Turing Automata

Part 3 Modeling
Transition System
Petri Net
Timed and Hybrid Automata
Message Sequence Chart

Part 4 Tutorials
Computability
Model Checking
Trustworthy Software

Preliminary

Math Preliminary

SET

Powerset of S = the set of all the subsets of S, P(S), 2^S

FUNCTIONS

Given two sets A and B, a function from A into B
associates with each a in A at most one element b of B.
If A = domain, then f is a total function, otherwise f is a partial function.

f: A -> B is a bijection

f is total
for all a and a’ in A, a!=a’ implies f(a)!=f(a’)
for all b in B, there is a in A with f(a)=b

RELATIONS

A binary relation R over A is a partial order if it is reflexive, transitive, and antisymmetric.

A binary relation R over A is a total order(linear order) if it is a partial order and for all a, b in A, either aRb or bRa.

GRAPHS

Walk is a sequence of adjacent edges
A path is a walk where no edge is repeated
A simple path is a path where no node is repeated
A cycle is a walk from a node (base) to itself
A simple cycle: only the base node is repeated
Given a digraph G = (V, E) and nodes u and v, we say v is reachable from u, or u-reachable, if there is a path from u to v.
A tree is a directed graph that has no cycle.

PROOF TECHNIQUES

Proof by induction
Proof by contradiction

Languages

A language is a set of strings.
String: A sequence of letters/symbols.
Symbols are defined over an alphabet.

String Operations

Concatenation
Reverse
$\Sigma^*$ : the set of all possible strings from alphabet $\Sigma$ .
$\Sigma^+$ : the set of all possible strings from alphabet $\Sigma$ except $\lambda$ .

Operations on Languages

The usual set operations
Complement
Reverse
Concatenation
Star-Closure (Kleene *): $L^*=L^0 \cup L^1 \cup L^2 \cdots$
Positive Closure: $L^+=L^*-\{\lambda\}$

Finite Automata

Language of an Automaton: The set of strings accepted by an automaton A is the language of A. Denoted $L(A)$ .

Deterministic Finite Automata

Alphabet: An alphabet is any finite set of symbols.
String: A string over an alphabet $Σ$ is a list, each element of which is a member of $Σ$ .
Subtlety: 0 as a string, 0 as a symbol look the same.
A language is a subset of $Σ^*$ for some alphabet $Σ$ .

Formal Definition

A formalism for defining languages, consisting of:

A finite set of states (Q, typically).
- An input alphabet ( $Σ$ , typically).
- A transition function (δ, typically).
  - Takes two arguments: a state and an input symbol.
  - $δ(q, a)$ = the state that the DFA goes to when it is in state $q$ and input $a$ is received.
  - Note:
    - $δ$ is a total function
    - always a next state (add a dead state if no transition)
- A start state ( $q_0$ , in $Q$ , typically).
- A set of final states (F⊆Q, typically).
  - ”Final” and “accepting” are synonyms.
Graph Representation of DFA’ s

Example: Strings With NO“11”
- Nodes = states.
- Arcs represent transition function.
  - Arc from state p to state q labeled by all those input symbols that have transitions from p to q.
- Arrow labeled “Start” to the start state.
- Final states indicated by double circles.
Transition Table of DFA’ s

Inductive Definition of Extended δ

Induction on length of string.
Basis: δ(q, ε) = q
Induction: δ(q, wa) = δ(δ(q, w), a)
Remember: w is a string; a is an input symbol, by convention.

We don’ t distinguish between the given delta and the extended delta or delta-hat.

Language of DFA

For a DFA A, L(A) is the set of strings labeling paths from the start state to a final state.
L(A) = the set of strings w such that $δ(q_0, w)$ is in F.

Proofs of Set Equivalence

TODO:
Important trick: Expand the inductive hypothesis to be more detailed than the statement you are trying to prove.

Regular Language

A language L is regular if it is the language accepted by some DFA.
Theorem: The reverse of a regular language is also regular.

Nondeterministic Finite Automata

Nondeterministic

A nondeterministic finite automaton has the ability to be in several states at once.
Transitions from a state on an input symbol can be to any set of states.
Start in one start state.
Accept if any sequence of choices leads to a final state.

Formal NFA
- A finite set of states, typically $Q$ .
- An input alphabet, typically $Σ$ .
- A transition function, typically δ.
  - δ(q, a) is a set of states. Extend to strings as follows:
    - Basis: $δ(q, ε) = {q}$
    - Induction: $δ(q, wa)$ = the union over all states p in $δ(q, w)$ of $δ(p, a)$
- A start state in $Q$ , typically $q_0$ .
- A set of final states $F ⊆ Q$ .
Language of an NFA

A string w is accepted by an NFA if $δ(q_0, w)$ contains at least one final state.
The language of the NFA is the set of strings it accepts.

Equivalence of DFA’s, NFA’s
- A DFA can be turned into an NFA that accepts the same language.
  If $δ_D(q, a) = p$ , let the NFA have $δ_N(q, a) = {p}$ .
- Subset Construction
  - Given an NFA with states Q , inputs Σ, transition function δN , state state q0 , and final states F , construct equivalent DFA with:
    - States $2^Q$ (Set of subsets of $Q$ ).
    - Inputs $Σ$ .
    - $δ_D({q_1, …, q_k}, a)$ is the union over all $i = 1, …, k$ of $δ_N(q_i, a)$ .
    - Start state $\{q_0\}$ .
    - Final states = all those with a member of $F$ .
NFA’s With ε-Transitions

Closure of States

CL(q) = set of states you can reach from state q following only arcs labeled ε.
Closure of a set of states = union of the closure of each state.

Language of an ε-NFA

TODO:
Language of an ε-NFA is the set of strings w such that $\hat{δ} (q_0, w)$ contains a final state.

Equivalence of NFA, ε-NFA

Every NFA is an ε-NFA.
- It just has no transitions on ε.
Converse requires us to take an ε-NFA and construct an NFA that accepts the same language.
- We do so by combining ε-transitions with the next transition on a real input.
  - Transition function $δ_N(q, a)$ as follows:
    - Let $S = CL(q)$ .
    - $δ_N(q, a)$ is the union over all $p$ in $S$ of $δ_E(p, a)$ .
  - $F'$ = the set of states $q$ such that $CL(q)$ contains a state of $F$ .
Regular Expressions

Operations on Languages
- union,
- concatenation
- Kleene star
Equivalence of RE’s and Finite Automata

Converting a RE to an ε-NFA

Proof is an induction on the number of operators (+, concatenation, *) in the RE.
We always construct an automaton of a special form.

DFA-to-RE

k-paths

A k-path is a path through the graph of the DFA that goes though no state numbered higher than k.
- Endpoints are not restricted; they can be any state.
- n-paths are unrestricted.
- RE is the union of RE’s for the n-paths from the start state to each final state.
Algebraic Laws for RE’s

Union and concatenation behave sort of like addition and multiplication.
Exception: Concatenation is not commutative.

Decision Properties of Regular Languages

The Membership Problem

Assume L is represented by a DFA A.
Simulate the action of A on the sequence of input symbols forming w.

The Emptiness Problem

Given a regular language, does the language contain any string at all?
Assume representation is DFA.
Compute the set of states reachable from the start state.
If at least one final state is reachable, then yes, else no.

The Infiniteness Problem

Start with a DFA for the language.
Key idea: if the DFA has n states, and the language contains any string of length n or more, then the language is infinite.
Otherwise, the language is surely finite.
- Limited to strings of length strictly less than n
Second key idea: if there is a string of length > n (number of states) in L, then there is a string of length between n and 2n-1.

The Pumping Lemma

Decision Property: Equivalence

product DFA

Let these DFA’ s have sets of states Q and R, respectively.
- Product DFA has set of states Q $\times$ R, i.e., pairs [q, r] with q in Q, r in R.
- Start state = $[q_0, r_0]$ (the start states of the DFA’s for L, M).
- Transitions: δ([q,r],a)=[δL(q,a),δM(r,a)]
  - $δ_L$ , $δ_M$ are the transition functions for the DFA’s of L, M.
  - That is, we simulate the two DFA’s in the two state components of the product DFA.
Decision Property: Containment

Product DFA
Define the final states [q, r] of the product so its language is empty iff L $\subseteq$ M
q is final; r is not.

Constructing the Minimum-State DFA

State Minimization

Algorithm is a recursion on the length of the shortest distinguishing string.
- Basis: Mark pairs with exactly one final state.
- Induction: mark $[q, r]$ if for some input symbol a, $[δ(q,a), δ(r,a)]$ is marked.
  After no more marks are possible, the unmarked pairs are equivalent and can be merged into one state.
Transitivity of Indistinguishable

If state p is indistinguishable from q, and q is indistinguishable from r, then p is indistinguishable from r.

Eliminating Unreachable States
Unfortunately, combining indistinguishable states could leave us with unreachable states in the “minimum-state” DFA.
Thus, before or after, remove states that are not reachable from the start state.

Proof: No Unrelated, Smaller DFA

Closure Properties of Regular Languages

Closure Under Union

If L and M are regular languages, so is L M.
Proof:
Let L and M be the languages of regular expressions R and S, respectively.
Then R+S is a regular expression whose language is L M.

Closure Under Concatenation and Kleene Closure

Same idea:
RS is a regular expression whose language is LM.
R* is a regular expression whose language is L*.

Closure Under Intersection

If L and M are regular languages, then so is L M.
Proof:
Let A and B be DFA’ s whose languages are L and M, respectively.
Construct C, the product automaton of A and B.
Make the final states of C be the pairs consisting of final states of both A and B.

Closure Under Difference

If L and M are regular languages, then so is L – M = strings in L but not M.
Proof:
Let A and B be DFA’ s whose languages are L and M, respectively.
Construct C, the product automaton of A and B.
Final states of C are the pairs whose A-state is final but whose B-state is not.

Closure Under complement

The complement of a language L (with respect to an alphabet Σ such that Σ* contains L) is Σ* – L.
Proof:
Since Σ* is surely regular, the complement of a regular language is always regular.

Closure Under Reversal

Given language L, LR is the set of strings whose reversal is in L.
Example: L = {0, 01, 100}; LR = {0, 10, 001}.
Proof:
Let E be a regular expression for L.
We show how to reverse E, to provide a regular expression ER for LR.

Basis: If E is a symbol a, ε, or ∅, then ER = E.
Induction: If E is
- F+G, then ER = FR + GR.
- FG, then ER = GRFR
- F*, then ER = (FR)*.
Closure Under Homomorphism

A homomorphism on an alphabet is a function that gives a string for each symbol in that alphabet.
Example: h(0) = ab; h(1) = ε.
Extend to strings by h(a1…an) = h(a1)…h(an).
Example: h(01010) = ababab.

If L is a regular language, and h is a homomorphism on its alphabet, then h(L) = {h(w) | w is in L} is also a regular language.
Proof:
Let E be a regular expression for L.
Apply h to each symbol in E.
Language of resulting RE is h(L).

Example:
Let h(0) = ab; h(1) = ε.
Let L be the language of regular expression 01* + 10*.
Then h(L) is the language of regular expression abε* + ε(ab)*.
abε* + ε(ab)* can be simplified.
ε* = ε, so abε* = abε.
ε is the identity under concatenation.
That is, εE = Eε = E for any RE E.
Thus, abε + ε(ab)* = ab + (ab)*.
Finally, L(ab) is contained in L((ab) $^*)$ , so a RE for h(L) is (ab)*.

Closure Under Inverse Homomorphism

Inverse Homomorphism
- Let h be a homomorphism and L a language whose alphabet is the output language of h.
- $h^{-1}(L)$ = {w | h(w) is in L}.
Example:
Let h(0) = ab; h(1) = ε.
Let L = {abab, baba}.
$h^{-1}(L)$ = the language with two 0’ s and any number of 1’ s = L(1*01*01*).

Proof:
An induction on |w| (omitted) shows that $δB(q_0, w) = δA(q_0, h(w))$ .
Thus, B accepts w if and only if A accepts $h(w)$ .

Summary