Automata theory
Automata theory deals with the definitions and properties of mathematical models of computation. These models play a role in several applied areas of computer science.
- One model, called the finite automaton, is used in text processing, compilers, and hardware design.
- Another model, called the context-free grammar, is used in programming languages and artificial intelligence.
Note that "finite state machine" and "finite automaton" are two names for the same model:
The simplest model is called the finite state machine or finite automaton.
State diagrams and the formal definition of finite automata
We used state diagrams to introduce finite automata.
Now we define finite automata formally. Although state diagrams are easier to
grasp intuitively, we need the formal definition, too, for two specific reasons.
First, a formal definition is precise. It resolves any uncertainties about what
is allowed in a finite automaton. If you were uncertain about whether finite
automata were allowed to have 0 accept states or whether they must have exactly one transition exiting every state for each possible input symbol, you could
consult the formal definition and verify that the answer is yes in both cases. Second, a formal definition provides notation. Good notation helps you think and
express your thoughts clearly.
The language of a formal definition is somewhat arcane, having some similarity to the language of a legal document. Both need to be precise, and every
detail must be spelled out.
Formal definition of a finite automaton
A finite automaton is a 5-tuple (Q, Σ, δ, q0, F ), where
1. Q is a finite set called the states,
2. Σ is a finite set called the alphabet,
3. δ : Q × Σ → Q is the transition function,
4. q0 ∈ Q is the start state, and
5. F ⊆ Q is the set of accept states.
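The 5-tuple maps fairly directly onto a small data structure. The following is a minimal sketch in Python, not part of the original definition: the class name `DFA`, the method `accepts`, and the two example machines are assumptions made for illustration. The constructor checks that δ is total, and the second machine shows that an empty set of accept states is allowed.

```python
from dataclasses import dataclass


@dataclass
class DFA:
    states: frozenset    # Q
    alphabet: frozenset  # Σ
    delta: dict          # δ, stored as {(state, symbol): next_state}
    start: str           # q0
    accept: frozenset    # F (may be empty)

    def __post_init__(self):
        # δ must be a total function Q × Σ → Q: one entry per (state, symbol).
        if len(self.delta) != len(self.states) * len(self.alphabet):
            raise ValueError("delta must have exactly |Q| * |Σ| entries")
        for q in self.states:
            for a in self.alphabet:
                if self.delta.get((q, a)) not in self.states:
                    raise ValueError(f"bad or missing transition for ({q!r}, {a!r})")
        if self.start not in self.states or not self.accept <= self.states:
            raise ValueError("q0 must be in Q and F must be a subset of Q")

    def accepts(self, word):
        state = self.start
        for symbol in word:
            if symbol not in self.alphabet:
                raise ValueError(f"symbol {symbol!r} is not in the alphabet")
            state = self.delta[(state, symbol)]
        return state in self.accept


# Hypothetical example: binary strings with an even number of 1s.
EVEN_ONES = DFA(
    states=frozenset({"even", "odd"}),
    alphabet=frozenset({"0", "1"}),
    delta={
        ("even", "0"): "even", ("even", "1"): "odd",
        ("odd", "0"): "odd", ("odd", "1"): "even",
    },
    start="even",
    accept=frozenset({"even"}),
)

# Hypothetical example with zero accept states: it recognizes the empty language.
REJECT_ALL = DFA(
    states=frozenset({"s"}),
    alphabet=frozenset({"0", "1"}),
    delta={("s", "0"): "s", ("s", "1"): "s"},
    start="s",
    accept=frozenset(),
)
```

Calling `EVEN_ONES.accepts("0110")` walks the transitions one symbol at a time and reports whether the final state is in F.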
Regular languages
A language is called a regular language if some finite automaton
recognizes it.
Regular operations, by analogy with arithmetic operations
In arithmetic, the basic objects are numbers and the tools are operations for manipulating them,
such as + and ×. In the theory of computation, the objects are languages and the tools include operations specifically designed for manipulating them.
We define three operations on languages, called the regular operations, and use them to study properties of the regular languages.
Let A and B be languages. We define the regular operations union,
concatenation, and star as follows:
• Union: A ∪ B = {x| x ∈ A or x ∈ B}.
• Concatenation: A ◦ B = {xy| x ∈ A and y ∈ B}.
• Star: A∗ = {x1x2...xk | k ≥ 0 and each xi ∈ A}.
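For finite languages, the three operations can be tried out directly on Python sets of strings. This is an illustrative sketch only: the function names are assumptions, and because A∗ is infinite for any language containing a nonempty string, `star_up_to` enumerates just the strings built from at most `max_k` pieces.

```python
def union(a, b):
    """A ∪ B: strings that are in A or in B."""
    return a | b


def concat(a, b):
    """A ◦ B: every string of A followed by every string of B."""
    return {x + y for x in a for y in b}


def star_up_to(a, max_k):
    """The part of A∗ obtained with k = 0, 1, ..., max_k pieces."""
    result = {""}  # k = 0 always contributes the empty string
    layer = {""}
    for _ in range(max_k):
        layer = concat(layer, a)  # strings made of exactly one more piece
        result |= layer
    return result


A = {"good", "bad"}
B = {"boy", "girl"}
print(sorted(union(A, B)))
print(sorted(concat(A, B)))
print(sorted(star_up_to(A, 2)))
```

Note that the empty string is always a member of A∗, whatever A is, because k = 0 is permitted.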
Regular expressions, by analogy with arithmetic expressions
In arithmetic, we can use the operations + and × to build up expressions such as
(5 + 3) × 4 .
Similarly, we can use the regular operations to build up expressions
describing languages, which are called regular expressions.
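As a rough illustration, Python's `re` module can stand in for the notation: `|` plays the role of union, juxtaposition is concatenation, and `*` is star. The pattern and the helper name `in_language` below are assumptions chosen for this sketch. Python's regex syntax also has features that go beyond the regular operations, but this example uses only the three above.

```python
import re

# (0 ∪ 1)∗ 1 : all binary strings that end in 1
ENDS_IN_ONE = re.compile(r"(0|1)*1")


def in_language(word):
    # fullmatch requires the whole string, not just a prefix, to match.
    return ENDS_IN_ONE.fullmatch(word) is not None


for w in ["1", "0101", "", "10"]:
    print(repr(w), in_language(w))
```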
How a lexer recognizes the longest match
It keeps two variables and, on reaching a dead end, falls back to the most recent final state.
Keeping track of the longest match just means remembering the last time
the automaton was in a final state with two variables, Last-Final (the state
number of the most recent final state encountered) and Input-Position-at-Last-Final. Every time a final state is entered, the lexer updates these
variables; when a dead state (a nonfinal state with no output transitions) is
reached, the variables tell what token was matched, and where it ended.
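The bookkeeping described above can be sketched as follows. The small DFA (identifiers, integers, and reals such as 12.5) is a hypothetical example invented for this sketch, and the dead state is represented by `None` rather than by an explicit state number.

```python
FINAL = {1: "ID", 2: "NUM", 4: "REAL"}  # final states and their token types


def step(state, ch):
    """Transition function of the hypothetical DFA; None is the dead state."""
    if state == 0:                      # start state
        if ch.isalpha():
            return 1
        if ch.isdigit():
            return 2
    elif state == 1:                    # inside an identifier
        if ch.isalnum():
            return 1
    elif state == 2:                    # inside an integer
        if ch.isdigit():
            return 2
        if ch == ".":
            return 3                    # nonfinal: digits must follow the point
    elif state in (3, 4):               # after the decimal point
        if ch.isdigit():
            return 4
    return None


def longest_match(text, start):
    """Return (token_type, lexeme, end_position), or None if nothing matches."""
    state = 0
    last_final = None
    input_position_at_last_final = start
    pos = start
    while pos < len(text):
        state = step(state, text[pos])
        if state is None:               # dead state: stop scanning
            break
        pos += 1
        if state in FINAL:              # remember the most recent final state
            last_final = state
            input_position_at_last_final = pos
    if last_final is None:
        return None
    end = input_position_at_last_final
    return FINAL[last_final], text[start:end], end


print(longest_match("12.5+x", 0))
print(longest_match("12.x", 0))
```

On the input `12.x` the automaton reads past `12.` before dying on `x`, so the two variables are what allow it to back up and report the integer `12`.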
There are two important disambiguation rules used by Lex and other similar lexical-analyzer generators:
- Longest match: The longest initial substring of the input that can
match any regular expression is taken as the next token.
- Rule priority: For a particular longest initial substring, the first
regular expression that can match determines its token type. This
means that the order of writing down the regular-expression rules has significance.
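The two rules can be combined in a few lines. This is a sketch under stated assumptions, not how Lex itself is implemented: the rule list is hypothetical, and it tries each pattern separately with Python's `re` module instead of running one combined automaton. The patterns here are simple enough that `match` returns each rule's longest match, which is not true of Python regular expressions in general.

```python
import re

# Hypothetical rules; their order in this list is the rule priority.
RULES = [
    ("IF", re.compile(r"if")),
    ("ID", re.compile(r"[a-z][a-z0-9]*")),
    ("NUM", re.compile(r"[0-9]+")),
]


def next_token(text, pos):
    """Return (token_type, lexeme) for the token starting at pos, or None."""
    best = None  # (length, token_type) of the best candidate so far
    for token_type, pattern in RULES:
        m = pattern.match(text, pos)
        if m is None:
            continue
        length = m.end() - pos
        # Longest match: only a strictly longer match replaces the candidate.
        # Rule priority: on a tie, the earlier rule is therefore kept.
        if best is None or length > best[0]:
            best = (length, token_type)
    if best is None:
        return None
    length, token_type = best
    return token_type, text[pos:pos + length]


print(next_token("if8 x", 0))
print(next_token("if x", 0))
```

For `if8` the identifier rule wins because its match is longer. For `if` both rules match two characters, so the rule listed first decides.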
Nondeterministic finite automata
When the machine is in a given state and
reads the next input symbol, we know what the next state will be—it is determined. We call this deterministic computation. In a nondeterministic machine,
several choices may exist for the next state at any point.
Differences between nondeterministic and deterministic finite automata
- The difference between a deterministic finite automaton, abbreviated
DFA, and a nondeterministic finite automaton, abbreviated NFA, is
immediately apparent. First, every state of a DFA always has exactly
one exiting transition arrow for each symbol in the alphabet. The NFA
shown in Figure 1.27 violates that rule. State q1 has one exiting
arrow for 0, but it has two for 1; q2 has one arrow for 0, but it has
none for 1. In an NFA, a state may have zero, one, or many exiting
arrows for each alphabet symbol.
- Second, in a DFA, labels on the transition arrows are symbols from
the alphabet. This NFA has an arrow with the label ε. In general, an
NFA may have arrows labeled with members of the alphabet or ε. Zero,
one, or many arrows may exit from each state with the label ε.
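One way to simulate such a machine is to track the set of all states it could be in at once. The sketch below is illustrative: the figure is not reproduced in these notes, so the transition table is an assumption chosen to be consistent with the description above (q1 has two arrows for 1, q2 has none for 1, and there is one ε arrow), and the names `DELTA`, `eps_closure`, and `nfa_accepts` are made up for the example.

```python
EPS = ""  # label used here for ε arrows

# (state, label) -> set of next states; a missing entry means "no arrow".
DELTA = {
    ("q1", "0"): {"q1"},
    ("q1", "1"): {"q1", "q2"},   # two arrows for 1
    ("q2", "0"): {"q3"},
    ("q2", EPS): {"q3"},         # ε arrow; q2 has no arrow for 1
    ("q3", "1"): {"q4"},
    ("q4", "0"): {"q4"},
    ("q4", "1"): {"q4"},
}
START = "q1"
ACCEPT = {"q4"}


def eps_closure(states):
    """All states reachable from `states` using zero or more ε arrows."""
    stack, closure = list(states), set(states)
    while stack:
        q = stack.pop()
        for r in DELTA.get((q, EPS), set()):
            if r not in closure:
                closure.add(r)
                stack.append(r)
    return closure


def nfa_accepts(word):
    current = eps_closure({START})
    for symbol in word:
        moved = set()
        for q in current:
            moved |= DELTA.get((q, symbol), set())  # zero, one, or many arrows
        current = eps_closure(moved)
    # Accept if at least one possible current state is an accept state.
    return bool(current & ACCEPT)


for w in ["11", "101", "100", ""]:
    print(repr(w), nfa_accepts(w))
```

With this table the machine accepts exactly the strings that contain 101 or 11 as a substring.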
Uses of nondeterministic finite automata
Nondeterministic finite automata are useful in several respects. As we will
show, every NFA can be converted into an equivalent DFA, and constructing
NFAs is sometimes easier than directly constructing DFAs. An NFA may be much
smaller than its deterministic counterpart, or its functioning may be easier to
understand. Nondeterminism in finite automata is also a good introduction
to nondeterminism in more powerful computational models because finite automata are especially easy to understand.
DFAs and NFAs are equivalent
Deterministic and nondeterministic finite automata recognize the same class of
languages. Such equivalence is both surprising and useful. It is surprising because NFAs appear to have more power than DFAs, so we might expect that NFAs
recognize more languages. It is useful because describing an NFA for a given
language sometimes is much easier than describing a DFA for that language.
Say that two machines are equivalent if they recognize the same language.
Every nondeterministic finite automaton has an equivalent deterministic finite
automaton.
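The usual way to obtain the equivalent DFA is the subset construction, in which each DFA state is a set of NFA states. The following is a minimal sketch under stated assumptions: the function names and the example NFA (binary strings ending in 01) are hypothetical, and only the subsets reachable from the start state are built, not the full power set of Q.

```python
EPS = ""  # label used here for ε arrows


def eps_closure(delta, states):
    """States reachable from `states` using zero or more ε arrows."""
    stack, closure = list(states), set(states)
    while stack:
        q = stack.pop()
        for r in delta.get((q, EPS), ()):
            if r not in closure:
                closure.add(r)
                stack.append(r)
    return frozenset(closure)


def nfa_to_dfa(delta, start, accept, alphabet):
    """Return (states, delta, start, accept) of an equivalent DFA."""
    dfa_start = eps_closure(delta, {start})
    dfa_delta, dfa_accept = {}, set()
    todo, seen = [dfa_start], {dfa_start}
    while todo:
        subset = todo.pop()
        if subset & accept:           # contains some NFA accept state
            dfa_accept.add(subset)
        for a in alphabet:
            moved = set()
            for q in subset:
                moved |= set(delta.get((q, a), ()))
            target = eps_closure(delta, moved)
            dfa_delta[(subset, a)] = target
            if target not in seen:
                seen.add(target)
                todo.append(target)
    return seen, dfa_delta, dfa_start, dfa_accept


def dfa_accepts(dfa, word):
    _, dfa_delta, state, dfa_accept = dfa
    for symbol in word:
        state = dfa_delta[(state, symbol)]
    return state in dfa_accept


# Hypothetical NFA: stay in "a" on any symbol, then guess the final "01".
NFA_DELTA = {
    ("a", "0"): {"a", "b"},
    ("a", "1"): {"a"},
    ("b", "1"): {"c"},
}
ENDS_IN_01 = nfa_to_dfa(NFA_DELTA, "a", {"c"}, "01")
print(len(ENDS_IN_01[0]), "DFA states")
print(dfa_accepts(ENDS_IN_01, "1101"))
```

Every DFA state gets exactly one transition per symbol, including transitions into the empty subset when the NFA has no arrow to follow; in this example the empty subset is never reached.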