Finite automata and regular expression algorithms

C++ class library implementing finite automata and regular expression algorithms

Each algorithm is classified into one of two families: those based upon the structure of regular expressions, and those based upon the automata-theoretic work of Myhill and Nerode.

  • Regular expressions are defined as a $\Sigma$-term algebra. The FIRE engine only implements $\Sigma$-algebras for three carrier sets (regular expressions, finite automata, and reduced finite automata). Future versions of the toolkit will include more $\Sigma$-algebras.
  • The use of $\Sigma$-algebras (as in [Wat93a]) can provide great computational efficiency in practice. For example, from regular expressions $E_0$ and $E_1$ we can construct finite automata $M_0$ and $M_1$ (accepting the languages denoted by $E_0$ and $E_1$, respectively). Assume that we now require a finite automaton accepting the language denoted by $E_0 \cdot E_1$ (their concatenation). With some of the existing toolkits, the new finite automaton would be constructed from scratch. With the FIRE engine, a concatenation operator on finite automata is implemented (for two of the different varieties of finite automata), enabling us to compute $M_0 \cdot M_1$ (a finite automaton accepting the desired language). Reusing intermediate results in this way avoids repeating the construction work already done for $M_0$ and $M_1$.
  • A future version of the toolkit will include support for extended regular expressions, i.e. regular expressions containing intersection or complementation operators.
  • Basic regular expressions and automata transition labels are represented by character ranges. A future version of the FIRE engine will permit basic regular expressions and transition labels to be built from more complex data-structures. For example, it will be possible to process a string (vector) of structures.

Definition (Finite automaton):

A finite automaton (an FA; the plural is finite automata) is a 6-tuple $(Q,V,T,E,S,F)$ where

  • $Q$ is a finite set of states,
  • $V$ is an alphabet,
  • $T \in \mathcal P(Q \times V \times Q)$ is a transition relation,
  • $E \in \mathcal P(Q \times Q)$ is an $\epsilon$-transition relation,
  • $S \subseteq Q$ is a set of start states, and
  • $F \subseteq Q$ is a set of final states.
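The 6-tuple maps naturally onto a plain data structure. The following is a minimal sketch, not the FIRE engine's own classes: the names `State`, `Symbol`, `Transition`, `FA` and `well_formed` are hypothetical, and states and symbols are assumed to be integers and single characters.

```cpp
#include <set>
#include <tuple>
#include <utility>

using State = int;    // assumption: states are plain integers
using Symbol = char;  // assumption: alphabet symbols are single characters

// One element (p, a, q) of the transition relation T.
struct Transition {
    State from;
    Symbol on;
    State to;
};

// Ordering so that transitions can be stored in a std::set.
inline bool operator<(const Transition& x, const Transition& y) {
    return std::tie(x.from, x.on, x.to) < std::tie(y.from, y.on, y.to);
}

// The 6-tuple (Q, V, T, E, S, F), one member per component.
struct FA {
    std::set<State> Q;                    // finite set of states
    std::set<Symbol> V;                   // alphabet
    std::set<Transition> T;               // transition relation
    std::set<std::pair<State, State>> E;  // epsilon-transition relation
    std::set<State> S;                    // start states
    std::set<State> F;                    // final states
};

// True when T, E, S and F only mention states in Q and symbols in V.
bool well_formed(const FA& m) {
    for (const Transition& t : m.T)
        if (!m.Q.count(t.from) || !m.V.count(t.on) || !m.Q.count(t.to)) return false;
    for (const auto& e : m.E)
        if (!m.Q.count(e.first) || !m.Q.count(e.second)) return false;
    for (State s : m.S)
        if (!m.Q.count(s)) return false;
    for (State f : m.F)
        if (!m.Q.count(f)) return false;
    return true;
}
```

Storing every component as a set keeps the code close to the mathematical definition; a production library would likely choose more compact representations.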

Remark (Signatures of the transition relations):

We take some liberty in interpreting the signatures of the transition relations. Depending on the context, the following signatures are also used:

$$
\begin{array}{l}
T \in V \rightarrow \mathcal P(Q \times Q)\\
T \in Q \times Q \rightarrow \mathcal P(V)\\
T \in Q \times V \rightarrow \mathcal P(Q)\\
T \in Q \rightarrow \mathcal P(V \times Q)\\
E \in Q \rightarrow \mathcal P(Q)
\end{array}
$$
In each case, the order of the $Q$'s from left to right is preserved; for example, the function $T \in Q \rightarrow \mathcal P(V \times Q)$ is defined as $T(p) = \{(a,q) \mid (p,a,q) \in T\}$. The signature that is used will be clear from the context.
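Each alternative signature is just a different way of indexing the same set of triples. A small sketch, assuming transitions are stored as a set of `(from, on, to)` records; the function names `by_source`, `by_source_and_symbol` and `by_symbol` are hypothetical:

```cpp
#include <set>
#include <tuple>
#include <utility>

using State = int;
using Symbol = char;

struct Transition { State from; Symbol on; State to; };
inline bool operator<(const Transition& x, const Transition& y) {
    return std::tie(x.from, x.on, x.to) < std::tie(y.from, y.on, y.to);
}
using TransRel = std::set<Transition>;

// T in Q -> P(V x Q):  T(p) = {(a, q) | (p, a, q) in T}
std::set<std::pair<Symbol, State>> by_source(const TransRel& T, State p) {
    std::set<std::pair<Symbol, State>> out;
    for (const Transition& t : T)
        if (t.from == p) out.emplace(t.on, t.to);
    return out;
}

// T in Q x V -> P(Q):  T(p, a) = {q | (p, a, q) in T}
std::set<State> by_source_and_symbol(const TransRel& T, State p, Symbol a) {
    std::set<State> out;
    for (const Transition& t : T)
        if (t.from == p && t.on == a) out.insert(t.to);
    return out;
}

// T in V -> P(Q x Q):  T(a) = {(p, q) | (p, a, q) in T}
std::set<std::pair<State, State>> by_symbol(const TransRel& T, Symbol a) {
    std::set<std::pair<State, State>> out;
    for (const Transition& t : T)
        if (t.on == a) out.emplace(t.from, t.to);
    return out;
}
```

Note that the source state always precedes the target state in the results, matching the remark that the order of the $Q$'s is preserved.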

Convention (Sets of functions):

For sets $A$ and $B$, $A \rightarrow B$ denotes the set of all total functions from $A$ to $B$, while $A \nrightarrow B$ denotes the set of all partial functions from $A$ to $B$.

Properties of finite automata

To make these definitions more concise, we introduce particular finite automata $M = (Q,V,T,E,S,F)$, $M_0 = (Q_0,V_0,T_0,E_0,S_0,F_0)$, and $M_1 = (Q_1,V_1,T_1,E_1,S_1,F_1)$.

Definition (Size of an FA):

Define the size of an FA as $|M| = |Q|$.

Definition (Isomorphism ($\cong$) of FA’s):

We define isomorphism ($\cong$) as an equivalence relation on FA’s. $M_0$ and $M_1$ are isomorphic (written $M_0 \cong M_1$) if and only if $V_0 = V_1$ and there exists a bijection $g \in Q_0 \to Q_1$ such that

  • $T_1 = \{(g(p),a,g(q)) \mid (p,a,q) \in T_0\}$,
  • $E_1 = \{(g(p),g(q)) \mid (p,q) \in E_0\}$,
  • $S_1 = \{g(s) \mid s \in S_0\}$, and
  • $F_1 = \{g(f) \mid f \in F_0\}$.
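Given a candidate mapping $g$, the four conditions can be checked mechanically. The sketch below only verifies a supplied $g$ and does not search for one; the names `FA`, `Transition` and `is_isomorphism` are hypothetical, and both automata are assumed to mention only states from their own state sets.

```cpp
#include <map>
#include <set>
#include <tuple>
#include <utility>

using State = int;
using Symbol = char;

struct Transition { State from; Symbol on; State to; };
inline bool operator<(const Transition& x, const Transition& y) {
    return std::tie(x.from, x.on, x.to) < std::tie(y.from, y.on, y.to);
}
inline bool operator==(const Transition& x, const Transition& y) {
    return x.from == y.from && x.on == y.on && x.to == y.to;
}

struct FA {
    std::set<State> Q;
    std::set<Symbol> V;
    std::set<Transition> T;
    std::set<std::pair<State, State>> E;
    std::set<State> S, F;
};

// True when g witnesses M0 ≅ M1 in the sense of the definition above.
// Assumption: T0, E0, S0 and F0 only mention states in Q0.
bool is_isomorphism(const FA& m0, const FA& m1, const std::map<State, State>& g) {
    if (m0.V != m1.V) return false;

    // g must be a bijection from Q0 onto Q1.
    std::set<State> domain, image;
    for (const auto& kv : g) {
        domain.insert(kv.first);
        image.insert(kv.second);
    }
    if (domain != m0.Q || image != m1.Q || image.size() != g.size()) return false;

    // Map every component of M0 through g and compare with M1.
    std::set<Transition> T;
    for (const Transition& t : m0.T) T.insert(Transition{g.at(t.from), t.on, g.at(t.to)});
    std::set<std::pair<State, State>> E;
    for (const auto& e : m0.E) E.emplace(g.at(e.first), g.at(e.second));
    std::set<State> S, F;
    for (State s : m0.S) S.insert(g.at(s));
    for (State f : m0.F) F.insert(g.at(f));
    return T == m1.T && E == m1.E && S == m1.S && F == m1.F;
}
```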

Definition (Extending the transition relation T):

We extend the transition relation $T \in V \to \mathcal P(Q \times Q)$ to $T^\ast \in V^\ast \to \mathcal P(Q \times Q)$ as follows:
$$T^\ast(\epsilon) = E^\ast$$
and (for $a \in V$, $w \in V^\ast$)
$$T^\ast(aw) = E^\ast \circ T(a) \circ T^\ast(w)$$

Convention (Relation composition):

Given sets $A$, $B$, $C$ (not necessarily different) and two relations, $E \subseteq A \times B$ and $F \subseteq B \times C$, we define relation composition (infix operator $\circ$) as:
$$E \circ F = \{(a,c) \mid \exists b \in B : (a,b) \in E \land (b,c) \in F\}$$
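Relation composition and the extended transition relation $T^\ast$ can be sketched directly on sets of pairs. This is an illustration under stated assumptions rather than an efficient implementation: `Rel`, `compose`, `closure` and `extend` are hypothetical names, $T$ is stored with the signature $V \to \mathcal P(Q \times Q)$ as a map from symbols to relations, and $E$ is assumed to mention only states in $Q$.

```cpp
#include <map>
#include <set>
#include <string>
#include <utility>

using State = int;
using Rel = std::set<std::pair<State, State>>;  // a relation on states

// E ∘ F = {(a, c) | there is a b with (a, b) in E and (b, c) in F}
Rel compose(const Rel& E, const Rel& F) {
    Rel out;
    for (const auto& ab : E)
        for (const auto& bc : F)
            if (ab.second == bc.first) out.emplace(ab.first, bc.second);
    return out;
}

// E*: reflexive and transitive closure of E over the state set Q.
// Assumption: E only relates states that are in Q.
Rel closure(const Rel& E, const std::set<State>& Q) {
    Rel out = E;
    for (State q : Q) out.emplace(q, q);  // reflexive part
    for (;;) {
        // Because out is reflexive, compose(out, out) contains out.
        Rel next = compose(out, out);
        if (next.size() == out.size()) return out;  // fixed point reached
        out = next;
    }
}

// T*(epsilon) = E*   and   T*(a w) = E* ∘ T(a) ∘ T*(w)
Rel extend(const std::map<char, Rel>& T, const Rel& E,
           const std::set<State>& Q, const std::string& w) {
    const Rel Estar = closure(E, Q);
    Rel out = Estar;  // T*(epsilon)
    // Unfold the recursion starting from the last symbol of w.
    for (auto it = w.rbegin(); it != w.rend(); ++it) {
        auto found = T.find(*it);
        Rel Ta = (found == T.end()) ? Rel() : found->second;
        out = compose(Estar, compose(Ta, out));
    }
    return out;
}
```

For example, with $Q = \{0,1,2\}$, $E = \{(0,1)\}$ and $T(a) = \{(1,2)\}$, `extend` applied to the word `a` relates both 0 and 1 to 2, because state 0 can first take the $\epsilon$-transition to state 1.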

Definition (Left and right languages):

The left language of a state (in $M$) is given by the function $\overleftarrow{\mathcal L}_M \in Q \to \mathcal P(V^\ast)$, where
$$\overleftarrow{\mathcal L}_M(q) = (\cup\, s : s \in S : T^\ast(s,q))$$
The right language of a state (in $M$) is given by the function $\overrightarrow{\mathcal L}_M \in Q \to \mathcal P(V^\ast)$, where
$$\overrightarrow{\mathcal L}_M(q) = (\cup\, f : f \in F : T^\ast(q,f))$$
Here $T^\ast$ is used with the signature $Q \times Q \to \mathcal P(V^\ast)$, so $T^\ast(p,q)$ is the set of words that lead from $p$ to $q$. The subscript $M$ is usually dropped when no ambiguity can arise.

Property 2.13 (Language of an FA):

From the definitions of left and right languages (of a state), we can also write:
$$\mathcal L_{FA}(M) = (\cup\, f : f \in F : \overleftarrow{\mathcal L}(f))$$
and
$$\mathcal L_{FA}(M) = (\cup\, s : s \in S : \overrightarrow{\mathcal L}(s))$$

Definition 2.15 (Complete):

A $Complete$ finite automaton is one satisfying the following:
$$Complete(M) \equiv (\forall q,a : q \in Q \land a \in V : T(q,a) \neq \emptyset)$$
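The predicate translates into a double loop over states and symbols. A minimal sketch with hypothetical names (`FA`, `Transition`, `is_complete`), using the signature $T \in Q \times V \to \mathcal P(Q)$ implicitly by scanning the stored triples:

```cpp
#include <set>
#include <tuple>
#include <utility>

using State = int;
using Symbol = char;

struct Transition { State from; Symbol on; State to; };
inline bool operator<(const Transition& x, const Transition& y) {
    return std::tie(x.from, x.on, x.to) < std::tie(y.from, y.on, y.to);
}

struct FA {
    std::set<State> Q;
    std::set<Symbol> V;
    std::set<Transition> T;
    std::set<std::pair<State, State>> E;
    std::set<State> S, F;
};

// Complete(M): every state has at least one out-transition on every symbol.
bool is_complete(const FA& m) {
    for (State q : m.Q) {
        for (Symbol a : m.V) {
            bool found = false;  // becomes true when T(q, a) is non-empty
            for (const Transition& t : m.T) {
                if (t.from == q && t.on == a) { found = true; break; }
            }
            if (!found) return false;
        }
    }
    return true;
}
```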

Property 2.16 (Complete):

For all $Complete$ FA’s $(Q,V,T,E,S,F)$:
$$(\cup\, q : q \in Q : \overleftarrow{\mathcal L}(q)) = V^\ast$$

Definition 2.17 ($\epsilon$-free):

Automaton $M$ is $\epsilon$-free if and only if $E = \emptyset$.

Remark 2.18:

Even if $M$ is $\epsilon$-free it is still possible that $\epsilon \in \mathcal L_{FA}(M)$; in this case $S \cap F \neq \emptyset$.

Convention A.7 (Equivalence classes of an equivalence relation):

For any equivalence relation $E$ on set $A$ we denote the set of equivalence classes of $E$ by $[A]_E$; that is
$$[A]_E = \{[a]_E \mid a \in A\}$$
Set $[A]_E$ is also called the partition of $A$ induced by $E$.

Definition A.8 (Index of an equivalence relation):

For equivalence relation $E$ on set $A$, define $\#E = |[A]_E|$. $\#E$ is called the index of $E$.
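For a finite set, the partition and the index can be computed by comparing each element with one representative per class. A small sketch; `classes_of` and `index_of` are hypothetical names, the set $A$ is passed as a vector of distinct elements, and the predicate is assumed to be a genuine equivalence relation.

```cpp
#include <cstddef>
#include <vector>

// Splits A into the equivalence classes [A]_E of the relation `related`.
// Assumptions: the elements of A are distinct and `related` is an equivalence relation.
template <class T, class Rel>
std::vector<std::vector<T>> classes_of(const std::vector<T>& A, Rel related) {
    std::vector<std::vector<T>> classes;
    for (const T& a : A) {
        bool placed = false;
        for (std::vector<T>& c : classes) {
            // One representative per class suffices for an equivalence relation.
            if (related(c.front(), a)) {
                c.push_back(a);
                placed = true;
                break;
            }
        }
        if (!placed) classes.push_back(std::vector<T>(1, a));
    }
    return classes;
}

// #E = |[A]_E|, the index of the relation.
template <class T, class Rel>
std::size_t index_of(const std::vector<T>& A, Rel related) {
    return classes_of(A, related).size();
}
```

For instance, on the integers 0 to 6 the relation "same remainder modulo 3" has index 3.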

Definition (Language of an FA):

The language of a finite automaton (with alphabet $V$) is given by the function $\mathcal L_{FA} \in FA \to \mathcal P(V^\ast)$ defined as:
$$\mathcal L_{FA}(M) = (\cup\, s,f : s \in S \land f \in F : T^\ast(s,f))$$
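Membership of a word in $\mathcal L_{FA}(M)$ can be tested without building $T^\ast$ explicitly, by tracking the set of states reachable from $S$. The sketch below uses the standard subset simulation with $\epsilon$-closure, which is a swapped-in technique and not a transcription of the definition; `FA`, `eps_closure` and `accepts` are hypothetical names.

```cpp
#include <set>
#include <string>
#include <tuple>
#include <utility>
#include <vector>

using State = int;
using Symbol = char;

struct Transition { State from; Symbol on; State to; };
inline bool operator<(const Transition& x, const Transition& y) {
    return std::tie(x.from, x.on, x.to) < std::tie(y.from, y.on, y.to);
}

struct FA {
    std::set<State> Q;
    std::set<Symbol> V;
    std::set<Transition> T;
    std::set<std::pair<State, State>> E;
    std::set<State> S, F;
};

// All states reachable from `states` using epsilon-transitions only.
std::set<State> eps_closure(const FA& m, std::set<State> states) {
    std::vector<State> work(states.begin(), states.end());
    while (!work.empty()) {
        State p = work.back();
        work.pop_back();
        for (const auto& e : m.E)
            if (e.first == p && states.insert(e.second).second) work.push_back(e.second);
    }
    return states;
}

// True when w is in L_FA(M), i.e. some start state reaches some final state on w.
bool accepts(const FA& m, const std::string& w) {
    std::set<State> cur = eps_closure(m, m.S);
    for (char a : w) {
        std::set<State> next;
        for (const Transition& t : m.T)
            if (t.on == a && cur.count(t.from)) next.insert(t.to);
        cur = eps_closure(m, next);
    }
    for (State f : m.F)
        if (cur.count(f)) return true;
    return false;
}
```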

Property 2.25 (Deterministic finite automaton):

A finite automaton M is deterministic if and only if

  • it does not have multiple start states,
  • it is $\epsilon$-free, and
  • transition function $T \in Q \times V \to \mathcal P(Q)$ does not map pairs in $Q \times V$ to multiple states.

Formally,
$$Det(M) \equiv (|S| \leq 1 \land \epsilon\text{-free}(M) \land (\forall q,a : q \in Q \land a \in V : |T(q,a)| \leq 1))$$
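The three conjuncts of $Det(M)$ can be checked one after another. A minimal sketch with hypothetical names (`FA`, `Transition`, `is_deterministic`); it relies on the transitions being stored as a set, so two stored triples with the same source and symbol must differ in their target.

```cpp
#include <set>
#include <tuple>
#include <utility>

using State = int;
using Symbol = char;

struct Transition { State from; Symbol on; State to; };
inline bool operator<(const Transition& x, const Transition& y) {
    return std::tie(x.from, x.on, x.to) < std::tie(y.from, y.on, y.to);
}

struct FA {
    std::set<State> Q;
    std::set<Symbol> V;
    std::set<Transition> T;
    std::set<std::pair<State, State>> E;
    std::set<State> S, F;
};

// Det(M): at most one start state, no epsilon-transitions,
// and at most one target state per (state, symbol) pair.
bool is_deterministic(const FA& m) {
    if (m.S.size() > 1) return false;  // |S| <= 1
    if (!m.E.empty()) return false;    // epsilon-free
    std::set<std::pair<State, Symbol>> seen;
    for (const Transition& t : m.T) {
        // A repeated (from, on) pair means |T(q, a)| > 1.
        if (!seen.emplace(t.from, t.on).second) return false;
    }
    return true;
}
```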

Definition 2.30 (Minimality of a DFA):

An $M \in DFA$ is minimal if and only if the following holds:
$$Min(M) \equiv (\forall M' : M' \in DFA \land \mathcal L_{FA}(M) = \mathcal L_{FA}(M') : |M| \leq |M'|)$$

Constructions based on regular expression structure

A finite automaton construction is any function $f$ such that the following diagram commutes:

[Figure omitted: commuting diagram]

In this section, we will be defining some $\Sigma$-algebras with $[FA]_\cong$ as the carrier set; the idea behind the above commuting diagram still holds in this case, as all isomorphic FA’s accept the same language.
The isomorphism class of an FA corresponding to a given regular expression is the image of the regular expression under the (unique) homomorphism from $RE$ to the other $\Sigma$-algebras. Such a homomorphism is an FA construction.
Thompson’s construction is considered first, followed by a derivation of Berry and Sethi’s, McNaughton, Yamada and Glushkov’s, and Aho, Sethi, and Ullman’s constructions.

Thompson’s construction

Definition (Thompson’s $\Sigma$-algebra of FA’s):

The carrier set is $[FA]_\cong$. The operator requirement is:

  • For the binary operators, the representatives of the arguments must have disjoint state sets. For any two equivalence classes (under $\cong$) we can always choose a representative of each such that they satisfy this requirement.

The operators (with subscript $Th$, for Thompson) are:


$$
\begin{array}{l}
\mathcal C_{\epsilon,Th} = \text{let}\ q_0, q_1 \text{ be new states}\\
\text{in}\\
\qquad [(\{q_0,q_1\}, V, \emptyset, \{(q_0,q_1)\}, \{q_0\}, \{q_1\})]_\cong\\
\text{end}
\end{array}
$$
A representative is $(Q,V,T,E,S,F)$ with $Q = \{q_0,q_1\}$, $T = \emptyset$, $E = \{(q_0,q_1)\}$, $S = \{q_0\}$, $F = \{q_1\}$.
[Figure omitted]


$$
\begin{array}{l}
\mathcal C_{\emptyset,Th} = \text{let}\ q_0, q_1 \text{ be new states}\\
\text{in}\\
\qquad [(\{q_0,q_1\}, V, \emptyset, \emptyset, \{q_0\}, \{q_1\})]_\cong\\
\text{end}
\end{array}
$$
A representative is $(Q,V,T,E,S,F)$ with $Q = \{q_0,q_1\}$, $T = \emptyset$, $E = \emptyset$, $S = \{q_0\}$, $F = \{q_1\}$.
[Figure omitted]


$$
\begin{array}{l}
\mathcal C_{a,Th} = \text{let}\ q_0, q_1 \text{ be new states}\\
\text{in}\\
\qquad [(\{q_0,q_1\}, V, \{(q_0,a,q_1)\}, \emptyset, \{q_0\}, \{q_1\})]_\cong\\
\text{end}
\end{array}
$$
A representative is $(Q,V,T,E,S,F)$ with $Q = \{q_0,q_1\}$, $T = \{(q_0,a,q_1)\}$, $E = \emptyset$, $S = \{q_0\}$, $F = \{q_1\}$.
[Figure omitted]


$$
\begin{array}{l}
\mathcal C_{\cdot,Th}([M_0]_\cong, [M_1]_\cong) = \text{let}\ (Q_0,V,T_0,E_0,S_0,F_0) = M_0\\
\qquad\qquad (Q_1,V,T_1,E_1,S_1,F_1) = M_1\\
\text{in}\\
\qquad \text{let}\ E' = E_0 \cup E_1 \cup (F_0 \times S_1)\\
\qquad \text{in}\\
\qquad\qquad [(Q_0 \cup Q_1, V, T_0 \cup T_1, E', S_0, F_1)]_\cong\\
\qquad \text{end}\\
\text{end}
\end{array}
$$
A representative is $(Q,V,T,E,S,F)$ with $Q = Q_0 \cup Q_1$, $T = T_0 \cup T_1$, $E = E'$, $S = S_0$, $F = F_1$.
[Figure omitted]


$$
\begin{array}{l}
\mathcal C_{\cup,Th}([M_0]_\cong, [M_1]_\cong) = \text{let}\ (Q_0,V,T_0,E_0,S_0,F_0) = M_0\\
\qquad\qquad (Q_1,V,T_1,E_1,S_1,F_1) = M_1\\
\qquad\qquad q_0, q_1 \text{ be new states}\\
\text{in}\\
\qquad \text{let}\ Q' = Q_0 \cup Q_1 \cup \{q_0,q_1\}\\
\qquad \quad E' = E_0 \cup E_1 \cup (\{q_0\} \times (S_0 \cup S_1)) \cup ((F_0 \cup F_1) \times \{q_1\})\\
\qquad \text{in}\\
\qquad\qquad [(Q', V, T_0 \cup T_1, E', \{q_0\}, \{q_1\})]_\cong\\
\qquad \text{end}\\
\text{end}
\end{array}
$$
A representative is $(Q,V,T,E,S,F)$ with $Q = Q'$, $T = T_0 \cup T_1$, $E = E'$, $S = \{q_0\}$, $F = \{q_1\}$.
[Figure omitted]


$$
\begin{array}{l}
\mathcal C_{\ast,Th}([M]_\cong) = \text{let}\ (Q,V,T,E,S,F) = M\\
\qquad\qquad q_0, q_1 \text{ be new states}\\
\text{in}\\
\qquad \text{let}\ Q' = Q \cup \{q_0,q_1\}\\
\qquad \quad E' = E \cup (\{q_0\} \times S) \cup (F \times S) \cup (F \times \{q_1\}) \cup \{(q_0,q_1)\}\\
\qquad \text{in}\\
\qquad\qquad [(Q', V, T, E', \{q_0\}, \{q_1\})]_\cong\\
\qquad \text{end}\\
\text{end}
\end{array}
$$
A representative is $(Q',V,T,E',\{q_0\},\{q_1\})$.
[Figure omitted]


$$
\begin{array}{l}
\mathcal C_{+,Th}([M]_\cong) = \text{let}\ (Q,V,T,E,S,F) = M\\
\qquad\qquad q_0, q_1 \text{ be new states}\\
\text{in}\\
\qquad \text{let}\ Q' = Q \cup \{q_0,q_1\}\\
\qquad \quad E' = E \cup (\{q_0\} \times S) \cup (F \times S) \cup (F \times \{q_1\})\\
\qquad \text{in}\\
\qquad\qquad [(Q', V, T, E', \{q_0\}, \{q_1\})]_\cong\\
\qquad \text{end}\\
\text{end}
\end{array}
$$
A representative is $(Q',V,T,E',\{q_0\},\{q_1\})$.
[Figure omitted]


$$
\begin{array}{l}
\mathcal C_{?,Th}([M]_\cong) = \text{let}\ (Q,V,T,E,S,F) = M\\
\qquad\qquad q_0, q_1 \text{ be new states}\\
\text{in}\\
\qquad \text{let}\ Q' = Q \cup \{q_0,q_1\}\\
\qquad \quad E' = E \cup (\{q_0\} \times S) \cup (F \times \{q_1\}) \cup \{(q_0,q_1)\}\\
\qquad \text{in}\\
\qquad\qquad [(Q', V, T, E', \{q_0\}, \{q_1\})]_\cong\\
\qquad \text{end}\\
\text{end}
\end{array}
$$
A representative is $(Q',V,T,E',\{q_0\},\{q_1\})$.
[Figure omitted]


These operators are symmetrical. Furthermore, they do not depend upon the choice of representative of the equivalence classes (under $\cong$). An automaton in Thompson’s $\Sigma$-algebra (here we speak of a representative FA, instead of the isomorphism class) has the following properties:

  • It has a single start state with no in-transitions.
  • It has a single final state with no out-transitions.
  • Every state has either a single in-transition on a symbol (in $V$), or at most two $\epsilon$ in-transitions.
  • Every state has either a single out-transition on a symbol (in $V$), or at most two $\epsilon$ out-transitions.

These properties are symmetrical because the operators are symmetrical. Hopcroft and Ullman have shown [HU79] that in practice these properties facilitate the quick simulation of $M$.

Example (Thompson’s construction):

$$
\begin{array}{ll}
Th((a \cup \epsilon) b^\ast) & = \mathcal C_{\cdot,Th}(Th(a \cup \epsilon), Th(b^\ast))\\
& = \mathcal C_{\cdot,Th}(\mathcal C_{\cup,Th}(Th(a), Th(\epsilon)), \mathcal C_{\ast,Th}(Th(b)))\\
& = \mathcal C_{\cdot,Th}(\mathcal C_{\cup,Th}(\mathcal C_{a,Th}, \mathcal C_{\epsilon,Th}), \mathcal C_{\ast,Th}(\mathcal C_{b,Th}))
\end{array}
$$
[Figure omitted]
Figure: A representative FA of the isomorphism class $Th((a \cup \epsilon) b^\ast)$.
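The operators and the example can be replayed in code. The following is a sketch under stated assumptions, not the FIRE engine's implementation: all names (`FA`, `fresh`, `wrap`, `link`, `th_sym`, `th_concat`, and so on) are hypothetical, the alphabet $V$ is left implicit, the code works on representatives rather than isomorphism classes, and the disjointness requirement is met by drawing every new state from a global counter.

```cpp
#include <set>
#include <string>
#include <tuple>
#include <utility>
#include <vector>

using State = int;
using StateSet = std::set<State>;
struct Transition { State from; char on; State to; };
inline bool operator<(const Transition& x, const Transition& y) {
    return std::tie(x.from, x.on, x.to) < std::tie(y.from, y.on, y.to);
}
struct FA {  // the alphabet V is left implicit in this sketch
    StateSet Q;
    std::set<Transition> T;
    std::set<std::pair<State, State>> E;
    StateSet S, F;
};

// Globally fresh states keep the state sets of operator arguments disjoint.
inline State fresh() { static State next = 0; return next++; }

// Adds the epsilon-transitions from x to (the set from × to).
void link(FA& m, const StateSet& from, const StateSet& to) {
    for (State p : from) for (State q : to) m.E.emplace(p, q);
}
// Adds new states q0, q1 and makes them the only start and final state.
std::pair<State, State> wrap(FA& m) {
    State q0 = fresh(), q1 = fresh();
    m.Q.insert(q0); m.Q.insert(q1);
    m.S = {q0}; m.F = {q1};
    return {q0, q1};
}
FA th_empty() { FA m; wrap(m); return m; }                        // C_{empty,Th}
FA th_eps() { FA m; auto q = wrap(m); m.E.insert(q); return m; }  // C_{eps,Th}
FA th_sym(char a) {                                               // C_{a,Th}
    FA m; auto q = wrap(m);
    m.T.insert(Transition{q.first, a, q.second});
    return m;
}
// Union of the Q, T and E components; S and F are left empty.
FA merged(const FA& m0, const FA& m1) {
    FA m = m0; m.S.clear(); m.F.clear();
    m.Q.insert(m1.Q.begin(), m1.Q.end());
    m.T.insert(m1.T.begin(), m1.T.end());
    m.E.insert(m1.E.begin(), m1.E.end());
    return m;
}
FA th_concat(const FA& m0, const FA& m1) {  // C_{.,Th}: E' adds F0 × S1
    FA m = merged(m0, m1);
    link(m, m0.F, m1.S);
    m.S = m0.S; m.F = m1.F;
    return m;
}
FA th_union(const FA& m0, const FA& m1) {   // C_{union,Th}
    FA m = merged(m0, m1);
    auto q = wrap(m);
    link(m, {q.first}, m0.S); link(m, {q.first}, m1.S);
    link(m, m0.F, {q.second}); link(m, m1.F, {q.second});
    return m;
}
FA th_star(const FA& m0) {                  // C_{*,Th}
    FA m = m0;
    auto q = wrap(m);
    link(m, {q.first}, m0.S); link(m, m0.F, m0.S); link(m, m0.F, {q.second});
    m.E.insert(q);  // the bypass (q0, q1) accepts the empty word
    return m;
}
// All states reachable from `states` using epsilon-transitions only.
StateSet eps_closure(const FA& m, StateSet states) {
    std::vector<State> work(states.begin(), states.end());
    while (!work.empty()) {
        State p = work.back(); work.pop_back();
        for (const auto& e : m.E)
            if (e.first == p && states.insert(e.second).second) work.push_back(e.second);
    }
    return states;
}
// Subset simulation: true when w leads from a start state to a final state.
bool accepts(const FA& m, const std::string& w) {
    StateSet cur = eps_closure(m, m.S);
    for (char a : w) {
        StateSet next;
        for (const Transition& t : m.T)
            if (t.on == a && cur.count(t.from)) next.insert(t.to);
        cur = eps_closure(m, next);
    }
    for (State f : m.F) if (cur.count(f)) return true;
    return false;
}
// Th((a ∪ eps) b*), following the last line of the derivation above.
FA example() {
    return th_concat(th_union(th_sym('a'), th_eps()), th_star(th_sym('b')));
}
```

The automaton returned by `example()` has ten states: two for each of the three constants, plus two new states each for the union and the star; concatenation adds no states.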

Character ranges

In the FIRE engine, atomic regular expressions and finite automata transitions can be labeled by sets of characters. The sets of characters allowed are restricted to non-empty ranges of characters in the execution character set (which is usually ASCII or EBCDIC).
Class CharRange is used to represent such ranges of characters, which are specified by their upper and lower (inclusive) bounds [lo,hi].
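To illustrate the idea of an inclusive range $[lo,hi]$, here is a hypothetical stand-in class. It is not the interface of the FIRE engine's `CharRange`; the class name `SimpleCharRange` and all member names are assumptions made for this sketch.

```cpp
#include <cassert>

// Hypothetical stand-in for a character range [lo, hi];
// not the FIRE engine's CharRange class.
class SimpleCharRange {
public:
    SimpleCharRange(char lo, char hi) : lo_(lo), hi_(hi) {
        assert(lo <= hi);  // only non-empty ranges are allowed
    }
    char low() const { return lo_; }
    char high() const { return hi_; }

    // True when c lies within the inclusive bounds.
    bool contains(char c) const { return lo_ <= c && c <= hi_; }

    // True when the two ranges share at least one character.
    bool overlaps(const SimpleCharRange& r) const {
        return lo_ <= r.hi_ && r.lo_ <= hi_;
    }

private:
    char lo_;  // lower bound (inclusive)
    char hi_;  // upper bound (inclusive)
};
```

A transition labeled with such a range stands for one transition per character in the range, which keeps automata over large alphabets compact.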


[Wat93a] WATSON, B. W. “A taxonomy of finite automata construction algorithms,” Computing Science Note 93/43, Eindhoven University of Technology, The Netherlands, 1993. Available by ftp from ftp.win.tue.nl in pub/techreports/pi.

[Wat93b] WATSON, B. W. “A taxonomy of finite automata minimization algorithms,” Computing Science Note 93/44, Eindhoven University of Technology, The Netherlands, 1993. Available by ftp from ftp.win.tue.nl in pub/techreports/pi.

[Wat94a] WATSON, B. W. “An introduction to the FIRE engine: A C++ toolkit for FInite automata and Regular Expressions,” Computing Science Note 94/21, Eindhoven University of Technology, The Netherlands, 1994. Available by ftp from ftp.win.tue.nl in pub/techreports/pi.

[Wat94b] WATSON, B. W. “The design and implementation of the FIRE engine: A C++ toolkit for FInite automata and Regular Expressions,” Computing Science Note 94/22, Eindhoven University of Technology, The Netherlands, 1994. Available by ftp from ftp.win.tue.nl in pub/techreports/pi.
