《正则表达式转NFA》NFA是捏着正则式去比文本,吃掉一个字符,就把它跟正则式比较,匹配就记下来:“某年某月某日在某处匹配上了!”,然后接着往下干。一旦不匹配,就把刚吃的这个字符吐出来,一个个的吐,直到回到上一次匹配的地方。
而DFA捏着文本串去比较正则式,看到一个子正则式,就把可能的匹配串全标注出来,然后再看正则式的下一个部分,根据新的匹配结果更新标注。
例如用正则式/perl|perlman/来匹配文本 ‘perlman book’。如果是NFA,则以正则式为导向,手里捏着正则式,眼睛看着文本,一个字符一个字符的吃,吃完 ‘perl’ 以后,跟第一个子正则式/perl/已经匹配上了,于是记录在案,往下再看,吃进一个 ‘m’,这下糟了,跟子式/perl/不匹配了,于是把m吐出来,向上汇报说成功匹配 ‘perl’,不再关心其他,也不尝试后面那个子正则式/perlman/,自然也就看不到那个更好的答案了。
如果是DFA,它是以文本为导向,手里捏着文本,眼睛看着正则式,一口一口的吃。吃到/p/,就在手里的 ‘p’ 上打一个钩,记上一笔,说这个字符已经匹配上了,然后往下吃。当看到 /perl/ 之后,DFA不会停,会尝试再吃一口。这时候,第一个子正则式已经山穷水尽了,没得吃了,于是就甩掉它,去吃第二个子正则式的/m/。这一吃好了,因为又匹配上了,于是接着往下吃。直到把正则式吃完,心满意足往上报告说成功匹配了 ‘perlman’。
确定型有穷自动机
确定型有穷自动机是不确定有穷自动机中的一个特例,其中:
- 没有输出ε之上的转换动作。
- 对每个状态s和每个输入符号a,有且只有一条标号为a的边离开s
确定型有穷自动机的转换
- ε-closure的操作就是去ε
- move的操作就是对路径的归并
- Dstates保存着新节点[初始化是ε-closure(s0)]状态
- 发现Dstates还有末被标记的节点进入循环并打上标记
- 对新节点T(T=ε-closure(s))集合进行移边操作和去ε得到新集合U
- 如果发现U没在Dstates中,加入并不打标记
- 创建一条新边Dtran[T,a]
求出ε-closure(s0)]={0,1,3,4,5,6,7,9}并创建表格
I | Ia | Ib |
---|---|---|
{0,1,3,4,5,6,7,9} |
对基移a,b操作
移a或b集合操作的时候得到的新集合如果不第一列中,则新开一列标记上,并再对新集合移a或b.如此循环 。
新的终态的判断方法就是包含原来终态的集合就为终态,例如此题原来终态为9,所以包含9的集合就为终态,[双圈代表终态]
新的初态就是包含原来初态的集合就为初态,例如此题原来初态为0,所以包含0的集合就为初态
将DFA最小化
对于同一个语言,可以存在多个识别此语言的DFA,所以,求出DFA后,通常我们还需要对DFA进行简化操作,求出最简DFA。
简化的本质是合并性质相同的状态,以减少整个图的大小。
无关状态
(1)多于状态:对于一个状态Si,若从开始状态出发,不可能到达该状态Si,则Si为多余(无用)状态。
(2)死状态:对于一个状态Si,对于任意输入符号a,若转到它本身后,不可能从它到达终止状态,则称Si为死状态。
等价状态:若Si为自动机的一个状态,我们把从s出发能导出的所有符号串的集合记为L(s)。设有两个状态s和t,若有L(s)=L(t),则称s和t是等价状态。
(1) 一致性条件:状态s和t必须同时为终态或非终态
(2) 蔓延性条件:对于所有输入符号,状态s和状态t必须转化到等价的状态里
DFA的化简算法:对于DFA M=(S,Σ,f,S0,Z)
(1)首先将DFA的状态集进行初始化,分成Π=(Z,S-Z); S为全部状态,Z为终态。
(2) 用下面的过程对Π构造新的划分Π
-
for (Π中每个组G)
{
-
把G划分成小组,G中的任意两个状态Si和Sj在同一组中,
-
当且仅当对于Σ中任意输入符号a ,Si和Sj的a转换是到同一组中,move(Si,a) ∈Gi ,move(Sj,a) ∈Gi。
-
只要Si和Sj的a转换是到不同的组中,则说明Si和Sj是可区别的,可进行划分。在Π new中用刚完成的对G的划分代替原来的G。
-
}
-
(3)重复执行(2),直到Π中每个状态集不能再划分(Π new= Π)为止;
(4)合并等价状态 ,在每个G中,取任意状态作为代表,删去其它状态;
(5)删去无关状态,从其它状态到无关状态的转换都成为无定义。
①首次划分: Π0=({2,3,4,5},{0,1})
②在G={2,3,4,5}中:f(2,a)=1,f(4,a)=0(转向终态集{0,1});f(3,a)=3,f(5,a)=5(转向非终态集{2,3,4,5}),故{2,4}和{3,5}是可区别的,得Π1=({2,4},{3,5},{0,1});
③在G={2,4}中,f(2,a)=1,f(4,a)=0(转向终态子集),而f(2,b)=3,f(4,b)=5(转向非终态子集{3,5}),所以不可区别,不再进行划分;
④考察G={3,5},f(3,a)=3,f(5,a)=5(转向非终态子集{3,5}),f(3,b)=2,f(5,b)=4(转向非终态子集{2,4}), 所以不可区别,不再进行划分;
⑤考察G={0,1},f(0,a)=f(1,a)=1(转向终态集{0,1}); f(0,b)=2f,(1,b)=4(转向非终态子集{2,4}),所以不可区别,不再进行划分;
⑦进一步进行考察,可以发现每个子集都不能再划分了;
⑧消去等价状态:{0,1}用0表示,{2,4}用2表示,{3,5}用3表示,如右图所示
⑨去掉无关状态,因DFA M’中没有无关状态,所以下图即为最后结果。
JAVA代码
-
package com.moyao.magic;
-
-
import com.moyao.magic.SuffixAlgorithm.Operator;
-
-
import java.util.*;
-
import java.util.stream.Collectors;
-
-
public
class
NfaAlgorithm {
-
-
-
-
private static String prepareString(String _regex) {
-
String regex =
"";
-
char[] regexs = _regex.replaceAll(
" ",
"").toCharArray();
-
for (
int i =
0; i < regexs.length; i++) {
-
if (i ==
0)
-
regex += regexs[i];
-
else {
-
if (regexs[i] ==
'|' || regexs[i] ==
'*' || regexs[i] ==
')') {
-
regex += regexs[i];
-
}
else {
-
if (regexs[i -
1] ==
'(' || regexs[i -
1] ==
'|')
-
regex += regexs[i];
-
else
-
regex += (
"&" + regexs[i]);
-
}
-
}
-
}
-
return regex;
-
}
-
-
private
static String E =
"epsllon";
-
-
public static Graph transformNFA(String suffixRegex){
-
char[] rs = suffixRegex.toCharArray();
-
Stack<Object> operandStack =
new Stack();
-
for (
int i =
0; i<rs.length; i++){
-
char o = rs[i];
-
if(!Operator.isOperator(o)){
-
operandStack.push(String.valueOf(o));
-
}
else{
-
-
Operator
operator = Operator.findOperator(o);
-
Graph graph =
null;
-
switch (
operator){
-
case ADD:
//'&'
-
Object av1 = operandStack.pop();
-
Object av2 = operandStack.pop();
-
graph = Graph.parse(av2);
-
graph.
add(av1);
-
break;
-
case MULTIPLY:
//'*'
-
Object mv = operandStack.pop();
-
graph = Graph.parse(mv);
-
graph.multiply();
-
break;
-
case DIVIDE:
//'|'
-
Object dv1 = operandStack.pop();
-
Object dv2 = operandStack.pop();
-
graph = Graph.parse(dv2);
-
graph.divide(dv1);
-
break;
-
}
-
operandStack.push(graph);
-
}
-
}
-
return Graph.parse(operandStack.pop());
-
}
-
-
-
public static Graph transformDFA(Graph nfa){
-
List<String> lbs = nfa.getLabels();
-
List<UNode> us =
new ArrayList<>();
-
List<UNode> doList =
new ArrayList<>();
-
List<Edge> edges =
new ArrayList<>();
-
-
UNode s =
new UNode(nfa.closure(nfa.start));
-
UNode e =
null;
-
doList.
add(s);
-
us.
add(s);
-
while(!doList.isEmpty()){
-
UNode u = doList.
remove(
0);
-
for (String lb : lbs){
-
List<Node> nodes = nfa.closure(nfa.move(u.getNodes(),lb));
-
if(nodes.isEmpty())
continue;
-
-
UNode next =
new UNode(nodes);
-
-
int index = us.indexOf(next);
-
-
if(index >=
0){
-
next = us.
get(index);
-
}
-
-
Edge edge =
new Edge(u,next,lb);
-
edges.
add(edge);
-
-
if(index >=
0)
continue;
-
-
doList.
add(next);
-
us.
add(next);
-
-
if(e !=
null)
continue;
-
-
e = next.nodes
-
.stream()
-
.anyMatch(node -> {
return node == nfa.end;}) ?
-
next :
null;
-
}
-
}
-
return
new Graph(edges,s, e);
-
}
-
-
public static Graph optimizeDFA(Graph dfa){
-
List<String> lbs = dfa.getLabels();
-
-
int pos =
1;
-
int end =
2;
-
-
Map<Integer, Node> nodeMap = dfa.getNodes();
-
Map<Integer,Integer> groupMap =
new HashMap<>();
-
nodeMap.values().forEach(n->{
-
groupMap.put(n.id, n == dfa.end ?
0 :
1);
-
});
-
-
while (pos < end){
-
-
final
int searchIndex = pos;
-
-
List<Node> nodes = groupMap.entrySet().stream()
-
.filter(e-> e.getValue() == searchIndex).map(e->{
return nodeMap.
get(e.getKey());})
-
.collect(Collectors.toList());
-
-
boolean success =
true;
-
-
for (String lb:lbs) {
-
-
Map<Integer,List<Integer>> resultMap =
new HashMap<>();
-
-
for (Node node: nodes) {
-
Node n = dfa.move(node,lb);
-
int groupResult = n ==
null ?
-1 : groupMap.
get(node.id);
-
-
List<Integer> ll = resultMap.
get(
-1);
-
if (ll ==
null){
-
ll =
new ArrayList<>();
-
ll.
add(node.id);
-
}
-
resultMap.put(groupResult,ll);
-
}
-
-
int count = resultMap.keySet().size();
-
if(count ==
1)
-
continue;
-
-
int b = pos;
-
-
Iterator<List<Integer>> it = resultMap.values().iterator();
-
-
while (it.hasNext()){
-
List<Integer> idList = it.next();
-
for (Integer id:idList) {
-
groupMap.put(id,b++);
-
}
-
}
-
end = b;
-
success =
false;
-
break;
-
}
-
-
if(success) pos++;
-
}
-
-
Map<Integer, Node> newNodeMap =
new HashMap<>();
-
Node s =
new Node(groupMap.
get(dfa.start.id));
-
Node e =
new Node(groupMap.
get(dfa.end.id));
-
newNodeMap.put(s.id, s);
-
newNodeMap.put(e.id, e);
-
-
List<Edge> edges = dfa.edges.stream().map(edge->{
-
Node nodeBegin = newNodeMap.getOrDefault(edge.begin.id,
new Node(groupMap.
get(edge.begin.id)));
-
Node nodeEnd = newNodeMap.getOrDefault(edge.end.id,
new Node(groupMap.
get(edge.end.id)));
-
return
new Edge(nodeBegin,nodeEnd,edge.label);
-
}).distinct().collect(Collectors.toList());
-
-
return
new Graph(edges,s, e);
-
}
-
-
public static void main(String[] args) {
-
String exp = prepareString(
"(A*B|AC)D");
-
//String exp = prepareString("AB");
-
-
System.
out.println(exp);
-
String regex = SuffixAlgorithm.rpn(exp);
-
System.
out.println(regex);
-
Graph fna = transformNFA(regex);
-
System.
out.println(fna);
-
Graph dfa = transformDFA(fna);
-
System.
out.println(dfa);
-
Graph opDfa = optimizeDFA(dfa);
-
System.
out.println(opDfa);
-
-
}
-
-
static
class
Graph {
-
private List<Edge> edges;
-
private Node start;
-
private Node end;
-
-
-
public Graph(List<Edge> edges, Node start, Node end) {
-
this.edges = edges;
-
this.start = start;
-
this.end = end;
-
}
-
-
private Graph(String label){
-
this.start =
new Node();
-
this.end =
new Node();
-
this.edges =
new ArrayList<>();
-
Edge edge =
new Edge(start,end,label);
-
this.edges.
add(edge);
-
}
-
-
public static Graph parse(Object obj){
-
if(obj instanceof Graph){
-
return (Graph) obj;
-
}
-
if(obj instanceof String)
-
return
new Graph((String)obj);
-
-
throw
new IllegalArgumentException(
"not support class["+obj.getClass()+
"]");
-
}
-
-
public void add(Object obj){
-
if(obj instanceof String){
-
add((String)obj);
-
return;
-
}
-
if(obj instanceof Graph) {
-
add((Graph) obj);
-
return;
-
}
-
throw
new IllegalArgumentException(
"not support class["+obj.getClass()+
"]");
-
}
-
-
public void divide(Object obj){
-
if(obj instanceof String){
-
divide(parse(obj));
-
return;
-
}
-
if(obj instanceof Graph) {
-
divide((Graph) obj);
-
return;
-
}
-
throw
new IllegalArgumentException(
"not support class["+obj.getClass()+
"]");
-
}
-
-
public void multiply(){
-
Node nodeStart =
this.start;
-
Node nodeEnd =
this.end;
-
this.start =
new Node();
-
this.end =
new Node();
-
Edge edge1 =
new Edge(nodeEnd,nodeStart,E);
-
Edge edge2 =
new Edge(
this.start,nodeStart,E);
-
Edge edge3 =
new Edge(nodeEnd,
this.end,E);
-
Edge edge4 =
new Edge(
this.start,
this.end,E);
-
this.edges.
add(edge1);
-
this.edges.
add(edge2);
-
this.edges.
add(edge3);
-
this.edges.
add(edge4);
-
}
-
-
private void divide(Graph graph){
-
Node nodeStart =
this.start;
-
Node nodeEnd =
this.end;
-
this.start =
new Node();
-
this.end =
new Node();
-
Edge edge1 =
new Edge(
this.start,graph.start,E);
-
Edge edge2 =
new Edge(
this.start,nodeStart,E);
-
Edge edge3 =
new Edge(nodeEnd,
this.end,E);
-
Edge edge4 =
new Edge(graph.end,
this.end,E);
-
this.edges.
add(edge1);
-
this.edges.
add(edge2);
-
this.edges.
add(edge3);
-
this.edges.
add(edge4);
-
this.edges.addAll(graph.edges);
-
}
-
-
private void add(String label){
-
Node mid = end;
-
this.end =
new Node();
-
Edge edge =
new Edge(mid,end,label);
-
this.edges.
add(edge);
-
}
-
-
private void add(Graph graph){
-
Node node =
this.end;
-
this.end = graph.end;
-
edges.stream().filter(edge -> edge.end == node)
-
.forEach(edge -> {
-
edge.end = graph.start;
-
});
-
this.edges.addAll(graph.edges);
-
}
-
-
public List<Node> closure(Node s){
-
return closure(Arrays.asList(s));
-
}
-
public List<Node> closure(List<Node> u){
-
List<Node> result =
new ArrayList<>();
-
do{
-
result.addAll(u);
-
u = move(u,E);
-
}
while (!u.isEmpty());
-
return result;
-
}
-
-
public Node move(Node t, String lable){
-
List<Node> result = move(Arrays.asList(t), lable);
-
if(result.isEmpty())
return
null;
-
return result.
get(
0);
-
}
-
-
public List<Node> move(List<Node> t, String lable){
-
return edges.stream()
-
.filter(edge -> {
-
if(!edge.label.
equals(lable))
-
return
false;
-
for (Node n : t){
-
if(edge.begin == n)
-
return
true;
-
}
-
return
false;
-
})
-
.map(edge -> {
return edge.end;})
-
.collect(Collectors.toList());
-
}
-
-
public List<String> getLabels(){
-
return edges.stream()
-
.filter(edge -> {
return !edge.label.
equals(E);})
-
.map(edge -> {
return edge.label;})
-
.distinct()
-
.collect(Collectors.toList());
-
}
-
-
public Map<Integer,Node> getNodes(){
-
Map<Integer, Node> map =
new HashMap<>();
-
edges.forEach(edge -> {
-
map.put(edge.begin.id, edge.begin);
-
map.put(edge.end.id, edge.end);
-
});
-
return map;
-
}
-
-
@
Override
-
public String
toString(
) {
-
String printString =
"Start=" +
this.start +
" End=" +
this.end +
"\n";
-
for (
int i =
0; i < edges.size(); i++) {
-
printString += edges.
get(i) +
"\n";
-
}
-
return printString;
-
}
-
}
-
-
//边
-
static
class
Edge {
-
private Node begin;
-
private Node end;
-
private String label;
-
-
public Edge(Node begin, Node end, String label) {
-
this.begin = begin;
-
this.end = end;
-
this.label = label;
-
}
-
-
-
@
Override
-
public boolean
equals(
Object o) {
-
if (
this == o)
return
true;
-
if (o ==
null || getClass() != o.getClass())
return
false;
-
-
Edge edge = (Edge) o;
-
-
if (!begin.
equals(edge.begin))
return
false;
-
if (!end.
equals(edge.end))
return
false;
-
return label.
equals(edge.label);
-
}
-
-
@
Override
-
public
int
hashCode(
) {
-
int result = begin.hashCode();
-
result =
31 * result + end.hashCode();
-
result =
31 * result + label.hashCode();
-
return result;
-
}
-
-
@
Override
-
public String
toString(
) {
-
return
"Edge [begin=" + begin +
", end=" + end +
", label=" + label+
"]";
-
}
-
}
-
-
-
//节点
-
static
class
UNode
extends
Node {
-
-
private List<Node> nodes;
-
-
public UNode(List<Node> nodes) {
-
this.nodes = nodes;
-
}
-
-
public List<Node> getNodes() {
-
return nodes;
-
}
-
-
@
Override
-
public boolean
equals(
Object obj) {
-
if(!(obj instanceof UNode))
return super.
equals(obj);
-
-
UNode b = (UNode)obj;
-
-
if(nodes.size() != b.nodes.size())
return
false;
-
-
List<Node> desc =
new ArrayList<>(Arrays.asList(
new UNode[nodes.size()]));
-
Collections.copy(desc, nodes);
-
-
desc.removeAll(b.nodes);
-
if(desc.isEmpty())
return
true;
-
-
return
false;
-
}
-
}
-
-
//节点
-
static
class
Node {
-
private
int id;
-
private
static
int ID=
0;
-
-
public Node(int id){
this.id = id;}
-
-
public Node(){
-
this.id=ID++;
-
}
-
-
-
@
Override
-
public boolean
equals(
Object obj) {
-
if(!(obj instanceof Node))
-
return super.
equals(obj);
-
-
Node b = (Node)obj;
-
-
return id == b.id;
-
}
-
-
@
Override
-
public String
toString(
) {
-
return id+
"";
-
}
-
-
}
-
}
·