A Table-Driven Method for Recognizing String Patterns, Based on a DFA

幸福幻觉

Category: Compiler Principles. Tags: Compiler Principles

Article link: https://blog.csdn.net/hexiaole1994/article/details/51194467

I. Overview

1. Terminology

1) DFA

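A DFA (Deterministic Finite Automaton) is a model for recognizing string patterns. The term is taken from the textbook 《编译原理》 (Compiler Principles).

The model consists of a set of states (one of them the start state, some of them accept states), an alphabet, and a transition function.

For example, suppose the string aabb needs to be recognized.

States: how much of the string has been recognized so far. In the example, the prefixes a, aa, aab, and aabb correspond to different states, say states 1, 2, 3, and 4. State 4 is called the accept state: reaching it means that a string matching the pattern has been read.

Alphabet: the set of characters that can appear in the strings to be recognized. In the example only a and b occur, so the alphabet is {a, b}.

Transition function: gives, for each state and each letter of the alphabet, the next state, e.g. F[1, a] = 2 and F[1, b] = error.

The explanation above uses one specific string for convenience. In general a DFA recognizes a set of strings, and the sets that DFAs can recognize are exactly the sets that regular expressions can describe.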

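2) Table-driven method

Every DFA (its states, alphabet, and transition function) can be written as a transition table: each row corresponds to a state and each column to a character of the alphabet, plus an eof column for the end of input. The table below is for the example above. State 0 is the state in which recognition starts, err means an error was detected (the string read so far cannot match the pattern), acc means recognition succeeded, and the accept state is marked with parentheses "()".

Table 1.1 Transition table of the DFA for aabb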

	a	b	eof
0	1	err	err
1	2	err	err
2	err	3	err
3	err	4	err
(4)	err	err	acc

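The table-driven method takes the table of a DFA and uses it to drive the recognition of the string pattern: the recognizer only looks up the next state in the table, so the pattern itself lives entirely in the table's contents.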
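As a rough illustration, Table 1.1 can be stored as a two-dimensional array and walked one character at a time. This is only a sketch: the names `TABLE`, `column`, and `acceptsAabb` and the integer codes chosen for err and acc are assumptions made for this example, and the end of the string stands in for eof.

```cpp
#include <string>

// Assumed integer codes for the two special entries of Table 1.1.
const int ERR = -1;   // "err": the input cannot match
const int ACC = -2;   // "acc": the input is accepted

// Rows are states 0..4; columns are 0 = 'a', 1 = 'b', 2 = eof.
const int TABLE[5][3] = {
    {   1, ERR, ERR },
    {   2, ERR, ERR },
    { ERR,   3, ERR },
    { ERR,   4, ERR },
    { ERR, ERR, ACC },
};

// Map a character to its column; returns -1 if it is not in the alphabet.
int column(char c)
{
    if (c == 'a') return 0;
    if (c == 'b') return 1;
    return -1;
}

// Returns true if Table 1.1 accepts `input`.
bool acceptsAabb(const std::string& input)
{
    int s = 0;                          // start in state 0
    for (char c : input) {
        int col = column(c);
        if (col < 0) return false;      // character outside the alphabet
        s = TABLE[s][col];
        if (s == ERR) return false;     // no way back from an error
    }
    return TABLE[s][2] == ACC;          // look at the eof column last
}
```

With this table only `acceptsAabb("aabb")` returns true; shorter prefixes such as "aab" fail in the eof column, and anything longer runs into err.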

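3) String pattern

A string pattern denotes a set of strings; if you know regular expressions, you can treat the two as the same thing. For example, the strings made of one of a, b, or c followed by ".txt" (that is, a.txt, b.txt, and c.txt) can be described by the pattern "(a|b|c)\.txt". (The pattern is written as a regular expression. In a regular expression "." has a special meaning, matching any single character, so "\" is used to escape it into an ordinary character.)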
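The effect of the escape can be checked with the standard `<regex>` library. The sketch below is not part of the original exercise; the helper names `matchesPattern` and `matchesUnescaped` are made up for the comparison.

```cpp
#include <regex>
#include <string>

// True if `name` as a whole matches (a|b|c)\.txt.
bool matchesPattern(const std::string& name)
{
    // Inside a C++ string literal the backslash must itself be escaped: "\\."
    static const std::regex pattern("(a|b|c)\\.txt");
    return std::regex_match(name, pattern);
}

// The same pattern with the dot left unescaped, for comparison.
bool matchesUnescaped(const std::string& name)
{
    static const std::regex pattern("(a|b|c).txt");
    return std::regex_match(name, pattern);
}
```

Both helpers accept "a.txt", but only the unescaped version also accepts a string such as "aXtxt", because there the dot matches any single character.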

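2. Problem statement

In a compiler principles course, the teacher set the following task: use the table-driven method to simulate a DFA recognizing the strings (the language) described by "(a|b)*abb".

(The pattern is written as a regular expression; "*" means that the preceding item occurs zero or more times.)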

II. Approach

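Given the introduction above, the work actually required is:

1) Express the strings to be recognized as a DFA

2) Turn the DFA into the corresponding transition table

3) Implement the table-driven method from the transition table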

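Since the focus here is on implementing the table-driven method, the first two steps are done on paper. First, the algorithms given in 《编译原理》 (construct an NFA (Non-deterministic Finite Automaton) for the pattern, convert the NFA into a DFA, then minimize the DFA's states) are used to build the DFA for "(a|b)*abb". Then the corresponding transition table is written out. The results are shown below:

Figure 2.1 Transition diagram for "(a|b)*abb"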

	a	b	eof
0	1	0	err
1	1	2	err
2	1	3	err
(3)	1	0	acc

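Figure 2.2 Transition table for "(a|b)*abb"

Based on the transition table, the table-driven method can be written in pseudocode as follows (adapted from 《编译原理》):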
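To get a feel for how the table in Figure 2.2 is read, it can help to record every state visited for a given input. The following is a small sketch under the assumption that the input contains only a and b; `NEXT` and `traceStates` are names invented for this illustration.

```cpp
#include <string>
#include <vector>

// The a and b columns of Figure 2.2; rows are states 0..3.
const int NEXT[4][2] = {
    { 1, 0 },
    { 1, 2 },
    { 1, 3 },
    { 1, 0 },
};

// Returns the states visited while reading `input`, beginning with state 0.
// Any character other than 'a' or 'b' stops the trace early.
std::vector<int> traceStates(const std::string& input)
{
    std::vector<int> visited{0};
    int s = 0;
    for (char c : input) {
        if (c != 'a' && c != 'b') break;
        s = NEXT[s][c == 'a' ? 0 : 1];
        visited.push_back(s);
    }
    return visited;
}
```

For the input babb the trace is 0, 0, 1, 2, 3, which ends in the accept state 3. For abba it is 0, 1, 2, 3, 1: the DFA passes through state 3 but ends in state 1, so the string is rejected.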

s = s0    // s0 is state 0, the start state
c = nextChar()
while (c != eof){
    s = F(s, c)    // F is the state-transition function
    c = nextChar()
}
if (s in acceptState())    // at eof, check whether the final state s is an accept state
    print("yes")
else
    print("no")

III. Implementation

1. C++ implementation

#include <iostream>    // for cin, cout
#include <map>    // for map

using std::cin;
using std::cout;
using std::map;

// the enumeration of DFA states
enum state {S0=0, S1, S2, S3, ERR};

// the number of rows and columns of the transition table
const int MAXROW = 5;
const int MAXCOL = 3;
//===----------------------------------------===//
//  state-transition table definition
//          a   b
//     S0  S1  S0
//     S1  S1  S2
//     S2  S1  S3
//     S3  S1  S0
//    ERR ERR ERR
//===----------------------------------------===//
// column 0 is reserved for characters that are not in the alphabet
state transTable [MAXROW][MAXCOL] =
{
    {  ERR,  S1,  S0  },
    {  ERR,  S1,  S2  },
    {  ERR,  S1,  S3  },
    {  ERR,  S1,  S0  },
    {  ERR, ERR, ERR  }
};
// maps each character of the alphabet of (a|b)*abb to its column in transTable
map<char, int> alphabet =
{
    // 'a' maps to column 1 and 'b' maps to column 2 of transTable
    { 'a' , 1 },
    { 'b' , 2 }
};

void tableDrive();    // table-drive function
state F(state, char);    // state-transition function
char nextChar();    // get next input char

int main()
{
    cout << "================================\n";
    cout << " String-model: (a|b)*abb        \n";
    cout << " End-of-input: $                \n";
    cout << "================================\n";
    // keep prompting until the input stream is exhausted
    while (cin){
        cout << ">";
        tableDrive();
    }
    return 0;
}

// table-driven function that recognizes the string pattern (a|b)*abb
void tableDrive()
{
    state s = S0;
    char c = nextChar();
    while (c != '$'){    // '$' indicates the end of input
        s = F(s, c);
        c = nextChar();
    }
    // the stream ended before a terminating '$': nothing to report
    if (!cin)
        return;
    // S3 is the only accept state
    if (s == S3)
        cout << "yes\n";
    else
        cout << "no\n";
}
//===----------------------------------------===//
//  state-transition function
//  s is the current state, c is the current char
//  based on the transition table of the DFA above
//===----------------------------------------===//
state F(state s, char c)
{
    // look c up without inserting it into the map;
    // characters outside the alphabet use column 0, which always leads to ERR
    auto it = alphabet.find(c);
    int col = (it == alphabet.end()) ? 0 : it->second;

    return transTable[s][col];
}
// get the next input char (whitespace is skipped by operator>>)
char nextChar()
{
    char ret;

    // if the read fails (end of stream or error), behave as if '$' had been
    // read so that the loop in tableDrive() always terminates
    if (!(cin >> ret))
        return '$';
    return ret;
}
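One way to gain some confidence in a hand-built table is to compare it against a regular-expression engine on every short string over the alphabet. The sketch below does this with the standard `<regex>` library. It is an add-on check rather than part of the original program, and `NEXT`, `dfaAccepts`, and `countMismatches` are names chosen for this illustration.

```cpp
#include <regex>
#include <string>

// The a and b columns of the transition table for (a|b)*abb; rows are states 0..3.
const int NEXT[4][2] = {
    { 1, 0 },
    { 1, 2 },
    { 1, 3 },
    { 1, 0 },
};

// Table-driven recognizer working on a whole string instead of std::cin.
bool dfaAccepts(const std::string& input)
{
    int s = 0;
    for (char c : input) {
        if (c != 'a' && c != 'b') return false;   // not in the alphabet
        s = NEXT[s][c == 'a' ? 0 : 1];
    }
    return s == 3;                                // state 3 is the only accept state
}

// Counts the strings over {a, b} of length 0..maxLen on which the DFA
// and the regular expression disagree.
int countMismatches(int maxLen)
{
    static const std::regex pattern("(a|b)*abb");
    int mismatches = 0;
    for (int len = 0; len <= maxLen; ++len) {
        // each bit of `bits` selects 'a' (0) or 'b' (1) for one position
        for (unsigned bits = 0; bits < (1u << len); ++bits) {
            std::string s(len, 'a');
            for (int i = 0; i < len; ++i)
                if (bits & (1u << i)) s[i] = 'b';
            if (dfaAccepts(s) != std::regex_match(s, pattern))
                ++mismatches;
        }
    }
    return mismatches;
}
```

If the table is right, `countMismatches(8)` should return 0. A nonzero result would point to a wrong entry in the table.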

2. Run results