- Concepts
- Regular expressions
- NFA
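flex parses the input file with a bison-generated parser and converts the regular expressions in the rules section into an NFA graph.
A typical NFA state graph appears in the example below.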
- DFA
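NFA-to-DFA construction:
- For the current DFA state and one input symbol, collect the set c1 of NFA states reachable in a single transition step.
- Compute the epsilon-closure of c1, which gives a new NFA state set c2.
- Find or create the DFA state that corresponds to c2; its node set is c3.
Repeat the three steps above until every transition has been processed, that is, until no new NFA state set appears. The construction is dynamic: NFA state sets are computed on the fly, and each newly discovered set adds one DFA state.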
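The three steps above can be sketched in C as follows. This is a minimal illustration, not flex's own code: the toy NFA (for the pattern `a*b`), the bitmask representation, and the names `epsilon_closure`, `move_on`, `find_or_add`, and `build_dfa` are all assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

#define NFA_STATES 4    /* hypothetical toy NFA for a*b, not flex output */
#define SYMBOLS    2    /* symbol 0 = 'a', symbol 1 = 'b' */
#define MAX_DFA    16

/* eps[s]: bitmask of states reachable from s on an epsilon edge. */
const uint32_t eps[NFA_STATES] = { 0x6, 0, 0, 0 };       /* 0 -> {1,2} */
/* step[s][sym]: bitmask of states reachable from s on sym. */
const uint32_t step[NFA_STATES][SYMBOLS] = {
    { 0, 0 }, { 0x1, 0 }, { 0, 0x8 }, { 0, 0 }           /* 1 -a-> 0, 2 -b-> 3 */
};

uint32_t dfa_set[MAX_DFA];          /* NFA state set behind each DFA state */
int dfa_next[MAX_DFA][SYMBOLS];     /* DFA transitions, -1 = no transition */

/* Step 2: grow the set until no epsilon edge adds a new state. */
uint32_t epsilon_closure(uint32_t set)
{
    uint32_t prev;
    do {
        prev = set;
        for (int s = 0; s < NFA_STATES; ++s)
            if (set & (1u << s))
                set |= eps[s];
    } while (set != prev);
    return set;
}

/* Step 1: NFA states reachable from the set in one step on sym. */
uint32_t move_on(uint32_t set, int sym)
{
    uint32_t out = 0;
    for (int s = 0; s < NFA_STATES; ++s)
        if (set & (1u << s))
            out |= step[s][sym];
    return out;
}

/* Step 3: reuse the DFA state for this set, or create a new one. */
int find_or_add(uint32_t set, int *count)
{
    for (int d = 0; d < *count; ++d)
        if (dfa_set[d] == set)
            return d;
    assert(*count < MAX_DFA);
    dfa_set[*count] = set;
    return (*count)++;
}

/* Repeat steps 1-3 until no new set appears; returns the DFA state count. */
int build_dfa(int nfa_start)
{
    int count = 0;
    find_or_add(epsilon_closure(1u << nfa_start), &count);
    for (int d = 0; d < count; ++d)
        for (int sym = 0; sym < SYMBOLS; ++sym) {
            uint32_t c1 = move_on(dfa_set[d], sym);
            uint32_t c2 = epsilon_closure(c1);
            dfa_next[d][sym] = c2 ? find_or_add(c2, &count) : -1;
        }
    return count;
}
```

The outer loop in `build_dfa` doubles as the work list: states appended by `find_or_add` are picked up later by the same loop, which is what makes the construction dynamic.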
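- Equivalence classes
Equivalence classes (ECs) group input characters that the rules never need to tell apart; the example below has 8 ECs. Meta-equivalence classes (meta-ECs) are used by templates and are a coarser, more abstract grouping; the example below has 3 meta-ECs.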
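One way to see where the 8 ECs of the example come from is to give every character a signature (the set of character sets in the rules that contain it) and to number the distinct signatures in order of first appearance. The sketch below does that for the character sets used by the rules in the Test.l example. It is a simplified illustration and not necessarily how flex computes ECs; `in_set`, `compute_ecs`, and `ec_of` are hypothetical names, and NUL is skipped on the assumption that flex handles it separately (the dump lists it as -1).

```c
#define NCHARS 256
#define NSETS  8

/* Hypothetical helper: does character set k (taken from the example
 * rules) contain character c? */
int in_set(int k, int c)
{
    switch (k) {
    case 0:  return c == 'i';
    case 1:  return c == 'f';
    case 2:  return c >= 'a' && c <= 'z';
    case 3:  return (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9');
    case 4:  return c >= '0' && c <= '9';
    case 5:  return c == '/';
    case 6:  return c == '*';
    default: return c != '\n';              /* the "." set */
    }
}

int ec_of[NCHARS];      /* character -> EC number (1-based); NUL left at 0 */

/* Assign EC numbers to characters 1..255; returns the number of ECs. */
int compute_ecs(void)
{
    unsigned sig_of_class[NCHARS];
    int nclasses = 0;

    for (int c = 1; c < NCHARS; ++c) {
        unsigned sig = 0;
        for (int k = 0; k < NSETS; ++k)
            if (in_set(k, c))
                sig |= 1u << k;

        /* Characters with the same signature share an EC. */
        int e;
        for (e = 0; e < nclasses; ++e)
            if (sig_of_class[e] == sig)
                break;
        if (e == nclasses)
            sig_of_class[nclasses++] = sig;
        ec_of[c] = e + 1;
    }
    return nclasses;
}
```

Two characters end up in the same class exactly when no rule can distinguish them, so the DFA needs one transition column per class instead of one per character.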
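- Transition tables and the transition algorithm
- Transition matrix
From the DFA computed above and the full set of equivalence classes, build the matrix of transitions from one DFA state to another; each edge (transition condition) between two states is labeled with an EC.
The DFA state set and the transition matrix appear in the example below.
- Templates and protos
Templates and protos reduce the space taken by the transition tables and speed up lookup and transitions.
Both kinds of entries are kept in doubly linked lists.
Template and proto entries appear in the example below.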
- Four one-dimensional arrays
- def
- base
- chk
- nxt
- Fast transition algorithm
/* mk1tbl - create table entries for a state (or state fragment) which
 *          has only one out-transition
 */
void mk1tbl( state, sym, onenxt, onedef )
int state, sym, onenxt, onedef;
{
    /* The slot index must be at least sym so that base[state] >= 0. */
    if ( firstfree < sym )
        firstfree = sym;

    /* Skip occupied slots, growing nxt/chk when the end is reached. */
    while ( chk[firstfree] != 0 )
        if ( ++firstfree >= current_max_xpairs )
            expand_nxt_chk();

    base[state] = firstfree - sym;
    def[state] = onedef;
    chk[firstfree] = state;
    nxt[firstfree] = onenxt;

    if ( firstfree > tblend )
    {
        tblend = firstfree++;

        if ( firstfree >= current_max_xpairs )
            expand_nxt_chk();
    }
}
Or, for a state with several out-transitions:
base[statenum] = tblbase;
def[statenum] = deflink;

for (i = minec; i <= maxec; ++i)
    if (state[i] != SAME_TRANS)
        if (state[i] != 0 || deflink != JAMSTATE) {
            nxt[tblbase + i] = state[i];
            chk[tblbase + i] = statenum;
        }

if (baseaddr == firstfree)
    /* Find next free slot in tables. */
    for (++firstfree; chk[firstfree] != 0; ++firstfree) ;

tblend = MAX (tblend, tbllast);
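The code above shows that firstfree = base[state] + sym, where sym is an EC or a meta-EC. At lookup time, if chk[base[state] + sym] == state holds, the table contains an entry for this transition, and the next state is nxt[base[state] + sym] (the value stored as onenxt above). If the check fails, the lookup continues from the default state def[state].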
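The lookup side of these four arrays can be sketched as below. The table contents are a hypothetical toy example (two states, symbols 1 to 4), not tables produced by flex, and `next_state` and `JAM` are names chosen for the illustration. The sketch also leaves out the remapping of the symbol to its meta-EC that a generated scanner does when the fallback lands on a template state.

```c
enum { JAM = 0 };   /* assumption: state 0 means "no transition" */

/* Toy compressed tables. State 1 owns slots 1..3; state 2 stores only
 * the one transition that differs from state 1 and defaults to state 1. */
const int base[] = { 0, 0, 2 };
const int def[]  = { 0, JAM, 1 };
const int chk[]  = { 0, 1, 1, 1, 2, 0, 0 };
const int nxt[]  = { 0, 2, 1, 2, 2, 0, 0 };

/* Follow def[] links until a slot owned by the current state is found,
 * then read the target state from nxt[]. */
int next_state(int state, int sym)
{
    while (state != JAM && chk[base[state] + sym] != state)
        state = def[state];
    return state == JAM ? JAM : nxt[base[state] + sym];
}
```

For instance, `next_state(2, 1)` finds that slot 3 belongs to state 1 rather than state 2, falls back to `def[2]`, which is state 1, and reads the answer from state 1's own slot. State 2 therefore costs one table entry instead of three.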
- An example:
- Contents of Test.l:
%%
if                  printf("KEY: %s\n", yytext);
[a-z][a-z0-9]*      printf("ID: %s\n", yytext);
[0-9]+              printf("NUM: %s\n", yytext);
"/"                 printf("OPER: %s\n", yytext);
"*"                 printf("OPER: %s\n", yytext);
"/*"(.)*"*/"        printf("ANNOTATION: %s\n", yytext);
%%
int main(void)
{
    yylex();
    return 0;
}
int yywrap(void)
{
    return 1;
}
- The tables that flex prints:
G:\meterial\compiler\726lexYacc\flex-2.5.4a-1-src\src\flex\2.5.4a\flex-2.5.4a>flex.exe -T test.l
a. Regular expressions
%%
1 if
2 [a-z][a-z0-9]*
3 [0-9]+
4 "\/"
5 "\*"
6 "\/\*"(.)*"\*\/"
7 End Marker
Including the default "." rule, there are 7 rules in total.
********** beginning dump of nfa with start state 38
state # 1 257: 0, 0
state # 2 257: 0, 0
state # 3 105: 4, 0
state # 4 102: 5, 0
state # 5 257: 0, 0 [1]
state # 6 257: 1, 3
state # 7 -1: 11, 0
state # 8 -2: 9, 0
state # 9 257: 8, 10
state # 10 257: 0, 0 [2]
state # 11 257: 8, 10
state # 12 257: 6, 7
state # 13 -3: 14, 0
state # 14 257: 13, 0 [3]
state # 15 257: 12, 13
state # 16 257: 17, 0
state # 17 47: 18, 0
state # 18 257: 0, 0 [4]
state # 19 257: 15, 16
state # 20 257: 21, 0
state # 21 42: 22, 0
state # 22 257: 0, 0 [5]
state # 23 257: 19, 20
state # 24 257: 25, 0
state # 25 47: 26, 0
state # 26 42: 30, 0
state # 27 -4: 28, 0
state # 28 257: 27, 29
state # 29 257: 31, 0
state # 30 257: 27, 29
state # 31 257: 32, 0
state # 32 42: 33, 0
state # 33 47: 34, 0
state # 34 257: 0, 0 [6]
state # 35 257: 23, 24
state # 36 -5: 37, 0
state # 37 257: 0, 0 [7]
state # 38 257: 35, 36
********** end of dump
The resulting NFA graph:
b. DFA Dump:
state # 1:
1 4
2 4
3 5
4 6
5 7
6 8
7 8
8 9
state # 2:
1 4
2 4
3 5
4 6
5 7
6 8
7 8
8 9
state # 3:
state # 4:
state # 5:
state # 6:
3 10
state # 7:
5 11
state # 8:
5 12
6 12
7 12
8 12
state # 9:
5 12
6 12
7 13
8 12
state # 10:
1 14
3 15
4 14
5 14
6 14
7 14
8 14
state # 11:
5 11
state # 12:
5 12
6 12
7 12
8 12
state # 13:
5 12
6 12
7 12
8 12
state # 14:
1 14
3 15
4 14
5 14
6 14
7 14
8 14
state # 15:
1 14
3 15
4 16
5 14
6 14
7 14
8 14
state # 16:
1 14
3 15
4 14
5 14
6 14
7 14
8 14
The resulting DFA matrix:
The figure above shows 16 DFA states in total; each one corresponds to the epsilon-closure of a set of NFA states.
out-transitions: [ \000-\t \v-\377 ]
jam-transitions: EOF [ \n ]
c. Accepting states
state # 3 accepts: [8]
state # 4 accepts: [7]
state # 5 accepts: [5]
state # 6 accepts: [4]
state # 7 accepts: [3]
state # 8 accepts: [2]
state # 9 accepts: [2]
state # 11 accepts: [3]
state # 12 accepts: [2]
state # 13 accepts: [1]
state # 16 accepts: [6]
d. Equivalence Classes:
\000 = -1 ' ' = 1 @ = 1 ` = 1 \200 = 1 \240 = 1 \300 = 1 \340 = 1
\001 = 1 ! = 1 A = 1 a = 6 \201 = 1 \241 = 1 \301 = 1 \341 = 1
\002 = 1 " = 1 B = 1 b = 6 \202 = 1 \242 = 1 \302 = 1 \342 = 1
\003 = 1 # = 1 C = 1 c = 6 \203 = 1 \243 = 1 \303 = 1 \343 = 1
\004 = 1 $ = 1 D = 1 d = 6 \204 = 1 \244 = 1 \304 = 1 \344 = 1
\005 = 1 % = 1 E = 1 e = 6 \205 = 1 \245 = 1 \305 = 1 \345 = 1
\006 = 1 & = 1 F = 1 f = 7 \206 = 1 \246 = 1 \306 = 1 \346 = 1
\a = 1 ' = 1 G = 1 g = 6 \207 = 1 \247 = 1 \307 = 1 \347 = 1
\b = 1 ( = 1 H = 1 h = 6 \210 = 1 \250 = 1 \310 = 1 \350 = 1
\t = 1 ) = 1 I = 1 i = 8 \211 = 1 \251 = 1 \311 = 1 \351 = 1
\n = 2 * = 3 J = 1 j = 6 \212 = 1 \252 = 1 \312 = 1 \352 = 1
\v = 1 + = 1 K = 1 k = 6 \213 = 1 \253 = 1 \313 = 1 \353 = 1
\f = 1 , = 1 L = 1 l = 6 \214 = 1 \254 = 1 \314 = 1 \354 = 1
\r = 1 - = 1 M = 1 m = 6 \215 = 1 \255 = 1 \315 = 1 \355 = 1
\016 = 1 . = 1 N = 1 n = 6 \216 = 1 \256 = 1 \316 = 1 \356 = 1
\017 = 1 / = 4 O = 1 o = 6 \217 = 1 \257 = 1 \317 = 1 \357 = 1
\020 = 1 0 = 5 P = 1 p = 6 \220 = 1 \260 = 1 \320 = 1 \360 = 1
\021 = 1 1 = 5 Q = 1 q = 6 \221 = 1 \261 = 1 \321 = 1 \361 = 1
\022 = 1 2 = 5 R = 1 r = 6 \222 = 1 \262 = 1 \322 = 1 \362 = 1
\023 = 1 3 = 5 S = 1 s = 6 \223 = 1 \263 = 1 \323 = 1 \363 = 1
\024 = 1 4 = 5 T = 1 t = 6 \224 = 1 \264 = 1 \324 = 1 \364 = 1
\025 = 1 5 = 5 U = 1 u = 6 \225 = 1 \265 = 1 \325 = 1 \365 = 1
\026 = 1 6 = 5 V = 1 v = 6 \226 = 1 \266 = 1 \326 = 1 \366 = 1
\027 = 1 7 = 5 W = 1 w = 6 \227 = 1 \267 = 1 \327 = 1 \367 = 1
\030 = 1 8 = 5 X = 1 x = 6 \230 = 1 \270 = 1 \330 = 1 \370 = 1
\031 = 1 9 = 5 Y = 1 y = 6 \231 = 1 \271 = 1 \331 = 1 \371 = 1
\032 = 1 : = 1 Z = 1 z = 6 \232 = 1 \272 = 1 \332 = 1 \372 = 1
\033 = 1 ; = 1 [ = 1 { = 1 \233 = 1 \273 = 1 \333 = 1 \373 = 1
\034 = 1 < = 1 \ = 1 | = 1 \234 = 1 \274 = 1 \334 = 1 \374 = 1
\035 = 1 = = 1 ] = 1 } = 1 \235 = 1 \275 = 1 \335 = 1 \375 = 1
\036 = 1 > = 1 ^ = 1 ~ = 1 \236 = 1 \276 = 1 \336 = 1 \376 = 1
\037 = 1 ? = 1 _ = 1 \177 = 1 \237 = 1 \277 = 1 \337 = 1 \377 = 1
e. Meta-Equivalence Classes:
1 = 1
2 = 2
3 = 1
4 = 1
5 = 3
6 = 3
7 = 3
8 = 3
The resulting templates are shown in the figure below:
This example therefore yields 2 templates and 5 protos.