flex写Java词法分析_写编译器:学习GNU Flex,写一个词法分析器

以下内容仅为个人学习笔记,非正规教程,难免有疏漏之处,请指出!

目标要分析词法的对象是一种叫TINY+的计算机语言。下面是一个Example,

char str;

int x, fact;

str:= 'sample program in TINY+ language- computes factorial';

read x;

if x>0 and x<100 then {don’t compute if x<=0}

fact:=1;

while x>0 do

fact:=fact*x;

x:=x-1

end;

write fact

end

什么是Flex

“Flex is a tool for generating scanners: programs which recognized lexical patterns in text. flex reads the given input files, or its standard input if no file names are given, for a description of a scanner to generate. The description is in the form of pairs of regular expressions and C code, called rules. flex generates as output a C source file, `lex.yy.c', which defines a routine `yylex()'. This file is compiled and linked with the `-lfl' library to produce an executable. When the executable is run, it analyzes its input for occurrences of the regular expressions. Whenever it finds one, it executes the corresponding C code.” -- Vern Paxson

安装Flex

apt-get install flex

入门

flex通过读取一个*.l或者*.lex文件里的单词匹配规则生成C语言的实现代码。

一个*.l的文件里的结构大概如下,用%%分隔开来。

/* Definition */

%%

/* Rules */

%%

/* C Code */

用flex写一个查找读取文本所有整数的程序。

正则表达式里,[0-9]+表示正整数,[+-]?[0-9]+表示整数。定义下面的一个规则:

%%

[+-]?[0-9]+ { printf("%s\n", yytext); } /* Print integers */

\n {} /* newline */

. {} /* For others, do nothing */

%%

void main(){

yylex();

}

int yywrap(){

return 1;

}

保存为find.l,使用flex和gcc编译。

# flex find.l

# gcc lex.yy.c

编写一文本作输入样例如下,

test 12345 -1

test kkk lll abc123 555

+111 < 456

运行结果,

root@xiaoxia-pc:~/study/Experiment/lex# ./a.out < test.in

12345

-1

123

555

+111

456

进阶

统计一段C语言代码里符合要求的标识符名称。

一个符号要求的标识符名称,满足以下几点:

1、标识符只能包含数字,字母,下划线

2、不能以数字打头

3、不能是C语言关键字

因为关键字太多,不想写,这里只考虑(unsigned, int, return)

%{

/* 在这里面加入C代码 */

int count = 0;

%}

/* definition */

digit [0-9]

number {digit}+

letter [A-Za-z]

identifier (_|{letter})({letter}|{digit}|_)*

%%

[ \t]+ {}

\n {} /* 忽略换行 */

{number} {} /* number */

unsigned|int|return {} /* 关键字 */

{number}({letter}|{digit}|_)* {} /* 不能以数字开头 */

{identifier} { ++count; printf("Good identifier: %s\n", yytext); }

. {}

%%

int yywrap(){return 1;}

void main(){

yylex();

printf("Total number proper identifier is %d\n", count);

}

测试输入find.l

unsigned int rand_code(unsigned int* rand_seed, unsigned int generator)

{

return *rand_seed = ((*_rand_seed * 234generator + 1)>>16)&65535;

}

运行结果:

root@xiaoxia-pc:~/study/Experiment/lex# ./a.out < test.in

Good identifier: rand_code

Good identifier: rand_seed

Good identifier: generator

Good identifier: rand_seed

Good identifier: _rand_seed

Total number proper identifier is 5

root@xiaoxia-pc:~/study/Experiment/lex#

最后

现在要对下面的代码进行词法分析,

char str;

int x, fact;

str:= 'sample program in TINY+ language- computes factorial';

read x;

if x>0 and x<100 then {don’t compute if x<=0}

fact:=1;

while x>0 do

fact:=fact*x;

x:=x-1

end;

write fact

end

这个是我的编译原理作业。交作业那晚才知道要做,自己从头写一个词法分析器就没那么多时间了,于是找flex帮忙。

任务要求:

1 The input of the scanner is a source code file and the output of the scanner is a stream of tokens.

2 Your scanner should go for longest possible match i.e. a string ‘:=’ is to be identified as ‘ass-symbol’ and not as ‘:’ and ‘=’.

3 Token is represented as (Kind, Value). We use the following symbols to denote different kinds of tokens

KEY denotes reserved words

SYM denotes special symbols

ID denotes identifiers

NUM denotes numeric constants

STR denotes string constants

4 Check lexical errors: giving meaning error messages and the lines where errors occur. The kinds of lexical errors are:

Illegal character, that is, scanner may recognize a character that is not in the alphabet of TINY+, such as $ is an illegal character

The right bracket of a STRING is lost, such as ' scanner

The right delimiter of a comment is lost, such as: {this is an example

flex文件:

/*

* File: tiny+.l

* Author: Xiaoxia(xiaoxia@xiaoxia.org)

* Usage: lex tiny+.l

* gcc lex.yy.c

* ./a.out [testfile]

*/

%option yylineno

%%

if|then|else|end|repeat|until|read|write|or|and|int|bool|char|while|do {

printf("Line %d: (KEY, %s)\n", yylineno, yytext); } /* keyword */

":="|"="|"="|">"|"," {

printf("Line %d: (SYM, %s)\n", yylineno, yytext); } /* symbol */

'[^'\n]*' { printf("Line %d: (STR, %s)\n", yylineno, yytext); } /* string */

"{"[^}]*"}" {} /* comment */

[0-9]+ { printf("Line %d: (NUM, %s)\n", yylineno, yytext); } /* number */

[A-Za-z]([A-Za-z]|[0-9])* { printf("Line %d: (ID, %s)\n", yylineno, yytext); } /* identifier */

[ \t]+ {} /* whiltespace */

\n {} /* newline */

. { if(*yytext == '{' || *yytext == '\'')

printf("Line %d: missing the right part of %s\n", yylineno, yytext);

else

printf("Line %d: illegal character %s\n", yylineno, yytext);

yyterminate();

} /* others */

%%

int yywrap() {

return 1;

}

void main(int argc, char** argv) {

if(argc > 1)

yyin = fopen(argv[1], "r");

else

yyin = stdin;

yylex();

}

测试输入如下的Tiny+代码。

{this is an example}

int A,B;

bool C1, C2, C3;

char D;

D:= 'scanner;

while A<=B do

A:=A*2

end

运行结果,

0ed3716af29662ffcc1fe9d013fe830e.png

  • 0
    点赞
  • 1
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值