Building a Scanner (Lexical Analyzer) with Flex

Experiment Purposes

Implement a scanner that recognizes the following:

  • Preprocessor directives, including header-file includes and macro definitions:

    • #include <xx.h>

    • #include "xxx.h"

    • #define ABC 123

  • Comments and whitespace, which are to be disregarded:

    • single-line and multi-line comments

    • spaces, tabs, and line feeds

  • Function names

  • Variable names

  • Constants: integers, floating-point numbers, etc.

  • Keywords: if, else, int, float, return, etc.

  • Operators: + - = * / etc.

  • Punctuation: ; {} () , etc.

  • Special characters (anything not covered above)

  • The sentence (string literal) inside printf("")

Experiment Setup

Flex

Install Flex with the command below:

sudo apt install flex

Create .l File

In lab3.l, I write regular expressions (REs) that recognize the different tokens and assign each one a category. The details are discussed in the next section.

Makefile

First, install make:

sudo apt install make

Then, create a file named Makefile with the following content:

.PHONY: test clean

test:
	flex lab3.l
	gcc lex.yy.c -o lab3
	./lab3

clean:
	rm -f lex.yy.c lab3

Experiment Results

Here is test.txt, the C source file used to test the scanner:

#include <stdio.h>
#define abc 123

// this is a single line comment.

/*
  this is a multiple line comment.
  */

int main() {
    int x = 3.5;
    if (x > 0) {
        printf("x is positive\n");
    } else {
        printf("x is non-positive\n");
    }
    while (x > 0) {
        printf("x is positive\n");
        x--;
    }
}

Preprocessor Directives

PREPROCESS  #.*$

This RE matches a # and everything after it up to the end of the line. Each #include or #define line is therefore reported as one preprocessor token.

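Comments and Whitespace

COMMENTS1   (\/\*([^*]|\*+[^*/])*\*+\/)
COMMENTS2   \/\/.*\n
SPACE       (" ")
TAB         ("\t")
LINE_FEED   ("\n")
COMMENTS_BLANK      {COMMENTS1}|{COMMENTS2}|{SPACE}|{TAB}|{LINE_FEED}

Comments come in two forms, single-line and multi-line, which have different prefixes.

COMMENTS1 matches a multi-line comment. It begins with /* and ends with */. The body may contain any characters, including a * that is not followed by /. A simpler body of [^*]+ fails as soon as the comment contains a single *, as in /* a * b */.

COMMENTS2 matches a single-line comment. It begins with // and runs to the end of the line, including the line feed.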

Function and Variable Names

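FUNCTION    ([a-zA-Z_][a-zA-Z0-9_]*\([^"()\n]*\))
VARNAME     ([a-zA-Z_][a-zA-Z0-9_]*)

Functions and variables are found in a similar way. A name starts with a letter or underscore, followed by any number of letters, digits, or underscores. With [a-zA-Z0-9_]* the pattern would also accept an empty string and names that start with a digit, neither of which is a valid C identifier.

For a function, the name must be followed directly by a parenthesized part. To keep a call like printf("...") from being reported as a function name, the content inside the parentheses must not contain quotation marks. Excluding ( ), ) and line feeds as well stops the match from running across several lines. Otherwise [^"]* could swallow everything from main( up to the ) of the following if statement.

Because Flex prefers the longest match, a keyword written directly against its parenthesis, such as while(x > 0), would also be reported as a function name.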

Constant

CONSTANT    [0-9]+(\.[0-9]+)?

This RE matches both integers and floating-point numbers written with a decimal point. [0-9]+ is the integer part. The question mark makes the parenthesized fraction (\.[0-9]+) optional. Forms such as .5 or 1e3 are not covered.

Keyword

KEYWORD       ("if"|"else"|"while"|"for"|"printf"|"int"|"double"|"float"|"return")

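This simply lists every keyword as an alternative. It is verbose, but it works. A keyword such as int also matches VARNAME with the same length, and when two rules match the same length Flex picks the one listed first, so {KEYWORD} must come before {VARNAME} in the rules section. A longer name such as integer still becomes a variable name, because Flex always prefers the longest match.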

Operator

OPERATORS	("++"|"--"|"+"|"-"|"*"|"/"|"=")

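This uses the same approach as the keywords. Because Flex takes the longest match, ++ and -- are recognized as single operators rather than as two + or - tokens.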

Punctuation

PUNCTUATION	("{"|"}"|"("|")"|";"|",")

This uses the same approach again: each punctuation character is listed as an alternative.

printf()

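PRINTF		(\"([^"\\\n]|\\.)*\")

This RE matches a double-quoted string, so the sentence inside printf("") is reported as a single token. Inside the quotes, any character except ", \ or a line feed is allowed, and a backslash may escape one character, so sequences such as \n and \" stay inside the string. For example, from

printf("x is non-positive\n");

it extracts "x is non-positive\n".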

Appendix

Full code of lab3.l

%{
#include <stdio.h>
%}

KEYWORD       ("if"|"else"|"while"|"for"|"printf"|"int"|"double"|"float"|"return")
CONSTANT	[0-9]+(\.[0-9]+)?
PUNCTUATION	("{"|"}"|"("|")"|";"|",")
OPERATORS	("++"|"--"|"+"|"-"|"*"|"/"|"=")
PREPROCESS	#.*$
FUNCTION	([a-zA-Z_][a-zA-Z0-9_]*\([^"()\n]*\))
VARNAME		([a-zA-Z_][a-zA-Z0-9_]*)
PRINTF		(\"([^"\\\n]|\\.)*\")
COMMENTS1	(\/\*([^*]|\*+[^*/])*\*+\/)
COMMENTS2	\/\/.*\n
SPACE		(" ")
TAB		("\t")
LINE_FEED		("\n")
COMMENTS_BLANK		{COMMENTS1}|{COMMENTS2}|{SPACE}|{TAB}|{LINE_FEED}
%%
{COMMENTS_BLANK} {
	if (*yytext == '\n') {
		printf("<Line feed, Comments or blank>\n");
	} else {
		printf("<%s, Comments or blank>\n", yytext);
	}
}
{PREPROCESS}   { printf("<%s, Preprocessor>\n", yytext); }
{KEYWORD}   { printf("<%s, Keyword>\n", yytext); }
{CONSTANT}	{ printf("<%s, Constant>\n", yytext); }
{OPERATORS}	{ printf("<%s, Operator>\n", yytext); }
{FUNCTION} { printf("<%s, Function name>\n", yytext); }
{VARNAME} { printf("<%s, Variable name>\n", yytext); }
{PUNCTUATION}	{ printf("<%s, Punctuation>\n", yytext); }
{PRINTF}	{ printf("<%s, Printf content>\n", yytext); }

.           { printf("<%s, Special>\n", yytext); }
%%

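int yywrap(void) { return 1; }  /* 1 = no more input after yyin */

int main(int argc, char** argv) {
	FILE *fp;
	char filename[50];
	printf("Enter the filename: \n");
	if (scanf("%49s", filename) != 1)   /* bounded read: at most 49 chars */
		return 1;
	fp = fopen(filename, "r");
	if (fp == NULL) {                   /* report a missing or unreadable file */
		perror(filename);
		return 1;
	}
	yyin = fp;
	yylex();
	fclose(fp);
	return 0;
}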