Tiny C语言编译程序之词法分析Scanner
约定:
- 仅允许整数类型,不允许实数类型
- 标识符由大小写英文字母组成,最多52个。其识别按最长匹配原则
- 整数后紧跟非数字,或标识符后紧跟非字母认为是一个新Token开始
- 由{ }括起来符号串都认为是注释部分,该部分在词法分析时被过滤掉
- 识别出的Token由两个变量:currentToken,tokenString识别,其中currentToken代表Token的类属,为一个名为TokenType的枚举类型,在文件globals.h中定义;tokenString代表Token在程序中出现的形式,即其本来面目。例如整数10的currentToken值为NUM,而tokenString值为‘10’;标识符i的currentToken值为ID,而tokenString值为‘i’
画识别符合TINY C语言构词规则的DFA。然后用直接编码的方法构造词法分析器
词法分析器scan.c
/****************************************************/
/* File: scan.c */
/* The scanner implementation for the TINY compiler */
/****************************************************/
#include "globals.h"
#include "util.h"
#include "scan.h"
/* states in scanner DFA */
typedef enum
{ START,INASSIGN,INCOMMENT,INNUM,INID,DONE }
StateType;
/* lexeme of identifier or reserved word */
char tokenString[MAXTOKENLEN+1];
/* BUFLEN = length of the input buffer for
source code lines */
#define BUFLEN 256
static char lineBuf[BUFLEN]; /* holds the current line */
static int linepos = 0; /* current position in LineBuf */
static int bufsize = 0; /* current size of buffer string */
static int EOF_flag = FALSE; /* corrects ungetNextChar behavior on EOF */
/* getNextChar fetches the next non-blank character
from lineBuf, reading in a new line if lineBuf is
exhausted */
//获得下一字符
static int getNextChar(void)
{ if (!(linepos < bufsize))
{ lineno++;
if (fgets(lineBuf,BUFLEN-1,source))
{ if (EchoSource) fprintf(listing,"%4d: %s",lineno,lineBuf);
bufsize = strlen(lineBuf);
linepos = 0;
return lineBuf[linepos++];
}
else
{ EOF_flag = TRUE;
return EOF;
}
}
else return lineBuf[linepos++];
}
/*