Tiny Language Compiler Development: Lexical Analysis (Using…

Latest recommended article published 2024-01-03 12:32:12

hexiaomin_1984

Category: Compilers

Article link: https://blog.csdn.net/hexiaomin_1984/article/details/19823867

The input file is tiny.l, with the following contents:

%{
#include "globals.h"
#include "util.h"
#include "scan.h"
char tokenString[MAXTOKENLEN+1];
extern int lineno;
%}
digit [0-9]
number {digit}+
letter [a-zA-Z]
identifier {letter}+
newline \n
whitespace [ \t]+
%%
"if" {return IF;}
"then" {return THEN;}
"else" {return ELSE;}
"end" {return END;}
"repeat" {return REPEAT;}
"until" {return UNTIL;}
"read" {return READ;}
"write" {return WRITE;}
":=" {return ASSIGN;}
"=" {return EQ;}
"<" {return LT;}
"+" {return PLUS;}
"-" {return MINUS;}
"*" {return TIMES;}
"/" {return OVER;}
"(" {return LPAREN;}
")" {return RPAREN;}
";" {return SEMI;}
{number} {return NUM;}
{identifier} {return ID;}
{newline} {lineno++;}
{whitespace} {}
"{" {
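      /* skip a Tiny comment, counting the newlines it contains */
      int c;   /* int, not char: input() may return EOF */
      do
      {
        c = input();
        if (c == EOF) break;      /* unterminated comment: stop at end of input */
        if (c == '\n') lineno++;
      } while (c != '}');
     }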

.     {return ERROR;}

%%

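TokenType getToken(void)
{
   static int firstTime = TRUE;
   TokenType currentToken; // current token
   if(firstTime)
   {
      // one-time setup on the first call
      firstTime = FALSE;
      lineno++;           // line numbers start at 1
      yyin = source;      // scan the Tiny source file
      yyout = listing;    // lex's default output goes to the listing
   }
   currentToken = yylex();                      // scan one token
   strncpy(tokenString, yytext, MAXTOKENLEN);   // save its lexeme
   fprintf(listing, "\t%d: ", lineno);
   printToken(currentToken, tokenString);
   return currentToken;
}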

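Note that although comments could also be matched with the regular expression "{"[^\}]*"}", they still need special handling so that lineno stays correct, because a comment can span several lines. Running lex on tiny.l produces tiny.c and tiny.h, which are compiled together with the other source files; in other words, tiny.c simply takes the place of the original scan.c and nothing else has to change. When debugging with VC, also link against yld.lib and define the macro YYDEBUG. The scanner produces the following output: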

        5: reserved word: read
        5: ID, name=x
        5: ;
        6: reserved word: if
        6: NUM, val= 0
        6: <
        6: ID, name=x
        6: reserved word: then
        7: ID, name=fact
        7: :=
        7: NUM, val= 1
        7: ;
        8: reserved word: repeat
        9: ID, name=fact
        9: :=
        9: ID, name=fact
        9: *
        9: ID, name=x
        9: ;
        10: ID, name=x
        10: :=
        10: ID, name=x
        10: -
        10: NUM, val= 1
        11: reserved word: until
        11: ID, name=x
        11: =
        11: NUM, val= 0
        11: ;
        12: reserved word: write
        12: ID, name=fact
        13: reserved word: end
        14: EOF

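Let's dig a little deeper. Each time TokenType getToken(void) is called, it calls yylex once and gets back exactly one token. In general, however, yylex does not stop after one token: it keeps scanning, possibly through the entire source, until some rule's action returns. Examining the generated tiny.c, I found the following code:

int YYCDECL yylexeraction(action)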
int action;
#endif
{
yyreturnflg = YYTRUE;
switch (action) {
case 1:
  {
#line 15 ".\\tiny.l"
return IF;
#line 153 "tiny.c"
  }
  break;
case 2:

...

}//end of switch

yyreturnflg = YYFALSE;
return 0;

}

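In fact, yylex() uses yyreturnflg to decide whether to keep scanning. In this example, every rule that recognizes a token ends with a return statement, so yyreturnflg = YYFALSE; is never reached and yylex returns the current token straight away. A small experiment confirms this: change yyreturnflg = YYTRUE; at the top of the code above to yyreturnflg = YYFALSE;, and the output becomes: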

14: EOF

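This result confirms the analysis. Note also that the newline rule does not need to handle each platform's line-ending convention (on Windows, for example, a line ends with the bytes 0D 0A, i.e. \r\n): the scanner reads the source through the C library, and the C runtime already translates line endings for text-mode streams.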