lex & yacc series (3): an introduction to yacc with an example

 


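yacc is a useful tool for building compilers. It is based on LR(1) parsing (strictly speaking, LALR(1)).

The LR(k) parsing method was proposed by Knuth in 1965. The k in parentheses (k >= 0) is the number of input symbols the parser looks ahead to the right. Given the symbols currently on the parse stack and the next k input symbols, read in order, an LR parser can uniquely determine its next action: whether to shift or to reduce, and, for a reduce, which production to reduce by.

The method parses quickly and pinpoints the location of errors accurately. Its main drawback is that constructing the parser for a practical language grammar is a great deal of work: the larger k is, the more complex the construction becomes, which makes it hard to implement.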
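To make "shift" and "reduce" concrete, here is a minimal sketch in C for the toy sentence grammar used later in this post. It is an illustration only: the hand-coded pattern checks in try_reduce() stand in for the state tables that yacc generates, and the names symbol, try_reduce and accepts are made up for this example.

```c
#include <stddef.h>

/* Grammar symbols for the toy grammar:
 *   sentence: subject VERB object
 *   subject:  NOUN | PRONOUN
 *   object:   NOUN
 * END marks the end of the input. */
enum symbol { END = 0, NOUN, PRONOUN, VERB, SUBJECT, OBJECT, SENTENCE };

#define STACK_MAX 16

/* Try to reduce the top of the stack by one production.
 * Returns 1 if a reduction happened, 0 otherwise. */
static int try_reduce(enum symbol *stack, size_t *top)
{
    size_t n = *top;

    /* sentence: subject VERB object */
    if (n >= 3 && stack[n - 3] == SUBJECT && stack[n - 2] == VERB &&
        stack[n - 1] == OBJECT) {
        stack[n - 3] = SENTENCE;
        *top = n - 2;
        return 1;
    }
    /* subject: NOUN | PRONOUN, only at the bottom of the stack */
    if (n == 1 && (stack[0] == NOUN || stack[0] == PRONOUN)) {
        stack[0] = SUBJECT;
        return 1;
    }
    /* object: NOUN, only right after "subject VERB" */
    if (n == 3 && stack[0] == SUBJECT && stack[1] == VERB &&
        stack[2] == NOUN) {
        stack[2] = OBJECT;
        return 1;
    }
    return 0;
}

/* Returns 1 if the END-terminated input is a valid sentence, else 0. */
int accepts(const enum symbol *input)
{
    enum symbol stack[STACK_MAX];
    size_t top = 0;

    for (;;) {
        while (try_reduce(stack, &top))
            ;                        /* reduce for as long as possible */
        if (*input == END)
            break;                   /* nothing left to shift */
        if (top == STACK_MAX)
            return 0;                /* input too long for this sketch */
        stack[top++] = *input++;     /* shift the next input symbol */
    }
    return top == 1 && stack[0] == SENTENCE;
}
```

For the input NOUN VERB NOUN, the stack goes through NOUN, subject, subject VERB, subject VERB NOUN, subject VERB object, and finally sentence. A real LR parser makes the same kind of decisions, but from a generated table indexed by parser state and lookahead token rather than from hand-written checks.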

 

Yacc takes a concise description of a grammar (written in the .y file) and produces a C routine that can parse that grammar (that is, analyze input according to its syntax): a parser.

The yacc parser automatically detects whenever a sequence of input tokens matches one of the rules in the grammar and also detects a syntax error whenever its input doesn’t match any of the rules. A yacc parser is generally not as fast as a parser you could write by hand, but the ease in writing and modifying the parser is invariably worth any speed loss.

 

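When flex and yacc are used together, the parser function generated by yacc, yyparse(), repeatedly calls the yylex() function generated by flex to obtain tokens one at a time, and then combines those tokens according to the grammar rules.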

The lexer and the parser have to agree what the token codes are. We solve this problem by letting yacc define the token codes.

Yacc can generate a header file containing the token definitions, and lex can use this header directly: you include this file (called y.tab.h on UNIX systems and ytab.h or yytab.h on MS-DOS) in the lexer and use the preprocessor symbols in your lexer action code.

 

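Tokens are defined in the .y file like this:

%token <type>     INT_LITERAL  // a token with a type

%token FUNCTION  // a token without a type

Yacc ends up defining the tokens declared in the .y file in the generated .h file, for example as enumeration values.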
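The generated header is not reproduced in this post, so the following is only a sketch of what it might contain for the seven tokens of the example below. Bison conventionally numbers user-defined tokens from 258 upward, but the exact values and the surrounding boilerplate depend on the yacc/bison version, so treat the numbers as an assumption. token_name() is a hypothetical helper added for illustration; yacc does not generate it.

```c
#include <stddef.h>

/* Sketch of the token definitions a yacc-generated header might hold.
 * The starting value 258 follows bison's usual convention; it is an
 * assumption here, not copied from real generated output. */
enum yytokentype {
    NOUN = 258,
    PRONOUN,
    VERB,
    ADVERB,
    ADJECTIVE,
    PREPOSITION,
    CONJUNCTION
};

/* Hypothetical helper: map a token code back to a printable name.
 * Returns NULL for codes that are not one of the tokens above. */
const char *token_name(int token)
{
    switch (token) {
    case NOUN:        return "NOUN";
    case PRONOUN:     return "PRONOUN";
    case VERB:        return "VERB";
    case ADVERB:      return "ADVERB";
    case ADJECTIVE:   return "ADJECTIVE";
    case PREPOSITION: return "PREPOSITION";
    case CONJUNCTION: return "CONJUNCTION";
    default:          return NULL;
    }
}
```

Because both the lexer and the parser include the same header, a `return VERB;` in a lexer action and the VERB in a grammar rule refer to the same integer.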


An example of using flex and yacc together:

The flex .l file:

%{
/*
 * We now build a lexical analyzer to be used by a higher-level parser.
 */
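#include "sentence.tab.h"  /* token codes from the parser: the header generated by yacc, which holds the token definitions */
#define LOOKUP 0  /* default - not a defined word type. */
int state;

/* defined in the user subroutines section, but called from the rules */
int add_word(int type, char *word);
int lookup_word(char *word);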

int
yywrap(void)
{
    return 1;
}

%}
%%
\n  { state = LOOKUP; }
\.\n  { state = LOOKUP;
        return 0;  /* end of sentence: this rule matches a period followed by a newline */
      }
^verb { state = VERB; }
^adj { state = ADJECTIVE; }
^adv { state = ADVERB; }
^noun { state = NOUN; }
^prep { state = PREPOSITION; }
^pron { state = PRONOUN; }
^conj { state = CONJUNCTION; }
[a-zA-Z]+ {
    if(state != LOOKUP) {
        add_word(state, yytext);
    }
    else
    {
        switch(lookup_word(yytext)) {
        case VERB:
            return(VERB);
        case ADJECTIVE:
            return(ADJECTIVE);
        case ADVERB:
            return(ADVERB);
        case NOUN:
            return(NOUN);
        case PREPOSITION:
            return(PREPOSITION);
        case PRONOUN:
            return(PRONOUN);
        case CONJUNCTION:
            return(CONJUNCTION);
        default:
            printf("%s: don't recognize\n", yytext);
            /* don't return, just ignore it */
        }
    }
}
. ; /* ignore anything else */
%%
/* main()
{
        yylex();
}
   main() is no longer needed here: flex only has to generate a yylex() function for the yacc-generated parser to call. */
 /* define a linked list of words and types */
 struct word {
     char *word_name;
     int word_type;
     struct word *next;
 };
 struct word *word_list; /* first element in word list */
#include <stdlib.h>  /* malloc */
#include <string.h>  /* strlen, strcpy, strcmp */
 
 int
 add_word(int type, char *word)
 {
     struct word *wp;
     if(lookup_word(word) != LOOKUP) {
         printf("!!! warning: word %s already defined \n", word);
         return 0;
     }
     /* word not there, allocate a new entry and link it on the list */
     wp = (struct word *) malloc(sizeof(struct word));
     wp->next = word_list;
     /* have to copy the word itself as well */
     wp->word_name = (char *) malloc(strlen(word)+1);
     strcpy(wp->word_name, word);
     wp->word_type = type;
     word_list = wp;
     printf("%s: was added to table\n", word);
     return 1; /* it worked */
 }
 int
 lookup_word(char *word)
 {
     struct word *wp = word_list;
     /* search down the list looking for the word */
     for(; wp; wp = wp->next) {
         if(strcmp(wp->word_name, word) == 0)
             return wp->word_type;
     }
     return  LOOKUP; /* not found */
 }
 
 

 

The yacc .y file:

%{
/*
 * A parser for the basic grammar to use for recognizing English sentences.
 */
#include <stdio.h>

int yylex(void);              /* provided by the flex-generated scanner */
void yyerror(const char *s);  /* defined in the user subroutines section */
%}
%token NOUN PRONOUN VERB ADVERB ADJECTIVE PREPOSITION CONJUNCTION
%%
sentence: subject VERB object { printf("Sentence is valid.\n"); }
        ;
subject:  NOUN    { printf("NOUN -> subject\n"); }
        | PRONOUN { printf("PRONOUN -> subject\n"); }
        ;
object:   NOUN    { printf("NOUN -> object\n"); }
        ;
%%
/* This section can contain any C code and is copied, verbatim, into the resulting parser. */
extern FILE *yyin;  /* defined by the flex-generated scanner, which reads from stdin unless yyin is reassigned */

int main(void)
{
    do
    {
        yyparse();
    }
    while (!feof(yyin));
    return 0;
}

void yyerror(const char *s)
{
    fprintf(stderr, "%s\n", s);
}

 

Compiling and running:

bison -d xxx.y generates xxx.tab.h and xxx.tab.c.

flex xxxx.l generates lex.yy.c. Note that the header generated by bison (xxx.tab.h) must be #included in xxxx.l.

gcc -o ex1 xxx.tab.c lex.yy.c

Note: run ex1 directly, type in a sentence, and press Enter. (I ran into some odd behavior, possibly because the input was typed into the console window.)

Walking through the .y file:

The part between %{ and %} is C code.

Then come definitions of all the tokens we expect to receive from the lexical analyzer.

The first %% indicates the beginning of the rules section. The second %% indicates the end of the rules and the beginning of the user subroutines section. The most important subroutine is main() which repeatedly calls yyparse() until the lexer’s input file runs out. 

The routine yyparse() is the parser generated by yacc, so our main program repeatedly tries to parse sentences until the input runs out.  (The lexer returns a zero token whenever it sees a period at the end of a line; that’s the signal to the parser that the input for the current parse is complete.)
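The interaction between main(), yyparse() and yylex() can be sketched without any generated code. Everything below is a hypothetical stand-in: demo_yylex() hands out tokens from an array instead of scanning text, demo_yyparse() is a hand-written recognizer for the sentence rule rather than yacc's table-driven parser, and the token codes are assumed values. The point is only the protocol: the parser pulls tokens one at a time, a 0 token ends the current parse, and the driver keeps calling the parser until the input runs out.

```c
#include <stddef.h>

enum { NOUN = 258, PRONOUN, VERB };   /* assumed token codes */

/* Hypothetical stand-in for the lexer's input: a token array in which
 * 0 marks the end of each sentence. */
static const int *demo_input;
static size_t demo_len, demo_pos;

/* Stand-in for yylex(): returns the next token, or 0 at end of input. */
int demo_yylex(void)
{
    if (demo_pos >= demo_len)
        return 0;
    return demo_input[demo_pos++];
}

/* Stand-in for yyparse(): recognizes "subject VERB object" followed by
 * the 0 token. Returns 0 on success and 1 on a syntax error, which is
 * the same convention yyparse() uses. */
int demo_yyparse(void)
{
    int tok = demo_yylex();
    if (tok != NOUN && tok != PRONOUN)
        return 1;                     /* expected a subject */
    if (demo_yylex() != VERB)
        return 1;
    if (demo_yylex() != NOUN)
        return 1;                     /* expected an object */
    if (demo_yylex() != 0)
        return 1;                     /* expected end of sentence */
    return 0;
}

/* Mirrors the do { yyparse(); } while (!feof(yyin)); loop in main().
 * Returns the number of sentences that parsed successfully. */
int demo_count_valid(const int *tokens, size_t len)
{
    int valid = 0;

    demo_input = tokens;
    demo_len = len;
    demo_pos = 0;
    do {
        if (demo_yyparse() == 0)
            valid++;
    } while (demo_pos < demo_len);    /* stands in for !feof(yyin) */
    return valid;
}
```

With the token sequence NOUN VERB NOUN 0 PRONOUN VERB NOUN 0, the driver calls demo_yyparse() twice and both calls succeed.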

 

The Rules Section

Each rule consists of a single name on the left-hand side of the “:” operator, a list of symbols and action code on the right-hand side, and a semicolon indicating the end of the rule. By default, the first rule is the highest-level rule. That is, the parser attempts to find a list of tokens which match this initial rule.

The symbol on the left-hand side of the rule can then be used like a token in other rules. From this, we build complex grammars.

In our grammar we use the special character “|”, which introduces a rule with the same left-hand side as the previous one.

 

The action part of a rule consists of a C block beginning with “{” and ending with “}”. The parser executes an action at the end of a rule as soon as the rule matches.

 

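The parser returns to its caller, in this case the main program, when the lexer reports the end of the input. (When the lexer reaches the end of a sentence, that is, a period followed by a newline, it returns 0. Yacc treats a token value of 0 as the end-of-input marker, so yyparse() finishes the current parse when it receives it.)

Subsequent calls to yyparse() reset the state and begin processing again. (Which state? Presumably the parser's own state inside yyparse(), such as its parse stack, rather than the lexer's state variable.)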

 

Error handling:

What happens if it sees “subject subject” or some other invalid list of tokens? The parser calls yyerror(), which we provide in the user subroutines section, and then recognizes the special rule error.  You can provide error recovery code that tries to get the parser back into a state where it can continue parsing. If error recovery fails or, as is the case here, there is no error recovery code, yyparse() returns to its caller after it finds an error.
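In yacc itself, recovery is written with the special error token inside grammar rules. Since that only makes sense in generated code, the sketch below hand-codes the same idea in plain C under stated assumptions: on a syntax error it reports the problem and then discards tokens up to the end of the sentence (the 0 token), so that the next call starts on a fresh sentence. The function names, the token codes and the array-based input are all made up for this illustration.

```c
#include <stdio.h>
#include <stddef.h>

enum { NOUN = 258, PRONOUN, VERB };   /* assumed token codes */

static int error_count;               /* how many errors were reported */

/* Plays the role of yyerror() in the .y file: report the problem. */
static void report_error(const char *msg)
{
    error_count++;
    fprintf(stderr, "%s\n", msg);
}

int reported_errors(void)
{
    return error_count;
}

/* Parse one sentence starting at tokens[*pos]; a 0 token ends a sentence.
 * On a syntax error, report it and skip past the next 0 token.
 * Returns 0 on success, 1 on error. */
int parse_with_recovery(const int *tokens, size_t len, size_t *pos)
{
    int step = 0;   /* 0: subject, 1: VERB, 2: object, 3: end of sentence */

    while (*pos < len) {
        int tok = tokens[(*pos)++];
        int ok = (step == 0 && (tok == NOUN || tok == PRONOUN)) ||
                 (step == 1 && tok == VERB) ||
                 (step == 2 && tok == NOUN) ||
                 (step == 3 && tok == 0);
        if (!ok) {
            report_error("syntax error");
            while (tok != 0 && *pos < len)   /* discard the rest of the sentence */
                tok = tokens[(*pos)++];
            return 1;
        }
        if (step == 3)
            return 0;                        /* complete sentence */
        step++;
    }
    report_error("syntax error");            /* input ran out before the sentence ended */
    return 1;
}

/* Parse every sentence in the array; returns how many were valid. */
int count_valid_sentences(const int *tokens, size_t len)
{
    size_t pos = 0;
    int valid = 0;

    error_count = 0;
    while (pos < len)
        if (parse_with_recovery(tokens, len, &pos) == 0)
            valid++;
    return valid;
}
```

Without the discard loop, the call after an error would start in the middle of the broken sentence and most likely report a second, misleading error for the same input.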

 

Ref:

lex & yacc, Second Edition, by John R. Levine, chapter 1
