lex & yacc series (3): an introduction to yacc, with an example

First Snowflakes

Category: Compilers

Article link: https://blog.csdn.net/qq_35865125/article/details/86755241

yacc is a useful tool for building compilers. It uses LR(1) parsing (strictly speaking, LALR(1)).

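The LR(k) parsing method was proposed by Knuth in 1965. The k in parentheses (k ≥ 0) is the number of input symbols the parser looks ahead to the right. From the symbols currently on its parse stack and the next k input symbols, an LR parser can determine uniquely whether to shift or to reduce and, when reducing, which production to reduce by.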

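The method parses quickly and can point out precisely where an error occurs. Its main drawback is that constructing such a parser for the grammar of a practical language is a great deal of work: the larger k is, the more complex the construction and the harder it is to implement.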

Yacc takes a concise description of a grammar (written in a .y file) and produces a C routine that can parse that grammar, i.e. a parser.

The yacc parser automatically detects whenever a sequence of input tokens matches one of the rules in the grammar and also detects a syntax error whenever its input doesn’t match any of the rules. A yacc parser is generally not as fast as a parser you could write by hand, but the ease in writing and modifying the parser is invariably worth any speed loss.

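When flex and yacc are used together, the parser function generated by yacc, yyparse(), repeatedly calls the yylex() function generated by flex to obtain one token at a time, and then combines those tokens according to the grammar rules.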

The lexer and the parser have to agree what the token codes are. We solve this problem by letting yacc define the token codes.

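Yacc can generate a header file that contains the token definitions, and the lexer can use it directly: you include this file (called y.tab.h on UNIX systems and ytab.h or yytab.h on MS-DOS) in the lexer and use the preprocessor symbols in your lexer action code. bison -d names the header after the grammar file, e.g. sentence.tab.h for sentence.y.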

Declaring tokens in the .y file:

%token <type> INT_LITERAL   /* a token with a value type; <type> is the name of a %union member */

%token FUNCTION             /* a token without a value type */

For each token declared in the .y file, yacc emits a definition in the generated header, for example as an enumeration constant.

Example of using flex and yacc together:

The flex .l file:

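%{
/*
 * We now build a lexical analyzer to be used by a higher-level parser.
 */
#include "sentence.tab.h"  /* token codes from the parser: the header generated by yacc, which defines the tokens */
#define LOOKUP 0  /* default - not a defined word type. */
int state;

/* defined in the user code section below; declared here because the rules call them */
int add_word(int type, char *word);
int lookup_word(char *word);

int
yywrap(void)
{
    return 1;  /* no further input files */
}

%}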
%%
\n          { state = LOOKUP; }
\.\n        { state = LOOKUP;
              return 0; /* end of sentence: this rule matches a period followed by a newline */
            }
^verb       { state = VERB; }
^adj        { state = ADJECTIVE; }
^adv        { state = ADVERB; }
^noun       { state = NOUN; }
^prep       { state = PREPOSITION; }
^pron       { state = PRONOUN; }
^conj       { state = CONJUNCTION; }
[a-zA-Z]+   {
                if (state != LOOKUP) {
                    /* definition line (starts with verb, noun, ...): store the word */
                    add_word(state, yytext);
                }
                else
                {
                    /* sentence line: return the word's type as the token */
                    switch (lookup_word(yytext)) {
                    case VERB:
                        return(VERB);
                    case ADJECTIVE:
                        return(ADJECTIVE);
                    case ADVERB:
                        return(ADVERB);
                    case NOUN:
                        return(NOUN);
                    case PREPOSITION:
                        return(PREPOSITION);
                    case PRONOUN:
                        return(PRONOUN);
                    case CONJUNCTION:
                        return(CONJUNCTION);
                    default:
                        printf("%s: don't recognize\n", yytext);
                        /* don't return, just ignore it */
                    }
                }
            }
.           ; /* ignore anything else */
%%
/*
 * main() is no longer needed here:
 *
 * main()
 * {
 *     yylex();
 * }
 *
 * flex only has to generate yylex(), which the yacc-generated parser calls.
 */

#include <stdlib.h>  /* malloc */
#include <string.h>  /* strlen, strcpy, strcmp */

/* define a linked list of words and types */
struct word {
    char *word_name;
    int word_type;
    struct word *next;
};

struct word *word_list; /* first element in word list */

int
add_word(int type, char *word)
{
    struct word *wp;

    if (lookup_word(word) != LOOKUP) {
        printf("!!! warning: word %s already defined \n", word);
        return 0;
    }
    /* word not there, allocate a new entry and link it on the list */
    wp = (struct word *) malloc(sizeof(struct word));
    wp->next = word_list;
    /* have to copy the word itself as well */
    wp->word_name = (char *) malloc(strlen(word) + 1);
    strcpy(wp->word_name, word);
    wp->word_type = type;
    word_list = wp;
    printf("%s:  was added to table\n", word);
    return 1; /* it worked */
}

int
lookup_word(char *word)
{
    struct word *wp = word_list;

    /* search down the list looking for the word */
    for (; wp; wp = wp->next) {
        if (strcmp(wp->word_name, word) == 0)
            return wp->word_type;
    }
    return LOOKUP; /* not found */
}

The yacc .y file:

%{
/*
 * A parser for the basic grammar to use for recognizing English sentences.
 */
#include <stdio.h>

int yylex(void);              /* generated by flex */
void yyerror(const char *s);  /* defined in the user subroutines section below */
%}
%token NOUN PRONOUN VERB ADVERB ADJECTIVE PREPOSITION CONJUNCTION
%%
sentence: subject VERB object { printf("Sentence is valid.\n"); }
        ;
subject:  NOUN    { printf("NOUN -> subject\n"); }
        | PRONOUN { printf("PRONOUN -> subject\n"); }
        ;
object:   NOUN    { printf("NOUN -> object\n"); }
        ;
%%
/* This section can contain any C code and is copied, verbatim, into the resulting parser. */
extern FILE *yyin;  /* defined by the flex-generated scanner; if it is still NULL
                       when yylex() first runs, the scanner sets it to stdin */

int
main(void)
{
    do
    {
        yyparse();
    }
    while (!feof(yyin));
    return 0;
}

void
yyerror(const char *s)
{
    fprintf(stderr, "%s\n", s);
}

Building and running:

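bison -d xxx.y generates xxx.tab.h and xxx.tab.c. (The .l file above includes sentence.tab.h, so the grammar file should be named sentence.y.)

flex xxxx.l generates lex.yy.c. Note that the header generated by bison (xxx.tab.h) must be included in xxxx.l.

gcc -o ex1 xxx.tab.c lex.yy.c (header files are pulled in by #include and are not passed to gcc; -lfl is not needed because the lexer defines its own yywrap() and the parser defines main())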

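Note: run ex1 directly, type the input and press Enter. (I ran into some strange behavior, possibly because the input was typed in a console window.) Keep in mind how this lexer reads its input: a word is recognized only after it has been defined on a line that starts with verb, noun, pron, etc. (for example a line "noun dog cat" followed by a line "verb chases"), and a sentence ends only at a period immediately followed by a newline, so "dog chases cat." must be followed directly by Enter, with no space after the period.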

Walking through the .y file:

The part between %{ and %} is C code, which yacc copies verbatim into the generated parser.

Then come definitions of all the tokens we expect to receive from the lexical analyzer.

The first %% indicates the beginning of the rules section. The second %% indicates the end of the rules and the beginning of the user subroutines section. The most important subroutine is main(), which repeatedly calls yyparse() until the lexer’s input file runs out.

The routine yyparse() is the parser generated by yacc, so our main program repeatedly tries to parse sentences until the input runs out. (The lexer returns a zero token whenever it sees a period at the end of a line; that’s the signal to the parser that the input for the current parse is complete.)

The Rules Section:

Each rule consists of a single name on the left-hand side of the “:” operator, a list of symbols and action code on the right-hand side, and a semicolon indicating the end of the rule. By default, the first rule is the highest-level rule. (That is, the parser attempts to find a list of tokens that matches this initial rule.)

The symbol on the left-hand side of the rule can then be used like a token in other rules. From this, we build complex grammars.

In our grammar we use the special character “|”, which introduces a rule with the same left-hand side as the previous one.

The action part of a rule consists of a C block beginning with “{” and ending with “}”. The parser executes an action at the end of a rule as soon as the rule matches.

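The parser returns to its caller, in this case the main program, when the lexer reports the end of the input. (When the lexer reaches the end of a sentence, i.e. a period followed by a newline, it returns 0. Does yyparse() stop when it receives this 0? Yes: yacc reserves token code 0 to mean end of input. If the tokens read so far form a complete sentence, yyparse() accepts and returns 0; otherwise it calls yyerror() and returns 1.)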

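Subsequent calls to yyparse() reset the state and begin processing again. (The state being reset is the parser's own: each call starts in the initial state with an empty parse stack. The lexer is not reset, so it continues reading where it stopped and the next call parses the next sentence.)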

Error handling:

What happens if it sees “subject subject” or some other invalid list of tokens? The parser calls yyerror(), which we provide in the user subroutines section, and then recognizes the special rule error. You can provide error recovery code that tries to get the parser back into a state where it can continue parsing. If error recovery fails or, as is the case here, there is no error recovery code, yyparse() returns to its caller after it finds an error.

Ref:

lex & yacc, second edition, by John R. Levine, chapter 1