Yacc: Yet Another Compiler-Compiler (Chinese-English parallel translation, part 3)

zhcfreesea

1: Basic Specifications

Names refer to either tokens or nonterminal symbols. Yacc requires token names to be declared as such. In addition, for reasons discussed in Section 3, it is often desirable to include the lexical analyzer as part of the specification file; it may be useful to include other programs as well. Thus, every specification file consists of three sections: the declarations, (grammar) rules, and programs. The sections are separated by double percent ``%%'' marks. (The percent ``%'' is generally used in Yacc specifications as an escape character.)

In other words, a full specification file looks like

        declarations
        %%
        rules
        %%
        programs

The declaration section may be empty. Moreover, if the programs section is omitted, the second %% mark may be omitted also; thus, the smallest legal Yacc specification is

        %%
        rules

Blanks, tabs, and newlines are ignored except that they may not appear in names or multi-character reserved symbols. Comments may appear wherever a name is legal; they are enclosed in /* . . . */, as in C and PL/I.

The rules section is made up of one or more grammar rules. A grammar rule has the form:

        A  :  BODY  ;

A represents a nonterminal name, and BODY represents a sequence of zero or more names and literals. The colon and the semicolon are Yacc punctuation.

Names may be of arbitrary length, and may be made up of letters, dot ``.'', underscore ``_'', and non-initial digits. Upper and lower case letters are distinct. The names used in the body of a grammar rule may represent tokens or nonterminal symbols.


A literal consists of a character enclosed in single quotes ``'''. As in C, the backslash ``\'' is an escape character within literals, and all the C escapes are recognized. Thus

        '\n'    newline
        '\r'    return
        '\''    single quote ``'''
        '\\'    backslash ``\''
        '\t'    tab
        '\b'    backspace
        '\f'    form feed
        '\xxx'  ``xxx'' in octal

For a number of technical reasons, the NUL character ('\0' or 0) should never be used in grammar rules.

If there are several grammar rules with the same left hand side, the vertical bar ``|'' can be used to avoid rewriting the left hand side. In addition, the semicolon at the end of a rule can be dropped before a vertical bar. Thus the grammar rules

        A       :       B  C  D   ;
        A       :       E  F   ;
        A       :       G   ;

can be given to Yacc as

        A       :       B  C  D
                |       E  F
                |       G
                ;

It is not necessary that all grammar rules with the same left side appear together in the grammar rules section, although it makes the input much more readable, and easier to change.

If a nonterminal symbol matches the empty string, this can be indicated in the obvious way:

        empty :   ;

Names representing tokens must be declared; this is most simply done by writing

        %token   name1  name2 . . .

in the declarations section. (See Sections 3, 5, and 6 for much more discussion.) Every name not defined in the declarations section is assumed to represent a nonterminal symbol. Every nonterminal symbol must appear on the left side of at least one rule.

Of all the nonterminal symbols, one, called the start symbol, has particular importance. The parser is designed to recognize the start symbol; thus, this symbol represents the largest, most general structure described by the grammar rules. By default, the start symbol is taken to be the left hand side of the first grammar rule in the rules section. It is possible, and in fact desirable, to declare the start symbol explicitly in the declarations section using the %start keyword:

        %start   symbol

The end of the input to the parser is signaled by a special token, called the endmarker. If the tokens up to, but not including, the endmarker form a structure which matches the start symbol, the parser function returns to its caller after the endmarker is seen; it accepts the input. If the endmarker is seen in any other context, it is an error.

It is the job of the user-supplied lexical analyzer to return the endmarker when appropriate; see section 3, below. Usually the endmarker represents some reasonably obvious I/O status, such as ``end-of-file'' or ``end-of-record''.
