GNU Bison 中文手册

20050620 GNU Bison 中文手册翻译完成

GNU Bison实际上是使用最广泛的Yacc-like分析器生成器,使用它可以生成解释器,编译器,协议实现等多种程序. 它不但与Yacc兼容还具有许多Yacc不具备的特性.

这个手册编写十分完整,带你领略Bison在使用中的各个细节(注:并不是实现细节).

如果发现错误,语句不通顺,意思不明,确请立即发邮件把您的建议或者您认为正确的翻译 写信告诉我,非常需要并感谢你的帮助!

英文原件页面

这个翻译版手册也是在GNU Free Documentation License下发布.

提供以下格式:(会陆续增加格式)

HTML格式使用一个由我patch的texi2html生成,如果需要的话可以跟我联系.

一些参考:


[顶层] [内容] [索引] [ ? ]

Bison

这个手册是针对GNU Bison (版本2.0,22 December 2004), GNU分析器生成器.

Copyright © 1988, 1989, 1990, 1991, 1992, 1993, 1995, 1998, 1999, 2000, 2001, 2002, 2003, 2004 Free Software Foundation, Inc.

Chinese(zh_CN) translation:Xiao Wang

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.1 or any later version published by the Free Software Foundation; with no Invariant Sections, with the Front-Cover texts being "A GNU Manual," and with the Back-Cover Texts as in (a) below. A copy of the license is included in the section entitled "GNU Free Documentation License."

(a) The FSF's Back-Cover Text is: "You have freedom to copy and modify this GNU Manual, like GNU software. Copies published by the Free Software Foundation raise funds for GNU development."


[ < ] [ > ]   [ << ] [ 上层 ] [ >> ]         [顶层] [内容] [索引] [ ? ]

Bison简介-Introduction

Bison是一种通用目的的分析器生成器. 它将LALR(1)上下文无关文法的描述转化成分析该文法的C程序. 一旦你精通Bison, 你可以用它生成从简单的桌面计算器到复杂的程序设计语言等等许多语言的分析器.

Bison向上兼容Yacc:所有书写正确的Yacc语法都应该可以不加更改地与Bison一起工作. 熟悉Yacc的人能毫不费力地使用Bison. 你应该熟练地掌握C程序设计语言,这样你才能使用Bison和理解这个手册.

我们会在这个教程的最初几章解释BIson的基本概念,并且展示三个详细解释的例子, 这些例子将在教程的最后被构建. 如果你对Bison或者Yacc一无所知, 你应该首先阅读这些章节. 接下来的参阅章节详细地阐述了Bison的各个方面.

Bison主要由Rovert Corbett编写.Richard Stallman使它与Yacc兼容. Carnegie Mellon大学的Wilfred Hansen为Bison添加了 多字符字符串文字(multi-character string literals)和其它一些特性.

这个版本的手则主要与Bison2.0相一致.


[ < ] [ > ]   [ << ] [ 上层 ] [ >> ]         [顶层] [内容] [索引] [ ? ]

使用Bison的条件-Conditions for Using Bison

为了允许非自由程序(nonfree programs)使用Bison生成的LALR分析器C代码, 我们在Bison版本1.24的时已经修改了yyparse的发布条款(distribution terms for yyparse). 在这之前,这些分析器只能用于自由软件(free software)的程序.

其它GNU编程工具,例如GNU C编译器从未有过类似的要求. 它们总能用于非自由软件的发布. Bison与它们不同的原因并不是出于特殊策略的决定, 而是由于应用通用许可证(General Public License)到所有的Bison源代码造成的结果.

Bison工具的输出-也就是Bison分析器文件(the Bison parser file)包括大小不固定的Bison代码片段. 这些代码片段来源于yyparse函数.(你的语法的动作被Bison插入到这个函数的某个位置, 但是函数的其余部分并未更改).当我们应用GPL条款到yyparse的代码的时候, 它产生的效果就是Bison的输出只能到自由软件.

我们并未更改条款使出于对那些希望是软件私有化的人的同情. 所有的软件都应该是自由软件.但是我们的结论是: 使Bison只能用于自由软件,这对鼓励人们使其它软件变为自由软件作用甚微. 所以我们决定将使用Bison的条件与使用其它GNU工具的条件相一致. (注:即和GCC一样可以用于非自由软件).

这个特例仅仅在Bison生成LALR(1)分析器的C代码时适用. 否则,GPL条款仍然正常的执行. 你可以通过查看代码,看其是否由这样的说明"As a special execption, when this file is copied by Bison into a Bison output file, you may use that output file without restriction."来分辩这个特例是否适用于你的`.c'输出文件.


[ < ] [ > ]   [ << ] [ 上层 ] [ >> ]         [顶层] [内容] [索引] [ ? ]

GNU GENERAL PUBLIC LICENSE

Version 2, June 1991

 
Copyright © 1989, 1991 Free Software Foundation, Inc.
59 Temple Place - Suite 330, Boston, MA  02111-1307, USA

Everyone is permitted to copy and distribute verbatim copies
of this license document, but changing it is not allowed.

[ < ] [ > ]   [ << ] [ 上层 ] [ >> ]         [顶层] [内容] [索引] [ ? ]

Preamble

The licenses for most software are designed to take away your freedom to share and change it. By contrast, the GNU General Public License is intended to guarantee your freedom to share and change free software--to make sure the software is free for all its users. This General Public License applies to most of the Free Software Foundation's software and to any other program whose authors commit to using it. (Some other Free Software Foundation software is covered by the GNU Library General Public License instead.) You can apply it to your programs, too.

When we speak of free software, we are referring to freedom, not price. Our General Public Licenses are designed to make sure that you have the freedom to distribute copies of free software (and charge for this service if you wish), that you receive source code or can get it if you want it, that you can change the software or use pieces of it in new free programs; and that you know you can do these things.

To protect your rights, we need to make restrictions that forbid anyone to deny you these rights or to ask you to surrender the rights. These restrictions translate to certain responsibilities for you if you distribute copies of the software, or if you modify it.

For example, if you distribute copies of such a program, whether gratis or for a fee, you must give the recipients all the rights that you have. You must make sure that they, too, receive or can get the source code. And you must show them these terms so they know their rights.

We protect your rights with two steps: (1) copyright the software, and (2) offer you this license which gives you legal permission to copy, distribute and/or modify the software.

Also, for each author's protection and ours, we want to make certain that everyone understands that there is no warranty for this free software. If the software is modified by someone else and passed on, we want its recipients to know that what they have is not the original, so that any problems introduced by others will not reflect on the original authors' reputations.

Finally, any free program is threatened constantly by software patents. We wish to avoid the danger that redistributors of a free program will individually obtain patent licenses, in effect making the program proprietary. To prevent this, we have made it clear that any patent must be licensed for everyone's free use or not licensed at all.

The precise terms and conditions for copying, distribution and modification follow.

  1. This License applies to any program or other work which contains a notice placed by the copyright holder saying it may be distributed under the terms of this General Public License. The "Program", below, refers to any such program or work, and a "work based on the Program" means either the Program or any derivative work under copyright law: that is to say, a work containing the Program or a portion of it, either verbatim or with modifications and/or translated into another language. (Hereinafter, translation is included without limitation in the term "modification".) Each licensee is addressed as "you".

    Activities other than copying, distribution and modification are not covered by this License; they are outside its scope. The act of running the Program is not restricted, and the output from the Program is covered only if its contents constitute a work based on the Program (independent of having been made by running the Program). Whether that is true depends on what the Program does.

  2. You may copy and distribute verbatim copies of the Program's source code as you receive it, in any medium, provided that you conspicuously and appropriately publish on each copy an appropriate copyright notice and disclaimer of warranty; keep intact all the notices that refer to this License and to the absence of any warranty; and give any other recipients of the Program a copy of this License along with the Program.

    You may charge a fee for the physical act of transferring a copy, and you may at your option offer warranty protection in exchange for a fee.

  3. You may modify your copy or copies of the Program or any portion of it, thus forming a work based on the Program, and copy and distribute such modifications or work under the terms of Section 1 above, provided that you also meet all of these conditions:
    1. You must cause the modified files to carry prominent notices stating that you changed the files and the date of any change.
    2. You must cause any work that you distribute or publish, that in whole or in part contains or is derived from the Program or any part thereof, to be licensed as a whole at no charge to all third parties under the terms of this License.
    3. If the modified program normally reads commands interactively when run, you must cause it, when started running for such interactive use in the most ordinary way, to print or display an announcement including an appropriate copyright notice and a notice that there is no warranty (or else, saying that you provide a warranty) and that users may redistribute the program under these conditions, and telling the user how to view a copy of this License. (Exception: if the Program itself is interactive but does not normally print such an announcement, your work based on the Program is not required to print an announcement.)

    These requirements apply to the modified work as a whole. If identifiable sections of that work are not derived from the Program, and can be reasonably considered independent and separate works in themselves, then this License, and its terms, do not apply to those sections when you distribute them as separate works. But when you distribute the same sections as part of a whole which is a work based on the Program, the distribution of the whole must be on the terms of this License, whose permissions for other licensees extend to the entire whole, and thus to each and every part regardless of who wrote it.

    Thus, it is not the intent of this section to claim rights or contest your rights to work written entirely by you; rather, the intent is to exercise the right to control the distribution of derivative or collective works based on the Program.

    In addition, mere aggregation of another work not based on the Program with the Program (or with a work based on the Program) on a volume of a storage or distribution medium does not bring the other work under the scope of this License.

  4. You may copy and distribute the Program (or a work based on it, under Section 2) in object code or executable form under the terms of Sections 1 and 2 above provided that you also do one of the following:
    1. Accompany it with the complete corresponding machine-readable source code, which must be distributed under the terms of Sections 1 and 2 above on a medium customarily used for software interchange; or,
    2. Accompany it with a written offer, valid for at least three years, to give any third party, for a charge no more than your cost of physically performing source distribution, a complete machine-readable copy of the corresponding source code, to be distributed under the terms of Sections 1 and 2 above on a medium customarily used for software interchange; or,
    3. Accompany it with the information you received as to the offer to distribute corresponding source code. (This alternative is allowed only for noncommercial distribution and only if you received the program in object code or executable form with such an offer, in accord with Subsection b above.)

    The source code for a work means the preferred form of the work for making modifications to it. For an executable work, complete source code means all the source code for all modules it contains, plus any associated interface definition files, plus the scripts used to control compilation and installation of the executable. However, as a special exception, the source code distributed need not include anything that is normally distributed (in either source or binary form) with the major components (compiler, kernel, and so on) of the operating system on which the executable runs, unless that component itself accompanies the executable.

    If distribution of executable or object code is made by offering access to copy from a designated place, then offering equivalent access to copy the source code from the same place counts as distribution of the source code, even though third parties are not compelled to copy the source along with the object code.

  5. You may not copy, modify, sublicense, or distribute the Program except as expressly provided under this License. Any attempt otherwise to copy, modify, sublicense or distribute the Program is void, and will automatically terminate your rights under this License. However, parties who have received copies, or rights, from you under this License will not have their licenses terminated so long as such parties remain in full compliance.
  6. You are not required to accept this License, since you have not signed it. However, nothing else grants you permission to modify or distribute the Program or its derivative works. These actions are prohibited by law if you do not accept this License. Therefore, by modifying or distributing the Program (or any work based on the Program), you indicate your acceptance of this License to do so, and all its terms and conditions for copying, distributing or modifying the Program or works based on it.
  7. Each time you redistribute the Program (or any work based on the Program), the recipient automatically receives a license from the original licensor to copy, distribute or modify the Program subject to these terms and conditions. You may not impose any further restrictions on the recipients' exercise of the rights granted herein. You are not responsible for enforcing compliance by third parties to this License.
  8. If, as a consequence of a court judgment or allegation of patent infringement or for any other reason (not limited to patent issues), conditions are imposed on you (whether by court order, agreement or otherwise) that contradict the conditions of this License, they do not excuse you from the conditions of this License. If you cannot distribute so as to satisfy simultaneously your obligations under this License and any other pertinent obligations, then as a consequence you may not distribute the Program at all. For example, if a patent license would not permit royalty-free redistribution of the Program by all those who receive copies directly or indirectly through you, then the only way you could satisfy both it and this License would be to refrain entirely from distribution of the Program.

    If any portion of this section is held invalid or unenforceable under any particular circumstance, the balance of the section is intended to apply and the section as a whole is intended to apply in other circumstances.

    It is not the purpose of this section to induce you to infringe any patents or other property right claims or to contest validity of any such claims; this section has the sole purpose of protecting the integrity of the free software distribution system, which is implemented by public license practices. Many people have made generous contributions to the wide range of software distributed through that system in reliance on consistent application of that system; it is up to the author/donor to decide if he or she is willing to distribute software through any other system and a licensee cannot impose that choice.

    This section is intended to make thoroughly clear what is believed to be a consequence of the rest of this License.

  9. If the distribution and/or use of the Program is restricted in certain countries either by patents or by copyrighted interfaces, the original copyright holder who places the Program under this License may add an explicit geographical distribution limitation excluding those countries, so that distribution is permitted only in or among countries not thus excluded. In such case, this License incorporates the limitation as if written in the body of this License.
  10. The Free Software Foundation may publish revised and/or new versions of the General Public License from time to time. Such new versions will be similar in spirit to the present version, but may differ in detail to address new problems or concerns.

    Each version is given a distinguishing version number. If the Program specifies a version number of this License which applies to it and "any later version", you have the option of following the terms and conditions either of that version or of any later version published by the Free Software Foundation. If the Program does not specify a version number of this License, you may choose any version ever published by the Free Software Foundation.

  11. If you wish to incorporate parts of the Program into other free programs whose distribution conditions are different, write to the author to ask for permission. For software which is copyrighted by the Free Software Foundation, write to the Free Software Foundation; we sometimes make exceptions for this. Our decision will be guided by the two goals of preserving the free status of all derivatives of our free software and of promoting the sharing and reuse of software generally.
  12. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION.
  13. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

[ < ] [ > ]   [ << ] [ 上层 ] [ >> ]         [顶层] [内容] [索引] [ ? ]

Appendix: How to Apply These Terms to Your New Programs

If you develop a new program, and you want it to be of the greatest possible use to the public, the best way to achieve this is to make it free software which everyone can redistribute and change under these terms.

To do so, attach the following notices to the program. It is safest to attach them to the start of each source file to most effectively convey the exclusion of warranty; and each file should have at least the "copyright" line and a pointer to where the full notice is found.

 
one line to give the program's name and a brief idea of what it does.
Copyright (C) yyyy  name of author

This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA  02111-1307, USA.

Also add information on how to contact you by electronic and paper mail.

If the program is interactive, make it output a short notice like this when it starts in an interactive mode:

 
Gnomovision version 69, Copyright (C) 19yy name of author
Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
This is free software, and you are welcome to redistribute it
under certain conditions; type `show c' for details.

The hypothetical commands `show w' and `show c' should show the appropriate parts of the General Public License. Of course, the commands you use may be called something other than `show w' and `show c'; they could even be mouse-clicks or menu items--whatever suits your program.

You should also get your employer (if you work as a programmer) or your school, if any, to sign a "copyright disclaimer" for the program, if necessary. Here is a sample; alter the names:

 
Yoyodyne, Inc., hereby disclaims all copyright interest in the program
`Gnomovision' (which makes passes at compilers) written by James Hacker.

signature of Ty Coon, 1 April 1989
Ty Coon, President of Vice

This General Public License does not permit incorporating your program into proprietary programs. If your program is a subroutine library, you may consider it more useful to permit linking proprietary applications with the library. If this is what you want to do, use the GNU Library General Public License instead of this License.


[ < ] [ > ]   [ << ] [ 上层 ] [ >> ]         [顶层] [内容] [索引] [ ? ]

1. 和Bison相关的一些基本概念-The Concepts of Bison

这一章介绍了许多基本概念,但没有提及一些感觉不到的细节. 如果你并不了解如何使用Bison/Yacc,我们建议你仔细阅读这一章.


[ < ] [ > ]   [ << ] [ 上层 ] [ >> ]         [顶层] [内容] [索引] [ ? ]

1.1 语言与上下文无关文法-Languages and Context-Free Grammars

为了使Bison能分析语言,这种语言必须由上下文无关文法(context-free grammar)描述. 也就是说,你必须指出一个或者多个语法组(syntactic groupings)以及从语法组的部分构它的建整体的规则. 例如,在C语言中,有一种我们称之为`表达式(expression)'的语法组. 一个生成表达式的规则可能是"一个表达式由一个减号和另一个表达式构成". 另外一个规则可能是"一个表达式可以是一个整数". 就象你看到的一样,规则经常是递归定义的,但是必须有一个结束递归的规则.

用于表示这些规则的最普遍的系统是Backus-Naur 范式(Backus-Naur Form)或者"BNF". 发明这种语言的目的是用来阐述Algol 60. 任何用BNF表示的文法都是一种上下文无关文法. Bison要求它的的输入必须用BNF表示.

上下文无关文法有许多重要的子集.尽管Bison可以处理几乎所有的上下文无关文法, 但Bison针对LALR(1) 文法(LALR(1) grammars)做了优化. 简而言之,在这些文法中(注:指LALR(1)),我们可以告之如何分析仅代有一个超前扫描记号的输入字符串的任意部分. 严格的说,这是一个LR(1)文法的描述. LALR(1)包括了与多难以分析的额外限制. 幸运的是,在实际中,我们很难找到一个是LR(1)文法而不是LALR(1)文法的例子. 参阅神秘的归约/归约冲突-Mysterious Reduce/Reduce Conflicts,以获取更多信息.

LALR(1)文法分析器具有确定性(deterministic), 这就意味着应用于输入的下一个文法规则取决于之前的输入和确定的部分剩余输入(我们称之为一个超前扫描记号(look-ahead). 一个上下文无关文法可能是有歧义的(ambiguous),即可能可以应用多种规则来获取某些输入. 即使非歧义性文法也可能使不确定(non-deterministic)的, 即没有总能足以决定下一个应用的文法规则的确定的超前扫描记号。 使用孰知的GLR技术,Bison的GLR就可以分析这些更为普通的上下文无关文法. 当任意给定字符串的可能的分析是确定的情况下,Bison可以处理任意上下文无关文法.

在正式的语言语法规则中,每一种语法单元或组合被称之为符号(symbol). 那些可以通过语法规则被分解成更小的结构的符号叫做非终结符(nonterminal symbols). 那些不能被再分的符号叫做终结符(terminal symbols)或者记号类型(token types). 我们把同终结符相对应的输入片段叫做记号(token), 把同单个非终结符相对应的输入片段叫做组(grouping).

我们可以使用C语言做为例子来解释什么是符号,以及终结符和非终结符的含义. C语言的记号包括标识符,常量(数字或者字符串)以及各种关键字,数学操作符和标点符号. 所以C语言语法的终结符包括`identifier',`number',`string',加上每个关键字的符号,操作符或者标点符号. 例如:`if',`return',`const',`static',`int',`char',`plus-sign',`open-brace',`close-brace',`comma'以及更多. (这些记号可以再分为字符,但是这些是词法学而不是语法学的事情)

这是一个如何将C函数分解成记号的例子:

 
int             /* 关键字 `int' */
square (int x)  /* identifier, open-paren, 关键字 `int', identifier, close-paren */
{               /* open-brace */
  return x * x; /* 关键字 `return', identifier, asterisk, identifier, semicolon */
}               /* close-brace */

C语言的语法组包括表达式,语句,声明和函数定义. 这些由C语言语法中的非终结符`expression',`statement',`declaration'和`funcation definition'表示. 完整的语法还使用了许多额外的语言结构,每种结构都有自己的非终结符来表示上述四种的含义. 上面的例子是一个函数的定义,它包括了一个声明和一个语句. 每一个`x'一个表达式,而且`x * X'也是一个表达式.

每一个非终结符必须有一个描述如何由更简单结构组成这个非终结符的语法规则. 例如,一种C的语句是return语句的非正式表达将由如下的语法规则描述:

一种语句可以由一个`return'关键字,一个`expression'何以个`semicolon'组成.

还有许多其它对应`statement'的规则,每一种规则对应一种C语句.

我们必须注意到一种特殊的非终结符,这种非终结符定义了语言的一个完整表述. 我们称之为开始符号(start symbol). 在编译器中,这意味着一个完整的输入程序. 在C语言中,非终结符`sequence of definitions and declarations'扮演了这个角色.

例如,`1+2'是以个有效的C表达式--一个有效的C程序的部分--但它不能做为一个C程序的全部(entire). 在C语言的上下文无关文法中,这遵循了`expression'不是开始符号的事实.

Bison分析器读取一个记号序列做为它的输入并使用语法规则将记号组合. 如果输入是有效的,最终的结果是将整个的输入序列分析整理到开始符号. 如果我们使用C语言的语法,整个的输入必须是一个`sequence of definitions and declarations', 否则分析器会报告一个语法错误.


[ < ] [ > ]   [ << ] [ 上层 ] [ >> ]         [顶层] [内容] [索引] [ ? ]

1.2 从正规文法转换到Bison的输入-From Formal Rules to Bison Input

正规文法是一种数学结构.为了定义Bison要分析的语言, 你必须用Bison语法编写一个表达该语言的文件,即一个Bison 语法(Bison grammar)文件. 参阅 Bison的语法文件-Bison Grammar Files.

就象C语言中的标识符一样,在Bison的输入中,一个正规文法的非终结符由一个标识符表示. 根据惯例,非终结符应该用小写子母表示,例如expr,stmt或者declaration.

在Bison中,终结符也被称为符号类型(token type). 符号类型也可以由类似C语言标识符来表示. 根据惯例,这些标识符因改用大写子母表示以区分它和非终结符. 例如,INTEGER,INDENTIFIER,IF或者RETURN. 一个表示某语言的特定关键字的终结符应该由紧随该关键字之后的它的大写表示来命名. 终结符error保留用作错误恢复之用. 参阅 符号-Symbols.

一个终结符也可以由一个像C中的字符常量一样的一个字符来表示. 当一个记号就是一个字符(括号,加号等等)的时候,你可以这样做: 使用同一个字符做为那个记号的终结符.

第三种表示终结符的方法是使用包含一些字符的C字符串常量. 获取更多这方面的信息可以参阅 符号-Symbols.

语法规则在Bison语法中也有相应的表示.例如,下面有一个C语言return语句的规则. 在引号中的分号,是一个字符记号,它是用来表示C语言部分语法的. 没有在引号中的分号和冒号是Bison用来表示每一条规则的标点符号.

 
stmt:   RETURN expr ';'
        ;

获取这方面更多信息,参阅 语法规则的语法-Syntax of Grammar Rules.


[ < ] [ > ]   [ << ] [ 上层 ] [ >> ]         [顶层] [内容] [索引] [ ? ]

1.3 语义值-Semantic Values

正规文法仅仅靠类别来选择记号:例如,如果一个规则提到了终结符`integer constant', 这就意味着任何整数常量在那个位置上都是语法有效的. 常量的精确值与如何分析输入不相关:如果`x+4'符合语法,那么`x+1'或者`x+3989'也符合语法.

但是,当分析输入时,它的精确值是非常重要的,通过精确值可以了解输入的含义. 如果一个编译器不能区别程序中的4,1和3989等常量,毫无疑问,这个编译器是没有用的! 因此,每一个Bison语法的记号既含有一个符号类别也有一个语义值(semantic value). 获取这方面的更多信息,参阅 定义语言的语义-Defining Language Semantics.

符号类型是在语法中定义的终结符,例如INTEGER,IDENTIFIRE或者','. 它告诉你决定记号可能有效出现的位置以及如何将它组合成其它记号的信息. 语法规则只知道符号的类型,其它的什么都不知道.

语义值包括了记号的所有剩余信息.例如整数的数值,标识符的名称. (一个如','的记号只是一个标点,并不需要语义值.)

例如,一个分类为INTEGER的记号包含语义值4. 另一个也被分类为INTEGER的记号的语义值却是3989. 当一个语法规则表明INTEGER是允许的,任意的这些记号都是可接受的,因为它们都是INTEGER. 当一个分析器接受了记号,它会跟踪这个记号的语义值.

每一个语法组和它的非终结符也可已有语义值. 一个很典型的例子,在计算器中,一个表达式含有一个数值做为它的语义值, 在程序语言编译器中.一个典型的表达式含有一个用于描述它含义的树型结构做为语义值.


[ < ] [ > ]   [ << ] [ 上层 ] [ >> ]         [顶层] [内容] [索引] [ ? ]

1.4 语义动作

为了更加实用,一个程序不仅仅要分析输入而且必须做的更多. 它应该可以在输入的基础上产生一些输出. 在Bison语法中,一个语法规则可以有一个包括多个C语句的动作(action). 分析器每次识别一个规则的匹配,相应的动作就会被执行. 获取这方面的更多信息,参阅 动作-Actions.

大多数时候,动作的目的是从部分结构的语义值计算整个结构的语义值. 例如,加入我们有一个规则表明一个表达式可以由两个表达式相加而成. 当分析器识别了一个加法和,每一个子表达式都有一个描述其如何构建的语义值. 这个规的动做就是为了新识别的大表达式建立一个类似的语义值.

例如,这里的一个规则表明一个表达式可由两个表达式相加而成.

 
expr: expr '+' expr   { $$ = $1 + $3; }
        ;

这个动作表明了如何从子表达式的语义值产生加法和表达式的语义值.


[ < ] [ > ]   [ << ] [ 上层 ] [ >> ]         [顶层] [内容] [索引] [ ? ]

1.5 编写GLR分析器-Writing GLR Parsers

在一些文法中,Bison标准的LALR(1)分析算法, 不能针对给定的输入应用一个确定的语法规则. 这就是说,Bison可能不能决定(在当前输入的基础上)应该使用两个可能的归约中的那一个, 或者不能决定到底应该应用一个归约还是先读取一些输入稍后再进行归约. 以上两种冲突分别被称为归约/归约(reduce/reduce)冲突(参阅归约/归约-Reduce/Reduce一章)和 移进/归约(shift/reduce)冲突(参阅移进/归约-Shift/Reduce一章).

有些时候, 为了使用一个很难被修改成LALR(1)文法的文法做为Bison的输入, Bison需要使用通用的分析算法. 如果你在你的文件中加入了这样的声明%glr-parser(参阅语法大纲-Grammar Outline一章), Bison会产通用的LR(GLR)分析器. 这些分析器(例如,在应用了先前所述的声明之后) 在处理那些不包含未解决的冲突的文法时, 采用与LALR(1)分析器一样的处理方式. 但是当面临未解决的移进/归约冲突和归约/归约冲突的时候, GLR分析器权宜地同时处理这两个可能, 即有效地克隆分析器自己以便于追踪这这两种可能性. 每一个克隆出来的分析器还可以再次被克隆, 这就保证在任意给定的时间,可以处理任意个可能的分析. 分析器逐步地进行分析,即所有的分析器它们进入到下一个输入之前, 都会消耗(归约)给定的输入符号. 每一个被克隆的分析器最终只有两个可能的归宿: 或者这个分析器因进入了一个分析错误而最终被销毁, 会这它和其它的分析器合并,因为它们把输入归约到了一个相同的符号集.

在有多个分析器并存的时刻,Bison只记录它们的语义动作而不是执行它们. 当一个分析器消失的时候,相应的语义动作记录也消失并且永远不会被执行. 当一个规约使得两个分析器等价而合并的时候, Bison会记录下它们两个的语义动作集. 每当最后两个分析器合并成为一个单独的分析器的时候, Bison执行所有未完成的动作.这些动作既可能依靠语法规则的优先级被执行也可能均被Bison执行. 在执行完动作之后,Bison调用指定的用户定义求值函数来产生一个独立的合并值.


[ < ] [ > ]   [ << ] [ 上层 ] [ >> ]         [顶层] [内容] [索引] [ ? ]

1.5.1 使用GLR分析器分析非歧义文法

在最简单的情况下,你可以使用GLR算法分析那些非歧义但不是LALR(1)文法的文法. 典型地,这些文法需要不止一个的超前扫描记号, 或者(在极少数情况下)由于LALR(1)算法丢弃太多的信息 而不属于LALR(1)的文法(它们却属于LR(1),参阅冲突-Mystery Conflicts).

考虑一个产生于Pascal语言枚举和子界类型声明中的问题,. 这里有一些例子:

 
type subrange = lo .. hi;
type enum = (a, b, c);

最初的语言标准只允许数字和常量标识符做为子界的范围(`lo'`hi'). 但是扩展的Pascal(ISO/IEC10206)以及许多以它的Pascal的实现还允许独立的表达式. 这导致了如下一个包含过量括号的情形.

 
type subrange = (a) .. b;

考虑如下这个仅含一个值的枚举类型的声明.

 
type enum = (a);

(在这里的这些例子是人为制造的,但它们是语法有效的. 在实际的程序中可能会出现更复杂的例子)

这两个例子看起来相同(注:指语法上)直到`..'记号的出现. 对于只含有一个超前扫描记号的普通LALR(1)分析, 当`a'被分析的时,分析器候并不能两个形式中(注:指枚举或者子界)那一个是正确的. 因为在后一个例子中,`a'必须成为一个新的标识符来表示枚举; 而在前一个例子中必须使用`a'和它的含义来估计它可能是一个常量或者函数调用, 所以我们当然希望分析器可以对此作出正确的决定.

你可将`(a)'分析成"在括号中为说明的标识符"以便稍后进行分析. 但当括号嵌套在表达式的递归规则中的时候, 这种方式需要对语义动作和大部分语法做出相应的调整.

你可能会想到使用词法分析器通过返回已定义和未定地标识符来区别这两种形式. 但是如果这些声明在局部出现, 而`a'在外部定义的时候,这两种形式都是可能的- 既可能是局部重定义的`a'.也可能是使用外部定义的`a'的值. 所以这种方法不可行.

解决这个问题由一个简单的办法,那就是使用GLR算法. 当GLR分析器到达关键区域的时候, 它会同时沿着两个分支进行分析. 其中的一个分支迟早要进入分析错误. 如果有一个`..'记号在下一个`;'之前的化; 由于枚举类型的规则不能接受`..',所以它应用失败; 否则,子界类型规则应用失败,因为它要求一个`..'记号. 所以,一个分支无声地失败而另一个分支正常地进行, 并且执行所有的在拆分期间被推迟执行的中间动作.

如果输入不符和语法,则这两个分支的分析都会失败 并且如常地报告一个语法错误.

分析器的功能似乎是在"猜测"正确的语法分支,换句话说, 它使用了比LALR(1)算法允许使用的更多的超前扫描记号. LALR(2)有能力分析这个句子, 但是有些不符和LALR(k)的例子, 即使任意多的k可以这样地处理.

一般而言,一个GLR分析器在最坏情况下会花费二次方或者三次方的时间复杂度, 当前的Bison甚至会为了某些文法而花费指数的时间. 在实际应用中,这些几乎不会发生, 并且对于许多文法来说,证明它不能发生也是可能的. 当前讨论的例子在两个规则中只有一个冲突, 并且含有冲突的类型声明不能嵌套使用. 所以在任意时刻的分支数量被限制在常量2以下, 这时候分析时间仍然是线性的.

这是一个同上面描述相对应的例子. 它由Pascal类型声明大大简化而来. 如果输入不符和语法,两个分支都会失败并且如常地报告一个语法错误.

 
%token TYPE DOTDOT ID

%left '+' '-'
%left '*' '/'

%%

type_decl : TYPE ID '=' type ';'
     ;

type : '(' id_list ')'
     | expr DOTDOT expr
     ;

id_list : ID
     | id_list ',' ID
     ;

expr : '(' expr ')'
     | expr '+' expr
     | expr '-' expr
     | expr '*' expr
     | expr '/' expr
     | ID
     ;

当使用平常的LALR(1)文法的时候,Bison会报告一个归约/归约冲突. 在冲突的时候,分析器在会众多选择中选取一个-随意地选择那个先声明的. 所以下面的正确输入不能被识别.

 
type t = (a) .. b;

在Bison输入文件中, 加入这两个声明(在第一个`%%'之前)分析器可以将分析器编成一个GLR分析器, 并且Bison不会报告一个归约/归约冲突.

 
%glr-parser
%expect-rr 1

并不需要对语法本身进行修改. 分析器现通过上面的限制语法后,可以认识所有有效的声明. 用户实际上并不能查觉分析器的拆分.

这就是我们使用GLR而几乎没有坏处的例子. 即使像这样简单的例子,至少两个潜在的问题值得我们注意. 第一,我们总应该分析Bison的冲突报告来确定GLR拆分总发生在我们想要的时候. 一个GLR分析器的拆分会不经意地产生比LALR分析器在冲突中静态的错误选择 更加不明显的问题. 第二,要仔细考虑与词法分析器的互动(参阅语义记号-Sematic Tokens一章). 由于在拆分期间的分析器消耗记号时并不产生任何动作, 词法分析器并不能通过分析动作获得信息. 一些与词法分析器的互动在使用GLR来消除从词法分析器到语法分析器的复杂读时可以忽略. 你必须监察其余情况下时的正确性.

在我们的例子中,因为并没有新的符号在类型声明中间被定义. 词法分析器基于记号当前在符号表的含意被返回是安全的. 即使在分析枚举常量时定义它们是可能的, 由于它们不能在相同的枚举类型声明中使用, 所以这实际上并没有什么不同.


[ < ] [ > ]   [ << ] [ 上层 ] [ >> ]         [顶层] [内容] [索引] [ ? ]

1.5.2 使用GLR解决歧义-Using GLR to Resolve Ambiguities

让我们考虑由C++语法大大简化而来的例子.

 
%{
  #include <stdio.h>
  #define YYSTYPE char const *
  int yylex (void);
  void yyerror (char const *);
%}

%token TYPENAME ID

%right '='
%left '+'

%glr-parser

%%

prog :
     | prog stmt   { printf ("/n"); }
     ;

stmt : expr ';'  %dprec 1
     | decl      %dprec 2
     ;

expr : ID               { printf ("%s ", $$); }
     | TYPENAME '(' expr ')'
                        { printf ("%s <cast> ", $1); }
     | expr '+' expr    { printf ("+ "); }
     | expr '=' expr    { printf ("= "); }
     ;

decl : TYPENAME declarator ';'
                        { printf ("%s <declare> ", $1); }
     | TYPENAME declarator '=' expr ';'
                        { printf ("%s <init-declare> ", $1); }
     ;

declarator : ID         { printf ("/"%s/" ", $1); }
     | '(' declarator ')'
     ;

这个例子模拟了有问题的C++语法--某些声明和语句中的歧义.例如,

 
T (x) = y+z;

既可以被分析为一个expr也可被分析为一个stmt (假定`T'被识别为一个TYPENAME并且`x'被识别为ID). Biosn检测到了这个在规则expr : ID和规则delarator : ID之间的归约/归约冲突. 分析器遇到x的时候还不能解决冲突. 由于这是一个GLR分析器,所以它会把问题拆分成两个分析器, 每一个用于解决归约/归约冲突中的一个. 与上一节的例子不同(参阅简单的GLR分析器一章),两个分析器中的任一个都不"死亡," 因为这个语法本身就是歧义的. 其中的一个分析最终归约到stmt : expr ';'而另一个归约到stmt : decl, 在这之后,两个分析器进入同一个状态: 它们都已经看到`prog stmt'并且有相同的未处理剩余输入. 我们说这些分析器已经合并(merged).

在这个时候,GLR分析器需要一个关于如何在两个竞争的分析中做出选择的说明. 在上面的例子中,两个%dprec声明说明了Bison给予decl的解释以优先级. 这就表明x是一个声明符(declarator). 因此分析器会打印出:

 
"x" y z + T <init-declare>

%deprc声明仅在多于一个分析幸存的情况下起作用. 考虑这个分析器的一个不同的输入字符串:

 
T (x) + y;

像在上一节展示的一样(参阅简单的GLR分析器-Simple GLR Parsers一章), 这是另外一个使用GLR分析非歧义结构的例子. 在这里并没有歧义(这个并不能被分析成一个声明). 但是在Bison分析器遇到x的时候, 它没有足够的信息解决归约/归约冲突(还是不能决定x是一个expr还是一个declarator). 在这种情况下,没有优先级可供使用. 分析器再次被拆分成两个,一个假定x是一个expr 而另一个假定x是一个declarator. 两个分析器中的第二个在遇到+的时候被销毁,并且分析器打印出

 
x T <cast> y +

假设你想查看所有的可能性而不是解决歧义. 你必须合并两个可能的分析器的语义动作而不是选择其中的一个. 为了达到这个目的,你需要像如下那样地改变stmt的声明:

 
stmt : expr ';'  %merge <stmtMerge>
     | decl      %merge <stmtMerge>
     ;

并且定义stmtMerge函数如下:

 
static YYSTYPE
stmtMerge (YYSTYPE x0, YYSTYPE x1)
{
  printf ("<OR> ");
  return "";
}

并且在文件的开头要伴随一个前置声明在C声明中:

 
%{
  #define YYSTYPE char const *
  static YYSTYPE stmtMerge (YYSTYPE x0, YYSTYPE x1);
%}

使用这些声明以后,产生的的分析器将第一个例子既分析成一个expr又分析成一个decl, 并且打印

 
"x" y z + T <init-declare> x T <cast> y z + = <OR>

Bison要求所有参加任何特定合并的产品(productions)拥有相同的`%merge'语句. 否则,歧义不能被解决而且分析器会在任何导致违反规定的合并时候报告一个错误.


[ < ] [ > ]   [ << ] [ 上层 ] [ >> ]         [顶层] [内容] [索引] [ ? ]

1.5.3 编译GLR分析器时需要考虑的问题-Considerations when Compiling GLR Parsers

GLR分析器需要支持C89或更新标准的编译器. 额外地,Bison使用关键字inline,这个关键字不是C89标准但却是C99标准, 并且是许多C99之前编译器支持的扩展. 使用它是为了分析器的用户可以处理移植性问题. 例如,如果使用Autoconf和Autoconf宏AC_C_INLINE,一个单单的

 
%{
  #include <config.h>
%}

就足够用了,否则我们建议使用

 
%{
  #if __STDC_VERSION__ < 199901 && ! defined __GNUC__ && ! defined inline
   #define inline
  #endif
%}


[ < ] [ > ]   [ << ] [ 上层 ] [ >> ]         [顶层] [内容] [索引] [ ? ]

1.6 位置-Locations

许多应用程序,如解释器和编译器,需要产生一些有用信息或者出错的信息. 为了达到这个目的,我们必须追踪每个语法结构的原文位置(textual location)位置(location). Bison提供了追踪这些位置的机制.

每一个记号有一个语义值.类似地,每个记号也有一个位置, 对于所有记号和组来说,它们的位置的类型是相同的. 此外,输出的分析器也带有默认的存储位置的数据结构 (获取更多信息,参阅位置-Locations一章).

像语义值一样,位置可以在动作中使用特定的一套结构来访问. 在上面个的例子中,这个组的的位置是@$,而子表达式的位置是@1@3.

当一个规则被匹配,一个默认的动作用于计算左侧的语义值(参阅动作-Actions一章). 类似地,另外一个默认的动作用于计算位置. 然而,这个默认动作对于对于大多数情况已经足够永, 即经常没有必要为每个规则描述@$应该是如何形成的. 当为一个给定的组建立一个新的位置的时候, 输出的分析器的默认行为是取第一个符号的开头和最后一个符号的末尾.


[ < ] [ > ]   [ << ] [ 上层 ] [ >> ]         [顶层] [内容] [索引] [ ? ]

1.7 Bison的输出:分析器文件-Bison Output: the Parser File

当你运行Bison的时候,你需要给Bison一个语法文件做为其输入. Bison的输出是一个分析这个语法文件描述的语言的C源代码文件. 这个文件叫做Bison分析器(Bison parse). 我们要记住Bison工具和Bison分析器是两个明显不同的程序: Bison工具是一个以Bison分析器为输出的程序. 这个Bison分析器应是你程序的一部分.

Bison分析器的工作是依照语法规则组合记号--例如,将标识符和操作符构建成表达式. 在组合的过程中它还要执行相应的语法规定的动作.

记号是来源于称为词法分析器(lexical analyzer)的程序. 你必须以某种形式提供词法分析器(如用C编写). Bison分析器每当需要一个新的记号的时候就会调用词法分析器. Bison分析器并不之道记号"中"有什么东西(即使它们的语义值可能反映这个). 典型的词法分析器靠分析字符来产生记号,但是Bison并不依靠这个. 获取更多细节,参阅 词法分析函数yylex-The Lexical Analyzer Function yylex.

Bison分析器文件是定义了名为yyparse并且实现了那个语法的函数的C代码. 这个函数并不能成为一个完成的C程序:你必须提供额外的一些函数. 其中之一是词法分析器.另外的一个是一个分析器报告错误时调用的错误报告函数. 另外,一个完整的C程序必须以名为main的函数开头; 你必须提供这个函数.并且安排它调用yyparse. 否则分析器永远都不会运行. 参阅 分析器C语言接口-Parser C-Language Interface.

除了你编写的动作中的记号类型名称和符号以外 ,所有Bison分析器文件自己定义的符号都以`yy'或者`YY'开头. 这些符号包括了接口函数例如词法分析函数yylex,错误报告函数yyerror 和分析器函数yyparse. 这些符号也包括了许多内部目的的标识符. 所以你要在Bison语法文件中避免使用除了本手册定义的以外的以`yy'或者`YY'开头的C标识符.

在一些情况下,Bison分析器文件包含系统头文件. 在这中情况下,你的代码注意被这些文件保留的标识符. 在意些非GNU系统,<alloca.h>,<stddef.h>以及<stdlib.h> 被包含在内用于声明内存分配器及相关类型. 如果你定义YYDEBUG为非零值,其它的系统头文件也可能被包括进内. (参阅跟踪你的分析器-Tracing Your Parser一章)


[ < ] [ > ]   [ << ] [ 上层 ] [ >> ]         [顶层] [内容] [索引] [ ? ]

1.8 使用Bison的流程-Stages in Using Bison

实际使用Bison设计语言的流程,从语法描述到编写一个编译器或者解释器,有三个步骤:

  1. 以Bison可识别的格式正式地描述语法.(参阅Bison语法文件一章) 对每一个语法规则,描述当这个规则被识别时相应的执行动作. 动作由C语句序列描述.
  2. 编写一个词法分析器处理输入并将记号传递给语法分析器. 词法分析器既可是手工编写的C代码(参阅词法分析函数yylex一章), 也可以由lex产生,但是lex的使用并未在这个手册中讨论.
  3. 编写一个调用Bison产生的分析器的控制函数.
  4. 编写错误报告函数.

将这些源代码转换成可执行程序,你需要按以下步骤进行.

  1. 按语法运行Bison产生分析器.
  2. 同其它源代码一样编译Bison输出的代码.
  3. 链接目标文件以产生最终的产品.

[ < ] [ > ]   [ << ] [ 上层 ] [ >> ]         [顶层] [内容] [索引] [ ? ]

1.9 Bison语法文件的整体布局-The Overall Layout of a Bison Grammar

Bison工具的输入文件是以个Bison语法文件(Bison grammar file). 通常的Bison语法文件格式如下:

 
%{
Prologue
%}

Bison declarations

%%
Grammar rules
%%
Epilogue

`%%',`%{'`%}'是Bison在每个Bison语法文件中用于分隔部分的标点符号.

prologue可用来定义在动作中使用类型和变量. 你可以使用预处理器命令在那里来定义宏, 或者使用#include包含干这些事情的头文件. 你需要声在那里与许多要在语法规则的动总中使用的全局标识符一起 声明词法分析器yylex和错误打印程序yyerror.

Bison declarations声明了终结符和非终结符以及操作符的优先级和各种符号语义值的各种类型.

Grammar rules定义了如何从每一个非终结符的部分构建其整体的语法规则.

Epilogue可以包括任何你想使用的代码. 在Prologue中声明的函数经常定义在这里. 在简单的程序里,剩余的所有程序可以放在这里.


[ < ] [ > ]   [ << ] [ 上层 ] [ >> ]         [顶层] [内容] [索引] [ ? ]

2. 实例-Examples

现在开始我们展示并解释三个使用Bison编写的示范程序: 一个逆波兰记号计算器,一个代数符号(中缀)计算器和一个多功能计算器. 这些程序在BSD Unix 4.3上测试无误;它们每一个虽然功能有限, 但确都是实用的互动桌面计算器.

这些例子虽然简单,但是用Bison语法编写真正的程序设计语言的道理与这相同.


[ < ] [ > ]   [ << ] [ 上层 ] [ >> ]         [顶层] [内容] [索引] [ ? ]

2.1 逆波兰记号计算器-Reverse Polish Notation Calculator

我们的第一个例子是双精度逆波兰记号(reverse polish notation)计算器(一个使用后缀操作符的计算器). 这个例子是一个很好的起点,因为没有操作符优先级的问题. 第二个例子会说明如何处理运算符优先级.

这个计算器的源代码文件叫`rpcalc.y'. `.y'是Bison惯用的输入文件的扩展名.

  • 13
    点赞
  • 80
    收藏
    觉得还不错? 一键收藏
  • 17
    评论

“相关推荐”对你有帮助么?

  • 非常没帮助
  • 没帮助
  • 一般
  • 有帮助
  • 非常有帮助
提交
评论 17
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值