Boost.Spirit User Manual Translation (4): Basic Concepts


qingcairousi


Basic Concepts

There are a few fundamental concepts that need to be understood well: 1) The Parser, 2) The Match, 3) The Scanner, and 4) Semantic Actions. These basic concepts interact with one another, and the functionalities of each interweave throughout the framework to make it one coherent whole.

The Parser

Central to the framework is the parser. The parser does the actual work of recognizing a linear input stream of data read sequentially from start to end by the scanner. The parser attempts to match the input following a well-defined set of specifications known as grammar rules. The parser reports success or failure to its client through a match object. When successful, the parser calls a client-supplied semantic action. Finally, the semantic action extracts structural information depending on the data passed by the parser and the hierarchical context of the parser it is attached to.

Parsers come in different flavors. The Spirit framework comes bundled with an extensive set of pre-defined parsers that perform various parsing tasks from the trivial to the complex. The parser, as a concept, has a public conceptual interface contract. Following the contract, anyone can write a conforming parser that will play along well with the framework's predefined components. We shall provide a blueprint detailing the conceptual interface of the parser later.

Clients of the framework generally do not need to write their own hand-coded parsers at all. Spirit has an immense repertoire of pre-defined parsers covering all aspects of syntax and semantic analysis. We shall examine this repertoire of parsers in the following sections. In the rare case where a specific functionality is not available, it is extremely easy to write a user-defined parser. The ease in writing a parser entity is the main reason for Spirit's extensibility.

Primitives and Composites

Spirit parsers fall into two categories: primitives and composites. These two categories are more or less synonymous with terminals and non-terminals in parsing lingo. Primitives are non-decomposable atomic units. Composites, on the other hand, are parsers that are composed of other parsers, each of which can in turn be a primitive or another composite. To illustrate, consider the Spirit expression:

    real_p >> *(',' >> real_p)

real_p is a primitive parser that can parse real numbers. The quoted comma ',' in the expression is a shortcut and is equivalent to ch_p(','), which is another primitive parser that recognizes single characters.

The expression above corresponds to a parse tree (the figure is not reproduced here): a sequence whose left child is real_p and whose right child is a kleene_star wrapping an inner sequence of ch_p(',') and real_p.

The expression:

    ',' >> real_p

composes a sequence parser. The sequence parser is a composite parser comprising two parsers: the one on its left hand side (lhs), ch_p(','), and the other on its right hand side (rhs), real_p. This composite parser, when called, calls its lhs and rhs in sequence and reports a successful match only if both are successful.

The sequence parser is a binary composite. It is composed of two parsers. There are unary composites as well. Unary composites hold only a single subject. Like the binary composite, the unary composite may change the behavior of its embedded subject. One particular example is the Kleene star. The Kleene star, when called to parse, calls its sole subject zero or more times. "Zero or more" implies that the Kleene star always returns a successful match, possibly matching the null string: "".
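The behavior of the sequence and the Kleene star can be sketched without any Spirit headers. The following is a toy model, not Spirit's implementation: the names Parser, ch, digits, seq, star and number_list are invented for this illustration, and digits stands in for real_p by matching only a run of decimal digits.

```cpp
#include <cstddef>
#include <functional>
#include <string>

// Toy model (assumption, not Spirit's classes): a parser is a callable that
// tries to match at `pos`. On success it advances `pos` and returns true;
// on failure it returns false and leaves `pos` unchanged.
using Parser = std::function<bool(const std::string& in, std::size_t& pos)>;

// Primitive: matches exactly one given character (the role ch_p plays).
Parser ch(char c) {
    return [c](const std::string& in, std::size_t& pos) {
        if (pos < in.size() && in[pos] == c) { ++pos; return true; }
        return false;
    };
}

// Primitive: matches one or more decimal digits (simplified stand-in for real_p).
Parser digits() {
    return [](const std::string& in, std::size_t& pos) {
        std::size_t start = pos;
        while (pos < in.size() && in[pos] >= '0' && in[pos] <= '9') ++pos;
        return pos > start;
    };
}

// Binary composite: calls lhs, then rhs; succeeds only if both succeed.
Parser seq(Parser lhs, Parser rhs) {
    return [lhs, rhs](const std::string& in, std::size_t& pos) {
        std::size_t save = pos;
        if (lhs(in, pos) && rhs(in, pos)) return true;
        pos = save;  // a failed sequence consumes nothing
        return false;
    };
}

// Unary composite: applies its subject zero or more times; always succeeds.
Parser star(Parser subject) {
    return [subject](const std::string& in, std::size_t& pos) {
        for (;;) {
            std::size_t before = pos;
            // Stop when the subject fails or makes no progress.
            if (!subject(in, pos) || pos == before) break;
        }
        return true;
    };
}

// Counterpart of real_p >> *(',' >> real_p) in this toy model.
Parser number_list() {
    return seq(digits(), star(seq(ch(','), digits())));
}
```

Calling number_list() on an input with a trailing comma, such as "12,", still succeeds but stops before the comma: the inner sequence fails on the missing number, restores the position, and the star ends its loop with a successful match.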

The expression:

    *(',' >> real_p)

wraps the whole sequence composite above inside a kleene_star.

Finally, the full expression composes a real_p primitive parser and the kleene_star we have above into another, higher-level sequence parser composite.

A few simple classes, when composed and structured in a hierarchy, form a very powerful object-oriented recursive-descent parsing engine. These classes provide the infrastructure needed for the construction of more complex parsers. The final parser composite is a non-deterministic recursive-descent parser with infinite look-ahead.

Top-down descent traverses the hierarchy. The outer sequence calls the leftmost real_p parser. If successful, the kleene_star is called next. The kleene_star calls the inner sequence repeatedly in a loop until it fails to match, or the input is exhausted. Inside, ch_p(',') and then real_p are called in sequence. The original manual illustrates this with a diagram (not reproduced here) that is somewhat reminiscent of Pascal syntax diagrams.

The flexibility of object embedding and composition combined with recursion opens up a unique approach to parsing. Subclasses are free to form aggregates and algorithms of arbitrary complexity. Complex parsers can be created with the composition of only a few primitive classes.

The framework is designed to be fully open-ended and extensible. New primitives or composites, from the trivial to the complex, may be added at any time. Composition happens (statically) at compile time. This is possible through the expressive flexibility of C++ expression templates and template meta-programming.
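To make the idea of static composition concrete, here is a minimal expression-template sketch. It is a hypothetical miniature and not Spirit's source: the types parser, chlit, real_parser, sequence and kleene_star are defined locally for the illustration, and they only record structure without doing any parsing. The point is that the operators build a nested type at compile time.

```cpp
#include <type_traits>

// Hypothetical miniature, not Spirit's source. Every parser type derives
// from parser<Derived> (CRTP) so the operators below apply only to parsers.
template <typename Derived>
struct parser {
    const Derived& derived() const { return static_cast<const Derived&>(*this); }
};

// Primitives: a single-character parser and a placeholder real-number parser.
struct chlit : parser<chlit> {
    char ch;
    explicit chlit(char c) : ch(c) {}
};
struct real_parser : parser<real_parser> {};

// Binary composite: stores copies of both operands.
template <typename A, typename B>
struct sequence : parser<sequence<A, B> > {
    A lhs;
    B rhs;
    sequence(const A& a, const B& b) : lhs(a), rhs(b) {}
};

// Unary composite: stores a copy of its single subject.
template <typename S>
struct kleene_star : parser<kleene_star<S> > {
    S subject;
    explicit kleene_star(const S& s) : subject(s) {}
};

// a >> b builds a sequence type from the operand types.
template <typename A, typename B>
sequence<A, B> operator>>(const parser<A>& a, const parser<B>& b) {
    return sequence<A, B>(a.derived(), b.derived());
}

// ',' >> b is shorthand for chlit(',') >> b.
template <typename B>
sequence<chlit, B> operator>>(char c, const parser<B>& b) {
    return sequence<chlit, B>(chlit(c), b.derived());
}

// *a builds a kleene_star type around the operand type.
template <typename S>
kleene_star<S> operator*(const parser<S>& s) {
    return kleene_star<S>(s.derived());
}

const real_parser real_p = real_parser();

// The whole hierarchy is encoded in the static type of the expression.
typedef decltype(real_p >> *(',' >> real_p)) list_type;
static_assert(
    std::is_same<list_type,
                 sequence<real_parser,
                          kleene_star<sequence<chlit, real_parser> > > >::value,
    "the composite's structure is part of its type");
```

Because the nesting is known to the compiler, no virtual dispatch is needed to walk from the outer sequence down to the primitives in this sketch.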

The result is a composite composed of primitives and smaller composites. This embedding strategy gives us the ability to build hierarchical structures that fully model EBNF expressions of arbitrary complexity. Later on, we shall see more primitive and composite building blocks.

The Scanner

Like the parser, the scanner is also an abstract concept. The task of the scanner is to feed the sequential input data stream to the parser. The scanner is composed of two STL-conforming forward iterators, first and last, where first is held by reference and last by value. The first iterator is held by reference to allow re-positioning by the parser. A set of policies governs how the scanner behaves. Parsers extract data from the scanner and position the iterator appropriately through the scanner's member functions.
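Why first is held by reference can be seen in a small sketch. The class toy_scanner and the helpers match_char and count_leading below are hypothetical names for this illustration; Spirit's real scanner has a richer, policy-driven interface.

```cpp
#include <cstddef>
#include <string>

// Hypothetical scanner, not Spirit's class: `first` is stored by reference
// and `last` by value, as described in the text.
template <typename Iterator>
class toy_scanner {
public:
    toy_scanner(Iterator& first, Iterator last) : first_(first), last_(last) {}
    bool at_end() const { return first_ == last_; }
    char operator*() const { return *first_; }
    void advance() { ++first_; }  // moves the caller's iterator
private:
    Iterator& first_;  // reference: parsers can re-position the caller's iterator
    Iterator last_;    // value: the end of input never moves
};

// A primitive that consumes one expected character through the scanner.
template <typename Scanner>
bool match_char(Scanner& scan, char expected) {
    if (scan.at_end() || *scan != expected) return false;
    scan.advance();
    return true;
}

// Counts how many leading characters of `text` equal `c`, measured by how
// far the caller-owned iterator `it` was moved by the parser.
std::size_t count_leading(const std::string& text, char c) {
    std::string::const_iterator it = text.begin();
    toy_scanner<std::string::const_iterator> scan(it, text.end());
    while (match_char(scan, c)) {
    }
    return static_cast<std::size_t>(it - text.begin());
}
```

After the loop, it has advanced even though only the scanner was handed to match_char, so the caller can tell how much input was consumed.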

Knowledge of the intricacies of these policies is not required at all in most cases. However, knowledge of the scanner's basic API is required to write fully-conforming Spirit parsers. The scanner's API will be outlined in a separate section. In addition, for the power users and the adventurous among us, a full section will be devoted to covering the scanner policies. The scanner policies make Spirit very flexible and extensible. For instance, some of the policies may be modified to filter data. A practical example is a scanner policy that does not distinguish upper and lower case, thereby making it useful for parsing case-insensitive input. Another example is a scanner policy that strips white space from the input.
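The two policy examples mentioned above can be sketched with a template parameter. The policy structs and the function match_literal are assumptions made up for this illustration; Spirit's actual scanner policies are organized differently and are covered in their own section of the manual.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical policies, not Spirit's real policy classes. Each policy says
// how to filter a character and whether a character should be skipped.
struct plain_policy {
    static char filter(char c) { return c; }
    static bool skip(char) { return false; }
};

// Folds input to lower case, so matching becomes case-insensitive.
struct nocase_policy {
    static char filter(char c) {
        return static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    }
    static bool skip(char) { return false; }
};

// Strips white space from the input before each character is matched.
struct skip_space_policy {
    static char filter(char c) { return c; }
    static bool skip(char c) {
        return std::isspace(static_cast<unsigned char>(c)) != 0;
    }
};

// Matches `word` at the start of `input` under the given policy. Returns the
// number of input characters consumed, or -1 on failure. With nocase_policy,
// `word` is expected to be written in lower case.
template <typename Policy>
int match_literal(const std::string& input, const std::string& word) {
    std::size_t pos = 0;
    for (std::size_t i = 0; i < word.size(); ++i) {
        while (pos < input.size() && Policy::skip(input[pos])) ++pos;
        if (pos == input.size() || Policy::filter(input[pos]) != word[i]) return -1;
        ++pos;
    }
    return static_cast<int>(pos);
}
```

The matching loop itself never changes; only the policy decides what the parser gets to see, which is the kind of flexibility the text attributes to scanner policies.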

The Match

The parser has a conceptual parse member function taking in a scanner and returning a match object. The primary function of the match object is to report parsing success (or failure) back to the parser's caller; i.e., it evaluates to true if the parse function is successful, false otherwise. If the parse is successful, the match object may also be queried to report the number of characters matched (using match.length()). The length is non-negative if the match is successful, and the typical length of a parse failure is -1. A zero length is perfectly valid and still represents a successful match.

A parser may have attribute data associated with it. For example, the real_p parser has a numeric datum associated with it. This attribute is the parsed number. The attribute is passed on to the returned match object, which may be queried to get it. The datum is valid only when the match is successful.
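The contract described in this section (true on success, a non-negative length on success, -1 on failure, and an attribute that is meaningful only on success) might be modeled as follows. toy_match, parse_number and parse_xs are hypothetical names; parse_number handles only unsigned decimal integers as a simplified stand-in for real_p, and the attribute chosen for parse_xs is arbitrary.

```cpp
#include <cstddef>
#include <string>

// Hypothetical match type modeled on the description; not Spirit's class.
template <typename T>
class toy_match {
public:
    toy_match() : len_(-1), value_() {}                       // failed match
    toy_match(int len, const T& v) : len_(len), value_(v) {}  // successful match
    explicit operator bool() const { return len_ >= 0; }
    int length() const { return len_; }
    const T& value() const { return value_; }  // meaningful only on success
private:
    int len_;
    T value_;
};

// Simplified stand-in for real_p: parses an unsigned decimal integer at the
// start of `in` and carries the parsed number as the match attribute.
toy_match<double> parse_number(const std::string& in) {
    std::size_t n = 0;
    double v = 0;
    while (n < in.size() && in[n] >= '0' && in[n] <= '9') {
        v = v * 10 + (in[n] - '0');
        ++n;
    }
    if (n == 0) return toy_match<double>();
    return toy_match<double>(static_cast<int>(n), v);
}

// Stand-in for a Kleene star over 'x': it cannot fail, so a zero-length
// match is a success. The attribute here is simply the repetition count.
toy_match<int> parse_xs(const std::string& in) {
    int n = 0;
    while (static_cast<std::size_t>(n) < in.size() && in[n] == 'x') ++n;
    return toy_match<int>(n, n);
}
```

Testing the match in a boolean context and checking its length are therefore two different questions: parse_xs applied to "abc" yields a true match of length 0, while parse_number applied to the same input yields a false match of length -1.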

Semantic Actions

A composite parser forms a hierarchy. Parsing proceeds from the topmost parent parser, which delegates and apportions the parsing task to its children, recursively to its children's children, and so on until a primitive is reached. By attaching semantic actions to various points in this hierarchy, in effect we can transform the flat linear input stream into a structured representation. This is essentially what parsers do.

Recall our example above:

    real_p >> *(',' >> real_p)

By hooking a function (or functor) into the real_p parsers, we can extract the numbers from the input:

    real_p[&f] >> *(',' >> real_p[&f])

where f is a function that takes in a single argument. The [&f] hooks the parser with the function such that when real_p recognizes a valid number, the function f is called. It is then up to the function to do what is appropriate. For example, it can stuff the numbers in a vector. Or perhaps, if the grammar is changed slightly by replacing ',' with '+', then we have a primitive calculator that computes sums. The function f can then be made to add all incoming numbers.
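Both uses of f described above, collecting numbers into a vector and summing them, can be sketched with a hand-written stand-in for the grammar. Nothing below is Spirit API: Action, number, parse_list, collect and sum are names invented for this sketch, and number accepts only digits with an optional fractional part.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical action type: called with each recognized number.
typedef std::function<void(double)> Action;

// Stand-in for real_p[act]: parses digits with an optional fractional part.
// On success it advances `pos`, fires the action, and returns true.
bool number(const std::string& in, std::size_t& pos, const Action& act) {
    std::size_t p = pos;
    double v = 0;
    while (p < in.size() && in[p] >= '0' && in[p] <= '9') v = v * 10 + (in[p++] - '0');
    if (p == pos) return false;  // no digits: no match, no action
    if (p + 1 < in.size() && in[p] == '.' && in[p + 1] >= '0' && in[p + 1] <= '9') {
        ++p;
        double scale = 0.1;
        while (p < in.size() && in[p] >= '0' && in[p] <= '9') {
            v += (in[p++] - '0') * scale;
            scale *= 0.1;
        }
    }
    pos = p;
    act(v);
    return true;
}

// Stand-in for number[act] >> *(sep >> number[act]).
// Returns true only if the whole input was matched.
bool parse_list(const std::string& in, char sep, const Action& act) {
    std::size_t pos = 0;
    if (!number(in, pos, act)) return false;
    for (;;) {
        std::size_t save = pos;
        if (pos < in.size() && in[pos] == sep) {
            ++pos;
            if (number(in, pos, act)) continue;
        }
        pos = save;  // the inner sequence failed: undo the separator
        break;
    }
    return pos == in.size();
}

// Action that stuffs the numbers in a vector.
std::vector<double> collect(const std::string& in) {
    std::vector<double> out;
    parse_list(in, ',', [&out](double d) { out.push_back(d); });
    return out;
}

// Same grammar with '+' as the separator and an action that adds the numbers.
double sum(const std::string& in) {
    double total = 0;
    parse_list(in, '+', [&total](double d) { total += d; });
    return total;
}
```

The grammar code is identical in both cases; only the separator and the attached action differ, which is the separation of concerns that semantic actions are meant to provide. Note that in this sketch an action fires as soon as its number is recognized, even if a later part of the input fails to match.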

Copyright © 1998-2003 Joel de Guzman

Use, modification and distribution is subject to the Boost Software License, Version 1.0. (See accompanying file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)