Boost.Spirit User Manual Translation (14): Subrules

Subrules

 

Spirit is implemented using expression templates. This is a very powerful technique, but its power comes with some complications. We almost take for granted that when we write i | j >> k, where i, j and k are all integers, the result is still an integer. Yet with expression templates, the same expression i | j >> k, where i, j and k are of type T, yields a complex composite type [see Basic Concepts]. Spirit expressions, which are combinations of primitives and composites, yield an infinite set of new types. One problem is that C++ offers no easy facility to deduce the type of an arbitrarily complex expression that yields a complex type. Thus, while it is easy to write:

    int r = i | j >> k; // where i, j, and k are ints

Expression templates yield an endless supply of types. Without the rule, there is no easy way to write the equivalent declaration in C++ if i, j and k are Spirit parsers:

    <what_type???> r = i | j >> k; // where i, j, and k are Spirit parsers

If i, j and k are all chlit<> objects, the type that we want is:

    typedef
        alternative<
            chlit<>         // i
          , sequence<
                chlit<>     // j
              , chlit<>     // k
            >
        >
    rule_t;

    rule_t r = i | j >> k; // where i, j, and k are chlit<> objects

We deliberately formatted the type declaration nicely to make it understandable. Try that with a more complex expression. While it can be done, explicitly spelling out the type of a Spirit expression template is tedious and error prone: the declared type on the left hand side (lhs) has to mirror the structure of the expression on the right hand side (rhs) exactly. (Yet, if you still wish to do it, see this link for a technique.)
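The way operators build up a composite type can be sketched without Spirit at all. The following is a minimal toy, not Spirit's implementation: the namespace toy and its chlit, sequence and alternative structs are hypothetical stand-ins that only record the shape of the expression.

```cpp
#include <cassert>
#include <type_traits>

namespace toy {

// Hypothetical stand-ins for parser types; they only record structure.
struct chlit { char ch; };

template <typename A, typename B>
struct sequence { A left; B right; };

template <typename A, typename B>
struct alternative { A left; B right; };

// a >> b builds a sequence node whose type names both operand types.
template <typename A, typename B>
sequence<A, B> operator>>(A const& a, B const& b) { return sequence<A, B>{a, b}; }

// a | b builds an alternative node in the same way.
template <typename A, typename B>
alternative<A, B> operator|(A const& a, B const& b) { return alternative<A, B>{a, b}; }

} // namespace toy

// The type of i | j >> k, spelled out by hand.
typedef toy::alternative<
            toy::chlit,
            toy::sequence<toy::chlit, toy::chlit>
        > rule_t;

inline rule_t make_rule() {
    toy::chlit i{'i'}, j{'j'}, k{'k'};
    // >> binds tighter than |, so this is i | (j >> k).
    static_assert(std::is_same<decltype(i | j >> k), rule_t>::value,
                  "the hand-written type must mirror the expression");
    rule_t r = i | j >> k;
    return r;
}
```

Changing the expression to, say, (i | j) >> k makes the static_assert fail, which is exactly the maintenance burden described above.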

typeof and auto

Some compilers already support the typeof keyword. This can be used to free us from having to explicitly type the type (pun intentional). Using typeof, we can rewrite the Spirit expression above as:

    typeof(i | j >> k) r = i | j >> k;

While this is better than having to explicitly declare a complex type, it is redundant, error prone and still an eyesore: the expression is typed twice. The only way to simplify this is to introduce a macro (see this link for more information).

David Abrahams proposed in comp.std.c++ to reuse the auto keyword for type-deduced variables. This has been extensively discussed in boost.org. Example:

    auto r = i | j >> k;

Once such a C++ extension is accepted into the standard, this would be a neat solution and a nice fit for our purpose. It is not a complete solution, though, since there are still situations where we do not know the rhs beforehand, for instance when pre-declaring cyclically dependent rules.
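C++11 did adopt both ideas: decltype plays the role of typeof, and auto deduces a variable's type from its initializer. A minimal sketch follows, again using hypothetical toy types in place of Spirit parsers. Whether storing a real Spirit expression in an auto variable is safe depends on how the library's composites hold their operands, which this sketch does not address.

```cpp
#include <cassert>
#include <type_traits>

namespace toy {

// Hypothetical stand-ins for parser types; not Spirit's classes.
struct chlit { char ch; };
template <typename A, typename B> struct sequence { A left; B right; };
template <typename A, typename B> struct alternative { A left; B right; };

template <typename A, typename B>
sequence<A, B> operator>>(A const& a, B const& b) { return sequence<A, B>{a, b}; }

template <typename A, typename B>
alternative<A, B> operator|(A const& a, B const& b) { return alternative<A, B>{a, b}; }

} // namespace toy

inline char deduced_types_demo() {
    toy::chlit i{'i'}, j{'j'}, k{'k'};

    // decltype: the expression still has to be written twice.
    decltype(i | j >> k) r1 = i | j >> k;

    // auto: the type is deduced from the initializer.
    auto r2 = i | j >> k;

    typedef toy::alternative<toy::chlit,
                             toy::sequence<toy::chlit, toy::chlit> > expected_t;
    static_assert(std::is_same<decltype(r1), expected_t>::value, "decltype form");
    static_assert(std::is_same<decltype(r2), expected_t>::value, "auto form");

    (void)r1;
    return r2.right.left.ch;  // the 'j' operand of the nested sequence
}
```

Neither form helps with cyclic grammars: a variable declared with auto cannot appear in its own initializer, so a name that must be usable before its definition is known still needs a fixed type.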

Fortunately, rules come to the rescue. A rule can capture the type of the expression assigned to it. Thus:

    rule<> r = i | j >> k;  // where i, j, and k are chlit<> objects

It might not be apparent, but behind the scenes, plain rules are actually implemented using a pointer to a runtime polymorphic abstract class that holds the dynamic type of the parser assigned to it. When a Spirit expression is assigned to a rule, its type is encapsulated in a concrete subclass of the abstract class. A virtual parse function delegates the parsing to the encapsulated object.
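The mechanism described above is a form of type erasure. A minimal sketch of the idea, assuming a simplified parse interface that works on a std::string and returns the number of characters matched (or -1 on failure): the names abstract_parser, concrete_parser and the toy rule class are illustrative and do not reproduce Spirit's actual classes or scanner-based signatures.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>

namespace toy {

// Abstract interface: hides the concrete parser type behind a virtual call.
struct abstract_parser {
    virtual ~abstract_parser() {}
    virtual int parse(std::string const& in, std::size_t pos) const = 0;
};

// Concrete subclass: remembers the full static type P of the expression.
template <typename P>
struct concrete_parser : abstract_parser {
    explicit concrete_parser(P const& p) : p_(p) {}
    int parse(std::string const& in, std::size_t pos) const override {
        return p_.parse(in, pos);  // delegate to the encapsulated parser
    }
    P p_;
};

struct chlit {
    char ch;
    int parse(std::string const& in, std::size_t pos) const {
        return (pos < in.size() && in[pos] == ch) ? 1 : -1;
    }
};

template <typename A, typename B>
struct sequence {
    A left; B right;
    int parse(std::string const& in, std::size_t pos) const {
        int a = left.parse(in, pos);
        if (a < 0) return -1;
        int b = right.parse(in, pos + a);
        return b < 0 ? -1 : a + b;
    }
};

template <typename A, typename B>
struct alternative {
    A left; B right;
    int parse(std::string const& in, std::size_t pos) const {
        int a = left.parse(in, pos);
        return a >= 0 ? a : right.parse(in, pos);
    }
};

template <typename A, typename B>
sequence<A, B> operator>>(A const& a, B const& b) { return sequence<A, B>{a, b}; }

template <typename A, typename B>
alternative<A, B> operator|(A const& a, B const& b) { return alternative<A, B>{a, b}; }

// The rule has one fixed type, whatever expression is assigned to it.
class rule {
public:
    rule() {}
    template <typename P>
    rule(P const& p) : impl_(new concrete_parser<P>(p)) {}
    int parse(std::string const& in, std::size_t pos = 0) const {
        return impl_ ? impl_->parse(in, pos) : -1;  // virtual dispatch
    }
private:
    std::shared_ptr<abstract_parser> impl_;
};

} // namespace toy

inline int demo_match(std::string const& input) {
    toy::chlit i{'i'}, j{'j'}, k{'k'};
    toy::rule r = i | j >> k;  // the composite type is erased here
    return r.parse(input);
}
```

Every call to rule::parse goes through the virtual function, which is the overhead the next paragraph lists as a drawback.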

Rules have drawbacks though:

A rule is coupled to a specific scanner type [see The Scanner Business].

The rule's parse member function has a virtual function call overhead that cannot be inlined.

Static rules: subrules

The subrule is a fully static version of the rule. The subrule does not have the drawbacks listed above.

The subrule is not tied to a specific scanner, so just about any scanner type may be used.

The subrule also allows aggressive inlining, since there are no virtual function calls.

    template<int ID, typename ContextT = parser_context<> >
    class subrule;

The first template parameter gives the subrule an identification tag. Like the rule, there is a ContextT template parameter that defaults to parser_context. You need not be concerned at all with the ContextT template parameter unless you wish to tweak the low level behavior of the subrule. Detailed information on the ContextT template parameter is provided elsewhere.

Presented above is the public API. There may actually be more template parameters after ContextT. Everything after the ContextT parameter should not be of concern to the client and is strictly for internal use only.

Apart from a few minor differences, the subrule follows the usage and syntax of the rule closely. Here's the calculator grammar using subrules:

    struct calculator : public grammar<calculator>
    {
        template <typename ScannerT>
        struct definition
        {
            definition(calculator const& self)
            {
                first =
                (
                    expression  = term >> *(('+' >> term) | ('-' >> term)),
                    term        = factor >> *(('*' >> factor) | ('/' >> factor)),
                    factor      = integer | group,
                    group       = '(' >> expression >> ')'
                );
            }

            subrule<0>  expression;
            subrule<1>  term;
            subrule<2>  factor;
            subrule<3>  group;

            rule<ScannerT> first;
            rule<ScannerT> const&
            start() const { return first; }
        };
    };

A fully working example with semantic actions can be viewed here. This is part of the Spirit distribution.

 

The subrule is an efficient version of the rule. Compiler optimizations such as aggressive inlining help reduce the code size and increase performance significantly.

The subrule is not a panacea, however. Subrules push the C++ compiler hard, sometimes to its knees. For example, current compilers have a limit on recursion depth that may not be exceeded. Don't even think about writing a full Pascal grammar using subrules alone. A grammar using subrules is a single C++ expression, and current C++ compilers cannot handle very complex expressions very well. Finally, a plain rule is still needed to act as a placeholder for subrules.

The code above is a good example of the recommended way to use subrules. Notice the hierarchy. We have a grammar that encapsulates the whole calculator. The start rule is a plain rule that holds the set of subrules. The subrules in turn define the actual details of the grammar.

Template instantiation depth


Spirit pushes the C++ compiler hard. Current C++ compilers cannot handle very complex, heavily nested expressions very well. One restricting factor is the typical compiler's limit on template recursion depth. Some, but not all, compilers allow this limit to be configured.

g++'s maximum can be set using a compiler flag: -ftemplate-depth. Set this appropriately if you have a relatively complex grammar.

Microsoft Visual C++ can handle depths greater than 1000 for both class template and function template instantiation. However, the linker chokes on deep template function instantiation unless the inline recursion depth is set using these pragmas:

    #pragma inline_depth(255)
    #pragma inline_recursion(on)

Perhaps these limitations no longer apply to more current versions of these compilers. Be sure to check your compiler documentation.
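To see what the instantiation depth limit measures, consider a template that needs its own predecessor. This is a generic sketch, not taken from Spirit; the names depth and instantiation_depth_demo are made up for illustration, and the value 200 is chosen on the assumption that it sits comfortably below the default limits of current GCC and Clang.

```cpp
#include <cassert>

// depth<N> cannot be instantiated until depth<N - 1> is, so asking for
// depth<N>::value forces a chain of N nested instantiations.
template <int N>
struct depth {
    static const int value = depth<N - 1>::value + 1;
};

// The specialization for 0 ends the recursion.
template <>
struct depth<0> {
    static const int value = 0;
};

inline int instantiation_depth_demo() {
    // If a compiler rejects this for exceeding its limit, g++ accepts a
    // larger limit via -ftemplate-depth=N on the command line.
    return depth<200>::value;
}
```

Deeply nested Spirit expressions drive the compiler through the same kind of chain, only with far larger types at each level.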

This setup, a single plain rule holding a group of subrules as in the calculator grammar above, gives a good balance. The subrules do all the work. Each grammar will have only one rule: first. The rule is used just to hold the subrules and make them visible to the grammar.

The subrule definition

Like the rule, the expression after the assignment operator = defines the subrule:

    identifier = expression

Unlike rules, subrules may be defined only once. Redefining a subrule is illegal and will result in a compile time assertion.

Separators [ , ]

While rules are terminated by the semicolon ';', subrules are not terminated but are separated by the comma ','. Like Pascal statements, the last subrule in a group may not have a trailing comma.

    a = ch_p('a'),
    b = ch_p('b'),
    c = ch_p('c'), // BAD, trailing comma

    a = ch_p('a'),
    b = ch_p('b'),
    c = ch_p('c') // OK

The start subrule

Unlike with rules, parsing of a group of subrules proceeds from the start subrule. The first (topmost) subrule in a group of subrules is called the start subrule. In our example above, expression is the start subrule. When a group of subrules is invoked, the start subrule expression is called first.
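How a comma-separated group can become a single C++ expression whose first entry is the start subrule can be sketched with an overloaded comma operator. This is a toy under stated assumptions, not Spirit's implementation: the definition, group, start_id and count templates are hypothetical names, and the IDs are picked only to show that the start subrule is determined by position rather than by ID value.

```cpp
#include <cassert>

namespace toy {

struct chlit { char ch; };

// Pairs a subrule ID with the parser expression assigned to it.
template <int ID, typename P>
struct definition { P parser; };

// subrule = expr does not store anything; it yields a definition object.
template <int ID>
struct subrule {
    template <typename P>
    definition<ID, P> operator=(P const& p) const { return definition<ID, P>{p}; }
};

// A group is a left-nested list built up by the comma operator.
template <typename Init, typename Last>
struct group { Init init; Last last; };

template <int I1, typename P1, int I2, typename P2>
group<definition<I1, P1>, definition<I2, P2> >
operator,(definition<I1, P1> const& a, definition<I2, P2> const& b) {
    return group<definition<I1, P1>, definition<I2, P2> >{a, b};
}

template <typename I, typename L, int ID, typename P>
group<group<I, L>, definition<ID, P> >
operator,(group<I, L> const& g, definition<ID, P> const& d) {
    return group<group<I, L>, definition<ID, P> >{g, d};
}

// The start subrule is the leftmost (first written) definition.
template <typename T> struct start_id;
template <int ID, typename P>
struct start_id<definition<ID, P> > { static const int value = ID; };
template <typename I, typename L>
struct start_id<group<I, L> > { static const int value = start_id<I>::value; };

// Number of definitions in a group.
template <typename T> struct count;
template <int ID, typename P>
struct count<definition<ID, P> > { static const int value = 1; };
template <typename I, typename L>
struct count<group<I, L> > { static const int value = count<I>::value + count<L>::value; };

} // namespace toy

inline int demo_start_id() {
    toy::subrule<3> a;
    toy::subrule<1> b;
    toy::subrule<2> c;
    auto g = (
        a = toy::chlit{'a'},
        b = toy::chlit{'b'},
        c = toy::chlit{'c'}   // no trailing comma: the expression ends here
    );
    static_assert(toy::count<decltype(g)>::value == 3, "three definitions");
    (void)g;
    return toy::start_id<decltype(g)>::value;
}
```

Because = binds tighter than the comma, each definition is formed first and the commas then chain them from left to right; a trailing comma would leave the last comma operator without a right operand, which is why it is rejected.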

IDs

Each subrule has a corresponding ID: an integral constant that uniquely identifies the subrule. Our example above has four subrules. They are declared as:

    subrule<0>  expression;
    subrule<1>  term;
    subrule<2>  factor;
    subrule<3>  group;

Aliases

It is possible to have subrules with the same ID. A subrule with the same ID as another will be an alias of the other. Both subrules may be used interchangeably.

    subrule<0>  a;
    subrule<0>  alias; // alias of a

Groups: scope and nesting

The scope of a subrule and its definition is the enclosing group, typically (and by convention) enclosed inside parentheses. IDs outside a scope are not directly visible. Inner subrule groups can be nested by enclosing each sub-group inside another set of parentheses. Each group is unique and acts independently. Consequently, while it may not be advisable to do so, a subrule in a group may share the same ID as a subrule in another group, since both groups are independent of each other.

    subrule<0> a;
    subrule<1> b;
    subrule<0> c;
    subrule<1> d;

    (                       // outer subrule group, scope of a and b
        a = ch_p('a'),
        b =
        (                   // inner subrule group, scope of c and d
            c = ch_p('c'),
            d = ch_p('d')
        )
    )

Subrule IDs need to be unique only within a group. A grammar is an implicit group. Furthermore, even subrules in the same grammar may have the same IDs without clashing if they are in different groups. Subrules may be explicitly grouped using parentheses. Parenthesized groups have unique scopes. In the code above, the outer subrule group defines the subrules a and b while the inner subrule group defines the subrules c and d. Notice that the definition of b is the inner subrule group.

 

