Boost.Spirit User Manual Translation (3): Quick Start


Why would you want to use Spirit?


Spirit is designed to be a practical parsing tool. At the very least, the ability to generate a fully-working parser from a formal EBNF specification inlined in C++ significantly reduces development time. While it may be practical to use a full-blown, stand-alone parser such as YACC or ANTLR when we want to develop a computer language such as C or Pascal, it is certainly overkill to bring in the big guns when we wish to write extremely small micro-parsers. At that end of the spectrum, programmers typically approach the job at hand not as a formal parsing task but through ad hoc hacks using primitive tools such as scanf. True, there are tools such as regular-expression libraries (such as boost regex) or scanners (such as boost tokenizer), but these tools do not scale well when we need to write more elaborate parsers. Attempting to write even a moderately-complex parser using these tools leads to code that is hard to understand and maintain.


One prime objective is to make the tool easy to use. When one thinks of a parser generator, the usual reaction is "it must be big and complex with a steep learning curve." Not so. Spirit is designed to be fully scalable. The framework is structured in layers. This permits learning on an as-needed basis, after only learning the minimal core and basic concepts.


For development simplicity and ease in deployment, the entire framework consists of only header files, with no libraries to link against or build. Just put the Spirit distribution in your include path, compile and run. Code size? Very tight. In the quick start example that we shall present in a short while, the code size is dominated by the instantiation of the std::vector and std::iostream.


Trivial Example #1


Create a parser that will parse a floating-point number.


    real_p 

(You've got to admit, that's trivial!) The above code actually generates a Spirit real_parser (a built-in parser) which parses a floating point number. Take note that parsers that are meant to be used directly by the user end with "_p" in their names as a Spirit convention. Spirit has many pre-defined parsers and consistent naming conventions help you keep from going insane!


Trivial Example #2


Create a parser that will accept a line consisting of two floating-point numbers.


    real_p >> real_p 

Here you see the familiar floating-point numeric parser real_p used twice, once for each number. What's that >> operator doing in there? Well, the two parsers had to be separated by something, and this was chosen as the "followed by" sequence operator. The above expression creates a parser from two simpler parsers, gluing them together with the sequence operator. The result is a parser that is a composition of smaller parsers. Whitespace between numbers can implicitly be consumed depending on how the parser is invoked (see below).


Note: when we combine parsers, we end up with a "bigger" parser, but it's still a parser. Parsers can get bigger and bigger, nesting more and more, but whenever you glue two parsers together, you end up with one bigger parser. This is an important concept.


Trivial Example #3


Create a parser that will accept an arbitrary number of floating-point numbers. (Arbitrary means anything from zero to infinity)


    *real_p 

This is like a regular-expression Kleene Star, though the syntax might look a bit odd for a C++ programmer not used to seeing the * operator overloaded like this. Actually, if you know regular expressions it may look odd too since the star is before the expression it modifies. C'est la vie. Blame it on the fact that we must work with the syntax rules of C++.


Any expression that evaluates to a parser may be used with the Kleene Star. Keep in mind, though, that due to C++ operator precedence rules you may need to put the expression in parentheses for complex expressions. The Kleene Star is also known as a Kleene Closure, but we call it the Star in most places.


Example #4 [ A Just Slightly Less Trivial Example ]


This example will create a parser that accepts a comma-delimited list of numbers and put the numbers in a vector.


Step 1. Create the parser


    real_p >> *(ch_p(',') >> real_p) 

Notice ch_p(','). It is a literal character parser that can recognize the comma ','. In this case, the Kleene Star is modifying a more complex parser, namely, the one generated by the expression:


    (ch_p(',') >> real_p)


Note that this is a case where the parentheses are necessary. The Kleene star encloses the complete expression above.

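Why the parentheses matter can be seen without Spirit at all. The following is a minimal sketch, assuming a toy type `expr` (hypothetical, not a Spirit class) that only records how C++ groups the overloaded operators. Prefix `*` binds more tightly than `>>`, so without parentheses the star would apply to the comma parser alone.

```cpp
#include <cassert>
#include <string>

// Toy stand-in for a parser expression: it only records how C++ grouped
// the operators. This is NOT a Spirit type.
struct expr {
    std::string text;
};

// Binary >> ("followed by"): wrap both operands so the grouping is visible.
inline expr operator>>(expr const& a, expr const& b) {
    return expr{"(" + a.text + " >> " + b.text + ")"};
}

// Unary prefix * (Kleene star): applies to whatever operand C++ binds it to.
inline expr operator*(expr const& a) {
    return expr{"*" + a.text};
}

// Grouping that C++ chooses for:  *comma >> real
inline std::string without_parentheses() {
    expr comma{"comma"}, real{"real"};
    return (*comma >> real).text;
}

// Grouping for:  *(comma >> real)
inline std::string with_parentheses() {
    expr comma{"comma"}, real{"real"};
    return (*(comma >> real)).text;
}

// Self-check; call it from main() to run the comparison.
inline void check_groupings() {
    // The star captured only `comma`; the sequence is the outer node.
    assert(without_parentheses() == "(*comma >> real)");
    // The star encloses the whole sequence, as the grammar intends.
    assert(with_parentheses() == "*(comma >> real)");
}
```

In the first grouping the expression would mean "any number of commas, followed by one number", which is not the list grammar we are after.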

Step 2. Using a Parser (now that it's created)


Now that we have created a parser, how do we use it? Like the result of any C++ temporary object, we can either store it in a variable, or call functions directly on it.


We'll gloss over some low-level C++ details and just get to the good stuff.


If r is a rule (don't worry about what rules exactly are for now. This will be discussed later. Suffice it to say that the rule is a placeholder variable that can hold a parser), then we store the parser as a rule like this:


    r = real_p >> *(ch_p(',') >> real_p); 

Not too exciting, just an assignment like any other C++ expression you've used for years. The cool thing about storing a parser in a rule is this: rules are parsers, and now you can refer to it by name. (In this case the name is r). Notice that this is now a full assignment expression, thus we terminate it with a semicolon, ";".


That's it. We're done with defining the parser. So the next step is now invoking this parser to do its work. There are a couple of ways to do this. For now, we shall use the free parse function that takes in a char const*. The function accepts three arguments:


The null-terminated const char* input


The parser object


Another parser called the skip parser


 

In our example, we wish to skip spaces and tabs. Another parser named space_p is included in Spirit's repertoire of predefined parsers. It is a very simple parser that simply recognizes whitespace. We shall use space_p as our skip parser. The skip parser is the one responsible for skipping characters in between parser elements such as the real_p and the ch_p.


Ok, so now let's parse!


    r = real_p >> *(ch_p(',') >> real_p);
    parse(str, r, space_p) // Not a full statement yet, patience...

The parse function returns an object (called parse_info) that holds, among other things, the result of the parse. In this example, we need to know:


Did the parser successfully recognize the input str?


Did the parser fully parse and consume the input up to its end?

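The two questions are distinct: a parser can recognize a prefix of the input and still leave something unconsumed. Here is a hand-written sketch of that distinction that does not use Spirit. The names `toy_parse_info`, `skip_spaces`, `match_real`, `match_char` and `toy_parse_numbers` are hypothetical, and tolerating trailing whitespace when computing `full` is a choice made for this sketch only.

```cpp
#include <cassert>
#include <cctype>
#include <cstdlib>

// Hand-written stand-in for the result object; not Spirit's parse_info.
struct toy_parse_info {
    char const* stop;  // first character that was not consumed
    bool hit;          // the grammar matched some prefix of the input
    bool full;         // hit, and only skippable input is left
};

// Plays the role of the skip parser: step over whitespace.
inline char const* skip_spaces(char const* p) {
    while (std::isspace(static_cast<unsigned char>(*p))) ++p;
    return p;
}

// Toy real-number parser built on strtod (which also accepts forms such
// as "inf" or hex floats). On success, advances p past the number.
inline bool match_real(char const*& p) {
    char const* start = skip_spaces(p);
    char* end = nullptr;
    (void)std::strtod(start, &end);
    if (end == start) return false;  // nothing converted: no match
    p = end;
    return true;
}

// Toy literal-character parser.
inline bool match_char(char const*& p, char c) {
    char const* start = skip_spaces(p);
    if (*start != c) return false;
    p = start + 1;
    return true;
}

// Hand-coded equivalent of the grammar:  real >> *(',' >> real)
inline toy_parse_info toy_parse_numbers(char const* str) {
    assert(str != nullptr);
    char const* p = str;
    toy_parse_info info = { str, false, false };
    if (!match_real(p)) return info;      // the first number is mandatory
    for (;;) {
        char const* attempt = p;          // where this repetition begins
        if (!match_char(attempt, ',') || !match_real(attempt)) break;
        p = attempt;                      // commit only a complete ", number"
    }
    info.hit = true;
    info.stop = p;
    info.full = (*skip_spaces(p) == '\0');
    return info;
}
```

With this sketch, `"1, 2.5, 3"` yields both `hit` and `full`, while `"1, 2, x"` yields `hit` but not `full`, with `stop` left at the comma before `x`.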

To get a complete picture of what we have so far, let us also wrap this parser inside a function:


    bool
    parse_numbers(char const* str)
    {
        return parse(str, real_p >> *(',' >> real_p), space_p).full;
    }

Note in this case we dropped the named rule and inlined the parser directly in the call to parse. Upon calling parse, the expression evaluates into a temporary, unnamed parser which is passed into the parse() function, used, and then destroyed.


char and wchar_t operands



The careful reader may notice that the parser expression has ',' instead of ch_p(',') as the previous examples did. This is ok due to C++ syntax rules of conversion. There are >> operators that are overloaded to accept a char or wchar_t argument on their left or right (but not both). An operator may be overloaded if at least one of its parameters is a user-defined type. In this case, the real_p is the 2nd argument to operator>>, and so the proper overload of >> is used, converting ',' into a character literal parser.

 




The problem with omitting the ch_p call should be obvious: 'a' >> 'b' is not a Spirit parser, it is a numeric expression, right-shifting the ASCII (or another encoding) value of 'a' by the ASCII value of 'b'. However, both ch_p('a') >> 'b' and 'a' >> ch_p('b') are Spirit sequence parsers for the letter 'a' followed by 'b'. You'll get used to it, sooner or later.

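The overloading rule can be checked with a small stand-alone sketch. The types `toy_chlit` and `toy_sequence` and the helper `toy_ch` are hypothetical stand-ins for illustration, not Spirit's own classes; only the language rule they demonstrate comes from the text.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

// Toy literal-character parser (a stand-in, not Spirit's real class).
struct toy_chlit {
    char ch;
};

// Toy sequence parser: matches `first` immediately followed by `second`.
struct toy_sequence {
    char first;
    char second;
    bool matches(std::string const& s) const {
        return s.size() == 2 && s[0] == first && s[1] == second;
    }
};

// Overloads that accept a plain char on one side. They are legal because
// the other operand is a user-defined type.
inline toy_sequence operator>>(toy_chlit a, char b) { return toy_sequence{a.ch, b}; }
inline toy_sequence operator>>(char a, toy_chlit b) { return toy_sequence{a, b.ch}; }

// Helper playing the role that ch_p plays in the text.
inline toy_chlit toy_ch(char c) { return toy_chlit{c}; }

// With two built-in operands no user overload can exist: both chars are
// promoted and the built-in shift yields an int. decltype does not
// evaluate its operand, so no shift is actually performed here.
static_assert(
    std::is_same<decltype(std::declval<char>() >> std::declval<char>()), int>::value,
    "char >> char is the built-in shift and has type int");

// Self-check; call it from main() to exercise both overloads.
inline void check_overloads() {
    assert((toy_ch('a') >> 'b').matches("ab"));
    assert(('a' >> toy_ch('b')).matches("ab"));
}
```

Wrapping either operand is enough to select an overload; wrapping neither leaves the compiler with plain integer arithmetic.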

Take note that the object returned from the parse function has a member called full which returns true if both of our requirements above are met (i.e. the parser fully parsed the input).


Step 3. Semantic Actions


Our parser above is really nothing but a recognizer. It answers the question "did the input match our grammar?", but it does not remember any data, nor does it perform any side effects. Remember: we want to put the parsed numbers into a vector. This is done in an action that is linked to a particular parser. For example, whenever we parse a real number, we wish to store the parsed number after a successful match. We now wish to extract information from the parser. Semantic actions do this. Semantic actions may be attached to any point in the grammar specification. These actions are C++ functions or functors that are called whenever a part of the parser successfully recognizes a portion of the input. Say you have a parser P, and a C++ function F, you can make the parser call F whenever it matches an input by attaching F:


    P[&F] 

Or if F is a function object (a functor):


    P[F] 

The function/functor signature depends on the type of the parser to which it is attached. The parser real_p passes a single argument: the parsed number. Thus, if we were to attach a function F to real_p, we need F to be declared as:


    void F(double n);

For our example however, again, we can take advantage of some predefined semantic functors and functor generators ( A functor generator is a function that returns a functor). For our purpose, Spirit has a functor generator push_back_a(c). In brief, this semantic action, when called, appends the parsed value it receives from the parser it is attached to, to the container c.

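To make the idea of a functor generator concrete, here is a minimal stand-alone sketch of the pattern. It is not Spirit's implementation of push_back_a; `toy_push_back`, `toy_push_back_action`, `toy_real` and `toy_real_with_action` are hypothetical names chosen for illustration, and the number parsing is delegated to `strtod`.

```cpp
#include <cassert>
#include <cstdlib>
#include <vector>

// Functor that appends each value it receives to a container.
template <typename Container>
struct toy_push_back_action {
    Container* target;
    void operator()(typename Container::value_type const& value) const {
        target->push_back(value);
    }
};

// Functor generator: a function whose only job is to build the functor.
template <typename Container>
toy_push_back_action<Container> toy_push_back(Container& c) {
    return toy_push_back_action<Container>{&c};
}

// A parser with an action attached: on a successful match, call the action.
template <typename Action>
struct toy_real_with_action {
    Action action;
    bool parse(char const*& p) const {
        char* end = nullptr;
        double value = std::strtod(p, &end);
        if (end == p) return false;  // nothing converted: no match, no action
        p = end;                     // consume the number
        action(value);               // fire the semantic action
        return true;
    }
};

// Toy real-number parser; operator[] attaches an action, echoing P[F].
struct toy_real {
    template <typename Action>
    toy_real_with_action<Action> operator[](Action a) const {
        return toy_real_with_action<Action>{a};
    }
};

// Self-check; call it from main() to see the action fire once.
inline void check_action() {
    std::vector<double> v;
    char const* input = "3.5";
    bool ok = toy_real()[toy_push_back(v)].parse(input);
    assert(ok);
    assert(v.size() == 1 && v[0] == 3.5);
}
```

The action only runs after a successful match, which is why a failed parse leaves the container untouched.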

Finally, here is our complete comma-separated list parser:


    bool
    parse_numbers(char const* str, vector<double>& v)
    {
        return parse(str,

            // Begin grammar
            (
                real_p[push_back_a(v)] >> *(',' >> real_p[push_back_a(v)])
            )
            ,
            // End grammar

            space_p).full;
    }

This is the same parser as above. This time with appropriate semantic actions attached to strategic places to extract the parsed numbers and stuff them in the vector v. The parse_numbers function returns true when successful.


The full source code can be viewed here. This is part of the Spirit distribution.


 


 
 