Boost.Spirit User Manual Translation (6): Primitives

Latest recommended article published on 2008-01-25 16:45:00

qingcairousi

Primitives

The framework predefines some parser primitives. These are the most basic building blocks that the client uses to build more complex parsers. These primitive parsers are template classes, which makes them very flexible.

These primitive parsers can be instantiated directly or through a templatized helper function. Generally, the helper function is far simpler to deal with because it involves less typing.

We have already seen the character literal parser through the generator function ch_p, which is not really a parser but rather a parser generator. Class chlit<CharT> is the actual template class behind the character literal parser. To instantiate a chlit object, you must explicitly provide the character type, CharT, as a template parameter. This type typically corresponds to the input type, usually char or wchar_t. The following expression creates a temporary parser object that recognizes the single letter 'X'.

    chlit<char>('X');

Using chlit's generator function ch_p simplifies the use of the chlit<> class (this is true of most Spirit parser classes, since most have corresponding generator functions). Calling the function is convenient because the compiler deduces the template type from the function argument. The example above can be expressed less verbosely with the ch_p helper function.

    ch_p('X')  // equivalent to chlit<char>('X') object
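
The relationship between a parser class and its generator function can be pictured with a small stand-alone sketch. It does not use Spirit: chlit_sketch and ch_sketch are hypothetical names invented for illustration, and the real chlit and ch_p are implemented differently. The sketch only shows how a function template lets the compiler deduce the character type that would otherwise be spelled out.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for chlit<CharT>; not Spirit's implementation.
template <typename CharT = char>
struct chlit_sketch {
    explicit chlit_sketch(CharT c) : ch(c) {}
    // True when the input character equals the stored literal.
    bool test(CharT input) const { return input == ch; }
    CharT ch;
};

// Hypothetical stand-in for ch_p: CharT is deduced from the argument.
template <typename CharT>
chlit_sketch<CharT> ch_sketch(CharT c) {
    return chlit_sketch<CharT>(c);
}

// The deduced types are the same as the explicitly spelled ones.
static_assert(std::is_same<decltype(ch_sketch('X')),
                           chlit_sketch<char> >::value,
              "a char argument yields chlit_sketch<char>");
static_assert(std::is_same<decltype(ch_sketch(L'X')),
                           chlit_sketch<wchar_t> >::value,
              "a wchar_t argument yields chlit_sketch<wchar_t>");

// Both spellings build a parser that accepts 'X' and rejects 'Y'.
inline void generator_demo() {
    assert(chlit_sketch<char>('X').test('X'));
    assert(ch_sketch('X').test('X'));
    assert(!ch_sketch('X').test('Y'));
}
```

The function form saves typing precisely because the template argument never has to be written at the call site.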

Parser generators

Whenever you see an invocation of a parser generator function, it is equivalent to the parser itself. Therefore, we often call ch_p a character parser even though, technically speaking, it is a function that generates a character parser.

The following grammar snippet shows these forms in action:

    // a rule can "store" a parser object.  They're covered
    // later, but for now just consider a rule as an opaque type
    rule<> r1, r2, r3;

    chlit<char> x('X');     // declare a parser named x

    r1 = chlit<char>('X');  //  explicit declaration
    r2 = x;                 //  using x
    r3 = ch_p('X');         //  using the generator

chlit and ch_p

Matches a single character literal. chlit has a single template type parameter, which defaults to char (i.e. chlit<> is equivalent to chlit<char>). This type parameter is the character type that chlit recognizes when parsing. The generator function deduces the template type parameter from the actual function argument. The chlit constructor accepts a single parameter: the character to match the input against. Examples:

    r1 = chlit<>('X');
    r2 = chlit<wchar_t>(L'X');
    r3 = ch_p('X');

Going back to our original example:

    group = '(' >> expr >> ')';
    expr1 = integer | group;
    expr2 = expr1 >> *(('*' >> expr1) | ('/' >> expr1));
    expr  = expr2 >> *(('+' >> expr2) | ('-' >> expr2));

the character literals '(', ')', '+', '-', '*' and '/' in the grammar declaration are chlit objects that are implicitly created behind the scenes.

char operands

This works because of two special templatized overloads of operator>> that take a (char, ParserT) or a (ParserT, char). These functions convert the character into a chlit object.

One may prefer to declare these explicitly as:

    chlit<> plus('+');
    chlit<> minus('-');
    chlit<> times('*');
    chlit<> divide('/');
    chlit<> oppar('(');
    chlit<> clpar(')');
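
The implicit conversion of char operands can be illustrated with a stand-alone model. It is a minimal sketch and not Spirit's code: parser_sketch, lit_sketch, seq_sketch and match_length are hypothetical names, and the parse interface (a string plus a position, returning a match length or -1) is an assumption chosen to keep the example short. What it shows is the overload pattern described above: one operator>> for two parsers, plus two overloads that wrap a plain char in a literal parser.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Common base so the operators below apply only to parser types.
template <typename Derived>
struct parser_sketch {
    const Derived& derived() const {
        return static_cast<const Derived&>(*this);
    }
};

// Single-character literal parser (plays the role of chlit<char>).
struct lit_sketch : parser_sketch<lit_sketch> {
    explicit lit_sketch(char c) : ch(c) {}
    // Returns the number of characters consumed, or -1 on failure.
    long parse(const std::string& in, std::size_t pos) const {
        return (pos < in.size() && in[pos] == ch) ? 1 : -1;
    }
    char ch;
};

// Sequence: the left parser must match, then the right one.
template <typename A, typename B>
struct seq_sketch : parser_sketch<seq_sketch<A, B> > {
    seq_sketch(const A& a_, const B& b_) : a(a_), b(b_) {}
    long parse(const std::string& in, std::size_t pos) const {
        long la = a.parse(in, pos);
        if (la < 0) return -1;
        long lb = b.parse(in, pos + static_cast<std::size_t>(la));
        return lb < 0 ? -1 : la + lb;
    }
    A a;
    B b;
};

// parser >> parser
template <typename A, typename B>
seq_sketch<A, B> operator>>(const parser_sketch<A>& a,
                            const parser_sketch<B>& b) {
    return seq_sketch<A, B>(a.derived(), b.derived());
}

// char >> parser: the char is wrapped in a literal parser.
template <typename B>
seq_sketch<lit_sketch, B> operator>>(char a, const parser_sketch<B>& b) {
    return seq_sketch<lit_sketch, B>(lit_sketch(a), b.derived());
}

// parser >> char: same wrapping on the right-hand side.
template <typename A>
seq_sketch<A, lit_sketch> operator>>(const parser_sketch<A>& a, char b) {
    return seq_sketch<A, lit_sketch>(a.derived(), lit_sketch(b));
}

// Length of the match at the start of `in`, or -1 if there is none.
template <typename P>
long match_length(const parser_sketch<P>& p, const std::string& in) {
    return p.derived().parse(in, 0);
}

inline void char_operand_demo() {
    // '(' and ')' are plain chars; the overloads turn them into parsers.
    assert(match_length('(' >> lit_sketch('x') >> ')', "(x)") == 3);
}
```

Note that at least one operand of each >> must already be a parser. An expression such as '(' >> ')' involves only chars, so it would be an ordinary integer shift rather than a sequence.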

range and range_p

A range of characters is created from a low/high character pair. Such a parser matches a single character that is in the range, including both endpoints. Like chlit, range has a single template type parameter, which defaults to char. The range constructor accepts two parameters: the first and last characters of the range (inclusive) to match the input against. The generator function is range_p. Examples:

    range<>('A','Z')    // matches 'A'..'Z'
    range_p('a','z')    // matches 'a'..'z'

Note that the first character must come "before" the second according to the underlying character encoding. range, like chlit, is a single character parser.

Character mapping

Character mapping is inherently platform dependent. The standard does not guarantee, for example, that 'A' < 'Z'. In many cases, however, we know which character set we are using, such as ASCII, ISO-8859-1 or Unicode. Take care, though, when porting to another platform.
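
The inclusive endpoints and the ordering requirement can be modeled in a few lines. This is a stand-alone sketch with hypothetical names (range_sketch, range_sketch_p), not Spirit's range class. The assert in the constructor is an assumption about how one might enforce the "first before last" rule; the comparison uses the numeric character codes of whatever encoding the platform provides.

```cpp
#include <cassert>

// Hypothetical model of a closed character range [first, last].
template <typename CharT = char>
struct range_sketch {
    range_sketch(CharT first_, CharT last_) : first(first_), last(last_) {
        // The endpoints are compared by character code, so this check
        // depends on the platform's character encoding.
        assert(!(last < first));
    }
    // Both endpoints are part of the range.
    bool test(CharT c) const { return !(c < first) && !(last < c); }
    CharT first;
    CharT last;
};

// Hypothetical generator function: CharT is deduced from the arguments.
template <typename CharT>
range_sketch<CharT> range_sketch_p(CharT first, CharT last) {
    return range_sketch<CharT>(first, last);
}

inline void range_demo() {
    assert(range_sketch_p('a', 'z').test('a'));  // lower endpoint matches
    assert(range_sketch_p('a', 'z').test('z'));  // upper endpoint matches
}
```

On an ASCII platform range_sketch_p('a', 'z') rejects 'A', because the upper-case letters have smaller codes than the lower-case ones.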

strlit and str_p

This parser matches a string literal. strlit has a single template type parameter: an iterator type. Internally, strlit holds a begin/end iterator pair pointing to a string or a container of characters. strlit attempts to match the current input stream against this string. The template type parameter defaults to char const*. strlit has two constructors. The first accepts a null-terminated character pointer and can be used to build strlits from quoted string literals. The second takes a first/last iterator pair. The generator function is str_p. Examples:

    strlit<>("Hello World")
    str_p("Hello World")

    std::string msg("Hello World");
    strlit<std::string::const_iterator>(msg.begin(), msg.end());

Character and phrase level parsing

Typical parsers regard the processing of characters (symbols that form words or lexemes) and phrases (words that form sentences) as separate domains. Entities such as reserved words, operators, literal strings and numerical constants, which constitute the terminals of a grammar, are usually extracted first in a separate lexical analysis stage.

At this point, as is evident in the examples so far, it is important to note that, contrary to standard practice, the Spirit framework handles parsing tasks at both the character level and the phrase level. One may consider a lexical analyzer to be seamlessly integrated into the Spirit framework.

Although the Spirit parser library does not need a separate lexical analyzer, there is no reason why we cannot have one. One can always have as many parser layers as needed. In theory, one may create a preprocessor, a lexical analyzer and a parser proper, all using the same framework.

chseq and chseq_p

Matches a character sequence. chseq has the same template type parameters and constructor parameters as strlit. The generator function is chseq_p. Examples:

    chseq<>("ABCDEFG")
    chseq_p("ABCDEFG")

strlit is an implicit lexeme. That is, it works solely on the character level. chseq, strlit's twin, on the other hand, can work on both the character and phrase levels. This simply means that it can ignore white space between the characters of the string. For example:

    chseq<>("ABCDEFG")

can parse:

    ABCDEFG
    A B C D E F G
    AB CD EFG
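
The difference between the two matching modes can be sketched without Spirit. The functions below are hypothetical (match_contiguous and match_skipping_space are names made up for this example) and they assume that phrase-level parsing means skipping white space before each character, which is the situation the inputs above describe. In Spirit itself the skipping is done by whatever skip parser is in effect, not by chseq directly.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Character-level match, in the manner of strlit: the literal must
// appear contiguously at the start of the input.
bool match_contiguous(const std::string& literal, const std::string& input) {
    return input.compare(0, literal.size(), literal) == 0;
}

// Phrase-level match, in the manner of chseq: white space in the input
// is skipped before each character of the literal.
bool match_skipping_space(const std::string& literal,
                          const std::string& input) {
    std::size_t pos = 0;
    for (std::size_t i = 0; i < literal.size(); ++i) {
        while (pos < input.size() &&
               std::isspace(static_cast<unsigned char>(input[pos]))) {
            ++pos;
        }
        if (pos == input.size() || input[pos] != literal[i]) {
            return false;
        }
        ++pos;
    }
    return true;
}

// Self-check using one of the spaced inputs listed above.
inline void chseq_demo() {
    assert(match_skipping_space("ABCDEFG", "A B C D E F G"));
    assert(!match_contiguous("ABCDEFG", "A B C D E F G"));
}
```

Both functions accept the unspaced input ABCDEFG; only the skipping variant accepts the two spaced forms.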

More character parsers

negation ~

Single character parsers such as chlit, range, anychar_p, alnum_p etc. can be negated. For example:

    ~ch_p('x')

matches any character except 'x'. Double negation of a character parser cancels out the negation: ~~alpha_p is equivalent to alpha_p.
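
One way to picture how double negation can cancel out is to give the negated parser its own operator~ that hands back the original parser. The following is a stand-alone sketch under that assumption. The names char_parser_sketch, lit_sketch, alpha_sketch and negated_sketch are hypothetical, and Spirit's actual mechanism may differ in detail.

```cpp
#include <cassert>
#include <cctype>
#include <type_traits>
#include <utility>

// Common base so operator~ applies only to single-character parsers.
template <typename Derived>
struct char_parser_sketch {
    const Derived& derived() const {
        return static_cast<const Derived&>(*this);
    }
};

// Matches one specific character.
struct lit_sketch : char_parser_sketch<lit_sketch> {
    explicit lit_sketch(char c) : ch(c) {}
    bool test(char c) const { return c == ch; }
    char ch;
};

// Matches any alphabetic character.
struct alpha_sketch : char_parser_sketch<alpha_sketch> {
    bool test(char c) const {
        return std::isalpha(static_cast<unsigned char>(c)) != 0;
    }
};

// Matches exactly the characters that the wrapped parser rejects.
template <typename P>
struct negated_sketch : char_parser_sketch<negated_sketch<P> > {
    explicit negated_sketch(const P& p) : positive(p) {}
    bool test(char c) const { return !positive.test(c); }
    P positive;
};

// ~p wraps p in a negated parser.
template <typename P>
negated_sketch<P> operator~(const char_parser_sketch<P>& p) {
    return negated_sketch<P>(p.derived());
}

// ~~p unwraps instead of nesting, so the result has the original type.
template <typename P>
P operator~(const negated_sketch<P>& n) {
    return n.positive;
}

static_assert(std::is_same<decltype(~~std::declval<const alpha_sketch&>()),
                           alpha_sketch>::value,
              "double negation yields the original parser type");

inline void negation_demo() {
    lit_sketch x('x');
    assert((~x).test('y'));   // anything except 'x' matches
    assert(!(~x).test('x'));  // 'x' itself is rejected
}
```

Because the second overload is an exact match for a negated parser, overload resolution prefers it over the general one, and no doubly wrapped type is ever created.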

eol_p

Matches the end of line (CR/LF and combinations thereof).

nothing_p

Never matches anything and always fails.

end_p

Matches the end of input (returns a successful match with 0 length when the input is exhausted).
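
The behavior of these three parsers can be summarized as functions that return a match length. This is a stand-alone sketch with hypothetical names (eol_sketch, nothing_sketch, end_sketch). It assumes that "combinations" of CR/LF means a lone CR, a lone LF, or CR followed by LF, and it uses -1 to signal failure so that a successful zero-length match stays distinguishable.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Each function returns the length of the match at `pos`, or -1 on failure.

// End of line: CR, LF, or CR immediately followed by LF (an assumption).
long eol_sketch(const std::string& in, std::size_t pos) {
    long len = 0;
    if (pos < in.size() && in[pos] == '\r') { ++pos; ++len; }
    if (pos < in.size() && in[pos] == '\n') { ++pos; ++len; }
    return len > 0 ? len : -1;
}

// Never matches, whatever the input.
long nothing_sketch(const std::string&, std::size_t) {
    return -1;
}

// End of input: succeeds with length 0 once nothing is left to read.
long end_sketch(const std::string& in, std::size_t pos) {
    return pos >= in.size() ? 0 : -1;
}

inline void misc_demo() {
    assert(eol_sketch("a\r\nb", 1) == 2);  // CR LF counts as one line end
    assert(end_sketch("ab", 2) == 0);      // success, but zero length
}
```

The zero-length success of end_sketch is the reason failure cannot simply be reported as 0 in this model.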

Copyright © 1998-2003 Joel de Guzman
Copyright © 2003 Martin Wille

Use, modification and distribution is subject to the Boost Software License, Version 1.0. (See accompanying file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)