Boost.Spirit User Manual Translation (12): The Scanner and Parsing

qingcairousi

The Scanner and Parsing

The scanner's task is to feed the sequential input data stream to the parser. The scanner extracts data from the input, parcels it out (potentially modifying or filtering it along the way), and finally relegates the result to individual parser elements on demand until the input is exhausted. The scanner is composed of two STL-conforming forward iterators, first and last, where first is held by reference and last by value. The first iterator is held by reference so that it can be re-positioned.
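
To make the reference/value distinction concrete, here is a minimal model in plain C++. It is not Spirit's scanner class; `toy_scanner` and `match_char` are hypothetical names invented for this sketch. Because `first` is stored as a reference, a parser that advances the scanner moves the caller's own iterator, while `last` is a private copy.

```cpp
#include <cassert>

// Hypothetical model of a scanner (not Spirit's implementation).
// `first` refers to the caller's iterator, so parsers can re-position it;
// `last` is a copy, because the end of input never moves during a parse.
template <typename IteratorT>
struct toy_scanner {
    IteratorT& first;
    IteratorT last;

    toy_scanner(IteratorT& f, IteratorT l) : first(f), last(l) {}
    bool at_end() const { return first == last; }
};

// A tiny "parser" that consumes one expected character.
template <typename IteratorT>
bool match_char(toy_scanner<IteratorT>& scan, char c) {
    if (scan.at_end() || *scan.first != c) return false;
    ++scan.first;  // moves the caller's own iterator through the reference
    return true;
}
```

For example, after `match_char(scan, 'a')` succeeds on the input "ab", the caller's `first` points at 'b'. A failed match in this sketch leaves the iterator untouched, but, as explained later in this section, Spirit itself gives no such guarantee.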

The scanner manages various aspects of the parsing process through a set of policies. There are three sets of policies that govern:

Iteration and filtering
Recognition and matching
Handling semantic actions

These policies are mostly hidden from view, and users generally need not know about them. Advanced users may, however, supply their own policies that override the ones already in place, to fine-tune the parsing process to their own needs. How this is done will be covered in further detail later.

The scanner is a template class expecting two parameters: IteratorT, the iterator type, and PoliciesT, its set of policies. IteratorT defaults to char const*, while PoliciesT defaults to scanner_policies<>, a predefined set of scanner policies that we can use straight out of the box.

    template<
        typename IteratorT  = char const*,
        typename PoliciesT  = scanner_policies<> >
    class scanner;
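
The default template arguments make `scanner<>` usable as-is, while the policy parameter lets advanced users swap in their own behavior. The sketch below imitates that pattern with an ordinary class template; `toy_scanner`, `no_skip_policy` and `skip_space_policy` are hypothetical and far simpler than `scanner_policies<>`. It only shows how replacing one policy type changes how input is delivered, without touching the scanner itself.

```cpp
#include <cassert>
#include <cctype>

// Default policy (hypothetical): hand every character over unchanged.
struct no_skip_policy {
    template <typename IteratorT>
    void skip(IteratorT&, IteratorT) const {}
};

// A user-supplied replacement policy (hypothetical): skip ASCII white space.
struct skip_space_policy {
    template <typename IteratorT>
    void skip(IteratorT& first, IteratorT last) const {
        while (first != last && std::isspace(static_cast<unsigned char>(*first)))
            ++first;
    }
};

// Same default iterator as Spirit's declaration (char const*);
// the default policy type is our own stand-in.
template <typename IteratorT = char const*, typename PoliciesT = no_skip_policy>
struct toy_scanner {
    IteratorT& first;
    IteratorT last;
    PoliciesT policies;

    toy_scanner(IteratorT& f, IteratorT l, PoliciesT p = PoliciesT())
        : first(f), last(l), policies(p) {}

    // Lets the policy skip, then returns the current character
    // (or '\0' at end of input) without consuming it.
    char peek() {
        policies.skip(first, last);
        return first == last ? '\0' : *first;
    }
};
```

On the input "  x", `toy_scanner<>` reports a blank, while `toy_scanner<char const*, skip_space_policy>` reports 'x': only the policy argument differs.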

Spirit uses the same iterator concepts and interface formally defined by the C++ Standard Template Library (STL). We can use iterators supplied by STL's containers (e.g. list, vector, string, etc.) as is, or perhaps write our own. Iterators can be as simple as a pointer (e.g. char const*). At the other end of the spectrum, iterators can be quite complex; for instance, an iterator adapter that wraps a lexer such as LEX.
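
Since parsing code only relies on the iterator concept, one template can run over a raw pointer as well as over container iterators. A Spirit-independent illustration, with a hypothetical function `count_leading_digits`:

```cpp
#include <cassert>
#include <cctype>
#include <list>    // for the std::list usage shown below
#include <string>  // for the std::string usage shown below

// Works with any forward iterator over characters: the only operations
// used are !=, * and ++, which pointers and STL container iterators provide.
template <typename ForwardIt>
int count_leading_digits(ForwardIt first, ForwardIt last) {
    int n = 0;
    while (first != last && std::isdigit(static_cast<unsigned char>(*first))) {
        ++first;
        ++n;
    }
    return n;
}
```

The same function accepts a `char const*` pair, `s.begin()`/`s.end()` of a `std::string`, or the iterators of a `std::list<char>`, with no change to its code.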

The Free Parse Functions

The framework provides a couple of free functions that make parsing a snap. These parse functions come in two forms. The first form works at the character level. The second works at the phrase level and asks for a skip parser.

The skip parser can be just about any primitive or composite parser. Its purpose is to move the scanner's first iterator to valid tokens by skipping white space. In C, for instance, the tab '\t', the newline '\n', the carriage return '\r', the space ' ' and the characters inside comments /*...*/ are all considered white space.
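
To illustrate what a C-style skip parser has to recognize, here is a hand-written skipper over `char const*`. It is a plain function written for this sketch (the name `skip_c_whitespace` is hypothetical), not a Spirit parser; in Spirit one would normally compose the skipper from primitives, for instance `space_p | comment_p("/*", "*/")`.

```cpp
#include <cassert>

// Advances past C white space: '\t', '\n', '\r', ' ' and /* ... */ comments.
// Returns the first position that is not white space. In this sketch an
// unterminated comment swallows the rest of the input.
inline char const* skip_c_whitespace(char const* first, char const* last) {
    for (;;) {
        if (first != last && (*first == ' ' || *first == '\t' ||
                              *first == '\n' || *first == '\r')) {
            ++first;
        } else if (last - first >= 2 && first[0] == '/' && first[1] == '*') {
            first += 2;  // step over the opening "/*"
            while (last - first >= 2 && !(first[0] == '*' && first[1] == '/'))
                ++first;
            first = (last - first >= 2) ? first + 2 : last;  // step over "*/"
        } else {
            return first;
        }
    }
}
```

Given " \t/* note */\nint", the skipper stops at the 'i' of "int", which is where the next real token begins.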

Character level parsing

    template <typename IteratorT, typename DerivedT>
    parse_info<IteratorT>
    parse
    (
        IteratorT const&        first,
        IteratorT const&        last,
        parser<DerivedT> const& p
    );

    template <typename CharT, typename DerivedT>
    parse_info<CharT const*>
    parse
    (
        CharT const*            str,
        parser<DerivedT> const& p
    );

There are two variants. The first accepts a first, last iterator pair, just as STL algorithms do. The second accepts a null-terminated string. The last argument is the parser p that will be used to parse the input.
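
The relationship between the two variants can be sketched as follows: the string overload finds the terminating null and forwards to the iterator-pair overload. Everything here is a hypothetical stand-in (`toy_parse`, `toy_info`, and a parser modeled as a callable that advances an iterator and returns `true` on success); it mirrors the shape of the signatures above, not Spirit's implementation.

```cpp
#include <cassert>
#include <string>

// Hypothetical, reduced result type.
template <typename IteratorT>
struct toy_info {
    IteratorT stop;  // where the parser stopped
    bool hit;        // whether it succeeded
};

// Iterator-pair variant: `p` is any callable taking (IteratorT&, IteratorT)
// that advances the first iterator and returns true on success.
template <typename IteratorT, typename ParserT>
toy_info<IteratorT> toy_parse(IteratorT first, IteratorT last, ParserT const& p) {
    bool ok = p(first, last);
    return toy_info<IteratorT>{first, ok};
}

// Null-terminated variant: locate the end, then forward.
template <typename CharT, typename ParserT>
toy_info<CharT const*> toy_parse(CharT const* str, ParserT const& p) {
    CharT const* last = str + std::char_traits<CharT>::length(str);
    return toy_parse(str, last, p);
}
```

With a digit-matching callable, `toy_parse("123x", digits)` succeeds and stops at 'x'.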

Phrase level parsing

    template <typename IteratorT, typename ParserT, typename SkipT>
    parse_info<IteratorT>
    parse
    (
        IteratorT const&        first,
        IteratorT const&        last,
        parser<ParserT> const&  p,
        parser<SkipT> const&    skip
    );

    template <typename CharT, typename ParserT, typename SkipT>
    parse_info<CharT const*>
    parse
    (
        CharT const*            str,
        parser<ParserT> const&  p,
        parser<SkipT> const&    skip
    );

As above, there are two variants. The first accepts a first, last iterator pair, just as STL algorithms do. The second accepts a null-terminated string. The argument p is the parser that will be used to parse the input. The last argument, skip, is the skip parser.
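
A hypothetical sketch of the phrase-level variants in the same style (`toy_phrase_parse` and the callable skipper are invented for illustration). In this sketch the skipper runs before the parser and once more after a successful parse, so trailing blanks do not prevent a full match; a real Spirit scanner applies its skip parser as part of the scanning process instead.

```cpp
#include <cassert>
#include <string>

template <typename IteratorT>
struct toy_info {
    IteratorT stop;
    bool hit;
    bool full;
};

// Iterator-pair variant: skip, parse, skip again, then check for a full match.
template <typename IteratorT, typename ParserT, typename SkipT>
toy_info<IteratorT> toy_phrase_parse(IteratorT first, IteratorT last,
                                     ParserT const& p, SkipT const& skip) {
    skip(first, last);
    bool ok = p(first, last);
    if (ok) skip(first, last);
    return toy_info<IteratorT>{first, ok, ok && first == last};
}

// Null-terminated variant forwards to the iterator-pair one.
template <typename CharT, typename ParserT, typename SkipT>
toy_info<CharT const*> toy_phrase_parse(CharT const* str, ParserT const& p,
                                        SkipT const& skip) {
    return toy_phrase_parse(str, str + std::char_traits<CharT>::length(str),
                            p, skip);
}
```

With a blank-skipping callable, "  42  " yields a full match, while " 4x" yields a partial one that stops at 'x'.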

The parse_info structure

The functions above return a parse_info structure parameterized by the iterator type passed in. The parse_info struct has these members:

parse_info members
stop	Points to the final parse position (i.e., the parser recognized and processed the input up to this point).
hit	True if parsing is successful. This may be full (the parser consumed all the input) or partial (the parser consumed only a portion of the input).
full	True when we have a full match (i.e., the parser consumed all the input).
length	The number of characters consumed by the parser. This is valid only if we have a successful match (either partial or full).
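
How the four members relate can be shown with a small helper. `toy_parse_info` and `make_info` are hypothetical, and `length` is computed here simply as the distance from the start of the input to `stop`, a character-level simplification. The point is that `full` implies `hit`, and `length` is only meaningful when `hit` is true.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>

// Hypothetical mirror of the documented parse_info members.
template <typename IteratorT>
struct toy_parse_info {
    IteratorT stop;      // where parsing stopped
    bool hit;            // succeeded, partially or fully
    bool full;           // succeeded AND consumed all the input
    std::size_t length;  // characters consumed; meaningful only when hit
};

template <typename IteratorT>
toy_parse_info<IteratorT> make_info(IteratorT first, IteratorT stop,
                                    IteratorT last, bool hit) {
    toy_parse_info<IteratorT> info;
    info.stop = stop;
    info.hit = hit;
    info.full = hit && stop == last;
    info.length = hit ? static_cast<std::size_t>(std::distance(first, stop)) : 0;
    return info;
}
```

For the input "123abc" parsed by a digit parser that stops after "123", this gives `hit == true`, `full == false` and `length == 3`.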

The phrase_scanner_t and wide_phrase_scanner_t

For convenience, Spirit declares these typedefs:

    typedef scanner<char const*, unspecified> phrase_scanner_t;
    typedef scanner<wchar_t const*, unspecified> wide_phrase_scanner_t;

These are the exact scanner types Spirit uses when the parse function is called with a char const* (C string) or a wchar_t const* (wide string) as the first argument and space_p as the skip parser (the third argument). For example, we can use these typedefs to declare rules:

    rule<phrase_scanner_t> my_rule;
    parse("abrakadabra", my_rule, space_p);
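
The point of these typedefs is that a rule is tied to one scanner type. The following hypothetical model (`toy_rule`, `space_skipping_scanner`, `char_scanner` and `expect_char` are invented names) uses `std::function` to stand in for a rule: a `toy_rule<toy_phrase_scanner_t>` can only be invoked with that scanner type, so a rule meant for phrase-level calls has to be declared with the phrase-level scanner.

```cpp
#include <cassert>
#include <functional>

// Two distinct hypothetical scanner types, loosely like scanner<> and
// phrase_scanner_t: one never skips, the other skips spaces.
struct char_scanner {
    char const*& first;
    char const* last;
    void skip() {}
};
struct space_skipping_scanner {
    char const*& first;
    char const* last;
    void skip() { while (first != last && *first == ' ') ++first; }
};

typedef space_skipping_scanner toy_phrase_scanner_t;

// A "rule" bound to exactly one scanner type.
template <typename ScannerT>
using toy_rule = std::function<bool(ScannerT&)>;

// Matches one expected character after letting the scanner skip.
template <typename ScannerT>
bool expect_char(ScannerT& scan, char c) {
    scan.skip();
    if (scan.first == scan.last || *scan.first != c) return false;
    ++scan.first;
    return true;
}
```

A `toy_rule<toy_phrase_scanner_t>` that expects 'a' then 'b' accepts " a  b", because the scanner type carries the skipping behavior; passing a `char_scanner` to it would not compile.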

Direct parsing with Iterators

The free parse functions make things easy for us: they hide the dirty details, so we need not bother with the scanner's intricacies. Sooner or later, however, we will need to get under the hood, and when that need comes it is good to know what we are dealing with. We will then have to go low-level and call the parser's parse member function directly.

If we wish to work on the character level, the procedure is quite simple:

    scanner<IteratorT> scan(first, last);

    if (p.parse(scan))
    {
        //  Parsed successfully. If first == last, then we have
        //  a full parse, the parser recognized the input in whole.
    }
    else
    {
        //  Parsing failure. The parser failed to recognize the input
    }

The scanner position on an unsuccessful match

On a successful match, the input is advanced accordingly. But what happens on an unsuccessful match? Be warned: it might seem intuitive that the scanner position is reset to where it was before parsing began. It is not. On an unsuccessful match, the position of the scanner is undefined! Usually, it is left at the farthest point where an error was found somewhere down the recursive descent. If this behavior is not desired, you may need to re-position the scanner yourself. The example in the numerics chapter illustrates how the scanner position can be saved and later restored.
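
The general save-and-restore idea can be sketched independently of Spirit. The names `keyword` and `try_parse` are hypothetical: `keyword` leaves its iterator partway through the input when it fails, much like a scanner after an unsuccessful match, and `try_parse` remembers the starting position and puts it back on failure.

```cpp
#include <cassert>

// Tries to consume the exact text `kw`. On failure, `first` may be left
// after a partially matched prefix.
inline bool keyword(char const*& first, char const* last, char const* kw) {
    for (; *kw != '\0'; ++kw, ++first)
        if (first == last || *first != *kw) return false;
    return true;
}

// Wraps any such parser: save the position, and restore it explicitly
// if the parser fails, so the caller sees "no input consumed".
template <typename ParserT>
bool try_parse(char const*& first, char const* last, ParserT p) {
    char const* saved = first;
    if (p(first, last)) return true;
    first = saved;
    return false;
}
```

On the input "whilst", calling `keyword` for "while" fails with the iterator left after "whil"; the same attempt through `try_parse` fails with the iterator back at the start.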

Here p is the parser we want to use, and first/last is the iterator pair referring to the input. We simply create a scanner from the iterators. The scanner type used here relies on the default scanner_policies<>.

The situation is a bit more complex when we wish to work on the phrase level:

    typedef skip_parser_iteration_policy<SkipT> iter_policy_t;
    typedef scanner_policies<iter_policy_t> scanner_policies_t;
    typedef scanner<IteratorT, scanner_policies_t> scanner_t;

    iter_policy_t iter_policy(skip);
    scanner_policies_t policies(iter_policy);
    scanner_t scan(first, last, policies);

    if (p.parse(scan))
    {
        //  Parsed successfully. If first == last, then we have
        //  a full parse, the parser recognized the input in whole.
    }
    else
    {
        //  Parsing failure. The parser failed to recognize the input
    }

Here SkipT is the type of the skip parser, skip. Again, p is the parser we want to use, and first/last is the iterator pair referring to the input. Given a skip-parser type SkipT, skip_parser_iteration_policy creates a scanner iteration policy that skips over the portions recognized by the skip parser. This policy may then be used to create a scanner. The scanner_policies class wraps all scanner-related policies, including the iteration policies.
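
The layering described above (a skip parser inside an iteration policy, the iteration policy inside a policies bundle, the bundle inside the scanner) can be modeled in a few lines. `toy_skip_iteration_policy`, `toy_scanner_policies` and `toy_scanner` are hypothetical analogues of `skip_parser_iteration_policy`, `scanner_policies` and `scanner`; only iteration is modeled, and the skip parser is assumed to be a callable that returns `true` when it consumed something.

```cpp
#include <cassert>

// Hypothetical analogue of skip_parser_iteration_policy<SkipT>: before a
// character is handed out, let the skip parser consume what it recognizes.
template <typename SkipT>
struct toy_skip_iteration_policy {
    SkipT skipper;
    explicit toy_skip_iteration_policy(SkipT s) : skipper(s) {}

    template <typename IteratorT>
    void skip(IteratorT& first, IteratorT last) const {
        // The skipper must advance whenever it returns true.
        while (first != last && skipper(first, last)) {}
    }
};

// Hypothetical analogue of scanner_policies<IterPolicyT>.
template <typename IterPolicyT>
struct toy_scanner_policies {
    IterPolicyT iteration;
};

template <typename IteratorT, typename PoliciesT>
struct toy_scanner {
    IteratorT& first;
    IteratorT last;
    PoliciesT policies;

    // Delivers the next significant character and advances past it.
    bool next(char& out) {
        policies.iteration.skip(first, last);
        if (first == last) return false;
        out = *first++;
        return true;
    }
};
```

The typedef chain in the test (`iter_policy_t`, then `policies_t`, then the scanner) follows the same order as the phrase-level code above.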

lexeme_scanner

When switching from phrase-level to character-level parsing, lexeme_d (see directives.html) does its magic by disabling the skipping of white space. This is done by tweaking the scanner. However, when we do this, all parsers inside the lexeme_d get a transformed scanner type. This should not be a problem in most cases. However, when rules are called inside the lexeme_d, the compiler will choke if a rule does not have the proper scanner type. If a rule must be used inside a lexeme_d, the rule's type must be:

    rule<lexeme_scanner<ScannerT>::type> r;

where ScannerT is the actual type of the scanner used. Take note that lexeme_scanner will only work for phrase level scanners.
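
Type-wise, `lexeme_scanner<ScannerT>::type` maps one scanner type to another. A hypothetical metafunction (`toy_lexeme_scanner`, with invented policy tags `skip_blanks` and `no_skip`) shows the idea. It is only specialized for phrase-level scanners, so applying it to any other scanner type fails to compile, which mirrors the restriction stated above.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical skip-policy tags and scanner type.
struct skip_blanks {};
struct no_skip {};

template <typename IteratorT, typename SkipPolicyT>
struct toy_scanner {};

// Primary template intentionally left undefined: only phrase-level
// scanners can be transformed.
template <typename ScannerT>
struct toy_lexeme_scanner;

// Phrase-level scanner in, same iterator with skipping switched off out.
template <typename IteratorT>
struct toy_lexeme_scanner<toy_scanner<IteratorT, skip_blanks> > {
    typedef toy_scanner<IteratorT, no_skip> type;
};
```

A rule declared for `toy_scanner<char const*, skip_blanks>` would not accept the transformed type, which is why the rule inside lexeme_d must be declared with the `::type` result.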

as_lower_scanner

Similarly, as_lower_d does its work by filtering and converting all characters received from the scanner to lower case. This is also done by tweaking the scanner. Then again, all parsers inside the as_lower_d get a transformed scanner type. If a rule must be used inside an as_lower_d, the rule's type must be:

    rule<as_lower_scanner<ScannerT>::type> r;

where ScannerT is the actual type of the scanner used.
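
A hypothetical model of such a filtering scanner: `toy_scanner<FilterT>` passes every character through a filter before a parser sees it, so a literal written in lower case matches upper-case input. The names `lower_filter`, `identity_filter` and `match_lower_literal` are invented for this sketch. Note that `toy_scanner<lower_filter>` and `toy_scanner<identity_filter>` are different types, which is the reason a rule used inside as_lower_d needs the transformed scanner type.

```cpp
#include <cassert>
#include <cctype>

// Hypothetical filters applied to each character on the way to a parser.
struct identity_filter {
    char operator()(char c) const { return c; }
};
struct lower_filter {
    char operator()(char c) const {
        return static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    }
};

template <typename FilterT>
struct toy_scanner {
    char const*& first;
    char const* last;
    FilterT filter;

    bool at_end() const { return first == last; }
    char get() const { return filter(*first); }  // filtered current character
};

// A literal parser that only knows lower-case text.
template <typename ScannerT>
bool match_lower_literal(ScannerT& scan, char const* lit) {
    for (; *lit != '\0'; ++lit, ++scan.first)
        if (scan.at_end() || scan.get() != *lit) return false;
    return true;
}
```

On the input "BEGIN", the literal "begin" matches through the lower-casing scanner but not through the identity one.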

See the techniques section for an example of a grammar using a multiple-scanner-enabled rule, lexeme_scanner and as_lower_scanner.

no_actions_scanner

Again, the no_actions_d directive tweaks the scanner to disable the firing of semantic actions. As before, all parsers inside the no_actions_d get a transformed scanner type. If a rule must be used inside a no_actions_d, the rule's type must be:

    rule<no_actions_scanner<ScannerT>::type> r;

where ScannerT is the actual type of the scanner used.
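
A hypothetical sketch of an action policy (`run_actions`, `ignore_actions` and `parse_digit` are invented names): the parser decides whether the input matches, and the policy decides whether the attached action fires. Swapping the policy suppresses the action while the parse still succeeds and consumes input.

```cpp
#include <cassert>

// Hypothetical action policies: one runs semantic actions, one ignores them.
struct run_actions {
    template <typename ActionT, typename ValueT>
    void do_action(ActionT const& act, ValueT const& v) const { act(v); }
};
struct ignore_actions {
    template <typename ActionT, typename ValueT>
    void do_action(ActionT const&, ValueT const&) const {}
};

// A digit parser with an attached action; whether the action fires is
// decided by the policy, not by the parser.
template <typename ActionPolicyT, typename ActionT>
bool parse_digit(char const*& first, char const* last,
                 ActionPolicyT const& policy, ActionT const& act) {
    if (first == last || *first < '0' || *first > '9') return false;
    policy.do_action(act, *first - '0');
    ++first;
    return true;
}
```

With an action that adds the digit to a running sum, parsing "7" under `run_actions` adds 7, while parsing it again under `ignore_actions` succeeds without changing the sum.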

Be sure to add "typename" before lexeme_scanner, as_lower_scanner and no_actions_scanner when these are used inside a template class or function.
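
A compilable sketch of the `typename` requirement, using a hypothetical metafunction `to_lexeme` in place of `lexeme_scanner` and an empty `toy_rule` in place of `rule`. Inside `toy_grammar<ScannerT>`, `to_lexeme<ScannerT>::type` depends on the template parameter, so without `typename` the compiler cannot know that it names a type and rejects the declaration.

```cpp
#include <cassert>
#include <type_traits>

// Two hypothetical scanner types.
struct phrase_scan {};
struct lexeme_scan {};

// Hypothetical metafunction standing in for lexeme_scanner<ScannerT>.
template <typename ScannerT>
struct to_lexeme;

template <>
struct to_lexeme<phrase_scan> {
    typedef lexeme_scan type;
};

// Hypothetical stand-in for rule<ScannerT>: it only records its scanner type.
template <typename ScannerT>
struct toy_rule {
    typedef ScannerT scanner_type;
};

template <typename ScannerT>
struct toy_grammar {
    toy_rule<ScannerT> expr;  // ordinary phrase-level rule
    // Dependent name: `typename` is required here.
    toy_rule<typename to_lexeme<ScannerT>::type> identifier;
};
```

Instantiating `toy_grammar<phrase_scan>` gives `expr` the phrase-level scanner type and `identifier` the transformed one.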

See no_actions.cpp. This is part of the Spirit distribution.

Copyright © 1998-2003 Joel de Guzman

Use, modification and distribution is subject to the Boost Software License, Version 1.0. (See accompanying file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)