Boost.Spirit User Manual Translation (15): Semantic Actions

qingcairousi

Semantic Actions

Semantic actions have the form: expression[action]

Ultimately, after defining our grammar and generating the corresponding parser, we need to produce some output and do work beyond syntax analysis, unless all we want is to check that an input conforms to our grammar, which is very seldom the case. Semantic actions may be attached to any expression at any level within the parser hierarchy. An action is a C/C++ function or function object that is called whenever a match is found in the particular context where it is attached. The action serves as a hook into the parser and may be used, for example, to:

Generate output from the parser (ASTs, for example)
Report warnings or errors
Manage symbol tables

Generic Semantic Actions (Transduction Interface)

A generic semantic action can be any free function or function object that is compatible with the interface:

    void f(IteratorT first, IteratorT last);

where IteratorT is the type of iterator used, first points to the current input and last points to one past the end of the input (identical to STL iterator ranges). A function object (functor) should have a member operator() with the same signature as above:

    struct my_functor
    {
        void operator()(IteratorT first, IteratorT last) const;
    };

Iterators pointing to the matching portion of the input are passed into the function/functor.

In general, semantic actions accept the first-last iterator pair. This is the transduction interface. The action functions or functors receive the unprocessed data representing the matching production directly from the input. In many cases, this is sufficient. Examples are source-to-source translation, pre-processing, etc.

Example:

    void
    my_action(char const* first, char const* last)
    {
        std::string str(first, last);
        std::cout << str << std::endl;
    }

    rule<> myrule = (a | b | *(c >> d))[&my_action];

The function my_action will be called whenever the expression (a | b | *(c >> d)) matches a portion of the input stream while parsing. Two iterators, first and last, are passed into the function. These iterators point to the start and end, respectively, of the portion of the input stream where the match is found.

Const-ness:

With functors, note that operator() should be const. This implies that functors are immutable. One may wish to have member variables that are modified when the action gets called, but this is not a good idea. First, functors should be lightweight: they are passed around a lot, and heavily laden functors would incur considerable overhead. Second, functors are passed by value, so the functor object that finally attaches to the parser is not the original instance supplied by the client. Changes to that functor's state therefore do not affect the original functor the client passed in, since they are distinct copies. If a functor needs to update some state variables, which is often the case, it is better to hold references to external data. The following example shows how this can be done:

    struct my_functor
    {
        my_functor(std::string& str_)
        : str(str_) {}

        void
        operator()(IteratorT first, IteratorT last) const
        {
            str.assign(first, last);
        }

        std::string& str;
    };
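The copy semantics described above can be illustrated without Spirit at all. The following is a minimal sketch using only the standard library; toy_parser, own_state and shared_state are hypothetical names, and toy_parser merely imitates a parser that stores its action by value. It is not part of the Spirit API.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a parser that stores its action by value.
template <typename ActionT>
struct toy_parser
{
    explicit toy_parser(ActionT const& a) : action(a) {}  // copies the functor

    // Pretend a match was found and fire the stored action.
    void fire(char const* first, char const* last) const
    {
        assert(first <= last);
        action(first, last);
    }

    ActionT action;
};

// Keeps its counter inside the functor: only the stored copy is updated.
struct own_state
{
    own_state() : count(0) {}
    void operator()(char const*, char const*) const { ++count; }
    mutable std::size_t count;
};

// Holds a reference to external data: every copy updates the same counter.
struct shared_state
{
    explicit shared_state(std::size_t& count_) : count(count_) {}
    void operator()(char const*, char const*) const { ++count; }
    std::size_t& count;
};
```

After firing a toy_parser<own_state> twice, the original own_state object still reports a count of zero, whereas the external counter referenced by shared_state reflects both calls.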

Full Example:

Here now is our calculator enhanced with semantic actions:

    namespace
    {
        void    do_int(char const* str, char const* end)
        {
            string  s(str, end);
            cout << "PUSH(" << s << ')' << endl;
        }

        void    do_add(char const*, char const*)    { cout << "ADD\n"; }
        void    do_subt(char const*, char const*)   { cout << "SUBTRACT\n"; }
        void    do_mult(char const*, char const*)   { cout << "MULTIPLY\n"; }
        void    do_div(char const*, char const*)    { cout << "DIVIDE\n"; }
        void    do_neg(char const*, char const*)    { cout << "NEGATE\n"; }
    }

We augment our grammar with semantic actions:

    struct calculator : public grammar<calculator>
    {
        template <typename ScannerT>
        struct definition
        {
            definition(calculator const& self)
            {
                expression
                    =   term
                        >> *(   ('+' >> term)[&do_add]
                            |   ('-' >> term)[&do_subt]
                            )
                    ;

                term =
                    factor
                        >> *(   ('*' >> factor)[&do_mult]
                            |   ('/' >> factor)[&do_div]
                            )
                        ;

                factor
                    =   lexeme_d[(+digit_p)[&do_int]]
                    |   '(' >> expression >> ')'
                    |   ('-' >> factor)[&do_neg]
                    |   ('+' >> factor)
                    ;
            }

            rule<ScannerT> expression, term, factor;

            rule<ScannerT> const&
            start() const { return expression; }
        };
    };

Feeding the expression (-1 + 2) * (3 + -4) to the rule expression, for example, will produce the following output with the actions defined above:

    PUSH(1)
    NEGATE
    PUSH(2)
    ADD
    PUSH(3)
    PUSH(4)
    NEGATE
    ADD
    MULTIPLY

which, by the way, is the Reverse Polish Notation (RPN) of the given expression, reminiscent of some primitive calculators and the language Forth.
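To make the RPN remark concrete, here is a minimal stack-machine sketch, independent of Spirit, that evaluates a sequence of the strings printed by the do_* actions (PUSH(n), NEGATE, ADD, SUBTRACT, MULTIPLY, DIVIDE). The function name eval_rpn and the choice of int arithmetic are assumptions made for illustration only.

```cpp
#include <cassert>
#include <cstdlib>
#include <stack>
#include <stdexcept>
#include <string>
#include <vector>

// Evaluates a list of RPN instructions using an operand stack.
int eval_rpn(std::vector<std::string> const& ops)
{
    std::stack<int> st;
    for (std::size_t i = 0; i < ops.size(); ++i)
    {
        std::string const& op = ops[i];
        if (op.compare(0, 5, "PUSH(") == 0)
        {
            // atoi reads the digits and stops at the closing ')'
            st.push(std::atoi(op.c_str() + 5));
        }
        else if (op == "NEGATE")
        {
            if (st.empty()) throw std::runtime_error("stack underflow");
            st.top() = -st.top();
        }
        else
        {
            if (st.size() < 2) throw std::runtime_error("stack underflow");
            int rhs = st.top(); st.pop();
            int lhs = st.top(); st.pop();
            if      (op == "ADD")      st.push(lhs + rhs);
            else if (op == "SUBTRACT") st.push(lhs - rhs);
            else if (op == "MULTIPLY") st.push(lhs * rhs);
            else if (op == "DIVIDE")   { assert(rhs != 0); st.push(lhs / rhs); }
            else throw std::runtime_error("unknown instruction: " + op);
        }
    }
    if (st.size() != 1) throw std::runtime_error("malformed RPN sequence");
    return st.top();
}
```

Fed the instruction sequence that the actions emit for (-1 + 2) * (3 + -4), this evaluator returns -1, which matches evaluating the infix expression directly.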

View the complete source code here. This is part of the Spirit distribution.

Specialized Actions

In general, semantic actions accept the first-last iterator pair. There are situations, though, where we might want to pass data in its processed form. A concrete example is the numeric parser. It is unwise to pass unprocessed data to a semantic action attached to a numeric parser and just throw away what the parser has already parsed. We want to pass the actual parsed number.

The function and functor signature of a semantic action varies depending on the parser to which it is attached. The following sections list the parsers that accept unique signatures.

Unless explicitly stated in the documentation of a specific parser type, parsers not included in the list expect by default the generic signature explained above.

Numeric Actions

Applies to:

uint_p
int_p
ureal_p
real_p

Signature for functions:

    void func(NumT val);

Signature for functors:

    struct ftor
    {
        void operator()(NumT val) const;
    };

Where NumT is any primitive numeric type such as int, long, float, double, etc., or a user-defined numeric type such as big_int. NumT is the same type used as the template parameter to uint_p, int_p, ureal_p or real_p. The parsed number is passed into the function/functor.

Character Actions

Applies to:

chlit, ch_p
range, range_p
anychar
alnum, alpha
cntrl, digit
graph, lower
print, punct
space, upper
xdigit

Signature for functions:

    void func(CharT ch);

Signature for functors:

    struct ftor
    {
        void operator()(CharT ch) const;
    };

Where CharT is the value_type of the iterator used in parsing. A char const* iterator, for example, has a value_type of char. The matching character is passed into the function/functor.

Cascading Actions

Actions can be cascaded. Cascaded actions also inherit the function/functor interface of the original. For example:

    uint_p[fa][fb][fc]

Here, the functors fa, fb and fc all expect the signature void operator()(unsigned n) const.

Directives and Actions

Directives inherit the function/functor interface of the subject they enclose. Example:

    as_lower_d[ch_p('x')][f]

Here, the functor f expects the signature void operator()(char ch) const, assuming that the iterator used is a char const*.

Copyright © 1998-2003 Joel de Guzman

Use, modification and distribution is subject to the Boost Software License, Version 1.0. (See accompanying file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)