Operators
Operators are used as a means for object composition and embedding. Simple parsers may be composed to form composites through operator overloading, crafted to approximate the syntax of an Extended Backus-Normal Form (EBNF) variant. An expression such as:
a | b
actually yields a new parser type which is a composite of its operands, a and b. Taking this example further, if a and b were of type chlit<>, the result would have the composite type:
alternative<chlit<>, chlit<> >
In general, any binary operator takes its two arguments, parser1 and parser2, and creates a new composed parser of the form:
op<parser1, parser2>
where parser1 and parser2 can be arbitrarily complex parsers themselves, with the only limitations being what your compiler imposes.
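To make the idea of composition through operator overloading concrete, here is a minimal sketch. The names sketch::chlit and sketch::alternative are hypothetical stand-ins written for this illustration; they are not Spirit's own classes, and the parse signature (a string plus a position, returning a length or -1) is an assumption chosen for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>

namespace sketch {

// Hypothetical single-character parser (a stand-in, not Spirit's chlit<>).
struct chlit {
    char ch;
    // Returns the matched length, or -1 when there is no match.
    long parse(const std::string& in, std::size_t pos) const {
        return (pos < in.size() && in[pos] == ch) ? 1 : -1;
    }
};

// Hypothetical composite: tries the left operand, then the right one.
template <typename A, typename B>
struct alternative {
    A left;
    B right;
    long parse(const std::string& in, std::size_t pos) const {
        long n = left.parse(in, pos);
        return n >= 0 ? n : right.parse(in, pos);
    }
};

// The overloaded operator only builds a new type out of its operand types.
template <typename A, typename B>
alternative<A, B> operator|(const A& a, const B& b) {
    return alternative<A, B>{a, b};
}

}  // namespace sketch

// Checks the composite types and exercises the composed parsers.
inline void check_composite_types() {
    using sketch::alternative;
    using sketch::chlit;
    chlit a{'a'}, b{'b'}, c{'c'};
    auto ab = a | b;       // operator found through argument-dependent lookup
    auto abc = a | b | c;  // groups as (a | b) | c
    static_assert(std::is_same<decltype(ab), alternative<chlit, chlit>>::value,
                  "a | b composes two chlit operands");
    static_assert(std::is_same<decltype(abc),
                               alternative<alternative<chlit, chlit>, chlit>>::value,
                  "composites nest to arbitrary depth");
    assert(ab.parse("b", 0) == 1);
    assert(abc.parse("c", 0) == 1);
    assert(abc.parse("z", 0) == -1);
}
```

The operator itself does no parsing. It only records its operands in a new type, which is why the operands can in turn be composites.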
Set Operators
Expression | Name | Description
a | b | Union | Match a or b. Also referred to as alternative
a & b | Intersection | Match a and b
a - b | Difference | Match a but not b. If both match and b's matched text is shorter than a's matched text, a successful match is made
a ^ b | XOR | Match a or b, but not both
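The rules in the table can be sketched with a small type-erased parser. Everything in namespace sketch below (parser, lit and the operators) is hypothetical and written only for this illustration; it follows the wording of the table rather than Spirit's implementation. Intersection is left out because the table does not say how differing match lengths are handled.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

namespace sketch {

// Hypothetical type-erased parser: parse returns the matched length, or -1.
struct parser {
    std::function<long(const std::string&, std::size_t)> parse;
};

// Matches the literal text s at position pos.
inline parser lit(const std::string& s) {
    return parser{[s](const std::string& in, std::size_t pos) -> long {
        bool ok = pos <= in.size() && in.compare(pos, s.size(), s) == 0;
        return ok ? static_cast<long>(s.size()) : -1;
    }};
}

// Union: the first operand that matches wins.
inline parser operator|(parser a, parser b) {
    return parser{[a, b](const std::string& in, std::size_t pos) -> long {
        long n = a.parse(in, pos);
        return n >= 0 ? n : b.parse(in, pos);
    }};
}

// Difference: a must match, and b must either fail or match a shorter text.
inline parser operator-(parser a, parser b) {
    return parser{[a, b](const std::string& in, std::size_t pos) -> long {
        long n = a.parse(in, pos);
        if (n < 0) return -1;
        long m = b.parse(in, pos);
        return (m < 0 || m < n) ? n : -1;
    }};
}

// XOR: exactly one of the operands must match.
inline parser operator^(parser a, parser b) {
    return parser{[a, b](const std::string& in, std::size_t pos) -> long {
        long n = a.parse(in, pos);
        long m = b.parse(in, pos);
        if ((n >= 0) == (m >= 0)) return -1;  // both matched, or neither did
        return n >= 0 ? n : m;
    }};
}

}  // namespace sketch

inline void set_operator_examples() {
    using sketch::lit;
    assert((lit("ab") | lit("x")).parse("x", 0) == 1);
    assert((lit("abc") - lit("ab")).parse("abc", 0) == 3);   // b's match is shorter
    assert((lit("ab") - lit("abc")).parse("abc", 0) == -1);  // b's match is longer
    assert((lit("ab") ^ lit("x")).parse("ab", 0) == 2);      // only a matches
    assert((lit("ab") ^ lit("abc")).parse("abc", 0) == -1);  // both match
}
```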
Short-circuiting
Alternative operands are tried one by one on a first come first served basis starting from the leftmost operand. After a successfully matched alternative is found, the parser concludes its search, essentially short-circuiting the search for other potentially viable candidates. This short-circuiting implicitly gives the highest priority to the leftmost alternative.
Short-circuiting is done in the same manner as in C or C++ logical expressions; e.g. in if (x < 3 || y < 2), if x is less than 3, the y < 2 test is not done at all. Besides providing an implicit priority rule for alternatives, which is necessary given the non-deterministic nature of the Spirit parser compiler, short-circuiting improves execution time. If the order of your alternatives is logically irrelevant, strive to put the (expected) most common choice first for maximum efficiency.
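The effect of short-circuiting can be observed with a small sketch. The types and helpers below (sketch::parser, lit, counted) are hypothetical names made up for this illustration and are not part of Spirit; counted simply records how often the parser it wraps is attempted.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

namespace sketch {

// Hypothetical type-erased parser: parse returns the matched length, or -1.
struct parser {
    std::function<long(const std::string&, std::size_t)> parse;
};

// Matches the literal text s at position pos.
inline parser lit(const std::string& s) {
    return parser{[s](const std::string& in, std::size_t pos) -> long {
        bool ok = pos <= in.size() && in.compare(pos, s.size(), s) == 0;
        return ok ? static_cast<long>(s.size()) : -1;
    }};
}

// Alternative: operands are tried left to right; the first match ends the search.
inline parser operator|(parser a, parser b) {
    return parser{[a, b](const std::string& in, std::size_t pos) -> long {
        long n = a.parse(in, pos);
        return n >= 0 ? n : b.parse(in, pos);
    }};
}

// Wraps p so that every attempt to run it increments calls.
inline parser counted(parser p, int& calls) {
    return parser{[p, &calls](const std::string& in, std::size_t pos) -> long {
        ++calls;
        return p.parse(in, pos);
    }};
}

}  // namespace sketch

// Returns how many times the second alternative was attempted on input.
inline int second_alternative_attempts(const std::string& input) {
    int calls = 0;
    sketch::parser p = sketch::lit("a") | sketch::counted(sketch::lit("ab"), calls);
    p.parse(input, 0);
    return calls;
}

inline void short_circuit_examples() {
    using sketch::lit;
    assert(second_alternative_attempts("ab") == 0);  // "a" matched, search stopped
    assert(second_alternative_attempts("b") == 1);   // "a" failed, so "ab" was tried
    assert((lit("a") | lit("ab")).parse("ab", 0) == 1);  // leftmost has priority
    assert((lit("ab") | lit("a")).parse("ab", 0) == 2);
}
```

Note that in the third example the longer candidate "ab" would also have matched, yet it is never considered. When one alternative is a prefix of another, the order of the operands decides the result.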
Intersections
Some researchers assert that intersections (e.g. a & b) let us define context-sensitive languages ("XBNF" [citing Leu-Weiner, 1973]). "The theory of defining a language as the intersection of a finite number of context free languages was developed by Leu and Weiner in 1973".
~ Operator
The complement operator ~ was originally put into consideration. Further understanding of its value and meaning leads us to uncertainty. The basic problem stems from the fact that ~a will yield U-a, where U is the universal set of all strings. However, where it makes sense, some parsers can be complemented (see the primitive character parsers for examples).
Sequencing Operators
Expression | Name | Description
a >> b | Sequence | Match a and b in sequence
a && b | Sequential-and | Same as above: match a and b in sequence
a || b | Sequential-or | Match a or b in sequence
The sequencing operator >> can alternatively be thought of as the sequential-and operator. The expression a && b reads as: match a and b in sequence. Continuing this logic, we can also have a sequential-or operator, where the expression a || b reads as: match a or b, in sequence. That is, if both a and b match, they must match in sequence; this is equivalent to a >> !b | b.
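The stated equivalence between a || b and a >> !b | b can be checked with a small sketch. The parser type and the operators in namespace sketch are hypothetical and written only for this illustration; in particular, the direct definition of || below is our reading of "match a or b, in sequence" and is not taken from Spirit's sources.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

namespace sketch {

// Hypothetical type-erased parser: parse returns the matched length, or -1.
struct parser {
    std::function<long(const std::string&, std::size_t)> parse;
};

// Matches the literal text s at position pos.
inline parser lit(const std::string& s) {
    return parser{[s](const std::string& in, std::size_t pos) -> long {
        bool ok = pos <= in.size() && in.compare(pos, s.size(), s) == 0;
        return ok ? static_cast<long>(s.size()) : -1;
    }};
}

// a >> b: b must match where a's match ended.
inline parser operator>>(parser a, parser b) {
    return parser{[a, b](const std::string& in, std::size_t pos) -> long {
        long n = a.parse(in, pos);
        if (n < 0) return -1;
        long m = b.parse(in, pos + static_cast<std::size_t>(n));
        return m < 0 ? -1 : n + m;
    }};
}

// !a: a failed match becomes an empty, successful one.
inline parser operator!(parser a) {
    return parser{[a](const std::string& in, std::size_t pos) -> long {
        long n = a.parse(in, pos);
        return n >= 0 ? n : 0;
    }};
}

// a | b: the first operand that matches wins.
inline parser operator|(parser a, parser b) {
    return parser{[a, b](const std::string& in, std::size_t pos) -> long {
        long n = a.parse(in, pos);
        return n >= 0 ? n : b.parse(in, pos);
    }};
}

// a || b, written directly: a optionally followed by b, or else b alone.
inline parser operator||(parser a, parser b) {
    return parser{[a, b](const std::string& in, std::size_t pos) -> long {
        long n = a.parse(in, pos);
        if (n < 0) return b.parse(in, pos);
        long m = b.parse(in, pos + static_cast<std::size_t>(n));
        return m < 0 ? n : n + m;
    }};
}

}  // namespace sketch

// True when a || b and a >> !b | b report the same result on input.
inline bool sequential_or_agrees(const std::string& input) {
    sketch::parser a = sketch::lit("a"), b = sketch::lit("b");
    return (a || b).parse(input, 0) == (a >> !b | b).parse(input, 0);
}

inline void sequential_or_examples() {
    sketch::parser a = sketch::lit("a"), b = sketch::lit("b");
    assert((a || b).parse("ab", 0) == 2);  // both, in sequence
    assert((a || b).parse("a", 0) == 1);   // a alone
    assert((a || b).parse("b", 0) == 1);   // b alone
    assert((a || b).parse("ba", 0) == 1);  // b only; the trailing a is not consumed
    assert((a || b).parse("c", 0) == -1);
}
```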
Optional and Loops
Expression | Name | Description
*a | Kleene star | Match a zero (0) or more times
+a | Positive | Match a one (1) or more times
!a | Optional | Match a zero (0) or one (1) time
a % b | List | Match a list of one or more repetitions of a separated by occurrences of b. This is the same as a >> *(b >> a). Note that a must not also match b
Looking more closely, note that we placed the optional expression of the form !a in the same category as loops. This is logical, considering that the optional matches the expression following it zero (0) or one (1) time.
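The loop operators can be sketched in a few lines. As before, sketch::parser, lit and the operators are hypothetical names for illustration only, not Spirit's implementation. The positive and list operators are expressed through the Kleene star, the list one using the a >> *(b >> a) identity from the table.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

namespace sketch {

// Hypothetical type-erased parser: parse returns the matched length, or -1.
struct parser {
    std::function<long(const std::string&, std::size_t)> parse;
};

// Matches the literal text s at position pos.
inline parser lit(const std::string& s) {
    return parser{[s](const std::string& in, std::size_t pos) -> long {
        bool ok = pos <= in.size() && in.compare(pos, s.size(), s) == 0;
        return ok ? static_cast<long>(s.size()) : -1;
    }};
}

// a >> b: b must match where a's match ended.
inline parser operator>>(parser a, parser b) {
    return parser{[a, b](const std::string& in, std::size_t pos) -> long {
        long n = a.parse(in, pos);
        if (n < 0) return -1;
        long m = b.parse(in, pos + static_cast<std::size_t>(n));
        return m < 0 ? -1 : n + m;
    }};
}

// *a: repeat a until it fails; zero repetitions is still a match.
inline parser operator*(parser a) {
    return parser{[a](const std::string& in, std::size_t pos) -> long {
        long total = 0;
        for (;;) {
            long n = a.parse(in, pos + static_cast<std::size_t>(total));
            if (n <= 0) break;  // stop on failure; also guards against empty matches
            total += n;
        }
        return total;
    }};
}

// +a: one a, followed by zero or more.
inline parser operator+(parser a) { return a >> *a; }

// !a: zero or one a.
inline parser operator!(parser a) {
    return parser{[a](const std::string& in, std::size_t pos) -> long {
        long n = a.parse(in, pos);
        return n >= 0 ? n : 0;
    }};
}

// a % b: one or more a, separated by b.
inline parser operator%(parser a, parser b) { return a >> *(b >> a); }

}  // namespace sketch

inline void loop_examples() {
    sketch::parser item = sketch::lit("a"), sep = sketch::lit(",");
    assert((*item).parse("aaab", 0) == 3);
    assert((*item).parse("b", 0) == 0);   // zero repetitions still match
    assert((+item).parse("aab", 0) == 2);
    assert((+item).parse("b", 0) == -1);  // at least one is required
    assert((!item).parse("aa", 0) == 1);  // at most one is consumed
    assert((!item).parse("b", 0) == 0);
    assert((item % sep).parse("a,a,a", 0) == 5);
    assert((item % sep).parse("a,a,", 0) == 3);  // trailing separator is left over
}
```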
Primitive type operands
For binary operators, one of the operands, but not both, may be a char, wchar_t, char const* or wchar_t const*. Where P is a parser object, here are some examples:
P | 'x'
P - L"Hello World"
'x' >> P
"bebop" >> P
It is important to emphasize that C++ mandates that operators may only be overloaded if at least one argument is a user-defined type. Typically, in an expression involving multiple operators, explicitly typing the leftmost operand as a parser is enough: the result of each operation is itself a parser, so every operand to its right is in turn combined with a parser. Examples:
r = 'a' | 'b' | 'c' | 'd'; // ill formed
r = ch_p('a') | 'b' | 'c' | 'd'; // OK
The second case is parsed as follows:
r -> (((chlit<char> | char) | char) | char)
    a -> (chlit<char> | char)
r -> (((a) | char) | char)
    b -> (a | char)
r -> (((b)) | char)
    c -> (b | char)
r -> (((c)))
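The same propagation can be reproduced in a self-contained sketch. Here sketch::ch is a hypothetical counterpart of ch_p, and the two mixed overloads of operator| are an assumption about how a primitive operand could be accepted; none of this is Spirit's actual code. The static_assert shows why the all-char form cannot work: with two char operands the built-in bitwise OR is used, and its result is a plain int.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <type_traits>

namespace sketch {

// Hypothetical type-erased parser: parse returns the matched length, or -1.
struct parser {
    std::function<long(const std::string&, std::size_t)> parse;
};

// Hypothetical counterpart of ch_p: matches the single character c.
inline parser ch(char c) {
    return parser{[c](const std::string& in, std::size_t pos) -> long {
        return (pos < in.size() && in[pos] == c) ? 1 : -1;
    }};
}

// Both operands are parsers.
inline parser operator|(parser a, parser b) {
    return parser{[a, b](const std::string& in, std::size_t pos) -> long {
        long n = a.parse(in, pos);
        return n >= 0 ? n : b.parse(in, pos);
    }};
}

// Mixed overloads: the primitive operand is converted to a parser first.
inline parser operator|(parser a, char c) { return a | ch(c); }
inline parser operator|(char c, parser b) { return ch(c) | b; }

}  // namespace sketch

// With two char operands no overload is considered: this is the built-in
// bitwise OR, and its result is an int, not a parser.
static_assert(std::is_same<decltype('a' | 'b'), int>::value,
              "'a' | 'b' is integer arithmetic");

inline void primitive_operand_examples() {
    // Groups as ((ch('a') | 'b') | 'c') | 'd': each step yields a parser, so
    // the next char operand again meets a user-defined type on its left.
    sketch::parser r = sketch::ch('a') | 'b' | 'c' | 'd';
    assert(r.parse("c", 0) == 1);
    assert(r.parse("e", 0) == -1);
    sketch::parser l = 'x' | sketch::ch('y');  // primitive on the left also works
    assert(l.parse("x", 0) == 1);
    assert(l.parse("y", 0) == 1);
}
```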
Operator precedence and grouping
Since we are defining our meta-language in C++, we follow C/C++'s operator precedence rules. Grouping expressions inside parentheses overrides this (e.g., *(a | b) reads: match a or b zero (0) or more times).
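To see how C++ groups an expression before any parser is involved, the following sketch uses a hypothetical recorder type, sketch::expr, invented for this illustration. Each overloaded operator writes down its operands, so the resulting text shows the grouping chosen by the compiler.

```cpp
#include <cassert>
#include <string>

namespace sketch {

// Hypothetical expression recorder: each operator notes how C++ grouped it.
struct expr {
    std::string text;
};

inline expr operator>>(const expr& a, const expr& b) {
    return expr{"(" + a.text + " >> " + b.text + ")"};
}

inline expr operator|(const expr& a, const expr& b) {
    return expr{"(" + a.text + " | " + b.text + ")"};
}

inline expr operator*(const expr& a) { return expr{"*" + a.text}; }

inline expr operator!(const expr& a) { return expr{"!" + a.text}; }

}  // namespace sketch

inline void precedence_examples() {
    sketch::expr a{"a"}, b{"b"}, c{"c"};
    assert((a >> b | c).text == "((a >> b) | c)");  // >> binds tighter than |
    assert((a | b >> c).text == "(a | (b >> c))");
    assert((*a | b).text == "(*a | b)");            // unary * applies to a only
    assert((*(a | b)).text == "*(a | b)");          // parentheses override this
    assert((a >> !b | b).text == "((a >> !b) | b)");
}
```

The last example shows that the sequential-or equivalence a >> !b | b needs no parentheses: the unary ! binds first, then >>, then |.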
Copyright © 1998-2003 Joel de Guzman
Use, modification and distribution is subject to the Boost Software License, Version 1.0. (See accompanying file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)