The Lazy Parser
Closures are cool. They let us inject stack-based local variables anywhere in our recursive-descent hierarchy. Typically, we store temporary variables, generated by our semantic actions, in closure variables as a means of passing information up and down the recursive descent.
Now imagine this... Keeping in mind that closure variables can be of just about any type, we can store a parser, a rule, or a pointer to a parser or rule, in a closure variable. Yeah, right, so what?... Ok, hold on... What if we could use this closure variable to initiate a parse? Think about it for a second. Suddenly we'd have some powerful dynamic parsers! Suddenly we'd have a full round trip from Phoenix to Spirit and back! Phoenix semantic actions choose the right Spirit parser, and Spirit parsers choose the right Phoenix semantic action. Oh MAN, what a honky cool idea, I might say!!
lazy_p
This is the idea behind the lazy_p parser. The lazy_p syntax is:
lazy_p(actor)
where actor is a Phoenix expression that returns a Spirit parser. This returned parser is used in the parsing process.
Example:
lazy_p(phoenix::val(int_p))[assign_a(result)]
Semantic actions attached to the lazy_p parser expect the same signature as that of the returned parser (int_p, in our example above).
lazy_p example
To give you a better glimpse (see the lazy_parser.cpp), say you want to parse inputs such as:
dec {
1 2 3
bin {
1 10 11
}
4 5 6
}
where bin {...} and dec {...} specify the numeric format (binary or decimal) that we expect to read. If we analyze the input, we want a grammar like:
base = "bin" | "dec";
block = base >> '{' >> *block_line >> '}';
block_line = number | block;
We intentionally left out the number rule. The tricky part is that the way the number rule behaves depends on the result of the base rule. If base got a "bin", then number should parse binary numbers. If base got a "dec", then number should parse decimal numbers. Typically, we'd have to rewrite our grammar to accommodate the different parsing behaviors:
block =
        "bin" >> '{' >> *bin_line >> '}'
    |   "dec" >> '{' >> *dec_line >> '}'
    ;
bin_line = bin_p | block;
dec_line = int_p | block;
While this is fine, the redundancy makes us want to find a better solution; after all, we want to make full use of Spirit's dynamic parsing capabilities. Apart from that, there will be cases where the set of parsing behaviors for our number rule is not known when the grammar is written. We'll only be given a map of string descriptors and corresponding rules [e.g. (("dec", int_p), ("bin", bin_p) ... etc...)].
The basic idea is to have a rule for binary and decimal numbers. That's easy enough to do (see numerics). When base is being parsed, in your semantic action, store a pointer to the selected rule in a closure variable (e.g. block.int_rule). Here's an example:
base =
        str_p("bin")[block.int_rule = &var(bin_rule)]
    |   str_p("dec")[block.int_rule = &var(dec_rule)]
    ;
With this setup, your number rule will now look something like:
number = lazy_p(*block.int_rule);
The lazy_parser.cpp example does it a bit differently, ingeniously using the symbol table to dispatch the correct rule, but in essence, both strategies are similar. This technique, using the symbol table, is detailed in the Techniques section: nabialek_trick. Admittedly, when you add up all the rules, the resulting grammar is more complex than the hard-coded grammar above. Yet, for more complex grammar patterns with a lot more rules to choose from, the additional setup is well worth it.
Copyright © 2003 Joel de Guzman
Copyright © 2003 Vaclav Vesely
Use, modification and distribution is subject to the Boost Software License, Version 1.0. (See accompanying file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)