Implementing PEG Features

Original article: https://medium.com/@gvanrossum_83706/implementing-peg-features-76caa4b2151f

After making my PEG parser generator self-hosted in the last post, I’m now ready to show how to implement various other PEG features.

[This is part 8 of my PEG series. See the Series Overview for the rest.]

We’ll cover the following PEG features:

Named items: NAME=item (the name can be used in an action)
Lookaheads: &item (positive) and !item (negative)
Grouping items in parentheses: (item item ...)
Optional items: [item item ...] and item?
Repeated items: item* (zero or more) and item+ (one or more)

Let’s start with named items. These are handy when we have multiple items in one alternative that refer to the same rule, like this:

expr: term '+' term

Our generator allows us to refer to the second term by appending a digit 1 to the variable name, so we could just write the action like this:

expr: term '+' term { term + term1 }

But this is not to everyone’s liking, and I would personally prefer to be able to write something like this:

expr: left=term '+' right=term { left + right }

It’s easy to support this in the meta-grammar by changing the rule for item as follows:

item:
    | NAME '=' atom { NamedItem(name.string, atom) }
    | atom { atom }
atom:
    | NAME { name.string }
    | STRING { string.string }

(Where atom is just the old item.)

This requires us to add a definition of the NamedItem class to grammar.py, which is another one of those data classes you’ve heard so much about — this one having attributes name and item.

We also need to make some small changes to the code generator, which I’ll leave as an exercise for the reader (or you can check my repo :-). The generated code will now assign the result of matching each item to a variable with the indicated name, rather than a name derived from the item’s rule name. This also works for items that are tokens (either of the form NAME or string literals like ':=').
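
To make the idea concrete, here is a minimal sketch of what the NamedItem data class and the name selection in the generator might look like. The helper variable_names() is hypothetical (it is not taken from the actual generator), and the choice to give anonymous string literals no variable at all is an assumption made for this illustration.

```python
from dataclasses import dataclass


@dataclass
class NamedItem:
    name: str   # the name given in the grammar, e.g. "left"
    item: str   # the underlying item, e.g. "term"


def variable_names(items):
    """Pick a variable name for each item of one alternative (hypothetical helper)."""
    seen = {}
    names = []
    for item in items:
        if isinstance(item, NamedItem):
            names.append(item.name)      # an explicit name wins
        elif item[0] in "'\"":
            names.append(None)           # assumption: unnamed string literals get no variable
        else:
            base = item.lower()
            count = seen.get(base, 0)
            seen[base] = count + 1
            # the first use is "term", the second is "term1", and so on
            names.append(base if count == 0 else f"{base}{count}")
    return names


print(variable_names(["term", "'+'", "term"]))
print(variable_names([NamedItem("left", "term"), "'+'", NamedItem("right", "term")]))
```

The first call reproduces the digit-suffix scheme (term, term1), while the second shows the named items taking over.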

Next up are lookaheads. You may be familiar with these from regular expressions. A lookahead can reject or accept the current alternative based on what follows it, without advancing the input pointer. A positive lookahead accepts if its item is recognized; a negative lookahead accepts if its item is not recognized.

Lookaheads can actually be used as a cleaner way to address the mess with parsing actions I wrote about in the previous episode: Rather than allowing actions to reject an already accepted alternative by returning None, we can prefix the OP item with a lookahead that ensures it’s not "}". The old rule for stuff ended like this:

    | OP { None if op.string in ("{", "}") else op.string }

The new version looks like this:

    | !"}" OP { op.string }

This moves the special-casing of curly braces from the action to the grammar, where it properly belongs. We don’t need to check for "{", since it matches an earlier alternative (this was true for the old version too, actually, but I forgot :-).

We add the grammar for lookaheads to the rule for item, like so:

item:
    | NAME '=' atom { NamedItem(name.string, atom) }
    | atom { atom }
    | "&" atom { Lookahead(atom, True) }
    | "!" atom { Lookahead(atom, False) }

Again, we have to add a Lookahead data class to grammar.py (and import it in @subheader!), and twiddle the generator a bit. The generated code uses the following helper method:

    def lookahead(self, positive, func, *args):
        mark = self.mark()
        ok = func(*args) is not None
        self.reset(mark)
        return ok == positive

In our case, the generated code for this alternative looks like this:

        if (True
            and self.lookahead(False, self.expect, "}")
            and (op := self.expect(OP))
        ):
            return op . string
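
To see the helper in action outside the generated parser, here is a small self-contained sketch. ToyParser is a hypothetical stand-in whose tokens are plain strings, and any_token() stands in for self.expect(OP); only lookahead() is copied from the helper shown above.

```python
class ToyParser:
    """Tiny stand-in for the generated parser; tokens are plain strings."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def mark(self):
        return self.pos

    def reset(self, pos):
        self.pos = pos

    def expect(self, text):
        # Consume and return the next token if it equals `text`, else None.
        if self.pos < len(self.tokens) and self.tokens[self.pos] == text:
            self.pos += 1
            return text
        return None

    def any_token(self):
        # Stand-in for self.expect(OP): consume any one token.
        if self.pos < len(self.tokens):
            self.pos += 1
            return self.tokens[self.pos - 1]
        return None

    def lookahead(self, positive, func, *args):
        mark = self.mark()
        ok = func(*args) is not None
        self.reset(mark)
        return ok == positive

    def op_not_close_brace(self):
        # Mirrors the alternative  !"}" OP { op.string }
        mark = self.mark()
        if (True
            and self.lookahead(False, self.expect, "}")
            and (op := self.any_token())
        ):
            return op
        self.reset(mark)
        return None


p = ToyParser(["+", "}"])
first = p.op_not_close_brace()    # "+" passes the negative lookahead and is consumed
second = p.op_not_close_brace()   # "}" is rejected, and nothing is consumed
print(first, second, p.mark())
```

The point to notice is that the rejected call leaves the position untouched, because lookahead() always resets to its mark.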

As the grammar fragment above suggests, lookaheads cannot be named. This could easily be changed, but I can’t think of anything useful to do with the value; also, for negative lookaheads, the value would always be None.

Next let’s add parenthesized groups. The best place to add these to the meta-grammar is in the rule for atom:

atom:
    | NAME { name.string }
    | STRING { string.string }
    | "(" alts ")" { self.synthetic_rule(alts) }

The first two alternatives are unchanged. The action for the new alternative uses a hack (whose implementation will remain unexplained) that allows the meta-parser to add new rules to the grammar on the fly. This helper function (defined on the meta-parser) returns the name of the new Rule object. This name is a fixed prefix followed by a number to make it unique, e.g. _synthetic_rule_1.
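
For readers who want a rough picture anyway, the bookkeeping could be as simple as the following sketch. This is a guess at the shape of such a helper, not the real implementation: the class MetaParserSketch, the extra_rules list, the simplified Rule class and the numbering starting at 1 are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Rule:
    name: str
    alts: list


@dataclass
class MetaParserSketch:
    """Only the bookkeeping part of a meta-parser; the parsing itself is omitted."""
    extra_rules: list = field(default_factory=list)
    counter: int = 0

    def synthetic_rule(self, alts):
        # Invent a unique name, record the new rule, and hand back the name
        # so the enclosing alternative can refer to it like any other rule.
        self.counter += 1
        name = f"_synthetic_rule_{self.counter}"
        self.extra_rules.append(Rule(name, alts))
        return name


mp = MetaParserSketch()
print(mp.synthetic_rule([['"{"'], ['"}"']]))
print(mp.synthetic_rule([["term", "'+'", "term"]]))
print([rule.name for rule in mp.extra_rules])
```

After parsing, the collected extra rules would simply be added to the grammar alongside the rules written by the user.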

You might wonder what happens if the synthetic rule ends up being abandoned due to the meta-parser backtracking over it. I don’t see where the current meta-grammar would allow this without failing, but it’s pretty safe — at worst there’s an unused rule in the grammar. And due to the memoization cache, the same action will never be executed twice for the same input position, so that’s not a problem either (but even if it were, at worst we’d have a dead rule).

Using alts inside the parentheses means that we can use the vertical bar in the group, which is one of the purposes of grouping. For example, if we wanted to make sure our lookahead-based solution for the “action mess” would not accidentally match {, we could update the negative lookahead like this:

    | !("{" | "}") OP { op.string }

Even better, groups can also contain actions. This wouldn’t be useful in the “action mess” solution, but there are other cases where it’s useful. And because we generate a synthetic rule anyway, implementing it doesn’t require any extra work (beyond implementing synthetic_rule :-).

On to optional items. Like I did in the old pgen, I am using square brackets to indicate an optional group. This is often useful, for example a grammar rule describing a Python for loop might use this to indicate that there is an optional else clause. The grammar again can be added to the rule for atom, like so:

atom:
    | NAME { name.string }
    | STRING { string.string }
    | "(" alts ")" { self.synthetic_rule(alts) }
    | "[" alts "]" { Maybe(self.synthetic_rule(alts)) }

Here Maybe is another data class, with a single item attribute. We modify the code generator to generate code that preserves the value returned by the synthetic parsing function for the contained alternatives, but doesn’t fail if that value is None. We do this by essentially adding or True to the code, like this code for [term]:

if (True
    and ((term := self.term()) or True)
):
    return term
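
The effect of the or True idiom is easy to check in isolation. In this small sketch, match_optional() is a hypothetical function that takes the parsing function as an argument, and it returns a tuple only so that the two outcomes are distinguishable when printed.

```python
def match_optional(parse_term):
    # Mirrors the generated code for [term]: the alternative succeeds
    # whether or not parse_term() finds anything.
    if (True
        and ((term := parse_term()) or True)
    ):
        return ("matched", term)
    return ("failed", None)


print(match_optional(lambda: "x"))    # the optional item is present
print(match_optional(lambda: None))   # the optional item is absent
```

In both cases the alternative matches; the only difference is whether term holds a value or None.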

Moving on to repetitions, another useful PEG feature (the notation is borrowed from regular expressions and is also used in pgen). There are two forms: appending a star to an atom means “zero or more repetitions” while appending a plus sign means “one or more repetitions”. For various reasons I ended up rewriting the grammar rules for item and atom, inserting an intermediate rule that I named molecule:

item:
    | NAME '=' molecule { NamedItem(name.string, molecule) }
    | "&" atom { Lookahead(atom) }
    | "!" atom { Lookahead(atom, False) }
    | molecule { molecule }
molecule:
    | atom "?" { Maybe(atom) }
    | atom "*" { Loop(atom, False) }
    | atom "+" { Loop(atom, True) }
    | atom { atom }
    | "[" alts "]" { Maybe(self.synthetic_rule(alts)) }
atom:
    | NAME { name.string }
    | STRING { string.string }
    | "(" alts ")" { self.synthetic_rule(alts) }

Observe that this introduces an alternative syntax for optionals (atom?) which requires no additional implementation effort since it’s just another way to create a Maybe node.

The rule refactoring here was needed because I don’t want to easily allow anomalies like optional repetitions (since that’s just a zero-or-more repetition), repeated repetitions (the inner one would gobble up all matches since PEG always uses eager matching), or repeated optionals (which would stop the parser dead if the optional item doesn’t match). Note that this isn’t a 100% solution, since you can still write something like (foo?)*. The parser generator will have to add a check for this situation, but that’s beyond the scope of this series.
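
As a rough illustration of what such a check could look like, here is a sketch that flags a loop whose item can succeed without consuming input. Everything in it is hypothetical: the rules dictionary (rule name to list of alternatives), the simplified Maybe and Loop classes, and the two helper functions. Lookaheads are not modeled at all.

```python
from dataclasses import dataclass


@dataclass
class Maybe:
    item: object


@dataclass
class Loop:
    item: object
    nonempty: bool = False


def can_match_empty(item, rules, seen=frozenset()):
    """Conservative check: can this item succeed without consuming input?"""
    if isinstance(item, Maybe):
        return True
    if isinstance(item, Loop):
        return not item.nonempty or can_match_empty(item.item, rules, seen)
    if isinstance(item, str) and item in rules and item not in seen:
        # A rule can match empty if some alternative consists only of items
        # that can match empty; `seen` guards against recursive rules.
        seen = seen | {item}
        return any(
            all(can_match_empty(part, rules, seen) for part in alt)
            for alt in rules[item]
        )
    return False  # tokens and string literals always consume input


def is_suspicious_loop(item, rules):
    return isinstance(item, Loop) and can_match_empty(item.item, rules)


# (foo?)* becomes a loop over a synthetic rule whose only alternative is foo?
rules = {"_synthetic_rule_1": [[Maybe("foo")]]}
print(is_suspicious_loop(Loop("_synthetic_rule_1"), rules))
print(is_suspicious_loop(Loop("foo"), rules))
```

A generator could run a check along these lines over every Loop node and report an error before emitting any code.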

The Loop data class has two attributes, item and nonempty. The generated code uses a helper method on the generated parser, loop(), whose signature is similar to that of lookahead() shown earlier:

    def loop(self, nonempty, func, *args):
        mark = self.mark()
        nodes = []
        while (node := func(*args)) is not None:
            nodes.append(node)
        if len(nodes) >= nonempty:
            return nodes
        self.reset(mark)
        return None

If nonempty is False (meaning the grammar used *) this will never fail — instead it will return an empty list when it sees no occurrences of the item. In order to make this work we make the parser generator emit is not None checks rather than the more lenient “truthy” checks I showed in a previous post — a “truthy” check would return False if an empty list was recognized.
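
The behavior described here can be tried out with a small self-contained sketch. ToyParser is again a hypothetical stand-in with plain string tokens; loop() follows the helper above, with the assignment expression parenthesized so that node is bound to the parsed value rather than to the result of the comparison.

```python
class ToyParser:
    """Tiny stand-in for the generated parser; tokens are plain strings."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def mark(self):
        return self.pos

    def reset(self, pos):
        self.pos = pos

    def expect(self, text):
        # Consume and return the next token if it equals `text`, else None.
        if self.pos < len(self.tokens) and self.tokens[self.pos] == text:
            self.pos += 1
            return text
        return None

    def loop(self, nonempty, func, *args):
        mark = self.mark()
        nodes = []
        # Parenthesized so that `node` holds the value returned by func.
        while (node := func(*args)) is not None:
            nodes.append(node)
        if len(nodes) >= nonempty:
            return nodes
        self.reset(mark)
        return None


p = ToyParser(["a", "a", "b"])
zero_or_more = p.loop(False, p.expect, "a")   # consumes both "a" tokens
empty = p.loop(False, p.expect, "a")          # nothing left to match, still succeeds
one_or_more = p.loop(True, p.expect, "a")     # needs at least one match, so it fails
print(zero_or_more, empty, one_or_more, p.mark())

# Why the generator must test "is not None": an empty list is falsy.
print(empty is not None, bool(empty))
```

The last line shows the difference between the two kinds of check: the empty list counts as a successful match under is not None, but would look like a failure under a truthiness test.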

And that’s all for today! I was going to discuss the “cut” operator (~) present in TatSu, but I haven’t encountered a real use case for it yet, so I wouldn’t be the best person to explain it — the TatSu docs only give a toy example that doesn’t motivate me much. I haven’t found it in other PEG parser generators either, so maybe it’s a TatSu invention. Maybe I’ll explain it in the future. (In the meantime I did implement it, in case it’s ever useful. :-)

I think the next episode will be about my experience writing a PEG grammar that can parse all of Python. This is mostly how I spent the Python core developer sprint that was held this week in London, with logistical support from Bloomberg and financial support from the PSF and some attendees’ employers (e.g. Dropbox paid for my hotel and airfare). Special thanks go to Emily Morehouse and Pablo Galindo Salgado, who were very helpful writing tools and tests. Next up for that project is writing a performance benchmark, and then we’re going to add actions to this grammar so it can create AST trees that can be compiled by the CPython bytecode compiler. Exciting times!

License for this article and the code shown: CC BY-NC-SA 4.0
