[GumTree in Practice] Why does the diff output contain so many moves? How do I choose a Matcher?

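Preface

As the title says. This post covers a problem I ran into while calling the GumTree API: a change that is plainly an update gets broken down into moves and inserts, which makes the diff output needlessly complicated.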

Main text

My diff

[Screenshot of the diff: the expression -dim / 2 is changed to -0.5 * dim.]

My previous source code

I wrote this by following the GumTree wiki, and I also stepped through GumTree in the debugger myself.

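// Register the available tree generators (parsers) before computing a diff.
Run.initGenerators();
GumTreeProperties properties = new GumTreeProperties();
// 3rd argument: tree generator id, 4th argument: matcher id. Both are null here.
Diff diff = Diff.compute(MyUtil.srcFilePath, MyUtil.dstFilePath, null, null, properties);
EditScript editScript = diff.editScript;
List<Action> actions = editScript.asList();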

Output:

Action 1:
===
insert-node
---
factor [4653,4657]
to
term [4653,4661]
at 0

INS factor@@[4653,4657]@@ "-0.5" @TO@ term@@[4653,4661]@@ "-dim / 2" 


Action 2:
===
insert-node
---
operator: * [4658,4659]
to
term [4653,4661]
at 1

INS operator@@[4658,4659]@@ "*" @TO@ term@@[4653,4661]@@ "-dim / 2" 


Action 3:
===
move-tree
---
name: dim [4654,4657]
to
term [4653,4661]
at 2

MOV name@@[4654,4657]@@ "dim" @AFTER@ number@@[4660,4661]@@ "2" 


Action 4:
===
move-tree
---
operator: - [4653,4654]
to
factor [4653,4657]
at 0

MOV operator@@[4653,4654]@@ "-" @AFTER@ operator@@[4653,4654]@@ "-" 


Action 5:
===
delete-node
---
factor [4653,4657]
===

DEL factor@@[4653,4657]@@ "-dim" @FROM@ term@@[4653,4661]@@ "-dim / 2" 


Action 6:
===
delete-node
---
operator: / [4658,4659]
===

DEL operator@@[4658,4659]@@ "/" @FROM@ term@@[4653,4661]@@ "-dim / 2" 


Action 7:
===
delete-node
---
number: 2 [4660,4661]
===

DEL number@@[4660,4661]@@ "2" @FROM@ term@@[4653,4661]@@ "-dim / 2" 

As you can see, there are many moves, which makes the output painful to read.
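To compare matchers on more than a gut feeling, it helps to count how often each action kind appears. Below is a minimal sketch that tallies action kinds from text output shaped like the listing above. It assumes that every action header (for example move-tree) sits on the line directly after a === separator; the class and method names are made up for illustration and are not part of GumTree.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class ActionTally {
    // Counts action kinds in GumTree-style text output.
    // Assumption: an action header such as "move-tree" or "insert-node"
    // is the line that directly follows a "===" separator line.
    static Map<String, Integer> tally(String output) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        String previous = "";
        for (String raw : output.split("\\R")) {
            String line = raw.trim();
            if (previous.equals("===") && line.matches("[a-z]+-(node|tree)")) {
                counts.merge(line, 1, Integer::sum);
            }
            previous = line;
        }
        return counts;
    }

    public static void main(String[] args) {
        // A shortened sample in the same shape as the output in this post.
        String sample = String.join("\n",
            "===", "insert-node", "---", "factor [4653,4657]", "",
            "===", "move-tree", "---", "name: dim [4654,4657]", "",
            "===", "delete-node", "---", "factor [4653,4657]", "===", "");
        System.out.println(tally(sample));
    }
}
```

Applied to the seven actions listed above, such a tally would come out as two insert-node, two move-tree and three delete-node actions.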

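Looking for a solution

I knew this diff was produced by SimplifiedChawatheScriptGenerator, so I used that name as the keyword to search the gumtree repo: https://github.com/GumTreeDiff/gumtree/search?q=SimplifiedChawatheScriptGenerator&type=issues

I found an issue very similar to mine.

The reply was: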

Hi!

This is not a bug, it’s one thing that can happen using the classic gumtree matcher.

However, using gumtree-simple matcher seems to improve the diff.

Cheers.

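I then traced things back step by step. Starting from Matcher m = Matchers.getInstance().getMatcherWithFallback(matcher); I went to getMatcherWithFallback and from there to the Matcher type. Right-clicking Matcher and choosing Quick Type Hierarchy led me to the CompositeMatchers class, which contains many annotations like these: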

@Register(id = "gumtree", defaultMatcher = true, priority = Registry.Priority.HIGH)

@Register(id = "gumtree-simple")

@Register(id = "gumtree-simple-id")

From these I roughly understood that I only need to change the matcher argument in the call Diff diff = Diff.compute(MyUtil.srcFilePath, MyUtil.dstFilePath, null, null, properties);.
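The method name getMatcherWithFallback and the defaultMatcher = true flag on the "gumtree" id suggest why passing null as the matcher argument ended up using the classic matcher. The toy registry below illustrates that assumed lookup behaviour. It is not GumTree's code: the class, its methods, and the choice to fall back to the default for unknown ids are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: this is NOT GumTree's real Matchers/Registry code.
class ToyMatcherRegistry {
    private final Map<String, String> matchers = new HashMap<>();
    private String defaultId;

    // Mirrors the idea behind @Register(id = ..., defaultMatcher = ...).
    void register(String id, String description, boolean isDefault) {
        matchers.put(id, description);
        if (isDefault) {
            defaultId = id;
        }
    }

    // Returns the requested id if it is registered, otherwise the default id.
    String resolveWithFallback(String requestedId) {
        if (requestedId != null && matchers.containsKey(requestedId)) {
            return requestedId;
        }
        return defaultId;
    }

    public static void main(String[] args) {
        ToyMatcherRegistry registry = new ToyMatcherRegistry();
        registry.register("gumtree", "classic matcher", true);
        registry.register("gumtree-simple", "simple matcher", false);
        // null models the original call, where no matcher id was given.
        System.out.println(registry.resolveWithFallback(null));
        System.out.println(registry.resolveWithFallback("gumtree-simple"));
    }
}
```

Under this model, the original call with null resolves to "gumtree", while an explicit "gumtree-simple" is used as given.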

Source code after the change

Run.initGenerators();
GumTreeProperties properties = new GumTreeProperties();
// The matcher argument is now "gumtree-simple".
Diff diff = Diff.compute(MyUtil.srcFilePath, MyUtil.dstFilePath, null, "gumtree-simple", properties);
EditScript editScript = diff.editScript;
List<Action> actions = editScript.asList();

Output:

Action 1:
===
insert-tree
---
term [4653,4663]
    factor [4653,4657]
        operator: - [4653,4654]
        number: 0.5 [4654,4657]
    operator: * [4658,4659]
    name: dim [4660,4663]
to
arglist [4636,4661]
at 2

INS term@@[4653,4663]@@ "-0.5 * dim" @TO@ arglist@@[4636,4661]@@ "2 * FastMath.PI, -dim / 2" 


Action 2:
===
delete-tree
---
term [4653,4661]
    factor [4653,4657]
        operator: - [4653,4654]
        name: dim [4654,4657]
    operator: / [4658,4659]
    number: 2 [4660,4661]

DEL term@@[4653,4661]@@ "-dim / 2" @FROM@ arglist@@[4636,4661]@@ "2 * FastMath.PI, -dim / 2" 

Summary

That's all.

Update: a better option

March 11, 2021, 20:41:08
After trying many of them, I found that "gumtree-simple-id-theta" works best.

/*
 * gumtree-complete  many
 * gumtree-simple proper
 * gumtree  many
 * xy    many*2
 * change-distiller  many*3
 * gumtree-simple-id  same as gumtree-simple
 * theta    same as gumtree-complete
 * rted-theta
 * longestCommonSequence
 * classic-gumtree-theta  good now.
 * gumtree-simple-id-theta
 */
Diff diff = Diff.compute(MyUtil.srcFilePath, MyUtil.dstFilePath, null, "gumtree-simple-id-theta", properties);
EditScript editScript = diff.editScript;
List<Action> actions = editScript.asList();
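Trying each matcher id by hand is tedious, so the comparison can be put in a loop. The sketch below takes a list of matcher ids and a function that returns the action names for a given id, then orders the ids by how many actions they produced. The function is a stand-in (here fed with fake data modelled on the two listings in this post); in a real project it would wrap the Diff.compute call shown above. The class and method names are hypothetical, and a low action count is only a rough proxy for a readable diff.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class MatcherComparison {
    // Runs the supplied function once per matcher id and records how many
    // actions each id produced. The function is an assumption: it stands in
    // for a call into GumTree that returns the action names of a diff.
    static Map<String, Integer> countActions(
            List<String> matcherIds,
            Function<String, List<String>> actionNamesFor) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String id : matcherIds) {
            counts.put(id, actionNamesFor.apply(id).size());
        }
        return counts;
    }

    // Returns the matcher ids ordered from fewest to most actions.
    static List<String> rank(Map<String, Integer> counts) {
        List<String> ids = new ArrayList<>(counts.keySet());
        ids.sort((a, b) -> Integer.compare(counts.get(a), counts.get(b)));
        return ids;
    }

    public static void main(String[] args) {
        // Fake data: 7 actions for "gumtree", 2 for "gumtree-simple".
        Function<String, List<String>> fake = id -> id.equals("gumtree")
            ? Arrays.asList("insert-node", "insert-node", "move-tree",
                            "move-tree", "delete-node", "delete-node",
                            "delete-node")
            : Arrays.asList("insert-tree", "delete-tree");
        Map<String, Integer> counts =
            countActions(Arrays.asList("gumtree", "gumtree-simple"), fake);
        System.out.println(counts);
        System.out.println(rank(counts));
    }
}
```

Keeping the GumTree call behind a plain function makes the ranking logic testable without the library on the classpath.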

References

GitHub, plus my own observations and understanding.
