Background:
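I recently set out to reproduce the tree-based clone detection paper: L. Jiang, G. Misherghi, Z. Su, and S. Glondu. Deckard: Scalable and accurate tree-based detection of code clones. In Proceedings of ICSE, 2007.
This requires building an AST for every method in a Java class, converting each AST to dot format, and finally converting the dot files into vectors (the authors have already implemented this last step on GitHub (https://github.com/skyhover/Deckard): just run vdbgen).
The similarity between pieces of code is then judged by the similarity between their vectors.
This is the core idea of tree-based clone detection.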
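As a rough illustration of the vector step, the sketch below assumes that a characteristic vector simply counts how often each AST node type (the type numbers in the dot output further down) occurs in a subtree, and that two vectors are compared by Euclidean distance, in the spirit of the Deckard paper. The class and method names are hypothetical, and vdbgen's real implementation differs in detail (for example in which node types it counts and how it groups similar vectors).

```java
/** Hypothetical helper: node-type count vectors and their Euclidean distance. */
public class CharacteristicVector {
    // Assumption: JDT node type ids are small positive ints, so 128 slots are enough here.
    static final int DIM = 128;

    /** Counts how often each node type id occurs in a subtree's list of node types. */
    static int[] fromTypes(int... nodeTypes) {
        int[] v = new int[DIM];
        for (int t : nodeTypes) {
            if (t <= 0 || t >= DIM) {
                throw new IllegalArgumentException("unexpected node type: " + t);
            }
            v[t]++;
        }
        return v;
    }

    /** Euclidean distance between two count vectors; 0.0 means identical type counts. */
    static double distance(int[] a, int[] b) {
        double sum = 0;
        for (int i = 0; i < DIM; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        // Node types of the startServer(...) statement in the example dot file:
        // EXPRESSION_STATEMENT(21), METHOD_INVOCATION(32), SIMPLE_NAME(42),
        // TYPE_LITERAL(57), SIMPLE_TYPE(43), SIMPLE_NAME(42)
        int[] original = fromTypes(21, 32, 42, 57, 43, 42);
        // Hypothetical renamed copy, e.g. stopServer(OtherResource.class): same node types
        int[] renamed = fromTypes(21, 32, 42, 57, 43, 42);
        // Hypothetical call with a string argument, e.g. log("x"): 21, 32, 42, 45
        int[] other = fromTypes(21, 32, 42, 45);
        System.out.println(distance(original, renamed)); // 0.0
        System.out.println(distance(original, other));   // 2.0
    }
}
```

Because every identifier becomes a plain SIMPLE_NAME node, a copy that differs only in names maps to the same vector, which is what lets this representation catch renamed clones.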
The complete source code is on my GitHub: https://github.com/XBWer/JDT_AST_DOT
As an example, take the following Java source file:
Input: test.java
public class test {
    int i = 1;

    public void testNonEscaped() {
        startServer(NonEscapedURIResource.class);

        WebResource r = Client.create().resource(getUri().userInfo("x.y").path("x%20y").build());
        assertEquals("CONTENT", r.get(String.class));
    }
}
Output: test.java_testNonEscaped.dot
digraph "DirectedGraph" {
graph [label = "testNonEscaped", labelloc=t, concentrate = true];
"13329486" [ type=31 line=4 ]
"327177752" [ type=83 line=4 ]
"1458540918" [ type=39 line=4 ]
"1164371389" [ type=42 line=4 ]
"517210187" [ type=8 line=4 ]
"267760927" [ type=21 line=5 ]
"633070006" [ type=32 line=5 ]
"1459794865" [ type=42 line=5 ]
"1776957250" [ type=57 line=5 ]
"1268066861" [ type=43 line=5 ]
"827966648" [ type=42 line=5 ]
"1938056729" [ type=60 line=7 ]
"1273765644" [ type=43 line=7 ]
"701141022" [ type=42 line=7 ]
"1447689627" [ type=59 line=7 ]
"112061925" [ type=42 line=7 ]
"764577347" [ type=32 line=7 ]
"1344645519" [ type=32 line=7 ]
"1234776885" [ type=42 line=7 ]
"540159270" [ type=42 line=7 ]
"422250493" [ type=42 line=7 ]
"1690287238" [ type=32 line=7 ]
"1690254271" [ type=32 line=7 ]
"1440047379" [ type=32 line=7 ]
"343965883" [ type=32 line=7 ]
"230835489" [ type=42 line=7 ]
"280884709" [ type=42 line=7 ]
"1847509784" [ type=45 line=7 ]
"2114650936" [ type=42 line=7 ]
"1635756693" [ type=45 line=7 ]
"504527234" [ type=42 line=7 ]
"101478235" [ type=21 line=8 ]
"540585569" [ type=32 line=8 ]
"1007653873" [ type=42 line=8 ]
"836514715" [ type=45 line=8 ]
"1414521932" [ type=32 line=8 ]
"828441346" [ type=42 line=8 ]
"1899073220" [ type=42 line=8 ]
"555826066" [ type=57 line=8 ]
"174573182" [ type=43 line=8 ]
"858242339" [ type=42 line=8 ]
"13329486" -> "327177752"
"13329486" -> "1458540918"
"13329486" -> "1164371389"
"13329486" -> "517210187"
"517210187" -> "267760927"
"267760927" -> "633070006"
"633070006" -> "1459794865"
"633070006" -> "1776957250"
"1776957250" -> "1268066861"
"1268066861" -> "827966648"
"517210187" -> "1938056729"
"1938056729" -> "1273765644"
"1273765644" -> "701141022"
"1938056729" -> "1447689627"
"1447689627" -> "112061925"
"1447689627" -> "764577347"
"764577347" -> "1344645519"
"1344645519" -> "1234776885"
"1344645519" -> "540159270"
"764577347" -> "422250493"
"764577347" -> "1690287238"
"1690287238" -> "1690254271"
"1690254271" -> "1440047379"
"1440047379" -> "343965883"
"343965883" -> "230835489"
"1440047379" -> "280884709"
"1440047379" -> "1847509784"
"1690254271" -> "2114650936"
"1690254271" -> "1635756693"
"1690287238" -> "504527234"
"517210187" -> "101478235"
"101478235" -> "540585569"
"540585569" -> "1007653873"
"540585569" -> "836514715"
"540585569" -> "1414521932"
"1414521932" -> "828441346"
"1414521932" -> "1899073220"
"1414521932" -> "555826066"
"555826066" -> "174573182"
"174573182" -> "858242339"
}
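Because the output uses only two kinds of lines, node lines and edge lines, it is easy to read back for later processing. Below is a minimal sketch that assumes the exact line layout shown above; `DotReader` is a hypothetical name, and its regular expressions will not handle general dot syntax.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical reader for the node and edge lines of the dot output above. */
public class DotReader {
    // Node lines such as:  "13329486" [ type=31 line=4 ]
    static final Pattern NODE = Pattern.compile("\"(\\d+)\"\\s*\\[\\s*type=(\\d+)\\s+line=(\\d+)\\s*\\]");
    // Edge lines such as:  "13329486" -> "327177752"
    static final Pattern EDGE = Pattern.compile("\"(\\d+)\"\\s*->\\s*\"(\\d+)\"");

    /** node id -> {type, line} */
    final Map<String, int[]> nodes = new LinkedHashMap<>();
    /** parent id -> child ids, in file order */
    final Map<String, List<String>> children = new LinkedHashMap<>();

    void parse(String dotText) {
        for (String raw : dotText.split("\n")) {
            String line = raw.trim();
            Matcher m = EDGE.matcher(line);
            if (m.matches()) {
                children.computeIfAbsent(m.group(1), k -> new ArrayList<>()).add(m.group(2));
                continue;
            }
            m = NODE.matcher(line);
            if (m.matches()) {
                nodes.put(m.group(1), new int[] {Integer.parseInt(m.group(2)), Integer.parseInt(m.group(3))});
            }
            // Header, graph attribute and closing-brace lines match neither pattern and are skipped.
        }
    }

    public static void main(String[] args) {
        String dot = String.join("\n",
                "digraph \"DirectedGraph\" {",
                "graph [label = \"testNonEscaped\", labelloc=t, concentrate = true];",
                "\"13329486\" [ type=31 line=4 ]",
                "\"327177752\" [ type=83 line=4 ]",
                "\"13329486\" -> \"327177752\"",
                "}");
        DotReader r = new DotReader();
        r.parse(dot);
        int[] root = r.nodes.get("13329486");
        System.out.println("type=" + root[0] + " line=" + root[1]); // type=31 line=4
        System.out.println(r.children.get("13329486"));              // [327177752]
    }
}
```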
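In the dot file, type is the node type, i.e. the value of the corresponding node-type constant in ASTNode (for example 31 = METHOD_DECLARATION, 42 = SIMPLE_NAME, 83 = MODIFIER; for the full list see https://help.eclipse.org/luna/index.jsp?topic=%2Forg.eclipse.jdt.doc.isv%2Freference%2Fapi%2Forg%2Feclipse%2Fjdt%2Fcore%2Fdom%2FASTNode.html), and line is the line number in the source file where the node starts.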
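The numeric type can also be turned back into a class name with JDT itself: ASTNode.nodeClassForType(int) returns the node class for a given type constant. A small sketch, assuming the JDT core jar (org.eclipse.jdt.core) is on the classpath; `NodeTypeName` is a hypothetical helper name.

```java
import org.eclipse.jdt.core.dom.ASTNode;

/** Hypothetical helper: maps the numeric type attribute of a dot node to a JDT class name. */
public class NodeTypeName {
    static String of(int nodeType) {
        // nodeClassForType throws IllegalArgumentException for unknown type ids
        return ASTNode.nodeClassForType(nodeType).getSimpleName();
    }

    public static void main(String[] args) {
        // The five nodes on line 4 of the example output
        for (int t : new int[] {31, 83, 39, 42, 8}) {
            System.out.println(t + " -> " + of(t));
        }
        // 31 -> MethodDeclaration, 83 -> Modifier, 39 -> PrimitiveType, 42 -> SimpleName, 8 -> Block
    }
}
```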
The graph can be visualized by pasting the dot text into http://www.webgraphviz.com/.
Main steps:
1. Convert the Java code into an AST;
2. Override the visit methods of ASTVisitor to traverse the AST as needed;
3. Convert the AST to .dot format.
Main classes:
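The three steps can be sketched with the JDT DOM API roughly as follows. This is a minimal illustration, not the code in the repository above: `MethodToDot`, `parse`, `toDot` and `convert` are hypothetical names, and it assumes org.eclipse.jdt.core and its dependencies are on the classpath, uses AST.JLS8 as the language level, and numbers nodes sequentially instead of using the hash-code-like ids seen in the example output. It relies on ASTVisitor calling preVisit2 before and postVisit after every node, so a stack of currently open nodes gives each node's parent.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.Map;

import org.eclipse.jdt.core.JavaCore;
import org.eclipse.jdt.core.dom.AST;
import org.eclipse.jdt.core.dom.ASTNode;
import org.eclipse.jdt.core.dom.ASTParser;
import org.eclipse.jdt.core.dom.ASTVisitor;
import org.eclipse.jdt.core.dom.CompilationUnit;
import org.eclipse.jdt.core.dom.MethodDeclaration;

/** Hypothetical sketch: one dot graph per method, mimicking the example output format. */
public class MethodToDot {

    /** Step 1: parse Java source text into a JDT CompilationUnit. */
    static CompilationUnit parse(String source) {
        ASTParser parser = ASTParser.newParser(AST.JLS8); // assumption: Java 8 language level
        Map<String, String> options = JavaCore.getOptions();
        JavaCore.setComplianceOptions(JavaCore.VERSION_1_8, options);
        parser.setCompilerOptions(options);
        parser.setKind(ASTParser.K_COMPILATION_UNIT);
        parser.setSource(source.toCharArray());
        return (CompilationUnit) parser.createAST(null);
    }

    /** Steps 2 and 3: walk one method's subtree, emitting a node line per AST node plus parent->child edges. */
    static String toDot(CompilationUnit cu, MethodDeclaration method) {
        StringBuilder nodes = new StringBuilder();
        StringBuilder edges = new StringBuilder();
        Map<ASTNode, Integer> ids = new IdentityHashMap<>();
        Deque<ASTNode> open = new ArrayDeque<>(); // nodes whose children are being visited

        method.accept(new ASTVisitor() {
            @Override
            public boolean preVisit2(ASTNode node) {
                int id = ids.size() + 1; // sequential ids instead of hash codes
                ids.put(node, id);
                int line = cu.getLineNumber(node.getStartPosition());
                nodes.append(String.format("\"%d\" [ type=%d line=%d ]\n", id, node.getNodeType(), line));
                if (!open.isEmpty()) {
                    edges.append(String.format("\"%d\" -> \"%d\"\n", ids.get(open.peek()), id));
                }
                open.push(node);
                return true; // descend into children
            }

            @Override
            public void postVisit(ASTNode node) {
                open.pop(); // called once per node, after all of its children
            }
        });

        return "digraph \"DirectedGraph\" {\n"
                + "graph [label = \"" + method.getName().getIdentifier()
                + "\", labelloc=t, concentrate = true];\n"
                + nodes + edges + "}\n";
    }

    /** Converts every method declaration in the source into its own dot graph. */
    static String convert(String source) {
        CompilationUnit cu = parse(source);
        StringBuilder out = new StringBuilder();
        cu.accept(new ASTVisitor() {
            @Override
            public boolean visit(MethodDeclaration m) {
                out.append(toDot(cu, m));
                return false; // the method's subtree is already walked by toDot
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        String src = "public class test {\n"
                + "    int i = 1;\n"
                + "\n"
                + "    public void testNonEscaped() {\n"
                + "        startServer(NonEscapedURIResource.class);\n"
                + "    }\n"
                + "}\n";
        System.out.print(convert(src));
    }
}
```

Returning false from visit(MethodDeclaration) stops the outer visitor from descending, so a method declared inside an anonymous class ends up in its enclosing method's graph rather than in a graph of its own; whether that is desirable depends on the clone-detection setup.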
ASTNode: https://help.eclipse.org/luna/index.jsp?topic=%2Forg.eclipse.jdt.doc.isv%2Freference%2Fapi%2Forg%2Feclipse%2Fjdt%2Fcore%2Fdom%2FASTNode.html
ASTVisitor: https://help.eclipse.org/neon/index.jsp?topic=%2Forg.eclipse.jdt.doc.isv%2Freference%2Fapi%2Forg%2Feclipse%2Fjdt%2Fcore%2Fdom%2FASTVisitor.html
AST: https://help.eclipse.org/mars/index.jsp?topic=%2Forg.eclipse.jdt.doc.isv%2Freference%2Fapi%2Forg%2Feclipse%2Fjdt%2Fcore%2Fdom%2FAST.html