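javalang is a pure Python library for working with Java source code. It currently provides a lexer and a parser that target Java 8.
PyPI 0.13.0 | GitHub 0.13.0 | Java 8 Language Specification
Getting started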
import javalang
tree = javalang.parse.parse("package javalang.brewtab.com; class Test {}; class Test2 {}")
This returns a CompilationUnit instance, the root of the syntax tree, which can be traversed to extract different kinds of information.
print(tree.package.name)
print(tree.types[0])
print(tree.types[1].name)
javalang.brewtab.com
ClassDeclaration(annotations=[], body=[], documentation=None, extends=None, implements=None, modifiers=set(), name=Test, type_parameters=None)
Test2
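The string passed to javalang.parse.parse() must represent a complete, valid Java source file. Other functions in the javalang.parse module can parse smaller code fragments without requiring an entire compilation unit.
Syntax tree
CompilationUnit is a subclass of javalang.ast.Node, as are all of its descendants in the tree. The javalang.tree module defines the different Node subclasses, each representing a different syntactic construct you can find in Java code. For more information on the node types, see javalang/tree.py.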
Example Node subclasses | attrs |
---|---|
CompilationUnit(Node) | “package”, “imports”, “types” |
Import(Node) | “path”, “static”, “wildcard” |
Documented(Node) | “documentation” |
Declaration(Node) | “modifiers”, “annotations” |
TypeDeclaration(Declaration, Documented) | “name”, “body” |
ClassDeclaration(TypeDeclaration) | “type_parameters”, “extends”, “implements” |
Statement(Node) | “label” |
IfStatement(Statement) | “condition”, “then_statement”, “else_statement” |
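Node instances support iteration. Iterating over a node yields a (path, node) pair for the node itself and for every descendant, where path is the tuple of ancestors (including intermediate lists) leading to that node: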
for path, node in tree:
    print(path, node)
() CompilationUnit(imports=[], package=PackageDeclaration(annotations=None, documentation=None, modifiers=None, name=javalang.brewtab.com), types=[ClassDeclaration(annotations=[], body=[], documentation=None, extends=None, implements=None, modifiers=set(), name=Test, type_parameters=None), ClassDeclaration(annotations=[], body=[], documentation=None, extends=None, implements=None, modifiers=set(), name=Test2, type_parameters=None)])
(CompilationUnit(imports=[], package=PackageDeclaration(annotations=None, documentation=None, modifiers=None, name=javalang.brewtab.com), types=[ClassDeclaration(annotations=[], body=[], documentation=None, extends=None, implements=None, modifiers=set(), name=Test, type_parameters=None), ClassDeclaration(annotations=[], body=[], documentation=None, extends=None, implements=None, modifiers=set(), name=Test2, type_parameters=None)]),) PackageDeclaration(annotations=None, documentation=None, modifiers=None, name=javalang.brewtab.com)
(CompilationUnit(imports=[], package=PackageDeclaration(annotations=None, documentation=None, modifiers=None, name=javalang.brewtab.com), types=[ClassDeclaration(annotations=[], body=[], documentation=None, extends=None, implements=None, modifiers=set(), name=Test, type_parameters=None), ClassDeclaration(annotations=[], body=[], documentation=None, extends=None, implements=None, modifiers=set(), name=Test2, type_parameters=None)]), [ClassDeclaration(annotations=[], body=[], documentation=None, extends=None, implements=None, modifiers=set(), name=Test, type_parameters=None), ClassDeclaration(annotations=[], body=[], documentation=None, extends=None, implements=None, modifiers=set(), name=Test2, type_parameters=None)]) ClassDeclaration(annotations=[], body=[], documentation=None, extends=None, implements=None, modifiers=set(), name=Test, type_parameters=None)
(CompilationUnit(imports=[], package=PackageDeclaration(annotations=None, documentation=None, modifiers=None, name=javalang.brewtab.com), types=[ClassDeclaration(annotations=[], body=[], documentation=None, extends=None, implements=None, modifiers=set(), name=Test, type_parameters=None), ClassDeclaration(annotations=[], body=[], documentation=None, extends=None, implements=None, modifiers=set(), name=Test2, type_parameters=None)]), [ClassDeclaration(annotations=[], body=[], documentation=None, extends=None, implements=None, modifiers=set(), name=Test, type_parameters=None), ClassDeclaration(annotations=[], body=[], documentation=None, extends=None, implements=None, modifiers=set(), name=Test2, type_parameters=None)]) ClassDeclaration(annotations=[], body=[], documentation=None, extends=None, implements=None, modifiers=set(), name=Test2, type_parameters=None)
Iteration can also be filtered by node type:
for path, node in tree.filter(javalang.tree.ClassDeclaration):
    print(path, node)
(CompilationUnit(imports=[], package=PackageDeclaration(annotations=None, documentation=None, modifiers=None, name=javalang.brewtab.com), types=[ClassDeclaration(annotations=[], body=[], documentation=None, extends=None, implements=None, modifiers=set(), name=Test, type_parameters=None), ClassDeclaration(annotations=[], body=[], documentation=None, extends=None, implements=None, modifiers=set(), name=Test2, type_parameters=None)]), [ClassDeclaration(annotations=[], body=[], documentation=None, extends=None, implements=None, modifiers=set(), name=Test, type_parameters=None), ClassDeclaration(annotations=[], body=[], documentation=None, extends=None, implements=None, modifiers=set(), name=Test2, type_parameters=None)]) ClassDeclaration(annotations=[], body=[], documentation=None, extends=None, implements=None, modifiers=set(), name=Test, type_parameters=None)
(CompilationUnit(imports=[], package=PackageDeclaration(annotations=None, documentation=None, modifiers=None, name=javalang.brewtab.com), types=[ClassDeclaration(annotations=[], body=[], documentation=None, extends=None, implements=None, modifiers=set(), name=Test, type_parameters=None), ClassDeclaration(annotations=[], body=[], documentation=None, extends=None, implements=None, modifiers=set(), name=Test2, type_parameters=None)]), [ClassDeclaration(annotations=[], body=[], documentation=None, extends=None, implements=None, modifiers=set(), name=Test, type_parameters=None), ClassDeclaration(annotations=[], body=[], documentation=None, extends=None, implements=None, modifiers=set(), name=Test2, type_parameters=None)]) ClassDeclaration(annotations=[], body=[], documentation=None, extends=None, implements=None, modifiers=set(), name=Test2, type_parameters=None)
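Components
javalang.parse.parse is a simple convenience function: it creates a token stream for the input, initializes a new javalang.parser.Parser instance with that token stream, and then calls the parser's parse() method, which returns a CompilationUnit.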
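from javalang.parser import Parser
from javalang.tokenizer import tokenize

def parse(s):
    tokens = tokenize(s)      # lex the source into a token stream
    parser = Parser(tokens)   # wrap the stream in a fresh parser
    return parser.parse()     # parse a whole CompilationUnit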
These components can also be used individually.
Tokenizer
The tokenizer (lexer) can be invoked directly by calling javalang.tokenizer.tokenize:
print(javalang.tokenizer.tokenize('System.out.println("Hello " + "world");'))
<generator object JavaTokenizer.tokenize at 0x000001DF22FFF4C0>
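This returns a generator that yields a stream of JavaToken objects. Each token carries its value and its position (line and column). Tokens are not direct instances of JavaToken; they are instances of subclasses that identify their general type.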
tokens = list(javalang.tokenizer.tokenize('System.out.println("Hello " + "world");'))
print(tokens[6].value)
print(tokens[6].position)
print(type(tokens[6]))
print(type(tokens[7]))
"Hello "
Position(line=1, column=20)
<class 'javalang.tokenizer.String'>
<class 'javalang.tokenizer.Operator'>
Parser
The parser can also be used directly to parse code fragments.
tokens = javalang.tokenizer.tokenize('System.out.println("Hello " + "world");')
parser = javalang.parser.Parser(tokens)
print(parser.parse_expression())
MethodInvocation(arguments=[BinaryOperation(operandl=Literal(postfix_operators=[], prefix_operators=[], qualifier=None, selectors=[], value="Hello "), operandr=Literal(postfix_operators=[], prefix_operators=[], qualifier=None, selectors=[], value="world"), operator=+)], member=println, postfix_operators=[], prefix_operators=[], qualifier=System.out, selectors=[], type_arguments=None)
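The parse methods are designed for incremental parsing, so they do not restart at the beginning of the token stream. Calling a parse method more than once on the same parser results in a JavaSyntaxError exception, and so does calling the wrong parse method for the input: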
tokens = javalang.tokenizer.tokenize('System.out.println("Hello " + "world");')
parser = javalang.parser.Parser(tokens)
parser.parse_type_declaration()
Traceback (most recent call last):
  File "F:/PyCharmProjects/untitled/Javalang_test03.py", line 4, in <module>
    parser.parse_type_declaration()
  File "E:\anaconda3\envs\untitled\lib\site-packages\javalang\parser.py", line 347, in parse_type_declaration
    return self.parse_class_or_interface_declaration()
  File "E:\anaconda3\envs\untitled\lib\site-packages\javalang\parser.py", line 364, in parse_class_or_interface_declaration
    self.illegal("Expected type declaration")
  File "E:\anaconda3\envs\untitled\lib\site-packages\javalang\parser.py", line 119, in illegal
    raise JavaSyntaxError(description, at)
javalang.parser.JavaSyntaxError
The javalang.parse module provides convenience functions for parsing the more common kinds of code fragments.