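Joern study notes
Joern is a program analysis tool similar to CodeQL. It builds a code property graph (CPG) on top of the source code's AST and lets you run queries against it.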
Description | Link |
---|---|
Tool homepage | https://joern.io/ |
Github | https://github.com/joernio/joern |
Official documentation | https://docs.joern.io/ |
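Joern queries are written in Scala, so ordinary Scala idioms apply. For example, to get the first 5 elements of a list in Scala, use the `.take(5)` method:
val myList = List(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
val firstFive = myList.take(5)
This returns a list containing 1 to 5.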
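Joern traversals such as `cpg.method.name` behave like Scala iterators, so `take` should work on them the same way (e.g. `cpg.method.name.take(5).l`); treat that as an assumption to check in your Joern version. The plain-Scala sketch below shows two properties worth knowing: `take` never fails on a list shorter than n, and on an iterator it is lazy.

```scala
// take(n) returns at most n elements and never throws on a short list.
val short = List(1, 2, 3).take(5)

// On an Iterator, take is lazy: only the first 5 elements are ever computed,
// even though Iterator.from(1) is infinite.
val firstFiveTens = Iterator.from(1).map(_ * 10).take(5).toList

// drop and slice are the usual companions of take.
val rest = (1 to 10).toList.drop(5)          // everything after the first 5
val middle = (1 to 10).toList.slice(2, 5)    // indices 2, 3 and 4

println(short)
println(firstFiveTens)
println(rest)
println(middle)
```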
Joern shell shortcuts
Command | Description |
---|---|
CTRL-c | Cancels current operation/clears shell. Does not quit Joern |
CTRL-d | Quits Joern (shell must be clear) |
TAB | Autocomplete |
UP | Moves through command history |
CTRL-LEFT/RIGHT | Step through commands word-by-word (instead of character-by-character) |
CTRL-r | Searches command history. Use CTRL-r (or UP/DOWN) to cycle through your matches |
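Installation
- Installation guide: https://docs.joern.io/installation/
- For the JDK on Windows, https://adoptium.net/zh-CN/ is recommended.
- If you would rather not use the official install script, download the release directly:
  - https://github.com/joernio/joern/releases/latest/download/joern-cli.zip
  - After unzipping, make the scripts executable with chmod +x and it should be ready to use.
  - On Windows, the archive already includes ready-made .bat scripts.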
Common scripts and tips
Show the current workspace path
workspace.getPath
Switch workspace
switchWorkspace("<path_to_workspace>")
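A workspace is really just a directory.
Build the CPG database for a code base
importCode(inputPath="./x42/c/", projectName="x42-c")
Once Joern finishes analyzing the code, it saves the project information in the workspace directory under the current directory.
Use the workspace command to list the saved projects.
Reopen an earlier project with open("<project name>").
Import code directly from a string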
val code = """
void foo () {
    int x = source();
    if(x < MAX) {
        int y = 2*x;
        sink(y);
    }
}
"""
importCode.c.fromString(code)
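Generate the CPG manually and import it into Joern
Joern's importCode prints the following hint, which describes the manual workflow for large code bases: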
Invoking CPG generator in a separate process. Note that the new process will consume additional memory.
If you are importing a large codebase (and/or running into memory issues), please try the following:
1) exit joern
2) invoke the frontend: /media/shihangyu/ssd/tools/joern/joern-cli/c2cpg.sh -J-Xmx7952m /media/shihangyu/ssd/dev/nuttx/temp/test.c --output /media/shihangyu/ssd/dev/nuttx/temp/workspace/test.c/cpg.bin.zip
3) start joern, import the cpg: `importCpg("path/to/cpg")`
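Joern is not always accurate
Joern's C/C++ frontend (c2cpg, built on Eclipse CDT) uses fuzzy parsing, so unlike CodeQL it does not need to build the whole project. This makes it very tolerant of incomplete or unbuildable code, but it also leads to parse errors that make the analysis less accurate.
Take the following code as an example: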
exception_t decodeTCBInvocation(word_t invLabel, word_t length, cap_t cap,
                                cte_t *slot, bool_t call, word_t *buffer)
{
    /* Stall the core if we are operating on a remote TCB that is currently running */
    SMP_COND_STATEMENT(remoteTCBStall(TCB_PTR(cap_thread_cap_get_capTCBPtr(cap)));)
    switch (invLabel) {
    case TCBReadRegisters:
        /* Second level of decoding */
        return decodeReadRegisters(cap, length, call, buffer);
    case TCBWriteRegisters:
        return decodeWriteRegisters(cap, length, buffer);
    case TCBCopyRegisters:
        return decodeCopyRegisters(cap, length, buffer);
    case TCBSuspend:
        /* Jump straight to the invoke */
        setThreadState(NODE_STATE(ksCurThread), ThreadState_Restart);
        return invokeTCB_Suspend(
                   TCB_PTR(cap_thread_cap_get_capTCBPtr(cap)));
    case TCBResume:
        setThreadState(NODE_STATE(ksCurThread), ThreadState_Restart);
    ...
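The SMP_COND_STATEMENT macro invocation causes the parser to merge that call and the following switch block into a single Unknown node (parserTypeName CASTProblemStatement, i.e. a statement CDT could not parse):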
Unknown(
id = 76393L,
argumentIndex = -1,
argumentName = None,
code = """SMP_COND_STATEMENT(remoteTCBStall(TCB_PTR(cap_thread_cap_get_capTCBPtr(cap)));)
switch (invLabel) {
case TCBReadRegisters:
/* Second level of decoding */
return decodeReadRegisters(cap, length, call, buffer);
case TCBWriteRegisters:
return decodeWriteRegisters(cap, length, buffer);
case TCBCopyRegisters:
return decodeCopyRegisters(cap, length, buffer);
case TCBSuspend:
/* Jump straight to the invoke */
setThreadState(NODE_STATE(ksCurThread), ThreadState_Restart);
return invokeTCB_Suspend(
TCB_PTR(cap_thread_cap_get_capTCBPtr(cap)));
case TCBResume:
setThreadState(NODE_STATE(ksCurThread), ThreadState_Restart);
return invokeTCB_Resume(
TCB_PTR(cap_thread_cap_get_capTCBPtr(cap)));
case TCBConfigure:
return decodeTCBConfigure(cap, length, slot, buffer);
case TCBSetPriority:
return decodeSetPriority(cap, length, buffer);
...""",
columnNumber = Some(value = 5),
containedRef = "<empty>",
dynamicTypeHintFullName = ArraySeq(),
lineNumber = Some(value = 800),
order = 1,
parserTypeName = "CASTProblemStatement",
typeFullName = "<empty>"
),
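Once this happens, none of the function calls, switch blocks, etc. inside that span are parsed properly, so the CPG has no corresponding nodes for them.
Delete temporary projects
After code is imported temporarily with importCode.c.fromString(code), Joern automatically creates a temporary project, with names like: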
console18430029678175922266
console1988761634135478011
console9801639066733432335
To keep the workspace tidy, delete them with the following script:
workspace.projects.foreach(p => {
  if (p.name.startsWith("console")) {
    delete(p.name)
  }
})
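The `startsWith("console")` test above would also delete a real project whose name merely begins with "console". A slightly stricter sketch, assuming (from the three example names above) that temporary projects are always "console" followed only by digits; `isTempProject` is a hypothetical helper name, and the project list is made-up sample data:

```scala
// Assumption: temporary projects are named "console" + one or more digits.
val TempProject = "console\\d+".r

// Full-string match, so "console-app" or plain "console" are kept.
def isTempProject(name: String): Boolean =
  TempProject.pattern.matcher(name).matches()

// Hypothetical workspace contents.
val names = List(
  "console18430029678175922266",
  "console1988761634135478011",
  "console-app",
  "x42-c"
)

val toDelete = names.filter(isTempProject)
println(toDelete)
```

In the Joern shell, the same predicate could replace `startsWith` inside the `foreach` script above, leaving the rest of it unchanged.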
Open an earlier project
open("YOUR_PROJECT")
Switch the active project
workspace.setActiveProject("x42-java")
How Joern handles C macros
val code = """
#define TESTM 42
void foo () {
    return TESTM+0;
}
"""
importCode.c.fromString(code)
cpg.method.name("foo").repeat(_.astChildren)(_.until(_.code("TESTM"))).ast.l
val res32: List[io.shiftleft.codepropertygraph.generated.nodes.AstNode] = List(
Call(
id = 11L,
argumentIndex = 1,
argumentName = None,
code = "TESTM",
columnNumber = Some(value = 10),
dispatchType = "INLINED",
dynamicTypeHintFullName = ArraySeq(),
lineNumber = Some(value = 4),
methodFullName = "tmp.c:2:2:TESTM:0",
name = "TESTM",
order = 1,
signature = "",
typeFullName = "int"
),
Block(
id = 12L,
argumentIndex = 1,
argumentName = None,
code = "<empty>",
columnNumber = None,
dynamicTypeHintFullName = ArraySeq(),
lineNumber = None,
order = 1,
typeFullName = "void"
),
Literal(
id = 13L,
argumentIndex = 1,
argumentName = None,
code = "42",
columnNumber = Some(value = 10),
dynamicTypeHintFullName = ArraySeq(),
lineNumber = Some(value = 4),
order = 1,
typeFullName = "int"
)
)
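As shown, Joern represents the macro as a call with dispatchType = "INLINED", and the expanded value (the literal 42) appears in that call's AST.
Querying TESTM as a method:
cpg.method.name("TESTM").ast.l
Result: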
val res36: List[io.shiftleft.codepropertygraph.generated.nodes.AstNode] = List(
Method(
id = 38L,
astParentFullName = "<global>",
astParentType = "NAMESPACE_BLOCK",
code = "<empty>",
columnNumber = None,
columnNumberEnd = None,
filename = "tmp.c",
fullName = "tmp.c:2:2:TESTM:0",
hash = None,
isExternal = true,
lineNumber = Some(value = 2),
lineNumberEnd = Some(value = 2),
name = "TESTM",
order = 0,
signature = ""
),
Block(
id = 39L,
argumentIndex = 1,
argumentName = None,
code = "<empty>",
columnNumber = None,
dynamicTypeHintFullName = ArraySeq(),
lineNumber = None,
order = 1,
typeFullName = "ANY"
),
MethodReturn(
id = 40L,
code = "RET",
columnNumber = None,
dynamicTypeHintFullName = ArraySeq(),
evaluationStrategy = "BY_VALUE",
lineNumber = None,
order = 2,
typeFullName = "ANY"
)
)
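So the macro's expansion is not stored under its METHOD node (which is marked isExternal = true and has an empty body); it lives in the AST of each call site.
Query a macro's defined value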
cpg.call.name("MACRO_NAME").astChildren.astChildren.code.dedup.l
//or
cpg.call.name("IRQ_INT_OFFSET").ast.isLiteral.code.distinct.l
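This does not find every macro, though. For example, a macro that is never expanded anywhere in the code has no CALL node to query.
Use dedup to remove duplicates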
joern> cpg.call.dispatchType.dedup.l
val res76: List[String] = List("STATIC_DISPATCH", "INLINED")
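`dedup` in a Joern traversal appears to play the same role as Scala's `distinct` (both are used in these notes). A plain-Scala illustration on hypothetical dispatchType values: duplicates are dropped and first occurrences keep their order; when you also want to know how often each value occurs, `groupBy` is the usual next step.

```scala
// Hypothetical stream of dispatchType values, as cpg.call.dispatchType might yield.
val dispatchTypes =
  List("STATIC_DISPATCH", "INLINED", "STATIC_DISPATCH", "STATIC_DISPATCH", "INLINED")

// distinct keeps the first occurrence of each value, in order.
val unique = dispatchTypes.distinct
println(unique)

// Count how often each value occurs instead of just deduplicating.
val counts: Map[String, Int] =
  dispatchTypes.groupBy(identity).map { case (k, v) => k -> v.size }
println(counts)
```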
Find the function that contains a call site
cpg.call.name("create_domain_cap").ast.inAst.isMethod.distinct.l
//or
cpg.call.name("create_domain_cap").repeat(_.astParent)(_.until(_.isMethod)).l
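Both queries climb AST parent links until they reach a METHOD node. The sketch below models that walk in plain Scala; `Node`, `MethodNode`, `BlockNode`, `CallNode`, `enclosingMethod` and the method name `caller_fn` are hypothetical stand-ins, not Joern's classes.

```scala
// Minimal model of AST parent links (hypothetical types, not Joern's).
sealed trait Node { def parent: Option[Node] }
final case class MethodNode(name: String) extends Node { val parent: Option[Node] = None }
final case class BlockNode(parent: Option[Node]) extends Node
final case class CallNode(name: String, parent: Option[Node]) extends Node

// Walk up parent links until a MethodNode is reached,
// in the spirit of repeat(_.astParent)(_.until(_.isMethod)).
@annotation.tailrec
def enclosingMethod(n: Node): Option[MethodNode] = n match {
  case m: MethodNode => Some(m)
  case other =>
    other.parent match {
      case Some(p) => enclosingMethod(p)
      case None    => None // detached node: no enclosing method
    }
}

// A call nested two blocks deep inside a (hypothetical) method.
val m = MethodNode("caller_fn")
val outer = BlockNode(Some(m))
val inner = BlockNode(Some(outer))
val call = CallNode("create_domain_cap", Some(inner))

println(enclosingMethod(call).map(_.name))
```

Joern also appears to provide a direct step for this, `cpg.call.name("create_domain_cap").method.l`, which returns the method containing each call; check that it behaves this way in your version before relying on it.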
Find a call site by line number
cpg.call.name("init_freemem").filter(_.location.lineNumber.get == 80).l
//_.location.lineNumber is an Option[Int], so unwrap it with get before comparing.
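Calling `get` on an `Option[Int]` throws `NoSuchElementException` when the value is `None`, so the filter above would abort if any matching call lacked a line number. A plain-Scala sketch of the safer `contains` comparison (the function names are illustrative only):

```scala
import scala.util.Try

val known: Option[Int] = Some(80)
val missing: Option[Int] = None

// .get throws on None, so one node without a line number breaks the filter.
def unsafeIsLine80(l: Option[Int]): Boolean = l.get == 80

// contains simply yields false for None.
def safeIsLine80(l: Option[Int]): Boolean = l.contains(80)

println(safeIsLine80(known))
println(safeIsLine80(missing))
println(Try(unsafeIsLine80(missing)).isFailure)
```

Applied to the query above, the filter would read `filter(_.location.lineNumber.contains(80))`.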
Find the data flows reaching a call's arguments [very expensive, use with care!]
cpg.call.name("create_domain_cap").argument.reachableByFlows(cpg.all).l
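Query formal parameters and actual arguments separately
//formal parameters belong to a method
cpg.method.name("arch_init_freemem").filename(".*x86.*").parameter.l
//actual arguments belong to a call
cpg.call.name("arch_init_freemem").argument.l
Mixing them up causes an error, because a method has no arguments: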
cpg.method.name("arch_init_freemem").filename(".*x86.*").argument.l
-- [E008] Not Found Error: -----------------------------------------------------
1 |cpg.method.name("arch_init_freemem").filename(".*x86.*").argument.l
|^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|value argument is not a member of Iterator[io.shiftleft.codepropertygraph.generated.nodes.Method], but could be made available as an extension method.
|
|The following import might make progress towards fixing the problem:
|
| import sourcecode.Text.generate
|
1 error found
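A call can use the parameter step, but the result is odd: a placeholder parameter p1 of type ANY with no line number, rather than the real declaration: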
cpg.call.name("arch_init_freemem").parameter.l
val res35:
List[io.shiftleft.codepropertygraph.generated.nodes.MethodParameterIn] = List(
MethodParameterIn(
id = 84606L,
code = "p1",
columnNumber = None,
dynamicTypeHintFullName = ArraySeq(),
evaluationStrategy = "BY_VALUE",
index = 1,
isVariadic = false,
lineNumber = None,
name = "p1",
order = 1,
typeFullName = "ANY"
)
)
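The two dumps hint at how the sides line up: formal parameters carry an `index` property and call arguments an `argumentIndex` property, and in the dumps above both start at 1. The sketch below pairs them by that number in plain Scala; `FormalParam`, `ActualArg`, `bind` and the sample names are hypothetical, and the sketch ignores variadic parameters.

```scala
// Hypothetical simplified nodes mirroring the index / argumentIndex properties.
final case class FormalParam(name: String, index: Int)
final case class ActualArg(code: String, argumentIndex: Int)

// Pair each formal parameter with the actual argument at the same position.
// Parameters without a matching argument are skipped.
def bind(params: List[FormalParam], args: List[ActualArg]): List[(String, String)] = {
  val argByIndex: Map[Int, String] = args.map(a => a.argumentIndex -> a.code).toMap
  params.sortBy(_.index).flatMap(p => argByIndex.get(p.index).map(code => p.name -> code))
}

val params = List(FormalParam("len", 2), FormalParam("dst", 1))
val args = List(ActualArg("buf", 1), ActualArg("sizeof(buf)", 2))
println(bind(params, args))
```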
Query a type definition
cpg.typeDecl.name("seL4_BootInfoHeader").l
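Find the struct that contains a member (field)
cpg.member.name("count").astParent.l
//isFieldIdentifier filters AST nodes down to field identifiers (FIELD_IDENTIFIER nodes)
Query an enum (enum constants are stored as members, so the same query finds the enclosing enum)
cpg.member.name("IRQReserved").astParent.l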
List all node types in the CPG
cpg.all.map( n => n.getClass).dedup.l
val res161:
List[Class[? <: io.shiftleft.codepropertygraph.generated.nodes.StoredNode]] = List(
class io.shiftleft.codepropertygraph.generated.nodes.MetaData,
class io.shiftleft.codepropertygraph.generated.nodes.NamespaceBlock,
class io.shiftleft.codepropertygraph.generated.nodes.Dependency,
class io.shiftleft.codepropertygraph.generated.nodes.Import,
class io.shiftleft.codepropertygraph.generated.nodes.TypeDecl,
class io.shiftleft.codepropertygraph.generated.nodes.Method,
class io.shiftleft.codepropertygraph.generated.nodes.Block,
class io.shiftleft.codepropertygraph.generated.nodes.MethodParameterIn,
class io.shiftleft.codepropertygraph.generated.nodes.Return,
class io.shiftleft.codepropertygraph.generated.nodes.Call,
class io.shiftleft.codepropertygraph.generated.nodes.Unknown,
class io.shiftleft.codepropertygraph.generated.nodes.Identifier,
class io.shiftleft.codepropertygraph.generated.nodes.Literal,
class io.shiftleft.codepropertygraph.generated.nodes.MethodReturn,
class io.shiftleft.codepropertygraph.generated.nodes.Binding,
class io.shiftleft.codepropertygraph.generated.nodes.Member,
class io.shiftleft.codepropertygraph.generated.nodes.Local,
class io.shiftleft.codepropertygraph.generated.nodes.Modifier,
class io.shiftleft.codepropertygraph.generated.nodes.FieldIdentifier,
class io.shiftleft.codepropertygraph.generated.nodes.ControlStructure,
class io.shiftleft.codepropertygraph.generated.nodes.JumpTarget,
class io.shiftleft.codepropertygraph.generated.nodes.Type,
class io.shiftleft.codepropertygraph.generated.nodes.File,
class io.shiftleft.codepropertygraph.generated.nodes.Namespace,
class io.shiftleft.codepropertygraph.generated.nodes.MethodParameterOut
)
Alternatively, list the node labels:
cpg.all.label.dedup.l
val res230: List[String] = List(
"META_DATA",
"NAMESPACE_BLOCK",
"DEPENDENCY",
"IMPORT",
"TYPE_DECL",
"METHOD",
"BLOCK",
"METHOD_PARAMETER_IN",
"RETURN",
"CALL",
"UNKNOWN",
"IDENTIFIER",
"LITERAL",
"METHOD_RETURN",
"BINDING",
"MEMBER",
"LOCAL",
"MODIFIER",
"FIELD_IDENTIFIER",
"CONTROL_STRUCTURE",
"JUMP_TARGET",
"TYPE",
"FILE",
"NAMESPACE",
"METHOD_PARAMETER_OUT"
)
List functions that are defined more than once
cpg.method.whereNot(_.name("<.*>.*")).filter(m=>cpg.method.name( m.name ).size>1).name.l.sorted
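This is common in operating systems: with multiple architectures, the same function name is defined differently for each architecture, although only one definition is selected once the code is compiled and linked. For example: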
joern> cpg.method.name("write_it_asid_pool").astParentFullName.l
val res14: List[String] = List(
"include/arch/arm/arch/kernel/vspace.h:<global>",
"include/arch/riscv/arch/kernel/vspace.h:<global>",
"include/arch/x86/arch/kernel/vspace.h:<global>",
"src/arch/arm/32/kernel/vspace.c:<global>",
"src/arch/arm/64/kernel/vspace.c:<global>",
"src/arch/riscv/kernel/vspace.c:<global>",
"src/arch/x86/kernel/vspace.c:<global>"
)
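The query above calls `cpg.method.name(m.name)` once per method, which is quadratic in the number of methods, and prints a name once per definition. A sketch of a single-pass alternative with `groupBy`, on hypothetical (name, file) pairs modeled on the write_it_asid_pool example (`unique_fn` and `src/hypothetical.c` are made up):

```scala
// Hypothetical (methodName, file) pairs standing in for cpg.method results.
val methods = List(
  ("write_it_asid_pool", "src/arch/arm/32/kernel/vspace.c"),
  ("write_it_asid_pool", "src/arch/x86/kernel/vspace.c"),
  ("unique_fn", "src/hypothetical.c"),
  ("<global>", "src/arch/arm/32/kernel/vspace.c"),
  ("<global>", "src/arch/x86/kernel/vspace.c")
)

// Skip pseudo-methods such as <global>, mirroring whereNot(_.name("<.*>.*")).
val real = methods.filterNot { case (name, _) => name.matches("<.*>.*") }

// One pass: group definitions by name and keep names with more than one.
val multiplyDefined: List[String] =
  real.groupBy(_._1).collect { case (name, defs) if defs.size > 1 => name }.toList.sorted

println(multiplyDefined)
```

A Joern-shell version along the same lines, not verified here, might be `cpg.method.whereNot(_.name("<.*>.*")).name.l.groupBy(identity).collect { case (n, xs) if xs.size > 1 => n }.toList.sorted`.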
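Show logs while c2cpg.sh parses
Edit joern/joern-cli/conf/log4j2.xml and set the level of the loggers you want output from to "TRACE" (here io.shiftleft.overflowdb and the root logger):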
<?xml version="1.0" encoding="UTF-8"?>
<Configuration status="WARN">
    <Appenders>
        <Console name="Console" target="SYSTEM_ERR">
            <PatternLayout pattern="%d{yyy-MM-dd HH:mm:ss.SSS} %p %c{0}: %msg%n"/>
        </Console>
    </Appenders>
    <Loggers>
        <Logger name="org.apache.tomcat" level="error"/>
        <Logger name="org.apache.jasper" level="error"/>
        <Logger name="org.reflections" level="off" />
        <Logger name="org.reflections8" level="off" />
        <Logger name="ghidra.app.plugin.core.analysis" level="off" />
        <Logger name="io.shiftleft.overflowdb" level="TRACE" />
        <Root level="TRACE">
            <AppenderRef ref="Console" />
        </Root>
    </Loggers>
</Configuration>
For reference, a command that enables log output: c2cpg.sh --log-problems --log-preprocessor --define "FAR= " -o binder.cpg binder.c
List all global variables
cpg.local.filter(_.location.methodShortName == "<global>").l
//append .size to count them instead of listing them
List the assignments a given variable takes part in
cpg.assignment.where(_.argument.isIdentifier.name("binder_last_debug_id")).l
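The `where(_.argument.isIdentifier.name(...))` filter matches the variable only when it is a direct argument of the assignment, on either side. The plain-Scala model below (`Arg`, `Ident`, `Expr`, `Assignment` and `isIdent` are hypothetical) shows that distinction, using the usual CPG convention that argument 1 of an assignment is its target:

```scala
// Hypothetical, simplified model of assignment calls (not Joern's classes).
sealed trait Arg
final case class Ident(name: String) extends Arg
final case class Expr(code: String) extends Arg // any non-identifier expression

// Argument 1 is the target (left-hand side), argument 2 the source.
final case class Assignment(code: String, args: List[Arg])

val v = "binder_last_debug_id"
val assignments = List(
  Assignment("binder_last_debug_id = 0", List(Ident(v), Expr("0"))),
  Assignment("id = binder_last_debug_id", List(Ident("id"), Ident(v))),
  Assignment("id = binder_last_debug_id + 1", List(Ident("id"), Expr("binder_last_debug_id + 1")))
)

def isIdent(a: Arg, name: String): Boolean = a match {
  case Ident(n) => n == name
  case _        => false
}

// Mirrors the query above: v as a *direct* argument on either side.
// The "+ 1" case is not matched because v sits inside an expression.
val direct = assignments.filter(_.args.exists(isIdent(_, v))).map(_.code)

// Only assignments that write v (v is argument 1).
val writes = assignments.filter(_.args.headOption.exists(isIdent(_, v))).map(_.code)

println(direct)
println(writes)
```

To also catch uses nested inside larger expressions, one option is to search the whole assignment subtree with the ast step, e.g. `cpg.assignment.where(_.ast.isIdentifier.name("binder_last_debug_id")).l`; this visits more nodes, so it may be slower on large code bases.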