Joern study notes


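Joern is a program analysis tool similar to CodeQL. It works by querying the program's AST, which Joern stores together with control-flow and data-flow information as a Code Property Graph (CPG).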

Description              Link
Tool homepage            https://joern.io/
GitHub                   https://github.com/joernio/joern
Official documentation   https://docs.joern.io/

For example: in Scala, how do you get the first 5 elements of a list?

In Scala, you can use the .take(5) method to get the first 5 elements of a list. For example:

val myList = List(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
val firstFive = myList.take(5)

This returns a list containing 1 to 5.
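Joern query results are iterators (the error message further down in these notes shows an `Iterator[...Method]`), and `take` works on them just as it does on lists. A small plain-Scala sketch, no Joern needed, showing that `take` simply returns fewer elements when the collection is shorter, and that on an `Iterator` it stays lazy until materialized with `toList` (the role `.l` plays in the Joern shell):

```scala
object TakeDemo {
  val myList: List[Int] = List(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

  // First five elements of a List.
  val firstFive: List[Int] = myList.take(5)

  // take never fails on a short collection; it returns whatever is there.
  val shortTake: List[Int] = List(1, 2).take(5)

  // On an (even infinite) Iterator, take is lazy; toList forces evaluation.
  val fromIterator: List[Int] = Iterator.from(1).take(5).toList
}
```

In the Joern shell the same idea should apply, e.g. `cpg.method.name.take(5).l` to list the first five method names.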

Joern shell keyboard shortcuts

Command           Description
CTRL-c            Cancels the current operation/clears the shell. Does not quit Joern
CTRL-d            Quits Joern (the shell must be clear)
TAB               Autocomplete
UP                Moves through the command history
CTRL-LEFT/RIGHT   Steps through commands word by word (instead of character by character)
CTRL-r            Searches the command history. Use CTRL-r (or UP/DOWN) to cycle through your matches

Installation

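  • Installation guide: https://docs.joern.io/installation/
  • For the JDK on Windows, Adoptium is recommended: https://adoptium.net/zh-CN/
  • If you don't want to use the official install script, you can download the release directly:
    • https://github.com/joernio/joern/releases/latest/download/joern-cli.zip

    • After unzipping, chmod +x the scripts and it should be ready to use.

    • On Windows, the archive already contains ready-made .bat scripts.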

Common scripts and tips

Show the current workspace path

workspace.getPath

Switch workspace

switchWorkspace("<path_to_workspace>")

A workspace is really just a directory.

Build the code database (CPG) for a project

importCode(inputPath="./x42/c/", projectName="x42-c")

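Once Joern has finished analyzing the code, it saves the project information in the workspace directory under the current directory.

You can list the saved projects with the workspace command.

You can reopen a previous project with open("<project name>").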

Import code directly from a string

val code = """
void foo () {
  int x = source();
  if(x < MAX) {
	int y = 2*x;
	sink(y);
  }
}
"""
importCode.c.fromString(code)

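Generate the CPG manually and import it into Joern. When importing, Joern prints a hint like the following, which describes the manual workflow: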

Invoking CPG generator in a separate process. Note that the new process will consume additional memory.
If you are importing a large codebase (and/or running into memory issues), please try the following:
1) exit joern
2) invoke the frontend: /media/shihangyu/ssd/tools/joern/joern-cli/c2cpg.sh -J-Xmx7952m /media/shihangyu/ssd/dev/nuttx/temp/test.c --output /media/shihangyu/ssd/dev/nuttx/temp/workspace/test.c/cpg.bin.zip
3) start joern, import the cpg: `importCpg("path/to/cpg")`

Joern is not always accurate

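Joern's C/C++ analysis is based on a fuzzy parser rather than a full compiler, so unlike CodeQL, which requires that the whole project can be built, it does not need a working build. This makes it very tolerant of broken or incomplete code, but it also causes many mis-parses, so its results are not always accurate.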

Take the following code as an example:

exception_t decodeTCBInvocation(word_t invLabel, word_t length, cap_t cap,
                                cte_t *slot, bool_t call, word_t *buffer)
{
    /* Stall the core if we are operating on a remote TCB that is currently running */
    SMP_COND_STATEMENT(remoteTCBStall(TCB_PTR(cap_thread_cap_get_capTCBPtr(cap)));)

    switch (invLabel) {
    case TCBReadRegisters:
        /* Second level of decoding */
        return decodeReadRegisters(cap, length, call, buffer);

    case TCBWriteRegisters:
        return decodeWriteRegisters(cap, length, buffer);

    case TCBCopyRegisters:
        return decodeCopyRegisters(cap, length, buffer);

    case TCBSuspend:
        /* Jump straight to the invoke */
        setThreadState(NODE_STATE(ksCurThread), ThreadState_Restart);
        return invokeTCB_Suspend(
                   TCB_PTR(cap_thread_cap_get_capTCBPtr(cap)));

    case TCBResume:
        setThreadState(NODE_STATE(ksCurThread), ThreadState_Restart);

The macro invocation on the SMP_COND_STATEMENT line causes the parser to lump that call and the switch block below it into a single Unknown node:

Unknown(
    id = 76393L,
    argumentIndex = -1,
    argumentName = None,
    code = """SMP_COND_STATEMENT(remoteTCBStall(TCB_PTR(cap_thread_cap_get_capTCBPtr(cap)));)

    switch (invLabel) {
    case TCBReadRegisters:
        /* Second level of decoding */
        return decodeReadRegisters(cap, length, call, buffer);

    case TCBWriteRegisters:
        return decodeWriteRegisters(cap, length, buffer);

    case TCBCopyRegisters:
        return decodeCopyRegisters(cap, length, buffer);

    case TCBSuspend:
        /* Jump straight to the invoke */
        setThreadState(NODE_STATE(ksCurThread), ThreadState_Restart);
        return invokeTCB_Suspend(
                   TCB_PTR(cap_thread_cap_get_capTCBPtr(cap)));

    case TCBResume:
        setThreadState(NODE_STATE(ksCurThread), ThreadState_Restart);
        return invokeTCB_Resume(
                   TCB_PTR(cap_thread_cap_get_capTCBPtr(cap)));

    case TCBConfigure:
        return decodeTCBConfigure(cap, length, slot, buffer);

    case TCBSetPriority:
        return decodeSetPriority(cap, length, buffer);

   ...""",
    columnNumber = Some(value = 5),
    containedRef = "<empty>",
    dynamicTypeHintFullName = ArraySeq(),
    lineNumber = Some(value = 800),
    order = 1,
    parserTypeName = "CASTProblemStatement",
    typeFullName = "<empty>"
  ),

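Once this happens, none of the function calls, switch cases, and so on inside that span are parsed properly, so the CPG simply contains no nodes for them.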
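To gauge how much of a codebase is affected, it helps to look for nodes like the one above. In the Joern shell this would presumably be a query along the lines of `cpg.unknown.filter(_.parserTypeName == "CASTProblemStatement").l` (untested here). The plain-Scala sketch below only models that filter: `ParsedNode` is a hypothetical stand-in for Joern's `Unknown` node that keeps just the fields shown in the output above, and the sample data is made up.

```scala
object ProblemNodeDemo {
  // Hypothetical stand-in for Joern's Unknown node (only the fields used here).
  final case class ParsedNode(lineNumber: Option[Int], parserTypeName: String, code: String)

  // Made-up sample data; only the first entry mirrors the output above.
  val nodes: List[ParsedNode] = List(
    ParsedNode(Some(800), "CASTProblemStatement", "SMP_COND_STATEMENT(...)"),
    ParsedNode(Some(812), "OtherParserType", "x = 1;"),
    ParsedNode(None, "CASTProblemStatement", "...")
  )

  // Keep only the statements the C/C++ parser gave up on.
  val problems: List[ParsedNode] =
    nodes.filter(_.parserTypeName == "CASTProblemStatement")

  // Line numbers worth inspecting by hand; nodes without a line number are skipped.
  val problemLines: List[Int] = problems.flatMap(_.lineNumber)
}
```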

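Delete temporary projects

After temporarily importing code with importCode.c.fromString(code), Joern automatically creates a temporary project, for example:

console18430029678175922266
console1988761634135478011
console9801639066733432335

To keep the workspace tidy, you can remove them with the following script: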

workspace.projects.foreach(p =>{
    if(p.name.startsWith("console")){
        delete(p.name)
    }
})
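The cleanup loop selects projects purely by name prefix, so it can be worth previewing the selection before calling `delete`. A plain-Scala sketch of just that selection step; the names are the temporary projects listed above plus `x42-c` from the earlier import example, and the assumption that every temporary project starts with "console" is based only on the names observed here:

```scala
object ConsoleProjectDemo {
  val projectNames: List[String] = List(
    "console18430029678175922266",
    "x42-c",
    "console1988761634135478011",
    "console9801639066733432335"
  )

  // Temporary projects from importCode.c.fromString appear to be named "console<digits>".
  val toDelete: List[String] = projectNames.filter(_.startsWith("console"))
}
```

In the Joern shell, running the same filter on `workspace.projects.map(_.name)` first should show what would be removed.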

Open a previous project

open("YOUR_PROJECT")

Switch the active project

workspace.setActiveProject("x42-java")

How Joern handles C macros

val code = """
#define TESTM 42
void foo () {
  return TESTM+0;
}
"""
importCode.c.fromString(code)
cpg.method.name("foo").repeat(_.astChildren)(_.until(_.code("TESTM"))).ast.l
val res32: List[io.shiftleft.codepropertygraph.generated.nodes.AstNode] = List(
  Call(
    id = 11L,
    argumentIndex = 1,
    argumentName = None,
    code = "TESTM",
    columnNumber = Some(value = 10),
    dispatchType = "INLINED",
    dynamicTypeHintFullName = ArraySeq(),
    lineNumber = Some(value = 4),
    methodFullName = "tmp.c:2:2:TESTM:0",
    name = "TESTM",
    order = 1,
    signature = "",
    typeFullName = "int"
  ),
  Block(
    id = 12L,
    argumentIndex = 1,
    argumentName = None,
    code = "<empty>",
    columnNumber = None,
    dynamicTypeHintFullName = ArraySeq(),
    lineNumber = None,
    order = 1,
    typeFullName = "void"
  ),
  Literal(
    id = 13L,
    argumentIndex = 1,
    argumentName = None,
    code = "42",
    columnNumber = Some(value = 10),
    dynamicTypeHintFullName = ArraySeq(),
    lineNumber = Some(value = 4),
    order = 1,
    typeFullName = "int"
  )
)

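As you can see, Joern models the macro as a function call with dispatchType = "INLINED", and the expanded value (the literal 42) appears in that call's AST.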

Querying TESTM as a method:

cpg.method.name("TESTM").ast.l

Result:


val res36: List[io.shiftleft.codepropertygraph.generated.nodes.AstNode] = List(
  Method(
    id = 38L,
    astParentFullName = "<global>",
    astParentType = "NAMESPACE_BLOCK",
    code = "<empty>",
    columnNumber = None,
    columnNumberEnd = None,
    filename = "tmp.c",
    fullName = "tmp.c:2:2:TESTM:0",
    hash = None,
    isExternal = true,
    lineNumber = Some(value = 2),
    lineNumberEnd = Some(value = 2),
    name = "TESTM",
    order = 0,
    signature = ""
  ),
  Block(
    id = 39L,
    argumentIndex = 1,
    argumentName = None,
    code = "<empty>",
    columnNumber = None,
    dynamicTypeHintFullName = ArraySeq(),
    lineNumber = None,
    order = 1,
    typeFullName = "ANY"
  ),
  MethodReturn(
    id = 40L,
    code = "RET",
    columnNumber = None,
    dynamicTypeHintFullName = ArraySeq(),
    evaluationStrategy = "BY_VALUE",
    lineNumber = None,
    order = 2,
    typeFullName = "ANY"
  )
)

So the macro's expanded value is not stored under its METHOD node; it only shows up in the AST of each call site.

Look up the value a macro is defined as

cpg.call.name("MACRO_NAME").astChildren.astChildren.code.dedup.l

//or
cpg.call.name("IRQ_INT_OFFSET").ast.isLiteral.code.distinct.l

However, this is not guaranteed to work for every macro.

Use dedup to remove duplicates

joern> cpg.call.dispatchType.dedup.l
val res76: List[String] = List("STATIC_DISPATCH", "INLINED")
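`dedup` is Joern's step for dropping duplicates from a traversal; on ordinary Scala collections the equivalent is `distinct` (which the macro query above uses). A plain-Scala sketch with made-up sample data, showing that `distinct` keeps the first occurrence of each value in encounter order, and that counting occurrences is often more informative than deduplicating:

```scala
object DedupDemo {
  // Made-up sample, shaped like the result of cpg.call.dispatchType.l.
  val dispatchTypes: List[String] =
    List("STATIC_DISPATCH", "INLINED", "STATIC_DISPATCH", "INLINED", "STATIC_DISPATCH")

  // distinct keeps the first occurrence of each value, in encounter order.
  val unique: List[String] = dispatchTypes.distinct

  // How many times each value occurs (Scala 2.13+ groupMapReduce).
  val counts: Map[String, Int] = dispatchTypes.groupMapReduce(identity)(_ => 1)(_ + _)
}
```

The same `groupMapReduce` call should also work on the list returned by `cpg.call.dispatchType.l`.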

Find the function that contains a given call site

cpg.call.name("create_domain_cap").ast.inAst.isMethod.distinct.l
//or
cpg.call.name("create_domain_cap").repeat(_.astParent)(_.until(_.isMethod)).l
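The second query walks up `astParent` edges until it reaches a method. A plain-Scala sketch of that "repeat until" walk over a toy tree; `Node`, `kind` and the function name `caller_fn` are hypothetical and only model the parent links, and the sketch does not try to reproduce every edge case of Joern's `until`. (For calls, Joern's `.method` step should return the enclosing method directly, e.g. `cpg.call.name("create_domain_cap").method.l`.)

```scala
import scala.annotation.tailrec

object EnclosingMethodDemo {
  // Hypothetical minimal AST node: a kind, a code snippet and an optional parent.
  final case class Node(kind: String, code: String, parent: Option[Node])

  // Follow parent links until a node satisfies `stop`; None if the root is reached first.
  @tailrec
  def enclosing(node: Node, stop: Node => Boolean): Option[Node] =
    if (stop(node)) Some(node)
    else
      node.parent match {
        case Some(p) => enclosing(p, stop)
        case None    => None
      }

  // Toy tree: METHOD -> BLOCK -> CALL.
  val method: Node = Node("METHOD", "caller_fn", None)
  val block: Node  = Node("BLOCK", "{ ... }", Some(method))
  val call: Node   = Node("CALL", "create_domain_cap(...)", Some(block))

  val found: Option[Node] = enclosing(call, _.kind == "METHOD")
}
```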

Find a call site by its line number

cpg.call.name("init_freemem").filter(_.location.lineNumber.contains(80)).l
//_.location.lineNumber is an Option[Int]; .contains(80) compares safely, whereas .get == 80 would throw on None.
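Calling `.get` on an `Option` throws `NoSuchElementException` when the value is `None`, and some nodes really have `lineNumber = None` (see the `Block` and `MethodReturn` nodes in the macro example). A plain-Scala sketch of the safe comparisons; the list of line numbers is made-up sample data:

```scala
object LineNumberDemo {
  // Line numbers as Joern exposes them: Option[Int], None when unknown.
  val lines: List[Option[Int]] = List(Some(80), None, Some(81))

  // Option.contains: None never matches and never throws.
  val matchesContains: List[Boolean] = lines.map(_.contains(80))

  // Equivalent formulation using equality with Some(...).
  val matchesEquals: List[Boolean] = lines.map(_ == Some(80))

  // By contrast, _.get == 80 throws NoSuchElementException on the None entry.
  val getThrows: Boolean = scala.util.Try(lines.map(_.get == 80)).isFailure
}
```

Since a call node carries its own `lineNumber`, `cpg.call.name("init_freemem").filter(_.lineNumber.contains(80)).l` should also work without going through `location`.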

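Find the data flows that reach a call's arguments [very expensive, use with care! cpg.all treats every node as a potential source]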

cpg.call.name("create_domain_cap").argument.reachableByFlows(cpg.all).l

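Formal parameters and actual arguments must be queried separately

//Formal parameters: query them on a method
cpg.method.name("arch_init_freemem").filename(".*x86.*").parameter.l

//Actual arguments: query them on a call
cpg.call.name("arch_init_freemem").argument.l

If you mix them up, you get an error, because a method node has no arguments: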

cpg.method.name("arch_init_freemem").filename(".*x86.*").argument.l
-- [E008] Not Found Error: -----------------------------------------------------
1 |cpg.method.name("arch_init_freemem").filename(".*x86.*").argument.l
  |^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  |value argument is not a member of Iterator[io.shiftleft.codepropertygraph.generated.nodes.Method], but could be made available as an extension method.
  |
  |The following import might make progress towards fixing the problem:
  |
  |  import sourcecode.Text.generate
  |
1 error found

A call does accept parameter, but the result is rather odd (here a single placeholder parameter p1 of type ANY):

cpg.call.name("arch_init_freemem").parameter.l
val res35:
  List[io.shiftleft.codepropertygraph.generated.nodes.MethodParameterIn] = List(
  MethodParameterIn(
    id = 84606L,
    code = "p1",
    columnNumber = None,
    dynamicTypeHintFullName = ArraySeq(),
    evaluationStrategy = "BY_VALUE",
    index = 1,
    isVariadic = false,
    lineNumber = None,
    name = "p1",
    order = 1,
    typeFullName = "ANY"
  )
)

Query a type definition

cpg.typeDecl.name("seL4_BootInfoHeader").l

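Find the struct that contains a member (field)

cpg.member.name("count").astParent.l

//isFieldIdentifier filters AST nodes down to field identifiers, i.e. the field name in an access such as s.count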

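Query enums (enum constants are also represented as members, so the same query finds the enclosing enum)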

cpg.member.name("IRQReserved").astParent.l

List all node types present in the current CPG

cpg.all.map( n => n.getClass).dedup.l
val res161:
  List[Class[? <: io.shiftleft.codepropertygraph.generated.nodes.StoredNode]] = List(
  class io.shiftleft.codepropertygraph.generated.nodes.MetaData,
  class io.shiftleft.codepropertygraph.generated.nodes.NamespaceBlock,
  class io.shiftleft.codepropertygraph.generated.nodes.Dependency,
  class io.shiftleft.codepropertygraph.generated.nodes.Import,
  class io.shiftleft.codepropertygraph.generated.nodes.TypeDecl,
  class io.shiftleft.codepropertygraph.generated.nodes.Method,
  class io.shiftleft.codepropertygraph.generated.nodes.Block,
  class io.shiftleft.codepropertygraph.generated.nodes.MethodParameterIn,
  class io.shiftleft.codepropertygraph.generated.nodes.Return,
  class io.shiftleft.codepropertygraph.generated.nodes.Call,
  class io.shiftleft.codepropertygraph.generated.nodes.Unknown,
  class io.shiftleft.codepropertygraph.generated.nodes.Identifier,
  class io.shiftleft.codepropertygraph.generated.nodes.Literal,
  class io.shiftleft.codepropertygraph.generated.nodes.MethodReturn,
  class io.shiftleft.codepropertygraph.generated.nodes.Binding,
  class io.shiftleft.codepropertygraph.generated.nodes.Member,
  class io.shiftleft.codepropertygraph.generated.nodes.Local,
  class io.shiftleft.codepropertygraph.generated.nodes.Modifier,
  class io.shiftleft.codepropertygraph.generated.nodes.FieldIdentifier,
  class io.shiftleft.codepropertygraph.generated.nodes.ControlStructure,
  class io.shiftleft.codepropertygraph.generated.nodes.JumpTarget,
  class io.shiftleft.codepropertygraph.generated.nodes.Type,
  class io.shiftleft.codepropertygraph.generated.nodes.File,
  class io.shiftleft.codepropertygraph.generated.nodes.Namespace,
  class io.shiftleft.codepropertygraph.generated.nodes.MethodParameterOut
)
cpg.all.label.dedup.l
val res230: List[String] = List(
  "META_DATA",
  "NAMESPACE_BLOCK",
  "DEPENDENCY",
  "IMPORT",
  "TYPE_DECL",
  "METHOD",
  "BLOCK",
  "METHOD_PARAMETER_IN",
  "RETURN",
  "CALL",
  "UNKNOWN",
  "IDENTIFIER",
  "LITERAL",
  "METHOD_RETURN",
  "BINDING",
  "MEMBER",
  "LOCAL",
  "MODIFIER",
  "FIELD_IDENTIFIER",
  "CONTROL_STRUCTURE",
  "JUMP_TARGET",
  "TYPE",
  "FILE",
  "NAMESPACE",
  "METHOD_PARAMETER_OUT"
)

List functions that are defined more than once

cpg.method.whereNot(_.name("<.*>.*")).filter(m=>cpg.method.name( m.name ).size>1).name.l.sorted

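This is common in OS code: with multiple architectures, the same function name gets a different definition for each arch, although only one of them ends up being used after compiling and linking. For example: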

joern> cpg.method.name("write_it_asid_pool").astParentFullName.l
val res14: List[String] = List(
  "include/arch/arm/arch/kernel/vspace.h:<global>",
  "include/arch/riscv/arch/kernel/vspace.h:<global>",
  "include/arch/x86/arch/kernel/vspace.h:<global>",
  "src/arch/arm/32/kernel/vspace.c:<global>",
  "src/arch/arm/64/kernel/vspace.c:<global>",
  "src/arch/riscv/kernel/vspace.c:<global>",
  "src/arch/x86/kernel/vspace.c:<global>"
)
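The query above re-runs a name lookup over `cpg.method` for every method, which may get slow on large codebases. Grouping by name once is a cheaper alternative. A plain-Scala sketch of that idea with a hypothetical `MethodInfo` record and made-up sample data (in Joern the analogous, untested form would be something like `cpg.method.whereNot(_.name("<.*>.*")).l.groupBy(_.name).filter(_._2.size > 1).keys.toList.sorted`):

```scala
object DuplicateDefsDemo {
  // Hypothetical record: a function name and where it is defined.
  final case class MethodInfo(name: String, parent: String)

  // Made-up sample data; the first two entries mirror the output above.
  val methods: List[MethodInfo] = List(
    MethodInfo("write_it_asid_pool", "src/arch/arm/64/kernel/vspace.c:<global>"),
    MethodInfo("write_it_asid_pool", "src/arch/x86/kernel/vspace.c:<global>"),
    MethodInfo("init_freemem", "a.c:<global>"),
    MethodInfo("<operator>.assignment", "<global>")
  )

  // Drop synthetic names like "<operator>.assignment", group once by name,
  // and keep the names that have more than one definition.
  val duplicated: List[String] =
    methods
      .filterNot(_.name.matches("<.*>.*"))
      .groupBy(_.name)
      .collect { case (name, defs) if defs.size > 1 => name }
      .toList
      .sorted
}
```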

Show log output while c2cpg.sh is parsing

Edit joern/joern-cli/conf/log4j2.xml and set the level of the loggers whose output you want to see to "TRACE".

<?xml version="1.0" encoding="UTF-8"?>
<Configuration status="WARN">
    <Appenders>
        <Console name="Console" target="SYSTEM_ERR">
            <PatternLayout pattern="%d{yyy-MM-dd HH:mm:ss.SSS} %p %c{0}: %msg%n"/>
        </Console>
    </Appenders>
    <Loggers>
        <Logger name="org.apache.tomcat" level="error"/>
        <Logger name="org.apache.jasper" level="error"/>
        <Logger name="org.reflections" level="off" />
        <Logger name="org.reflections8" level="off" />
        <Logger name="ghidra.app.plugin.core.analysis" level="off" />
        <Logger name="io.shiftleft.overflowdb" level="TRACE" />
        <Root level="TRACE">
            <AppenderRef ref="Console" />
        </Root>
    </Loggers>
</Configuration>

For reference, the command c2cpg.sh --log-problems --log-preprocessor --define "FAR= " -o binder.cpg binder.c turns on log output.

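List all global variables

cpg.local.filter(_.location.methodShortName == "<global>").l.size   //count them
cpg.local.filter(_.location.methodShortName == "<global>").name.l   //list their names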

List the assignments a given variable takes part in

cpg.assignment.where(_.argument.isIdentifier.name("binder_last_debug_id")).l
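Note that this query matches an assignment whenever the variable appears directly as one of its arguments, i.e. on either side (`x = ...` as well as `y = x`). Joern's `cpg.assignment.target` / `.source` steps should let you restrict it to one side, e.g. writes only. A plain-Scala sketch of that distinction, with a hypothetical `Assignment` model where each side is just an identifier name, and made-up names such as `next_id`:

```scala
object AssignmentDemo {
  // Hypothetical simplified assignment: left- and right-hand identifier names.
  final case class Assignment(target: String, source: String)

  val assignments: List[Assignment] = List(
    Assignment("binder_last_debug_id", "next_id"),
    Assignment("id", "binder_last_debug_id"),
    Assignment("other", "value")
  )

  val name = "binder_last_debug_id"

  // Like where(_.argument...): the variable may appear on either side.
  val anySide: List[Assignment] =
    assignments.filter(a => a.target == name || a.source == name)

  // Writes only: the variable is the assignment target.
  val writesOnly: List[Assignment] = assignments.filter(_.target == name)
}
```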