CodeQL Writing Notes (1): General Techniques for Writing Rules, AST Analysis, and Call Analysis

yyyayo

Last modified 2023-02-11 18:04:24

Column: Program Analysis and Vulnerability Detection. Tags: c++, security, software engineering

First published 2022-08-18 17:42:42

Article link: https://blog.csdn.net/yyyayo/article/details/126410691

General Techniques

Wildcard `%`

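Used in the matches method of the built-in string type. `%` matches any sequence of characters (including the empty one), `_` matches exactly one character, and a backslash escapes `%`, `_` or a backslash. For example, "anythingstring%".matches("%string\\%") holds: inside the string literal `\\%` becomes `\%`, which matches a literal percent sign.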
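To make the pattern rules concrete, here is a small C++ emulation of how `matches` treats `%`, `_` and `\`. It is only a sketch of the semantics described above; the function name `qlMatches` is hypothetical and this is not how CodeQL implements it.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical emulation of QL's string.matches(pattern):
//   %  matches any sequence of characters (possibly empty)
//   _  matches exactly one character
//   \  escapes the next character; a trailing backslash is kept as a literal
bool qlMatches(const std::string& s, const std::string& pattern) {
  struct Tok { char kind; char ch; };  // kind: '%', '_' or 'l' (literal ch)
  std::vector<Tok> toks;
  for (std::size_t i = 0; i < pattern.size(); ++i) {
    char c = pattern[i];
    if (c == '\\' && i + 1 < pattern.size()) {
      toks.push_back({'l', pattern[++i]});  // escaped char is always literal
    } else if (c == '%' || c == '_') {
      toks.push_back({c, 0});
    } else {
      toks.push_back({'l', c});
    }
  }
  // dp[j]: the characters of s consumed so far match the first j tokens.
  std::vector<bool> dp(toks.size() + 1, false);
  dp[0] = true;
  for (std::size_t j = 1; j <= toks.size(); ++j)
    dp[j] = dp[j - 1] && toks[j - 1].kind == '%';  // leading %s match ""
  for (std::size_t i = 1; i <= s.size(); ++i) {
    std::vector<bool> next(toks.size() + 1, false);
    for (std::size_t j = 1; j <= toks.size(); ++j) {
      const Tok& t = toks[j - 1];
      if (t.kind == '%')
        next[j] = next[j - 1] || dp[j];  // % matches empty, or absorbs s[i-1]
      else if (t.kind == '_')
        next[j] = dp[j - 1];             // _ consumes exactly one character
      else
        next[j] = dp[j - 1] && s[i - 1] == t.ch;
    }
    dp = next;
  }
  return dp[toks.size()];
}
```

For example, `qlMatches("anythingstring%", "%string\\%")` is true, while `qlMatches("anythingstring!", "%string\\%")` is false because the escaped `%` must match a literal percent sign.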

The closure operators `*` and `+`

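They are commonly applied to methods that can be chained repeatedly because their result can be the receiver of the next call, such as getASuccessor(), getQualifier() and getEnclosingElement(). `*` means zero or more applications (the starting element itself is included) and `+` means one or more applications.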
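The difference between `+` and `*` is the same as between "reachable in at least one step" and "reachable in zero or more steps" over a relation. The following C++ sketch emulates that on a small successor relation; `Relation`, `plusClosure` and `starClosure` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <vector>

// A relation like getASuccessor(): each node maps to its direct successors.
using Relation = std::map<int, std::vector<int>>;

// Emulates p+ : every node reachable from `start` in one or more steps.
std::set<int> plusClosure(const Relation& rel, int start) {
  std::set<int> seen;
  std::vector<int> work{start};
  while (!work.empty()) {
    int n = work.back();
    work.pop_back();
    auto it = rel.find(n);
    if (it == rel.end()) continue;
    for (int succ : it->second)
      if (seen.insert(succ).second) work.push_back(succ);  // visit each node once
  }
  return seen;
}

// Emulates p* : p+ plus the start node itself (zero steps).
std::set<int> starClosure(const Relation& rel, int start) {
  std::set<int> result = plusClosure(rel, start);
  result.insert(start);
  return result;
}
```

With the chain 1 -> 2 -> 3, `+` from 1 gives {2, 3} and `*` gives {1, 2, 3}; on a cycle 1 -> 2 -> 1, `+` from 1 already contains 1 itself.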

Reference: Recursion (CodeQL documentation, github.com)

Controlling the number of iterations

TODO

The Def class in the standard library

/** A definition of a stack variable. */
library class Def extends DefOrUse {
  Def() { definition(_, this) }

  override SemanticStackVariable getVariable(boolean isDef) {
    definition(result, this) and isDef = true
  }
}

/**
 * Holds if `def` is a (potential) assignment to stack variable `v`. That is,
 * the variable may hold another value in the control-flow node(s)
 * following `def` than before.
 */
predicate definition(SemanticStackVariable v, Expr def) {
  def = v.getInitializer().getExpr()
  or
  variableAccessedAsValue(v.getAnAccess(), def.(Assignment).getLValue().getFullyConverted())
  or
  variableAccessedAsValue(v.getAnAccess(), def.(CrementOperation).getOperand().getFullyConverted())
  or
  exists(AsmStmt asmStmt |
    def = asmStmt.getAChild() and
    def = v.getAnAccess().getParent*()
  )
  or
  definitionByReference(v.getAnAccess(), def)
}

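As the definition shows, a Def is essentially either the getExpr() of an initializer or a direct assignment or increment/decrement (plus the inline-assembly and by-reference cases). To also track a function call whose result is used by a Def, something like the following works: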

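(
    // initializer case: the Def itself is the call, e.g. int i = func(x);
    def = funcall
    or
    // assignment case: the call appears anywhere in the right-hand side, e.g. i = func(x) + 1;
    exists(Assignment assign |
        assign = def and
        assign.getRValue().getAChild*() = funcall
    )
)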
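A small hypothetical C++ target helps check which Defs the condition above picks up. `func` and `defs` are made-up names; the comments state which branch should apply, as I read the `definition` predicate. Note the last initializer: there the Def is the whole `func(x) + 1`, so `def = funcall` does not match it.

```cpp
#include <cassert>

// Hypothetical callee playing the role of `funcall`.
int func(int x) { return x * 2; }

int defs(int x) {
  int i = func(x);      // Def via initializer: def = funcall matches
  i = func(x) + 1;      // Def via Assignment: funcall nested in getRValue()
  i++;                  // Def via CrementOperation: no call involved
  int j = func(x) + 1;  // initializer Def is the AddExpr, so def = funcall does NOT match
  return i + j;
}
```

With `x = 3`, `i` ends at 8 and `j` is 7, so `defs(3)` returns 15.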

Reusing the null-dereference predicates from the standard library

The predicate dereferencedByOperation(Expr op, Expr e) in semmle.code.cpp.controlflow.Dereferenced already defines several kinds of dereference and can be reused.

In semmle.code.cpp.controlflow.Nullness, the predicates checkedNull(Variable var, ControlFlowNode node) and checkedValid(Variable var, ControlFlowNode node) express, respectively, that var may be null or is not null at the given node.
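A hypothetical C++ target for trying these predicates: it contains a null check plus the usual dereference forms. That dereferencedByOperation covers exactly these forms is an assumption; check the library source for its full list.

```cpp
#include <cassert>

struct Node { int value; };

// Hypothetical target: one null check and three common dereference forms.
int readAll(Node* p, int* arr) {
  if (p == nullptr) return -1;  // null check on p (relevant to checkedNull / checkedValid)
  int a = (*p).value;           // explicit pointer dereference *p
  int b = p->value;             // field access through ->
  int c = arr[1];               // array subscript on a pointer
  return a + b + c;
}
```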

AST Analysis

Getting overloaded operators

A use of an overloaded operator is no longer an Operation but a FunctionCall whose target is the operator function.

| Operation subclass (level 1) | Subclass (level 2) | Example (non-overloaded) |
|---|---|---|
| Assignment | AssignExpr | see the next section |
| | AssignOperation | see the next section |
| | BlockAssignExpr | see the next section |
| BinaryOperation | BinaryArithmeticOperation | AddExpr `c = a + b;`, RemExpr `c = a % b;`, MaxExpr `c = a >? b;` (old GNU extension), etc. |
| | BinaryBitwiseOperation | BitwiseAndExpr `unsigned c = a & b;`, LShiftExpr `unsigned c = a << b;`, etc. |
| | BinaryLogicalOperation | LogicalAndExpr `if (a && b) { }`, LogicalOrExpr `if (a \|\| b) { }` |
| | ComparisonOperation | EqualityOperation: `==` or `!=`; RelationalOperation: `<=`, `<`, `>` or `>=` |
| | SpaceshipExpr | `auto c = (a <=> b);` |
| UnaryOperation | AddressOfExpr | `int *ptr = &var;` |
| | PointerDereferenceExpr | `int var = *varptr;` |
| | UnaryArithmeticOperation | CrementOperation `++` or `--` (prefix or postfix); UnaryMinusExpr `b = -a`, etc. |
| | UnaryBitwiseOperation | ComplementExpr `unsigned c = ~a;` |
| | UnaryLogicalOperation | NotExpr `c = !a;` |
| | etc. | |
| ConditionalExpr | (none) | `a = (b > c ? d : e);` |

To get calls to overloaded operators:

from FunctionCall call
where call.getTarget().getName().matches("operator%")
select call

The target names typically look like operator->, operator[], operator++ or operator delete.
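A small hypothetical C++ target shows the contrast: the built-in `1 + 2` is an AddExpr (an Operation), while `a + b` on the user-defined `Vec2` should appear as a FunctionCall whose target name matches `operator%`. `Vec2` and `demo` are made-up names.

```cpp
#include <cassert>

struct Vec2 {
  int x, y;
  // Overloaded operators: each use becomes a FunctionCall to these functions.
  Vec2 operator+(const Vec2& o) const { return {x + o.x, y + o.y}; }
  int operator[](int i) const { return i == 0 ? x : y; }
};

int demo() {
  int plain = 1 + 2;           // AddExpr: a built-in Operation, not a FunctionCall
  Vec2 a{1, 2}, b{3, 4};
  Vec2 c = a + b;              // FunctionCall whose target is named "operator+"
  return plain + c[0] + c[1];  // c[0], c[1]: FunctionCalls to "operator[]"
}
```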

Declarations, definitions, initialization, and assignment

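Concepts and their CodeQL counterparts:

- Declaration: only specifies a variable's type and name, without allocating memory. The CodeQL classes are Declaration and DeclarationEntry. Declaration is commonly used with getName and getQualifiedName. A DeclarationEntry is one concrete place where the entity is declared or defined (one entity can have several, for example a prototype in a header and the definition in a source file) and is commonly used with getDeclaration. An entity declared only once has exactly one DeclarationEntry.
- Definition: specifies the type and name and also allocates memory, possibly initializing the value. In CodeQL, definitions are covered by declarations.
- Initialization: the object receives a specific value when it is created. The CodeQL class is Initializer; its methods getDeclaration and getExpr return the declaration being initialized and the initializing expression.
- Assignment: the CodeQL class is Assignment, with the three subclasses AssignExpr, AssignOperation and BlockAssignExpr.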

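Note that Declaration and DeclarationEntry have no getEnclosingFunction(), so the enclosing function cannot be obtained from them directly; go through the DeclStmt instead. To get the variable i in int i;, use VariableDeclarationEntry.getVariable(), where VariableDeclarationEntry is a subclass of DeclarationEntry.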

There are usually two cases: explicit initialization, int i = func(x);, and declaration/definition (default initialization) followed by assignment, int i; i = func(x);. The first case is an Initializer: calling getDeclaration() on it yields i and getExpr() yields call to func. To get the complete declaration statement of type DeclStmt, i.e. int i = func(x);, use getADeclarationEntry() or getADeclaration() to bridge Declaration and DeclStmt.

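// DeclStmt containing `decl`, e.g. the whole `int i = func(x);` for `i`.
DeclStmt getDeclStmt(Declaration decl){
    // decl.getADeclarationEntry() = result.getADeclarationEntry()
    decl = result.getADeclaration() // more complete than the DeclarationEntry match
}
// DeclStmt declaring the variable initialized by `init`, matched via DeclarationEntry.
// (Named differently: non-member predicates of the same arity cannot share a name.)
DeclStmt getDeclStmtOfInit(Initializer init){
    init.getDeclaration().getADeclarationEntry() = result.getADeclarationEntry()
}
// Same as above, matched via the Declaration directly.
DeclStmt getDeclStmtSimple(Initializer init){
    init.getDeclaration() = result.getADeclaration()
}
// Initializer(s) of the variables declared by `stmt`.
Initializer getDeclStmtInitializer(DeclStmt stmt){
    result.getDeclaration().getADeclarationEntry() = stmt.getADeclarationEntry()
}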
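A hypothetical C++ target covering both initialization styles, plus an entity with two DeclarationEntry items (a prototype and a definition). `func` and `twoStyles` are made-up names.

```cpp
#include <cassert>

int func(int x);                   // DeclarationEntry #1 of func (declaration only)
int func(int x) { return x + 1; }  // DeclarationEntry #2 of func (the definition)

int twoStyles(int x) {
  int i = func(x);  // DeclStmt with an Initializer whose getExpr() is the call to func
  int j;            // DeclStmt without an Initializer (a local int is left uninitialized)
  j = func(x);      // an AssignExpr, not an Initializer
  return i + j;
}
```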

| Assignment subclass (level 1) | Subclass (level 2) | Example (non-overloaded) |
|---|---|---|
| AssignExpr | (none) | A non-overloaded assignment with the `=` operator, e.g. `a = b;`. `int a = b;` is not included; that is an Initializer. |
| AssignOperation | AssignArithmeticOperation | A non-overloaded arithmetic assignment on a non-pointer lvalue: `+=`, `-=`, `*=`, `/=` and `%=`. |
| | AssignBitwiseOperation | A non-overloaded bitwise assignment: `&=`, `\|=`, `^=`, `<<=` and `>>=`. |
| | AssignPointerAddExpr | A non-overloaded `+=` pointer assignment expression. |
| | AssignPointerSubExpr | A non-overloaded `-=` pointer assignment expression. |
| BlockAssignExpr | (omitted) | (omitted) |
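One line per row of the table, as a hypothetical C++ target (`assignKinds` is a made-up name). The comments give the class each line should map to; AssignAddExpr is the concrete AssignArithmeticOperation for `+=`.

```cpp
#include <cassert>

int assignKinds(int a, int b) {
  int c = a;                  // Initializer, not an Assignment
  c = b;                      // AssignExpr
  c += a;                     // AssignArithmeticOperation (AssignAddExpr)
  c |= 8;                     // AssignBitwiseOperation
  int arr[4] = {1, 2, 3, 4};
  int* p = arr;               // Initializer
  p += 2;                     // AssignPointerAddExpr
  p -= 1;                     // AssignPointerSubExpr
  return c + *p;
}
```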

Array aggregate literals: ArrayAggregateLiteral

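ArrayAggregateLiteral refers to brace-enclosed arrays such as {{1, 2}, {3, 4}}, which usually appear on the right-hand side of an assignment or initialization. For example, when a std::map is initialized this way, its initializer-list constructor is called and the aggregate literal is converted into an initializer list. In the following example, the contents of the outer braces can be reached through ArrayAggregateLiteral: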

std::map<Point, double, PointCmp> mag = {
      { {5, -12}, 13 },
      { {3, 4},   5 },
      { {-8, -15}, 17 }
  };
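The snippet above is not self-contained, since `Point` and `PointCmp` are not defined. A compilable version follows; the two type definitions are assumptions modeled on the cppreference `std::map` example this snippet appears to come from, and `makeMag` is a made-up wrapper.

```cpp
#include <cassert>
#include <map>

struct Point { double x, y; };

// Orders points by x only (y is deliberately ignored).
struct PointCmp {
  bool operator()(const Point& lhs, const Point& rhs) const { return lhs.x < rhs.x; }
};

std::map<Point, double, PointCmp> makeMag() {
  // Each { {x, y}, magnitude } becomes a std::pair; the outer braces form the
  // std::initializer_list passed to the map's constructor.
  std::map<Point, double, PointCmp> mag = {
      { {5, -12}, 13 },
      { {3, 4},   5 },
      { {-8, -15}, 17 }
  };
  return mag;
}
```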

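To iterate over each element of an ArrayAggregateLiteral:

from ArrayAggregateLiteral arrayliteral, int i, Expr expr
where i >= 0 and i < arrayliteral.getArraySize()  // index range of the literal
and arrayliteral.getElementExpr(i) = expr         // the expression initializing element i, if present
select arrayliteral, i, expr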

Matching member variables/functions with their qualifiers

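getQualifier() returns the object expression through which a member is accessed, with either -> or . (this "qualifier" is an expression, not a type qualifier). For example, ptr->x and (*ptr).x yield ptr and *ptr respectively; for a smart pointer, smart_ptr->x yields the call to operator->. The method is typically used on FieldAccess or FunctionCall. FieldAccess has two kinds, DotFieldAccess and PointerFieldAccess, and is a subtype of VariableAccess.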

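// Holds if `target` is the variable/field access `expr` itself or one of its
// transitive qualifiers (for `a.b.c`: `a.b.c`, `a.b` or `a`).
predicate accessOrQualifier(VariableAccess expr, Expr target) {
    expr = target
    or expr.getQualifier*() = target
}
// Holds if `target` is the object on which the member function call `expr` is made.
predicate callQualifier(FunctionCall expr, Expr target) {
    expr.getQualifier() = target
}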

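For an enclosing expression such as ptr->value == temp, the operand you get from the operation is the FieldAccess for value (the whole ptr->value), not ptr; ptr is then reachable through getQualifier().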
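A hypothetical C++ target that puts every case from this section in one function. The comments state what getQualifier() should return for each line, following the description above; `Node` and `qualifiers` are made-up names.

```cpp
#include <cassert>
#include <memory>

struct Node {
  int x;
  int get() const { return x; }
};

int qualifiers(Node* ptr, const std::shared_ptr<Node>& smart, int temp) {
  int a = ptr->x;            // PointerFieldAccess: getQualifier() is ptr
  int b = (*ptr).x;          // DotFieldAccess: getQualifier() is *ptr
  int c = smart->x;          // getQualifier() is the call to operator-> on smart
  int d = ptr->get();        // FunctionCall: getQualifier() is ptr
  bool eq = ptr->x == temp;  // operand of == is the FieldAccess ptr->x
  return a + b + c + d + (eq ? 100 : 0);
}
```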

Getting the class (or class template) of a member

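Declaration has the method getDeclaringType(), which returns the class that declares the member. Function is a subclass of Declaration, so it can use getDeclaringType() as well. For class templates the concrete class name contains the instantiated template argument, so use a wildcard in the query, e.g. getDeclaringType().getName().matches("A%"). Note that "A%" also matches any other class whose name starts with A; "A<%" restricts the match to instantiations.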
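A hypothetical C++ target with one class template and two instantiations. Each instantiation has its own name (`A<int>`, `A<double>`), which is why the query needs a wildcard rather than an exact name. `A` and `useTemplates` are made-up names.

```cpp
#include <cassert>

// Hypothetical class template; its instantiations are named A<int>, A<double>, ...
template <typename T>
struct A {
  T value;
  T get() const { return value; }  // getDeclaringType() is the instantiated class
};

int useTemplates() {
  A<int> ai{3};
  A<double> ad{1.5};
  return ai.get() + static_cast<int>(ad.get() * 2);
}
```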

Memory allocation and deallocation

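For new and delete, the commonly used classes are NewOrNewArrayExpr, NewExpr, NewArrayExpr, DeleteExpr and DeleteArrayExpr (usually for the non-overloaded forms). getAllocatedType() returns the allocated type, e.g. int or int[5]. getInitializer() returns the initialization performed at allocation time: 4 for new int(4), and for new std::vector<int>(4) a call to the constructor std::vector::vector(size_t) with argument 4. On a delete expression, getExpr() returns the object being freed.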

malloc and free are matched as ordinary function calls to malloc and free.

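operator new, operator new[], operator delete and operator delete[], as well as the classes AllocationExpr and DeallocationExpr, are not used here: they relate to the concrete memory allocation functions (functions such as operator new and operator delete) rather than to the new/delete expressions themselves.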
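A hypothetical C++ target that exercises each of these cases. The comments repeat what the accessors above should return for each line; `allocations` is a made-up name.

```cpp
#include <cassert>
#include <cstdlib>
#include <vector>

int allocations() {
  int* p = new int(4);                // NewExpr: getAllocatedType() int, getInitializer() 4
  int* arr = new int[5];              // NewArrayExpr: getAllocatedType() int[5]
  auto* v = new std::vector<int>(4);  // NewExpr: getInitializer() is a constructor call with argument 4
  int result = *p;
  int* m = static_cast<int*>(std::malloc(sizeof(int)));  // plain call to malloc
  if (m != nullptr) {
    *m = static_cast<int>(v->size());
    result += *m;
    std::free(m);                     // plain call to free
  }
  delete v;                           // DeleteExpr: getExpr() is v
  delete[] arr;                       // DeleteArrayExpr: getExpr() is arr
  delete p;                           // DeleteExpr: getExpr() is p
  return result;
}
```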

Calls through function pointers vs. ordinary function calls

When a function pointer is called, the call is not a FunctionCall but a plain Call, because it is a call to an expression rather than to a named function.

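Call has two kinds: ExprCall and FunctionCall. ExprCall is a call through a function pointer. Its subclass VariableCall covers the case where the function pointer comes from a specific variable; for a VariableCall, use getVariable() to get the function-pointer variable being called.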

The CodeQL cpp library (cpp/ql/lib/semmle/code/cpp/pointsto/CallGraph.qll, imported as semmle.code.cpp.pointsto.CallGraph) has a predicate resolvedCall(Call call, Function called) that covers the following cases. (Other libraries are described in the documentation.)

predicate resolvedCall(Call call, Function called) {
  call.(FunctionCall).getTarget() = called
  or
  call.(DestructorCall).getTarget() = called
  or
  exists(ExprCall ec, TargetPointsToExpr pte |
    ec = call and ec.getExpr() = pte and pte.pointsTo() = called
  )
  or
  exists(TargetPointsToExpr pte |
    call.getQualifier() = pte and
    pte.resolve() = called
  )
}
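A hypothetical C++ target with one call of each kind. As I understand the library, VariableCall requires the called expression to be a variable access, so the call through an array element is an ExprCall but not a VariableCall. `twice`, `inc` and `callKinds` are made-up names.

```cpp
#include <cassert>

int twice(int x) { return 2 * x; }
int inc(int x) { return x + 1; }

int callKinds(int x) {
  int (*fp)(int) = twice;               // function-pointer variable
  int (*table[2])(int) = {twice, inc};  // array of function pointers
  int a = twice(x);                     // FunctionCall: getTarget() is twice
  int b = fp(x);                        // VariableCall (an ExprCall): getVariable() is fp
  int c = table[1](x);                  // ExprCall whose callee expression is table[1]
  return a + b + c;
}
```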

Getting a function's return

ReturnStmt.getExpr() returns the returned expression; for return 1+2; it yields the Expr 1+2.

from ReturnStmt retstmt
where retstmt.getExpr().getType() instanceof IntType
select retstmt

Function.getType() returns the function's return type. The type itself can be checked like this:

from Function func
where func.getType() instanceof BoolType
// where func.getType().getName() = "bool"
select func
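A hypothetical C++ target for the two queries above: `sum` contains a ReturnStmt whose expression has IntType, and `isPositive` is a function whose return type is BoolType. Both names are made up.

```cpp
#include <cassert>

int sum() {
  return 1 + 2;  // ReturnStmt whose getExpr() (1 + 2) has IntType
}

bool isPositive(int x) {
  return x > 0;  // in a Function whose getType() is BoolType
}
```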

Extracting statements and expressions inside loops

Loop.getAChild() only returns the parts inside the parentheses of a for or while, plus the loop body, which comes back as one single Stmt.

To visit every statement in the loop body:

from Loop loop, Stmt stmt
where stmt = loop.getStmt().getAChild*()
select stmt

Or, to get every expression in the loop body (excluding the expressions in Loop.getCondition()):

from Loop loop, Expr expr
where expr.getEnclosingElement+() = loop.getStmt()
select expr

To visit every expression of the loop, including those in the condition, write it like this instead:

from Loop loop, Expr expr
where expr.getEnclosingElement+() = loop
select expr
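A hypothetical C++ loop for comparing the three queries. Going by the descriptions above, the body queries should report `int sq = i * i;` and `total += sq;` and the expressions inside them, while only the last query should also report `i < n`. `loopParts` is a made-up name.

```cpp
#include <cassert>

int loopParts(int n) {
  int total = 0;
  for (int i = 0; i < n; ++i) {  // init, condition i < n, update ++i: outside loop.getStmt()
    int sq = i * i;              // statements inside loop.getStmt()
    total += sq;
  }
  return total;
}
```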

Call Relationship Analysis

Tracing the callers of the function that contains an expression

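You can search backwards for the places where the function containing an expression is called, but getACallToThisFunction cannot find calls made through a function pointer. What a pointer holds is not known statically, so you cannot tell which function a particular function-pointer call invokes. As a result, you cannot work backwards from inside a function to the places where it is called through a function pointer.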

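// Holds if `expr` lies in the function named "Caller", or in a function that
// "Caller" calls directly (one level only; calls via function pointers are not found).
predicate isCalled(Expr expr, Function func){
    func.getName() = "Caller"
    and (
        // expr is directly inside Caller
        expr.getEnclosingFunction() = func
        // expr is inside a function that has a call site located in Caller
        or expr.getEnclosingFunction().getACallToThisFunction().getEnclosingFunction() = func
    )
}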
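A hypothetical C++ target for `isCalled`. Expressions in `Caller` and `helper` should match. The expression in `viaPointerOnly` should not, because its only call is an ExprCall through `fp`, which getACallToThisFunction() does not return. `helper` and `viaPointerOnly` are made-up names; `Caller` is the name the predicate looks for.

```cpp
#include <cassert>

int helper(int x) {
  return x * 10;  // inside helper: matched via the direct call in Caller
}

int viaPointerOnly(int x) {
  return x + 5;   // only called through a pointer: not found by getACallToThisFunction()
}

int Caller(int x) {
  int (*fp)(int) = viaPointerOnly;
  return helper(x) + fp(x);  // direct FunctionCall to helper; ExprCall through fp
}
```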

Here getEnclosingFunction() and getEnclosingElement() are somewhat like joern's .inAst, getACallToThisFunction() is similar to joern's .callIn, and getACallToThisFunction().getEnclosingFunction() is similar to joern's .caller.