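CodeQL Study Notes

These notes are based on the official CodeQL documentation: https://codeql.github.com/docs

QL Tutorial

QL is a logic programming language. Its syntax resembles SQL, but it is used somewhat differently. This section focuses on syntax; environment setup, database selection, and similar steps are omitted.

QL Queries

A simple query generally follows this template: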

from /* ... variable declarations ... */
where /* ... logical formulas ... */
select /* ... expressions ... */

For example, the following query returns 42:

from int x, int y
where x = 6 and y = 7
select x * y

A query can also return multiple results. For example, the following query computes all Pythagorean triples whose values lie between 1 and 10:

from int x, int y, int z
where x in [1..10] and y in [1..10] and z in [1..10] and
      x*x + y*y = z*z
select x, y, z

Classes can simplify a query. The following example uses a SmallInt class to represent the integers from 1 to 10:

class SmallInt extends int {
  SmallInt() { this in [1..10] }
  int square() { result = this*this }
}

from SmallInt x, SmallInt y, SmallInt z
where x.square() + y.square() = z.square()
select x, y, z
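
The same condition can also be factored into a predicate. The following is a minimal sketch: the predicate name isPythagoreanTriple is introduced here for illustration and does not come from the official tutorial. The extra condition x <= y drops mirrored results, so on the range 1 to 10 only (3, 4, 5) and (6, 8, 10) remain.

```ql
class SmallInt extends int {
  SmallInt() { this in [1 .. 10] }

  int square() { result = this * this }
}

// Holds if x, y and z form a Pythagorean triple (hypothetical helper name).
predicate isPythagoreanTriple(SmallInt x, SmallInt y, SmallInt z) {
  x.square() + y.square() = z.square()
}

from SmallInt x, SmallInt y, SmallInt z
where isPythagoreanTriple(x, y, z) and x <= y // x <= y removes mirrored triples
select x, y, z
```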

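CodeQL Query Examples

The CodeQL libraries help find security vulnerabilities in a codebase. To import the CodeQL library for a specific programming language, add import <language> at the top of the query: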

import python

from Function f
where count(f.getAnArg()) > 7
select f

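The from clause declares the variable f, which ranges over Python functions; the where clause keeps only functions with more than 7 parameters; the select clause lists the matching functions.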

import javascript

from Comment c
where c.getText().regexpMatch("(?si).*\\bTODO\\b.*")
select c

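The from clause declares the variable c, which ranges over JavaScript comments; the where clause keeps only comments that contain "TODO"; the select clause lists the matching comments.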

import java

from Parameter p
where not exists(p.getAnAccess())
select p

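The from clause declares the variable p, which ranges over Java parameters; the where clause keeps only parameters that are never accessed; the select clause lists the matching parameters.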

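CodeQL Queries

CodeQL queries analyze code for issues related to security, correctness, maintainability, and readability. There are two main kinds of query: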

  • Alert queries: queries that highlight issues in specific locations in your code
  • Path queries: queries that describe the flow of information between a source and a sink in your code.

Basic Query Structure

/**
 *
 * Query metadata
 *
 */

import /* ... CodeQL libraries or modules ... */

/* ... Optional, define CodeQL classes and predicates ... */

from /* ... variable declarations ... */
where /* ... logical formula ... */
select /* ... expressions ... */

A query file written in CodeQL has the .ql extension and contains a select clause. This section mainly covers alert queries; for path queries, see Creating path queries.

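Query metadata
Metadata describes the purpose of a query and specifies how the query results are interpreted and displayed. Queries that are contributed to the open source repository, or that are used to analyze a database with the CodeQL CLI, must specify a query type (@kind). The @kind property indicates how the results of the query analysis are interpreted and displayed:

  • Alert query metadata must contain @kind problem, which identifies the results as simple alerts.
  • Path query metadata must contain @kind path-problem, which identifies the results as alerts documented by a sequence of code locations.
  • Diagnostic query metadata must contain @kind diagnostic, which identifies the results as troubleshooting data about the extraction process.
  • Summary query metadata must contain @kind metric and @tags summary, which identify the results as summary metrics for the CodeQL database.

For more information about metadata, see Metadata for CodeQL queries.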
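
As an illustration, an alert query with a fuller metadata header might look like the sketch below. It reuses the Python example from earlier in these notes. The @name, @description, and @id values and the alert message are made up for this example; only @kind problem is required by the rules above.

```ql
/**
 * @name Function with many parameters
 * @description Finds Python functions that take more than 7 parameters.
 * @kind problem
 * @problem.severity recommendation
 * @id py/example/too-many-parameters
 * @tags maintainability
 */

import python

from Function f
where count(f.getAnArg()) > 7
// For @kind problem the select clause is: element, message string.
select f, "This function has more than 7 parameters."
```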

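Import statements
When writing an alert query, you usually import the standard library for the programming language of the project; see CodeQL language guides.
Other libraries contain commonly used predicates, types, and further analysis modules (for example data flow, control flow, and taint tracking). Path queries usually need to import a data flow library; see Creating path queries.
You can also define your own classes and predicates; see Defining a predicate and Defining a class.

From clause
The from clause declares the variables used in the query. Each declaration has the form <type> <variable name>. For variable types, see Types.

Where clause
The where clause defines the logical conditions applied to the variables declared in the from clause. It uses aggregations, predicates, and formulas to restrict the range of the variables.

Select clause
The structure of the select clause must match the @kind given in the metadata. For example, the select clause of an alert query (@kind problem) has this structure: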

select element, string
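  • element: a code element identified by the query, which defines where the alert is displayed
  • string: a message (which can also include links and placeholders) explaining why the alert was generated

Placeholders can be used in the message. Each $@ defines a placeholder, and the two values that follow it in the select clause are the link target and the link text. The following example finds Java classes that extend other classes: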

/**
 * @kind problem
 */

import java

from Class c, Class superclass
where superclass = c.getASupertype()
select c, "This class extends the class $@.", superclass, superclass.getName()

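In the results, each alert is shown at the class c, and the $@ placeholder in the message is rendered as a link to superclass, using the superclass name as the link text.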

For more information, see Defining the results of a query.

For the select clause structure of other query kinds, see Select clause.

Query help files

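Query help files explain the purpose of a query to other users; see Query help files.

Providing locations in CodeQL queries

See Providing locations in CodeQL queries.

To present information to users, an application must be able to extract location information from query results. A QL class provides location information through one of the following mechanisms: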

  1. Providing URLs
  2. Providing location information
  3. Using extracted location information

Data flow analysis

See About data flow analysis.

Data flow graph
The CodeQL data flow libraries support two kinds of data flow:

  • Local data flow: data flow within a single function
  • Global data flow: data flow across the whole program (calculating data flow between functions and through object properties)

Creating path queries

Path query template:

/**
 * ...
 * @kind path-problem
 * ...
 */

import <language>
// For some languages (Java/C++/Python) you need to explicitly import the data flow library, such as
// import semmle.code.java.dataflow.DataFlow
import DataFlow::PathGraph
...

from MyConfiguration config, DataFlow::PathNode source, DataFlow::PathNode sink
where config.hasFlowPath(source, sink)
select sink.getNode(), source, sink, "<message>"
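
As a concrete instance of the template, the following sketch tracks data from calls to getenv into the first argument of fopen in C/C++ code. It uses the class-based configuration API shown in the template, which depends on the CodeQL version in use. The configuration name EnvToFopenConfiguration, the @name and @id values, and the message text are assumptions made for this example.

```ql
/**
 * @name Environment variable used to open a file
 * @kind path-problem
 * @id cpp/example/env-to-fopen
 */

import cpp
import semmle.code.cpp.dataflow.DataFlow
import DataFlow::PathGraph

class EnvToFopenConfiguration extends DataFlow::Configuration {
  EnvToFopenConfiguration() { this = "EnvToFopenConfiguration" }

  // Sources: the result of any call to getenv.
  override predicate isSource(DataFlow::Node source) {
    source.asExpr().(FunctionCall).getTarget().hasGlobalName("getenv")
  }

  // Sinks: the file name argument of any call to fopen.
  override predicate isSink(DataFlow::Node sink) {
    exists(FunctionCall fc |
      fc.getTarget().hasGlobalName("fopen") and
      sink.asExpr() = fc.getArgument(0)
    )
  }
}

from EnvToFopenConfiguration config, DataFlow::PathNode source, DataFlow::PathNode sink
where config.hasFlowPath(source, sink)
select sink.getNode(), source, sink, "This 'fopen' uses data from $@.",
  source.getNode(), "a call to 'getenv'"
```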

CodeQL for C and C++

CodeQL library for C and C++

cpp.qll imports all the core CodeQL libraries for C/C++, so adding import cpp at the top of a query is enough.
For the commonly used Declaration, Statement, Expression, Type, and Preprocessor classes, see CodeQL library for C and C++.

Functions in C and C++

Find all static functions

import cpp

from Function f
where f.isStatic()
select f, "This is a static function."

Find functions that are never called

import cpp

from Function f
where not exists(FunctionCall fc | fc.getTarget() = f)
select f, "This function is never called."

Find functions that are never called and never referenced through a function pointer

import cpp

from Function f
where not exists(FunctionCall fc | fc.getTarget() = f)
  and not exists(FunctionAccess fa | fa.getTarget() = f)
select f, "This function is never called, or referenced with a function pointer."

Find calls to sprintf that use a variable format string

import cpp

from FunctionCall fc
where fc.getTarget().getQualifiedName() = "sprintf"
  and not fc.getArgument(1) instanceof StringLiteral
select fc, "sprintf called with variable format string."

Expressions, types, and statements in C and C++

C/C++ statements in CodeQL

  • Stmt
    • Loop
      • WhileStmt
      • ForStmt
      • DoStmt
    • ConditionalStmt
      • IfStmt
      • SwitchStmt
    • TryStmt
    • ExprStmt - expressions used as a statement; for example, an assignment
    • Block - { } blocks containing more statements

Find for loops whose initialization assigns 0 to an integer

import cpp

from AssignExpr e, ForStmt f
// the assignment is in the 'for' loop initialization statement
where e.getEnclosingStmt() = f.getInitialization()
  and e.getRValue().getValue().toInt() = 0
  and e.getLValue().getType().getUnspecifiedType() instanceof IntegralType
select e, "Assigning the value 0 to an integer, inside a for loop initialization."

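The loop initialization is a statement (Stmt), not an expression (Expr): the assignment expression (AssignExpr) is wrapped in an ExprStmt. The query therefore uses Expr.getEnclosingStmt() to get the nearest Stmt that encloses the expression. Type.getUnspecifiedType() resolves typedef types to their underlying types; for example, given typedef int myInt;, myInt resolves to int.

Find assignments of 0 to an integer inside a for loop body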

import cpp

from AssignExpr e, ForStmt f
// the assignment is in the for loop body
where e.getEnclosingStmt().getParentStmt*() = f.getStmt()
  and e.getRValue().getValue().toInt() = 0
  and e.getLValue().getType().getUnderlyingType() instanceof IntegralType
select e, "Assigning the value 0 to an integer, inside a for loop body."
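
In the query above, getParentStmt*() applies getParentStmt() zero or more times, so assignments nested at any depth inside the loop body are matched. As a sketch of how the same pattern generalizes, the variant below ranges over the Loop class from the statement hierarchy listed earlier, so it also covers while and do loops. The alert message is made up for this example.

```ql
import cpp

from AssignExpr e, Loop l
// The assignment is in the loop body, at any nesting depth.
where e.getEnclosingStmt().getParentStmt*() = l.getStmt()
  and e.getRValue().getValue().toInt() = 0
  and e.getLValue().getType().getUnspecifiedType() instanceof IntegralType
select e, "Assigning the value 0 to an integer, inside a loop body."
```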

Data flow analysis in C/C++

Local data flow

Using local data flow
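The local data flow library is in the DataFlow module, where the Node class represents any element that data can flow through. Nodes are divided into expression nodes (ExprNode) and parameter nodes (ParameterNode). The member predicates asExpr and asParameter map between data flow nodes and expressions/parameters: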

class Node {
  /** Gets the expression corresponding to this node, if any. */
  Expr asExpr() { ... }

  /** Gets the parameter corresponding to this node, if any. */
  Parameter asParameter() { ... }

  ...
}

Alternatively, use the predicates exprNode and parameterNode:

/**
 * Gets the node corresponding to expression `e`.
 */
ExprNode exprNode(Expr e) { ... }

/**
 * Gets the node corresponding to the value of parameter `p` at function entry.
 */
ParameterNode parameterNode(Parameter p) { ... }

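The predicate localFlowStep(Node nodeFrom, Node nodeTo) holds if there is an immediate data flow edge from the node nodeFrom to the node nodeTo. The predicate can be applied recursively with the + and * operators, and the predefined recursive predicate localFlow is equivalent to localFlowStep*. For example, the following formula holds if data flows from a parameter source to an expression sink in zero or more local steps: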

DataFlow::localFlow(DataFlow::parameterNode(source), DataFlow::exprNode(sink))

Using local taint tracking
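Local taint tracking extends local data flow by also including non-value-preserving flow steps. It is implemented in the TaintTracking module. Similarly, the predicate localTaintStep(DataFlow::Node nodeFrom, DataFlow::Node nodeTo) holds if there is an immediate taint propagation edge from nodeFrom to nodeTo. This predicate can also be applied recursively with + and *, or you can use the recursive version localTaint.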
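
A minimal sketch of localTaint in use, assuming the same fopen scenario as the local data flow examples in these notes: it reports parameters whose value may influence the file name passed to fopen, including through steps that do not preserve the value. The alert message is made up for this example.

```ql
import cpp
import semmle.code.cpp.dataflow.DataFlow
import semmle.code.cpp.dataflow.TaintTracking

from Function fopen, FunctionCall fc, Parameter p
where fopen.hasGlobalName("fopen")
  and fc.getTarget() = fopen
  // localTaint covers zero or more local taint steps.
  and TaintTracking::localTaint(DataFlow::parameterNode(p), DataFlow::exprNode(fc.getArgument(0)))
select p, "This parameter may influence the file name passed to $@.", fc, "fopen"
```
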
Examples
Use local data flow to find all expressions that may flow into the filename argument of fopen:

import cpp
import semmle.code.cpp.dataflow.DataFlow

from Function fopen, FunctionCall fc, Expr src
where fopen.hasQualifiedName("fopen")
  and fc.getTarget() = fopen
  and DataFlow::localFlow(DataFlow::exprNode(src), DataFlow::exprNode(fc.getArgument(0)))
select src

Find a public parameter that is used to open a file:

import cpp
import semmle.code.cpp.dataflow.DataFlow

from Function fopen, FunctionCall fc, Parameter p
where fopen.hasQualifiedName("fopen")
  and fc.getTarget() = fopen
  and DataFlow::localFlow(DataFlow::parameterNode(p), DataFlow::exprNode(fc.getArgument(0)))
select p

Find calls to formatting functions where the format string is not hard-coded:

import cpp
import semmle.code.cpp.dataflow.DataFlow
import semmle.code.cpp.commons.Printf

from FormattingFunction format, FunctionCall call, Expr formatString
where call.getTarget() = format
  and call.getArgument(format.getFormatParameterIndex()) = formatString
  and not exists(DataFlow::Node source, DataFlow::Node sink |
    DataFlow::localFlow(source, sink) and
    source.asExpr() instanceof StringLiteral and
    sink.asExpr() = formatString
  )
select call, "Argument to " + format.getQualifiedName() + " isn't hard-coded."

Global data flow

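Global data flow tracks data throughout the entire program, but it is less precise than local data flow and the analysis typically needs more time and memory.
Using global data flow
Use the global data flow library by extending the DataFlow::Configuration class: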

import semmle.code.cpp.dataflow.DataFlow

class MyDataFlowConfiguration extends DataFlow::Configuration {
  MyDataFlowConfiguration() { this = "MyDataFlowConfiguration" }

  override predicate isSource(DataFlow::Node source) {
    ...
  }

  override predicate isSink(DataFlow::Node sink) {
    ...
  }
}

The following predicates are defined in the configuration:

  • isSource—defines where data may flow from
  • isSink—defines where data may flow to
  • isBarrier—optional, restricts the data flow
  • isBarrierGuard—optional, restricts the data flow
  • isAdditionalFlowStep—optional, adds additional flow steps

The data flow analysis is performed with the predicate hasFlow(DataFlow::Node source, DataFlow::Node sink):

from MyDataFlowConfiguration dataflow, DataFlow::Node source, DataFlow::Node sink
where dataflow.hasFlow(source, sink)
select source, "Data flow to $@.", sink, sink.toString()
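
Newer CodeQL releases replace the class-based DataFlow::Configuration with a module-based API. The skeleton below is a sketch that assumes a release shipping semmle.code.cpp.dataflow.new.DataFlow. The names MyFlowConfig and MyFlow are placeholders, and the none() bodies must be replaced with real source and sink conditions.

```ql
import cpp
import semmle.code.cpp.dataflow.new.DataFlow

module MyFlowConfig implements DataFlow::ConfigSig {
  predicate isSource(DataFlow::Node source) {
    // Placeholder: replace none() with the condition that identifies sources.
    none()
  }

  predicate isSink(DataFlow::Node sink) {
    // Placeholder: replace none() with the condition that identifies sinks.
    none()
  }
}

// Instantiate the global data flow analysis for this configuration.
module MyFlow = DataFlow::Global<MyFlowConfig>;

from DataFlow::Node source, DataFlow::Node sink
where MyFlow::flow(source, sink)
select source, "Data flow to $@.", sink, sink.toString()
```

In this style the configuration is a module rather than a class, so there is no characteristic predicate and no string name, and MyFlow::flow(source, sink) takes the place of hasFlow.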

Using global taint tracking

import semmle.code.cpp.dataflow.TaintTracking

class MyTaintTrackingConfiguration extends TaintTracking::Configuration {
  MyTaintTrackingConfiguration() { this = "MyTaintTrackingConfiguration" }

  override predicate isSource(DataFlow::Node source) {
    ...
  }

  override predicate isSink(DataFlow::Node sink) {
    ...
  }
}

The following predicates are defined in the configuration:

  • isSource—defines where taint may flow from
  • isSink—defines where taint may flow to
  • isSanitizer—optional, restricts the taint flow
  • isSanitizerGuard—optional, restricts the taint flow
  • isAdditionalTaintStep—optional, adds additional taint steps
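
Examples
The following data flow configuration tracks data flow from environment variables (calls to getenv) to the file name argument of fopen:

import cpp
import semmle.code.cpp.dataflow.DataFlow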

class EnvironmentToFileConfiguration extends DataFlow::Configuration {
  EnvironmentToFileConfiguration() { this = "EnvironmentToFileConfiguration" }

  override predicate isSource(DataFlow::Node source) {
    exists (Function getenv |
      source.asExpr().(FunctionCall).getTarget() = getenv and
      getenv.hasQualifiedName("getenv")
    )
  }

  override predicate isSink(DataFlow::Node sink) {
    exists (FunctionCall fc |
      sink.asExpr() = fc.getArgument(0) and
      fc.getTarget().hasQualifiedName("fopen")
    )
  }
}

from Expr getenv, Expr fopen, EnvironmentToFileConfiguration config
where config.hasFlow(DataFlow::exprNode(getenv), DataFlow::exprNode(fopen))
select fopen, "This 'fopen' uses data from $@.",
  getenv, "call to 'getenv'"

Detecting a potential buffer overflow

An example of a buffer overflow: the malloc call does not reserve space for the null termination character.

void processString(const char *input)
{
    char *buffer = malloc(strlen(input));

    strcpy(buffer, input);

    ...
}

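The following CodeQL query finds calls to malloc whose size argument is just strlen(string). For a detailed walkthrough of how the query is written, see Detecting a potential buffer overflow: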

import cpp

class MallocCall extends FunctionCall
{
    MallocCall() { this.getTarget().hasGlobalName("malloc") }

    Expr getAllocatedSize() {
        if this.getArgument(0) instanceof VariableAccess then
            exists(LocalScopeVariable v, SsaDefinition ssaDef |
                result = ssaDef.getAnUltimateDefiningValue(v)
                and this.getArgument(0) = ssaDef.getAUse(v))
        else
            result = this.getArgument(0)
    }
}

from MallocCall malloc
where malloc.getAllocatedSize() instanceof StrlenCall
select malloc, "This allocation does not include space to null-terminate the string."

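The SSA library represents variables in static single assignment (SSA) form. In this form, each variable is assigned exactly once and every variable is defined before it is used. Using SSA variables simplifies queries because much of the local data flow analysis has already been done.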

Using the guards library in C and C++

The Guards library (semmle.code.cpp.controlflow.Guards) identifies conditional expressions that control program execution; see Using the guards library in C and C++.
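
A minimal sketch of the library in use, assuming the GuardCondition class and its controls(BasicBlock, boolean) predicate: it lists function calls that are only reached when some condition evaluates to true. The alert message is made up for this example.

```ql
import cpp
import semmle.code.cpp.controlflow.Guards

from GuardCondition guard, FunctionCall call
// The basic block containing the call is only entered when the guard is true.
where guard.controls(call.getBasicBlock(), true)
select call, "This call is only reached when $@ evaluates to true.", guard, "this condition"
```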

Using range analysis for C and C++

Range analysis (semmle.code.cpp.rangeanalysis.SimpleRangeAnalysis) determines upper and lower bounds for an expression, or whether an expression might overflow. See Using range analysis for C and C++.
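
A rough sketch of a range analysis query, assuming the lowerBound predicate from SimpleRangeAnalysis: it reports array accesses whose index has a negative lower bound. An index whose range the analysis cannot narrow is reported as well, so expect noisy results. The alert message is made up for this example.

```ql
import cpp
import semmle.code.cpp.rangeanalysis.SimpleRangeAnalysis

from ArrayExpr access, Expr index
where index = access.getArrayOffset().getFullyConverted()
  // lowerBound gives the smallest value the analysis can prove for the index.
  and lowerBound(index) < 0
select access, "The index of this array access may be negative."
```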

Hash consing and value numbering

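Use semmle.code.cpp.valuenumbering.HashCons to identify expressions that are syntactically identical, and semmle.code.cpp.valuenumbering.GlobalValueNumbering to identify expressions that have the same value at runtime. See Hash consing and value numbering.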
