A Deep Dive into CodeQL

Latest recommended article published on 2024-03-26 09:46:26

好吃吗

Category: CodeQL

Article link: https://blog.csdn.net/xhdxhdxhd/article/details/119089159

Introduction

For background on CodeQL, see the earlier article: placeholder

Here we explore how CodeQL works by reading the codeql-go source code.

Preparation

extractor

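TRAP files

A TRAP file is a stream file. Each Emit call writes one of the following kinds of data to it:

A global mapping between a key and an id, for example id=key, where id is an integer that auto-increments starting from 10000. Examples of keys:
- universe;scope, the id of the universe (global) scope;
- encoding/json;package, the id of the encoding/json package;
- {scopeID},name;object, an object declared in a scope;
- {id},methodName;method, the id of a method on some type;
- 1;basictype, a basic type, numbered by types.BasicKind, e.g. 1=bool, 2=int, ...;
- 2,{elemTypeID};arraytype, an array type; the 2 is the array length;
- {elemTypeID};slicetype, a slice type;
- field0Name,{field0TypeID},field0Tag,field1Name,{field1TypeID},field1Tag...;structtype, a struct type;
- {elemTypeID};pointer, a pointer type;
- method0Id,{method0TypeID},method1Id,{method1TypeID}...;interfacetype, an interface type. Note that two interfaces with exactly the same methods get the same ID;
- {type0},{type1}...;tupletype, a tuple type, i.e. the result list of a function that returns multiple values;
- {paramType0},{paramType1}...;{retType0},{retType1}...;signaturetype, a function signature;
- {keyTypeID},{valueTypeID};maptype, a map type;
- Dir,{elemTypeID};chantype, a chan type (Dir is the channel direction);
- {underlyingTypeID};namedtype, a type defined with the type keyword, e.g. type X struct{}, type Kind int;
- path;folder, a directory;
- path;sourcefile, a source file.
A table row, for example: tableName(1,"2","3");
For what the contents look like, see: TRAP file example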

During one extraction run there is a single root directory for TRAP files, and each package gets its own TRAP file generated under that root.

Tables

extractor/dbscheme/tables.go

The table structure is defined as:

// extractor/dbscheme/dbscheme.go

// A Table represents a database table
type Table struct {
	name    string
	schema  []Column
	keysets [][]string
}

A table's metadata consists of its name, its schema (the list of columns), and its keysets.
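
To make the role of the three fields concrete, here is a minimal sketch that renders such metadata as a dbscheme declaration. The Column fields and both String methods are assumptions made for illustration; the real Column type in extractor/dbscheme is not shown in this article and may look different. The sample data is the exprs declaration from go.dbscheme.

```go
package main

import (
	"fmt"
	"strings"
)

// Column is a hypothetical, simplified column description.
type Column struct {
	Unique  bool
	Storage string // storage type, e.g. "int" or "string"
	Name    string
	Type    string // e.g. "@expr" or "int"
	Ref     bool   // true if the column references Type instead of defining it
}

// Table has the same three fields as the struct shown above.
type Table struct {
	name    string
	schema  []Column
	keysets [][]string
}

// String renders one column, e.g. "unique int id: @expr".
func (c Column) String() string {
	s := c.Storage + " " + c.Name + ": " + c.Type
	if c.Unique {
		s = "unique " + s
	}
	if c.Ref {
		s += " ref"
	}
	return s
}

// String renders the keysets followed by the table declaration.
func (t Table) String() string {
	var b strings.Builder
	for _, ks := range t.keysets {
		fmt.Fprintf(&b, "#keyset[%s]\n", strings.Join(ks, ", "))
	}
	cols := make([]string, len(t.schema))
	for i, c := range t.schema {
		cols[i] = c.String()
	}
	fmt.Fprintf(&b, "%s(%s);", t.name, strings.Join(cols, ", "))
	return b.String()
}

func main() {
	exprs := Table{
		name: "exprs",
		schema: []Column{
			{Unique: true, Storage: "int", Name: "id", Type: "@expr"},
			{Storage: "int", Name: "kind", Type: "int", Ref: true},
			{Storage: "int", Name: "parent", Type: "@exprparent", Ref: true},
			{Storage: "int", Name: "idx", Type: "int", Ref: true},
		},
		keysets: [][]string{{"parent", "idx"}},
	}
	fmt.Println(exprs)
}
```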

Some examples:

scopes(#id,kind)               // defines all scopes
scopenesting(inner,outer)      // defines how scopes nest
objects(#id, kind,name)        // all objects declared in a package
has_location(#id ref,location) // the location of an element

database scheme

Take codeql-go/ql/src/go.dbscheme as an example:

#keyset[parent, idx]
exprs(unique int id: @expr, int kind: int ref, int parent: @exprparent ref, int idx: int ref);

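exprs declares a table; the parenthesized part lists its columns.
Each column has two parts: a declaration of its name and storage type, and a type that is either defined or referenced.
unique int id defines a column named id with storage type int; the unique prefix states that values in this column are unique. The part after the colon, @expr, defines a new type @expr, which means that @expr is essentially an int.

In int kind: int ref, the first part again gives the name and storage type; the second part, int ref, states that the column's value is a reference to the int type.

int parent: @exprparent ref is a reference (ref) to the @exprparent type rather than the definition of a new type. @exprparent is defined as follows (the files table is shown because it defines @file, one member of the union):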

files(unique int id: @file, string name: string ref, string simple: string ref, string ext: string ref, int fromSource: int ref);

@exprparent = @funcdef | @file | @expr | @field | @stmt | @decl | @spec;

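Finally, #keyset[parent,idx] declares that the combination of the parent and idx columns is unique: no two rows may share the same (parent, idx) pair.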
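
The keyset constraint can be illustrated with a short check over in-memory rows. This is only a sketch of the constraint's meaning under the assumption that rows are held as Go structs; exprRow and checkKeyset are hypothetical names and do not correspond to any code in codeql-go.

```go
package main

import "fmt"

// exprRow is a hypothetical in-memory form of one row of the exprs table.
type exprRow struct {
	id, kind, parent, idx int
}

// checkKeyset returns an error for the first two rows that share the same
// (parent, idx) pair, which is what #keyset[parent, idx] forbids.
func checkKeyset(rows []exprRow) error {
	type key struct{ parent, idx int }
	seen := map[key]int{} // (parent, idx) -> id of the row that used it first
	for _, r := range rows {
		k := key{r.parent, r.idx}
		if prev, dup := seen[k]; dup {
			return fmt.Errorf("keyset[parent, idx] violated: rows %d and %d share (%d, %d)",
				prev, r.id, r.parent, r.idx)
		}
		seen[k] = r.id
	}
	return nil
}

func main() {
	rows := []exprRow{
		{id: 1, kind: 10, parent: 100, idx: 0},
		{id: 2, kind: 10, parent: 100, idx: 1},
		{id: 3, kind: 11, parent: 100, idx: 1}, // same parent and idx as row 2
	}
	fmt.Println(checkKeyset(rows[:2])) // the first two rows are consistent
	fmt.Println(checkKeyset(rows))     // the third row collides with the second
}
```

In other words, a parent can have at most one child expression at each index, which is what lets a query address "the idx-th child of parent" unambiguously.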

That is essentially all of the dbscheme syntax.
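
As a way to check one's reading of the column syntax, the following sketch parses a single column declaration into its parts. It covers only the forms discussed in this section (an optional unique prefix, a storage type, a name, a type, and an optional ref suffix); parseColumn and the column struct are hypothetical and are not part of CodeQL.

```go
package main

import (
	"fmt"
	"strings"
)

// column holds the parts of one declaration such as "unique int id: @expr".
type column struct {
	Unique  bool
	Storage string // "int", "string", ...
	Name    string
	Type    string // "@expr", "int", ...
	Ref     bool   // true when the declaration ends in "ref"
}

// parseColumn splits a declaration at the colon and interprets both halves.
func parseColumn(decl string) (column, error) {
	halves := strings.SplitN(decl, ":", 2)
	if len(halves) != 2 {
		return column{}, fmt.Errorf("missing ':' in %q", decl)
	}
	var c column

	// Left half: [unique] <storage type> <name>
	left := strings.Fields(halves[0])
	if len(left) == 3 && left[0] == "unique" {
		c.Unique = true
		left = left[1:]
	}
	if len(left) != 2 {
		return column{}, fmt.Errorf("bad name part in %q", decl)
	}
	c.Storage, c.Name = left[0], left[1]

	// Right half: <type> [ref]
	right := strings.Fields(halves[1])
	switch {
	case len(right) == 1:
		c.Type = right[0] // defines a new type
	case len(right) == 2 && right[1] == "ref":
		c.Type, c.Ref = right[0], true // references an existing type
	default:
		return column{}, fmt.Errorf("bad type part in %q", decl)
	}
	return c, nil
}

func main() {
	for _, decl := range []string{
		"unique int id: @expr",
		"int kind: int ref",
		"int parent: @exprparent ref",
	} {
		c, err := parseColumn(decl)
		fmt.Printf("%+v %v\n", c, err)
	}
}
```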

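From this dbscheme it is easy to read off all of the language's AST types and how they nest.

The dbscheme also shows the relational database model used to represent the code.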

data-flow

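The dbscheme definitions above are essentially type definitions for the static AST, that is, the result of syntactic analysis. How, then, are the types used in semantic analysis defined? In other words, how do AST nodes become associated with entity types?

All of the definitions are in codeql-go/ql/src/semmle/go/dataflow/internal/DataFlowUtil.qll: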

import go
import semmle.go.dataflow.FunctionInputsAndOutputs
private import DataFlowPrivate

cached
private newtype TNode =
  MkInstructionNode(IR::Instruction insn) or
  MkSsaNode(SsaDefinition ssa) or
  MkGlobalFunctionNode(Function f)

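The definition of TNode depends entirely on IR::Instruction, SsaDefinition, and Function. In other words, TNode is the union of these three types, with one branch wrapping each.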
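
For readers more familiar with Go than QL, the shape of this union can be approximated with a sealed interface. This is an analogy only: the payload fields are plain strings standing in for IR::Instruction, SsaDefinition, and Function, and none of these Go names exist in codeql-go.

```go
package main

import "fmt"

// Node plays the role of TNode: a closed set of three variants.
type Node interface {
	isNode() // unexported, so no type outside this package can join the set
}

// Hypothetical payloads; in QL each branch wraps a value of another class.
type InstructionNode struct{ Insn string }
type SsaNode struct{ Def string }
type GlobalFunctionNode struct{ Func string }

func (InstructionNode) isNode()    {}
func (SsaNode) isNode()            {}
func (GlobalFunctionNode) isNode() {}

// describe dispatches on the variant, much as QL code matches on the
// MkInstructionNode, MkSsaNode, and MkGlobalFunctionNode branches.
func describe(n Node) string {
	switch n := n.(type) {
	case InstructionNode:
		return "instruction " + n.Insn
	case SsaNode:
		return "ssa definition " + n.Def
	case GlobalFunctionNode:
		return "global function " + n.Func
	default:
		return "unknown"
	}
}

func main() {
	nodes := []Node{
		InstructionNode{Insn: "x + 1"},
		SsaNode{Def: "x"},
		GlobalFunctionNode{Func: "main"},
	}
	for _, n := range nodes {
		fmt.Println(describe(n))
	}
}
```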

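The SSA definitions come from codeql-go/ql/src/semmle/go/dataflow/SsaImpl.qll

As we can see, SSA is implemented entirely in QL itself. The extractor does not run Go's own SSA tooling and store the result in the database.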

Appendix

TRAP file example

Excerpt from the gzip package:

#10000=@"compress/gzip;package"
#10001=@"{#10000};scope"
scopes(#10001, 1)
#10002=@"universe;scope"
scopenesting(#10001, #10002)
#10003=@"20;basictype"
types(#10003, 20)
#10004=@"{{#10001}},BestCompression;object"
objects(#10004, 3, "BestCompression")
objectscopes(#10004, #10001)
#10005=@"{{#10001}},BestSpeed;object"
objects(#10005, 3, "BestSpeed")
objectscopes(#10005, #10001)
#10006=@"{{#10001}},DefaultCompression;object"
objects(#10006, 3, "DefaultCompression")
objectscopes(#10006, #10001)
#10007=@"{{#10002}},error;object"
objects(#10007, 2, "error")
#10008=@"{{#10007}};namedtype"
typename(#10008, "error")
#10009=@"17;basictype"
types(#10009, 17)
#10010=@";{{#10009}};signaturetype"
component_types(#10010, -1, "", #10009)
types(#10010, 32)
#10011=@"Error,{{#10010}};interfacetype"
#10012=@"{{#10008}},Error;method"
objects(#10012, 6, "Error")
#10013=@"{{#10012}},;receiver"
objects(#10013, 5, "")
methodreceivers(#10012, #10013)
component_types(#10011, 0, "Error", #10010)
types(#10011, 30)
underlying_type(#10008, #10011)
type_objects(#10008, #10007)
methodhosts(#10012, #10008)
types(#10008, 37)
#10014=@"{{#10001}},ErrChecksum;object"
objects(#10014, 5, "ErrChecksum")
objectscopes(#10014, #10001)
#10015=@"{{#10001}},ErrHeader;object"
objects(#10015, 5, "ErrHeader")
objectscopes(#10015, #10001)
#10016=@"{{#10001}},Header;object"
objects(#10016, 1, "Header")
#10017=@"{{#10016}};namedtype"
typename(#10017, "Header")
#10018=@"8;basictype"
types(#10018, 8)
#10019=@"{{#10018}};slicetype"
element_type(#10019, #10018)
types(#10019, 27)
#10020=@"time;package"
#10021=@"{#10020};scope"
#10022=@"{{#10021}},Time;object"
objects(#10022, 1, "Time")
#10023=@"{{#10022}};namedtype"
typename(#10023, "Time")
#10024=@"11;basictype"
types(#10024, 11)
#10025=@"6;basictype"
types(#10025, 6)
#10026=@"{{#10021}},Location;object"
objects(#10026, 1, "Location")
#10027=@"{{#10026}};namedtype"
typename(#10027, "Location")
#10028=@"{{#10021}},zone;object"
objects(#10028, 1, "zone")
#10029=@"{{#10028}};namedtype"
typename(#10029, "zone")
#10030=@"2;basictype"
types(#10030, 2)
#10031=@"1;basictype"
types(#10031, 1)
#10032=@"name,{{#10009}},,offset,{{#10030}},,isDST,{{#10031}},;structtype"
#10033=@"{{#10032}},name;field"
objects(#10033, 5, "name")
fieldstructs(#10033, #10032)
component_types(#10032, 0, "name", #10009)
#10034=@"{{#10032}},offset;field"
objects(#10034, 5, "offset")
fieldstructs(#10034, #10032)
component_types(#10032, 1, "offset", #10030)
#10035=@"{{#10032}},isDST;field"
objects(#10035, 5, "isDST")
fieldstructs(#10035, #10032)
component_types(#10032, 2, "isDST", #10031)
types(#10032, 28)
underlying_type(#10029, #10032)
type_objects(#10029, #10028)
types(#10029, 37)
#10036=@"{{#10029}};slicetype"
element_type(#10036, #10029)
types(#10036, 27)
#10037=@"{{#10021}},zoneTrans;object"
objects(#10037, 1, "zoneTrans")
#10038=@"{{#10037}};namedtype"
typename(#10038, "zoneTrans")
types(#10018, 8)
#10039=@"when,{{#10025}},,index,{{#10018}},,isstd,{{#10031}},,isutc,{{#10031}},;structtype"
#10040=@"{{#10039}},when;field"
objects(#10040, 5, "when")
fieldstructs(#10040, #10039)
component_types(#10039, 0, "when", #10025)
#10041=@"{{#10039}},index;field"
objects(#10041, 5, "index")
fieldstructs(#10041, #10039)
component_types(#10039, 1, "index", #10018)