This article is for self-study only; all referenced articles are credited.
AST (Abstract Syntax Tree)
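In a traditional compiled language, source code goes through three steps before it is executed, collectively called "compilation":
1. Tokenizing/lexical analysis: breaks the string of characters into meaningful chunks (tokens). Whether whitespace counts as a token depends on whether whitespace is significant in the language.
let a = 1 --> tokens: let, a, =, 1
2. Parsing/syntax analysis: converts the token stream into a tree of nested elements that represents the program's grammatical structure (the AST).
3. Code generation: converts the AST into executable code.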
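To make step 1 concrete, here is a minimal tokenizer sketch in Python. The token classes (KEYWORD, IDENT, OP, NUMBER) and the tiny grammar are assumptions made for illustration; a real lexer for any language would cover far more cases. Whitespace is treated as insignificant and dropped.

```python
import re

# Hypothetical token classes for a tiny JavaScript-like subset.
TOKEN_SPEC = [
    ("KEYWORD", r"\blet\b"),
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"="),
    ("SKIP", r"\s+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Split source into (kind, text) pairs, dropping whitespace."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at offset {pos}")
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


print(tokenize("let a = 1"))
# [('KEYWORD', 'let'), ('IDENT', 'a'), ('OP', '='), ('NUMBER', '1')]
```

In a language where whitespace is significant, the SKIP rule would instead emit tokens (for example, indentation tokens).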
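AST: a tree representation of the abstract syntactic structure of source code; each node in the tree represents one construct that occurs in the source.
For a more detailed explanation of ASTs, see "AST系列(一): 抽象语法树为什么抽象" on Zhihu.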
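As a quick way to see "one node per construct", the sketch below uses Python's standard `ast` module to parse a conditional expression and list the node types in pre-order. It only illustrates the idea in Python; the node names are Python's own and differ from what other parsers or tools report.

```python
import ast

# A conditional expression, written in Python syntax.
tree = ast.parse("a if a > b else b", mode="eval")


def node_types(node, depth=0):
    """Return (depth, node type name) pairs in pre-order."""
    rows = [(depth, type(node).__name__)]
    for child in ast.iter_child_nodes(node):
        rows.extend(node_types(child, depth + 1))
    return rows


for depth, name in node_types(tree):
    print("  " * depth + name)
```

The conditional becomes an `IfExp` node whose children are the comparison and the two branch identifiers, which is the nesting the definition above describes.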
Generating the AST with Joern (visualized)
joern> cpg.method("max").plotDotAst
/*
int max(int a,int b){
return a>b?a:b;
}
*/
Getting the AST as .dot text
joern> cpg.method("max").dotAst.l
res28: List[String] = List(
"""digraph "max" {
"7" [label = <(METHOD,max)<SUB>4</SUB>> ]
"8" [label = <(PARAM,int a)<SUB>4</SUB>> ]
"9" [label = <(PARAM,int b)<SUB>4</SUB>> ]
"10" [label = <(BLOCK,<empty>,<empty>)<SUB>4</SUB>> ]
"11" [label = <(RETURN,return a>b?a:b;,return a>b?a:b;)<SUB>5</SUB>> ]
"12" [label = <(<operator>.conditional,a>b?a:b)<SUB>5</SUB>> ]
"13" [label = <(<operator>.greaterThan,a>b)<SUB>5</SUB>> ]
"14" [label = <(IDENTIFIER,a,a>b)<SUB>5</SUB>> ]
"15" [label = <(IDENTIFIER,b,a>b)<SUB>5</SUB>> ]
"16" [label = <(IDENTIFIER,a,a>b?a:b)<SUB>5</SUB>> ]
"17" [label = <(IDENTIFIER,b,a>b?a:b)<SUB>5</SUB>> ]
"18" [label = <(METHOD_RETURN,int)<SUB>4</SUB>> ]
"7" -> "8"
"7" -> "9"
"7" -> "10"
"7" -> "18"
"10" -> "11"
"11" -> "12"
"12" -> "13"
"12" -> "16"
"12" -> "17"
"13" -> "14"
"13" -> "15"
}
"""
)
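The edge list in this output is the tree itself. As a rough sketch (not part of Joern), the Python below copies the edge lines from the output above, rebuilds the parent-to-children map, and measures the tree height. The helper names `parse_edges` and `depth_of` are made up for this example.

```python
import re
from collections import defaultdict

# Edge lines copied from the dotAst output above.
DOT_EDGES = '''
"7" -> "8"
"7" -> "9"
"7" -> "10"
"7" -> "18"
"10" -> "11"
"11" -> "12"
"12" -> "13"
"12" -> "16"
"12" -> "17"
"13" -> "14"
"13" -> "15"
'''

EDGE_RE = re.compile(r'"(\d+)"\s*->\s*"(\d+)"')


def parse_edges(dot_text):
    """Map each parent node id to the list of its child ids, in file order."""
    children = defaultdict(list)
    for parent, child in EDGE_RE.findall(dot_text):
        children[parent].append(child)
    return dict(children)


def depth_of(children, root):
    """Height of the tree rooted at `root`, counted in nodes."""
    return 1 + max((depth_of(children, c) for c in children.get(root, [])), default=0)


ast_children = parse_edges(DOT_EDGES)
print(ast_children["12"])           # ['13', '16', '17']
print(depth_of(ast_children, "7"))  # 6
```

Node 12 (the conditional) has three children: the comparison a>b (13) and the two result identifiers (16, 17), matching the source expression a>b?a:b.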
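DDG (Data Dependence Graph)
Two statements are data-dependent when a definition of a variable in one statement can reach a use of that variable in the other.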
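A minimal sketch of this definition for straight-line code (no branches or loops), in Python. The statement encoding and the function name `data_dependences` are assumptions for illustration; with branches, a proper reaching-definitions analysis over the control-flow graph would be needed instead.

```python
def data_dependences(statements):
    """Return (def_index, use_index, variable) triples for straight-line code.

    `statements` is a list of (defined_variable_or_None, used_variables) pairs.
    With no branches, the definition reaching a use is simply the most recent
    earlier assignment to that variable.
    """
    last_def = {}  # variable name -> index of the statement that last defined it
    edges = []
    for index, (defined, used) in enumerate(statements):
        for var in used:
            if var in last_def:
                edges.append((last_def[var], index, var))
        if defined is not None:
            last_def[defined] = index
    return edges


# Hypothetical program:  0: a = 1   1: b = a + 2   2: a = 5   3: c = a + b
program = [("a", []), ("b", ["a"]), ("a", []), ("c", ["a", "b"])]
print(data_dependences(program))
# [(0, 1, 'a'), (2, 3, 'a'), (1, 3, 'b')]
```

Statement 3 depends on statement 2 for a, not on statement 0, because the redefinition at 2 kills the earlier definition before it can reach the use.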
Generating the DDG with Joern:
/*
int max(int a,int b){
return a>b?a:b;
}
*/
joern> cpg.method("max").plotDotDdg
joern> cpg.method("max").dotDdg.l
res58: List[String] = List(
"""digraph "max" {
"7" [label = <(METHOD,max)<SUB>4</SUB>> ]
"18" [label = <(METHOD_RETURN,int)<SUB>4</SUB>> ]
"8" [label = <(PARAM,int a)<SUB>4</SUB>> ]
"9" [label = <(PARAM,int b)<SUB>4</SUB>> ]
"11" [label = <(RETURN,return a>b?a:b;,return a>b?a:b;)<SUB>5</SUB>> ]
"12" [label = <(<operator>.conditional,a>b?a:b)<SUB>5</SUB>> ]
"12" [label = <(<operator>.conditional,a>b?a:b)<SUB>5</SUB>> ]
"12" [label = <(<operator>.conditional,a>b?a:b)<SUB>5</SUB>> ]
"12" [label = <(<operator>.conditional,a>b?a:b)<SUB>5</SUB>> ]
"13" [label = <(<operator>.greaterThan,a>b)<SUB>5</SUB>> ]
"13" [label = <(<operator>.greaterThan,a>b)<SUB>5</SUB>> ]
"11" -> "18" [ label = "<RET>"]
"12" -> "18" [ label = "a>b"]
"12" -> "18" [ label = "b"]
"12" -> "18" [ label = "a"]
"12" -> "18" [ label = "a>b?a:b"]
"7" -> "8"
"7" -> "9"
"12" -> "11" [ label = "a>b?a:b"]
"13" -> "12" [ label = "a"]
"7" -> "12"
"13" -> "12" [ label = "b"]
"8" -> "13" [ label = "a"]
"7" -> "13"
"9" -> "13" [ label = "b"]
}
"""
)
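CDG (Control Dependence Graph)
Reference (I saved screenshots of it for my own convenience): "【程序分析】数据依赖、控制依赖、程序依赖图PDG、系统依赖图SDG" by AD_钙 on CSDN. In short, statement B is control-dependent on statement A when the outcome of A decides whether B executes.
Generating the CDG with Joern: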
/*
#include<iostream>
using namespace std;
int main(){
int a=0;
int b=1;
if(a==0) cout<<"a==0";
else cout<<"a!=0:"<<b;
return 0;
}
*/
joern> cpg.method("main").plotDotCdg
joern> cpg.method("main").dotCdg.l
res53: List[String] = List(
"""digraph "main" {
"18" [label = <(<operator>.equals,a==0)<SUB>7</SUB>> ]
"22" [label = <(<operator>.shiftLeft,cout<<"a==0")<SUB>7</SUB>> ]
"27" [label = <(<operator>.shiftLeft,cout<<"a!=0:"<<b)<SUB>8</SUB>> ]
"28" [label = <(<operator>.shiftLeft,cout<<"a!=0:")<SUB>8</SUB>> ]
"18" -> "22"
"18" -> "28"
"18" -> "27"
}
"""
)
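In the output above, every edge starts at the a==0 condition (node 18) and ends at a statement in one of the two branches. One common way to derive such edges is from post-dominators on the control-flow graph. The Python below is a simplified sketch of that approach, not Joern's implementation. The CFG is hand-written for the main example with made-up node labels, and it assumes every node except the exit has at least one successor and that every successor appears as a key. The usual extra entry-to-exit edge is omitted.

```python
def post_dominators(cfg, exit_node):
    """Iteratively compute, for every node, the set of nodes that post-dominate it."""
    nodes = set(cfg)
    pdom = {n: set(nodes) for n in nodes}
    pdom[exit_node] = {exit_node}
    changed = True
    while changed:
        changed = False
        for n in nodes - {exit_node}:
            new = {n} | set.intersection(*(pdom[s] for s in cfg[n]))
            if new != pdom[n]:
                pdom[n] = new
                changed = True
    return pdom


def control_dependences(cfg, exit_node):
    """Return (branch, dependent) pairs derived from CFG edges and post-dominators."""
    pdom = post_dominators(cfg, exit_node)
    edges = set()
    for a, successors in cfg.items():
        for s in successors:
            for b in pdom[s]:
                # b post-dominates successor s but does not strictly post-dominate a
                if b == a or b not in pdom[a]:
                    edges.add((a, b))
    return edges


# Hand-written CFG of the main() example; the labels are illustrative only.
cfg = {
    "entry": ["a=0"],
    "a=0": ["b=1"],
    "b=1": ["a==0"],
    "a==0": ["print_then", "print_else"],
    "print_then": ["return"],
    "print_else": ["return"],
    "return": ["exit"],
    "exit": [],
}
print(sorted(control_dependences(cfg, "exit")))
# [('a==0', 'print_else'), ('a==0', 'print_then')]
```

The return statement is not control-dependent on the condition because it runs on both paths, which is consistent with it being absent from the Joern output.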
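PDG (Program Dependence Graph)
Again based on the same article, with thanks to its author for sharing: "【程序分析】数据依赖、控制依赖、程序依赖图PDG、系统依赖图SDG" by AD_钙 on CSDN.
The Joern commands for generating all of these graphs follow the same pattern, so they are not repeated here. Only the result is shown; it contains both the DDG and the CDG edges.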
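The "contains both" relationship can be sketched as a plain union of the two edge sets, with each edge tagged by its origin. The Python below uses hand-written edges for a small hypothetical program modeled on the main example; they are not Joern output, and `build_pdg` is a name invented for this sketch.

```python
def build_pdg(ddg_edges, cdg_edges):
    """Merge both edge sets into one sorted list of (source, target, kind, label) tuples."""
    pdg = [(src, dst, "DDG", var) for src, dst, var in ddg_edges]
    pdg += [(src, dst, "CDG", "") for src, dst in cdg_edges]
    return sorted(pdg)


# Hypothetical program:
#   0: a = 0    1: b = 1    2: if (a == 0)    3: print "a==0"    4: else print b
ddg_edges = [(0, 2, "a"), (1, 4, "b")]  # data dependences, labeled by variable
cdg_edges = [(2, 3), (2, 4)]            # control dependences on the condition

pdg = build_pdg(ddg_edges, cdg_edges)
for edge in pdg:
    print(edge)

# Statement 4 has one incoming edge of each kind.
print([kind for _, dst, kind, _ in pdg if dst == 4])
```

Statement 4 depends on statement 1 for the value of b and on statement 2 for whether it runs at all, which is the combined information a PDG is meant to carry.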