Program Analysis: Learning Clang, Part 1

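I have been working with program analysis for a while and still only half understand it. My advisor suggested studying clang-static-analyzer (CSA), starting with writing custom checkers, so this is the start of my journey into Clang. If you spot mistakes in this blog, corrections are very welcome.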

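1. About LLVM/Clang/CSA

CSA is part of Clang, and Clang is part of LLVM, so learning CSA inevitably means learning Clang and LLVM.

I won't introduce LLVM/Clang in depth here; I'll point to a blog post instead.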

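LLVM means different things in different contexts (this list is not exhaustive):

  • LLVM can refer to the LLVM infrastructure: a complete collection of compiler projects, including front ends, back ends, optimizers, assemblers, a JIT engine, and so on.

  • LLVM can also refer to a compiler built on LLVM, that is, one built partly or entirely from LLVM components.

  • The LLVM back end, which covers code optimization and target code generation and, together with Clang, forms a complete compiler.

  • The LLVM project.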

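Clang lets you hook into the compilation process and obtain detailed information about the data structures produced at each stage, including the AST and the CFG. One application of Clang tools is automatically finding defects in programs, with far more warnings than the compiler provides. For example, clang-tidy inspects the syntax used in a program to find style problems and unsafe or potentially non-portable constructs.

CSA is a symbolic-execution tool for finding program defects. The behavior of a real program depends on external factors such as input values, random numbers, and the behavior of library components. The analyzer engine represents unknown values as symbolic values and performs symbolic computation on them. It can work out the conditions on those symbolic values that lead to a program error.

Therefore, the Clang Static Analyzer can find deep bugs that occur only on rare program paths, which manual testing or automated test suites may miss. When it finds a bug, the analyzer draws the entire path leading to it and shows the direction taken at each conditional statement.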
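To make the idea of a bug on a rare path concrete, here is a minimal sketch in C. The names lookup and rare_path and the magic value 4242 are made up for illustration; they do not come from the Clang sources. I would expect a path-sensitive checker such as core.NullDereference to flag the dereference, though whether it does depends on the analyzer version and its inlining decisions.

```c
#include <stddef.h>

/* Hypothetical helper: returns NULL only for one rare input value. */
static int *lookup(int key, int *slot) {
    return (key == 4242) ? NULL : slot;
}

/* Hypothetical function with a bug that ordinary inputs never trigger. */
int rare_path(int key) {
    int slot = 7;
    int *p = lookup(key, &slot);
    if (key > 0)        /* the analyzer explores both key > 0 and key <= 0 */
        return *p;      /* null dereference only when key == 4242 */
    return 0;
}
```

Ordinary inputs such as rare_path(1) return 7 and never hit the bug, which is why a test suite can easily miss it. The analyzer instead treats key as a symbol and can consider the path where key == 4242.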

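2. Using Clang

Here is the LLVM GitHub repository. I use LLVM 12.0.0 and installed it by downloading and unpacking a tarball from the GitHub release page (I downloaded clang+llvm-12.0.0-x86_64-linux-gnu-ubuntu-16.04.tar.xz; building from source is too much trouble, and the tarball is far simpler). After installing and setting the environment variables, clang --version shows the version. The blog post mentioned earlier already has Clang usage examples, so I skip them here. Part of my LLVM directory looks like this (for reference):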

llvm
  -- bin # ELF executables
    -- clang
    -- clang++
    -- clang-tidy
    -- clang-format
    -- clang-scan-deps
    ...
    
  -- include # all directories
    -- c++
    -- clang
    -- clang-c
    -- clang-tidy
    -- llvm
    ...
    
  -- lib
    ...
  
  -- libexec # Perl scripts
    -- c++-analyzer
    -- ccc-analyzer
  
  -- share # all directories
    -- clang
    -- man
    -- opt-viewer
    -- scan-build
    -- scan-view

Here is a simple walkthrough of using Clang, following the blog post mentioned earlier. My example code, main.c, is:

#include <stdio.h>

int main(void){
    int i, n;
    scanf("%d",&n);
    if (n > 10)
        printf("hello world\n");
    else
        printf("no\n");
    for (i = 0; i < 10; ++i){
        n = i + 1;
        n += 2;
    }
    return 0;
}

2.1. Dumping the token sequence

The command clang -E -Xclang -dump-tokens main.c runs lexical analysis. The output (only the part belonging to main.c is shown) is:

int 'int'	 [StartOfLine]	Loc=<main.c:3:1>
identifier 'main'	 [LeadingSpace]	Loc=<main.c:3:5>
l_paren '('		Loc=<main.c:3:9>
void 'void'		Loc=<main.c:3:10>
r_paren ')'		Loc=<main.c:3:14>
l_brace '{'		Loc=<main.c:3:15>
int 'int'	 [StartOfLine] [LeadingSpace]	Loc=<main.c:4:5>
identifier 'i'	 [LeadingSpace]	Loc=<main.c:4:9>
comma ','		Loc=<main.c:4:10>
identifier 'n'	 [LeadingSpace]	Loc=<main.c:4:12>
semi ';'		Loc=<main.c:4:13>
identifier 'scanf'	 [StartOfLine] [LeadingSpace]	Loc=<main.c:5:5>
l_paren '('		Loc=<main.c:5:10>
string_literal '"%d"'		Loc=<main.c:5:11>
comma ','		Loc=<main.c:5:15>
amp '&'		Loc=<main.c:5:16>
identifier 'n'		Loc=<main.c:5:17>
r_paren ')'		Loc=<main.c:5:18>
semi ';'		Loc=<main.c:5:19>
if 'if'	 [StartOfLine] [LeadingSpace]	Loc=<main.c:6:5>
l_paren '('	 [LeadingSpace]	Loc=<main.c:6:8>
identifier 'n'		Loc=<main.c:6:9>
greater '>'	 [LeadingSpace]	Loc=<main.c:6:11>
numeric_constant '10'	 [LeadingSpace]	Loc=<main.c:6:13>
r_paren ')'		Loc=<main.c:6:15>
identifier 'printf'	 [StartOfLine] [LeadingSpace]	Loc=<main.c:7:9>
l_paren '('		Loc=<main.c:7:15>
string_literal '"hello world\n"'		Loc=<main.c:7:16>
r_paren ')'		Loc=<main.c:7:31>
semi ';'		Loc=<main.c:7:32>
else 'else'	 [StartOfLine] [LeadingSpace]	Loc=<main.c:8:5>
identifier 'printf'	 [StartOfLine] [LeadingSpace]	Loc=<main.c:9:9>
l_paren '('		Loc=<main.c:9:15>
string_literal '"no\n"'		Loc=<main.c:9:16>
r_paren ')'		Loc=<main.c:9:22>
semi ';'		Loc=<main.c:9:23>
for 'for'	 [StartOfLine] [LeadingSpace]	Loc=<main.c:10:5>
l_paren '('	 [LeadingSpace]	Loc=<main.c:10:9>
identifier 'i'		Loc=<main.c:10:10>
equal '='	 [LeadingSpace]	Loc=<main.c:10:12>
numeric_constant '0'	 [LeadingSpace]	Loc=<main.c:10:14>
semi ';'		Loc=<main.c:10:15>
identifier 'i'	 [LeadingSpace]	Loc=<main.c:10:17>
less '<'	 [LeadingSpace]	Loc=<main.c:10:19>
numeric_constant '10'	 [LeadingSpace]	Loc=<main.c:10:21>
semi ';'		Loc=<main.c:10:23>
plusplus '++'	 [LeadingSpace]	Loc=<main.c:10:25>
identifier 'i'		Loc=<main.c:10:27>
r_paren ')'		Loc=<main.c:10:28>
l_brace '{'		Loc=<main.c:10:29>
identifier 'n'	 [StartOfLine] [LeadingSpace]	Loc=<main.c:11:9>
equal '='	 [LeadingSpace]	Loc=<main.c:11:11>
identifier 'i'	 [LeadingSpace]	Loc=<main.c:11:13>
plus '+'	 [LeadingSpace]	Loc=<main.c:11:15>
numeric_constant '1'	 [LeadingSpace]	Loc=<main.c:11:17>
semi ';'		Loc=<main.c:11:18>
identifier 'n'	 [StartOfLine] [LeadingSpace]	Loc=<main.c:12:9>
plusequal '+='	 [LeadingSpace]	Loc=<main.c:12:11>
numeric_constant '2'	 [LeadingSpace]	Loc=<main.c:12:14>
semi ';'		Loc=<main.c:12:15>
r_brace '}'	 [StartOfLine] [LeadingSpace]	Loc=<main.c:13:5>
return 'return'	 [StartOfLine] [LeadingSpace]	Loc=<main.c:14:5>
numeric_constant '0'	 [LeadingSpace]	Loc=<main.c:14:12>
semi ';'		Loc=<main.c:14:13>
r_brace '}'	 [StartOfLine]	Loc=<main.c:15:1>
eof ''		Loc=<main.c:15:2>

2.2. Dumping the AST

The command clang -fsyntax-only -Xclang -ast-dump main.c shows the AST. The output is:

-FunctionDecl 0x7d0ffd8 <main.c:3:1, line:15:1> line:3:5 main 'int (void)'
  `-CompoundStmt 0x7d10890 <col:15, line:15:1>
    |-DeclStmt 0x7d10190 <line:4:5, col:13>
    | |-VarDecl 0x7d10090 <col:5, col:9> col:9 used i 'int'
    | `-VarDecl 0x7d10110 <col:5, col:12> col:12 used n 'int'
    |-CallExpr 0x7d102f0 <line:5:5, col:18> 'int'
    | |-ImplicitCastExpr 0x7d102d8 <col:5> 'int (*)(const char *restrict, ...)' <FunctionToPointerDecay>
    | | `-DeclRefExpr 0x7d101a8 <col:5> 'int (const char *restrict, ...)' Function 0x7d045b8 'scanf' 'int (const char *restrict, ...)'
    | |-ImplicitCastExpr 0x7d10338 <col:11> 'const char *' <NoOp>
    | | `-ImplicitCastExpr 0x7d10320 <col:11> 'char *' <ArrayToPointerDecay>
    | |   `-StringLiteral 0x7d10208 <col:11> 'char [3]' lvalue "%d"
    | `-UnaryOperator 0x7d10248 <col:16, col:17> 'int *' prefix '&' cannot overflow
    |   `-DeclRefExpr 0x7d10228 <col:17> 'int' lvalue Var 0x7d10110 'n' 'int'
    |-IfStmt 0x7d105a0 <line:6:5, line:9:22> has_else
    | |-BinaryOperator 0x7d103a8 <line:6:9, col:13> 'int' '>'
    | | |-ImplicitCastExpr 0x7d10390 <col:9> 'int' <LValueToRValue>
    | | | `-DeclRefExpr 0x7d10350 <col:9> 'int' lvalue Var 0x7d10110 'n' 'int'
    | | `-IntegerLiteral 0x7d10370 <col:13> 'int' 10
    | |-CallExpr 0x7d10480 <line:7:9, col:31> 'int'
    | | |-ImplicitCastExpr 0x7d10468 <col:9> 'int (*)(const char *, ...)' <FunctionToPointerDecay>
    | | | `-DeclRefExpr 0x7d103c8 <col:9> 'int (const char *, ...)' Function 0x7cff6f0 'printf' 'int (const char *, ...)'
    | | `-ImplicitCastExpr 0x7d104c0 <col:16> 'const char *' <NoOp>
    | |   `-ImplicitCastExpr 0x7d104a8 <col:16> 'char *' <ArrayToPointerDecay>
    | |     `-StringLiteral 0x7d10428 <col:16> 'char [13]' lvalue "hello world\n"
    | `-CallExpr 0x7d10548 <line:9:9, col:22> 'int'
    |   |-ImplicitCastExpr 0x7d10530 <col:9> 'int (*)(const char *, ...)' <FunctionToPointerDecay>
    |   | `-DeclRefExpr 0x7d104d8 <col:9> 'int (const char *, ...)' Function 0x7cff6f0 'printf' 'int (const char *, ...)'
    |   `-ImplicitCastExpr 0x7d10588 <col:16> 'const char *' <NoOp>
    |     `-ImplicitCastExpr 0x7d10570 <col:16> 'char *' <ArrayToPointerDecay>
    |       `-StringLiteral 0x7d104f8 <col:16> 'char [4]' lvalue "no\n"
    |-ForStmt 0x7d10828 <line:10:5, line:13:5>
    | |-BinaryOperator 0x7d10610 <line:10:10, col:14> 'int' '='
    | | |-DeclRefExpr 0x7d105d0 <col:10> 'int' lvalue Var 0x7d10090 'i' 'int'
    | | `-IntegerLiteral 0x7d105f0 <col:14> 'int' 0
    | |-<<<NULL>>>
    | |-BinaryOperator 0x7d10688 <col:17, col:21> 'int' '<'
    | | |-ImplicitCastExpr 0x7d10670 <col:17> 'int' <LValueToRValue>
    | | | `-DeclRefExpr 0x7d10630 <col:17> 'int' lvalue Var 0x7d10090 'i' 'int'
    | | `-IntegerLiteral 0x7d10650 <col:21> 'int' 10
    | |-UnaryOperator 0x7d106c8 <col:25, col:27> 'int' prefix '++'
    | | `-DeclRefExpr 0x7d106a8 <col:27> 'int' lvalue Var 0x7d10090 'i' 'int'
    | `-CompoundStmt 0x7d10808 <col:29, line:13:5>
    |   |-BinaryOperator 0x7d10778 <line:11:9, col:17> 'int' '='
    |   | |-DeclRefExpr 0x7d106e0 <col:9> 'int' lvalue Var 0x7d10110 'n' 'int'
    |   | `-BinaryOperator 0x7d10758 <col:13, col:17> 'int' '+'
    |   |   |-ImplicitCastExpr 0x7d10740 <col:13> 'int' <LValueToRValue>
    |   |   | `-DeclRefExpr 0x7d10700 <col:13> 'int' lvalue Var 0x7d10090 'i' 'int'
    |   |   `-IntegerLiteral 0x7d10720 <col:17> 'int' 1
    |   `-CompoundAssignOperator 0x7d107d8 <line:12:9, col:14> 'int' '+=' ComputeLHSTy='int' ComputeResultTy='int'
    |     |-DeclRefExpr 0x7d10798 <col:9> 'int' lvalue Var 0x7d10110 'n' 'int'
    |     `-IntegerLiteral 0x7d107b8 <col:14> 'int' 2
    `-ReturnStmt 0x7d10880 <line:14:5, col:12>
      `-IntegerLiteral 0x7d10860 <col:12> 'int' 0

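2.3. Dumping the CFG

In older LLVM versions, clang -cc1 -analyze -cfg-dump main.c could show the control-flow graph of main.c, but that no longer seems to work in this version. Instead, use clang -cc1 -analyze -analyzer-checker=debug.DumpCFG main.c to invoke CSA's DumpCFG checker, which prints the CFG as text (debug.ViewCFG is the variant that renders it graphically).

However, my test code includes stdio.h, and clang -cc1 runs the front end directly without the system include paths that the driver normally adds, so it fails with fatal error: 'stdio.h' file not found. The -I option is needed to add the include directories; on my machine stdio.h is in /usr/include. stddef.h is also needed, and it lives in $LLVM_DIR/lib/clang/12.0.0/include, where $LLVM_DIR is the LLVM installation directory.

The final command is: clang -cc1 -I /usr/include -I $LLVM_DIR/lib/clang/12.0.0/include -analyze -analyzer-checker=debug.DumpCFG main.c

The CFG printed on the command line is: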

int main()
 [B9 (ENTRY)]
   Succs (1): B8

 [B1]
   1: 0
   2: return [B1.1];
   Preds (1): B4
   Succs (1): B0

 [B2]
   1: i
   2: ++[B2.1]
   Preds (1): B3
   Succs (1): B4

 [B3]
   1: i
   2: [B3.1] (ImplicitCastExpr, LValueToRValue, int)
   3: 1
   4: [B3.2] + [B3.3]
   5: n
   6: [B3.5] = [B3.4]
   7: n
   8: 2
   9: [B3.7] += [B3.8]
   Preds (1): B4
   Succs (1): B2

 [B4]
   1: i
   2: [B4.1] (ImplicitCastExpr, LValueToRValue, int)
   3: 10
   4: [B4.2] < [B4.3]
   T: for (...; [B4.4]; ...)
   Preds (2): B2 B5
   Succs (2): B3 B1

 [B5]
   1: 0
   2: i
   3: [B5.2] = [B5.1]
   Preds (2): B6 B7
   Succs (1): B4

 [B6]
   1: printf
   2: [B6.1] (ImplicitCastExpr, FunctionToPointerDecay, int (*)(const char *, ...))
   3: "no\n"
   4: [B6.3] (ImplicitCastExpr, ArrayToPointerDecay, char *)
   5: [B6.4] (ImplicitCastExpr, NoOp, const char *)
   6: [B6.2]([B6.5])
   Preds (1): B8
   Succs (1): B5

 [B7]
   1: printf
   2: [B7.1] (ImplicitCastExpr, FunctionToPointerDecay, int (*)(const char *, ...))
   3: "hello world\n"
   4: [B7.3] (ImplicitCastExpr, ArrayToPointerDecay, char *)
   5: [B7.4] (ImplicitCastExpr, NoOp, const char *)
   6: [B7.2]([B7.5])
   Preds (1): B8
   Succs (1): B5

 [B8]
   1: int i;
   2: int n;
   3: __isoc99_scanf
   4: [B8.3] (ImplicitCastExpr, FunctionToPointerDecay, int (*)(const char *, ...))
   5: "%d"
   6: [B8.5] (ImplicitCastExpr, ArrayToPointerDecay, char *)
   7: [B8.6] (ImplicitCastExpr, NoOp, const char *)
   8: n
   9: &[B8.8]
  10: [B8.4]([B8.7], [B8.9])
  11: n
  12: [B8.11] (ImplicitCastExpr, LValueToRValue, int)
  13: 10
  14: [B8.12] > [B8.13]
   T: if [B8.14]
   Preds (1): B9
   Succs (2): B7 B6

 [B0 (EXIT)]
   Preds (1): B1

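The CFG block numbers run in reverse order (the entry block has the highest number and the exit block is B0). Drawn as a flowchart:

[Figure: flowchart of the CFG above]

Here Clang parsed int i, n; into two statements, int i; and int n;, and I translated the CFG nodes back by hand. How to parse the CFG text that Clang prints is something I still need to look into.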
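As a starting point for parsing that text, here is a minimal sketch in C that recognizes just two kinds of lines in the dump above: block headers such as [B4] and successor lists such as Succs (2): B3 B1. The function names parse_block_header and parse_succs are my own, and the line shapes are inferred only from the output shown here, so other Clang versions or other CFGs may need more cases.

```c
#include <stdio.h>

/* Returns the block number if line is a block header such as
 * " [B4]" or " [B9 (ENTRY)]"; returns -1 for any other line. */
int parse_block_header(const char *line) {
    int id;
    while (*line == ' ') ++line;            /* skip indentation */
    if (sscanf(line, "[B%d", &id) == 1)
        return id;
    return -1;
}

/* Parses a line such as "   Succs (2): B3 B1" and stores the successor
 * block numbers in out[] (at most max of them). Returns how many were
 * stored, or -1 if the line is not a Succs line. Parsing stops at the
 * first token that is not of the form B<number>. */
int parse_succs(const char *line, int *out, int max) {
    int declared = 0, consumed = -1, id = 0, n = 0;
    while (*line == ' ') ++line;            /* skip indentation */
    if (sscanf(line, "Succs (%d):%n", &declared, &consumed) != 1 || consumed < 0)
        return -1;
    line += consumed;
    while (n < max && sscanf(line, " B%d%n", &id, &consumed) == 1) {
        out[n++] = id;
        line += consumed;
    }
    return n;
}
```

Feeding each line of the dump through these two functions is enough to rebuild an edge list, for example B4 -> B3 and B4 -> B1 for the loop condition block.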

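2.4. Generating IR

LLVM IR uses static single assignment (SSA) form. That does not mean the front end has to construct SSA for source variables itself: LLVM uses a "trick", specific to LLVM, that separates SSA construction from the Clang front end. Clang emits each local variable as a stack slot created by alloca and accessed through load and store, and LLVM's mem2reg pass can later promote those slots to SSA registers. The generated IR has these properties:

  • Instructions are organized in three-address form

  • An unlimited number of registers is assumed to be available

LLVM IR has three representations (equivalent in essence and convertible to one another):

  • text: a human-readable textual format similar to assembly language, with extension .ll, obtained with clang -S -emit-llvm main.c; -S means "Only run preprocess and compilation steps".
  • memory: the in-memory form, IR represented by classes such as Function and Instruction.
  • bitcode: a binary format with extension .bc, obtained with clang -c -emit-llvm main.c; -c means "Only run preprocess, compile, and assemble steps". It is suited to fast loading by a JIT compiler.

Here I dump the text format: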

; ModuleID = 'main.c'
source_filename = "main.c"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

@.str = private unnamed_addr constant [3 x i8] c"%d\00", align 1
@.str.1 = private unnamed_addr constant [13 x i8] c"hello world\0A\00", align 1
@.str.2 = private unnamed_addr constant [4 x i8] c"no\0A\00", align 1

; Function Attrs: noinline nounwind optnone uwtable
define dso_local i32 @main() #0 {
  %1 = alloca i32, align 4
  %2 = alloca i32, align 4
  %3 = alloca i32, align 4
  store i32 0, i32* %1, align 4
  %4 = call i32 (i8*, ...) @__isoc99_scanf(i8* getelementptr inbounds ([3 x i8], [3 x i8]* @.str, i64 0, i64 0), i32* %3)
  %5 = load i32, i32* %3, align 4
  %6 = icmp sgt i32 %5, 10
  br i1 %6, label %7, label %9

7:                                                ; preds = %0
  %8 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([13 x i8], [13 x i8]* @.str.1, i64 0, i64 0))
  br label %11

9:                                                ; preds = %0
  %10 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([4 x i8], [4 x i8]* @.str.2, i64 0, i64 0))
  br label %11

11:                                               ; preds = %9, %7
  store i32 0, i32* %2, align 4
  br label %12

12:                                               ; preds = %20, %11
  %13 = load i32, i32* %2, align 4
  %14 = icmp slt i32 %13, 10
  br i1 %14, label %15, label %23

15:                                               ; preds = %12
  %16 = load i32, i32* %2, align 4
  %17 = add nsw i32 %16, 1
  store i32 %17, i32* %3, align 4
  %18 = load i32, i32* %3, align 4
  %19 = add nsw i32 %18, 2
  store i32 %19, i32* %3, align 4
  br label %20

20:                                               ; preds = %15
  %21 = load i32, i32* %2, align 4
  %22 = add nsw i32 %21, 1
  store i32 %22, i32* %2, align 4
  br label %12, !llvm.loop !2

23:                                               ; preds = %12
  ret i32 0
}

declare dso_local i32 @__isoc99_scanf(i8*, ...) #1

declare dso_local i32 @printf(i8*, ...) #1

attributes #0 = { noinline nounwind optnone uwtable "disable-tail-calls"="false" "frame-pointer"="all" "less-precise-fpmad"="false" "min-legal-vector-width"="0" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" "unsafe-fp-math"="false" "use-soft-float"="false" }
attributes #1 = { "disable-tail-calls"="false" "frame-pointer"="all" "less-precise-fpmad"="false" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "tune-cpu"="generic" "unsafe-fp-math"="false" "use-soft-float"="false" }

!llvm.module.flags = !{!0}
!llvm.ident = !{!1}

!0 = !{i32 1, !"wchar_size", i32 4}
!1 = !{!"clang version 12.0.0"}
!2 = distinct !{!2, !3}
!3 = !{!"llvm.loop.mustprogress"}

I'll add an explanation of the IR in a later post.

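3. Using CSA

3.1. Usage example

CSA is part of Clang, so once LLVM and Clang are installed, CSA is already in the installation directory along with a number of built-in checkers. The command clang -cc1 -analyzer-checker-help lists them. The official documentation also provides a checker list, which you can compare against.

In the LLVM project, clang/include/clang/StaticAnalyzer/Checkers contains a Checkers.td file that holds the checker descriptions.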

OVERVIEW: Clang Static Analyzer Checkers List

USAGE: -analyzer-checker <CHECKER or PACKAGE,...>

CHECKERS:
  core.CallAndMessage           Check for logical errors for function calls and Objective-C message expressions (e.g., uninitialized arguments, null function pointers)
  core.DivideZero               Check for division by zero
  core.NonNullParamChecker      Check for null pointers passed as arguments to a function whose arguments are references or marked with the 'nonnull' attribute
  core.NullDereference          Check for dereferences of null pointers
  core.StackAddressEscape       Check that addresses to stack memory do not escape the function
  core.UndefinedBinaryOperatorResult
                                Check for undefined results of binary operators
  core.VLASize                  Check for declarations of VLA of undefined or zero size
  core.uninitialized.ArraySubscript
                                Check for uninitialized values used as array subscripts
  core.uninitialized.Assign     Check for assigning uninitialized values
  core.uninitialized.Branch     Check for uninitialized values used as branch conditions
  core.uninitialized.CapturedBlockVariable
                                Check for blocks that capture uninitialized values
  core.uninitialized.UndefReturn Check for uninitialized values being returned to the caller
  cplusplus.InnerPointer        Check for inner pointers of C++ containers used after re/deallocation
  cplusplus.Move                Find use-after-move bugs in C++
  cplusplus.NewDelete           Check for double-free and use-after-free problems. Traces memory managed by new/delete.
  cplusplus.NewDeleteLeaks      Check for memory leaks. Traces memory managed by new/delete.
  cplusplus.PlacementNew        Check if default placement new is provided with pointers to sufficient storage capacity
  cplusplus.PureVirtualCall     Check pure virtual function calls during construction/destruction
  deadcode.DeadStores           Check for values stored to variables that are never read afterwards
  fuchsia.HandleChecker         A Checker that detect leaks related to Fuchsia handles
  nullability.NullPassedToNonnull
                                Warns when a null pointer is passed to a pointer which has a _Nonnull type.
  nullability.NullReturnedFromNonnull
                                Warns when a null pointer is returned from a function that has _Nonnull return type.
  nullability.NullableDereferenced
                                Warns when a nullable pointer is dereferenced.
  nullability.NullablePassedToNonnull
                                Warns when a nullable pointer is passed to a pointer which has a _Nonnull type.
  nullability.NullableReturnedFromNonnull
                                Warns when a nullable pointer is returned from a function that has _Nonnull return type.
  optin.cplusplus.UninitializedObject
                                Reports uninitialized fields after object construction
  optin.cplusplus.VirtualCall   Check virtual function calls during construction/destruction
  optin.mpi.MPI-Checker         Checks MPI code
  optin.osx.OSObjectCStyleCast  Checker for C-style casts of OSObjects
  optin.osx.cocoa.localizability.EmptyLocalizationContextChecker
                                Check that NSLocalizedString macros include a comment for context
  optin.osx.cocoa.localizability.NonLocalizedStringChecker
                                Warns about uses of non-localized NSStrings passed to UI methods expecting localized NSStrings
  optin.performance.GCDAntipattern
                                Check for performance anti-patterns when using Grand Central Dispatch
  optin.performance.Padding     Check for excessively padded structs.
  optin.portability.UnixAPI     Finds implementation-defined behavior in UNIX/Posix functions
  osx.API                       Check for proper uses of various Apple APIs
  osx.MIG                       Find violations of the Mach Interface Generator calling convention
  osx.NumberObjectConversion    Check for erroneous conversions of objects representing numbers into numbers
  osx.OSObjectRetainCount       Check for leaks and improper reference count management for OSObject
  osx.ObjCProperty              Check for proper uses of Objective-C properties
  osx.SecKeychainAPI            Check for proper uses of Secure Keychain APIs
  osx.cocoa.AtSync              Check for nil pointers used as mutexes for @synchronized
  osx.cocoa.AutoreleaseWrite    Warn about potentially crashing writes to autoreleasing objects from different autoreleasing pools in Objective-C
  osx.cocoa.ClassRelease        Check for sending 'retain', 'release', or 'autorelease' directly to a Class
  osx.cocoa.Dealloc             Warn about Objective-C classes that lack a correct implementation of -dealloc
  osx.cocoa.IncompatibleMethodTypes
                                Warn about Objective-C method signatures with type incompatibilities
  osx.cocoa.Loops               Improved modeling of loops using Cocoa collection types
  osx.cocoa.MissingSuperCall    Warn about Objective-C methods that lack a necessary call to super
  osx.cocoa.NSAutoreleasePool   Warn for suboptimal uses of NSAutoreleasePool in Objective-C GC mode
  osx.cocoa.NSError             Check usage of NSError** parameters
  osx.cocoa.NilArg              Check for prohibited nil arguments to ObjC method calls
  osx.cocoa.NonNilReturnValue   Model the APIs that are guaranteed to return a non-nil value
  osx.cocoa.ObjCGenerics        Check for type errors when using Objective-C generics
  osx.cocoa.RetainCount         Check for leaks and improper reference count management
  osx.cocoa.RunLoopAutoreleaseLeak
                                Check for leaked memory in autorelease pools that will never be drained
  osx.cocoa.SelfInit            Check that 'self' is properly initialized inside an initializer method
  osx.cocoa.SuperDealloc        Warn about improper use of '[super dealloc]' in Objective-C
  osx.cocoa.UnusedIvars         Warn about private ivars that are never used
  osx.cocoa.VariadicMethodTypes Check for passing non-Objective-C types to variadic collection initialization methods that expect only Objective-C types
  osx.coreFoundation.CFError    Check usage of CFErrorRef* parameters
  osx.coreFoundation.CFNumber   Check for proper uses of CFNumber APIs
  osx.coreFoundation.CFRetainRelease
                                Check for null arguments to CFRetain/CFRelease/CFMakeCollectable
  osx.coreFoundation.containers.OutOfBounds
                                Checks for index out-of-bounds when using 'CFArray' API
  osx.coreFoundation.containers.PointerSizedValues
                                Warns if 'CFArray', 'CFDictionary', 'CFSet' are created with non-pointer-size values
  security.FloatLoopCounter     Warn on using a floating point value as a loop counter (CERT: FLP30-C, FLP30-CPP)
  security.insecureAPI.DeprecatedOrUnsafeBufferHandling
                                Warn on uses of unsecure or deprecated buffer manipulating functions
  security.insecureAPI.UncheckedReturn
                                Warn on uses of functions whose return values must be always checked
  security.insecureAPI.bcmp     Warn on uses of the 'bcmp' function
  security.insecureAPI.bcopy    Warn on uses of the 'bcopy' function
  security.insecureAPI.bzero    Warn on uses of the 'bzero' function
  security.insecureAPI.decodeValueOfObjCType
                                Warn on uses of the '-decodeValueOfObjCType:at:' method
  security.insecureAPI.getpw    Warn on uses of the 'getpw' function
  security.insecureAPI.gets     Warn on uses of the 'gets' function
  security.insecureAPI.mkstemp  Warn when 'mkstemp' is passed fewer than 6 X's in the format string
  security.insecureAPI.mktemp   Warn on uses of the 'mktemp' function
  security.insecureAPI.rand     Warn on uses of the 'rand', 'random', and related functions
  security.insecureAPI.strcpy   Warn on uses of the 'strcpy' and 'strcat' functions
  security.insecureAPI.vfork    Warn on uses of the 'vfork' function
  unix.API                      Check calls to various UNIX/Posix functions
  unix.Malloc                   Check for memory leaks, double free, and use-after-free problems. Traces memory managed by malloc()/free().
  unix.MallocSizeof             Check for dubious malloc arguments involving sizeof
  unix.MismatchedDeallocator    Check for mismatched deallocators.
  unix.Vfork                    Check for proper usage of vfork
  unix.cstring.BadSizeArg       Check the size argument passed into C string functions for common erroneous patterns
  unix.cstring.NullArg          Check for null pointers being passed as arguments to C string functions
  valist.CopyToSelf             Check for va_lists which are copied onto itself.
  valist.Uninitialized          Check for usages of uninitialized (or already released) va_lists.
  valist.Unterminated           Check for va_lists which are not released by a va_end call.
  webkit.NoUncountedMemberChecker
                                Check for no uncounted member variables.
  webkit.RefCntblBaseVirtualDtor Check for any ref-countable base class having virtual destructor.
  webkit.UncountedLambdaCapturesChecker
                                Check uncounted lambda captures.

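The source code of these checkers is under lib/StaticAnalyzer. The CFG dump earlier used the debug.DumpCFG checker.

To check a single source file (a small single-file program), run the command clang --analyze -Xanalyzer -analyzer-checker=

Here I briefly try the DivideZero checker (core.DivideZero). Its help text says "Check for division by zero", and its source is probably DivZeroChecker.cpp (my own guess, which may be wrong). The official test case: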

int fooPR10616 (int qX ) {
  int a, c, d;
  d = (qX-1);
  while ( d != 0 ) {
    d = c - (c/d) * d;
  }
  return (a % (qX-1)); // expected-warning {{Division by zero}}
}

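According to the comment, CSA should report a warning on (a % (qX-1)). The only divisions here are c / d and a % (qX - 1). The former is guarded by the while condition d != 0, so it escapes. The latter is not: when the loop is skipped entirely, d is still equal to qX - 1 and d == 0, so the divisor is zero on that path. I ran clang -cc1 -analyze -analyzer-checker=core.DivideZero div-zero.c, and CSA reported: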

div-zero.c:9:13: warning: Division by zero [core.DivideZero]
  return (a % (qX-1)); // expected-warning {{Division by zero}}
          ~~^~~~~~~~
1 warning generated.
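The path reasoning behind this kind of warning can be seen in a smaller, made-up function (scale is a hypothetical name, not from the Clang test suite). Checking parts == 0 tells the analyzer that zero is a possible value, and because the branch does not return, the division is still reached on that path. I would expect core.DivideZero to warn on the division in the same way as the official test case; treat that as an assumption rather than recorded output.

```c
/* Hypothetical example: divides total into parts equal shares. */
int scale(int total, int parts) {
    if (parts == 0) {
        /* Bug: the zero case is detected but not handled (no early return). */
    }
    /* On the path where parts == 0, this divides by zero. */
    return total / parts;
}
```

Adding return 0; inside the if removes the path on which parts is zero at the division.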

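3.2. Reference documents for developing custom checkers

Documents for reference (some are based on versions around 4.0.0, while I use 12.0.0; some code has changed since, and so have the corresponding libraries):

First, everything related to CSA lives in the LLVM project directory (12.0.0 in my case).

Here I show the MainCallChecker example from clang-analyzer-guide. The goal of this checker is to find code that violates the following rule:

  • The function main shall not be used within a program: main must not be called recursively (a program always contains main, but nothing inside main should call main again). This looks easy, as if it were enough to find call statements and match the callee name against main (AST matching). In practice that overlooks function pointers.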
typedef int (*main_t)(int, char**);
int main (int argc , char** argv) {
	main_t foo = main ;
	int exit_code = foo(argc, argv); // actually calls main ()!
	return exit_code ;
}

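In the code above, main_t is a user-defined function pointer type for functions that return int and take an int as the first parameter and a char** as the second. The program defines a function pointer variable foo that points to main and calls main through foo. main is therefore called and a warning should be raised, but simple AST matching (a syntax-based check) cannot catch it.

I won't paste the checker code here; the official demo provides it. A checker can be run in three ways.

  • Static: link the checker in statically. This requires modifying Checkers.td and rebuilding Clang, which is cumbersome, but after the rebuild the custom checker is built in.

  • Dynamic: compile the custom checker as a separate module into a shared library (.so) and load it with clang -cc1 -load Checker.so (where Checker.so is the custom checker). This is the approach I chose.

  • Standalone: write an independent program that does the checking (the libtooling approach). The checker is not integrated back into Clang but runs as its own program.

Here is the official MainCallChecker demo.

There is also another custom Clang plugin project, which covers both the standalone and the dynamically loaded versions of the plugin.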
