1.Introduction
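LLVM has a simple built-in code coverage instrumentation (SanitizerCoverage). It inserts calls to user-defined functions at the function, basic-block, and edge levels. Default implementations of those callbacks are provided and implement simple coverage reporting and visualization. However, if you only need coverage visualization, you may want to use SourceBasedCodeCoverage instead.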
2.Tracing PCs with guards
With -fsanitize-coverage=trace-pc-guard the compiler will insert the following code on every edge:
__sanitizer_cov_trace_pc_guard(&guard_variable)
Every edge will have its own guard_variable (uint32_t).
The compiler will also insert calls to a module constructor:
// The guards are [start, stop).
// This function will be called at least once per DSO and may be called
// more than once with the same values of start/stop.
__sanitizer_cov_trace_pc_guard_init(uint32_t *start, uint32_t *stop);
With an additional ...=trace-pc,indirect-calls flag, __sanitizer_cov_trace_pc_indirect(void *callee) will be inserted on every indirect call.
The functions __sanitizer_cov_trace_pc_* should be defined by the user.
Example:
// trace-pc-guard-cb.cc
#include <stdint.h>
#include <stdio.h>
#include <sanitizer/coverage_interface.h>
// This callback is inserted by the compiler as a module constructor
// into every DSO. 'start' and 'stop' correspond to the
// beginning and end of the section with the guards for the entire
// binary (executable or DSO). The callback will be called at least
// once per DSO and may be called multiple times with the same parameters.
extern "C" void __sanitizer_cov_trace_pc_guard_init(uint32_t *start,
                                                    uint32_t *stop) {
  static uint64_t N;  // Counter for the guards.
  if (start == stop || *start) return;  // Initialize only once.
  printf("INIT: %p %p\n", start, stop);
  for (uint32_t *x = start; x < stop; x++)
    *x = ++N;  // Guards should start from 1.
}
// This callback is inserted by the compiler on every edge in the
// control flow (some optimizations apply).
// Typically, the compiler will emit the code like this:
//    if(*guard)
//      __sanitizer_cov_trace_pc_guard(guard);
// But for large functions it will emit a simple call:
//    __sanitizer_cov_trace_pc_guard(guard);
extern "C" void __sanitizer_cov_trace_pc_guard(uint32_t *guard) {
  if (!*guard) return;  // Duplicate the guard check.
  // If you set *guard to 0 this code will not be called again for this edge.
  // Now you can get the PC and do whatever you want:
  //   store it somewhere or symbolize it and print right away.
  // The values of `*guard` are as you set them in
  // __sanitizer_cov_trace_pc_guard_init and so you can make them consecutive
  // and use them to dereference an array or a bit vector.
  void *PC = __builtin_return_address(0);
  char PcDescr[1024];
  // This function is a part of the sanitizer run-time.
  // To use it, link with AddressSanitizer or other sanitizer.
  __sanitizer_symbolize_pc(PC, "%p %F %L", PcDescr, sizeof(PcDescr));
  printf("guard: %p %x PC %s\n", guard, *guard, PcDescr);
}
// trace-pc-guard-example.cc
int sub() {
int d=9-5;
return d;}
int foo() {
int c=sub()+5;
return c;}
int main() {
int f=foo();
return 0;
}
clang++ -g -fsanitize-coverage=trace-pc-guard trace-pc-guard-example.cc -c
clang++ trace-pc-guard-cb.cc trace-pc-guard-example.o -fsanitize=address
ASAN_OPTIONS=strip_path_prefix=`pwd`/ ./a.out
INIT: 0x530c50 0x530c5c
guard: 0x530c58 3 PC 0x4f86e6 in main trace-pc-guard-example.cc:7
guard: 0x530c54 2 PC 0x4f86b6 in foo() trace-pc-guard-example.cc:4
guard: 0x530c50 1 PC 0x4f8686 in sub() trace-pc-guard-example.cc:1
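The comments in the callback above mention that guard values can be made consecutive and used to index an array or a bit vector. The following is a minimal sketch of that idea, not part of the original example. The names kMaxEdges, g_covered, g_next_id, and CoveredEdges are hypothetical, and the fixed capacity is an assumption made for illustration.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical capacity; guards that do not fit are left at 0 and ignored.
static const size_t kMaxEdges = 1 << 16;
static std::bitset<kMaxEdges> g_covered;  // Bit i is set once edge id i has run.
static uint32_t g_next_id = 0;            // Last guard id handed out.

extern "C" void __sanitizer_cov_trace_pc_guard_init(uint32_t *start,
                                                    uint32_t *stop) {
  if (start == stop || *start) return;  // Already initialized for this DSO.
  for (uint32_t *g = start; g < stop; g++) {
    if (g_next_id + 1 >= kMaxEdges) break;  // Out of slots: leave guard at 0.
    *g = ++g_next_id;                       // Consecutive ids starting at 1.
  }
}

extern "C" void __sanitizer_cov_trace_pc_guard(uint32_t *guard) {
  if (!*guard) return;  // Guard was never assigned an id.
  assert(*guard < kMaxEdges);
  g_covered.set(*guard);  // Setting the same bit twice is harmless.
}

// Number of distinct edges executed so far.
size_t CoveredEdges() { return g_covered.count(); }
```

Because the guard is used only as an index, the callback never needs the PC. Calling CoveredEdges() at exit (for example from a function registered with atexit) would give an edge count for the run.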
3.Inline 8bit-counters
Experimental, may change or disappear in the future.
With -fsanitize-coverage=inline-8bit-counters the compiler will insert inline counter increments on every edge. This is similar to -fsanitize-coverage=trace-pc-guard, but instead of a callback the instrumentation simply increments a counter.
Users need to implement a single function to capture the counters at startup.
extern "C"
void __sanitizer_cov_8bit_counters_init(char *start, char *end) {
  // [start,end) is the array of 8-bit counters created for the current DSO.
  // Capture this array in order to read/modify the counters.
}
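As a rough illustration of what "capturing" the counters could look like, the sketch below stores the [start,end) range and offers two helpers. It assumes a single instrumented DSO, and the names g_counters_begin, g_counters_end, CountHitEdges, and ResetCounters are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical bookkeeping for one DSO; a real tool would keep a list of ranges.
static char *g_counters_begin = nullptr;
static char *g_counters_end = nullptr;

extern "C" void __sanitizer_cov_8bit_counters_init(char *start, char *end) {
  assert(start <= end);
  g_counters_begin = start;  // Remember where the counters live.
  g_counters_end = end;
}

// Counts the edges whose counter is currently non-zero.
// Note: an 8-bit counter that wrapped around to exactly 0 would be missed.
size_t CountHitEdges() {
  size_t hit = 0;
  for (char *c = g_counters_begin; c < g_counters_end; c++)
    if (*c) hit++;
  return hit;
}

// Zeroes all counters, e.g. before running the next input.
void ResetCounters() {
  for (char *c = g_counters_begin; c < g_counters_end; c++) *c = 0;
}
```

A fuzzer-style loop would typically call ResetCounters() before each input and CountHitEdges() (or a finer-grained comparison) after it.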
4.PC-Table
Experimental, may change or disappear in the future.
Note: this instrumentation might be incompatible with dead code stripping (-Wl,-gc-sections) for linkers other than LLD, thus resulting in a significant binary size overhead. For more information, see Bug 34636.
With -fsanitize-coverage=pc-table the compiler will create a table of instrumented PCs. Requires either -fsanitize-coverage=inline-8bit-counters or -fsanitize-coverage=trace-pc-guard.
Users need to implement a single function to capture the PC table at startup:
extern "C"
void __sanitizer_cov_pcs_init(const uintptr_t *pcs_beg,
                              const uintptr_t *pcs_end) {
  // [pcs_beg,pcs_end) is the array of ptr-sized integers representing
  // pairs [PC,PCFlags] for every instrumented block in the current DSO.
  // Capture this array in order to read the PCs and their Flags.
  // The number of PCs and PCFlags for a given DSO is the same as the number
  // of 8-bit counters (-fsanitize-coverage=inline-8bit-counters) or
  // trace_pc_guard callbacks (-fsanitize-coverage=trace-pc-guard)
  // A PCFlags describes the basic block:
  //  * bit0: 1 if the block is the function entry block, 0 otherwise.
}
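To make the [PC,PCFlags] layout concrete, here is a small sketch that captures the table and counts function entry blocks using bit0 of PCFlags. It assumes a single DSO, and the names g_pcs, g_num_entries, NumInstrumentedBlocks, and CountFunctionEntries are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical globals describing the captured table of one DSO.
static const uintptr_t *g_pcs = nullptr;
static size_t g_num_entries = 0;  // Number of [PC, PCFlags] pairs.

extern "C" void __sanitizer_cov_pcs_init(const uintptr_t *pcs_beg,
                                         const uintptr_t *pcs_end) {
  size_t words = static_cast<size_t>(pcs_end - pcs_beg);
  assert(words % 2 == 0);  // The table is a sequence of pairs.
  g_pcs = pcs_beg;
  g_num_entries = words / 2;
}

size_t NumInstrumentedBlocks() { return g_num_entries; }

// Counts entries whose PCFlags has bit0 set (function entry blocks).
size_t CountFunctionEntries() {
  size_t n = 0;
  for (size_t i = 0; i < g_num_entries; i++) {
    uintptr_t flags = g_pcs[2 * i + 1];  // g_pcs[2 * i] is the PC itself.
    if (flags & 1) n++;
  }
  return n;
}
```

Since entry i of the table corresponds to guard or counter i of the same DSO, the two arrays can be walked in parallel to map a hit counter back to its PC.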
As an example, we can use some of the functions above to collect run-time information about a program (that is, to compute its coverage):
//foo.cc
#include <iostream>
#include <string>
int add(int i, int j)
{
  return i + j;
}
int main()
{
  std::string s;
  std::string s1 = "abcdefghijik";
  int i;
  std::cin >> s;
  if (s == s1) {
    i = add(3, 5);
  }
  else {
    std::cout << "wrong" << std::endl;
  }
  return 0;
}
//san.cc
#include <stdint.h>
#include <stddef.h>
#include <stdio.h>
#include <sanitizer/coverage_interface.h>
#include <assert.h>
// Only one of the two definitions may be active at a time.
#ifdef _WIN32
#define ATTRIBUTE_INTERFACE __declspec(dllexport)
#else
#define ATTRIBUTE_INTERFACE __attribute__((visibility("default")))
#endif
struct Module {
  uint32_t *Start, *Stop;
};
static const size_t kNumPCs = 1 << 21;
uint8_t __sancov_trace_pc_guard_8bit_counters[kNumPCs];
uintptr_t __sancov_trace_pc_pcs[kNumPCs];
Module Modules[4096];
size_t NumModules = 0;
size_t NumGuards = 0;
uint8_t *Counters() {
  return __sancov_trace_pc_guard_8bit_counters;
}
uintptr_t *PCs() {
  return __sancov_trace_pc_pcs;
}
// Guard ids start at 1, so NumGuards + 1 slots are in use (capped at kNumPCs).
size_t GetNumPCs() { return kNumPCs < NumGuards + 1 ? kNumPCs : NumGuards + 1; }
uintptr_t GetPC(size_t Idx) {
  assert(Idx < GetNumPCs());
  return PCs()[Idx];
}
// Counts the slots that have a recorded PC, i.e. the edges executed so far.
size_t GetTotalPCCoverage() {
  size_t Res = 0;
  for (size_t i = 1, N = GetNumPCs(); i < N; i++)
    if (PCs()[i])
      Res++;
  return Res;
}
//ATTRIBUTE_INTERFACE
extern "C" void __sanitizer_cov_trace_pc_guard(uint32_t *Guard) {
  uintptr_t PC = reinterpret_cast<uintptr_t>(__builtin_return_address(0));
  uint32_t Idx = *Guard;
  __sancov_trace_pc_pcs[Idx] = PC;
  __sancov_trace_pc_guard_8bit_counters[Idx]++;
  printf("GetTotalPCCoverage() is %zu\n", GetTotalPCCoverage());
}
extern "C" void __sanitizer_cov_trace_pc_guard_init(uint32_t *Start, uint32_t *Stop) {
  if (Start == Stop || *Start) return;  // Initialize each module only once.
  assert(NumModules < sizeof(Modules) / sizeof(Modules[0]));
  for (uint32_t *P = Start; P < Stop; P++) {
    NumGuards++;
    if (NumGuards == kNumPCs) {
      printf(
          "WARNING: The binary has too many instrumented PCs.\n"
          "         You may want to reduce the size of the binary\n"
          "         for more efficient fuzzing and precise coverage data\n");
    }
    *P = NumGuards % kNumPCs;
  }
  Modules[NumModules].Start = Start;
  Modules[NumModules].Stop = Stop;
  NumModules++;
}
The result of running it is shown below:
# clang++ -g -fsanitize-coverage=trace-pc-guard,inline-8bit-counters,pc-table,trace-cmp,func foo.cc -c
# clang++ san.cc foo.o -fsanitize=address -o a
# ./a
GetTotalPCCoverage() is 1
GetTotalPCCoverage() is 2
GetTotalPCCoverage() is 3
aaaaaaaaaaaaaaaaa
GetTotalPCCoverage() is 4
wrong
5.Tracing PCs
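With -fsanitize-coverage=trace-pc the compiler will insert __sanitizer_cov_trace_pc() on every edge. With an additional ...=trace-pc,indirect-calls flag, __sanitizer_cov_trace_pc_indirect(void *callee) will be inserted on every indirect call. These callbacks are not implemented in the Sanitizer run-time and should be defined by the user. This mechanism is used for fuzzing the Linux kernel (https://github.com/google/syzkaller).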
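Since these two callbacks must be supplied by the user, a minimal user-side sketch might look like the following. The fixed-size buffer and the names kMaxTrace, g_trace, TraceLength, TracePC, and LastIndirectCallee are hypothetical. The callback signatures are taken from the text above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical fixed-size trace buffer; the capacity is arbitrary.
static const size_t kMaxTrace = 4096;
static uintptr_t g_trace[kMaxTrace];
static size_t g_trace_len = 0;
static void *g_last_callee = nullptr;

// Called on every instrumented edge: record the PC of the call site.
extern "C" void __sanitizer_cov_trace_pc() {
  if (g_trace_len < kMaxTrace)
    g_trace[g_trace_len++] =
        reinterpret_cast<uintptr_t>(__builtin_return_address(0));
}

// Called before every indirect call: remember the call target.
extern "C" void __sanitizer_cov_trace_pc_indirect(void *callee) {
  g_last_callee = callee;
}

size_t TraceLength() { return g_trace_len; }

uintptr_t TracePC(size_t i) {
  assert(i < g_trace_len);
  return g_trace[i];
}

void *LastIndirectCallee() { return g_last_callee; }
```

Unlike trace-pc-guard there is no guard argument, so the return address is the only way to tell edges apart. The file defining these callbacks should itself be compiled without -fsanitize-coverage to avoid the callbacks instrumenting themselves.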
6.Instrumentation points
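- edge (default): edges are instrumented (see below).
- bb: basic blocks are instrumented.
- func: only the entry block of every function will be instrumented.
Use these flags together with trace-pc-guard or trace-pc, like this: -fsanitize-coverage=func,trace-pc-guard.
When edge or bb is used, some of the edges/blocks may still be left uninstrumented (pruned) if such instrumentation is considered redundant. Use no-prune (e.g. -fsanitize-coverage=bb,no-prune,trace-pc-guard) to disable pruning. This could be useful for better coverage visualization.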
7.Edge coverage
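Consider this code:
void foo(int *a) {
  if (a)
    *a = 0;
}
It contains 3 basic blocks, let's name them A, B, C:
A
|\
| \
|  B
| /
|/
C
If blocks A, B, and C are all covered, we know for certain that the edges A=>B and B=>C were executed, but we still don't know whether the edge A=>C was executed. Such edges of the control flow graph are called critical. Edge-level coverage simply splits all critical edges by introducing new dummy blocks and then instruments those blocks:
A
|\
| \
D  B
| /
|/
C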
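The splitting step can be sketched on a toy graph. The code below is an illustration only and is not how LLVM represents or transforms its CFG. ToyCFG and SplitCriticalEdges are hypothetical names. It uses the usual definition of a critical edge: the source block has more than one successor and the destination block has more than one predecessor.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// A toy control-flow graph: blocks are numbered 0..NumBlocks-1 and
// Edges holds (from, to) pairs.
struct ToyCFG {
  size_t NumBlocks;
  std::vector<std::pair<size_t, size_t>> Edges;
};

// Routes every critical edge through a fresh dummy block and returns the
// number of dummy blocks that were added.
size_t SplitCriticalEdges(ToyCFG &G) {
  std::vector<size_t> Succs(G.NumBlocks, 0), Preds(G.NumBlocks, 0);
  for (const auto &E : G.Edges) {
    assert(E.first < G.NumBlocks && E.second < G.NumBlocks);
    Succs[E.first]++;
    Preds[E.second]++;
  }
  size_t Added = 0;
  const size_t OriginalEdges = G.Edges.size();
  for (size_t i = 0; i < OriginalEdges; i++) {
    size_t From = G.Edges[i].first, To = G.Edges[i].second;
    if (Succs[From] > 1 && Preds[To] > 1) {
      size_t Dummy = G.NumBlocks++;    // New block, e.g. D in the picture.
      G.Edges[i].second = Dummy;       // From => Dummy
      G.Edges.push_back({Dummy, To});  // Dummy => To
      Added++;
    }
  }
  return Added;
}
```

With A=0, B=1, C=2 and edges A=>B, A=>C, B=>C, only A=>C is critical, so one dummy block (index 3, the D of the second picture) is added and the edge becomes A=>D, D=>C.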
8.Tracing data flow
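Support for data-flow-guided fuzzing. With -fsanitize-coverage=trace-cmp the compiler will insert extra instrumentation around comparison instructions and switch statements. Similarly, with -fsanitize-coverage=trace-div the compiler will instrument integer division instructions (to capture the right argument of division), and with -fsanitize-coverage=trace-gep it will instrument LLVM GEP instructions (to capture array indices).
Unless the no-prune option is provided, some of the comparison instructions will not be instrumented.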
// Called before a comparison instruction.
// Arg1 and Arg2 are arguments of the comparison.
void __sanitizer_cov_trace_cmp1(uint8_t Arg1, uint8_t Arg2);
void __sanitizer_cov_trace_cmp2(uint16_t Arg1, uint16_t Arg2);
void __sanitizer_cov_trace_cmp4(uint32_t Arg1, uint32_t Arg2);
void __sanitizer_cov_trace_cmp8(uint64_t Arg1, uint64_t Arg2);
// Called before a comparison instruction if exactly one of the arguments is constant.
// Arg1 and Arg2 are arguments of the comparison, Arg1 is a compile-time constant.
// These callbacks are emitted by -fsanitize-coverage=trace-cmp since 2017-08-11
void __sanitizer_cov_trace_const_cmp1(uint8_t Arg1, uint8_t Arg2);
void __sanitizer_cov_trace_const_cmp2(uint16_t Arg1, uint16_t Arg2);
void __sanitizer_cov_trace_const_cmp4(uint32_t Arg1, uint32_t Arg2);
void __sanitizer_cov_trace_const_cmp8(uint64_t Arg1, uint64_t Arg2);
// Called before a switch statement.
// Val is the switch operand.
// Cases[0] is the number of case constants.
// Cases[1] is the size of Val in bits.
// Cases[2:] are the case constants.
void __sanitizer_cov_trace_switch(uint64_t Val, uint64_t *Cases);
// Called before a division statement.
// Val is the second argument of division.
void __sanitizer_cov_trace_div4(uint32_t Val);
void __sanitizer_cov_trace_div8(uint64_t Val);
// Called before a GetElementPtr (GEP) instruction
// for every non-constant array index.
void __sanitizer_cov_trace_gep(uintptr_t Idx);
For example:
//foo.cc
#include <iostream>
#include <string>
int add(int i, int j)
{
  return i + j;
}
int main()
{
  std::string s;
  int i;
  std::cin >> s;
  if (s[0] == 'w') {
    i = add(3, 5);
  }
  else {
    std::cout << "wrong" << std::endl;
  }
  return 0;
}
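//san.cc
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <sanitizer/coverage_interface.h>
extern "C" void __sanitizer_cov_trace_const_cmp4(uint32_t Arg1, uint32_t Arg2)
{
  uintptr_t PC = reinterpret_cast<uintptr_t>(__builtin_return_address(0));
  // Arg1 is the compile-time constant, Arg2 is the run-time value.
  // PRIuPTR is the portable format for uintptr_t.
  printf("cmp4PC is %" PRIuPTR ",Arg1 is %u,Arg2 is %u\n", PC, Arg1, Arg2);
}
The result of running it is as follows: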
# clang++ -g -fsanitize-coverage=trace-pc-guard,inline-8bit-counters,pc-table,trace-cmp foo.cc -c
# clang++ san.cc foo.o -fsanitize=address
# ./a.out
qqqqqqqqqqqqqqq
cmp4PC is 5211447,Arg1 is 119,Arg2 is 113
wrong
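The Cases layout documented for __sanitizer_cov_trace_switch (count, bit width, then the constants) can be walked as in the sketch below. The two counters g_switches_seen and g_switches_matched are hypothetical and exist only to make the behaviour observable. A real fuzzer would more likely record how close Val is to each case constant.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical statistics, kept only for illustration.
static uint64_t g_switches_seen = 0;     // Switch statements reached.
static uint64_t g_switches_matched = 0;  // Of those, how many hit a listed case.

extern "C" void __sanitizer_cov_trace_switch(uint64_t Val, uint64_t *Cases) {
  uint64_t NumCases = Cases[0];  // Number of case constants.
  uint64_t ValBits = Cases[1];   // Size of Val in bits.
  assert(ValBits > 0 && ValBits <= 64);
  g_switches_seen++;
  for (uint64_t i = 0; i < NumCases; i++) {
    if (Cases[2 + i] == Val) {  // Cases[2:] are the case constants.
      g_switches_matched++;
      return;
    }
  }
}
```

In the example above, where Arg1 was 119 ('w') and Arg2 was 113 ('q'), the comparison callback exposes the constant the input is compared against. The switch callback does the same for every case constant at once.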
9.Default implementation
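The sanitizer run-time (AddressSanitizer, MemorySanitizer, etc.) provides a default implementation of some of the coverage callbacks. You may use this implementation to dump the coverage to disk at process exit.
Example:
//cov.cc
#include <stdio.h>
__attribute__((noinline))
void foo() { printf("foo\n"); }
int main(int argc, char **argv)
{
  if (argc == 2)
  {
    foo();
  }
  printf("main\n");
}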
% clang++ -g cov.cc -fsanitize=address -fsanitize-coverage=trace-pc-guard
% ASAN_OPTIONS=coverage=1 ./a.out; wc -c *.sancov
main
SanitizerCoverage: ./a.out.7312.sancov 2 PCs written
24 a.out.7312.sancov
% ASAN_OPTIONS=coverage=1 ./a.out foo ; wc -c *.sancov
foo
main
SanitizerCoverage: ./a.out.7316.sancov 3 PCs written
24 a.out.7312.sancov
32 a.out.7316.sancov
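Every time you run an executable instrumented with SanitizerCoverage, one *.sancov file is created during process shutdown. If the executable is dynamically linked against instrumented DSOs, one *.sancov file will also be created for every DSO.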
10.Sancov data format
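The format of *.sancov files is very simple: the first 8 bytes are the magic, one of 0xC0BFFFFFFFFFFF64 and 0xC0BFFFFFFFFFFF32. The last byte of the magic defines the size of the following offsets. The rest of the data is the offsets in the corresponding binary/DSO that were executed during the run.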
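The format described above is simple enough to decode by hand. The sketch below parses a .sancov image that has already been read into memory. ParseSancov, kMagic64, and kMagic32 are hypothetical names, and the code assumes the file was written on a machine with the same byte order as the reader.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

static const uint64_t kMagic64 = 0xC0BFFFFFFFFFFF64ULL;  // 64-bit offsets follow.
static const uint64_t kMagic32 = 0xC0BFFFFFFFFFFF32ULL;  // 32-bit offsets follow.

// Fills Offsets with the offsets stored after the magic.
// Returns false if the buffer is too short, has an unknown magic,
// or its payload is not a whole number of offsets.
bool ParseSancov(const unsigned char *Data, size_t Size,
                 std::vector<uint64_t> *Offsets) {
  assert(Data && Offsets);
  Offsets->clear();
  uint64_t Magic;
  if (Size < sizeof(Magic)) return false;
  std::memcpy(&Magic, Data, sizeof(Magic));
  size_t Width;
  if (Magic == kMagic64) Width = 8;
  else if (Magic == kMagic32) Width = 4;
  else return false;
  if ((Size - sizeof(Magic)) % Width != 0) return false;
  for (size_t Pos = sizeof(Magic); Pos < Size; Pos += Width) {
    uint64_t Offset = 0;
    if (Width == 8) {
      std::memcpy(&Offset, Data + Pos, 8);
    } else {
      uint32_t Narrow;
      std::memcpy(&Narrow, Data + Pos, 4);
      Offset = Narrow;  // Widen 32-bit offsets for a uniform result type.
    }
    Offsets->push_back(Offset);
  }
  return true;
}
```

This matches the sizes printed by wc -c in the earlier example: a file with the 64-bit magic and 2 PCs is 8 + 2 * 8 = 24 bytes, and one with 3 PCs is 32 bytes.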
11.Sancov Tool
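A simple sancov tool is provided to process coverage files. The tool is part of the LLVM project and is currently supported only on Linux. It can handle symbolization tasks autonomously without any extra support from the environment. You need to pass it the .sancov files (named <module_name>.<pid>.sancov) and the paths to all corresponding binary ELF files. Sancov matches these files using module names and binary file names.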
12.Coverage Reports
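Experimental
.sancov files do not contain enough information to generate a source-level coverage report. The missing information is contained in the debug info of the binary. Thus the .sancov file has to be symbolized first to produce a .symcov file: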
sancov -symbolize my_program.123.sancov my_program > my_program.123.symcov
The .symcov file can be browsed overlaid on the source code by running the tools/sancov/coverage-report-server.py script, which starts an HTTP server.
13.Output directory
By default, .sancov files are created in the current working directory. This can be changed with ASAN_OPTIONS=coverage_dir=/path:
% ASAN_OPTIONS="coverage=1:coverage_dir=/tmp/cov" ./a.out foo
% ls -l /tmp/cov/*sancov
-rw-r----- 1 kcc eng 4 Nov 27 12:21 a.out.22673.sancov
-rw-r----- 1 kcc eng 8 Nov 27 12:21 a.out.22679.sancov