Fuzzing
A software testing technique, often automated or semi-automated, that involves passing invalid, unexpected or random input to a program and monitor result for crashes, failed assertions, races, leaks, etc.
Fuzzer types
- Generation Based: Generate from scratch with no prior state
- Mutation Based: Mutate existing state based on some rules
- Evolutionary: Generation or mutation based or both, in-processing with code coverage feedback
Fuzzing in the past
以浏览器 Fuzzing 为例
- Generate an HTML page
- Write it to the disk
- Launch browser
- Open the page or serve it over HTTP
- Check if the browser crashed
- Close the browser
No coverage
- Large search space
- Cannot fuzz specific function
- Hard to fuzz network protocols
- Speed of regular fuzzers (html, css, dom, etc mutators)
LibFuzzer 的特点
LibFuzzer is in-process, coverage-guided, evolutionary fuzzing engine. LibFuzzer is linked with the library under test, and feeds fuzzed inputs to the library via a specific fuzzing entrypoint (aka “target function”); the fuzzer then tracks which areas of the code are reached, and generates mutations on the corpus of input data in order to maximize the code coverage. The code coverage information for libFuzzer is provided by LLVM’s SanitizerCoverage instrumentation.
LibFuzzer
是一个 in-process
,coverage-guided
,evolutionary
的模糊测试引擎,它是 LLVM 项目的一部分。LibFuzzer
和要被测试的库链接在一起,通过一个特殊的模糊测试进入点(目标函数)从而将产生fuzz输入数据给到被测试的库。fuzzer
会跟踪哪些代码区域已经测试过,然后在输入数据的语料库上产生变异,来最大化代码覆盖。代码覆盖的信息由 LLVM 的 SanitizerCoverage
插桩提供。
Coverage-guided fuzz testing
配合的 Memory tool
How to see the invisible
Sanitizer
不是 Fuzz 测试里必须的,但是加上它可以发现很多难以发现的 bug。
AddressSanitizer is not the only dynamic testing tool that can be combined with fuzzing. For example, add -fsanitize=signed-integer-overflow -fno-sanitize-recover=all to the build flags.
In some cases you may want to run fuzzing without any additional tool (e.g. a sanitizer). This will allow you to find only the simplest bugs (null dereferences, assertion failures
) but will run faster. Later you may run a sanitized build on the generated corpus to find more bugs. The downside is that you may miss some bugs this way.
LibFuzzer 的使用12
可以看到,如果要 Fuzz 一个程序,用户自己编写 LLVMFuzzerTestOneInput
函数。 Libfuzzer
生成的测试数据 Data
以及测试数据的长度 Size
,把这些生成的测试数据让程序来处理即可,LibFuzz
引擎同时会尽可能的触发更多的代码逻辑。 LibFuzzer 作为一个测试引擎负责生成这些测试数据,并且提供了一套异常检测机制。
There is nothing in these fuzz targets that makes them tied to libFuzzer – there is just one function that takes an array of bytes as a parameter. And so it is possible, and even desirable, to fuzz the same targets with different other fuzzing engines.
// fuzz_target.cc
extern "C" int LLVMFuzzerTestOneInput(const uint8_t *Data, size_t Size) {
DoSomethingInterestingWithMyAPI(Data, Size);
return 0; // Non-zero return values are reserved for future use.
}
-
将待测对象和
libFuzzer.a
库编译链接
clang++ -g -std=c++11 -fsanitize=address -fsanitize-coverage=trace-pc-guard fourth_fuzzer.cc ../../libFuzzer/libFuzzer.a -o fourth_fuzzer
可以看到在编译时其中有很多前面提到的Memory tool
,Sanitizer
用于检测运行时出现的内存错误等异常。 -
之后运行即可
mkdir corpus3 && ./fourth_fuzzer corpus3/ -max_len=1024
总结
似乎可以认为 Libfuzzer
已经把 一个 fuzzer
的核心(样本生成引擎和异常检测系统) 做好了, 我们需要做的是根据目标程序的逻辑,把 Libfuzzer
生成的数据,交给目标程序处理,这与 Fuzzing 这一技术本身是定义是相吻合的。
Fuzzer 进阶
Seed corpus 增加种子样本
mkdir corpus1
./Fuzzer -max_total_time=60 -max_len=1024 -print_final_stats=1 -dict=./dictionary/path corpus1
Fuzzer
程序可以有多个目录作为参数,此时 fuzzer
会递归遍历所有目录,把目录中的文件读入最为样本数据传给测试函数,同时会把那些可以产生新的的代码路径的样本保存到第一个目录里面。
./woff2-2016-05-06-fsanitize_fuzzer MY_CORPUS/ seeds/
When a libFuzzer-based fuzzer is executed with one more directory as arguments, it will first read files from every directory recursively and execute the target function on all of them. Then, any input that triggers interesting code path(s) will be written back into the first corpus directory (in this case, MY_CORPUS).
使用 dictionary
Another important way to improve fuzzing efficiency is to use a dictionary. This works well if the input format being fuzzed consists of tokens or have lots of magic values.
./libxml2-v2.9.2-fsanitize_fuzzer -dict=afl/dictionaries/xml.dict -jobs=8 -workers=8 CORPUS
Parallel runs 任务并行
Another way to increase the fuzzing efficiency is to use more CPUs. If you run the fuzzer with -jobs=N
it will spawn N independent jobs but no more than half of the number of cores you have; use -workers=M
to set the number of allowed parallel jobs.
./libxml2-v2.9.2-fsanitize_fuzzer -dict=afl/dictionaries/xml.dict -jobs=8 -workers=8 CORPUS
精简样本集
-merge=1
mkdir corpus_min
./Fuzzer -merge=1 corpus1_min corpus1
代码覆盖率
-dump_coverage
./Fuzzer corpus1_min -runs=0 -dump_coverage=1