Effective file format fuzzing

cogit_o

By Mateusz “j00ru” Jurczyk, presented at Black Hat Europe 2016, London

Fuzz testing or fuzzing is a software testing technique, often automated or semi-automated, that involves providing invalid, unexpected, or random data to the inputs of a computer program.

On a scheme

Key questions

How do we choose the fuzzing target in the first place?
How are the inputs generated?
What is the base set of the input samples? Where do we get it from?
How do we mutate the inputs?
How do we detect software failures/crashes?
Do we make any decisions in future fuzzing based on the software’s behavior in the past?
How do we minimize the interesting inputs/mutations?
How do we recognize unique bugs?
What if the software requires user interaction and/or displays windows?
What if the application keeps crashing at a single location due to an easily reachable bug?
What if the fuzzed file format includes checksums, other consistency checks, compression or encryption?

Gathering an initial corpus of input files

A desired step in a majority of cases:

Makes it possible to reach some code paths and program states immediately after starting the fuzzing.
May contain complex data structures which would be difficult or impossible to generate organically using just code coverage information, e.g. magic values, correct headers, compression trees, etc.
Even if the same inputs could be constructed during fuzzing with an empty seed, having them right at the beginning saves a lot of CPU time.
Corpora containing files in specific formats may be frequently reused to fuzz various software projects which handle them.

Gathering inputs: the standard methods

Open-source projects often include extensive sets of input data for testing, which can be freely reused as a fuzzing starting point.

Example: FFmpeg FATE, samples.ffmpeg.org. Lots of formats there, which would be otherwise very difficult to obtain in the wild.
Sometimes they’re not publicly available for everyone, but the developers have them and will share with someone willing to report bugs in return.

Many of them also include converters from format X to their own format Y. With a diverse set of files in format X and/or diverse conversion options, this can also generate a decent corpus.

Example: cwebp, a converter from PNG/JPEG/TIFF to WEBP images.

Gathering inputs: Internet crawling

Depending on the popularity of the fuzzed file format, Internet crawling is the most intuitive approach.

Download files with a specific file extension.
Download files with specific magic bytes or other signatures.

If the format is indeed popular (e.g. DOC, PDF, SWF, etc.), you may end up with many terabytes of data on your disk.

Not a huge problem, since storage is cheap, and the corpus can be later minimized to consume less space while providing equivalent code coverage.
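Filtering crawled files by signature can be sketched in a few lines of C. This is only an illustration: the `magic_sig` table and the `identify_by_magic` helper are hypothetical names, and the table holds just two well-known signatures (the `%PDF-` header and the 8-byte PNG signature). A real crawler would carry a much larger table.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical table entry: a format name and its leading magic bytes. */
struct magic_sig {
    const char *name;
    const unsigned char *bytes;
    size_t len;
};

static const unsigned char PDF_MAGIC[] = { '%', 'P', 'D', 'F', '-' };
static const unsigned char PNG_MAGIC[] = { 0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A };

static const struct magic_sig SIGNATURES[] = {
    { "pdf", PDF_MAGIC, sizeof PDF_MAGIC },
    { "png", PNG_MAGIC, sizeof PNG_MAGIC },
};

/* Return the name of the format whose magic matches the start of buf,
 * or NULL if no known signature matches. */
const char *identify_by_magic(const unsigned char *buf, size_t len) {
    assert(buf != NULL || len == 0);
    for (size_t i = 0; i < sizeof SIGNATURES / sizeof SIGNATURES[0]; i++) {
        const struct magic_sig *sig = &SIGNATURES[i];
        if (len >= sig->len && memcmp(buf, sig->bytes, sig->len) == 0) {
            return sig->name;
        }
    }
    return NULL;
}
```

Checking content rather than the file extension also catches files that were renamed or served with a misleading name.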

Case study: IDA Pro

IDA Pro loader architecture

Modular design, with each loader (also disassembler) residing in a separate module, exporting two functions: accept_file and load_file.

One file for the 32-bit version of IDA (.llx on Linux, .ldw on Windows) and one file for 64-bit (.llx64, .l64).

int (idaapi* accept_file)(linput_t *li,
                          char fileformatname[MAX_FILE_FORMAT_NAME],
                          int n);

void (idaapi* load_file)(linput_t *li,
                         ushort neflags,
                         const char *fileformatname);

The accept_file function performs preliminary processing and returns 0 or 1 depending on whether the given module thinks it can handle the input file as the Nth of its supported formats.

If so, it returns the name of the format in the fileformatname argument.

load_file performs the regular processing of the file.

Both functions (and many more required to interact with IDA) are documented in the IDA SDK.

Corpus distillation

In fuzzing, it is important to get rid of most of the redundancy in the input corpus.

This applies both to the base corpus and to the living one that evolves during fuzzing.
In the context of a single test case, the following should be maximized:

$\frac{|\text{program states explored}|}{\text{input size}}$

which strives for the highest byte-to-program-feature ratio: each portion of a file should exercise new functionality, instead of repeating constructs found elsewhere in the sample.

Likewise, in the whole corpus, the following should be generally maximized:

$\frac{|\text{program states explored}|}{|\text{input samples}|}$

This ensures that there aren’t too many samples which all exercise the same functionality (it enforces program state diversity while keeping the corpus size relatively low). In other words, reach the maximum coverage with as few samples as possible.
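One common way to approximate the corpus-level goal is a greedy set-cover pass. The sketch below is a toy model and not necessarily the algorithm used in the talk: each sample's coverage is assumed to fit in a 64-bit mask (one bit per program state), `distill_corpus` is a hypothetical helper name, and input size is ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Count set bits; each bit stands for one explored program state. */
static int popcount64(uint64_t x) {
    int n = 0;
    while (x) { x &= x - 1; n++; }
    return n;
}

/*
 * Greedy distillation: repeatedly keep the sample that adds the most
 * not-yet-covered states. Sets keep[i] to 1 for retained samples and
 * returns how many samples were retained.
 */
size_t distill_corpus(const uint64_t *coverage, size_t count, int *keep) {
    assert(count == 0 || (coverage != NULL && keep != NULL));
    uint64_t covered = 0;
    size_t kept = 0;
    for (size_t i = 0; i < count; i++) keep[i] = 0;

    for (;;) {
        size_t best = count;   /* "count" means no candidate found yet */
        int best_gain = 0;
        for (size_t i = 0; i < count; i++) {
            int gain = popcount64(coverage[i] & ~covered);
            if (!keep[i] && gain > best_gain) {
                best = i;
                best_gain = gain;
            }
        }
        if (best == count) break;   /* no remaining sample adds coverage */
        keep[best] = 1;
        covered |= coverage[best];
        kept++;
    }
    return kept;
}
```

Samples that are never selected contribute no state that the retained ones lack, so dropping them lowers the sample count without lowering total coverage. A size-aware variant could rank candidates by gain per byte instead of raw gain.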

Format specific corpus minimization

If there is too much data to process thoroughly, and the format makes it easy to parse files and recognize their (non-)interesting parts, you can do some cursory filtering to extract unusual samples or remove dull ones.

Many formats are structured into chunks with unique identifiers: SWF, PDF, PNG, JPEG, TTF, OTF etc.
Such generic parsing may already reveal whether a file will be a promising fuzzing candidate or not.
The deeper into the specs, the more work is required. It’s usually not cost-effective to go beyond the general file structure, given other (better) methods of corpus distillation.
Be careful not to filter out interesting samples which only appear to be boring at first glance.
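As an illustration of such cursory, chunk-level filtering, the sketch below walks the chunk list of a PNG file (8-byte signature, then chunks made of a 4-byte big-endian length, a 4-byte type, the data and a 4-byte CRC) and counts chunks of a given type. The helper name `png_count_chunks` is hypothetical, and CRCs are deliberately not checked, since the goal is only to spot files that carry rare chunk types.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static const unsigned char PNG_SIG[8] = { 0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A };

/* Read a 32-bit big-endian integer. */
static uint32_t read_be32(const unsigned char *p) {
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/*
 * Count chunks whose 4-byte type equals `type`. Returns -1 if the PNG
 * signature is missing or a chunk overruns the buffer.
 */
int png_count_chunks(const unsigned char *buf, size_t len, const char *type) {
    assert(buf != NULL && type != NULL && strlen(type) == 4);
    if (len < 8 || memcmp(buf, PNG_SIG, 8) != 0) return -1;

    size_t pos = 8;
    int count = 0;
    while (pos < len) {
        if (len - pos < 12) return -1;             /* length + type + CRC */
        uint32_t data_len = read_be32(buf + pos);
        if (data_len > len - pos - 12) return -1;  /* truncated chunk */
        if (memcmp(buf + pos + 4, type, 4) == 0) count++;
        pos += 12 + (size_t)data_len;
    }
    return count;
}
```

A filter built on this could, for example, prioritize files containing less common ancillary chunks such as `iCCP` over files that contain only the mandatory ones.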

How to define a program state?
File sizes and cardinality (from the previous expressions) are trivial to measure.
No such simple metric exists for program states, especially one with the following characteristics:

their number should stay within a sane range, e.g. counting all combinations of every bit in memory cleared/set is not an option.
they should be meaningful in the context of memory safety.
they should be easily/quickly determined during process run time.

Code coverage $\cong$ program states

Most approximations are currently based on measuring code coverage, and not the actual memory state.

Pros:

Increased code coverage is representative of new program states. In fuzzing, the more tested code is executed, the higher chance for a bug to be found.
The sane range requirement is met: code coverage information is typically linear in size in relation to the overall program size.
Easily measurable using both compiled-in and external instrumentation.

Cons:

Constant code coverage does not indicate constant |program states|. A significant amount of information on distinct states may be lost when only using this metric.

Current state of the art: counting basic blocks

Basic blocks provide the best granularity.

Smallest coherent units of execution.
Measuring just functions loses lots of information on what goes on inside.
Recording specific instructions is generally redundant, since all of them are guaranteed to execute within the same basic block.

Supported by both compiler (gcov etc.) and external instrumentation (Intel Pin, DynamoRIO).
Identified by the address of the first instruction.

Basic blocks: incomplete information

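#include <stdio.h>

void foo(int a, int b) {
    /* Short-circuit ||: the b check only runs when a != 42. */
    if (a == 42 || b == 1337) {
        printf("Success!");
    }
}

void bar() {
    foo(0, 1337);  /* second condition true */
    foo(42, 0);    /* first condition true, b is never checked */
    foo(0, 0);     /* both conditions false */
}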

Even though the two latter foo() calls take different paths in the code, they execute no basic block that the first call did not already cover, so this information is not recorded and is lost in a simple basic-block (BB) granularity system.

Arguably they constitute new program states which could be useful in fuzzing.

Another idea - program interpreted as a graph.

vertices = basic blocks
edges = transition paths between the basic blocks
Let’s record edges rather than vertices to obtain more detailed information on the control flow!
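The difference between the two granularities can be sketched with a hand-instrumented version of foo(). Everything here is an assumption made for illustration: the block IDs, the `visit` probe and the `foo_instrumented` name are hypothetical, and the split into four blocks is one plausible layout a compiler might produce. A real tool would identify blocks by the address of their first instruction.

```c
#include <assert.h>

/* Hypothetical block IDs for a hand-instrumented foo(). */
enum { BB_ENTRY, BB_CHECK_B, BB_PRINT, BB_EXIT, BB_COUNT };

static unsigned char seen_block[BB_COUNT];
static unsigned char seen_edge[BB_COUNT][BB_COUNT];
static int prev_block = -1;
static int new_blocks, new_edges;   /* discoveries made by the last call */

/* Record entry into block bb and the edge taken from the previous block. */
static void visit(int bb) {
    assert(bb >= 0 && bb < BB_COUNT);
    if (!seen_block[bb]) { seen_block[bb] = 1; new_blocks++; }
    if (prev_block >= 0 && !seen_edge[prev_block][bb]) {
        seen_edge[prev_block][bb] = 1;
        new_edges++;
    }
    prev_block = bb;
}

/* foo() from the example, with the printf replaced by coverage probes. */
void foo_instrumented(int a, int b) {
    prev_block = -1;                 /* track edges inside one call only */
    new_blocks = new_edges = 0;
    visit(BB_ENTRY);                 /* a == 42 ? */
    if (a != 42) {
        visit(BB_CHECK_B);           /* b == 1337 ? */
        if (b != 1337) { visit(BB_EXIT); return; }
    }
    visit(BB_PRINT);                 /* printf("Success!") */
    visit(BB_EXIT);
}
```

Calling `foo_instrumented(0, 1337)`, then `foo_instrumented(42, 0)`, then `foo_instrumented(0, 0)` discovers all four blocks on the first call. The second and third calls discover no new block but one new edge each, which is exactly the information a block-only metric discards.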