AFLFast Source Code and Paper Walkthrough: Coverage-based greybox fuzzing as markov chain

Latest recommended article published on 2022-11-23 17:27:14

Chary Liu

Category: Paper Reading. Tags: AFLFast, AFL

Original link: http://xueshu.baidu.com/usercenter/paper/show?paperid=1dab9457f196778a04f4b04e11124fc5&site=xueshu_se

Coverage-based greybox fuzzing as markov chain

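1 Introduction

Most bugs today are found by fuzzing; symbolic execution accounts for only a small share. Fuzzing is faster, needs no analysis of the program's internal logic and no constraint solving, and is much less affected by path explosion.

The current challenge in fuzzing is that most generated inputs exercise the same few paths. For example, when fuzzing a valid image file, a mutated image has a 90% chance of exercising the path that rejects invalid image files. When fuzzing an invalid image file, the mutated file exercises that same path with 99.999% probability. Such paths are called high-frequency paths. In short, most of the fuzzing effort lands on high-frequency paths, so most generated inputs exercise identical paths. AFLFast therefore proposes strategies that bias AFL toward low-frequency paths: giving them more opportunities spreads the fuzzing effort more evenly, so that more paths are explored with the same amount of fuzzing.

Core ideas of AFLFast: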

power schedule
search strategy

2 Background

2.1 CGF(Coverage-based Greybox Fuzzing)

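AFL workflow:

① Instrument the program while compiling it from source, in order to record code coverage;

② Select some input files as the initial test set and add them to the input queue;

③ Mutate the files in the queue according to certain strategies;

④ If a mutated file increases coverage, keep it and add it to the queue;

⑤ Repeat the process above; files that trigger a crash along the way are recorded.

The AFL queue:

AFL maintains a queue. Each time, it takes one file (a seed) from the queue, mutates it heavily, and checks whether running the mutated inputs crashes the target, discovers a new path, and so on.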

Input: Seed Inputs S
T✗ = ∅
T = S
if T = ∅ then
    add empty file to T
end if
repeat
    t = choose_next(T)
    p = assign_energy(t)
    for i from 1 to p do
        t' = mutate_input(t)
        if t' crashes then
            add t' to T✗
        else if is_interesting(t') then
            add t' to T
        end if
    end for
until timeout reached or abort-signal
Output: Crashing Inputs T✗

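Main mutation types in AFL:

bitflip: flips bits, turning 1 into 0 and 0 into 1
arithmetic: adds or subtracts small integers
interest: overwrites parts of the file with special ("interesting") values
dictionary: overwrites parts of the file with, or inserts, auto-generated or user-supplied tokens
havoc: literally "great destruction"; this stage applies a large number of mutations to the file
splice: this stage splices two files together to produce a new file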
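
To make the first three mutation types concrete, here is a minimal sketch in C. The function names are hypothetical helpers written for this post, not AFL functions; only the bit indexing in `flip_bit` follows AFL's `FLIP_BIT` macro. The real stages walk over every position and many values, which is omitted here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* bitflip: flip bit number `bit`, where bit 0 is the most significant
   bit of buf[0] (same indexing as AFL's FLIP_BIT macro). */
static void flip_bit(uint8_t *buf, size_t bit) {
  buf[bit >> 3] ^= (uint8_t)(128 >> (bit & 7));
}

/* arithmetic: add a small signed delta to one byte (wraps modulo 256). */
static void arith_byte(uint8_t *buf, size_t pos, int delta) {
  buf[pos] = (uint8_t)(buf[pos] + delta);
}

/* interest: overwrite one byte with a boundary value such as -1 or 127. */
static void set_interesting(uint8_t *buf, size_t pos, int8_t value) {
  buf[pos] = (uint8_t)value;
}
```

Each helper mutates the buffer in place, so a fuzzer would apply it to a copy of the seed, run the target, and then restore or discard the copy.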

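2.2 Markov chain

The probability distribution of the next state is determined only by the current state; the events that came before it in the sequence are irrelevant.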
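
As a small illustration of this property, the sketch below advances a two-state chain by one step. The states and the transition matrix are made-up values (state 0 could stand for a high-frequency path, state 1 for a low-frequency one); they are not taken from the paper.

```c
#include <assert.h>

#define N_STATES 2

/* One step of a Markov chain: next[j] = sum over i of cur[i] * P[i][j].
   P[i][j] is the probability of moving from state i to state j.
   Only the current distribution `cur` is needed, never the history. */
static void markov_step(double P[N_STATES][N_STATES],
                        const double cur[N_STATES],
                        double next[N_STATES]) {
  for (int j = 0; j < N_STATES; j++) {
    next[j] = 0.0;
    for (int i = 0; i < N_STATES; i++) {
      next[j] += cur[i] * P[i][j];
    }
  }
}
```

Starting from the distribution {1, 0} with P = {{0.75, 0.25}, {0.5, 0.5}}, one step gives {0.75, 0.25} and a second step gives {0.6875, 0.3125}.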

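3 Improvements

3.1 AFL's original strategy

AFL compares (execution time × input length) to decide whether a seed replaces the current winner (`top_rated[i]`) for each bitmap byte it covers: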

static void update_bitmap_score(struct queue_entry* q) {

  u32 i;
  u64 fav_factor = q->exec_us * q->len;

  /* For every byte set in trace_bits[], see if there is a previous winner,
     and how it compares to us. */

  for (i = 0; i < MAP_SIZE; i++)

    if (trace_bits[i]) {

       if (top_rated[i]) {

         /* Faster-executing or smaller test cases are favored. */

         if (fav_factor > top_rated[i]->exec_us * top_rated[i]->len) continue;

         /* Looks like we're going to win. Decrease ref count for the
            previous winner, discard its trace_bits[] if necessary. */

         if (!--top_rated[i]->tc_ref) {
           ck_free(top_rated[i]->trace_mini);
           top_rated[i]->trace_mini = 0;
         }

       }

       /* Insert ourselves as the new winner. */

       top_rated[i] = q;
       q->tc_ref++;

       if (!q->trace_mini) {
         q->trace_mini = ck_alloc(MAP_SIZE >> 3);
         minimize_bits(q->trace_mini, trace_bits);
       }

       score_changed = 1;

     }

}

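Computing the score (assigning energy): the score mainly determines the number of fuzz iterations in the havoc stage.
The score starts at 100 and is then scaled according to exec_us, bitmap_size, handicap and depth.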

static u32 calculate_score(struct queue_entry* q) {

  u32 avg_exec_us = total_cal_us / total_cal_cycles;
  u32 avg_bitmap_size = total_bitmap_size / total_bitmap_entries;
  u32 perf_score = 100;

  /* Adjust score based on execution speed of this path, compared to the
     global average. Multiplier ranges from 0.1x to 3x. Fast inputs are
     less expensive to fuzz, so we're giving them more air time. */

  if (q->exec_us * 0.1 > avg_exec_us) perf_score = 10;
  else if (q->exec_us * 0.25 > avg_exec_us) perf_score = 25;
  else if (q->exec_us * 0.5 > avg_exec_us) perf_score = 50;
  else if (q->exec_us * 0.75 > avg_exec_us) perf_score = 75;
  else if (q->exec_us * 4 < avg_exec_us) perf_score = 300;
  else if (q->exec_us * 3 < avg_exec_us) perf_score = 200;
  else if (q->exec_us * 2 < avg_exec_us) perf_score = 150;

  /* Adjust score based on bitmap size. The working theory is that better
     coverage translates to better targets. Multiplier from 0.25x to 3x. */

  if (q->bitmap_size * 0.3 > avg_bitmap_size) perf_score *= 3;
  else if (q->bitmap_size * 0.5 > avg_bitmap_size) perf_score *= 2;
  else if (q->bitmap_size * 0.75 > avg_bitmap_size) perf_score *= 1.5;
  else if (q->bitmap_size * 3 < avg_bitmap_size) perf_score *= 0.25;
  else if (q->bitmap_size * 2 < avg_bitmap_size) perf_score *= 0.5;
  else if (q->bitmap_size * 1.5 < avg_bitmap_size) perf_score *= 0.75;

  /* Adjust score based on handicap. Handicap is proportional to how late
     in the game we learned about this path. Latecomers are allowed to run
     for a bit longer until they catch up with the rest. */

  if (q->handicap >= 4) {

    perf_score *= 4;
    q->handicap -= 4;

  } else if (q->handicap) {

    perf_score *= 2;
    q->handicap--;

  }

  /* Final adjustment based on input depth, under the assumption that fuzzing
     deeper test cases is more likely to reveal stuff that can't be
     discovered with traditional fuzzers. */

  switch (q->depth) {

    case 0 ... 3:   break;
    case 4 ... 7:   perf_score *= 2; break;
    case 8 ... 13:  perf_score *= 3; break;
    case 14 ... 25: perf_score *= 4; break;
    default:        perf_score *= 5;

  }

  /* Make sure that we don't go over limit. */

  if (perf_score > HAVOC_MAX_MULT * 100) perf_score = HAVOC_MAX_MULT * 100;

  return perf_score;

}

3.2 AFLFast improvements

Search Strategy

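Prefer small $s(i)$: $s(i)$ is the number of times seed $t_i$ has previously been chosen from the queue.
Prefer small $f(i)$: $f(i)$ is the number of generated inputs that exercise path i, i.e. the fuzz frequency of that path.

On top of AFL's logic, AFLFast first compares s(i) (`fuzz_level` in the code) and f(i) (`n_fuzz`, passed through `next_p2`), and only then compares (execution time × input length):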

static void update_bitmap_score(struct queue_entry* q) {

  u32 i;
  u32 fuzz_level = q->fuzz_level;
  u64 fuzz_p2      = next_p2 (q->n_fuzz);
  u64 fav_factor = q->exec_us * q->len;

  /* For every byte set in trace_bits[], see if there is a previous winner,
     and how it compares to us. */

  for (i = 0; i < MAP_SIZE; i++)

    if (trace_bits[i]) {

       if (top_rated[i]) {

         u32 top_rated_fuzz_level = top_rated[i]->fuzz_level;
         u64 top_rated_fuzz_p2    = next_p2 (top_rated[i]->n_fuzz);
         u64 top_rated_fav_factor = top_rated[i]->exec_us * top_rated[i]->len;

         if(fuzz_level > top_rated_fuzz_level) continue;
         else if (fuzz_level == top_rated_fuzz_level){
          if(fuzz_p2 > top_rated_fuzz_p2) continue;
          else if (fuzz_p2 == top_rated_fuzz_p2) {
            if (fav_factor > top_rated_fav_factor) continue;
          }
        }

         /* Looks like we're going to win. Decrease ref count for the
            previous winner, discard its trace_bits[] if necessary. */

         if (!--top_rated[i]->tc_ref) {
           ck_free(top_rated[i]->trace_mini);
           top_rated[i]->trace_mini = 0;
         }

       }

       /* Insert ourselves as the new winner. */

       top_rated[i] = q;
       q->tc_ref++;

       if (!q->trace_mini) {
         q->trace_mini = ck_alloc(MAP_SIZE >> 3);
         minimize_bits(q->trace_mini, trace_bits);
       }

       score_changed = 1;

     }

}
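
The three-level comparison above can be isolated into a small predicate. The sketch below is a simplified restatement, not AFLFast code: `struct seed_info` and `beats` are hypothetical names, and the body of `next_p2` is an assumption (round up to the next power of two), since its definition is not shown in the excerpt.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed behaviour of next_p2: smallest power of two >= val
   (valid for val up to 2^63). */
static uint64_t next_p2(uint64_t val) {
  uint64_t ret = 1;
  while (val > ret) ret <<= 1;
  return ret;
}

/* Hypothetical reduced view of a queue entry: only the compared fields. */
struct seed_info {
  uint32_t fuzz_level; /* s(i): times the seed was chosen */
  uint64_t n_fuzz;     /* f(i): inputs that exercised its path */
  uint64_t exec_us;    /* execution time */
  uint64_t len;        /* input length */
};

/* Returns 1 if candidate c should replace the current winner w. */
static int beats(const struct seed_info *c, const struct seed_info *w) {
  if (c->fuzz_level != w->fuzz_level) return c->fuzz_level < w->fuzz_level;
  uint64_t cp = next_p2(c->n_fuzz), wp = next_p2(w->n_fuzz);
  if (cp != wp) return cp < wp;
  return c->exec_us * c->len <= w->exec_us * w->len;
}
```

Because f(i) is compared after rounding to a power of two, two seeds whose path frequencies are 5 and 7 count as tied on that level, and the original (execution time × input length) criterion decides between them.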

Power Schedules

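AFLFast models CGF as a Markov chain. Suppose the current seed exercises path i, and fuzzing it produces an input that exercises path j with probability $p_{ij}$. Then the expected number of test cases that must be generated from path i before path j is reached is $\frac {1}{p_{ij}}$, which is the energy the seed should ideally receive.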
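
The reciprocal can be checked with a short derivation. It assumes that each input generated from the seed independently reaches path j with probability $p_{ij}$, and writes $X_{ij}$ (a symbol introduced here for illustration) for the number of inputs generated up to and including the first one that reaches path j. $X_{ij}$ then follows a geometric distribution:

$$
E[X_{ij}] = \sum_{k=1}^{\infty} k\,(1-p_{ij})^{k-1}\,p_{ij} = \frac{1}{p_{ij}}
$$

For example, with $p_{ij}=0.001$ about 1000 inputs are needed on average. Since $p_{ij}$ is unknown in practice, the power schedules described next approximate this energy from $s(i)$ and $f(i)$.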

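$p(i) = E(s(i),f(i))$

Here $p(i)$ is a function of $s(i)$ and $f(i)$, where $s(i)$ is the number of times seed $t_i$ has previously been chosen from queue $T$, and $f(i)$ is the frequency of path i (the number of generated inputs that exercise it). $\frac {f(i)}{n}$, with n the total number of generated inputs, is the maximum likelihood estimator of the probability that the CGF generates an input exercising path i. This leads to the following power schedules: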

Exploitation-based constant schedule (EXPLOIT)：

$p(i)=α(i)$, where $α(i)$ is AFL's implementation of the assign_energy step of the algorithm, i.e. the score AFL computes from exec_us, bitmap_size, handicap and depth

Exploration-based constant schedule (EXPLORE):

$p(i)=\frac {α(i)}{β}$, where $α(i)$ is AFL's original score and $β>1$ is a constant

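Cut-Off Exponential (COE):

$$
p(i)=
\begin{cases}
0, & \text{if } f(i)>μ\\
\min\left(\frac {α(i)}{β}\cdot 2^{s(i)},\ M\right), & \text{otherwise}
\end{cases}
$$

Here $α(i)$ keeps the fuzzer's original judgment, $β>1$ is a constant, $s(i)$ is defined as above, and μ is the mean amount of fuzz that exercises a discovered path, $μ=\frac {\sum _{i\in S^+}f(i)}{|S^+|}$, where $S^+$ is the set of discovered paths. Intuitively, paths with $f(i)>μ$ are high-frequency paths. Because they receive a lot of fuzz even when other seeds are being fuzzed, they are treated as low priority and are not fuzzed at all until they fall below the mean again. The constant M is an upper bound on the number of inputs generated per fuzzing iteration.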

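Exponential schedule (FAST):

$p(i)=\min\left(\frac {α(i)}{β}\cdot \frac {2^{s(i)}}{f(i)},\ M\right)$

Unlike COE, when $f(i)>μ$ this power schedule does not stop fuzzing $t_i$ altogether. Instead, it fuzzes $t_i$ in inverse proportion to $f(i)$, the amount of fuzz that exercises path i. The $f(i)$ in the denominator favors seeds $t_i$ whose path has not received much fuzz in the past and is therefore more likely to lie in a low-frequency region, while the exponential growth in $s(i)$ lets more and more energy flow to low-frequency paths. (In other words, the less often path i has been exercised, the more energy its seed receives and the more it is mutated. In the code, $s(i)$ turns out to be the number of rounds the seed has been fuzzed, `fuzz_level`, which is usually small, which is why it can sit in the exponent.)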

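Linear schedule (LINEAR):

$p(i)=\min\left(\frac {α(i)}{β}\cdot \frac {s(i)}{f(i)},\ M\right)$

Quadratic schedule (QUAD):

$p(i)=\min\left(\frac {α(i)}{β}\cdot \frac {s(i)^2}{f(i)},\ M\right)$

In every schedule except the two constant ones, the number of inputs generated for a seed is AFL's original result multiplied by a coefficient that grows with $s(i)$ (exponentially for COE and FAST, linearly for LINEAR, quadratically for QUAD). The result is capped at a maximum M.

With these schedules, AFLFast controls how many times each seed is fuzzed, so that queue scheduling gives low-frequency paths more opportunities, spreads the fuzzing effort more evenly, and steers it toward paths that may hide vulnerabilities.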
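
To compare the schedules side by side, the sketch below evaluates the formulas above directly in floating point. It is an illustration of the formulas, not AFLFast's implementation (which uses integer arithmetic); the function and parameter names are hypothetical, and treating $f(i)=0$ as 1 is an assumption made to avoid division by zero.

```c
#include <assert.h>
#include <stdint.h>

enum schedule { EXPLOIT, EXPLORE, COE, FAST, LIN, QUAD };

static double min_d(double a, double b) { return a < b ? a : b; }

/* 2^s as a double; s is clamped to 62 to keep the shift well defined. */
static double two_pow(uint32_t s) {
  if (s > 62) s = 62;
  return (double)((uint64_t)1 << s);
}

/* Energy p(i) for one seed.
   alpha: AFL's original score, beta: constant, s: s(i), f: f(i),
   mu: mean of f over discovered paths, M: upper bound. */
static double energy(enum schedule sch, double alpha, double beta,
                     uint32_t s, uint64_t f, double mu, double M) {
  double base = alpha / beta;
  double freq = f == 0 ? 1.0 : (double)f; /* assumption: avoid f = 0 */
  switch (sch) {
    case EXPLOIT: return alpha;
    case EXPLORE: return base;
    case COE:     return (double)f > mu ? 0.0 : min_d(base * two_pow(s), M);
    case FAST:    return min_d(base * two_pow(s) / freq, M);
    case LIN:     return min_d(base * (double)s / freq, M);
    case QUAD:    return min_d(base * (double)s * (double)s / freq, M);
  }
  return 0.0;
}
```

With α = 100, β = 2, s = 3, f = 4, μ = 10 and M = 1000, the schedules give EXPLOIT 100, EXPLORE 50, COE 400, FAST 100, LINEAR 37.5 and QUAD 112.5. Raising f to 20 (above μ) drops COE to 0, while FAST still assigns 20.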

static u32 calculate_score(struct queue_entry* q) {

  u32 avg_exec_us = total_cal_us / total_cal_cycles;
  u32 avg_bitmap_size = total_bitmap_size / total_bitmap_entries;
  u32 perf_score = 100;

  /* Adjust score based on execution speed of this path, compared to the
     global average. Multiplier ranges from 0.1x to 3x. Fast inputs are
     less expensive to fuzz, so we're giving them more air time. */

  if (q->exec_us * 0.1 > avg_exec_us) perf_score = 10;
  else if (q->exec_us * 0.25 > avg_exec_us) perf_score = 25;
  else if (q->exec_us * 0.5 > avg_exec_us) perf_score = 50;
  else if (q->exec_us * 0.75 > avg_exec_us) perf_score = 75;
  else if (q->exec_us * 4 < avg_exec_us) perf_score = 300;
  else if (q->exec_us * 3 < avg_exec_us) perf_score = 200;
  else if (q->exec_us * 2 < avg_exec_us) perf_score = 150;

  /* Adjust score based on bitmap size. The working theory is that better
     coverage translates to better targets. Multiplier from 0.25x to 3x. */

  if (q->bitmap_size * 0.3 > avg_bitmap_size) perf_score *= 3;
  else if (q->bitmap_size * 0.5 > avg_bitmap_size) perf_score *= 2;
  else if (q->bitmap_size * 0.75 > avg_bitmap_size) perf_score *= 1.5;
  else if (q->bitmap_size * 3 < avg_bitmap_size) perf_score *= 0.25;
  else if (q->bitmap_size * 2 < avg_bitmap_size) perf_score *= 0.5;
  else if (q->bitmap_size * 1.5 < avg_bitmap_size) perf_score *= 0.75;

  /* Adjust score based on handicap. Handicap is proportional to how late
     in the game we learned about this path. Latecomers are allowed to run
     for a bit longer until they catch up with the rest. */

  if (q->handicap >= 4) {

    perf_score *= 4;
    q->handicap -= 4;

  } else if (q->handicap) {

    perf_score *= 2;
    q->handicap--;

  }

  /* Final adjustment based on input depth, under the assumption that fuzzing
     deeper test cases is more likely to reveal stuff that can't be
     discovered with traditional fuzzers. */

  switch (q->depth) {

    case 0 ... 3:   break;
    case 4 ... 7:   perf_score *= 2; break;
    case 8 ... 13:  perf_score *= 3; break;
    case 14 ... 25: perf_score *= 4; break;
    default:        perf_score *= 5;

  }

  u64 fuzz = q->n_fuzz;
  u64 fuzz_total;

  u32 n_paths, fuzz_mu;
  u32 factor = 1;

  switch (schedule) {

    case EXPLORE: 
      break;

    case EXPLOIT:
      factor = MAX_FACTOR;
      break;

    case COE:
      fuzz_total = 0;
      n_paths = 0;

      struct queue_entry *queue_it = queue;	
      while (queue_it) {
        fuzz_total += queue_it->n_fuzz;
        n_paths ++;
        queue_it = queue_it->next;
      }

      fuzz_mu = fuzz_total / n_paths;
      if (fuzz <= fuzz_mu) {
        if (q->fuzz_level < 16)
          factor = ((u32) (1 << q->fuzz_level));
        else 
          factor = MAX_FACTOR;
      } else {
        factor = 0;
      }
      break;
    
    case FAST:
      if (q->fuzz_level < 16) {
         factor = ((u32) (1 << q->fuzz_level)) / (fuzz == 0 ? 1 : fuzz); 
      } else
        factor = MAX_FACTOR / (fuzz == 0 ? 1 : next_p2 (fuzz));
      break;

    case LIN:
      factor = q->fuzz_level / (fuzz == 0 ? 1 : fuzz); 
      break;

    case QUAD:
      factor = q->fuzz_level * q->fuzz_level / (fuzz == 0 ? 1 : fuzz);
      break;

    default:
      PFATAL ("Unkown Power Schedule");
  }
  if (factor > MAX_FACTOR) 
    factor = MAX_FACTOR;

  perf_score *= factor / POWER_BETA;

  /* Make sure that we don't go over limit. */

  if (perf_score > HAVOC_MAX_MULT * 100) perf_score = HAVOC_MAX_MULT * 100;

  return perf_score;

}
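
One detail of the code above is easy to miss: `factor` is computed with integer division, so it can round down to 0. The sketch below reproduces only the FAST branch to show this. `fast_factor` is a hypothetical helper, the body of `next_p2` is assumed (round up to a power of two), and `MAX_FACTOR` is set to 32 as an assumption, because its definition does not appear in the excerpt.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_FACTOR 32 /* assumption: value not shown in the quoted source */

/* Assumed behaviour of next_p2: smallest power of two >= val. */
static uint64_t next_p2(uint64_t val) {
  uint64_t ret = 1;
  while (val > ret) ret <<= 1;
  return ret;
}

/* Integer factor of the FAST schedule, mirroring the quoted branch. */
static uint32_t fast_factor(uint32_t fuzz_level, uint64_t n_fuzz) {
  uint64_t fuzz = n_fuzz == 0 ? 1 : n_fuzz;
  uint64_t factor;
  if (fuzz_level < 16)
    factor = ((uint64_t)1 << fuzz_level) / fuzz; /* 2^s(i) / f(i) */
  else
    factor = MAX_FACTOR / next_p2(fuzz);
  if (factor > MAX_FACTOR) factor = MAX_FACTOR;  /* the cap M */
  return (uint32_t)factor;
}
```

For instance, `fast_factor(5, 4)` is 8, while `fast_factor(3, 16)` is 0 because 8 / 16 truncates. In the quoted function a factor of 0 makes `perf_score` 0, so under this reading a seed on a heavily exercised path gets no energy until its `fuzz_level` has grown enough.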