Driller Source Code Reading Notes (Part 1)

WateranFire

Category: FUZZ. Tags: security vulnerabilities, hybrid testing, Driller, angr

Article link: https://blog.csdn.net/WateranFire/article/details/111689758

Driller source code: https://github.com/shellphish/driller

The example provided is:

import driller

d = driller.Driller("./CADET_00001",  # path to the target binary
                    "racecar", # initial testcase
                    "\xff" * 65535, # AFL bitmap with no discovered transitions
                   )

new_inputs = d.drill()

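No target binary ships with the example, so you can write a small test program of your own. Note that under Python 3 both the testcase and the AFL bitmap must be bytes.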
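To make the bytes requirement concrete, here is a minimal sketch of how the two arguments could be built under Python 3. The variable names are only illustrative, and the Driller call is left as a comment because it needs driller and a real target binary.

```python
# Python 3: both arguments are bytes literals, not str.
testcase = b"racecar"
fuzz_bitmap = b"\xff" * 65535  # every byte is 0xff: no transitions discovered yet

# A str built from "\xff" is not interchangeable with the bytes version:
# it holds the code point U+00FF, which UTF-8 encodes as two bytes.
as_text = "\xff" * 65535
utf8_len = len(as_text.encode("utf-8"))            # twice as long as the bitmap
same_as_latin1 = as_text.encode("latin-1") == fuzz_bitmap

# The call itself would then look like this (requires driller and a binary):
# d = driller.Driller("./CADET_00001", testcase, fuzz_bitmap)
```

If a str does end up in your hands, encoding it as latin-1 gives one byte per character, which is why the last comparison holds.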

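The typical scenario: once fuzzing hits a bottleneck, a seed from the fuzzer is used as an input constraint for symbolic execution. Driller then solves for the unexplored branches along that seed's path and hands the results back to the fuzzer as new seeds, so the fuzzer can go on to fuzz the new paths.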
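The idea can be sketched with a toy that has nothing to do with angr. Everything here is hypothetical: toy_target stands in for the binary, its return value stands in for the branch trace, and a brute-force search over one byte stands in for the constraint solver. It only shows the shape of "follow the seed's path, then flip each branch on it".

```python
def toy_target(data: bytes) -> list:
    """Toy program: checks data[0] == 'A', then data[1] == 'B'.

    Returns the branch decisions taken, as (branch_id, taken) pairs.
    """
    trace = []
    for i, magic in enumerate(b"AB"):
        taken = len(data) > i and data[i] == magic
        trace.append((i, taken))
        if not taken:
            break
    return trace


def toy_drill(seed: bytes) -> set:
    """For each branch on the seed's path, search for an input that flips it."""
    new_inputs = set()
    seed_trace = toy_target(seed)
    for idx, (branch, taken) in enumerate(seed_trace):
        # Brute force over one byte replaces the SMT solver in this toy.
        for value in range(256):
            candidate = bytearray(seed.ljust(branch + 1, b"\x00"))
            candidate[branch] = value
            trace = toy_target(bytes(candidate))
            if len(trace) > idx and trace[idx] == (branch, not taken):
                new_inputs.add(bytes(candidate))
                break
    return new_inputs


first_round = toy_drill(b"racecar")
second_round = toy_drill(b"Aacecar")
```

The first round finds an input that passes the first check; feeding that input back in as a seed lets the second round reach the second check, which mirrors the fuzzer and Driller taking turns.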

The drill method of the Driller class is as follows:

    def drill(self):
        """
        Perform the drilling, finding more code coverage based off our existing input base.
        """

        # Don't re-trace the same input.
        if self.redis and self.redis.sismember(self.identifier + '-traced', self.input):
            return -1

        # Write out debug info if desired.
        if l.level == logging.DEBUG and config.DEBUG_DIR:
            self._write_debug_info()
        elif l.level == logging.DEBUG and not config.DEBUG_DIR:
            l.warning("Debug directory is not set. Will not log fuzzing bitmap.")

        # Update traced.
        if self.redis:
            self.redis.sadd(self.identifier + '-traced', self.input)

        list(self._drill_input())

        if self.redis:
            return len(self._generated)
        else:
            return self._generated

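Many of these parameters are unset in the example, so most of this can be skipped for now; in short, redis is used to avoid re-tracing the same input. Note the "list(self._drill_input())" in the middle: self._drill_input is a generator function, so wrapping the call in list() forces it to iterate until its loop finishes.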
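A small standalone sketch of that behaviour, using a made-up generator named produce: calling a generator function runs none of its body, and list() is what drives it to completion so that its side effects actually happen.

```python
def produce(results: set):
    """Generator function: the body runs only while something iterates over it."""
    for i in range(3):
        results.add(i)  # side effect, comparable to Driller filling self._generated
        yield i


collected = set()
gen = produce(collected)   # creates the generator object; the body has not run
before = set(collected)    # still empty at this point
list(gen)                  # drain the generator; the yielded values are discarded
after = set(collected)     # now holds every value added along the way
```

This is why drill() can ignore the list it builds and simply return self._generated afterwards.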

The core is the self._drill_input function, analyzed here piece by piece:

        r = tracer.qemu_runner.QEMURunner(self.binary, self.input, argv=self.argv)
        p = angr.Project(self.binary)
        for addr, proc in self._hooks.items():
            p.hook(addr, proc)
            l.debug("Hooking %#x -> %s...", addr, proc.display_name)

        if p.loader.main_object.os == 'cgc':
            p.simos.syscall_library.update(angr.SIM_LIBRARIES['cgcabi_tracer'])

            s = p.factory.entry_state(stdin=angr.SimFileStream, flag_page=r.magic, mode='tracing')
        else:
            s = p.factory.full_init_state(stdin=angr.SimFileStream, mode='tracing')

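The tracer module behind r is what angr uses for tracing; see https://github.com/angr/tracer

self._hooks holds the addresses to hook and the functions that execute in their place.

p.loader.main_object.os is normally the operating system the binary targets; angr appears to have a dedicated 'cgc' tag for CGC binaries. For an ordinary program, a full initial state is created with angr.SimFileStream as its input stream.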

        s.preconstrainer.preconstrain_file(self.input, s.posix.stdin, True)

        simgr = p.factory.simulation_manager(s, save_unsat=True, hierarchy=False, save_unconstrained=r.crash_mode)

        t = angr.exploration_techniques.Tracer(trace=r.trace, crash_addr=r.crash_addr, copy_states=True)
        self._core = angr.exploration_techniques.DrillerCore(trace=r.trace, fuzz_bitmap=self.fuzz_bitmap)

        simgr.use_technique(t)
        simgr.use_technique(angr.exploration_techniques.Oppologist())
        simgr.use_technique(self._core)

        self._set_concretizations(simgr.one_active)

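Next, preconstrainer.preconstrain_file adds constraints to the symbolic execution up front; constraints added by the preconstrainer can be removed later. preconstrain_file constrains the contents of a file: here it constrains s.posix.stdin (the input during symbolic execution) to self.input (the testcase passed to Driller). Execution therefore first follows the testcase as its input, which pins down the testcase's own path. The documentation for preconstrain_file is shown below; when set_length is True, the length of content is used as the length of the file.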

preconstrain_file(content, simfile, set_length=False) method of angr.state_plugins.preconstrainer.SimStatePreconstrainer instance
    Preconstrain the contents of a file.
    
    :param content:     The content to preconstrain the file to. Can be a bytestring or a list thereof.
    :param simfile:     The actual simfile to preconstrain

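Next, simulation_manager creates a SimulationManager. save_unsat stores unsatisfiable states in the "unsat" stash. The hierarchy argument is the StateHierarchy object used to track the relationships between states; a default one is created only when the argument is left as None, so passing False here turns hierarchy tracking off.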

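Then angr.exploration_techniques.Tracer defines a Tracer object, an exploration technique that follows an angr path with a concrete input. The trace parameter is the basic-block trace, crash_addr is the address at which the input crashed the program (if it did), and setting copy_states to True makes it possible to see the missed states.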

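DrillerCore is an exploration technique that symbolically follows an input while looking for new state transitions. It has to be used together with the Tracer technique, and its results are placed in the 'diverted' stash.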
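Whether a transition counts as "new" depends on the AFL bitmap passed in as fuzz_bitmap. The sketch below shows one way such a lookup can work, modeled on the edge hashing AFL uses in QEMU mode. It is an assumption that DrillerCore performs this exact computation, so check the angr source before relying on it. The helper names are hypothetical, and the bitmap size is assumed to be a power of two.

```python
MAP_SIZE = 1 << 16  # assumed power-of-two bitmap size (AFL's default map size)


def edge_index(prev_addr: int, cur_addr: int, map_size: int = MAP_SIZE) -> int:
    """Map a (previous block, current block) transition to a bitmap index."""
    prev_loc = (prev_addr >> 4) ^ (prev_addr << 8)
    prev_loc &= map_size - 1
    prev_loc >>= 1  # shift so that A->B and B->A land on different indexes

    cur_loc = (cur_addr >> 4) ^ (cur_addr << 8)
    cur_loc &= map_size - 1
    return cur_loc ^ prev_loc


def fuzzer_has_seen(bitmap: bytes, prev_addr: int, cur_addr: int) -> bool:
    """A byte still equal to 0xff means the fuzzer never hit that transition."""
    return bitmap[edge_index(prev_addr, cur_addr, len(bitmap))] != 0xFF


empty_bitmap = b"\xff" * MAP_SIZE
idx = edge_index(0x8048000, 0x8048010)

touched = bytearray(empty_bitmap)
touched[idx] = 0xFE  # pretend the fuzzer recorded a hit on this transition
```

This also explains the example at the top: a bitmap made entirely of 0xff bytes tells Driller that every transition is still undiscovered.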

use_technique adds an exploration technique to a SimulationManager. Here it adds Tracer and DrillerCore, plus the Oppologist technique, which forces uncooperative code through qemu.

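After that, one_active fetches one active state from the SimulationManager (every key in simgr.stashes can be accessed directly as an attribute of simgr, or with the "one_" prefix to take a single state out of that stash). The state is passed to the custom _set_concretizations method, whose code is as follows: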

    @staticmethod
    def _set_concretizations(state):
        if state.project.loader.main_object.os == 'cgc':
            flag_vars = set()
            for b in state.cgc.flag_bytes:
                flag_vars.update(b.variables)

            state.unicorn.always_concretize.update(flag_vars)

        # Let's put conservative thresholds for now.
        state.unicorn.concretization_threshold_memory = 50000
        state.unicorn.concretization_threshold_registers = 50000

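For an ordinary program only concretization_threshold_memory and concretization_threshold_registers are set (the flag-byte handling applies to CGC binaries only). These are, respectively, the number of times unicorn is allowed to reject execution because of symbolic memory or symbolic registers before concretization begins.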
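As an aside on the simgr.one_active access used to obtain that state: the attribute-style stash lookup can be mimicked with a few lines of plain Python. ToyManager below is a hypothetical stand-in written for illustration; it is not how angr implements SimulationManager.

```python
class ToyManager:
    """Toy mimic of stash access by attribute name; not angr's implementation."""

    def __init__(self, **stashes):
        self.stashes = {name: list(states) for name, states in stashes.items()}

    def __getattr__(self, name):
        # Python calls __getattr__ only when normal attribute lookup fails.
        stashes = self.__dict__.get("stashes", {})
        if name.startswith("one_") and name[4:] in stashes:
            return stashes[name[4:]][0]   # first state of that stash
        if name in stashes:
            return stashes[name]          # the whole stash as a list
        raise AttributeError(name)


mgr = ToyManager(active=["s0", "s1"], diverted=[])
whole_stash = mgr.active
single_state = mgr.one_active
```

The same pattern is what makes expressions such as simgr.diverted readable in the loop that follows.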

Next, the final loop of _drill_input:

        while simgr.active and simgr.one_active.globals['trace_idx'] < len(r.trace) - 1:
            simgr.step()

            # Check here to see if a crash has been found.
            if self.redis and self.redis.sismember(self.identifier + '-finished', True):
                return

            if 'diverted' not in simgr.stashes:
                continue

            while simgr.diverted:
                state = simgr.diverted.pop(0)
                l.debug("Found a diverted state, exploring to some extent.")
                w = self._writeout(state.history.bbl_addrs[-1], state)
                if w is not None:
                    yield w
                for i in self._symbolic_explorer_stub(state):
                    yield i

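The loop condition is simgr.active and simgr.one_active.globals['trace_idx'] < len(r.trace) - 1: the loop keeps running while the SimulationManager still has active states and the active state has not yet consumed the whole trace. Each iteration calls simgr.step(), which advances the active states by one step (one basic block by default, not a single instruction). If simgr.stashes has no 'diverted' key, meaning no new state transition has appeared, the loop moves on to the next iteration. Once a diverted state exists, it is popped, state.history.bbl_addrs[-1] gives the address of the last basic block that state executed, and the _writeout method tries to generate an input from it. _writeout looks like this: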

    def _writeout(self, prev_addr, state):
        generated = state.posix.stdin.load(0, state.posix.stdin.pos)
        generated = state.solver.eval(generated, cast_to=bytes)

        key = (len(generated), prev_addr, state.addr)

        # Checks here to see if the generation is worth writing to disk.
        # If we generate too many inputs which are not really different we'll seriously slow down AFL.
        if self._in_catalogue(*key):
            self._core.encounters.remove((prev_addr, state.addr))
            return None

        else:
            self._add_to_catalogue(*key)

        l.debug("[%s] dumping input for %#x -> %#x.", self.identifier, prev_addr, state.addr)

        self._generated.add((key, generated))

        if self.redis:
            # Publish it out in real-time so that inputs get there immediately.
            channel = self.identifier + '-generated'

            self.redis.publish(channel, pickle.dumps({'meta': key, 'data': generated, "tag": self.tag}))

        else:
            l.debug("Generated: %s", binascii.hexlify(generated))

        return (key, generated)

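In short, it loads the stdin contents the state has consumed, has the constraint solver evaluate them to concrete bytes, and uses _in_catalogue to decide whether the input is worth keeping (in practice _in_catalogue only checks anything when redis is configured; otherwise it simply returns False). The key (input length, previous address, current address) and the input are then added to self._generated, and this new input becomes a seed for the fuzzer.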
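The deduplication by key can be sketched without redis. ToyCatalogue below is hypothetical: it uses a local Python set in place of the Redis set, purely to show what filtering on (length, previous address, current address) does. Driller itself, as noted above, skips this check when redis is not configured.

```python
class ToyCatalogue:
    """Toy dedup of generated inputs, keyed the same way as in _writeout."""

    def __init__(self):
        self._seen = set()       # local stand-in for the Redis catalogue set
        self.generated = set()   # mirrors the role of self._generated

    def writeout(self, prev_addr: int, addr: int, data: bytes):
        key = (len(data), prev_addr, addr)
        if key in self._seen:
            return None          # same length and same transition: skip it
        self._seen.add(key)
        self.generated.add((key, data))
        return key, data


cat = ToyCatalogue()
first = cat.writeout(0x1000, 0x2000, b"abc")
duplicate = cat.writeout(0x1000, 0x2000, b"xyz")   # same key, different bytes
longer = cat.writeout(0x1000, 0x2000, b"abcd")     # different length, new key
```

Note that the key ignores the input's contents, so two different inputs of the same length that reach the same transition count as one.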

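After _writeout returns, whether or not it returned None, the loop runs the _symbolic_explorer_stub method. It creates a fresh SimulationManager and steps it until no active states remain or the accumulated count, steps multiplied by the number of active plus deadended states, reaches 1024. It then checks each state in the deadended and active stashes for satisfiability; for each satisfiable one it obtains an input via _writeout, which adds it to self._generated.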

    def _symbolic_explorer_stub(self, state):
        # Create a new simulation manager and step it forward up to 1024
        # accumulated active states or steps.
        steps = 0
        accumulated = 1

        p = state.project
        state = state.copy()
        try:
            state.options.remove(angr.options.LAZY_SOLVES)
        except KeyError:
            pass
        simgr = p.factory.simulation_manager(state, hierarchy=False)

        l.debug("[%s] started symbolic exploration at %s.", self.identifier, time.ctime())

        while len(simgr.active) and accumulated < 1024:
            simgr.step()
            steps += 1

            # Dump all inputs.
            accumulated = steps * (len(simgr.active) + len(simgr.deadended))

        l.debug("[%s] stopped symbolic exploration at %s.", self.identifier, time.ctime())

        # DO NOT think this is the same as using only the deadended stashes. this merges deadended and active
        simgr.stash(from_stash='deadended', to_stash='active')
        for dumpable in simgr.active:
            try:
                if dumpable.satisfiable():
                    w = self._writeout(dumpable.history.bbl_addrs[-1], dumpable)
                    if w is not None:
                        yield w

            # If the state we're trying to dump wasn't actually satisfiable.
            except IndexError:
                pass
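To get a feel for the 1024 bound in the loop above, the sketch below replays only the arithmetic of accumulated = steps * (active + deadended). The function steps_until_budget and the state-count callbacks are hypothetical, and the sketch ignores the other exit condition (no active states left).

```python
def steps_until_budget(states_after_step, budget: int = 1024) -> int:
    """Count loop iterations until steps * states reaches the budget.

    states_after_step(n) is a made-up model of how many active plus
    deadended states exist after n steps.
    """
    steps = 0
    accumulated = 1
    while accumulated < budget:
        steps += 1
        accumulated = steps * states_after_step(steps)
    return steps


# A single straight-line path: one state throughout.
linear_steps = steps_until_budget(lambda n: 1)

# A path that forks in two at every step: 2, 4, 8, ... states.
forking_steps = steps_until_budget(lambda n: 2 ** n)
```

Under these assumptions a straight-line path is followed for 1024 steps, while a path that doubles at every step is cut off after 8, so the bound limits breadth far more aggressively than depth.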