Driller Source Code Reading Notes (Part 1)

Driller source code: https://github.com/shellphish/driller

The example given in the repository is:

import driller

d = driller.Driller("./CADET_00001",  # path to the target binary
                    b"racecar",       # initial testcase (must be bytes on Python 3)
                    b"\xff" * 65535,  # AFL bitmap with no discovered transitions
                    )

new_inputs = d.drill()

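The CADET_00001 binary is not provided, so you can write a small test program of your own. Note that on Python 3 both the testcase and the AFL bitmap must be bytes.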
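The third argument is an AFL bitmap. As far as I know, AFL's "virgin" map starts out as all 0xff and a byte changes away from 0xff once the corresponding edge has been hit, which is why "\xff" * 65535 means "no discovered transitions". The sketch below illustrates that convention together with the edge hashing used by AFL's QEMU mode. That DrillerCore indexes the bitmap exactly this way is an assumption, and the helper names (afl_loc, edge_index, edge_seen) are hypothetical.

```python
MAP_SIZE = 1 << 16  # AFL's default map size; an assumption for this sketch


def afl_loc(addr: int, map_size: int = MAP_SIZE) -> int:
    """Hash a basic-block address the way AFL's QEMU mode does."""
    return ((addr >> 4) ^ (addr << 8)) & (map_size - 1)


def edge_index(prev_addr: int, cur_addr: int, map_size: int = MAP_SIZE) -> int:
    """Bitmap slot of the prev -> cur edge: cur_loc ^ (prev_loc >> 1)."""
    return afl_loc(cur_addr, map_size) ^ (afl_loc(prev_addr, map_size) >> 1)


def edge_seen(virgin: bytes, prev_addr: int, cur_addr: int) -> bool:
    """In a virgin map, 0xff means the edge has never been hit."""
    return virgin[edge_index(prev_addr, cur_addr, len(virgin))] != 0xFF


empty = b"\xff" * MAP_SIZE  # like the example's bitmap: nothing discovered yet
print(edge_seen(empty, 0x400000, 0x400010))  # False: every edge looks new
```

The masking only keeps indices in range when the map size is a power of two, which is why the sketch uses 65536 rather than the example's 65535.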

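The typical use case is as follows. Once fuzzing reaches a plateau, a fuzzer seed is used as an input constraint for symbolic execution. The unexplored branches along that seed's path are then solved, and the results are fed back to the fuzzer as new seeds so that it can fuzz the newly reached paths.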

The drill method of the Driller class is:

    def drill(self):
        """
        Perform the drilling, finding more code coverage based off our existing input base.
        """

        # Don't re-trace the same input.
        if self.redis and self.redis.sismember(self.identifier + '-traced', self.input):
            return -1

        # Write out debug info if desired.
        if l.level == logging.DEBUG and config.DEBUG_DIR:
            self._write_debug_info()
        elif l.level == logging.DEBUG and not config.DEBUG_DIR:
            l.warning("Debug directory is not set. Will not log fuzzing bitmap.")

        # Update traced.
        if self.redis:
            self.redis.sadd(self.identifier + '-traced', self.input)

        list(self._drill_input())

        if self.redis:
            return len(self._generated)
        else:
            return self._generated

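Many of these parameters are not set in our example, so this part can be skipped for now. Roughly, redis is used to avoid re-tracing the same input. Note the list(self._drill_input()) call in the middle: self._drill_input is a generator function, so wrapping it in list() forces it to iterate until its loop finishes. Without redis, drill returns the set self._generated itself.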
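Why the list() call matters can be shown with a stripped-down, hypothetical stand-in (ToyDriller is not part of Driller). Only the generator/side-effect pattern is kept: _drill_input records each result in self._generated as it yields it, so nothing is recorded unless something iterates the generator.

```python
class ToyDriller:
    """Hypothetical stand-in keeping only Driller's generator pattern."""

    def __init__(self):
        self._generated = set()

    def _drill_input(self):
        # Each yielded item is also recorded as a side effect.
        for i in range(3):
            item = ((1, 0x1000, 0x1000 + i), bytes([i]))
            self._generated.add(item)
            yield item

    def drill(self):
        # Calling a generator function only builds a generator object;
        # list() iterates it to exhaustion so the loop body actually runs.
        list(self._drill_input())
        return self._generated


d = ToyDriller()
d._drill_input()           # nothing happens: the generator is never iterated
print(len(d._generated))   # 0
print(len(d.drill()))      # 3
```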

The core logic is in self._drill_input, which is analyzed piece by piece below:

        r = tracer.qemu_runner.QEMURunner(self.binary, self.input, argv=self.argv)
        p = angr.Project(self.binary)
        for addr, proc in self._hooks.items():
            p.hook(addr, proc)
            l.debug("Hooking %#x -> %s...", addr, proc.display_name)

        if p.loader.main_object.os == 'cgc':
            p.simos.syscall_library.update(angr.SIM_LIBRARIES['cgcabi_tracer'])

            s = p.factory.entry_state(stdin=angr.SimFileStream, flag_page=r.magic, mode='tracing')
        else:
            s = p.factory.full_init_state(stdin=angr.SimFileStream, mode='tracing')

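The tracer module that provides r is angr's tool for tracing concrete executions; see https://github.com/angr/tracer. QEMURunner runs the binary under QEMU on self.input. The rest of _drill_input uses what it records: the basic-block trace (r.trace), crash information (r.crash_mode, r.crash_addr) and r.magic, which is passed as the flag page for CGC binaries.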

self._hooks maps addresses to hook to the procedures executed in their place.

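p.loader.main_object.os is the operating system the binary targets; angr appears to give CGC binaries a dedicated 'cgc' label. CGC binaries also get the cgcabi_tracer syscall library and start from entry_state with the flag page. For an ordinary binary, full_init_state builds the initial state in 'tracing' mode, with stdin modeled as an angr.SimFileStream.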

        s.preconstrainer.preconstrain_file(self.input, s.posix.stdin, True)

        simgr = p.factory.simulation_manager(s, save_unsat=True, hierarchy=False, save_unconstrained=r.crash_mode)

        t = angr.exploration_techniques.Tracer(trace=r.trace, crash_addr=r.crash_addr, copy_states=True)
        self._core = angr.exploration_techniques.DrillerCore(trace=r.trace, fuzz_bitmap=self.fuzz_bitmap)

        simgr.use_technique(t)
        simgr.use_technique(angr.exploration_techniques.Oppologist())
        simgr.use_technique(self._core)

        self._set_concretizations(simgr.one_active)

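Next, preconstrainer.preconstrain_file adds preconstraints for symbolic execution. Constraints added by the preconstrainer can be removed later. preconstrain_file constrains the contents of a file: here it constrains s.posix.stdin (the symbolic input) to self.input (the testcase passed to Driller). As a result, execution first runs with the testcase as input, which pins down the testcase's own execution path. Its signature is shown below; when set_length is True, the length of content is used as the file's length.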

preconstrain_file(content, simfile, set_length=False) method of angr.state_plugins.preconstrainer.SimStatePreconstrainer instance
    Preconstrain the contents of a file.
    
    :param content:     The content to preconstrain the file to. Can be a bytestring or a list thereof.
    :param simfile:     The actual simfile to preconstrain
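The idea of removable preconstraints can be illustrated with a toy model. This is not angr's implementation, which works on claripy expressions and a real solver. Here ToyState is hypothetical: constraints are plain predicates over the stdin bytes, and "solving" is brute force over a 2-byte input. With the preconstraints in place, the branch the seed did not take is unsatisfiable. Once they are removed, the solver can produce an input that takes it.

```python
from itertools import product


class ToyState:
    """Hypothetical toy model: constraints are predicates over stdin bytes."""

    def __init__(self, length):
        self.length = length
        self.constraints = []     # path constraints gathered while executing
        self.preconstraints = []  # removable constraints pinning stdin to the seed

    def preconstrain_file(self, content):
        # Pin every input byte to the seed, like preconstrain_file(self.input, stdin).
        for i, b in enumerate(content):
            self.preconstraints.append(lambda inp, i=i, b=b: inp[i] == b)

    def remove_preconstraints(self):
        self.preconstraints.clear()

    def solve(self):
        """Brute-force the first satisfying input (only sane for tiny lengths)."""
        checks = self.constraints + self.preconstraints
        for cand in product(range(256), repeat=self.length):
            inp = bytes(cand)
            if all(c(inp) for c in checks):
                return inp
        return None


s = ToyState(2)
s.preconstrain_file(b"ra")
s.constraints.append(lambda inp: inp[0] == ord("r"))  # branch the seed took
print(s.solve())   # b'ra'
s.constraints.append(lambda inp: inp[1] == ord("x"))  # branch the seed did not take
print(s.solve())   # None: contradicts the preconstraint inp[1] == ord("a")
s.remove_preconstraints()
print(s.solve())   # b'rx'
```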

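Next, simulation_manager creates a SimulationManager. save_unsat=True puts unsatisfiable states into the "unsat" stash instead of discarding them. hierarchy=False turns off state-hierarchy tracking. By default (hierarchy=None), angr creates a StateHierarchy object to track the relationships between states, and passing False disables it.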
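The stash bookkeeping described here can be modeled in a few lines. The model also covers the "one_" attribute shortcut that _drill_input uses a little later (simgr.one_active). ToyManager and add_successor are hypothetical names for this sketch, not angr's classes.

```python
class ToyManager:
    """Hypothetical stand-in for SimulationManager's stash bookkeeping."""

    def __init__(self, states, save_unsat=False):
        self.stashes = {"active": list(states), "deadended": [], "unsat": []}
        self.save_unsat = save_unsat

    def __getattr__(self, name):
        # Only reached when normal attribute lookup fails.
        stashes = self.__dict__.get("stashes", {})
        if name in stashes:
            return stashes[name]                 # simgr.active -> the whole list
        if name.startswith("one_") and name[4:] in stashes:
            return stashes[name[4:]][0]          # simgr.one_active -> first state
        raise AttributeError(name)

    def add_successor(self, state, satisfiable):
        # Unsatisfiable successors are dropped unless save_unsat is set.
        if satisfiable:
            self.stashes["active"].append(state)
        elif self.save_unsat:
            self.stashes["unsat"].append(state)


m = ToyManager(["s0"], save_unsat=True)
m.add_successor("s1", satisfiable=False)
print(m.one_active, m.unsat)   # s0 ['s1']
```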

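Then a Tracer is created from angr.exploration_techniques.Tracer, an exploration technique that makes angr follow the path taken by a concrete input. The trace parameter is the basic-block trace, and crash_addr is the address at which the input crashed the program (if it did). Setting copy_states=True makes the missed states visible.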

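DrillerCore is an exploration technique that symbolically follows an input and looks for new state transitions, using fuzz_bitmap to judge which transitions the fuzzer has already seen. It must be used together with the Tracer technique, and it puts its results into the 'diverted' stash.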
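The division of labour between Tracer and DrillerCore can be pictured with the toy function below. This is a hypothetical sketch, not angr's code: addresses are plain ints, and a set of already-covered edges stands in for the AFL bitmap. The real DrillerCore also has to make sure a diverted state is satisfiable, which is skipped here. The encounters set plays the role of self._core.encounters, which _writeout touches later.

```python
def trace_step(trace, idx, successors, covered, encounters):
    """Hypothetical toy of one Tracer + DrillerCore step.

    trace:      basic-block addresses recorded by the concrete run
    idx:        position of the current block in the trace
    successors: addresses the symbolic state can reach from trace[idx]
    covered:    (prev, cur) edges the fuzzer already hit (bitmap stand-in)
    encounters: edges already diverted during this run
    Returns the next trace index and the list of diverted addresses.
    """
    prev_addr, expected = trace[idx], trace[idx + 1]
    diverted = []
    for addr in successors:
        if addr == expected:
            continue                  # on-trace successor: Tracer keeps following it
        edge = (prev_addr, addr)      # off-trace ("missed") successor
        if edge not in covered and edge not in encounters:
            encounters.add(edge)      # remember it so it is diverted only once
            diverted.append(addr)
    return idx + 1, diverted


trace = [0x10, 0x20, 0x30]
encounters = set()
print(trace_step(trace, 0, [0x20, 0x28], set(), encounters))  # (1, [40])
print(trace_step(trace, 0, [0x20, 0x28], set(), encounters))  # (1, [])
```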

use_technique attaches an exploration technique to the SimulationManager. Besides Tracer and DrillerCore, the Oppologist technique is also added; it forces uncooperative code through qemu.

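Finally, simgr.one_active fetches one active state from the SimulationManager. Every key in simgr.stashes can be accessed directly as an attribute of simgr, and prefixing the key with "one_" returns a single state from that stash. The state is passed to Driller's own _set_concretizations method, whose code is: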

    @staticmethod
    def _set_concretizations(state):
        if state.project.loader.main_object.os == 'cgc':
            flag_vars = set()
            for b in state.cgc.flag_bytes:
                flag_vars.update(b.variables)

            state.unicorn.always_concretize.update(flag_vars)

        # Let's put conservative thresholds for now.
        state.unicorn.concretization_threshold_memory = 50000
        state.unicorn.concretization_threshold_registers = 50000

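For CGC binaries, the symbolic variables of the flag-page bytes (state.cgc.flag_bytes) are added to unicorn's always_concretize set. For every binary, concretization_threshold_memory and concretization_threshold_registers are then set. They are the number of times symbolic memory and symbolic registers, respectively, may kick execution out of unicorn before angr starts concretizing them. As the code comment says, 50000 is a deliberately conservative (high) value.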
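A hypothetical policy function makes the meaning of the two knobs concrete. It counts kick-outs per symbolic variable and compares the count with the threshold, while always_concretize short-circuits the check. The function name, the variable names and the exact ">" comparison are assumptions of this sketch, not angr's unicorn plugin code.

```python
from collections import Counter


def should_concretize(var, kickouts, threshold, always_concretize):
    """Hypothetical policy: concretize immediately if var is listed in
    always_concretize (Driller adds the CGC flag-page variables there);
    otherwise only after it has kicked execution out of unicorn more
    than `threshold` times."""
    if var in always_concretize:
        return True
    return kickouts[var] > threshold


kickouts = Counter()
for _ in range(3):
    kickouts["stdin_0"] += 1   # three bail-outs caused by the same variable

print(should_concretize("stdin_0", kickouts, 50000, set()))       # False
print(should_concretize("stdin_0", kickouts, 2, set()))           # True
print(should_concretize("flag_0", kickouts, 50000, {"flag_0"}))   # True
```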

Next, let's look at the final loop in _drill_input:

        while simgr.active and simgr.one_active.globals['trace_idx'] < len(r.trace) - 1:
            simgr.step()

            # Check here to see if a crash has been found.
            if self.redis and self.redis.sismember(self.identifier + '-finished', True):
                return

            if 'diverted' not in simgr.stashes:
                continue

            while simgr.diverted:
                state = simgr.diverted.pop(0)
                l.debug("Found a diverted state, exploring to some extent.")
                w = self._writeout(state.history.bbl_addrs[-1], state)
                if w is not None:
                    yield w
                for i in self._symbolic_explorer_stub(state):
                    yield i
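Stripped of angr, the inner while simgr.diverted loop follows the pattern below. drain_diverted is a hypothetical helper, and writeout and explore stand in for _writeout and _symbolic_explorer_stub. Note that explore runs for every diverted state, including the ones for which writeout returned None.

```python
def drain_diverted(diverted, writeout, explore):
    """Hypothetical mirror of the inner loop: pop diverted states in FIFO
    order, yield writeout's result when it is not None, then yield
    everything explore() produces for the same state."""
    while diverted:
        state = diverted.pop(0)
        w = writeout(state)
        if w is not None:
            yield w
        yield from explore(state)   # same as: for i in explore(state): yield i


out = list(drain_diverted(
    [1, 2],
    writeout=lambda s: None if s == 1 else ("writeout", s),
    explore=lambda s: [("explore", s)],
))
print(out)  # [('explore', 1), ('writeout', 2), ('explore', 2)]
```

yield from is the Python 3 shorthand for the original's for/yield loop. For long queues, collections.deque with popleft() would avoid the O(n) cost of list.pop(0).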

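The loop condition is simgr.active and simgr.one_active.globals['trace_idx'] < len(r.trace) - 1. The loop continues while the SimulationManager still has an active state and that state has not yet reached the end of the trace. Each simgr.step() advances the active states by one engine step, which is normally a basic block rather than a single instruction. If a redis set (identifier + '-finished') shows that a crash has already been found, the generator returns. If simgr.stashes has no 'diverted' key (no new state transition has appeared), the loop moves on to the next iteration. Otherwise, every diverted state is popped in turn. state.history.bbl_addrs[-1] gives the address of the basic block executed just before the state's current address, and _writeout tries to generate an input from it. _writeout is: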

    def _writeout(self, prev_addr, state):
        generated = state.posix.stdin.load(0, state.posix.stdin.pos)
        generated = state.solver.eval(generated, cast_to=bytes)

        key = (len(generated), prev_addr, state.addr)

        # Checks here to see if the generation is worth writing to disk.
        # If we generate too many inputs which are not really different we'll seriously slow down AFL.
        if self._in_catalogue(*key):
            self._core.encounters.remove((prev_addr, state.addr))
            return None

        else:
            self._add_to_catalogue(*key)

        l.debug("[%s] dumping input for %#x -> %#x.", self.identifier, prev_addr, state.addr)

        self._generated.add((key, generated))

        if self.redis:
            # Publish it out in real-time so that inputs get there immediately.
            channel = self.identifier + '-generated'

            self.redis.publish(channel, pickle.dumps({'meta': key, 'data': generated, "tag": self.tag}))

        else:
            l.debug("Generated: %s", binascii.hexlify(generated))

        return (key, generated)

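In short, _writeout loads the stdin bytes consumed so far (from offset 0 up to state.posix.stdin.pos) and solves them into a concrete bytes value. It then uses _in_catalogue to decide whether the input is worth keeping. In fact, _in_catalogue only checks anything when redis is configured; otherwise it simply returns False. If the (length, previous address, current address) key is already catalogued, the transition is removed from self._core.encounters and None is returned. Otherwise, the key and the input are added to self._generated, giving the fuzzer a new seed.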
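The deduplication key can be sketched with an in-memory set. This is a hypothetical variant (ToyCatalogue is not part of Driller): Driller itself only consults a catalogue through redis, so without redis every diverted input is kept. Because the key is (length, prev_addr, addr), two inputs of equal length that drive the same transition count as duplicates even if their bytes differ.

```python
class ToyCatalogue:
    """Hypothetical in-memory catalogue keyed like Driller's _writeout."""

    def __init__(self):
        self._seen = set()
        self.generated = set()

    def writeout(self, data: bytes, prev_addr: int, addr: int):
        key = (len(data), prev_addr, addr)
        if key in self._seen:
            return None                # same length and same transition: skip
        self._seen.add(key)
        self.generated.add((key, data))
        return key, data


c = ToyCatalogue()
print(c.writeout(b"rx", 0x10, 0x28))   # ((2, 16, 40), b'rx')
print(c.writeout(b"ry", 0x10, 0x28))   # None
print(c.writeout(b"rxx", 0x10, 0x28))  # ((3, 16, 40), b'rxx')
```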

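Then _symbolic_explorer_stub is run on the diverted state, regardless of whether _writeout returned None (the for loop is not in an else branch). The stub builds a new SimulationManager from a copy of the state, with LAZY_SOLVES removed. It steps that manager until accumulated = steps * (len(active) + len(deadended)) reaches 1024 or no active states remain. It then merges the deadended states into active and checks each one for satisfiability. For each satisfiable state, it gets an input through _writeout, which adds it to self._generated.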

    def _symbolic_explorer_stub(self, state):
        # Create a new simulation manager and step it forward up to 1024
        # accumulated active states or steps.
        steps = 0
        accumulated = 1

        p = state.project
        state = state.copy()
        try:
            state.options.remove(angr.options.LAZY_SOLVES)
        except KeyError:
            pass
        simgr = p.factory.simulation_manager(state, hierarchy=False)

        l.debug("[%s] started symbolic exploration at %s.", self.identifier, time.ctime())

        while len(simgr.active) and accumulated < 1024:
            simgr.step()
            steps += 1

            # Dump all inputs.
            accumulated = steps * (len(simgr.active) + len(simgr.deadended))

        l.debug("[%s] stopped symbolic exploration at %s.", self.identifier, time.ctime())

        # DO NOT think this is the same as using only the deadended stashes. this merges deadended and active
        simgr.stash(from_stash='deadended', to_stash='active')
        for dumpable in simgr.active:
            try:
                if dumpable.satisfiable():
                    w = self._writeout(dumpable.history.bbl_addrs[-1], dumpable)
                    if w is not None:
                        yield w

            # If the state we're trying to dump wasn't actually satisfiable.
            except IndexError:
                pass
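Here is a toy model of only the stub's stopping rule, which gives a feel for the 1024 budget. It assumes (hypothetically) that every step multiplies the number of active states by a fixed branching factor and that no state dead-ends. Because accumulated grows with both the step count and the number of live states, heavy branching stops exploration after a handful of steps. A single straight-line state, by contrast, can run for up to 1024 steps.

```python
def explore_budget(branching=2, limit=1024):
    """Toy model of the stopping rule in _symbolic_explorer_stub:
    stop once steps * (active + deadended) reaches `limit` or nothing
    is active. Assumes each step multiplies active states by `branching`."""
    steps, accumulated = 0, 1
    active, deadended = 1, 0        # this toy never dead-ends a state
    while active and accumulated < limit:
        active *= branching
        steps += 1
        accumulated = steps * (active + deadended)
    return steps, active, accumulated


print(explore_budget(2))  # (8, 256, 2048)
print(explore_budget(1))  # (1024, 1, 1024)
```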

 
