Driller source: https://github.com/shellphish/driller
The sample it provides is:
import driller
d = driller.Driller("./CADET_00001", # path to the target binary
                    "racecar", # initial testcase
                    "\xff" * 65535, # AFL bitmap with no discovered transitions
                   )
new_inputs = d.drill()
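No target binary is supplied, so write any small test program yourself. Note that under Python 3 both the testcase and the AFL bitmap must be bytes (for example b"racecar").
The typical use case: once fuzzing hits a plateau, the fuzzer's seeds are fed to symbolic execution as input constraints, the unexplored branches along each seed's path are solved for, and the resulting inputs are handed back to the fuzzer as new seeds so it can fuzz the new paths further.
The drill method of the Driller class looks like this: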
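As a small illustration of the bytes requirement, the sketch below converts both arguments up front. The helper as_bytes is hypothetical (it is not part of Driller); it encodes with latin-1 so that "\xff" stays a single 0xff byte, whereas UTF-8 would turn each one into two bytes.

```python
def as_bytes(value):
    """Hypothetical helper: accept str or bytes and always return bytes."""
    if isinstance(value, bytes):
        return value
    # latin-1 maps code points 0-255 to the same byte values one-to-one.
    return value.encode("latin-1")


testcase = as_bytes("racecar")
bitmap = as_bytes("\xff" * 65535)  # same length as in the sample

# For comparison: a naive UTF-8 encoding doubles the size of the bitmap.
utf8_bitmap = ("\xff" * 65535).encode("utf-8")
```

The resulting testcase and bitmap values can then be passed to driller.Driller in place of the string literals.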
def drill(self):
    """
    Perform the drilling, finding more code coverage based off our existing input base.
    """
    # Don't re-trace the same input.
    if self.redis and self.redis.sismember(self.identifier + '-traced', self.input):
        return -1
    # Write out debug info if desired.
    if l.level == logging.DEBUG and config.DEBUG_DIR:
        self._write_debug_info()
    elif l.level == logging.DEBUG and not config.DEBUG_DIR:
        l.warning("Debug directory is not set. Will not log fuzzing bitmap.")
    # Update traced.
    if self.redis:
        self.redis.sadd(self.identifier + '-traced', self.input)
    list(self._drill_input())
    if self.redis:
        return len(self._generated)
    else:
        return self._generated
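Many of these parameters are not set in the simple sample, so most of this can be skipped for now; redis is used essentially to avoid re-tracing the same input. Note the list(self._drill_input()) in the middle: self._drill_input is a generator function, so wrapping the call in list() drives the generator to exhaustion.
The core is the self._drill_input function, analyzed piece by piece: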
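The effect of list() on a generator can be seen in isolation. This is a generic Python sketch, not Driller code; produce and collected are made-up names standing in for _drill_input and self._generated.

```python
def produce(results):
    """Generator function: nothing in the body runs until it is iterated."""
    for value in (1, 2, 3):
        results.add(value)  # side effect, comparable to filling self._generated
        yield value


collected = set()
gen = produce(collected)   # calling it only creates the generator object
before = set(collected)    # still empty at this point
list(gen)                  # exhausts the generator, running every side effect
after = set(collected)
```

The list itself is thrown away; only the side effects matter, which is why drill() reads its result from self._generated afterwards.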
r = tracer.qemu_runner.QEMURunner(self.binary, self.input, argv=self.argv)
p = angr.Project(self.binary)
for addr, proc in self._hooks.items():
    p.hook(addr, proc)
    l.debug("Hooking %#x -> %s...", addr, proc.display_name)
if p.loader.main_object.os == 'cgc':
    p.simos.syscall_library.update(angr.SIM_LIBRARIES['cgcabi_tracer'])
    s = p.factory.entry_state(stdin=angr.SimFileStream, flag_page=r.magic, mode='tracing')
else:
    s = p.factory.full_init_state(stdin=angr.SimFileStream, mode='tracing')
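r is a QEMURunner from tracer, the package angr uses for tracing; it runs the binary concretely on the input and records the basic-block trace (r.trace). See https://github.com/angr/tracer
self._hooks maps the addresses to hook to the procedures that run in their place.
p.loader.main_object.os is the operating system the binary targets; angr has a dedicated label for CGC binaries. For an ordinary program the else branch builds the initial state with full_init_state, with stdin set to angr.SimFileStream.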
s.preconstrainer.preconstrain_file(self.input, s.posix.stdin, True)
simgr = p.factory.simulation_manager(s, save_unsat=True, hierarchy=False, save_unconstrained=r.crash_mode)
t = angr.exploration_techniques.Tracer(trace=r.trace, crash_addr=r.crash_addr, copy_states=True)
self._core = angr.exploration_techniques.DrillerCore(trace=r.trace, fuzz_bitmap=self.fuzz_bitmap)
simgr.use_technique(t)
simgr.use_technique(angr.exploration_techniques.Oppologist())
simgr.use_technique(self._core)
self._set_concretizations(simgr.one_active)
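Next, preconstrainer.preconstrain_file adds constraints ahead of symbolic execution. Constraints added through the preconstrainer can be removed again later. preconstrain_file constrains the contents of a file: here it pins s.posix.stdin (the input seen by symbolic execution) to self.input (the testcase passed to Driller). Execution therefore first follows the testcase as its input, which fixes the path the testcase itself takes. The help text for preconstrain_file is shown below; when set_length is True, the length of content is also used as the length of the file.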
preconstrain_file(content, simfile, set_length=False) method of angr.state_plugins.preconstrainer.SimStatePreconstrainer instance
    Preconstrain the contents of a file.
    :param content: The content to preconstrain the file to. Can be a bytestring or a list thereof.
    :param simfile: The actual simfile to preconstrain
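To see why it matters that preconstraints can be removed, here is a toy model in plain Python. It does not use angr: the BRANCHES list, the brute-force satisfiable function, and drill_seed are all hypothetical stand-ins for a target program, an SMT solver, and the drilling step. With the seed pinned, no flipped branch can ever be satisfied; once the pin is dropped, only the path constraints remain and a new input can be solved for.

```python
from itertools import product

# Hypothetical two-byte "program": each entry is one branch condition.
BRANCHES = [
    lambda d: d[0] == ord("r"),
    lambda d: d[1] > 0x7F,
]


def satisfiable(constraints, length=2):
    """Brute-force stand-in for an SMT solver: return a model or None."""
    for candidate in product(range(256), repeat=length):
        if all(c(candidate) for c in constraints):
            return bytes(candidate)
    return None


def drill_seed(seed):
    # Preconstraints pin every input byte to the seed's value.
    preconstraints = [lambda d, i=i, b=b: d[i] == b for i, b in enumerate(seed)]
    path = []    # constraints of the branch directions the seed actually took
    found = []
    for branch in BRANCHES:
        negated = lambda d, f=branch: not f(d)
        taken = branch(seed)
        follow, flip = (branch, negated) if taken else (negated, branch)
        # With the preconstraints attached, the flipped branch is never satisfiable.
        assert satisfiable(preconstraints + path + [flip]) is None
        # Without them, only the path constraints restrict the solver.
        model = satisfiable(path + [flip])
        if model is not None:
            found.append(model)
        path.append(follow)
    return found


new_inputs = drill_seed(b"ra")
```

For the seed b"ra" this yields one input that fails the first check and one that passes it while taking the other side of the second check.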
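Next, simulation_manager builds a SimulationManager. save_unsat keeps unsatisfiable states in the "unsat" stash. The hierarchy argument takes a StateHierarchy object that tracks the relationships between states; a default one is created when the argument is left as None, so passing False here turns that tracking off.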
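Then angr.exploration_techniques.Tracer defines a Tracer object, an exploration technique that follows an angr path using a concrete input. The trace parameter is the basic-block trace, crash_addr is the address at which the input crashes the program (if it does), and enabling copy_states makes the missed states visible.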
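DrillerCore is an exploration technique that symbolically follows an input while looking for new state transitions. It has to be used together with the Tracer technique, and it places its results in the 'diverted' stash.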
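What counts as a "new" transition depends on the AFL bitmap passed in as fuzz_bitmap. The sketch below is an assumption-laden illustration, not DrillerCore's code: the function names are made up, the hash is modeled on the edge hash of AFL's QEMU mode, and the exact arithmetic in DrillerCore may differ. It also leaves out the satisfiability check a real implementation needs before accepting a state.

```python
MAP_SIZE = 65536  # assumption: AFL's default map size (a power of two)


def _loc(addr, map_size=MAP_SIZE):
    """Hash a basic-block address into the bitmap index space (AFL-QEMU style)."""
    return ((addr >> 4) ^ (addr << 8)) & (map_size - 1)


def edge_index(prev_addr, cur_addr, map_size=MAP_SIZE):
    """Index of the bitmap byte that records the edge prev_addr -> cur_addr."""
    return _loc(cur_addr, map_size) ^ (_loc(prev_addr, map_size) >> 1)


def seen_by_fuzzer(bitmap, prev_addr, cur_addr):
    """A byte other than 0xff means the fuzzer has already hit this edge."""
    return bitmap[edge_index(prev_addr, cur_addr, len(bitmap))] != 0xFF


def is_new_transition(bitmap, encounters, prev_addr, cur_addr):
    """True only for edges unknown to both the fuzzer and earlier drilling."""
    edge = (prev_addr, cur_addr)
    if seen_by_fuzzer(bitmap, prev_addr, cur_addr) or edge in encounters:
        return False
    encounters.add(edge)
    return True


bitmap = bytearray(b"\xff" * MAP_SIZE)  # no transitions discovered yet
encounters = set()
first = is_new_transition(bitmap, encounters, 0x400100, 0x400180)
repeat = is_new_transition(bitmap, encounters, 0x400100, 0x400180)
```

Here first is True because the bitmap is untouched, while repeat is False because the edge has been recorded in encounters.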
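use_technique attaches an exploration technique to the SimulationManager. Tracer and DrillerCore are added this way, along with Oppologist, which forces uncooperative code through qemu.
After that, one_active fetches one active state from the SimulationManager (every key in simgr.stashes can be read directly as an attribute of simgr, or with the prefix "one_" to take a single state from that stash) and passes it to the custom _set_concretizations method, whose code is: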
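The attribute-style access to stashes can be mimicked with __getattr__. MiniManager below is a hypothetical toy written for this note, not angr's implementation; it only shows the lookup convention described above.

```python
class MiniManager:
    """Hypothetical stand-in for SimulationManager's stash attribute lookup."""

    def __init__(self, **stashes):
        self.stashes = {name: list(states) for name, states in stashes.items()}

    def __getattr__(self, name):
        # Only reached when normal attribute lookup fails.
        stashes = self.__dict__.get("stashes", {})
        if name in stashes:
            return stashes[name]
        if name.startswith("one_") and name[4:] in stashes:
            return stashes[name[4:]][0]
        raise AttributeError(name)


mgr = MiniManager(active=["s0", "s1"], deadended=[])
whole_stash = mgr.active      # the full list of states in the stash
single = mgr.one_active       # just the first state of that stash
```

A stash that does not exist yet, such as 'diverted' here, is simply not an attribute, which is why the drilling loop tests for the key in simgr.stashes before reading simgr.diverted.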
@staticmethod
def _set_concretizations(state):
    if state.project.loader.main_object.os == 'cgc':
        flag_vars = set()
        for b in state.cgc.flag_bytes:
            flag_vars.update(b.variables)
        state.unicorn.always_concretize.update(flag_vars)
    # Let's put conservative thresholds for now.
    state.unicorn.concretization_threshold_memory = 50000
    state.unicorn.concretization_threshold_registers = 50000
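For an ordinary (non-CGC) program only the two threshold assignments take effect. concretization_threshold_memory and concretization_threshold_registers are the number of times a symbolic value coming from memory or from registers, respectively, may be rejected by unicorn before concretization starts.
Now for the final loop in _drill_input: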
while simgr.active and simgr.one_active.globals['trace_idx'] < len(r.trace) - 1:
    simgr.step()
    # Check here to see if a crash has been found.
    if self.redis and self.redis.sismember(self.identifier + '-finished', True):
        return
    if 'diverted' not in simgr.stashes:
        continue
    while simgr.diverted:
        state = simgr.diverted.pop(0)
        l.debug("Found a diverted state, exploring to some extent.")
        w = self._writeout(state.history.bbl_addrs[-1], state)
        if w is not None:
            yield w
        for i in self._symbolic_explorer_stub(state):
            yield i
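The loop runs while simgr.active is non-empty and simgr.one_active.globals['trace_idx'] < len(r.trace) - 1, that is, while the SimulationManager still has an active state and that state has not yet consumed the whole trace. Each iteration calls simgr.step(), which advances the active states by one step (normally one basic block, not a single instruction). If simgr.stashes has no 'diverted' key, no new state transition has been found and the loop moves on to the next iteration. Otherwise each diverted state is popped in turn, state.history.bbl_addrs[-1] gives the address of the block executed just before it, and _writeout tries to generate an input from it. _writeout reads as follows: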
def _writeout(self, prev_addr, state):
    generated = state.posix.stdin.load(0, state.posix.stdin.pos)
    generated = state.solver.eval(generated, cast_to=bytes)
    key = (len(generated), prev_addr, state.addr)
    # Checks here to see if the generation is worth writing to disk.
    # If we generate too many inputs which are not really different we'll seriously slow down AFL.
    if self._in_catalogue(*key):
        self._core.encounters.remove((prev_addr, state.addr))
        return None
    else:
        self._add_to_catalogue(*key)
    l.debug("[%s] dumping input for %#x -> %#x.", self.identifier, prev_addr, state.addr)
    self._generated.add((key, generated))
    if self.redis:
        # Publish it out in real-time so that inputs get there immediately.
        channel = self.identifier + '-generated'
        self.redis.publish(channel, pickle.dumps({'meta': key, 'data': generated, "tag": self.tag}))
    else:
        l.debug("Generated: %s", binascii.hexlify(generated))
    return (key, generated)
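In short, it loads what has been read from stdin, solves the constraints and casts the result to bytes, then uses _in_catalogue to decide whether the input is worth keeping (in practice _in_catalogue only performs a real check when redis is configured; otherwise it returns False straight away). If the key (input length, previous address, current address) is already catalogued, the transition is removed from self._core.encounters and None is returned. Otherwise the key and the input are added to self._generated, giving a new input to use as a fuzzing seed.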
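The deduplication idea can be sketched without redis. The Catalogue class below is a hypothetical in-memory stand-in written for this note, not Driller's implementation; it keys each input on (length, previous address, current address) as _writeout does.

```python
class Catalogue:
    """Hypothetical in-memory stand-in for the redis-backed catalogue."""

    def __init__(self):
        self._seen = set()
        self.generated = []

    def writeout(self, prev_addr, addr, data):
        key = (len(data), prev_addr, addr)
        if key in self._seen:
            return None  # same length and same edge: not worth emitting again
        self._seen.add(key)
        self.generated.append((key, data))
        return key, data


cat = Catalogue()
kept = cat.writeout(0x400100, 0x400180, b"racecar")
dropped = cat.writeout(0x400100, 0x400180, b"racecaR")   # same key as above
shorter = cat.writeout(0x400100, 0x400180, b"race")      # different length
```

Two inputs of the same length that reach the same edge are treated as equivalent, which keeps the fuzzer's queue from filling up with near-duplicates.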
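Whether or not _writeout returned None, _symbolic_explorer_stub is then run on the diverted state. It creates a fresh SimulationManager and steps it until no active state remains or the accumulated count, steps × (active + deadended states), reaches 1024. It then moves the deadended states into active, checks each resulting state for satisfiability, and for the satisfiable ones calls _writeout to obtain an input and add it to self._generated.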
def _symbolic_explorer_stub(self, state):
    # Create a new simulation manager and step it forward up to 1024
    # accumulated active states or steps.
    steps = 0
    accumulated = 1
    p = state.project
    state = state.copy()
    try:
        state.options.remove(angr.options.LAZY_SOLVES)
    except KeyError:
        pass
    simgr = p.factory.simulation_manager(state, hierarchy=False)
    l.debug("[%s] started symbolic exploration at %s.", self.identifier, time.ctime())
    while len(simgr.active) and accumulated < 1024:
        simgr.step()
        steps += 1
        # Dump all inputs.
        accumulated = steps * (len(simgr.active) + len(simgr.deadended))
    l.debug("[%s] stopped symbolic exploration at %s.", self.identifier, time.ctime())
    # DO NOT think this is the same as using only the deadended stashes. this merges deadended and active
    simgr.stash(from_stash='deadended', to_stash='active')
    for dumpable in simgr.active:
        try:
            if dumpable.satisfiable():
                w = self._writeout(dumpable.history.bbl_addrs[-1], dumpable)
                if w is not None:
                    yield w
        # If the state we're trying to dump wasn't actually satisfiable.
        except IndexError:
            pass
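The stopping rule of the exploration loop is easier to judge with numbers. The function below replays only that rule on made-up stash sizes; steps_until_budget and its arguments are hypothetical and no symbolic execution takes place.

```python
def steps_until_budget(active_after_step, deadended_after_step, budget=1024):
    """Replay the loop's stopping rule on precomputed stash sizes.

    active_after_step[i] and deadended_after_step[i] are the stash sizes
    after step i + 1. Returns the number of steps taken before the loop exits.
    """
    steps = 0
    accumulated = 1
    active = 1  # exploration starts from the single copied state
    while active and accumulated < budget and steps < len(active_after_step):
        active = active_after_step[steps]
        deadended = deadended_after_step[steps]
        steps += 1
        accumulated = steps * (active + deadended)
    return steps


# Straight-line code: one active state throughout.
linear = steps_until_budget([1] * 2000, [0] * 2000)
# Every step doubles the number of active states.
doubling = steps_until_budget([2 ** k for k in range(1, 20)], [0] * 19)
# The only state dead-ends on the second step.
dead_end = steps_until_budget([1, 0], [0, 1])
```

Under these assumptions a path without branches runs for 1024 steps, while one whose state count doubles at every step is cut off after 8, so the budget mostly limits how far exploration goes once states start to multiply.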