Background
[0] c = input()
[1] a = 6
[2] b = a + 6
[3] if c == 3:
[4]     a = 4
[5]     c = a + 1
[6] else:
[7]     a = 2
[8]     c = a + 2
[9] b = a + 2
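- Here we can use RD by placing a so-called observation point at [5] and asking: "Which values of the variable 'a' can we observe at [5]?" The answer is clearly 4.
- If we ask the same question at [8], the answer is 2, and asking it at observation point [9] yields both 4 and 2. If this last answer is confusing, remember that the technique is purely static: the analysis simply observes that, since c has no specific value, either branch of the 'if' can be taken, so a can be redefined as either 4 or 2.
- Variables are just abstractions of the programming language; they are stored in memory and registers. We can therefore use RD analysis to reason about those entities and get almost the same results.
- Consider the previous program in (completely unoptimized) assembly: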
[0] mov rcx, [rsp+0x8] ; moving variable 'c' in rcx
[1] mov rax, 6
[2] mov [rsp+0x10], rax ; setting var 'a'
[3] add rbx, rax, 6 ; defining variable 'b'
[4] mov [rsp+0x1C], rbx
[5] cmp rcx, 0x3 ; checking if var 'c' is 3 or not
[6] jne label2 ; if not, go to label2
[7] mov [rsp+0x10], 0x4
[8] mov rax, [rsp+0x10]
[9] add rcx, rax, 0x1
[10] mov [rsp+0x8], rcx
[11] jmp end
[12] label2:
[13] mov [rsp+0x10], 0x2
[14] mov rax, [rsp+0x10]
[15] add rcx, rax, 0x2
[16] mov [rsp+0x8], rcx
[17] end:
[18] mov rax, [rsp+0x10]
[19] add rbx, rax, 0x2
[20] mov [rsp+0x1C], rbx
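Now we need to define an observation point, which means working out which question to ask in order to get the result we want. For example, if we want to ask, as before, which values var 'a' can take in the branch where 'c' equals 3, we need to ask: "Which values can we observe for the rax register at [9]?"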
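To make the background concrete, here is a minimal sketch of a reaching-definitions pass over the toy pseudocode from the start of this section. It is a textbook fixed-point iteration, not how angr implements RD; the PROGRAM table, the ASSIGNED_CONSTANTS table and the function names are all made up for this illustration.

```python
from collections import defaultdict

# Toy CFG of the pseudocode: statement index -> (variable defined there, successors).
# Assumption for this sketch: every statement defines at most one variable.
PROGRAM = {
    0: ("c", [1]),
    1: ("a", [2]),
    2: ("b", [3]),
    3: (None, [4, 7]),  # if c == 3: statically, both branches are feasible
    4: ("a", [5]),
    5: ("c", [9]),
    7: ("a", [8]),
    8: ("c", [9]),
    9: ("b", []),
}

# Constants assigned to 'a' at each of its definition sites.
ASSIGNED_CONSTANTS = {1: 6, 4: 4, 7: 2}


def reaching_definitions(program):
    """For each statement, compute the (variable, definition site) pairs reaching its entry."""
    preds = defaultdict(list)
    for node, (_, succs) in program.items():
        for succ in succs:
            preds[succ].append(node)

    rd_in = {node: set() for node in program}
    rd_out = {node: set() for node in program}
    changed = True
    while changed:  # iterate until nothing changes (fixed point)
        changed = False
        for node in sorted(program):
            defined_var, _ = program[node]
            # Merge what flows out of every predecessor.
            new_in = set().union(*(rd_out[p] for p in preds[node]))
            # A new definition kills all earlier definitions of the same variable.
            new_out = {(var, site) for (var, site) in new_in if var != defined_var}
            if defined_var is not None:
                new_out.add((defined_var, node))
            if new_in != rd_in[node] or new_out != rd_out[node]:
                rd_in[node], rd_out[node] = new_in, new_out
                changed = True
    return rd_in


def observable_values_of_a(rd_in, point):
    """Values of 'a' that can be observed just before statement `point`."""
    return sorted(ASSIGNED_CONSTANTS[site] for (var, site) in rd_in[point] if var == "a")


rd_in = reaching_definitions(PROGRAM)
print(observable_values_of_a(rd_in, 5))  # [4]
print(observable_values_of_a(rd_in, 8))  # [2]
print(observable_values_of_a(rd_in, 9))  # [2, 4]
```

The observation point is simply the statement index passed to observable_values_of_a. Asking at [9] merges the definitions from both branches, which is why both 2 and 4 show up.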
Example
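Intuitively, we apply the same technique we would use to discover these dependencies by hand: look at the code and see how variables are defined and used within the function. In the function analyzed here, the variable v25 is defined by memb_alloc and then used by memcpy.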
Assembly:
.text:000037BA STRB R5, [R0,#0x1C]
.text:000037BC MOVS R3, #1
.text:000037BE STRB R3, [R0,#0x1D]
.text:000037C0 ADDS R0, #0xC
.text:000037C2 MOV R3, R9
.text:000037C4 ADDS R1, R3, #2
.text:000037C6 MOVS R2, #0x10
.text:000037C8 LDR R3, =(memcpy+1)
.text:000037CA BLX R3 ;memcpy <== OBSERVATION POINT
.text:000037CC ADDS R0, R4, #4 ; t
.text:000037CE MOVS R1, #0x12C00 ; interval
.text:000037D2 LDR R3, =(timer_set+1)
.text:000037D4 BLX R3 ; timer_set
.text:000037D6 MOVS R1, R4 ; item
.text:000037D8 LDR R0, =others_services_list ; list
.text:000037DA LDR R3, =(list_add+1)
.text:000037DC BLX R3 ; list_add
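The idea here is very simple: we set an observation point on the memcpy call (address 0x37CA) and then ask for the visible definitions of register r0 (the register that holds v25).
Let's look at the following code: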
import angr
import autoblob  # just to load the blob, more at https://github.com/subwire/autoblob
import angr.analyses.reaching_definitions.dep_graph as dep_graph

blob_path = "./blob.bin"
target_func_addr = 0x3609  # Odd addresses because ARM Thumb mode
call_to_memcpy = 0x37CB  # Odd addresses because ARM Thumb mode

print("Creating angr Project")
project = angr.Project(blob_path)

# Some standard options for the CFG
print("Creating binary CFG")
bin_cfg = project.analyses.CFG(resolve_indirect_jumps=True,
                               cross_references=True,
                               force_complete_scan=False,
                               normalize=True,
                               symbols=True)

# Getting the func object from the address
target_func = bin_cfg.functions.get_by_addr(target_func_addr)

# Starting the ReachingDefinition analysis
rd = project.analyses.ReachingDefinitions(subject=target_func,
                                          func_graph=target_func.graph,
                                          cc=target_func.calling_convention,
                                          observation_points=[("insn", call_to_memcpy, 0)],
                                          dep_graph=dep_graph.DepGraph())
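This runs the reaching-definitions analysis on the target function at 0x3609 only, using the call to memcpy at 0x37CB as the observation point (note that you can set multiple observation points if you wish).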
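Since several observation points can be passed at once, a small helper to build the tuples can be handy. The sketch below is plain Python and does not touch angr. The helper names are hypothetical, and I am assuming that the trailing 0 in the tuple means "observe the state before the instruction executes", which matches the results shown in this post.

```python
# Assumption: 0 selects the state *before* the instruction, as in the literal 0
# used in the observation point of the analysis call.
OBSERVE_BEFORE = 0


def thumb_address(addr):
    """Set the low bit, which is how Thumb-mode code addresses are written in this post."""
    return addr | 1


def insn_observation_points(addresses):
    """Build one ("insn", address, OBSERVE_BEFORE) tuple per instruction address."""
    return [("insn", thumb_address(addr), OBSERVE_BEFORE) for addr in addresses]


# The memcpy call at 0x37CA plus the ADDS at 0x37C0 that last writes r0.
points = insn_observation_points([0x37CA, 0x37C0])
print([(kind, hex(addr), when) for kind, addr, when in points])
# [('insn', '0x37cb', 0), ('insn', '0x37c1', 0)]
```

The resulting list has the same shape as the value given to observation_points, so it could be passed there directly.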
By inspecting the result object, we can get information about register r0:
# VEX offset is just how the VEX IR refers to registers
reg_vex_offset = project.arch.registers.get("r0", None)[0]
reg_defs = rd.one_result.register_definitions.get_objects_by_offset(reg_vex_offset)
print(reg_defs)
# RESULT:
# {<Definition {Atom:<Reg 8<4>>, Codeloc:<0x37c1 id=0x37bb[36]>,
# Data:DataSet<32>: {<Undefined>}}>}
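The result tells us that, before the call to memcpy, the latest instruction defining register r0 is at code location 0x37c1 (for the curious: id is the basic block address and 36 is the index of the VEX statement), but its value (in the DataSet) is Undefined. Why is that? To understand, let's look at the instruction at 0x37c1: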
.text:000037C0 ADDS R0, #0xC
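The definition of r0 comes from adding 0xC to the value r0 already holds. What is the value of r0 at that point? That is again a question for RD: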
rd2 = project.analyses.ReachingDefinitions(subject=target_func,
                                           func_graph=target_func.graph,
                                           cc=target_func.calling_convention,
                                           observation_points=[("insn", 0x37c1, 0)],
                                           dep_graph=dep_graph.DepGraph())
reg_defs2 = rd2.one_result.register_definitions.get_objects_by_offset(reg_vex_offset)
print(reg_defs2)
# RESULT:
# {<Definition {Tags:{
# <ReturnValueTag {Function: 0x3a3d,
# Metadata:{'tagged_by': 'SimEngineRDVEX._handle_function_cc'}}>
# },
# Atom:<Reg 8<4>>,
# Codeloc:<0x37b5 id=0x37b1[-2]>,
# Data:DataSet<32>: {<Undefined>}}>}
The previous definition of r0 is recorded at address 0x37b5, but looking at that address we don't see any definition of r0, do we?
.text:000037B0
.text:000037B0 loc_37B0 ; m
.text:000037B0 process_pt = R8 ; pt *
.text:000037B0 LDR R0, =registrations
.text:000037B2 LDR R3, =(memb_alloc+1)
.text:000037B4 BLX R3 ; memb_alloc <==
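Well, the RD analysis is quite smart: by leveraging the function's calling convention (CC), it knows that this function will return something and that, in the current CC, this "something" is returned in r0, so r0 is effectively redefined here. This information is integrated into the Definition object we get back. Indeed, we can see that the definition is tagged with a ReturnValueTag, which indicates not only that the content of r0 is the result of a function call, but also that the called function is 0x3a3d.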
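Now we can easily understand why the value observed at the call to memcpy is Undefined: RD does not know the value returned by function 0x3a3d; it only knows that r0 will be redefined by the call. It therefore assigns an Undefined value, and when the RD engine has to compute Undefined + 0xC, everything collapses into Undefined (for those familiar with data-flow analysis, this is basically merging to Top).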
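The "Undefined + 0xC is still Undefined" behavior can be sketched in a few lines. This is only an illustration of the idea: the Undefined class and abstract_add below are stand-ins written for this post, not angr's own classes or engine code.

```python
class Undefined:
    """Stand-in for the analysis' "unknown value" (Top). Hypothetical, not angr's class."""

    def __repr__(self):
        return "<Undefined>"


UNDEFINED = Undefined()  # single shared instance, so set membership works by identity


def abstract_add(dataset, constant, bits=32):
    """Add a constant to every possible value; an unknown operand stays unknown."""
    mask = (1 << bits) - 1
    return {
        UNDEFINED if isinstance(value, Undefined) else (value + constant) & mask
        for value in dataset
    }


# r0 right after the call to 0x3a3d: the return value is unknown.
r0_after_call = {UNDEFINED}
# ADDS R0, #0xC cannot recover any information about it.
print(abstract_add(r0_after_call, 0xC))

# A known value, by contrast, would stay known.
print(abstract_add({0x20000000}, 0xC) == {0x2000000C})
```

This is why the value alone is not enough here, and why the tags attached to the definitions matter.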
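This tagging mechanism for definitions was introduced only recently, and I believe it enables a lot of smart reasoning over these register definition chains. In particular, I am going to use it to solve the problem we discussed at the beginning.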
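Recall the problem: we are trying to identify the cases in which the first argument of memcpy (held in register r0 on this architecture) is the return value of another function. The idea is simple: automatically walk back the definition chain of the register we care about (using the dep_graph) until the definition is no longer Undefined (which in this case means the argument is a constant) or until we hit a definition node tagged with a ReturnValueTag.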
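Before touching angr, the walk-back can be tried out on a hand-made dependency graph. This is a toy sketch under my own assumptions: the DEFS dictionary and its field names are invented for the illustration and only mimic the two r0 definitions seen earlier (0x37c1 depending on 0x37b5, which is tagged as the return value of 0x3a3d).

```python
# Hypothetical miniature of a dependency graph: each definition records whether
# its value is unknown, which function it is the return value of (if any), and
# the definitions it was computed from (its predecessors).
DEFS = {
    "r0@0x37c1": {"undefined": True, "retval_of": None, "preds": ["r0@0x37b5"]},
    "r0@0x37b5": {"undefined": True, "retval_of": 0x3A3D, "preds": []},
    "r2@0x37c6": {"undefined": False, "retval_of": None, "preds": []},
}


def walk_back(defs, start):
    """Follow predecessors until we reach a return value or a constant."""
    found, seen, todo = set(), set(), [start]
    while todo:
        name = todo.pop()
        if name in seen:  # guard against cycles in the graph
            continue
        seen.add(name)
        node = defs[name]
        if node["retval_of"] is not None:
            found.add(("retval", node["retval_of"]))
        elif node["undefined"]:
            todo.extend(node["preds"])  # still unknown: keep walking back
        else:
            found.add(("int", None))  # known value: treat it as a constant
    return found


result = walk_back(DEFS, "r0@0x37c1")
print([(kind, hex(func)) for kind, func in result])  # [('retval', '0x3a3d')]
print(walk_back(DEFS, "r2@0x37c6"))  # {('int', None)}
```

The ("retval", address) and ("int", None) tuples are the two possible outcomes of the walk: either a function whose return value reaches the argument, or a constant.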
Final code:
import angr
import autoblob  # imported only so the blob can be loaded
import angr.analyses.reaching_definitions.dep_graph as dep_graph
from angr.knowledge_plugins.key_definitions.undefined import Undefined
from angr.knowledge_plugins.key_definitions.tag import ReturnValueTag
# Utility class to walk back the definitions graph.
class DefinitionExplorer():
    def __init__(self, project, rd_ddg_graph):
        self.project = project
        self.rd_ddg_graph = rd_ddg_graph

    def resolve_use_def(self, reg_def):
        # Now we need to analyze the definition for this atom
        reg_seen_defs = set()
        defs_to_check = set()
        defs_to_check.add(reg_def)
        # Cache of all seen nodes (Tie the knot)
        seen_defs = set()
        while len(defs_to_check) != 0:
            current_def = defs_to_check.pop()
            seen_defs.add(current_def)
            # Check if the current Definition has a tag
            def_value = self.check_definition_tag(current_def)
            # If def_value is not None we hit a "retval" and we collect it,
            # in the other case we need to check if it is Undefined, if yes gotta walk back.
            if def_value:
                reg_seen_defs.add(def_value)
            else:
                dataset = current_def.data
                # Boolean guard: do we have any undefined pointers?
                undefined_pointers = False
                # A value in DataSet can be "Int" or "Undefined"
                for data in dataset:
                    if isinstance(data, Undefined):
                        undefined_pointers = True
                # If we have undefined pointers (a.k.a. Top value) we need to process the predecessors.
                if undefined_pointers:
                    for pred in self.rd_ddg_graph.graph.predecessors(current_def):
                        if pred not in seen_defs:
                            defs_to_check.add(pred)
                else:
                    # This is a constant.
                    def_value = ("int", None)
                    reg_seen_defs.add(def_value)
        return reg_seen_defs

    # Checking the tag over a definition.
    def check_definition_tag(self, definition):
        if len(definition.tags) > 0:
            # Ok just take the first one as for now. Peek at it instead of pop(),
            # so the tag is not removed from the Definition.
            curr_tag = next(iter(definition.tags))
            if isinstance(curr_tag, ReturnValueTag):
                return ("retval", curr_tag.function)
            else:
                print(type(curr_tag))
        return None
# Path of the blob.
blob_path = "./atmel_6lowpan_udp_rx.bin"
# Address of memcpy function.
memcpy_addr = 0xf647
print("Creating angr Project")
project = angr.Project(blob_path)
print("Creating binary CFG")
bin_cfg = project.analyses.CFG(resolve_indirect_jumps=True, cross_references=True,
force_complete_scan=False, normalize=True, symbols=True)
# Get CFG node for memcpy
memcpy_node = bin_cfg.model.get_any_node(memcpy_addr)
# Get all the XRefs (predecessor of the memcpy nodes)
memcpy_node_preds = memcpy_node.predecessors
# Get the CC of memcpy
memcpy_cc = project.kb.functions[memcpy_addr].calling_convention
# Grab all functions that have an xrefs to the basic function
memcpy_funcs_preds = list(set([x.function_address for x in memcpy_node_preds]))
# Creating a dictionary of predecessors functions and the address
# of the xrefs to the memcpy
FUNC_PREDECESSORS = {}
for memcpy_func_pred_addr in memcpy_funcs_preds:
    FUNC_PREDECESSORS[str(memcpy_func_pred_addr)] = []
for x in memcpy_node_preds:
    FUNC_PREDECESSORS[str(x.function_address)].append(x)
OVERALL_DEFS = set()
FUNCS = set()
for memcpy_func_pred_addr, xrefs in FUNC_PREDECESSORS.items():
    memcpy_func_pred_addr = int(memcpy_func_pred_addr)
    print("Now analyzing predecessor func at {}".format(hex(memcpy_func_pred_addr)))
    print("XRefs are {}".format((xrefs)))
    for xref in xrefs:
        print("-->Analyzing XRefs at {}".format(hex(xref.addr)))
        # Get the Function object of the func containing the xref to memcpy
        memcpy_func_pred = bin_cfg.functions.get_by_addr(memcpy_func_pred_addr)
        # The call to the basic function (memcpy here) is the last instruction of the block.
        call_to_xref_address = project.factory.block(xref.addr).instruction_addrs[-1]
        try:
            rd = project.analyses.ReachingDefinitions(subject=memcpy_func_pred,
                                                      func_graph=memcpy_func_pred.graph,
                                                      cc=memcpy_func_pred.calling_convention,
                                                      observation_points=[("insn", call_to_xref_address, 0)],
                                                      dep_graph=dep_graph.DepGraph())
        except Exception:
            # Sorry for this, sometimes it explodes :)
            continue
        rd_ddg_graph = rd.dep_graph
        # Instantiate the object that will walk back the dep_graph.
        def_explorer = DefinitionExplorer(project, rd_ddg_graph)
        # Get the VEX offset for "r0"
        reg_vex_offset = project.arch.registers.get("r0", None)[0]
        if rd.observed_results != {}:
            # Cycle all over the results
            for observed_result in rd.observed_results.items():
                reg_defs = observed_result[1].register_definitions.get_objects_by_offset(reg_vex_offset)
                for reg_def in reg_defs:
                    reg_seen_defs = def_explorer.resolve_use_def(reg_def)
                    for definition in reg_seen_defs:
                        OVERALL_DEFS.add(definition)

for definition in OVERALL_DEFS:
    if definition[0] == "retval":
        # It's not always guaranteed that the retval tag of a definition has the
        # func addr, in those cases we call it a day (definition[1] will be None).
        if definition[1] is not None:
            FUNCS.add(definition[1])
print(FUNCS)
FUNCS now contains all the functions that provide a value for the r0 argument of memcpy.