Binary Vulnerability Discovery: angr's Reaching Definition Analysis (Part 1)

Background

[0] c = input()
[1] a = 6  
[2] b = a + 6  
[3] if c == 3:  
[4]   a = 4  
[5]   c = a + 1 
[6] else:
[7]   a = 2  
[8]   c = a + 2 
[9] b = a + 2
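  • Here, we can use RD by placing a so-called observation point at [5] and asking: "Which values of variable 'a' can we observe at [5]?" The answer is clearly 4.
  • If we ask the same question at [8], the answer is 2; finally, asking it at observation point [9] yields both 4 and 2. If the last answer puzzles you, remember that this technique is purely static: the analysis only observes that, since 'c' has no specific value, either branch of the 'if' can be taken, and therefore 'a' may be redefined as either 4 or 2.
  • Variables are just abstractions of the programming language; they are stored in memory and registers. We can therefore use RD analysis to reason about these entities and obtain almost the same results.
  • Consider the previous program in (completely unoptimized) assembly: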
[0]  mov rcx, [rsp+0x8]   ; moving variable 'c' in rcx
[1]  mov rax, 6         
[2]  mov [rsp+0x10], rax  ; setting var 'a'   
[3]  add rbx, rax, 6      ; defining variable 'b'
[4]  mov [rsp+0x1C], rbx  
[5]  cmp rcx, 0x3         ; checking if var 'c' is 3 or not
[6]  jne label2           ; if not, go to label2
[7]  mov [rsp+0x10], 0x4
[8]  mov rax, [rsp+0x10]
[9]  add rcx, rax, 0x1
[10] mov [rsp+0x8], rcx
[11] jmp end 
[12] label2:
[13]  mov [rsp+0x10], 0x2
[14]  mov rax, [rsp+0x10]
[15]  add rcx, rax, 0x2
[16]  mov [rsp+0x8], rcx
[17] end:
[18]  mov rax, [rsp+0x10]
[19]  add rbx, rax, 0x2
[20]  mov [rsp+0x1C], rbx

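Now we need to define an observation point and work out which question to ask in order to get the result we want. For example, if we want to ask, as before, which values variable 'a' can take in the branch where 'c' equals 3, we need to ask: "Which values can we observe for the rax register at [9]?"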
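The answers given above for observation points [5], [8], and [9] of the statement-level program can be reproduced with a tiny hand-rolled reaching-definitions pass. The following is only a minimal sketch and does not use angr: the names `DEFS`, `SUCCESSORS`, `reaching_definitions`, and `observe` are made up for illustration, and the control-flow graph is written out by hand from the numbered listing.

```python
from collections import defaultdict

# statement index -> (variable defined, right-hand side as written in the listing)
DEFS = {
    0: ("c", "input()"), 1: ("a", "6"), 2: ("b", "a + 6"),
    4: ("a", "4"), 5: ("c", "a + 1"),
    7: ("a", "2"), 8: ("c", "a + 2"), 9: ("b", "a + 2"),
}
# statement index -> statements that may execute next ([3] is the 'if')
SUCCESSORS = {0: [1], 1: [2], 2: [3], 3: [4, 7], 4: [5],
              5: [9], 7: [8], 8: [9], 9: []}

def reaching_definitions(defs, successors):
    """For each statement, compute the (variable, statement) definitions
    that may still be live just before that statement executes."""
    preds = defaultdict(list)
    for src, dsts in successors.items():
        for dst in dsts:
            preds[dst].append(src)
    reach_in = {n: set() for n in successors}
    reach_out = {n: set() for n in successors}
    worklist = list(successors)
    while worklist:
        node = worklist.pop()
        # Merge: a definition reaches us if it leaves any predecessor.
        new_in = set().union(*(reach_out[p] for p in preds[node]))
        reach_in[node] = new_in
        if node in defs:
            var = defs[node][0]
            # Kill older definitions of the same variable, then add ours.
            new_out = {d for d in new_in if d[0] != var} | {(var, node)}
        else:
            new_out = set(new_in)
        if new_out != reach_out[node]:
            reach_out[node] = new_out
            worklist.extend(successors[node])
    return reach_in

def observe(var, point, reach_in):
    """Statements whose definition of `var` is visible just before `point`."""
    return sorted(stmt for v, stmt in reach_in[point] if v == var)

reach_in = reaching_definitions(DEFS, SUCCESSORS)
for point in (5, 8, 9):
    stmts = observe("a", point, reach_in)
    print(point, [DEFS[s][1] for s in stmts])
```

At the merge point [9] both definitions of 'a' survive, because the pass never evaluates the condition at [3]; this is the purely static behaviour described above.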

Example

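The intuition is to use the same technique we would use to discover these dependencies manually: look at the code and simply see how variables are defined and used within the function. In the screenshot below, you can see how variable v25 is defined by memb_alloc and used by memcpy.
[Screenshot: variable v25 defined by memb_alloc and used by memcpy]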
Assembly:

.text:000037BA STRB    R5, [R0,#0x1C]
.text:000037BC MOVS    R3, #1
.text:000037BE STRB    R3, [R0,#0x1D]
.text:000037C0 ADDS    R0, #0xC
.text:000037C2 MOV     R3, R9
.text:000037C4 ADDS    R1, R3, #2
.text:000037C6 MOVS    R2, #0x10
.text:000037C8 LDR     R3, =(memcpy+1)
.text:000037CA BLX     R3  ;memcpy     <== OBSERVATION POINT 
.text:000037CC ADDS    R0, R4, #4      ; t
.text:000037CE MOVS    R1, #0x12C00    ; interval
.text:000037D2 LDR     R3, =(timer_set+1)
.text:000037D4 BLX     R3              ; timer_set
.text:000037D6 MOVS    R1, R4          ; item
.text:000037D8 LDR     R0, =others_services_list ; list
.text:000037DA LDR     R3, =(list_add+1)
.text:000037DC BLX     R3              ; list_add

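The idea here is very simple: we want to set an observation point on the call to memcpy (address 0x37CA) and then ask for the visible definitions of register r0 (the register holding v25).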

Let's look at the following code:

import angr 
import autoblob # just to load the blob, more at https://github.com/subwire/autoblob
import angr.analyses.reaching_definitions.dep_graph as dep_graph

blob_path = "./blob.bin"
target_func_addr = 0x3609 # Odd addresses because ARM Thumb mode 
call_to_memcpy = 0x37CB # Odd addresses because ARM Thumb mode

print("Creating angr Project")
project = angr.Project(blob_path)

# Some standard options for the CFG
print("Creating binary CFG")
bin_cfg = project.analyses.CFG(resolve_indirect_jumps=True, 
                               cross_references=True, 
                               force_complete_scan=False, 
                               normalize=True, 
                               symbols=True)

# Getting the func object from the address
target_func = bin_cfg.functions.get_by_addr(target_func_addr)
# Starting the ReachingDefinition analysis
rd = project.analyses.ReachingDefinitions(subject=target_func, 
                                          func_graph=target_func.graph,
                                          cc = target_func.calling_convention,
                                          observation_points= [("insn", 
                                                                call_to_memcpy,
                                                                0)],
                                          dep_graph = dep_graph.DepGraph()
                                          )

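This runs the reaching definition analysis specifically on the target function at 0x3609, using the call to memcpy at 0x37CB as the observation point (note that you can set multiple observation points if you like).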
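As a small illustration of how several observation points could be prepared, the sketch below builds a list of tuples in the same `("insn", address, 0)` shape used in the call above. The helper names `thumb` and `insn_observation_points` are hypothetical, and the constants `OP_BEFORE` and `OP_AFTER` are redefined locally on the assumption that they mirror angr's values; check them against the angr version you are using.

```python
# Assumption: these constants mirror angr's OP_BEFORE / OP_AFTER values.
# They are redefined here so the sketch runs without angr installed.
OP_BEFORE = 0
OP_AFTER = 1

def thumb(addr):
    """Return the Thumb-mode form of an address (least significant bit set)."""
    return addr | 1

def insn_observation_points(addrs, when=OP_BEFORE):
    """Build one ("insn", address, when) tuple per instruction address."""
    return [("insn", thumb(addr), when) for addr in addrs]

# The memcpy call and the ADDS R0, #0xC instruction from the listing above.
points = insn_observation_points([0x37CA, 0x37C0])
print([(kind, hex(addr), when) for kind, addr, when in points])
```

The resulting list could then be passed as `observation_points` in place of the single-element list shown earlier.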

By inspecting the result object, we can obtain information about register r0:

# VEX offset is just how the VEX IR refers to registers
reg_vex_offset = project.arch.registers.get("r0", None)[0]
reg_defs = rd.one_result.register_definitions.get_objects_by_offset(reg_vex_offset)
print(reg_defs)
# RESULT:
# {<Definition {Atom:<Reg 8<4>>, Codeloc:<0x37c1 id=0x37bb[36]>, 
#              Data:DataSet<32>: {<Undefined>}}>}

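The result tells us that, before the call to memcpy, the latest instruction defining register r0 is at code location 0x37c1 (for the curious: id is the basic block address and 36 is the VEX statement index), but its value (in the DataSet) is Undefined. Why is that? To understand, let's look at the instruction at 0x37c1: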
.text:000037C0 ADDS R0, #0xC
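The definition of r0 comes from adding 0xc to the value r0 already holds. What is the value of r0 at this point? This is again a question for RD: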

rd2 = project.analyses.ReachingDefinitions(subject=target_func, 
                                          func_graph=target_func.graph,
                                          cc = target_func.calling_convention,
                                          observation_points= [("insn", 0x37c1 , 0)],
                                          dep_graph = dep_graph.DepGraph()
                                          )
reg_defs2 = rd2.one_result.register_definitions.get_objects_by_offset(reg_vex_offset)
print(reg_defs2)
# RESULT:
# {<Definition {Tags:{
#                    <ReturnValueTag {Function: 0x3a3d, 
#                                     Metadata:{'tagged_by': 'SimEngineRDVEX._handle_function_cc'}}>
#                   }, 
#              Atom:<Reg 8<4>>, 
#              Codeloc:<0x37b5 id=0x37b1[-2]>,
#              Data:DataSet<32>: {<Undefined>}}>}

The previous definition of r0 is recorded at address 0x37b5, but looking at that address we don't see any definition of r0, right?

.text:000037B0
.text:000037B0 loc_37B0                ; m
.text:000037B0 process_pt = R8         ; pt *
.text:000037B0 LDR     R0, =registrations
.text:000037B2 LDR     R3, =(memb_alloc+1)
.text:000037B4 BLX     R3     ; memb_alloc  <==

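Well, the RD analysis is quite smart: by leveraging the function's calling convention (CC), it knows that this function will return something and that, in the current CC, this "something" is returned in r0; hence r0 is effectively redefined here. This information is integrated into the Definition object we get back. Indeed, we can see that the definition is tagged with a ReturnValueTag, indicating not only that the content of r0 is the result of a function call, but also that the called function is 0x3a3d.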
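To make the idea concrete, here is a toy model (not angr's actual classes) of what happens at a call site: the return register named by the calling convention receives a fresh definition whose value is unknown and which carries a tag naming the callee. `ToyDefinition`, `ToyReturnValueTag`, and `define_return_register` are invented names for illustration only.

```python
from dataclasses import dataclass, field

UNDEFINED = "<Undefined>"  # stand-in for the analysis' unknown value

@dataclass(frozen=True)
class ToyReturnValueTag:
    function: int  # address of the called function

@dataclass
class ToyDefinition:
    atom: str       # what is defined, e.g. a register name
    codeloc: int    # where the definition happens
    data: set       # possible values
    tags: set = field(default_factory=set)

def define_return_register(call_site, callee_addr, return_reg="r0"):
    """Model a call: the CC's return register gets a new, tagged, unknown value."""
    return ToyDefinition(
        atom=return_reg,
        codeloc=call_site,
        data={UNDEFINED},
        tags={ToyReturnValueTag(function=callee_addr)},
    )

# The call at 0x37b5 to the function at 0x3a3d, as in the result shown above.
d = define_return_register(call_site=0x37B5, callee_addr=0x3A3D)
print(d.atom, hex(d.codeloc), d.data, [hex(t.function) for t in d.tags])
```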

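Now we can easily understand why the value observed at the call to memcpy is Undefined: RD does not know the value returned by function 0x3a3d; it only knows that the function redefines r0. With that in mind, it assigns an Undefined value, and when the RD engine has to compute Undefined + 0xC, everything collapses to Undefined (for those familiar with data-flow analysis, this is basically merging to Top).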
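The collapse to Undefined can be sketched with a few lines of plain Python. This is a simplified model of abstract addition over sets of values, not the code angr runs; `UNDEFINED` and `add_datasets` are illustrative names, and the 32-bit wrap is an assumption matching the width of r0.

```python
UNDEFINED = object()  # stand-in for the "unknown value" (Top)

def add_datasets(left, right):
    """Add every pair of values from two sets.
    If either operand is unknown, the sum is unknown as well."""
    result = set()
    for a in left:
        for b in right:
            if a is UNDEFINED or b is UNDEFINED:
                result.add(UNDEFINED)
            else:
                result.add((a + b) & 0xFFFFFFFF)  # wrap to 32 bits, like r0
    return result

known = add_datasets({4, 2}, {0xC})            # both operands are constants
unknown = add_datasets({UNDEFINED}, {0xC})     # models ADDS R0, #0xC after the call
print(sorted(known), unknown == {UNDEFINED})
```

Once a value is unknown, no amount of arithmetic on it recovers a concrete result, which is why the definition at 0x37c1 is reported as Undefined.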

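This tagging mechanism for definitions was introduced only recently, and I believe it enables a lot of smart reasoning over these register definition chains. In particular, I am going to use this mechanism to solve the problem we discussed at the beginning.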

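To recap the problem: we are trying to find the cases in which the first argument of memcpy (held in register r0 on this architecture) is the return value of another function. The idea is simple: automatically walk back the definition chain of the register we are interested in (using the dep_graph) until the definition is no longer Undefined (which in this case means the argument is a constant) or until we hit a definition node tagged with ReturnValueTag.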
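Before reading the angr version, the walk-back can be tried on a hand-written dependency graph. The sketch below is a simplified stand-in: `TOY_GRAPH` and `walk_back` are hypothetical, each node is just a string, and the graph is a plain dictionary rather than angr's dep_graph. The r0 entries follow the results shown earlier; the r2 entry is added only to illustrate the constant case.

```python
UNDEFINED = "<Undefined>"

# node -> (possible values, callee address if tagged as a return value, predecessors)
TOY_GRAPH = {
    "r0@0x37c1": ({UNDEFINED}, None, ["r0@0x37b5"]),  # ADDS R0, #0xC
    "r0@0x37b5": ({UNDEFINED}, 0x3A3D, []),           # r0 redefined by the call
    "r2@0x37c7": ({0x10}, None, []),                  # MOVS R2, #0x10 (constant)
}

def walk_back(graph, start):
    """Follow predecessors until a constant or a return-value definition is found."""
    found, seen, todo = set(), set(), [start]
    while todo:
        node = todo.pop()
        if node in seen:        # guard against cycles in the graph
            continue
        seen.add(node)
        data, callee, preds = graph[node]
        if callee is not None:
            found.add(("retval", callee))
        elif UNDEFINED in data:
            todo.extend(preds)  # unknown value: keep walking back
        else:
            found.add(("int", None))
    return found

print(walk_back(TOY_GRAPH, "r0@0x37c1"))
print(walk_back(TOY_GRAPH, "r2@0x37c7"))
```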

Final code:

import angr
import autoblob  # just to load the blob
import angr.analyses.reaching_definitions.dep_graph as dep_graph

from angr.knowledge_plugins.key_definitions.undefined import Undefined
from angr.knowledge_plugins.key_definitions.tag import ReturnValueTag

# Utility class to walk back the definitions graph.
class DefinitionExplorer():
    def __init__(self, project, rd_ddg_graph):
        self.project = project
        self.rd_ddg_graph = rd_ddg_graph

    def resolve_use_def(self, reg_def):
        # Now we need to analyze the definition for this atom
        reg_seen_defs = set()
        defs_to_check = set()
        defs_to_check.add(reg_def)
    
        # Cache of all seen nodes (Tie the knot)
        seen_defs = set()

        while len(defs_to_check) != 0:
            current_def = defs_to_check.pop()
            seen_defs.add(current_def) 
            # Check if the current Definition has a tag 
            def_value = self.check_definition_tag(current_def)
            
            # If def_value is not None we hit a "retval" and we collect it,
            # in the other case we need to check if it is Undefined, if yes gotta walk back. 
            if def_value:
                reg_seen_defs.add(def_value)
            else:
                dataset = current_def.data 
                # Boolean guard: do we have any undefined pointers?
                # A value in the DataSet can be an int or Undefined.
                undefined_pointers = any(isinstance(data, Undefined) for data in dataset)

                # If we have undefined pointers (a.k.a. Top value) we need to process the predecessors.
                if undefined_pointers:
                    for pred in self.rd_ddg_graph.graph.predecessors(current_def):
                        if pred not in seen_defs:
                            defs_to_check.add(pred)
                else:
                    # This is a constant.
                    def_value = ("int", None)
                    reg_seen_defs.add(def_value)

        return reg_seen_defs

    # Checking the tag over a definition.
    def check_definition_tag(self, definition):
        # Peek at one tag without mutating the definition's tag set
        # (pop() would remove the tag from the Definition itself).
        if definition.tags:
            curr_tag = next(iter(definition.tags))  # Just take the first one for now.
            if isinstance(curr_tag, ReturnValueTag):
                return ("retval", curr_tag.function)
            print(type(curr_tag))
        return None

# Path of the blob.
blob_path = "./atmel_6lowpan_udp_rx.bin"

# Address of memcpy function.
memcpy_addr = 0xf647

print("Creating angr Project")
project = angr.Project(blob_path)

print("Creating binary CFG")
bin_cfg = project.analyses.CFG(resolve_indirect_jumps=True, cross_references=True, 
                                force_complete_scan=False, normalize=True, symbols=True)

# Get CFG node for memcpy
memcpy_node = bin_cfg.model.get_any_node(memcpy_addr)
# Get all the XRefs (predecessor of the memcpy nodes)
memcpy_node_preds = memcpy_node.predecessors
# Get the CC of memcpy
memcpy_cc =  project.kb.functions[memcpy_addr].calling_convention

# Grab all functions that have an xref to memcpy
memcpy_funcs_preds = list(set([x.function_address for x in memcpy_node_preds]))

# Creating a dictionary of predecessors functions and the address 
# of the xrefs to the memcpy 
FUNC_PREDECESSORS = {}
for memcpy_func_pred_addr in memcpy_funcs_preds:
    FUNC_PREDECESSORS[str(memcpy_func_pred_addr)] = []
for x in memcpy_node_preds:
    FUNC_PREDECESSORS[str(x.function_address)].append(x)

OVERALL_DEFS = set()
FUNCS = set()

for memcpy_func_pred_addr, xrefs in FUNC_PREDECESSORS.items():
    memcpy_func_pred_addr = int(memcpy_func_pred_addr)
    print("Now analyzing predecessor func at {}".format(hex(memcpy_func_pred_addr)))
    print("XRefs are {}".format((xrefs)))
    
    for xref in xrefs:
        print("-->Analyzing XRefs at {}".format(hex(xref.addr)))
        # Get the Function object of the func containing the xref to memcpy
        memcpy_func_pred = bin_cfg.functions.get_by_addr(memcpy_func_pred_addr)

        # The call to memcpy is the last instruction of the block.
        call_to_xref_address = project.factory.block(xref.addr).instruction_addrs[-1]
        
        try:
            rd = project.analyses.ReachingDefinitions(subject=memcpy_func_pred, 
                                                      func_graph=memcpy_func_pred.graph,
                                                      cc = memcpy_func_pred.calling_convention,
                                                      observation_points= [("insn", call_to_xref_address , 0)],
                                                      dep_graph = dep_graph.DepGraph()
                                                     )
        except Exception as e:
            # Sorry for this, sometimes it explodes :)
            continue

        rd_ddg_graph = rd.dep_graph
        # Instantiate the object that will walk back the dep_graph.
        def_explorer = DefinitionExplorer(project, rd_ddg_graph)
        
        # Get the VEX offset for "r0"
        reg_vex_offset = project.arch.registers.get("r0", None)[0]
        
        if rd.observed_results != {}:
            # Cycle all over the results 
            for observed_result in rd.observed_results.items():
                reg_defs = observed_result[1].register_definitions.get_objects_by_offset(reg_vex_offset)
                for reg_def in reg_defs:
                    reg_seen_defs = def_explorer.resolve_use_def(reg_def)
                    for definition in reg_seen_defs:
                        OVERALL_DEFS.add(definition)

            for definition in OVERALL_DEFS:
                if definition[0] == "retval":
                    # It's not always guaranteed that the retval tag of a definition has the
                    # func addr; in those cases we call it a day (definition[1] will be None).
                    if definition[1] is not None:
                        FUNCS.add(definition[1])
print(FUNCS)

In FUNCS, we can see all the functions that provide a value for memcpy's r0 argument.
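Since FUNCS holds raw integer addresses, a small post-processing step can make the output easier to read. The sketch below is one possible way to do it, assuming you have a mapping from addresses to names from some other source; `describe_callers` and both example inputs are hypothetical, with the single address/name pair taken from the listing earlier in this post.

```python
def describe_callers(funcs, known_names):
    """Return sorted 'address name' strings for the collected function addresses."""
    lines = []
    for addr in sorted(funcs):
        name = known_names.get(addr, "<unknown>")
        lines.append("{:#x} {}".format(addr, name))
    return lines

# Hypothetical inputs: 0x3a3d is the callee reported in the ReturnValueTag above,
# and the listing shows that call going to memb_alloc.
example_funcs = {0x3A3D}
example_names = {0x3A3D: "memb_alloc"}

for line in describe_callers(example_funcs, example_names):
    print(line)
```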
