Learning angr (Part 1)

WateranFire

Category: RE. Tags: angr

Original link: https://docs.angr.io

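Until now I have only plugged simple challenges into ready-made angr templates. I would like to understand angr in more depth, so I am keeping these notes.

Note that this is not a straight translation; I have trimmed material according to my own judgment.

This article is based mainly on https://docs.angr.io/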

===========================================================

Below are a few small tips; feel free to skip them.

Logging angr's debug output

import logging
logging.getLogger('angr').setLevel('DEBUG')

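The argument to getLogger() names the top-most module whose messages you want to control; the level you set applies to that module and to everything beneath it. To adjust only one subtree, for example angr.analyses (which includes angr.analyses.cfg), set its level separately: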

logging.getLogger('angr.analyses').setLevel('INFO')
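As a small sketch of how this logger hierarchy behaves, the snippet below uses only the standard logging module, so it runs even without angr installed. The logger names mirror angr's module paths; angr.exploration_techniques is used here only as an example of a sibling package, and no log messages are actually emitted.

```python
import logging

# Loggers form a tree keyed by dotted name. A logger with no level of
# its own inherits the level of its nearest configured ancestor.
logging.getLogger("angr").setLevel("DEBUG")
logging.getLogger("angr.analyses").setLevel("INFO")

cfg_log = logging.getLogger("angr.analyses.cfg")               # under angr.analyses
other_log = logging.getLogger("angr.exploration_techniques")   # under angr only

cfg_level = logging.getLevelName(cfg_log.getEffectiveLevel())
other_level = logging.getLevelName(other_log.getEffectiveLevel())
print(cfg_level, other_level)  # INFO DEBUG
```

The CFG logger picks up INFO from angr.analyses, while the sibling package still inherits DEBUG from the top-level angr logger.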

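Ways to catch bugs

angr is a complex system, so you will occasionally run into bugs or surprising behavior. For example, if a pointer ends up symbolic, the state space can explode. To catch problems like this early, consider adding sanity checks on your states while executing.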
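One simple form of such a check is a cap on the number of active states while stepping. The sketch below is a hypothetical helper, not part of angr: the name step_with_limit and the parameters max_steps and max_active are made up for illustration. It relies only on a simulation manager exposing step() and an active list, both of which are introduced later in this article.

```python
def step_with_limit(simgr, max_steps=1000, max_active=256):
    """Step `simgr` until no state is active, failing fast on state explosion.

    Returns the number of steps taken. `max_steps` and `max_active` are
    illustrative defaults, not values recommended by angr.
    """
    for step in range(max_steps):
        if not simgr.active:
            return step
        if len(simgr.active) > max_active:
            # Stop before memory and solver time get out of hand.
            raise RuntimeError(
                f"{len(simgr.active)} active states after {step} steps"
            )
        simgr.step()
    return max_steps
```

When the exception fires, the states still sitting in simgr.active are a reasonable place to start looking for the symbolic pointer or unbounded loop that caused the blow-up.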

========================================================

Core concepts

--------------------------------------------------------------------------------------------------

Top-level interfaces

The first step in using angr is almost always to load a binary.

>>> import angr
>>> proj = angr.Project('/bin/true')

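A Project is your base of control in angr. With it, you can run analyses and simulations on the executable you loaded.

Basic properties

A project has several basic properties: its CPU architecture, its entry point address, its filename, and so on.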

>>> import monkeyhex # this will format numerical results in hexadecimal
>>> proj.arch
<Arch AMD64 (LE)>
>>> proj.entry
0x401670
>>> proj.filename
'/bin/true'

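arch is an instance of archinfo.Arch describing the instruction set architecture of the program, in this case little-endian AMD64. The attributes you will usually care about are arch.bits, arch.bytes, arch.name, and arch.memory_endness.

The loader

angr uses the CLE module to map a binary into a virtual address space. The result is available as the loader attribute, through which you can inspect the libraries loaded alongside the binary and basic information about the loaded address space.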

>>> proj.loader
<Loaded true, maps [0x400000:0x5004000]>

>>> proj.loader.shared_objects # may look a little different for you!
{'ld-linux-x86-64.so.2': <ELF Object ld-2.24.so, maps [0x2000000:0x2227167]>,
 'libc.so.6': <ELF Object libc-2.24.so, maps [0x1000000:0x13c699f]>}

>>> proj.loader.min_addr
0x400000
>>> proj.loader.max_addr
0x5004000

>>> proj.loader.main_object  # we've loaded several binaries into this project. Here's the main one!
<ELF Object true, maps [0x400000:0x60721f]>

>>> proj.loader.main_object.execstack  # sample query: does this binary have an executable stack?
False
>>> proj.loader.main_object.pic  # sample query: is this binary position-independent?
True

factory

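Many classes in angr need an instantiated project in order to be used. Rather than passing the project around everywhere, you can use project.factory.

project.factory.block() extracts a block of code at a given address: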

>>> block = proj.factory.block(proj.entry) # lift a block of code from the program's entry point
<Block for 0x401670, 42 bytes>

>>> block.pp()                          # pretty-print a disassembly to stdout
0x401670:       xor     ebp, ebp
0x401672:       mov     r9, rdx
0x401675:       pop     rsi
0x401676:       mov     rdx, rsp
0x401679:       and     rsp, 0xfffffffffffffff0
0x40167d:       push    rax
0x40167e:       push    rsp
0x40167f:       lea     r8, [rip + 0x2e2a]
0x401686:       lea     rcx, [rip + 0x2db3]
0x40168d:       lea     rdi, [rip - 0xd4]
0x401694:       call    qword ptr [rip + 0x205866]

>>> block.instructions                  # how many instructions are there?
0xb
>>> block.instruction_addrs             # what are the addresses of the instructions?
[0x401670, 0x401672, 0x401675, 0x401676, 0x401679, 0x40167d, 0x40167e, 0x40167f, 0x401686, 0x40168d, 0x401694]
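As a quick cross-check of the numbers above, the following sketch uses plain Python arithmetic (no angr required) to recover each instruction's length from the address list and the 42-byte block size. All values are copied from the session above; the only outside fact it relies on is that x86-64 RIP-relative operands are measured from the address of the next instruction.

```python
# Values copied from the block output above.
block_addr = 0x401670
block_size = 42  # bytes, from "<Block for 0x401670, 42 bytes>"
instruction_addrs = [
    0x401670, 0x401672, 0x401675, 0x401676, 0x401679, 0x40167d,
    0x40167e, 0x40167f, 0x401686, 0x40168d, 0x401694,
]

# Each instruction ends where the next one starts; the last one ends
# at the end of the block.
ends = instruction_addrs[1:] + [block_addr + block_size]
sizes = [end - start for start, end in zip(instruction_addrs, ends)]
print(len(sizes), sizes, sum(sizes))
# 11 [2, 3, 1, 3, 4, 1, 1, 7, 7, 7, 6] 42

# The final call reads its target from [rip + 0x205866], where rip is
# the address just past the call instruction.
call_slot = ends[-1] + 0x205866
print(hex(call_slot))  # 0x606f00
```

The eleven lengths add up to the reported block size, and the computed slot address falls inside the main object's mapping shown earlier.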

You can also use a block to get other representations of the same code:

>>> block.capstone                       # capstone disassembly
<CapstoneBlock for 0x401670>
>>> block.vex                            # VEX IRSB (that's a python internal address, not a program address)
<pyvex.block.IRSB at 0x7706330>

states

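A Project object only represents an initialization image of the program. When you execute with angr, you work with a separate object that represents a simulated program state: a SimState.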

>>> state = proj.factory.entry_state()
<SimState @ 0x401670>

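A SimState contains the program's memory, registers, filesystem data, and so on; any data that can change during execution lives in the state. For example, using state.regs and state.mem: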

>>> state.regs.rip        # get the current instruction pointer
<BV64 0x401670>
>>> state.regs.rax
<BV64 0x1c>
>>> state.mem[proj.entry].int.resolved  # interpret the memory at the entry point as a C int
<BV32 0x8949ed31>

These values are all bitvectors. Every bitvector has a .length property giving its width in bits. You can convert between bitvectors and Python integers:

>>> bv = state.solver.BVV(0x1234, 32)       # create a 32-bit-wide bitvector with value 0x1234
<BV32 0x1234>                               # BVV stands for bitvector value
>>> state.solver.eval(bv)                # convert to python int
0x1234

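You can store these bitvectors to registers or memory, or store a Python integer directly, in which case it is converted automatically to a bitvector of the appropriate size: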

>>> state.regs.rsi = state.solver.BVV(3, 64)
>>> state.regs.rsi
<BV64 0x3>

>>> state.mem[0x1000].long = 4
>>> state.mem[0x1000].long.resolved
<BV64 0x4>

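The mem interface is used as follows:

Use array[index] notation to specify an address;
Use .<type> to specify the type the memory should be interpreted as (char, short, int, long, size_t, uint8_t, uint16_t...);
From there, you can store a value (a bitvector or a Python int), use .resolved to get the value as a bitvector, or use .concrete to get it as a Python int.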
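The value <BV32 0x8949ed31> read from the entry point earlier can be checked by hand. The sketch below uses only the standard struct module; the four bytes are an assumption worked out from the disassembly shown earlier (xor ebp, ebp encodes as 31 ed, and mov r9, rdx begins with 49 89), not something read back from angr. An unsigned format is used so the result compares directly with the raw bit pattern of the bitvector.

```python
import struct

# First four bytes at the entry point, derived by hand from the
# disassembly: "xor ebp, ebp" = 31 ed, "mov r9, rdx" starts with 49 89.
entry_bytes = bytes.fromhex("31ed4989")

# "<I" is a little-endian unsigned 32-bit integer, matching AMD64 (LE);
# ">I" is the big-endian reading of the same bytes, for contrast.
(as_le,) = struct.unpack("<I", entry_bytes)
(as_be,) = struct.unpack(">I", entry_bytes)

print(hex(as_le), hex(as_be))  # 0x8949ed31 0x31ed4989
```

The little-endian reading matches what state.mem[proj.entry].int.resolved reported, which is why arch.memory_endness matters when interpreting memory.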

Finally, if you read some more registers, you may run into a value like this:

>>> state.regs.rdi
<BV64 reg_48_11_64{UNINITIALIZED}>

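This is still a 64-bit bitvector, but it does not hold a concrete value. Instead, it has a name. This is called a symbolic variable, and it is the foundation of symbolic execution.

Simulation managers

If a state represents one point in a program's execution, there must be a way to get to the next point. A simulation manager is the primary interface for performing simulation and execution with states.

First, we create the simulation manager we are going to use. The constructor accepts a state or a list of states.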

>>> simgr = proj.factory.simulation_manager(state)
<SimulationManager with 1 active>
>>> simgr.active
[<SimState @ 0x401670>]

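A simulation manager can hold several stashes of states. The default stash, active, is initialized with the state we passed in.

Now, let's do some execution.

>>> simgr.step()

Note that we did not modify the original state! You can safely use a single state as the "base" for multiple rounds of execution.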

>>> simgr.active
[<SimState @ 0x1020300>]
>>> simgr.active[0].regs.rip                 # new and exciting!
<BV64 0x1020300>
>>> state.regs.rip                           # still the same!
<BV64 0x401670>

Analyses

angr comes pre-packaged with several built-in analyses that you can use to extract interesting information from a program. Here they are:

>>> proj.analyses.            # Press TAB here in ipython to get an autocomplete-listing of everything:
 proj.analyses.BackwardSlice        proj.analyses.CongruencyCheck      proj.analyses.reload_analyses       
 proj.analyses.BinaryOptimizer      proj.analyses.DDG                  proj.analyses.StaticHooker          
 proj.analyses.BinDiff              proj.analyses.DFG                  proj.analyses.VariableRecovery      
 proj.analyses.BoyScout             proj.analyses.Disassembly          proj.analyses.VariableRecoveryFast  
 proj.analyses.CDG                  proj.analyses.GirlScout            proj.analyses.Veritesting           
 proj.analyses.CFG                  proj.analyses.Identifier           proj.analyses.VFG                   
 proj.analyses.CFGEmulated          proj.analyses.LoopFinder           proj.analyses.VSA_DDG               
 proj.analyses.CFGFast              proj.analyses.Reassembler

As a very brief example, here is how to construct and use a fast control-flow graph (CFG):

# Originally, when we loaded this binary it also loaded all its dependencies into the same virtual address space
# This is undesirable for most analysis.
>>> proj = angr.Project('/bin/true', auto_load_libs=False)
>>> cfg = proj.analyses.CFGFast()
<CFGFast Analysis Result at 0x2d85130>

# cfg.graph is a networkx DiGraph full of CFGNode instances
# You should go look up the networkx APIs to learn how to use this!
>>> cfg.graph
<networkx.classes.digraph.DiGraph at 0x2da43a0>
>>> len(cfg.graph.nodes())
951

# To get the CFGNode for a given address, use cfg.get_any_node
>>> entry_node = cfg.get_any_node(proj.entry)
>>> len(list(cfg.graph.successors(entry_node)))
2
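Building on the calls shown above, a small helper can turn the successor lookup into something reusable. This is a minimal sketch: successor_addrs is a hypothetical name, not an angr API. It uses only cfg.get_any_node and cfg.graph.successors from the session above, plus the addr attribute of CFG nodes, and it assumes get_any_node returns None when no node exists at the address.

```python
def successor_addrs(cfg, addr):
    """Return the sorted addresses of the CFG successors of the node at `addr`.

    `cfg` is expected to look like a CFGFast result: it must provide
    get_any_node(addr) and a graph with a successors(node) method.
    """
    node = cfg.get_any_node(addr)
    if node is None:  # assumed behavior when no node was recovered here
        return []
    return sorted(succ.addr for succ in cfg.graph.successors(node))
```

With the cfg and proj objects from the session above, successor_addrs(cfg, proj.entry) should return a list of two addresses, matching the successor count printed there.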

See the API documentation for more ways to use these analyses.
