Getting Started with angr (Part 1)

I still …

Last modified: 2023-02-17 23:01:33

First published: 2021-08-16 19:47:40

Article link: https://blog.csdn.net/qq_44370676/article/details/119714879

Learning angr

Top Level Interfaces
- Basic Information
- Basic Block
Loading a Binary
- Basic Information
- Symbols and Relocations
Program State
CFG

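I have mostly worked on static code checking, mainly on uncompiled source text. Only a small share of problems can be found at the text level, though; some only show up at the execution level. Symbolic execution is an effective method for those. I won't explain symbolic execution itself here; the focus is on learning to use one tool, angr.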

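angr is a Python-based binary analysis framework, i.e. a Python library (see the official website). Installation is simple: I created a Python 3.7 environment with Anaconda and ran pip install angr, which installed everything in one step.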

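This post mostly works through the examples from the official website, so it largely reproduces the official content. A write-up by another author is also very thorough and is a good way to learn what each module does.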

Top Level Interfaces

The example here is r100.

This covers basic angr usage. The code comes first, with explanations written as comments, which is more convenient.

import angr

def main():
    p = angr.Project("examples/r100")
    print(p.arch)  # CPU architecture
    print(hex(p.entry))  # address of the start function (entry point)
    print(p.loader.shared_objects)  # OrderedDict of the binaries involved

    print(hex(p.loader.min_addr))  # 0x400000, lowest mapped virtual address (image base)
    print(hex(p.loader.max_addr))

    # block: the unit that angr executes
    block = p.factory.block(p.entry)
    print(block.pp())  # pp() prints the disassembly itself and returns None, so "None" is printed too
    # print(type(block.pp()))

    print(block.instructions)  # an int: the number of instructions
    print(block.instruction_addrs)  # list of instruction addresses; its length is block.instructions

    # states
    state = p.factory.entry_state()  # state at the start function
    print(state.mem[p.entry].int.resolved)

    # simulation managers
    simgr = p.factory.simulation_manager(p.factory.full_init_state())
    simgr.explore(find=0x400844, avoid=0x400855)

    print(simgr)
    print(simgr.found[0].posix.dumps(0).strip(b'\0\n'))

if __name__ == "__main__":
    main()
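
The find/avoid idea behind simgr.explore can be illustrated with a toy model. The sketch below is not how angr works internally (angr steps symbolic states with constraints, not plain addresses); it only shows a search that stops at a target address and never steps into an address to avoid. The explore helper, the toy_cfg graph, and every address other than 0x400844 and 0x400855 are hypothetical.

```python
from collections import deque

def explore(successors, start, find, avoid):
    """Breadth-first search for `find` that never steps into `avoid`."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        addr = path[-1]
        if addr == find:
            return path
        for nxt in successors.get(addr, []):
            if nxt == avoid or nxt in seen:
                continue  # pruned, like a state that reaches an "avoid" address
            seen.add(nxt)
            queue.append(path + [nxt])
    return None

# Hypothetical control-flow graph: only 0x400844 and 0x400855 come from the
# example above; the other addresses and all edges are made up.
toy_cfg = {
    0x400800: [0x400820, 0x400855],
    0x400820: [0x400844, 0x400855],
}

path = explore(toy_cfg, 0x400800, find=0x400844, avoid=0x400855)
print([hex(a) for a in path])
```

In angr itself, the states that reach the find address end up in simgr.found, which is why the example reads simgr.found[0].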

r100 disassembled in IDA (figures: the start function and the main function).

Basic Information

Output:

p.arch:<Arch AMD64 (LE)>
p.entry:0x400610
p.loader.shared_objects:OrderedDict([('r100', <ELF Object r100, maps [0x400000:0x601077]>), ('libc.so.6', <ELF Object libc-2.27.so, maps [0x700000:0xaf0adf]>), ('ld-linux-x86-64.so.2', <ELF Object ld-2.27.so, maps [0xb00000:0xd2b16f]>), ('extern-address space', <ExternObject Object cle##externs, maps [0xe00000:0xe7ffff]>), ('cle##tls', <ELFTLSObjectV2 Object cle##tls, maps [0xf00000:0xf1500f]>)])
p.loader.min_addr:0x400000
p.loader.max_addr:0x1007fff

Basic Block

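project.factory.block(addr) returns the basic block at a given address; the basic block is the unit in which angr analyzes code. The sample code inspects the basic block of the start function at the entry point, with the following result: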

block.pp():
0x400610:	xor	ebp, ebp
0x400612:	mov	r9, rdx
0x400615:	pop	rsi
0x400616:	mov	rdx, rsp
0x400619:	and	rsp, 0xfffffffffffffff0
0x40061d:	push	rax
0x40061e:	push	rsp
0x40061f:	mov	r8, 0x400900
0x400626:	mov	rcx, 0x400890
0x40062d:	mov	rdi, 0x4007e8
0x400634:	call	0x4005d0
None

block.instructions:11
block.instruction_addrs:[4195856, 4195858, 4195861, 4195862, 4195865, 4195869, 4195870, 4195871, 4195878, 4195885, 4195892]
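
block.instruction_addrs prints the addresses in decimal, which makes them hard to match against the disassembly. A small sketch, using only the list printed above, converts them to hex and derives each instruction's size from the gap to the next address (the size of the last instruction cannot be derived this way):

```python
# Addresses copied from the block.instruction_addrs output above.
addrs = [4195856, 4195858, 4195861, 4195862, 4195865, 4195869,
         4195870, 4195871, 4195878, 4195885, 4195892]

hex_addrs = [hex(a) for a in addrs]

# Size of every instruction except the last: distance to the next address.
sizes = [nxt - cur for cur, nxt in zip(addrs, addrs[1:])]

for addr, size in zip(hex_addrs, sizes):
    print(addr, size)
print(hex_addrs[-1], "(last instruction, size not derivable from the list)")
```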

state and simulation_manager are not covered for now.

Loading a Binary

This section mainly introduces CLE, the angr component that loads binary files (the examples here are ELF files).

The example used here is fauxware.

Its source code is as follows:

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <stdlib.h>

char *sneaky = "SOSNEAKY";

int authenticate(char *username, char *password)
{
	char stored_pw[9];
	stored_pw[8] = 0;
	int pwfile;

	// evil back d00r
	if (strcmp(password, sneaky) == 0) return 1;

	pwfile = open(username, O_RDONLY);
	read(pwfile, stored_pw, 8);

	if (strcmp(password, stored_pw) == 0) return 1;
	return 0;

}

int accepted()
{
	printf("Welcome to the admin console, trusted user!\n");
}

int rejected()
{
	printf("Go away!");
	exit(1);
}

int main(int argc, char **argv)
{
	char username[9];
	char password[9];
	int authed;

	username[8] = 0;
	password[8] = 0;

	printf("Username: \n");
	read(0, username, 8);
	read(0, &authed, 1);
	printf("Password: \n");
	read(0, password, 8);
	read(0, &authed, 1);

	authed = authenticate(username, password);
	if (authed) accepted();
	else rejected();
}

The code comes first.

Basic Information

>>> proj = angr.Project('examples/fauxware') # load the binary (an ELF file)
>>> proj.loader # the loader; it maps everything into the [min_addr, max_addr] address space
<Loaded fauxware, maps [0x400000:0x1007fff]>

>>> proj.loader.all_objects # all loaded objects
[<ELF Object fauxware, maps [0x400000:0x60105f]>, <ELF Object libc-2.27.so, maps [0x700000:0xaf0adf]>, <ELF Object ld-2.27.so, maps [0xb00000:0xd2b16f]>, <ExternObject Object cle##externs, maps [0xe00000:0xe7ffff]>, <ELFTLSObjectV2 Object cle##tls, maps [0xf00000:0xf1500f]>, <KernelObject Object cle##kernel, maps [0x1000000:0x1007fff]>]

>>> proj.loader.main_object # the fauxware main binary
<ELF Object fauxware, maps [0x400000:0x60105f]>

>>> proj.loader.shared_objects # a dict, filename -> file object
OrderedDict([('fauxware', <ELF Object fauxware, maps [0x400000:0x60105f]>), ('libc.so.6', <ELF Object libc-2.27.so, maps [0x700000:0xaf0adf]>), ('ld-linux-x86-64.so.2', <ELF Object ld-2.27.so, maps [0xb00000:0xd2b16f]>), ('extern-address space', <ExternObject Object cle##externs, maps [0xe00000:0xe7ffff]>), ('cle##tls', <ELFTLSObjectV2 Object cle##tls, maps [0xf00000:0xf1500f]>)])

>>> proj.loader.all_elf_objects # a list; for Windows binaries use all_pe_objects
[<ELF Object fauxware, maps [0x400000:0x60105f]>, <ELF Object libc-2.27.so, maps [0x700000:0xaf0adf]>, <ELF Object ld-2.27.so, maps [0xb00000:0xd2b16f]>]

>>> obj = proj.loader.main_object
>>> hex(obj.entry)
'0x400580'

>>> hex(obj.min_addr), hex(obj.max_addr)
('0x400000', '0x60105f')

>>> addr = obj.plt['strcmp']
>>> hex(addr)
'0x400550'
>>> obj.reverse_plt[addr]
'strcmp'


>>> hex(obj.linked_base) # Show the prelinked base of the object and the location it was actually mapped into memory by CLE
'0x400000'
>>> hex(obj.mapped_base)
'0x400000'
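
The loader's overall range can be cross-checked against the per-object ranges printed by proj.loader.all_objects. The sketch below copies those ranges into a plain dict and recomputes the minimum and maximum; owner_of is a hypothetical helper for this illustration that looks up which object's range contains an address (CLE provides a comparable lookup on the loader, find_object_containing).

```python
# Ranges copied from the proj.loader.all_objects output above.
object_ranges = {
    "fauxware": (0x400000, 0x60105f),
    "libc-2.27.so": (0x700000, 0xaf0adf),
    "ld-2.27.so": (0xb00000, 0xd2b16f),
    "cle##externs": (0xe00000, 0xe7ffff),
    "cle##tls": (0xf00000, 0xf1500f),
    "cle##kernel": (0x1000000, 0x1007fff),
}

lo = min(start for start, _ in object_ranges.values())
hi = max(end for _, end in object_ranges.values())
print(hex(lo), hex(hi))

def owner_of(addr):
    """Hypothetical helper: name of the object whose range contains addr."""
    for name, (start, end) in object_ranges.items():
        if start <= addr <= end:
            return name
    return None

print(owner_of(0x400550))  # the strcmp PLT stub address shown above
```

The maximum comes from the cle##kernel object, which is why max_addr is larger than the end of any object listed in shared_objects.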

Symbols and Relocations

To quote the official documentation: A symbol is a fundamental concept in the world of executable formats, effectively mapping a name to an address.

loader.find_symbol is the simplest way to get a symbol.

>>> strcmp = proj.loader.find_symbol('strcmp')
>>> strcmp
<Symbol "strcmp" in libc-2.27.so at 0x79d940>

The most commonly used attributes of the Symbol class are name, owner, and address. The address, however, is ambiguous: a Symbol object reports its address in three ways:

rebased_addr: address in the global address space
linked_addr: address relative to the prelinked base of the binary
relative_addr: address relative to the object base

>>> strcmp.owner
<ELF Object libc-2.27.so, maps [0x700000:0xaf0adf]>

>>> hex(strcmp.rebased_addr)
'0x79d940'
>>> hex(strcmp.linked_addr)
'0x9d940'
>>> hex(strcmp.relative_addr)
'0x9d940'

>>> main_strcmp = proj.loader.main_object.get_symbol('strcmp')
>>> main_strcmp
<Symbol "strcmp" in fauxware (import)>

>>> main_strcmp.resolvedby
<Symbol "strcmp" in libc-2.27.so at 0x79d940>

As shown, main_strcmp.resolvedby and proj.loader.find_symbol('strcmp') refer to the same Symbol.
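
The relationship between the three address forms can be written as simple arithmetic. This is a sketch under two assumptions read off the output above: libc is mapped at 0x700000 (the start of its maps range), and its linked base is 0 (inferred from linked_addr being equal to relative_addr). symbol_addresses is a hypothetical helper, not part of angr.

```python
def symbol_addresses(relative_addr, linked_base, mapped_base):
    """Derive the three address forms from an offset relative to the object base."""
    return {
        "relative_addr": relative_addr,
        "linked_addr": linked_base + relative_addr,
        "rebased_addr": mapped_base + relative_addr,
    }

# Assumed values for libc-2.27.so, taken from the session above.
strcmp_addrs = symbol_addresses(0x9d940, linked_base=0x0, mapped_base=0x700000)
for name, value in strcmp_addrs.items():
    print(name, hex(value))
```

For the main fauxware object, linked_base and mapped_base are both 0x400000, so linked_addr and rebased_addr of its symbols coincide.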

Program State

angr represents program state with the SimState class, which gives access to the (simulated) registers and memory.

/bin/true is used as the example here. The sample code from the official documentation is pasted below.

import angr, claripy
proj = angr.Project('/bin/true')
state = proj.factory.entry_state()

# copy rsp to rbp
state.regs.rbp = state.regs.rsp

# store rdx to memory at 0x1000
state.mem[0x1000].uint64_t = state.regs.rdx

# dereference rbp
state.regs.rbp = state.mem[state.regs.rbp].uint64_t.resolved

# add rax, qword ptr [rsp + 8]
state.regs.rax += state.mem[state.regs.rsp + 8].uint64_t.resolved
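
A typed access such as state.mem[0x1000].uint64_t reads or writes 8 bytes in the architecture's byte order, which is little-endian on AMD64. The sketch below is a toy model built on a plain bytearray, not angr's memory implementation; store_u64 and load_u64 are hypothetical helpers that only show what a little-endian uint64_t store looks like in memory.

```python
import struct

memory = bytearray(0x2000)  # toy flat memory, not angr's memory model

def store_u64(addr, value):
    """Write value as an 8-byte little-endian unsigned integer."""
    memory[addr:addr + 8] = struct.pack("<Q", value)

def load_u64(addr):
    """Read an 8-byte little-endian unsigned integer."""
    return struct.unpack("<Q", memory[addr:addr + 8])[0]

store_u64(0x1000, 0x1122334455667788)
print(memory[0x1000:0x1008].hex())  # least significant byte comes first
print(hex(load_u64(0x1000)))
```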

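The example uses proj.factory.entry_state() to create a state. There are other state constructors as well (these are hard to translate, so the English descriptions are pasted directly):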

blank_state(): constructs a "blank slate" blank state, with most of its data left uninitialized. When accessing uninitialized data, an unconstrained symbolic value will be returned.
entry_state(): constructs a state ready to execute at the main binary's entry point.
full_init_state(): constructs a state that is ready to execute through any initializers that need to be run before the main binary's entry point, for example, shared library constructors or preinitializers. When it is finished with these it will jump to the entry point.
call_state(): constructs a state ready to execute a given function.

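Other aspects of states will be covered later, when they are applied to concrete analyses (such as CTF challenges); see the explanation of states in the official documentation.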