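Emulating Android with Unicorn: a go-to tool for assembly and disassembly work

Let's start with what Unicorn is actually good for.

Unicorn is an excellent cross-platform emulation framework. It can run native code for the Arm, Arm64 (Armv8), M68K, Mips, Sparc, and X86 (including X86_64) instruction sets on any host platform.

That sounds rather official, so here is an example. When you study an OLLVM-obfuscated library, tracking down function addresses is a real headache. With Unicorn you can print the address at which each native function is registered, along with its name and signature. Let's use the .so from a certain short-video app to show what Unicorn can do:

RegisterNatives dvmClass=com/ss/android/common/applog/UserInfo, name=getUserInfo, signature=(ILjava/lang/String;[Ljava/lang/String;)Ljava/lang/String;, fnPtr=unicorn@0x4002c6c5[libcms.so]0x2c6c5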

RegisterNatives dvmClass=com/ss/android/common/applog/UserInfo, name=getUserInfo, signature=(ILjava/lang/String;[Ljava/lang/String;Ljava/lang/String;)Ljava/lang/String;, fnPtr=unicorn@0x4002c6dd[libcms.so]0x2c6dd

RegisterNatives dvmClass=com/ss/android/common/applog/UserInfo, name=getUserInfoSkipGet, signature=(ILjava/lang/String;[Ljava/lang/String;)Ljava/lang/String;, fnPtr=unicorn@0x4002c7b1[libcms.so]0x2c7b1

RegisterNatives dvmClass=com/ss/android/common/applog/UserInfo, name=getUserInfo, signature=(I[Ljava/lang/String;[Ljava/lang/String;Ljava/lang/String;)Ljava/lang/String;, fnPtr=unicorn@0x4002c7d1[libcms.so]0x2c7d1

RegisterNatives dvmClass=com/ss/android/common/applog/UserInfo, name=getPackage, signature=(Ljava/lang/String;)V, fnPtr=unicorn@0x4002e0dd[libcms.so]0x2e0dd
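
Log lines like these are easy to turn into data for a hooking script. The following is a minimal sketch using only the standard library; the field layout is inferred from the lines above and nothing else, and the names LINE_RE and parse_register_natives are hypothetical helpers, not part of any tool.

```python
import re

# Pattern for the log lines shown above. The layout is an assumption
# based only on those lines, not on a documented output format.
LINE_RE = re.compile(
    r"RegisterNatives dvmClass=(?P<cls>[^,]+), "
    r"name=(?P<name>[^,]+), "
    r"signature=(?P<sig>[^,]+), "
    r"fnPtr=unicorn@(?P<addr>0x[0-9a-fA-F]+)"
    r"\[(?P<lib>[^\]]+)\](?P<offset>0x[0-9a-fA-F]+)"
)


def parse_register_natives(line):
    """Return the fields of one RegisterNatives log line, or None if it does not match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    return {
        "class": m.group("cls"),
        "name": m.group("name"),
        "signature": m.group("sig"),
        "library": m.group("lib"),
        "address": int(m.group("addr"), 16),   # address inside the emulator
        "offset": int(m.group("offset"), 16),  # offset inside the library
    }


sample = ("RegisterNatives dvmClass=com/ss/android/common/applog/UserInfo, "
          "name=getUserInfo, "
          "signature=(ILjava/lang/String;[Ljava/lang/String;)Ljava/lang/String;, "
          "fnPtr=unicorn@0x4002c6c5[libcms.so]0x2c6c5")
info = parse_register_natives(sample)
print("%s!%s at offset 0x%x" % (info["library"], info["name"], info["offset"]))
```

On 32-bit ARM a function pointer with its lowest bit set marks a Thumb entry point, so the odd offsets in this log most likely point at Thumb code; clear that bit to get the address of the first instruction.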

[Image: image.png, uploaded 2019-9-19 15:15]

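With this, the addresses of the UserInfo functions turn up right away, which makes both hooking and live debugging far more efficient.

(Back when I had to hunt for function addresses by hand, it nearly made me cry.)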

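OK, let's get started with Unicorn.

Unicorn quick start

Multi-architecture

Unicorn is an emulation framework built on the QEMU emulator. It supports the Arm, Arm64 (Armv8), M68K, Mips, Sparc, and X86 (including X86_64) instruction sets.

Multi-language

Unicorn provides bindings for many languages, such as C/C++, Python, and Java. Its DLL can also be called from other languages, such as Easy Language (易语言) and Delphi, so it has a bright future.

Thread safety

Unicorn was designed with thread safety in mind, so several emulations can run concurrently, which makes it much more practical.

Virtual memory

Unicorn uses a virtual memory mechanism that isolates the virtual CPU's memory from the real CPU's memory. It provides the following APIs for working with memory: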

uc_mem_map

uc_mem_read

uc_mem_write

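When mapping memory with uc_mem_map, both address and size must be aligned to 0x1000, that is, they must be integer multiples of 0x1000; otherwise the call fails with UC_ERR_ARG. How to allocate and manage memory dynamically and implement libc's malloc will be covered in a later lesson.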
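
The alignment rule is easy to satisfy with a little arithmetic. Here is a minimal sketch in plain Python; align_down, align_up, and aligned_region are hypothetical helper names, not Unicorn functions. It computes the smallest 0x1000-aligned region that covers an arbitrary address range.

```python
PAGE_SIZE = 0x1000  # the 4 KB granularity described above


def align_down(value, page=PAGE_SIZE):
    """Round value down to a multiple of page (page must be a power of two)."""
    return value & ~(page - 1)


def align_up(value, page=PAGE_SIZE):
    """Round value up to a multiple of page (page must be a power of two)."""
    return (value + page - 1) & ~(page - 1)


def aligned_region(address, size, page=PAGE_SIZE):
    """Return (base, length) of the smallest aligned region covering [address, address + size)."""
    base = align_down(address, page)
    end = align_up(address + size, page)
    return base, end - base


base, length = aligned_region(0x10234, 0x20)
print(hex(base), hex(length))
```

The resulting pair can be passed to the mapping call, for example mu.mem_map(base, length) in the Python binding, after which the original unaligned address can be read and written freely.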

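Hook mechanism

Unicorn's hook mechanism makes it convenient to control the virtual CPU programmatically.

Unicorn supports several different types of hooks.

They fall roughly into the following groups (each name is a Unicorn constant passed as the first argument of hook_add):

Instruction execution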

UC_HOOK_INTR

UC_HOOK_INSN

UC_HOOK_CODE

UC_HOOK_BLOCK

Memory access

UC_HOOK_MEM_READ

UC_HOOK_MEM_WRITE

UC_HOOK_MEM_FETCH

UC_HOOK_MEM_READ_AFTER

UC_HOOK_MEM_PROT

UC_HOOK_MEM_FETCH_INVALID

UC_HOOK_MEM_INVALID

UC_HOOK_MEM_VALID

Exception handling

UC_HOOK_MEM_READ_UNMAPPED

UC_HOOK_MEM_WRITE_UNMAPPED

UC_HOOK_MEM_FETCH_UNMAPPED

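Call hook_add to add a hook. Unicorn hooks are chained rather than overriding one another as traditional hooks do: you can add several hooks of the same type at once, and Unicorn calls each handler in turn. A hook callback also has a scope (see the begin and end parameters of hook_add).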

Let's walk through a simple example.

First install the Unicorn package:

pip install unicorn

Then create a new .py file:

from unicorn import *
from unicorn.arm_const import *

# ARM machine code to emulate:
#   mov r0, #0x37
#   sub r1, r2, r3
ARM_CODE = b"\x37\x00\xa0\xe3\x03\x10\x42\xe0"


# callback for tracing instructions
def hook_code(uc, address, size, user_data):
    print(">>> Tracing instruction at 0x%x, instruction size = 0x%x" % (address, size))


def test_arm():
    print("Emulate ARM code")
    try:
        # Initialize the emulator in ARM mode (the bytes above are ARM encodings, not Thumb)
        mu = Uc(UC_ARCH_ARM, UC_MODE_ARM)  # create the Uc object

        # map 2MB of memory for this emulation
        ADDRESS = 0x10000
        mu.mem_map(ADDRESS, 2 * 1024 * 1024)

        # write ARM_CODE into the mapped memory; mem_write only accepts bytes
        mu.mem_write(ADDRESS, ARM_CODE)

        # set register values before the machine is started
        mu.reg_write(UC_ARM_REG_R0, 0x1234)
        mu.reg_write(UC_ARM_REG_R2, 0x6789)
        mu.reg_write(UC_ARM_REG_R3, 0x3333)

        # add an instruction hook (begin=end=ADDRESS traces only the first instruction)
        # mu.hook_add(UC_HOOK_CODE, hook_code, begin=ADDRESS, end=ADDRESS)

        # start the machine: emulate the code with no time limit
        mu.emu_start(ADDRESS, ADDRESS + len(ARM_CODE))
        print("Emulation done")

        # read the results back from the registers
        r0 = mu.reg_read(UC_ARM_REG_R0)
        r1 = mu.reg_read(UC_ARM_REG_R1)
        print(">>> R0 = 0x%x" % r0)
        print(">>> R1 = 0x%x" % r1)
    except UcError as e:
        print("ERROR: %s" % e)


test_arm()

I have commented all the key spots, so it should be clear enough.

Let's look at the output:

[Screenshot: run output (image.png)]

R0 has become 0x37, and R1 = 0x3456.

But we never did anything to R1 above, so why does it hold a value?

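This is where the second assembly tool, Capstone, comes in.

ARM_CODE = b"\x37\x00\xa0\xe3\x03\x10\x42\xe0" is in fact the code that operates on the registers.

Let's use Capstone to translate it and see which instructions it contains.

First install the package:

pip install capstone

Then create a .py file: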

from capstone import *
from capstone.arm import *

CODE = b"\x37\x00\xa0\xe3\x03\x10\x42\xe0"

# disassemble the bytes as ARM code located at address 0x1000
md = Cs(CS_ARCH_ARM, CS_MODE_ARM)
for i in md.disasm(CODE, 0x1000):
    print("%x:\t%s\t%s" % (i.address, i.mnemonic, i.op_str))

Check the output:

[Screenshot: run output (image.png)]

Now this should be easy to read: it is the simple ARM instruction R1 = R2 - R3. With R2 = 0x6789 and R3 = 0x3333, that gives R1 = 0x3456.
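
To see where that reading comes from, the two instruction words can also be taken apart by hand. The sketch below uses only the standard library and handles just the two data-processing opcodes that appear in these bytes (mov and sub); it illustrates the bit layout and is in no way a substitute for Capstone. The function and table names are hypothetical.

```python
import struct

CODE = b"\x37\x00\xa0\xe3\x03\x10\x42\xe0"
OPCODES = {0b0010: "sub", 0b1101: "mov"}  # only the two opcodes used here


def decode_data_processing(word):
    """Split one ARM data-processing instruction word into its fields."""
    return {
        "cond": (word >> 28) & 0xF,           # 0xE means "always"
        "immediate": bool((word >> 25) & 1),  # operand 2 is an immediate
        "opcode": (word >> 21) & 0xF,
        "rn": (word >> 16) & 0xF,             # first operand register
        "rd": (word >> 12) & 0xF,             # destination register
        "operand2": word & 0xFFF,             # immediate or second register
    }


decoded = []
for (word,) in struct.iter_unpack("<I", CODE):  # ARM words are little-endian here
    f = decode_data_processing(word)
    name = OPCODES[f["opcode"]]
    if f["immediate"]:
        # the rotate field is zero in this instruction, so the low 8 bits are the value
        decoded.append("%s r%d, #0x%x" % (name, f["rd"], f["operand2"] & 0xFF))
    else:
        # no shift is applied in this instruction, so the low 4 bits name the register
        decoded.append("%s r%d, r%d, r%d" % (name, f["rd"], f["rn"], f["operand2"] & 0xF))

for line in decoded:
    print(line)

# sub r1, r2, r3 with the register values from the emulation example
r2, r3 = 0x6789, 0x3333
r1 = (r2 - r3) & 0xFFFFFFFF  # registers are 32 bits wide
print("R1 = 0x%x" % r1)
```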

Next you probably want to know how to print addresses, and how to single-step in Unicorn the way an ordinary emulator lets you, right?

无名, a well-known expert, has written a debugger for this. Let's look at its source code.

(I'm a die-hard fan of 无名.)

from unicorn import *
from unicorn import arm_const
from unicorn.arm_const import *
import sys
import hexdump  # third-party package: pip install hexdump
import capstone as cp

BPT_EXECUTE = 1
BPT_MEMREAD = 2
UDBG_MODE_ALL = 1
UDBG_MODE_FAST = 2

# Unicorn register id -> name shown by the debugger
REG_ARM = {arm_const.UC_ARM_REG_R0: "R0",
           arm_const.UC_ARM_REG_R1: "R1",
           arm_const.UC_ARM_REG_R2: "R2",
           arm_const.UC_ARM_REG_R3: "R3",
           arm_const.UC_ARM_REG_R4: "R4",
           arm_const.UC_ARM_REG_R5: "R5",
           arm_const.UC_ARM_REG_R6: "R6",
           arm_const.UC_ARM_REG_R7: "R7",
           arm_const.UC_ARM_REG_R8: "R8",
           arm_const.UC_ARM_REG_R9: "R9",
           arm_const.UC_ARM_REG_R10: "R10",
           arm_const.UC_ARM_REG_R11: "R11",
           arm_const.UC_ARM_REG_R12: "R12",
           arm_const.UC_ARM_REG_R13: "R13",
           arm_const.UC_ARM_REG_R14: "R14",
           arm_const.UC_ARM_REG_R15: "R15",
           arm_const.UC_ARM_REG_PC: "PC",
           arm_const.UC_ARM_REG_SP: "SP",
           arm_const.UC_ARM_REG_LR: "LR"
           }

REG_TABLE = {UC_ARCH_ARM: REG_ARM}


def str2int(s):
    # accept hexadecimal ("0x...") as well as decimal input
    if s.startswith('0x') or s.startswith("0X"):
        return int(s[2:], 16)
    return int(s)


def advance_dump(data, base):
    # hex dump of data, 16 bytes per line, with addresses starting at base
    PY3K = sys.version_info >= (3, 0)
    generator = hexdump.genchunks(data, 16)
    retstr = ''
    for addr, d in enumerate(generator):
        # 00000000:
        line = '%08X: ' % (base + addr * 16)
        # 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
        dumpstr = hexdump.dump(d)
        line += dumpstr[:8 * 3]
        if len(d) > 8:  # insert separator if needed
            line += ' ' + dumpstr[8 * 3:]
        # ................
        # calculate indentation, which may be different for the last line
        pad = 2
        if len(d) < 16:
            pad += 3 * (16 - len(d))
        if len(d) <= 8:
            pad += 1
        line += ' ' * pad

        for byte in d:
            # printable ASCII range 0x20 to 0x7E
            if not PY3K:
                byte = ord(byte)
            if 0x20 <= byte <= 0x7E:
                line += chr(byte)
            else:
                line += '.'
        retstr += line + '\n'
    return retstr

def _dbg_trace(mu, address, size, self):
    # UC_HOOK_CODE callback: record the address, then decide whether to pause
    self._tracks.append(address)
    if not self._is_step and self._tmp_bpt == 0:
        if address not in self._list_bpt:
            return
    if self._tmp_bpt != address and self._tmp_bpt != 0:
        return
    return _dbg_trace_internal(mu, address, size, self)


def _dbg_memory(mu, access, address, length, value, self):
    # invalid memory access callback: report it, open the prompt, then stop
    pc = mu.reg_read(arm_const.UC_ARM_REG_PC)
    print("memory error: pc: %x access: %x address: %x length: %x value: %x" %
          (pc, access, address, length, value))
    _dbg_trace_internal(mu, pc, 4, self)
    mu.emu_stop()
    return True


def _dbg_trace_internal(mu, address, size, self):
    # show registers and disassembly, then read commands until execution resumes
    self._is_step = False
    print("======================= Registers =======================")
    self.dump_reg()
    print("======================= Disassembly =====================")
    self.dump_asm(address, size * self.dis_count)

    while True:
        raw_command = input(">")
        if raw_command == '':
            # an empty line repeats the previous command
            raw_command = self._last_command
        self._last_command = raw_command
        command = []
        for c in raw_command.split(" "):
            if c != "":
                command.append(c)
        try:
            if command[0] == 'set':
                if command[1] == 'reg':  # set reg regname value
                    self.write_reg(command[2], str2int(command[3]))
                elif command[1] == 'bpt':
                    self.add_bpt(str2int(command[2]))
                else:
                    print("[Debugger Error]command error see help.")
            elif command[0] == 's' or command[0] == 'step':
                # self._tmp_bpt = address + size
                self._tmp_bpt = 0
                self._is_step = True
                break
            elif command[0] == 'n' or command[0] == 'next':
                self._tmp_bpt = address + size
                self._is_step = False
                break
            elif command[0] == 'r' or command[0] == 'run':
                self._tmp_bpt = 0
                self._is_step = False
                break
            elif command[0] == 'dump':
                if len(command) >= 3:
                    nsize = str2int(command[2])
                else:
                    nsize = 4 * 16
                self.dump_mem(str2int(command[1]), nsize)
            elif command[0] == 'list':
                if command[1] == 'bpt':
                    self.list_bpt()
            elif command[0] == 'del':
                if command[1] == 'bpt':
                    self.del_bpt(str2int(command[2]))
            elif command[0] == 'stop':
                # halt the emulation and hand control back to the caller of emu_start
                mu.emu_stop()
                break
            elif command[0] == 't':
                # switch the disassembler to Thumb
                self._capstone = self._capstone_thumb
                print("======================= Disassembly =====================")
                self.dump_asm(address, size * self.dis_count)
            elif command[0] == 'a':
                # switch the disassembler to ARM
                self._capstone = self._capstone_arm
                print("======================= Disassembly =====================")
                self.dump_asm(address, size * self.dis_count)
            elif command[0] == 'f':
                print(" == recent ==")
                for i in self._tracks[-10:-1]:
                    print(self.sym_handler(i))
            else:
                print("Command Not Found!")
        except Exception:
            print("[Debugger Error]command error see help.")

class UnicornDebugger:
    def __init__(self, mu, mode=UDBG_MODE_ALL):
        self._tracks = []
        self._mu = mu
        self._arch = mu._arch
        self._mode = mu._mode
        self._list_bpt = []
        self._tmp_bpt = 0
        self._error = ''
        self._last_command = ''
        self.dis_count = 5
        self._is_step = False
        self.sym_handler = self._default_sym_handler
        self._capstone_arm = None
        self._capstone_thumb = None

        # only 32-bit ARM has a register table (REG_TABLE) so far
        if self._arch != UC_ARCH_ARM:
            mu.emu_stop()
            raise RuntimeError("arch:%d is not supported! " % self._arch)

        if self._arch == UC_ARCH_ARM:
            capstone_arch = cp.CS_ARCH_ARM
        elif self._arch == UC_ARCH_ARM64:
            capstone_arch = cp.CS_ARCH_ARM64
        elif self._arch == UC_ARCH_X86:
            capstone_arch = cp.CS_ARCH_X86
        else:
            mu.emu_stop()
            raise RuntimeError("arch:%d is not supported! " % self._arch)

        if self._mode == UC_MODE_THUMB:
            capstone_mode = cp.CS_MODE_THUMB
        elif self._mode == UC_MODE_ARM:
            capstone_mode = cp.CS_MODE_ARM
        elif self._mode == UC_MODE_32:
            capstone_mode = cp.CS_MODE_32
        elif self._mode == UC_MODE_64:
            capstone_mode = cp.CS_MODE_64
        else:
            mu.emu_stop()
            raise RuntimeError("mode:%d is not supported! " % self._mode)

        self._capstone_thumb = cp.Cs(cp.CS_ARCH_ARM, cp.CS_MODE_THUMB)
        self._capstone_arm = cp.Cs(cp.CS_ARCH_ARM, cp.CS_MODE_ARM)
        # start with the disassembler that matches the emulator mode;
        # the 'a' and 't' commands switch between the two
        if self._mode == UC_MODE_THUMB:
            self._capstone = self._capstone_thumb
        else:
            self._capstone = self._capstone_arm

        if mode == UDBG_MODE_ALL:
            mu.hook_add(UC_HOOK_CODE, _dbg_trace, self)

        mu.hook_add(UC_HOOK_MEM_UNMAPPED, _dbg_memory, self)
        mu.hook_add(UC_HOOK_MEM_FETCH_PROT, _dbg_memory, self)

        self._regs = REG_TABLE[self._arch]

    def dump_mem(self, addr, size):
        data = self._mu.mem_read(addr, size)
        print(advance_dump(data, addr))

    def dump_asm(self, addr, size):
        md = self._capstone
        code = self._mu.mem_read(addr, size)
        count = 0
        for ins in md.disasm(code, addr):
            if count >= self.dis_count:
                break
            print("%s:\t%s\t%s" % (self.sym_handler(ins.address), ins.mnemonic, ins.op_str))
            count += 1  # without this the dis_count limit never takes effect

    def dump_reg(self):
        result_format = ''
        count = 0
        for rid in self._regs:
            rname = self._regs[rid]
            value = self._mu.reg_read(rid)
            if count < 4:
                result_format = result_format + ' ' + rname + '=' + hex(value)
                count += 1
            else:
                count = 0
                result_format += '\n' + rname + '=' + hex(value)
        print(result_format)

    def write_reg(self, reg_name, value):
        for rid in self._regs:
            rname = self._regs[rid]
            if rname == reg_name:
                self._mu.reg_write(rid, value)
                return
        print("[Debugger Error] Reg not found:%s " % reg_name)

    def show_help(self):
        help_info = """
        # commands
        # set reg
        # set bpt
        # n[ext]
        # s[tep]
        # r[un]
        # dump
        # list bpt
        # del bpt
        # stop
        # a/t change arm/thumb
        # f show ins flow
        """
        print(help_info)

    def list_bpt(self):
        for idx in range(len(self._list_bpt)):
            print("[%d] %s" % (idx, self.sym_handler(self._list_bpt[idx])))

    def add_bpt(self, addr):
        self._list_bpt.append(addr)

    def del_bpt(self, addr):
        self._list_bpt.remove(addr)

    def get_tracks(self):
        for i in self._tracks[-100:-1]:
            # print (self.sym_handler(i))
            pass
        return self._tracks

    def _default_sym_handler(self, address):
        return hex(address)

    def set_symbol_name_handler(self, handler):
        self.sym_handler = handler

def test_arm():
    print("Emulate ARM code")
    # the same ARM bytes as in the first example:
    #   mov r0, #0x37
    #   sub r1, r2, r3
    ARM_CODE = b"\x37\x00\xa0\xe3\x03\x10\x42\xe0"
    try:
        # Initialize emulator in ARM mode
        mu = Uc(UC_ARCH_ARM, UC_MODE_ARM)

        # map 2MB memory for this emulation
        ADDRESS = 0x10000
        mu.mem_map(ADDRESS, 2 * 1024 * 1024)
        mu.mem_write(ADDRESS, ARM_CODE)
        mu.reg_write(UC_ARM_REG_SP, 0x1234)
        mu.reg_write(UC_ARM_REG_R2, 0x6789)

        # debugger attach
        udbg = UnicornDebugger(mu)
        udbg.add_bpt(ADDRESS)

        # emulate machine code in infinite time
        mu.emu_start(ADDRESS, ADDRESS + len(ARM_CODE))

        sp = mu.reg_read(UC_ARM_REG_SP)
        r1 = mu.reg_read(UC_ARM_REG_R1)
        print(">>> SP = 0x%x" % sp)
        print(">>> R1 = 0x%x" % r1)
    except UcError as e:
        print("ERROR: %s" % e)


test_arm()

Let's look at the output:

[Screenshot: run output (image.png)]

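The register values and the disassembled instructions are both displayed.

Next, enter a command: step, run, or next. Don't these feel a lot like the F8, F9, and F10 keys for stepping into, stepping over, and running?

You can try them out for yourselves; I'll go straight to run.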
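
How the three commands differ comes down to the check that _dbg_trace makes before every instruction. The function below restates that check as a pure function so it can be tried without an emulator; should_pause is a hypothetical name, and its arguments stand for the debugger's _is_step, _tmp_bpt, and _list_bpt fields.

```python
def should_pause(address, is_step, tmp_bpt, breakpoints):
    """Mirror the checks _dbg_trace makes before it opens the prompt."""
    if not is_step and tmp_bpt == 0:
        # after "run": only a user breakpoint pauses execution
        return address in breakpoints
    if tmp_bpt != 0:
        # after "next": pause only at the temporary breakpoint (address + size)
        return address == tmp_bpt
    # after "step": pause at every instruction
    return True


ADDRESS = 0x10000
breakpoints = [ADDRESS]

# step: the very next instruction pauses
print(should_pause(ADDRESS + 4, True, 0, breakpoints))
# next, issued at ADDRESS with a 4-byte instruction: pause at ADDRESS + 4 only
print(should_pause(ADDRESS + 4, False, ADDRESS + 4, breakpoints))
print(should_pause(ADDRESS + 8, False, ADDRESS + 4, breakpoints))
# run: an instruction without a breakpoint does not pause
print(should_pause(ADDRESS + 4, False, 0, breakpoints))
```

One consequence of this logic is that next sets its temporary breakpoint at address + size, so if the current instruction branches elsewhere and never reaches that address, execution will not pause there; user breakpoints are also skipped while the temporary breakpoint is pending.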

[Screenshot: run output (image.png)]

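The values are all printed.

These are just the basics of Unicorn. Experts have already built many powerful reverse-engineering tools on top of it; if you are interested, go and look for them.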
