Oyente Setup, Framework Structure, and helloworld Case Walkthrough (Part 1)

Latest recommended article published on 2024-03-26 09:42:58

Mr. Water

Article link: https://blog.csdn.net/narcissus2_/article/details/115832793

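Goals of this article

Briefly describe a few pitfalls in setting up Oyente.
Clearly describe Oyente's framework structure and the contents of each file.
Run the helloworld.sol example and walk through the code path it takes.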

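1. Pitfalls when setting up Oyente

Oyente currently supports Solidity only up to version 0.4.19, so installing solc the way the official documentation describes will not work. The easiest fix is to use solc-select to install and switch between Solidity versions.
The official Oyente documentation does not mention the crytic_compile library, but input_helper imports it, so it has to be installed separately with pip (the pip package is named crytic-compile).
Oyente supports geth and evm only up to version 1.7.3. Download the 1.7.3 Ethereum tools bundle from the official site and replace both geth and evm in /usr/bin on Ubuntu.
Likewise, solc 0.4.19 has to be downloaded from the solidity project on GitHub and its executable placed in /usr/bin.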
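
Before running Oyente it can help to confirm that the installed tools are within these version limits. The following is a minimal sketch, not part of Oyente: the helper names extract_version and is_supported are made up for illustration, and the sample strings only imitate what a --version call might print.

```python
import re

# Upper bounds taken from the pitfalls listed above.
MAX_SUPPORTED = {"solc": (0, 4, 19), "geth": (1, 7, 3), "evm": (1, 7, 3)}


def extract_version(text):
    """Return the first x.y.z version found in text as a tuple of ints, or None."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    return tuple(int(part) for part in match.groups()) if match else None


def is_supported(tool, version_output):
    """True if the version in version_output does not exceed the known limit."""
    version = extract_version(version_output)
    return version is not None and version <= MAX_SUPPORTED[tool]


# Illustrative version strings (assumed, not captured from a real run).
print(is_supported("solc", "Version: 0.4.19+commit.c4cbbb05.Linux.g++"))  # True
print(is_supported("solc", "Version: 0.8.20+commit.a1b79de6.Linux.g++"))  # False
```

Comparing tuples of integers avoids the trap of comparing version strings, where "0.4.9" would sort after "0.4.19".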

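2. Oyente framework structure

3. Walkthrough of the helloworld.sol example

3.1. The helloworld script

The helloworld script contains a single function, helloworld. It is declared pure, meaning it neither reads nor modifies the contract's state variables; when it is invoked through a local call rather than a transaction, it also costs no gas.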

pragma solidity >=0.4.19;

contract test {
    function helloworld() pure public returns (string)
    {
        return "hello world";
    }
}

3.2. Command and results

We run the helloworld.sol test code with the following command.

sudo python3 oyente.py -s '/home/researchlib/oyente-master/example/helloworld.sol'

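The report is shown below.
- INFO:root is the name of the Python logger that emitted the line (oyente.py logs through the root logger). The rest of the line names the contract being analyzed, here the test contract declared on line 3 of helloworld.sol.
- EVM Code Coverage appears to be the share of the contract's EVM instructions that the analysis visited.
- The remaining lines report whether each kind of problem was detected: Integer Underflow, Integer Overflow, Parity Multisig Bug 2, Callstack Depth Attack Vulnerability, Transaction-Ordering Dependence (TOD), Timestamp Dependency, and Re-Entrancy Vulnerability.
If any of these issues are unfamiliar, see the companion article on Solidity attack cases and how to avoid them.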

INFO:root:contract /home/researchlib/oyente-master/example/helloworld.sol:test3:
INFO:symExec:	============ Results ===========
INFO:symExec:	  EVM Code Coverage: 			 99.5%
INFO:symExec:	  Integer Underflow: 			 False
INFO:symExec:	  Integer Overflow: 			 False
INFO:symExec:	  Parity Multisig Bug 2: 		 False
INFO:symExec:	  Callstack Depth Attack Vulnerability:  False
INFO:symExec:	  Transaction-Ordering Dependence (TOD): False
INFO:symExec:	  Timestamp Dependency: 		 False
INFO:symExec:	  Re-Entrancy Vulnerability: 		 False
INFO:symExec:	====== Analysis Completed ======

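3.3. oyente.py

oyente.py is the entry point of the whole project. It parses the command-line arguments, stores them in variables, and dispatches to the appropriate functions.
In this example the flag is -s and its value is the path to the contract source file.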

3.3.1. main()

def main():
    # TODO: Implement -o switch.
    global args
    parser = argparse.ArgumentParser()
    group = parser.add_mutually_exclusive_group(required=True)

    group.add_argument("-s",  "--source",    type=str, help="local source file name. Solidity by default. Use -b to process evm instead. Use stdin to read from stdin.")
    '''......'''

    if args.bytecode:
        exit_code = analyze_bytecode()
    elif args.standard_json:
        exit_code = analyze_solidity(input_type='standard_json')
    elif args.standard_json_output:
        exit_code = analyze_solidity(input_type='standard_json_output')
    else:
        exit_code = analyze_solidity()

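The main function does the following:
- It creates the argument parser and parses the command-line arguments; some of the resulting settings are stored in global_params.py.
- The long option name passed to add_argument becomes the attribute name on args. From our command line, args.source holds the path to the contract file.
- Since no other options are given, execution goes straight to exit_code = analyze_solidity() on line 227 of oyente.py.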
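
The way argparse turns -s/--source into args.source can be tried in isolation. This is a simplified sketch and not Oyente's real parser: the -u/--url option and the choose_analysis helper are hypothetical and exist only to show a mutually exclusive group and the dispatch step.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser()
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument("-s", "--source", type=str, help="local source file name")
    # Hypothetical second option, present only to demonstrate mutual exclusivity.
    group.add_argument("-u", "--url", type=str, help="hypothetical remote source")
    parser.add_argument("-b", "--bytecode", action="store_true",
                        help="treat the input as EVM bytecode")
    return parser


def choose_analysis(args):
    """Return the name of the analysis a main() like the one above would pick."""
    return "analyze_bytecode" if args.bytecode else "analyze_solidity"


args = build_parser().parse_args(["-s", "/tmp/helloworld.sol"])
print(args.source)            # /tmp/helloworld.sol
print(choose_analysis(args))  # analyze_solidity
```

Because the group is created with required=True, running such a parser with neither option makes argparse print a usage error and exit.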

3.3.2. analyze_solidity()

def analyze_solidity(input_type='solidity'):
    global args

    if input_type == 'solidity':
        helper = InputHelper(InputHelper.SOLIDITY, source=args.source, evm=args.evm, compilation_err=args.compilation_error, root_path=args.root_path, remap=args.remap, allow_paths=args.allow_paths)
    elif input_type == 'standard_json':
        helper = InputHelper(InputHelper.STANDARD_JSON, source=args.source, evm=args.evm, allow_paths=args.allow_paths)
    elif input_type == 'standard_json_output':
        helper = InputHelper(InputHelper.STANDARD_JSON_OUTPUT, source=args.source, evm=args.evm)
    inputs = helper.get_inputs(global_params.TARGET_CONTRACTS)
    results, exit_code = run_solidity_analysis(inputs)
    helper.rm_tmp_files()

    if global_params.WEB:
        six.print_(json.dumps(results))
    return exit_code

What analyze_solidity does:
- It builds the appropriate helper for the given input type.
- It calls the helper's get_inputs() to obtain the inputs variable.
- It passes inputs to run_solidity_analysis, which returns the results.
Let us look at the data held by InputHelper and by the entries of inputs.
The helper holds the following values:
- compilation_err = False
- compiled_contracts = [('/home/researchlib/oyente-master/example/helloworld.sol:test', '6060604...')]
- evm = False
- source = '/home/researchlib/oyente-master/example/helloworld.sol'
Each entry of inputs holds:
- contract = '/home/researchlib/oyente-master/example/helloworld.sol:test'
- source_map = {complex structure}
- source = '/home/researchlib/oyente-master/example/helloworld.sol'
- c_source = '/home/researchlib/oyente-master/example/helloworld.sol'
- c_name = 'test'
- disasm_file = '/home/researchlib/oyente-master/example/helloworld.sol:test.evm.disasm'
Because get_inputs() appends one entry per compiled contract, a source file with several contracts produces several entries in inputs.
So what does source_map contain?

3.3.3. get_inputs()

Next, look at the get_inputs function:

    def get_inputs(self, targetContracts=None):
        inputs = []
        if self.input_type == InputHelper.BYTECODE:
            with open(self.source, 'r') as f:
                bytecode = f.read()
            self._prepare_disasm_file(self.source, bytecode)

            disasm_file = self._get_temporary_files(self.source)['disasm']
            inputs.append({'disasm_file': disasm_file})
        else:
            contracts = self._get_compiled_contracts()
            self._prepare_disasm_files_for_analysis(contracts)
            for contract, _ in contracts:
                c_source, cname = contract.split(':')
                if targetContracts is not None and cname not in targetContracts:
                    continue
                c_source = re.sub(self.root_path, "", c_source)
                if self.input_type == InputHelper.SOLIDITY:
                    source_map = SourceMap(contract, self.source, 'solidity', self.root_path, self.remap, self.allow_paths)
                else:
                    source_map = SourceMap(contract, self.source, 'standard json', self.root_path)
                disasm_file = self._get_temporary_files(contract)['disasm']
                inputs.append({
                    'contract': contract,
                    'source_map': source_map,
                    'source': self.source,
                    'c_source': c_source,
                    'c_name': cname,
                    'disasm_file': disasm_file
                })
        if targetContracts is not None and not inputs:
            raise ValueError("Targeted contracts weren't found in the source code!")
        return inputs

The source_map variable is created on line 78 or line 80 of the file, depending on the input type:
- source_map = SourceMap(contract, self.source, 'solidity', self.root_path, self.remap, self.allow_paths)
- source_map = SourceMap(contract, self.source, 'standard json', self.root_path)
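
The contract keys that get_inputs() iterates over have the form "path:name", and the loop filters them by name. The sketch below isolates that splitting and filtering step. It is not Oyente code: select_contracts is a made-up name, the second contract "other" is hypothetical, and the bytecode strings are left truncated as in the article.

```python
def select_contracts(compiled_contracts, target_contracts=None):
    """Mimic the filtering loop of get_inputs() on (key, bytecode) pairs."""
    selected = []
    for contract, _bytecode in compiled_contracts:
        # The key is "<source path>:<contract name>".
        c_source, c_name = contract.split(":")
        if target_contracts is not None and c_name not in target_contracts:
            continue
        selected.append({"contract": contract, "c_source": c_source, "c_name": c_name})
    if target_contracts is not None and not selected:
        raise ValueError("Targeted contracts weren't found in the source code!")
    return selected


compiled = [
    ("example/helloworld.sol:test", "6060604..."),
    ("example/helloworld.sol:other", "6060604..."),  # hypothetical second contract
]
print(select_contracts(compiled, ["test"]))
```

Note that a plain split(":") assumes the path itself contains no colon, which holds for the Linux paths used here.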

3.3.4. SourceMap

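[Figure: fields of the SourceMap object]

The SourceMap object contains the fields shown in the figure above. The important ones are described below (the values are too long to list here):
- ast_helper: a helper class that stores various indexes over the contract and provides functions for querying those indexes and the contract's state.
- position_groups: holds the compiled assembly instructions (asm) and the auxiliary data (auxdata). For each instruction, begin is the character offset in the source where the corresponding construct starts and end is the offset where it ends. [See Appendix 1.1]
  - Example: {'begin': 27, 'end': 141, 'name': 'PUSH', 'value': '60'}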
pragma solidity >=0.4.19;
/* character offset 27: the contract starts here */
contract test {
    function helloworld() pure public returns (string)
    {
        return "hello world";
    }
}
/* character offset 141: the contract ends here */

Continued:
- source: a structure defined in source_map that holds the contract's fields.
- sources: presumably used when there are multiple source files.
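
The begin/end offsets from the position_groups example can be checked by slicing the source text directly. This sketch assumes helloworld.sol is exactly the file from section 3.1, with Unix line endings and one blank line after the pragma; under those assumptions the range 27 to 141 covers the whole contract definition.

```python
# helloworld.sol rebuilt line by line so the whitespace is unambiguous.
lines = [
    "pragma solidity >=0.4.19;",
    "",
    "contract test {",
    "    function helloworld() pure public returns (string)",
    "    {",
    '        return "hello world";',
    "    }",
    "}",
]
source = "\n".join(lines) + "\n"

# Entry copied from the position_groups example above.
position = {"begin": 27, "end": 141, "name": "PUSH", "value": "60"}

# begin is inclusive and end is exclusive, like a Python slice.
snippet = source[position["begin"]:position["end"]]
print(snippet)
```

The offsets count bytes in the source file; for an ASCII-only file like this one, bytes and characters coincide.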

3.3.5. run_solidity_analysis

def run_solidity_analysis(inputs):
    results = {}
    exit_code = 0

    for inp in inputs:
        logging.info("contract %s:", inp['contract'])
        result, return_code = symExec.run(disasm_file=inp['disasm_file'], source_map=inp['source_map'], source_file=inp['source'])

        try:
            c_source = inp['c_source']
            c_name = inp['c_name']
            results[c_source][c_name] = result
        except:
            results[c_source] = {c_name: result}

        if return_code == 1:
            exit_code = 1
    return results, exit_code

Here we iterate over inputs and obtain the result for each entry through symExec.run.
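
The try/except in run_solidity_analysis builds a two-level dictionary, results[c_source][c_name]. As an illustration of the same grouping only, here is a sketch that uses dict.setdefault instead of catching the exception. The function name collect_results and the sample values are assumptions, not Oyente code.

```python
def collect_results(per_contract_results):
    """Group results as results[c_source][c_name], like run_solidity_analysis."""
    results = {}
    for c_source, c_name, result in per_contract_results:
        # Create the inner dict on first use, then store the result under the contract name.
        results.setdefault(c_source, {})[c_name] = result
    return results


# Made-up sample data: two contracts in one file, one in another.
sample = [
    ("helloworld.sol", "test", {"evm_code_coverage": "99.5"}),
    ("helloworld.sol", "other", {"evm_code_coverage": "80.0"}),
    ("token.sol", "token", {"evm_code_coverage": "60.0"}),
]
print(collect_results(sample))
```

This form avoids a bare except, which in the original would also hide unrelated errors such as a missing key in inp.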

3.4. symExec.py

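This is the most important file in the framework and also the hardest to understand. Its basic steps are:
- Initialize and collect the various variables.
- Build the control flow graph (CFG), a graph whose nodes are basic blocks: straight-line instruction sequences with no branching inside a block.
- Traverse the CFG depth-first to enumerate all possible execution paths.
- Check each path with the z3 solver; unknown parameters are handled through symbolic execution.
The flow of symExec.py starting from run() is shown in the figure below. The call chain is run() -> analyze() -> run_build_cfg_and_analyze() -> build_cfg_and_analyze().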

3.4.1. symExec.py----run()

def run(disasm_file=None, source_file=None, source_map=None):
    global g_disasm_file
    global g_source_file
    global g_src_map
    global results

    g_disasm_file = disasm_file
    g_source_file = source_file
    g_src_map = source_map

    if is_testing_evm():
        test()
    else:
        begin = time.time()
        log.info("\t============ Results ===========")
        analyze()
        ret = detect_vulnerabilities()
        closing_message()
        return ret

This function receives the location of the generated disassembly file, the location of the source file, and the SourceMap object.
run then calls analyze().

3.4.2. symExec.py----analyze()

def analyze():
    def timeout_cb():
        if global_params.DEBUG_MODE:
            traceback.print_exc()

    run_build_cfg_and_analyze(timeout_cb=timeout_cb)

3.4.3. symExec.py----run_build_cfg_and_analyze()

def run_build_cfg_and_analyze(timeout_cb=do_nothing):
    initGlobalVars()
    global g_timeout

    try:
        with Timeout(sec=global_params.GLOBAL_TIMEOUT):
            build_cfg_and_analyze()
        log.debug('Done Symbolic execution')
    except TimeoutError:
        g_timeout = True
        timeout_cb()

3.4.4. symExec.py----build_cfg_and_analyze()

def build_cfg_and_analyze():
    change_format()
    with open(g_disasm_file, 'r') as disasm_file:
        disasm_file.readline()  # Remove first line
        tokens = tokenize.generate_tokens(disasm_file.readline)
        collect_vertices(tokens)
        construct_bb()
        construct_static_edges()
        full_sym_exec()  # jump targets are constructed on the fly

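In this example g_disasm_file is '/home/researchlib/oyente-master/example/helloworld.sol:test.evm.disasm', the location of the disassembled code for this contract.
tokenize is a lexical scanner that reports the type of each word or symbol it reads.
- All operators, delimiters, and the ellipsis are reported with type OP.
- generate_tokens() takes a readline callable and yields named tuples with 5 elements:
  - type: the token type
  - string: the token text
  - start: a 2-tuple of integers (srow, scol) giving the row and column where the token starts
  - end: a 2-tuple of integers (erow, ecol) giving the row and column where the token ends
  - line: the full input line on which the token was found
- The tuples also have an exact_type property giving the exact operator type for OP tokens.
- collect_vertices, described next, consumes these tuples.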
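
To see what generate_tokens() yields for one line of a disassembly file, the line can be fed in through io.StringIO. This is a small standalone sketch; the input line is the first instruction shown in the token table of the next subsection.

```python
import io
import tokenize

line = "0 PUSH1 => 0x60\n"

# generate_tokens() needs a readline callable; StringIO provides one.
tokens = list(tokenize.generate_tokens(io.StringIO(line).readline))

# Token type names are used instead of numbers, since the numeric
# values of some token types differ between Python versions.
summary = [(tokenize.tok_name[tok.type], tok.string) for tok in tokens]
starts = [tok.start for tok in tokens]

for (name, text), start in zip(summary, starts):
    print(name, repr(text), start)
```

The arrow "=>" is not a Python operator, so it comes out as two OP tokens, "=" and ">". That is why collect_vertices skips both strings when it rebuilds the line content.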

3.4.4.1. collect_vertices(tokens)

def collect_vertices(tokens):
    global g_src_map
    if g_src_map:
        idx = 0
        positions = g_src_map.positions
        length = len(positions)
    global end_ins_dict
    global instructions
    global jump_type

    current_ins_address = 0
    last_ins_address = 0
    is_new_line = True
    current_block = 0
    current_line_content = ""
    wait_for_push = False
    is_new_block = False

    for tok_type, tok_string, (srow, scol), _, line_number in tokens:
        if wait_for_push is True:
            # ...
        elif is_new_line is True and tok_type == NUMBER:  # looking for a line number
            # ...
        elif tok_type == NEWLINE:
            # ...
        elif tok_type == NAME:
            # ...
        if tok_string != "=" and tok_string != ">":
            current_line_content += tok_string + " "

    if current_block not in end_ins_dict:
        # ...

    if current_block not in jump_type:
        # ...
    for key in end_ins_dict:
        if key not in jump_type:
            jump_type[key] = "falls_to"

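This function mainly does the following:
- Parses the disassembly file.
- Identifies the individual basic blocks.
- Records them as vertices.
The main job of the loop is to add blocks to the vertices [important]:
- The loop acts on the type of each token. For example, when tok_type is NAME it inspects tok_string; if the instruction is a PUSH, wait_for_push is set to True. The fifth loop variable is named line_number, but tokenize actually puts the text of the line there.
- When a line has been read completely, mapping_push_instruction is called to move the instruction's entry from g_src_map.positions into g_src_map.instr_positions.
- The global end_ins_dict, judging from its use in construct_bb, maps each block's start address to the address of the block's last instruction.
- The global instructions records the instructions.
- The global jump_type records the kind of branch that ends each block, keyed by block address.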
tok_type	tok_string	(srow,scol)	line_number (line text)	is_new_line	wait_for_push
2 (NUMBER)	'0'	(1,0)	'0 PUSH1 => 0x60\n'	True	False
1 (NAME)	'PUSH1'	(1,2)	'0 PUSH1 => 0x60\n'	False	False
53 (OP)	'='	(1,8)	'0 PUSH1 => 0x60\n'	False	True
2 (NUMBER)	'2'	(2,0)	'2 PUSH1 => 0x40\n'	True	False
1 (NAME)	'PUSH1'	(2,2)	'2 PUSH1 => 0x40\n'	True	False
…	…	…	…	…	…
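
The idea of cutting an instruction stream into basic blocks can be shown without the tokenizer. The sketch below is a simplified stand-in for collect_vertices, not a copy of it: split_blocks, the opcode sets, and the "conditional" label are assumptions made for illustration. It produces the two dictionaries that construct_bb consumes, with end_ins_dict mapping a block's start address to its last instruction address.

```python
# Opcodes that end a block without a successor (assumed, simplified set).
TERMINAL_OPS = {"STOP", "RETURN", "REVERT", "INVALID", "SUICIDE"}


def split_blocks(instructions):
    """instructions: dict address -> text such as 'PUSH1 0x60'.
    Returns (end_ins_dict, jump_type), both keyed by block start address."""
    end_ins_dict, jump_type = {}, {}
    block_start, previous = None, None
    for address in sorted(instructions):
        opcode = instructions[address].split()[0]
        if block_start is None:
            block_start = address
        elif opcode == "JUMPDEST":
            # A jump target opens a new block; the previous block falls through.
            end_ins_dict[block_start] = previous
            jump_type.setdefault(block_start, "falls_to")
            block_start = address
        kind = None
        if opcode in TERMINAL_OPS:
            kind = "terminal"
        elif opcode == "JUMP":
            kind = "unconditional"
        elif opcode == "JUMPI":
            kind = "conditional"
        if kind is not None:
            # This instruction closes the current block.
            end_ins_dict[block_start] = address
            jump_type[block_start] = kind
            block_start = None
        previous = address
    if block_start is not None:
        end_ins_dict[block_start] = previous
        jump_type.setdefault(block_start, "falls_to")
    return end_ins_dict, jump_type


# Made-up instruction listing for illustration.
program = {0: "PUSH1 0x60", 2: "PUSH1 0x40", 4: "MSTORE", 5: "PUSH1 0x09",
           7: "JUMPI", 8: "STOP", 9: "JUMPDEST", 10: "STOP"}
print(split_blocks(program))
# ({0: 7, 8: 8, 9: 10}, {0: 'conditional', 8: 'terminal', 9: 'terminal'})
```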

3.4.4.2. construct_bb()

def construct_bb():
    global vertices
    global edges
    sorted_addresses = sorted(instructions.keys())
    size = len(sorted_addresses)
    for key in end_ins_dict:
        end_address = end_ins_dict[key]
        block = BasicBlock(key, end_address)
        if key not in instructions:
            continue
        block.add_instruction(instructions[key])
        i = sorted_addresses.index(key) + 1
        while i < size and sorted_addresses[i] <= end_address:
            block.add_instruction(instructions[sorted_addresses[i]])
            i += 1
        block.set_block_type(jump_type[key])
        vertices[key] = block
        edges[key] = []

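This function builds vertices and edges without any links between them yet.
vertices stores the BasicBlock objects, each holding the instructions of its block, as shown in the figure below.
edges maps each block's start address to a list of successor addresses, all initially empty, for example {0: [], 13: [], ...}.
The lists are filled in later, starting with construct_static_edges().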

3.4.4.3. construct_static_edges()

def construct_static_edges():
    add_falls_to()  # these edges are static

def add_falls_to():
    global vertices
    global edges
    key_list = sorted(jump_type.keys())
    length = len(key_list)
    for i, key in enumerate(key_list):
        if jump_type[key] != "terminal" and jump_type[key] != "unconditional" and i+1 < length:
            target = key_list[i+1]
            edges[key].append(target)
            vertices[key].set_falls_to(target)

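For every block whose jump_type is neither terminal nor unconditional, this function takes the next block in address order as the fall-through target and records it in both edges and vertices.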
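
The effect of add_falls_to on a small example can be reproduced with plain dictionaries. This is a sketch under assumptions: fall_through_edges is a made-up standalone function, the block addresses are invented, and the "conditional" label is assumed rather than taken from the code shown above.

```python
def fall_through_edges(jump_type):
    """Return edges with only the static fall-through successors filled in."""
    edges = {key: [] for key in jump_type}
    keys = sorted(jump_type)
    # The last block has no following block, so it is left out of the loop.
    for i, key in enumerate(keys[:-1]):
        if jump_type[key] not in ("terminal", "unconditional"):
            edges[key].append(keys[i + 1])
    return edges


jump_type = {0: "conditional", 8: "terminal", 9: "falls_to", 13: "terminal"}
print(fall_through_edges(jump_type))  # {0: [8], 8: [], 9: [13], 13: []}
```

Only the fall-through edges are known statically. The targets of JUMP and JUMPI depend on stack contents, which is why the original comment says that jump targets are constructed on the fly during symbolic execution.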

3.4.4.4. full_sym_exec()

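This function is the core of the Oyente framework: it carries out the various contract security checks.
The main steps are:
- Collect all parameters into the param variable.
- Use sym_exec_block to traverse all blocks depth-first.
- Perform symbolic execution, simulating the contents of the EVM stack and using the solver to constrain the range of the parameters.
- Apply the logic for each kind of potential problem and report the corresponding findings, for example by asking the solver whether an otherwise unconstrained integer can leave its valid range.
This part may be covered in the next post.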
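
The depth-first traversal of the CFG can be illustrated on a toy graph. The sketch below only enumerates paths; it does no symbolic execution and does not call a solver. The function enumerate_paths and the edge data are made up, and skipping blocks already on the current path is a simplification that should not be read as how Oyente bounds loops.

```python
def enumerate_paths(edges, start):
    """Depth-first enumeration of all acyclic paths from start to a block
    that has no unvisited successors."""
    paths = []

    def visit(block, path):
        path = path + [block]
        # Ignore successors already on this path so that cycles terminate.
        successors = [s for s in edges.get(block, []) if s not in path]
        if not successors:
            paths.append(path)
            return
        for successor in successors:
            visit(successor, path)

    visit(start, [])
    return paths


# Toy CFG: block 0 branches to 8 or 9, and block 9 continues to 13.
edges = {0: [8, 9], 8: [], 9: [13], 13: []}
print(enumerate_paths(edges, 0))  # [[0, 8], [0, 9, 13]]
```

In a symbolic executor each such path would also carry the branch conditions taken along it, and the solver would decide whether the path is feasible.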

Appendix

1. Variables

1.1. position_groups

Contents of the position_groups variable
