【nand2tetris_chap4&6】Machine Language Specification & Assembler (Python implementation)

4.2 Hack Machine Language Specification

4.2.1 Overview

  • The Hack computer is a von Neumann platform
  • 16-bit machine
  • consisting of
    1. a CPU
    2. two separate memory modules serving as instruction memory and data memory
    3. two memory-mapped I/O devices: a screen and a keyboard

Memory Address Spaces

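There are two separate address spaces, instruction memory and data memory. Each is 16 bits wide and has a 15-bit address space (instructions are 16 bits long, but one bit is reserved to identify the instruction type, so only 15 bits are available for an address), which gives 32K 16-bit words per memory.
The CPU can only execute programs that reside in instruction memory. This memory can be implemented as a ROM pre-burned with the instructions; loading a new program means re-burning the ROM.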

Register

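Hack assembly mainly uses two registers, D and A. D is used only to store data values, while A can serve as either a data register or an address register, depending on context. The memory location addressed by A is referred to as the register M. For example, to perform D = Memory[516] - 1, first set A to 516 and then execute D=M-1. Similarly, A can point into instruction memory to implement branching.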
Conceptual model of the Hack memory system.
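
To make the role of M concrete, here is a toy Python model of the three registers. It is only an illustration of the `D = Memory[516] - 1` example; the class name `ToyHack` and the initial value stored at address 516 are made up for this sketch and are not part of the assembler further down.

```python
# Toy model of the A, D and M registers (illustrative only).
class ToyHack:
    def __init__(self):
        self.A = 0                 # address/data register
        self.D = 0                 # data register
        self.ram = [0] * 32768     # 32K words of data memory

    @property
    def M(self):
        # M is not a separate storage cell: it names RAM[A]
        return self.ram[self.A]

    @M.setter
    def M(self, value):
        self.ram[self.A] = value & 0xFFFF   # keep values 16 bits wide


cpu = ToyHack()
cpu.ram[516] = 10   # assumed starting content of Memory[516]
cpu.A = 516         # @516
cpu.D = cpu.M - 1   # D=M-1
print(cpu.D)        # 9
```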

4.2.2 A-instruction

 The Hack instruction set

An A-instruction stores a 15-bit value in the A register.

@value  // where value is either a non-negative decimal number or a symbol referring to such a number

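Uses:

  • It is the only way to enter a numeric constant into the computer under program control
  • It sets up a subsequent computation: the value (or address) loaded into A is then combined with D, or used to select M, by the C-instruction that follows
  • It sets up a jump, by loading the jump target into A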
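
As a minimal sketch of the encoding, assuming the symbol has already been resolved to a number: the binary form is a leading 0 followed by the 15-bit value. The function name `encode_a_instruction` is hypothetical and used only for this example.

```python
def encode_a_instruction(value):
    """Return the 16-bit binary string for `@value` (value must fit in 15 bits)."""
    if not 0 <= value <= 0x7FFF:
        raise ValueError('A-instruction value must be in 0..32767')
    # The leading 0 is the bit that marks an A-instruction.
    return '0' + format(value, '015b')


print(encode_a_instruction(516))   # 0000001000000100
```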

4.2.3 C-instruction

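A complete C-instruction, written as dest=comp;jump, specifies three things:

  • what to compute
  • where to store the computed value
  • what to do next

In the binary encoding, the leftmost bit is 1 to mark a C-instruction, the next two bits are unused and set to 1, and the remaining bits hold the comp (7 bits), dest (3 bits) and jump (3 bits) fields, in that order.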
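
Since both the `dest=` and `;jump` parts are optional, splitting a C-instruction into its three fields takes a little care. The following is one possible sketch using `str.rpartition` and `str.partition`; the function name `split_c_instruction` is an assumption of this example.

```python
def split_c_instruction(command):
    """Split 'dest=comp;jump' into (dest, comp, jump); dest and jump may be empty."""
    dest, _, rest = command.rpartition('=')   # no '=' leaves dest == ''
    comp, _, jump = rest.partition(';')       # no ';' leaves jump == ''
    return dest.strip(), comp.strip(), jump.strip()


print(split_c_instruction('D=M-1'))     # ('D', 'M-1', '')
print(split_c_instruction('0;JMP'))     # ('', '0', 'JMP')
print(split_c_instruction('MD=D+1'))    # ('MD', 'D+1', '')
```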

comp field

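The comp field consists of one a-bit and six c-bits. Through the predefined ALU design, these bits select which function is applied to the registers A, M and D; the a-bit chooses whether the ALU operates on A (a=0) or on M (a=1). Seven bits could encode 128 functions in theory; only 28 are used here.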
the compute field of C-instruction
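
One way to see where the count of 28 comes from: there are 18 mnemonics with a=0, and every one of them that mentions A has an a=1 twin with the same six c-bits and M in place of A. The sketch below derives the M table from the A table; the names `COMP_A`, `COMP_M` and `encode_comp` are hypothetical.

```python
# c-bits for the 18 mnemonics that use a=0
COMP_A = {
    '0': '101010', '1': '111111', '-1': '111010',
    'D': '001100', 'A': '110000', '!D': '001101', '!A': '110001',
    '-D': '001111', '-A': '110011', 'D+1': '011111', 'A+1': '110111',
    'D-1': '001110', 'A-1': '110010', 'D+A': '000010', 'D-A': '010011',
    'A-D': '000111', 'D&A': '000000', 'D|A': '010101',
}
# a=1 variants: same c-bits, with M substituted for A
COMP_M = {name.replace('A', 'M'): bits
          for name, bits in COMP_A.items() if 'A' in name}


def encode_comp(comp):
    """Return the 7 comp bits (a-bit followed by six c-bits)."""
    if comp in COMP_A:
        return '0' + COMP_A[comp]
    if comp in COMP_M:
        return '1' + COMP_M[comp]
    raise ValueError('unknown comp mnemonic: ' + comp)


print(len(COMP_A) + len(COMP_M))   # 28
print(encode_comp('M-1'))          # 1110010
```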

dest field

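This 3-bit field specifies which registers, if any, should store the value computed by the ALU. The three bits enable A, D and M respectively.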
the dest field of C-instruction
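
Because each dest bit simply enables one register, the field can be computed instead of looked up. This is a sketch under that assumption; `encode_dest` is a hypothetical name, and unlike a strict table lookup it also accepts the letters in any order (for example `DM`).

```python
def encode_dest(dest):
    """Return the 3 dest bits (in the order A, D, M) for a mnemonic such as 'MD'."""
    if set(dest) - set('ADM'):
        raise ValueError('unknown dest: ' + dest)
    return ''.join('1' if reg in dest else '0' for reg in 'ADM')


print(encode_dest(''))      # 000
print(encode_dest('MD'))    # 011
print(encode_dest('AMD'))   # 111
```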

jump field

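Depending on a condition on the ALU output, this field decides whether to jump to the instruction addressed by the A register.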
the jump field of C-instruction
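
The three jump bits can be read as "jump if the ALU output is negative", "jump if it is zero" and "jump if it is positive", in that order. The sketch below evaluates that condition in Python; `JUMP_BITS` and `should_jump` are hypothetical names, and `alu_out` is assumed to be an ordinary signed Python integer.

```python
JUMP_BITS = {'': '000', 'JGT': '001', 'JEQ': '010', 'JGE': '011',
             'JLT': '100', 'JNE': '101', 'JLE': '110', 'JMP': '111'}


def should_jump(jump_bits, alu_out):
    """Return True if a C-instruction with these jump bits jumps for alu_out."""
    j1, j2, j3 = (bit == '1' for bit in jump_bits)   # negative, zero, positive
    return (j1 and alu_out < 0) or (j2 and alu_out == 0) or (j3 and alu_out > 0)


print(should_jump(JUMP_BITS['JGE'], 0))    # True
print(should_jump(JUMP_BITS['JGE'], -1))   # False
print(should_jump(JUMP_BITS['JMP'], -1))   # True
```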

Conflicting Uses of the A Register

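The A register can act both as a data memory pointer, selecting the M register that a C-instruction may operate on, and as an instruction memory pointer, selecting the address that a C-instruction may jump to. For this reason, a C-instruction that may cause a jump should avoid referencing M.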

4.2.4 Symbols

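In the assembly language designed in this course, memory addresses can be referred to using either constants or symbols. Symbols fall into the following kinds:

  • Predefined symbols
    1. Virtual registers: to simplify programming, R0 ~ R15 refer to RAM addresses 0 ~ 15
    2. Predefined pointers: SP, LCL, ARG, THIS and THAT refer to RAM addresses 0 ~ 4, respectively
    3. I/O pointers: SCREEN refers to RAM address 16384 (0x4000) and KBD to RAM address 24576 (0x6000), the base addresses of the memory-mapped devices
  • Label symbols
    Used as the targets of goto statements and declared as (xxx). Once declared, the label xxx refers to the instruction memory address of the next instruction. A label can be declared only once, but it can be used any number of times anywhere in the program, including before the line on which it is declared.
  • Variable symbols
    Any programmer-defined symbol xxx appearing in the program that is neither a predefined symbol nor a label is treated as a variable, and the assembler allocates RAM for it, starting at address 16 (0x0010).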
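
The three kinds of symbols suggest a two-pass resolution: first record every label against the address of the next instruction, then hand out RAM addresses from 16 upward to whatever names remain. Below is a compact sketch of that idea, assuming comments and blank lines have already been stripped; `resolve_symbols` is a hypothetical helper, separate from the full implementation later in this post.

```python
def resolve_symbols(lines):
    """Replace every symbol in `lines` with its numeric address."""
    table = {'SP': 0, 'LCL': 1, 'ARG': 2, 'THIS': 3, 'THAT': 4,
             'SCREEN': 16384, 'KBD': 24576}
    table.update({'R%d' % i: i for i in range(16)})

    # Pass 1: a label refers to the address of the next real instruction.
    instructions = []
    for line in lines:
        if line.startswith('(') and line.endswith(')'):
            table[line[1:-1]] = len(instructions)
        else:
            instructions.append(line)

    # Pass 2: any remaining unknown name is a variable, allocated from RAM[16].
    next_free = 16
    resolved = []
    for line in instructions:
        if line.startswith('@') and not line[1:].isdigit():
            name = line[1:]
            if name not in table:
                table[name] = next_free
                next_free += 1
            line = '@%d' % table[name]
        resolved.append(line)
    return resolved


program = ['@i', 'M=1', '(LOOP)', '@i', 'M=M-1', 'D=M',
           '@LOOP', 'D;JGT', '@R0', 'M=D']
print(resolve_symbols(program))
# ['@16', 'M=1', '@16', 'M=M-1', 'D=M', '@2', 'D;JGT', '@0', 'M=D']
```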

4.2.5 Input/Output Handling

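The Hack platform manages its two peripherals, a screen and a keyboard, through memory mapping. Drawing a pixel on the screen is done by writing to a designated memory region (SCREEN), and reading the keyboard is done by polling a designated memory location (KBD), which is continuously refreshed.

  • The screen is 256 rows by 512 columns of pixels, mapped onto an 8K block of RAM: 256 * 512 = 8 * 1024 * 16
  • The keyboard is mapped onto the 16-bit word at RAM address 24576 (0x6000), so in theory 65536 distinct key codes could be represented. (It reports ASCII codes plus the special codes below.)
    Special keyboard codes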
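
As a worked example of the screen mapping: each row of 512 pixels occupies 512 / 16 = 32 consecutive words, so a pixel's word and bit position follow from simple arithmetic. The sketch assumes the Hack convention that the pixel at column c is bit c % 16 of its word, counted from the least significant bit; `pixel_location` is a hypothetical helper name.

```python
SCREEN = 16384
WORDS_PER_ROW = 512 // 16   # 32 words per screen row


def pixel_location(row, col):
    """Return (RAM address, bit index) of the pixel at (row, col)."""
    if not (0 <= row < 256 and 0 <= col < 512):
        raise ValueError('pixel out of range')
    return SCREEN + row * WORDS_PER_ROW + col // 16, col % 16


print(pixel_location(0, 0))       # (16384, 0)
print(pixel_location(255, 511))   # (24575, 15), the last word before KBD
```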

4.2.6 Syntax Convention and File Format

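  • Machine code files: .hack
  • Assembly files: .asm
    1. Each line is either an instruction or a label declaration (Symbol)
    2. Constants are non-negative decimal numbers; a user-defined symbol may be any sequence of letters, digits, _, ., $ and : that does not begin with a digit
    3. Comments start with // and run to the end of the line
    4. Blank lines are ignored
    5. Case sensitive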
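
These conventions translate into two small checks: stripping comments and surrounding whitespace from a line, and validating a user-defined symbol (letters, digits, `_`, `.`, `$` and `:`, not starting with a digit). The following sketch shows one way to write them; `clean_line`, `is_valid_symbol` and `SYMBOL_RE` are names invented for this example.

```python
import re

# letters, digits, _ . $ : allowed; the first character must not be a digit
SYMBOL_RE = re.compile(r'[A-Za-z_.$:][A-Za-z0-9_.$:]*')


def clean_line(line):
    """Drop a trailing // comment and surrounding whitespace."""
    return line.split('//', 1)[0].strip()


def is_valid_symbol(name):
    """Return True if `name` is a legal user-defined symbol."""
    return SYMBOL_RE.fullmatch(name) is not None


print(repr(clean_line('   D=M   // load the value')))   # 'D=M'
print(is_valid_symbol('loop_1'), is_valid_symbol('1loop'))   # True False
```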

Implementation

python code

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

"""Compile Assembly code into Binary Machine code."""

__author__ = 'eric'

import getopt
import os
import sys


# ----------------------------------------------------------
# Description:
#   Compile assembly language into machine language(binary code)
#   .asm --> .hack (.bin)
# Input:
#   asm_list        - list of str that compose asm file
def assembler(asm_list):
    print('[Log]:\t--*  assembler  *--')
    # print('[Log]:\tasm_list:\n', asm_list)

    # build symbol table and replace all the symbol in .asm file with its corresponding numeric value
    build_symbol_table = SymbolTable(asm_list)  # instantiate SymbolTable
    no_symbol_list = build_symbol_table.get_no_symbol_list()

    # parser .asm code line by line into machine code(binary)
    parser = Parser(no_symbol_list)  # instantiate Parser
    bin_list = parser.get_bin_list()
    # print('[Log]:\tparsed file:\n', bin_list)

    return bin_list


# ----------------------------------------------------------
# Description:
#   build symbol table and replace all the symbol in .asm file with its corresponding numeric value
class SymbolTable(object):
    def __init__(self, asm_list):
        self.asm_list = asm_list
        self.no_comment_list = []
        self.no_symbol_list = []
        # add predefined symbol into symbol_table
        self.symbol_table = {
            'SP': '0',
            'LCL': '1',
            'ARG': '2',
            'THIS': '3',
            'THAT': '4',
            'R0': '0',
            'R1': '1',
            'R2': '2',
            'R3': '3',
            'R4': '4',
            'R5': '5',
            'R6': '6',
            'R7': '7',
            'R8': '8',
            'R9': '9',
            'R10': '10',
            'R11': '11',
            'R12': '12',
            'R13': '13',
            'R14': '14',
            'R15': '15',
            'SCREEN': '16384',
            'KBD': '24576'}

        # main process
        self.remove_comment()
        self.proc_label()
        self.proc_variable()
        self.replace_symbol()

    # ----------------------------------------------------------
    # Description:
    # Remove comment and empty line
    def remove_comment(self):
        for line in self.asm_list:
            if '//' in line:  # remove comment
                line = line[0:line.index('//')]

            line = line.strip()  # remove white space in both left and right side

            if len(line) != 0:  # ignore empty line
                self.no_comment_list.append(line)

    def get_no_comment_list(self):
        return self.no_comment_list

    # ----------------------------------------------------------
    # Description:
    #   find and remove all Label, update the symbol_table
    def proc_label(self):
        temp_list = []
        for line in self.no_comment_list:
            if self.contain_label(line):
                cur_label = line[1:-1]
                self.symbol_table[cur_label] = str(len(temp_list))
            else:
                temp_list.append(line)
        self.no_comment_list = temp_list

    # ----------------------------------------------------------
    # Description:
    #   detect if line contain label('(' + label + ')'),
    # Input:
    #     line     one line(str) in .asm file
    @staticmethod
    def contain_label(line):
        if (line[0] == '(') and (line[-1] == ')'):
            return 1
        else:
            return 0

    # ----------------------------------------------------------
    # Description:
    #   find all variable and then add them into symbol_table, won't make change on no_comment_list
    def proc_variable(self):
        pointer_in_ram = 16
        for line in self.no_comment_list:
            if self.contain_variable(line):
                cur_variable = line[1:]
                if cur_variable not in self.symbol_table:
                    self.symbol_table[cur_variable] = str(pointer_in_ram)
                    pointer_in_ram += 1

    # ----------------------------------------------------------
    # Description:
    #   detect if line is an A-instruction that references a symbol
    #   ('@' followed by a name that does not start with a digit)
    # Input:
    #     line     one line(str) in .asm file
    @staticmethod
    def contain_variable(line):
        # symbols may start with a letter or one of _ . $ :, so test for "not a digit"
        return len(line) > 1 and line[0] == '@' and not line[1].isdigit()

    # ----------------------------------------------------------
    # Description:
    #   replace all symbol encountered in .asm file with corresponding numeric value
    def replace_symbol(self):
        for line in self.no_comment_list:
            if self.contain_variable(line):  # contain variable
                cur_symbol = line[1:]
                if cur_symbol in self.symbol_table:
                    self.no_symbol_list.append('@' + self.symbol_table[cur_symbol])
                else:
                    print('[Log]:\tError_unresolved_symbol')
            else:
                self.no_symbol_list.append(line)

    def get_no_symbol_list(self):
        return self.no_symbol_list


# ----------------------------------------------------------
# Description:
#   parse .asm file into corresponding machine code file
class Parser(object):
    def __init__(self, no_symbol_list):
        self.no_symbol_list = no_symbol_list
        self.bin_list = []

        # main process
        for line in self.no_symbol_list:
            if '@' in line:
                self.bin_list.append(self.parse_a_command(line))
            else:
                self.bin_list.append(self.parse_c_command(line))

    def get_bin_list(self):
        return self.bin_list

    # ----------------------------------------------------------
    # parse a command and then return its corresponding binary code
    # Input:
    #     command   a command line
    @staticmethod
    def parse_a_command(command):
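        if command[0] != '@':
            print('[Log]:\tError_failed_resolving_a_command')
            sys.exit(1)

        value = int(command[1:])
        if not 0 <= value <= 32767:  # an A-instruction carries a 15-bit value
            print('[Log]:\tError_a_command_value_out_of_range')
            sys.exit(1)

        # leading 0 marks an A-instruction, followed by the 15-bit value
        return format(value, '016b')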

    # ----------------------------------------------------------
    # parse c command and then return its corresponding binary code
    # Input:
    #     command   a command line
    @staticmethod
    def parse_c_command(command):
        bin_str = '111'
        comp_dict_A = {'0': '101010',
                       '1': '111111',
                       '-1': '111010',
                       'D': '001100',
                       'A': '110000',
                       '!D': '001101',
                       '!A': '110001',
                       '-D': '001111',
                       '-A': '110011',
                       'D+1': '011111',
                       'A+1': '110111',
                       'D-1': '001110',
                       'A-1': '110010',
                       'D+A': '000010',
                       'D-A': '010011',
                       'A-D': '000111',
                       'D&A': '000000',
                       'D|A': '010101'}

        comp_dict_M = {'M': '110000',
                       '!M': '110001',
                       '-M': '110011',
                       'M+1': '110111',
                       'M-1': '110010',
                       'D+M': '000010',
                       'D-M': '010011',
                       'M-D': '000111',
                       'D&M': '000000',
                       'D|M': '010101'}

        dest_dict = {'': '000',
                     'M': '001',
                     'D': '010',
                     'MD': '011',
                     'A': '100',
                     'AM': '101',
                     'AD': '110',
                     'AMD': '111'}

        jump_dict = {'': '000',
                     'JGT': '001',
                     'JEQ': '010',
                     'JGE': '011',
                     'JLT': '100',
                     'JNE': '101',
                     'JLE': '110',
                     'JMP': '111'}

        # break command into three part
        dest = ''
        jump = ''
        if '=' in command:
            dest = command[0:command.index('=')]
            command = command[command.index('=') + 1:]

        if ';' in command:
            jump = command[command.index(';') + 1:]
            command = command[0:command.index(';')]

        comp = command

        # comp part
        if 'M' in comp:
            bin_str += '1' + comp_dict_M[comp]
        else:
            bin_str += '0' + comp_dict_A[comp]

        # dest field
        bin_str += dest_dict[dest]
        # jump field
        bin_str += jump_dict[jump]

        return bin_str


# ----------------------------------------------------------
# Description:
#       receive command parameters, validating input path, pass asm_list to assembler, write bin_list to output_path
def main():
    input_path = recv_opt_arg(sys.argv)
    print('input_path:\t' + input_path)

    if os.path.isfile(input_path):
        if input_path[-4:] == '.asm':
            output_path = input_path[0:-3] + 'hack'
            print('output_path:\t' + output_path)

            with open(input_path, 'r') as f:  # read the .asm file into a list of lines
                asm_list = f.readlines()

            bin_list = assembler(asm_list)
            write_out_file(output_path, bin_list)
        else:
            print('[Log]:\tError_invalid_file_type')
            sys.exit(1)
    else:
        print('[Log]:\tError_invalid_input_path')
        sys.exit(1)


# ----------------------------------------------------------
# Description:
#       receive command line input. return input_path or print usage
# Output:
#       input_path
def recv_opt_arg(argv):
    # print('sys.argv=| ', argv, ' |')

    try:
        opts, args = getopt.gnu_getopt(argv[1:], 'i:h?', ['input_path=', 'help'])
        # 'opts' is a list of tuple ('option', 'value'), each option match one value
        # 'args' is a list contains extra arguments
        # print('opts=| ', opts, ' |')
        # print('args=| ', args, ' |')
    except getopt.GetoptError as e:
        print(e)
        print_usage(argv[0])
        sys.exit()

    input_path = os.getcwd()  # default input path
    for opt, value in opts:  # ('option', 'value'), tuple
        if opt in ['-h', '-?', '--help']:  # print help information
            print_usage(argv[0])
            sys.exit(0)
        elif opt in ['-i', '--input_path']:  # input_path
            input_path = value

    return input_path


# ----------------------------------------------------------
# Description:
#   print usage information of this script
def print_usage(cmd):
    print(('*********************************************************\n' +
           ' --* This message gives you some detailed information! *--\n' +
           'Usage: {0} [OPTION]... [PATH]...\n' +
           '- OPTION:\n' +
           '  {0} -i | --input_path\tinput path\n' +
           '  {0} -h | -? | --help\tprint help info and exit script\n' +
           '- PATH:\n' +
           '  Provides the name of the file you want to process or the directory that contains those files\n' +
           ' --*  *-- \n' +
           '*********************************************************\n').format(cmd))


# ----------------------------------------------------------
# Description:
#       writing out_list into file named out_file_path
# Input:
#       out_file_path, out_list
def write_out_file(out_file_path, out_list):
    with open(out_file_path, 'w') as f:
        for line in out_list:
            f.write(line + '\n')


if __name__ == '__main__':
    main()
