[nand2tetris_chap4&6] Machine Language Specification & Assembler (Python implementation)

runtimeErr0r

Last modified 2022-03-20 16:25:31


Series: nand2tetris | Tags: python

First published 2022-03-04 00:40:23

Original link: https://blog.csdn.net/qq_45793215/article/details/123255880


Contents

4.2 Hack Machine Language Specification
Implementation

4.2 Hack Machine Language Specification

4.2.1 Overview

The Hack computer is a 16-bit von Neumann platform consisting of:
1. a CPU
2. two separate memory modules serving as instruction memory and data memory
3. two memory-mapped I/O devices: a screen and a keyboard

Memory Address Spaces

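There are two separate address spaces, instruction memory and data memory. Both are 16 bits wide and use 15-bit addresses (Hack instructions are 16 bits long, but one bit is reserved to identify the instruction type, so an A-instruction can carry only a 15-bit address); that is, each memory holds 32K 16-bit words.
The CPU can only execute programs that reside in instruction memory. Instruction memory can be implemented as a ROM pre-burned with the program; loading a new program means replacing the ROM.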
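The 32K figure follows directly from the 15-bit address width. A quick check of the arithmetic in Python (the variable names are illustrative only):

```python
# Sketch of the Hack memory-size arithmetic described above.
WORD_BITS = 16                 # every Hack word is 16 bits wide
ADDRESS_BITS = WORD_BITS - 1   # one bit is reserved for the instruction type

words_per_memory = 2 ** ADDRESS_BITS   # number of addressable words
max_address = words_per_memory - 1     # highest valid address

print(words_per_memory, max_address)   # 32768 32767
print(words_per_memory // 1024, "K words of", WORD_BITS, "bits each")
```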

Register

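Hack assembly mainly works with two registers, D and A. D is used only to hold data. A, depending on context, serves either as a data register or as an address register, and the memory word that A points to is referred to as the register M. For example, to compute D = Memory[516] - 1, first set A to 516 and then execute D=M-1. Similarly, A can point into instruction memory to implement branching.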
Conceptual model of the Hack memory system.
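To make the A/D/M relationship concrete, here is a minimal, hypothetical model in Python (the class name HackRegisters is an illustration, not part of the course tools, and 16-bit overflow is ignored). The key point is that M is not a physical register but a view of RAM[A]:

```python
class HackRegisters:
    """Simplified register model: M is an alias for ram[A]."""

    def __init__(self, ram_size=32768):
        self.A = 0
        self.D = 0
        self.ram = [0] * ram_size

    @property
    def M(self):
        return self.ram[self.A]      # reading M reads the word A points to

    @M.setter
    def M(self, value):
        self.ram[self.A] = value     # writing M writes the word A points to


cpu = HackRegisters()
cpu.ram[516] = 10

cpu.A = 516          # @516
cpu.D = cpu.M - 1    # D=M-1
print(cpu.D)         # 9
```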

4.2.2 A-instruction

The Hack instruction set

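The A-instruction sets the A register to a 15-bit value.

@value  // where value is either a non-negative decimal number or a symbol referring to such a number

Uses:

It is the only way to enter a constant into the computer under program control.
It sets the stage for a computation: an operand (or the address of an operand in memory) is first loaded into A, and a following C-instruction then computes with it and D.
It sets the stage for a jump: the target address is loaded into A before the C-instruction that may jump.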
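Because a constant must fit in 15 bits, the binary form of an A-instruction is simply the value zero-padded to 16 bits, and the leading 0 doubles as the A-instruction op-code. A sketch of that encoding for decimal constants (encode_a_instruction is a hypothetical helper name):

```python
def encode_a_instruction(value: int) -> str:
    """Encode '@value' as a 16-bit binary string (decimal constants only)."""
    if not 0 <= value < 2 ** 15:
        raise ValueError(f"A-instruction constant out of range: {value}")
    # value < 2**15 fits in 15 bits, so the leftmost (op-code) bit is always 0
    return format(value, "016b")


print(encode_a_instruction(516))    # 0000001000000100
print(encode_a_instruction(32767))  # 0111111111111111
```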

4.2.3 C-instruction

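A complete C-instruction, written as dest=comp;jump (either dest or jump may be omitted), specifies:

what to compute (comp),
where to store the computed value (dest),
what to do next (jump).
In binary, the leftmost bit is 1, which marks a C-instruction; the next two bits are unused and set to 1; they are followed by the comp bits, the dest bits, and the jump bits.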
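The bit layout above can be read off by slicing. A sketch that splits a 16-bit C-instruction string into its fields (split_c_instruction is a hypothetical helper; the example word is D=M-1, built from the comp, dest, and jump tables used in the implementation below):

```python
def split_c_instruction(word: str) -> dict:
    """Split a 16-bit C-instruction string into its fields.

    Layout: 1 1 1 a c1 c2 c3 c4 c5 c6 d1 d2 d3 j1 j2 j3
    """
    if len(word) != 16 or word[0] != "1":
        raise ValueError("not a C-instruction")
    return {
        "unused": word[1:3],   # two bits, set to 1 by convention
        "a": word[3],          # selects A (0) or M (1) as the ALU input
        "c": word[4:10],       # ALU control bits
        "dest": word[10:13],
        "jump": word[13:16],
    }


# D=M-1: '111' + a=1 + comp M-1 '110010' + dest D '010' + jump '000'
fields = split_c_instruction("1111110010010000")
print(fields)
```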

comp field

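The comp field consists of one a-bit and six c-bits. Given the predefined ALU design, these bits select which predefined function the ALU computes on the registers A, M, and D (with 7 bits, up to 128 functions could in principle be encoded; only 28 are used here).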
the compute field of C-instruction
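The a-bit only chooses whether the ALU's second input is A or M; the six c-bits are identical for matching A and M forms. The sketch below exploits that, reusing the c-bits from the comp_dict_A table in the implementation below (encode_comp is a hypothetical helper). It also confirms the count of 28: 18 forms with a=0 plus 10 forms that mention A and therefore have an M twin.

```python
# c-bits for the a=0 comp forms; the M forms reuse them with a=1.
COMP_A = {
    "0": "101010", "1": "111111", "-1": "111010",
    "D": "001100", "A": "110000", "!D": "001101", "!A": "110001",
    "-D": "001111", "-A": "110011", "D+1": "011111", "A+1": "110111",
    "D-1": "001110", "A-1": "110010", "D+A": "000010", "D-A": "010011",
    "A-D": "000111", "D&A": "000000", "D|A": "010101",
}


def encode_comp(comp: str) -> str:
    """Return the 7 comp bits (a + c1..c6) for a comp mnemonic."""
    if "M" in comp:
        # a=1: same ALU function, but the second input is M instead of A
        return "1" + COMP_A[comp.replace("M", "A")]
    return "0" + COMP_A[comp]


print(encode_comp("D+M"), encode_comp("D"))         # 1000010 0001100
print(len(COMP_A) + sum("A" in k for k in COMP_A))  # 28
```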

dest field

This 3-bit field (d1 d2 d3, standing for A, D, and M) specifies which registers, if any, store the value computed by the ALU.
the dest field of C-instruction
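Since each dest bit corresponds to one register, the dest code can be derived from the letters of the mnemonic instead of looked up in a table. A sketch under that reading (encode_dest is a hypothetical helper; unlike the dest_dict table in the implementation below, it also accepts reordered spellings such as DM):

```python
def encode_dest(dest: str) -> str:
    """Encode a dest mnemonic as d1 d2 d3, where d1=A, d2=D, d3=M."""
    if any(ch not in "ADM" for ch in dest) or len(set(dest)) != len(dest):
        raise ValueError(f"bad dest: {dest!r}")
    return "".join("1" if reg in dest else "0" for reg in "ADM")


for mnemonic in ["", "M", "D", "MD", "A", "AM", "AD", "AMD"]:
    print(repr(mnemonic), encode_dest(mnemonic))
```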

jump field

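The 3-bit jump field specifies a condition on the ALU output; if the condition holds, the computer jumps to the instruction-memory address held in A.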
the jump field of C-instruction
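Each jump bit tests one sign of the ALU output (j1: out < 0, j2: out == 0, j3: out > 0), and the jump happens if any selected test is true. That is why JGE is 011 and JMP is 111. A sketch of the decision (should_jump is a hypothetical helper; out is the ALU output as a signed integer):

```python
def should_jump(jump_bits: str, out: int) -> bool:
    """Decide a jump from j1 j2 j3 and the ALU output; the bits are OR-ed."""
    j1, j2, j3 = (bit == "1" for bit in jump_bits)
    return (j1 and out < 0) or (j2 and out == 0) or (j3 and out > 0)


print(should_jump("001", 5))    # JGT with out=5: True
print(should_jump("101", 0))    # JNE with out=0: False
print(should_jump("111", -3))   # JMP: True
```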

Conflicting Uses of the A Register

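Because the A register serves both as a data-memory pointer, selecting the M register that a C-instruction may operate on, and as an instruction-memory pointer, selecting the address a C-instruction may jump to, a C-instruction that may cause a jump should not reference M.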
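This rule is easy to check mechanically. Below is a sketch of a hypothetical lint check (not part of the course tools) that flags C-instructions that both reference M and specify a jump. The fields are split first because the mnemonic JMP itself contains an M:

```python
def jumps_with_m(command: str) -> bool:
    """Flag a C-instruction that references M in dest or comp and also jumps."""
    dest, comp, jump = "", command, ""
    if "=" in comp:
        dest, comp = comp.split("=", 1)
    if ";" in comp:
        comp, jump = comp.split(";", 1)
    return bool(jump) and ("M" in dest or "M" in comp)


for cmd in ["D;JGT", "0;JMP", "M=D", "M;JGT", "AM=M-1;JEQ"]:
    print(cmd, jumps_with_m(cmd))
```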

4.2.4 Symbols

The assembly language designed in this course can refer to memory addresses either with constants or with symbols. Symbols come in three kinds:

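Predefined symbols
1. Virtual registers: to simplify programming, R0 to R15 refer to RAM addresses 0 to 15.
2. Predefined pointers: SP, LCL, ARG, THIS, and THAT refer to RAM addresses 0 to 4, respectively.
3. I/O pointers: SCREEN refers to RAM address 16384 (0x4000) and KBD to RAM address 24576 (0x6000); these are the base addresses of the memory-mapped devices.
Label symbols
Used as targets of jump (goto) instructions and declared as (xxx). Once declared, the label xxx refers to the instruction-memory address of the next instruction. Each label may be declared only once, but it can be used any number of times anywhere in the program, including before its declaration.
Variable symbols
Any programmer-defined symbol xxx that is neither a predefined symbol nor a label is treated as a variable, and the assembler allocates it a RAM address, starting at 16 (0x0010) and proceeding in order of first appearance.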
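Because a label may be used before it is declared, symbol resolution needs two passes: first record every label's instruction address, then allocate RAM for any remaining symbols. A compact sketch of that order, assuming comments and whitespace have already been stripped (build_symbols and PREDEFINED are hypothetical names; the SymbolTable class in the implementation below follows the same order):

```python
PREDEFINED = {f"R{i}": i for i in range(16)}
PREDEFINED.update(SP=0, LCL=1, ARG=2, THIS=3, THAT=4, SCREEN=16384, KBD=24576)


def build_symbols(lines):
    """Two-pass symbol resolution: labels first, then variables from RAM[16]."""
    table = dict(PREDEFINED)
    rom_address = 0
    for line in lines:                       # pass 1: labels -> ROM addresses
        if line.startswith("("):
            table[line[1:-1]] = rom_address  # address of the next instruction
        else:
            rom_address += 1                 # labels occupy no ROM word
    next_ram = 16
    for line in lines:                       # pass 2: new symbols -> variables
        name = line[1:]
        if line.startswith("@") and not name.isdigit() and name not in table:
            table[name] = next_ram
            next_ram += 1
    return table


prog = ["@i", "M=1", "(LOOP)", "@i", "M=M+1", "@LOOP", "0;JMP", "@sum"]
t = build_symbols(prog)
print(t["LOOP"], t["i"], t["sum"])  # 2 16 17
```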

4.2.5 Input/Output Handling

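The Hack platform handles its two peripherals, the screen and the keyboard, through memory mapping. Drawing pixels on the screen is done by writing to a designated memory region (SCREEN), and the keyboard is read by polling a designated memory word (KBD), which the hardware keeps updated with the key currently pressed.

The screen is 256 rows by 512 pixels and is mapped to an 8K-word region of RAM: 256 * 512 = 8 * 1024 * 16 bits.
The keyboard is mapped to the single 16-bit word at RAM address 24576 (0x6000), so in principle it could distinguish 65536 key codes; in practice it reports ASCII codes plus special codes for keys such as the arrows and function keys, and 0 when no key is pressed.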
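Each screen row of 512 pixels occupies 32 consecutive 16-bit words, so the pixel at (row, col) lives at bit col % 16 of the word at SCREEN + 32 * row + col // 16. A sketch of that address calculation (pixel_location is a hypothetical helper), which also shows that the 8K-word screen map ends right before KBD:

```python
SCREEN = 16384  # base address of the screen memory map
KBD = 24576     # address of the keyboard register


def pixel_location(row, col):
    """Return (RAM address, bit index) of screen pixel (row, col).

    Each of the 256 rows is 512 pixels = 32 words of 16 bits; within a word,
    column col sits at bit col % 16 (bit 0 = least significant bit).
    """
    if not (0 <= row < 256 and 0 <= col < 512):
        raise ValueError("pixel out of range")
    return SCREEN + row * 32 + col // 16, col % 16


print(pixel_location(0, 0))                    # (16384, 0)
print(pixel_location(255, 511))                # (24575, 15)
print(pixel_location(255, 511)[0] == KBD - 1)  # True
print(256 * 512 // 16)                         # 8192 words = 8K
```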

4.2.6 Syntax Convention and File Format

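Machine code files: .hack
Assembly files: .asm
1. Each line contains one instruction or one label declaration (Symbol).
2. Constants must be non-negative decimal numbers; a user-defined symbol can be any sequence of letters, digits, underscore (_), dot (.), dollar sign ($), and colon (:) that does not begin with a digit.
3. Comments start with // and extend to the end of the line.
4. Blank lines are ignored, and so are space characters.
5. The language is case-sensitive.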
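These lexical rules translate directly into two regular expressions plus a small line cleaner. A sketch assuming the symbol and constant definitions stated in rule 2 (classify_operand and clean_line are hypothetical names):

```python
import re

# A symbol: letters, digits, '_', '.', '$', ':' and not starting with a digit.
SYMBOL_RE = re.compile(r"[A-Za-z_.$:][A-Za-z0-9_.$:]*")
# A constant: a non-negative decimal number.
CONSTANT_RE = re.compile(r"[0-9]+")


def classify_operand(text: str) -> str:
    """Classify the operand of '@xxx' as 'constant', 'symbol' or 'invalid'."""
    if CONSTANT_RE.fullmatch(text):
        return "constant"
    if SYMBOL_RE.fullmatch(text):
        return "symbol"
    return "invalid"


def clean_line(line: str) -> str:
    """Drop a trailing // comment and all whitespace from one source line."""
    return "".join(line.split("//", 1)[0].split())


print(classify_operand("42"), classify_operand("_tmp.0"), classify_operand("1abc"))
print(repr(clean_line("  D = M  // load")))
```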

Implementation

python code

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

"""Compile Assembly code into Binary Machine code."""

__author__ = 'eric'

import getopt
import os
import sys


# ----------------------------------------------------------
# Description:
#   Compile assembly language into machine language(binary code)
#   .asm --> .hack (.bin)
# Input:
#   asm_list        - list of str that compose asm file
def assembler(asm_list):
    print('[Log]:\t--*  assembler  *--')
    # print('[Log]:\tasm_list:\n', asm_list)

    # build symbol table and replace all the symbol in .asm file with its corresponding numeric value
    build_symbol_table = SymbolTable(asm_list)  # instantiate SymbolTable
    no_symbol_list = build_symbol_table.get_no_symbol_list()

    # parser .asm code line by line into machine code(binary)
    parser = Parser(no_symbol_list)  # instantiate Parser
    bin_list = parser.get_bin_list()
    # print('[Log]:\tparsed file:\n', bin_list)

    return bin_list


# ----------------------------------------------------------
# Description:
#   build symbol table and replace all the symbol in .asm file with its corresponding numeric value
class SymbolTable(object):
    def __init__(self, asm_list):
        self.asm_list = asm_list
        self.no_comment_list = []
        self.no_symbol_list = []
        # add predefined symbol into symbol_table
        self.symbol_table = {
            'SP': '0',
            'LCL': '1',
            'ARG': '2',
            'THIS': '3',
            'THAT': '4',
            'R0': '0',
            'R1': '1',
            'R2': '2',
            'R3': '3',
            'R4': '4',
            'R5': '5',
            'R6': '6',
            'R7': '7',
            'R8': '8',
            'R9': '9',
            'R10': '10',
            'R11': '11',
            'R12': '12',
            'R13': '13',
            'R14': '14',
            'R15': '15',
            'SCREEN': '16384',
            'KBD': '24576'}

        # main process
        self.remove_comment()
        self.proc_label()
        self.proc_variable()
        self.replace_symbol()

    # ----------------------------------------------------------
    # Description:
    # Remove comment and empty line
    def remove_comment(self):
        for line in self.asm_list:
            if '//' in line:  # remove comment
                line = line[0:line.index('//')]

            line = ''.join(line.split())  # remove all white space; Hack assembly ignores space characters

            if len(line) != 0:  # ignore empty line
                self.no_comment_list.append(line)

    def get_no_comment_list(self):
        return self.no_comment_list

    # ----------------------------------------------------------
    # Description:
    #   find and remove all Label, update the symbol_table
    def proc_label(self):
        temp_list = []
        for line in self.no_comment_list:
            if self.contain_label(line):
                cur_label = line[1:-1]
                self.symbol_table[cur_label] = str(len(temp_list))
            else:
                temp_list.append(line)
        self.no_comment_list = temp_list

    # ----------------------------------------------------------
    # Description:
    #   detect if line contain label('(' + label + ')'),
    # Input:
    #     line     one line(str) in .asm file
    @staticmethod
    def contain_label(line):
        if (line[0] == '(') and (line[-1] == ')'):
            return 1
        else:
            return 0

    # ----------------------------------------------------------
    # Description:
    #   find all variable and then add them into symbol_table, won't make change on no_comment_list
    def proc_variable(self):
        pointer_in_ram = 16
        for line in self.no_comment_list:
            if self.contain_variable(line):
                cur_variable = line[1:]
                if cur_variable not in self.symbol_table:
                    self.symbol_table[cur_variable] = str(pointer_in_ram)
                    pointer_in_ram += 1

    # ----------------------------------------------------------
    # Description:
    #   detect if line is an A-instruction that references a symbol ('@' followed by anything but a decimal constant);
    #   symbols may also start with '_', '.', '$' or ':', so isalpha() on the first character is not enough
    # Input:
    #     line     one line(str) in .asm file
    @staticmethod
    def contain_variable(line):
        return line[0] == '@' and not line[1:].isdigit()

    # ----------------------------------------------------------
    # Description:
    #   replace all symbol encountered in .asm file with corresponding numeric value
    def replace_symbol(self):
        for line in self.no_comment_list:
            if self.contain_variable(line):  # contain variable
                cur_symbol = line[1:]
                if cur_symbol in self.symbol_table:
                    self.no_symbol_list.append('@' + self.symbol_table[cur_symbol])
                else:
                    print('[Log]:\tError_unresolved_symbol')
            else:
                self.no_symbol_list.append(line)

    def get_no_symbol_list(self):
        return self.no_symbol_list


# ----------------------------------------------------------
# Description:
#   parse .asm file into corresponding machine code file
class Parser(object):
    def __init__(self, no_symbol_list):
        self.no_symbol_list = no_symbol_list
        self.bin_list = []

        # main process
        for line in self.no_symbol_list:
            if '@' in line:
                self.bin_list.append(self.parse_a_command(line))
            else:
                self.bin_list.append(self.parse_c_command(line))

    def get_bin_list(self):
        return self.bin_list

    # ----------------------------------------------------------
    # parse a command and then return its corresponding binary code
    # Input:
    #     command   a command line
    @staticmethod
    def parse_a_command(command):
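        if not command.startswith('@'):
            print('[Log]:\tError_failed_resolving_a_command')
            sys.exit(1)

        value = int(command[1:])
        if not 0 <= value < 2 ** 15:  # an A-instruction can only carry a 15-bit constant
            print('[Log]:\tError_a_command_out_of_range')
            sys.exit(1)
        return format(value, '016b')  # zero-padded 16-bit binary; the MSB (op-code) is 0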

    # ----------------------------------------------------------
    # parse c command and then return its corresponding binary code
    # Input:
    #     command   a command line
    @staticmethod
    def parse_c_command(command):
        bin_str = '111'
        comp_dict_A = {'0': '101010',
                       '1': '111111',
                       '-1': '111010',
                       'D': '001100',
                       'A': '110000',
                       '!D': '001101',
                       '!A': '110001',
                       '-D': '001111',
                       '-A': '110011',
                       'D+1': '011111',
                       'A+1': '110111',
                       'D-1': '001110',
                       'A-1': '110010',
                       'D+A': '000010',
                       'D-A': '010011',
                       'A-D': '000111',
                       'D&A': '000000',
                       'D|A': '010101'}

        comp_dict_M = {'M': '110000',
                       '!M': '110001',
                       '-M': '110011',
                       'M+1': '110111',
                       'M-1': '110010',
                       'D+M': '000010',
                       'D-M': '010011',
                       'M-D': '000111',
                       'D&M': '000000',
                       'D|M': '010101'}

        dest_dict = {'': '000',
                     'M': '001',
                     'D': '010',
                     'MD': '011',
                     'A': '100',
                     'AM': '101',
                     'AD': '110',
                     'AMD': '111'}

        jump_dict = {'': '000',
                     'JGT': '001',
                     'JEQ': '010',
                     'JGE': '011',
                     'JLT': '100',
                     'JNE': '101',
                     'JLE': '110',
                     'JMP': '111'}

        # break command into its three parts: dest=comp;jump
        dest = ''
        jump = ''
        if '=' in command:
            dest = command[0:command.index('=')]
            command = command[command.index('=') + 1:]

        if ';' in command:
            jump = command[command.index(';') + 1:]
            command = command[0:command.index(';')]

        comp = command

        # comp part
        if 'M' in comp:
            bin_str += '1' + comp_dict_M[comp]
        else:
            bin_str += '0' + comp_dict_A[comp]

        # dest field
        bin_str += dest_dict[dest]
        # jump field
        bin_str += jump_dict[jump]

        return bin_str


# ----------------------------------------------------------
# Description:
#       receive command parameters, validating input path, pass asm_list to assembler, write bin_list to output_path
def main():
    input_path = recv_opt_arg(sys.argv)
    print('input_path:\t' + input_path)

    if os.path.isfile(input_path):
        if input_path[-4:] == '.asm':
            output_path = input_path[0:-3] + 'hack'
            print('output_path:\t' + output_path)

            with open(input_path, 'r') as f:  # read the original .asm file into a list of lines
                asm_list = f.readlines()

            bin_list = assembler(asm_list)
            write_out_file(output_path, bin_list)
        else:
            print('[Log]:\tError_invalid_file_type')
            sys.exit(1)
    else:
        print('[Log]:\tError_invalid_input_path')
        sys.exit(1)


# ----------------------------------------------------------
# Description:
#       receive command line input. return input_path or print usage
# Output:
#       input_path
def recv_opt_arg(argv):
    # print('sys.argv=| ', argv, ' |')

    try:
        opts, args = getopt.gnu_getopt(argv[1:], 'i:h?', ['input_path=', 'help'])
        # 'opts' is a list of tuple ('option', 'value'), each option match one value
        # 'args' is a list contains extra arguments
        # print('opts=| ', opts, ' |')
        # print('args=| ', args, ' |')
    except getopt.GetoptError as e:
        print(e)
        print_usage(argv[0])
        sys.exit(2)

    input_path = os.getcwd()  # default input path
    if args:  # positional PATH argument, as advertised in print_usage
        input_path = args[0]
    for opt, value in opts:  # ('option', 'value'), tuple
        if opt in ['-h', '-?', '--help']:  # print help information
            print_usage(argv[0])
            sys.exit(0)
        elif opt in ['-i', '--input_path']:  # input_path (short or long form)
            input_path = value

    return input_path


# ----------------------------------------------------------
# Description:
#   print usage information of this script
def print_usage(cmd):
    print(('*********************************************************\n' +
           ' --* This message gives you some detailed information! *--\n' +
           'Usage: {0} [OPTION]... [PATH]...\n' +
           '- OPTION:\n' +
           '  {0} -i | --input_path\tinput path\n' +
           '  {0} -h | -? | --help\tprint help info and exit script\n' +
           '- PATH:\n' +
           '  Path of the .asm file you want to process\n' +
           ' --*  *-- \n' +
           '*********************************************************\n').format(cmd))


# ----------------------------------------------------------
# Description:
#       writing out_list into file named out_file_path
# Input:
#       out_file_path, out_list
def write_out_file(out_file_path, out_list):
    with open(out_file_path, 'w') as f:
        for line in out_list:
            f.write(line + '\n')


if __name__ == '__main__':
    main()
