4.2 Hack Machine Language Specification
4.2.1 Overview
- The Hack computer is a von Neumann platform
- It is a 16-bit machine
- It consists of
1. a CPU
2. two separate memory modules serving as instruction memory and data memory
3. two memory-mapped I/O devices: a screen and a keyboard
Memory Address Spaces
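The Hack platform has two separate address spaces, the instruction memory and the data memory. Both are 16 bits wide and have a 15-bit address space (an instruction is 16 bits long, but one bit is reserved to mark the instruction type, so only 15 bits are left for an address), which gives each memory 32K 16-bit words.
The CPU can only execute programs that reside in the instruction memory. This memory can be implemented as a ROM preloaded with the program; loading a new program means rewriting the ROM.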
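Registers
Hack assembly mainly works with two registers, D and A. D is used only to store data values, while A can serve as either a data register or an address register, depending on context. The memory word currently addressed by A is referred to as register M. For example, to compute D = Memory[516]-1, first set A to 516 and then execute D=M-1. In the same way, A can hold an instruction memory address in order to implement branching.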
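The relationship between A and M can be made concrete with a toy model. This is only a sketch for illustration: the class `HackRegisters` and its attribute names are hypothetical, and the RAM is modelled as a plain Python list rather than real hardware.

```python
# Toy model of the Hack registers; HackRegisters is a hypothetical name.
class HackRegisters:
    def __init__(self):
        self.A = 0                # address / data register
        self.D = 0                # data register
        self.ram = [0] * 32768    # data memory: 32K 16-bit words

    @property
    def M(self):
        """M is not a separate register: it is the RAM word addressed by A."""
        return self.ram[self.A]

    @M.setter
    def M(self, value):
        self.ram[self.A] = value & 0xFFFF  # keep the stored value within 16 bits


regs = HackRegisters()
regs.ram[516] = 10     # pretend Memory[516] already holds 10
regs.A = 516           # @516
regs.D = regs.M - 1    # D=M-1
print(regs.D)          # prints 9
```

Changing A changes which word M refers to, which is why every access to a memory location is preceded by an A-instruction.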
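4.2.2 A-instruction
Stores a 15-bit value in the A register.
@value // where value is either a non-negative decimal number or a symbol referring to such a number
Uses:
- It is the only way to enter a numeric constant into the computer under program control
- It sets the stage for computation: a value loaded into A can then be combined with D (or used to select the memory word M) by a subsequent C-instruction
- It sets the stage for jump instructions, by loading the jump target address into A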
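As a small illustration of the binary form, an A-instruction is a 0 bit followed by the 15-bit value. The helper below is a sketch; the function name `encode_a_instruction` is an assumption made for this example.

```python
def encode_a_instruction(value: int) -> str:
    """Return the 16-bit machine word for '@value' as a string of 0s and 1s."""
    if not 0 <= value < 2 ** 15:
        raise ValueError('an A-instruction holds a 15-bit non-negative value')
    # The leftmost bit is always 0 for an A-instruction; the padding supplies it.
    return format(value, '016b')


print(encode_a_instruction(516))  # prints 0000001000000100
```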
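4.2.3 C-instruction
A complete C-instruction, written as dest=comp;jump, answers three questions:
- what to compute
- where to store the computed value
- what to do next
The leftmost bit is 1, marking the word as a C-instruction. The next two bits are unused and set to 1. The remaining bits form the comp (7 bits), dest (3 bits) and jump (3 bits) fields, in that order: 111a cccc ccdd djjj.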
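The bit layout can be checked by slicing a 16-bit word into its fields. This is a minimal sketch; `split_c_instruction` is a hypothetical helper, and it only verifies the leftmost bit, since the two unused bits carry no meaning.

```python
def split_c_instruction(word: str) -> dict:
    """Split a 16-bit C-instruction (string of 0s and 1s) into its fields."""
    if len(word) != 16 or word[0] != '1':
        raise ValueError('not a C-instruction')
    return {
        'a': word[3],        # selects A (0) or M (1) as the ALU operand
        'comp': word[4:10],  # six c-bits
        'dest': word[10:13],
        'jump': word[13:16],
    }


# D=M-1 assembles to 1111110010010000
print(split_c_instruction('1111110010010000'))
```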
comp field
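The comp field consists of one a-bit and six c-bits. These bits drive the ALU, which computes a predefined function of the registers A, M and D; the a-bit selects whether the ALU operates on A (a=0) or M (a=1). Seven bits could in principle encode 128 functions, but only 28 are used here.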
dest field
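These 3 bits specify whether the ALU output is stored in the A register, the D register and/or M, with one bit per destination in that order.
jump field
These 3 bits specify under which condition on the ALU output (negative, zero, positive) execution jumps to the instruction address held in the A register. If all three bits are 0, no jump occurs.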
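The jump decision can be sketched as a function of the three jump bits and the ALU output. The function name `should_jump` is an assumption for this example; the ALU output is taken as a signed Python integer.

```python
def should_jump(jump_bits: str, alu_out: int) -> bool:
    """Decide whether a C-instruction jumps, given its 3 jump bits.

    From left to right the bits mean: jump if out < 0, if out == 0, if out > 0.
    """
    j_lt, j_eq, j_gt = (bit == '1' for bit in jump_bits)
    return (j_lt and alu_out < 0) or (j_eq and alu_out == 0) or (j_gt and alu_out > 0)


print(should_jump('001', 5))   # JGT with a positive value
print(should_jump('110', 0))   # JLE with zero
print(should_jump('000', -1))  # no jump bits set
```

Combining the bits gives the eight mnemonics: for example JGE (011) jumps when the output is zero or positive, and JMP (111) always jumps.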
Conflicting Uses of the A Register
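The A register can act as a data memory pointer, selecting the M register that a C-instruction operates on, and as an instruction memory pointer, selecting the address that a C-instruction may jump to. To avoid this conflict, a C-instruction that may cause a jump should not reference M.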
4.2.4 Symbols
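The assembly language designed in this course can refer to memory addresses using either constants or symbols. There are three kinds of symbols:
- Predefined symbols
1. Virtual registers: to simplify programming, R0 ~ R15 refer to RAM addresses 0~15.
2. Predefined pointers: SP, LCL, ARG, THIS and THAT refer to RAM addresses 0~4, respectively.
3. I/O pointers: SCREEN refers to RAM address 16384 (0x4000) and KBD refers to RAM address 24576 (0x6000), the base addresses of the memory-mapped devices.
- Label symbols
Used as the target of goto statements and declared as (xxx). Once declared, the label xxx refers to the instruction memory address of the next instruction in the program. A label can be declared only once, but it can be used any number of times, anywhere in the program (including before the line where it is declared).
- Variable symbols
Any user-defined identifier xxx in an assembly program that is neither a predefined symbol nor a label symbol is treated as a variable. The assembler allocates a RAM address for each variable, starting at 16 (0x0010).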
4.2.5 Input/Output Handling
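The Hack platform manages two peripheral devices, a screen and a keyboard, through memory maps. Drawing a pixel on the screen is done by writing to the screen's memory map (SCREEN); listening to the keyboard is done by reading its memory map (KBD), which is refreshed continuously.
- The screen is 256 * 512 pixels by default, one bit per pixel. It is mapped to an 8K block of RAM: 256*512 = 8*1024*16.
- The keyboard is mapped to the 16-bit word at RAM address 24576 (0x6000), which in principle could distinguish 65536 key codes. (It accepts ASCII codes plus a set of special codes for non-printing keys.)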
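To make the screen mapping concrete, the sketch below computes which RAM word and which bit hold a given pixel. It assumes the layout of the Hack platform specification: each row occupies 32 consecutive words, and within a word the bits are counted from the least significant bit. The function name `pixel_location` is hypothetical.

```python
SCREEN = 16384               # base address of the screen memory map
WORDS_PER_ROW = 512 // 16    # 32 words of 16 pixels per row


def pixel_location(row: int, col: int) -> tuple:
    """Return (RAM address, bit index) of the pixel at (row, col)."""
    if not (0 <= row < 256 and 0 <= col < 512):
        raise ValueError('pixel outside the 256 x 512 screen')
    address = SCREEN + row * WORDS_PER_ROW + col // 16
    bit = col % 16           # counted from the least significant bit
    return address, bit


print(pixel_location(0, 0))      # prints (16384, 0)
print(pixel_location(255, 511))  # prints (24575, 15)
```

The last pixel lands at address 24575, immediately below KBD at 24576, which matches the 8K size of the screen map.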
4.2.6 Syntax Convention and File Format
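- Machine code files: .hack
- Assembly files: .asm
1. Each line is either an instruction or a label declaration (Symbol)
2. Constants are non-negative decimal numbers; a user-defined symbol can be any sequence of letters, digits, _, ., $ and : that does not begin with a digit
3. A comment starts with // and runs to the end of the line
4. Blank lines are ignored
5. Case sensitive: mnemonics must be written in uppercase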
Implementation
Python code
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""Compile Assembly code into Binary Machine code."""
__author__ = 'eric'

import getopt
import os
import sys


# ----------------------------------------------------------
# Description:
#       Compile assembly language into machine language (binary code)
#       .asm --> .hack (.bin)
# Input:
#       asm_list - list of str that compose asm file
def assembler(asm_list):
    print('[Log]:\t--* assembler *--')
    # print('[Log]:\tasm_list:\n', asm_list)
    # build symbol table and replace every symbol in the .asm file with its numeric value
    build_symbol_table = SymbolTable(asm_list)  # instantiate SymbolTable
    no_symbol_list = build_symbol_table.get_no_symbol_list()
    # parse .asm code line by line into machine code (binary)
    parser = Parser(no_symbol_list)  # instantiate Parser
    bin_list = parser.get_bin_list()
    # print('[Log]:\tparsed file:\n', bin_list)
    return bin_list


# ----------------------------------------------------------
# Description:
#       build symbol table and replace every symbol in the .asm file with its numeric value
class SymbolTable(object):
    def __init__(self, asm_list):
        self.asm_list = asm_list
        self.no_comment_list = []
        self.no_symbol_list = []
        # add predefined symbols into symbol_table
        self.symbol_table = {
            'SP': '0',
            'LCL': '1',
            'ARG': '2',
            'THIS': '3',
            'THAT': '4',
            'R0': '0',
            'R1': '1',
            'R2': '2',
            'R3': '3',
            'R4': '4',
            'R5': '5',
            'R6': '6',
            'R7': '7',
            'R8': '8',
            'R9': '9',
            'R10': '10',
            'R11': '11',
            'R12': '12',
            'R13': '13',
            'R14': '14',
            'R15': '15',
            'SCREEN': '16384',
            'KBD': '24576'}
        # main process
        self.remove_comment()
        self.proc_label()
        self.proc_variable()
        self.replace_symbol()

    # ----------------------------------------------------------
    # Description:
    #       Remove comments and empty lines
    def remove_comment(self):
        for line in self.asm_list:
            if '//' in line:  # remove comment
                line = line[0:line.index('//')]
            line = line.strip()  # remove white space on both sides
            if len(line) != 0:  # ignore empty line
                self.no_comment_list.append(line)

    def get_no_comment_list(self):
        return self.no_comment_list

    # ----------------------------------------------------------
    # Description:
    #       find and remove all labels, update the symbol_table
    def proc_label(self):
        temp_list = []
        for line in self.no_comment_list:
            if self.contain_label(line):
                cur_label = line[1:-1]
                # a label refers to the address of the next instruction,
                # which equals the number of instructions kept so far
                self.symbol_table[cur_label] = str(len(temp_list))
            else:
                temp_list.append(line)
        self.no_comment_list = temp_list

    # ----------------------------------------------------------
    # Description:
    #       detect if line contains a label ('(' + label + ')')
    # Input:
    #       line    one line (str) in .asm file
    @staticmethod
    def contain_label(line):
        return line[0] == '(' and line[-1] == ')'

    # ----------------------------------------------------------
    # Description:
    #       find all variables and add them into symbol_table, without changing no_comment_list
    def proc_variable(self):
        pointer_in_ram = 16  # variables are allocated from RAM address 16 upwards
        for line in self.no_comment_list:
            if self.contain_variable(line):
                cur_variable = line[1:]
                if cur_variable not in self.symbol_table:
                    self.symbol_table[cur_variable] = str(pointer_in_ram)
                    pointer_in_ram += 1

    # ----------------------------------------------------------
    # Description:
    #       detect if line contains a symbol ('@' is the first character, followed by a symbol)
    # Input:
    #       line    one line (str) in .asm file
    @staticmethod
    def contain_variable(line):
        # a symbol may start with a letter or one of _ . $ : but never with a digit
        return len(line) > 1 and line[0] == '@' and not line[1].isdigit()

    # ----------------------------------------------------------
    # Description:
    #       replace every symbol encountered in the .asm file with its numeric value
    def replace_symbol(self):
        for line in self.no_comment_list:
            if self.contain_variable(line):  # contains a symbol
                cur_symbol = line[1:]
                if cur_symbol in self.symbol_table:
                    self.no_symbol_list.append('@' + self.symbol_table[cur_symbol])
                else:
                    # stop instead of silently dropping the instruction
                    print('[Log]:\tError_unresolved_symbol')
                    sys.exit(1)
            else:
                self.no_symbol_list.append(line)

    def get_no_symbol_list(self):
        return self.no_symbol_list


# ----------------------------------------------------------
# Description:
#       parse .asm file into corresponding machine code file
class Parser(object):
    def __init__(self, no_symbol_list):
        self.no_symbol_list = no_symbol_list
        self.bin_list = []
        # main process
        for line in self.no_symbol_list:
            if line.startswith('@'):
                self.bin_list.append(self.parse_a_command(line))
            else:
                self.bin_list.append(self.parse_c_command(line))

    def get_bin_list(self):
        return self.bin_list

    # ----------------------------------------------------------
    # parse an A command and return its corresponding binary code
    # Input:
    #       command     a command line
    @staticmethod
    def parse_a_command(command):
        if command[0] != '@':
            print('[Log]:\tError_failed_resolving_a_command')
            sys.exit(1)  # non-zero exit status signals the failure
        # pad to 16 bits: a leading 0 followed by the 15-bit value
        return format(int(command[1:]), '016b')

    # ----------------------------------------------------------
    # parse a C command and return its corresponding binary code
    # Input:
    #       command     a command line
    @staticmethod
    def parse_c_command(command):
        bin_str = '111'
        comp_dict_A = {'0': '101010',
                       '1': '111111',
                       '-1': '111010',
                       'D': '001100',
                       'A': '110000',
                       '!D': '001101',
                       '!A': '110001',
                       '-D': '001111',
                       '-A': '110011',
                       'D+1': '011111',
                       'A+1': '110111',
                       'D-1': '001110',
                       'A-1': '110010',
                       'D+A': '000010',
                       'D-A': '010011',
                       'A-D': '000111',
                       'D&A': '000000',
                       'D|A': '010101'}
        comp_dict_M = {'M': '110000',
                       '!M': '110001',
                       '-M': '110011',
                       'M+1': '110111',
                       'M-1': '110010',
                       'D+M': '000010',
                       'D-M': '010011',
                       'M-D': '000111',
                       'D&M': '000000',
                       'D|M': '010101'}
        dest_dict = {'': '000',
                     'M': '001',
                     'D': '010',
                     'MD': '011',
                     'A': '100',
                     'AM': '101',
                     'AD': '110',
                     'AMD': '111'}
        jump_dict = {'': '000',
                     'JGT': '001',
                     'JEQ': '010',
                     'JGE': '011',
                     'JLT': '100',
                     'JNE': '101',
                     'JLE': '110',
                     'JMP': '111'}
        # tolerate spaces inside the command, e.g. 'D = M - 1'
        command = command.replace(' ', '')
        # break command into three parts
        dest = ''
        jump = ''
        if '=' in command:
            dest = command[0:command.index('=')]
            command = command[command.index('=') + 1:]
        if ';' in command:
            jump = command[command.index(';') + 1:]
            command = command[0:command.index(';')]
        comp = command
        # comp field: the a-bit is 1 when the computation reads M
        if 'M' in comp:
            bin_str += '1' + comp_dict_M[comp]
        else:
            bin_str += '0' + comp_dict_A[comp]
        # dest field
        bin_str += dest_dict[dest]
        # jump field
        bin_str += jump_dict[jump]
        return bin_str


# ----------------------------------------------------------
# Description:
#       receive command parameters, validate input path, pass asm_list to assembler, write bin_list to output_path
def main():
    input_path = recv_opt_arg(sys.argv)
    print('input_path:\t' + input_path)
    if os.path.isfile(input_path):
        if input_path[-4:] == '.asm':
            output_path = input_path[0:-3] + 'hack'
            print('output_path:\t' + output_path)
            with open(input_path, 'r') as f:  # read the original .asm file into a list
                asm_list = f.readlines()
            bin_list = assembler(asm_list)
            write_out_file(output_path, bin_list)
        else:
            print('[Log]:\tError_invalid_file_type')
            sys.exit(1)
    else:
        print('[Log]:\tError_invalid_input_path')
        sys.exit(1)


# ----------------------------------------------------------
# Description:
#       receive command line input, return input_path or print usage
# Output:
#       input_path
def recv_opt_arg(argv):
    # print('sys.argv=| ', argv, ' |')
    try:
        opts, args = getopt.gnu_getopt(argv[1:], 'i:h?', ['input_path=', 'help'])
        # 'opts' is a list of tuples ('option', 'value'), each option matches one value
        # 'args' is a list that contains extra arguments
        # print('opts=| ', opts, ' |')
        # print('args=| ', args, ' |')
    except getopt.GetoptError as e:
        print(e)
        print_usage(argv[0])
        sys.exit(2)
    input_path = os.getcwd()  # default input path
    for opt, value in opts:  # ('option', 'value'), tuple
        if opt in ['-h', '-?', '--help']:  # print help information
            print_usage(argv[0])
            sys.exit(0)
        elif opt in ['-i', '--input_path']:  # input_path, short or long form
            input_path = value
    return input_path


# ----------------------------------------------------------
# Description:
#       print usage information of this script
def print_usage(cmd):
    print(('*********************************************************\n' +
           ' --* This message gives you some detailed information! *--\n' +
           'Usage: {0} [OPTION]... [PATH]...\n' +
           '- OPTION:\n' +
           ' {0} -i | --input_path\tinput path\n' +
           ' {0} -h | -? | --help\tprint help info and exit script\n' +
           '- PATH:\n' +
           ' Provides name of the file you want to process or directory that contains those files\n' +
           ' --* *-- \n' +
           '*********************************************************\n').format(cmd))


# ----------------------------------------------------------
# Description:
#       write out_list into the file named out_file_path
# Input:
#       out_file_path, out_list
def write_out_file(out_file_path, out_list):
    with open(out_file_path, 'w') as f:
        for line in out_list:
            f.write(line + '\n')


if __name__ == '__main__':
    main()