CIT 593 – Module 11 Assignment Instructions
CIT 593 – Module 11 Assignment
Making the LC4 Assembler Instructions
Contents
Assignment Overview
Learning Objectives
Advice
Getting Started
Codio Setup
Starter Code
Object File Format Refresher
Requirements
General Requirements
Assembler
assembler.c: main
asm_parser.c: read_asm_file
asm_parser.c: parse_instruction
asm_parser.c: parse_add
asm_parser.c: parse_xxx
asm_parser.c: str_to_bin
asm_parser.c: write_obj_file
Extra Credit
Suggested Approach
High Level Overview
Great High Level Overview, but I really need a Slightly More Detailed Overview
Part 0: Setup the main Function to Read the Arguments
Part 1: Read the .asm File
Part 2: Parse an Instruction
Part 3: Parse an ADD Instruction
Part 4: Converting the binary string to a hexadecimal formatted integer
Part 5: Writing the .obj object file
Testing
Validate Output with PennSim
Files for Testing
Unit Testing
GDB for Debugging
Submission
Submission Checks
The Actual Submission
Grading
Assembler
Extra Credit
An Important Note of Plagiarism
FAQ
Quick Hints
Formatting
Endianness
Resources
Assignment Overview
From lecture you’ve learned that C is file-oriented and that working with files represents I/O
devices in C.
C files fall into two categories: "text" and "binary". In this assignment you’ll work with both types
by reading in a text file and writing out a binary file.
You will read an arbitrary .asm file (a text file intended to be read by PennSim) and write a .obj
file (the same type of binary file that PennSim would write out).
Aside from reading and writing out the files, your task will be to make a mini LC4 assembler!
An assembler is a program that reads in assembly language and generates its machine
equivalent.
This assignment will require a bit more programming rigor than we’ve had thus far, but now that
you’ve gained a good amount of programming skill in this class and in others, it is the perfect
time to tackle a large programming assignment (which is why the instructions are so many
pages).
Learning Objectives
This assignment will cover the following topics:
● Review the LC4 Object File Format
● Read text files and process binary files
● Assemble LC4 programs into executable object files
● Use debugging tools such as GDB
Advice
● Start early
● Ask for help early
● Do not try to do it all in one day
Getting Started
Codio Setup
Open the Codio assignment via Canvas. This is necessary to link the two systems.
You will see many directories and files. At the top-level workspace directory, the main files are
asm_parser.h, asm_parser.c, assembler.c, and PennSim.jar.
Do not modify any of the directories or any file in any of the directories.
Starter Code
We have provided a basic framework and several function definitions that you must implement.
assembler.c - must contain your main function.
asm_parser.c - must contain your asm_parser functions.
asm_parser.h - must contain the definitions of ROWS and COLS, and the function declarations
for read_asm_file, parse_instruction, parse_reg, parse_add, parse_mul, str_to_bin,
write_obj_file, and any helper function you implement in asm_parser.c
test1.asm - example assembly file
PennSim.jar - a copy of PennSim to check your assembler
Object File Format Refresher
The following is the format for the binary .obj files created by PennSim from your .asm files. It
represents the contents of memory (both program and data) for your assembled LC4 assembly
programs. In a .obj file, there are 3 basic sections indicated by 3 header “types”: Code, Data,
and Symbol:
● Code: 3-word header (xCADE, <address>, <n>), n-word body comprising the instructions.
○ This corresponds to the .CODE directive in assembly.
● Data: 3-word header (xDADA, <address>, <n>), n-word body comprising the initial data
values.
○ This corresponds to the .DATA directive in assembly.
● Symbol: 3-word header (xC3B7, <address>, <n>), n-character body comprising the
symbol string. These are generated when you create labels (such as “END”) in
assembly. Each symbol is its own section.
○ Each character in the file is 1 byte, not 2 bytes.
○ There is no NULL terminator.
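As a rough illustration of the Code header layout, the sketch here packs the three header words into six bytes, with the high byte of each 16-bit word first (the byte order implied by the hexdump in the Testing section). The name build_code_header and the byte buffer are assumptions made for illustration; they are not part of the starter code.

```c
/* Hypothetical helper (not in the starter code): pack a Code section
 * header (xCADE, <address>, <n>) into 6 bytes, high byte of each
 * 16-bit word first. */
void build_code_header(unsigned short int address, unsigned short int n,
                       unsigned char out[6])
{
    unsigned short int words[3];
    int i;

    words[0] = 0xCADE;   /* Code section marker */
    words[1] = address;  /* where the section is loaded */
    words[2] = n;        /* number of instruction words that follow */

    for (i = 0; i < 3; i++) {
        out[2 * i]     = (unsigned char)(words[i] >> 8);   /* high byte */
        out[2 * i + 1] = (unsigned char)(words[i] & 0xFF); /* low byte  */
    }
}
```

For a 7-instruction program loaded at address 0x0000, the six bytes would be CA DE 00 00 00 07.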
Requirements
General Requirements
● You MUST NOT change the filenames of any file provided to you in the starter code.
● You MUST NOT change the function declarations of any function provided to you in the
starter code.
● You MAY create additional helper functions. If you do, you MUST correctly declare the
functions in the appropriate header file and provide an implementation in the appropriate
source file.
● Your program MUST compile when running the command make.
● You MUST NOT have any compile-time errors or warnings.
● You MUST remove or comment out all debugging print statements before submitting.
● You MUST NOT use externs or global variables.
● You MAY use string.h, stdlib.h, and stdio.h.
● You SHOULD comment your code since this is a programming best practice.
● Your program MUST be able to handle .asm files that PennSim would successfully
assemble. We will not be testing with invalid .asm files.
● Your program MUST NOT crash/segmentation fault.
● You MUST provide a makefile with the following targets:
○ assembler
○ asm_parser.o
○ all, clean, clobber
Assembler
assembler.c: main
● You MUST NOT change the first four instructions already provided.
● The main function:
○ MUST read the arguments provided to the program.
■ the user will use your program like this:
./assembler test1.asm
○ MUST store the first argument into filename.
○ MUST print an error1 message if the user has not provided an input filename.
○ MUST call read_asm_file to populate program[][].
○ MUST parse each instruction in program[][] and store the binary string equivalent
into program_bin_str[][].
○ MUST convert each binary string into an integer (which MUST have the correct
value when formatted with "0x%X") and store the value into program_bin[].
○ MUST write out the program into a .obj object file which MUST be loadable by
PennSim's ld command.
asm_parser.c: read_asm_file
This function reads the user file.
● It SHOULD return an error2 message if there is any error opening or reading the file.
● It MAY try to check if the input program is too large for the defined variables, but we will
not be testing outside the provided limits.
● It MUST read the exact contents of the file into memory, and it MUST remove any
newline characters present in the file.
● It MUST work for files that have an empty line at the end and also for files that end on an
instruction (i.e. do not assume there will always be an empty line at the end of the file).
● It MUST return 0 on success, and it MUST return a non-zero number in the case of
failure (it SHOULD print a useful error message and return 2 on failure).
asm_parser.c: parse_instruction
This function parses a single instruction and determines the binary string equivalent.
● It SHOULD use strtok to tokenize the instruction, using spaces and commas as the
delimiters.
● It MUST determine the instruction function and call the appropriate parse_xxx helper
function.
● It MUST parse ADD, MUL, SUB, DIV, AND, OR, XOR instructions.
○ It MUST parse ADD IMM and AND IMM if attempting that extra credit.
● It MUST return 0 on success, and it MUST return a non-zero number in the case of
failure (it SHOULD print a useful error message and return 3 on failure).
asm_parser.c: parse_add
This function parses an ADD instruction and provides the binary string equivalent.
● It MUST correctly update the opcode, sub-opcode, and register fields following the LC4
ISA.
● It SHOULD call a helper function parse_reg, but we will not be testing this function.
● It MUST return 0 on success, and it MUST return a non-zero number in the case of
failure (it SHOULD print a useful error message and return 4 on failure).
asm_parser.c: parse_xxx
You MUST create a helper function similar to parse_add for the other instruction functions
required in parse_instruction.
● They MUST correctly update the opcode, sub-opcode, and register fields following the
LC4 ISA.
● They SHOULD call a helper function parse_reg, but we will not be testing this function.
● They MUST return 0 on success, and they MUST return a non-zero number in the case
of failure (they SHOULD print a useful error message and return a unique error number on
failure).
asm_parser.c: str_to_bin
This function converts a C string containing 1s and 0s into an unsigned short integer.
● It MUST correctly convert the binary string to an unsigned short int which can be verified
using the "0x%X" format.
● It SHOULD use strtol to do the conversion.
asm_parser.c: write_obj_file
This function writes the program, in integer format, as a LC4 object file using the LC4 binary
format.
● It MUST output the program in the LC4 binary format described in lecture and in the
Object File Format Refresher section.
● It MUST create and write an empty file if the input file is empty.
● It MUST change the extension of the input file to .obj.
● It MUST use the default starting address 0x0000 unless you are attempting the .ADDR
extra credit.
● It MUST close the file with fclose.
● It MUST return 0 on success, and it MUST return a non-zero number in the case of
failure (it SHOULD print a useful error message and return 7 on failure).
● The generated file MUST load into PennSim (and you MUST check this before
submitting), and the contents MUST match the .asm assembly program.
Extra Credit
You may attempt any, all, or none of these extra credit options. You MUST test using your own
generated examples (we will not provide any).
Option 1: modify your read_asm_file function to ignore comments in .asm files. You MUST
handle all types of comments for credit.
Option 2: modify your program to handle ADD IMM and AND IMM instructions. Both MUST work
completely for credit.
Option 3: modify your program to handle the .CODE and .ADDR directives.
Option 4: modify your program to handle the .DATA, .ADDR, and .FILL directives.
Suggested Approach
This is a suggested approach. You are not required to follow this approach as long as you
follow all of the other requirements.
High Level Overview
Follow these high-level steps and debug thoroughly before moving on to the next.
1. Initialize all arrays to zero or '\0'
2. Call read_asm_file to read the entire .asm file into the array program[][].
a. Using test1.asm as an example, after read_asm_file returns: program[][]
should then contain:
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14
0 'A' 'D' 'D' ' ' 'R' '1' ',' ' ' 'R' '0' ',' ' ' 'R' '1' '\0'
1 'M' 'U' 'L' ' ' 'R' '2' ',' ' ' 'R' '1' ',' ' ' 'R' '1' '\0'
2 'S' 'U' 'B' ' ' 'R' '3' ',' ' ' 'R' '2' ',' ' ' 'R' '1' '\0'
3 'D' 'I' 'V' ' ' 'R' '1' ',' ' ' 'R' '3' ',' ' ' 'R' '2' '\0'
4 'A' 'N' 'D' ' ' 'R' '1' ',' ' ' 'R' '2' ',' ' ' 'R' '3' '\0'
5 'O' 'R' ' ' 'R' '1' ',' ' ' 'R' '3' ',' ' ' 'R' '2' '\0' X
6 'X' 'O' 'R' ' ' 'R' '1' ',' ' ' 'R' '3' ',' ' ' 'R' '2' '\0'
7 '\0' X X X X X X X X X X X X X X
3. In a loop, for each row X in program[][]:
a. Call parse_instruction, passing it the current row in program[X][] as input to
parse_instruction. When parse_instruction returns,
program_bin_str[X][] should be updated to have the binary equivalent (in
string form).
b. Call str_to_bin passing program_bin_str[X][] to it. When str_to_bin
returns, program_bin[X] should be updated to have the hexadecimal equivalent
of the binary string from program_bin_str[X].
4. Once the loop is complete program_bin_str[][] should contain program[][]
equivalent:
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
0 '0' '0' '0' '1' '0' '0' '1' '0' '0' '0' '0' '0' '0' '0' '0' '1' '\0'
1 '0' '0' '0' '1' '0' '1' '0' '0' '0' '1' '0' '0' '1' '0' '0' '1' '\0'
2 '0' '0' '0' '1' '0' '1' '1' '0' '1' '0' '0' '1' '0' '0' '0' '1' '\0'
3 '0' '0' '0' '1' '0' '0' '1' '0' '1' '1' '0' '1' '1' '0' '1' '0' '\0'
4 '0' '1' '0' '1' '0' '0' '1' '0' '1' '0' '0' '0' '0' '0' '1' '1' '\0'
5 '0' '1' '0' '1' '0' '0' '1' '0' '1' '1' '0' '1' '0' '0' '1' '0' '\0'
6 '0' '1' '0' '1' '0' '0' '1' '0' '1' '1' '0' '1' '1' '0' '1' '0' '\0'
7 '\0' X X X X X X X X X X X X X X X X
5. Also after the loop is complete, the array program_bin[] should contain
program_bin_str[][]’s equivalent in binary (formatted in hexadecimal here):
0 0x1201
1 0x1449
2 0x1691
3 0x12DA
4 0x5283
5 0x52D2
6 0x52DA
program_bin[] now represents the completely assembled program.
6. Write out the .obj file in binary using the LC4 Object File Format.
Great High Level Overview, but I really need a Slightly More
Detailed Overview
Okay, I guess we can give some more details.
Part 0: Setup the main Function to Read the Arguments
Open assembler.c from the helper files; it contains the main function for the program.
Carefully examine the variables at the top:
char* filename = NULL ;
char program [ROWS][COLS] ;
char program_bin_str [ROWS][17] ;
unsigned short int program_bin [ROWS] ;
The first variable, filename, is a pointer to a string that contains the name of the text file
you’ll be reading. Your program must take in as an argument the name of a .asm file. As an example,
once you compile your main program, you would execute it as follows:
./assembler test1.asm
In the last assignment you learned how to use the arguments passed into main. So the first
thing to implement is to check argc to see if the program has received any arguments. If it
does, point filename to the argument that contains the passed in string that is the file’s name.
You should return from main immediately after printing an error message if the caller doesn’t
provide an input file name. For example, something like this:
error1: usage: ./assembler <assembly_file>.asm
Start by updating assembler.c to read in the arguments and store the filename. Compile your
changes and test them before continuing.
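One possible shape for the argument check is sketched here. The helper name get_input_filename is hypothetical (the starter code may expect this logic directly inside main), and the message text follows the error1 example given earlier.

```c
#include <stdio.h>

/* Hypothetical helper: return the .asm filename from the command line,
 * or NULL (after printing an error1 message) if none was given. */
char *get_input_filename(int argc, char **argv)
{
    if (argc < 2) {
        printf("error1: usage: ./assembler <assembly_file>.asm\n");
        return NULL;
    }
    return argv[1]; /* argv[0] is the program name itself */
}
```

In main, the result could be stored in filename, with main returning a non-zero value right away when the result is NULL.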
Part 1: Read the .asm File
The next thing to do is to actually read the file into memory. main's next call will be
int read_asm_file (char* filename, char program [ROWS][COLS] ) ;
The purpose of read_asm_file is to open the .asm file, and place its contents into the 2D array
program[][]. You must complete the implementation of this function in the provided helper file
asm_parser.c.
Notice that it takes in the pointer to the filename that you’ll open in this function. It also takes in
the two dimensional array, program, that was defined back in main.
You’ll also see that ROWS and COLS are two #define’d macros in asm_parser.h. ROWS is set to
100 and COLS is set to 255. This means that you can only read in a program that is up to 100
lines long, and each line can occupy at most 255 chars, including its terminating '\0'. When the
program compiles, the compiler will replace all instances of ROWS with 100 and all instances of
COLS with 255. This means you can #define these values once to avoid Magic Numbers and simplify
your program.
You’ll want to look at the class notes (or a C reference textbook) to use fopen to open the
filename that has been passed in. Then you’ll want to use a function like fgets to read each
line of the .asm file into the program[][] 2D array. Be aware that fgets keeps the newline
character ('\n') at the end of each line it reads, and you’ll need to strip it from the input.
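A minimal sketch of the read-and-strip step follows. The helper names strip_newline and read_lines are hypothetical, the ROWS and COLS values are the ones stated above for asm_parser.h, and the file is assumed to have been opened already with fopen. Blank lines are stored as empty rows here; whether that is what you want is a design decision for your own read_asm_file.

```c
#include <stdio.h>
#include <string.h>

#define ROWS 100   /* values the text gives for asm_parser.h */
#define COLS 255

/* Hypothetical helper: cut the line at the first '\r' or '\n', if any. */
void strip_newline(char *line)
{
    line[strcspn(line, "\r\n")] = '\0';
}

/* Hypothetical helper: copy lines from an already opened file into
 * program[][]. Returns the number of lines stored. */
int read_lines(FILE *fp, char program[ROWS][COLS])
{
    int row = 0;

    /* fgets reads at most COLS - 1 chars and always terminates with '\0' */
    while (row < ROWS && fgets(program[row], COLS, fp) != NULL) {
        strip_newline(program[row]);
        row++;
    }
    return row;
}
```

Because strcspn returns the length of the string when no newline is found, strip_newline is also safe on a final line that has no trailing newline.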
Take a look at test1.asm file that was included in the helper file. It contains the following
program:
ADD R1, R0, R1
MUL R2, R1, R1
SUB R3, R2, R1
DIV R1, R3, R2
AND R1, R2, R3
OR R1, R3, R2
XOR R1, R3, R2
After you complete read_asm_file and run it on test1.asm, your 2D array program[][]
would contain the contents of the .asm file in this order:
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14
0 'A' 'D' 'D' ' ' 'R' '1' ',' ' ' 'R' '0' ',' ' ' 'R' '1' '\0'
1 'M' 'U' 'L' ' ' 'R' '2' ',' ' ' 'R' '1' ',' ' ' 'R' '1' '\0'
2 'S' 'U' 'B' ' ' 'R' '3' ',' ' ' 'R' '2' ',' ' ' 'R' '1' '\0'
3 'D' 'I' 'V' ' ' 'R' '1' ',' ' ' 'R' '3' ',' ' ' 'R' '2' '\0'
4 'A' 'N' 'D' ' ' 'R' '1' ',' ' ' 'R' '2' ',' ' ' 'R' '3' '\0'
5 'O' 'R' ' ' 'R' '1' ',' ' ' 'R' '3' ',' ' ' 'R' '2' '\0' X
6 'X' 'O' 'R' ' ' 'R' '1' ',' ' ' 'R' '3' ',' ' ' 'R' '2' '\0'
7 '\0' X X X X X X X X X X X X X X
Notice there are no newline characters at the end of these lines.
If reading in the file is a success, return 0 from the function. If not, return 2 from the function
and print an error to the screen:
error2: read_asm_file failed
Implement and test this function carefully before continuing on with the assignment.
Part 2: Parse an Instruction
You only need to parse the following instructions: ADD, MUL, SUB, DIV, AND, OR, XOR. You do not
need to implement ADD IMM or AND IMM unless you want to attempt the extra credit.
Once read_asm_file is working properly, go back in main, and call parse_instruction,
which is also located in asm_parser.c:
int parse_instruction (char* instr, char* instr_bin_str) ;
Purpose, Arguments, and Return Value
The purpose of this function is to take in a single row of your program[][] array and convert to
its binary equivalent in text form (as a string of 1s and 0s). The argument instr must point to a
row in main’s 2D array program[][]. The argument instr_bin_str must point to the
corresponding row in main’s 2D array program_bin_str[][].
If no errors are encountered, the function will return 0. If any error occurs in this
function, it should print an error message such as:
error3: parse_instruction failed
and return the number 3 immediately.
Let’s assume you’ve called parse_instruction and instr points to the first row in your
program[][] array:
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14
*instr 'A' 'D' 'D' ' ' 'R' '1' ',' ' ' 'R' '0' ',' ' ' 'R' '1' '\0'
parse_instruction needs to examine this string and convert it into a binary equivalent. You’ll
need to use the LC4 ISA to determine the binary equivalent of an instruction. When your
function returns, the memory pointed to by instr_bin_str, should look like this:
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
*instr_bin_str '0' '0' '0' '1' '0' '0' '1' '0' '0' '0' '0' '0' '0' '0' '0' '1' '\0'
Notice this isn’t actually binary, but it is the ADD instruction’s binary equivalent in text (C String)
form. We will convert this string form of the binary instruction to an integer (hexadecimal) later.
How to implement this function
The purpose of converting the instruction to a binary string (instead of to the binary number it
will eventually become), is so that you can build this string up little by little.
Investigate the strtok function in the standard C string library if you haven't already done so for
the last assignment.
strtok allows you to parse a string that is separated by delimiters. In this function you’ll be
parsing the string pointed to by instr and you’ll be building up the string pointed to by
instr_bin_str. instr will contain spaces and commas (those will be your delimiters).
Your first call to strtok on the instr string should return back the instruction function: ADD,
SUB, MUL, DIV, XOR, etc. The only thing common to all 26 instructions in the ISA is that the very
first part of them is the instruction function (e.g. ADD). Once you determine the instruction
function, you’ll call the appropriate helper function to parse the remainder of the instruction.
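To see how strtok behaves on an instruction string, here is a small sketch that splits a line into its tokens. The helper tokenize_instruction and its tokens array are hypothetical; your parse_instruction may well call strtok directly and pass the remaining tokens to its helpers another way.

```c
#include <string.h>

/* Hypothetical helper: split an instruction such as "ADD R1, R0, R1"
 * into at most max_tokens pieces, using spaces and commas as delimiters.
 * strtok writes '\0' bytes into instr, so instr must be writable. */
int tokenize_instruction(char *instr, char *tokens[], int max_tokens)
{
    int count = 0;
    char *tok = strtok(instr, " ,");   /* first call names the string */

    while (tok != NULL && count < max_tokens) {
        tokens[count++] = tok;
        tok = strtok(NULL, " ,");      /* NULL continues the same string */
    }
    return count;
}
```

For "ADD R1, R0, R1" this yields four tokens: "ADD", "R1", "R0", and "R1". Note that the original row is modified in place, which matters if you want to print it later.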
As an example, let’s say the instruction function is ADD. Once you’ve determined the instruction
function is ADD, you would call the parse_add helper function. It will take the instruction instr
as an argument, but also the instr_bin_str string because parse_add will be responsible for
determining the binary equivalent for the ADD instruction you are currently working on and it will
update instr_bin_str.
int parse_add (char* instr, char* instr_bin_str ) ;
When parse_add returns, and if no errors occurred while parsing the ADD instruction,
instr_bin_str should now be complete. At this time, you can return 0 from parse_instruction.
If you encounter any errors in this function, you should print an error3 message and return 3.
This is only the first instruction. main will need to call parse_instruction for each row of
program[][]; each call uses strtok to get the instruction function, calls the appropriate
parse_xxx helper function to finish the instruction, and updates instr_bin_str appropriately.
Part 3: Parse an ADD Instruction
This function is specific to parsing the ADD instruction, but you will need to write a similar
function for each of the different instruction functions.
The helper function parse_add should be called only by the parse_instruction function. It
has two char* arguments: instr and instr_bin_str.
int parse_add (char* instr, char* instr_bin_str ) ;
Because this function will only be called when parse_instruction encounters an ADD
instruction function, instr will contain an ADD instruction and instr_bin_str should be empty.
Similar to the other functions, if this function encounters no errors it will return 0 and if any error
occurs it should return 4 after printing an error4 message:
error4: parse_add() failed
The purpose of this function is to populate instr_bin_str. Upon the function’s start, the binary
opcode can be immediately copied into instr_bin_str[0:3]. Afterwards, strtok can tokenize
the remaining string to separate out the registers RD, RS, and RT, from the instr string.
For each register, call the parse_reg helper function:
int parse_reg (char reg_num, char* instr_bin_str) ;
This function must take a number in character form and populate instr_bin_str with the
appropriate corresponding binary number. For example, if RD = R0 for the ADD instruction, the
'0' character would be passed in the argument reg_num. parse_reg then copies the
characters 000 into instr_bin_str[4:6].
parse_reg should return 5 if any errors occur after printing a standard error5 message;
otherwise it returns 0 upon success.
To implement the parse_reg function, consider using a switch statement on reg_num.
This helper function should only parse one register at a time. Also, because it is not specific to
the ADD instruction (nearly all instructions contain registers), you can call it from other functions
that need their registers converted to binary. Example: parse_mul should also call parse_reg.
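A minimal sketch of a switch-based parse_reg is shown here. It assumes the function appends the three bits to the end of instr_bin_str with strcat, so the caller controls the field position by the order of its calls; the text does not say how the write position is chosen, so treat that as one option among several. The error5 wording is also an assumption modeled on the error4 message.

```c
#include <stdio.h>
#include <string.h>

/* Sketch of parse_reg: append the 3-bit pattern for register digit
 * reg_num ('0'..'7') to the end of instr_bin_str.
 * Appending with strcat is an assumption, not a stated requirement. */
int parse_reg(char reg_num, char *instr_bin_str)
{
    switch (reg_num) {
    case '0': strcat(instr_bin_str, "000"); break;
    case '1': strcat(instr_bin_str, "001"); break;
    case '2': strcat(instr_bin_str, "010"); break;
    case '3': strcat(instr_bin_str, "011"); break;
    case '4': strcat(instr_bin_str, "100"); break;
    case '5': strcat(instr_bin_str, "101"); break;
    case '6': strcat(instr_bin_str, "110"); break;
    case '7': strcat(instr_bin_str, "111"); break;
    default:
        printf("error5: parse_reg() failed\n"); /* assumed wording */
        return 5;
    }
    return 0;
}
```

With this approach, instr_bin_str must already hold a valid C string (for example the 4-character opcode) before parse_reg is called, since strcat looks for the existing '\0'.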
Note that parse_add must also populate the sub-opcode field in instr_bin_str[10:12].
When parse_add returns, instr_bin_str should be complete. parse_instruction should
then return to main.
You will need to create a helper function for each instruction type; use parse_add as a model.
As an example, you’ll need to create parse_mul, parse_xor, etc. They will all be very similar
functions, so perfect parse_add before you attempt the other functions.
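Since the helpers differ mainly in their opcode and sub-opcode bits, it can help to write those bits down in one place while you work. The sketch here is one way to do that; the struct, the function find_encoding, and the table-driven style are all assumptions, and the bit patterns are read off the worked example for test1.asm in the Suggested Approach (for instance 0x1449 for MUL R2, R1, R1), so check them against the LC4 ISA before relying on them.

```c
#include <string.h>

/* Hypothetical record: bit strings for one instruction function. */
struct encoding {
    const char *name;
    const char *opcode;  /* goes in instr_bin_str[0:3]   */
    const char *sub_op;  /* goes in instr_bin_str[10:12] */
};

/* Hypothetical helper: return the entry for a mnemonic, or NULL.
 * The table is a static local, so no global variable is introduced. */
const struct encoding *find_encoding(const char *name)
{
    static const struct encoding table[] = {
        { "ADD", "0001", "000" },
        { "MUL", "0001", "001" },
        { "SUB", "0001", "010" },
        { "DIV", "0001", "011" },
        { "AND", "0101", "000" },
        { "OR",  "0101", "010" },
        { "XOR", "0101", "011" },
    };
    size_t i;

    for (i = 0; i < sizeof table / sizeof table[0]; i++) {
        if (strcmp(table[i].name, name) == 0) {
            return &table[i];
        }
    }
    return NULL;
}
```

The assignment still requires a separate parse_xxx helper per instruction; a table like this only keeps the constants consistent between them.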
Part 4: Converting the binary string to a hexadecimal formatted integer
After parse_instruction returns successfully to main, main should call str_to_bin:
unsigned short int str_to_bin (char* instr_bin_str) ;
This function should be passed the recently parsed binary string from the array
program_bin_str[X], where X represents the binary instruction that was just populated by the
last call to parse_instruction.
The purpose of this function is to take a binary string and convert it to a 16-bit binary equivalent
and return it to the calling function. To implement this function, we recommend using strtol. If
strtol returns 0, print an error6 message and return 6.
As an example of what this function should do, if it was called with the following argument:
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
*instr_bin_str '0' '0' '0' '1' '0' '0' '1' '0' '0' '0' '0' '0' '0' '0' '0' '1' '\0'
then it should return: 0x1201, which is the hexadecimal equivalent for this binary string. You can
verify what it returns by printing it with printf and the "0x%X" format, which prints integers in
hexadecimal format.
Once str_to_bin returns, its return value should be assigned to the corresponding spot in the
unsigned short int array program_bin[X], where X matches the index from program_bin_str[X].
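A minimal sketch of the conversion, using strtol with base 2 as recommended above. It follows the convention stated in this part of treating a 0 result from strtol as an error; the wording of the error6 message is an assumption.

```c
#include <stdio.h>
#include <stdlib.h>

/* Sketch of str_to_bin: interpret a string of '0'/'1' characters as a
 * base-2 number and return it as a 16-bit value. */
unsigned short int str_to_bin(char *instr_bin_str)
{
    long value = strtol(instr_bin_str, NULL, 2); /* base 2 */

    if (value == 0) {
        printf("error6: str_to_bin() failed\n"); /* assumed wording */
        return 6;
    }
    return (unsigned short int)value;
}
```

For example, the string "0001001000000001" converts to 0x1201, which matches the first entry of program_bin[] in the Suggested Approach.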
Part 5: Writing the .obj object file
During lecture, we learned that a .obj file is a binary file and discussed the format of the file.
Recall from lecture that the .obj file’s CODE header is as follows:
3-word header (xCADE, <address>, <n>), n-word body comprising the instructions. This
corresponds to the .CODE directive in assembly.
Given this information, the last function to implement is:
int write_obj_file (char* filename, unsigned short int program_bin[ROWS] ) ;
The purpose of this function is to take the assembled program, represented in hexadecimal in
program_bin[] and output it to a file with the extension: .obj. It must encode the file using the
.obj file format specified in class. If test1.asm was pointed to by filename, your program
would open up a file to write to called: test1.obj.
This function should do the following:
1. Take the filename passed in the argument filename and change the last 3 letters to "obj".
2. Open that file for writing in binary format. The file you’ll create is not a text file;
you are writing binary numbers, not C strings.
3. Write out the first word in the header: 0xCADE.
4. Write out the address your program should be loaded at. 0x0000 is the default.
5. Count the number of rows that contain data in program_bin[], then write out <n>
6. Now that the header is complete, write out the <n> rows of data in program_bin[]
7. Close the file using fclose.
If any errors occur, print an appropriate error message and return 7. Otherwise return 0 and
main should then return 0 to the caller. Your program is now complete.
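Two pieces of these steps are sketched here: renaming the extension and writing a single 16-bit word. Both helper names (make_obj_name, write_word) are hypothetical. Writing the high byte first is one way to get the byte order shown in the expected hexdump in the Testing section regardless of the host machine; other approaches can produce the same file.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: overwrite the last three characters of a name
 * such as "test1.asm" with "obj". Assumes a writable buffer whose
 * name ends in a three-letter extension. */
void make_obj_name(char *filename)
{
    size_t len = strlen(filename);

    if (len >= 3) {
        memcpy(filename + len - 3, "obj", 3);
    }
}

/* Hypothetical helper: write one 16-bit word, high byte first.
 * Returns 0 on success and 7 on failure. */
int write_word(FILE *fp, unsigned short int word)
{
    unsigned char bytes[2];

    bytes[0] = (unsigned char)(word >> 8);   /* high byte */
    bytes[1] = (unsigned char)(word & 0xFF); /* low byte  */
    return fwrite(bytes, 1, 2, fp) == 2 ? 0 : 7;
}
```

With a file opened via fopen(filename, "wb"), the header would be three write_word calls (0xCADE, the address, then n), followed by one call per instruction and a final fclose.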
Testing
Validate Output with PennSim
Once you have successfully written an object file from an assembly file, examine the .obj file’s
contents using the Linux utility hexdump. From the Linux terminal prompt type:
hexdump test1.obj
hexdump will show you the binary contents. Make certain it matches your expectations!
Depending on how you write out your files, you may encounter endianness variations. Look
carefully at the file I/O lecture slides to understand endianness. Your .obj file must have the
correct endianness to be loaded into PennSim.
As an example, for the program described in the Suggested Approach, the expected hexdump
would be:
0000000 deca 0000 0700 0112 4914 9116 da12 8352
0000010 d252 da52
0000014
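Note that the first column indicates the byte offset, and that hexdump prints 16 bytes per line (so
the next line starts at 0000010). Further notice that the first word is shown as deca, which is cade
with its two bytes swapped: by default hexdump groups the file into two-byte units and displays each
one in the host's byte order, which is little-endian on typical Linux machines.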
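To make the relationship between the value in memory and the hexdump display concrete, this small sketch swaps the two bytes of a 16-bit word. The name swap_bytes is hypothetical, and the function assumes unsigned short int is 16 bits wide.

```c
/* Hypothetical helper: exchange the two bytes of a 16-bit word.
 * Assumes unsigned short int is 16 bits wide. */
unsigned short int swap_bytes(unsigned short int word)
{
    return (unsigned short int)(((word & 0xFF) << 8) | (word >> 8));
}
```

Applying it to 0xCADE gives 0xDECA, and applying it to the instruction 0x1201 gives 0x0112, which are the first and fourth groups in the expected hexdump.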
You must test your .obj files in PennSim before submission. If they fail to load, you
should expect little, if any, credit for this assignment.
It is your responsibility to test out files other than test1.asm. Also, you must test your .obj files
by loading them into PennSim and seeing if they work! Please do this before submitting
your work. We will be testing your programs with different .asm files, so you should try out
different .asm files of your own.
Files for Testing
We are only providing test1.asm for testing. However, you can (and should) create additional
files that test different parts of the program.
For each test file, bring up PennSim, assemble the file, and check the .obj file contents with
hexdump. Then run your program on the same file and see if it assembles into the same object
file. You can create a bunch of test cases very easily with PennSim.
You should test your assembler program on a variety of .asm files, not just simple examples.
Unit Testing
When writing such a large program, it is a good strategy to “unit test.” This means that as soon as
you have a small piece of working code, you compile it and write a simple test for it.
DO NOT write the entire program, compile it, and then start testing it. You will never resolve all
of your errors this way. You need to unit test your program as you go along or it will be
impossible to debug.
GDB for Debugging
Using gdb to debug your program is also highly recommended and is actually required before
asking for help on ED or during office hours. The first thing the TAs will ask is if you have done
this; if not, they will ask you to do it. This is a required part of the learning experience and you'll
want this skill to succeed in 595.
While it may seem easy to use print statements, they quickly clutter your program and require
commenting out/deleting when you no longer need them (and inevitably
uncommenting/undeleting when you need to debug again).
gdb allows you to inspect the actual contents of memory which is an advantage over print
statements because print statements only print ASCII characters. Further, you can see the
actual contents of memory of any variable at any time, while print statements only print when
you call the print statement during the execution of your program.
Plus, you’ll have to use it for 595 so you may as well get some practice in now.
Reminder: you will need to add the -g flag to all intermediate compilation steps, not just the
assembler target, and you will need to use the --args option to tell gdb that you have
arguments to your program:
gdb -q -tui --args ./assembler test1.asm
Submission
Submission Checks
There is a single "submission check" test that is run once you upload your code to Gradescope.
This test checks that you have submitted all four required files and also that your program
compiles and any autograder code compiles successfully. It does not run your program or
provide any input on whether it works or not. This check just ensures that all the required
components exist. This test is performed after uploading to Gradescope.
If you are not passing this check, please reach out to TAs for troubleshooting assistance.
The Actual Submission
You will submit this assignment to Gradescope in the assignment entitled Assignment 11: File
I/O, Making the LC4 Assembler.
Download all of your .c source and .h header files and your Makefile from Codio to your
computer, then upload all four of these files to the Gradescope assignment.
You should not submit any of the provided or your own .asm testing files.
You have unlimited submissions until the deadline, after which late penalties apply as noted in
the syllabus.
We will only grade the last submission uploaded.
Grading
Assembler
We do provide one example that we will test with, so you can be sure to get those points. You
will have to figure out the rest yourself.
This assignment is worth 200 points, normalized to 100% for gradebook purposes.
0 points: submission check. This is the only test visible to you until we publish scores after the
deadline and manual grading.
20 points: correct makefile
30 points: general code inspection (manually graded)
10 points: handling command line arguments and writing the correct file
20 points: correctly handle endianness
60 points: correctly processing test1.asm (which we provide to you)
60 points: correctly processing our other test files (which we do not provide to you)
Extra Credit
The Extra Credit is worth 11 percentage points so the highest grade on the assignment is 111%.
Your extra credit must not break functionality for the non-extra credit requirements. Make a
backup of your finalized program before attempting the extra credit. If your program fails to
meet the basic requirements, you will end up losing more points than the extra credit will gain.
There is no partial credit. It must work completely for any credit.
We will not give guidance on how to do these since they are designed to be challenge problems.
2 percentage points: modify your read_asm_file function to ignore comments in .asm files.
You must handle all types of comments for credit.
2 percentage points: modify your program to handle ADD IMM and AND IMM instructions. Both
must work completely for credit (no partial credit for one instruction).
5 percentage points: modify your program to handle the .CODE and .ADDR directives. As a hint,
you will need another array to hold the addresses, e.g. unsigned short int address[ROWS].
2 percentage points: modify your program to handle the .DATA directive.
Extra Credit is manually graded after the final submission deadline, and the extra credit points
are added to Codio manually.
The points will appear in Canvas after all the submissions have been graded.
Since this is a manual process, it will take a lot of time; please be patient. We promise it will
be completed.
An Important Note of Plagiarism
● We will scan your assignment files for plagiarism using an automatic plagiarism
detection tool.
● If you are unaware of the plagiarism policy, make certain to check the syllabus to
see the possible repercussions of submitting plagiarized work (or letting someone
submit yours).
FAQ
Quick Hints
These are some hints provided by TAs.
● You are allowed to use the switch statement, a compact way to handle long if/then
blocks.
● We won't be testing with string literals in the unit testing.
● You do not need a script file, but you can certainly add one to automate your own
testing.
● We will only be testing with valid .asm files. That is, all the test files will assemble
correctly in PennSim.
● You can raise an error if the register number for Rx is not valid (e.g. R8), but again, we
will not be testing with invalid files.
Formatting
● We will not test with blank lines between instructions, even though PennSim can
assemble these without error.
● We do not expect you to use a regex to check if the instruction matches a format.
● Lines can end with trailing spaces, a newline, or just EOF (end of file) if it is the last line in
the .asm file.
● strtok is sufficient to break the instruction into its different parts (hint: use a delimiter of
" ,"). The Assignment 10 instructions have a link to a good resource.
● All characters will be uppercase, except for the x in hexadecimal values (which only applies
to some of the extra credit challenges, and it will never be X). We will not test with
lowercase characters, even though PennSim can assemble them without error.
○ BRxxx does not appear in any testing, not even extra credit, so you don't have to
worry about this "edge case"
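The strtok hint above can be sketched as follows. This is only an illustration, not the required design: split_instruction and MAX_TOKENS are hypothetical names that do not appear in the starter code, and the sketch assumes each line holds at most an opcode and three operands. It also shows one way to cope with lines that end in trailing spaces, a newline, or nothing at all.

```c
#include <assert.h>
#include <string.h>

#define MAX_TOKENS 4 /* hypothetical limit: opcode plus up to three operands */

/* Hypothetical helper (not part of the starter code): splits one line of
 * assembly in place and stores pointers to the pieces in tokens[].
 * Returns the number of tokens found. */
int split_instruction(char *line, char *tokens[MAX_TOKENS])
{
    assert(line != NULL && tokens != NULL);

    /* Cut the line at the first newline or carriage return, if there is one.
     * A last line that ends at EOF has neither, so it is left unchanged. */
    line[strcspn(line, "\r\n")] = '\0';

    int count = 0;
    /* With the delimiter " ,", any run of spaces and commas separates two
     * tokens, so trailing spaces do not produce an empty token. */
    for (char *tok = strtok(line, " ,"); tok != NULL && count < MAX_TOKENS;
         tok = strtok(NULL, " ,")) {
        tokens[count++] = tok;
    }
    return count;
}
```

Note that strtok modifies the buffer it is given, so pass it a writable copy of the line rather than a string literal.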
Endianness
● The x86 (the processor used by Codio) has a different endianness than the LC4. When you
fread() a 2-byte word, its two bytes end up in the opposite order from what the LC4
expects, so they must be swapped to adjust for this. That swapping is not needed with
fgetc() or with fread()'s of size 1.
● If you read the .obj file into memory one word at a time using fread(), you will need to
swap for endianness. In contrast, if you choose to read the .obj file into memory one
byte at a time with fgetc(), the endianness doesn't need to be adjusted. However, you
will have to combine two bytes into a word using bitwise operators.
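The two approaches described above can be sketched as follows. This is a minimal illustration under stated assumptions: the host is little-endian (as x86 is), the file stores each 16-bit word with its high byte first, and the names swap_bytes and read_word_bytewise are hypothetical helpers, not part of the starter code.

```c
#include <assert.h>
#include <stdio.h>

/* Swaps the two bytes of a 16-bit word, e.g. 0x1234 becomes 0x3412.
 * Needed after reading a whole word with fread() on a little-endian host. */
unsigned short swap_bytes(unsigned short word)
{
    return (unsigned short)(((word & 0x00FF) << 8) | ((word >> 8) & 0x00FF));
}

/* Reads one word a byte at a time, high byte first, and combines the two
 * bytes with bitwise operators. No swap is needed on this path.
 * Returns 1 on success and 0 if the file ends before two bytes are read. */
int read_word_bytewise(FILE *fp, unsigned short *out)
{
    assert(fp != NULL && out != NULL);

    int hi = fgetc(fp);
    int lo = fgetc(fp);
    if (hi == EOF || lo == EOF) {
        return 0;
    }
    *out = (unsigned short)((hi << 8) | lo);
    return 1;
}
```

The same swap applies in the other direction: if you write a whole word with fwrite() on a little-endian host, swap its bytes first, or write the high byte and then the low byte individually.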
Resources
● strtok reference
https://www.tutorialspoint.com/c_standard_library/c_function_strtok.htm
● switch statement reference
https://www.tutorialspoint.com/cprogramming/switch_statement_in_c.htm
● strtol reference
https://www.tutorialspoint.com/c_standard_library/c_function_strtol.htm