Assignment 2: a file archiver

Contents

Aims

The Task

Getting Started

Subset 0

Subset 1

Subset 2

Subset 3

Handling Errors

Reference implementation

The drop and droplet format

The droplet hash

Assumptions and Clarifications

Assessment

Testing

Submission

Due Date

Assessment Scheme

Intermediate Versions of Work

Assignment Conditions

Change Log

Aims

 


building a concrete understanding of file system objects;

practising C, including byte-level operations and robust error handling;

understanding file operations, including input-output operations on binary data.

The Task

A file archive is a single file which can contain the contents, names and other metadata of multiple files. These can make backup and transport of files more convenient, and can often make compression more efficient. We often refer to tools that can create or manipulate these as file archivers.

There are a vast number of archive formats: on *nix-like systems, tar is common; whereas on Windows, Zip is common. Wikipedia's list of archive formats is a marvellous rabbit-hole to explore.

In this assignment, you will be implementing rain, a file archiver for the drop format.

The drop format is made up of one or more droplets, where a droplet records one file system object. This format is described in more detail below.

A complete implementation of rain can

list the path names of each object in a drop (subset 0);

list the permissions of each object in a drop (subset 0);

list the size (number of bytes) of files in a drop (subset 0);

check the droplet magic number (subset 0);

extract files from a drop (subset 1);

check a drop for integrity, by checking droplet hashes (subset 1);

set the file permissions of files extracted from a drop (subset 1);

create a drop from a list of files (subset 2);

list, extract, and create drops that include directories (subset 3); and

extract and create drops in 7-bit and 6-bit formats (subset 3).

Getting Started

Create a new directory for this assignment, change to this directory, and fetch the provided code by running

$ mkdir -m 700 rain
$ cd rain
$ 1521 fetch rain

If you're not working at CSE, you can download the provided files as a zip file or a tar file. This will give you the following files:

rain.c

is the only file you need to change: it contains partial definitions of four functions, list_drop, check_drop, extract_drop, and create_drop, to which you need to add code to complete the assignment. You can also add your own functions to this file.

rain_main.c

contains a main, which has code to parse the command line arguments, and which then calls one of list_drop, extract_drop, create_drop, or check_drop, depending on the command line arguments given to rain. Do not change this file.

rain.h

contains shared function declarations and some useful constant definitions. Do not change this file.

rain_hash.c

contains the droplet_hash function; you should call this function to calculate hashes for subset 1. Do not change this file.

rain_6_bit.c

contains the droplet_to_6_bit and droplet_from_6_bit functions. You should call these to implement the 6-bit format for subset 3. Do not change this file.

rain.mk

contains a Makefile fragment for rain.

You can run make to compile the provided code; and you should be able to run the result.

$ make
dcc -c -o rain.o rain.c
dcc -c -o rain_main.o rain_main.c
dcc -c -o rain_hash.o rain_hash.c
dcc -c -o rain_6_bit.o rain_6_bit.c
dcc rain.o rain_main.o rain_hash.o rain_6_bit.o -o rain
$ ./rain -l a.drop
list_drop called to list drop: 'a.drop'

If you are running make without dcc available you can compile like this:

$ make CC=gcc
gcc -c -o rain.o rain.c
gcc -c -o rain_main.o rain_main.c
gcc -c -o rain_hash.o rain_hash.c
gcc -c -o rain_6_bit.o rain_6_bit.c
gcc rain.o rain_main.o rain_hash.o rain_6_bit.o -o rain
$ ./rain -l a.drop
list_drop called to list drop: 'a.drop'

WARNING:

dcc does more error checking than gcc

Make sure your program can compile with dcc

If you don't have make available you can compile like this:

$ dcc rain.c rain_main.c rain_hash.c rain_6_bit.c -o rain
$ ./rain -C b.drop
check_drop called to check drop: 'b.drop'

If you don't have make or dcc available you can compile like this:

$ gcc -Wall rain.c rain_main.c rain_hash.c rain_6_bit.c -o rain
$ ./rain -C b.drop
check_drop called to check drop: 'b.drop'

WARNING:

dcc does more error checking than gcc

Make sure your program can compile with dcc

You may optionally create extra .c or .h files.

You should run unzip to get a directory called examples/ full of .drop files to test your program against.

$ unzip examples.zip

Subset 0

To complete subset 0, you need to implement code that can print a list of the contents of a drop, and

print a detailed list of the contents of a drop.

Subset 0: Print a list of the contents of a drop

Given the -l command-line argument, rain should print the path names of the files/directories in a drop. For example:

# List each item in the drop called text_file.drop, which is in the examples directory
$ ./rain -l examples/text_file.drop
hello.txt

# List each item in the drop called 4_files.drop, which is in the examples directory
$ ./rain -l examples/4_files.drop
256.bin
hello.txt
last_goodbye.txt
these_days.txt

# List each item in the drop called hello_world.drop, which is in the examples directory
$ ./rain -l examples/hello_world.drop
hello.c
hello.cpp
hello.d
hello.go
hello.hs
hello.java
hello.js
hello.pl
hello.py
hello.rs
hello.s
hello.sh
hello.sql

Subset 0: Print a detailed list of the contents of a drop

Given the -L command-line argument, rain should, for each file in the specified drop, print:

  1. the file/directory permissions,
  2. the droplet format which will be one of 6, 7 or 8 (the default),
  3. the file/directory size in bytes, and
  4. the file/directory path name.

$ ./rain -L examples/text_file.drop
-rw-r--r-- 8 56 hello.txt

# List the details of each item in the drop called 4_files.drop, which is in the examples directory
$ ./rain -L examples/4_files.drop
-rw-r--r-- 8 256 256.bin
-rw-r--r-- 8 56 hello.txt
-r--r--r-- 8 166 last_goodbye.txt
-r--rw-r-- 8 148 these_days.txt

# List the details of each item in the drop called hello_world.drop, which is in the examples directory
$ ./rain -L examples/hello_world.drop
-rw-r--r-- 8 93 hello.c
-rw-r--r-- 8 82 hello.cpp
-rw-r--r-- 8 65 hello.d
-rw-r--r-- 8 77 hello.go
-rw-r--r-- 8 32 hello.hs
-rw-r--r-- 8 117 hello.java
-rw-r--r-- 8 30 hello.js
-rwxr-xr-x 8 47 hello.pl
-rwxr-xr-x 8 103 hello.py
-rw-r--r-- 8 45 hello.rs
-rw-r--r-- 8 123 hello.s
-rwxr-xr-x 8 41 hello.sh
-rw-r--r-- 8 24 hello.sql

HINT:

rain_main.c calls the function list_drop in rain.c when either of the -l or -L options are specified on the command line.

Add code to list_drop in rain.c. Use fopen to open the drop file. Use fgetc to read bytes.

Make sure you understand the droplet format specification below.

Use C bitwise operations such as << & and | to combine bytes into integers. Think carefully about the functions you can construct to avoid repeated code.

Review print_borts_file.c from our week 8 tutorial and print_bytes.c from our week 8 lab.

fseek can be used to skip over parts of the drop file, but you can also use a loop and fgetc.

Use a format like "%5lu" to print the file size.

NOTE:

The order you list files is the order they appear in the drop.

drop files do not necessarily end with .drop. This has been done with the provided example files purely as a convenience.
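The hint about combining bytes with << and | can be illustrated with a minimal sketch. The helper name read_little_endian is hypothetical (it is not part of rain.h); it assumes the fields are little-endian, as the droplet format section states.

```c
#include <stdint.h>
#include <stdio.h>

// Read an n_bytes-wide little-endian unsigned integer from stream.
// Returns 0 on success, or -1 if the stream ends before n_bytes are read.
// (read_little_endian is a hypothetical helper name, not part of rain.h.)
int read_little_endian(FILE *stream, int n_bytes, uint64_t *value) {
    uint64_t result = 0;
    for (int i = 0; i < n_bytes; i++) {
        int byte = fgetc(stream);
        if (byte == EOF) {
            return -1;
        }
        // byte i is the i-th least significant byte of the result
        result |= (uint64_t)byte << (8 * i);
    }
    *value = result;
    return 0;
}
```

The same function could serve both the 2-byte pathname length and the 6-byte content length, which is one way to avoid repeated code.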

Subset 1

To complete subset 1, you need to implement code that can check the contents of a drop, and

extract files from a drop.

Subset 1: Check the contents of a drop

Given the -C command-line argument, rain should check the hashes in the specified drop. For example:

# Check the drop called 4_files.drop, which is in the examples directory
$ ./rain -C examples/4_files.drop
256.bin - correct hash
hello.txt - correct hash
last_goodbye.txt - correct hash
these_days.txt - correct hash

# Check the drop called hello_world.bad_hash.drop, which is in the examples directory
$ ./rain -C examples/hello_world.bad_hash.drop
hello.c - correct hash
hello.cpp - correct hash
hello.d - correct hash
hello.go - correct hash
hello.hs - correct hash
hello.java - correct hash
hello.js - correct hash
hello.pl - correct hash
hello.py - correct hash
hello.rs - correct hash
hello.s - correct hash
hello.sh - correct hash
hello.sql - incorrect hash 0x19 should be 0x43

It should also check the droplet magic number (first byte) of each droplet, and emit an error if it is incorrect.

# Check the drop called text_file.bad_magic.drop, which is in the examples directory

$ ./rain -C examples/text_file.bad_magic.drop

error: incorrect first droplet byte: 0x39 should be 0x63

HINT:

rain_main.c calls the function check_drop in rain.c when the -C option is specified on the command line. Add code to check_drop in rain.c .

Call droplet_hash to calculate hash values.

Think carefully about the functions you can construct to avoid repeated code.

For example, for every byte you read with fgetc you need to call droplet_hash to calculate a new hash value, so write a function that does both. Hint: have the function take a pointer to a hash value which it can update.
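A minimal sketch of such a function follows. The name read_byte_and_hash is hypothetical; droplet_hash is repeated here only so the example stands alone (its body is the one given in the droplet hash section). In the assignment it is already provided by rain_hash.c and should not be redefined.

```c
#include <stdint.h>
#include <stdio.h>

// Copy of the provided hash step, included only to keep this sketch
// self-contained; in the assignment, call the one in rain_hash.c.
uint8_t droplet_hash(uint8_t current_hash_value, uint8_t byte_value) {
    return ((current_hash_value * 33) & 0xff) ^ byte_value;
}

// Read one byte from stream and fold it into *hash.
// Returns the byte read, or EOF (leaving *hash unchanged) at end of file.
// (read_byte_and_hash is a hypothetical helper name.)
int read_byte_and_hash(FILE *stream, uint8_t *hash) {
    int byte = fgetc(stream);
    if (byte != EOF) {
        *hash = droplet_hash(*hash, (uint8_t)byte);
    }
    return byte;
}
```

Starting each droplet with a hash of 0 and reading every byte except the last through this function leaves the value to compare against the stored hash byte.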

Subset 1: Extract files from a drop

Given the -x command-line argument, rain should extract the files in the specified drop. It should set file permissions for extracted files to the permissions specified in the drop.

# rain will extract files into the current working directory.

# So as not to clutter your assignment directory, you should create a

# temporary directory, 'tmp', and change to it. Once in that directory,

# both your rain program and 'examples/' will be in its parent

# directory --- hence the use of '..' in these path names.

# Make a directory called tmp.

$ mkdir -p tmp/

# Change into the tmp directory.

$ cd tmp/

# Forcibly remove all files inside the tmp directory.

$ rm -f * .*

# Use your program to extract the contents of text_file.drop.

$ ../rain -x ../examples/text_file.drop

Extracting: hello.txt

# Show the contents of hello.txt in the terminal.

# You can manually open it in your text editor too, if you like.

$ cat hello.txt

Hello COMP1521

I hope you are enjoying this assignment.

# Forcibly remove all files inside the tmp directory.

$ rm -f * .*

# Use your program to extract the contents of hello_world.drop.

$ ../rain -x ../examples/hello_world.drop

Extracting: hello.c
Extracting: hello.cpp
Extracting: hello.d
Extracting: hello.go
Extracting: hello.hs
Extracting: hello.java
Extracting: hello.js
Extracting: hello.pl
Extracting: hello.py
Extracting: hello.rs
Extracting: hello.s
Extracting: hello.sh
Extracting: hello.sql

# Show the first 25 lines from the extracted files to confirm the extraction was successful.

$ cat $(echo * | sort) | head -n 25

extern int puts(const char *s);

int main(void)

{

puts("Hello, World!");

return 0;

}

#include <iostream>

int main () {

std::cout << "Hello, world!" << std::endl;

}

import std.stdio;

void main() {

writeln("Hello, world!");

}

package main

import "fmt"

func main() {

fmt.Println("Hello, World!")

}

main = putStrLn "Hello, World!"

# Forcibly remove all files inside the tmp directory

$ rm -f * .*

# Use your program to extract the contents of meta.drop.

$ ../rain -x ../examples/meta.drop
Extracting: 1_file.subdirectory.7-bit.drop
Extracting: 1_file.subdirectory.drop
Extracting: 2_files.7-bit.drop
Extracting: 2_files.drop
Extracting: 3_files.7-bit.drop
Extracting: 3_files.bad_hash.drop
Extracting: 3_files.bad_magic.drop
Extracting: 3_files.drop
Extracting: 3_files.subdirectory.7-bit.drop
Extracting: 3_files.subdirectory.bad_hash.drop
Extracting: 3_files.subdirectory.bad_magic.drop
Extracting: 3_files.subdirectory.drop
Extracting: 4_files.drop
Extracting: all_the_modes.subdirectory.7-bit.drop
Extracting: all_the_modes.subdirectory.drop
Extracting: all_three_formats.6-bit.drop
Extracting: binary_file.drop
Extracting: hello_world.7-bit.drop
Extracting: hello_world.bad_hash.drop
Extracting: hello_world.bad_magic.drop
Extracting: hello_world.drop
Extracting: lecture_code.subdirectory.7-bit.drop
Extracting: lecture_code.subdirectory.drop
Extracting: small.6-bit.drop
Extracting: small.7-bit.drop
Extracting: small.drop
Extracting: text_file.7-bit.drop
Extracting: text_file.bad_hash.drop
Extracting: text_file.bad_magic.drop
Extracting: text_file.drop
Extracting: tiny.6-bit.drop
Extracting: tiny.7-bit.drop
Extracting: tiny.drop

# Show the first 10 items in this directory alphabetically to check extraction was successful.
$ ls -1 $(echo * | sort) | head
1_file.subdirectory.drop
1_file.subdirectory.compressed.drop
2_files.drop
2_files.compressed.drop
3_files.bad_hash.drop
3_files.bad_magic.drop
3_files.drop
3_files.compressed.drop
3_files.subdirectory.bad_hash.drop
3_files.subdirectory.bad_magic.drop

# Go back into the directory with your code.

$ cd ../

# Remove the tmp directory and everything inside it.

$ rm -rf tmp/

HINT:

rain_main.c calls the function extract_drop in rain.c when the -x option is specified on the command line. Add code to extract_drop in rain.c.

Use fopen to open each file you are extracting. Use fputc to write bytes to each file.

In our lectures on files we covered copying bytes to a file in the cp_fgetc.c example and setting the permissions of a file in the chmod.c example.

NOTE:

rain should overwrite any files that already exist.

rain can leave already extracted/partially extracted files in the event of an error.
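Setting permissions on an extracted file means turning the droplet's 10-character permissions field into the numeric mode that chmod expects. The following is a sketch only; permissions_to_mode is a hypothetical helper name, and it assumes any character other than '-' in positions 1 to 9 means the corresponding bit is set.

```c
#include <sys/stat.h>
#include <sys/types.h>

// Convert an ls-style permissions field such as "-rwxr-xr-x" into the
// nine permission bits used by chmod (e.g. 0755).
// Character 0 (the file type) is ignored here.
// Assumption: any character other than '-' means "bit set".
// (permissions_to_mode is a hypothetical helper name.)
mode_t permissions_to_mode(const char permissions[10]) {
    mode_t mode = 0;
    for (int i = 1; i < 10; i++) {
        mode <<= 1;               // make room for the next bit
        if (permissions[i] != '-') {
            mode |= 1;
        }
    }
    return mode;
}
```

After the file's bytes have been written and the file closed, a call such as chmod(pathname, permissions_to_mode(permissions)) would apply the stored permissions; its return value should be checked like any other file operation.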

Subset 2

To complete subset 2, you need to implement code that can create a drop from a list of files.

Subset 2: Create a drop from a list of files

# These "echo" lines show you how to create these test files and what their contents are.

# Create a file called hello.txt with the contents "hello".

$ echo hello >hello.txt

# Create a file called hola.txt with the contents "hola".

$ echo hola >hola.txt

# Create a file called hi.txt with the contents "hi".

$ echo hi >hi.txt

# Set the permissions of these files to 644 (octal permission string (equivalent to rw-r--r--)).

# When you list the contents of the drop, the permissions should match this.

$ chmod 644 hello.txt hola.txt hi.txt

# Create a drop called selamat.drop with the files hello.txt, hola.txt, and hi.txt.

$ ./rain -c selamat.drop hello.txt hola.txt hi.txt

Adding: hello.txt
Adding: hola.txt
Adding: hi.txt

# List the contents of selamat.drop.

$ ./rain -L selamat.drop

-rw-r--r-- 8  6 hello.txt

-rw-r--r-- 8  5 hola.txt

-rw-r--r-- 8  3 hi.txt

# Make a directory called tmp.

$ mkdir -p tmp/

# Change into the tmp directory.

$ cd tmp/

# Forcibly remove all files inside the tmp directory.

$ rm -f * .*

# Use your program to extract the contents of selamat.drop.

$ ../rain -x ../selamat.drop
Extracting: hello.txt
Extracting: hola.txt
Extracting: hi.txt

# Check that the extracted file hello.txt is the same as the source file ../hello.txt.

$ diff -s ../hello.txt hello.txt

Files ../hello.txt and hello.txt are identical

# Check that the extracted file hola.txt is the same as the source file ../hola.txt.

$ diff -s ../hola.txt hola.txt

Files ../hola.txt and hola.txt are identical

# Check that the extracted file hi.txt is the same as the source file ../hi.txt.

$ diff -s ../hi.txt hi.txt

Files ../hi.txt and hi.txt are identical

# Go back into the directory with your code.

$ cd ../

# Remove the tmp directory and everything inside it.

$ rm -rf tmp/

 

Given the -c command-line argument, rain should create a drop containing the specified files, as in the example above.

It is also possible to append droplets to an existing drop file using the -a command-line option. For example:

$ ./rain -a bonjour.drop hello.txt

Adding: hello.txt

$ ./rain -L bonjour.drop

-rw-r--r-- 8  6 hello.txt

$ ./rain -a bonjour.drop hola.txt hi.txt

Adding: hola.txt
Adding: hi.txt

$ ./rain -L bonjour.drop

-rw-r--r-- 8  6 hello.txt

-rw-r--r-- 8  5 hola.txt

-rw-r--r-- 8  3 hi.txt

HINT:

rain_main.c calls the function create_drop in rain.c when either of the -c or -a options are specified on the command line.

Add code to create_drop in rain.c .

Use fopen and fputc to create the new drop.

In our lectures on files we covered obtaining file metadata including its size and mode (permissions) in the stat.c example.

NOTE:

You must add/store files in the order they are given.
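When creating a droplet, the mode returned by stat has to be written out as the 10-character permissions field. A minimal sketch, assuming only ordinary files and directories need to be distinguished (mode_to_permissions is a hypothetical helper name):

```c
#include <sys/stat.h>
#include <sys/types.h>

// Fill out with an ls-style string for mode, e.g. "-rw-r--r--".
// The droplet field itself is exactly 10 characters with no terminator;
// a '\0' is added here (so out needs 11 chars) purely so it can be printed.
// (mode_to_permissions is a hypothetical helper name.)
void mode_to_permissions(mode_t mode, char out[11]) {
    const char *letters = "rwxrwxrwx";
    out[0] = S_ISDIR(mode) ? 'd' : '-';
    for (int i = 0; i < 9; i++) {
        // bit 8 is owner-read, bit 0 is other-execute
        out[i + 1] = (mode & (1 << (8 - i))) ? letters[i] : '-';
    }
    out[10] = '\0';
}
```

The st_mode field of the struct stat filled in by stat can be passed straight to this function, and st_size gives the content length.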

Subset 3

To complete subset 3, you need to implement code that can

create a drop from a list of files and directories,

extract directories from a drop, and

manipulate 6-bit and 7-bit storage formats.

Subset 3: Create a drop from a list of files and directories

Given the -c command-line argument, rain should be able to add files in sub-directories. For example:

# Create a drop called a.drop with the file "hello.txt" that is contained within 2 levels of directories.
$ ./rain -c a.drop examples/2_files.d/hello.txt
Adding: examples
Adding: examples/2_files.d
Adding: examples/2_files.d/hello.txt

If a directory is specified when creating a drop, rain should add the entire directory tree to the drop.

# Create a drop called a.drop with *all* the contents within the directory "3_files.subdirectory.d"

# which is in the "examples" directory.

$ ./rain -c a.drop examples/3_files.subdirectory.d

Adding: examples

Adding: examples/3_files.subdirectory.d

Adding: examples/3_files.subdirectory.d/goodbye

Adding: examples/3_files.subdirectory.d/goodbye/last_goodbye.txt

Adding: examples/3_files.subdirectory.d/hello

Adding: examples/3_files.subdirectory.d/hello/hello.txt

Adding: examples/3_files.subdirectory.d/these_days.txt

Given the -L command-line argument and a drop containing directories, rain should be able to list files and directories. For example:

$ ./rain -L examples/1_file.subdirectory.drop
drwxr-xr-x 8  0 hello
-rw-r--r-- 8 56 hello/hello.txt

HINT:

In our lectures on files we covered listing a directory's contents in the list_directory.c example. Traversing a directory tree is challenging and can be done in several ways.

NOTE:

The rain reference implementation will add subdirectories in alphabetical order. You do not need to match this behaviour: your implementation can add subdirectories in any order.

If a file in a different directory is added to a drop, then the directories in the path need to be added to the drop.
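One possible way to find the directories in a path is to walk the path name and stop at each '/'. This is a sketch under assumptions: for_each_parent and print_adding are hypothetical names, path names are assumed to be relative with no repeated slashes, and a real implementation would write a droplet for each directory rather than just print it.

```c
#include <stdio.h>
#include <string.h>

// Example visitor: prints the line rain shows when adding an object.
void print_adding(const char *prefix) {
    printf("Adding: %s\n", prefix);
}

// Call visit(prefix) for each leading directory of pathname, shortest first.
// For "a/b/file.txt" this visits "a" and then "a/b".
// Returns the number of prefixes visited, or -1 if a prefix is too long.
// (for_each_parent is a hypothetical helper name.)
int for_each_parent(const char *pathname, void (*visit)(const char *prefix)) {
    char prefix[4096];
    int count = 0;
    for (size_t i = 0; pathname[i] != '\0'; i++) {
        if (pathname[i] != '/' || i == 0) {
            continue;
        }
        if (i >= sizeof prefix) {
            return -1;
        }
        memcpy(prefix, pathname, i);   // copy everything before this '/'
        prefix[i] = '\0';
        visit(prefix);
        count++;
    }
    return count;
}
```

Called on examples/2_files.d/hello.txt with print_adding, this prints the two directory lines from the example above; the file itself is then added separately.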

Subset 3: Extract directories from a drop

Given the -x command-line argument, and a drop containing directories, rain should be able to extract files and directories. For example:

$ ./rain -x examples/3_files.subdirectory.drop
Creating directory: goodbye
Extracting: goodbye/last_goodbye.txt
Creating directory: hello
Extracting: hello/hello.txt
Extracting: these_days.txt

HINT:

In our lectures on files we covered creating a directory in the mkdir.c example.

NOTE:

When extracting a drop with directories, the directory needs to be created if it does not already exist, and its permissions need to be set to those specified in the drop.

Subset 3: Manipulate 6-bit and 7-bit storage formats

The -7 and -6 options allow droplets to be created in 7-bit and 6-bit format. For example:

$ ./rain -7 -c seven.drop hello.txt
Adding: hello.txt
$ ./rain -L seven.drop
-rw-r--r-- 7  6 hello.txt
$ ./rain -6 -c six.drop hola.txt hi.txt
Adding: hola.txt
Adding: hi.txt
$ ./rain -L six.drop
-rw-r--r-- 6  5 hola.txt
-rw-r--r-- 6  3 hi.txt

It is possible for drops to contain droplets in multiple formats. For example:

$ ./rain -a mixed.drop hello.txt

Adding: hello.txt

$ ./rain -L mixed.drop

-rw-r--r-- 8  6 hello.txt

$ ./rain -7 -a mixed.drop hi.txt

Adding: hi.txt

$ ./rain -L mixed.drop

-rw-r--r-- 8  6 hello.txt

-rw-r--r-- 7  3 hi.txt

$ ./rain -6 -a mixed.drop hola.txt

Adding: hola.txt

$ ./rain -L mixed.drop

-rw-r--r-- 8  6 hello.txt

-rw-r--r-- 7  3 hi.txt

-rw-r--r-- 6  5 hola.txt

Your code should handle creating, listing, checking, and extracting drops in 7-bit and 6-bit format.

Your code should produce an error if asked to create a droplet containing bytes which cannot be encoded in the specified format. For example:

$ echo Hello >Hello.txt

$ ./rain -6 -c broken.drop Hello.txt

error: byte 0x48 can not be represented in 6-bit format

HINT:

The functions droplet_to_6_bit and droplet_from_6_bit in rain_6_bit.c convert 8-bit values to and from 6-bit format.

Handling Errors

Error checking is an important part of this assignment. Automarking will test error handling.

Error messages should be one line (only) and be written to stderr (not stdout).

rain should exit with status 1 after an error.

rain should check all file operations for errors.

As much as possible match the reference implementation error messages exactly.

The reference implementation uses perror to report errors from file operations and other system calls.

It is not necessary to remove files and directories already created or partially created when an error occurs. You may extract a file or directory from a droplet before determining whether the droplet hash is correct.

You can extract previous file or directory from a droplet.

Where multiple error messages could be produced, for example, if two non-existent files are specified to be added to a drop, rain may produce any one of the error messages.
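The rules above (check every file operation, report with perror, exit with status 1) can be wrapped in a small helper. This is a sketch only: open_or_exit is a hypothetical name, and passing the path name to perror is an assumption about the message format, so compare the output against the reference implementation.

```c
#include <stdio.h>
#include <stdlib.h>

// Open pathname with the given mode, or report the failure on stderr
// and exit with status 1.
// Assumption: the path name is used as the perror prefix; check this
// against the reference implementation's messages.
// (open_or_exit is a hypothetical helper name.)
FILE *open_or_exit(const char *pathname, const char *mode) {
    FILE *stream = fopen(pathname, mode);
    if (stream == NULL) {
        perror(pathname);   // prints "<pathname>: <reason>" to stderr
        exit(1);
    }
    return stream;
}
```

The same pattern applies to other calls that can fail, such as fputc, fclose, stat, chmod and mkdir.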

Reference implementation

A reference implementation is a common, efficient, and effective method to provide or define an operational specification; and it's something you will likely work with after you leave UNSW.

We've provided a reference implementation, 1521 rain, which you can use to find the correct outputs and behaviours for any input:


$ 1521 rain -L examples/tiny.6-bit.drop

-rw-r--r-- 6  0 a

Every concrete example shown below is runnable using the reference implementation; run 1521 rain instead of ./rain.

Where any aspect of this assignment is undefined in this specification, you should match the behaviour exhibited by the reference implementation. Discovering and matching the reference implementation's behaviour is deliberately a part of this assignment.

If you discover what you believe to be a bug in the reference implementation, please report it in the class forum. If it is a bug, we may fix the bug; or otherwise indicate that you do not need to match the reference implementation's behaviour in that specific case.

The drop and droplet format

drops must follow exactly the format produced by the reference implementation.

A drop consists of a sequence of one or more droplets. Each droplet contains the information about one file or directory.


The first byte of a drop file is the first byte of the first droplet. That droplet is immediately followed by either another droplet, or by the end of the drop file.

name             length                    type                             description

magic number     1 B                       unsigned, 8-bit, little-endian   byte 0 in every droplet must be 0x63 (ASCII 'c')

droplet format   1 B                       unsigned, 8-bit, little-endian   byte 1 in every droplet must be one of 0x36, 0x37, 0x38 (ASCII '6', '7', '8')

permissions      10 B                      characters                       bytes 2 to 11 are the type and permissions as a ls-like character array; e.g., "-rwxr-xr-x"

pathname length  2 B                       unsigned, 16-bit, little-endian  bytes 12 to 13 are an unsigned 2-byte (16-bit) little-endian integer, giving the length of pathname

pathname         pathname-length           characters                       the filename of the object in this droplet.

content length   6 B                       unsigned, 48-bit, little-endian  the next bytes are an unsigned 6-byte (48-bit) little-endian integer giving the length of the file that was encoded to give content

content          content-length for 8-bit  bytes                            the data of the object in this droplet.
                 format, see below for
                 other formats

hash             1 B                       unsigned, 8-bit, little-endian   the last byte of a droplet is a droplet-hash of all bytes of this droplet except this byte.
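Writing the two length fields means emitting the least significant byte first. A minimal sketch (write_little_endian is a hypothetical helper name, not part of rain.h):

```c
#include <stdint.h>
#include <stdio.h>

// Write the low n_bytes bytes of value to stream, least significant first.
// Returns 0 on success, or -1 if fputc reports an error.
// (write_little_endian is a hypothetical helper name.)
int write_little_endian(FILE *stream, uint64_t value, int n_bytes) {
    for (int i = 0; i < n_bytes; i++) {
        int byte = (int)((value >> (8 * i)) & 0xff);
        if (fputc(byte, stream) == EOF) {
            return -1;
        }
    }
    return 0;
}
```

For a 166-byte file, write_little_endian(stream, 166, 6) produces the bytes a6 00 00 00 00 00, and write_little_endian(stream, 9, 2) produces 09 00 for a nine-character path name.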

droplet content encodings (Subset 3 only)

8-bit format (droplet format == 0x38) contents is an array of bytes, which are exactly equivalent to the bytes in the original file.

7-bit format (droplet format == 0x37) contents is an array of bytes representing packed seven-bit values, with the trailing bits set to zero. Every byte of the original file is taken as a seven-bit value, and packed as described below. This format can store any seven-bit value, so, for example, any byte containing valid ASCII can be stored.

This format needs ceil((7.0/8) * content-length) bytes. 7-bit format is used only in subset 3.

6-bit format (droplet format == 0x36) contents is an array of bytes of packed six-bit values where the trailing bits in the last byte are zero, and which are translated using the functions droplet_to_6_bit and droplet_from_6_bit in rain_6_bit.c.

This format cannot store all ASCII values, for example upper case letters can't be stored in 6-bit format. This format needs ceil((6.0/8) * content-length) bytes.

6-bit format is used only in subset 3.
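The number of bytes a packed encoding occupies can be computed with integer arithmetic alone. The sketch below assumes the length is rounded up to a whole byte, which follows from the trailing bits being zero-filled; packed_length is a hypothetical helper name.

```c
#include <stdint.h>

// Number of bytes needed to store content_length values of
// bits_per_value bits each (6, 7 or 8), rounded up to a whole byte.
// Assumption: the last partial byte is padded with zero bits.
// (packed_length is a hypothetical helper name.)
uint64_t packed_length(uint64_t content_length, int bits_per_value) {
    return (content_length * (uint64_t)bits_per_value + 7) / 8;
}
```

For example, a 6-byte file in 7-bit format holds 42 bits and so occupies 6 bytes, while a 5-byte file in 6-bit format holds 30 bits and occupies 4 bytes. This is useful with fseek when skipping over the contents of a droplet.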

Packed n-bit encoding (Subset 3 only)

We often store smaller values inside larger types. For example, the integer 42 only needs six bits; but we often will store it in a full thirty-two-bit integer, wasting many bits of zeroes. Assuming we know how many bits the value needs, we could only store the relevant bits.

For example, let's say we have three seven-bit values a, b, c, made up of arbitrary bit-strings, and stored in eight-bit variables:

a: 0b0AAA_AAAA,

b: 0b0BBB_BBBB,

c: 0b0CCC_CCCC,

then a packed seven-bit encoding of these values in order would be:

0bAAAA_AAAB_BBBB_BBCC_CCCC_C???

However, we have a problem: what happens to the trailing bits, which don't have a value? Note that we've defined all trailing bits to be zero above, which would here give:

0bAAAA_AAAB_BBBB_BBCC_CCCC_C000
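The packing described above can be sketched with a small bit buffer. This is one possible approach, not the reference implementation's; pack_bits and unpack_bits are hypothetical names, and for 6-bit format the values would first be translated with droplet_to_6_bit (and back with droplet_from_6_bit).

```c
#include <stddef.h>
#include <stdint.h>

// Pack n values of `bits` bits each (1..8) into out, most significant
// bit first, zero-filling the unused trailing bits of the last byte.
// out must have room for (n * bits + 7) / 8 bytes.
// Returns the number of bytes written.
size_t pack_bits(const uint8_t *values, size_t n, int bits, uint8_t *out) {
    uint32_t buffer = 0;   // bits not yet written sit in the low `held` bits
    int held = 0;
    size_t written = 0;
    for (size_t i = 0; i < n; i++) {
        buffer = (buffer << bits) | (values[i] & ((1u << bits) - 1));
        held += bits;
        while (held >= 8) {
            out[written++] = (buffer >> (held - 8)) & 0xff;
            held -= 8;
        }
    }
    if (held > 0) {
        // left-align the leftover bits; the low bits become zero padding
        out[written++] = (buffer << (8 - held)) & 0xff;
    }
    return written;
}

// Reverse of pack_bits: recover n values of `bits` bits each from packed.
void unpack_bits(const uint8_t *packed, size_t n, int bits, uint8_t *values) {
    uint32_t buffer = 0;
    int held = 0;
    size_t next = 0;
    for (size_t i = 0; i < n; i++) {
        while (held < bits) {
            buffer = (buffer << 8) | packed[next++];
            held += 8;
        }
        values[i] = (buffer >> (held - bits)) & ((1u << bits) - 1);
        held -= bits;
    }
}
```

With a = 0x7f, b = 0x00, c = 0x7f and bits = 7, pack_bits produces the three bytes 0xfe 0x03 0xf8, matching the layout 0bAAAA_AAAB_BBBB_BBCC_CCCC_C000.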

Inspecting drops and droplets

The hexdump utility can show the individual bytes of a file. We can use this to inspect drops and droplets. For example, here is a drop, made up of two droplets.

$ hexdump -vC examples/2_files.drop

00000000 63 38 2d 72 77 2d 72 2d 2d 72 2d 2d 09 00 68 65 |c8-rw-r--r--..he|

00000010 6c 6c 6f 2e 74 78 74 38 00 00 00 00 00 48 65 6c |llo.txt8.....Hel|

00000020 6c 6f 20 43 4f 4d 50 31 35 32 31 0a 49 20 68 6f |lo COMP1521.I ho|

00000030 70 65 20 79 6f 75 20 61 72 65 20 65 6e 6a 6f 79 |pe you are enjoy|

00000040 69 6e 67 20 74 68 69 73 20 61 73 73 69 67 6e 6d |ing this assignm|

00000050 65 6e 74 2e 0a 2d 63 38 2d 72 77 2d 72 2d 2d 72 |ent..-c8-rw-r--r|

00000060 2d 2d 10 00 6c 61 73 74 5f 67 6f 6f 64 62 79 65 |--..last_goodbye|

00000070 2e 74 78 74 a6 00 00 00 00 00 54 68 69 73 20 69 |.txt......This i|

00000080 73 20 6f 75 72 20 6c 61 73 74 20 67 6f 6f 64 62 |s our last goodb|

00000090 79 65 0a 49 20 68 61 74 65 20 74 6f 20 66 65 65 |ye.I hate to fee|

000000a0 6c 20 74 68 65 20 6c 6f 76 65 20 62 65 74 77 65 |l the love betwe|

000000b0 65 6e 20 75 73 20 64 69 65 0a 42 75 74 20 69 74 |en us die.But it|

000000c0 27 73 20 6f 76 65 72 0a 4a 75 73 74 20 68 65 61 |'s over.Just hea|

000000d0 72 20 74 68 69 73 20 61 6e 64 20 74 68 65 6e 20 |r this and then |

000000e0 49 27 6c 6c 20 67 6f 0a 59 6f 75 20 67 61 76 65 |I'll go.You gave|

000000f0 20 6d 65 20 6d 6f 72 65 20 74 6f 20 6c 69 76 65 | me more to live|

00000100 20 66 6f 72 0a 4d 6f 72 65 20 74 68 61 6e 20 79 | for.More than y|

00000110 6f 75 27 6c 6c 20 65 76 65 72 20 6b 6e 6f 77 0a |ou'll ever know.|

00000120 60 |`|

00000121

Each line of hexdump output is in three groups:

the address column: this starts at 0x00000000, and increases by 0x10 (or 16 in base 10) each line;

the data columns: after the address, we get (up to) 16 two-digit hexadecimal values, grouped into two blocks of eight values each, which represents the actual data of the file, and

the human readable stripe: at the very end of each line, between the vertical bars (|) is the human readable version of the bytes preceding, or a '.' if the byte wouldn't ordinarily be visible.

You could also use the hd, od, or xxd utilities instead of hexdump. Also provided for the assignment is 1521 dump_drop which prints the contents of a drop in a more structured way, e.g.:

$ 1521 dump_drop examples/2_files.drop
Field Name      Field Offset  Field Hex                      Field ASCII  Field Numeric
magic           0x00000000    63                             c
format          0x00000001    38                             8
mode            0x00000002    2d 72 77 2d 72 2d 2d 72 2d 2d  -rw-r--r--
path length     0x0000000c    09 00                                       9
pathname        0x0000000e                                   hello.txt
content length  0x00000017    38 00 00 00 00 00                           56
contents        0x0000001d
"""
Hello COMP1521.I hope you are enjoying this assignment..
"""
hash            0x00000055    2d

Field Name      Field Offset  Field Hex                      Field ASCII  Field Numeric
magic           0x00000056    63                             c
format          0x00000057    38                             8
mode            0x00000058    2d 72 77 2d 72 2d 2d 72 2d 2d  -rw-r--r--
path length     0x00000062    10 00                                       16
pathname        0x00000064                                   last_goodbye.txt
content length  0x00000074    a6 00 00 00 00 00                           166
contents        0x0000007a
"""
This is our last goodbye.I hate to feel the love between us die.But it's over.Just hear this and then I'll go.You gave me more to live for.More than you'll ever know.
"""
hash            0x00000120    60

6-bit format (Subset 3 only)

droplet 6-bit format defines a subset of 64 8-bit values (bytes) to have a six-bit encoding; those six bits are then stored packed.

The remaining 192 8-bit values can not be encoded in 6-bit format.

The functions droplet_to_6_bit and droplet_from_6_bit in rain_6_bit.c convert 8-bit values to and from 6-bit format. You can find the mapping by reading the code in rain_6_bit.c.

The droplet hash (Subsets 1, 2, 3)

Each droplet ends with a hash (sometimes referred to as a digest) which is calculated from the other values of the droplet. This allows us to detect if any bytes of the drop have changed, for example by disk or network errors.

The droplet_hash() function makes one step of computation of the hash of a sequence of bytes:

uint8_t droplet_hash(uint8_t current_hash_value, uint8_t byte_value) {
    return ((current_hash_value * 33) & 0xff) ^ byte_value;
}

Given the hash value of the sequence up to this byte, and the value of this byte, it calculates the new hash value. If we create a drop of a single one-byte file, like this:

$ echo >a
$ 1521 rain -c a.drop a
$ hexdump -Cv a.drop
00000000 63 38 2d 72 77 2d 72 2d 2d 72 2d 2d 01 00 61 01 |c8-rw-r--r--..a.|
00000010 00 00 00 00 00 0a 15 |.......|
00000017

We can then inspect the drop, and see its hash is 0x15.

Here's the sequence of calls that calculated that value:

droplet_hash(0x00, 0x63) = 0x63
droplet_hash(0x63, 0x38) = 0xfb
droplet_hash(0xfb, 0x2d) = 0x76
droplet_hash(0x76, 0x72) = 0x44
droplet_hash(0x44, 0x77) = 0xb3
droplet_hash(0xb3, 0x2d) = 0x3e
droplet_hash(0x3e, 0x72) = 0x8c
droplet_hash(0x8c, 0x2d) = 0x21
droplet_hash(0x21, 0x2d) = 0x6c
droplet_hash(0x6c, 0x72) = 0x9e
droplet_hash(0x9e, 0x2d) = 0x73
droplet_hash(0x73, 0x2d) = 0xfe
droplet_hash(0xfe, 0x01) = 0xbf
droplet_hash(0xbf, 0x00) = 0x9f
droplet_hash(0x9f, 0x61) = 0x1e
droplet_hash(0x1e, 0x01) = 0xdf
droplet_hash(0xdf, 0x00) = 0xbf
droplet_hash(0xbf, 0x00) = 0x9f
droplet_hash(0x9f, 0x00) = 0x7f
droplet_hash(0x7f, 0x00) = 0x5f
droplet_hash(0x5f, 0x00) = 0x3f
droplet_hash(0x3f, 0x0a) = 0x15
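The worked example can be reproduced by folding droplet_hash over the droplet's bytes, starting from 0. In this sketch hash_bytes is a hypothetical helper name, and droplet_hash is repeated only so the example stands alone (in the assignment it comes from rain_hash.c).

```c
#include <stddef.h>
#include <stdint.h>

// Copy of the provided hash step, included only to keep this sketch
// self-contained.
uint8_t droplet_hash(uint8_t current_hash_value, uint8_t byte_value) {
    return ((current_hash_value * 33) & 0xff) ^ byte_value;
}

// Hash of a whole byte sequence, starting from an initial hash of 0.
// (hash_bytes is a hypothetical helper name.)
uint8_t hash_bytes(const uint8_t *bytes, size_t n) {
    uint8_t hash = 0;
    for (size_t i = 0; i < n; i++) {
        hash = droplet_hash(hash, bytes[i]);
    }
    return hash;
}
```

Applied to the first 22 bytes of a.drop shown above (everything except the final hash byte), hash_bytes returns 0x15.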

Assumptions and Clarifications


Like all good programmers, you should make as few assumptions as possible. If in doubt, match the output of the reference implementation.

Your submitted code must be a single C program only. You may not submit code in other languages.

You can call functions from the C standard library available by default on CSE Linux systems: including, e.g., stdio.h, stdlib.h, string.h, math.h, assert.h.


We will compile your code with dcc when marking. Run-time errors from illegal or invalid C will cause your code to fail automarking (and will likely result in you losing marks).

Your program must not require extra compile options. It must compile successfully with:

$ dcc *.c -o rain

You may not use functions from other libraries. In other words, you cannot use the dcc -l flag.

If your program prints debugging output, it will fail automarking tests. Make sure you disable any debugging output before submission.

You may not create or use temporary files.

You may not create subprocesses: you may not use posix_spawn, posix_spawnp, system, popen, fork, vfork, clone, or any of the exec* family of functions, like execve.

You may assume that the length of a drop is less than the maximum value supported by a long.

rain only has to handle ordinary files and directories.

rain does not have to handle symbolic links, devices or other special files.

rain will not be given directories containing symbolic links, devices or other special files.

rain does not have to handle hard links.

If completing a rain command would produce multiple errors, you may produce any of the errors and stop.

You do not have to produce the particular error that the reference implementation does.

If a droplet path name contains a directory then a droplet for the directory will appear in the drop beforehand.

For example, if there is a droplet for the path name a/b/file.txt then there will be preceding droplets for the directories a and a/b.

You may also assume the droplet for the directory specifies the directory is writable.
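When creating a drop, one way to find the directories that need their own preceding droplets is to walk the path name one '/' at a time. The sketch below is only an illustration: next_parent_directory is a hypothetical helper, and it assumes relative path names with no leading or repeated slashes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Hypothetical helper: copy the next parent directory of path_name into prefix.
// *offset records where scanning should resume; set it to 0 before the first call.
// Returns 1 if a directory was produced, 0 when there are no more
// (or if prefix is too small to hold it).
// ASSUMPTION: path_name is relative and has no leading or repeated '/'.
int next_parent_directory(const char *path_name, size_t *offset,
                          char *prefix, size_t prefix_size) {
    assert(path_name != NULL && offset != NULL && prefix != NULL);

    const char *slash = strchr(path_name + *offset, '/');
    if (slash == NULL) {
        return 0;  // only the final path component remains
    }

    size_t length = (size_t)(slash - path_name);
    if (length + 1 > prefix_size) {
        return 0;  // not enough room for the directory name and '\0'
    }

    memcpy(prefix, path_name, length);
    prefix[length] = '\0';
    *offset = length + 1;  // resume just after this '/'
    return 1;
}
```

For a/b/file.txt, successive calls produce a and then a/b, after which the function returns 0.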

When adding an entire directory (subset 3) to a drop you may add the directory contents in any order to the drop, after the directory droplet.

You do not have to match the order the reference implementation uses.

When a rain command specifies adding files with a common sub-directory, you may add a droplet for the sub-directory multiple times. For example, given this command:

$ ./rain -c a.drop b/file1 b/file2

You may add two (duplicate) droplets for b.

You can assume the path name of a drop being created with -c will not also be added to the drop, and will not be in a directory being added to the drop.

It is not necessary to check the hashes or magic numbers of droplets in subset 0. Subset 0 tests will only use valid droplets.

The reference implementation checks the magic number (first byte), format and hash when listing (-l and -L) and extracting (-x) drops, and stops with an error message if they are invalid, for example:

$ ./rain -l examples/text_file.bad_hash.drop
error: incorrect droplet hash 0x2d should be 0x77
$ ./rain -L examples/text_file.bad_magic.drop
error: incorrect first droplet byte: 0x39 should be 0x63

This is very desirable behaviour and you can implement this in your code. However it will not be tested with the -l, -L and -x command line options to avoid problems in automarking.

Your code will only be tested with the -C option on drops with invalid hashes, magic numbers and formats.

It is not necessary to check the hashes or magic numbers in an existing drop when appending to it (-a).
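A check of the magic number and hash for one droplet might look like the sketch below. It is not the reference implementation: check_droplet, enum droplet_status and DROPLET_MAGIC are hypothetical names, and the sketch assumes the whole droplet is already in memory. The message wording follows the reference output shown above, on the assumption that the first value printed is the one found in the drop and the second is the expected one. The message is written to a buffer so the caller decides where to print it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define DROPLET_MAGIC 0x63  // expected first byte, per the error message above

enum droplet_status {
    DROPLET_OK,
    DROPLET_BAD_FORMAT,
    DROPLET_BAD_MAGIC,
    DROPLET_BAD_HASH
};

// One step of the hash, as given in the droplet hash section.
uint8_t droplet_hash(uint8_t current_hash_value, uint8_t byte_value) {
    return ((current_hash_value * 33) & 0xff) ^ byte_value;
}

// Hypothetical helper: check one complete droplet held in memory.
// On failure an error message is written to message (left empty for
// DROPLET_BAD_FORMAT, whose wording is not shown in this document).
enum droplet_status check_droplet(const uint8_t *droplet, size_t n_bytes,
                                  char *message, size_t message_size) {
    assert(droplet != NULL && message != NULL && message_size > 0);
    message[0] = '\0';

    if (n_bytes < 2) {
        return DROPLET_BAD_FORMAT;  // no room for a magic byte and a hash
    }

    if (droplet[0] != DROPLET_MAGIC) {
        snprintf(message, message_size,
                 "error: incorrect first droplet byte: 0x%02x should be 0x%02x",
                 (unsigned)droplet[0], (unsigned)DROPLET_MAGIC);
        return DROPLET_BAD_MAGIC;
    }

    // hash every byte except the last, which holds the stored hash
    uint8_t computed = 0x00;
    for (size_t i = 0; i + 1 < n_bytes; i++) {
        computed = droplet_hash(computed, droplet[i]);
    }

    uint8_t stored = droplet[n_bytes - 1];
    if (stored != computed) {
        snprintf(message, message_size,
                 "error: incorrect droplet hash 0x%02x should be 0x%02x",
                 (unsigned)stored, (unsigned)computed);
        return DROPLET_BAD_HASH;
    }

    return DROPLET_OK;
}
```

A real implementation would also need to validate the other droplet fields (the "format" part of the check), which this sketch does not attempt.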

If you need clarification on what you can and cannot use or do for this assignment, ask in the class forum. You are required to submit intermediate versions of your assignment. See below for details.

Assessment

Testing

When you think your program is working, you can use autotest to run some simple automated tests:

$ 1521 autotest rain rain.c [optionally: any extra .c or .h files]

1521 autotest will not test everything.

Always do your own testing.

Automarking will be run by the lecturer after the submission deadline, using a superset of tests to those autotest runs for you.

WARNING:

Whilst we can detect errors have occurred, it is often substantially harder to automatically explain what that error was. As you continue into later subsets, the errors from 1521 autotest will become less and less clear or useful. You will need to do your own debugging and analysis.

Submission

When you are finished working on the assignment, you must submit your work by running give:

$ give cs1521 ass2_rain rain.c [optionally: any extra .c or .h files]

You must run give before Week 11 Monday 09:00:00 to obtain the marks for this assignment. Note that this is an individual exercise, the work you submit with give must be entirely your own.

You can run give multiple times.

Only your last submission will be marked.

If you are working at home, you may find it more convenient to upload your work via give's web interface. You cannot obtain marks by emailing your code to tutors or lecturers.

You can check your latest submission on CSE servers with:

$ 1521 classrun check ass2_rain

You can check the files you have submitted here.

Manual marking will be done by your tutor, who will mark for style and readability, as described in the Assessment section below. After your tutor has assessed your work, you can view your results here. The resulting mark will also be available via give's web interface.

Due Date

This assignment is due Week 11 Monday 09:00:00 (2023-04-24 09:00:00).

The UNSW standard late penalty for assessment is 5% per day for 5 days - this is implemented hourly for this assignment. Your assignment mark will be reduced by 0.2% for each hour (or part thereof) late past the submission deadline.

For example, if an assignment worth 60% was submitted half an hour late, it would be awarded 59.8%, whereas if it was submitted past 10 hours late, it would be awarded 57.8%.
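The arithmetic behind those two figures can be sketched as follows. This is only an illustration of the rule as stated here, not the marking system's code: late_adjusted_mark is a hypothetical name, lateness is assumed to be measured in whole minutes, and marks are held in tenths of a percent to avoid floating point.

```c
#include <assert.h>

#define MINUTES_PER_HOUR        60
#define LATE_CUTOFF_MINUTES     (5 * 24 * 60)  // 5 or more days late: zero marks
#define PENALTY_TENTHS_PER_HOUR 2              // 0.2% per hour, in tenths of a percent

// Hypothetical helper: apply the late penalty to a mark.
// mark_tenths is in tenths of a percent (600 means 60.0%).
// ASSUMPTION: lateness is given in whole minutes.
int late_adjusted_mark(int mark_tenths, int minutes_late) {
    assert(mark_tenths >= 0);

    if (minutes_late <= 0) {
        return mark_tenths;  // submitted on time
    }
    if (minutes_late >= LATE_CUTOFF_MINUTES) {
        return 0;
    }

    // "each hour (or part thereof)": round up to whole hours
    int hours_late = (minutes_late + MINUTES_PER_HOUR - 1) / MINUTES_PER_HOUR;

    int adjusted = mark_tenths - hours_late * PENALTY_TENTHS_PER_HOUR;
    return adjusted < 0 ? 0 : adjusted;
}
```

Half an hour late counts as one hour, so 60.0% becomes 59.8%; anything past 10 hours counts as 11 hours, so 60.0% becomes 57.8%.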

Beware - submissions 5 or more days late will receive zero marks. This again is the UNSW standard assessment policy.

Assessment Scheme

This assignment will contribute 15 marks to your final COMP1521 mark.

80% of the marks for assignment 2 will come from the performance of your code on a large series of tests.

20% of the marks for assignment 2 will come from hand marking. These marks will be awarded on the basis of clarity, commenting, elegance and style. In other words, you will be assessed on how easy it is for a human to read and understand your program.


An indicative assessment scheme for performance follows. The lecturer may vary the assessment scheme after inspecting the assignment submissions, but it is likely to be broadly similar to the following:

HD (90+%)            well documented code, very readable code, subsets 0-3 working for all drops.
DN (80+%)            some documentation in code, readable code, subsets 0-2 working for all drops.
CR (70%)             some documentation in code, readable code, subset 0-1 working for all drops.
PS (60%)             subset 0 working for all drops.
0%                   knowingly providing your work to anyone and it is subsequently submitted (by anyone).
0 FL for COMP1521    submitting any other person's work; this includes joint work.
academic misconduct  submitting another person's work without their consent; paying another person to do work for you.


An indicative assessment scheme for style follows. The lecturer may vary the assessment scheme after inspecting the assignment submissions, but it is likely to be broadly similar to the following:

100% for style  perfect style


90% for style  great style, almost all style characteristics perfect.

80% for style  good style, one or two style characteristics not well done.


70% for style  good style, a few style characteristics not well done.

60% for style  ok style, an attempt at most style characteristics.

50% for style  an attempt at style.

An indicative style rubric follows:

Formatting (6/20):

Whitespace (e.g. 1 + 2 instead of 1+2)

Indentation (consistent, tabs or spaces are okay)

Line length (below 100 characters unless very exceptional)

Line breaks (using vertical whitespace to improve readability)

Documentation (8/20):

Header comment (with name, zID, description of program)

Function comments (above each function with a description)

Descriptive variable names (e.g. char *home_directory instead of char *h)

Descriptive function names (e.g. get_home_directory instead of get_hd)

Sensible commenting throughout the code (don't comment every single line; leave comments when necessary)

Elegance (5/20):

Does this code avoid redundancy? (e.g. Don't repeat yourself!)

Are helper functions used to reduce complexity? (functions should be small and simple where possible)

Are constants appropriately created and used? (magic numbers should be avoided)

Portability (1/20):

Would this code be able to compile and behave as expected on other POSIX-compliant machines? (using standard libraries without platform-specific code)


Note that the following penalties apply to your total mark for plagiarism:

0 for asst2          knowingly providing your work to anyone and it is subsequently submitted (by anyone).
0 FL for COMP1521    submitting any other person's work; this includes joint work.
academic misconduct  submitting another person's work without their consent; paying another person to do work for you.

COMP1521 23T1: Computer Systems Fundamentals is brought to you by the School of Computer Science and Engineering at the University of New South Wales, Sydney.

For all enquiries, please email the class account at cs1521@cse.unsw.edu.au

CRICOS Provider 00098G

Intermediate Versions of Work

You are required to submit intermediate versions of your assignment.

Every time you work on the assignment and make some progress you should copy your work to your CSE account and submit it using the give command below. It is fine if intermediate versions do not compile or otherwise fail submission tests. Only the final submitted version of your assignment will be marked.


Assignment Conditions

Joint work is not permitted on this assignment.

This is an individual assignment. The work you submit must be entirely your own work: submission of work even partly written by any other person is not permitted.

Do not request help from anyone other than the teaching staff of COMP1521 — for example, in the course forum, or in help sessions.

Do not post your assignment code to the course forum. The teaching staff can view code you have recently submitted with give, or recently autotested.

Assignment submissions are routinely examined both automatically and manually for work written by others.


Rationale: this assignment is designed to develop the individual skills needed to produce an entire working program. Using code written by, or taken from, other people will stop you learning these skills. Other CSE courses focus on skills needed for working in a team.

The use of generative tools such as Github Copilot, ChatGPT, Google Bard is not permitted on this assignment.


Rationale: this assignment is designed to develop your understanding of basic concepts. Using synthesis tools will stop you learning these fundamental concepts, which will significantly impact your ability to complete future courses.

Sharing, publishing, or distributing your assignment work is not permitted.

Do not provide or show your assignment work to any other person, other than the teaching staff of COMP1521. For example, do not message your work to friends.

Do not publish your assignment code via the Internet. For example, do not place your assignment in a public GitHub repository.

Rationale: by publishing or sharing your work, you are facilitating other students using your work. If other students find your assignment work and submit part or all of it as their own work, you may become involved in an academic integrity investigation.

Sharing, publishing, or distributing your assignment work after the completion of COMP1521 is not permitted. For example, do not place your assignment in a public GitHub repository after this offering of COMP1521 is over.

Rationale: COMP1521 may reuse assignment themes covering similar concepts and content. If students in future terms find your assignment work and submit part or all of it as their own work, you may become involved in an academic integrity investigation.

Violation of any of the above conditions may result in an academic integrity investigation, with possible penalties up to and including a mark of 0 in COMP1521, and exclusion from future studies at UNSW. For more information, read the UNSW Student Code, or contact the course account.
