HW2: Numpy for Multi-Layer Neural Network

Before

        This homework is intended to give you an introduction to building, training, and testing neural network models. You will not only be exposed to using Python packages to build a neural network from scratch, but also to the mathematical aspects of backpropagation and gradient descent. While in practical scenarios you won’t necessarily have to implement neural networks from scratch (as you will see in future labs and assignments), this assignment aims to give you a rudimentary idea of what goes on under the hood in packages such as TensorFlow and Keras. In this assignment, you will use the MNIST handwritten digits dataset to train a simple classification neural network using batch learning and evaluate your model.
Chinese version: http://t.csdn.cn/2AyUX

Link to this article's .pdf file (password: v6zc)

Contents

Before

Provided finished code files:

Get Files:

File Structure:

Conceptual Questions

Getting the Stencil

Getting the Data

Assignment Overview

Before Starting

1. Preprocessing the Data

2. One Hot Encoding

3. Core Abstractions

4. Layers

5. Activation Functions

6. Filling in the model

7. Loss Function

8. Optimizers

9. Accuracy metrics

10. Train and Test

11. Visualizing Results

CS1470 Students

CS2470 Students

IMPORTANT!

Refer to the answer: http://t.csdn.cn/9sdBN


Provided finished code files:

Get Files:

Link: https://pan.baidu.com/s/1Fw_7thL5PxR79zI6XbpnYQ
Password: txqe

File Structure:

| - hw2
        | - code
                | - Beras
                        | - eight .py files
                | - assignment.py
                | - preprocess.py
                | - visualize.py
        | - data
                | - mnist
                        | - four MNIST dataset files

Conceptual Questions

Please submit the Conceptual Questions on Gradescope under hw2-mlp conceptual.
You must type your submissions and upload a PDF. We recommend using LaTeX.

Getting the Stencil

Github Classroom Link with Stencil Code!
Guide on GitHub and GitHub Classroom

Getting the Data

You can use download.sh to download the data. You can run a bash script with the
command ./script_name.sh (e.g., bash ./download.sh). This is similar to HW1.

Setup

Work off of the stencil code provided, but do not change the stencil except where specified.
Doing so may cause incompatibility with the autograder. Don’t change any method signatures!
This assignment requires NumPy and Matplotlib; you should already have these from HW1. Also
check the virtual environment guide to set up TensorFlow 2.5 on your local machine. Feel free to
reference this guide if you have to work through Colab.

Assignment Overview

In this assignment, you will be constructing a Keras mimic, Beras (haha funny name), and will
make a sequential model specification that mimics the Tensorflow/Keras API. The Python
notebook associated with this homework is meant for you to explore an example implementation
so that you can build on it yourself! There are no TODOs for you to work on in the notebook;
rather, the testing is done by running the main method of assignment.py.
Our stencil provides a model class with several methods and hyperparameters you need to use
for your network. You will also answer conceptual questions related to the assignment and class
material (don’t forget to answer the 2470-only questions if you’re a 2470 student!). You should
include a brief README with your model's accuracy and any known bugs.

Before Starting

This homework assignment is due two weeks from release. Labs 1-3 provide great practice for
this assignment, so you can wait a little for them if you get stuck. Specifically:

- Implementing Callable/Diffable Components: The skills you need in order to do this
can be found by working through Lab 1. This includes comfort with mathematical
notation, matrix operations, and the logic behind call and gradient methods.

- Implementing Optimizers: You can implement the BasicOptimizer class just by
following the logic from the gradient_descent method in Lab 1. The other optimizers
(i.e. Adam, RMSProp) will be covered in Lab 2: Optimizers.

- Using batch_step and GradientTape: You can figure out how to use these to train your
model based on the assignment instructions and your implementations of these. With
that said, they do mimic the Keras API. You’ll learn about all this in Lab 3: Intro to
Tensorflow. If your lab is after the due date, it should be fine; just skim over the
complementary notebook associated with the lab.

Feel free to start off by doing what you can and then add onto it as you learn more about deep
learning and realize that the same concepts you learn in the class can actually be used here!
Do not get discouraged, and try to have fun!

Roadmap

For this assignment, we'll walk you through the pipeline of training a neural net, including the
structure of the model class and the methods you will have to fill in.

1. Preprocessing the Data

Note: Code for preprocessing should be pulled in from HW1.
Before training a network, you will need to clean your data. This includes retrieving,
altering, and formatting the data into the inputs for your network. For this assignment,
you will be working on the MNIST dataset. It can be downloaded through the
download.sh script, but it’s also linked here (ignore that it says hw1; we’re using this
dataset for hw2 this time!). The original data source is here.
You should train your network using only the training data and then test your network's
accuracy on the testing data. Your program should print its accuracy over the test
dataset upon completion.
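
For reference, here is a minimal sketch of what the HW1-style preprocessing could look like. The exact file names, the gzip/offset handling, and the get_data_MNIST signature below are assumptions based on how assignment.py calls it; your own HW1 code is what you should actually pull in.

import gzip
import numpy as np

def get_data_MNIST(subset, data_path):
    """
    Illustrative sketch only: load the gzipped MNIST images/labels for `subset`
    ("train" or "test"), normalize pixel values to [0, 1], and flatten each image
    into a 784-dimensional float vector.
    """
    prefix, num_examples = ("train", 60000) if subset == "train" else ("t10k", 10000)
    with gzip.open(f"{data_path}/mnist/{prefix}-images-idx3-ubyte.gz", "rb") as f:
        f.read(16)  # skip the 16-byte image-file header
        images = np.frombuffer(f.read(num_examples * 784), dtype=np.uint8)
    with gzip.open(f"{data_path}/mnist/{prefix}-labels-idx1-ubyte.gz", "rb") as f:
        f.read(8)   # skip the 8-byte label-file header
        labels = np.frombuffer(f.read(num_examples), dtype=np.uint8)
    images = images.reshape(num_examples, 784).astype(np.float32) / 255.0
    return images, labels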

2. One Hot Encoding

Before training or testing your model, you will need to “one-hot” encode your class labels
so that the model can optimize towards predicting any desired class. Note that the class
labels by themselves are simply categories and do not mean anything numerically. In the
absence of one-hot encoding, your model might learn some natural ordering between
the different class labels based on the labels (which are arbitrary).
For example, let’s say there’s a data point A which corresponds to label ‘2’ and a data
point B which corresponds to label ‘7’. We don’t want the model to somehow learn that B
carries more weight than A simply because, numerically speaking, 7 > 2.
To one-hot encode your class labels, you will have to convert your 1-dimensional label
vector into a vector of size num_classes (where num_classes is the total number of
classes in your dataset). For the MNIST dataset, the encodings are simply the ten rows
of a 10x10 identity matrix, one row per digit class.

You have to fill out the following method in Beras/onehot.py

fit() : [TODO] In this function you need to fetch all the unique labels in the
data (store this in self.uniq ) and create a dictionary with labels as the keys
and their corresponding one hot encodings as values. Hint: You might want to
look at np.eye() to get the one-hot encodings. Ultimately, you will store this
dictionary in self.uniq2oh .
forward() : In this function, we pass a vector of all the actual labels in the
training set and call fit() to populate the uniq2oh dictionary with unique
labels and their corresponding one-hot encoding and then use it to return an
array of one-hot encoded labels for each label in the training set. This function
has already been filled out for you!
inverse() : In this function, we reverse the one-hot encoding back to the actual
label. This has already been done for you.
For example, if we have labels X and Y with one-hot encodings of [1,0] and [0,1], we’d
want to create a dictionary as follows: {X: [1,0], Y: [0,1]}. For MNIST, you will have 10
labels, so your dictionary should have ten entries!
You may notice that some classes inherit from Callable or Diffable. More on this
in the next section!
import numpy as np

from .core import Callable


class OneHotEncoder(Callable):
    """
    One-Hot Encodes labels. First takes in a candidate set to figure out what elements it
    needs to consider, and then one-hot encodes subsequent input datasets in the
    forward pass.

    SIMPLIFICATIONS:
     - Implementation assumes that entries are individual elements.
     - Forward will call fit if it hasn't been done yet; most implementations will just error.
     - keras does not have OneHotEncoder; has LabelEncoder, CategoricalEncoder, and to_categorical()
    """

    def fit(self, data):
        """
        Fits the one-hot encoder to a candidate dataset. Said dataset should contain
        all encounterable elements.

        :param data: 1D array containing labels.
            For example, data = [0, 1, 3, 3, 1, 9, ...]
        """
        ## TODO: Fetch all the unique labels and create a dictionary with
        ## the unique labels as keys and their one hot encodings as values
        ## HINT: look up np.eye() and see if you can utilize it!

        ## HINT: Wouldn't it be nice if we just gave you the implementation somewhere...

        self.uniq = None  # all the unique labels from `data`
        self.uniq2oh = None  # a lookup dictionary with labels and corresponding encodings

    def forward(self, data):
        if not hasattr(self, "uniq2oh"):
            self.fit(data)
        return np.array([self.uniq2oh[x] for x in data])

    def inverse(self, data):
        assert hasattr(self, "uniq"), \
            "forward() or fit() must be called before attempting to invert"
        return np.array([self.uniq[x == 1][0] for x in data])
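
To make the np.eye() hint concrete, here is one possible (unofficial) way the fit() body could be filled in, assuming the labels are hashable scalars like the MNIST digits:

# A possible fit() body (a sketch, not the required solution):
def fit(self, data):
    self.uniq = np.unique(data)                 # all unique labels, sorted
    eye = np.eye(len(self.uniq))                # one one-hot row per unique label
    self.uniq2oh = {label: eye[i] for i, label in enumerate(self.uniq)}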

3. Core Abstractions

Consider the following abstract classes of modules. Be sure to play around with the
Python notebook associated with this homework to get a good grip of the core
abstraction modules defined for you in Beras/core.py ! The notebook is exploratory in
nature (it is NOT required and all of the code is given) and will provide you with lots of
insights into understanding and using these class abstractions! Note that these modules
are very similar to the Tensorflow/Keras API.
Callable: A function with a well-defined forward function. These are the ones you’ll need
to implement:
        ● CategoricalAccuracy (./metrics.py): Computes the accuracy of predicted
probabilities against a list of ground-truth labels. As accuracy is not optimized for,
there is no need to compute its gradient. Furthermore, categorical accuracy is
piecewise discontinuous, so the gradient would technically be 0 or undefined.
        ● OneHotEncoder (./onehot.py): You can one-hot encode a class instance into a
probability distribution to optimize for classifying into discrete options.
Diffable: A callable which can also be differentiated. We can use these in our pipeline
and optimize through them! Thus, most of these classes are made for use in your neural
network layers. These are the ones you’ll need to implement: Dense (./layers.py),
LeakyReLU and Softmax (./activations.py), and MeanSquaredError (./losses.py).
Example: Consider a Dense layer instance. Let s represent the input size (source), d
represent the output size (destination), and b represent the batch size. Then the layer's
inputs have shape (b, s), its weight matrix has shape (s, d), its bias has shape (d,), and
its outputs have shape (b, d).
GradientTape: This class will function exactly like tf.GradientTape() (See lab 3).
You can think of a GradientTape as a logger. Every time an operation is performed within
the scope of a GradientTape, it records which operation occurred. Then, during
backprop, we can compute the gradient for all of the operations. This allows us to
differentiate our final output with respect to any intermediate step. When operations are
computed outside the scope of GradientTape, they aren’t recorded, so your code will
have no record of them and cannot compute the gradients.
You can check out how this is implemented in core! Of course, Tensorflow’s gradient
tape implementation is a lot more complicated and involves constructing a graph.
 ● [TODO] Implement the gradient method, which returns a list of gradients corresponding to the list of trainable weights in the network. Details in the code.
from abc import ABC, abstractmethod  # # For abstract method support
from typing import Tuple

import numpy as np


## DO NOT MODIFY THIS CLASS
class Callable(ABC):
    """
    Callable Sub-classes:
     - CategoricalAccuracy (./metrics.py)       - TODO
     - OneHotEncoder       (./preprocess.py)    - TODO
     - Diffable            (.)                  - DONE
    """

    def __call__(self, *args, **kwargs) -> np.array:
        """Lets `self()` and `self.forward()` be the same"""
        return self.forward(*args, **kwargs)

    @abstractmethod
    def forward(self, *args, **kwargs) -> np.array:
        """Pass inputs through function. Can store inputs and outputs as instance variables"""
        pass


## DO NOT MODIFY THIS CLASS
class Diffable(Callable):
    """
    Diffable Sub-classes:
     - Dense            (./layers.py)           - TODO
     - LeakyReLU, ReLU  (./activations.py)      - TODO
     - Softmax          (./activations.py)      - TODO
     - MeanSquaredError (./losses.py)           - TODO
    """

    """Stores whether the operation being used is inside a gradient tape scope"""
    gradient_tape = None  ## All-instance-shared variable

    def __init__(self):
        """Is the layer trainable"""
        super().__init__()
        self.trainable = True  ## self-only instance variable

    def __call__(self, *args, **kwargs) -> np.array:
        """
        If there is a gradient tape scope in effect, perform AND RECORD the operation.
        Otherwise... just perform the operation and don't let the gradient tape know.
        """
        if Diffable.gradient_tape is not None:
            Diffable.gradient_tape.operations += [self]
        return self.forward(*args, **kwargs)

    @abstractmethod
    def input_gradients(self: np.array) -> np.array:
        """Returns gradient for input (this part gets specified for all diffables)"""
        pass

    def weight_gradients(self: np.array) -> Tuple[np.array, np.array]:
        """Returns gradient for weights (this part gets specified for SOME diffables)"""
        return ()

    def compose_to_input(self, J: np.array) -> np.array:
        """
        Compose the inputted cumulative jacobian with the input jacobian for the layer.
        Implemented with batch-level vectorization.

        Requires `input_gradients` to provide either batched or overall jacobian.
        Assumes input/cumulative jacobians are matrix multiplied
        """
        #  print(f"Composing to input in {self.__class__.__name__}")
        ig = self.input_gradients()
        batch_size = J.shape[0]
        n_out, n_in = ig.shape[-2:]
        j_new = np.zeros((batch_size, n_out), dtype=ig.dtype)
        for b in range(batch_size):
            ig_b = ig[b] if len(ig.shape) == 3 else ig
            j_new[b] = ig_b @ J[b]
        return j_new

    def compose_to_weight(self, J: np.array) -> list:
        """
        Compose the inputted cumulative jacobian with the weight jacobian for the layer.
        Implemented with batch-level vectorization.

        Requires `weight_gradients` to provide either batched or overall jacobian.
        Assumes weight/cumulative jacobians are element-wise multiplied (w/ broadcasting)
        and the resulting per-batch statistics are averaged together for avg per-param gradient.
        """
        # print(f'Composing to weight in {self.__class__.__name__}')
        assert hasattr(
            self, "weights"
        ), f"Layer {self.__class__.__name__} cannot compose along weight path"
        J_out = []
        ## For every weight/weight-gradient pair...
        for w, wg in zip(self.weights, self.weight_gradients()):
            batch_size = J.shape[0]
            ## Make a cumulative jacobian which will contribute to the final jacobian
            j_new = np.zeros((batch_size, *w.shape), dtype=wg.dtype)
            ## For every element in the batch (for a single batch-level gradient updates)
            for b in range(batch_size):
                ## If the weight gradient is a batch of transform matrices, get the right entry.
                ## Allows gradient methods to give either batched or non-batched matrices
                wg_b = wg[b] if len(wg.shape) == 3 else wg
                ## Update the batch's Jacobian update contribution
                j_new[b] = wg_b * J[b]
            ## The final jacobian for this weight is the average gradient update for the batch
            J_out += [np.mean(j_new, axis=0)]
        ## After new jacobian is computed for each weight set, return the list of gradient updatates
        return J_out


class GradientTape:

    def __init__(self):
        ## Log of operations that were performed inside tape scope
        self.operations = []

    def __enter__(self):
        # When tape scope is entered, let Diffable start recording to self.operation
        Diffable.gradient_tape = self
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        # When tape scope is exited, stop letting Diffable record
        Diffable.gradient_tape = None

    def gradient(self) -> list:
        """Get the gradient from first to last recorded operation"""
        ## TODO:
        ##
        ##  Compute weight gradients for all operations.
        ##  If the model has trainable weights [w1, b1, w2, b2] and ends at a loss L.
        ##  the model should return: [dL/dw1, dL/db1, dL/dw2, dL/db2]
        ##
        ##  Recall that self.operations is populated by Diffable class instances...
        ##
        ##  Start from the last operation and compute jacobian w.r.t input.
        ##  Continue to propagate the cumulative jacobian through the layer inputs
        ##  until all operations have been differentiated through.
        ##
        ##  If an operation that has weights is encountered along the way,
        ##  compute the weight gradients and add them to the return list.
        ##  Remember to check if the layer is trainable before doing this though...

        grads = []
        return grads
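
As a rough orientation for the gradient() TODO, one possible shape of the backward loop is sketched below. It assumes your layers' input_gradients()/weight_gradients() return Jacobians with the shapes that compose_to_input/compose_to_weight expect; the details are up to your implementation, and this is not the official solution.

# A possible skeleton for GradientTape.gradient():
def gradient(self) -> list:
    grads = []
    # Start with the Jacobian of the final (loss) operation w.r.t. its inputs.
    J = self.operations[-1].input_gradients()
    # Walk backwards through every other recorded operation.
    for op in reversed(self.operations[:-1]):
        # If this operation owns trainable weights, accumulate their gradients.
        # Prepending keeps the final list in [dL/dw1, dL/db1, dL/dw2, dL/db2] order.
        if getattr(op, "trainable", False) and hasattr(op, "weights"):
            grads = op.compose_to_weight(J) + grads
        # Propagate the cumulative Jacobian through this operation's inputs.
        J = op.compose_to_input(J)
    return grads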

4. Layers

For the purposes of this assignment, you will implement the Dense layer to use in your
sequential model in Beras/layers.py . You have to fill in the following methods.
forward() : [TODO] Implement the forward pass and return the outputs.
weight_gradients() : [TODO] Calculate the gradients with respect to the
weights and the biases. This will be used to optimize the layer.
input_gradients() : [TODO] Calculate the gradients with respect to the
layer inputs. This will be used to propagate the gradient to previous layers.
_initialize_weight() : [TODO] Initialize the dense layer’s weight values.
By default, initialize all the weights to zero (usually a bad idea, by the way). You
are also required to allow for more sophisticated options (when the initializer is
set to normal, xavier, or kaiming). Follow the Keras math assumptions!
        ○ Normal: Pretty self-explanatory, a unit normal distribution.
        ○ Xavier Normal: Based on keras.GlorotNormal.
        ○ Kaiming He Normal: Based on keras.HeNormal.
You may find np.random.normal helpful while implementing these. The TODOs
provide some justification for why these different initialization methods are necessary, but
for more detail, check out this website! Feel free to add more initializer options!
import numpy as np

from .core import Diffable


class Dense(Diffable):

    # https://towardsdatascience.com/weight-initialization-in-neural-networks-a-journey-from-the-basics-to-kaiming-954fb9b47c79

    def __init__(self, input_size, output_size, initializer="kaiming"):
        super().__init__()
        self.w, self.b = self.__class__._initialize_weight(
            initializer, input_size, output_size
        )
        self.weights = [self.w, self.b]
        self.inputs  = None
        self.outputs = None

    def forward(self, inputs):
        """Forward pass for a dense layer! Refer to lecture slides for how this is computed."""
        self.inputs = inputs

        # TODO: implement the forward pass and return the outputs
        self.outputs = None
        return self.outputs

    def weight_gradients(self):
        """Calculating the gradients wrt weights and biases!"""
        # TODO: Implement calculation of gradients
        wgrads = None
        bgrads = None
        return wgrads, bgrads

    def input_gradients(self):
        """Calculating the gradients wrt inputs!"""
        # TODO: Implement calculation of gradients
        return None

    @staticmethod
    def _initialize_weight(initializer, input_size, output_size):
        """
        Initializes the values of the weights and biases. The bias weights should always start at zero.
        However, the weights should follow the given distribution defined by the initializer parameter
        (zero, normal, xavier, or kaiming). You can do this with an if statement
        cycling through each option!

        Details on each weight initialization option:
            - Zero: Weights and biases contain only 0's. Generally a bad idea since the gradient update
            will be the same for each weight so all weights will have the same values.
            - Normal: Weights are initialized according to a normal distribution.
            - Xavier: Goal is to initialize the weights so that the variance of the activations are the
            same across every layer. This helps to prevent exploding or vanishing gradients. Typically
            works better for layers with tanh or sigmoid activation.
            - Kaiming: Similar purpose as Xavier initialization. Typically works better for layers
            with ReLU activation.
        """

        initializer = initializer.lower()
        assert initializer in (
            "zero",
            "normal",
            "xavier",
            "kaiming",
        ), f"Unknown dense weight initialization strategy '{initializer}' requested"
        io_size = (input_size, output_size)

        # TODO: Implement default assumption: zero-init for weights and bias

        # TODO: Implement remaining options (normal, xavier, kaiming initializations). Note that
        # strings must be exactly as written in the assert above

        return None, None
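
To make the layer math and the initializer options concrete, here is a hedged sketch written as free functions (the real code goes in the class methods above; the GlorotNormal/HeNormal standard deviations below follow the usual Keras formulas):

import numpy as np

def dense_forward(inputs, w, b):
    # inputs: (batch, input_size), w: (input_size, output_size), b: (output_size,)
    return inputs @ w + b

def initialize_weight(initializer, input_size, output_size):
    # Biases always start at zero; weights follow the requested distribution.
    io_size = (input_size, output_size)
    bias = np.zeros(output_size)
    if initializer == "zero":
        weights = np.zeros(io_size)
    elif initializer == "normal":
        weights = np.random.normal(0.0, 1.0, size=io_size)
    elif initializer == "xavier":
        # GlorotNormal: std = sqrt(2 / (fan_in + fan_out))
        weights = np.random.normal(0.0, np.sqrt(2.0 / (input_size + output_size)), size=io_size)
    elif initializer == "kaiming":
        # HeNormal: std = sqrt(2 / fan_in)
        weights = np.random.normal(0.0, np.sqrt(2.0 / input_size), size=io_size)
    else:
        raise ValueError(f"Unknown initializer '{initializer}'")
    return weights, bias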

5. Activation Functions

In this assignment, you will be implementing two major activation functions, namely,
LeakyReLU and Softmax in Beras/activations.py . Since ReLU is a special case
of LeakyReLU , we have already provided you with the code for it.
LeakyReLU()
        ○ forward() : [TODO] Given input x, compute & return LeakyReLU(x).
        ○ input_gradients() : [TODO] Compute & return the partial with
        respect to inputs by differentiating LeakyReLU.
Softmax() : (2470 ONLY)
        ○ forward() : [TODO] Given input x, compute & return Softmax(x).
        Make sure you use stable softmax, where you subtract the max of all entries
        to prevent overflow/underflow issues.
        ○ input_gradients() : [TODO] Partial w.r.t. inputs of Softmax().
import numpy as np

from .core import Diffable


class LeakyReLU(Diffable):
    def __init__(self, alpha=0.3):
        super().__init__()
        self.alpha = alpha
        self.inputs = None
        self.outputs = None

    def forward(self, inputs):
        # TODO: Given an input array `x`, compute LeakyReLU(x)
        self.inputs = inputs
        # Your code here:
        self.outputs = None
        return self.outputs

    def input_gradients(self):
        # TODO: Compute and return the gradients
        return 0

    def compose_to_input(self, J):
        # TODO: Maybe you'll want to override the default?
        return super().compose_to_input(J)


class ReLU(LeakyReLU):
    def __init__(self):
        super().__init__(alpha=0)


class Softmax(Diffable):
    def __init__(self):
        super().__init__()
        self.inputs = None
        self.outputs = None

    def forward(self, inputs):
        """Softmax forward pass!"""
        # TODO: Implement
        # HINT: Use stable softmax, which subtracts maximum from
        # all entries to prevent overflow/underflow issues
        self.inputs = inputs
        # Your code here:
        self.outputs = None
        return self.outputs

    def input_gradients(self):
        """Softmax backprop!"""
        # TODO: Compute and return the gradients
        return 0
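
For intuition, here are standalone NumPy sketches of the two activations, written as free functions rather than the required class methods (the class versions would store self.inputs/self.outputs and reuse them in input_gradients):

import numpy as np

def leaky_relu_forward(x, alpha=0.3):
    # LeakyReLU(x) = x where x > 0, alpha * x otherwise
    return np.where(x > 0, x, alpha * x)

def leaky_relu_input_gradients(x, alpha=0.3):
    # Elementwise derivative: 1 where x > 0, alpha otherwise
    return np.where(x > 0, 1.0, alpha)

def stable_softmax(x):
    # Subtract the per-row max so the exponentials never overflow
    shifted = x - np.max(x, axis=-1, keepdims=True)
    exps = np.exp(shifted)
    return exps / np.sum(exps, axis=-1, keepdims=True)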

6. Filling in the model

With these abstractions in mind, let’s create a pipeline for our sequential deep learning
model. You can find the SequentialModel class in assignment.py where you will
initialize your neural network’s layers, parameters (weights and biases), and
hyperparameters (optimizer, loss function, learning rate, accuracy function, etc.). The
SequentialModel class inherits from Beras/model.py , where you’ll find many
useful methods. This will also contain functions that fit the model to your data and
evaluate the performance of your model:
compile() : Initialize the model optimizer, loss function, & accuracy function,
which are fed in as arguments, for your SequentialModel instance to use.
fit() : Trains your model to associate input to outputs. Training is repeated for
each epoch, and the data is batched based on argument. It also computes
batch_metrics , epoch_metrics , and the aggregated agg_metrics that
can be used to track the training progress of your model.
evaluate() : [TODO] Evaluate the performance of the final model using the
metrics mentioned above during the testing phase. It’s almost identical to the
fit() function; think about what would change between training and testing.
call() : [TODO] Hint: what does it mean to call a sequential model? Remember
that a sequential model is a stack of layers where each layer has exactly one
input vector and one output vector. You can find this function within the
SequentialModel class in assignment.py .
batch_step() : [TODO] You will observe that fit() calls this function for each
batch. You will first compute your model predictions for the input batch. In the
training phase, you will need to compute gradients and update your weights
according to the optimizer you are using. For backpropagation during training,
you will use GradientTape from the core abstractions ( core.py ) to record
operations and intermediate values. You will then use the model's optimizer to
apply the gradients to your model's trainable variables. Finally, compute and
return the loss and accuracy for the batch. You can find this function within the
SequentialModel class in assignment.py .
from abc import ABC, abstractmethod
from collections import defaultdict

import numpy as np

from .core import Diffable


def print_stats(stat_dict, b=None, b_num=None, e=None, avg=False):
    """
    Given a dictionary of names statistics and batch/epoch info,
    print them in an appealing manner. If avg, display stat averages.
    """
    title_str = " - "
    if e is not None:
        title_str += f"Epoch {e+1:2}: "
    if b is not None:
        title_str += f"Batch {b+1:3}"
        if b_num is not None:
            title_str += f"/{b_num}"
    if avg:
        title_str += f"Average Stats"
    print(f"\r{title_str} : ", end="")
    op = np.mean if avg else lambda x: x
    print({k: np.round(op(v), 4) for k, v in stat_dict.items()}, end="")
    print("   ", end="" if not avg else "\n")


def update_metric_dict(super_dict, sub_dict):
    """
    Appends the average of the sub_dict metrics to the super_dict's metric list
    """
    for k, v in sub_dict.items():
        super_dict[k] += [np.mean(v)]


class Model(ABC):

    ###############################################################################################
    ## BEGIN GIVEN

    def __init__(self, layers):
        """
        Initialize all trainable parameters and take layers as inputs
        """
        # Initialize all trainable parameters
        assert all([issubclass(layer.__class__, Diffable) for layer in layers])
        self.layers = layers
        self.trainable_variables = []
        for layer in layers:
            if hasattr(layer, "weights") and layer.trainable:
                for weight in layer.weights:
                    self.trainable_variables += [weight]

    def compile(self, optimizer, loss_fn, acc_fn):
        """
        "Compile" the model by taking in the optimizers, loss, and accuracy functions.
        In more optimized DL implementations, this will have more involved processes
        that make the components extremely efficient but very inflexible.
        """
        self.optimizer = optimizer
        self.compiled_loss = loss_fn
        self.compiled_acc = acc_fn

    def fit(self, x, y, epochs, batch_size):
        """
        Trains the model by iterating over the input dataset and feeding input batches
        into the batch_step method with training. At the end, the metrics are returned.
        """
        agg_metrics = defaultdict(lambda: [])
        batch_num = x.shape[0] // batch_size
        for e in range(epochs):
            epoch_metrics = defaultdict(lambda: [])
            for b, b1 in enumerate(range(batch_size, x.shape[0] + 1, batch_size)):
                b0 = b1 - batch_size
                batch_metrics = self.batch_step(x[b0:b1], y[b0:b1], training=True)
                update_metric_dict(epoch_metrics, batch_metrics)
                print_stats(batch_metrics, b, batch_num, e)
            update_metric_dict(agg_metrics, epoch_metrics)
            print_stats(epoch_metrics, e=e, avg=True)
        return agg_metrics

    def evaluate(self, x, y, batch_size):
        """
        X is the dataset inputs, Y is the dataset labels.
        Evaluates the model by iterating over the input dataset in batches and feeding input batches
        into the batch_step method. At the end, the metrics are returned. Should be called on
        the testing set to evaluate accuracy of the model using the metrics output from the fit method.

        NOTE: This method is almost identical to fit (think about how training and testing differ --
        the core logic should be the same)
        """
        # TODO: Implement evaluate similarly to fit.
        agg_metrics = defaultdict(lambda: [])
        return agg_metrics

    @abstractmethod
    def call(self, inputs):
        """You will implement this in the SequentialModel class in assignment.py"""
        return

    @abstractmethod
    def batch_step(self, x, y, training=True):
        """You will implement this in the SequentialModel class in assignment.py"""
        return
We encourage you to check out keras.SequentialModel in the intro notebook
(under Exploring a possible modular implementation: TensorFlow/Keras ) and refer
to Lab 3 to get a feel for how we can work with gradient tapes in deep learning.
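
Since evaluate() is described as fit() without the epoch loop or weight updates, one possible reading looks like the sketch below (how you aggregate and print the metrics is up to you; this is not necessarily what the autograder expects):

# A possible Model.evaluate body (a sketch mirroring fit, with training=False):
def evaluate(self, x, y, batch_size):
    agg_metrics = defaultdict(lambda: [])
    batch_num = x.shape[0] // batch_size
    for b, b1 in enumerate(range(batch_size, x.shape[0] + 1, batch_size)):
        b0 = b1 - batch_size
        batch_metrics = self.batch_step(x[b0:b1], y[b0:b1], training=False)
        update_metric_dict(agg_metrics, batch_metrics)
        print_stats(batch_metrics, b, batch_num)
    print_stats(agg_metrics, avg=True)
    return agg_metrics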

7. Loss Function

This is one of the most crucial aspects of model training. In this assignment, we will
implement the MSE or mean-squared error loss function. You can find your loss function
in Beras/losses.py .
forward() : [TODO] Write a function that computes and returns the mean
squared error given the predicted and actual labels.
Hint: What is MSE?
Given the predicted and actual labels, MSE is the average of the squares of the
differences between predicted and actual values.
input_gradients() : [TODO] Compute and return the gradients. Use
differentiation to derive the formula for these gradients.
import numpy as np

from .core import Diffable


class MeanSquaredError(Diffable):
    def __init__(self):
        super().__init__()
        self.y_pred  = None
        self.y_true  = None
        self.outputs = None

    def forward(self, y_pred, y_true):
        """Mean squared error forward pass!"""
        # TODO: Compute and return the MSE given predicted and actual labels
        self.y_pred = y_pred
        self.y_true = y_true

        # Your code here:
        self.outputs = None
        return None

    # https://medium.com/analytics-vidhya/derivative-of-log-loss-function-for-logistic-regression-9b832f025c2d
    def input_gradients(self):
        """Mean squared error backpropagation!"""
        # TODO: Compute and return the gradients
        return None


def clip_0_1(x, eps=1e-8):
    return np.clip(x, eps, 1-eps)

class CategoricalCrossentropy(Diffable):
    """
    GIVEN. Feel free to use categorical cross entropy as your loss function for your final model.

    https://medium.com/analytics-vidhya/derivative-of-log-loss-function-for-logistic-regression-9b832f025c2d
    """

    def __init__(self):
        super().__init__()
        self.truths  = None
        self.inputs  = None
        self.outputs = None

    def forward(self, inputs, truths):
        """Categorical cross entropy forward pass!"""
        # print(truth.shape, inputs.shape)
        ll_right = truths * np.log(clip_0_1(inputs))
        ll_wrong = (1 - truths) * np.log(clip_0_1(1 - inputs))
        nll_total = -np.mean(ll_right + ll_wrong, axis=-1)

        self.inputs = inputs
        self.truths = truths
        self.outputs = np.mean(nll_total, axis=0)
        return self.outputs

    def input_gradients(self):
        """Categorical cross entropy backpropagation!"""
        bn, n = self.inputs.shape
        grad = np.zeros(shape=(bn, n), dtype=self.inputs.dtype)
        for b in range(bn):
            inp = self.inputs[b]
            tru = self.truths[b]
            grad[b] = inp - tru
            grad[b] /= clip_0_1(inp - inp ** 2)
            grad[b] /= inp.shape[-1]
        return grad
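
As a concrete reference for the MSE math, here are the forward value and its input gradient as free functions (a sketch; the class methods would cache y_pred/y_true on the forward pass, and whether you also divide by the batch size depends on how your GradientTape averages over the batch):

import numpy as np

def mse_forward(y_pred, y_true):
    # Average of the squared differences over every entry in the batch
    return np.mean((y_pred - y_true) ** 2)

def mse_input_gradients(y_pred, y_true):
    # d/d(y_pred) of mean((y_pred - y_true)^2) taken per example,
    # where the last axis has n = number of output classes
    return 2.0 * (y_pred - y_true) / y_pred.shape[-1]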

8. Optimizers

In the Beras/optimizers.py file, make sure to implement the optimization step for each of
the different types of optimizers. Lab 2 should help with this, so good luck!
BasicOptimizer : [TODO] A simple optimizer strategy as seen in Lab 1.
RMSProp : [TODO] Root mean square propagation.
Adam : [TODO] A common optimizer based on adaptive moment estimation.
from collections import defaultdict
import numpy as np

## HINT: Lab 2 might be helpful...

class BasicOptimizer:
    def __init__(self, learning_rate):
        self.learning_rate = learning_rate

    def apply_gradients(self, weights, grads):
        # TODO: Update the weights using basic stochastic gradient descent
        return


class RMSProp:
    def __init__(self, learning_rate, beta=0.9, epsilon=1e-6):
        self.learning_rate = learning_rate

        self.beta = beta
        self.epsilon = epsilon

        self.v = defaultdict(lambda: 0)

    def apply_gradients(self, weights, grads):
        # TODO: Implement RMSProp optimization
        # Refer to the lab on Optimizers for a better understanding!
        return


class Adam:
    def __init__(
        self, learning_rate, beta_1=0.9, beta_2=0.999, epsilon=1e-7, amsgrad=False
    ):
        self.amsgrad = amsgrad

        self.learning_rate = learning_rate
        self.beta_1 = beta_1
        self.beta_2 = beta_2
        self.epsilon = epsilon

        self.m = defaultdict(lambda: 0)  # First moment zero vector
        self.v = defaultdict(lambda: 0)  # Second moment zero vector.
        # Expected value of first moment vector
        self.m_hat = defaultdict(lambda: 0)
        # Expected value of second moment vector
        self.v_hat = defaultdict(lambda: 0)
        self.t = 0  # Time counter

    def apply_gradients(self, weights, grads):
        # TODO: Implement Adam optimization
        # Refer to the lab on Optimizers for a better understanding!
        return
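
For orientation, here are sketches of the three update rules written against the same (weights, grads) interface as the stencil, with amsgrad ignored and self.m/self.v indexed by parameter position (these are assumptions about one reasonable bookkeeping scheme, not the official solutions):

import numpy as np

def basic_apply_gradients(self, weights, grads):
    # Plain gradient descent: w <- w - lr * dL/dw, updating the arrays in place
    for w, g in zip(weights, grads):
        w -= self.learning_rate * g

def rmsprop_apply_gradients(self, weights, grads):
    for i, (w, g) in enumerate(zip(weights, grads)):
        # Exponential moving average of the squared gradients
        self.v[i] = self.beta * self.v[i] + (1 - self.beta) * g ** 2
        w -= self.learning_rate * g / (np.sqrt(self.v[i]) + self.epsilon)

def adam_apply_gradients(self, weights, grads):
    self.t += 1
    for i, (w, g) in enumerate(zip(weights, grads)):
        self.m[i] = self.beta_1 * self.m[i] + (1 - self.beta_1) * g         # first moment
        self.v[i] = self.beta_2 * self.v[i] + (1 - self.beta_2) * g ** 2    # second moment
        m_hat = self.m[i] / (1 - self.beta_1 ** self.t)                     # bias correction
        v_hat = self.v[i] / (1 - self.beta_2 ** self.t)
        w -= self.learning_rate * m_hat / (np.sqrt(v_hat) + self.epsilon)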

9. Accuracy metrics

Finally, to evaluate the performance of your model, you need to use appropriate
accuracy metrics. In this assignment, you will implement categorical accuracy in
Beras/metrics.py :
forward() : [TODO] Return the categorical accuracy of your model given the
predicted probabilities and true labels. You should be returning the proportion of
predicted labels equal to the true labels, where the predicted label for an image is
the label corresponding to the highest probability. Refer to the internet or lecture
slides for categorical accuracy math!
import numpy as np

from .core import Callable


class CategoricalAccuracy(Callable):
    def forward(self, probs, labels):
        """Categorical accuracy forward pass!"""
        super().__init__()
        # TODO: Compute and return the categorical accuracy of your model given the output probabilities and true labels
        return None
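
The accuracy computation itself is one line of NumPy; here is a sketch under the assumption that probs has shape (batch, num_classes) and labels is one-hot encoded:

import numpy as np

def categorical_accuracy(probs, labels):
    # Predicted class = argmax of the probabilities; true class = argmax of the one-hot labels
    return np.mean(np.argmax(probs, axis=-1) == np.argmax(labels, axis=-1))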

10. Train and Test

Finally, using all the above primitives, you are required to build two models in
assignment.py:
● A simple model in get_simple_model_components() with at most one Diffable layer (e.g.,
Dense - ./layers.py ) and one activation function (look for them in
./activations.py ). This one is provided for you by default, though you can
change it if you’d like. The autograder will evaluate the original one, though!
● A slightly more complex model in get_advanced_model_components() with two or more
Diffable layers and two or more activation functions. We recommend using the Adam
optimizer for this model with a decently low learning rate.
For any hyperparameters you use (layer sizes, learning rate, number of epochs, batch size,
etc.), please hardcode these values in the get_simple_model_components() and
get_advanced_model_components() functions. Do NOT store them under the main handler.
Once everything is implemented, you can use python3 assignment.py to run your
model and see the loss/accuracy!
from types import SimpleNamespace

import Beras
import numpy as np


class SequentialModel(Beras.Model):
    """
    Implemented in Beras/model.py

    def __init__(self, layers):
    def compile(self, optimizer, loss_fn, acc_fn):
    def fit(self, x, y, epochs, batch_size):
    def evaluate(self, x, y, batch_size):           ## <- TODO
    """

    def call(self, inputs):
        """
        Forward pass in sequential model. It's helpful to note that layers are initialized in Beras.Model, and
        you can refer to them with self.layers. You can call a layer by doing var = layer(input).
        """
        # TODO: The call function!
        return None

    def batch_step(self, x, y, training=True):
        """
        Computes loss and accuracy for a batch. This step consists of both a forward and backward pass.
        If training=false, don't apply gradients to update the model! 
        Most of this method (forward, loss, applying gradients)
        will take place within the scope of Beras.GradientTape()
        """
        # TODO: Compute loss and accuracy for a batch.
        # If training, then also update the gradients according to the optimizer
        return {"loss": None, "acc": None}


def get_simple_model_components():
    """
    Returns a simple single-layer model.
    """
    ## DO NOT CHANGE IN FINAL SUBMISSION

    from Beras.activations import Softmax
    from Beras.layers import Dense
    from Beras.losses import MeanSquaredError
    from Beras.metrics import CategoricalAccuracy
    from Beras.optimizers import BasicOptimizer

    # TODO: create a model and compile it with layers and functions of your choice
    model = SequentialModel([Dense(784, 10), Softmax()])
    model.compile(
        optimizer=BasicOptimizer(0.02),
        loss_fn=MeanSquaredError(),
        acc_fn=CategoricalAccuracy(),
    )
    return SimpleNamespace(model=model, epochs=10, batch_size=100)


def get_advanced_model_components():
    """
    Returns a multi-layered model with more involved components.
    """
    # TODO: create/compile a model with layers and functions of your choice.
    # model = SequentialModel()

    return SimpleNamespace(model=None, epochs=None, batch_size=None)


if __name__ == "__main__":
    """
    Read in MNIST data and initialize/train/test your model.
    """
    from Beras.onehot import OneHotEncoder
    import preprocess

    ## Read in MNIST data,
    train_inputs, train_labels = preprocess.get_data_MNIST("train", "../data")
    test_inputs,  test_labels  = preprocess.get_data_MNIST("test",  "../data")

    ## TODO: Use the OneHotEncoder class to one hot encode the labels
    ohe = lambda x: 0  ## placeholder function: returns zero for a given input

    ## Get your model to train and test
    simple = False
    args = get_simple_model_components() if simple else get_advanced_model_components()
    model = args.model

    ## REMINDER: Threshold of accuracy: 
    ##  1470: >85% on testing accuracy from get_simple_model_components
    ##  2470: >95% on testing accuracy from get_advanced_model_components

    # TODO: Fit your model to the training input and the one hot encoded labels
    # Remember to pass all the arguments that SequentialModel.fit() requires
    # such as number of epochs and the batch size
    train_agg_metrics = model.fit(
        train_inputs, 
        ohe(train_labels), 
        epochs     = args.epochs, 
        batch_size = args.batch_size
    )

    ## Feel free to use the visualize_metrics function to view your accuracy and loss.
    ## The final accuracy returned during evaluation must be > 80%.

    # from visualize import visualize_images, visualize_metrics
    # visualize_metrics(train_agg_metrics["loss"], train_agg_metrics["acc"])
    # visualize_images(model, train_inputs, ohe(train_labels))

    ## TODO: Evaluate your model using your testing inputs and one hot encoded labels.
    ## This is the number you will be using!
    test_agg_metrics = model.evaluate(test_inputs, ohe(test_labels), batch_size=100)
    print('Testing Performance:', test_agg_metrics)
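
For orientation, possible shapes for the two SequentialModel TODOs are sketched below, following the batch_step docstring above (a sketch only; the exact placement of the loss, gradient, and optimizer calls relative to the GradientTape scope is up to your implementation):

def call(self, inputs):
    # Feed the output of each layer into the next one
    x = inputs
    for layer in self.layers:
        x = layer(x)
    return x

def batch_step(self, x, y, training=True):
    with Beras.GradientTape() as tape:
        y_pred = self.call(x)                     # forward pass, recorded by the tape
        loss = self.compiled_loss(y_pred, y)      # loss is recorded as well
        if training:
            grads = tape.gradient()               # [dL/dw1, dL/db1, ...]
            self.optimizer.apply_gradients(self.trainable_variables, grads)
    acc = self.compiled_acc(y_pred, y)
    return {"loss": loss, "acc": acc}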

11. Visualizing Results

We provided the visualize_metrics method for you to visualize how your loss and
accuracy changes after each batch using matplotlib. DO NOT EDIT THIS FUNCTION.
You should call this function in your main method after you store the loss and accuracy
per batch in an array, which would be passed into this function. This should plot line
graphs where the horizontal axis is the i'th batch and the vertical axis is the
loss/accuracy value of the batch. Calling this is OPTIONAL!
We've also provided the visualize_images method for you to visualize your
predictions against the true labels with matplotlib. This method is currently written with
the labels having a shape of [number of images, 1]. DO NOT EDIT THIS FUNCTION .
You should call this function with all your inputs and labels after training your model. The
function will randomly pick 500 samples from your input and will plot 10 correct and 10
incorrect classifications to help you visually interpret your model’s predictions! You
should do this last, after you have met the benchmark for test accuracy.

CS1470 Students

- Complete and Submit HW2 Conceptual
- Implement Beras per specifications and make a SequentialModel in assignment.py
- Test the model inside of main
- Get test accuracy >=85% on MNIST with the default get_simple_model_components.
- Complete the Exploration notebook and export it to a PDF.
- The “HW2 Intro to Beras” notebook is just for your reference.

CS2470 Students

- Same as 1470 except:
- Implement Softmax activation function (forward pass and input_gradients)
- Get testing accuracy >95% on MNIST with the model from get_advanced_model_components.
- You will need to specify a multi-layered model, will have to explore
hyperparameter options, and may want to add additional features.
- Additional features may include regularization, other weight initialization
schemes, aggregation layers, dropout, rate scheduling, or skip connections. If
you have other ideas, feel free to ask publicly on Ed and we’ll let you know if they
are also ok.
- When implementing these features, try to mimic the Keras API as much as
possible. This will help significantly with your Exploration notebook.
- Finish 2470 components for Exploration notebook and conceptual questions.

Grading and Autograder Compatibility

Conceptual: You will be primarily graded on correctness, thoughtfulness, and clarity.
Code: You will be primarily graded on functionality. Your model should have an accuracy that is
at least greater than the threshold on the testing data. For 1470, this can be achieved with the
simple model parameterization provided. For 2470, you may need to experiment with
hyperparameters or develop some custom components.
Although you will not be graded on code style, you should not have an excessive number of
print statements in your final submission.
IMPORTANT! Please use vectorized operations when possible and limit the number of for loops
you use. While there is no strict time limit for running this assignment, it should typically be less
than 3 minutes. The autograder will automatically time out after 10 minutes. You will not receive
any credit for methods that use Tensorflow or Keras functions within them.
Notebook: The exploration notebook will be graded manually and should be submitted as a PDF
file. Feel free to use the “Notebooks to Latex PDFs.ipynb” notebook!

Handing In
You should submit the assignment via Gradescope under the corresponding project assignment,
either by zipping up your hw2 folder (the path on Gradescope MUST be hw2/code/filename.py) or
through GitHub (recommended). To submit through GitHub, commit and push all changes to
your repository. You can do this by running the following three commands (this is a
good resource for learning more about them):
1. git add file1 file2 file3 (or -A)
2. git commit -m "commit message"
3. git push

After committing and pushing your changes to your repo (which you can check online if you're unsure if it worked), you can now just upload the repo to Gradescope! If you’re testing out code on multiple branches, you have the option to pick whichever one you want.

IMPORTANT!

1. Please make sure all your files are in hw2/code . Otherwise, the autograder will fail!
2. Delete the data folder before zipping up your code.
3. 2470 STUDENTS : Add a blank file named 2470student in the hw2/code directory!
The file should have no extension, and is used as a flag to grade 2470-specific requirements. If
you don’t do this, YOU WILL LOSE POINTS!

Thanks!

Refer to the answer: http://t.csdn.cn/9sdBN
