CS231N assignment A1 Q4: two_layer_net. Here I learned how to visualize the loss and tune hyperparameters, and I'm ready to try applying it on my own!

# This mounts your Google Drive to the Colab VM.
from google.colab import drive
drive.mount('/content/drive')

# TODO: Enter the foldername in your Drive where you have saved the unzipped
# assignment folder, e.g. 'cs231n/assignments/assignment1/'
FOLDERNAME = None
assert FOLDERNAME is not None, "[!] Enter the foldername."

# Now that we've mounted your Drive, this ensures that
# the Python interpreter of the Colab VM can load
# python files from within it.
import sys
sys.path.append('/content/drive/My Drive/{}'.format(FOLDERNAME))

# This downloads the CIFAR-10 dataset to your Drive
# if it doesn't already exist.
%cd /content/drive/My\ Drive/$FOLDERNAME/cs231n/datasets/
!bash get_datasets.sh
%cd /content/drive/My\ Drive/$FOLDERNAME

Fully-Connected Neural Nets

In this exercise we will implement fully-connected networks using a modular approach. For each layer we will implement a forward and a backward function. The forward function will receive inputs, weights, and other parameters and will return both an output and a cache object storing data needed for the backward pass, like this:

def layer_forward(x, w):
  """ Receive inputs x and weights w """
  # Do some computations ...
  z = # ... some intermediate value
  # Do some more computations ...
  out = # the output
   
  cache = (x, w, z, out) # Values we need to compute gradients
   
  return out, cache

The backward pass will receive upstream derivatives and the cache object, and will return gradients with respect to the inputs and weights, like this:

def layer_backward(dout, cache):
  """
  Receive dout (derivative of loss with respect to outputs) and cache,
  and compute derivative with respect to inputs.
  """
  # Unpack cache values
  x, w, z, out = cache
  
  # Use values in cache to compute derivatives
  dx = # Derivative of loss with respect to x
  dw = # Derivative of loss with respect to w
  
  return dx, dw

After implementing a bunch of layers this way, we will be able to easily combine them to build classifiers with different architectures.
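As a concrete toy illustration of this pattern, here is a minimal, self-contained sketch (not assignment code) of two modular layers chained in a forward pass and then unwound in reverse order using their caches:

# Toy "layer": multiply the input by a weight. Forward returns (out, cache);
# backward takes the upstream derivative plus the cache and returns local gradients.
def mul_forward(x, w):
    out = x * w
    cache = (x, w)
    return out, cache

def mul_backward(dout, cache):
    x, w = cache
    dx = dout * w
    dw = dout * x
    return dx, dw

x, w1, w2 = 2.0, 3.0, 4.0
h, cache1 = mul_forward(x, w1)        # layer 1
y, cache2 = mul_forward(h, w2)        # layer 2: y = x * w1 * w2

dy = 1.0                              # derivative of the output with respect to itself
dh, dw2 = mul_backward(dy, cache2)    # unwind the layers in reverse order
dx, dw1 = mul_backward(dh, cache1)
print(dx, dw1, dw2)                   # 12.0 8.0 6.0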

# As usual, a bit of setup
from __future__ import print_function
import time
import numpy as np
import matplotlib.pyplot as plt
from cs231n.classifiers.fc_net import *
from cs231n.data_utils import get_CIFAR10_data
from cs231n.gradient_check import eval_numerical_gradient, eval_numerical_gradient_array
from cs231n.solver import Solver

%matplotlib inline
plt.rcParams['figure.figsize'] = (10.0, 8.0) # set default size of plots
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'

# for auto-reloading external modules
# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython
%load_ext autoreload
%autoreload 2

def rel_error(x, y):
  """ returns relative error """
  return np.max(np.abs(x - y) / (np.maximum(1e-8, np.abs(x) + np.abs(y))))
# Load the (preprocessed) CIFAR10 data.

data = get_CIFAR10_data()
for k, v in list(data.items()):
  print(('%s: ' % k, v.shape))
('X_train: ', (49000, 3, 32, 32))
('y_train: ', (49000,))
('X_val: ', (1000, 3, 32, 32))
('y_val: ', (1000,))
('X_test: ', (1000, 3, 32, 32))
('y_test: ', (1000,))

Affine layer: forward

Open the file cs231n/layers.py and implement the affine_forward function.

Once you are done you can test your implementation by running the following:

# Test the affine_forward function
# affine: a linear transformation plus a translation (wx + b)

num_inputs = 2
input_shape = (4, 5, 6)
output_dim = 3

input_size = num_inputs * np.prod(input_shape)
weight_size = output_dim * np.prod(input_shape)

x = np.linspace(-0.1, 0.5, num=input_size).reshape(num_inputs, *input_shape)
w = np.linspace(-0.2, 0.3, num=weight_size).reshape(np.prod(input_shape), output_dim)
b = np.linspace(-0.3, 0.1, num=output_dim)

out, _ = affine_forward(x, w, b)
correct_out = np.array([[ 1.49834967,  1.70660132,  1.91485297],
                        [ 3.25553199,  3.5141327,   3.77273342]])

# Compare your output with ours. The error should be around e-9 or less.
print('Testing affine_forward function:')
print('difference: ', rel_error(out, correct_out))
Testing affine_forward function:
difference:  9.7698500479884e-10

Affine layer: backward

Now implement the affine_backward function and test your implementation using numeric gradient checking.
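Before implementing it, it helps to write the three gradients down in terms of shapes. A quick self-contained shape check (a sketch; the affine_backward I ended up with is shown at the end of this post):

import numpy as np

# Shape check for the affine backward formulas, where out = x_flat @ w + b
# with x_flat of shape (N, D), w of shape (D, M), and b of shape (M,).
N, M = 10, 5
x = np.random.randn(N, 2, 3)             # D = 2 * 3 = 6
w = np.random.randn(6, M)
dout = np.random.randn(N, M)

x_flat = x.reshape(N, -1)                # (N, D)
dx = (dout @ w.T).reshape(x.shape)       # (N, M) @ (M, D) -> (N, D) -> x.shape
dw = x_flat.T @ dout                     # (D, N) @ (N, M) -> (D, M)
db = dout.sum(axis=0)                    # (M,)
print(dx.shape, dw.shape, db.shape)      # (10, 2, 3) (6, 5) (5,)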

# Test the affine_backward function
np.random.seed(231)
x = np.random.randn(10, 2, 3)
w = np.random.randn(6, 5)
b = np.random.randn(5)
dout = np.random.randn(10, 5)

dx_num = eval_numerical_gradient_array(lambda x: affine_forward(x, w, b)[0], x, dout)
dw_num = eval_numerical_gradient_array(lambda w: affine_forward(x, w, b)[0], w, dout)
db_num = eval_numerical_gradient_array(lambda b: affine_forward(x, w, b)[0], b, dout)

_, cache = affine_forward(x, w, b)
dx, dw, db = affine_backward(dout, cache)

# The error should be around e-10 or less
print('Testing affine_backward function:')
print('dx error: ', rel_error(dx_num, dx))
print('dw error: ', rel_error(dw_num, dw))
print('db error: ', rel_error(db_num, db))
Testing affine_backward function:
dx error:  1.0908199508708189e-10
dw error:  2.1752635504596857e-10
db error:  7.736978834487815e-12

ReLU activation: forward

Implement the forward pass for the ReLU activation function in the relu_forward function and test your implementation using the following:

# Test the relu_forward function

x = np.linspace(-0.5, 0.5, num=12).reshape(3, 4)

out, _ = relu_forward(x)
correct_out = np.array([[ 0.,          0.,          0.,          0.,        ],
                        [ 0.,          0.,          0.04545455,  0.13636364,],
                        [ 0.22727273,  0.31818182,  0.40909091,  0.5,       ]])

# Compare your output with ours. The error should be on the order of e-8
print('Testing relu_forward function:')
print('difference: ', rel_error(out, correct_out))
Testing relu_forward function:
difference:  4.999999798022158e-08

ReLU activation: backward

Now implement the backward pass for the ReLU activation function in the relu_backward function and test your implementation using numeric gradient checking:
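The rule itself is tiny: the upstream gradient passes through only where the input was positive. A one-line numeric check (not the assignment code):

import numpy as np

x = np.array([-2.0, -0.5, 0.0, 1.5])
dout = np.array([10.0, 10.0, 10.0, 10.0])
dx = dout * (x > 0)    # zero wherever the input was <= 0
print(dx)              # [ 0.  0.  0. 10.]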

np.random.seed(231)
x = np.random.randn(10, 10)
dout = np.random.randn(*x.shape)

dx_num = eval_numerical_gradient_array(lambda x: relu_forward(x)[0], x, dout)

_, cache = relu_forward(x)
dx = relu_backward(dout, cache)

# The error should be on the order of e-12
print('Testing relu_backward function:')
print('dx error: ', rel_error(dx_num, dx))
Testing relu_backward function:
dx error:  3.2756349136310288e-12

Inline Question 1:

We’ve only asked you to implement ReLU, but there are a number of different activation functions that one could use in neural networks, each with its pros and cons. In particular, an issue commonly seen with activation functions is getting zero (or close to zero) gradient flow during backpropagation. Which of the following activation functions have this problem? If you consider these functions in the one dimensional case, what types of input would lead to this behaviour?

  1. Sigmoid
  2. ReLU
  3. Leaky ReLU
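To build some intuition before answering, here is a quick 1-D check of each activation's local gradient at a few inputs (not part of the notebook; the 0.01 leaky slope is just a common choice):

import numpy as np

x = np.array([-10.0, -1.0, 0.5, 10.0])

s = 1.0 / (1.0 + np.exp(-x))
d_sigmoid = s * (1.0 - s)               # nearly 0 for large |x|: sigmoid saturates
d_relu = (x > 0).astype(float)          # exactly 0 for every x < 0
d_leaky = np.where(x > 0, 1.0, 0.01)    # small but nonzero for x < 0

print(d_sigmoid)
print(d_relu)
print(d_leaky)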

Answer:

[FILL THIS IN]

“Sandwich” layers

There are some common patterns of layers that are frequently used in neural nets. For example, affine layers are frequently followed by a ReLU nonlinearity. To make these common patterns easy, we define several convenience layers in the file cs231n/layer_utils.py.

For now take a look at the affine_relu_forward and affine_relu_backward functions, and run the following to numerically gradient check the backward pass:
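Roughly, the convenience layer chains the two primitives, keeps both caches, and unwinds them in reverse order on the way back. A sketch of that structure, assuming the layer functions implemented above are in scope (see cs231n/layer_utils.py for the actual code):

def affine_relu_forward_sketch(x, w, b):
    a, fc_cache = affine_forward(x, w, b)        # affine first
    out, relu_cache = relu_forward(a)            # then ReLU
    return out, (fc_cache, relu_cache)

def affine_relu_backward_sketch(dout, cache):
    fc_cache, relu_cache = cache
    da = relu_backward(dout, relu_cache)         # undo the ReLU first
    dx, dw, db = affine_backward(da, fc_cache)   # then the affine layer
    return dx, dw, db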

from cs231n.layer_utils import affine_relu_forward, affine_relu_backward
np.random.seed(231)
x = np.random.randn(2, 3, 4)
w = np.random.randn(12, 10)
b = np.random.randn(10)
dout = np.random.randn(2, 10)

out, cache = affine_relu_forward(x, w, b)
dx, dw, db = affine_relu_backward(dout, cache)

dx_num = eval_numerical_gradient_array(lambda x: affine_relu_forward(x, w, b)[0], x, dout)
dw_num = eval_numerical_gradient_array(lambda w: affine_relu_forward(x, w, b)[0], w, dout)
db_num = eval_numerical_gradient_array(lambda b: affine_relu_forward(x, w, b)[0], b, dout)

# Relative error should be around e-10 or less
print('Testing affine_relu_forward and affine_relu_backward:')
print('dx error: ', rel_error(dx_num, dx))
print('dw error: ', rel_error(dw_num, dw))
print('db error: ', rel_error(db_num, db))
Testing affine_relu_forward and affine_relu_backward:
dx error:  6.395535042049294e-11
dw error:  8.162011105764925e-11
db error:  7.826724021458994e-12

Loss layers: Softmax and SVM

Now implement the loss and gradient for softmax and SVM in the softmax_loss and svm_loss function in cs231n/layers.py. These should be similar to what you implemented in cs231n/classifiers/softmax.py and cs231n/classifiers/linear_svm.py.

You can make sure that the implementations are correct by running the following:
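For reference, a numerically stable softmax loss and its gradient can be written as below (a minimal sketch using the same (N, C) score layout; the notebook still expects your own version in cs231n/layers.py):

import numpy as np

def softmax_loss_sketch(x, y):
    shifted = x - x.max(axis=1, keepdims=True)     # shift scores for numeric stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    N = x.shape[0]
    loss = -log_probs[np.arange(N), y].mean()      # mean negative log-likelihood
    dx = np.exp(log_probs)                         # softmax probabilities
    dx[np.arange(N), y] -= 1                       # subtract 1 at the correct class
    dx /= N
    return loss, dx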

np.random.seed(231)
num_classes, num_inputs = 10, 50
x = 0.001 * np.random.randn(num_inputs, num_classes)
y = np.random.randint(num_classes, size=num_inputs)

dx_num = eval_numerical_gradient(lambda x: svm_loss(x, y)[0], x, verbose=False)
loss, dx = svm_loss(x, y)

# Test svm_loss function. Loss should be around 9 and dx error should be around the order of e-9
print('Testing svm_loss:')
print('loss: ', loss)
print('dx error: ', rel_error(dx_num, dx))

dx_num = eval_numerical_gradient(lambda x: softmax_loss(x, y)[0], x, verbose=False)
loss, dx = softmax_loss(x, y)

# Test softmax_loss function. Loss should be close to 2.3 and dx error should be around e-8
print('\nTesting softmax_loss:')
print('loss: ', loss)
print('dx error: ', rel_error(dx_num, dx))
Testing svm_loss:
loss:  8.999602749096233
dx error:  1.4021566006651672e-09

Testing softmax_loss:
loss:  2.302545844500738
dx error:  9.483503037636722e-09

Two-layer network

Open the file cs231n/classifiers/fc_net.py and complete the implementation of the TwoLayerNet class. Read through it to make sure you understand the API. You can run the cell below to test your implementation.
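Structurally, the loss method just chains the layers from above (affine, ReLU, affine, softmax) plus L2 regularization, and unwinds the caches in reverse for the gradients. A sketch of that structure, assuming the 0.5 * reg * sum(W**2) regularization convention used by this assignment (the real class lives in cs231n/classifiers/fc_net.py):

def two_layer_loss_sketch(X, y, params, reg):
    h, cache1 = affine_relu_forward(X, params['W1'], params['b1'])
    scores, cache2 = affine_forward(h, params['W2'], params['b2'])
    if y is None:
        return scores                              # test time: just return the scores

    loss, dscores = softmax_loss(scores, y)
    loss += 0.5 * reg * (np.sum(params['W1'] ** 2) + np.sum(params['W2'] ** 2))

    dh, dW2, db2 = affine_backward(dscores, cache2)
    dX, dW1, db1 = affine_relu_backward(dh, cache1)
    grads = {'W1': dW1 + reg * params['W1'], 'b1': db1,
             'W2': dW2 + reg * params['W2'], 'b2': db2}
    return loss, grads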

np.random.seed(231)
N, D, H, C = 3, 5, 50, 7
X = np.random.randn(N, D)
y = np.random.randint(C, size=N)

std = 1e-3
model = TwoLayerNet(input_dim=D, hidden_dim=H, num_classes=C, weight_scale=std)

print('Testing initialization ... ')
W1_std = abs(model.params['W1'].std() - std)
b1 = model.params['b1']
W2_std = abs(model.params['W2'].std() - std)
b2 = model.params['b2']
assert W1_std < std / 10, 'First layer weights do not seem right'
assert np.all(b1 == 0), 'First layer biases do not seem right'
assert W2_std < std / 10, 'Second layer weights do not seem right'
assert np.all(b2 == 0), 'Second layer biases do not seem right'

print('Testing test-time forward pass ... ')
model.params['W1'] = np.linspace(-0.7, 0.3, num=D*H).reshape(D, H)
model.params['b1'] = np.linspace(-0.1, 0.9, num=H)
model.params['W2'] = np.linspace(-0.3, 0.4, num=H*C).reshape(H, C)
model.params['b2'] = np.linspace(-0.9, 0.1, num=C)
X = np.linspace(-5.5, 4.5, num=N*D).reshape(D, N).T
scores = model.loss(X)
correct_scores = np.asarray(
  [[11.53165108,  12.2917344,   13.05181771,  13.81190102,  14.57198434, 15.33206765,  16.09215096],
   [12.05769098,  12.74614105,  13.43459113,  14.1230412,   14.81149128, 15.49994135,  16.18839143],
   [12.58373087,  13.20054771,  13.81736455,  14.43418138,  15.05099822, 15.66781506,  16.2846319 ]])
scores_diff = np.abs(scores - correct_scores).sum()
assert scores_diff < 1e-6, 'Problem with test-time forward pass'

print('Testing training loss (no regularization)')
y = np.asarray([0, 5, 1])
loss, grads = model.loss(X, y)
correct_loss = 3.4702243556
assert abs(loss - correct_loss) < 1e-10, 'Problem with training-time loss'

model.reg = 1.0
loss, grads = model.loss(X, y)
correct_loss = 26.5948426952
assert abs(loss - correct_loss) < 1e-10, 'Problem with regularization loss'

# Errors should be around e-7 or less
for reg in [0.0, 0.7]:
  print('Running numeric gradient check with reg = ', reg)
  model.reg = reg
  loss, grads = model.loss(X, y)

  for name in sorted(grads):
    f = lambda _: model.loss(X, y)[0]
    grad_num = eval_numerical_gradient(f, model.params[name], verbose=False)
    print('%s relative error: %.2e' % (name, rel_error(grad_num, grads[name])))
Testing initialization ... 
Testing test-time forward pass ... 
Testing training loss (no regularization)
Running numeric gradient check with reg =  0.0
W1 relative error: 1.22e-08
W2 relative error: 3.42e-10
b1 relative error: 6.55e-09
b2 relative error: 2.53e-10
Running numeric gradient check with reg =  0.7
W1 relative error: 2.53e-07
W2 relative error: 1.37e-07
b1 relative error: 1.56e-08
b2 relative error: 9.09e-10

Solver

Open the file cs231n/solver.py and read through it to familiarize yourself with the API. After doing so, use a Solver instance to train a TwoLayerNet that achieves about 36% accuracy on the validation set.
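For reference, this is the general shape of a Solver call with the keyword arguments exercised later in this post (the values here are placeholders; see cs231n/solver.py for the full list of options and defaults):

# Example Solver construction; `model` is any object following the Solver API
# and `data` is the dictionary loaded at the top of the notebook.
solver = Solver(model, data,
                update_rule='sgd',                     # an update rule from cs231n/optim.py
                optim_config={'learning_rate': 1e-3},  # hyperparameters for that rule
                lr_decay=0.95,                         # learning rate is multiplied by this after each epoch
                num_epochs=10,
                batch_size=100,
                print_every=100,
                verbose=True)
solver.train()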

input_size = 32 * 32 * 3
hidden_size = 50
num_classes = 10
model = TwoLayerNet(input_size, hidden_size, num_classes)
solver = None

##############################################################################
# TODO: Use a Solver instance to train a TwoLayerNet that achieves about 36% #
# accuracy on the validation set.                                            #
##############################################################################
# *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

solver = Solver(model, data,
                update_rule='sgd',
                optim_config={
                    'learning_rate': 1e-3,
                },
                print_every=100)
solver.train()
print(model.params['W1'])
print(model.params['W2'])

# *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
##############################################################################
#                             END OF YOUR CODE                               #
##############################################################################
(Iteration 1 / 4900) loss: 2.300089
(Epoch 0 / 10) train acc: 0.171000; val_acc: 0.170000
(Iteration 101 / 4900) loss: 1.782419
(Iteration 201 / 4900) loss: 1.803466
(Iteration 301 / 4900) loss: 1.712676
(Iteration 401 / 4900) loss: 1.693946
(Epoch 1 / 10) train acc: 0.399000; val_acc: 0.428000
(Iteration 501 / 4900) loss: 1.712717
(Iteration 601 / 4900) loss: 1.460141
(Iteration 701 / 4900) loss: 1.563845
(Iteration 801 / 4900) loss: 1.535489
(Iteration 901 / 4900) loss: 1.359039
(Epoch 2 / 10) train acc: 0.468000; val_acc: 0.444000
(Iteration 1001 / 4900) loss: 1.339096
(Iteration 1101 / 4900) loss: 1.397059
(Iteration 1201 / 4900) loss: 1.499237
(Iteration 1301 / 4900) loss: 1.349809
(Iteration 1401 / 4900) loss: 1.449001
(Epoch 3 / 10) train acc: 0.466000; val_acc: 0.429000
(Iteration 1501 / 4900) loss: 1.319916
(Iteration 1601 / 4900) loss: 1.441210
(Iteration 1701 / 4900) loss: 1.351760
(Iteration 1801 / 4900) loss: 1.430905
(Iteration 1901 / 4900) loss: 1.347602
(Epoch 4 / 10) train acc: 0.504000; val_acc: 0.451000
(Iteration 2001 / 4900) loss: 1.452116
(Iteration 2101 / 4900) loss: 1.369872
(Iteration 2201 / 4900) loss: 1.561029
(Iteration 2301 / 4900) loss: 1.242521
(Iteration 2401 / 4900) loss: 1.433959
(Epoch 5 / 10) train acc: 0.493000; val_acc: 0.465000
(Iteration 2501 / 4900) loss: 1.229392
(Iteration 2601 / 4900) loss: 1.527411
(Iteration 2701 / 4900) loss: 1.262844
(Iteration 2801 / 4900) loss: 1.540281
(Iteration 2901 / 4900) loss: 1.061335
(Epoch 6 / 10) train acc: 0.537000; val_acc: 0.488000
(Iteration 3001 / 4900) loss: 1.375969
(Iteration 3101 / 4900) loss: 1.168029
(Iteration 3201 / 4900) loss: 1.270886
(Iteration 3301 / 4900) loss: 1.360948
(Iteration 3401 / 4900) loss: 1.218180
(Epoch 7 / 10) train acc: 0.531000; val_acc: 0.468000
(Iteration 3501 / 4900) loss: 1.492338
(Iteration 3601 / 4900) loss: 1.193582
(Iteration 3701 / 4900) loss: 1.478843
(Iteration 3801 / 4900) loss: 1.238349
(Iteration 3901 / 4900) loss: 1.245863
(Epoch 8 / 10) train acc: 0.541000; val_acc: 0.463000
(Iteration 4001 / 4900) loss: 1.189466
(Iteration 4101 / 4900) loss: 1.308431
(Iteration 4201 / 4900) loss: 1.214756
(Iteration 4301 / 4900) loss: 1.206561
(Iteration 4401 / 4900) loss: 1.226766
(Epoch 9 / 10) train acc: 0.580000; val_acc: 0.469000
(Iteration 4501 / 4900) loss: 1.394010
(Iteration 4601 / 4900) loss: 1.254027
(Iteration 4701 / 4900) loss: 1.267596
(Iteration 4801 / 4900) loss: 1.317326
(Epoch 10 / 10) train acc: 0.524000; val_acc: 0.456000
[[-8.65267446e-04  1.51763946e-03  9.20435820e-04 ... -4.76719102e-03
   7.77217911e-04 -8.47726990e-04]
 [ 2.32086673e-05 -4.17347496e-04  1.73077327e-03 ... -5.11830270e-03
  -2.35514265e-03  2.03577797e-03]
 [ 3.48962122e-03 -2.00372821e-03 -8.80326561e-04 ... -4.25016092e-03
  -2.64932774e-03 -1.77899186e-04]
 ...
 [-5.55743991e-03  1.22943772e-03 -2.07338521e-03 ... -1.73072769e-03
   6.69580779e-03 -3.15589662e-04]
 [-5.26821097e-03  1.30659737e-03 -1.25705290e-03 ... -2.73661681e-03
   8.23147883e-03  1.09734554e-03]
 [-5.91776197e-03  2.11672442e-03 -4.13249668e-03 ... -2.75154972e-03
   1.02069410e-02 -4.47075156e-04]]
[[-1.60424578e-02 -1.44495526e-03 -7.34970603e-03  9.01368202e-04
   1.55623009e-02 -3.62581706e-03 -5.84320906e-04  2.72385208e-02
   4.76207119e-03 -1.80963537e-02]
 [ 1.52368879e-02  2.07231868e-02 -4.48121477e-03 -1.17626917e-02
   8.76789395e-03 -2.14923410e-02 -3.27032031e-02  1.08266345e-03
   9.97337507e-03  7.45420266e-03]
 [ 1.18614824e-02 -2.38362001e-02  1.48061068e-02  7.76116287e-05
   1.60667125e-02 -1.86055814e-02 -8.02227202e-03 -3.20358717e-04
   6.13434966e-04  1.22273454e-02]
 [-7.50857442e-04  1.90234371e-02 -2.15347081e-03  1.66733625e-03
  -3.75170424e-03 -1.71393877e-02  1.81003493e-04 -3.56836093e-02
   1.75294772e-02  2.53622247e-02]
 [-3.49673296e-03  2.06117677e-02 -2.13287183e-02 -1.73488031e-04
  -3.08941178e-02 -1.40533499e-03  1.93718999e-02  7.74714752e-03
   1.01430668e-04  1.01074785e-02]
 [-4.37011604e-03 -1.14883838e-02 -5.36730084e-03  2.23905481e-02
   3.38230286e-03  1.74912708e-02  1.89724953e-02 -7.45828369e-03
  -1.77951256e-02 -1.73741855e-02]
 [ 1.16897576e-02  1.26029576e-02  1.70782247e-02 -4.04833502e-03
  -1.59590533e-03 -1.02942872e-02  1.76556102e-02 -8.08499506e-03
  -1.73377243e-02 -1.12368970e-02]
 [-7.57305355e-03 -1.15375744e-02  2.41357582e-02  1.73489319e-02
   1.67317649e-02  2.28332392e-03  1.31806144e-02 -1.10065590e-02
  -2.56360379e-02 -2.39286329e-02]
 [ 2.30064259e-02 -2.96425961e-03 -4.03068203e-03 -8.00831460e-03
  -1.26983837e-02 -1.93724243e-02  1.34735607e-02 -1.94503639e-02
   2.23903381e-02  1.41171993e-02]
 [ 1.54423213e-02  3.19227455e-03 -6.08874888e-03  1.29626471e-02
  -1.29176539e-02  1.19980221e-02  1.68266055e-02 -9.58344986e-03
  -3.34558775e-02 -1.01418863e-03]
 [-4.49660182e-03  3.75637308e-03 -1.62432260e-02 -7.52573250e-03
   9.24484349e-03 -1.87050200e-02 -7.30576699e-03  1.98456533e-02
   1.45571907e-02  8.02994864e-03]
 [ 8.25580312e-03 -1.87583519e-02  1.30588885e-02  2.56715642e-03
   7.78563490e-03  1.09213571e-02  8.39782669e-03 -7.62593831e-03
   1.02585755e-02 -3.92557982e-02]
 [ 7.80156272e-03 -1.76877732e-02  1.18152331e-02 -1.14707754e-02
   2.03164080e-02 -1.46426690e-02  2.32280502e-02  5.68054921e-03
  -2.17020840e-02 -3.07906528e-03]
 [ 1.40982469e-02  2.12723446e-02  1.09036366e-02 -1.79124919e-02
   1.10601055e-02 -2.04809282e-02 -1.58264851e-02 -1.32288024e-02
  -1.29714720e-04  1.12649303e-02]
 [ 1.55690096e-02 -2.16906390e-02  5.41258417e-03  2.42036969e-03
   4.37707307e-03 -1.33776324e-02  8.32660042e-03 -1.41589838e-02
   2.04197142e-02 -1.06104780e-02]
 [-4.18620851e-03 -2.44013525e-02  2.13459139e-02  5.55193680e-03
   1.19602248e-02  2.43105219e-02 -1.89440906e-03  2.27780792e-03
  -2.33871976e-02 -7.86370258e-03]
 [ 2.40810664e-02 -1.61717545e-02  8.63857782e-03  2.92629751e-03
   1.01909090e-02  3.01803036e-03  1.53690513e-02 -4.36431103e-03
  -3.10627207e-02 -1.30310738e-02]
 [ 8.21430097e-03  1.57416913e-02 -1.43467151e-03 -1.16325704e-02
  -1.69962121e-03 -5.18715149e-03 -3.11371103e-02  2.61728241e-02
   7.84465438e-03 -6.65490106e-03]
 [-1.55651407e-02  1.45786233e-03  8.15075602e-03  6.86704444e-03
  -2.35874183e-02  7.58216743e-03  2.60141399e-02 -9.82007535e-03
  -9.98973586e-03  9.70971629e-03]
 [ 1.98973021e-04 -9.46322154e-03 -8.08249592e-03  9.30379288e-03
   2.09076255e-02  2.13683998e-03  1.82009173e-02 -9.74929093e-03
   1.48325372e-02 -3.54134826e-02]
 [ 4.63137816e-03 -2.26478212e-02  1.75619428e-02 -5.50371631e-03
  -3.76185590e-03 -1.07035417e-02 -1.58218800e-02  1.77412601e-02
   1.18749965e-02  1.59136985e-03]
 [ 1.15401051e-03  3.71068203e-02 -1.54957040e-02 -1.86541985e-02
  -1.21524740e-04 -1.49971374e-02 -9.42137943e-03 -1.27119403e-02
  -4.25987037e-03  3.80558441e-02]
 [ 1.75128708e-03 -1.25531778e-02 -1.00462553e-02  1.54559160e-02
  -3.45250406e-03  2.08397038e-02  3.64340311e-03 -2.12258513e-02
   9.58560320e-03 -3.61221373e-03]
 [-4.98885749e-03 -6.20960472e-03  9.65878571e-03  1.01120865e-02
  -9.57115139e-03  1.73505511e-02 -2.07988264e-02  2.37550921e-02
  -2.09978447e-02  2.00398956e-03]
 [-1.21138352e-02 -1.40748200e-02  2.67994018e-03  1.67743421e-02
  -5.11470706e-03  1.20021226e-03  2.25908205e-04 -9.15001726e-03
   9.46016556e-03  6.58681508e-03]
 [-1.61718115e-02 -6.29068097e-03  3.27588263e-03  6.36251177e-03
   3.01484885e-02  1.12365420e-02  5.60125860e-03  1.05935293e-02
  -1.02906536e-02 -3.10828869e-02]
 [-1.13001760e-02  1.66960689e-03 -8.21625078e-03  1.22804326e-02
  -1.79231948e-03  1.49738198e-02 -9.50877270e-03 -2.38875764e-02
   2.02475209e-02  7.31421278e-04]
 [-2.94838358e-02 -4.57816634e-03 -5.18725384e-03  1.16480673e-02
   5.82439150e-03  7.38452717e-03  1.46029242e-02  1.62728298e-02
  -5.02674884e-04 -1.57042631e-02]
 [-1.60205424e-03  1.82456341e-02 -2.51082591e-02 -3.61452039e-03
   2.31294381e-03 -9.71137790e-03 -8.36810436e-03  3.28015404e-04
   2.70171239e-02 -8.53019869e-04]
 [ 3.02324988e-03 -9.60244430e-03 -2.43850781e-03 -3.94362539e-03
   5.31690129e-03  1.69179015e-03 -1.99875390e-02  2.23455143e-02
  -1.05879977e-02  1.51532595e-02]
 [ 1.38956046e-02 -1.25389632e-02  5.24115976e-03 -5.74330147e-04
  -1.14658119e-03  6.00115696e-03 -2.11223210e-03  3.01745566e-02
  -2.06781197e-02 -1.58534301e-02]
 [-1.56889425e-02  9.47890761e-03 -1.92278709e-02  5.79291525e-03
  -1.33564480e-02  9.16276996e-03  2.49984003e-02 -2.13365390e-02
   1.91370055e-02  1.18709131e-03]
 [ 7.65581991e-03 -5.01099845e-03  1.99242877e-02  9.01187629e-03
   5.55242267e-03 -9.35712714e-03  2.23218014e-02 -3.08133911e-02
  -2.45682531e-03 -1.59694099e-02]
 [-2.05889724e-02 -2.24815715e-02  3.64879420e-03  1.02681795e-02
   1.47531742e-03  2.06163333e-02  1.25428940e-02  1.10076294e-02
  -2.23149879e-02  6.50036747e-03]
 [ 1.18786399e-02  5.51139956e-03 -8.66184032e-04 -1.70817289e-02
  -3.70799732e-03 -5.63165952e-03 -2.52552048e-02 -5.43826022e-03
   2.98477459e-02  6.10763288e-03]
 [-1.04956310e-02 -2.46666833e-02  4.73606882e-03  2.02768625e-02
   1.23978734e-02  9.02835971e-03 -1.22983520e-02  9.48648186e-03
  -4.02680547e-03 -2.71005793e-03]
 [ 1.58084326e-02 -3.51425348e-04  6.79610222e-03  1.37642247e-02
  -7.10124777e-03  4.13698229e-03 -1.17672588e-02 -1.70728757e-02
   4.42737231e-03 -9.68916934e-03]
 [-1.42700247e-02  2.18264947e-02 -1.86600308e-02  4.04175385e-03
  -2.15913304e-02  9.71150740e-03  2.13518861e-02  8.24489021e-03
  -2.29040557e-02  1.74604909e-02]
 [-1.51423282e-02 -5.92198537e-03  5.54987774e-03  1.19512965e-02
  -1.60053312e-02  1.97272708e-02 -2.81928235e-05  1.68393355e-02
  -1.96531129e-02 -2.43929728e-04]
 [-1.53349367e-02 -1.61569978e-02  1.63832069e-02  1.39959168e-02
  -5.38117816e-03  1.98511208e-02 -9.45573902e-03 -6.21278640e-03
  -1.36152303e-02  1.02191441e-02]
 [ 1.22808928e-02 -2.23350041e-02 -1.07219647e-02  6.23347341e-03
   1.13519909e-02  8.96604212e-03 -5.24310813e-03  1.74632997e-02
  -1.33541469e-02  4.06067643e-04]
 [ 1.75175793e-02  2.17622525e-02 -9.74602057e-03 -2.24924917e-02
  -6.70274544e-04 -2.23773515e-02 -1.98526117e-02 -2.06977415e-03
   2.15144325e-02  1.74164408e-02]
 [ 9.81082333e-03 -1.13516175e-02  1.67093322e-02 -1.61020950e-03
   1.64745844e-02 -5.99925980e-03  2.79928131e-03  2.10973080e-02
  -1.89795940e-02 -2.68801397e-02]
 [ 2.27398220e-02  2.42946812e-02 -3.76239621e-03 -2.36776468e-02
   2.39659138e-04 -5.62712310e-03 -3.06041140e-02  4.33901062e-03
   1.23934713e-02 -1.96817105e-03]
 [ 9.42366468e-03  3.60055903e-02 -6.20074921e-03 -7.17017399e-03
  -3.16946551e-02 -4.85297152e-03 -1.07899808e-02 -1.03790330e-02
   1.76061245e-03  1.88566274e-02]
 [-2.05325510e-02 -2.94775189e-02  8.11095191e-03  1.39796729e-02
  -6.63446232e-03  1.41559304e-02  4.46609097e-03  7.96355689e-03
   1.33399633e-02 -5.05895570e-03]
 [-4.29670974e-03  2.05403202e-02 -1.38890880e-02 -1.78327822e-03
   8.78295397e-03 -9.32842802e-03 -2.80682148e-02  6.80802412e-05
   1.09751585e-02  1.56317507e-02]
 [-1.09664140e-02 -2.28032290e-02 -1.61430264e-03  3.04831132e-03
   3.05865733e-03  1.16554294e-02 -6.77961506e-03  3.09140882e-02
  -2.38316951e-02  1.36742794e-02]
 [-6.70834046e-03  3.09339490e-02 -1.36310431e-02 -1.03863065e-02
  -1.45314374e-02 -7.69066102e-03  5.29922250e-03 -2.94169974e-02
   1.27009284e-02  2.92754198e-02]
 [-2.56815264e-02  7.86625309e-03  1.26948758e-02  2.12580788e-03
   8.27404625e-04  5.02814090e-03  4.61908991e-03 -5.09876328e-03
  -2.40480677e-02  2.07218842e-02]]

Debug the training

With the default parameters we provided above, you should get a validation accuracy of about 0.36 on the validation set. This isn’t very good.

One strategy for getting insight into what’s wrong is to plot the loss function and the accuracies on the training and validation sets during optimization.

Another strategy is to visualize the weights that were learned in the first layer of the network. In most neural networks trained on visual data, the first layer weights typically show some visible structure when visualized.

# Run this cell to visualize training loss and train / val accuracy

plt.subplot(2, 1, 1)
plt.title('Training loss')
plt.plot(solver.loss_history, 'o')
plt.xlabel('Iteration')

plt.subplot(2, 1, 2)
plt.title('Accuracy')
plt.plot(solver.train_acc_history, '-o', label='train')
plt.plot(solver.val_acc_history, '-o', label='val')
plt.plot([0.5] * len(solver.val_acc_history), 'k--')
plt.xlabel('Epoch')
plt.legend(loc='lower right')
plt.gcf().set_size_inches(15, 12)
plt.show()

[Figure: training loss per iteration (top) and train / val accuracy per epoch (bottom) (output_22_0.png)]

from cs231n.vis_utils import visualize_grid

# Visualize the weights of the network

def show_net_weights(net):
    W1 = net.params['W1']
    W1 = W1.reshape(3, 32, 32, -1).transpose(3, 1, 2, 0)
    plt.imshow(visualize_grid(W1, padding=3).astype('uint8'))
    plt.gca().axis('off')
    plt.show()

show_net_weights(model)

[Figure: learned first-layer weights visualized as a grid of 32x32 color patches (output_23_0.png)]

Tune your hyperparameters

What’s wrong? Looking at the visualizations above, we see that the loss is decreasing more or less linearly, which seems to suggest that the learning rate may be too low. Moreover, there is no gap between the training and validation accuracy, suggesting that the model we used has low capacity, and that we should increase its size. On the other hand, with a very large model we would expect to see more overfitting, which would manifest itself as a very large gap between the training and validation accuracy.

Tuning. Tuning the hyperparameters and developing intuition for how they affect the final performance is a large part of using Neural Networks, so we want you to get a lot of practice. Below, you should experiment with different values of the various hyperparameters, including hidden layer size, learning rate, number of training epochs, and regularization strength. You might also consider tuning the learning rate decay, but you should be able to get good performance using the default value.

Approximate results. You should aim to achieve a classification accuracy of greater than 48% on the validation set. Our best network gets over 52% on the validation set.

Experiment: Your goal in this exercise is to get as good of a result on CIFAR-10 as you can (52% could serve as a reference), with a fully-connected Neural Network. Feel free to implement your own techniques (e.g. PCA to reduce dimensionality, adding dropout, or adding features to the solver, etc.).
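One small refinement over a fixed grid (optional, not something the notebook asks for) is to sample the learning rate and regularization strength log-uniformly, since they naturally vary over orders of magnitude:

import numpy as np

def sample_log_uniform(low, high):
    """Sample a value log-uniformly from [low, high]."""
    return 10 ** np.random.uniform(np.log10(low), np.log10(high))

lr = sample_log_uniform(5e-5, 1e-3)
reg = sample_log_uniform(1e-2, 1e0)
print(lr, reg)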

best_model = None


#################################################################################
# TODO: Tune hyperparameters using the validation set. Store your best trained  #
# model in best_model.                                                          #
#                                                                               #
# To help debug your network, it may help to use visualizations similar to the  #
# ones we used above; these visualizations will have significant qualitative    #
# differences from the ones we saw above for the poorly tuned network.          #
#                                                                               #
# Tweaking hyperparameters by hand can be fun, but you might find it useful to  #
# write code to sweep through possible combinations of hyperparameters          #
# automatically like we did on the previous exercises.                          #
#################################################################################
# *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

best_val_accuracy = 0.0

def random_choose_hyperparams(hsize_values, lr_values, epo_values, reg_values):
    # Pick one value at random from each candidate list.
    hsize = hsize_values[np.random.randint(0, len(hsize_values))]
    lr = lr_values[np.random.randint(0, len(lr_values))]
    epo = epo_values[np.random.randint(0, len(epo_values))]
    reg = reg_values[np.random.randint(0, len(reg_values))]
    return hsize, lr, epo, reg

input_size = 32 * 32 * 3
num_classes = 10

for i in range(3):
    hsize, lr, epo, reg = random_choose_hyperparams(
        [50, 80, 100], [5e-5, 1e-4, 5e-4, 1e-3], [10, 20], [0.05, 0.1, 0.2])
    model = TwoLayerNet(input_size, hsize, num_classes, reg=reg)
    solver = Solver(model, data,
                    update_rule='sgd',
                    optim_config={'learning_rate': lr},
                    lr_decay=0.95,
                    num_epochs=epo,
                    batch_size=100,
                    print_every=100,
                    verbose=True)
    solver.train()
    print(hsize, lr, epo, reg)
    print('Validation accuracy:', solver.best_val_acc)
    if solver.best_val_acc > best_val_accuracy:
        best_val_accuracy = solver.best_val_acc
        best_model = model

print('Validation accuracy: ', best_val_accuracy)

# *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
################################################################################
#                              END OF YOUR CODE                                #
################################################################################
(Iteration 1 / 9800) loss: 2.314531
(Epoch 0 / 20) train acc: 0.086000; val_acc: 0.100000
(Iteration 101 / 9800) loss: 2.289608
(Iteration 201 / 9800) loss: 2.249441
(Iteration 301 / 9800) loss: 2.227934
(Iteration 401 / 9800) loss: 2.137530
(Epoch 1 / 20) train acc: 0.284000; val_acc: 0.275000
(Iteration 501 / 9800) loss: 2.052334
(Iteration 601 / 9800) loss: 2.047436
(Iteration 701 / 9800) loss: 2.080860
(Iteration 801 / 9800) loss: 1.977065
(Iteration 901 / 9800) loss: 2.057965
(Epoch 2 / 20) train acc: 0.328000; val_acc: 0.317000
(Iteration 1001 / 9800) loss: 1.841247
(Iteration 1101 / 9800) loss: 1.936609
(Iteration 1201 / 9800) loss: 1.867998
(Iteration 1301 / 9800) loss: 1.905404
(Iteration 1401 / 9800) loss: 1.785550
(Epoch 3 / 20) train acc: 0.352000; val_acc: 0.339000
(Iteration 1501 / 9800) loss: 1.847292
(Iteration 1601 / 9800) loss: 1.777829
(Iteration 1701 / 9800) loss: 1.803744
(Iteration 1801 / 9800) loss: 1.583060
(Iteration 1901 / 9800) loss: 1.686975
(Epoch 4 / 20) train acc: 0.357000; val_acc: 0.384000
(Iteration 2001 / 9800) loss: 1.568758
(Iteration 2101 / 9800) loss: 1.811816
(Iteration 2201 / 9800) loss: 1.734856
(Iteration 2301 / 9800) loss: 1.865343
(Iteration 2401 / 9800) loss: 1.864461
(Epoch 5 / 20) train acc: 0.391000; val_acc: 0.392000
(Iteration 2501 / 9800) loss: 1.575060
(Iteration 2601 / 9800) loss: 1.731407
(Iteration 2701 / 9800) loss: 1.652412
(Iteration 2801 / 9800) loss: 1.627854
(Iteration 2901 / 9800) loss: 1.723521
(Epoch 6 / 20) train acc: 0.382000; val_acc: 0.409000
(Iteration 3001 / 9800) loss: 1.618601
(Iteration 3101 / 9800) loss: 1.723137
(Iteration 3201 / 9800) loss: 1.787325
(Iteration 3301 / 9800) loss: 1.708750
(Iteration 3401 / 9800) loss: 1.641683
(Epoch 7 / 20) train acc: 0.392000; val_acc: 0.423000
(Iteration 3501 / 9800) loss: 1.536491
(Iteration 3601 / 9800) loss: 1.745706
(Iteration 3701 / 9800) loss: 1.707037
(Iteration 3801 / 9800) loss: 1.723831
(Iteration 3901 / 9800) loss: 1.667133
(Epoch 8 / 20) train acc: 0.448000; val_acc: 0.431000
(Iteration 4001 / 9800) loss: 1.612006
(Iteration 4101 / 9800) loss: 1.855744
(Iteration 4201 / 9800) loss: 1.618294
(Iteration 4301 / 9800) loss: 1.681276
(Iteration 4401 / 9800) loss: 1.582790
(Epoch 9 / 20) train acc: 0.434000; val_acc: 0.437000
(Iteration 4501 / 9800) loss: 1.640186
(Iteration 4601 / 9800) loss: 1.529499
(Iteration 4701 / 9800) loss: 1.600999
(Iteration 4801 / 9800) loss: 1.698614
(Epoch 10 / 20) train acc: 0.430000; val_acc: 0.444000
(Iteration 4901 / 9800) loss: 1.554010
(Iteration 5001 / 9800) loss: 1.672571
(Iteration 5101 / 9800) loss: 1.739803
(Iteration 5201 / 9800) loss: 1.574428
(Iteration 5301 / 9800) loss: 1.564663
(Epoch 11 / 20) train acc: 0.436000; val_acc: 0.453000
(Iteration 5401 / 9800) loss: 1.607202
(Iteration 5501 / 9800) loss: 1.487490
(Iteration 5601 / 9800) loss: 1.614789
(Iteration 5701 / 9800) loss: 1.549869
(Iteration 5801 / 9800) loss: 1.503337
(Epoch 12 / 20) train acc: 0.453000; val_acc: 0.453000
(Iteration 5901 / 9800) loss: 1.486987
(Iteration 6001 / 9800) loss: 1.455507
(Iteration 6101 / 9800) loss: 1.404143
(Iteration 6201 / 9800) loss: 1.541085
(Iteration 6301 / 9800) loss: 1.661689
(Epoch 13 / 20) train acc: 0.440000; val_acc: 0.452000
(Iteration 6401 / 9800) loss: 1.558876
(Iteration 6501 / 9800) loss: 1.659895
(Iteration 6601 / 9800) loss: 1.634457
(Iteration 6701 / 9800) loss: 1.459978
(Iteration 6801 / 9800) loss: 1.505133
(Epoch 14 / 20) train acc: 0.465000; val_acc: 0.463000
(Iteration 6901 / 9800) loss: 1.563051
(Iteration 7001 / 9800) loss: 1.556096
(Iteration 7101 / 9800) loss: 1.492010
(Iteration 7201 / 9800) loss: 1.621878
(Iteration 7301 / 9800) loss: 1.560238
(Epoch 15 / 20) train acc: 0.447000; val_acc: 0.467000
(Iteration 7401 / 9800) loss: 1.610232
(Iteration 7501 / 9800) loss: 1.546164
(Iteration 7601 / 9800) loss: 1.584337
(Iteration 7701 / 9800) loss: 1.588330
(Iteration 7801 / 9800) loss: 1.396530
(Epoch 16 / 20) train acc: 0.452000; val_acc: 0.463000
(Iteration 7901 / 9800) loss: 1.524747
(Iteration 8001 / 9800) loss: 1.534408
(Iteration 8101 / 9800) loss: 1.540677
(Iteration 8201 / 9800) loss: 1.601728
(Iteration 8301 / 9800) loss: 1.566610
(Epoch 17 / 20) train acc: 0.464000; val_acc: 0.472000
(Iteration 8401 / 9800) loss: 1.790128
(Iteration 8501 / 9800) loss: 1.382996
(Iteration 8601 / 9800) loss: 1.561834
(Iteration 8701 / 9800) loss: 1.453540
(Iteration 8801 / 9800) loss: 1.496857
(Epoch 18 / 20) train acc: 0.437000; val_acc: 0.470000
(Iteration 8901 / 9800) loss: 1.530574
(Iteration 9001 / 9800) loss: 1.429663
(Iteration 9101 / 9800) loss: 1.675070
(Iteration 9201 / 9800) loss: 1.523313
(Iteration 9301 / 9800) loss: 1.299336
(Epoch 19 / 20) train acc: 0.475000; val_acc: 0.468000
(Iteration 9401 / 9800) loss: 1.478960
(Iteration 9501 / 9800) loss: 1.539008
(Iteration 9601 / 9800) loss: 1.513070
(Iteration 9701 / 9800) loss: 1.568688
(Epoch 20 / 20) train acc: 0.485000; val_acc: 0.470000
100 5e-05 20 0.05
Validation accuracy: 0.472
(Iteration 1 / 4900) loss: 2.309834
(Epoch 0 / 10) train acc: 0.101000; val_acc: 0.081000
(Iteration 101 / 4900) loss: 2.230970
(Iteration 201 / 4900) loss: 2.125892
(Iteration 301 / 4900) loss: 1.989736
(Iteration 401 / 4900) loss: 2.041632
(Epoch 1 / 10) train acc: 0.324000; val_acc: 0.306000
(Iteration 501 / 4900) loss: 1.983053
(Iteration 601 / 4900) loss: 2.003096
(Iteration 701 / 4900) loss: 1.858512
(Iteration 801 / 4900) loss: 1.864417
(Iteration 901 / 4900) loss: 1.975265
(Epoch 2 / 10) train acc: 0.353000; val_acc: 0.369000
(Iteration 1001 / 4900) loss: 1.795469
(Iteration 1101 / 4900) loss: 1.893576
(Iteration 1201 / 4900) loss: 1.870940
(Iteration 1301 / 4900) loss: 1.982137
(Iteration 1401 / 4900) loss: 1.714858
(Epoch 3 / 10) train acc: 0.406000; val_acc: 0.396000
(Iteration 1501 / 4900) loss: 1.780998
(Iteration 1601 / 4900) loss: 1.673103
(Iteration 1701 / 4900) loss: 1.672222
(Iteration 1801 / 4900) loss: 1.700685
(Iteration 1901 / 4900) loss: 1.805218
(Epoch 4 / 10) train acc: 0.419000; val_acc: 0.427000
(Iteration 2001 / 4900) loss: 1.584214
(Iteration 2101 / 4900) loss: 1.599240
(Iteration 2201 / 4900) loss: 1.718941
(Iteration 2301 / 4900) loss: 1.559381
(Iteration 2401 / 4900) loss: 1.528562
(Epoch 5 / 10) train acc: 0.434000; val_acc: 0.442000
(Iteration 2501 / 4900) loss: 1.741587
(Iteration 2601 / 4900) loss: 1.511955
(Iteration 2701 / 4900) loss: 1.639566
(Iteration 2801 / 4900) loss: 1.525466
(Iteration 2901 / 4900) loss: 1.703194
(Epoch 6 / 10) train acc: 0.436000; val_acc: 0.445000
(Iteration 3001 / 4900) loss: 1.499712
(Iteration 3101 / 4900) loss: 1.779692
(Iteration 3201 / 4900) loss: 1.490515
(Iteration 3301 / 4900) loss: 1.554801
(Iteration 3401 / 4900) loss: 1.496210
(Epoch 7 / 10) train acc: 0.451000; val_acc: 0.465000
(Iteration 3501 / 4900) loss: 1.797402
(Iteration 3601 / 4900) loss: 1.605308
(Iteration 3701 / 4900) loss: 1.679594
(Iteration 3801 / 4900) loss: 1.545293
(Iteration 3901 / 4900) loss: 1.460504
(Epoch 8 / 10) train acc: 0.447000; val_acc: 0.470000
(Iteration 4001 / 4900) loss: 1.555695
(Iteration 4101 / 4900) loss: 1.418404
(Iteration 4201 / 4900) loss: 1.556150
(Iteration 4301 / 4900) loss: 1.681922
(Iteration 4401 / 4900) loss: 1.624050
(Epoch 9 / 10) train acc: 0.493000; val_acc: 0.471000
(Iteration 4501 / 4900) loss: 1.601087
(Iteration 4601 / 4900) loss: 1.465135
(Iteration 4701 / 4900) loss: 1.409711
(Iteration 4801 / 4900) loss: 1.538485
(Epoch 10 / 10) train acc: 0.482000; val_acc: 0.473000
100 0.0001 10 0.05
Validation accuracy: 0.473
(Iteration 1 / 4900) loss: 2.310287
(Epoch 0 / 10) train acc: 0.110000; val_acc: 0.119000
(Iteration 101 / 4900) loss: 1.780769
(Iteration 201 / 4900) loss: 1.812097
(Iteration 301 / 4900) loss: 1.625240
(Iteration 401 / 4900) loss: 1.606158
(Epoch 1 / 10) train acc: 0.427000; val_acc: 0.429000
(Iteration 501 / 4900) loss: 1.647705
(Iteration 601 / 4900) loss: 1.663074
(Iteration 701 / 4900) loss: 1.706872
(Iteration 801 / 4900) loss: 1.555526
(Iteration 901 / 4900) loss: 1.448925
(Epoch 2 / 10) train acc: 0.478000; val_acc: 0.480000
(Iteration 1001 / 4900) loss: 1.652969
(Iteration 1101 / 4900) loss: 1.505636
(Iteration 1201 / 4900) loss: 1.557228
(Iteration 1301 / 4900) loss: 1.368537
(Iteration 1401 / 4900) loss: 1.447913
(Epoch 3 / 10) train acc: 0.483000; val_acc: 0.480000
(Iteration 1501 / 4900) loss: 1.579015
(Iteration 1601 / 4900) loss: 1.556650
(Iteration 1701 / 4900) loss: 1.469553
(Iteration 1801 / 4900) loss: 1.523136
(Iteration 1901 / 4900) loss: 1.429958
(Epoch 4 / 10) train acc: 0.520000; val_acc: 0.490000
(Iteration 2001 / 4900) loss: 1.644379
(Iteration 2101 / 4900) loss: 1.508203
(Iteration 2201 / 4900) loss: 1.470445
(Iteration 2301 / 4900) loss: 1.423005
(Iteration 2401 / 4900) loss: 1.455669
(Epoch 5 / 10) train acc: 0.538000; val_acc: 0.458000
(Iteration 2501 / 4900) loss: 1.284192
(Iteration 2601 / 4900) loss: 1.404772
(Iteration 2701 / 4900) loss: 1.424585
(Iteration 2801 / 4900) loss: 1.234072
(Iteration 2901 / 4900) loss: 1.378710
(Epoch 6 / 10) train acc: 0.523000; val_acc: 0.481000
(Iteration 3001 / 4900) loss: 1.107199
(Iteration 3101 / 4900) loss: 1.462635
(Iteration 3201 / 4900) loss: 1.462565
(Iteration 3301 / 4900) loss: 1.667559
(Iteration 3401 / 4900) loss: 1.357918
(Epoch 7 / 10) train acc: 0.537000; val_acc: 0.486000
(Iteration 3501 / 4900) loss: 1.333216
(Iteration 3601 / 4900) loss: 1.376200
(Iteration 3701 / 4900) loss: 1.320858
(Iteration 3801 / 4900) loss: 1.409946
(Iteration 3901 / 4900) loss: 1.407231
(Epoch 8 / 10) train acc: 0.564000; val_acc: 0.477000
(Iteration 4001 / 4900) loss: 1.529902
(Iteration 4101 / 4900) loss: 1.254411
(Iteration 4201 / 4900) loss: 1.307782
(Iteration 4301 / 4900) loss: 1.209206
(Iteration 4401 / 4900) loss: 1.538928
(Epoch 9 / 10) train acc: 0.559000; val_acc: 0.497000
(Iteration 4501 / 4900) loss: 1.326390
(Iteration 4601 / 4900) loss: 1.489104
(Iteration 4701 / 4900) loss: 1.387188
(Iteration 4801 / 4900) loss: 1.410794
(Epoch 10 / 10) train acc: 0.578000; val_acc: 0.484000
50 0.001 10 0.1
Validation accuracy: 0.497
Validation accuracy:  0.497

Test your model!

Run your best model on the validation and test sets. You should achieve above 48% accuracy on the validation set and the test set.

y_val_pred = np.argmax(best_model.loss(data['X_val']), axis=1)
print('Validation set accuracy: ', (y_val_pred == data['y_val']).mean())
Validation set accuracy:  0.497
y_test_pred = np.argmax(best_model.loss(data['X_test']), axis=1)
print('Test set accuracy: ', (y_test_pred == data['y_test']).mean())
Test set accuracy:  0.507

Inline Question 2:

Now that you have trained a Neural Network classifier, you may find that your testing accuracy is much lower than the training accuracy. In what ways can we decrease this gap? Select all that apply.

  1. Train on a larger dataset.
  2. Add more hidden units.
  3. Increase the regularization strength.
  4. None of the above.

$\color{blue}{\textit Your Answer:}$

$\color{blue}{\textit Your Explanation:}$

The code in layers.py:

from builtins import range
import numpy as np



def affine_forward(x, w, b):
    """
    Computes the forward pass for an affine (fully-connected) layer.

    The input x has shape (N, d_1, ..., d_k) and contains a minibatch of N
    examples, where each example x[i] has shape (d_1, ..., d_k). We will
    reshape each input into a vector of dimension D = d_1 * ... * d_k, and
    then transform it to an output vector of dimension M.

    Inputs:
    - x: A numpy array containing input data, of shape (N, d_1, ..., d_k)
    - w: A numpy array of weights, of shape (D, M)
    - b: A numpy array of biases, of shape (M,)

    Returns a tuple of:
    - out: output, of shape (N, M)
    - cache: (x, w, b)
    cache: values saved for use in the backward pass
    """
    out = None
    ###########################################################################
    # TODO: Implement the affine forward pass. Store the result in out. You   #
    # will need to reshape the input into rows.                               #
    ###########################################################################
    # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

    inp = x.reshape((x.shape[0], -1))  # flatten each example to a row of length D
    out = np.dot(inp, w) + b           # (N, D) @ (D, M) + (M,) -> (N, M)

    # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
    ###########################################################################
    #                             END OF YOUR CODE                            #
    ###########################################################################
    cache = (x, w, b)
    return out, cache


def affine_backward(dout, cache):
    """
    Computes the backward pass for an affine layer.

    Inputs:
    - dout: Upstream derivative, of shape (N, M)
    - cache: Tuple of:
      - x: Input data, of shape (N, d_1, ... d_k)
      - w: Weights, of shape (D, M)
      - b: Biases, of shape (M,)

    Returns a tuple of:
    - dx: Gradient with respect to x, of shape (N, d1, ..., d_k)
    - dw: Gradient with respect to w, of shape (D, M)
    - db: Gradient with respect to b, of shape (M,)
    """
    x, w, b = cache
    dx, dw, db = None, None, None
    ###########################################################################
    # TODO: Implement the affine backward pass.                               #
    ###########################################################################
    # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

    inp = x.reshape((x.shape[0], -1))  # flattened input, shape (N, D)
    dinp = dout @ w.T                  # gradient w.r.t. the flattened input, (N, D)

    dx = dinp.reshape(x.shape)         # restore the original input shape
    dw = inp.T @ dout                  # (D, N) @ (N, M) -> (D, M)
    db = np.sum(dout, axis=0)          # sum over the batch, shape (M,)

    # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
    ###########################################################################
    #                             END OF YOUR CODE                            #
    ###########################################################################
    return dx, dw, db


def relu_forward(x):
    """
    Computes the forward pass for a layer of rectified linear units (ReLUs).

    Input:
    - x: Inputs, of any shape

    Returns a tuple of:
    - out: Output, of the same shape as x
    - cache: x
    """
    out = None
    ###########################################################################
    # TODO: Implement the ReLU forward pass.                                  #
    ###########################################################################
    # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

    out = np.copy(x)
    out[x < 0] = 0  # zero out the negative inputs; positive values pass through unchanged

    # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
    ###########################################################################
    #                             END OF YOUR CODE                            #
    ###########################################################################
    cache = x
    return out, cache


def relu_backward(dout, cache):
    """
    Computes the backward pass for a layer of rectified linear units (ReLUs).

    Input:
    - dout: Upstream derivatives, of any shape
    - cache: Input x, of same shape as dout

    Returns:
    - dx: Gradient with respect to x
    """
    dx, x = None, cache
    ###########################################################################
    # TODO: Implement the ReLU backward pass.                                 #
    ###########################################################################
    # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

    dx = np.copy(dout)
    dx[x < 0] = 0  # gradient is zero wherever the input was negative

    # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
    ###########################################################################
    #                             END OF YOUR CODE                            #
    ###########################################################################
    return dx


def batchnorm_forward(x, gamma, beta, bn_param):
    """
    Forward pass for batch normalization.

    During training the sample mean and (uncorrected) sample variance are
    computed from minibatch statistics and used to normalize the incoming data.
    During training we also keep an exponentially decaying running mean of the
    mean and variance of each feature, and these averages are used to normalize
    data at test-time.

    At each timestep we update the running averages for mean and variance using
    an exponential decay based on the momentum parameter:

    running_mean = momentum * running_mean + (1 - momentum) * sample_mean
    running_var = momentum * running_var + (1 - momentum) * sample_var

    Note that the batch normalization paper suggests a different test-time
    behavior: they compute sample mean and variance for each feature using a
    large number of training images rather than using a running average. For
    this implementation we have chosen to use running averages instead since
    they do not require an additional estimation step; the torch7
    implementation of batch normalization also uses running averages.

    Input:
    - x: Data of shape (N, D)
    - gamma: Scale parameter of shape (D,)
    - beta: Shift parameter of shape (D,)
    - bn_param: Dictionary with the following keys:
      - mode: 'train' or 'test'; required
      - eps: Constant for numeric stability
      - momentum: Constant for running mean / variance.
      - running_mean: Array of shape (D,) giving running mean of features
      - running_var Array of shape (D,) giving running variance of features

    Returns a tuple of:
    - out: of shape (N, D)
    - cache: A tuple of values needed in the backward pass
    """
    mode = bn_param["mode"]
    eps = bn_param.get("eps", 1e-5)
    momentum = bn_param.get("momentum", 0.9)

    N, D = x.shape
    running_mean = bn_param.get("running_mean", np.zeros(D, dtype=x.dtype))
    running_var = bn_param.get("running_var", np.zeros(D, dtype=x.dtype))

    out, cache = None, None
    if mode == "train":
        #######################################################################
        # TODO: Implement the training-time forward pass for batch norm.      #
        # Use minibatch statistics to compute the mean and variance, use      #
        # these statistics to normalize the incoming data, and scale and      #
        # shift the normalized data using gamma and beta.                     #
        #                                                                     #
        # You should store the output in the variable out. Any intermediates  #
        # that you need for the backward pass should be stored in the cache   #
        # variable.                                                           #
        #                                                                     #
        # You should also use your computed sample mean and variance together #
        # with the momentum variable to update the running mean and running   #
        # variance, storing your result in the running_mean and running_var   #
        # variables.                                                          #
        #                                                                     #
        # Note that though you should be keeping track of the running         #
        # variance, you should normalize the data based on the standard       #
        # deviation (square root of variance) instead!                        #
        # Referencing the original paper (https://arxiv.org/abs/1502.03167)   #
        # might prove to be helpful.                                          #
        #######################################################################
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

        pass

        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        #######################################################################
        #                           END OF YOUR CODE                          #
        #######################################################################
    elif mode == "test":
        #######################################################################
        # TODO: Implement the test-time forward pass for batch normalization. #
        # Use the running mean and variance to normalize the incoming data,   #
        # then scale and shift the normalized data using gamma and beta.      #
        # Store the result in the out variable.                               #
        #######################################################################
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

        pass

        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        #######################################################################
        #                          END OF YOUR CODE                           #
        #######################################################################
    else:
        raise ValueError('Invalid forward batchnorm mode "%s"' % mode)

    # Store the updated running means back into bn_param
    bn_param["running_mean"] = running_mean
    bn_param["running_var"] = running_var

    return out, cache


def batchnorm_backward(dout, cache):
    """
    Backward pass for batch normalization.

    For this implementation, you should write out a computation graph for
    batch normalization on paper and propagate gradients backward through
    intermediate nodes.

    Inputs:
    - dout: Upstream derivatives, of shape (N, D)
    - cache: Variable of intermediates from batchnorm_forward.

    Returns a tuple of:
    - dx: Gradient with respect to inputs x, of shape (N, D)
    - dgamma: Gradient with respect to scale parameter gamma, of shape (D,)
    - dbeta: Gradient with respect to shift parameter beta, of shape (D,)
    """
    dx, dgamma, dbeta = None, None, None
    ###########################################################################
    # TODO: Implement the backward pass for batch normalization. Store the    #
    # results in the dx, dgamma, and dbeta variables.                         #
    # Referencing the original paper (https://arxiv.org/abs/1502.03167)       #
    # might prove to be helpful.                                              #
    ###########################################################################
    # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

    pass

    # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
    ###########################################################################
    #                             END OF YOUR CODE                            #
    ###########################################################################

    return dx, dgamma, dbeta


def batchnorm_backward_alt(dout, cache):
    """
    Alternative backward pass for batch normalization.

    For this implementation you should work out the derivatives for the batch
    normalizaton backward pass on paper and simplify as much as possible. You
    should be able to derive a simple expression for the backward pass.
    See the jupyter notebook for more hints.

    Note: This implementation should expect to receive the same cache variable
    as batchnorm_backward, but might not use all of the values in the cache.

    Inputs / outputs: Same as batchnorm_backward
    """
    dx, dgamma, dbeta = None, None, None
    ###########################################################################
    # TODO: Implement the backward pass for batch normalization. Store the    #
    # results in the dx, dgamma, and dbeta variables.                         #
    #                                                                         #
    # After computing the gradient with respect to the centered inputs, you   #
    # should be able to compute gradients with respect to the inputs in a     #
    # single statement; our implementation fits on a single 80-character line.#
    ###########################################################################
    # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

    pass

    # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
    ###########################################################################
    #                             END OF YOUR CODE                            #
    ###########################################################################

    return dx, dgamma, dbeta


def layernorm_forward(x, gamma, beta, ln_param):
    """
    Forward pass for layer normalization.

    During both training and test-time, the incoming data is normalized per data-point,
    before being scaled by gamma and beta parameters identical to that of batch normalization.

    Note that in contrast to batch normalization, the behavior during train and test-time for
    layer normalization are identical, and we do not need to keep track of running averages
    of any sort.

    Input:
    - x: Data of shape (N, D)
    - gamma: Scale parameter of shape (D,)
    - beta: Shift parameter of shape (D,)
    - ln_param: Dictionary with the following keys:
        - eps: Constant for numeric stability

    Returns a tuple of:
    - out: of shape (N, D)
    - cache: A tuple of values needed in the backward pass
    """
    out, cache = None, None
    eps = ln_param.get("eps", 1e-5)
    ###########################################################################
    # TODO: Implement the training-time forward pass for layer norm.          #
    # Normalize the incoming data, and scale and  shift the normalized data   #
    #  using gamma and beta.                                                  #
    # HINT: this can be done by slightly modifying your training-time         #
    # implementation of  batch normalization, and inserting a line or two of  #
    # well-placed code. In particular, can you think of any matrix            #
    # transformations you could perform, that would enable you to copy over   #
    # the batch norm code and leave it almost unchanged?                      #
    ###########################################################################
    # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

    pass

    # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
    ###########################################################################
    #                             END OF YOUR CODE                            #
    ###########################################################################
    return out, cache


def layernorm_backward(dout, cache):
    """
    Backward pass for layer normalization.

    For this implementation, you can heavily rely on the work you've done already
    for batch normalization.

    Inputs:
    - dout: Upstream derivatives, of shape (N, D)
    - cache: Variable of intermediates from layernorm_forward.

    Returns a tuple of:
    - dx: Gradient with respect to inputs x, of shape (N, D)
    - dgamma: Gradient with respect to scale parameter gamma, of shape (D,)
    - dbeta: Gradient with respect to shift parameter beta, of shape (D,)
    """
    dx, dgamma, dbeta = None, None, None
    ###########################################################################
    # TODO: Implement the backward pass for layer norm.                       #
    #                                                                         #
    # HINT: this can be done by slightly modifying your training-time         #
    # implementation of batch normalization. The hints to the forward pass    #
    # still apply!                                                            #
    ###########################################################################
    # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

    pass

    # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
    ###########################################################################
    #                             END OF YOUR CODE                            #
    ###########################################################################
    return dx, dgamma, dbeta
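

# Layer norm applies the batch-norm statistics per row (per sample) instead of
# per column, so the same math works on x.T. The two functions below are a
# minimal, self-contained sketch (not the reference solution); the cache layout
# (x_hat, gamma, inv_std) is local to this sketch.
def _layernorm_forward_sketch(x, gamma, beta, ln_param):
    eps = ln_param.get("eps", 1e-5)
    mu = x.mean(axis=1, keepdims=True)                # per-sample mean, shape (N, 1)
    var = x.var(axis=1, keepdims=True)                # per-sample variance
    inv_std = 1.0 / np.sqrt(var + eps)
    x_hat = (x - mu) * inv_std                        # normalized rows
    out = gamma * x_hat + beta
    return out, (x_hat, gamma, inv_std)


def _layernorm_backward_sketch(dout, cache):
    x_hat, gamma, inv_std = cache
    D = dout.shape[1]
    dgamma = (dout * x_hat).sum(axis=0)
    dbeta = dout.sum(axis=0)
    dxhat = dout * gamma
    # same simplified expression as batch norm, but reduced over the feature axis
    dx = (inv_std / D) * (D * dxhat
                          - dxhat.sum(axis=1, keepdims=True)
                          - x_hat * (dxhat * x_hat).sum(axis=1, keepdims=True))
    return dx, dgamma, dbeta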


def dropout_forward(x, dropout_param):
    """
    Performs the forward pass for (inverted) dropout.

    Inputs:
    - x: Input data, of any shape
    - dropout_param: A dictionary with the following keys:
      - p: Dropout parameter. We keep each neuron output with probability p.
      - mode: 'test' or 'train'. If the mode is train, then perform dropout;
        if the mode is test, then just return the input.
      - seed: Seed for the random number generator. Passing seed makes this
        function deterministic, which is needed for gradient checking but not
        in real networks.

    Outputs:
    - out: Array of the same shape as x.
    - cache: tuple (dropout_param, mask). In training mode, mask is the dropout
      mask that was used to multiply the input; in test mode, mask is None.

    NOTE: Please implement **inverted** dropout, not the vanilla version of dropout.
    See http://cs231n.github.io/neural-networks-2/#reg for more details.

    NOTE 2: Keep in mind that p is the probability of **keeping** a neuron
    output; this might be contrary to some sources, where p is referred to
    as the probability of dropping a neuron output.
    """
    p, mode = dropout_param["p"], dropout_param["mode"]
    if "seed" in dropout_param:
        np.random.seed(dropout_param["seed"])

    mask = None
    out = None

    if mode == "train":
        #######################################################################
        # TODO: Implement training phase forward pass for inverted dropout.   #
        # Store the dropout mask in the mask variable.                        #
        #######################################################################
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

        pass

        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        #######################################################################
        #                           END OF YOUR CODE                          #
        #######################################################################
    elif mode == "test":
        #######################################################################
        # TODO: Implement the test phase forward pass for inverted dropout.   #
        #######################################################################
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

        pass

        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        #######################################################################
        #                            END OF YOUR CODE                         #
        #######################################################################

    cache = (dropout_param, mask)
    out = out.astype(x.dtype, copy=False)

    return out, cache


def dropout_backward(dout, cache):
    """
    Perform the backward pass for (inverted) dropout.

    Inputs:
    - dout: Upstream derivatives, of any shape
    - cache: (dropout_param, mask) from dropout_forward.
    """
    dropout_param, mask = cache
    mode = dropout_param["mode"]

    dx = None
    if mode == "train":
        #######################################################################
        # TODO: Implement training phase backward pass for inverted dropout   #
        #######################################################################
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

        pass

        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        #######################################################################
        #                          END OF YOUR CODE                           #
        #######################################################################
    elif mode == "test":
        dx = dout
    return dx
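

# Inverted dropout divides the mask by p at train time, so test time is a plain
# identity. The pair below is a minimal sketch of one possible fill-in, not the
# reference solution.
def _dropout_forward_sketch(x, dropout_param):
    p, mode = dropout_param["p"], dropout_param["mode"]
    if "seed" in dropout_param:
        np.random.seed(dropout_param["seed"])
    if mode == "train":
        mask = (np.random.rand(*x.shape) < p) / p     # keep with probability p, rescale by 1/p
        out = x * mask
    else:                                             # test mode: identity, no rescaling needed
        mask = None
        out = x
    return out.astype(x.dtype, copy=False), (dropout_param, mask)


def _dropout_backward_sketch(dout, cache):
    dropout_param, mask = cache
    if dropout_param["mode"] == "train":
        return dout * mask                            # gradient flows only through kept units
    return dout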


def conv_forward_naive(x, w, b, conv_param):
    """
    A naive implementation of the forward pass for a convolutional layer.

    The input consists of N data points, each with C channels, height H and
    width W. We convolve each input with F different filters, where each filter
    spans all C channels and has height HH and width WW.

    Input:
    - x: Input data of shape (N, C, H, W)
    - w: Filter weights of shape (F, C, HH, WW)
    - b: Biases, of shape (F,)
    - conv_param: A dictionary with the following keys:
      - 'stride': The number of pixels between adjacent receptive fields in the
        horizontal and vertical directions.
      - 'pad': The number of pixels that will be used to zero-pad the input.


    During padding, 'pad' zeros should be placed symmetrically (i.e., equally on both sides)
    along the height and width axes of the input. Be careful not to modify the original
    input x directly.

    Returns a tuple of:
    - out: Output data, of shape (N, F, H', W') where H' and W' are given by
      H' = 1 + (H + 2 * pad - HH) / stride
      W' = 1 + (W + 2 * pad - WW) / stride
    - cache: (x, w, b, conv_param)
    """
    out = None
    ###########################################################################
    # TODO: Implement the convolutional forward pass.                         #
    # Hint: you can use the function np.pad for padding.                      #
    ###########################################################################
    # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

    pass

    # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
    ###########################################################################
    #                             END OF YOUR CODE                            #
    ###########################################################################
    cache = (x, w, b, conv_param)
    return out, cache


def conv_backward_naive(dout, cache):
    """
    A naive implementation of the backward pass for a convolutional layer.

    Inputs:
    - dout: Upstream derivatives.
    - cache: A tuple of (x, w, b, conv_param) as in conv_forward_naive

    Returns a tuple of:
    - dx: Gradient with respect to x
    - dw: Gradient with respect to w
    - db: Gradient with respect to b
    """
    dx, dw, db = None, None, None
    ###########################################################################
    # TODO: Implement the convolutional backward pass.                        #
    ###########################################################################
    # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

    pass

    # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
    ###########################################################################
    #                             END OF YOUR CODE                            #
    ###########################################################################
    return dx, dw, db
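

# A direct translation of the convolution formulas with explicit loops. This is
# a slow but readable sketch (not the reference solution); faster, vectorized
# versions are a separate exercise.
def _conv_forward_naive_sketch(x, w, b, conv_param):
    N, C, H, W = x.shape
    F, _, HH, WW = w.shape
    stride, pad = conv_param["stride"], conv_param["pad"]
    H_out = 1 + (H + 2 * pad - HH) // stride
    W_out = 1 + (W + 2 * pad - WW) // stride
    xp = np.pad(x, ((0, 0), (0, 0), (pad, pad), (pad, pad)), mode="constant")
    out = np.zeros((N, F, H_out, W_out))
    for n in range(N):
        for f in range(F):
            for i in range(H_out):
                for j in range(W_out):
                    hs, ws = i * stride, j * stride
                    window = xp[n, :, hs:hs + HH, ws:ws + WW]
                    out[n, f, i, j] = np.sum(window * w[f]) + b[f]
    return out, (x, w, b, conv_param)


def _conv_backward_naive_sketch(dout, cache):
    x, w, b, conv_param = cache
    stride, pad = conv_param["stride"], conv_param["pad"]
    N, C, H, W = x.shape
    F, _, HH, WW = w.shape
    _, _, H_out, W_out = dout.shape
    xp = np.pad(x, ((0, 0), (0, 0), (pad, pad), (pad, pad)), mode="constant")
    dxp = np.zeros_like(xp)
    dw = np.zeros_like(w)
    db = dout.sum(axis=(0, 2, 3))                     # each filter's bias sees every output pixel
    for n in range(N):
        for f in range(F):
            for i in range(H_out):
                for j in range(W_out):
                    hs, ws = i * stride, j * stride
                    window = xp[n, :, hs:hs + HH, ws:ws + WW]
                    dw[f] += window * dout[n, f, i, j]
                    dxp[n, :, hs:hs + HH, ws:ws + WW] += w[f] * dout[n, f, i, j]
    dx = dxp[:, :, pad:pad + H, pad:pad + W]          # strip the zero padding
    return dx, dw, db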


def max_pool_forward_naive(x, pool_param):
    """
    A naive implementation of the forward pass for a max-pooling layer.

    Inputs:
    - x: Input data, of shape (N, C, H, W)
    - pool_param: dictionary with the following keys:
      - 'pool_height': The height of each pooling region
      - 'pool_width': The width of each pooling region
      - 'stride': The distance between adjacent pooling regions

    No padding is necessary here, e.g. you can assume:
      - (H - pool_height) % stride == 0
      - (W - pool_width) % stride == 0

    Returns a tuple of:
    - out: Output data, of shape (N, C, H', W') where H' and W' are given by
      H' = 1 + (H - pool_height) / stride
      W' = 1 + (W - pool_width) / stride
    - cache: (x, pool_param)
    """
    out = None
    ###########################################################################
    # TODO: Implement the max-pooling forward pass                            #
    ###########################################################################
    # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

    pass

    # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
    ###########################################################################
    #                             END OF YOUR CODE                            #
    ###########################################################################
    cache = (x, pool_param)
    return out, cache


def max_pool_backward_naive(dout, cache):
    """
    A naive implementation of the backward pass for a max-pooling layer.

    Inputs:
    - dout: Upstream derivatives
    - cache: A tuple of (x, pool_param) as in the forward pass.

    Returns:
    - dx: Gradient with respect to x
    """
    dx = None
    ###########################################################################
    # TODO: Implement the max-pooling backward pass                           #
    ###########################################################################
    # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

    pass

    # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
    ###########################################################################
    #                             END OF YOUR CODE                            #
    ###########################################################################
    return dx
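

# Max pooling with explicit loops: the forward pass keeps only (x, pool_param),
# and the backward pass routes each upstream gradient to the argmax of its
# window. A minimal sketch, not the reference solution.
def _max_pool_forward_naive_sketch(x, pool_param):
    N, C, H, W = x.shape
    ph, pw = pool_param["pool_height"], pool_param["pool_width"]
    stride = pool_param["stride"]
    H_out = 1 + (H - ph) // stride
    W_out = 1 + (W - pw) // stride
    out = np.zeros((N, C, H_out, W_out))
    for i in range(H_out):
        for j in range(W_out):
            window = x[:, :, i * stride:i * stride + ph, j * stride:j * stride + pw]
            out[:, :, i, j] = window.max(axis=(2, 3))
    return out, (x, pool_param)


def _max_pool_backward_naive_sketch(dout, cache):
    x, pool_param = cache
    ph, pw = pool_param["pool_height"], pool_param["pool_width"]
    stride = pool_param["stride"]
    N, C, H_out, W_out = dout.shape
    dx = np.zeros_like(x)
    for n in range(N):
        for c in range(C):
            for i in range(H_out):
                for j in range(W_out):
                    hs, ws = i * stride, j * stride
                    window = x[n, c, hs:hs + ph, ws:ws + pw]
                    # only the max element of each window receives gradient
                    r, cc = np.unravel_index(np.argmax(window), window.shape)
                    dx[n, c, hs + r, ws + cc] += dout[n, c, i, j]
    return dx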


def spatial_batchnorm_forward(x, gamma, beta, bn_param):
    """
    Computes the forward pass for spatial batch normalization.

    Inputs:
    - x: Input data of shape (N, C, H, W)
    - gamma: Scale parameter, of shape (C,)
    - beta: Shift parameter, of shape (C,)
    - bn_param: Dictionary with the following keys:
      - mode: 'train' or 'test'; required
      - eps: Constant for numeric stability
      - momentum: Constant for running mean / variance. momentum=0 means that
        old information is discarded completely at every time step, while
        momentum=1 means that new information is never incorporated. The
        default of momentum=0.9 should work well in most situations.
      - running_mean: Array of shape (C,) giving running mean of features
      - running_var: Array of shape (C,) giving running variance of features

    Returns a tuple of:
    - out: Output data, of shape (N, C, H, W)
    - cache: Values needed for the backward pass
    """
    out, cache = None, None

    ###########################################################################
    # TODO: Implement the forward pass for spatial batch normalization.       #
    #                                                                         #
    # HINT: You can implement spatial batch normalization by calling the      #
    # vanilla version of batch normalization you implemented above.           #
    # Your implementation should be very short; ours is less than five lines. #
    ###########################################################################
    # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

    pass

    # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
    ###########################################################################
    #                             END OF YOUR CODE                            #
    ###########################################################################

    return out, cache


def spatial_batchnorm_backward(dout, cache):
    """
    Computes the backward pass for spatial batch normalization.

    Inputs:
    - dout: Upstream derivatives, of shape (N, C, H, W)
    - cache: Values from the forward pass

    Returns a tuple of:
    - dx: Gradient with respect to inputs, of shape (N, C, H, W)
    - dgamma: Gradient with respect to scale parameter, of shape (C,)
    - dbeta: Gradient with respect to shift parameter, of shape (C,)
    """
    dx, dgamma, dbeta = None, None, None

    ###########################################################################
    # TODO: Implement the backward pass for spatial batch normalization.      #
    #                                                                         #
    # HINT: You can implement spatial batch normalization by calling the      #
    # vanilla version of batch normalization you implemented above.           #
    # Your implementation should be very short; ours is less than five lines. #
    ###########################################################################
    # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

    pass

    # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
    ###########################################################################
    #                             END OF YOUR CODE                            #
    ###########################################################################

    return dx, dgamma, dbeta
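

# Spatial batch norm can reuse the vanilla batchnorm_forward / batchnorm_backward
# defined earlier in this file by treating each channel as one feature: move the
# channel axis last and flatten the (N, H, W) axes. A minimal sketch, assuming
# those two functions use the standard (N, D) interface.
def _spatial_batchnorm_forward_sketch(x, gamma, beta, bn_param):
    N, C, H, W = x.shape
    x_flat = x.transpose(0, 2, 3, 1).reshape(-1, C)   # (N*H*W, C)
    out_flat, cache = batchnorm_forward(x_flat, gamma, beta, bn_param)
    out = out_flat.reshape(N, H, W, C).transpose(0, 3, 1, 2)
    return out, cache


def _spatial_batchnorm_backward_sketch(dout, cache):
    N, C, H, W = dout.shape
    dout_flat = dout.transpose(0, 2, 3, 1).reshape(-1, C)
    dx_flat, dgamma, dbeta = batchnorm_backward(dout_flat, cache)
    dx = dx_flat.reshape(N, H, W, C).transpose(0, 3, 1, 2)
    return dx, dgamma, dbeta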


def spatial_groupnorm_forward(x, gamma, beta, G, gn_param):
    """
    Computes the forward pass for spatial group normalization.
    In contrast to layer normalization, group normalization splits each entry
    in the data into G contiguous pieces, which it then normalizes independently.
    Per-feature shifting and scaling are then applied to the data, in a manner
    identical to that of batch normalization and layer normalization.

    Inputs:
    - x: Input data of shape (N, C, H, W)
    - gamma: Scale parameter, of shape (1, C, 1, 1)
    - beta: Shift parameter, of shape (1, C, 1, 1)
    - G: Integer number of groups to split into, should be a divisor of C
    - gn_param: Dictionary with the following keys:
      - eps: Constant for numeric stability

    Returns a tuple of:
    - out: Output data, of shape (N, C, H, W)
    - cache: Values needed for the backward pass
    """
    out, cache = None, None
    eps = gn_param.get("eps", 1e-5)
    ###########################################################################
    # TODO: Implement the forward pass for spatial group normalization.       #
    # This will be extremely similar to the layer norm implementation.        #
    # In particular, think about how you could transform the matrix so that   #
    # the bulk of the code is similar to both train-time batch normalization  #
    # and layer normalization!                                                #
    ###########################################################################
    # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

    pass

    # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
    ###########################################################################
    #                             END OF YOUR CODE                            #
    ###########################################################################
    return out, cache


def spatial_groupnorm_backward(dout, cache):
    """
    Computes the backward pass for spatial group normalization.

    Inputs:
    - dout: Upstream derivatives, of shape (N, C, H, W)
    - cache: Values from the forward pass

    Returns a tuple of:
    - dx: Gradient with respect to inputs, of shape (N, C, H, W)
    - dgamma: Gradient with respect to scale parameter, of shape (1, C, 1, 1)
    - dbeta: Gradient with respect to shift parameter, of shape (1, C, 1, 1)
    """
    dx, dgamma, dbeta = None, None, None

    ###########################################################################
    # TODO: Implement the backward pass for spatial group normalization.      #
    # This will be extremely similar to the layer norm implementation.        #
    ###########################################################################
    # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

    pass

    # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
    ###########################################################################
    #                             END OF YOUR CODE                            #
    ###########################################################################
    return dx, dgamma, dbeta
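

# Group norm reshapes (N, C, H, W) into (N, G, C//G, H, W) and normalizes over
# everything except the (sample, group) axes. The pair below is a minimal,
# self-contained sketch (not the reference solution) with a sketch-local cache.
def _spatial_groupnorm_forward_sketch(x, gamma, beta, G, gn_param):
    eps = gn_param.get("eps", 1e-5)
    N, C, H, W = x.shape
    xg = x.reshape(N, G, C // G, H, W)
    mu = xg.mean(axis=(2, 3, 4), keepdims=True)       # one mean per (sample, group)
    var = xg.var(axis=(2, 3, 4), keepdims=True)
    inv_std = 1.0 / np.sqrt(var + eps)
    x_hat = ((xg - mu) * inv_std).reshape(N, C, H, W)
    out = gamma * x_hat + beta                        # gamma, beta broadcast as (1, C, 1, 1)
    return out, (x_hat, gamma, inv_std, G)


def _spatial_groupnorm_backward_sketch(dout, cache):
    x_hat, gamma, inv_std, G = cache
    N, C, H, W = dout.shape
    dgamma = (dout * x_hat).sum(axis=(0, 2, 3), keepdims=True)
    dbeta = dout.sum(axis=(0, 2, 3), keepdims=True)
    dxhat = (dout * gamma).reshape(N, G, C // G, H, W)
    xg_hat = x_hat.reshape(N, G, C // G, H, W)
    m = (C // G) * H * W                              # number of elements per group
    dxg = (inv_std / m) * (m * dxhat
                           - dxhat.sum(axis=(2, 3, 4), keepdims=True)
                           - xg_hat * (dxhat * xg_hat).sum(axis=(2, 3, 4), keepdims=True))
    return dxg.reshape(N, C, H, W), dgamma, dbeta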


def svm_loss(x, y):
    """
    Computes the loss and gradient for multiclass SVM classification.

    Inputs:
    - x: Input data, of shape (N, C) where x[i, j] is the score for the jth
      class for the ith input.
    - y: Vector of labels, of shape (N,) where y[i] is the label for x[i] and
      0 <= y[i] < C

    Returns a tuple of:
    - loss: Scalar giving the loss
    - dx: Gradient of the loss with respect to x
    """
    loss, dx = None, None

    ###########################################################################
    # TODO: Copy over your solution from A1.
    ###########################################################################
    # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

    num_train = x.shape[0]
    # margins[j, i] = x[i, j] - x[i, y_i] + 1 via broadcasting over the transpose
    margins = x.T - x[np.arange(num_train), y] + 1
    margins[y, np.arange(num_train)] = 0               # correct class contributes no margin
    dx = (margins > 0).astype(x.dtype).T               # indicator of positive margins, (N, C)
    loss = np.sum(margins, where=margins > 0) / num_train
    dx[np.arange(num_train), y] = -np.sum(dx, axis=1)  # correct class gets minus the count
    dx = dx / num_train

    # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
    ###########################################################################
    #                             END OF YOUR CODE                            #
    ###########################################################################
    return loss, dx


def softmax_loss(x, y):
    """
    Computes the loss and gradient for softmax classification.

    Inputs:
    - x: Input data, of shape (N, C) where x[i, j] is the score for the jth
      class for the ith input.
    - y: Vector of labels, of shape (N,) where y[i] is the label for x[i] and
      0 <= y[i] < C

    Returns a tuple of:
    - loss: Scalar giving the loss
    - dx: Gradient of the loss with respect to x
    """
    loss, dx = None, None

    ###########################################################################
    # TODO: Copy over your solution from A1.
    ###########################################################################
    # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

    num_train = x.shape[0]
    # shift scores by the row-wise max for numerical stability before exponentiating
    ex = np.exp(x - np.max(x, axis=1, keepdims=True))
    p = ex / np.sum(ex, axis=1, keepdims=True)         # softmax probabilities, (N, C)
    loss = -np.sum(np.log(p[np.arange(num_train), y])) / num_train

    dx = np.copy(p)
    dx[np.arange(num_train), y] -= 1                   # dL/dscores = p - one_hot(y)
    dx = dx / num_train

    # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
    ###########################################################################
    #                             END OF YOUR CODE                            #
    ###########################################################################
    return loss, dx

Code for fc_net.py:

from builtins import range
from builtins import object
import numpy as np

from ..layers import *
from ..layer_utils import *


class TwoLayerNet(object):
    """
    A two-layer fully-connected neural network with ReLU nonlinearity and
    softmax loss that uses a modular layer design. We assume an input dimension
    of D, a hidden dimension of H, and perform classification over C classes.

    The architecure should be affine - relu - affine - softmax.

    Note that this class does not implement gradient descent; instead, it
    will interact with a separate Solver object that is responsible for running
    optimization.

    The learnable parameters of the model are stored in the dictionary
    self.params that maps parameter names to numpy arrays.
    """

    def __init__(
        self,
        input_dim=3 * 32 * 32,
        hidden_dim=100,
        num_classes=10,
        weight_scale=1e-3,
        reg=0.0,
    ):
        """
        Initialize a new network.

        Inputs:
        - input_dim: An integer giving the size of the input
        - hidden_dim: An integer giving the size of the hidden layer
        - num_classes: An integer giving the number of classes to classify
        - weight_scale: Scalar giving the standard deviation for random
          initialization of the weights.
        - reg: Scalar giving L2 regularization strength.
        """
        self.params = {}
        self.reg = reg

        ############################################################################
        # TODO: Initialize the weights and biases of the two-layer net. Weights    #
        # should be initialized from a Gaussian centered at 0.0 with               #
        # standard deviation equal to weight_scale, and biases should be           #
        # initialized to zero. All weights and biases should be stored in the      #
        # dictionary self.params, with first layer weights                         #
        # and biases using the keys 'W1' and 'b1' and second layer                 #
        # weights and biases using the keys 'W2' and 'b2'.                         #
        ############################################################################
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

        self.params['W1'] = np.random.normal(0.0, weight_scale, (input_dim, hidden_dim))
        self.params['b1'] = np.zeros(hidden_dim)
        self.params['W2'] = np.random.normal(0.0, weight_scale, (hidden_dim, num_classes))
        self.params['b2'] = np.zeros(num_classes)

        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        ############################################################################
        #                             END OF YOUR CODE                             #
        ############################################################################

    def loss(self, X, y=None):
        """
        Compute loss and gradient for a minibatch of data.

        Inputs:
        - X: Array of input data of shape (N, d_1, ..., d_k)
        - y: Array of labels, of shape (N,). y[i] gives the label for X[i].

        Returns:
        If y is None, then run a test-time forward pass of the model and return:
        - scores: Array of shape (N, C) giving classification scores, where
          scores[i, c] is the classification score for X[i] and class c.

        If y is not None, then run a training-time forward and backward pass and
        return a tuple of:
        - loss: Scalar value giving the loss
        - grads: Dictionary with the same keys as self.params, mapping parameter
          names to gradients of the loss with respect to those parameters.
        """
        scores = None
        ############################################################################
        # TODO: Implement the forward pass for the two-layer net, computing the    #
        # class scores for X and storing them in the scores variable.              #
        ############################################################################
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

        l1, ch1 = affine_relu_forward(X, self.params['W1'], self.params['b1'])   # hidden layer
        l2, ch2 = affine_forward(l1, self.params['W2'], self.params['b2'])       # class scores
        scores = l2

        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        ############################################################################
        #                             END OF YOUR CODE                             #
        ############################################################################

        # If y is None then we are in test mode so just return scores
        if y is None:
            return scores

        loss, grads = 0, {}
        ############################################################################
        # TODO: Implement the backward pass for the two-layer net. Store the loss  #
        # in the loss variable and gradients in the grads dictionary. Compute data #
        # loss using softmax, and make sure that grads[k] holds the gradients for  #
        # self.params[k]. Don't forget to add L2 regularization!                   #
        #                                                                          #
        # NOTE: To ensure that your implementation matches ours and you pass the   #
        # automated tests, make sure that your L2 regularization includes a factor #
        # of 0.5 to simplify the expression for the gradient.                      #
        ############################################################################
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

        l_softmax, g_softmax = softmax_loss(l2, y)
        # g_softmax is the gradient of the loss w.r.t. the scores, i.e. the upstream dout
        l2_dx, l2_dw, l2_db = affine_backward(g_softmax, ch2)
        l1_dx, l1_dw, l1_db = affine_relu_backward(l2_dx, ch1)

        loss = l_softmax + 0.5 * self.reg * (
            np.sum(self.params['W1'] * self.params['W1'])
            + np.sum(self.params['W2'] * self.params['W2'])
        )

        grads['W1'] = l1_dw + self.reg * self.params['W1']
        grads['b1'] = l1_db
        grads['W2'] = l2_dw + self.reg * self.params['W2']
        grads['b2'] = l2_db
        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        ############################################################################
        #                             END OF YOUR CODE                             #
        ############################################################################

        return loss, grads
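
The TwoLayerNet above is trained by the separate Solver class rather than by its own update loop. Below is a minimal usage sketch; the hyperparameter values are placeholders for illustration, and `data` is assumed to be the preprocessed CIFAR-10 dictionary loaded in the notebook setup.

# Illustrative only: these hyperparameters are not tuned values
model = TwoLayerNet(input_dim=3 * 32 * 32, hidden_dim=100, num_classes=10, reg=0.1)
solver = Solver(
    model,
    data,                                   # dict with X_train / y_train / X_val / y_val
    update_rule="sgd",
    optim_config={"learning_rate": 1e-3},
    lr_decay=0.95,
    num_epochs=10,
    batch_size=100,
    print_every=100,
)
solver.train()
print("Best validation accuracy:", solver.best_val_acc)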