Assignment Two (II): Fully-Connected Neural Networks
Main topics: modular layer design, and several optimization update rules.
Part 1: Modular Design
In A1 (Assignment 1) we implemented a fully-connected two-layer neural network. That implementation was not very modular, because the loss and gradients were computed inside a single monolithic function. This is manageable for a simple two-layer network, but it becomes impractical as we move to larger models.
Ideally we want to build networks with a more modular design, so that different layer types can be implemented in isolation and then composed into models with different architectures.
In this exercise we implement fully-connected networks in a more modular way: for each layer we implement a forward function and a backward function.
1) Forward function: receives the inputs, weights, and any other parameters, and returns both the output and a cache object that stores the data needed for the backward pass.
def layer_forward(x, w):
    """ Receive inputs x and weights w """
    # Do some computations ...
    z = # ... some intermediate value
    # Do some more computations ...
    out = # the output
    cache = (x, w, z, out)  # Values we need to compute gradients
    return out, cache
2) Backward function: receives the upstream derivative and the cache object, and returns the gradients with respect to the inputs and the weights.
def layer_backward(dout, cache):
    """
    Receive dout (derivative of loss with respect to outputs) and cache,
    and compute derivative with respect to inputs.
    """
    # Unpack cache values
    x, w, z, out = cache
    # Use values in cache to compute derivatives
    dx = # Derivative of loss with respect to x
    dw = # Derivative of loss with respect to w
    return dx, dw
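To see why this pairing is useful, here is a minimal sketch (my own illustration, not part of the assignment code) of how the layers implemented in this section could be composed into a small two-layer network: the forward functions are chained and each cache is kept, then the backward functions are applied in reverse order. Regularization is omitted, and the layer names refer forward to the functions defined in sections 1.1-1.6 below.

def two_layer_net(x, y, w1, b1, w2, b2):
    # Forward pass: chain the modular layers, keeping each cache
    h, cache1 = affine_relu_forward(x, w1, b1)   # affine + ReLU "sandwich" layer (1.5)
    scores, cache2 = affine_forward(h, w2, b2)   # second affine layer (1.1)
    loss, dscores = svm_loss(scores, y)          # loss layer (1.6): returns loss and upstream gradient
    # Backward pass: apply the backward functions in reverse order, reusing the caches
    dh, dw2, db2 = affine_backward(dscores, cache2)
    dx, dw1, db1 = affine_relu_backward(dh, cache1)
    return loss, {'W1': dw1, 'b1': db1, 'W2': dw2, 'b2': db2}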
1.1 Affine layer: forward
def affine_forward(x, w, b):
    out = None
    # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
    N, D = x.shape[0], x.size // x.shape[0]
    out = np.dot(x.reshape(N, D), w) + b
    # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
    cache = (x, w, b)
    return out, cache
Code analysis:
Computes the forward pass for an affine (fully-connected) layer.
The input x has shape (N, d_1, ..., d_k) and contains a minibatch of N examples, where each example x[i] has shape (d_1, ..., d_k).
Each input is reshaped into a vector of dimension D = d_1 * ... * d_k and then transformed into an output vector of dimension M.
Inputs:
- x: a numpy array containing input data, of shape (N, d_1, ..., d_k)
- w: a numpy array of weights, of shape (D, M)
- b: a numpy array of biases, of shape (M,)
Returns a tuple of:
- out: output, of shape (N, M)
- cache: (x, w, b)
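In other words, for each example the layer computes out[i] = x[i].reshape(D) @ w + b. A quick shape check (my own illustration; the shapes mirror the test below) would look like this:

import numpy as np

# Hypothetical shapes for illustration: D = 4 * 5 * 6 = 120, M = 3
x = np.random.randn(2, 4, 5, 6)   # minibatch of N = 2 examples
w = np.random.randn(120, 3)       # weights of shape (D, M)
b = np.random.randn(3)            # biases of shape (M,)
out, cache = affine_forward(x, w, b)
print(out.shape)                  # (2, 3), i.e. (N, M)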
Test_1.1 affine_forward
The relative error should be less than 1e-9.
# Test the affine_forward function
num_inputs = 2
input_shape = (4, 5, 6)
output_dim = 3
input_size = num_inputs * np.prod(input_shape)
weight_size = output_dim * np.prod(input_shape)
x = np.linspace(-0.1, 0.5, num=input_size).reshape(num_inputs, *input_shape)
w = np.linspace(-0.2, 0.3, num=weight_size).reshape(np.prod(input_shape), output_dim)
b = np.linspace(-0.3, 0.1, num=output_dim)
out, _ = affine_forward(x, w, b)
correct_out = np.array([[ 1.49834967, 1.70660132, 1.91485297],
[ 3.25553199, 3.5141327, 3.77273342]])
# Compare your output with ours. The error should be around e-9 or less.
print('Testing affine_forward function:')
print('difference: ', rel_error(out, correct_out))
Output:
Testing affine_forward function:
difference: 9.769849468192957e-10
1.2 Affine layer: backward
def affine_backward(dout, cache):
    x, w, b = cache
    dx, dw, db = None, None, None
    # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
    N, D = x.shape[0], w.shape[0]
    dx = np.dot(dout, w.T).reshape(x.shape)
    dw = np.dot(x.reshape(N, D).T, dout)
    db = np.sum(dout, axis=0)
    # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
    return dx, dw, db
Code analysis:
Computes the backward pass for an affine layer.
Inputs:
- dout: upstream derivative, of shape (N, M)
- cache: a tuple of:
  - x: input data, of shape (N, d_1, ..., d_k)
  - w: weights, of shape (D, M)
  - b: biases, of shape (M,)
Returns a tuple of:
- dx: gradient with respect to x; dw: gradient with respect to w; db: gradient with respect to b
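As a sanity check on the code above (my own note, not from the original notebook): writing X = x.reshape(N, D), the forward pass computes out = X @ w + b, so the chain rule gives the three gradients used in affine_backward:

dX = dout @ w.T          # shape (N, D); reshaped back to x.shape to obtain dx
dw = X.T @ dout          # shape (D, M)
db = dout.sum(axis=0)    # shape (M,); the bias is broadcast over all N rows in the forward pass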
Test_1.2 affine_backward
# Test the affine_backward function
np.random.seed(231)
x = np.random.randn(10, 2, 3)
w = np.random.randn(6, 5)
b = np.random.randn(5)
dout = np.random.randn(10, 5)
dx_num = eval_numerical_gradient_array(lambda x: affine_forward(x, w, b)[0], x, dout)
dw_num = eval_numerical_gradient_array(lambda w: affine_forward(x, w, b)[0], w, dout)
db_num = eval_numerical_gradient_array(lambda b: affine_forward(x, w, b)[0], b, dout)
_, cache = affine_forward(x, w, b)
dx, dw, db = affine_backward(dout, cache)
# The error should be around e-10 or less
print('Testing affine_backward function:')
print('dx error: ', rel_error(dx_num, dx))
print('dw error: ', rel_error(dw_num, dw))
print('db error: ', rel_error(db_num, db))
Output:
Testing affine_backward function:
dx error: 5.399100368651805e-11
dw error: 9.904211865398145e-11
db error: 2.4122867568119087e-11
1.3 ReLU activation: forward
Computes the forward pass for a ReLU activation layer.
def relu_forward(x):
    out = None
    # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
    out = np.maximum(0, x)
    # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
    cache = x
    return out, cache
Test_1.3 relu_forward
# Test the relu_forward function
x = np.linspace(-0.5, 0.5, num=12).reshape(3, 4)
out, _ = relu_forward(x)
correct_out = np.array([[ 0., 0., 0., 0., ],
[ 0., 0., 0.04545455, 0.13636364,],
[ 0.22727273, 0.31818182, 0.40909091, 0.5, ]])
# Compare your output with ours. The error should be on the order of e-8
print('Testing relu_forward function:')
print('difference: ', rel_error(out, correct_out))
Output:
Testing relu_forward function:
difference: 4.999999798022158e-08
1.4 ReLU activation: backward
def relu_backward(dout, cache):
    dx, x = None, cache
    # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
    dx = dout * (x > 0)
    # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
    return dx
Code analysis:
Computes the backward pass for a ReLU layer.
Inputs:
- dout: upstream derivative, of any shape
- cache: the input x, of the same shape as dout
Returns:
- dx: gradient with respect to x
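A tiny hand-worked example (my own illustration) makes the gating explicit: the upstream gradient passes through wherever the forward input was positive and is zeroed elsewhere.

import numpy as np

x = np.array([[-1.0,  2.0],
              [ 3.0, -4.0]])
dout = np.array([[10.0, 20.0],
                 [30.0, 40.0]])
_, cache = relu_forward(x)
dx = relu_backward(dout, cache)
print(dx)   # gradient is [[0., 20.], [30., 0.]]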
Test_1.4 relu_backward
np.random.seed(231)
x = np.random.randn(10, 10)
dout = np.random.randn(*x.shape)
dx_num = eval_numerical_gradient_array(lambda x: relu_forward(x)[0], x, dout)
_, cache = relu_forward(x)
dx = relu_backward(dout, cache)
# The error should be on the order of e-12
print('Testing relu_backward function:')
print('dx error: ', rel_error(dx_num, dx))
Output:
Testing relu_backward function:
dx error: 3.2756349136310288e-12
1.5 “Sandwich” layers
A simple convenience layer that chains an affine transform with a ReLU activation.
def affine_relu_forward(x, w, b):
    a, fc_cache = affine_forward(x, w, b)
    out, relu_cache = relu_forward(a)
    cache = (fc_cache, relu_cache)
    return out, cache

def affine_relu_backward(dout, cache):
    fc_cache, relu_cache = cache
    da = relu_backward(dout, relu_cache)
    dx, dw, db = affine_backward(da, fc_cache)
    return dx, dw, db
Test_1.5 affine_relu
from cs231n.layer_utils import affine_relu_forward, affine_relu_backward
np.random.seed(231)
x = np.random.randn(2, 3, 4)
w = np.random.randn(12, 10)
b = np.random.randn(10)
dout = np.random.randn(2, 10)
out, cache = affine_relu_forward(x, w, b)
dx, dw, db = affine_relu_backward(dout, cache)
dx_num = eval_numerical_gradient_array(lambda x: affine_relu_forward(x, w, b)[0], x, dout)
dw_num = eval_numerical_gradient_array(lambda w: affine_relu_forward(x, w, b)[0], w, dout)
db_num = eval_numerical_gradient_array(lambda b: affine_relu_forward(x, w, b)[0], b, dout)
# Relative error should be around e-10 or less
print('Testing affine_relu_forward and affine_relu_backward:')
print('dx error: ', rel_error(dx_num, dx))
print('dw error: ', rel_error(dw_num, dw))
print('db error: ', rel_error(db_num, db))
Output:
Testing affine_relu_forward and affine_relu_backward:
dx error: 2.299579177309368e-11
dw error: 8.162011105764925e-11
db error: 7.826724021458994e-12
1.6 Loss layers: Softmax and SVM
Computing svm_loss
def svm_loss(x, y):
    N = x.shape[0]
    correct_class_scores = x[np.arange(N), y]
    margins = np.maximum(0, x - correct_class_scores[:, np.newaxis] + 1.0)
    margins[np.arange(N), y] = 0
    loss = np.sum(margins) / N
    num_pos = np.sum(margins > 0, axis=1)
    dx = np.zeros_like(x)
    dx[margins > 0] = 1
    dx[np.arange(N), y] -= num_pos
    dx /= N
    return loss, dx
Code analysis:
Computes the loss and gradient for multiclass SVM (hinge loss) classification.
Inputs:
- x: input scores, of shape (N, C), where x[i, j] is the score for the j-th class for the i-th input.
- y: a vector of labels, of shape (N,), where y[i] is the label for x[i] and 0 <= y[i] < C.
Returns a tuple of:
- loss: a scalar giving the loss
- dx: gradient of the loss with respect to x
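To make the hinge loss concrete, here is a tiny worked example (my own illustration, not from the notebook). With a single example whose scores are [3.2, 5.1, -1.7] and whose correct class is 0, the margins are max(0, 5.1 - 3.2 + 1) = 2.9 and max(0, -1.7 - 3.2 + 1) = 0, so the loss is 2.9. The gradient puts +1 on every class with a positive margin and minus the count of such classes on the correct class, all divided by N:

import numpy as np

x = np.array([[3.2, 5.1, -1.7]])   # scores for one example (N = 1, C = 3)
y = np.array([0])                  # the correct class is 0
loss, dx = svm_loss(x, y)
print(loss)   # 2.9 (up to floating-point rounding)
print(dx)     # [[-1.  1.  0.]]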
Computing softmax_loss
def softmax_loss(x, y):