CS231n Course Assignment Two: Fully Connected Neural Networks (0820)

Assignment Two: Fully Connected Neural Networks

Main topics: modular layer design and several optimization update methods.

1. Modular Design

In A1 we implemented a fully connected two-layer neural network. That implementation was not very modular, because the loss and the gradients were computed inside a single monolithic function. This is manageable for a simple two-layer network, but it becomes impractical as we move to larger models.

Ideally we want a more modular design, so that different layer types can be implemented in isolation and then composed into models with different architectures.

In this exercise we implement fully connected networks in a more modular way. For each layer we implement a forward function and a backward function.

1) The forward function receives the inputs, the weights, and any other parameters, and returns both an output and a cache object that stores the data needed for the backward pass.

def layer_forward(x, w):
  """ Receive inputs x and weights w """
  # Do some computations ...
  z = # ... some intermediate value
  # Do some more computations ...
  out = # the output
  cache = (x, w, z, out) # Values we need to compute gradients
  return out, cache

2) The backward function receives the upstream derivative and the cache object, and returns the gradients with respect to the inputs and the weights.

def layer_backward(dout, cache):
  """
  Receive dout (derivative of loss with respect to outputs) and cache,
  and compute derivative with respect to inputs.
  """
  # Unpack cache values
  x, w, z, out = cache
  # Use values in cache to compute derivatives
  dx = # Derivative of loss with respect to x
  dw = # Derivative of loss with respect to w
  return dx, dw
1.1 Affine layer: forward
def affine_forward(x, w, b):
    out = None
    # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
    N, D = x.shape[0], x.size // x.shape[0]  # flatten each example to a row of length D = d_1 * ... * d_k
    out = np.dot(x.reshape(N, D), w) + b
    # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
    cache = (x, w, b)
    return out, cache

Code analysis:

Forward pass for an affine (fully connected) layer.
The input x has shape (N, d_1, ..., d_k) and contains a minibatch of N examples, where each example x[i] has shape (d_1, ..., d_k).
Each input is reshaped into a vector of dimension D = d_1 * ... * d_k and then mapped to an output vector of dimension M.

Inputs:
- x: numpy array of input data, of shape (N, d_1, ..., d_k)
- w: numpy array of weights, of shape (D, M)
- b: numpy array of biases, of shape (M,)

Returns a tuple of:
- out: output, of shape (N, M)
- cache: (x, w, b)
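In matrix form, with the reshaped input written as \bar{x} \in \mathbb{R}^{N \times D}, the forward pass is simply:

out = \bar{x} W + b, \qquad W \in \mathbb{R}^{D \times M},\; b \in \mathbb{R}^{M},\; out \in \mathbb{R}^{N \times M}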

Test_1.1 affine_forward
The relative error should be less than 1e-9.

# Test the affine_forward function
num_inputs = 2
input_shape = (4, 5, 6)
output_dim = 3

input_size = num_inputs * np.prod(input_shape)
weight_size = output_dim * np.prod(input_shape)

x = np.linspace(-0.1, 0.5, num=input_size).reshape(num_inputs, *input_shape)
w = np.linspace(-0.2, 0.3, num=weight_size).reshape(np.prod(input_shape), output_dim)
b = np.linspace(-0.3, 0.1, num=output_dim)

out, _ = affine_forward(x, w, b)
correct_out = np.array([[ 1.49834967,  1.70660132,  1.91485297],
                        [ 3.25553199,  3.5141327,   3.77273342]])

# Compare your output with ours. The error should be around e-9 or less.
print('Testing affine_forward function:')
print('difference: ', rel_error(out, correct_out))

Output:

Testing affine_forward function: 
difference:  9.769849468192957e-10
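
All of the tests in this post compare against reference values using the rel_error helper from the CS231n notebooks, which is not shown here. A sketch of its usual definition (maximum relative error, with a small constant guarding against division by zero):

import numpy as np

def rel_error(x, y):
    """Maximum relative error between arrays x and y."""
    return np.max(np.abs(x - y) / (np.maximum(1e-8, np.abs(x) + np.abs(y))))
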
1.2 Affine layer: backward
def affine_backward(dout, cache):
    x, w, b = cache
    dx, dw, db = None, None, None
    # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
    N, D = x.shape[0], w.shape[0]
    dx = np.dot(dout, w.T).reshape(x.shape)  # gradient w.r.t. the input, restored to x's original shape
    dw = np.dot(x.reshape(N, D).T, dout)     # gradient w.r.t. the weights
    db = np.sum(dout, axis=0)                # gradient w.r.t. the biases
    # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
    return dx, dw, db

Code analysis:

Backward pass for an affine layer.

Inputs:
- dout: upstream derivative, of shape (N, M)
- cache: tuple of:
  - x: input data, of shape (N, d_1, ..., d_k)
  - w: weights, of shape (D, M)
  - b: biases, of shape (M,)

Returns a tuple of:
- dx, dw, db
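
Writing the forward pass as out = \bar{x} W + b, the chain rule gives exactly the three lines of code above:

dx = dout \, W^{\top}, \qquad dw = \bar{x}^{\top} dout, \qquad db = \sum_{n=1}^{N} dout_{n,:}

where dx is then reshaped back to the original shape of x.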

Test_1.2 affine_backward

# Test the affine_backward function
np.random.seed(231)
x = np.random.randn(10, 2, 3)
w = np.random.randn(6, 5)
b = np.random.randn(5)
dout = np.random.randn(10, 5)

dx_num = eval_numerical_gradient_array(lambda x: affine_forward(x, w, b)[0], x, dout)
dw_num = eval_numerical_gradient_array(lambda w: affine_forward(x, w, b)[0], w, dout)
db_num = eval_numerical_gradient_array(lambda b: affine_forward(x, w, b)[0], b, dout)

_, cache = affine_forward(x, w, b)
dx, dw, db = affine_backward(dout, cache)

# The error should be around e-10 or less
print('Testing affine_backward function:')
print('dx error: ', rel_error(dx_num, dx))
print('dw error: ', rel_error(dw_num, dw))
print('db error: ', rel_error(db_num, db))

Output:

Testing affine_backward function:
dx error:  5.399100368651805e-11
dw error:  9.904211865398145e-11
db error:  2.4122867568119087e-11
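
The reference gradients dx_num, dw_num, db_num come from eval_numerical_gradient_array in cs231n.gradient_check, which estimates each gradient entry by centered finite differences and contracts the result against the upstream derivative dout. A sketch consistent with that behaviour (the library's exact code may differ in details):

import numpy as np

def eval_numerical_gradient_array(f, x, df, h=1e-5):
    """Numerically estimate d(sum(f(x) * df)) / dx with centered differences."""
    grad = np.zeros_like(x)
    it = np.nditer(x, flags=['multi_index'], op_flags=['readwrite'])
    while not it.finished:
        ix = it.multi_index
        oldval = x[ix]
        x[ix] = oldval + h
        pos = f(x).copy()   # f evaluated with x[ix] nudged up
        x[ix] = oldval - h
        neg = f(x).copy()   # f evaluated with x[ix] nudged down
        x[ix] = oldval      # restore the original value
        grad[ix] = np.sum((pos - neg) * df) / (2 * h)
        it.iternext()
    return grad
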
1.3 ReLU activation: forward

Forward pass for a ReLU layer.

def relu_forward(x):
    out = None
    # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
    out = np.maximum(0, x)  # elementwise max(0, x)
    # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
    cache = x
    return out, cache

Test_1.3 relu_forward

# Test the relu_forward function

x = np.linspace(-0.5, 0.5, num=12).reshape(3, 4)

out, _ = relu_forward(x)
correct_out = np.array([[ 0.,          0.,          0.,          0.,        ],
                        [ 0.,          0.,          0.04545455,  0.13636364,],
                        [ 0.22727273,  0.31818182,  0.40909091,  0.5,       ]])

# Compare your output with ours. The error should be on the order of e-8
print('Testing relu_forward function:')
print('difference: ', rel_error(out, correct_out))

Output:

Testing relu_forward function:
difference:  4.999999798022158e-08
1.4 ReLU activation: backward
def relu_backward(dout, cache):
    dx, x = None, cache
    # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
    dx = dout * (x > 0)  # pass the gradient through only where the input was positive
    # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
    return dx

Code analysis:

Inputs:
- dout: upstream derivatives, of any shape
- cache: input x, of the same shape as dout

Returns:
- dx: gradient with respect to x
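
Because the ReLU is applied elementwise, its local gradient is just an indicator on the sign of the input:

\frac{\partial L}{\partial x_i} = \frac{\partial L}{\partial \mathrm{out}_i} \cdot \mathbf{1}[x_i > 0]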

Test_1.4 relu_backward

np.random.seed(231)
x = np.random.randn(10, 10)
dout = np.random.randn(*x.shape)

dx_num = eval_numerical_gradient_array(lambda x: relu_forward(x)[0], x, dout)

_, cache = relu_forward(x)
dx = relu_backward(dout, cache)

# The error should be on the order of e-12
print('Testing relu_backward function:')
print('dx error: ', rel_error(dx_num, dx))

Output:

Testing relu_backward function:
dx error:  3.2756349136310288e-12
1.5 “Sandwich” layers

A simple convenience structure that chains an affine layer with a ReLU activation.

def affine_relu_forward(x, w, b):
    a, fc_cache = affine_forward(x, w, b)
    out, relu_cache = relu_forward(a)
    cache = (fc_cache, relu_cache)
    return out, cache

def affine_relu_backward(dout, cache):
    fc_cache, relu_cache = cache
    da = relu_backward(dout, relu_cache)
    dx, dw, db = affine_backward(da, fc_cache)
    return dx, dw, db

Test_1.5 affine_relu

from cs231n.layer_utils import affine_relu_forward, affine_relu_backward
np.random.seed(231)
x = np.random.randn(2, 3, 4)
w = np.random.randn(12, 10)
b = np.random.randn(10)
dout = np.random.randn(2, 10)

out, cache = affine_relu_forward(x, w, b)
dx, dw, db = affine_relu_backward(dout, cache)

dx_num = eval_numerical_gradient_array(lambda x: affine_relu_forward(x, w, b)[0], x, dout)
dw_num = eval_numerical_gradient_array(lambda w: affine_relu_forward(x, w, b)[0], w, dout)
db_num = eval_numerical_gradient_array(lambda b: affine_relu_forward(x, w, b)[0], b, dout)

# Relative error should be around e-10 or less
print('Testing affine_relu_forward and affine_relu_backward:')
print('dx error: ', rel_error(dx_num, dx))
print('dw error: ', rel_error(dw_num, dw))
print('db error: ', rel_error(db_num, db))

Output:

Testing affine_relu_forward and affine_relu_backward:
dx error:  2.299579177309368e-11
dw error:  8.162011105764925e-11
db error:  7.826724021458994e-12
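
With the sandwich layer in hand, the forward and backward passes of a small two-layer network compose in a few lines. A minimal sketch (the data, weights, and shapes below are made up purely for illustration; softmax_loss is defined in section 1.6):

import numpy as np

np.random.seed(0)
X = np.random.randn(5, 3, 4)                     # a toy batch of 5 inputs of shape (3, 4), so D = 12
W1, b1 = np.random.randn(12, 10), np.zeros(10)
W2, b2 = np.random.randn(10, 7), np.zeros(7)
y = np.random.randint(7, size=5)

# Forward: affine - ReLU - affine, keeping both caches
h, cache1 = affine_relu_forward(X, W1, b1)
scores, cache2 = affine_forward(h, W2, b2)

# Loss and gradient of the loss w.r.t. the scores
loss, dscores = softmax_loss(scores, y)

# Backward: replay the layers in reverse order
dh, dW2, db2 = affine_backward(dscores, cache2)
dX, dW1, db1 = affine_relu_backward(dh, cache1)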
1.6 Loss layers: Softmax and SVM

Computing svm_loss

def svm_loss(x, y):
    N = x.shape[0]
    correct_class_scores = x[np.arange(N), y]
    # Hinge margins: max(0, x_j - x_{y_i} + 1), with the correct class zeroed out
    margins = np.maximum(0, x - correct_class_scores[:, np.newaxis] + 1.0)
    margins[np.arange(N), y] = 0
    loss = np.sum(margins) / N
    # Gradient: +1 for every class with a positive margin; the correct class
    # receives minus the number of positive margins for that example
    num_pos = np.sum(margins > 0, axis=1)
    dx = np.zeros_like(x)
    dx[margins > 0] = 1
    dx[np.arange(N), y] -= num_pos
    dx /= N
    return loss, dx

Code analysis:

Inputs:
- x: input data, of shape (N, C), where x[i, j] is the score for the j-th class on the i-th input
- y: vector of labels, of shape (N,), where y[i] is the label for x[i] and 0 <= y[i] < C

Returns a tuple of:
- loss: scalar giving the loss
- dx: gradient of the loss with respect to x
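
For a single example the multiclass SVM (hinge) loss and its gradient with respect to the scores are:

L_i = \sum_{j \neq y_i} \max(0,\; x_{ij} - x_{i y_i} + 1), \qquad
\frac{\partial L_i}{\partial x_{ij}} = \mathbf{1}[x_{ij} - x_{i y_i} + 1 > 0] \;\;(j \neq y_i), \qquad
\frac{\partial L_i}{\partial x_{i y_i}} = -\sum_{j \neq y_i} \mathbf{1}[x_{ij} - x_{i y_i} + 1 > 0]

which is exactly the num_pos bookkeeping in the code; the total loss averages L_i over the N examples.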

Computing softmax_loss
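
A minimal sketch of the standard numerically stable implementation, using the same interface as svm_loss above: shift the logits by their row maximum before exponentiating, average the negative log-probability of the correct class, and take the gradient as the softmax probabilities with 1 subtracted at the correct class, divided by N.

def softmax_loss(x, y):
    # Shift logits by the per-row maximum so that exp() never overflows.
    shifted = x - np.max(x, axis=1, keepdims=True)
    Z = np.sum(np.exp(shifted), axis=1, keepdims=True)
    log_probs = shifted - np.log(Z)
    probs = np.exp(log_probs)
    N = x.shape[0]
    loss = -np.sum(log_probs[np.arange(N), y]) / N
    dx = probs.copy()
    dx[np.arange(N), y] -= 1
    dx /= N
    return loss, dx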